PHP XML Parsing Functions

xiaoxiao2021-03-06  80

Source: Chinaasp

First, I have to admit that I like computing standards. If everyone in this industry followed standards, the Internet would be a better medium. Standardized data exchange formats make open, platform-independent computing models possible, which is why I am an XML enthusiast.

Fortunately, my favorite scripting language not only supports XML but keeps extending that support. PHP lets me quickly publish XML documents to the Internet, collect statistics about XML documents, and convert XML documents into other formats. For example, I often use PHP's XML processing capabilities to manage the articles and books I write in XML.

In this article, I discuss how to use the Expat parser built into PHP to process XML documents, and I demonstrate how Expat works with an example. The example also shows you how to:

- write your own handler functions
- convert an XML document into your own PHP data structures

Introducing Expat

An XML parser, also called an XML processor, gives a program access to the structure and content of an XML document. Expat is the XML parser for the PHP scripting language. It is also used in other projects, such as Mozilla, Apache, and Perl.

What is an event-based parser?

There are two basic types of XML parser:

- Tree-based parsers convert an XML document into a tree structure. Such a parser analyzes the whole document and provides an API for accessing each element of the resulting tree. The common standard is the DOM (Document Object Model).
- Event-based parsers treat an XML document as a series of events. When a particular event occurs, the parser calls a function supplied by the developer to handle it.

An event-based parser takes a data-centric view of an XML document: it focuses on the data in the document rather than on its structure. These parsers process the document from beginning to end and report events such as the start of an element, the end of an element, and the start of character data to the application through callback functions. The following is a "Hello-World" XML document:

    <greeting>Hello World</greeting>

An event-based parser reports it as three events:

- Start element: greeting
- Character data (CDATA), with the value: Hello World
- End element: greeting

Unlike a tree-based parser, an event-based parser does not build a structure that describes the document. While handling the character data event, you cannot ask the parser for information about the parent element greeting.

In exchange, it provides lower-level access, which uses resources more efficiently and is faster. The entire document never has to be held in memory; in fact, the document can even be larger than the available memory.

Expat is such an event-based parser. Of course, if you need one, you can use Expat to build a complete native tree structure in PHP.

The Hello-World example above is well-formed XML. It is not valid, however, because it neither references a DTD (Document Type Definition) nor embeds one.

To Expat this makes no difference: Expat is a non-validating parser and therefore ignores any DTD associated with the document. Note, however, that the document must still be well-formed; otherwise Expat (like any other parser that conforms to the XML standard) stops with an error message.

As a non-validating parser, Expat is fast and lightweight, which makes it well suited to Internet applications.

Compiling Expat

Expat can be compiled into PHP 3.0.6 or later. Since Apache 1.3.9, Expat has been bundled with Apache.

On UNIX systems, Expat is compiled into PHP by configuring PHP with the --with-xml option. If you compile PHP as an Apache module, Expat is included by default as part of Apache. On Windows, you must load the XML dynamic link library.

XML example: XMLstats

One way to understand Expat's functions is to work through an example. The example discussed here uses Expat to collect statistics about an XML document. For each element in the document, it outputs the following information:

- the number of times the element is used in the document
- the amount of character data in the element
- the element's parent elements
- the element's child elements

Note: For demonstration purposes, we use PHP to build a structure that stores each element's parent and child elements.

Preparation

The function that creates an XML parser instance is xml_parser_create(). This instance is passed to all subsequent functions. The idea is very similar to the connection handle used by PHP's MySQL functions. Before parsing a document, an event-based parser usually requires you to register callback functions, which are called when specific events occur. Expat is no exception. It defines the following seven possible events:

    Object                     XML parsing function                        Description
    Element                    xml_set_element_handler()                   Start and end of an element
    Character data             xml_set_character_data_handler()            Start of character data
    External entity            xml_set_external_entity_ref_handler()       An external entity occurs
    Unparsed external entity   xml_set_unparsed_entity_decl_handler()      An unparsed external entity occurs
    Processing instruction     xml_set_processing_instruction_handler()    A processing instruction occurs
    Notation declaration       xml_set_notation_decl_handler()             A notation declaration occurs
    Default                    xml_set_default_handler()                   All other events that have no handler

Every callback function must take the parser instance as its first parameter (other parameters follow).

In the example script at the end of this article, note that both the element handlers and the character data handler are used.
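The example script registers only two of the seven handlers. As a minimal sketch of one of the others, assuming PHP 7 or later with the XML extension enabled, the following registers a processing instruction handler. The helper name collect_instructions() and the sample document are made up for illustration and are not part of the article's script.

```php
<?php
// Hypothetical helper (not from the article): collect every processing
// instruction found in an XML string as a [target, data] pair.
function collect_instructions(string $xml): array
{
    $found = [];

    $parser = xml_parser_create();

    // Like every Expat callback, the handler receives the parser first.
    xml_set_processing_instruction_handler(
        $parser,
        function ($parser, $target, $data) use (&$found) {
            $found[] = [$target, $data];
        }
    );

    xml_parse($parser, $xml, true);

    return $found;
}

// Sample document invented for this sketch.
print_r(collect_instructions('<doc><?render mode="fast"?><p>text</p></doc>'));
```

The handler is called once for the instruction inside doc, with its target and its data passed as separate arguments.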
The element callback functions are registered with xml_set_element_handler(). This function takes three parameters:

- the parser instance
- the name of the callback function that handles start elements
- the name of the callback function that handles end elements

The callback functions must exist when parsing of the XML document begins, and they must be defined to match the prototypes described in the PHP manual. For example, Expat passes three parameters to the start element handler. In the example script it is defined as follows:

    function start_element($parser, $name, $attrs)

The first parameter is the parser instance, the second is the name of the start element, and the third is an array containing all of the element's attributes and their values.

Once parsing of the XML document starts, Expat calls your start_element() function whenever it encounters a start element and passes these parameters to it.

The case folding option is turned off with the xml_parser_set_option() function. This option is on by default, so element names passed to the handlers are automatically converted to uppercase. But XML is case-sensitive (so case matters a great deal when collecting statistics about XML documents). For our example, the case folding option must be turned off.
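To make the registration steps concrete, here is a minimal sketch that records the events reported for the Hello-World document. It assumes PHP 7 or later with the XML extension enabled. The helper trace_events() and its $caseFolding parameter are names invented for this illustration, and closures are used in place of the named callback functions of the example script.

```php
<?php
// Hypothetical helper (not part of the article's script): parse an XML
// string and return the list of events the parser reports.
function trace_events(string $xml, bool $caseFolding = false): array
{
    $events = [];

    $parser = xml_parser_create();

    // Case folding is on by default; turn it off unless asked otherwise.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);

    // Start and end element handlers are registered together.
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$events) {
            $events[] = "start:$name";
        },
        function ($parser, $name) use (&$events) {
            $events[] = "end:$name";
        }
    );

    // Character data handler.
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$events) {
            $events[] = "cdata:$data";
        }
    );

    // The third argument marks this as the final piece of data.
    if (!xml_parse($parser, $xml, true)) {
        $events[] = 'error:' . xml_error_string(xml_get_error_code($parser));
    }

    return $events;
}

print_r(trace_events('<greeting>Hello World</greeting>'));
```

Calling trace_events('<greeting/>', true) instead reports the element name as GREETING, which is why the statistics script switches case folding off.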

With all preparations complete, the script can now parse the XML document:

- xml_parse_from_file(), a custom function, opens the file named in its parameter and parses it in 4 KB chunks.
- xml_parse(), like xml_parse_from_file(), returns false when an error occurs, that is, when the XML document is not well-formed.
- You can use the xml_get_error_code() function to get the numeric code of the last error. Pass this code to the xml_error_string() function to get the error message text.
- Output the current line number of the XML document, which makes debugging easier.
- The callback functions are called during parsing.

Describing the document structure

When parsing a document with Expat, one question deserves emphasis: how do you maintain a basic description of the document's structure?

As mentioned earlier, an event-based parser does not itself produce any structural information.

Yet the tag structure is an important feature of XML. For example, the element sequence <book><title> means something different from <figure><title>. Any author will tell you that a book title and a figure title have nothing to do with each other, although both use the term "title". Therefore, to process XML effectively with an event-based parser, you must use your own stacks or lists to maintain the document's structural information.

To build a mirror of the document structure, the script needs to know at least the parent element of the current element. Expat's API cannot provide this: it reports only events for the current element, without any information about context. You therefore need to build your own stack structure.

The example script uses a first-in, last-out (FILO) stack. The stack, held in an array, stores all start elements that are still open. In the start element handler, the current element is pushed onto the top of the stack with array_push(). Correspondingly, the end element handler removes the top element with array_pop().
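The stack technique can be sketched on its own, separately from the statistics. The following is a minimal illustration, assuming PHP 7 or later with the XML extension enabled. The helper parent_pairs() and the sample document are hypothetical; the helper records each element together with the parent that was on top of the stack when the element started.

```php
<?php
// Hypothetical helper (not from the article): return a list of
// [element, parent] pairs, using a stack of the currently open elements.
function parent_pairs(string $xml): array
{
    $stack = [];   // names of the elements that are currently open
    $pairs = [];

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack, &$pairs) {
            // The element on top of the stack is the parent (null at the root).
            $parent = $stack ? $stack[count($stack) - 1] : null;
            $pairs[] = [$name, $parent];

            // Start element: push the current element onto the stack.
            array_push($stack, $name);
        },
        function ($parser, $name) use (&$stack) {
            // End element: remove the top element from the stack.
            array_pop($stack);
        }
    );

    xml_parse($parser, $xml, true);

    return $pairs;
}

// Sample document invented for this sketch: two different kinds of title.
$xml = '<book><title>PHP</title><figure><title>Fig</title></figure></book>';
print_r(parent_pairs($xml));
```

With the stack in place, the two title elements can be told apart: one is reported with the parent book, the other with the parent figure.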
For the sequence <book><title></title></book>, the stack is filled as follows:

- Start element book: "book" is assigned to the first element of the stack ($stack[0]).
- Start element title: "title" is assigned to the top of the stack ($stack[1]).
- End element title: the top element is removed from the stack ($stack[1]).
- End element book: the top element is removed from the stack ($stack[0]).

In PHP 3.0, the example tracks the nesting of elements manually with a $depth variable, which makes the script look more complicated. PHP 4.0 makes the script more concise with array_pop() and array_push().

Collecting data

To collect information about each element, the script needs to remember the events for each element. All the different elements in the document are stored in a global array variable, $elements. The array items are instances of the element class, which has four properties (class variables):

- $count: the number of times the element was found in the document
- $chars: the number of bytes of character data in the element
- $parents: the parent elements
- $childs: the child elements

As you can see, storing class instances in an array is easy.

Note: One feature of PHP is that you can traverse an entire class structure with a while (list() = each()) loop, just as you would traverse an array. All class variables (and, when you use PHP 3.0, the method names) are output as strings.

When an element is found, we need to increase its counter to track how many times it appears in the document: the count in the corresponding $elements item is increased by one.

We also want the parent element to know that the current element is its child. Therefore, the name of the current element is added to the parent element's $childs array. Finally, the current element should remember who its parent is, so the parent element is added to the current element's $parents array.

Displaying the statistics

The remaining code loops over the $elements array and its sub-arrays and displays the statistical results. It is the simplest kind of nested loop. It produces the correct result, but the code is neither concise nor particularly clever; it is just the kind of loop you might use in your everyday work.

The example script is designed to be called from the command line using the CGI version of PHP. The statistics are therefore output as plain text. If you want to use the script on the Internet, you need to modify the output functions to generate HTML.

Summary

Expat is PHP's XML parser. As an event-based parser, it does not produce a structural description of the document. However, by providing low-level access, it allows better use of resources and faster access.

As a non-validating parser, Expat ignores any DTD attached to the XML document, but it stops with an error message if the document is not well-formed.

- Provide event handlers to process the document.
- Build your own event structures, such as stacks and trees, to take advantage of XML's structural markup.

New XML applications appear every day, and PHP's XML support keeps growing (for example, support for the DOM-based XML parser libxml has been added).

With PHP and Expat, you can prepare for the coming standards that are valid, open, and platform-independent.

Example

<?php
/*****************************************************************************
 * Name: XML parsing example: XML document statistics
 * Description:
 *   This example uses PHP's Expat parser to collect statistics about an
 *   XML document (for example, how often each element occurs, and its
 *   parent and child elements).
 *   The XML file is passed as a parameter:
 *       ./xmlstats_php4.php3 test.xml
 * Requires: Expat; PHP 4.0 compiled as CGI
 *****************************************************************************/

// The first parameter is the XML file
$file = $argv[1];

// Initialize variables
$elements = $stack = array();
$total_elements = $total_chars = 0;

// Base class for elements
class element
{
    var $count = 0;
    var $chars = 0;
    var $parents = array();
    var $childs = array();
}

// Parse an XML file in 4 KB chunks
function xml_parse_from_file($parser, $file)
{
    if (!file_exists($file)) {
        die("Can't find file \"$file\".");
    }

    if (!($fp = @fopen($file, "r"))) {
        die("Can't open file \"$file\".");
    }

    while ($data = fread($fp, 4096)) {
        if (!xml_parse($parser, $data, feof($fp))) {
            return (false);
        }
    }

    fclose($fp);

    return (true);
}

// Output a result (box form)
function print_box($title, $value)
{
    printf("\n+%'-60s+\n", "");
    printf("|%20s", "$title:");
    printf("%14s", $value);
    printf("%26s|\n", "");
    printf("+%'-60s+\n", "");
}

// Output a result (line form)
function print_line($title, $value)
{
    printf("%20s", "$title:");
    printf("%15s\n", $value);
}

// Sort function: most frequent elements first
function my_sort($a, $b)
{
    return (is_object($a) && is_object($b) ? $b->count - $a->count : 0);
}

function start_element($parser, $name, $attrs)
{
    global $elements, $stack;

    // Is the element already in the global $elements array?
    if (!isset($elements[$name])) {
        // No - add a class instance for the element
        $element = new element;
        $elements[$name] = $element;
    }

    // Increase the element's counter by one
    $elements[$name]->count++;

    // Is there a parent element?
    if (isset($stack[count($stack) - 1])) {
        // Yes - assign the parent element to $last_element
        $last_element = $stack[count($stack) - 1];

        // If the current element has no entry for this parent yet,
        // initialize it to 0
        if (!isset($elements[$name]->parents[$last_element])) {
            $elements[$name]->parents[$last_element] = 0;
        }

        // Increase the counter for this element's parent by one
        $elements[$name]->parents[$last_element]++;

        // If the parent element has no entry for this child yet,
        // initialize it to 0
        if (!isset($elements[$last_element]->childs[$name])) {
            $elements[$last_element]->childs[$name] = 0;
        }

        // Increase the counter for this child of the parent by one
        $elements[$last_element]->childs[$name]++;
    }

    // Add the current element to the stack
    array_push($stack, $name);
}

function stop_element($parser, $name)
{
    global $stack;

    // Remove the top element from the stack
    array_pop($stack);
}

function char_data($parser, $data)
{
    global $elements, $stack;

    // Increase the character count of the current element
    $elements[$stack[count($stack) - 1]]->chars += strlen(trim($data));
}

// Create the parser instance
$parser = xml_parser_create();

// Set the handler functions
xml_set_element_handler($parser, "start_element", "stop_element");
xml_set_character_data_handler($parser, "char_data");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Parse the file
$ret = xml_parse_from_file($parser, $file);
if (!$ret) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

// Free the parser
xml_parser_free($parser);

// Remove helper elements
unset($elements["current_element"]);
unset($elements["last_element"]);

// Sort by element count
uasort($elements, "my_sort");

// Loop over the element information in $elements
// (the listing breaks off at this statement)
// while (list($name, $element) =
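The listing above breaks off at the loop that prints the statistics. The following is only a sketch of what such a nested loop could look like, not the author's original code. It assumes PHP 7 or later and an $elements array shaped like the one built by the handlers. The class ElementStats is a stand-in for the article's element class so that the sketch runs on its own, and format_stats() and its output labels are invented for illustration.

```php
<?php
// Stand-in for the article's element class, so this sketch is self-contained.
class ElementStats
{
    public $count = 0;
    public $chars = 0;
    public $parents = [];
    public $childs = [];
}

// Hypothetical formatter: one block of text per element, most frequent first.
function format_stats(array $elements): string
{
    uasort($elements, function ($a, $b) {
        return $b->count <=> $a->count;
    });

    $out = '';
    foreach ($elements as $name => $element) {
        $out .= sprintf("%20s%15s\n", "$name:", $element->count);
        $out .= sprintf("%20s%15s\n", 'Character data:', $element->chars);

        // Inner loops: walk the sub-arrays of parents and children.
        foreach ($element->parents as $parent => $n) {
            $out .= sprintf("%20s%15s\n", 'Parent:', "$parent ($n)");
        }
        foreach ($element->childs as $child => $n) {
            $out .= sprintf("%20s%15s\n", 'Child:', "$child ($n)");
        }
        $out .= "\n";
    }

    return $out;
}

// Sample data invented for this sketch: one book containing two titles.
$book = new ElementStats();
$book->count = 1;
$book->childs = ['title' => 2];

$title = new ElementStats();
$title->count = 2;
$title->chars = 9;
$title->parents = ['book' => 2];

echo format_stats(['book' => $book, 'title' => $title]);
```

Returning a string instead of printing directly keeps the output step separate, which also makes it easier to swap in an HTML formatter as suggested in the article.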