PHP XML parsing functions
Source: Chinaasp

First, I have to admit that I like computer standards. If everyone in this industry followed the standards, the Internet would be a better medium. Standardized data exchange formats make open, platform-independent computing models possible. That is why I am an XML enthusiast.

Fortunately, my favorite scripting language not only supports XML but keeps strengthening that support. PHP lets me quickly publish XML documents to the Internet, collect statistics about XML documents, and convert XML documents into other formats. For example, I often use PHP's XML processing capabilities to manage the articles and books I write in XML.

In this article, I will discuss how to process XML documents with the Expat parser in PHP. Using an example, I will demonstrate how Expat processes a document. The example also shows you how to:

- set up your own handler functions
- convert an XML document into your own PHP data structures

Introducing Expat

An XML parser, also called an XML processor, lets a program access the structure and content of an XML document. Expat is the XML parser for the PHP scripting language. It is also used in other projects, such as Mozilla, Apache, and Perl.

What is an event-based parser?

There are two basic types of XML parsers:

- Tree-based parsers convert the XML document into a tree structure. Such parsers analyze the entire document and provide an API for accessing each element of the resulting tree. The common standard is DOM (Document Object Model).
- Event-based parsers treat the XML document as a series of events. When a specific event occurs, the parser calls a function provided by the developer to handle it.

An event-based parser takes a data-centric view of the XML document; that is, it focuses on the data in the document rather than on its structure. Such parsers process the document from beginning to end and report events such as the start of an element, the end of an element, and the start of character data to the application through callback functions. The following is a "Hello World" XML document example:
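As a minimal sketch of how such an event stream looks from PHP, the script below parses a one-element document and records each event as it is reported. The document, the element name "greeting", and the wording of the recorded events are illustrative assumptions, not taken from the original article.

```php
<?php
// Illustrative document; the element name "greeting" is an assumption.
$xml = '<greeting>Hello World</greeting>';

$events = [];
$parser = xml_parser_create();
// Report element names exactly as written (case folding is on by default).
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Start and end of an element.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start element: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end element: $name";
    }
);

// Character data between the tags (may arrive in more than one piece).
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "character data: $data";
    }
);

xml_parse($parser, $xml, true);
echo implode("\n", $events), "\n";
```

The parser reports a start-element event, one or more character-data events, and an end-element event, in document order. Nothing resembling a tree is built along the way.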
On UNIX systems, you can compile Expat into PHP by configuring PHP with the --with-xml option. If you compile PHP as an Apache module, Expat is included as part of Apache by default. On Windows, you must load the XML dynamic link library.

XML example: XMLstats

One way to understand how Expat works is to walk through an example. The example we will discuss uses Expat to collect statistics about an XML document. For each element in the document, it outputs the following information:

- the number of times the element is used in the document
- the amount of character data in the element
- the element's parent elements
- the element's child elements

Note: For demonstration purposes, we use a PHP structure to store each element's parent and child elements.

Preparation

The function that creates an XML parser instance is xml_parser_create(). This instance is then used by all the other functions. The idea is very similar to the connection handle used by PHP's MySQL functions. Event-based parsers usually require you to register callback functions before parsing the document; these are called when specific events occur. Expat is no exception. It defines the following seven possible events:

Object                     XML parsing function                        Description
Element                    xml_set_element_handler()                   Start and end of an element
Character data             xml_set_character_data_handler()            Start of character data
External entity            xml_set_external_entity_ref_handler()       An external entity appears
Unparsed external entity   xml_set_unparsed_entity_decl_handler()      An unparsed external entity appears
Processing instruction     xml_set_processing_instruction_handler()    A processing instruction appears
Notation declaration       xml_set_notation_decl_handler()             A notation declaration appears
Default                    xml_set_default_handler()                   All other events that have no handler

All callback functions must take the parser instance as their first parameter (in addition to any other parameters). The complete example script appears at the end of this article. Note that it uses both element handlers and a character data handler.
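To illustrate one of the less common registrations from the table, here is a small sketch that registers a processing instruction handler. The document and its "render" instruction are invented for the example; only the xml_* functions come from PHP itself.

```php
<?php
// Illustrative document; the "render" target and its data are made up.
$xml = '<?xml version="1.0"?><doc><?render mode="fast"?>text</doc>';

$seen = [];
$parser = xml_parser_create();

// Like every callback, this one receives the parser instance first,
// followed by the event-specific parameters (target and data).
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$seen) {
        $seen[] = ['target' => $target, 'data' => $data];
    }
);

$ok = xml_parse($parser, $xml, true);
print_r($seen);
```

The XML declaration on the first line is not a processing instruction, so only the "render" instruction reaches the handler.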
The element callback functions are registered with xml_set_element_handler(). This function takes three parameters:

- the parser instance
- the name of the callback function that handles start elements
- the name of the callback function that handles end elements

The callback functions must exist by the time parsing of the XML document begins, and they must be defined to match the prototypes described in the PHP manual. For example, Expat passes three parameters to the start element handler. In the example script, it is defined as follows:

function start_element($parser, $name, $attrs)

The first parameter is the parser instance, the second is the name of the start element, and the third is an array containing all of the element's attributes and their values. Once you start parsing the XML document, Expat calls your start_element() function and passes these parameters each time it encounters a start element.

Case folding is turned off with the xml_parser_set_option() function. The option is on by default, which causes element names passed to the handler functions to be converted to uppercase automatically. But XML is case sensitive, so case matters a great deal when collecting statistics about XML documents. For our example, case folding must be turned off.
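The effect of the case folding option can be seen in a short sketch. The helper name element_names() and the sample document are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: returns the element names exactly as the
// start element handler receives them.
function element_names(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // End elements are not needed for this demonstration.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<Greeting><to/></Greeting>';
print_r(element_names($xml, true));   // names folded to uppercase: GREETING, TO
print_r(element_names($xml, false));  // names as written: Greeting, to
```

With folding left on, elements that differ only in case would be counted as the same element, which is why the statistics script switches it off.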
With all the preparations complete, the script can now parse the XML document:

- xml_parse_from_file(), a custom function, opens the file named in its parameter and parses it in 4 KB chunks.
- xml_parse() and xml_parse_from_file() both return a false value when an error occurs, that is, when the XML document is not well-formed.
- You can use xml_get_error_code() to obtain the numeric code of the last error, and pass that code to xml_error_string() to obtain the error message text.
- Printing the current line number of the XML document makes debugging easier.
- The callback functions are called during parsing.

Describing the document structure

When parsing a document, one question deserves emphasis with Expat: how do you keep a basic description of the document's structure? As mentioned earlier, an event-based parser does not itself produce any structural information. However, the tag structure is an important feature of XML. For example, the same element can mean something different depending on the sequence of elements that encloses it.
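One common way to keep track of nesting with an event-based parser is a stack of the currently open elements. The sketch below is only an illustration of that idea: the helper element_paths() and the book/figure/title document are assumptions, not part of the example script.

```php
<?php
// Hypothetical helper: returns the nesting path of every element in $xml.
function element_paths(string $xml): array
{
    $stack = [];   // names of the elements that are currently open
    $paths = [];   // one "a/b/c" path per start element event
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack, &$paths) {
            $stack[] = $name;                  // entering the element
            $paths[] = implode('/', $stack);   // where we are right now
        },
        function ($parser, $name) use (&$stack) {
            array_pop($stack);                 // leaving the element
        }
    );
    xml_parse($parser, $xml, true);
    return $paths;
}

$paths = element_paths('<book><title/><figure><title/></figure></book>');
print_r($paths);
// book, book/title, book/figure, book/figure/title
```

The two title elements arrive as identical events; only the stack tells them apart. The top of the stack is always the parent of the element that is about to start.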
We also want the parent element to know that the current element is one of its children, so the name of the current element is added to the parent element's $childs array. Finally, the current element should remember who its parent is, so the parent element is added to the current element's $parents array.

The rest of the code loops over the $elements array and its subarrays to display the statistics. This is the simplest kind of nested loop. Although it produces the correct output, the code is neither concise nor particularly clever; it is just the kind of loop you might write every day. The example script is designed to be run from the command line using PHP's CGI mode, so the statistics are output as plain text. If you want to use the script on the Internet, you need to modify the output functions to generate HTML.

Summary

Expat is PHP's XML parser. As an event-based parser, it does not produce a structural description of the document. However, by providing low-level access, it allows better use of resources and faster access. As a non-validating parser, Expat ignores any DTD attached to the XML document, but it stops with an error message if the document is not well-formed.

- Provide event handlers to process the document.
- Build your own event structures, such as stacks and trees, to take advantage of XML's structural markup.

New XML applications appear every day, and PHP's XML support keeps improving (for example, with the addition of the DOM-based XML parser libxml). With PHP and Expat, you can prepare for upcoming standards that are valid, open, and platform independent.
Example

<?php
/*****************************************************************************
 * Name: XML parsing example: XML document statistics
 * Description:
 *   This example uses PHP's Expat parser to collect and count information
 *   about an XML document (for example, the number of occurrences of each
 *   element and its parent and child elements).
 *   The XML file is passed as a parameter: ./xmlstats_php4.php3 test.xml
 * Requires: Expat, PHP 4.0 compiled for CGI mode
 *****************************************************************************/

// The first parameter is the XML file
$file = $argv[1];

// Variable initialization
$elements = $stack = array();
$total_elements = $total_chars = 0;

// Base class for elements
class element
{
    var $count = 0;
    var $chars = 0;
    var $parents = array();
    var $childs = array();
}

// Parses the XML file in 4 KB chunks
function xml_parse_from_file($parser, $file)
{
    if (!file_exists($file)) {
        die("Can't find file \"$file\".");
    }
    if (!($fp = @fopen($file, "r"))) {
        die("Can't open file \"$file\".");
    }
    while ($data = fread($fp, 4096)) {
        if (!xml_parse($parser, $data, feof($fp))) {
            return (false);
        }
    }
    fclose($fp);
    return (true);
}

// Output function (box form)
function print_box($title, $value)
{
    printf("\n+%'-60s+\n", "");
    printf("|%20s", "$title:");
    printf("%14s", $value);
    printf("%26s|\n", "");
    printf("+%'-60s+\n", "");
}

// Output function (line form)
function print_line($title, $value)
{
    printf("%20s", "$title:");
    printf("%15s\n", $value);
}

// Sort function: most frequently used elements first
function my_sort($a, $b)
{
    return (is_object($a) && is_object($b) ? $b->count - $a->count : 0);
}

function start_element($parser, $name, $attrs)
{
    global $elements, $stack;

    // Is the element already in the global $elements array?
    if (!isset($elements[$name])) {
        // No - add a class instance for the element
        $element = new element;
        $elements[$name] = $element;
    }
    // Increase this element's counter by one
    $elements[$name]->count++;

    // Is there a parent element?
    if (isset($stack[count($stack) - 1])) {
        // Yes - assign the parent element to $last_element
        $last_element = $stack[count($stack) - 1];
        // If the counter for the current element's parent is empty, initialize it to 0
        if (!isset($elements[$name]->parents[$last_element])) {
            $elements[$name]->parents[$last_element] = 0;
        }
        // Increase the counter for this element's parent by one
        $elements[$name]->parents[$last_element]++;
        // If the parent's counter for this child element is empty, initialize it to 0
        if (!isset($elements[$last_element]->childs[$name])) {
            $elements[$last_element]->childs[$name] = 0;
        }
        // Increase the parent's counter for this child element by one
        $elements[$last_element]->childs[$name]++;
    }
    // Push the current element onto the stack
    array_push($stack, $name);
}

function stop_element($parser, $name)
{
    global $stack;
    // Remove the top element from the stack
    array_pop($stack);
}

function char_data($parser, $data)
{
    global $elements, $stack, $depth;
    // Add to the character count of the current element
    $elements[$stack[count($stack) - 1]]->chars += strlen(trim($data));
}

// Create a parser instance
$parser = xml_parser_create();

// Set the handler functions
xml_set_element_handler($parser, "start_element", "stop_element");
xml_set_character_data_handler($parser, "char_data");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Parse the file
$ret = xml_parse_from_file($parser, $file);
if (!$ret) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

// Free the parser
xml_parser_free($parser);

// Remove the helper elements
unset($elements["current_element"]);
unset($elements["last_element"]);

// Sort by element count
uasort($elements, "my_sort");

// Loop over the element information in $elements
while (list($name, $element) =