Overview
XML is a set of rules for defining semantic tags that divide a document into parts and identify those parts. It is also a meta-markup language: it defines the syntax used to define other domain-specific, semantic, structured markup languages. XML is widely used today, and PHP includes functions for parsing XML documents. This article discusses how XML is used in PHP.
To introduce XML (Extensible Markup Language), let us first look at a piece of HTML code:
Structurally, the code above conforms to the rules of XML. XML can be understood as a tree structure that contains data:
1. When referring to the same element, use consistent case, such as
How to use Expat, PHP's XML parser?
Expat is the XML parser (also called an XML processor) used by the PHP scripting language. It lets a program access the structure and content of an XML document, and it is an event-based parser. There are two basic types of XML parsers:
Tree-based parser: converts the XML document into a tree structure. Such a parser analyzes the entire document and provides an API for accessing each element of the resulting tree. The common standard for this is DOM (Document Object Model).
Event-based parser: treats the XML document as a series of events. When a particular event occurs, the parser calls the function that the developer has provided to handle it. An event-based parser has a data-centric view of the XML document: it focuses on the data in the document rather than on its structure. These parsers process the document from beginning to end and report events such as the start of an element, the end of an element, and the start of character data to the application through callback functions.
The following is a "Hello World" XML document example:
An event-based parser reports this document as three events:
Start element: greeting
Start of CDATA, with the value: Hello World
End element: greeting
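The three events can be observed directly with PHP's XML functions. The following is a minimal sketch: the sample document is an assumption reconstructed from the event list (a single greeting element containing the text Hello World), and the handlers simply record what they are told.

```php
<?php
// Assumed sample document; the element name follows the event list above.
$xml = '<greeting>Hello World</greeting>';

$events = [];  // collected event descriptions, in the order they occur

$parser = xml_parser_create();
// Keep element names exactly as written (case folding is on by default).
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start element: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end element: $name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "character data: $data";
    }
);

xml_parse($parser, $xml, true);  // true marks the final chunk of input

foreach ($events as $event) {
    echo $event, "\n";
}
```

Note that the character data handler receives only the text, with no indication of which element it belongs to.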
An event-based parser does not build the structure of the document, although a complete tree structure can be generated in PHP from the events if necessary. For the CDATA item, an event-based parser does not give you the information that the parent element is greeting. However, it provides lower-level access, which uses resources more efficiently and is faster. There is no need to load the entire document into memory; in fact, the document can even be larger than the available memory.
Although the Hello World example above is a well-formed XML document, it is not valid, because it neither refers to a DTD (document type definition) nor contains an embedded one. Expat, however, is a non-validating parser, so it ignores any DTD associated with the document. Note that the document must still be well-formed; otherwise Expat (like any other parser that conforms to the XML standard) stops with an error message.
Compiling Expat
Expat can be compiled into PHP 3.0.6 or later. Starting with Apache 1.3.22, Expat has been part of Apache. On UNIX systems, PHP can be configured to use it through the --with-xml option.
If PHP is compiled as an Apache module, Expat is included as part of Apache by default. On Windows, you must load the XML dynamic link library.
XML Example: XMLStats
The example we will discuss uses Expat to collect statistics about an XML document.
For each element in the document, the following information is output:
- the number of times the element is used in the document
- the number of characters of data in the element
- the parent elements of the element
- the child elements of the element
Note: For demonstration purposes, we use PHP to build a structure that stores the parent and child elements of each element.
Which function is used to create an XML parser instance?
The function that creates an XML parser instance is xml_parser_create(). The instance is then passed to all the other functions. The idea is very similar to the connection handle used by the MySQL functions in PHP. With an event-based parser you usually have to register callback functions before parsing the document; they are called when specific events occur. Expat is no exception. It defines the following seven possible events:
Object                     Function that registers the handler         Description
Element                    xml_set_element_handler()                   start and end of an element
Character data             xml_set_character_data_handler()            start of character data
External entity            xml_set_external_entity_ref_handler()       an external entity reference occurs
Unparsed external entity   xml_set_unparsed_entity_decl_handler()      an unparsed external entity occurs
Processing instruction     xml_set_processing_instruction_handler()    a processing instruction occurs
Notation declaration       xml_set_notation_decl_handler()             a notation declaration occurs
Default                    xml_set_default_handler()                   other events for which no handler is specified
All callback functions must take the parser instance as their first parameter (followed by other parameters).
The example script at the end of this article uses both an element handler and a character data handler. The element callbacks are registered with xml_set_element_handler().
This function takes three parameters:
- the parser instance
- the name of the callback function that handles start elements
- the name of the callback function that handles end elements
The callback functions must exist when parsing of the XML document begins. They must be defined to match the prototypes described in the PHP manual.
For example, Expat passes three parameters to the handler function for the start element. In the example script it is defined as follows:
function start_element($parser, $name, $attrs)
$parser is the parser handle, $name is the name of the start element, and $attrs is an array containing all of the element's attributes and their values.
Once parsing of the XML document begins, Expat calls start_element() whenever it encounters a start element and passes these parameters to it.
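To make the three parameters concrete, here is a small sketch that registers a start handler with the prototype shown above and records what it receives. The end_element() handler, the $seen array, and the sample input are assumptions added for illustration.

```php
<?php
// Start handler with the three parameters Expat passes for a start element.
function start_element($parser, $name, $attrs)
{
    global $seen;
    $seen[] = [$name, $attrs];  // $attrs maps attribute names to values
}

// The end handler receives only the parser and the element name.
function end_element($parser, $name)
{
}

$seen = [];
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'start_element', 'end_element');

// Hypothetical input, used only for illustration.
xml_parse($parser, '<book id="42" lang="en"><title/></book>', true);

foreach ($seen as [$name, $attrs]) {
    echo $name, ': ', json_encode($attrs), "\n";
}
```

An element without attributes, such as title here, arrives with an empty array.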
XML's case folding option
Use the xml_parser_set_option() function to turn off the case folding option. This option is on by default, so the element names passed to the handler functions are automatically converted to uppercase. XML, however, is case-sensitive, so case matters a great deal when collecting statistics about an XML document. For our example, the case folding option must be turned off.
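The effect of the option is easy to see by parsing the same input twice. This is a sketch only; the helper name element_names() and the sample input are made up for the demonstration.

```php
<?php
// Parse a document and return the element names the start handler receives.
function element_names(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // nothing to do for end elements in this demonstration
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<Book><title/></Book>';  // hypothetical input

print_r(element_names($xml, true));   // names arrive converted to uppercase
print_r(element_names($xml, false));  // names arrive as written
```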
How to parse the document?
With all preparations complete, the script can finally parse the XML document:
If an error occurs, for example because the document is not well-formed, the parsing function returns false.
We can use the xml_get_error_code() function to get the numeric code of the last error. Passing this code to xml_error_string() returns the text of the error message. Printing the current line number of the XML document with xml_get_current_line_number() makes debugging easier.
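The parsing and error-reporting steps might look like the sketch below. It reads an open stream in chunks, so the document never has to be held in memory at once. The helper names parse_stream_or_error() and string_stream() and the 4 KB chunk size are assumptions, not part of the original script.

```php
<?php
// Parse an open stream in 4 KB chunks. Returns null on success or an
// error message on failure. The helper name is hypothetical.
function parse_stream_or_error($stream): ?string
{
    $parser = xml_parser_create();
    while (!feof($stream)) {
        $chunk = fread($stream, 4096);
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($stream))) {
            return sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            );
        }
    }
    return null;
}

// Build an in-memory stream from a string, for demonstration only.
function string_stream(string $contents)
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $contents);
    rewind($stream);
    return $stream;
}

var_dump(parse_stream_or_error(string_stream("<a>\n<b></b>\n</a>")));  // well-formed
var_dump(parse_stream_or_error(string_stream("<a>\n<b>\n</a>")));      // mismatched tag
```

The exact wording of the error message depends on the XML library PHP was built with.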
When parsing the document, one question about Expat needs emphasis: how do we keep a basic description of the document structure?
As mentioned earlier, an event-based parser does not produce any structural information by itself. The tag structure, however, is an important feature of XML. For example, the element sequence
To build a mirror of the document structure, the script needs to know the parent element of the current element. Expat's API cannot provide this: it reports only the current element's events, without any information about the surrounding context. You therefore need to build your own stack structure.
The example script uses a first-in, last-out (FILO) stack. The stack is an array that holds all the start elements. In the start element handler, the current element is pushed onto the top of the stack with array_push(). Correspondingly, the end element handler removes the top element with array_pop().
For the sequence
Start element book: assign "book" to the first element of the stack ($stack[0]).
Start element title: assign "title" to the top of the stack ($stack[1]).
End element title: remove the top element from the stack ($stack[1]).
End element book: remove the top element from the stack ($stack[0]).
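The push and pop steps can be sketched as follows. The $paths array and the sample document (in which title appears under two different parents) are assumptions added to show what the stack makes possible.

```php
<?php
$stack = [];  // names of the currently open elements, outermost first
$paths = [];  // full path of every element, recorded when it opens

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$stack, &$paths) {
        array_push($stack, $name);        // element opens: push its name
        $paths[] = implode('/', $stack);  // the stack now describes the context
    },
    function ($parser, $name) use (&$stack) {
        array_pop($stack);                // element closes: pop its name
    }
);

// Hypothetical document in which "title" appears under two parents.
xml_parse($parser, '<book><title/><figure><title/></figure></book>', true);

print_r($paths);
```

Once parsing has finished, the stack is empty again, because every start element has been matched by an end element.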
PHP 3.0 controlled element nesting manually through a $depth variable, which made the script look rather complicated. PHP 4.0 makes the script more concise with array_pop() and array_push().
How to collect element information in an XML document?
To collect information about each element, the script needs to remember the events for each element. All the different elements in the document are saved in a global array variable, $elements. Each array item is an instance of an element class with four properties (class variables):
$count - the number of times the element was found in the document
$chars - the number of characters of character data in the element
$parents - the parent elements
$childs - the child elements
Note: One feature of PHP is that you can traverse an entire class structure with a while (list() = each()) loop, just as you would traverse an array. All class variables (and, when you use PHP 3.0, the method names as well) are output as strings.
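On current PHP versions the each() function is no longer available (it was removed in PHP 8.0), so the same traversal is normally written with foreach. The sketch below assumes a class named Element with the four properties listed above; the class name and default values are illustrative.

```php
<?php
// Assumed shape of the element record described above.
class Element
{
    public $count = 0;
    public $chars = 0;
    public $parents = [];
    public $childs = [];
}

$element = new Element();
$element->count = 2;

$seen = [];
// foreach visits every accessible property as name => value.
foreach ($element as $property => $value) {
    $seen[] = $property;
    echo $property, ' = ', json_encode($value), "\n";
}
```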
When an element is found, we need to increment its record to keep track of how many times it appears in the document. That is, the count in the corresponding $elements item is increased by one.
We also want the parent element to know that the current element is one of its children, so the name of the current element is added to the parent element's $childs array. Finally, the current element should remember its parent, so the parent element is added to the current element's $parents array.
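The bookkeeping described above can be sketched in a few handlers. This version uses plain arrays instead of an element class to stay short, counts character data in bytes with strlen(), and parses a made-up document; all three are assumptions rather than the original script.

```php
<?php
$elements = [];  // element name => ['count', 'chars', 'parents', 'childs']
$stack = [];     // names of the currently open elements

$start = function ($parser, $name, $attrs) use (&$elements, &$stack) {
    if (!isset($elements[$name])) {
        $elements[$name] = ['count' => 0, 'chars' => 0, 'parents' => [], 'childs' => []];
    }
    $elements[$name]['count']++;                      // one more occurrence
    if ($stack) {
        $parent = end($stack);
        $elements[$parent]['childs'][$name] = true;   // parent learns its child
        $elements[$name]['parents'][$parent] = true;  // child remembers its parent
    }
    $stack[] = $name;
};
$end = function ($parser, $name) use (&$stack) {
    array_pop($stack);
};
$chars = function ($parser, $data) use (&$elements, &$stack) {
    // Character data belongs to the element on top of the stack.
    $elements[end($stack)]['chars'] += strlen($data);
};

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $start, $end);
xml_set_character_data_handler($parser, $chars);

// Hypothetical document, used only for illustration.
xml_parse($parser, '<book><title>PHP</title><figure><title>Fig</title></figure></book>', true);

foreach ($elements as $name => $info) {
    printf(
        "%s: count=%d chars=%d parents=[%s] childs=[%s]\n",
        $name,
        $info['count'],
        $info['chars'],
        implode(', ', array_keys($info['parents'])),
        implode(', ', array_keys($info['childs']))
    );
}
```

Using the names as array keys keeps each parent and child listed only once, however often the pair occurs.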
Showing the statistics
The remaining code displays the statistics held in the $elements array and its sub-arrays. It is the simplest kind of nested loop. It outputs the correct result, but the code is neither concise nor particularly clever; it is just the sort of loop you might write every day.
The example script is designed to be run from the command line with PHP in CGI mode, so the statistics are output as plain text. If you want to use the script on the web, you need to modify the output functions to generate HTML.
How to build a mini search engine with PHP and XML?
Let us first become familiar with the XML used by our program (saved as XYZ.xml).
<?xml version="1.0" encoding="GB2312"?>
It is quite simple. The root element is links, sub represents a category, and web holds the information for one website. A web element carries attributes: url is the link address and memo is a note.
The most important point is that this search engine is very easy to maintain: there is no need to write a program to maintain a cumbersome database. For example, to add a category or a web page, just edit the text file and add a <web>...</web> or <sub>...</sub> element. To move a category somewhere else, just copy that sub section.
The following is the simplest example of using PHP to display XML.
The program parses the XML, outputs it to the browser as a tree structure, and displays the total number of elements on each level.
<?php
$file = "demo.xml"; // XML file

function xml_parse_from_file($parser, $file)
{
    // Parse the XML file
}

function start_element($parser, $name, $attrs)
{
    // Executed when an opening tag such as <body> is encountered
}

function stop_element($parser, $name)
{
    // Executed when a closing tag such as </body> is encountered
}

function data($parser, $data)
{
    ...
}

function showcount()
{
    // Show the total number of elements on each level
}

global $level, $levelcount, $maxlevel;
$level = -1;
$parser = xml_parser_create(); // Create a parser instance
xml_set_element_handler($parser, "start_element", "stop_element"); // Set the handler functions
xml_set_character_data_handler($parser, "data");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
$ret = xml_parse_from_file($parser, $file); // Parse the file
if (!$ret) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}
xml_parser_free($parser); // Release the parser
showcount();
?>
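Since the handler bodies above are only outlined, here is a compact, runnable sketch of the same idea: print the element tree indented by depth and count the elements on each level. The inline document stands in for demo.xml and is made up, and closures replace the named handler functions to keep the example self-contained.

```php
<?php
$level = -1;       // depth of the current element; the root is level 0
$levelcount = [];  // level => number of elements seen on that level

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$level, &$levelcount) {
        $level++;
        $levelcount[$level] = ($levelcount[$level] ?? 0) + 1;
        echo str_repeat('  ', $level), $name, "\n";  // indent by depth
    },
    function ($parser, $name) use (&$level) {
        $level--;
    }
);

// Hypothetical stand-in for the contents of demo.xml.
$xml = '<links><web/><sub><web/><sub><web/></sub></sub></links>';
xml_parse($parser, $xml, true);

foreach ($levelcount as $depth => $count) {
    echo "level $depth: $count element(s)\n";
}
```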
Building on the program above, a single subtree can be displayed. We locate it by its level number and by its position within that level.
For example:
links (0,1)
  |---- web (1,1)
  |---- sub (1,2)
          |---- web (2,1)
          |---- sub (2,2)
                  |---- web (3,1)
                  |---- sub (3,2)
  :
  :
This is needed in order to display a subcategory (such as Program Design -> PHP ->).
<?php
...
function start_element($parser, $name, $attrs)
{
    global $level, $levelcount, $maxlevel, $hide, $lev, $num, $PHP_SELF;
    $level += 1;
    if ($level > $maxlevel) $maxlevel = $level;
    $levelcount[$level] += 1;

    if ($hide) { // Check whether we are inside the subtree; $hide == false means we are
        if ($level == $lev && ...) $hide = false;
    } else {
        if ($level <= $lev) $hide = true;
    }

    if (!$hide) {
        ... // Output
    }
}

function data($parser, $data)
{
    global $level, $hide;
    if (!$hide) {
        if (trim($data) != "") {
            echo trim($data);
        }
    }
}
...
global $hide, $lev, $num, $PHP_SELF;
$level = -1;
$hide = true;
echo "... root</p>";
if ($lev == "") {
    $lev = 0;
    $num = 1;
}
...
?>
How is the mini search engine built?
That was a lot of groundwork. Now let us look at the main files of our search engine.
The first part imitates the browse-by-category lookup of Sina and Yahoo.
xml3.php
Keyword matching uses the eregi() function; we assume that the input text will not cause errors.
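The eregi() function was removed in PHP 7.0, so on current versions a case-insensitive match has to be written differently. The sketch below swaps in preg_match() with the i modifier, and uses preg_quote() so that the input text is not interpreted as a pattern. The function name search_links(), the name attribute on sub, and the sample data are assumptions; only the links, sub, and web elements and the url and memo attributes come from the description of the link file.

```php
<?php
// Return the url of every <web> element whose attributes contain $keyword
// (case-insensitive). The sample data further down is made up.
function search_links(string $xml, string $keyword): array
{
    $hits = [];
    // preg_quote() keeps user input from being read as a pattern.
    $pattern = '/' . preg_quote($keyword, '/') . '/i';

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$hits, $pattern) {
            if ($name === 'web' && preg_match($pattern, implode(' ', $attrs))) {
                $hits[] = $attrs['url'] ?? '';
            }
        },
        function ($parser, $name) {
            // end elements need no handling for a search
        }
    );
    xml_parse($parser, $xml, true);
    return $hits;
}

$xml = '<links><sub name="Programming">'
     . '<web url="http://example.com/php" memo="PHP tutorials"/>'
     . '<web url="http://example.com/c" memo="C reference"/>'
     . '</sub></links>';

print_r(search_links($xml, 'php'));
```

Because the search runs inside the start element handler, the link file is scanned in a single pass and never has to be loaded as a tree.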