Parsing XML Text with MSXML

xiaoxiao2021-03-06  45

I. Introduction

HTML, the markup language in widest use today, is a tag language rather than a programming language. Most of its tags describe how content is displayed, not the structure of the document content itself. In other words, a machine cannot parse the meaning of that content, and this is why XML was created.

XML (Extensible Markup Language) is a subset of SGML. It retains the main functionality of SGML while greatly reducing its complexity. XML is designed to represent not only the content of a document but also its structure, so that the document can be understood by machines as well as by people. XML imposes strict rules. An XML parser is far more exacting about syntax and structure than an HTML browser: an XML page must be written with correct syntax and structure, whereas an HTML browser guesses at what a malformed document was meant to contain in order to display it. This strictness makes XML parsers easier to build, in terms of both performance and stability. The result of parsing an XML document is consistent, whereas different browsers may parse and display the same HTML differently. Because a parser does not spend time repairing incomplete documents, it can also do its work more efficiently than a comparable HTML parser. It can concentrate on building a tree from the tree structure already present in the document, without having to infer structure from mixed content in the information stream.

The XML standard is aimed at applications that process data, not just at web pages. Any kind of application can be built on top of a parser, and the browser is only one small component of the XML world. Browsing is still extremely important, because it gives people who work with XML a friendly tool for reading information, but for a larger project it is just a display window. Because XML has a strict syntax, it can even be used to define an application-layer communication protocol; the Internet Open Trading Protocol, for example, is defined in XML. In principle, some of the protocols and formats defined in BNF can also be defined in XML. With enough patience, one could even use XML to define a C language specification.

XML leaves much of the freedom that HTML allows, but it is far stricter about rules. XML has three main elements: the DTD (Document Type Definition) or XML Schema, XSL (Extensible Stylesheet Language), and XLink (Extensible Linking Language). The DTD and XML Schema specify the logical structure of an XML file, defining the elements in the file, the attributes of those elements, and the relationships between elements and their attributes. Namespaces unify the representation and integration of XML document data. XSL is a language for specifying how an XML document is rendered. It keeps data and presentation independent of each other, so a web browser can use XSL to change how a document is presented, for example the display order of the data, without communicating with the server again. By changing the style sheet, the same document can be displayed larger, collapsed to show only one level, or converted to another format. XLink will further extend the simple links of the current Web.

II. Implementing XML Parsing

In theory we could write our own XML parser from the XML format specification, but Microsoft already provides one. If you have installed IE 5.0 or later, you have already installed an XML parser. You can download the latest MSXML SDK and parser files from the Microsoft site (www.microsoft.com). The parser is a dynamic link library called msxml.dll, and the latest version is MSXML3. It is a COM object library that encapsulates all the objects required for XML parsing. Because COM objects are reusable components in binary form that are independent of programming language, you can call the parser from any language (such as VB, VC, Delphi, C++ Builder, or even scripting languages) and parse XML documents in your own application. The description of the XML Document Object Model below is based on Microsoft's latest MSXML3.

III. The XML Document Object Model (XML DOM)

The XML DOM provides a standard way to work with the information stored in an XML document. The DOM application programming interface (API) serves as a bridge between applications and XML documents, and the DOM can be regarded as a standard structure for connecting documents to applications (or scripts). The MSXML parser lets you load and create a document, collect the document's error information, obtain all the information and structure in the document, and save the document to an XML file. The DOM gives users an interface to load, access, manipulate, and serialize XML documents. It provides a complete in-memory representation of the XML document, with random access to the entire document. The DOM lets an application work with the information in an XML document according to the logical structure provided by the MSXML parser.

XML is manipulated through the interfaces that MSXML provides. The MSXML parser reads an XML document and builds a DOM tree from it, creating a logical structure of nodes based on the document's content. The document itself is treated as a node that contains all the other nodes. A DOM user can therefore view the document as a structured tree of information rather than a simple text stream, and an application or script can work with that structure easily even without knowing the syntactic details of XML. The DOM contains two key abstractions: a tree hierarchy, and a collection of nodes used to represent the document's content and structure. The tree hierarchy includes all of these nodes, and a node can itself contain other nodes. The benefit for developers is that they can find and modify the information of a particular node through this hierarchy. The DOM treats a node as an ordinary object, so it is possible to write a script that loads a document, traverses all the nodes, and displays the information of the nodes of interest. Note that nodes come in many specific types: elements, attributes, and text can all be considered nodes.
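The tree-of-nodes idea described above can be sketched in portable C++. This is only an illustrative model, not the MSXML API: the Node struct, the NodeType enum, and the functions countNodes and buildSampleTree are all hypothetical names introduced for this sketch. It shows a document node containing element nodes, which in turn contain text nodes, and a depth-first traversal over them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node model; it is not the MSXML IXMLDOMNode interface.
enum class NodeType { Document, Element, Attribute, Text };

struct Node {
    NodeType type;
    std::string name;   // element or attribute name; empty for text nodes
    std::string value;  // text or attribute value; empty for elements
    std::vector<std::unique_ptr<Node>> children;

    Node(NodeType t, std::string n, std::string v = "")
        : type(t), name(std::move(n)), value(std::move(v)) {}

    // Append a child and return a raw pointer to it for further building.
    Node* append(NodeType t, const std::string& n, const std::string& v = "") {
        // In this model only the document and elements may contain other nodes.
        assert(type == NodeType::Document || type == NodeType::Element);
        children.push_back(std::make_unique<Node>(t, n, v));
        return children.back().get();
    }
};

// Depth-first traversal: count the nodes of one type in the subtree rooted at node.
int countNodes(const Node& node, NodeType wanted) {
    int total = (node.type == wanted) ? 1 : 0;
    for (const auto& child : node.children) {
        total += countNodes(*child, wanted);
    }
    return total;
}

// Build the tree for <book><title>XML</title><price>10</price></book>.
std::unique_ptr<Node> buildSampleTree() {
    auto doc = std::make_unique<Node>(NodeType::Document, "#document");
    Node* book = doc->append(NodeType::Element, "book");
    book->append(NodeType::Element, "title")->append(NodeType::Text, "", "XML");
    book->append(NodeType::Element, "price")->append(NodeType::Text, "", "10");
    return doc;
}
```

Calling countNodes(*buildSampleTree(), NodeType::Element) visits the whole tree from the document node down, which is the same pattern a script would use to walk a parsed document and pick out the nodes it cares about.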

Microsoft's MSXML parser reads an XML document and parses its content into abstract information containers called nodes. These nodes represent the structure and content of the document, and they allow an application to read and manipulate the information in the document without knowing the syntax of XML. Once a document has been parsed, its nodes can be browsed at any time and in any order. For developers, the most important programming object is DOMDocument. The DOMDocument object lets you browse, query, and modify the content and structure of an XML document, and every object below it exposes properties and methods for obtaining information about the object instance, reading its value, and navigating to other objects in the tree. The main COM interfaces provided by msxml.dll are:

(1) DOMDocument: The DOMDocument object is the foundation of the XML DOM. The properties and methods it exposes let you browse, query, and modify the content and structure of an XML document. DOMDocument represents the top-level node of the tree. It implements all the basic methods of a DOM document and provides additional member functions to support XSL and XSLT. Once a document object has been created, all other objects can be obtained or created from it.

(2) IXMLDOMNode: IXMLDOMNode is the basic object of the Document Object Model (DOM). Elements, attributes, comments, processing instructions, and other document components can all be treated as IXMLDOMNode objects. In fact, the DOMDocument object itself is also an IXMLDOMNode object.

(3) IXMLDOMNodeList: IXMLDOMNodeList is a collection of nodes. Additions, deletions, and changes to nodes are reflected in the collection immediately, and all the nodes can be traversed with a "for ... next" construct.

(4) IXMLDOMParseError: The IXMLDOMParseError interface returns detailed information about errors encountered during parsing, including the error number, line number, character position, and a text description.

The following mainly describes how a DOMDocument object is created, using VC to show the process of creating a document object.

#include <msxml2.h>  // MSXML SDK header declaring IXMLDOMDocument and CLSID_DOMDocument

HRESULT hr;
IXMLDOMDocument *pXMLDoc = NULL;
IXMLDOMNode *pXDN = NULL;

hr = CoInitialize(NULL);  // COM initialization

// Get the IXMLDOMDocument interface pointer pXMLDoc.
hr = CoCreateInstance(CLSID_DOMDocument, NULL, CLSCTX_INPROC_SERVER,
                      IID_IXMLDOMDocument, (void **)&pXMLDoc);

// Get the IXMLDOMNode interface pointer pXDN from the document object.
hr = pXMLDoc->QueryInterface(IID_IXMLDOMNode, (void **)&pXDN);

When working with the MSXML parser, we can use the document's createElement method to create a node, and we can load and save XML files. The load method loads an XML document from a specified URL or file, and the loadXML method loads one from a string of XML text. The load (loadXML) method has two parameters: the first, xmlSource, identifies the document to be parsed, and the second, isSuccessful, reports whether the document was loaded successfully. The save method saves the document to a specified location. It has one parameter, destination, which indicates the object to save to: a file, an ASP Response object, an XML document object, or a custom object that supports persistence. For a simple example of the save method, see "Parsing XML text with MSXML" at http://www.swm.com.cn/swm/200101/.

During parsing we also need to get and set the parse flags. With different flags, an XML document can be parsed in different ways. The XML standard allows a parser to be validating or non-validating, and it allows the parsing process to skip the retrieval of external resources. You can also set a flag to indicate whether extra white space should be removed from the document. To support this, the DOMDocument object exposes the following properties, which let users change the parser's behavior at run time:

(1) async (in C++, the two methods get_async and put_async)

(2) validateOnParse (in C++, the two methods get_validateOnParse and put_validateOnParse)

(3) resolveExternals (in C++, the two methods get_resolveExternals and put_resolveExternals)

(4) preserveWhiteSpace (in C++, the two methods get_preserveWhiteSpace and put_preserveWhiteSpace)

Each property accepts or returns a Boolean value. By default, async, validateOnParse, and resolveExternals are true. The value of preserveWhiteSpace depends on the settings of the XML document, and it is false by default.

Some information about the document can also be collected while it is being parsed. The following properties are available during parsing:

(1) doctype: The DTD used to define the document format. If the XML document has no associated DTD, this property returns NULL.

(2) implementation: Represents the implementation of the document. In practice it is used to find out which version of XML is supported.

(3) parseError: The error that occurred during parsing.

(4) readyState: The status of the XML document. readyState plays an important role when the Microsoft XML parser is used asynchronously to improve performance. While an XML document is loading, your program may need to check the state of the parse. MSXML provides four states: loading, loaded, parsing (interactive), and completed.

(5) url (Uniform Resource Locator): The URL of the XML document being loaded and parsed. Note that if the document was built in memory, this property returns a null value.

Once we have the document tree, we can manipulate each node in it. Nodes in the tree can be obtained through two methods, nodeFromID and getElementsByTagName. nodeFromID has two parameters: the first, idString, is the ID value, and the second, node, returns an interface pointer to the node that matches that ID. Note that the XML specification requires every ID value in an XML document to be unique, and an element can be associated with only one ID. The getElementsByTagName method also has two parameters. The first, tagName, is the name of the elements to find; if tagName is "*", all elements in the document are returned. The second, resultList, is a pointer to an IXMLDOMNodeList interface and is used to return the collection of all nodes that match the tag name.
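The two lookups just described can be illustrated with a small portable C++ sketch. It is not MSXML code: the Element struct and the functions elementsByTagName, nodeFromId, and buildCatalog are hypothetical stand-ins, and the sketch assumes that the attribute named "id" is the document's ID attribute (in real XML the ID attribute is whatever the DTD declares with type ID).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type for illustration; not the MSXML interfaces.
struct Element {
    std::string tag;
    std::map<std::string, std::string> attributes;
    std::vector<std::unique_ptr<Element>> children;

    explicit Element(std::string t) : tag(std::move(t)) {}

    // Append a child element and return a raw pointer to it.
    Element* add(const std::string& childTag) {
        children.push_back(std::make_unique<Element>(childTag));
        return children.back().get();
    }
};

// Collect the descendants of root, in document order, whose tag matches tagName.
// The wildcard "*" matches every element.
void elementsByTagName(const Element& root, const std::string& tagName,
                       std::vector<const Element*>& resultList) {
    for (const auto& child : root.children) {
        if (tagName == "*" || child->tag == tagName) {
            resultList.push_back(child.get());
        }
        elementsByTagName(*child, tagName, resultList);
    }
}

// Return the first element whose "id" attribute equals idString, or nullptr.
// Assumption of this sketch: the attribute named "id" carries the ID value.
const Element* nodeFromId(const Element& root, const std::string& idString) {
    auto it = root.attributes.find("id");
    if (it != root.attributes.end() && it->second == idString) {
        return &root;
    }
    for (const auto& child : root.children) {
        if (const Element* hit = nodeFromId(*child, idString)) {
            return hit;
        }
    }
    return nullptr;
}

// Build <catalog><book id="b1"><title/></book><book id="b2"><title/></book></catalog>.
std::unique_ptr<Element> buildCatalog() {
    auto catalog = std::make_unique<Element>("catalog");
    Element* first = catalog->add("book");
    first->attributes["id"] = "b1";
    first->add("title");
    Element* second = catalog->add("book");
    second->attributes["id"] = "b2";
    second->add("title");
    return catalog;
}
```

As in the description above, the tag-name lookup fills a result list supplied by the caller, while the ID lookup yields at most one node because ID values are unique within a document.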

For a simple example, see the program listing in "Parsing XML text with MSXML" at http://www.swm.com.cn/swm/200101/. Finally, let us look at how to create a new node, which is done with the createNode method. createNode has four parameters: the first, type, is the type of node to create; the second, name, is the nodeName value of the new node; the third, namespaceURI, is the namespace associated with the node; and the fourth, node, returns the newly created node. Note that a node is created from the supplied type and name (nodeName) values. When a node is created, it is created within the scope of a namespace, if one is provided. If no namespace is provided, it is created within the namespace of the document.

IV. A Simple Example of Parsing an XML Document with MSXML

To illustrate how to use the XML DOM in VC, a simple example program is provided as a console application (see "Parsing XML text with MSXML" at http://www.swm.com.cn/swm/200101/). Its main code locates a particular node in an XML document and inserts a new child node under it.
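Since the original listing is not reproduced here, the locate-and-insert step can at least be sketched in portable C++. This is not the author's program and it does not call MSXML: the XmlElement struct and the functions findFirst, appendChild, toXml, and insertPrice are hypothetical, as are the element names "book" and "price". The sketch finds a node by tag name, appends a new child to it, and serializes the result.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy element; a real MSXML program would use IXMLDOMNode instead.
struct XmlElement {
    std::string tag;
    std::string text;
    std::vector<std::unique_ptr<XmlElement>> children;

    XmlElement(std::string t, std::string x = "")
        : tag(std::move(t)), text(std::move(x)) {}
};

// Depth-first search for the first element with the given tag (root included).
XmlElement* findFirst(XmlElement& root, const std::string& tag) {
    if (root.tag == tag) {
        return &root;
    }
    for (auto& child : root.children) {
        if (XmlElement* hit = findFirst(*child, tag)) {
            return hit;
        }
    }
    return nullptr;
}

// Create a new element and append it as the last child of parent.
XmlElement* appendChild(XmlElement& parent, const std::string& tag,
                        const std::string& text) {
    parent.children.push_back(std::make_unique<XmlElement>(tag, text));
    return parent.children.back().get();
}

// Serialize the subtree to XML text (no escaping, attributes, or indentation).
std::string toXml(const XmlElement& e) {
    std::string out = "<" + e.tag + ">" + e.text;
    for (const auto& child : e.children) {
        out += toXml(*child);
    }
    return out + "</" + e.tag + ">";
}

// Locate the first <book> under root and insert <price>...</price> into it.
// Returns false when no <book> element exists.
bool insertPrice(XmlElement& root, const std::string& price) {
    XmlElement* book = findFirst(root, "book");
    if (book == nullptr) {
        return false;
    }
    appendChild(*book, "price", price);
    return true;
}
```

With MSXML the same three steps would go through the document object: look the node up, create the new node from the document, and append it to the node that was found.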

Summary

Because XML has much stricter syntax requirements than HTML, using and writing an XML parser is much easier than writing an HTML parser. In addition, an XML document marks up not only how a document is displayed but, more importantly, the structure of the document and the nature of its information. This makes it convenient to obtain the information of a particular node through an XML parser and then display or modify it, which simplifies the user's work in operating on and maintaining XML documents. It is also worth noting that XML is an open architecture that does not depend on any one company, so XML-based applications can expect support on most software development platforms. Major vendors such as Microsoft have also set their sights on systems built on XML and COM: Microsoft's Office suite, web server, browser, and database product (SQL Server) have all begun to support XML-based applications. Using XML to customize the front end of an application and COM to implement the specific business objects and database objects gives a system greater flexibility, scalability, and maintainability.

Appendix: a more complete XML tutorial

http://www.rr365.net/edu-xml/xml.htm

When reposting, please cite the original address: https://www.9cbs.com/read-80329.html
