Parsing XML Documents with MSXML
The XML DOM (Document Object Model) provides a standard way to manipulate the information stored in an XML document through the DOM application programming interface (API). It acts as a bridge between the application and the XML document. The DOM rests on two key abstractions: a tree hierarchy, and a set of nodes that represent the document's content and structure. The tree hierarchy contains all of the nodes, and each node can in turn contain other nodes. The advantage of this hierarchy is that it lets an application locate and modify the information held in a particular node.
Microsoft's MSXML parser reads an XML document and parses its content into abstract information containers called nodes. These nodes represent the structure and content of the document and let the application work with the information in the document without having to know the details of XML itself. Once a document has been parsed, its nodes can be visited at any time and in any order.
For developers, the most important programming object is DOMDocument. Through its exposed properties and methods, a DOMDocument object lets you browse, query, and modify the content and structure of an XML document.
This article introduces the structure and use of the DOM and gives an example, written in Visual C++ (VC), of parsing XML with MSXML.
Structure and use of the DOMDocument object
Creating a document object
#include <windows.h>
#include <msxml2.h>

HRESULT hr;
IXMLDOMDocument *pXMLDoc = NULL;
IXMLDOMNode *pXDN = NULL;
// Initialize COM
hr = CoInitialize(NULL);
/* Get a pointer, pXMLDoc, to the
   IXMLDOMDocument interface */
hr = CoCreateInstance(CLSID_DOMDocument, NULL,
                      CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument,
                      (void **)&pXMLDoc);
// Get a pointer, pXDN, to the IXMLDOMNode interface
hr = pXMLDoc->QueryInterface(IID_IXMLDOMNode, (void **)&pXDN);
When using the MSXML parser, we can build nodes with the document's createElement method and then save the result to an XML file, or we can load an existing document: the load method reads an XML document from a specified URL or file, while loadXML parses XML supplied as a string. Each method takes two parameters: the first (xmlSource for load, bstrXML for loadXML) specifies the XML to be parsed, and the second, isSuccessful, returns whether the document was loaded successfully.
Saving a document object
The save method saves the document to a specified destination. Its destination parameter determines what the document is saved to: a file, an ASP Response object, another XML document object, or a custom object that supports persistence. Below is part of an example program that uses the save method:
#include <comdef.h>   // _variant_t
#include <tchar.h>    // _T

BOOL DOMDocSaveLocation()
{
    BOOL bResult = FALSE;
    IXMLDOMDocument *pIXMLDOMDocument = NULL;
    HRESULT hr;
    try
    {
        _variant_t varString = _T("d:\\sample.xml");
        /* The code that creates the DOMDocument
           object and loads the XML document is omitted here */
        // Save the document to d:\sample.xml
        hr = pIXMLDOMDocument->save(varString);
        if (SUCCEEDED(hr))
            bResult = TRUE;
    }
    catch (...)
    {
        DisplayErrorToUser();
        /* The code that releases the IXMLDOMDocument
           interface reference is omitted here */
    }
    return bResult;
}
Setting parsing flags
During parsing we need to get and set parsing flags; different flags cause an XML document to be parsed in different ways. The XML standard allows a parser either to validate or not to validate a document, allows parsing to skip the retrieval of external resources, and also allows a flag indicating whether superfluous white space should be removed from the document. The DOMDocument object exposes the following properties, which let users change the parser's behavior at run time:
1. async property: get_async and put_async.
2. validateOnParse property: get_validateOnParse and put_validateOnParse.
3. resolveExternals property: get_resolveExternals and put_resolveExternals.
4. preserveWhiteSpace property: get_preserveWhiteSpace and put_preserveWhiteSpace.
Each property accepts or returns a Boolean value. By default, async, validateOnParse, and resolveExternals are all true. The value of preserveWhiteSpace depends on the settings in the XML document; if the document does not specify otherwise, its value is false.
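The effect of preserveWhiteSpace is easiest to see on the whitespace-only text between tags. The following is a simplified model in standard C++, not MSXML code: ParserFlags and textNodesFor are hypothetical names, the defaults simply restate the paragraph above, and the only rule modeled is that whitespace-only text chunks are dropped when preserveWhiteSpace is false (the real parser applies more detailed rules).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical mirror of the four DOMDocument parser flags.
// This is a plain struct for illustration, not an MSXML interface.
struct ParserFlags {
    bool async = true;               // default true, per the text above
    bool validateOnParse = true;     // default true
    bool resolveExternals = true;    // default true
    bool preserveWhiteSpace = false; // false unless the document says otherwise
};

// True if the string consists only of whitespace characters.
bool isWhitespaceOnly(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isspace(c) != 0;
    });
}

// Returns the text chunks that would become text nodes in this model:
// with preserveWhiteSpace == false, whitespace-only chunks (such as the
// indentation between tags) are dropped; otherwise every chunk is kept.
std::vector<std::string> textNodesFor(const std::vector<std::string>& chunks,
                                      const ParserFlags& flags) {
    std::vector<std::string> kept;
    for (const auto& chunk : chunks) {
        if (flags.preserveWhiteSpace || !isWhitespaceOnly(chunk)) {
            kept.push_back(chunk);
        }
    }
    return kept;
}
```

With the default flags, the chunks {"\n  ", "Hello", "\n"} yield a single text node, "Hello"; switching preserveWhiteSpace to true keeps all three.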
The following information can be obtained while a document is being parsed:
1. doctype: the document type declaration (DTD) that defines the document's format. If the XML document has no associated DTD, it returns NULL.
2. implementation: the implementation object for the document, which indicates the XML version supported by the current document.
3. parseError: information about any error that occurred during parsing.
4. readyState: the loading status of the XML document. readyState matters when Microsoft's XML parser is used asynchronously to improve performance: while an XML document loads asynchronously, the program may need to check how far parsing has progressed. MSXML defines four states: loading, loaded, interactive, and completed.
5. url: the URL (Uniform Resource Locator) of the XML document being loaded and parsed. If the document was created in memory, this property returns NULL.
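The four readyState values can be wrapped in a small helper. This sketch assumes the numeric values documented for MSXML (1 = loading, 2 = loaded, 3 = interactive, 4 = completed); ReadyState, readyStateName and isFullyLoaded are hypothetical names, and a real program would obtain the raw number from the document's get_readyState method.

```cpp
#include <cassert>
#include <string>

// Local stand-in for the MSXML readyState values; not a type from the MSXML headers.
enum class ReadyState : long {
    Loading = 1,      // load in progress, data not yet parsed
    Loaded = 2,       // data read and parsed, object model not yet available
    Interactive = 3,  // partial object model available (read-only)
    Completed = 4     // loading finished, successfully or not
};

// Human-readable name for a raw readyState value.
std::string readyStateName(long value) {
    switch (value) {
        case 1: return "loading";
        case 2: return "loaded";
        case 3: return "interactive";
        case 4: return "completed";
        default: return "unknown";
    }
}

// Only the completed state means the whole document has been processed.
bool isFullyLoaded(long value) {
    return value == static_cast<long>(ReadyState::Completed);
}
```

Because "completed" also covers a failed load, a program that sees this state should still inspect parseError before using the tree.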
Node operations
Once we have the document tree, we can operate on any node in it. There are generally two ways to get at the nodes in the tree: nodeFromID and getElementsByTagName.
nodeFromID takes two parameters: the first, idString, is the ID value to look for; the second, node, returns an interface pointer to the node whose ID matches. According to the XML specification, each ID value within an XML document must be unique, and an element can have at most one ID.
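The uniqueness rule for IDs can be modeled with an ordinary map. The sketch below is standard C++ rather than MSXML, and Element and IdIndex are hypothetical names. It rejects a second element that claims an ID already in use and returns nullptr when no element carries the requested ID. In a real document, only attributes declared with the ID type (for example in a DTD) take part in an ID lookup.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical element record: a tag name plus at most one ID.
struct Element {
    std::string tag;
    std::string id;  // empty string means "no ID"
};

// Toy ID index illustrating the nodeFromID contract: each ID maps to exactly one element.
class IdIndex {
public:
    // Registers an element; returns false (leaving the index unchanged)
    // if another element already uses the same ID.
    bool add(const Element* e) {
        if (e->id.empty()) return true;  // elements without an ID are not indexed
        return byId_.emplace(e->id, e).second;
    }

    // Returns the element with this ID, or nullptr if no element has it.
    const Element* nodeFromId(const std::string& id) const {
        auto it = byId_.find(id);
        return it == byId_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, const Element*> byId_;
};
```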
getElementsByTagName takes two parameters. The first, tagName, is the name of the elements to find; if tagName is "*", all elements in the document are returned. The second, resultList, is a pointer to an IXMLDOMNodeList interface pointer and returns the collection of all nodes whose name matches tagName. Below is part of a related example program:
IXMLDOMDocument *pIXMLDOMDocument = NULL;
_bstr_t bstrFindText(L"author");   // _bstr_t comes from <comdef.h>
IXMLDOMNodeList *pIDOMNodeList = NULL;
IXMLDOMNode *pIDOMNode = NULL;
long value = 0;
BSTR bstrItemText = NULL;
HRESULT hr;
try
{
    /* The code that creates the DOMDocument object
       and loads a specific document is omitted here */
    /* The following code gets the collection of all
       nodes whose tag name is "author" */
    // Check that a pointer to IXMLDOMNodeList was obtained
    hr = pIXMLDOMDocument->getElementsByTagName(bstrFindText, &pIDOMNodeList);
    if (FAILED(hr))
        throw hr;
    // Get the number of nodes in the list
    hr = pIDOMNodeList->get_length(&value);
    if (SUCCEEDED(hr))
    {
        pIDOMNodeList->reset();
        for (long ii = 0; ii < value; ii++)
        {
            // Get a specific node
            pIDOMNodeList->get_item(ii, &pIDOMNode);
            if (pIDOMNode)
            {
                // Get the text associated with this node and show it
                pIDOMNode->get_text(&bstrItemText);
                ::MessageBoxW(NULL, bstrItemText, bstrFindText, MB_OK);
                ::SysFreeString(bstrItemText);
                bstrItemText = NULL;
                pIDOMNode->Release();
                pIDOMNode = NULL;
            }
        }
    }
    pIDOMNodeList->Release();
    pIDOMNodeList = NULL;
}
catch (...)
{
    if (pIDOMNodeList)
        pIDOMNodeList->Release();
    if (pIDOMNode)
        pIDOMNode->Release();
    DisplayErrorToUser();
}
You can create a new node with the createNode method. createNode takes four parameters: the first, type, specifies the type of node to create; the second, name, gives the nodeName of the new node; the third, namespaceURI, is the namespace associated with the node; the fourth, node, returns the newly created node. A new node is thus created from a type, a name, and a namespace URI. When a node is created, it is created within the scope of the given namespace, if one is provided; if no namespace is provided, it is created within the namespace of the document.
Parsing XML
To show how to use the XML DOM in VC, we now present a simple console application. Its main code, shown below, locates a particular node in an XML document and inserts a new child node under it.
#include <atlbase.h>
/* The following .h file is the one installed with the
   latest XML Parser SDK */
#include "C:/Program Files/Microsoft XML Parser SDK/inc/msxml2.h"
#include <iostream>

int main()
{
    // Initialize COM
    ::CoInitialize(NULL);
    /* The program assumes that the XML file it loads is named
       xmldata.xml and sits in the same directory as the executable.
       The file is structured as follows:
         <?xml version="1.0"?>
         <xmldata>
           <xmlnode></xmlnode>
           <xmltext>...</xmltext>
         </xmldata>
       The program looks for the node named "xmlnode" and inserts a
       node named "xmlchildnode" under it; it then finds the node
       named "xmltext", extracts the text it contains and displays it;
       finally it saves the modified XML document as "updatedxml.xml". */
    try
    {
        // Create a parser instance through a smart pointer
        CComPtr<IXMLDOMDocument> spXMLDOM;
        HRESULT hr = spXMLDOM.CoCreateInstance(__uuidof(DOMDocument));
        if (FAILED(hr))
            throw "Unable to create the XML parser object";
        if (spXMLDOM.p == NULL)
            throw "Unable to create the XML parser object";
        // Creation succeeded; load the XML document
        VARIANT_BOOL bSuccess = VARIANT_FALSE;
        hr = spXMLDOM->load(CComVariant(L"xmldata.xml"), &bSuccess);
        if (FAILED(hr))
            throw "Unable to load the XML document into the parser";
        if (!bSuccess)
            throw "Unable to load the XML document into the parser";
        // Search for "xmldata/xmlnode"
        CComBSTR bstrSS(L"xmldata/xmlnode");
        CComPtr<IXMLDOMNode> spXMLNode;
        /* Locate the node with the selectSingleNode
           method of the IXMLDOMDocument interface */
        hr = spXMLDOM->selectSingleNode(bstrSS, &spXMLNode);
        if (FAILED(hr))
            throw "Unable to locate the 'xmlnode' XML node";
        if (spXMLNode.p == NULL)
            throw "Unable to locate the 'xmlnode' XML node";
        /* spXMLNode now holds the <xmlnode> element,
           so we can create a child node under it */
        CComPtr<IXMLDOMNode> spXMLChildNode;
        /* Create a new node with the createNode
           method of the IXMLDOMDocument interface */
        hr = spXMLDOM->createNode(CComVariant(NODE_ELEMENT),
                                  CComBSTR(L"xmlchildnode"),
                                  NULL,
                                  &spXMLChildNode);
        if (FAILED(hr))
            throw "Unable to create the 'xmlchildnode' node";
        if (spXMLChildNode.p == NULL)
            throw "Unable to create the 'xmlchildnode' node";
        // Append the new node under spXMLNode
        CComPtr<IXMLDOMNode> spInsertedNode;
        hr = spXMLNode->appendChild(spXMLChildNode, &spInsertedNode);
        if (FAILED(hr))
            throw "Unable to insert the 'xmlchildnode' node";
        if (spInsertedNode.p == NULL)
            throw "Unable to insert the 'xmlchildnode' node";
        // Set an attribute on the new node through its element interface
        CComQIPtr<IXMLDOMElement> spXMLChildElement(spInsertedNode.p);
        if (spXMLChildElement.p == NULL)
            throw "Unable to query the XML element interface of 'xmlchildnode'";
        hr = spXMLChildElement->setAttribute(CComBSTR(L"xml"), CComVariant(L"fun"));
        if (FAILED(hr))
            throw "Unable to insert the new attribute";
        /* The following block finds a node
           and displays information about it */
        // Release the previous node first: CComPtr's operator& requires an empty pointer
        spXMLNode = NULL;
        // Find the "xmldata/xmltext" node
        bstrSS = L"xmldata/xmltext";
        hr = spXMLDOM->selectSingleNode(bstrSS, &spXMLNode);
        if (FAILED(hr))
            throw "Unable to locate the 'xmltext' node";
        if (spXMLNode.p == NULL)
            throw "Unable to locate the 'xmltext' node";
        // Get the text the node contains and display it
        CComVariant varValue(VT_EMPTY);
        hr = spXMLNode->get_nodeTypedValue(&varValue);
        if (FAILED(hr))
            throw "Unable to extract the 'xmltext' text";
        if (varValue.vt == VT_BSTR)
        {
            /* Display the result. Note that the string must
               be converted from BSTR to ANSI */
            USES_CONVERSION;
            LPSTR lpstrMsg = W2A(varValue.bstrVal);
            std::cout << lpstrMsg << std::endl;
        }
        else
        {
            // The value is not a string, so report an error
            throw "Unable to extract the 'xmltext' text";
        }
        // Save the modified XML document under the specified name
        hr = spXMLDOM->save(CComVariant(L"updatedxml.xml"));
        if (FAILED(hr))
            throw "Unable to save the modified XML document";
        std::cout << "Processing complete..." << std::endl << std::endl;
    } // try
    catch (const char *lpstrErr)
    {
        // An error occurred
        std::cout << lpstrErr << std::endl << std::endl;
    } // catch
    catch (...)
    {
        // Unknown error
        std::cout << "Unknown error..." << std::endl << std::endl;
    } // catch
    // Finished using COM; the smart pointers above have already been released
    ::CoUninitialize();
    return 0;
}
Because XML has stricter syntax requirements than HTML, writing an XML parser is easier than writing an HTML parser. Moreover, XML markup describes not merely how a document is presented but, more importantly, the structure of the document and the nature of its information, so an XML parser makes it easy to obtain the information in a specific node and then display or modify it.
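The steps the console program performs (find elements by name, create a detached element, append it under a parent, set an attribute) can also be modeled without COM. The following is a toy in-memory tree in standard C++, not the MSXML API: Node, createElement, appendChild and collectByTagName are hypothetical names, and the model ignores text nodes, namespaces and serialization. collectByTagName walks the descendants of a node in document order and treats "*" as matching every element, as described above for getElementsByTagName.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal element tree used only for illustration; it is not MSXML.
struct Node {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<std::unique_ptr<Node>> children;
};

// Rough analogue of createNode(NODE_ELEMENT, name, ...): a new, detached element.
std::unique_ptr<Node> createElement(const std::string& name) {
    auto node = std::make_unique<Node>();
    node->name = name;
    return node;
}

// Rough analogue of appendChild: moves the child under parent and
// returns a pointer to the inserted node.
Node* appendChild(Node& parent, std::unique_ptr<Node> child) {
    parent.children.push_back(std::move(child));
    return parent.children.back().get();
}

// Depth-first, document-order search over the descendants of root;
// "*" matches every element, any other tag matches by exact name.
void collectByTagName(Node& root, const std::string& tag, std::vector<Node*>& out) {
    for (auto& child : root.children) {
        if (tag == "*" || child->name == tag) out.push_back(child.get());
        collectByTagName(*child, tag, out);
    }
}
```

Because the search visits a parent before its children, a "*" query over a tree shaped like the one the program produces lists xmldata, xmlnode, the inserted xmlchildnode, and then xmltext.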