This article assumes that you are familiar with XML and the .NET Framework.
The XmlTextReader and XmlTextWriter classes provide read and write operations for XML data. In this article, the author describes the architecture of XML readers and how they relate to the XMLDOM and SAX parsers. The author also demonstrates how to use readers to parse and validate XML documents, how to create well-formed XML documents, and how to read and write large XML documents encoded in Base64 and BinHex. Finally, the author shows how to implement a stream-based read/write parser that wraps the reader in a separate class.
About three years ago, I attended a software conference whose theme was "Without XML, there is no future for programming." XML has indeed advanced step by step since then, and it is now built into the .NET Framework. In this article, I will explain the role and internal characteristics of the .NET Framework APIs for processing XML documents, and then demonstrate some common features.
XML from MSXML to .NET
Before the .NET Framework appeared, you wrote XML-driven Windows applications with MSXML services, a COM-based class library. Unlike the XML classes in the .NET Framework, the MSXML library is an API layered on top of the operating system rather than code that is fully integrated with it. MSXML can certainly communicate with your application, but it cannot truly integrate with the surrounding environment.
The MSXML class library can be imported in Win32 and can be used from the CLR, but only as an external server component. Applications built on the .NET Framework, by contrast, can use the XML classes together with the other .NET Framework namespaces, and the resulting code is easy to read.
As a standalone component, the MSXML parser provides some advanced features, such as asynchronous parsing. The XML classes in the .NET Framework do not offer this feature directly, but you can easily obtain the same functionality with them and, on that basis, add more features.
The XML classes in the .NET Framework provide basic parsing, querying, and transformation of XML data. You will find classes that support XPath queries and XSLT transformations, and classes that read and write XML documents. The .NET Framework also includes other classes that work with XML, for object serialization (the XmlSerializer and SoapFormatter classes), application configuration (the AppSettingsReader class), and data storage (the DataSet class). In this article, I discuss only the classes that implement basic XML I/O operations.
XML Parsing Models
Because XML is a markup language, a tool is needed to parse and interpret the information stored in a document according to a certain syntax. That tool is an XML parser: a component that reads the markup text and returns platform-specific objects.
All XML parsers, regardless of platform, fall into one of two categories: tree-based or event-based processors. These two categories are usually implemented with XMLDOM (the Microsoft XML Document Object Model) and SAX (Simple API for XML). An XMLDOM parser is a generic tree-based API: it treats the XML document as an in-memory tree. A SAX parser is an event-based API: it handles each element as it appears in the XML data stream. Typically, a DOM can be loaded by a SAX parser, so the two kinds of processing are not mutually exclusive. Overall, the SAX parser is the opposite of the XMLDOM parser, and their parsing models differ greatly. XMLDOM has a well-defined set of functionality that you cannot extend. When it handles a large document, it can take a lot of memory to provide that full set of functionality.
A SAX parser lets the client application handle parsing events through instances of platform-specific objects that the application supplies. The SAX parser controls the whole process and "pushes" data to the handler, which accepts or rejects it. The advantage of this model is that it needs very little memory.
The .NET Framework fully supports the XMLDOM model, but it does not support the SAX model. Why? Because the .NET Framework supports two different parsing models: XMLDOM parsers and XML readers. It does not explicitly support SAX parsers, but that does not mean it lacks SAX-like functionality. Everything SAX can do can be implemented easily, and more efficiently, with an XML reader. Unlike a SAX parser, a .NET Framework reader works entirely under the control of the client application. The application can therefore "pull" only the data it really needs and skip over the rest of the XML stream. In the SAX model, by contrast, the parser passes all information to the application, whether it is useful or not.
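The pull model described above can be sketched roughly as follows. This is a minimal illustration rather than code from the original article: the class name PullSketch, the method name ReadTitles, and the element names title and notes are hypothetical, and the use of List<string> assumes .NET Framework 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullSketch
{
    // Hypothetical helper: collects the text of every <title> element and
    // skips each <notes> subtree without visiting its descendants.
    public static List<string> ReadTitles(string xml)
    {
        var titles = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "notes")
                {
                    // Skip jumps past the whole subtree and leaves the reader
                    // on the node that follows it, so no extra Read is needed.
                    reader.Skip();
                    continue;
                }
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    // ReadElementString returns the text content and leaves the
                    // reader on the node after the closing tag.
                    titles.Add(reader.ReadElementString());
                    continue;
                }
                reader.Read();
            }
        }
        return titles;
    }
}
```

The application, not the parser, decides which nodes to look at: anything inside a notes element is never delivered to the caller, which is the kind of selective reading a push-style parser cannot offer directly.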
Readers are based on the .NET Framework stream model and work much like a database cursor. Interestingly, the classes that implement this cursor-like parsing model also provide the underlying support for the XMLDOM parser in the .NET Framework. The two abstract classes XmlReader and XmlWriter are the foundation of all XML functionality in the .NET Framework, including the XMLDOM classes, the ADO.NET driver classes, and the configuration classes. So you have two options for processing XML data in the .NET Framework: use the XmlReader and XmlWriter classes directly, or use the XMLDOM model. For more on reading documents in the .NET Framework, see the Cutting Edge column in the August 2002 issue of MSDN Magazine.
The XmlReader Class
An XML reader exposes a programming interface for connecting to an XML document and "pulling" out the data you want. If you look more closely, you will find that a reader works much like a desktop application fetching data from a database. The database server returns a cursor object that contains the whole result set and returns a reference to the start of the target data set. The client of an XML reader receives a reference to a reader instance. That instance abstracts the underlying data stream and presents the data as an XML tree. The reader classes provide a read-only, forward-only cursor, and you can scroll through the result set with the methods the reader classes provide.
From a reader's point of view, an XML document is not a tagged text file but a serialized collection of nodes. This is a special cursor model in the .NET Framework; you will not find any other similar API in it.
Readers differ from XMLDOM parsers in several ways. An XML reader is forward-only; it has no notion of parent, child, ancestor, or sibling nodes, and it is read-only. In the .NET Framework, reading and writing XML documents are two completely separate functions, handled by the XmlReader and XmlWriter classes respectively. To edit an XML document, you can use the XMLDOM parser, or design a class of your own that combines the two functions. Let's start by analyzing the programming interface of the reader.
XmlReader is an abstract class that you can inherit from and extend. User programs are generally based on one of three classes: XmlTextReader, XmlValidatingReader, or XmlNodeReader. All of these classes share the properties listed in Figure 1 and the methods listed in Figure 2. Note that the values of some properties depend on the actual reader class and may differ from those of the base class, so the description of each property in Figure 1 refers to the base class. For example, the CanResolveEntity property returns true only in the XmlValidatingReader class; it is set to false in the other reader classes. Similarly, the actual return values of certain methods in Figure 2 may differ from class to class. For example, if the node type is not an element node, all the methods that involve attributes have no effect.
The XmlTextReader class provides fast, forward-only, read-only access to XML data streams. The reader checks that the XML document is well-formed and throws an exception if it is not. XmlTextReader checks that the DTD is well-formed, but does not validate the document against the DTD. XmlTextReader loads the XML document from a file name, a URL, or a file stream, and then processes the data quickly. If you need to validate the document, use the XmlValidatingReader class.
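The well-formedness check described above can be observed with a short sketch. The class name WellFormedSketch and the method name IsWellFormed are hypothetical, and the example reads from an in-memory string rather than a file purely to stay self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedSketch
{
    // Hypothetical helper: reads the whole document and reports whether the
    // reader accepted it. XmlTextReader raises XmlException when it reaches
    // markup that is not well-formed.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (var reader = new XmlTextReader(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    // Nothing to do: reading is what triggers the check.
                }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Because the reader works on a stream, the error surfaces only when the reader reaches the offending node, not when the reader is constructed.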
You can create an instance of the XmlTextReader class in several ways: from a file on your hard drive, from a URL, from a stream, or from a TextReader that supplies the XML text:
XmlTextReader reader = new XmlTextReader(file);
Note that all public constructors of the XmlTextReader class require you to specify a data source, which can be a stream, a file, or another source. The default constructor of XmlTextReader is protected, so it cannot be used directly. As with all reader classes in the .NET Framework (such as SqlDataReader), once the reader object is connected and open, you use the Read method to access the data. At the start, only the Read method can move the pointer to the first node; after that you can use Read or other methods (such as Skip, MoveToContent, and ReadInnerXml) to move the pointer to the next node. To process the content of the whole document, loop on the return value of Read: the method returns a Boolean value, true while there are more nodes to read and false once the end of the document has been reached.
Figure 3 Outputting An XML Document Node Layout
using System.IO;
using System.Xml;

string GetXmlFileNodeLayout(string file)
{
    // Create an XmlTextReader that points to the target XML document
    XmlTextReader reader = new XmlTextReader(file);
    // Loop through the nodes and write their text into a StringWriter
    StringWriter writer = new StringWriter();
    string tabPrefix = "";
    while (reader.Read())
    {
        // Write the start tag if the node is an element
        if (reader.NodeType == XmlNodeType.Element)
        {
            // Add reader.Depth tabs according to the depth of the node,
            // then write the element name inside <>
            tabPrefix = new string('\t', reader.Depth);
            writer.WriteLine("{0}<{1}>", tabPrefix, reader.Name);
        }
        else
        {
            // Write the end tag if the node is the end of an element
            if (reader.NodeType == XmlNodeType.EndElement)
            {
                tabPrefix = new string('\t', reader.Depth);
                writer.WriteLine("{0}</{1}>", tabPrefix, reader.Name);
            }
        }
    }
    // Collect the accumulated output
    string buf = writer.ToString();
    writer.Close();
    // Close the reader and its underlying stream
    reader.Close();
    return buf;
}
Figure 3 demonstrates a simple function that outputs the node layout of a given XML document. The function opens the XML document and processes all of its content in a loop. Each call to the Read method moves the reader's pointer to the next node. In most cases Read is all you need to process element nodes, but sometimes, when you move from one node to the next, you are moving between two nodes of different types. The Read method, however, does not move between attribute nodes. The reader's MoveToContent method lets the pointer jump from the header nodes straight to the first content node. In doing so, it skips nodes of type ProcessingInstruction, DocumentType, Comment, Whitespace, and SignificantWhitespace. The type of each node is one of the XmlNodeType enumeration values. The code in Figure 3 uses only two of them: Element and EndElement. The output reproduces the structure of the original document but discards the attributes and text content of the XML elements, printing only the element names. Suppose we use the following XML fragment:
<mags>
    <mag>
        MSDN Magazine
    </mag>
    <mag>
        MSDN Voices
    </mag>
</mags>
The program above produces the following output:
<mags>
    <mag>
    </mag>
    <mag>
    </mag>
</mags>
The indentation of child nodes is set from the reader's Depth property, which returns an integer representing the nesting level of the current node. All the text is accumulated in a StringWriter object (a very convenient stream-based wrapper around the StringBuilder class).
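The MoveToContent behavior mentioned earlier can be sketched in a few lines. The class name MoveToContentSketch and the method name FirstContentName are hypothetical; the sketch assumes the document is supplied as a string.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MoveToContentSketch
{
    // Hypothetical helper: returns the name of the first element in the
    // document, letting MoveToContent skip the XML declaration, comments,
    // processing instructions, and white space that precede it.
    public static string FirstContentName(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // MoveToContent returns the type of the content node it stops on.
            XmlNodeType type = reader.MoveToContent();
            return type == XmlNodeType.Element ? reader.Name : string.Empty;
        }
    }
}
```

Calling MoveToContent on a freshly created reader is enough to reach the root element, so a separate loop over the header nodes is usually unnecessary.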
As mentioned earlier, the reader does not automatically visit attribute nodes through the Read method. To access the attributes of the current element, you use a simple loop controlled by the return value of the MoveToNextAttribute method. The following code visits all attributes of the current node and concatenates their names and values into a comma-separated string:
if (reader.HasAttributes)
    while (reader.MoveToNextAttribute())
        buf += reader.Name + "=\"" + reader.Value + "\",";
reader.MoveToElement();
When you finish processing the attribute set, call the MoveToElement method to return the pointer to the element node that owns the attributes. Strictly speaking, MoveToElement does not really move the pointer, because the pointer never leaves the element node while the attributes are being processed. MoveToElement simply refreshes some internal members so that they expose the values of the element node instead of the last attribute read. For example, the Name property returns the name of the last attribute read before you call MoveToElement, and the name of the owning element afterward. If you have no further work to do on the node and simply want to move on, you don't need to call MoveToElement.
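The effect of MoveToElement on the Name property can be seen in a small self-contained sketch. The class name AttributeSketch, the method name DescribeFirstElement, and the "element=" label in the output are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class AttributeSketch
{
    // Hypothetical helper: for the first element of the fragment, returns its
    // attributes as name="value" pairs, followed by the element name that the
    // reader reports once MoveToElement has been called.
    public static string DescribeFirstElement(string xml)
    {
        var buf = new StringBuilder();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            if (reader.HasAttributes)
            {
                while (reader.MoveToNextAttribute())
                {
                    buf.Append(reader.Name).Append("=\"").Append(reader.Value).Append("\",");
                }
                // At this point Name still reports the last attribute read;
                // MoveToElement switches it back to the owning element.
                reader.MoveToElement();
            }
            buf.Append("element=").Append(reader.Name);
        }
        return buf.ToString();
    }
}
```

For the input <mag name="MSDN Magazine" year="2002"/>, this sketch returns name="MSDN Magazine",year="2002",element=mag. Without the MoveToElement call, the last part would report year instead of mag.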