Efficient Techniques for Modifying Large XML Files

xiaoxiao2021-03-06  112

Introduction

As XML becomes a common representation for large sources of information, developers have begun to run into problems when editing large XML files. This is especially true for applications that handle large log files and frequently need to append information to them. The most direct way to edit an XML file is to load it into an XmlDocument, modify the document in memory, and save it back to disk. However, this means the entire XML document must be loaded into memory, which may be impossible if the document is too large or the application is short on memory.
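For contrast, the in-memory approach just described can be sketched as follows. This is a minimal sketch, not code from the article: the InMemoryEdit class, its AppendEvent helper, the simplified two-child <event> layout, and the file name inmemory-log.xml are all hypothetical. The point is that both Load and Save touch the whole file, however small the change.

```csharp
using System;
using System.IO;
using System.Xml;

public static class InMemoryEdit
{
    // Appends an <event> to the root of an XML log by loading the whole file
    // into an XmlDocument, changing it in memory, and saving it back.
    // Memory use grows with the file size, which is what makes this approach
    // unsuitable for very large logs.
    public static void AppendEvent(string path, string ip, string file)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(path);                          // parses the entire file into memory

        XmlElement ev = doc.CreateElement("event");
        XmlElement ipElement = doc.CreateElement("ip");
        ipElement.InnerText = ip;
        XmlElement fileElement = doc.CreateElement("file");
        fileElement.InnerText = file;
        ev.AppendChild(ipElement);
        ev.AppendChild(fileElement);
        doc.DocumentElement.AppendChild(ev);     // edit the in-memory tree

        doc.Save(path);                          // rewrites the entire file
    }

    public static void Main()
    {
        // Hypothetical demo file, created here so the example runs on its own.
        File.WriteAllText("inmemory-log.xml", "<logfile></logfile>");
        AppendEvent("inmemory-log.xml", "192.168.0.1", "comments.aspx");
        Console.WriteLine(File.ReadAllText("inmemory-log.xml"));
    }
}
```

Appending one small entry to a multi-gigabyte log this way still costs a full parse and a full rewrite, which is the motivation for the techniques that follow.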

This article illustrates several alternative ways to modify an XML document that do not involve loading its contents into an XmlDocument instance.

Using XML Inclusion Techniques

The first approach is most useful for appending values to XML log files. A common problem developers face is needing a way to simply append new entries to a log file without loading the document. Because XML has strict well-formedness rules, appending an entry to an XML log file with traditional file I/O is usually very difficult: the log file must end with the end tag of the root element, so simply writing new text at the end of the file would leave the document malformed.

The first technique is aimed at exactly this case, where the goal is to append entries to an XML document quickly. It involves two files: the first is a well-formed XML file, and the second is an XML fragment. The well-formed file includes the fragment, either through an external entity declared in its DTD or through an xi:include element. Because the fragment lives in its own file, it can be updated efficiently during processing by simply appending to it, and the including file never has to change. An example of the including file and the included file follows:

logfile.xml:

<?xml version="1.0"?>
<!DOCTYPE logfile [
<!ENTITY events
  SYSTEM "logfile-entries.txt">
]>
<logfile>
&events;
</logfile>

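logfile-entries.txt:

<event>
  <ip>127.0.0.1</ip>
  <http_method>GET</http_method>
  <file>index.html</file>
  <date>2004-04-01T17:35:20.0656808-08:00</date>
</event>
<event>
  <ip>127.0.0.1</ip>
  <http_method>GET</http_method>
  <file>stylesheet.css</file>
  <date>2004-04-01T17:35:23.0656120-08:00</date>
  <referrer>http://www.example.com/index.html</referrer>
</event>
<event>
  <ip>127.0.0.1</ip>
  <http_method>GET</http_method>
  <file>logo.gif</file>
  <date>2004-04-01T17:35:25.238220-08:00</date>
  <referrer>http://www.example.com/index.html</referrer>
</event>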

The logfile-entries.txt file contains an XML fragment and can be updated efficiently with ordinary file I/O. The following code shows how to add an entry to the XML log file by appending it to the end of the text file.

using System;
using System.IO;
using System.Xml;

public class Test {
  public static void Main(string[] args) {
    // Open the fragment file in append mode; existing entries are not touched.
    StreamWriter sw = File.AppendText("logfile-entries.txt");
    XmlTextWriter xtw = new XmlTextWriter(sw);

    xtw.WriteStartElement("event");
    xtw.WriteElementString("ip", "192.168.0.1");
    xtw.WriteElementString("http_method", "POST");
    xtw.WriteElementString("file", "comments.aspx");
    xtw.WriteElementString("date", "1999-05-05T19:25:13.238220-08:00");
    xtw.WriteEndElement(); // close </event>

    // Closing the writer flushes it and closes the underlying StreamWriter.
    xtw.Close();
  }
}
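Starting with .NET Framework 2.0, Microsoft recommends XmlWriter.Create with an XmlWriterSettings object instead of constructing an XmlTextWriter directly. A minimal sketch of the same append under that API follows; the FragmentAppender class and its AppendEvent method are hypothetical names. Setting ConformanceLevel.Fragment tells the writer that its output need not have a single root element, which matches what logfile-entries.txt contains.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FragmentAppender
{
    // Appends one <event> element to an XML fragment file such as
    // logfile-entries.txt. Existing entries are never read or rewritten.
    public static void AppendEvent(string path, string ip, string httpMethod,
                                   string file, string date)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment; // no single root required
        settings.OmitXmlDeclaration = true;                    // fragments carry no <?xml ...?>

        using (StreamWriter sw = File.AppendText(path))        // positioned at end of file
        using (XmlWriter xw = XmlWriter.Create(sw, settings))
        {
            xw.WriteStartElement("event");
            xw.WriteElementString("ip", ip);
            xw.WriteElementString("http_method", httpMethod);
            xw.WriteElementString("file", file);
            xw.WriteElementString("date", date);
            xw.WriteEndElement();
        }   // disposing flushes the XmlWriter, then closes the file
    }

    public static void Main()
    {
        AppendEvent("logfile-entries.txt", "192.168.0.1", "POST",
                    "comments.aspx", "1999-05-05T19:25:13.238220-08:00");
    }
}
```

The cost of each call is proportional to the size of the new entry, not the size of the log, which is the property the inclusion technique is after.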

Once entries have been appended to the text file, the entries in the XML log file can be processed with traditional XML techniques. The following code uses XPath to iterate over the log events in logfile.xml and lists which files were accessed and when.

using System;
using System.Xml;

public class Test2 {
  public static void Main(string[] args) {
    // XmlValidatingReader is used only to expand the &events; external entity;
    // no validation is performed. (It is obsolete as of .NET Framework 2.0.)
    XmlValidatingReader vr =
      new XmlValidatingReader(new XmlTextReader("logfile.xml"));
    vr.ValidationType = ValidationType.None;
    vr.EntityHandling = EntityHandling.ExpandEntities;

    XmlDocument doc = new XmlDocument();
    doc.Load(vr);

    foreach (XmlElement element in doc.SelectNodes("//event")) {
      // Look children up by name rather than by position, so whitespace
      // nodes or an optional <referrer> cannot shift the indexes.
      string file = element["file"].InnerText;
      string date = element["date"].InnerText;
      Console.WriteLine("{0} accessed at {1}", file, date);
    }
  }
}
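Because XmlValidatingReader is obsolete, the same read can be sketched with XmlReader.Create, assuming .NET Framework 4 or later (where XmlReaderSettings.DtdProcessing exists) or .NET Core. DtdProcessing.Parse is required because readers created this way prohibit DTDs by default, and an XmlUrlResolver is set explicitly because, depending on the framework version, the default resolver may be null and the external entity would not be fetched. The EntityExpandingReader class and ListAccesses method are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityExpandingReader
{
    // Writes "<file> accessed at <date>" for each <event> in a log whose
    // entries are pulled in through an external entity declared in its DTD.
    public static void ListAccesses(string logPath, TextWriter output)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse; // DTDs are prohibited by default
        settings.XmlResolver = new XmlUrlResolver();  // lets the reader open the entity file
        settings.IgnoreWhitespace = true;

        using (XmlReader reader = XmlReader.Create(logPath, settings))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader); // readers from XmlReader.Create expand the entity content
            foreach (XmlElement ev in doc.SelectNodes("//event"))
            {
                output.WriteLine("{0} accessed at {1}",
                                 ev["file"].InnerText, ev["date"].InnerText);
            }
        }
    }

    public static void Main()
    {
        ListAccesses("logfile.xml", Console.Out);
    }
}
```

Enabling DTD processing together with a resolver lets a document pull in arbitrary external files, so this configuration is only appropriate for trusted input such as an application's own log files.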

The code above produces the following output:

index.html accessed at 2004-04-01T17:35:20.0656808-08:00
stylesheet.css accessed at 2004-04-01T17:35:23.0656120-08:00
logo.gif accessed at 2004-04-01T17:35:25.238220-08:00
comments.aspx accessed at 1999-05-05T19:25:13.238220-08:00

Chaining an XmlReader to an XmlWriter

In some cases you need to perform more complex operations on an XML file than simply appending elements to the root element. For example, you may want to filter out every entry in the log file that does not meet certain criteria before archiving the log. One way to do this is to load the XML file into an XmlDocument and select the events of interest with XPath. However, doing so loads the entire document into memory, which becomes a problem if the document is large. Another option is XSLT, but because the entire XML document must be held in memory, it runs into the same problem as the XmlDocument approach. In addition, developers who are not familiar with XSLT may find template matching difficult to work with.

One way to handle large XML documents is to read the XML with an XmlReader and write it out with an XmlWriter while it is being read. With this approach the entire document is never held in memory at once, and the XML can be changed more precisely than just by appending elements. The following code example reads the XML document from the previous section, filters out all events whose ip element has the value "127.0.0.1", and saves the result as an archive file.

using System;
using System.IO;
using System.Text;
using System.Xml;

public class Test2 {

  // Element names, atomized in the NameTable shared with the reader (see Main).
  static string ipKey;
  static string httpMethodKey;
  static string fileKey;
  static string dateKey;
  static string referrerKey;

  // Copies all attributes of the reader's current element to the writer.
  public static void WriteAttributes(XmlReader reader, XmlWriter writer) {
    if (reader.MoveToFirstAttribute()) {
      do {
        writer.WriteAttributeString(reader.Prefix,
                                    reader.LocalName,
                                    reader.NamespaceURI,
                                    reader.Value);
      } while (reader.MoveToNextAttribute());
      reader.MoveToElement(); // return to the owning element
    }
  }

  // Writes one <event> element; <referrer> is optional.
  public static void WriteEvent(XmlWriter writer, string ip,
                                string httpMethod, string file,
                                string date, string referrer) {
    writer.WriteStartElement("event");
    writer.WriteElementString("ip", ip);
    writer.WriteElementString("http_method", httpMethod);
    writer.WriteElementString("file", file);
    writer.WriteElementString("date", date);
    if (referrer != null) writer.WriteElementString("referrer", referrer);
    writer.WriteEndElement();
  }

  // Reads the children of the current <event> element up to its end tag.
  public static void ReadEvent(XmlReader reader, out string ip,
                               out string httpMethod, out string file,
                               out string date, out string referrer) {
    ip = httpMethod = file = date = referrer = null;
    while (reader.Read() && reader.NodeType != XmlNodeType.EndElement) {
      if (reader.NodeType == XmlNodeType.Element) {
        // ReadString() leaves the reader on the child's end tag, so the
        // next Read() in the loop condition moves on to the next child.
        if (reader.Name == ipKey) {
          ip = reader.ReadString();
        } else if (reader.Name == httpMethodKey) {
          httpMethod = reader.ReadString();
        } else if (reader.Name == fileKey) {
          file = reader.ReadString();
        } else if (reader.Name == dateKey) {
          date = reader.ReadString();
        } else if (reader.Name == referrerKey) {
          referrer = reader.ReadString();
        }
      } // if
    } // while
  }

  public static void Main(string[] args) {
    string ip, httpMethod, file, date, referrer;

    // Set up a NameTable with the strings used for name comparisons.
    NameTable xnt = new NameTable();
    ipKey         = xnt.Add("ip");
    httpMethodKey = xnt.Add("http_method");
    fileKey       = xnt.Add("file");
    dateKey       = xnt.Add("date");
    referrerKey   = xnt.Add("referrer");

    // Create the XmlTextReader with the NameTable above.
    XmlTextReader xr = new XmlTextReader("logfile.xml", xnt);
    xr.WhitespaceHandling = WhitespaceHandling.Significant;
    // Wrap it so the &events; external entity is expanded (no validation).
    XmlValidatingReader vr = new XmlValidatingReader(xr);
    vr.ValidationType = ValidationType.None;
    vr.EntityHandling = EntityHandling.ExpandEntities;

    StreamWriter sw =
      new StreamWriter("logfile-archive.xml", false, Encoding.UTF8);
    XmlWriter xw = new XmlTextWriter(sw);

    vr.MoveToContent(); // move to the document element (<logfile>)
    xw.WriteStartElement(vr.Prefix, vr.LocalName, vr.NamespaceURI);
    WriteAttributes(vr, xw);
    vr.Read(); // move to the first <event> child of the document element

    // Write out every event that did not come from 127.0.0.1 (localhost).
    do {
      ReadEvent(vr, out ip, out httpMethod,
                out file, out date, out referrer);
      if (ip != "127.0.0.1") { // also safe if an event has no <ip>
        WriteEvent(xw, ip, httpMethod, file, date, referrer);
      }
      vr.Read(); // move to the next <event> element or to </logfile>
    } while (vr.NodeType == XmlNodeType.Element);

    xw.WriteEndElement(); // close </logfile>
    Console.WriteLine("Done");
    vr.Close();
    xw.Close();
  }
}

When run, the code above writes the following to logfile-archive.xml (shown indented for readability; XmlTextWriter does not indent by default):

<logfile>
  <event>
    <ip>192.168.0.1</ip>
    <http_method>POST</http_method>
    <file>comments.aspx</file>
    <date>1999-05-05T19:25:13.238220-08:00</date>
  </event>
</logfile>
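A more compact variation on the same streaming filter can be sketched with LINQ to XML, assuming .NET Framework 3.5 or later for System.Xml.Linq. Instead of reading each field by hand, it materializes one <event> at a time as a small XElement with XNode.ReadFrom, checks its <ip> child, and writes it back out with WriteTo. This is a swapped-in technique, not the article's code; the StreamingFilter class, the Filter method, and the logfile-archive2.xml file name are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingFilter
{
    // Copies the document element from reader to writer, dropping every
    // <event> child whose <ip> equals excludedIp. Only one <event> is held
    // in memory (as an XElement) at any time.
    public static void Filter(XmlReader reader, XmlWriter writer, string excludedIp)
    {
        reader.MoveToContent();                 // skip declaration/DOCTYPE; land on the root
        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
        writer.WriteAttributes(reader, true);   // copy the root's attributes

        if (!reader.IsEmptyElement)
        {
            reader.Read();                      // step to the root's first child
            while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "event")
                {
                    // ReadFrom consumes the whole <event> subtree and leaves the
                    // reader on the following node, so no extra Read() is needed.
                    XElement ev = (XElement)XNode.ReadFrom(reader);
                    if ((string)ev.Element("ip") != excludedIp)
                        ev.WriteTo(writer);
                }
                else if (reader.NodeType == XmlNodeType.Element)
                {
                    writer.WriteNode(reader, true); // copy other children unchanged
                }
                else
                {
                    reader.Read();              // skip whitespace, comments, etc.
                }
            }
        }
        writer.WriteEndElement();               // close the root element
    }

    public static void Main()
    {
        XmlReaderSettings rs = new XmlReaderSettings();
        rs.DtdProcessing = DtdProcessing.Parse; // needed to expand the &events; entity
        rs.XmlResolver = new XmlUrlResolver();

        using (XmlReader reader = XmlReader.Create("logfile.xml", rs))
        using (XmlWriter writer = XmlWriter.Create("logfile-archive2.xml"))
        {
            Filter(reader, writer, "127.0.0.1");
        }
        Console.WriteLine("Done");
    }
}
```

The trade-off is a small per-event allocation in exchange for not having to keep a hand-written ReadEvent/WriteEvent pair in sync with the event format.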

Besides chaining an XmlReader to an XmlWriter, another interesting aspect of the archiving code is its use of a NameTable in the ReadEvent() method to speed up the text comparisons that check element names. The benefits of checking element names this way with an XmlReader are described in the MSDN documentation topic "Object Comparison Using XmlNameTable with XmlReader".
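The idea behind that documentation topic can be sketched in a few lines, assuming .NET Framework 2.0 or later for XmlReaderSettings. Names returned by a reader are atomized in its NameTable, so a name added to the same table up front can be compared by object reference instead of character by character. The AtomizedNames class and the CountIpElements method are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AtomizedNames
{
    // Counts <ip> elements by comparing element names by object reference.
    public static int CountIpElements(string xml)
    {
        NameTable nt = new NameTable();
        object ipKey = nt.Add("ip");             // atomize the name once, up front

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.NameTable = nt;                 // the reader atomizes names into nt

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                // Both strings come from nt, so equal names are the same object
                // and a reference check replaces a character-by-character compare.
                if (reader.NodeType == XmlNodeType.Element &&
                    Object.ReferenceEquals(reader.LocalName, ipKey))
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        string xml = "<logfile><event><ip>127.0.0.1</ip></event>" +
                     "<event><ip>192.168.0.1</ip></event></logfile>";
        Console.WriteLine(CountIpElements(xml)); // prints 2
    }
}
```

In the archiving program the keys are typed as string and compared with ==, which performs a value comparison; because string equality checks for the same instance first, atomized names still take the fast path there.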

