An incomplete test of four XML parsing technologies

xiaoxiao2021-03-06 178

In everyday work we inevitably run into XML as a data storage format. Of the solutions currently available, which one suits us best? In this article I make an incomplete evaluation of four mainstream approaches, testing only traversal of XML, because traversal is what gets used most at work (at least I think so).

Test environment: AMD Duron 1.4 GHz overclocked to 1.5 GHz, 256 MB DDR333, Windows 2000 Server SP4, Sun JDK 1.4.1, Eclipse 2.1, Resin 2.1.8, tested in Debug mode.

The XML file format is as follows (the records repeat inside the root element):

<VALUE>
    <NO>a1234</NO>
    <ADDR>XX Road, XX, XX, Sichuan</ADDR>
</VALUE>
<VALUE>
    <NO>b1234</NO>
    <ADDR>Sichuan, XX City, XX Township, XX Village, XX Group</ADDR>
</VALUE>

Test method: a JSP page calls the bean (for why a JSP is used to make the call, see http://blog.9cbs.net/roSen/archive/2004/10/15/138324.aspx), and each approach parses 10K, 100K, 1000K, and 10000K XML files while the elapsed time is measured (unit: milliseconds).

JSP file:
<%@ page contentType="text/html; charset=GB2312" %>
<%@ page import="com.test.*" %>

<%
String args[] = {""};
MyXMLReader.main(args);
%>

First up in the test is DOM (JAXP Crimson parser)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets a developer search the tree for specific information. Analyzing the structure typically requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is regarded as tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so an application can change both the data and the structure. The tree can also be navigated up and down at any time, rather than being a one-shot pass as with SAX. DOM is also much simpler to use. On the other hand, for particularly large documents, parsing and loading the entire document can be slow and resource-intensive, so such data is better handled by other means, such as event-based models like SAX.
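
The tree-based behavior described above can be sketched with the JAXP classes that ship with the JDK. This is only a minimal illustration, not the test code: the class name DomSketch, the RESULT root element, and the sample values are assumptions, and it uses getTextContent/setTextContent, which need a newer JDK than the 1.4.1 used for the test.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomSketch {
    // Hypothetical sample data; the RESULT root name is an assumption.
    static final String XML =
        "<RESULT>"
        + "<VALUE><NO>a1234</NO><ADDR>first address</ADDR></VALUE>"
        + "<VALUE><NO>b1234</NO><ADDR>second address</ADDR></VALUE>"
        + "</RESULT>";

    // Parses the whole document into an in-memory tree before returning.
    static Document load(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Modifies the tree in place: replaces the text of the first NO element.
    static String renameFirstPlate(Document doc, String newPlate) {
        Element no = (Element) doc.getElementsByTagName("NO").item(0);
        no.setTextContent(newPlate);
        return no.getTextContent();
    }

    public static void main(String[] args) {
        Document doc = load(XML);
        NodeList values = doc.getElementsByTagName("VALUE");
        System.out.println("records: " + values.getLength());

        // Navigate down to a child, then back up to its parent.
        Element no = (Element) ((Element) values.item(0)).getElementsByTagName("NO").item(0);
        System.out.println("parent of NO: " + no.getParentNode().getNodeName());

        // The tree stays in memory, so it can be changed after parsing.
        System.out.println("new plate: " + renameFirstPlate(doc, "c5678"));
    }
}
```

Because load returns only after the whole tree is built, memory use grows with document size, which is the cost the paragraph above warns about.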

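Bean file:
package com.test;

import java.io.*;
import java.util.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;

public class MyXMLReader {

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Load the whole document into an in-memory tree
            Document doc = builder.parse(f);
            NodeList nl = doc.getElementsByTagName("VALUE");
            // The source text breaks off after "i"; the rest of this method is
            // reconstructed by analogy with the SAX and DOM4J versions.
            for (int i = 0; i < nl.getLength(); i++) {
                Element value = (Element) nl.item(i);
                System.out.print("License plate number: " + value.getElementsByTagName("NO").item(0).getFirstChild().getNodeValue());
                System.out.println(" Owner address: " + value.getElementsByTagName("ADDR").item(0).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Running time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}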

10K time: 265 203 219 172
100K time: 9172 9016 8891 9000
1000K time: 691719 675407 708375 739656
10000K time: OutOfMemoryError

Next is SAX

The advantages of this kind of processing are very similar to those of streaming media. Analysis can begin immediately, without waiting for all the data to be processed. Moreover, because the application only examines the data as it is read, it does not need to keep the data in memory. This is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing as soon as some condition is satisfied. In general, SAX is also much faster than its alternative, DOM.
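
The point about stopping early can be illustrated with a small handler. This is a sketch under assumptions, not the test code: the class name SaxEarlyStop and the sample data are hypothetical, it assumes NO comes before ADDR in each record, and it uses the common (but not mandated) trick of throwing a SAXException from a callback to abort the parse.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEarlyStop {
    // Hypothetical sample data; the RESULT root name is an assumption.
    static final String XML =
        "<RESULT>"
        + "<VALUE><NO>a1234</NO><ADDR>first address</ADDR></VALUE>"
        + "<VALUE><NO>b1234</NO><ADDR>second address</ADDR></VALUE>"
        + "</RESULT>";

    // Returns the ADDR text of the record whose NO equals plate, or null if none matches.
    static String findAddress(String xml, final String plate) {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            boolean matched = false;

            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0); // start collecting text for the new element
            }

            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // text may arrive in several chunks
            }

            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (qName.equals("NO")) {
                    matched = text.toString().equals(plate);
                } else if (qName.equals("ADDR") && matched) {
                    result[0] = text.toString();
                    // Abort here: no further callbacks are delivered for the rest of the document.
                    throw new SAXException("stop: target found");
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected when we aborted on purpose; anything else is a real parse error.
            if (result[0] == null) {
                throw new IllegalStateException(e);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(findAddress(XML, "a1234"));
    }
}
```

Only one element's text is held at a time, so memory use does not grow with the size of the file.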

DOM or SAX?

For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses an XML document through a tree structure, while SAX uses an event model. A DOM parser converts the XML document into a tree containing its content, and the tree can then be traversed. The advantage of the DOM model is ease of programming: the developer only needs to call the tree-building instructions and then use the navigation APIs to reach the tree nodes needed for the task. Elements in the tree can easily be added and modified. However, because a DOM parser has to process the entire XML document, its demands on performance and memory are relatively high, especially with large XML files. Because of its traversal capabilities, a DOM parser is often used where XML documents change frequently.

A SAX parser uses an event-based model. It fires a series of events as it parses the XML document; when a given tag is found, it invokes a callback method to tell that method the tag has been found. SAX's memory requirements are usually lower, because it lets the developer decide which tags to process. SAX does especially well when the developer only needs part of the data contained in the document. However, coding with a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.

Bean file:
package com.test;

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class MyXMLReader extends DefaultHandler {

    // Names of the elements that are currently open
    java.util.Stack tags = new java.util.Stack();

    public MyXMLReader() {
        super();
    }

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXMLReader reader = new MyXMLReader();
            sp.parse(new InputSource("data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Running time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
    }

    // Called for each run of text; the innermost open tag decides what to print
    public void characters(char ch[], int start, int length) throws SAXException {
        String tag = (String) tags.peek();
        if (tag.equals("NO")) {
            System.out.print("License plate number: " + new String(ch, start, length));
        }
        if (tag.equals("ADDR")) {
            System.out.println(" Address: " + new String(ch, start, length));
        }
    }

    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        tags.push(qName);
    }

    // Pop on close so whitespace between elements is not attributed to the previous tag
    public void endElement(String uri, String localName, String qName) {
        tags.pop();
    }
}

10K time: 110 47 109 78
100K time: 344 406 375 422
1000K time: 3234 3281 3688 3312
10000K time: 32578 34313 31797 31890 30328

Then JDOM (http://www.jdom.org/)

The goal of JDOM is to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily promoted. It is being considered for eventual use as a "Java Standard Extension" through "Java Specification Request JSR-102". JDOM development began in early 2000.

JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some respects, but it also limits flexibility. Second, the API makes heavy use of the Collections classes, which simplifies things for Java developers who are already familiar with them.

The JDOM documentation states that its purpose is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (the 20% presumably referring to the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find the API much easier to understand than DOM. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, it still requires a good understanding of XML to do anything beyond the basics (or, in some cases, even to understand the errors). That is probably more worthwhile than learning either the DOM or the JDOM interfaces.

JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

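Bean file:
package com.test;

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.*;

public class MyXMLReader {

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(new File("data_10k.xml"));
            Element foo = doc.getRootElement();
            List allChildren = foo.getChildren();
            // The source text breaks off after "i"; the rest of this method is
            // reconstructed by analogy with the SAX and DOM4J versions.
            for (int i = 0; i < allChildren.size(); i++) {
                Element value = (Element) allChildren.get(i);
                System.out.print("License plate number: " + value.getChild("NO").getText());
                System.out.println(" Owner address: " + value.getChild("ADDR").getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Running time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}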

10K time: 125 62 187 94
100K time: 704 625 640 766
1000K time: 27984 30750 27859 30656
10000K time: OutOfMemoryError

Finally, DOM4J (http://dom4j.sourceforge.net/)

Although DOM4J is the result of a completely independent development effort, it started out as an intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also provides options for building document representations that can be accessed in parallel through the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.

To support all of these features, DOM4J uses interfaces and abstract base classes. DOM4J makes heavy use of the Collections classes in its API, but in many cases it also provides alternative methods that allow better performance or a more direct coding style. The direct benefit is that, although DOM4J pays the price of a more complex API, it offers much more flexibility than JDOM.

While adding flexibility, XPath integration, and the goal of handling large documents, DOM4J's aim is the same as JDOM's: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, with the goal of handling essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.

DOM4J is an excellent Java XML API: it performs well, is powerful, and is extremely easy to use, and it is also open source software. More and more Java software now uses DOM4J to read and write XML; it is especially worth mentioning that even Sun's JAXM uses DOM4J.

Bean file:
package com.test;

import java.io.*;
import java.util.*;
import org.dom4j.*;
import org.dom4j.io.*;

public class MyXMLReader {

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            SAXReader reader = new SAXReader();
            Document doc = reader.read(f);
            Element root = doc.getRootElement();
            Element foo;
            // Iterate over the VALUE children of the root element
            for (Iterator i = root.elementIterator("VALUE"); i.hasNext();) {
                foo = (Element) i.next();
                System.out.print("License plate number: " + foo.elementText("NO"));
                System.out.println(" Owner address: " + foo.elementText("ADDR"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Running time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}

10K time: 109 78 109 31
100K time: 297 359 172 312
1000K time: 2281 2359 2344 2469
10000K time: 20938 19922 20031 21078
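
To compare the four approaches on a single file size, the reported runs can be averaged. The sketch below does that for the 100K figures copied from the results above; the class name TimingSummary and the choice of DOM4J as the baseline are assumptions made for illustration only.

```java
class TimingSummary {
    // 100K runs in milliseconds, copied from the figures reported in the article.
    static final long[] DOM = {9172, 9016, 8891, 9000};
    static final long[] SAX = {344, 406, 375, 422};
    static final long[] JDOM = {704, 625, 640, 766};
    static final long[] DOM4J = {297, 359, 172, 312};

    // Arithmetic mean of a set of runs.
    static double mean(long[] runs) {
        long sum = 0;
        for (long r : runs) {
            sum += r;
        }
        return (double) sum / runs.length;
    }

    public static void main(String[] args) {
        String[] names = {"DOM", "SAX", "JDOM", "DOM4J"};
        long[][] runs = {DOM, SAX, JDOM, DOM4J};
        double baseline = mean(DOM4J);
        for (int i = 0; i < names.length; i++) {
            double m = mean(runs[i]);
            // Ratio shows how many times slower each approach is than the DOM4J mean.
            System.out.println(names[i] + ": mean " + m + " ms, ratio to DOM4J " + (m / baseline));
        }
    }
}
```

On these figures the DOM mean is 9019.75 ms against 285 ms for DOM4J, roughly a factor of 32 on the 100K file.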

Conclusion

JDOM and DOM performed poorly in the performance test, running out of memory on the 10M document. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said they expect to focus on performance before the official release, from a performance standpoint it really has little to recommend it. DOM, on the other hand, is still a very good choice. DOM implementations are widely available across programming languages, and DOM is the basis of many other XML-related standards; because it is an official W3C Recommendation (unlike the non-standard Java models), it may be required in certain kinds of projects (for example, using DOM from JavaScript).

SAX performs better, which comes from its particular way of parsing. SAX detects the incoming XML stream without loading it into memory (of course, part of the document is temporarily held in memory while the XML stream is read).

Undoubtedly, DOM4J is the winner of this test. Many open source projects now use DOM4J heavily; the well-known Hibernate, for example, also uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J!

Reference:
http://www-900.ibm.com/developerworks/cn/xml/x-injava/index.shtml
http://www-900.ibm.com/developerWorks/cn/xml/x-injava2/index.shtml

(When reposting, please credit the original author, Rosen Jiang, and the source: http://blog.9cbs.net/rosen)

When reposting, please cite the original URL: https://www.9cbs.com/read-100052.html
