An incomplete test of four XML parsing technologies in Java (reposted)

xiaoxiao2021-03-06  71


Sina 2004-11-02 13: 12: 49385 views

In everyday work we inevitably run into XML as a data storage format. With so many solutions available, which one suits us best? In this article I do an incomplete evaluation of four mainstream approaches, covering only traversal of XML, because traversal is the operation used most in practice (at least I think so).

Preparation

Test environment:

AMD Duron 1.4 GHz overclocked to 1.5 GHz, 256 MB DDR333, Windows 2000 Server SP4, Sun JDK 1.4.1 + Eclipse 2.1 + Resin 2.1.8, tested in Debug mode.

The XML file format is as follows:

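<result>
<value>
<no>A1234</no>
<addr>Sichuan Province, XX County, XX Town, XX Road, Section X, No. XX</addr>
</value>
<value>
<no>B1234</no>
<addr>Sichuan Province, XX City, XX Township, XX Village, Group XX</addr>
</value>
</result>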

Test method:

A JSP page calls a bean (for why the call is made from a JSP, see: http://blog.9cbs.net/roSen/archive/2004/10/15/138324.aspx). Each approach parses XML files of 10K, 100K, 1000K and 10000K, and the time it takes is recorded (unit: milliseconds).
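The article does not say how the four test files were produced. As a rough sketch only, files of roughly the stated sizes could be generated like this. The class name TestDataGenerator, the root element name result, the address text and every file name other than data_10k.xml are assumptions made for illustration; the element names value, no and addr are taken from the bean code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TestDataGenerator {

    // Builds a document of at least targetBytes bytes made of repeated <value> records.
    // The content is pure ASCII, so the character count equals the UTF-8 byte count.
    public static String generate(long targetBytes) {
        StringBuilder sb = new StringBuilder(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<result>\n"); // root name is assumed
        for (int i = 0; sb.length() < targetBytes; i++) {
            sb.append("<value><no>A").append(i)
              .append("</no><addr>Sichuan Province, XX City, No. ").append(i)
              .append("</addr></value>\n");
        }
        return sb.append("</result>\n").toString();
    }

    public static void main(String[] args) throws IOException {
        int[] sizesInK = {10, 100, 1000, 10000};
        for (int k : sizesInK) {
            String name = "data_" + k + "k.xml"; // naming pattern is assumed
            byte[] bytes = generate(k * 1024L).getBytes(StandardCharsets.UTF_8);
            Files.write(Paths.get(name), bytes);
            System.out.println(name + ": " + bytes.length + " bytes");
        }
    }
}
```

Each file ends up slightly larger than its target because the last record and the closing tag are written after the size check.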

JSP file:

<%@ page contentType="text/html; charset=gb2312" %>
<%@ page import="com.test.*" %>
<%
String args[] = {""};
MyXmlReader.main(args);
%>

Test

First, DOM (JAXP Crimson parser)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure typically requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so an application can change both the data and the structure. The tree can also be navigated up and down at any time, rather than being processed once in a single pass as with SAX. DOM is also much simpler to use.

On the other hand, for particularly large documents, parsing and loading the entire document can be slow and resource-intensive, so such data is better handled by other means, such as event-based models like SAX.
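To make the point about modifying and navigating an in-memory tree concrete, here is a minimal sketch using the JDK's own DOM parser. The class and method names (DomEditSketch, editFirstAddr) and the root element name result are assumptions for illustration. setTextContent and getTextContent come from DOM Level 3, which is newer than the JDK 1.4.1 used in this test.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomEditSketch {

    // Parses a small XML string into an in-memory DOM tree.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Rewrites the first <addr> in place, then walks sideways and upwards from it.
    // Returns "<parent element name>/<text of the preceding sibling>".
    public static String editFirstAddr(Document doc, String newAddr) {
        Element addr = (Element) doc.getElementsByTagName("addr").item(0);
        addr.setTextContent(newAddr);             // modify the tree held in memory
        Node sibling = addr.getPreviousSibling(); // sideways: the <no> element
        Node parent = addr.getParentNode();       // upwards: the <value> element
        return parent.getNodeName() + "/" + sibling.getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<result><value><no>A1234</no><addr>old</addr></value></result>");
        System.out.println(editFirstAddr(doc, "new address"));
        System.out.println(doc.getElementsByTagName("addr").item(0).getTextContent());
    }
}
```

The sample string has no whitespace between elements, which is why the previous sibling of addr is the no element rather than a text node.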

Bean file:

package com.test;

import java.io.*;
import java.util.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;

public class MyXmlReader {

    public static void main(String arge[]) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Builds the whole tree in memory before anything is read from it
            Document doc = builder.parse(f);
            NodeList nl = doc.getElementsByTagName("value");
            for (int i = 0; i < nl.getLength(); i++) {
                // The document is queried again on every iteration, as in the code that was timed
                System.out.print("license plate number:" + doc.getElementsByTagName("no").item(i).getFirstChild().getNodeValue());
                System.out.println("Car owner address:" + doc.getElementsByTagName("addr").item(i).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " ms");
    }
}

10K time consumed (ms): 265 203 219 172

100K time consumed (ms): 9172 9016 8891 9000

1000K time consumed (ms): 691719 675407 708375 739656

10000K time consumed: OutOfMemoryError

Then SAX

The advantages of this kind of processing are much like the advantages of streaming media. Analysis can begin immediately rather than waiting for all the data to be processed. Moreover, because the application only examines the data as it is read, the data does not need to be stored in memory, which is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once some condition is satisfied. In general, SAX is also much faster than its alternative, DOM.
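The remark about stopping once a condition is satisfied can be sketched as follows. SAX has no dedicated "stop" call, so a common idiom is to throw an exception from the handler and catch it around parse(). The names SaxEarlyStop, Found, Finder and positionOf are hypothetical, and the root element name result is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEarlyStop {

    // Hypothetical marker exception, thrown on purpose to abort the parse.
    static class Found extends SAXException {
        Found() { super("target found"); }
    }

    static class Finder extends DefaultHandler {
        private final String wanted;
        private final StringBuilder text = new StringBuilder();
        private boolean inNo;
        int seen; // how many <no> elements have been read so far

        Finder(String wanted) { this.wanted = wanted; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            inNo = qName.equals("no");
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inNo) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (inNo) { // only true while closing a <no> element
                inNo = false;
                seen++;
                if (text.toString().equals(wanted)) throw new Found();
            }
        }
    }

    // Returns how many <no> elements were read up to and including the match, or -1.
    public static int positionOf(String xml, String wanted) {
        Finder finder = new Finder(wanted);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), finder);
            return -1; // reached the end of the document without a match
        } catch (Found stop) {
            return finder.seen; // parsing was abandoned at the match
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result><value><no>A1234</no></value>"
                + "<value><no>B1234</no></value><value><no>C1234</no></value></result>";
        System.out.println(positionOf(xml, "B1234"));
    }
}
```

In the sample, the third record is never delivered to the handler because the parse is abandoned at the second one.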

Choose DOM or choose SAX?

For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision.

DOM accesses an XML document through a tree structure, while SAX uses an event model.

A DOM parser converts the XML document into a tree containing its content, and the tree can then be traversed. The advantage of the DOM model is that programming is easy: the developer only needs to call the instruction that builds the tree and then use the navigation API to reach the tree nodes needed for the task. Elements in the tree can easily be added and modified. However, because a DOM parser has to process the entire XML document, its performance and memory requirements are relatively high, especially for large XML files. Because of its traversal capabilities, the DOM parser is often used in services where XML documents change frequently.

A SAX parser uses an event-based model. It fires a series of events as it parses the XML document. When a given tag is found, it invokes a callback method, telling that method that the tag has been found. SAX usually needs relatively little memory because it lets the developer decide which tags to process. SAX shows its strengths especially when the developer only needs part of the data contained in the document. However, coding against a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.

Bean file:

package com.test;

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class MyXmlReader extends DefaultHandler {

    // Names of the elements opened so far; the top entry is the most recent one
    java.util.Stack tags = new java.util.Stack();

    public MyXmlReader() {
        super();
    }

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXmlReader reader = new MyXmlReader();
            sp.parse(new InputSource("data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " ms");
    }

    // Called for every run of character data, including whitespace between elements
    public void characters(char ch[], int start, int length) throws SAXException {
        String tag = (String) tags.peek();
        if (tag.equals("no")) {
            System.out.print("license plate number:" + new String(ch, start, length));
        }
        if (tag.equals("addr")) {
            System.out.println("Address:" + new String(ch, start, length));
        }
    }

    // Entries are pushed here and never popped, since there is no endElement callback
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        tags.push(qName);
    }
}

10K time consumed (ms): 110 47 109 78

100K time consumed (ms): 344 406 375 422

1000K time consumed (ms): 3234 3281 3688 3312

10000K time consumed (ms): 32578 34313 31797 31890 30328
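The difficulty of correlating several pieces of data with SAX comes from the callbacks arriving one at a time, so the handler has to remember state itself. The following is only a sketch of one way to do that, not the code that was timed above: it buffers text until the closing tag and emits a record when a value element ends. The class name SaxRecordCollector and the root element name result are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordCollector extends DefaultHandler {

    private final List<String[]> records = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String no;
    private String addr;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the element just opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("no")) {
            no = text.toString().trim();
        } else if (qName.equals("addr")) {
            addr = text.toString().trim();
        } else if (qName.equals("value")) {
            records.add(new String[] {no, addr}); // both fields are known by now
            no = null;
            addr = null;
        }
        text.setLength(0); // discard whitespace that follows the closing tag
    }

    // Parses the given XML and returns one {no, addr} pair per <value> element.
    public static List<String[]> collect(String xml) {
        SaxRecordCollector handler = new SaxRecordCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        String xml = "<result>\n <value>\n  <no>A1234</no>\n  <addr>first address</addr>\n </value>\n</result>";
        for (String[] r : collect(xml)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

Handling endElement costs a few extra lines, but it keeps the whitespace between elements from being attributed to the previously opened tag.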

Then JDOM http://www.jdom.org/

The aim of JDOM is to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily promoted. It is being considered for eventual use as a "Java Standard Extension" through Java Specification Request JSR-102. JDOM has been under development since early 2000.

JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some respects but also limits flexibility. Second, the API makes heavy use of the Collections classes, which simplifies things for Java developers already familiar with them. The JDOM documentation states that its purpose is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (the 20% being estimated from the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, it still requires a good enough understanding of XML to do anything beyond the basics (or, in some cases, even to understand the errors). Gaining that understanding is probably a bigger job than learning either the DOM or the JDOM interface.

JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

Bean file:

package com.test;

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.*;

public class MyXmlReader {

    public static void main(String arge[]) {
        long lasting = System.currentTimeMillis();
        try {
            // JDOM has no parser of its own; SAXBuilder builds the tree from SAX events
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(new File("data_10k.xml"));
            Element foo = doc.getRootElement();
            List allChildren = foo.getChildren();
            for (int i = 0; i < allChildren.size(); i++) {
                System.out.print("license plate number:" + ((Element) allChildren.get(i)).getChild("no").getText());
                System.out.println("owner address:" + ((Element) allChildren.get(i)).getChild("addr").getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " ms");
    }
}

10K time consumed (ms): 125 62 187 94

100K time consumed (ms): 704 625 640 766

1000K time consumed (ms): 27984 30750 27859 30656

10000K time consumed: OutOfMemoryError

Finally DOM4J http://dom4j.sourceforge.net/

Although DOM4J represents a completely independent development effort, it began as an intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers the option of building a document representation with parallel access through both the DOM4J API and the standard DOM interface. It has been under development since the second half of 2000. To support all these features, DOM4J uses interfaces and abstract base classes. DOM4J makes heavy use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or a more direct coding approach. The direct benefit is that, although DOM4J pays the price of a more complex API, it offers much greater flexibility than JDOM.

While adding flexibility, XPath integration and the ability to handle large documents, DOM4J shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that handles essentially all Java/XML problems. In pursuing that goal, it puts less emphasis than JDOM on preventing incorrect application behavior.

DOM4J is an excellent Java XML API: it performs very well, is powerful and is extremely easy to use, and it is also open source software. More and more Java software now uses DOM4J to read and write XML; it is especially worth mentioning that even Sun's JAXM uses DOM4J.
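For readers who have not used XPath, here is a rough sketch of the kind of lookup it allows. Note that this does not use DOM4J's integrated XPath support: so that it runs on a plain JDK, it uses the standard javax.xml.xpath package instead. The class and method names (XPathLookupSketch, addrFor) and the root element name result are assumptions.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathLookupSketch {

    // Returns the address recorded for the given plate number, or "" if none matches.
    // The plate number is pasted straight into the expression, which is acceptable
    // only for a sketch with trusted input.
    public static String addrFor(String xml, String no) {
        String expr = "/result/value[no='" + no + "']/addr";
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result>"
                + "<value><no>A1234</no><addr>first address</addr></value>"
                + "<value><no>B1234</no><addr>second address</addr></value>"
                + "</result>";
        System.out.println(addrFor(xml, "B1234"));
    }
}
```

A single expression replaces the explicit loop over value elements; the trade-off is that the whole document still has to be read to evaluate it.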

Bean file:

package com.test;

import java.io.*;
import java.util.*;
import org.dom4j.*;
import org.dom4j.io.*;

public class MyXmlReader {

    public static void main(String arge[]) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            SAXReader reader = new SAXReader();
            Document doc = reader.read(f);
            Element root = doc.getRootElement();
            Element foo;
            // Iterate over the <value> children of the root element
            for (Iterator i = root.elementIterator("value"); i.hasNext();) {
                foo = (Element) i.next();
                System.out.print("license plate number:" + foo.elementText("no"));
                System.out.println("Car owner address:" + foo.elementText("addr"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " ms");
    }
}

10K time consumed (ms): 109 78 109 31

100K time consumed (ms): 297 359 172 312

1000K time consumed (ms): 2281 2359 2344 2469

10000K time consumed (ms): 20938 19922 20031 21078

JDOM and DOM performed poorly in the tests, running out of memory on the 10M document. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said they expect to focus on performance before the formal release, from a performance point of view it really has little to recommend it. DOM, on the other hand, is still a very good choice. DOM implementations are widely available in many programming languages. DOM is also the basis of many other XML-related standards, and because it is an official W3C recommendation (unlike the non-standard Java-based models), it may be required in some types of projects (for example, when using DOM in JavaScript). SAX performs better, which follows from the way it parses: a SAX parser detects the incoming XML stream but does not load it into memory (although, of course, parts of the document are temporarily held in memory while the stream is read).
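To put the raw figures side by side, the four 1000K runs reported for each parser can be averaged. The sketch below does only that arithmetic on the numbers listed in this article; the class name ResultSummary is hypothetical. The averages come out at roughly 703,789 ms for DOM, 29,312 ms for JDOM, 3,379 ms for SAX and 2,363 ms for DOM4J.

```java
public class ResultSummary {

    // Arithmetic mean of a set of timings in milliseconds.
    public static double mean(long... ms) {
        long sum = 0;
        for (long m : ms) {
            sum += m;
        }
        return (double) sum / ms.length;
    }

    public static void main(String[] args) {
        // 1000K results copied from the four test sections of this article
        double dom = mean(691719, 675407, 708375, 739656);
        double sax = mean(3234, 3281, 3688, 3312);
        double jdom = mean(27984, 30750, 27859, 30656);
        double dom4j = mean(2281, 2359, 2344, 2469);

        String[] names = {"DOM", "SAX", "JDOM", "DOM4J"};
        double[] means = {dom, sax, jdom, dom4j};
        for (int i = 0; i < names.length; i++) {
            // The last column shows how many times slower than DOM4J each parser was
            System.out.println(names[i] + ": " + means[i] + " ms, x" + (means[i] / dom4j));
        }
    }
}
```

With only four runs per parser, taken in Debug mode, these averages are a rough guide rather than a precise ranking.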

Without a doubt, DOM4J is the winner of this test. Many open source projects now use DOM4J; the well-known Hibernate, for example, also uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J! (Text / rosen)

Reprints are welcome as long as http://www.javajia.com is retained!

When reprinting, please cite the original address: https://www.9cbs.com/read-91457.html
