Brief Discussion on Working Principles of Different XML Document Models in Java

xiaoxiao2021-03-06  106

Dennis M. Sosnoski (DMS@sosnoski.com), President, Sosnoski Software Solutions, Inc.
February 2002

In this article, XML tools watcher Dennis Sosnoski compares the usability of several Java document models. When you choose a model, it isn't always clear what you're getting, and if you change your mind later you may face a lot of recoding to switch. Combining sample code with an analysis of each model's API, the author gives his view of which models can really make your work easier. The article includes code samples for five different document models.

In the first article of this series, I looked at the performance of the leading XML document models written in Java. However, performance is only part of the story when you're choosing this type of technology. Ease of use is at least as important, and it has been a primary argument for using Java-specific models rather than the language-independent DOM.

To understand which models really work best, you need to know how they rank on usability. In this article I try to do that, starting with sample code that demonstrates how to code common types of operations in each model. I finish with a summary of the results, along with some other factors that can make one model easier to use than another.

See the previous article (linked under Resources) for background on each of the models used in this comparison, including the actual version numbers. Resources also links to the source code download and to the home page of each model.

Code comparison

In these comparisons of usage techniques for the different document models, I show how to perform three basic operations in each model:

Build a document from an input stream

Walk through the elements and content, making some changes:

    Remove leading and trailing whitespace from text content.

    If the resulting text content is empty, delete it.

    Otherwise, wrap it in a new element named "text" in the namespace of the parent element.

Write the modified document to an output stream

The code for these examples is based on the benchmark programs I used in the last article, with some simplifications. The focus of the benchmark programs was to show the best performance for each model; for this article, I instead try to show the easiest way to implement the operations in each model.

I've structured the example for each model as two separate code segments. The first segment is the top-level code that reads a document, calls the modification code, and writes the modified document. The second is a recursive method that actually walks the document representation and performs the modifications. To avoid distractions, I've omitted exception handling from the code.

You can get the complete code for all the samples from the download link at the bottom of this page. The sample download includes a test driver, plus some added code that checks the operation of the different models by counting elements, deletions, and additions.

Even if you don't plan to use a DOM implementation, it's worth skimming the description of DOM usage below. Because the DOM example comes first, I use it to explore some of the issues and the structure of the example in a bit more detail than for the later models. Skimming it fills in details you'd miss if you jumped straight to one of the other models.

DOM

The DOM specification covers all types of operations on document representations, but it doesn't deal with issues such as parsing documents and generating text output. The two DOM implementations included in the performance tests, Xerces and Crimson, use different techniques for these operations. Listing 1 shows one form of the top-level code for Xerces.

Listing 1. Xerces DOM top-level code

 1  // parse the document from input stream ("in")
 2  DOMParser parser = new DOMParser();
 3  parser.setFeature("http://xml.org/sax/features/namespaces", true);
 4  parser.parse(new InputSource(in));
 5  Document doc = parser.getDocument();
 6  // recursively walk and modify document
 7  modifyElement(doc.getDocumentElement());
 8  // write the document to output stream ("out")
 9  OutputFormat format = new OutputFormat(doc);
10  XMLSerializer serializer = new XMLSerializer(out, format);
11  serializer.serialize(doc.getDocumentElement());

As noted in the comments, the first block of code in Listing 1 (lines 1-5) parses the input stream to build a document representation. Xerces defines the DOMParser class to build documents from the output of the Xerces parser. The InputSource class is part of the SAX specification; it can adapt any of several forms of input for use by a SAX parser. The actual parse and document construction are done by a single call, and if this completes successfully, the application can retrieve and use the constructed Document.
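InputSource is a standard SAX class, so the same adaptation works with the JAXP parser built into current JDKs. The following is a minimal sketch under that assumption: it uses the JDK's DocumentBuilder rather than Xerces's DOMParser, and the class and method names are hypothetical. It parses one document from a character Reader and again from a byte stream, each wrapped in an InputSource.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class InputSourceDemo {
    // Parses a document from any InputSource with namespace support enabled.
    static Document parse(InputSource source) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(source);
    }

    // Character input: the InputSource wraps a Reader.
    static String rootFromReader(String xml) throws Exception {
        return parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getNodeName();
    }

    // Byte input: the InputSource wraps an InputStream; the parser decodes it.
    static String rootFromBytes(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return parse(new InputSource(new ByteArrayInputStream(bytes)))
                .getDocumentElement().getNodeName();
    }
}
```

Either way the parser receives the same InputSource type; only the wrapped input differs.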

The second code block (lines 6-7) just passes the root element of the document to the recursive modification method, discussed next. This code is essentially the same for every model, so I skip over it without further discussion in the remaining examples.

The third code block (lines 8-11) writes the document as text to the output stream. Here the OutputFormat class wraps the document and provides various options for formatting the generated text, and the XMLSerializer class handles the actual generation of the output text.

The modify method for Xerces uses only standard DOM interfaces, so it is also compatible with any other DOM implementation. Listing 2 shows the code.

Listing 2. DOM modify method

 1  protected void modifyElement(Element element) {
 2    // loop through child nodes
 3    Node child;
 4    Node next = element.getFirstChild();
 5    while ((child = next) != null) {
 6      // set next before we change anything
 7      next = child.getNextSibling();
 8      // handle child by node type
 9      if (child.getNodeType() == Node.TEXT_NODE) {
10        // trim whitespace from content text
11        String trimmed = child.getNodeValue().trim();
12        if (trimmed.length() == 0) {
13          // delete child if nothing but whitespace
14          element.removeChild(child);
15        } else {
16          // create a "text" element matching parent namespace
17          Document doc = element.getOwnerDocument();
18          String prefix = element.getPrefix();
19          String name = (prefix == null) ? "text" : (prefix + ":text");
20          Element text =
21            doc.createElementNS(element.getNamespaceURI(), name);
22          // wrap the trimmed content with new element
23          text.appendChild(doc.createTextNode(trimmed));
24          element.replaceChild(text, child);
25        }
26      } else if (child.getNodeType() == Node.ELEMENT_NODE) {
27        // handle child elements with recursive call
28        modifyElement((Element) child);
29      }
30    }
31  }

The basic approach of the method in Listing 2 is the same for all the document representations. Called with an element, it walks through the children of that element. If it finds a text content child, it either deletes the text (if it consists only of whitespace) or wraps it in a new element named "text" with the same namespace as the containing element (if there are non-whitespace characters). If it finds a child element, the method calls itself recursively with that child.

For the DOM implementation, I use a pair of references, child and next, to track my position in the ordered list of children. The reference to the next child node is loaded before any other processing is done on the current child node (line 7). This lets me delete or replace the current child node without losing my place in the list.
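An alternative to the child/next pair, sketched here with the JDK's DOM implementation (this is not what Listing 2 does, and the names are hypothetical), is to walk the live NodeList returned by getChildNodes() from its last index to its first. Removing the current node then never shifts the nodes still to be visited. This version only deletes whitespace-only text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BackwardWalk {
    // Removes whitespace-only text children, walking the live list backwards.
    static void stripWhitespace(Element element) {
        NodeList children = element.getChildNodes(); // live view of the children
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                element.removeChild(child); // indices above i were already handled
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace((Element) child);
            }
        }
    }

    // Parses xml, strips whitespace text, and returns the root's remaining child count.
    static int childCountAfterStrip(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        stripWhitespace(doc.getDocumentElement());
        return doc.getDocumentElement().getChildNodes().getLength();
    }
}
```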

When I create a new element to wrap non-whitespace text content (lines 16-24), the DOM interface starts to get a little messy. The methods used to create elements are associated with the document as a whole, so I need to retrieve the owner document of the element I'm working on (line 17). I want to put the new element in the same namespace as the existing parent element, and in the DOM this means I need to construct a qualified name for it. That construction differs depending on whether or not there is a namespace prefix (lines 18-19). With the qualified name of the new element and the namespace URI of the parent, I can create the new element (lines 20-21).

Once the new element exists, I just create a text node holding the trimmed content string, add it to the new element, and then replace the original text node with the newly created element (lines 22-24).
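Lines 16-24 can be tried in isolation with the JDK's DOM implementation. The sketch below pulls just that step into a helper; wrapText and firstWrapper are hypothetical names, and firstWrapper assumes the root's first child is a text node. The test documents cover both a prefixed parent and a default-namespace parent.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class WrapDemo {
    // Replaces textChild of parent with <prefix:text>trimmed</prefix:text>.
    static Element wrapText(Element parent, Node textChild, String trimmed) {
        Document doc = parent.getOwnerDocument();   // factory methods live on the Document
        String prefix = parent.getPrefix();          // null when the parent has no prefix
        String name = (prefix == null) ? "text" : (prefix + ":text");
        Element text = doc.createElementNS(parent.getNamespaceURI(), name);
        text.appendChild(doc.createTextNode(trimmed));
        parent.replaceChild(text, textChild);        // new node first, old node second
        return text;
    }

    // Parses xml and wraps the root's first child, assumed here to be text.
    static Element firstWrapper(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);                 // needed for getPrefix()/getNamespaceURI()
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        Node first = root.getFirstChild();
        return wrapText(root, first, first.getNodeValue().trim());
    }
}
```

Note the argument order of replaceChild(): the new node comes first, the node being replaced second.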

Listing 3. Crimson DOM top-level code

 1  // parse the document from input stream
 2  System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
 3      "org.apache.crimson.jaxp.DocumentBuilderFactoryImpl");
 4  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
 5  dbf.setNamespaceAware(true);
 6  DocumentBuilder builder = dbf.newDocumentBuilder();
 7  Document doc = builder.parse(in);
 8  // recursively walk and modify document
 9  modifyElement(doc.getDocumentElement());
10  // write the document to output stream
11  ((XmlDocument) doc).write(out);

The Crimson DOM sample code in Listing 3 uses the JAXP interface for parsing. JAXP provides a standardized interface for parsing and transforming XML documents. The parsing code in this example could also be used for Xerces (with the appropriate setting of the property that names the document builder factory class) in place of the earlier Xerces-specific example code.

In this example, I first set a system property in lines 2-3 to select the builder factory class to use (JAXP only supports building DOM representations, not any of the other representations discussed in this article). This step is needed only when you want to select a specific DOM implementation for JAXP to use; otherwise it uses a default implementation. For completeness I set the property in the code, but it is more common to set it as a JVM command-line parameter.

I then create an instance of the builder factory in lines 4-6, enable namespace support for builders constructed by that factory instance, and create a document builder from the factory. Finally (line 7), I use the document builder to parse the input stream and construct the document representation.
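The namespace setting on line 5 matters for code like Listing 2, which relies on getPrefix() and getNamespaceURI(). As a sketch using the JDK's default JAXP implementation (describeRoot is a hypothetical helper), the same prefixed document is parsed with and without namespace awareness. JAXP factories are not namespace-aware by default, and without awareness the elements are built without namespace information, so getLocalName() and getNamespaceURI() return null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceAwareDemo {
    // Returns "nodeName|localName|namespaceURI" for the root element.
    static String describeRoot(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware);   // JAXP's default is false
        Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        // String concatenation renders a null reference as "null".
        return root.getNodeName() + "|" + root.getLocalName() + "|" + root.getNamespaceURI();
    }
}
```

For the document `<x:doc xmlns:x="urn:demo"/>`, the aware parse reports `x:doc|doc|urn:demo` and the unaware parse reports `x:doc|null|null`.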

To write out the document, I use a basic method defined internally in Crimson. There's no guarantee that future versions of Crimson will support this method, but the alternative, using the JAXP transform code to output the document as text, requires an XSL processor such as Xalan. That's beyond the scope of this article, but for more information you can check Sun's JAXP tutorial.
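For comparison, the JAXP alternative can be sketched as an identity transform: a Transformer created without a stylesheet copies its source to its result unchanged. Current JDKs bundle a TransformerFactory implementation, so on a modern JVM this needs no separately installed XSL processor; that is an observation about today's Java platform rather than the 2002 setup described above. IdentityWrite and roundTrip are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class IdentityWrite {
    // Writes a DOM document to an output stream using an identity transform.
    static void write(Document doc, OutputStream out) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // no stylesheet = identity
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Parses an XML string and writes it straight back out.
    static String roundTrip(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(doc, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```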

JDOM

The top-level code using JDOM is simpler than the code for the DOM implementations. To build a document representation (lines 1-3), I use a SAXBuilder with validation disabled by the constructor parameter. Writing the modified document to the output stream is just as easy, using the supplied XMLOutputter class (lines 6-8).

Listing 4. JDOM top-level code

1  // parse the document from input stream
2  SAXBuilder builder = new SAXBuilder(false);
3  Document doc = builder.build(in);
4  // recursively walk and modify document
5  modifyElement(doc.getRootElement());
6  // write the document to output stream
7  XMLOutputter outer = new XMLOutputter();
8  outer.output(doc, out);

The JDOM modify method in Listing 5 is also simpler than the equivalent DOM method. I get the list of all content for the element and scan through it, checking for text (represented as String objects) and elements. The list is "live," so I can change it directly without calling methods on the parent element.

Listing 5. JDOM modify method

 1  protected void modifyElement(Element element) {
 2    // loop through child nodes
 3    List children = element.getContent();
 4    for (int i = 0; i < children.size(); i++) {
 5      // handle child by node type
 6      Object child = children.get(i);
 7      if (child instanceof String) {
 8        // trim whitespace from content text
 9        String trimmed = child.toString().trim();
10        if (trimmed.length() == 0) {
11          // delete child if only whitespace (adjusting index)
12          children.remove(i--);
13        } else {
14          // wrap the trimmed content with new element
15          Element text = new Element("text", element.getNamespace());
16          text.setText(trimmed);
17          children.set(i, text);
18        }
19      } else if (child instanceof Element) {
20        // handle child elements with recursive call
21        modifyElement((Element) child);
22      }
23    }
24  }
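The children.remove(i--) idiom on line 12 is ordinary java.util.List behavior: removing index i shifts every later item down by one, so stepping the index back makes the loop revisit position i instead of skipping the item that moved into it. The sketch below shows the pattern on a plain List standing in for JDOM content; strings play the part of text, other objects stand in for elements, and bracketing a string is a hypothetical substitute for wrapping it in a new element.

```java
import java.util.List;

final class LiveListEdit {
    // Deletes blank strings and replaces the others in place.
    // "[" + trimmed + "]" stands in for wrapping the text in a new element.
    static List<Object> process(List<Object> children) {
        for (int i = 0; i < children.size(); i++) {
            Object child = children.get(i);
            if (child instanceof String) {
                String trimmed = child.toString().trim();
                if (trimmed.length() == 0) {
                    children.remove(i--);            // remove(int), then revisit index i
                } else {
                    children.set(i, "[" + trimmed + "]"); // size unchanged, no adjustment
                }
            }
        }
        return children;
    }
}
```

Without the decrement, the second of two adjacent blank strings would be skipped and left in the list.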

The technique for creating new elements (lines 14-17) is very simple, and unlike the DOM version it doesn't need to access the parent document.
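The DOM version needs the owner document because every DOM node belongs to the Document that created it. The sketch below, using only the JDK's DOM implementation (class and method names are hypothetical), shows that appending an element created by one Document to another fails with WRONG_DOCUMENT_ERR, and that Document.importNode() produces a copy owned by the target document.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class OwnerDocumentDemo {
    // Returns the DOMException code from a cross-document append, or -1 if it succeeded.
    static int crossDocumentAppendCode() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document target = builder.newDocument();
        target.appendChild(target.createElement("root"));
        Element stranger = builder.newDocument().createElement("text"); // owned by another Document
        try {
            target.getDocumentElement().appendChild(stranger);
            return -1;
        } catch (DOMException e) {
            return e.code;                                   // expected: WRONG_DOCUMENT_ERR
        }
    }

    // Copies the foreign element into the target document first, then appends the copy.
    static boolean importThenAppend() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document target = builder.newDocument();
        target.appendChild(target.createElement("root"));
        Element stranger = builder.newDocument().createElement("text");
        Node copy = target.importNode(stranger, true);      // deep copy owned by target
        target.getDocumentElement().appendChild(copy);
        return copy.getOwnerDocument() == target;
    }
}
```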

dom4j

The top-level code for dom4j is slightly more complex than that for JDOM, but the lines of code are very similar. The main differences are that I save the DocumentFactory used to build the dom4j document (line 5), and that I flush the writer after outputting the modified document text.

Listing 6. dom4j top-level code

 1  // parse the document from input stream
 2  SAXReader reader = new SAXReader(false);
 3  Document doc = reader.read(in);
 4  // recursively walk and modify document
 5  m_factory = reader.getDocumentFactory();
 6  modifyElement(doc.getRootElement());
 7  // write the document to output stream
 8  XMLWriter writer = new XMLWriter(out);
 9  writer.write(doc);
10  writer.flush();

As Listing 6 shows, dom4j uses a factory approach to construct the objects contained in the document representation (built from a parse). Each of the component objects is defined in terms of interfaces, so any type of object that implements one of the interfaces can be included in the representation. (This contrasts with JDOM, which uses concrete classes: these classes can be subclassed, but any class used in a document representation must be based on the original JDOM class.) By using different factories for dom4j document construction, you can get documents built from different families of components.

In the sample code (line 5), I retrieve the (default) document factory that was used to build the document and store it in an instance variable (m_factory) for use by the modify method. This step isn't strictly necessary, since you can use components from different factories in one document, or bypass the factory and create instances of the components directly. In this case, though, I just want to create the same type of components used in the rest of the document, and using the same factory ensures this.
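The benefit of pairing interfaces with a factory can be sketched without dom4j itself. In the toy code below, ToyNode, ToyNodeFactory, MarkupFactory, DebugFactory, and FactoryClient are all hypothetical names invented for illustration, not dom4j API. Code that holds only a factory reference, like the m_factory field above, builds whichever family of components that factory produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component interface (not org.dom4j.Node).
interface ToyNode {
    String render();
}

// Hypothetical factory interface; each implementation builds one family of nodes.
interface ToyNodeFactory {
    ToyNode createElement(String name, String text);
}

// One family: renders as markup.
final class MarkupFactory implements ToyNodeFactory {
    public ToyNode createElement(String name, String text) {
        return () -> "<" + name + ">" + text + "</" + name + ">";
    }
}

// Another family: renders as a debugging form.
final class DebugFactory implements ToyNodeFactory {
    public ToyNode createElement(String name, String text) {
        return () -> name + "(" + text + ")";
    }
}

final class FactoryClient {
    private final ToyNodeFactory m_factory; // saved once, reused for every new node

    FactoryClient(ToyNodeFactory factory) {
        this.m_factory = factory;
    }

    // Wraps each string in a "text" node built by the saved factory.
    String wrapAll(List<String> texts) {
        List<String> out = new ArrayList<>();
        for (String t : texts) {
            out.add(m_factory.createElement("text", t).render());
        }
        return String.join("", out);
    }
}
```

Swapping the factory passed to FactoryClient changes every node it creates without touching wrapAll.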

Listing 7. dom4j modify method

 1  protected void modifyElement(Element element) {
 2    // loop through child nodes
 3    List children = element.content();
 4    for (int i = 0; i < children.size(); i++) {
 5      // handle child by node type
 6      Node child = (Node) children.get(i);
 7      if (child.getNodeType() == Node.TEXT_NODE) {
 8        // trim whitespace from content text
 9        String trimmed = child.getText().trim();
10        if (trimmed.length() == 0) {
11          // delete child if only whitespace (adjusting index)
12          children.remove(i--);
13        } else {
14          // wrap the trimmed content with new element
15          Element text = m_factory.createElement
16            (QName.get("text", element.getNamespace()));
17          text.addText(trimmed);
18          children.set(i, text);
19        }
20      } else if (child.getNodeType() == Node.ELEMENT_NODE) {
21        // handle child elements with recursive call
22        modifyElement((Element) child);
23      }
24    }
25  }

The dom4j modify method in Listing 7 is very similar to the one used for JDOM. Rather than using the instanceof operator to check the type of each content item, I can get a type code through the Node interface method getNodeType() (instanceof would also work, but the type-code approach seems clearer). The technique for creating the new element (lines 15-16) differs in using a QName object to represent the element name and in building the element by calling the saved factory.
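The W3C DOM exposes the same kind of type code through Node.getNodeType(), and because the codes are compile-time constants they also work as switch labels. A minimal JDK-only sketch (names hypothetical) that tallies elements, text nodes, and comments:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TypeCodeDispatch {
    // Counts nodes at and below "node"; counts = {elements, texts, comments}.
    static int[] tally(Node node, int[] counts) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                counts[0]++;
                break;
            case Node.TEXT_NODE:
                counts[1]++;
                break;
            case Node.COMMENT_NODE:
                counts[2]++;
                break;
            default:
                break; // the document node and other kinds are not counted
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            tally(c, counts);
        }
        return counts;
    }

    // Parses xml and tallies the whole document.
    static int[] tallyXml(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return tally(dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))), new int[3]);
    }
}
```

For `<a>x<!--c--><b>y</b></a>` the tally is two elements, two text nodes, and one comment.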

Electric XML

The top-level code for Electric XML (EXML) in Listing 8 is the simplest of any of these examples, with a single method call each to read and to write the document.

Listing 8. EXML top-level code

1  // parse the document from input stream
2  Document doc = new Document(in);
3  // recursively walk and modify document
4  modifyElement(doc.getRoot());
5  // write the document to output stream
6  doc.write(out);

The EXML modify method in Listing 9, although it needs instanceof checks like JDOM, is most similar to the DOM method. EXML provides no way to create an element with a namespace-qualified name, so instead I create the new element and then set its name afterward to achieve the same effect.
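Standard DOM can follow the same create-then-name order as EXML through Document.renameNode(), a DOM Level 3 method. The sketch below uses the JDK's DOM implementation; wrapThenRename is a hypothetical helper that assumes the root's first child is text. The DOM specification allows renameNode() to return a new node in place of the one passed in, so the code keeps using the returned reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RenameDemo {
    // Wraps the root's first child and returns "nodeName|namespaceURI|content".
    static String wrapThenRename(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Element parent = doc.getDocumentElement();
        Node child = parent.getFirstChild();

        // Step 1: create an element with no namespace, fill it, and put it in place.
        Element text = doc.createElementNS(null, "text");
        text.appendChild(doc.createTextNode(child.getNodeValue().trim()));
        parent.replaceChild(text, child);

        // Step 2: only now give it the parent's namespace and prefix.
        String prefix = parent.getPrefix();
        String qname = (prefix == null) ? "text" : (prefix + ":text");
        Node renamed = doc.renameNode(text, parent.getNamespaceURI(), qname);
        return renamed.getNodeName() + "|" + renamed.getNamespaceURI() + "|" + renamed.getTextContent();
    }
}
```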

Listing 9. EXML modify method

 1  protected void modifyElement(Element element) {
 2    // loop through child nodes
 3    Child child;
 4    Child next = element.getChildren().first();
 5    while ((child = next) != null) {
 6      // set next before we change anything
 7      next = child.getNextSibling();
 8      // handle child by node type
 9      if (child instanceof Text) {
10        // trim whitespace from content text
11        String trimmed = ((Text) child).getString().trim();
12        if (trimmed.length() == 0) {
13          // delete child if only whitespace
14          child.remove();
15        } else {
16          // wrap the trimmed content with new element
17          Element text = new Element();
18          text.addText(trimmed);
19          child.replaceWith(text);
20          text.setName(element.getPrefix(), "text");
21        }
22      } else if (child instanceof Element) {
23        // handle child elements with recursive call
24        modifyElement((Element) child);
25      }
26    }
27  }

XPP

The top-level code for XPP (in Listing 10) is the longest of all the examples, requiring a considerable amount of setup compared with the other models.

Listing 10. XPP top-level code

 1  // parse the document from input stream
 2  m_parserFactory = XmlPullParserFactory.newInstance();
 3  m_parserFactory.setNamespaceAware(true);
 4  XmlPullParser parser = m_parserFactory.newPullParser();
 5  parser.setInput(new BufferedReader(new InputStreamReader(in)));
 6  parser.next();
 7  XmlNode doc = m_parserFactory.newNode();
 8  parser.readNode(doc);
 9  // recursively walk and modify document
10  modifyElement(doc);
11  // write the document to output stream
12  XmlRecorder recorder = m_parserFactory.newRecorder();
13  Writer writer = new OutputStreamWriter(out);
14  recorder.setOutput(writer);
15  recorder.writeNode(doc);
16  writer.close();

Because XPP uses a factory-based interface similar to JAXP's, I first need to create an instance of the parser factory and enable namespace processing before creating a parser instance (lines 2-4). Once I have the parser instance, I can set its input and actually build the document representation (lines 5-8), but this takes more steps than with the other models.

Output handling (lines 11-16) also involves more steps than for the other models, mainly because XPP requires a Writer rather than accepting a stream directly as the output target.
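Bridging from an OutputStream to a Writer is ordinary java.io work, but two details are worth making explicit: the character encoding (the sketch names UTF-8 rather than relying on the platform default, which is what Listing 10's OutputStreamWriter(out) uses) and flushing, because OutputStreamWriter buffers encoded bytes. StreamToWriter is a hypothetical helper, not part of XPP.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class StreamToWriter {
    // Writes text to a byte stream through a Writer with an explicit encoding.
    static void writeText(OutputStream out, String text) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush(); // push buffered bytes to "out" without closing it
    }

    // Returns the UTF-8 bytes produced for the given text.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeText(out, text);
        return out.toByteArray();
    }
}
```

Listing 10 closes the writer instead of flushing it; OutputStreamWriter.close() flushes and also closes the underlying stream, which is fine at the end of processing but not when more output must follow.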

The XPP modify method in Listing 11 is most similar to the JDOM method, although more code is needed to create the new element (lines 13-20). Namespace handling is a bit of a nuisance here: I first have to construct the qualified name (lines 14-15), then create the element, and only afterward set its name and namespace URI (lines 17-20).

Listing 11. XPP modify method

 1  protected void modifyElement(XmlNode element) throws Exception {
 2    // loop through child nodes
 3    for (int i = 0; i < element.getChildrenCount(); i++) {
 4      // handle child by node type
 5      Object child = element.getChildAt(i);
 6      if (child instanceof String) {
 7        // trim whitespace from content text
 8        String trimmed = child.toString().trim();
 9        if (trimmed.length() == 0) {
10          // delete child if only whitespace (adjusting index)
11          element.removeChildAt(i--);
12        } else {
13          // construct qualified name for wrapper element
14          String prefix = element.getPrefix();
15          String name = (prefix == null) ? "text" : (prefix + ":text");
16          // wrap the trimmed content with new element
17          XmlNode text = m_parserFactory.newNode();
18          text.appendChild(trimmed);
19          element.replaceChildAt(i, text);
20          text.modifyTag(element.getNamespaceUri(), "text", name);
21        }
22      } else if (child instanceof XmlNode) {
23        // handle child elements with recursive call
24        modifyElement((XmlNode) child);
25      }
26    }
27  }

Conclusion

JDOM, dom4j, and Electric XML are all about equally easy to use for the code samples, with EXML perhaps slightly simpler and dom4j a little more difficult, subject to some minor qualifications. DOM offers the very real benefit of language independence, but if you're working only in Java code it looks somewhat cumbersome compared with the Java-specific models. I think this shows that the Java-specific models have generally succeeded in their goal of simplifying the handling of XML documents in Java code.

Beyond basics: Real-world usability

The code samples show JDOM and EXML providing simple and clear interfaces for basic document operations (working with elements, attributes, and text). In my experience, their approaches don't work as well for programming tasks that deal with the entire document representation. For these types of tasks, the component approach used by DOM and dom4j, where all document components from attributes to namespaces implement some common interfaces, works better.

A related example is the XML Streaming (XMLS) encoding I recently implemented for both JDOM and dom4j. This code walks through the entire document and encodes each component. The JDOM implementation is much more complex than the dom4j one, mainly because JDOM uses unique classes without common interfaces to represent each component.

Because JDOM lacks common interfaces, the code for Document objects and for Element objects must be written separately, even though the two handle some components, such as children, in the same way. Special methods are also needed to retrieve Namespace components, as opposed to other types of children. Even when processing the types of child components considered content, you need multiple if statements with instanceof checks on the component type instead of a cleaner and faster switch statement. Ironically, one of the original goals of JDOM was to take advantage of the Java Collections classes, which are largely interface-based. Using interfaces in a library adds a lot of flexibility at the cost of some added complexity, and this is usually a good trade-off for code designed for reuse. It may also be a large part of why dom4j has reached a mature and stable state much sooner than JDOM.
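To illustrate why a common interface helps in whole-document code, the sketch below (JDK DOM only; the one-letter output is an invented format, not the XMLS encoding) visits every component, attributes included, through the single org.w3c.dom.Node type, with one switch and no per-class special cases.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class UniformWalk {
    // Appends one letter per component: E=element, A=attribute, T=text, C=comment.
    static void encode(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:   out.append('E'); break;
            case Node.ATTRIBUTE_NODE: out.append('A'); break;
            case Node.TEXT_NODE:      out.append('T'); break;
            case Node.COMMENT_NODE:   out.append('C'); break;
            default:                  break; // document node and others get no letter
        }
        NamedNodeMap attrs = node.getAttributes(); // non-null only for elements
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                encode(attrs.item(i), out);        // attributes are Nodes too
            }
        }
        if (node.getNodeType() != Node.ATTRIBUTE_NODE) { // skip an attribute's value children
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                encode(c, out);
            }
        }
    }

    // Parses xml and returns the encoding of the whole document.
    static String encodeXml(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        StringBuilder out = new StringBuilder();
        encode(dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml))), out);
        return out.toString();
    }
}
```

For `<a id="1">x<!--c--><b/></a>` the encoding is `EATCE`: the root element, its attribute, the text, the comment, and the empty child element.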

Despite this, DOM remains a very good choice for developers working in multiple languages. DOM implementations are widely available for a variety of programming languages, and DOM is still the basis for many other XML-related standards, so even if you use a Java-specific model there's a good chance you'll gradually need to become familiar with DOM. Because it is an official W3C Recommendation (as opposed to the non-standard Java models), it may also be required for some types of projects.

Of the three main Java-specific contenders, JDOM, dom4j, and Electric XML, dom4j differs from the other two in using an interface-based approach with multiple layers of inheritance. This can make the API Javadocs harder to follow. For instance, a method you're looking for (such as content(), used in line 3 of the dom4j modify method example) may be part of the Branch interface that Element extends, rather than part of the Element interface itself. Nonetheless, this interface-based design adds a lot of flexibility (see the sidebar "Beyond basics"). Considering dom4j's advantages in performance, stability, and feature set, you should treat it as a strong candidate for most projects.

Among the Java-specific document models, JDOM probably has the broadest user base, and it is indeed one of the simplest models to use. Nonetheless, as a choice for project development it still suffers from an API that is unconventional in places and that changes from one version to the next, and it also performed poorly in the performance comparisons. Based on the current implementations, I'd recommend dom4j rather than JDOM to people starting new projects.

Apart from XPP, EXML has a much smaller footprint than any of the other models, and given EXML's ease-of-use advantages, you should definitely consider it for applications where jar file size matters. However, EXML's limited XML support, its restrictive license, and its relatively poor performance on larger files rule it out for many applications.

XPP requires more steps when parsing and writing text documents, and more steps again when handling namespaces. If XPP added some convenience methods for the common cases, it would probably compare better. As things stand, the performance leader of the last article has become the usability loser of this one. Still, because of its performance advantages, XPP is worth considering as an alternative to EXML for applications that need small jar files.

Next time...

The two articles so far in this series have covered the performance and usability of XML document models written in Java. In the next two articles, I'll discuss approaches to XML data binding with Java technology. These approaches have many similarities with document models, but they go further by mapping XML documents to actual application data structures. We'll see how well this works in terms of both ease of use and performance. Check back at developerWorks for the next installment on XML data binding in Java code. In the meantime, you can send comments and questions about this article through the forum link below.
