XML from IBM Developer Works
Introduction: This article briefly describes the concept of DOM and its internal logical structure, and walks through a Java implementation of manipulating a DOM document and converting between a DOM document and an XML file. Author: Guo Hongfeng, mainly engaged in the development and research of distributed applications on UNIX systems. You can contact him by email at ghf_email@sohu.com.
1. DOM introduction
The W3C released the DOM Level 2 recommendation on November 13, 2000. The Document Object Model (DOM) is a programming interface specification for HTML and XML documents that is independent of platform and language, so it can be implemented on any platform in any language. The model defines the logical structure of HTML and XML files (that is, documents) and provides ways to access and manipulate them. Using the DOM specification, an XML file and a DOM document can be converted into each other, the document can be traversed, and the contents of the corresponding DOM document can be manipulated. In short, the DOM specification lets you manipulate XML files freely.
2. The internal logical structure of DOM
The logical structure of a DOM document can be expressed as a node tree. When an XML file is processed, the elements in the XML file are turned into node objects in the DOM document. A DOM document has node types such as Document, Element, Text, Comment, and DocumentType. Every DOM document must have exactly one Document node, which is the root of the node tree; it can have child nodes, as well as leaf nodes such as Text nodes and Comment nodes. Every element in a well-formed XML file corresponds to a node type in the DOM document. Once the XML file has been turned into a DOM document through the DOM interfaces, we can process the XML file freely.
3. The DOM interfaces in Java
The DOM interfaces defined by the DOM specification are provided as a Java API in Sun's JDK 1.4 beta, which follows the semantics of the DOM Level 2 Core recommendation and supplies the corresponding Java implementation.
In org.w3c.dom, JDK 1.4 provides interfaces such as Document, DocumentType, Node, NodeList, Element, and Text, which are all that is needed to access a DOM document. We can use these interfaces to create, traverse, and modify DOM documents.
In javax.xml.parsers, JDK 1.4 provides the DocumentBuilder and DocumentBuilderFactory classes, which together parse an XML file into a DOM document.
In javax.xml.transform.dom and javax.xml.transform.stream, JDK 1.4 provides the DOMSource and StreamResult classes, which can be used to write an updated DOM document back out as an XML file.
4. Examples
4.1 Transforming an XML file into a DOM document
This step obtains an XML parser and uses it to parse the XML file into a DOM document.
In JDK 1.4, the Document interface describes the document tree corresponding to an entire XML file and provides access to its data; obtaining a Document is the goal of this step. A Document instance is obtained from the class DocumentBuilder, which contains the API for producing a DOM document instance from an XML document; the parser itself is obtained from the class DocumentBuilderFactory. In JDK 1.4, transforming an XML file into a DOM document can be implemented with the following code:
// Obtain a factory that produces parsers for XML files
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Obtain a builder that parses the XML file and produces the Document
// interface used to access the DOM document
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(filename));
4.2 Traversing the DOM document
Once an instance of the Document interface has been obtained, the DOM document tree can be accessed. To traverse the DOM document, first get the root element, then get the list of child nodes of the root element; the traversal itself is done with a recursive method.
// Get the root element
Element element = document.getDocumentElement();
// Get the list of child nodes of the root element
NodeList nodeList = element.getChildNodes();
// Traverse the DOM document with the recursive method
getElement(nodeList);
The getElement method is implemented as follows:
public void getElement(NodeList nodeList) {
    Node cNode;
    int i, len;
    String str;
    if (nodeList.getLength() == 0) {
        // This node has no child nodes
        return;
    }
    for (i = 0; i < nodeList.getLength(); i++) {
        cNode = nodeList.item(i);
        if (cNode.getNodeType() == Node.ELEMENT_NODE) {
            // Print the element name and recurse into its children
            System.out.println(cNode.getNodeName());
            getElement(cNode.getChildNodes());
        } else if (cNode.getNodeType() == Node.TEXT_NODE) {
            // Print non-trivial text content together with its length
            str = cNode.getNodeValue().trim();
            len = str.length();
            if (len > 1) {
                System.out.println(str + " " + len);
            }
        }
    }
}
Note: The code above only displays nodes of the Element and Text types, whose node type identifiers are 1 and 3, respectively.
4.3 Modifying the DOM document
The APIs for modifying a DOM document are defined in the DOM Level 2 Core specification and implemented in org.w3c.dom in JDK 1.4. The modification methods are found mainly in the Document, Element, Node, and Text classes. The example here adds a series of objects to the parsed DOM document, which corresponds to adding a record to the XML file.
// Get the root element
Element root = document.getDocumentElement();
// Add an Element node to the DOM document
Element bookType = document.createElement("computers");
// Make this node a child of the root element
root.appendChild(bookType);
// Add another Element node to the DOM document
Element bookTitle = document.createElement("Title");
// Make this node a child of the bookType node
bookType.appendChild(bookTitle);
// Add a Text node to the DOM document
Text bookName = document.createTextNode("Understand Corba");
// Make this node a child of the bookTitle node
bookTitle.appendChild(bookName);
4.4 Transforming the DOM document into an XML file
// To convert the DOM document into an XML file, JDK 1.4 provides the class
// TransformerFactory; the class Transformer implements the conversion API.
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
// Wrap the DOM document in a DOMSource object, the container for the
// information to be converted into another representation.
DOMSource source = new DOMSource(document);
// Create a StreamResult object, the container for the output of the
// transformation, which can be an XML file, a text file, or an HTML file.
// Here it is an XML file.
StreamResult result = new StreamResult(new File("text.xml"));
// Call the transformation API to convert the DOM document into an XML file.
transformer.transform(source, result);
The complete program for this example was provided with the original article; it runs under JDK 1.4 on Windows 2000. With this example, the reader can understand the basic idea of manipulating the DOM. Because the operations follow the DOM specification, the same approach also applies to DOM processing in other languages.
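The complete program referred to above is not reproduced here. The following self-contained sketch simply strings the four steps together (parse, traverse, modify, write back); the input file name books.xml and the class name DomExample are assumptions made for illustration, not the author's original code.

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

public class DomExample {
    public static void main(String[] args) throws Exception {
        // 4.1 Parse the XML file into a DOM document (file name is an assumption)
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new File("books.xml"));

        // 4.2 Traverse the DOM document
        getElement(document.getDocumentElement().getChildNodes());

        // 4.3 Modify the DOM document
        Element root = document.getDocumentElement();
        Element bookType = document.createElement("computers");
        root.appendChild(bookType);
        Element bookTitle = document.createElement("Title");
        bookType.appendChild(bookTitle);
        Text bookName = document.createTextNode("Understand Corba");
        bookTitle.appendChild(bookName);

        // 4.4 Write the DOM document back out as an XML file
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        transformer.transform(new DOMSource(document),
                              new StreamResult(new File("text.xml")));
    }

    // Recursive traversal: print Element and Text nodes
    private static void getElement(NodeList nodeList) {
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node cNode = nodeList.item(i);
            if (cNode.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println(cNode.getNodeName());
                getElement(cNode.getChildNodes());
            } else if (cNode.getNodeType() == Node.TEXT_NODE) {
                String str = cNode.getNodeValue().trim();
                if (str.length() > 1) {
                    System.out.println(str + " " + str.length());
                }
            }
        }
    }
}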
Reference:
1. http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113
This article demystifies JAXP, Sun's Java API for XML, helping to clear up doubts about what JAXP is and what it is for. It explains the basic concepts of JAXP, demonstrates XML parsing with JAXP, and shows how easily the parser used by JAXP can be changed. The article goes on to describe how the two popular Java and XML APIs, SAX and DOM, relate to JAXP.
Java and XML have made news in every technical field and, for software developers, seemed to be the most important developments of 1999 and 2000. As a result, the number of Java and XML APIs has surged. The two most popular, DOM and SAX, have attracted great interest, and JDOM and the data-binding APIs have followed one after another. Thoroughly understanding even one or two of these technologies is a daunting task, and correctly using all of them would make you an expert. Over the last year another API has also left a deep impression: Sun's Java API for XML, usually known as JAXP. Given that Sun has XML products on its platform, this progress is not surprising. What is surprising is the lack of understanding of JAXP: most developers who use it have misconceptions about what this API actually is.
What is JAXP?
This article assumes that you have basic knowledge of SAX and DOM; there is not enough space here to explain SAX, DOM, and JAXP all at once. If you are new to XML parsing, you may want to read up on SAX and DOM through online resources or browse my book (Resources has links to the APIs and to my book). With that basic knowledge in hand, the rest will make more sense.
API or abstraction?
Before getting into code, a few basic concepts need to be introduced. Strictly speaking, JAXP is an API, but it is more accurately called an abstraction layer. It does not provide a new way to process XML, it does not complement SAX or DOM, and it does not add new functionality to Java and XML processing. (If you understand this, the rest of the article will be easy.) Instead, it makes some tasks that are awkward with DOM and SAX easier to do, and it makes it possible to handle tasks that are vendor specific in a vendor-independent way.
Although each of these points is discussed separately, the one thing you really need to grasp is this: JAXP does not provide parsing! Without SAX, DOM, or another XML parser, XML cannot be parsed. Many people have asked me to compare DOM, SAX, or JDOM with JAXP. Such comparisons are impossible, because the first three APIs have a completely different purpose than JAXP. SAX, DOM, and JDOM parse XML; JAXP provides a way to get to those parsers and their results. It does not offer a new way to parse a document. If you want to use JAXP correctly, you must be clear about this, and it will give you a big advantage over other XML developers. If you still have doubts (or think I am making this up), download the JAXP distribution from Sun's web site (see Resources) and see for yourself what JAXP basically is: the JAR (jaxp.jar) contains only six classes! How hard can such an API be? All of these classes (part of the javax.xml.parsers package) sit on top of an existing parser, and two of them are only for error handling. JAXP is much simpler than people think. So why is there still confusion?
Sun's JAXP and Sun's parser
The JAXP download from Sun includes a parser. All of its classes live in parser.jar, as part of the com.sun.xml.parser package and related subpackages. Be aware that this parser (code-named Crimson) is not part of JAXP itself: it ships with the JAXP release, but it is not part of the JAXP API. Confusing? A little. Think of it this way: JDOM ships with the Apache Xerces parser. That parser is not part of JDOM, but it is used by JDOM, so it is included to ensure that JDOM can be used on its own. The same holds here, only stated less clearly: JAXP ships with Sun's parser so that you can use it immediately. However, many people treat the classes included in Sun's parser as if they were part of the JAXP API. For example, a common question in the newsgroups is: "How do I use the XmlDocument class in JAXP? What is it for?" The answer is a little complicated.
First, the com.sun.xml.tree.XmlDocument class is not part of JAXP; it is part of Sun's parser. So the question is misleading from the start. Second, the whole point of JAXP is vendor independence when working with a parser: the same JAXP code can be used with Sun's XML parser, Apache's Xerces XML parser, and Oracle's XML parser. Using a Sun-specific class is a bad idea, because it defeats the entire purpose of JAXP. Is the source of the confusion clear now? The parser and the JAXP API shipped in the release (at least in Sun's version) are mixed together, so developers mistake the classes and features of one for the other, and vice versa.
Old and new
Finally, it should be pointed out that JAXP has some shortcomings. For example, JAXP 1.0 supports only SAX 1.0 and the DOM Level 1 specification. SAX 2.0 was completed in May 2000, and DOM Level 2 has been supported in most parsers for even longer. DOM Level 2 is not yet final, but it is certainly stable enough for production use. The new versions of these two APIs bring significant improvements, most notably support for XML namespaces, which in turn enables XML Schema validation, another popular XML-related technology. To be fair, when JAXP 1.0 went final, SAX 2.0 and DOM Level 2 were not yet complete. Still, the absence of these new versions is a real inconvenience for developers. You can use JAXP now, or you can wait for JAXP 1.1, which supports SAX 2.0 and DOM Level 2. Otherwise, you will find that the advantages JAXP provides come at the expense of the features in the latest SAX and DOM versions, and make your applications harder to code. Whether or not you wait for the next JAXP release, keep this issue in mind: if you use JAXP with a parser whose supported DOM and SAX versions are newer than what JAXP supports, you may run into classpath problems. So take note in advance, and upgrade as soon as JAXP 1.1 is available. With this basic understanding of JAXP, let's look at the APIs that JAXP depends on: SAX and DOM.
Starting with SAX
SAX (the Simple API for XML) is an event-driven way of processing XML. It basically consists of a number of callbacks. For example, whenever the SAX parser encounters an element's start tag, it calls startElement(). For a run of character data, the characters() callback is invoked, and then endElement() is called at the element's end tag. There are many more callbacks for document processing, errors, and other lexical structures. You can see how this works: the SAX programmer implements one of the SAX interfaces that define these callbacks. SAX also provides a class named HandlerBase that implements all of these interfaces and supplies default, empty implementations of all the callback methods. (I mention this because it matters for the DOM discussion later.) The SAX developer only needs to extend this class and implement the methods that require application-specific logic. The key to SAX, then, is providing code for these various callbacks and letting the parser trigger each of them at the appropriate time.
Therefore, the typical SAX process is as follows:
Create a SAXParser instance using a specific vendor's parser
Register callback implementations (for example, by extending HandlerBase)
Start parsing and relax while the callback implementations are triggered
JAXP's SAX component provides a simple way to perform all of these steps. Without JAXP, a SAX parser instance is either created directly from a vendor class (such as org.apache.xerces.parsers.SAXParser) or obtained through a helper class called ParserFactory. The problem with the first method is obvious: it is not vendor independent. The problem with the second is that the factory takes as its argument the String class name of the parser class to use (for the Apache example, "org.apache.xerces.parsers.SAXParser"). You can change the parser by passing in a different class name as the String. With this approach no import statements need to change, but the class still has to be recompiled, which is clearly not the best solution. Wouldn't it be much simpler if the parser could be changed without recompiling any classes? JAXP offers that better alternative: it allows the parser to be specified as a Java system property. When you download Sun's distribution, you get a JAXP implementation that uses Sun's parser; the same JAXP interfaces, built against the Apache implementation, can be downloaded from the Apache XML web site. In either case, changing the parser in use only requires changing your classpath setting, switching from one parser to another, without recompiling any code. That is the magic of JAXP, or rather of abstraction.
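As a quick, hedged illustration of this, the factory implementation can be chosen through the javax.xml.parsers.SAXParserFactory system property. The implementation class name used below (org.apache.xerces.jaxp.SAXParserFactoryImpl) is only an assumed example; it must name a JAXP factory that is actually present on your classpath.

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class ParserSwitchDemo {
    public static void main(String[] args) throws Exception {
        // The factory class name is an assumed example; it must point at a
        // JAXP SAXParserFactory implementation available on the classpath.
        System.setProperty("javax.xml.parsers.SAXParserFactory",
                           "org.apache.xerces.jaxp.SAXParserFactoryImpl");
        // The same effect can be had at launch time with
        //   java -Djavax.xml.parsers.SAXParserFactory=... ParserSwitchDemo
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        System.out.println("Parser in use: " + parser.getClass().getName());
    }
}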
The SAX parser factory
The JAXP SAXParserFactory class is the key to easily changing the parser in use. A new instance of this class must be created (more on that in a moment). Once the new instance is created, the class provides a method for getting a SAX-capable parser. Behind the scenes, the JAXP implementation takes care of the vendor-dependent code, so your own code is not affected. The factory also offers some other nice features.
In addition to the basic job of creating SAX parser instances, the factory allows configuration options to be set. These options affect every parser instance obtained from the factory. The two options available in JAXP 1.0 are setting namespace awareness (setNamespaceAware(boolean)) and turning on validation (setValidating(boolean)). Remember: once these options are set, they affect all instances obtained from the factory after the call.
Once the factory is configured, calling newSAXParser() returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an instance of the SAX class org.xml.sax.Parser) and keeps vendor-specific additions from creeping into the parser class. (Remember the earlier discussion of XmlDocument?) With this class the actual parsing can begin. The following listing shows how to create, configure, and use the SAX factory.
Listing 1. Using SAXParserFactory
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;

// SAX
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

public class TestSAXParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println("Usage: java TestSAXParsing [filename]");
                System.exit(1);
            }
            // Get the SAX parser factory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Turn on validation and turn off namespace awareness
            factory.setValidating(true);
            factory.setNamespaceAware(false);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File(args[0]), new MyHandler());
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not support " +
                               "the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining SAX parser factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class MyHandler extends HandlerBase {
    // SAX callbacks from DocumentHandler, ErrorHandler, etc. go here
}
Note that in this code, two JAXP-specific problems can occur when using the factory: the factory itself cannot be obtained, or the SAX parser cannot be configured. The first, FactoryConfigurationError, usually occurs when the parser specified in the JAXP implementation or system property cannot be obtained. The second, ParserConfigurationException, occurs when a requested feature is not available in the parser being used. Both are easy to handle and should not cause any difficulty in using JAXP.
After obtaining the factory, turning off namespace awareness, and turning on validation, you get a SAXParser and parsing can begin. Note that the SAXParser's parse() method takes an instance of the SAX HandlerBase class mentioned earlier (you can see the implementation of this class in the complete Java listing), along with the file to parse. However, the SAXParser offers much more than this one method.
Working with the SAX parser
Once you have an instance of the SAXParser class, you can do a lot more with it than parse one file. In large applications, because of the way components communicate, it is not always safe to assume that "the creator of an object instance is its user." In other words, one component may create a SAXParser instance while another component (possibly written by another developer) needs to use it. For this reason, methods are provided to determine the parser's settings. Two of them are isValidating(), which tells the caller whether the parser performs validation, and isNamespaceAware(), which indicates whether the parser can process namespaces in an XML document. These methods report what the parser can do, but they cannot change it; that must be done at the factory level. In addition, there are several ways to request parsing of a document. Besides the File plus SAX HandlerBase version, SAXParser's parse() method also accepts a SAX InputSource, a Java InputStream, or a URL given as a String, each paired with a HandlerBase instance. So different kinds of input documents can be parsed in different ways.
Finally, you can use the SAXParser's getParser() method to obtain and use the underlying SAX parser (an instance of org.xml.sax.Parser). Once you have this underlying instance, the usual SAX methods are available. The next listing shows various uses of the SAXParser class, the core class of SAX parsing in JAXP.
Listing 2. Using the JAXP SAXParser
// Get a new SAXParser instance from the factory
SAXParser saxParser = saxFactory.newSAXParser();

// See whether validation is turned on
boolean isValidating = saxParser.isValidating();

// See whether namespace awareness is turned on
boolean isNamespaceAware = saxParser.isNamespaceAware();

// Parse, using a File and a SAX HandlerBase instance
saxParser.parse(new File(args[0]), myHandlerBaseInstance);

// Parse, using a SAX InputSource and a SAX HandlerBase instance
saxParser.parse(mySAXInputSource, myHandlerBaseInstance);

// Parse, using an InputStream and a SAX HandlerBase instance
saxParser.parse(myInputStream, myHandlerBaseInstance);

// Parse, using a URI and a SAX HandlerBase instance
saxParser.parse("http://www.newInstance.com/xml/doc.xml", myHandlerBaseInstance);

// Get the underlying (wrapped) SAX parser
org.xml.sax.Parser parser = saxParser.getParser();

// Use the underlying parser directly
parser.setDocumentHandler(myDocumentHandlerInstance);
parser.setErrorHandler(myErrorHandlerInstance);
parser.parse(new org.xml.sax.InputSource(args[0]));
So far there has been a lot of SAX, but nothing unusual or amazing. In fact, JAXP deliberately has very little functionality, especially where SAX is involved. That is a good thing: minimal functionality means maximum portability, and your code can be used by other developers with any SAX-capable XML parser, whether obtained for free (through open source, one hopes) or through commercial channels. That is really all there is to using SAX in JAXP. If you already know SAX, you now know about 98 percent of what you need; just learn two new classes and two Java exceptions and you can start. If you have never used SAX, it is simple enough that you can start now.
On to DOM
If you were hoping for a rest before taking on the DOM, relax: using DOM in JAXP is almost identical to using SAX. All you have to do is change two class names and a return type, and that is about it. If you understand how SAX works and what DOM is, there will be no problems.
The main difference between DOM and SAX is the structure of the API. SAX consists of an event-based set of callbacks, while DOM builds a tree structure in memory. In other words, with SAX no data structure is ever created (unless the developer builds one by hand); consequently, SAX provides no facility for modifying an XML document. DOM provides exactly that: the org.w3c.dom.Document class represents an XML document as a tree of DOM nodes representing elements, attributes, and other XML constructs. So with DOM, JAXP does not have to fire SAX callbacks; it is only responsible for returning a DOM Document object as the result of parsing.
The DOM parser factory
With this basic understanding of DOM and of the differences between DOM and SAX, there is not much more to learn. The following code looks much like the SAX code. First, obtain a DocumentBuilderFactory (just as in SAX). Then configure the factory to handle validation and namespaces (just as in SAX). Next, retrieve a DocumentBuilder, the analog of SAXParser, from the factory (just as in SAX... you get the idea). Then parsing can be performed, and the resulting DOM Document object is passed to a method that prints the DOM tree.
Listing 3. Using DocumentBuilder
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

// DOM
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TestDOMParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println("Usage: java TestDOMParsing [filename]");
                System.exit(1);
            }
            // Get the document builder factory
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Turn on validation and turn off namespace awareness
            factory.setValidating(true);
            factory.setNamespaceAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new File(args[0]));
            // Print the document from the DOM tree, starting with no indentation
            printNode(doc, "");
            // The DOM document could also be modified here
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not support " +
                               "the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining document builder factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void printNode(Node node, String indent) {
        // Print the DOM tree (see the complete code listing)
    }
}
Two kinds of problems can occur in this code, just as in the SAX portion of JAXP: FactoryConfigurationError and ParserConfigurationException, and for the same reasons as in SAX: either the factory implementation cannot be obtained (FactoryConfigurationError), or the parser does not support a requested feature (ParserConfigurationException). The only difference between DOM and SAX is that in DOM, DocumentBuilderFactory replaces SAXParserFactory and DocumentBuilder replaces SAXParser. It's that simple! (You can view the full code listing, which includes the method for printing the DOM tree.)
Working with the DOM parser
Once you have a DocumentBuilder instance from the factory, using it is very similar to using a SAXParser. The main difference is that the variants of parse() do not take an instance of the HandlerBase class; instead, they return a DOM Document instance representing the parsed XML document. The other notable difference is that two methods are provided for SAX-style hooks: setErrorHandler(), which takes a SAX ErrorHandler implementation to handle parsing errors, and setEntityResolver(), which takes a SAX EntityResolver implementation to handle entity resolution. If you are unfamiliar with these concepts, you can learn about SAX online or in my book. The following listing shows examples of using these methods.
Listing 4. Using the JAXP DocumentBuilder
// Get a DocumentBuilder instance from the factory
DocumentBuilder builder = builderFactory.newDocumentBuilder();

// See whether validation is turned on
boolean isValidating = builder.isValidating();

// See whether namespace awareness is turned on
boolean isNamespaceAware = builder.isNamespaceAware();

// Set a SAX ErrorHandler
builder.setErrorHandler(myErrorHandlerImpl);

// Set a SAX EntityResolver
builder.setEntityResolver(myEntityResolverImpl);

// Parse, using a File
Document doc = builder.parse(new File(args[0]));

// Parse, using a SAX InputSource
Document doc = builder.parse(mySAXInputSource);

// Parse, using an InputStream
Document doc = builder.parse(myInputStream);

// Parse, using a URI
Document doc = builder.parse("http://www.newInstance.com/xml/doc.xml");
Thought it would be harder? More than one person has, but writing the DOM code is easy precisely because it takes what you already learned for SAX and applies it to DOM. So go ahead and bet your friends and colleagues that picking up JAXP is a snap.
Changing the parser
The last topic is the ability to easily change the parser that JAXP uses. Changing the parser actually means changing the factory, because all SAXParser and DocumentBuilder instances come from the factories; since the factories determine which parser is loaded, they are what must be changed. The SAXParserFactory implementation in use can be changed by setting the Java system property javax.xml.parsers.SAXParserFactory; if that property is not defined, the default implementation (whatever parser the vendor specified) is returned. The same principle applies to the DocumentBuilderFactory implementation: in that case the javax.xml.parsers.DocumentBuilderFactory system property is consulted. (The short example back in the SAX section shows this property in use.) It's that simple, and you already know it all!
Conclusion
That is everything JAXP does: it provides hooks into SAX, provides hooks into DOM, and allows the parser to be easily changed. As you can see, there is nothing very complicated about it. Changing a system property, setting validation through the factory rather than on the parser or builder, and realizing that JAXP is not what people usually think it is are the hardest parts of using it. Apart from the missing support for SAX 2.0 and the DOM Level 2 specification, JAXP provides a helpful pluggability layer on top of the two popular Java and XML APIs. It makes your code vendor independent and allows the parser to be changed without touching the parsing code. So download JAXP from Sun, Apache XML, or wherever else is convenient, and use it! And keep an eye out for JAXP 1.1, which adds support for SAX 2.0 and DOM Level 2, XSLT, and more. You will get first-hand news here, so stay tuned to developerWorks.
Reference
Read the JAXP 1.0 specification for the details.
Visit Sun's Java and XML headquarters to see all of Sun's activity around XML.
Get the Apache implementation of JAXP at Apache XML.
Find out more about the APIs themselves, starting with SAX 2 for Java at the SAX web site.
For the other XML API supported by JAXP, look at the DOM at the W3C web site.
For 490 pages of XML expert advice, check out Java and XML, written by Brett McLaughlin and published by O'Reilly, a book covering the most popular Java and XML technologies.
Join the discussion in the developerWorks XML tools and APIs newsgroups for Java language developers.
Need a more basic introduction to XML? Try the developerWorks XML tutorials and other educational articles, which cover the most basic topics.
Following up on his earlier article about JAXP (Sun's Java API for XML Parsing), the author examines the support for the SAX and DOM standards in the latest version, 1.1. With the addition of TrAX, JAXP 1.1 gives Java and XML developers an indispensable tool for writing vendor-independent code that parses and transforms XML documents.
If you read the developerWorks XML zone regularly, another JAXP article may seem odd: just a month ago I wrote an article, "All about JAXP", in which I explained JAXP (the Java API for XML Parsing), how it works, and how it helps you process XML data from Java programs. That article covered JAXP release 1.0. So why am I writing about JAXP again so soon? I am a member of the JAXP expert group, and we are now close to completing the 1.1 specification. Although most "point releases" (say, going from version 1.0 to 1.1, or from 2.2 to 2.3) make only minor changes to an existing API, the 1.1 release of JAXP is very different from its predecessor. In fact, only part of this article deals with new methods in existing classes and functionality; the rest focuses on entirely new classes and functionality in JAXP 1.1. In other words, there is a lot that is new in JAXP 1.1, and I can't wait to tell you about it.
Whether you are new to JAXP, already using it, or waiting to use it, this article is for you. I will cover the modifications made to the 1.0 version of the API and then spend some time on TrAX, the Transformation API for XML. TrAX has been merged into JAXP so that a single API can perform XSL transformations in a vendor-neutral way, complementing the vendor independence JAXP already provides for XML parsing. I recommend reading my first JAXP article, taking a break, and then coming back for this discussion of JAXP 1.1.
Enhancements to the parsing API
Many of the changes to the JAXP API revolve around parsing, which makes sense when you consider that the "P" in JAXP stands for "parsing". However, the most significant changes in JAXP 1.1 revolve around XML transformations and are covered later in this article. The modifications to existing JAXP functionality are fairly small. The biggest additions are support for SAX 2.0 (completed in May 2000) and for DOM Level 2 (still being finalized); the previous version of JAXP supported only SAX 1.0 and DOM Level 1, and the lack of these updated standards was the most heavily criticized aspect of JAXP 1.0.
Besides bringing JAXP up to the latest versions of SAX and DOM, there are a few smaller changes to the API described in the last article. Almost all of them are the result of requests submitted to the JAXP expert group by various companies and individuals, and all of them deal with configuring the parsers returned by JAXP's two factories (SAXParserFactory and DocumentBuilderFactory). I will cover these changes along with the updates to the SAX and DOM standards.
Updated standards
The most anticipated change in the upgrade from JAXP 1.0 to 1.1 is the update to the supported versions of the popular SAX and DOM standards. SAX (the Simple API for XML) released version 2.0 in May 2000; compared with its other additions, its greatest enhancement is support for XML namespaces. Namespace support makes many other XML vocabularies possible, such as XML Schema, XLink, and XPointer. These vocabularies could be used with SAX 1.0, but developers had to separate an element's local (unqualified) name from its namespace by hand and keep track of those namespaces throughout the document. SAX 2.0 hands this information to the developer, which greatly simplifies these programming tasks. The same goes for DOM Level 2: it adds namespace support and new methods to many DOM classes. Although DOM Level 2 is not yet finished, JAXP 1.1 supports its current specification; if the final version of the DOM standard introduces small changes, JAXP will of course pick them up. The good news is that these changes are usually transparent to developers using JAXP. In other words, the standards updates happen more or less "automatically", without user intervention: as long as SAXParserFactory is pointed at a SAX 2.0 compliant parser, you get the update, and the same applies to the DocumentBuilderFactory class and a DOM Level 2 compliant parser.
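To make the namespace point concrete, here is a small, hypothetical SAX 2.0 handler that simply prints the namespace URI and local name that the parser hands it; with SAX 1.0, this separation would have had to be done by hand.

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: SAX 2.0 delivers the namespace URI
// and the local name separately, so no manual prefix handling is needed.
public class NamespaceReporter extends DefaultHandler {
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        System.out.println("Element '" + localName +
                           "' in namespace '" + namespaceURI + "'");
    }
}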
Several important changes are related to the SAX 2.0 update. In SAX 1.0, the parser interface implemented by vendors and XML parser projects was org.xml.sax.Parser, and the JAXP SAXParser class exposed this underlying implementation through its getParser() method. The signature of that method looks like this:
Listing 1. The getParser() method
public interface SAXParser {
    public org.xml.sax.Parser getParser();
    // other methods
}
However, in the move from SAX 1.0 to 2.0, the Parser interface was deprecated and replaced by a new interface, org.xml.sax.XMLReader. That makes the getParser() method essentially useless when a SAX 2.0 XMLReader is in use. To support the new interface, and SAX 2.0 in general, a new method was added to the JAXP SAXParser class. Naturally enough, it is named getXMLReader(), and it looks like this:
Listing 2. The getXMLReader() method
public interface SAXParser {
    public org.xml.sax.XMLReader getXMLReader();
    public org.xml.sax.Parser getParser();
    // other methods
}
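As a short, hedged usage sketch (the handler class and file name are assumptions of mine), the XMLReader obtained this way is driven with the usual SAX 2.0 calls:

// Assumes 'factory' is a configured SAXParserFactory and MyContentHandler
// is an org.xml.sax.ContentHandler implementation of your own.
javax.xml.parsers.SAXParser saxParser = factory.newSAXParser();
org.xml.sax.XMLReader reader = saxParser.getXMLReader();
reader.setContentHandler(new MyContentHandler());
reader.parse("document.xml");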
Similarly, the class used to implement callbacks in SAX 1.0 was org.xml.sax.HandlerBase, and an instance of that class was passed to all of the JAXP 1.0 parse() methods. That class is no longer used in SAX 2.0, because it builds on SAX 1.0 interfaces that were deprecated; it is replaced by org.xml.sax.helpers.DefaultHandler. To accommodate this change, variants of all the parse() methods in the SAXParser class that accept a DefaultHandler instance were added to support SAX 2.0. To help you see the difference, all of the parse() methods are shown in Listing 3:
Listing 3. The parse() methods
public interface SAXParser {
    // The SAX 1.0 parse methods
    public void parse(File file, HandlerBase handlerBase);
    public void parse(InputSource inputSource, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase,
                      String systemID);
    public void parse(String uri, HandlerBase handlerBase);

    // The SAX 2.0 parse methods
    public void parse(File file, DefaultHandler defaultHandler);
    public void parse(InputSource inputSource, DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, DefaultHandler defaultHandler,
                      String systemID);
    public void parse(String uri, DefaultHandler defaultHandler);

    // other methods
}
Any of these methods can be used for parsing; things only get tricky when both versions of SAX are in play. If you are using SAX 1.0, you will be working with the Parser interface and the HandlerBase class, so it is obvious which methods to use. Likewise, when using SAX 2.0, it is clear that the methods that take DefaultHandler instances and return an XMLReader are the ones to use. So treat all of this as a reference and don't worry about it too much. A few other modifications were made to the SAX portion of the API as well.
Changes to the existing SAX classes
To finish the discussion of changes to existing JAXP functionality, there are a few new methods available to JAXP SAX users. First, the SAXParserFactory class has a new method, setFeature(). As you may recall from JAXP 1.0, the SAXParserFactory class configures the SAXParser instances returned by the factory. In addition to the existing methods (setValidating() and setNamespaceAware()), the new method allows arbitrary SAX 2.0 features to be requested for new parser instances. SAX 2.0 lets vendors define features specific to their own parser, which users can then turn on and off through standard SAX. For example, a user can request the feature http://apache.org/xml/features/validation/schema, which turns XML Schema validation on or off. This can now be done on the SAXParserFactory, as shown in Listing 4:
Listing 4. Using the setFeature() method
SAXParserFactory myFactory = SAXParserFactory.newInstance();

// Turn on XML Schema validation
myFactory.setFeature("http://apache.org/xml/features/validation/schema", true);

// Now get an instance of the parser with schema validation enabled
SAXParser parser = myFactory.newSAXParser();
Of course, a getFeature() method is provided to complement setFeature(); it lets you query a specific feature and returns a simple boolean.
In addition to features (which are set to true or false), SAX allows properties to be set. A property is a name associated with an actual Java Object. For example, on a SAX parser instance you can set the property http://xml.org/sax/properties/lexical-handler and assign it an implementation of the SAX LexicalHandler interface; the parser then uses that implementation for lexical processing. Because properties like this one belong to the parser instance rather than to the factory (unlike features), the setProperty() method is on the JAXP SAXParser class rather than on SAXParserFactory. As with features, a complementary getProperty() method on SAXParser returns the value associated with a particular property.
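A short hedged sketch of how that might look in practice follows; the handler class is an assumption, while the property URI is the standard SAX 2.0 lexical-handler property.

// Assumes MyLexicalHandler implements org.xml.sax.ext.LexicalHandler and
// saxParser is a JAXP javax.xml.parsers.SAXParser instance.
saxParser.setProperty("http://xml.org/sax/properties/lexical-handler",
                      new MyLexicalHandler());

// Read the property back; the returned Object is the handler set above
Object handler =
    saxParser.getProperty("http://xml.org/sax/properties/lexical-handler");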
DOM updates
There are also a few new methods in the DOM portion of JAXP. They were added to existing JAXP classes to support DOM Level 2 options, as well as some common configuration requests that came up over the last year. I won't walk through all of these options and the corresponding methods, because many of them go unnoticed (they apply only in rare cases) and are not needed in most applications; I do encourage you to look them up in the most recent JAXP specification (see Resources). With the standards update, the SAX changes, and the additional DOM methods out of the way, we can move on to the most important change in JAXP 1.1: the TrAX API.
The TrAX API
Everything so far has been about using JAXP for XML parsing. With JAXP 1.1 we can now also talk about XML transformations. Probably the most exciting advance in the latest version of the API is that it lets you transform XML documents in a vendor-independent way. If you are unfamiliar with XML transformations and XSLT (Extensible Stylesheet Language Transformations), check out the developerWorks tutorial (see Resources). Although this vendor independence simply extends what JAXP already offers for parsing, it is even more urgently needed here, because today's XSL processors expose very different interfaces to users and developers; in fact, the XSL processors from different vendors differ far more than the XML parsers do.
Initially, the JAXP expert group tried to provide a simple Transform class with a few methods to standardize transforming a document with a stylesheet. That initial attempt proved too limited, but I am happy to say that we (the JAXP expert group) have made significant progress since then. Scott Boag and Michael Kay, two of today's leading XSL processor experts (working on Apache Xalan and Saxon, respectively), developed TrAX together with others. It supports a much wider range of options and features and provides support for almost any XML transformation, and all of it works with JAXP. As in the parsing portion of JAXP, an XML transformation involves three basic steps:
Obtain a TransformerFactory
Retrieve a Transformer
Perform the operation (the transformation)
Using the factory
The factory used in the transformation portion of JAXP is named javax.xml.transform.TransformerFactory. This class works just like the SAXParserFactory and DocumentBuilderFactory classes mentioned in my first JAXP article and earlier in this one. And, of course, getting an instance of the factory to work with is utterly simple:
Listing 5. Getting a TransformerFactory instance
TransformerFactory factory = TransformerFactory.newInstance();
Once you have the factory, various options can be set on it. Those options affect all of the Transformer instances produced by the factory (more on Transformer shortly). (Incidentally, you can also obtain javax.xml.transform.Templates instances through the TransformerFactory; Templates is a more advanced JAXP concept that this article does not go into.)
The first options that can be used are attributes. These are not XML attributes, but are similar to the properties discussed for the XML parsers. Attributes allow options to be passed to the underlying XML processor (perhaps the XSL processor from Apache Xalan, Saxon, or Oracle) and are therefore highly vendor dependent. As in the parsing portion of JAXP, attributes are set with the setAttribute() method and read with its partner getAttribute(). Like setProperty(), the former takes an attribute name and an Object value; like getProperty(), the latter takes an attribute name and returns the associated Object value.
Setting an ErrorListener is the second available option. ErrorListener is defined by the javax.xml.transform.ErrorListener interface, and it allows problems that occur during a transformation to be caught and handled programmatically. If you are familiar with SAX, you will notice that the interface looks a lot like the org.xml.sax.ErrorHandler interface:
Listing 6. The ErrorListener interface
package javax.xml.transform;

public interface ErrorListener {
    public void warning(TransformerException exception)
        throws TransformerException;
    public void error(TransformerException exception)
        throws TransformerException;
    public void fatalError(TransformerException exception)
        throws TransformerException;
}
By creating an implementation of this interface, filling in the three callback methods, and registering it with the setErrorListener() method on the TransformerFactory instance in use, you can handle any errors that come up.
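A minimal sketch of such an implementation, assuming you simply want problems reported to standard error (the class name is made up for illustration):

import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;

// Illustrative ErrorListener: log warnings and errors, rethrow fatal errors
public class SimpleErrorListener implements ErrorListener {
    public void warning(TransformerException e) {
        System.err.println("Warning: " + e.getMessage());
    }
    public void error(TransformerException e) {
        System.err.println("Error: " + e.getMessage());
    }
    public void fatalError(TransformerException e) throws TransformerException {
        throw e;   // give up on fatal errors
    }
}

// Registering it on the factory (factory obtained as in Listing 5):
//   factory.setErrorListener(new SimpleErrorListener());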
Finally, a method is provided to set and retrieve the URI (Uniform Resource Identifier, often loosely called a URL) resolver used by the factory's products. The resolver, defined by the javax.xml.transform.URIResolver interface, is again similar to its SAX counterpart, org.xml.sax.EntityResolver. The interface has a single method:
Listing 7. The URIResolver interface
package javax.xml.transform;

public interface URIResolver {
    public Source resolve(String href, String base)
        throws TransformerException;
}
An implementation of this interface lets you take over when a URI is encountered in an XML construct such as xsl:import or xsl:include. Because the method returns a Source, you can point the processor at a document in a different location when a particular URI is encountered. For example, when the URI http://www.oreilly.com/oreilly.xsl comes up, you could return the local document oreilly.xsl and avoid a network access. Use the TransformerFactory's setURIResolver() method to set a URIResolver implementation and its getURIResolver() method to retrieve it.
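A hedged sketch of the local-redirect idea described above (the class name and file name are assumptions):

import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Illustrative resolver: serve one well-known stylesheet from the local disk,
// and return null for everything else so the processor uses its default rule.
public class LocalStylesheetResolver implements URIResolver {
    public Source resolve(String href, String base) throws TransformerException {
        if ("http://www.oreilly.com/oreilly.xsl".equals(href)) {
            return new StreamSource(new java.io.File("oreilly.xsl"));
        }
        return null;
    }
}

// Registered on the factory with:
//   factory.setURIResolver(new LocalStylesheetResolver());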
Finally, once the desired options are set, you can obtain one or more Transformer instances through the factory's newTransformer() method:
Listing 8. Getting a Transformer
// Get the factory
TransformerFactory factory = TransformerFactory.newInstance();

// Configure the factory
factory.setErrorListener(myErrorListener);
factory.setURIResolver(myURIResolver);

// Get a Transformer to work with, with the options specified
Transformer transformer =
    factory.newTransformer(new StreamSource("sheet.xsl"));
As you can see, this method takes as input the stylesheet to be used in all transformations performed by that Transformer instance. In other words, if you want to transform documents with both stylesheet A and stylesheet B, you need two Transformer instances, one per stylesheet. If you want to transform multiple documents with the same stylesheet (call it stylesheet C), only one Transformer instance, associated with stylesheet C, is needed.
Performing the transformation
Once you have a Transformer instance, you can perform the actual XML transformation. This involves two basic steps:
Specify the XSL stylesheet to use
Perform the transformation, specifying the XML document and the target for the result
As noted above, the first step is really the simplest: a stylesheet must be supplied when the Transformer instance is obtained from the factory, and its location is specified by providing a javax.xml.transform.Source for it. You have already seen the Source interface in several of the code samples; it is the means of locating input, whether that input is a stylesheet, a document, or any other set of information. TrAX not only defines the Source interface but also provides three concrete implementations:
javax.xml.transform.stream.StreamSource
javax.xml.transform.dom.DOMSource
javax.xml.transform.sax.SAXSource
The first of these, StreamSource, reads input from some kind of I/O source. Constructors are provided that accept an InputStream, a Reader, or a String system identifier as input. Once created, the StreamSource is handed to the Transformer. This is probably the most commonly used Source implementation, and it works very well for reading a document from the network, from an input stream, or from any other static representation.
The next Source, DOMSource, allows input to be read from an existing DOM tree. It provides a constructor that accepts a DOM org.w3c.dom.Node and reads from that Node when used. If parsing has already been done and the XML document already exists as an in-memory DOM structure, this is the ideal way to feed the existing DOM tree into a transformation.
SAXSource allows input to be read from a SAX producer. This Source implementation accepts a SAX org.xml.sax.InputSource or an org.xml.sax.XMLReader as input and uses the events supplied by that source. This is the ideal approach when SAX is already in use and callbacks have already been set up, since the transformation can be driven from those events.
Once a Transformer instance has been created (by supplying the stylesheet to be used through the appropriate Source), the transformation can be performed. Completing it takes the transform() method (no surprise there):
Listing 9. Performing the transformation
// Get the factory
TransformerFactory factory = TransformerFactory.newInstance();

// Configure the factory
factory.setErrorListener(myErrorListener);
factory.setURIResolver(myURIResolver);

// Get a Transformer to work with, with the options specified
Transformer transformer =
    factory.newTransformer(new StreamSource("sheet.xsl"));

// Perform the transformation on document A, and print out the result
transformer.transform(new StreamSource("documentA.xml"),
                      new StreamResult(System.out));
The transform() method takes two arguments: a Source implementation and a javax.xml.transform.Result implementation. By now you can see the symmetry at work and guess what the Result interface is for: the Source supplies the XML document to be transformed, and the Result specifies where the output of the transformation should go. Just as with Source, TrAX and JAXP provide three concrete implementations of the Result interface:
javax.xml.transform.stream.StreamResult
javax.xml.transform.dom.DOMResult
javax.xml.transform.sax.SAXResult
StreamResult takes an OutputStream (such as System.out in the previous example) or a Writer in its constructors. DOMResult sends the output of the transformation to a DOM Node (normally a DOM org.w3c.dom.Document), while SAXResult fires callbacks on a SAX ContentHandler as the transformed XML is produced. Each mirrors the corresponding Source implementation, so their usage is easy to pick up by analogy.
Although the example above shows a stream-to-stream transformation, any combination of Source and Result is possible. Here are a few examples:
Listing 10. Various TrAX/JAXP transformations
// Perform transformation on document A, and print out the result
transformer.transform(new StreamSource("documentA.xml"),
                      new StreamResult(System.out));

// Transform from SAX and output the result to a DOM Node
transformer.transform(
    new SAXSource(new InputSource("http://www.oreilly.com/catalog.xml")),
    new DOMResult(documentBuilder.newDocument()));

// Transform from DOM and output to a file
transformer.transform(new DOMSource(myDomTree),
                      new StreamResult(new FileOutputStream("results.xml")));

// Use a custom Source and Result (JDOM)
transformer.transform(new org.jdom.trax.JDOMSource(myJDOMDocument),
                      new org.jdom.trax.JDOMResult(new org.jdom.Document()));
As you can see, TrAX gives you tremendous flexibility in transforming from various input types to various output types, and in supplying the XSL stylesheet itself in various forms (a DOM tree, a SAX reader, and so on).
There are several other useful features in TrAX, though they are not used as often as the ones shown here, and there is not enough space in this article to cover them all. I recommend looking through the JAXP specification (which will soon include TrAX) to see them; it is a rich and powerful API for XML transformations. You can experiment with the output properties, set up error handling (not only during the XSL transformation itself but also while looking up input sources), and discover all kinds of good things in the API. Enjoy it, and let us (the expert group) know what you think!
A warning before closing: if you read this article three months from now, download JAXP 1.1, and get compiler or runtime errors, remember that the article was written while JAXP 1.1 was still being completed. As with any early release, things can change, even between my laptop and developerWorks. In other words, the methods and functionality described here are current as of this writing, but the JAXP specification is still in flux. What matters most are the concepts in this article; the methods described here could still change, perhaps even in slight behavioral ways, but the core concepts outlined here will appear in JAXP 1.1 in some form. So even if some details are no longer exactly right when the JAXP 1.1 specification and reference implementation are finished, the article remains conceptually correct.
Conclusion
Now you know what the next version of JAXP will bring. The final public draft of the specification should be complete by the end of 2000, the reference implementation will follow shortly after, and everything should be finished before the end of the first quarter of 2001. Be careful when looking for JAXP reference material, because the current specification (as of early November 2000) does not yet include the TrAX API discussed here; the specification was being revised as I wrote this article, so the updated version will appear a little later.
For those of you who have been holding off on JAXP (which, given the 1.0 version, was a fairly sensible move), now is the time to start using it. In my earlier article and in my book Java and XML, I gave JAXP only lukewarm support because of JAXP 1.0's shortcomings with respect to SAX 2.0 and DOM Level 2. Now I am pleased to say that JAXP 1.1 is a major step forward. Java and XML developers will find it an indispensable tool for writing vendor-independent code that parses and transforms XML documents. So take a good look at JAXP 1.1 and get your applications ready for it.
This preview from Benoit Marchal's XML by Example gives an informative introduction to SAX, an event-based API for processing XML that has become a de facto standard. The preview explains when to use SAX instead of the DOM, outlines the commonly used SAX interfaces, and provides a detailed example with many code samples in a Java-based application. Published by permission of Que Publishing, a division of the Pearson Technology Group.
This article, adapted from a chapter of the upcoming second edition of XML by Example, introduces SAX, an event-based API for processing XML. SAX is an alternative to the Document Object Model, or DOM; the DOM, published by the W3C, is an object-based API for XML parsers.
You will learn that SAX:
Is an event-based API.
Operates at a lower level than the DOM.
Gives you more control than the DOM.
Is almost always more efficient than the DOM.
But, unfortunately, requires more work than the DOM.
Why another API?
Don't be misled by the name. SAX may be the Simple API for XML, but it requires more work than the DOM. The reward, more compact code, is well worth the effort.
Figure 1 shows the two components of a typical XML program:
The parser, the software component that decodes XML files on behalf of the application. The parser effectively shields the developer from the intricacies of the XML syntax.
The application, which consumes the file's content.
Figure 1. Architecture of an XML program
Obviously, the application can be very simple (for example, an application that converts prices between euros and dollars) or very complex (for example, a distributed e-commerce application that orders goods over the Internet).
This chapter concentrates on the dotted line in Figure 1: the API (Application Programming Interface) that sits between the parser and the application.
Object-based and event-based interfaces
You may already know that parsers come with two kinds of interfaces: object-based and event-based.
The DOM, which is discussed in detail in another chapter, was developed and published by the W3C and is the standard API for object-based parsers. This brief overview of the DOM is only meant as background so that you can better understand SAX.
As an object-based interface, the DOM communicates with the application by building a tree of objects in memory. The object tree is an exact map of the tree of elements in the XML file.
The DOM is easy to learn and use because it closely matches the underlying XML document. It is also ideal for what I call XML-centric applications, such as browsers and editors, whose whole purpose is to manipulate XML documents.
For most applications, however, processing XML documents is just one task among many. For example, an accounting package might import XML invoices, but that is not its main activity; calculating account balances, tracking expenditures, and matching payments against invoices are. The accounting package probably already has a data structure, most likely a database. The DOM model is poorly suited to such an application, because in that case the application would have to keep two copies of the data in memory (one in the DOM tree and one in the application's own structures). At the very least, holding the data in memory twice is inefficient; it may not be a major problem for desktop applications, but it can bring a server to its knees.
For applications that are not XML-centric, SAX is the sensible choice. Indeed, SAX does not build a document tree in memory at all; it lets the application store the data in whatever way is most efficient.
Figure 2 illustrates how the application is mapping between the XML tree and its own data structure.
Figure 2. Map the XML structure into the application structure
Event-based interfaces. As its name indicates, an event-based parser sends events to the application. The events are similar to user-interface events such as the onclick event in a browser or AWT/Swing events in Java.
Events alert the application that something needs its attention. In a browser, events are usually generated in response to user actions: when the user clicks a button, the button generates an onclick event.
In an XML parser, events are not tied to user actions but to elements in the XML document being read. There are events for:
Element start and end tags
Element content
Entities
Parsing errors
Figure 3 shows how the parser generates events as it reads a document.
Figure 3. The parser generates events
Listing 1 shows an XML price list: it details what different companies charge for XML training. Figure 4 shows the structure of the price list document. Listing 1. pricelist.xml
<?xml version="1.0"?>
<xbe:price-list xmlns:xbe="...">
  <xbe:product>XML training</xbe:product>
  <xbe:price-quote price="..." vendor="..."/>
  <xbe:price-quote price="..." vendor="..."/>
  <xbe:price-quote price="..." vendor="..."/>
  <xbe:price-quote price="..." vendor="..."/>
</xbe:price-list>
Figure 4. Structure of the price list
The XML parser reads and interprets the document. Whenever it recognizes something in the document, it generates an event.
When reading Listing 1, the parser first reads the XML declaration and generates a document-start event. When it encounters the first start tag, the price-list element, it generates its second event.
Next, the parser sees the start tag of the product element (for simplicity, in the rest of this article I will ignore namespaces and indenting whitespace) and generates its third event.
After the start tag, the parser sees the content of the product element, XML training, which generates another event.
The next event signals the end tag of the product element. The parser has now finished parsing the product element. So far it has fired five events: three for the product element, a document-start event, and a price-list start-tag event.
The parser now moves to the first price-quote element. It generates two events for each price-quote element: a start-tag event and an end-tag event.
Yes, even though the end tag is abbreviated to the / character in the start tag, the parser still generates an end-tag event.
There are four price-quote elements, so the parser generates eight events as it parses them. Finally, the parser reaches the end tag of price-list and generates its last two events: the end of price-list and the end of the document.
Taken together, as Figure 5 shows, these events describe the document tree to the application. A start-tag event means "going one level down the tree," and an end-tag event means "going one level back up the tree."
Figure 5. How the parser's events could be used to build a tree
Note that the parser passes enough information to build the document tree of the XML document but, unlike a DOM parser, it does not explicitly build the tree.
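To see the events concretely, a handler along the following lines would simply echo them as the parser fires them. This is only a sketch of mine, not one of the book's listings; the class name, output format, and the Xerces parser class are my own choices.

import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class EventEcho extends DefaultHandler
{
    public void startDocument() { System.out.println("start document"); }
    public void endDocument()   { System.out.println("end document"); }

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
    {
        System.out.println("start element: " + qName);
    }

    public void endElement(String uri, String localName, String qName)
    {
        System.out.println("end element: " + qName);
    }

    public void characters(char[] chars, int start, int length)
    {
        System.out.println("characters: " + new String(chars, start, length));
    }

    public static void main(String[] args) throws Exception
    {
        // assumed parser class; any SAX2 parser would do
        XMLReader parser = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");
        parser.setContentHandler(new EventEcho());
        parser.parse(args[0]);
    }
}

Running it against Listing 1 would print the sequence of events described above, in document order.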
Why use an event-based interface? By now you must be wondering which API you should use: SAX or DOM? Unfortunately, there is no definite answer. Neither API is intrinsically better; they serve different needs.
The rule of thumb is to use SAX when you need efficiency and DOM when you need convenience. For example, DOM is popular with scripting languages. Note: a natural interface. For the parser, an event-based interface is the natural choice: it simply reports what it sees.
The main reason to adopt SAX is efficiency. SAX does fewer things than DOM, but it gives you more control over the parsing. Of course, if the parser does less work, it means that you, the developer, have more work to do.
Moreover, as already discussed, SAX consumes fewer resources than DOM, simply because it does not need to build a document tree.
In the early days of XML, DOM benefited from being the official, W3C-approved API. Increasingly, developers are choosing power over convenience and turning to SAX.
The main limitation of SAX is that there is no way to navigate backward through the document. Indeed, after firing an event, the parser forgets about it. As you will see, the application must explicitly buffer the events it is interested in.
Note: if you need it, the application can build a DOM tree from the events it receives from a SAX parser. In fact, several DOM parsers are built on top of SAX parsers.
Of course, whether it implements the SAX or the DOM API, the parser does a lot of work: it reads the document, enforces the XML syntax, and resolves entities, to list only a few tasks. A validating parser also enforces the document schema.
There are many reasons to use a parser, and you should master both APIs, SAX and DOM. That gives you the flexibility to choose the best API for the task at hand. Fortunately, most modern parsers support both APIs.
SAX, a powerful API. SAX is a standard, simple API for event-based parsers, developed collaboratively by the members of the XML-DEV mailing list. SAX stands for "Simple API for XML."
SAX was originally defined for Java, but it is also available for Python, Perl, C++, and COM (Windows objects). More language bindings are sure to follow. Through COM, SAX parsers can be used from any Windows programming language, including Visual Basic and Delphi.
Unlike DOM, SAX has not been endorsed by an official standards body, but it is widely used and considered a de facto standard. (SAX is currently edited by David Megginson, although he has announced his retirement.)
Java remains the preferred language for SAX, so the examples in this chapter are written in Java. (If you feel you need a Java crash course, turn to Appendix A or to the tutorials section of the developerWorks Java zone.)
Parsers that support SAX include Xerces (the Apache parser), MSXML (the Microsoft parser), and XDK (the Oracle parser). These parsers are the most flexible because they also support DOM.
A few parsers offer SAX only, such as James Clark's XP, Ælfred, and ActiveSAX from Vivid Creations (see Resources).
Getting started with SAX. Listing 2 is a Java application that looks for the cheapest offer in Listing 1 and prints the best price and the vendor's name. Compiling the example: to compile this application, you need a Java Development Kit (JDK) for your platform (see Resources). For this example, a Java runtime is not enough.
Note that Java has difficulties with paths that contain spaces. If Cheapest complains that it cannot find the file, check for spaces in the directory names.
Download the listings for this article from the XBE2 page of the author's Web site. The download includes Xerces. If you run into problems, visit the author's Web site for updates.
Save Listing 2 in a file called Cheapest.java. Go to a DOS prompt, change to the directory that holds Cheapest.java, and issue the following commands to compile it:
mkdir classes
set CLASSPATH=classes;lib/xerces.jar
javac -d classes src/Cheapest.java
The compiler places the compiled Java classes in the classes directory. These commands assume that you installed Xerces in the lib directory and saved Listing 2 in the src directory. If you installed the parser in another directory, you may have to adjust the class path (the second command).
To run the application against the price list, issue the following command:
java com.psol.xbe2.Cheapest data/pricelist.xml
The result should be:
The cheapest offer is from XMLi ($699.00)
This command assumes that Listing 1 is in a file called data/pricelist.xml. Again, you may need to adjust the path for your system.
Tip: about event handlers. The event handler does not call the parser. In fact, it is the other way around: the parser calls the event handler. Confused? Think of AWT events: the event listener attached to a button does not call the button; it waits to be called when the button is clicked.
The event handlers, step by step. SAX defines events as methods attached to specific Java interfaces. This section reviews Listing 2 step by step. The following sections give you more information on the major SAX interfaces.
The easiest way to declare an event handler is to extend the DefaultHandler class provided by SAX:
public class Cheapest
    extends DefaultHandler
The application implements only one event handler, startElement(), which the parser calls when it encounters a start tag. The parser calls this handler for every start tag in the document.
In Listing 2, the event handler is interested only in price-quote elements, so it tests for them. The handler does nothing for events from other elements.
if (uri.equals(NAMESPACE_URI) && name.equals("price-quote"))
{
    // ...
}
When the event handler finds a price-quote element, it extracts the vendor name and the price from the attribute list. With this information, finding the cheapest offer is a simple comparison:
String attribute =
    attributes.getValue("", "price");
if (null != attribute)
{
    double price = toDouble(attribute);
    if (min > price)
    {
        min = price;
        vendor = attributes.getValue("", "vendor");
    }
}
Note that the event handler receives the element name, the namespace, and the attribute list as parameters from the parser.
Now let's turn to the main() method. It creates an event handler object and a parser object:
Cheapest cheapest = new Cheapest();
XMLReader parser =
    XMLReaderFactory.createXMLReader(PARSER_NAME);
XMLReader and XMLReaderFactory are defined by SAX. An XMLReader is a SAX parser; the factory is a helper class for creating XMLReaders.
main() sets a parser feature to request namespace processing and registers the event handler with the parser. Finally, main() calls the parse() method with the URI of the XML file:
parser.setFeature("http://xml.org/sax/features/namespaces", true);
parser.setContentHandler(cheapest);
parser.parse(args[0]);
Tip: namespaces. By default, http://xml.org/sax/features/namespaces is set to true, but setting it explicitly makes the code more readable.
The innocuous-looking parse() method triggers the parsing of the XML document, which in turn results in calls to the event handler. Our startElement() method is called from within parse(). A lot happens behind parse().
Last but not least, main() prints the result:
Object[] objects = new Object[]
{
    cheapest.vendor,
    new Double(cheapest.min)
};
System.out.println(MessageFormat.format(MESSAGE, objects));
Wait! When do cheapest.vendor and cheapest.min get their values? We never set them in main()! That's right: that is the job of the event handler which, ultimately, is called by parse(). That's the beauty of events.
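Putting the fragments together, a minimal sketch of what Listing 2 could look like follows. Everything inside startElement() and main() comes straight from the fragments above; the constant values (NAMESPACE_URI, PARSER_NAME, MESSAGE), the initial value of min, and the toDouble() helper are my assumptions.

package com.psol.xbe2;

import java.text.MessageFormat;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class Cheapest
    extends DefaultHandler
{
    // assumed values; adjust the namespace URI to match your price list
    protected static final String
        NAMESPACE_URI = "http://www.psol.com/xbe2/pricelist",
        PARSER_NAME = "org.apache.xerces.parsers.SAXParser",
        MESSAGE = "The cheapest offer is from {0} (${1})";

    protected double min = Double.MAX_VALUE;
    protected String vendor = null;

    public void startElement(String uri, String name,
                             String qualifiedName, Attributes attributes)
    {
        if (uri.equals(NAMESPACE_URI) && name.equals("price-quote"))
        {
            String attribute = attributes.getValue("", "price");
            if (null != attribute)
            {
                double price = toDouble(attribute);
                if (min > price)
                {
                    min = price;
                    vendor = attributes.getValue("", "vendor");
                }
            }
        }
    }

    // helper assumed by the fragment above: convert the price attribute
    protected double toDouble(String s)
    {
        return Double.parseDouble(s);
    }

    public static void main(String[] args)
        throws Exception
    {
        Cheapest cheapest = new Cheapest();
        XMLReader parser =
            XMLReaderFactory.createXMLReader(PARSER_NAME);
        parser.setFeature("http://xml.org/sax/features/namespaces", true);
        parser.setContentHandler(cheapest);
        parser.parse(args[0]);
        Object[] objects = new Object[]
        {
            cheapest.vendor,
            new Double(cheapest.min)
        };
        System.out.println(MessageFormat.format(MESSAGE, objects));
    }
}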
Note
Keep in mind that these examples will not compile unless a Java Development Kit has been installed.
Finally, you might get an error similar to:
src/Cheapest.java:7: package org.xml.sax not found in import.
import org.xml.sax.*;
or
Can't find class com/psol/xbe2/Cheapest
or something it requires
This most likely comes from one of the following causes:
The class path (the second command, classes;lib/xerces.jar) is incorrect.
You typed an incorrect class name in the last command (it should be com.psol.xbe2.Cheapest).
Common SAX interfaces and classes. So far we have discussed only one event, startElement(). Before going any further, let's review the main interfaces defined by SAX.
Note: SAX versions. There are two versions of SAX so far, SAX1 and SAX2. This chapter covers only the SAX2 API. SAX1 is very similar to SAX2, but it lacks namespace processing. SAX groups its events into several interfaces:
ContentHandler defines events related to the document itself (such as start and end tags). Most applications register for these events.
DTDHandler defines events related to the DTD. However, it does not define enough events to report the DTD completely. If you need to parse the DTD, use the optional DeclHandler. DeclHandler is a SAX extension, and not every parser supports it.
EntityResolver defines events related to loading entities. Only a few applications register for these events.
ErrorHandler defines error events. Many applications register for these events so they can report errors in their own way.
Note: in the interest of simplicity, this section is not a comprehensive SAX reference. Rather, it concentrates on the most commonly used classes.
To simplify your work, SAX provides a default implementation of these interfaces in the DefaultHandler class. In most cases, it is easier to extend DefaultHandler and override only the methods that are relevant to the application.
XMLReader. To register event handlers and start the parser, an application uses the XMLReader interface. As we have seen, parse(), an XMLReader method, starts the parsing:
parser.parse(args[0]);
The main methods of XMLReader are:
parse(), which parses an XML document. parse() comes in two versions: one accepts a file name or URL, the other accepts an InputSource object (see the "InputSource" section).
setContentHandler(), setDTDHandler(), setEntityResolver(), and setErrorHandler(), which let the application register event handlers.
setFeature() and setProperty(), which control how the parser works. They take a feature or property identifier (a URI, similar to a namespace) and a value. Features take Boolean values, while properties take objects.
The most common features are:
http://xml.org/sax/features/namespaces, which all SAX parsers recognize. If it is set to true (the default), the parser recognizes namespaces and resolves prefixes when calling ContentHandler methods.
http://xml.org/sax/features/validation, which is optional. If it is set to true, a validating parser validates the document. Non-validating parsers ignore this feature.
XMLReaderFactory. XMLReaderFactory creates parser objects. It defines two versions of createXMLReader(): one takes the class name of the parser as a parameter, the other takes the class name from the org.xml.sax.driver system property.
For Xerces, the class name is org.apache.xerces.parsers.SAXParser. You should use XMLReaderFactory because it makes it easy to switch to another SAX parser: you change only one line and recompile.
XMLReader parser = XMLReaderFactory.createXMLReader(
    "org.apache.xerces.parsers.SAXParser");
For even more flexibility, the application can read the class name from the command line or use createXMLReader() with no parameters; then the parser can be changed without recompiling at all.
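As a small illustration of the no-argument version, the sketch below sets the org.xml.sax.driver system property in code; in practice the property could just as well be supplied with -Dorg.xml.sax.driver=... on the java command line. The Xerces class name here is only one possible value, not a requirement.

import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class ParserFromProperty
{
    public static void main(String[] args) throws Exception
    {
        // select the parser through the org.xml.sax.driver system property
        System.setProperty("org.xml.sax.driver",
                           "org.apache.xerces.parsers.SAXParser");
        XMLReader parser = XMLReaderFactory.createXMLReader();
        System.out.println("Created " + parser.getClass().getName());
    }
}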
InputSource. InputSource controls how the parser reads files, including XML documents and entities.
In most cases, documents are loaded from a URL, but applications with special needs can override InputSource. For example, it can be used to load documents from a database.
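For instance, a sketch of feeding the parser a document that came from somewhere other than a URL is shown below; the xml string stands in for whatever a database query returned, and the system identifier and parser class are assumptions of mine.

import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class ParseFromMemory
{
    public static void main(String[] args) throws Exception
    {
        // stand-in for data fetched from a database
        String xml = "<doc>hello</doc>";

        InputSource source = new InputSource(new StringReader(xml));
        source.setSystemId("memory://example");   // used only in error messages

        XMLReader parser = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");   // assumed parser class
        parser.parse(source);
    }
}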
ContentHandler. ContentHandler is the most commonly used SAX interface because it defines the events for the XML document itself.
As you have seen, Listing 2 implements startElement(), an event defined in ContentHandler, and registers its ContentHandler with the parser:
Cheapest cheapest = new Cheapest();
// ...
parser.setContentHandler(cheapest);
ContentHandler declares the following events:
startDocument()/endDocument() notify the application of the start or end of the document.
startElement()/endElement() notify the application of a start or end tag. The attributes are passed as an Attributes parameter (see "Attributes" below). Empty elements written with a single tag still generate both startElement() and endElement().
startPrefixMapping()/endPrefixMapping() notify the application of namespace scopes. You will rarely need this information, because the parser already resolves namespaces when http://xml.org/sax/features/namespaces is true.
characters()/ignorableWhitespace() notify the application when the parser finds text (parsed character data) in an element. Be warned that the parser is free to split the text across several events (the better to manage its buffers). The ignorableWhitespace event is used for the ignorable whitespace defined by the XML standard.
processingInstruction() notifies the application of a processing instruction.
skippedEntity() notifies the application that an entity was skipped (that is, when the parser has not found its declaration in the DTD/schema).
setDocumentLocator() passes a Locator object to the application; see the "Locator" section below. Note that SAX parsers are not required to supply a Locator, but if they do, they must fire this event before any other.
Attributes. In the startElement() event, the application receives the list of attributes in an Attributes parameter.
String attribute = attributes.getValue("", "price");
Attributes defines the following methods:
getValue(i)/getValue(qName)/getValue(uri, localName) return the value of the i-th attribute or of the attribute with the given name.
getLength() returns the number of attributes.
getQName(i)/getLocalName(i)/getURI(i) return the qualified name (with prefix), the local name (without prefix), and the namespace URI of the i-th attribute. getType(i)/getType(qName)/getType(uri, localName) return the type of the i-th attribute or of the attribute with the given name. The type is a string, as in the DTD: "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION".
Note that the Attributes parameter is available only during the startElement() event. If you need it between events, make a copy with AttributesImpl.
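For example, a handler that needs the attributes after startElement() returns might copy them as in the sketch below; the class and field names are mine, but the AttributesImpl copy constructor is part of org.xml.sax.helpers.

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class CopyingHandler extends DefaultHandler
{
    private AttributesImpl savedAttributes;   // assumed field

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
    {
        // the parser may reuse its Attributes object after this event,
        // so take a copy if the values are needed later
        savedAttributes = new AttributesImpl(attributes);
    }
}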
Locator. A Locator gives the application the row and column of the current event (a usage sketch follows the list below). Parsers are not required to provide a Locator object.
Locator defines the following methods:
getColumnNumber() returns the column where the current event ends. For an endElement() event, it returns the last column of the end tag.
getLineNumber() returns the line where the current event ends. For an endElement() event, it returns the last line of the end tag.
getPublicId() returns the public identifier of the current document event.
getSystemId() returns the system identifier of the current document event.
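The usual pattern is to save the Locator when the parser offers it and use it later when reporting something, as in this sketch of mine (the message format is, of course, up to you):

import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatingHandler extends DefaultHandler
{
    private Locator locator;   // may stay null if the parser supplies none

    public void setDocumentLocator(Locator locator)
    {
        this.locator = locator;
    }

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
    {
        if (locator != null)
            System.out.println(localName + " starts near line "
                               + locator.getLineNumber()
                               + ", column " + locator.getColumnNumber());
    }
}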
DTDHandler. DTDHandler declares two events related to parsing the DTD:
notationDecl() notifies the application that a notation has been declared.
unparsedEntityDecl() notifies the application that an unparsed entity declaration has been found.
EntityResolver. The EntityResolver interface defines only one event, resolveEntity(), which returns an InputSource (discussed in another chapter).
Because the SAX parser already resolves most URLs, few applications implement EntityResolver. The exception is catalog files (discussed in another chapter), which map public identifiers to system identifiers. If you need catalog files in your application, download Norman Walsh's catalog package (see Resources).
ErrorHandler. The ErrorHandler interface defines error events. Applications that handle these events can provide their own error handling.
Once a custom error handler is installed, the parser no longer throws exceptions; throwing exceptions becomes the responsibility of the event handler.
The interface defines three methods corresponding to three levels, or severities, of error (a minimal implementation is sketched after the list):
warning() signals problems that are not errors as defined by the XML specification. For example, some parsers issue a warning when there is no XML declaration. It is not an error (the declaration is optional), but it may be worth flagging.
error() signals errors as defined by the XML specification.
fatalError() signals fatal errors as defined by the XML specification.
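A minimal custom error handler might look like the sketch below; how tolerant to be at each severity level is an application decision, and the class name and messages are my own.

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ConsoleErrorHandler implements ErrorHandler
{
    public void warning(SAXParseException e)
    {
        System.err.println("Warning at line " + e.getLineNumber()
                           + ": " + e.getMessage());
    }

    public void error(SAXParseException e)
    {
        System.err.println("Error at line " + e.getLineNumber()
                           + ": " + e.getMessage());
    }

    public void fatalError(SAXParseException e)
        throws SAXException
    {
        // stop the parse on fatal errors by rethrowing
        throw e;
    }
}

It would be registered with the parser through parser.setErrorHandler(new ConsoleErrorHandler()).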
SAXException. Most methods defined by SAX can throw a SAXException. A SAXException signals an error while parsing the XML document.
The error can be a parsing error or an error in an event handler. To report other exceptions from an event handler, you can wrap them in a SAXException.
For example, suppose the event handler catches an IndexOutOfBoundsException while processing a startElement event. The handler can wrap the IndexOutOfBoundsException in a SAXException:
public void startElement(String uri,
                         String name, String qualifiedName,
                         Attributes attributes)
    throws SAXException
{
    try
    {
        // code that may throw an IndexOutOfBoundsException
    }
    catch (IndexOutOfBoundsException e)
    {
        throw new SAXException(e);
    }
}
The SAXException travels up to the parse() method, where it is caught and interpreted:
try
{
    parser.parse(uri);
}
catch (SAXException e)
{
    Exception x = e.getException();
    if (null != x)
        if (x instanceof IndexOutOfBoundsException)
        {
            // process the IndexOutOfBoundsException
        }
}
Maintaining state. Listing 1 is very convenient for a SAX parser because it stores the information as attributes of the price-quote elements; the application only needs to register startElement().
The example in Listing 3 is more difficult because the information is scattered across several elements. In particular, vendors quote different prices depending on the delivery delay: if the user is willing to wait, he or she may get a better price. Figure 6 illustrates the document structure.
Listing 3. xtpricelist.xml
<?xml version="1.0"?>
<xbe:price-list ...>
  ...
  <xbe:vendor>...</xbe:vendor>
  <xbe:vendor>...</xbe:vendor>
  <xbe:vendor>...</xbe:vendor>
</xbe:price-list>
Figure 6. Structure of the price list
To find the best deal, the application must collect information from several elements. However, the parser can generate three events for each element: startElement(), characters(), and endElement(). The application must somehow relate events and elements.
Listing 4 is a new Java application that looks for the best deal in the price list. It takes the customer's delivery requirements into account when looking for the best price. Indeed, the cheapest vendor in Listing 3 (XMLi) is also the slowest, while Emailaholic is expensive but delivers within two days.
You compile and run this application just like the Cheapest application described earlier. The result depends on the delivery requirements. Note that this program takes two parameters: the file name and the longest delay the customer is willing to wait.
java com.psol.xbe2.BestDeal data/xtpricelist.xml 60
returns:
The best deal is proposed by XMLi. A(n) XML training delivered
in 45 days for $699.00
and:
java com.psol.xbe2.BestDeal data/xtpricelist.xml 3
returns:
The best deal is proposed by Emailaholic. A(n) XML training
delivered in 1 days for $1,999.00
A layered architecture. Listing 4 is the most complex application you have seen so far. That is not unusual: the SAX parser is quite low-level, so the application has to take over work that a DOM parser would otherwise do.
The application is organized around two classes: SAX2BestDeal and BestDeal. SAX2BestDeal manages the interface with the SAX parser: it manages state and groups events in a way that is meaningful to the application.
BestDeal contains the logic that compares prices. It also maintains the information in a structure that suits the application rather than one optimized for XML. Figure 7 illustrates the architecture of the application, and Figure 8 shows the UML class diagram.
Figure 7. Architecture of the application
Figure 8. Class diagram of the application
SAX2BestDeal handles several events: startElement(), endElement(), and characters(). It keeps track of its position in the document tree.
For example, in the characters() event, SAX2BestDeal needs to know whether the text is a name, a price, or ignorable whitespace. Moreover, there are two name elements: the name of the price-list and the name of a vendor.
State. Unlike a DOM parser, a SAX parser provides no state information; the application is responsible for tracking its own state. There are several ways to do this. Listing 4 identifies the meaningful states and the transitions between them. It is not difficult to derive this information from the document structure in Figure 6.
Obviously, the application first encounters a price-list tag, so the first state is "within price-list." From there, the application reaches a name, so the second state is "within name in price-list."
The next element must be a vendor, so the third state is "within vendor in price-list." The fourth state is "within name in vendor in price-list," because a name follows the vendor. After the name comes a price-quote element, whose corresponding state is "within price-quote in vendor in price-list." After that, the parser encounters either another price-quote or another vendor.
These concepts are easier to grasp on a diagram of the states and the transitions between them, such as Figure 9. Note that there are two different states for name elements, depending on whether the parser is processing price-list/name or price-list/vendor/name.
Figure 9. State transition diagram
A state variable in Listing 4 stores the current state:
final protected int START = 0,
                    PRICE_LIST = 1,
                    PRICE_LIST_NAME = 2,
                    VENDOR = 3,
                    VENDOR_NAME = 4,
                    VENDOR_PRICE_QUOTE = 5;

protected int state = START;
Transitions (1). The state variable changes in response to events. In this example, startElement() updates it:
switch (state)
{
    case START:
        if (name.equals("price-list"))
            state = PRICE_LIST;
        break;

    case PRICE_LIST:
        if (name.equals("name"))
            state = PRICE_LIST_NAME;
        // ...
        if (name.equals("vendor"))
            state = VENDOR;
        break;

    case VENDOR:
        if (name.equals("name"))
            state = VENDOR_NAME;
        // ...
        if (name.equals("price-quote"))
            state = VENDOR_PRICE_QUOTE;
        // ...
        break;
}
SAX2BestDeal has several instance variables in which it stores the content of the current name and price-quote. In effect, it maintains a small subset of the tree. Note that, unlike DOM, it never holds the whole tree: it discards the name and price-quote values as soon as the application has used them.
This is a very efficient memory-management strategy. In fact, it can process extremely large documents, because only a small subset is in memory at any time.
Transitions (2). The parser calls characters() for the character data in the document. Only the text in name and price-quote elements is meaningful, so the event handler uses the state to decide what to keep:
switch (state)
{
    case PRICE_LIST_NAME:
    case VENDOR_NAME:
    case VENDOR_PRICE_QUOTE:
        buffer.append(chars, start, length);
        break;
}
Transitions (3). The endElement() event handler updates the state and calls BestDeal to process the current element:
switch (state)
{
    case PRICE_LIST_NAME:
        if (name.equals("name"))
        {
            state = PRICE_LIST;
            setProductName(buffer.toString());
            // ...
        }
        break;

    case VENDOR_NAME:
        if (name.equals("name"))
            state = VENDOR;
        // ...
        break;

    case VENDOR_PRICE_QUOTE:
        if (name.equals("price-quote"))
        {
            state = VENDOR;
            // ...
            compare(vendorName, price, delivery);
            // ...
        }
        break;

    case VENDOR:
        if (name.equals("vendor"))
            state = PRICE_LIST;
        // ...
        break;

    case PRICE_LIST:
        if (name.equals("price-list"))
            state = START;
        break;
}
Listing 4 is a typical SAX application: a SAX event handler (SAX2BestDeal) packages the events into a form that suits the application.
Note: state or stack? An alternative to using a state variable is to use a stack: push the element name (or another identifier) in startElement(), then pop it in endElement(), as sketched below.
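A minimal sketch of the stack-based variant follows; the class name, the field, and the choice of which elements to buffer are my assumptions, not part of Listing 4.

import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StackingHandler extends DefaultHandler
{
    // the top of the stack always names the element being processed
    private Stack path = new Stack();

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
    {
        path.push(localName);
    }

    public void endElement(String uri, String localName, String qName)
    {
        path.pop();
    }

    public void characters(char[] chars, int start, int length)
    {
        // only buffer text when the current element is of interest
        String current = path.empty() ? null : (String) path.peek();
        if ("name".equals(current) || "price-quote".equals(current))
        {
            // append chars[start..start+length] to a buffer here
        }
    }
}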
The application logic (in BestDeal) is separate from the event handler. Indeed, in many cases the application logic can be written with no knowledge of XML at all.
The layered approach draws a clear boundary between the application logic and the parsing.
The example also clearly illustrates that SAX is more efficient than DOM but requires more work from the programmer. In particular, the programmer must explicitly manage the states and the transitions between them. (With DOM, the state is implicit in the recursive traversal of the tree.)
Flexibility. XML is a very flexible standard. In practice, though, how flexible your XML applications are depends on you, the programmer, and how you write them. This section offers a few tips to make sure your applications take advantage of XML's flexibility.
Building for flexibility. The BestDeal application places very few constraints on the structure of the XML document. If elements are added to the document, they are simply ignored. For example, BestDeal would accept the following vendor element:
<xbe:vendor>
  ...
  <xbe:contact>...</xbe:contact>
  ...
</xbe:vendor>
However, the contact information would be ignored. Ignoring unknown elements is usually a good idea; it is what HTML browsers have always done.
Enforcing a structure. Still, it is not difficult to verify the structure from the event handlers. The following code fragment (taken from startElement()) checks the structure and throws a SAXException if the vendor element contains any element other than name or price-quote:
case VENDOR:
    if (name.equals("name"))
    {
        state = VENDOR_NAME;
        buffer = new StringBuffer();
    }
    else if (name.equals("price-quote"))
    {
        state = VENDOR_PRICE_QUOTE;
        String st = attributes.getValue("", "delivery");
        delivery = Integer.parseInt(st);
        buffer = new StringBuffer();
    }
    else
        throw new SAXException("Expecting <name> or <price-quote>");
    break;
If the price list includes a contact element, the application reports:
org.xml.sax.SAXException: Expecting <name> or <price-quote>
However, if the application really depends on the structure of the document, it is better to write a schema and use a validating parser.
What next? In this chapter of XML by Example, Second Edition, you learned how to read XML documents. The rest of the book walks you through the whole process, up to and including writing documents. Alternatively, you can turn to online tutorials and articles to complete your study. (Naturally, I hope you will choose my book.)
Resources
Download the listings for this article from the XBE2 page on the author's Web site.
Download the JDK from Sun or IBM.
Find more information about XML by Example, Second Edition on the publisher's page for the book, including links to online bookstores where you can order it (to be published in September).
Browse the FAQ, history, and supporting software on the official SAX home page, developed by the group on the XML-DEV list. Download SAX from the SAX project page on SourceForge.
SAX-only parsers:
James Clark's XP
Ælfred
ActiveSAX from Vivid Creations
Xerces, the Apache parser (formerly the IBM parser), is one of several parsers that support both DOM and SAX. Java and C++ versions are available, and it can also be used from Perl and COM.
See the SAX2 drivers and applications listed on the SAX pages for more SAX2 software.
If you need catalog files in your application, download Norman Walsh's catalog package.
Check out recently published SAX tips in the developerWorks XML zone, including "Turning on validation in SAX-based parsers" and "Using the SAX EntityResolver."
Look at more examples of XML applications built with SAX in the Working XML column of the developerWorks XML zone.
See how the IBM WebSphere Application Server uses XML parsers and the XSLT engine in the WAS Advanced Edition 3.5 online help.
For more on SAX and DOM, see the slides (PDF) from the presentation "A Detailed Introduction to Parsing and Processing XML Documents Using Java (TM) Technology" by Kelvin Lawrence, IBM CTO for XML technology.
When we use SAX to parse XML files, we are often swamped by large numbers of if or switch statements. With an appropriate design pattern, combined with a suitable algorithm, we can avoid scattering these judgment statements all over the parser.
About SAX. Anyone who knows XML parsing knows that there are two basic ways to parse XML: DOM (the Document Object Model) and SAX (the Simple API for XML). SAX does not build a tree for the XML document, which reduces memory consumption; SAX is event-driven, and the user only needs to implement the interfaces he or she cares about, which simplifies the program.
In practice, if you only need to scan the content of the XML once and do not need to revisit nodes, SAX should be the first choice. However, it has its own weaknesses:
SAX is event-driven: the user implements the corresponding interface methods, and different nodes are distinguished mainly by node name. With the conventional approach, a large number of judgment statements are needed to determine which node is currently being processed, and these statements end up scattered throughout the parser. Any change to the nodes then requires modifying the judgment statements in every interface method, which makes the whole program framework hard to maintain.
The Composite design pattern
When talking about design, you cannot avoid design patterns. Design patterns are a language we use to communicate with each other when we design, just as UML is the language we use when we model.
Each pattern describes a problem that occurs over and over again in our designs, together with the core of its solution. In this way we can reuse the solution again and again without repeating the work. The essence of a pattern is to provide a solution to a class of related problems.
Design patterns should not be used indiscriminately. Usually they achieve flexibility and variability by introducing an additional level of indirection, which makes the design more complex and sacrifices some performance. A design pattern is worthwhile only when the flexibility it provides is truly needed.
Structure diagram:
The key to the Composite pattern is the abstract class Component, which can represent either a leaf node or a composite node; it defines the operations common to all nodes (a generic sketch follows the applicability list below).
Intent: compose objects into tree structures to represent "part-whole" hierarchies. Composite lets clients treat individual objects and compositions of objects uniformly.
Applicability: use the Composite pattern when:
You want to represent part-whole hierarchies of objects.
You want clients to be able to ignore the difference between compositions of objects and individual objects; clients will treat all objects in the composite structure uniformly.
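As a reminder of the pattern's shape, here is a generic sketch; the names follow the usual pattern vocabulary, not the classes used later in this article.

import java.util.ArrayList;
import java.util.List;

// Component: common interface for leaves and composites
abstract class Component
{
    abstract void operation();

    // default child management; leaves simply keep this behavior
    void add(Component child)
    {
        throw new UnsupportedOperationException();
    }
}

// Leaf: has no children
class Leaf extends Component
{
    void operation() { System.out.println("leaf"); }
}

// Composite: stores children and forwards the operation to them
class Composite extends Component
{
    private List children = new ArrayList();

    void add(Component child) { children.add(child); }

    void operation()
    {
        System.out.println("composite");
        for (int i = 0; i < children.size(); i++)
            ((Component) children.get(i)).operation();
    }
}

Client code works only with Component, so it does not need to know whether it holds a single object or a whole subtree.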
Our background. Our team planned to use XML as a configuration file format to describe how pages are displayed, so that the corresponding JSP code could be generated automatically. Below I explain our design ideas and how they evolved, and share them with you.
In our system we simplified the pages and limited the page elements to a handful of kinds. Initially, our DTD was defined as follows:
<?xml version="1.0" encoding="GB2312-1"?>
<!ELEMENT ... >
<!ATTLIST ...
    hidden (true | false) "false"
    onkeypress CDATA #IMPLIED>
<!ATTLIST button
    action CDATA #IMPLIED
    onclick CDATA #IMPLIED>
In the original XML parser there was no unified object interface: each node was handled according to its own characteristics, and names, attributes, and child elements were not treated uniformly. Moreover, some nodes had no corresponding Java class, while some Java classes corresponded to several nodes. As a result, the parsing code was full of judgment statements. Later we reviewed the system requirements and found that there are only a few basic nodes (label, input, button) and that the other nodes can contain several basic nodes. To remove all the judgment statements from the program, every node has to be handled in exactly the same way. This required two changes to our design:
1. Extend the DTD and add a classname attribute to every node, so that every node object is created in exactly the same way, dynamically by classname; see Resources for the new DTD.
2. Extend the IXMLElement interface so that every node has a corresponding Java class, and apply the Composite design pattern so that leaf nodes (basic nodes) and non-leaf nodes (container nodes) are handled uniformly.
The new class diagram is as follows:
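The class diagram is not reproduced here, but the sketch below shows roughly what the element side of it could look like. The method names are taken from the parser code that follows; the fields, the package placement (inferred from the parser's import), and the error handling are my assumptions.

package lulusoft.jspbuilderv10.common;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Composite node used by the parser below: leaves and containers share this type
public class CXMLElement
{
    private Map attributes = new HashMap();
    private List children = new ArrayList();
    private String value = "";

    // factory method used by the parser: instantiate the class named in the
    // node's classname attribute (expected to be a subclass of CXMLElement)
    public static CXMLElement createElement(String className)
    {
        try
        {
            return (CXMLElement) Class.forName(className).newInstance();
        }
        catch (Exception e)
        {
            e.printStackTrace();
            return null;
        }
    }

    public void addAttribute(String name, String value)
    {
        attributes.put(name, value);
    }

    public void setElementValue(String value)
    {
        this.value = value;
    }

    public void addContainElement(CXMLElement child)
    {
        children.add(child);
    }
}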
Second, a new algorithm is needed.
1. Since SAX is event-driven, we use a stack-based approach, described briefly as follows:
2. At the start of each node element, create a node object, store all of its attributes in it, and push it onto the stack.
3. At the end of each node element, pop two objects off the stack, add the first object as a child of the second object, and push the second object back onto the stack.
The full code of the parser is as follows:
package lulusoft.jspbuilderv10.parser;

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import org.apache.xerces.parsers.SAXParser;
import java.util.Stack;
import lulusoft.jspbuilderv10.common.*;

public class CXMLParser extends org.xml.sax.helpers.DefaultHandler {

    /** Private fields */
    CXMLElement root = null;
    private CharArrayWriter contents = new CharArrayWriter();
    private Stack stack = null;

    /**
     * Constructor
     */
    public CXMLParser() {
        super();
        stack = new Stack();
    }

    /**
     * Start of an element
     */
    public void startElement(String namespaceURI,
                             String localName,
                             String qName,
                             Attributes attr) throws SAXException {
        contents.reset();
        String cn = attr.getValue("classname");
        if (cn == null || cn.trim().equals("")) {
            System.out.println("The tag has no corresponding class! Please check!");
        }
        CXMLElement element = CXMLElement.createElement(cn);
        int size = attr.getLength();
        for (int i = 0; i < size; i++) {
            String name = attr.getQName(i);
            String value = attr.getValue(i);
            if (name.equals("classname")) {
                continue;
            }
            element.addAttribute(name, value);
        }
        stack.push(element);
    }

    /**
     * End of an element
     */
    public void endElement(String namespaceURI,
                           String localName,
                           String qName) throws SAXException {
        if (stack.size() > 1) {
            CXMLElement element1 = (CXMLElement) stack.pop();
            CXMLElement element2 = (CXMLElement) stack.pop();
            if (!contents.toString().trim().equals("")) {
                element1.setElementValue(contents.toString().trim());
            }
            element2.addContainElement(element1);
            stack.push(element2);
        } else {
            root = (CXMLElement) stack.peek();
        }
    }

    /**
     * Return the root element
     */
    public CXMLElement getRoot() {
        return root;
    }

    public void characters(char[] ch, int start, int length)
        throws SAXException {
        contents.write(ch, start, length);
    }

    /**
     * Parse an XML file and return the data
     * @param fileName String the XML file name
     * @return CXMLElement
     */
    public static CXMLElement parseDocument(String fileName) {
        CXMLParser m_oXMLParser = new CXMLParser();
        try {
            XMLReader xr = new SAXParser();
            xr.setContentHandler(m_oXMLParser);
            xr.parse(new InputSource(new FileReader(fileName)));
        } catch (SAXParseException saxParseException) {
            System.out.println("Parsing error:");
            System.out.println("Line: " + saxParseException.getLineNumber());
            System.out.println("Column: " + saxParseException.getColumnNumber());
            System.out.println(saxParseException.getMessage());
        } catch (SAXException saxException) {
            saxException.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return m_oXMLParser.getRoot();
    }
}

XML is currently a very popular data-representation format: it is portable, platform-independent, and directly human-readable. The Document Object Model (DOM) is an interface for accessing XML data. Unfortunately, DOM is a fairly complex API that is hard to master quickly. If you know the DTD of the data, however, things get much easier. This article introduces how to use the Java version of DOM when the DTD is known.
The Extensible Markup Language (XML) is quite popular as a portable, platform-independent, human-readable data format. Many software vendors claim to "support XML," which usually means that their products can generate or consume data in XML format. XML is also regarded as a universal format for exchanging data between enterprises: companies can agree on XML document type definitions (DTDs) for the data they exchange, and these DTDs are independent of the data types each company uses internally. Many standards organizations are working on DTDs that regulate data exchange. One example is the International Press Telecommunications Council (see Resources), which has defined an XML DTD that lets transferred news information be easily converted into electronic publishing formats. Such market standards will allow data to be exchanged between different applications without prior arrangement.
The W3C specifies the syntax and semantics of XML in the XML specification (see Resources). An XML document must be parsed before it can be processed. If every program had to do this parsing itself, the job would be very difficult, because the syntax and semantics of the language are quite involved. The W3C defined the Document Object Model (DOM) to solve this problem: the DOM is an application programming interface for XML data.
Most XML parsers generate a DOM description of the XML they parse.
The DOM standard. The DOM API is defined as a set of CORBA IDL interfaces (see Resources). It describes a parsed XML document as an abstract tree: abstract because only the interfaces reflect the tree structure, while the actual data structures and algorithms used to implement the abstract tree need not be tree-shaped at all. Because the DOM API is specified in CORBA IDL, it is supported by many programming languages, including Java. We assume standard Java throughout this article; the DOM specification includes detailed Java bindings.
DOM Level 1 has been in use since 1998. It left some areas open for later extension based on practical experience. The DOM Level 2 specification adds, among other things, support for XML namespaces, document creation, views, and style. The Level 2 specification is still under public review; although it is technically not final, it is already quite stable. Many XML parsers can be used from Java programs to produce a DOM Level 1 description of an XML document, so the code here assumes only a subset of DOM Level 1.
Generic versus DTD-specific code. Java code written against the DOM API is either generic or based on a specific DTD. Generic code works with any XML document. It is usually harder to write, because it must traverse the entire DOM tree and allow for every possibility; it cannot rely on any particular element, attribute, or document structure. Generic code is used for generic tasks, such as checking the spelling of a document, counting words, or sending a file over a network. DTD-specific code, on the other hand, is written for one particular DTD and cannot be used to manipulate XML documents defined by another DTD. DTD-specific code is easier to write because it can assume that the XML document has the format prescribed by that DTD. For example, if a DTD declares that an element called name requires an attribute called given, the Java code can assume the attribute exists and retrieve it with a simple call to getAttribute(). This article will help you write DTD-specific Java code; learning to write generic code is the natural next step.
An example. To explain how to use the DOM API from a Java program, we use an order-processing program as the example, because it is a typical XML-based B2B application with a rich XML structure. Here is the DTD we use for orders:
<?xml version="1.0" encoding="us-ascii"?>
...
<!ATTLIST name
    given CDATA #REQUIRED
    family CDATA #REQUIRED>
<!ELEMENT street (#PCDATA)>
...
Below is an XML document containing a single order for 100 ornaments:
<?xml version="1.0"?>
<order>
  <header>
    ...
    <billing>
      ...
      <address>
        <street>555 Main Street</street>
        <city>Mill Valley</city>
        <state>California</state>
        <country>USA</country>
      </address>
    </billing>
    <shipping>
      ...
      <address>
        <street>100 main street</street>
        <city>Brisbane</city>
        <state>California</state>
        <country>USA</country>
      </address>
    </shipping>
  </header>
  <item>...</item>
</order>
An XML parser turns the above XML document into a tree. The tree structure can be pictured as follows:
Ellipses represent XML elements, squares represent data, and the lines leaving the name element represent XML attributes. The address elements are not shown in detail in the figure.
A few simple steps for accessing XML data from a Java program. We now use DTD-specific Java code to illustrate an important part of the DOM API. In particular, we demonstrate the following DOM methods:
getDoctype
getName
getElementsByTagName
item
getFirstChild
getNodeValue
getAttribute
getChildNodes
getLength
getTagName
In the example code, every one of these method calls is part of the DOM API. The complete source code is available for download.
Defining a simple interface for accessing the DOM document. The first step is to define an abstract interface that greatly simplifies the code that uses the XML document. For our example DTD, we define the following interface:
interface Order {
    String creditCard();
    String billingName();
    double totalPrice();
    boolean authorizeCredit();
}
The code that accesses the XML data simply calls the operations defined by this interface.
Note that this interface can be implemented in several different ways, some of which have nothing to do with XML (for example, an implementation could send a query to a database). Here, of course, we only care about an implementation that processes the XML data with the DOM API and that matches the order DTD shown earlier. Implementing the interface: we now write a class that implements the interface and wraps the DOM document. For example, we define:
class OrderImpl implements Order {
    Document theDocument;
Binding the DOM document to the wrapper class. The constructor of the wrapper class receives the DOM document. It checks the document type to make sure the document really conforms to the order DTD. Remember that this code is written for this particular DTD and makes assumptions about the structure and content of the data.
public OrderImpl(Document document) throws Exception {
    theDocument = document;
    DocumentType doctype = theDocument.getDoctype();
    if (doctype == null)
        throw new Exception("Cannot determine document type.");
    if (!doctype.getName().equals("order"))
        throw new Exception("Document is not an order.");
}
Note that the code uses the getDoctype operation, defined by the DOM on the Document interface, to obtain the document type. getDoctype returns an object that supports the DocumentType interface, whose name gives the type of the document.
Also note that some DOM implementations return null from getDoctype; those implementations cannot be used with this constructor.
Having bound the document to the wrapper class, we now implement the interface. Each operation of the interface illustrates how to accomplish a specific task with the DOM.
Returning the value of a unique element. The creditCard method illustrates how to return the value of a particular element. It uses the getElementsByTagName operation defined on the Document interface:
public String creditCard() {
    NodeList nl = theDocument.getElementsByTagName("creditcard");
    return nl.item(0).getFirstChild().getNodeValue();
}
In general, getElementsByTagName returns a list of elements. Because there is only one element named creditcard in our sample DTD, the list contains a single element, item(0). nl.item(0) can then be pictured as in the following figure:
The string value of the creditcard element is obtained by calling getFirstChild().getNodeValue() on the creditcard node.
Note that the getElementsByTagName(elementName) operation returns all the elements named elementName. By definition, it returns them in the order of a preorder traversal of the document tree.
Because creditcard is unique in our example, we can find the element directly. Other elements, such as name, are not unique, so we cannot simply use the first element returned by getElementsByTagName. Indeed, one child of billing is called name, and one child of shipping is also called name.
Returning a value from a unique subtree. For names such as name that are not unique in the document, the billingName method shows another way to obtain an element's value. Note that name is not unique in the document, but it is unique within the billing subtree, and the billing element itself is unique in the whole document. So we can simply call getElementsByTagName("billing") on the document, and then call getElementsByTagName on the billing element that is returned. This works because getElementsByTagName is also defined on the Element interface in the DOM API.
public String billingName() {
    NodeList bl = theDocument.getElementsByTagName("billing");
    NodeList nl = ((Element) bl.item(0)).getElementsByTagName("name");
    Element name = (Element) nl.item(0);
    return name.getAttribute("given") + " " + name.getAttribute("family");
}
The billingName method also illustrates another technique: obtaining the value of an element's attribute. Recall that in our DTD the name element declares two attributes, given and family. The getAttribute operation, defined on the Element interface, returns the text value of an attribute.
Returning element values from multiple subtrees. Now consider the price element. We can no longer use the previous technique, because price is both a child of the document element and a child of each item element. The totalPrice method illustrates yet another way to find the value of a non-unique element. Given the document structure, we want the top-level price element.
public double totalPrice() {
    NodeList nl = theDocument.getDocumentElement().getChildNodes();
    Element candidateElement = null;
    for (int i = 0; i < nl.getLength(); i++) {
        if (nl.item(i) instanceof Element) {
            candidateElement = (Element) nl.item(i);
            if (candidateElement.getTagName().equals("price"))
                break;
        }
    }
    return Double.parseDouble(candidateElement.getFirstChild().getNodeValue());
}

The getDocumentElement operation returns the document element. From there, getChildNodes returns its child nodes. From the DTD we know there is exactly one header element, at least one item element, and one price element, so we simply loop over the child nodes until we find a child element named price. As before, once we have the element, we call getFirstChild().getNodeValue() to obtain its value.

Abstraction. creditCard, billingName, and totalPrice are the basic operations of our interface: they simply locate and return the corresponding XML elements. Our interface also includes the more abstract authorizeCredit method, which has no corresponding element in the XML document. Its implementation, given below, simply uses the billingName, creditCard, and totalPrice methods we already implemented in the wrapper class.

public boolean authorizeCredit() {
    // illustrates abstraction
    return authorize(this.billingName(), this.creditCard(), this.totalPrice());
}

The client class. We have defined an abstract interface for accessing our XML document and implemented it in a wrapper class using ten important DOM operations. Next, let's see how Java code uses the interface we defined. The following code simply invokes the parser, passes the DOM document it returns to our wrapper, and then calls each of the methods we implemented. This code was written with the IBM XML Parser for Java (see Resources); using another parser would be much the same.

import com.ibm.xml.xpk4j.xml4j2.*;
import java.io.*;

public class Test {
    public static void main(String[] args) {
        XML4J2DOMSource parser = new XML4J2DOMSource();
        try {
            parser.parse("order.xml");
            Order theOrder = new OrderImpl(parser.getDocument());
            System.out.println("The credit card is " + theOrder.creditCard());
            System.out.println("The total price is " + theOrder.totalPrice());
            System.out.println("The billing name is " + theOrder.billingName());
            theOrder.authorizeCredit();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This very simple example has shown you ten important operations of the DOM API. With these operations, we explained how to find, browse, and traverse elements, and how to get the values of elements and their attributes when the DTD is known. This gives you a solid foundation for learning the rest of the DOM API.

The Document Object Model (DOM) provides useful modules that extend its core functionality. This article digs into the DOM Traversal module: it shows how to find out whether your parser supports the module, and how to use it to traverse a selected set of nodes or the entire DOM tree. After reading it, you will thoroughly understand DOM Traversal and have a powerful new tool in your Java and XML programming toolbox. Eight sample code listings demonstrate the techniques.

If you have done much XML processing over the past three years, you have almost certainly run into the Document Object Model (DOM). This object model represents an XML document in your application and provides a simple way to read XML and to write data into an existing document. (If you are new to the DOM, see Resources for more background.)
If you are working toward XML mastery, you may already have studied the DOM and know how to use almost every method it provides. However, many DOM features go unnoticed by most developers, who in practice only ever touch the DOM core. The core is the part of the DOM specification that lays out what the DOM means, how it should behave, and which methods it provides. Even experienced developers are unaware of many of the less common DOM modules. These modules let developers work with trees more efficiently and conveniently, handle ranges of nodes, operate on HTML or CSS pages, and perform other tasks that are simply not available through the core DOM specification alone. Over the next few months I plan to write several articles introducing these modules, including the HTML module and the Range module; in this article I cover the Traversal module. By learning how to use DOM Traversal, you will see how to walk an entire DOM tree, build custom object filters to find exactly the data you need, and move through a DOM tree faster than ever before. I will also show you a utility that lets you check whether your chosen parser supports a specific DOM module, along with plenty of other sample code. So fire up your favorite source editor and let's get started.

Getting prepared. First, make sure you have the tools to follow the sample code. For this article you need an XML parser that provides a DOM implementation. That part is easy: almost every XML parser you can find supports both SAX (the Simple API for XML) and the DOM. You do want to make sure the parser you have offers DOM Level 2 support, which is simple to confirm: read the documentation that came with the parser, or just get the latest version from your vendor.

Once you have a parser, you need to make sure it supports the DOM Traversal module we are discussing. Although the parser documentation should say so, I want to demonstrate a simple programmatic way to check. The program in Listing 1 lets you ask any parser which modules it supports; I included checks for most of the common DOM modules, including, of course, DOM Traversal. The program uses the DOM class org.w3c.dom.DOMImplementation and its hasFeature() method: by passing each module name in turn, it finds out which modules are available. The code is quite simple, so I leave walking through it to you.

Listing 1. The DOMModuleChecker class

import org.w3c.dom.DOMImplementation;

public class DOMModuleChecker {

    /** The vendor's DOMImplementation class */
    private String vendorImplementationClass =
        "org.apache.xerces.dom.DOMImplementationImpl";

    /** Modules to check */
    private String[] moduleNames =
        {"XML", "Views", "Events", "CSS", "Traversal", "Range", "HTML"};

    public DOMModuleChecker() {
    }

    public DOMModuleChecker(String vendorImplementationClass) {
        this.vendorImplementationClass = vendorImplementationClass;
    }

    public void check() throws Exception {
        DOMImplementation impl =
            (DOMImplementation) Class.forName(vendorImplementationClass)
                                     .newInstance();

        for (int i = 0; i < moduleNames.length; i++) {
            // assumed: test each module against DOM Level 2 ("2.0")
            if (impl.hasFeature(moduleNames[i], "2.0")) {
                System.out.println("Support for " + moduleNames[i]
                    + " is included in this DOM implementation.");
            } else {
                System.out.println("Support for " + moduleNames[i]
                    + " is not included in this DOM implementation.");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // assumed entry point: run the checks from the command line
        DOMModuleChecker checker =
            (args.length > 0) ? new DOMModuleChecker(args[0])
                              : new DOMModuleChecker();
        checker.check();
    }
}

Make sure your CLASSPATH environment variable and working directory include your parser (which should include the DOM implementation), then compile this source file and run it.
Also, pay attention to the line in Listing 1 that names the vendor DOMImplementation class: if you are using a parser other than Apache Xerces, you need to supply the name of your parser's DOMImplementation class there. If you are using Xerces, you can leave the program in Listing 1 unchanged and simply compile it. I used the latest version, Apache Xerces 1.4.1 (in my classpath), and got the following output:

Listing 2. Seeing which modules Xerces supports

bmclaughlin@galadriel ~ $ java DomModuleChecker
Support for XML is included in this DOM implementation.
Support for Views is not included in this DOM implementation.
Support for Events is included in this DOM implementation.
Support for CSS is not included in this DOM implementation.
Support for Traversal is included in this DOM implementation.
Support for Range is not included in this DOM implementation.
Support for HTML is not included in this DOM implementation.

This tells me that support for the Traversal module (the topic of this article) does exist, so I can keep going. If your parser is different and does not provide DOM Traversal support, I recommend that you use Apache Xerces, at least for the examples in this article (for a link, see Resources). Once you have a parser that supports DOM Traversal, continue on to the next section, Getting started.

Getting started

The DOM Traversal classes live in the org.w3c.dom.traversal package, which I explore in this section. First, you might want to simply look at the classes in the package; there are only four. The first is DocumentTraversal, which is where all the work starts. It is used to create the two kinds of traversal classes the module provides, NodeIterator and TreeWalker, which I will get to shortly. It is easy to remember how to create them: createNodeIterator() creates a NodeIterator and createTreeWalker() creates a TreeWalker. Simple, right? The remaining class is NodeFilter, which is used to customize which nodes are returned by iterators and tree walkers.

Next, you need to find out which class (or classes) in your parser implements the org.w3c.dom.traversal.DocumentTraversal interface, so you can create a tree walker or node iterator. The easiest way to find out is to consult your parser's Javadoc. In general, though, the class that implements the DOM org.w3c.dom.Document interface also implements DocumentTraversal. So you would do something like Listing 3:

Listing 3. Getting a DocumentTraversal instance

// Get access to your parser's org.w3c.dom.Document implementation
Document doc = new org.apache.xerces.dom.DocumentImpl();

// Get a traversal instance by type-casting
DocumentTraversal traversal = (DocumentTraversal) doc;

// Create node iterators or tree walkers here

Note that I left a comment in Listing 3 where your own DOM code would go; I will come back to that later. First the document object is cast to DocumentTraversal. Then you are ready to create NodeIterator and TreeWalker instances.

Now that you know how to get a DocumentTraversal instance, you can put DOM Traversal to work. Before you start iterating over nodes, though, you need a concrete example to work with. Listing 4 shows an XML document that stores information about books in an online bookstore (obviously only part of the inventory). You can see that each book entry has a title, an author, and a short description. The descriptions mark certain keywords with the keyword tag.
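Listing 4 itself does not appear in this extract. Purely as an illustration of the structure being described (the root element name and the sample content below are assumptions, not the article's actual data), such a document might look something like this:

<?xml version="1.0"?>
<!-- A sketch only: the real Listing 4 is longer and may differ in detail -->
<books>
  <book>
    <title>The Hobbit</title>
    <author>J.R.R. Tolkien</author>
    <description>A <keyword search="true">hobbit</keyword> sets out across
      <keyword search="false">Middle Earth</keyword> on an unexpected
      adventure.</description>
  </book>
  <book>
    <title>Foundation</title>
    <author>Isaac Asimov</author>
    <description>The fall and rebirth of a
      <keyword search="true">Galaxy</keyword>-spanning empire.</description>
  </book>
</books>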
If the search attribute is set to "true", the keyword is used in keyword searches; if it is set to "false", the keyword is used only in an internal index (bear with me, this is just an example!). When working with online books, a common task is to let users search for books through these keywords. For example, a customer might want "Middle Earth" or "Galaxy" or some other keyword. These words appear in the books, but they are not easy to get at using standard DOM techniques, and that is where DOM Traversal comes in. Normally you would have to find the root element, then find the book elements under it, then find the description element of each book, and then search for the keyword elements whose search attribute is "true". Even for this fairly ordinary task, that is a lot of code to write. DOM Traversal, however, makes it easy.

First, you create an implementation of the org.w3c.dom.traversal.NodeFilter interface by implementing its single method, acceptNode(). This method takes a DOM org.w3c.dom.Node as its only argument, and it is handed each node in the DOM structure being processed. It examines the node and returns a constant (in the form of a Java short) indicating whether the node should be returned to the current NodeIterator or ignored. This means developers do not have to write a lot of tedious node-iteration code (the name starts to make sense, doesn't it?). The filter simply checks things such as the node's type, its attributes and their values, and other criteria to decide whether to accept or reject the node. Since a detailed description is worth a thousand words and a piece of code is worth a million, I will demonstrate a NodeFilter implementation that accepts only nodes that are inside a keyword element whose search attribute is "true". Check out Listing 5.

Listing 5. Getting searchable keywords

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeFilter;

public class KeywordsNodeFilter implements NodeFilter {

    public short acceptNode(Node n) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            Node parent = n.getParentNode();
            if (parent.getNodeName().equalsIgnoreCase("keyword")) {
                if (((Element) parent).getAttributeNode("search")
                        .getNodeValue()
                        .equalsIgnoreCase("true")) {
                    return FILTER_ACCEPT;
                }
            }
        }
        // if we got here, the node is not one we want
        return FILTER_SKIP;
    }
}

If you are reasonably familiar with the DOM core, Listing 5 should make a lot of sense. First, you only want the actual keyword itself (not the keyword element), so the filter checks whether the node is an org.w3c.dom.Text node. Then, if the node's parent is named keyword, it checks the value of the element's search attribute; if the value is "true", it accepts the node (using the FILTER_ACCEPT constant defined in the NodeFilter interface). Otherwise, it skips the node by returning FILTER_SKIP. Pretty simple, isn't it?

After writing this filter, you only need to create a new NodeIterator that uses it. There are several pieces of information to supply to the createNodeIterator() method discussed in the previous section. First, supply the element at which to start searching; unless you only want to search a specific part of the DOM tree, I usually start from the root element. Second, you can define what to show by specifying only elements, or only attributes, or other structures.
Because I have a NodeFilter and actually want to iterate over all the nodes (and let the filter do the work), I supply the constant NodeFilter.SHOW_ALL. The next argument is the NodeFilter instance, which here is of course an instance of the KeywordsNodeFilter class from Listing 5. The final argument is a boolean value indicating whether entity references should be expanded (if you don't know what an entity reference is, check out an XML tutorial; you can find one in Resources). I almost always want them expanded, so I usually pass true. The iterator then works like a normal Java iterator. To see how all of this fits together, check out Listing 6, which pulls these details together, creates a node iterator over the search document, and prints out all of the searchable keywords.
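Listing 6 itself does not appear in this extract. The following is only a minimal sketch of what such a driver might look like, assuming Xerces (the DOMParser class) and the KeywordsNodeFilter from Listing 5; the class name KeywordSearcher and the output format are taken from the run shown in Listing 7, but the real Listing 6 may differ in its details:

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

public class KeywordSearcher {

    public static void main(String[] args) throws Exception {
        System.out.println("Processing file: " + args[0]);

        // Parse the document and get the DOM tree (Xerces-specific)
        DOMParser parser = new DOMParser();
        parser.parse(args[0]);
        Document doc = parser.getDocument();

        // Cast the document to DocumentTraversal, as shown in Listing 3
        DocumentTraversal traversal = (DocumentTraversal) doc;

        // Create an iterator that starts at the root element, shows every
        // node to the filter from Listing 5, and expands entity references
        NodeIterator iterator = traversal.createNodeIterator(
            doc.getDocumentElement(), NodeFilter.SHOW_ALL,
            new KeywordsNodeFilter(), true);

        // Walk the accepted nodes just like a normal Java iterator
        Node node;
        while ((node = iterator.nextNode()) != null) {
            System.out.println("Search phrase found: '"
                + node.getNodeValue() + "'");
        }
    }
}

The createNodeIterator() call takes exactly the four arguments described above: the starting element, the NodeFilter.SHOW_ALL constant, the filter instance, and the entity-reference flag.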
So far, you should be able to follow Listing 6 from the explanations of the other listings. Compile the program along with the code in Listing 5, and save the contents of Listing 4 as an XML document; I named mine keywords.xml. Add xerces.jar to your classpath and working directory and run the KeywordSearcher class. I know I have skipped over a few of the final steps, but if you are a Java user you should have no trouble compiling and setting things up. Running the class against your own copy of the XML document should produce output similar to Listing 7.

Listing 7. Running the keyword search program

C:\javaxml2\ibm>java KeywordSearcher keywords.xml
Processing file: keywords.xml
Search phrase found: 'Galaxy'
Search phrase found: 'Hyperion'
Search phrase found: 'dwarves'
Search phrase found: 'hobbit'
Search phrase found: 'Foundation'
Search phrase found: 'Wheel of Time'
Search phrase found: 'The Path of Daggers'

Obviously, as documents become more complex, NodeFilter implementations become correspondingly more involved. The point here is that DOM Traversal, even with a small filter like this one, becomes extremely powerful in more complex situations. For example, you could locate data anywhere in a document by element or attribute name. With core DOM code that is a genuinely tricky task requiring a great deal of tree walking, and a NodeIterator can handle it for you. So let your imagination run free and build those filters!

Finding the tree in the forest

Before ending this look at DOM Traversal, I need to introduce TreeWalker. Because of space limits (I hope this remains an article rather than a book chapter), I don't want to go too deep, but since you have already learned NodeIterator, this should be simple. You create a TreeWalker with the method shown in Listing 8. If you noticed that createTreeWalker() takes the same parameters as createNodeIterator(), that should come as no surprise. The real remaining question is, "What is the difference between iterating over nodes and walking a tree?" The answer: with a TreeWalker you keep the tree structure, while with a NodeIterator the returned nodes are effectively detached from their original positions in the tree. Iterating over nodes is very fast, because a node's position is discarded as soon as the node is returned. With a TreeWalker, nodes returned through your custom node filter keep their places in the tree, which lets you view the entire XML document through a filter. As an exercise, try writing a program that displays the XML document in Listing 4 without its processing instructions, comments, or attributes.

Before you start, a few hints: first, use a TreeWalker so that the tree structure is preserved. Second, write a custom NodeFilter implementation that accepts only element and text nodes. Finally, use the program in Listing 6 as a template and change just a few lines of code. Just like that, you get a customized view of the DOM tree. If you understood the section on iterating over nodes and can write this sample program, you are well on your way with DOM Traversal.

I hope you are starting to see all the possibilities the Traversal module opens up. Traversing the DOM tree through a filter makes it easy to find elements, attributes, text, and other DOM structures, and you should be able to write more efficient and better-structured code using the DOM Traversal module. So take an existing program that uses a DOM tree and convert it to the Traversal approach; I think you will be pleased with the results. As always, please let me know whether this article has helped you, using the forum attached to this article.

A Brief Discussion of How the Different XML Document Models in Java Work

Dennis M. Sosnoski (dms@sosnoski.com), President, Sosnoski Software Solutions, Inc.
February 2002

In this article, XML tool watcher Dennis Sosnoski compares the usability of several Java document models. The trade-offs are not always clear when you first choose a model, and if you change your mind later, converting can mean a lot of recoding. Combining sample code with an analysis of the model APIs, the author looks at which models can truly make your work easier. The article contains code samples showing five different document models.

In the first article of this series, I looked at the performance of the main XML document models written in Java. But when you start choosing this kind of technology, performance is only part of the picture. Ease of use is at least as important, and it is a primary argument for using a Java-specific model rather than the language-independent DOM. To understand which models really help, you need to know how they rank in usability. In this article I try to do just that, starting with sample code that shows how common types of operations are coded in each model. The results, along with some other factors that can make one model easier to use than another, are summarized at the end of the article. See the previous article (there is a convenient link under Contents) for background information on each of the models used in this comparison, including the actual version numbers; you can also use the links in Resources to download the source code or to visit each model's home page.

Code comparison

In these comparisons of usage techniques for the different document models, I show three basic operations in each model: building a document from an input stream; walking the elements and content and making some changes (trimming leading and trailing whitespace from text content, deleting the content if the trimmed text is empty, and otherwise wrapping it in a new element named "text" in the namespace of the parent element); and writing the modified document to an output stream. These code examples are based on the benchmark programs I used in the previous article, with some simplifications. The focus of the benchmarks was to show the best performance of each model; for this article, I instead try to show the simplest way to implement the operations in each model.
I have structured the example for each model as two separate code segments. The first segment is the top-level code that reads the document, calls the modify code, and writes the modified document back out. The second segment is the recursive method that actually walks the document representation and performs the modifications. To avoid distractions, I have left exception handling out of the code. You can download the complete code for all of the samples from the link at the bottom of this page. The download version of the samples includes a test driver, along with some added code that checks the operation of the different models by counting the elements deleted and added.

Even if you don't plan to use a DOM implementation, it is worth skimming the description of DOM usage below. Because the DOM example comes first, I use it to explore some issues of the example structure in more detail than I do for the later models. Skimming it will fill in details you will want to know and would otherwise miss if you jumped straight to one of the other models.

DOM

The DOM specification covers all types of operations on the document representation, but it does not address issues such as parsing or generating text output. The performance tests included two DOM implementations, Xerces and Crimson, which use different techniques for these steps. Listing 1 shows one form of the top-level code for Xerces.

Listing 1. Xerces DOM top-level code

1  // parse the document from input stream ("in")
2  DOMParser parser = new DOMParser();
3  parser.setFeature("http://xml.org/sax/features/namespaces", true);
4  parser.parse(new InputSource(in));
5  Document doc = parser.getDocument();
6  // recursively walk and modify document
7  modifyElement(doc.getDocumentElement());
8  // write the document to output stream ("out")
9  OutputFormat format = new OutputFormat(doc);
10 XMLSerializer serializer = new XMLSerializer(out, format);
11 serializer.serialize(doc.getDocumentElement());

As the comments indicate, the first block of code in Listing 1 (lines 1-5) handles parsing the input stream to build the document. Xerces defines the DOMParser class for building a document using the Xerces parser. The InputSource class is part of the SAX specification and can adapt any of several forms of input for a SAX parser. The actual parsing and document construction happens in a single call; if it completes successfully, the application can retrieve and use the constructed document.

The second block (lines 6-7) just passes the document's root element to the recursive modify method discussed below. This code is essentially the same for every model, so I skip it in the remaining examples and won't discuss it further.

The third block (lines 8-11) handles writing the document back out to the output stream as text. Here the OutputFormat class wraps the document and provides a variety of options for formatting the text, while the XMLSerializer class handles the actual generation of the output text.

The Xerces modify method uses only standard DOM interfaces, so it also works with any other DOM implementation. Listing 2 shows the code.

Listing 2. DOM modify method

1  protected void modifyElement(Element element) {
2      // loop through child nodes
3      Node child;
4      Node next = (Node) element.getFirstChild();
5      while ((child = next) != null) {
6          // set next before we change anything
7          next = child.getNextSibling();
8          // handle child by node type
9          if (child.getNodeType() == Node.TEXT_NODE) {
10             // trim whitespace from content text
11             String trimmed = child.getNodeValue().trim();
12             if (trimmed.length() == 0) {
13                 // delete child if nothing but whitespace
14                 element.removeChild(child);
15             } else {
16                 // create a "text" element matching parent namespace
17                 Document doc = element.getOwnerDocument();
18                 String prefix = element.getPrefix();
19                 String name = (prefix == null) ? "text" : (prefix + ":text");
20                 Element text =
21                     doc.createElementNS(element.getNamespaceURI(), name);
22                 // wrap the trimmed content with new element
23                 text.appendChild(doc.createTextNode(trimmed));
24                 element.replaceChild(text, child);
25             }
26         } else if (child.getNodeType() == Node.ELEMENT_NODE) {
27             // handle child elements with recursive call
28             modifyElement((Element) child);
29         }
30     }
31 }

The basic approach used by the method in Listing 2 is the same for all of the document representations. Called with an element, it walks through the children of that element. If it finds a text content child, it either deletes the text (if it consists only of whitespace) or wraps it in a new element named "text" with the same namespace as the containing element (if it has any non-whitespace characters). If it finds a child element, the method calls itself recursively with that child element.

For the DOM implementation, I use a pair of references, child and next, to track my position in the list of child nodes. Before doing any other processing of the current child node, I load next with the following child node (line 7). That way I can delete or replace the current child node without losing my place in the list.

The DOM interfaces get a bit messy when I create the new element to wrap non-blank text content (lines 16-24). The methods used to create elements are tied to the owning document, so I need to retrieve the owner document of the element I am working with (line 17). I want the new element to be in the same namespace as the existing parent element, and in DOM that means I need to construct a qualified name. This differs depending on whether or not there is a namespace prefix (lines 18-19). With the qualified name for the new element, I can create the element (lines 20-21). Once the new element is created, I just create and add a text node to wrap the content string, then replace the original text node with the newly created element (lines 22-24).

Listing 3. Crimson DOM top-level code

1  // parse the document from input stream
2  System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
3      "org.apache.crimson.jaxp.DocumentBuilderFactoryImpl");
4  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
5  dbf.setNamespaceAware(true);
6  DocumentBuilder builder = dbf.newDocumentBuilder();
7  Document doc = builder.parse(in);
8  // recursively walk and modify document
9  modifyElement(doc.getDocumentElement());
10 // write the document to output stream
11 ((XmlDocument) doc).write(out);

The Crimson DOM sample code in Listing 3 uses the JAXP interface for parsing. JAXP provides a standardized interface for parsing and transforming XML documents. The parsing code in this example could also be used with Xerces (by setting the document builder factory class name property), replacing the earlier Xerces-specific sample code.
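As a one-line illustration (a sketch, not from the article; the exact factory class name depends on your Xerces release and should be checked against its documentation), pointing JAXP at Xerces instead of Crimson would only change the property value set in lines 2-3 of Listing 3:

// assumed Xerces JAXP factory class name; verify against your Xerces version
System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
    "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");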
In this example, I first set a system property in lines 2-3 to select the builder factory class (JAXP only supports building a DOM representation, not any of the other representations discussed in this article). This step is only required when you want to select the specific DOM implementation JAXP should use; otherwise it uses the default implementation. For completeness I include it in the code, but it is more common to set it as a JVM command-line parameter. Then, in lines 4 to 6, I create an instance of the builder factory, enable namespace support for builders constructed by that factory instance, and create a document builder from the builder factory. Finally (line 7), I use the document builder to parse the input stream and construct the document representation.

To write the document out, I use a method defined internally by Crimson. There is no guarantee that Crimson will continue to support this method; the alternative is to use JAXP transform code to write the document out as text through an XSL processor such as Xalan. That is beyond the scope of this article, but for more information you can check Sun's JAXP tutorial.

JDOM

The top-level code using JDOM is simpler than the code for the DOM implementations. To build a document (lines 1-3), I use a SAXBuilder with validation turned off by the parameter value. Writing the modified document to the output stream is as simple as using the supplied XMLOutputter class (lines 6-8).

Listing 4. JDOM top-level code

1  // parse the document from input stream
2  SAXBuilder builder = new SAXBuilder(false);
3  Document doc = builder.build(in);
4  // recursively walk and modify document
5  modifyElement(doc.getRootElement());
6  // write the document to output stream
7  XMLOutputter outer = new XMLOutputter();
8  outer.output(doc, out);

The JDOM modify method in Listing 5 is also simpler than the equivalent DOM method. I get a List containing all of the content of the element and scan through that list, checking for text (represented as String objects) and elements. The list is "live", so I can make changes to it directly without calling methods on the parent element.

Listing 5. JDOM modify method

1  protected void modifyElement(Element element) {
2      // loop through child nodes
3      List children = element.getContent();
4      for (int i = 0; i < children.size(); i++) {
5          // handle child by node type
6          Object child = children.get(i);
7          if (child instanceof String) {
8              // trim whitespace from content text
9              String trimmed = child.toString().trim();
10             if (trimmed.length() == 0) {
11                 // delete child if only whitespace (adjusting index)
12                 children.remove(i--);
13             } else {
14                 // wrap the trimmed content with new element
15                 Element text = new Element("text", element.getNamespace());
16                 text.setText(trimmed);
17                 children.set(i, text);
18             }
19         } else if (child instanceof Element) {
20             // handle child elements with recursive call
21             modifyElement((Element) child);
22         }
23     }
24 }

Creating the new element (lines 14-17) is very simple and, unlike the DOM version, does not require access to the owning document.

DOM4J

The top-level code for DOM4J is slightly more complicated than for JDOM, but the line counts are very similar. The main differences are that I save the DocumentFactory used in building the DOM4J document (line 5), and that I flush the Writer after writing out the modified document text.
Listing 6. DOM4J top-level code

1  // parse the document from input stream
2  SAXReader reader = new SAXReader(false);
3  Document doc = reader.read(in);
4  // recursively walk and modify document
5  m_factory = reader.getDocumentFactory();
6  modifyElement(doc.getRootElement());
7  // write the document to output stream
8  XMLWriter writer = new XMLWriter(out);
9  writer.write(doc);
10 writer.flush();

As you can see in Listing 6, DOM4J uses a factory to construct the objects that make up the document representation (built here by parsing). Each component is defined in terms of an interface, so any type of object that implements one of the interfaces can be used in the representation (as opposed to JDOM, which uses concrete classes: these can be subclassed, but any class used in the document representation has to be based on the original JDOM classes). By using a different factory with the DOM4J document builder, you can get documents constructed from different components.

In the sample code (line 5), I retrieve the (default) document factory that was used to build the document and store it in an instance variable (m_factory) for use in the modify method. This step is not strictly necessary - you can mix components from different factories in one document, or bypass the factories and construct component instances directly - but in this case I just want to create and use the same types of components in the rest of the document, and using the same factory ensures that.

Listing 7. DOM4J modify method

1  protected void modifyElement(Element element) {
2      // loop through child nodes
3      List children = element.content();
4      for (int i = 0; i < children.size(); i++) {
5          // handle child by node type
6          Node child = (Node) children.get(i);
7          if (child.getNodeType() == Node.TEXT_NODE) {
8              // trim whitespace from content text
9              String trimmed = child.getText().trim();
10             if (trimmed.length() == 0) {
11                 // delete child if only whitespace (adjusting index)
12                 children.remove(i--);
13             } else {
14                 // wrap the trimmed content with new element
15                 Element text = m_factory.createElement
16                     (QName.get("text", element.getNamespace()));
17                 text.addText(trimmed);
18                 children.set(i, text);
19             }
20         } else if (child.getNodeType() == Node.ELEMENT_NODE) {
21             // handle child elements with recursive call
22             modifyElement((Element) child);
23         }
24     }
25 }

The DOM4J modify method in Listing 7 is very similar to the JDOM version. Instead of checking the type of the content items with the instanceof operator, I can get a type code through the Node interface method getNodeType() (instanceof would also work, but the type code approach reads more clearly). The other difference is the technique used to create the new element (lines 15-16), using a QName object to represent the element name and building the element through a call to the saved factory.

Electric XML

The top-level code for Electric XML (EXML) in Listing 8 is the simplest of any of these examples. Both reading and writing a document are handled with a single method call.

Listing 8. EXML top-level code

1  // parse the document from input stream
2  Document doc = new Document(in);
3  // recursively walk and modify document
4  modifyElement(doc.getRoot());
5  // write the document to output stream
6  doc.write(out);

The EXML modify method, shown in Listing 9, is most similar to the DOM method, although like JDOM it requires instanceof checks rather than type codes.
In EXML you cannot create an element with a namespace-qualified name, so instead I create the new element first and then set its name afterwards to achieve the same effect.

Listing 9. EXML modify method

1  protected void modifyElement(Element element) {
2      // loop through child nodes
3      Child child;
4      Child next = element.getChildren().first();
5      while ((child = next) != null) {
6          // set next before we change anything
7          next = child.getNextSibling();
8          // handle child by node type
9          if (child instanceof Text) {
10             // trim whitespace from content text
11             String trimmed = ((Text) child).getString().trim();
12             if (trimmed.length() == 0) {
13                 // delete child if only whitespace
14                 child.remove();
15             } else {
16                 // wrap the trimmed content with new element
17                 Element text = new Element();
18                 text.addText(trimmed);
19                 child.replaceWith(text);
20                 text.setName(element.getPrefix(), "text");
21             }
22         } else if (child instanceof Element) {
23             // handle child elements with recursive call
24             modifyElement((Element) child);
25         }
26     }
27 }

XPP

The top-level code for XPP (in Listing 10) is the longest of any of these examples; it requires a fair amount of setup compared to the other models.

Listing 10. XPP top-level code

1  // parse the document from input stream
2  m_parserFactory = XmlPullParserFactory.newInstance();
3  m_parserFactory.setNamespaceAware(true);
4  XmlPullParser parser = m_parserFactory.newPullParser();
5  parser.setInput(new BufferedReader(new InputStreamReader(in)));
6  parser.next();
7  XmlNode doc = m_parserFactory.newNode();
8  parser.readNode(doc);
9  // recursively walk and modify document
10 modifyElement(doc);
11 // write the document to output stream
12 XmlRecorder recorder = m_parserFactory.newRecorder();
13 Writer writer = new OutputStreamWriter(out);
14 recorder.setOutput(writer);
15 recorder.writeNode(doc);
16 writer.close();

Because of the JAXP-style factory interface XPP uses, I must first create a parser factory instance and enable namespace processing (lines 2-4) before creating a parser instance. Once I have the parser instance, I can set its input and actually build the document representation (lines 5-8), but this involves more steps than in the other models. The output processing (lines 11-16) also involves more steps than in the other models, mainly because XPP requires a Writer rather than accepting a stream directly as the output target.

The XPP modify method in Listing 11 is most similar to the JDOM method, although it needs more code to create the new element (lines 13-21). Namespace handling is a bit of a hassle here: I must first construct the qualified name (lines 15-16), then create the element, and finally set the name and namespace URI afterwards (lines 18-21).

Listing 11. XPP modify method

1  protected void modifyElement(XmlNode element) throws Exception {
2      // loop through child nodes
3      for (int i = 0; i < element.getChildrenCount(); i++) {
4          // handle child by node type
5          Object child = element.getChildAt(i);
6          if (child instanceof String) {
7              // trim whitespace from content text
8              String trimmed = child.toString().trim();
9              if (trimmed.length() == 0) {
10                 // delete child if only whitespace (adjusting index)
11                 element.removeChildAt(i--);
12             } else {
13                 // construct qualified name
14                 // for wrapper element
15                 String prefix = element.getPrefix();
16                 String name = (prefix == null) ? "text" : (prefix + ":text");
"Text": (Prefix ": text"); 17 // Wrap The Trimmed Content with New Element 18 xmlnode text = m_ParserFactory.newNode (); 19 Text.Appendchild (TRIMMED); 20 element.replacechildat (i, text); 21 text.modifytag (Element.getNamespaceuri (), "Text", Name); twenty two } 23} else if (Child InstanceOf XMLNode) { 24 // Handle Child Elements with Recursive Call 25 ModifyElement (XMLNode); 26} 27} 28} Conclusion DOM, DOM4J and ELECTRIC XML have been almost equivalent to use code samples, where Exml may be simpler, and DOM4J is more difficult to limit some small conditional restrictions. DOM provides a very real benefit that is not related to the language, but if you only use Java code, it looks a bit trouble by comparing with the Java-specific model. I think this shows that Java-specific models typically successfully implement this goal in simplifying XML documents in Java code. Beyond Basics: Real World Availability Code Show JDOM and Exml provide simple and clear interfaces for basic documentation (using elements, properties, text). According to my experience, their methods do not have a good completion of the programming tasks expressed in the entire document. To complete these types of tasks, DOM, and DOM4J used component methods - where all document components from attribute to namespace implement some public interfaces - work better. The related example is the XML stream (XML streaming (XMLS)) encoded for JDOM and DOM4J recently. This code traverses the entire document and encodes each component. The JDOM implementation is much more complicated than the DOM4J, mainly because Jdom uses some unique classes without public interfaces to represent each component. Because JDOM lacks a common interface, even if you process the code of the Document object and the same type of components such as subcomponents, there are some components such as sub-components, but they must be different. Special methods are also required to retrieve the NAMESPACE components relative to other types of subcomponents. Even when processing is considered a sub-component type of content, you need to use multiple IF statements with instanceOf check on the component type instead of using a clearer and faster Switch statement. Ironically, one of the initial goals of JDOM is to use the Java Collection class, which is largely interface-based. The use of the interface in the library has added many flexibility, which is based on increasing some complexity, and this is usually a good compromise for code designed for reuse. This may also be mainly due to DOM4J, which reaches a mature and stable state, much more than JDOM. Despite this, DOM is still a very good choice for developers using multiple languages. The DOM implementation is widely used in a variety of programming languages. It is still the basis of many other standards related to XML, so even if you use Java-specific models, there is also a good opportunity to gradually be familiar with DOM. Because it officially won W 3C Recommended (relatively relative to non-standard Java models), so it may also need it in some types of projects. In this context, in the three main competitors of JDOM, DOM4J, and Electric XML, DOM4J and other two differences in the interface-based approach with multiple inheritance layers. This will make it more difficult to follow the API Javadocs. For example, a method you are looking for (such as Content (), used in line 3 of our DOM4J's Modify method example) may be part of the ELEMENT extended Branch interface, not part of the ELEMENT interface itself. 
Despite this, the interface-based design adds a great deal of flexibility (see the sidebar, Beyond the basics: real-world usability). Considering DOM4J's advantages in performance, stability, and feature set, you should regard it as a strong candidate for most projects.

Among the Java-specific document models, JDOM probably has the broadest user base, and it is certainly one of the simplest models to use. Even so, as a choice for project development it still suffers from API instability and changes from one version to the next, and it also fared quite poorly in the performance comparison. Based on the current implementations, I would recommend DOM4J rather than JDOM to people starting new projects.

Aside from XPP, EXML uses far fewer resources than any of the other models, and given EXML's ease-of-use advantages you should definitely consider it for applications where JAR file size matters. However, EXML's limited XML support and restrictive license, along with its relatively poor performance on larger files, mean you will have to pass it over for many applications.

XPP requires more steps for parsing and for writing documents out as text, and more steps again when handling namespaces. If XPP added some convenience methods for the common cases, it would probably do much better in this kind of comparison. As it stands, the performance leader of the last article comes out a usability laggard in this one. Even so, given XPP's performance advantages, it is worth considering as an alternative to EXML for applications that need a small JAR file.

Next time ...

The two articles so far have covered the performance and usability of XML document models written in Java. In the next two articles in this series, I will look at approaches to XML data binding with Java technology. These approaches have a lot in common with the document model approach, but they go a step further, mapping XML documents onto actual application data structures. We will see how well this works and how it can improve both ease of use and performance. Check back at developerWorks for the installments on XML data binding for Java code. In the meantime, you can post your comments and questions about this article through the forum link below.