Contents: Directly reading and writing XML documents; Using the XmlDocument class; Using the TransformerFactory and Transformer classes; Using the Xalan XML Serializer; Conclusion; References; Attachments; About the author
Huang Quan (Quantumfancy@hotmail.com)
March 2002
This article briefly discusses four common methods for updating XML documents from a Java program and analyzes the advantages and disadvantages of each. It also explains how a Java program can control the format of the XML document it outputs.
JAXP stands for Java API for XML Processing, a Java programming interface for processing XML documents. JAXP supports the DOM, SAX, and XSLT standards. To make JAXP flexible, its designers gave it a pluggability layer. Through this layer, JAXP can work together with any XML parser that implements the DOM and SAX APIs (for example, Apache Xerces) and with any XSLT processor (for example, Apache Xalan). The benefit of the pluggability layer is that we only need to know JAXP's programming interfaces, without a deep understanding of the particular XML parser or XSLT processor in use. For example, suppose a Java program calls the Apache Crimson XML parser through JAXP to process XML documents, and we want to switch to another parser (such as Apache Xerces) to improve performance. The original code may not need to change at all: we only add the JAR file containing Apache Xerces to the CLASSPATH environment variable and remove the JAR file containing Apache Crimson from it. JAXP is now very widely used and can fairly be called the standard API for processing XML documents in Java. Beginners learning JAXP often run into the following problem: "I have written code that updates the DOM tree, but when the program exits, the original XML document is unchanged. How do I keep the original XML document in sync with the DOM tree?" At first glance JAXP seems to offer no interface, method, or class for this, which confuses many beginners.
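To make the pluggability layer concrete, here is a minimal sketch; the class name PluggabilityDemo and the sample XML are illustrative assumptions, not part of the original article. The code names only javax.xml.parsers and org.w3c.dom types, and the parser that actually runs is chosen when DocumentBuilderFactory.newInstance() is called. According to that method's documentation, it checks the javax.xml.parsers.DocumentBuilderFactory system property, then a jaxp.properties file in the Java installation, then the service-provider (META-INF/services) mechanism, and finally falls back to the platform default. On a modern JDK the default is the JDK's bundled parser, so the printed class name depends on your environment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses XML through the JAXP interfaces only.
final class PluggabilityDemo {
    // No parser-specific class is referenced here; JAXP picks the implementation.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the name of the concrete factory class JAXP selected at run time.
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<users><user>fancy</user></users>");
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
        System.out.println("Factory in use: " + factoryClassName());
    }
}
```

Switching parsers then means changing the classpath (or setting the system property on the command line with -D), not editing this code.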
The main purpose of this article is to answer that question, keeping the original XML document in sync with the DOM tree, by briefly introducing several commonly used methods. To narrow the scope of the discussion, the only XML parsers considered here are Apache Crimson and Apache Xerces, and the only XSLT processor is Apache Xalan.
Method 1: Directly reading and writing the XML document
This is probably the clumsiest and most primitive approach. After the program obtains the DOM tree and updates it using the methods of the DOM Node interface, the next step is to update the original XML document. We can traverse the entire DOM tree, either recursively or with the TreeWalker class, and write every node and element of the tree into the original XML document, which has been opened for writing beforehand. When the traversal is complete, the DOM tree and the original XML document are in sync. This method is rarely used in practice, but it may still be useful if you are writing your own XML parser.
Method 2: Using the XmlDocument class
The XmlDocument class? There is no such class in JAXP! Has the author made a mistake? No. This method uses the write() methods of the XmlDocument class. As mentioned above, JAXP can be combined with a wide variety of XML parsers; here we choose Apache Crimson. XmlDocument (org.apache.crimson.tree.XmlDocument) is a class in Apache Crimson. It is not part of standard JAXP, so you will not find it in the JAXP documentation.
So how do we use the XmlDocument class to update an XML document? The XmlDocument class provides the following three write() methods (based on the latest version of Crimson, Apache Crimson 1.1.3):
public void write(OutputStream out) throws IOException
public void write(Writer out) throws IOException
public void write(Writer out, String encoding) throws IOException
These three write() methods output the content of the DOM tree to a specific output medium, such as a file output stream or the application console. How are they used? Consider the following Java code snippet:
// Requires: java.io.File, java.io.FileOutputStream, javax.xml.parsers.DocumentBuilder,
// javax.xml.parsers.DocumentBuilderFactory, org.w3c.dom.Document, org.w3c.dom.Element,
// org.apache.crimson.tree.XmlDocument (Apache Crimson must be on the CLASSPATH).
String name = "fancy";
DocumentBuilder parser;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try
{
    parser = factory.newDocumentBuilder();
    // Parse user.xml into a DOM tree. The cast below only succeeds if JAXP selected Crimson.
    Document doc = parser.parse("user.xml");
    // Append a new element named "fancy" to the root element.
    Element newlink = doc.createElement(name);
    doc.getDocumentElement().appendChild(newlink);
    // Document has no write() method, so cast to Crimson's XmlDocument and write the tree out.
    FileOutputStream out = new FileOutputStream(new File("xuser1.xml"));
    ((XmlDocument) doc).write(out);
    out.close();
}
catch (Exception e)
{
    // TODO: log it
}
In the code above, we first create a Document object, doc, which holds the complete DOM tree. We then use the appendChild() method of the Node interface to add a new node to the DOM tree, and finally call the write(OutputStream out) method of the XmlDocument class to write the content of the DOM tree to xuser1.xml. (To actually update the original XML document, write to user.xml instead; we write to xuser1.xml here so that the two files are easy to compare.) Note that write() cannot be called directly on the Document object doc, because JAXP's Document interface does not define any write() method; you must cast doc from Document to XmlDocument before calling write(). The code above calls write(OutputStream out), which writes the DOM tree to the output medium using the default UTF-8 encoding. If the DOM tree contains Chinese characters, the output may appear garbled; this is the so-called "Chinese character problem". The solution is to use the write(Writer out, String encoding) method and specify the encoding explicitly, for example by passing "GB2312" as the second argument (the Writer itself should encode with the same charset, e.g. an OutputStreamWriter created with "GB2312"). The "Chinese character problem" then disappears and Chinese characters display correctly in the output. For a complete example, see the following files: addRecord.java (see attachment) and user.xml (see attachment). This example was run on Windows XP Professional with JDK 1.3.1. To run addRecord.java, download Apache Crimson from http://xml.apache.org/dist/crimson/ and add the resulting crimson.jar file to the CLASSPATH environment variable. Note: Apache Crimson's predecessor is Sun's Project X parser. For reasons the author does not know, the X parser later evolved into Apache Crimson, and much of Apache Crimson's code was ported directly from the X parser.
For example, the XmlDocument class used above was com.sun.xml.tree.XmlDocument in the X parser and became org.apache.crimson.tree.XmlDocument in Crimson. The code of the two is in fact the same, apart from the package and import statements and the comments at the beginning of the file. Early versions of JAXP were bundled with the X parser, so some old programs use the com.sun.xml packages; if such programs fail to compile when you recompile them now, this is almost certainly the reason. Later versions of JAXP, such as JAXP 1.1, were bundled with Apache Crimson, so if you use JAXP 1.1 you can compile the example (AddRecord.java) without downloading Apache Crimson separately. The latest JAXP 1.2 EA (Early Access) changes course: it uses the better-performing Apache Xalan and Apache Xerces as its XSLT processor and XML parser and does not directly support Apache Crimson. So if your development environment uses JAXP 1.2 EA, or the Java XML Pack (which includes JAXP 1.2 EA), you will not be able to compile the example above (AddRecord.java) directly; you need to download and install Apache Crimson.
Method 3: Using the TransformerFactory and Transformer classes
The standard way in JAXP to update the original XML document is to call the XSLT engine, that is, the TransformerFactory and Transformer classes. Please see the Java code snippet below:
// Requires: java.io.File, javax.xml.transform.*, javax.xml.transform.dom.DOMSource,
// javax.xml.transform.stream.StreamResult.
// First create a DOMSource object; its constructor can take a Document object.
// doc is the updated DOM tree.
DOMSource doms = new DOMSource(doc);
// Create a File object representing the output medium for the DOM tree's data: an XML file.
File f = new File("XMLOutput.xml");
// Create a StreamResult object; its constructor can take a File object.
StreamResult sr = new StreamResult(f);
// Call JAXP's XSLT engine to write the data in the DOM tree to the XML file.
// The engine's input is the DOMSource object and its output is the StreamResult object.
try
{
    // First create a TransformerFactory object, then a Transformer object. The Transformer
    // class is effectively an XSLT engine. It is normally used to process XSL files, but here
    // we use it to output an XML document.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();
    // Finally, call the transform() method of the Transformer object (the XSLT engine). The first
    // argument is the DOMSource object and the second is the StreamResult object.
    t.transform(doms, sr);
}
catch (TransformerConfigurationException tce)
{
    System.out.println("Transformer Configuration Exception\n-----");
    tce.printStackTrace();
}
catch (TransformerException te)
{
    System.out.println("Transformer Exception\n---------");
    te.printStackTrace();
}
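The snippet above can be turned into a self-contained sketch that runs on any JDK with JAXP. The class name IdentityTransformDemo and the sample XML are assumptions, and the program writes to a StringWriter rather than XMLOutput.xml so the result is easy to check. Calling newTransformer() with no stylesheet yields an identity transform, which copies the DOM tree to the result unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class for Method 3.
final class IdentityTransformDemo {
    // Parses XML, appends a new element, then serializes the updated DOM tree
    // with an identity Transformer (no XSL stylesheet involved).
    static String addElementAndSerialize(String xml, String newElementName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().appendChild(doc.createElement(newElementName));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        // Writing to a StringWriter here; new StreamResult(new File("user.xml"))
        // would overwrite the original file instead.
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addElementAndSerialize("<users><user>a</user></users>", "fancy"));
    }
}
```

With the JDK's built-in Transformer the output starts with an XML declaration, and the new empty element is typically written as an empty-element tag.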
In practice, we can use the standard DOM API to obtain a DOM tree from an XML document, manipulate the DOM tree as required to get the final Document object, and create a DOMSource object from that Document object; the rest is just the code above. After the program runs, XMLOutput.xml is the result you need. (Of course, you can pass different arguments to the StreamResult constructor to target a different output medium; it does not have to be an XML file.) The greatest advantage of this method is that you can control the format in which the DOM tree's content is written to the output medium. The TransformerFactory and Transformer classes cannot do this on their own; they need the help of the OutputKeys class. For a complete example, see the following files: addRecord2.java (see attachment) and user.xml (see attachment). This example was run on Windows XP Professional with JDK 1.3.1. To run it, download and install JAXP 1.1 or the Java XML Pack (which includes JAXP) from http://java.sun.com.
The OutputKeys class
The javax.xml.transform.OutputKeys class works together with the java.util.Properties class and JAXP's XSLT engine (the Transformer class) to control the format of the output XML document. Please see the following code snippet:
// Requires: java.util.Properties, javax.xml.transform.OutputKeys,
// javax.xml.transform.Transformer, javax.xml.transform.TransformerFactory.
// First create a TransformerFactory object, then a Transformer object.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
// Get the Transformer's output properties, i.e. the XSLT engine's default output
// properties, as a java.util.Properties object.
Properties properties = t.getOutputProperties();
// Set a new output property: encode the output as GB2312, which supports Chinese characters.
// Chinese text in the XML document written by the XSLT engine then displays correctly,
// without the so-called "Chinese character problem".
// Note the string constant OutputKeys.ENCODING defined by the OutputKeys class.
properties.setProperty(OutputKeys.ENCODING, "GB2312");
// Update the XSLT engine's output properties.
t.setOutputProperties(properties);
// Call the XSLT engine: it writes the DOM tree's content to the output medium according to
// the output properties. domSourceObject and streamResultObject stand for the DOMSource
// and StreamResult objects created earlier.
t.transform(domSourceObject, streamResultObject);
From the code above it is easy to see that by setting the output properties of the XSLT engine (the Transformer class) we can control the output format of the DOM tree's content, which is very helpful when customizing output. So which output properties does JAXP's XSLT engine support? The javax.xml.transform.OutputKeys class defines a number of string constants, each naming an output property that can be set freely. Commonly used output properties include the following:
METHOD (public static final java.lang.String): can be set to "xml", "html", "text", and so on.
VERSION (public static final java.lang.String): the version of the output method. If METHOD is "xml", set it to "1.0"; if METHOD is "html", set it to "4.0"; if METHOD is "text", this property is ignored.
ENCODING (public static final java.lang.String): the encoding used for the output, such as "GB2312" or "UTF-8". Setting it to "GB2312" solves the so-called "Chinese character problem".
OMIT_XML_DECLARATION (public static final java.lang.String): whether to omit the XML declaration, that is, <?xml version="1.0" encoding="UTF-8" standalone="yes"?>, when outputting the XML document. Possible values are "yes" and "no".
INDENT (public static final java.lang.String): whether the XSLT engine may add extra whitespace (indentation) when outputting the XML document. Possible values are "yes" and "no".
MEDIA_TYPE (public static final java.lang.String): the MIME type of the output document.
How do you set the output properties of the XSLT engine? To summarize: first, get the set of default output properties of the XSLT engine (the Transformer class) with the Transformer class's getOutputProperties() method, which returns a java.util.Properties object:
Properties properties = transformer.getOutputProperties();
Then set the new output properties, for example:
properties.setProperty(OutputKeys.ENCODING, "GB2312");
properties.setProperty(OutputKeys.METHOD, "html");
properties.setProperty(OutputKeys.VERSION, "4.0");
// ...
Finally, update the XSLT engine's output properties with the Transformer class's setOutputProperties() method, whose parameter is a java.util.Properties object:
transformer.setOutputProperties(properties);
We wrote a new program that uses the OutputKeys class to control the output properties of the XSLT engine.
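The get/modify/set sequence summarized above can be sketched as a small self-contained program. This is not the author's addRecord3.java: the class name OutputKeysDemo, the sample XML, and the chosen INDENT and OMIT_XML_DECLARATION values are illustrative assumptions, and the GB2312 output in main assumes the runtime supports that charset. The sketch writes to a byte stream, so the Transformer itself applies the ENCODING property when turning characters into bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class for controlling output with OutputKeys.
final class OutputKeysDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes the DOM tree to bytes: get the default output properties,
    // modify them, then set them back on the Transformer.
    static byte[] serialize(Document doc, String encoding, boolean omitDeclaration) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        Properties props = t.getOutputProperties();
        props.setProperty(OutputKeys.METHOD, "xml");
        props.setProperty(OutputKeys.ENCODING, encoding);
        props.setProperty(OutputKeys.INDENT, "yes");
        props.setProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
        t.setOutputProperties(props);

        // A byte stream (not a Writer) lets the Transformer do the character encoding.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(bytes));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // The element text is the author's name, written with Unicode escapes.
        Document doc = parse("<users><user>\u9ec4\u6743</user></users>");
        byte[] gb = serialize(doc, "GB2312", false);
        // Decode with the same charset the bytes were written in.
        System.out.println(new String(gb, Charset.forName("GB2312")));
    }
}
```

How much whitespace INDENT="yes" adds is left to the XSLT implementation, so indented output may differ between processors.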
The structure of this new program, addRecord3.java, is roughly the same as that of the previous program, but its output is slightly different. For the complete code, see the following files: addRecord3.java (see attachment) and user.xml (see attachment). This example was run on Windows XP Professional with JDK 1.3.1. To run it, download and install JAXP 1.1 or the Java XML Pack (which includes JAXP) from http://java.sun.com.
Method 4: Using the Xalan XML Serializer
This method is actually a variant of Method 3, and it requires Apache Xalan and Apache Xerces to run. The example code is as follows:
// Requires Apache Xalan 2.x; the classes of that era assumed here are
// org.apache.xalan.serialize.Serializer, org.apache.xalan.serialize.SerializerFactory and
// org.apache.xalan.templates.OutputProperties, plus java.io.*, java.util.Properties,
// javax.xml.transform.*, javax.xml.transform.dom.DOMSource and javax.xml.transform.dom.DOMResult.
// First create a DOMSource object; its constructor can take a Document object.
// doc is the updated DOM tree.
DOMSource domSource = new DOMSource(doc);
// Create a DOMResult object to hold the XSLT engine's output temporarily.
DOMResult domResult = new DOMResult();
// Call JAXP's XSLT engine to copy the data in the DOM tree.
// The engine's input is the DOMSource object and its output is the DOMResult object.
try
{
    // First create a TransformerFactory object, then a Transformer object. The Transformer
    // class is effectively an XSLT engine. It is normally used to process XSL files, but here
    // we use it to output an XML document.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();
    // Set the XSLT engine's output properties (essential; otherwise the "Chinese character problem" appears).
    Properties properties = t.getOutputProperties();
    properties.setProperty(OutputKeys.ENCODING, "GB2312");
    t.setOutputProperties(properties);
    // Next, call the transform() method of the Transformer object (the XSLT engine). The first
    // argument is the DOMSource object and the second is the DOMResult object.
    t.transform(domSource, domResult);
    // Create the default Xalan XML Serializer and use it to write the content stored in the
    // DOMResult object to the output medium as an output stream.
    Serializer serializer = SerializerFactory.getSerializer(
        OutputProperties.getDefaultMethodProperties("xml"));
    // Set the Xalan XML Serializer's output properties. This is essential; otherwise the
    // so-called "Chinese character problem" may also occur.
    Properties prop = serializer.getOutputFormat();
    prop.setProperty("encoding", "GB2312");
    serializer.setOutputFormat(prop);
    // Create a File object representing the output medium for the DOM tree's data: an XML file.
    File f = new File("xuser3.xml");
    // Create a file output stream object fos; note the constructor argument.
    FileOutputStream fos = new FileOutputStream(f);
    // Set the Xalan XML Serializer's output stream.
    serializer.setOutputStream(fos);
    // Serialize the DOM node held by the DOMResult object, then close the stream.
    serializer.asDOMSerializer().serialize(domResult.getNode());
    fos.close();
}
catch (Exception tce)
{
    tce.printStackTrace();
}
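For comparison, newer JDKs (Java 5 and later, which postdate this article) include the DOM Level 3 Load and Save API, org.w3c.dom.ls, which provides a standard serializer without depending on Xalan's own classes. The sketch below swaps in LSSerializer for the Xalan XML Serializer; it is not the author's addRecord4.java, and the class name LsSerializerDemo and the sample XML are assumptions. Here the encoding is set once, on the LSOutput object, instead of on both a Transformer and a Serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical demo class: serializes a DOM tree with DOM Level 3 Load and Save
// instead of the Xalan XML Serializer.
final class LsSerializerDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Writes the document as bytes in the given encoding; the serializer uses the
    // same encoding for the bytes and for the XML declaration.
    static byte[] serialize(Document doc, String encoding) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding(encoding);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);
        serializer.write(doc, output);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<users><user>a</user></users>");
        doc.getDocumentElement().appendChild(doc.createElement("fancy"));
        System.out.println(new String(serialize(doc, "UTF-8"), StandardCharsets.UTF_8));
    }
}
```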
Method 4 is not commonly used and seems a bit like gilding the lily, so we will not discuss it further. For a complete example, see the following files: addRecord4.java (see attachment) and user.xml (see attachment). This example was run on Windows XP Professional with JDK 1.3.1. To compile addRecord4.java, download and install Apache Xalan and Apache Xerces from http://xml.apache.org/dist/, or download and install the Java XML Pack from http://java.sun.com/xml/download.html, because the latest Java XML Pack (Winter 01) includes Apache Xalan and Apache Xerces.
Conclusion
This article discussed four methods for updating XML documents from a Java program. The first method reads and writes the XML file directly. It is very cumbersome and error-prone; unless you need to develop your own XML parser, do not use it. The second method uses the XmlDocument class of Apache Crimson. It is very simple and easy to use, and if you have chosen Apache Crimson as your XML parser, you can use it. However, it does not seem very efficient (because Apache Crimson itself is relatively slow), and higher versions of JAXP, the Java XML Pack, and the JWSDP do not directly support Apache Crimson, so this method is not universal. The third method uses JAXP's XSLT engine (the Transformer class) to output the XML document. This is probably the standard method; it is very flexible, especially because the output format can be controlled, and we recommend it. The fourth method is a variant of the third that uses the Xalan XML Serializer and introduces a serialization step, which gives it an advantage when modifying and outputting large numbers of documents. Unfortunately, the output properties must be set twice, once on the XSLT engine and once on the XML Serializer, which is troublesome, and the method depends on Apache Xalan and Apache Xerces, so it is somewhat less universal.
Besides the four methods discussed above, other APIs (such as JDOM, Castor, XML4J, and Oracle XML Parser V2) offer many more ways to update XML documents; for reasons of space they are not discussed here.
References and sources
[1] The Java Web Services Tutorial, Sun Microsystems, Inc.
[2] http://xml.apache.org, Apache XML Project (Crimson, Xerces, Xalan)
[3] http://www.jguru.com, XML forum
[4] http://forum.java.sun.com, Java Technology & XML forum
Attachments
AddRecord.java, addRecord2.java, AddRecord3.java, AddRecord4.java, user.xml
About the author
Huang Quan is a fourth-year student at Beijing University. He is deeply interested in Java and XML technology and has long-term programming experience. You can contact him at QuantumFancy@hotmail.com.