DOM experience
Frush
This article describes the structure of the DOM (Document Object Model) and how it is generally used. After reading it, you should be able to use the DOM for common processing of XML documents. It does not cover how a DOM is designed or implemented.
Key words:
XML DOM
Overview
The DOM (Document Object Model) is a system for describing XML data: it holds an XML document in memory as a tree structure. The DOM also includes an API for parsing and processing XML data.
Before using the DOM, take a look at its structure. Overall, the DOM follows the Composite pattern. Every XML unit, whether a document, an element, an attribute, or a piece of text, is a Node in the DOM. As the Composite pattern prescribes, each Node can contain other Nodes, so a tree structure forms naturally. As a simple example, take the following XML document:
</Author>
</Book>
In the DOM, this document is stored as a tree of nodes.
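As a rough illustration of that tree form, the sketch below parses a small document and prints one line per node, indented by depth. The sample XML string and the class name DomTreeDump are assumptions made for illustration; they are not taken from the article's own example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreeDump {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE =
        "<Book><Title>XML Basics</Title><Author>Some Author</Author></Book>";

    // Parses an XML string into a Document; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Appends one indented line per node: the node name, plus the value for text nodes.
    static void dump(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(" = ").append(node.getNodeValue());
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            dump(children.item(i), indent + "  ", out);
        }
    }

    // Convenience wrapper: parse the string and return the dump of the whole tree.
    static String dump(String xml) {
        StringBuilder out = new StringBuilder();
        dump(parse(xml), "", out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump(SAMPLE));
    }
}
```

For the sample string, the dump starts with a #document node, then Book, with Title and Author below it, each holding a single #text child. Even this tiny document therefore becomes six nodes.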
Now that the structure of a DOM document is clear, we can look at how to operate on one. For a tree structure like this, the important operations are generating the document, traversing it, processing node content (reading, modifying, and so on), operating on the nodes themselves (inserting, deleting, replacing, and so on), and serializing the document. The sections below cover these operations one by one.
Generating a DOM document
Processing XML data with the DOM starts with the following three steps:
1. Create a DocumentBuilderFactory. This object creates the DocumentBuilder.
2. Create a DocumentBuilder. The DocumentBuilder parses the input and creates a Document object.
3. Parse the input XML to obtain the Document object.
DocumentBuilderFactory is an abstract class, so you cannot instantiate it directly with new; call DocumentBuilderFactory.newInstance() to obtain an instance. As its name suggests, DocumentBuilderFactory is also an object factory: you use it to create a DocumentBuilder.
DocumentBuilder's parse method returns a Document object. (A side note: Document is only an interface; in the implementation used here, the object that javax.xml.parsers.DocumentBuilder's parse method actually returns is an org.apache.crimson.tree.XmlDocument.) The parse method is overloaded for several kinds of input, including File, InputStream, InputSource, and a String URI. It analyzes the input source and builds a DOM tree, the Document object, in memory.
The typical code for these three steps is as follows:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
File docFile = new File("orders.xml");
Document doc = null;
try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    doc = db.parse(docFile);
} catch (Exception e) {
    System.out.print("Problem parsing the file.");
}
The parse method may throw an IOException or a SAXException, indicating an input error and a parsing error respectively.
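To tell the two failure modes apart, each exception can be caught separately. The following is a minimal sketch, assuming the XML arrives as a string; the class ParseErrors and the returned labels are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseErrors {
    // Parses the given XML text and reports how the attempt ended.
    static String tryParse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXException e) {
            // The input was read, but it is not well-formed (or not valid) XML.
            return "parse error";
        } catch (IOException e) {
            // The input itself could not be read.
            return "io error";
        } catch (ParserConfigurationException e) {
            // The factory could not create a builder with the requested settings.
            return "configuration error";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<a/>"));        // well-formed input
        System.out.println(tryParse("<a><b></a>"));  // mismatched tags
    }
}
```

The parser's default error handler may also write a diagnostic to standard error when it meets the malformed input; that is separate from the exception handled here.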
Before creating the DocumentBuilder, you can set parameters on the DocumentBuilderFactory to adjust how the builders it creates behave when generating a Document. The available parameters include:
- setCoalescing: determines whether the parser converts CDATA nodes to text and merges them with the surrounding text nodes (where applicable). The default is false.
- setExpandEntityReferences: determines whether external entity references are expanded. If true, the external data is inserted into the document. The default is true.
- setIgnoringComments: determines whether comments in the file are ignored. The default is false.
- setIgnoringElementContentWhitespace: determines whether whitespace in element content is ignored (similar to the way a browser handles HTML). The default is false.
- setNamespaceAware: determines whether the parser pays attention to namespace information. The default is false.
- setValidating: by default the parser does not validate the document. Set this parameter to true to turn validation on.
The parameters are set like this:
// Configure the factory before creating the builder
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
DocumentBuilder db = dbf.newDocumentBuilder();
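The effect of two of these switches can be seen by counting the child nodes the parser produces. This is only a sketch: the class FactoryOptions, its method, and the sample documents are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FactoryOptions {
    // Parses xml with the given factory settings and returns the number of
    // direct children of the root element.
    static int countChildren(String xml, boolean ignoreComments, boolean coalescing) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setIgnoringComments(ignoreComments);
            dbf.setCoalescing(coalescing);
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String withComment = "<a><!-- note --><b/></a>";
        String withCdata = "<a>x<![CDATA[y]]></a>";
        // Comment kept versus comment dropped.
        System.out.println(countChildren(withComment, false, false));
        System.out.println(countChildren(withComment, true, false));
        // Separate text and CDATA nodes versus one merged text node.
        System.out.println(countChildren(withCdata, false, false));
        System.out.println(countChildren(withCdata, false, true));
    }
}
```

Because the settings live on the factory, they must be applied before newDocumentBuilder() is called; a builder that already exists is not affected by later changes.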
DOM document traversal
The DOM uses the Composite pattern. The Node interface is the base type of all XML units; Element, Attr, Document, and so on all derive from Node. Each Node can contain other Nodes or hold content as text, so traversing a DOM document is quite simple.
First get the root node of the document. The Document.getDocumentElement() method returns an Element object, which is the root node of the document. For an HTML document, getDocumentElement() returns the <html> node.
Once you have the root node, Node.getChildNodes() returns all direct children of a node, which lets you traverse the entire tree. Node.hasChildNodes() tells you whether a node has children and serves as the end condition of the traversal algorithm. getChildNodes() returns a NodeList object, which has two methods, int getLength() and Node item(int); together they give safe access to each element of the list.
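A traversal built from these methods might look like the following sketch, which collects element names in document order. The class DepthFirstWalk and its helper methods are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DepthFirstWalk {
    // Parses an XML string; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Visits node, then each of its children in order (depth-first).
    static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        if (!node.hasChildNodes()) {
            return; // end condition: a leaf has nothing more to visit
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    // Returns the names of all elements, starting from the root element.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        collect(parse(xml).getDocumentElement(), names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b><c/></b><d/></a>"));
    }
}
```

For the sample in main, the elements are visited in the order a, b, c, d: each subtree is finished before its next sibling is started.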
The approach above amounts to a depth-first traversal, visiting each node's child list in turn. Another approach is a breadth-first traversal, which relies on getFirstChild() (returns the first child node) and getNextSibling() (returns the next sibling node).
Processing element content
First, be clear about the concepts of "node" and "element": in the DOM they are not equivalent. An "element" is a pair of tags together with the string value they enclose. For example, the following is an element:
<Country>China</Country>
It is not one node, however, but two. The first node is the Country element node; the second is the text node that holds the value "China" and is the first child of the element node.
So handling the content of an element takes two steps:
1. Locate the node that represents the element.
2. Process the first child node of that node.
If you know the name of an element, Element.getElementsByTagName(String name) finds all nodes that represent it. The method traverses the whole subtree automatically and returns every match in a NodeList. Because the DOM tree is held in memory, this operation is not too slow. Once you have the node, call Node.getFirstChild() to obtain the text node that holds the element's value; you can then change that value with Node.setNodeValue(String).
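The two steps can be sketched as follows. The Address wrapper element, the class ElementText, and its method names are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementText {
    // Parses an XML string; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Step 1: locate the element nodes. Step 2: read the first child's value.
    static String firstText(Document doc, String tag) {
        NodeList hits = doc.getDocumentElement().getElementsByTagName(tag);
        if (hits.getLength() == 0 || hits.item(0).getFirstChild() == null) {
            return null;
        }
        return hits.item(0).getFirstChild().getNodeValue();
    }

    // Overwrites the text of every matching element; returns how many were changed.
    static int setText(Document doc, String tag, String value) {
        NodeList hits = doc.getDocumentElement().getElementsByTagName(tag);
        int changed = 0;
        for (int i = 0; i < hits.getLength(); i++) {
            Node first = hits.item(i).getFirstChild();
            if (first != null && first.getNodeType() == Node.TEXT_NODE) {
                first.setNodeValue(value);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Document doc = parse("<Address><Country>China</Country></Address>");
        System.out.println(firstText(doc, "Country"));
        setText(doc, "Country", "France");
        System.out.println(firstText(doc, "Country"));
    }
}
```

Note that Element.getElementsByTagName searches the descendants of the element it is called on, which is why the sample wraps Country in an outer element. The null checks cover empty elements, which have no first child at all.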
Handling the content of other node types
To access attribute nodes (nodes for which Node.getNodeType() == ATTRIBUTE_NODE), call getAttributes() on the element that owns them to get all of its attributes. getAttributes() returns a NamedNodeMap, a name-to-value mapping table that supports random access by String name (getNamedItem) and sequential access by int index (item). The Attr interface (attribute node) has two accessors, getValue() and setValue(), for reading and writing the attribute's value.
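A short sketch of attribute access through NamedNodeMap and Attr follows. The class AttributeAccess, its methods, and the book element are hypothetical and exist only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class AttributeAccess {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Number of attributes on the element.
    static int count(Element e) {
        return e.getAttributes().getLength();
    }

    // Random access by name; returns null when the attribute is absent.
    static String read(Element e, String name) {
        Attr attr = (Attr) e.getAttributes().getNamedItem(name);
        return attr == null ? null : attr.getValue();
    }

    // Changes the value of an existing attribute through its Attr node.
    static void write(Element e, String name, String value) {
        Attr attr = (Attr) e.getAttributes().getNamedItem(name);
        if (attr != null) {
            attr.setValue(value);
        }
    }

    public static void main(String[] args) {
        Element book = parseRoot("<book id=\"42\" lang=\"en\"/>");
        NamedNodeMap attrs = book.getAttributes();
        // Sequential access by index; the DOM does not promise any particular order.
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            System.out.println(attr.getName() + " = " + attr.getValue());
        }
    }
}
```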
There are 12 different node types. Only the two most common, element nodes and attribute nodes, are introduced here; for the others, consult the API documentation. Node has a getNodeType() method that returns a short identifying the actual type of an object, so it plays the role of RTTI. The possible return values of getNodeType() are:
public static final short ELEMENT_NODE = 1;
public static final short ATTRIBUTE_NODE = 2;
public static final short TEXT_NODE = 3;
public static final short CDATA_SECTION_NODE = 4;
public static final short ENTITY_REFERENCE_NODE = 5;
public static final short ENTITY_NODE = 6;
public static final short PROCESSING_INSTRUCTION_NODE = 7;
public static final short COMMENT_NODE = 8;
public static final short DOCUMENT_NODE = 9;
public static final short DOCUMENT_TYPE_NODE = 10;
public static final short DOCUMENT_FRAGMENT_NODE = 11;
public static final short NOTATION_NODE = 12;
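These constants are typically used in a switch on getNodeType(). The sketch below maps a few of them to readable labels; the class NodeKinds and the label strings are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class NodeKinds {
    // Creates an empty document so that sample nodes can be built from it.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    // Dispatches on the runtime node type instead of using instanceof checks.
    static String kind(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:       return "element";
            case Node.ATTRIBUTE_NODE:     return "attribute";
            case Node.TEXT_NODE:          return "text";
            case Node.CDATA_SECTION_NODE: return "cdata";
            case Node.COMMENT_NODE:       return "comment";
            case Node.DOCUMENT_NODE:      return "document";
            default:                      return "other";
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        System.out.println(kind(doc));
        System.out.println(kind(doc.createElement("a")));
        System.out.println(kind(doc.createTextNode("x")));
    }
}
```

Switching on the type code matters for CDATA sections in particular: CDATASection extends Text in the interface hierarchy, so an instanceof Text check would match both, while getNodeType() tells them apart.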
Node processing
For a tree data structure, the common node operations are insertion, deletion, and replacement. The DOM provides easy-to-use APIs for all of them.
To insert a node, use Node.appendChild(Node) or Node.insertBefore(Node newChild, Node refChild); to remove one, use Node.removeChild(Node oldChild); to replace one, use Node.replaceChild(Node newChild, Node oldChild). The DOM adjusts the tree structure automatically, detaching the oldChild node on removal or replacement.
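The four calls can be tried on a small tree, recording the root's children after each step. This is a sketch; the class NodeEdits and the single-letter element names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NodeEdits {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    // Returns the names of the direct children, separated by spaces.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(children.item(i).getNodeName());
        }
        return sb.toString();
    }

    // Applies append, insert, remove, and replace, recording the children after each step.
    static List<String> demo() {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        Element c = doc.createElement("c");
        List<String> steps = new ArrayList<>();

        root.appendChild(a);
        root.appendChild(c);
        steps.add(childNames(root));                  // a c
        root.insertBefore(b, c);
        steps.add(childNames(root));                  // a b c
        root.removeChild(a);
        steps.add(childNames(root));                  // b c
        root.replaceChild(doc.createElement("d"), c);
        steps.add(childNames(root));                  // b d
        return steps;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```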
Document is itself a Node, so you can insert nodes directly into a document. Note, however, that only nodes created by a document can be inserted into that document; otherwise a DOMException with the code WRONG_DOCUMENT_ERR is thrown. Create nodes with the Document.createXXX methods. You can clone a node with cloneNode(boolean deep), where the boolean parameter decides whether the copy is deep, but a cloned node still cannot be inserted into another document. To bring in nodes from another document, use Document.importNode(Node importedNode, boolean deep).
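The ownership rule can be checked with two empty documents. The sketch below assumes the DOM implementation bundled with the JDK; the class CrossDocument and its method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CrossDocument {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    // True if appending a node owned by another document fails with WRONG_DOCUMENT_ERR.
    static boolean foreignAppendRejected() {
        Document source = newDocument();
        Document target = newDocument();
        Element root = target.createElement("root");
        target.appendChild(root);
        Element foreign = source.createElement("item");
        try {
            root.appendChild(foreign);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    // True if a clone still belongs to the document that created the original.
    static boolean cloneKeepsOwner() {
        Document source = newDocument();
        Element original = source.createElement("item");
        return original.cloneNode(true).getOwnerDocument() == source;
    }

    // True if an imported copy belongs to the target and can be appended there.
    static boolean importedCopyAccepted() {
        Document source = newDocument();
        Document target = newDocument();
        Element root = target.createElement("root");
        target.appendChild(root);
        Element foreign = source.createElement("item");
        Node copy = target.importNode(foreign, true);
        root.appendChild(copy);
        return copy.getOwnerDocument() == target
                && foreign.getOwnerDocument() == source
                && root.getFirstChild() == copy;
    }

    public static void main(String[] args) {
        System.out.println(foreignAppendRejected());
        System.out.println(cloneKeepsOwner());
        System.out.println(importedCopyAccepted());
    }
}
```

importNode leaves the original node where it was and returns a copy owned by the target document, so the source tree is not modified.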
To work with an element's attributes, insert one with Element.setAttributeNode(Attr newAttr) and delete an unwanted one with Element.removeAttribute(String name). If the name alone is ambiguous, use Element.removeAttributeNode(Attr oldAttr) to remove a specific attribute node.
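A minimal sketch of these three calls follows, tracking the attribute count after each step. The class AttributeEdits and the book element with its id and lang attributes are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AttributeEdits {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    // Adds two attributes, then removes one by name and one by node.
    // Returns the attribute count after each step, separated by spaces.
    static String demo() {
        Document doc = newDocument();
        Element book = doc.createElement("book");
        StringBuilder counts = new StringBuilder();

        Attr id = doc.createAttribute("id");
        id.setValue("42");
        book.setAttributeNode(id);
        counts.append(book.getAttributes().getLength());

        Attr lang = doc.createAttribute("lang");
        lang.setValue("en");
        book.setAttributeNode(lang);
        counts.append(' ').append(book.getAttributes().getLength());

        book.removeAttribute("lang");        // remove by name
        counts.append(' ').append(book.getAttributes().getLength());

        book.removeAttributeNode(id);        // remove a specific Attr node
        counts.append(' ').append(book.getAttributes().getLength());
        return counts.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Like other nodes, an Attr must be created by the document that owns the element, here through Document.createAttribute.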
Serialization of documents
In the parser implementation used in this article, each Element overrides the toString() method: pick any Element as the root, call its toString(), and it recursively converts the whole subtree below it into a String. (This behavior comes from the implementation, not from the DOM interfaces, so other parsers may print something else.) Writing that String to an output device gives you an XML document. The following code generates a new HTML document (HTML written this way is also well-formed XML) and prints it to standard output.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
Document newDoc = null;
try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    newDoc = db.newDocument();
} catch (Exception e) {
    e.printStackTrace();
}
// Build <head><title>...</title></head>
Element head = newDoc.createElement("head");
Element title = newDoc.createElement("title");
title.appendChild(newDoc.createTextNode("Document created by DOM"));
head.appendChild(title);
// Build <body>...</body>
Element body = newDoc.createElement("body");
body.appendChild(newDoc.createTextNode("this is a test document"));
// Assemble <html>, placing head before body
Element newRoot = newDoc.createElement("html");
newRoot.appendChild(body);
newRoot.insertBefore(head, body);
newDoc.appendChild(newRoot);
// Relies on the implementation's toString() to produce the markup
System.out.println(newRoot);
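Where the parser's toString() does not produce markup, one portable alternative is the identity transform from javax.xml.transform. This is a swapped-in technique rather than the approach the article uses; the class DomSerializer and its methods are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class DomSerializer {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    // Serializes a node and its subtree to an XML string using an identity transform.
    static String toXml(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Force XML output; a root named "html" could otherwise select HTML output.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Node root = doc.appendChild(doc.createElement("html"));
        root.appendChild(doc.createElement("body"))
            .appendChild(doc.createTextNode("this is a test document"));
        System.out.println(toXml(doc));
    }
}
```

Passing a StreamResult that wraps System.out or a file stream instead of a StringWriter sends the output straight to the target device.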
Summary
The DOM processes XML data by building a tree structure in memory. This gives it an advantage in processing speed and convenience, at the cost of memory. If the XML document is large, parsing may take a long time.