Implementing "incremental files" with SAX

zhaozj, 2021-02-16

By Wang Hailong

This article assumes that you are familiar with XML, DOM, SAX and related terms. The examples use the Java language, but the principles carry over and apply equally to C and C#.

1. "Incremental files" in XML format

A program can open a file in several modes: read, write, append, and so on. A file opened in append mode can be called an "incremental" file, because new content is added to the end of the file. The most important application of incremental files is the log file. We do not have to consider the content already in the file; we simply append to it.

For more complex situations, append mode is not enough. If the log file uses a structured format (such as HTML or XML), we cannot simply add new content at the end of the file. We must take the existing content into account and put the new content in a suitable location. For example, if the log file is in HTML format, the new content must at least go between the <html> and </html> tags, and should really go between the <body> and </body> tags. We must ensure that both the file format and the position of the information are correct.

In this article, format and structure information refers specifically to the XML format, and "incremental file" means an incremental file in XML format. The article illustrates how to analyze and operate on such files.

2. XML data processing methods

2.1 XML database

If you need comprehensive management of the data (add, delete, query, compare, and so on), consider an XML database. See open source projects such as Apache Xindice.

2.2 DOM + XPath

Parse the file, generate a DOM tree, and operate on the nodes of the tree. XPath can be used to locate the target node. For example, "html/body//table[position()=last()]" locates the last table in an HTML file, and the entry to be added is then placed in that table. For XPath, see the Apache Xalan samples, in particular the XPathAPI example.

Advantages: clear concepts, good structure, easy to read.
Disadvantages: it takes more time and space, especially when XPath is used.

2.3 SAX Filter

Parse the file, generate SAX events, and process the SAX events. SAX works as a pipeline: filters can be added to the pipeline and chained, so that the events produced by one step become the input events of the next.

Disadvantages: the concepts of SAX are less intuitive, and the code is less readable than DOM code. The result is generated in a single pass and cannot be traversed again. SAX programming is often regarded as harder than DOM programming.
Advantages: in practice SAX is no harder than DOM, and the amount of code is often even smaller than with DOM. SAX is fast and uses very little memory.

2.4 XML-Object binding

XML data is bound to Java objects. An XML element is mapped to a Java object, and the attributes or child elements of the XML element are mapped to member variables of that object. Related open source projects: Sun JAXB, Castor, and others.

Binding is either static or dynamic. Dynamic binding is flexible but slow. This article favors static binding, whose characteristics are described below.

Static binding process: write an XML Schema file; process the XML Schema file with a code generation tool, which automatically generates Java classes; use objects of the generated classes, which is equivalent to operating directly on the XML data.

Advantages: this method needs the least programming and gives the best-structured code. It is somewhat faster than DOM and uses somewhat less space. Like DOM, it produces a result tree that can be searched repeatedly.
Disadvantages: it is slower than SAX and uses more space than SAX. Once the document structure reaches a certain complexity, the automatically generated code becomes a burden.

2.5 About the Apache projects

http://xml.apache.org - the core projects are Apache Xerces and Apache Xalan.
http://jakarta.apache.org/commons/digester.html - Digester (small in size): dynamic XML-Object binding.
http://jakarta.apache.org/commons/jxpath/index.html - JXPath: operates on Java object hierarchies using XPath.

3. Implementing "incremental files" with SAX

Now to the main topic. SAX has the best time and space efficiency, so this article mainly discusses this method. The core interface of SAX is ContentHandler, which receives SAX events for processing. Our problem requires writing to a file, so the first thing we need is a ContentHandler that writes SAX events to a file. This class is central to our problem.

Apache Xerces and Apache Xalan provide some XML Serializer classes and some XML Filter classes, some of which also implement the Source and Result interfaces of the transform package. By assembling these classes appropriately, a flexible processing pipeline can be built.
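As a rough illustration of such a pipeline, the sketch below uses only classes shipped with the JDK rather than the Xerces and Xalan serializer classes themselves. An XMLFilterImpl subclass sits between the parser and a TransformerHandler from javax.xml.transform.sax, which plays the role of the ContentHandler that writes events out. The names PipelineSketch and AppendFilter, the depth counter, and the element name "message" are assumptions made for this example, not part of any library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class PipelineSketch {

    /** Hypothetical filter: emits a "message" element just before the root element closes. */
    static class AppendFilter extends XMLFilterImpl {
        private final String text;
        private int depth = 0; // nesting depth; 0 again once the root has ended

        AppendFilter(XMLReader parent, String text) {
            super(parent);
            this.text = text;
        }

        @Override
        public void startElement(String uri, String local, String raw, Attributes attrs)
                throws SAXException {
            depth++;
            super.startElement(uri, local, raw, attrs);
        }

        @Override
        public void endElement(String uri, String local, String raw) throws SAXException {
            depth--;
            if (depth == 0) {
                // The root element is about to close: inject the new element first.
                super.startElement("", "message", "message", new AttributesImpl());
                super.characters(text.toCharArray(), 0, text.length());
                super.endElement("", "message", "message");
            }
            super.endElement(uri, local, raw);
        }
    }

    /** Runs parser -> filter -> serializer and returns the serialized XML. */
    static String append(String xml, String text) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // The TransformerHandler is the ContentHandler that serializes the events.
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        AppendFilter filter = new AppendFilter(reader, text);
        filter.setContentHandler(serializer);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(append("<log><message>old</message></log>", "hello"));
    }
}
```

More filters could be chained by passing one filter as the parent of the next. Replacing the StringWriter with a file stream would turn the same chain into a file rewriter.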

Because the problem addressed in this article is small, it is not necessary to introduce such complexity. The ContentHandler interface is not very complicated, and there is plenty of open source code to refer to, so we can implement a simple SAXWriter class that writes SAX events to a file. The following code is extracted from the Apache Xerces code and reduced to the minimum functionality.

3.1 SAXWriter code

    package example;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.io.UnsupportedEncodingException;
    import java.io.Writer;

    // Simplified: text and attribute values are written as-is, without escaping.
    public class SAXWriter extends DefaultHandler {

        /** Print writer. */
        protected PrintWriter printWriter;

        public SAXWriter() {
        }

        /** Sets the output stream for printing. */
        public void setOutput(OutputStream stream, String encoding)
                throws UnsupportedEncodingException {
            if (encoding == null) {
                encoding = "UTF8";
            }
            Writer writer = new OutputStreamWriter(stream, encoding);
            printWriter = new PrintWriter(writer);
        } // setOutput(OutputStream, String)

        /** Sets the output writer. */
        public void setOutput(Writer writer) {
            printWriter = writer instanceof PrintWriter
                    ? (PrintWriter) writer
                    : new PrintWriter(writer);
        } // setOutput(java.io.Writer)

        // ContentHandler methods

        /** Start element. */
        public void startElement(String uri, String local, String raw, Attributes attrs)
                throws SAXException {
            printWriter.print('<');
            printWriter.print(raw);
            if (attrs != null) {
                int len = attrs.getLength();
                for (int i = 0; i < len; i++) {
                    printWriter.print(' ');
                    printWriter.print(attrs.getQName(i));
                    printWriter.print("=\"");
                    printWriter.print(attrs.getValue(i));
                    printWriter.print('"');
                }
            }
            printWriter.print('>');
            printWriter.flush();
        } // startElement(String, String, String, Attributes)

        /** End element. */
        public void endElement(String uri, String local, String raw) throws SAXException {
            printWriter.print("</");
            printWriter.print(raw);
            printWriter.print('>');
            printWriter.flush();
        } // endElement(String, String, String)

        /** Characters. */
        public void characters(char ch[], int start, int length) throws SAXException {
            printWriter.write(ch, start, length);
            printWriter.flush();
        } // characters(char[], int, int)

        /** Ignorable whitespace. */
        public void ignorableWhitespace(char ch[], int start, int length) throws SAXException {
            characters(ch, start, length);
            printWriter.flush();
        } // ignorableWhitespace(char[], int, int)
    }

This class is extended below.

3.2 ContentAppender source code

    package example;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;

    public class ContentAppender extends SAXWriter {

        String root = null;     // raw name of the root element
        String message = null;  // text of the message to append

        public ContentAppender() {
        }

        public void appendMessage(String message) {
            this.message = message;
        }

        /** Start element. */
        public void startElement(String uri, String local, String raw, Attributes attrs)
                throws SAXException {
            if (root == null) {
                // Record the root element.
                root = raw;
            }
            super.startElement(uri, local, raw, attrs);
        }

        /** End element. */
        public void endElement(String uri, String local, String raw) throws SAXException {
            if (message != null && raw.equals(root)) {
                // End of the root element (this assumes that no nested element
                // reuses the root's name): write the message element first.
                super.startElement("", "message", "message", null);
                printWriter.print(message);
                super.endElement("", "message", "message");
                printWriter.println();
            }
            super.endElement(uri, local, raw);
        }
    }

By extending SAXWriter, we can place newly added content inside any element. The ContentAppender class adds the new content as the last child node of the root element.

3.3 Main function

    package example;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;

    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class IncrementalWriter {

        public void process(String logFile, String newMessage) throws Exception {
            String newFileName = logFile + ".new";

            // Create an XMLReader.
            XMLReader reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

            ContentAppender handler = new ContentAppender();
            FileInputStream fi = new FileInputStream(logFile);
            FileOutputStream fo = new FileOutputStream(newFileName);
            handler.setOutput(fo, null);
            handler.appendMessage(newMessage);

            reader.setContentHandler(handler);
            reader.parse(new InputSource(fi));
            fi.close();
            fo.close();

            // Delete the old file and rename the new file to the log file.
            File oldFile = new File(logFile);
            oldFile.delete();
            File newFile = new File(newFileName);
            newFile.renameTo(new File(logFile));
        }

        // For example, a.xml is as below:
        //
        // Old Message
        //
        public static void main(String[] args) throws Exception {
            IncrementalWriter tester = new IncrementalWriter();
            tester.process("a.xml", "hello"); // append "hello" to the a.xml file
        }
    }

These three files are simple and compile under JDK 1.4. The result of running them is that a message element containing "hello" is added to the a.xml file.

4. Application recommendations

The above is the simplest example I could produce (my ability is limited, and I could not simplify it further). With the code above, every message written requires processing the whole file, so file reading and writing is very inefficient. In a real application, other methods can be combined with it. For example, first collect the information in a DOM tree or in Java objects, and once a certain amount has accumulated, call code similar to the above to write the data to the file in one pass.

This article recommends combining XML-Object binding with SAX. For example, use JAXB's XML-Object binding mechanism to hold the structurally simple part of the data in objects, and then write that data in one pass to the specified location in a complex XML file. When the file becomes large, it is advisable to split it into several files.

Code such as the following may speed up memory reclamation (it releases the DOM tree or Java objects):

    Vector v = new Vector(2000);
    // process the vector; after this:
    v = null;     // drop the reference
    System.gc();  // suggest that the JVM collect unused objects

5. Parsing HTML

HTML is not as well structured as XML, which is why XML, a structured document format, appeared. Things always interact: the shortcomings of HTML gave birth to XML, and the emergence and development of XML has in turn affected the processing of HTML. The following briefly introduces how to process HTML pages according to the XML structure, that is, parsing HTML with DOM or SAX.

5.1 HTMLBuilder

Apache Xerces contains an org.apache.html.dom package that implements the interfaces of the HTML document elements. The HTMLBuilder class is the entry class: it implements the org.xml.sax.DocumentHandler interface, accepts SAX events, and generates an HTML document tree. The key to this method is finding a suitable XMLReader to parse the HTML document. Many XML parsers have low fault tolerance, and most HTML documents are not well-formed XML, so parsing often fails on some important HTML elements.

5.2 The NekoHTML project

NekoHTML is a nice open source project that uses the Apache Xerces XNI (Xerces Native Interface) to parse HTML documents. It can process the HTML elements to generate an HTML document tree, and it can also generate SAX events for each HTML element. NekoHTML can therefore handle HTML documents with the methods described earlier: (1) DOM + XPath, (2) SAX Filter.
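The DOM + XPath route can be sketched with JDK classes alone, under the assumption that the page has already been turned into a well-formed document (for real-world HTML that would be the job of a tolerant parser such as NekoHTML). The class name LastTableSketch, its helper methods and the sample markup are made up for illustration. The expression is written as (/html/body//table)[last()] because the parentheses make last() apply to the whole result set, which selects the last table in document order.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class LastTableSketch {

    /** Parses a well-formed (X)HTML string into a DOM tree. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Appends a one-cell row to the last table in the document and returns that table. */
    static Element appendRow(Document doc, String text) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Element table = (Element) xpath.evaluate(
                "(/html/body//table)[last()]", doc, XPathConstants.NODE);
        if (table == null) {
            throw new IllegalArgumentException("document contains no table");
        }
        Element row = doc.createElement("tr");
        Element cell = doc.createElement("td");
        cell.setTextContent(text);
        row.appendChild(cell);
        table.appendChild(row);
        return table;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<html><body>"
                + "<table id=\"a\"><tr><td>x</td></tr></table>"
                + "<div><table id=\"b\"><tr><td>old</td></tr></table></div>"
                + "</body></html>");
        Element table = appendRow(doc, "hello");
        // prints: b now has 2 rows
        System.out.println(table.getAttribute("id") + " now has "
                + table.getElementsByTagName("tr").getLength() + " rows");
    }
}
```

The whole document is held in memory here, which is the time and space cost that section 2.2 attributes to this approach.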

The DOM + XPath examples are intuitive, easy to read, and easy to get started with. The SAX Filter approach is harder to understand and harder to get started with. Steps for using a SAX Filter:

1. Write your own filter that implements the org.apache.xerces.xni.parser.XMLDocumentFilter interface. See the org.cyberneko.html.filters package.

2. Create an org.cyberneko.html.parsers.SAXParser and call:

    parser.setFeature("http://cyberneko.org/html/features/balance-tags", false); // optional
    XMLDocumentFilter[] filters = new XMLDocumentFilter[] { myFilter, writer }; // create filters
    parser.setProperty("http://cyberneko.org/html/properties/filters", filters); // set filters
    parser.parse(...);

6. Summary

XML and its supporting tools greatly ease our document processing work and speed up our projects.

When reprinting, please cite the original address: https://www.9cbs.com/read-23518.html
