Java and XML Programming with SAX

xiaoxiao, 2021-03-06

SAX concept

SAX is an abbreviation of Simple API for XML. It is not a standard proposed by the W3C; it is better described as a community-driven de facto standard, the product of open discussion among developers. Nonetheless, SAX is used no less widely than DOM in XML processing, and almost all XML parsers support it.

Compared with DOM, SAX is a lightweight approach. When processing a document with DOM, we have to read the entire XML document, build a DOM tree in memory, and create a Node object for every node in that tree. When the document is small this causes no problems, but once the document is large, DOM becomes quite costly. In particular, its memory demand grows along with the document, so that in some applications (an applet, for example) using DOM is a real disadvantage. In such cases SAX is the better alternative.

SAX is conceptually different from DOM. First, unlike DOM, which is document-driven, SAX is event-driven: it does not need to read in the entire document first, because reading the document and parsing it are one and the same process. "Event-driven" refers to a style of programming based on a callback mechanism. (If you are familiar with Java's delegation event model, this mechanism will be easy to understand.)
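
To make the callback idea concrete, here is a minimal sketch that has nothing to do with XML yet. The names `WordListener`, `scan`, and `CallbackDemo` are hypothetical and chosen only for illustration: an event source walks over its input and calls back into a listener that was handed to it, which is roughly the relationship between a SAX parser and the handler you register with it.

```java
import java.util.ArrayList;
import java.util.List;

class CallbackDemo {

    // Hypothetical listener: the code that reacts to events.
    interface WordListener {
        void onWord(String word);
        void onEnd();
    }

    // Hypothetical event source: walks the input once and reports
    // each word to the listener as it goes.
    static void scan(String input, WordListener listener) {
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                listener.onWord(word);
            }
        }
        listener.onEnd();
    }

    // Registers a listener that records every event it receives.
    static List<String> run(String input) {
        List<String> events = new ArrayList<>();
        scan(input, new WordListener() {
            @Override
            public void onWord(String word) {
                events.add("word:" + word);
            }

            @Override
            public void onEnd() {
                events.add("end");
            }
        });
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run("simple api for xml"));
    }
}
```

The caller never asks the source for data; it only supplies methods and waits to be called, which is the essence of the event-driven style.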


The XMLReader accepts an XML document and parses it while reading it; in other words, reading and parsing happen at the same time, which is very different from DOM. Before parsing begins, you register a ContentHandler with the XMLReader. The ContentHandler plays the role of an event listener and defines many methods, such as startDocument(), which specifies what should be done when the start of the document is encountered during parsing. When the XMLReader reads the relevant content, it fires the corresponding event and hands its processing over to the ContentHandler by calling the matching method.
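
The registration step described above can be sketched as follows. This is only an illustration under a few assumptions: the class name `EventTrace` and the tiny input document are made up, the handler extends `DefaultHandler` (SAX's ready-made empty implementation of ContentHandler) to avoid writing every method, and the parser is made namespace-aware so that `localName` is filled in.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace {

    // Parses the XML text and returns the element and document events
    // in the order in which the XMLReader reported them.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for localName to be set
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // Register the listener before parsing starts.
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startDocument() {
                    events.add("startDocument");
                }

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    events.add("startElement " + localName);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    events.add("endElement " + localName);
                }

                @Override
                public void endDocument() {
                    events.add("endDocument");
                }
            });

            // Reading and parsing happen together inside this call.
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<greeting><to>world</to></greeting>")) {
            System.out.println(event);
        }
    }
}
```

Character data is deliberately not recorded here, because a parser is free to deliver one run of text in several characters() calls.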

This may sound abstract, but don't worry: the example that follows will make the SAX parsing process clear. Take a look at this simple XML file:

    <poem>
    <author>Ogden Nash</author>
    <title>Fleas</title>
    <line>Adam</line>
    </poem>

When the XMLReader reads the <poem> tag, it calls ContentHandler.startElement() and passes the tag name poem as a parameter. In your implementation of startElement() you do whatever should happen when <poem> appears. Events are fired one after another as parsing (that is, reading the document) proceeds, and the corresponding methods are called in turn. When parsing is complete and the last method has returned, processing of the document is complete as well. The following table lists, in order, the methods called while the XML file above is parsed (assuming a namespace-aware parser that also reports qualified names):

    Item encountered     Method callback
    {document start}     startDocument()
    <poem>               startElement("", "poem", "poem", {Attributes})
    "\n"                 characters("<poem>\n...", 6, 1)
    <author>             startElement("", "author", "author", {Attributes})
    "Ogden Nash"         characters("<poem>\n...", 15, 10)
    </author>            endElement("", "author", "author")
    "\n"                 characters("<poem>\n...", 34, 1)
    <title>              startElement("", "title", "title", {Attributes})
    "Fleas"              characters("<poem>\n...", 42, 5)
    </title>             endElement("", "title", "title")
    "\n"                 characters("<poem>\n...", 55, 1)
    <line>               startElement("", "line", "line", {Attributes})
    "Adam"               characters("<poem>\n...", 62, 4)
    </line>              endElement("", "line", "line")
    "\n"                 characters("<poem>\n...", 73, 1)
    </poem>              endElement("", "poem", "poem")
    {document end}       endDocument()

ContentHandler is actually an interface. To process a particular XML file, you create a class that implements ContentHandler and handles the specific events; this class is the core of SAX-based XML processing. Let's look at some of the methods it defines:

void characters(char[] ch, int start, int length)

This method handles character data read from the XML file. Its parameters are a character array together with the start position and length of the data within that array, so the text is easily obtained with a String constructor: String charEncountered = new String(ch, start, length).

void startDocument()

This method is called when the start of the document is encountered; you can do preprocessing work here.

void endDocument()

The counterpart of the method above: it is called when the document ends, and you can do wrap-up work here.

void startElement(String namespaceURI, String localName, String qName, Attributes atts)

This method is triggered when a start tag is read. SAX 1.0 did not support namespaces; support was added in version 2.0. Among the parameters, namespaceURI is the namespace URI, localName is the tag name without its prefix, and qName is the qualified name, that is, the tag name including its prefix. When the document does not use namespaces, namespaceURI is an empty string, and when the parser is not namespace-aware, localName may be empty as well. atts is the list of attributes carried by the tag; through it you can obtain every attribute name and its value. Note that an important characteristic of SAX is its stream-oriented processing: when it encounters a tag, it keeps no record of the tags encountered earlier. In other words, all that startElement() knows is the name and attributes of the current tag. The nesting structure, the name of the parent tag, whether there are child elements, and other structural information are unknown, and your program has to keep track of them itself. This makes SAX less convenient to program with than DOM.
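
As a small illustration of the startElement() parameters, the following sketch lists every attribute the parser reports. The class name `AttributeDump` and the sample input are hypothetical; the calls on `Attributes` (`getLength()`, `getLocalName(int)`, `getValue(int)`) are part of the SAX 2 API, and the parser is assumed to be namespace-aware so that local names are available.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDump {

    // Returns one "element@attribute=value" entry per attribute found.
    static List<String> attributesOf(String xml) {
        List<String> found = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that local names are reported
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        // Only the current tag is visible here: its name
                        // and its attributes, nothing about its ancestors.
                        for (int i = 0; i < atts.getLength(); i++) {
                            found.add(localName + "@" + atts.getLocalName(i)
                                    + "=" + atts.getValue(i));
                        }
                    }
                });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(attributesOf("<a x=\"1\"><b y=\"2\"/></a>"));
    }
}
```

SAX does not promise any particular order for the attributes of a single element, so code that needs a specific attribute would normally look it up by name with `atts.getValue(String)` instead of relying on the index.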

void endElement(String namespaceURI, String localName, String qName)

This method is the counterpart of startElement(); it is called when an end tag is encountered.

Because ContentHandler is an interface, implementing it directly can be inconvenient, so SAX also provides a helper class, DefaultHandler, which implements the interface with empty methods. In practice you only need to extend this class and override the methods you care about.

That covers the basics of SAX. Let's look at two concrete examples to get a better understanding of how SAX is used.

SAX programming examples

We will again use the sample document from the DOM article, but first let's look at a simpler application: counting how many times each tag appears in an XML file. The example is simple, but it is enough to show the basic ideas of SAX programming.

First, of course, come the import statements:

    import org.xml.sax.helpers.DefaultHandler;
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import java.util.*;
    import java.io.*;

Then we create a class that extends DefaultHandler. Set the program logic aside for the moment and pay attention to the structure of the program:

    public class SAXCounter extends DefaultHandler {

        // This Hashtable records how many times each tag appears
        private Hashtable<String, Integer> tags;

        // Work done before the document is processed
        public void startDocument() throws SAXException {
            tags = new Hashtable<String, Integer>(); // initialize the Hashtable
        }

        // Called at the start of every element
        public void startElement(String namespaceURI, String localName,
                                 String rawName, Attributes atts) throws SAXException {
            String key = localName;
            Integer value = tags.get(key);
            if (value == null) {
                // A new tag: add a record to the Hashtable
                tags.put(key, 1);
            } else {
                // Seen before: fetch its count and add 1
                int count = value.intValue();
                count++;
                tags.put(key, count);
            }
        }

        // Print the statistics once parsing has finished
        public void endDocument() throws SAXException {
            Enumeration<String> e = tags.keys();
            while (e.hasMoreElements()) {
                String tag = e.nextElement();
                int count = tags.get(tag);
                System.out.println("Tag <" + tag + "> occurs " + count + " times");
            }
        }

        // Program entry point: performs the parsing
        public static void main(String[] args) {
            String filename = "links.xml";
            SAXParserFactory spf = SAXParserFactory.newInstance();
            // localName is only filled in by a namespace-aware parser
            spf.setNamespaceAware(true);
            XMLReader xmlReader = null;
            SAXParser saxParser = null;
            try {
                // Create a SAXParser object
                saxParser = spf.newSAXParser();
                // Get the SAX XMLReader wrapped by the SAXParser
                xmlReader = saxParser.getXMLReader();
            } catch (Exception ex) {
                System.err.println(ex);
                System.exit(1);
            }
            try {
                // Parse the XML file with the specified ContentHandler.
                // Note that, for simplicity, the main program and the
                // ContentHandler are placed in the same class here. In fact,
                // everything done in main() is independent of the ContentHandler.
                saxParser.parse(new File(filename), new SAXCounter());
            } catch (SAXException se) {
                System.err.println(se.getMessage());
                System.exit(1);
            } catch (IOException ioe) {
                System.err.println(ioe);
                System.exit(1);
            }
        }
    }

Let's look at what this program does. The main() method mainly creates a parser and then parses the document. To keep the code independent of any particular parser, the SAXParser object is created with the same design technique used for DOM: a SAXParserFactory creates the concrete SAXParser object, so switching to a different parser only requires changing the value of a system property, while the program code stays unchanged. This is the idea behind the Factory Method pattern. We won't go into detail here; if it is still unclear, see the explanation in the DOM article, since the principle is the same.

One point here deserves attention: the relationship between the SAXParser class and the XMLReader interface, which can be a little confusing. SAXParser is a JAXP wrapper class around XMLReader, and XMLReader is the interface that SAX 2.0 defines for parsing documents. You can equally call the parse() method of XMLReader to parse the document, with exactly the same effect. However, the parse() method of SAXParser accepts more kinds of arguments and can parse XML from several different data sources, so it is more convenient to use than XMLReader.

This example touches only the surface of SAX; the next one is more advanced. We will implement the same function as in the DOM example: reading content from the XML document and formatting the output. The program logic looks simple, but SAX is not as convenient as DOM here, as you will see.

As mentioned earlier, when a start tag is encountered, the startElement() method cannot tell where that tag sits in the XML document.
This is a serious problem when processing XML documents, because the meaning of a tag in XML is partly determined by its position. It is also a problem for programs that need to verify the document structure. Of course, no problem is without a solution: we can use a stack to keep a record of the document structure.

A stack is a last-in, first-out structure. The idea is to push the tag name onto the stack in startElement() and pop it off in endElement(). In a well-formed document the nesting is complete: every start tag has a matching end tag, and tags never overlap. Every startElement() call is therefore matched by an endElement() call, pushes and pops are paired, and by analyzing the stack we can easily tell where the current tag is in the document structure.

    // Uses the same import statements as SAXCounter
    public class SAXReader extends DefaultHandler {

        java.util.Stack<String> tags = new java.util.Stack<String>();

        // ------------ XML content ------------
        String text = null;
        String url = null;
        String author = null;
        String description = null;
        String day = null;
        String year = null;
        String month = null;
        // -------------------------------------

        public void endDocument() throws SAXException {
            System.out.println("------ Parse End --------");
        }

        public void startDocument() throws SAXException {
            System.out.println("------ Parse Begin --------");
        }

        public void startElement(String namespaceURI, String localName,
                                 String qName, Attributes atts) throws SAXException {
            tags.push(localName);
        }

        public void endElement(String namespaceURI, String localName,
                               String qName) throws SAXException {
            tags.pop();
            // All information for one link node has been collected; print it
            if (localName.equals("link")) printout();
        }

        public void characters(char[] ch, int start, int length) throws SAXException {
            // Look at the top of the stack to find the current node
            String tag = tags.peek();
            if (tag.equals("text")) text = new String(ch, start, length);
            else if (tag.equals("url")) url = new String(ch, start, length);
            else if (tag.equals("author")) author = new String(ch, start, length);
            else if (tag.equals("day")) day = new String(ch, start, length);
            else if (tag.equals("month")) month = new String(ch, start, length);
            else if (tag.equals("year")) year = new String(ch, start, length);
            else if (tag.equals("description")) description = new String(ch, start, length);
        }

        private void printout() {
            System.out.print("Content: ");
            System.out.println(text);
            System.out.print("URL: ");
            System.out.println(url);
            System.out.print("Author: ");
            System.out.println(author);
            System.out.print("Date: ");
            System.out.println(day + "-" + month + "-" + year);
            System.out.print("Description: ");
            System.out.println(description);
            System.out.println();
        }

        public static void main(String[] args) {
            String filename = "links.xml";
            SAXParserFactory spf = SAXParserFactory.newInstance();
            // localName is only filled in by a namespace-aware parser
            spf.setNamespaceAware(true);
            SAXParser saxParser = null;
            try {
                saxParser = spf.newSAXParser();
            } catch (Exception ex) {
                System.err.println(ex);
                System.exit(1);
            }
            try {
                saxParser.parse(new File(filename), new SAXReader());
            } catch (SAXException se) {
                System.err.println(se.getMessage());
                System.exit(1);
            } catch (IOException ioe) {
                System.err.println(ioe);
                System.exit(1);
            }
        }
    }

Although this example does not analyze the stack itself, doing so is easy. java.util.Stack extends java.util.Vector, and the elements in the Stack are arranged in the order in which they were pushed, so we can use the size() method of Vector to get the number of elements in the Stack and the get(int) method of Vector to retrieve each individual element.
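
To show what analyzing the stack might look like, here is a minimal sketch. The class name `PathTracker`, the `/a/b/c` path notation, and the sample document are assumptions made for this illustration and are not part of the original program. It relies on the fact that `java.util.Stack` iterates from the bottom element to the top, like any `Vector`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PathTracker extends DefaultHandler {

    private final Stack<String> tags = new Stack<>();
    private final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        tags.push(localName);
        // Joining the stack from bottom to top yields the path
        // from the root element down to the current element.
        paths.add("/" + String.join("/", tags));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        tags.pop();
    }

    // Returns the path of every element, in document order.
    static List<String> pathsOf(String xml) {
        PathTracker tracker = new PathTracker();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that localName is set
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), tracker);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return tracker.paths;
    }

    public static void main(String[] args) {
        String xml = "<links><link><text>x</text><url>y</url></link></links>";
        for (String path : pathsOf(xml)) {
            System.out.println(path);
        }
    }
}
```

With such a path available, a handler can tell apart two tags with the same name that sit in different places in the document.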

In fact, listing the elements of the stack one by one from the bottom up gives the unique path from the XML root node to the current node; with this path information, the structure of the document is perfectly clear.

Summary

By now we have mastered the two major tools for XML programming, DOM and SAX, and know how to use them in a Java program. DOM programming is relatively simple, but it is slower and uses more memory; SAX programming is more complex, but it is faster and uses less memory. We should therefore choose between them according to the environment. Most XML applications can be handled with one of the two. Note that DOM and SAX are language-independent and not unique to Java: as long as an implementation exists for the language, DOM and SAX can be used in any object-oriented language.

Above we covered reading XML documents, extracting content from them, and adding to and modifying them. Another class of problem is transforming XML documents. It can be solved with DOM and SAX, but the implementation is complex, whereas XSLT makes it much simpler. This problem will be discussed in detail in later articles.

When reposting, please cite the original address: https://www.9cbs.com/read-90093.html