How do you get an encoding value when parsing an XML file?

xiaoxiao  2021-03-05  29

DOM4J API:

org.dom4j, Interface Document

getXMLEncoding

public String getXMLEncoding()

Returns the encoding of this document, as part of the XML declaration. This is null when unspecified or when it is not known (such as when the Document was created in memory) or when the implementation does not support this operation. The way this encoding is retrieved also depends on the way the XML source is parsed. For instance, if the SAXReader is used and if the underlying XMLReader implementation supports the org.xml.sax.ext.Locator2 interface, the result returned by this method is specified by the getEncoding() method of that interface.

Returns: the encoding of this document, as stated in the XML declaration, or null if unknown.
Since: 1.5
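The Locator2 mechanism mentioned above can be tried without dom4j. The following is only a sketch using the SAX parser bundled with the JDK; the class name LocatorEncodingSketch and the helper encodingOf are made up for illustration and are not part of dom4j. It assumes the parser hands the handler a locator that implements org.xml.sax.ext.Locator2, and it returns null if that is not the case.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class LocatorEncodingSketch {

    /** Returns the encoding reported by Locator2, or null if no Locator2 is available. */
    static String encodingOf(byte[] xmlBytes) {
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser may call this once, before startDocument().
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Ask at the first element, after the XML declaration has been read.
                if (result[0] == null && locator instanceof Locator2) {
                    result[0] = ((Locator2) locator).getEncoding();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xmlBytes), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Assumed sample document; the element name "root" is arbitrary.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>hi there</root>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Locator2 encoding: " + encodingOf(xml));
    }
}
```

What the locator reports depends on the parser implementation, which is why the dom4j documentation hedges its description of getXMLEncoding().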

An important goal of DOM Level 3 is to align the DOM data model with the XML Information Set (Infoset) by adding new Infoset information that can be queried. For example, you can now query and modify the information stored in the XML declaration, such as version, standalone, and encoding, through the Document interface (which maps to the Infoset document information item). Similarly, the base URI and declaration base URI properties, which are based on XML Base, are placed on the Node interface. You can also get the XML Infoset element content whitespace property, which indicates whether a Text node contains only ignorable whitespace. This property is available through the Text interface (which maps to the XML Infoset character information item). Listing 1 shows the actual method signatures of these interfaces in the Java language binding.

Listing 1. Method signatures in the Java language binding

// XML declaration information on
// the org.w3c.dom.Document interface
public String getXmlEncoding();
public void setXmlEncoding(String xmlEncoding);
public boolean getXmlStandalone();
public void setXmlStandalone(boolean xmlStandalone) throws DOMException;
public String getXmlVersion();
public void setXmlVersion(String xmlVersion) throws DOMException;

// Element content whitespace property on the Text
// interface
public boolean isWhitespaceInElementContent();

Note that the org.w3c.dom binding shipped with the JDK follows the final DOM Level 3 Recommendation, which has no setXmlEncoding() (the encoding is read-only) and names the Text method isElementContentWhitespace().

Through the schemaTypeInfo attribute of the Attr interface, you can also get the value of the attribute type property of an attribute information item, that is, the type of an attribute. This is covered in more detail later.
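To make the XML declaration accessors from Listing 1 concrete, here is a minimal sketch that reads them with the DOM parser bundled with the JDK. The class name XmlDeclarationSketch, the helper parse, and the sample document are assumptions made for this example; only the org.w3c.dom.Document methods come from the DOM Level 3 binding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
class XmlDeclarationSketch {

    /** Parses the given bytes with the JDK's default DOM parser. */
    static Document parse(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample document; the element name "root" is arbitrary.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<root>hi there</root>";
        Document doc = parse(xml.getBytes(StandardCharsets.UTF_8));

        // Values taken from the XML declaration.
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("encoding:   " + doc.getXmlEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());

        // standalone and version are writable; the declared encoding is not.
        doc.setXmlStandalone(false);
        System.out.println("standalone after set: " + doc.getXmlStandalone());
    }
}
```

getXmlEncoding() returns null when the document has no XML declaration or the declaration does not name an encoding, so callers should be prepared for that.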

In addition, a new feature brings a Document back to a form that is as close as possible to the XML Infoset. Previously, a document would often drift away from the XML Infoset as a result of editing operations (for example, inserting or deleting nodes). This is part of the result of document normalization, which is described in the document normalization section that follows.

Finally, the new Appendix C provides a mapping between the XML Infoset model and the DOM. In this mapping, each XML Infoset information item is mapped to its corresponding Node and vice versa, and each property of an information item is mapped to the properties of its corresponding Node. This appendix should give you a good overall understanding of the DOM data model and show how to access the information you are looking for.

These features are available only in DOM Level 3. DOM Level 1 and DOM Level 2 can be downloaded from the official site.

Using JDOM:

SAXBuilder builder = new SAXBuilder();
Document doc;
doc = builder.build(new FileInputStream("sample.xml"));
XMLOutputter output = new XMLOutputter();
output.output(doc, new FileOutputStream("shit.xml"));
// Prints the encoding configured on the outputter's Format (UTF-8 by default)
System.out.println(output.getFormat().getEncoding());

So simple...

XML file: < Comment> vincent

Output: UTF-8

import org.dom4j.Document;
import org.dom4j.DocumentHelper;

// The XML markup inside this string literal is missing from the source
String xml = " hi there ";
Document doc = DocumentHelper.parseText(xml);
System.out.println("The encoding is " + doc.getXMLEncoding());
System.out.println("As XML: " + doc.asXML());

The result is:

The encoding is ISO-8859-1

As XML: hi there

==========================================================================================================================================================

String xml = " hi there ";
Document doc = DocumentHelper.parseText(xml);
System.out.println("The encoding is " + doc.getXMLEncoding());
System.out.println("As XML: " + doc.asXML());

The result is:

The encoding is UTF-8

As XML: hi there

====================================

String xml = " hi there ";
Document doc = DocumentHelper.parseText(xml);
System.out.println("The encoding is " + doc.getXMLEncoding());
System.out.println("As XML: " + doc.asXML());

The result is:

The encoding is GBK

As XML: hi there
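The experiments above parse an XML string and print the declared encoding. A similar experiment can be sketched with the JDK alone by parsing from a StringReader and calling org.w3c.dom.Document.getXmlEncoding(). The class name DeclaredEncodingSketch, the helper declaredEncoding, and the sample declarations with the element name "root" are assumptions for this example. They are not the strings used in the dom4j runs, whose markup is not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DeclaredEncodingSketch {

    /** Returns the encoding named in the XML declaration, or null if none is declared. */
    static String declaredEncoding(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getXmlEncoding();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed declarations; the last entry has no declaration at all.
        String[] declarations = {
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<?xml version=\"1.0\" encoding=\"GBK\"?>",
            ""
        };
        for (String declaration : declarations) {
            String xml = declaration + "<root>hi there</root>";
            System.out.println("The encoding is " + declaredEncoding(xml));
        }
    }
}
```

Because the input is a character stream, the parser does not need to decode bytes in the declared encoding. The value returned is simply what the declaration says.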

import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.xerces.dom.DocumentImpl;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream in = new FileInputStream(args[0]);
// The cast assumes Apache Xerces is the active JAXP implementation
DocumentImpl doc = (DocumentImpl) builder.parse(in);
System.out.println(doc.getXmlEncoding());

When reposting, please cite the original address: https://www.9cbs.com/read-33229.html
