DOM4J encoding problem completely solved

xiaoxiao2021-03-06 26

I started learning DOM4J a few days ago. I picked it up quickly from material on the Internet, but I ran into a problem: I could not save XML files in UTF-8. After saving, reading the file back failed with "Invalid byte 2 of 2-byte UTF-8 sequence." The file generated by DOM4J appeared garbled in any editor that handles XML encoding correctly, while Notepad displayed the Chinese text correctly. This gave me quite a headache. When I tried GBK or GB2312 instead, the generated XML files could be parsed. So I suspected that DOM4J did not handle UTF-8 encoding, and I started reading the DOM4J source code. In the end, the problem turned out to be in my own program.

Both the DOM4J examples and the popular online tutorial "DOM4J Introduction" create a new XML document with code similar to the following:

public void createXML(String fileName) {
    Document doc = org.dom4j.DocumentHelper.createDocument();
    Element root = doc.addElement("book");
    root.addAttribute("name", "My Book");
    Element childTmp;
    childTmp = root.addElement("price");
    childTmp.setText("21.22");
    Element writer = root.addElement("author");
    writer.setText("Li Si");
    writer.addAttribute("ID", "001");
    try {
        // Write the document through a FileWriter
        org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
                new FileWriter(fileName));
        xmlWriter.write(doc);
        xmlWriter.close();
    }
    catch (Exception e) {
        System.out.println(e);
    }

}

The code above writes its output through a FileWriter object, and this is where the problem lies. The FileWriter(String) constructor, inherited from the Writer family, offers no way to specify an encoding, so DOM4J has no control over the encoding of the output file. The file is simply saved in the system's default encoding. Under Chinese-language Windows, Java's default encoding is GBK. In other words, although the XML declaration says the file is UTF-8, the file is actually saved as GBK. This is why an XML file generated with GBK or GB2312 as the declared encoding can be parsed correctly, while one declared as UTF-8 cannot be parsed by an XML parser. Now that we have found the cause, let's look for a solution. First, let's see how DOM4J handles encoding.
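Before turning to DOM4J's own handling of encoding, the mismatch described above can be checked with the JDK alone. The following is a minimal sketch, not part of the original example: the class name `EncodingMismatchDemo` and the helper `isValidUtf8` are made up for illustration, and it assumes the author name in the example is written with the Chinese characters for "Li Si" and that the JVM ships the GBK charset (full JDK builds normally do).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Returns true if the bytes are well-formed UTF-8.
    // A fresh CharsetDecoder reports malformed input by throwing.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Li Si" in Chinese characters (assumed), written as Unicode
        // escapes so the source compiles under any source encoding.
        String text = "\u674e\u56db";

        // What a GBK-default platform would put in the file.
        byte[] gbkBytes = text.getBytes(Charset.forName("GBK"));
        // What the UTF-8 declaration promises the file contains.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("GBK bytes valid as UTF-8:   " + isValidUtf8(gbkBytes));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8Bytes));
    }
}
```

The GBK bytes are rejected by a strict UTF-8 decoder, which is the same kind of failure an XML parser reports when the declaration says UTF-8 but the bytes on disk are GBK.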

public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
    //System.out.println("In OutputStream");
    this.format = DEFAULT_FORMAT;
    this.writer = createWriter(out, format.getEncoding());
    this.autoFlush = true;
    namespaceStack.push(Namespace.NO_NAMESPACE);
}

public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
    //System.out.println("In OutputStream,OutputFormat");
    this.format = format;
    this.writer = createWriter(out, format.getEncoding());
    this.autoFlush = true;
    namespaceStack.push(Namespace.NO_NAMESPACE);
}

/**
 * Get an OutputStreamWriter, use preferred encoding.
 */
protected Writer createWriter(OutputStream outStream, String encoding) throws UnsupportedEncodingException {
    return new BufferedWriter(
        new OutputStreamWriter(outStream, encoding)
    );
}
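The effect of passing the encoding name through to OutputStreamWriter can be tried in isolation. The sketch below uses only the JDK and mirrors the shape of createWriter; the class name `ExplicitEncodingWriterDemo` and the helper `encode` are hypothetical, and a ByteArrayOutputStream stands in for the FileOutputStream a real program would use.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitEncodingWriterDemo {

    // Same shape as the createWriter method shown above: the encoding
    // name decides how characters are turned into bytes.
    static Writer createWriter(OutputStream out, String encoding) throws IOException {
        return new BufferedWriter(new OutputStreamWriter(out, encoding));
    }

    // Writes the text through such a writer and returns the bytes produced.
    static byte[] encode(String text, String encoding) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = createWriter(buffer, encoding)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // "Li Si" in Chinese characters (assumed), as Unicode escapes.
        String text = "\u674e\u56db";
        byte[] bytes = encode(text, "UTF-8");

        System.out.println(bytes.length); // prints 6 (3 bytes per character)
        System.out.println(Arrays.equals(bytes, text.getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

Because the encoding is named explicitly, the bytes no longer depend on the platform default, which is the property the FileWriter-based example lacks.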

From the DOM4J source above we can see that DOM4J does no complicated encoding handling of its own; it relies entirely on the functionality of Java itself. So when constructing an XMLWriter with DOM4J, we should not hand it a Writer object directly; we should construct it from an OutputStream subclass instead. In other words, our code should not use a FileWriter object to write the XML document but a FileOutputStream object, so the code can be modified as follows:

public void createXML(String fileName) {