I recently started learning dom4j from tutorials I found on the Internet. Progress was quick, but I ran into a problem: I could not save XML files in UTF-8. When I read a saved file back in, the parser reported "Invalid byte 2 of 2-byte UTF-8 sequence." I inspected the file that dom4j had generated. In any editor that handles XML encoding properly, the Chinese text was garbled, yet Notepad displayed the Chinese correctly. This gave me quite a headache. When I tried GBK or GB2312 instead, the generated XML files could be parsed without trouble. I therefore suspected that dom4j did not handle UTF-8 encoding, and I started reading the dom4j source code. The problem I finally found was in my own program.

The code that creates a new XML document in dom4j's own examples, and in the popular online tutorial "DOM4J Use", looks similar to the following:

    import java.io.FileWriter;
    import org.dom4j.Document;
    import org.dom4j.Element;

    public void createXML(String fileName) {
        Document doc = org.dom4j.DocumentHelper.createDocument();
        Element root = doc.addElement("book");
        root.addAttribute("name", "My Book");
        Element childTmp;
        childTmp = root.addElement("price");
        childTmp.setText("21.22");
        Element writer = root.addElement("author");
        writer.setText("李四");
        writer.addAttribute("ID", "001");
        try {
            // The FileWriter, not dom4j, decides how characters become bytes here.
            org.dom4j.io.XMLWriter xmlWriter =
                    new org.dom4j.io.XMLWriter(new FileWriter(fileName));
            xmlWriter.write(doc);
            xmlWriter.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }

In the code above, the output target is a FileWriter object, and that is the cause of the problem. FileWriter is a subclass of Writer, and when it is constructed this way it gives no means of specifying an encoding. It converts characters to bytes itself, so dom4j never gets the chance to write the file in the declared encoding. The file is therefore saved in the system's default encoding.
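To make the FileWriter behaviour concrete, here is a minimal sketch that uses only the JDK (no dom4j). It writes a string with a plain FileWriter and compares the bytes on disk with the same string encoded in the platform default charset. The class name, method name and temp-file prefix are made up for illustration.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class FileWriterDefaultCharset {
    /** Writes text with a plain FileWriter and returns the bytes that ended up on disk. */
    static byte[] bytesFromFileWriter(String text) {
        try {
            Path tmp = Files.createTempFile("filewriter-demo", ".xml");
            // No charset is passed, so FileWriter uses the platform default.
            try (FileWriter out = new FileWriter(tmp.toFile())) {
                out.write(text);
            }
            byte[] bytes = Files.readAllBytes(tmp);
            Files.delete(tmp);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u674e\u56db is the author name used in the example above, written as escapes
        // so the source file itself does not depend on any particular encoding.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><author>\u674e\u56db</author>";
        byte[] onDisk = bytesFromFileWriter(xml);
        byte[] defaultEncoded = xml.getBytes(Charset.defaultCharset());
        System.out.println("platform default charset: " + Charset.defaultCharset());
        System.out.println("file bytes equal default-charset bytes: "
                + Arrays.equals(onDisk, defaultEncoded));
    }
}
```

Whatever the XML declaration inside the text claims, the bytes follow the platform default. On a system whose default is not UTF-8, the declaration and the actual bytes disagree.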
On the Chinese version of Windows, Java's default encoding is GBK. In other words, although the XML declaration says the file is UTF-8, the file is actually saved in GBK. That is why XML files generated with GBK or GB2312 encoding can be parsed correctly, while files generated as UTF-8 cannot be parsed by an XML parser. Now that we have found the cause, let's find a solution.
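The mismatch can be reproduced without Windows or dom4j. The following sketch, again JDK-only and with made-up names, encodes the same XML text once as UTF-8 and once as GBK, then checks whether each byte sequence is well-formed UTF-8. It assumes the running JDK ships the GBK charset and skips that case if it does not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DeclaredVsActualEncoding {
    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input instead of silently replacing it.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><author>\u674e\u56db</author>";
        System.out.println("saved as UTF-8, valid UTF-8? "
                + isValidUtf8(xml.getBytes(StandardCharsets.UTF_8)));
        if (Charset.isSupported("GBK")) {
            // Same text and same declaration, but GBK bytes, as on Chinese Windows.
            System.out.println("saved as GBK, valid UTF-8?   "
                    + isValidUtf8(xml.getBytes(Charset.forName("GBK"))));
        }
    }
}
```

A parser that trusts the UTF-8 declaration hits the same malformed bytes, which is where errors of the "Invalid byte ... UTF-8 sequence" kind come from.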
First, let's look at how dom4j implements its encoding handling:

    public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream");
        this.format = DEFAULT_FORMAT;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream, OutputFormat");
        this.format = format;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    /**
     * Get an OutputStreamWriter, use preferred encoding.
     */
    protected Writer createWriter(OutputStream outStream, String encoding) throws UnsupportedEncodingException {
        return new BufferedWriter(new OutputStreamWriter(outStream, encoding));
    }

As the code above shows, dom4j does nothing complicated to handle encoding; it relies entirely on Java's own facilities. Note that the encoding from the output format is applied only in the constructors that take an OutputStream. So when we construct an XMLWriter with dom4j, we should not hand it a Writer object directly. Instead, we should construct it from an object of an OutputStream subclass.
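As a rough illustration of why the OutputStream route works, the sketch below mirrors what createWriter does, using only the JDK. It wraps a FileOutputStream in an OutputStreamWriter with an explicit UTF-8 encoding, saves the XML, and parses it back with the JDK's built-in parser. The class name, method name and temp-file prefix are hypothetical, and the JDK parser stands in for dom4j's reader here.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class Utf8RoundTrip {
    /** Saves the XML text as UTF-8 through an OutputStream, parses it back,
     *  and returns the text content of the root element. */
    static String saveAndReparse(String xml) {
        try {
            File tmp = Files.createTempFile("utf8-demo", ".xml").toFile();
            // Same wrapping as createWriter: the encoding is chosen explicitly,
            // so the bytes on disk match the UTF-8 declaration.
            try (Writer out = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream(tmp), StandardCharsets.UTF_8))) {
                out.write(xml);
            }
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(tmp);
            tmp.delete();
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><author>\u674e\u56db</author>";
        String name = saveAndReparse(xml);
        System.out.println("round trip preserved the name: " + "\u674e\u56db".equals(name));
    }
}
```

With dom4j itself, the corresponding change to createXML is to pass `new FileOutputStream(fileName)` to one of the two XMLWriter constructors shown above instead of `new FileWriter(fileName)`. createWriter then applies format.getEncoding() to the stream.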