Author: rainy14f
Providing PDF file support for web pages

In this article, Nick Afshartous describes a way to convert HTML content to PDF. This is useful when, for example, a web application offers a "Download as PDF" function on its pages, which makes the content convenient to print and to store for later use. Afshartous's conversion method uses only open source components. Commercial products are also available, but the method described here has the advantage in price, and the source code of every component it uses can be obtained.

Rendering web content as PDF makes the content easier to distribute. Some applications must provide documents in a print-friendly format, such as descriptions of employee benefits. In fact, the law requires that Summary Plan Descriptions (SPDs) be available in print, even if they are provided online. Simply printing the web page is not enough, however, because the printed version must contain a table of contents and page numbers. To provide this, developers can convert the HTML content to PDF, and this article shows how.

The method introduced here uses only open source components. Some commercial products also support dynamic document generation; Adobe, for example, has a Document Server product line. However, the cost of commercial products is considerable. An open source solution reduces that cost and makes the component source code transparent.

The conversion process consists of three steps:
1. Convert HTML to XHTML.
2. Convert XHTML to XSL-FO (Extensible Stylesheet Language Formatting Objects), using an XSL style sheet and an XSLT transformer.
3. Pass the XSL-FO document to a formatter to generate the target PDF document.
This article first shows how to perform the conversion with the command line interface of each tool, then describes how to do the same job in Java using the DOM interfaces.

Component versions: the code in this article was tested with the following versions:

Component   Version
JDK         1.5_06
JTidy       r7-dev
Xalan-J     2.7
FOP         0.20.5
Using the command line interface

Each step in the conversion process generates an output file from an input file. This process can be represented by the following picture:

[Figure: each tool reads an input file and writes an output file]

Using the command line interfaces of these three tools is a good way to get started, although this approach is not suitable for production systems because it writes temporary intermediate files to disk. The additional I/O degrades performance. This issue is resolved later, when we call the three tools from Java.

Step 1: Convert HTML to XHTML

The first step is to convert the HTML file into a new XHTML file. Of course, if the file is already XHTML, this step is unnecessary. I use JTidy for this conversion. JTidy is a Java port of the Tidy HTML parser. During conversion, JTidy automatically adds missing tags to create a well-formed XML document. I use the latest version, r7-dev, from SourceForge. You can run JTidy with the following script:

    #!/bin/sh
    java -classpath lib/tidy.jar org.w3c.tidy.Tidy -asxml "$1" > "$2"

This script sets the classpath and calls JTidy. The input file is passed to JTidy as a command line argument. By default, the generated XHTML is written to standard output. The -modify switch can be used to overwrite the input file instead. The -asxml switch tells JTidy to produce its output as well-formed XML. The script is called like this:

    tidy.sh hello.html hello.xml

The contents of hello.html (input) and hello.xml (output) are as follows:
Hello World!

The closing </p> tag is added automatically by JTidy.

Step 2: Convert XHTML to XSL-FO

Next, the XHTML is converted to XSL-FO, a language for specifying the print format of an XML document. I perform this conversion by applying a style sheet with an XSLT transformer (Apache Xalan). The style sheet I use is xhtml2fo.xsl, provided by Antenna House, a company that sells a commercial XSL-FO formatter. The xhtml2fo.xsl style sheet specifies how to translate each HTML tag into a corresponding sequence of XSL-FO formatting commands. For example, the translation of the HTML h2 tag is defined as:
The second line in the template outputs the fo:block tag, and the h2 attribute set is emitted as the attributes and values of that block tag. Each XSL-FO block is a paragraph whose formatting is determined by the values of the block's attributes. The h2 attribute set is defined in the style sheet as:
The process-common-attributes-and-children template is specified in the style sheet. Its role is to check for common HTML attributes (such as lang, id, align, valign, and style) and generate the corresponding XSL-FO directives. To trigger translation of any tags nested inside the enclosing h2 tag, process-common-attributes-and-children makes the following call: Therefore, if the input is an h2 element containing the text "Hello there" with a nested em element
Then the
<?xml version="1.0" encoding="UTF-8"?>
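To make the h2 translation described above concrete, here is a minimal sketch that applies a tiny style sheet with the XSLT transformer bundled with the JDK. The style sheet is a hypothetical simplification written for this example, not Antenna House's xhtml2fo.xsl; the attribute values, the em template, the input fragment, and the class name H2ToBlockSketch are all assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class H2ToBlockSketch {
    // Hypothetical, heavily simplified style sheet (NOT xhtml2fo.xsl).
    // The attribute set named "h2" supplies the attributes of the fo:block.
    static final String STYLE_SHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:attribute-set name='h2'>"
      + "<xsl:attribute name='font-size'>1.5em</xsl:attribute>"
      + "<xsl:attribute name='font-weight'>bold</xsl:attribute>"
      + "</xsl:attribute-set>"
      + "<xsl:template match='h2'>"
      + "<fo:block xsl:use-attribute-sets='h2'><xsl:apply-templates/></fo:block>"
      + "</xsl:template>"
      + "<xsl:template match='em'>"
      + "<fo:inline font-style='italic'><xsl:apply-templates/></fo:inline>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the style sheet to an XML string and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLE_SHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only; chosen for this sketch.
        System.out.println(transform("<h2>Hello <em>there</em></h2>"));
    }
}
```

Running main prints a single fo:block element that carries the attributes from the h2 attribute set, with the nested em translated to fo:inline. The exact attribute order and namespace declarations depend on the transformer in use.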
    public static void main(String[] args) {
        // Check the arguments: the HTML file and the style sheet
        if (args.length != 2) {
            System.out.println("Usage: html2pdf htmlFile styleSheet");
            System.exit(1);
        }

        // Open the HTML input file
        FileInputStream input = null;
        String htmlFileName = args[0];
        try {
            input = new FileInputStream(htmlFileName);
        } catch (java.io.FileNotFoundException e) {
            System.out.println("File not found: " + htmlFileName);
            System.exit(1); // do not continue with a null input stream
        }

        // Parse the HTML into a DOM Document with JTidy
        Tidy tidy = new Tidy();
        Document xmlDoc = tidy.parseDOM(input, null);

JTidy's DOM implementation does not support XML namespaces. Therefore, we must modify the Antenna House style sheet so that it uses the default namespace. For example, the original is:
After modification, it becomes:
This change must be applied to all templates in xhtml2fo.xsl, because JTidy generates a Document object whose root is the html tag with no namespace.
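The effect of the namespace prefix can be illustrated with the JDK's built-in XSLT transformer. In this sketch, which assumes the input elements carry no namespace (as the article reports for JTidy's DOM), a template whose pattern uses an html: prefix never fires, while the unprefixed pattern does. The style sheets, the input, and the names NamespaceMatchSketch, styleSheet, and apply are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespaceMatchSketch {
    // Builds a one-template style sheet whose match pattern is supplied by the caller.
    static String styleSheet(String pattern) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
             + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
             + " xmlns:html='http://www.w3.org/1999/xhtml'"
             + " exclude-result-prefixes='html'>"
             + "<xsl:template match='" + pattern + "'>"
             + "<fo:block><xsl:apply-templates/></fo:block>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Applies the generated style sheet to the given XML and returns the output text.
    static String apply(String pattern, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(styleSheet(pattern))));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Input without any namespace declaration, standing in for JTidy's DOM.
        String xml = "<html><body><h2>Hi</h2></body></html>";
        // Prefixed pattern: the h2 element is in no namespace, so the template is skipped.
        System.out.println("html:h2 -> " + apply("html:h2", xml));
        // Unprefixed pattern: the template matches and emits an fo:block.
        System.out.println("h2      -> " + apply("h2", xml));
    }
}
```

With the prefixed pattern only the text content falls through via XSLT's built-in rules, so no formatting objects are produced. That is the kind of silent mismatch that editing the style sheet to drop the prefix avoids.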
The modified xhtml2fo.xsl is included in the source code supplied with this article. Next, the xml2FO() method calls Xalan to apply the style sheet to the DOM object generated by JTidy:
        Document foDoc = xml2FO(xmlDoc, args[1]);
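The article relies on a getTransformer() helper whose listing is not shown here. A minimal sketch of what such a helper might look like, using the standard JAXP TransformerFactory, follows. The body of getTransformer(), the temporary style sheet, and the demo() method are assumptions made for illustration. The demo also performs a DOM-to-DOM transformation, which is what lets the Java version avoid intermediate files.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class GetTransformerSketch {
    // Hypothetical one-template style sheet used only by demo().
    static final String STYLE =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='h2'><fo:block><xsl:apply-templates/></fo:block></xsl:template>"
      + "</xsl:stylesheet>";

    // Assumed shape of the helper: compile the style sheet file into a Transformer,
    // returning null on failure so the caller can report the error.
    static Transformer getTransformer(String styleSheet) {
        try {
            return TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File(styleSheet)));
        } catch (TransformerConfigurationException e) {
            return null;
        }
    }

    // Writes the style sheet to a temporary file, transforms a small DOM into
    // another DOM, and returns the name of the result's root element.
    static String demo() {
        try {
            File xsl = File.createTempFile("sketch", ".xsl");
            xsl.deleteOnExit();
            Files.write(xsl.toPath(), STYLE.getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document xml = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader("<h2>Hello</h2>")));

            Transformer transformer = getTransformer(xsl.getPath());
            DOMResult result = new DOMResult();
            transformer.transform(new DOMSource(xml), result);
            Document fo = (Document) result.getNode();
            return fo.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Root of the result document: " + demo());
    }
}
```

Because both the source and the result are DOM objects, nothing is serialized between the steps. In a long-running application it may be worth caching the compiled style sheet (for example as a javax.xml.transform.Templates object) rather than recompiling it for every request.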
The xml2FO() method first calls getTransformer() to obtain a Transformer object for the specified style sheet. It then returns the Document that represents the result of the transformation:
    private static Document xml2FO(Document xml, String styleSheet) {
        DOMSource xmlDomSource = new DOMSource(xml);
        DOMResult domResult = new DOMResult();

        // Compile the style sheet into a Transformer
        Transformer transformer = getTransformer(styleSheet);
        if (transformer == null) {
            System.out.println("Error creating transformer for " + styleSheet);
            System.exit(1);
        }

        // Transform DOM to DOM, with no intermediate file
        try {
            transformer.transform(xmlDomSource, domResult);
        } catch (javax.xml.transform.TransformerException e) {
            return null;
        }
        return (Document) domResult.getNode();
    }

Next, the main method opens a FileOutputStream whose file name has the same prefix as the HTML input file. The result of calling the fo2PDF() method is written to this OutputStream:
        String pdfFileName = htmlFileName.substring(0, htmlFileName.indexOf(".")) + ".pdf";
        try {
            OutputStream pdf = new FileOutputStream(new File(pdfFileName));
            pdf.write(fo2PDF(foDoc));
        } catch (java.io.FileNotFoundException e) {
            System.out.println("Error creating PDF: " + pdfFileName);
        } catch (java.io.IOException e) {
            System.out.println("Error writing PDF: " + pdfFileName);
        }
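The file name computation above uses indexOf("."), which finds the first dot in the whole string. That works for a name like hello.html but not for a path such as ./docs/hello.html, and it throws an exception when the name has no dot at all. One possible variation, shown here as a sketch and not as the article's code, looks for the last dot after the last path separator; the class and method names are hypothetical.

```java
class PdfFileNameSketch {
    // Replaces the extension of the file name (if any) with ".pdf".
    // Only a dot that appears after the last path separator counts as
    // the start of an extension.
    static String pdfFileNameFor(String htmlFileName) {
        int sep = Math.max(htmlFileName.lastIndexOf('/'), htmlFileName.lastIndexOf('\\'));
        int dot = htmlFileName.lastIndexOf('.');
        String base = (dot > sep) ? htmlFileName.substring(0, dot) : htmlFileName;
        return base + ".pdf";
    }

    public static void main(String[] args) {
        System.out.println(pdfFileNameFor("hello.html"));        // hello.pdf
        System.out.println(pdfFileNameFor("./docs/hello.html")); // ./docs/hello.pdf
        System.out.println(pdfFileNameFor("report.v2.html"));    // report.v2.pdf
        System.out.println(pdfFileNameFor("README"));            // README.pdf
    }
}
```

For simple names in the current directory the result is the same as with the original expression, so the change matters only when the tool is given paths or names containing extra dots.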
The fo2PDF() method uses the XSL-FO Document produced by the transformation to create a FOP Driver object. The PDF is generated by calling the driver's run() method, and the result is returned as a byte array:
    private static byte[] fo2PDF(Document foDocument) {
        DocumentInputSource fopInputSource = new DocumentInputSource(foDocument);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Logger log = new ConsoleLogger(ConsoleLogger.LEVEL_WARN);

            // Configure the FOP driver to render the XSL-FO document as PDF
            Driver driver = new Driver(fopInputSource, out);
            driver.setLogger(log);
            driver.setRenderer(Driver.RENDER_PDF);
            driver.run();

            return out.toByteArray();
        } catch (Exception ex) {
            return null;
        }
    }
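Because fo2PDF() returns null when rendering fails, and the main method passes that result straight to pdf.write(), a failed conversion would surface as a NullPointerException. A small guard like the following sketch avoids that and also closes the stream. It assumes Java 7 or later for try-with-resources (the article itself targets JDK 1.5), and the class name, method name, and messages are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class PdfWriteSketch {
    // Writes the bytes to the target file. Returns false if there is nothing
    // to write (conversion failed upstream) or if the write itself fails.
    static boolean writePdf(byte[] pdfBytes, File target) {
        if (pdfBytes == null) {
            System.out.println("Conversion failed; no PDF written: " + target);
            return false;
        }
        try (OutputStream pdf = new FileOutputStream(target)) {
            pdf.write(pdfBytes);
            return true;
        } catch (IOException e) {
            System.out.println("Error writing PDF: " + target);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("sketch", ".pdf");
        target.deleteOnExit();
        // Placeholder bytes standing in for the output of fo2PDF().
        byte[] placeholder = "%PDF".getBytes(StandardCharsets.US_ASCII);
        System.out.println("written: " + writePdf(placeholder, target));
        System.out.println("written: " + writePdf(null, target));
    }
}
```

In the article's main method, the call would take the place of the direct pdf.write(fo2PDF(foDoc)) statement, with the byte array returned by fo2PDF() passed as the first argument.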