Complex Text Printing under Linux
Chen Pu, Wu Jian
Institute of Software, Chinese Academy of Sciences, Beijing 100080, China
Email: chenpu@sict.ac.cn
Abstract
China has 56 ethnic groups, and most of the minority languages are written in complex text. The printing system in the Linux environment is still imperfect, and many problems remain with complex text. This article first illustrates the difficulties of complex text printing, then analyzes the current print implementations in popular Linux environments, focusing on printing under KDE, and finally presents the problems that must be solved to print complex text in an existing Linux desktop environment.
1 Introduction
China has 56 ethnic groups, but for a long time the research, development, and industrialization of minority-language information processing have lagged behind Chinese-language information technology because of factors in "planning, organization, funding, talent, and technology". This has become not only a technical bottleneck for China's information technology but also an unfavorable factor in improving China's comprehensive national strength. The National 863 Program major project "Ethnic Language Version Linux Operating System and Office Suite R&D" (project number 2003AA1Z2110) aims to develop a complete minority-language Linux operating system and office suite, focusing on the key technologies of text input, display, and printing for minority languages.
Localization of a Linux system covers three aspects: input, display, and printing. Input and display can be implemented according to existing international standards and conventions, such as the C locale, the X locale, and gettext. Printing, however, is not covered by international standards, and printing complex text requires different principles and implementation methods.
This paper analyzes the difficulties of implementing complex text printing in the Linux environment and proposes improvements for them. Several desktop environments are currently popular on Linux, such as KDE and GNOME; their print implementations differ and use different libraries.
2. Difficulties in complex text printing
There are currently many printing systems on UNIX, and nearly all of them output PostScript files, because a GPL-licensed PostScript interpreter exists: Ghostscript. Ghostscript is highly portable and supports almost every device on the current market. Its output quality is as good as that of a high-end PostScript printer, and it even fully supports color printing.
Using PostScript as the output document format in the Linux environment is therefore natural. Making PostScript documents support complex text printing, however, is not easy, for two reasons.
First, the character encoding that PostScript uses was designed mainly for single-byte characters; the character sets of complex text were not considered in its design. Because PostScript is a drawing language, complex text can of course be handled as graphics, but then some of the text-printing features that PostScript supports directly cannot be used, and the PostScript file also grows in size.
Second, keeping these characters as text requires a corresponding PostScript font. Printers generally have some fonts installed. To print minority complex scripts such as Wenwang and Tibetan correctly, Ghostscript must add the glyph information for these scripts to the original PostScript file, and this information comes from a PostScript font. At present, however, there are no PostScript fonts for these minority languages; what is commonly available is only TrueType or OpenType fonts.
The following sections first introduce PostScript, then analyze the print implementation process in the KDE environment, and finally discuss how to make KDE (that is, the Qt library) support printing of complex text.
Standard data - PostScript data stream
PostScript is a device-independent printer language: when defining an image, it need not consider the characteristics of the output device (such as the printer's resolution or paper size). It also processes text and graphics in the same way, which brings great flexibility to font handling. Because PostScript is device independent, device-specific features are described in a PostScript Printer Description (PPD) file. The PPD file mainly provides the following printer-specific information: the default and highest resolution, whether halftone screening is supported, user-defined screening information, page size definitions, the printable area of the page, the default font (usually Courier), whether double-sided printing is supported, and so on. Because PostScript is very complex, an ordinary print controller can hardly handle it; the conversion is usually done by the Raster Image Processor (RIP) in the printer.
A PostScript data stream is a data stream that conforms to the PostScript language specification. The PostScript language has three levels: PostScript Level 1, PostScript Level 2, and PostScript Level 3. Printing complex text requires that the PostScript files use a unified PostScript font name.
Linux printing mechanism model
There is no unified printing mechanism in the current Linux environment. The file formats most commonly printed are plain text files and PostScript files, and a plain text file is usually converted to PostScript before printing.
Existing PostScript generation is mostly done by the development library or by the application itself, for example GNOME (libgnomeprint), Qt, and OpenOffice.org. The popular Linux desktop environments GNOME and KDE each provide their own print implementation, and print jobs are managed by CUPS (Common UNIX Printing System). In general, getting the text on screen onto paper takes two steps: 1. the application generates a PostScript file; 2. the file is sent to the printer (directly or through Ghostscript). Figure 1 shows the print implementation process of GNOME and KDE:
Figure 1: Linux printing model
CUPS uses the Internet Printing Protocol (IPP) as the basis for managing printing. In addition, newer versions of CUPS add network printer browsing and options based on PostScript Printer Description (PPD) files.
Not every printer can print PostScript files directly; only some laser printers support PostScript. CUPS therefore also uses a customized GNU Ghostscript and an image file RIP to support non-PostScript printers.
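As a rough illustration of the second step for a non-PostScript printer, the sketch below builds a Ghostscript command line that rasterizes a PostScript file into a printer-specific format. It is not how CUPS invokes Ghostscript internally; the function name, the file names, and the choice of the ljet4 device and 300 dpi are assumptions made only for this example.

```python
def ghostscript_command(ps_path, out_path, device="ljet4", dpi=300):
    """Build a Ghostscript command that converts a PostScript file
    into the raster format of a non-PostScript printer.

    The default device and resolution are illustrative assumptions;
    the right values depend on the actual printer.
    """
    return [
        "gs",
        "-dSAFER",                    # restrict file access from the PostScript program
        "-dBATCH",                    # exit after the last input file
        "-dNOPAUSE",                  # do not wait for a key press after each page
        f"-sDEVICE={device}",         # output device (printer driver) to use
        f"-r{dpi}",                   # resolution in dots per inch
        f"-sOutputFile={out_path}",   # where the converted data is written
        ps_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(ghostscript_command("document.ps", "document.pcl")))
```

The resulting list can be passed to subprocess.run on a system where Ghostscript is installed.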
Printing of complex text
Using the PostScript language and its interpreter (Ghostscript) in the KDE environment on Linux requires solving three problems:
1. Define PostScript rules
The operations for showing characters in the PostScript language are:
1) Select the font and the font size.
2) Move the pen (the current point) to the starting position of the text.
3) Write the characters.
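The three operations above can be sketched as a small Python function that writes a one-page PostScript program using the standard findfont, scalefont, setfont, moveto, and show operators. The function names, the Courier font, and the coordinates are assumptions for illustration; the sketch handles only single-byte Western text, which is exactly the case PostScript was designed for.

```python
def ps_escape(text):
    """Escape the characters that are special inside a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def show_text_page(text, font="Courier", size=12, x=72, y=720):
    """Return a minimal PostScript program that shows one line of text."""
    lines = [
        "%!PS",
        f"/{font} findfont {size} scalefont setfont",  # 1) select font and size
        f"{x} {y} moveto",                             # 2) move to the start position
        f"({ps_escape(text)}) show",                   # 3) write the characters
        "showpage",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(show_text_page("Hello (PostScript)"))
```

For a complex script, the font selected in step 1 would have to be a PostScript font that actually contains the glyphs of that script.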
To print complex text in the same way, a PostScript font must be specified for the complex script. If these fonts are defined correctly, the text prints correctly.
2. Make Ghostscript interpret the PostScript files
A PostScript font must be defined and interpreted correctly; this is usually provided and done by the PostScript interpreter software.
3. Make applications and conversion programs output conforming PostScript files
Once the two problems above are solved, a complete complex text printing system can be built. What remains is that applications, and the conversion programs that turn application output into PostScript, must output PostScript files that follow the rules of item 1.
To make applications and converters output conforming PostScript files, the following issues need attention:
1) Segmentation of complex text and Western text
In a string that mixes complex text and Western text, the two kinds of text are written with different PostScript fonts, so the program that writes the PostScript file must pay attention to where the script changes. For a run of complex text, first set the current font to the font of that complex script, then write the run; for a run of Western text, first set the current font to the Western PostScript font. If the segmentation is wrong, the interpreter raises an encoding range error while interpreting the PostScript file and cannot continue.
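The segmentation step can be sketched as follows. This is a minimal example, not the method used by Qt or any existing converter: it assumes that "complex text" means characters in the Unicode Tibetan block (U+0F00 to U+0FFF) and treats everything else as Western. The function names and the run labels are hypothetical.

```python
def is_tibetan(ch):
    """Assumption for this sketch: complex text is the Unicode Tibetan block."""
    return 0x0F00 <= ord(ch) <= 0x0FFF


def split_runs(text, is_complex=is_tibetan):
    """Split text into consecutive runs labelled 'complex' or 'western'.

    Each run would later be written after selecting the matching
    PostScript font, so that no string is shown with the wrong font.
    """
    runs = []
    for ch in text:
        kind = "complex" if is_complex(ch) else "western"
        if runs and runs[-1][0] == kind:
            # Same script as the previous character: extend the current run.
            runs[-1] = (kind, runs[-1][1] + ch)
        else:
            # Script changed: start a new run.
            runs.append((kind, ch))
    return runs


if __name__ == "__main__":
    for kind, run in split_runs("abc\u0f40\u0f41 de"):
        print(kind, repr(run))
```

Passing a different is_complex predicate adapts the same loop to another script.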
2) Size matching of the fonts
Runs that are written separately must use matching font sizes. The character size of the complex text can be determined from the size of the Western characters. Generally, the top of complex text is slightly higher than that of Western text, and its bottom is slightly lower. If a fixed-width Western font is selected and the columns must align vertically, the Western characters are generally scaled in width to fit the width of the complex text, and their height is enlarged. Fixed-width Western fonts are generally on the narrow side; this adjustment can be made in the PostScript language.
3) Line folding
Sometimes a string of complex and Western text is too long, and writing it as a whole would run past the right margin. In that case the string should be subdivided so that the first part is written on the current line without exceeding the margin, and the rest is written on one or more lines below. Some punctuation marks should not appear at the beginning or end of a line; the fold should then be moved a few characters earlier or later, and the whole line adjusted.
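A simplified sketch of this folding rule is shown below. It assumes a fixed-width layout in which the line width is counted in characters, and it only moves the fold earlier, never later. The two punctuation sets are illustrative assumptions; a real implementation would use the rules of the script being printed and the measured glyph widths.

```python
NO_LINE_START = set(",.;:!?)]}")  # marks that should not begin a line (assumed set)
NO_LINE_END = set("([{")          # marks that should not end a line (assumed set)


def fold(text, width):
    """Fold text into lines of at most `width` characters.

    When a fold would put forbidden punctuation at the start or end of a
    line, the fold is moved earlier, one character at a time.
    """
    lines = []
    rest = text
    while len(rest) > width:
        cut = width
        # rest[cut] would start the next line; rest[cut - 1] would end this one.
        while cut > 1 and (rest[cut] in NO_LINE_START or rest[cut - 1] in NO_LINE_END):
            cut -= 1
        lines.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        lines.append(rest)
    return lines


if __name__ == "__main__":
    for line in fold("abcd,efgh", 4):
        print(line)
```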
4) Line spacing
If the line spacing that complex text needs is slightly larger than that of Western text, the line spacing should be adjusted accordingly.
Qt handles the printing of text and graphics together and generates a PostScript print file through the QPrinter class. Qt extracts all the glyphs that are used from the TrueType font file and embeds them in this file; the Unicode code of each character is then mapped to its glyph, and the output position of each character is defined in the PostScript language. For example:
19 Y <00010002000400040007> [7 0 7 0 7 0 7 0 7 0 7 0 7 0 7 0 7 0 0
This code places 7 characters at vertical position 19. The glyph information of the seven characters comes from the TrueType font file and is written as PostScript statements, such as:
/unia8b1 {{100 1000 0 32 -10 927 769 _sc
254 715 _M
274 699 294 682 315 664 _C
325 655 336 651 348 652 _C
355 652 360 658 364 668 _C
366 678 364 694 356 716 _C
...
Here /unia8b1 is a glyph name built from the Unicode code of a Tibetan character, and what follows it is the glyph outline written in PostScript syntax.
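The relation between such a glyph name and its character code can be sketched in Python. The sketch assumes the name follows the common "uni" plus four hexadecimal digits pattern seen in the fragment above; the function name is hypothetical, and other glyph-name forms are simply rejected.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def glyph_name_to_codepoint(name):
    """Return the code point encoded in a glyph name such as '/unia8b1'.

    Only the 'uni' + four hex digits form is handled (an assumption of
    this sketch); anything else raises ValueError.
    """
    name = name.lstrip("/")  # drop the PostScript literal-name slash
    digits = name[3:]
    if (
        len(name) == 7
        and name[:3].lower() == "uni"
        and all(c in HEX_DIGITS for c in digits)
    ):
        return int(digits, 16)
    raise ValueError(f"not a uniXXXX glyph name: {name!r}")


if __name__ == "__main__":
    print(hex(glyph_name_to_codepoint("/unia8b1")))
```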
Conclusion
For a PostScript file generated by QPrinter, if the glyphs are not embedded, the file must name the font it uses, and that PostScript font must exist in the PostScript interpreter's system. If the named font is not found, the PostScript interpreter substitutes its default font, which generally cannot render the glyphs of complex scripts, so the printed output differs from the text displayed on the screen. Because the PostScript created by the system has to name its fonts, those font names must be known to the PostScript interpreter. Specifically, the following need to be unified:
1) the content format of PostScript files;
2) the PostScript generation tool;
3) the font installation and font path;
4) the font names inside the font files, so that the font family name, the full font name, and the PostScript name are all the same.
References
[1] The Kurt Pfeifle. Version 1.00. Kurt Pfeifle, Danka Deutschland GmbH.
[2] PostScript® Language Reference, Third Edition. Adobe Systems Incorporated, first printing February 1999.
[3] Yu Zhanyi, Chen Xiangyang, Fang Han. Linux program design authority guide. Mechanical Industry Press, 2001.
[4] Supporting Downloadable PostScript Language Fonts, Technical Note #5040. Adobe Systems Incorporated.