David Mertz, Doctoral Protection Expert, Gnosis Software, Inc. October 2000
Contents: What is DocBook? | Semantic flexibility | On your mark, get set, tag! | Included content | Further study | Resources | About the author
In the third installment of his new "XML Matters" column, David Mertz introduces DocBook, an SGML/XML dialect for describing technical articles and other documents. David discusses the benefits of using DocBook, then describes how to plan and modularize large document conversion projects.
Imagine a historian a hundred years from now who finds a library of electronic documents and must decode them. A century of technology changes is going to give her quite a problem. But it need not be this way!
This column addresses a very practical concern of mine. Over the years I have written many academic papers in the humanities, and I have hoped to put them on my website. Unfortunately, I have changed word processors and platforms many times over those years, and many of the saved documents were written with programs that I no longer have or cannot run. Even where I can obtain those programs, they may not run on current computers. For the programs I can run, I can sometimes find conversion programs that do an adequate job. In other cases, I have to work with the raw word processor files, which are mostly ASCII but contain many formatting errors.
In short, my electronic archive is a mess. Many individuals and organizations suffer from archives in even worse shape. With each software upgrade, large organizations lose many important documents as they adapt to changing technology, and the problem compounds over time.
Fortunately, we can create documents that stand the test of time better than our existing ones. XML/SGML in general, and DocBook in particular, go a long way toward creating flexible and durable documents.
What is DocBook?

DocBook is an SGML dialect developed by O'Reilly and HaL Computer Systems in 1991. It is now maintained by the Organization for the Advancement of Structured Information Standards (OASIS). DocBook describes the content of articles, books, technical manuals, and other documents. Although DocBook focuses on technical writing, it is generally adequate for describing most ordinary writing. In this article, I discuss the XML variant of the DocBook DTD.

The most fundamental requirement for a document that stands the test of time is a document format based on open standards, such as XML/SGML. Such an open standard has two elements:
  - The syntax, or what the document looks like
  - The semantics, or what the document means

The syntax of a DocBook document is fully specified by the simple XML tagging rules together with the DocBook DTD referenced in each DocBook document. The semantics are subtler. The DTD itself encodes certain semantic features, for example by determining which elements may or must appear inside other elements. DocBook tags are also named so that they carry a "common-sense" meaning in a general way. However, finer-grained semantic questions depend on specific publication guidelines, general usage conventions, and editorial judgment (for example, which kind of list suits a particular set of items). The DocBook manual listed in the references provides some guidance on conventional usage, but each publication will have more specific guidelines of its own.
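The difference between plain well-formedness and the containment rules a DTD adds can be sketched in a few lines of Python. The ALLOWED_CHILDREN table below is a hypothetical, drastically simplified stand-in for a handful of content models; it is not the real DocBook DTD, and a real project would use a validating parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified containment rules (NOT the real DocBook DTD).
ALLOWED_CHILDREN = {
    "book": {"bookinfo", "chapter", "appendix", "bibliography"},
    "chapter": {"title", "epigraph", "para", "sect1"},
    "epigraph": {"attribution", "para"},
    "sect1": {"title", "para"},
}

def containment_errors(xml_text):
    """Return (parent, child) pairs that break the toy rules above.

    ET.fromstring raises ParseError if the text is not even well-formed,
    which is the purely syntactic half of the check.
    """
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # no rule recorded for this element in the toy table
        for child in parent:
            if child.tag not in allowed:
                errors.append((parent.tag, child.tag))
    return errors

good = "<chapter><title>T</title><para>Text</para></chapter>"
bad = "<chapter><title>T</title><chapter/></chapter>"
print(containment_errors(good))
print(containment_errors(bad))
```

Both sample strings are well-formed XML; only the second breaks a containment rule. Questions of editorial judgment, such as which list type to choose, stay outside anything a check like this can decide.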
The second key issue is more theoretical, but it still matters in practice: how easy is a document format to interpret and use in the absence of its formal specification? An old binary stream format is hard to make sense of in a text viewer. XML documents, by contrast, usually look fairly reasonable even without formal validation and processing. Of course, plain ASCII is easier still to read.
Moreover, even without a formal specification, some formats are more easily reconstructed than others. Imagine our historian finds two documents: one in MS Word 97 format, accompanied by an MSDN CD containing the file format specification, and one in XML format (that is, a document whose DTD is missing). Clearly, the historian will have an easier time reconstructing the contents of the XML document. In fact, no vendor, not even Microsoft, does a very good job of writing Word 97 converters, even with the format specification in hand. On this point, think about the work you will have to do to reconstruct your own documents after your employer "upgrades" to MS Office 2005.

With these problems of portability and technological change in mind, I have begun converting my earlier academic work to DocBook format. I believe this project will help preserve the work and make it usable in current and future document formats (by way of conversion).
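To make the historian's situation concrete, here is a minimal Python sketch, assuming only that the found document is well-formed XML. It recovers an outline and a tag census without any DTD. The sample document and the outline function are illustrative inventions, not part of the dissertation project.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def outline(xml_text):
    """Return (tag counts, indented tag outline) for a well-formed document."""
    root = ET.fromstring(xml_text)
    counts = Counter(element.tag for element in root.iter())
    lines = []

    def walk(element, depth):
        # Record each element name, indented by its nesting depth.
        lines.append("  " * depth + element.tag)
        for child in element:
            walk(child, depth + 1)

    walk(root, 0)
    return counts, lines

# Hypothetical "found" document with no DTD attached.
sample = """<article>
  <title>A found document</title>
  <sect1><title>First</title><para>Some text.</para></sect1>
  <sect1><title>Second</title><para>More text.</para></sect1>
</article>"""

counts, lines = outline(sample)
print("\n".join(lines))
print(dict(counts))
```

The element names alone already suggest what each part of the text is for, which is much more than a binary word processor file offers to a text viewer.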
Semantic flexibility

The thing to remember is that DocBook markup describes the semantics of a document rather than its typography or appearance. In this focus on semantics it differs from word processors, HTML, and even TeX. Word processors typically let you mark up text with style sheets, such as a "Header, Level 2" style, but their emphasis is on striving toward "WYSIWYG" ("what you see is what you get"). Even where style sheets are used, they are rarely consistent from one document to the next. The WYSIWYG approach makes a large number of assumptions about matters such as page size and layout, available fonts, and the typographic styling of elements. These assumptions have nothing to do with most of the conceptual meaning of the text. Nearly every such assumption makes the document harder to adapt to other formats, whether a different print layout, a screen display, a speech synthesizer rendering, or indexing by a web robot. HTML started out similar to DocBook (although simpler), but it has added more and more typographic tags, so it is now a mixture of semantic and typographic markup.
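The payoff of semantic markup is that one source can feed several presentations. The Python sketch below renders the same small fragment as plain text and as HTML. The two style tables are hypothetical choices made for illustration only; this is not how any particular DocBook toolchain works.

```python
import xml.etree.ElementTree as ET
from html import escape

# A small semantic fragment; nothing in it says how a title should look.
SOURCE = "<para>See <citetitle>American Dilemma</citetitle> (1944) for more.</para>"

# Hypothetical presentation tables: tag -> (text before, text after).
PLAIN_STYLE = {"citetitle": ("_", "_")}
HTML_STYLE = {"para": ("<p>", "</p>"), "citetitle": ("<cite>", "</cite>")}

def render(element, style, text_filter=str):
    """Render an element and its children using one presentation table."""
    before, after = style.get(element.tag, ("", ""))
    parts = [before, text_filter(element.text or "")]
    for child in element:
        parts.append(render(child, style, text_filter))
        # Text that follows a child element belongs to the parent's content.
        parts.append(text_filter(child.tail or ""))
    parts.append(after)
    return "".join(parts)

para = ET.fromstring(SOURCE)
plain_text = render(para, PLAIN_STYLE)
html_text = render(para, HTML_STYLE, escape)
print(plain_text)
print(html_text)
```

Adding a third target, say a speech synthesizer, would mean adding another table, with no change to the source fragment.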
On your mark, get set, tag!

My first project, converting my doctoral dissertation to DocBook, is a large one, but I will approach it incrementally. Apart from its sheer length, this particular document has features that challenge some document systems, including:

  - Names that require Latin diacritics (from a non-Western-European character set)
  - Footnotes and cross-references to page numbers
  - Multi-level headings
  - A bibliography
  - Appendices
  - Layout requirements (the result must try to approximate the original typography)

All in all, the document I wrote provides a substantial test of many DocBook tags. The dissertation also exists in its original WordPerfect 7 format and in two differently formatted PDF versions, but neither of those is designed to be portable or flexible. Using DocBook will be an improvement on both counts. For now, I discuss only the markup, not the processing into target formats. To begin, let's set up the overall document:

Mertz dissertation XML document
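<?xml version="1.0"?>
<!DOCTYPE book SYSTEM
    "http://gnosis.cx/download/docbook/4.12/docbookx.dtd" [
  <!-- ENTITY declarations for the included files and for names
       such as zizek belong here -->
]>
<book>
  &bookinfo;
  &chap1;
  &chap2;
  &chap3;
  &chap4;
  &chap5;
  &chap6;
  &chap7;
  &chap8;
  &appendix1;
  &appendix2;
  &biblio;
</book>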
As you can see, the first step is mostly planning. Creating the content of the component-level elements, such as the chapters, is the real work. However, by creating entity references to these component-level elements, I break the process into easily managed parts. I also make it simpler to publish or export individual chapters as separate documents.

In this first step I specify that the type of document being created is a book, so the <book> element contains a series of component-level entity references to external files. Some of the entities defined at the top level are not used directly in the main document but only in the included files. For example, the entity &abstract; is used only in the bookinfo.sgm file. The same is true for chapter 5. How finely to divide the content is a judgment call, but my criterion is that anything I might publish as a separate document should be a separate file. I may make other adjustments as I extend this DocBook project.

At the top level I also define entities for names mentioned in the document that cannot be written in US-ASCII. I cannot type the diacritics directly, and typing an entity such as &zizek; instead is close enough for my actual needs. You can use entities in the same way to abbreviate full phrases.

Included content

Each file included by the main document consists of a single root tag and its contents. Do not include a document type declaration or processing instruction in these files. The document type is already declared in the central book-level document, so it is specified in just one place. For example, the bookinfo.sgm file contains only the following:
Included XML/SGML subdocument

<bookinfo>
  <subtitle>Non-representational terrorism</subtitle>
  &abstract;
</bookinfo>
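The text entities described earlier for names with diacritics can be tried out with Python's standard library parser. This is a minimal sketch under assumptions: the replacement text given for the zizek entity (built from numeric character references) is my guess at a workable definition rather than the declaration used in the dissertation, and the sample sentence is invented.

```python
import xml.etree.ElementTree as ET

# The source stays pure ASCII; the entity carries the accented letters.
# Assumed definition: &#381; is Z with caron, &#382; is z with caron.
DOC = """<?xml version="1.0"?>
<!DOCTYPE para [
  <!ENTITY zizek "&#381;i&#382;ek">
]>
<para>A reading of &zizek; typed in plain ASCII.</para>"""

para = ET.fromstring(DOC)
expanded = para.text  # the parser expands the internal entity while reading
print(DOC.isascii())
print(expanded)
```

Entities declared in the internal subset like this are expanded by the parser itself. The external file entities such as &abstract; are a different case, and ElementTree does not fetch those on its own.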
Each chapter file, in turn, begins with the <chapter> tag.

<?xml version="1.0"?>
<!DOCTYPE chapter SYSTEM
    "file://g:/articles/scratch/docbook/4.12/docbookx.dtd" [
  <!-- ENTITY declarations for chap5_1, chap5_2, and chap5_3 belong here -->
]>
<chapter>
  <epigraph>
    <attribution><citetitle>American Dilemma</citetitle> (1944)</attribution>
    <para>... sort that no living man can yet detect, because of the fog
    within which our type of Western culture envelops us. Cultural
    influences have set up the assumptions about the mind, the body, and
    the universe with which we begin; pose the questions we ask; influence
    the facts we seek; determine the interpretations we give these facts;
    and direct our reaction to these interpretations and
    conclusions.</para>
  </epigraph>
  &chap5_1;
  &chap5_2;
  &chap5_3;
</chapter>

This large chapter is divided into three sections, each in its own file with a top-level <sect1> as its root. I can choose to process the same section files through either the book-level or the chapter-level wrapper. I have also published the second section as a separate article, with a wrapper structured just like the chapter's.
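Because the book-level and chapter-level wrappers follow the same pattern, a wrapper can be generated from a list of component names. The Python sketch below is one possible way to do that, not the procedure used for the dissertation. The master_document helper is hypothetical, and so is the convention that every component lives in a file named after its entity plus ".sgm" (only bookinfo.sgm is attested above).

```python
def master_document(root, dtd_uri, components, suffix=".sgm"):
    """Build a wrapper document that pulls in one external file per component.

    Assumption: each component NAME is stored in the file NAME + suffix.
    """
    lines = [
        '<?xml version="1.0"?>',
        f'<!DOCTYPE {root} SYSTEM "{dtd_uri}" [',
    ]
    # One external entity declaration per included file.
    for name in components:
        lines.append(f'  <!ENTITY {name} SYSTEM "{name}{suffix}">')
    lines.append("]>")
    lines.append(f"<{root}>")
    # One entity reference per component, in reading order.
    lines.extend(f"  &{name};" for name in components)
    lines.append(f"</{root}>")
    return "\n".join(lines) + "\n"

components = (
    ["bookinfo"]
    + [f"chap{n}" for n in range(1, 9)]
    + ["appendix1", "appendix2", "biblio"]
)
book = master_document(
    "book",
    "http://gnosis.cx/download/docbook/4.12/docbookx.dtd",
    components,
)
print(book)
```

Calling the same helper with "chapter" as the root and the three chap5 section names yields the skeleton of a chapter-level wrapper, without the epigraph.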
Further study

This column has provided only a general introduction to DocBook. Subsequent columns will introduce DocBook tags in more detail and describe how they are structured. I will also discuss how to convert DocBook documents into formats better suited to direct reading, how to validate them, and how to process them in other ways. Stay tuned. In the meantime, it is best to start small.
A note on DocBook reference material

DocBook has a great many tags, probably more than anyone can keep in mind. For this reason, it is worth keeping some reference material at hand when working with DocBook, even if you edit with a specialized tool. Once you have an initial feel for the kinds of tags to look for, working out how they fit together becomes easier.
Resources

  - Be sure to look at the first two installments of the "XML Matters" column. XML Matters #1 introduced the xml_pickle object; XML Matters #2 describes how to use xml_objectify.
  - The best place to learn more about DocBook is DocBook: The Definitive Guide, by Norman Walsh and Leonard Muellner (O'Reilly, Cambridge, MA, 1999). An online version of the book is available.
  - OASIS, the Organization for the Advancement of Structured Information Standards, is a non-profit international consortium that creates interoperable industry specifications based on public standards such as XML and SGML. Its mission is to promote these standards, and the OASIS site provides additional information about the organization and the standards.
  - In some respects, a format even more portable and time-tested than DocBook is plain ASCII, or "smart ASCII", which incorporates the simple style annotations that developed on Usenet. Of course, ASCII cannot capture all the semantic structure of DocBook, but much of the time you do not need it. Project Gutenberg is an example of an effort to preserve and use texts in this neutral way.
  - TeX is an important tool whose goals overlap with those of DocBook. TeX concentrates more closely on typography, but it also has many semantic markup elements, particularly for mathematics.
  - My own articles, including the draft of this one, are initially written in a similar "smart ASCII" format. The txt2html tool adds the markup automatically. See the ASCII or text version of this article.
  - My dissertation, an obscure philosophical work, may hold little attraction for most XML developers, but the actual formats used are of some interest. The document was originally written in WordPerfect 7, with some parts brought in from another word processor format. I tried to use style sheets for some important elements to make global changes easier. In an attempt at web publication, I exported the document to PDF, with styling closer to a printed magazine or journal article than to the submitted version. PDF is not a bad format, but it does not separate content from layout. See my original PDF version, or the dissertation in WordPerfect 7 format.
  - The files used in this article can be found in the XML Matters #3 archive.
  - "A Gentle Guide to DocBook", also on developerWorks, introduces DocBook and describes how to create a simple document with it.

About the author

David Mertz can be reached at mertz@gnosis.cx; his life is described in detail at http://gnosis.cx/publish/. Comments and suggestions on past columns, this one, or future ones are very welcome.