1.0 Introduction

This paper explores the relationship between XML and databases and lists some software that can use a database to process XML documents. Although it cannot introduce or evaluate that software in any depth, I hope it describes the main issues in using XML documents with a database. It is somewhat biased toward relational databases, because that is where my experience lies.

2.0 Why Use a Database?

The first question to ask yourself when you start thinking about XML and databases is why you want to use a database at all. Do you have data to display? Do you need a place to store your web pages? Is the database used by an e-commerce application in which XML is the data transfer format? The answers to these questions directly affect your choice of database and middleware (if any).

For example, suppose you use XML as the data transfer format in an e-commerce application. The data you transfer will mostly have a highly regular structure, and the details of the XML encoding itself are not important to you: you are interested in the data, not in how it is physically stored in a document. If your application is simple, a relational database and data transfer middleware will meet your needs; if it is large and complex, you need a development environment with full XML support.

On the other hand, suppose you want to build a website from a set of XML documents. You need not only to manage the site but also to give users a way to search its content. In this case your documents will be highly irregular, and entity usage matters to you, because the structure of these documents is fundamental to how the website works.
In this case, you need a "native XML" database rather than an ordinary relational database: one that preserves document structure and entity usage and supports an XML query language (e.g., XQL).

3.0 Data versus Documents

In most cases, the most important factor in choosing a database is probably whether you are using it to store data or to store documents. If you want to store data, you need a database designed primarily for data storage, such as a relational or object-oriented database, together with middleware that transfers data between the database and XML documents. If you want to store documents, you need a content management system designed specifically for that purpose. Although documents can be stored in a relational or object-oriented database, you will find that you end up repeating work that a content management system has already done. Simply put, although a content management system is typically built on top of an object-oriented or relational database, treating a bare database as if it were a content management system tends to fail.

Whether you need to store data or documents usually depends on your XML documents, because XML documents fall into two categories: data-based and document-based.

3.1 Data-Based Documents

Data-based documents are characterized by a fairly regular structure, fine-grained data (that is, the smallest independent unit of data is a PCDATA-only element or an attribute), and little or no mixed content. The order in which sibling elements and PCDATA occur is not significant. Examples are sales orders, flight schedules, restaurant menus, and so on. Data-based documents are usually designed for machine consumption, and the fact that XML is used is incidental: it is simply a means of transferring data.
For example, the following sales order is a data-based document:

   ...
         <Description>
            <P><B>Turkey wrench:</B> Stainless steel, one-piece construction, lifetime guarantee.</P>
         </Description>
         <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
   </Line>
   ...

Documents that read like prose can also be data-based. For example, a lease such as the following (its beginning is elided here):

   ... <LeaseTerm>18</LeaseTerm> at a cost of <Price Currency="USD" TimeUnit="Months">1000</Price>.</Lease>

can be built from this XML document and a simple template file:

   <Lease>
      <Lessee>ABC Industries</Lessee>
      <Address>123 Main St., Chicago, IL</Address>
      <Lessor>XYZ Property</Lessor>
      <LeaseTerm>18</LeaseTerm>
      <Price>1000</Price>
   </Lease>

3.2 Document-Based Documents

Document-based documents are characterized by an irregular structure, coarse-grained data (that is, the smallest independent unit of data is an element with mixed content or the document itself), and a large amount of mixed content. The order in which sibling elements and PCDATA occur is almost always significant. Examples are books, email, advertisements, and almost any XHTML document.
Document-based documents are usually designed for human consumption. For example, the following product description is a document-based document:

   <Name>Turkey Wrench</Name>
   <Developer>Full Fabrication Labs, Inc.</Developer>
   <Summary>Like a monkey wrench, but not as big.</Summary>
   <Description>
      <Para>The turkey wrench, which comes in both right- and left-handed versions (skyhook optional), is made of ... the finest stainless steel. The Readi-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</Para>
      <Para>You can:</Para>
      <List>
         <Item><Link>Order your own turkey wrench</Link></Item>
         <Item><Link>Download the catalog</Link></Item>
      </List>
      <Para>The turkey wrench costs just $19.99 and, if you order now, comes with a hand-crafted shrimp hammer as a bonus gift.</Para>
   </Description>

3.3 Data, Documents, and Databases

In practice, the line between data-based and document-based documents is not always clear. For example, an otherwise data-based document, such as an invoice, may contain a large amount of irregularly structured data, such as a description. A document-based document, such as a user manual, may contain regularly structured data (usually metadata), such as the author's name and the revision date. Even so, deciding which of the two categories your documents belong to tells you whether you are interested in data or in documents, and that in turn determines what kind of system you need. To store and retrieve data, you can use a database (usually a relational, object-oriented, or hierarchical database) and middleware, or you can use an XML server (which you can think of as a database and middleware bundled together). To store documents, you need a content management system.
These systems are discussed in the sections that follow, beginning with Section 4.0, "Storing and Retrieving Data". A list of available software can be found in Section 6.0, "Available Software".

4.0 Storing and Retrieving Data

The data in a data-based document may originate in the database or outside it. An example of the former is data held in a database that you want to publish as XML on a website; an example of the latter is a large amount of data that arrives as XML and must be stored in a relational database. Depending on your needs, you will want software that transfers data from XML documents to the database, from the database to XML documents, or in both directions.

4.1 Transferring Data

When data is stored in a database, it is often necessary to discard much of the information about the document, such as its name and DTD, as well as its physical structure, such as entity definitions and usage, the order in which attribute values and sibling elements occur, the way binary data is stored (Base64 encoding, unparsed entities, or some other means), CDATA sections, and encoding information. Similarly, when data is retrieved from the database, the resulting XML document is likely to contain no CDATA sections or entity usage (other than the predefined entities lt ("<"), gt (">"), amp ("&"), apos ("'"), and quot (""")), and sibling elements and attributes are likely to appear in whatever order the database returned them.

For example, suppose you need to transfer a sales order from one database to another, using XML as the transfer format. In this case it does not matter whether the sales order number appears before or after the order date in the XML document, nor whether the customer name is stored in a CDATA section, in an external entity, or directly as PCDATA.
What matters is that the relevant data is transferred from the first database to the second. The data transfer software therefore needs to take into account the tree structure that groups the information belonging to a single sales order.

One consequence of discarding document information and physical structure is that "round-tripping" a document, that is, storing the data from a document in a database and then reconstructing the document from that data, often produces a new document whose structure differs from the original. As these examples show, the choice of database and data transfer middleware depends on your needs.

4.2 Mapping Document Structure to Database Structure

To transfer data between XML documents and a database, the document structure must be mapped to the database structure and vice versa. Such mappings fall into two categories: template-driven and model-driven.

4.2.1 Template-Driven Mappings

In a template-driven mapping, there is no predefined mapping between document structure and database structure. Instead, commands are embedded in a template, and the data transfer middleware processes the template.
For example, consider the following template (note that it is not written for any particular product), in which a SELECT statement is embedded in the <SelectStmt> element:

   <?xml version="1.0"?>
   <FlightInfo>
      <Intro>The following flights have available seats:</Intro>
      <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
      <Conclude>We hope one of these meets your needs</Conclude>
   </FlightInfo>

When the data transfer middleware processes this document, each SELECT statement is replaced by its results, formatted as XML:

   <?xml version="1.0"?>
   <FlightInfo>
      <Intro>The following flights have available seats:</Intro>
      <Flights>
         <Row>
            <Airline>ACME</Airline>
            <FltNumber>123</FltNumber>
            <Depart>Dec 12, 1998 13:43</Depart>
            <Arrive>Dec 13, 1998 01:21</Arrive>
         </Row>
         ...
      </Flights>
      <Conclude>We hope one of these meets your needs</Conclude>
   </FlightInfo>

This type of mapping is quite flexible. For example, some products let you place result values wherever you want in the output, including using them as parameters in a SELECT statement, rather than simply formatting the results as in the example above. Others support programming constructs such as loops and conditionals, and some support passing parameters through HTTP. Currently, template-driven mappings are supported only for transferring data from a relational database to an XML document.

4.2.2 Model-Driven Mappings

In a model-driven mapping, a data model is imposed on the structure of the XML document, and that model is mapped to the structures in the database and vice versa. The disadvantage is that it is less flexible than a template; the advantage is ease of use: because the mapping is based on a concrete data model, the software can usually do much of the conversion work for you. Because converting data from the database to XML is based on a single model, this approach is usually combined with XSL to provide flexibility.

Two models of XML documents are particularly common.
The first is used by many middleware packages that transfer data between XML documents and relational databases. It treats the XML document as a single table or as a set of tables.
That is, the XML document must have a structure similar to the following (for a single table, only one table element is needed):

   <database>
      <table>
         <row>
            <column1>...</column1>
            <column2>...</column2>
            ...
         </row>
         ...
      </table>
      ...
   </database>

Here, "table" means a single result set (when data is transferred from the database to XML) or a single table or updatable view (when data is transferred from XML to the database). If the data comes from more than one result set (database to XML), or if the XML document contains elements nested more deeply than a set of tables can represent (XML to database), then the transfer is simply not possible with this model.

The second common data model treats the XML document as a tree of objects. In this model, elements generally correspond to objects, and attributes and PCDATA correspond to properties. This model maps directly to object-oriented and hierarchical databases, and it can also be mapped to relational databases with traditional object-relational mapping techniques or SQL 3 object views. Note that this model is not the Document Object Model (DOM): the DOM models the document itself, not the data in the document.

For example, the sales order document introduced above can be viewed as a tree of objects from five classes (Orders, SalesOrder, Customer, Line, and Part), as follows:

             Orders
               |
           SalesOrder
          /    |    \
   Customer   Line   Line
               |      |
              Part   Part

When an XML document is modeled as a tree of objects, nothing requires elements to correspond to objects. For example, if an element contains only PCDATA, such as the CustName element in the sales order document, it can be viewed as a property (that is, a single value). Similarly, it is sometimes useful to model elements with mixed or element content as properties. An example is the Description element in the sales order document: although it has mixed content in the form of XHTML, it is more useful to treat the description as a single property, because its components have no meaning by themselves.
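
The object-tree model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the behavior of any particular middleware: the to_object function is hypothetical, the sample document is a cut-down sales order whose second line is made up, attributes are ignored, and plain dictionaries stand in for database objects. Children that contain only PCDATA become properties; children with element content become nested objects.

```python
import xml.etree.ElementTree as ET

# Cut-down sales order; the second Line and its values are invented for illustration.
SAMPLE = """
<Orders>
  <SalesOrder>
    <Customer><CustName>ABC Industries</CustName></Customer>
    <Line><Quantity>10</Quantity><Part><Price>9.95</Price></Part></Line>
    <Line><Quantity>5</Quantity><Part><Price>13.27</Price></Part></Line>
  </SalesOrder>
</Orders>
"""

def to_object(elem):
    """Map an element to a dict standing in for an object.

    A child with no child elements (PCDATA only) becomes a scalar property.
    A child with element content becomes a nested object; nested objects are
    collected in a list because the child may repeat.
    Attributes are ignored, and a repeated PCDATA-only child overwrites the
    earlier value, which a real mapping would have to handle.
    """
    obj = {}
    for child in elem:
        if len(child) == 0:
            obj[child.tag] = (child.text or "").strip()
        else:
            obj.setdefault(child.tag, []).append(to_object(child))
    return obj

orders = to_object(ET.fromstring(SAMPLE))
order = orders["SalesOrder"][0]
print(order["Customer"][0]["CustName"], len(order["Line"]))
```

Every value comes back as text, which leads directly to the data type issues discussed in Section 4.3.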

4.3 Data Types, Null Values, Character Sets, and Other Details

This section discusses a number of issues that arise when data from XML documents is stored in a database. You will usually not have these issues in mind when choosing middleware, but being aware of them should help you choose.

4.3.1 Data Types

XML does not support data types in any meaningful sense. Except for unparsed entities, all data in an XML document is treated as text, even if it represents another data type, such as a date or an integer.
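
Because everything arrives as text, the conversion to database types has to be stated somewhere. The following Python sketch shows one way to do that, and is not how any particular middleware works. The CONVERTERS table and typed_values function are hypothetical, ShipDate is a made-up element, and the date format is assumed to be year-month-day.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

# Hypothetical fragment; ShipDate is invented for illustration.
ROW = ET.fromstring(
    "<Line><Quantity>10</Quantity><Price>9.95</Price>"
    "<ShipDate>1998-12-12</ShipDate></Line>"
)

# One converter per element name. Anything not listed stays text.
CONVERTERS = {
    "Quantity": int,
    "Price": Decimal,
    # The format is an assumption; a document using another date format
    # would raise ValueError here.
    "ShipDate": lambda text: datetime.strptime(text, "%Y-%m-%d").date(),
}

def typed_values(elem):
    """Return the PCDATA of each child element, converted from text."""
    return {
        child.tag: CONVERTERS.get(child.tag, str)(child.text or "")
        for child in elem
    }

values = typed_values(ROW)
print(values)
```

Keeping the converters in an explicit table makes the assumptions about each format visible, which matters most for dates and numbers.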
Typically, data transfer middleware converts the text in the XML document to other data types in the database and vice versa. However, the conversions available may be limited, for example to those supported by a particular JDBC driver. Of the possible data types, dates most commonly cause problems. Numbers, with their different international formats, may cause problems as well.

4.3.2 Binary Data

There are two common ways to store binary data in an XML document: unparsed entities and Base64 encoding (a MIME encoding that maps binary data to a subset of US-ASCII). For relational databases, both can be problematic, because the rules for storing and retrieving binary data are often strict, and this can cause problems for middleware. In addition, there is no standard notation for indicating that an element in an XML document contains Base64-encoded data, so middleware may not recognize the encoding at all. Finally, some middleware may simply discard the notation associated with an unparsed entity or a Base64-encoded element when storing data in the database. Therefore, if binary data is important to you, be sure to check that your middleware supports it.

4.3.3 Null Values

In the database world, null data means data that simply is not there. This is different from a value of 0 (for numbers) or a length of zero (for strings). For example, suppose your data is collected from a weather station. If the thermometer is not working, a null value is stored rather than 0, which would mean something else entirely. XML also supports the concept of null data, through optional element types and attributes.
If the value of an optional element or attribute is null, it is simply not included in the document. As in a database, an empty element or an attribute containing a zero-length string is not null: its value is a zero-length string.

When mapping the structure of an XML document to a database and vice versa, take care that optional element types and attributes are mapped to nullable columns. If they are not, the result is likely to be an insertion error (when transferring data to the database) or an invalid document (when transferring data from the database). Because the XML community is likely to be more flexible about what null means than the database community (in particular, many XML users are willing to treat empty elements or attributes containing zero-length strings as null), you should take this into account when choosing middleware. Some middleware lets the user define what constitutes null in an XML document.

4.3.4 Character Sets

By definition, an XML document can contain any Unicode character except some of the control characters. Unfortunately, many databases offer limited or no support for Unicode and require special configuration to handle non-ASCII characters. If your data contains non-ASCII characters, make sure that your database and middleware can handle them.

4.3.5 Processing Instructions

Processing instructions are not part of the "data" in an XML document, and much middleware cannot handle them. The problem is that processing instructions can appear almost anywhere in a document, so they do not fit easily into a strict mapping of document structure to database structure: the middleware has difficulty deciding where to store them and where to put them back when the document is read.
If "round-tripping" documents together with their processing instructions is important to you, make sure that the middleware you choose can handle this.

4.3.6 Storing Markup

As mentioned in Section 4.2.2, it is sometimes useful to store elements with element or mixed content in the database without parsing them further. The most common way to do this is simply to store the markup itself in the database. Unfortunately, this causes a problem when the data is read back: it is impossible to tell whether markup characters in the database are real markup or characters that were escaped with the lt and gt entities. For example, the following description:

   <description><b>Confusing example:</b> &lt;foo/&gt;</description>

is stored in the database as:

   <b>Confusing example:</b> <foo/>

Once it is in the database, there is no way to tell whether <b> and <foo/> are markup or text. One solution is to store markup characters using other, non-markup characters, but you must be careful, because programs that use the data in other ways may not expect this. For example, a program that searches the database for a less-than sign ("<") or for the lt entity ("&lt;") may not find what it expects.

4.4 Generating DTDs from a Database Schema and Vice Versa

A common question when transferring data between XML documents and a database is how to generate a DTD from a database schema and vice versa. In short, these operations are fairly straightforward and some software already implements them, though perhaps less than many users would like.

For example, the following (simplified) procedure generates a relational schema from a DTD:

1. For each element type that contains element or mixed content, create a table and a primary key column.
2. For each element type with mixed content, create a separate table in which to store the PCDATA, linked to the parent table through the parent table's primary key.
3. For each single-valued attribute of that element type, and for each singly-occurring child element type that contains only PCDATA, create a column in that table. If the child element type or attribute is optional, make the column nullable.
4. For each multi-valued attribute, and for each multiply-occurring child element type that contains only PCDATA, create a separate table to store the values, linked to the parent table through the parent table's primary key.
5. For each child element that itself has element or mixed content, link the parent table to the child element's table through the parent table's primary key.

The following (simplified) procedure generates a DTD from a relational schema:

1. For each table, create an element type.
2. For each column in the table, create an attribute or a child element that contains only PCDATA.
3. For each column that holds a primary key value in a primary key / foreign key relationship, create a child element.

Unfortunately, these procedures have a number of drawbacks. For example, there is no way to determine data types or column lengths from a DTD. Guessing them, for example by reading sample documents, leads to errors when other documents contain content that exceeds the chosen column lengths. (The data type definitions in XML Schema documents help to solve this problem.) Similarly, when generating a DTD from a relational schema, there is no way to determine from the database the order in which child elements "should" appear, or similar details.
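
The DTD-to-tables procedure above can be sketched in Python. This is only a sketch under stated assumptions, not a complete implementation. It does not parse a DTD; instead, a hand-written model stands in for one, using a hypothetical ElementType class. The properties OrderDate, Phone, and Keyword are made up, mixed content is left out, and every column is typed TEXT because a DTD carries no data types.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class ElementType:
    """Hypothetical stand-in for one DTD element type with element content."""
    name: str
    single: dict = field(default_factory=dict)    # single-valued property -> optional?
    multi: list = field(default_factory=list)     # repeating PCDATA-only children
    children: list = field(default_factory=list)  # child types with their own table

# Hand-written model of a sales order; OrderDate, Phone and Keyword are invented.
MODEL = [
    ElementType("SalesOrder", single={"OrderDate": False},
                children=["Customer", "Line"]),
    ElementType("Customer", single={"CustName": False, "Phone": True}),
    ElementType("Line", single={"Quantity": False}, children=["Part"]),
    ElementType("Part", single={"Price": False, "Description": True},
                multi=["Keyword"]),
]

def generate_ddl(model):
    """Return CREATE TABLE statements for the given element types."""
    parent_of = {child: t.name for t in model for child in t.children}
    statements = []
    for t in model:
        # One table and one primary key column per element type.
        columns = [f"{t.name}_id INTEGER PRIMARY KEY"]
        # A child table is linked to its parent through the parent's key.
        if t.name in parent_of:
            parent = parent_of[t.name]
            columns.append(
                f"{parent}_id INTEGER NOT NULL REFERENCES {parent}({parent}_id)")
        # Single-valued properties become columns; optional ones are nullable.
        for prop, optional in t.single.items():
            columns.append(f"{prop} TEXT" + ("" if optional else " NOT NULL"))
        statements.append(f"CREATE TABLE {t.name} ({', '.join(columns)})")
        # Multi-valued properties get a separate table keyed by the parent.
        for prop in t.multi:
            statements.append(
                f"CREATE TABLE {t.name}_{prop} ("
                f"{t.name}_id INTEGER NOT NULL REFERENCES {t.name}({t.name}_id), "
                f"value TEXT NOT NULL)")
    return statements

DDL = generate_ddl(MODEL)

# Run the statements against an in-memory SQLite database to check them.
conn = sqlite3.connect(":memory:")
for statement in DDL:
    conn.execute(statement)
TABLES = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(TABLES)
```

The TEXT columns make the drawback described above concrete: with nothing in the DTD to indicate types or lengths, any narrower choice would be a guess.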