XML and Databases


XML and database Author: onecenter

Author: Ronald

Summary: This paper explores the relationship between XML and databases and lists the kinds of software that can use a database to process XML documents. It does not describe individual products in detail; instead, the author hopes it covers the main issues involved in using a database with XML documents.

Table of Contents

1.0 Introduction

2.0 Is XML a database?

3.0 Why use a database?

4.0 Data versus documents

4.1 Data-centric documents

4.2 Document-centric documents

4.3 Data, Documents and Databases

5.0 Storing and retrieving data

5.1 Transferring data

5.2 Mapping the document structure to the database structure

5.2.1 Template-driven mapping

5.2.2 Model-driven mapping

5.2.2.1 Table model

5.2.2.2 Data-specific object model

5.3 Data types, null values, character sets, and other issues

5.3.1 Data types

5.3.2 Binary data

5.3.3 Null values

5.3.4 Character sets

5.3.5 Processing instructions

5.3.6 Storing markup

5.4 Generating DTDs from database schemas and vice versa

1.0 Introduction

This paper gives a brief overview of the relationship between XML and databases and lists the kinds of software that can use a database to process XML documents. It does not describe individual products in detail; instead, the author hopes it covers the main issues involved in using a database with XML documents. It is somewhat biased toward relational databases, because that is where the author's experience lies.

2.0 Is XML a database?

Before discussing XML and databases, we need to answer a question that keeps coming up: "Is XML a database?" In the strict sense, if "XML" means an XML document, the answer is "no". Although an XML document contains data, without other software to process that data it is no more a database than any other text file.

If "XML" means the XML document together with all the related XML tools and technologies, the answer is "yes". The reason is that XML provides many of the things a database needs: storage (XML documents), schemas (DTDs, XML schema languages), query languages (XQL, XML-QL, Quilt, and so on), programming interfaces (SAX, DOM), and so on. However, XML still lacks many of the things found in real databases: efficient storage, indexes, security, transactions, data integrity, multi-user access, triggers, queries across multiple documents, and so on.

So XML can serve as a database in environments with small amounts of data, few users, and modest performance requirements. In most production environments, which have many users, strict data integrity requirements, and a need for good performance, XML does not qualify. Moreover, given that databases such as dBASE and Access are inexpensive and easy to use, XML is rarely used as a database even in the first case.

3.0 Why use a database?

When you start thinking about XML and databases, the first question to ask yourself is: why do I need a database? Do you have existing data that you want to export? Do you need somewhere to store your Web pages? Is the database used by an e-commerce application in which XML is the data transport format? The answers to these questions will directly affect your choice of database and middleware (if any).

For example, suppose you use XML as the data transport format in an e-commerce application. In this case your data very likely has a highly regular structure, and things such as entities and encodings in the XML documents are not important to you. After all, what you care about is the data, not how it is stored in the document. If your application is relatively simple, a relational database and data transfer middleware will meet your needs; if it is large and complex, you need a development environment with full XML support.

On the other hand, suppose you have a website built from a collection of loosely structured XML documents. Not only do you need to manage the site, you also want to give users a way to query its content. Your documents are likely to be quite irregular, and entities become important to you, because the structure of these documents is the foundation of the website. In this case, you need some kind of "native XML" database that supports features such as versioning, entity tracking, and query languages such as XQL.

4.0 Data versus documents

The author believes that the most important factor in choosing a database is whether you are using it to store data or to store documents. If you want to store data, you need a database designed primarily for data storage (such as a relational or object-oriented database), plus middleware to convert between the database and XML documents. If instead you want to store documents, you need a content management system, which is designed specifically for storing documents.

Although you can store documents in a relational or object-oriented database, you will often find yourself duplicating the work of a content management system. Similarly, although a content management system is usually built on top of an object-oriented or relational database, trying to use a content management system as a database is likely to be frustrating.

Whether you need to store data or documents can often be determined by looking at your XML documents, because XML documents fall into two broad categories: data-centric and document-centric.

4.1 Data-centric documents

Data-centric documents have a fairly regular structure, fine-grained data (that is, the smallest independent unit of data is a PCDATA-only element or an attribute), and little or no mixed content. The order in which sibling elements and PCDATA occur is usually not significant. Typical examples are XML documents containing sales orders, flight schedules, restaurant menus, and so on. Data-centric documents are usually intended for machine consumption. In such cases XML is often incidental: it is simply a means of transporting data.

For example, the following sales order document is data-centric:

ABC Industries

123 Main St.

Chicago

60609

981215

Turkey wrench:

Stainless steel, one-piece construction,

lifetime guarantee.

9.95

Stuffing separator:

Aluminum, one-year guarantee.

13.27
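As a minimal sketch of what "data-centric" means in practice, the following Python fragment parses two versions of a small sales order and extracts the same record from both. Only the text values come from the example above; every element and attribute name (SalesOrder, SONumber, Customer, CustName, City, OrderDate) and the order number are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are assumptions.
ORDER_A = """
<SalesOrder SONumber="12345">
  <Customer>
    <CustName>ABC Industries</CustName>
    <City>Chicago</City>
  </Customer>
  <OrderDate>981215</OrderDate>
</SalesOrder>
"""

# The same data with sibling elements in a different order.
ORDER_B = """
<SalesOrder SONumber="12345">
  <OrderDate>981215</OrderDate>
  <Customer>
    <City>Chicago</City>
    <CustName>ABC Industries</CustName>
  </Customer>
</SalesOrder>
"""


def extract(xml_text):
    """Flatten a sales order into a dict of its fine-grained values."""
    root = ET.fromstring(xml_text)
    return {
        "so_number": root.get("SONumber"),
        "cust_name": root.findtext("Customer/CustName"),
        "city": root.findtext("Customer/City"),
        "order_date": root.findtext("OrderDate"),
    }


record_a = extract(ORDER_A)
record_b = extract(ORDER_B)
print(record_a == record_b)
```

Because each value sits in its own PCDATA-only element or attribute, sibling order does not affect the extracted record.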

In the XML world, many prose-rich documents are actually data-centric. Take the page on the Amazon.com website that displays information about a book. Although the page is largely text, the structure of that text is highly regular: much of it is the same on every book description page, and each piece of page-specific text is limited in size. That is, the page could be built from a simple, data-centric XML document, which contains the text retrieved from the database, and an XSL stylesheet. In general, any website that currently builds HTML pages dynamically by filling a template with database data could probably be replaced by data-centric XML documents of this kind and one or more XSL stylesheets.

For example, consider the following lease document:

ABC Industries agrees to lease the property at 123 Main St., Chicago, IL from XYZ Properties for a term of not less than 18 months at a cost of 1000 USD per month.

It can be built from the following data-centric XML document and a simple stylesheet:

ABC Industries

123 Main St., Chicago, IL

XYZ Properties

1000

4.2 Document-centric documents

Document-centric documents are characterized by an irregular structure, coarse-grained data (that is, the smallest independent unit of data may be an element with mixed content or the entire document), and a large amount of mixed content. The order in which sibling elements and PCDATA occur is almost always significant. Typical examples are books, email, advertisements, and most XHTML documents. Document-centric documents are usually intended for human consumption.

For example, the following product description is document-centric:

Turkey Wrench

Full Fabrication Labs, Inc.

Like a monkey wrench, but not as big.

The turkey wrench, which comes in both right- and left-handed versions (skyhook optional), is made of the finest stainless steel. The Readi-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.

You can:

Order your own turkey wrench

Read more about wrenches

Download the catalog

The turkey wrench costs just $19.99 and, if you order now, comes with a hand-crafted shrimp hammer as a bonus gift.

4.3 Data, Documents and Databases

In practice, the distinction between data-centric and document-centric documents is not always clear-cut. For example, an otherwise data-centric document (such as an invoice) may contain coarse-grained, irregularly structured data (such as an item description). An otherwise document-centric document (such as a user's manual) may contain fine-grained, regularly structured data (usually metadata), such as the author and the revision date. Nevertheless, characterizing your documents as data-centric or document-centric helps you decide whether you care about the data or the documents, which in turn determines what kind of system you need.

To store or retrieve data, you can use a database (usually relational, object-oriented, or hierarchical) and middleware (either built in or from a third party), or you can use an XML server (a platform for building distributed applications, such as e-commerce applications, that use XML for data transfer). To store documents, you need a content management system or a persistent DOM implementation. These systems are discussed in section 5.0, "Storing and retrieving data", and section 6.0, "Storing and retrieving documents". A detailed list of related products is available in XML Database Products (http://www.rpbourret.com/xml/xmldatabaseprods.htm).

5.0 Storing and retrieving data

The data in a data-centric document may originate in the database (in which case you want to export it as XML) or in an XML document (in which case you want to store it in the database). An example of the former is the large amount of existing (legacy) data stored in relational databases; an example of the latter is data published as XML on the Web that you want to store in your own database for further processing. So, depending on your needs, you may need software that transfers data from XML documents to the database, from the database to XML documents, or both.

5.1 Transferring data

When you store data in a database, you often need to discard information about the document, such as its name and DTD, as well as its physical structure, such as entity definitions and usage, the order in which attribute values appear, the way binary data is stored (Base64 encoding, unparsed entities, or some other way), CDATA sections, and encoding information. Similarly, when data is retrieved from the database, the resulting XML document contains no CDATA sections or entity references other than the predefined entities lt ("<"), gt (">"), amp ("&"), apos ("'"), and quot ('"'). The order in which sibling elements and attributes appear is often simply the order in which the database returned the data.

Although this may seem surprising, it is usually reasonable. For example, suppose you use XML as a data format to transfer a sales order from one database to another. In this case, it does not matter whether the sales order number appears before or after the order date in the XML document, nor whether the customer's name is stored in a CDATA section, as an external entity, or directly as PCDATA. What matters is that the relevant data is transferred from the first database to the second. Thus, the data transfer software needs to take into account the hierarchical structure of the data (which groups related data together), but little else.

One consequence of ignoring document information and physical structure is that documents usually cannot be "round-tripped": if the data from a document is stored in the database and a new document is then reconstructed from that data, the new document often differs from the original, even in canonical form. Whether this is acceptable depends on your needs, and it will also affect your choice of database and data transfer middleware.
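The loss of physical structure can be illustrated with a short Python sketch. It assumes a hypothetical Customer document whose CustName is written as a CDATA section; the customer name "Smith & Sons" is invented for the example. The data survives the trip, but the CDATA section does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the name is stored in a CDATA section.
ORIGINAL = "<Customer><CustName><![CDATA[Smith & Sons]]></CustName></Customer>"

root = ET.fromstring(ORIGINAL)
name = root.findtext("CustName")   # the value a database column would hold

# Rebuild a document from the stored value only.
rebuilt_root = ET.Element("Customer")
ET.SubElement(rebuilt_root, "CustName").text = name
rebuilt = ET.tostring(rebuilt_root, encoding="unicode")

print(name)
print(rebuilt)
```

The rebuilt document carries the same data but uses the predefined amp entity instead of a CDATA section, so it is not identical to the original text.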

5.2 Mapping the document structure to the database structure

In order to transfer data between XML documents and a database, the document structure must be mapped to the database structure and vice versa. Such mappings generally fall into two categories: template-driven and model-driven.

5.2.1 Template-driven mapping

In a template-driven mapping, there is no predefined mapping between the document structure and the database structure. Instead, you embed commands in a template that is processed by the data transfer middleware. For example, consider the following template (which does not belong to any actual product), in which a SELECT statement is embedded in an element:

The following flights have available seats:

SELECT Airline, FltNumber, Depart, Arrive FROM Flights

We hope one of these meets your needs

When the data transfer middleware processes the document, each SELECT statement is replaced by its results, formatted as XML:

The following flights have available seats:

ACME

123

Dec 12, 1998 13:43

Dec 13, 1998 01:21

...

We hope one of these meets your needs
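The substitution described above can be sketched in Python with the standard sqlite3 and ElementTree modules. This is an illustration under assumptions, not the behaviour of any actual product: the element names FlightInfo, Intro, SelectStmt, Conclude, Flights, and Row are hypothetical, and only top-level SelectStmt elements are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical template: all element names are assumptions.
TEMPLATE = """
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
"""


def process_template(template, conn):
    """Replace each top-level SelectStmt element with its query results."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)
        columns = [d[0] for d in cursor.description]
        results = ET.Element("Flights")
        results.tail = child.tail          # keep surrounding whitespace
        for row in cursor:
            row_el = ET.SubElement(results, "Row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        root[index] = results              # swap the statement for its results
    return root


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)"
)
conn.execute(
    "INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')"
)

document = process_template(TEMPLATE, conn)
print(ET.tostring(document, encoding="unicode"))
```

The mapping lives entirely in the template: changing the SELECT statement changes the output without touching the processing code.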

Template-driven mapping can be very flexible. For example, some products let you place result-set values wherever you want in the result document (including in the parameters of subsequent SELECT statements), rather than simply formatting the results as in the example above. Others support programming constructs such as loops and conditionals. Some also allow SELECT statements to be parameterized, for example with parameters passed in over HTTP.

Currently, template-driven mapping is supported only for transferring data from a relational database to an XML document.

5.2.2 Model-driven mapping

In a model-driven mapping, a data model is imposed on the structure of the XML document and is mapped, explicitly or implicitly, to the structures in the database, and vice versa. Its disadvantage is a loss of flexibility; its advantage is ease of use, because the mapping is based on a specific data model and the middleware can therefore do much of the conversion work for the user. Because the XML that results from transferring data from the database follows a single model, model-driven products typically integrate XSL to provide the kind of flexibility found in template-driven systems.

Two models are commonly used to view the data in an XML document: as a table and as a tree of data-specific objects. Other models are possible. For example, by using ID and IDREF attributes, an XML document can describe a graph. However, most existing middleware does not support such models.

5.2.2.1 Table model

Many middleware packages that transfer data between XML and relational databases view the XML document as a single table or a set of tables. That is, the structure of the XML document must resemble the following, where the outer database-level element and any additional table elements do not appear in the single-table case:

...

The term "table" should be read loosely: it means a single result set (when transferring data from the database) or a single table or updatable view (when transferring data to the database). If data is needed from more than one result set (when transferring from the database), or the XML document contains more deeply nested elements than a set of tables can represent (when transferring to the database), then the transfer is simply not possible.
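A minimal sketch of the table model, exporting whole tables from an in-memory SQLite database. The element names database, table, and row, the name attribute, and the Customers table are all assumptions chosen for illustration; real middleware defines its own conventions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def tables_to_xml(conn, table_names):
    """Serialize whole tables as database/table/row/column elements."""
    database = ET.Element("database")
    for table_name in table_names:
        table = ET.SubElement(database, "table", name=table_name)
        # Table names cannot be bound as parameters; they are trusted here.
        cursor = conn.execute(f'SELECT * FROM "{table_name}"')
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            row_el = ET.SubElement(table, "row")
            for column, value in zip(columns, row):
                if value is not None:      # NULL columns are simply omitted
                    ET.SubElement(row_el, column).text = str(value)
    return database


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustName TEXT, City TEXT)")
conn.execute("INSERT INTO Customers VALUES ('ABC Industries', 'Chicago')")
conn.execute("INSERT INTO Customers VALUES ('XYZ Properties', NULL)")

database_el = tables_to_xml(conn, ["Customers"])
print(ET.tostring(database_el, encoding="unicode"))
```

Every row becomes one flat element, which is why more deeply nested documents cannot be handled by this model.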

5.2.2.2 Data-specific object model

The second common data model views the XML document as a tree of data-specific objects. In this model, element types generally correspond to objects, and content models, attributes, and PCDATA correspond to properties of those objects. This model maps directly to object-oriented and hierarchical databases, and it can also be mapped to relational databases using traditional object-relational mapping techniques and SQL 3 object views. Note that this model is not the Document Object Model (DOM): the DOM models the document itself, not the data in the document. The DOM is instead used to build content management systems on top of relational databases, as discussed in section 6.1.2.

For example, the sales order document above can be viewed as a tree of objects from five classes, Orders, SalesOrder, Customer, Line, and Part, as shown in the following diagram:

          Orders
            |
        SalesOrder
        /   |   \
 Customer  Line  Line
            |     |
           Part  Part

When an XML document is modeled as a tree of data-specific objects, elements do not necessarily have to correspond to objects. For example, if an element contains only PCDATA, such as the CustName element in the sales order document, it can reasonably be treated as a property, because, like a property, it holds a single scalar value. Similarly, it is sometimes useful to treat elements with mixed or element content as properties. An example is the Description element in the sales order document: although it has mixed content in the form of XHTML, it is more useful to treat the description as a single property, because its component pieces are meaningless on their own.
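The object view can be sketched with Python dataclasses. This is an illustration under assumptions: the class names follow the text, but the exact markup (including the Price element and the b tag inside Description) is hypothetical, the top-level Orders class is left out for brevity, and inner_markup is a helper invented for this example.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Part:
    description: str   # mixed content kept as a single property
    price: str


@dataclass
class Line:
    part: Part


@dataclass
class Customer:
    cust_name: str     # PCDATA-only element treated as a property


@dataclass
class SalesOrder:
    customer: Customer
    lines: List[Line] = field(default_factory=list)


# Hypothetical markup for illustration.
ORDER = """
<SalesOrder>
  <Customer><CustName>ABC Industries</CustName></Customer>
  <Line><Part>
    <Description><b>Turkey wrench:</b> stainless steel</Description>
    <Price>9.95</Price>
  </Part></Line>
</SalesOrder>
"""


def inner_markup(element):
    """Return an element's mixed content as one string, markup included."""
    parts = [element.text or ""]
    parts += [ET.tostring(child, encoding="unicode") for child in element]
    return "".join(parts).strip()


def load_order(xml_text):
    """Build the object tree from the XML document."""
    root = ET.fromstring(xml_text)
    customer = Customer(cust_name=root.findtext("Customer/CustName"))
    lines = []
    for line_el in root.findall("Line"):
        part_el = line_el.find("Part")
        part = Part(
            description=inner_markup(part_el.find("Description")),
            price=part_el.findtext("Price"),
        )
        lines.append(Line(part=part))
    return SalesOrder(customer=customer, lines=lines)


order = load_order(ORDER)
print(order)
```

Elements with element content become objects, while CustName and Description become plain string properties of their parent objects.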

5.3 Data types, null values, character sets, and other issues

This section looks at a number of issues that arise when storing data from XML documents in a database. In general, how these issues are resolved is determined by the middleware you choose, but you should be aware that they exist, because this can help you choose middleware.

5.3.1 Data types

XML does not support data types in any meaningful sense. Except for unparsed entities, all data in an XML document is treated as text, even if it represents another data type such as a date or an integer. Usually, the data transfer middleware converts the text in the XML document to other data types in the database and vice versa. However, the text formats recognized for a given data type are limited, for example to those supported by a particular JDBC driver. Of these data types, dates are the most likely to cause problems. Numbers, whose formats differ between international regions, may also cause problems.
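A short Python sketch of the conversion problem. It assumes, purely for illustration, that dates are written as YYMMDD (as in "981215") and that numbers use "." as the decimal point; the function names are hypothetical.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def parse_order_date(text):
    """Interpret a YYMMDD string such as '981215' as a date."""
    return datetime.strptime(text, "%y%m%d").date()


def parse_price(text):
    """Interpret text as a decimal number with '.' as the decimal point."""
    return Decimal(text)


order_date = parse_order_date("981215")
price = parse_price("9.95")

# A number written with a decimal comma is not recognized.
try:
    parse_price("9,95")
    comma_ok = True
except InvalidOperation:
    comma_ok = False

print(order_date, price, comma_ok)
```

The converter only accepts the formats it was written for, which is the same limitation the middleware and database driver impose.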

5.3.2 Binary data

There are two common ways to store binary data in an XML document: unparsed entities and Base64 encoding (a MIME encoding that maps binary data to a subset of US-ASCII).

For relational databases, both methods can be troublesome, because the rules governing how binary data is stored and retrieved are quite strict, and this can cause problems for middleware.

In addition, there is no standard notation for indicating that an element in an XML document contains Base64-encoded data, so the middleware might not recognize the encoding at all. Finally, the notation associated with an unparsed entity or a Base64-encoded element may be discarded when the data is stored in the database. So, if binary data is important to you, make sure your middleware supports it.
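A minimal sketch of the Base64 approach. Because there is no standard way to flag encoded content, the encoding="base64" attribute used here is a made-up convention that sender and receiver would have to agree on; the Photo element name is also hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 255, 16, 128, 60, 38])   # arbitrary binary data

# Writing: encode the bytes and flag the element.
# The "encoding" attribute is an assumed convention, not a standard.
photo = ET.Element("Photo", encoding="base64")
photo.text = base64.b64encode(payload).decode("ascii")
document = ET.tostring(photo, encoding="unicode")

# Reading: decode only if the agreed-upon flag is present.
parsed = ET.fromstring(document)
if parsed.get("encoding") == "base64":
    restored = base64.b64decode(parsed.text)
else:
    restored = parsed.text.encode("utf-8")

print(document)
```

If the flag is dropped on the way into the database, the reader has no way of knowing that the text needs decoding.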

5.3.3 Null values

In the database world, null data means data that simply is not there. This is very different from a value of 0 (for numbers) or a zero-length string. For example, suppose your data comes from a weather station. If the station's thermometer is not working and the temperature cannot be read, a NULL is stored rather than a 0; a value of 0 would obviously mean something else entirely.

XML supports the concept of null data through optional element types and attributes. If the value of an element type or attribute is null, it is simply left out of the document. As in a database, empty elements and attributes containing zero-length strings are not null: their value is a zero-length string.

When mapping the structure of an XML document to a database structure and vice versa, take care that optional element types and attributes are mapped to nullable columns and vice versa. Otherwise, you are likely to get insertion errors (when transferring data to the database) or invalid documents (when transferring data from the database).

The XML community is likely to take a more flexible view of what constitutes a null than the database community does. Specifically, many XML users are likely to treat empty elements or attributes containing zero-length strings as "null". You therefore need to check how your chosen middleware handles this situation. Some middleware lets the user choose what constitutes a null in an XML document.
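The distinction between a missing element and an empty one can be sketched as follows. The function column_value, its empty_is_null switch, and the Reading, Station, Note, and Temperature element names are assumptions made for illustration; the switch stands in for the kind of option some middleware offers.

```python
import xml.etree.ElementTree as ET


def column_value(parent, tag, empty_is_null=False):
    """Return the value to store for an optional child element.

    A missing element maps to None (SQL NULL). An empty element maps to
    a zero-length string unless empty_is_null is set.
    """
    child = parent.find(tag)
    if child is None:
        return None
    text = child.text or ""
    if text == "" and empty_is_null:
        return None
    return text


# Hypothetical weather station reading with no Temperature element.
reading = ET.fromstring("<Reading><Station>A1</Station><Note/></Reading>")

print(column_value(reading, "Temperature"))               # missing element
print(repr(column_value(reading, "Note")))                 # empty element
print(column_value(reading, "Note", empty_is_null=True))   # relaxed rule
```

With the strict rule only the missing Temperature element is stored as NULL; with the relaxed rule the empty Note element is as well.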

5.3.4 Character sets

By definition, an XML document can contain any Unicode character except some of the control characters. Unfortunately, many databases offer limited or no support for Unicode and require special configuration to handle non-ASCII characters. If your data contains non-ASCII characters, be sure to verify that your database and middleware can handle them.

5.3.5 Processing instructions

Processing instructions are not part of the "data" of an XML document, so many middleware packages do not handle them well. The problem is that, especially when the XML document structure is mapped strictly to a database structure, processing instructions fit poorly, because they can occur virtually anywhere in the document. The middleware therefore has difficulty deciding where to store them and when to retrieve them. If processing instructions and round-tripping of documents are important to you, be sure to check how your middleware handles them.

5.3.6 Storing markup

As mentioned earlier, it is sometimes useful to store elements with element or mixed content in the database as they are, without breaking them down further. The most common way to do this is simply to store the markup itself in the database. Unfortunately, this causes a problem when the data is retrieved from the database: it is impossible to tell whether markup characters in the database are real markup or stand for entity references to markup characters, such as characters escaped with the lt and gt entities.

For example, consider the following description element:

Confusing example:

It is stored in the database as:

Confusing example:

At this point, an application that retrieves the value from the database cannot tell which of the angle-bracketed strings in it are markup and which are text. There are several possible solutions, such as flagging the markup in some way or using entities for markup characters that are not used as markup. You then need to check that the chosen approach is compatible with the other applications that use the data. For example, take particular care if you want to search the database for the less-than sign ("<") and it is stored as the lt entity ("&lt;").
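The ambiguity can be demonstrated with a short Python sketch. The markup used here (a b element followed by an escaped foo tag) is an assumption chosen to illustrate the problem, and count_elements is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one real <b> element, then the TEXT "<foo/>".
DOCUMENT = "<description><b>Confusing example:</b> &lt;foo/&gt;</description>"

root = ET.fromstring(DOCUMENT)
bold = root[0]

# Naive storage: real markup and unescaped text side by side.
naive = "<b>" + bold.text + "</b>" + bold.tail

# Storage that keeps text escaped, so markup and text stay distinguishable.
escaped = ET.tostring(bold, encoding="unicode")


def count_elements(stored):
    """Re-parse a stored fragment and count the elements found in it."""
    wrapper = ET.fromstring("<description>" + stored + "</description>")
    return len(list(wrapper.iter())) - 1   # exclude the wrapper itself


print(naive, count_elements(naive))
print(escaped, count_elements(escaped))
```

The naive form is re-read as two elements although the original contained only one, while the escaped form preserves the distinction at the cost of storing entity references in the database.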

5.4 Generating DTDs from database schemas and vice versa

A common question when transferring data between XML documents and a database is how to generate an XML DTD from a database schema and how to generate a database schema from an XML DTD. In short, this is a fairly straightforward operation, although the results often fall short of what many users expect.

(Note that this is usually a one-time operation, because most applications, especially vertical applications, work with a known set of DTDs and relational schemas. The obvious exceptions are tools that store arbitrary XML documents in a relational database or publish relational data as XML; in the latter case, the need for a DTD is less obvious.)

The following (simplified) procedure describes how to generate a relational schema from a DTD:

1. For each element type with element or mixed content, create a table and a primary key column.

2. For each element type with mixed content, create a separate table in which to store the PCDATA, linked to the parent table through the parent table's primary key.

3. For each single-valued attribute of that element type, and for each child element type that occurs only once and contains only PCDATA, create a column in that table. If the child element type or attribute is optional, make the column nullable.

4. For each multi-valued attribute, and for each child element type that occurs multiple times and contains only PCDATA, create a separate table to store the values, linked to the parent table through the parent table's primary key.

5. For each child element that itself has element or mixed content, link the parent element's table to the child element's table through the parent table's primary key.

The following (simplified) procedure describes how to generate a DTD from a relational schema:

1. For each table, create an element type.

2. For each column in a table, create an attribute or a child element that contains only PCDATA.

3. For each column whose value is exported as the primary key in a primary key / foreign key relationship, create a child element.
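The DTD-to-relational procedure above can be sketched in Python. This is a simplified illustration under several assumptions: the DTD is not parsed but described by hand with the hypothetical Child and ElementType classes, every column is typed TEXT (a DTD carries no type information), attributes are treated as optional, and mixed content and multi-valued attributes are not handled.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Child:
    name: str
    pcdata_only: bool
    repeated: bool = False
    optional: bool = False


@dataclass
class ElementType:
    name: str
    attributes: List[str] = field(default_factory=list)   # single-valued
    children: List[Child] = field(default_factory=list)


def generate_schema(element_types):
    """Generate CREATE TABLE statements from a simplified content model."""
    # Children with element content are linked back to their parent.
    parent_of = {
        child.name: etype.name
        for etype in element_types
        for child in etype.children
        if not child.pcdata_only
    }
    statements = []
    for etype in element_types:
        columns = [f"{etype.name}_id INTEGER PRIMARY KEY"]
        if etype.name in parent_of:
            parent = parent_of[etype.name]
            columns.append(f"{parent}_id INTEGER REFERENCES {parent}")
        columns += [f"{attr} TEXT" for attr in etype.attributes]
        for child in etype.children:
            if child.pcdata_only and not child.repeated:
                null = "" if child.optional else " NOT NULL"
                columns.append(f"{child.name} TEXT{null}")
        column_list = ", ".join(columns)
        statements.append(f"CREATE TABLE {etype.name} ({column_list})")
        for child in etype.children:
            if child.pcdata_only and child.repeated:
                statements.append(
                    f"CREATE TABLE {etype.name}_{child.name} "
                    f"({etype.name}_id INTEGER REFERENCES {etype.name}, value TEXT)"
                )
    return statements


# Hypothetical content model for a sales order.
model = [
    ElementType("SalesOrder", attributes=["SONumber"],
                children=[Child("OrderDate", pcdata_only=True),
                          Child("Line", pcdata_only=False, repeated=True)]),
    ElementType("Line", attributes=["LineNumber"],
                children=[Child("Quantity", pcdata_only=True, optional=True)]),
]

schema = generate_schema(model)
conn = sqlite3.connect(":memory:")
for statement in schema:
    conn.execute(statement)    # check that the generated SQL is accepted
    print(statement)
```

Optional PCDATA-only children become nullable columns, and the Line table receives a foreign key to SalesOrder, mirroring the nesting in the document.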

Unfortunately, these procedures have a number of drawbacks. For example, there is no way to determine data types or column lengths from a DTD. Any guess (made, for example, by reading a sample document) may be wrong, so an error can occur later when a document is read whose content exceeds the column length. (The long-term solution is to use the data types in an XML Schema document.) Similarly, when generating a DTD from a relational schema, there is no way to determine the order in which child elements "should" appear, or whether every column (such as a row identifier internal to the database) should be converted at all.

Name collisions may occur in both directions.

In spite of these drawbacks, these procedures still provide a good starting point for moving between relational schemas and DTDs.
