Author: Zhu Liang
From the day it was born, XML technology has promised a bright future. With the vigorous development of Web Services over the past year or two in particular, XML has become increasingly active in data exchange and storage. The exponential growth of XML data volume calls for more efficient data management and for faster, more accurate queries. While traditional database vendors have been announcing support for XML, a new database technology, the Native XML DBMS (NXD), has also emerged. It breaks the dominance of the traditional RDBMS and offers a good opportunity for database technology research. This article introduces the relationship between XML and databases, the technical characteristics of NXD, a comparison between traditional databases and NXD, and the status and prospects of NXD.
XML and database relationship
Is XML a database? An XML document is "self-describing", "arbitrarily nestable", and "tree-structured", so in a sense an XML document is a database, or a part of one. The XML document Student.xml shown below describes a student: the student number, name, and so on. It maps easily onto a two-dimensional table in a traditional RDBMS: the Student tag corresponds to a row, and the ID, name, and other tags correspond to columns.
Student.xml
</name>
...
</student>
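The correspondence between a student document and a two-dimensional table can be sketched in Python. This is only an illustration: the element names and values below are assumptions, not the contents of the original Student.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and values are assumptions made
# for illustration, not the contents of the original Student.xml.
STUDENTS_XML = """
<students>
  <student><id>1001</id><name>Alice</name></student>
  <student><id>1002</id><name>Bob</name></student>
</students>
"""

def students_to_rows(xml_text):
    """Return (columns, rows): one row per <student>, one column per child tag."""
    root = ET.fromstring(xml_text)
    columns = []
    records = []
    for student in root.iter("student"):
        record = {child.tag: (child.text or "").strip() for child in student}
        for tag in record:
            if tag not in columns:
                columns.append(tag)  # keep first-seen column order
        records.append(record)
    # A missing child element becomes None, much like a NULL column value.
    rows = [tuple(record.get(col) for col in columns) for record in records]
    return columns, rows

columns, rows = students_to_rows(STUDENTS_XML)
print(columns)
for row in rows:
    print(row)
```

Each student element becomes one row and each child tag becomes one column, which is the mapping the text describes.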
We can put related XML documents in a directory and manage them with the file system, providing query, update, and delete operations. To better support XML, the W3C has also developed related technologies, such as schema languages (DTD, XML Schema), query languages (XPath, XQuery, etc.), and programming interfaces (DOM, SAX, etc.), to make application development easier. Seen from a higher technical perspective, however, simple file-based management of XML documents falls far short of what a database offers, beginning with its inefficiency. The strength of XML lies in its unified data format, not in database characteristics. In XML applications, therefore, the database keeps its position as the place where data is managed.
The tree structure of XML data is different from the two-dimensional table structure of the relational model. This difference is reflected in how database products process XML data, and it has produced two major camps: XML-Enabled DBMS (XED) and Native XML DBMS (NXD).
XED adds an XML support module to an existing database to handle format conversion and transfer between XML data and the database. In terms of storage granularity, an entire XML document can be stored as one row in an RDBMS table, or the document can be parsed and its contents stored in the corresponding tables. To support W3C standards for XML operations such as XPath, XED products provide new primitives (for example, Oracle9i R2 adds packages for operating on XML data) and optimize the XML processing module.
NXD was created specifically for XML data processing. It typically adopts a hierarchical storage model that preserves the tree structure of the XML document, which eliminates the data conversion between XML documents and a traditional database. See section 2 for details.
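The two XED storage granularities described above can be sketched with Python's built-in sqlite3 module standing in for a relational database. The table names, column names, and sample document are assumptions for illustration only; no particular XED product works exactly this way.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption, not taken from the article).
DOC = "<student><id>1001</id><name>Alice</name></student>"

conn = sqlite3.connect(":memory:")

# Granularity 1: the whole XML document is stored as a single row.
conn.execute("CREATE TABLE student_docs (doc_id INTEGER PRIMARY KEY, xml TEXT)")
conn.execute("INSERT INTO student_docs (xml) VALUES (?)", (DOC,))

# Granularity 2: the document is parsed and its values are stored
# in the columns of a corresponding table.
conn.execute("CREATE TABLE students (id TEXT PRIMARY KEY, name TEXT)")
root = ET.fromstring(DOC)
conn.execute(
    "INSERT INTO students (id, name) VALUES (?, ?)",
    (root.findtext("id"), root.findtext("name")),
)

whole = conn.execute("SELECT xml FROM student_docs").fetchone()[0]
shredded = conn.execute("SELECT id, name FROM students").fetchone()
print(whole)
print(shredded)
```

The first form returns the document text unchanged but gives the database no view of its contents. The second form makes the values queryable with SQL but keeps only the data, not the document itself.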
Two document types
Data-centric
A "data-centric" XML document focuses on the data in the document rather than on the document format. Examples include flight information, sales orders, and scientific calculation results. Such documents are typically generated by machine from data in a traditional database. They are used mainly in e-commerce, ERP, and EAI to integrate different data sources and exchange information. A "data-centric" XML document has the following features:
· Structured data
· Moderate data granularity
· Little or no mixed content
· Document order is not important
Student.xml is a typical "data-centric" XML document that records a student's information. Each student's information is very regular and the granularity is appropriate. The order of sibling elements is not important: swapping two elements does not harm the readability of the document.
Document-centric
A "document-centric" XML document is used mainly to represent data expressed in human natural language, such as email, books, and user manuals. Such documents have a more complex structure and are generally not generated automatically by machine. At present, most of the data on the Web can be represented as documents of this kind. A "document-centric" XML document has the following features:
· Semi-structured or unstructured data
· Mixed content
· Document order is important
Products.xml is a typical "document-centric" XML document.
Products.xml
The
Database, ... </summary>
</Intro>
...
</Product>
For "data-centric" XML documents, XED can easily extract the data and store it in a traditional database, but its support for "document-centric" XML documents is weak. Here NXD has a clear advantage, because no data conversion between the two models is needed.
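One of the distinguishing features listed above, mixed content, is easy to check mechanically. The following is a minimal sketch using Python's standard library; the function name and both sample documents are assumptions for illustration, and a real classification would weigh the other features as well.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(xml_text):
    """Return True if any element contains both child elements and non-blank text."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue  # a leaf holding only text is not mixed content
        # Text directly inside elem: before its first child (elem.text)
        # and after each child (child.tail).
        pieces = [elem.text] + [child.tail for child in children]
        if any(piece and piece.strip() for piece in pieces):
            return True
    return False

# Hypothetical sample documents.
DATA_CENTRIC = "<student><id>1001</id><name>Alice</name></student>"
DOCUMENT_CENTRIC = "<intro>The <b>fast</b> database, explained.</intro>"

print(has_mixed_content(DATA_CENTRIC))
print(has_mixed_content(DOCUMENT_CENTRIC))
```

Text interleaved with child elements, as in the second sample, is what makes a document awkward to split into table columns.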
Technical features of NXD
NXD is designed specifically to store XML documents, and it also has the characteristics of a general database: transactions, concurrency control, a query language, security mechanisms, programming interfaces, and so on. The only difference is that its internal storage model is based on the tree structure of the XML document rather than on the relational model. In his article "XML and Databases", Ronald Bourret defines NXD as follows: "The logical model of an NXD is built on the XML document, not on the data in the document, and data is stored and retrieved according to that model. The model includes at least elements, attributes, PCDATA, and document order; the XPath data model is one example ... The smallest unit of storage in an NXD is the XML document ..." Generally speaking, an NXD should offer the following: document collections, queries, updates, transactions, locking and concurrency control, and programming interfaces.
Document collections
Many NXD products support the concept of a "document collection". Like a directory in a file system or a table in an RDBMS, a document collection gathers documents of one kind to make them easier to work with. Queries and modifications at the collection level apply to every document in the collection. A document collection is usually associated with a schema. When a document is added to a collection that has a schema, the document is validated against it, and only documents that conform to the schema can be added. Unlike an RDBMS, where every table must have a schema, NXD also provides schema-less document collections, for which no validation is performed when a document is added. Schema-less collections make it much easier to store semi-structured XML documents whose formats are not uniform.
Query language
XPath and XQuery are the query languages that the W3C recommends for XML documents. At present, most NXD products support XPath, and some NXDs also provide proprietary query languages. XPath is based on the XML document tree model: it specifies a query path starting from a node and searches the document along it. As a database query language, XPath currently has many shortcomings: it cannot group, sort, or join. XQuery is more like a programming language. It supports logic such as loops, and it supports grouping, sorting, and joins. Compared with the standard SQL of traditional databases, XQuery is a more powerful and easier way to query XML data.
Transactions, locking, and concurrency control
Almost all NXDs support transaction processing. However, lock granularity is usually coarse: locks apply to the entire document rather than to a document fragment, so the level of concurrency supported is relatively low. The actual concurrency depends on the application and on what constitutes a "document".
Programming interfaces
Almost all NXDs provide programming interfaces for connecting to the database, browsing metadata, executing queries, and returning results. A result is usually returned as an XML string, a DOM tree, or a SAX parser over the returned document. If a query returns several documents or document fragments, there is usually a way to enumerate them. For database products running in client/server mode, results can also be sent to the client over a network protocol such as HTTP.
Round-tripping
An important feature of NXD is that it provides round-tripping for XML documents: you can store an XML document in an NXD and retrieve the "same" document. This matters a great deal for "document-centric" applications, because the CDATA sections, entity references, comments, and processing instructions that XED ignores are indispensable parts of such documents. It matters especially in fields such as law and medicine, where the format of a document must not be altered.
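What is lost when only the data level of a document is kept can be illustrated with Python's standard xml.etree.ElementTree parser, used here merely as a stand-in for a data-level store and not as a model of any XED product. With its default settings it keeps elements, attributes, and text, but drops comments and folds CDATA sections into ordinary text. The sample document is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing a comment and a CDATA section.
ORIGINAL = "<note><!-- reviewed --><body><![CDATA[a < b]]></body></note>"

def naive_round_trip(xml_text):
    """Parse the document and serialize it again using the default parser settings."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")

result = naive_round_trip(ORIGINAL)
print(result)

# The data survives the trip...
print(ET.fromstring(result).findtext("body"))
# ...but the document is no longer the "same" document.
print(result == ORIGINAL)
```

The text of the body element is preserved, while the comment and the CDATA markup are not, so the retrieved document differs from the stored one even though no data was lost.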
All NXDs can round-trip documents at the level of elements, attributes, PCDATA, and document order. How much further they go depends on the specific database product.
Updates and persistent DOM
Most NXDs update XML documents through API calls, or simply replace the entire document. Some NXDs also provide a persistent DOM (PDOM): the DOM model is implemented on a persistent storage medium, so changes made to the DOM are reflected directly in the database. Because the PDOM tree is "live", the database usually runs in the same process space as the application.
Comparison of traditional databases and NXD
Relying on two or three decades of accumulated traditional database technology, XED took the larger share of the XML application market in the early stage of its competition with NXD. According to a 2001 Intellor survey report, XED's market share was three times that of NXD, at about 1.2 billion US dollars. Although NXD appeared later, its market share has grown strongly: in the three years from 1999 to 2001 it reached 390 million US dollars, and it was expected to grow at a rate of 200% over the following two years. At present, NXD is used mainly in manufacturing, biomedicine, telecommunications, and similar fields. Comparing the technology of XED and NXD shows where the two differ, especially in how they are applied. This is very useful for users who need to choose a database for developing XML applications.
Technical features of XED
XED accesses XML data through an XML gateway module. The module sits between the user logic and the database logic and wraps the traditional database so that the user sees a transparent XML data source. XED faces the following technical difficulties when accessing XML documents.
Mapping between XML document schemas and XED schemas
To store an XML document in XED, we must map the XML document schema (DTD or XML Schema) to the database schema. Likewise, when data is taken out of XED and turned into an XML document, the reverse mapping must be performed. The transformation applies to elements, attributes, and text. Because XED cares about the data rather than the format, most of the physical structure of the XML document (CDATA sections, entities, etc.) and part of its logical structure (processing instructions, comments, etc.) are ignored in the process, and only the data is saved. The conversion can therefore lose information: an XML document that has been stored in XED may come back in a different format. Compared with the round-tripping that NXD provides, XED can preserve information only at the data level.
Query support for XML data
Because the XML document schema and the XED schema are hard to keep consistent, XSLT is often used to perform the conversion during access. But XSLT is very expensive and has a large impact on query performance. A better solution is for XED to provide a query language that returns XML documents. Many XED products already provide such a language, and the approaches fall into three main types:
· Template-based query. This is the most popular approach for XED built on an RDBMS. SQL statements are embedded in a prepared XML document template and are replaced by their results when the query is actually executed.
· SQL-based query. Queries on XML data are carried out by extending the implementation of SQL with support for XML. For example, Oracle9i R2 adds the XMLType type and a number of new packages to support XML DB.
· XML query. This includes XPath and XQuery. Unlike the two approaches above, this kind of query is built on the XML document model. An XED that wants to support it must therefore provide virtual XML documents. At present, XED products basically support only XPath.
Data types, null values, and character sets
Converting between XML documents and XED raises problems of data type matching, null handling, and character set handling. Apart from some unparsed entities, everything in an XML document is represented as text. During conversion, type mismatches may occur, for example because of limitations of the JDBC driver or because of different internationalized date formats. XML documents support null values in flexible ways, such as omitting an element or an attribute, and these differ from the way XED represents nulls. The same kind of problem also arises with character sets, binary data, and the handling of XML document tags.
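The template-based approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the template syntax of any real product: the `<query>` placeholder element, the `<row>` result element, the `expand_template` and `run_query` functions, and the sample table are all hypothetical. SQLite stands in for the relational database, and a NULL column is represented by omitting the element, which is one of the flexible null conventions mentioned above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical template: the <query> element holds the embedded SQL statement.
TEMPLATE = """<students>
  <query>SELECT id, name FROM students ORDER BY id</query>
</students>"""

def run_query(conn, sql):
    """Run sql and return one <row> element per result row."""
    cursor = conn.execute(sql)
    # Column names are used as tag names, so they must be valid XML names.
    names = [description[0] for description in cursor.description]
    rows = []
    for record in cursor:
        row = ET.Element("row")
        for name, value in zip(names, record):
            if value is None:
                continue  # assumed convention: represent NULL by omitting the element
            ET.SubElement(row, name).text = str(value)
        rows.append(row)
    return rows

def expand_template(template, conn):
    """Replace every <query> element in the template with its result rows."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):  # snapshot, because the tree is modified
        new_children = []
        for child in list(parent):
            if child.tag == "query":
                new_children.extend(run_query(conn, child.text))
            else:
                new_children.append(child)
        parent[:] = new_children
    return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1001, "Alice"), (1002, None)])

root = expand_template(TEMPLATE, conn)
print(ET.tostring(root, encoding="unicode"))
```

The application only ever sees XML, while the query itself remains ordinary SQL executed by the relational engine.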
Comparison of XED and NXD
Strengths and weaknesses of XED
Advantages:
· Users do not need to migrate their existing databases to a new system; with only slight changes they can support XML applications.
· Traditional database technologies such as concurrency control and transactions are very mature.
· Existing knowledge of and experience with traditional databases remain valid, so users do not need to learn a new database technology in order to use XML.
Disadvantages:
· XML documents must be "broken apart" when stored and "reassembled" when retrieved, which is time-consuming and may also change the format of the document.
· The mapping between XML documents and the database can be complex and requires a large investment in the early development phase.
· Processing performance is poor for "document-centric" XML documents with complex formats.
· Adoption of XML technical standards lags behind.
Strengths and weaknesses of NXD
Advantages:
· Accessing XML documents requires no schema conversion, so access is fast.
· Support for XML documents is better than in XED.
· Most of the latest XML technical standards are supported.
Disadvantages:
· Its traditional database technology has not yet stood the test of time.
· Knowledge, support personnel, and documentation resources are relatively scarce.
· Its applications are limited to the XML field.
In fact, there is no single answer to which of the two to choose; it depends on the specific application. When the document format is simple and the data content matters more than the format, XED is a good choice, especially when an XML access interface is being added to an existing traditional database. Conversely, if the XML document format is complex, if the data itself is hierarchical, or if only XML data is involved, NXD is worth considering, because it offers better performance and fuller support for XML standards.
In addition, because NXD has not yet stood the test of time in traditional database technologies such as transactions and data recovery, it is rarely used in applications with high data security requirements, such as the databases of banks and financial systems. There XED, which is built on a traditional database, has the advantage.
NXD status and prospects