XML Features in ADO.NET
Release Date: 4/1/2004 | Update Date: 4/1/2004
Dino Esposito
Wintellect
December 13, 2001
There is no doubt that XML and some of its related technologies (including XPath, XSL Transformation, and XML Schema) are the foundation of ADO.NET. Compared with ADO, the interoperability of the ADO.NET object model has been greatly improved, and XML is the key element behind that improvement. In ADO, XML is just a (non-default) I/O format for persisting the content of a disconnected recordset. The participation of XML in ADO.NET goes much deeper. The stronger interaction and integration between ADO.NET and XML can be summarized in the following points:
• Object serialization and remoting
• A dual programming interface
• XML-driven batch update (SQL Server 2000 only)
In ADO.NET, you can save objects to XML documents and restore objects from XML documents. Strictly speaking, only the DataSet object has this ability, but you can extend it to other container objects with minimal code. Saving objects such as DataTable and DataView can be treated as a special case of DataSet serialization.
In addition, the ADO.NET and XML classes provide a unified intermediate API that programmers can use through a synchronized dual programming interface. You can access and update data using either the hierarchical, node-based approach of XML or the relational, column- and table-based approach of the DataSet. You can switch from the DataSet representation of the data to the XML DOM at any time, and vice versa. The data is synchronized, and any change you enter in one model is immediately reflected and visible in the other. In this article, I'll discuss ADO.NET-to-XML serialization and XML data access, which are the first two points in the list above. Next month, I'll focus on XML-driven batch update, one of the coolest features you get from the SQL Server 2000 XML Extensions (SQLXML 2.0).
DataSet and XML
Just like any other .NET object, the DataSet object is stored in memory in a binary format. Unlike other objects, however, the DataSet is always remoted and serialized in a special XML format called a DiffGram. When the DataSet crosses the boundary of an application domain or the physical boundary of a machine, it is automatically rendered as a DiffGram. At its destination, the DataSet is silently rebuilt as a binary object and is immediately usable. The same serialization facilities are available to applications through a set of methods, the most prominent of which are ReadXml and WriteXml. The following table lists the methods you can use to read and write a DataSet as XML.
GetXml
• Returns a string that is the XML representation of the data stored in the DataSet
• Does not include any schema information
GetXmlSchema
• Returns a string that is the XML schema information of the DataSet
ReadXml
• Populates a DataSet object with the specified XML data read from a stream or a file
ReadXmlSchema
• Loads the specified XML schema information into the current DataSet object
WriteXml
• Writes out the XML data (and, optionally, the schema) that represents the DataSet
• Can write to a stream or a file
WriteXmlSchema
• Writes out a string that is the XML schema information of the DataSet
• Can write to a stream or a file
As the table shows, when working with DataSet and XML you can manage data and schema information as distinct entities. You can take the XML schema out of the DataSet and use it as a string. You can also write it to a disk file or load it into an empty DataSet object. Alongside the methods listed in the table above, the DataSet object features two XML-related properties: Namespace and Prefix. Namespace determines the XML namespace used to scope XML attributes and elements when they are read into a DataSet. The prefix that aliases the namespace is stored in the Prefix property.
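As a minimal sketch of treating data and schema as separate entities, the following snippet builds a small in-memory DataSet and extracts both strings. The table, column names, and namespace URI are illustrative assumptions, not taken from the article.

```csharp
using System;
using System.Data;

// Build a small in-memory DataSet; every name below is illustrative.
var ds = new DataSet("MyDataSet");
ds.Namespace = "urn:example-customers"; // hypothetical namespace URI
DataTable customers = ds.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
customers.Columns.Add("Name", typeof(string));
customers.Rows.Add(1, "Acme");

// Data only: GetXml does not include schema information.
string dataXml = ds.GetXml();

// Schema only: GetXmlSchema returns the XSD as a string.
string schemaXml = ds.GetXmlSchema();

Console.WriteLine(dataXml);
Console.WriteLine(schemaXml);
```

The two strings can then be stored or transmitted independently, which is the separation the table above describes.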
Building a DataSet from XML
The ReadXml method fills a DataSet object by reading from a variety of sources, including disk files, .NET streams, and instances of XmlReader objects. The method can process any type of XML file, but XML files with an irregular, non-tabular structure may of course cause problems when rendered in terms of rows and columns.
The ReadXml method has several overloads, all of which are rather similar. They take the XML source plus an optional XmlReadMode value. For example:
public XmlReadMode ReadXml(String, XmlReadMode);
The method creates the relational schema for the DataSet according to the specified read mode and whether or not a schema already exists in the DataSet. The following code snippet shows typical code for loading a DataSet from XML.
// Requires: using System.Data; using System.IO;
// fileName is the path of the XML file to load
StreamReader sr = new StreamReader(fileName);
DataSet ds = new DataSet();
ds.ReadXml(sr);   // defaults to XmlReadMode.Auto
sr.Close();
When the content of the XML source is loaded into a DataSet, ReadXml does not merge the new rows with existing rows that have matching primary key information. To merge an existing DataSet with data loaded from XML, you must first create a new DataSet and then use the Merge method to merge the two. During the merge, the rows that get overwritten are those with matching primary keys. Another way to merge an existing DataSet object with content read from XML is through the DiffGram format (discussed later).
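The load-then-merge approach described above can be sketched as follows. This is an illustration under assumed names (a Customers table keyed on CustomerID and an inline XML string standing in for a file), not the author's code.

```csharp
using System;
using System.Data;
using System.IO;

// Existing DataSet with a primary key on CustomerID (illustrative schema).
var existing = new DataSet("MyDataSet");
DataTable customers = existing.Tables.Add("Customers");
DataColumn idColumn = customers.Columns.Add("CustomerID", typeof(int));
customers.Columns.Add("Name", typeof(string));
customers.PrimaryKey = new[] { idColumn };
customers.Rows.Add(1, "Old name");
customers.Rows.Add(2, "Unchanged");
existing.AcceptChanges();

// Load the XML into a second DataSet that shares the same schema.
string xml =
    "<MyDataSet>" +
    "<Customers><CustomerID>1</CustomerID><Name>New name</Name></Customers>" +
    "<Customers><CustomerID>3</CustomerID><Name>Added</Name></Customers>" +
    "</MyDataSet>";
DataSet incoming = existing.Clone();
incoming.ReadXml(new StringReader(xml), XmlReadMode.IgnoreSchema);

// Merge: rows with matching primary keys are overwritten, others are added.
existing.Merge(incoming);

string mergedName = (string)customers.Select("CustomerID = 1")[0]["Name"];
Console.WriteLine($"{customers.Rows.Count} rows, customer 1 is now '{mergedName}'");
```

Cloning the existing DataSet is one convenient way to give the temporary DataSet the same schema and key definition before loading.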
The following table explains the read modes supported by ReadXml. You set them through the XmlReadMode enumeration.
IgnoreSchema
• Ignores any inline schema and relies on the DataSet's existing schema
ReadSchema
• Reads any inline schema and loads the data and the schema
InferSchema
• Ignores any inline schema and infers the schema from the XML data
DiffGram
• Reads a DiffGram and adds the data to the current schema
Fragment
• Reads and adds XML fragments until the end of the stream is reached
The default read mode, XmlReadMode.Auto, is not listed in the table. When this mode is set, or when no read mode is explicitly set, the ReadXml method examines the XML source and chooses the most appropriate option.
If the XML source is found to be a DiffGram, it is loaded as a DiffGram. If the source includes an inline schema or a reference to an external schema, it is loaded through ReadXmlSchema. Finally, if there is no schema information in the XML source, ReadXml infers the schema using the DataSet's InferXmlSchema method. The relational structure (that is, the schema) of a DataSet consists of tables, columns, constraints, and relations. Let's take a look at what happens when you set each of these modes.
The XmlReadMode.IgnoreSchema option causes the method to ignore any inline or referenced schema. The data is therefore loaded into the existing DataSet schema, and any data that does not fit is discarded. If no schema exists in the DataSet, no data is loaded. Note that an empty DataSet has no schema information. Bear in mind that if the XML source is in the DiffGram format, the IgnoreSchema option has the same effect as XmlReadMode.DiffGram.
// No schema in the DataSet, so no data will be loaded
DataSet ds = new DataSet();
StreamReader sr = new StreamReader(fileName);
ds.ReadXml(sr, XmlReadMode.IgnoreSchema);
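To make the IgnoreSchema behavior concrete, here is a small sketch that loads the same XML twice: once into an empty DataSet and once into a DataSet whose schema covers only part of the data. The XML string and all names are assumptions made for illustration.

```csharp
using System;
using System.Data;
using System.IO;

string xml =
    "<MyDataSet>" +
    "<Customers><CustomerID>1</CustomerID><Name>Acme</Name></Customers>" +
    "</MyDataSet>";

// Case 1: an empty DataSet has no schema, so nothing is loaded.
var empty = new DataSet();
empty.ReadXml(new StringReader(xml), XmlReadMode.IgnoreSchema);

// Case 2: the schema knows only CustomerID, so the Name values are discarded.
var partial = new DataSet("MyDataSet");
DataTable customers = partial.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
partial.ReadXml(new StringReader(xml), XmlReadMode.IgnoreSchema);

Console.WriteLine($"Empty DataSet tables: {empty.Tables.Count}");
Console.WriteLine($"Rows loaded into existing schema: {customers.Rows.Count}");
```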
The XmlReadMode.ReadSchema option works only with inline schemas and does not recognize external references. It can add new tables to the DataSet, but if any table defined in the inline schema already exists in the DataSet, an exception is thrown. You cannot use the ReadSchema option to change the schema of an existing table. If the DataSet contains no schema (that is, the DataSet is empty) and there is no inline schema, no data is read or loaded. ReadXml can read only inline schemas defined with the XML Schema definition language (XSD) or XML-Data Reduced (XDR). Document type definitions (DTD) are not supported.
If the XmlReadMode.InferSchema option is set, ReadXml infers the schema directly from the structure of the XML data and ignores any inline schema that may be present. The data is loaded only after the schema has been inferred. Existing schemas are extended by adding new tables, or by adding new columns to existing tables, as appropriate. You can use the DataSet's InferXmlSchema method to load the schema from a specified XML file into the DataSet. To some extent, you can control which XML elements are processed during the schema inference operation. The signature of InferXmlSchema lets you specify an array of namespaces whose elements are excluded from inference.
void InferXmlSchema(String fileName, String[] rgNamespace);
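The difference between inferring while loading and inferring the schema alone can be sketched as below. The XML string and element names are hypothetical, and the snippet uses the TextReader overloads instead of a file name so that it stays self-contained.

```csharp
using System;
using System.Data;
using System.IO;

string xml =
    "<Root>" +
    "<Item><Id>1</Id><Name>first</Name></Item>" +
    "<Item><Id>2</Id><Name>second</Name></Item>" +
    "</Root>";

// ReadXml with InferSchema: the schema is inferred, then the data is loaded.
var inferred = new DataSet();
inferred.ReadXml(new StringReader(xml), XmlReadMode.InferSchema);

// InferXmlSchema: only the schema is created; no rows are loaded.
// The empty array means that no namespace is excluded from inference.
var schemaOnly = new DataSet();
schemaOnly.InferXmlSchema(new StringReader(xml), new string[0]);

foreach (DataColumn column in inferred.Tables["Item"].Columns)
    Console.WriteLine($"{column.ColumnName}: {column.DataType.Name}");
```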
A DiffGram is the XML format that ADO.NET uses to persist the state of a DataSet. Similar to the SQLXML updategram format, the DiffGram contains both the current and the original state of the data rows. When ReadXml loads a DiffGram, rows with matching primary keys are merged. You can explicitly tell ReadXml to work on a DiffGram by using the XmlReadMode.DiffGram flag. When using the DiffGram format, the target DataSet must have the same schema as the DiffGram; otherwise, the merge operation fails and an exception is thrown.
If the XmlReadMode.Fragment option is set, the DataSet is loaded from an XML fragment. An XML fragment is a valid piece of XML that identifies elements, attributes, and documents. The XML fragment for an element is the markup text that fully qualifies the XML element (node, CDATA, processing instruction, comment). The fragment for an attribute is the attribute value, and the fragment for a document is its entire content set. When the XML data is a fragment, the root-level rules for XML documents are not applied. Fragments that match the existing schema are appended to the appropriate tables, and fragments that do not match the schema are discarded. ReadXml reads from the current position to the end of the stream. The XmlReadMode.Fragment option should not be used to populate an empty DataSet that lacks a schema.
Serializing DataSet Objects to XML
The XML representation of a DataSet can be written to a file, a stream, an XmlWriter object, or a string using the WriteXml method. The XML representation can include or exclude schema information. The actual behavior of the WriteXml method is controlled through the XmlWriteMode parameter you can pass. The values of the XmlWriteMode enumeration determine the layout of the output. The DataSet representation includes tables, relations, and constraint definitions. Unless you choose the DiffGram format, the rows in the DataSet's tables are written only in their current versions. The following table summarizes the write options available through XmlWriteMode.
IgnoreSchema
• Writes the contents of the DataSet as XML data without a schema
WriteSchema
• Writes the contents of the DataSet with an inline XSD schema
DiffGram
• Writes the DataSet as a DiffGram, including both original and current values
XmlWriteMode.IgnoreSchema is the default option. The following code shows the typical way to serialize a DataSet to XML.
// ds is the DataSet; fileName is the path of the output file
StreamWriter sw = new StreamWriter(fileName);
ds.WriteXml(sw);   // defaults to XmlWriteMode.IgnoreSchema
sw.Close();
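As a rough comparison of the three write modes, the sketch below serializes the same DataSet to strings. The Serialize helper and the table layout are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Data;
using System.IO;

var ds = new DataSet("MyDataSet");
DataTable customers = ds.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
customers.Rows.Add(1);
ds.AcceptChanges();

string dataOnly = Serialize(ds, XmlWriteMode.IgnoreSchema);   // data, no schema
string withSchema = Serialize(ds, XmlWriteMode.WriteSchema);  // data plus inline XSD
string diffGram = Serialize(ds, XmlWriteMode.DiffGram);       // DiffGram layout

Console.WriteLine(withSchema);

// Hypothetical helper: writes a DataSet to a string in the given mode.
static string Serialize(DataSet source, XmlWriteMode mode)
{
    using var writer = new StringWriter();
    source.WriteXml(writer, mode);
    return writer.ToString();
}
```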
There are several factors that affect the final structure of the XML document created from the DataSet object. These factors include:
• The overall XML format used: DiffGram or a plain representation of the current contents
• Whether schema information is present
• Nested relations
• How table columns are mapped to XML elements
The DiffGram format is a special XML format that I'll explain further in a moment. It does not include schema information, but it does preserve row state and row errors. It therefore appears to be a closer representation of the live DataSet instance.
Schema information, if present in the DataSet being written, is always written as inline XSD. There is no way to write it as XDR or DTD, or to add a reference to an external file. The root node of the generated XML file takes the name of the DataSet, or NewDataSet if no name has been specified. The following code snippet is an example of the XML representation of a DataSet object made of two tables, Customers and Orders, related through the CustomerID field.
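<MyDataSet>
<Customers>
:
</Customers>
<Customers>
:
</Customers>
<Orders>
:
</Orders>
<Orders>
:
</Orders>
</MyDataSet>
You can hardly tell from the listing above that a relation exists between the two tables. Some of this information is set in the embedded schema, but nothing in the data itself reveals it. When the relation is nested (the Nested property of the DataRelation object is set to true), the XML output changes as follows:
<MyDataSet>
<Customers>
:
<Orders>
:
</Orders>
</Customers>
<Customers>
:
</Customers>
</MyDataSet>
As you can see, all orders are now grouped under the corresponding customer subtree.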
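The effect of a nested relation can be reproduced with a few lines of code. This is a minimal sketch; the relation name and column layout are assumptions made for the example.

```csharp
using System;
using System.Data;

var ds = new DataSet("MyDataSet");
DataTable customers = ds.Tables.Add("Customers");
DataColumn parentKey = customers.Columns.Add("CustomerID", typeof(int));
DataTable orders = ds.Tables.Add("Orders");
orders.Columns.Add("OrderID", typeof(int));
DataColumn childKey = orders.Columns.Add("CustomerID", typeof(int));
customers.Rows.Add(1);
orders.Rows.Add(100, 1);

// Relate the two tables through CustomerID (relation name is illustrative).
DataRelation relation = ds.Relations.Add("CustomersOrders", parentKey, childKey);

// Default layout: Orders rows follow the Customers rows as siblings.
string flat = ds.GetXml();

// Nested layout: each Orders row is written inside its parent Customers row.
relation.Nested = true;
string nested = ds.GetXml();

Console.WriteLine(flat);
Console.WriteLine(nested);
```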
By default, table columns are rendered in XML as element nodes. However, this is just a setting that can be adjusted on a per-column basis. The DataColumn object has a property called ColumnMapping that determines how the column is rendered in XML. The ColumnMapping property takes one of the values of the MappingType enumeration listed below.
Element
• Mapped to an XML element node: <CustomerID>value</CustomerID>
Attribute
• Mapped to an XML attribute: <Customers CustomerID="value">
Hidden
• Not rendered in the XML data, unless the DiffGram format is used
SimpleContent
• Mapped to simple text
If the XML output format is DiffGram, the Hidden mapping type is ignored. In this case, however, the DiffGram representation of the column includes a special attribute that marks the column as hidden from XML serialization. The SimpleContent mapping type is not always available and can be used only when the table has a single column.
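A short sketch of per-column mapping follows: one column is switched to an attribute, and another is then hidden. The table and values are illustrative assumptions.

```csharp
using System;
using System.Data;

var ds = new DataSet("MyDataSet");
DataTable customers = ds.Tables.Add("Customers");
DataColumn idColumn = customers.Columns.Add("CustomerID", typeof(int));
DataColumn nameColumn = customers.Columns.Add("Name", typeof(string));
customers.Rows.Add(1, "Acme");

// Render CustomerID as an attribute of the row element instead of a child element.
idColumn.ColumnMapping = MappingType.Attribute;
string asAttribute = ds.GetXml();

// Hide the Name column from ordinary (non-DiffGram) serialization.
nameColumn.ColumnMapping = MappingType.Hidden;
string nameHidden = ds.GetXml();

Console.WriteLine(asAttribute);
Console.WriteLine(nameHidden);
```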
The DiffGram Format
A DiffGram is just an XML string, written according to a specific schema, that represents the contents of a DataSet. It is in no way a .NET type. The following code snippet shows how to serialize a DataSet object to a DiffGram.
StreamWriter sw = new StreamWriter(fileName);
ds.WriteXml(sw, XmlWriteMode.DiffGram);
sw.Close();
The resulting XML code is rooted in the <diffgr:diffgram> node and contains up to three distinct data sections, as shown below:
<diffgr:diffgram>
<MyDataSet>
:
</MyDataSet>
<diffgr:before>
:
</diffgr:before>
<diffgr:errors>
:
</diffgr:errors>
</diffgr:diffgram>
The first section of the DiffGram is mandatory and represents the current instance of the data. It is nearly identical to the XML output you get from ordinary serialization. The main difference between the two is that the DiffGram format never includes schema information.
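This data section holds the current values of the data rows. The original rows, including deleted rows, are stored in the <diffgr:before> section. Finally, the <diffgr:errors> section holds the error information attached to rows. Rows in the DiffGram are annotated with special attributes such as the following:
diffgr:hasChanges
• The row has been modified (see the related row in <diffgr:before>)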
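To see the sections in practice, the following sketch modifies one row, writes the DataSet as a DiffGram, and reads it back into a DataSet with the same schema. All table and column names are assumptions made for illustration.

```csharp
using System;
using System.Data;
using System.IO;

var ds = new DataSet("MyDataSet");
DataTable customers = ds.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
customers.Columns.Add("Name", typeof(string));
customers.Rows.Add(1, "Acme");
customers.Rows.Add(2, "Contoso");
ds.AcceptChanges();

// Modify one row so that it has both a current and an original version.
customers.Rows[0]["Name"] = "Acme Ltd";

using var writer = new StringWriter();
ds.WriteXml(writer, XmlWriteMode.DiffGram);
string diffGram = writer.ToString();
Console.WriteLine(diffGram);

// Read the DiffGram back into a DataSet that has the same schema.
DataSet target = ds.Clone();
target.ReadXml(new StringReader(diffGram), XmlReadMode.DiffGram);
DataRow restored = target.Tables["Customers"].Rows[0];
Console.WriteLine($"{restored.RowState}: was '{restored["Name", DataRowVersion.Original]}'");
```

Because the DiffGram carries no schema, the target DataSet is cloned from the source here so that both share the same structure.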
The ADO.NET framework provides explicit XML support only for the DataSet object. However, converting a DataView or a DataTable to XML is not particularly difficult. In both cases, you use a temporary DataSet as the container to be saved as XML. The code needed to save a DataTable to XML is straightforward.
// Requires: using System.Data; using System.IO;
void WriteDataTableToXml(String fileName, DataTable dt)
{
    // Duplicate the table and add it to a temporary DataSet
    DataSet dsTmp = new DataSet();
    DataTable dtTmp = dt.Copy();
    dsTmp.Tables.Add(dtTmp);
    // Save the temporary DataSet to XML
    StreamWriter sw = new StreamWriter(fileName);
    dsTmp.WriteXml(sw);
    sw.Close();
}
Each ADO.NET object can be referenced by only one container object. For this simple reason, duplicating the DataTable object is essential. You cannot have the same instance of, say, a DataTable object belong to two distinct DataSet objects.
Unlike the DataTable object, a DataView is not a standard part of a DataSet, so to save it to XML you must first convert the DataView into a table object. The following code snippet shows how:
void DataViewToDataTable(DataView dv)
{
    // Clone the structure of the table behind the view
    DataTable dtTemp = dv.Table.Clone();
    dtTemp.TableName = "Row";   // this is arbitrary!
    // Populate the table with the rows in the view
    foreach (DataRowView drv in dv)
        dtTemp.ImportRow(drv.Row);
    // Giving a custom name to the DataSet can help you
    // come up with a clearer layout, but is not mandatory
    DataSet dsTemp = new DataSet(dv.Table.TableName);
    // Add the new table to a temporary DataSet
    dsTemp.Tables.Add(dtTemp);
    // dsTemp can now be serialized with WriteXml
}
The first step is to clone the structure of the table behind the DataView object being processed. Next, you walk through all the records in the view and add the corresponding rows to the temporary DataTable. You then add this DataTable to a temporary DataSet and serialize it. You can also give the DataSet the name of the table, which gives the whole XML output a custom layout. For example:
<TableName>
<Row>
:
</Row>
<Row>
:
</Row>
<Row>
:
</Row>
</TableName>
The XmlDataDocument Class
The XML and ADO.NET frameworks provide a unified model for accessing data represented both as XML and as relational data. The key XML class is XmlDataDocument, whereas DataSet is the key ADO.NET class. Specifically, XmlDataDocument inherits from the base class XmlDocument and differs from it only in its ability to synchronize with DataSet objects. When synchronized, the DataSet and XmlDataDocument classes target the same collection of rows, and you can apply changes through both interfaces (nodes and relational tables) and have them immediately visible to both classes. Basically, DataSet and XmlDataDocument provide two sets of tools for the same data. As a result, you can apply XSLT transformations to relational data, query relational data through XPath expressions, and use SQL to select XML nodes.
You can bind a DataSet object and an XmlDataDocument object in several ways. The first is to pass a non-empty DataSet object to the constructor of the XmlDataDocument class.
XmlDataDocument doc = new XmlDataDocument(dataset);
Like its base class, XmlDataDocument provides an XML DOM approach to working with XML data, and is therefore quite different from XML readers and writers. The following example shows another way to synchronize the two objects: creating a valid, non-empty DataSet object from a non-empty instance of the XML DOM.
XmlDataDocument doc = new XmlDataDocument();
doc.Load(fileName);
DataSet dataset = doc.DataSet;
You turn an XML document into a DataSet object by using the DataSet property of XmlDataDocument. The property instantiates, populates, and returns a DataSet object. The DataSet is associated with the XmlDataDocument the first time you access the DataSet property. The methods GetElementFromRow and GetRowFromElement switch between the XML and the relational views of the data. To view the XML data relationally, you must first specify the schema to use for data mapping. You can do this by calling the ReadXmlSchema method on the same XML file. Alternatively, you can manually create the necessary tables and columns in the DataSet.
Yet another way of synchronizing XmlDataDocument and DataSet objects is to associate them while both are still empty and populate them afterward. For example:
DataSet dataset = new DataSet();
XmlDataDocument xmldoc = new XmlDataDocument(dataset);
xmldoc.Load("file.xml");
Keeping the two objects synchronized provides unprecedented flexibility: as mentioned earlier, you can use two radically different types of navigation to move through records. In fact, you can use SQL-like queries on XML nodes and XPath queries on relational rows.
Not all XML files can be successfully synchronized with a DataSet. For synchronization to work, XML documents must have a regular, tabular structure that can be mapped to a relational structure in which each row has the same number of columns. When rendered as DataSet objects, XML documents lose any XML-specific information they may have that has no relational counterpart. This information includes comments, declarations, and processing instructions.
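The dual view can be sketched as follows: an XPath query selects rows of a DataSet through the synchronized document, and the two mapping methods switch between elements and rows. The data is illustrative, and the snippet assumes a runtime where XmlDataDocument is still available; newer .NET versions mark the class as obsolete, hence the pragma.

```csharp
#pragma warning disable CS0618 // XmlDataDocument is marked obsolete in newer .NET versions
using System;
using System.Data;
using System.Xml;

var ds = new DataSet("MyDataSet");
DataTable customers = ds.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
customers.Columns.Add("Name", typeof(string));
customers.Rows.Add(1, "Acme");
customers.Rows.Add(2, "Contoso");

// Bind the document to the non-empty DataSet.
var doc = new XmlDataDocument(ds);

// XPath query over relational data.
XmlNodeList nodes = doc.DocumentElement.SelectNodes("descendant::Customers[CustomerID='1']");

// Switch from the XML view to the relational view, and back.
DataRow row = doc.GetRowFromElement((XmlElement)nodes[0]);
XmlElement element = doc.GetElementFromRow(customers.Rows[1]);

Console.WriteLine($"XPath matched: {row["Name"]}");
Console.WriteLine($"Element for second row: {element.Name}");
```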
Summary
In ADO.NET, XML is much more than a simple output format for serializing content. You can use XML to serialize the entire contents of a DataSet object, but you can also choose the actual XML schema and control the structure of the resulting XML document. You can write out the contents of a DataSet, including tables and relations; you can have schema information included in the final document; and you can even use the DiffGram format.
ADO.NET's interaction and integration with XML offers even more. In particular, in .NET you can provide and exploit two equivalent yet independent views of the same data, each following a different logical data representation.
Dialogue: Using GetChanges for Batch Update
I have discovered that the DataSet programming interface provides a method called GetChanges, which returns a smaller DataSet populated only with the rows that have been updated in all the contained tables. This made me think that using this smaller DataSet instead of the original one could improve performance. However, in one of your previous articles (I can't remember the title), you mentioned that this could trigger some unknown exceptions. So my question is, can you explain more clearly the use of the DataSet's GetChanges method in batch updates?
ADO.NET batch update is based on a loop that walks through the rows of the specified table. The code checks the state of each row and decides which operation to perform. The loop works on the DataSet and the DataTable that you pass as arguments to the adapter's Update method. Whether you call Update on the original DataSet or on the smaller DataSet returned by GetChanges, the result is substantially the same. The difference shows up only at the lowest level, and its sole effect is to shorten the loop.
During a batch update, rows are processed one at a time, going from the middle tier to the data server. No snapshot of the data is sent to the database in one shot or as a single block of data. Had that been the case, using GetChanges would indeed have resulted in a far greater optimization.
What determines the number of significant operations performed during a batch update is the number of modified rows. This parameter does not change, whether you use the original DataSet or the one returned by GetChanges.
Conversely, if you batch update using the DataSet returned by GetChanges, you can run into serious trouble when a conflict is detected. In this case, the rows processed before the failing row are regularly committed, but not on the original DataSet! To ensure the consistency of the application, you must accept the changes on the committed rows on the original DataSet as well. That code is entirely up to you. All in all, the batch update code is much simpler if you use the original DataSet.
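The pitfall can be illustrated without a database. In this sketch, calling AcceptChanges on the copy merely simulates what a successful update would do to the DataSet returned by GetChanges; the table and values are assumptions.

```csharp
using System;
using System.Data;

var ds = new DataSet("MyDataSet");
DataTable customers = ds.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
customers.Columns.Add("Name", typeof(string));
customers.Rows.Add(1, "Acme");
customers.Rows.Add(2, "Contoso");
customers.Rows.Add(3, "Fabrikam");
ds.AcceptChanges();

// Only one of the three rows is modified.
customers.Rows[1]["Name"] = "Contoso Ltd";

// GetChanges returns a separate, smaller DataSet holding only the changed rows.
DataSet changes = ds.GetChanges();
int changedRows = changes.Tables[0].Rows.Count;

// Simulate a successful update on the copy: its rows are now committed...
changes.AcceptChanges();

// ...but the original DataSet still reports pending changes.
bool originalStillDirty = ds.HasChanges();

Console.WriteLine($"Changed rows: {changedRows}, original still dirty: {originalStillDirty}");
```

Because the two DataSet objects are independent, committing rows on the copy leaves the original untouched, which is why the extra synchronization code becomes necessary.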
Dino Esposito is Wintellect's ADO.NET expert and a trainer and consultant based in Rome, Italy. Dino is a contributing editor to MSDN Magazine, where he writes the Cutting Edge column. He also regularly contributes to Developer Network Journal and MSDN News. Dino is the author of the book Building Web Solutions with ASP.NET and ADO.NET and is also one of the founders of http://www.vb2themax.com/. You can reach Dino by email at dinoe@wintellect.com.