ADO.NET Batch Update in Depth (Diving into Data Access)
Release Date: 4/1/2004 | Update Date: 4/1/2004
Dino Esposito
Wintellect
November 8, 2001
The interaction between ADO.NET applications and the underlying data sources is based on a dual architecture with two-way channels. You access a data source to read and write rows using either individual, provider-specific commands or the batch update procedure. In both cases, data access results in a complete two-way binding and involves different objects and methods. You use command classes such as SqlCommand and OleDbCommand to execute single commands. You use data adapter objects to download disconnected data and to submit sets of updated rows. Individual commands return data through data reader objects, whereas the DataSet is the container object that the data adapter uses to return and submit blocks of records.
Updates performed through individual commands, stored procedures, and in general any command text the provider understands are normally referred to simply as updates. An update command always carries the new data embedded in the body of the statement. It always requires an open connection and may also require an ongoing or a new transaction. Batch update is the offshoot of a slightly different approach. At the highest level of abstraction, you don't issue a command, no matter how complex it may be. Instead, you submit a snapshot of the current rows as modified on the client and wait for the data source to approve it. The key concept behind batch update is data disconnection. You download a table of rows, typically into a DataSet, modify it as needed on the client, and then submit the new image of those rows to the database server. You submit changes rather than executing a command that creates changes in the data source. This is the essential difference between update (which I covered in my July column) and batch update.
The figure below illustrates the dual update architecture of ADO.NET.
Figure 1. Two-way interactions between ADO.NET applications and data sources
Before going any further with the details of ADO.NET batch update, I need to clarify one aspect of the batch update model that often leads to misunderstanding. Although update and batch update are essentially different in the way they are implemented in ADO.NET, they follow the same update model. Both are accomplished through direct, provider-specific statements. Of course, because batch update normally involves more rows, these statements are grouped into a batch call. Batch update loops through the rows of the target DataSet from first to last and issues the proper update command (INSERT, DELETE, or UPDATE) whenever it finds an updated row. For each updated row, a predefined direct SQL command is run. In essence, this is batch update.
It could not be otherwise. If batch update used a completely different update model, it would require special support from the data source. (This is exactly what happens with SQL Server 2000.) Batch update is just a client-side software mechanism that simplifies the submission of multiple row updates. In any case, each new row is always submitted through the normal channel of direct commands to the data source.
So far I have only hinted at SQL commands, but those hints point to an important difference between the ADO and ADO.NET implementations of batch update. In ADO, batch update is possible only with SQL-based data sources. In ADO.NET, batch update is possible with any kind of managed provider, including providers that are not supposed to expose their data through the SQL query language. With that said, it's time to review the key aspects of ADO.NET batch update programming.
Preparing the DataSet for submission
ADO.NET batch update takes place through the Update method of the data adapter object. Data can be submitted only on a per-table basis. If you call Update without specifying a table name, the default name Table is assumed. If no table with that name exists, an exception is raised. Update first examines the RowState property of each table row and then prepares a tailor-made INSERT, UPDATE, or DELETE statement for each inserted, updated, or deleted row in the specified table. The Update method has several overloads. It can take a DataSet and table name pair, a DataTable, or even an array of DataRow objects. The method returns an integer value, which is the number of rows successfully updated.
To minimize network traffic, you normally call Update on a subset of the DataSet you are working on. Needless to say, this subset contains only the rows that have been modified so far. You get such a subset by calling the GetChanges method of the DataSet.
if (ds.HasChanges())
{
    DataSet dsChanges = ds.GetChanges();
    adapter.Update(dsChanges, "MyTable");
}
As the code shows, you can first check whether the DataSet has any changes by calling its HasChanges method, which returns a Boolean value.
The DataSet returned by GetChanges contains the rows that have been inserted, deleted, or modified in the meantime. But in the meantime of what? This is one tricky aspect of ADO.NET batch update, and it has to do with the current state of a table row.
States of a row
Each row in a DataTable is represented by a DataRow object. A DataRow object exists mainly to be an element of the Rows collection of its parent DataTable object. Conceptually, a database row is inherently tied to the structure of a given table. For this reason, the DataRow class in ADO.NET does not provide a public constructor. The only way to create a new DataRow object is to call the NewRow method on a live instance of a DataTable object. A newly created row does not yet belong to the Rows collection of the parent table, and the relationship between the row and this collection determines the state of the row. The table below shows the possible values of the RowState property. These values are grouped in the DataRowState enumeration.
Added: The row has been added to the table.
Deleted: The row has been marked for deletion from the parent table.
Detached: The row has been created but not yet added to the table, or the row has been removed from the table's collection of rows.
Modified: The row has been changed.
Unchanged: No changes have been made to the row since it was created or since the last call to the AcceptChanges method.
The RowState property of each row influences the return value of the HasChanges method and the contents of the child DataSet returned by GetChanges.
As these values show, the value of RowState depends mostly on the operation that has been performed on the row. ADO.NET tables implement a transaction-like commit model based on two methods: AcceptChanges and RejectChanges. When a table is downloaded from the data source or freshly created in memory, all of its rows are unchanged. The changes you enter do not become permanent immediately, and you can roll them back at any time by calling RejectChanges. You can call the RejectChanges method at three levels:
• At the DataSet level, to reject all changes, whatever they are.
• At the DataTable level, to cancel all changes in a particular table.
• At the level of a particular row, to restore the previous state of that row.
The AcceptChanges method commits all pending changes. It makes the DataSet accept the current values as the new original values. As a result, all pending changes are cleared. Like RejectChanges, AcceptChanges can be called on the entire DataSet, on a table, or on an individual row.
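The state transitions described above can be observed without any database. The following is a minimal sketch that works on an in-memory table; the class name RowStateDemo and the two-column schema are made up for illustration and do not come from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowStateDemo
{
    // Builds a small in-memory table (hypothetical schema) and records
    // the RowState of one row after each operation.
    public static List<DataRowState> Trace()
    {
        var states = new List<DataRowState>();
        var table = new DataTable("Employees");
        table.Columns.Add("employeeid", typeof(int));
        table.Columns.Add("lastname", typeof(string));

        DataRow row = table.NewRow();   // created, but not yet in Rows
        row["employeeid"] = 1;
        row["lastname"] = "Davolio";
        states.Add(row.RowState);       // Detached

        table.Rows.Add(row);
        states.Add(row.RowState);       // Added

        table.AcceptChanges();          // current values become original
        states.Add(row.RowState);       // Unchanged

        row["lastname"] = "Fuller";
        states.Add(row.RowState);       // Modified

        table.RejectChanges();          // roll back the edit
        states.Add(row.RowState);       // Unchanged

        row.Delete();                   // logical deletion only
        states.Add(row.RowState);       // Deleted

        return states;
    }
}
```

Calling RowStateDemo.Trace() yields the sequence Detached, Added, Unchanged, Modified, Unchanged, Deleted, which mirrors the values of the DataRowState enumeration listed earlier.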
When you start a batch update operation, only the rows marked as Added, Deleted, or Modified are considered for submission. If you happen to call AcceptChanges before the batch update, no changes are persisted to the data source. On the other hand, once the batch update operation has completed successfully, you must call AcceptChanges to clear the pending changes and mark the current DataSet values as the original values. Note that if you omit this final call to AcceptChanges, the DataSet retains the pending changes, and they are issued again the next time you submit a batch update.
// Get the changes in the DataSet
dsChanges = ds.GetChanges();
// Perform the batch update for the given table
da.Update(dsChanges, strTable);
// Clear any pending changes in memory
ds.AcceptChanges();
The code above illustrates the three main steps behind ADO.NET batch update.
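To see what GetChanges and AcceptChanges do to the in-memory data, the same three steps can be run against a DataSet with no data adapter at all. This is only a sketch: the class name, the sample rows, and the comment marking where the adapter call would go are assumptions made for illustration.

```csharp
using System;
using System.Data;

public static class GetChangesDemo
{
    // Returns the number of pending rows found by GetChanges, whether the
    // DataSet still reports changes after AcceptChanges, and the number of
    // rows left in the table.
    public static (int Pending, bool DirtyAfter, int Remaining) Run()
    {
        var ds = new DataSet();
        DataTable t = ds.Tables.Add("Employees");
        t.Columns.Add("employeeid", typeof(int));
        t.Columns.Add("lastname", typeof(string));
        t.Rows.Add(1, "Davolio");
        t.Rows.Add(2, "Fuller");
        t.Rows.Add(3, "Leverling");
        ds.AcceptChanges();                 // simulate a freshly downloaded table

        t.Rows[0]["lastname"] = "Smith";    // Modified
        t.Rows[1].Delete();                 // Deleted
        t.Rows.Add(4, "Peacock");           // Added

        // Step 1: extract only the changed rows (the unchanged row 3 is left out)
        DataSet changes = ds.GetChanges();
        int pending = changes.Tables["Employees"].Rows.Count;

        // Step 2: a real application would call da.Update(changes, "Employees") here

        // Step 3: clear the pending changes in the original DataSet
        ds.AcceptChanges();

        return (pending, ds.HasChanges(), t.Rows.Count);
    }
}
```

In this run, three rows are pending before the commit. After AcceptChanges, HasChanges returns false and the table holds three rows, because the logically deleted row is physically dropped at that point.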
If you delete rows from a DataSet table, pay attention to the method you use: Delete or Remove. The Delete method performs a logical deletion by marking the row as Deleted. The Remove method physically removes the row from the Rows collection. As a result, a row deleted through Remove is not marked for deletion and is not processed during a subsequent batch update. If the ultimate goal of your deletion is to remove the row from the data source, use Delete.
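The difference between the two methods is easy to check on an in-memory table. The sketch below uses a hypothetical one-column table and an invented class name; it only inspects row states and does not touch a data source.

```csharp
using System;
using System.Data;

public static class DeleteVersusRemove
{
    // Deletes one row logically and removes another physically, then
    // reports both row states and how many rows a batch update would
    // see as deleted.
    public static (DataRowState Deleted, DataRowState Removed, int SeenAsDeleted) Run()
    {
        var t = new DataTable("Employees");
        t.Columns.Add("employeeid", typeof(int));
        t.Rows.Add(1);
        t.Rows.Add(2);
        t.AcceptChanges();

        DataRow first = t.Rows[0];
        DataRow second = t.Rows[1];

        first.Delete();          // stays in Rows, marked as Deleted
        t.Rows.Remove(second);   // leaves the collection, becomes Detached

        // Only rows marked as Deleted are candidates for a DELETE command
        DataTable deleted = t.GetChanges(DataRowState.Deleted);
        int seen = deleted == null ? 0 : deleted.Rows.Count;

        return (first.RowState, second.RowState, seen);
    }
}
```

Only the row passed through Delete shows up in the set of changes, so only that row would reach the data source.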
Update in depth
Three operations can change the state of a table:
• Inserting a new row
• Deleting an existing row
• Updating an existing row
For each of these key operations, the data adapter defines a tailor-made command object that is exposed as a property. These properties are InsertCommand, DeleteCommand, and UpdateCommand. The programmer is responsible for assigning meaningful command objects, for example SqlCommand objects, to these properties.
The mere availability of the InsertCommand, DeleteCommand, and UpdateCommand properties represents a quantum leap from ADO to ADO.NET. These properties give you unprecedented control over the way updates are submitted to the database server. If you are not happy with the update code that ADO.NET generates, you can now modify it without giving up batch update as a whole. With ADO, you had no control over the SQL commands generated by the library. In ADO.NET, the publicly exposed command objects let you apply updates through custom stored procedures or SQL statements that better match your expectations. In particular, you can have the batch update system work with cross-referenced tables and even target non-SQL data providers such as Active Directory or Indexing Services.
The update commands are expected to run for each changed row in the table, so they must be generic enough to accommodate different values. Command parameters are well suited to this task as long as you can bind them to the values of a database column. ADO.NET parameter objects expose two properties for this kind of binding: SourceColumn and SourceVersion. SourceColumn, in particular, is an indirect way of indicating the parameter's value. Instead of using the Value property and setting it to a scalar value, you set the SourceColumn property to a column name and let the batch update mechanism extract the actual value for each row.
SourceVersion indicates which value of the column should be read. By default, ADO.NET returns the current value of the row. Alternatively, you can select the original value or any other value in the DataRowVersion enumeration. If you want to batch update a few columns of the Employees table in Northwind, you can use the following custom commands. The INSERT command is defined as follows:
StringBuilder sb = new StringBuilder("");
sb.Append("INSERT Employees (firstname, lastname) VALUES (");
sb.Append("@sFirstName, @sLastName)");
da.InsertCommand = new SqlCommand();
da.InsertCommand.CommandText = sb.ToString();
da.InsertCommand.Connection = conn;
All the parameters are added to the Parameters collection of the command and bound to a column of the DataTable.
SqlParameter p1 = new SqlParameter("@sFirstName", SqlDbType.NVarChar, 10);
p1.SourceVersion = DataRowVersion.Current;
p1.SourceColumn = "firstname";
da.InsertCommand.Parameters.Add(p1);
SqlParameter p2 = new SqlParameter("@sLastName", SqlDbType.NVarChar, 30);
p2.SourceVersion = DataRowVersion.Current;
p2.SourceColumn = "lastname";
da.InsertCommand.Parameters.Add(p2);
Note that auto-increment columns should not be listed in the INSERT command, because their values are generated by the data source.
The UPDATE command needs to identify the particular row to which its changes apply. To do this, use a WHERE clause that compares a parameterized value against a key field. In this case, the parameter used in the WHERE clause must be bound to the original value of the row, not the current value.
StringBuilder sb = new StringBuilder("");
sb.Append("UPDATE Employees SET ");
sb.Append("lastname = @sLastName, firstname = @sFirstName ");
sb.Append("WHERE employeeid = @nEmpID");
da.UpdateCommand = new SqlCommand();
da.UpdateCommand.CommandText = sb.ToString();
da.UpdateCommand.Connection = conn;
// p1 and p2 set as before
// ...
SqlParameter p3 = new SqlParameter("@nEmpID", SqlDbType.Int);
p3.SourceVersion = DataRowVersion.Original;
p3.SourceColumn = "employeeid";
da.UpdateCommand.Parameters.Add(p3);
Finally, the DELETE command needs a WHERE clause to identify the row to delete. In this case, too, you bind the parameter value to the original version of the row.
StringBuilder sb = new StringBuilder("");
sb.Append("DELETE FROM Employees ");
sb.Append("WHERE employeeid = @nEmpID");
da.DeleteCommand = new SqlCommand();
da.DeleteCommand.CommandText = sb.ToString();
da.DeleteCommand.Connection = conn;
p1 = new SqlParameter("@nEmpID", SqlDbType.Int);
p1.SourceVersion = DataRowVersion.Original;
p1.SourceColumn = "employeeid";
da.DeleteCommand.Parameters.Add(p1);
The actual structure of the SQL commands is up to you. They do not have to be plain SQL statements; they can be more efficient stored procedures if you want to go in that direction. If there is a concrete risk that someone else might update the rows you have read and modified, you may want to take more effective countermeasures. In that case, you can use a more restrictive WHERE clause in the DELETE and UPDATE commands. The WHERE clause can identify the row unambiguously and, at the same time, make sure that all columns still hold their original values.
UPDATE Employees
SET field1 = @new_field1, field2 = @new_field2, ..., fieldn = @new_fieldn
WHERE field1 = @old_field1 AND
field2 = @old_field2 AND
:
fieldn = @old_fieldn
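Writing such a statement by hand gets tedious as the number of columns grows. The following sketch shows one possible way to assemble the command text from a list of column names. The helper OptimisticSql.BuildUpdate is hypothetical and is not part of ADO.NET. It assumes that the column names come from trusted code, that each @new_ parameter is bound with DataRowVersion.Current, and that each @old_ parameter is bound with DataRowVersion.Original.

```csharp
using System;
using System.Linq;

public static class OptimisticSql
{
    // Builds the text of an UPDATE statement whose WHERE clause checks
    // that every listed column still holds its original value.
    // Column names are concatenated as-is, so pass trusted names only.
    public static string BuildUpdate(string table, params string[] columns)
    {
        string setList = string.Join(", ",
            columns.Select(c => c + " = @new_" + c));
        string whereList = string.Join(" AND ",
            columns.Select(c => c + " = @old_" + c));
        return "UPDATE " + table + " SET " + setList + " WHERE " + whereList;
    }
}
```

For example, OptimisticSql.BuildUpdate("Employees", "firstname", "lastname") returns the text UPDATE Employees SET firstname = @new_firstname, lastname = @new_lastname WHERE firstname = @old_firstname AND lastname = @old_lastname. Keep in mind that a plain equality test never matches a NULL value in SQL, so nullable columns need extra handling that this sketch leaves out.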
Note that you don't need to set all the command properties, only those you plan to use. If the code ends up needing a command that has not been specified, an exception is thrown. Setting up the commands for a batch update process can take a lot of code, but you don't have to write that much code every time you perform a batch update. In fact, in a fair number of cases ADO.NET can generate valid update commands for you automatically.
Command builders
To take advantage of default commands, you must meet two requirements. First, you must assign a valid command object to the SelectCommand property. You don't need to populate the other command objects, but SelectCommand must point to a valid query statement. A valid query for batch update is one that returns a primary key column. In addition, the query must not include an INNER JOIN or calculated columns, and it must not reference multiple tables.
The columns and the table listed in the SelectCommand object are actually used to prepare the text of the UPDATE and INSERT statements. If you don't set SelectCommand, ADO.NET cannot generate the commands automatically. The following code shows how to set the SelectCommand property.
SqlCommand cmd = new SqlCommand();
cmd.CommandText = "SELECT employeeid, firstname, lastname FROM Employees";
cmd.Connection = conn;
da.SelectCommand = cmd;
Don't worry about the impact SelectCommand may have on performance. The related statement runs only once before the batch update process, and it retrieves only column metadata. No matter how you write the SQL statement, it never returns any rows to the calling program. This is because, at execution time, the SelectCommand text is appended to a SQL batch that begins with the following statements.
SET FMTONLY OFF
SET NO_BROWSETABLE ON
SET FMTONLY ON
As a result, the query returns no rows, only column metadata.
The second requirement your code must meet concerns the command builder. A command builder is a class specific to a managed provider that works on top of the data adapter object and automatically sets its InsertCommand, DeleteCommand, and UpdateCommand properties. The command builder first runs SelectCommand to collect enough information about the tables and columns involved, and then creates the update commands. The actual commands are created in the constructor of the command builder class.
SqlCommandBuilder cb = new SqlCommandBuilder(da);
The SqlCommandBuilder class ensures that the specified data adapter can be used successfully to batch update a given data source. SqlCommandBuilder uses some of the properties defined on the SelectCommand object: Connection, CommandTimeout, and Transaction. Whenever you change any of these properties, you need to call the RefreshSchema method of the command builder so that the structure of the generated commands is updated for further batch updates.
You can mix command builders and custom commands. If the InsertCommand property points to a valid command object before the command builder is called, the builder generates code only for DeleteCommand and UpdateCommand. A non-null SelectCommand property, on the other hand, is essential for the command builder to work.
Typically, you use a command builder because you don't want to deal with the intricacies of writing SQL commands. However, if you want to look at the source code that the builder generates, you can call methods such as GetInsertCommand, GetUpdateCommand, and GetDeleteCommand.
A command builder is a provider-specific feature, so you cannot expect every type of managed provider to support it. Command builders are available for the SQL Server provider (SQL Server 7.0 and later) and for the OLE DB provider.
A nice feature of command builders is that they detect auto-increment fields and tune the code accordingly. In particular, they leave auto-increment fields out of the INSERT statement, as long as they have a way to recognize those fields as auto-increment. This can happen in two ways. You can manually set the AutoIncrement property of the corresponding DataColumn object or, better yet, have it set automatically based on the attributes the column has in the data source (for example, SQL Server). To inherit such attributes automatically, make sure you change the MissingSchemaAction property of the data adapter from its default value of Add to AddWithKey.
Conflict detection
The batch update mechanism is based on an optimistic view of concurrency. A record is not locked after it is read and remains exposed to other users for reading and writing. In this scenario, a number of potentially inconsistent situations can occur. For example, a row may be modified, or even deleted, after it has been handed to your application by a SELECT statement but before the batch update process actually writes the changes back to the server.
If you update data on the server that another user has modified in the meantime, a data conflict may arise. To avoid overwriting newer data, the ADO.NET command builder generates statements with a WHERE clause that succeeds only if the current state of the data source is consistent with what the application originally read. If such a command fails to update the row, the ADO.NET runtime throws an exception of type DBConcurrencyException. The following code snippet shows a more careful way to perform a batch update operation with ADO.NET.
try
{
    da.Update(dsChanges, "Employees");
}
catch (DBConcurrencyException dbdcex)
{
    // resolve the conflict
}
The Update method of the data adapter throws the exception for the first row whose update fails. At that point, control returns to the client application and the batch update process stops. However, all the changes submitted before the failure are actually committed. This represents another shift from the ADO batch update model to ADO.NET.
The DataRow object involved in the conflicting update is available through the Row property of the DBConcurrencyException class. This DataRow object contains both the proposed and the original values of the row. It does not contain the value currently stored in the database for a given column. That value, which corresponds to the UnderlyingValue property in ADO, can be retrieved only with another query command.
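Both versions of a value can be read through the DataRow indexer that takes a DataRowVersion argument. The sketch below works on an in-memory row with an invented class name and sample data; in a real conflict handler, the row would presumably come from the Row property of the caught exception instead.

```csharp
using System;
using System.Data;

public static class RowVersionsDemo
{
    // Modifies one column of an accepted row and returns the value the
    // row was loaded with together with the value waiting to be submitted.
    public static (string Original, string Proposed) Run()
    {
        var t = new DataTable("Employees");
        t.Columns.Add("employeeid", typeof(int));
        t.Columns.Add("lastname", typeof(string));
        t.Rows.Add(1, "Davolio");
        t.AcceptChanges();              // "Davolio" is now the original value

        DataRow row = t.Rows[0];
        row["lastname"] = "Smith";      // pending change

        string original = (string)row["lastname", DataRowVersion.Original];
        string proposed = (string)row["lastname", DataRowVersion.Current];
        return (original, proposed);
    }
}
```

Here the original value is "Davolio" and the value proposed for submission is "Smith". Neither tells you what the database holds at the moment, which is why a further query is needed.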
How you resolve the conflict, and whether you resume the batch update, is strictly application-specific. If your application needs to resume the update, be aware of a subtle yet tricky problem. Once you have resolved the conflict on the row in one way or another, you must still find a way to accept the changes for the in-memory rows whose updates completed successfully. If you overlook this technical detail, a new conflict is raised for the first row that was previously updated successfully! This happens over and over again, and your application quickly ends up in a deadlock.
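One possible way to keep the in-memory rows in sync is to accept each row as soon as its own submission succeeds, so that a retry only sees the rows that are still pending. The sketch below illustrates the idea under a strong assumption: the submit delegate is a hypothetical stand-in for whatever actually sends a single row to the data source, and the class ResumableSubmit is not part of ADO.NET.

```csharp
using System;
using System.Data;

public static class ResumableSubmit
{
    // Tries to submit every pending row through the caller-supplied
    // delegate (a stand-in for the real per-row database command).
    // Rows that succeed are accepted at once, so a later retry skips them.
    // Returns the number of rows that are still pending.
    public static int SubmitPending(DataTable table, Func<DataRow, bool> submit)
    {
        // Take a snapshot first: accepting a deleted row removes it
        // from the Rows collection.
        DataRow[] pending = table.Select(null, null,
            DataViewRowState.Added |
            DataViewRowState.ModifiedCurrent |
            DataViewRowState.Deleted);

        int failed = 0;
        foreach (DataRow row in pending)
        {
            if (submit(row))
                row.AcceptChanges();   // this row will not be reissued
            else
                failed++;              // left pending for conflict handling
        }
        return failed;
    }
}
```

With this approach, a row whose submission fails stays in the Modified, Added, or Deleted state and can be inspected or retried later, while the rows that went through are back to Unchanged.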
Summary
Compared with ADO, batch update in ADO.NET is more powerful and more accessible. In ADO, the batch update mechanism is a black box that gives you almost no chance to get inside it or to change the way it carries out its tasks. Batch update in ADO.NET is more of a low-level solution, and its implementation offers several entry points where you can step in and take control of events. The trickiest part of ADO.NET batch update is conflict resolution. I strongly suggest that you spend as much time as you can testing and retesting. This investment is repaid by all the time you save with command builders.
Dialog Box: NULL values in data tables
I fetch a DataSet from a database, and everything goes well. Then I save this DataSet to an XML file, and that goes smoothly too. But when I read the XML file back into a DataSet, the trouble starts. This is because the columns that hold NULL values are not persisted to XML. Is there some way to have NULL values added to the resulting XML as well?
This behavior is by design and was introduced with the best of intentions: to save a few bytes during XML serialization. If the serialization takes place over a network (for example, within an XML Web service), the advantage can be significant.
That said, your problem has a very simple workaround. The trick is to fetch the column through the ISNULL T-SQL function. Instead of using the following code:
SELECT MyColumn FROM MyTable
you should use:
SELECT ISNULL(MyColumn, '') FROM MyTable