3 implementation process
The implementation process is shown in Figure 3. Includes capture programs and two parts of the implementation.
Figure 3 "Data Replication Based on SQL Reproduction" implementation process
(1) capture procedure
The capture program is the program that captures changes that occurred by the source database.
After establishing a replication environment, you need to use the capture program to capture changes to the source database, and the change is temporarily stored in the change trajectory table.
"SQL reproduction method" to capture data is implemented by triggers. The specific capture step is shown in FIG. 4, when the source table modifies, inserts, and deletes operations, the flip-flop is started, calling the corresponding stored procedure, restoring the operation of the source table as a SQL statement (for example, the source table is TB_EMPLOYEE, The target table it corresponds to TGTB_EMPLOYEE, inserts a record in the source table TB_EMPLOYEE, such as INSERT INTO TB_EMPLOYEE VALUES (?,?,?,?), Then restored SQL statement may be INSERT INTO TGTB_EMPLOYEEEEEE VALUES (?,?, ?), Here is possible, because the target table does not necessarily contain all the columns of the source table, it is possible to just only one of the columns, if only one column of data of the source table is selected, then restore SQL statements Only these few columns are included), insert the change time and the restored SQL statement to the change track table corresponding to the source table.
Figure 4 capture process
The stored procedure processing is shown in Figure 5.
Figure 5 Workflow of the stored procedure
Figure 5 Work Flow of Stored Procedure
(2) Implementation procedure
The implementation procedure is scheduled to run, and the data in the track table is changed to the program in the target database.
Since asynchronous replication itself is required to be data distribution, there is a single server, which is also a "implementation server" to do this.
The implementation process is responsible for distributing data. Data distribution of "SQL reproduction" data replication technology uses the "pull" model, that is, the replication process is started by the target server, and the data is requested to change the data to the source server. The implementation procedure is developed by Java language, with the source database, and the target database is connected to the JDBC. The capture program is temporarily stored in the change track table, and the copy time implement program reads the changes to the source database from the change track table through the JDBC table, and then connect the target database through the JDBC to connect to the target database. Target database.
The implementation procedure is usually running on the target server, but it can also run on either server that can be connected to the network of the source and target server. Multiple implementation program instances can run on the same or different servers.
The implementation procedure is updated by a full update or differential update copy data from the source table to the target table.
(1) In the initialization process, the implementation procedures performs a full update copy. At this point, the implementation program performs the following tasks:
1) Delete all lines in the target table
2) Read all lines from the source table
3) Copy these lines into the target table
(2) When the differential update copy is performed, the implementation of the program only copies the data to the target table. At this point, the implementation procedure performs the following tasks, as shown in Figure 6:
Figure 6 implementation program workflow
Figure 6 Work Flow Of Apply Program
Among them, (1) (2) (3) represents:
(1) Implement the program to extract the change data;
(2) Apply the change data to the target table;
(3) Delete information in the corresponding change in trajectory table.
4 technical characteristics
"SQL reproduction" data replication technology has the characteristics of complete heterogeneous, DBMS independence, versatility, and reliability, and will be introduced one by one. (1) Complete isomer
A fully heterogeneous means that replication can be done between all databases at least all databases compatible with RDB (relational databases, RELATIONAL DATABASE). "SQL Reefill" Data Replication Technology Connect various databases through JDBC, as long as you find the appropriate JDBC, you can do copy between all databases. Therefore, "SQL reproduction method" data replication technology is completely heterogeneous.
(2) DBMS irrelevant
Many current replication schemes, especially data replication programs provided by database products rely on the technology close-up with the company's own DBMS core relationship, and "SQL reproduction" data replication technology uses Java technology development, use The JDBC interface is connected to various databases. Java itself is a cross-platform technology. JDBC has been respectful as an ODBC as a standard. The major database manufacturers have produced drivers that meet the JDBC standards, so that they use these interfaces. Considering the characteristics of too many underlying DBMS, various DBMs can be transparently operated. For example, connecting the class DatabaseMetadata to each database, using the Java language, connect various databases through JDBC, connection, access, execution process, query, etc., for application, etc., so you can do it. Not related to DBMS. Therefore, "SQL reproduction method" data replication technology is independent of DBMS.
(3) Universality
"Based on SQL Reefill" data replication technology, the change in capture is to copy any database in the form of SQL statements. If you don't have to write a corresponding database DML operation in the implementation, this must know which database is not the same, because the DML statement of different databases is different. If the TRIGGER can be encapsulated by DML, TRIGGER can encapsulate this difference, so that the implementation proceeds to face the comparison interface, so the implementation program itself can be transplanted, and Trigger itself is a non-portable object, which is packaged by it. Benefisited, you can make the implementation process for a consistent interface, so that the implementation can have better portability. The "SQL Redgendculture" data replication technology implementation procedure is to read the SQL statement from the source library, and execute this SQL statement in the target database.
(4) Reliability
"SQL Reefragment" data replication technology only copies the data modified in the database, so the load can be significantly reduced. "SQL reproduction method" data replication technology will change the action of the action of the target library and the action package in the delete change track table, when the network or target database fails, the change is applied to the database. Will success, the entire transaction rollback, the content in the change track table will not be deleted, assume that the original data is not damaged when the network or target database fails, and when the fault is recovered, the copy process can be manually started to submit a change trajectory table. The content of the content; or when the copy time is reached, the contents of the change track table are re-submitted. This approach makes this replication technology high reliability.
5 prototype
In order to verify the feasibility and technical characteristics of the "Based sql reproduction method" replication technology, the author develops prototype system Replicator, which will be described in detail.
(1) Experimental environment
The experimental environment is as follows:
DBMS: IBM DB2 8.1, Oracle9i, Sybase11.9
OS: RedHat 7.3, Windows 2000, NT
Network: LAN (TCP / IP 100MBPS) Hardware: PC (Pentium 1.7G, 512M RAM, 80GHD), PC (Pentium V), PC (Pentium 1.7G, 512M RAM, 80GHD)
(2) Experimental test
After the prototype is implemented, the data replication function is tested.
The specific operating environment is as described in (1), and try to close other programs and processes to occupy minimal resources. Table 1 of the test results:
Each of them records a size of 100 bytes, involving various common data types (characters, integer, floating point number, time / date, large object, etc.); about 50% of records have changed.
Table 1 test results
Table 1 Test Result
Record time use (s) memory usage (M) (M) (maximum memory loss during the run) 1000 58 16 5000 295 16 10000 592 16
The test results show that as the number of records increases, "SQL reproduction method" data replication technology time has increased linear growth (ie o (n), n is recorded), and the spatial complexity (mainly memory loss) basically Keep constant (ie o (1)).
Due to the limitations of time and other objective conditions, Replicator still has some shortcomings, and it needs to be improved and improved later. There are:
(1) Diversity
Although the experimental environment contains the main isomeric form, they have strong representation, but the situation involved in the two environments is limited.
(2) Efficiency problem
Adding Trigger on each source table is bound to bring a certain efficiency problem to the existing application, and give database management, how to control the number of TRIGGERs and the contents of the change of trajectory table control information is to improve efficiency, simplify management main considerations. The direction.
(3) Real-time
At present, "SQL reproduction method" data replication technology is a method of copying the control information afterwards, so that real-time replication cannot be implemented, which brings limitations to actual applications.
references
1 is respectful. Research and implementation of open heterogeneous database replication framework 2002.4
2 Wang Yu, Zhang Zhihao distributed database system copy mechanism and application computer engineering and science 2003
3 Ji Zengrui Distributed Database System Consistency Method Computer Engineering and Science 2002
4 Lin Huaizhong Data Replication and Consistency 2002
5 Ge Wei Min Based on Oracle Advanced Replication Distributed Database System Application Research Database & Information Processing 2003
7 Guo Si Mei Data Replication Technology In Managing Information Systems 2000.3