Distributed Database Tutorial (2) - Original

zhaozj2021-02-16 100

Third, the differences between traditional databases and distributed databases

Traditional database applications usually use a client/server structure (the C/S structure, shown in Figure 2). The technology is mature and very widely applied, but systems built this way have shortcomings. For example, when there are many clients and the data volume is large, the server's load becomes heavy, and much of the work is repeated: many clients may issue exactly the same query, yet query results cannot be shared, so even two identical requests from two clients run as two separate queries on the server. In addition, query algorithms with commercial value are stored on the client, where they are exposed. The heavy burden on the database server results in low efficiency.

When a server is added between the database server and the client, dedicated to storing the query algorithms and temporary query results, these problems are resolved well. On the one hand, different clients can share temporary query results without accessing the database server, which reduces the server's burden; on the other hand, the query algorithms are kept off the client and can be treated as commercial secrets. This is the working principle of a distributed system.
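The middle-tier idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `DatabaseServer` and `MiddleTier`, the `large_orders` query, and the sample rows are all hypothetical and do not come from the original system.

```python
class DatabaseServer:
    """Stand-in for the back-end database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def run(self, min_amount):
        self.query_count += 1
        return [r for r in self.rows if r["amount"] >= min_amount]


class MiddleTier:
    """Holds the query logic and a cache of temporary results shared by all clients."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def large_orders(self, min_amount):
        key = ("large_orders", min_amount)
        if key not in self.cache:
            # Only the first request for this query reaches the database.
            self.cache[key] = self.db.run(min_amount)
        return self.cache[key]


db = DatabaseServer([{"id": 1, "amount": 50}, {"id": 2, "amount": 500}])
tier = MiddleTier(db)
first = tier.large_orders(100)   # client A
second = tier.large_orders(100)  # client B sends the same request
print(first == second, db.query_count)
```

The sketch deliberately leaves out cache invalidation; a real middle tier would have to discard cached results when the underlying data changes.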

Distributed systems emerged from a number of shortcomings of the traditional C/S structure, such as low efficiency and weak security. Applied to databases, the global DNS (Domain Name System) is a typical example. Imagine that all the domain names in the world were managed on a single server: that server would certainly be overwhelmed by the load, and the entire Internet would be paralyzed. (A picture of DNS server addressing can be drawn here.)

Unlike in conventional databases, where data redundancy is avoided, data redundancy is considered desirable in a distributed system, for the following reasons:

First, if data is replicated at the sites where it is needed, the locality of applications can be improved.

Second, when a node fails, the copies of the data on other nodes can still be operated on, which increases the availability of the system.

However, evaluating the optimal degree of redundancy in a distributed system is complicated.
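The two benefits of redundancy listed above (local reads and survival of a node failure) can be illustrated with a small sketch. The names `ReplicaNode` and `read_with_failover` and the sample data are assumptions made for this example only.

```python
class ReplicaNode:
    """One node holding a full copy of the data; `up` simulates node health."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def read_with_failover(key, nodes):
    """Try the local (first) node, then fall back to the other replicas in order."""
    for node in nodes:
        try:
            return node.read(key), node.name
        except ConnectionError:
            continue
    raise ConnectionError("no replica available")


data = {"policy_42": "valid"}
local = ReplicaNode("local", data)
remote = ReplicaNode("remote", data)

print(read_with_failover("policy_42", [local, remote]))  # served locally
local.up = False
print(read_with_failover("policy_42", [local, remote]))  # served by the other copy
```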

Fourth, distributed database technology design

Database design principle

From the perspective of global applications, these databases are configured to form a distributed database system that ensures the integrity and consistency of global data. Each site still stores its own data, operates on the shared pool through the network, and performs integrity and consistency checks on the data. Although this introduces a certain amount of data redundancy, keeping multiple copies of the same data at different sites improves the system's reliability and availability, improves the efficiency of local applications, and reduces communication costs. The distributed database system can be extended with minimal impact on the existing setup: a new site is added simply by adding a node, and interference between processors is kept to a minimum.

Data storage

In a distributed database, data storage is implemented in one of the following three ways:

1 Replication: The system maintains several identical copies of a relation, and these copies are stored on different nodes.

2 Fragmentation: The relation is divided into several fragments, and each fragment is stored on a different node.

3 Replication plus fragmentation: The relation is divided into several fragments, and the system maintains several copies of each fragment.

Because the databases share a certain amount of redundant data but also differ from one another, we use the replication-plus-fragmentation mode for data storage.
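The three storage modes above can be compared with a short placement sketch. Round-robin assignment and the function names are assumptions chosen for illustration; the original system's placement rules are not given in the text.

```python
def replicate(rows, nodes):
    """Replication: every node stores a full copy of the relation."""
    return {node: list(rows) for node in nodes}


def fragment(rows, nodes):
    """Fragmentation: each row is stored on exactly one node (round-robin here)."""
    placement = {node: [] for node in nodes}
    for i, row in enumerate(rows):
        placement[nodes[i % len(nodes)]].append(row)
    return placement


def fragment_and_replicate(rows, nodes, copies=2):
    """Replication plus fragmentation: each fragment is stored on `copies` nodes.

    Assumes copies <= len(nodes), so a fragment never lands twice on one node.
    """
    placement = {node: [] for node in nodes}
    for node, frag in fragment(rows, nodes).items():
        start = nodes.index(node)
        for k in range(copies):
            placement[nodes[(start + k) % len(nodes)]].extend(frag)
    return placement


rows = [1, 2, 3, 4, 5, 6]
nodes = ["A", "B", "C"]
print(replicate(rows, nodes))
print(fragment(rows, nodes))
print(fragment_and_replicate(rows, nodes))
```

With three nodes and two copies, any single node can fail and every row is still available on another node, while no node has to hold the whole relation.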

Data fragmentation

In a distributed database system, relations are fragmented so that data can be distributed according to users' requirements. The current fragmentation methods include horizontal fragmentation, vertical fragmentation, derived fragmentation, and mixed fragmentation. We can choose a different fragmentation method for each relation, depending on the data.
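As a rough illustration of the first two methods, the sketch below splits a small relation horizontally (by rows) and vertically (by columns) and then reconstructs it. The `branch` and `salary` columns and the sample rows are hypothetical; `emp_id` and `emp_name` follow the column names used later in this tutorial.

```python
employees = [
    {"emp_id": 1, "emp_name": "Li", "branch": "north", "salary": 3000},
    {"emp_id": 2, "emp_name": "Wang", "branch": "south", "salary": 3500},
    {"emp_id": 3, "emp_name": "Zhao", "branch": "north", "salary": 4000},
]


def horizontal_fragment(rows, predicate):
    """Horizontal fragmentation: select whole rows that satisfy a predicate."""
    return [r for r in rows if predicate(r)]


def vertical_fragment(rows, key, columns):
    """Vertical fragmentation: keep the key plus a subset of the columns."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


north = horizontal_fragment(employees, lambda r: r["branch"] == "north")
south = horizontal_fragment(employees, lambda r: r["branch"] == "south")
names = vertical_fragment(employees, "emp_id", ["emp_name", "branch"])
pay = vertical_fragment(employees, "emp_id", ["salary"])

# Horizontal fragments are recombined by union, vertical ones by joining on the key.
rebuilt_h = sorted(north + south, key=lambda r: r["emp_id"])
pay_by_id = {r["emp_id"]: r for r in pay}
rebuilt_v = [{**n, **pay_by_id[n["emp_id"]]} for n in names]
print(rebuilt_h == employees, rebuilt_v == employees)
```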

Data synchronization

Data synchronization uses two kinds of replication: transactional replication and merge replication. Data management and analysis functions are implemented at each site, so each site only needs to save its updated data to the shared pool's database. We use transactional replication to synchronize business data: each site's database acts as publisher and distributor, and the shared pool's database acts as subscriber. A Snapshot Agent is established for each site's data, and information about the synchronization state is logged in the distribution database. Each site database that uses transactional replication has its own Log Reader Agent, which runs on the distributor and connects to the publisher. The Distribution Agent's task is to push the transactions held in the distribution database directly to the subscriber. When a subscription is created, a transactional publication set up for immediate synchronization has its own Distribution Agent, which runs on the distributor and connects to the subscriber. Transactional replication supports two kinds of replicated objects: tables and stored procedures. At the publisher we define some or all of the data in the database as articles, and select several stored procedures to replicate as well. When a site's data is updated, the Log Reader Agent pushes the update to the shared pool's database immediately. Replication based on stored procedures gives the application better performance and can greatly reduce network traffic. Transactional replication uses the transaction log to monitor changes to the data in the articles.
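The flow described above (changes are recorded in the publisher's log, collected by a log-reading step, and pushed to the subscriber) can be modelled in miniature. This is a conceptual sketch only, not SQL Server's implementation: the classes `Publisher` and `Distributor` and their methods are hypothetical names, and a plain dictionary stands in for the shared pool's database.

```python
class Publisher:
    """A site database that records every committed change in a transaction log."""

    def __init__(self):
        self.table = {}
        self.log = []

    def upsert(self, key, value):
        self.table[key] = value
        self.log.append(("upsert", key, value))

    def delete(self, key):
        self.table.pop(key, None)
        self.log.append(("delete", key, None))


class Distributor:
    """Holds changes between the log-reading step and the push to the subscriber."""

    def __init__(self):
        self.queue = []        # stands in for the distribution database
        self.log_position = 0  # how far the publisher's log has been read

    def read_log(self, publisher):
        """Log reader step: copy new log entries into the distribution store."""
        new_entries = publisher.log[self.log_position:]
        self.queue.extend(new_entries)
        self.log_position += len(new_entries)

    def push(self, subscriber):
        """Distribution step: apply queued changes to the subscriber in log order."""
        for op, key, value in self.queue:
            if op == "upsert":
                subscriber[key] = value
            else:
                subscriber.pop(key, None)
        self.queue.clear()


site = Publisher()
shared_pool = {}
distributor = Distributor()

site.upsert("p1", "valid")
site.upsert("p2", "valid")
site.upsert("p1", "invalid")
site.delete("p2")

distributor.read_log(site)
distributor.push(shared_pool)
print(shared_pool == site.table)
```

Applying the changes in log order is what keeps the subscriber consistent with the publisher: the later update to `p1` overwrites the earlier one, just as it did at the source.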

Merge replication is a method of distributing data from a publisher to subscribers that allows the publisher and the subscribers to modify data independently, whether they are connected or disconnected, and then merges the changes made at each node once all (or some) of the nodes are connected. In merge replication each node completes its own tasks independently. The subscriber and the publisher do not have to stay connected as they do in transactional replication and snapshot replication, and MS DTC is not needed to implement two-phase commit for modifications at multiple nodes; it is enough to connect the node to other nodes at some point (not necessarily to all of the other nodes).

The data changes that have occurred are then copied to the databases of the connected nodes.
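A minimal sketch of the merge step is shown below. The text above does not say how conflicting changes are resolved, so the rule used here (the change with the higher version number wins) is purely an assumption for illustration, as are the names `Node` and `merge`.

```python
class Node:
    """A publisher or subscriber that can be modified while disconnected."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (value, version)

    def write(self, key, value, version):
        self.data[key] = (value, version)


def merge(a, b):
    """Exchange changes between two connected nodes.

    Assumed conflict rule: for each key, the entry with the higher version wins.
    """
    for key in set(a.data) | set(b.data):
        candidates = [n.data[key] for n in (a, b) if key in n.data]
        winner = max(candidates, key=lambda item: item[1])
        a.data[key] = winner
        b.data[key] = winner


publisher = Node("publisher")
subscriber = Node("subscriber")

# Both nodes work independently while disconnected.
publisher.write("p1", "valid", version=1)
subscriber.write("p1", "invalid", version=2)
subscriber.write("p2", "valid", version=1)

# Later the nodes connect and their changes are merged.
merge(publisher, subscriber)
print(publisher.data == subscriber.data)
```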

The following is a design block diagram of the system:

Figure 4

Using distributed technology to implement transaction processing and queries

■ Distributed transaction processing

The distribution of data in a distributed database system leads to the distribution of transactions: the execution of a global transaction is divided into the execution of sub-transactions.

A distributed transaction must be executed on multiple servers. We use MS DTC as the transaction manager to coordinate how each server processes the transaction. To reduce the impact of network failures on distributed transaction processing, and to prevent a distributed transaction from leaving the data on different servers inconsistent, the X/Open XA specification defines the processing of a distributed transaction in two phases, the prepare phase and the commit phase, commonly known as two-phase commit.
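The two phases can be sketched as follows. This is a simplified model of the protocol, not of MS DTC itself: the `Participant` class and `two_phase_commit` function are hypothetical, and timeouts, logging, and crash recovery are left out.

```python
class Participant:
    """One server taking part in the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere or roll back everywhere."""
    # Phase 1 (prepare): ask every server to vote.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only if all servers voted yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


head_office = Participant("DBServer")
branch = Participant("DBServer1")
print(two_phase_commit([head_office, branch]), head_office.state, branch.state)

failing_branch = Participant("DBServer1", can_commit=False)
head_office_2 = Participant("DBServer")
print(two_phase_commit([head_office_2, failing_branch]), head_office_2.state)
```

The point of the prepare phase is that no server commits until every server has promised it can, so a failure at one server leaves all of them rolled back rather than partially updated.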

When processing a distributed transaction, we first start the distributed transaction on a server with a Transact-SQL script; that server acts as the distributed transaction management server. The script then runs a distributed query against a linked server or executes a stored procedure on a remote server, and the distributed transaction management server automatically calls MS DTC to enlist the remote servers in the distributed transaction. When the script executes a COMMIT TRANSACTION, COMMIT WORK, ROLLBACK TRANSACTION, or ROLLBACK WORK statement, the distributed transaction management server calls MS DTC again to manage the two-phase commit process, so that the linked servers and remote servers commit or roll back the transaction.

For example, in the insurance business system, if the head office's database management system finds that a customer is cross-insured, the policy must be inserted into the reject record table, and the status of the policy must be set to invalid in the database of the corresponding business branch. We created a stored procedure, update_policy, in the branch's business database (DBServer1) to update the policy status, and execute the following script on the head office database server (DBServer) to start a distributed transaction, insert_reject:

USE business
GO
BEGIN DISTRIBUTED TRANSACTION
INSERT reject VALUES (policy_id, insurance_no, business_date, customer_id, customer_name ...)
EXECUTE DBServer1.business.dbo.update_policy
COMMIT TRANSACTION
GO

When the insert_reject transaction executes, the system inserts a record into the reject table on DBServer and updates the status field of the policy table in the corresponding branch's database. The system guarantees the integrity of the data involved in the transaction.

■ Distributed query

The distribution of data in a distributed database system means that queries are distributed as well; a distributed query may also target other OLE DB or ODBC data sources. SQL Server supports distributed queries, including queries that draw data from two or more servers. It supports retrieval, updates, and cursors across servers, and uses Microsoft Distributed Transaction Coordinator (MS DTC) to guarantee transaction semantics across nodes and to maintain security between servers.

During system design, to reduce network communication, we stored each data relation in the database where the application's function needs it, so most application operations run against the local database. Global queries, however, still need data from multiple databases. In the salesman management module, each branch manages its salesmen directly and the management systems differ, so we store salesman information in each branch's database and use a federated distributed query to register all of the company's salesmen. In the customer management module, customer information is stored separately according to its source (for example, in the network client table of the business database server), and customers are registered accordingly. The following federated distributed query retrieves the salesmen for a given year:

SELECT Emp.emp_name, Emp.emp_id, Emp.emp_gender ...
FROM DBServer1.business.dbo.employee AS Emp
WHERE Date BETWEEN '01/01/2000' AND '12/31/2000'
UNION
SELECT Emp.emp_name, Emp.emp_id, Emp.emp_gender ...
FROM DBServer2.business.dbo.employee AS Emp
WHERE Date BETWEEN '01/01/2000' AND '12/31/2000'
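The shape of the federated query above can be tried locally without linked servers. The sketch below uses Python's built-in sqlite3 module and two attached in-memory databases as stand-ins for the branch servers. The schema names `branch1` and `branch2`, the `hire_date` column, and the sample rows are assumptions for this example; SQLite is not SQL Server, so only the UNION pattern carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each ATTACH creates a separate in-memory database, standing in for one branch server.
conn.execute("ATTACH DATABASE ':memory:' AS branch1")
conn.execute("ATTACH DATABASE ':memory:' AS branch2")

for schema in ("branch1", "branch2"):
    conn.execute(
        f"CREATE TABLE {schema}.employee (emp_id INTEGER, emp_name TEXT, hire_date TEXT)"
    )

conn.execute(
    "INSERT INTO branch1.employee VALUES (1, 'Li', '2000-03-01'), (2, 'Wang', '1999-07-15')"
)
conn.execute("INSERT INTO branch2.employee VALUES (3, 'Zhao', '2000-11-20')")

# One query spans both databases; UNION combines the per-branch results.
rows = conn.execute(
    """
    SELECT emp_name, emp_id FROM branch1.employee
    WHERE hire_date BETWEEN '2000-01-01' AND '2000-12-31'
    UNION
    SELECT emp_name, emp_id FROM branch2.employee
    WHERE hire_date BETWEEN '2000-01-01' AND '2000-12-31'
    ORDER BY emp_id
    """
).fetchall()
print(rows)
conn.close()
```

Dates are stored as ISO-8601 text here so that BETWEEN compares them correctly as strings.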

