Implementation of a database cluster system based on mysql
Author: Mr. Linux 2005-01-30 15:58:48 from: IBM DW China
At present, the database cluster system is successful, and the application range is relatively widely: Oracle's Oracle9 and IBM DB2. Oracle9 uses Shared-Storage technology, DB2 selects Shared-Nothing technology, both of which are long. The theoretical basis of the latest database cluster system is distributed computing, distributes data to each node, all computing nodes parallel to process data, summarize the results. This way is undoubtedly perfect. But it is still unable to implement all functions. For Shared-Storage and Shared-Nothing, please refer to Oracle and the relevant information on the IBM website. In the second section, the current database application status is roughly divided into two categories. The first category is below 100g, and the database access is frequent, requesting dense. Mainly the application of the web app type, such as website, forum, etc. These Web App Types The application access database is: Access is frequent, the database should be accepted for thousands of queries per second, and data is often added frequently, while the response speed of the data is relatively high. Another category is used for scientific calculations, storage history data, and data is often reached several hundred g. These applications have access to the database: Most of the query operations, the data is batch, timed, concentrated into the database, the database is not often much, accumulated a lot of data, and there is no high demand for the response of the database. The first type of application of the Issue III exposed, because the access is more frequent, and in order to support more access, Web Server generally uses the load balancing cluster, but for the database, the cluster operation is not implemented, every second The request is increasing. As the server load increases, the response is getting slower and slower, if the library file is relatively large, there is a matter of obsolete, and the lock time is too long and other things affect access efficiency. The second type of application, mainly the data file is too large, and each processing data requires a lot of time. If you write a statement, you need to spend a few hours to keep inquiry. Section IV how to resolve the first possible to optimize from hardware, software, procedures, indexing, and SQL statements. If you still don't solve the problem, we should consider the database system cluster (parallel processing). For the first class, the application is still satisfactory to the database system in the case where the database server is operating normally, the load is not high. However, after the database system load is too high, the time to complete the request is long, and the system is not required. Since the load is caused by excessive requests, we take the way of sharing the request, let a part of the request to access another server, so that the load of the single server is reduced, thereby solving the problem. For a second type of application, a system that requires distributed computing is solved, and the general system is powerless. Section 5 For "Linux Apache PHP MySQL" solution to the first type of application problem with a practical case solution: I met this problem among my work, our web server is Linux Apache PHP The cluster of three machines, MySQL runs on the platform of Sun450, 2G memory. Since the Web's access is almost full of loads during peak, Loadavg (the number in the Running state within one minute) is between 10-20, which is reflected in a large number of requests are hanging when accessing the database. Live, causing a request without completion, the next request comes in, and finally the vicious circle. Loadavg will soar to more than 800 instantly.
The database is worse, and the Loadavg reaches more than 300, the database's thread is very much, and the CPU is busy switching the thread state. This time unless RESTART MySQL, it will not be good. After the SQL statement is optimized, it is still unable to solve the problem. We add a database server, synchronous mechanism through MySQL, so that the data on the two databases is synchronized, and some part will only read the operation. The PHP program allows these programs to connect another database, which is partially separated by the load, and the problem has been initially solved. But later, the business is big, and we have added multiple servers, modify a lot of programs, separating them to the database read operation, accessing different servers. Section 6 The Mysql-HA-PROXY scheme proposes to achieve the system's load separation by modifying the procedure, is a painful thing, the project is very painful, and it cannot be mistaken, because in addition to the primary server can write, modify the data, and Other servers can only update their own data through data, so if you write to those databases, the results will be catastrophic. If we can have a program sorting SQL statement, transfer to different servers according to his type (read / write), and then return the result. With a way similar to HTTP, we do not need to share the load by modifying the source program. If you can again depend on the load status of the server, or the status of the table (available / locked), it should be judged. This request is allocated to which server, which is better than the effect of our modified source program. Section 7 How to communicate between mysql client and server, have not found an article about MySQL communication protocol, which seems to have only analyzed MySQL source programs. So I found code for MySQL 3.23.49, open the Sniffer tool. Mysql's communication protocol may change many times, in the version of 3.23.49, the version of the communication protocol is 10. Simplely analyzed the communication agreement, now the rules are as follows, some places are not very perfect, because I really don't have much time to study my mysql code carefully, I only know this.
Server's response data format: Offset area Type length (byte) Description 0 Head Data Length 3 1 2 3 FLAG 1 = 0 normal information = 1 multi-segment information = 2 authentication Return> 2 paragraph End word 4 Data CMD CODE 1 5 Message Datalength - 1 When Flag = 0, CMD CODE and Message defined CMDCode Type Message Structure 00 Status Code Offset Type Length (Byte) 0 Affect Rows 2 0A Server Version Number Offset Type Length (Byte) only When you just connected Server, Server will return to a data segment information 0 Versionstring 8 end of '/ 0' 8 session ID 4 32BITS 12 Unknown 11 FF When an error occurs, return information offset type Length (Byte 0 Errcode 2 2 Errmsg End Fe Empty Empty Client Submit Data Format for Server: Offset Area Length Length (Byte) 0 Head Data Length 3 1 2 3 Compressed 1 4 Data Command ID 1 5 Command Data Data LENGTH - 1 Command ID and Command data Description: ID type data format 0 COM_SLEEP 1 COM_QUIT NULL 2 COM_INIT_DB Database name 3 COM_QUERY stand query string 4 COM_FIELD_LIST table name [128] wildcard [128] 5 COM_CREATE_DB Database name 6 COM_DROP_DB Database name 7 COM_REFRESH options (bits) 8 com_shutdown null 9 com_statistics null 10 com_process_info null 11 com_connect 12 com_process_kill sid [4] 13 COM_DEBUG NULL 14 COM_PING N ULL 15 COM_TIME 16 COM_DELAYED_INSERT 17 COM_CHANGE_USER [user] [passwd] [db] 18 com_binlog_dump 19 com_table_dump 20 COM_CONNECT_OUT Section 8 How to complete the user authentication protocol analysis through Server, I tried to work, but I have encountered this part of the certification When you are in trouble, MySQL Server is connected to it in a client, will first return to the client a packet, contain the version number, version information, sessionid, a 8-byte key, is this key reason. Client will use this key to encrypt the password, then send the username, password, the information you need to open to the server, so that you have completed the certificate. I don't know how the client uses this key to encrypt, so I intend to skip passwords, I will reorganize the client's packet, after the information of Password, I succeeded, but the mysql users in the cluster are not password, There are fewer security problems, but these servers are placed behind the HA. There is no external IP address, it should be a lot, but more less is a shortcoming.
But I always know if the user's password is correct? How to do it? Use a dedicated MySQL to complete password authentication. Install a minimized resource Mysql Server to do mysqlauth (dedicated authentication server), return Mysqlauth's first packet to the client when the Client connection is connected, which is of course a key, and then the Client will use this key After encrypting the password, send the authentication information back. At this time, the mysqlha system will forward this information to mysqlauth, and he retains a copy. If the authentication passes, the reserved number will be reorganized, remove the password information, Then use the reorganized authentication information to connect the server in the cluster. The structural and flowchart of the Ninth system is the high reliability system established using the HeartBeat mode (for specific implementation methods, please refer to http://www.linuxvirtualserver.org/). Proxy is a MySQL-Proxy system, MySQLAUTH is a dedicated authentication server. Red RealServer is the primary server, which can be updated, while synchronizing data to other Realserver. The above figure is that the CLIENT authentication process is described above with the diagram of authentication and authentication. After the authentication is passed, after the connection is established, the system handles the process of the process of the process of SQL Query request, I have now basically completed. The development of Mysql-Proxy program, but is still in the test phase, the latest version is 0.0.4, the next version is still revised. Starting from version 0.0.3, MySQL-Proxy has been able to run the SQL-Bench provided by MySQL itself, but this SQL-Bench can only provide a single point of performance, and does not provide test function for the cluster's MySQL system. The system provides a dynamically capture of the loadavg on RealServer and then feeds back to MySQL Proxy programs, but because of this part I did not test, I used the request allocation used in the previous test to polling mode, if there are two loads The RealServer system will automatically rotate between them (Edit: Nuthead)