In-depth explanation of server cluster technology

In the early days of computing, a single processor powered one server and all of its applications. Then came the multiprocessing era, in which two or more processors shared a common memory pool and could handle larger applications. Next, server networks appeared, with each server in the network dedicated to a different set of applications. Now the technology has progressed to server clusters: two or more servers working together as if they were a single server, providing availability and performance far beyond what one machine offers. Applications can move from one server to another, or run on several servers at once, and all of this is transparent to the user. Clusters are not new, but until recently both the software and the hardware were proprietary. Information system managers are now giving clusters much more serious consideration, because clusters can be built from widely available standard hardware: RAID arrays, symmetric multiprocessing systems, networks, I/O adapters, and peripherals. Cluster technology will develop further still; new cluster options appear continuously, and true cluster standards are still being drafted.

What is a cluster? Simply put, a cluster is a group of two or more computers, or nodes, that work together. Compared with a computer working alone, a cluster provides higher availability and scalability. Each node in the cluster usually has its own resources (processors, I/O, memory, operating system, storage) and is responsible for its own set of users. Failover provides the availability: when a node fails, its resources can "switch over" to one or more other nodes in the cluster, and once the failed node is fully operational again, the resources can switch back. The same mechanism supports rolling upgrades: by proactively switching a server's workload to the other servers in the cluster, you can take that server down to add components, return it to the cluster, and then move its workload back from the other servers. Additional scalability can be provided through distributed message passing (DMP), a cluster communication technique that allows an application to scale beyond a single symmetric multiprocessing (SMP) system.

Each node in the cluster must run cluster software that provides services such as fault detection, recovery, and the ability to manage the servers as a single system. The nodes must be connected in a way that lets each node know the state of all the others. This is usually implemented over a communication path separate from the local network, using a dedicated network adapter to guarantee unimpeded communication between nodes. The message relayed over this path is called a "heartbeat": if a node fails and therefore stops sending heartbeats, the failover process begins. In practice, the most reliable configurations use redundant heartbeats over several different links (LAN, SCSI, and RS-232) to ensure that the failure of a single communication link does not trigger a false failover.
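To make the heartbeat idea concrete, here is a minimal sketch in Python of a failure detector. The HeartbeatMonitor class, the interval, and the threshold are all invented for illustration; no particular cluster product works exactly this way.

```python
import threading
import time

# Illustrative parameters; real cluster products tune these carefully.
HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats on the private link
FAILURE_THRESHOLD = 3      # missed heartbeats before a node is declared failed

class HeartbeatMonitor:
    """Tracks the most recent heartbeat received from each peer node."""

    def __init__(self, on_failure):
        self.last_seen = {}            # node name -> time of last heartbeat
        self.on_failure = on_failure   # callback that starts the failover process
        self.lock = threading.Lock()

    def record_heartbeat(self, node):
        """Called whenever a heartbeat arrives from a peer."""
        with self.lock:
            self.last_seen[node] = time.monotonic()

    def check_peers(self):
        """Run periodically: declare failed any node that has gone quiet."""
        deadline = FAILURE_THRESHOLD * HEARTBEAT_INTERVAL
        with self.lock:
            now = time.monotonic()
            failed = [n for n, t in self.last_seen.items() if now - t > deadline]
            for node in failed:
                del self.last_seen[node]    # stop watching the failed node
        for node in failed:
            self.on_failure(node)           # e.g. switch its resources elsewhere
```

In a redundant configuration, one such monitor would run per communication link, and failover would begin only when every link has lost the heartbeat, which is how the false activations mentioned above are avoided.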

For anyone buying a cluster today, there are fortunately several different grades of cluster to choose from, offering a wide range of availability. Naturally, the higher the availability, the higher the price and the greater the management complexity.

Shared storage. A shared disk subsystem, attached over shared SCSI or Fibre Channel, is often the foundation of a cluster. Each node uses its local disks for the operating system, swap space, and system files, while application data resides on the shared disks, where each node can read data written by the others. Concurrent disk access by applications requires a distributed lock manager (DLM), and the distance between the shared disk subsystem and the cluster nodes is limited by the chosen medium (SCSI, Fibre Channel, and so on).

Server mirroring. Mirrored disks provide data redundancy without a shared disk subsystem by duplicating data between servers. Besides lower cost, another advantage of server mirroring is that the connection between the primary server and the secondary server can run over a LAN, which removes the SCSI distance limit. Data written to the primary server is also written to the secondary server, and data integrity is maintained by locking the server data. Some server mirroring products can also shift workload from the primary server to the secondary one.

Shared nothing. Some cluster products use a "shared nothing" architecture, in which the nodes neither share a centralized disk nor mirror data between themselves. When a fault occurs, the cluster software transfers ownership of a disk from one node to another, without needing a distributed lock manager.

How is failover implemented? Clusters can be configured in several ways. The first is the N-way configuration, in which every node in the cluster has its own users and workload under normal conditions. The resources of a failed node can switch to the other nodes, but because the surviving servers take on the extra load, their performance declines. The N+1 configuration includes a hot standby system that sits idle until a primary system fails. With N+1, the performance of the other nodes is unaffected when a node fails; however, since the standby node provides no service under normal conditions, the cost is higher. In either configuration, when a problem occurs the cluster software first attempts local recovery: the ability to automatically restart the application or service on the local node. Local recovery is the preferred mode because it interrupts users less than switching over to another node. As for failover itself, some cluster products can fail resources over to remote nodes in different geographic regions, which suits disaster recovery requirements. In addition, to handle multiple node failures, some cluster products support cascading failover, which works like dominoes: node one fails over to node two, which in turn can fail over to node three, and so on.
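The choice of failover target under these two configurations can be sketched as follows. The node records, the standby flag, and the load figures are invented for illustration; real placement policies are far richer.

```python
# Hypothetical node state; real cluster software tracks much more than this.
nodes = [
    {"name": "node1", "alive": False, "standby": False, "load": 0.9},
    {"name": "node2", "alive": True,  "standby": False, "load": 0.6},
    {"name": "node3", "alive": True,  "standby": True,  "load": 0.0},
]

def pick_failover_target(nodes, failed_name):
    """Choose which surviving node inherits the failed node's resources."""
    survivors = [n for n in nodes if n["alive"] and n["name"] != failed_name]
    if not survivors:
        return None                           # no node left to fail over to

    # N+1: an idle hot standby takes the load, so active nodes keep full speed.
    standbys = [n for n in survivors if n["standby"]]
    if standbys:
        return standbys[0]["name"]

    # N-way: all nodes are active, so pick the least-loaded survivor and
    # accept that its performance will drop under the extra workload.
    return min(survivors, key=lambda n: n["load"])["name"]

print(pick_failover_target(nodes, "node1"))   # -> "node3", the hot standby
```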

Failover example. The following is a failover example for a two-node cluster in which each node has its own users and applications.

1. Node 1 suffers an application failure caused by a memory problem. Users receive errors and their applications stop running. The cluster management software notifies the system administrator.
2. Node 1 performs local recovery and restarts the failed application. Users can resume their work.
3. When the application fails again, the cluster software fails it over to node 2. The failover takes about a minute, during which users must wait. (The actual time may range from a few seconds to several minutes.) Some applications can detect the failover procedure and display a message informing users that the application is being transferred to another server.
4. The application and its client communications are switched to node 2. Once the application has restarted on node 2, users can continue working.
5. Node 1 is diagnosed and repaired. After node 1 rejoins the cluster, a failback (switch-back) process returns the application and its related resources to node 1. Failback can be performed manually or automatically; for example, it can be configured to occur only during off-peak hours. This whole sequence is sketched in code after this section.

Cluster scalability. Besides improved availability, scalability is the other major advantage of a cluster. Typically, performance can be improved through cluster load balancing; essentially, load balancing means moving applications and their resources from a busy node to a less busy one. Real scalability is achieved in other areas as well. The first is incremental growth: servers, disk storage, and the like can be added continuously without discarding the existing system, so as your computing demands grow, the cluster provides an environment that grows with you. The second kind of scalability appears with truly "cluster-aware" applications, which automatically distribute their workload across multiple nodes in the cluster. Parallelism is also available, allowing different "threads" of one application to run on different nodes and thereby greatly improving performance.
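Here is a minimal sketch of the recovery sequence from the numbered example above, with toy Node and App classes and an invented MAX_LOCAL_RESTARTS policy standing in for a real cluster manager.

```python
class Node:
    """A toy stand-in for a cluster node that can host applications."""
    def __init__(self, name):
        self.name = name

    def start(self, app):
        print(f"{app.name} running on {self.name}")

    def stop(self, app):
        print(f"{app.name} stopped on {self.name}")

class App:
    def __init__(self, name, home):
        self.name = name
        self.owner = home      # node currently running the application
        self.restarts = 0      # local restarts attempted so far

MAX_LOCAL_RESTARTS = 1         # try one local restart before failing over

def handle_app_failure(app, home, peer):
    """Steps 2-4: local recovery first, failover to the peer on repeat failure."""
    if app.restarts < MAX_LOCAL_RESTARTS:
        app.restarts += 1
        home.start(app)                 # step 2: restart on the local node
    else:
        peer.start(app)                 # steps 3-4: clients reconnect to peer
        app.owner = peer

def handle_node_repaired(app, home, peer, auto_failback):
    """Step 5: return the application once the repaired node rejoins."""
    if app.owner is peer and auto_failback:
        peer.stop(app)
        home.start(app)                 # failback, perhaps only off-peak
        app.owner = home
        app.restarts = 0

node1, node2 = Node("node1"), Node("node2")
db = App("payroll-db", home=node1)
handle_app_failure(db, node1, node2)    # local restart on node1
handle_app_failure(db, node1, node2)    # second failure: fail over to node2
handle_node_repaired(db, node1, node2, auto_failback=True)  # back to node1
```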

How do applications handle failover? The next question is, "How does the application deal with the switchover?" The answer is that it depends on the application and on the cluster product used. Some cluster products provide recovery or switchover kits for specific applications (such as databases or communication protocols); these kits can detect an application failure and restart the application on another server. How applications handle faults differs from one cluster product to another, and, as mentioned earlier, although various vendors have tried to define a common standard, no public standard exists today. Whether or not current applications must be modified to handle failover, the ultimate goal is for applications to be unaffected by hardware failures. One solution is a set of programs and APIs that work together with the operating system, allowing application vendors to create programs that perform these recovery functions; using such APIs makes an application "cluster-aware". Many vendors of current cluster products are working hard to ensure that their products support these APIs across the different operating systems.
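The shape of such an API might look roughly like the sketch below. Every hook name here is hypothetical; each cluster product defines its own equivalents, and the supervise function is a toy stand-in for the cluster software that would normally drive the hooks.

```python
class DatabaseService:
    """An application made "cluster-aware" by exposing recovery hooks."""

    def __init__(self):
        self.healthy = True

    def on_start(self, node):
        print(f"db: attaching storage and opening files on {node}")

    def on_stop(self, node):
        print(f"db: flushing buffers and shutting down cleanly on {node}")

    def health_check(self):
        return self.healthy     # polled by the cluster software

def supervise(app, nodes):
    """Toy stand-in for the cluster software driving those hooks."""
    current = nodes[0]
    app.on_start(current)
    if not app.health_check():          # fault detected through the hook
        app.on_stop(current)
        current = nodes[1]
        app.on_start(current)           # restart the app on another server
    return current

db = DatabaseService()
db.healthy = False                      # simulate an application fault
print("now running on:", supervise(db, ["node1", "node2"]))
```

The key point is the division of labor: the application supplies clean start, stop, and health hooks, and the cluster software decides when and where to invoke them.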

On the standards front, the Virtual Interface Architecture (VIA) is a program jointly launched by Intel, Compaq, HP, Microsoft, Dell, SCO, and Tandem. Cluster hardware and software products developed under the VIA specification will be vendor-independent, giving users more choices when purchasing cluster technology.
