Sun Cluster Working Principles
The structural arrangement of this chapter takes the important concepts in Sun Cluster as its main line; the relevant working principles are explained along with the introduction of each concept.

The concept of quorum is often used in distributed systems. In its original sense, a quorum is the agreement reached by a majority of members at a critical moment when interests compete, yielding the best available decision. You can understand it as a mechanism for reaching a consistent opinion among the majority, or as the agreement reached by that majority. The actual fraction required to form a quorum differs from case to case: it may be 2/3, or it may be anything over 50%. In a distributed computer system, the potential members of a quorum are a set of entities that can communicate with one another. To keep the system sound and to make key decisions about its behavior, the members exchange information until a quorum is finally formed.

Two kinds of quorum are used in Sun Cluster:
• The Cluster Membership Monitor (CMM) requires a quorum among a group of cluster nodes that are eligible to become cluster members. Editor's note: this means the CMM must obtain the consent of a majority within the set of nodes that have a cluster-node relationship. In "the consent of the majority", the members do not have to be anything in particular; they only need to form a majority agreement, and here they happen to be nodes. This type of quorum is called the CMM quorum, or cluster quorum.
• The Cluster Configuration Database (CCD) requires a quorum to select a valid, consistent CCD copy. Here the members are the CCD copies.
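Editor's note: as a rough illustration of the "more than half" rule described above (this is not Sun Cluster code; the function name and the strict-majority threshold are assumptions made only for the sketch):

    # Minimal sketch of a simple-majority quorum check (illustrative only).
    def has_quorum(votes_present: int, votes_configured: int) -> bool:
        """Return True when strictly more than half of the configured votes are present."""
        return votes_present * 2 > votes_configured

    # Example: in a 4-member group, 3 members form a quorum, 2 members do not.
    assert has_quorum(3, 4) is True
    assert has_quorum(2, 4) is False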
4.1 CMM Quorum

In Cluster 2.2, if SSVM or CVM is used as the volume manager, the quorum is determined by the cluster framework software; if DiskSuite is used as the volume manager, the quorum is determined by DiskSuite. The CMM quorum is determined as follows:
• If SSVM or CVM is used as the volume manager, the quorum is a majority of the votes cast by the voting nodes and other neutral devices. In a two-node cluster, a quorum device provides the third vote needed to form a quorum. Note: "quorum device" and "quorum" are two different concepts; the quorum device is described later.
• If DiskSuite is used as the volume manager — not discussed here.

A quorum must be formed when a node joins the cluster and when the private network connection between cluster nodes fails. Cluster 2.x strives to avoid human intervention while both guaranteeing data integrity and maintaining cluster availability; this is why the quorum device is used, and in multi-node clusters the terminal concentrator (TC) is used as well. In Cluster 2.2, the volume manager is a major factor in how the quorum is determined when the cluster heartbeat connection fails. For details, see the Quorum Device section.
4.2 CCD Quorum

The Cluster Configuration Database (CCD) needs a quorum to select a valid, consistent CCD copy. The CCD itself is covered in a later chapter. The cluster configuration information is stored in the CCD. Each cluster node holds one copy of the CCD, and under normal circumstances the copies on the nodes are kept synchronized; the CCD copies communicate over the private network. A failure, however, may leave the copies unable to synchronize, and that is when a CCD quorum is needed. A valid CCD must be determined after the failure is recovered; if a valid CCD cannot be determined, all query and update operations on the CCD fail with a CCD error alarm.

Requiring all nodes to be started before a valid CCD copy can be determined would be a very restrictive condition. The restriction can be relaxed by placing a constraint on update operations. If N is the number of nodes configured in the current cluster, then when the cluster restarts, CEILING(N) copies are sufficient to select a valid CCD copy, where CEILING(N) = (N+1)/2 when N is odd and CEILING(N) = N/2 when N is even. For 1 node, 1 copy is sufficient; for 2 nodes, 1 copy is sufficient; for 3 nodes, 2 copies are sufficient; for 4 nodes, 2 copies are sufficient. So CEILING(N) can be understood as "at least half". The valid CCD is then propagated to all nodes that do not hold the latest copy.

Note: even if the CCD is invalid, a node is still allowed to join the cluster. In this case, however, the CCD can neither be updated nor queried (verification: when a node starts, a message about querying the CCD is printed). This means that all cluster components that depend on the CCD are in a non-functional state; in particular, logical hosts cannot be mastered and data services cannot be activated. The CCD is valid only when enough nodes have joined the cluster to form a quorum. The CCD quorum problem can be avoided if at least one node stays up throughout the CCD reconfiguration; in that case the valid copy held by any of these nodes is propagated to the nodes that recently joined the cluster. Another safeguard is to make sure the cluster is started on the node that holds the latest CCD copy. If, however, the system crashes during a CCD update, it is quite likely that the recovery algorithm will afterwards find inconsistent CCD copies; in that case the system administrator needs to restore the CCD by running ccdadm with the appropriate parameters. The CCD also provides a checkpoint facility for backing up its current contents. Backing up the CCD after any change to the system configuration is a very good habit and may prove useful in a future recovery. Compared with a common relational database the CCD is very small, so backup and restore take only a few seconds. In a two-node cluster, the recommendation is to start the node with the latest CCD first and then start the other node; the latest CCD is copied to the other node, which greatly reduces the chance of CCD quorum problems and the failures they cause.
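Editor's note: a minimal sketch of the CEILING(N) rule above (illustrative only; the function name is an assumption made for the sketch):

    # Illustrative sketch of the CCD quorum rule described above (not Sun Cluster code).
    def ccd_copies_required(n_nodes: int) -> int:
        """CEILING(N): (N + 1) // 2 for odd N, N // 2 for even N."""
        return (n_nodes + 1) // 2 if n_nodes % 2 else n_nodes // 2

    # Matches the examples in the text: 1 -> 1, 2 -> 1, 3 -> 2, 4 -> 2.
    for n in (1, 2, 3, 4):
        print(n, ccd_copies_required(n))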
4.3 Quorum Device

In certain cases, for example a two-node cluster in which the private network connection between the nodes fails while both nodes are still cluster members, Sun Cluster needs the help of a physical device to resolve the CMM quorum. This physical device is the quorum device. The quorum device is simply a disk or controller specified during installation. It is a logical concept: designating a device as the quorum device has no effect on how it is otherwise used. According to Sun's official documentation, SSVM does not allow a single disk partition (such as c1t1d1s5) to form a separate disk group (DG) on its own (although in practice this can be done), so a complete disk and its mirror are required to serve as the quorum device.

The quorum device ensures that at any point in time only one node can update the shared disks. If the heartbeat signal between the two nodes is lost, it is no longer possible to tell which node should access the shared disks, and the quorum device is needed. A node attempts to update data on the shared disks only when it can determine that it is a member of the quorum. The nodes take a vote, the quorum, to decide which nodes remain in the cluster. Each node must determine how many nodes it can communicate with (over the private interconnect, of course). If it can communicate with more than half of the nodes in the cluster, it becomes a member of the quorum and is allowed to keep its cluster membership; if it cannot become a member of the quorum, it exits automatically. The quorum device provides a third-party vote to prevent a tie. For example, when the heartbeat signal in a two-node cluster is lost, each node races to win the support of the quorum device. The node that acquires the quorum device then holds votes over the node that did not in a ratio of 2:1. The node that wins the quorum takes control of the shared disks and restarts its cluster, while the other node exits (in this case the cluster has only one member, but the cluster still exists).

In fact, at every cluster reconfiguration (note that this is not the installation-time configuration; a reconfiguration takes place whenever a member joins, a member exits, a logical host switches over, and so on), the set of nodes and the quorum device vote once to determine the new system configuration. Only if a quorum is reached does the reconfiguration complete, and after the reconfiguration only the nodes belonging to the quorum remain in the cluster.

The failure of the quorum device does not cause a two-node cluster to fail. Although a quorum device failure does not cause services to switch over, it does reduce the availability of the HA system, because the system can no longer tolerate a subsequent heartbeat failure. A failed quorum device can be reconfigured or replaced while the cluster is running, and during this process the cluster keeps operating normally as long as no other component fails. If the heartbeat signal between the two nodes is lost, both nodes try to start the cluster reconfiguration process so as to become the only node in the cluster (because each has lost the other's heartbeat). The first node that successfully reserves the quorum device reconfigures into a cluster containing only itself; the node that cannot reserve the quorum device exits.
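Editor's note: the sketch below illustrates the two-node race just described, under the assumption that the total of 3 votes (2 node votes plus 1 quorum device vote) requires a strict majority; the function name and structure are invented for the sketch and are not the actual CMM implementation.

    # Illustrative sketch of the two-node split-brain vote (not the actual CMM code).
    # Total votes = 2 node votes + 1 quorum device vote = 3; surviving needs > 3/2, i.e. 2 votes.
    def stays_in_cluster(reaches_peer: bool, reserved_quorum_device: bool) -> bool:
        """A node keeps its membership only with a strict majority of the 3 votes."""
        votes = 1                                    # the node's own vote
        votes += 1 if reaches_peer else 0            # peer reachable over the private interconnect
        votes += 1 if reserved_quorum_device else 0  # won the race for the quorum device
        return votes * 2 > 3

    # Heartbeat lost: the node that wins the quorum device survives (2:1);
    # the other node exits and cannot form a second cluster on its own.
    print(stays_in_cluster(reaches_peer=False, reserved_quorum_device=True))   # True
    print(stays_in_cluster(reaches_peer=False, reserved_quorum_device=False))  # False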
If the heartbeat has still not been recovered and an attempt is made to start the node that exited, that node (still unable to communicate with the cluster) tries to reserve the quorum device, because it believes it should be a cluster member. The attempt fails, because the reservation on the quorum device is still held by the surviving node. This measure effectively prevents the node from forming a second cluster.

4.4 Logical Host

In an HA configuration, the cluster supports the concept of a logical host. A logical host is a set of resources that can be moved between physical nodes as a unit. In Sun Cluster, this set of resources includes a set of network hostnames with their associated IP addresses, plus one or more disk groups. In Sun Cluster, an IP address is assigned to a logical host and is temporarily bound to the physical host on which the server application is running. This IP address is floating, that is, it can move from one node to another. Clients of the cluster access a logical host by specifying its floating IP address rather than the fixed IP address of a physical node.
As shown in Figure 4-1 (original figure), the logical host hahost1 includes the network hostname hahost1, the floating IP address 192.9.200.1, and the disk group diskgroup1. The logical host name and the disk group name may be different. Editor's note: a logical host can simply be thought of as: logical hostname + floating IP address + the disk groups associated with it.

Figure 4-1 (original figure) Logical host hahost1

A logical host is mastered (controlled) by a physical host, and only after mastering the logical host can the physical host access the logical host's disk groups. One physical host can master several logical hosts, but each logical host can be mastered by only one physical host at a time. Any physical host capable of mastering a logical host is called a "potential master". A data service makes its services accessible by mapping them to a well-known logical hostname.
Figure 4-2 (original figure) shows several data services located on the disks of one logical host. In this example, assume that the logical host hahost2 is currently mastered by the physical host phys-hahost2. If phys-hahost2 fails, all of the data services are switched over to phys-hahost1.

Figure 4-2 (original figure) Logical hosts and data services

A later section describes how to configure a data service on a logical host. As can be seen, the difference between a logical host and a data service is that the logical host has all the attributes of a host and is the foundation on which data services are provided; even without any data service, the logical host can still exist. A data service focuses on providing a data "service". Sun repeatedly stresses that a data service's data must reside on shared disks, for the sake of HA, and that the service must be presented through the floating IP of the logical host, that is, the service is provided through this floating IP, even though the IP concept itself is not emphasized. Several data services can be provided on one logical host. Once the logical host fails, all of the data services on it are switched over together. Therefore, if you do not want the failure and switchover of one data service to affect the HA of the other data services on the same logical host, consider creating a separate logical host for that data service. Because a logical host can be mastered by only one physical host at a time, the corresponding data services can be provided by only one physical host at a time and cannot be provided by several physical hosts simultaneously.
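Editor's note: a minimal Python sketch of the relationships described above; every class, field and data service name here is an assumption made for the sketch, not a Sun Cluster data structure.

    # Illustrative model of logical hosts and data services (names invented for the sketch).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class LogicalHost:
        hostname: str                           # well-known logical hostname, e.g. "hahost2"
        floating_ip: str                        # moves with the logical host
        disk_groups: List[str]                  # shared disk groups that move with it
        data_services: List[str] = field(default_factory=list)
        current_master: str = ""                # the one physical host mastering it now

    def fail_over(lh: LogicalHost, backup_physical_host: str) -> None:
        """The logical host is the unit of failover: its floating IP, disk groups and
        every data service configured on it move to the backup node together."""
        lh.current_master = backup_physical_host

    # Example: hahost2, mastered by phys-hahost2, fails over to phys-hahost1,
    # taking every data service configured on it along.
    hahost2 = LogicalHost("hahost2", "192.9.200.2", ["diskgroup2"],
                          ["nfs-service", "dbms-service"], current_master="phys-hahost2")
    fail_over(hahost2, "phys-hahost1")
    print(hahost2.current_master)               # phys-hahost1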
4.5 Mount Information

The /etc/vfstab file contains the mount information for local file systems. For a multihost file system used by a logical host, every node that may master that logical host must hold the file system's mount information. The mount information for a logical host's file systems is kept on each node in a separate file named /etc/opt/SUNWcluster/conf/hanfs/vfstab.logicalhost (that is, each logical host has its own vfstab file, with the logical host name as the suffix). The format of this file is the same as that of /etc/vfstab, so it is very easy to maintain, although some of the fields are not used. The same file system cannot be mounted on more than one node at the same time, because a file system can be mounted on a node only after the disk group (DG) it belongs to has been imported on that node.

4.6 Public Network Management (PNM)

Some failures cause all of the logical hosts on a node to migrate to another node, but the failure of a single network adapter, or of the cable between it and the public network, does not cause the node to fail over. The Public Network Management (PNM) software (part of the Sun Cluster framework software) allows network adapters to be organized into groups; once one adapter in a group fails, another adapter in the group can take over the network services it was providing. During failure detection and switchover, users perceive only a slight pause in access. In a configuration that uses PNM, there are several network adapters on the same subnet, and these adapters are organized into a "backup group". At any time every network adapter must belong to a backup group, and only one adapter in a backup group can be active (note that this applies only to public network adapters; there is no such restriction on private network adapters). When the active public network adapter fails, the PNM software automatically switches the network service to another adapter in the backup group. All network adapters on the public network must belong to a backup group.
Figure 4-3 (original figure) shows hme0, hme1, and hme2 forming a backup group in which hme0 is active. Note: even when a node has only one public network adapter, so that there is nothing to switch to when it fails, a backup group is still used, because the backup group also monitors the public network.

Figure 4-3 (original figure) Network adapter failover configuration
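Editor's note: a rough sketch of the backup-group behaviour just described: one adapter per group is active, and on failure the service moves to the next healthy adapter in the same group. The class name, method names and the health-check placeholder are assumptions made for the sketch, not the PNM implementation.

    # Illustrative sketch of a PNM backup group (not the PNM implementation).
    from typing import List, Optional

    class BackupGroup:
        def __init__(self, adapters: List[str]):
            self.adapters = adapters        # e.g. ["hme0", "hme1", "hme2"]
            self.active = adapters[0]       # only one adapter in the group is active

        def adapter_is_healthy(self, adapter: str) -> bool:
            # Placeholder: PNM decides this by probing the public network.
            return True

        def handle_failure(self) -> Optional[str]:
            """If the active adapter has failed, move the network service to the next
            healthy adapter in the same group; users see only a brief pause."""
            if self.adapter_is_healthy(self.active):
                return self.active
            for candidate in self.adapters:
                if candidate != self.active and self.adapter_is_healthy(candidate):
                    self.active = candidate
                    return candidate
            return None                     # whole group down: treated as a node-level failure

    group = BackupGroup(["hme0", "hme1", "hme2"])
    print(group.active)                     # hme0 remains active until it fails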
4.7 System Failover

In an HA system, when a node fails, the data services running on that node are automatically transferred to another node and continue to run there. The failover software switches the floating IP addresses of the failed node's logical hosts to the backup node, and all of the data services running on those logical hosts are automatically moved off the failed node and restarted on the backup node. A system administrator can also switch a logical host to another node manually. The difference between a failover and a manual switchover is that the former is handled automatically by the Sun Cluster software when a node fails, while the latter is performed by the system administrator, for example during routine system maintenance or a software upgrade.

Figure 4-4 shows an ordinary two-node configuration in which each physical host masters one logical host (shown by the thick lines), and two clients access the data services on the two logical hosts.

Figure 4-4 (original figure) Symmetric configuration before switchover

If phys-hahost1 fails, the logical host hahost1 is transferred to phys-hahost2. The floating IP of hahost1 also moves to phys-hahost2, and accesses to the data services are redirected to phys-hahost2. During the cluster reconfiguration, clients accessing hahost1 perceive only a short delay; the new configuration is shown in Figure 4-5. Note that although the logical host hahost1 has changed physical hosts, from the client's point of view the logical host being accessed has not changed, and all of this is completed automatically by the cluster software. Since phys-hahost2 now masters both logical hosts, all of the associated shared disks can be accessed only by phys-hahost2.

Figure 4-5 (original figure) Asymmetric configuration after failover or manual switchover

4.8 Partial Failover

If several logical hosts are mastered by one physical host, each logical host is allowed to switch over to the backup node separately, without affecting the others. For example, if there are three logical hosts on a physical host, it is possible for just one of them to switch to the backup node while the other logical hosts remain on the original physical host.

4.9 Summary

According to the needs of the application, the user processes running on the two nodes together make up one or more data services. The system monitors the execution of each data service and the health of each node. When a node fails, the data services running on that node are transferred to another node as needed, so that long interruptions of the application are avoided. As can be seen, the Sun two-node system provides software and hardware fault detection, system management, switchover, and automatic restart of data services, all with a single purpose: to avoid, as far as possible, any interruption of the services the system provides. The following points are essential to understanding the basic working principles of the two-node system:
1. Each node is a stand-alone server system. Each node has its own SPU, network interface cards, disks, and so on, and can work independently without the other system.
2. Switchover between the two nodes should be transparent to the outside world. That is, users should not feel that the service they use has migrated from one node to another, even though the service may be briefly interrupted during the migration. The floating IP address is set up to achieve this.
3. Switchover between nodes should be automatic. The switchover is completed automatically by the cluster software according to the fault conditions of the two nodes, and should not require human intervention.
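Editor's note: as an illustration of point 2, the sketch below shows a client that talks only to the logical host's floating address and simply retries across the brief interruption caused by a switchover; the hostname, port number and retry policy are assumptions made for the sketch and are not part of Sun Cluster.

    # Illustrative client view of a switchover (hostname, port and retry policy are invented).
    import socket
    import time

    def connect_with_retry(logical_host: str, port: int, attempts: int = 10) -> socket.socket:
        """Connect to the floating address of a logical host, retrying across the
        short interruption caused by a failover or manual switchover."""
        last_error = None
        for _ in range(attempts):
            try:
                return socket.create_connection((logical_host, port), timeout=5)
            except OSError as err:          # brief outage while the logical host moves
                last_error = err
                time.sleep(3)
        raise last_error

    # The client names only the logical host; it never knows (or cares) which
    # physical node currently masters that logical host.
    # conn = connect_with_retry("hahost1", 2049)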