Cluster Architecture and Scalability Research: A Comparison (Part 2)
Lin Fan (iamafan@21cn.com)
R&D Manager, Chen Xun Software Studio
November 2001
This article is the second half of "Scalability of Clusters and Their Distributed Architecture". It continues by introducing several common parallel computing architectures, scalability and the single system image, important cluster indicators, and related topics.
Scalable parallel computing architectures

First, let us look at the main architecture types in the development of computer systems. The differences between these architectures are not large; the key lies in the interconnect technology, the complexity of the nodes, and the degree of coupling between them. In cluster computing and distributed systems, the following three architectures are representative.
Shared-nothing architecture
(the mode used by most current clusters; each node is an independent PC or workstation)
Most of the cluster systems we study belong to this type of architecture. Each node of the cluster is a complete collection of a stand-alone operating system and hardware devices. The nodes are connected in a loosely coupled manner through a local area network or a switch array, and they share part, or even all, of their available resources: CPU, memory, disk, I/O devices, and so on, forming what appears externally to be a single, powerful computer system. Such systems have weak SSI capability on their own and require special middleware or OS extensions.
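To make the shared-nothing model concrete, here is a minimal sketch in C using MPI (the message-passing package mentioned again later in this article). Each process owns a private address space on its own node, so the only way to move data is an explicit message; the value 42 and the message tag are arbitrary choices for illustration.

```c
/* Shared-nothing message passing: every process has private memory,
 * so data travels only through explicit send/receive pairs. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    if (rank == 0) {
        int token = 42;
        /* process 0 distributes a value to every other process */
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&token, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process %d of %d received %d\n", rank, size, token);
    }

    MPI_Finalize();
    return 0;
}
```

Compiled with an MPI wrapper (for example, mpicc) and launched with one process per node, the program behaves like one job spread across otherwise independent machines.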
Shared disk architecture (nodes are essentially independent computers, with little or no local disk file system)
Distributed file systems are the typical application of this architecture; the common NFS, AFS, or GFS all belong to this category. On the hardware side it is often implemented with shared disk arrays or SANs. This architecture mainly solves the capacity problem of regional storage space: by constructing a single virtual file system, it provides a huge storage device for the entire cluster. Especially in high-availability settings, shared disk arrays often solve reliability issues such as fault tolerance and data consistency.
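As a small illustration of coordinating through shared storage, the C sketch below takes an advisory POSIX lock on a file living on the shared file system before touching shared data. The path /shared/cluster.lock is hypothetical, and the sketch assumes the shared file system honors fcntl() locks, as NFS does through its locking protocol.

```c
/* Coordinating nodes via a lock file on the shared disk. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* hypothetical lock file on the cluster's shared file system */
    int fd = open("/shared/cluster.lock", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct flock fl = {0};
    fl.l_type   = F_WRLCK;   /* exclusive write lock   */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;         /* 0 means the whole file */

    /* F_SETLKW blocks until no other node holds the lock */
    if (fcntl(fd, F_SETLKW, &fl) == 0) {
        printf("lock acquired; safe to update shared data\n");
        /* ... read or write shared state here ... */
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);  /* release the lock */
    }
    close(fd);
    return 0;
}
```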
Shared memory architecture (the hardest to implement, with the strongest SSI capability)
In implementation difficulty, whether measured by hardware manufacturing complexity or by software, this architecture greatly exceeds the other system structures. Cluster systems that implement it include DSM (distributed shared memory) clusters and NUMA and ccNUMA technologies. In such an architecture, multiple nodes are assembled into a single system with one consistent memory space. As we will see later, such systems have the best SSI (single system image) capability.
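For contrast with message passing, here is a minimal sketch in C of the programming model that a single consistent memory space presents: several threads update one shared counter directly, guarded by a mutex instead of exchanging messages. It illustrates the shared-memory model on a single multiprocessor, not an actual DSM implementation.

```c
/* Single address space: all workers see the same memory, so
 * "communication" is just a protected read-modify-write. */
#include <pthread.h>
#include <stdio.h>

#define WORKERS 4

static long counter = 0;  /* shared by every thread */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* keep updates consistent */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[WORKERS];

    for (int i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, work, NULL);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(t[i], NULL);

    printf("counter = %ld\n", counter);  /* 4 * 100000 = 400000 */
    return 0;
}
```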
Scalability and single system image

Ultimately, we hope that the parallel cluster we face, whether intended for intensive computation or for a highly reliable commercial environment, has good scalability, an acceptable unit computing cost, and promising technology prospects. Yet when designing a computing system, especially in a parallel environment, the core requirement of scalability is sometimes forgotten.
But when we look at the computing cluster from another perspective, we reach a different conclusion. In fact, for end users and programmers, the focus of a parallel computer model is what the computer they see looks like, which is usually called the SSI (single system image).
A programmer, of course, hopes to face one machine rather than a pile of machines: one machine means a single address space, with no need to handle complex programming techniques such as message passing or remote calls. From this point of view, a cluster system with a single address space provides this capability. Or the user may want one huge, uniform file system, which requires SSI work at the file system level.
From the user's point of view, however, he does not care how you handle the address space, or whether messages are passed; none of that matters to him. Users only care about a seemingly independent computer system that reduces the complexity of use: no switching back and forth between multiple systems, and convenient management of the "one machine" he faces. SSI at the management and usage levels is therefore necessary. Thus, the parallel computing model is the abstract parallel computer as seen by the user (programmer or end user): like the von Neumann machine for sequential computation, it is a computer system that can execute sequential programs (possibly parallel computing programs) and parallel computing tasks.
If we classify parallel systems according to processor, memory, OS, and interconnect, and plot node complexity against single-system-image capability, we obtain the following figure:
[Figure: architecture comparison of clusters, distributed systems, MPP, and SMP]
In the figure above, a node can be a PC, a workstation, or an SMP server. Node complexity refers to the capability of its software and hardware. In general, a cluster node is more complex than an MPP node, because each cluster node has a separate operating system and peripherals, while a node in an MPP may run only a microkernel of an operating system.
Compared with PCs and commodity cluster nodes, the node complexity of an SMP server is relatively high. Take the most common x86-architecture SMP server: not only are its motherboard and bus technology far more complicated than a PC's, but to support enterprise-level applications it must also support more high-end peripherals and provide hot-swap capability, memory error correction, and so on, all of which increase the complexity of SMP.
MPP usually refers to a large parallel processing system with a shared-nothing resource structure. It generally contains hundreds of processor nodes; the nodes typically run an incomplete OS (also called a microkernel) and are interconnected by high-speed switches. Such proprietary systems often have rather good scalability, but their technical evolution is limited by the proprietary system itself.
SSI, a major factor in cluster implementation, can exist at the application level, the subsystem level, the runtime-system level, the operating-system kernel level, and the hardware level. In other words, SSI is not absolute but relative: whether the SSI lies at the IP level, in the memory space, or in the file system depends on the perspective taken, and is determined by the final application environment.
In the distributed-system category, the system often provides multiple system images, presenting itself as a collection with multiple entry points and multiple images, each node having a high degree of autonomy. MPP and SMP provide relatively unified computing resources in a compact manner, like one huge workstation. A distributed system, besides homogeneous nodes, often uses heterogeneous platforms as needed, which inevitably increases its design difficulty and management complexity. Other features are shown below:
The features of the four system types compare as follows (each feature listed as MPP / SMP / Cluster / Distributed system):

- Number of nodes: O(100) to O(1000) / O(10) to O(100) / O(10) to O(1000) or more / indeterminate
- Node complexity: fine grain to medium / medium or coarse grain / medium grain / wide range
- Internode communication: message passing or shared variables / shared memory / message passing / shared files, RPC, message passing
- Job scheduling: single queue on the host / single run queue / coordinated multiple queues / independent run queues
- Single system image: partial support / full SSI support / support at some levels / currently unsupported
- Node operating system: one main kernel plus many microkernels / one complete OS / N independent complete OSes / N similar or heterogeneous OSes
- Address space: multiple, or single with distributed shared memory / single / multiple / multiple
- System availability: low to medium / low / high or fault tolerant / moderate
- Ownership: one organization / one organization / one or more organizations / many organizations
- Internode connection: tightly coupled, within one physical space / within a single chassis / loosely coupled / across regions or countries

Comparison of various parallel systems
Of these four types of systems, SMP has the highest degree of SSI: it provides SSI at all levels, that is, it shares all system resources: a single address space, a single file system, a single operating system kernel, and so on, and it looks no different from a single-CPU machine. MPP supports SSI only in some application and system layers. The degree of SSI a cluster provides is lower still, generally meeting SSI requirements in only one or two respects. For distributed systems, such as grids, the degree of SSI is lowest; through cross-platform tools such as Java, a distributed system may provide SSI capability in a certain sense, such as a single Java runtime space.
Important cluster indicators

For clusters we can form a simple notion: a cluster is a collection of complete computers (also called nodes) physically connected by a high-performance network or LAN. Typically, each computer node is an SMP server, a workstation, or, most commonly, a PC. Most importantly, these individually independent computers can work together, and "from outside" they look like a single, integrated computing resource.
Merely connecting computers with a LAN and calling the result a cluster has no practical value. What matters when examining a cluster are several performance and functional indicators:
Usability: Because each node in the cluster runs a traditional platform, users can develop and run their programs in a familiar, mature environment. The generic platform provides programming environments, operation interfaces, control and monitoring systems, and even GUIs, allowing users to run large numbers of the programs they originally ran on workstations, without modification. We can therefore regard the cluster system as one large workstation: for the user the operation is no different, only the performance is much higher.
Availability: Availability refers to the percentage of time that a system is up for productive use, commonly characterized by the MTBF (mean time between failures). Traditional whole systems, such as mainframes and fault-tolerant systems, rely on expensive custom designs to achieve high availability; a cluster uses no custom components, providing high availability from inexpensive commodity parts instead.
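In the standard formulation (the repair term MTTR, mean time to repair, is added here for completeness; the text above names only MTBF), availability is:

$$\text{Availability} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

For example, a system with an MTBF of 1000 hours and an MTTR of 10 hours is available 1000/1010, or about 99%, of the time.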
Extensive hardware redundancy is the most common way a cluster achieves this, applied in several areas:

- Processors and memory: The cluster has many processor and memory components. When one component fails, the others remain usable without affecting the overall operation of the cluster. By contrast, in an SMP the processors communicate through the shared memory and bus, so a memory failure crashes the whole system: the memory is a "single point of failure" for SMP.
- Disk arrays: Familiar RAID configurations such as RAID 5 can satisfy a single computer's disk-redundancy requirements (a minimal parity sketch follows this list). In a cluster, multiple local disks are often combined through standard sharing protocols (NFS and the like) to support fault tolerance: when one node's local disk fails, work can continue on a remote disk. The common NAS device is a disk device dedicated to the cluster network; alternatively, distributed file system software can provide disk fault tolerance across multiple cluster nodes.
- Operating system: A cluster can generally achieve a single system image at some level, but multiple operating system images still exist: each node has a separate operating system. When one node crashes because of a software or hardware failure, the other nodes continue working unaffected, and the cluster as a whole behaves no differently than before. We sometimes call this property "node fault tolerance".
- Communication network: A good cluster design fully considers every likely failure and takes all feasible measures to avoid it, and the communication faults of cluster nodes must also be considered. In a large, complex cluster, the failure of a single communication link may take down more than one node, and may even make the entire cluster unavailable. Suitable redundant links must therefore be placed between the key points of the cluster. The cluster's entry nodes, master control node, or monitoring node are generally the likeliest to become single points of failure, so backup links on the access paths to these nodes achieve the best effect.
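To make the disk-redundancy idea concrete, here is a minimal C sketch of RAID 5-style parity with toy parameters chosen for illustration (four data blocks of eight bytes each): the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors.

```c
/* RAID 5-style parity in miniature: parity = XOR of data blocks. */
#include <stdio.h>
#include <string.h>

#define DISKS 4
#define BLOCK 8  /* bytes per block, kept tiny for the example */

/* XOR all data blocks together into the parity block. */
static void make_parity(unsigned char data[DISKS][BLOCK],
                        unsigned char parity[BLOCK]) {
    memset(parity, 0, BLOCK);
    for (int d = 0; d < DISKS; d++)
        for (int i = 0; i < BLOCK; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild one lost block: XOR the parity with every surviving block. */
static void rebuild(unsigned char data[DISKS][BLOCK],
                    unsigned char parity[BLOCK], int lost) {
    memcpy(data[lost], parity, BLOCK);
    for (int d = 0; d < DISKS; d++)
        if (d != lost)
            for (int i = 0; i < BLOCK; i++)
                data[lost][i] ^= data[d][i];
}

int main(void) {
    unsigned char data[DISKS][BLOCK] = { "block0", "block1",
                                         "block2", "block3" };
    unsigned char parity[BLOCK];

    make_parity(data, parity);
    memset(data[2], 0, BLOCK);          /* simulate losing disk 2 */
    rebuild(data, parity, 2);
    printf("recovered: %s\n", data[2]); /* prints "block2" */
    return 0;
}
```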
Scalability: A cluster's computing capacity grows as nodes are added, and because the structure is loosely coupled, a cluster can be extended to hundreds of nodes, while pushing an SMP beyond a few dozen processors is very difficult. In SMP, the shared memory and the memory bus are the bottleneck of system performance. When the same application runs on a cluster, there is no such memory bottleneck: each task can run on its own node and make full use of local memory. For such applications, the cluster provides higher aggregate memory bandwidth and lower memory latency. The cluster's local disks likewise aggregate into a large disk space that can easily exceed the space of a centralized RAID array. With this enhanced processing, storage, and I/O capability, a cluster can solve very large application problems, as long as a well-developed parallel software package such as PVM or MPI is used.
SMP does not scale well because it uses a contended bus and centralized shared memory. Its single operating system image and shared memory are also two potential single points of failure, which reduce the availability of SMP.
Fault-tolerant systems have extremely high availability but are expensive to extend. MPP has stronger extension capability and can maintain fairly good SSI ability. The cluster currently sits in a middle, compromise position and is developing toward higher performance.
Performance/price ratio: The cluster obtains the advantages above cost-effectively. Traditional supercomputers and MPPs easily cost tens of millions of dollars; by comparison, a cluster with the same peak performance costs one to two orders of magnitude less. The cluster's large numbers of commodity components follow Moore's law in performance and price, so the performance/cost ratio of the cluster improves far faster than that of MPP.
[Figure: comprehensive comparison of availability and scalability]
Designing a cluster system that extends well requires taking all of the above aspects into account.
First, keep each component of the cluster as independent as possible, so that independent local extension is possible, and guarantee backward compatibility. Also use commodity components as much as possible, including the OS, the interconnect, the host systems, and even the application programming environment, ultimately achieving: algorithms independent of the architecture, applications independent of the platform, languages independent of the machine, and nodes independent of the network. Second, choose an appropriate model for the design tasks of the cluster system, and use popular, open, standard parts to reduce unit cost.
Finally, balance performance across the design to avoid the "wooden bucket principle" in the system (as is well known, how much a wooden bucket holds is limited by its shortest plank); and besides availability, pay attention to single points of failure, lest a small error make the entire system unavailable.
So, after the discussion above, let us see what the cluster we expect actually looks like.
Conclusion

The reason we have spent so much space introducing several important architectural concepts of the cluster is that these concepts make up the final overall picture of the cluster. Summarizing the aspects discussed above, we arrive at the following elements of a cluster:
- Independent nodes: Each node is a complete computer, typically a stand-alone system.
- Single system image capability: The cluster is a single computing resource. It treats its nodes as individual resources and, by means of single-system-image technology, realizes a unified resource with a single entry point. SSI makes the cluster easier to use and manage.
- Effective node connection: The nodes in a cluster typically use commodity networks such as Ethernet, FDDI, fiber, or ATM, and standard network protocols establish the communication mechanism between the nodes. Together these guarantee effective cluster communication.
- Enhanced availability: Clustering provides a cost-effective way to increase system availability; compared with mainstream component-level fault-tolerance products, clusters tend to achieve the effect at a more reasonable cost. In most commercial domains, enhanced system availability is a design goal, and it can be implemented with cluster techniques.
- Better performance: In a sense the cluster was born of performance demands. In scientific computing, engineering applications, and remote virtual-reality simulation, the cluster should provide higher performance: acting as a super server, it completes in the shortest time tasks that the original stand-alone system could not, or provides huge disk and memory space, accomplishing those "impossible tasks".
References
Kai Hwang and Zhiwei Xu, Scalable Parallel Computing: Technology, Architecture, Programming
Mark Baker (ed.), Cluster Computing White Paper
Rajkumar Buyya (ed.), High Performance Cluster Computing: Architectures and Systems, Volume 1
About the author

Lin Fan is currently engaged in Linux-related research at Xiamen University. He is greatly interested in cluster technology and hopes to exchange ideas with like-minded friends. You can contact him via email at iamafan@21cn.com.