Introduction
Lin Fan (iamafan@21cn.com)
R&D Manager, Chen Xun Software Studio, November 2001
This is a new column on cluster technology. The author will introduce cluster systems by looking at cluster scalability and architecture analysis, the underlying theory, design considerations, specific case studies (LVS, Beowulf, MOSIX), cluster high-availability technology, distributed file systems, and more. This article is the first in the series. It mainly describes the origins of the cluster concept and the definition and goals of distributed systems. Through a story, the author draws attention to a question every cluster solution must answer: how should one look at a cluster, and from what point of view should a cluster system be examined? Only once this fundamental standpoint is settled can cluster technology be put to good use on real problems.
An old saying goes: "The great trend under heaven is that what has long been divided must unite, and what has long been united must divide." The development of high-end computing architectures follows the same law, as can be seen in the movement between centralized data processing, in industries such as finance and telecommunications, and distributed computing.
Partitioning, the technique of "dividing", originated in the mainframe era. After a company bought a mainframe, many of its applications used only a small part of the machine's resources. If the production department needed only 10% of the machine but required better security, partitioning could meet that requirement: it isolated some of the host's processors and system resources and organized just those resources into a computing system for that department, leaving the remaining computing resources unaffected. This is the early concept of the physical partition. Partitioning has since developed well beyond the mainframe. As distributed computing applications keep expanding, high-performance UNIX servers and IA-architecture servers have gradually taken over the mainframe's position in many areas, and partitioning plays an important role on these platforms as well.
Partitioning gives users a higher return on investment and higher utilization, more flexible application deployment, and the ability to use and allocate resources dynamically. These benefits are delivered by different partition types, and the types are becoming more numerous, more fine-grained, and more closely tailored to applications. The most familiar partitioning technique is running multiple operating systems on one PC, which is a local physical partition. In a multi-machine environment, computers can be separated from one another by partitioning (even if several of them sit in the same physical location), and the partitioned computers can be used at different levels to improve the utilization of the overall system. The most common example is the VPN. Using a security protocol, a VPN joins scattered remote computer systems into a logical "local area network" so that they can cooperate on complex tasks while keeping remote communication secure. Computers that are physically together (for example, in one office) can in turn be merged into a cluster that uses their idle time to complete additional real-time computing tasks.
Now for "uniting". Cluster technology is a way of connecting computer systems: it links dispersed computing systems so that together they can complete tasks that no single node could complete alone. The earliest cluster systems were built for exactly this purpose, parallel processing. However, as computer performance improved and network security problems emerged, system stability and reliability became the main problems people needed to solve. People began connecting two or more machines so that when a single failure, or several local failures, occurs in the cluster, the other computers automatically take over from the failed machines. The most typical example is dual-machine hot backup: two computer systems run cluster software, one acting as backup for the other, and when the primary system crashes, the other takes over its tasks. In addition, using the high parallel performance of a cluster for complex scientific and engineering calculations is also very economical. For cluster systems, computer architecture is a very important issue. Computers of the same architecture can be clustered easily, usually with cluster software supplied by the manufacturer. For heterogeneous computer systems there is generally little choice, although middleware technologies such as Java can currently solve some cross-platform problems. Cluster technology can effectively address the stability, stress tolerance, and load balancing of open systems.
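The hot-backup idea can be illustrated with a minimal sketch. It assumes the backup detects a crash through a missed heartbeat, which is one common approach and not something the article specifies; the node names, the `choose_active` helper, and the 3-second timeout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (in seconds) of the most recent heartbeat


def choose_active(primary: Node, backup: Node, now: float, timeout: float = 3.0) -> Node:
    """Return the node that should serve requests at time `now`.

    The primary serves while its heartbeat is fresh. Once the heartbeat
    is older than `timeout` seconds, the backup takes over its tasks.
    """
    if now - primary.last_heartbeat <= timeout:
        return primary
    return backup


# Hypothetical two-node hot-backup pair.
primary = Node("node-a", last_heartbeat=100.0)
backup = Node("node-b", last_heartbeat=100.0)

# The heartbeat is 1 s old, so the primary keeps serving.
active_ok = choose_active(primary, backup, now=101.0)

# No heartbeat for 10 s, so the backup takes over.
active_failed = choose_active(primary, backup, now=110.0)

print(active_ok.name, active_failed.name)
```

Real cluster software also has to handle cases this sketch ignores, such as both nodes believing they are active at the same time.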
Why do you need a cluster
Until now, anyone building an information application system could choose between only two kinds of computing system. One is the host-based computing system, which generally uses the partitioning model described above. The other is the client/server-based group computing system. A host-based system offers good scalability, reliability, and high performance, but it is expensive. It forces users to invest heavily in hardware at the very start of a project, and the host they pay so much for may include many functions they do not need, which wastes resources. A client/server-based group system lets users add hardware gradually as actual needs grow, but such a system is not a true cluster. It lacks the necessary availability and manageability, users pay a high price for application upgrades and management, and every additional server/client connection increases the management burden at both ends.
Future demands on computing speed, system reliability, and cost effectiveness will therefore drive the development of other computing models to replace the ones above. With the arrival of computer networks, a new kind of system with a better performance/price ratio has gradually entered the mainstream: the distributed cluster computing system. Whatever task the user needs to complete, distributed cluster computing provides as much processing power and transparent data access as possible, while achieving both high performance and high reliability.
Cluster computing is the most economical computing model. A cluster lets users build on ordinary commodity hardware and add new hardware as needed to improve scalability and availability. Low-priced, low-end platforms thereby gain the high scalability and high availability that used to be available only on high-end systems. This both improves system performance and reduces cost, achieving the goal that more computers mean more speed.
Interest in clusters keeps growing. Researchers of many kinds are studying every aspect of distributed hardware architecture and distributed system software design in order to exploit the potential parallelism and availability of clusters.
Cluster computing systems (or distributed systems) come in many forms and involve different system architectures. For some users, a cluster is a collection of processors that work together closely on a single problem. For others, a cluster may mean a computer network whose processors are geographically dispersed and are connected in order to share various resources.
However, the word "cluster" is used so widely in computing that its meaning has been somewhat devalued. Much of the confusion in this area comes from failing to distinguish physical distribution from logical distribution. Keeping these two concepts apart makes it possible to describe the properties of a distributed system more accurately.
For a distributed cluster system, we use the following definition. From the user's point of view, a cluster system is a single, ordinary system, but it runs on a set of autonomous processing elements (PEs, also called nodes). Each node has its own physical memory space, and the nodes are connected by a high-speed link or a standard commodity network. Nodes cooperate closely to compute the same task together. The system must support any number of processes and the dynamic addition of nodes. The main reasons for building a cluster system are:
Existing applications are preserved. Clusters arise naturally. In society, for example, people commonly form groups such as companies, communities, and classes, and share information within them. When moving from individual computing to distributed cluster computing, the applications that ran on individual systems can usually be kept: they are redeployed on the new cluster and gain performance. This is also a major reason clusters came about.
Performance/cost. The parallelism of a cluster reduces processing bottlenecks and improves performance across the board, so a cluster offers a better performance/price ratio.
Resource sharing. A cluster can effectively support users in different locations sharing information and resources (hardware and software).
Flexibility and extensibility. A cluster can be extended incrementally, and the system can be modified or extended to suit a changing environment without interrupting its operation.
Availability and fault tolerance. Because storage units and processing units are replicated, a cluster has the potential to keep running when part of the system fails.
Scalability. A cluster can easily be expanded to include more resources (hardware and software).
Recently we have noticed that, in addition to the established vendors of specialized cluster systems, several major hardware manufacturers such as IBM, HP, and SGI are developing, and planning to launch, open source cluster products based on Linux. The rapid development of the open source community gives clusters a good technical stage, and established Linux vendors such as TurboLinux and VA Linux have also made high-end Linux cluster applications their strategic direction, sparing no effort in a fiercely competitive market. We now face a wide variety of cluster solutions to choose from, in both software and hardware. So how can we build an excellent cluster system from the rich technical resources of open source? First of all, it is certain that an excellent cluster system does not happen by chance. It requires careful consideration of the user's application environment, business needs, and the budget available. Below, we explore some of the fundamental factors in cluster technology that an excellent cluster system should have. Before that, let's look at a small story that may be all too familiar.
An administrator's story
How users regard computer systems and computing resources varies from person to person. In a cluster application environment especially, different users place different demands on resources.
This is a story about a small, frequently visited portal site. For a while after the website went live, things went well. Then the administrator found that users were complaining about slow responses. So he upgraded the server's CPU and disk system and added 512 MB of memory, thinking that nobody would have anything more to complain about. Some time later, however (our site was lucky enough to attract many users, and the user count had grown fivefold), things were even worse, and at peak times the site even refused connections. What was going on? Hardware upgrades seemed to have reached their limit, and our administrator was in trouble again.
Here the unfortunate administrator had run into a system bottleneck. The logs showed that server CPU utilization stayed below 10%, yet many requests were waiting to be processed. Was the CPU not fast enough? Obviously not. Was the disk too slow? It was already the fastest RAID array available, with an average seek time of 5 ms, and could not get any faster. Was there not enough memory? The motherboard supported at most 4 GB, and that limit had already been reached. Everything was at its limit, and any further upgrade would mean moving to a mainframe (with endless investment to follow). The site still had to serve millions or even tens of millions of page views every day, so how could hardware costs be kept under control while performance improved significantly? Of course, one could add faster disk arrays to cut seek time and speed up responses, or use caching to speed up page access. But extending the hardware of a single system eventually runs out of room. Disk, memory, and network performance improve far more slowly than CPUs do under Moore's Law, so the components hold one another back. Moreover, in a single system, the more high-end the hardware, the more steeply the performance/cost ratio falls (see the figure below). Solving this kind of problem has to start from the architecture, not from simply upgrading the machine.
Single-machine system performance / price curve
In the story above, for the users (meaning both the system administrator and the visitors to the website), the server is a resource that provides Web services. It is a collection of computing, storage, and data resources (looking up data, providing space). What users care about is how quickly the site responds, not how much of the CPU is in use. Starting from the user's point of view, it is not hard to see that how you regard your computing resources is closely tied to the user's needs. The key to the problem therefore does not lie in a few simple numbers. The first thing the administrator must do is follow the user's need: improve the site's response speed or, more specifically, shorten the time the Web site takes to respond to each HTTP request.
We do not deny that upgrading hardware helps, but in a situation like this, replacing only some of the hardware is uneconomical. Moving the CPU from 500 MHz to 1 GHz will not double Web performance. Even if everything goes smoothly, such an upgrade improves Web performance by only about 10%. How frustrating.
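A rough worked example shows why a twice-as-fast CPU can yield only about 10%. This is a sketch using an Amdahl-style estimate, and the 18% figure for the CPU's share of response time is an assumption chosen for illustration, not a measurement from the story.

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl-style estimate of whole-request speedup.

    `fraction` is the share of response time spent in the upgraded
    component; `factor` is how many times faster that component becomes.
    """
    return 1.0 / ((1.0 - fraction) + fraction / factor)


cpu_fraction = 0.18  # assumed share of response time spent on the CPU
cpu_factor = 2.0     # 500 MHz -> 1 GHz, idealized as a CPU twice as fast

speedup = overall_speedup(cpu_fraction, cpu_factor)
print(f"overall speedup: {speedup:.2f}x")
```

Under these assumptions the rest of each request (disk, memory, network) is untouched by the upgrade, so it dominates the result.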
So let's think again. If piecemeal upgrades make no real difference, then go for a major overhaul and thoroughly change the server architecture. Of course, the cost has to stay within what the boss will accept, since money is not for burning. The options might be:
Replace the original system. Buy a 64-bit server such as Sun's Enterprise2000. Not only is the CPU powerful enough, the bus bandwidth is also sufficient for demanding applications, and millions of users have confirmed it: Sun is the vital "dot" in our ".com". The hardware and software leave nothing to be desired, so let's buy it! Hey, wait. What did you say? One Enterprise costs XX tens of thousands!? The existing ASP code has to be rewritten!? And we need to hire an administrator to maintain it!? Even I can't bear to hear that, let alone the person who pays the bill. Replacing the system means investment, investment, and more investment. And what about the old equipment? Throw it away? That would be a waste. Replacing the hardware platform is obviously not a good way to solve the problem. So how about expanding the system on its existing basis?
That leads to this approach: use free cluster software and, while keeping the original hardware investment, add a few new PCs to form a load-balancing cluster. Since one machine cannot handle the problem, let several machines share it. How?
Step 1: Install a Linux or BSD operating system. You need not worry about a lack of good applications, because a wide range of open source software is available. Whatever your application environment, the open source community can generally provide suitable, stable software.
Step 2: Install the appropriate kernel and the related system patches, then install the cluster package that matches your kernel. Options include LVS, LSF, and MOSIX, all of which can meet load-balancing requirements. Among them, LVS offers good scalability and performance thanks to its Netfilter technology, and is favored by many developers.
Step 3: Configure your load-balancing cluster. This amounts to configuring a few script files. Most cluster configuration files can be edited directly, have a simple syntax, and are easy to use.
Step 4: Configure your cluster service software. The service software here means the applications that provide the actual network services, generally a Web server or a mail server. The story is about a Web site, so let's use Apache. Whether your content is static text or CGI, Apache can help you migrate the site smoothly. If you are unlucky enough to be using a proprietary technology such as ASP, you can consider a third-party plug-in such as IASP for the migration. In general, Apache supports PHP, JSP, and CGI very well.
Step 5: Set up your data center appropriately. This step is very important for a cluster, because the data consistency problem is far more prominent in a cluster than on a single server. If the Web application is concentrated on database access, you can use a centralized database server such as PostgreSQL, MySQL, or even Oracle, which safeguards data integrity, security, and performance. Alternatively, you can use a distributed file system such as NFS or AFS to share storage space and application data among the cluster nodes.
Last step: Write a report. Tell your boss that you have not only solved the site's bottleneck but also greatly eased the pain of paying for upgrades (in fact, apart from buying some cheap servers, there is almost no other cost). Maybe the boss will even raise your salary.
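The core idea behind the steps above, letting several machines share the requests, can be sketched with a simple round-robin dispatcher. This is an illustration of the scheduling concept only and assumes nothing about how LVS or any other package is implemented; the class name and the server names `web1` to `web3` are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical cluster: the original server plus two added PCs.
servers = ["web1", "web2", "web3"]
balancer = RoundRobinBalancer(servers)

# Dispatch nine requests and count how many each server receives.
assignments = [balancer.pick() for _ in range(9)]
load = Counter(assignments)

print(assignments[:4])
print(dict(load))
```

Round-robin treats all servers as equal. When the machines differ in capacity, as they would if new PCs joined an older server, a weighted or least-connections policy is usually a better fit.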
That is the end of the story, and I wonder what you make of it. Cluster technology is in fact neither a rarefied art confined to the laboratory nor the product of old professors working behind closed doors. Cluster technology has "a close bearing on the national economy and the people's livelihood, and on the lives of ordinary people" ;). As we have seen, applying cluster technology properly not only brings significant economic benefits (most users are saving money) but also gives strong scalability in performance, availability, stability, and other respects. It has been the main direction in which computing systems have developed in recent years. Studying cluster technology means approaching cluster systems from several angles. From the resource perspective described above, you should ask which computing resource is needed: computing power, response speed, or availability. From the hardware architecture perspective, there are clusters of workstations (COW), massively parallel processors (MPP), symmetric multiprocessors (SMP), and distributed heterogeneous computing clusters (typically the Grid). By purpose, there are load-balancing clusters and the high-availability clusters widely used in demanding commercial environments. The technology itself spans several aspects: availability, single system image (SSI), job management, scheduling, communication, and so on. Looking at clusters in isolation from a single angle or aspect makes it impossible to truly understand their principles or to assess a cluster accurately in every respect, let alone choose the right cluster environment for an application's requirements. To understand clusters, you must start with architecture design and scalability.
About the author
Lin Fan is currently engaged in Linux-related research at Xiamen University. He is greatly interested in cluster technology and hopes to exchange ideas with like-minded friends. You can contact him by email at iamafan@21cn.com.