Linux server cluster system (2)
Contents:
Introduction
General architecture of the LVS cluster
Scalable Web service
Scalable media service
Scalable Cache service
Scalable mail service
Summary
References
About the author
The architecture of the LVS cluster, by Wensong Zhang (wensong@linux-vs.org), April 2002
This article describes the architecture of the LVS cluster. It first presents the general architecture of the LVS cluster and discusses its design principles and corresponding features; it then shows how the LVS cluster is applied to build scalable Web, media, Cache and mail services.
1. Introduction

Over the past decade, the Internet has grown from a network connecting a handful of research institutions into a global network carrying a vast number of applications and services, and it has become an indispensable part of people's lives. Although the Internet has developed very quickly, building and maintaining large network services is still a challenging task: the system must offer high performance and high availability, and especially as access keeps growing, the system must scale to satisfy constantly growing performance requirements. Because there is no framework or design methodology for building scalable network services, only sites with excellent engineering and management talent can build and maintain large network services. To address this situation, this article presents the general architecture of the LVS cluster and discusses its design principles and corresponding features; finally, the LVS cluster is applied to build scalable Web, media, Cache and mail network services.

2. General architecture of the LVS cluster

The LVS cluster uses IP load balancing technology and content-based request distribution technology. The scheduler has very good throughput: it distributes requests evenly across the different servers for execution, and it automatically masks server failures, thereby turning a group of servers into a high-performance, highly available virtual server. The structure of the whole server cluster is transparent to the client, and neither the client programs nor the server programs need to be modified.
Figure 1: Architecture of the LVS cluster
To this end, the transparency, scalability, high availability and manageability of the system must be considered in the design. In general, the LVS cluster adopts a three-tier structure, whose architecture is shown in Figure 1. The main components of the three tiers are:
Load balancer: the front-end machine of the whole cluster. It distributes the customers' requests to a group of servers, while the customers see the service as coming from a single IP address (which we may call the virtual IP address).
Server pool: a group of servers that actually execute the customers' requests, running services such as Web, Mail, FTP and DNS.
Shared storage: provides a shared storage area for the server pool, which makes it easy for the servers in the pool to hold identical content and provide identical services.
The scheduler is the single entry point of the server cluster system. It can use IP load balancing techniques, content-based request distribution techniques, or a combination of both. With IP load balancing, the servers in the pool must provide identical services: when a customer request arrives, the scheduler selects a server according to the server loads and the configured scheduling algorithm, forwards the request to the selected server, and records this scheduling decision; when subsequent packets of the request arrive, they are also forwarded to the previously selected server. With content-based request distribution, the servers may provide different services: when a customer request arrives, the scheduler can select a server to execute it according to the content of the request. Because all these operations are completed in the kernel space of the Linux operating system, the scheduling overhead is very small, so the scheduler can sustain a very high throughput.

The number of nodes in the server pool is variable. When the load received by the whole system exceeds the processing capacity of the current nodes, more servers can be added to the pool to meet the growing request load. For most network services there is no strong correlation among requests, so requests can be executed in parallel on different nodes; therefore the performance of the whole system basically grows as the number of nodes in the server pool increases.

Shared storage is usually a database, a network file system or a distributed file system. Data that the server nodes must update dynamically are generally stored in a database system, and the database guarantees data consistency under concurrent access. Static data can be stored in a network file system (such as NFS/CIFS), but the scalability of a network file system is limited; in general, an NFS/CIFS server can support only 3 to 6 busy server nodes. Large-scale cluster systems can use a distributed file system such as AFS [1], GFS [2,3], Coda [4] or InterMezzo [5]. A distributed file system provides shared storage for the servers, which access it just as they access a local file system, while the distributed file system itself provides good scalability and availability. In addition, when applications on different servers access the same resource on the distributed file system at the same time, the access conflicts must be resolved to keep the resource consistent. This requires a distributed lock manager, which may be provided inside the distributed file system or externally. When writing applications, developers can use the distributed lock manager to guarantee the consistency of concurrent accesses from different nodes.

The load scheduler, the server pool and the shared storage system are connected by a high-speed network, such as 100 Mbps switched Ethernet, Myrinet or Gigabit Ethernet, so that the network will not become a bottleneck of the whole system when the system is scaled up.

Graphic Monitor provides monitoring of the whole cluster system for the system administrator, who can watch the status of the system through it. Graphic Monitor is browser based, so the administrator can monitor the system whether on site or off site. For security reasons, the browser must pass HTTPS (Secure HTTP) and identity authentication before it can carry out system monitoring, configuration and management.
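To make the scheduler configuration described above concrete, here is a minimal sketch using the ipvsadm tool that administers IPVS; the virtual IP address, real-server addresses and weights are assumptions made up for the example:

    # Create a virtual TCP service on the virtual IP address,
    # scheduled with the weighted least-connection (wlc) algorithm.
    ipvsadm -A -t 192.168.1.100:80 -s wlc

    # Add two real servers to the server pool; -m selects NAT
    # forwarding and -w sets the weight used by the algorithm.
    ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.2:80 -m -w 1
    ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.3:80 -m -w 2

    # Show the virtual services, real servers and connection counts.
    ipvsadm -L -n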
2.1. Why a hierarchical architecture

A hierarchical architecture makes the tiers relatively independent of one another. Each tier provides different functionality, and different software can be reused within a tier. For example, the scheduler tier provides load balancing, scalability and high availability, while different network services such as Web, Cache, Mail and Media can run in the server tier to obtain the corresponding scalable network service. The clear functional division and clear hierarchy make the system easy to build, easy to maintain as a whole, and easy to scale in performance.

2.2. Why shared storage

Shared storage, such as a distributed file system, is optional in this LVS cluster architecture.
When the network service needs to serve identical content, shared storage is a good choice; otherwise each server has to copy the same content onto its local disk. The more content the system stores, the higher the price of this shared-nothing structure, because every server needs the same amount of storage space, every update has to touch every server, and the maintenance cost of the system becomes very high. Shared storage provides a unified storage space for the server group, which makes maintaining the content of the system relatively easy: if the webmaster updates a page only in the shared storage, the change takes effect for all servers. A distributed file system also provides good scalability and availability: when the storage space of the distributed file system grows, the storage space of all servers grows with it. Most Internet services are read-intensive applications, and a distributed file system can use part of each server's local disk as a cache (for example 2 GB of space), which brings the speed of accessing the distributed file system close to that of accessing a local disk.

In addition, developments in storage hardware are pushing clusters from the shared-nothing model toward shared storage. Storage Area Network (SAN) technology lets every node of a cluster connect to and share a huge disk array, and hardware vendors also offer a variety of disk-sharing technologies, such as Fibre Channel and shared SCSI. InfiniBand is a general-purpose high-performance I/O specification that lets a storage area network transfer I/O messages and cluster communication messages with lower latency, and it provides good scalability. InfiniBand is supported by most large vendors, such as Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft and Sun Microsystems, and is becoming an industry standard. The development of these technologies makes shared storage easier to build, and prices will gradually fall as volume production grows.

2.3. High availability

A characteristic of cluster systems is that they have redundancy in both hardware and software. High availability is achieved by detecting node or service-process failures and reconfiguring the system correctly, so that requests received by the system can still be handled by the remaining nodes. Typically, a resource monitoring process runs on the scheduler and monitors the health of each server node. When a server does not respond to ICMP ping, or its network service gives no response within the specified time, the resource monitoring process notifies the operating system kernel to remove the server from the scheduling list or mark it invalid, so that new service requests will no longer be scheduled to the failed node. The resource monitoring process can report failures to the administrator by e-mail or pager. Once the monitoring process detects that the server has recovered, it notifies the scheduler to add it back to the scheduling list. In addition, through the management programs provided by the system, the administrator can issue commands at any time to add new machines to raise the processing capacity of the system, or to cut an existing server out of service in order to perform maintenance on it.
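As a minimal sketch of such a resource monitoring loop (the probe command, addresses and interval are assumptions for illustration, not the actual monitoring software used with LVS):

    VIP=192.168.1.100
    REAL=10.0.0.2

    while true; do
        # Probe the node's HTTP service with a 5-second timeout.
        if curl -s -m 5 -o /dev/null "http://$REAL/"; then
            # Healthy: make sure the node is in the scheduling list
            # (re-adding an existing destination fails harmlessly).
            ipvsadm -a -t $VIP:80 -r $REAL:80 -m 2>/dev/null
        else
            # Failed: remove the node so no new requests reach it.
            ipvsadm -d -t $VIP:80 -r $REAL:80 2>/dev/null
        fi
        sleep 10
    done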
The front-end scheduler may become a single point of failure of the system. In general the reliability of the scheduler is high, because relatively few programs run on it and most of them have been thoroughly exercised, but failures caused by hardware aging, network cabling or human error cannot be ruled out. To prevent a failure of the scheduler from stopping the whole system, a backup scheduler needs to be set up for the primary scheduler. Two heartbeat processes [6] run on the primary and backup schedulers; they report their own health status to each other over heartbeat channels such as a serial line and UDP.
When the backup scheduler cannot hear the heartbeat of the primary scheduler for some time, it takes over the cluster-external Virtual IP Address by sending gratuitous ARP messages, and takes over the primary scheduler's work of providing the load-scheduling service. When the primary scheduler recovers, there are two ways of handling it; we do not discuss here the false takeover caused by a heartbeat failure while the primary scheduler is in fact still working normally.

Normally, when the primary scheduler fails, the state information of all connections established on it is lost and the existing connections are interrupted; customers have to reconnect, and the backup scheduler then schedules the new connections to the servers, which causes some inconvenience to customers. For this reason, the IPVS scheduler implements an efficient state synchronization mechanism in the Linux kernel that synchronizes the state information on the primary scheduler to the backup scheduler in a timely manner. When the backup scheduler takes over, most of the established connections can continue.
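As a sketch of how this state synchronization is switched on (these ipvsadm daemon options belong to later IPVS releases, and the interface name is an assumption for the example):

    # On the primary scheduler: multicast connection state over eth0.
    ipvsadm --start-daemon master --mcast-interface eth0

    # On the backup scheduler: receive and replay the multicast state.
    ipvsadm --start-daemon backup --mcast-interface eth0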
3. Scalable Web service

The architecture of a scalable Web service based on LVS is shown in Figure 2. The first tier is the load scheduler, which generally uses IP load balancing techniques to give the whole system high throughput; the second tier is the Web server pool, where each node can run the HTTP service, the HTTPS service, or both; the third tier is shared storage, which can be a database, a network file system, a distributed file system, or a mixture of the three. The nodes in the cluster are connected by a high-speed network.

Figure 2: Web cluster based on LVS

For dynamic pages (such as PHP, JSP and ASP), the dynamic data to be accessed are typically stored in a database server. The database service runs on an independent server and is shared by all Web servers. Whether several dynamic pages on the same Web server or dynamic pages on different Web servers access the same data, the database server orders these accesses through locking to guarantee data consistency. Static pages and files (such as HTML documents and images) can be stored in a network file system or a distributed file system; which one to choose depends on the size of the system and its requirements. Through the shared network file system or distributed file system, the webmaster sees a unified document storage space, so maintaining and updating pages is easier, and a page modified in the shared storage takes effect for all servers. In this configuration, when all server nodes are overloaded, the administrator can quickly add new server nodes to handle requests, without having to copy the Web documents to the local disks of the new nodes.

Some Web services may use HTTP cookies, a mechanism that stores data in the customer's browser in order to track and identify the customer. After HTTP cookies are used, different connections from the same customer are correlated and must be sent to the same Web server. Some Web services may use the secure HTTPS protocol, which is the HTTP protocol plus the SSL (Secure Socket Layer) protocol. When a customer accesses the HTTPS service (whose default port is 443), an SSL connection is first established to exchange symmetrically encrypted certificates and negotiate an SSL key that encrypts the subsequent session. During the lifetime of the SSL key, all of the customer's later HTTPS connections use this SSL key, so different HTTPS connections from the same customer are also correlated. To handle these needs, the IPVS scheduler provides a persistence feature, which can make different connections from the same IP address go to the same server node in the cluster within a configured time; this solves the customer connection affinity problem well.
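A minimal sketch of such a persistent HTTPS virtual service (the virtual IP address, timeout and forwarding method are assumptions for the example):

    # -p 600 keeps connections from the same client IP address on
    # the same real server for 600 seconds, so SSL sessions and
    # cookie-based sessions stay on one node.
    ipvsadm -A -t 192.168.1.100:443 -s rr -p 600
    ipvsadm -a -t 192.168.1.100:443 -r 10.0.0.2:443 -m
    ipvsadm -a -t 192.168.1.100:443 -r 10.0.0.3:443 -m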
4. Scalable media service

The architecture of a scalable media service based on LVS is shown in Figure 3. The first tier is the load scheduler, which generally uses IP load balancing techniques to give the whole system high throughput; the second tier is the media server pool, where each node runs the corresponding media service; the third tier is shared storage, which stores the media programs through a network file system or a distributed file system. The nodes in the cluster are connected by a high-speed network.

Figure 3: Media cluster based on LVS

The IPVS load scheduler generally uses the direct routing method (the VS/DR method, to be described in detail in a later article) to build a media cluster system. The scheduler distributes the media service requests evenly to the servers, and the media servers return the response data directly to the customers, so the whole media cluster system has very good scalability.

The media servers can run a variety of media service software. At present the LVS cluster has good support for Real Media, Windows Media and Apple QuickTime media services, and there are real systems running all of them. In general, a streaming service uses one TCP connection (for example the RTSP protocol, Real-Time Streaming Protocol) for bandwidth negotiation and flow-rate control, and returns the stream data to the customer over UDP. Here the IPVS scheduler provides persistence covering TCP and UDP together, ensuring that the media TCP connection and UDP stream from the same customer are forwarded to the same media server in the cluster, so that the media service is carried out correctly.

Shared storage is the most critical issue in a media cluster system, because media files are often very large (a movie needs several hundred megabytes to several gigabytes of storage), which places high demands on storage capacity and read speed. For a small media cluster system, say 3 to 6 media server nodes, the storage system can be a Linux server with a Gigabit network card, using software RAID and a journaling file system and running the kernel NFS service; this gives fairly good results. For a large media cluster system, it is better to choose a distributed file system that performs striped storage and caching of files: a media file is striped across several storage nodes of the distributed file system, which improves file-system performance and the load balance among the storage nodes, while media files are automatically cached on the media servers, which improves file access speed. Otherwise, one can consider developing corresponding tools on the media servers. A cache tool can track the hottest media files of the recent period, copy hot files to the local disk, replace files that are no longer hot in the cache, and finally notify the other media server nodes of the media files it caches and of its own load. An application-level scheduling tool on the media server receives the customers' media service requests: if the requested media file is cached on the local disk, the request is handed directly to the local media service process; otherwise the tool first checks whether the file is cached by another media server, and if the file is cached by another server that is not busy, it forwards the request to the media service process on that server; otherwise it hands the request to the local media service process, which reads the media file from the back-end shared storage.

The benefit of shared storage is that the administrators of the media files see a unified storage space, which makes maintaining the media files relatively convenient. When customer accesses grow so much that the whole system becomes overloaded, the administrator can quickly add new media server nodes to handle the requests.

Real is famous for its highly compressed audio/video formats, its Real Media Server and its media player RealPlayer. Real is using the structure described above to run a cluster of more than 20 servers that provides Web and audio/video services to its users worldwide. Real's senior technical supervisor claims that LVS beats all the commercial load balancing products they have tried [7].
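A minimal sketch of grouping a customer's TCP and UDP connections onto one media server with a firewall-mark virtual service (the addresses and the mark value are assumptions; persistence on a plain port-0 virtual service is another option):

    # Mark every packet destined for the virtual IP with mark 1, so
    # RTSP over TCP and the UDP media stream fall into one IPVS
    # virtual service.
    iptables -t mangle -A PREROUTING -d 192.168.1.100 -j MARK --set-mark 1

    # Persistent virtual service selected by firewall mark; -g selects
    # direct routing (VS/DR), so the media servers answer directly.
    ipvsadm -A -f 1 -s wlc -p 600
    ipvsadm -a -f 1 -r 10.0.0.2:0 -g
    ipvsadm -a -f 1 -r 10.0.0.3:0 -g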
5. Scalable Cache service

An effective network cache system can greatly reduce network traffic, response latency and server load. But if a Cache server is overloaded and cannot handle requests in time, it will instead increase the response latency. So the scalability of the Cache service is important: when the system load keeps growing, the whole system can be scaled up to raise the processing capacity of the Cache service.
In particular, a Cache service on a backbone network may need several Gbps of throughput, which a single server (for example SUN's current highest-end Enterprise 10000 server) is far from able to reach. It follows that building a scalable Cache service with a cluster of PC servers is a very effective approach, and also the one with the best performance/price ratio. The architecture of an LVS-based Cache cluster is shown in Figure 4. The first tier is the load scheduler, which generally uses IP load balancing techniques to give the whole system high throughput; the second tier is the Cache server pool; generally, the Cache servers are placed close to backbone Internet connections and can be distributed in different networks. There can be multiple schedulers, placed close to the customers.

Figure 4: Cache cluster based on LVS
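Because the Cache servers can sit in networks far from the scheduler, the IP tunneling method discussed below is used; a minimal sketch of such a VS/TUN setup (addresses and device names are assumptions for the example):

    # On the scheduler: -i selects IP tunneling (VS/TUN), so the
    # real servers may live in other physical networks.
    ipvsadm -A -t 192.168.1.100:80 -s wlc
    ipvsadm -a -t 192.168.1.100:80 -r 10.1.0.2:80 -i
    ipvsadm -a -t 192.168.1.100:80 -r 10.2.0.2:80 -i

    # On each Cache server: accept tunneled packets for the VIP on
    # the tunl0 device and reply to the customers directly (the ARP
    # problem for the VIP must also be handled; see the LVS docs).
    ifconfig tunl0 192.168.1.100 netmask 255.255.255.255 up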
The IPVS load scheduler generally uses the IP tunneling method (the VS/TUN method, to be described in detail in a later article) to build the Cache cluster system, because the Cache servers may be placed at different locations (for example close to backbone Internet connections), so the scheduler and the Cache server pool may not be in the same physical network. With the VS/TUN method, the scheduler only schedules the Web Cache requests, and the Cache servers return the response data directly to the customers. When a requested object cannot be hit locally, the Cache server sends a request to the source server, fetches the result, and finally returns the result to the customer. If a commercial scheduler based on NAT technology were used, the packets would have to pass in and out of the scheduler four times to complete the request; with the VS/TUN method (or the VS/DR method), the scheduler handles the request only once, and the other three legs travel over the Internet directly. So this method is especially effective for Cache cluster systems.

The Cache servers use local disks to store the cacheable objects, because cacheable objects involve write operations and the local disk gives good I/O access speed. The Cache servers exchange information with one another through the ICP protocol (Internet Cache Protocol) over a multicast channel. When a Cache server does not have the currently requested object on its local disk, it can ask the other Cache servers whether they have a copy of the object; if a neighboring Cache server has one, the copy is fetched from that neighbor, which further raises the hit rate of the Cache service. The JANET Web Cache network, which serves more than 150 universities and regions, built a scalable Cache cluster in November 1999 with just over 50 mutually independent Cache servers; users reported that the network felt as fast as during the summer vacation, when the students are away. This shows that load scheduling can smooth out the access bursts on individual servers and raise the resource utilization of the whole system.

6. Scalable mail service

As the number of Internet users grows, many ISPs face the problem of overloaded mail servers. When a mail server can no longer accommodate more user accounts, some ISPs buy a higher-end server to replace the original one, but migrating the original service and data (such as the users' mail) to the new server is tedious work and interrupts the service. Other ISPs set up new servers with new mail domain names and place new mail users on the new servers; for example, Shanghai Telecom currently uses the different mail servers public1.sta.net.cn, public2.sta.net.cn through public9.sta.net.cn to hold users' mail accounts. This statically partitions users onto different servers, which leads to load imbalance among the mail servers and low resource utilization of the system, and it also makes mail addresses harder for users to remember.

Figure 5: Scalable mail cluster based on LVS