Linux server cluster system (3)
Contents:
Foreword
Ways to implement virtual services
Implementing a virtual server via NAT
Implementing a virtual server via IP tunneling
Implementing a virtual server via direct routing
Advantages and disadvantages of the three methods
Summary
References
About the author
Related articles:
1. LVS project introduction
2. LVS cluster architecture
wensong@linux-vs.org, April 2002
This article analyzes the three IP load balancing technologies implemented in the LVS cluster (VS/NAT, VS/TUN and VS/DR), together with their advantages and disadvantages.
1. Foreword

The previous article described several architectures for scalable network services, all of which require a front-end load scheduler (or several, in a master/backup arrangement). We first analyze the main techniques for implementing virtual network services and point out that, among the ways to implement a load scheduler, IP load balancing is the most efficient. Among existing IP load balancing technologies, the main one uses network address translation to turn a group of servers into a single high-performance, highly available virtual server; we call it VS/NAT (Virtual Server via Network Address Translation). After analyzing the drawbacks of VS/NAT and the asymmetry of network services, we propose VS/TUN (Virtual Server via IP Tunneling), which implements a virtual server through IP tunnels, and VS/DR (Virtual Server via Direct Routing), which implements it through direct routing; both greatly improve the scalability of the system. VS/NAT, VS/TUN and VS/DR are the three IP load balancing technologies implemented in the LVS cluster, and this article describes their working principles and their respective advantages and disadvantages.

In the descriptions that follow, we refer to the data communication between a client's socket and a server's socket as a connection, regardless of whether it uses TCP or UDP. We begin with a brief review of several load scheduling approaches for highly scalable, highly available network services, citing representative research projects for each.

2. Ways to implement virtual services

In a network service, one end is the client program and the other is the server program, possibly with a proxy in between. Seen this way, load balancing across multiple servers can be implemented at several different levels. Existing cluster-based solutions to network service performance problems fall into the following four categories.

2.1. RR-DNS-based solutions

The NCSA scalable web server system was the first prototype built on RR-DNS (Round-Robin Domain Name System) [1, 2]. Its structure and workflow are shown below:
Figure 1: Scalable web server based on RR-DNS (Note: This picture is from the literature [9])
A group of web servers shares all HTML documents through the distributed file system AFS (Andrew File System). The servers share one domain name (such as www.ncsa.uiuc.edu). When a user accesses this name, the RR-DNS server resolves it to the servers' different IP addresses in turn, spreading the access load across the servers.

This approach has several problems. First, the domain name system is distributed and organized hierarchically. When a user submits a resolution request to the local name server and that server cannot resolve it, the request is passed to the next level up, and so on, until the RR-DNS server resolves the name to the IP address of one of the servers. There are therefore several name servers on the path from the user to RR-DNS, and each caches the name-to-address mappings it has already resolved. All users served by such a name server are then directed to the same web server, producing serious load imbalance among the web servers. To keep mappings from being cached for too long, RR-DNS sets a TTL (Time To Live) on each name-to-address mapping; once it expires, a name server drops the mapping from its cache, and the next user request makes it query the upper-level name server again and obtain a fresh address. The question is how to choose the TTL. If it is too large, many requests are mapped to the same web server during the TTL period, again causing serious load imbalance. If it is too small, for example 0, local name servers forward requests to RR-DNS very frequently, which increases name resolution traffic and turns the RR-DNS server into a new bottleneck.
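The TTL trade-off described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a single caching name server sits between the users and RR-DNS, one request arrives per second, and the class names and the 10.0.0.x addresses are made up for the illustration.

```python
from collections import Counter


class RoundRobinDNS:
    """Authoritative server that hands out the addresses in rotation."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.next_index = 0
        self.queries = 0  # how often RR-DNS itself had to answer

    def resolve(self):
        self.queries += 1
        address = self.addresses[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.addresses)
        return address


class CachingResolver:
    """Intermediate name server that caches one answer for `ttl` seconds."""

    def __init__(self, upstream, ttl):
        self.upstream = upstream
        self.ttl = ttl
        self.cached = None
        self.expires_at = 0

    def resolve(self, now):
        if self.cached is None or now >= self.expires_at:
            self.cached = self.upstream.resolve()
            self.expires_at = now + self.ttl
        return self.cached


def simulate(ttl, requests=12):
    """Return (requests per web server, number of RR-DNS queries)."""
    dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    resolver = CachingResolver(dns, ttl)
    hits = Counter(resolver.resolve(now) for now in range(requests))
    return dict(hits), dns.queries
```

With `ttl=0` the twelve requests are spread evenly but every one of them reaches RR-DNS; with `ttl=3600` RR-DNS is queried once and a single web server receives all twelve requests.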
Second, client machines also cache name-to-address mappings, independently of the TTL, so a user's requests keep going to the same web server. Because users differ in how heavily and how long they use a site (some leave after a quick visit, others browse for hours), the load on the servers remains skewed and cannot be controlled. Suppose each user session averages 20 requests; the most heavily loaded server then receives over 30% more requests than the per-server average. In other words, even with a TTL of 0, bursty user access still produces fairly serious load imbalance.

Third, reliability and maintainability suffer. If a server fails, clients whose name resolution points to it see the service interrupted, and pressing the "Reload" button does not help. Nor can administrators take a server offline at will for maintenance such as operating system or application upgrades: they must first remove its IP address from the RR-DNS server's address list and then wait several days or longer, until every name server has expired its mapping to that server and every client mapped to it has stopped using the site.

2.2. Client-based solutions

Client-based solutions require each client to have some knowledge of the server cluster so that it can spread its requests across the servers itself. For example, when the Netscape Navigator browser accesses the Netscape home page, it randomly picks the Nth of more than 100 servers and sends the request to wwwN.netscape.com. This is not a good solution, however: Netscape merely uses its own Navigator to sidestep the problems of RR-DNS resolution, and other browsers such as IE still go through RR-DNS.

Smart Client [3] is another client-based solution, developed at Berkeley.
The service supplies a Java applet that runs in the client's browser. The applet queries each server for information such as its load and then sends the client's requests to a server chosen on that basis. High availability is also implemented in the applet: when a server does not respond, the applet forwards the request to another server. This approach is not transparent, the applet's queries add network traffic, and it is not generally applicable.

2.3. Solutions based on application-layer load balancing scheduling

Multiple servers are connected into a cluster by a high-speed interconnect, with an application-layer load scheduler at the front end. When a user request reaches the scheduler, it is handed to the load balancing application, which analyzes the request, selects a server according to the load on each server, rewrites the request and sends it to the selected server, and returns the result to the user once it has been obtained.

Typical application-layer load balancers include the Zeus load scheduler [4], PWEB [5], Reverse-Proxy [6] and SWEB [7]. The Zeus load scheduler is a commercial product from Zeus, built by adapting the Zeus web server program. PWEB is a parallel web scheduler based on the Apache 1.1 server program. When an HTTP request arrives, PWEB selects a server, rewrites the request, sends it to that server, waits for the result and passes it on to the client. Reverse-Proxy uses the Proxy and Rewrite modules of Apache 1.3.1 to build a scalable web server. Unlike PWEB, it first looks in the proxy's cache; only if no copy is found does it select a server, send it the request and forward the server's result to the client.
SWEB relies on the redirect status code in HTTP: when a client request reaches a web server, that server either handles the request itself or redirects the client to another web server, which yields a scalable web server.

Multi-server solutions based on application-layer load balancing also have problems. First, the processing overhead is very high, which limits scalability. From the moment a request reaches the scheduler until it has been fully handled, the scheduler performs four context switches and memory copies between kernel space and user space, maintains two TCP connections (one from the user to the scheduler, the other from the scheduler to the real server), and must analyze and rewrite the request. These steps consume considerable CPU, memory and network resources and take a long time, so system performance does not scale anywhere near linearly; the scheduler itself may become a new bottleneck once the server group grows to 3 or 4 machines. The scalability of this approach is therefore very limited. Second, an application-layer scheduler has to be written separately for each application. The systems above are all based on HTTP; for applications such as FTP, Mail or POP3, the scheduler must be rewritten.

2.4. Solutions based on IP load balancing scheduling

When users access a service through the virtual IP address, the request reaches the load scheduler, which performs load balancing scheduling: it selects one of a group of real servers, rewrites the packet's destination address from the virtual IP address to the address of the selected server and the destination port to that server's corresponding port, and finally sends the packet to the selected server.
When the real server's response packet passes through the load scheduler, the packet's source address and source port are changed to the virtual IP address and the corresponding port, and the packet is sent on to the user. Berkeley's MagicRouter [8], Cisco's LocalDirector, Alteon's ACEdirector and F5's BIG/IP all use this network address translation method. MagicRouter applied fast packet interposing on Linux 1.3 so that the user-space load balancing process could access the network device at close to kernel speed. This reduces the cost of context switches but does not eliminate it, and MagicRouter remained a prototype that never became a usable system. Cisco's LocalDirector, Alteon's ACEdirector and F5's BIG/IP are very expensive commercial systems; they support some TCP/UDP protocols, and some have problems with ICMP handling.

IBM's TCP Router [9] uses a modified network address translation method to build a scalable web server on the SP/2 system. TCP Router rewrites the destination address of the request packet and forwards it to the selected server; the server sets the source address of the response packet to the TCP Router's address rather than its own. The advantage is that response packets can return directly to the client; the disadvantage is that the operating system kernel on every server must be modified. IBM's NetDispatcher [10] is the successor of TCP Router. It forwards packets to the servers, and each server has the router's address configured on a non-ARP device. This approach is similar to VS/DR in the LVS cluster and is highly scalable, but a combined IBM SP/2 and NetDispatcher system costs millions of dollars. In general, IBM's technology is quite good.
In Bell Labs' ONE-IP [11], each server has its own IP address, but all of them also have the same VIP configured as an IP alias. Requests are distributed by either routing or broadcast; a server that receives a request handles it under the VIP and returns the result with the VIP as the source address. This method also avoids rewriting response packets, but configuring the same VIP on every server causes address conflicts, and on some operating systems it leads to network failure. Distributing requests by broadcast additionally requires modifying the source code of the server operating system to filter packets so that only one server handles each request.

Microsoft's Windows NT Load Balancing Service (WLBS) [12], acquired from Valence Research at the end of 1998, uses the same local filtering method as ONE-IP. WLBS runs between the network card driver and the TCP/IP protocol stack and receives packets whose destination address is the VIP. Its filtering algorithm checks the source IP address and port number of each packet to ensure that only one server passes the packet up to the next layer. When a node joins or fails, however, all servers must negotiate a new filtering algorithm, which breaks all existing sessions. WLBS also requires all servers to have the same configuration, such as network card speed and processing power.

3. Implementing a virtual server via NAT

Because IPv4 address space is increasingly scarce, and for security reasons, many networks use reserved IP addresses (10.0.0.0/255.0.0.0, 172.16.0.0/255.240.0.0 and 192.168.0.0/255.255.0.0) [64, 65, 66]. These addresses are not used on the Internet; they are reserved for internal networks.
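As a quick check of the reserved ranges, the sketch below uses Python's standard `ipaddress` module to test whether an address falls in one of the three private blocks (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). The helper name `is_reserved_private` is made up for this illustration.

```python
import ipaddress

# The three reserved blocks, written with explicit netmasks.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/255.0.0.0"),
    ipaddress.ip_network("172.16.0.0/255.240.0.0"),
    ipaddress.ip_network("192.168.0.0/255.255.0.0"),
]


def is_reserved_private(address: str) -> bool:
    """True if the address belongs to one of the reserved private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)
```

For example, `is_reserved_private("172.16.0.2")` is true, while `is_reserved_private("202.103.106.5")` is false.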
When hosts on an internal network need to access the Internet or be accessed from it, network address translation (NAT) is required to convert internal addresses into external addresses usable on the Internet. NAT works by correctly rewriting the packet headers (destination address, source address, ports and so on) so that clients believe they are connected to a single IP address, while the servers at different IP addresses believe they are connected directly to the clients. In this way, parallel network services at different IP addresses can be turned into one virtual service at a single IP address.

The architecture of VS/NAT is shown in Figure 2. A scheduler sits in front of a group of servers, connected to them through a switch or hub. The servers provide the same network service with the same content, so the result is the same no matter which server a request is sent to. The service content can be replicated to each server's local disk, shared through a network file system such as NFS, or provided by a distributed file system.

Figure 2: Architecture of VS/NAT
When a client accesses the network service through the virtual IP address, the request packet reaches the scheduler. The scheduler selects a server from the group of real servers according to the connection scheduling algorithm, rewrites the packet's destination address from the virtual IP address to the address of the selected server and the destination port to that server's corresponding port, and sends the modified packet to the selected server. At the same time, the scheduler records the connection in a connection hash table. When the next packet of this connection arrives, the address and port of the originally selected server are looked up in the hash table, the same rewriting is applied, and the packet is passed to that server. When a response packet from the real server passes through the scheduler, the scheduler changes its source address and source port to the virtual IP address and the corresponding port and sends it on to the user.

We introduce a state machine on each connection: different packets move the connection into different states, and each state has its own timeout. For TCP connections, state transitions follow the standard TCP finite state machine, which we do not describe here; see W. Richard Stevens, "TCP/IP Illustrated, Volume I". For UDP we use a single UDP state. The timeouts are configurable; by default, the SYN state times out after 1 minute, the ESTABLISHED state after 15 minutes and the FIN state after 1 minute, while the UDP state times out after 5 minutes. When a connection terminates or times out, the scheduler removes it from the connection hash table.

In this way, all the client sees is the service offered at the virtual IP address; the structure of the server cluster is transparent to the user.
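The connection hash table and its per-state timeouts can be sketched as follows. This is a minimal illustration, not the LVS kernel code: the class and method names are invented, the connection key is whatever tuple the caller passes in, each packet is assumed to refresh the timer, and the scheduling decision is delegated to a hypothetical `choose_server` callback.

```python
# Default timeouts from the text, in seconds.
TIMEOUTS = {"SYN": 60, "ESTABLISHED": 15 * 60, "FIN": 60, "UDP": 5 * 60}


class ConnectionTable:
    def __init__(self, choose_server):
        self.choose_server = choose_server  # hypothetical scheduling callback
        self.entries = {}  # connection key -> {"server", "state", "expires_at"}

    def handle_packet(self, key, state, now):
        """Return the real server for this packet, creating or refreshing the entry."""
        entry = self.entries.get(key)
        if entry is not None and now >= entry["expires_at"]:
            del self.entries[key]  # timed out: forget the old mapping
            entry = None
        if entry is None:
            entry = {"server": self.choose_server()}  # first packet: schedule
            self.entries[key] = entry
        entry["state"] = state
        entry["expires_at"] = now + TIMEOUTS[state]
        return entry["server"]

    def close(self, key):
        """Remove a terminated connection from the table."""
        self.entries.pop(key, None)
```

All packets of one connection reach the same real server for as long as the entry is alive; once the state's timeout passes without traffic, the next packet is scheduled afresh.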
For rewritten packets, an incremental checksum update algorithm adjusts the TCP checksum, avoiding the cost of scanning the whole packet to recompute it.

Some network services carry IP addresses or port numbers inside the packet data. If only the addresses and ports in the packet header are translated, the two become inconsistent and the service breaks. For these services, a corresponding application module must be written to translate the IP addresses or port numbers in the packet data. Network services known to have this problem include FTP, IRC, H.323, CUSeeMe, Real Audio, Real Video, Vxtreme/Vosaic, VDOLive, VIVOActive, True Speech, RTSP, PPTP, StreamWorks, NTT AudioLink, NTT SoftwareVision, Yamaha MIDPlug, iChat Pager, Quake and Diablo.

The following example, shown in Figure 3, illustrates VS/NAT further.

Figure 3: Example of VS/NAT
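Returning to the incremental checksum adjustment mentioned above, the idea can be sketched with the standard Internet checksum update rule from RFC 1624 (HC' = ~(~HC + ~m + m')). This is an illustration only: it works on a toy byte buffer rather than a real TCP segment with its pseudo-header, it assumes the rewritten field starts on a 16-bit boundary, and the function names are made up.

```python
import socket
import struct


def fold(total: int) -> int:
    """Fold carries back in (one's-complement addition on 16 bits)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(data: bytes) -> int:
    """Internet checksum computed by scanning the whole buffer."""
    if len(data) % 2:
        data += b"\x00"
    words = struct.unpack(f"!{len(data) // 2}H", data)
    return ~fold(sum(words)) & 0xFFFF


def incremental_checksum(old_checksum: int, old_field: bytes, new_field: bytes) -> int:
    """Update a checksum after old_field was replaced by new_field (same length)."""
    total = ~old_checksum & 0xFFFF
    for (old_word,), (new_word,) in zip(struct.iter_unpack("!H", old_field),
                                        struct.iter_unpack("!H", new_field)):
        total += (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF


def endpoint(ip: str, port: int) -> bytes:
    """Pack an address and port as they would appear in packet headers."""
    return socket.inet_aton(ip) + struct.pack("!H", port)


payload = b"GET / HTTP/1.0\r\n\r\n"
old_dest = endpoint("202.103.106.5", 80)   # virtual service
new_dest = endpoint("172.16.0.3", 8000)    # selected real server
before = old_dest + payload
after = new_dest + payload
```

Only the six rewritten bytes are touched by `incremental_checksum`, yet the result matches `full_checksum(after)`, which scans the entire buffer.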
The VS/NAT configuration is shown in the table below. All traffic destined for IP address 202.103.106.5, port 80 is load-balanced across the real servers 172.16.0.2:80 and 172.16.0.3:8000. Packets destined for 202.103.106.5:21 are forwarded to 172.16.0.3:21. Packets to any other port are rejected.
Protocol  Virtual IP Address  Port  Real IP Address  Port  Weight
TCP       202.103.106.5       80    172.16.0.2       80    1
                                    172.16.0.3       8000  2
TCP       202.103.106.5       21    172.16.0.3       21    1
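The table above can be written down as a small data structure with a lookup that rejects unknown ports. The sketch below is illustrative only: the article does not say which scheduling algorithm is used here, so the naive weighted round-robin (each server repeated according to its weight) is an assumption, as are the names `VIRTUAL_SERVICES` and `schedule`.

```python
import itertools

# (protocol, virtual IP, port) -> list of (real IP, port, weight)
VIRTUAL_SERVICES = {
    ("TCP", "202.103.106.5", 80): [("172.16.0.2", 80, 1), ("172.16.0.3", 8000, 2)],
    ("TCP", "202.103.106.5", 21): [("172.16.0.3", 21, 1)],
}


def weighted_cycle(real_servers):
    """Naive weighted round-robin: repeat each server `weight` times (assumption)."""
    expanded = [(ip, port) for ip, port, weight in real_servers for _ in range(weight)]
    return itertools.cycle(expanded)


SCHEDULERS = {service: weighted_cycle(servers)
              for service, servers in VIRTUAL_SERVICES.items()}


def schedule(protocol, virtual_ip, port):
    """Pick a real server for a new connection, or None if the port is not served."""
    scheduler = SCHEDULERS.get((protocol, virtual_ip, port))
    if scheduler is None:
        return None  # no virtual service defined: the packet is rejected
    return next(scheduler)
```

With the weights 1 and 2, three consecutive connections to port 80 go once to 172.16.0.2:80 and twice to 172.16.0.3:8000.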
The following example shows the rewriting process in more detail. A packet accessing the web service might have the following source and destination addresses:

Source       202.100.1.2:3456
Destination  202.103.106.5:80

The scheduler selects a server from the scheduling list, for example 172.16.0.3:8000. The packet is rewritten with the following addresses and sent to the selected server:

Source       202.100.1.2:3456
Destination  172.16.0.3:8000

The response packet returned from the server to the scheduler looks like this:

Source       172.16.0.3:8000
Destination  202.100.1.2:3456

The source address of the response packet is rewritten to the address of the virtual service, and the packet is then sent to the client:

Source       202.103.106.5:80
Destination  202.100.1.2:3456
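The four steps above can be replayed in a few lines of code. This is a sketch, not the LVS implementation: packets are reduced to a source and a destination endpoint, connections are keyed by the client endpoint only, and the class `NatScheduler` and its method names are invented for the illustration.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    source: tuple  # (ip, port)
    dest: tuple    # (ip, port)


VIP = ("202.103.106.5", 80)


class NatScheduler:
    def __init__(self):
        self.connections = {}  # client endpoint -> real server endpoint

    def forward_request(self, packet: Packet, chosen_server: tuple) -> Packet:
        """Rewrite the destination from the virtual service to the real server."""
        server = self.connections.setdefault(packet.source, chosen_server)
        return packet._replace(dest=server)

    def forward_response(self, packet: Packet) -> Packet:
        """Rewrite the source from the real server back to the virtual service."""
        if self.connections.get(packet.dest) != packet.source:
            raise LookupError("response does not match a known connection")
        return packet._replace(source=VIP)


nat = NatScheduler()
request = Packet(source=("202.100.1.2", 3456), dest=VIP)
to_server = nat.forward_request(request, chosen_server=("172.16.0.3", 8000))
reply = Packet(source=to_server.dest, dest=to_server.source)
to_client = nat.forward_response(reply)
```

`to_server` carries destination 172.16.0.3:8000, and `to_client` carries source 202.103.106.5:80, matching the address pairs listed above.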
In this way, the client believes it received a correct response from the service at 202.103.106.5:80, without knowing whether server 172.16.0.2 or server 172.16.0.3 handled the request.

4. Implementing a virtual server via IP tunneling (VS/TUN)

In a VS/NAT cluster, both request and response packets must pass through the load scheduler; when the number of real servers reaches somewhere between 10 and 20, the load scheduler becomes a new bottleneck for the whole cluster. Most Internet services share one characteristic: request packets are short, while response packets often carry a large amount of data. If requests and responses are handled separately, so that the load scheduler is responsible only for scheduling requests, the throughput of the whole cluster increases greatly.

IP tunneling is a technique that encapsulates one IP packet inside another, so that a packet destined for one IP address can be wrapped and forwarded to a different IP address. It is also known as IP encapsulation. IP tunnels are mainly used for mobile hosts and virtual private networks, where the tunnels are established statically: each end of the tunnel has a unique IP address.

We use IP tunneling to forward request packets to the back-end servers, while response packets return directly from the back-end servers to the clients. Here, however, there is a group of back-end servers rather than one, so we cannot set up one-to-one tunnels statically; instead, the scheduler selects a server dynamically, encapsulates the request packet and forwards it to the selected server. In this way, the principle of IP tunneling lets us turn the network services on a group of servers into a single virtual network service at one IP address. The architecture of VS/TUN is shown in Figure 4; each server configures the VIP address on its own IP tunnel device.
Figure 4: Architecture of VS/TUN

The workflow of VS/TUN is shown in Figure 5. Connection scheduling and management are the same as in VS/NAT; only the packet forwarding method differs. The scheduler dynamically selects a server according to the load on each server, encapsulates the request packet in another IP packet, and forwards the encapsulated packet to the selected server. On receiving it, the server first decapsulates it to recover the original packet, whose destination address is the VIP. Finding that the VIP is configured on its local IP tunnel device, the server processes the request and returns the response packet directly to the client according to its routing table.

Figure 5: Workflow of VS/TUN
Note that, under default TCP/IP protocol stack processing, the destination address of the request packet is the VIP, so the source address of the response packet is necessarily the VIP as well. The response therefore needs no modification and can be returned directly to the client, which regards it as a normal reply from the service without knowing which server handled the request.
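The encapsulation and decapsulation steps can be sketched as nested packets. This is a conceptual model only, not real IP-in-IP header handling: the `IPPacket` class, the function names and the scheduler and real-server addresses (taken from documentation address ranges) are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IPPacket:
    source: str
    dest: str
    payload: Union[bytes, "IPPacket"]


VIP = "202.103.106.5"
SCHEDULER_IP = "192.0.2.1"  # hypothetical address of the load scheduler


def encapsulate(request: IPPacket, real_server_ip: str) -> IPPacket:
    """Scheduler side: wrap the unmodified request in an outer packet."""
    return IPPacket(source=SCHEDULER_IP, dest=real_server_ip, payload=request)


def real_server_receive(outer: IPPacket, tunnel_addresses: set):
    """Server side: unwrap, check that the VIP is local, answer the client directly."""
    inner = outer.payload
    if not isinstance(inner, IPPacket) or inner.dest not in tunnel_addresses:
        return None  # not a tunneled request for an address configured here
    # The reply's source is the address the request was sent to, i.e. the VIP.
    return IPPacket(source=inner.dest, dest=inner.source, payload=b"response")


request = IPPacket(source="202.100.1.2", dest=VIP, payload=b"GET /")
outer = encapsulate(request, real_server_ip="198.51.100.7")
reply = real_server_receive(outer, tunnel_addresses={VIP})
```

The inner packet is never rewritten, and the reply goes from the VIP straight to the client without passing through the scheduler.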
Figure 6: Half-connection TCP finite state machine
5. Implementing a virtual server via direct routing (VS/DR)

Like VS/TUN, VS/DR exploits the asymmetry of most Internet services: the load scheduler is responsible only for scheduling requests, while the servers return responses directly to the clients, which greatly improves the throughput of the whole cluster. The method is similar to the one used in IBM's NetDispatcher product (in particular, the way IP addresses are configured on the servers is similar), but NetDispatcher is a very expensive commercial product; we do not know the mechanisms it uses internally, and some of them are patented by IBM.

The architecture of VS/DR is shown in Figure 7. The scheduler and the server group must each have a network card attached to the same unsegmented LAN, for example through a high-speed switch or hub. The VIP address is shared by the scheduler and the server group. The VIP configured on the scheduler is externally visible and receives request packets for the virtual service; every server configures the VIP on its own non-ARP network device, where it is invisible from outside and is used only to process requests whose destination address is the VIP.

Figure 7: Architecture of VS/DR
The workflow of VS/DR is shown in Figure 8. Connection scheduling and management are the same as in VS/NAT and VS/TUN; only the packet forwarding method differs: packets are routed directly to the target server. In VS/DR, the scheduler dynamically selects a server according to the load on each server. It neither modifies nor encapsulates the IP packet; it only changes the MAC address of the data frame to that of the selected server and sends the modified frame onto the LAN shared with the server group. Because the frame's MAC address is that of the selected server, that server is certain to receive the frame and can extract the IP packet from it. When the server finds that the packet's destination address, the VIP, is configured on a local network device, it processes the packet and returns the response directly to the client according to its routing table.
Figure 8: Workflow of VS/DR
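The direct routing step can be sketched as a frame whose destination MAC address is replaced while the IP packet inside stays untouched. This is a conceptual model, not real Ethernet handling: the `Frame` and `RealServer` classes are invented for the illustration, and the MAC addresses come from a documentation range.

```python
from dataclasses import dataclass, replace
from typing import Optional

VIP = "202.103.106.5"


@dataclass(frozen=True)
class Frame:
    dest_mac: str
    ip_source: str
    ip_dest: str
    payload: bytes


def direct_route(frame: Frame, server_mac: str) -> Frame:
    """Scheduler side: change only the frame's destination MAC address."""
    return replace(frame, dest_mac=server_mac)


@dataclass
class RealServer:
    mac: str
    non_arp_addresses: frozenset  # addresses configured on the non-ARP device

    def receive(self, frame: Frame) -> Optional[tuple]:
        """Return (source, destination) of the response, or None if ignored."""
        if frame.dest_mac != self.mac:
            return None  # the frame is addressed to another machine
        if frame.ip_dest not in self.non_arp_addresses:
            return None  # the destination IP is not configured locally
        return (frame.ip_dest, frame.ip_source)  # reply from the VIP to the client


scheduler_mac = "00:00:5e:00:53:01"  # hypothetical
server = RealServer(mac="00:00:5e:00:53:02", non_arp_addresses=frozenset({VIP}))
incoming = Frame(dest_mac=scheduler_mac, ip_source="202.100.1.2",
                 ip_dest=VIP, payload=b"GET /")
routed = direct_route(incoming, server.mac)
```

The routed frame still carries the client address as IP source and the VIP as IP destination, so the server answers the client directly with the VIP as source.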
In VS/DR, under default TCP/IP protocol stack processing, the destination address of the request packet is the VIP, so the source address of the response packet is necessarily the VIP as well. The response therefore needs no modification and can be returned directly to the client, which regards it as a normal reply from the service without knowing which server handled the request.

As with VS/TUN, the VS/DR load scheduler sees only the client-to-server half of each connection, and state transitions follow the half-connection TCP finite state machine.

6. Advantages and disadvantages of the three methods

The advantages and disadvantages of the three IP load balancing technologies are summarized in the following table:
                VS/NAT         VS/TUN      VS/DR
Server          any            tunneling   non-ARP device
Server network  private        LAN/WAN     LAN
Server number   low (10~20)    high (100)  high (100)
Server gateway  load balancer  own router  own router
Note: The maximum server counts above assume that the scheduler uses a 100 Mbps network card, that its hardware is the same as that of the back-end servers, and that the workload is a typical web service. With better hardware for the scheduler (such as a gigabit network card and a faster processor), the number of servers it can schedule rises accordingly, and the number also changes with the application. The estimates are therefore mainly a quantitative comparison of the scalability of the three methods.

6.1. Virtual Server via NAT

The advantages of VS/NAT are that the servers can run any operating system that supports TCP/IP, that only one IP address is needed, configured on the scheduler, and that the server group can use private IP addresses. Its disadvantage is limited scalability: when the number of server nodes rises to around 20, the scheduler itself may become a new bottleneck, because in VS/NAT both request and response packets pass through the load scheduler. We measured an average packet rewriting delay of 60 µs on a Pentium 166 processor; the delay is shorter on faster processors. Assuming an average TCP packet length of 536 bytes, the maximum throughput of the scheduler is 8.93 MBytes/s. If we further assume an average throughput of 800 KBytes/s per server, such a scheduler can drive 10 servers. (Note: these measurements were taken some time ago.)

A VS/NAT cluster can meet the performance requirements of many servers. If the load scheduler does become the bottleneck, there are three ways to address it: a hybrid approach, VS/TUN and VS/DR. In a DNS hybrid cluster, there are several VS/NAT load schedulers, each with its own server cluster, and these schedulers are grouped under a single domain name through RR-DNS.
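The throughput estimate in section 6.1 can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 MByte = 1,000,000 bytes, 1 KByte = 1,000 bytes); the function names are made up.

```python
def scheduler_throughput(packet_bytes: float, rewrite_delay_s: float) -> float:
    """Bytes per second the scheduler can rewrite, one packet per delay period."""
    return packet_bytes / rewrite_delay_s


def servers_supported(throughput_bytes_s: float, per_server_bytes_s: float) -> float:
    """How many servers of the given throughput one scheduler can keep busy."""
    return throughput_bytes_s / per_server_bytes_s


throughput = scheduler_throughput(packet_bytes=536, rewrite_delay_s=60e-6)
servers = servers_supported(throughput, per_server_bytes_s=800_000)
```

536 bytes every 60 µs comes to about 8.93 MBytes/s; dividing by 800 KBytes/s gives roughly 11 servers, in line with the round figure of 10 quoted in the text.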
VS/TUN and VS/DR, however, are better ways to increase system throughput.

For network services that carry IP addresses or port numbers in the packet data, a corresponding application module must be written to translate them. This adds implementation work, and the overhead of the module inspecting packets lowers the throughput of the system.

6.2. Virtual Server via IP Tunneling

In a VS/TUN cluster, the load scheduler only schedules requests to the back-end servers, and the back-end servers return the response data directly to the users. The scheduler can thus handle a very large number of requests and can even schedule more than a hundred servers (of comparable size) without becoming the bottleneck of the system. Even if the scheduler has only a 100 Mbps full-duplex network card, the maximum throughput of the whole system can exceed 1 Gbps. VS/TUN therefore greatly increases the number of servers a load scheduler can handle, and it can be used to build high-performance super servers.

VS/TUN does place a requirement on the servers: all of them must support the "IP Tunneling" or "IP Encapsulation" protocol. At present, the back-end servers for VS/TUN mainly run Linux; we have not tested other operating systems. Because IP tunneling is becoming a standard protocol in every operating system, VS/TUN should also work with back-end servers running other operating systems.

6.3. Virtual Server via Direct Routing

Like VS/TUN, the VS/DR scheduler handles only the client-to-server half of each connection; response data returns directly to the client over a separate network route. This greatly improves the scalability of the LVS cluster.