The rapid growth of the Internet has driven a corresponding growth in the number of accesses to multimedia web servers: a server must now handle very large numbers of concurrent requests, so on heavily loaded servers the CPU and I/O processing capacity become bottlenecks. Since the performance of a single server is always limited, simply upgrading hardware does not really solve the problem. Instead, multiple servers combined with load balancing technology must be used to meet the demands of massive concurrent access. Linux Virtual Server (LVS) uses load balancing to combine multiple servers into a single virtual server, providing an easily scalable, low-cost solution for rapidly growing network access demands.
1. LVS Structure and Working Principle
The structure of LVS is shown in Figure 1: a front-end load balancer (LB) and a back-end group of real servers (RS). The RS group can be connected over a local area network or a wide area network. This structure is transparent to users: they see only a single virtual server (Virtual Server) at the LB, while the RS group that actually provides the service is invisible to them.
[Figure 1: LVS structure]
When a user's request arrives at the virtual server, the LB forwards it to an RS according to the configured packet forwarding method and load balancing scheduling algorithm, and the RS returns the result to the user. Like the request, the path the reply takes back depends on the packet forwarding method.
LVS supports three packet forwarding methods:
NAT (Network Address Translation) mode. After the LB receives a user request packet, it rewrites the packet's destination, replacing the virtual server's IP address with the address of a selected RS, and forwards the packet to that RS. The RS sends its reply back through the LB, which rewrites the reply's source address from the RS's IP back to the virtual server's IP before returning it to the user.
IP Tunnel (IP Tunneling) mode. After the LB receives a user request packet, it encapsulates the packet according to the IP tunneling protocol and transmits it to a selected RS. The RS decapsulates the request and sends its reply directly to the user. Both the RS and the LB must support the IP tunneling protocol.
DR (Direct Routing) mode. After the LB receives a request packet, it rewrites the packet's destination MAC address to that of a selected RS and forwards the packet. After the RS receives the request, it sends its reply directly to the user. In this mode the LB and all RSs must be on the same physical segment, and the LB shares a virtual IP address with the RS group (a sketch of the DR rewrite follows this list).
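To make the DR rewrite concrete, here is a minimal sketch (the function name and setup are hypothetical, not LVS code): only the Ethernet destination changes, while the IP header is left untouched.

#include <string.h>
#include <linux/if_ether.h>

/* Hypothetical illustration of the DR-mode forwarding step: the LB
 * rewrites only the frame's destination MAC to the chosen RS. */
static void dr_forward(struct ethhdr *eth, const unsigned char rs_mac[ETH_ALEN])
{
        memcpy(eth->h_dest, rs_mac, ETH_ALEN);  /* LB -> chosen RS */
        /* The IP source and destination are unchanged: the packet
         * still carries the virtual IP as its destination. */
}

Because the destination IP is still the virtual IP, each RS must be configured to accept it locally, for example on a non-ARPing loopback alias.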
2. IPVS Software Structure and Implementation
The core of the LVS software is IPVS, which runs on the LB and performs load balancing at the IP layer. The overall structure of IPVS is shown in Figure 2; it consists mainly of three modules, IP packet processing, load balancing algorithms, and system configuration and management, plus the linked lists of virtual servers and real servers.
[Figure 2: Overall structure of IPVS]
2.1 How LVS Processes IP Packets
IP packet processing is built on the Netfilter framework of the Linux 2.4 kernel; a packet's path through the framework is as follows.
Simply put, Netfilter's architecture places a number of detection points (hooks) at several locations along the path a packet takes through the network stack, and processing functions can be registered at each detection point (packet filtering, NAT, and so on, even user-defined functions).
The positions of the five hook points in the IP layer are shown in the figure below:
[Figure 3: The five Netfilter hook points in the IP layer]
NF_IP_PRE_ROUTING: packets that have just entered the network layer pass this point (after only basic checks such as version number and checksum); the built-in destination address translation is performed here;
NF_IP_LOCAL_IN: packets destined for this host pass this point after route lookup; INPUT packet filtering is done here;
NF_IP_FORWARD: packets to be forwarded pass this point; FORWARD packet filtering is done here;
NF_IP_LOCAL_OUT: packets emitted by local processes pass this point; OUTPUT packet filtering is done here;
NF_IP_POST_ROUTING: all packets about to leave through a network device pass this point; the built-in source address translation (including masquerading) is performed here.
In the IP layer code, statements using the NF_HOOK macro appear at these points; for example, in the IP forwarding function:
/* net/ipv4/ip_forward.c: ip_forward() */
NF_HOOK(PF_INET, NF_IP_FORWARD, skb, skb->dev, dev2, ip_forward_finish);
/* include/linux/netfilter.h: the NF_HOOK macro is defined essentially as follows */
#ifdef CONFIG_NETFILTER
#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)                     \
(list_empty(&nf_hooks[(pf)][(hook)])                                    \
 ? (okfn)(skb)                                                          \
 : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn)))
#else /* !CONFIG_NETFILTER */
#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
#endif /* CONFIG_NETFILTER */
If Netfilter is not configured when the kernel is built, NF_HOOK simply calls its last argument; in this case the ip_forward_finish function is executed directly. Otherwise the hook point is entered: nf_hook_slow() is called, which in turn runs the functions registered there via nf_register_hook().
The NF_HOOK macro's parameters are:
pf: the protocol family; the Netfilter architecture can also be used outside the IP layer, so this can also take values such as PF_INET6 or PF_DECnet;
hook: the name of the hook point; for the IP layer it is one of the five values above;
skb: the socket buffer (struct sk_buff) carrying the packet being processed;
indev: the device the packet arrived on, represented by a struct net_device structure;
outdev: the device the packet will leave through, also a struct net_device structure;
okfn: a function pointer called to resume normal processing once all the functions registered at this hook point have let the packet through.
These hook points are already defined in the kernel; unless you are a maintainer of that part of the kernel code, there is no need to add or modify them. What happens at each point, however, can be specified by the user: features such as packet filtering, NAT, and connection tracking are all provided this way, which matches Netfilter's original design goal of offering a flexible framework that is easy to extend.
To hook in our own code, we use the nf_register_hook() function, whose prototype is:
int nf_register_hook(struct nf_hook_ops *reg);

/* include/linux/netfilter.h: the struct nf_hook_ops structure */
struct nf_hook_ops
{
        struct list_head list;

        /* User fills in from here down. */
        nf_hookfn *hook;
        int pf;
        int hooknum;
        /* Hooks are ordered in ascending priority. */
        int priority;
};
What LVS does is essentially generate an instance of struct nf_hook_ops and register it with nf_register_hook(). The list member is initialized to {NULL, NULL}; since LVS works in the IP layer, pf is always PF_INET; hooknum identifies the hook point. A hook point may have several processing functions hanging off it, and which runs first is determined by the priority field. The priorities of the built-in processing functions are specified in linux/netfilter_ipv4.h:
enum nf_ip_hook_priorities {
        NF_IP_PRI_FIRST = INT_MIN,
        NF_IP_PRI_CONNTRACK = -200,
        NF_IP_PRI_MANGLE = -150,
        NF_IP_PRI_NAT_DST = -100,
        NF_IP_PRI_FILTER = 0,
        NF_IP_PRI_NAT_SRC = 100,
        NF_IP_PRI_LAST = INT_MAX,
};
The hook member is the processing function itself, which must have the nf_hookfn type:

typedef unsigned int nf_hookfn(unsigned int hooknum,
                               struct sk_buff **skb,
                               const struct net_device *in,
                               const struct net_device *out,
                               int (*okfn)(struct sk_buff *));
Its five parameters are passed in by the NF_HOOK macro.
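To make the pieces concrete, here is a minimal sketch of a loadable module that registers a hook at NF_IP_LOCAL_IN using the Linux 2.4-era API; the module, handler name, and priority value are illustrative choices, not part of LVS.

#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>

/* Example handler: look at every locally destined packet and let it pass. */
static unsigned int my_hook(unsigned int hooknum,
                            struct sk_buff **skb,
                            const struct net_device *in,
                            const struct net_device *out,
                            int (*okfn)(struct sk_buff *))
{
        /* Inspect or modify *skb here, then return a verdict
         * such as NF_ACCEPT, NF_DROP, or NF_STOLEN. */
        return NF_ACCEPT;
}

static struct nf_hook_ops my_ops = {
        { NULL, NULL },     /* list: filled in by nf_register_hook() */
        my_hook,            /* hook: the handler above */
        PF_INET,            /* pf: the IP layer */
        NF_IP_LOCAL_IN,     /* hooknum: locally destined packets */
        100                 /* priority: runs after the built-in filter (0) */
};

static int __init my_init(void)
{
        return nf_register_hook(&my_ops);
}

static void __exit my_cleanup(void)
{
        nf_unregister_hook(&my_ops);
}

module_init(my_init);
module_exit(my_cleanup);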
That is the basic way to use Netfilter to write one's own module. Next, let's look at how LVS is implemented on top of it.
3. Implementation of LVS on Netfilter
Using Netfilter, LVS processes datagrams as follows. A datagram entering the system (from the left in Figure 3) passes the IP checks and is first handled by the hook function at NF_IP_PRE_ROUTING [HOOK1]; routing then decides whether the datagram is to be forwarded or delivered to this host. If it is for this host, it is handed to the upper-layer protocol after passing the hook function at NF_IP_LOCAL_IN [HOOK2]; if it is to be forwarded, it is processed at NF_IP_FORWARD [HOOK3]; finally, every outgoing datagram passes the last hook function at NF_IP_POST_ROUTING [HOOK4] before being transmitted on the network. Datagrams generated locally are processed by the hook function at NF_IP_LOCAL_OUT [HOOK5], then go through route selection, and then likewise pass NF_IP_POST_ROUTING [HOOK4] out to the network.
When IPVS is started by loading the ip_vs module, the module's initialization function ip_vs_init() registers hook functions at NF_IP_LOCAL_IN [HOOK2], NF_IP_FORWARD [HOOK3], and NF_IP_POST_ROUTING [HOOK4] to process incoming and outgoing datagrams.
3.1 NF_IP_LOCAL_IN Processing
A user initiates a request to the virtual server, and the datagram enters ip_vs_in() through NF_IP_LOCAL_IN [HOOK2]. If it is an incoming ICMP datagram, ip_vs_in_icmp() is called; otherwise the function checks whether it is a TCP/UDP datagram, and if it is not, returns NF_ACCEPT (letting the kernel handle it). The rest of the function deals with TCP/UDP datagrams. First, ip_vs_header_check() is called to check the header; on any anomaly the function returns NF_DROP (discarding the datagram). Next, ip_vs_conn_in_get() searches the connection table ip_vs_conn_tab for a connection whose client and virtual server IP addresses, port numbers, and protocol type match the corresponding fields of the datagram.

If no such connection exists, the connection has not been established yet. In that case, if the datagram is a TCP SYN packet or a UDP datagram, the corresponding virtual server is looked up; if that virtual server exists but is fully loaded, NF_DROP is returned; if it exists and is not fully loaded, ip_vs_schedule() is called to schedule an RS and create a new connection. If scheduling fails, ip_vs_leave() is called to either pass the datagram on or discard it. If a matching connection does exist, the function first checks whether the RS on that connection is available; if it is not, the related state is cleaned up and NF_DROP is returned. Once an existing connection has been found or a new one established, the system's bookkeeping, such as the count of incoming datagrams, is updated. If the connection is bound to a specific datagram transmit function, that function is called to transmit the datagram; otherwise NF_ACCEPT is returned.

ip_vs_in_icmp(), called from ip_vs_in(), handles ICMP packets. The function begins by checking the datagram's length, returning NF_DROP on anomaly. It deals only with ICMP errors caused by a TCP/UDP packet in transit: destination unreachable, source quench, and time exceeded; all other ICMP packets are handed back to the kernel for normal processing. For those three packet types it first verifies the checksum, returning NF_DROP directly on a checksum error; otherwise it parses the returned ICMP error message and looks up the corresponding connection. If the connection does not exist, NF_ACCEPT is returned; if it exists, the IP addresses and port numbers embedded in the error payload and in the ICMP datagram itself are rewritten according to the connection information, the checksums in each header are recalculated, and after rerouting, ip_send() is called to send the modified datagram and NF_STOLEN is returned (taking over processing of the datagram).
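The control flow above can be condensed into a short standalone sketch. The types and helpers below are stand-ins for the kernel structures and functions just described, not the actual LVS code; header checks, full-load handling, and the ICMP path are omitted.

#include <stddef.h>
#include <stdbool.h>

enum verdict { V_ACCEPT, V_DROP, V_STOLEN };

struct conn {
        bool rs_available;                   /* is the bound real server usable? */
        enum verdict (*xmit)(struct conn *); /* bound transmit function or NULL  */
};

/* cp:       result of the ip_vs_conn_tab lookup (ip_vs_conn_in_get())
 * schedule: stand-in for ip_vs_schedule()
 * new_flow: true for a TCP SYN or a UDP datagram                      */
static enum verdict ip_vs_in_sketch(struct conn *cp,
                                    struct conn *(*schedule)(void),
                                    bool new_flow)
{
        if (cp == NULL) {                 /* connection not established yet */
                if (!new_flow)
                        return V_ACCEPT;  /* let the kernel handle it */
                cp = schedule();          /* pick an RS, create the connection */
                if (cp == NULL)
                        return V_ACCEPT;  /* ip_vs_leave(): pass on or drop */
        }
        if (!cp->rs_available)
                return V_DROP;            /* RS on this connection is down */

        /* ...update statistics such as the incoming datagram count... */

        if (cp->xmit != NULL)
                return cp->xmit(cp);      /* NAT / tunnel / DR transmitter */
        return V_ACCEPT;
}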
ip_vs_in() calls ip_vs_schedule() to have the virtual server schedule an available RS and establish the corresponding connection. It selects an RS according to the scheduling algorithm bound to the virtual server and, on success, calls ip_vs_conn_new(). ip_vs_conn_new() performs a series of initialization steps: it sets the connection's protocol, IP addresses, port numbers, and protocol timeout information, binds the application helper, the RS, and the datagram transmit function, and finally calls ip_vs_conn_hash() to insert the connection into the hash table ip_vs_conn_tab. The transmit function bound to a connection is one of ip_vs_nat_xmit(), ip_vs_tunnel_xmit(), or ip_vs_dr_xmit(), matching the three forwarding modes. For example, the main steps of ip_vs_nat_xmit() are: modify the packet's destination address and destination port to the RS's information, recalculate and set the checksums, and call ip_send() to send the modified datagram.
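As an illustration of that NAT-mode rewrite, here is a self-contained sketch of the request-path address translation; the function names are made up, and the real ip_vs_nat_xmit() additionally updates the TCP/UDP checksum and handles many more cases.

#include <stdint.h>
#include <linux/ip.h>
#include <linux/tcp.h>

/* Recompute the IP header checksum (what the kernel's ip_send_check does). */
static void ip_checksum(struct iphdr *iph)
{
        uint32_t sum = 0;
        uint16_t *p = (uint16_t *)iph;
        int i;

        iph->check = 0;
        for (i = 0; i < iph->ihl * 2; i++)  /* ihl is in 32-bit words */
                sum += p[i];
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);
        iph->check = (uint16_t)~sum;
}

/* Hypothetical sketch of the request-path rewrite in NAT mode:
 * destination VIP:vport becomes the chosen real server's RIP:rport. */
static void nat_rewrite_request(struct iphdr *iph, struct tcphdr *th,
                                uint32_t rip, uint16_t rport)
{
        iph->daddr = rip;     /* virtual server IP   -> real server IP   */
        th->dest   = rport;   /* virtual server port -> real server port */
        ip_checksum(iph);     /* IP header checksum must be recomputed   */
}

On the reply path, ip_vs_out() performs the mirror-image rewrite, restoring the virtual server's address and port in the source fields.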
3.2 NF_IP_FORWARD Processing
A datagram entering NF_IP_FORWARD is passed to ip_vs_out(). This function is called only in NAT mode. It first checks the datagram type: an ICMP datagram goes straight to ip_vs_out_icmp(); next it checks whether the datagram is TCP/UDP, and if it is neither, returns NF_ACCEPT. The rest handles TCP/UDP datagrams. First, ip_vs_header_check() is called to check the header; on anomaly it returns NF_DROP. Next, ip_vs_conn_out_get() determines whether a corresponding connection exists.

If there is no corresponding connection, ip_vs_lookup_real_service() is called to check whether the RS that sent the datagram still has a service entry; if the RS exists and the packet is a TCP non-reset packet or a UDP packet, icmp_send() is called to send a destination-unreachable ICMP packet to the RS and NF_STOLEN is returned; in all other cases NF_ACCEPT is returned. If a corresponding connection exists, the datagram's checksum is verified; on error NF_DROP is returned. If the datagram is correct, its source address is rewritten to the virtual server's IP address and its source port to the virtual server's port number, the checksums are recalculated and set, and NF_ACCEPT is returned.

The flow of ip_vs_out_icmp() is similar to that of ip_vs_in_icmp(), differing in how the datagram is modified: the source address of the outer IP header is rewritten to the virtual server address, and within the embedded error payload, the destination address of the inner IP header and the destination port of the inner UDP or TCP header are rewritten to the virtual server's address and port number.
3.3 NF_IP_POST_ROUTING Processing
The NF_IP_POST_ROUTING hook function is used only in NAT mode. After a datagram enters NF_IP_POST_ROUTING, it is processed by ip_vs_post_routing(). This function first checks whether the datagram has already passed through IPVS; if it has not, NF_ACCEPT is returned. Otherwise the datagram is transmitted immediately and the function returns NF_STOLEN, preventing the datagram from being modified by iptables rules.
4. LVS System Configuration and Management
During initialization, the IPVS module registers setsockopt()/getsockopt() handlers; the ipvsadm command calls these two functions to pass system configuration data, in the form of the ip_vs_rule_user structure, to the IPVS kernel module. This completes system configuration and implements adding, modifying, and deleting the addresses of virtual servers and RSs. Through these operations the system manages the lists of virtual servers and RSs.
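The shape of that interface can be sketched from userspace as follows. Note that the option number and rule layout here are loud placeholders standing in for the real ip_vs_rule_user definitions in the kernel headers, not the actual ABI.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* PLACEHOLDERS: the real option id and rule layout live in the IPVS
 * kernel headers; these two are illustrative only. */
#define SO_IPVS_ADD_SERVICE 1000
struct ipvs_rule {
        struct in_addr vip;    /* virtual server address */
        unsigned short vport;  /* virtual server port    */
        char sched_name[16];   /* e.g. "wlc"             */
};

int main(void)
{
        struct ipvs_rule r;
        int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); /* needs root */

        if (fd < 0) { perror("socket"); return 1; }
        memset(&r, 0, sizeof r);
        r.vport = htons(80);
        strncpy(r.sched_name, "wlc", sizeof r.sched_name - 1);
        /* The kernel-side handler registered by IPVS receives this buffer. */
        if (setsockopt(fd, IPPROTO_IP, SO_IPVS_ADD_SERVICE, &r, sizeof r) < 0)
                perror("setsockopt");
        return 0;
}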
A virtual server is added by ip_vs_add_service(), which inserts a new node into the virtual server hash table according to the hash algorithm, looks up the scheduling algorithm selected by the user, and binds that algorithm to the node. A virtual server is modified by ip_vs_edit_service(), which changes the scheduling algorithm of the specified server. A virtual server is deleted by ip_vs_del_service(); before deleting a virtual server, all of its RSs must first be removed and its bound scheduling algorithm released.
RSs are added, modified, and deleted by ip_vs_add_dest(), ip_vs_edit_dest(), and ip_vs_del_dest(), respectively.
5. Load Balancing Scheduling Algorithms
As mentioned earlier, the scheduling algorithm is bound when the user adds a virtual service; binding is done by ip_vs_bind_scheduler(), and looking up an algorithm by ip_vs_scheduler_get(). ip_vs_scheduler_get() calls ip_vs_sched_getbyname() to search the scheduler queue by the scheduling algorithm's name; if it is not found, the corresponding scheduler module is loaded and the lookup is retried, and the result is returned. There are currently eight load balancing scheduling algorithms, as follows:
RR (Round-Robin): requests are assigned to the RSs in turn, one by one. The algorithm is simple, but it is only suitable when the RSs have roughly equal processing capacity.
WRR (Weighted Round-Robin): tasks are allocated according to the weights of the RSs. RSs with higher weights are served first and receive more connections than those with lower weights; RSs with equal weights receive equal numbers of connections.
DH (Destination Hashing): looks up a static hash table keyed on the destination address to obtain the required RS.
SH (Source Hashing): looks up a static hash table keyed on the source address to obtain the required RS.
LC (Least-Connection): the IPVS table stores all active connections; new connection requests are sent to the RS with the smallest current connection count.
WLC (Weighted Least-Connection): suppose RS i has weight Wi (i = 1..n) and Ti current TCP connections (i = 1..n); the RS with the smallest Ti/Wi is selected as the next target (a selection sketch follows this list).
LBLC (Locality-Based Least-Connection): requests for the same destination address are assigned to the same RS as long as that server is not fully loaded; otherwise the request goes to the RS with the smallest connection count, which then becomes the preferred target for subsequent requests to that address.
LBLCR (Locality-Based Least-Connection with Replication): each destination address maps to a subset of RSs. A request for that address is assigned to the RS with the smallest connection count within the subset; if all servers in the subset are fully loaded, a server with a smaller connection count is chosen from the whole cluster, added to the subset, and assigned the connection; if the subset has not been modified for a certain period, the most heavily loaded node is removed from the subset.
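As an example, here is a minimal sketch of the WLC selection rule from the list above; the struct is a stand-in for the kernel's destination entry, and the division-free comparison mirrors the cross-multiplication commonly used in the kernel's scheduler.

#include <stddef.h>

struct rs {
        int weight;   /* Wi: configured capacity of this real server */
        int active;   /* Ti: current connection count                */
};

/* Pick the server minimizing Ti/Wi; comparing Ti*Wj < Tj*Wi avoids division. */
static struct rs *wlc_select(struct rs *srv, size_t n)
{
        struct rs *best = NULL;
        size_t i;

        for (i = 0; i < n; i++) {
                if (srv[i].weight <= 0)
                        continue;       /* weight 0: server quiesced */
                if (best == NULL ||
                    (long)srv[i].active * best->weight <
                    (long)best->active * srv[i].weight)
                        best = &srv[i];
        }
        return best;                    /* NULL if no server is available */
}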
Excerpt from: LinuxAid