Analysis of the core network principles of the LVS cluster system [repost]

xiaoxiao2021-03-06  120

Analysis of the core network principles of the LVS cluster system

Source: LinuxAid

2003-1-8 1:12:00

Summary

This article introduces the basic principles and techniques that the LVS system uses to achieve load balancing. It first describes the general Netfilter hook mechanism of the 2.4 kernel, then the three Netfilter hook functions used by LVS, and finally the load-balancing algorithms of LVS.

The rapid growth of the Internet has brought a rapidly increasing number of visitors to multimedia network servers, so a server must be able to provide a large number of concurrent access services. For a heavily loaded server, CPU and I/O processing capacity soon become the bottleneck. Because the performance of a single server is always limited, simply upgrading hardware cannot really solve this problem. Multi-server and load-balancing technology must therefore be used to meet the demand of a large number of concurrent accesses.

The Linux Virtual Server (LVS) uses load-balancing technology to combine multiple real servers into one virtual server. It provides a low-cost solution whose service capacity is easy to expand, which suits rapidly growing network access demand.

1. LVS structure and working principle

The structure of LVS is shown in Figure 1. It consists of a front-end load balancer (Load Balancer, LB) and a group of back-end real servers (Real Server, RS). The RSs can be connected through a local area network or a wide area network. This structure is transparent to the user: the user sees only one virtual server (Virtual Server), which is the LB, and does not see the RS group that actually provides the service.

As shown in Figure 1

When a user's request is sent to the virtual server, the LB forwards it to an RS according to the configured packet-forwarding strategy and load-balancing scheduling algorithm. The RS then returns the result of the request to the user. As with request packets, the way reply packets are returned depends on the packet-forwarding strategy.

LVS has three packet-forwarding strategies:

NAT (Network Address Translation) mode. After the LB receives a user request packet, it translates the virtual server's IP address in the request packet into the IP address of a selected RS and forwards the packet to that RS. The RS sends its reply packet to the LB, which translates the RS's IP address in the reply packet back into the virtual server's IP address and sends the packet to the user.

IP tunneling (IP Tunneling) mode. After the LB receives a user request packet, it encapsulates the packet according to the IP tunneling protocol and forwards it to a selected RS. The RS decapsulates the request and returns the reply directly to the user. Both the RS and the LB must support the IP tunneling protocol.

DR (Direct Routing) mode. After the LB receives a request packet, it rewrites the destination MAC address in the packet to the MAC address of a selected RS and forwards the packet. After receiving the packet, the RS can return the reply directly to the user. The LB and all RSs must be on the same physical segment, and the LB and the RS group share the virtual IP.
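The address rewriting performed in NAT mode can be illustrated with a small user-space sketch. The `struct pkt` and `struct endpoint` types and both function names are hypothetical and exist only for this illustration; the port fields are an assumption as well, and a real implementation would also have to recompute the IP and TCP/UDP checksums.

```c
#include <stdint.h>

/* Hypothetical, simplified header: only the fields NAT mode rewrites. */
struct pkt {
    uint32_t saddr;   /* source IPv4 address */
    uint32_t daddr;   /* destination IPv4 address */
    uint16_t sport;   /* source port */
    uint16_t dport;   /* destination port */
};

struct endpoint {
    uint32_t addr;
    uint16_t port;
};

/* Request path (client -> LB -> RS): the virtual server address in the
 * destination fields is replaced by the address of the selected RS. */
static void nat_rewrite_request(struct pkt *p, struct endpoint rs)
{
    p->daddr = rs.addr;
    p->dport = rs.port;
}

/* Reply path (RS -> LB -> client): the RS address in the source fields
 * is replaced by the virtual server address before the reply leaves. */
static void nat_rewrite_reply(struct pkt *p, struct endpoint vip)
{
    p->saddr = vip.addr;
    p->sport = vip.port;
}
```

Because both directions are rewritten on the LB, the client only ever sees the virtual server's address, which is why replies must pass back through the LB in this mode.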

2. IPVS software structure and implementation

The core of the LVS software is IPVS, which runs on the LB and implements load balancing at the IP layer. The overall structure of IPVS is shown in Figure 2. It mainly consists of the IP packet processing, load-balancing algorithm, and system configuration and management modules, together with the virtual server and real server linked lists.

As shown in Figure 2

2.1 How LVS processes IP packets

LVS processes IP packets through the Netfilter framework of the Linux 2.4 kernel. The path of a packet through the Netfilter framework is shown in the figure.

Put simply, the Netfilter architecture places a number of checkpoints (hooks) at several positions along the network processing path and registers processing functions at each checkpoint (such as packet filtering and NAT; they can even be user-defined).

The positions of the five hook points of the IP layer are shown in the figure below:

As shown in Figure 3

NF_IP_PRE_ROUTING: packets that have just entered the network layer pass this point (right after checks such as version number and checksum); destination address translation is done at this point;

NF_IP_LOCAL_IN: after the routing lookup, packets addressed to this machine pass this checkpoint; INPUT packet filtering is done at this point;

NF_IP_FORWARD: packets to be forwarded pass this checkpoint; FORWARD packet filtering is done at this point;

NF_IP_LOCAL_OUT: packets sent by local processes pass this checkpoint; OUTPUT packet filtering is done at this point;

NF_IP_POST_ROUTING: all packets that are about to leave through a network device pass this checkpoint; the built-in source address translation (including address masquerading) is done at this point.

In the IP layer code, there are some statements with the NF_HOOK macro, such as in IP's forwarding function:

<- ip_forward.c ip_forward() ->

    NF_HOOK(PF_INET, NF_IP_FORWARD, skb, skb->dev, dev2, ip_forward_finish);

The definition of the NF_HOOK macro is basically the following:

<- /include/linux/netfilter.h ->

    #ifdef CONFIG_NETFILTER
    #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
    (list_empty(&nf_hooks[(pf)][(hook)]) \
     ? (okfn)(skb) \
     : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn)))
    #else /* !CONFIG_NETFILTER */
    #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
    #endif /* CONFIG_NETFILTER */

If the kernel is compiled without Netfilter, the macro is simply equivalent to calling its last parameter, which in this example means executing the ip_forward_finish function. Otherwise the packet enters the hook point and the functions registered with nf_register_hook() are executed. (That wording is somewhat loose: what actually happens is that nf_hook_slow() is entered, and it in turn executes the registered functions.)
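The fast-path decision made by the macro can be modelled in user space. The following is only a sketch under stated assumptions: `MODEL_NF_HOOK`, `model_hook_slow`, `registered_count`, and `struct sk_buff_model` are hypothetical stand-ins, a simple counter replaces the kernel's per-hook list, and the slow path assumes that every registered function accepts the packet.

```c
#include <stddef.h>

/* Stand-in for struct sk_buff; only a length is modelled. */
struct sk_buff_model {
    int len;
};

typedef int (*okfn_t)(struct sk_buff_model *);

#define MODEL_MAX_HOOKS 5

/* Number of functions registered at each hook point (model of nf_hooks). */
static int registered_count[MODEL_MAX_HOOKS];

/* Counts how often the slow path was taken, for illustration only. */
static int slow_path_calls;

/* Model of the slow path: the registered functions would run here.
 * This sketch assumes they all accept, so okfn is called afterwards. */
static int model_hook_slow(int hook, struct sk_buff_model *skb, okfn_t okfn)
{
    (void)hook;
    slow_path_calls++;
    return okfn(skb);
}

/* Same shape as the macro in the text: nothing registered means okfn is
 * called directly; otherwise the slow path is entered. */
#define MODEL_NF_HOOK(hook, skb, okfn)        \
    (registered_count[(hook)] == 0            \
         ? (okfn)(skb)                        \
         : model_hook_slow((hook), (skb), (okfn)))

/* Plays the role of ip_forward_finish in the example from the text. */
static int model_forward_finish(struct sk_buff_model *skb)
{
    return skb->len;
}
```

The point of the conditional expression is that a hook point with nothing registered costs only one emptiness test before normal processing continues.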

The parameters of the NF_HOOK macro are:

pf: the protocol family name. The Netfilter architecture can also be used outside the IP layer, so this variable can also take names such as PF_INET6 and PF_DECnet.

hook: the name of the hook point; for the IP layer, it takes one of the five values above;

skb: as the name suggests, the socket buffer (struct sk_buff) that holds the packet;

indev: the device the packet came in on, represented by a struct net_device structure;

outdev: the device the packet goes out on, represented by a struct net_device structure;

okfn: a function pointer; once all functions registered at this hook point have been called, processing continues with this function.

These hook points are already defined in the kernel. Unless you are the maintainer of this part of the kernel code, there is no need to add or modify them. What is done at each checkpoint, however, can be specified by the user. Functions such as packet filter, NAT, and connection track are provided in the same way. This matches Netfilter's original design goal: to provide a complete and flexible framework that makes it convenient to extend functionality.

If we want to add our own code, we use the nf_register_hook function, whose prototype is:

    int nf_register_hook(struct nf_hook_ops *reg)

The struct nf_hook_ops structure is:

    struct nf_hook_ops
    {
        struct list_head list;
        /* User fills in from here down. */
        nf_hookfn *hook;
        int pf;
        int hooknum;
        /* Hooks are ordered in ascending priority. */
        int priority;
    };

In fact, what a module such as LVS does is create an instance of the struct nf_hook_ops structure and hook it in with nf_register_hook. The list field is always initialized to {NULL, NULL}; since the module generally works at the IP layer, pf is always PF_INET; hooknum is the hook point. A hook point may have several processing functions attached, and which one runs first depends on the priority, that is, the priority field. netfilter_ipv4.h specifies the priorities of the built-in processing functions with an enumerated type:

    enum nf_ip_hook_priorities {
        NF_IP_PRI_FIRST = INT_MIN,
        NF_IP_PRI_CONNTRACK = -200,
        NF_IP_PRI_MANGLE = -150,
        NF_IP_PRI_NAT_DST = -100,
        NF_IP_PRI_FILTER = 0,
        NF_IP_PRI_NAT_SRC = 100,
        NF_IP_PRI_LAST = INT_MAX,
    };

                  ???????????????????????????????????????????????????

Unsigned int NF_HOOK

FN (unsigned int hooknum,

STRUCT SK_BUFF ** SKB, ????????????????????????????

Const struct net_device * in, ????????????????????????????????????????

Const struct net_device * out, ????????????

INT (* okfn) (STRUCT SK_BUFF *)); ????????????????

Its five parameters will be passed from the NFhook macro. ?????????????????????????????
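How priority decides the order in which registered functions run can be sketched with a singly linked list in user space. Everything here is a simplified model under stated assumptions: `struct hook_ops`, `register_hook`, `run_hooks`, and the two verdict names are hypothetical, only one hook point is modelled, and a packet is reduced to a single `int`.

```c
#include <stddef.h>

/* Two verdicts only; they play the role of NF_DROP and NF_ACCEPT. */
enum verdict { V_DROP = 0, V_ACCEPT = 1 };

/* Simplified model of struct nf_hook_ops for a single hook point. */
struct hook_ops {
    struct hook_ops *next;
    enum verdict (*hook)(int *pkt);
    int priority;               /* lower values run first */
};

static struct hook_ops *hook_list;

/* Insert reg so that the list stays in ascending priority order.
 * An entry with equal priority goes after those already registered. */
static void register_hook(struct hook_ops *reg)
{
    struct hook_ops **pp = &hook_list;

    while (*pp != NULL && (*pp)->priority <= reg->priority)
        pp = &(*pp)->next;
    reg->next = *pp;
    *pp = reg;
}

/* Call every registered function in order; stop at the first drop. */
static enum verdict run_hooks(int *pkt)
{
    for (struct hook_ops *h = hook_list; h != NULL; h = h->next)
        if (h->hook(pkt) == V_DROP)
            return V_DROP;
    return V_ACCEPT;
}

/* Example hook functions that append a digit to record the call order. */
static enum verdict mark_first(int *pkt)
{
    *pkt = *pkt * 10 + 1;
    return V_ACCEPT;
}

static enum verdict mark_second(int *pkt)
{
    *pkt = *pkt * 10 + 2;
    return V_ACCEPT;
}
```

Registering `mark_second` with priority 0 before `mark_first` with priority -200 still runs `mark_first` first, which mirrors how connection tracking (-200) is placed ahead of filtering (0) in the enumeration above.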

The above covers how Netfilter is implemented and some basic usage for writing your own module. Next, let's look at how LVS uses Netfilter.

3. The use of Netfilter in LVS

With Netfilter, LVS processes datagrams as follows. A datagram enters the system from the left; after the IP checks, it is processed by the first hook function, NF_IP_PRE_ROUTING [HOOK1]. Routing is then performed to decide whether the datagram needs to be forwarded or is addressed to this machine. If the datagram is addressed to this machine, it is processed by the hook function NF_IP_LOCAL_IN [HOOK2] and then passed to the upper-layer protocol. If the datagram should be forwarded, it is processed by NF_IP_FORWARD [HOOK3]; forwarded datagrams are then processed by the last hook function, NF_IP_POST_ROUTING [HOOK4], and transmitted to the network. Datagrams generated locally are processed by the hook function NF_IP_LOCAL_OUT [HOOK5], then routed, then processed by NF_IP_POST_ROUTING [HOOK4], and finally sent to the network.

When IPVS is started and the ip_vs module is loaded, the module's initialization function ip_vs_init() registers hook functions at NF_IP_LOCAL_IN [HOOK2], NF_IP_FORWARD [HOOK3], and NF_IP_POST_ROUTING [HOOK4] to process incoming and outgoing datagrams.
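The three paths through the hook points can be written down as a small function. This is a sketch for illustration only: the enum names, `hook_path`, and `ipvs_registers` are hypothetical, and the routing decision is reduced to a single flag.

```c
/* The five IP-layer hook points, named after the constants in the text. */
enum hook { PRE_ROUTING, LOCAL_IN, FORWARD, LOCAL_OUT, POST_ROUTING };

enum origin { FROM_NETWORK, FROM_LOCAL_PROCESS };

/* Fill path[] with the hook points a datagram traverses and return how
 * many there are. for_this_host models the routing decision and is
 * ignored for locally generated datagrams. */
static int hook_path(enum origin from, int for_this_host, enum hook path[3])
{
    int n = 0;

    if (from == FROM_LOCAL_PROCESS) {
        path[n++] = LOCAL_OUT;
        path[n++] = POST_ROUTING;
        return n;
    }
    path[n++] = PRE_ROUTING;
    if (for_this_host) {
        path[n++] = LOCAL_IN;       /* then up to the upper-layer protocol */
    } else {
        path[n++] = FORWARD;
        path[n++] = POST_ROUTING;   /* then out to the network */
    }
    return n;
}

/* Hook points at which, per the text, ip_vs_init() registers functions. */
static int ipvs_registers(enum hook h)
{
    return h == LOCAL_IN || h == FORWARD || h == POST_ROUTING;
}
```

Read together, the two functions show why a client request addressed to the virtual IP reaches IPVS at NF_IP_LOCAL_IN, while replies forwarded from an RS in NAT mode are seen at NF_IP_FORWARD and NF_IP_POST_ROUTING.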

3.1 NF_IP_LOCAL_IN processing

When a user sends a request to the virtual server, the datagram passes NF_IP_LOCAL_IN [HOOK2] and enters ip_vs_in() for processing. If the incoming datagram is an ICMP datagram, ip_vs_in_icmp() is called; otherwise the function goes on to check whether it is a TCP/UDP datagram, and if it is not, the function returns NF_ACCEPT (let the kernel continue processing the datagram). The remaining case is the processing of TCP/UDP datagrams. First, ip_vs_header_check() is called to check the header; if it is abnormal, the function returns NF_DROP (discard the datagram). Next, ip_vs_conn_in_get() is called to look in the ip_vs_conn_tab table for a connection whose client and virtual server IP addresses, port numbers, and protocol type all match the corresponding information in the datagram. If no such connection exists, the connection has not yet been established. In that case, if the datagram is a TCP SYN segment or a UDP datagram, the corresponding virtual server is looked up. If the virtual server exists but is fully loaded, NF_DROP is returned; if it exists and is not fully loaded, ip_vs_schedule() is called to schedule an RS and create a new connection, and if scheduling fails, ip_vs_leave() is called to either pass the datagram on or discard it. If a corresponding connection does exist, the function first checks whether the RS on the connection is available; if it is not, the related information is handled and NF_DROP is returned. Once an existing connection has been found or a new one established, the related system records are updated, such as the number of incoming datagrams. If the connection was bound to a specific datagram transmit function when it was created, that function is called to transmit the datagram; otherwise NF_ACCEPT is returned.

ip_vs_in_icmp(), called by ip_vs_in(), processes ICMP packets. The function begins by checking the length of the datagram and returns NF_DROP if it is abnormal. The function only handles destination unreachable, source quench, and time exceeded ICMP packets caused by TCP/UDP transmission errors; all other cases are left to the kernel. For these three kinds of packets, the checksum is checked first. If the checksum is wrong, NF_DROP is returned directly; otherwise the returned ICMP error information is analyzed to find out whether the corresponding connection exists. If the connection does not exist, NF_ACCEPT is returned. If it exists, then according to the connection information the function modifies, in turn, the IP address and port number in the header of the error payload and the IP address in the header of the ICMP datagram, recalculates and updates the checksum in each header, looks up the route, calls ip_send() to send the modified datagram, and returns NF_STOLEN (leave the datagram processing path).

The function ip_vs_schedule(), called by ip_vs_in(), schedules an available RS for the virtual server and establishes the corresponding connection. It assigns an RS according to the scheduling algorithm bound to the virtual server and, on success, calls ip_vs_conn_new() to create the connection. ip_vs_conn_new() performs a series of initialization operations: it sets the connection's protocol, IP addresses, port numbers, and protocol timeout information, binds the application helper, the RS, and the datagram transmit function, and finally calls ip_vs_conn_hash() to insert the connection into the hash table ip_vs_conn_tab. Depending on the IPVS working mode, the transmit function bound to a connection is one of ip_vs_nat_xmit(), ip_vs_tunnel_xmit(), and ip_vs_dr_xmit(). For example, the main operations of ip_vs_nat_xmit() are: change the destination address and destination port of the packet to those of the RS, recalculate and set the checksum, and call ip_send() to send the modified datagram.
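The connection lookup described above can be pictured as a chained hash table keyed by protocol and client address. The sketch below is a user-space illustration under stated assumptions: the table size, the hash function, the `struct conn` fields, and the function names `conn_hash` and `conn_in_get` are made up for this example and are not the kernel's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define CONN_TAB_SIZE 256   /* illustrative size, not the kernel's */

/* Simplified connection entry; rs_id stands in for the bound RS. */
struct conn {
    struct conn *next;      /* next entry in the same bucket */
    uint8_t  proto;         /* e.g. 6 for TCP, 17 for UDP */
    uint32_t caddr;         /* client address */
    uint16_t cport;         /* client port */
    uint32_t vaddr;         /* virtual server address */
    uint16_t vport;         /* virtual server port */
    int      rs_id;
};

static struct conn *conn_tab[CONN_TAB_SIZE];

/* Made-up hash over protocol, client address, and client port. */
static unsigned conn_hashkey(uint8_t proto, uint32_t caddr, uint16_t cport)
{
    uint32_t h = (uint32_t)proto ^ caddr ^ (caddr >> 16) ^ cport;
    return h % CONN_TAB_SIZE;
}

/* Insert a connection at the head of its bucket. */
static void conn_hash(struct conn *c)
{
    unsigned k = conn_hashkey(c->proto, c->caddr, c->cport);
    c->next = conn_tab[k];
    conn_tab[k] = c;
}

/* Find the connection matching an incoming datagram, or NULL if the
 * connection has not been established yet. */
static struct conn *conn_in_get(uint8_t proto, uint32_t caddr, uint16_t cport,
                                uint32_t vaddr, uint16_t vport)
{
    struct conn *c = conn_tab[conn_hashkey(proto, caddr, cport)];

    for (; c != NULL; c = c->next)
        if (c->proto == proto && c->caddr == caddr && c->cport == cport &&
            c->vaddr == vaddr && c->vport == vport)
            return c;
    return NULL;
}
```

A `NULL` result corresponds to the branch in the text where a TCP SYN segment or UDP datagram triggers scheduling of a new RS; a hit means later datagrams of the same connection keep going to the RS chosen earlier.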

3.2 NF_IP_FORWARD processing

After a datagram enters NF_IP_FORWARD, it is processed by ip_vs_out(). This function is only called in NAT mode. It first determines the datagram type: if it is an ICMP datagram, ip_vs_out_icmp() is called directly; next it checks whether it is a TCP/UDP datagram, and if it is neither, NF_ACCEPT is returned. What remains is the processing of TCP/UDP datagrams. First, ip_vs_header_check() is called to check the header; if it is abnormal, NF_DROP is returned. Second, ip_vs_conn_out_get() is called to determine whether a corresponding connection exists. If there is no corresponding connection, ip_vs_lookup_real_service() is called to look in the hash table for whether the RS that sent the datagram still exists; if the RS exists and the packet is a TCP non-reset packet or a UDP packet, icmp_send() is called to send a destination unreachable ICMP packet to the RS and NF_STOLEN is returned. In all other cases NF_ACCEPT is returned. If there is a corresponding connection, the checksum of the datagram is checked and NF_DROP is returned if it is wrong. If it is correct, the datagram is modified: the source address is changed to the virtual server's IP address and the source port to the virtual server's port number, the checksum is recalculated and set, and NF_ACCEPT is returned.

The flow of ip_vs_out_icmp() is similar to that of ip_vs_in_icmp(); the only difference is in how the datagram is modified. The source address of the IP header and the destination address of the UDP or TCP header in the error message are both changed to the virtual server's address, and the destination port number of the UDP or TCP header in the error message is changed to the virtual server's port number.

3.3 NF_IP_POST_ROUTING processing

The NF_IP_POST_ROUTING hook function is only used in NAT mode. After a datagram enters NF_IP_POST_ROUTING, it is processed by ip_vs_post_routing(). The function first determines whether the datagram has passed through IPVS. If it has not, NF_ACCEPT is returned; otherwise the datagram is transmitted immediately and the function returns NF_STOLEN, which prevents the datagram from being modified by iptables rules.

4. LVS system configuration and management

When the IPVS module is initialized, it registers setsockopt/getsockopt() handlers. The ipvsadm command calls these two functions to pass system configuration data in the ip_vs_rule_user structure to the IPVS kernel module, which completes the configuration of the system and implements the add, modify, and delete operations on virtual server and RS addresses. Through these operations the system manages the virtual server and RS linked lists.

Adding a virtual server is done by ip_vs_add_service(). This function adds a new node to the virtual server hash table according to the hash algorithm, looks up the scheduling algorithm set by the user, and binds that algorithm to the node. Modifying a virtual server is done by ip_vs_edit_service(), which changes the scheduling algorithm of the specified virtual server. Deleting a virtual server is done by ip_vs_del_service(); before a virtual server is deleted, all RSs attached to it must first be removed and the scheduling algorithm bound to it must be unbound.

Similarly, adding, modifying, and deleting an RS are done by ip_vs_add_dest(), ip_vs_edit_dest(), and ip_vs_del_dest(), respectively.

5. Load balancing scheduling algorithms

As mentioned above, a scheduling algorithm has to be bound when the user adds a virtual service. The binding is done by ip_vs_bind_scheduler(), and the lookup of the scheduling algorithm is done by ip_vs_scheduler_get(). Based on the name of the scheduling algorithm, ip_vs_scheduler_get() calls ip_vs_sched_getbyname() to look the algorithm up in the scheduling algorithm queue. If it is not found, the corresponding scheduling algorithm module is loaded and the lookup is repeated; finally the result of the lookup is returned.
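The lookup-then-load logic can be sketched in user space as follows. This is a simplified model with hypothetical names: `struct scheduler`, `register_scheduler`, `sched_getbyname`, `load_module`, and `scheduler_get` only imitate the behaviour described in the text, and module loading is simulated by a function that registers one statically defined scheduler.

```c
#include <stddef.h>
#include <string.h>

/* Simplified scheduler descriptor: only the name matters here. */
struct scheduler {
    struct scheduler *next;
    const char *name;
};

/* Queue of schedulers that are currently registered. */
static struct scheduler *sched_list;

static void register_scheduler(struct scheduler *s)
{
    s->next = sched_list;
    sched_list = s;
}

/* Walk the queue and compare names. */
static struct scheduler *sched_getbyname(const char *name)
{
    for (struct scheduler *s = sched_list; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Simulated module loader (assumption for this sketch): the only
 * loadable module is "wlc", which registers itself when loaded. */
static struct scheduler wlc_module = { NULL, "wlc" };

static int load_module(const char *name)
{
    if (strcmp(name, wlc_module.name) != 0)
        return -1;
    register_scheduler(&wlc_module);
    return 0;
}

/* Look up by name; if absent, try loading the module and look again. */
static struct scheduler *scheduler_get(const char *name)
{
    struct scheduler *s = sched_getbyname(name);

    if (s == NULL && load_module(name) == 0)
        s = sched_getbyname(name);
    return s;
}
```

The second lookup after loading matters because loading a module only makes the scheduler available; the caller still needs the registered descriptor in order to bind it to the virtual service.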

At present, the system has eight load balancing scheduling algorithms, as follows:

RR: Round-Robin scheduling. It assigns requests to the different RSs in turn, that is, it spreads the requests evenly over the RSs. The algorithm is simple, but it is only suitable when the processing performance of the RSs does not differ much.

WRR: Weighted Round-Robin scheduling. It assigns tasks according to the weights of the different RSs. An RS with a higher weight gets tasks first and is assigned more connections than an RS with a lower weight. RSs with the same weight get the same number of connections.

DH: Destination Hashing scheduling. It looks up a static hash table with the destination address as the key to get the required RS.

SH: Source Hashing scheduling. It looks up a static hash table with the source address as the key to get the required RS.

LC: Least-Connection scheduling. The IPVS table stores all active connections. A new connection request is sent to the RS with the smallest current number of connections.

WLC: Weighted Least-Connection scheduling. Assume that the weights of the RSs are Wi (i = 1..n) and their current numbers of TCP connections are Ti (i = 1..n). The RS with the smallest Ti/Wi is selected as the next RS to be assigned.

LBLC: Locality-Based Least-Connection scheduling. Requests for the same destination address are assigned to the same RS if that server is not yet fully loaded; otherwise they are assigned to the RS with the fewest connections, which then becomes the first choice for the next assignment.

LBLCR: Locality-Based Least-Connection with Replication scheduling. Each destination address has a corresponding subset of RSs. A request for the address is assigned to the RS with the fewest connections in the subset. If all servers in the subset are fully loaded, a server with fewer connections is selected from the cluster, added to the subset, and assigned the connection. If the subset has not been modified for a certain period of time, the most heavily loaded node is removed from the subset.
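Three of the algorithms above are small enough to sketch directly. The code below is a user-space illustration and not the IPVS implementation: `struct rs`, `rr_pick`, `wlc_pick`, and `sh_pick` are hypothetical names, the multiplicative hash in `sh_pick` is an arbitrary choice for this example, and skipping RSs with a weight of zero or less is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified real server state. */
struct rs {
    int weight;   /* Wi in the text */
    int conns;    /* Ti in the text: current number of connections */
};

/* RR: hand out server indices in turn. *last holds the previously
 * chosen index; start it at -1 so that the first pick is index 0. */
static int rr_pick(int n, int *last)
{
    *last = (*last + 1) % n;
    return *last;
}

/* WLC: choose the server with the smallest Ti/Wi. The comparison
 * Ti/Wi < Tb/Wb is done as Ti*Wb < Tb*Wi to avoid division.
 * Returns -1 if no server has a positive weight. */
static int wlc_pick(const struct rs *s, int n)
{
    int best = -1;

    for (int i = 0; i < n; i++) {
        if (s[i].weight <= 0)
            continue;
        if (best < 0 ||
            (long)s[i].conns * s[best].weight <
            (long)s[best].conns * s[i].weight)
            best = i;
    }
    return best;
}

/* SH: map the source address to a slot of a static table that holds
 * server indices, so the same client always gets the same server. */
static int sh_pick(uint32_t saddr, const int *table, int table_size)
{
    uint32_t slot = (saddr * 2654435761u) % (uint32_t)table_size;
    return table[slot];
}
```

DH follows the same pattern as `sh_pick` with the destination address as the key, and LC is the special case of `wlc_pick` in which all weights are equal.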

When reposting, please cite the original address: https://www.9cbs.com/read-124918.html
