Linux IP Fragmentation and Reassembly in Detail

xiaoxiao2021-03-06  77

Principles

Fragmenting a fragment again

Fragmenting a whole datagram and fragmenting something that is already a fragment differ only in how the gateway handles the MF (More Fragments) bit. When a gateway fragments a complete datagram, it sets MF to 1 on every fragment except the last one, whose MF bit is 0. When a gateway fragments a non-final fragment, it sets MF to 1 on every sub-fragment it produces, because none of them can be the end of the original datagram.
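The MF rule above can be sketched as a small C function. This is only an illustration: the name sub_fragment_mf and its parameters are hypothetical and do not come from the kernel source.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function): decide the MF bit for piece
 * number `index` (0-based) when a packet is cut into `count` pieces.
 * `parent_mf` is the MF bit of the packet being cut: false for a complete
 * datagram or a final fragment, true for a non-final fragment.
 */
bool sub_fragment_mf(bool parent_mf, int index, int count)
{
    bool is_last_piece = (index == count - 1);

    /* Only the last piece of a packet whose own MF bit is 0 gets MF = 0. */
    return parent_mf || !is_last_piece;
}
```

When parent_mf is true, every piece reports MF = 1, which is the re-fragmentation case.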

Fragmentation copies the IP header, the options, and the data. Note on copying options: according to the protocol standard, some options may appear in only one fragment (the first), while others must appear in every fragment.
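In RFC 791 the distinction is carried by the "copied" flag, the high bit of the option type octet. A minimal sketch of that test follows; the macro and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* High bit of the option type octet: the "copied" flag from RFC 791. */
#define DEMO_OPT_COPIED_FLAG 0x80

/*
 * Hypothetical helper: true if an option of this type must be repeated in
 * every fragment, false if it belongs in the first fragment only.
 */
bool option_goes_in_every_fragment(uint8_t option_type)
{
    return (option_type & DEMO_OPT_COPIED_FLAG) != 0;
}
```

For example, Record Route (type 7) has the flag clear, while Loose Source Route (type 131) has it set.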

Packet reassembly

Data structure

To make reassembly efficient, the data structure that holds the fragments must support the following:

Locating the set of fragments that belong to a particular datagram; quickly inserting a new fragment into that set; determining whether the complete datagram has arrived; and a fragment timeout mechanism (ip_expire) that deletes the fragments if the timer expires before reassembly completes.

Mutual exclusion

The reassembly code uses a mutual-exclusion lock: ipfrag_lock.

Adding a fragment to the linked list

Lookup method: linear search of the linked list.

Discarding on overflow

When the fragment list space is full, discard all fragments of the corresponding datagram (ip_evictor).

Testing whether the fragments make up a complete datagram (ip_frag_queue)

Check whether the IP_MF bit is 0.

Assembling the fragments into a complete datagram (LAST_IN, ip_frag_reasm)

Maintenance of the fragment list

So that the fragments of a datagram with lost pieces do not keep wasting storage, and so that IP is not confused when an identification value is reused, IP must periodically check the fragment list and purge datagrams whose remaining fragments can no longer be expected to arrive.

ipq_unlink

ipq_put

ipq_kill

ipqhashfn

Implementation under Linux

IP fragmentation

How to improve the efficiency of fragmentation processing

ip_fragment (non-UDP)

Typical callers

ip_send → ip_fragment(skb, ip_finish_output); generally reached from forwarding

ip_queue_xmit2 → ip_fragment(skb, skb->dst->output); generally reached from TCP

The IP datagram is too large to be sent in a single frame, so it has to be fragmented.

Processing flow

Get the outgoing device (determined by skb)

dev = rt->u.dst.dev; output device of the route

Note: rt is skb->dst cast to struct rtable; rt->u.dst → dst_entry

Take the IP header

raw = skb->nh.raw;

iph = (struct iphdr *) raw; take the IP header

Set the starting values

hlen = iph->ihl * 4; IP header length in bytes

left = ntohs(iph->tot_len) - hlen; total packet length minus the IP header: the length of the data to be fragmented

mtu = rt->u.dst.pmtu - hlen; MTU (pmtu) minus the IP header: the data space available in each fragment

ptr = raw + hlen; pointer to the start of the data area

Fragment the packet

The fragmentation algorithm itself is simple, but the sk_buff structures and chains make the implementation complicated. If the DF bit forbids fragmentation, ip_output discards the packet and returns an error. If the packet was generated locally, the transport-layer protocol passes the error back to the process; if the packet was being forwarded, ip_forward generates an ICMP destination unreachable error indicating that the packet could not be forwarded without fragmentation. The path MTU discovery mechanism relies on this behavior to probe the path to the destination host and find the largest MTU supported by the intermediate networks.

Each new fragment contains the IP header, some of the options from the original packet, and at most len bytes of data. Linux keeps no outgoing fragment queue; each fragment is sent on its own as soon as it is built.

offset = (ntohs(iph->frag_off) & IP_OFFSET) << 3;

Extract the 13-bit offset and multiply it by 8 to get the byte offset of this packet's data.

not_last_frag = iph->frag_off & htons(IP_MF);

Extract the MF bit (the 14th bit).
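As an illustration of the two extractions above, the following sketch decodes a fragment field that has already been converted to host byte order. The mask values 0x2000 and 0x1FFF correspond to the MF bit and the 13-bit offset; the function and macro names are assumptions made for this example.

```c
#include <stdint.h>

#define DEMO_MF_MASK     0x2000  /* More Fragments flag             */
#define DEMO_OFFSET_MASK 0x1FFF  /* 13-bit offset, in 8-byte units  */

/* Byte offset of the fragment's data within the original datagram. */
unsigned demo_frag_byte_offset(uint16_t frag_off_host)
{
    return (unsigned)(frag_off_host & DEMO_OFFSET_MASK) << 3;
}

/* Nonzero if more fragments follow this one. */
int demo_frag_has_more(uint16_t frag_off_host)
{
    return (frag_off_host & DEMO_MF_MASK) != 0;
}
```

A field value of 0x20B9, for instance, has MF set and an offset of 185 units, that is 1480 bytes.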

Loop to cut the fragments:

while (left > 0) {

len = left;

/* IF: it doesn't fit, use 'mtu' - the data space left */

if (len > mtu) if the remaining data (left) is larger than the MTU, the fragment carries mtu bytes of data; otherwise it carries left bytes (the last fragment)

len = mtu;

/* IF: we are not sending up to and including the packet end

then align the next start on an eight byte boundary */

if (len < left) {

len &= ~7; round down to a multiple of 8 bytes

}

Allocate an sk_buff. Size: hardware frame header length + fragment length + IP header length

Fill in the fragment

Packet type (local host, broadcast, multicast, other host, outgoing, loopback, fast route); packet priority; room reserved for the frame header; the IP and transport raw pointers (raw)

/*

* Charge the memory for the fragment to any owner

* it might possess

*/

If the packet has a sock, register that sock as the owner of the fragment. Copy the destination entry and increase its reference count. Copy the output device. Copy the IP header. Copy a block of the IP data (exactly len bytes) and subtract len from left, so that left records the data still to be sent on the next pass.

Fill in the new IP header

Locate the IP header of the new fragment and set its fragment offset (for the first fragment this is the offset of the original IP packet). offset holds the running byte offset of the fragment. At this point the flags are empty.

iph->frag_off = htons((offset >> 3));

If the offset is 0, the packet is being fragmented for the first time

if (offset == 0)

For the first fragment, fix up the options so that those not allowed in later fragments are removed. This is done once, for efficiency, because such options are carried only in the first fragment.

ip_options_fragment(skb);

When fragmenting a packet that is itself a non-final fragment (not_last_frag is nonzero), MF must stay 1 on every piece

if (left > 0 || not_last_frag)

iph->frag_off |= htons(IP_MF); set the MF bit to 1

Advance the data pointer ptr of the original IP packet

Advance the fragment offset

ptr += len; advance the data pointer

offset += len; advance the fragment offset

If netfilter (firewall) support is configured, set the firewall-related fields.

Send the fragment

Set the fragment's total length, recompute the header checksum, and send the fragment through the output function (ip_finish_output)

} loop until all the data has been fragmented (left == 0)
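The length, offset, and MF arithmetic of the loop above can be tried out in isolation. The sketch below plans the fragments without touching any packet memory; struct frag_plan and plan_fragments are hypothetical names, and mtu_data stands for the data space per fragment (MTU minus the IP header).

```c
#include <stddef.h>

/* One planned fragment: where its data starts, how long it is, MF bit. */
struct frag_plan {
    unsigned offset;
    unsigned len;
    int mf;
};

/*
 * Hypothetical planner mirroring the loop arithmetic: `left` bytes of data,
 * `mtu_data` bytes of room per fragment, `offset` of the packet being cut,
 * and `not_last_frag` nonzero if that packet already had MF set.
 * Returns the number of entries written to `out` (at most `max_out`).
 */
size_t plan_fragments(unsigned left, unsigned mtu_data, unsigned offset,
                      int not_last_frag, struct frag_plan *out, size_t max_out)
{
    size_t n = 0;

    if (mtu_data < 8)
        return 0;                 /* no room for an 8-byte aligned piece */

    while (left > 0 && n < max_out) {
        unsigned len = left;

        if (len > mtu_data)
            len = mtu_data;       /* does not fit: use the data space    */
        if (len < left)
            len &= ~7u;           /* not the end: keep 8-byte alignment  */

        left -= len;
        out[n].offset = offset;
        out[n].len = len;
        out[n].mf = (left > 0 || not_last_frag);
        offset += len;
        n++;
    }
    return n;
}
```

With 4000 bytes of data and 1480 bytes of room per fragment, this plans three fragments of 1480, 1480, and 1040 bytes, and only the last has MF clear.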

UDP fragmentation (to be continued)

IP reassembly

ip_defrag

In the Linux network stack, network data travels in sk_buff structures. ip_defrag() receives a fragment as an sk_buff and tries to reassemble the datagram. When the complete datagram has been assembled it returns a new sk_buff; otherwise it returns a NULL pointer.

Typical caller

if (iph->frag_off & htons(IP_MF | IP_OFFSET)) check whether the packet is a fragment

ip_local_deliver → ip_defrag(skb);
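The caller's test treats a packet as a fragment when either MF is set or the offset is nonzero. A stand-alone sketch of that check on a field in network byte order is shown below; the DEMO_ macros and the function name are assumptions for this example.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_IP_MF     0x2000  /* More Fragments flag */
#define DEMO_IP_OFFSET 0x1FFF  /* 13-bit offset mask  */

/*
 * True if the packet is a fragment: MF set (more pieces follow) or a
 * nonzero offset (this is a later piece). The DF bit alone does not count.
 */
bool demo_is_fragment(uint16_t frag_off_net)
{
    return (frag_off_net & htons(DEMO_IP_MF | DEMO_IP_OFFSET)) != 0;
}
```

The last fragment of a datagram has MF = 0 but a nonzero offset, which is why both masks are combined.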

Key data structures (2.4 series)

ipq

The fragments form a doubly linked list (in the Linux kernel, when a linked list is needed a doubly linked list is recommended unless there is a special reason otherwise; see Documentation/CodingStyle) that represents an incomplete fragment queue (one IP datagram). The head pointer of this list is kept in the ipq structure:

/* Describe an entry in the "incomplete datagrams" queue. */

struct ipq {

struct ipq *next; /* linked list pointers */

u32 saddr;

u32 daddr;

u16 id;

u8 protocol;

u8 last_in;

#define COMPLETE 4

#define FIRST_IN 2

#define LAST_IN 1

struct sk_buff *fragments; /* linked list of received fragments */

int len; /* total length of original datagram */

int meat; /* accumulated length of the fragments received so far */

spinlock_t lock;

atomic_t refcnt;

struct timer_list timer; /* when will this queue expire? */

struct ipq **pprev;

int iif; /* device index - for ICMP replies */

};

Note that each ipq has its own timer (struct timer_list timer).

The ipq entries are organized in a hash table of fragment chains.

Hash table:

#define IPQ_HASHSZ 64

struct ipq *ipq_hash[IPQ_HASHSZ];

#define ipqhashfn(id, saddr, daddr, prot) ((((id) >> 1) ^ (saddr) ^ (daddr) ^ (prot)) & (IPQ_HASHSZ - 1))

The macro arguments are the ID, source address, destination address, and protocol. Each IP datagram is identified by the four-tuple (id, saddr, daddr, protocol). Fragments that share all four values are placed in the same ipq queue and can be assembled into one complete IP datagram.
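To see how the four-tuple maps to a bucket, the hash expression can be written as an ordinary function. This is a sketch for experimentation only; demo_ipq_hash_index is a hypothetical name, and the addresses are treated as plain 32-bit integers.

```c
#include <stdint.h>

#define DEMO_IPQ_HASHSZ 64  /* must be a power of two for the mask below */

/* Bucket index for a datagram identified by (id, saddr, daddr, prot). */
unsigned demo_ipq_hash_index(uint16_t id, uint32_t saddr, uint32_t daddr,
                             uint8_t prot)
{
    uint32_t mixed = ((uint32_t)id >> 1) ^ saddr ^ daddr ^ prot;

    return (unsigned)(mixed & (DEMO_IPQ_HASHSZ - 1));
}
```

Because the result is masked with 63, every datagram lands in one of the 64 buckets, and all fragments of the same datagram land in the same one.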

FRAG_CB

#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb))

cb is a control buffer that gives each layer private storage inside the sk_buff. If the contents must be preserved for other layers, clone the skb with skb_clone.

char cb[48];

ipfrag_skb_cb

struct ipfrag_skb_cb

{

struct inet_skb_parm h;

int offset;

};

inet_skb_parm

struct inet_skb_parm

{

struct ip_options opt; /* compiled IP options */

unsigned char flags;

#define IPSKB_MASQUERADED 1

#define IPSKB_TRANSLATED 2

#define IPSKB_FORWARDED 4

};

Function description

When the kernel receives an IP packet addressed to the local host, it first reassembles the fragments before passing the datagram to the upper-layer protocol. All fragments of one IP datagram carry the same identification number (ID). When the 14th bit (IP_MF) of the fragment field (frag_off) is 1, more fragments follow. The low 13 bits give the fragment's offset within the full datagram, in units of 8 bytes. When the IP_MF bit is 0, the packet is the last fragment.

If the remaining fragments do not arrive within the timeout (60 to 120 seconds according to the standard; set in Linux by the constant IP_FRAG_TIME, 30 * HZ), reassembly fails, the reassembly queue is released, and an ICMP message is sent to tell the sender about the failure.

The memory used by the reassembly queues must not exceed 256 KB (sysctl_ipfrag_high_thresh); otherwise ip_evictor is called to release reassembly queues from the end of each hash chain. Every IP implementation must be able to reassemble datagrams of at least 576 bytes. Fragments may overlap, so overlapping fragments have to be handled.

To keep buffered fragments from consuming too much memory, Linux sets a limit: if the upper bound on memory use is exceeded, the oldest queues (ipq) are emptied. The amount of memory in use is kept in the variable ip_frag_mem, which must of course be updated with atomic operations (atomic_sub, atomic_add, atomic_read, etc.).

The fragment reassembly code is in ip_fragment.c, and its flow is described at the top of that file. It is basically the same as in the 2.2 series, but the work is divided among the functions differently. Because the information the old ipfrag structure kept can be obtained from the sk_buff, that structure was removed in 2.4 and the ipq structure was modified. Other major changes are:

1) ip_defrag was split into two parts: ip_defrag and ip_frag_queue.

2) ip_glue was renamed ip_frag_reasm; its logic is essentially unchanged.

3) The ipq now keeps the accumulated length of the fragments received so far (with overlaps already resolved). When this value reaches the total length, all fragments have arrived. The ip_done function was therefore removed, and the list no longer has to be walked every time a fragment arrives, which improves efficiency considerably and strengthens resistance to small-fragment attacks.
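The constant-time completeness test described in point 3 can be sketched as follows. The struct is a cut-down stand-in for ipq with only the fields the test needs; demo_queue and demo_queue_is_complete are hypothetical names.

```c
#define DEMO_LAST_IN  1  /* the fragment with MF = 0 has arrived    */
#define DEMO_FIRST_IN 2  /* the fragment with offset 0 has arrived  */

/* Minimal stand-in for the fields of ipq used by the check. */
struct demo_queue {
    int len;                /* total length of the original datagram */
    int meat;               /* bytes received so far, overlaps removed */
    unsigned char last_in;  /* DEMO_FIRST_IN / DEMO_LAST_IN flags     */
};

/* Nonzero when every byte of the datagram is present; no list walk needed. */
int demo_queue_is_complete(const struct demo_queue *q)
{
    return q->last_in == (DEMO_FIRST_IN | DEMO_LAST_IN) && q->meat == q->len;
}
```

The check costs the same no matter how many fragments are queued, which is the efficiency gain the text describes.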

ip_find → ip_frag_create → ip_frag_intern → ip_frag_time

Processing flow

struct sk_buff *ip_defrag(struct sk_buff *skb)

If the memory used for fragments exceeds the 256 KB limit set by the system, call ip_evictor to clean up.

Determine the device (dev) that the IP packet belongs to.

Use the hash value to locate the packet's position in the fragment chains.

If a fragment queue already exists, other fragments of this datagram have already arrived (not necessarily in order).

Insert the fragment into the corresponding fragment queue (ip_frag_queue):

Based on the packet's flags and offset: if it is the last fragment of the datagram (though not necessarily the last to arrive), the length of the fragment queue becomes the length of the original datagram; if it ends beyond the current length, update the length of the fragment queue.

Adjust the packet's skb pointers so that skb->data points to the IP payload (data area) and skb->tail points to the end of the IP data; that is, skb->data through skb->tail covers the IP payload. This lets the sk_buff be added to the fragment chain and spliced easily during reassembly.

Scan the fragments already in the reassembly queue and use the offset to find the fragment that comes before this one in the chain (prev).

If this fragment is not the first one (prev != NULL), eliminate the overlap with the previous fragment: compare the end of prev with the current offset (prev is not necessarily directly adjacent to this fragment).

Then eliminate the overlap with the following fragments: compute how far the current fragment's end extends past the start of next. If the current fragment ends before the end of next, advance the start of next to drop the overlapping part and reduce the accumulated length (meat); if next lies completely inside this fragment, remove next and reduce the accumulated length accordingly.

Insert the fragment into the fragment queue: set the device and add the fragment's length to meat; if the offset is 0 this is the first fragment, so set the FIRST_IN flag.

Check whether the datagram is complete (all fragments present, matching the datagram length). If so, reassemble it with ip_frag_reasm, which first removes the fragment queue and then does the following:

Allocate an sk_buff for the new packet and fill in its fields: set the new IP length (it must not exceed 65535 bytes); set the frame header position, the IP header, and the option data.

Copy the original IP header (recorded in the first fragment of the queue) into the new skb.

Loop over the fragment chain, copying the data of each fragment skb into the new skb and increasing the running count.

Set the destination entry (cloned), packet type, protocol, and device.

Perform firewall processing.

Reset the IP header: set the 3-bit flags and the 13-bit offset to 0 and compute the total length.

Return the new IP packet.

If this is the first fragment to arrive (no queue was found), create a new entry and insert it at the head of the fragment queue chain (ip_frag_create).
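The overlap handling described above comes down to interval arithmetic on byte ranges. The sketch below models each fragment as a half-open range [offset, end) and is not the kernel code; demo_range, demo_trim_front, and demo_trim_next are hypothetical names.

```c
/* A fragment's data as a half-open byte range [offset, end). */
struct demo_range {
    int offset;
    int end;
};

/*
 * Remove from the front of `cur` whatever `prev` already covers.
 * Returns 0 if nothing of `cur` is left, 1 otherwise.
 */
int demo_trim_front(const struct demo_range *prev, struct demo_range *cur)
{
    int overlap = prev->end - cur->offset;

    if (overlap > 0) {
        cur->offset += overlap;
        if (cur->offset >= cur->end)
            return 0;             /* completely covered by prev */
    }
    return 1;
}

/*
 * Remove from the front of `next` whatever `cur` covers. Returns the number
 * of bytes removed, which is the amount to subtract from the running total.
 */
int demo_trim_next(const struct demo_range *cur, struct demo_range *next)
{
    int overlap = cur->end - next->offset;
    int next_len = next->end - next->offset;

    if (overlap <= 0)
        return 0;                 /* no overlap */
    if (overlap < next_len) {
        next->offset += overlap;  /* partial overlap: eat the head of next */
        return overlap;
    }
    next->offset = next->end;     /* next lies entirely inside cur */
    return next_len;
}
```

After both trims, the sum of the remaining range lengths counts each byte once, which is what allows the accumulated length to be compared directly with the datagram length.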

ip_evictor

ip_evictor is called to release memory when the memory used by fragments exceeds an upper limit (sysctl_ipfrag_high_thresh).

ip_evictor picks an ipq and empties it, repeating until memory use falls to the lower limit (sysctl_ipfrag_low_thresh).

These values are defined in ip_fragment.c:

int sysctl_ipfrag_high_thresh = 256 * 1024;

int sysctl_ipfrag_low_thresh = 192 * 1024;

Both parameters can be viewed with sysctl -a and modified at run time.

# sysctl -a

......

net.ipv4.ipfrag_low_thresh = 196608

net.ipv4.ipfrag_high_thresh = 262144

......

In theory ip_evictor should use an LRU algorithm to clear the oldest ipq. No such mechanism exists at present, however: it simply empties the hash table in order, which has the benefit of being simple.

Memory limiting on fragments: the evictor trashes (discards) the oldest fragment queue until we are back under the low threshold.

The ip_evictor function traverses the fragment queues and discards the fragments collected so far until the total memory in use drops below the specified limit. It keeps calling ip_free as long as memory use is above the limit. If the fragment queues are empty while the memory threshold is still exceeded, ip_evictor causes a kernel panic.

LRU approach: the global ipq_hash[64] array holds the linked lists; the closer a queue is to the tail of its chain, the larger its count and the less likely it is to be evicted.

Two nested loops: each pass evicts the queue with the smallest reference count, until the total memory in use drops to sysctl_ipfrag_low_thresh = 192 * 1024.
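The two-threshold behavior can be sketched with a toy model in which each queue is just a byte count. This shows only the simple in-order variant (not the LRU one), and every name in it is hypothetical.

```c
#include <stddef.h>

#define DEMO_HIGH_THRESH (256 * 1024)  /* start evicting above this      */
#define DEMO_LOW_THRESH  (192 * 1024)  /* stop once at or below this     */

/*
 * Toy evictor: queue_mem[i] is the memory held by queue i, *total_mem the
 * sum over all queues. Frees queues in index order. Returns how many
 * queues were freed.
 */
size_t demo_evict_in_order(int *queue_mem, size_t nqueues, int *total_mem)
{
    size_t freed = 0;

    if (*total_mem <= DEMO_HIGH_THRESH)
        return 0;                       /* under the limit: nothing to do */

    for (size_t i = 0; i < nqueues && *total_mem > DEMO_LOW_THRESH; i++) {
        if (queue_mem[i] == 0)
            continue;                   /* already empty */
        *total_mem -= queue_mem[i];
        queue_mem[i] = 0;
        freed++;
    }
    return freed;
}
```

Using separate high and low marks means one eviction pass frees a sizeable chunk, so the evictor is not triggered again by the very next fragment.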

ip_frag_create

Initializes an IP fragment queue, including its timer, whose handler is ip_expire. It then calls ip_frag_intern to link the queue at the head of its chain and start the timer.

ip_frag_intern

ip_frag_nqueues

When reposting, please cite the original address: https://www.9cbs.com/read-109422.html
