Linux IP Fragmentation and Reassembly in Detail
Principles
Re-fragmenting a fragment
The subtle difference between fragmenting a whole datagram and re-fragmenting a fragment lies in how the gateway handles the MF (More Fragments) bit. When a gateway fragments a complete datagram, it sets MF to 1 on every fragment except the last, whose MF is 0. When a gateway re-fragments a fragment that is not the last one, it sets MF to 1 on all of the resulting sub-fragments, because none of them can be the end of the whole datagram.
When fragmenting, the gateway copies the IP header, the options, and the data. Note on copying the options: according to the protocol standard, some options should appear only in the first fragment, while others must appear in every fragment.
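The MF rule described above can be sketched in C. This is a minimal illustration and not kernel code: the struct frag type and the split_fragment function are hypothetical names invented for this example, and payload_max stands for the data capacity of one fragment (MTU minus IP header length).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one fragment (not a kernel structure). */
struct frag {
    unsigned offset;  /* byte offset of the data within the original datagram */
    unsigned len;     /* number of data bytes carried */
    int mf;           /* More Fragments bit */
};

/*
 * Split `in` into pieces carrying at most payload_max data bytes each.
 * Every piece except the last is rounded down to a multiple of 8 bytes,
 * because the offset field counts 8-byte units.
 * Returns the number of pieces written to out[], or 0 if out[] is too small.
 */
size_t split_fragment(struct frag in, unsigned payload_max,
                      struct frag *out, size_t out_cap)
{
    size_t n = 0;
    unsigned left = in.len;
    unsigned offset = in.offset;

    assert(payload_max >= 8);   /* otherwise no 8-byte unit fits */

    while (left > 0) {
        unsigned len = left;

        if (len > payload_max)
            len = payload_max;
        if (len < left)
            len &= ~7u;         /* not the final piece: align to 8 bytes */
        if (n == out_cap)
            return 0;

        left -= len;
        out[n].offset = offset;
        out[n].len = len;
        /* MF stays 1 if data remains OR the input was not the last fragment. */
        out[n].mf = (left > 0) || in.mf;
        offset += len;
        n++;
    }
    return n;
}
```

Splitting a whole datagram (mf = 0 on input) yields MF = 0 only on the final piece, while splitting a middle fragment (mf = 1 on input) yields MF = 1 on every piece.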
Datagram reassembly
Data structures
To reassemble datagrams efficiently, the data structure that holds the fragments must make it possible to:
locate the set of fragments that make up a particular datagram; insert a new fragment into that set quickly; determine whether all the fragments of a datagram have arrived; and time out incomplete datagrams (ip_expire), so that if the timer expires before reassembly completes, the fragments are deleted.
Mutual exclusion
The reassembly code is protected by a lock, ipfrag_lock.
Adding a fragment to the linked list
Lookup method: linear search of the linked list
Discarding on overflow
When the space for fragment lists is full, all fragments of the affected datagram are discarded (ip_evictor).
Testing whether a complete datagram has been assembled (ip_frag_queue)
Check whether the IP_MF bit is 0.
Assembling the fragments into a complete datagram (last_in, ip_frag_reasm)
Maintenance of the fragment lists
So that the fragments of a datagram with lost pieces do not keep wasting storage, and so that IP is not confused when the identification field is reused, IP must periodically check the fragment lists and discard fragments whose remaining pieces can no longer be expected to arrive.
ipq_unlink
ipq_put
ipq_kill
ipqhashfn
Implementation under Linux
IP fragmentation
How to improve the efficiency of fragmentation processing
ip_fragment (non-UDP)
Typical callers
ip_send → ip_fragment(skb, ip_finish_output); generally from forwarding
ip_queue_xmit2 → ip_fragment(skb, skb->dst->output); generally from TCP
Because the IP datagram is too large, it is split into fragments that each fit into one frame for transmission.
Processing steps
Get the outgoing device (determined from the skb)
dev = rt->u.dst.dev; the output device of the route
Note: rt is taken from skb->dst, and rt->u.dst is the dst_entry.
Take the IP header
raw = skb->nh.raw;
iph = (struct iphdr *)raw; take the IP header
Set the initial values
hlen = iph->ihl * 4; the IP header length in bytes
left = ntohs(iph->tot_len) - hlen; total length of the packet minus the IP header: the length of the data that needs to be fragmented
mtu = rt->u.dst.pmtu - hlen; path MTU minus the IP header: the maximum data length of one fragment
ptr = raw + hlen; pointer to the data area
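As a worked example of these initial values, the following sketch computes hlen, left and mtu for a given header. It is a user-space illustration only: struct frag_setup and frag_setup_init are hypothetical names, and the header fields are passed in as plain parameters rather than read from a struct iphdr.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical holder for the values set up before the fragmentation loop. */
struct frag_setup {
    unsigned hlen;  /* IP header length in bytes */
    unsigned left;  /* data bytes still to be fragmented */
    unsigned mtu;   /* data bytes that fit into one fragment */
};

/*
 * ihl         - header length field of the IP header, in 32-bit words
 * tot_len_net - total length field, in network byte order
 * pmtu        - path MTU of the outgoing route, in bytes
 */
struct frag_setup frag_setup_init(unsigned ihl, uint16_t tot_len_net,
                                  unsigned pmtu)
{
    struct frag_setup s;
    unsigned tot_len = ntohs(tot_len_net);

    s.hlen = ihl * 4;
    assert(s.hlen >= 20 && tot_len >= s.hlen && pmtu > s.hlen);
    s.left = tot_len - s.hlen;  /* payload of the original packet */
    s.mtu = pmtu - s.hlen;      /* payload capacity of one fragment */
    return s;
}
```

For a 4000-byte packet with a 20-byte header and a path MTU of 1500, this gives left = 3980 and mtu = 1480.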
Fragmenting the datagram
The fragmentation algorithm itself is simple, but the implementation is complicated by the sk_buff structures and chains. If the DF bit forbids fragmentation, ip_output discards the packet and returns an error. If the packet was generated locally, the transport-layer protocol passes the error back to the process; if the packet is being forwarded, ip_forward generates an ICMP destination unreachable error indicating that the packet cannot be forwarded without fragmentation. The path MTU discovery mechanism uses this behaviour to probe the path to the destination host and discover the maximum transmission unit (MTU) supported by the intermediate networks. Each new fragment contains the IP header, some of the options of the original packet, and at most len bytes of data. There is no fragment queue on the sending side under Linux; each fragment is sent out separately as soon as it is built.
offset = (ntohs(iph->frag_off) & IP_OFFSET) << 3;
Take out the 13-bit offset and multiply it by 8 to get the byte offset of this packet within the original datagram.
not_last_frag = iph->frag_off & htons(IP_MF);
Take out the MF bit (the 14th bit counting from the least significant bit, mask 0x2000).
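The two expressions above can be tried out in user space. The sketch below is an illustration under stated assumptions: the constants use the values of the kernel's IP_MF and IP_OFFSET, while frag_byte_offset, frag_more and frag_off_encode are hypothetical helper names that do not exist in the kernel.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

#define IP_MF     0x2000  /* More Fragments flag */
#define IP_OFFSET 0x1FFF  /* 13-bit fragment offset, in 8-byte units */

/* Byte offset of a fragment, from frag_off in network byte order. */
unsigned frag_byte_offset(uint16_t frag_off_net)
{
    return (unsigned)(ntohs(frag_off_net) & IP_OFFSET) << 3;
}

/* Non-zero if the MF bit is set. */
int frag_more(uint16_t frag_off_net)
{
    return (frag_off_net & htons(IP_MF)) != 0;
}

/* Build a frag_off value (network byte order) from a byte offset and MF. */
uint16_t frag_off_encode(unsigned byte_offset, int mf)
{
    uint16_t v;

    /* The offset must be a multiple of 8 and fit into 13 bits. */
    assert((byte_offset & 7) == 0 && (byte_offset >> 3) <= IP_OFFSET);
    v = (uint16_t)(byte_offset >> 3);
    if (mf)
        v |= IP_MF;
    return htons(v);
}
```

Because the offset is stored in 8-byte units, a byte offset of 1480 is stored as 185 in the low 13 bits.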
Loop to split the data:
while (left > 0) {
len = left;
/* IF: it doesn't fit, use 'mtu' - the data space left */
if (len > mtu)
len = mtu;
If the remaining data (left) is larger than mtu, the fragment carries mtu bytes of data; otherwise it carries left bytes (this is the last fragment).
/* IF: we are not sending upto and including the packet end then align the next start on an eight byte boundary */
if (len < left) { len &= ~7; }
If this is not the last piece, round len down to a multiple of 8 bytes.
Allocate an sk_buff. Size: hardware frame header length + IP header + fragment data.
Fill in the fragment:
Packet type: this host, broadcast, multicast, other host, outgoing, loopback, fast route.
Packet priority.
Reserve room for the frame header.
Set the IP and transport-layer raw pointers.
/* Charge the memory for the fragment to any owner it might possess */
If the packet has a sock, register the fragment with the sock that owns the packet.
Copy the destination entry and increase its reference count.
Copy the output device.
Copy the IP header.
Copy one block of IP data (only len bytes) and reduce the remaining length, left -= len, so that left records the data still to be sent in the next iteration.
Fill in the new IP header:
Locate the IP header of the new fragment and set its fragment offset (for the first fragment this is the offset of the original IP packet). offset records the offset of the current fragment. At this point the flags are empty.
iph->frag_off = htons((offset >> 3));
If the offset is 0, this packet is being fragmented for the first time:
if (offset == 0)
ip_options_fragment(skb);
The first fragment carries some options that are not allowed in the other fragments. To improve efficiency the options are fixed up only once, on the original skb, so that all the following fragments inherit the corrected options.
When a packet that is itself a fragment (not_last_frag = 1) is fragmented again, MF must stay 1 on every piece:
if (left > 0 || not_last_frag)
iph->frag_off |= htons(IP_MF); set the MF bit to 1
Advance the data pointer of the original IP packet and the fragment offset:
ptr += len; move the data pointer
offset += len; move the fragment offset
If a firewall is configured, the firewall values are set on the fragment as well.
Send this fragment: fill in the total length of the fragment, recompute the checksum, and send the fragment with the output function (for example ip_finish_output).
}
Loop until all of the data has been fragmented (left = 0).
UDP fragmentation (to be continued)
IP reassembly
ip_defrag
As is well known, network datagrams travel through the Linux network stack as sk_buff structures. ip_defrag() receives a fragment as an sk_buff and tries to reassemble it with the others. When the complete datagram has been assembled, a new sk_buff is returned; otherwise a NULL pointer is returned.
Typical caller
ip_local_deliver:
if (iph->frag_off & htons(IP_MF | IP_OFFSET)) check whether the packet is a fragment
skb = ip_defrag(skb);
Key data structures (2.4 series)
ipq
The fragments form a linked list (in the Linux kernel, a doubly linked list is the recommended choice whenever a list is needed, unless there is a special requirement; see Documentation/CodingStyle) that represents one incomplete datagram (one IP packet). The head pointer of this list is kept in the ipq structure:
/* Describe an entry in the "incomplete datagrams" queue. */
struct ipq {
    struct ipq *next;           /* linked list pointers */
    u32 saddr;
    u32 daddr;
    u16 id;
    u8 protocol;
    u8 last_in;
#define COMPLETE 4
#define FIRST_IN 2
#define LAST_IN 1
    struct sk_buff *fragments;  /* linked list of received fragments */
    int len;                    /* total length of original datagram */
    int meat;                   /* accumulated length of the fragments received so far */
    spinlock_t lock;
    atomic_t refcnt;
    struct timer_list timer;    /* when will this queue expire? */
    struct ipq **pprev;
    int iif;                    /* device index - for ICMP replies */
};
Note that each ipq has its own timer (struct timer_list timer). The ipq structures are organized into fragment chains through a hash table.
Hash table:
#define IPQ_HASHSZ 64
struct ipq *ipq_hash[IPQ_HASHSZ];
#define ipqhashfn(id, saddr, daddr, prot) \
    ((((id) >> 1) ^ (saddr) ^ (daddr) ^ (prot)) & (IPQ_HASHSZ - 1))
The arguments are the ID, source address, destination address and protocol. Each IP packet is identified by the four-tuple (id, saddr, daddr, protocol). Fragments with the same four values are placed on the same ipq chain and can be assembled into one complete IP packet.
FRAG_CB
#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb))
cb is the control buffer. It gives every layer a private data area. If the contents must survive across layers, the skb has to be cloned with skb_clone first.
char cb[48];
ipfrag_skb_cb
struct ipfrag_skb_cb {
    struct inet_skb_parm h;
    int offset;
};
inet_skb_parm
struct inet_skb_parm {
    struct ip_options opt;  /* compiled IP options */
    unsigned char flags;
#define IPSKB_MASQUERADED 1
#define IPSKB_TRANSLATED 2
#define IPSKB_FORWARDED 4
};
Function description
When the kernel receives an IP packet addressed to the local host, it must reassemble the fragments before passing the packet to the upper-layer protocol. All fragments of one IP packet carry the same identification number (id). When the 14th bit (IP_MF) of the frag_off field in the IP header is 1, more fragments follow. The low 13 bits are the offset of the fragment within the complete packet, in units of 8 bytes. When the IP_MF bit is 0, the packet is the last fragment. If the remaining fragments have not arrived within the reassembly timeout (the protocol standard suggests 60 to 120 seconds; Linux uses the IP_FRAG_TIME constant, 30 * HZ), reassembly fails, the reassembly queue is released, and an ICMP message is sent to notify the sender of the failure. The memory used by the reassembly queues must not exceed 256K (sysctl_ipfrag_high_thresh); otherwise ip_evictor is called to release reassembly queues from the end of each hash chain. Every IP implementation must be able to reassemble datagrams of up to 576 bytes. Fragments may overlap, so overlapping fragments have to be handled.
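To illustrate how the four-tuple selects a hash chain, here is a small user-space sketch of the hash and of the match test. The hash expression follows the ipqhashfn macro quoted above; struct frag_key, ipq_hash_index and frag_key_equal are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define IPQ_HASHSZ 64  /* number of hash chains; must be a power of two */

/* Hypothetical key type: the four values that identify one datagram. */
struct frag_key {
    uint16_t id;
    uint32_t saddr;
    uint32_t daddr;
    uint8_t protocol;
};

/* Same arithmetic as the ipqhashfn macro: XOR the fields, keep 6 bits. */
unsigned ipq_hash_index(const struct frag_key *k)
{
    uint32_t h = ((uint32_t)k->id >> 1) ^ k->saddr ^ k->daddr ^ k->protocol;

    return (unsigned)(h & (IPQ_HASHSZ - 1));
}

/* Two fragments belong to the same queue only if all four fields match. */
int frag_key_equal(const struct frag_key *a, const struct frag_key *b)
{
    return a->id == b->id && a->saddr == b->saddr &&
           a->daddr == b->daddr && a->protocol == b->protocol;
}
```

Different datagrams can land on the same chain, which is why the lookup still has to compare all four fields while walking the chain.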
To keep the queued fragments from consuming too much memory, Linux sets bounds on their memory use: if the upper limit is exceeded, the oldest queues (ipq) in memory are emptied. The amount of memory in use is kept in the variable ip_frag_mem, which of course must be updated with atomic operations (atomic_sub, atomic_add, atomic_read, etc.).
The flow of the fragment reassembly code is described near the top of the file ip_fragment.c. It is basically the same as in the 2.2 series, but the division into functions has changed. Because the information kept in the old ipfrag structure can be obtained from the sk_buff, that structure was removed in 2.4, and some modifications were made to the ipq structure. The other major changes are:
1) ip_defrag is split into two parts: ip_defrag and ip_frag_queue.
2) ip_glue is renamed ip_frag_reasm; its flow is basically unchanged.
3) ipq now keeps the accumulated length of the fragments received so far (with overlaps already resolved). When this value reaches the total length, all fragments have arrived, so the ip_done function was removed. The list no longer has to be walked every time a fragment arrives, which improves efficiency considerably and strengthens resistance to tiny-fragment attacks.
ip_find → ip_frag_create → ip_frag_intern → ip_frag_time
Process
struct sk_buff *ip_defrag(struct sk_buff *skb)
{
If the memory used for fragments is greater than the 256K specified by the system, clean up with ip_evictor.
Get the device dev of the IP packet.
Using the hash value, locate the position in the fragment chains:
If a fragment chain already exists, other fragments have arrived (??? if they arrive in normal order?):
Insert the fragment into the corresponding fragment queue (ip_frag_queue).
According to the flags and offset of the packet: if it is the last fragment of the datagram (though not necessarily the last one to arrive), the length of the fragment queue is set to the length of the original datagram; if it ends beyond the current length of the queue, the length of the queue is increased; if it is an empty packet, it is rejected.
Adjust the skb pointers of the packet so that skb->data points to the start of the IP payload (data area) and skb->tail points to its end, that is, skb->tail - skb->data = IP payload length. This way the sk_buff can be added to the fragment chain and spliced easily during reassembly.
Scan the fragments in the reassembly queue and find the fragment that precedes this one in the chain, comparing offsets.
If this fragment is not the first in the chain (prev != NULL), eliminate the overlap with the previous fragment: take the difference between the end of prev and the current offset (prev is not necessarily the fragment that immediately precedes this one in the datagram).
Then eliminate the overlap with the following fragments: compute the overlap between the end of the current fragment and the offset of next. If the overlap is smaller than the length of next, the start of next is moved forward, its overlapping part is removed, and the accumulated fragment length is reduced. If next is completely covered by this fragment, next is removed and the accumulated fragment length is reduced.
Insert the fragment into the fragment queue: record the device, add the fragment length to meat, and if the offset is 0 this is the first fragment, so set the FIRST_IN flag.
Check whether the datagram is now complete (the first and last fragments are present and the accumulated length equals the datagram length). If so, reassemble it with ip_frag_reasm; before reassembly, the fragment queue is first removed from the hash table.
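The overlap handling and the meat accounting can be modelled in a few lines of user-space C. This is a simplified sketch under several assumptions: fragments are kept in a fixed array instead of a chain of sk_buffs, no data is stored, the 8-byte alignment check on non-final fragments is omitted, and struct piece, struct queue, queue_fragment and queue_complete are hypothetical names. A queue must be zero-initialized before use.

```c
#include <assert.h>
#include <string.h>

#define MAX_FRAGS 16
#define FIRST_IN 2
#define LAST_IN  1

struct piece { unsigned offset, len; };  /* stand-in for one queued sk_buff */

struct queue {
    struct piece frags[MAX_FRAGS];  /* sorted by offset, never overlapping */
    int nfrags;
    unsigned len;    /* total datagram length, as far as it is known */
    unsigned meat;   /* bytes held so far, overlaps removed */
    int last_in;     /* FIRST_IN / LAST_IN flags */
};

/* Queue one fragment; returns 0 on success, -1 if it is rejected. */
int queue_fragment(struct queue *q, unsigned offset, unsigned len, int mf)
{
    unsigned end = offset + len;
    int pos, next;

    if (q->nfrags == MAX_FRAGS)
        return -1;                      /* model limit, not a kernel rule */
    if (!mf) {                          /* final fragment fixes the length */
        if (end < q->len || ((q->last_in & LAST_IN) && end != q->len))
            return -1;
        q->last_in |= LAST_IN;
        q->len = end;
    } else if (end > q->len) {
        if (q->last_in & LAST_IN)
            return -1;                  /* data beyond the known end */
        q->len = end;
    }
    if (end == offset)
        return -1;                      /* empty fragment */

    /* Find the insertion point: first entry with offset >= ours. */
    for (pos = 0; pos < q->nfrags && q->frags[pos].offset < offset; pos++)
        ;
    /* Overlap with the previous fragment: drop our overlapping head. */
    if (pos > 0) {
        unsigned prev_end = q->frags[pos - 1].offset + q->frags[pos - 1].len;
        if (prev_end > offset) {
            offset = prev_end;
            if (end <= offset)
                return -1;              /* nothing new in this fragment */
        }
    }
    /* Overlap with following fragments: trim the head of one or drop them. */
    for (next = pos; next < q->nfrags && q->frags[next].offset < end; next++) {
        unsigned over = end - q->frags[next].offset;
        if (over < q->frags[next].len) {
            q->frags[next].offset += over;
            q->frags[next].len -= over;
            q->meat -= over;
            break;
        }
        q->meat -= q->frags[next].len;  /* fully covered: drop it */
    }
    /* Replace entries [pos, next) with the new piece. */
    memmove(&q->frags[pos + 1], &q->frags[next],
            (size_t)(q->nfrags - next) * sizeof(q->frags[0]));
    q->nfrags += 1 - (next - pos);
    q->frags[pos].offset = offset;
    q->frags[pos].len = end - offset;
    q->meat += end - offset;
    if (offset == 0)
        q->last_in |= FIRST_IN;
    return 0;
}

/* Complete when first and last fragments are in and no byte is missing. */
int queue_complete(const struct queue *q)
{
    return q->last_in == (FIRST_IN | LAST_IN) && q->meat == q->len;
}
```

Because meat only counts bytes that are not duplicated, the completeness test is a single comparison and no walk over the list is needed.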
Allocate an sk_buff structure for the new packet and fill in the corresponding values: set the new IP length (it cannot exceed 65535 bytes), the frame header position, the IP header and the option data.
Copy the original IP header (recorded in the first fragment of the queue) into the new skb structure.
Copy in a loop: copy the data of each fragment skb on the fragment chain into the new skb structure, accumulating the checksum value.
Set the destination entry (cloned), packet type, protocol and device. Perform firewall processing.
Reset the IP header: set the 3-bit flags and the 13-bit offset to 0, and fill in the computed total length.
Return the new IP packet.
If this is the first fragment to arrive (no queue was found), create a new entry and insert it at the head of the fragment hash chain: ip_frag_create.
ip_evictor
ip_evictor is called to release memory when the memory used by fragments exceeds a certain upper limit (sysctl_ipfrag_high_thresh). ip_evictor picks an ipq and empties it, repeating until memory use falls to the lower limit (sysctl_ipfrag_low_thresh). These values are defined in ip_fragment.c:
int sysctl_ipfrag_high_thresh = 256 * 1024;
int sysctl_ipfrag_low_thresh = 192 * 1024;
These two parameters can be viewed with sysctl -a and can also be modified dynamically.
# sysctl -a
......
net.ipv4.ipfrag_low_thresh = 196608
net.ipv4.ipfrag_high_thresh = 262144
......
In theory ip_evictor should use an LRU algorithm to clear the oldest ipq first. However, there is currently no such function; it simply empties the hash table in order, which has the benefit of being simple.
Memory limiting on fragments. Evictor trashes (discards) the oldest fragment queue until we are back under the low threshold.
The ip_evictor function traverses the fragment queues and discards the fragments collected so far until the total amount of memory in use is below the specified limit. It keeps calling the ip_free function as long as the memory in use is greater than the limit.
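The two thresholds act as high and low water marks. The sketch below models only that behaviour and is not the kernel function: evict and queue_mem are hypothetical names, the queues are reduced to an array of byte counts, and they are freed in index order, mirroring the in-order emptying described above.

```c
#include <assert.h>
#include <stddef.h>

#define IPFRAG_HIGH_THRESH (256 * 1024)  /* start evicting above this */
#define IPFRAG_LOW_THRESH  (192 * 1024)  /* stop evicting at or below this */

/*
 * queue_mem[i] is the memory charged to queue i (0 means already freed).
 * *frag_mem is the total memory in use and is updated in place.
 * Returns the number of queues evicted.
 */
size_t evict(size_t *queue_mem, size_t nqueues, size_t *frag_mem)
{
    size_t evicted = 0, i;

    if (*frag_mem <= IPFRAG_HIGH_THRESH)
        return 0;  /* below the high mark: nothing to do */

    for (i = 0; i < nqueues && *frag_mem > IPFRAG_LOW_THRESH; i++) {
        if (queue_mem[i] == 0)
            continue;
        assert(*frag_mem >= queue_mem[i]);
        *frag_mem -= queue_mem[i];  /* release everything this queue holds */
        queue_mem[i] = 0;
        evicted++;
    }
    return evicted;
}
```

The gap between the two marks means that one eviction pass frees a reasonable amount of memory, so the cleanup does not run again for every arriving fragment.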
When the fragment queues are empty and the memory threshold is still exceeded, the ip_evictor function can cause a kernel panic.
LRU algorithm: the global variable ipq_hash[64] holds the linked lists; the closer an entry is to the tail of its chain, the larger its count and the less likely it is to be evicted.
Two loops: each pass evicts the entry with the smallest reference count, until the total memory in use drops to sysctl_ipfrag_low_thresh = 192 * 1024.
ip_frag_create
Initializes an IP fragment queue, including its timer, whose handler function is ip_expire. It uses ip_frag_intern to link the new queue in at the head of its chain and to start the timer.
ip_frag_intern
ip_frag_nqueues