Linux network code guide V0.2

zhaozj2021-02-11  192


Author: yawl
Home Page: http://www.nsfocus.com/
Date: 2000-12-14

1 Introduction

Many people are interested in analyzing the code of the Linux network subsystem (mostly the files under src/linux/net, src/linux/include/net and src/linux/include/linux). Indeed, even after learning a great deal of TCP/IP theory from books, it is hard to form a concrete picture without reading the source code. One problem with analyzing this part of the kernel is that there is a lot of code and very little documentation. The purpose of this article is to outline a framework so that readers can understand roughly how TCP/IP is implemented. Most of the code analyses I have seen are based on the 2.0 kernel, and many functions have been renamed in newer kernels, which is confusing for beginners in particular. This article uses 2.4.0-test9 as its example, so it should be easier to follow alongside the code.

In fact, the only part of the network code I have analyzed closely is the firewall; much of the rest I only half understand, so corrections are welcome.

I recommend creating a project in Source Insight (www.soucedyn.com) while reading the code, which is much more effective. I have also used some other tools, but when analyzing a large amount of code none of them is as convenient.

2 Main text

Everyone is familiar with the ISO seven-layer model; for the Internet, of course, a four-layer model fits better. In both models the network protocols appear as a hierarchy of layers. In the Linux kernel code it is hard to draw strict layer boundaries, because apart from a few kernel threads the whole kernel is effectively a single process. The so-called "network layers" are therefore just sets of related functions, and most transitions between layers are ordinary function calls.
Logically, the network code can reasonably be divided as follows:

. BSD socket layer: handles the BSD socket operations. Each socket is represented in the kernel by a struct socket. The files for this part include:
    /net/socket.c
    /net/protocols.c
    etc.

. INET socket layer: BSD sockets are an interface that can be used with many network protocols. When they are used for TCP/IP, that is, when a socket of family AF_INET is created, some additional parameters have to be kept, hence the struct sock structure. The main files are:
    /net/ipv4/protocol.c
    /net/ipv4/af_inet.c
    /net/core/sock.c
    etc.

. TCP/UDP layer: handles the transport layer, which is represented by two structures, struct inet_protocol and struct proto. The main files are:
    /net/ipv4/udp.c
    /net/core/datagram.c
    /net/ipv4/tcp.c
    /net/ipv4/tcp_input.c
    /net/ipv4/tcp_output.c
    /net/ipv4/tcp_minisocks.c
    /net/ipv4/tcp_timer.c
    etc.

. IP layer: handles the network layer, which is represented by the struct packet_type structure. The main files are:
    /net/ipv4/ip_forward.c
    /net/ipv4/ip_fragment.c
    /net/ipv4/ip_input.c
    /net/ipv4/ip_output.c
    etc.

. Data link layer and drivers: each network device is represented by a struct net_device. The generic processing is in dev.c, and the drivers are in the /drivers/net directory.

There are many other files as well, for the firewall, routing and so on.

Now for a table. The whole article is an explanation of this table (if you find my writing tedious, skip it and work through the table with the source code yourself). When I first read the network code, I liked a passage in "Linux Kernel Internals" in which process A sends a packet over the network to another process B, and the packet's journey through the network stack is described in detail. I think this helps the reader see the whole forest rather than the trees, so this article follows the same structure.

    ^
    |   sys_read                  fs/read_write.c
    |   sock_read                 net/socket.c
    |   sock_recvmsg              net/socket.c
    |   inet_recvmsg              net/ipv4/af_inet.c
    |   udp_recvmsg               net/ipv4/udp.c
    |   skb_recv_datagram         net/core/datagram.c
    |   -------------------------------------------
    |   sock_queue_rcv_skb        include/net/sock.h
    |   udp_queue_rcv_skb         net/ipv4/udp.c
    |   udp_rcv                   net/ipv4/udp.c
    |   ip_local_deliver_finish   net/ipv4/ip_input.c
    |   ip_local_deliver          net/ipv4/ip_input.c
    |   ip_rcv                    net/ipv4/ip_input.c
    |   net_rx_action             net/core/dev.c
    |   -------------------------------------------
    |   netif_rx                  net/core/dev.c
    |   el3_rx                    drivers/net/3c509.c
    |   el3_interrupt             drivers/net/3c509.c

    ===============================================

    |   sys_write                 fs/read_write.c
    |   sock_write                net/socket.c
    |   sock_sendmsg              net/socket.c
    |   inet_sendmsg              net/ipv4/af_inet.c
    |   udp_sendmsg               net/ipv4/udp.c
    |   ip_build_xmit             net/ipv4/ip_output.c
    |   output_maybe_reroute      net/ipv4/ip_output.c
    |   ip_output                 net/ipv4/ip_output.c
    |   ip_finish_output          net/ipv4/ip_output.c
    |   dev_queue_xmit            net/core/dev.c
    |   -------------------------------------------
    |   el3_start_xmit            drivers/net/3c509.c
    V

We assume the following environment: two hosts are connected through the network; one of them runs process A and the other runs process B.

Process A sends a message, say "Hello", to process B, and B receives it. TCP processing is very complicated, so to keep the narrative simple we use UDP as the example.

2.1 Create a socket

Before any data is sent, a socket has to be created. The programs on both sides call:

    ...
    int sockfd;
    sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    ...

This is a system call, so it enters the kernel through interrupt 0x80 and invokes the corresponding kernel function. To find the kernel function for a system call, you generally prefix the name with "sys_" and search for that; for fork, for example, the function is sys_fork. The socket-related calls are a little special, though: they all enter the kernel through a single entry point, sys_socketcall, which then calls the specific sys_socket, sys_bind, etc. according to its call parameter.

sys_socket calls sock_create to produce a struct socket (see include/linux/net.h); every socket has such a structure in the kernel. After initializing some generic members of this structure (allocating an inode, setting the type field from the second parameter, and so on), it dispatches on the family parameter with this statement:

    ...
    net_families[family]->create(sock, protocol);
    ...

The first parameter in our program is AF_INET, so this function pointer points to inet_create(). (net_families is an array holding information about the network protocol families; the families are registered in it with sock_register.)

The most important information behind a struct socket is kept in the struct sock structure, which is used constantly in the network code. I recommend printing it out together with the other common structures (such as struct sk_buff).

inet_create allocates memory for this structure and initializes it differently depending on the socket type (that is, the second parameter of the socket function):

    ...
    if (sk->prot->init)
        sk->prot->init(sk);
    ...

If the type is SOCK_STREAM, tcp_v4_init_sock is called; a SOCK_DGRAM socket has no additional initialization, and at this point the socket call is finished.

One more thing worth noting: after inet_create() returns, sock_map_fd is called. It allocates a file descriptor and a file structure for the socket, so that the application layer can treat a socket just like a file.

At the beginning some of these flows may be hard to follow, mainly because the actual targets of the function pointers change with the socket type.
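The family dispatch described above can be imitated in user space. The following is only a minimal sketch, not the kernel code: every name with a demo_ or DEMO_ prefix is hypothetical, the table size is an arbitrary assumption, and the structures are reduced to the fields needed to show registration followed by a call through the family's create pointer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NPROTO  32           /* table size; the value is an assumption */
#define DEMO_AF_INET 2            /* AF_INET is 2 on Linux */

struct demo_socket {
    int type;                     /* SOCK_STREAM, SOCK_DGRAM, ... */
    const char *created_by;       /* set by the family's create hook */
};

/* One entry per protocol family, holding its create hook. */
struct demo_net_proto_family {
    int family;
    int (*create)(struct demo_socket *sock, int protocol);
};

struct demo_net_proto_family *demo_net_families[DEMO_NPROTO];

/* Register a family in the table (the role played by sock_register). */
int demo_sock_register(struct demo_net_proto_family *fam)
{
    assert(fam != NULL);
    if (fam->family < 0 || fam->family >= DEMO_NPROTO)
        return -1;
    demo_net_families[fam->family] = fam;
    return 0;
}

/* Family-specific hook standing in for inet_create. */
int demo_inet_create(struct demo_socket *sock, int protocol)
{
    (void)protocol;
    sock->created_by = "inet";
    return 0;
}

struct demo_net_proto_family demo_inet_family = {
    DEMO_AF_INET, demo_inet_create
};

/* Generic part: fill in the common fields, then dispatch on the family. */
int demo_sock_create(int family, int type, int protocol,
                     struct demo_socket *sock)
{
    assert(sock != NULL);
    if (family < 0 || family >= DEMO_NPROTO ||
        demo_net_families[family] == NULL)
        return -1;                /* unknown or unregistered family */
    sock->type = type;
    sock->created_by = NULL;
    return demo_net_families[family]->create(sock, protocol);
}
```

Calling demo_sock_create() before demo_sock_register(&demo_inet_family) fails, while the same call afterwards reaches demo_inet_create(); the generic code never names the family-specific function directly.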

2.2 Sending data

When process A wants to send data, the program calls the following statement (the sendto function goes through a similar flow, which is omitted here):

    ...
    write(sockfd, "Hello", strlen("Hello"));
    ...

The kernel function corresponding to write is sys_write. It first looks up the struct file for the file descriptor. If the file exists (the file pointer is not NULL) and is writable (file->f_mode & FMODE_WRITE is true), it calls the write operation of this file structure:

    ...
    if (file->f_op && (write = file->f_op->write) != NULL)
        ret = write(file, buf, count, &file->f_pos);
    ...

Here f_op is a pointer to a struct file_operations. sock_map_fd made it point to socket_file_ops, which is defined as follows (/net/socket.c):

    static struct file_operations socket_file_ops = {
        llseek:     sock_lseek,
        read:       sock_read,
        write:      sock_write,
        poll:       sock_poll,
        ioctl:      sock_ioctl,
        mmap:       sock_mmap,
        open:       sock_no_open,   /* special open code to disallow open via /proc */
        release:    sock_close,
        fasync:     sock_fasync,
        readv:      sock_readv,
        writev:     sock_writev
    };

So the function pointer write obviously points to sock_write. Following it, we see that this function packs the string buffer into a struct msghdr and finally calls sock_sendmsg.

I am not familiar with the scm_send call inside sock_sendmsg (scm is short for "socket level control messages"), and it is not critical here. What we do notice is this statement:

    ...
    sock->ops->sendmsg(sock, msg, size, &scm);
    ...

This is yet another function pointer. sock->ops is initialized in inet_create(); because ours is a UDP socket, sock->ops points to inet_dgram_ops (that is, sock->ops = &inet_dgram_ops;), which is defined in net/ipv4/af_inet.c:

    struct proto_ops inet_dgram_ops = {
        family:     PF_INET,

        release:    inet_release,
        bind:       inet_bind,
        connect:    inet_dgram_connect,
        socketpair: sock_no_socketpair,
        accept:     sock_no_accept,
        getname:    inet_getname,
        poll:       datagram_poll,
        ioctl:      inet_ioctl,
        listen:     sock_no_listen,
        shutdown:   inet_shutdown,
        setsockopt: inet_setsockopt,
        getsockopt: inet_getsockopt,
        sendmsg:    inet_sendmsg,
        recvmsg:    inet_recvmsg,
        mmap:       sock_no_mmap,
    };

So the function to look at is inet_sendmsg(), and it immediately calls another function through a function pointer:

    ...
    sk->prot->sendmsg(sk, msg, size);
    ...

Once again we have to find out what it actually points to. How do you find the concrete definition in a case like this? I usually proceed as follows. In the example above, sk is a struct sock, and its definition (include/net/sock.h) shows that prot is a struct proto. So we search the source tree for all instances of this structure (operations such as jump to definition and find references are really quick and convenient in Source Insight ^_^). That soon turns up udp_prot, tcp_prot, raw_prot and so on. Guessing that it is udp_prot, we look for references to it in the source and indeed find this statement in inet_create:

    ...
    prot = &udp_prot;
    ...

In fact, a careful reading of inet_create earlier would have revealed this already, but I was not that attentive :).

We continue with udp_sendmsg. The main job of this function is to fill in the UDP header (source port, destination port, etc.). It then calls ip_route_output to find the route, and after that:

    ...
    ip_build_xmit(sk,
                  (sk->no_check == UDP_CSUM_NOXMIT ?
                   udp_getfrag_nosum : udp_getfrag),
                  &ufh, ulen, &ipc, rt, msg->msg_flags);
    ...

A large part of ip_build_xmit is devoted to creating the sk_buff and adding the IP header to the packet.
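The chain of function pointers followed so far (socket-level ops first, then the transport-level prot) can be reproduced in miniature. This is a hedged user-space sketch: all demo_ names are hypothetical stand-ins, the structures carry only what the example needs, and the "send" merely counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_sock;
struct demo_socket;

/* Transport-level operations (the role of struct proto). */
struct demo_proto {
    const char *name;
    int (*sendmsg)(struct demo_sock *sk, const char *buf, size_t len);
};

/* Socket-level operations (the role of struct proto_ops). */
struct demo_proto_ops {
    int (*sendmsg)(struct demo_socket *sock, const char *buf, size_t len);
};

struct demo_sock   { const struct demo_proto *prot; size_t bytes_sent; };
struct demo_socket { const struct demo_proto_ops *ops; struct demo_sock *sk; };

/* Bottom of the chain: stands in for udp_sendmsg. */
int demo_udp_sendmsg(struct demo_sock *sk, const char *buf, size_t len)
{
    (void)buf;
    sk->bytes_sent += len;
    return (int)len;
}

/* Middle of the chain: stands in for inet_sendmsg. */
int demo_inet_sendmsg(struct demo_socket *sock, const char *buf, size_t len)
{
    return sock->sk->prot->sendmsg(sock->sk, buf, len);
}

const struct demo_proto demo_udp_prot = { "UDP", demo_udp_sendmsg };
const struct demo_proto_ops demo_inet_dgram_ops = { demo_inet_sendmsg };

/* Wire both tables up once, as inet_create does for a datagram socket. */
void demo_init_udp_socket(struct demo_socket *sock, struct demo_sock *sk)
{
    assert(sock != NULL && sk != NULL);
    sk->prot = &demo_udp_prot;
    sk->bytes_sent = 0;
    sock->ops = &demo_inet_dgram_ops;
    sock->sk = sk;
}

/* Top of the chain: stands in for sock_write calling sock_sendmsg. */
int demo_sock_write(struct demo_socket *sock, const char *buf)
{
    return sock->ops->sendmsg(sock, buf, strlen(buf));
}
```

After demo_init_udp_socket(), a call such as demo_sock_write(&sock, "Hello") ends up in demo_udp_sendmsg() without any layer naming the next one. That is also why the real targets are hard to find by reading a single function: they are decided once, at socket creation time.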

Further down there is this statement:

    ...
    NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
            output_maybe_reroute);
    ...

Put simply, when no firewall code intervenes you can treat this as a direct call to output_maybe_reroute (for details see "Getting Started with the Kernel Firewall Netfilter" in the Green Alliance (NSFOCUS) Monthly). output_maybe_reroute itself contains only one statement:

    return skb->dst->output(skb);

We look for the target with the same method as above (though this one is really not easy to find) and discover that the pointer is set in ip_route_output (hint: ip_route_output_slow: rth->u.dst.output = ip_output;). The job of ip_route_output is to find the route and record the result in skb->dst.

So we move on to the ip_output function, which immediately goes to ip_finish_output. Every network device, such as a network card, is represented in the kernel by a struct net_device. ip_finish_output obtains the output device (which was also set up in ip_route_output) and passes it as a parameter to the functions registered at Netfilter's NF_IP_POST_ROUTING hook point. When they are done, ip_finish_output2 is called, and this function in turn calls:

    ...
    hh->hh_output(skb);
    ...

To cut a long story short, what is actually called is dev_queue_xmit. At this point we have finished with the TCP/IP layers and the data link layer takes over. After a few checks, the statement that actually runs is:

    ...
    dev->hard_start_xmit(skb, dev);
    ...

This function is defined in the network card driver, and every type of card handles it differently. My card is the fairly common 3C509 (its driver is 3c509.c). When the card is initialized (el3_probe) there is:

    ...
    dev->hard_start_xmit = &el3_start_xmit;
    ...

After a few more I/O operations the packet is really sent out onto the network, and the sending process is over.

I have been rather sloppy along the way, leaving out error handling, blocking, fragmentation and so on entirely, and describing only the ideal case.
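The idea behind the NF_HOOK call above, "run the registered hooks, and if none objects continue with the function passed as the last argument", can be sketched as follows. This is an illustration under stated assumptions, not Netfilter's implementation: the demo_ names, the two-valued verdict and the 1500-byte limit are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>

enum demo_verdict { DEMO_DROP = 0, DEMO_ACCEPT = 1 };

struct demo_packet { int len; int delivered; };

typedef enum demo_verdict (*demo_hookfn)(struct demo_packet *pkt);
typedef int (*demo_okfn)(struct demo_packet *pkt);

/*
 * Run every registered hook in order. If one drops the packet, stop and
 * report an error; if all accept (or none are registered), call okfn.
 */
int demo_nf_hook(const demo_hookfn *hooks, size_t nhooks,
                 struct demo_packet *pkt, demo_okfn okfn)
{
    assert(pkt != NULL && okfn != NULL);
    for (size_t i = 0; i < nhooks; i++) {
        if (hooks[i](pkt) == DEMO_DROP)
            return -1;            /* packet filtered out */
    }
    return okfn(pkt);             /* continue down the normal path */
}

/* Example hook: reject oversized packets (the limit is made up). */
enum demo_verdict demo_drop_large(struct demo_packet *pkt)
{
    return pkt->len > 1500 ? DEMO_DROP : DEMO_ACCEPT;
}

/* Stands in for output_maybe_reroute: the rest of the send path. */
int demo_output(struct demo_packet *pkt)
{
    pkt->delivered = 1;
    return 0;
}
```

With an empty hook list, demo_nf_hook() is just an indirect call to demo_output(), which matches the shortcut taken in the text when no firewall code is involved.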
The purpose of this article is only to help you build a rough overall picture; in reality every step involves very complex processing (the TCP part especially).

2.3 Receiving data

When data arrives at the network card, a hardware interrupt is generated and the corresponding function in the card's driver is called to handle it. For my 3C509 card the handler is el3_interrupt. (The handler is bound to its IRQ number with the request_irq function when the card is initialized.) The first thing this interrupt handler has to do is perform some I/O operations to read the data in (I/O reads are done with the inw function). Once a data frame has been received successfully, el3_rx(dev) is executed for further processing.

In el3_rx the received frame is wrapped in a struct sk_buff and handed over from the driver to the generic handler netif_rx (dev.c). For the sake of CPU efficiency, the upper-layer processing is triggered by a soft interrupt (softirq). An important job of netif_rx is therefore to put the incoming sk_buff on a queue, set the softirq flag, and then return with an easy mind to wait for the next network packet:

    ...
    __skb_queue_tail(&queue->input_pkt_queue, skb);
    __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
    ...

In the 2.2 kernel this mechanism was called "bottom half" processing. The internal implementation is basically similar, and the purpose is the same: to return from the interrupt as quickly as possible.

A little later a reschedule will happen for one reason or another (for example, the current process has used up its time slice). The scheduling function, schedule(), checks whether a softirq is pending and, if so, runs the corresponding handler:

    ...
    if (softirq_active(this_cpu) & softirq_mask(this_cpu))
        goto handle_softirq;
    handle_softirq_back:
    ...
    handle_softirq:
        do_softirq();
        goto handle_softirq_back;
    ...

At system initialization, specifically in net_dev_init, the handler for this softirq is set to net_rx_action:

    ...
    open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
    ...

The next time the scheduler runs, the system checks whether the NET_RX_SOFTIRQ softirq has been raised and, if so, calls net_rx_action. The net_rx_action function corresponds to the net_bh function of version 2.2.

The kernel has two global variables for registering network-layer protocols: a linked list, ptype_all, and an array, ptype_base[16]. Between them they record all the layer-3 protocols (in terms of the OSI seven-layer model) in the kernel. The receive side of each network-layer protocol is represented by a struct packet_type, and each such structure is registered in either ptype_all or ptype_base.
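Stepping back to the hand-off between netif_rx and net_rx_action described above: the "interrupt" side only queues the packet and raises a flag, and the real work happens later at a safe point. A minimal single-threaded sketch of that split is shown below. It is not the kernel's code: the demo_ names, the fixed-size ring and the use of integer packet ids in place of sk_buff pointers are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_LEN 8          /* backlog limit; the value is an assumption */

struct demo_backlog {
    int packets[DEMO_QUEUE_LEN];  /* packet ids stand in for sk_buff pointers */
    size_t head, count;
    int softirq_pending;          /* stands in for the NET_RX_SOFTIRQ flag */
};

/* "Interrupt" side: queue the packet, flag the work, return at once. */
int demo_netif_rx(struct demo_backlog *q, int packet)
{
    assert(q != NULL);
    if (q->count == DEMO_QUEUE_LEN)
        return -1;                /* backlog full: drop */
    q->packets[(q->head + q->count) % DEMO_QUEUE_LEN] = packet;
    q->count++;
    q->softirq_pending = 1;
    return 0;
}

/* Deferred side: drain queued packets into out[], in arrival order. */
size_t demo_net_rx_action(struct demo_backlog *q, int *out, size_t max)
{
    size_t n = 0;
    while (q->count > 0 && n < max) {
        out[n++] = q->packets[q->head];
        q->head = (q->head + 1) % DEMO_QUEUE_LEN;
        q->count--;
    }
    if (q->count == 0)
        q->softirq_pending = 0;   /* nothing left to do */
    return n;
}

/* Called later at a safe point, like the check made in schedule(). */
size_t demo_do_softirq(struct demo_backlog *q, int *out, size_t max)
{
    assert(q != NULL && out != NULL);
    return q->softirq_pending ? demo_net_rx_action(q, out, max) : 0;
}
```

Several calls to demo_netif_rx() can pile up before a single demo_do_softirq() processes them all, which is the point of the design: the time spent in the interrupt handler stays short and roughly constant.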
Only when the type field of the packet_type is ETH_P_ALL is it registered on the ptype_all list; otherwise, as with ip_packet_type, it is placed in the corresponding slot of the ptype_base[16] array. The difference between the two is that a handler registered as ETH_P_ALL receives packets of every type, while the others only handle packets of their own type.

skb->protocol is assigned in el3_rx; it is in fact the upper-layer protocol identifier extracted from the Ethernet frame header. In our example the value is ETH_P_IP, so net_rx_action selects the receive function of the IP layer, and it is easy to see from ip_packet_type that this function is ip_rcv().

Before pt_prev->func (which actually points to ip_rcv) is called, there is an atomic_inc(&skb->users) operation (in the 2.2 kernel this was an skb_clone; the principle is similar), whose purpose is to increase the reference count of this sk_buff.
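The two registration structures described above, a catch-all list plus a small hash table keyed by protocol type, can be sketched like this. It is an illustration only: the demo_ names are hypothetical, the bucket is chosen by simply masking the low four bits of the type (an assumption made for the example), and a hit counter stands in for calling the handler function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_P_ALL 0x0003     /* matches every protocol */
#define DEMO_ETH_P_IP  0x0800
#define DEMO_ETH_P_ARP 0x0806
#define DEMO_HASH_SIZE 16

struct demo_packet_type {
    uint16_t type;                       /* protocol id, host byte order here */
    int hits;                            /* stands in for calling the handler */
    struct demo_packet_type *next;
};

struct demo_ptypes {
    struct demo_packet_type *all;                    /* like ptype_all */
    struct demo_packet_type *base[DEMO_HASH_SIZE];   /* like ptype_base[16] */
};

/* Put a handler on the catch-all list or into its hash bucket. */
void demo_add_pack(struct demo_ptypes *t, struct demo_packet_type *pt)
{
    assert(t != NULL && pt != NULL);
    struct demo_packet_type **slot =
        (pt->type == DEMO_ETH_P_ALL)
            ? &t->all
            : &t->base[pt->type & (DEMO_HASH_SIZE - 1)];
    pt->next = *slot;
    *slot = pt;
}

/* Deliver one packet: catch-all handlers first, then exact matches. */
int demo_deliver(struct demo_ptypes *t, uint16_t protocol)
{
    int delivered = 0;
    assert(t != NULL);
    for (struct demo_packet_type *pt = t->all; pt; pt = pt->next) {
        pt->hits++;
        delivered++;
    }
    struct demo_packet_type *pt = t->base[protocol & (DEMO_HASH_SIZE - 1)];
    for (; pt; pt = pt->next) {
        if (pt->type == protocol) {      /* a bucket can hold several types */
            pt->hits++;
            delivered++;
        }
    }
    return delivered;
}
```

A handler registered with DEMO_ETH_P_ALL behaves like a sniffer and sees every packet, whereas the IP handler is only reached for packets whose protocol field is DEMO_ETH_P_IP, even when another type happens to share its bucket.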

When the receive function of the network layer has finished with the sk_buff, or discards it for some reason (a firewall rule, for instance), it calls kfree_skb. kfree_skb first checks whether anything else still needs this sk_buff. If nothing does, the memory is really released (__kfree_skb); otherwise the counter is simply decremented by one.

Now let us look at ip_rcv (net/ipv4/ip_input.c). What this function does is quite clear: it first checks the validity of the packet (whether the version number, length, checksum, etc. are correct), and if the packet is valid, processing continues. In the 2.4 kernel, to accommodate the firewall code flexibly, the original ip_rcv was split in two: its second half became a separate function, ip_rcv_finish. One part of ip_rcv_finish deals with IP packets that carry IP options such as source routing. Apart from that, it looks up the route through ip_route_input and records the result in skb->dst.

At this point there are two kinds of received packets: those destined for a local process (which must be passed to an upper-layer protocol) and those to be forwarded (when the machine acts as a gateway), and they need different handlers. A packet for local delivery goes to ip_local_deliver (/net/ipv4/ip_input.c); otherwise ip_forward (/net/ipv4/ip_forward.c) is called. The function pointer skb->dst->input steers the packet onto the right path.

For our example, ip_local_deliver is the one that gets called. The incoming packet may well be a fragment, in which case the fragments must first be reassembled before being passed to the upper-layer protocol; this is indeed the first job ip_local_deliver does. If reassembly succeeds (the returned sk_buff is not NULL), processing continues (for the detailed reassembly algorithm see "Analysis of IP Fragment Reassembly and Common Fragment Attacks" in issue 13 of the Green Alliance Monthly).

At this point, however, the code is once again split in two by Netfilter.
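The validity checks attributed to ip_rcv above (version, length, checksum) are easy to try out in user space. The sketch below works on a raw byte buffer and is not the kernel's implementation: the demo_ function names are hypothetical, and the checksum is the plain one's-complement sum described in RFC 1071 rather than the optimized kernel routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, carries folded in. */
uint16_t demo_fold_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    assert(data != NULL && len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Return 1 if the buffer starts with a plausible IPv4 header, else 0. */
int demo_ip_header_ok(const uint8_t *pkt, size_t pkt_len)
{
    assert(pkt != NULL);
    if (pkt_len < 20)
        return 0;                         /* shorter than a minimal header */

    unsigned version = pkt[0] >> 4;
    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;        /* header bytes */
    size_t tot_len = (size_t)((pkt[2] << 8) | pkt[3]);

    if (version != 4 || ihl < 20 || ihl > pkt_len)
        return 0;
    if (tot_len < ihl || tot_len > pkt_len)
        return 0;                         /* length field inconsistent */

    /* A correct header, checksum field included, sums to 0xFFFF. */
    return demo_fold_sum(pkt, ihl) == 0xFFFF;
}
```

Flipping any single header byte (the TTL, say) without recomputing the checksum makes demo_ip_header_ok() return 0, which is the kind of packet that is dropped before route lookup ever happens.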
As before, we go straight to the second half, ip_local_deliver_finish (/net/ipv4/ip_input.c). The handlers of the transport layer (TCP, UDP, RAW and so on) are registered in inet_protos (through inet_add_protocol). ip_local_deliver_finish calls the appropriate handler according to the upper-layer protocol field in the IP header (that is, iph->protocol). To keep things simple we are using UDP, so ipprot->handler is actually udp_rcv.

As mentioned earlier, every socket created by an application has a struct socket / struct sock in the kernel. udp_rcv first finds the matching sock in the kernel with udp_v4_lookup and then calls udp_queue_rcv_skb (/net/ipv4/udp.c).

That in turn immediately calls sock_queue_rcv_skb, which puts the sk_buff on the socket's receive queue and then notifies the upper layer that data has arrived:

    ...
    skb_set_owner_r(skb, sk);
    skb_queue_tail(&sk->receive_queue, skb);
    if (!sk->dead)
        sk->data_ready(sk, skb->len);
    return 0;
    ...

sk->data_ready is set when the sock structure is initialized (sock_init_data):

    ...
    sk->data_ready = sock_def_readable;
    ...

Now let us look at things from the top down. To receive the datagram, process B calls:

    ...
    read(sockfd, buff, sizeof(buff));
    ...

The kernel function for this system call is sys_read (fs/read_write.c). The processing that follows is similar to the write operation and is not described again. The udp_recvmsg function calls skb_recv_datagram. If the data has not yet arrived and the socket is in blocking mode, the process sleeps (the wait loop checks signal_pending(current)) until data_ready signals that data is available and the process is woken up (wake_up_interruptible) to continue.

2.4 skbuff

A great deal of the network code deals with sk_buff. Although this article has avoided it as far as possible, it cannot be bypassed in any careful analysis, because packets travel through the network stack in the form of sk_buffs; it is fair to call it the most important data structure in the network code. For a detailed analysis I recommend Alan Cox's "Network Buffers and Memory Management", published in Linux Journal in October 1996.

Here is a figure from Phrack 55-12. It shows only a tiny aspect of sk_buff, but it is very useful, especially when you keep forgetting whether skb_put moves forward or backward :)

        ---  -----------------  head
         ^   |               |
         |   |               |
         |   -----------------  data   ---   ---
         |   |               |          ^     |
    true |   |               |          |     v  skb_pull
    size |   |               |         len
         |   |               |          |
         |   |               |          v     |
         |   -----------------  tail   ---   ---
         |   |               |
         |   |               |
         v   |               |
        ---  -----------------  end

Efficiency of the Linux network layer: pointers are used extensively in the Linux network code in order to avoid resource-hungry operations such as copying data. The data portion of a packet is copied only twice: once from the network card into kernel memory, and once from kernel memory into user memory.
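How headers can be added and stripped without copying the payload becomes clearer with a toy version of the four pointers in the figure. The sketch below mirrors the roles of the kernel helpers skb_reserve, skb_put, skb_push and skb_pull, but it is a simplified user-space model: the demo_ names are hypothetical, the storage is supplied by the caller, and overruns trigger an assert instead of the kernel's own error handling.

```c
#include <assert.h>
#include <stddef.h>

struct demo_skb {
    unsigned char *head;   /* start of the allocated buffer */
    unsigned char *data;   /* start of the bytes currently in use */
    unsigned char *tail;   /* end of the bytes currently in use */
    unsigned char *end;    /* end of the allocated buffer */
    size_t len;            /* tail - data */
};

/* Attach caller-provided storage; the buffer starts out empty. */
void demo_skb_init(struct demo_skb *skb, unsigned char *buf, size_t size)
{
    assert(skb != NULL && buf != NULL);
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
    skb->len = 0;
}

/* Leave headroom in an empty buffer for headers added later. */
void demo_skb_reserve(struct demo_skb *skb, size_t n)
{
    assert(skb->len == 0 && (size_t)(skb->end - skb->tail) >= n);
    skb->data += n;
    skb->tail += n;
}

/* Grow at the tail (append payload); returns where to write. */
unsigned char *demo_skb_put(struct demo_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;
    assert((size_t)(skb->end - skb->tail) >= n);
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Grow at the front (prepend a header); returns the new data pointer. */
unsigned char *demo_skb_push(struct demo_skb *skb, size_t n)
{
    assert((size_t)(skb->data - skb->head) >= n);
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Shrink at the front (strip a header); returns the new data pointer. */
unsigned char *demo_skb_pull(struct demo_skb *skb, size_t n)
{
    assert(skb->len >= n);
    skb->data += n;
    skb->len -= n;
    return skb->data;
}
```

On the send side one would reserve 28 bytes, put the 5-byte payload, then push 8 bytes for a UDP header and 20 for an IP header; on the receive side each layer pulls its own header off. In every step only pointers and len change, and the payload bytes stay where they were first written.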

When reposting, please credit the original address: https://www.9cbs.com/read-5637.html
