Linux network code guide V0.2


Release date:

Author: yawl

Home: http://www.nsfocus.com

1 Introduction

Many people who study the Linux source are most interested in the network part (mainly the src/linux/net, src/linux/include/net and src/linux/include/linux directories). That is understandable: even after learning the TCP/IP principles from books, the picture stays vague until you read the source code. The difficulty with this part of the code is that there is a lot of it and very little documentation. The purpose of this article is to outline a framework so that readers get a rough idea of how TCP/IP works inside the kernel. Many of the code analyses I have seen are based on the 2.0 kernel, and many functions have changed names in newer kernels, which is confusing for beginners in particular. The examples in this article are taken from 2.4.0-test9, so they should be easier to compare against the code.

In fact, the only part of the network code that I have analyzed carefully, line by line, is the firewall; many other places I only half understand. If something is wrong, corrections are welcome.

I recommend setting up a project in Source Insight (www.soucedyn.com) while reading this article; that is more effective. I have used some other tools, but none of them is as convenient when analyzing a large amount of code.

2 Main text

Everyone is familiar with the ISO seven-layer model; for the Internet, of course, the four-layer model is a better fit. In both models the network protocols appear in layered form. In the Linux kernel code it is hard to draw such clear layer boundaries: apart from some kernel threads, the entire kernel is effectively a single process. A so-called "network layer" is therefore just a set of related functions, and most transitions between layers are ordinary function calls.

Logically, though, the network code is most reasonably divided into the following layers:

.BSD socket layer: handles BSD socket operations. Each socket is represented in the kernel by a struct socket structure.

The files of this part include: /net/socket.c, /net/protocols.c, etc.

.INET socket layer: the BSD socket is an interface that can serve many network protocols. When it is used for TCP/IP, that is, when the socket is created with AF_INET, some additional parameters are needed, hence the struct sock structure.

The files are mainly: /net/ipv4/protocol.c, /net/ipv4/af_inet.c, /net/core/sock.c, etc.

.TCP/UDP layer: handles transport-layer operations. The transport layer is represented by two structures, struct inet_protocol and struct proto.

The files are mainly: /net/ipv4/udp.c, /net/ipv4/datagram.c, /net/ipv4/tcp.c, /net/ipv4/tcp_input.c, /net/ipv4/tcp_output.c, /net/ipv4/tcp_minisocks.c, /net/ipv4/tcp_timer.c, etc.

.IP layer: handles network-layer operations. The network layer is represented by the struct packet_type structure.

The files mainly include: /net/ipv4/ip_forward.c, ip_fragment.c, ip_input.c, ip_output.c, etc.

.Data link layer and drivers: each network device is represented by struct net_device, and the generic processing is in dev.c. The drivers are in the /driver/net directory.

There are many other files, such as those for the firewall and routing. Their purpose can usually be guessed from the file name, so they are not described here.

Now I have to present a table; the whole article exists to explain it (if you find my writing boring, skip it and simply read the code alongside this table). When I first looked at the network code, I particularly liked a section in Chapter 8 of "Linux Kernel Internals" that uses the example of process A sending a packet over the network to process B and describes in detail how the packet travels through the network stack. I think this helps readers see the whole forest more quickly, so this article follows the same structure.

^
|   sys_read                  fs/read_write.c
|   sock_read                 net/socket.c
|   sock_recvmsg              net/socket.c
|   inet_recvmsg              net/ipv4/af_inet.c
|   udp_recvmsg               net/ipv4/udp.c
|   skb_recv_datagram         net/core/datagram.c
|   -------------------------------------------
|   sock_queue_rcv_skb        include/net/sock.h
|   udp_queue_rcv_skb         net/ipv4/udp.c
|   udp_rcv                   net/ipv4/udp.c
|   ip_local_deliver_finish   net/ipv4/ip_input.c
|   ip_local_deliver          net/ipv4/ip_input.c
|   ip_rcv                    net/ipv4/ip_input.c
|   net_rx_action             net/core/dev.c
|   -------------------------------------------
|   netif_rx                  net/core/dev.c
|   el3_rx                    drivers/net/3c509.c
|   el3_interrupt             drivers/net/3c509.c

==========================

|   sys_write                 fs/read_write.c
|   sock_write                net/socket.c
|   sock_sendmsg              net/socket.c
|   inet_sendmsg              net/ipv4/af_inet.c
|   udp_sendmsg               net/ipv4/udp.c
|   ip_build_xmit             net/ipv4/ip_output.c
|   output_maybe_reroute      net/ipv4/ip_output.c
|   ip_output                 net/ipv4/ip_output.c
|   ip_finish_output          net/ipv4/ip_output.c
|   dev_queue_xmit            net/core/dev.c
|   --------------------------------------------
|   el3_start_xmit            drivers/net/3c509.c
V

We assume the following environment: two hosts are connected to each other; process A runs on one machine and process B on the other. Process A sends a message such as "Hello" to process B, and B receives it.

TCP processing is very complicated in itself, so to keep the narrative simple we use UDP as the example.

2.1 Create a socket

Before any data is sent, a socket must be created. Both programs call:

...
    int sockfd;
    sockfd = socket(AF_INET, SOCK_DGRAM, 0);
...

This is a system call, so it enters the kernel through the 0x80 interrupt and invokes the corresponding function in the kernel. To find the kernel function that corresponds to a system call, you can generally just prefix its name with "sys_"; for fork, for example, the function is sys_fork. Socket-related calls are somewhat special: they all go through a single entry point, sys_socketcall, which then dispatches on its arguments to the specific sys_socket, sys_bind, and so on.

sys_socket calls sock_create to generate a struct socket structure (see include/linux/net.h); every socket has such a structure in the kernel. After some general members have been initialized (an inode is allocated, the type field is set from the second parameter, and so on), the call is dispatched according to the first parameter, that is, by this statement:

...
    net_families[family]->create(sock, protocol);
...

The first parameter in our program is AF_INET, so this function pointer points to inet_create(). (net_families is an array that holds information about the registered protocol families; each family is loaded into it with sock_register.)

The most important information of a struct socket is kept in its struct sock. This structure is used so often in the network code that I suggest printing it, together with other common structures such as struct sk_buff, and keeping the printout at hand. inet_create allocates memory for this structure and initializes it differently depending on the socket type (the second parameter of the socket function):

...
    if (sk->prot->init)
        sk->prot->init(sk);
...

If the type is SOCK_STREAM, tcp_v4_init_sock is called; a SOCK_DGRAM socket has no additional initialization, and at this point the socket call is finished.

One more thing is worth noting: after inet_create() has been called, sock_map_fd is called. This function allocates a file descriptor and a file structure for the socket, so the application layer can then handle the socket the way it handles a file.

At the beginning some of these flows may be hard to follow, mainly because the actual targets of these function pointers change with the socket type.

2.2 Sending Data

When process A wants to send data, the program calls the following statement (the sendto function takes a similar path, so it is omitted here):

...
    write(sockfd, "Hello", strlen("Hello"));
...

The kernel function corresponding to write is sys_write. It first finds the struct file for the file descriptor. If the file exists (the file pointer is not NULL) and is writable (file->f_mode & FMODE_WRITE is true), the write is performed through this file structure:

...
    if (file->f_op && (write = file->f_op->write) != NULL)
        ret = write(file, buf, count, &file->f_pos);
...

Here f_op is a pointer to a struct file_operations; sock_map_fd set it to point to socket_file_ops, which is defined as follows (/net/socket.c):

static struct file_operations socket_file_ops = {
    llseek:   sock_lseek,
    read:     sock_read,
    write:    sock_write,
    poll:     sock_poll,
    ioctl:    sock_ioctl,
    mmap:     sock_mmap,
    open:     sock_no_open,   /* special open code to disallow open via /proc */
    release:  sock_close,
    fasync:   sock_fasync,
    readv:    sock_readv,
    writev:   sock_writev
};

At this point the write function pointer clearly points to sock_write. Following it, we see that this function packs the string buffer into a struct msghdr and finally calls sock_sendmsg.

I do not really understand scm_send in sock_sendmsg (SCM is short for Socket level Control Messages), but it is not critical here. What we notice is this statement:

...
    sock->ops->sendmsg(sock, msg, size, &scm);
...

This is again a function pointer. sock->ops is initialized in inet_create(); because ours is a UDP socket, sock->ops points to inet_dgram_ops (i.e. sock->ops = &inet_dgram_ops;), which is defined in net/ipv4/af_inet.c:

struct proto_ops inet_dgram_ops = {
    family:      PF_INET,
    release:     inet_release,
    bind:        inet_bind,
    connect:     inet_dgram_connect,
    socketpair:  sock_no_socketpair,
    accept:      sock_no_accept,
    getname:     inet_getname,
    poll:        datagram_poll,
    ioctl:       inet_ioctl,
    listen:      sock_no_listen,
    shutdown:    inet_shutdown,
    setsockopt:  inet_setsockopt,
    getsockopt:  inet_getsockopt,
    sendmsg:     inet_sendmsg,
    recvmsg:     inet_recvmsg,
    mmap:        sock_no_mmap,
};

So the function to look at is inet_sendmsg(), which immediately calls another function through a function pointer:

...
    sk->prot->sendmsg(sk, msg, size);
...

Again we have to find out what it actually points to. How do you track down the concrete definition in a case like this? What I usually do is the following. In the example above, sk is a struct sock (defined in include/net/sock.h) and prot is a struct proto, so we search the source tree for all instances of struct proto (jump to definition, find references and similar tasks are really quick in Source Insight ^_^). You soon find udp_prot, tcp_prot, raw_prot and so on. Guessing that udp_prot is the one, we look for references to it in the source and, sure enough, find this in inet_create:

...
    prot = &udp_prot;
...

In fact, a careful reading of inet_create would have revealed this earlier, but I was not that careful :).

Let's continue with udp_sendmsg. The main job of this function is to fill in the UDP header (source port, destination port, etc.). It then calls ip_route_output to find the route, and after that:

...
    ip_build_xmit(sk,
                  (sk->no_check == UDP_CSUM_NOXMIT ?
                   udp_getfrag_nosum :
                   udp_getfrag),
                  &ufh, ulen, &ipc, rt, msg->msg_flags);
...

A large part of ip_build_xmit is concerned with generating the sk_buff and adding the IP header to the packet. Near its end there is this statement:

...
    NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
            rt->u.dst.dev, output_maybe_reroute);
...

Simply put, when no firewall code intervenes you can read this as a direct call to output_maybe_reroute (for details see "Introduction to the kernel firewall Netfilter" in the 16th issue of the Green Alliance magazine).

output_maybe_reroute contains only one statement:

    return skb->dst->output(skb);

Using the method described above (although this one really is not easy to find), we discover that this pointer is set in ip_route_output (hint: in ip_route_output_slow, rth->u.dst.output = ip_output;). The job of ip_route_output is to find the route and record the result in skb->dst.

So we move on to the ip_output function, which immediately hands over to ip_finish_output.

Every network device, such as a network card, is represented in the kernel by a net_device. ip_finish_output finds the device (it too was set up in ip_route_output), and this parameter is passed to the functions registered at the Netfilter NF_IP_POST_ROUTING hook point. Then ip_finish_output2 is called, and this function in turn calls:

...
    hh->hh_output(skb);
...

To cut a long story short, what is actually called is dev_queue_xmit. At this point we have finished with the TCP/IP layers, and data link layer processing begins.

After some checks, the call that actually matters is this one:

...
    dev->hard_start_xmit(skb, dev);
...

This function is defined in the network card driver, and each kind of card handles it differently. My card is a 3C509 (its driver is 3c509.c); when the card is probed (el3_probe) there is:

...
    dev->hard_start_xmit = &el3_start_xmit;
...

What follows is I/O operations: the packet is actually put on the wire, and the sending process ends here.

I have been rather sketchy along the way and skipped entirely over special handling such as errors, blocking and fragmentation, describing only the ideal path.

The purpose of this article is to help everyone form a rough overall picture; in reality every step involves very complex handling (especially the TCP part).

2.3 Receiving data

When data arrives at the network card, a hardware interrupt is generated and the handler in the network card driver is called. For my 3C509 card this handler is el3_interrupt. (The IRQ number is bound to the handler with the request_irq function when the card is initialized at system startup.) The first thing the interrupt handler does is, of course, some I/O to read the data (using I/O functions such as inw), and when a data frame has been received successfully it calls el3_rx(dev) for further processing.

In el3_rx the received frame is wrapped in a struct sk_buff and leaves the driver for the generic handler netif_rx (dev.c). For the sake of CPU efficiency, the upper-layer processing functions are activated through a soft interrupt. An important job of netif_rx is to put the incoming sk_buff on a waiting queue and set the soft interrupt flag; it can then return with an easy mind and wait for the next network packet to arrive:

...
    __skb_queue_tail(&queue->input_pkt_queue, skb);
    __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
...

In the 2.2 kernel this mechanism was called bottom half processing; the internal implementation is basically similar, and the purpose is to return from the interrupt quickly.

After a while a CPU reschedule takes place for some reason (for example, a process has used up its time slice). In the scheduling function schedule(), the kernel checks whether a soft interrupt is pending and, if so, runs the corresponding handler:

...
    if (softirq_active(this_cpu) & softirq_mask(this_cpu))
        goto handle_softirq;
handle_softirq_back:
...
...
handle_softirq:
    do_softirq();
    goto handle_softirq_back;
...

During system initialization, specifically in net_dev_init, the handler of this soft interrupt is set to net_rx_action:

...
    open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
...

The next time scheduling takes place, the system checks whether a NET_RX_SOFTIRQ soft interrupt has been raised and, if so, calls net_rx_action.

The net_rx_action function is the counterpart of net_bh in version 2.2. The kernel has two global variables for registering network-layer protocols: the linked list ptype_all and the array ptype_base[16]. Together they record every layer 3 protocol (in terms of the OSI seven-layer model) that the kernel can handle. The receive handling of each network layer is represented by a struct packet_type, and this structure is registered in ptype_all or ptype_base. Only a packet_type whose type field is ETH_P_ALL is registered in the ptype_all list; any other, such as ip_packet_type, is placed in the corresponding slot of the array ptype_base[16]. The difference is that a handler registered with type ETH_P_ALL receives packets of all types, whereas the others only handle packets of their own type.

skb->protocol is assigned in el3_rx; it is the upper-layer protocol extracted from the Ethernet frame header. In our example this value is ETH_P_IP, so net_rx_action selects the receive function of the IP layer, and it is not hard to see from ip_packet_type that this function is ip_rcv().

Before pt_prev->func (which actually points to ip_rcv) is called, there is an atomic_inc(&skb->users) operation (in the 2.2 kernel this place used skb_clone, and the principle is similar); its purpose is to increase the reference count of this sk_buff. When the network-layer receive function has finished, or when it discards the sk_buff for some reason (for example because of the firewall), it calls kfree_skb. kfree_skb first checks whether the sk_buff is still in use elsewhere; only if it is not is the memory really released (__kfree_skb), otherwise the counter is merely decremented.

Now let's look at ip_rcv (net/ipv4/ip_input.c). The flow of this function is very clear: it first checks the validity of the packet (whether the version number, length, checksum and so on are correct) and, if the packet is valid, goes on to the next step. In the 2.4 kernel, to make the firewall code more flexible, the original ip_rcv was split in two, with its second half becoming a separate function, ip_rcv_finish. Part of ip_rcv_finish deals with IP options (such as source routing); the rest looks up the route through ip_route_input and records the result in skb->dst. At this point a received packet falls into one of two cases: it is either destined for a local process (and has to be passed to the upper-layer protocol) or has to be forwarded (when the machine acts as a gateway), and the handler required differs accordingly. For local delivery ip_local_deliver (/net/ipv4/ip_input.c) is called; otherwise ip_forward (/net/ipv4/ip_forward.c) is called. The function pointer skb->dst->input leads the packet down the correct path.

For our example, it is ip_local_deliver that gets called.

The packet may well be a fragment, in which case the fragments must be reassembled before being handed to the upper-layer protocol. This is naturally the first job of ip_local_deliver; if reassembly succeeds (the returned sk_buff is not NULL), processing continues (the reassembly algorithm is described in detail in "Analysis of IP Fragment Reassembly and Common Fragment Attacks" in the 13th issue of the Green Alliance magazine).

At this point, however, the code is again split in two by Netfilter. As before, we go straight to the second half, ip_local_deliver_finish (/net/ipv4/ip_input.c).

The transport-layer handlers (TCP, UDP, RAW and so on) are registered in inet_protos (through inet_add_protocol). ip_local_deliver_finish calls the appropriate handler according to the upper-layer protocol recorded in the IP header (iph->protocol). Since we are using UDP for simplicity, ipprot->handler is in fact udp_rcv.

As mentioned earlier, every socket created by an application has a corresponding struct socket/struct sock pair in the kernel. udp_rcv first finds the sock in the kernel through udp_v4_lookup and then passes it as a parameter to udp_queue_rcv_skb (/net/ipv4/udp.c). That function immediately calls sock_queue_rcv_skb, which puts the sk_buff on the waiting queue and then notifies the upper layer that data has arrived:

...
    skb_set_owner_r(skb, sk);
    skb_queue_tail(&sk->receive_queue, skb);
    if (!sk->dead)
        sk->data_ready(sk, skb->len);
    return 0;
...

sk->data_ready is defined when the sock structure is initialized (sock_init_data):

...
    sk->data_ready = sock_def_readable;
...

Now let's look from the top down.

To receive a datagram, process B calls:

...
    read(sockfd, buff, sizeof(buff));
...

The kernel function for this system call is sys_read (fs/read_write.c). The subsequent processing is similar to write and is not repeated here. The udp_recvmsg function calls skb_recv_datagram. If the data has not yet arrived and the socket is in blocking mode, the process is put to sleep (see signal_pending(current)) until data_ready notifies it that the resource is available (wake_up_interruptible(sk->sleep);), after which processing continues.

2.4 sk_buff

A great deal of the processing in the network code involves operations on sk_buff. This article has avoided the topic as far as possible, but any careful analysis has to deal with it: packets travel through the network stack in the form of sk_buffs, and it can be called the most important data structure in Linux networking. For a detailed analysis I suggest reading Alan Cox's "Network Buffers and Memory Management", published in Linux Journal in October 1996.

Here is a picture from Phrack 55-12. Although it depicts only a small aspect of sk_buff, it is very useful, especially when you keep forgetting whether skb_put adjusts the pointer forward or backward :)

          ---  -----------------  head
           ^   |               |
           |   |               |       ^ skb_push
           |   |               |       |
           |   -----------------  data           ---
           |   |               |       |          ^
   true    |   |               |       v skb_pull |
   size    |   |               |                 len
           |   |               |       ^ skb_trim |
           |   |               |       |          v
           |   -----------------  tail           ---
           |   |               |       |
           |   |               |       v skb_put
           v   |               |
          ---  -----------------  end

Efficiency of the Linux network layer: the network-layer code in Linux makes heavy use of pointers in order to avoid resource-consuming operations such as copying data. The data portion of a packet is copied only twice: once from the network card into kernel memory, and once from kernel memory into user memory. A few days ago I read that, among the attempts to improve sniffer performance, Turbo Packet (a kernel patch) uses memory shared between kernel mode and user mode to eliminate one of these copies, improving efficiency further.

3 Postscript:

This article, too, was rushed out at the last minute. Looking at the poor writing in it, I really feel I have let everyone down. If I find the time I will rewrite this part; in fact, that has long been my wish :)

4 References:

[1.] Phrack 55-12

[2.] 2nd Edition

[3.] Network Buffers and Memory Management, Alan Cox

http://www2.linuxjournal.com/lj-issues/issue30/1312.html

[4.] Zhejiang University source analysis report "Linux Network Equipment Analysis", Pan

[5.] Linux IP Networking: A Guide to the Implementation and Modification of the Linux Protocol Stack

Glenn Herrin, May 31, 2000, http://www.movement.uklinux.net/linux-net.pdf

All rights reserved. Reproduction is not permitted.

Welcome to our site http://www.nsfocus.com/

Zhonglian Green Alliance gives you safe guarantee
