(Reproduced) Zero-copy technology: research and implementation



Release Date: 2003-11-24


http://www.xfocus.net/

Created: 2003-11-21 Updated: 2003-11-23

Article type: original

Submitted by: firstdot (firstdot_at_163.com)


Author: Liang Jian (firstdot)

E-mail: firstdot@163.com

Thanks to Wang Chao and Shi Xiaolong for their joint research and generous help.

One. Basic concepts

The basic idea of zero copy is to reduce the number of times a packet is copied on its way from the network device into the user program's space, and the number of system calls involved, so that the CPU takes no part in moving the data at all. The two key techniques for implementing zero copy are DMA data transfer and memory-region mapping. As shown in Figure 1, traditional packet processing copies a packet from the network device into operating-system memory and then from system memory into the user application's space, and the application must also make system calls to get it. Zero copy first uses DMA to deliver packets directly into an address space that the kernel has pre-allocated, without involving the CPU; at the same time, it maps that kernel memory region into the address space of the detection program. (Alternatively, a buffer can be created in user space and mapped into kernel space, similar to the kiobuf technique in Linux.) The detection program then accesses this memory directly, which removes the kernel-to-user memory copy and the associated system-call overhead, giving true "zero copy".

Figure 1: Comparison of traditional data processing and zero-copy technology

Two. Implementation

Taking the 8139too.c network driver that ships with the Linux kernel source as an example, the main idea is: when the 8139too driver module starts, allocate a kernel buffer and set up a data structure to manage it; then, as a trial, write several strings into the buffer; finally, pass the buffer's address to user processes through the proc file system. A user process obtains the buffer address by reading the proc file and maps the buffer, after which it can read the data. Out of laziness (ha), this article only experiments with the address-mapping part of the zero-copy idea and does not implement DMA transfer (that is too much trouble, since it requires understanding the hardware). This experiment is not the packet-capture module of an IDS product. To implement zero copy in an IDS, several issues besides DMA must be considered; see the analysis in Section Three. The main steps for implementing zero copy follow; the complete code is in the appendix.

Step 1: Modify the network card driver

a. Allocate a buffer in the NIC driver: in Linux 2.4.x kernels the largest physically contiguous buffer that can be allocated is 2 MB, so storing a larger volume of packets requires allocating several non-contiguous buffers and managing them with a linked list, an array, or a hash table.

#define PAGES_ORDER 9          /* 2^9 = 512 pages = 2 MB with 4 KB pages */

unsigned long su1_2;

su1_2 = __get_free_pages(GFP_KERNEL, PAGES_ORDER);

b. Write data into the buffer: in a real IDS product, a zero-copy implementation should use DMA to write the packets received by the NIC directly into this buffer. As a test, I simply write a few arbitrary strings into it. If you want to put real network packets into the buffer without DMA, you can insert the following code in rtl8139_rx_interrupt() in 8139too.c, just before the call to netif_rx() (once netif_rx() returns, the skb belongs to the network stack and may already have been freed):

put_pkt2mem_n++;                /* packet count */
put_mem(skb->data, pkt_size);

See the appendix for the put_pkt2mem_n variable and the put_mem() function.

c. Pass the buffer's physical address to user space: the buffer address held in the kernel is a virtual address, but user space needs the buffer's physical address (it maps the buffer through /dev/mem in Step 2), so the virtual address must first be converted to a physical one. On Linux (i386 with the default 3 GB/1 GB split), the physical address of a directly mapped kernel address is obtained by subtracting 3 GB (0xC0000000) from the kernel virtual address; the __pa() macro used below does this. Passing the address to user space only requires moving a small amount of data between kernel and user space, which can be done with a character device driver, the proc file system, and so on; the proc file system is used here.

int read_procaddr(char *buf, char **start, off_t offset,
                  int count, int *eof, void *data)
{
    /* write the buffer's physical address as decimal text */
    int len = sprintf(buf, "%lu\n", __pa(su1_2));

    *eof = 1;
    return len;    /* bytes actually written, not a fixed 9 */
}

create_proc_read_entry("nf_addr", 0, NULL, read_procaddr, NULL);

Step 2: Access the shared buffer from the user program

a. Read the buffer address: it is obtained simply by reading the proc file.

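char addr_str[32];
int fd_procaddr;
ssize_t n;
unsigned long addr;

fd_procaddr = open("/proc/nf_addr", O_RDONLY);
n = read(fd_procaddr, addr_str, sizeof(addr_str) - 1);
addr_str[n > 0 ? n : 0] = '\0';          /* read() does not NUL-terminate */
addr = strtoul(addr_str, NULL, 10);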

b. Map the buffer into the user process: open the /dev/mem device (which represents physical memory) in the user process, use mmap() to map the buffer allocated by the NIC driver into the process's own address space, and then read the required network packets from it.

char *su1_2;
int fd;

fd = open("/dev/mem", O_RDWR);
su1_2 = mmap(0, PAGES * 4 * 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, addr);
if (su1_2 == MAP_FAILED)
    perror("mmap");

Three. Analysis

The most critical issue in zero copy is synchronization: the NIC driver in kernel space writes packets into the buffer while the user process analyzes the packets directly in the buffer (note: in place, not after copying them out). Because the two run in different spaces, synchronization is more complicated. The buffer is divided into many small blocks; each block stores one network packet and is described by a packet data structure. This test uses a flag in the packet structure to mark whether a block may be read or written (in the appendix code the flag is the len field of struct mem_packet: a nonzero len means readable, 0 means free). Once the NIC driver has filled a packet structure with real packet data, it marks the block readable; once the user process has finished analyzing the data in it, it marks the block writable. This basically solves the synchronization problem. However, because the IDS analysis process works on the data directly in the buffer instead of copying it into user space first, reading is slower than writing, so the NIC driver may fill the buffer faster than the reader drains it, and some packets are lost. The key to this problem is how much buffer to allocate: a buffer that is too small loses packets easily, while one that is too large is troublesome to manage and has a considerable impact on system performance.

Four. Appendix

A. Code added to 8139too.c

/* add_by_liangjian for ZERO_COPY */
#include <linux/mm.h>          /* __get_free_pages(), free_pages(), virt_to_page() */
#include <linux/wrapper.h>     /* mem_map_reserve(), mem_map_unreserve() in 2.4.x */
#include <linux/proc_fs.h>     /* create_proc_read_entry(), remove_proc_entry() */
#include <linux/string.h>      /* memcpy() */
#include <asm/page.h>          /* PAGE_SIZE, __pa() */
#include <asm/system.h>        /* wmb() */

#define PAGES_ORDER 9          /* 2^9 = 512 pages */
#define PAGES       512        /* must equal 1 << PAGES_ORDER */
#define MEM_WIDTH   1500       /* bytes per slot */
/* Added */

/* add_by_liangjian for ZERO_COPY */
/* Buffer header, stored in slot 0 */
struct mem_data
{
    /* int key; */
    unsigned short width;      /* buffer (slot) width */
    unsigned short length;     /* number of slots */
    /* unsigned short wtimes;     write-process count, reserved for multiple writers */
    /* unsigned short rtimes;     read-process count, reserved for multiple readers */
    unsigned short wi;         /* write index */
    unsigned short ri;         /* read index */
} *mem_data;

/* One packet slot; len == 0 means the slot is empty */
struct mem_packet
{
    unsigned int len;
    unsigned char packetp[MEM_WIDTH - 4];   /* sizeof(unsigned int) == 4 */
};

unsigned long su1_2;           /* buffer address (kernel virtual) */
unsigned long put_pkt2mem_n;   /* number of packets written */
/* Added */

/* add_by_liangjian for ZERO_COPY */
/* Release the buffer */
void del_mem(void)
{
    int pages = 0;
    char *addr;

    if (su1_2 == 0)
        return;
    addr = (char *)su1_2;
    while (pages <= PAGES - 1)
    {
        mem_map_unreserve(virt_to_page(addr));
        addr = addr + PAGE_SIZE;
        pages++;
    }
    free_pages(su1_2, PAGES_ORDER);
    su1_2 = 0;
}

/*********************************************************
 * Initialize the buffer: allocate it, reserve its pages *
 * so they can be mapped from user space, fill in the    *
 * header in slot 0 and mark every data slot empty.      *
 * On allocation failure su1_2 stays 0.                  *
 *********************************************************/
void init_mem(void)
{
    int i;
    int pages = 0;
    char *addr;
    char *buf;
    struct mem_packet *curr_pack;

    su1_2 = __get_free_pages(GFP_KERNEL, PAGES_ORDER);
    if (su1_2 == 0)
        return;
    printk("[%lx]\n", su1_2);

    addr = (char *)su1_2;
    while (pages <= PAGES - 1)
    {
        mem_map_reserve(virt_to_page(addr));   /* keep the buffer pages resident */
        addr = addr + PAGE_SIZE;
        pages++;
    }

    mem_data = (struct mem_data *)su1_2;
    mem_data[0].ri = 1;
    mem_data[0].wi = 1;
    mem_data[0].length = PAGES * 4 * 1024 / MEM_WIDTH;
    mem_data[0].width = MEM_WIDTH;

    /* Initialize su1_2: slots 1 .. length-1 hold packets */
    for (i = 1; i < mem_data[0].length; i++)
    {
        buf = (char *)su1_2 + MEM_WIDTH * i;
        curr_pack = (struct mem_packet *)buf;
        curr_pack->len = 0;
    }
}

int put_mem(char *abuf, unsigned int pack_size)
/******************************************************************
 * Write one packet into the buffer                               *
 * Input:  abuf: address of the data to write                     *
 *         pack_size: number of bytes to write                    *
 * Output: 0: error (no buffer, bad size, or buffer full)         *
 *         >0: index of the slot that was written                 *
 ******************************************************************/
{
    register int s, i, width, length, mem_i;
    char *buf;
    struct mem_packet *curr_pack;

    s = 0;
    /* len == 0 would mean "empty"; a larger packet would overflow the slot */
    if (su1_2 == 0 || pack_size == 0 || pack_size > MEM_WIDTH - sizeof(unsigned int))
        return 0;
    mem_data = (struct mem_data *)su1_2;
    width = mem_data[0].width;
    length = mem_data[0].length;
    mem_i = mem_data[0].wi;
    buf = (char *)su1_2 + width * mem_i;

    for (i = 1; i < length; i++)
    {
        curr_pack = (struct mem_packet *)buf;
        if (curr_pack->len == 0)
        {
            memcpy(curr_pack->packetp, abuf, pack_size);
            wmb();                      /* data must be visible before the flag */
            curr_pack->len = pack_size;
            s = mem_i;
            mem_i++;
            if (mem_i >= length)
                mem_i = 1;
            mem_data[0].wi = mem_i;
            break;
        }
        mem_i++;
        if (mem_i >= length)
            mem_i = 1;                  /* wrap around, skipping the header in slot 0 */
        buf = (char *)su1_2 + width * mem_i;
    }
    if (i >= length)
        s = 0;
    return s;
}

/* proc file read function */
int read_procaddr(char *buf, char **start, off_t offset,
                  int count, int *eof, void *data)
{
    int len = sprintf(buf, "%lu\n", __pa(su1_2));

    *eof = 1;
    return len;
}
/* Added */

Add the following code to the rtl8139_init_module() function in 8139too.c:

/* add_by_liangjian for ZERO_COPY */
put_pkt2mem_n = 0;
init_mem();
put_mem("Data1DfadFaserty", 16);
put_mem("Data2ZCVBNM", 11);
put_mem("DATA39876543210Poiuyt", 21);
create_proc_read_entry("nf_addr", 0, NULL, read_procaddr, NULL);
/* Added */

Add the following code to the rtl8139_cleanup_module() function in 8139too.c:

/* add_by_liangjian for ZERO_COPY */
remove_proc_entry("nf_addr", NULL);   /* stop publishing the address before freeing */
del_mem();
/* Added */

B. User-space code that reads the buffer

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define PAGES     512
#define MEM_WIDTH 1500

struct mem_data
{
    /* int key; */
    unsigned short width;      /* buffer (slot) width */
    unsigned short length;     /* number of slots */
    /* unsigned short wtimes;     write-process count, reserved for multiple writers */
    /* unsigned short rtimes;     read-process count, reserved for multiple readers */
    unsigned short wi;         /* write index */
    unsigned short ri;         /* read index */
} *mem_data;

struct mem_packet
{
    unsigned int len;
    unsigned char packetp[MEM_WIDTH - 4];   /* sizeof(unsigned int) == 4 */
};

int get_mem(char *amem, char *abuf, unsigned int *size)
/******************************************************************
 * Read one packet from the buffer                                *
 * Input:  amem: buffer address                                   *
 *         abuf: where the data is returned; its data area must   *
 *               be larger than the buffer (slot) width           *
 *         size: receives the packet length                       *
 * Output: 0: no data available                                   *
 *         >0: index of the slot that was read                    *
 ******************************************************************/
{
    register int i, s, width, length, mem_i;
    char *buf;
    struct mem_packet *curr_pack;

    s = 0;
    mem_data = (struct mem_data *)amem;
    width = mem_data[0].width;
    length = mem_data[0].length;
    mem_i = mem_data[0].ri;
    buf = amem + width * mem_i;
    curr_pack = (struct mem_packet *)buf;

    if (curr_pack->len != 0) {        /* len == 0 means the slot is empty */
        memcpy(abuf, curr_pack->packetp, curr_pack->len);
        *size = curr_pack->len;
        curr_pack->len = 0;
        s = mem_data[0].ri;
        mem_data[0].ri++;
        if (mem_data[0].ri >= length)
            mem_data[0].ri = 1;
        goto ret;
    }

    for (i = 1; i < length; i++) {
        mem_i++;     /* keep searching; worst case scans the whole buffer once */
        if (mem_i >= length)
            mem_i = 1;
        buf = amem + width * mem_i;
        curr_pack = (struct mem_packet *)buf;
        if (curr_pack->len == 0)
            continue;
        memcpy(abuf, curr_pack->packetp, curr_pack->len);
        *size = curr_pack->len;
        curr_pack->len = 0;
        s = mem_data[0].ri = mem_i;
        mem_data[0].ri++;
        if (mem_data[0].ri >= length)
            mem_data[0].ri = 1;
        break;
    }
ret:
    return s;
}

int main(void)
{
    char *su1_2;
    char receive[1500];
    int i, j;
    int fd;
    int fd_procaddr;
    unsigned int size;
    char addr_str[32];
    ssize_t n;
    unsigned long addr;

    j = 0;
    /* open device 'mem' as a medium to access the RAM */
    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    fd_procaddr = open("/proc/nf_addr", O_RDONLY);
    if (fd_procaddr < 0) {
        perror("open /proc/nf_addr");
        return 1;
    }
    n = read(fd_procaddr, addr_str, sizeof(addr_str) - 1);
    close(fd_procaddr);
    if (n <= 0) {
        perror("read");
        return 1;
    }
    addr_str[n] = '\0';
    addr = strtoul(addr_str, NULL, 10);
    printf("%lu [%8lx]\n", addr, addr);

    /* map the address in kernel to user space, use mmap function */
    su1_2 = mmap(0, PAGES * 4 * 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, addr);
    if (su1_2 == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    while (1) {
        memset(receive, 0, sizeof(receive));
        i = get_mem(su1_2, receive, &size);
        if (i != 0) {
            j++;
            printf("%d: %s [size = %u]\n", j, receive, size);
        } else {
            printf("There is no data\n");
            munmap(su1_2, PAGES * 4 * 1024);
            close(fd);
            break;
        }
    }
    return 0;
}


About the author: Liang Jian is a graduate student at the China Institute of Computing Technology whose research area is information security. His master's thesis is on host intrusion detection and defense based on system call analysis. He has more than two years of IDS research experience, is familiar with the Linux kernel, Linux C/C++ programming, and Win32 API programming, and is interested in networks and operating systems.
