Application and improvement of NAPI technology in Linux networking



Level: Introductory

Lu Zheng

June 24, 2004

NAPI is a technique for improving network processing efficiency on Linux. Its core idea is to stop reading packets one interrupt at a time and to poll for them instead, in the style of bottom-half processing. The way NAPI is currently used in Linux is not very efficient, however; this article analyzes the problem and offers an improvement for your reference.

Preface:

NAPI is a technique for improving network processing efficiency on Linux. Its core idea is not to read packets purely from the interrupt handler, but to use the interrupt only to wake up the packet-reception service routine, which then polls for data (similar to bottom-half handling). From our experimental data, the number of NIC interrupts keeps falling as the network reception rate rises. NAPI is now widely used both in NIC drivers and in the network layer: the E1000 series, the RTL8139 series and other mainstream adapters such as the 3c50x series have adopted it, and at the network layer NAPI is fully applied in the well-known netif_rx path, which provides a dedicated poll method, process_backlog, for polling. NAPI can greatly improve the efficiency of receiving short packets and reduce the number of interrupts triggered. Because the RTL8139C+ is a widely used adapter, this article takes it as the example to show how NAPI is applied in a network driver and how it works.

However, NAPI also has some serious drawbacks. From the point of view of upper-layer applications, the system can no longer process every packet the moment it is received, and the packets that accumulate during polling consume a lot of memory; experiments show this problem is worse on Linux than on FreeBSD. Another problem is that NAPI handles large packets poorly: a large packet takes much longer than a short one to travel up to the network layer (even with DMA), so NAPI is really only suited to high rates of short packets. At the end of this article an improvement to NAPI is described, together with experimental data.


Prerequisites for using NAPI:

A driver may continue to use the old 2.4-kernel network driver interface; adding NAPI does not break forward compatibility. NAPI does, however, require at least the following guarantees:

A. A DMA ring input queue is used (i.e. ring_dma, described in detail in the 2.4 driver documentation for Ethernet), or the driver has packet buffers with enough memory to cache incoming packets.

B. The ability to turn off the NIC interrupt when a transmit/receive interrupt is raised; turning off the NIC interrupt must not prevent the device from placing packets into the receive ring buffer (hereafter rx-ring) processing queue.

NAPI handles packet-arrival events by polling: when packets arrive, NAPI forces the device's dev->poll method to run, instead of processing each packet in the interrupt handler with whatever delay that entails, as older drivers did.

Note that for the DEC Tulip series (DE21x4x chips) and some National Semiconductor NIC chips, tests indicate that moving part of the interrupt processing introduces a small latency, so some small tricks are needed when operating the MII (media independent interface); see the handling in the mii_check_media function, which is not discussed here. The example shown below illustrates how the 8139 moves the processing into the poll method: everything that was originally done in the interrupt handler is placed in the poll method, and in what follows we describe only the receive poll method.

As the 8139cp driver shows, anything that can be done in the interrupt handler can also be done in the poll method; of course, different NICs have different status bits and events that need handling in the interrupt.

All NIC devices fall into one of two response mechanisms for the receive-event status register:

COR (clear-on-read) mechanism: when the status/event register is read, the status queue it represents and the NIC's rx-ring are cleared. NICs such as natsemi and sunbmac work this way; for them, all of the processing previously done in the interrupt handler must be moved into the poll method. COW (write-1-to-clear) mechanism: a status bit is cleared by writing 1 to it. The 8139cp discussed below is of this type, and so are most NICs; this type responds best to NAPI, because only the processing of the received packets needs to move into the poll method, while the status handling of the receive event stays in the original interrupt handler, as we will see with the 8139cp.

C. The ability to prevent packets queued in the NIC's receive ring from being missed while interrupts are turned off.

NAPI polls with the transmit/receive interrupts turned off. While the poll method runs, the NIC interrupt cannot announce a newly arrived packet; a packet that arrives after polling has finished but before the interrupt is re-enabled is said to be "rotting". This creates a race between the poll mechanism and the NIC interrupt. The solution is to use the NIC's receive-status bit and keep receiving data from the ring buffer (rx-ring) until nothing more is pending, and only then re-enable the interrupt.
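The pattern used to close this race is essentially the same in any NAPI driver: drain the ring while the hardware status says data is pending, and only re-enable the interrupt once the ring is really empty. A minimal, hypothetical sketch follows; drain_rx_ring(), rx_pending() and enable_rx_irq() are placeholders for device-specific code, not kernel APIs, while netif_rx_complete() is the real helper described later.

    /* Hypothetical sketch of the "drain, re-check, then re-enable interrupts"
     * pattern described above. */
    static int example_poll(struct net_device *dev, int *budget)
    {
    again:
        if (!drain_rx_ring(dev, budget))    /* quota/budget used up */
            return 1;                       /* not done: stay on poll_list */

        /* The ring looked empty, but a packet may have arrived while we were
         * polling with the NIC interrupt masked (a "rotting" packet): check
         * the hardware receive status once more before giving up. */
        if (rx_pending(dev))
            goto again;

        /* Really empty: re-enable the receive interrupt, leave the poll list. */
        enable_rx_irq(dev);
        netif_rx_complete(dev);
        return 0;                           /* done */
    }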


Locking and race prevention:

- 1. SMP guarantee: only one processor at a time may call the poll method of a given network device, because only the processor that called netif_rx_schedule to put the device on the poll queue will go on to call its poll method.

- 2. The network core layer (net core) calls the device driver in a loop to send packets, and reception in the device driver layer is completely lock-free; the network core also guarantees that only one processor at a time processes the receive queue in softirq context.

- 3. Multiple processors can access the NIC's rx-ring only while the ring queue is being shut down (suspended), when an attempt is made to clear the receive ring.

- 4. Data synchronization for the receive ring queue is taken care of; the driver does not need to do it itself.

- 5. If not all of the work is moved into the poll method, the NIC interrupt still has to stay enabled, and link-state-change and transmit-complete interrupts are handled exactly as before; the assumption that interrupts are the heaviest load on a device is certainly not always correct.

The following sections describe in detail how the device's poll method is invoked for receive events.


Important data structures and functions provided by NAPI:

Core data structure:

The fields of struct softnet_data form the processing queue between the NIC and the network layer. The structure is global (one instance per CPU) and carries data between the NIC interrupt handler and the poll method. It contains the following fields:

struct softnet_data
{
    int throttle;                           /* 1 means packets arriving on this CPU's queue are currently dropped */
    int cng_level;                          /* congestion level of packet processing on this CPU */
    int avg_blog;                           /* average congestion (backlog) of this CPU */
    struct sk_buff_head input_pkt_queue;    /* queue of received sk_buff buffers */
    struct list_head poll_list;             /* head of the queue of devices waiting to be polled */
    struct net_device *output_queue;        /* head of the network device transmit queue */
    struct sk_buff *completion_queue;       /* queue of transmitted packets waiting to be freed */
    struct net_device backlog_dev;          /* device representing this CPU's backlog queue in poll processing */
};
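One instance of this structure exists per CPU. The following is its approximate declaration in net/core/dev.c of the 2.6 kernel (zero-initialized at boot); interrupt and softirq code reaches its own CPU's copy with &__get_cpu_var(softnet_data), as seen in __netif_rx_schedule and net_rx_action below.

    /* Approximate form, from net/core/dev.c in 2.6 kernels: one per-CPU
     * softnet_data instance, accessed via __get_cpu_var(softnet_data). */
    DEFINE_PER_CPU(struct softnet_data, softnet_data);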

Core API:

1. netif_rx_schedule(dev)

Called from the interrupt service routine. It adds the device's poll method to the network layer's poll processing queue, queuing the device to receive packets; netif_rx_schedule_prep must be called first and return 1. It then raises a NET_RX_SOFTIRQ softirq to notify the network layer that packets are waiting.

2. netif_rx_schedule_prep(dev)

Checks that the device is running and has not already been added to the network layer's poll processing queue; it is called before netif_rx_schedule.

3. netif_rx_complete(dev)

Removes the specified device from the poll queue; it is normally called from the device's poll method. A device must not be removed while the poll queue is in the working state, otherwise an error occurs.
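For reference, in the 2.6 kernel these helpers are small inline functions in include/linux/netdevice.h. The sketch below is approximate, but it shows how the __LINK_STATE_RX_SCHED bit provides the SMP guarantee mentioned earlier: only the CPU that wins the atomic test_and_set_bit may schedule, and later poll, the device.

    /* Approximate 2.6 implementations (include/linux/netdevice.h) */

    /* Only schedule the device if it is running and not already scheduled;
     * the atomic test_and_set_bit guarantees a single winner on SMP. */
    static inline int netif_rx_schedule_prep(struct net_device *dev)
    {
        return netif_running(dev) &&
            !test_and_set_bit(__LINK_STATE_RX_SCHED, &dev->state);
    }

    /* Called from the interrupt handler: prep, then hang the device on the
     * per-CPU poll_list and raise NET_RX_SOFTIRQ. */
    static inline void netif_rx_schedule(struct net_device *dev)
    {
        if (netif_rx_schedule_prep(dev))
            __netif_rx_schedule(dev);
    }

    /* Called from the poll method when the ring is empty: take the device off
     * poll_list and clear the scheduled bit so it can be scheduled again. */
    static inline void __netif_rx_complete(struct net_device *dev)
    {
        BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));
        list_del(&dev->poll_list);
        smp_mb__before_clear_bit();
        clear_bit(__LINK_STATE_RX_SCHED, &dev->state);
    }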


Using NAPI in the 8139cp driver:

The essential purpose of the poll method is to minimize the number of interrupts, especially under a heavy load of small packets, so that the operating system does not spend most of its time saving and restoring interrupt context and the time gained can instead be spent processing data at the network layer. In the 8139cp interrupt handling described below, for example, the goal is to hang the interrupting device on poll_list as quickly as possible and turn off the receive interrupt; the device's poll method is then called directly to handle packet reception until there is nothing left to receive or the scheduling time slice is used up.

The RTL8139C+ receive ring buffer queues:

The RTL8139C+ uses a new buffering model that significantly reduces the CPU cost of receiving data; it suits high-volume servers and data transfer with IP, TCP and UDP checksum offload, and it supports network forms such as IEEE 802.1p, 802.1Q and VLANs. The 8139C+ has 64 contiguous receive/transmit descriptor units, organized into three different ring buffer queues: a high-priority transmit descriptor queue, a normal-priority transmit descriptor queue, and a receive descriptor queue. Each ring buffer queue is made up of contiguous descriptors of four consecutive double words each, and the start address of each descriptor is aligned on a 256-byte boundary. Before reception, the software pre-allocates a DMA buffer (generally up to 8 KB for a transfer) and links its physical address into the descriptor's DMA address field; another double-word field records the receive status of the corresponding DMA buffer. The descriptor data unit for the ring buffer queues in drivers/net/8139cp.c looks like this:

struct cp_desc {
    u32 opts1;    /* buffer status/control: buffer size, ownership and start-of-transfer bits */
    u32 opts2;    /* VLAN-related fields */
    u64 addr;     /* DMA address of the buffer */
};
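Before reception can start, the driver pre-allocates an sk_buff for every descriptor, maps it for DMA and hands the descriptor to the NIC. The following is a hypothetical helper modeled on the descriptor refill logic that appears later in cp_rx_poll; it is a sketch, not the literal 8139cp initialization code.

    /* Hypothetical ring setup, modeled on the descriptor handling in
     * cp_rx_poll below; the real driver does this in its own init routine. */
    static int example_fill_rx_ring(struct cp_private *cp)
    {
        unsigned i;

        for (i = 0; i < CP_RX_RING_SIZE; i++) {
            struct sk_buff *skb = dev_alloc_skb(cp->rx_buf_sz + RX_OFFSET);
            if (!skb)
                return -ENOMEM;
            skb_reserve(skb, RX_OFFSET);

            /* remember both the skb and its DMA mapping */
            cp->rx_skb[i].skb = skb;
            cp->rx_skb[i].mapping = pci_map_single(cp->pdev, skb->tail,
                                cp->rx_buf_sz, PCI_DMA_FROMDEVICE);

            /* fill the descriptor and hand it to the NIC (DescOwn);
             * the last descriptor also carries the RingEnd bit */
            cp->rx_ring[i].opts2 = 0;
            cp->rx_ring[i].addr = cpu_to_le64(cp->rx_skb[i].mapping);
            if (i == CP_RX_RING_SIZE - 1)
                cp->rx_ring[i].opts1 =
                    cpu_to_le32(DescOwn | RingEnd | cp->rx_buf_sz);
            else
                cp->rx_ring[i].opts1 =
                    cpu_to_le32(DescOwn | cp->rx_buf_sz);
        }
        return 0;
    }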

The 8139cp NIC interrupt handler:

static irqreturn_t
cp_interrupt (int irq, void *dev_instance, struct pt_regs *regs)
{
    struct net_device *dev = dev_instance;
    struct cp_private *cp = dev->priv;
    u16 status;

    /* Check whether an interrupt is pending (e.g. packets in rx-ring) */
    status = cpr16(IntrStatus);
    if (!status || (status == 0xFFFF))
        return IRQ_NONE;

    if (netif_msg_intr(cp))
        printk(KERN_DEBUG "%s: intr, status %04x cmd %02x cpcmd %04x\n",
               dev->name, status, cpr8(Cmd), cpr16(CpCmd));

    /* Acknowledge (clear) the NIC interrupt status, except the rx bits */
    cpw16(IntrStatus, status & ~cp_rx_intr_mask);

    spin_lock(&cp->lock);

    /* The receive status register indicates that packets have arrived */
    if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr)) {
        /* Hang the interrupting NIC device on the poll queue in softnet_data,
         * waiting for the upper layers of the network stack to process it */
        if (netif_rx_schedule_prep(dev)) {
            /* Turn off the receive interrupt enable */
            cpw16_f(IntrMask, cp_norx_intr_mask);
            __netif_rx_schedule(dev);
        }
    }

    /* Transmit interrupt handling and the 8139C+ private software interrupt;
     * not of interest here */
    if (status & (TxOK | TxErr | TxEmpty | SWInt))
        cp_tx(cp);

    /* On a link change, check the carrier state of the media independent
     * interface (MII); the MII may also need to be restarted */
    if (status & LinkChg)
        mii_check_media(&cp->mii_if, netif_msg_link(cp), FALSE);

    /* On a PCI bus error, the 8139C+ device needs to be reset */
    if (status & PciErr) {
        u16 pci_status;

        pci_read_config_word(cp->pdev, PCI_STATUS, &pci_status);
        pci_write_config_word(cp->pdev, PCI_STATUS, pci_status);
        printk(KERN_ERR "%s: PCI bus error, status=%04x, PCI status=%04x\n",
               dev->name, status, pci_status);

        /* TODO: reset hardware */
    }

    spin_unlock(&cp->lock);

    return IRQ_HANDLED;
}

Hanging the NIC device on the poll queue (poll_list)

In the 8139cp interrupt handler you can see the call to __netif_rx_schedule, which hooks the NIC device onto the poll_list queue in the softnet_data structure, so that the interrupt returns quickly and the actual packet processing is left to the bottom-half part. Let us first look at the internal flow of __netif_rx_schedule.

static inline void __netif_rx_schedule(struct net_device *dev)
{
    unsigned long flags;

    local_irq_save(flags);
    dev_hold(dev);
    /* Hang the current NIC device on the poll queue (poll_list), waiting for
     * the softirq to poll it */
    list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
    /* Set the device's quota: how many packets it may deliver per poll call */
    if (dev->quota < 0)
        dev->quota += dev->weight;
    else
        dev->quota = dev->weight;
    /* Raise the softirq: set NET_RX_SOFTIRQ in the __softirq_pending field of
     * the per-CPU interrupt status word (irq_cpustat_t), so that the handler
     * net_rx_action runs the next time softirqs are scheduled. */
    __raise_softirq_irqoff(NET_RX_SOFTIRQ);
    local_irq_restore(flags);
}

Analysis of the softirq raised by __netif_rx_schedule

The softirq handler is registered when the device subsystem is initialized, via subsys_initcall(net_dev_init), and is run when softirqs are scheduled. Its most important job is to call the poll method of the 8139C+ network device (dev->poll) and fetch data from the device's rx-ring queue: work that was originally done in the device's interrupt service routine. As explained above, the poll method trades space for time by moving this work into a softirq that polls the ring (the old bottom-half mechanism could achieve the same effect and is easier to understand). Whenever the softirq is scheduled, it polls the rx-ring and receives the NIC's data.
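The registration happens once at boot time. The following is an approximate sketch of the relevant part of net_dev_init in net/core/dev.c of the 2.6 kernel; the per-CPU softnet_data and backlog_dev setup is omitted.

    /* Approximate sketch (net/core/dev.c, 2.6): register the transmit and
     * receive softirq handlers at subsystem-init time. */
    static int __init net_dev_init(void)
    {
        /* ... per-CPU softnet_data and backlog_dev initialization ... */

        open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
        open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);

        return 0;
    }

    subsys_initcall(net_dev_init);

The receive softirq handler itself, net_rx_action, looks like this: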

static void net_rx_action(struct softirq_action *h)
{
    struct softnet_data *queue = &__get_cpu_var(softnet_data);
    unsigned long start_time = jiffies;
    int budget = netdev_max_backlog;    /* maximum amount of work per run */

    /* Disable preemption and local interrupts so this CPU's poll list cannot
     * be touched concurrently */
    preempt_disable();
    local_irq_disable();

    /* While there are devices on the poll queue (poll_list) waiting to be
     * polled for data ... */
    while (!list_empty(&queue->poll_list)) {
        struct net_device *dev;

        /* Make sure this poll run does not exceed one tick and does not
         * deliver more than "budget" packets, so it cannot monopolize the
         * CPU.  budget is the total number of sk_buffs that may be delivered
         * in one run; dev->quota is the per-device share per poll call.
         * In the 8139cp driver budget is 300 and quota is 16, i.e. up to
         * about 4.8K sk_buffs can be received per run. */
        if (budget <= 0 || jiffies - start_time > 1)
            goto softnet_break;

        local_irq_enable();

        /* Fetch the next device waiting to be polled from the poll list in
         * the shared softnet_data structure */
        dev = list_entry(queue->poll_list.next,
                         struct net_device, poll_list);

        /* Call the device's poll method to read data from the NIC's ring
         * buffer */
        if (dev->quota <= 0 || dev->poll(dev, &budget)) {
            /* This poll call used up the device's quota: move the device to
             * the tail of the poll list and give it a new quota.  The quota
             * is the maximum number of sk_buffs the device may create and
             * hand to the upper layer per poll call; this parameter matters
             * for tuning and should be increased when large amounts of data
             * must be processed at high speed. */
            local_irq_disable();
            list_del(&dev->poll_list);
            list_add_tail(&dev->poll_list, &queue->poll_list);
            if (dev->quota < 0)
                dev->quota += dev->weight;
            else
                dev->quota = dev->weight;
        } else {
            /* Either an error occurred, or the device finished before using
             * up its quota and has no more data (possibly indicating the end
             * of the transfer); the poll method has already called
             * __netif_rx_complete to remove the device from the poll queue
             * (described in detail with the poll method below). */
            dev_put(dev);
            local_irq_disable();
        }
    }
out:
    local_irq_enable();
    preempt_enable();
    return;

softnet_break:
    __get_cpu_var(netdev_rx_stat).time_squeeze++;
    __raise_softirq_irqoff(NET_RX_SOFTIRQ);
    goto out;
}

The polling method in the 8139cp driver

The dev->poll method:

This method is normally called by the network layer to fetch packets from the driver's receive ring queue; the number of packets the driver may deliver to the network layer is given in the dev->quota field. Let us look at the prototype of the 8139cp poll method:

static int cp_rx_poll (struct net_device *dev, int *budget)

The budget parameter is the number of packets the upper layer asks the lower layer to deliver; it cannot exceed the value of netdev_max_backlog.

In short, the poll method is called by the network layer and is only responsible for delivering the number of packets the network layer asks for (the "budget" value). The 8139cp poll method is registered when the device driver module is initialized (in the probe routine), as follows:

static int cp_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
{
    ...
    dev->poll = cp_rx_poll;
    ...
}
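The probe routine also sets dev->weight, the per-call quota unit referred to above (16 in the 8139cp driver); a minimal sketch of the two assignments side by side, assuming the surrounding cp_init_one context:

    /* Sketch: both the poll method and its weight are set during probe;
     * weight is the quota unit used by net_rx_action (16 for the 8139cp). */
    dev->poll = cp_rx_poll;
    dev->weight = 16;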

As noted before, the device's poll method is called by the network layer softirq net_rx_action; let us now look at the concrete flow:

static int cp_rx_poll (struct net_device *dev, int *budget)
{
    struct cp_private *cp = netdev_priv(dev);
    unsigned rx_tail = cp->rx_tail;
    /* The number of packets that may be handed to the network layer in this
     * scheduling round */
    unsigned rx_work = dev->quota;
    unsigned rx;

rx_status_loop:
    rx = 0;
    /* The receive interrupt was masked in cp_interrupt; the poll method has
     * now begun processing the ring buffer queue, so acknowledge the pending
     * receive status and prepare for new packets */
    cpw16(IntrStatus, cp_rx_intr_mask);

    while (1) {    /* start of the poll loop */
        u32 status, len;
        dma_addr_t mapping;
        struct sk_buff *skb, *new_skb;
        struct cp_desc *desc;
        unsigned buflen;

        /* "Pick" the socket buffer at index rx_tail from the rx_skb queue of
         * the receive ring */
        skb = cp->rx_skb[rx_tail].skb;
        if (!skb)
            BUG();

        desc = &cp->rx_ring[rx_tail];
        /* Check the receive status of this slot of the NIC ring queue
         * (rx_ring): ownership, receive errors, FIFO errors */
        status = le32_to_cpu(desc->opts1);
        if (status & DescOwn)
            break;

        len = (status & 0x1fff) - 4;
        mapping = cp->rx_skb[rx_tail].mapping;

        if ((status & (FirstFrag | LastFrag)) != (FirstFrag | LastFrag)) {
            /* we don't support incoming fragmented frames.
             * instead, we attempt to ensure that
             * pre-allocated RX skbs are properly sized such
             * that RX fragments are never encountered
             */
            cp_rx_err_acct(cp, rx_tail, status, len);
            cp->net_stats.rx_dropped++;
            cp->cp_stats.rx_frags++;
            goto rx_next;
        }

        if (status & (RxError | RxErrFIFO)) {
            cp_rx_err_acct(cp, rx_tail, status, len);
            goto rx_next;
        }

        if (netif_msg_rx_status(cp))
            printk(KERN_DEBUG "%s: rx slot %d status 0x%x len %d\n",
                   cp->dev->name, rx_tail, status, len);

        buflen = cp->rx_buf_sz + RX_OFFSET;
        /* Allocate a new socket buffer */
        new_skb = dev_alloc_skb(buflen);
        if (!new_skb) {
            cp->net_stats.rx_dropped++;
            goto rx_next;
        }

        skb_reserve(new_skb, RX_OFFSET);
        new_skb->dev = cp->dev;

        /* Release the DMA mapping of the buffer previously mapped on the ring */
        pci_unmap_single(cp->pdev, mapping,
                         buflen, PCI_DMA_FROMDEVICE);

        /* Handle checksum offloading for incoming packets: check whether the
         * checksum of the data in the socket buffer (sk_buff) is correct */
        if (cp_rx_csum_ok(status))
            skb->ip_summed = CHECKSUM_UNNECESSARY;
        else
            skb->ip_summed = CHECKSUM_NONE;

        /* Set the size of the socket buffer to the actual data length */
        skb_put(skb, len);

        mapping =
        cp->rx_skb[rx_tail].mapping =
            /* DMA-map the newly created socket buffer: translate the virtual
             * address new_skb->tail to a physical address and hang that
             * physical address on the receive buffer queue */
            pci_map_single(cp->pdev, new_skb->tail,
                           buflen, PCI_DMA_FROMDEVICE);
        /* Hang the virtual address of the new buffer on the receive buffer
         * queue; the next time this slot of the rx_skb array comes around,
         * the poll method will read the received packet from this address */
        cp->rx_skb[rx_tail].skb = new_skb;

        /* cp_rx_skb hands the packet up via netif_receive_skb, so the network
         * layer (e.g. ip_rcv) receives the data directly in the bottom half;
         * this replaces the older netif_rx path */
        cp_rx_skb(cp, skb, desc);
        rx++;

rx_next:
        /* Hook the newly mapped physical address back into the NIC's receive
         * ring (rx_ring is shared with the NIC via DMA, not dynamically
         * allocated per packet by the driver), ready for the next transfer */
        cp->rx_ring[rx_tail].opts2 = 0;
        cp->rx_ring[rx_tail].addr = cpu_to_le64(mapping);
        /* Write the control word into the descriptor, returning ownership of
         * this rx_ring slot from the driver to the NIC hardware */
        if (rx_tail == (CP_RX_RING_SIZE - 1))
            desc->opts1 = cpu_to_le32(DescOwn | RingEnd |
                                      cp->rx_buf_sz);
        else
            desc->opts1 = cpu_to_le32(DescOwn | cp->rx_buf_sz);
        /* Step to the next slot of the receive buffer queue */
        rx_tail = NEXT_RX(rx_tail);

        if (!rx_work--)
            break;
    }

    cp->rx_tail = rx_tail;

    /* Decrement the quota; once the quota reaches 0, this poll round has done
     * its job and must wait for new data to wake the softirq and run the poll
     * method again */
    dev->quota -= rx;
    *budget -= rx;

    /* if we did not reach work limit, then we're done with
     * this round of polling
     */
    if (rx_work) {
        /* If more data has arrived in the meantime, go back to the start of
         * the poll loop and continue receiving */
        if (cpr16(IntrStatus) & cp_rx_intr_mask)
            goto rx_status_loop;

        /* All data has been received and no new receive interrupt is pending:
         * re-enable the NIC receive interrupt and call __netif_rx_complete to
         * remove the device, whose poll is complete, from poll_list; the next
         * interrupt will hang the device on the poll_list queue again. */
        local_irq_disable();
        cpw16_f(IntrMask, cp_intr_mask);
        __netif_rx_complete(dev);
        local_irq_enable();

        return 0;    /* done */
    }

    return 1;        /* not done */
}

Other drivers that use NAPI are similar to the 8139cp, except that they use the poll method supplied by the network layer, process_backlog (net/core/dev.c). After the NIC interrupt receives a packet, the driver calls netif_rx (net/core/dev.c), which stores the packet received in the hardware interrupt in an sk_buff, examines the hardware frame header to identify the frame type, places the buffer on the receive queue (the input_pkt_queue queue in the softnet_data structure), and raises the receive softirq for further processing. The softirq handler (net_rx_action) then extracts the packets from the receive queue, and process_backlog (acting as the poll method) delivers the data to the upper layer.
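The flow just described can be summarized with the following approximate sketch of netif_rx and process_backlog, simplified from net/core/dev.c of the 2.6 kernel; congestion control, statistics and error paths are omitted, so treat it as an illustration rather than the literal kernel code.

    /* Simplified sketch of the legacy (non-NAPI) path, net/core/dev.c, 2.6. */
    int netif_rx(struct sk_buff *skb)
    {
        struct softnet_data *queue;
        unsigned long flags;

        local_irq_save(flags);
        queue = &__get_cpu_var(softnet_data);

        if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
            if (!queue->input_pkt_queue.qlen)
                /* queue was empty: put the virtual backlog_dev on poll_list
                 * and raise NET_RX_SOFTIRQ */
                netif_rx_schedule(&queue->backlog_dev);

            __skb_queue_tail(&queue->input_pkt_queue, skb);
            local_irq_restore(flags);
            return NET_RX_SUCCESS;
        }

        local_irq_restore(flags);
        kfree_skb(skb);
        return NET_RX_DROP;
    }

    /* backlog_dev's poll method: drain input_pkt_queue within the quota and
     * hand each packet to the protocol layers via netif_receive_skb() */
    static int process_backlog(struct net_device *backlog_dev, int *budget)
    {
        int work = 0;
        int quota = backlog_dev->quota < *budget ? backlog_dev->quota : *budget;
        struct softnet_data *queue = &__get_cpu_var(softnet_data);

        while (work < quota) {
            struct sk_buff *skb;
            unsigned long flags;

            local_irq_save(flags);
            skb = __skb_dequeue(&queue->input_pkt_queue);
            local_irq_restore(flags);
            if (!skb)
                break;

            netif_receive_skb(skb);
            work++;
        }

        backlog_dev->quota -= work;
        *budget -= work;
        if (work < quota) {            /* queue drained: leave poll_list */
            netif_rx_complete(backlog_dev);
            return 0;
        }
        return 1;                      /* more work remains */
    }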


Can reception be made even faster?

We now consider how to improve NAPI's efficiency. Before talking about efficiency, look at NAPI_HOWTO.txt under Documentation/networking in the Linux kernel source, which explains how to construct your NIC's poll method. It differs from the 8139cp in a few points; for example, it uses a dirty_rx field in the NIC device structure that the 8139cp does not use.

dirty_rx tracks the receive ring slots for which an sk_buff has been allocated and whose descriptors have been submitted to the NIC's rx_ring, but which have not yet been refreshed after their transfer completed; similarly, cur_rx is the index of the next slot to take part in a transfer. NAPI_HOWTO.txt shows some concrete uses of these fields:

/* cur_rx is the index of the next buffer to take part in reception.
 * If cur_rx has run more than half a ring ahead of dirty_rx, the receive
 * buffers attached to rx_ring are nearly exhausted; call refill_rx_ring to
 * attach new buffers to the slots whose data has already been handed to the
 * network layer, advancing dirty_rx and preparing for the next reception. */
if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE / 2 ||
    tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL)
    refill_rx_ring(dev);

/* If cur_rx and dirty_rx differ by no more than half of the rx_ring and the
 * remaining half of the slots is still free, there are (by experience) enough
 * buffers left; we can leave the routine and let the next softirq poll handle
 * the data received in the meantime.  (NAPI_HOWTO.txt restarts a timer here so
 * that polling continues even if no NIC interrupt is processed.) */
if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL)
    restart_timer();

/* If we get here, several situations are possible.  One is that cur_rx and
 * dirty_rx differ by less than half the ring, yet after calling
 * refill_rx_ring, dirty_rx did not advance (perhaps many slots in rx_ring
 * have not yet been processed by the network layer functions); as a result
 * there is no free slot for newly arriving data, so netif_rx_schedule is
 * called again to wake the softirq, call the device's poll method, and drain
 * the data from rx_ring. */
else netif_rx_schedule(dev);    /* we are back on the poll list */

The dirty_rx field is used in the RTL-8169 driver but not in the 8139cp; this is not an oversight in the 8139cp driver. Having read NAPI_HOWTO.txt, you can now see that the 8139cp does not follow NAPI to the letter; if you are interested, compare the 8139cp and RTL-8169 drivers. Neither completes the forwarding of data from the driver layer to the network layer in the NIC interrupt; both do it in the softirq. But the 8139cp exploits some hardware features unique to the 8139: even with the receive interrupt turned off, the NIC still sets the receive status bit (RxOK) to signal packet arrival, so the poll method can forward data directly from the NIC to the upper layer. The RTL-8169, by contrast, still has to use the input_pkt_queue (the socket buffer (sk_buff) input queue) in the softnet_data structure to pass data from the NIC interrupt to the softirq. This is the 8139cp's biggest advantage: it needs neither the dirty_rx and cur_rx fields to let the poll method and the NIC interrupt know the state of the transfer units, nor periodic calls to refill_rx_ring to refresh rx-ring and obtain free transfer units. The downside is that a dedicated poll method has to be written rather than simply borrowing process_backlog from net/core/dev.c, but that cost is worth it.

All this may look unrelated to improving efficiency, but it is exactly the opposite: with this background, the meaning of the fields in softnet_data should be clearer, and the way to raise efficiency is to borrow, on top of the 8139cp, some of the methods from NAPI_HOWTO.txt. In actual use, for certain applications, the result is indeed better than Linux's stock 8139cp. First, let us look at packet reception with the Linux 2.6.6 kernel's 8139cp on an x86 (PIII-900MHz) platform; the comparison table is as follows:

Psize   Ipps     Tput     Rxint    Done
-----------------------------------------
60      490000   254560   21       10
128     358750   259946   27       11
256     334454   450034   34       18
512     234550   556670   201239   193455
1024    119061   995645   884526   882300
1440    74568    995645   995645   987154

In the table, "Psize" is the packet size, "Ipps" is the number of packets the system can receive per second, "Tput" is the measured throughput, "Rxint" is the number of receive interrupts, and "Done" is the number of poll calls needed to drain the data from rx-ring, i.e. the number of times rx-ring had to be cleared.

As the table shows, at a reception rate of 490K packets/s only 21 interrupts are generated and only 10 polls are needed to drain the data from rx_ring. For large packets at low rates, however, the number of receive interrupts rises sharply, until in the end every packet needs its own poll call; each interrupt then requires one poll, efficiency drops sharply, and overall system performance falls with it. NAPI is therefore suited to large numbers of small packets; for large packets at low rates it actually slows the system down.

To improve this, we can consider the following methods; we ran a series of tests on MIPS, XScale and SA1100 platforms and obtained good results:

1. Turn off the NIC's receive interrupt entirely and use the RxOK status bit to control reception.

2. Use a timer interrupt (timer_list) with a suitable interval chosen for the hardware platform, and poll rx-ring directly. On MIPS and XScale we used the top half of interrupt vector 0 - IRQ0, the system clock - to drive the rx-ring polling. (Note that on those two platforms we chose HZ = 1000 instead of the usual 100 and rewrote the wall-time handling accordingly, so that wall-time ticks are 10 ms apart.) The appropriate polling period can of course be chosen for your own platform and application; a sketch combining points 1 to 3 appears after this list.

3. Use the input_pkt_queue queue in softnet_data: when the poll in the clock-interrupt bottom half completes, do not hand the data directly to the network layer, but hang the sk_buff on the input_pkt_queue queue and wake the softirq afterwards. As one would expect, this costs some memory and worsens real-time behaviour.

4. Use the dirty_rx field and the refill_rx_ring function to set up new buffers on the ring buffer queue after the poll method has been called, while the network layer programs are idle. This saves time when new packets arrive, since the operating system does not have to allocate new space to cope with them.

5. Finally, please note: our upper-layer application is mainly network data forwarding, without many complex background processes at the application level. Points 1 to 4 above clearly sacrifice overall system efficiency in order to improve network data processing alone.
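The following is a minimal sketch of the timer-driven polling described in points 1 to 3, assuming HZ = 1000 as in the text. The names rx_poll_timer, rx_poll_timer_fn, rx_poll_start and drain_rx_ring_to_backlog are hypothetical placeholders, not real kernel or 8139cp APIs; drain_rx_ring_to_backlog stands for driver code that pulls packets from rx-ring and queues them on softnet_data.input_pkt_queue (point 3).

    /* Hypothetical sketch of points 1-3: receive interrupts stay off and a
     * periodic timer drains rx-ring instead. */
    static struct timer_list rx_poll_timer;    /* hypothetical */

    static void rx_poll_timer_fn(unsigned long data)
    {
        struct net_device *dev = (struct net_device *)data;

        /* Pull packets from rx-ring onto input_pkt_queue; they are handed to
         * the network layer later by NET_RX_SOFTIRQ, not in this handler. */
        if (drain_rx_ring_to_backlog(dev))     /* hypothetical helper */
            netif_rx_schedule(&__get_cpu_var(softnet_data).backlog_dev);

        /* With HZ = 1000 this re-arms the poll every 10 ms, as in the text;
         * the interval should be tuned per platform and application. */
        mod_timer(&rx_poll_timer, jiffies + 10);
    }

    static void rx_poll_start(struct net_device *dev)
    {
        init_timer(&rx_poll_timer);
        rx_poll_timer.function = rx_poll_timer_fn;
        rx_poll_timer.data = (unsigned long)dev;
        mod_timer(&rx_poll_timer, jiffies + 10);
    }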

Now let us look at packet reception with our modified 8139cp driver on the same x86 (PIII-900MHz) platform:

Psize   Ipps     Tput     Rxint    Done
-----------------------------------------
60      553500   354560   17       7
128     453000   350400   19       10
256     390050   324500   28       13
512     305600   456670   203      455
1024    123440   340020   772951   123005
1440    64568    344567   822394   130000

As this table shows, both the efficiency and the stability of data reception improve noticeably, and the gap between the number of polls at high and low rates shrinks significantly; the peak reception rate for small packets also rises to 553K packets/s. On the MIPS and XScale series platforms we measured improvements of roughly 15%-25%.

Finally, using NAPI is not the only way to improve network efficiency; it should be seen only as a stopgap. The fundamental solution is for the upper-layer applications to take load off the network device, or to provide ample buffer resources; if that is done, our experimental data show that receive efficiency can be raised by 100%-150% or more.


About the author

Lu Zheng. You can contact him at lu_zheng@21cn.com.
