Implementation and Analysis of Communication Between Kernel Space and User Space in Linux
Contents:
1 Introduction
2 The Runtime Environment of Linux Kernel Modules and Traditional Inter-Process Communication
3 Methods and Implementation of Communication Between the Linux Kernel and User Processes
4 Summary
References
About the author
Chen Xin
Free software enthusiast, Department of Electronic Engineering, Nanjing University of Posts and Telecommunications. July 2004
Most Linux kernel modules need to exchange data with user-space processes, but kernel code cannot make adequate use of the traditional Linux mechanisms for inter-process synchronization and communication. This article summarizes and compares several ways of implementing communication between kernel code and user-space processes, and recommends netlink sockets for communication between interrupt context and user-space processes.

1 Introduction

Linux is an open-source operating system: ordinary users and enterprises alike can write their own kernel code and trim the standard kernel to build an operating system that suits them. Many network devices for low- and mid-range users now run systems trimmed down from standard Linux, which shows that more and more people are joining the ranks of Linux kernel developers.

Implementing one or more kernel modules is usually not enough to meet the needs of ordinary system software, because kernel code is heavily restricted: for example, it cannot print to a terminal or perform long, time-consuming processing. When such work is needed, the data collected in the kernel must be handed to one or more user-space processes. This makes the way kernel code communicates with user-space processes especially important. The Linux kernel releases do not document these communication methods in detail, and no other article summarizes them, so this article lists several methods of communicating with user-space processes and analyzes their implementation and the environments they suit.

2 The Runtime Environment of Linux Kernel Modules and Traditional Inter-Process Communication

On a computer running Linux, the CPU is at any moment in exactly one of the following four states:
[1] handling a hard interrupt;
[2] handling a soft interrupt, such as a softirq, a tasklet, or a bottom half (BH);
[3] running in kernel mode with a process context, that is, on behalf of a particular process;
[4] running a user-space process.
States [1], [2], and [3] run in kernel space, while [4] runs in user space. Except for [4], each state can be preempted only by the states listed above it; for example, a soft interrupt can be preempted only by a hard interrupt.

A Linux kernel module is code that can be loaded and unloaded dynamically; once loaded, it starts running in the kernel immediately. Linux kernel code runs in one of three environments: user context, hard-interrupt context, and soft-interrupt context. In terms of restrictions, however, the three reduce to two, because the soft-interrupt environment is merely a continuation of the hard-interrupt environment. Table 1 compares them.

Table 1
Kernel environment | Description | Limitations
User context | Kernel code runs on behalf of a user-space process, for example the code executed by a system call. | Local variables cannot be passed directly to user-space memory, because kernel mode and user mode use different memory-mapping mechanisms.
Hard- and soft-interrupt context | Code runs while a hard or soft interrupt is being handled, for example the code that receives IP datagrams, or a network device driver. | Data cannot be passed directly to user-space memory, and the code must not block while running.

Linux offers many traditional inter-process communication (IPC) mechanisms, such as various kinds of pipes, message queues, shared memory, and semaphores. None of them, however, can be used for communication between kernel mode and user mode, for the reasons given in Table 2.

Table 2
Method | Why it cannot be used between kernel mode and user mode
Pipes (excluding named pipes) | Limited to communication between parent and child processes.
Message queues | Data cannot be received without blocking in hard- or soft-interrupt context.
Semaphores | Cannot be shared between kernel mode and user mode.
Shared memory | Needs semaphores for synchronization, and semaphores cannot be used.
Sockets | Data cannot be received without blocking in hard- or soft-interrupt context.

3 Methods and Implementation of Communication Between the Linux Kernel and User Processes

3.1 User context

Code running in user context may block, so message queues and UNIX domain sockets can be used for communication between kernel mode and user mode. These methods, however, transfer data inefficiently. The Linux kernel provides copy_from_user() and copy_to_user() for copying data between kernel space and user space, but both may block (they can sleep, for example while a page fault on the user buffer is serviced), so they cannot be used in hard- or soft-interrupt context. They are generally used in system-call-like functions, which "shuttle" between kernel mode and user mode. Figure 1 shows how such methods work.

Figure 1

Methods of this kind require you to write the relevant system-call-like entry points yourself and load them into the kernel. imp1.tar.gz is an example: the kernel module registers a group of socket-option handlers, so that a user-space process can call them to read and write kernel data. The source code contains three files: imp1.h, a common header that defines the macros used by both the user-space and kernel code; imp1_k.c, the source of the kernel module; and imp1_u.c, the source of the user-space process. The example first has the user-space process send the string "a message from userspace\n" to the user context.
The user context then sends the string "a message from kernel\n" back to the user-space process.

3.2 Hard- and soft-interrupt context

Unlike user context, hard- and soft-interrupt context has no relationship to any user-space process, and code running in it must not block.

3.2.1 Using general inter-process communication methods

The traditional IPC methods cannot be used directly. Hard and soft interrupts do, however, have a synchronization mechanism of their own, the spinlock, which can synchronize one interrupt context with another and an interrupt context with a kernel thread. Because a kernel thread runs in process context, it can use a socket or a message queue to obtain data from user space and then pass that data to the interrupt handler through a critical section. Figure 2 shows the basic idea.

Figure 2

Because an interrupt handler cannot wait indefinitely for a user-space process to send data, a kernel thread must receive the user-space data and then hand it to the interrupt handler through the critical section. Data sent from the interrupt handler to user space must be sent without blocking. This communication model is unsatisfactory: the kernel thread competes with other user-space processes for the CPU when receiving data, which makes it inefficient, so the interrupt handler cannot receive user-space data in real time.

3.2.2 Netlink sockets

In Linux 2.4 and later kernels, almost all communication between interrupt handlers and user-space processes is implemented with netlink sockets, and netlink is also used to implement the IP Queue facility. IP Queue, however, has its limitations and cannot be used freely by arbitrary interrupt handlers.
Neither the kernel's documentation nor other Linux articles describe in detail how netlink sockets are used for communication between interrupt handlers and user space, so many users have only a vague idea of it. Netlink communication addresses each endpoint by an identifier associated with a process, usually its process ID. When one end of the communication is an interrupt handler, that end's identifier is 0. When both ends are user-space processes, netlink sockets are used much like a message queue.
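To make the addressing rule concrete, here is a minimal user-space sketch, assuming the standard Linux `<linux/netlink.h>` header. It fills in a `struct sockaddr_nl` for the process itself (identifier = process ID, following the convention described above) and for the kernel end (identifier 0), and opens and binds a netlink socket. The helper names `nl_addr_self()`, `nl_addr_kernel()`, and `nl_open()` are hypothetical, not taken from the article's sources.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: address of this process. By convention the
 * netlink identifier (nl_pid) of a user-space socket is its process ID. */
static void nl_addr_self(struct sockaddr_nl *addr)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof(*addr));
    addr->nl_family = AF_NETLINK;
    addr->nl_pid = (__u32)getpid();
    addr->nl_groups = 0;              /* unicast only: no multicast groups */
}

/* Hypothetical helper: address of the kernel end, whose identifier is 0. */
static void nl_addr_kernel(struct sockaddr_nl *addr)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof(*addr));
    addr->nl_family = AF_NETLINK;
    addr->nl_pid = 0;                 /* 0 denotes the kernel */
    addr->nl_groups = 0;
}

/* Hypothetical helper: open a netlink socket for `protocol` and bind it
 * to this process's ID. Returns the descriptor, or -1 with errno set. */
static int nl_open(int protocol)
{
    struct sockaddr_nl local;
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -1;
    nl_addr_self(&local);
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A process talking to a module like the one below would call `nl_open()` with the module's custom protocol number and pass the address from `nl_addr_kernel()` as the destination of `sendto()`. Binding with `getpid()` allows only one such socket per process; setting `nl_pid` to 0 before `bind()` lets the kernel assign a unique identifier instead.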
When one end is an interrupt handler, however, usage differs. The most important feature of netlink sockets is their support for interrupt handlers: kernel code no longer has to start a kernel thread to receive user-space data. Instead, a soft interrupt calls a receive function that was registered in advance. Figure 3 illustrates the principle.

Figure 3

Because a soft interrupt rather than a kernel thread receives the data, the data is received in real time. When a netlink socket is used for communication between kernel space and user space, the user-space side creates it just like an ordinary socket, but the kernel side creates it differently. Figure 4 shows how the sockets are created for this kind of communication.

Figure 4

The following example applies a netlink socket. It captures ICMP datagrams at Netfilter's NF_IP_PRE_ROUTING hook and passes information about each datagram to a user-space process, which prints it on the terminal. The source code is in imp2.tar.gz.

Kernel module code (explained in segments):

(1) Module initialization and unloading

static struct sock *nlfd;
struct {
    __u32 pid;
    rwlock_t lock;
} user_proc;

/* The function hooked on NF_IP_PRE_ROUTING in the Netfilter framework is get_icmp() */
static struct nf_hook_ops imp2_ops = {
    .hook     = get_icmp,            /* Netfilter hook function */
    .pf       = PF_INET,
    .hooknum  = NF_IP_PRE_ROUTING,
    .priority = NF_IP_PRI_FILTER - 1,
};

static int __init init(void)
{
    rwlock_init(&user_proc.lock);
    /* Create a netlink socket in the kernel; received data is handled by
       kernel_receive(). NL_IMP2 is a custom protocol number. */
    nlfd = netlink_kernel_create(NL_IMP2, kernel_receive);
    if (!nlfd) {
        printk("can not create a netlink socket\n");
        return -1;
    }
    /* Hook the Netfilter function at NF_IP_PRE_ROUTING */
    return nf_register_hook(&imp2_ops);
}

static void __exit fini(void)
{
    if (nlfd) {
        sock_release(nlfd->socket);
    }
    nf_unregister_hook(&imp2_ops);
}

module_init(init);
module_exit(fini);

The work done here is simple. When the module is loaded, it first creates a netlink socket in kernel space and then hooks a function onto the NF_IP_PRE_ROUTING hook point of the Netfilter framework. When the module is unloaded, it releases the resources held by the socket and unregisters the function from Netfilter.

(2) Receiving data from user space
DECLARE_MUTEX(receive_sem);

01: static void kernel_receive(struct sock *sk, int len)
02: {
03:     do
04:     {
05:         struct sk_buff *skb;
06:         if (down_trylock(&receive_sem))
07:             return;
08:
09:         while ((skb = skb_dequeue(&sk-…
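The receive function above is kernel code, but the message format its framework validates can be exercised from user space. The sketch below, assuming the user-space `<linux/netlink.h>` macros (`NLMSG_SPACE`, `NLMSG_OK`, `NLMSG_NEXT`, `NLMSG_DATA`), builds messages the way `__nlmsg_put()` does and walks a buffer of them, checking each `struct nlmsghdr` before acting on the custom types. The numeric values of `IMP2_U_PID`, `IMP2_CLOSE`, and `IMP2_K_MSG`, the layout of `struct packet_info`, and the helper names `nl_build()` and `nl_dispatch()` are assumptions for illustration, not the contents of imp2.tar.gz. The read-write locking the kernel version needs is omitted because this sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical message types; custom types must not collide with the
 * reserved netlink control types below NLMSG_MIN_TYPE. */
#define IMP2_U_PID (NLMSG_MIN_TYPE + 0)   /* user process announces its ID */
#define IMP2_CLOSE (NLMSG_MIN_TYPE + 1)   /* user process is closing */
#define IMP2_K_MSG (NLMSG_MIN_TYPE + 2)   /* kernel -> user: packet_info */

/* Assumed payload layout: two IPv4 addresses in network byte order. */
struct packet_info {
    __u32 src;
    __u32 dest;
};

/* Mirrors __nlmsg_put(): write one message carrying `len` payload bytes
 * at `buf` (which must be 4-byte aligned). Returns the aligned number of
 * bytes used, or 0 if `cap` is too small. */
static size_t nl_build(void *buf, size_t cap, __u16 type, __u32 pid,
                       const void *payload, size_t len)
{
    struct nlmsghdr *nlh = buf;

    if (cap < NLMSG_SPACE(len))
        return 0;
    memset(buf, 0, NLMSG_SPACE(len));     /* zero header, payload, padding */
    nlh->nlmsg_len = NLMSG_LENGTH(len);   /* header + payload, unpadded */
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_seq = 0;
    nlh->nlmsg_pid = pid;
    if (len > 0)
        memcpy(NLMSG_DATA(nlh), payload, len);
    return NLMSG_SPACE(len);
}

/* Walk every well-formed message in buf[0..len) and act on the custom
 * types; *user_pid plays the role of user_proc.pid. Returns the number
 * of IMP2_U_PID / IMP2_CLOSE messages handled. */
static int nl_dispatch(void *buf, size_t len, __u32 *user_pid)
{
    struct nlmsghdr *nlh = buf;
    int rem = (int)len;
    int handled = 0;

    assert(user_pid != NULL);
    for (; NLMSG_OK(nlh, rem); nlh = NLMSG_NEXT(nlh, rem)) {
        if (nlh->nlmsg_type == IMP2_U_PID) {
            *user_pid = nlh->nlmsg_pid;   /* remember whom to send to */
            handled++;
        } else if (nlh->nlmsg_type == IMP2_CLOSE) {
            if (*user_pid == nlh->nlmsg_pid)
                *user_pid = 0;            /* stop sending */
            handled++;
        }
        /* other message types are skipped */
    }
    return handled;
}
```

With these definitions a `struct packet_info` message occupies `NLMSG_SPACE(sizeof(struct packet_info))` = 24 bytes: a 16-byte header plus 8 bytes of payload, which is already a multiple of the 4-byte `NLMSG_ALIGNTO`. The same arithmetic explains why `send_to_user()` allocates `NLMSG_SPACE(sizeof(*info))` bytes.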
Readers who have studied ip_queue.c or rtnetlink.c in the kernel source will recognize lines 03 to 18 and 31 to 38 of segment (2) as the framework that netlink sockets use to receive data in kernel space. Within this framework, all data is removed from the socket's receive queue and then checked for validity: a valid netlink message must begin with a header of type struct nlmsghdr. Here the author defines two custom message types: IMP2_U_PID (a message carrying the user-space process's ID) and IMP2_CLOSE (the user-space process is closing). Because the code may run on SMP systems, a read-write lock prevents different CPUs from accessing the critical section at the same time. kernel_receive() runs in soft-interrupt context.

(3) Intercepting IP datagrams
static unsigned int get_icmp(unsigned int hook, struct sk_buff **pskb,
                             const struct net_device *in,
                             const struct net_device *out,
                             int (*okfn)(struct sk_buff *))
{
    struct iphdr *iph = (*pskb)->nh.iph;
    struct packet_info info;

    if (iph->protocol == IPPROTO_ICMP) {    /* the transport-layer protocol is ICMP */
        read_lock_bh(&user_proc.lock);
        if (user_proc.pid != 0) {           /* a user-space process is listening */
            read_unlock_bh(&user_proc.lock);
            info.src = iph->saddr;          /* record the source address */
            info.dest = iph->daddr;         /* record the destination address */
            send_to_user(&info);            /* send the data */
        } else
            read_unlock_bh(&user_proc.lock);
    }
    return NF_ACCEPT;
}

(4) Sending data
static int send_to_user(struct packet_info *info)
{
    int ret;
    int size;
    unsigned char *old_tail;
    struct sk_buff *skb;
    struct nlmsghdr *nlh;
    struct packet_info *packet;

    size = NLMSG_SPACE(sizeof(*info));

    /* Allocate a new socket buffer; GFP_ATOMIC because this runs in interrupt context */
    skb = alloc_skb(size, GFP_ATOMIC);
    if (!skb)
        return -1;
    old_tail = skb->tail;

    /* Fill in the message header */
    nlh = NLMSG_PUT(skb, 0, 0, IMP2_K_MSG, size - sizeof(*nlh));
    packet = NLMSG_DATA(nlh);
    memset(packet, 0, sizeof(struct packet_info));

    /* Copy the data to be passed to user space */
    packet->src = info->src;
    packet->dest = info->dest;

    /* Compute the actual length of the message after alignment */
    nlh->nlmsg_len = skb->tail - old_tail;
    NETLINK_CB(skb).dst_groups = 0;

    read_lock_bh(&user_proc.lock);
    ret = netlink_unicast(nlfd, skb, user_proc.pid, MSG_DONTWAIT);   /* send the data */
    read_unlock_bh(&user_proc.lock);
    return ret;

nlmsg_failure:    /* if building the message fails, free the socket buffer */
    if (skb)
        kfree_skb(skb);
    return -1;
}

The macros used in segment (4) are defined as follows:
/* Round up to a multiple of NLMSG_ALIGNTO bytes */
#define NLMSG_ALIGN(len) (((len) + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1))
/* Length of a message including its header */
#define NLMSG_LENGTH(len) ((len) + NLMSG_ALIGN(sizeof(struct nlmsghdr)))
/* Aligned length of a message including its header */
#define NLMSG_SPACE(len) NLMSG_ALIGN(NLMSG_LENGTH(len))
/* Fill in the message header. This macro uses goto, so the calling
   function must define the label nlmsg_failure. */
#define NLMSG_PUT(skb, pid, seq, type, len) \
({ if (skb_tailroom(skb) < (int)NLMSG_SPACE(len)) goto nlmsg_failure; \
   __nlmsg_put(skb, pid, seq, type, len); })

static __inline__ struct nlmsghdr *
__nlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, int type, int len)
{
    struct nlmsghdr *nlh;
    int size = NLMSG_LENGTH(len);

    nlh = (struct nlmsghdr *)skb_put(skb, NLMSG_ALIGN(size));
    nlh->nlmsg_type = type;
    nlh->nlmsg_len = size;
    nlh->nlmsg_flags = 0;
    nlh->nlmsg_pid = pid;
    nlh->nlmsg_seq = seq;
    return nlh;
}

/* Skip the header to reach the payload */
#define NLMSG_DATA(nlh) ((void *)(((char *)nlh) + NLMSG_LENGTH(0)))
/* Access the netlink control block of a socket buffer */
#define NETLINK_CB(skb) (*(struct netlink_skb_parms *)&((skb)->cb))

To run the example, first compile the imp2_k.c module and load it into the kernel with insmod. Then run the compiled imp2_u program, which displays the source and destination addresses of the ICMP datagrams the machine is currently receiving. Press Ctrl+C to terminate the user-space process; restarting it afterward causes no problems.

4 Summary

For each runtime environment of kernel code, this article has described methods of communication between kernel space and user space and analyzed how well they work in practice. It recommends netlink sockets for communication between interrupt context and user-space processes, because netlink sockets were designed for exactly this kind of communication.

References
Linux kernel source code, version 2.4 and later
www.netfilter.org
RFC 3549
About the author: Chen Xin is an undergraduate (enrolled in 2000) in the Department of Electronic Engineering at Nanjing University of Posts and Telecommunications and a free software enthusiast. He enjoys reading the Linux kernel source code and is currently analyzing the networking modules of Linux. Contact: chex@njupt.edu.cn