An Introduction to the Linux Kernel Wait Queue Mechanism (reposted from http:plinux.org)

xiaoxiao2021-03-06 47

I believe many readers have written socket programs. When we open a socket and then read from it, the read blocks if no data is available (assuming O_NONBLOCK has not been set) and returns only once data has arrived. The Linux kernel has a data structure that helps implement this behavior: the wait queue, which this article introduces. Wait queues are used very widely inside the kernel, and many kernel facilities are implemented on top of them, so wait_queue is one of the kernel's basic data structures.

I will first describe how wait_queue is used and walk through an example. After that I will trace the wait_queue code to see how it is implemented.

Before starting, I want to point out a difference between user space and kernel space in Linux. We know that Linux is a multitasking environment in which many users can run many programs at once. That is the user's point of view. From the kernel's point of view there is no such multitasking: the kernel is single-threaded. While a piece of kernel code is executing, it is the only kernel code executing in the system; no other kernel code runs at the same time. This applies to a single processor, of course. How it works on SMP I am not sure.

Many of you have probably written programs for Windows 3.1. In that environment each program has to give up the CPU voluntarily so that other programs can run. If a program contains while (1); the whole system stops there. This kind of multitasking is called non-preemptive: it works only because every program cooperates. User space under Linux is preemptive. Each process does whatever it likes, and even a while (1); in your program does not stop the system. When its time slice is used up, the system suspends your program and lets other programs run.

That is the situation in user space. Kernel code is in the same position as a Windows 3.1 program: it must give up the CPU itself at appropriate points. If you put while (1); into the kernel, the system hangs just as Windows 3.1 would. I have not actually modified the kernel to try this. If you are curious, try it, and if you get a different result, please let me know.

Suppose we create a buffer in the kernel, and users can read from it or write to it through system calls such as read() and write(). What should we do if a user writes to the buffer when it is already full? The first option is to return an error to the user saying that the buffer is full and nothing more can be written.

The second option is to block the user's request until someone reads from the buffer and frees some space, and then let the write proceed. But how do we block the request? Would you write something like this?

    while (is_full);
    write_to_buffer();

Think about what happens if you do. First, the kernel keeps spinning in this while loop. Second, while it spins there it cannot do anything to maintain the system, so the system effectively hangs. Here is_full is a variable. You could make is_full a function instead and do other work inside it so that the kernel keeps operating and the system does not hang. That is one way. With wait_queue, however, the code is cleaner and easier to understand:

    struct wait_queue *wq = NULL;    /* global variable */

    while (is_full) {
        interruptible_sleep_on(&wq);
    }
    write_to_buffer();

interruptible_sleep_on(&wq) puts the current process, that is, the process that wants to write to the buffer, onto the wait queue wq. Inside, interruptible_sleep_on() ends up calling schedule(), which picks another process to run so that the system keeps working. Once interruptible_sleep_on() has been called, the writing process is blocked. When does it resume? It was blocked because the buffer had no room, so it can continue once someone has read data out of the buffer. Waking the process up is therefore the job of the code that reads the buffer:

    extern struct wait_queue *wq;

    if (!is_empty) {
        read_from_buffer();
        wake_up_interruptible(&wq);
    }
    ....

This code belongs in the read path. Once the buffer has free space again, we call wake_up_interruptible(&wq) to wake every process sleeping on wq. Note that I said every process: if ten processes are sleeping on wq, all ten are woken.

Which of them runs first is up to the scheduler. Because all ten are woken, suppose A runs first and happens to fill the buffer again. What should the other nine do? This is why the write path tests is_full in a while loop rather than once: each woken process checks again whether the buffer is full, and if it is, goes back to sleep on wq.

That is how wait_queue is used. Simple, isn't it?
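The wake-everyone-then-recheck behavior described above can be tried out without a kernel. What follows is only a user-space model in C, not kernel code. The names struct model, model_wake_up(), model_run() and sleepers_after_read() are made up for this illustration, and the "scheduler" here simply runs the woken processes in index order, which the real scheduler does not promise.

```c
#include <assert.h>

#define MAX_PROCS 16

enum proc_state { ASLEEP, RUNNABLE, DONE };

/* Hypothetical model: writers waiting on one queue for buffer space. */
struct model {
    enum proc_state proc[MAX_PROCS];
    int nproc;
    int free_slots;              /* room left in the buffer */
};

/* Plays the role of wake_up_interruptible(&wq): every sleeper becomes runnable. */
void model_wake_up(struct model *m)
{
    for (int i = 0; i < m->nproc; i++)
        if (m->proc[i] == ASLEEP)
            m->proc[i] = RUNNABLE;
}

/*
 * Let each runnable writer run once. Each one re-tests the condition,
 * as `while (is_full) interruptible_sleep_on(&wq);` does: it writes if
 * there is room and goes back to sleep if the buffer is full again.
 * Returns the number of writers left asleep.
 */
int model_run(struct model *m)
{
    int asleep = 0;

    for (int i = 0; i < m->nproc; i++) {
        if (m->proc[i] == RUNNABLE) {
            if (m->free_slots > 0) {
                m->free_slots--;         /* room available: the write succeeds */
                m->proc[i] = DONE;
            } else {
                m->proc[i] = ASLEEP;     /* full again: back onto the queue */
            }
        }
        if (m->proc[i] == ASLEEP)
            asleep++;
    }
    return asleep;
}

/* n writers are asleep, then a reader frees `freed` slots and wakes them all. */
int sleepers_after_read(int n, int freed)
{
    assert(n >= 0 && n <= MAX_PROCS);

    /* Unnamed array entries are zero-initialized, and ASLEEP == 0. */
    struct model m = { .nproc = n, .free_slots = freed };

    model_wake_up(&m);
    return model_run(&m);
}
```

With ten sleepers and one freed slot, sleepers_after_read(10, 1) leaves nine processes asleep: all ten were woken, but only the first to run found room. An if instead of a while in the write path would have let the other nine write into a full buffer.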

Now for the functions that wait_queue provides. To recap: the wait_queue should be a global variable, say wq. Any process that wants to sleep on it calls sleep_on() or one of its relatives, and waking the processes on wq only takes a call to wake_up() or one of its relatives.

As far as I know, wait_queue provides four functions. Two put a process on a wait_queue:

    sleep_on(struct wait_queue **wq);
    interruptible_sleep_on(struct wait_queue **wq);

The other two wake processes on a wait_queue:

    wake_up(struct wait_queue **wq);
    wake_up_interruptible(struct wait_queue **wq);

Why are there two groups? Consider the interruptible group first. When we read() from a socket that has no data, the process blocks. If we press Ctrl-C at that point, read() fails with EINTR. Blocking I/O of this kind is implemented with interruptible_sleep_on(). In other words, a process put on a wait_queue with interruptible_sleep_on() is automatically woken when someone sends it a signal. A process put on wq with sleep_on() ignores every signal and stays asleep until someone calls wake_up().

Sleeping comes in two flavors, and so does waking. wake_up_interruptible() wakes only the processes on wq that went to sleep through interruptible_sleep_on(). wake_up() wakes every process on wq, including those that used interruptible_sleep_on().

One thing needs care: any function that calls interruptible_sleep_on() or sleep_on() must be reentrant. Put simply, reentrant means that the function does not modify any global variable, or at least does not depend on any global variable keeping its value across the call to interruptible_sleep_on() or sleep_on(). When the function calls sleep_on(), the current process is suspended, and another process may then call the same function. If the first process stored something in a global variable that it needs when it resumes, and the second process comes in and changes that variable, the first process finds different information there when it wakes up. The result is unlikely to be what we intended. In fact, every function on the call chain from the process's entry into the kernel down to this function should be reentrant; otherwise the problem described above is very likely to occur.
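The reentrancy hazard can be reproduced in ordinary user-space C. The sketch below is an illustration under assumptions, not kernel code: label_shared() and label_private() are hypothetical functions, and "process 1 sleeps while process 2 enters" is modeled simply as two calls in a row.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Not reentrant: every caller shares the one static buffer. */
const char *label_shared(int pid)
{
    static char buf[32];

    snprintf(buf, sizeof buf, "request from pid %d", pid);
    return buf;
}

/* Reentrant: all state lives in storage supplied by the caller. */
const char *label_private(int pid, char *buf, size_t len)
{
    snprintf(buf, len, "request from pid %d", pid);
    return buf;
}

/*
 * Process 1 calls the function and then "sleeps"; process 2 enters the
 * same function meanwhile. Returns 1 if process 1 still sees its own
 * data afterwards, 0 if it was overwritten.
 */
int first_caller_survives_shared(void)
{
    const char *first = label_shared(1);

    (void)label_shared(2);               /* second caller overwrites buf */
    return strcmp(first, "request from pid 1") == 0;
}

int first_caller_survives_private(void)
{
    char a[32], b[32];
    const char *first = label_private(1, a, sizeof a);

    (void)label_private(2, b, sizeof b); /* second caller has its own buffer */
    return strcmp(first, "request from pid 1") == 0;
}
```

The shared version loses the first caller's data and the private version keeps it. In the kernel the second call would come from a different process while the first one is inside sleep_on(), but the effect on the shared variable is the same.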

Since wait_queue is provided by the kernel, the example has to run inside the kernel. The example is a simple driver that maintains a buffer of 8192 bytes and supports read and write. When the buffer is empty, read() returns immediately; it does no blocking I/O. On write(), if the buffer is full, or the data being written is larger than the buffer, the caller is blocked until someone reads data out of the buffer. The write path uses wait_queue to implement this blocking I/O. I will write the driver as a module so that it is easy to load into the kernel.

Step 1. The driver is a simple character device driver, so we first create a character device under /dev. For the major number we pick one that is unlikely to be in use, such as 54, and we use 0 as the minor number. Run the following command:

    mknod /dev/buf c 54 0

mknod is the command for creating special files. /dev/buf means the file is called buf and lives under /dev. c says it is a character device, 54 is its major number, and 0 is its minor number. How to write a character device driver is something I will cover another time. This article is about wait_queue, so I will not go further into drivers here.

Step 2. We write the module. Here is the code, buf.c:

    #define MODULE
    /* The five header names were lost from the source text; these are
       the headers this code uses on a 2.2 kernel. */
    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/fs.h>
    #include <linux/sched.h>
    #include <asm/uaccess.h>

    #define BUF_LEN 8192

    int flag;             /* when rp == wp: flag = 0 means empty, flag = 1 means non-empty */
    char *wp, *rp;
    char buffer[BUF_LEN];

    EXPORT_NO_SYMBOLS;    /* do not export anything */

    static ssize_t buf_read(struct file *filp, char *buf, size_t count, loff_t *ppos)
    {
        return count;
    }

    static ssize_t buf_write(struct file *filp, const char *buf, size_t count, loff_t *ppos)
    {
        return count;
    }

    static int buf_open(struct inode *inode, struct file *filp)
    {
        MOD_INC_USE_COUNT;
        return 0;
    }

    static int buf_release(struct inode *inode, struct file *filp)
    {
        MOD_DEC_USE_COUNT;
        return 0;
    }

    static struct file_operations buf_fops = {
        NULL,           /* lseek */
        buf_read,
        buf_write,
        NULL,           /* readdir */
        NULL,           /* poll */
        NULL,           /* ioctl */
        NULL,           /* mmap */
        buf_open,       /* open */
        NULL,           /* flush */
        buf_release,    /* release */
        NULL,           /* fsync */
        NULL,           /* fasync */
        NULL,           /* check_media_change */
        NULL,           /* revalidate */
        NULL            /* lock */
    };

    static int buf_init(void)
    {
        int result;

        flag = 0;
        wp = rp = buffer;    /* both pointers start at the beginning of the buffer */
        result = register_chrdev(54, "buf", &buf_fops);
        if (result < 0) {
            printk("<5>BUF: cannot get major 54\n");
            return result;
        }
        return 0;
    }

    static void buf_clean(void)
    {
        if (unregister_chrdev(54, "buf")) {
            printk("<5>BUF: unregister_chrdev error\n");
        }
    }

    int init_module(void)
    {
        return buf_init();
    }

    void cleanup_module(void)
    {
        buf_clean();
    }

For details on writing modules, please refer to other documents. The most important point is that a module must have the two functions init_module() and cleanup_module(). I do the initialization and the finalization in these two functions. Let me explain them in turn.

init_module() only calls buf_init(). The code of buf_init() could just as well be written directly in init_module(); I simply think it reads better this way.

    flag = 0;
    wp = rp = buffer;
    result = register_chrdev(54, "buf", &buf_fops);
    if (result < 0) {
        printk("<5>BUF: cannot get major 54\n");
        return result;
    }
    return 0;

What buf_init() does is register a character device driver. Before registering one, you must prepare a variable of type struct file_operations. file_operations contains a set of function pointers. The driver's author writes these functions and places their addresses in the structure, so that when a user reads the device, for example, the kernel can call the driver's corresponding function. Put briefly, a character device driver is a variable of this file_operations type. file_operations is defined in <linux/fs.h>. Its layout in kernel 2.2.1 differs slightly from earlier versions, which is something to watch out for.

register_chrdev(), as its name says, registers a character device driver. The first parameter is the device's major number. The second is its name, which you may choose freely. The third is the address of a file_operations variable. init_module() must return 0 for the module to be loaded.

In cleanup_module() we likewise call buf_clean(), which does the unregistering:

    if (unregister_chrdev(54, "buf")) {
        printk("<5>BUF: unregister_chrdev error\n");
    }

This clears the entry originally recorded in the device driver table. The first parameter is the major number. The second is the driver's name, which must be the same name that was given to register_chrdev().
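The idea that a driver is essentially a table of function pointers that the kernel looks up by major number can be modeled in user space. This is a sketch under assumptions, not the kernel's implementation: struct model_fops is a cut-down stand-in for file_operations, and model_register(), model_unregister(), model_read() and the -1 error value are all invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct file_operations. */
struct model_fops {
    long (*read)(char *buf, size_t count);
    long (*write)(const char *buf, size_t count);
    int  (*open)(void);
    int  (*release)(void);
};

#define MODEL_MAX_MAJOR 256

/* One slot per major number, playing the role of the device driver table. */
static const struct model_fops *model_table[MODEL_MAX_MAJOR];

/* Record fops under `major`; fails if the number is out of range or taken. */
int model_register(unsigned major, const struct model_fops *fops)
{
    if (major >= MODEL_MAX_MAJOR || model_table[major] != NULL)
        return -1;
    model_table[major] = fops;
    return 0;
}

/* Clear the slot again; fails if nothing was registered there. */
int model_unregister(unsigned major)
{
    if (major >= MODEL_MAX_MAJOR || model_table[major] == NULL)
        return -1;
    model_table[major] = NULL;
    return 0;
}

/* What the "kernel" side does for a read() on a device with this major. */
long model_read(unsigned major, char *buf, size_t count)
{
    const struct model_fops *f =
        (major < MODEL_MAX_MAJOR) ? model_table[major] : NULL;

    if (f == NULL || f->read == NULL)
        return -1;               /* no driver, or the driver has no read method */
    return f->read(buf, count);
}

/* Stub methods that behave like the first version of buf_read()/buf_write(). */
static long stub_read(char *buf, size_t count)
{
    (void)buf;
    return (long)count;
}

static long stub_write(const char *buf, size_t count)
{
    (void)buf;
    return (long)count;
}

const struct model_fops stub_fops = { stub_read, stub_write, NULL, NULL };
```

After model_register(54, &stub_fops), a call to model_read(54, NULL, 10) reaches stub_read() through the table. Entries left NULL, such as open and release here, mean the driver does not supply that operation.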

The file_operations table that this driver provides is:

    static struct file_operations buf_fops = {
        NULL,           /* lseek */
        buf_read,
        buf_write,
        NULL,           /* readdir */
        NULL,           /* poll */
        NULL,           /* ioctl */
        NULL,           /* mmap */
        buf_open,       /* open */
        NULL,           /* flush */
        buf_release,    /* release */
        NULL,           /* fsync */
        NULL,           /* fasync */
        NULL,           /* check_media_change */
        NULL,           /* revalidate */
        NULL            /* lock */
    };

Here we implement only buf_read(), buf_write(), buf_open() and buf_release(). When a user calls open(), buf_open() is what finally gets called inside the kernel. Likewise, close(), read() and write() end up in buf_release(), buf_read() and buf_write().

First, buf_open():

    static int buf_open(struct inode *inode, struct file *filp)
    {
        MOD_INC_USE_COUNT;
        return 0;
    }

buf_open() is very simple: it just increments the module's use count. This prevents the module from being removed from the kernel while it is in use. Correspondingly, buf_release() decrements the use count. It is like opening a file: every open() should have a matching close(). As long as the module's use count is not 0, the module cannot be removed from the kernel.

    static int buf_release(struct inode *inode, struct file *filp)
    {
        MOD_DEC_USE_COUNT;
        return 0;
    }

Next come buf_read() and buf_write():

    static ssize_t buf_read(struct file *filp, char *buf, size_t count, loff_t *ppos)
    {
        return count;
    }

    static ssize_t buf_write(struct file *filp, const char *buf, size_t count, loff_t *ppos)
    {
        return count;
    }

For now they simply return the number of bytes the user asked to read or write. Let me explain the parameters. filp is a pointer to a file structure; it represents the file structure of the buf file we created under /dev. When we call read() or write(), we must supply a buffer and the number of bytes to read or write: buf is that buffer and count is that length. ppos is the current position within the file.

This value matters for ordinary files, where it is tied to lseek(). Since this is a driver, ppos is not used here. One thing to be careful about is that the buf parameter is an address, and it is a user-space address. When the kernel calls buf_read(), the code is running in kernel space, so you cannot simply read or write the data behind buf directly. The FS register has to be switched first.

Makefile:

    P = buf
    OBJ = buf.o
    INCLUDE = -I/usr/src/linux/include/linux
    CFLAGS = -D__KERNEL__ -DMODVERSIONS -DEXPORT_SYMTAB -O $(INCLUDE) -include /usr/src/linux/include/linux/modversions.h
    CC = gcc

    $(P): $(OBJ)
    	ld -r $(OBJ) -o $(P).o

    .c.o:
    	$(CC) -c $(CFLAGS) $<

    clean:
    	rm -f *.o *~ $(P)

With this Makefile in place, running make produces buf.o. Use insmod to load buf.o into the kernel. You have probably used the /dev/zero device: reading it returns nothing but empty content, and anything written to it sinks like a stone into the sea. You can now compare the buf and zero devices; the two should behave very similarly.

Step 3. In step 2 we implemented a device driver that behaves like zero. Now we modify it to use wait_queue. First we add a global variable, write_wq, initialized to NULL:

    struct wait_queue *write_wq = NULL;

Then buf_read() is rewritten as follows:

    static ssize_t buf_read(struct file *filp, char *buf, size_t count, loff_t *ppos)
    {
        int num, nRead;

        nRead = 0;
        if ((wp == rp) && !flag) {    /* buffer is empty */
            return 0;
        }

        /* The middle of this listing was lost from the source text; it is
           restored here as the mirror image of buf_write(). */
    repeate_reading:
        if (rp < wp) {
            num = min(count, (int)(wp - rp));
        } else {
            num = min(count, (int)(buffer + BUF_LEN - rp));
        }
        copy_to_user(buf, rp, num);
        buf += num;                   /* advance so a second chunk does not overwrite the first */
        rp += num;
        count -= num;
        nRead += num;
        if (rp == (buffer + BUF_LEN))
            rp = buffer;              /* wrap around */
        if ((rp != wp) && (count > 0))
            goto repeate_reading;

        flag = 0;
        wake_up_interruptible(&write_wq);
        return nRead;
    }

As I mentioned earlier, the address in buf is a user-space address.

In kernel space you cannot write data into buf, or read data from it, the way you would with an ordinary buffer. Linux uses the FS register to switch between kernel space and user space. To do it by hand you could write:

    mm_segment_t fs;

    fs = get_fs();
    set_fs(USER_DS);
    write_data_to_buf(buf);
    set_fs(fs);

That is, switch to user space, write the data into buf, and remember to switch back to kernel space afterwards. Doing this by hand is tedious, so Linux provides several functions that move data directly between the two spaces. As you have seen, copy_to_user() is one of them:

    copy_to_user(to, from, n);
    copy_from_user(to, from, n);

As the names suggest, copy_to_user() copies data into a user-space buffer: it copies n bytes from from to to. Likewise, copy_from_user() copies n bytes from the user-space address from into the kernel buffer to. In earlier kernels these two functions were called memcpy_tofs() and memcpy_fromfs(). I do not know why the names changed by kernel 2.2.1, and I have not checked whether the code itself changed or in exactly which version the change happened. I only know that 2.0.36 still had the old names and 2.2.1 has the new ones. Both functions are macros defined in <asm/uaccess.h>; remember to include that header before using them.

The code of buf_read() should not be hard to follow. You may have noticed the line near the end of buf_read():

    wake_up_interruptible(&write_wq);

write_wq is the queue that holds processes that wanted to write to the buffer but found it full. This line wakes the processes on that queue. When the queue is empty, that is, when write_wq is NULL, wake_up_interruptible() does not cause an error.
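The rp/wp/flag bookkeeping that the driver's read path and its matching write path rely on can be exercised in user space. The sketch below rests on several assumptions: memcpy() stands in for copy_to_user() and copy_from_user(), BUF_LEN is shrunk to 8 so that wrap-around is easy to hit, ring_write() returns 0 where the driver would sleep, and the names ring_read(), ring_write() and min_sz() are invented here. It also changes flag only when bytes were actually moved, a small guard added for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 8                /* tiny on purpose so wrap-around is easy to hit */

static char buffer[BUF_LEN];
static char *wp = buffer, *rp = buffer;
static int flag;                 /* when rp == wp: 0 = empty, 1 = full */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Copy up to count bytes out of the ring; returns bytes read (0 if empty). */
size_t ring_read(char *dst, size_t count)
{
    size_t nread = 0;

    if (rp == wp && !flag)
        return 0;                                   /* empty */
    do {
        /* contiguous data: up to wp, or up to the end of the array */
        size_t avail = (rp < wp) ? (size_t)(wp - rp)
                                 : (size_t)(buffer + BUF_LEN - rp);
        size_t num = min_sz(count, avail);

        memcpy(dst + nread, rp, num);
        rp += num;
        count -= num;
        nread += num;
        if (rp == buffer + BUF_LEN)
            rp = buffer;                            /* wrap around */
    } while (rp != wp && count > 0);
    if (nread > 0)
        flag = 0;                                   /* cannot be full any more */
    return nread;
}

/* Copy up to count bytes into the ring; returns bytes written (0 if full). */
size_t ring_write(const char *src, size_t count)
{
    size_t nwritten = 0;

    if (rp == wp && flag)
        return 0;                                   /* full: the driver would sleep here */
    do {
        /* contiguous room: up to rp, or up to the end of the array */
        size_t room = (rp > wp) ? (size_t)(rp - wp)
                                : (size_t)(buffer + BUF_LEN - wp);
        size_t num = min_sz(count, room);

        memcpy(wp, src + nwritten, num);
        wp += num;
        count -= num;
        nwritten += num;
        if (wp == buffer + BUF_LEN)
            wp = buffer;                            /* wrap around */
    } while (wp != rp && count > 0);
    if (nwritten > 0)
        flag = 1;                                   /* cannot be empty any more */
    return nwritten;
}
```

Writing six bytes, reading four, and writing six more makes wp wrap around and catch up with rp, at which point flag is what distinguishes a full buffer from an empty one.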

The modified buf_write() looks like this:

    static ssize_t buf_write(struct file *filp, const char *buf, size_t count, loff_t *ppos)
    {
        int num, nWrite;

        nWrite = 0;
        while ((wp == rp) && flag) {
            interruptible_sleep_on(&write_wq);
        }

    repeate_writing:
        if (rp > wp) {
            num = min(count, (int)(rp - wp));
        } else {
            num = min(count, (int)(buffer + BUF_LEN - wp));
        }
        copy_from_user(wp, buf, num);
        buf += num;                   /* advance so a second chunk does not repeat the first */
        wp += num;
        count -= num;
        nWrite += num;
        if (wp == (buffer + BUF_LEN)) {
            wp = buffer;              /* wrap around */
        }
        if ((wp != rp) && (count > 0)) {
            goto repeate_writing;
        }
        flag = 1;
        return nWrite;
    }

The step that puts the process onto write_wq is inside buf_write(). When the buffer is full, the process is put to sleep on write_wq:

    while ((wp == rp) && flag) {
        interruptible_sleep_on(&write_wq);
    }

That completes the changes. Run make again and load buf.o into the kernel with insmod. Then let's test whether the driver really does blocking I/O:

    # cd /dev
    # ls -l ~/WWW-HOWTO
    -rw-r--r--   1 root     root        23910 Apr 14 16:50 /root/WWW-HOWTO
    # cat ~/WWW-HOWTO > buf

At this point the command should block. Now open another shell:

    # cd /dev
    # cat buf
    ... (contents of WWW-HOWTO) ... omitted ...

The contents of WWW-HOWTO appear, and the shell that was blocked returns to its prompt. When the test is finished, the module can be removed from the kernel with rmmod.

That is how wait_queue is used; I hope it helps. For some people, knowing how to use something is enough. Others also want to know how it is built, and I am one of them. So below I describe the implementation of wait_queue. If you are not interested in the implementation, feel free to skip it.

struct wait_queue is defined in <linux/wait.h>. Let's first see what it looks like:

    struct wait_queue {
        struct task_struct *task;
        struct wait_queue *next;
    };

Very simple.

The structure has only two fields: a pointer to a task_struct and a pointer to a wait_queue. Clearly a wait_queue is a linked list, and in fact a circular one. The task_struct pointer identifies the process that called sleep_on() or one of the related functions. In Linux every process is described by a task_struct. task_struct is a large structure and we will not go into it here. Linux has a global variable called current that points to the task_struct of the process currently executing. This is how the kernel knows which process made the call when a process issues a system call and execution switches into the kernel.

Now let's see how interruptible_sleep_on() and sleep_on() work. Both functions are in /usr/src/linux/kernel/sched.c.

    void interruptible_sleep_on(struct wait_queue **p)
    {
        SLEEP_ON_VAR
        current->state = TASK_INTERRUPTIBLE;
        SLEEP_ON_HEAD
        schedule();
        SLEEP_ON_TAIL
    }

    void sleep_on(struct wait_queue **p)
    {
        SLEEP_ON_VAR
        current->state = TASK_UNINTERRUPTIBLE;
        SLEEP_ON_HEAD
        schedule();
        SLEEP_ON_TAIL
    }

Notice how similar the two functions are. Their only difference is the line current->state = .... As we said earlier, interruptible_sleep_on() can be interrupted by a signal, so it sets current->state to TASK_INTERRUPTIBLE. sleep_on() cannot be interrupted, so it sets current->state to TASK_UNINTERRUPTIBLE. From here on we look only at interruptible_sleep_on(), since that one line is the only difference.

In sched.c, SLEEP_ON_VAR is a macro that simply declares two local variables:

    #define SLEEP_ON_VAR \
        unsigned long flags; \
        struct wait_queue wait;

As I said, current points to the task_struct of the process currently executing, so current->state = TASK_INTERRUPTIBLE applies to the process that called interruptible_sleep_on().

SLEEP_ON_HEAD stores current in the wait variable declared by SLEEP_ON_VAR and adds wait to the wait_queue list that was passed to interruptible_sleep_on().

    #define SLEEP_ON_HEAD \
        wait.task = current; \
        write_lock_irqsave(&waitqueue_lock, flags); \
        __add_wait_queue(p, &wait); \
        write_unlock(&waitqueue_lock);

wait is the local variable declared by SLEEP_ON_VAR. Its task field is set to the process that called interruptible_sleep_on(). waitqueue_lock is a spin lock. It makes sure that there is only one writer at a time, while several readers may hold it at once. In other words, waitqueue_lock guarantees mutually exclusive access to the critical section:

    unsigned long flags;

    write_lock_irqsave(&waitqueue_lock, flags);
    ... critical section ...

Anyone who has studied operating systems knows what a critical section is for; if not, please consult an OS textbook. The critical section does only one thing: it puts the local variable wait on the wait_queue list p. p is the argument the caller passed to interruptible_sleep_on(), of type struct wait_queue **. The critical section consists of a single call to __add_wait_queue():

    extern inline void __add_wait_queue(struct wait_queue **p, struct wait_queue *wait)
    {
        wait->next = *p ? : WAIT_QUEUE_HEAD(p);
        *p = wait;
    }

__add_wait_queue() is an inline function defined in <linux/sched.h>. WAIT_QUEUE_HEAD() is a very interesting macro that we will discuss in a moment. For now it is enough to know that it returns the head of the wait_queue list. So __add_wait_queue() puts wait at the front of the wait_queue list that p belongs to. But remember that in the example above we initialized write_wq to NULL, which means *p is NULL. What does wait->next = WAIT_QUEUE_HEAD(p) mean when *p is NULL?

Let's look at the WAIT_QUEUE_HEAD() macro, defined in <linux/wait.h>:

    #define WAIT_QUEUE_HEAD(x) ((struct wait_queue *)((x)-1))

x has type struct wait_queue **. Because what it points to is a pointer, the element size is 4 bytes on a 32-bit machine. So if x is 100, ((x)-1) is 96.

WAIT_QUEUE_HEAD(x) therefore returns 96, cast to struct wait_queue *. Note that the wait_queue * we originally declared occupies only addresses 100-104.

Now WAIT_QUEUE_HEAD(x) returns 96 directly, yet 96-100 is memory we never allocated. The neat part is that this never matters. x is the head of a wait_queue list, and next is the second field of struct wait_queue, so the next field of the imaginary structure at 96 lies at 100-104, which is exactly the head pointer itself. We never touch 96-100; we only ever use the memory at 100-104. This is one of the stranger aspects of how wait_queue is implemented.

There are three pictures below. The first shows that we have declared a wait_queue * variable at address 100, together with a wait_queue variable called wait. The second shows the result of calling interruptible_sleep_on(). For the third, we declare another wait_queue named ano_wait; the picture shows the list after ano_wait has been added to it.

http://linuxfab.cx/columns/10/wqq.gif

In interruptible_sleep_on(), once SLEEP_ON_HEAD has run, the current process is on the wait_queue. Next comes the call to schedule(), the function that does the scheduling. Because current->state is no longer TASK_RUNNING, schedule() takes the process off the run queue and picks another process to run. Once schedule() has been called, current cannot continue. When it is later woken up, it resumes at SLEEP_ON_TAIL. SLEEP_ON_TAIL does the opposite of SLEEP_ON_HEAD: it removes the process from the wait_queue.

    #define SLEEP_ON_TAIL \
        write_lock_irq(&waitqueue_lock); \
        __remove_wait_queue(p, &wait); \
        write_unlock_irqrestore(&waitqueue_lock, flags);

Like SLEEP_ON_HEAD, SLEEP_ON_TAIL uses the spin lock to wrap a critical section.

    extern inline void __remove_wait_queue(struct wait_queue **p, struct wait_queue *wait)
    {
        struct wait_queue *next = wait->next;
        struct wait_queue *head = next;
        struct wait_queue *tmp;

        while ((tmp = head->next) != wait) {
            head = tmp;
        }
        head->next = next;
    }

__remove_wait_queue() is an inline function, also defined in <linux/sched.h>. It removes wait from the wait_queue list p. Starting at the entry after wait, it walks around the circular list until it finds the entry whose next is wait, then links that entry to wait's successor.
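The list manipulation above can be run in user space to watch the imaginary head at work. This is a sketch under assumptions rather than the kernel source: the functions are renamed add_wait_queue_model() and remove_wait_queue_model(), wait_queue_length() is a helper invented for the demonstration, the GNU `?:` shorthand is written out in full, and WAIT_QUEUE_HEAD steps back with offsetof() instead of `(x)-1`. On the usual ABIs the two computations give the same address, because the field in front of next is exactly one pointer wide.

```c
#include <assert.h>
#include <stddef.h>

struct task;                     /* opaque stand-in for struct task_struct */

struct wait_queue {
    struct task *task;
    struct wait_queue *next;
};

/*
 * Step back from the head pointer to the start of an imaginary
 * struct wait_queue whose `next` field *is* that head pointer.
 */
#define WAIT_QUEUE_HEAD(x) \
    ((struct wait_queue *)((char *)(x) - offsetof(struct wait_queue, next)))

/* Same logic as __add_wait_queue(): insert at the front of the ring. */
void add_wait_queue_model(struct wait_queue **p, struct wait_queue *wait)
{
    wait->next = *p ? *p : WAIT_QUEUE_HEAD(p);
    *p = wait;
}

/* Same logic as __remove_wait_queue(): find the predecessor and unlink. */
void remove_wait_queue_model(struct wait_queue **p, struct wait_queue *wait)
{
    struct wait_queue *next = wait->next;
    struct wait_queue *head = next;
    struct wait_queue *tmp;

    (void)p;                     /* unused, as in the original */
    while ((tmp = head->next) != wait)
        head = tmp;
    head->next = next;
}

/* Number of real entries on the ring; the imaginary head is not counted. */
int wait_queue_length(struct wait_queue **p)
{
    int n = 0;
    struct wait_queue *w;

    if (*p == NULL)
        return 0;
    for (w = *p; w != WAIT_QUEUE_HEAD(p); w = w->next)
        n++;
    return n;
}
```

To keep the model well-defined C, the head pointer should live inside a real structure, for example `struct wait_queue anchor = { NULL, NULL };` with `&anchor.next` passed as p, so that WAIT_QUEUE_HEAD(p) is simply &anchor. The kernel skips that step and lets the macro point into memory it never reads or writes. Note also that after the last entry is removed, *p is left pointing at the imaginary head rather than at NULL.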
By now you should have a clear picture of how interruptible_sleep_on() and sleep_on() work, and of how wait_queue implements blocking I/O. Next, let's look at how wake_up_interruptible() and wake_up() are implemented. Both are actually macros, defined in <linux/sched.h>.

When reposting, please cite the original address: https://www.9cbs.com/read-39636.html
