Design of Linux's preferred read and write lock

xiaoxiao2021-03-06 94

First, the purpose of this article

There are two basic mechanisms that implement data mutual exclusion under Linux, including Semaphore, Spinlock. The read write lock here is a variant of the spin lock. The difference between the general spin lock is that the spin lock can only have a process to enter the critical area, and the read and write locks If the process is read, you can have multiple processes to enter the critical area, and if you are writing, there is only one.

A type of reading and writing lock has been implemented in the release version of the LINUX kernel source code, is the reader-priority read and write lock. (In this design, the request for reading can be easier to enter the critical regions, while the request written is often easily blocked, this will be analyzed later, and I want to design the read and write lock, it is written process For the preferred object, if there is a written request, it will respond within the fastest time being allowed. The benefit is a server (such as a general public server) accessed by many clients (such as a general public server), if the administrator modifies some content or configuration of the server, then its timelyness may not meet . This is sometimes not acceptable.

Second, Linux existing read and write lock

Let me first analyze the implementation of the reading and writing lock in the Linux kernel source code, so it can be easily understood by the designs of the next writer priority read and write lock.

First introduce a data structure, which is an important role in reading and writing.

(Note: All of the kernel source code from below is from Linux 2.4.17. If you are different from your existing kernel source code, please make some corresponding changes, the principle is not changed) Typedf struct {volatile unsigned int int LOCK; #IF spinlock_debug unsigned magic; #ndif} rwlock_t;

The MAGIC here is used to debug, and the LOCK is allowed to be able to add a number of read locks.

This code is defined in the linux / include / asm-i386 / spinlock.h in the read_lock and write_lockstatic inline void read_lock (rwlock_t * rw) {#if SPINLOCK_DEBUG if BUG () (rw-> magic = RWLOCK_MAGIC!); #Endif __build_read_lock ( rw, "__read_lock_failed");} static inline void write_lock (rwlock_t * rw) {#if SPINLOCK_DEBUG if (rw-> magic = RWLOCK_MAGIC) BUG ();! #endif __build_write_lock (rw, "__write_lock_failed");}

Note that there are two parameters here, one is that RW is the number of read locks, and the later one parameter is a function of failing if the lock failed.

In this case, __build_write_lock_ptr or __build_write_lock_ptr or __build_write_lock_ptr, which is called __build_read_lock_ptr or __build_read_lock_ptr, where the delder is called the operation instruction address address of the call parameter. We only look at the Const class here.

#define __build_write_lock_const (rw, helper) /

ASM Volatile (LOCK "SUBL $" rw_lock_bias_str ", (% 0) / N / T" / "JNZ 2F / N" /

"1: / n" /

".section .text.lock, /" AX / "/ n" /

"2: / TPUSHL %% EAX / N / T" /

"LEAL% 0, %% EAX / N / T" /

"Call" Helper "/ N / T" /

"POPL %% EAX / N / T" /

"JMP 1B / N" /

".previous" /

: "= m" (* (Volatile Int *) RW):: "Memory")

The RW_LOCK_BIAS_STR here is "0x01000000", the reason for this value is that this value is sufficient, which can make the request to satisfy enough.

In ".section .text.lock, /" AX / "/" /".previous "/ in content is to compile this code to a section called .Text.lock, and this section is Re-positionable and executable, so during the execution of the code, because different sections are loaded into different pages, if JMP does not appear in front, it is at 1: At the end. The Call is __write_lock_failed, which is written in the previous Write_Lock, defines in Arch / ASM-I386 / KERNEL / SMAPHORE.C.

.align

.globl __write_lock_failed

__write_lock_failed:

"Lock" AddL $ "RW_LOCK_BIAS_STR", (% EAX)

1: Rep; NOP

CMPL $ "rw_lock_bias_str", (% EAX)

JNE 1B

"LOCK" SUBL $ "RW_LOCK_BIAS_STR", (% EAX)

JNZ __WRITE_LOCK_FAILED

RET

The LOCK here is a locking bus in the SMP architecture and does not allow other CPUs to access memory. Here first "Lock" AddL $ "rw_lock_bias_str", (% EAX) is to prevent the back of the response from the back, it will not be blocked, if necessary, there will be a deadlock, which is fatal. . and

1: Rep; NOP

CMPL $ "rw_lock_bias_str", (% EAX)

JNE 1B

Is the constant check lock can be obtained, no change will be NOP, this method can be held without changing the value of the LOCK, this does not use Lock, such a locking bus. Finally, if you get the lock

"LOCK" SUBL $ "RW_LOCK_BIAS_STR", (% EAX)

This enables the function of Write_lock.

Similar to reading locks

#define __build_read_lock_const (rw, helper) /

ASM Volatile (LOCK "SUBL $ 1,% 0 / N / T" /

"JS 2F / N" /

"1: / n" /

".section .text.lock, /" AX / "/ n" /

"2: / TPUSHL %% EAX / N / T" /

"LEAL% 0, %% EAX / N / T" / "Call" Helper "/ N / T" /

"POPL %% EAX / N / T" /

"JMP 1B / N" /

".previous" /

: "= m" (* (Volatile Int *) RW):: "Memory")

But here should pay attention to the read_lock, as long as the reduction is 1 and as long as this value is not negative, it can be locked, but the value of rw.lock is assigned when it is initialized, this value is 0x01000000, this value It is very big enough, but it is very small, so it is easy to get a reading lock. From theory, it is easy to get 0x01000000 times more than the write lock. This is what I said in front of Linux is now the reader is preferred. of. And this number makes it easy to understand that when you want to get a write, you have to lose the 0x01000000, which is if there is one read or write request in the critical area, the second write request cannot be written.

And if you can't read the lock, what you want to jump is __read_lock_failed in Read_Lock

.align 4

.globl __read_lock_failed

__read_lock_failed:

Lock; Incl (% EAX)

1: Rep; NOP

CMPL $ 1, (% EAX)

JS 1B

LOCK; DECL (% EAX)

JS __READ_LOCK_FAILED

RET

The truth here is similar to the truth in the front __write_lock_failed.

Third, the implementation of the read and write lock

Since it is necessary to implement the writer as a priority read and write lock, it is natural, we think that the request is attending, don't try to get a read lock first, but check whether there is a written request is waiting, if there is a write The request is waiting, then the request must be in the waiting state. After the write request is completed, the request that has not been written is waiting, and I try to get the lock.

Here us first design RWLOCK_T data structure,

Typedef struct {

Volatile unsigned int Lock

#if Wlock_Priority

Volatile Unsigned Int Wlock_waiting;

#ENDIF

#if spinlock_debug

Unsigned Magic;

#ENDIF

} rwlock_t;

This wlock_waiting here is the number of logos waiting for a request for a request to detect. If this number is not equal to 0, the request already written is waiting. Its negative size determines the number of write requests.

Here we first modify __build_write_lock_const

#define __build_write_lock_const (RW, Wlock_Wait, Helper, Helper1) /

ASM Volatile

"CMPL $ 0, (% 1) / N / T" /

"JNZ 3F / N" /

"1: / t" lock "SUBL $" rw_lock_bias_str ", (% 0) / N / T" /

"JNZ 4F / N" /

"2: / n" /

".section .text.lock, /" AX / "/ n" /

"3: / TPUSHL %% EBX / N / T" /

"LEAL% 1, %% EBX / N / T" /

"Call" Helper1 "/ N / T" /

"POPL %% EBX / N / T" /

"JMP 1B / N" / "4: / TPUSHL %% EAX / N / T" /

PUSHL %% EBX / N / T "/

"LEAL% 0, %% EAX / N / T" /

"LEAL% 1, %% EBX / N / T" /

"Call" Helper "/ N / T" /

"POPL %% EBX / N / T" /

"POPL %% EAX / N / T" /

"JMP 2B / N" /

".previous" /

: "= m" (* (Volatile Int *) RW), "= M" (* (volatile int *) Wlock_waiting): "Memory")

The newly added WLOCK_WAITING here is the address of the Wlock_Waiting, which is the address, not the value itself, it is the way to address the address address, to ensure that the operation of the instruction is directly in memory, not In the data load to the register in the memory, after proceeding back to memory, if this is not this, it is possible to change this variable in the register within the register. Changed in their registers. Helper1 knows how to have a write request waiting to be locked and jumped. The name I took here is __read_lock_failed_wlock_wait

.align 4

.globl __read_lock_failed_wlock_wait

__read_lock_failed_wlock_wait:

1: Rep; NOP

CMPL $ 0, (% EBX)

JNZ 1B

JS __READ_LOCK_FAILED_WLOCK_WAIT

RET

Here is the constant check of whether the value of WLOCK_WAITIG is 0, if not 0, execute the idle command.

Helper is the name of the front __read_lock_failed, but a little change.

.align 4

.globl __read_lock_failed

__read_lock_failed:

Lock; Incl (% EAX)

1: Rep; NOP

CMPL $ 0, (% EBX)

JNZ 1B

REP; NOP

CMPL $ 1, (% EAX)

JS 1B

LOCK; DECL (% EAX)

JS __READ_LOCK_FAILED

RET

It is still to check if there is a write request waiting, because if this is not the case, there may be written requests during the jump process of the previous instruction, and we are strictly enforced writer priority reading Write rules. This is also the reason if it is the address of the write request if you try to get the read lock process.

The acquisition of the write is also modified.

#define __build_write_lock_const (RW, Wlock_Wait, Helper) /

ASM Volatile (LOCK "SUBL $ 1, (% 1) / N / T" /

LOCK "SUBL $" RW_LOCK_BIAS_STR ", (% 0) / N / T" /

"JNZ 2F / N" /

"1: / n" lock "Add1 $ 1, (% 1) / N / T" /

".section .text.lock, /" AX / "/ n" /

"2: / TPUSHL %% EAX / N / T" /

"LEAL% 0, %% EAX / N / T" /

"Call" Helper "/ N / T" /

"POPL %% EAX / N / T" /

"JMP 1B / N" / ".previous "/

: "= m" (* (Volatile Int *) RW), "= M" (* (volatile int *) Wlock_waiting): "Memory")

Here is similar to the principle in __build_read_lock_const, but there is a little different, once you want to get a write lock, you must first reduce the number of wlock_waiting this number. If you get it, add 1, this is to let the same read The request can be found that the requested request is waiting to be read and write lock. And here, addl and subl is applied, instead of INCL and DECL are because the latter two instructions can only be valid in 24 BITS, and the first two instructions can be valid for 32 BITS, while adopting first reduction Strategy, not first add, because if it is reduced, it turns into 0xffffffff, so all bits are 1 reliability for hardware. It is the same for Helper and the previous readers.

The above is just given the most important content that modifies the read and write lock, in fact, if it is really necessary to modify ten points, here I have a modified code here, I will download:

Code download

Fourth, small knot

Here, the read and write lock in the Linux kernel has made a priority modification. This modification looks from the contents of the code than the reader's priority to increase the cost, especially if there is a write request waiting outside the critical area. There may be a lot of read requests in __read_lock_failed_wlock_wait. However, if the probability of the write request and the occurrence of the read request may be 1: 100 or smaller, and the high standard for the system refreshes the requirements. Then this loss is worth it, especially if the router, or a high-time information publishing platform (such as a securities) should be. This is "bigger thanks" on the code. And if this code is applied on a general PC platform, it may only be "smashed". From this, you see the spirit of open source code, so that users achieve their best configuration and functionality.

转载请注明原文地址:https://www.9cbs.com/read-110573.html

9cbs

New Post(0)