Analysis of the Linux Thread Implementation Mechanism

xiaoxiao2021-03-06 45

Contents:

Basics: Threads and Processes
Lightweight Process Implementation in the Linux 2.4 Kernel
The LinuxThreads Thread Mechanism
Other Thread Implementation Mechanisms
References
About the author


Yangshazhou (Pubb@163.net), Computer College, National University of Defense Technology, May 2003

Ever since multi-threaded programming appeared on Linux, the development of multi-threaded Linux applications has been tied to two problems: compatibility and efficiency. This article starts from the thread model. By analyzing the implementation of LinuxThreads, the most popular thread library on the Linux platform, it describes how the Linux community views and addresses compatibility and efficiency.

I. Basics: Threads and Processes

By the textbook definition, a process is the smallest unit of resource management and a thread is the smallest unit of program execution. In operating system design, threads evolved from processes mainly to support SMP better and to reduce the overhead of (process/thread) context switches. However the line is drawn, a process needs at least one thread as its executing entity; the process manages resources (such as CPU, memory and files) and assigns its threads to CPUs for execution. A process can of course own several threads. If it runs on an SMP machine, it can then use several CPUs at once to execute them and achieve the greatest possible parallelism. Even on a single-CPU machine, designing a program around the multi-threaded model, much as the multi-process model once replaced the single-process model, makes the design more concise, the functionality more complete and execution more efficient, for example when several threads respond to several inputs. Everything the multi-threaded model provides here could also be done with multiple processes, but a thread context switch costs much less than a process context switch. Semantically, responding to several inputs at once amounts to sharing every resource except the CPU.

These two purposes of the thread model gave rise to two thread models, kernel-level threads and user-level threads, classified mainly by whether the thread scheduler lives inside or outside the kernel. The former makes better use of multiprocessor resources; the latter is more concerned with context switching overhead.
Current commercial systems usually combine the two: they provide kernel threads to meet the needs of SMP systems and also support a second thread mechanism implemented in user space as a thread library, in which case one kernel thread also acts as the scheduler of several user-space threads. As with many technologies, such a "hybrid" usually brings higher efficiency but is also harder to implement. Following its "simple" design philosophy, Linux has not implemented the hybrid model from the outset, although its implementation is a "hybrid" of a different kind.

As for the concrete implementation, threads can be implemented in the operating system kernel or outside it. The latter obviously requires that the kernel implement at least processes, while the former generally requires that the kernel support processes as well. The kernel-level thread model clearly needs the former, whereas the user-level thread model is not necessarily built on the latter. As noted earlier, this discrepancy arises because the two classifications use different criteria.

When the kernel supports both processes and threads, a "many-to-many" thread-process model becomes possible: a thread of a process is scheduled by the kernel and can at the same time act as the scheduler of a pool of user-level threads, picking a suitable user-level thread to run in its context. This is the "hybrid" thread model mentioned above; it meets the needs of multiprocessor systems while keeping scheduling overhead to a minimum. Most commercial operating systems (such as Digital Unix, Solaris and Irix) use this model, which can fully implement the POSIX 1003.1c standard.
Threads implemented outside the kernel fall into two models, "one-to-one" and "many-to-one". The former maps each thread to one kernel process (possibly a lightweight process), so thread scheduling is equivalent to process scheduling and is left to the kernel. The latter implements multithreading entirely outside the kernel, and scheduling is also done in user space.

The latter is how the pure user-level thread model mentioned above is implemented. Because kernel signals (whether synchronous or asynchronous) are delivered per process, they cannot be directed at a particular thread, so this implementation cannot be used on multiprocessor systems, a need that keeps growing. In practice, pure user-level thread implementations have almost disappeared except for algorithm research.

The Linux kernel only supports lightweight processes, which limits the implementation of more efficient thread models, but Linux has concentrated on optimizing process scheduling overhead, which compensates for this shortcoming to some extent. LinuxThreads, currently the most popular thread mechanism, uses the "one-to-one" thread-process model: scheduling is left to the kernel, while a thread management mechanism that includes signal handling is implemented at user level. How Linux and LinuxThreads work together is the focus of this article.

II. Lightweight Process Implementation in the Linux 2.4 Kernel

The original definition of a process comprised three parts: a program, resources, and their execution. The program usually means the code; the resources, at the operating system level, usually include memory, I/O, signal handling and so on; and the execution of the program is usually understood as the execution context, including its use of the CPU, which later developed into the thread. Before the concept of threads appeared, operating system designers, seeking to reduce process switching overhead, gradually revised the concept of the process: they allowed the resources a process holds to be separated from its body and allowed some processes to share part of their resources, such as files, signals, data memory, and even code. This led to the concept of the lightweight process.
The Linux kernel has implemented lightweight processes since version 2.0.x. An application uses a single clone() system call interface and chooses, through different parameters, whether to create a lightweight process or an ordinary process. Inside the kernel, after passing and interpreting the parameters, clone() calls do_fork(), the kernel function that is also the final implementation of the fork() and vfork() system calls:

int do_fork(unsigned long clone_flags, unsigned long stack_start,
            struct pt_regs *regs, unsigned long stack_size)

Here clone_flags is the bitwise OR of the following values:

#define CSIGNAL         0x000000ff  /* signal mask to be sent at exit */
#define CLONE_VM        0x00000100  /* set if VM shared between processes */
#define CLONE_FS        0x00000200  /* set if fs info shared between processes */
#define CLONE_FILES     0x00000400  /* set if open files shared between processes */
#define CLONE_SIGHAND   0x00000800  /* set if signal handlers and blocked signals shared */
#define CLONE_PID       0x00001000  /* set if pid shared */
#define CLONE_PTRACE    0x00002000  /* set if we want to let tracing continue on the child too */
#define CLONE_VFORK     0x00004000  /* set if the parent wants the child to wake it up on mm_release */
#define CLONE_PARENT    0x00008000  /* set if we want to have the same parent as the cloner */
#define CLONE_THREAD    0x00010000  /* Same thread group? */
#define CLONE_NEWNS     0x00020000  /* New namespace group? */

#define CLONE_SIGNAL    (CLONE_SIGHAND | CLONE_THREAD)
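As a rough illustration of how such a flag word is taken apart, the sketch below decodes the sharing bits and the exit signal from a clone_flags value. The numeric values are copied from the list above; the F_ prefix, the struct clone_request type and the function decode_clone_flags() are hypothetical names introduced only for this example and are not kernel code.

```c
#include <stdbool.h>

/* Values copied from the flag list above. The F_ prefix is a hypothetical
 * naming choice made here to avoid clashing with <sched.h>. */
enum {
    F_CSIGNAL       = 0x000000ff,  /* low byte: signal sent at exit */
    F_CLONE_VM      = 0x00000100,
    F_CLONE_FS      = 0x00000200,
    F_CLONE_FILES   = 0x00000400,
    F_CLONE_SIGHAND = 0x00000800
};

/* Hypothetical summary of what a given clone_flags value asks for. */
struct clone_request {
    int  exit_signal;
    bool share_vm;
    bool share_fs;
    bool share_files;
    bool share_sighand;
};

/* Split a flag word into its exit signal and its sharing bits. */
static struct clone_request decode_clone_flags(unsigned long flags)
{
    struct clone_request r;

    r.exit_signal   = (int)(flags & F_CSIGNAL);
    r.share_vm      = (flags & F_CLONE_VM) != 0;
    r.share_fs      = (flags & F_CLONE_FS) != 0;
    r.share_files   = (flags & F_CLONE_FILES) != 0;
    r.share_sighand = (flags & F_CLONE_SIGHAND) != 0;
    return r;
}
```

A flag word with none of the sharing bits set describes an ordinary fork()-style process; setting all four describes the kind of lightweight process that LinuxThreads asks for.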

In do_fork(), different clone_flags lead to different behavior. LinuxThreads creates its "threads" by calling clone() with (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND), meaning shared memory, shared file system information, a shared file descriptor table, and shared signal handling. This section looks at how the Linux kernel shares each of these resources.

1. CLONE_VM

do_fork() calls copy_mm() to set the mm and active_mm fields of task_struct, which correspond to the memory space associated with the process. If CLONE_VM is specified, copy_mm() sets mm and active_mm in the new task_struct to the same values as those of current and increments the user count of that mm_struct (mm_struct::mm_users). In other words, the lightweight process shares its memory address space with the parent process.

[Figure: position of mm_struct within the process]

2. CLONE_FS

task_struct records, in an fs_struct, the root directory and current directory of the file system the process is in. do_fork() calls copy_fs() to duplicate this structure; for a lightweight process, only the fs->count reference count is incremented and the same fs_struct is shared with the parent process. In other words, a lightweight process has no file system information of its own: when any thread in the process changes the current directory, the root directory or similar information, all other threads are affected directly.

3. CLONE_FILES

A process may have files open. The files field (struct files_struct *) of task_struct holds the information about the open files (struct file), and do_fork() calls copy_files() to handle this process attribute. A lightweight process shares the structure with its parent; copy_files() only increments the files->count reference count. This sharing lets any thread access the open files maintained by the process, and operations on them are directly visible to the other threads in the process.

4. CLONE_SIGHAND

Every Linux process can define how it handles signals. This configuration is stored as struct k_sigaction entries in sig (struct signal_struct) in task_struct, and copy_sighand() in do_fork() is responsible for copying it. For a lightweight process nothing is copied; only the signal_struct::count reference count is incremented and the structure is shared with the parent process. In other words, the child handles signals exactly as the parent does, and each can change the other's handlers.

do_fork() does a great deal of other work that is not described in detail here. On SMP systems, every newly forked process is assigned to the same CPU as its parent, and a CPU is chosen only when the process is scheduled.
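The effect of the four flags can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc: it creates a lightweight process with the flag combination the article attributes to LinuxThreads and shows that a write made by the child is visible to the parent. The names shared_counter, child_fn and run_lwp_demo are hypothetical, and using SIGCHLD as the exit signal is an assumption made here so that an ordinary waitpid() can collect the child; it is not taken from LinuxThreads.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Lives in the address space that CLONE_VM shares with the child. */
static volatile int shared_counter = 0;

/* Entry point of the lightweight process: writes to the shared memory. */
static int child_fn(void *arg)
{
    shared_counter = *(int *)arg;
    return 0;
}

/* Create a lightweight process, wait for it, and return the value it left
 * in shared_counter, or -1 on failure. */
static int run_lwp_demo(int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Flag set described in the article, plus SIGCHLD (an assumption of
     * this sketch) as the signal delivered to the parent at exit. */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;

    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1)
        return -1;

    return shared_counter;
}
```

Calling run_lwp_demo(42) should return 42, because the child's store lands in the parent's memory. Without CLONE_VM the child would modify only its own copy and the parent would still see 0.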
Although Linux supports lightweight processes, it cannot be said to support kernel-level threads, because Linux "threads" and "processes" sit at the same scheduling level and share one process identifier space. This limitation makes it impossible to implement the POSIX thread mechanism in the full sense on Linux, so the many Linux thread libraries can only implement as much of the POSIX semantics as possible and approximate its functionality as closely as they can.

III. The LinuxThreads Thread Mechanism

LinuxThreads is currently the most widely used thread library on the Linux platform and is distributed as part of glibc. It implements the "one-to-one" thread model on top of kernel lightweight processes: each thread entity corresponds to one kernel lightweight process, and the management of threads is implemented in a library outside the kernel.

1. Thread Description Data Structures and Implementation Limits

LinuxThreads defines a struct _pthread_descr_struct data structure to describe a thread and uses the global array __pthread_handles to describe and reference the threads of the process. In the first two entries of __pthread_handles, LinuxThreads defines two global system threads, __pthread_initial_thread and __pthread_manager_thread, and uses __pthread_main_thread to denote the parent thread of __pthread_manager_thread (initially __pthread_initial_thread).

struct _pthread_descr_struct is a node of a doubly linked circular list. The list containing __pthread_manager_thread has only that one element; in fact __pthread_manager_thread is a special thread, of which LinuxThreads uses only three fields: errno, p_pid and p_priority. The list containing __pthread_main_thread links together all user threads of the process.

[Figure: the __pthread_handles array after a series of pthread_create() calls]

A newly created thread first occupies an entry in the __pthread_handles array and is then linked, through the list pointers in its descriptor, into the list headed by __pthread_main_thread. The use of this list comes up again in the discussion of thread creation and release.

LinuxThreads follows the POSIX 1003.1c standard, which sets certain bounds on thread library implementations, such as the maximum number of threads per process and the size of the thread-private data area. LinuxThreads basically respects these limits but changes some of them, and the changes tend to relax or enlarge the limits to make programming more convenient. The limit macros are concentrated mainly in local_lim.h under sysdeps/ (the exact location differs between platforms) and include the following:

Private data keys per process: POSIX defines _POSIX_THREAD_KEYS_MAX as 128; LinuxThreads uses PTHREAD_KEYS_MAX, 1024.
Destructor iterations allowed when private data is released: LinuxThreads agrees with POSIX and defines PTHREAD_DESTRUCTOR_ITERATIONS as 4.
Threads per process: POSIX defines 64; LinuxThreads raises this to 1024 (PTHREAD_THREADS_MAX).
Minimum size of a thread's stack: not specified by POSIX; LinuxThreads uses PTHREAD_STACK_MIN, 16384 (bytes).

2. The Management Thread

One benefit of the "one-to-one" model is that thread scheduling is done by the kernel, while the remaining work is done in the thread library outside the kernel. LinuxThreads builds a dedicated management thread for each process, responsible for thread-related management tasks. The management thread is created (with __clone()) and started the first time the process calls pthread_create() to create a thread.
Within a process, the management thread communicates with the other threads through a pair of "management pipe" descriptors (manager_pipe[2]). The pipe is created before the management thread. Once the management thread has been started successfully, the read end and the write end of the pipe are assigned to two global variables, __pthread_manager_reader and __pthread_manager_request. From then on every user thread sends its requests to the management thread through __pthread_manager_request. The management thread itself does not use __pthread_manager_reader directly: the read end of the pipe (manager_pipe[0]) is passed to it as one of the arguments of __clone(). The main job of the management thread is to listen on the read end of the pipe and react to the requests it takes from it.

The management thread is created as follows (the global variable __pthread_manager_request is initially -1):

[Figure: creation of the management thread]
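The request channel described above can be sketched in a few lines of C: requests are written to the write end of a pipe, and a manager loop polls the read end with a timeout and dispatches on the request type. This is an illustration only, not the LinuxThreads source. REQ_CREATE is the request name used in this article; REQ_SHUTDOWN, the struct request layout, the function names and the 2000 ms timeout are assumptions of this sketch.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* REQ_CREATE is named in the article; REQ_SHUTDOWN is hypothetical. */
enum req_kind { REQ_CREATE, REQ_SHUTDOWN };

/* Hypothetical request layout; small enough for an atomic pipe write. */
struct request {
    enum req_kind kind;
    int arg;
};

/* What a user thread does: write one request to the pipe's write end. */
static int send_request(int wfd, enum req_kind kind, int arg)
{
    struct request req = { kind, arg };
    ssize_t n = write(wfd, &req, sizeof req);
    return n == (ssize_t)sizeof req ? 0 : -1;
}

/* What the manager does: poll the read end with a timeout and serve
 * requests until told to stop. Returns the number of REQ_CREATE requests
 * handled, or -1 on error. */
static int manager_loop(int rfd)
{
    struct pollfd pfd = { .fd = rfd, .events = POLLIN };
    int created = 0;

    for (;;) {
        int n = poll(&pfd, 1, 2000);  /* timeout value is illustrative */
        if (n < 0 && errno != EINTR)
            return -1;
        /* A real manager would use every wake-up, including timeouts, to
         * check on its parent and reap exited children; omitted here. */
        if (n <= 0)
            continue;                 /* timeout or interrupted: poll again */
        if (!(pfd.revents & POLLIN))
            return created;           /* write end closed, nothing to read */

        struct request req;
        if (read(rfd, &req, sizeof req) != (ssize_t)sizeof req)
            return -1;

        switch (req.kind) {
        case REQ_CREATE:
            created++;                /* a real manager would create a thread */
            break;
        case REQ_SHUTDOWN:
            return created;
        }
    }
}
```

Because each request is much smaller than PIPE_BUF, a write from one thread is not interleaved with writes from others, which is what lets many user threads share a single write end.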

After initialization, the lightweight process ID and the thread ID allocated and managed outside the kernel are recorded for the management thread; the value 2*PTHREAD_THREADS_MAX+1 does not conflict with the ID of any regular user thread. The management thread runs as a child thread of the thread that called pthread_create(), while the user thread requested through pthread_create() is created by the management thread calling clone(), so it is actually a child thread of the management thread. (Here "child thread" should be understood as "child process".)

__pthread_manager() contains the main loop of the management thread. After a series of initialization steps it enters a while(1) loop, in which it polls (__poll()) the read end of the management pipe with a timeout. Before handling a request, it checks whether its parent thread (the main thread that created the manager) has exited and, if so, terminates the whole process. If exited child threads need cleaning up, it calls pthread_reap_children() to do so. Only then does it read the request from the pipe and perform the operation corresponding to the request type. The handling of individual requests is clear enough from the source code and is not described here.

3. The Thread Stack

In LinuxThreads the stack of the management thread is separate from the stacks of user threads. The management thread allocates THREAD_MANAGER_STACK_SIZE bytes on the process heap with malloc() as its own stack.

How a user thread's stack is allocated depends on the architecture and is governed mainly by two macros. One is NEED_SEPARATE_REGISTER_STACK, which is used only on the IA64 platform. The other is FLOATING_STACK, used on a few platforms such as i386, in which case the system decides where the user thread stack is placed and provides protection for it.
The user can also specify a user-defined stack through the thread attribute structure. For reasons of space, only the two stack organizations used on the i386 platform are analyzed here: FLOATING_STACK mode and user-defined stacks.

In FLOATING_STACK mode, LinuxThreads uses mmap() to obtain 8 MB from the kernel (the default maximum stack size on i386; if a run-time limit (rlimit) is set, that limit is followed) and uses mprotect() to make the first page of it inaccessible.

[Figure: layout of the 8 MB stack region]

The protected page at the low address is used to detect stack overflow. For a stack specified by the user, the top of the thread stack is set after pointer alignment and the bottom of the stack is computed; no protection is applied, and correctness is the user's responsibility. Whichever organization is used, the thread descriptor is always located at the top of the stack, immediately adjacent to it.

4. Thread ID and Process ID

Every LinuxThreads thread has both a thread ID and a process ID. The process ID is the process number maintained by the kernel, while the thread ID is allocated and maintained by LinuxThreads. The thread ID of __pthread_initial_thread is PTHREAD_THREADS_MAX, that of __pthread_manager_thread is 2*PTHREAD_THREADS_MAX+1, the first user thread has thread ID PTHREAD_THREADS_MAX+2, and the thread ID of the n-th user thread follows the formula:

tid = n * PTHREAD_THREADS_MAX + n + 1
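The formula, together with the lookup that maps a thread ID back to its slot in __pthread_handles by taking it modulo PTHREAD_THREADS_MAX, can be written out as a small sketch. The function names user_thread_id() and handle_index() are hypothetical; only the arithmetic comes from the article.

```c
#include <stddef.h>

#define PTHREAD_THREADS_MAX 1024  /* LinuxThreads limit quoted in the article */

/* Thread ID of the n-th user thread (n >= 1), per the article's formula. */
static unsigned long user_thread_id(unsigned long n)
{
    return n * PTHREAD_THREADS_MAX + n + 1;
}

/* Slot of a thread in the handles array: its ID modulo PTHREAD_THREADS_MAX. */
static size_t handle_index(unsigned long tid)
{
    return (size_t)(tid % PTHREAD_THREADS_MAX);
}
```

With these definitions the initial thread (ID 1024) lands in slot 0, the management thread (ID 2049) in slot 1, and the first two user threads (IDs 1026 and 2051) in slots 2 and 3, which matches the layout in which the first two entries belong to the two system threads.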

This allocation scheme ensures that no two threads in the process (including threads that have already exited) have the same thread ID, and because the thread ID type is defined as unsigned long int, thread IDs will not repeat within any reasonable running time. Looking up a thread's data structure from its thread ID is done in the pthread_handle() function; it simply takes the thread ID modulo PTHREAD_THREADS_MAX, which gives the index of the thread in __pthread_handles.

5. Thread Creation

After pthread_create() sends a REQ_CREATE request to the management thread, the management thread calls pthread_handle_create() to create the new thread. It allocates the stack, sets the thread attributes, and then creates and starts the new thread with pthread_start_thread() as its entry function. pthread_start_thread() reads its own process ID, stores it in the thread descriptor, and configures scheduling according to the scheduling policy recorded there. When everything is ready it calls the real thread function, and after that function returns it calls pthread_exit() to clean up.

6. Shortcomings of LinuxThreads

Because of limitations in the Linux kernel, implementation difficulty and similar reasons, LinuxThreads is not fully POSIX compatible, as the README in its distribution explains.

1) Process ID problem

This is the most critical shortcoming, and its cause lies in the "one-to-one" model of LinuxThreads. The Linux kernel does not support threads in the true sense; LinuxThreads implements thread support with lightweight processes that look the same to the kernel scheduler as ordinary processes. These lightweight processes have independent process IDs and have the same capabilities as ordinary processes in scheduling, signal handling, I/O and so on. From the point of view of someone reading the source, the Linux kernel's clone() simply does not implement support for the CLONE_PID parameter. The kernel's do_fork() handles CLONE_PID like this:

if (clone_flags & CLONE_PID) {
        if (current->pid)
                goto fork_out;
}

This code shows that the current Linux kernel honors the CLONE_PID parameter only when the PID is 0. In fact, CLONE_PID is used only when processes are created by hand during SMP initialization. According to POSIX, all threads of the same process should share one process ID and one parent process ID, which cannot be achieved under the current "one-to-one" model.

2) Signal handling problem

Because asynchronous signals are delivered by the kernel per process, and each LinuxThreads thread is a process as far as the kernel is concerned, with no "thread group" implemented, some semantics do not comply with the POSIX standard. For example, sending a signal to all threads in a process is not implemented, as the README notes. If the kernel does not provide real-time signals, LinuxThreads uses SIGUSR1 and SIGUSR2 as its internal restart and cancel signals, so applications can no longer use these two signals that were reserved for users. Kernels after Linux 2.1.60 support the extended real-time signals (from _SIGRTMIN to _SIGRTMAX), so the problem does not arise there. The default action of certain signals is hard to implement in the current architecture: for SIGSTOP and SIGCONT, for instance, LinuxThreads can only suspend a single thread, not the whole process.

3) Total number of threads

LinuxThreads defines the maximum number of threads per process as 1024, but in practice this value is also limited by the total number of processes in the whole system, again because a thread is really a kernel process. Kernel 2.4.x uses a new way of computing the total number of processes, so that it is essentially limited only by the size of physical memory. The formula is in the fork_init() function of kernel/fork.c:

max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 8;

On i386, THREAD_SIZE = 2*PAGE_SIZE and PAGE_SIZE = 2^12 (4 KB), and mempages = physical memory size / PAGE_SIZE. For a machine with 256 MB of memory, mempages = 256*2^20 / 2^12 = 256*2^8, so the maximum number of threads is 4096. However, to ensure that the processes of any one user (other than root) cannot take up more than half of physical memory, fork_init() goes on to set:

init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
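The arithmetic can be checked with a short sketch that mirrors the two kernel statements in user space. The constants use the i386 values given in the text; the names I386_PAGE_SIZE, I386_THREAD_SIZE, max_threads_for() and per_user_nproc_limit() are hypothetical and exist only for this example.

```c
/* i386 values quoted in the text; the I386_ names are hypothetical. */
#define I386_PAGE_SIZE   4096UL                  /* 2^12 bytes */
#define I386_THREAD_SIZE (2 * I386_PAGE_SIZE)

/* System-wide limit, following the fork_init() formula quoted above. */
static unsigned long max_threads_for(unsigned long phys_mem_bytes)
{
    unsigned long mempages = phys_mem_bytes / I386_PAGE_SIZE;
    return mempages / (I386_THREAD_SIZE / I386_PAGE_SIZE) / 8;
}

/* Per-user limit (RLIMIT_NPROC), set to half of the system-wide limit. */
static unsigned long per_user_nproc_limit(unsigned long phys_mem_bytes)
{
    return max_threads_for(phys_mem_bytes) / 2;
}
```

For 256 MB of memory, max_threads_for(256UL << 20) gives 4096 and per_user_nproc_limit(256UL << 20) gives 2048, so an ordinary user on such a machine runs into the per-user limit well before the system-wide one.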

These process counts are all checked in do_fork(), so for LinuxThreads the total number of threads is limited by all three factors at once.

4) Management thread problem

The management thread easily becomes a bottleneck, a weakness common to this kind of structure. It is also responsible for cleaning up user threads, so although it blocks most signals, once the management thread dies the user threads have to be cleaned up by hand. User threads do not know the state of the management thread either, so later requests such as thread creation will go unhandled.

5) Synchronization problem

Thread synchronization in LinuxThreads is built largely on signals, and the efficiency of synchronizing through the kernel's complicated signal handling mechanism has always been a problem.

6) Other POSIX compatibility problems

Many Linux system calls are process-wide by their semantics, such as nice, setuid and setrlimit. In the current LinuxThreads these calls affect only the calling thread.

7) Real-time problem

Threads were introduced partly with real-time use in mind, but LinuxThreads does not support this for the time being; scheduling options, for example, are not yet implemented. This is true not only of LinuxThreads: standard Linux itself gives little consideration to real-time behavior.

IV. Other Thread Implementation Mechanisms

The problems of LinuxThreads, compatibility above all, have seriously hindered cross-platform applications on Linux (such as Apache) from adopting multi-threaded designs, and have kept the use of threads on Linux at a relatively low level. Many people in the Linux community have worked to improve threading, both with user-level thread libraries and with thread libraries in which kernel and user level cooperate. Two projects are regarded as the most promising: NPTL (Native POSIX Thread Library), led by Red Hat, and NGPT (Next Generation POSIX Threading), developed by IBM. Both aim at full POSIX 1003.1c compatibility and work both inside and outside the kernel, NGPT in order to implement a many-to-many thread model (NPTL, as described below, keeps the 1:1 model). Both make up for the shortcomings of LinuxThreads to some extent, and both are completely new designs started from scratch.

1. NPTL

The design goals of NPTL can be summarized as follows:

POSIX compatibility
Effective use of SMP
Low startup overhead
Low link-in overhead (programs that do not use threads should not be affected by the thread library)
Binary compatibility with LinuxThreads applications
Hardware and software scalability
Multi-architecture support
NUMA support
C++ integration

Technically, NPTL still uses the 1:1 thread model, and works with glibc and the latest Linux 2.5.x development kernels to optimize signal handling, thread synchronization, storage management and other areas. Unlike LinuxThreads, NPTL does not use a management thread; the management of kernel threads is done directly in the kernel, which also improves performance. Mainly because of kernel issues, NPTL is still not 100% POSIX compatible, but in terms of performance it is a great improvement over LinuxThreads.

2. NGPT

IBM's open source project NGPT released the stable version 2.2.0 on January 10, 2003, but much of the documentation work remains to be done. As far as is currently known, NGPT is an M:N model built on the GNU Pth (GNU Portable Threads) project, which is a classic user-level thread library implementation. According to a notice on the official NGPT website in March 2003, NGPT, considering that NPTL is being accepted more and more widely and wishing to avoid the confusion caused by different thread library versions, will not be developed further and will receive only supporting maintenance. In other words, NGPT has given up competing with NPTL to become the next-generation Linux POSIX thread library standard.

3. Other Efficient Thread Mechanisms

Scheduler Activations cannot go unmentioned here. This multi-threaded kernel architecture, published through the ACM in 1991, influenced the design of many multi-threaded kernels, including Mach 3.0, NetBSD and the commercial Digital UNIX (now Compaq Tru64 UNIX).
Its essence is to reduce, as far as possible, the system call requests made from user level to the kernel while still scheduling threads at user level; such requests are often a major source of run-time overhead. A thread mechanism built this way in effect combines the flexibility and efficiency of user-level threads with the practicality of kernel-level threads. Several open source operating system communities, including Linux and FreeBSD, are therefore doing related research in an effort to implement Scheduler Activations in their own systems.

V. References
