Process management is one of the most critical parts of an operating system, and its design and implementation directly affect the performance of the whole system. In a multi-process operating system, multiple processes can execute "concurrently" within a given period of time. This avoids the situation where a fast CPU sits idle waiting for a slow I/O device, improving CPU utilization and hence overall system performance. Moreover, running multiple processes allows the system to provide several services at once, or to serve many clients, which is a basic task of a modern operating system.
The Linux operating system on the Intel i386 architecture supports exactly this kind of multi-process operation. With a well-chosen process scheduling algorithm, it achieves good average response times and high system performance. Unfortunately, the current 2.2.x versions of the Linux kernel impose a limit on the maximum number of processes: a system can have at most 4090 user processes at the same time. For a typical desktop application this is more than enough, but it falls short for enterprise-class server applications.
Consider a typical web server, which generally uses a multi-process or multi-threaded structure: whenever a connection request arrives, a child process or thread is created to serve it. Clearly, for a heavily loaded server, thousands of simultaneous connections are common, and a system running kernel 2.2.x cannot cope. It is fair to say that this limit on the maximum number of processes keeps Linux from competing for the role of an enterprise server operating system. Indeed, the more mature operating systems in this class, such as Solaris, AIX, and HP-UX, have no such restriction.
It is worth mentioning that many Linux developers have recognized this problem, and the follow-up versions of kernel 2.2.x, such as the 2.3.x test series and 2.4, have solved it. However, the release of 2.4 is still some way off, and it will take further time for 2.4 to stabilize in production use. During this period, do we simply wait, or choose another system? Or is there a solution that can break through the process-count limit on the 2.2.x kernel?
To answer this question, we must first analyze process management in the 2.2.x kernel.
I. The i386 architecture and storage management in kernel 2.2.x

To discuss process management we must first understand the basics of the i386 architecture and its storage management. This is because the architecture determines how storage management is implemented, and process management is inseparable from storage management.

Modern operating systems generally use virtual storage: the address space used by the system is distinct from the actual physical address space and consists of "virtual" addresses. The processor provides a mechanism for converting virtual addresses into actual addresses, and the operating system's storage management is implemented on top of this address-translation mechanism.

There are two basic approaches to storage management: segmentation and paging. Segment management divides memory into segments of different sizes and accesses each segment through a segment pointer. Relatively early systems, such as the PDP-11, used this method. Its disadvantage is that segments must be considered when designing a program, which is very inconvenient. Page management divides memory into pages (PAGE) of fixed size and allocates memory in units of pages, completing address translation through page-mapping tables.

The i386 uses a two-level, segment-then-page scheme for storage management: it segments first and then pages. Specifically, it consults the GDT (Global Descriptor Table) to convert a virtual address into a linear address, and then uses a two-level page-table structure to perform paging, converting the linear address into a physical address. The figure below shows how a virtual address is converted into a physical address on the i386.

In Linux, the operating system runs at the processor's privilege level 0. By setting up the GDT, it places its code and data in separate segments, to distinguish them from the data and code segments used by user programs.
All user programs use the same data segment and code segment; that is, all user programs live in the same address space, and protection between programs is achieved by establishing different page-table mappings. The segment descriptors in the Linux 2.2.x GDT are shown below. In actual use, a user program can also obtain its own address space by setting up an LDT through a system call.

II. Process management in kernel 2.2.x

A process is an execution of a program and is a dynamic concept. On the i386 architecture, "task" and "process" are equivalent concepts. Process management covers system initialization, process creation and termination, process scheduling, inter-process communication, and so on.

In the Linux kernel, a process is in essence a set of data structures, including its context, scheduling data, signal handling, process queue pointers, process identifier, timing data, semaphore data, etc. This data is held in the Process Control Block (PCB).

Linux process management is closely tied to the i386 architecture discussed above. The basics of i386 segmented memory management were covered earlier; in fact, segments are used in many places in the kernel. For example, process management uses a special segment, the Task State Segment (TSS). Every process must have its own TSS, which is installed by setting the TR register correctly. According to the i386 architecture, TR holds the selector of the TSS, and that selector must map to an entry (i.e., a descriptor) in the GDT. The LDT of a process is subject to the same restriction: the selector held in LDTR must also map to an entry in the GDT.

To satisfy these requirements of the i386 architecture, kernel 2.2.x pre-allocates the GDT entries each process needs: two GDT entries per process.
The TR and LDTR values in a process's PCB are the corresponding selectors into the GDT. The correspondence between a process and its GDT entries is recorded in the task array. A detailed analysis follows.
1. System initialization
In Linux 2.2.x, several data structures associated with process management are initialized when the system boots. The most important of these are the GDT and the process table task. Initializing the GDT mainly means deciding how many processes to reserve space for, i.e., how large the GDT must be. This is determined by a macro, NR_TASKS, whose value is the maximum number of processes in the system and is fixed at compile time. As can be seen from Figure 2, the size of the GDT is 12 + NR_TASKS * 2 entries.
The process table task is actually an array of PCB pointers, defined as follows:

struct task_struct *task[NR_TASKS] = {&init_task, };
Here init_task is the system's process 0, the parent of all other processes. It must be set up by hand during system initialization and added to the process pointer table in order to start the process-management machinery. It can be seen that the size of task also depends on NR_TASKS.
2. Process creation
In Linux 2.2.x, a process is created through the fork system call, and the new process is a child of the calling process. Note that 2.2.x has no threads in the true sense: threads in Linux are actually simulated with processes. That is, a Linux thread is also created by fork and is a "lightweight" process. The fork system call proceeds as follows:
The key steps are: a) Add the new process's PCB to the process table. As discussed above, the size of the process table task is fixed in advance, so the first step is to find an empty slot in task: nr = find_empty_process(). If no empty slot can be found, the system has reached its maximum number of processes and the new process cannot be created. The returned nr is the index of the free slot.
b) Copy the parent's address space to the child. In this step the LDT is copied to the child, an LDT descriptor (descriptor) for the child is then built in the GDT, and its selector is saved in the PCB.
c) Set up a TSS for the child. At the same time, a TSS descriptor for the child is created in the GDT and its selector is saved in the PCB.

3. Process scheduling

We will not discuss the scheduling algorithm here, only the work done during a process switch. In Linux 2.2.x the process switch is performed in the switch_to function, whose basic steps are:

1) Set TR, loading the new process's TSS.
2) Save the FS and GS registers of the old process into its PCB.
3) If necessary, load the new process's LDT.
4) Load the new process's page tables.
5) Load the new process's FS and GS from its PCB.

The selectors of the new process's TSS and LDT are taken from its PCB.

III. Breaking through the limit

1. The cause of the limit

From the discussion above we can see why Linux 2.2.x has a maximum-process limit. The macro NR_TASKS is fixed at compile time and statically determines the total length of the GDT at system initialization. However, because of the i386 architecture, the limit field of the global descriptor table register GDTR is 16 bits, and each descriptor is 8 bytes, so the GDT can hold at most 1 << (16 - 3) = 1 << 13 = 8192 descriptors.

During initialization the kernel makes special arrangements for the first 12 entries of the GDT, as follows:
1) the null descriptor (entry 0) and reserved descriptors (entries 1, 6, 7);
2) the kernel code and data segments (entries 2, 3) and the user code and data segments (entries 4, 5);
3) segments used by the APM BIOS (entries 8-11).

Since each process needs two descriptors (TSS and LDT), the theoretical maximum number of processes is (8192 - 12) / 2 = 4090.

2. The solution

Clearly, to break through the maximum-process limit, the problem of there not being enough GDT entries must be solved.
The size of the GDT is a hardware restriction, so the only option is to reconsider how the TSS and LDT descriptors of each process are handled, and to stop pre-allocating GDT space for every process.
In fact, from the layout of the process data structure PCB, the TSS and LDT segments of each process can be addressed dynamically inside the kernel, so during a task switch both segments can be reached through the PCB pointer:
process TSS segment: proc->tss
process LDT segment: proc->mm->segments
Therefore it is unnecessary to statically allocate and retain descriptors for these two segments. We can build them whenever they are needed, for example creating the segment descriptors in the GDT at switch time.
The key to this method is that the GDT only needs to hold the TSS and LDT descriptors of the processes currently running on the CPUs. These two descriptors are built by the operating system, from the contents of the process's PCB, during the process switch.

3. Implementation
To break through the 4090-process limit, the TSS and LDT segment descriptors of a process must be set up dynamically. Concretely, this involves two areas:
1) System initialization:

a) In head.S, change the statically defined GDT length to its maximum, 8192 entries.

b) In the GDT definition in desc.h, insert NR_CPUS * 2 entries at the first position otherwise assigned to processes, as dedicated per-CPU entries used to run the processes whose descriptors lie outside the GDT. This still leaves slots for 4090 - NR_CPUS = 4090 - 32 = 4058 processes for the original algorithm (with NR_CPUS = 32). The per-CPU entries are laid out as:

CPU0: SHARED_TSS_ENTRY = 12
SHARED_LDT_ENTRY = 13
CPU1: SHARED_TSS_ENTRY + 2*1 = 14
SHARED_LDT_ENTRY + 2*1 = 15
...
CPUn: SHARED_TSS_ENTRY + 2*n
SHARED_LDT_ENTRY + 2*n
(n < NR_CPUS)
c) Change the macro NR_TASKS into a variable, int NR_TASKS, and allocate the process pointer array dynamically in the initialization function start_kernel():

task = kmalloc(sizeof(void *) * NR_TASKS, GFP_ATOMIC);

In this way the symbol task is used in the kernel exactly as when it was statically allocated. The extra variable NR_TASKS can be passed into the kernel as a boot parameter from LILO.

d) Modify parse_options() so that the kernel recognizes the nrtasks variable passed from LILO, which gives the maximum number of processes, and uses it to size the task array.

2) Task switching:

a) At fork: tss.ldt and tss.tr in the PCB hold the LDT and TSS selectors of the process, used to load the LDTR and TR registers at switch time. Since there can now be more than 4090 processes, the LDT selectors computed by the original algorithm will exceed the 16-bit range for the excess processes. We therefore combine the unused short __ldth in the TSS structure with the short ldt into one unsigned long, so the assignment becomes:

*((unsigned long *)&(p->tss.ldt)) = (unsigned long)_LDT(nr);
if (*((unsigned long *)&(p->tss.ldt)) < (unsigned long)(8192 << 3))
        set_ldt_desc(nr, ldt, LDT_ENTRIES);   /* this line is the original code */
else {
        /* do nothing here; a similar setup is deferred to switch time */
}

One advantage of this approach is that whether a process lies beyond the 4090 limit can be told from the value of ldt in its PCB alone, which greatly simplifies the process-switch code below.

b) At switch time: we now allow more processes than the GDT has entries for, i.e., the excess processes have no pre-allocated GDT entries. For these processes, the GDT entries reserved above for each CPU are used.
The entry for the current CPU is addressed as SHARED_TSS_ENTRY + 2 * smp_processor_id(). The segment descriptors of processes that do not fit in the GDT are therefore set up with the following code, while the remaining processes are handled as before:

...
#define set_shared_tss_desc(addr, cpu) \
        _set_tssldt_desc(gdt_table + SHARED_TSS_ENTRY + 2*cpu, (int)addr, 235, 0x89);
#define set_shared_ldt_desc(addr, size, cpu) \
        _set_tssldt_desc(gdt_table + SHARED_LDT_ENTRY + 2*cpu, (int)addr, ((size << 3) - 1), 0x82);
...
void __switch_to(struct task_struct *prev, struct task_struct *next)
{
        ...
        if (next->tss.tr <= 0x0000ffff) {
                /* handled by the original code */
                ...
        } else {
                set_shared_tss_desc(&next->tss, smp_processor_id());
                set_shared_ldt_desc(&next->mm->segments, LDT_ENTRIES, smp_processor_id());
        }
        /* the LDTR and TR registers are then loaded with the appropriate values */
        ...
}

In summary, the limit on the maximum number of processes can be broken: with one extra parameter written in the LILO configuration file and passed to the kernel automatically at boot, kernel 2.2.x can run more than 4090 processes. The parameter has the form:

append = "nrtasks=40000"

4. Evaluation of the solution

After the above modifications, the number of processes is in theory no longer bounded by the GDT but by available memory. In practice, the minimum kernel memory cost of a process is:

PCB and kernel stack (2 pages) + page directory (1 page) + page tables (at least 1 page) = 4 pages (16 KB)

Assume a machine with 1 GB of memory, a per-process memory-management overhead of 20 KB, and about 20 MB occupied by the operating system itself. The maximum number of processes is then (1 GB - 20 MB) / 20 KB = (1073741824 - 20971520) / 20480 = 51404, roughly 50,000. A real application process costs more than 20 KB; if we assume 30 KB, the practical maximum on a 1 GB machine is about 50,000 * (2/3) = 33,000 processes.
Our solution makes process switching slightly slower than the original implementation, particularly once the number of processes exceeds 4090. (Xteamlinux)