An Analysis of the Linux 2.4 Scheduling System
Contents: Foreword · Related Data Structures · Process Switching · Ready Process Selection Algorithm · Scheduler Workflow · Scheduling-Related Parts of Other Kernel Code · Some Problems of the Linux 2.4 Scheduling System · Postscript · References · About the Author
Yang Shazhou (pubb@163.net), National University of Defense Technology, April 2004

I. Foreword
Among open-source operating systems, Linux has developed the most rapidly and has by now captured a considerable share of the low-end server market. Judging from the newest Linux 2.6 kernel, Linux development is heading in two main directions: embedded systems and high-end computing. The scheduling system has a major impact on the overall performance of an operating system, and embedded systems, desktop systems, and high-end servers place very different demands on the scheduler. The Linux 2.4 scheduler has two notable characteristics:
- the kernel is not preemptible;
- the scheduling algorithm is simple and effective.

Since Linux runs on a wide range of platforms, this article discusses the i386 SMP configuration by default.

II. Related Data Structures
In Linux a process is represented by a task_struct. All processes are linked into a doubly linked list headed by init_task (see the SET_LINKS() macro in [include/linux/sched.h]); this list is unique in the system. All CPUs are organized into an array of schedule_data structures (described below), so that a process and the CPU it runs on can be reached from each other. All runnable (TASK_RUNNING) processes are linked into a doubly linked list headed by runqueue_head, and the scheduler always picks the most suitable process to run from this queue; runqueue_head is likewise unique in the system. The data structures involved in the scheduler's work are described below.

1. init_tss
The TSS (Task State Segment) is the 80x86-specific description of a process's execution environment. Although Linux does not use hardware task switching, it keeps the information a TSS describes in the init_tss array of tss_struct, indexed by CPU number; the relevant values there are updated at every process switch.

2. task_struct
In Linux, threads and processes share the same core data structure; one could say that 2.4 has only processes, some of which are "lightweight". Each process is described in the kernel by a task_struct, which holds a great deal of information; the fields related to the scheduler are mainly the following:

i. state
Linux process states fall into three groups: runnable (TASK_RUNNING, which covers both "running" and "ready"); suspended (TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE, and TASK_STOPPED); and no longer schedulable (TASK_ZOMBIE). The scheduler mainly deals with the runnable and suspended states; TASK_STOPPED is used specifically to respond to signals such as SIGSTOP, and TASK_ZOMBIE denotes a "zombie" process that has exited but has not yet been reaped by its parent.

ii. need_resched
A boolean used by the scheduler to mark that the current process should be rescheduled (see "Scheduler Invocation Timing").

iii. policy
In Linux 2.4 a process's scheduling policy is one of three: SCHED_FIFO (first-in first-out real-time scheduling: unless a higher-priority real-time process asks for the CPU, the process keeps running until it gives the CPU up itself), SCHED_RR (round-robin real-time scheduling: when its time slice ends, the process is placed at the end of the run queue so that other real-time processes of the same priority get a chance to run), and SCHED_OTHER (the ordinary time-sharing policy). policy also contains a SCHED_YIELD bit which, when set, indicates that the process is voluntarily giving up the CPU.

iv. rt_priority
The priority of a real-time process, ranging from 1 to 99; for non-real-time processes it should be 0. This value takes part in computing the scheduling weight (see "Ready Process Selection Algorithm").

v. counter
The number of clock ticks the process may still run in its current time slice. Its initial value is derived from nice: the smaller the nice value, the larger the counter, i.e. the higher the priority and the more CPU time the process receives; counter also takes part in the "Ready Process Selection Algorithm". In Linux 2.4, no process (other than SCHED_FIFO real-time processes) is allowed to run past its time slice in one go; once the slice is used up, the scheduler forces another process to run (see "Scheduler Invocation Timing").

vi. nice
The user-controllable priority of the process. It takes part in the ready-process selection algorithm, and it also determines the length of the process's time slice (see below).

vii. cpus_allowed
A bit vector describing the CPUs this process may run on (see "Scheduler Workflow").

viii. cpus_runnable
A bit vector describing the CPU the process is currently running on; if it is not running on any CPU, all bits are 1. Combined with cpus_allowed, a single bitwise AND determines whether the process can be scheduled onto a given CPU.

ix. processor
The number of the CPU the process is currently (or was most recently) running on.

x. thread
Holds the process's execution environment (the register values and the I/O permission bitmap), similar in content to a TSS. Because init_tss is indexed by CPU number and Linux cannot predict which CPU a switched-out process will next run on, this information cannot be kept in the TSS.
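These fields sit in struct task_struct roughly as sketched below (condensed; unrelated members and the exact ordering are omitted here):

/* condensed sketch, based on struct task_struct in [include/linux/sched.h] */
struct task_struct {
	volatile long state;		/* -1 unrunnable, 0 runnable, >0 stopped */
	volatile long need_resched;	/* checked before returning to user mode */
	long counter;			/* ticks left in the current time slice */
	long nice;			/* user-visible priority, -20..19 */
	unsigned long policy;		/* SCHED_OTHER / SCHED_FIFO / SCHED_RR (+ SCHED_YIELD bit) */
	struct mm_struct *mm;		/* owned address space; NULL for kernel threads */
	int processor;			/* CPU currently / most recently used */
	unsigned long cpus_runnable, cpus_allowed;
	struct list_head run_list;	/* links the process into runqueue_head */
	unsigned long rt_priority;	/* real-time priority, 1..99 */
	struct mm_struct *active_mm;	/* address space actually in use (see "Scheduler Workflow") */
	struct thread_struct thread;	/* saved registers and I/O permission bitmap */
	/* ... many other fields ... */
};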
3. current
The kernel very often needs the task_struct of the process currently running on a CPU; in Linux it is reached through the current pointer. The implementation of current uses a trick to make the access fast, and the trick relies on how the task_struct is stored. In Linux, the stack a process uses while running in the kernel is separate from its user-level stack. Since this kernel stack need not be large, only two pages (8KB) are allocated when the process is created, and the process's task_struct is placed at the bottom of that area. (In fact the two pages are obtained when the task_struct is allocated; after the task_struct has been initialized, ESP is preset to the top end of the area to serve as the kernel stack, which grows downward toward the task_struct.) Accessing the task_struct of the current process therefore requires only the following simple operation:

__asm__("andl %%esp,%0; ":"=r" (current) : "0" (~8191UL));

This ANDs the kernel stack pointer with ~8191 (0xffffe000), producing the base address of the 8KB kernel-stack area, which is exactly the address of the task_struct.
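For reference, a sketch of how this is packaged in the i386 tree (cf. [include/asm-i386/current.h]):

/* sketch, based on [include/asm-i386/current.h] */
static inline struct task_struct * get_current(void)
{
	struct task_struct *current;
	__asm__("andl %%esp,%0; ":"=r" (current) : "0" (~8191UL));
	return current;
}

#define current get_current()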
4. schedule_data
task_struct describes a process and records which CPU it runs on. Linux also maintains the opposite mapping, from a CPU to the process running on it, in a per-CPU schedule_data structure with two fields: curr, a pointer to the task_struct of the process running on that CPU, normally accessed through the cpu_curr(cpu) macro; and last_schedule, a timestamp recording when the last process switch happened on that CPU, normally accessed through the last_schedule(cpu) macro. To keep accesses aligned with the CPU cache line size, schedule_data is wrapped in an aligned_data union padded to SMP_CACHE_BYTES, and the system keeps one element per CPU in the aligned_data[NR_CPUS] array.

5. init_tasks
The scheduler does not use init_task, the head of the process list, directly; it only uses the "idle_task" processes, which sit in the cpu_idle() loop once the system has booted (see "The idle process" under "Scheduling-Related Parts of Other Kernel Code"). In an SMP system each CPU has its own idle_task; their task_struct pointers are collected in the init_tasks[NR_CPUS] array, and the scheduler reaches these idle processes through the idle_task(cpu) macro (see "Scheduler Workflow").

6. runqueue_head
A doubly linked list holding all TASK_RUNNING processes (the currently running processes are on it too, but the idle tasks are not). Every time the scheduler runs, it picks the most suitable process from this queue and puts it into operation.

III. Process Switching
Switching from the context of one process to that of another happens very frequently, so the efficiency of the switch itself is usually a key factor in scheduler performance. In Linux it is implemented by a classic piece of assembly code, which is examined here. The code is a macro named switch_to(), invoked from within schedule():
/* from [include/asm-i386/system.h] */
#define switch_to(prev,next,last) do {					\
	asm volatile(							\
		"pushl %%esi\n\t"					\
		"pushl %%edi\n\t"					\
		"pushl %%ebp\n\t"	/* save ESI, EDI, EBP */	\
		"movl %%esp,%0\n\t"	/* save ESP into prev->thread.esp */ \
		"movl %3,%%esp\n\t"	/* restore ESP from next->thread.esp */ \
		"movl $1f,%1\n\t"	/* save the address of label "1:" in prev->thread.eip; execution resumes there when prev is switched back in */ \
		"pushl %4\n\t"		/* push next->thread.eip; __switch_to() will return to it, i.e. enter the context of next */ \
		"jmp __switch_to\n"	/* jump to __switch_to() for further processing (see below) */ \
		"1:\t"							\
		"popl %%ebp\n\t"					\
		"popl %%edi\n\t"					\
		"popl %%esi\n\t"	/* restore the registers saved the last time this process was switched out, then return from switch_to() */ \
		:"=m" (prev->thread.esp),	/* %0 */		\
		 "=m" (prev->thread.eip),	/* %1 */		\
		 "=b" (last)	/* EBX: after the switch, the prev visible on the restored stack is not the process we just switched away from, so the real previous process is handed back through EBX */ \
		:"m" (next->thread.esp),	/* %3 */		\
		 "m" (next->thread.eip),	/* %4 */		\
		 "a" (prev), "d" (next),	/* EAX, EDX */		\
		 "b" (prev));			/* EBX */		\
} while (0)
The switch can be seen as having two phases. The assembly above is the first: it saves a few key registers and arranges the entry address of the new process on the stack. The second phase starts at the jmp __switch_to and is implemented by the __switch_to() function, which mainly saves and updates a few more registers (and the I/O permission bitmap ioperm):
- unlazy_fpu(): if the old process has the PF_USEDFPU bit set in task_struct::flags, it has used the FPU, and unlazy_fpu() saves the FPU contents into its task_struct::thread;
- the esp0 of the new process (task_struct::thread::esp0) is written into the corresponding slot of init_tss for this CPU;
- the current FS and GS registers are saved into the old process's task_struct::thread, and FS and GS are then reloaded from the new process's task_struct::thread;
- the debug registers are restored from the new process's task_struct::thread;
- the ioperm bitmap in init_tss is updated from the new process's task_struct::thread if necessary.

When __switch_to() returns, the return address on the stack is the new process's task_struct::thread::eip, i.e. the point where the new process was suspended last time (the "1:" label of the switch_to() it executed back then). From then on the CPU is running in the context of the new process.

In earlier Linux kernels the process switch was performed with the far jmp instruction; 2.4 uses the hand-written jump shown above, which does the same work in roughly the same time but is easier to optimize and control.

IV. Ready Process Selection Algorithm
Linux's schedule() function traverses all processes on the run queue, calls goodness() to compute a weight for each of them, and puts the process with the largest weight into operation. The weight is computed differently for real-time and non-real-time processes. For non-real-time (SCHED_OTHER) processes it depends on the following factors:

1. The number of ticks left in the current time slice, i.e. the counter field. The larger it is, the better the chance of getting the CPU. Since the initial value of counter is derived from nice, this factor represents the priority of the process on one hand, and the CPU time it is still "owed" on the other. (weight = p->counter;)
2. Whether the CPU the process last ran on is the current one. If so, the weight is increased by a constant, expressing a preference for not migrating the process across CPUs, since its cache contents there are still likely to be valid. (weight += PROC_CHANGE_PENALTY;)
3. Whether an address-space switch would be needed. If not (either the candidate shares the address space of the current process, or it is a kernel thread with no mm of its own), the weight is increased by 1, expressing a slight preference for processes that do not require switching the memory context. (weight += 1;)
4. The user-visible priority nice: the smaller the nice value, the larger the weight. (nice ranges from -20 to 19 in Linux, defaults to 0, and can be changed with the nice() system call.)
(weight += 20 - p->nice;)

For real-time processes (SCHED_FIFO, SCHED_RR) the weight depends only on the process's rt_priority (weight = 1000 + p->rt_priority); the base value of 1000 makes every real-time process outweigh every non-real-time process, so as long as a real-time process is on the run queue the scheduler serves it first. When two processes have the same weight, the one nearer the front of the run queue is chosen.

Besides these values, goodness() can also return -1, which means the process has its SCHED_YIELD bit set and should be chosen only if no other ready process exists.

If the traversal of all ready processes ends with a maximum weight of 0, the time slices of the current epoch are exhausted; the counter of every process (not only the ready ones) is then recalculated and the selection is repeated (see "Scheduler Workflow").
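Putting these factors together, the goodness() function looks roughly like this (condensed from [kernel/sched.c]):

/* sketch, condensed from goodness() in [kernel/sched.c] */
static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
{
	int weight;

	weight = -1;
	if (p->policy & SCHED_YIELD)		/* yielding: pick only if nothing else is ready */
		goto out;

	if (p->policy == SCHED_OTHER) {		/* ordinary time-sharing process */
		weight = p->counter;		/* factor 1: remaining time slice */
		if (!weight)
			goto out;
#ifdef CONFIG_SMP
		if (p->processor == this_cpu)	/* factor 2: prefer the CPU it last ran on */
			weight += PROC_CHANGE_PENALTY;
#endif
		if (p->mm == this_mm || !p->mm)	/* factor 3: no address-space switch needed */
			weight += 1;
		weight += 20 - p->nice;		/* factor 4: static priority */
		goto out;
	}

	weight = 1000 + p->rt_priority;		/* real-time process */
out:
	return weight;
}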
V. Scheduler Workflow
The Linux scheduler is implemented mainly in the schedule() function.

1. Basic flow
The basic flow of schedule() can be summarized in five steps:
1) clean up the currently running process;
2) select the next process to run;
3) set up the execution environment of the new process;
4) perform the context switch;
5) post-switch cleanup.

The flow also involves several locks: the run queue lock runqueue_lock, the global kernel lock kernel_flag, the global interrupt lock global_irq_lock, and the process list lock tasklist_lock. The locks are described first, then the scheduler's working process.

A. Related locks
- runqueue_lock, defined as a spin lock, must be taken before the run queue is manipulated.
- kernel_flag, defined as a spin lock. Many kernel operations (drivers, for example) must be executed by only one process at a time, so they call lock_kernel()/unlock_kernel() to take and release the kernel lock; these also maintain task_struct::lock_depth of the current process, where a lock_depth below 0 means the lock is not held. A process holding kernel_flag must not be switched out while it still holds the lock, so schedule() calls release_kernel_lock() to force the lock to be released before switching; when a process is put back into operation, if its lock_depth is greater than or equal to 0, i.e. it held the kernel lock before being switched out, reacquire_kernel_lock() is called to take the lock again.
- global_irq_lock, defined as a global unsigned long, is manipulated with the clear_bit()/set_bit() family and used together with global_irq_holder to record which CPU owns the global interrupt lock; the owner suspends interrupt handling across the whole system (see irq_enter()).
- tasklist_lock, defined as a read-write lock, protects the process list headed by init_task.

B. prev
Inside schedule(), the current process (the one that may be switched out) is accessed through the prev pointer.

For a SCHED_RR real-time process, a switch happens only when its time slice is exhausted (counter == 0); its counter is then reset according to its nice value and the process is moved to the end of the run queue. Of course, if no other real-time process is on the run queue at that moment, the scheduler will still pick it again, following the goodness() algorithm described above.

If a process in the TASK_INTERRUPTIBLE state has a signal pending (this can happen when a process that is about to sleep waiting for a signal finds, before it has actually given up the CPU, that the signal has already arrived), the scheduler does not necessarily run it at once, but puts it back into the TASK_RUNNING state (it has not yet been removed from the run queue), so that it takes part in the goodness selection that follows immediately.

If prev is not runnable and not in the special case above (that is, prev is waiting for some resource and has called schedule() to give up the CPU voluntarily), it is removed from the run queue; from then on only a wake-up operation can put it back on the run queue, and until that happens it takes no part in scheduling.

When the scheduler has been started passively, the need_resched flag of the current process has been set (see "Scheduler Invocation Timing"). schedule() clears this flag, indicating that the request has been handled by the scheduler (which of course does not mean the process is guaranteed to keep the CPU).

C. goodness
The scheduler traverses every process on the run queue and, for each one that can be scheduled here (cpus_runnable & cpus_allowed & (1 << cpu) is non-zero, meaning the process may run on the CPU executing the scheduler and is not currently running on another CPU), calls goodness() to compute its weight.
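In code, the traversal is roughly the following (condensed from schedule() in [kernel/sched.c]; can_schedule() is the SMP test just described):

/* sketch, condensed from schedule() in [kernel/sched.c] */
/* #define can_schedule(p,cpu) ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu)) */
repeat_schedule:
	next = idle_task(this_cpu);		/* default candidate: the idle task */
	c = -1000;				/* weight of the best candidate found so far */
	list_for_each(tmp, &runqueue_head) {
		p = list_entry(tmp, struct task_struct, run_list);
		if (can_schedule(p, this_cpu)) {
			int weight = goodness(p, this_cpu, prev->active_mm);
			if (weight > c)
				c = weight, next = p;
		}
	}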
The next pointer is used to point to the process with the largest weight; it defaults to idle_task, so if the run queue is empty, idle_task is what gets selected. As noted above, if the largest weight found by the traversal is 0, the time slices of all schedulable ready processes are exhausted, and the scheduler must recompute the counter of every process (ready and suspended alike); a process whose slice was not used up (for example a currently suspended process, or one running on another CPU) has half of its remaining slice carried over into the new one. The selection is then repeated.

After the chosen process has been marked as running on this CPU (task_set_cpu()), runqueue_lock can be released and the configuration of next begins.

D. Setting up next
The newly selected process may simply be the old one; then no switch is actually needed, and the configuration of next, as well as the "switch" and "schedule_tail" phases below, can be skipped. Setting up the execution environment of the new process essentially means setting up its memory. Two fields of task_struct matter to the scheduler here: mm and active_mm. The former points to the address space owned by the process, the latter to the address space the process actually uses. For most processes the two are identical, but a kernel thread owns no address space of its own, and its mm pointer is always NULL. As is well known, the page tables of every process map the kernel space to the same fixed high region of virtual memory, so whatever address space a kernel thread happens to borrow, the kernel memory it sees is the same, and no address-space switch is required for it. In the scheduler, next->mm == NULL therefore identifies a kernel thread; in that case next keeps using prev's active_mm (next->active_mm = prev->active_mm), and cpu_tlbstate[cpu].state is set to TLBSTATE_LAZY to tell the memory-management code not to flush the TLB. Otherwise switch_mm() is called to switch the address space (the details belong to memory management and are omitted here). In fact switch_mm() compares prev->active_mm and next->mm once more; if they are equal, the two processes are "threads" of the same "process" (actually lightweight processes) and no page-table switch is needed, although the TLB is still flushed in this case. After the memory environment of next has been set up, mmdrop() is called to release prev's hold on the borrowed address space if prev is a kernel thread; a kernel thread that is not running therefore always has a NULL active_mm.
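The corresponding code in schedule() is roughly the following (condensed from [kernel/sched.c]):

/* sketch, condensed from schedule() in [kernel/sched.c] */
{
	struct mm_struct *mm = next->mm;
	struct mm_struct *oldmm = prev->active_mm;
	if (!mm) {					/* next is a kernel thread */
		next->active_mm = oldmm;		/* borrow prev's address space */
		atomic_inc(&oldmm->mm_count);
		enter_lazy_tlb(oldmm, next, this_cpu);	/* sets cpu_tlbstate[cpu].state = TLBSTATE_LAZY on SMP */
	} else {
		switch_mm(oldmm, mm, next, this_cpu);	/* real address-space switch */
	}

	if (!prev->mm) {				/* prev was a kernel thread */
		prev->active_mm = NULL;
		mmdrop(oldmm);				/* drop the borrowed reference */
	}
}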
E. switch
The process switch itself has already been described in detail above.

F. schedule_tail
Once the switch is complete, the scheduler calls __schedule_tail(). On a UP system this function does nothing. On an SMP system, if the switched-out process (p) is still in the TASK_RUNNING state and is not being run by any CPU, __schedule_tail() calls reschedule_idle(), which picks an idle CPU (or one whose running process has a lower priority than p) for p and forces that CPU to reschedule so that p gets back into operation.

A process waking up from sleep likewise needs a suitable CPU to run on; this is done by calling reschedule_idle() from wake_up_process(). The CPU is picked according to the following principles:

- The CPU p last ran on is currently idle. This is clearly the best choice: no running process needs to be preempted, and that CPU's cache is the most likely to still hold p's data. (However, since p was running on that CPU until just now, the scheduler there cannot have switched to idle_task in the meantime, so this case can only arise from wake_up_process(), not from __schedule_tail().)
- Among all idle CPUs, the one that has been idle the longest, i.e. has the smallest last_schedule(cpu). The cache contents of that CPU are the least likely to be of further use, so this choice minimizes the cost of taking the CPU while also avoiding preempting a running process. It is worth noting that on an SMP platform built from Hyper-Threading CPUs, as soon as a fully idle physical CPU is found (both of its logical CPUs idle), one of its logical CPUs is immediately taken as the candidate CPU for p, without continuing to search for the least recently active one.
- A CPU that is not idle but whose running process has a lower priority than p, with the largest difference. The priority comparison uses goodness(), since it carries the most information.

After a suitable CPU has been found, reschedule_idle() sets the need_resched flag of the target process (the process currently running on that CPU, possibly idle_task) so that the scheduler gets a chance to run there (see "Scheduler Invocation Timing"). At the same time, because idle_task often keeps the CPU halted (HLT) to save power, smp_send_reschedule(cpu) is called to send a RESCHEDULE_VECTOR interrupt to that CPU (through the IPI mechanism) and wake it up.

Note: if the target process is idle_task, its need_resched flag is also examined, and action is taken only if it is 0, because the idle task keeps polling need_resched itself and will call schedule() on its own as soon as the flag becomes non-zero (see "The idle process").
G. Cleanup
The scheduler's work ends in one of two ways, with or without a switch, but the cleanup before schedule() returns is the same in both cases: the state of the new process is restored. This consists of two main actions:

- clear the SCHED_YIELD bit of the new process (whether or not it was set);
- if the new process's lock_depth is greater than or equal to 0, reacquire the kernel lock kernel_flag (see "Related locks").

2. Scheduler Invocation Timing
The scheduler is normally started in one of two ways.

A. Active invocation
Kernel code calls schedule() directly. This usually happens when a process must be suspended (put to sleep) to wait for some kernel event; it then actively requests a schedule so that other processes can use the CPU. The following is an example of active scheduling:
/* from mousedev_read() in [drivers/input/mousedev.c] */
add_wait_queue(&list->mousedev->wait, &wait);
current->state = TASK_INTERRUPTIBLE;

while (!list->ready) {

	if (file->f_flags & O_NONBLOCK) {
		retval = -EAGAIN;
		break;
	}

	if (signal_pending(current)) {
		retval = -ERESTARTSYS;
		break;
	}

	schedule();
}

current->state = TASK_RUNNING;	/* strictly speaking this line could be omitted: the wake-up has already restored the state to TASK_RUNNING */
remove_wait_queue(&list->mousedev->wait, &wait);
Such code can usually be broken into four steps:

- add the process to the wait queue of the event;
- set the process state to TASK_INTERRUPTIBLE (or TASK_UNINTERRUPTIBLE);
- in a loop, check whether the awaited condition is met; if not, call schedule(), and leave the loop once the condition holds;
- remove the process from the wait queue of the event.

From the scheduler workflow we know that a sleeping process is removed from the run queue by the scheduler, and only processes on the run queue can be scheduled. Putting the process back on the run queue is done during the "wake-up" that takes place when the event occurs. In the mouse driver shown above, the mouse interrupt calls the mousedev_event() function, which uses wake_up_interruptible() to wake all processes waiting for mouse events; wake_up_interruptible() eventually calls try_to_wake_up():
/* from [kernel/sched.c] */
static inline int try_to_wake_up(struct task_struct * p, int synchronous)
{
	unsigned long flags;
	int success = 0;

	spin_lock_irqsave(&runqueue_lock, flags);
	p->state = TASK_RUNNING;
	if (task_on_runqueue(p))
		goto out;
	add_to_runqueue(p);			/* put the process on the run queue */
	if (!synchronous || !(p->cpus_allowed & (1 << smp_processor_id())))
		reschedule_idle(p);		/* when called through wake_up(), synchronous is always 0; if this CPU
						   is not suitable for the process, reschedule_idle() finds one that is */
	success = 1;
out:
	spin_unlock_irqrestore(&runqueue_lock, flags);
	return success;
}
A schedule() started this way, through the need_resched flag set during wake-up, is a passive one.

B. Passive invocation
After a system call completes, before control returns from kernel mode to user mode, Linux checks the need_resched flag of the current process at the ret_from_sys_call entry; if it is 1, schedule() is called:
/* from [arch/i386/kernel/entry.S] */
ENTRY(ret_from_sys_call)
	cli
	cmpl $0,need_resched(%ebx)	# ebx holds the current pointer
	jne reschedule
	......
reschedule:
	call SYMBOL_NAME(schedule)
	jmp ret_from_sys_call		# check need_resched again after schedule() returns
Therefore, it is enough to set the need_resched flag of the current process for the scheduler to get a chance to run. need_resched is usually set in the following situations:

- update_process_times(), triggered by the clock interrupt, charges the tick to the time slice of the running process (the idle process, process 0, is exempt). If the time slice of the current process is used up (counter reaches 0) and it is not a SCHED_FIFO real-time process, need_resched is set to 1 (a sketch of this path follows the list). Note that counter is neither recomputed nor reset here; that happens in schedule(), and only after the slices of all ready processes have been exhausted.
- reschedule_idle(). This function was described in detail in the "Scheduler Workflow" section; its most frequent caller is the code that wakes a process sleeping on a wait queue, wake_up_process() and the other wake_up functions (see "Active invocation" above).
- The sched_setscheduler() and sched_yield() system calls, system initialization (rest_init()), creation of a new process (do_fork()), and other places whose semantics require the scheduler to be started.
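As a sketch of the first case, the slice accounting in the timer tick handler looks roughly like this (condensed from update_process_times() in [kernel/timer.c]; the per-CPU statistics bookkeeping is omitted):

/* sketch, condensed from [kernel/timer.c] */
void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id(), system = user_tick ^ 1;

	update_one_process(p, user_tick, system, cpu);
	if (p->pid) {				/* not the idle process */
		if (--p->counter <= 0) {	/* time slice exhausted */
			p->counter = 0;
			p->need_resched = 1;	/* ask for a reschedule on the next return to user mode */
		}
		/* ... per-CPU time statistics omitted ... */
	}
}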
Since the moment at which schedule() actually runs is ultimately determined by the current process, a reschedule request is not necessarily honored promptly; this is the root of the complaint that "the Linux kernel is not preemptible" (see "The kernel is not preemptible" under "Some Problems of the Linux 2.4 Scheduling System").

VI. Scheduling-Related Parts of Other Kernel Code
Many parts of the kernel interact with the scheduler. A few of them are briefly expanded on here, without going into the techniques themselves; only the aspects that touch the scheduler are discussed, and the reader is assumed to have a basic understanding of the rest.

1. The idle process
The initial boot thread of the system (init_task) becomes the idle process of CPU 0 once booting is finished. Each CPU has an idle process of its own; as mentioned above, they are recorded in the init_tasks[] array and accessed through the idle_task() macro (see "Related Data Structures"). The idle processes never enter the run queue, and once the system has stabilized, an idle process is scheduled only when the run queue is empty.

The task_struct of init_task is configured statically, by the INIT_TASK() macro defined in [include/linux/sched.h]; the fields relevant to scheduling are:

state: TASK_RUNNING
counter: 10*HZ/100, about 100 ms on i386
nice: 0, the default priority
policy: SCHED_OTHER, a non-real-time process
cpus_runnable: -1, all bits 1, not running on any CPU
cpus_allowed: -1, all bits 1, may run on any CPU

In smp_init() (actually in smp_boot_cpus() in [arch/i386/kernel/smpboot.c]), the processor field of init_task is set to 0 and the corresponding schedule_data is set accordingly. After creating the kernel thread that executes the init() function (rest_init() in [init/main.c]), init_task sets its own need_resched to 1 and calls cpu_idle() to enter the idle loop. cpu_idle() sets init_task's nice value to 20 (the lowest priority) and its counter to -100 (any sufficiently small value will do, the number itself has no meaning), and then enters an endless loop:

/* from cpu_idle() in [arch/i386/kernel/process.c] */
	while (1) {
		void (*idle)(void) = pm_idle;
		if (!idle)
			idle = default_idle;
		while (!current->need_resched)
			idle();
		schedule();
		check_pgt_cache();
	}
When cpu_idle() is entered for the first time during initialization, need_resched is 1, so schedule() is started right away. As mentioned above, schedule() clears need_resched, so from then on the inner loop keeps calling idle() until need_resched becomes non-zero again (for example when it is set in reschedule_idle(); see "Scheduler Invocation Timing"). The idle() function has three implementations:
- default_idle(), which executes the HLT instruction (sketched below);
- poll_idle(): if the kernel boot parameter "idle=poll" is given, pm_idle points to poll_idle(), which sets need_resched to the special value -1 and then spins until need_resched is no longer -1. Because poll_idle() polls with a more efficient instruction sequence, it reacts faster than default_idle();
- the power-management-related idle routines, such as those defined by the APM and ACPI modules.
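For illustration, the default variant is roughly the following (based on [arch/i386/kernel/process.c]; the exact guard conditions may differ slightly between 2.4 revisions):

/* sketch, based on default_idle() in [arch/i386/kernel/process.c] */
static void default_idle(void)
{
	if (current_cpu_data.hlt_works_ok && !hlt_counter) {
		__cli();			/* disable interrupts while testing need_resched */
		if (!current->need_resched)
			safe_halt();		/* sti; hlt: sleep until the next interrupt */
		else
			__sti();
	}
}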
Because the idle process is scheduled only when the run queue is empty, the check_pgt_cache() operation, which trims the page-table cache, is performed only when the system is completely idle.

2. Process creation
Apart from init_task, which is set up by hand, every other process, including the idle processes of the other CPUs, is created by do_fork(); the idle processes are created with the CLONE_PID flag. In do_fork() the relevant fields of the new process are set as follows:

state: TASK_UNINTERRUPTIBLE
pid: the same as the parent's if CLONE_PID is set (allowed only for pid 0), otherwise the next available pid
cpus_runnable: all bits 1, not running on any CPU
processor: the same as the parent's; the child is created on the parent's CPU and preferably runs there
counter: (parent's counter + 1) / 2; the parent's own counter is halved at the same time, so a process cannot grab extra run time by forking (similarly, when the child exits, its remaining time slice is returned to the parent, so that the parent does not lose time by having forked)

The new process is then linked into the process list with SET_LINKS() and woken up with wake_up_process() (see "Scheduler Invocation Timing").

3. SMP system initialization
After the key data structures have been initialized, and before device initialization, init_task calls smp_init() to initialize the SMP system. smp_init() calls smp_boot_cpus(), which calls do_boot_cpu() once for each additional CPU, completing the initialization of the rest of the SMP machine.

/* from do_boot_cpu() in [arch/i386/kernel/smpboot.c] */
	if (fork_by_hand() < 0)		/* do_fork(CLONE_VM|CLONE_PID) creates a new process sharing init_task's memory, with pid 0 */
		panic("failed fork for CPU %d", cpu);

	idle = init_task.prev_task;	/* in the process list, the newly created process always sits just before init_task */
	if (!idle)
		panic("No idle process for CPU %d", cpu);

	idle->processor = cpu;
	idle->cpus_runnable = 1 << cpu;	/* runs only on the specified CPU */
	map_cpu_to_boot_apicid(cpu, apicid);

	idle->thread.eip = (unsigned long) start_secondary;	/* the address it starts at once it is scheduled */

	del_from_runqueue(idle);	/* the idle process is not scheduled through the run queue */
	unhash_process(idle);
	init_tasks[cpu] = idle;		/* all idle processes are reachable through the init_tasks[] array */
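The fork_by_hand() call at the top is roughly the following (based on [arch/i386/kernel/smpboot.c]):

/* sketch, based on [arch/i386/kernel/smpboot.c] */
static int __init fork_by_hand(void)
{
	struct pt_regs regs;
	/* the eip and register contents do not matter here: this task will never
	 * return through the normal fork path, it is redirected to start_secondary() */
	return do_fork(CLONE_VM | CLONE_PID, 0, &regs, 0);
}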
When this idle process is later scheduled for the first time, it starts executing at start_secondary() and eventually calls cpu_idle(), becoming the idle process of its CPU.

VII. Some Problems of the Linux 2.4 Scheduling System
1. Process time slice
In the 2.4 kernel the default time slice of a process is computed by the following formula:
/* from [kernel/sched.c] */
#if HZ < 200
#define TICK_SCALE(x)	((x) >> 2)
#elif HZ < 400
#define TICK_SCALE(x)	((x) >> 1)
#elif HZ < 800
#define TICK_SCALE(x)	(x)
#elif HZ < 1600
#define TICK_SCALE(x)	((x) << 1)
#else
#define TICK_SCALE(x)	((x) << 2)
#endif

#define NICE_TO_TICKS(nice)	(TICK_SCALE(20-(nice))+1)

......

schedule()
{
	......
	p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
	......
}
As mentioned above, the clock interrupt keeps decrementing the counter of the currently running (non-idle) process; only when the counter of every process on the run queue has dropped to 0 does schedule() use the formula above to recompute the time slice of every process, sleeping ones included. On i386, HZ is defined as 100 in [include/asm-i386/param.h]; at that point counter is normally 0 and nice defaults to 0 (nice is chosen between -20 and 19), so the default counter on i386 is 6 ticks, roughly 60 ms (the clock interrupt fires about every 10 ms). A sleeping process, however, enters the recalculation with a non-zero counter, so its counter accumulates from one recalculation to the next according to the recurrence counter = counter/2 + k, with k = NICE_TO_TICKS(nice); as the recalculations repeat, counter approaches the limit 2k, so a process that sleeps for a long time can build up a time slice of at most about twice the default.

2. The kernel is not preemptible
Once a process is executing in kernel mode (in a system call or an interrupt handler, for instance), it is not switched out until it gives up the CPU itself. This can be seen in the interrupt return path:

/* from [arch/i386/kernel/entry.S] */
ENTRY(ret_from_intr)
	GET_CURRENT(%ebx)		# save the current pointer in EBX
ret_from_exception:
	movl EFLAGS(%esp),%eax		# take the VM_MASK bit of EFLAGS to test for VM86 mode
	movb CS(%esp),%al		# take the low two bits of CS to test for user mode
	testl $(VM_MASK | 3),%eax
	jne ret_from_sys_call		# if returning to VM86 mode or user mode, go through ret_from_sys_call; otherwise return directly
	jmp restore_all

Only the ret_from_sys_call path checks need_resched, and it is taken only when the interrupt was raised in user mode or VM86 mode; this is the only place where schedule() could be invoked in this situation (see "Scheduler Invocation Timing"). An ordinary kernel thread never returns to user mode: it can respond to interrupts, but they cannot cause it to be rescheduled. One consequence of this property is that a high-priority process cannot interrupt a lower-priority process that is executing a system call (or interrupt service) in the kernel, which is fatal for a real-time system, but it simplifies the kernel code: many places in the kernel rely on it to access shared data without further protection, without worrying about interference from other processes.

3. Real-time performance
Linux 2.4 distinguishes real-time from non-real-time processes: as long as a runnable real-time process exists, no non-real-time process gets to run. Real-time processes are further divided into SCHED_RR and SCHED_FIFO. A SCHED_RR process is rescheduled when its time slice ends and places itself at the end of the run queue, giving other real-time processes of the same (or higher) rt_priority a chance to run (see "Scheduler Workflow"); a SCHED_FIFO process does not give up the CPU because its slice has ended (see "Scheduler Invocation Timing"): unless a higher-priority real-time process appears or it gives up the CPU voluntarily, a SCHED_FIFO process keeps running until it finishes.

Yet although Linux 2.4 gives real-time processes scheduling priority over ordinary ones, that is all it does. An operating system whose kernel cannot be preempted can hardly be truly real-time, because the response time of a real-time task cannot be bounded. There are two ways to improve the real-time behavior: one is to insert something like preemption points into the kernel, the other is to make the kernel genuinely preemptible. Even a preemptible kernel does not by itself satisfy real-time requirements; it only addresses priority of access to the CPU. Other resources need to be "preemptible" as well: a real-time process should, for example, be able to take a resource it needs away from an ordinary process that holds it, and return it once done. In practice many systems cannot do this, and the Linux scheduler has no such capability either.
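For reference, a process requests one of the real-time policies (and its rt_priority) from user space through the standard sched_setscheduler() interface; a minimal sketch:

/* minimal user-space sketch: request SCHED_FIFO with rt_priority 50 for the calling process */
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param param;

	param.sched_priority = 50;	/* maps to task_struct::rt_priority, 1..99 */
	if (sched_setscheduler(0, SCHED_FIFO, &param) < 0) {	/* pid 0 means the calling process */
		perror("sched_setscheduler");	/* requires sufficient privileges */
		return 1;
	}
	/* from here on, the process outweighs every SCHED_OTHER process in goodness() */
	return 0;
}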
4. Limitations on multiprocessor systems
Linux was originally designed for uniprocessor machines, and during kernel development its behavior on multiprocessor (mainly SMP) systems has been improved through a continuous stream of patches. That style of development lasted up to version 2.4, so the 2.4 kernel still has obstacles that SMP scaling cannot get past, such as the single globally shared run queue. Quite a few projects have studied multiprocessor extensions of the Linux scheduler, two of which are listed in the references [5][6], but the most authoritative improvements appear only in the 2.6 kernel.

For Hyper-Threading CPUs the Linux 2.4 scheduler offers limited support: it can tell apart the two logical CPUs of one physical CPU, and when both logical CPUs are idle it prefers to place a process on one of them (see "Scheduler Workflow"). In principle, however, a Hyper-Threading CPU is a single CPU with two (or more) execution contexts; only when the two threads use different execution units (say, the integer and the floating-point units) does running them together bring a speedup. Otherwise, because of execution-unit conflicts and cache misses, hyper-threading can even cause a certain loss of performance. The Linux 2.4 scheduler cannot distinguish which processes are "similar" and which use different execution units, so it cannot make proper use of a Hyper-Threading CPU.