An exception is an abrupt change in control flow in response to some change in processor state. When the processor detects that an event has occurred, it makes an indirect procedure call, through a jump table called the exception table, to an operating system subroutine designed specifically to handle that kind of event. On the x86 this table is the Interrupt Descriptor Table (IDT). This article analyzes and debugs the Linux 0.11 code to understand the interrupt mechanism, focusing on the following three questions:
1. The establishment of the interrupt descriptor table.
2. The general interrupt handling process, using interrupt 0x3 as an example.
3. The system call process, using the fork system call as an example.
For how to set up the debugging environment, please refer to the article "Boot the code oscillate in the codeword in memory from Linux0.11".
Interrupt descriptor table
The code that creates the Interrupt Descriptor Table (IDT) is in boot/head.s. As with the Global Descriptor Table, the kernel completes the creation by executing the lidt idt_descr instruction. The global variable idt_descr is defined as follows:
idt_descr:
	.word 256*8-1		# idt contains 256 entries
	.long _idt

_idt:	.fill 256,8,0		# idt is uninitialized
The lidt instruction takes a 6-byte operand: a 2-byte limit followed by a 4-byte base address. It loads the address of _idt into the IDTR register, so the IDT is set up as a descriptor table of 256 entries of 8 bytes each.
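As a rough illustration of that 6-byte operand, the following C sketch models it as a packed structure and computes the limit for a given number of entries. The names idtr_operand and idt_limit are hypothetical and do not appear in the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C view of the 6-byte operand read by lidt:
 * a 16-bit limit followed by a 32-bit linear base address. */
#pragma pack(push, 1)
struct idtr_operand {
    uint16_t limit;  /* size of the table in bytes, minus 1 */
    uint32_t base;   /* linear address of the first entry   */
};
#pragma pack(pop)

/* Limit for a table of `entries` descriptors of 8 bytes each. */
static uint16_t idt_limit(unsigned entries)
{
    assert(entries >= 1 && entries <= 256);
    return (uint16_t)(entries * 8 - 1);
}
```

For 256 entries the limit is 256 * 8 - 1 = 0x7ff.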
The IDT entries are initialized mainly by the macro _set_gate, which is defined in include/asm/system.h as follows:
#define _set_gate(gate_addr,type,dpl,addr) \
__asm__ ("movw %%dx,%%ax\n\t" \
	"movw %0,%%dx\n\t" \
	"movl %%eax,%1\n\t" \
	"movl %%edx,%2" \
	: \
	: "i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
	"o" (*((char *) (gate_addr))), \
	"o" (*(4+(char *) (gate_addr))), \
	"d" ((char *) (addr)),"a" (0x00080000))

/* Set an interrupt gate: privilege level 0, type 386 interrupt gate */
#define set_intr_gate(n,addr) \
	_set_gate(&idt[n],14,0,addr)

/* Set a trap gate: privilege level 0, type 386 trap gate */
#define set_trap_gate(n,addr) \
	_set_gate(&idt[n],15,0,addr)

/* Set a system call gate: privilege level 3, type 386 trap gate */
#define set_system_gate(n,addr) \
	_set_gate(&idt[n],15,3,addr)
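The inline assembly is terse, so here is a minimal sketch in portable C of the two dwords that _set_gate stores for one entry. It assumes the kernel code selector 0x0008 hard-wired into the macro; the names gate_desc and make_gate are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct gate_desc {
    uint32_t low;   /* selector (31..16) | offset (15..0)     */
    uint32_t high;  /* offset (31..16)   | attributes (15..0) */
};

/* Build a gate the way _set_gate does: kernel code selector 0x0008,
 * present bit set, and the given type and descriptor privilege level. */
static struct gate_desc make_gate(uint32_t addr, unsigned type, unsigned dpl)
{
    struct gate_desc g;
    assert(type <= 15 && dpl <= 3);
    g.low  = 0x00080000u | (addr & 0x0000ffffu);
    g.high = (addr & 0xffff0000u) | 0x8000u | (dpl << 13) | (type << 8);
    return g;
}
```

With a handler at 0x7418, type 15 and DPL 3, this gives 0x00087418 for the low dword and 0x0000ef00 for the high dword.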
The kernel uses these macros to initialize the IDT. The code is as follows:
/* Excerpted from kernel/traps.c, trap_init function */
set_trap_gate(0,&divide_error);
set_trap_gate(1,&debug);
set_trap_gate(2,&nmi);
set_system_gate(3,&int3);	/* int3-5 can be called from all */
set_system_gate(4,&overflow);
set_system_gate(5,&bounds);
set_trap_gate(6,&invalid_op);
set_trap_gate(7,&device_not_available);
set_trap_gate(8,&double_fault);
set_trap_gate(9,&coprocessor_segment_overrun);
set_trap_gate(10,&invalid_TSS);
set_trap_gate(11,&segment_not_present);
set_trap_gate(12,&stack_segment);
set_trap_gate(13,&general_protection);
set_trap_gate(14,&page_fault);
set_trap_gate(15,&reserved);
set_trap_gate(16,&coprocessor_error);
for (i=17;i<48;i++)
	set_trap_gate(i,&reserved);
set_trap_gate(45,&irq13);
set_trap_gate(39,&parallel_interrupt);

/* Excerpted from kernel/chr_drv/serial.c, rs_init function */
set_intr_gate(0x24,rs1_interrupt);
set_intr_gate(0x23,rs2_interrupt);

/* Excerpted from kernel/chr_drv/console.c, con_init function */
set_trap_gate(0x21,&keyboard_interrupt);

/* Excerpted from kernel/sched.c, sched_init function */
set_intr_gate(0x20,&timer_interrupt);
set_system_gate(0x80,&system_call);

/* Excerpted from kernel/blk_drv/hd.c, hd_init function */
set_intr_gate(0x2E,&hd_interrupt);

/* Excerpted from kernel/blk_drv/floppy.c, floppy_init function */
set_trap_gate(0x26,&floppy_interrupt);
The individual interrupt vectors are not explained here. Interested readers can refer to "80386 and its programming in the protection mode" and Dr. Zhao Wei's "Linux core complete notes", published by Tsinghua University Press. The handling process itself is analyzed in detail in the examples below. For now we only care about the initialized IDT, so we inspect the table in the debugger, taking interrupts 0x0, 0x20 and 0x80 as examples. According to the System.map file, the divide_error function called by interrupt 0x0 is at address 0x8DEC, the timer_interrupt function called by interrupt 0x20 is at 0x74F4, and the system_call function called by interrupt 0x80 is at 0x7418. By the time the kernel calls fork to create a child process, the IDT has already been initialized, so we set a breakpoint at the address of the fork function, 0x753C, and start bochsdbg. The command line is as follows:
(0) Breakpoint 1, 0x753c in ?? ()
Next at t=16879006
(0) [0x0000753c] 0008:0000753c (unk. ctxt): call .+0x93d4 ; e8931e0000
......
idtr:base=0x54b8, limit=0x7ff
......
The base address of the IDT is 0x54b8. The descriptor for interrupt 0x0 is therefore at 0x54b8 + 0 * 8 = 0x54b8, the descriptor for interrupt 0x20 is at 0x54b8 + 0x20 * 8 = 0x55b8, and the descriptor for interrupt 0x80 is at 0x54b8 + 0x80 * 8 = 0x58b8. We examine the 8 bytes of memory at each of these three addresses; the command line is as follows:
[bochs]:
0x000054b8: 0x00088dec 0x00008f00
[bochs]:
0x000055b8: 0x000874f4 0x00008e00
[bochs]:
0x000058b8: 0x00087418 0x0000ef00
A gate descriptor has the following format:
Byte:   m+7  m+6          |  m+5  m+4    |  m+3  m+2  |  m+1  m+0
Field:  Offset (31...16)  |  Attributes  |  Selector  |  Offset (15...0)
The two attribute bytes are laid out as follows:
Byte m+5:  bit 7: P  |  bits 6-5: DPL  |  bit 4: DT0  |  bits 3-0: TYPE
Byte m+4:  bits 7-5: 000  |  bits 4-0: Dword Count
The debugging output therefore shows that the handler for interrupt 0x0 is at 0x0008:0x00008DEC and is reached through a 386 trap gate with privilege level 0; the handler for interrupt 0x20 is at 0x0008:0x000074F4 and is reached through a 386 interrupt gate with privilege level 0; and the handler for interrupt 0x80 is at 0x0008:0x00007418 and is reached through a 386 trap gate with privilege level 3. This is consistent with the earlier analysis of the code.
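The same reading of a descriptor can be done mechanically. The sketch below, with hypothetical names idt_entry_addr, gate_info and decode_gate, locates an entry from the IDT base and splits the two dwords of a gate into the fields discussed above; the dword values used in the usage note are the ones implied by the handler addresses and gate types just described.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of descriptor `vector` in a table starting at `base`. */
static uint32_t idt_entry_addr(uint32_t base, unsigned vector)
{
    assert(vector < 256);
    return base + vector * 8u;
}

struct gate_info {
    uint16_t selector;  /* code segment selector of the handler */
    uint32_t offset;    /* handler entry point                  */
    unsigned present;   /* P bit                                */
    unsigned dpl;       /* descriptor privilege level, 0..3     */
    unsigned type;      /* 14 = interrupt gate, 15 = trap gate  */
};

/* Split the low and high dwords of a gate descriptor into fields. */
static struct gate_info decode_gate(uint32_t low, uint32_t high)
{
    struct gate_info g;
    g.selector = (uint16_t)(low >> 16);
    g.offset   = (high & 0xffff0000u) | (low & 0x0000ffffu);
    g.present  = (high >> 15) & 1u;
    g.dpl      = (high >> 13) & 3u;
    g.type     = (high >> 8) & 0xfu;
    return g;
}
```

For example, idt_entry_addr(0x54b8, 0x80) yields 0x58b8, and decode_gate(0x00087418, 0x0000ef00) yields selector 0x0008, offset 0x7418, type 15 and DPL 3.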
The task's kernel-mode stack
Before analyzing how an interrupt is serviced, we introduce the task's kernel-mode stack. When an interrupt event occurs, the interrupt source sends a request to the CPU. If the CPU accepts it, the CPU saves the current register state, the return address and related information, and then transfers to the corresponding handler. After the interrupt has been handled, the CPU restores the saved information and resumes the original work. Because interrupt handling runs in kernel mode, each task has a kernel stack that is used to save and restore context during interrupt handling. This kernel stack lives in the same page as the task's task data structure. When a new task is created, fork sets the kernel stack fields in the task's TSS; the code is in the copy_process function of kernel/fork.c, as follows:
/* p is the new task being created */
p->tss.esp0 = PAGE_SIZE + (long) p;
p->tss.ss0 = 0x10;
The values of tss.esp0 and tss.ss0 do not change while the task exists, so this stack is always empty each time the task enters the kernel.
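The arithmetic behind tss.esp0 is small enough to sketch directly. The helper initial_esp0 below is hypothetical, and the 4096-byte page size is an assumption matching the 80386 page size rather than something quoted from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed 80386 page size */

/* Initial kernel stack pointer for a task whose task structure starts at
 * the page-aligned address `task_page`: the first byte past that page.
 * The stack then grows downward toward the task structure. */
static uint32_t initial_esp0(uint32_t task_page)
{
    assert(task_page % PAGE_SIZE == 0);
    return task_page + PAGE_SIZE;
}
```

A task structure placed at 0x00fa3000 would thus get an initial kernel stack pointer of 0x00fa4000.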
General interrupt handling process
Interrupt 0x3 is used to suspend the execution of a program. The Linux code shows that its handler merely prints some register state. This interrupt is chosen as the example for two reasons: it goes through a complete save and restore of context (by contrast, the handler for interrupt 0x0 terminates the process directly without restoring context), and it can be raised by a user program.
The 0x3 interrupt handler int3 is defined in kernel/asm.s, as follows:
# (arranged here in execution order)
_int3:
	pushl $_do_int3
	jmp no_error_code
no_error_code:
	xchgl %eax,(%esp)
	pushl %ebx
	pushl %ecx
	pushl %edx
	pushl %edi
	pushl %esi
	pushl %ebp
	push %ds
	push %es
	push %fs
	pushl $0		# "error code"
	lea 44(%esp),%edx
	pushl %edx
	movl $0x10,%edx
	mov %dx,%ds
	mov %dx,%es
	mov %dx,%fs
	call *%eax		# call the actual interrupt handler
	addl $8,%esp
	# the instructions below restore the saved context
	pop %fs
	pop %es
	pop %ds
	popl %ebp
	popl %esi
	popl %edi
	popl %edx
	popl %ecx
	popl %ebx
	popl %eax
	iret
This raises a question: when the privilege level changes, when are the user-mode stack segment and stack pointer saved and restored? The answer is that the CPU pushes them onto the kernel stack automatically when it responds to the interrupt, and pops them automatically when the iret instruction executes. The following experiment verifies this.
The experiment is somewhat tedious. The steps are:
1. Write a program that generates interrupt 0x3.
2. Set a breakpoint at the address of the int3 function and inspect the kernel stack at that point, that is, verify that the context has been saved.
3. Run until the interrupt returns and verify the effect of the iret instruction, that is, verify that the context is restored.
Writing a program that generates interrupt 0x3 is simple. Start Bochs with linux-0.11-devel-040329 (an image provided by Dr. Zhao Wei) and use vi to create a C file int3.c with the following code:
#include ...
... 3" ...
	return 0;
}
Compile this file to produce the executable int3.
According to the System.map file, the address of the 0x3 interrupt handler _int3 is 0x8E2F. Start bochsdbg for debugging; the command line is as follows:
(0) Breakpoint 1, 0x8e2f in ?? ()
Next at t=143245141
(0) [0x00008e2f] 0008:00008e2f (unk. ctxt): push 0x7af4 ; 68f47a0000
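First look at the contents of the kernel stack. The TR register holds the TSS selector of the current task, 0x60. In Linux 0.11 the first TSS descriptor is GDT entry 4 (selector 0x20) and each task occupies two consecutive GDT entries (its TSS and its LDT), so the current task is task (0x60 - 0x20) / 16 = 4. The ss0 and esp0 fields of its TSS hold the segment selector and stack pointer of the kernel stack, and the address of the TSS is given by the TSS descriptor in the GDT. Continue debugging; the command line is as follows: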
......
esp: 0xfa3fec		# this value is used in the later analysis
......
tr:s=0x60, dl=0x32e80068, dh=0x89fa, valid=1
gdtr:base=0x5cb8, limit=0x7ff
......
[bochs]:
0x00005d18: ...
[bochs]:
0x00fa32e8: 0x00000000 0x00fa4000 0x00000010 0x00000000
0x00fa32f8: 0x00000000 0x00000000 0x00000000 0x00000000
0x00fa3308: 0x000398af 0x00000246 0x00000000 0x00000005
0x00fa3318: 0x000574c0 0x00000014 0x03fffdd8 0x03fffde4
0x00fa3328: 0x00000001 0x00000000 0x00000017 0x0000000f
0x00fa3338: 0x00000017 ...
0x00fa3348: ...
Field     Offset   Data
Link      00H      0x00000000
ESP0      04H      0x00fa4000
SS0       08H      0x00000010
ESP1      0CH      0x00000000
SS1       10H      0x00000000
ESP2      14H      0x00000000
SS2       18H      0x00000000
CR3       1CH      0x00000000
EIP       20H      0x000398af
EFLAGS    24H      0x00000246
EAX       28H      0x00000000
ECX       2CH      0x00000005
EDX       30H      0x000574c0
EBX       34H      0x00000014
ESP       38H      0x03fffdd8
EBP       3CH      0x03fffde4
ESI       40H      0x00000001
EDI       44H      0x00000000
ES        48H      0x00000017
CS        4CH      0x0000000f
...
The link field and the segment fields (SS0, SS1, SS2, ES, CS) occupy bits 15-0 of their dword; bits 31-16 are zero.
Table 1: TSS structure of the current task
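The two address calculations used to find this TSS can be sketched in C. The sketch takes the GDTR base and the TR values (s, dl, dh) printed in the session above as inputs; the function names gdt_entry_addr and descriptor_base are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of the GDT entry named by a selector: the low three
 * bits (RPL and table indicator) are masked off, leaving index * 8. */
static uint32_t gdt_entry_addr(uint32_t gdt_base, uint16_t selector)
{
    assert((selector & 4u) == 0);   /* TI bit clear: selector refers to the GDT */
    return gdt_base + (selector & ~7u);
}

/* Segment base reassembled from the two descriptor dwords
 * (dl = low dword, dh = high dword, as the debugger prints them). */
static uint32_t descriptor_base(uint32_t dl, uint32_t dh)
{
    return (dl >> 16) | ((dh & 0x000000ffu) << 16) | (dh & 0xff000000u);
}
```

With GDTR base 0x5cb8 and TR selector 0x60 the descriptor sits at 0x5d18, and the descriptor dwords 0x32e80068 and 0x000089fa give a TSS base of 0x00fa32e8, the address dumped in the session.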
Table 1 shows that the initial kernel-mode stack pointer of this task is 0x00fa4000. The register state shows that the current stack pointer is 0x00fa3fec, which is 20 bytes, or 20 / 4 = 5 dwords, below the top of the stack. We examine these 5 dwords; the command line is as follows:
[bochs]:
0x00fa3fec: 0x0000001c 0x0000000f 0x00010202 0x03fffefc
0x00fa3ffc: 0x00000017
This information was saved automatically by the CPU before it entered the int3 handler. According to Zhao Wei's "Linux kernel full annotation", before control passes from the user program (process) to the interrupt handler, the CPU first pushes at least 12 bytes of information onto the interrupt handler's stack. The situation resembles a far call (an inter-segment subroutine call): the CPU pushes the code segment selector together with the offset of the return address. Another similarity to an inter-segment call is that the 80386 pushes this information onto the stack of the destination code; when an interrupt occurs, that destination stack is the kernel stack. In addition, the CPU always pushes the contents of the flags register EFLAGS. If the privilege level changes, for example from user level to kernel level, the CPU also pushes the original stack segment value and stack pointer onto the interrupt handler's stack.
Arranged in stack order, from the top of the stack area downward, the debug output reads as follows:
Bits 31-16   Bits 15-0       Value
0x0000       original SS     0x00000017
original ESP                 0x03fffefc
EFLAGS                       0x00010202
0x0000       CS              0x0000000f
EIP                          0x0000001c
Table 2: Contents of the stack on interrupt entry
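A C structure makes the layout of Table 2 easier to hold in mind. This is an illustrative sketch only: the names intr_frame, frame_bytes and selector_rpl are hypothetical, and the structure covers the case without an error code.

```c
#include <assert.h>
#include <stdint.h>

/* Dwords pushed by the CPU on an interrupt that changes privilege level,
 * listed from the lowest stack address (where ESP points) upward. */
struct intr_frame {
    uint32_t eip;
    uint32_t cs;       /* selector in the low 16 bits          */
    uint32_t eflags;
    uint32_t old_esp;  /* pushed only on a privilege change    */
    uint32_t old_ss;   /* pushed only on a privilege change    */
};

/* Bytes pushed automatically by the CPU (no error code). */
static unsigned frame_bytes(int privilege_change)
{
    return privilege_change ? 20u : 12u;
}

/* Requested privilege level encoded in the low two bits of a selector. */
static unsigned selector_rpl(uint32_t selector)
{
    return selector & 3u;
}
```

Starting from an empty kernel stack at 0x00fa4000, a 20-byte frame leaves ESP at 0x00fa3fec, and the saved CS value 0x0f has RPL 3, which identifies the interrupted code as user-mode code.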
Executing the iret instruction is likewise similar to returning from a subroutine call: the contents of the stack are popped automatically into the corresponding registers, which completes the restoration of context on interrupt return. We debug to verify this process; the command line is as follows:
(0) [0x00008e34] 0008:00008e34 (unk. ctxt): jmp .+0x8df1 ; ebbb
Next at t=172477605
(0) [0x00008df1] 0008:00008df1 (unk. ctxt): xchg dword ptr ss:[esp], eax ; 870424
......
00008e20: ( ): iretd ; cf
(0) Breakpoint 2, 0x8e20 in ?? ()
Next at t=172498467
(0) [0x00008e20] 0008:00008e20 (unk. ctxt): iretd ; cf
Next at t=172498468
(0) [0x00fac01c] 000f:0000001c (unk. ctxt): xor eax, eax ; 31c0
......
esp: 0x3fffefc
eflags: 0x10202
eip: 0x1c
cs:s=0xf, dl=0x0, dh=0x10c0fa00, valid=1
ss:s=0x17, dl=0x3fff, dh=0x10c0f300, valid=1
......
No further explanation is needed: Table 2 and the register state above speak for themselves.
System call process
Taking the fork system call as an example, it is defined as follows:
/* Excerpted from init/main.c */
static inline _syscall0(int,fork)

/* Excerpted from include/unistd.h */
#define __NR_fork	2

/* Excerpted from include/unistd.h */
#define _syscall0(type,name) \
type name(void) \
{ \
long __res; \
__asm__ volatile ("int $0x80" \
	: "=a" (__res) \
	: "0" (__NR_##name)); \
if (__res >= 0) \
	return (type) __res; \
errno = -__res; \
return -1; \
}
The value of __NR_fork, 2, is an index into the jump table used by the system call interrupt handler. This table of system call function pointers is defined as follows:
/* Excerpted from include/linux/sched.h */
typedef int (*fn_ptr)();

/* Excerpted from include/linux/sys.h */
fn_ptr sys_call_table[] = { sys_setup, sys_exit, sys_fork, sys_read,
sys_write, sys_open, sys_close, sys_waitpid, sys_creat, sys_link,
sys_unlink, sys_execve, sys_chdir, sys_time, sys_mknod, sys_chmod,
sys_chown, sys_break, sys_stat, sys_lseek, sys_getpid, sys_mount,
sys_umount, sys_setuid, sys_getuid, sys_stime, sys_ptrace, sys_alarm,
sys_fstat, sys_pause, sys_utime, sys_stty, sys_gtty, sys_access,
sys_nice, sys_ftime, sys_sync, sys_kill, sys_rename, sys_mkdir,
sys_rmdir, sys_dup, sys_pipe, sys_times, sys_prof, sys_brk, sys_setgid,
sys_getgid, sys_signal, sys_geteuid, sys_getegid, sys_acct, sys_phys,
sys_lock, sys_ioctl, sys_fcntl, sys_mpx, sys_setpgid, sys_ulimit,
sys_uname, sys_umask, sys_chroot, sys_ustat, sys_dup2, sys_getppid,
sys_getpgrp, sys_setsid, sys_sigaction, sys_sgetmask, sys_ssetmask,
sys_setreuid, sys_setregid };

The value of sys_call_table[2] is a pointer to the sys_fork function. What that function does is not the focus of this article; interested readers can refer to other material.
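The idea of indexing a table of function pointers by system call number can be tried out in user space. The following is a toy model, not kernel code: the handlers demo_setup, demo_exit and demo_fork and the dispatcher demo_dispatch are hypothetical stand-ins that only return recognizable values.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_ptr)(void);

/* Stand-in handlers: each returns a recognizable value. */
static int demo_setup(void) { return 100; }
static int demo_exit(void)  { return 101; }
static int demo_fork(void)  { return 102; }

static fn_ptr demo_call_table[] = { demo_setup, demo_exit, demo_fork };

#define DEMO_NR_CALLS (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

/* Check the number against the table size, then call through the
 * table entry selected by nr. Out-of-range numbers yield -1. */
static int demo_dispatch(unsigned nr)
{
    if (nr >= DEMO_NR_CALLS)
        return -1;
    assert(demo_call_table[nr] != NULL);
    return demo_call_table[nr]();
}
```

Here demo_dispatch(2) calls demo_fork, just as index 2 of sys_call_table selects sys_fork.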
Expanding the macros _syscall0 and __NR_fork gives:
static inline int fork(void)
{
	long __res;
	__asm__ volatile ("int $0x80"
		: "=a" (__res)
		: "0" (2));	/* EAX value is 2 */
	if (__res >= 0)
		return (int) __res;
	errno = -__res;
	return -1;
}
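The tail of the expanded function converts the raw value left in EAX into the usual C convention. That step can be isolated and tested without raising any interrupt, as in this sketch; the name demo_syscall_result is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Convert a raw kernel return value into the C library convention:
 * non-negative values pass through, negative values become -1 with
 * errno set to the positive error number. */
static int demo_syscall_result(long res)
{
    if (res >= 0)
        return (int)res;
    errno = (int)-res;
    return -1;
}
```

A raw result of 7 is returned as 7, while a raw result of -11 is returned as -1 with errno set to 11.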
What the fork function does is now clear: it sets EAX to 2 and raises interrupt 0x80, whose handler is system_call (remember set_system_gate(0x80,&system_call)?). system_call is defined as follows:
_system_call:
	cmpl $nr_system_calls-1,%eax	# eax holds the index into the system call table
	ja bad_sys_call
	push %ds			# save context
	push %es
	push %fs
	pushl %edx
	pushl %ecx		# push %ebx,%ecx,%edx as parameters
	pushl %ebx		# to the system call
	movl $0x10,%edx		# set up ds,es to kernel space
	mov %dx,%ds
	mov %dx,%es
	movl $0x17,%edx		# fs points to local data space
	mov %dx,%fs
	call _sys_call_table(,%eax,4)	# call the handler through the system call table
	pushl %eax
	movl _current,%eax
	cmpl $0,state(%eax)		# state: reschedule if the current process is not runnable
	jne reschedule
	cmpl $0,counter(%eax)		# counter: reschedule if the time slice is used up
	je reschedule
ret_from_sys_call:
	movl _current,%eax		# task[0] cannot have signals
	cmpl _task,%eax
	je 3f
	cmpw $0x0f,CS(%esp)		# was old code segment supervisor ?
	jne 3f
	cmpw $0x17,OLDSS(%esp)		# was stack segment = 0x17 ?
	jne 3f
	movl signal(%eax),%ebx
	movl blocked(%eax),%ecx
	notl %ecx
	andl %ebx,%ecx
	bsfl %ecx,%ecx
	je 3f
	btrl %ecx,%ebx			# a signal is pending: call the signal handling code
	movl %ebx,signal(%eax)
	incl %ecx
	pushl %ecx
	call _do_signal
	popl %eax
3:	popl %eax			# restore context
	popl %ebx
	popl %ecx
	popl %edx
	pop %fs
	pop %es
	pop %ds
	iret				# return from interrupt
The CPU handles interrupt 0x80 in the same way as a general interrupt: it pushes CS, EIP and EFLAGS onto the target stack, and on return from the interrupt it pops them back into the corresponding registers. The handler then dispatches to the specific system call through the table of system call function pointers. This process is not verified here; interested readers can follow the debugging procedure used above for the general interrupt.
EIP value
As analyzed above, the CPU pushes an EIP value when it responds to an interrupt and loads that value back into EIP when the interrupt returns; this is how the application's control flow resumes. Which EIP value is pushed depends on the class of exception:
Class       Cause                           Async/Sync   Return behavior
Interrupt   Signal from an I/O device       Async        Always returns to the next instruction
Trap        Intentional exception           Sync         Always returns to the next instruction
Fault       Potentially recoverable error   Sync         Depending on the fault, either re-executes the current instruction or terminates
Abort       Unrecoverable error             Sync         Never returns
Table 3: Exception classes (taken from "In-depth Understanding of Computer Systems")
The 0x3 and 0x80 interrupts analyzed earlier are both traps, so once the interrupt completes the CPU always switches from kernel mode back to user mode (through the segmentation mechanism, by loading different segment descriptors into the segment registers) and returns to the next instruction of the application.
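The classification in Table 3 can be written down as a small lookup in C. The enum and function names below are hypothetical and only encode the table; they are not taken from any kernel source.

```c
#include <assert.h>

enum exc_class { EXC_INTERRUPT, EXC_TRAP, EXC_FAULT, EXC_ABORT };

enum resume_at {
    RESUME_NEXT,              /* returns to the next instruction            */
    RESUME_CURRENT_OR_ABORT,  /* re-executes the current one, or terminates */
    RESUME_NEVER              /* does not return                            */
};

/* Return behavior for each exception class, as listed in Table 3. */
static enum resume_at return_behavior(enum exc_class c)
{
    switch (c) {
    case EXC_INTERRUPT:
    case EXC_TRAP:
        return RESUME_NEXT;
    case EXC_FAULT:
        return RESUME_CURRENT_OR_ABORT;
    default:
        assert(c == EXC_ABORT);
        return RESUME_NEVER;
    }
}

/* Only interrupts are asynchronous to the instruction stream. */
static int is_synchronous(enum exc_class c)
{
    return c != EXC_INTERRUPT;
}
```

Both int3 and int 0x80 fall under EXC_TRAP, so return_behavior gives RESUME_NEXT for them.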
Postscript
Interrupt handling behaves much like a far call (an inter-segment subroutine call); once the far call procedure is understood, interrupt handling follows naturally. Many concepts in computer theory are connected to one another, so a solid grasp of the fundamentals makes it possible to learn new topics by analogy.