X86 Assembly Language Learning Notes (1)

Author: Badcoffee
Email: blog.oliver@gmail.com
October 2004
Original: http://blog.9cbs.net/yayong
Copyright: When reprinting, be sure to indicate the original source, the author information, and this statement in the form of a hyperlink.

These are the author's study notes from learning x86 assembly, so mistakes and omissions are inevitable. Corrections are welcome. The author will fix errors as they are found and publish new versions on his own blog. Strictly speaking, this document focuses more on the C language and the C compiler. For details of the assembly language itself, refer to the relevant documentation.

1. Compilation Environment

OS: Solaris 9 x86
Compiler: GCC 3.3.2
Linker: Solaris Link Editors 5.x
Debug Tool: MDB
Editor: VI

Note: For installing and setting up the compilation environment, see the article: Development Environment Installation and Settings on Solaris. MDB is the kernel debugger shipped with Solaris; here it is used as a disassembler and assembly-level debugger. On Linux, GDB can be used for disassembly and debugging instead.
2. Analysis of the Simplest C Code

To keep the problem simple, we analyze the assembly code generated from the simplest possible C program:

# vi test1.c
int main()
{
    return 0;
}

Compile the program to produce the binary:

# gcc test1.c -o test1
# file test1
test1: ELF 32-bit LSB executable 80386 Version 1, dynamically linked, not stripped

test1 is an ELF executable that is dynamically linked and whose symbol table has not been stripped. This is the typical executable format on UNIX/Linux platforms. Disassembling it with mdb shows the generated assembly code:

# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis ; disassemble the main function; the general form of the command is address::dis
main: pushl %ebp ; push EBP, saving the stack base address of main's caller
main+1: movl %esp,%ebp ; copy ESP to EBP, setting the stack base of main
main+3: subl $8,%esp
main+6: andl $0xf0,%esp
main+9: movl $0,%eax
main+0xe: subl %eax,%esp
main+0x10: movl $0,%eax ; set the function return value to 0
main+0x15: leave ; copy EBP to ESP, then pop the caller's stack base address into EBP, restoring the original stack base
main+0x16: ret ; main returns to its caller
>

Note: The assembly syntax shown here differs considerably from the syntax in Intel's manuals. UNIX/Linux tools use the AT&T format as their assembly syntax. To learn about AT&T assembly, see the article: Linux AT&T Assembly Language Development Guide
Question: Who calls the main function? At the C language level, main is the starting entry point of a program. In fact, the entry point of an ELF executable is not main but _start.
mdb can also disassemble _start:

> _start::dis ; disassemble starting from the address of _start
_start: pushl $0
_start+2: pushl $0
_start+4: movl %esp,%ebp
_start+6: pushl %edx
_start+7: movl $0x80504b0,%eax
_start+0xc: testl %eax,%eax
_start+0xe: je +0xf <_start+0x1d>
_start+0x10: pushl $0x80504b0
_start+0x15: call -0x75
0x61: call -0xa1

pushl %ebp
movl %esp,%ebp

8) leave releases the stack frame of the current function or procedure, and is equivalent to the following instructions:

movl %ebp,%esp
popl %ebp

If you disassemble a function, you will often find statements of the following form at its entry and return:
pushl %ebp ; push EBP, i.e. save the stack base address of main's caller
movl %esp,%ebp ; copy ESP to EBP, setting the stack base of main
......... ; the two instructions above are equivalent to enter 0,0
.........
leave ; copy EBP to ESP, then pop the caller's stack base address into EBP, restoring the original stack base
ret ; main returns to its caller

These statements create and release the stack frame of a function or procedure. The compiler automatically inserts them at the entry and exit of every function. When a function is called:

1) EIP and EBP form the boundary of the new function's stack frame. When the function is called, the return EIP is pushed onto the stack first; when the stack frame is created, the caller's EBP is pushed next, and together with the EIP it forms the boundary of the new stack frame.
2) EBP becomes the stack frame pointer (SFP), marking the boundary of the new function's stack frame. Once the frame is created, the word that EBP points to is the caller's EBP. Following this chain of saved EBP values walks the stack of nested calls, and debuggers use exactly this property to implement backtrace.
3) ESP always points to the top of the stack and is used to allocate stack space. Space for a function's local variables is usually allocated by subtracting a constant from ESP; allocating one integer, for example, is ESP-4.
4) Parameter passing and local variable access can both be done through the SFP. Because the stack frame pointer always points to the stack base of the current function, parameters and local variables are usually accessed as follows:

+8+xx(%ebp) ; access to a function parameter
-xx(%ebp) ; access to a function local variable

If function A calls function B, and function B calls function C, the stack frames and the call relationship look like this:
High Address
+--------------------------------------+
| EIP (return address of A's caller)   |
+--------------------------------------+  ---- frame of A starts ----
| EBP (EBP of A's caller)              |  <-- EBP of current function A (the SFP)
+--------------------------------------+
| local variables                      |      A's locals are accessed as A's EBP - offset A
| ...                                  |  <-- ESP points to A's newly allocated locals
+--------------------------------------+
| arg n (nth parameter of function B)  |
+--------------------------------------+
| arg ..                               |
+--------------------------------------+
| arg 1 (1st parameter of function B)  |
+--------------------------------------+
| arg 0 (0th parameter of function B)  |  <-- B's parameters are accessed as B's EBP + offset B
+--------------------------------------+
| EIP (return address into function A) |
+--------------------------------------+  ---- frame of B starts ----
| EBP (EBP of function A)              |  <-- EBP of current function B (the SFP)
+--------------------------------------+
| local variables                      |
| ...                                  |  <-- ESP points to B's newly allocated locals
+--------------------------------------+
| arg n (nth parameter of function C)  |
+--------------------------------------+
| arg ..                               |
+--------------------------------------+
| arg 1 (1st parameter of function C)  |
+--------------------------------------+
| arg 0 (0th parameter of function C)  |
+--------------------------------------+
| EIP (return address into function B) |
+--------------------------------------+  ---- frame of C starts ----
| EBP (EBP of function B)              |  <-- EBP of current function C (the SFP)
+--------------------------------------+
| local variables                      |
| ...                                  |  <-- ESP points to C's newly allocated locals
+--------------------------------------+
Low Address

Figure 1-1

Next, analyze the meaning of the remaining statements in the disassembly of test1:
# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis ; disassemble the main function
main: pushl %ebp
main+1: movl %esp,%ebp ; create the stack frame
main+3: subl $8,%esp ; allocate 8 bytes of stack space
main+6: andl $0xf0,%esp ; clear the low 4 bits of ESP, aligning the stack to 16 bytes
main+9: movl $0,%eax ; meaningless
main+0xe: subl %eax,%esp ; meaningless
main+0x10: movl $0,%eax ; set main's return value
main+0x15: leave ; tear down the stack frame
main+0x16: ret ; main returns
>

The following two statements appear to be meaningless. Are they really?

movl $0,%eax
subl %eax,%esp
Recompile test1.c with GCC's -O2 optimization:

# gcc -O2 test1.c -o test1
# mdb test1
> main::dis
main: pushl %ebp
main+1: movl %esp,%ebp
main+3: subl $8,%esp
main+6: andl $0xf0,%esp
main+9: xorl %eax,%eax ; set main's return value, using xorl (exclusive or) to make EAX 0
main+0xb: leave
main+0xc: ret
>

The new disassembly is simpler than the first one. Sure enough, the statements previously suspected of being useless have been optimized away, which confirms the earlier guess.

Tip: Some statements generated by the compiler may have no effect on the program's actual semantics; the optimization options can remove them.
Question: Why use xorl to set the value of EAX? In the optimized code, the return value in EAX is set with xorl %eax,%eax instead of movl $0,%eax. On IA32 the xorl form is shorter (2 bytes, against 5 bytes for movl $0,%eax, as the instruction offsets in the two listings show) and runs at least as fast.

Concept: Stack alignment (Stack Aligned)

What do the following statements do?

subl $8,%esp
andl $0xf0,%esp ; clear the low 4 bits, guaranteeing 16-byte alignment of the stack address

On the surface, the most direct effect of these statements is to make the low 4 bits of the address in ESP zero, that is, to align it to 16 bytes. Why do this? Some instructions of the IA32 family run faster when their data is aligned to 4, 8, or 16 bytes, so to improve the speed of the code it generates for IA32, GCC aligns the stack to 16 bytes by default. The purpose of andl $0xf0,%esp is obvious, but is subl $8,%esp necessary? Assume the stack is 16-byte aligned before main is called. After entering main, the EIP and EBP have been pushed, so the low 4 bits of the stack address must be 1000, and ESP-8 makes those 4 bits exactly 0000. So this too appears to be there to keep the stack 16-byte aligned. The GCC manual documents an option that controls stack alignment:

-mpreferred-stack-boundary=n ; try to keep the stack aligned to 2 raised to the power n bytes; n ranges from 2 to 12

By default n is 4, so GCC aligns the stack to 16 bytes to suit the requirements of most IA32 instructions. Let us use -mpreferred-stack-boundary=2 to remove the stack alignment instructions:
# gcc -mpreferred-stack-boundary=2 test1.c -o test1
> main::dis
main: pushl %ebp
main+1: movl %esp,%ebp
main+3: movl $0,%eax
main+8: leave
main+9: ret
>

As can be seen, the stack alignment instructions are gone. The IA32 stack is already 4-byte aligned by nature, so no extra instructions are needed. Can the stack frame pointer (SFP) be removed as well?

# gcc -mpreferred-stack-boundary=2 -fomit-frame-pointer test1.c -o test
> main::dis
main: movl $0,%eax
main+5: ret
>

This shows that -fomit-frame-pointer removes the SFP.
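The alignment arithmetic behind the subl $8,%esp / andl $0xf0,%esp pair discussed above can be checked with a small C sketch. The function names and the example addresses are made up for illustration. The sketch also assumes that the 0xf0 printed by mdb is an 8-bit immediate that the CPU sign-extends to 0xfffffff0, so that only the low 4 bits of ESP are cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Mimics the two prologue instructions of main.
 * `esp` is a hypothetical 32-bit stack pointer value taken right after
 * `pushl %ebp`. */
uint32_t align_main_stack(uint32_t esp)
{
    esp -= 8;                       /* subl $8,%esp */
    esp &= UINT32_C(0xfffffff0);    /* andl: clear the low 4 bits (assumed sign-extended mask) */
    return esp;
}

/* Round addr down to a multiple of 2^n bytes, the boundary that
 * -mpreferred-stack-boundary=n asks for. */
uint32_t align_down(uint32_t addr, unsigned n)
{
    uint32_t mask = ~((UINT32_C(1) << n) - 1);
    return addr & mask;
}
```

For example, if ESP were 0x08047d10 before the call to main (an arbitrary 16-byte aligned address), pushing the return address and EBP would leave it at 0x08047d08, and align_main_stack(0x08047d08) gives 0x08047d00. In that case the andl changes nothing, because the subl alone already restores the alignment.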
Question: Are there any drawbacks to removing the SFP?
1) It makes debugging harder. The debugger's backtrace command relies on the SFP, so without the SFP that command cannot be used.
2) It makes the assembly code less readable. Without EBP, function parameters and local variables can only be accessed as +xx(%esp), and it is hard to tell the two kinds of access apart, which reduces the readability of the program.
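To see why backtrace depends on the SFP, the frame mechanics can be modeled in C. This is only a toy model under stated assumptions, not how mdb or any real debugger is implemented: the stack is an array of 32-bit words, addresses are byte offsets into it, a saved EBP of 0 marks the end of the chain, and all names and return addresses (0x100, 0x200, 0x300) are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64

/* Toy 32-bit machine state: the stack grows toward lower offsets. */
struct cpu {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;   /* byte offset of the top of the stack */
    uint32_t ebp;   /* frame pointer; 0 means "no frame" in this model */
};

void cpu_init(struct cpu *c)
{
    memset(c->mem, 0, sizeof c->mem);
    c->esp = STACK_WORDS * 4;
    c->ebp = 0;
}

void stack_push(struct cpu *c, uint32_t v)
{
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t stack_pop(struct cpu *c)
{
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Models `call` followed by the usual prologue. */
void enter_function(struct cpu *c, uint32_t return_addr, uint32_t local_bytes)
{
    stack_push(c, return_addr);   /* pushed by call */
    stack_push(c, c->ebp);        /* pushl %ebp */
    c->ebp = c->esp;              /* movl %esp,%ebp */
    c->esp -= local_bytes;        /* subl $n,%esp */
}

/* Models `leave` followed by `ret`; returns the popped return address. */
uint32_t leave_function(struct cpu *c)
{
    c->esp = c->ebp;              /* leave: movl %ebp,%esp */
    c->ebp = stack_pop(c);        /*        popl %ebp */
    return stack_pop(c);          /* ret */
}

/* Follows the chain of saved EBP values, innermost frame first. */
int walk_frames(const struct cpu *c, uint32_t *out, int max)
{
    int n = 0;
    uint32_t fp = c->ebp;
    while (fp != 0 && n < max) {
        out[n++] = c->mem[fp / 4 + 1];   /* 4(%ebp): return address */
        fp = c->mem[fp / 4];             /* 0(%ebp): caller's EBP */
    }
    return n;
}

/* A calls B calls C; returns the return address found at `depth`, or 0. */
uint32_t demo_return_addr(int depth)
{
    struct cpu c;
    uint32_t ra[8];
    int n;
    cpu_init(&c);
    enter_function(&c, 0x100, 8);   /* A */
    enter_function(&c, 0x200, 4);   /* B */
    enter_function(&c, 0x300, 0);   /* C */
    n = walk_frames(&c, ra, 8);
    return (depth >= 0 && depth < n) ? ra[depth] : 0;
}
```

The walk needs nothing but EBP and the two words stored at each frame boundary. If the prologue never saves EBP, there is no chain for walk_frames to follow, which is the debugging cost described in point 1.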
Question: What are the advantages of removing the SFP?
1) It saves stack space.
2) It removes the instructions that establish and tear down the stack frame, simplifying the code.
3) It frees EBP for use as a general-purpose register, increasing the number of general-purpose registers available.
4) Together, the three points above make the program run faster.

Concept: Calling Convention and ABI (Application Binary Interface)
How does a function find its parameters?
How does a function return its result?
Where does a function store its local variables?
Which hardware registers may be used as scratch space?
Which hardware registers must be preserved?

The calling convention specifies the answers to these questions. The calling convention is itself part of the ABI.
Operating systems that comply with the same ABI specification can therefore interoperate at the binary level. For example, because Solaris and Linux both comply with the System V ABI, Solaris 10 provides the ability to run Linux binaries. See the article:
Focus: 10 New Changes in Solaris 10
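As a rough illustration of what a calling convention pins down, the sketch below simulates one call in C. It assumes the usual i386 System V (cdecl) rules, which the text names but does not spell out: arguments are pushed right to left, the first argument sits at 8(%ebp) in the callee, the result comes back in EAX, and the caller removes the arguments afterwards. The register struct, the function names, and the return address 0x1234 are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Only the registers this sketch needs. */
struct regs { uint32_t eax, esp, ebp; };

/* Toy stack: addresses are byte offsets, growing toward lower offsets. */
static uint32_t stack_mem[32];

void push32(struct regs *r, uint32_t v)
{
    r->esp -= 4;
    stack_mem[r->esp / 4] = v;
}

uint32_t load32(uint32_t addr)
{
    return stack_mem[addr / 4];
}

/* Callee: models combine(a, b, c) = a*100 + b*10 + c. */
void callee_combine(struct regs *r)
{
    push32(r, r->ebp);                  /* pushl %ebp */
    r->ebp = r->esp;                    /* movl %esp,%ebp */
    r->eax = load32(r->ebp + 8) * 100   /* 8(%ebp):  first argument */
           + load32(r->ebp + 12) * 10   /* 12(%ebp): second argument */
           + load32(r->ebp + 16);       /* 16(%ebp): third argument */
    r->esp = r->ebp;                    /* leave: movl %ebp,%esp */
    r->ebp = load32(r->esp);            /*        popl %ebp */
    r->esp += 4;
    r->esp += 4;                        /* ret: pop the return address */
}

/* Caller side of the same convention. */
uint32_t call_combine(uint32_t a, uint32_t b, uint32_t c)
{
    struct regs r = { 0, (uint32_t)sizeof stack_mem, 0 };
    push32(&r, c);                      /* arguments pushed right to left */
    push32(&r, b);
    push32(&r, a);
    push32(&r, 0x1234);                 /* return address pushed by call (made up) */
    callee_combine(&r);
    r.esp += 12;                        /* caller removes the three arguments */
    return r.eax;                       /* result is returned in EAX */
}
```

The combining function is deliberately order-sensitive: call_combine(1, 2, 3) yields 123 only if caller and callee agree on where each argument lives. That agreement is what the calling convention, and more broadly the ABI, guarantees between separately compiled code.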
3. Summary

This article introduced the following concepts through the simplest possible C program:
SFP (stack frame pointer)
Stack Aligned (stack alignment)
Calling Convention and ABI (Application Binary Interface)
These concepts will be explored in more depth through further experiments. Mastering them makes it possible to apply advanced C debugging techniques at the assembly level in a debugger.
Related documents:
Development Environment Installation and Settings on Solaris
Linux AT&T Assembly Language Development Guide
ELF Dynamic Resolution Symbol Process (Revised)
Focus: 10 New Changes in Solaris 10