X86 Assembly Language Study Notes (1) (reposted)

mdb can also disassemble _start:

    > _start::dis                                 ---> disassemble starting at the address of _start
    _start:        pushl  $0
    _start+2:      pushl  $0
    _start+4:      movl   %esp,%ebp
    _start+6:      pushl  %edx
    _start+7:      movl   $0x80504b0,%eax
    _start+0xc:    testl  %eax,%eax
    _start+0xe:    je     +0xf <_start+0x1d>
    _start+0x10:   pushl  $0x80504b0
    _start+0x15:   call   -0x75 <atexit>
    _start+0x1a:   addl   $4,%esp
    _start+0x1d:   movl   $0x8060710,%eax
    _start+0x22:   testl  %eax,%eax
    _start+0x24:   je     +7 <_start+0x2b>
    _start+0x26:   call   -0x86
    _start+0x2b:   pushl  $0x80506cd
    _start+0x30:   call   -0x90
    _start+0x35:   movl   8(%ebp),%eax
    _start+0x38:   leal   0x10(%ebp,%eax,4),%edx
    _start+0x3c:   movl   %edx,0x8060804
    _start+0x42:   andl   $0xf0,%esp
    _start+0x45:   subl   $4,%esp
    _start+0x48:   pushl  %edx
    _start+0x49:   leal   0xc(%ebp),%edx
    _start+0x4c:   pushl  %edx
    _start+0x4d:   pushl  %eax
    _start+0x4e:   call   +0x152 <_init>
    _start+0x53:   call   -0xa3 <__fpstart>
    _start+0x58:   call   +0xfb                   ---> the main function is called here
    _start+0x5d:   addl   $0xc,%esp
    _start+0x60:   pushl  %eax
    _start+0x61:   call   -0xa1
    _start+0x66:   pushl  $0
    _start+0x68:   movl   $1,%eax
    _start+0x6d:   lcall  $7,$0
    _start+0x74:   hlt
    >

Question: why is a function's return value stored in the EAX register?

In fact, IA32 itself does not specify which register holds the return value. However, if you disassemble Solaris or Linux binaries, you will find that function return values are always stored in EAX. This is no accident: it is determined by the operating system's ABI (Application Binary Interface). The ABI used by Solaris and Linux is the System V ABI.

Concept: SFP (Stack Frame Pointer)

To understand the SFP correctly, you first need to understand the IA32 stack, the role of the 32-bit registers ESP and EBP, how the PUSH and POP instructions affect the stack, and how CALL, RET, LEAVE and similar instructions affect the stack.

What we know about the stack:

1) The IA32 stack stores temporary data and is LIFO, that is, last in, first out. The stack grows from high addresses toward low addresses, and its size changes in units of bytes.

2) EBP is the stack base pointer and points to the bottom (the high-address end) of the current stack frame. ESP is the stack pointer and always points to the top of the stack (the low-address end).

3) When a long (32-bit) value is pushed, ESP is decremented by 4 and the value is written byte by byte into the locations that were ESP-1, ESP-2, ESP-3 and ESP-4 before the push, from the high byte to the low byte.

4) Popping a long is the reverse of the push: the four bytes at ESP through ESP+3 are read into a 32-bit register, and ESP is incremented by 4.

5) The CALL instruction calls a function or procedure. The address of the next instruction is pushed onto the stack so that execution can continue there on return.

6) The RET instruction returns from a function or procedure. The address of the instruction following the earlier CALL is popped from the stack into the EIP register, and the program continues at the instruction after the CALL.

7) ENTER establishes the stack frame of the current function and is equivalent to the following two instructions:

    pushl  %ebp
    movl   %esp,%ebp

LEAVE releases the stack frame of the current function or procedure and is equivalent to the following two instructions:

    movl   %ebp,%esp
    popl   %ebp

If you disassemble a function, you will often find assembly statements of the following form at its entry and at its return:

    pushl  %ebp          ---> push the EBP register, saving the stack base of the function that called main
    movl   %esp,%ebp     ---> copy ESP into EBP, setting the stack base of main
    ...........          ---> the two instructions above are equivalent to enter $0,$0
    ...........
    leave                ---> copy EBP into ESP, then pop the caller's stack base back into EBP, restoring the original stack base
    ret                  ---> main returns to its caller

These statements create and release the stack frame of a function or procedure.
It turns out that the compiler automatically inserts the statements that create and release the stack frame at the entry and exit of each function.
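The effect of push, pop, enter and leave on ESP and EBP can be replayed in a few lines of C. What follows is only a toy model and not a description of how the CPU is implemented: ToyCpu, toy_init, toy_push, toy_pop, toy_enter and toy_leave are hypothetical names invented for this illustration, and the 64-byte memory is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MEM_SIZE 64u   /* assumed size of the toy stack memory */

/* Hypothetical toy model: a small byte memory plus the two stack registers. */
typedef struct {
    uint8_t  mem[TOY_MEM_SIZE];
    uint32_t esp;   /* stack pointer: lowest address currently in use */
    uint32_t ebp;   /* frame pointer: base of the current stack frame */
} ToyCpu;

/* Empty stack: ESP sits just past the highest address. */
static void toy_init(ToyCpu *cpu) {
    for (uint32_t i = 0; i < TOY_MEM_SIZE; i++)
        cpu->mem[i] = 0;
    cpu->esp = TOY_MEM_SIZE;
    cpu->ebp = 0;
}

/* pushl: ESP drops by 4, then the value is stored little-endian
 * (low byte at ESP, high byte at ESP+3). */
static void toy_push(ToyCpu *cpu, uint32_t value) {
    assert(cpu->esp >= 4 && cpu->esp <= TOY_MEM_SIZE);
    cpu->esp -= 4;
    for (uint32_t i = 0; i < 4; i++)
        cpu->mem[cpu->esp + i] = (uint8_t)(value >> (8 * i));
}

/* popl: read the four bytes at ESP..ESP+3, then ESP rises by 4. */
static uint32_t toy_pop(ToyCpu *cpu) {
    assert(cpu->esp + 4 <= TOY_MEM_SIZE);
    uint32_t value = 0;
    for (uint32_t i = 0; i < 4; i++)
        value |= (uint32_t)cpu->mem[cpu->esp + i] << (8 * i);
    cpu->esp += 4;
    return value;
}

/* enter $0,$0  ==  pushl %ebp ; movl %esp,%ebp */
static void toy_enter(ToyCpu *cpu) {
    toy_push(cpu, cpu->ebp);
    cpu->ebp = cpu->esp;
}

/* leave  ==  movl %ebp,%esp ; popl %ebp */
static void toy_leave(ToyCpu *cpu) {
    cpu->esp = cpu->ebp;
    cpu->ebp = toy_pop(cpu);
}
```

For example, after toy_init, a call to toy_push(&cpu, 0x11223344) leaves cpu.esp at 60 with the byte 0x44 at mem[60] and 0x11 at mem[63]. A toy_enter followed by any amount of local allocation (cpu.esp -= 8) and a toy_leave brings ESP and EBP back to the values they had before toy_enter.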

When a function is called:

1) EIP and EBP form the boundary of the new function's stack frame. When the function is called, the return EIP is pushed first. When the stack frame is created, the EBP of the calling function is pushed next, and together with the EIP it forms the boundary of the new function's stack frame.

2) EBP becomes the stack frame pointer (SFP) and marks the boundary of the new function's stack frame. Once the frame is established, the word that EBP points to is the EBP of the calling function. As you can imagine, the stack frames of nested calls can be traversed layer by layer through EBP, and debuggers use exactly this feature to implement backtrace.

3) ESP always points to the top of the stack and is used to allocate stack space. The statement that allocates stack space for a function's local variables usually subtracts a constant from ESP. For example, allocating one integer subtracts 4 from ESP.

4) Function parameters and local variables can be accessed through the SFP, that is, through EBP. Because the SFP points to the stack base of the current function, parameters and local variables are usually accessed in the following forms:

    8+xx(%ebp)    ---> access to a function parameter
    -xx(%ebp)     ---> access to a function local variable

Suppose function A calls function B, and function B calls function C. The stack frames and the call relationship then look like this:

                                               high address
    +------------------------------------+
    | EIP (return address of A's caller) |
    +------------------------------------+
    | EBP (EBP of A's caller)            | <--- EBP of A (its SFP)
    +------------------------------------+
    | local variables of A               |
    | ........                           | <--- ESP once A has allocated its locals; reached as A's EBP - offset A
    +------------------------------------+
    | arg n (nth parameter of B)         |
    +------------------------------------+
    | arg . (further parameters of B)    |
    +------------------------------------+
    | arg 1 (1st parameter of B)         |
    +------------------------------------+
    | arg 0 (0th parameter of B)         | <--- B's parameters, reached as B's EBP + offset B
    +------------------------------------+
    | EIP (return address into A)        |
    +------------------------------------+
    | EBP (EBP of A)                     | <--- EBP of B (its SFP)
    +------------------------------------+
    | local variables of B               |
    | ........                           | <--- ESP once B has allocated its locals
    +------------------------------------+
    | arg n (nth parameter of C)         |
    +------------------------------------+
    | arg . (further parameters of C)    |
    +------------------------------------+
    | arg 1 (1st parameter of C)         |
    +------------------------------------+
    | arg 0 (0th parameter of C)         |
    +------------------------------------+
    | EIP (return address into B)        |
    +------------------------------------+
    | EBP (EBP of B)                     | <--- EBP of C (its SFP)
    +------------------------------------+
    | local variables of C               |
    | ........                           | <--- ESP once C has allocated its locals
    +------------------------------------+
                                               low address

Now let us analyze the meaning of the remaining statements in the disassembly of test1:

    # mdb test1
    Loading modules: [ libc.so.1 ]
    > main::dis                          ---> disassemble the main function
    main:        pushl  %ebp
    main+1:      movl   %esp,%ebp        ---> create the stack frame
    main+3:      subl   $8,%esp          ---> allocate 8 bytes of stack space
    main+6:      andl   $0xf0,%esp       ---> align the stack address to 16 bytes
    main+9:      movl   $0,%eax          ---> meaningless
    main+0xe:    subl   %eax,%esp        ---> meaningless
    main+0x10:   movl   $0,%eax          ---> set the return value of main
    main+0x15:   leave                   ---> tear down the stack frame
    main+0x16:   ret                     ---> main returns
    >

The following two statements seem to be meaningless. Is that really so?

    movl   $0,%eax
    subl   %eax,%esp

Recompile test1.c with GCC's -O2 optimization:

    # gcc -O2 test1.c -o test1
    # mdb test1
    > main::dis
    main:        pushl  %ebp
    main+1:      movl   %esp,%ebp
    main+3:      subl   $8,%esp
    main+6:      andl   $0xf0,%esp
    main+9:      xorl   %eax,%eax        ---> set the return value of main, using the xorl (exclusive or) instruction to make EAX 0
    main+0xb:    leave
    main+0xc:    ret
    >

The new disassembly is simpler than the first one. The statements that were suspected of being useless have been optimized away, which further confirms the earlier guess.

Tip: some statements generated by the compiler may serve no purpose in the actual semantics of the program. They can be removed with the optimization options.

Question: why use xorl to set the value of EAX?

Note that in the optimized code the return value is set with xorl %eax,%eax instead of movl $0,%eax. The offsets in the two listings show that xorl %eax,%eax occupies 2 bytes (main+9 to main+0xb) while movl $0,%eax occupies 5 bytes (main+9 to main+0xe), and among IA32 instructions the xorl form runs at least as fast as the movl form.
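Because each saved EBP points at the saved EBP of the calling function, a backtrace is essentially a linked-list walk. A minimal sketch of that walk over a snapshot of stack memory is given here. StackImage, read_word and walk_frames are hypothetical names, the snapshot layout is an assumption made for the example, and the walk assumes that the chain ends at a saved EBP of 0. Real debuggers do considerably more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of stack memory: words[i] holds the 32-bit value
 * stored at address base + 4*i. */
typedef struct {
    const uint32_t *words;
    size_t          count;
    uint32_t        base;   /* address of words[0] */
} StackImage;

/* Read the 32-bit word at addr. Returns 1 on success, 0 if addr is
 * misaligned or lies outside the snapshot. */
static int read_word(const StackImage *img, uint32_t addr, uint32_t *out) {
    if (addr < img->base || (addr - img->base) % 4 != 0)
        return 0;
    size_t idx = (addr - img->base) / 4;
    if (idx >= img->count)
        return 0;
    *out = img->words[idx];
    return 1;
}

/* Follow the saved-EBP chain. In each frame, (%ebp) holds the caller's EBP
 * and 4(%ebp) holds the return address. Stores up to max return addresses,
 * innermost frame first, and returns how many were stored. */
static size_t walk_frames(const StackImage *img, uint32_t ebp,
                          uint32_t *ret_addrs, size_t max) {
    assert(img != NULL && ret_addrs != NULL);
    size_t n = 0;
    while (ebp != 0 && n < max) {
        uint32_t saved_ebp, ret;
        if (!read_word(img, ebp, &saved_ebp) ||
            !read_word(img, ebp + 4, &ret))
            break;                      /* chain left the snapshot */
        ret_addrs[n++] = ret;
        if (saved_ebp != 0 && saved_ebp <= ebp)
            break;                      /* callers must live at higher addresses */
        ebp = saved_ebp;
    }
    return n;
}
```

Given a snapshot in which C's frame links to B's and B's links to A's, walk_frames started at C's EBP yields the return address into B, then the return address into A, then the return address of A's caller. If the function was compiled without a frame pointer, the chain is not there to follow.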

Concept: stack alignment (stack aligned)

What are the following statements for?

    subl   $8,%esp
    andl   $0xf0,%esp    ---> clear the low 4 bits with andl (the 8-bit immediate 0xf0 is sign-extended to 0xfffffff0), ensuring that the stack address is 16-byte aligned

On the surface, the most direct consequence of the andl statement is that the low 4 bits of the address in ESP become 0, that is, the stack is 16-byte aligned. Why do this? It turns out that some instructions of the IA32 family of CPUs run faster when their data is aligned on 4, 8 or 16 bytes. To improve the running speed of the generated code on IA32, the GCC compiler therefore aligns the stack to 16 bytes by default.

The meaning of andl $0xf0,%esp is clear, but is subl $8,%esp necessary? Suppose the stack is 16-byte aligned before main is entered. After main has been entered and both the return EIP and EBP have been pushed, 8 bytes are on the stack and the low 4 bits of the stack address must be 1000. Subtracting 8 from ESP then makes the low 4 bits exactly 0000. So this statement, too, appears to be there to keep the stack 16-byte aligned.

If you check the GCC manual, you will find the option that controls stack alignment:

    -mpreferred-stack-boundary=n    ---> align the stack to 2 to the power n bytes; n ranges from 2 to 12

By default n is 4, so GCC aligns the stack to 16 bytes, which suits most IA32 instructions.

Let us use -mpreferred-stack-boundary=2 to remove the stack alignment instructions:

    # gcc -mpreferred-stack-boundary=2 test1.c -o test1
    > main::dis
    main:        pushl  %ebp
    main+1:      movl   %esp,%ebp
    main+3:      movl   $0,%eax
    main+8:      leave
    main+9:      ret
    >

The stack alignment instructions are gone. The IA32 stack is already 4-byte aligned, so no additional instructions are needed to align it.

So, is the stack frame pointer (SFP) indispensable?

    # gcc -mpreferred-stack-boundary=2 -fomit-frame-pointer test1.c -o test
    > main::dis
    main:        movl   $0,%eax
    main+5:      ret
    >

As this shows, -fomit-frame-pointer removes the SFP.

Question: what are the disadvantages of removing the SFP?

1) Debugging becomes harder. The debugger's backtrace command relies on the SFP, so without the SFP that command cannot be used.

2) The assembly code becomes less readable. Without EBP, function parameters and local variables can both be accessed only as xx(%esp), and the two are hard to tell apart, which makes the program harder to read.
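The alignment arithmetic can be checked with a short calculation. The sketch below replays the prologue of main on an ESP value. align_down and esp_after_prologue are hypothetical helper names, and the starting ESP used in the usage note is an arbitrary value chosen only because it is 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Round esp down to a multiple of 2^n bytes by clearing its low n bits,
 * which is what andl does with a mask such as 0xfffffff0 (n = 4). */
static uint32_t align_down(uint32_t esp, unsigned n) {
    assert(n < 32);
    uint32_t mask = ~((UINT32_C(1) << n) - 1);
    return esp & mask;
}

/* Replay the unoptimized prologue of main, starting from the ESP value
 * just before the call instruction that enters main. */
static uint32_t esp_after_prologue(uint32_t esp_before_call) {
    uint32_t esp = esp_before_call;
    esp -= 4;                    /* call pushes the return EIP */
    esp -= 4;                    /* pushl %ebp                 */
    esp -= 8;                    /* subl  $8,%esp              */
    return align_down(esp, 4);   /* andl  $0xf0,%esp           */
}
```

With an ESP of 0x08047f00 before the call, the two pushes leave the low 4 bits at 1000, the subl brings them to 0000, and esp_after_prologue returns 0x08047ef0, so in this case the andl changes nothing. With n = 2, align_down only clears the low 2 bits, which a stack that only ever moves in 4-byte steps already satisfies.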

