X86 Assembly Language Study Notes (1)


Author: Badcoffee

Email: blog.oliver@gmail.com

October 2004

Original article: http://blog.9cbs.net/yayong

These are the author's notes from studying x86 assembly. Errors and omissions are inevitable, and corrections are welcome.

The author will fix errors as they are found and publish updated versions on the author's own blog.

Strictly speaking, this document is more about the C language and the C compiler than about assembly language itself. For the specifics of assembly language, please refer to the relevant documentation.

1. Compilation environment

OS: Solaris 9 x86

Compiler: GCC 3.3.2

Linker: Solaris Link Editors 5.x

Debug tool: mdb

Editor: vi

Note: For installing and setting up the compilation environment, see the article:

Development Environment Installation and Settings on Solaris.

mdb is the kernel debugger provided by Solaris; here it is used as a disassembler and an assembly-level debugging tool.

On Linux, gdb can be used for disassembly and debugging instead.

2. Analysis of the simplest C code

To keep the problem simple, let us analyze the assembly code generated from the simplest possible C program:

# vi test1.c

int main()
{
    return 0;
}

Compile this program to generate a binary:

# gcc test1.c -o test1
# file test1
test1: ELF 32-bit LSB executable 80386 Version 1, dynamically linked, not stripped

test1 is a 32-bit little-endian ELF executable, dynamically linked, with its symbol table not stripped.

This is the typical executable file format on UNIX/Linux platforms.

Use mdb to disassemble the binary and look at the generated assembly code:

# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis                   ; disassemble the main function; in general the ::dis command is prefixed by the address or symbol to disassemble
main:       pushl  %ebp       ; push EBP onto the stack, saving the stack base address of the function that called main
main+1:     movl   %esp,%ebp  ; copy ESP into EBP, setting the stack base of main
main+3:     subl   $8,%esp
main+6:     andl   $0xf0,%esp
main+9:     movl   $0,%eax
main+0xe:   subl   %eax,%esp
main+0x10:  movl   $0,%eax    ; set the function return value to 0
main+0x15:  leave             ; copy EBP into ESP, then pop the caller's saved stack base into EBP, restoring the original stack base
main+0x16:  ret               ; main returns to its caller

Note: The assembly syntax shown here differs considerably from the syntax in Intel's manuals. UNIX/Linux tools use the AT&T syntax for assembly language.

To learn more about AT&T assembly, see the article:

Linux AT&T Assembly Language Development Guide

Question: Who calls the main function?

At the C language level, main is the program's starting entry point. In fact, however, the entry point of the ELF executable is not main but _start.

mdb can disassemble _start as well:

> _start::dis                 ; disassemble starting from the address of _start
_start:       pushl  $0
_start+2:     pushl  $0
_start+4:     movl   %esp,%ebp
_start+6:     pushl  %edx
_start+7:     movl   $0x80504b0,%eax
_start+0xc:   testl  %eax,%eax
_start+0xe:   je     +0xf <_start+0x1d>
_start+0x10:  pushl  $0x80504b0
_start+0x15:  call   -0x75
_start+0x1a:  addl   $4,%esp
_start+0x1d:  movl   $0x8060710,%eax
_start+0x22:  testl  %eax,%eax
_start+0x24:  je     +7 <_start+0x2b>
_start+0x26:  call   -0x86
_start+0x2b:  pushl  $0x80506cd
_start+0x30:  call   -0x90
_start+0x35:  movl   8(%ebp),%eax
_start+0x38:  leal   0x10(%ebp,%eax,4),%edx
_start+0x3c:  movl   %edx,0x8060804
_start+0x42:  andl   $0xf0,%esp
_start+0x45:  subl   $4,%esp
_start+0x48:  pushl  %edx
_start+0x49:  leal   0xc(%ebp),%edx
_start+0x4c:  pushl  %edx
_start+0x4d:  pushl  %eax
_start+0x4e:  call   +0x152 <_init>
_start+0x53:  call   -0xa3 <__fpstart>
_start+0x58:  call   +0xfb    ; call the main function here
_start+0x5d:  addl   $0xc,%esp
_start+0x60:  pushl  %eax
_start+0x61:  call   -0xa1
_start+0x66:  pushl  $0
_start+0x68:  movl   $1,%eax
_start+0x6d:  lcall  $7,$0
_start+0x74:  hlt

Question: Why is the function return value stored in the EAX register?

In fact, IA32 itself does not specify which register holds the return value. But if you disassemble Solaris/Linux binaries, you will find that EAX is always used to hold the function return value.

This is no coincidence: it is determined by the operating system's ABI (Application Binary Interface).

The ABI followed by Solaris and Linux is the System V ABI.

Concept: SFP (Stack Frame Pointer)

To understand the SFP correctly, you must first understand:

The IA32 stack concept

The role of the 32-bit registers ESP and EBP

How the PUSH/POP instructions affect the stack

How CALL/RET/LEAVE and related instructions affect the stack

As we know:

1) The IA32 stack is used to store temporary data and is LIFO (last in, first out). The stack grows from high addresses toward low addresses, and its size is counted in bytes.

2) EBP is the stack base pointer and points to the bottom of the current stack frame (high address); ESP is the stack pointer and always points to the top of the stack (low address).

3) When a long (32-bit) value is pushed, ESP is decreased by 4 and the four bytes of the value are stored at what was ESP-1, ESP-2, ESP-3 and ESP-4, from the most significant byte down to the least significant byte.

4) Popping a long value reverses the push: the four bytes at the top of the stack are read into a 32-bit register and ESP is increased by 4.

5) The CALL instruction calls a function or procedure. It pushes the address of the next instruction onto the stack so that execution can resume there when the function returns.

6) The RET instruction returns from a function or procedure. It pops the address pushed by the earlier CALL from the stack into the EIP register, and execution continues at the instruction following that CALL.

7) ENTER sets up the stack frame of the current function; ENTER 0,0 is equivalent to the following two instructions:

pushl %ebp
movl %esp,%ebp

8) LEAVE releases the stack frame of the current function or procedure, and is equivalent to the following two instructions:

movl %ebp,%esp
popl %ebp
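The stack rules above can be tried out in a small simulation. The following C sketch is only a model under stated assumptions: struct sim_cpu, its 64-byte memory, and the sim_* function names are hypothetical and belong to no real API. They mimic what pushl, popl, enter $0,$0 and leave do to ESP and EBP, with ESP and EBP represented as offsets into a byte array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a downward-growing IA32 stack; not real CPU state. */
enum { STACK_SIZE = 64 };

struct sim_cpu {
    uint8_t  mem[STACK_SIZE]; /* simulated stack memory */
    uint32_t esp;             /* offset of the top of the stack */
    uint32_t ebp;             /* offset of the current frame base */
};

/* Start with an empty stack: ESP and EBP sit just past the highest byte. */
void sim_init(struct sim_cpu *c)
{
    c->esp = STACK_SIZE;
    c->ebp = STACK_SIZE;
}

/* pushl: lower ESP by 4, then store the value little-endian at the new ESP. */
void sim_push(struct sim_cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    for (int i = 0; i < 4; i++)
        c->mem[c->esp + i] = (uint8_t)(value >> (8 * i));
}

/* popl: read the 4 bytes at ESP, then raise ESP by 4. */
uint32_t sim_pop(struct sim_cpu *c)
{
    uint32_t value = 0;
    assert(c->esp + 4 <= STACK_SIZE);
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)c->mem[c->esp + i] << (8 * i);
    c->esp += 4;
    return value;
}

/* enter $0,$0: pushl %ebp ; movl %esp,%ebp */
void sim_enter(struct sim_cpu *c)
{
    sim_push(c, c->ebp);
    c->ebp = c->esp;
}

/* leave: movl %ebp,%esp ; popl %ebp */
void sim_leave(struct sim_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = sim_pop(c);
}
```

In this model, pushing a stand-in return address, calling sim_enter, subtracting 8 from esp for locals, and then calling sim_leave brings ESP back to the return address and EBP back to its earlier value, which is the state RET expects.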

If you disassemble a function, you will often find instructions of the following form at its entry and exit:

pushl %ebp          ; push EBP onto the stack, saving the stack base address of the caller
movl %esp,%ebp      ; copy ESP into EBP, setting the stack base of this function
...........         ; the two instructions above are equivalent to ENTER 0,0
...........
leave               ; copy EBP into ESP, then pop the caller's saved stack base into EBP, restoring the original stack base
ret                 ; the function returns to its caller

These instructions create and release the stack frame of a function or procedure.

It turns out that the compiler automatically inserts the instructions that create and release the stack frame at the entry and exit of each function.

When a function is called:

1) EIP and EBP form the boundary of the new function's stack frame

When the function is called, the return EIP is pushed onto the stack first. When the stack frame is created, the EBP of the caller's frame is pushed next, and together with EIP it forms the boundary of the new function's stack frame.

2) EBP becomes the stack frame pointer (SFP), marking the boundary of the new function's stack frame

Once the stack frame is established, the word that EBP points to is the EBP of the caller's frame. By following these saved values you can walk through the frames of every function in the call chain, one level at a time. Debuggers use exactly this property of EBP to implement backtrace.

3) ESP always points to the top of the stack and is used to allocate stack space

Stack space for local variables is usually allocated by subtracting a constant from ESP; for example, allocating one integer is ESP-4.

4) Parameter passing and local variable access can be implemented through the SFP

Because the stack frame pointer always points to the base of the current function's frame, parameters and local variables are usually accessed in the following forms:

8+xx(%ebp)   ; access to function parameters
-xx(%ebp)    ; access to function local variables
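The way a debugger can follow saved EBP values, and the reason parameters start at offset 8, can be sketched in C. This is a simplified model, not the code of any real debugger: struct frame, walk_frames and param_offset are hypothetical names, the return addresses in any usage are made-up numbers, and the chain is assumed to end with a null saved EBP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two words at the base of an IA32 stack frame. */
struct frame {
    const struct frame *saved_ebp;  /* 0(%ebp): EBP of the caller, NULL at the outermost frame (assumption) */
    uint32_t return_address;        /* 4(%ebp): address to resume at in the caller */
};

/* Follow the saved-EBP links from the innermost frame outwards, storing up to
 * max return addresses in out; returns the number of frames visited. */
size_t walk_frames(const struct frame *ebp, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (ebp != NULL && n < max) {
        out[n++] = ebp->return_address;
        ebp = ebp->saved_ebp;
    }
    return n;
}

/* Offset from EBP of the i-th 4-byte parameter: 0(%ebp) holds the saved EBP
 * and 4(%ebp) the return address, so parameters start at 8(%ebp). */
int32_t param_offset(unsigned i)
{
    return 8 + 4 * (int32_t)i;
}
```

With three linked frames for functions A, B and C, walk_frames started at C's frame visits C, then B, then A, which is the order a backtrace prints them. The model only assumes 4-byte parameters; on a 64-bit host the struct is larger than the two 32-bit words it stands for.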

If function A calls function B, and function B calls function C, the stack frames and the call relationship look like the following figure:

                ------------------------------------------ ----> High address
                | EIP (return address of A's caller)     |
            /   ------------------------------------------
            |   | EBP (EBP of A's caller)                | <--- EBP of current function A (the SFP frame pointer)
            |   ------------------------------------------
            |   | Local variables                        |      accessed as EBP of A minus an offset
            |   | ........                               | <--- ESP points to A's newly allocated local variables
            |   ------------------------------------------
 Frame of A |   | Arg n  (nth parameter of function B)   |
            |   ------------------------------------------
            |   | Arg .. (..th parameter of function B)  |
            |   ------------------------------------------
            |   | Arg 1  (1st parameter of function B)   |
            |   ------------------------------------------
            |   | Arg 0  (0th parameter of function B)   | <--- B's parameters are accessed as EBP of B plus an offset
            |   ------------------------------------------
            |   | EIP (return address into function A)   |
            \   ------------------------------------------
            /   | EBP (EBP of function A)                | <--- EBP of current function B (the SFP frame pointer)
            |   ------------------------------------------
            |   | Local variables                        |
            |   | ........                               | <--- ESP points to B's newly allocated local variables
            |   ------------------------------------------
 Frame of B |   | Arg n  (nth parameter of function C)   |
            |   ------------------------------------------
            |   | Arg .. (..th parameter of function C)  |
            |   ------------------------------------------
            |   | Arg 1  (1st parameter of function C)   |
            |   ------------------------------------------
            |   | Arg 0  (0th parameter of function C)   |
            |   ------------------------------------------
            |   | EIP (return address into function B)   |
            \   ------------------------------------------
            /   | EBP (EBP of function B)                | <--- EBP of current function C (the SFP frame pointer)
 Frame of C |   ------------------------------------------
            |   | Local variables                        |
            \   | ........                               | <--- ESP points to C's newly allocated local variables
                ------------------------------------------ ----> Low address

Figure 1-1

Now let us analyze the meaning of the remaining instructions in the disassembly of test1:

# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis                     ; disassemble the main function
main:       pushl  %ebp
main+1:     movl   %esp,%ebp    ; create the stack frame
main+3:     subl   $8,%esp      ; allocate 8 bytes of stack space via ESP-8
main+6:     andl   $0xf0,%esp   ; align the stack address to 16 bytes
main+9:     movl   $0,%eax      ; meaningless
main+0xe:   subl   %eax,%esp    ; meaningless
main+0x10:  movl   $0,%eax      ; set the return value of main
main+0x15:  leave               ; tear down the stack frame
main+0x16:  ret                 ; main returns

Question: The following two instructions appear to be meaningless. Is that really so?

movl $0,%eax
subl %eax,%esp

Recompile test1.c with gcc's -O2 optimization:

# gcc -O2 test1.c -o test1
# mdb test1
> main::dis
main:       pushl  %ebp
main+1:     movl   %esp,%ebp
main+3:     subl   $8,%esp
main+6:     andl   $0xf0,%esp
main+9:     xorl   %eax,%eax    ; set the return value of main, using xorl (exclusive OR) to make EAX 0
main+0xb:   leave
main+0xc:   ret

The new disassembly is simpler than the original one. Sure enough, the instructions that appeared useless have been optimized away, which confirms the earlier guess.

Tip: Some instructions generated by the compiler serve no purpose in the program's actual semantics; they can be removed with the optimization options.

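Question: Why use xorl to set the value of EAX?

Note that in the optimized code, the instruction that sets the EAX return value changed from movl $0,%eax to xorl %eax,%eax. This is because, among IA32 instructions, xorl runs faster than movl here. It is also shorter, as the offsets in the two listings show: movl $0,%eax occupies 5 bytes (main+0x10 to main+0x15), while xorl %eax,%eax occupies 2 bytes (main+9 to main+0xb).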
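Besides speed, the two instructions differ in size, which the offsets in the listings above also suggest (main+0x10 to main+0x15 versus main+9 to main+0xb). The C sketch below simply records the machine-code bytes of each instruction and compares their lengths. The array and function names are hypothetical; the bytes are the usual IA32 encodings, and 0x31 0xC0 is one of two valid encodings of xorl %eax,%eax.

```c
#include <assert.h>
#include <stddef.h>

/* Machine-code bytes of the two instructions (IA32 encodings). */
const unsigned char movl_zero_eax[] = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* movl $0,%eax   */
const unsigned char xorl_eax_eax[]  = { 0x31, 0xC0 };                   /* xorl %eax,%eax */

/* Bytes saved by replacing the movl form with the xorl form. */
size_t bytes_saved(void)
{
    assert(sizeof movl_zero_eax > sizeof xorl_eax_eax);
    return sizeof movl_zero_eax - sizeof xorl_eax_eax;
}
```

The movl form has to carry a full 32-bit immediate of zero, while the xorl form needs only an opcode byte and a register byte.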

Concept: Stack alignment

What is the purpose of the following instructions?

subl $8,%esp
andl $0xf0,%esp    ; clear the low 4 bits with andl, guaranteeing 16-byte alignment of the stack address

On the surface, the most direct effect of the andl instruction is to set the low 4 bits of the address in ESP to 0, that is, to align it to 16 bytes. (The 8-bit immediate 0xf0 is sign-extended by the CPU to 0xfffffff0, so only the low 4 bits are cleared.) Why do this?

It turns out that some instructions of IA32 family CPUs run faster when their data is aligned to 4, 8, or 16 bytes. To improve the speed of the code it generates for IA32, gcc therefore aligns the stack to 16 bytes by default.

The purpose of andl $0xf0,%esp is clear, but is subl $8,%esp necessary?

Assume that the stack is 16-byte aligned before main is called. After entering main, EIP and EBP have been pushed onto the stack, so the low 4 bits of the stack address must be 1000; ESP-8 then makes the low 4 bits exactly 0000. So it seems that this instruction is also there to keep the stack 16-byte aligned.

If you check the gcc manual, you will find an option for setting the stack alignment:

-mpreferred-stack-boundary=n    ; align the stack to a boundary of 2 to the power n bytes, where n ranges from 2 to 12

By default n is 4; that is, by default gcc aligns the stack to 16 bytes to suit the requirements of most IA32 instructions.
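The arithmetic behind the option and the andl mask can be checked with a short C sketch. The function names boundary_bytes and align_down are hypothetical helpers written for this note, and any stack address passed to them is just an illustrative number, not a value taken from a real process.

```c
#include <assert.h>
#include <stdint.h>

/* Boundary in bytes for -mpreferred-stack-boundary=n, i.e. 2 to the power n. */
uint32_t boundary_bytes(unsigned n)
{
    assert(n < 32);
    return (uint32_t)1 << n;
}

/* Round a stack address down to a power-of-two boundary by clearing its low
 * bits. With boundary == 16 the mask is 0xfffffff0, the same mask that
 * andl applies to ESP in the listing. */
uint32_t align_down(uint32_t esp, uint32_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return esp & ~(boundary - 1);
}
```

Rounding down rather than up matters here: the stack grows toward low addresses, so clearing low bits can only move ESP further into unused stack space and never back over data that is already stored.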

Let us use -mpreferred-stack-boundary=2 to remove the stack alignment instructions:

# gcc -mpreferred-stack-boundary=2 test1.c -o test1
> main::dis
main:       pushl  %ebp
main+1:     movl   %esp,%ebp
main+3:     movl   $0,%eax
main+8:     leave
main+9:     ret

As you can see, the stack alignment instructions are gone: the IA32 stack is inherently 4-byte aligned, so no extra instructions are needed to align it.

So, can the stack frame pointer (SFP) be removed as well?

# gcc -mpreferred-stack-boundary=2 -fomit-frame-pointer test1.c -o test
> main::dis
main:       movl   $0,%eax
main+5:     ret

As you can see, -fomit-frame-pointer removes the SFP.

Question: Are there any drawbacks to removing the SFP?

1) It makes debugging harder

Because the debugger's backtrace command relies on the SFP, that command cannot be used without it.

2) It reduces the readability of the assembly code

Without EBP, function parameters and local variables can only be accessed in the form xx(%esp), and it is hard to tell the two kinds of access apart, which makes the code harder to read.

Question: What are the advantages of removing the SFP?

1) It saves stack space

2) It eliminates the instructions that create and tear down the stack frame, simplifying the code

3) It frees EBP for use as a general-purpose register, increasing the number of general-purpose registers

4) The three points above make the program run faster

Concept: Calling convention and ABI (Application Binary Interface)

How does a function find its parameters?

How does a function return its result?

Where are local variables stored?

Which hardware registers may be used as scratch space?

Which hardware registers must be preserved?

The calling convention specifies the answers to these questions.

The calling convention is also part of the ABI.

Therefore, operating systems that follow the same ABI specification make it possible for binary code to interoperate between them. For example, because Solaris and Linux both follow the System V ABI, Solaris 10 provides the ability to run Linux binaries.

See the article:

Focus: 10 new changes in Solaris 10

3. Summary

Through the simplest C program, this article introduced the following concepts:

SFP (stack frame pointer)

Stack alignment

Calling convention and ABI (Application Binary Interface)

Further experiments in later notes will deepen the understanding of these concepts. Mastering them makes it possible to debug C programs at the assembly level and to acquire advanced C debugging skills.
