Generating Plain Binary Files with GCC

xiaoxiao 2021-03-06

I searched the Internet for a long time and found only scattered information on this topic. Because I want to use GCC to develop a special-purpose tool for my own use, I have written up this summary, drawing on my own work experience.

1. Hardware and software environment

- A 32-bit computer from the 80x86 family; the faster the better.

- A Linux distribution, such as Red Hat, Mandrake, or TurboLinux.

- The GNU GCC compiler, the compiler commonly used under Linux.

- binutils for Linux.

- A text editor you are familiar with, such as vi.

If you do not have these, there is no point in reading further. My working environment is a Celeron 433 with 128 MB of memory running Red Hat Linux 8.0; GCC is the default one, version 3.2.2. You can check your GCC version with the following command:

gcc --version

2. Generate a binary file using the C language

Write a test.c using your favorite text editor:

int main ()
{
}

Compile it with the following commands:

gcc -c test.c
ld -o test -Ttext 0x0 -e main test.o
objcopy -R .note -R .comment -S -O binary test test.bin

The resulting binary file is test.bin. You can use whichever disassembler you like to see what is in it. I use objdump under Linux:

objdump -D -b binary -m i386 test.bin

The results are as follows:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret

The first column is the address of the instruction, the second column is its machine code, and the third column is the assembly instruction. I expect your results to be the same. If your GCC differs from mine, for example a 2.7.x version, your results may well differ and lack the following four instructions. This is normal: the two GCC versions use different stack frame layouts. (Compiler version differences will also affect the examples below.)

   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp     # stack alignment; local variable space is allocated in units of 16 bytes
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp

The code above is 32-bit code and must run in a 32-bit protected-mode environment such as Linux. You can also generate test.bin with just the following commands:

gcc -c test.c
ld -Ttext 0x0 -e main --oformat binary -o test.bin test.o

The test.c above contains only one function, and that function is just an empty frame, so its disassembly is not difficult to understand.

3. Programs with local variables

Create a new test.c and see how GCC handles local variables.

int main ()
{
    int i;
    i = 0x12345678;
}

Compile it with either of the two methods described above to generate test.bin, then disassemble it with objdump:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 45 fc 78 56 34 12    movl   $0x12345678,0xfffffffc(%ebp)
  17:   c9                      leave
  18:   c3                      ret

Compared with the first example, the first six instructions and the last two are identical; only one instruction has been added. That instruction assigns the value to the local variable, whose space had already been allocated. GCC allocates stack space for local variables in units of 16 bytes, rather than the more usual 1 byte. If you change

int i;
i = 0x12345678;

to

int i = 0x12345678;

the result is no different. For a global variable, however, things are not the same.

4. Programs with global variables

Change test.c to:

int i;

int main ()
{
    i = 0x12345678;
}

Compile it with the same method, then disassemble:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 05 1c 10 00 00 78    movl   $0x12345678,0x101c
  17:   56 34 12
  1a:   c9                      leave
  1b:   c3                      ret

Our global variable is placed at 0x101c. This is because ld aligns the data segment by default, which stems from page alignment in paged memory management. When linking with ld, pass the -N option to turn this alignment off:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 05 1c 00 00 00 78    movl   $0x12345678,0x1c
  17:   56 34 12
  1a:   c9                      leave
  1b:   c3                      ret

As we can see, the data segment now immediately follows the code segment. We can also specify the location of the data segment ourselves. Try compiling with the following commands:

gcc -c test.c
ld -Ttext 0x0 -Tdata 0x1234 -e main -N --oformat binary -o test.bin test.o

Then disassemble with objdump:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 05 34 12 00 00 78    movl   $0x12345678,0x1234
  17:   56 34 12
  1a:   c9                      leave
  1b:   c3                      ret

Now our global variable is placed at 0x1234. By passing the -Tdata option to ld you can freely choose the address of the data segment; if it is not specified, the data segment follows the code segment.

Now look at a global that is initialized directly.

const int i = 0x12345678;

int main ()
{
}

Compile, link, and disassemble with the same method as above. The results are as follows:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret
  12:   00 00                   add    %al,(%eax)
  14:   78 56                   js     0x6c
  16:   34 12                   xor    $0x12,%al

The code is padded to a 4-byte boundary and the global is stored immediately after the code segment. ld places the constant's value directly at the variable's location, in a single step. (The bytes from 0x12 onward are data, which objdump has decoded as if they were instructions.)

Use the following command to see more details:

objdump -D test.o

You can see the following results:

test.o:     file format elf32-i386

Disassembly of section .text:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret

Disassembly of section .data:

Disassembly of section .rodata:

00000000 <i>:
   0:   78 56                   js     58
   2:   34 12                   xor    $0x12,%al

This shows more clearly that the global constant defined in the .c file is placed in the read-only data segment. Now look at the following code:

int i = 0x12345678;

const int c = 0x12345678;

int main ()
{
}

Again compile and disassemble with the method above, which gives:

test.o:     file format elf32-i386

Disassembly of section .text:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret

Disassembly of section .data:

00000000 <i>:
   0:   78 56                   js     58
   2:   34 12                   xor    $0x12,%al

Disassembly of section .rodata:

00000000 <c>:
   0:   78 56                   js     58
   2:   34 12                   xor    $0x12,%al

The integer i is placed in the normal data segment, while the constant c is placed in the read-only data segment. When global variables and constants are used, ld automatically stores them in the appropriate data segment.

5. Handling pointers

Use the following code to see how GCC handles pointer variables:

int main ()
{
    int i;
    int *p;
    p = &i;
    *p = 0x12345678;
}

Use objdump to view the generated machine code:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   8d 45 fc                lea    0xfffffffc(%ebp),%eax
  13:   89 45 f8                mov    %eax,0xfffffff8(%ebp)
  16:   8b 45 f8                mov    0xfffffff8(%ebp),%eax
  19:   c7 00 78 56 34 12       movl   $0x12345678,(%eax)
  1f:   c9                      leave
  20:   c3                      ret

GCC pre-allocates at least 8 bytes of space for local variables and aligns ESP on a 16-byte boundary. If more space is required, GCC allocates it in units of 16 bytes, unlike other compilers, which allocate in units of 1 byte. The variable i is located at EBP-4 and the variable p at EBP-8. The LEA instruction loads the effective address of i into EAX, which is then stored in p. Finally, 0x12345678 is assigned to i, the variable that p points to.

6. Function calls

Look at the following code:

void func ();

int main ()
{
    func ();
}

void func ()
{
}

Look at the generated binary code:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   e8 03 00 00 00          call   0x18
  15:   c9                      leave
  16:   c3                      ret
  17:   90                      nop
  18:   55                      push   %ebp
  19:   89 e5                   mov    %esp,%ebp
  1b:   c9                      leave
  1c:   c3                      ret

The main function calls the empty function func with a CALL instruction; the code of func is similar to that of main. Pass the -Map option to ld to write a map file, which gives more detailed information:

.text           0x00000000       0x1d
 *(.text .stub .text.* .gnu.linkonce.t.*)
 .text          0x00000000       0x1d test.o
                0x00000000                main
                0x00000018                func

The first column is the segment name, here .text; the second column is the start address; the third column is the length of the segment; and the last column holds additional information such as the function name or the object file. We can see that the .text segment starts at 0x0 and has length 0x1d, and that the function func starts at 0x18.

7. Function return values

Look at the code below, in which main returns an integer value:

int main ()
{
    return 0x12345678;
}

The generated binary code is similar to that of other compilers:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   b8 78 56 34 12          mov    $0x12345678,%eax
  15:   c9                      leave
  16:   c3                      ret

As you can see, GCC passes the return value in EAX. Because the return value is simply whatever is in the EAX register, a function can return it implicitly, or not return anything at all. And because the return value lives in a register, callers often ignore it. For example, we often call:

printf (...);

This function does have a return value. If the data a function returns is larger than 4 bytes, as with the structure below, it is no longer returned in a register. Look at the example below:

typedef struct mydef {
    int a, b, c, d;
    int arch[10];
} mydef;

mydef func ();

int main ()
{
    mydef d;
    d = func ();
}

mydef func ()
{
    mydef d;
    return d;
}

Then look at the disassembly:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 48                sub    $0x48,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   8d 45 b8                lea    0xffffffb8(%ebp),%eax
  13:   83 ec 0c                sub    $0xc,%esp
  16:   50                      push   %eax
  17:   e8 06 00 00 00          call   0x22
  1c:   83 c4 0c                add    $0xc,%esp
  1f:   c9                      leave
  20:   c3                      ret
  21:   90                      nop
  22:   55                      push   %ebp
  23:   89 e5                   mov    %esp,%ebp
  25:   57                      push   %edi
  26:   56                      push   %esi
  27:   83 ec 40                sub    $0x40,%esp
  2a:   8b 7d 08                mov    0x8(%ebp),%edi
  2d:   8d 75 b8                lea    0xffffffb8(%ebp),%esi
  30:   fc                      cld
  31:   b8 0e 00 00 00          mov    $0xe,%eax
  36:   89 c1                   mov    %eax,%ecx
  38:   f3 a5                   repz movsl %ds:(%esi),%es:(%edi)
  3a:   8b 45 08                mov    0x8(%ebp),%eax
  3d:   83 c4 40                add    $0x40,%esp
  40:   5e                      pop    %esi
  41:   5f                      pop    %edi
  42:   c9                      leave
  43:   c2 04 00                ret    $0x4

Our structure is 0x38 bytes in size; GCC allocates 0x40 bytes for it in order to keep the stack 16-byte aligned. The function func takes no parameters, yet the call passes it a pointer to main's variable d. func then uses this pointer to copy its own d into main's d directly with the MOVSL instruction. Now look at the example below:

typedef struct mydef {
    int a, b, c, d;
    int arch[10];
} mydef;

mydef func ();

int main ()
{
    func ();
}

mydef func ()
{
    mydef d;
    return d;
}

Look at the disassembly:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 48                sub    $0x48,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   8d 45 b8                lea    0xffffffb8(%ebp),%eax
  13:   83 ec 0c                sub    $0xc,%esp
  16:   50                      push   %eax
  17:   e8 06 00 00 00          call   0x22
  1c:   83 c4 0c                add    $0xc,%esp
  1f:   c9                      leave
  20:   c3                      ret
  21:   90                      nop
  22:   55                      push   %ebp
  23:   89 e5                   mov    %esp,%ebp
  25:   57                      push   %edi
  26:   56                      push   %esi
  27:   83 ec 40                sub    $0x40,%esp
  2a:   8b 7d 08                mov    0x8(%ebp),%edi
  2d:   8d 75 b8                lea    0xffffffb8(%ebp),%esi
  30:   fc                      cld
  31:   b8 0e 00 00 00          mov    $0xe,%eax
  36:   89 c1                   mov    %eax,%ecx
  38:   f3 a5                   repz movsl %ds:(%esi),%es:(%edi)
  3a:   8b 45 08                mov    0x8(%ebp),%eax
  3d:   83 c4 40                add    $0x40,%esp
  40:   5e                      pop    %esi
  41:   5f                      pop    %edi
  42:   c9                      leave
  43:   c2 04 00                ret    $0x4

This is identical, byte for byte, to the previous result! We did not declare a variable in main to hold the value returned by func, but GCC did it for us. It still passes a pointer to func and receives the result through it. We may not be interested in the return value, but the compiler takes no interest in our interests and goes its own way. (With an optimization option enabled, the result is likely to be the same.)

8. Passing parameters to functions

GCC follows the general C language conventions, including those for parameter passing. Take a look at the example below:

char res;

char func (char a, char b);

int main ()
{
    res = func (0x02, 0x03);
}

char func (char a, char b)
{
    return a + b;
}

Take a look at its disassembly:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   83 ec 08                sub    $0x8,%esp
  13:   6a 03                   push   $0x3
  15:   6a 02                   push   $0x2
  17:   e8 0a 00 00 00          call   0x26
  1c:   83 c4 10                add    $0x10,%esp
  1f:   a2 44 00 00 00          mov    %al,0x44
  24:   c9                      leave
  25:   c3                      ret
  26:   55                      push   %ebp
  27:   89 e5                   mov    %esp,%ebp
  29:   83 ec 04                sub    $0x4,%esp
  2c:   8b 45 08                mov    0x8(%ebp),%eax
  2f:   8b 55 0c                mov    0xc(%ebp),%edx
  32:   88 45 ff                mov    %al,0xffffffff(%ebp)
  35:   88 55 fe                mov    %dl,0xfffffffe(%ebp)
  38:   8a 45 fe                mov    0xfffffffe(%ebp),%al
  3b:   02 45 ff                add    0xffffffff(%ebp),%al
  3e:   0f be c0                movsbl %al,%eax
  41:   c9                      leave
  42:   c3                      ret

If you are proficient in assembly language, reading this code may well make you spit blood and faint! GCC really did produce code like this! Still, let us go over the C function calling convention.

We have already seen that the parameters are pushed from right to left. The following description applies to 32-bit code; specifically:

- The caller pushes the parameters onto the stack in right-to-left order, so that the leftmost parameter is pushed last.

- The caller uses a near CALL instruction to transfer control to the callee.

- The callee gains control and generally creates a stack frame (this is not required, but it is usually done). It first pushes EBP onto the stack to save it, then copies ESP into EBP, making EBP the base pointer for accessing the parameters.

- The callee accesses its parameters through EBP. Because EBP was pushed first, [EBP] holds the saved EBP and [EBP+4] holds the return address pushed automatically by the CALL instruction, so the parameters start at [EBP+8]. Since the leftmost parameter of the function was pushed last, it is the one at [EBP+8], and the other parameters follow at successively higher offsets. A function such as printf takes a variable number of parameters, but because the parameters are pushed in this order, the callee can always find the first parameter; the types and number of the remaining parameters must be given by the first one.

- The callee may decrease ESP to allocate space on the stack for temporary variables, which are then accessed through EBP with negative offsets.

- The callee returns values of different sizes in AL, AX, or EAX. Floating-point values can be returned in the ST0 register.

- When the callee has finished, it uses the stack frame established earlier to restore the values of ESP and EBP, and returns to the caller with the RET instruction.

- When the caller regains control, it clears the parameters off the stack by adding an immediate value to ESP (rather than using multiple POP instructions). If a function is called with the wrong number of parameters because of a prototype mismatch, the stack is still restored to the correct state, because the caller knows how many bytes it pushed.

With the C function calling rules in mind, the code above is not difficult to understand.

Starting with the 80386, the operand of the PUSH instruction can be 8, 16, or 32 bits wide, but C passes every parameter as a 32-bit integer, and the callee likewise treats each one as 32 bits. This is important, especially when mixing assembly language and C.

9. Conversion between basic data types

GCC handles three basic data types:

- signed char, unsigned char: 1 byte

- signed short, unsigned short: 2 bytes

- signed int, unsigned int: 4 bytes

Conversions between the data types follow the general rules of the C language; for details, refer to the IA-32 documentation. Here is just one example:

int main ()
{
    char ch = 'a';
    int x = 2;
    int y = -4;
}

Compile and disassemble with the same method:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 18                sub    $0x18,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c6 45 ff 61             movb   $0x61,0xffffffff(%ebp)
  14:   c7 45 f8 02 00 00 00    movl   $0x2,0xfffffff8(%ebp)
  1b:   c7 45 f4 fc ff ff ff    movl   $0xfffffffc,0xfffffff4(%ebp)
  22:   c9                      leave
  23:   c3                      ret

10. Basic run-time environment for GCC-compiled code

For this part I went through a lot of documentation and found nothing that covers the subject. After asking many experts, the picture is roughly as follows. I cannot guarantee that it is correct now or that it will remain correct in the future, so treat it as reference only:

- The code runs in 32-bit protected mode.

- The segment registers CS, DS, ES, FS, GS, and SS must all point to the same memory region.

- Uninitialized global variables are placed in the BSS segment, which is located after the code segment. However, if you generate a plain binary file, the BSS segment is not part of the file, so be careful. Initialized global variables are in the DATA segment, which is part of the binary file and located after the code segment. Global variables declared const are placed in the RODATA segment, which is also part of the binary file and placed after the code segment.

- Make sure the stack does not overflow, and take care that the code segment and global data are not corrupted.

I also consulted Intel's "Intel Architecture Software Developer's Manual", which runs to three volumes! See the material on memory organization (Volume 1: Memory Organization), which I recommend studying. In short, keeping CS, DS, and SS always pointing at the same memory region should let the code run correctly. If the run-time environment is different, I do not know what will happen.

11. Accessing global variables externally

Now let us see how a global variable in a C program can be accessed from a non-C program. This is useful if you want to load C programs from a program written in another language, such as assembly, especially in kernel development.

int myval = 0x5;

int main ()
{
}

Compile this code:

gcc -c test.c
ld -Ttext 0x0 -e main -N --oformat binary -Map memmap.txt -o test.bin test.o
objdump -D -b binary -m i386 test.bin

This gives the following result:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret
  12:   00 00                   add    %al,(%eax)
  14:   05                      .byte 0x5
  15:   00 00                   add    %al,(%eax)

The global variable myval is stored at 0x14. Because I used the -Map option to make ld generate the memory map file memmap.txt, you should be able to find the following in it:

.data           0x00000014        0x4
 *(.data .data.* .gnu.linkonce.d.*)
 .data          0x00000014        0x4 test.o
                0x00000014                myval

This shows that myval from the test.o module is located at 0x00000014. Using that address as an offset, you can access the myval variable directly from other languages. As another example, memmap.txt can also be used to find the size of the BSS segment:

cat memmap.txt | grep '\.bss' | grep '0x' | sed 's/.*0x/0x/'

In this example, the size of the BSS is 0x0.

Global variables declared static in the C program cannot be accessed this way. Because such variables are static, they are not listed in the map file. You may be able to reach them some other way, but it is best not to.

12. Options for generating binary files in other formats

Generating binary files in different formats is quite troublesome. It requires many unusual options, some of which are not listed in the man pages.

The first is a GCC option: -nostdinc. With this option, GCC does not search the default include path, usually /usr/include. If you need to use custom header files, you can add a search path with the -I option.

Next are the ld options. The first is -nostdlib, which ignores the standard libraries; if necessary, you can use the -L option to specify a library search path. The second is -Ttext, which specifies the address of the code segment; if the addresses of the other segments are not specified, they are placed automatically after the code segment. The third is -e, which specifies the entry point of the code. The default is _start, so if the code does not start there, you should specify the entry point. The fourth is --oformat binary, which makes the output file a raw binary file; the output can in fact be any format the system supports. The intermediate object files, however, cannot be raw binary, because they need to carry symbol and relocation information. You can use the --format option to specify the format of the input files, but it is rarely needed. The fifth is -static: if other libraries are used, link them statically, unless your program supports dynamic linking.

There are also assembler directives that indicate the code size. The assembler can assemble 16-bit code as well as 32-bit code, whereas GCC always generates 32-bit assembly code. By placing a directive in the C code with asm(), you can have GCC's output assembled for a 16-bit environment.

The first is .code16: 16-bit code that runs in a 16-bit segment;

The second is .code32: 32-bit code that runs in a 32-bit segment, which is what GCC always produces by default;

The third is .code16gcc: the code that GCC generates is assembled to run in a 16-bit segment. GAS adds the necessary prefixes to mark 32-bit instructions, registers, and so on. This directive is very useful: it lets us write code in C that runs in a 16-bit environment, whether real mode or protected mode. A single C module can now contain both 16-bit and 32-bit code, but you must pay attention to the address-space issues of the different parts of the code.

For example, suppose we want to use GCC to generate a .COM program that runs under DOS, and a boot program.

First, a .com file under DOS is a raw binary that runs in real mode, with a start address of 0x100. To generate a .com file with GCC, add the following directive at the start of each .c file:

__asm__ (".code16gcc\n");

If you need to reference other libraries, those libraries must be generated in the same way. When linking, add the following options:

-Ttext 0x100 -static --oformat binary

If the program contains inline assembly code, it must be converted to AT&T syntax.

If you want to write a boot program, simply replace 0x100 with 0x7c00 when linking! In addition, the final generated binary code must be smaller than 446 bytes!

13. References

- Intel Architecture Software Developer's Manual

- Manual pages in Linux

- Red Hat GNUPro Toolkit

When reposting, please cite the original address: https://www.9cbs.com/read-128592.html
