Generating Plain Binary Files with GCC
I searched the Internet for a long time and found only scattered information on this subject. I wanted to use GCC to develop a special-purpose tool of my own, so I wrote this summary, drawing on my own work experience.
1. Hardware and software environment
- A 32-bit computer of the 80x86 family; the faster the better.
- A Linux distribution, such as Red Hat, Mandrake, or TurboLinux.
- The GNU GCC compiler, the usual compiler on Linux.
- binutils for Linux.
- A text editor you are familiar with, such as vi.
If you do not have these, there is no point in reading further. My working environment is a Celeron 433 with 128 MB of memory running Red Hat Linux 8.0, with the default GCC, version 3.2.2. You can check your GCC version with the following command:
gcc --version
2. Generate a binary file using the C language
Create test.c with your favorite text editor:
int main ()
{
}
Compile with the following command:
gcc -c test.c
ld -o test -Ttext 0x0 -e main test.o
objcopy -R .note -R .comment -S -O binary test test.bin
The final binary file is test.bin. You can inspect its contents with any disassembler you like. I use objdump on Linux:
objdump -D -b binary -m i386 test.bin
The results are as follows:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret
The first column is the memory address of the instruction, the second column is its machine code, and the third column is the assembly instruction. I expect your results to be the same. If your GCC version differs from mine, for example GCC 2.7.x, your output will probably differ and lack the following four instructions. This is normal: the two versions of GCC set up the stack frame differently (the examples below will vary with the compiler version in the same way):
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp    # stack alignment; local variable space is allocated in 16-byte units
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
The code above is 32-bit code and must run in a 32-bit protected-mode environment such as Linux. You can also generate test.bin with just the following commands:
gcc -c test.c
ld -Ttext 0x0 -e main --oformat binary -o test.bin test.o
The test.c above contains a single function that is nothing more than a skeleton, so its disassembly is not hard to follow.
3. Programs with local variables
Create a new test.c and see how GCC handles local variables.
int main ()
{
int i;
i = 0x12345678;
}
Compile it with either of the two methods described above to produce test.bin, then disassemble it with objdump:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 45 fc 78 56 34 12    movl   $0x12345678,0xfffffffc(%ebp)
  17:   c9                      leave
  18:   c3                      ret
Compared with the first example, the first six instructions and the last two are identical; only one instruction is new. It assigns the value to the local variable, whose space has already been allocated. GCC allocates local variable space on the stack in units of 16 bytes, rather than the usual 1 byte. If you change
int i;
i = 0x12345678;
to
int i = 0x12345678;
the result is no different. For a global variable, however, things are not the same.
4. Programs with global variables
Change test.c to:
int i;
int main ()
{
i = 0x12345678;
}
Compile it the same way, then disassemble it:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 05 1c 10 00 00 78    movl   $0x12345678,0x101c
  17:   56 34 12
  1a:   c9                      leave
  1b:   c3                      ret
The global variable we defined is placed at 0x101c. This is because ld aligns the data segment to a page boundary by default, which is related to page alignment in paged memory management. When linking with ld, pass the -N option to turn this alignment off:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 05 1c 00 00 00 78    movl   $0x12345678,0x1c
  17:   56 34 12
  1a:   c9                      leave
  1b:   c3                      ret
As we can see, the data segment now immediately follows the code segment. We can also specify the location of the data segment explicitly. Try compiling with the following commands:
gcc -c test.c
ld -Ttext 0x0 -Tdata 0x1234 -e main -N --oformat binary -o test.bin test.o
Then disassemble it with objdump:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c7 05 34 12 00 00 78    movl   $0x12345678,0x1234
  17:   56 34 12
  1a:   c9                      leave
  1b:   c3                      ret
Now the global variable we defined is placed at 0x1234. With the -Tdata option to ld you can freely choose the address of the data segment; if it is not specified, the data segment follows the code segment.
Next, look at a global that is initialized directly:
const int i = 0x12345678;
int main ()
{
}
Compile, link, and disassemble it as before. The result is:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret
  12:   00 00                   add    %al,(%eax)
  14:   78 56                   js     0x6c
  16:   34 12                   xor    $0x12,%al
The code is aligned to 4 bytes, the global is stored directly after the code segment, and ld places the constant value at the global's location right away, in a single step.
Use the following command to see more detail:
objdump -D test.o
You will see the following:
test.o:     file format elf32-i386
Disassembly of section .text:
00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret
Disassembly of section .data:
Disassembly of section .rodata:
00000000 <i>:
   0:   78 56                   js     0x58
   2:   34 12                   xor    $0x12,%al
This shows more clearly that the global constant defined in the .c file is placed in the read-only data segment. Now look at the following code:
int i = 0x12345678;
const int c = 0x12345678;
int main ()
{
}
Compile and disassemble it with the same method to get:
test.o:     file format elf32-i386
Disassembly of section .text:
00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret
Disassembly of section .data:
00000000 <i>:
   0:   78 56                   js     0x58
   2:   34 12                   xor    $0x12,%al
Disassembly of section .rodata:
00000000 <c>:
   0:   78 56                   js     0x58
   2:   34 12                   xor    $0x12,%al
As you can see, the integer i is placed in the ordinary data segment and the constant c in the read-only data segment. When global variables (or constants) are used, ld automatically stores them in the appropriate data segment.
5. Handling pointers
Use the following code to see how GCC handles pointer variables:
int main ()
{
int i;
int *p;
p = &i;
*p = 0x12345678;
}
Use objdump to view the generated machine code:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   8d 45 fc                lea    0xfffffffc(%ebp),%eax
  13:   89 45 f8                mov    %eax,0xfffffff8(%ebp)
  16:   8b 45 f8                mov    0xfffffff8(%ebp),%eax
  19:   c7 00 78 56 34 12       movl   $0x12345678,(%eax)
  1f:   c9                      leave
  20:   c3                      ret
At the start, GCC pre-allocates at least 8 bytes of space for local variables and aligns ESP to a 16-byte boundary. If more space is needed, GCC allocates it in units of 16 bytes, unlike other compilers, which allocate in 1-byte units. The variable i is located at EBP-4 and the variable p at EBP-8. The LEA instruction loads the effective address of i into EAX, which is then stored in p. Finally, 0x12345678 is assigned to i, the variable that p points to.
6. Function calls
Look at the following code:
void func ();
int main ()
{
func ();
}
void func ()
{
}
Look at the generated binary code:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   e8 03 00 00 00          call   0x18
  15:   c9                      leave
  16:   c3                      ret
  17:   90                      nop
  18:   55                      push   %ebp
  19:   89 e5                   mov    %esp,%ebp
  1b:   c9                      leave
  1c:   c3                      ret
The function main calls the empty function func with a CALL instruction; the code of func is similar to that of main. Pass the -Map option to ld to output a map file with more detailed information:
.text           0x00000000       0x1d
 *(.text .stub .text.* .gnu.linkonce.t.*)
 .text          0x00000000       0x1d test.o
                0x00000000                main
                0x00000018                func
The first column is the section name, here .text; the second column is the start address; the third column is the length of the section; the last column holds additional information such as function names and the object file. We can see that the .text section starts at 0x0 and is 0x1d bytes long, and that the function func starts at 0x18.
7. Function return values
Look at the code below, in which main returns an integer value:
int main ()
{
return 0x12345678;
}
The generated binary code is similar to that of other compilers:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   b8 78 56 34 12          mov    $0x12345678,%eax
  15:   c9                      leave
  16:   c3                      ret
As you can see, GCC passes the return value in EAX. Because the return value is simply the value of the EAX register, you can return it implicitly, or not return anything at all. And because the return value is held in a register, it is often ignored when a function is called. For example, we often call:
printf (...);
This function does have a return value. If a function returns a structure larger than 4 bytes, as in the next example, the value can no longer be returned in a register. Look at the example below:
typedef struct mydef {
int a, b, c, d;
int arch[10];
} mydef;
mydef func ();
int main ()
{
mydef d;
d = func ();
}
mydef func ()
{
mydef d;
return d;
}
Then look at the disassembly:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 48                sub    $0x48,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   8d 45 b8                lea    0xffffffb8(%ebp),%eax
  13:   83 ec 0c                sub    $0xc,%esp
  16:   50                      push   %eax
  17:   e8 06 00 00 00          call   0x22
  1c:   83 c4 0c                add    $0xc,%esp
  1f:   c9                      leave
  20:   c3                      ret
  21:   90                      nop
  22:   55                      push   %ebp
  23:   89 e5                   mov    %esp,%ebp
  25:   57                      push   %edi
  26:   56                      push   %esi
  27:   83 ec 40                sub    $0x40,%esp
  2a:   8b 7d 08                mov    0x8(%ebp),%edi
  2d:   8d 75 b8                lea    0xffffffb8(%ebp),%esi
  30:   fc                      cld
  31:   b8 0e 00 00 00          mov    $0xe,%eax
  36:   89 c1                   mov    %eax,%ecx
  38:   f3 a5                   repz movsl %ds:(%esi),%es:(%edi)
  3a:   8b 45 08                mov    0x8(%ebp),%eax
  3d:   83 c4 40                add    $0x40,%esp
  40:   5e                      pop    %esi
  41:   5f                      pop    %edi
  42:   c9                      leave
  43:   c2 04 00                ret    $0x4
Our structure is 0x38 bytes; GCC allocates 0x40 bytes of space in order to keep the stack aligned to 16 bytes. The function func takes no parameters, yet the call passes it a pointer to the variable d. func then uses this pointer to copy its own d into the d of main directly with the MOVSL instruction. Now look at the example below:
typedef struct mydef {
int a, b, c, d;
int arch[10];
} mydef;
mydef func ();
int main ()
{
func ();
}
mydef func ()
{
mydef d;
return d;
}
Look at the disassembly:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 48                sub    $0x48,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   8d 45 b8                lea    0xffffffb8(%ebp),%eax
  13:   83 ec 0c                sub    $0xc,%esp
  16:   50                      push   %eax
  17:   e8 06 00 00 00          call   0x22
  1c:   83 c4 0c                add    $0xc,%esp
  1f:   c9                      leave
  20:   c3                      ret
  21:   90                      nop
  22:   55                      push   %ebp
  23:   89 e5                   mov    %esp,%ebp
  25:   57                      push   %edi
  26:   56                      push   %esi
  27:   83 ec 40                sub    $0x40,%esp
  2a:   8b 7d 08                mov    0x8(%ebp),%edi
  2d:   8d 75 b8                lea    0xffffffb8(%ebp),%esi
  30:   fc                      cld
  31:   b8 0e 00 00 00          mov    $0xe,%eax
  36:   89 c1                   mov    %eax,%ecx
  38:   f3 a5                   repz movsl %ds:(%esi),%es:(%edi)
  3a:   8b 45 08                mov    0x8(%ebp),%eax
  3d:   83 c4 40                add    $0x40,%esp
  40:   5e                      pop    %esi
  41:   5f                      pop    %edi
  42:   c9                      leave
  43:   c2 04 00                ret    $0x4
It is identical, word for word, to the result above! We declared no variable in main to hold the result returned by func, but GCC did it for us. It still passes a pointer to func and receives the result through it. We are not interested in the return value, but the compiler takes no notice of our interests and carries on in its own way. (If an optimization option is used, the result is likely to be the same.)
8. Passing parameters to functions
GCC follows the usual C conventions, including those for parameter passing. Take a look at the example below:
char res;
char func (char a, char b);
int main ()
{
res = func (0x02, 0x03);
}
char func (char a, char b)
{
return a + b;
}
Take a look at its disassembly:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   83 ec 08                sub    $0x8,%esp
  13:   6a 03                   push   $0x3
  15:   6a 02                   push   $0x2
  17:   e8 0a 00 00 00          call   0x26
  1c:   83 c4 10                add    $0x10,%esp
  1f:   a2 44 00 00 00          mov    %al,0x44
  24:   c9                      leave
  25:   c3                      ret
  26:   55                      push   %ebp
  27:   89 e5                   mov    %esp,%ebp
  29:   83 ec 04                sub    $0x4,%esp
  2c:   8b 45 08                mov    0x8(%ebp),%eax
  2f:   8b 55 0c                mov    0xc(%ebp),%edx
  32:   88 45 ff                mov    %al,0xffffffff(%ebp)
  35:   88 55 fe                mov    %dl,0xfffffffe(%ebp)
  38:   8a 45 fe                mov    0xfffffffe(%ebp),%al
  3b:   02 45 ff                add    0xffffffff(%ebp),%al
  3e:   0f be c0                movsbl %al,%eax
  41:   c9                      leave
  42:   c3                      ret
If you are proficient in assembly language, I am afraid this code has already made you cough up blood and faint! GCC actually produces code like this! Still, let us go through the C function calling convention.
We have already seen that parameters are pushed from right to left. The following rules all apply to 32-bit code. Specifically:
- The caller is responsible for pushing the parameters onto the stack, in order from right to left. That is, the leftmost parameter is pushed last.
- The caller uses a near CALL instruction to transfer control to the callee.
- The callee takes control and generally creates a stack frame (this is not required, but it is usually done). It first pushes EBP onto the stack to save it, then copies ESP into EBP, making EBP a base pointer for accessing the parameters.
- The callee accesses the parameters through EBP. Because EBP was pushed first, [EBP] holds the saved EBP and [EBP+4] holds the return address pushed automatically by the CALL instruction, so the parameters start at [EBP+8]. Since the leftmost parameter is pushed last, [EBP+8] is that parameter, and the other parameters follow in order. A function like printf takes a variable number of parameters, but because the parameters are pushed in this order, the callee can always find the first parameter; the types and number of the remaining parameters must be given by the first parameter.
- The callee decreases ESP to allocate space on the stack for temporary variables, and then accesses them through EBP with negative offsets.
- AL, AX, or EAX is used to return values of different sizes. Floating-point values are returned in the ST0 register.
- When the callee has finished, it tears down the stack frame established earlier, restores the values of ESP and EBP, and returns to the caller with a RET instruction.
- The caller regains control and cleans up the stack by adding an immediate value to ESP (try not to use multiple POP instructions for this). If a function is called with the wrong number of parameters because of a prototype mismatch, the caller can still restore the stack to the correct state, because the caller knows how many bytes it pushed onto the stack.
With these C calling rules in mind, the code above is not hard to understand.
Starting with the 80386, the operand of a PUSH instruction can be 8, 16, or 32 bits wide, but C pushes every parameter as a 32-bit integer, and the callee likewise treats it as 32 bits. This is important, especially when mixing assembly language and C.
9. Conversion between basic data types
GCC handles three basic integer data types:
- signed char, unsigned char: 1 byte
- signed short, unsigned short: 2 bytes
- signed int, unsigned int: 4 bytes
Conversions between data types follow the usual C rules; for details, refer to the IA-32 documentation. Here is just one example:
int main ()
{
char ch = 'a';
int x = 2;
int y = -4;
}
Compile and disassemble it using the same method:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 18                sub    $0x18,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c6 45 ff 61             movb   $0x61,0xffffffff(%ebp)
  14:   c7 45 f8 02 00 00 00    movl   $0x2,0xfffffff8(%ebp)
  1b:   c7 45 f4 fc ff ff ff    movl   $0xfffffffc,0xfffffff4(%ebp)
  22:   c9                      leave
  23:   c3                      ret
10. Basic runtime environment for GCC-compiled code
I went through a lot of documents for this part and found nothing that covers the subject. I asked many experts, and the situation is roughly as follows. I cannot guarantee that it is correct now or that it will remain correct in the future; it is for reference only:
- The code runs in 32-bit protected mode.
- The segment registers CS, DS, ES, FS, GS, and SS must all point to the same memory area.
- Uninitialized global variables are placed in the BSS segment, which lies after the code segment. If you generate a plain binary file, however, the BSS segment is not part of the file, so you need to be careful. Initialized global variables are placed in the DATA segment, which is part of the binary file and is located after the code segment. Global variables declared const are placed in the RODATA segment, which is also part of the binary file and is placed after the code segment.
- Make sure the stack does not overflow, and take care that the code segment and global data are not corrupted.
I also consulted Intel's documentation, the "Intel Architecture Software Developer's Manual", which comes in three volumes! See its discussion of memory organization (Volume 1: Memory Organization), which I suggest you study. In short, keeping CS, DS, and SS pointing to the same memory area should let the code run correctly. If the runtime environment is set up differently, I do not know what will happen.
11. Accessing global variables from outside
This section looks at how to access the global variables of a C program from a non-C program. This is useful if you want to load a C program from another program, for example one written in assembly language, especially in kernel development.
int myval = 0x5;
int main ()
{
}
Compile this code:
gcc -c test.c
ld -Ttext 0x0 -e main -N --oformat binary -Map memmap.txt -o test.bin test.o
objdump -D -b binary -m i386 test.bin
This gives the following result:
00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   83 e4 f0                and    $0xfffffff0,%esp
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   29 c4                   sub    %eax,%esp
  10:   c9                      leave
  11:   c3                      ret
  12:   00 00                   add    %al,(%eax)
  14:   05                      .byte 0x5
  15:   00 00                   add    %al,(%eax)
The global variable myval is stored at 0x14. The -Map option used above makes ld generate the memory map file memmap.txt, in which you should find:
.data           0x00000014        0x4
 *(.data .data.* .gnu.linkonce.d.*)
 .data          0x00000014        0x4 test.o
                0x00000014                myval
This shows that myval, from the test.o module, is located at 0x00000014. Using this address as an offset, you can access the myval variable directly from other languages. As another example, you can find the size of the BSS segment from memmap.txt:
cat memmap.txt | grep '\.bss' | grep '0x' | sed 's/.*0x/0x/'
In this example, the size of the BSS is 0x0.
Global variables declared static in the C program cannot be accessed this way. Because such variables are static, they are not listed in the map file. There may be other ways to reach them, but it is best not to try.
12. Options for generating binary files in other formats
Generating binary files in different formats is quite troublesome. It requires many unusual options, some of which are not listed in the man pages or help output.
First, the GCC option -nostdinc. With this option, GCC does not search the default include path, usually /usr/include. If you need custom header files, add a search path with the -I option.
Next, the ld options. The first is -nostdlib, which ignores the standard libraries. If necessary, use the -L option to specify a library search path. The second is -Ttext, which specifies the address of the code segment. If the addresses of the other segments are not specified, they are placed automatically after the code segment. The third is -e, which specifies the entry point of the code; the default is _start, so if the code does not start there you should specify the entry point. The fourth is --oformat binary, which makes the output a raw binary file; the output can in fact be any format supported by the system. The intermediate object files, however, cannot be raw binary, because they need symbol and relocation information. You can use the -b (--format) option to specify the format of the input files, but this is rarely needed. The fifth is -static: if other libraries are used, link statically, unless your program supports dynamic linking.
There are also assembler directives that select the code size. The assembler can produce either 16-bit or 32-bit code, but GCC always generates 32-bit assembly code. By placing a directive in an asm() statement in the C code, GCC can be made to produce code for a 16-bit environment.
The first is .code16: 16-bit code that runs in a 16-bit segment.
The second is .code32: 32-bit code that runs in a 32-bit segment, which is what GCC always produces by default.
The third is .code16gcc: code that runs in a 16-bit segment, using 16-bit or 32-bit operations as needed. GAS adds the necessary prefixes to mark 32-bit instructions, registers, and so on. This directive is useful: it lets us write C code that runs in a 16-bit environment, whether real mode or protected mode. A single C module can now contain both 16-bit and 32-bit code, but you must pay attention to the address space of the different parts of the code.
For example, suppose we want to use GCC to generate a .com program that runs under DOS, and a boot program.
First, a .com file under DOS is a raw binary that runs in real mode, with a start address of 0x100. To generate a .com file with GCC, add the following directive at the top of each .c file:
__asm__ (".code16gcc\n");
If you need to reference other library files, those libraries must be built in the same way. When linking, add the following options:
-Ttext 0x100 -static --oformat binary
If the program contains inline assembly code, it must be converted to AT&T syntax.
To write a boot program, simply use 0x7c00 in place of 0x100 when linking. In addition, the final binary code must fit within 446 bytes!
13. References
- Intel Architecture Software Developer's Manual
- Linux manual pages
- Red Hat GNUPro Toolkit