Byte Alignment and C/C++ Function Calling Conventions: A Study Summary

zhaozj, 2021-02-16

Preface: the "*** Software Programming Specification" states: "When defining structure data types, pay attention to the 4-byte alignment principle in order to improve system efficiency ...". This article explains how x86 handles byte alignment; readers on other architectures can test for themselves. It also discusses C/C++ function calling conventions.

BTW, I had been meaning to write this summary for several days but only got around to it today, the 18th. Writing it down is mainly useful for myself, though it may have some small reference value for others. My own knowledge is limited, so corrections are welcome.

Thanks to a few colleagues, and to Carrot. Now, back to the main topic.

1. First, look at the example below:

struct A {char c1; int i; short s; int j;} a;

struct B {int i; int j; short s; char c1;} b;

The declaration of structure A does not follow byte alignment (to keep things distinct, I call this the aligned declaration principle), whereas the declaration of structure B does. Let's see what happens on x86. First, print the address of each member of a and b. In a, consecutive members are 4 bytes apart. In b, i and j, and j and s, are 4 bytes apart, but s and c1 are only 2 bytes apart. So:

sizeof(a) = 16
sizeof(b) = 12

Why this result? It comes from the way byte alignment is handled on x86. To make programs run faster, some architectures are designed around aligned access, usually taking the word length as the alignment boundary. A structure variable as a whole is aligned to the largest alignment boundary among its members. B, for example, is aligned to 4 as a whole, so sizeof(b) is 12 rather than 11. For A, although the declaration pays no attention to alignment, the printed addresses show that the compiler aligned the members automatically, so consecutive members are 4 bytes apart. On x86, the only difference between declaring A and declaring B is that A uses 4 more bytes of memory. (Whether B actually executes faster in specific cases is still open to discussion, for example when two adjacent instructions access s and c1 respectively.)

If the architecture did not align, the members of A would be stored one right after another, and sizeof(a) would be 11. Clearly, alignment wastes space. So why use it? Aligned versus unaligned design is a trade-off between time and space, and alignment spends space to save time. Suppose the word length of an architecture is W. This implies that handling W-bit data is regarded as the most important case on that architecture, and its design gives priority to making W-bit operations efficient. For example, if most reads and writes move W bits of data, the data path will be W bits wide.

If every data access is aligned to W, access can be sped up further, because fewer address bits need to be transmitted and addressing becomes faster. Most architectures align according to the word length. When an access is not aligned, some architectures fault, for example with a bus error, while x86 performs several accesses and splices the results together, which lowers execution efficiency.

Some architectures, such as SPARC and MIPS, require aligned access. Their hardware design makes alignment mandatory. This is not because they could not support unaligned access, but because their designers saw no point in it: they pursue speed.

Alignment behavior depends on the architecture. On IA-32, sizeof(a) is 16, which is the result of alignment. Now let's see why variables should be declared in aligned order wherever possible. The declaration of structure A ignores alignment, yet its members' addresses are still aligned on 4-byte boundaries (the member spacing is 4). The credit goes to the compiler, because GCC, the compiler I use, aligns by default. x86 can also handle unaligned data access, so a declaration like A causes no errors there. On architectures that can only access aligned data, however, the code will fail to run if the compiler is not carefully configured to align. If you declare the structure the way B is declared, the data can be accessed whether or not the compiler's alignment option is set.

Current development generally pays attention to performance, so there are three ways of dealing with alignment:

1) Declare the members the way B does.
2) When logically related members should stay next to each other, insert explicit reserved members: struct A {char c1; char reserved1[3]; int i; short s; char reserved2[2]; int j;} a;
3) Write the declaration however you like and leave all alignment to the compiler.

Many alignment hazards in code are implicit, for example those introduced by forced type conversions (casts). Consider the following example:

unsigned int ui_1 = 0x12345678;
unsigned char *p = NULL;
unsigned short *us_1 = NULL;

p = (unsigned char *)&ui_1;
*p = 0x00;
us_1 = (unsigned short *)(p + 1);
*us_1 = 0x0000;

The last two statements access an unsigned short through an odd address, which clearly violates the alignment rules. On x86, an operation like this only costs efficiency, but on MIPS or SPARC it may cause a bus error (I have not tried this).

Some people like to reach the members of a structure by moving pointers (as Linux code does with the members of struct sk_buff). But in A, the byte right after c1, (char *)&a.c1 + 1, is not the address of i, whereas in B, (char *)&b.s + 2 is exactly the address of c1. So we need to know where the members of a structure are stored in order to write correct code. Also remember that, whether you are dealing with structures, arrays, or ordinary variables, every forced type conversion deserves a second look :) But to save yourself the trouble, just follow the declaration principle! (The principle: declare variables on their alignment boundaries as far as possible, while still saving space.)

2. C/C++ function calling conventions

We all know that function calls in C/C++ pass arguments by value rather than by reference. So how is the value actually passed? Typical assembly around a function call looks like this:

push %eax
call 0x401394
add $0x10,%esp

First, the argument is pushed onto the stack. The called function then works on that stack location rather than on the caller's variable, which is how passing by value works; when the parameter is a pointer or a reference, what gets pushed is an address.

call *** calls the function; the address that follows is the function's entry address. The call instruction is equivalent to:

push ip
jmp ***

It first pushes the return address (the IP of the instruction after the call) onto the stack and then jumps to the function. When the called function is done, it returns by executing ret, which is equivalent to pop ip and restores the execution address from before the call. So whenever call executes, the stack pointer SP automatically decreases by 2 (in the 16-bit case), because the value of IP has been pushed.

Arguments are pushed from right to left, which distinguishes C from languages such as Pascal.

The called function begins with the following instructions:

push %ebp
mov %esp,%ebp

This first saves the value of BP and then copies the current stack pointer into BP. So now (in the 16-bit register case) BP+2 holds the saved IP, BP+4 holds the first parameter, BP+6 holds the second parameter, and so on. Before it returns, the function executes pop bp.

In the default C/C++ calling convention (cdecl), the caller pushes the arguments, from right to left, and the caller also restores the stack afterwards. Because the caller manages the stack, functions with a variable number of arguments can be implemented.

For WINAPI and CALLBACK functions (stdcall), the caller pushes the arguments, but the called function pops them and restores the stack. Functions with a variable number of arguments therefore cannot be implemented with this convention.

(If anyone with a deeper understanding of compiler principles and compilers could write up this part, thank you. What happens at compile time could also be added. Otherwise, I will just have to keep studying.)

When reposting, please cite the original address: https://www.9cbs.com/read-14368.html
