Analysis of the C language and CPU floating point mechanism under Intel IA32 architecture

(When reprinting, please credit the original author and the source.)

(Original, with normal-size fonts and a PDF version: http://www.binghua.com/Article/class1/class2/200409/259.html)

Version 0.01

Xie Yubo, Harbin Institute of Technology

(Email: xieyubo@126.com, URL: http://purec.binghua.com)

(QQ: 13916830; Harbin Institute of Technology BBS ID: IAMXIAOHAN)

Foreword

While reading a C language textbook, I came across the following passage.

Example: assign the same real value to a single-precision variable and to a double-precision variable, then print both.

#include <stdio.h>

int main(void)
{
    float a;
    double b;

    a = 123456.789e4;   /* rounded to float on assignment */
    b = 123456.789e4;
    printf("%f\n%f\n", a, b);
    return 0;
}

The results of the operation are as follows:

1234567936.000000

1234567890.000000

Why do the float variable and the double variable print different results? Because when the real constant is assigned to a float variable and to a double variable, the two variables keep a different number of significant digits.

This explanation is correct, but it is far too vague. Why is the float result larger than the original value rather than smaller? Is the difference systematic, or is it essentially random? What is the underlying cause? None of this is explained. The passage merely describes the phenomenon without giving the deeper reason, which I find very unsatisfying.

The same book also contains this passage:

(1) Dividing two integers yields an integer; the fractional part is discarded. For example, 6/4 and 6.0/4 give different results: 6/4 is the integer 1, while 6.0/4 is the real value 1.5. This is because when one of the operands is a real number, the result of an operation between an integer and a real number is of type double.

Unfortunately, the statement that "the result of an arithmetic operation between an integer and a real number is of type double" is inaccurate. Whether you disassemble a real program or analyze the structure of the CPU hardware, the statement deserves close scrutiny. Yet many C tutorials repeat claims such as "every computation involving real numbers is converted to double before it is carried out." Is that what actually happens?
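
As a small language-level check of that claim, the sketch below uses the C11 `_Generic` selection to print the type the compiler assigns to a few mixed expressions. The names `TYPE_NAME` and `print_result_types` are invented for this illustration and do not come from the book or the original article. The value of `FLT_EVAL_METHOD`, which reports the precision used for intermediate results, depends on the compiler and target, so it is printed rather than assumed.

```c
#include <float.h>
#include <stdio.h>

/* Illustrative helper (not from the original article): maps the static
 * type of an expression to a printable name using C11 _Generic. */
#define TYPE_NAME(expr) _Generic((expr),   \
    int: "int",                            \
    float: "float",                        \
    double: "double",                      \
    long double: "long double",            \
    default: "other")

/* Prints the language-level result type of a few divisions. */
void print_result_types(void)
{
    printf("6 / 4     has type %s, value %d\n", TYPE_NAME(6 / 4), 6 / 4);
    printf("6.0 / 4   has type %s\n", TYPE_NAME(6.0 / 4));
    printf("6.0f / 4  has type %s\n", TYPE_NAME(6.0f / 4));
    printf("6.0L / 4  has type %s\n", TYPE_NAME(6.0L / 4));

    /* 0: evaluate in the operand type, 1: in double, 2: in long double
     * (typical for x87 code), negative: indeterminate. */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
}
```

Calling `print_result_types()` from `main` shows that `6.0f / 4` has type `float`, not `double`. The precision the hardware uses for the intermediate result is a separate question.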

Most C textbooks say very little about floating-point numbers, which leaves us with many unanswered questions when we use them in C.

Consider the following program:

/* ------------ a.c ------------ */
#include <stdio.h>

double f(int x)
{
    return 1.0 / x;
}

void main()
{
    double a, b;
    int i;

    a = f(10);
    b = f(10);
    i = a == b;      /* compare two results of the same call */
    printf("%d\n", i);
}

Compiled with gcc -O2 a.c, this program prints 0 when run; that is, a is not equal to b. Why?

Now look at the following program, which is almost identical:

/* ------------ b.c ------------ */
#include <stdio.h>

double f(int x)
{
    return 1.0 / x;
}

void main()
{
    double a, b, c;
    int i;

    a = f(10);
    b = f(10);
    c = f(10);       /* the only difference from a.c */
    i = a == b;
    printf("%d\n", i);
}

Compiled the same way with gcc -O2 b.c, this program prints 1; that is, a equals b. Why?

Almost no C language book published in China (at least none that I have seen) explains this problem. When it comes to floating-point handling in C, nearly all of them stop at the surface. Some foreign books do cover it in detail. The example above comes from "Computer Systems: A Programmer's Perspective" (reference 2, hereinafter "CSAPP"), which describes how C and the CPU handle floating-point numbers very thoroughly. This is where many domestic books fall short: they do not dig into details or get to the bottom of things, which is why they are unlikely to become classics. A book that deserves to be kept for years and treated as a bible must describe every detail so clearly that, after reading it, you never need to consult another book on that detail.

CSAPP is indeed a classic, but unfortunately it does not seem to have an electronic edition. I therefore plan to take this book as a basis (some examples and descriptions come from it), add other material I have read along with my own understanding and analysis, and discuss how the C language and the Intel CPU handle floating-point numbers. I hope this helps fellow students who are not familiar with the topic.

To read this article without difficulty, you need some knowledge of C and assembly language. All experiments here were run on Linux on an Intel IA32 CPU, so to get the most out of the article you should be comfortable with the gcc and objdump command-line tools (unfortunately, few C textbooks cover them). You also need to understand stack operations, which any data structures textbook explains.

My knowledge and ability are limited. If you find anything unclear or wrong, please contact me; I will follow up on all issues and feedback on the Harbin Institute of Technology Pure C forum (http://purec.binghua.com).

I. The logical structure of the Intel CPU floating-point unit

For a long time, process limitations made it impossible to integrate a high-performance floating-point unit on the same chip as the CPU. Intel therefore developed a separate coprocessor that worked alongside the main processor to perform floating-point arithmetic; the coprocessor for the 80386, for example, was the 80387. As integrated-circuit technology advanced, more functional units could be placed on one chip, and with the 80486DX Intel integrated a powerful floating-point unit into the processor itself. Let us look at the logical structure of this integrated floating-point unit, which is a prerequisite for understanding how the Intel CPU processes floating-point numbers.

(Figure 1: Logical structure of the Intel CPU floating-point unit)

The figure shows the logical structure of the floating-point unit of an IA32 CPU. It has eight data registers of 80 bits (10 bytes) each; a control register, a status register, and a tag register of 16 bits (2 bytes) each; a last instruction pointer and a last operand pointer of 48 bits (6 bytes) each; and an opcode register.

The status register resembles the program status word of the main CPU. It records whether an operation overflowed, whether an error occurred, and so on. Most importantly, it also records which of the eight data registers is currently the top of the stack (described in detail below).

The most important fields of the control register select the rounding mode of the floating-point unit (described in detail later) and its precision (24, 53, or 64 bits). The default precision of the Intel floating-point unit is 64 bits, known as double extended precision. The 24-bit and 53-bit settings exist to support the single and double formats of the IEEE floating-point standard (IEEE 754), which correspond to float and double in C.
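
To make the two control-register fields concrete, here is a minimal sketch that decodes them from a 16-bit control word value. The function names are invented for this illustration. The bit positions (precision control in bits 8-9, rounding control in bits 10-11) follow the IA-32 manual cited as reference 1. The reader function uses GCC inline assembly and is compiled only on x86 targets.

```c
#include <stdio.h>

/* Precision-control field: bits 8-9 of the x87 control word. */
int x87_precision_bits(unsigned short cw)
{
    switch ((cw >> 8) & 0x3) {
    case 0:  return 24;   /* single precision */
    case 2:  return 53;   /* double precision */
    case 3:  return 64;   /* double extended precision */
    default: return -1;   /* encoding 01 is reserved */
    }
}

/* Rounding-control field: bits 10-11 of the x87 control word. */
const char *x87_rounding_name(unsigned short cw)
{
    static const char *const names[] = {
        "round to nearest (even)", "round down", "round up",
        "round toward zero"
    };
    return names[(cw >> 10) & 0x3];
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Reads the live control word with FNSTCW (GCC inline assembly). */
unsigned short x87_read_control_word(void)
{
    unsigned short cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    return cw;
}
#endif

/* Prints the decoded fields of a control word value. */
void x87_print_control_word(unsigned short cw)
{
    printf("cw=0x%04x precision=%d bits, rounding=%s\n",
           (unsigned)cw, x87_precision_bits(cw), x87_rounding_name(cw));
}
```

For the value 0x037F, which the unit holds after an FINIT instruction, the decoder reports 64-bit precision and round to nearest, matching the defaults described above.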

The tag register records the state of each of the eight data registers: whether it is empty, holds a valid number, holds zero, or holds a special value (such as NaN, "not a number").
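
The stack-top field of the status register and the per-register tags can be read with a few shifts and masks. The sketch below is an illustration with invented function names. The layout it assumes (TOP in bits 11-13 of the status word, two tag bits per physical register with R0 in the lowest bits) is the one documented in reference 1.

```c
/* TOP field: bits 11-13 of the x87 status word give the number of the
 * physical register (R0..R7) that is currently ST(0). */
int x87_top(unsigned short status_word)
{
    return (status_word >> 11) & 0x7;
}

/* Physical register that ST(i) refers to for a given TOP. */
int x87_physical_index(int top, int i)
{
    return (top + i) & 0x7;
}

/* Tag word: two bits per physical register, R0 in bits 0-1. */
const char *x87_tag_name(unsigned short tag_word, int reg)
{
    static const char *const names[] = {
        "valid", "zero", "special", "empty"
    };
    return names[(tag_word >> (2 * reg)) & 0x3];
}
```

For example, a tag word of 0xFFFF marks all eight registers as empty, which is the state of a freshly initialized unit.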

The last instruction pointer and last operand pointer registers hold the memory addresses of the most recent non-control floating-point instruction and of the operand it used. Each consists of a 16-bit segment selector and a 32-bit offset, which is why these registers are 48 bits wide. (This involves the memory addressing scheme of the IA32 architecture. If it is unfamiliar, it is enough to know that each register identifies a memory address; for the details, see reference 1.)

The opcode register records the opcode of the most recent non-control floating-point instruction; there is nothing more to say about it.

Next we describe the eight data registers of the floating-point unit. They are organized very differently from the general-purpose registers of the main CPU, such as EAX, EBX, ECX, and EDX, and understanding them is essential to understanding how the Intel CPU processes floating-point numbers.

II. Organization of the floating-point data registers

The floating-point unit has eight data registers, each 80 bits (10 bytes) wide. The meaning of the individual bits is covered later, in the section on floating-point formats. This section describes how the eight registers are organized and used.

The Intel CPU organizes the eight floating-point registers as a stack and uses a field in the status register to mark the current top of the stack. The top of the stack is called ST(0), the element below it ST(1), the next ST(2), and so on. Since the stack holds eight entries, a full stack can be accessed as ST(0) through ST(7), as shown below:

(Figure 2: Position of the stack top as successive values are loaded)

The figure shows how the floating-point registers are organized and used. Note that instructions cannot address the physical registers R0 through R7 directly; only ST(0) through ST(7) can be named. This is covered in more detail with the floating-point instructions.

You may wonder about one case in the figure: once eight values have been loaded and ST(0) sits in R0, what happens when another value is loaded?

When the registers already hold eight values and another is loaded, the outcome depends on whether the corresponding mask bit in the control register is set. If the mask bit is clear, an exception is raised, much like an interrupt, and is handled by the operating system. If the mask bit is set, the CPU simply overwrites the old value with an indefinite value, as shown below:

(Figure 3: State of the floating-point registers after more than eight values are loaded)

In effect, the floating-point registers form a circular stack. When ST(0) is at R0 and another value is loaded, ST(0) wraps around to R7, but the value placed in ST(0) is an indefinite value, because the CPU treats this overflow as an error.
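
The circular behaviour can be modelled in a few lines of C. This is a simplified sketch and not a description of the hardware implementation: the struct and function names are invented, values are held as `double` rather than 80-bit numbers, an indefinite value is represented by NaN, and the overflow exception is assumed to be masked.

```c
#include <math.h>   /* NAN macro */

#define X87_REGS 8

/* Simplified model of the register stack. */
struct x87_stack {
    double reg[X87_REGS];    /* physical registers R0..R7 */
    int    used[X87_REGS];   /* 0 = empty, 1 = occupied (tag word) */
    int    top;              /* physical index of ST(0) */
    int    overflowed;       /* set once a push hits an occupied slot */
};

void x87_init(struct x87_stack *s)
{
    for (int i = 0; i < X87_REGS; i++) {
        s->reg[i] = 0.0;
        s->used[i] = 0;
    }
    s->top = 0;              /* TOP is 0 after initialization */
    s->overflowed = 0;
}

/* Push: TOP is decremented modulo 8 first, then the slot is written.
 * If the slot is still occupied, the masked response is modelled by
 * storing an indefinite value instead of the operand. */
void x87_push(struct x87_stack *s, double value)
{
    s->top = (s->top + X87_REGS - 1) % X87_REGS;
    if (s->used[s->top]) {
        s->overflowed = 1;
        s->reg[s->top] = NAN;
    } else {
        s->reg[s->top] = value;
    }
    s->used[s->top] = 1;
}

/* ST(i) is the register i places above TOP, wrapping around. */
double x87_st(const struct x87_stack *s, int i)
{
    return s->reg[(s->top + i) % X87_REGS];
}

/* Pop: return ST(0), mark its slot empty, increment TOP modulo 8. */
double x87_pop(struct x87_stack *s)
{
    double value = s->reg[s->top];
    s->used[s->top] = 0;
    s->top = (s->top + 1) % X87_REGS;
    return value;
}
```

In this model, pushing 1 through 8 leaves ST(0) in R0 holding 8. A ninth push moves ST(0) to R7 and replaces the value stored there (the 1 pushed first) with the indefinite marker.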

Is this really what happens? After the floating-point instructions are introduced below, a short experiment will verify it.

III. How the floating-point instructions use the floating-point registers

Section II explained that the Intel CPU organizes its eight floating-point registers as a circular stack whose top is ST(0). Accordingly, a large part of the floating-point instructions operate only on the top of the stack, and most of them come in two versions: one that pops the stack and one that does not. Consider this store instruction:

fsts 0x12345678

This instruction does not pop. It stores the value at the top of the stack, ST(0), to memory address 0x12345678. The final letter s in fsts means the value is stored as a single-precision number, so only four bytes are written, starting at 0x12345678. Producing those four bytes involves converting the 80-bit value in ST(0) to single precision (float), which is described in detail below. After the instruction executes, nothing is popped and the value in ST(0) is preserved. Here is the popping version of the same instruction:

fstps 0x12345678

It does almost exactly the same thing; the only difference is that it pops the stack, which is what the letter p in fstps indicates. After it executes, the old content of ST(0) is gone and the old content of ST(1) becomes ST(0). Stack behaviour should be familiar to everyone, so it is not described further here; if it is unclear, consult any data structures textbook.

This article aims to explain the basic principles of floating-point processing on the Intel CPU, not the instruction set, so it does not cover the many floating-point instructions. Only the instructions actually used are explained briefly. For the complete instruction set, see reference 1.

We close this section with an example that draws on the previous section and this one and verifies that the descriptions above are correct.

Enter the following code on Linux:

/* ------------------------------------------------------------ */
#include <stdio.h>

void f(int x[])
{
    int f[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};

    /* -------------------------- A -------------------------- */
    __asm__ __volatile__("fildl %0" :: "m"(f[0]));
    __asm__ __volatile__("fildl %0" :: "m"(f[1]));
    __asm__ __volatile__("fildl %0" :: "m"(f[2]));
    __asm__ __volatile__("fildl %0" :: "m"(f[3]));
    __asm__ __volatile__("fildl %0" :: "m"(f[4]));
    __asm__ __volatile__("fildl %0" :: "m"(f[5]));
    __asm__ __volatile__("fildl %0" :: "m"(f[6]));
    __asm__ __volatile__("fildl %0" :: "m"(f[7]));
    // __asm__ __volatile__("fildl %0" :: "m"(f[8]));   (*)
    // __asm__ __volatile__("fst %st(3)");              (**)
    /* -------------------------- B -------------------------- */
    /* fistpl writes to memory, so x[i] is an output operand */
    __asm__ __volatile__("fistpl %0" : "=m"(x[0]));
    __asm__ __volatile__("fistpl %0" : "=m"(x[1]));
    __asm__ __volatile__("fistpl %0" : "=m"(x[2]));
    __asm__ __volatile__("fistpl %0" : "=m"(x[3]));
    __asm__ __volatile__("fistpl %0" : "=m"(x[4]));
    __asm__ __volatile__("fistpl %0" : "=m"(x[5]));
    __asm__ __volatile__("fistpl %0" : "=m"(x[6]));
    __asm__ __volatile__("fistpl %0" : "=m"(x[7]));
}

int main(void)
{
    int x[8], j;

    f(x);
    for (j = 0; j < 8; j++)
        printf("%d\n", x[j]);
    return 0;
}

The code uses inline assembly. Part A pushes the integers of an array onto the floating-point register stack (the fildl instruction loads an integer onto the stack). Part B then pops the values from the floating-point registers into another array (the fistpl instruction stores the top of the stack as an integer to the given memory location; the letter p marks it as a popping instruction, so each store also pops the stack). The program pushes only the eight values f[0] through f[7], in the order 1, 2, 3, 4, 5, 6, 7, 8, so they should come out in the order 8, 7, 6, 5, 4, 3, 2, 1. Compile and run the program on Linux and you will get exactly this result.

Next, remove the comment marker "//" from the statement marked (*). Now nine values, f[0] through f[8], are pushed, which overflows the stack. According to the description above, ST(0) then moves from R0 to R7 and receives an indefinite value. Is that what actually happens? Compile and run the program on Linux and compare the result with Figure 3. Keep in mind that the values are always stored in the order ST(0), ST(1), ..., ST(7).

Finally, also remove the comment marker from the statement marked (**). The instruction fst %st(3) copies the content of ST(0) into ST(3); since its name contains no p, it does not pop the stack. Again, compile and run the program on Linux and check the result against Figure 3 to verify the description above.

IV. Floating-point formats

Floating-point numbers in both the C language and the CPU follow the IEEE 754 standard, which we now discuss in detail.

The IEEE standard represents a number as $V = (-1)^{S} \times M \times 2^{E}$, where S is 0 when V is positive and 1 when V is negative. In a program this is obviously a sign bit, so S occupies 1 bit, while the number of bits used for E and M depends on the data type. For single precision (float), E occupies 8 bits (e = 8) and M occupies 23 bits (m = 23); for double precision (double), E occupies 11 bits (e = 11) and M occupies 52 bits (m = 52), as shown below:

(Figure 4: IEEE 754 floating-point formats)

Note that the stored fields are written S', E', M' rather than S, E, M, because the stored fields and the values are related by a conversion, described below.

S always equals S', but E and M are different. When E' = 0, $E = 2 - 2^{e-1}$ (that is, -126 for float and -1022 for double) and M = 0.M'. As an example, which floating-point number does 0x00 50 00 00 represent? Expand it in binary:

0000 0000 0101 0000 0000 0000 0000 0000

The first bit is the sign bit S', so S = S' = 0. The next 8 bits are E'; since E' = 0, the formula above gives E = -126. The remaining 23 bits are M', so M = 0.101 0000 0000 0000 0000 0000 = 0.625. The value is therefore $V = 0.625 \times 2^{-126} \approx 7.346840 \times 10^{-39}$. Is that really the case? The following program checks it:

#include <stdio.h>

union u {
    float f;
    struct {
        unsigned char x1;   /* lowest address: least significant byte on IA32 */
        unsigned char x2;
        unsigned char x3;
        unsigned char x4;   /* most significant byte */
    };
} u;

int main()
{
    u.x1 = 0x00;
    u.x2 = 0x00;
    u.x3 = 0x50;
    u.x4 = 0x00;
    printf("%e\n", u.f);
    return 0;
}

The program is simple; compile and run it on Linux to check the calculation above.

That covers E' = 0. What about the case where E' is neither 0 nor all ones (that is, $0 < E' < 2^{e} - 1$)? Then $E = E' - (2^{e-1} - 1)$ (the bias is 127 for float and 1023 for double) and M = 1.M'. As an example, which floating-point number does 0xBF 20 00 00 represent? Again expand it in binary:

1011 1111 0010 0000 0000 0000 0000 0000

Following the rules:

S = S' = 1, E' = 0111 1110 = 126, so E = 126 - 127 = -1, and M = 1.M' = 1.010 0000 0000 0000 0000 0000 = 1.25

so

$V = (-1)^{1} \times 1.25 \times 2^{-1} = -0.625$

We verify this with a program as well, but this time the other way round: store -0.625 directly and print its bytes:

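#include <stdio.h>

union u {
    float f;
    struct {
        unsigned char x1;   /* least significant byte on IA32 */
        unsigned char x2;
        unsigned char x3;
        unsigned char x4;   /* most significant byte */
    };
} u;

int main()
{
    u.f = -0.625;
    /* print from the most significant byte down */
    printf("%2x %2x %2x %2x\n", u.x4, u.x3, u.x2, u.x1);
    return 0;
}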

After compiling and running it, the output is bf 20 0 0 (the bytes BF 20 00 00), exactly as the analysis predicts.

This example also makes one point easy to see: in the IEEE formats, negative numbers are not represented in two's complement.

Only one case remains: E' is all ones ($E' = 2^{e} - 1$). If M' = 0, the number represents infinity (positive infinity when S = 0, negative infinity when S = 1). If M' is not 0, the number is NaN (not a number). For example, computing the square root of -1 yields NaN, meaning that the result cannot be represented as a number.
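
The three cases (E' = 0, E' all ones, and everything in between) can be collected into one small decoder for the single-precision format. This is a sketch for illustration, with invented function names. It assumes that `float` on the target is the IEEE 754 single format, and it scales by powers of two with a simple loop to stay independent of the math library.

```c
#include <math.h>     /* INFINITY and NAN macros only */
#include <stdint.h>
#include <string.h>

/* Multiplies v by 2^e using repeated doubling or halving. */
static double scale_pow2(double v, int e)
{
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v /= 2.0;
    return v;
}

/* Decodes a 32-bit pattern by the rules V = (-1)^S * M * 2^E. */
double decode_float_bits(uint32_t bits)
{
    int      s_field = (int)(bits >> 31);           /* S' */
    int      e_field = (int)((bits >> 23) & 0xFF);  /* E' */
    uint32_t m_field = bits & 0x7FFFFF;             /* M' */
    double   sign = s_field ? -1.0 : 1.0;
    double   frac = m_field / 8388608.0;            /* 0.M', 2^23 = 8388608 */

    if (e_field == 0xFF)                 /* E' all ones: infinity or NaN */
        return m_field ? NAN : sign * INFINITY;
    if (e_field == 0)                    /* E' = 0: E = -126, M = 0.M' */
        return sign * scale_pow2(frac, -126);
    return sign * scale_pow2(1.0 + frac, e_field - 127);   /* M = 1.M' */
}

/* Lets the hardware interpret the same bits, for cross-checking. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Feeding it the two patterns worked through above, `decode_float_bits(0x00500000u)` yields 0.625 × 2^-126 and `decode_float_bits(0xBF200000u)` yields -0.625, and both agree with `float_from_bits` on an IEEE 754 machine.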

The formats above are the ones defined by IEEE for single and double precision. Inside the Intel CPU, floating-point numbers are held in the extended-precision format, which has a 1-bit sign, a 15-bit exponent (e = 15), and a 64-bit significand made up of one explicit integer bit followed by 63 fraction bits (m = 63). Together that is exactly 80 bits, the width of a floating-point register in the Intel CPU. The rules for determining E and M remain compatible with the IEEE standard, so the method described in this section also applies to the 80-bit extended format of the Intel CPU, except that the leading bit of M is stored explicitly instead of being implied.
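
To illustrate the 80-bit layout, the following sketch decodes ten bytes as they would appear in little-endian memory. The function names are invented, only zero and normal numbers are handled, and the result is returned as a `double`, so significands that need more than 53 bits are rounded. The exponent bias of 16383 follows from the 15-bit exponent field by the same rule as for the other formats.

```c
#include <stdint.h>

/* Multiplies v by 2^e using repeated doubling or halving. */
static double scale_pow2(double v, int e)
{
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v /= 2.0;
    return v;
}

/* Decodes a little-endian 80-bit x87 value (normal numbers and zero
 * only; infinities, NaNs and denormals are deliberately not handled). */
double decode_x87_extended(const unsigned char b[10])
{
    uint64_t significand = 0;
    for (int i = 7; i >= 0; i--)            /* bytes 0..7: significand */
        significand = (significand << 8) | b[i];

    unsigned sign_exp  = (unsigned)b[8] | ((unsigned)b[9] << 8);
    int      sign      = (int)((sign_exp >> 15) & 1);
    int      exp_field = (int)(sign_exp & 0x7FFF);

    if (significand == 0)
        return sign ? -0.0 : 0.0;

    /* Bit 63 is the explicit integer bit, so the significand is an
     * integer scaled by 2^-63; the exponent bias is 16383. */
    double magnitude = scale_pow2((double)significand,
                                  exp_field - 16383 - 63);
    return sign ? -magnitude : magnitude;
}
```

The value 1.0, for instance, is stored as exponent field 0x3FFF with significand 0x8000000000000000: the explicit integer bit is set and all 63 fraction bits are zero.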

V. Floating-point rounding and its effect on programs

Because several floating-point formats exist, values must be rounded when they are converted from a higher-precision format to a lower-precision one, and this rounding produces some very interesting errors in programs. This section discusses the issue and finally resolves the problem raised in the foreword.

First consider conversion from a lower-precision format to a higher-precision one. It loses no precision, so it is safe: the higher-precision format has more bits for both fields and can copy the corresponding bits of the lower-precision value without losing any information. Conversion from higher to lower precision, however, can lose information, because the lower-precision format has fewer bits and cannot hold everything.

Now look at the largest and smallest positive numbers of each format.

The smallest positive number has E' = 0 and M' = 000...01. From the rules above, the smallest positive single-precision number (float) is $2^{-23} \times 2^{-126} = 2^{-149} \approx 1.4 \times 10^{-45}$, and the smallest positive double-precision number (double) is $2^{-52} \times 2^{-1022} = 2^{-1074} \approx 4.9 \times 10^{-324}$. The largest positive number has E' = 111...10 and M' = 111...1, so the largest single-precision number is $(2 - 2^{-23}) \times 2^{127} \approx 3.4 \times 10^{38}$ and the largest double-precision number is $(2 - 2^{-52}) \times 2^{1023} \approx 1.8 \times 10^{308}$. Clearly the range of double precision is far larger than that of single precision.

When a double-precision value exceeds the largest number single precision can represent, the CPU converts it to single-precision infinity: E' is all ones and M' = 0 (see above).

When a double-precision value is smaller than the smallest positive number single precision can represent, the CPU converts it to single-precision 0: both E' and M' are 0. Note that M has only two possible forms, M = 1.M' and M = 0.M', and that with M = 0.M' the exponent E can only take one value (-126 for float, -1022 for double). As a result, every nonzero number has exactly one representation, and this uniqueness makes the numbers very convenient for a computer to process.

For example, a value of type double has only one possible representation as a float, so by looking at the float we can tell whether the original value was outside the range of float. For the computer, this greatly simplifies conversion between precisions, because every conversion has a unique result.
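
The limits of the single-precision format and the overflow and underflow behaviour described above can be checked with a short sketch. The function names are invented for this illustration. The code assumes IEEE 754 floats and the default rounding mode. Strictly speaking, the C standard leaves the conversion of an out-of-range double to float undefined; the infinity result is what IEEE 754 hardware produces.

```c
#include <float.h>
#include <math.h>     /* INFINITY macro */
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit pattern as a float (assumes IEEE 754 floats). */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Smallest positive float: E' = 0, M' = 000...01. */
float smallest_positive_float(void)
{
    return float_from_bits(0x00000001u);
}

/* Largest finite float: E' = 111...10, M' = 111...1. */
float largest_finite_float(void)
{
    return float_from_bits(0x7F7FFFFFu);
}

/* Converts a double to float through a volatile store so that the
 * narrowing conversion really takes place. */
float narrow_to_float(double d)
{
    volatile float f = (float)d;
    return f;
}
```

On such a machine `largest_finite_float()` equals `FLT_MAX` from `<float.h>`, `narrow_to_float(1e39)` gives infinity, and `narrow_to_float(1e-50)` gives 0.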

In the CPU, every floating-point value is converted when it is loaded into a floating-point register: single-precision, double-precision, and integer values are all converted to the 80-bit extended format. A conversion also takes place when a value is stored from a floating-point register to memory, this time from extended precision to the precision of the destination. The CPU hardware performs these conversions automatically. But precisely because values are converted from extended precision down to lower-precision formats, some strange problems appear in our programs. Consider this code:

#include <stdio.h>

int main()
{
    float f = 3.25 + 1e10;   /* rounded to float when stored */
    f = f - 1e10;
    printf("%f\n", f);
    return 0;
}

This code is a typical example of precision loss. Let us analyze it carefully.

In binary, 3.25 + 1e10 is:

1001 0101 0000 0010 1111 1001 0000 0000 11.01

Since M can only take the form 0.M' or 1.M', the 0.M' form would be:

0.1001 0101 0000 0010 1111 1001 0000 0000 1101 (with E = 34)

and the 1.M' form would be:

1.001 0101 0000 0010 1111 1001 0000 0000 1101 (with E = 33)

Now convert the number to single precision. In the IEEE single-precision format the 0.M' form is allowed only with E = -126, whereas here E = 34, so the number must use the 1.M' form. That gives M' = 001 0101 0000 0010 1111 1001 0000 0000 1101. But IEEE single precision allows only 23 bits for M', so the last 12 bits, 0000 0000 1101, are cut off, and M' actually becomes:

M' = 001 0101 0000 0010 1111 1001

In other words, the 3.25 in the original number is lost. After f = 3.25 + 1e10, f equals 1e10, so the program prints 0.000000.
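
Another way to see why the 3.25 disappears is to look at the gap between neighbouring float values near 1e10. The sketch below is an illustration with invented function names. It assumes IEEE 754 single precision and the default rounding mode, and it steps to the next float by incrementing the bit pattern, which is valid for positive finite values.

```c
#include <stdint.h>
#include <string.h>

/* Returns the next representable float above a positive finite float
 * by incrementing its bit pattern. */
float next_float_above(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 1;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Adds two doubles and rounds the sum to float via a volatile store. */
float add_then_store_as_float(double a, double b)
{
    volatile float f = (float)(a + b);
    return f;
}
```

With 23 fraction bits and E = 33, neighbouring floats near 1e10 are 2^(33-23) = 1024 apart, so `next_float_above(1e10f) - 1e10f` is 1024. Adding 3.25 moves the sum far less than half that gap, and `add_then_store_as_float(3.25, 1e10)` returns exactly 1e10.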

The truncation described above has a technical name: rounding. IEEE defines four rounding rules: "round to even", "round toward zero", "round down", and "round up". The rule in use is selected in the control register of the floating-point unit (see Section I); the default is "round to even". Let us look at each rule. Start with "round to even". A number of the form x.yyyy0... is rounded to x.yyyy. A number of the form x.yyyy1..., where the discarded bits after the 1 are not all zero, is rounded to x.yyyy + 0.0001. If the number is exactly x.yyyy100...00, the result depends on the last kept bit: if the last y is 1, the number is rounded to x.yyyy + 0.0001; if the last y is 0, it is rounded to x.yyyy. For example, 1.0110 100 rounds to 1.0110, while 1.0111 100 rounds to 1.0111 + 0.0001 = 1.1000.
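
The round-to-even rule can be written out on plain integers, treating the significand as a bit string from which the low bits are dropped. This is a sketch with invented function names that models only the rule itself, not the hardware. Plain truncation is included for comparison.

```c
#include <stdint.h>

/* Drops the low `drop` bits of `v` using round-to-nearest, ties to
 * even. Requires 0 < drop < 64. */
uint64_t round_nearest_even(uint64_t v, int drop)
{
    uint64_t kept = v >> drop;
    uint64_t rest = v & ((UINT64_C(1) << drop) - 1);
    uint64_t half = UINT64_C(1) << (drop - 1);

    if (rest > half)                 /* x.yyyy1... with more bits set */
        return kept + 1;
    if (rest == half && (kept & 1))  /* exact tie, last kept bit is 1 */
        return kept + 1;
    return kept;                     /* below half, or tie with last bit 0 */
}

/* Plain truncation of a non-negative value, for comparison. */
uint64_t truncate_bits(uint64_t v, int drop)
{
    return v >> drop;
}
```

Taking the two examples from the text as bit strings, `round_nearest_even(0xB4, 3)` (1.0110 100) gives 0x16 (1.0110) and `round_nearest_even(0xBC, 3)` (1.0111 100) gives 0x18 (1.1000). For the 36-bit significand of 3.25 + 1e10, dropping 12 bits gives the same result as truncation, because the discarded bits 0000 0000 1101 are below the halfway point.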

"Round toward zero" is simple: the absolute value of the rounded number must not exceed the absolute value of the original number.

"Round down" requires that the rounded number not exceed the original number.

"Round up" requires that the rounded number not be less than the original number.

The precision loss in the example above is fairly easy to see. Things are not always so obvious, though. Let us now dissect the program from the foreword.

First, recall the question:

Consider the following program:

/* ------------ a.c ------------ */
#include <stdio.h>

double f(int x)
{
    return 1.0 / x;
}

void main()
{
    double a, b;
    int i;

    a = f(10);
    b = f(10);
    i = a == b;
    printf("%d\n", i);
}

Compiled with gcc -O2 a.c, this program prints 0 when run; that is, a is not equal to b. Why?

Now look at the following program, which is almost identical:

/* ------------ b.c ------------ */
#include <stdio.h>

double f(int x)
{
    return 1.0 / x;
}

void main()
{
    double a, b, c;
    int i;

    a = f(10);
    b = f(10);
    c = f(10);
    i = a == b;
    printf("%d\n", i);
}

Compiled the same way with gcc -O2 b.c, this program prints 1; that is, a equals b. Why?

We now disassemble the first program, a.c. Here is the code of f:

08048328 <f>:
 8048328: 55          push %ebp
 8048329: 89 e5       mov %esp,%ebp
 804832b: d9 e8       fld1
 804832d: da 75 08    fidivl 0x8(%ebp)    // compute 1.0 / x and leave the result in ST(0)
 8048330: c9          leave
 8048331: c3          ret
 8048332: 89 f6       mov %esi,%esi

The only instruction that needs explaining is fidivl: it divides the number in ST(0) by the integer stored at the given memory address and leaves the result in ST(0).

Note this fact: the result of the calculation is now in ST(0), and it is an 80-bit value. Now look at the code of main:

08048334 <main>:
 8048334: 55                    push %ebp
 8048335: 89 e5                 mov %esp,%ebp
 8048337: 83 ec 08              sub $0x8,%esp
 804833a: 83 e4 f0              and $0xfffffff0,%esp
 804833d: 83 ec 0c              sub $0xc,%esp
 8048340: 6a 0a                 push $0xa
 8048342: e8 e1 ff ff ff        call 8048328 <f>
 8048347: dd 5d f8              fstpl 0xfffffff8(%ebp)
 804834a: c7 04 24 0a 00 00 00  movl $0xa,(%esp,1)
 8048351: e8 d2 ff ff ff        call 8048328 <f>
 8048356: dd 45 f8              fldl 0xfffffff8(%ebp)
 8048359: 58                    pop %eax
 804835a: da e9                 fucompp
 804835c: df e0                 fnstsw %ax
 804835e: 80 e4 45              and $0x45,%ah
 8048361: 80 fc 40              cmp $0x40,%ah
 8048364: 0f 94 c0              sete %al
 8048367: 5a                    pop %edx
 8048368: 0f b6 c0              movzbl %al,%eax
 804836b: 50                    push %eax
 804836c: 68 d8 83 04 08        push $0x80483d8
 8048371: e8 f2 fe ff ff        call 8048268 <_init+0x38>
 8048376: c9                    leave
 8048377: c3                    ret

The listing is long, but only a small part of it matters here:

 // compute f(10); the result is now in ST(0)
 8048342: e8 e1 ff ff ff        call 8048328 <f>
 // store the result back to memory as a
 8048347: dd 5d f8              fstpl 0xfffffff8(%ebp)
 804834a: c7 04 24 0a 00 00 00  movl $0xa,(%esp,1)
 // compute f(10) for b = f(10); the result is now in ST(0)
 8048351: e8 d2 ff ff ff        call 8048328 <f>
 // load a directly from memory: now ST(0) = a
 // and ST(1) = the value just computed for b
 8048356: dd 45 f8              fldl 0xfffffff8(%ebp)
 8048359: 58                    pop %eax
 // compare ST(0) with ST(1)
 804835a: da e9                 fucompp
 804835c: df e0                 fnstsw %ax

The problem is now visible. The program first computes a = f(10) and stores the result back to memory. Because 0.1 cannot be represented exactly in binary, converting it from the 80-bit extended format to the 64-bit double format loses precision. The program then computes b = f(10), but this value is not stored back to memory. Instead, GCC loads a from memory into ST(0) and compares it with the freshly computed b. Since b was never stored to memory, it has lost no precision, while a has, so a and b are not equal.
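
The effect of the store can be reproduced at the C level, without relying on a particular optimization, by holding the quotient once in a long double and once in a double. This is a sketch with invented function names. It assumes a target where long double is wider than double (as with the 80-bit format on IA32); where the two types are the same width, the function always returns 0.

```c
#include <float.h>

/* Reports whether long double carries more significand bits than
 * double on this target. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Returns 1 if storing 1/x as a double changes the value that the
 * wider long double computation produced, 0 otherwise. */
int store_changes_value(int x)
{
    volatile long double in_register = 1.0L / x;      /* full precision */
    volatile double in_memory = (double)in_register;  /* rounded store  */
    return (long double)in_memory != in_register;
}
```

For x = 10 the stored copy differs from the full-precision value, which is exactly the situation of a and b in a.c. For x = 2 the quotient 0.5 is exact in every format, so nothing is lost.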

Now we disassemble the second program; only the relevant part is shown:

 // compute a
 8048342: e8 e1 ff ff ff        call 8048328 <f>
 // store a back to memory; a loses precision
 8048347: dd 5d f8              fstpl 0xfffffff8(%ebp)
 804834a: c7 04 24 0a 00 00 00  movl $0xa,(%esp,1)
 // compute b
 8048351: e8 d2 ff ff ff        call 8048328 <f>
 // store b back to memory; b loses precision
 8048356: dd 5d f0              fstpl 0xfffffff0(%ebp)
 8048359: c7 04 24 0a 00 00 00  movl $0xa,(%esp,1)
 // compute c
 8048360: e8 c3 ff ff ff        call 8048328 <f>
 8048365: dd d8                 fstp %st(0)
 // load a from memory
 8048367: dd 45 f8              fldl 0xfffffff8(%ebp)
 // load b from memory
 804836a: dd 45 f0              fldl 0xfffffff0(%ebp)
 804836d: d9 c9                 fxch %st(1)
 804836f: 58                    pop %eax
 // compare a and b
 8048370: da e9                 fucompp
 8048372: df e0                 fnstsw %ax

The listing shows that GCC stores both a and b back to memory after computing them, so both lose precision in the same way and their values are identical. When they are later loaded into the floating-point registers and compared, the result is a == b. The reason is the extra calculation c = f(10) in the program, which makes GCC store the previously computed b back to memory.

By now the problem should be clear. Incidentally, GCC provides the type long double, which corresponds to the 80-bit extended precision of the Intel CPU, that is, 10 bytes of data.
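
One way to make such a comparison behave the same in both programs is to force both operands through memory before comparing them, so that each is rounded to the 64-bit double format. The sketch below shows the idea with invented function names; it is one possible workaround and not something the original programs do. GCC's -ffloat-store option serves a similar purpose for a whole compilation unit.

```c
/* Returns 1 if a and b are equal after both have been rounded to the
 * double format by a store to memory. The volatile qualifiers force
 * the stores, so neither operand keeps extra register precision. */
int equal_as_stored_doubles(double a, double b)
{
    volatile double stored_a = a;
    volatile double stored_b = b;
    return stored_a == stored_b;
}

/* Same computation as f in a.c. */
double reciprocal(int x)
{
    return 1.0 / x;
}
```

With this helper, `equal_as_stored_doubles(reciprocal(10), reciprocal(10))` returns 1 whether or not the compiler kept one of the results in a register.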

VI. Summary

It is said that the "Nine Yin Manual" (Jiuyin Zhenjing) ends with a summary of the whole book. I will borrow the idea and summarize what came before.

This article has given a fairly complete description of the basic mechanism by which the Intel CPU processes floating-point numbers. I hope it helps readers understand the CPU and is of some use for floating-point programming in C. To finish, I would like to give my view on a few vague statements that are often heard.

First, "floating-point numbers cannot be compared for equality." This is inaccurate. In principle floating-point numbers can be compared perfectly well, and the CPU provides instructions for doing so; but because of rounding, comparing them for equality is unsafe. In the programs analyzed above, both compare f(10) with f(10), yet the comparison fails in the first program and succeeds in the second, purely because of rounding. If we understand the problem at this level, we can decide with confidence where a program may compare two floating-point numbers for equality. One more remark on these programs: the results above appear only when GCC optimization is turned on, because without optimization GCC stores every result back to memory as soon as it has been computed. The most important requirement of program optimization is that it must not change the results of the program, but unfortunately GCC does not meet it here.

Second, books often say that "all operations involving floating-point numbers are converted to double before they are carried out." The analysis above shows that this is not accurate either. From the hardware point of view, the Intel CPU by default converts all operands to extended precision (the long double type in GCC) before operating on them. From the point of view of the language, C is very close to the hardware: its end product is code that the hardware can execute, and how the hardware computes is decided by the hardware. C cannot interfere with what happens inside the CPU; on the contrary, C must adapt to the rules of the hardware. So it cannot be said that the C language itself converts all floating-point operations to some particular precision. Which precision is used is decided by the CPU, and the conversion is carried out by the CPU.

Third, because the result of a floating-point operation is rounded when it is stored to memory, errors are likely, so we should use high-precision data types wherever possible. With GCC, for example, prefer long double to double and float, which considerably reduces the chance of error. Do not assume this costs performance; the main drawback of the wider types is that they take more space, and space is rarely the primary concern today.

VII. Recommended reading

C is a language close to the hardware. To become truly proficient in it, you need a solid understanding of the hardware structure and the instruction set, for which see reference 1.

If reference 1 seems too complicated, see reference 2. It also analyzes many problems that can arise in programming, in depth and in detail, and describes some details very clearly; it has the quality of a bible.

If you are not yet very familiar with the C language, see reference 3. Its description of C is fairly accessible, and its chapter "Common mistakes in C programming and their solutions" is especially useful for beginners.

References:

1. "IA-32 Intel Architecture Software Developer's Manual", Vol. 1 and Vol. 2, Intel Corp.

2. "Computer Systems: A Programmer's Perspective" (Randal E. Bryant, David R. O'Hallaron), Electronics Industry Press (photocopied edition)

3. "C language university practical tutorial" (Su Xiaohong, Chen Huipeng, Sun Zhigang), Electronics Industry Press

When reprinting, please credit the original address: https://www.9cbs.com/read-118835.html
