Optimizing Code with Visual C++ .NET 2003 (translation)

Optimizing Code with Visual C++

Mark Lacey

Microsoft Corporation

April 2003

Translation: cnss

Summary: This article describes the code optimization features of Visual C++ .NET 2003. Because some readers may not be familiar with the optimizations in VC.NET 2002, it also briefly covers "Whole Program Optimization". Finally, it uses a few examples to illustrate and discuss how VC.NET optimizes code.

This article applies to: Visual C++ .NET 2003

-------------------------------------------------------------------

Foreword

When people pick up a new programming tool, they are often unsure how to make the best use of it. This article tries to give you a more intuitive feel for how VC optimizes code. I hope that after reading it you can "get" more out of VC.

Visual C++ .NET 2003

VC.NET 2003 not only adds two new optimization options but also improves several optimizations that were already in VC.NET 2002.

The first new option is "/G7", which tells the compiler to optimize for the Intel Pentium 4 and AMD Athlon processors.

Compared with code generated by VC.NET 2002, code compiled with "/G7" usually runs 5 to 10 percent faster for a typical program, and 10 to 15 percent faster for programs that contain a lot of floating-point code. The improvement varies from program to program; in some tests using the latest CPUs and the "/G7" option, performance even increased by 20 percent.

Using "/G7" does not mean that the generated code runs only on the Intel Pentium 4 and AMD Athlon processors. The code still runs on older CPUs, although there may be a small performance penalty. In addition, we observed that some programs ran slower on the AMD Athlon when compiled with "/G7" than without it.

If you do not specify any "/Gx" option, the compiler uses "/GB" by default, which selects the "blended" optimization mode. In both VC.NET 2002 and VC.NET 2003, "/GB" is equivalent to "/G6", which optimizes for the Intel Pentium Pro, Pentium II, and Pentium III processors.

Here is an example that shows the effect of "/G7" on a Pentium 4 for a common integer multiplication. The source code is:

int i;
...
// Do something that assigns a value to i.
...
return i * 15;

With "/G6", the following object code is generated:

mov eax, DWORD PTR _i$[esp-4]
imul eax, 15

With "/G7", the compiler generates a faster (though longer) sequence that avoids the imul (multiply) instruction, which has a latency of 14 cycles on the Pentium 4. The object code is as follows:

mov ecx, DWORD PTR _i$[esp-4]
mov eax, ecx
shl eax, 4
sub eax, ecx
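
The same strength reduction can be written by hand in C++ to see why the two sequences are equivalent. This is only an illustrative sketch: the function names times15_mul and times15_shift are made up for this example, unsigned arithmetic is used so that the shift is well defined for every input, and a compiler that provides <cstdint> is assumed.

```cpp
#include <cstdint>

// Straightforward form: a compiler may emit a single imul for this.
std::uint32_t times15_mul(std::uint32_t i) {
    return i * 15u;
}

// Hand-written equivalent of the shift/subtract sequence: i * 16 - i.
//   shl eax, 4    corresponds to  i << 4
//   sub eax, ecx  corresponds to  subtracting the original value
std::uint32_t times15_shift(std::uint32_t i) {
    return (i << 4) - i;
}
```

In practice there is no need to write the shift form yourself; the point of "/G7" is that the compiler chooses it when it is profitable on the target CPU.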

The second new option is "/arch:[argument]", where the argument is SSE or SSE2. It tells the compiler to generate code that uses the Streaming SIMD Extensions (SSE) or Streaming SIMD Extensions 2 (SSE2) instruction sets. With "/arch:SSE", the generated code runs only on CPUs that support the SSE instructions as well as CMOV, FCOMI, FCOMIP, FUCOMI, and FUCOMIP. With "/arch:SSE2", the generated code runs only on CPUs that support the SSE2 instruction set. Compared with "/G7" alone, optimizing for SSE or SSE2 generally reduces run time by 2 to 3 percent, and by as much as 5 percent in individual tests.

Using "/arch:SSE" has the following effects:

1. Single-precision floating-point operations are handled with SSE instructions.

2. The CMOV instruction is used; it was first supported by the Pentium Pro.

3. The FCOMI, FCOMIP, FUCOMI, and FUCOMIP instructions are used; they were also first supported by the Pentium Pro.

If you use "/arch:SSE2", you get all the effects of "/arch:SSE" plus the following:

1. Double-precision floating-point operations are handled with SSE2 instructions.

2. SSE2 instructions are used for 64-bit shifts.

There is one further benefit: when "/arch:SSE" or "/arch:SSE2" is combined with the "/GL" option, the compiler also adjusts the way floating-point parameters and floating-point return values are passed.

All of the optimizations above are included in VC.NET 2003. Another new optimization is the elimination of "dead parameters", that is, parameters that a function never uses. For example:

// m is a global int; a, b, c, and d are assumed to be declared elsewhere.
int
f1(int i, int j, int k)
{
    return i + k;
}

int
main()
{
    int n = a + b + c + d;
    m = f1(3, n, 4);
    return 0;
}

In function f1(), the second parameter is never used. With the "/GL" option, the compiler generates the following object code to call f1():

mov eax, 4
mov ecx, 3
call ?f1@@YAHHHH@Z
mov DWORD PTR ?m@@3HA, eax

In this example, the variable "n" is never computed, and because f1() uses only two of its parameters, only those two are passed (and they are passed in registers, which is faster than using the stack). Note that inlining was disabled when compiling this example; otherwise the call to f1() would disappear and the value 7 would be assigned to m directly.
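
Conceptually, the optimizer treats the program as if it had been written as shown below. This is a hand-written sketch of the transformation, not compiler output: f1_reduced and compute_m are hypothetical names, and the globals are given arbitrary values so that the example is self-contained.

```cpp
// Original function: j is never read.
int f1(int i, int j, int k) {
    (void)j;  // j is a dead parameter; the cast only silences warnings
    return i + k;
}

// What the caller effectively uses after dead-parameter elimination:
// the unused parameter is gone, so its argument need not be computed.
int f1_reduced(int i, int k) {
    return i + k;
}

// Arbitrary values standing in for the article's globals a, b, c, d.
int a = 1, b = 2, c = 3, d = 4;

int compute_m() {
    // int n = a + b + c + d;  // not needed: n only fed the dead parameter
    return f1_reduced(3, 4);
}
```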

Visual C++ .NET 2002

VC.NET 2002 introduced the concept of Whole Program Optimization (WPO), which is enabled with the "/GL" option. With whole program optimization, the compiler stores an intermediate representation of the code in the .obj file instead of object code, and the linker performs optimization and code generation at link time.

One of the main benefits of whole program optimization is inlining across source files, which can greatly improve program performance. Another benefit is that the compiler can track memory and register usage across function calls, which reduces the overhead of making a call. The following example shows whole program optimization at work:

// File 1
extern void func(int *, int *);

int g, h;

int
main()
{
    int i = 0;
    int j = 1;

    g = 5;
    h = 6;
    func(&i, &j);
    g = g + i;
    h = h + i;
    return 0;
}

// File 2
extern int g;
extern int h;

void
func(int *pi, int *pj)
{
    *pj = g;
    h = *pi;
}

Without the "/GL" option, the following code is generated:

sub esp, 8
lea eax, DWORD PTR _j$[esp+8]
push eax
lea ecx, DWORD PTR _i$[esp+12]
push ecx
mov DWORD PTR _i$[esp+16], 0
mov DWORD PTR _j$[esp+16], 1
mov DWORD PTR ?g@@3HA, 5
mov DWORD PTR ?h@@3HA, 6
call ?func@@YAXPAH0@Z
mov eax, DWORD PTR _i$[esp+16]
mov edx, DWORD PTR ?g@@3HA
mov ecx, DWORD PTR ?h@@3HA
add edx, eax
add ecx, eax
mov DWORD PTR ?g@@3HA, edx
mov DWORD PTR ?h@@3HA, ecx
xor eax, eax
add esp, 16
ret 0

With "/GL", the following, noticeably shorter code is generated. Note that inlining was disabled when compiling this example.

sub esp, 8
lea ecx, DWORD PTR _j$[esp+8]
lea edx, DWORD PTR _i$[esp+8]
mov DWORD PTR _i$[esp+8], 0
mov DWORD PTR ?g@@3HA, 5
mov DWORD PTR ?h@@3HA, 6
call ?func@@YAXPAH0@Z
mov DWORD PTR ?g@@3HA, 5
xor eax, eax
add esp, 8
ret 0
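
To see why the shorter code is still correct, it helps to put both files into one translation unit and follow the values by hand. The sketch below does that. run_example is a hypothetical wrapper that stands in for main() so the result can be inspected, and the additions are taken from the add instructions in the listing generated without "/GL".

```cpp
int g, h;

// Body of "file 2": reads g and *pi, writes *pj and h.
// It never writes g and never writes through pi.
void func(int *pi, int *pj) {
    *pj = g;
    h = *pi;
}

// Body of main() from "file 1", returning i so callers can check it.
int run_example() {
    int i = 0;
    int j = 1;
    g = 5;
    h = 6;
    func(&i, &j);
    // With whole-program knowledge of func(): i is still 0, g is still 5,
    // and h was set to *pi, which is 0. So g + i is 5 and h + i is 0.
    g = g + i;
    h = h + i;
    return i;
}
```

Because i is known to be 0 after the call, the update of h changes nothing and the update of g reduces to storing the constant 5, which matches the single mov after the call in the "/GL" listing.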

Typical optimization

The VC compiler has two main optimization options, "/O1" and "/O2". "/O1" means minimize size, and it makes the compiler use the following options:

1. /Og: global optimizations, such as keeping frequently used variables in registers or optimizing computations inside loops

2. /Os: favors the size of the program (EXE or DLL) over code speed

3. /Oy: omits the frame pointer to speed up function calls

4. /Ob2: lets the compiler inline any function that it judges worth inlining

5. /GF: enables string pooling

6. /Gy: enables function-level linking by telling the compiler to package each function separately

The "/O2" option means maximize speed. It is largely the same as "/O1" but replaces "/Os" with "/Ot" and adds "/Oi", which expands intrinsic functions inline. In general, optimize small programs for maximum speed and large programs for minimum size, because large code usually leads to slow loading, a poor cache hit rate, and frequent paging. With minimum-size optimization, the compiler no longer unrolls loops or chooses longer code sequences.

After selecting the main optimization option, it is a good idea to use a profiler to find the "hot spots" so that you can optimize different parts of the program differently. For example, if you build for minimum size and the profiler shows that a few functions are executed very frequently, you can optimize just those functions for speed.

The VC compiler can optimize individual functions!

For example, if you find that the fiddle() function is called frequently, you can optimize just that function for speed, like this:

#pragma optimize("t", on)

int fiddle(S *p)
{
    ...;
}

#pragma optimize("", on)
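
A slightly fuller sketch of the same pattern follows. The struct S and the body of fiddle() are invented for illustration (the article elides them), and the pragmas are wrapped in an _MSC_VER check so that other compilers simply skip them.

```cpp
// Hypothetical data type; the article does not define S.
struct S {
    int values[4];
};

#if defined(_MSC_VER)
#pragma optimize("t", on)   // favor fast code for the functions that follow
#endif

// Hypothetical hot function identified with the profiler.
int fiddle(S *p) {
    int sum = 0;
    for (int k = 0; k < 4; ++k) {
        sum += p->values[k];
    }
    return sum;
}

#if defined(_MSC_VER)
#pragma optimize("", on)    // restore the settings given on the command line
#endif
```

Everything outside the two pragmas keeps the optimization settings chosen on the command line, so the rest of the program can still be built for minimum size.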

In addition to "/O1" and "/O2" there is the "/Ox" option, whose effect is very similar to "/O2"; combining "/Ox" with "/Os" is the same as "/O1". We recommend using "/O1" and "/O2" rather than "/Ox".

So far we have discussed the "/G7", "/arch", and "/GL" optimization options.

In addition to these, VC provides two more:

1. /GA: optimizes access to static thread-local storage. (Do not use it in DLL projects; it is intended for EXEs only.)

2. /Gr: makes __fastcall the default calling convention, which means the first two arguments are passed in registers (if they fit).
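
The calling convention can also be chosen per function instead of globally with "/Gr". The sketch below marks one function explicitly. FASTCALL and add_scaled are names invented for this example, and the macro expands to nothing unless the code is built with Visual C++ for 32-bit x86, which is the only case this sketch assumes __fastcall applies to.

```cpp
// __fastcall is applied only for 32-bit x86 builds with Visual C++.
#if defined(_MSC_VER) && defined(_M_IX86)
#define FASTCALL __fastcall
#else
#define FASTCALL
#endif

// With __fastcall on x86, the first two arguments that fit in a register
// (here a and b) are passed in ECX and EDX; c is passed on the stack.
int FASTCALL add_scaled(int a, int b, int c) {
    return a + b * c;
}
```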

Another option is "/OPT:REF", which tells the linker to remove functions that are never called and data that is never used. The "/OPT:ICF" option merges identical functions (for example, functions that a template has instantiated several times in your program), which can also reduce the size of the program.
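
Templates are a common source of identical functions. In the sketch below (all names are made up for this example), the two instantiations of first_element do the same thing for two pointer types of the same size, so they are likely to compile to identical machine code, which is the kind of duplication "/OPT:ICF" is meant to fold.

```cpp
// One template, instantiated for two different pointer types.
template <typename T>
T *first_element(T **table) {
    return table[0];
}

struct Apple { int weight; };
struct Pear  { int weight; };

// Each wrapper forces a separate instantiation. Both only load one pointer,
// so the code generated for the two instantiations is typically the same.
Apple *first_apple(Apple **table) { return first_element(table); }
Pear  *first_pear(Pear **table)   { return first_element(table); }
```

Folding can give two distinct functions the same address, so code that compares function pointers for identity should not rely on them being different.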

Optimization Improvements in Visual C++ .NET

There are three more important options that you can use in VC.NET 2003 projects. Although VC.NET 2002 also provides them, VC.NET 2003 improves their performance.

The following table describes them briefly; for more detail, please refer to the documentation that ships with VC.

Option    Effect

/RTC1     For use in unoptimized debug builds. The compiler inserts run-time checks that help you find errors in the program, for example memory that you have not initialized, or a mix-up of __stdcall and __cdecl.

/GS       Adds code that detects overruns of static (stack) buffers, so that an attacker cannot overwrite a function's return address to run malicious code. (Windows XP SP2 is said to be compiled with this option.) Note: This does not mean you can stop worrying; you still have to pay attention to writing secure code!

/Wp64     Detects problems that affect 64-bit code generation, so that you can find code that might break when it is ported to a 64-bit environment.
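
A typical problem of the kind "/Wp64" is meant to flag is storing a pointer in a 32-bit integer. The sketch below shows a portable alternative; to_cookie and from_cookie are hypothetical names, and a compiler that provides std::uintptr_t in <cstdint> is assumed.

```cpp
#include <cstdint>

// Risky on 64-bit targets: unsigned int may be narrower than a pointer,
// so the upper bits of the address can be lost. Shown for contrast only.
//   unsigned int cookie = (unsigned int)ptr;

// Portable: std::uintptr_t can hold a void pointer and convert it back.
std::uintptr_t to_cookie(const void *ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr);
}

const void *from_cookie(std::uintptr_t cookie) {
    return reinterpret_cast<const void *>(cookie);
}
```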

Conclusion
