Mark Lacey
Microsoft Corporation
April 2003

Translation: CNSS

Summary: This article describes the code optimizations in Visual C++ .NET 2003. For readers who are unfamiliar with the optimizations in VC.NET 2002, it also briefly introduces Whole Program Optimization. Finally, it uses several examples to illustrate and discuss the optimization capabilities of VC.NET.

This article applies to: Visual C++ .NET 2003

-------------------------------------------------------------------

Foreword

People like to feel confident when using a new programming tool. This article tries to give you a more intuitive feel for code optimization in VC, and I hope that reading it helps you "get" more out of VC.

Visual C++ .NET 2003

VC.NET 2003 introduces two new optimization options and also improves some of the optimizations already present in VC.NET 2002.

The first new option is "/G7", which tells the compiler to optimize for the Intel Pentium 4 and AMD Athlon processors. Compared with the code generated by VC.NET 2002, code compiled with "/G7" usually runs 5 to 10 percent faster for typical programs, and 10 to 15 percent faster for programs that contain a lot of floating-point code. The improvement varies considerably; in some tests using the latest CPUs and the "/G7" option, performance even increased by 20%.

Using the "/G7" option does not mean that the generated code runs only on the Intel Pentium 4 and AMD Athlon processors. The code still runs on older CPUs, although there may be a small performance penalty. In addition, we observed that some programs compiled with "/G7" ran slower on the AMD Athlon than on the Intel Pentium 4.

When no "/Gx" option is specified, the compiler uses the "/GB" option by default, which is the "blended" optimization mode.
In VC.NET 2002 and VC.NET 2003, "/GB" is equivalent to "/G6", which optimizes for the Intel Pentium Pro, Pentium II, and Pentium III processors.

The following example shows the effect of "/G7" on the Pentium 4 for a common operation, integer multiplication by a constant. The source code is:

    int i;
    ...
    // Do something that assigns a value to i.
    ...
    return i * 15;

With "/G6", the generated target code is:

    mov  eax, DWORD PTR _i$[esp-4]
    imul eax, 15

With "/G7", the compiler generates faster (but, unfortunately, longer) code. It avoids the imul (multiplication) instruction, which takes 14 cycles on the Pentium 4.
The target code is as follows:

    mov ecx, DWORD PTR _i$[esp-4]
    mov eax, ecx
    shl eax, 4
    sub eax, ecx

The second new optimization option is "/arch:[argument]", where the argument is SSE or SSE2. It makes the compiler generate programs that use the Streaming SIMD Extensions (SSE) or Streaming SIMD Extensions 2 (SSE2) instruction sets. With "/arch:SSE", the target code runs only on CPUs that support the SSE instructions as well as CMOV, FCOMI, FCOMIP, FUCOMI, and FUCOMIP. With "/arch:SSE2", the target code runs only on CPUs that support the SSE2 instruction set. Compared with "/G7" alone, programs optimized for SSE or SSE2 generally run 2 to 3 percent faster, and in individual tests the run time dropped by as much as 5%.

Using "/arch:SSE" has the following effects:

1. Single-precision floating-point operations are handled with SSE instructions.
2. The CMOV instruction is used; it was first supported by the Pentium Pro.
3. The FCOMI, FCOMIP, FUCOMI, and FUCOMIP instructions are used; they were also first supported by the Pentium Pro.

Using "/arch:SSE2" gives all the effects of "/arch:SSE", plus the following:

1. Double-precision floating-point operations are handled with SSE2 instructions.
2. SSE2 instructions are used for 64-bit shifts.

When "/arch:SSE" or "/arch:SSE2" is combined with the "/GL" option, the compiler also optimizes the passing of floating-point parameters and floating-point return values.

The optimization features described above are all included in VC.NET 2003. In addition, VC.NET 2003 eliminates "dead parameters", that is, parameters that are never used. For example:

    int a, b, c, d, m;  // globals assumed by this example (not declared in the original listing)

    int f1(int i, int j, int k)
    {
        return i + k;
    }

    int main()
    {
        int n = a + b + c + d;
        m = f1(3, n, 4);
        return 0;
    }

In the function f1(), the second parameter is never used. With the "/GL" option, the compiler generates the following target code to call f1():

    mov  eax, 4
    mov  ecx, 3
    call ?f1@@YAHHHH@Z
    mov  DWORD PTR ?m@@3HA, eax

In this example, the variable "n" is never computed, and only the two parameters that f1() actually uses are passed. They are passed in registers, which is faster than using the stack. Note that inlining was disabled when compiling this example; otherwise the function f1() would not exist at all, and the value 7 would be assigned directly to m.
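The two transformations discussed so far, replacing the multiplication by 15 with a shift and a subtraction, and dropping the dead parameter of f1(), can be sketched at the source level. This is only an illustration of why the rewritten code computes the same result, not a description of how the compiler implements these optimizations. The helper names (mul15_multiply, mul15_shift_sub, f1_dead_param_removed, transformations_agree) are hypothetical, and unsigned arithmetic is used instead of the article's int so that overflow stays well defined in C++.

```cpp
#include <cstdint>

// Multiply by 15 the way the "/G6" listing does: one multiplication.
constexpr std::uint32_t mul15_multiply(std::uint32_t i) { return i * 15u; }

// Multiply by 15 the way the "/G7" listing does: (i << 4) - i, i.e. 16*i - i.
constexpr std::uint32_t mul15_shift_sub(std::uint32_t i) { return (i << 4) - i; }

// f1() as in the article: the second parameter is never used ("dead").
int f1(int i, int /* j: never used */, int k) { return i + k; }

// Hypothetical source-level view of f1() once the dead parameter is removed.
int f1_dead_param_removed(int i, int k) { return i + k; }

// Returns true when both rewritten forms agree with the originals
// on a handful of sample inputs, including values that wrap around.
bool transformations_agree() {
    const std::uint32_t samples[] = {0u, 1u, 7u, 1000u, 0x10000000u, 0xFFFFFFFFu};
    for (std::uint32_t i : samples) {
        if (mul15_multiply(i) != mul15_shift_sub(i)) return false;
    }
    return f1(3, 12345, 4) == f1_dead_param_removed(3, 4);
}
```

Calling transformations_agree() from a small test program is enough to confirm that the shift-and-subtract form and the two-parameter form give the same results as the originals for these inputs.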
Visual C++ .NET 2002

VC.NET 2002 introduced the concept of Whole Program Optimization (WPO), which is enabled with the "/GL" option. With Whole Program Optimization, the compiler stores an intermediate representation of the code in the .obj file instead of target code, and the linker then performs the optimization and generates the real target code.

One main benefit of Whole Program Optimization is that functions can be inlined across source files, which can greatly improve the performance of a program. Another benefit is that the compiler can track the use of memory and registers across functions, which reduces the overhead of function calls.

The following example shows the effect of Whole Program Optimization:

    // File 1
    extern void func(int *, int *);

    int g, h;

    int main()
    {
        int i = 0;
        int j = 1;
        g = 5;
        h = 6;
        func(&i, &j);
        g = g + i;
        h = h + i;
        return 0;
    }

    // File 2
    extern int g;
    extern int h;

    void func(int *pi, int *pj)
    {
        *pj = g;
        h = *pi;
    }

When the "/GL" option is not used, the following code is generated:

    sub  esp, 8
    lea  eax, DWORD PTR _j$[esp+8]
    push eax
    lea  ecx, DWORD PTR _i$[esp+12]
    push ecx
    mov  DWORD PTR _i$[esp+16], 0
    mov  DWORD PTR _j$[esp+16], 1
    mov  DWORD PTR ?g@@3HA, 5
    mov  DWORD PTR ?h@@3HA, 6
    call ?func@@YAXPAH0@Z
    mov  eax, DWORD PTR _i$[esp+16]
    mov  edx, DWORD PTR ?g@@3HA
    mov  ecx, DWORD PTR ?h@@3HA
    add  edx, eax
    add  ecx, eax
    mov  DWORD PTR ?g@@3HA, edx
    mov  DWORD PTR ?h@@3HA, ecx
    xor  eax, eax
    add  esp, 16
    ret  0

When "/GL" is used, the generated code is much shorter. Note that, as before, inlining was disabled when compiling this example.

    sub  esp, 8
    lea  ecx, DWORD PTR _j$[esp+8]
    lea  edx, DWORD PTR _i$[esp+8]
    mov  DWORD PTR _i$[esp+8], 0
    mov  DWORD PTR ?g@@3HA, 5
    mov  DWORD PTR ?h@@3HA, 6
    call ?func@@YAXPAH0@Z
    mov  DWORD PTR ?g@@3HA, 5
    xor  eax, eax
    add  esp, 8
    ret  0

Optimization Best Practices

The VC compiler has two main optimization options, "/O1" and "/O2".

"/O1" optimizes for minimum size. It is equivalent to specifying the following options:

1. /Og: global optimizations, such as keeping frequently used variables in registers and optimizing calculations inside loops.
2. /Os: favors the size of the program (EXE or DLL) over code speed.
3. /Oy: omits the frame pointer, which speeds up function calls.
4. /Ob2: inlines any function that the compiler judges suitable for inlining.
5. /GF: enables read-only string pooling.
6. /Gy: tells the compiler to compile each function as a packaged function.

"/O2" optimizes for maximum speed. It is basically the same as "/O1", except that it uses "/Ot" instead of "/Os". It also adds "/Oi", which expands intrinsic functions inline.

In general, use maximum-speed optimization for small programs and minimum-size optimization for larger programs, because a large code size usually leads to slower loading, a lower cache hit rate, and frequent paging. With minimum-size optimization, the compiler no longer unrolls loops and avoids longer code sequences.

After choosing the main optimization option, it is a good idea to use a profiler to find the "hot spots", so that you can optimize different parts of the program differently. For example, if you compile for minimum size and the profiler shows that a few functions are executed very frequently, you can optimize just those functions for speed. The VC compiler can apply optimization options to specific functions. For example, if you find that the fiddle() function is called very frequently, you can have the compiler optimize this function for speed like this:

    #pragma optimize("t", on)
    int fiddle(S *p) { ...; }
    #pragma optimize("", on)

In addition to "/O1" and "/O2", there is the "/Ox" option. Its effect is much the same as "/O2", and combining "/Ox" with "/Os" is the same as "/O1". We recommend using "/O1" and "/O2" rather than "/Ox".

So far we have discussed the "/G7", "/arch", and "/GL" optimization options. In addition to these, VC provides two more:

1. /GA: optimizes static thread-local storage.
   (Do not use it in DLL projects; it has no effect there.)
2. /Gr: makes __fastcall the default calling convention, which means that the first two parameters are passed in registers (if they fit in a register).

Another option is "/OPT:REF", which tells the linker to remove functions that are never called and data that is never used. The "/OPT:ICF" option merges identical functions (for example, a template that has been expanded several times in your program), which also reduces the size of the program.

Optimization Improvements in Visual C++ .NET

There are three more important options that you can use in VC.NET 2003 projects. Although VC.NET 2002 also provides these options, VC.NET 2003 improves their performance. The following table describes them briefly; for more detail, please refer to the documentation shipped with VC.

Option    Effect
/RTC1     Used in unoptimized debug builds. The compiler inserts run-time
          checks that help you find errors in the program, for example
          using uninitialized memory or mixing __stdcall and __cdecl.
/GS       Adds code that detects overruns of static (stack) buffers, so that
          an attacker cannot overwrite the return address of a function in
          order to execute malicious code. Note: this does not mean you can
          stop worrying; you still have to write secure code.
/Wp64     Detects problems that affect the generation of 64-bit code. It
          helps you find problems that might appear when your code is
          ported to a 64-bit environment.
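
The kind of portability problem that a 64-bit check looks for can be sketched with plain integers. The example below is a minimal illustration under stated assumptions, not a description of what "/Wp64" actually diagnoses in any given program: kExampleAddress is a hypothetical 64-bit address chosen only for illustration, and truncate_to_32 and survives_32_bit_round_trip are hypothetical helper names. It shows that a pointer-sized value stored in a 32-bit integer loses its upper bits once addresses no longer fit in 32 bits.

```cpp
#include <cstdint>

// Hypothetical 64-bit address, chosen only for illustration.
constexpr std::uint64_t kExampleAddress = 0x0000000123456789ULL;

// The risky 32-bit habit: keeping a pointer-sized value in a 32-bit integer.
constexpr std::uint32_t truncate_to_32(std::uint64_t address) {
    return static_cast<std::uint32_t>(address);
}

// True only when the value is unchanged after passing through 32 bits.
constexpr bool survives_32_bit_round_trip(std::uint64_t address) {
    return static_cast<std::uint64_t>(truncate_to_32(address)) == address;
}
```

Small values such as 0x1000 survive the round trip, which is why code of this kind can appear to work in a 32-bit build and only fail after porting. For real pointers, an integer type that is guaranteed to be wide enough, such as std::uintptr_t, avoids the truncation.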