Unleashing the best performance of programs on Intel architecture

xiaoxiao2021-03-06  41

Software performance optimization is often regarded as expensive and difficult, a frontier of software development that leaves ordinary developers intimidated. Yet performance is a source of competitive advantage in the market and plays a key role in the success of a software product. How to improve performance is therefore a problem software engineers meet often, and a frequent headache for them.

Is there a simple way to improve software performance? Intel's software development tools offer one. By using the Intel Software Development Tools flexibly, you can improve a program's performance with little effort, so that it achieves its best performance on Intel architecture.

Starting with this issue, we present a series of articles on how to use the Intel Software Development Tools to optimize program performance. After reading the series, you will be able to unleash the best performance of your programs on Intel architecture with these tools.

This first article in the series shows how to use the Intel compiler to optimize program performance.

How to optimize your program's performance

The compiler is the most basic tool in today's software development, and its quality directly affects the performance of the executables it generates. The fastest and easiest way to improve program performance is to use an optimizing compiler. Compiler optimization has made great progress in recent years: a good compiler helps you take full advantage of new processor features and automates the optimization work, so you no longer have to leaf through thick processor manuals. The Intel compiler is a leader in this area. It makes full use of the features of Intel 32-bit and 64-bit processors to produce highly efficient code, which makes it the preferred choice for programs based on IA-32 (the Intel 32-bit architecture) and IA-64 (the Intel 64-bit architecture).

We first describe how to use the Intel compiler in the Microsoft Visual C++ development environment, then demonstrate how to use the Intel C++ compiler to optimize for a specific Intel processor and how to write functions for a specific Intel processor. Finally, we discuss how to use the SIMD instructions of Intel processors from the Intel C++ compiler to improve program performance.

1 Using the Intel C++ compiler

The Intel C++ compiler offers numerous optimization features that take full advantage of the latest processors and advanced optimization strategies. It also integrates easily into popular integrated development environments and works together with other development tools.

Here we explain how to use the Intel C++ compiler in Microsoft Visual C++, a popular C++ development tool. Once installed, the Intel C++ compiler integrates automatically into the Microsoft Visual C++ development environment.

In Microsoft Visual C++ 6.0, the selection tool in the Microsoft Visual C++ 6.0 menu lets the Intel C++ compiler replace the built-in compiler as the default compiler of the development environment. In Microsoft Visual C++ .NET 2003, you can convert a project to use the Intel C++ build system through the right-click context menu. You can also define the macros _USE_INTEL_COMPILER and _USE_NON_INTEL_COMPILER to choose the compiler for a specific project.

The Intel C++ compiler also supports the Linux platform, with the same features as the Windows version. More detailed information about the Windows and Linux versions of the Intel C++ compiler and the Intel Fortran compiler is available on the Intel development tools web site.

2 Optimizing for specific processors

We always want the programs we develop to exploit every feature of the processor so that they run as efficiently as possible. Whether the compiler supports a new processor's instructions and code scheduling rules determines whether the generated program can take full advantage of that processor. The Intel C++ compiler supports the new instructions and code scheduling rules of new processors. When processor-specific instructions are used, for example the Streaming SIMD Extensions available only on the Pentium 4 processor and its successors, the compiler can at the same time generate code that runs on older processors. Such output achieves the best performance on a new processor while still running on all older ones. For example, if you want code that uses SSE instructions when running on the Pentium 4 processor, you can use the -G7 and -QxK options together. In Microsoft Visual C++ 6.0, these options can be added in the Project Settings dialog box. In Microsoft Visual C++ .NET 2003, you can set them directly under Intel Specific.

3 Writing functions for a specific processor

To let a program exploit the features of a particular processor, you sometimes have to write functions that use instructions, such as MMX instructions, that only certain processors support. In that case, the program needs some CPU detection code.

The processor model can be determined by executing the CPUID assembly instruction. When the instruction is executed with the EAX register set to 1 (see Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference and Application Note AP-485, Intel Processor Identification and the CPUID Instruction), the processor's identification and other information, such as the CPU feature flags and the cache size, are placed in the corresponding registers. A program can use this information to select different functions on different processors.
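As a rough illustration of the CPUID approach, the sketch below decodes a few of the feature flags returned in EDX when CPUID is executed with EAX set to 1. The names CpuFeatures, feature_bit, decode_leaf1_edx, query_cpu_features and HAVE_CPUID_HELPER are made up for this example. The bit positions (23 for MMX, 25 for SSE, 26 for SSE2) follow Intel's CPUID documentation, and the __get_cpuid helper from <cpuid.h> is a GCC/Clang facility, not part of the Intel compiler discussed in this article.

```cpp
#include <cassert>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>            // GCC/Clang helper around the CPUID instruction
#define HAVE_CPUID_HELPER 1   // hypothetical macro used only by this sketch
#endif

// Hypothetical container for the feature flags this sketch looks at.
struct CpuFeatures {
    bool mmx;
    bool sse;
    bool sse2;
};

// Test a single bit of a 32-bit CPUID result register.
inline bool feature_bit(std::uint32_t reg, unsigned bit) {
    assert(bit < 32);   // a 32-bit register only has bits 0..31
    return ((reg >> bit) & 1u) != 0;
}

// Decode the EDX feature flags returned by CPUID with EAX = 1.
inline CpuFeatures decode_leaf1_edx(std::uint32_t edx) {
    CpuFeatures f;
    f.mmx  = feature_bit(edx, 23);
    f.sse  = feature_bit(edx, 25);
    f.sse2 = feature_bit(edx, 26);
    return f;
}

// Query the running processor. Reports no features when the CPUID helper
// is unavailable (non-x86 target or a different toolchain).
inline CpuFeatures query_cpu_features() {
#ifdef HAVE_CPUID_HELPER
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return decode_leaf1_edx(edx);
    }
#endif
    return decode_leaf1_edx(0);
}
```

A program could call query_cpu_features() once at startup and branch on the result. Keeping the decoding separate from the query makes the bit logic easy to check without depending on the machine the code runs on.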

A simpler way to achieve the same goal is to use the dispatch feature of the Intel C++ compiler. The compiler automatically generates efficient CPU detection code, so defining a function that runs on a particular processor becomes much easier, with no need to deal with the details of the CPUID instruction. As the following example shows, the keywords cpu_dispatch and cpu_specific are used in the function declarations, and the compiler calls the appropriate function on each processor.

__declspec(cpu_specific(generic)) void fn(void)
{
    // Put the generic code for i386 here
}

__declspec(cpu_specific(pentium_4)) void fn(void)
{
    // Put the code for the Pentium 4 processor here
}

__declspec(cpu_dispatch(generic, pentium_4)) void fn(void)
{
    // Leave this function body empty; do not place any code here.
    // The compiler inserts the dispatch code according to the CPU type.
}
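For readers without the Intel compiler, the effect of cpu_dispatch can be approximated by hand. The sketch below is not what the Intel compiler generates. It assumes the GCC/Clang builtin __builtin_cpu_supports for detection and uses made-up names (fn_generic, fn_pentium4, select_fn, cpu_has_sse2) to show the idea of choosing an implementation once, based on the detected CPU features.

```cpp
#include <cassert>

// Hypothetical implementations; each returns a tag so the choice is visible.
inline int fn_generic()  { return 0; }  // universal code path
inline int fn_pentium4() { return 1; }  // code path that may rely on SSE2

using FnPtr = int (*)();

// Pick an implementation from a feature flag (pure, so easy to test).
inline FnPtr select_fn(bool has_sse2) {
    return has_sse2 ? &fn_pentium4 : &fn_generic;
}

// Detect SSE2 at run time; reports "not available" on other toolchains.
inline bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                        // set up the CPU feature data
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Dispatching entry point: resolves the target once, then forwards to it.
inline int fn() {
    static const FnPtr target = select_fn(cpu_has_sse2());
    assert(target != nullptr);
    return target();
}
```

Resolving the function pointer once in a static keeps the detection cost out of later calls, which is the main point of letting a dispatcher sit in front of the processor-specific versions.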

4 Using SIMD instructions

Using SIMD instructions can greatly improve a program's performance, but the C/C++ language itself provides no direct way to use them. In the past, SIMD instructions could be used only by hand-writing assembly language, which meant additional development, debugging, and maintenance work. Fortunately, the Intel C++ compiler adds SIMD support to C/C++, which makes SIMD much easier to use. There are four ways to use SIMD instructions with the Intel C++ compiler:

4.1 Automatic vectorization

With this approach, the Intel C++ compiler analyzes the loops in the program and automatically uses SIMD instructions where it can. The command-line option -Q[a]x{i|M|K|W} tells the compiler that SIMD instructions can be used safely. The following example uses the -QxW option to allow the compiler to use instructions unique to the Pentium 4 processor.

C:\dev\simd> icl -c -QxW simd.cpp
Intel(R) C++ Compiler for 32-bit applications, Version 8.0 Build 20040318Z
Package ID: w_cc_pc_8.0.048
Copyright (C) 1985-2004 Intel Corporation. All rights reserved.
simd.cpp
simd.cpp(8): (col. 2) remark: loop was vectorized.
simd.cpp(21): (col. 2) remark: loop was vectorized.

Many command-line options and pragmas control automatic vectorization. Refer to the Intel C++ Compiler User's Guide for details. More information is available on the Intel Software Development Products web site.
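The source of simd.cpp is not shown in this article. Purely as an illustration, the following function (with the made-up name scale_add) has the shape that auto-vectorizers generally handle well: a counted loop over contiguous arrays, no dependence between iterations, and no function calls in the loop body. Whether a given compiler actually vectorizes it depends on the compiler version and options.

```cpp
#include <cassert>
#include <cstddef>

// Computes y[i] = a * x[i] + y[i] over contiguous arrays.
// __restrict is a widely supported compiler extension (not standard C++)
// promising that x and y do not overlap, which removes one common
// obstacle to vectorization.
inline void scale_add(float a, const float* __restrict x,
                      float* __restrict y, std::size_t n) {
    assert(x != nullptr && y != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];   // each iteration is independent
    }
}
```

Loops that carry a value from one iteration to the next, or that may touch overlapping memory, are the usual cases a compiler declines to vectorize.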

4.2 C++ class libraries with SIMD support

Even better, the Intel C++ compiler includes data types that make use of SIMD instructions. These data types give you control over the generated code. To use them, you only need to declare variables of the appropriate type; each operation then processes more elements at once, which reduces the number of loop iterations.

The following example shows a conversion that uses the Is32vec4 data type (four 32-bit integers packed together):

#include <dvec.h>   // Intel C++ class library header that declares Is32vec4

// Original version using integers
void quarter(int array[], int len)
{
    int i;
    for (i = 0; i < len; i++)
        array[i] = array[i] >> 2;
}

// Modified version using Is32vec4, 4 SIMD integers
void quartervec(int array[], int len)
{
    // Assumes len is a multiple of 4
    // Assumes array is 16 byte aligned
    Is32vec4 *array4 = (Is32vec4 *)array;
    int i;
    for (i = 0; i < len / 4; i++)
        array4[i] = array4[i] >> 2;
}

4.3 Intrinsics

The Intel C++ compiler supports intrinsic functions, which map directly to SIMD instructions and other assembly instructions. When the quarter() function from the previous example is written with intrinsics, the resulting assembly code is the same; the only difference is that C/C++ variables are used in place of registers. Documentation for the intrinsic functions can be found in the IA-32 Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference and in the Intel C++ Compiler User's Guide.

4.4 Inline assembly language

Sometimes we have to program at the lowest level, and inline assembly language makes such low-level coding possible. The following example uses inline assembly to implement the same function as before:

void quarterasm(int array[], int len)
{
    _asm {
        mov     esi, array      ; esi = array pointer
        mov     ecx, len        ; ecx = loop counter
        shr     ecx, 2          ; 4 shifts per loop iteration
    theloop:
        movdqa  xmm0, [esi]     ; load 4 integers
        psrad   xmm0, 2         ; shift right all 4 integers
        movdqa  [esi], xmm0     ; aligned store
        add     esi, 16         ; move array pointer
        sub     ecx, 1          ; decrement loop counter
        jnz     theloop
    }
}
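For comparison with the inline assembly, the same loop can be sketched with SSE2 intrinsics, the approach described in section 4.3. This is an illustration rather than the article's original listing. The names quarter_intrin and QUARTER_USE_SSE2 are made up, while _mm_load_si128, _mm_srai_epi32 and _mm_store_si128 are the standard <emmintrin.h> intrinsics that correspond to MOVDQA and PSRAD. A scalar fallback is included so the sketch still builds where SSE2 is not enabled.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>        // SSE2 intrinsics
#define QUARTER_USE_SSE2 1    // hypothetical macro used only by this sketch
#endif

// Arithmetic shift right by 2 of every element, four integers per step.
// Assumes len is a multiple of 4 and array is 16-byte aligned.
inline void quarter_intrin(int array[], int len) {
    assert(len % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(array) % 16 == 0);
#ifdef QUARTER_USE_SSE2
    __m128i* array4 = reinterpret_cast<__m128i*>(array);
    for (int i = 0; i < len / 4; ++i) {
        __m128i v = _mm_load_si128(&array4[i]);   // aligned load (MOVDQA)
        v = _mm_srai_epi32(v, 2);                 // shift right all 4 (PSRAD)
        _mm_store_si128(&array4[i], v);           // aligned store (MOVDQA)
    }
#else
    for (int i = 0; i < len; ++i) {
        array[i] = array[i] >> 2;                 // scalar fallback
    }
#endif
}
```

A caller would typically declare the buffer with alignas(16) to satisfy the alignment assumption, since the aligned load and store fault on misaligned addresses.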

4.5 Other compiler optimizations

Besides automatic vectorization and the use of SIMD instructions, the Intel C++ compiler can perform many other optimizations, as listed in the table. For detailed information on how to use these and other optimizations, see the Intel C++ Compiler User's Guide and Reference.

Summary

This article described how to use the Intel C++ compiler to optimize programs. To achieve the best performance, always keep the appropriate compiler optimization options turned on. Doing so exposes problems in the optimized program early, when they are easier to find and fix, and keeps your attention on program performance. Turn these optimization options off only when debugging requires it.

Because the Intel C++ compiler offers so many optimizations, it is worth skimming the compiler documentation before you use it. This will make clear all the available optimization options and performance features.

When reprinting, please cite the original address: https://www.9cbs.com/read-82776.html
