Introduction to Programming with the MMX Instruction Set

zhaozj, 2021-02-16


Author: Alex Farber
Source: http://www.codeproject.com/cpp/mmxintro.asp

Introduction to MMX technology

Intel's MMX(TM) technology (often read as MultiMedia eXtensions) can greatly improve the performance of applications that work with two-dimensional and three-dimensional graphics and images. It is intended for intensive processing of large amounts of data and large arrays. The basic units that MMX instructions operate on are bytes, words, and double words.

Visual Studio .NET 2003 supports the MMX instruction set directly, so you do not have to write assembly code: the MMX instructions can be used from C++ code. Reading the Intel Software Manuals [1] and the MSDN topics on MMX technology [2] will help you grasp the key points of MMX programming.

MMX technology implements the single-instruction, multiple-data (SIMD) execution model. Consider the following programming task: add a number to every element of a byte array. In a traditional program, the algorithm looks like this:

    for each b in array        // each element b of the array
        b = b + n              // add the number n

Let's take a look at its implementation details:

    for each b in array        // for each element b of the array
    {
        load b into a register
        add n to the register
        store the result from the register back to memory
    }

A processor with MMX support has eight 64-bit registers. Each register can hold 8 bytes, 4 words, or 2 double words. MMX technology also provides an instruction set whose instructions load values into the MMX registers, perform arithmetic and logical operations on them inside the registers, and store the results back to memory. With MMX, the algorithm above looks like this:

    for each 8 members in array        // take 8 bytes of the array as one group
    {
        load these 8 bytes into an MMX register
        add n to each of the 8 bytes in the register with a single CPU instruction
        write the result from the register back to memory
    }
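The difference between the two loops can be imitated in portable C++ without any MMX instruction. The following is only a sketch under stated assumptions: the function names `add_scalar` and `add_packed8` are hypothetical, an ordinary 64-bit integer stands in for an MMX register, and the per-byte addition wraps around on overflow. A real MMX build would let the compiler emit a packed-add instruction instead of the bit masking used here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Traditional loop: one byte is loaded, changed, and stored per iteration.
void add_scalar(std::uint8_t* data, std::size_t count, std::uint8_t n)
{
    for (std::size_t i = 0; i < count; ++i)
        data[i] = static_cast<std::uint8_t>(data[i] + n);    // wraps modulo 256
}

// Packed loop: eight bytes are loaded into one 64-bit value, all eight are
// changed by a few integer operations, and the value is stored back.
void add_packed8(std::uint8_t* data, std::size_t count, std::uint8_t n)
{
    const std::uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;       // low 7 bits of every byte
    const std::uint64_t top  = 0x8080808080808080ULL;       // top bit of every byte
    const std::uint64_t n8   = 0x0101010101010101ULL * n;   // n copied into every byte

    std::size_t i = 0;
    for (; i + 8 <= count; i += 8)
    {
        std::uint64_t x;
        std::memcpy(&x, data + i, 8);                        // load 8 bytes
        // Add the low 7 bits of every byte (no carry can cross a byte
        // boundary), then fix up the top bit of every byte with XOR.
        const std::uint64_t sum = ((x & low7) + (n8 & low7)) ^ ((x ^ n8) & top);
        std::memcpy(data + i, &sum, 8);                      // store 8 bytes
    }
    for (; i < count; ++i)                                   // leftover bytes
        data[i] = static_cast<std::uint8_t>(data[i] + n);
}
```

Both functions should produce the same array; the packed version just touches memory one eighth as often, which is the effect the MMX registers are meant to provide.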

C++ programmers do not have to use the MMX instructions directly or access the MMX registers themselves. Instead, you use the 64-bit data type __m64 and a set of C++ functions that perform the arithmetic and logical operations. Deciding which MMX registers to use and optimizing the code is the job of the C++ compiler.

The Visual C++ MMXSwarm sample [4] in MSDN is a good example of using MMX technology. It contains wrapper classes that simplify working with MMX and shows how to process images in various formats (such as monochrome, 24-bit RGB, and 32-bit RGB). This article is only a brief introduction to MMX programming with Visual C++. If you are interested, see the MMXSwarm sample on MSDN.

MMX programming details

Header files

All MMX intrinsic functions are available through the emmintrin.h header file (the MMX intrinsics themselves are declared in mmintrin.h, which emmintrin.h pulls in indirectly):

    #include <emmintrin.h>

Because the MMX processor instructions used by the program are generated by the compiler, no .lib library file is needed.

__m64 data type

A variable of this type is used as an operand of the MMX instructions; its contents should not be accessed directly. Variables of type __m64 are automatically aligned on 8-byte boundaries.

CPU support for MMX instruction set

If your CPU supports the MMX instruction set, you can use the MMX functions provided by Visual Studio .NET 2003. The Visual C++ CPUID sample [3] in MSDN can help you detect whether your CPU supports SSE, MMX, or other CPU features.
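A minimal sketch of such a check follows. It is not the code of the CPUID sample [3]; the names `CpuFeatures`, `decode_cpuid_edx`, and `read_cpuid_edx` are hypothetical. It relies on the feature flags that CPUID function 1 returns in EDX (bit 23 for MMX, bit 25 for SSE, bit 26 for SSE2) and reads them with `__cpuid` on Visual C++ or `__get_cpuid` on GCC-compatible compilers. On any other target the reader function simply returns 0.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

struct CpuFeatures
{
    bool mmx;
    bool sse;
    bool sse2;
};

// Decode the EDX feature flags returned by CPUID function 1.
CpuFeatures decode_cpuid_edx(std::uint32_t edx)
{
    CpuFeatures f;
    f.mmx  = (edx & (1u << 23)) != 0;    // bit 23: MMX
    f.sse  = (edx & (1u << 25)) != 0;    // bit 25: SSE
    f.sse2 = (edx & (1u << 26)) != 0;    // bit 26: SSE2
    return f;
}

// Read EDX of CPUID function 1. Returns 0 (no features reported) when the
// code is built for a non-x86 target or with an unrecognized compiler.
std::uint32_t read_cpuid_edx()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                    // regs = {EAX, EBX, ECX, EDX}
    return static_cast<std::uint32_t>(regs[3]);
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return 0;
    return edx;
#else
    return 0;
#endif
}
```

A program would call `decode_cpuid_edx(read_cpuid_edx())` once at startup and fall back to the pure C++ path when `mmx` is false.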

Saturation Arithmetic and Wraparound Mode

MMX technology supports a computing mode called saturating arithmetic. In saturation mode, when a result overflows or underflows, the CPU does not simply discard the overflowed bits; it clamps the result to the upper limit of the data type (on overflow) or to the lower limit (on underflow). Saturating arithmetic is useful for image processing. The following example shows the difference between saturation mode and wraparound mode. Suppose a BYTE variable holds 255 and 1 is added to it. In wraparound mode the result is 0 (the carry is discarded); in saturation mode the result is 255. Underflow is handled the same way: for unsigned bytes in saturation mode, 1 minus 2 gives 0 (rather than -1). MMX arithmetic instructions such as addition and subtraction are available in both modes. The projects discussed in this article use only the saturating MMX instructions.
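The two modes can be modelled on single bytes in plain C++. This is a scalar sketch for illustration only; the function names are hypothetical, and the MMX instructions apply the same rule to all eight bytes of a register at once.

```cpp
#include <cstdint>

// Wraparound mode: the overflowed bits are discarded (result modulo 256).
std::uint8_t add_wrap(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a + b);
}

// Saturation mode, addition: the result is clamped to the upper limit 255.
std::uint8_t add_saturate(std::uint8_t a, std::uint8_t b)
{
    const int sum = a + b;
    return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
}

// Saturation mode, subtraction: the result is clamped to the lower limit 0.
std::uint8_t sub_saturate(std::uint8_t a, std::uint8_t b)
{
    const int diff = a - b;
    return static_cast<std::uint8_t>(diff < 0 ? 0 : diff);
}
```

With these definitions, `add_wrap(255, 1)` gives 0, `add_saturate(255, 1)` gives 255, and `sub_saturate(1, 2)` gives 0, matching the example in the text.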

Programming examples

The examples below were built with Visual Studio .NET 2003. You can download the sample programs as a compressed archive from http://www.codeproject.com/cpp/mmxintro/mmx_src.zip. The archive contains two projects, both Visual C++ .NET projects based on the Microsoft Foundation Classes (MFC). You can also create the two projects yourself by following the descriptions below.

MMX8 demo project

MMX8 is a single document interface (SDI) application that performs simple processing on monochrome bitmaps with 8 bits per pixel. The source image and the processed image are both displayed in the window. The new ATL (Active Template Library) class CImage is used to load the image from the resources and to display it in the window. The program performs two operations: inverting the image and changing its brightness. Each operation can be carried out in one of the following ways:

Pure C++ code; C++ code using the MMX intrinsic functions; code using MMX assembly instructions.

The time taken to process the image is displayed in the status bar.

Image inversion function implemented in pure C++:

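    void CImg8Operations::InvertImageCPlusPlus(BYTE* pSource, BYTE* pDest, int nNumberOfPixels)
    {
        // The source text is cut off after "i"; the rest of the loop is
        // reconstructed from the description (each pixel becomes 255 - value).
        for (int i = 0; i < nNumberOfPixels; i++)
        {
            *pDest++ = 255 - *pSource++;
        }
    }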

The MMX assembly instructions needed for this function, the corresponding MMX intrinsic functions in Visual C++ .NET, and what they do:

    MMX instruction   Intrinsic function   Description
    EMMS              _mm_empty            Empties the MMX registers (needed to avoid conflicts with floating-point operations).
    PSUBUSB           _mm_subs_pu8         Subtracts the eight unsigned 8-bit bytes of one 64-bit value from those of another, with saturation.
    PADDUSB           _mm_adds_pu8         Adds the eight unsigned 8-bit bytes of two 64-bit values, with saturation.

Image inversion function implemented with the Visual C++ .NET MMX intrinsic functions:

    void CImg8Operations::InvertImageC_MMX(BYTE* pSource, BYTE* pDest, int nNumberOfPixels)
    {
        __int64 i = 0;
        i = ~i;                                  // 0xffffffffffffffff

        // Process 8 pixels in each loop iteration
        int nLoop = nNumberOfPixels / 8;

        __m64* pIn = (__m64*) pSource;           // input byte array pointer
        __m64* pOut = (__m64*) pDest;            // output byte array pointer

        __m64 tmp;                               // temporary work variable

        _mm_empty();                             // MMX instruction EMMS: initialize the MMX registers

        __m64 n1 = Get_m64(i);

        // The source text is cut off after "i"; the loop condition and the
        // two statements before pIn++ are reconstructed from the table above.
        for (int i = 0; i < nLoop; i++)
        {
            tmp = _mm_subs_pu8(n1, *pIn);        // saturating 255 - value for each of the 8 bytes
            *pOut = tmp;

            pIn++;                               // move on to the next 8 pixels
            pOut++;
        }

        _mm_empty();                             // MMX instruction EMMS: clear the MMX registers
    }

    __m64 CImg8Operations::Get_m64(__int64 n)
    {
        union __m64__m64
        {
            __m64 m;
            __int64 i;
        } mi;

        mi.i = n;
        return mi.m;
    }
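For an unsigned byte, 255 minus the value is the same as flipping all of its bits, so the eight-pixels-per-step idea can be tried out in portable C++ with an ordinary 64-bit integer. The sketch below assumes a hypothetical function name `invert_packed8` and does not use MMX at all. It also adds a tail loop for pixel counts that are not a multiple of 8, a case that is not handled when the loop count is computed as nNumberOfPixels / 8.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Invert 8-bit pixels, eight at a time where possible.
void invert_packed8(const std::uint8_t* source, std::uint8_t* dest, std::size_t count)
{
    std::size_t i = 0;
    for (; i + 8 <= count; i += 8)
    {
        std::uint64_t pixels;
        std::memcpy(&pixels, source + i, 8);     // load 8 pixels
        pixels = ~pixels;                        // 255 - value for every byte
        std::memcpy(dest + i, &pixels, 8);       // store 8 pixels
    }
    for (; i < count; ++i)                       // remaining 0..7 pixels
        dest[i] = static_cast<std::uint8_t>(255 - source[i]);
}
```

Using `std::memcpy` for the loads and stores avoids any alignment requirement on the pixel buffers; compilers normally turn it into a single 64-bit move.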

Although this function executes in a very short time, I recorded the time taken by each of the three approaches. These are the results on my computer:

    Pure C++ code                          43 ms
    C++ code using MMX intrinsics          26 ms
    MMX assembly instructions              26 ms

These processing times must be measured with an optimized Release build of the program.
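How the sample project measures time is not shown here. As a hedged illustration, a small helper based on the standard `std::chrono` clock could look like the following; the name `elapsed_milliseconds` is hypothetical and this is not the code used for the figures quoted in this article.

```cpp
#include <chrono>

// Run work() once and return the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsed_milliseconds(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `elapsed_milliseconds([&] { /* process the image */ })` wraps one processing pass. For routines that finish in a few milliseconds, repeating the pass several times and averaging gives steadier numbers.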

I change the brightness of the image in the simplest possible way: by adding a value to, or subtracting a value from, the color value of every pixel. This function is somewhat more complicated than the previous one because the processing has to be split into two cases, one that increases the pixel values and one that decreases them.

Brightness function implemented in pure C++:

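    void CImg8Operations::ChangeBrightnessCPlusPlus(BYTE* pSource, BYTE* pDest, int nNumberOfPixels, int nChange)
    {
        if (nChange > 255)
            nChange = 255;
        else if (nChange < -255)
            nChange = -255;

        BYTE b = (BYTE) abs(nChange);

        int i, n;

        // In both loops the source text is cut off after "i"; the loop
        // condition and the line computing n are reconstructed.
        if (nChange > 0)                         // increase pixel values
        {
            for (i = 0; i < nNumberOfPixels; i++)
            {
                n = (int)(*pSource++ + b);

                if (n > 255)
                    n = 255;

                *pDest++ = (BYTE) n;
            }
        }
        else                                     // decrease pixel values
        {
            for (i = 0; i < nNumberOfPixels; i++)
            {
                n = (int)(*pSource++ - b);

                if (n < 0)
                    n = 0;

                *pDest++ = (BYTE) n;
            }
        }
    }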

Brightness function implemented with the Visual C++ .NET MMX intrinsic functions:

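    void CImg8Operations::ChangeBrightnessC_MMX(BYTE* pSource, BYTE* pDest, int nNumberOfPixels, int nChange)
    {
        if (nChange > 255)
            nChange = 255;
        else if (nChange < -255)
            nChange = -255;

        BYTE b = (BYTE) abs(nChange);

        // Copy b into all 8 bytes of a 64-bit value
        __int64 c = b;

        int i;                                   // declared here so that all loops below can use it

        for (i = 1; i <= 7; i++)
        {
            c = c << 8;
            c |= b;
        }

        // Process 8 pixels in each loop iteration
        int nNumberOfLoops = nNumberOfPixels / 8;

        __m64* pIn = (__m64*) pSource;           // input byte array
        __m64* pOut = (__m64*) pDest;            // output byte array

        __m64 tmp;                               // temporary work variable

        _mm_empty();                             // MMX instruction EMMS

        __m64 nChange64 = Get_m64(c);

        // In both loops the source text is cut off after "i"; the loop
        // condition and the line computing tmp are reconstructed.
        if (nChange > 0)
        {
            for (i = 0; i < nNumberOfLoops; i++)
            {
                tmp = _mm_adds_pu8(*pIn, nChange64);     // saturating add on 8 bytes

                *pOut = tmp;

                pIn++;                           // move on to the next 8 pixels
                pOut++;
            }
        }
        else
        {
            for (i = 0; i < nNumberOfLoops; i++)
            {
                tmp = _mm_subs_pu8(*pIn, nChange64);     // saturating subtract on 8 bytes

                *pOut = tmp;

                pIn++;                           // move on to the next 8 pixels
                pOut++;
            }
        }

        _mm_empty();                             // MMX instruction EMMS
    }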
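The shift-and-OR loop that fills a 64-bit value with eight copies of the byte b can be checked in isolation. In the sketch below the function names are hypothetical; the second function is an alternative not taken from the article, which gets the same value with one multiplication by 0x0101010101010101.

```cpp
#include <cstdint>

// Shift-and-OR loop, written with standard fixed-width types.
std::uint64_t replicate_byte_loop(std::uint8_t b)
{
    std::uint64_t c = b;
    for (int i = 1; i <= 7; ++i)
    {
        c <<= 8;     // make room for one more copy
        c |= b;      // place the copy in the lowest byte
    }
    return c;
}

// Alternative: every byte of the constant is 1, so the product holds b in every byte.
std::uint64_t replicate_byte_mul(std::uint8_t b)
{
    return 0x0101010101010101ULL * b;
}
```

For b = 0x12 both functions return 0x1212121212121212, which is the kind of operand the packed add and subtract need in order to apply the same change to all eight pixels.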

Note that the sign of the nChange parameter is checked once per call, outside the loop body, rather than inside the loop, where it would be checked many thousands of times. Here are the image processing times on my computer:

    Pure C++ code                          49 ms
    C++ code using MMX intrinsics          26 ms
    MMX assembly instructions              26 ms

MMX32 demo project

The MMX32 project processes RGB images with 32 bits per pixel. The operations performed are image inversion and changing the color balance of the image (each color component of a pixel is multiplied by a given coefficient).

Multiplication with MMX is more complicated than addition and subtraction because the result of a multiplication needs more bits than its operands. For example, if the operands are one byte (8-bit BYTE) in size, the result is a word (16-bit WORD). This requires additional conversions, so the difference in processing time between the MMX code and the C++ code is not very large (about 5-10%).

Color balance function implemented with the Visual C++ .NET MMX intrinsic functions:

    void CImg32Operations::ColorsC_MMX(BYTE* pSource, BYTE* pDest, int nNumberOfPixels,
                                       float fRedCoefficient, float fGreenCoefficient, float fBlueCoefficient)
    {
        int nRed = (int)(fRedCoefficient * 256.0f);
        int nGreen = (int)(fGreenCoefficient * 256.0f);
        int nBlue = (int)(fBlueCoefficient * 256.0f);

        // Set the multiplier coefficients
        __int64 c = 0;
        c = nRed;
        c = c << 16;
        c |= nGreen;
        c = c << 16;
        c |= nBlue;

        __m64 nNull = _m_from_int(0);            // zero
        __m64 tmp = _m_from_int(0);              // temporary work variable, initialized

        _mm_empty();                             // clear the MMX registers

        __m64 nCoeff = Get_m64(c);

        DWORD* pIn = (DWORD*) pSource;           // input double word array
        DWORD* pOut = (DWORD*) pDest;            // output double word array

        // The source text is cut off after "i"; the loop condition and the
        // load of *pIn into tmp are reconstructed.
        for (int i = 0; i < nNumberOfPixels; i++)
        {
            tmp = _m_from_int(*pIn);             // tmp = *pIn (low 32 bits of tmp)

            tmp = _mm_unpacklo_pi8(tmp, nNull);  // convert the 4 low bytes of tmp to 4 words;
                                                 // the high byte of each word comes from nNull (zero)

            tmp = _mm_mullo_pi16(tmp, nCoeff);   // multiply each word of tmp by the matching word of
                                                 // nCoeff; only the low 16 bits of each product are kept

            tmp = _mm_srli_pi16(tmp, 8);         // shift each word right by 8 bits, i.e. divide by 256

            tmp = _mm_packs_pu16(tmp, nNull);    // with saturation: convert the 4 words of tmp to 4 bytes
                                                 // in the low 32 bits of tmp, and the 4 words of nNull
                                                 // to 4 bytes in the high 32 bits of tmp

            *pOut = _m_to_int(tmp);              // *pOut = low 32 bits of tmp

            pIn++;
            pOut++;
        }

        _mm_empty();
    }
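The arithmetic applied to each color component is fixed-point scaling: the coefficient is turned into an integer by multiplying it by 256, the component is multiplied by that integer, and the product is divided by 256 again with a right shift. The scalar sketch below models one component. The name `scale_channel` is hypothetical, and the model is not bit-for-bit identical to the MMX version: it multiplies in a 32-bit int and clamps negative coefficients to zero, whereas the packed multiply keeps only the low 16 bits of each product.

```cpp
#include <cstdint>

// Scale one 8-bit color component by a coefficient using 8.8 fixed-point math.
std::uint8_t scale_channel(std::uint8_t value, float coefficient)
{
    int fixed = static_cast<int>(coefficient * 256.0f);    // e.g. 0.5 becomes 128
    if (fixed < 0)
        fixed = 0;                                         // assumption: no negative scaling

    int scaled = (value * fixed) >> 8;                     // divide by 256
    if (scaled > 255)
        scaled = 255;                                      // saturate to the byte range

    return static_cast<std::uint8_t>(scaled);
}
```

For example, a component of 100 scaled by 0.5 gives (100 * 128) >> 8 = 50, and a component of 200 scaled by 2.0 saturates at 255.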

You can see the source code for the sample project to learn more about this project.

SSE2 technology

SSE2 technology includes a set of integer instructions similar to those of MMX, operating on a set of 128-bit SSE registers. For example, changing the image color balance with SSE2 can be done more efficiently than with pure C++ code. SSE2 is also an extension of SSE: it can process not only single-precision floating-point numbers but also arrays of double-precision floating-point data. The MMXSwarm sample, implemented in C++, uses both the MMX intrinsic functions and the SSE2 integer intrinsic functions.

Reference documentation:

[1] Intel Software Manuals: http://developer.intel.com/design/archives/Processors/mmx/index.htm

[2] MSDN topic on MMX technology: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/vcrefSupportformmxTechnology.asp

[3] Microsoft Visual C++ CPUID sample: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcsample/html/vcsamcpuiddeterminecpucapability.asp

[4] Microsoft Visual C++ MMXSwarm sample: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcsample/html/vcsamMMXSwarmSampleDemonstratesCImageVisualCsMMXSupport.asp

[5] Matt Pietrek, article in Microsoft Systems Journal, February 1998: http://www.microsoft.com/msj/0298/HOOD0298.ASPX

When reposting, please cite the original address: https://www.9cbs.com/read-22379.html
