Progress in Instruction Sets: MMX and SSE

zhaozj, 2021-02-08

If you can only do one thing at a time ...

In the simplest computer, a sequence of instructions fetches its operands and performs calculations on them. On most computers, each of these instructions performs only one calculation. Work that is naturally parallel, such as processing the left and right channels of a stereo signal or mixing the red, green, and blue components of a display, must therefore be carried out as several calculations in a row. Such a computer uses a "single instruction, single data" (SISD) processor.

However, many real-world calculations fit the SISD model poorly. For example, applying a simple filter to the left and right channels of a stereo signal from a microphone means adding the previous sample values to the current one and dividing by the number of samples, and this must be done separately for each channel. The following C code does this, assuming the samples are stored in the arrays Left and Right, the variable Now indexes the most recent sample, and the filter is the average of the three most recent values:

    int LeftSum, RightSum;
    LeftSum = (Left[Now-2] + Left[Now-1] + Left[Now]) / 3;
    RightSum = (Right[Now-2] + Right[Now-1] + Right[Now]) / 3;

This looks simple, but in a real application the same calculation must be repeated for every sample. At CD quality there are 44,100 samples per second per channel, or 88,200 calculations per second for both channels. Computing LeftSum and RightSum takes 6 instructions each, so sustaining continuous CD-quality sound requires:

    44,100 samples × 2 channels × 6 instructions = 529,200 instructions per second!

Graphics are similar, but much worse. Consider a resolution of 1024 × 768 in 24-bit true color at 30 frames per second (a good, though not exceptional, figure for 3D acceleration). Merely touching each color component of each pixel, without doing any real work, takes 1024 × 768 × 3 × 30 = 70,778,880 instructions per second, which is clearly a heavy burden, and doing anything more takes far more.

Look again at the two lines of C code above: apart from the data they read, they are almost identical, since the same instructions are applied to two different data streams. Imagine a processor that still executes a single instruction stream but applies each instruction to several independent data streams at once: it would clearly be much faster. This is called a "single instruction, multiple data" (SIMD) processor.

MMX and the Streaming SIMD Extensions (SSE) were designed for exactly this. Both add a set of new instructions to the traditional x86 instruction set that process data in SIMD fashion: on the Pentium MMX and the Pentium II they are called MMX, and on the Pentium III they are called SSE. SSE is in fact the KNI (Katmai New Instructions) set mentioned in our earlier article; with the launch of the Pentium III, KNI was officially named SSE. MMX instructions perform SIMD operations on integers, such as -40, 0, 1, 469, or 32766; SSE adds SIMD operations on floating-point numbers, such as -40.2337, 1.4355, or 877343226.012.

With MMX and SSE, a single instruction can operate on two or more data streams. In the audio example above, instead of 529,200 instructions per second you need only 264,600, because the same instructions act on the left and right channels simultaneously.

For the display example, the 70,778,880 instructions per second drop to only 23,592,960, because the red, green, and blue components can be handled by the same instructions.

MMX and SSE do more than this. Suppose each color component ranges from 0 to 255 (24-bit color depth). When brightness is adjusted to produce darkening or lighting effects, a value can easily fall below 0 or rise above 255. Because each component is stored in 8 bits, these two cases are called "underflow" and "overflow". The value must obviously be limited to the range 0 to 255, or the display will be garbled. Without MMX or SSE, the software must detect and correct these cases itself, and because that requires jump (branch) instructions, it noticeably slows down some processors. With MMX or SSE, the instructions themselves apply the range limit (saturation): the value is "forced" into the valid range, the program runs smoothly, and the user notices nothing.

MMX is not just for games ...

Look a little into 3D games and you will see why MMX brought no significant improvement in game performance, while SSE excels at it. In Quake, for example, 3D objects are built from polygons, and these polygons are stored as a series of points. Each point (see page 29) has coordinates on three axes. If coordinates were limited to integers, these positions could not be represented precisely (with 16 bits per axis, for instance, there are only 65,536 positions along each axis), and the graphics would look very poor. Since the Pentium, the floating-point performance of Intel processors has been very strong, and almost all game developers choose floating-point arithmetic.

Because MMX cannot operate on floating-point numbers (worse, switching between MMX and floating-point mode carries a performance penalty), MMX has done little for games beyond speeding up device drivers. This is why MMX disappointed so many people. Even when a 3D accelerator card does the rendering, the work done in the game itself (simulation, 3D transformation, lighting, and so on) consumes about 90% of processor time, leaving only 10% for other jobs, and that is with a 3D card.

SSE solves this problem effectively. In addition to keeping the original MMX instructions, it adds 70 new instructions that speed up floating-point arithmetic and also make more efficient use of memory, speeding up memory access. The improvement in game performance is very significant, even startling. Recall that a Voodoo2, RIVA TNT, or Rage 128 graphics card needs a processor faster than 400 MHz to reach its highest frame rates, and the reason becomes clear. According to Intel, SSE benefits the following areas: 3D geometry and animation; image processing (such as Photoshop); video editing, compression, and decompression (such as MPEG and DVD); speech recognition (ViaVoice is still little more than a toy, since it typically gets something wrong every 20 words or so); and audio compression and synthesis.

Introducing SSE into your system

The Pentium III (codenamed Katmai) went on sale at the end of February in only two versions, 450 MHz and 500 MHz, both in the Slot 1 package with a 100 MHz external bus. So if you have a BX motherboard, a BIOS upgrade is all it takes to run a Pentium III in your system, provided your wallet allows it: they are still expensive. Nor do you need to worry about software and drivers.

Over the course of 1999, more and more SSE-optimized software will appear, including some of the most popular games, such as "Tianyou Turn 3" and Quake III Arena. The next Pentium III models will ship in volume in the second quarter of this year, starting at 533 MHz and supporting a faster 133 MHz external bus (which some older motherboards do not provide). By the end of 1999, 600 MHz systems should be everywhere.

Copyright: Flying Software Studio, Programmer Website

When reposting, please credit the original address: https://www.9cbs.com/read-1376.html
