Recently, while writing a very simple STEREO-to-MONO conversion function, I ran into a loop-efficiency problem. After working out how to optimize it, I found the exercise somewhat interesting, so I am writing it down here.

The algorithm is simple: MONO = (LCH + RCH) / 2. The stereo data stream is stored interleaved, in the format [LCH RCH LCH RCH LCH ...]. In general, the first approach that comes to mind is this:

for (i = 0; i ...

In the computation I had included an explicit type cast, (float). The purpose of that cast is to keep the fractional part (the remainder) when the sum of the two stereo samples is divided by 2. On reflection, the result is only used as the input to a filter, so the fractional part has little effect on the final output and is not really needed. So I removed the (float) cast. Sure enough, when I ran it again, CPU usage fell to about 12%, and the results differed little from those before the change.

Note that my buffer sizes are 5000 (stereo) and 2500 (mono). Both are multiples of 10, so performing 10 operations per loop iteration causes no problem. If the buffer size is not a multiple of the number of operations per iteration, the program will access memory beyond the end of the buffer and corrupt whatever is stored there.

In practice, more operations per iteration is not necessarily better. Try different counts for your specific situation to find the most efficient one. It is also best to choose the number of operations per iteration according to the data-bus width of the CPU the program runs on, and to estimate how much of the CPU cache the loop is likely to occupy.
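For illustration, here is a minimal C sketch of the two versions discussed above. It assumes interleaved 16-bit stereo input (int16_t) and a float mono buffer feeding the filter. The function names and element types are hypothetical, not taken from the original code. The unroll factor of 10 follows the text. The trailing loop that handles leftover frames is an addition not in the original. It keeps the unrolled version inside the buffers even when the length is not a multiple of 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (hypothetical): interleaved 16-bit stereo input,
 * [L R L R ...], and a float mono output buffer of n_frames samples. */

/* Straightforward version: one mono sample per iteration. The (float)
 * cast makes the division floating-point, so a .5 remainder is kept. */
void stereo_to_mono_float(const int16_t *stereo, float *mono, size_t n_frames)
{
    for (size_t i = 0; i < n_frames; i++)
        mono[i] = (float)(stereo[2 * i] + stereo[2 * i + 1]) / 2;
}

/* Optimized version: no cast, so the division is integer division and the
 * .5 remainder is truncated before the value is stored as float. The main
 * loop body handles 10 frames per iteration. */
void stereo_to_mono_unrolled(const int16_t *stereo, float *mono, size_t n_frames)
{
    assert(n_frames == 0 || (stereo != NULL && mono != NULL));
    size_t i = 0;

/* MIX(k): average frame i + k of the current group. */
#define MIX(k) mono[i + (k)] = (stereo[2 * (i + (k))] + stereo[2 * (i + (k)) + 1]) / 2

    /* Unrolled part: runs only while 10 whole frames remain. */
    for (; i + 10 <= n_frames; i += 10) {
        MIX(0); MIX(1); MIX(2); MIX(3); MIX(4);
        MIX(5); MIX(6); MIX(7); MIX(8); MIX(9);
    }
    /* Tail: leftover frames when n_frames is not a multiple of 10. */
    for (; i < n_frames; i++)
        MIX(0);

#undef MIX
}
```

With the 2500-frame mono buffer described above, the tail loop never runs. For other lengths it processes the last n_frames % 10 frames one at a time. The float output type is what makes the cast matter here. If the mono buffer were an integer type, both versions would store the same truncated value, because C converts float to integer by truncating toward zero.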
In addition, before using the code above, you must first confirm how your C compiler handles a statement such as monobuffer[i++] = (float)(stereobuf[i * 2] + stereobuf[i * 2 + 1]) / 2;. The code only works if the i++ increment happens after all the other operations in the statement. If your compiler does not guarantee this, write i++ as a separate statement. Standard C does not guarantee it either: modifying i and also reading it elsewhere in the same expression is undefined behavior. Writing i++ separately is therefore the safe choice on any compiler.

Strictly speaking, the arithmetic itself runs at about the same speed before and after the optimization. What the optimization reduces is the CPU time the current task occupies. That frees more CPU time to serve other tasks in a multitasking embedded system.

The above is just my own experience. Comments and discussion are welcome.
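A sketch of the safe form mentioned in the i++ paragraph follows. It keeps one statement per sample, but advances i in its own statement, so the result does not depend on the compiler's evaluation order. As before, the int16_t/float types and the function name are assumptions. The tail loop for lengths that are not a multiple of 10 is an addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: interleaved int16_t stereo in, float mono out.
 * STEP() averages frame i and only then advances i, as two separate
 * statements. This replaces mono[i++] = ... stereo[i * 2] ...,
 * which is undefined behavior in standard C. */
#define STEP()                                               \
    do {                                                     \
        mono[i] = (stereo[i * 2] + stereo[i * 2 + 1]) / 2;  \
        i++;                                                 \
    } while (0)

void stereo_to_mono_stepwise(const int16_t *stereo, float *mono, size_t n_frames)
{
    assert(n_frames == 0 || (stereo != NULL && mono != NULL));
    size_t i = 0;
    while (i + 10 <= n_frames) {   /* 10 samples per pass */
        STEP(); STEP(); STEP(); STEP(); STEP();
        STEP(); STEP(); STEP(); STEP(); STEP();
    }
    while (i < n_frames)           /* leftover frames */
        STEP();
}
```

The do { ... } while (0) wrapper lets STEP() be used like a single statement, including as the body of the tail loop.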