C Program Optimization Road (3)
--Liyuming1978@163.com
This article tells the common optimization methods of writing C program code, divided into I / O articles, memory articles, algorithms. MMX I also want to be here, but due to the content and title of the content, I decided to change a name, called MMX technology, and the MMX application in H263 video compression technology. Two articles.
three. Algorithm
In the previous one, we told the optimization of memory operations, this article mainly tells some commonly used optimization algorithms. There are too many east, and the content may be a bit messy, forgive me.
I. Start from the small place:
Let me talk about some small places first:
1 such as N / 2 is N >> 1 This is a common method, but it is important to pay attention to these two not complete equivalents! Because: if n = 3, n / 2 = 1; n >> 1 = 1; However, if n = -3, N / 2 = -1; n >> 1 = -2, said in a positive number When they all finished, but they were not the same. (Integer YUV to RGB transformation in JPG2000 must use >> instead of division is this truth)
2 is also a = a 1 to write as a ; a = A b is to write as A = B (estimated generally using VB will write A = A 1: P)
3 Multiple arithmetic fusion: such as A [i ]; just visit A [i], let i plus 1; say from the perspective of compilation, this is indeed optimized, if written as a [i], and i If there is a possibility of reading two times to the I variable, write (specifically to see the compiler's optimization ability), but if A [i ], you must read the I variable once. However, there is a problem here to pay attention: the integration within the conditional judgment must be careful, such as: (0 block judgment in idct transformation, Chen Wang algorithm)
IF (! ((((((xik [8 * 4] << 8) | (x2 = blk [8 * 6]) | (x3 = BLK [8 * 2]) | (x4 = BLK [8 * 1 ]) | (X5 = BLK [8 * 7]) | (x6 = BLK [8 * 5]) | (x7 = BLK [8 * 3])))))))))))))
In the condition judgment, the assignment statement is fused, but in fact, if the condition is true, it is not necessary to assign the statement, which means that when the condition is really, there are more garbage statements, these are the problems on the H263 source code. Although these garbage statements have calculated 0 pieces, the time increases by 30%, but since the IDCT only accounts for only 1% time, only 30% to 70% of the time, the performance loss is nothing to do. (This is the conclusion that I get when I use the compilation to change the source code). Here is also illustrated, the program optimization has a focus on the most time consuming place. There is no great practical meaning for code optimization when you are still consuming.
II. Internal memory change speed:
The world is always difficult, the program is also the same, most cases, speed is the same memory (or performance, such as what is compressed) is uncomfortable. A large-scale algorithm for current program acceleration is a large-scale approach to use a check list to avoid computing (such as a Huffman code table in JPG, in YUV to RGB transformation) This original complex computing is now only checking tables, although waste Memory, but the speed is significantly improved, or it is very cost-effective. There is also such ideas in the database query, store hotspots to accelerate queries. Now introduce a simple example, (temporarily thinking, huh): For example, in the program, it is often necessary to calculate 1000 to 2000 steps, then we can use an array a [1000] first put these values first Ok, keep down, you have to calculate 1200! At the time, check table A [1200-1000] is OK. III. Zero
Since scattered memory allocation, and a large amount of small objects are consumed, the optimization of them will sometimes have effect, such as the problem that I am talking about, is because of a large number of scattered memory allocations. Now, from a VB program, I used to use VB to make a small program, (Oh, mainly using VB programming than Vc, I can write one) when using MSFLEXGRID control (it is a form) Controls), found that if the new line is increased, the refresh speed is very slow, so I will add 100 lines every time, when the data is more than the new line, add 100 lines, so that "zero is "Oh, use this method, refreshing the speed than it is nam! In fact, there are many ideas, such as: When the program is running, it is actually a certain space. The subsequent small memory allocation is first in this space, which guarantees that the internal memory is less likely, while speeding up speed.
IV. Conditional statements or CASE statements will be most likely to put in front
The optimization effect is not obvious. If you want to get it, you can't think of it.
V. For the readability of the program, do not do those compilers can do or have undispriced processing:
This is very important, a normal program is good, mainly its readability, portability, reusability, and then it is its performance. So, if the compiler itself can help us optimize, we don't have to write things that everyone don't understand. For example, a = 52 (end) -16 (start); this may be because it understands the meaning of A when he reads the program. We don't have to write as a = 36, because the compiler will help us calculate it.
IV. Specific situation specific analysis:
Specific analysis, specific analysis, this is the truth that is in the four seas. There is no specific analysis, and it is impossible to add a solution to the problem. Let me talk about the method of analysis. That is, how to find the time-consuming point of the program: (From the easiest way, first explain a function gettickcount (), this function is called once at the head, the return value is reduced by the time consumption, accurate to 1ms)
1 For a relatively time consuming function, run twice, or comment out the statement inside (to ensure that the program can run), look at most (or less) how much time. This method is simple and incorrect. 2 Each place is used with a GetTickCount () function test time, pay attention to getTickCount () can only be accurate to MS. The general less than 10 ms is not too accurate.
3 Use another function queryperformancecounter (& counter) and queryperformancefrequency (& frequency), the CPU clock cycle is calculated, followed by the CPU frequency to remode is time. However, if you want to get to this step, it is recommended to set the process to the highest level to prevent it from being blocked.
Finally, let me handle a program: The program requires me to forget, there is a function in anyway, there is a large loop in the function, and the process is time consuming in the circulation. Result The initial situation was still very fast. The slower it was, the slower it was; I found that the original loop was jumped after the initial cycle, and the number of cycles after the back is increasing. . I found why I was slow, I can treat the disease. My handling is that each cycle is not from the beginning, but starts around the last loop jumped out of the last cycle (because the next loop jumped out of the next time I don't have a small, So, it is also necessary to traverse the front), so the speed of the program is also very fast. I speak this reason is in practical use, to make specific analyte's slow real reasons to achieve the best optimization effect.