Code optimization test: short-loop optimization (Part 1)

zhaozj, 2021-02-16

When I last compared queue performance, SGI-STL was a good deal faster than my implementation (2.3 times faster, and since that measurement was mixed in with a lot of other code, the real gap is probably closer to 3 times). So I decided to improve the quality of my code. That takes steady accumulation, and for now I can only improve things a little at a time; who knows when I will come close to catching up.

I will start with short loops. In a short loop, the loop overhead is large compared with the work the CPU does in each iteration, and such loops turn up fairly often in programs, so optimizing them can bring a significant gain in efficiency. To illustrate the point, I ran the following test. (I copied my Timer.h into the include directory.)

Test environment: C500, 192 MB RAM, Windows 2000 SP3, other foreground programs closed.

Test program 1

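#include <iostream>
#include <Timer.h>   // the author's own timer header, copied into the include directory

using std::cout;
using std::endl;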

// Baseline: one addition per loop test.
void Sum1()
{
    int j = 0;
    for (unsigned i = 1; i < 630001; i++) j += i;
}

// Unrolled 2 ways: two additions per loop test.
void Sum2()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
    }
}

// Unrolled 3 ways.
void Sum3()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
    }
}

// Unrolled 4 ways.
void Sum4()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
}

// Unrolled 5 ways.
void Sum5()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
}

// Unrolled 6 ways.
void Sum6()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
}

// Unrolled 7 ways.
void Sum7()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
}

// Unrolled 8 ways.
void Sum8()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
}

// Unrolled 9 ways.
void Sum9()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
}

// Unrolled 10 ways.
void Sum10()
{
    int j = 0;
    for (unsigned i = 1; i < 630001;)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
}

int main()
{
    Timer timer;
    double t;
    // Repeat the whole series three times to check that the timings are stable.
    for (unsigned k = 0; k < 3; k++)
    {
        timer.Start();
        Sum1();
        t = timer.getTime();
        cout << "Sum1:" << t << endl;
        timer.Start();
        Sum2();
        t = timer.getTime();
        cout << "Sum2:" << t << endl;
        timer.Start();
        Sum3();
        t = timer.getTime();
        cout << "Sum3:" << t << endl;
        timer.Start();
        Sum4();
        t = timer.getTime();
        cout << "Sum4:" << t << endl;
        timer.Start();
        Sum5();
        t = timer.getTime();
        cout << "Sum5:" << t << endl;
        timer.Start();
        Sum6();
        t = timer.getTime();
        cout << "Sum6:" << t << endl;
        timer.Start();
        Sum7();
        t = timer.getTime();
        cout << "Sum7:" << t << endl;
        timer.Start();
        Sum8();
        t = timer.getTime();
        cout << "Sum8:" << t << endl;
        timer.Start();
        Sum9();
        t = timer.getTime();
        cout << "Sum9:" << t << endl;
        timer.Start();
        Sum10();
        t = timer.getTime();
        cout << "Sum10:" << t << endl;
    }
    return 0;
}
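Timer.h itself is not shown in this post. For anyone who wants to rebuild the test, a minimal stand-in might look like the sketch below. It assumes only what the listing uses: a default-constructible Timer with Start() and getTime(), where getTime() returns the elapsed time in milliseconds as a double (the results are reported in ms). The std::chrono implementation is an assumption, not the original header, and it needs a C++11 compiler.

```cpp
// Hypothetical stand-in for the author's Timer.h (the original is not shown).
#include <chrono>

class Timer
{
public:
    // Record the starting point of a measurement.
    void Start() { begin_ = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since the last Start(), as a double.
    double getTime() const
    {
        std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - begin_;
        return elapsed.count();
    }

private:
    // Initialized at construction so getTime() is meaningful even without Start().
    std::chrono::steady_clock::time_point begin_ = std::chrono::steady_clock::now();
};
```

Note that the loops in the test never use j afterwards, so a modern optimizing compiler may remove them entirely; returning j from each function and printing it is one way to keep the work alive.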

Test results 1

VC6 Release build, generated file size 57,344 B, times in ms

Run  Sum1     Sum2     Sum3     Sum4     Sum5     Sum6     Sum7     Sum8     Sum9     Sum10
1    3.83457  2.59251  1.936    1.68038  1.51304  1.81448  1.71642  1.71418  1.44795  1.49488
2    3.82311  2.51848  1.93628  1.71474  1.51276  1.75022  1.7167   1.77564  1.44711  1.42728
3    3.82479  2.51848  1.93544  1.7315   1.51304  1.74994  1.7167   1.81196  1.44739  1.49488

BCC32, generated file size 141,312 B, times in ms

Run  Sum1     Sum2     Sum3     Sum4     Sum5     Sum6     Sum7     Sum8     Sum9     Sum10
1    2.57491  1.9645   1.936    1.75218  1.51276  1.75078  1.74827  1.41862  1.44851  1.43035
2    2.51987  1.92119  1.93656  1.68178  1.5136   1.75022  1.75022  1.41862  1.44795  1.42951
3    2.57491  1.89074  2.03154  1.6815   1.5122   1.75078  1.74715  1.90639  1.44795  1.4974
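The speedups implied by these numbers are simply ratios of the timings. As a small worked example, the sketch below divides the Sum1 time by the Sum5 time for the first run of each compiler, and the VC6 Sum1 time by the BCC32 Sum1 time. The function names speedup and print_speedups are made up for illustration.

```cpp
#include <cstdio>

// Speedup of a variant relative to a baseline: baseline time / variant time.
double speedup(double baseline_ms, double variant_ms)
{
    return baseline_ms / variant_ms;
}

// Print the ratios for the first run of the results above.
void print_speedups()
{
    std::printf("VC6   Sum1/Sum5: %.2f\n", speedup(3.83457, 1.51304));
    std::printf("BCC32 Sum1/Sum5: %.2f\n", speedup(2.57491, 1.51276));
    std::printf("VC6 Sum1 / BCC32 Sum1: %.2f\n", speedup(3.83457, 2.57491));
}
```

The three ratios come out at roughly 2.5, 1.7 and 1.5.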

As the results show, unrolling an accumulation loop really does improve performance. Under VC6, the loop unrolled 5 ways runs about 2.5 times as fast as the loop that is not unrolled. With BCC32 the gain is less obvious, apparently because BCC32 already optimizes the plain loop: only about 1.6 to 1.7 times. BCB supporters are in luck here: without any manual unrolling, the code compiled by BCC32 is about 1.5 times as fast as the code from VC6. Once the loop is unrolled by hand, the two compilers are about level.
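The test loops run 630000 iterations, a count that is divisible by every unroll factor from 2 to 10, so none of the test functions needs cleanup code. For an arbitrary count that is not guaranteed. A minimal sketch of the same 5-way unrolling with a remainder loop might look like this. The names sum_plain and sum_unrolled5 are made up for illustration, and the accumulator is unsigned here so that wraparound on large sums is well defined.

```cpp
// Reference version: add 1..n, one addition per loop test.
unsigned sum_plain(unsigned n)
{
    unsigned j = 0;
    for (unsigned i = 1; i <= n; i++) j += i;
    return j;
}

// Unrolled 5 ways: one loop test per five additions, then a cleanup loop
// for the 0..4 values left over when n is not a multiple of 5.
unsigned sum_unrolled5(unsigned n)
{
    unsigned j = 0;
    unsigned i = 1;
    for (unsigned blocks = n / 5; blocks > 0; blocks--)
    {
        j += i++;
        j += i++;
        j += i++;
        j += i++;
        j += i++;
    }
    for (; i <= n; i++) j += i;   // remainder
    return j;
}
```

Both functions return the sum, which also stops an optimizing compiler from discarding the loop as dead code.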

The best cost-performance point is an unroll factor of 5, not 4, which I did not expect; perhaps someone can explain it. I also expected unrolling to help floating-point operations even more, but that is not what happened (the speed did not even double). Maybe my code was not written well; if anyone has an example, please share it.
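One possible reason, offered as a guess rather than a diagnosis of the floating-point test (which is not shown): if the unrolled loop still adds everything into a single variable, each addition has to wait for the previous one, so removing loop tests helps less. A common variation is to unroll into several independent accumulators and combine them at the end. A minimal sketch, with hypothetical names fsum_plain and fsum_unrolled4:

```cpp
#include <cstddef>

// Single accumulator: every addition depends on the result of the previous one.
double fsum_plain(const double* a, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; i++) s += a[i];
    return s;
}

// Unrolled 4 ways with four independent accumulators, so consecutive
// additions do not depend on each other; a cleanup loop handles the tail.
double fsum_unrolled4(const double* a, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; i++) s += a[i];   // remainder
    return s;
}
```

This changes the order of the additions, so the result can differ from the plain loop in the last bits. Whether it is actually faster depends on the CPU and compiler and would have to be measured.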

Many people use VC6, and often the reason is that the C++ code its compiler produces is said to be the most efficient on Win32; some use only its IDE, without MFC. I don't know whether there is an optimizer that does this kind of unrolling automatically. Without one, unrolling by hand every time is too much trouble.
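Short of an optimizer that unrolls automatically, the repeated statements can at least be generated by the compiler instead of typed by hand. The sketch below assumes a C++11 compiler (so not VC6) and uses a made-up helper, Repeat<N>, that expands a callable N times through template recursion. Whether the result is as fast as hand-written unrolling depends on the compiler inlining the calls, so it would need to be measured.

```cpp
#include <cstddef>

// Repeat<N>::run(f) expands to N consecutive calls of f() at compile time.
template <std::size_t N>
struct Repeat
{
    template <class F>
    static void run(F& f)
    {
        f();
        Repeat<N - 1>::run(f);
    }
};

// End of the recursion: zero calls.
template <>
struct Repeat<0>
{
    template <class F>
    static void run(F&) {}
};

// Same sum as the test functions, unrolled N ways (hypothetical name).
template <std::size_t N>
unsigned sum_unrolled()
{
    static_assert(N > 0 && 630000 % N == 0, "N must divide the trip count");
    unsigned j = 0;
    unsigned i = 1;
    auto step = [&] { j += i++; };   // the statement to be repeated
    while (i < 630001)
    {
        Repeat<N>::run(step);
    }
    return j;
}
```

With this helper, sum_unrolled<5>() plays the role of Sum5, and trying another unroll factor means changing only the template argument.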

When reposting, please cite the original address: https://www.9cbs.com/read-22987.html
