Code optimization: which version runs faster?
Recently I wrote a small program containing a function that swaps the values of A and B. Because the function is called very frequently, I wrote three versions of it. Look at the three functions Rev1, Rev2, and Rev3 below and guess which one runs fastest.
The Delphi code is as follows:
unit Unit1;
// Created by bhb, 2004-08-31

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    procedure FormCreate(Sender: TObject);
  private
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}
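
// High-precision timing.
// With bStart = True, stores the current counter value in T0.
// With bStart = False, returns the milliseconds elapsed since T0 was stored.
function XTimer(var T0: Int64; bStart: BOOL): Int64;
var
  T, Fq: Int64;
begin
  Result := 0;
  QueryPerformanceCounter(T);
  if bStart then
    T0 := T
  else
  begin
    T := T - T0;
    QueryPerformanceFrequency(Fq);
    Result := Trunc(T / Fq * 1000);
  end;
end;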
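
// Rev1: assembly. With Delphi's default register calling convention,
// EAX holds the address of A and EDX holds the address of B.
procedure Rev1(var A, B: Integer);
asm
  MOV  ECX, [EDX]
  XCHG [EAX], ECX
  MOV  [EDX], ECX
end;

// Rev2: XOR swap, no temporary variable.
procedure Rev2(var A, B: Integer);
begin
  A := A xor B;
  B := A xor B;
  A := A xor B;
end;

// Rev3: plain swap through a temporary variable.
procedure Rev3(var A, B: Integer);
var
  T: Integer;
begin
  T := A;
  A := B;
  B := T;
end;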

procedure TForm1.FormCreate(Sender: TObject);
type
  TRevFun = procedure(var A, B: Integer);
var
  s: string;
  a, b: Integer;
  t: Int64;

  // Times ten million calls of Rev and appends the result to s.
  procedure Test(const RevName: string; Rev: TRevFun);
  var
    i: Integer;
  begin
    XTimer(t, True); // start
    // ten million iterations
    for i := 1 to 10000000 do
      Rev(a, b);
    t := XTimer(t, False); // end
    s := s + RevName + Format(' time consuming: %d milliseconds.'#13#10, [t]);
  end;

begin
  a := 123;
  b := 321;
  Test('REV1', Rev1);
  Test('REV2', Rev2);
  Test('REV3', Rev3);
  ShowMessage(s);
end;

end.
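
For readers who have not met the XOR trick used in Rev2, the three assignments can be traced with the same starting values the test uses (A = 123, B = 321). The following is only a small console sketch and not part of the original program; the name Rev2Traced is made up for this illustration, and its body is the body of Rev2 with a Writeln after each step.

```pascal
program XorSwapTrace;
{$APPTYPE CONSOLE}

// Hypothetical helper for illustration: the same three XOR steps as Rev2,
// printing both values after every assignment.
procedure Rev2Traced(var A, B: Integer);
begin
  A := A xor B;                        // A now holds (old A) xor (old B)
  Writeln('A = ', A, ', B = ', B);     // A = 314, B = 321
  B := A xor B;                        // cancels old B, leaving old A in B
  Writeln('A = ', A, ', B = ', B);     // A = 314, B = 123
  A := A xor B;                        // cancels old A, leaving old B in A
  Writeln('A = ', A, ', B = ', B);     // A = 321, B = 123
end;

var
  A, B: Integer;
begin
  A := 123;
  B := 321;
  Rev2Traced(A, B);
end.
```

Each step reads and rewrites one of the two variables, so the swap needs no temporary, at the price of three XOR operations in place of plain assignments.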
If you believe assembly is always faster, you might choose Rev1. If you have read books on optimization, you might choose Rev2: almost every textbook writes the swap this way and says the XOR operation is faster. If you have only just started learning to program, you would probably write Rev3 as an exercise, since the method is simple and easy to understand. So which is faster? Below are the execution results on different platforms (the smaller the value, the faster):
AMD Duron 750 MHz results:
P4 2.00 GHz results:
Surprising, isn't it? On both platforms, although all the execution times are short, the ranking by speed is the same (fastest first): Rev3 > Rev2 > Rev1
On the P4 machine the hand-written assembly version is nearly ten times slower than the fastest version, Rev3, and on the AMD machine it is slower still. Rev2 is the "standard answer", on the grounds that the XOR operation is faster, yet the test results say otherwise. Rev3 takes the final victory, which is enough to overturn our habitual thinking: the simplest way of writing the code is also the fastest.

In fact the reason is quite simple. Rev1 needs only three assembly instructions, the fewest of the three functions (the other two are each implemented with six assembly instructions), but the XCHG instruction it executes is slow, which the results on the P4 machine show most clearly. On x86, XCHG with a memory operand implies a bus lock, which makes it expensive.

The same reasoning applies to Rev2. It uses no temporary variable, only XOR operations, so it ought to be faster than Rev3, but it actually loses, because the XOR operations take more time than Rev3's assignment instructions. Rev3 does use a temporary variable to hold the intermediate value, but that costs almost nothing, and all of its remaining assembly instructions are MOV. MOV is the instruction the CPU executes most often, so manufacturers optimize the hardware design for it heavily. It is therefore no surprise that Rev3 wins in the end.