Improve fastRePlace speed (fstrrep.pas)

xiaoxiao2021-03-06  63

{In fact, you can also determine the FindStr in FastReplace. If FindStr is completely Chinese, you can use FastPos in FastReplace, which can increase speed. }

Unit fstrrep;

Interface

TYPE TFASTPOSPROC = Function (Const AsourceString, Afindstring: String; Const Asourcelen, Afindlen, StartPos: Integer: Integer;

Function FastReplace (var AsourceString: string; const AFINDSTRING, AREPLACESTRING: STRING; CASESENSITIVE: BOOLEAN = false: String;

Function Fastpos (Const AsourceString, Afindstring: String; Const Asourcelen, Afindlen, StartPos: Integer: Integer;

Function Fastposnocase (Const AsourceString, Afindstring: String; Const Asourcelen, Afindlen, StartPos: Integer: Integer;

IMPLEMentation

// This TYPE declaration will become apparent later.//The first thing to note here is that I'm passing the SourceLength and FindL // ength. As neither Source nor Find will alter at any point during FastReplace //, there's no need to call the LENGTH subroutine each time function FastPos (const aSourceString, aFindString: String; const aSourceLen, aFindLen, StartPos: integer):! integer; var SourceLen: integer; begin // Next, we determine how many bytes we need to // Scan to find the "start" of afindstring. sourcelen: = asourcelen; sourcelen: = sourcelen - afindlen; if (startPOS-1)> Sourcelen dam Result: = 0; exit; end; sourcelen: = Sourcelen - StartPos; Sourcelen: = Sourcelen 2; // the asm starts here. Asm // delphi Uses ESI, EDI, AND EBX A Lot, // So We Must Preserve Them. Push ESI Push Edi Push EBX // Get The Address of SourceString [1] // and add (startpos-1). // we do this for the purpose of finding // the next ost tecurrence, rather than // always the first! mov EDI, aSourceString add EDI, StartPos Dec EDI // Get the address of aFindString. mov ESI, aFindString // Note how many bytes we need to // look through in aSourceString // to find aFindString. Mov ECX, Sourcelen // Get The First Char of Afindstring; // Note How It Is Done Outside of The // Main Loop, As It Never Changes! Mov Al, [ESI] // Now The FindFirstCharacter loop! @scasb: / / Jet the value of the current // character in AsourceString. // this is equat [= edi ^, That // is what the [] are around [edi]. Mov AH, [EDI] // Compare this character WITH ADESTSTRING [1]. CMP AH, AL // IF THEY '

Re Not Equal we don't // compare the strings. jne @nextchar //iff they're equpar, obviously we do! @compareStrings: // put the length of afindlen in EBX. Mov EBX, AFINDLEN //WE DEC EBX To point to the end of // the string; That is, we don't want to // address 1 if Afindstring is 1 in Length! DEC EBX // Add by ShengQuanhu // IF EBX IS ZERO, THEN We've SuccessFully // Compared Each Character; IE it's a match! // it will be happened when Afindlen = 1 jz @endofmatch // add end

// hRE's another optimization tip. People at this point usually push esi and // so on and then pop esi and so forth the end-instead, I Opted Not to CHAN // GE ESI AND SO ON At All. This Saves LOTS ! of pushing and popping @CompareNext: // Get aFindString character // aFindStringLength (the last char) mov Al, [ESI EBX] // Get aSourceString character (current // position aFindStringLength) mov Ah, [EDI.. EBX] // compare thematches //iff the first character. Mov al, [@ @ m i @ @ ESI] JMP @nextchar @matches: //iff (Point to // Previous Character to Compare). DEC EBX // IF EBX <> 0 ("J" UMP "N" OT "Z" ERO ), we // Continue Comparing strings. jnz @compareNext

// add by shengquanhu @endofmatch: // add end

// if EBX IS ZERO, THEN We've SuccessFully // Compared Each Character; IE's A Match! // Move The Address of the * Current * // Character in EDI. // Note, We Haven't Altered Edi Since // the first char WAS FOUND. MOV EAX, EDI // this is an address, so subtract the // address of astring [1] to get // an activity character position. Sub eax, asourceString // Inc EAX To Make IT 1-based, // Rather Than 0-based. Inc Eax // Put IT INTO Result. Mov Result, Eax // Finish this Routine! Jmp @thend @Nextchar: // this is where i jump to when i want To Continue Searching for the first char // acter of confindstring in ass (AfindString [x]) to // the next character, incodi // dec ECX TELLS US That We've Checked // Another Character, and That We're // Fast Running Out of String to Check! DEC ECX // IF EBX <> 0, Then Continue Scanning // First Character.//add by ShengQuanhu // IF A H is Chinese char, JUMP AGAIN JZ @ Result0 CMP AH, $ 80 JB @Scasb Inc Edi Dec ECX // Add by ShengQuanhu End

JNZ @scasb

// add by shengquanhu @ result0: // address by hngquanhu end

// if EBX = 0, THEN MOVE 0 INTO Result. Mov Result, 0 // Restore EBX, EDI, ESI for Delphi /// 帖子're popped in the // Opposite Order THEY WERE PUSHED . @Thend: POP EBX POP EDI POP ESI end;

// This routine is an identical copy of FastPOS except where commented! The ide // a is that when grabbing bytes, it ANDs them with $ df, effectively making the // m lowercase before comparing. Maybe this would be quicker if aFindString was // made lowercase in one fell swoop at the beginning of the function, saving a // n AND instruction each time.function FastPosNoCase (const aSourceString, aFindString: String; const aSourceLen, aFindLen, StartPos: integer): integer; var SourceLen: INTEGER; Begin Sourcelen: = Asourcelen; Sourcelen: = Sourcelen - Afindlen; IF (StartPOS-1)> Sourcelen Then Begin Result: = 0; EXIT; End; Sourcelen: = Sourcelen - StartPos; Sourcelen: = Sourcelen 2; ASM PUSH ESI PUSH EDI PUSH Ebxmov Edi, AsourceString Add Edi, StartPos Dec Edi Mov ESI, AFINDSTRING MOV ECX, SOURCEN MOV Al, [ESI]

// add by shengquanhu: Just Modified The Lowercase 'A' .. 'Z' CMP Al, $ 7A JA @scasb

CMP Al, $ 61 JB @scasb // end -------------------------------------------------------- -

// Make Al Uppercase. And Al, $ DF

@Scasb: MOV AH, [EDI]

// add by shengquanhu: Just Modified The Lowercase 'A' .. 'Z' CMP AH, $ 7A JA @comparechar

CMP AH, $ 61 JB @comparechar // end ----------------------------------------------------------------------- -

// Make Ah Uppercase. And Ah, $ DF

@Comparechar: CMP AH, Al jne @nextchar @compareStrings: MOV EBX, AFINDLEN DEC EBX

// add by hngquanhu jz @endofmatch // address end

@CompareNext: MOV Al, [ESI EBX] MOV AH, [EDI EBX]

// add by shengquanhu: Just Modified The Lowercase 'A'. 'Z' CMP AH, $ 7A JA @Lowrah

CMP Al, $ 61 JB @Lowrah // end ---------------------------------------- - // Make Al and Ah Uppercase. And Al, $ DF

// add by shengquanhu: Just Modified The Lowercase 'A'. 'Z' @Lowrah: CMP AH, $ 7A JA @ Comparechar2

CMP AH, $ 61 JB @ CompareChar2 // end ---------------------------------------------------------------------------------------------------- -

And Ah, $ DF

@ Comparechar2: CMP Al, Ah JZ @Matches Mov Al, [ESI]

// Add by Shengquanhu: Just Modified The Lowercase 'A' .. 'Z' CMP Al, $ 7A JA @nextchar

CMP Al, $ 61 JB @nextchar // end -------------------------------------------------------------------------------------------------------------------------------- -

// Make Al Uppercase. And Al, $ DF JMP @nextchar @matches: DEC EBX JNZ @compareNext

// add by shengquanhu @endofmatch: // add end

Mov Eax, EDI SUB EAX, AsourceString Inc EAX MOV Result, EAX JMP @thend @Nextchar: Inc Edi Dec ECX // Add by ShengQuanhu // IF AH IS Chinese Char, Jump Again JZ @ Result0 CMP AH, $ 80 JB @Scasb Inc EDI DEC ECX // Add by Shengquanhu End Jnz @scasb @ Result0: Mov Result, 0 @Theend: Pop EBX POP EDI POP ESI End;

// My Move Isn't As Fast As Move When Source and Destination Aree Both DWORD AL // IGNED, But It's Certainly Faster When're NOT. AS We're Moving Charac // Ters in A String, IT ISN ' very likely at all that both source and destinat // ion are DWord aligned, so moving bytes avoids the cycle penalty of reading / w // riting DWords across physical boundaries.procedure MyMove (const Source; var Dest; Count: Integer); asm // NOTE: WHEN THIS FUNCTION IS CALED, // Delphi Passs The parameters as stock: // ECX = count // Eax = const Source // EDX = var dest Source // EDX = var dest //10 =//10 altogether;... there's no point pushing registers cmp ECX, 0 Je @JustQuit // Preserve the critical Delphi registers push ESI push EDI // Move Source into ESI (generally the // SOURCE register) // Move Dest into EDI (generally THE DEST // Register for string commands). // this might not actually be necessary, // as I'm not use movsb etc. // i might becom b e j; // there could be a penalty for not use // ESI, EDI, But I doubt it. // this is another thinking! Mov ESI, EAX MOV EDI, EDX // The Following loop is the Same as repnz // Movsb, But Oddly Quicker! @Loop: // Get The Source Byte. Mov Al, [ESI] // Point to Next Byte. Inc ESI // Put IT ITE DEST. MOV [EDI], Al // Point Dest To NEXT POSITION. INC EDI // DEC ECX TO Note How Many We Have Left to Copy. Dec ECX // IF ECX ​​<> 0, Then Loop. JNZ @

Loop // Another Optimization Note. // MOV Al, [ESI] // MOV [EDI], Al // Inc ESI // Inc ESI // There's a hidden problem here. I won 't go into too much detail, but the Pe // ntium can continue processing instructions while it's still working out the // result of INC ESI or INC EDI. If, however, you use them while they're stil // l being Calculated, The Processor Will Stop Until The're Calculate (a Pen // Alty). Therefore, I AlTer ESI and EDI AS FAR in Advance As Possible of Using // Them. // Pop The Critical Delphi Registers // That We ' Ve Altered. Pop Edi Pop ESI @Justquit: End;

// POINT 1: I Pass Var AsourceString Rather Than Just AsourceString. This is be // cause I'll Just Be Passed a Pointer To The Data Rather Than A 10m Copy of T // He Data Itself, Which Is Much Quicker! Function FastReplace (var aSourceString: String; const aFindString, aReplaceString: String; CaseSensitive: Boolean = False): String; var // size already passed to SetLength, // the REAL size of RESULT ActualResultLen, // Position of aFindString is aSourceString.. Currentpos, // Last Position The Afindstring Was Found At. Lastpos, // Bytes To Copy (That IS, Lastpos To this POS). Bytestocopy, // The "Running" Result Length, NOT The Actual One. Resultlen, // Length .. of aFindString, to save // ​​calling LENGTH repetitively FindLen, // Length of aReplaceString, for the same reason ReplaceLen, SourceLen: Integer; // This is where I explain the // TYPE TFastPosProc from earlier FastPosProc:! TFastPosProc; begin // as this function has the option of being case-inse NSITIVE, I'D NEED TO CALL / / EIGEER FASTPOS OR FASTPOSNOCASE. The problem is this // withnin a loop. this is a bad idea, since the result never change, Since the Result Never change -In Which Case We CAN DETERMINE ITIN ADVANCE, LIKE SO / /: IF CaseSensitive Ten FastPosProc: = fastpos else fastposproc: = fastposnocase; // i don't think i actually need // this, But I don't really mind !............................................................. ..

// If we already have room for the replacements, // then set the length of the result to // the length of the SourceString if ReplaceLen <= FindLen then ActualResultLen:. = SourceLen else // If not, we need to calculate the // worst-case scenario // That is, the Source consists ONLY of // aFindString, and we're going to replace // every one of them ActualResultLen:.! = SourceLen (SourceLen * ReplaceLen div FindLen) ReplaceLen; // set the length of results; this // Will Assign THE MEMORY, ETC. SETLENGTLEN; CURRENTPOS: = 1; Resultlen: = 0; Lastpos: = 1; // Again, I'm Eliminating anix statement in a loop by repeating code-this ap // proach results in very slightly larger code, but if ever you can trade some // memory in exchange for speed, go for it! if ReplaceLen> 0 then begin repeat // Get the Position of the first (or next) // AfindString in AsourceString. // Note That There's No if CasensITIVE, // I just call FastPOSProc, which is pointing // to the correct pre-determined routine CurrentPos:.. = FastPosProc (aSourceString, aFindString, SourceLen, FindLen, CurrentPos); // If 0, then we're finished if CurrentPos = 0 THEN BREAK; // Number of Bytes To Copy from the // Source String IS Currentpos - Lastpos, // IE "cat" in "the cat the". Bytestocopy: = currentpos-lastpos; // Copy Chars from astring /// To the end of results. mymove (asourceString [lastpos], result [resultlen 1], bytestocopy;

// Copy chars from aReplaceString to // the end of Result MyMove (aReplaceString [1], Result [ResultLen 1 BytesToCopy], ReplaceLen);. // Remember, using COPY would copy all of // the data over and over .. again // Never fall into this trap (like a certain // software company did) // Set the running length to ResultLen: = ResultLen BytesToCopy ReplaceLen; // Set the position in aSourceString to where // we want to . continue searching from currentPos: = currentPos FindLen; LastPos: = currentPos; until false; end else begin // You might have noticed If ReplaceLen> 0. // Well, if ReplaceLen = 0, then we're deleting the // substrings, rather than replacing them, so we // do not need the extra MyMove from aReplaceString repeat currentPos: = FastPos (aSourceString, aFindString, SourceLen, FindLen, currentPos); if currentPos = 0 then break; BytesToCopy:. = CurrentPos- Lastpos; MyMove (aSourceString [LastPos], Result [ResultLen 1], BytesToCopy); ResultLen: = ResultLen BytesToCopy ReplaceLen; CurrentPos: = CurrentPos FindLen; LastPos: = CurrentPos; until false; end; // Now that we've Finished doing all of the replace, i Just Need to adjut: dec (Lastpos); // now i set the length ip. That IS, "M / / at "When Replacing" The "in" sat on the mat ". setlength (result, resultlen (sourcelen-lastpos); // if there '

Sa bit of string dangling, life // add it to the end of our string. if LastPos 1 <= Sourcelen Ten Mymove (AsourceString [LastPos 1], Result [Resultlen 1], Sourcelen-Lastpos; END; END; .

转载请注明原文地址:https://www.9cbs.com/read-109125.html

New Post(0)