Writing Programs Is an Attitude (2): A Four-Times-Faster memmove

zhaozj, 2021-02-16

I have noticed an interesting phenomenon on the Internet: some companies' technical interviews like to ask candidates to implement common ANSI C library functions. Last year I was lucky enough to get a question asking me to implement a simple scanf. Questions like this do not look difficult on the surface, but they are hard to think through completely and hard to write well. Without solid fundamentals you will struggle, which is why interviewers use them so often to screen candidates.

In fact, I know some experts who study the C run-time source code to sharpen their skills. Doing so broadens one's horizons: the code is classic, and reading it has helped my own programming a great deal. I hope to use this series to record some of what I have picked up, and to take the opportunity to re-examine some of my ideas in discussion with fellow readers.

First, I would like to thank Darkay, a reader who replied to the first article, "Writing Programs Is an Attitude (1): strcmp", for turning my attention to a more interesting topic. I had mentioned that the MS run-time's C implementations of functions such as strcmp only describe the algorithm, whereas assembly files such as strcmp.asm are the concrete implementations that make full use of the Intel instruction set. To use a perhaps imperfect analogy, strcmp.c is a pseudocode description and strcmp.asm is a concrete implementation: compiling strcmp.c with a given C compiler will most likely not produce code as efficient as strcmp.asm, even though the idea of the algorithm is the same. With that in mind, let us look at the classic memcpy and memmove:

void * __cdecl memcpy (
    void * dst,
    const void * src,
    size_t count
    )
{
    void * ret = dst;

    /*
     * copy from lower addresses to higher addresses
     */
    while (count--) {
        *(char *)dst = *(char *)src;
        dst = (char *)dst + 1;
        src = (char *)src + 1;
    }

    return(ret);
}

void * __cdecl memmove (
    void * dst,
    const void * src,
    size_t count
    )
{
    void * ret = dst;

    if (dst <= src || (char *)dst >= ((char *)src + count)) {
        /*
         * Non-overlapping buffers:
         * copy from lower addresses to higher addresses
         */
        while (count--) {
            *(char *)dst = *(char *)src;
            dst = (char *)dst + 1;
            src = (char *)src + 1;
        }
    }
    else {
        /*
         * Overlapping buffers:
         * copy from higher addresses to lower addresses
         */
        dst = (char *)dst + count - 1;
        src = (char *)src + count - 1;

        while (count--) {
            *(char *)dst = *(char *)src;
            dst = (char *)dst - 1;
            src = (char *)src - 1;
        }
    }

    return(ret);
}

Here I have omitted the #if defined(_M_MRX000) || defined(_M_ALPHA) || defined(_M_PPC) || defined(_M_IA64) conditional-compilation branch: those targets have a separate implementation, and we are concerned only with 32-bit Intel here.
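To see why memmove checks the copy direction, here is a small test sketch of my own (not part of the CRT). forward_copy and shift_right are hypothetical helpers: forward_copy repeats the low-to-high loop of the memcpy listing with char pointers, and shift_right moves the first n bytes of a buffer shift positions to the right in place, so the source and destination ranges overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies from lower to higher addresses one byte at a
   time with no overlap check (the same loop as the memcpy listing). */
static void forward_copy(char *dst, const char *src, size_t count)
{
    while (count--) {
        *dst++ = *src++;
    }
}

/* Hypothetical helper: shift the first n bytes of buf right by `shift`
   positions in place. use_memmove selects the library memmove; otherwise
   the naive forward_copy is used. */
static void shift_right(char *buf, size_t n, size_t shift, int use_memmove)
{
    if (use_memmove) {
        memmove(buf + shift, buf, n);      /* copies as if through a temporary buffer */
    } else {
        forward_copy(buf + shift, buf, n); /* overwrites source bytes before reading them */
    }
}
```

Starting from char a[16] = "abcdef", shift_right(a, 6, 2, 0) leaves "abababab" in a, because each source byte is overwritten before it is read; with use_memmove set to 1 the result is the expected "ababcdef".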

1. The documentation says that memcpy does not handle overlapping memory while memmove does. The code makes the reason obvious: memcpy is just a subset of memmove (its loop is memmove's non-overlapping branch), so whenever the buffers may overlap, use memmove.

2. Can this be expressed more concisely in C? For example, if dst and src are declared as char pointers, the low-to-high copy can be written simply as:

while (count--) {
    *dst++ = *src++;
}

3. What instructions does the compiler generate for the code above? Does it still copy one byte at a time?

4. Question 3 leads to the subtitle of this article: four times the speed.
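Items 2 and 3 can be made concrete with a sketch. my_memmove is a hypothetical name: it keeps the algorithm of the memmove listing above, but declares dst and src as char pointers so the concise *dst++ = *src++ form is legal, and uses pre-decrement for the backward copy. It still moves one byte per iteration, which is exactly what question 3 is getting at. Like the CRT code, it compares pointers that may point into different objects; ISO C leaves that undefined, but it works on flat-memory targets such as 32-bit Intel.

```c
#include <stddef.h>

/* Hypothetical sketch, not the CRT function: same algorithm as the
   memmove listing, written with char pointers. */
void *my_memmove(void *dest, const void *source, size_t count)
{
    char *dst = dest;
    const char *src = source;

    if (dst <= src || dst >= src + count) {
        /* No harmful overlap: copy from lower to higher addresses. */
        while (count--) {
            *dst++ = *src++;
        }
    } else {
        /* dst starts inside [src, src + count): copy from higher to lower. */
        dst += count;
        src += count;
        while (count--) {
            *--dst = *--src;
        }
    }
    return dest;
}
```

Pre-decrementing from one past the end avoids the "count - 1" adjustment of the original listing and never forms a pointer before the start of the buffer.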

Intel 80386 and later processors support the MOVSD instruction; combined with the REP prefix, it moves memory one DWORD (32 bits, four bytes) at a time, four times as much per move as the MOVSB (8-bit) instruction, which in principle makes it four times faster. However, the destination address used with MOVSD should be 32-bit aligned: x86 tolerates misaligned DWORD accesses, but they are slower. The following briefly describes a copy from low to high addresses.

Let L be the total number of bytes to copy and dest the destination start address. Let X be the number of bytes to copy before dest reaches a DWORD boundary, Y the number of DWORDs to copy, and Z the number of bytes left over after the last whole DWORD. Then:

X = (4 - (dest & 3)) & 3   (bytes)    // bytes until the low two address bits are 0, i.e. dest is DWORD-aligned
Y = (L - X) >> 2           (DWORDs)   // the remaining bytes divided by 4
Z = L - X - Y * 4          (bytes)    // leftover tail, equal to (L - X) & 3

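Then handle each part accordingly: copy the X head bytes one at a time, the Y DWORDs with REP MOVSD, and the Z tail bytes one at a time. (If L is smaller than X, simply copy all L bytes one at a time.)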
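The X/Y/Z split can be sketched in portable C. split_copy and copy_dword_aligned are hypothetical names, and this only illustrates the formulas; it is not the CRT's assembly. The middle loop copies 4-byte words through a uint32_t with memcpy, which avoids the undefined behaviour of misaligned or type-punned pointer accesses and which x86 compilers usually reduce to one 32-bit load and store. The sketch assumes the buffers do not overlap and clamps X to L for short copies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a copy of L bytes to address dest into
   x head bytes, y DWORDs and z tail bytes, following the formulas above. */
static void split_copy(uintptr_t dest, size_t L, size_t *x, size_t *y, size_t *z)
{
    *x = (4 - (dest & 3)) & 3;  /* bytes until the low two address bits are 0 */
    if (*x > L)                 /* short copies never reach a boundary */
        *x = L;
    *y = (L - *x) >> 2;         /* whole DWORDs */
    *z = L - *x - *y * 4;       /* leftover tail bytes */
}

/* Hypothetical low-to-high copy: head bytes, then DWORDs written to an
   aligned destination, then tail bytes. Assumes non-overlapping buffers. */
void *copy_dword_aligned(void *dest, const void *source, size_t L)
{
    unsigned char *d = dest;
    const unsigned char *s = source;
    size_t x, y, z;

    split_copy((uintptr_t)d, L, &x, &y, &z);

    while (x--)
        *d++ = *s++;
    while (y--) {
        uint32_t w;
        memcpy(&w, s, 4);       /* the source may still be unaligned */
        memcpy(d, &w, 4);       /* d is 4-byte aligned at this point */
        d += 4;
        s += 4;
    }
    while (z--)
        *d++ = *s++;
    return dest;
}
```

For dest = 1 and L = 10, split_copy gives X = 3, Y = 1 and Z = 3: three bytes bring dest to address 4, one DWORD covers addresses 4 to 7, and three bytes remain.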

To sum up: for copying large blocks of memory, the optimized memmove makes a big difference, while code written in plain C that copies byte by byte loses roughly a factor of four in efficiency, which is why these routines are implemented directly in assembly. In fact, reading programs is also an attitude. I don't know whether I have been splitting hairs here; I hope not.

When reprinting, please cite the original article: https://www.9cbs.com/read-25584.html
