This version removes the pairing between the MMX instructions and the ordinary assembly instructions. With the code regrouped this way it should read much more intuitively, and it should be a real help in understanding the bitboard principle. Information about the MMX instructions themselves can be found by searching Google.com. The make_bitboard() function is code from my own program; for reasons of space I cannot list all of the code, but I trust you are smart enough to work out roughly what it does. Interestingly, after regrouping the code I found that one assembly instruction is redundant; I do not know whether it was there for pairing or for some other reason. (It was a long time ago and I no longer remember; you can check this yourself near the end of the code.) Bear in mind as well that code whose instructions are not fully paired, like this version, runs much less efficiently. Note: the font has been enlarged slightly; typesetting on 9CBS is difficult, but it seems to look acceptable.
#define THITHER_COLOR(color) ((~(color)) & 0x03)
typedef struct {
    uint8 board[BOARD_ROWS + 2][BOARD_COLS + 2];
} board_type;
static unsigned __int64 dir_mask0;
static unsigned __int64 dir_mask1;
static unsigned __int64 dir_mask2;
static unsigned __int64 dir_mask3;
static unsigned __int64 dir_mask4;
static unsigned __int64 dir_mask5;
static unsigned __int64 dir_mask6;
static unsigned __int64 dir_mask7;
static unsigned __int64 c0f;
static unsigned __int64 c33;
static unsigned __int64 c55;
void init_mmx(void)
{
    dir_mask0 = 0x007e7e7e7e7e7e00;
    dir_mask1 = 0x00ffffffffffff00;
    dir_mask2 = 0x007e7e7e7e7e7e00;
    dir_mask3 = 0x7e7e7e7e7e7e7e7e;
    dir_mask4 = 0x7e7e7e7e7e7e7e7e;
    dir_mask5 = 0x007e7e7e7e7e7e00;
    dir_mask6 = 0x00ffffffffffff00;
    dir_mask7 = 0x007e7e7e7e7e7e00;
    c0f = 0x0f0f0f0f0f0f0f0f;
    c33 = 0x3333333333333333;
    c55 = 0x5555555555555555;
}
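The masks matter because every search step is a plain shift of the whole board. Before shifting, the opponent's discs are ANDed with a mask that removes the border squares for that direction. Those squares can never be bracketed in that direction, and for the horizontal and diagonal shifts they would otherwise let a run of discs wrap into the neighbouring row. As a rough scalar sketch of the idea behind the MMX routine further down (this is not the author's code; `candidate_moves`, `kMask` and `kShift` are hypothetical names, and an 8x8 board held in one 64-bit word is assumed):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar restatement of the shift-and-mask search.
// One entry per axis; each axis is walked in both directions.
static const std::uint64_t kMask[4] = {
    0x007e7e7e7e7e7e00ULL,  // diagonal, shift 9       (dir_mask0 / dir_mask7)
    0x00ffffffffffff00ULL,  // vertical, shift 8       (dir_mask1 / dir_mask6)
    0x007e7e7e7e7e7e00ULL,  // anti-diagonal, shift 7  (dir_mask2 / dir_mask5)
    0x7e7e7e7e7e7e7e7eULL   // horizontal, shift 1     (dir_mask3 / dir_mask4)
};
static const int kShift[4] = {9, 8, 7, 1};

// Returns a bitboard of the empty squares where `my` could move.
std::uint64_t candidate_moves(std::uint64_t my, std::uint64_t opp) {
    assert((my & opp) == 0);  // a square cannot hold both colours
    std::uint64_t moves = 0;
    for (int d = 0; d < 4; ++d) {
        const std::uint64_t o = opp & kMask[d];  // opponent discs that may be flipped
        const int s = kShift[d];
        // Disc #1: opponent discs directly adjacent to one of my discs.
        std::uint64_t up = (my << s) & o;
        std::uint64_t down = (my >> s) & o;
        // Discs #2 to #6: extend each run by one opponent disc per step.
        for (int k = 0; k < 5; ++k) {
            up |= (up << s) & o;
            down |= (down >> s) & o;
        }
        // The square just past each run is a pseudo-feasible move.
        moves |= (up << s) | (down >> s);
    }
    return moves & ~(my | opp);  // keep only empty squares
}
```

The mobility is the number of set bits in the returned value. Collecting all four axes in one loop is a readability choice for this sketch; the assembly version instead handles the three 64-bit axes in MMX registers and the horizontal axis in the 32-bit integer registers.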
void make_bitboard(board_type *board_ptr, BitBoard &my_bits, BitBoard &opp_bits, uint8 objcolor)
{
    uint8 curcolor;
    uint8 thithercolor = THITHER_COLOR(objcolor);
    /*
    my_bits.high = 0;
    my_bits.low = 0;
    opp_bits.high = 0;
    opp_bits.low = 0;
    */
    unsigned __int64 power = 0x0000000000000001;
    unsigned __int64 my_bits64 = 0x0000000000000000;
    unsigned __int64 opp_bits64 = 0x0000000000000000;

    /* Walk the board from the last square to the first; each square gets the next higher bit. */
    for (int i = BOARD_ROWS; i >= 1; i--) {
        for (int j = BOARD_COLS; j >= 1; j--) {
            curcolor = board_ptr->board[i][j];
            if (curcolor == objcolor)
                my_bits64 |= power;
            else if (curcolor == thithercolor)
                opp_bits64 |= power;
            power <<= 1;
        }
    }

    /* Split each 64-bit board into its two 32-bit halves. */
    my_bits.high = (unsigned long)(my_bits64 >> 32);
    my_bits.low = (unsigned long)(my_bits64 & 0xffffffff);
    opp_bits.high = (unsigned long)(opp_bits64 >> 32);
    opp_bits.low = (unsigned long)(opp_bits64 & 0xffffffff);
}
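Reading the loops in make_bitboard(), the first square visited (row BOARD_ROWS, column BOARD_COLS) receives bit 0 and the last one (row 1, column 1) receives the top bit; the 64-bit result is then split into two 32-bit halves. A small sketch of that mapping, assuming an 8x8 board and a BitBoard with 32-bit `high` and `low` fields (the type itself is not shown in the listing; `square_bit`, `Halves`, `split_bits` and `join_bits` are illustrative names only):

```cpp
#include <cassert>
#include <cstdint>

// Bit assigned to (row, col), both numbered 1..8 as in the loops above.
// Assumes BOARD_ROWS == BOARD_COLS == 8.
std::uint64_t square_bit(int row, int col) {
    assert(row >= 1 && row <= 8 && col >= 1 && col <= 8);
    return 1ULL << ((8 - row) * 8 + (8 - col));
}

// Stand-in for the two fields of BitBoard that the listing uses.
struct Halves {
    std::uint32_t high;
    std::uint32_t low;
};

// 64-bit board -> two 32-bit halves, as done at the end of make_bitboard().
Halves split_bits(std::uint64_t bits) {
    Halves h;
    h.high = static_cast<std::uint32_t>(bits >> 32);
    h.low = static_cast<std::uint32_t>(bits & 0xffffffffULL);
    return h;
}

// Two halves -> 64-bit board, the scalar counterpart of MOVD / PSLLQ 32 / POR.
std::uint64_t join_bits(Halves h) {
    return (static_cast<std::uint64_t>(h.high) << 32) | h.low;
}
```

With this layout, a shift by 1 moves one column, a shift by 8 moves one row, and shifts by 7 and 9 move along the two diagonals, which is why those four shift amounts appear in the assembly.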
int bitboard_mobility(const BitBoard my_bits, const BitBoard opp_bits)
{
    unsigned int count; // unsigned int count = 0;
    __asm {
        // push eax
        /*
        push ecx
        push edx
        push ebx
        push esi
        push edi
        //*/

        /* Ready for init data */
        // mov eax, 0
        mov ebx, my_bits.high
        mov ecx, my_bits.low
        mov edi, opp_bits.high
        mov esi, opp_bits.low

        movd mm0, ebx
        psllq mm0, 32
        movd mm3, ecx
        por mm0, mm3            ; mm0 is the bitboard of my_bits
        movd mm1, edi
        psllq mm1, 32
        movd mm4, esi
        por mm1, mm4            ; mm1 is the bitboard of opp_bits
        pxor mm2, mm2           ; mm2 <- 0x0000000000000000

        /* shift = -9   rowdelta = -1   coldelta = -1 */
        /* shift =  9   rowdelta =  1   coldelta =  1 */

        /* Disc #1, flip direction 0. */
        /* Disc #1, flip direction 7. */
        movq mm3, mm1
        movq mm4, mm0
        movq mm6, mm0
        pand mm3, dir_mask0     ; 0x007e7e7e7e7e7e00
        psllq mm4, 9
        psrlq mm6, 9
        pand mm4, mm3
        pand mm6, mm3

        /* Disc #2, flip direction 0. */
        /* Disc #2, flip direction 7. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 9
        psrlq mm7, 9
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #3, flip direction 0. */
        /* Disc #3, flip direction 7. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 9
        psrlq mm7, 9
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #4, flip direction 0. */
        /* Disc #4, flip direction 7. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 9
        psrlq mm7, 9
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #5, flip direction 0. */
        /* Disc #5, flip direction 7. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 9
        psrlq mm7, 9
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #6, flip direction 0. */
        /* Disc #6, flip direction 7. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 9
        psrlq mm7, 9
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        psllq mm4, 9
        psrlq mm6, 9
        por mm2, mm4
        por mm2, mm6

        /* *************************** */
        /* Horizontal flips, shifting left, done per 32-bit half in the integer registers. */
        push esi
        push edi
        push ecx
        push ebx
        and edi, 0x7e7e7e7e     ; every byte of the mask is 01111110
        and esi, 0x7e7e7e7e

        shl ebx, 1
        shl ecx, 1
        and ebx, edi
        and ecx, esi

        mov eax, ebx
        mov edx, ecx
        shl edx, 1
        shl eax, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shl edx, 1
        shl eax, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shl edx, 1
        shl eax, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shl edx, 1
        shl eax, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shl edx, 1
        shl eax, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        shl ebx, 1
        shl ecx, 1
        /* *************************** */

        /* shift = -8   rowdelta = -1   coldelta = 0 */
        /* shift =  8   rowdelta =  1   coldelta = 0 */

        /* Disc #1, flip direction 1. */
        /* Disc #1, flip direction 6. */
        movq mm3, mm1
        movq mm4, mm0
        movq mm6, mm0
        pand mm3, dir_mask1     ; 0x00ffffffffffff00: top and bottom rows cleared
        psllq mm4, 8
        psrlq mm6, 8
        pand mm4, mm3
        pand mm6, mm3

        /* Disc #2, flip direction 1. */
        /* Disc #2, flip direction 6. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 8
        psrlq mm7, 8
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Serialize here: add horizontal shl flips. */
        movd mm5, ebx
        psllq mm5, 32
        movd mm7, ecx
        por mm5, mm7
        por mm2, mm5

        /* Disc #3, flip direction 1. */
        /* Disc #3, flip direction 6. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 8
        psrlq mm7, 8
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #4, flip direction 1. */
        /* Disc #4, flip direction 6. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 8
        psrlq mm7, 8
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #5, flip direction 1. */
        /* Disc #5, flip direction 6. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 8
        psrlq mm7, 8
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #6, flip direction 1. */
        /* Disc #6, flip direction 6. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 8
        psrlq mm7, 8
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        psllq mm4, 8
        psrlq mm6, 8
        por mm2, mm4
        por mm2, mm6

        /* *************************** */
        /* Horizontal flips, shifting right. */
        pop ebx                 ; reload the original my_bits halves
        pop ecx
        push ecx
        push ebx

        shr ebx, 1
        shr ecx, 1
        and ebx, edi            ; edi = opp_bits.high & 0x7e7e7e7e
        and ecx, esi            ; esi = opp_bits.low  & 0x7e7e7e7e

        mov eax, ebx
        mov edx, ecx
        shr eax, 1
        shr edx, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shr eax, 1
        shr edx, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shr eax, 1
        shr edx, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shr eax, 1
        shr edx, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shr eax, 1
        shr edx, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        mov eax, ebx
        mov edx, ecx
        shr eax, 1
        shr edx, 1
        and eax, edi
        and edx, esi
        or ebx, eax
        or ecx, edx

        shr ebx, 1
        shr ecx, 1
        /* *************************** */

        /* shift = -7   rowdelta = -1   coldelta =  1 */
        /* shift =  7   rowdelta =  1   coldelta = -1 */

        /* Disc #1, flip direction 2. */
        /* Disc #1, flip direction 5. */
        movq mm3, mm1
        movq mm4, mm0
        movq mm6, mm0
        pand mm3, dir_mask2     ; 0x007e7e7e7e7e7e00
        psllq mm4, 7
        psrlq mm6, 7
        pand mm4, mm3
        pand mm6, mm3

        /* Disc #2, flip direction 2. */
        /* Disc #2, flip direction 5. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 7
        psrlq mm7, 7
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #3, flip direction 2. */
        /* Disc #3, flip direction 5. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 7
        psrlq mm7, 7
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #4, flip direction 2. */
        /* Disc #4, flip direction 5. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 7
        psrlq mm7, 7
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Disc #5, flip direction 2. */
        /* Disc #5, flip direction 5. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 7
        psrlq mm7, 7
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        /* Serialize here: add horizontal shr flips. */
        movd mm5, ebx
        psllq mm5, 32
        movd mm7, ecx
        por mm5, mm7
        por mm2, mm5
        pop ebx
        pop ecx
        pop edi
        pop esi

        /* Disc #6, flip direction 2. */
        /* Disc #6, flip direction 5. */
        movq mm5, mm4
        movq mm7, mm6
        psllq mm5, 7
        psrlq mm7, 7
        pand mm5, mm3
        pand mm7, mm3
        por mm4, mm5
        por mm6, mm7

        psllq mm4, 7
        psrlq mm6, 7
        por mm2, mm4
        por mm2, mm6

        /* mm2 is the pseudo-feasible moves at this point. */
        /* Let mm7 be the feasible moves, i.e., mm2 restricted to empty squares. */
        movq mm7, mm0
        por mm7, mm1
        pandn mm7, mm2

        /* Count the moves, i.e., the number of bits set in mm7. */
        movq mm1, mm7
        psrld mm7, 1
        pand mm7, c55           ; c55 = 0x5555555555555555
        psubd mm1, mm7
        movq mm7, mm1
        psrld mm1, 2
        pand mm7, c33           ; c33 = 0x3333333333333333
        pand mm1, c33
        paddd mm7, mm1
        movq mm1, mm7
        psrld mm7, 4
        paddd mm7, mm1
        pand mm7, c0f           ; c0f = 0x0f0f0f0f0f0f0f0f
        movq mm1, mm7
        psrld mm7, 8
        paddd mm7, mm1
        movq mm1, mm7
        psrld mm7, 16
        paddd mm7, mm1
        movq mm1, mm7
        psrlq mm7, 32
        paddd mm7, mm1
        movd eax, mm7
        and eax, 63
        mov count, eax

        emms                    ; clear the MMX state before any later FPU code
        /*
        pop edi
        pop esi
        pop ebx
        pop edx
        pop ecx
        //*/
        // pop eax
    }
    return count;
}

Attached: MMX data structure

Multimedia software has several distinctive features:
1. Small integer data types (graphics data is 8 bits, audio data is 16 bits);
2. Frequent, repeated arithmetic on small integer data (for example, in core algorithms);
3. Many operations are inherently parallel (for example, the same addition, subtraction, or multiplication applied to a large amount of data).

MMX technology defines a set of basic, general-purpose packed-integer instructions, 57 in total. "Packed integer data" means several 8-, 16-, or 32-bit integers combined into a single 64-bit value. The MMX instructions operate mainly on this packed data, which comes in four types: packed byte, packed word, packed doubleword, and quadword.

Packed byte: 8 bytes combined into one 64-bit value;
Packed word: 4 words combined into one 64-bit value;
Packed doubleword: 2 doublewords combined into one 64-bit value;
Quadword: one 64-bit value.

In this way a single MMX instruction can process 8, 4, or 2 data elements at once, which is the so-called "single instruction, multiple data" (SIMD) structure. This structure is the most fundamental reason MMX technology improves machine performance.

To make 64-bit packed integer data convenient to use, MMX technology provides eight 64-bit MMX registers (MM0 to MM7), which only MMX instructions can use. It is worth noting that the MMX registers are accessed randomly (each one is addressed directly by name), but they are actually implemented on top of the eight floating-point data registers. The floating-point unit (FPU) has eight floating-point registers (FPR) that are accessed as a stack. Each floating-point data register is 80 bits wide: the high 16 bits hold the exponent and sign, and the low 64 bits hold the significand. MMX uses the 64-bit significand portion as a randomly accessed 64-bit MMX register.
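To make the packed-byte idea concrete without requiring MMX hardware, here is a minimal sketch that emulates two of the packed additions on an ordinary 64-bit integer. It is an illustration only: the function names `paddb` and `paddusb` are borrowed from the instruction mnemonics for readability, and real MMX code would simply issue those instructions.

```cpp
#include <cassert>
#include <cstdint>

// Eight independent byte additions with wraparound (the behaviour of PADDB).
std::uint64_t paddb(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t top = 0x8080808080808080ULL;
    // Add the low 7 bits of every byte, then patch in the top bits with XOR
    // so that no carry leaks into the neighbouring byte.
    return ((a & ~top) + (b & ~top)) ^ ((a ^ b) & top);
}

// Eight independent byte additions clamped at 0xff (the behaviour of PADDUSB).
std::uint64_t paddusb(std::uint64_t a, std::uint64_t b) {
    std::uint64_t result = 0;
    for (int lane = 0; lane < 8; ++lane) {
        unsigned x = static_cast<unsigned>((a >> (8 * lane)) & 0xff);
        unsigned y = static_cast<unsigned>((b >> (8 * lane)) & 0xff);
        unsigned sum = x + y;
        if (sum > 0xff) sum = 0xff;  // saturate instead of wrapping
        result |= static_cast<std::uint64_t>(sum) << (8 * lane);
    }
    return result;
}

// Usage: adding 1 to a byte holding 0xff wraps to 0x00 or saturates at 0xff,
// and in neither case does the neighbouring byte change.
void demo_packed_add() {
    assert(paddb(0x00000000000000ffULL, 0x0000000000000001ULL) == 0x0000000000000000ULL);
    assert(paddusb(0x00000000000000ffULL, 0x0000000000000001ULL) == 0x00000000000000ffULL);
}
```

The point of the hardware instructions is that all eight lanes are handled in one operation, with no masking or looping as in this emulation.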
MMX instruction set

1. Arithmetic:
   PADD[B,W,D]        wraparound add [byte, word, doubleword]
   PADDS[B,W]         signed saturating add [byte, word]
   PADDUS[B,W]        unsigned saturating add [byte, word]
   PSUB[B,W,D]        wraparound subtract [byte, word, doubleword]
   PSUBS[B,W]         signed saturating subtract [byte, word]
   PSUBUS[B,W]        unsigned saturating subtract [byte, word]
   PMULHW             packed word multiply, keep the high half of each product
   PMULLW             packed word multiply, keep the low half of each product
   PMADDWD            packed word multiply, then add adjacent products
2. Comparison:
   PCMPEQ[B,W,D]      packed compare for equal [byte, word, doubleword]
   PCMPGT[B,W,D]      packed compare for greater than [byte, word, doubleword]
3. Type conversion:
   PACKUSWB           pack with unsigned saturation [word to byte]
   PACKSS[WB,DW]      pack with signed saturation [word to byte, doubleword to word]
   PUNPCKH[BW,WD,DQ]  unpack (interleave) high half [byte to word, word to doubleword, doubleword to quadword]
   PUNPCKL[BW,WD,DQ]  unpack (interleave) low half [byte to word, word to doubleword, doubleword to quadword]
4. Logical operations:
   PAND               packed logical AND
   PANDN              packed logical AND NOT
   POR                packed logical OR
   PXOR               packed logical exclusive OR
5. Shift:
   PSLL[W,D,Q]        packed logical shift left [word, doubleword, quadword]
   PSRL[W,D,Q]        packed logical shift right [word, doubleword, quadword]
   PSRA[W,D]          packed arithmetic shift right [word, doubleword]
6. Data transfer:
   MOV[D,Q]           move to or from an MMX register [doubleword, quadword]
7. State clearing:
   EMMS               clear the MMX state
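Several of these instructions appear in the bit-counting tail of bitboard_mobility(), where PSRLD, PAND, PSUBD and PADDD are combined with the constants c55, c33 and c0f to count the set bits of both 32-bit halves at once. The same steps written as scalar C++ may be easier to follow; this is a sketch for reading purposes, and `count_bits32` and `count_bits64` are names made up here:

```cpp
#include <cassert>
#include <cstdint>

// Counts the set bits of one 32-bit half, step for step like the MMX code.
unsigned count_bits32(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                  // 2-bit sums (c55)
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit sums (c33)
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                  // 8-bit sums (c0f)
    x = x + (x >> 8);                                  // fold bytes together
    x = x + (x >> 16);
    return x & 63u;  // the low byte holds the total, which is at most 32
}

// Counts the set bits of a whole bitboard by adding the two halves.
unsigned count_bits64(std::uint64_t bits) {
    return count_bits32(static_cast<std::uint32_t>(bits >> 32)) +
           count_bits32(static_cast<std::uint32_t>(bits & 0xffffffffULL));
}

// Usage: a bitboard with four candidate squares has a mobility of 4.
void demo_count() {
    assert(count_bits64(0x0000102004080000ULL) == 4u);
}
```

One difference from the assembly is deliberate: the MMX code adds the two halves first and masks with 63 afterwards, which is fine for move counts because they never reach 64. Masking each half separately, as here, also gives the right answer for a completely full word.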