Author: Gerd Isenberg
Date: 12:56:58 12/01/02
Hi all, The fastest 64bit - BitScan with Athlons so far! The AMD-3DNow! Instruction PI2FD does a packed 32-bit (signed) integer to floating-point (IEEE 32-bit) conversion. The PI2FD (direct path, 4 cycles latency) instruction performs the following operations: mmreg1 [31: 0] == float(mmreg2/mem64 [31: 0]) mmreg1 [63:32] == float(mmreg2/mem64 [63:32]) Bit 31 of the float is the sign bit, the eight bits 30:23 are the biased exponent and the remaining 23 bits are the significand. If you pass single-bit bitboards through this instruction, the bitscan idea becomes obvious clear due to the biased base two exponent. bit 0-63 PI2FD 00000001 3F800000 00000002 40000000 00000004 40800000 00000008 41000000 00000010 41800000 .. 40000000 4E800000 80000000 CF000000 ! sign Some additional shift, substraction and masks, and there is faster bitscan routine than with two 32-bit bsf-instructions! // Athlon only due to pswapd and PI2FD // precondition: BBs has exactly one bit set // may be used as bitscan with (b&-b) //=========================================== int getBitIndex(BitBoard singleBit) { __asm { pxor mm2, mm2 ; 0 movd mm0, [singleBit] punpckldq mm0, [singleBit+4] pcmpeqd mm4, mm4 ; -1 pcmpeqd mm3, mm3 ; -1 pcmpeqd mm2, mm0 ; -1-mask for the zero dword PI2FD mm1, mm0 ; 3f8..,400.. psrlq mm4, 63 ; 00:01 pxor mm2, mm3 ; -1-mask for the none zero dword psrld mm3, 25 ; 7f:7f psrld mm1, 23 ; 3f8.. to 7f psllq mm4, 32+5 ; 20:00 to add 32 to high dword psubd mm1, mm3 ; - 7f:7f pand mm1, mm3 ; & 7f:7f por mm1, mm4 ; + 32 to high dword pand mm1, mm2 ; mask apropriate low- or high dword pswapd mm2, mm1 ; swap dwords por mm1, mm2 ; combine low and high dword movd eax, mm1 } } Even if there is some effort to calculate the consts 7f:7f and 20:0, this is shlightly faster than the bsf pair, even with (b&-b) outside the functions! A parallized verions with two bitboards (forBB, toBB) does very well, only 12.5 seconds for 10^9 runs, faster than a single bsf pair. Cheers, Gerd
This page took 0.03 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.