Author: Gerd Isenberg
Date: 05:59:38 12/02/02
On December 02, 2002 at 07:19:06, Sune Fischer wrote:

>On December 01, 2002 at 17:05:06, Gerd Isenberg wrote:
>
>>oops, something shorter and faster:
>>
>>int getBitIndex(BitBoard singleBit)
>>{
>>    __asm
>>    {
>>        pxor      mm2, mm2           ; 0
>>        movd      mm0, [singleBit]
>>        punpckldq mm0, [singleBit+4]
>>        pcmpeqd   mm6, mm6           ; -1
>>        pxor      mm7, mm7           ; 0
>>        pcmpeqd   mm2, mm0           ; ~mask of the nonzero dword
>>        PI2FD     mm1, mm0           ; 3f8..,400..
>>        pxor      mm2, mm6           ; mask of the nonzero dword
>>        psrlq     mm6, 63            ; 01
>>        psrld     mm1, 23            ; 3f8 to 7f
>>        psrld     mm2, 25            ; 7f mask
>>        psllq     mm6, 32+5          ; 20:00
>>        psubd     mm1, mm2           ; - 7f mask
>>        por       mm1, mm6           ; + 32 in high dword
>>        pand      mm1, mm2           ; & 7f mask
>>        psadbw    mm1, mm7           ; add all bytes
>>        movd      eax, mm1
>>    }
>>}
>
>This is great, I will try it.
>
>What I really need is GetFirstBitAndReset() functions.
<snip>
>Is it possible to make it xor out the bit it found too?
>Perhaps it is too complicated, in my case I think b&(-b) needs to be in
>assembler, so that the precondition is removed entirely.

Hi Sune,

hmm, let's try it on the fly (I'm at work, so not tested and not optimized so far):

int bitSearchAndReset(BitBoard &bb)
{
    BitBoard lsb = bb & -((__int64)bb);
    bb ^= lsb;
    return getBitIndex(lsb); // should be inlined
}

With MMX there is some trouble with the 64-bit two's complement, because there
is no paddq, so the carry from the low to the high dword has to be added by hand:

int bitSearchAndReset(BitBoard &bb)
{
    __asm
    {
        pxor      mm2, mm2       ; 0
        pxor      mm3, mm3       ; 0
        pcmpeqd   mm6, mm6       ; -1
        pcmpeqd   mm1, mm1       ; -1
        pcmpeqd   mm5, mm5       ; -1, kept to invert the dword mask later
        mov       eax, [bb]
        movq      mm0, [eax]     ; assume properly aligned bitboard
        movq      mm4, mm0       ; copy of bb for the reset
        psrlq     mm6, 63        ; 00:01
        pxor      mm1, mm0       ; ~bb, one's complement
        paddd     mm1, mm6       ; +1 but no overflow to high dword
        psllq     mm6, 32        ; 01:00
        pcmpeqd   mm3, mm1       ; look whether low dword is zero due to overflow
        psllq     mm3, 1         ; shift carry to the right place
        pand      mm3, mm6       ; ... and mask 1
        paddd     mm1, mm3       ; add possible overflow, now we have -bb
        pand      mm0, mm1       ; lsb = bb & -bb
        pxor      mm7, mm7       ; 0
        pxor      mm4, mm0       ; reset lsb in the copy of bb
        movq      [eax], mm4     ; store it back, pxor cannot write to memory
        pcmpeqd   mm2, mm0       ; ~mask of the nonzero dword
        PI2FD     mm1, mm0       ; 3f8..,400..
        pxor      mm2, mm5       ; mask of the nonzero dword (mm6 no longer holds -1)
        psrld     mm1, 23        ; 3f8 to 7f
        psrld     mm2, 25        ; 7f mask
        psllq     mm6, 5         ; 20:00
        psubd     mm1, mm2       ; - 7f mask
        por       mm1, mm6       ; + 32 in high dword
        pand      mm1, mm2       ; & 7f mask
        psadbw    mm1, mm7       ; add all bytes
        movd      eax, mm1
    }
}

>Is it possible to do a similar optimization on 32 bit?

Maybe...

>
>I have this:

Oops, there is a serious error!

>uint32 FirstBit32(uint32 bitmap)
>{
>    __asm
>    {
>        bsf eax, [bitmap]
>        jnz done
>        mov eax, 0 // 0 is also the result if bit 0 is set !!!
>    done:
>    }
>}

should be:

uint32 FirstBit32(uint32 bitmap)
{
    __asm
    {
        bsf eax, [bitmap]
        jnz done
        mov eax, 0xffffffff
    done:
    }
}

or

int FirstBit32(uint32 bitmap)
{
    __asm
    {
        bsf eax, [bitmap]
        jnz done
        mov eax, -1
    done:
    }
}

>
>I would like functions that precondition the bitboard is not empty, ie. that at
>least 1 bit is set. The little function above isn't optimized for that, how do I change it?
>
>Thanks :)
>-S.

Or this one for Athlon, using PI2FD, with reset of the found bit:

// returns a negative value (sign bit set) if bitmap is zero
// not tested !!
int FirstBit32WithReset(unsigned int &bitmap)
{
    __asm
    {
        mov   ebx, [bitmap]
        xor   edx, edx          ; 0
        mov   eax, [ebx]        ; b
        sub   edx, eax          ; -b
        and   eax, edx          ; b & -b
        xor   [ebx], eax        ; reset bit, if any
        movd  mm0, eax          ; hmm... vector path
        PI2FD mm0, mm0          ; 0-0; 1-3f8.., 2-400.., 4-408
        movd  eax, mm0
        shr   eax, 23           ; 0, 7f, 80, 81...
        sub   eax, 0x7f
        and   eax, 0x8000001f
    }
}

But there are a lot of register dependencies...

Gerd
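For anyone who wants to check the idea without an Athlon at hand, here is a rough portable C++ sketch of the same trick: isolate the lowest bit with b & -b, xor it out, then read the bit index from the exponent of the float conversion (which is what PI2FD followed by the shift by 23 and the subtraction of 0x7f does). The helper name bitIndexViaFloat32 is made up for this illustration, and the plain int-to-float cast is only assumed to be a stand-in for PI2FD; it says nothing about speed compared to the MMX code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: index of the single set bit in a 32-bit word,
// taken from the biased exponent of its float representation.
// A power of two converts to float exactly, so the exponent is the index.
static int bitIndexViaFloat32(std::uint32_t singleBit)
{
    float f = static_cast<float>(singleBit);
    std::uint32_t raw;
    std::memcpy(&raw, &f, sizeof raw);        // reinterpret the float bits
    return static_cast<int>(raw >> 23) - 127; // remove the exponent bias (0x7f)
}

// Portable counterpart of bitSearchAndReset from the post.
// Returns the index (0..63) of the lowest set bit and clears it in bb.
// For an empty bitboard the result is negative and bb stays zero.
static int bitSearchAndReset(std::uint64_t &bb)
{
    std::uint64_t lsb = bb & (0 - bb);        // lsb = bb & -bb
    bb ^= lsb;                                // reset the found bit
    std::uint32_t lo = static_cast<std::uint32_t>(lsb);
    std::uint32_t hi = static_cast<std::uint32_t>(lsb >> 32);
    // Only one dword can be nonzero; add 32 if the bit is in the high one.
    return lo ? bitIndexViaFloat32(lo) : 32 + bitIndexViaFloat32(hi);
}
```

Typical use would be a loop like while (bb) { int sq = bitSearchAndReset(bb); ... } so the empty case never reaches the conversion.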
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.