Author: Sune Fischer
Date: 08:53:37 12/02/02
Go up one level in this thread
On December 02, 2002 at 08:59:38, Gerd Isenberg wrote: >On December 02, 2002 at 07:19:06, Sune Fischer wrote: > >>On December 01, 2002 at 17:05:06, Gerd Isenberg wrote: >> >>>oups, something shorter and faster: >>> >>>int getBitIndex(BitBoard singleBit) >>>{ >>> __asm >>> { >>> pxor mm2, mm2 ; 0 >>> movd mm0, [singleBit] >>> punpckldq mm0, [singleBit+4] >>> pcmpeqd mm6, mm6 ; -1 >>> pxor mm7, mm7 ; 0 >>> pcmpeqd mm2, mm0 ; ~mask of the none zero dword >>> PI2FD mm1, mm0 ; 3f8..,400.. >>> pxor mm2, mm6 ; mask of the none zero dword >>> psrlq mm6, 63 ; 01 >>> psrld mm1, 23 ; 3f8 to 7f >>> psrld mm2, 25 ; 7f mask >>> psllq mm6, 32+5 ; 20:00 >>> psubd mm1, mm2 ; - 7f mask >>> por mm1, mm6 ; + 32 in high dword >>> pand mm1, mm2 ; & 7f mask >>> psadbw mm1, mm7 ; add all bytes >>> movd eax, mm1 >>> } >>>} >> >>This is great, I will try it. >> >>What I really need is GetFirstBitAndReset() functions. ><snip> > >>Is it possible to make it xor out the bit it found too? >>Perhaps it is too complicated, in my case I think b&(-b) needs to be in >>assembler, so that the precondition is removed entirely. > > >Hi Sune, > >hmm, lets try it on the fly (i'm at work, so not tested and optimized so far): > >int bitSearchAndReset(BitBoard &bb) >{ > BitBoard lsb = bb & -((__int64)bb); > bb ^= lsb; > return getBitIndex(lsb); // should be inlined >} > >With mmx there is some trouble with the 64-bit twos-complement, because there is >no paddq: > >int bitSearchAndReset(BitBoard &bb) >{ > __asm > { > pxor mm2, mm2 ; 0 > pxor mm3, mm3 ; 0 > pcmpeqd mm6, mm6 ; -1 > pcmpeqd mm1, mm1 ; -1 > > mov eax, [bb] > movq mm0, [eax] ; assume properly aligned bitboard > psrlq mm6, 63 ; 00:01 > pxor mm1, mm0 ; ~bb, ones complement > paddd mm1, mm6 ; +1 but no overflow to high dword > psllq mm6, 32 ; 01:00 > pcmpeqd mm3, mm1 ; look whether low dword is zero due to overflow > psllq mm3, 1 ; shift carry to the right place > pand mm3, mm6 ; ... and mask 1 > paddd mm1, mm3 ; add possible overflow, no we have -bb > pand mm0, mm1 ; lsb = bb & -bb > pxor mm7, mm7 ; 0 > pxor [eax], mm0 ; reset lsb in bb > > pcmpeqd mm2, mm0 ; ~mask of the none zero dword > PI2FD mm1, mm0 ; 3f8..,400.. > pxor mm2, mm6 ; mask of the none zero dword > psrld mm1, 23 ; 3f8 to 7f > psrld mm2, 25 ; 7f mask > psllq mm6, 5 ; 20:00 > psubd mm1, mm2 ; - 7f mask > por mm1, mm6 ; + 32 in high dword > pand mm1, mm2 ; & 7f mask > psadbw mm1, mm7 ; add all bytes > movd eax, mm1 > } >} > >>Is it possible to do a similar optimization on 32 bit? > >may be... > >> >>I have this: oups, there is a serious error! >>uint32 FirstBit32(uint32 bitmap) >>{ >> __asm >> { >> bsf eax, [bitmap] >> jnz done >> mov eax, 0 // That is even true if bit 0 is set !!! >> done: >> } >>} > >should be: > >uint32 FirstBit32(uint32 bitmap) >{ > __asm > { > bsf eax, [bitmap] > jnz done > mov eax, 0xffffffff > done: > } >} It would crash if I tried continuing with that value, so it has to be a value in the range 0-31. I know it is no good for error detection then, but I don't use that anyway, hence my question for the precoditioning :) >or > >int FirstBit32(uint32 bitmap) >{ > __asm > { > bsf eax, [bitmap] > jnz done > mov eax, -1 > done: > } >} signed/unsigned mix is just slowing things down. I always use unsigned when possible, it gives a measurable boost! :) >> >>I would like functions that precondition the bitboard is not empty, ie. that at >>least 1 bit is set. The little function above isn't optimized for that, how do I change it? so would this work? uint32 FirstBit32(uint32 bitmap) { __asm { bsf eax, [bitmap] } } ? I do not want redundant assembler lines, I already know the bitmaps (32 or 64 bits) aren't 0 because they are running inside while loops, so I hope there is no testing of that? >>Thanks :) >>-S. > >or this one for Athlon with PI2FD with reset of the found bit: > >// should return < 0 (0x80000000) if bitmap is zero >// not tested !! > >int FirstBit32WithReset(unsigned int &bitmap) >{ > __asm > { > mov ebx, [bitmap] > xor edx, edx ; 0 > mov eax, [ebx] ; b > sub edx, eax ; -b > and eax, edx ; b & -b > xor [ebx], eax ; reset bit, if any > movd mm0, eax ; hmm... vector path > PI2FD mm0, mm0 ; 0-0; 1-3f8.., 2-400.., 4-408 > movd eax, mm0 > shr eax, 23 ; 0, 7f, 80, 81... > sub eax, 0x7f > and eax, 0x8000001f > } >} > >But there are a lot of register dependencies... Hmm, may be, I just thought it would be "nice" to not have that x&=x-1 everywhere. It's more compact an a little bit less errorprone, sometimes I forget and have to debug an infinite loop stall. I will try them, thanks :) -S. >Gerd
This page took 0.03 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.