Author: Gerd Isenberg
Date: 09:51:45 07/13/03
Go up one level in this thread
On July 13, 2003 at 06:57:48, Bas Hamstra wrote: >Coicidentally I am working a bit on LastOne too right now. I also have a Athlon >XP (like you I believe). So BSF isn't too fast. The question is: can you >outperform it? At first it seems so, in a zillion times loop the following code >appeared to be nearly a factor 3 faster than the BSF routine below: > >const char Tabel[64] = > { 0, 0, 0,15, 0, 1,28, 0,16, 0, 0, 0, 2,21,29, 0, > 0, 0,19,17,10, 0,12, 0, 0, 3, 0, 6, 0,22,30, 0, > 14, 0,27, 0, 0, 0,20, 0,18, 9,11, 0, 5, 0, 0,13, > 26, 0, 0, 8, 0, 4, 0,25, 0, 7,24, 0,23, 0,31, 0 > }; > > >__forceinline int LastOne2(unsigned __int64 Bits) > >{ unsigned int *p = (unsigned int*) &Bits; > unsigned int Short; > > if(p[0]) > { Short = p[0]; > Short = Short ^ (Short-1); > Short *= 7*255*255*25; // ?? 25 or 255 doesn't matter? > Short >>= 26; > return Tabel[Short&63]; > } > Short = p[1]; > Short = Short ^ (Short-1); > Short *= 7*255*255*255; > Short >>= 26; > return Tabel[Short&63]+32; >} > >VC produces very efficient asm code for it, as far as I can judge. However how >about the overall performance in the program? It seems to be noticably slower >than plain BSF, like below: > >__forceinline int LastOne(BB M) >{ __asm > { mov EAX, dword ptr [M] > XOR EDX, EDX > cmp EAX, 0 > jnz L2 > mov EAX, dword ptr [M+4] > add EDX, 32 > L2: bsf EAX, EAX > ADD EAX, EDX > } >} > >Both routines have an extra branch to test the first 32 bits, so that is not the >problem!? > Hi Bas, LastOne - FirstOne, Ok it depends on the starting point 0 or 63. What about nullTo63One? Even with LS1B and MS1B is confusion. Nice constants, but they map 32-bit to a range of 64 or 60 - about the double space as necessary ;-) Ahh, you use signed char for table, may be a additional and 0xff or movsx instead of mov. Why not using de Bruijn constants, that map to a range of 32? A loop test is interesting, but code size matters a lot in programs that often inlines "small" functions. Also memory lookup seems to be an issue here - even with this cache friendly size. My intention is not to outperform bsf anymore - but to use a more bitboard metric approach in "future" 64-bit IsiChess. 1. Try to avoid traversing bitboards at all. 2. If traversing bitboards, use x & -x to isolate single bits - and eg. for moves use a fast hashkey function, to map move from/to bitboards to an appropriate value-range. Regards, Gerd
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.