Author: Gerd Isenberg
Date: 03:56:09 12/02/05
On December 01, 2005 at 17:18:41, Mridul Muralidharan wrote:

>On December 01, 2005 at 16:46:01, Zappa wrote:
>
>>On December 01, 2005 at 16:45:22, Mridul Muralidharan wrote:
>>
>>>Hi,
>>>
>>>  Just got the express ver of vc++ 2005 and it barfs on my inline assembly code
>>>(which uses bsf , etc) when I try to compile it for amd 64.
>>>
>>>Might be asking for too much , but just wondering if anyone has already ported
>>>leadz/trailz for it in an efficient manner (I do have some fallback non-asm code
>>>, but i suspect it is way too slow).
>>>
>>>Thanks,
>>>Mridul
>>
>>There is an intrinsic for it, but I forget what its called.
>>
>>anthony
>
>
>Thanks , totally forgot about intrinsic's :)
>Will revisit them later if required - for now , port.
>
>- Mridul

Hi Mridul,

yes, you'll find the x64 intrinsics here:
http://msdn2.microsoft.com/library/azcs88h2(en-US,VS.80).aspx

#include <intrin.h>
extern "C" unsigned char _BitScanForward64(
   unsigned long * Index,
   unsigned __int64 Mask
);
#pragma intrinsic(_BitScanForward64)

Whether this is faster than "Using de Bruijn Sequences to index a 1 in a
Computer Word"
http://supertech.csail.mit.edu/papers/debruijn.pdf
or Matt Taylor's folded DeBruijn trick with a 32-bit mul may depend on a lot of
circumstances in a concrete chess program.

Clearly bsf reg64,reg64 is the shortest code, but it is a 9-cycle vector path
instruction. The multiplication with a DeBruijn constant is only a 3/4-cycle
direct path instruction, but it requires a memory lookup...

Cheers,
Gerd
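For comparison with the intrinsic, here is a minimal sketch of the plain 64-bit De Bruijn multiplication approach in portable C++. It is not Matt Taylor's folded 32-bit routine. The constant 0x03f79d71b4cb0a89 and the names kDeBruijn64, DeBruijnTable and bitScanForwardDeBruijn are assumptions chosen for illustration, and the lookup table is built at first use rather than copied from any published listing.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit De Bruijn sequence: every 6-bit window of the constant is unique.
// (Assumed constant for this sketch; other valid sequences work the same way.)
static const std::uint64_t kDeBruijn64 = 0x03f79d71b4cb0a89ULL;

// Lookup table mapping the top 6 bits of (single bit * constant) back to the
// bit index. Multiplying by (1 << i) is the same as shifting left by i.
struct DeBruijnTable {
    int index[64];
    DeBruijnTable() {
        for (int i = 0; i < 64; ++i)
            index[(kDeBruijn64 << i) >> 58] = i;
    }
};

// Returns the index (0..63) of the least significant 1 bit.
// The result is undefined for bb == 0, as with bsf itself.
inline int bitScanForwardDeBruijn(std::uint64_t bb) {
    static const DeBruijnTable table;     // built once, on first call
    assert(bb != 0);
    std::uint64_t lsb = bb & (~bb + 1);   // isolate the lowest set bit
    return table.index[(lsb * kDeBruijn64) >> 58];
}
```

This is the trade-off mentioned above in code form: one multiply and one shift instead of bsf, at the price of a 64-entry table lookup that has to stay in cache.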
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.