Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: De Bruijn Sequence Generator

Author: Gerd Isenberg

Date: 13:14:58 12/30/03

Go up one level in this thread


On December 30, 2003 at 13:16:18, Dieter Buerssner wrote:

>On December 30, 2003 at 08:09:21, Gerd Isenberg wrote:
>
>Hi Gerd,
>
>I did not try to understand your code - seems clever :-).

Hi Dieter,

it's quite simple. It starts with an initial sequence of 64-zero bits and with
an initial bitpos of 58. Bitpos addresses the lowest bit of a
6-bit sub-sequence which should be unique, so 58 addresses bits 63..58, the
first sub-sequence is zero (start index).

Recursive calls reduce bitpos by one, until a deBruin with unique 0..63 indices
is found at bitpos zero, or until an index occured more than once which
contradicts the de Bruin definition. To keep track of all used indicies so far,
a bit with this index is set in a static 64-bit word (initialized with zero)
before recursice deepening and reset after return. The first call leaves the
sequence unchanged (set a "0" at bitpos-1, due to the initial zero value), the
second call sets a "1" at bitpos-1.

There is a slight optimization at "frontier-nodes" at bitpos == 1.
If we start with Null as first index, the bit 0 must always set due to the
wrapped zero bits 63..59.

Of course the algorthm may be written in an iterative way - but i found the
recursive more intuitive and easy.

>I also do not
>remember, what your latest fastest LastOne() looked like - did it use an if and
>only 32-bit muls or did it use a 64-bit mul? (I really hope, the search engine
>will come back.)


Currently for Athlon-32 i still use inline assembler version with jnz after
lower bsf! With my "new" Opteron-Approach with a complete different bitboard
infrastructure with respect to (SSE2) fill-algorithms, i will try to avoid
bitscan at all and to extract bits with x & -x only.

Therefore my interest to find a fast hashfunction for move indices.

>
>>This De Bruijn Sequence generator is very helpfull in finding constants, to do a
>>kind of double bitscan - finding move indices with from- and to-squares as
>>single isolated bit bitboards. I found best results so far with the difference
>>of the two bits, multiplying with some value based on the found DeBruijn
>>sequence, and shifting the product 64-N bits right.
>>
>>     idx = ((tobitboard - frombitboard) * f(deBruijn)) >> (64-N);
>>
>>Even f(deBruijn) = deBruijn gives interesting result, but xor with some constant
>>pattern like 0x101010 is more promising.
>
>I don't understand the f(deBruijn) part together with the xor. How exactly would
>f(deBruijn) look then? When I read you sentence, it seems to be the
>DeBrujn-sequence xored with some constant - can this be right?

Yes, xoring with some "magic" constants and with all 63 rotates of itselft.


>
>>Doing move-generation in Steffan Westcott's way, using direction wise sets for
>>sliding pieces, and with all disjoint piece sets, it is possible to get an index
>>range of 8K (8192 = 2*64*64). Each unique move index of this scheme does not
>>only map unique move from-to-squares, but also the kind of piece.
>
>This is too much magic for me. I am aware, that there are less than 64*64 legal
>from-to combinations. But you fold in the piece type as well - how can this
>work?

With all deBruins mx = fx(deBruin):

One straight direction (224 moves) map a 2^10 == 1024 range.
But with mirroring the ranges are packable with the right mx constants for rooks
and queens disjoint offsets.

  idxNorthRook  = A + (((tobitboard - frombitboard) * m1) >> (64-10));
  idxSouthRook  = B - (((frombitboard - tobitboard) * m1) >> (64-10));
  idxNorthQueen = C + (((tobitboard - frombitboard) * m1) >> (64-10));
  idxSouthQueen = D - (((frombitboard - tobitboard) * m1) >> (64-10));

Similar with west/east with appropriate m2.

One diagonal direction (140 moves) map a 2^9 == 512 range.
There are similar packing tricks.
Knight- and King-moves map to a 512 range each, packable too.
There is still enough space for pawn moves, pawn-captures, including promotions,
ep and the four castles.

The naked DeBruijn generator with empty doSomeThingWithDeBruijn
takes only three minutes.

With one DeBruijn i try about 7+3*64 constants inside a loop.
With each Constant i try to get an unique index range for a set of moves in one
straight/diagonal direction.

If i found one, i messure the minimal amount of packing shifts that the
intersections of the index-sets becomes zero. That is done in both directions:

         X & shift(mirror(X), i2minimize).
         mirror(X) & shift(X, j2minimize).

During one night run (4-6hours) i found the constants with best mirroring
packing properties.


>
>On my HD I found a file debruijn.ps.gz:

Yes, i have it.

>
>Using de Bruijn Sequences to Index a 1 in a Computer Word
>by Charles E. Leiserson, Harald Prokop, and Keith H. Randall.
>
>I guess you know it already, otherwise, it should be easy to find by google.
>They also mention the indexing exactly "2 bits set problem" and came up with a
>concrete formula.
>

Yes, 32K range.


>Slightly related. I thought, the same readers interested in your post might also
>be interested in the following. I reread parts of that paper, and got reminded
>of one test I had done - using floating point to find the index of the bit.

I tried a similar bitscan trick with AMD's 3D-Now 2*32-integer-float conversion.
I nice trick for MMX-bitscan ;-)

>I looked at some old code snippets of mine, and thought I try to measure the
>performance with the help of the cycle count (RDTSC instruction) of modern x86
>CPUs. I also reread some suggestions of Intel, how to use that instruction and
>came up with the below source. I get partitially strange results. First the easy
>part: If I interprete correctly and did not code some bugs, it will take 56
>cycles on my P4 to get the bit index of 64-bit word with exactly one bit set
>(using 80-bit floating point, one could get the index of the most significant
>bit without isolating one bit). But other results are strange. I tried to
>measure the RDTSC overhead exactly as suggested by Intel. And for numbers 1 and
>2, the bit-index is found faster (incl. the CPUID/RDTSC overhead) than the
>overhead alone!?
>

I never tried these RDTSC instructions.
I will try your code during the next days due to windows and msc.

Gerd

<snip>



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.