Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Cleverness, Please!

Author: Matt Taylor

Date: 14:05:08 01/22/03

Go up one level in this thread


On January 22, 2003 at 10:40:20, Joshua Haglund wrote:

>On January 22, 2003 at 08:28:11, Matt Taylor wrote:
>
>>I'm exploring the last couple possibilities in those bitscan routines. What I
>>have decided is that I will do bit = (bb & -bb) - 1 to get a mask that looks
>>like this:
>>
>>bit  index  mask (byte)
>>-----------------------
>>  1    0    00000000
>>  2    1    00000001
>>  4    2    00000011
>>  8    3    00000111
>> 16    4    00001111
>> 32    5    00011111
>> 64    6    00111111
>>128    7    01111111
>>
>>(Extrapolate for 32-bits; I'm lazy!)
>>
>>Now, this is convenient for me because x86 will record the last bit shifted out
>>into the carry flag. Furthermore, I can accomplish the index = (index << 1) |
>>((bit & mask) != 0) with a simple add with carry. That's pretty ideal for C -- 2
>>instructions for 2 lines of code. Also, in the process, it modifies mask in a
>>potentially valuable way.
>>
>>All I need to make this a clever and slick way to eat bits. After the first
>>iteration, the masks for the second half of the table should be identical to the
>>masks in the first half. In other words, if the index >= 4, shift left 4, else
>>no-op. (A cmovz will sort've do it.)
>>
>>Alternatively, I can use subtraction to mess with the numbers and use the carry.
>>
>>Any suggestion would be welcome at any level of thought; higher-level concepts
>>may help me make a connection that I did not formerly see. Also, even more
>>complex expressions can sometimes be reduced very nicely.
>>
>>-Matt
>
>
>Sounds like a good idea!!
>
>What's this for again ? hehehehe ;)
>
>Good Day,
>
>Joshua
>toneewa@yahoo.com

A routine to find the least-significant bit set in a 64-bit number. Obviously
useful for bitboard decoding.

Athlon can issue 3 ops/cycle. Ideally each "round" of the binary search
algorithm can be executed in 1 cycle. The add with carry is extremely convenient
here because it only eats 1 instruction and does a shift + or for me. (I add a
number with itself -- same as a shift left by 1.)

I have two remaining cycles to compute the carry and fold the result. If I can
achieve this optimum 3 instructions/round, I can execute the search in 5 cycles,
and the setup cost is 6 cycles. This gives it a good speed/space tradeoff. The
code is something like 100 bytes long, but it doesn't pollute the data cache,
and it would be tied for fastest bitscan.

-Matt

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.