Author: Anthony Cozzie
Date: 21:33:00 06/18/02
Go up one level in this thread
I just spent a few days playing around with LSB functions, and this is the best I have been able to do. This runs on approximately 31 clocks on my athlon, which is 50% or more faster than any implementation I had based on the BSF instruction. The problem with BSF is it is one of the "CISC instructions of the past" and actually involves testing each bit. While this is done in microops and not real loops, I think its still possible to better. If anyone can optimize this code or suggest improvements, I'd really be interested in seeing them. //This is in ATT [gcc] syntax, destination is on the right, source on the left movl 8(%esp), %eax movl 12(%esp), %ebx movl $32, %edx xor %ecx, %ecx testl %eax, %eax cmove %ebx, %eax cmove %edx, %ecx movl %eax, %ebx and $65535, %eax movl %ecx, %edx shr $16, %ebx add $16, %edx testl %eax, %eax cmove %ebx, %eax cmove %edx, %ecx movl %eax, %ebx and $255, %eax movl %ecx, %edx shr $8, %ebx add $8, %edx test %eax, %eax cmove %ebx, %eax cmove %edx, %ecx movsbl lead_one_lut(%eax), %eax add %ecx, %eax Here lead_one_lut is a 256 byte table with the precomputed LSB for an 8 bit integer.
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.