Author: Aart J.C. Bik
Date: 11:40:50 01/13/05
The following attempt for the 64-bit version will vectorize, but I see no speedup over the sequential compilation of the same implementation (it is faster than the original source code with shift, however):

    unsigned int bits32[32]; /* precompute shifts */

    int dotProduct64(unsigned __int64 bb, unsigned char weight[])
    {
      int i;
      int sum = 0;
      unsigned int b1 = bb;
      unsigned int b2 = bb>>32;
    #pragma ivdep
    #pragma vector aligned
      for (i=0; i < 32; i++) {
        if (b1 & bits32[i]) sum += weight[i];
        if (b2 & bits32[i]) sum += weight[i+32];
      }
      return sum;
    }
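For anyone who wants to try the loop outside the Intel compiler, here is a minimal portable sketch. It rests on a few assumptions that are not in the post: that bits32[i] is meant to hold 1u << i (the post only says "precompute shifts"), that uint64_t can stand in for unsigned __int64, and that the "original source code with shift" looks roughly like dot_product64_shift below. The names init_bits32, dot_product64_table and dot_product64_shift are made up for illustration, and the Intel-specific pragmas are left out, so this says nothing about vectorization or speed. It only shows that the two formulations compute the same sum.

```c
#include <stdint.h>

/* Assumption: the table holds one mask per bit position, bits32[i] == 1u << i. */
static unsigned int bits32[32];

/* Hypothetical helper: fill the mask table once before the first call. */
static void init_bits32(void)
{
    for (int i = 0; i < 32; i++)
        bits32[i] = 1u << i;
}

/* Table-driven version: same loop structure as the posted code,
   with the low and high 32-bit halves tested in one pass. */
static int dot_product64_table(uint64_t bb, const unsigned char weight[64])
{
    unsigned int b1 = (unsigned int)bb;          /* low 32 bits  */
    unsigned int b2 = (unsigned int)(bb >> 32);  /* high 32 bits */
    int sum = 0;

    for (int i = 0; i < 32; i++) {
        if (b1 & bits32[i]) sum += weight[i];
        if (b2 & bits32[i]) sum += weight[i + 32];
    }
    return sum;
}

/* Assumed shape of a shift-based reference: test each bit with a
   variable shift and add the matching weight. */
static int dot_product64_shift(uint64_t bb, const unsigned char weight[64])
{
    int sum = 0;

    for (int i = 0; i < 64; i++) {
        if ((bb >> i) & 1u) sum += weight[i];
    }
    return sum;
}
```

Call init_bits32() once, then pass a bitboard and a 64-entry weight array to either function; both should return the sum of the weights at the set bit positions.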
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.