Computer Chess Club Archives


Subject: Re: Chess program improvement project (copy at Winboard::Programming)

Author: Stuart Cracraft

Date: 13:40:53 03/08/06


On March 08, 2006 at 11:01:39, Robert Hyatt wrote:

>On March 07, 2006 at 00:09:58, Stuart Cracraft wrote:
>
>>On March 06, 2006 at 22:53:05, Dann Corbit wrote:
>>
>>>Lots of programs have a problem with this one.  It's a sacrifice leading to a
>>>forced mate.  I would not worry about this one too much:
>>>[D] 4r1k1/p1qr1p2/2pb1Bp1/1p5p/3P1n1R/1B3P2/PP3PK1/2Q4R w - - bm Qxf4
>>
>>Of course you're right, but I'm still hopeful that some day I can solve
>>things like that. I think I remember Bob talking about WAC 141.
>
>
>Here is what I believe is the correct way to use WAC for testing:
>
>(1) do _not_ do the one-second-per-position run.  You don't get much information
>there, and if a change doesn't get you more or fewer right answers, you get no
>info at all.

Okay I agree to that.

>
>(2) I have an "early exit" option when I run a test that says "if, at the end of
>N consecutive iterations, the correct (key) move has been found to be best, stop
>the search immediately, regardless of how much time has been used."  I run the
>test with this value set to 1, so that when an iteration completes, if the right
>move is best, I quit and go to the next position.  Take the times for each of
>the 300 positions, square them, and then sum the results.  Any change you make
>that reduces this number is good.  Any change you make that increases this
>number is bad.  Even if you get one more right, if every position now takes
>longer to solve, you have hurt your tactical ability.  The sum of squares is the
>easiest way to detect this.  For example, suppose you get the same number right
>on two consecutive runs, and suppose you have only three positions in the suite
>and you get them all right.  In the first run, the times are 4, 4, and 2.  The
>second run (changed code) gives you 3, 3, and 5.  Which is better?  I'd rather
>have the first (sum of squares = 16 + 16 + 4 = 36) than the second
>(sum of squares = 9 + 9 + 25 = 43).
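A minimal sketch of the early-exit rule and the sum-of-squares score described above. Crafty's actual test harness is not shown in the post; the per-iteration trace format, the function names `solve_time` and `sum_of_squares`, and the sample trace are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical trace entry: (elapsed seconds when an iteration completes,
# best move at the end of that iteration).
Iteration = Tuple[float, str]


def solve_time(iterations: List[Iteration], key_move: str,
               n_consecutive: int = 1) -> Optional[float]:
    """Early exit: return the elapsed time at the end of the iteration where
    the key move has been best for n_consecutive consecutive iterations,
    or None if that never happens within the trace."""
    streak = 0
    for elapsed, best in iterations:
        streak = streak + 1 if best == key_move else 0
        if streak >= n_consecutive:
            return elapsed  # stop searching this position now
    return None


def sum_of_squares(times: List[float]) -> float:
    """Square each solution time and sum them; lower is better."""
    return sum(t * t for t in times)


# Made-up trace for one position whose key move is Qxf4.
trace = [(0.2, "Rh3"), (0.9, "Qxf4"), (2.5, "Qxf4")]
print(solve_time(trace, "Qxf4", n_consecutive=1))  # 0.9
print(solve_time(trace, "Qxf4", n_consecutive=2))  # 2.5

# The three-position example from the post.
print(sum_of_squares([4, 4, 2]), sum_of_squares([3, 3, 5]))  # 36 43
```

Squaring weights slow solves more heavily than a plain sum of times would, so a change that makes a few positions much slower stands out in the total even when the count of correct answers is unchanged.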

Doesn't this discriminate against programs that can't solve all 300?
For example, let's say there are three positions and I solve none of them.
What would my sum-of-squares score be for those three failed positions?

Separately, shouldn't N refer to a maximum time rather than to a number of
consecutive iterations? For example: "I limit this program to N seconds per
search. If it finds the move at any point within those N seconds, terminate
the search, record how long it took, and use the square of that time as the
term in the sum of squares. If it *doesn't* solve the position in N seconds,
terminate the search and use (N+1)*(N+1) as the sum-of-squares term for
this position."

Please describe the formula a little more exactly for failed positions.
Thanks.

>
>For reference, on my 750MHz laptop Crafty gets 265 of 300 correct.
>On a single 2.8GHz Xeon it gets 285 right; using two CPUs it gets 294 right.
>I have an 8-way Opteron run that got all 300 right at 1 sec/move, but that is
>with extreme hardware.  An older version got all 300 right at 1 sec/move on a
>dual Alpha, run by Tim Mann 3-4 years ago.  I believe I still have the log...
>Yes, here is the summary (this is a dual 21264; I do not know the clock speed):
>
>total positions searched..........         300
>number right......................         300
>number wrong......................           0
>percentage right..................         100
>percentage wrong..................           0
>total nodes searched.............. 236973211.0
>average search depth..............         4.5
>nodes per second..................     1783641
>
>One caution: I can tune Crafty (and have tuned it) to do better than this.  But
>if you are not careful, you can kill real game performance, as WAC likes check
>extensions, recapture extensions, etc., while in real games those extensions
>might waste excessive nodes and result in worse play OTB.
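As a rough consistency check on the summary log quoted above, dividing total nodes by nodes per second gives the implied total search time. This assumes the log's nodes-per-second figure is total nodes divided by total elapsed search time, which the log itself does not state.

```python
# Figures copied from the dual-21264 summary log quoted above.
total_nodes = 236_973_211
nodes_per_second = 1_783_641
positions = 300

# Assumption: nodes_per_second = total_nodes / total elapsed search time.
total_seconds = total_nodes / nodes_per_second
print(f"{total_seconds:.1f} s total, {total_seconds / positions:.2f} s per position")
# 132.9 s total, 0.44 s per position
```

Under that assumption the run averaged well under the 1 sec/move limit, which would fit positions being solved and stopped early, though the log does not say whether early exit was enabled.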

Do you have any good, well-debugged test suites that you use
for testing?

Stuart



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.