Author: Stuart Cracraft
Date: 13:40:53 03/08/06
On March 08, 2006 at 11:01:39, Robert Hyatt wrote:

>On March 07, 2006 at 00:09:58, Stuart Cracraft wrote:
>
>>On March 06, 2006 at 22:53:05, Dann Corbit wrote:
>>
>>>Lot's of programs have a problem with this one. It's a sacrifice leading to a
>>>forced mate. I would not worry about this one too much:
>>>[D] 4r1k1/p1qr1p2/2pb1Bp1/1p5p/3P1n1R/1B3P2/PP3PK1/2Q4R w - - bm Qxf4
>>
>>Of course you're right but I'm still hopeful some day I can solve
>>things like that. I think I remember Bob talking about WAC 141.
>
>
>Here is what I believe is the correct way to use WAC for testing:
>
>(1) do _not_ do the one second per position run. You don't get much information
>there, and if a change doesn't get you more or less right answers, you get no
>info at all.

Okay I agree to that.

>
>(2) I have an "early exit" option when I run a test that says "if, at the end of
>N consecutive iterations, the correct (key) move has been found to be best, stop
>the search immediately, regardless of how much time has been used." I run the
>test with this value set to 1, so that when an iteration completes, if the right
>move is best, I quit and go to the next position. Take the times for each of
>the 300 positions, square them and then sum the result. Any change you make
>that reduces this number is good. Any change you make that increases this
>number is bad. Even if you get one more right, but now every position takes
>longer to solve, you hurt your tactical ability. The sum-of-squares is the
>easiest way to detect this. Otherwise suppose you get the same number right on
>two consecutive runs, and suppose you only have three positions in the suite and
>you get them all right. The first run, the times are 4, 4, and 2. The second
>run (changed code) gives you 3, 3 and 5. Which is better? I'd rather have the
>first (sum of squares = 36) rather than the latter (sum of squares = 43).

Doesn't this discriminate against programs that can't solve all 300?
For example, say there are three positions and I solve none of them. What is my sum-of-squares score for those three failed positions?

Separately, shouldn't N refer to a maximum time rather than a number of consecutive iterations? For example: "I limit this program to N seconds per search. If it finds the move at any point within those N seconds, terminate the search, record the time it took, and use the square of that time as this position's term in the sum-of-squares. If it *doesn't* solve the position in N seconds, terminate the search and use (N+1)*(N+1) as the sum-of-squares term for this position."

Please describe the formula a little more exactly for failed positions. Thanks.

>
>For reference for times, on my 750 laptop, crafty gets 265 correct out of 300.
>On a single 2.8ghz xeon, it gets 285 right. using two cpus it gets 294 right.
>I have an 8-way opteron run that got 300 right in 1 sec/move but that is with
>extreme hardware. An older version saw 300 right at 1 sec per move on a dual
>alpha, run by Tim Mann, 3-4 years ago. I still have the log I believe... Yes,
>here is the summary: (this a dual 21264, do not know the speed)
>
>total positions searched.......... 300
>number right...................... 300
>number wrong...................... 0
>percentage right.................. 100
>percentage wrong.................. 0
>total nodes searched.............. 236973211.0
>average search depth.............. 4.5
>nodes per second.................. 1783641
>
>One caution, I can (and have) tuned Crafty to do better than this. But if you
>are not careful, you can kill real game performance as WAC likes check
>extensions, recapture extensions, etc. While real games might show those to
>waste excessive nodes and result in worse play OTB.

Do you have any good, well-debugged test suites that you use for testing?

Stuart
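
For concreteness, here is a minimal Python sketch of the sum-of-squares score discussed in this post, including the penalty proposed above for unsolved positions. The function name `sum_of_squares`, the `limit` parameter (N, in seconds), and the use of `None` to mark a failed position are illustrative assumptions. The (N+1)*(N+1) penalty is only the suggestion made in this post, not a rule Bob has confirmed.

```python
def sum_of_squares(times, limit=None):
    """Score a test-suite run; lower is better.

    times: solve time in seconds for each position, or None if unsolved.
    limit: per-position time limit N (hypothetical parameter). A position
           that is unsolved, or solved only after the limit, is charged
           (N + 1) ** 2, as proposed in the post (an assumption).
    """
    total = 0
    for t in times:
        failed = t is None or (limit is not None and t > limit)
        if failed:
            if limit is None:
                raise ValueError("a time limit is needed to score a failed position")
            t = limit + 1
        total += t * t
    return total


# The two three-position runs from the quoted example:
print(sum_of_squares([4, 4, 2]))  # 36
print(sum_of_squares([3, 3, 5]))  # 43

# Three failed positions with N = 10 seconds: 3 * 11 * 11
print(sum_of_squares([None, None, None], limit=10))  # 363
```

With this penalty, a run that fails a position always scores worse on that position than any run that solves it within N seconds, so a change that merely trades solved positions for faster times elsewhere still shows up in the total.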
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.