Author: Uri Blass
Date: 14:57:31 06/12/04
On June 12, 2004 at 17:23:52, Mike S. wrote:

>On June 12, 2004 at 11:32:03, Robert Hyatt wrote:
>
>>(...)
>
>>This shows that such tests are basically flawed. The test should state "The
>>time to solution is the time where the engine chooses the right move, and then
>>sticks with it from that point forward, searching at least 30 minutes more..."
>
>Why "should..."?? This *is* the condition for a correct solution in the WM Test
>and ever has been, with the exception that the max. time is 20 minutes/pos. A
>solution is counted from the time when an engine has found *and kept* the
>solution move until the full testing time of 20 minutes.
>
>Rolf fails to inform you about that, or he doesn't know it himself. Does that
>surprise you?
>
>(You can always claim that the test time is too short, but if you for example
>run every position for a whole day, you'll still find engines which would switch
>to a wrong move after 26 hours. So you have to draw a line somewhere - and 20
>minutes/pos. is a time for "intensive analysis;" a normal game usually will
>nearly never take more than 10 minutes per pos. and not more than 3 minutes/pos.
>average...)
>
>http://www.computerschach.de/test/WM-Test.zip
>(English version included, and results of 4 Crafties.)
>
>I hope you didn't assume the WM-Test authors and the complete audience who uses
>it, are idiots who count a "pseudo solution" which is found i.e. after 12
>seconds, when from 42 secs. to 7 min. an engine switches to a wrong move
>etc.etc. ?? Of course not. A high percentage of CSS readers are experienced
>advanced computerchess users (at least). CSS itself has built, informed and
>developed that expert's audience (I guess the US has nothing comparable,
>unfortunately). - Also, advice has been given to set the "extra plies" parameter
>for automatic testsuite functions to 99, to ensure that the complete testing
>time is used, for each position. But in general, we have recommended to test
>manually and watch the engine's thinking process to get impressions so to speak.
>
>I'm a bit disappointed about your statement that "...such tests are basically
>flawed. The test should," when indeed it *does* just that.
>
>>That stops this kind of nonsensical "faster = worse" problem. Because as is,
>>the test simply is meaningless when changing nothing but the hardware results in
>>a poorer result...
>
>Are you aware that only some (few) of the positions are affected by that
>problem? The WM-Test has 100 positions. Some engines show that behaviour in some
>of the positions (different engines in different positions). Some fail to
>finally solve due to that, some solve but would change to a wrong move after
>20:00, etc.
>
>Can you guarantee that any single test position you use (and pls don't tell me
>you use nove :-)) is not affected from that problem? Who can guarantee that?
>Engines are creative in finding ways to decide for the correct move, but for the
>wrong reason, sometimes... You are aware that it is very difficult to avoid it
>to 100%, especially when a large test suite is compiled?
>
>So please be fair.
>
>Regards,
>Mike Scheidl

I think that there are better test positions. Test positions that engines can often solve for the wrong reason are bad test positions.

I also think that test positions cannot be used to get information about the strength of the engine. The engine's evaluation may depend on the previous moves of the game, and when the engine gets only a position without the previous moves, it may perform worse than its real strength.

I use test positions only to test tactical strength, and I think that it is a bad idea to use test positions to test strategic elements.
My opinion is that test suites based on human games include too many sacrifices. The result may be that increasing the engine's positional evaluation causes it to perform better in test suites but worse in games, because in games it may play wrong sacrifices for positional reasons.

I prefer test suites that are based on comp-comp games, like the Arasan test suite.

Uri
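
P.S. For anyone who wants to score a log by hand, the "found *and kept*" rule that Mike describes in the quoted text could be sketched roughly like this in Python. This is only an illustration, not the WM-Test's actual tooling: the function name, the log format (a list of `(seconds, best_move)` entries) and the moves are my own assumptions.

```python
def time_to_solution(pv_log, correct_move, test_limit):
    """Return the time (in seconds) at which the engine last switched to
    correct_move and then kept it until test_limit, or None if the engine
    is not on the correct move when the test time runs out.

    pv_log is a hypothetical format: (seconds, best_move) tuples in
    chronological order, one entry each time the best move is reported.
    """
    solved_at = None
    for seconds, move in pv_log:
        if seconds > test_limit:
            break  # anything after the test time is ignored
        if move == correct_move:
            if solved_at is None:
                solved_at = seconds  # start of a new run on the right move
        else:
            solved_at = None  # a switch away cancels the earlier "solution"
    return solved_at


# The case from the quoted text: right move after 12 s, wrong move from
# 42 s, right move again at 7 min, with a 20 min limit. The moves
# themselves are made up.
log = [(12, "Nf5"), (42, "Qd2"), (420, "Nf5")]
print(time_to_solution(log, "Nf5", 20 * 60))
```

With this rule the 12-second "pseudo solution" is not counted; the position is only scored from the moment the engine returns to the move and stays there.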
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.