Author: Bas Hamstra
Date: 05:04:25 08/05/02
On August 04, 2002 at 23:58:05, Robert Hyatt wrote:

>On August 04, 2002 at 09:04:39, Vincent Diepeveen wrote:
>
>>On July 31, 2002 at 21:35:32, James Swafford wrote:
>>
>>>On July 31, 2002 at 18:10:08, Vincent Diepeveen wrote:
>>>
>>>>On July 30, 2002 at 22:43:36, James Swafford wrote:
>>>>
>>>>>
>>>>>Hey everyone. I'm at an AAAI conference in Edmonton. It's ironic (to me)
>>>>>that it's been mentioned here recently that Edmonton is a hive of computer
>>>>>chess enthusiasts. I don't know if that's true (what's a "hive"? :-), but
>>>>>there are certainly a few...
>>>>>
>>>>>Now to my question. I asked Jonathan Schaeffer today (who is a really
>>>>>nice guy, IMO) some questions about his experience with TD learning
>>>>>algorithms. He has (co?)published a paper entitled (something like)
>>>>>"Temporal Difference Learning in High Performance Game Playing." I
>>>>>thought the title was a bit misleading, because he focused on checkers.
>>>>>Checkers programs have much smaller evaluation functions than chess
>>>>>programs, obviously. I asked him if he thought the TDLeaf(Lambda)
>>>>>algorithm had potential in high-calibre chess. (Yes, yes, I know
>>>>>all about KnightCap... but that wasn't quite "high" calibre.)
>>>>>He responded with a very enthusiastic "yes". He said "I'll never manually
>>>>>tune another evaluation function again."
>>>>
>>>>And he'll never do a competitive chess program again either; he forgot
>>>>to add that too.
>>>>
>>>>>A natural follow-up question (which I also asked) is: then why isn't
>>>>>everyone doing it? I don't _believe_ (and maybe I'm wrong about this)
>>>>>that any top-ranked chess programs use it. His response was simply:
>>>>>"There's a separation between academia and industry." Schaeffer stated
>>>>
>>>>Schaeffer is well known for his good speeches and answers :)
>>>>
>>>>>that perhaps the programmers of top chess programs don't believe in
>>>>>the potential of temporal difference algorithms in the chess domain.
>>>>>Or, perhaps, they don't want to put the effort into them.
>>>>
>>>>>I believe Crafty is the strongest program in academia now. If not,
>>>>>it is certainly among the strongest. So, Bob -- have you looked at
>>>>>TDLeaf and found it wanting? It's interesting (and perplexing) to me
>>>>>that paper after paper praises the potential of TDLeaf, but it's _yet_
>>>>>to be used in the high-end programs. KnightCap was strong, but it's
>>>>>definitely not in the top tier.
>>>>
>>>>I remember KnightCap very well. TD learning had the habit of slowly
>>>>making it more aggressive until it was giving away a piece for one pawn
>>>>and a check.
>>>>
>>>>Then of course the 'brain was cleared' and the experiment restarted.
>>>>So in short, the longer the program used TD learning, the worse it
>>>>would play, from my viewpoint.
>>>>
>>>>From a chessplayer's viewpoint it definitely did. Of course we must not
>>>>forget that at the time it played online, nearly no program was very
>>>>aggressive. So playing a few patzer moves was a good way to go from
>>>>scoring perhaps 11% to 12% or so.
>>>>
>>>>>Maybe Tridgell/Baxter quit too soon, and KnightCap really could've been
>>>>>a top-tier program. Or maybe the reason nobody is using TD is that
>>>>>it's impractical for the large number of parameters required to be
>>>>>competitive in chess. Or maybe Schaeffer was right, and the commercial
>>>>>guys just aren't taking TD seriously.
>>>>>
>>>>>Thoughts?
>>>>>
>>>>>--
>>>>>James
>>>
>>>
>>>So, I can put you on record as saying that TD-Leaf is never going to
>>>produce a high-calibre player?
>>
>>For a complex evaluation, TD learning will never achieve what hand
>>tuning by an experienced chess programmer achieves. That is a statement
>>I'm willing to make.
>>
>>Of course, if you start with the most stupid tuned set, like setting
>>everything to zero or everything to -1, then it looks as if TD learning
>>and all other random forms of learning are OK.
>>
>>The same goes for neural networks and such. I toyed quite a bit with
>>simple neural networks, simply because there are several out there to
>>toy with.
>>
>>The major problem is that when I, for example, conclude that open files
>>are more important than a pawn in the center, *any* form of general
>>learning will never, by definition, be able to conclude the same, for
>>the obvious reason that it has no domain knowledge.
>>
>>We can discuss till chess is solved, but it's definitely a really simple
>>case here. The proof that it doesn't work is so obvious that I am always
>>amazed by people who say it works for them.
>>
>>Those must be persons who don't know the difference between a bishop and
>>a knight ;)
>>
>>What I advise is to tune Crafty against an opponent it now scores 80%
>>against. Tuning something to achieve < 50% is really simple, because
>>there is no proof that it could be done better.
>>
>>You really see the difference between automatic tuning and hand tuning
>>when an engine is crushing a certain opponent with the hand tuning.
>>
>>Now automatically tune it to get more than that: to get 90% instead of
>>80%.
>>
>>If you have an incredibly bad engine and you modify a random thing in
>>search, it might still play incredibly badly, but a bit better.
>>
>>For the stronger engines, however, this is way harder.
>>
>>So turn off learning in Crafty, find an opponent it scores well against,
>>then autotune Crafty. It has a very small evaluation, and the few
>>patterns it has don't even require arrays to tune, so there are very few
>>parameters to tune. Should be easy, nah?
>
>No arrays? Have you looked at the code? It has many arrays, some of
>which are used in a third-order fashion: lookups and summations from a
>first array, then that value is used to index into a second array...
>some sums of those, and that value indexes into a third array...
>
>I think TD learning would be tough. But I don't see why it can't work.
>Just because it might be hard to do doesn't mean it is impossible to
>do...

I have played with it. I am convinced it has possibilities, but one
problem I encountered was the cause-effect problem. Say I am a piece
down. After I lose the game, TD will conclude that the winner had better
mobility and will tune mobility up. However, worse mobility was not the
*cause* of the loss; it was the *effect* of simply being a piece down.
In my case it kept tuning mobility up and up, to ridiculous values.

Bas.
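[Editor's note: to make the cause/effect trap Bas describes concrete, here is a
toy C sketch with invented numbers. In every lost game the learner's side was a
piece down and, as a direct consequence, also had less mobility. An
outcome-driven gradient step blames every feature that correlated with the
loss, and the larger-magnitude mobility feature soaks up most of the
correction. The tanh squashing, feature values, and learning rate are all
assumptions for illustration, not anyone's actual update rule.]

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double w_material = 0.10;   /* assumed starting weights (invented) */
        double w_mobility = 0.01;
        double alpha = 0.05;        /* learning rate (invented) */

        for (int game = 0; game < 20; game++) {
            /* features of the final position, from the loser's viewpoint */
            double d_material = -3.0;   /* a piece (~3 pawns) down            */
            double d_mobility = -12.0;  /* fewer moves *because* a piece down */
            double outcome    = -1.0;   /* the game was lost                  */

            /* predicted result: squashed linear evaluation */
            double predicted = tanh(w_material * d_material +
                                    w_mobility * d_mobility);
            double err  = outcome - predicted;          /* negative        */
            double grad = 1.0 - predicted * predicted;  /* tanh derivative */

            /* gradient step toward the outcome: a negative error times a
             * negative feature value pushes BOTH weights upward, and the
             * larger |d_mobility| makes mobility climb the fastest */
            w_material += alpha * err * grad * d_material;
            w_mobility += alpha * err * grad * d_mobility;
        }
        printf("material %.3f  mobility %.3f\n", w_material, w_mobility);
        return 0;
    }

After even one such game the mobility weight has overtaken the material weight,
because the step size is proportional to the feature's magnitude; over many
games the drift compounds, matching the "up and up" behaviour described above.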
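[Editor's note: for readers who want the algorithm the thread keeps naming,
below is a minimal sketch of a TDLeaf(lambda) weight update for a linear
evaluation, roughly in the spirit of the KnightCap paper: the evaluation of
each principal-variation leaf is squashed into (-1, 1), temporal differences
between successive leaves are formed, and each position's gradient is credited
with a lambda-discounted sum of the future differences. NFEATURES, the
squashing constant 256, and the storage layout are illustrative assumptions,
not any engine's real code.]

    #include <math.h>

    #define NFEATURES 64   /* illustrative feature count */

    /* squash centipawn scores into (-1, 1) so differences are bounded */
    static double squash(double score) { return tanh(score / 256.0); }

    /* derivative of squash with respect to the raw score */
    static double squash_deriv(double score)
    {
        double t = tanh(score / 256.0);
        return (1.0 - t * t) / 256.0;
    }

    /* One TDLeaf(lambda) pass over a finished game.  leaf_phi[t] holds the
     * feature vector of the principal-variation leaf found when searching
     * position t; leaf_score[t] its raw evaluation.  The update is
     *   w += alpha * sum_t grad(eval_t) * sum_{j>=t} lambda^(j-t) * d_j
     * with temporal differences d_j = squash(score_{j+1}) - squash(score_j). */
    void tdleaf_update(double w[NFEATURES],
                       const double leaf_phi[][NFEATURES],
                       const double leaf_score[],
                       int npos, double alpha, double lambda)
    {
        for (int t = 0; t < npos - 1; t++) {
            double err = 0.0, decay = 1.0;
            for (int j = t; j < npos - 1; j++) {
                err += decay * (squash(leaf_score[j + 1]) -
                                squash(leaf_score[j]));
                decay *= lambda;
            }
            /* for a linear eval, d squash(eval)/d w[i] = squash'(eval)*phi[i] */
            double dsq = squash_deriv(leaf_score[t]);
            for (int i = 0; i < NFEATURES; i++)
                w[i] += alpha * dsq * leaf_phi[t][i] * err;
        }
    }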
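[Editor's note: the "third-order" table style Hyatt points to might look like
the hypothetical sketch below; the array names and sizes are invented, not
taken from Crafty's source. The difficulty for TD tuning is visible at once:
entries of the second and third tables enter the score through a data-dependent
index, so the evaluation is not a linear function of them and the simple
gradient used in the sketches above no longer applies.]

    /* Hypothetical third-order evaluation term: values summed out of a
     * first array produce an index into a second, and a sum of those
     * produces an index into a third. */
    static int pawn_value[8];        /* first array: per-file scores   */
    static int structure_bonus[64];  /* second array: indexed by a sum */
    static int king_safety[32];      /* third array: indexed by a sum  */

    static int third_order_term(const int pawns_on_file[8])
    {
        int sum1 = 0;
        for (int f = 0; f < 8; f++)
            sum1 += pawn_value[f] * pawns_on_file[f]; /* lookups and sums */

        int sum2 = structure_bonus[sum1 & 63]; /* sum indexes second array */

        return king_safety[(sum1 + sum2) & 31]; /* and again into a third */
    }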