Computer Chess Club Archives


Subject: Re: Hello from Edmonton (and on Temporal Differences)

Author: Rémi Coulom

Date: 08:50:14 07/31/02



On July 31, 2002 at 08:12:43, Will Singleton wrote:

>
>In backgammon, as you know, a learning program would play hundreds of thousands
>of games in a short time span to tune the weights, then thousands of real games
>against other versions to test for improvements, ad infinitum.  Given the huge
>amount of training and game-play needed, how could chess ever effectively
>exploit the technique?
>
>Will

I am not sure (I have not tried it), but I believe that the problem of the
quantity of learning data is very important, indeed. I can think of 3 ways to
deal with this:
- The brute force approach: use many fast computers over a very long period of
time.
- The blitzer approach: use games with very short time controls. It is not
obvious which would be the best time control. It is a matter of balancing the
quantity of data with its accuracy.
- Episodic memory: TDLeaf(lambda), as used by Baxter et al., forgets all past
experience as learning progresses: the only memory of the learner is the weights
of the evaluation function. Re-analyzing old games might be a good way to make a
more efficient use of data. New games would bring more valuable data, but they
are costly to generate, whereas old games can be retrieved at no cost. I believe
that incorporating episodic memory into reinforcement learning algorithms is an
interesting research direction. It might also be a way to solve the
instability problems of RL methods. But this is just a vague idea I have, and I
should not reveal my secret research plans!
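
To make the episodic-memory idea concrete, here is a minimal sketch of replaying an archive of stored games through a TD(lambda) update. It is an illustration under stated assumptions, not the algorithm of Baxter et al.: the evaluation is assumed to be a linear function squashed with tanh, each position is assumed to be already reduced to a feature vector (in TDLeaf these would be the leaves of the principal variations), and the names evaluate, td_lambda_update and replay_training, as well as the values of alpha and lam, are hypothetical.

```python
import math
import random


def evaluate(weights, features):
    """Assumed evaluation: linear in the features, squashed to (-1, 1)."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)))


def td_lambda_update(weights, positions, outcome, alpha=0.01, lam=0.7):
    """One TD(lambda) pass over a single game; returns new weights.

    positions: list of feature vectors, one per position of the game.
    outcome:   final result in [-1, 1] (for example +1 win, 0 draw, -1 loss).
    """
    n = len(positions)
    values = [evaluate(weights, x) for x in positions]
    # Temporal differences: each position is compared with its successor,
    # and the last position is compared with the actual game outcome.
    targets = values[1:] + [outcome]
    diffs = [targets[t] - values[t] for t in range(n)]

    new_weights = list(weights)
    for t in range(n):
        # Lambda-discounted sum of the temporal differences from t onward.
        err = sum(lam ** (j - t) * diffs[j] for j in range(t, n))
        grad_scale = 1.0 - values[t] ** 2  # derivative of tanh
        for i, f in enumerate(positions[t]):
            new_weights[i] += alpha * grad_scale * f * err
    return new_weights


def replay_training(weights, archive, passes=3, seed=0, **kwargs):
    """Episodic memory: revisit stored games several times, in random order.

    archive: list of (positions, outcome) pairs kept from earlier play.
    """
    rng = random.Random(seed)
    for _ in range(passes):
        games = list(archive)
        rng.shuffle(games)
        for positions, outcome in games:
            weights = td_lambda_update(weights, positions, outcome, **kwargs)
    return weights


if __name__ == "__main__":
    # Toy archive: one stored three-position game, single feature, a win.
    archive = [([[1.0], [1.0], [1.0]], 1.0)]
    print(replay_training([0.0], archive, passes=5))
```

The point of the sketch is only that the archive is reused: each stored game contributes several updates instead of one, at no cost in new games. Whether replaying games that were played with older weights actually helps, or biases the learner, is exactly the open question.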

I believe that finding the right architecture for the evaluation function is the
most important problem. It should be made so that reinforcement learning can
work efficiently and "creatively". Maybe classical evaluation functions that are
designed to be hand-tuned do not have the right structure.

Rémi





Last modified: Thu, 15 Apr 21 08:11:13 -0700
