Author: Rémi Coulom
Date: 08:50:14 07/31/02
On July 31, 2002 at 08:12:43, Will Singleton wrote:

>In backgammon, as you know, a learning program would play hundreds of thousands
>of games in a short time span to tune the weights, then thousands of real games
>against other versions to test for improvements, ad infinitum. Given the huge
>amount of training and game-play needed, how could chess ever effectively
>exploit the technique?
>
>Will

I am not sure (I have not tried it), but I believe that the quantity of learning data is indeed a very important problem. I can think of three ways to deal with it:

- The brute force approach: use many fast computers over a very long period of time.
- The blitzer approach: use games with very short time controls. It is not obvious which time control would be best. It is a matter of balancing the quantity of data against its accuracy.
- Episodic memory: TDLeaf(lambda), as used by Baxter et al., forgets all past experience as learning progresses: the only memory of the learner is the weights of the evaluation function. Re-analyzing old games might be a good way to make more efficient use of data. New games would bring more valuable data, but they are costly to generate, whereas old games can be retrieved at no cost.

I believe that incorporating episodic memory into reinforcement learning algorithms is an interesting research direction. It might also be a way to solve the instability problems of RL methods. But this is just a vague idea I have, and I should not reveal my secret research plans!

I believe that finding the right architecture for the evaluation function is the most important problem. It should be designed so that reinforcement learning can work efficiently and "creatively". Maybe classical evaluation functions that are designed to be hand-tuned do not have the right structure.

Rémi
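
To make the episodic-memory idea above a little more concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions that are not in the post: a linear evaluation squashed by tanh, a plain TD(lambda)-style update applied to stored feature vectors (no search, so this is not TDLeaf itself), and hypothetical names such as `EpisodicMemory`, `td_lambda_update`, `alpha` and `lam`. Old games are kept in a buffer and re-analyzed with the current weights instead of being discarded.

```python
import math
import random


def evaluate(weights, features):
    """Hypothetical evaluation: linear in the features, squashed to (-1, 1)."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)))


def td_lambda_update(weights, game, outcome, alpha=0.01, lam=0.7):
    """One TD(lambda)-style pass over a stored game.

    game    -- list of feature vectors, one per position (assumed format)
    outcome -- final result in [-1, 1], used as the terminal target
    """
    values = [evaluate(weights, x) for x in game]
    values.append(outcome)
    # Temporal differences d_t = V(t+1) - V(t), all computed with the old weights.
    deltas = [values[t + 1] - values[t] for t in range(len(game))]
    new_weights = list(weights)
    for t, x in enumerate(game):
        # Lambda-discounted sum of the temporal differences from position t on.
        err = sum(lam ** (j - t) * deltas[j] for j in range(t, len(game)))
        slope = 1.0 - values[t] ** 2  # derivative of tanh at position t
        for i, f in enumerate(x):
            new_weights[i] += alpha * slope * f * err
    return new_weights


class EpisodicMemory:
    """Bounded store of past games that can be re-analyzed at any time."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.games = []

    def store(self, game, outcome):
        self.games.append((game, outcome))
        if len(self.games) > self.capacity:
            self.games.pop(0)  # drop the oldest game

    def replay(self, weights, n_games, rng, **kwargs):
        """Re-learn from n_games stored games sampled uniformly with replacement."""
        if not self.games:
            return weights
        for game, outcome in rng.choices(self.games, k=n_games):
            weights = td_lambda_update(weights, game, outcome, **kwargs)
        return weights


# Toy usage: feature 0 appears only in won games, feature 1 only in lost ones.
memory = EpisodicMemory()
memory.store([[1.0, 0.0]] * 3, 1.0)
memory.store([[0.0, 1.0]] * 3, -1.0)
trained = memory.replay([0.0, 0.0], n_games=200, rng=random.Random(0))
```

Sampling uniformly from the buffer is the simplest possible choice here; whether old games should be weighted by age, or by how far the current evaluation is from their outcome, is exactly the kind of open question raised above.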
Last modified: Thu, 15 Apr 21 08:11:13 -0700