Author: Jonathan Baxter
Date: 16:00:17 10/05/98
On October 05, 1998 at 13:53:31, Don Beal wrote:

>The learning rate is critical - it has to be as large as one dares
>for fast learning, but low for stable values. We've been experimenting
>with methods for automatically adjusting the learning rate. (Higher
>rates if the adjustments go in the same direction, lower if they keep
>changing direction.)

This is similar to the use of "momentum" in training neural networks. It's an example of a second-order method, like Newton's method, conjugate gradient, Levenberg-Marquardt, etc. There is a monstrous literature on this for neural nets. I always thought it wouldn't make a great deal of difference to go to second order.

>The other problem is learning weights for terms which only occur rarely.
>Then the learning process doesn't see enough examples to settle on
>good weights in a reasonable time. I suspect this is the main limitation
>of the method, but it may be possible to devise ways to generate
>extra games which exercise the rare conditions.

This is called the "exploration/exploitation" tradeoff in Reinforcement Learning. It's a tough question. The same problem arose in some experiments I ran over the weekend: KnightCap was playing with PSTs and 10 stages (3x3 for the castling options for each side, plus an ending stage). The Q-side-castle/Q-side-castle stage was never seen in a few hundred games, so those PSTs stayed at zero values. But with on-line play you avoid the worst of this problem, because your opponents tend to guide you to the relevant positions. With self-play it is a real headache, I think.

Cheers,

Jonathan Baxter
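For concreteness, here is a minimal sketch of the sign-based rate adjustment Beal describes: grow a weight's learning rate while successive updates keep the same direction, shrink it when they flip. The function name, the growth/shrink factors, and the bounds are illustrative assumptions, not taken from either program.

    def adapt_rates(rates, grad, prev_grad, up=1.2, down=0.5,
                    r_min=1e-6, r_max=1.0):
        """Per-weight learning rates: grow while successive gradient
        entries agree in sign, shrink when the sign flips."""
        out = []
        for r, g, pg in zip(rates, grad, prev_grad):
            if g * pg > 0:            # same direction: speed up
                r = min(r * up, r_max)
            elif g * pg < 0:          # direction flipped: slow down
                r = max(r * down, r_min)
            out.append(r)             # zero product: leave rate alone
        return out

    # Example: three weights across two successive updates.
    rates = adapt_rates([0.01, 0.01, 0.01],
                        grad=[0.3, -0.2, 0.0],
                        prev_grad=[0.1, 0.5, -0.4])
    # -> [0.012, 0.005, 0.01]: grown, shrunk, unchanged

Each weight then steps by its own rate (w[i] -= rates[i] * grad[i]), which is what lets frequently-exercised terms move fast while oscillating ones settle.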
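And a small sketch of the 10-stage PST indexing described in the post: three castling states per side give a 3x3 grid of stages, plus one endgame stage. The particular encoding of the castling states is an assumption; the post doesn't spell it out.

    NOT_CASTLED, CASTLED_KSIDE, CASTLED_QSIDE = 0, 1, 2
    ENDING = 9

    def pst_stage(white_state, black_state, in_ending):
        """Map the two sides' castling states to stages 0..8,
        with a single extra stage (9) for the endgame."""
        if in_ending:
            return ENDING
        return 3 * white_state + black_state

    # The stage that was never reached in a few hundred games,
    # leaving its PST weights at their initial zeros:
    s = pst_stage(CASTLED_QSIDE, CASTLED_QSIDE, False)   # stage 8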