Author: Gareth McCaughan
Date: 18:20:24 07/06/00
[KarinsDad:]
> Do you have a doctorate? ;)
>
> Who can read this code (alpha?, Lambda?, tanh?, pow?)?

tanh is the hyperbolic tangent function. tanh(x) is close to -1 when x is very negative, close to +1 when x is very positive, and smooth in between.

pow(x,y) is C's way of writing "x to the power y".

alpha is a parameter that determines how much difference each game makes to your eval function parameters. If alpha is very small, your eval changes only a very little with each game: you incorporate new information slowly. If alpha is larger, your eval changes more with each game: you incorporate new information quickly (and run the risk of overwhelming your carefully built up evaluation function with data from a small number of new games).

To understand what lambda's for, you need to understand the point of the TD business. The idea is that you adjust your parameters not to make the evaluation close to some pre-defined target, but to make the evaluation not change much from one move to the next. The idea is that if the game is played perfectly, each position is exactly as good as the one that follows it. (This is badly oversimplified, by the way.) If you have an evaluation function that (1) gives the right answer in "terminal" positions, and (2) evaluates each position the same way as it evaluates the position after a Very Good Player has made a move, then you have a good evaluation function.

So, TD learning is all about trying to make your eval at one move close to your eval for the next move.

One of the many ways in which the description above is an oversimplification is that you actually try to make it match evals for later moves too. lambda=0 means "just try to match evals on consecutive moves"; lambda=1 means "try to match all evals to final result"; intermediate values produce intermediate results.
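
To make the roles of tanh, alpha and lambda concrete, here is a minimal sketch in Python of one TD(lambda) update after a single game. It is not the code being asked about in this thread, just an illustration under assumptions of my own: the eval is tanh of a weighted sum of features, the game result is a number between -1 and +1, and the names td_errors, td_lambda_update, features_per_move and result are all made up for the example.

```python
import math

def td_errors(evals, result, lam):
    """Lambda-weighted error for each position in one game.

    evals  : eval of each position, in move order
    result : value of the terminal position (assumed to lie in [-1, +1])
    lam    : lambda, between 0 and 1
    """
    # What each eval is compared against: the next eval, or the final
    # result for the last position.
    following = evals[1:] + [result]
    diffs = [nxt - cur for cur, nxt in zip(evals, following)]

    # error[t] = diffs[t] + lam*diffs[t+1] + lam^2*diffs[t+2] + ...
    # Working backwards avoids computing the powers of lambda explicitly.
    errors = [0.0] * len(diffs)
    running = 0.0
    for t in range(len(diffs) - 1, -1, -1):
        running = diffs[t] + lam * running
        errors[t] = running
    return errors

def td_lambda_update(weights, features_per_move, result, alpha, lam):
    """Return new weights after one game (hypothetical linear-tanh eval)."""
    # Eval of a position: weighted feature sum squashed into (-1, +1).
    evals = [math.tanh(sum(w * f for w, f in zip(weights, feats)))
             for feats in features_per_move]
    errors = td_errors(evals, result, lam)

    new_weights = list(weights)
    for feats, v, err in zip(features_per_move, evals, errors):
        slope = 1.0 - v * v  # derivative of tanh at this position
        for i, f in enumerate(feats):
            # alpha scales how far one game is allowed to move each weight.
            new_weights[i] += alpha * slope * f * err
    return new_weights
```

With lam=0 each position's error is just the difference to the next eval; with lam=1 the differences telescope, so each error becomes the final result minus that position's eval. A sum written with explicit powers of lambda is the sort of place where a C version would call pow; the backward loop here sidesteps that.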
Last modified: Thu, 15 Apr 21 08:11:13 -0700