Computer Chess Club Archives


Subject: Re: Pseudo-code for TD learning

Author: Gareth McCaughan

Date: 18:20:24 07/06/00


[KarinsDad:]
> Do you have a doctorate? ;)
>
> Who can read this code (alpha?, Lambda?, tanh?, pow?)?

tanh is the hyperbolic tangent function. tanh(x) is close to -1
when x is very negative, close to +1 when x is very positive,
and smooth in between.

pow(x,y) is C's way of writing "x to the power y".

alpha is a parameter (usually called the learning rate) that
determines how much difference each game makes to your eval
function parameters. If alpha is very small, your eval changes
only slightly with each game: you incorporate new information
slowly. If alpha is larger, your eval changes more with each
game: you incorporate new information quickly (and run the
risk of overwhelming your carefully built-up evaluation
function with data from a small number of new games).
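Here is a minimal sketch of where alpha enters a gradient-style update. It assumes `grad[i]` is the derivative of the evaluation with respect to weight `i`, and `error` is whatever learning signal the TD rule produces for the position. The names `update_weights`, `grad`, and `error` are hypothetical, not taken from the pseudo-code in the thread.

```c
#include <stddef.h>

/* Nudge each weight in the direction given by its gradient times the
 * error signal; alpha scales how large the nudge is. */
void update_weights(double *w, const double *grad, size_t n,
                    double alpha, double error)
{
    for (size_t i = 0; i < n; i++)
        w[i] += alpha * grad[i] * error;
}
```

With alpha = 0 the weights never move. Doubling alpha doubles every step, which is the "overwhelming" risk described above.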

To understand what lambda's for, you need to understand
the point of the TD (temporal difference) business. The idea
is that you adjust your parameters not to make the evaluation
close to some pre-defined target, but to make the evaluation
not change much from one move to the next. The reasoning is
that if the game is played perfectly, each position is exactly
as good as the one that follows it. (This is badly
oversimplified, by the way.) If you have an evaluation function
that (1) gives the right answer in "terminal" positions, and
(2) evaluates each position the same way as it evaluates the
position after a Very Good Player has made a move, then you
have a good evaluation function.
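One way to turn "don't change from one move to the next" into numbers is to compute, for each move t, the temporal difference d[t] = eval[t+1] - eval[t]. The game result stands in for the evaluation of the final position, per condition (1) above. This is a sketch with hypothetical names. It assumes the evaluations are already on the same -1..+1 scale as the result.

```c
#include <stddef.h>

/* evals[0..n-1]: evaluations of the positions in one game, in order.
 * result: final outcome (+1 win, 0 draw, -1 loss).
 * Writes n temporal differences into d; the last one compares the
 * final evaluated position with the actual result. */
void temporal_differences(const double *evals, size_t n,
                          double result, double *d)
{
    for (size_t t = 0; t + 1 < n; t++)
        d[t] = evals[t + 1] - evals[t];
    if (n > 0)
        d[n - 1] = result - evals[n - 1];
}
```

If every difference were zero, the evaluation would never change during the game and would already equal the result. The differences also telescope, so their sum is always result - evals[0].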

So, TD learning is all about trying to make your eval at
one move close to your eval for the next move. One of the
many ways in which that description oversimplifies is that
you actually try to make it match evals for later moves
too. lambda=0 means "just try to match evals on consecutive
moves"; lambda=1 means "try to match all evals to the final
result"; intermediate values blend the two, with moves further
ahead counting for less (a move k steps ahead is weighted by
lambda to the power k).



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.