Computer Chess Club Archives

Subject: Re: Pseudo-code for TD learning

Author: Gareth McCaughan

Date: 13:11:09 07/07/00

Yesterday I wrote:

> To understand what lambda's for, you need to understand
> the point of the TD business. The idea is that you adjust
> your parameters not to make the evaluation close to some
> pre-defined target, but to make the evaluation not change
> much from one move to the next. The idea is that if the
> game is played perfectly, each position is exactly as good
> as the one that follows it. (This is badly oversimplified,
> by the way.) If you have an evaluation function that (1)
> gives the right answer in "terminal" positions, and (2)
> evaluates each position the same way as it evaluates the
> position after a Very Good Player has made a move, then
> you have a good evaluation function.

I should have mentioned something important, which is that
the same thing is true if (2') it evaluates each position
the same way as it evaluates the position after *it* has
made a move (using its own eval to choose what to do). So
you can, in principle, use TD to learn by self-play.
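A minimal sketch of that self-play idea, not anyone's actual engine code. Everything in it is an assumption made for illustration: the toy game (a pile of stones, each side takes 1 or 2, whoever takes the last stone wins), the table V standing in for an evaluation function, the name td_self_play, and the alpha and epsilon settings. It uses plain TD(0), so lambda does not appear. Condition (1) is the fixed value V[0] = -1, and condition (2') is the update that pulls V[n] toward the value of the position reached by the learner's own preferred move, negated because the opponent is then to move.

```python
import random

def td_self_play(pile=10, episodes=5000, alpha=0.5, epsilon=0.2, seed=0):
    """TD(0) self-play on a toy take-1-or-2 stones game (hypothetical example)."""
    rng = random.Random(seed)
    # V[n] is the evaluation of a pile of n stones from the side to move's view.
    V = [0.0] * (pile + 1)
    # Condition (1): the terminal position is scored correctly and never updated.
    # With no stones left, the side to move has already lost.
    V[0] = -1.0
    for _ in range(episodes):
        n = pile
        while n > 0:
            moves = [m for m in (1, 2) if m <= n]
            # The learner's own choice: leave the opponent the worst position
            # according to the current evaluation.
            best = min(moves, key=lambda m: V[n - m])
            # Condition (2'): nudge V[n] toward the evaluation after that move.
            # The sign flips because the opponent is to move there.
            V[n] += alpha * (-V[n - best] - V[n])
            # Assumption: play an occasional random move so every pile size
            # gets visited; the update above still uses the preferred move.
            move = rng.choice(moves) if rng.random() < epsilon else best
            n -= move
    return V

V = td_self_play()
# Pile sizes the learned evaluation considers lost for the side to move.
print([n for n in range(len(V)) if V[n] < 0])
```

In this game the lost positions for the side to move are the multiples of three, so that is the pattern the learned table should settle into. No teacher or pre-defined target is involved apart from the terminal score.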

--
g
