Computer Chess Club Archives

Search

Terms

Messages

Subject: Pseudo-code for TD learning

Author: Dan Homan

Date: 07:25:55 07/06/00

I've got some further e-mail requests about my TD learning code in EXchess,
and I thought it might be useful to write up some pseudo-code.

This is basically a direct translation of the formal presentation of
formulas in the knightcap papers, so there is nothing new here.  In fact,
you should definitely read those papers before trying to implement this,
because there will be special cases and considerations for your
particular implementation.  Also, I don't claim to have got it right... so
far it seems to work for the piece values. The pseudo-code corresponds pretty
closely (minus special cases and some debugging stuff) to my implementation in
EXchess.

The code assumes you have the following things

eval(position p)  - evaluation function, returns in centipawns
set_eval_param()  - function to set the eval parameters (e.g. KNIGHT_VALUE =
                     score_param[3]; etc...)
leaf_pos[]  - array of final leaf positions from searches during the game
score_param[]  - array of scoring parameters, stored in a file
write_param()  - function to write parameters to a file

void learn_parameters(int game_result) {   // game_result is -1,0,or 1

  float s[256];  // array of position scores
  float d[256];  // array of position score differences
  float ds;      // scoring derivative
  float sum1, sum2;
  float alpha = 1.0, Lambda = 0.7;  // alpha controls size of the parameter
                                    // updates and Lambda controls how much
                                    // the later score differences in a game
                                    // influence the contribution of a
                                    // particular position's score derivative

  /* loop to setup position scores and score differences */
  for(int i = 1; i < N; i++) {
   s[i] = tanh(0.00255*eval(leaf_pos[i]));
   if(i > 1) {
    d[i-1] = s[i] - s[i-1];
    if(i==N-1) d[i] = float(game_result) - s[i];
    /* don't use opponent blunders to learn scores */
    if(d[i] > 0 && !predicted_opponent_move[i]) d[i] = 0;
   }
  }

  /* loop over eval parameters */
  for(int j = 1; j <= TOTAL_PARAMETERS; j++) {
   sum1 = 0.0;
   /* loop over game positions */
   for(int i = 1; i < N; i++) {
    // compute the scoring derivative by adding 1/100 of pawn
    score_param[j] += 1;
    set_eval_param();
    ds = (tanh(0.00255*eval(leaf_pos[i])) - s[i])/0.01;
    score_param[j] -= 1;
    // now sum over all score differences to end of game,
    //  weighting down the later positions by Lambda^(m-i)
    sum2 = 0.0;
    for(int m = i; m < N; m++) {
     sum2 += pow(Lambda,(m-i))*d[m];
    }
    // add in the contribution of this position to the parameter update
    sum1 += ds*sum2;
   }
   // update the scoring parameter;
   score_param[j] += alpha*sum1;
  }

  // write the updated parameters out to a file
  write_param();

}

Re: Pseudo-code for TD learning Frank Schneider 21:17:31 07/06/00
Re: Pseudo-code for TD learning Andrew Williams 10:56:23 07/06/00
Re: Pseudo-code for TD learning Tom Kerrigan 10:20:15 07/06/00
- Re: Pseudo-code for TD learning Dan Homan 10:30:05 07/06/00
Re: Pseudo-code for TD learning KarinsDad 08:00:46 07/06/00
- Re: Pseudo-code for TD learning Djordje Vidanovic 05:47:45 07/07/00
- Re: Pseudo-code for TD learning Gareth McCaughan 18:20:24 07/06/00
  - Re: Pseudo-code for TD learning Gareth McCaughan 13:11:09 07/07/00
- Re: Pseudo-code for TD learning Dan Homan 08:04:46 07/06/00

This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.