Author: Ratko V Tomic
Date: 23:53:57 11/09/99
In an earlier thread on applying Huffman codes for enumerating and encoding chess positions, one sub-thread dealt with the issue of what kind of constraints on number of promotions, expressed in terms of captures for a given Material Content (MC, i.e. the counts of pieces by piece type) state, can be given. Any such constraint helps reduce the number of valid MC states, thus reduces the upper bound on the number of legal positions (hence on the code length needed to encode an arbitrary position). In that context (recalling his earlier program enumerating chess positions) Blass Uri wrote: > My program used this fact [that the #of promotions<= #missing pawns for > each side] and the fact that: > > an upper bound for the number of promoted white pieces is > 2*the number of captures of black pieces > +the number of captures of white pieces(I include pawns also as pieces). As illustrated in the earlier reply to the above, some of the optimal capture+promotion sequencies (max promotions for min captures) do not reach the bound above, which means (since these optimal transactions cannot be exceeded by any other transactions) that a tighter upper bound exists. The tightest upper bound of this type (on promotions in terms of captured material, its kind and its quantity) is: Number of white promotions cannot exceed 2 * (# of black pawns captured) + # of black non-pawns captured + # of white pieces captured (pawns or non-pawns), or symbolically: wp <= 2*bpc + bnc + wc (1) (and similar for the reverse colors, via prefix swap w<->b) Note that in your equation the 2nd term (bnc) has also a factor of 2, making your upper bound higher whenever there are black non-pawns captured. This bound is the tightest possible (in these terms) since it is achieved exactly (turned into equality) by the optimal capture+promotion sequences listed earlier, hence a lowering of the upper bound would necessarily make it false. Using this bound (or rather, a derived one in terms of excess and defect in material for a given MC state, classified by pawns and non-pawns, for white and black), the enumeration algorithm worked as follows: 1. Compute all Material Content (MC) states for white. The MC state is defined by counts of all piece types (Q,R,B,N,P). 1.1 There are 8694 distinct MC states for white (the loop actually iterates exactly this many times) 1.2 The only constraint on the one side MC states is that: # of excess pieces + # of pawns <= 8 where the "excess pieces" are non-pawns which are not present in the initial chess positions (i.e. W.Queens above 1, W.Rooks above 2, etc). This is simple consequence of the fact that each promotion removes exactly one pawn. Since white can have (with cooperating opponent) all 8 pawns promoted without having any of its pieces captured, the above is the strongest constraint on promotions in terms of piece counts for one side (black or white). 2. For each white MC state, pair it with up to 8694 MC states for the black, resulting in at most 8694^2=75,585,636 potential pairs (i.e. the full MC states). 2.1 If the pairing of MC state x with MC state y was computed then it is not necessary to compute pair y,x (reversed color), so the pairing loop makes only 8694*8695/2=37,797,165 iterations. 2.2 The new promotion/capture constraint (1) is applied to the pairs, resulting in rejection of 17,501,326 pair combinations, which violated the inequality, leaving 58,084,310 complete (black+white) valid MC states. Hence the MC state can be encoded in 25.792 bits, i.e. in using a 26 bit whole bit codes. 3. For each valid complete MC state the number of Material Placements (MP) is computed, and these were added over all the MC states (58,084,310 such states). 3.1 The number of distinct chess positions was thus obtained as: N = 2.463947 * 10^46 = 154.110 bits This is lower than your earlier constraint 3.701*10^46, as one would expect due to the tighter promotion constraint. 3.2 The maximum placements were generated by the 30 piece symmetrical MC state given as: W.QRNBP=3,3,2,2,4 B.QRNBP=3,3,2,2,4 and it had the #MP=3.067753*10^42=141.138 bits (the same value is obtained, of course, by all other states differing from this one only in the independent permutations in W & B non-pawn counts). 3.3 The number of MC states which used (for their #MP) 130 bits or more was counted and found to be 1,016,088 (smaller than 1024^2, i.e. smaller than 20 bits). 3.4 The Bishop optimization (discussed in an earlier sub-thread here) was not performed in the current version (due to the estimated total gain of at most few millibits from the 154.110 bits, while complicating the code great deal compared to the single bishop type enumeration). 3.5 The pawn optimization (48 available squares instead of 64 squares) was used. 3.6 Castle state was not included in the count, but it was estimated that it would add less than 0.13 bits to the position code, in the worst case (check earlier discussion on king pair + castle status encoding in 12 bits). The ep status also wasn't encoded, but since there could be at most 4 ep capture targets (white assumed to move, thus 4 targets are black pawns on the 5-th rank), the ep would add at most log2(5)=2.32 bits (to the 154.110 bit position code length; some checks on pawn counts in MC state could lower this further). Thus the ep+castle status would still keep the full position code length below 157 bits. 4. An encoding method which achieves uniform (across all positions) whole-bit code length of 155 bits (or 157 bits when ep & castle are included) consists of encoding any position with a 2 part code, MC part (specifying one of the MC states among the 58,084,310 valid MC states) plus an MP part indexing a particular placement permutation among all the placements generated by this MC state. The specific steps needed to achieve the uniform length 155-bit code in this manner are as follows: 4.1 Replace the fixed length 26 bit MC code with a variable length Huffman code, which uses the Number of MP states (for this MC) divided by the total number of states (N=2.46*10^46) as the Huffman weight (probability) for this MC state. 4.2 If, for a given MC state, the #of MPs uses X bits, then the MC state gets encoded in L(MC)=(155-X) bits. This follows from the Huffman code length L(MC) = -log2(p) = -log2[(2^X)/N)] = log2(N)-X = 155-X. The full code length is L(MC)+L(MP) = = (155-X)+X = 155, i.e. it is a fixed length 155-bit code. 4.3 The encoder/decoder of the MC part of the code would use a sorted array of one side MC states (8694 entries). The full MC code (for both sides) would be constructed on the fly as MC pairs, and their validity would be checked as in the section 2, 2.1, 2.2. If the optimization for the bishops were added, the one side MC array would have 26,298 entries (instead 8694). 4.4 The MP part of the code (the placement index for a given MC state) can be encoded/decoded using a variable radix representation for permutations. Thus it is possible to actually/practically encode (and decode) all chess positions using a fixed length 155-bit code (or 157 if ep & castle are also encoded). 4.5 Given that this uniform code comes quite close to the Karin's Dad's limit of 160 bits max for any chess position, it seems unlikely that any code which uses up fewer than 150 bits for some chess positions would be able to keep the worst case under the 160 bits target (i.e. the shorter the codes for some positions, the longer the codes for the others). To keep this post at a reasonable length, I had omitted various details (which may be obvious to those thinkering with this problem).
This page took 0.02 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.