Computer Chess Club Archives


Subject: Re: Ten years later: revising EPD/FEN/PGN

Author: Robert Hyatt

Date: 10:30:45 09/10/03


On September 10, 2003 at 10:11:05, Dave Gomboc wrote:

>On September 09, 2003 at 18:33:22, Steven Edwards wrote:
>
>>Greetings:
>>
>>It was some ten years ago back in 1994 that the CDS (Chess Data Standards: SAN,
>>FEN, EPD, and PGN) took their current form after months of newsgroup discussions
>>and megabytes of email exchange.  Prior to the adoption of the CDS, the exchange
>>of chess data and the interoperability of chess software were greatly impeded by
>>the lack of standards and the prevalence of secret and non-free proprietary
>>formats.
>>
>>Although the CDS are now used by (nearly) all archives and all programs, it does
>>not mean that the current form of the standards is the best.  Certainly there
>>are places where extensions can be made and where unneeded items deprecated and
>>eventually deleted.  To make a revision work, there is a requirement for input
>>from the experienced chess programmer community just as there was such a need
>>ten years ago.  Therefore, I once again issue the call for online (here in CCC)
>>and email discussion on the CDS topic. (Including "CDS" in the subject line
>>helps in this process.)
>>
>>Here are some of my ideas:
>>
>>SAN: Looks fine; I don't see any need for changes.  There is the possibility to
>>extend the piece identifier letter semantics to accommodate heterodox /
>>unorthodox chess.
>
>The use of FAN (figurine algebraic notation) would be more forward-looking.  I'm
>not going to use KQRBNP forever, given there are glyphs in Unicode for the chess
>pieces.  Switching would also obviate the need for language translations.
>
>>FEN: One deficiency exists for which I take responsibility; the en passant
>>target square semantics should indicate a non null value only if an active
>>color pawn attacks the passed-over target square.  A change here will
>>improve position database operation and can also have a positive effect
>>on internal transposition management.  Also, there is the possibility of
>>extending the castling availability semantics to accommodate heterodox /
>>unorthodox chess.
>
>Yes, people regularly ignore the spec when it comes to the e.p. square.
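
The proposed e.p. rule can be sketched roughly as follows. This is only an illustration, not part of any spec: the function names are made up here, and it checks "attacks" in the plain geometric sense (an active-color pawn on an adjacent file, on the rank it would capture from), ignoring pins and other legality questions.

```python
def expand_rank(rank_text):
    """Expand one FEN rank such as '3pP3' into a list of 8 squares ('.' = empty)."""
    squares = []
    for ch in rank_text:
        if ch.isdigit():
            squares.extend("." * int(ch))
        else:
            squares.append(ch)
    return squares


def normalize_ep_field(fen):
    """Return the FEN with the e.p. field set to '-' unless an active-color
    pawn attacks the e.p. target square (hypothetical helper, see lead-in)."""
    fields = fen.split()
    placement, active, ep = fields[0], fields[1], fields[3]
    if ep == "-":
        return " ".join(fields)

    board = [expand_rank(r) for r in placement.split("/")]  # board[0] is rank 8
    file_idx = ord(ep[0]) - ord("a")
    ep_rank = int(ep[1])

    # White to move captures from rank 5 onto rank 6; Black from rank 4 onto rank 3.
    if active == "w":
        pawn, pawn_rank = "P", ep_rank - 1
    else:
        pawn, pawn_rank = "p", ep_rank + 1

    row = board[8 - pawn_rank]
    attacked = any(
        0 <= f < 8 and row[f] == pawn for f in (file_idx - 1, file_idx + 1)
    )
    if not attacked:
        fields[3] = "-"
    return " ".join(fields)
```

With this rule the position after 1. e4 would carry "-" rather than "e3", since no black pawn stands on d4 or f4.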
>
>>EPD: There are several items of concern:
>>
>>1. Currently, the first four EPD fields match the first four FEN fields.  This
>>was done to save space with the idea that the "hmvc" and "fmvn" EPD operations
>>could provide the extra information if needed.  This is rather arbitrary and I
>>suggest that every EPD record have the same first six fields as does a FEN
>>record.  Alternatively, EPD opcodes can be defined for the (current) first four
>>EPD fields and so there would be NO required fields at the start of an EPD
>>record.
>
>Either solution is better than the status quo.
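
For illustration, here is a minimal sketch of how a current four-field EPD record maps onto a six-field FEN record using the "hmvc" and "fmvn" operations. The function name is made up, the defaults of 0 and 1 for missing operations are an assumption of this sketch, and the operation splitting is naive (it does not handle a semicolon inside a quoted string operand).

```python
def epd_to_fen(epd):
    """Build a six-field FEN string from an EPD record (hypothetical helper)."""
    parts = epd.strip().split(None, 4)
    position_fields = parts[:4]          # placement, side to move, castling, e.p.
    ops_text = parts[4] if len(parts) > 4 else ""

    # Collect "opcode operand;" pairs; naive split, see the caveat above.
    ops = {}
    for op in ops_text.split(";"):
        op = op.strip()
        if not op:
            continue
        opcode, _, operand = op.partition(" ")
        ops[opcode] = operand.strip()

    halfmove_clock = ops.get("hmvc", "0")   # assumed default
    fullmove_number = ops.get("fmvn", "1")  # assumed default
    return " ".join(position_fields + [halfmove_clock, fullmove_number])
```

If every EPD record started with all six fields, as suggested, a conversion like this would reduce to taking the first six tokens.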
>
>>2. Representation of string/symbolic data in operands is inconsistent with
>>respect to the need for quoting.  This is also the case with PGN tag values.  A
>>uniform rule is needed for both.
>>
>>3. Representation of time and date value operands needs to be formalized along
>>with a provision for sub-second decimal resolution.
>>
>>4. The centipawn evaluation operand type needs a mate score indication
>>correction.
>>
>>5. The centipawn evaluation operand type probably needs to be deprecated and
>>replaced with a pawn evaluation operand type with a provision for sub-pawn
>>decimal resolution.
>
>This should have used a real value to begin with.
>
>>6. The time control operand type needs to be extended and formalized.  PGN has a
>>problem with this as well.
>>
>>7. A formal XML schema could be useful.  Likewise for PGN.
>
>The current syntax is not compatible with XML.  You could consider heading an
>effort to make a replacement standard interchange format that used XML.
>
>>8. Removal of record length limitations.
>>
>>9. Explicit support for 64 bit integer values (decimal and hexadecimal) as
>>operands.
>>
>>10. Inclusion of program-to-program command protocol opcodes.
>>
>>
>>PGN:  Again, several areas need re-examination.
>>
>>1. Adding the Broket Form to the movetext.  A broket form is a single EPD
>>operation delimited by angle brackets ("brokets").  This is a far better
>>approach to embedding metadata than the current use of comments.
>>
>>2. Deprecation of the use of a period after each White move number.  The use of a
>>period here has little, if any, need and just consumes space.
>
>Lots of things consume space, but humans expect the period to be there.  Don't
>mess with it.  If we were looking for an efficient, compact representation, we
>wouldn't be using text to begin with.

Here I agree.  Otherwise let's dump SAN as well.  We define a simple
canonical move generation order, and then just store each move as a one
byte index into that list.  A 60 move game would require 120 bytes, total.

But it won't be readable by humans.  PGN is basically machine _and_ human
readable.
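
A rough sketch of the index scheme, to make the idea concrete. Everything named here is hypothetical: the `legal_moves` and `apply_move` callbacks stand in for a real chess move generator, plain sorting stands in for the agreed canonical order, and the demo uses a toy take-away game instead of chess so that the block stays self-contained. One byte per move should be enough for chess, given the commonly cited maximum of 218 legal moves in a reachable position.

```python
def encode_game(start, moves, legal_moves, apply_move):
    """Store each move as its index in the canonically ordered legal move list."""
    out = bytearray()
    position = start
    for move in moves:
        ordered = sorted(legal_moves(position))  # canonical order: plain sort
        out.append(ordered.index(move))          # ValueError if the move is illegal
        position = apply_move(position, move)
    return bytes(out)


def decode_game(start, data, legal_moves, apply_move):
    """Replay the byte indexes against the same generator to recover the moves."""
    moves = []
    position = start
    for index in data:
        move = sorted(legal_moves(position))[index]
        moves.append(move)
        position = apply_move(position, move)
    return moves


# Toy stand-in for a chess move generator: take 1 to 3 stones from a pile.
def toy_legal_moves(stones):
    return [f"take{n}" for n in (1, 2, 3) if n <= stones]


def toy_apply_move(stones, move):
    return stones - int(move[len("take"):])


game = ["take3", "take1", "take2"]
encoded = encode_game(10, game, toy_legal_moves, toy_apply_move)
decoded = decode_game(10, encoded, toy_legal_moves, toy_apply_move)
```

The encoded form is one byte per move, but it means nothing without the generator and the starting position, which is exactly the readability problem.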

>
>>3. Removal of all mention of "canonical representation".  It was an attempt to
>>support matching PGN movetexts based on simple string comparisons.  Unneeded.
>>
>>4. Formalization of the PGN tag name set, including any PGN tag names that have
>>become popular "in the wild" and deprecating those which are rarely, if ever,
>>used.
>>
>>5. Formalization of PGN tag value semantics.  Part of this includes the use of
>>"*" to indicate an unknown value, just as it already does for a game result.
>
>I should point out here that most data uses semi-colon to separate two values,
>though PGN indicates a colon should be used.
>
>>6. Removal of the binary representation standard.  This is unneeded as the use
>>of fast and portable text compression tools is now commonplace.
>>
>>7. Adding some kind of formal way of representing attributes for aggregates of
>>PGN game data.
>
>You've missed a category, and the one that is most contentious: NAG (numeric
>annotation glyphs).
>
>I think the design erred in the way that NAGs were assigned meaning.  This can
>be evidenced by the large amount of data that uses NAGs in Informant style, e.g.
>"white has a space advantage" is used irrespective of whose turn it is.  In
>practice, the code for this is defined as "space advantage", and the "black has
>a space advantage" might as well not exist.  The player whom space advantage
>refers to is invariably the player of the last move before the NAG occurs.
>
>In the below, "nobody" is of course probably not true, there's likely SOMEONE
>who uses it, but what I mean is that widespread practice is otherwise.
>
>For instance:
>11 Equal chances, quiet position (=)
>12 Equal chances, active position (=)
>
>Nobody uses $12.  $11 is invariably used to refer to any position assessed as
>equal.
>
>7 Forced move
>8 Singular move; no reasonable alternatives
>
>Nobody uses $7.
>
>14 White has a slight advantage (+=)
>15 Black has a slight advantage (=+)
>16 White has a moderate advantage (+/-)
>17 Black has a moderate advantage (-/+)
>18 White has a decisive advantage (+-)
>19 Black has a decisive advantage (-+)
>20 White has a crushing advantage (+-)
>21 Black has a crushing advantage (-+)
>
>Nobody uses $20 or $21.
>
>Better would have been to assign codes for: white, black, slight advantage,
>clear advantage, decisive advantage.  That's not obvious from the above, but
>here we see:
>
>24 White has a slight space advantage
>25 Black has a slight space advantage
>26 White has a moderate space advantage
>27 Black has a moderate space advantage
>28 White has a decisive space advantage
>29 Black has a decisive space advantage
>30 White has a slight time (development) advantage
>31 Black has a slight time (development) advantage
>32 White has a moderate time (development) advantage
>33 Black has a moderate time (development) advantage
>34 White has a decisive time (development) advantage
>35 Black has a decisive time (development) advantage
>36 White has the initiative
>37 Black has the initiative
>38 White has a lasting initiative
>39 Black has a lasting initiative
>40 White has the attack
>41 Black has the attack
>
>And so on.  Nobody uses the black version of these, just the white.
>Furthermore, nobody uses 24, 28, 30, 34, or 38 either, because they don't map to
>Chess Informant codes, which is what people are used to.  In practice, $40 means
>"with the attack" whether the NAG spec says so or not.
>
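
The factoring suggested above (side, degree, kind as separate pieces) can be sketched against the existing numbering. The code ranges come from the NAG list quoted above; the function name and the "overall" label are made up for this sketch, and only the 14-19, 24-29, and 30-35 blocks are covered.

```python
DEGREES = ("slight", "moderate", "decisive")

# First code of each six-code block, mapped to the kind of advantage.
BLOCKS = {
    14: "overall",             # label invented for this sketch
    24: "space",
    30: "time (development)",
}


def decompose_nag(code):
    """Split a NAG code into (side, degree, kind), or None if not covered."""
    for base, kind in BLOCKS.items():
        offset = code - base
        if 0 <= offset < 6:
            side = "White" if offset % 2 == 0 else "Black"
            return side, DEGREES[offset // 2], kind
    return None
```

For example, decompose_nag(26) gives ("White", "moderate", "space"), while $20 and $21 ("crushing") fall outside the three-degree pattern and return None.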
>
>You should also check http://scid.sourceforge.net/help/NAGs.html, which contains
>the following extensions to represent common symbols:
>140 With the idea ...
>141 Aimed against ...
>142 Better move
>143 Worse move
>144 Equivalent move
>145 Editor's Remark ("RR")
>146 Novelty ("N")
>147 Weak point
>148 Endgame
>149 Line
>150 Diagonal
>and so on, see the URL for the complete list.
>
>There was also some extension to PGN made for clock timing:
>http://www.enpassant.dk/chess/palview/enhancedpgn.htm
>
>Frankly, there are too many separate notations for everything.  How many
>acronyms does one need? :-)  There's room to tweak the standard, and if that's
>all you want to do, fine and well.  But there are other things that make sense
>to do that aren't supported at all currently, and would look like a hack with
>our current notation.
>
>For instance, consider the record of a game where moves were missed up to the
>time control.  It should be possible to indicate that a series of moves are
>missing, that the next board position is like so, and continue with a new move
>numbering from there (e.g. it might jump by eight or something.)  This would
>also have allowed the representation of games where an illegal move occurred
>(arguably the most important case), representation of non-standard chess games
>in PGN by introducing a board after funny-not-real-chess-castling occurred, or
>introducing a board after a piece was dropped onto the board (e.g. "crazy
>house", like bughouse but for 2 people).  This is not an ideal solution for
>everything but it at least would have been an acceptable workaround.
>
>Anyway, my point is that tweaks are one thing, but if you want to do some XML
>thing, though, then start fresh.  I don't think the XML will be as
>human-readable as PGN, though.  Tweaking PGN, and using XML to create a database
>interchange format are, I think, two separate tasks.
>
>Dave



Last modified: Thu, 15 Apr 21 08:11:13 -0700
