Author: Dave Gomboc
Date: 07:11:05 09/10/03
On September 09, 2003 at 18:33:22, Steven Edwards wrote:

>Greetings:
>
>It was some ten years ago back in 1994 that the CDS (Chess Data Standards: SAN,
>FEN, EPD, and PGN) took their current form after months of newsgroup discussions
>and megabytes of email exchange. Prior to the adoption of the CDS, the exchange
>of chess data and the interoperability of chess software was greatly impeded by
>the lack of standards and the prevalence of secret and non-free proprietary
>formats.
>
>Although the CDS are now used by (nearly) all archives and all programs, it does
>not mean that the current form of the standards is the best. Certainly there
>are places where extensions can be made and where unneeded items can be deprecated and
>eventually deleted. To make a revision work, there is a requirement for input
>from the experienced chess programmer community just as there was such a need
>ten years ago. Therefore, I once again issue the call for online (here in CCC)
>and email discussion on the CDS topic. (Including "CDS" in the subject line
>helps in this process.)
>
>Here are some of my ideas:
>
>SAN: Looks fine; I don't see any need for changes. There is the possibility to
>extend the piece identifier letter semantics to accommodate heterodox /
>unorthodox chess.

The use of FAN (figurine algebraic notation) would be more forward-looking. I'm not going to use KQRBNP forever, given there are glyphs in Unicode for the chess pieces. Switching would also obviate the need to translate the piece letters into other languages.

>FEN: One deficiency exists for which I take responsibility; the en passant
>target square semantics should indicate a non null value only if an active
>color pawn attacks the passed-over target square. A change here will
>improve position database operation and can also have a positive effect
>on internal transposition management. Also, there is the possibility of
>extending the castling availability semantics to accommodate heterodox /
>unorthodox chess.
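The en passant rule Edwards proposes above can be sketched in a few lines. This is a minimal Python sketch, not part of any standard: `parse_placement` and `normalize_ep` are hypothetical names, the input is assumed to be a well-formed six-field FEN string, and the test follows the proposal's wording literally (an active-color pawn attacks the passed-over square), so pins and other legality questions are deliberately not checked.

```python
FILES = "abcdefgh"


def parse_placement(placement):
    """Map squares such as 'e4' to piece letters, from FEN field 1."""
    board = {}
    for row, rank_text in enumerate(placement.split("/")):
        rank = 8 - row  # FEN lists rank 8 first
        file_index = 0
        for ch in rank_text:
            if ch.isdigit():
                file_index += int(ch)  # a run of empty squares
            else:
                board[FILES[file_index] + str(rank)] = ch
                file_index += 1
    return board


def normalize_ep(fen):
    """Clear the e.p. field unless an active-color pawn attacks the target.

    Hypothetical helper following the proposed rule literally: it does not
    check pins, checks, or whether the capture is actually legal.
    """
    fields = fen.split()
    placement, active, ep = fields[0], fields[1], fields[3]
    if ep == "-":
        return fen
    board = parse_placement(placement)
    target_file = FILES.index(ep[0])
    target_rank = int(ep[1])
    # A White pawn captures toward rank 8, so it must stand one rank below
    # the target square; a Black pawn must stand one rank above it.
    if active == "w":
        pawn, pawn_rank = "P", target_rank - 1
    else:
        pawn, pawn_rank = "p", target_rank + 1
    attacked = any(
        board.get(FILES[f] + str(pawn_rank)) == pawn
        for f in (target_file - 1, target_file + 1)
        if 0 <= f < 8
    )
    if not attacked:
        fields[3] = "-"
    return " ".join(fields)
```

After 1. e4 from the initial position, for instance, no Black pawn stands on d4 or f4, so the sketch rewrites the e3 field to "-"; with a White pawn on e5 and Black's reply ...d7-d5, the d6 field is kept.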
Yes, people regularly ignore the spec when it comes to the e.p. square.

>EPD: There are several items of concern:
>
>1. Currently, the first four EPD fields match the first four FEN fields. This
>was done to save space with the idea that the "hmvc" and "fmvn" EPD operations
>could provide the extra information if needed. This is rather arbitrary and I
>suggest that every EPD record have the same first six fields as does a FEN
>record. Alternatively, EPD opcodes can be defined for the (current) first four
>EPD fields and so there would be NO required fields at the start of an EPD
>record. Either solution is better than the status quo.
>
>2. Representations of string/symbolic data in operands are inconsistent with
>respect to the need for quoting. This is also the case with PGN tag values. A
>uniform rule is needed for both.
>
>3. Representations of time and date value operands need to be formalized along
>with a provision for sub-second decimal resolution.
>
>4. The centipawn evaluation operand type needs a mate score indication
>correction.
>
>5. The centipawn evaluation operand type probably needs to be deprecated and
>replaced with a pawn evaluation operand type with a provision for sub-pawn
>decimal resolution.

This should have used a real value to begin with.

>6. The time control operand type needs to be extended and formalized. PGN has a
>problem with this as well.
>
>7. A formal XML schema could be useful. Likewise for PGN.

The current syntax is not compatible with XML. You could consider heading an effort to make a replacement standard interchange format that uses XML.

>8. Removal of record length limitations.
>
>9. Explicit support for 64 bit integer values (decimal and hexadecimal) as
>operands.
>
>10. Inclusion of program-to-program command protocol opcodes.
>
>
>PGN: Again, several areas need re-examination.
>
>1. Adding the Broket Form to the movetext. A broket form is a single EPD
>operation delimited by angle brackets ("brokets"). This is a far better
>approach to embedding metadata than the current use of comments.
>
>2. Deprecation of the use of a period after each White move number. The use of a
>period here has little, if any, need and just consumes space.

Lots of things consume space, but humans expect the period to be there. Don't mess with it. If we were looking for an efficient, compact representation, we wouldn't be using text to begin with.

>3. Removal of all mention of "canonical representation". It was an attempt to
>support matching PGN movetexts based on simple string comparisons. Unneeded.
>
>4. Formalization of the PGN tag name set, including any PGN tag names that have
>become popular "in the wild" and deprecating those which are rarely, if ever,
>used.
>
>5. Formalization of PGN tag value semantics. Part of this includes the use of
>"*" to indicate an unknown value, just as it already does for a game result.

I should point out here that most data in the wild uses a semicolon to separate multiple values within a tag value, even though PGN specifies a colon.

>6. Removal of the binary representation standard. This is unneeded as the use
>of fast and portable text compression tools is now commonplace.
>
>7. Adding some kind of formal way of representing attributes for aggregates of
>PGN game data.

You've missed a category, and the most contentious one: NAGs (numeric annotation glyphs).

I think the design erred in the way meanings were assigned to NAGs. The evidence is the large amount of data that uses NAGs in Informant style: "White has a space advantage" is used irrespective of whose turn it is. In practice, that code simply means "space advantage", and "Black has a space advantage" might as well not exist. The space advantage is invariably credited to the player who made the move the NAG follows.

In what follows, "nobody" is of course an exaggeration; there's likely SOMEONE who uses these codes, but what I mean is that widespread practice is otherwise.
For instance:

11 Equal chances, quiet position (=)
12 Equal chances, active position (=)

Nobody uses $12. $11 is invariably used to refer to any position assessed as equal.

7 Forced move
8 Singular move; no reasonable alternatives

Nobody uses $7.

14 White has a slight advantage (+=)
15 Black has a slight advantage (=+)
16 White has a moderate advantage (+/-)
17 Black has a moderate advantage (-/+)
18 White has a decisive advantage (+-)
19 Black has a decisive advantage (-+)
20 White has a crushing advantage (+-)
21 Black has a crushing advantage (-+)

Nobody uses $20 or $21.

It would have been better to assign separate codes for: White, Black, slight advantage, clear advantage, decisive advantage. Why that is better isn't obvious from the list above, but consider:

24 White has a slight space advantage
25 Black has a slight space advantage
26 White has a moderate space advantage
27 Black has a moderate space advantage
28 White has a decisive space advantage
29 Black has a decisive space advantage
30 White has a slight time (development) advantage
31 Black has a slight time (development) advantage
32 White has a moderate time (development) advantage
33 Black has a moderate time (development) advantage
34 White has a decisive time (development) advantage
35 Black has a decisive time (development) advantage
36 White has the initiative
37 Black has the initiative
38 White has a lasting initiative
39 Black has a lasting initiative
40 White has the attack
41 Black has the attack

And so on. Nobody uses the Black versions of these, just the White ones. Furthermore, nobody uses $24, $28, $30, $34, or $38 either, because they don't map to Chess Informant symbols, which are what people are used to. In practice, $40 means "with the attack" whether the NAG spec says so or not.

You should also check http://scid.sourceforge.net/help/NAGs.html, which contains the following extensions to represent common symbols:

140 With the idea ...
141 Aimed against ...
142 Better move
143 Worse move
144 Equivalent move
145 Editor's Remark ("RR")
146 Novelty ("N")
147 Weak point
148 Endgame
149 Line
150 Diagonal

and so on; see the URL for the complete list.

There is also an extension to PGN for recording clock times: http://www.enpassant.dk/chess/palview/enhancedpgn.htm

Frankly, there are too many separate notations for everything. How many acronyms does one need? :-)

There's room to tweak the standard, and if that's all you want to do, fine and well. But there are other things worth doing that aren't supported at all currently, and that would look like a hack in our current notation.

For instance, consider the record of a game where moves went unrecorded up to the time control. It should be possible to indicate that a series of moves is missing, give the board position that follows, and continue with a new move numbering from there (e.g. it might jump by eight or so).

This would also allow representing games where an illegal move occurred (arguably the most important case), representing non-standard chess games in PGN by introducing a board after funny-not-real-chess castling occurred, or introducing a board after a piece was dropped onto the board (e.g. "crazyhouse", like bughouse but for two people). It is not an ideal solution for everything, but it would at least be an acceptable workaround.

Anyway, my point is that tweaks are one thing, but if you want to do an XML format, start fresh. I don't think XML will be as human-readable as PGN, though. Tweaking PGN and using XML to create a database interchange format are, I think, two separate tasks.

Dave
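The split into side plus theme argued for above can be sketched against the existing code table. This is a minimal Python sketch with hypothetical names, covering only the themed codes $24 to $41 listed in the post. `decode_nag` reads a code the way the PGN standard defines it; `decode_nag_as_used` encodes the Informant-style practice described above, crediting the theme to the player who made the annotated move. That reading is an observed convention, not anything the standard specifies, and the sketch assumes the mover is already known from the movetext.

```python
# Themed advantage NAGs $24-$41 from the PGN code table: each even code
# names White, and the following odd code is the same theme for Black.
THEMES = {
    24: "slight space advantage",
    26: "moderate space advantage",
    28: "decisive space advantage",
    30: "slight time (development) advantage",
    32: "moderate time (development) advantage",
    34: "decisive time (development) advantage",
    36: "initiative",
    38: "lasting initiative",
    40: "attack",
}


def decode_nag(nag):
    """Return (side, theme) for a themed NAG as the standard defines it."""
    white_code = nag - nag % 2  # fold a Black code onto its White twin
    if white_code not in THEMES:
        raise ValueError(f"${nag} is not a themed advantage NAG")
    side = "white" if nag % 2 == 0 else "black"
    return side, THEMES[white_code]


def decode_nag_as_used(nag, mover):
    """Informant-style reading: credit the theme to the player who just moved.

    `mover` ("white" or "black") is assumed to come from the movetext; the
    side named by the NAG itself is ignored, reflecting common practice.
    """
    _, theme = decode_nag(nag)
    return mover, theme
```

Under that reading, $40 after a Black move decodes as ("black", "attack"), which matches the "with the attack" sense the post describes.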