Computer Chess

Hash Collisions in Chess Engines, and What They May Mean...

(12/10): This research has been paused in favor of my chess anti-cheating and ratings work, but is potentially more important. The goal is to find a concrete, feasible test for pseudo-randomness on small scales, something my field believes is impossible on large scales.

Chess programs use open-address hash tables without caring about safety measures, because a bad entry causes no more disaster than losing one game. For a small case of a bad table making a program go haywire, here is my "2006 Xmas Card to the World of Computer Chess". A little ghoulish admittedly, but the green and gold colors of Winboard/XBoard at least look festive. It shows GnuChess 5.0.7 declaring itself up almost a Rook and 3 pawns at 13 ply depth (more than the 12 plies considered a "gold standard" in a relevant paper by Bratko and Guid reviewed by Dr. Soren Riis for ChessBase.com here), but with a Principal Variation (PV) in which it declares intent to play 78.Qb6+ Kc2 79.Bc5??? hanging its Queen with check and losing! The previous moves were 74.Ke6-e7 Rb1-b7+ 75.Ke7-d6 Rb7-b1 76.Qc3-d4+ Kd1-c2 77.Qd4-c5+ Kc2-b3, and if you rewind to moves 74 and 75 with White to move, you have the positions causing the most problems. All of these positions are objectively drawn.

NEW 17 Jan. 2012 Although this is primarily a known (Knight) underpromotion bug in Rybka versions 2.3.2a clear thru to the present 4.1, the fact that it's accompanied by a long stall (which is reproducible from empty hash) may make it related to timing issues isolated to Rybka 3 hash-table size here (in German, Google Translate version here).

This shows a sampling of anomalies obtained while analyzing the notorious Kramnik-Grischuk endgame from Mexico 2007 with Deep Fritz 10 on a quad-core machine (with 3 cores allocated to Df10). The main first two examples do not show when only a single core is executing, but reproduce fairly frequently with 2-3-4 core threads. Thus the program with 1 core acts as its own control, indicating that the weirdly high evals are due not to chess tactics or even the program's own architecture, but rather to the interplay between multiple threads and the storage of already-evaluated positions.

More GnuChess anomaly, hanging Black's Rook and Queen in two moves! (It's not stalemate, either!) I am told this kind of thing may be a "display bug", and indeed in related tests engines "wise up" when you step them one move further into the position, but the display itself constitutes a verifiable anomaly. The other reason this GnuChess 5.0.7 example is insignificant is that I'm using a default hash table with only 1,024 entries, incredibly tiny! Anomalies with other engines, however, have been observed with hash up to 512MB and above (the ChessBase/Fritz 9 GUI is buggy when you try to set 1024MB hash, both as reported to me and by my own observations of the hash-size window showing "1" or "4" after having been set to "1024").

(Speculative Content---may be proved wrong, and references may be updated as I find better ones) The most far-reaching possible ramifications are a new kind of statistical test for pseudorandom generators (PRGs), which are (unadvisably IMHO but almost universally) used to construct 64-bit "Zobrist Keys" for each of 12 kinds of piece on each of 64 squares. Along with Z-keys for player-to-move, castling rights, and en-passant possibilities, the 64-bit key of a position is the XOR of the keys for each of its constituents. Under the general name subset-sum hashing this is a major topic in Computational Complexity. Here GnuChess 5.0 first generates 55 32-bit numbers using the Mathematica 2.0 Random function, then feeds them into a PRG from Knuth's ACP vol. 2 to generate the Z-keys, for which subset-sum needs represent a 3rd level of pseudo-randomness (John von Neumann's "Original Sin") that is piled on here! Also crucially, a second level of hashing is applied to map the 64-bit keys into however-many entries are allocated by the chess program's user---if each entry needs 16 bytes then 512MB of hash will yield 2^25 = about 32 million entries, and one can take 25 bits from either end of the 64-bit Z-key or wrap or truncate in either ways. Thus agreement in a portion of the position key may yield a hash collision, and with chess engines all doing open-address hashing with little or no probing since "speed is king", this can affect a result, as famously happened to the Shredder 9.1 program in July 2005 in the game shown here, with explanation (key part in Spanish---I have not seen a comparably-detailed English version and will translate myself when I get the chance) here. The off-the-wall question is, can this kind of problem be isolated to (aspects of) the choice of PRG generating the Z-keys? Points of novelty are:

The test would use an "exponential time algorithm", namely a chess engine. But chess engines have been tuned to be fast, and they output data amenable to both quantitative and "human" processing.
The test would be "in situ", as held desirable in this 1995 paper, which also remarks on the need to test short sequences (such as just under 50,000 Z-key bits) produced by PRGs. Ingredients of some of the possible problems are discussed in this 2005 paper.
The use of depth-first search in chess engines means that the test might not be subject to "dimensionality" limitations and other constraints of commonly-used statistical tests.
Most speculatively, hash anomalies (such as evidently in the Gnuchess example above, but across significantly many chess engines now as will be posted) may represent a kind of "emergent behavior", which is more generally known as an enemy to randomized dynamical computer simulations that require homogeneity and independence to deliver right results.

Insofar as systems with few degrees of freedom and low independence may be best known from elementary quantum mechanics, I have posted a "quantum basketball" analogy here, as comments to the Dec. 20 "An amazing feat!" (of two long basketball shots made by the same player in consecutive games) in Grandmaster Susan Polgar's Chess Blog, which is also a marvelous source for items of general science and character-building interest! This was for explanation to some people involved in current testing, but you can at least read it as general science writing linking the famous "EPR Paradox" and "entanglement" to questions of improbable events and "synchronicity". If you have research-relevant ideas which may shed more light, by all means please e-mail them to me! My home page has full contact info, plus I am listed.