When computers fail to play good chess

The Top Chess Engine Championship is ongoing. The final is, not surprisingly, Stockfish against Komodo. As I write, round 58 is under way, and Komodo is leading with four wins to one.

It can be viewed here: http://tcec.chessdom.com/live.php

In an earlier round (22), something hard to explain happened: Stockfish had a winning position and misplayed it! Gone is the illusion that “you have to be a computer to win this”, or at least it has been amended a bit.

On top of this there was something to feed conspiracy theories. Here is what a source close to the match said:

“After 53.b6 the online broadcast stopped. The reason it stopped was a technical glitch. There are two computers running the TCEC event: a 24-core machine that runs the engines and the tournament program cutechess-cli, and a webserver. The game-playing machine had a problem uploading the PGN file to the webserver (that’s my assessment of what happened). The live broadcast went offline in a position which seemed like an easy win for SF (SF was showing +7 and Komodo was also showing a high score). Then the game was drawn; offline… and out of view.”

Indeed Stockfish did mess up a winning position, which led some Stockfish fans to conclude that there was foul play involved. I enjoy the sound of “computer cheating in computer tournament” or “human cheats in computer tournament”, or whatever the Daily Mail, always looking for another chess scandal to write about, will be able to make of it. (Recently an English player switched to Wales – for the second time in his life – and somehow this was seen as a major scandal in chess, although no one had actually noticed, or cared once they did notice… Best of luck to Nigel of course, but why it should be a scandal is hard to understand.)

To me the most interesting question is “Why did Stockfish mess up?”, not “Did the Knights Templar hide the descendants of Christ?” or whatever…

Here is the position.

TCEC Season 8 – Superfinal http://tcec.chessdom.com (22), 11.11.2015

1.d4 Nf6 2.c4 e6 3.Nf3 b6 4.g3 Ba6 5.Nbd2 c5 6.e4 cxd4 7.e5 Ng4 8.h3 Nh6 9.Bg2 Nc6 10.0–0 Be7 11.Qa4 Bb7 12.Nxd4 Nxd4 13.Bxb7 Rb8 14.Be4 Qc7 15.Qd1 Nhf5 16.Re1 Qxe5 17.Nb3 Rd8 18.Bf4 Qf6 19.Qd3 Bc5 20.Rad1 Nxb3 21.axb3 Nd4 22.Kg2 Nc6 23.h4 a5 24.Qe2 Qe7 25.Qh5 g6 26.Qf3 Nd4 27.Qc3 Qf6 28.Bd5 Bb4 29.Qd3 0–0 30.Be5 Qf5 31.Qxd4 Bxe1 32.Rxe1 d6 33.Bf6 e5 34.Qxb6 Qxf6 35.Qxa5 Kh8 36.b4 g5 37.Rh1 gxh4 38.Rxh4 Qg6 39.Qa3 f5 40.Qf3 Qg7 41.b5 Rb8 42.b4 Rf6 43.Rh5 Qg6 44.Qe2 f4 45.Be4 Qg7 46.Qf3 Rh6 47.Rxh6 Qxh6 48.Qe2 fxg3 49.fxg3 Qg5 50.c5 Rg8 51.Qe1 dxc5 52.bxc5 Rd8 53.b6 Rd2+ 54.Kg1 Qd8 55.Qe3 Rb2 56.Bf3 Rb1+ 57.Kg2 Rb2+ 58.Kh3 Qf6 59.b7 Qe6+ 60.g4 h5 61.c6 hxg4+ 62.Bxg4 Qd6 63.Bf5 Qf6

64.Kg4 Rg2+ 65.Kf3 Rb2 66.Kg4 Rg2+ 67.Kf3 Rb2 68.Qe4 Qd6
½–½

I analysed the position myself, with engine help, and found that White is winning:

64.Bc2 Rb4 (64…Qxc6 65.Qxe5+ Kg8 66.Qg3+!+–) 65.Be4 Rb2 66.Kg4 Qe6+ 67.Kg5 Qe7+ 68.Kf5 Qf7+ 69.Kxe5 Qg7+ 70.Ke6 Qg8+ 71.Kf5 Rb5+ 72.Kf4 Qf7+ (72…Qf8+ 73.Kg3 Qf6 transposes.) 73.Kg3 Qf6 74.Qf3 Qd6+ (74…Rg5+ 75.Kf2 Qb2+ 76.Qe2 Qd4+ 77.Kf3 Qf6+ 78.Ke3 Qc3+ 79.Qd3 Qe1+ 80.Kf3 Qh1+ 81.Kf2 Qh2+ 82.Ke3 Qg3+ 83.Kd2 also seems winning.) 75.Qf4 Rg5+ 76.Kf3 Qd1+ 77.Kf2 Qd4+ 78.Qe3 Qb2+ 79.Qe2 Qd4+ 80.Kf3 Qf6+ 81.Ke3 Qe5 82.Qd2 Rg8 83.Qd5 Qc3+ 84.Ke2 Qb2+ 85.Qd2 Qb5+ 86.Ke1 Rg1+ 87.Kf2 Qf1+ 88.Ke3 Rg3+ 89.Kd4 Qa1+ 90.Kc5 Qa3+ 91.Kc4 Qa4+ 92.Kd5 Rg5+ 93.Kd6 Qa3+ 94.Ke6 Qh3+ 95.Ke7 Rg7+ 96.Kf6 Qh4+ 97.Ke5 Re7+ 98.Kd6 Qf6+ 99.Kc5 Re8 100.Qd4 Re5+ 101.Kc4 Qf1+ 102.Kb4 Qb5+ 103.Kc3 Kg7 104.Qd7+ Kh6 105.Qd6+ Kg5 106.b8Q

On the TCEC noticeboard, Louis Zulli gives a winning line very similar to the one I found:

+123.21 [+] [*] 64. Bc2 Rb4 65. Be4 Rb2 66. Kg4 Qe6+ 67. Kg5 Qe7+ 68. Kf5 Qf7+ 69. Kxe5 Qg7+ 70. Ke6 Qg8+ 71. Kf5 Rb5+ 72. Kf4 Qf7+ 73. Kg3 Qf6 74. Qf3 Qd6+ 75. Qf4 Rg5+ 76. Kf3 Qd1+ 77. Kf2 Qd4+ 78. Qe3 Qb2+ 79. Qe2 Qd4+ 80. Kf3 Qf6+ 81. Ke3 Qc3+ 82. Qd3 Qe1+ 83. Kf3 Qh1+ 84. Ke2 Qh2+ 85. Ke3 Qg1+ 86. Kd2 Qf2+ 87. Kc3 Qc5+ 88. Kb3 Qb6+ 89. Kc2 Rb5 90. Qc3+ Kg8 91. c7 Qf2+ 92. Kd3 Qf1+ 93. Ke3 Qh3+ 94. Ke2 Qh2+ 95. Kf1 Rb1+ 96. Bxb1 Qh1+ 97. Ke2 Qh2+ 98. Kd1 Qg1+ 99. Kc2 Qf2+ 100. Kb3 Qb6+ 101. Qb4 Qe3+ 102. Bd3 Qe6+ 103. Bc4 Kh7 104. Bxe6 (depth 57, 0:09:43)

So if anyone has any idea why (oh why) Stockfish did not follow its own recommendation, I would be glad to know. Clearly time constraints will be part of the story. Another part is that engines have not yet been programmed to prefer lines with chances of winning but none of losing over lines that lead to an instant draw (as here). (I cannot imagine this would be too difficult to do?)
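To make the idea concrete, here is a minimal Python sketch of such a tie-break. It is a toy, not how Stockfish or Komodo actually choose moves: it assumes the outcome of every opponent reply to each candidate move is already known (for instance from a tablebase or a deeper search), and the names `Candidate`, `value`, `pressure` and `pick_move` are made up for illustration. Moves are first ranked by their plain minimax outcome; only among equally valued moves does it prefer the one that leaves the opponent the most ways to go wrong.

```python
from dataclasses import dataclass

# Outcomes seen from the engine's side (hypothetical encoding).
WIN, DRAW, LOSS = 1, 0, -1


@dataclass
class Candidate:
    """A candidate move plus the known outcome of every opponent reply.

    The reply outcomes are assumed to come from a deeper search or a
    tablebase; this sketch does not compute them. The list must be non-empty.
    """
    move: str
    reply_outcomes: list[int]


def value(c: Candidate) -> int:
    # Plain minimax: the opponent picks the reply that is worst for us.
    return min(c.reply_outcomes)


def pressure(c: Candidate) -> float:
    # Share of opponent replies that lose for the opponent.
    return sum(o == WIN for o in c.reply_outcomes) / len(c.reply_outcomes)


def pick_move(candidates: list[Candidate]) -> str:
    best = max(value(c) for c in candidates)
    # Never give up minimax value: only tied moves are compared further.
    tied = [c for c in candidates if value(c) == best]
    # Among equally valued moves, keep the one with the most chances for the
    # opponent to go wrong; an instant draw scores 0 here.
    return max(tied, key=pressure).move


if __name__ == "__main__":
    moves = [Candidate("repeat", [DRAW]), Candidate("press", [DRAW, WIN, WIN])]
    print(pick_move(moves))  # press
```

With two hypothetical drawn candidates, one that repeats at once and one where two of the opponent’s three replies lose, `pick_move` returns the second. A move with any losing reply is never chosen over a safe draw, because the tie-break only runs after the minimax comparison.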

Any suggestion is very welcome.

20 thoughts on “When computers fail to play good chess”

  1. An Ordinary Chessplayer

    “Another part is that engines have not yet been programmed to choose the lines where there are possibilities for winning, but none for losing …”

    The way the computer “thinks”, all lines have possibilities for losing. E.g. in your variation, 64.Bc2 Rb4 65.Qd4? is losing.

  2. @An Ordinary Chessplayer

    I think Jacob’s point is that a computer might decide that, with White to move, it has 3 ideas that at best draw, but could then select from those 3 the variation that gives Black the most opportunities to go wrong.

    I have long thought about the same possibility Jacob mentioned within endgame tablebases, which would be very easy to generate, as the tree data is already there. What Jacob mentions is possible if you use IDeA analysis in Aquarium – you can generate plots for each possible branch in a tree that will display the % of nodes beyond the initial position that contain each evaluation, e.g. 50% =, 30% +=, 15% +/-, and 5% +-. Of course, your analysis might not be correct, but the functionality is there!

    I suspect that engines might not store data in a way that makes it amenable to do this, but I really have no idea, that’s just speculation.

  3. An Ordinary Chessplayer

    I got his point; my elliptically made point was that it is a human calculation, for use against a human opponent. The engines achieve depth by pruning the very data the analysis requires. For tablebases I can see it, but even there the preference should be for depth, e.g. extending to 8-man instead of enhancing the 7-man etc.

  4. I can recommend

    https://groups.google.com/forum/?fromgroups=#!forum/fishcooking
    and
    http://talkchess.com/forum/index.php

    for everyone who is interested in the highly complex trial-and-error methods of computer chess software engineering. The jury is still out on what made Stockfish “forget” the winning line, but there are several possible reasons (hash table collisions, excessive pruning rules, using 32 threads, the implementation of 6-man tablebases, etc.).

  5. Until I saw this post I wasn’t aware of the TCEC tournament. (Been in my office finishing Avrukh’s Catalan book :) ) I use Stockfish as my go-to analysis engine and have noticed an interesting trend: every once in a while, in certain not-so-complicated positions, it will go down a weird line that gives a decent plus, something like 0.60 to 0.90. At these times I ask, ‘Why? Why can’t I just play X? It fits better with the position.’ Interestingly, my moves tend to get better results from Stockfish itself. I once timed Stockfish, and it was almost a minute before it even considered my move; after that it became the top choice. That could be because I only have a quad-core processor, or because I am not running enough lines, but I find it interesting.

  6. An Ordinary Chessplayer

    @Everybody – It is clear that I am not making myself clear. Sorry for that. I will have to go into verbose mode.

    I think “no losing chances” is a human thought which is not easy to translate into computer code. Jacob suggests an alternate move selection, based on that idea, to be implemented in a program. I don’t believe it will happen because it is only useful against human opponents. Computers are rated in competition against other computers, so there is not any rating advantage to doing this difficult implementation. And really, at this point why should computers need to play any better against humans?

    However, I have a suggestion which might be relatively easy to implement. Forgive me if engines already do this. If we make a distinction between positions that are 0.5 (eventually drawn) and positions that are immediately 1/2-1/2 (because of stalemate, repetition, or 50-move rule), a simple way to have the computer “try” to win would be to choose the 0.5 line that is 1/2-1/2 at the greatest depth. The only drawback might be that sometimes a computer that is “worse” (according to a human assessment) would avoid stalemating itself in preference to drawing by the 50-move rule. Again, pretty pointless against another engine, but it might cause a human to flag or blunder.

  7. An Ordinary Chessplayer

    @Paul – Interesting indeed. Is this reproducible, like if you take the position as EPD or PGN and give it to Stockfish again, does it do the same? If so, then I bet the programmers would also be interested.

  8. @An Ordinary Chessplayer
    Did I insult you? I did not mean to. But clearly engines sometimes have options where one move is an immediate draw and the other is risk-free. If they were programmed to go for the risk-free option, they should be more successful in these TCEC tournaments.

  9. @An Ordinary Chessplayer
    With the position loaded from a .cbv file, it takes roughly 1:30 before the engine even considers my move (11…e5). If you’re interested, here is the position. Start your Stockfish engine in the position after 11.c3.

    1. e4 c5
    2. Nc3 Nc6
    3. f4 e6
    4. Bc4 Nge7
    5. e5 d5
    6. exd6 Nf5
    7. Ne4 Nxd6
    8. Nxd6+ Bxd6
    9. Ne2 Qh4+
    10. g3 Qf6?!
    11. c3 e5

  10. @Jacob Aagaard
    Interesting. My Stockfish doesn’t even consider 11…e5 as an option until after a minute. I will increase the number of lines when I get home and see if that changes things.

    My computer rated 11…0-0 as -0.58 and 11…e5 as -0.71, which is not much but still something, and with a shorter time control it’s possible that Stockfish would play the ‘not best’ move. I would be interested in seeing how Komodo rates the position. If it finds a better move faster, that could give it a very slight edge.

  11. Paul :
    @Jacob Aagaard
    If I run ten lines, the delay is negligible. Interesting.

  12. Komodo 7 is now freeware, available here:
    https://komodochess.com/pub/komodo-7.zip

    The main difference between Komodo and Stockfish is the amount of ‘chess knowledge’, or intuition, in Komodo’s ‘feel’ for a position. Even without searching, Komodo knows which moves are likely strongest. You can see this for yourself by playing against Komodo limited to ‘search depth 1’.

  13. I have quite a new computer, with 16 GB RAM and a Core i5-6600K on a Z170 motherboard. I tried different 64-bit engines in default mode, using 2 lines and around 9 GB of hash: Houdini 4 Pro, Stockfish 6 and 7, and Komodo 8. Only Komodo 8 gives me 11…e5 quickly. With the rest, it takes about 30-60 seconds, analysing at about 7000 kN/s.

  14. Or no, sorry, not even Komodo 8 shows 11…e5. My mistake, that was when I changed to the best 3 lines. I think it happened at around depth 21. For Stockfish 7 with 3 lines, it took until depth 25 for 11…e5 to show up in the moves list, and it ends up as the top choice at depth 26.
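The suggestion in comment 6, to choose among drawn lines the one that reaches 1/2-1/2 at the greatest depth, can likewise be sketched as a tiny minimax with a tie-break. This is a toy over a hand-built tree whose leaf results are assumed to be known; it is not how real engines search, and the names `search`, `best_move` and the tree layout are hypothetical. It also assumes the opponent wants the opposite, i.e. the quickest draw.

```python
from typing import Union

# A hand-built toy game tree (hypothetical): a leaf is the final result from
# the engine's side (+1 win, 0 draw, -1 loss); an inner node maps move names
# to child nodes.
Tree = Union[int, dict]


def search(node: Tree, our_turn: bool, ply: int = 0) -> tuple[int, int]:
    """Minimax on the key (result, draw_ply).

    draw_ply is the ply at which a drawn line ends (0 for decisive results),
    so among equal results the engine prefers the draw that comes latest.
    The opponent minimises the same key, i.e. heads for the quickest draw.
    """
    if isinstance(node, int):
        return (node, ply if node == 0 else 0)
    keys = [search(child, not our_turn, ply + 1) for child in node.values()]
    return max(keys) if our_turn else min(keys)


def best_move(root: dict) -> str:
    # The engine moves at the root, so every child is searched with the
    # opponent to move, one ply deep.
    return max(root, key=lambda m: search(root[m], our_turn=False, ply=1))


if __name__ == "__main__":
    tree = {
        "repeat": 0,                       # immediate 1/2-1/2 at ply 1
        "press": {"hold": {"quiet": 0},    # still a draw, but only at ply 3
                  "slip": 1},              # a reply that loses for the opponent
    }
    print(best_move(tree))  # press
```

The tuple comparison does the work: Python compares the result first and looks at `draw_ply` only on a tie, so the engine never trades a win for a long draw, or a long draw for a loss.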
