The unexpected bug was caused by a full moon



Engineer Bartłomiej Cupiał has shared the story of one of the strangest bugs he has ever seen, which he ran into while training a neural network to play the roguelike game NetHack.



Thread by @CupiaBart on Thread Reader App – Thread Reader App
https://threadreaderapp.com/thread/1793930355617259811.html

Together with Maciej Wołczyk, a postdoctoral researcher at Jagiellonian University in Poland, Cupiał was training a neural network to play the roguelike game NetHack, starting from a model developed by AI researcher Jens Tuyls.



After further improvement through reinforcement learning, the model was scoring 5,000 points, and Cupiał and his colleagues planned to raise that score with more fine-tuning. Then one day the model could only manage 3,000 points.

NetHack generates a different dungeon every time it is played, so Cupiał and his team had apparently been evaluating with a fixed seed. The drop persisted even when they changed the seed, so Cupiał suspected the model code itself and rolled it back to a version from a few days earlier. The score stayed at 3,000 points, and rolling back to code from a few weeks earlier made no difference either.

Luckily, the server used for the experiments still held the files from the run that had reached 5,000 points. Cupiał and his team reran the model with those files, and it still scored only 3,000.

Reviewing the whole environment, Cupiał found that CUDA, the library used for fast computation on the GPU, had been upgraded from version 11.8 to 12.4. He doubted that a CUDA version change could have that much effect, and indeed the score was still 3,000 points after he reverted to 11.8. A freshly built CUDA 12.4 environment also gave 3,000 points.

An example of NetHack gameplay



He even ran the model on his personal laptop with multithreading, the GPU, and every other option that might cause trouble turned off, and still got a score of 3,000.

Cupiał said he had been working on the problem for hours and felt like he was going crazy, so he sent Tuyls a message about it before going to bed and called it a day.

The next day, Cupiał received a reply from Tuyls: 'Today is a full moon.'

NetHack divides the lunar month into eight phases of three or four days each, determined from the system clock, and while the phase is 'full moon' the player's character is given extra luck. That does not make the game harder, but the model's training set did not include full-moon data, so it played worse and scored lower.
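
To make the eight-phase split concrete, here is a minimal Python sketch of how a moon phase can be derived from nothing but a calendar date. It is a port, written from memory, of the `phase_of_the_moon` routine in NetHack's C source, so the constants are an assumption and should be checked against the NetHack version actually in use. The helper `full_moon_days` and the sample dates are hypothetical and exist only for illustration.

```python
from datetime import date, timedelta

FULL_MOON = 4  # phases run 0-7; 0 is new moon, 4 is full moon


def phase_of_the_moon(day: date) -> int:
    """Approximate moon phase (0-7) for a calendar date.

    Assumption: mirrors NetHack's C routine of the same name as
    recalled from its source; verify the constants before relying on it.
    """
    diy = day.timetuple().tm_yday - 1      # C's tm_yday is 0-based
    goldn = ((day.year - 1900) % 19) + 1   # C's tm_year counts from 1900
    epact = (11 * goldn + 18) % 30
    if (epact == 25 and goldn > 11) or epact == 24:
        epact += 1
    # 177/6 = 29.5 days per cycle, 22/6 days per phase (three or four days)
    return (((((diy + epact) * 6) + 11) % 177) // 22) & 7


def full_moon_days(start: date, days: int) -> list:
    """Hypothetical helper: dates in [start, start + days) in the full-moon phase."""
    candidates = (start + timedelta(n) for n in range(days))
    return [d for d in candidates if phase_of_the_moon(d) == FULL_MOON]


if __name__ == "__main__":
    # Sample month chosen only for illustration.
    for d in full_moon_days(date(2024, 5, 1), 31):
        print(d.isoformat())
```

Because the phase depends only on the date, an evaluation script could log `phase_of_the_moon(date.today())` next to each score, which would make this kind of clock-dependent behaviour visible in the results.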

Time - NetHack Wiki
https://nethackwiki.com/wiki/Time

Just to be sure, Cupiał set the system clock to a date outside the 'full moon' period, and the model scored 5,000 points again.

Cupiał thanked Tuyls and closed the story with this advice: 'If you encounter any unexpected bugs, be sure to check the moon phase.'



in Software, Posted by logc_nt