Developer boots Linux 292,612 times to track down a bug that occurs about once in every 1,000 boots



Richard W.M. Jones, a developer at Red Hat, noticed a bug that occasionally caused Linux v6.4 to hang at boot, and booted Linux 292,612 times to track it down.

I booted Linux 292,612 times
https://rwmj.wordpress.com/2023/06/14/i-booted-linux-292612-times/



Dev Boots Linux 292,612 Times to Find Intel, AMD Kernel Bug | Tom's Hardware
https://www.tomshardware.com/news/dev-boots-linux-292612-times-for-1-in-1000-kernel-bug



Jones first suspected a boot hang while testing nbdkit, server software for the NBD protocol, which is used to access block devices over a network. He noticed random hangs when the library 'libguestfs' was used to access a virtual machine disk image. Jones then set out to pin the bug down himself.

According to Jones, when the kernel was booted under the open source processor emulator 'QEMU', the hang always occurred at the same stage of the boot process.

Linux kernel hangs rarely when booting on the latest qemu (#1696) · Issues · QEMU / QEMU · GitLab
https://gitlab.com/qemu-project/qemu/-/issues/1696#note_1428829389



Jones then booted Linux 292,612 times and checked each boot for the hang. The result showed that the kernel hangs at boot about once in every 1,000 boots. The 292,612-boot test reportedly took a total of 21 hours, and Jones said, 'It took days to do the restart test so far.'
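To see why a bug of this kind requires so many boots, the arithmetic can be sketched as follows. This is only an illustration, not Jones's calculation: it assumes each boot hangs independently with a fixed probability, and the function names are made up for this example.

```python
import math


def prob_at_least_one_hang(p: float, boots: int) -> float:
    """Chance of seeing at least one hang in `boots` boots.

    Assumption: every boot hangs independently with probability p.
    """
    return 1.0 - (1.0 - p) ** boots


def boots_needed(p: float, confidence: float) -> int:
    """Smallest number of boots after which a bug with hang rate p
    would have shown up at least once with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    rate = 1 / 1000  # the approximate hang rate reported in the article
    print(prob_at_least_one_hang(rate, 1000))
    print(boots_needed(rate, 0.99))
```

Under these assumptions, 1,000 boots catch a 1-in-1,000 hang only about 63% of the time, and about 4,600 clean boots are needed before one can be 99% sure that a given kernel does not have the bug.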

Jones then ran 'guestfish', a command-line tool for inspecting a virtual machine's filesystem, 10,000 times in a loop, running many instances in parallel and parsing the output to find the cause. The culprit that occasionally interfered with Linux boot was a regression involving 'printk.time', the kernel setting that displays timestamps on the kernel console.
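A repeat-and-count test of this kind can be sketched roughly as below. This is not Jones's script: the helper names are hypothetical, the command to run is left as a parameter, and a hang is detected with a simple timeout rather than by parsing the output as Jones did.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def boot_once(cmd: list[str], timeout_s: float) -> bool:
    """Run one boot attempt and return True if it hung.

    Assumption: a run that exceeds timeout_s counts as a hang.
    subprocess.run kills the child process when the timeout expires.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        return False
    except subprocess.TimeoutExpired:
        return True


def count_hangs(cmd: list[str], runs: int, workers: int, timeout_s: float) -> int:
    """Run cmd `runs` times, `workers` at a time, and count the hangs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda _: boot_once(cmd, timeout_s), range(runs))
        return sum(results)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; in a real test this
    # would be the command that boots the kernel under test.
    stand_in = [sys.executable, "-c", "pass"]
    print(count_hangs(stand_in, runs=8, workers=4, timeout_s=30.0))
```

Threads are sufficient here because each worker spends its time waiting on a child process, so many boots can be in flight at once.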

According to Jones, comparing Linux v6.0, which did not show the hang, with v6.4-rc6 allowed him to narrow down the change responsible for the boot hang. Jones states that reverting the commit in question, which relates to printk time, fixes the problem.
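One common way to narrow a regression between a known-good and a known-bad version is bisection. The article does not describe Jones's exact procedure, so the following is only an illustrative sketch: it assumes a linear list of commits in which every commit after the first bad one is also bad, and the commit names and the `is_bad` test are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumptions: commits[0] is known good, commits[-1] is known bad,
    and every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: c0 is good, and the bug appears at c6.
    history = [f"c{i}" for i in range(10)]
    print(first_bad(history, lambda c: int(c[1:]) >= 6))
```

For a hang that appears only about once in 1,000 boots, `is_bad` cannot be a single boot: each step has to boot the kernel thousands of times before a commit can be called good, which is why narrowing down such a bug takes so long.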

LKML: 'Richard WM Jones': printk.time causes rare kernel boot hangs
https://lkml.org/lkml/2023/6/13/733



According to Jones, for reasons that are not clear, the boot hang occurred less frequently on machines with Intel CPUs than on machines with AMD CPUs.

in Software, Security, Posted by log1i_yk