A Big No-No to BUG_ON(): Beware of This Linux 4.8 Bug

Using BUG_ON() to debug kernel problems has been discouraged for a very long time. However, the Linux kernel still contains a number of BUG_ON() calls. One of them arrived in a late commit to kernel 4.8 and is now causing a lot of headaches in the release that came out on October 2nd, 2016.

This BUG_ON() can trigger when CONFIG_DEBUG_VM is enabled. Some distributions enable that option by default, for example in the standard Fedora kernel config. Linus Torvalds discovered that once this BUG_ON() triggers, the machine fails to handle a kernel paging request and then reports that a reboot is needed to fix a recursive fault. The machine never recovers from that state.

Here is his direct quote:

The reason the machine *dies* from that thing is that we end up then immediately having a

BUG: unable to handle kernel paging request at ffffffffb70bdaa8
 IP: blk_flush_plug_list+0x8b/0x250
 Call Trace:
 schedule+0x61/0x80
 do_exit+0x8c8/0xae0
 rewind_stack_do_exit+0x17/0x20

and then a

Fixing recursive fault but reboot is needed!

and the machine will never recover.

Fixing this bug is the number one priority for Linus, and he has asked Johannes Weiner to work on it. Measures are being taken to keep it out of a stable release. In the meantime, please check your kernel config and make sure CONFIG_DEBUG_VM is disabled until the bug is fixed.
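
One way to automate that check is sketched below. It is a minimal illustration, not an official tool: the helper name config_option_enabled() is hypothetical, and the location of the config file is an assumption (many distributions install it under /boot with the kernel version in the file name, but yours may differ). The sketch relies on the usual kernel config format, where an enabled built-in option appears as a line such as CONFIG_DEBUG_VM=y.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the file at `path` contains the exact
 * line "<option>=y", 0 if it does not, and -1 if the file cannot be opened.
 * The comparison is against the whole line, so a longer option name such
 * as CONFIG_DEBUG_VM_RB does not count as a match for CONFIG_DEBUG_VM.
 */
static int config_option_enabled(const char *path, const char *option)
{
    char line[512];
    char wanted[256];
    FILE *fp = fopen(path, "r");
    int found = 0;

    if (fp == NULL)
        return -1;

    snprintf(wanted, sizeof(wanted), "%s=y", option);
    while (fgets(line, sizeof(line), fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the trailing newline */
        if (strcmp(line, wanted) == 0) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

A caller would pass the path of the config file for the running kernel together with "CONFIG_DEBUG_VM", and treat a return value of 1 as a sign that the kernel is exposed to this problem.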

Linus reiterated his warning not to use BUG_ON() for debugging and to use WARN_ON() instead, a safer alternative that reports the problem and normally lets the kernel keep running.
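
To illustrate the pattern, here is a small userspace sketch. It is not kernel code: SKETCH_WARN_ON(), struct page_sketch, and release_page_sketch() are hypothetical stand-ins invented for this example. What it borrows from the kernel is the idiom that WARN_ON() evaluates to the truth value of its condition, so the caller can report the problem, return an error, and carry on instead of killing the current task.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's WARN_ON(): print a
 * warning when the condition holds, then hand the result back to the
 * caller so that execution can continue.
 */
static int sketch_warn_on(int cond, const char *expr, const char *file, int line)
{
    if (cond)
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    return cond;
}
#define SKETCH_WARN_ON(cond) \
    sketch_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical object with a reference count, used only for illustration. */
struct page_sketch {
    int refcount;
};

/*
 * Drop one reference. An inconsistent state is reported and turned into
 * an error code rather than bringing everything down.
 */
static int release_page_sketch(struct page_sketch *page)
{
    if (SKETCH_WARN_ON(page == NULL))
        return -EINVAL;
    if (SKETCH_WARN_ON(page->refcount <= 0))
        return -EINVAL;

    page->refcount--;
    return 0;
}
```

With a BUG_ON()-style check in the same place, the second call on an already released object would stop the program instead of returning -EINVAL. In the kernel, the equivalent outcome is the crash described in the quote above.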

For more details on this problem, please follow the thread on the LKML.

Author: Shuah Khan

Shuah contributes to multiple aspects of the Linux Kernel, and she maintains the Kernel Selftest framework.