author | David Hildenbrand <david@redhat.com> | 2022-09-23 13:34:24 +0200 |
---|---|---|
committer | Jonathan Corbet <corbet@lwn.net> | 2022-09-29 13:20:53 -0600 |
commit | 1cfd9d7e43d5a1cf739d1420b10b1e65feb02f88 (patch) | |
tree | fd1ac535f84899677a03bbbd617fc463b19af928 /scripts | |
parent | 657ed9c9bca059660238771dd1fcecb57b59f90a (diff) | |
coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
is just as bad as BUG_ON(), because it will crash the kernel on
distributions that enable CONFIG_DEBUG_VM (like Fedora):
VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
no different, the only difference is "we can make the code smaller
because these are less important". [2]
This resulted in a more generic discussion about usage of BUG() and
friends. While there might be corner cases that still deserve a BUG_ON(),
most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
recovery path if reasonable:
The only possible case where BUG_ON can validly be used is "I have
some fundamental data corruption and cannot possibly return an
error". [2]
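As an illustration of "warn once and recover", here is a minimal userspace sketch. It is not kernel code: warn_on_once() is a hypothetical helper standing in for the kernel's WARN_ON_ONCE(), and struct queue, QUEUE_CAPACITY and queue_push() are invented for the example. The point is only that a violated invariant is reported once and turned into an error return rather than a crash.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define QUEUE_CAPACITY 4	/* hypothetical capacity, for illustration only */

struct queue {
	size_t count;
	int items[QUEUE_CAPACITY];
};

/* Counts how many warnings were actually printed (handy for checking "once"). */
static int warnings_printed;

/*
 * Hypothetical userspace stand-in for WARN_ON_ONCE(): returns cond, and
 * prints a message only the first time cond is true for a given flag.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

static int queue_push(struct queue *q, int value)
{
	static bool warned;	/* one flag per call site, as with WARN_ON_ONCE() */

	/* "Cannot happen" state: warn once and take the recovery path. */
	if (warn_on_once(q->count > QUEUE_CAPACITY, &warned,
			 "queue count exceeds capacity"))
		return -EINVAL;

	/* Ordinary, expected failure: plain error return, no warning. */
	if (q->count == QUEUE_CAPACITY)
		return -ENOSPC;

	q->items[q->count++] = value;
	return 0;
}
```

A caller that pushes into a queue with a corrupted count gets -EINVAL on every call, while the warning text appears only on the first one.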
A very good approximation is the general rule:
"absolutely no new BUG_ON() calls _ever_" [2]
... not even if something really shouldn't ever happen and is merely for
documenting that an invariant always has to hold. However, there are still
exceptions where BUG_ON() may be used:
If you have a "this is major internal corruption, there's no way we can
continue", then BUG_ON() is appropriate. [3]
There is only one good BUG_ON():
Now, that said, there is one very valid sub-form of BUG_ON():
BUILD_BUG_ON() is absolutely 100% fine. [2]
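BUILD_BUG_ON() is harmless at run time because it fails the build instead of the running system. The sketch below mimics that idea in userspace with C11 _Static_assert. The macro definition is an assumption for illustration (the kernel defines BUILD_BUG_ON() differently), and struct wire_header and wire_header_size() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in: breaks the build when cond is true. */
#define BUILD_BUG_ON(cond) \
	_Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/* Hypothetical fixed-layout header whose size and layout must not drift. */
struct wire_header {
	uint32_t magic;
	uint16_t version;
	uint16_t flags;
};

static size_t wire_header_size(void)
{
	/* Both checks are evaluated by the compiler; no code is emitted. */
	BUILD_BUG_ON(sizeof(struct wire_header) != 8);
	BUILD_BUG_ON(offsetof(struct wire_header, version) != 4);

	return sizeof(struct wire_header);
}
```

If someone later adds a field to the struct, compilation stops at the first check, so the mistake never reaches a running machine.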
While WARN will also crash the machine with panic_on_warn set, that's
exactly to be expected:
So we have two very different cases: the "virtual machine with good
logging where a dead machine is fine" - use 'panic_on_warn'. And
the actual real hardware with real drivers, running real loads by
users. [4]
The basic idea is that warnings will similarly get reported by users
and be found during testing. However, in contrast to a BUG(), there is a
way to actually influence the expected behavior (e.g., panic_on_warn)
and to eventually keep the machine alive to extract some debug info.
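The difference in who decides the outcome can be modeled in a few lines of userspace C. This is a toy model only: panic_on_warn_policy, report_warning() and report_bug() are hypothetical names, and nothing here reflects how the kernel implements either path. It shows that a warning defers to an administrator-controlled setting, while a BUG leaves no choice.

```c
#include <stdbool.h>
#include <stdio.h>

enum outcome { KEEP_RUNNING, HALT };

/* Models the administrator's panic_on_warn choice; off by default here. */
static bool panic_on_warn_policy;

/* A warning is always reported; the policy decides whether to stop. */
static enum outcome report_warning(const char *what)
{
	fprintf(stderr, "WARNING: %s\n", what);
	return panic_on_warn_policy ? HALT : KEEP_RUNNING;
}

/* A BUG is reported and always stops; no policy can override it. */
static enum outcome report_bug(const char *what)
{
	fprintf(stderr, "BUG: %s\n", what);
	return HALT;
}
```

With the policy off, the machine in this model survives a warning and remains available for extracting debug information; with it on, a warning and a BUG behave the same.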
Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
expect this code to trigger in any case, recovery code is not really
helpful.
I'd prefer to keep all these warnings 'simple' - i.e. no attempted
recovery & control flow, unless we ever expect these to trigger.
[5]
There have been different rules floating around that were never properly
documented. Let's try to clarify.
[1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
[2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
[3] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
[4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
[5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20220923113426.52871-2-david@redhat.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>