diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 12:04:41 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 12:04:41 -0800 |
commit | d089f48fba28db14d0fe7753248f2575a9ddfc73 (patch) | |
tree | a3821c02dd38342193459e41ba453c058f75e3d2 /Documentation/RCU/stallwarn.rst | |
parent | 3f6ec19f2d05d800bbc42d95dece433da7697864 (diff) | |
parent | 2b392cb11c0db645ba81a08b6a2e96c56ec1fc64 (diff) | |
download | lwn-d089f48fba28db14d0fe7753248f2575a9ddfc73.tar.gz lwn-d089f48fba28db14d0fe7753248f2575a9ddfc73.zip |
Merge tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"These are the latest RCU updates for v5.12:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide
allocator return addresses to more easily locate bugs. This has a
couple of RCU-related commits, but is mostly MM. Was pulled in with
akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks, which enables
better debugging information and smarter reactions to large numbers
of callbacks.
- The first round of changes to allow CPUs to be runtime switched
from and to callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a
"torture everything" script that runs rcutorture, locktorture,
scftorture, rcuscale, and refscale. Plus does an allmodconfig
build.
- nolibc fixes for the torture tests"
* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
mm: Make mem_obj_dump() vmalloc() dumps include start and length
mm: Make mem_dump_obj() handle vmalloc() memory
mm: Make mem_dump_obj() handle NULL and zero-sized pointers
mm: Add mem_dump_obj() to print source of memory block
tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
tools/nolibc: Fix position of -lgcc in the documented example
tools/nolibc: Emit detailed error for missing alternate syscall number definitions
tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
tools/nolibc: Get timeval, timespec and timezone from linux/time.h
tools/nolibc: Implement poll() based on ppoll()
tools/nolibc: Implement fork() based on clone()
tools/nolibc: Make getpgrp() fall back to getpgid(0)
tools/nolibc: Make dup2() rely on dup3() when available
tools/nolibc: Add the definition for dup()
rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
torture: Maintain torture-specific set of CPUs-online books
torture: Clean up after torture-test CPU hotplugging
rcutorture: Make object_debug also double call_rcu() heap object
...
Diffstat (limited to 'Documentation/RCU/stallwarn.rst')
-rw-r--r-- | Documentation/RCU/stallwarn.rst | 27 |
1 files changed, 24 insertions, 3 deletions
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst index c9ab6af4d3be..7148e9be08c3 100644 --- a/Documentation/RCU/stallwarn.rst +++ b/Documentation/RCU/stallwarn.rst @@ -25,7 +25,7 @@ warnings: - A CPU looping with bottom halves disabled. -- For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel +- For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel without invoking schedule(). If the looping in the kernel is really expected and desirable behavior, you might need to add some calls to cond_resched(). @@ -44,7 +44,7 @@ warnings: result in the ``rcu_.*kthread starved for`` console-log message, which will include additional debugging information. -- A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might +- A CPU-bound real-time task in a CONFIG_PREEMPTION kernel, which might happen to preempt a low-priority task in the middle of an RCU read-side critical section. This is especially damaging if that low-priority task is not permitted to run on any other CPU, @@ -92,7 +92,9 @@ warnings: buggy timer hardware through bugs in the interrupt or exception path (whether hardware, firmware, or software) through bugs in Linux's timer subsystem through bugs in the scheduler, and, - yes, even including bugs in RCU itself. + yes, even including bugs in RCU itself. It can also result in + the ``rcu_.*timer wakeup didn't happen for`` console-log message, + which will include additional debugging information. - A bug in the RCU implementation. @@ -292,6 +294,25 @@ kthread is waiting for a short timeout, the "state" precedes value of the task_struct ->state field, and the "cpu" indicates that the grace-period kthread last ran on CPU 5. +If the relevant grace-period kthread does not wake from FQS wait in a +reasonable time, then the following additional line is printed:: + + kthread timer wakeup didn't happen for 23804 jiffies! g7076 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 + +The "23804" indicates that kthread's timer expired more than 23 thousand +jiffies ago. The rest of the line has meaning similar to the kthread +starvation case. + +Additionally, the following line is printed:: + + Possible timer handling issue on cpu=4 timer-softirq=11142 + +Here "cpu" indicates that the grace-period kthread last ran on CPU 4, +where it queued the fqs timer. The number following the "timer-softirq" +is the current ``TIMER_SOFTIRQ`` count on cpu 4. If this value does not +change on successive RCU CPU stall warnings, there is further reason to +suspect a timer problem. + Multiple Warnings From One Stall ================================ |