author Paul E. McKenney <paulmck@linux.ibm.com> 2019-03-29 16:43:51 -0700
committer Paul E. McKenney <paulmck@linux.ibm.com> 2019-08-13 14:32:39 -0700
commit 12f54c3a8410102afb96ed437aebe7f1d87f399f
tree 972e4b2944a7075cd191ebf4543a645dc90fd681 /kernel/rcu/tree.h
parent 6484fe54b5c64e9a388f369001508ab8df85a646
rcu/nocb: Provide separate no-CBs grace-period kthreads

Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads are divided into groups. The first rcuo kthread to come online in a given group is that group's leader, and the leader both waits for grace periods and invokes its CPU's callbacks. The non-leader rcuo kthreads only invoke callbacks.

This works well in the real-time/embedded environments for which it was intended because such environments tend not to generate all that many callbacks. However, given huge floods of callbacks, it is possible for the leader kthread to be stuck invoking callbacks while its followers wait helplessly as their callbacks pile up. This is a good recipe for an OOM, and rcutorture's new callback-flood capability does generate such OOMs.

One strategy would be to wait until such OOMs start happening in production, but similar OOMs have in fact happened starting in 2018. It would therefore be wise to take a more proactive approach.

This commit therefore features per-CPU rcuo kthreads that do nothing but invoke callbacks. Instead of having one of these kthreads act as leader, each group has a separate rcuog kthread that handles grace periods for its group. Because these rcuog kthreads do not invoke callbacks, callback floods on one CPU no longer block callbacks from reaching the rcuc callback-invocation kthreads on other CPUs.

This change does introduce additional kthreads, however:

1. The number of additional kthreads is about the square root of the number of CPUs, so that a 4096-CPU system would have only about 64 additional kthreads. Note that recent changes decreased the number of rcuo kthreads by a factor of two (CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so this still represents a significant improvement on most systems.

2. The leading "rcuo" of the rcuog kthreads should allow existing scripting to affinity these additional kthreads as needed, the same as for the rcuop and rcuos kthreads. (There are no longer any rcuob kthreads.)

3. A state-machine approach was considered and rejected. Although this would allow the rcuo kthreads to continue their dual leader/follower roles, it complicates callback invocation and makes it more difficult to consolidate rcuo callback invocation with existing softirq callback invocation.

The introduction of rcuog kthreads should thus be acceptable.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
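
The square-root estimate in item 1 can be checked with a little arithmetic. The sketch below is a userspace model, not kernel code: it assumes the CPUs are split into groups of roughly sqrt(ncpus) and that each group gets exactly one rcuog kthread. The helper names isqrt_floor() and extra_gp_kthreads() are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical helper: integer square root, rounded down. */
static unsigned int isqrt_floor(unsigned int n)
{
	unsigned int r = 0;

	/* Widen the product so that it cannot wrap for large n. */
	while ((unsigned long long)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/*
 * Hypothetical model: groups hold about sqrt(ncpus) CPUs each, and
 * every group needs one grace-period (rcuog) kthread.
 */
static unsigned int extra_gp_kthreads(unsigned int ncpus)
{
	unsigned int stride;

	assert(ncpus > 0);
	stride = isqrt_floor(ncpus);

	/* Round up: a partially filled group still needs its own kthread. */
	return (ncpus + stride - 1) / stride;
}
```

Under these assumptions, extra_gp_kthreads(4096) evaluates to 64, matching the figure quoted in the commit message.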
Diffstat (limited to 'kernel/rcu/tree.h')
-rw-r--r-- kernel/rcu/tree.h | 6
1 files changed, 4 insertions, 2 deletions
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 32b3348d3a4d..dc3c53cb9608 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -200,8 +200,8 @@ struct rcu_data {
 	atomic_long_t nocb_q_count_lazy; /* invocation (all stages). */
 	struct rcu_head *nocb_cb_head; /* CBs ready to invoke. */
 	struct rcu_head **nocb_cb_tail;
-	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
-	struct task_struct *nocb_cb_kthread;
+	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
+	struct task_struct *nocb_gp_kthread;
 	raw_spinlock_t nocb_lock; /* Guard following pair of fields. */
 	int nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */
 	struct timer_list nocb_timer; /* Enforce finite deferral. */
@@ -211,6 +211,8 @@ struct rcu_data {
 					/* CBs waiting for GP. */
 	struct rcu_head **nocb_gp_tail;
 	bool nocb_gp_sleep; /* Is the nocb GP thread asleep? */
+	struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
+	struct task_struct *nocb_cb_kthread;
 	struct rcu_data *nocb_next_cb_rdp;
 					/* Next rcu_data in wakeup chain. */
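
To illustrate what the two kthread pointers in the hunks above are for, here is a minimal userspace sketch. It is not the kernel's implementation: the types kthread_model and rcu_data_model, the NCPUS and STRIDE constants, and model_assign_kthreads() are all hypothetical, and grouping CPUs by contiguous number is an assumption made only for the example. Only the field names nocb_gp_kthread and nocb_cb_kthread come from the patch.

```c
#define NCPUS  16	/* Hypothetical CPU count for illustration. */
#define STRIDE 4	/* Hypothetical group size, about sqrt(NCPUS). */

/* Stand-in for struct task_struct; only an ID is needed here. */
struct kthread_model {
	int id;
};

/* Pared-down stand-in for the no-CBs fields of struct rcu_data. */
struct rcu_data_model {
	struct kthread_model *nocb_gp_kthread;	/* Shared by the whole group. */
	struct kthread_model *nocb_cb_kthread;	/* Private to this CPU. */
};

static struct kthread_model gp_kthreads[(NCPUS + STRIDE - 1) / STRIDE];
static struct kthread_model cb_kthreads[NCPUS];
static struct rcu_data_model rdp[NCPUS];

/* Point every CPU at its own CB kthread and at its group's GP kthread. */
static void model_assign_kthreads(void)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		int group = cpu / STRIDE;

		gp_kthreads[group].id = group;
		cb_kthreads[cpu].id = cpu;
		rdp[cpu].nocb_gp_kthread = &gp_kthreads[group];
		rdp[cpu].nocb_cb_kthread = &cb_kthreads[cpu];
	}
}
```

In this model, CPUs 0 through 3 share one grace-period kthread while each keeps a distinct callback kthread, which mirrors the separation the commit message describes: a callback flood on one CPU occupies only that CPU's callback kthread.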