diff options
author | Peter Zijlstra <peterz@infradead.org> | 2017-04-25 14:00:49 +0200 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2017-05-15 10:15:28 +0200 |
commit | 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 (patch) | |
tree | ac1164ba291f0082d2ff7469630f59c3a66413c8 /kernel/sched/topology.c | |
parent | af85596c74de2fd9abb87501ae280038ac28a3f4 (diff) | |
download | lwn-73bb059f9b8a00c5e1bf2f7ca83138c05d05e600.tar.gz lwn-73bb059f9b8a00c5e1bf2f7ca83138c05d05e600.zip |
sched/topology: Fix overlapping sched_group_mask
The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.
The current code gets it wrong, as can be readily demonstrated with a
topology like:
node 0 1 2 3
0: 10 20 30 20
1: 20 10 20 30
2: 30 20 10 20
3: 20 30 20 10
Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 (mask: 1), 2, 0
[] domain 1: span 0-3 level NUMA
[] groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)
This causes sched_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early resulting in
missed load-balance opportunities.
The fixed topology looks like:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 (mask: 1), 2, 0
[] domain 1: span 0-3 level NUMA
[] groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)
(note: this relies on OVERLAP domains to always have children, this is
true because the regular topology domains are still here -- this is
before degenerate trimming)
Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'kernel/sched/topology.c')
-rw-r--r-- | kernel/sched/topology.c | 18 |
1 files changed, 17 insertions, 1 deletions
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 21efacf547d4..09563c1d1d5b 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -495,6 +495,9 @@ enum s_alloc { /* * Build an iteration mask that can exclude certain CPUs from the upwards * domain traversal. + * + * Only CPUs that can arrive at this group should be considered to continue + * balancing. */ static void build_group_mask(struct sched_domain *sd, struct sched_group *sg) { @@ -505,11 +508,24 @@ static void build_group_mask(struct sched_domain *sd, struct sched_group *sg) for_each_cpu(i, sg_span) { sibling = *per_cpu_ptr(sdd->sd, i); - if (!cpumask_test_cpu(i, sched_domain_span(sibling))) + + /* + * Can happen in the asymmetric case, where these siblings are + * unused. The mask will not be empty because those CPUs that + * do have the top domain _should_ span the domain. + */ + if (!sibling->child) + continue; + + /* If we would not end up here, we can't continue from here */ + if (!cpumask_equal(sg_span, sched_domain_span(sibling->child))) continue; cpumask_set_cpu(i, sched_group_mask(sg)); } + + /* We must not have empty masks here */ + WARN_ON_ONCE(cpumask_empty(sched_group_mask(sg))); } /* |