diff options
author | Thomas Gleixner <tglx@linutronix.de> | 2016-03-04 15:59:42 +0100 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2016-04-12 07:33:49 -0700 |
commit | a9a3cef503366cfaf0ac6c98b27a80f9164c186a (patch) | |
tree | d1c8b0bc2b4852d6177f9170675178315c9b9e06 | |
parent | be7cc5a6ea59e48780c993fc11e4bee67abf2197 (diff) | |
download | lwn-a9a3cef503366cfaf0ac6c98b27a80f9164c186a.tar.gz lwn-a9a3cef503366cfaf0ac6c98b27a80f9164c186a.zip |
sched/cputime: Fix steal time accounting vs. CPU hotplug
commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream.
On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:
u64 steal = paravirt_steal_clock(smp_processor_id());
steal -= this_rq()->prev_steal_time;
So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.
Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.
None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.
Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r-- | kernel/sched/core.c | 1 | ||||
-rw-r--r-- | kernel/sched/sched.h | 13 |
2 files changed, 14 insertions, 0 deletions
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 41f6b2215aa8..a74073f8c08c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5630,6 +5630,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu) case CPU_UP_PREPARE: rq->calc_load_update = calc_load_update; + account_reset_rq(rq); break; case CPU_ONLINE: diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 10f16374df7f..ff87d887ff62 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu) } #endif /* CONFIG_64BIT */ #endif /* CONFIG_IRQ_TIME_ACCOUNTING */ + +static inline void account_reset_rq(struct rq *rq) +{ +#ifdef CONFIG_IRQ_TIME_ACCOUNTING + rq->prev_irq_time = 0; +#endif +#ifdef CONFIG_PARAVIRT + rq->prev_steal_time = 0; +#endif +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING + rq->prev_steal_time_rq = 0; +#endif +} |