diff options
author | Byungchul Park <byungchul.park@lge.com> | 2017-10-25 17:56:00 +0900 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2017-10-25 12:19:01 +0200 |
commit | d141babe4244945f1d001118578e0eb3ce12729d (patch) | |
tree | 2a8e33099749737decdd4cb0e4b8438c66ee3bec /kernel/locking | |
parent | 24208435e343679b21502fb90786084dfaf15369 (diff) | |
download | lwn-d141babe4244945f1d001118578e0eb3ce12729d.tar.gz lwn-d141babe4244945f1d001118578e0eb3ce12729d.zip |
locking/lockdep: Add a boot parameter allowing unwind in cross-release and disable it by default
Johan Hovold reported a heavy performance regression caused by lockdep
cross-release:
> Boot time (from "Linux version" to login prompt) had in fact doubled
> since 4.13 where it took 17 seconds (with my current config) compared to
> the 35 seconds I now see with 4.14-rc4.
>
> I quick bisect pointed to lockdep and specifically the following commit:
>
> 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition
> of a crosslock")
>
> which I've verified is the commit which doubled the boot time (compared
> to 28a903f63ec0^) (added by lockdep crossrelease series [1]).
Currently cross-release performs unwind on every acquisition, but that
is very expensive.
This patch makes unwind optional and disables it by default and only
records acquire_ip.
Full stack traces are sometimes required for full analysis, in which
case a boot paramter, crossrelease_fullstack, can be specified.
On my qemu Ubuntu machine (x86_64, 4 cores, 512M), the regression was
fixed. We measure boot times with 'perf stat --null --repeat 10 $QEMU',
where $QEMU launches a kernel with init=/bin/true:
1. No lockdep enabled:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
2.756558155 seconds time elapsed ( +- 0.09% )
2. Lockdep enabled:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
2.968710420 seconds time elapsed ( +- 0.12% )
3. Lockdep enabled + cross-release enabled:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
3.153839636 seconds time elapsed ( +- 0.31% )
4. Lockdep enabled + cross-release enabled + this patch applied:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
2.963669551 seconds time elapsed ( +- 0.11% )
I.e. lockdep cross-release performance is now indistinguishable from
vanilla lockdep.
Bisected-by: Johan Hovold <johan@kernel.org>
Analyzed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: amir73il@gmail.com
Cc: axboe@kernel.dk
Cc: darrick.wong@oracle.com
Cc: david@fromorbit.com
Cc: hch@infradead.org
Cc: idryomov@gmail.com
Cc: johannes.berg@intel.com
Cc: kernel-team@lge.com
Cc: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1508921765-15396-5-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'kernel/locking')
-rw-r--r-- | kernel/locking/lockdep.c | 19 |
1 files changed, 17 insertions, 2 deletions
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index e36e652d996f..160b5d6df7cb 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -76,6 +76,15 @@ module_param(lock_stat, int, 0644); #define lock_stat 0 #endif +static int crossrelease_fullstack; +static int __init allow_crossrelease_fullstack(char *str) +{ + crossrelease_fullstack = 1; + return 0; +} + +early_param("crossrelease_fullstack", allow_crossrelease_fullstack); + /* * lockdep_lock: protects the lockdep graph, the hashes and the * class/list/hash allocators. @@ -4863,8 +4872,14 @@ static void add_xhlock(struct held_lock *hlock) xhlock->trace.nr_entries = 0; xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES; xhlock->trace.entries = xhlock->trace_entries; - xhlock->trace.skip = 3; - save_stack_trace(&xhlock->trace); + + if (crossrelease_fullstack) { + xhlock->trace.skip = 3; + save_stack_trace(&xhlock->trace); + } else { + xhlock->trace.nr_entries = 1; + xhlock->trace.entries[0] = hlock->acquire_ip; + } } static inline int same_context_xhlock(struct hist_lock *xhlock) |