diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-04-28 12:37:53 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-04-28 12:37:53 -0700 |
commit | 0ff0edb550e256597e505eff308f90d9a0b6677c (patch) | |
tree | 6d96f53e70f2bdec49627e30c2645ee97b987848 | |
parent | 9a45da9270b64b14e154093c28f746d861ab8c61 (diff) | |
parent | f4abe9967c6fdb511ee567e129a014b60945ab93 (diff) | |
download | lwn-0ff0edb550e256597e505eff308f90d9a0b6677c.tar.gz lwn-0ff0edb550e256597e505eff308f90d9a0b6677c.zip |
Merge tag 'locking-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- rtmutex cleanup & spring cleaning pass that removes ~400 lines of
code
- Futex simplifications & cleanups
- Add debugging to the CSD code, to help track down a tenacious race
(or hw problem)
- Add lockdep_assert_not_held(), to allow code to require a lock to not
be held, and propagate this into the ath10k driver
- Misc LKMM documentation updates
- Misc KCSAN updates: cleanups & documentation updates
- Misc fixes and cleanups
- Fix locktorture bugs with ww_mutexes
* tag 'locking-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
kcsan: Fix printk format string
static_call: Relax static_call_update() function argument type
static_call: Fix unused variable warn w/o MODULE
locking/rtmutex: Clean up signal handling in __rt_mutex_slowlock()
locking/rtmutex: Restrict the trylock WARN_ON() to debug
locking/rtmutex: Fix misleading comment in rt_mutex_postunlock()
locking/rtmutex: Consolidate the fast/slowpath invocation
locking/rtmutex: Make text section and inlining consistent
locking/rtmutex: Move debug functions as inlines into common header
locking/rtmutex: Decrapify __rt_mutex_init()
locking/rtmutex: Remove pointless CONFIG_RT_MUTEXES=n stubs
locking/rtmutex: Inline chainwalk depth check
locking/rtmutex: Move rt_mutex_debug_task_free() to rtmutex.c
locking/rtmutex: Remove empty and unused debug stubs
locking/rtmutex: Consolidate rt_mutex_init()
locking/rtmutex: Remove output from deadlock detector
locking/rtmutex: Remove rtmutex deadlock tester leftovers
locking/rtmutex: Remove rt_mutex_timed_lock()
MAINTAINERS: Add myself as futex reviewer
locking/mutex: Remove repeated declaration
...
44 files changed, 1246 insertions, 827 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fa08ec0dfbe7..54582ca6c4f9 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -782,6 +782,16 @@ cs89x0_media= [HW,NET] Format: { rj45 | aui | bnc } + csdlock_debug= [KNL] Enable debug add-ons of cross-CPU function call + handling. When switched on, additional debug data is + printed to the console in case a hanging CPU is + detected, and that CPU is pinged again in order to try + to resolve the hang situation. + 0: disable csdlock debugging (default) + 1: enable basic csdlock debugging (minor impact) + ext: enable extended csdlock debugging (more impact, + but more data) + dasd= [HW,NET] See header of drivers/s390/block/dasd_devmap.c. diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst index be7a0b0e1f28..d85ce238ace7 100644 --- a/Documentation/dev-tools/kcsan.rst +++ b/Documentation/dev-tools/kcsan.rst @@ -1,3 +1,6 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. Copyright (C) 2019, Google LLC. + The Kernel Concurrency Sanitizer (KCSAN) ======================================== diff --git a/MAINTAINERS b/MAINTAINERS index e32c844a154a..85a39e13c6c6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7452,6 +7452,7 @@ M: Thomas Gleixner <tglx@linutronix.de> M: Ingo Molnar <mingo@redhat.com> R: Peter Zijlstra <peterz@infradead.org> R: Darren Hart <dvhart@infradead.org> +R: Davidlohr Bueso <dave@stgolabs.net> L: linux-kernel@vger.kernel.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h index 8f009e788ad4..f610a773f2be 100644 --- a/arch/arm/include/asm/spinlock.h +++ b/arch/arm/include/asm/spinlock.h @@ -22,7 +22,7 @@ * assembler to insert a extra (16-bit) IT instruction, depending on the * presence or absence of neighbouring conditional instructions. * - * To avoid this unpredictableness, an approprite IT is inserted explicitly: + * To avoid this unpredictability, an appropriate IT is inserted explicitly: * the assembler won't change IT instructions which are explicitly present * in the input. */ diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h index 5ce342b91ff3..610a05374c02 100644 --- a/arch/x86/include/asm/jump_label.h +++ b/arch/x86/include/asm/jump_label.h @@ -14,7 +14,7 @@ #include <linux/stringify.h> #include <linux/types.h> -static __always_inline bool arch_static_branch(struct static_key *key, bool branch) +static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch) { asm_volatile_goto("1:" ".byte " __stringify(BYTES_NOP5) "\n\t" @@ -30,7 +30,7 @@ l_yes: return true; } -static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) +static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch) { asm_volatile_goto("1:" ".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t" diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index bb6c5ee43ac0..5ce4f8d038b9 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -4727,6 +4727,8 @@ out: /* Must not be called with conf_mutex held as workers can use that also. */ void ath10k_drain_tx(struct ath10k *ar) { + lockdep_assert_not_held(&ar->conf_mutex); + /* make sure rcu-protected mac80211 tx path itself is drained */ synchronize_net(); diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h index cf14840609ce..9fd0ad80fef6 100644 --- a/include/linux/kcsan-checks.h +++ b/include/linux/kcsan-checks.h @@ -1,4 +1,10 @@ /* SPDX-License-Identifier: GPL-2.0 */ +/* + * KCSAN access checks and modifiers. These can be used to explicitly check + * uninstrumented accesses, or change KCSAN checking behaviour of accesses. + * + * Copyright (C) 2019, Google LLC. + */ #ifndef _LINUX_KCSAN_CHECKS_H #define _LINUX_KCSAN_CHECKS_H diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h index 53340d8789f9..fc266ecb2a4d 100644 --- a/include/linux/kcsan.h +++ b/include/linux/kcsan.h @@ -1,4 +1,11 @@ /* SPDX-License-Identifier: GPL-2.0 */ +/* + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. Public interface and + * data structures to set up runtime. See kcsan-checks.h for explicit checks and + * modifiers. For more info please see Documentation/dev-tools/kcsan.rst. + * + * Copyright (C) 2019, Google LLC. + */ #ifndef _LINUX_KCSAN_H #define _LINUX_KCSAN_H diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 1c746139d5e3..5cf387813754 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -155,7 +155,7 @@ extern void lockdep_set_selftest_task(struct task_struct *task); extern void lockdep_init_task(struct task_struct *task); /* - * Split the recrursion counter in two to readily detect 'off' vs recursion. + * Split the recursion counter in two to readily detect 'off' vs recursion. */ #define LOCKDEP_RECURSION_BITS 16 #define LOCKDEP_OFF (1U << LOCKDEP_RECURSION_BITS) @@ -268,6 +268,11 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass, extern void lock_release(struct lockdep_map *lock, unsigned long ip); +/* lock_is_held_type() returns */ +#define LOCK_STATE_UNKNOWN -1 +#define LOCK_STATE_NOT_HELD 0 +#define LOCK_STATE_HELD 1 + /* * Same "read" as for lock_acquire(), except -1 means any. */ @@ -301,8 +306,14 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie); #define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0) -#define lockdep_assert_held(l) do { \ - WARN_ON(debug_locks && !lockdep_is_held(l)); \ +#define lockdep_assert_held(l) do { \ + WARN_ON(debug_locks && \ + lockdep_is_held(l) == LOCK_STATE_NOT_HELD); \ + } while (0) + +#define lockdep_assert_not_held(l) do { \ + WARN_ON(debug_locks && \ + lockdep_is_held(l) == LOCK_STATE_HELD); \ } while (0) #define lockdep_assert_held_write(l) do { \ @@ -397,7 +408,8 @@ extern int lockdep_is_held(const void *); #define lockdep_is_held_type(l, r) (1) #define lockdep_assert_held(l) do { (void)(l); } while (0) -#define lockdep_assert_held_write(l) do { (void)(l); } while (0) +#define lockdep_assert_not_held(l) do { (void)(l); } while (0) +#define lockdep_assert_held_write(l) do { (void)(l); } while (0) #define lockdep_assert_held_read(l) do { (void)(l); } while (0) #define lockdep_assert_held_once(l) do { (void)(l); } while (0) #define lockdep_assert_none_held_once() do { } while (0) diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 515cff77a4f4..e19323521f9c 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -20,6 +20,7 @@ #include <linux/osq_lock.h> #include <linux/debug_locks.h> +struct ww_class; struct ww_acquire_ctx; /* @@ -65,9 +66,6 @@ struct mutex { #endif }; -struct ww_class; -struct ww_acquire_ctx; - struct ww_mutex { struct mutex base; struct ww_acquire_ctx *ctx; diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h index 6fd615a0eea9..d1672de9ca89 100644 --- a/include/linux/rtmutex.h +++ b/include/linux/rtmutex.h @@ -31,12 +31,6 @@ struct rt_mutex { raw_spinlock_t wait_lock; struct rb_root_cached waiters; struct task_struct *owner; -#ifdef CONFIG_DEBUG_RT_MUTEXES - int save_state; - const char *name, *file; - int line; - void *magic; -#endif #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; #endif @@ -46,35 +40,17 @@ struct rt_mutex_waiter; struct hrtimer_sleeper; #ifdef CONFIG_DEBUG_RT_MUTEXES - extern int rt_mutex_debug_check_no_locks_freed(const void *from, - unsigned long len); - extern void rt_mutex_debug_check_no_locks_held(struct task_struct *task); +extern void rt_mutex_debug_task_free(struct task_struct *tsk); #else - static inline int rt_mutex_debug_check_no_locks_freed(const void *from, - unsigned long len) - { - return 0; - } -# define rt_mutex_debug_check_no_locks_held(task) do { } while (0) +static inline void rt_mutex_debug_task_free(struct task_struct *tsk) { } #endif -#ifdef CONFIG_DEBUG_RT_MUTEXES -# define __DEBUG_RT_MUTEX_INITIALIZER(mutexname) \ - , .name = #mutexname, .file = __FILE__, .line = __LINE__ - -# define rt_mutex_init(mutex) \ +#define rt_mutex_init(mutex) \ do { \ static struct lock_class_key __key; \ __rt_mutex_init(mutex, __func__, &__key); \ } while (0) - extern void rt_mutex_debug_task_free(struct task_struct *tsk); -#else -# define __DEBUG_RT_MUTEX_INITIALIZER(mutexname) -# define rt_mutex_init(mutex) __rt_mutex_init(mutex, NULL, NULL) -# define rt_mutex_debug_task_free(t) do { } while (0) -#endif - #ifdef CONFIG_DEBUG_LOCK_ALLOC #define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname) \ , .dep_map = { .name = #mutexname } @@ -86,7 +62,6 @@ do { \ { .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \ , .waiters = RB_ROOT_CACHED \ , .owner = NULL \ - __DEBUG_RT_MUTEX_INITIALIZER(mutexname) \ __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)} #define DEFINE_RT_MUTEX(mutexname) \ @@ -104,7 +79,6 @@ static inline int rt_mutex_is_locked(struct rt_mutex *lock) } extern void __rt_mutex_init(struct rt_mutex *lock, const char *name, struct lock_class_key *key); -extern void rt_mutex_destroy(struct rt_mutex *lock); #ifdef CONFIG_DEBUG_LOCK_ALLOC extern void rt_mutex_lock_nested(struct rt_mutex *lock, unsigned int subclass); @@ -115,9 +89,6 @@ extern void rt_mutex_lock(struct rt_mutex *lock); #endif extern int rt_mutex_lock_interruptible(struct rt_mutex *lock); -extern int rt_mutex_timed_lock(struct rt_mutex *lock, - struct hrtimer_sleeper *timeout); - extern int rt_mutex_trylock(struct rt_mutex *lock); extern void rt_mutex_unlock(struct rt_mutex *lock); diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 4c715be48717..a66038d88878 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -110,7 +110,7 @@ do { \ /* * This is the same regardless of which rwsem implementation that is being used. - * It is just a heuristic meant to be called by somebody alreadying holding the + * It is just a heuristic meant to be called by somebody already holding the * rwsem to see if somebody from an incompatible type is wanting access to the * lock. */ diff --git a/include/linux/static_call.h b/include/linux/static_call.h index e01b61ab86b1..fc94faa53b5b 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -118,9 +118,9 @@ extern void arch_static_call_transform(void *site, void *tramp, void *func, bool #define static_call_update(name, func) \ ({ \ - BUILD_BUG_ON(!__same_type(*(func), STATIC_CALL_TRAMP(name))); \ + typeof(&STATIC_CALL_TRAMP(name)) __F = (func); \ __static_call_update(&STATIC_CALL_KEY(name), \ - STATIC_CALL_TRAMP_ADDR(name), func); \ + STATIC_CALL_TRAMP_ADDR(name), __F); \ }) #define static_call_query(name) (READ_ONCE(STATIC_CALL_KEY(name).func)) diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h index 6ecf2a0220db..b77f39f319ad 100644 --- a/include/linux/ww_mutex.h +++ b/include/linux/ww_mutex.h @@ -48,39 +48,26 @@ struct ww_acquire_ctx { #endif }; -#ifdef CONFIG_DEBUG_LOCK_ALLOC -# define __WW_CLASS_MUTEX_INITIALIZER(lockname, class) \ - , .ww_class = class -#else -# define __WW_CLASS_MUTEX_INITIALIZER(lockname, class) -#endif - #define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die) \ { .stamp = ATOMIC_LONG_INIT(0) \ , .acquire_name = #ww_class "_acquire" \ , .mutex_name = #ww_class "_mutex" \ , .is_wait_die = _is_wait_die } -#define __WW_MUTEX_INITIALIZER(lockname, class) \ - { .base = __MUTEX_INITIALIZER(lockname.base) \ - __WW_CLASS_MUTEX_INITIALIZER(lockname, class) } - #define DEFINE_WD_CLASS(classname) \ struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1) #define DEFINE_WW_CLASS(classname) \ struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0) -#define DEFINE_WW_MUTEX(mutexname, ww_class) \ - struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class) - /** * ww_mutex_init - initialize the w/w mutex * @lock: the mutex to be initialized * @ww_class: the w/w class the mutex should belong to * * Initialize the w/w mutex to unlocked state and associate it with the given - * class. + * class. Static define macro for w/w mutex is not provided and this function + * is the only way to properly initialize the w/w mutex. * * It is not allowed to initialize an already locked mutex. */ diff --git a/kernel/futex.c b/kernel/futex.c index 00febd6dea9c..c98b825da9cf 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -981,6 +981,7 @@ static inline void exit_pi_state_list(struct task_struct *curr) { } * p->pi_lock: * * p->pi_state_list -> pi_state->list, relation + * pi_mutex->owner -> pi_state->owner, relation * * pi_state->refcount: * @@ -1494,13 +1495,14 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q) static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_state) { u32 curval, newval; + struct rt_mutex_waiter *top_waiter; struct task_struct *new_owner; bool postunlock = false; DEFINE_WAKE_Q(wake_q); int ret = 0; - new_owner = rt_mutex_next_owner(&pi_state->pi_mutex); - if (WARN_ON_ONCE(!new_owner)) { + top_waiter = rt_mutex_top_waiter(&pi_state->pi_mutex); + if (WARN_ON_ONCE(!top_waiter)) { /* * As per the comment in futex_unlock_pi() this should not happen. * @@ -1513,6 +1515,8 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_ goto out_unlock; } + new_owner = top_waiter->task; + /* * We pass it to the next owner. The WAITERS bit is always kept * enabled while there is PI state around. We cleanup the owner @@ -2315,19 +2319,15 @@ retry: /* * PI futexes can not be requeued and must remove themself from the - * hash bucket. The hash bucket lock (i.e. lock_ptr) is held on entry - * and dropped here. + * hash bucket. The hash bucket lock (i.e. lock_ptr) is held. */ static void unqueue_me_pi(struct futex_q *q) - __releases(q->lock_ptr) { __unqueue_futex(q); BUG_ON(!q->pi_state); put_pi_state(q->pi_state); q->pi_state = NULL; - - spin_unlock(q->lock_ptr); } static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, @@ -2909,8 +2909,8 @@ no_block: if (res) ret = (res < 0) ? res : 0; - /* Unqueue and drop the lock */ unqueue_me_pi(&q); + spin_unlock(q.lock_ptr); goto out; out_unlock_put_key: @@ -3237,15 +3237,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, * reference count. */ - /* Check if the requeue code acquired the second futex for us. */ + /* + * Check if the requeue code acquired the second futex for us and do + * any pertinent fixup. + */ if (!q.rt_waiter) { - /* - * Got the lock. We might not be the anticipated owner if we - * did a lock-steal - fix up the PI-state in that case. - */ if (q.pi_state && (q.pi_state->owner != current)) { spin_lock(q.lock_ptr); - ret = fixup_pi_state_owner(uaddr2, &q, current); + ret = fixup_owner(uaddr2, &q, true); /* * Drop the reference to the pi state which * the requeue_pi() code acquired for us. @@ -3287,8 +3286,8 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, if (res) ret = (res < 0) ? res : 0; - /* Unqueue and drop the lock. */ unqueue_me_pi(&q); + spin_unlock(q.lock_ptr); } if (ret == -EINTR) { diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile index 65ca5539c470..c2bb07f5bcc7 100644 --- a/kernel/kcsan/Makefile +++ b/kernel/kcsan/Makefile @@ -13,5 +13,5 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \ obj-y := core.o debugfs.o report.o obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o -CFLAGS_kcsan-test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer -obj-$(CONFIG_KCSAN_TEST) += kcsan-test.o +CFLAGS_kcsan_test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer +obj-$(CONFIG_KCSAN_KUNIT_TEST) += kcsan_test.o diff --git a/kernel/kcsan/atomic.h b/kernel/kcsan/atomic.h index 75fe701f4127..530ae1bda8e7 100644 --- a/kernel/kcsan/atomic.h +++ b/kernel/kcsan/atomic.h @@ -1,4 +1,9 @@ /* SPDX-License-Identifier: GPL-2.0 */ +/* + * Rules for implicitly atomic memory accesses. + * + * Copyright (C) 2019, Google LLC. + */ #ifndef _KERNEL_KCSAN_ATOMIC_H #define _KERNEL_KCSAN_ATOMIC_H diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c index 3bf98db9c702..45c821d4e8bd 100644 --- a/kernel/kcsan/core.c +++ b/kernel/kcsan/core.c @@ -1,4 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 +/* + * KCSAN core runtime. + * + * Copyright (C) 2019, Google LLC. + */ #define pr_fmt(fmt) "kcsan: " fmt @@ -639,8 +644,6 @@ void __init kcsan_init(void) BUG_ON(!in_task()); - kcsan_debugfs_init(); - for_each_possible_cpu(cpu) per_cpu(kcsan_rand_state, cpu) = (u32)get_cycles(); diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c index 3c8093a371b1..c1dd02f3be8b 100644 --- a/kernel/kcsan/debugfs.c +++ b/kernel/kcsan/debugfs.c @@ -1,4 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 +/* + * KCSAN debugfs interface. + * + * Copyright (C) 2019, Google LLC. + */ #define pr_fmt(fmt) "kcsan: " fmt @@ -261,7 +266,9 @@ static const struct file_operations debugfs_ops = .release = single_release }; -void __init kcsan_debugfs_init(void) +static void __init kcsan_debugfs_init(void) { debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops); } + +late_initcall(kcsan_debugfs_init); diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h index 7ee405524904..170a2bb22f53 100644 --- a/kernel/kcsan/encoding.h +++ b/kernel/kcsan/encoding.h @@ -1,4 +1,9 @@ /* SPDX-License-Identifier: GPL-2.0 */ +/* + * KCSAN watchpoint encoding. + * + * Copyright (C) 2019, Google LLC. + */ #ifndef _KERNEL_KCSAN_ENCODING_H #define _KERNEL_KCSAN_ENCODING_H diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h index 8d4bf3431b3c..9881099d4179 100644 --- a/kernel/kcsan/kcsan.h +++ b/kernel/kcsan/kcsan.h @@ -1,8 +1,9 @@ /* SPDX-License-Identifier: GPL-2.0 */ - /* * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please * see Documentation/dev-tools/kcsan.rst. + * + * Copyright (C) 2019, Google LLC. */ #ifndef _KERNEL_KCSAN_KCSAN_H @@ -31,11 +32,6 @@ void kcsan_save_irqtrace(struct task_struct *task); void kcsan_restore_irqtrace(struct task_struct *task); /* - * Initialize debugfs file. - */ -void kcsan_debugfs_init(void); - -/* * Statistics counters displayed via debugfs; should only be modified in * slow-paths. */ diff --git a/kernel/kcsan/kcsan-test.c b/kernel/kcsan/kcsan_test.c index ebe7fd245104..8bcffbdef3d3 100644 --- a/kernel/kcsan/kcsan-test.c +++ b/kernel/kcsan/kcsan_test.c @@ -13,6 +13,8 @@ * Author: Marco Elver <elver@google.com> */ +#define pr_fmt(fmt) "kcsan_test: " fmt + #include <kunit/test.h> #include <linux/jiffies.h> #include <linux/kcsan-checks.h> @@ -951,22 +953,53 @@ static void test_atomic_builtins(struct kunit *test) } /* - * Each test case is run with different numbers of threads. Until KUnit supports - * passing arguments for each test case, we encode #threads in the test case - * name (read by get_num_threads()). [The '-' was chosen as a stylistic - * preference to separate test name and #threads.] + * Generate thread counts for all test cases. Values generated are in interval + * [2, 5] followed by exponentially increasing thread counts from 8 to 32. * * The thread counts are chosen to cover potentially interesting boundaries and - * corner cases (range 2-5), and then stress the system with larger counts. + * corner cases (2 to 5), and then stress the system with larger counts. */ -#define KCSAN_KUNIT_CASE(test_name) \ - { .run_case = test_name, .name = #test_name "-02" }, \ - { .run_case = test_name, .name = #test_name "-03" }, \ - { .run_case = test_name, .name = #test_name "-04" }, \ - { .run_case = test_name, .name = #test_name "-05" }, \ - { .run_case = test_name, .name = #test_name "-08" }, \ - { .run_case = test_name, .name = #test_name "-16" } +static const void *nthreads_gen_params(const void *prev, char *desc) +{ + long nthreads = (long)prev; + + if (nthreads < 0 || nthreads >= 32) + nthreads = 0; /* stop */ + else if (!nthreads) + nthreads = 2; /* initial value */ + else if (nthreads < 5) + nthreads++; + else if (nthreads == 5) + nthreads = 8; + else + nthreads *= 2; + if (!IS_ENABLED(CONFIG_PREEMPT) || !IS_ENABLED(CONFIG_KCSAN_INTERRUPT_WATCHER)) { + /* + * Without any preemption, keep 2 CPUs free for other tasks, one + * of which is the main test case function checking for + * completion or failure. + */ + const long min_unused_cpus = IS_ENABLED(CONFIG_PREEMPT_NONE) ? 2 : 0; + const long min_required_cpus = 2 + min_unused_cpus; + + if (num_online_cpus() < min_required_cpus) { + pr_err_once("Too few online CPUs (%u < %ld) for test\n", + num_online_cpus(), min_required_cpus); + nthreads = 0; + } else if (nthreads >= num_online_cpus() - min_unused_cpus) { + /* Use negative value to indicate last param. */ + nthreads = -(num_online_cpus() - min_unused_cpus); + pr_warn_once("Limiting number of threads to %ld (only %d online CPUs)\n", + -nthreads, num_online_cpus()); + } + } + + snprintf(desc, KUNIT_PARAM_DESC_SIZE, "threads=%ld", abs(nthreads)); + return (void *)nthreads; +} + +#define KCSAN_KUNIT_CASE(test_name) KUNIT_CASE_PARAM(test_name, nthreads_gen_params) static struct kunit_case kcsan_test_cases[] = { KCSAN_KUNIT_CASE(test_basic), KCSAN_KUNIT_CASE(test_concurrent_races), @@ -996,24 +1029,6 @@ static struct kunit_case kcsan_test_cases[] = { /* ===== End test cases ===== */ -/* Get number of threads encoded in test name. */ -static bool __no_kcsan -get_num_threads(const char *test, int *nthreads) -{ - int len = strlen(test); - - if (WARN_ON(len < 3)) - return false; - - *nthreads = test[len - 1] - '0'; - *nthreads += (test[len - 2] - '0') * 10; - - if (WARN_ON(*nthreads < 0)) - return false; - - return true; -} - /* Concurrent accesses from interrupts. */ __no_kcsan static void access_thread_timer(struct timer_list *timer) @@ -1076,9 +1091,6 @@ static int test_init(struct kunit *test) if (!torture_init_begin((char *)test->name, 1)) return -EBUSY; - if (!get_num_threads(test->name, &nthreads)) - goto err; - if (WARN_ON(threads)) goto err; @@ -1087,38 +1099,18 @@ static int test_init(struct kunit *test) goto err; } - if (!IS_ENABLED(CONFIG_PREEMPT) || !IS_ENABLED(CONFIG_KCSAN_INTERRUPT_WATCHER)) { - /* - * Without any preemption, keep 2 CPUs free for other tasks, one - * of which is the main test case function checking for - * completion or failure. - */ - const int min_unused_cpus = IS_ENABLED(CONFIG_PREEMPT_NONE) ? 2 : 0; - const int min_required_cpus = 2 + min_unused_cpus; + nthreads = abs((long)test->param_value); + if (WARN_ON(!nthreads)) + goto err; - if (num_online_cpus() < min_required_cpus) { - pr_err("%s: too few online CPUs (%u < %d) for test", - test->name, num_online_cpus(), min_required_cpus); - goto err; - } else if (nthreads > num_online_cpus() - min_unused_cpus) { - nthreads = num_online_cpus() - min_unused_cpus; - pr_warn("%s: limiting number of threads to %d\n", - test->name, nthreads); - } - } + threads = kcalloc(nthreads + 1, sizeof(struct task_struct *), GFP_KERNEL); + if (WARN_ON(!threads)) + goto err; - if (nthreads) { - threads = kcalloc(nthreads + 1, sizeof(struct task_struct *), - GFP_KERNEL); - if (WARN_ON(!threads)) + threads[nthreads] = NULL; + for (i = 0; i < nthreads; ++i) { + if (torture_create_kthread(access_thread, NULL, threads[i])) goto err; - - threads[nthreads] = NULL; - for (i = 0; i < nthreads; ++i) { - if (torture_create_kthread(access_thread, NULL, - threads[i])) - goto err; - } } torture_init_end(); @@ -1156,7 +1148,7 @@ static void test_exit(struct kunit *test) } static struct kunit_suite kcsan_test_suite = { - .name = "kcsan-test", + .name = "kcsan", .test_cases = kcsan_test_cases, .init = test_init, .exit = test_exit, diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c index d3bf87e6007c..13dce3c664d6 100644 --- a/kernel/kcsan/report.c +++ b/kernel/kcsan/report.c @@ -1,4 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 +/* + * KCSAN reporting. + * + * Copyright (C) 2019, Google LLC. + */ #include <linux/debug_locks.h> #include <linux/delay.h> diff --git a/kernel/kcsan/selftest.c b/kernel/kcsan/selftest.c index 9014a3a82cf9..7f29cb0f5e63 100644 --- a/kernel/kcsan/selftest.c +++ b/kernel/kcsan/selftest.c @@ -1,4 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 +/* + * KCSAN short boot-time selftests. + * + * Copyright (C) 2019, Google LLC. + */ #define pr_fmt(fmt) "kcsan: " fmt diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 8838f1d7c4a2..3572808223e4 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -12,7 +12,6 @@ ifdef CONFIG_FUNCTION_TRACER CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_lockdep_proc.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_mutex-debug.o = $(CC_FLAGS_FTRACE) -CFLAGS_REMOVE_rtmutex-debug.o = $(CC_FLAGS_FTRACE) endif obj-$(CONFIG_DEBUG_IRQFLAGS) += irqflag-debug.o @@ -26,7 +25,6 @@ obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex.o -obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index ef28a0b9cf1e..48d736aa03b2 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -54,6 +54,7 @@ #include <linux/nmi.h> #include <linux/rcupdate.h> #include <linux/kprobes.h> +#include <linux/lockdep.h> #include <asm/sections.h> @@ -1747,7 +1748,7 @@ static enum bfs_result __bfs(struct lock_list *source_entry, /* * Step 4: if not match, expand the path by adding the - * forward or backwards dependencis in the search + * forward or backwards dependencies in the search * */ first = true; @@ -1916,7 +1917,7 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth, * -> B is -(ER)-> or -(EN)->, then we don't need to add A -> B into the * dependency graph, as any strong path ..-> A -> B ->.. we can get with * having dependency A -> B, we could already get a equivalent path ..-> A -> - * .. -> B -> .. with A -> .. -> B. Therefore A -> B is reduntant. + * .. -> B -> .. with A -> .. -> B. Therefore A -> B is redundant. * * We need to make sure both the start and the end of A -> .. -> B is not * weaker than A -> B. For the start part, please see the comment in @@ -5253,13 +5254,13 @@ int __lock_is_held(const struct lockdep_map *lock, int read) if (match_held_lock(hlock, lock)) { if (read == -1 || hlock->read == read) - return 1; + return LOCK_STATE_HELD; - return 0; + return LOCK_STATE_NOT_HELD; } } - return 0; + return LOCK_STATE_NOT_HELD; } static struct pin_cookie __lock_pin_lock(struct lockdep_map *lock) @@ -5538,10 +5539,14 @@ EXPORT_SYMBOL_GPL(lock_release); noinstr int lock_is_held_type(const struct lockdep_map *lock, int read) { unsigned long flags; - int ret = 0; + int ret = LOCK_STATE_NOT_HELD; + /* + * Avoid false negative lockdep_assert_held() and + * lockdep_assert_not_held(). + */ if (unlikely(!lockdep_enabled())) - return 1; /* avoid false negative lockdep_assert_held() */ + return LOCK_STATE_UNKNOWN; raw_local_irq_save(flags); check_flags(flags); diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c index 02ef87f50df2..806978314496 100644 --- a/kernel/locking/lockdep_proc.c +++ b/kernel/locking/lockdep_proc.c @@ -348,7 +348,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v) debug_locks); /* - * Zappped classes and lockdep data buffers reuse statistics. + * Zapped classes and lockdep data buffers reuse statistics. */ seq_puts(m, "\n"); seq_printf(m, " zapped classes: %11lu\n", diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 0ab94e1f1276..b3adb40549bf 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -76,13 +76,13 @@ static void lock_torture_cleanup(void); struct lock_torture_ops { void (*init)(void); void (*exit)(void); - int (*writelock)(void); + int (*writelock)(int tid); void (*write_delay)(struct torture_random_state *trsp); void (*task_boost)(struct torture_random_state *trsp); - void (*writeunlock)(void); - int (*readlock)(void); + void (*writeunlock)(int tid); + int (*readlock)(int tid); void (*read_delay)(struct torture_random_state *trsp); - void (*readunlock)(void); + void (*readunlock)(int tid); unsigned long flags; /* for irq spinlocks */ const char *name; @@ -105,7 +105,7 @@ static struct lock_torture_cxt cxt = { 0, 0, false, false, * Definitions for lock torture testing. */ -static int torture_lock_busted_write_lock(void) +static int torture_lock_busted_write_lock(int tid __maybe_unused) { return 0; /* BUGGY, do not use in real life!!! */ } @@ -122,7 +122,7 @@ static void torture_lock_busted_write_delay(struct torture_random_state *trsp) torture_preempt_schedule(); /* Allow test to be preempted. */ } -static void torture_lock_busted_write_unlock(void) +static void torture_lock_busted_write_unlock(int tid __maybe_unused) { /* BUGGY, do not use in real life!!! */ } @@ -145,7 +145,8 @@ static struct lock_torture_ops lock_busted_ops = { static DEFINE_SPINLOCK(torture_spinlock); -static int torture_spin_lock_write_lock(void) __acquires(torture_spinlock) +static int torture_spin_lock_write_lock(int tid __maybe_unused) +__acquires(torture_spinlock) { spin_lock(&torture_spinlock); return 0; @@ -169,7 +170,8 @@ static void torture_spin_lock_write_delay(struct torture_random_state *trsp) torture_preempt_schedule(); /* Allow test to be preempted. */ } -static void torture_spin_lock_write_unlock(void) __releases(torture_spinlock) +static void torture_spin_lock_write_unlock(int tid __maybe_unused) +__releases(torture_spinlock) { spin_unlock(&torture_spinlock); } @@ -185,7 +187,7 @@ static struct lock_torture_ops spin_lock_ops = { .name = "spin_lock" }; -static int torture_spin_lock_write_lock_irq(void) +static int torture_spin_lock_write_lock_irq(int tid __maybe_unused) __acquires(torture_spinlock) { unsigned long flags; @@ -195,7 +197,7 @@ __acquires(torture_spinlock) return 0; } -static void torture_lock_spin_write_unlock_irq(void) +static void torture_lock_spin_write_unlock_irq(int tid __maybe_unused) __releases(torture_spinlock) { spin_unlock_irqrestore(&torture_spinlock, cxt.cur_ops->flags); @@ -214,7 +216,8 @@ static struct lock_torture_ops spin_lock_irq_ops = { static DEFINE_RWLOCK(torture_rwlock); -static int torture_rwlock_write_lock(void) __acquires(torture_rwlock) +static int torture_rwlock_write_lock(int tid __maybe_unused) +__acquires(torture_rwlock) { write_lock(&torture_rwlock); return 0; @@ -235,12 +238,14 @@ static void torture_rwlock_write_delay(struct torture_random_state *trsp) udelay(shortdelay_us); } -static void torture_rwlock_write_unlock(void) __releases(torture_rwlock) +static void torture_rwlock_write_unlock(int tid __maybe_unused) +__releases(torture_rwlock) { write_unlock(&torture_rwlock); } -static int torture_rwlock_read_lock(void) __acquires(torture_rwlock) +static int torture_rwlock_read_lock(int tid __maybe_unused) +__acquires(torture_rwlock) { read_lock(&torture_rwlock); return 0; @@ -261,7 +266,8 @@ static void torture_rwlock_read_delay(struct torture_random_state *trsp) udelay(shortdelay_us); } -static void torture_rwlock_read_unlock(void) __releases(torture_rwlock) +static void torture_rwlock_read_unlock(int tid __maybe_unused) +__releases(torture_rwlock) { read_unlock(&torture_rwlock); } @@ -277,7 +283,8 @@ static struct lock_torture_ops rw_lock_ops = { .name = "rw_lock" }; -static int torture_rwlock_write_lock_irq(void) __acquires(torture_rwlock) +static int torture_rwlock_write_lock_irq(int tid __maybe_unused) +__acquires(torture_rwlock) { unsigned long flags; @@ -286,13 +293,14 @@ static int torture_rwlock_write_lock_irq(void) __acquires(torture_rwlock) return 0; } -static void torture_rwlock_write_unlock_irq(void) +static void torture_rwlock_write_unlock_irq(int tid __maybe_unused) __releases(torture_rwlock) { write_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags); } -static int torture_rwlock_read_lock_irq(void) __acquires(torture_rwlock) +static int torture_rwlock_read_lock_irq(int tid __maybe_unused) +__acquires(torture_rwlock) { unsigned long flags; @@ -301,7 +309,7 @@ static int torture_rwlock_read_lock_irq(void) __acquires(torture_rwlock) return 0; } -static void torture_rwlock_read_unlock_irq(void) +static void torture_rwlock_read_unlock_irq(int tid __maybe_unused) __releases(torture_rwlock) { read_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags); @@ -320,7 +328,8 @@ static struct lock_torture_ops rw_lock_irq_ops = { static DEFINE_MUTEX(torture_mutex); -static int torture_mutex_lock(void) __acquires(torture_mutex) +static int torture_mutex_lock(int tid __maybe_unused) +__acquires(torture_mutex) { mutex_lock(&torture_mutex); return 0; @@ -340,7 +349,8 @@ static void torture_mutex_delay(struct torture_random_state *trsp) torture_preempt_schedule(); /* Allow test to be preempted. */ } -static void torture_mutex_unlock(void) __releases(torture_mutex) +static void torture_mutex_unlock(int tid __maybe_unused) +__releases(torture_mutex) { mutex_unlock(&torture_mutex); } @@ -357,12 +367,34 @@ static struct lock_torture_ops mutex_lock_ops = { }; #include <linux/ww_mutex.h> +/* + * The torture ww_mutexes should belong to the same lock class as + * torture_ww_class to avoid lockdep problem. The ww_mutex_init() + * function is called for initialization to ensure that. + */ static DEFINE_WD_CLASS(torture_ww_class); -static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class); -static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class); -static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class); +static struct ww_mutex torture_ww_mutex_0, torture_ww_mutex_1, torture_ww_mutex_2; +static struct ww_acquire_ctx *ww_acquire_ctxs; + +static void torture_ww_mutex_init(void) +{ + ww_mutex_init(&torture_ww_mutex_0, &torture_ww_class); + ww_mutex_init(&torture_ww_mutex_1, &torture_ww_class); + ww_mutex_init(&torture_ww_mutex_2, &torture_ww_class); + + ww_acquire_ctxs = kmalloc_array(cxt.nrealwriters_stress, + sizeof(*ww_acquire_ctxs), + GFP_KERNEL); + if (!ww_acquire_ctxs) + VERBOSE_TOROUT_STRING("ww_acquire_ctx: Out of memory"); +} + +static void torture_ww_mutex_exit(void) +{ + kfree(ww_acquire_ctxs); +} -static int torture_ww_mutex_lock(void) +static int torture_ww_mutex_lock(int tid) __acquires(torture_ww_mutex_0) __acquires(torture_ww_mutex_1) __acquires(torture_ww_mutex_2) @@ -372,7 +404,7 @@ __acquires(torture_ww_mutex_2) struct list_head link; struct ww_mutex *lock; } locks[3], *ll, *ln; - struct ww_acquire_ctx ctx; + struct ww_acquire_ctx *ctx = &ww_acquire_ctxs[tid]; locks[0].lock = &torture_ww_mutex_0; list_add(&locks[0].link, &list); @@ -383,12 +415,12 @@ __acquires(torture_ww_mutex_2) locks[2].lock = &torture_ww_mutex_2; list_add(&locks[2].link, &list); - ww_acquire_init(&ctx, &torture_ww_class); + ww_acquire_init(ctx, &torture_ww_class); list_for_each_entry(ll, &list, link) { int err; - err = ww_mutex_lock(ll->lock, &ctx); + err = ww_mutex_lock(ll->lock, ctx); if (!err) continue; @@ -399,25 +431,29 @@ __acquires(torture_ww_mutex_2) if (err != -EDEADLK) return err; - ww_mutex_lock_slow(ll->lock, &ctx); + ww_mutex_lock_slow(ll->lock, ctx); list_move(&ll->link, &list); } - ww_acquire_fini(&ctx); return 0; } -static void torture_ww_mutex_unlock(void) +static void torture_ww_mutex_unlock(int tid) __releases(torture_ww_mutex_0) __releases(torture_ww_mutex_1) __releases(torture_ww_mutex_2) { + struct ww_acquire_ctx *ctx = &ww_acquire_ctxs[tid]; + ww_mutex_unlock(&torture_ww_mutex_0); ww_mutex_unlock(&torture_ww_mutex_1); ww_mutex_unlock(&torture_ww_mutex_2); + ww_acquire_fini(ctx); } static struct lock_torture_ops ww_mutex_lock_ops = { + .init = torture_ww_mutex_init, + .exit = torture_ww_mutex_exit, .writelock = torture_ww_mutex_lock, .write_delay = torture_mutex_delay, .task_boost = torture_boost_dummy, @@ -431,7 +467,8 @@ static struct lock_torture_ops ww_mutex_lock_ops = { #ifdef CONFIG_RT_MUTEXES static DEFINE_RT_MUTEX(torture_rtmutex); -static int torture_rtmutex_lock(void) __acquires(torture_rtmutex) +static int torture_rtmutex_lock(int tid __maybe_unused) +__acquires(torture_rtmutex) { rt_mutex_lock(&torture_rtmutex); return 0; @@ -487,7 +524,8 @@ static void torture_rtmutex_delay(struct torture_random_state *trsp) torture_preempt_schedule(); /* Allow test to be preempted. */ } -static void torture_rtmutex_unlock(void) __releases(torture_rtmutex) +static void torture_rtmutex_unlock(int tid __maybe_unused) +__releases(torture_rtmutex) { rt_mutex_unlock(&torture_rtmutex); } @@ -505,7 +543,8 @@ static struct lock_torture_ops rtmutex_lock_ops = { #endif static DECLARE_RWSEM(torture_rwsem); -static int torture_rwsem_down_write(void) __acquires(torture_rwsem) +static int torture_rwsem_down_write(int tid __maybe_unused) +__acquires(torture_rwsem) { down_write(&torture_rwsem); return 0; @@ -525,12 +564,14 @@ static void torture_rwsem_write_delay(struct torture_random_state *trsp) torture_preempt_schedule(); /* Allow test to be preempted. */ } -static void torture_rwsem_up_write(void) __releases(torture_rwsem) +static void torture_rwsem_up_write(int tid __maybe_unused) +__releases(torture_rwsem) { up_write(&torture_rwsem); } -static int torture_rwsem_down_read(void) __acquires(torture_rwsem) +static int torture_rwsem_down_read(int tid __maybe_unused) +__acquires(torture_rwsem) { down_read(&torture_rwsem); return 0; @@ -550,7 +591,8 @@ static void torture_rwsem_read_delay(struct torture_random_state *trsp) torture_preempt_schedule(); /* Allow test to be preempted. */ } -static void torture_rwsem_up_read(void) __releases(torture_rwsem) +static void torture_rwsem_up_read(int tid __maybe_unused) +__releases(torture_rwsem) { up_read(&torture_rwsem); } @@ -579,24 +621,28 @@ static void torture_percpu_rwsem_exit(void) percpu_free_rwsem(&pcpu_rwsem); } -static int torture_percpu_rwsem_down_write(void) __acquires(pcpu_rwsem) +static int torture_percpu_rwsem_down_write(int tid __maybe_unused) +__acquires(pcpu_rwsem) { percpu_down_write(&pcpu_rwsem); return 0; } -static void torture_percpu_rwsem_up_write(void) __releases(pcpu_rwsem) +static void torture_percpu_rwsem_up_write(int tid __maybe_unused) +__releases(pcpu_rwsem) { percpu_up_write(&pcpu_rwsem); } -static int torture_percpu_rwsem_down_read(void) __acquires(pcpu_rwsem) +static int torture_percpu_rwsem_down_read(int tid __maybe_unused) +__acquires(pcpu_rwsem) { percpu_down_read(&pcpu_rwsem); return 0; } -static void torture_percpu_rwsem_up_read(void) __releases(pcpu_rwsem) +static void torture_percpu_rwsem_up_read(int tid __maybe_unused) +__releases(pcpu_rwsem) { percpu_up_read(&pcpu_rwsem); } @@ -621,6 +667,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = { static int lock_torture_writer(void *arg) { struct lock_stress_stats *lwsp = arg; + int tid = lwsp - cxt.lwsa; DEFINE_TORTURE_RANDOM(rand); VERBOSE_TOROUT_STRING("lock_torture_writer task started"); @@ -631,7 +678,7 @@ static int lock_torture_writer(void *arg) schedule_timeout_uninterruptible(1); cxt.cur_ops->task_boost(&rand); - cxt.cur_ops->writelock(); + cxt.cur_ops->writelock(tid); if (WARN_ON_ONCE(lock_is_write_held)) lwsp->n_lock_fail++; lock_is_write_held = true; @@ -642,7 +689,7 @@ static int lock_torture_writer(void *arg) cxt.cur_ops->write_delay(&rand); lock_is_write_held = false; WRITE_ONCE(last_lock_release, jiffies); - cxt.cur_ops->writeunlock(); + cxt.cur_ops->writeunlock(tid); stutter_wait("lock_torture_writer"); } while (!torture_must_stop()); @@ -659,6 +706,7 @@ static int lock_torture_writer(void *arg) static int lock_torture_reader(void *arg) { struct lock_stress_stats *lrsp = arg; + int tid = lrsp - cxt.lrsa; DEFINE_TORTURE_RANDOM(rand); VERBOSE_TOROUT_STRING("lock_torture_reader task started"); @@ -668,7 +716,7 @@ static int lock_torture_reader(void *arg) if ((torture_random(&rand) & 0xfffff) == 0) schedule_timeout_uninterruptible(1); - cxt.cur_ops->readlock(); + cxt.cur_ops->readlock(tid); lock_is_read_held = true; if (WARN_ON_ONCE(lock_is_write_held)) lrsp->n_lock_fail++; /* rare, but... */ @@ -676,7 +724,7 @@ static int lock_torture_reader(void *arg) lrsp->n_lock_acquired++; cxt.cur_ops->read_delay(&rand); lock_is_read_held = false; - cxt.cur_ops->readunlock(); + cxt.cur_ops->readunlock(tid); stutter_wait("lock_torture_reader"); } while (!torture_must_stop()); @@ -891,16 +939,16 @@ static int __init lock_torture_init(void) goto unwind; } - if (cxt.cur_ops->init) { - cxt.cur_ops->init(); - cxt.init_called = true; - } - if (nwriters_stress >= 0) cxt.nrealwriters_stress = nwriters_stress; else cxt.nrealwriters_stress = 2 * num_online_cpus(); + if (cxt.cur_ops->init) { + cxt.cur_ops->init(); + cxt.init_called = true; + } + #ifdef CONFIG_DEBUG_MUTEXES if (str_has_prefix(torture_type, "mutex")) cxt.debug_lock = true; diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 5e10153b4d3c..85251d8771d9 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -7,7 +7,7 @@ * The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spin-lock * with the desirable properties of being fair, and with each cpu trying * to acquire the lock spinning on a local variable. - * It avoids expensive cache bouncings that common test-and-set spin-lock + * It avoids expensive cache bounces that common test-and-set spin-lock * implementations incur. */ #ifndef __LINUX_MCS_SPINLOCK_H diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 622ebdfcd083..cb6b112ce155 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -92,7 +92,7 @@ static inline unsigned long __owner_flags(unsigned long owner) } /* - * Trylock variant that retuns the owning task on failure. + * Trylock variant that returns the owning task on failure. */ static inline struct task_struct *__mutex_trylock_or_owner(struct mutex *lock) { @@ -207,7 +207,7 @@ __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter, /* * Give up ownership to a specific task, when @task = NULL, this is equivalent - * to a regular unlock. Sets PICKUP on a handoff, clears HANDOF, preserves + * to a regular unlock. Sets PICKUP on a handoff, clears HANDOFF, preserves * WAITERS. Provides RELEASE semantics like a regular unlock, the * __mutex_trylock() provides a matching ACQUIRE semantics for the handoff. */ diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index 1de006ed3aa8..d5610ad52b92 100644 --- a/kernel/locking/osq_lock.c +++ b/kernel/locking/osq_lock.c @@ -135,7 +135,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) */ /* - * Wait to acquire the lock or cancelation. Note that need_resched() + * Wait to acquire the lock or cancellation. Note that need_resched() * will come with an IPI, which will wake smp_cond_load_relaxed() if it * is implemented with a monitor-wait. vcpu_is_preempted() relies on * polling, be careful. @@ -164,7 +164,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) /* * We can only fail the cmpxchg() racing against an unlock(), - * in which case we should observe @node->locked becomming + * in which case we should observe @node->locked becoming * true. */ if (smp_load_acquire(&node->locked)) diff --git a/kernel/locking/rtmutex-debug.c b/kernel/locking/rtmutex-debug.c deleted file mode 100644 index 36e69100e8e0..000000000000 --- a/kernel/locking/rtmutex-debug.c +++ /dev/null @@ -1,182 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * RT-Mutexes: blocking mutual exclusion locks with PI support - * - * started by Ingo Molnar and Thomas Gleixner: - * - * Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com> - * Copyright (C) 2006 Timesys Corp., Thomas Gleixner <tglx@timesys.com> - * - * This code is based on the rt.c implementation in the preempt-rt tree. - * Portions of said code are - * - * Copyright (C) 2004 LynuxWorks, Inc., Igor Manyilov, Bill Huey - * Copyright (C) 2006 Esben Nielsen - * Copyright (C) 2006 Kihon Technologies Inc., - * Steven Rostedt <rostedt@goodmis.org> - * - * See rt.c in preempt-rt for proper credits and further information - */ -#include <linux/sched.h> -#include <linux/sched/rt.h> -#include <linux/sched/debug.h> -#include <linux/delay.h> -#include <linux/export.h> -#include <linux/spinlock.h> -#include <linux/kallsyms.h> -#include <linux/syscalls.h> -#include <linux/interrupt.h> -#include <linux/rbtree.h> -#include <linux/fs.h> -#include <linux/debug_locks.h> - -#include "rtmutex_common.h" - -static void printk_task(struct task_struct *p) -{ - if (p) - printk("%16s:%5d [%p, %3d]", p->comm, task_pid_nr(p), p, p->prio); - else - printk("<none>"); -} - -static void printk_lock(struct rt_mutex *lock, int print_owner) -{ - if (lock->name) - printk(" [%p] {%s}\n", - lock, lock->name); - else - printk(" [%p] {%s:%d}\n", - lock, lock->file, lock->line); - - if (print_owner && rt_mutex_owner(lock)) { - printk(".. ->owner: %p\n", lock->owner); - printk(".. held by: "); - printk_task(rt_mutex_owner(lock)); - printk("\n"); - } -} - -void rt_mutex_debug_task_free(struct task_struct *task) -{ - DEBUG_LOCKS_WARN_ON(!RB_EMPTY_ROOT(&task->pi_waiters.rb_root)); - DEBUG_LOCKS_WARN_ON(task->pi_blocked_on); -} - -/* - * We fill out the fields in the waiter to store the information about - * the deadlock. We print when we return. act_waiter can be NULL in - * case of a remove waiter operation. - */ -void debug_rt_mutex_deadlock(enum rtmutex_chainwalk chwalk, - struct rt_mutex_waiter *act_waiter, - struct rt_mutex *lock) -{ - struct task_struct *task; - - if (!debug_locks || chwalk == RT_MUTEX_FULL_CHAINWALK || !act_waiter) - return; - - task = rt_mutex_owner(act_waiter->lock); - if (task && task != current) { - act_waiter->deadlock_task_pid = get_pid(task_pid(task)); - act_waiter->deadlock_lock = lock; - } -} - -void debug_rt_mutex_print_deadlock(struct rt_mutex_waiter *waiter) -{ - struct task_struct *task; - - if (!waiter->deadlock_lock || !debug_locks) - return; - - rcu_read_lock(); - task = pid_task(waiter->deadlock_task_pid, PIDTYPE_PID); - if (!task) { - rcu_read_unlock(); - return; - } - - if (!debug_locks_off()) { - rcu_read_unlock(); - return; - } - - pr_warn("\n"); - pr_warn("============================================\n"); - pr_warn("WARNING: circular locking deadlock detected!\n"); - pr_warn("%s\n", print_tainted()); - pr_warn("--------------------------------------------\n"); - printk("%s/%d is deadlocking current task %s/%d\n\n", - task->comm, task_pid_nr(task), - current->comm, task_pid_nr(current)); - - printk("\n1) %s/%d is trying to acquire this lock:\n", - current->comm, task_pid_nr(current)); - printk_lock(waiter->lock, 1); - - printk("\n2) %s/%d is blocked on this lock:\n", - task->comm, task_pid_nr(task)); - printk_lock(waiter->deadlock_lock, 1); - - debug_show_held_locks(current); - debug_show_held_locks(task); - - printk("\n%s/%d's [blocked] stackdump:\n\n", - task->comm, task_pid_nr(task)); - show_stack(task, NULL, KERN_DEFAULT); - printk("\n%s/%d's [current] stackdump:\n\n", - current->comm, task_pid_nr(current)); - dump_stack(); - debug_show_all_locks(); - rcu_read_unlock(); - - printk("[ turning off deadlock detection." - "Please report this trace. ]\n\n"); -} - -void debug_rt_mutex_lock(struct rt_mutex *lock) -{ -} - -void debug_rt_mutex_unlock(struct rt_mutex *lock) -{ - DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current); -} - -void -debug_rt_mutex_proxy_lock(struct rt_mutex *lock, struct task_struct *powner) -{ -} - -void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock) -{ - DEBUG_LOCKS_WARN_ON(!rt_mutex_owner(lock)); -} - -void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter) -{ - memset(waiter, 0x11, sizeof(*waiter)); - waiter->deadlock_task_pid = NULL; -} - -void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter) -{ - put_pid(waiter->deadlock_task_pid); - memset(waiter, 0x22, sizeof(*waiter)); -} - -void debug_rt_mutex_init(struct rt_mutex *lock, const char *name, struct lock_class_key *key) -{ - /* - * Make sure we are not reinitializing a held lock: - */ - debug_check_no_locks_freed((void *)lock, sizeof(*lock)); - lock->name = name; - -#ifdef CONFIG_DEBUG_LOCK_ALLOC - lockdep_init_map(&lock->dep_map, name, key, 0); -#endif -} - diff --git a/kernel/locking/rtmutex-debug.h b/kernel/locking/rtmutex-debug.h deleted file mode 100644 index fc549713bba3..000000000000 --- a/kernel/locking/rtmutex-debug.h +++ /dev/null @@ -1,37 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * RT-Mutexes: blocking mutual exclusion locks with PI support - * - * started by Ingo Molnar and Thomas Gleixner: - * - * Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com> - * Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com> - * - * This file contains macros used solely by rtmutex.c. Debug version. - */ - -extern void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter); -extern void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter); -extern void debug_rt_mutex_init(struct rt_mutex *lock, const char *name, struct lock_class_key *key); -extern void debug_rt_mutex_lock(struct rt_mutex *lock); -extern void debug_rt_mutex_unlock(struct rt_mutex *lock); -extern void debug_rt_mutex_proxy_lock(struct rt_mutex *lock, - struct task_struct *powner); -extern void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock); -extern void debug_rt_mutex_deadlock(enum rtmutex_chainwalk chwalk, - struct rt_mutex_waiter *waiter, - struct rt_mutex *lock); -extern void debug_rt_mutex_print_deadlock(struct rt_mutex_waiter *waiter); -# define debug_rt_mutex_reset_waiter(w) \ - do { (w)->deadlock_lock = NULL; } while (0) - -static inline bool debug_rt_mutex_detect_deadlock(struct rt_mutex_waiter *waiter, - enum rtmutex_chainwalk walk) -{ - return (waiter != NULL); -} - -static inline void rt_mutex_print_deadlock(struct rt_mutex_waiter *w) -{ - debug_rt_mutex_print_deadlock(w); -} diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 48fff6437901..406818196a9f 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -49,7 +49,7 @@ * set this bit before looking at the lock. */ -static void +static __always_inline void rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner) { unsigned long val = (unsigned long)owner; @@ -60,13 +60,13 @@ rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner) WRITE_ONCE(lock->owner, (struct task_struct *)val); } -static inline void clear_rt_mutex_waiters(struct rt_mutex *lock) +static __always_inline void clear_rt_mutex_waiters(struct rt_mutex *lock) { lock->owner = (struct task_struct *) ((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS); } -static void fixup_rt_mutex_waiters(struct rt_mutex *lock) +static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex *lock) { unsigned long owner, *p = (unsigned long *) &lock->owner; @@ -149,7 +149,7 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock) * all future threads that attempt to [Rmw] the lock to the slowpath. As such * relaxed semantics suffice. */ -static inline void mark_rt_mutex_waiters(struct rt_mutex *lock) +static __always_inline void mark_rt_mutex_waiters(struct rt_mutex *lock) { unsigned long owner, *p = (unsigned long *) &lock->owner; @@ -165,8 +165,8 @@ static inline void mark_rt_mutex_waiters(struct rt_mutex *lock) * 2) Drop lock->wait_lock * 3) Try to unlock the lock with cmpxchg */ -static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock, - unsigned long flags) +static __always_inline bool unlock_rt_mutex_safe(struct rt_mutex *lock, + unsigned long flags) __releases(lock->wait_lock) { struct task_struct *owner = rt_mutex_owner(lock); @@ -204,7 +204,7 @@ static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock, # define rt_mutex_cmpxchg_acquire(l,c,n) (0) # define rt_mutex_cmpxchg_release(l,c,n) (0) -static inline void mark_rt_mutex_waiters(struct rt_mutex *lock) +static __always_inline void mark_rt_mutex_waiters(struct rt_mutex *lock) { lock->owner = (struct task_struct *) ((unsigned long)lock->owner | RT_MUTEX_HAS_WAITERS); @@ -213,8 +213,8 @@ static inline void mark_rt_mutex_waiters(struct rt_mutex *lock) /* * Simple slow path only version: lock->owner is protected by lock->wait_lock. */ -static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock, - unsigned long flags) +static __always_inline bool unlock_rt_mutex_safe(struct rt_mutex *lock, + unsigned long flags) __releases(lock->wait_lock) { lock->owner = NULL; @@ -229,9 +229,8 @@ static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock, #define task_to_waiter(p) \ &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline } -static inline int -rt_mutex_waiter_less(struct rt_mutex_waiter *left, - struct rt_mutex_waiter *right) +static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left, + struct rt_mutex_waiter *right) { if (left->prio < right->prio) return 1; @@ -248,9 +247,8 @@ rt_mutex_waiter_less(struct rt_mutex_waiter *left, return 0; } -static inline int -rt_mutex_waiter_equal(struct rt_mutex_waiter *left, - struct rt_mutex_waiter *right) +static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left, + struct rt_mutex_waiter *right) { if (left->prio != right->prio) return 0; @@ -270,18 +268,18 @@ rt_mutex_waiter_equal(struct rt_mutex_waiter *left, #define __node_2_waiter(node) \ rb_entry((node), struct rt_mutex_waiter, tree_entry) -static inline bool __waiter_less(struct rb_node *a, const struct rb_node *b) +static __always_inline bool __waiter_less(struct rb_node *a, const struct rb_node *b) { return rt_mutex_waiter_less(__node_2_waiter(a), __node_2_waiter(b)); } -static void +static __always_inline void rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter) { rb_add_cached(&waiter->tree_entry, &lock->waiters, __waiter_less); } -static void +static __always_inline void rt_mutex_dequeue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter) { if (RB_EMPTY_NODE(&waiter->tree_entry)) @@ -294,18 +292,19 @@ rt_mutex_dequeue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter) #define __node_2_pi_waiter(node) \ rb_entry((node), struct rt_mutex_waiter, pi_tree_entry) -static inline bool __pi_waiter_less(struct rb_node *a, const struct rb_node *b) +static __always_inline bool +__pi_waiter_less(struct rb_node *a, const struct rb_node *b) { return rt_mutex_waiter_less(__node_2_pi_waiter(a), __node_2_pi_waiter(b)); } -static void +static __always_inline void rt_mutex_enqueue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter) { rb_add_cached(&waiter->pi_tree_entry, &task->pi_waiters, __pi_waiter_less); } -static void +static __always_inline void rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter) { if (RB_EMPTY_NODE(&waiter->pi_tree_entry)) @@ -315,7 +314,7 @@ rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter) RB_CLEAR_NODE(&waiter->pi_tree_entry); } -static void rt_mutex_adjust_prio(struct task_struct *p) +static __always_inline void rt_mutex_adjust_prio(struct task_struct *p) { struct task_struct *pi_task = NULL; @@ -340,17 +339,13 @@ static void rt_mutex_adjust_prio(struct task_struct *p) * deadlock detection is disabled independent of the detect argument * and the config settings. */ -static bool rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter, - enum rtmutex_chainwalk chwalk) +static __always_inline bool +rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter, + enum rtmutex_chainwalk chwalk) { - /* - * This is just a wrapper function for the following call, - * because debug_rt_mutex_detect_deadlock() smells like a magic - * debug feature and I wanted to keep the cond function in the - * main source file along with the comments instead of having - * two of the same in the headers. - */ - return debug_rt_mutex_detect_deadlock(waiter, chwalk); + if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEX)) + return waiter != NULL; + return chwalk == RT_MUTEX_FULL_CHAINWALK; } /* @@ -358,7 +353,7 @@ static bool rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter, */ int max_lock_depth = 1024; -static inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p) +static __always_inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p) { return p->pi_blocked_on ? p->pi_blocked_on->lock : NULL; } @@ -426,12 +421,12 @@ static inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p) * unlock(lock->wait_lock); release [L] * goto again; */ -static int rt_mutex_adjust_prio_chain(struct task_struct *task, - enum rtmutex_chainwalk chwalk, - struct rt_mutex *orig_lock, - struct rt_mutex *next_lock, - struct rt_mutex_waiter *orig_waiter, - struct task_struct *top_task) +static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task, + enum rtmutex_chainwalk chwalk, + struct rt_mutex *orig_lock, + struct rt_mutex *next_lock, + struct rt_mutex_waiter *orig_waiter, + struct task_struct *top_task) { struct rt_mutex_waiter *waiter, *top_waiter = orig_waiter; struct rt_mutex_waiter *prerequeue_top_waiter; @@ -579,7 +574,6 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, * walk, we detected a deadlock. */ if (lock == orig_lock || rt_mutex_owner(lock) == top_task) { - debug_rt_mutex_deadlock(chwalk, orig_waiter, lock); raw_spin_unlock(&lock->wait_lock); ret = -EDEADLK; goto out_unlock_pi; @@ -706,7 +700,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, } else if (prerequeue_top_waiter == waiter) { /* * The waiter was the top waiter on the lock, but is - * no longer the top prority waiter. Replace waiter in + * no longer the top priority waiter. Replace waiter in * the owner tasks pi waiters tree with the new top * (highest priority) waiter and adjust the priority * of the owner. @@ -784,8 +778,9 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, * @waiter: The waiter that is queued to the lock's wait tree if the * callsite called task_blocked_on_lock(), otherwise NULL */ -static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, - struct rt_mutex_waiter *waiter) +static int __sched +try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, + struct rt_mutex_waiter *waiter) { lockdep_assert_held(&lock->wait_lock); @@ -886,9 +881,6 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, raw_spin_unlock(&task->pi_lock); takeit: - /* We got the lock. */ - debug_rt_mutex_lock(lock); - /* * This either preserves the RT_MUTEX_HAS_WAITERS bit if there * are still waiters or clears it. @@ -905,10 +897,10 @@ takeit: * * This must be called with lock->wait_lock held and interrupts disabled */ -static int task_blocks_on_rt_mutex(struct rt_mutex *lock, - struct rt_mutex_waiter *waiter, - struct task_struct *task, - enum rtmutex_chainwalk chwalk) +static int __sched task_blocks_on_rt_mutex(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter, + struct task_struct *task, + enum rtmutex_chainwalk chwalk) { struct task_struct *owner = rt_mutex_owner(lock); struct rt_mutex_waiter *top_waiter = waiter; @@ -994,8 +986,8 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, * * Called with lock->wait_lock held and interrupts disabled. */ -static void mark_wakeup_next_waiter(struct wake_q_head *wake_q, - struct rt_mutex *lock) +static void __sched mark_wakeup_next_waiter(struct wake_q_head *wake_q, + struct rt_mutex *lock) { struct rt_mutex_waiter *waiter; @@ -1044,8 +1036,8 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q, * Must be called with lock->wait_lock held and interrupts disabled. I must * have just failed to try_to_take_rt_mutex(). */ -static void remove_waiter(struct rt_mutex *lock, - struct rt_mutex_waiter *waiter) +static void __sched remove_waiter(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter) { bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock)); struct task_struct *owner = rt_mutex_owner(lock); @@ -1102,7 +1094,7 @@ static void remove_waiter(struct rt_mutex *lock, * * Called from sched_setscheduler */ -void rt_mutex_adjust_pi(struct task_struct *task) +void __sched rt_mutex_adjust_pi(struct task_struct *task) { struct rt_mutex_waiter *waiter; struct rt_mutex *next_lock; @@ -1125,7 +1117,7 @@ void rt_mutex_adjust_pi(struct task_struct *task) next_lock, NULL, task); } -void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter) +void __sched rt_mutex_init_waiter(struct rt_mutex_waiter *waiter) { debug_rt_mutex_init_waiter(waiter); RB_CLEAR_NODE(&waiter->pi_tree_entry); @@ -1143,10 +1135,9 @@ void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter) * * Must be called with lock->wait_lock held and interrupts disabled */ -static int __sched -__rt_mutex_slowlock(struct rt_mutex *lock, int state, - struct hrtimer_sleeper *timeout, - struct rt_mutex_waiter *waiter) +static int __sched __rt_mutex_slowlock(struct rt_mutex *lock, int state, + struct hrtimer_sleeper *timeout, + struct rt_mutex_waiter *waiter) { int ret = 0; @@ -1155,24 +1146,17 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state, if (try_to_take_rt_mutex(lock, current, waiter)) break; - /* - * TASK_INTERRUPTIBLE checks for signals and - * timeout. Ignored otherwise. - */ - if (likely(state == TASK_INTERRUPTIBLE)) { - /* Signal pending? */ - if (signal_pending(current)) - ret = -EINTR; - if (timeout && !timeout->task) - ret = -ETIMEDOUT; - if (ret) - break; + if (timeout && !timeout->task) { + ret = -ETIMEDOUT; + break; + } + if (signal_pending_state(state, current)) { + ret = -EINTR; + break; } raw_spin_unlock_irq(&lock->wait_lock); - debug_rt_mutex_print_deadlock(waiter); - schedule(); raw_spin_lock_irq(&lock->wait_lock); @@ -1183,8 +1167,8 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state, return ret; } -static void rt_mutex_handle_deadlock(int res, int detect_deadlock, - struct rt_mutex_waiter *w) +static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock, + struct rt_mutex_waiter *w) { /* * If the result is not -EDEADLOCK or the caller requested @@ -1194,9 +1178,9 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock, return; /* - * Yell lowdly and stop the task right here. + * Yell loudly and stop the task right here. */ - rt_mutex_print_deadlock(w); + WARN(1, "rtmutex deadlock detected\n"); while (1) { set_current_state(TASK_INTERRUPTIBLE); schedule(); @@ -1206,10 +1190,9 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock, /* * Slow path lock function: */ -static int __sched -rt_mutex_slowlock(struct rt_mutex *lock, int state, - struct hrtimer_sleeper *timeout, - enum rtmutex_chainwalk chwalk) +static int __sched rt_mutex_slowlock(struct rt_mutex *lock, int state, + struct hrtimer_sleeper *timeout, + enum rtmutex_chainwalk chwalk) { struct rt_mutex_waiter waiter; unsigned long flags; @@ -1268,7 +1251,7 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state, return ret; } -static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock) +static int __sched __rt_mutex_slowtrylock(struct rt_mutex *lock) { int ret = try_to_take_rt_mutex(lock, current, NULL); @@ -1284,7 +1267,7 @@ static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock) /* * Slow path try-lock function: */ -static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) +static int __sched rt_mutex_slowtrylock(struct rt_mutex *lock) { unsigned long flags; int ret; @@ -1311,13 +1294,24 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) } /* + * Performs the wakeup of the top-waiter and re-enables preemption. + */ +void __sched rt_mutex_postunlock(struct wake_q_head *wake_q) +{ + wake_up_q(wake_q); + + /* Pairs with preempt_disable() in mark_wakeup_next_waiter() */ + preempt_enable(); +} + +/* * Slow path to release a rt-mutex. * * Return whether the current task needs to call rt_mutex_postunlock(). */ -static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, - struct wake_q_head *wake_q) +static void __sched rt_mutex_slowunlock(struct rt_mutex *lock) { + DEFINE_WAKE_Q(wake_q); unsigned long flags; /* irqsave required to support early boot calls */ @@ -1359,7 +1353,7 @@ static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, while (!rt_mutex_has_waiters(lock)) { /* Drops lock->wait_lock ! */ if (unlock_rt_mutex_safe(lock, flags) == true) - return false; + return; /* Relock the rtmutex and try again */ raw_spin_lock_irqsave(&lock->wait_lock, flags); } @@ -1370,10 +1364,10 @@ static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, * * Queue the next waiter for wakeup once we release the wait_lock. */ - mark_wakeup_next_waiter(wake_q, lock); + mark_wakeup_next_waiter(&wake_q, lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); - return true; /* call rt_mutex_postunlock() */ + rt_mutex_postunlock(&wake_q); } /* @@ -1382,74 +1376,21 @@ static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, * The atomic acquire/release ops are compiled away, when either the * architecture does not support cmpxchg or when debugging is enabled. */ -static inline int -rt_mutex_fastlock(struct rt_mutex *lock, int state, - int (*slowfn)(struct rt_mutex *lock, int state, - struct hrtimer_sleeper *timeout, - enum rtmutex_chainwalk chwalk)) -{ - if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) - return 0; - - return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK); -} - -static inline int -rt_mutex_timed_fastlock(struct rt_mutex *lock, int state, - struct hrtimer_sleeper *timeout, - enum rtmutex_chainwalk chwalk, - int (*slowfn)(struct rt_mutex *lock, int state, - struct hrtimer_sleeper *timeout, - enum rtmutex_chainwalk chwalk)) +static __always_inline int __rt_mutex_lock(struct rt_mutex *lock, long state, + unsigned int subclass) { - if (chwalk == RT_MUTEX_MIN_CHAINWALK && - likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) - return 0; + int ret; - return slowfn(lock, state, timeout, chwalk); -} + might_sleep(); + mutex_acquire(&lock->dep_map, subclass, 0, _RET_IP_); -static inline int -rt_mutex_fasttrylock(struct rt_mutex *lock, - int (*slowfn)(struct rt_mutex *lock)) -{ if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) - return 1; - - return slowfn(lock); -} - -/* - * Performs the wakeup of the top-waiter and re-enables preemption. - */ -void rt_mutex_postunlock(struct wake_q_head *wake_q) -{ - wake_up_q(wake_q); - - /* Pairs with preempt_disable() in rt_mutex_slowunlock() */ - preempt_enable(); -} - -static inline void -rt_mutex_fastunlock(struct rt_mutex *lock, - bool (*slowfn)(struct rt_mutex *lock, - struct wake_q_head *wqh)) -{ - DEFINE_WAKE_Q(wake_q); - - if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) - return; - - if (slowfn(lock, &wake_q)) - rt_mutex_postunlock(&wake_q); -} - -static inline void __rt_mutex_lock(struct rt_mutex *lock, unsigned int subclass) -{ - might_sleep(); + return 0; - mutex_acquire(&lock->dep_map, subclass, 0, _RET_IP_); - rt_mutex_fastlock(lock, TASK_UNINTERRUPTIBLE, rt_mutex_slowlock); + ret = rt_mutex_slowlock(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK); + if (ret) + mutex_release(&lock->dep_map, _RET_IP_); + return ret; } #ifdef CONFIG_DEBUG_LOCK_ALLOC @@ -1461,7 +1402,7 @@ static inline void __rt_mutex_lock(struct rt_mutex *lock, unsigned int subclass) */ void __sched rt_mutex_lock_nested(struct rt_mutex *lock, unsigned int subclass) { - __rt_mutex_lock(lock, subclass); + __rt_mutex_lock(lock, TASK_UNINTERRUPTIBLE, subclass); } EXPORT_SYMBOL_GPL(rt_mutex_lock_nested); @@ -1474,7 +1415,7 @@ EXPORT_SYMBOL_GPL(rt_mutex_lock_nested); */ void __sched rt_mutex_lock(struct rt_mutex *lock) { - __rt_mutex_lock(lock, 0); + __rt_mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0); } EXPORT_SYMBOL_GPL(rt_mutex_lock); #endif @@ -1490,82 +1431,37 @@ EXPORT_SYMBOL_GPL(rt_mutex_lock); */ int __sched rt_mutex_lock_interruptible(struct rt_mutex *lock) { - int ret; - - might_sleep(); - - mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_); - ret = rt_mutex_fastlock(lock, TASK_INTERRUPTIBLE, rt_mutex_slowlock); - if (ret) - mutex_release(&lock->dep_map, _RET_IP_); - - return ret; + return __rt_mutex_lock(lock, TASK_INTERRUPTIBLE, 0); } EXPORT_SYMBOL_GPL(rt_mutex_lock_interruptible); -/* - * Futex variant, must not use fastpath. - */ -int __sched rt_mutex_futex_trylock(struct rt_mutex *lock) -{ - return rt_mutex_slowtrylock(lock); -} - -int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock) -{ - return __rt_mutex_slowtrylock(lock); -} - -/** - * rt_mutex_timed_lock - lock a rt_mutex interruptible - * the timeout structure is provided - * by the caller - * - * @lock: the rt_mutex to be locked - * @timeout: timeout structure or NULL (no timeout) - * - * Returns: - * 0 on success - * -EINTR when interrupted by a signal - * -ETIMEDOUT when the timeout expired - */ -int -rt_mutex_timed_lock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout) -{ - int ret; - - might_sleep(); - - mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_); - ret = rt_mutex_timed_fastlock(lock, TASK_INTERRUPTIBLE, timeout, - RT_MUTEX_MIN_CHAINWALK, - rt_mutex_slowlock); - if (ret) - mutex_release(&lock->dep_map, _RET_IP_); - - return ret; -} -EXPORT_SYMBOL_GPL(rt_mutex_timed_lock); - /** * rt_mutex_trylock - try to lock a rt_mutex * * @lock: the rt_mutex to be locked * - * This function can only be called in thread context. It's safe to - * call it from atomic regions, but not from hard interrupt or soft - * interrupt context. + * This function can only be called in thread context. It's safe to call it + * from atomic regions, but not from hard or soft interrupt context. * - * Returns 1 on success and 0 on contention + * Returns: + * 1 on success + * 0 on contention */ int __sched rt_mutex_trylock(struct rt_mutex *lock) { int ret; - if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq())) + if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task())) return 0; - ret = rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock); + /* + * No lockdep annotation required because lockdep disables the fast + * path. + */ + if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) + return 1; + + ret = rt_mutex_slowtrylock(lock); if (ret) mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_); @@ -1581,10 +1477,26 @@ EXPORT_SYMBOL_GPL(rt_mutex_trylock); void __sched rt_mutex_unlock(struct rt_mutex *lock) { mutex_release(&lock->dep_map, _RET_IP_); - rt_mutex_fastunlock(lock, rt_mutex_slowunlock); + if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) + return; + + rt_mutex_slowunlock(lock); } EXPORT_SYMBOL_GPL(rt_mutex_unlock); +/* + * Futex variants, must not use fastpath. + */ +int __sched rt_mutex_futex_trylock(struct rt_mutex *lock) +{ + return rt_mutex_slowtrylock(lock); +} + +int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock) +{ + return __rt_mutex_slowtrylock(lock); +} + /** * __rt_mutex_futex_unlock - Futex variant, that since futex variants * do not use the fast-path, can be simple and will not need to retry. @@ -1593,7 +1505,7 @@ EXPORT_SYMBOL_GPL(rt_mutex_unlock); * @wake_q: The wake queue head from which to get the next lock waiter */ bool __sched __rt_mutex_futex_unlock(struct rt_mutex *lock, - struct wake_q_head *wake_q) + struct wake_q_head *wake_q) { lockdep_assert_held(&lock->wait_lock); @@ -1630,23 +1542,6 @@ void __sched rt_mutex_futex_unlock(struct rt_mutex *lock) } /** - * rt_mutex_destroy - mark a mutex unusable - * @lock: the mutex to be destroyed - * - * This function marks the mutex uninitialized, and any subsequent - * use of the mutex is forbidden. The mutex must not be locked when - * this function is called. - */ -void rt_mutex_destroy(struct rt_mutex *lock) -{ - WARN_ON(rt_mutex_is_locked(lock)); -#ifdef CONFIG_DEBUG_RT_MUTEXES - lock->magic = NULL; -#endif -} -EXPORT_SYMBOL_GPL(rt_mutex_destroy); - -/** * __rt_mutex_init - initialize the rt_mutex * * @lock: The rt_mutex to be initialized @@ -1657,15 +1552,13 @@ EXPORT_SYMBOL_GPL(rt_mutex_destroy); * * Initializing of a locked rt_mutex is not allowed */ -void __rt_mutex_init(struct rt_mutex *lock, const char *name, +void __sched __rt_mutex_init(struct rt_mutex *lock, const char *name, struct lock_class_key *key) { - lock->owner = NULL; - raw_spin_lock_init(&lock->wait_lock); - lock->waiters = RB_ROOT_CACHED; + debug_check_no_locks_freed((void *)lock, sizeof(*lock)); + lockdep_init_map(&lock->dep_map, name, key, 0); - if (name && key) - debug_rt_mutex_init(lock, name, key); + __rt_mutex_basic_init(lock); } EXPORT_SYMBOL_GPL(__rt_mutex_init); @@ -1683,11 +1576,10 @@ EXPORT_SYMBOL_GPL(__rt_mutex_init); * possible at this point because the pi_state which contains the rtmutex * is not yet visible to other tasks. */ -void rt_mutex_init_proxy_locked(struct rt_mutex *lock, - struct task_struct *proxy_owner) +void __sched rt_mutex_init_proxy_locked(struct rt_mutex *lock, + struct task_struct *proxy_owner) { - __rt_mutex_init(lock, NULL, NULL); - debug_rt_mutex_proxy_lock(lock, proxy_owner); + __rt_mutex_basic_init(lock); rt_mutex_set_owner(lock, proxy_owner); } @@ -1703,7 +1595,7 @@ void rt_mutex_init_proxy_locked(struct rt_mutex *lock, * possible because it belongs to the pi_state which is about to be freed * and it is not longer visible to other tasks. */ -void rt_mutex_proxy_unlock(struct rt_mutex *lock) +void __sched rt_mutex_proxy_unlock(struct rt_mutex *lock) { debug_rt_mutex_proxy_unlock(lock); rt_mutex_set_owner(lock, NULL); @@ -1728,9 +1620,9 @@ void rt_mutex_proxy_unlock(struct rt_mutex *lock) * * Special API call for PI-futex support. */ -int __rt_mutex_start_proxy_lock(struct rt_mutex *lock, - struct rt_mutex_waiter *waiter, - struct task_struct *task) +int __sched __rt_mutex_start_proxy_lock(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter, + struct task_struct *task) { int ret; @@ -1753,8 +1645,6 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock, ret = 0; } - debug_rt_mutex_print_deadlock(waiter); - return ret; } @@ -1777,9 +1667,9 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock, * * Special API call for PI-futex support. */ -int rt_mutex_start_proxy_lock(struct rt_mutex *lock, - struct rt_mutex_waiter *waiter, - struct task_struct *task) +int __sched rt_mutex_start_proxy_lock(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter, + struct task_struct *task) { int ret; @@ -1793,26 +1683,6 @@ int rt_mutex_start_proxy_lock(struct rt_mutex *lock, } /** - * rt_mutex_next_owner - return the next owner of the lock - * - * @lock: the rt lock query - * - * Returns the next owner of the lock or NULL - * - * Caller has to serialize against other accessors to the lock - * itself. - * - * Special API call for PI-futex support - */ -struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock) -{ - if (!rt_mutex_has_waiters(lock)) - return NULL; - - return rt_mutex_top_waiter(lock)->task; -} - -/** * rt_mutex_wait_proxy_lock() - Wait for lock acquisition * @lock: the rt_mutex we were woken on * @to: the timeout, null if none. hrtimer should already have @@ -1829,9 +1699,9 @@ struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock) * * Special API call for PI-futex support */ -int rt_mutex_wait_proxy_lock(struct rt_mutex *lock, - struct hrtimer_sleeper *to, - struct rt_mutex_waiter *waiter) +int __sched rt_mutex_wait_proxy_lock(struct rt_mutex *lock, + struct hrtimer_sleeper *to, + struct rt_mutex_waiter *waiter) { int ret; @@ -1869,8 +1739,8 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock, * * Special API call for PI-futex support */ -bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock, - struct rt_mutex_waiter *waiter) +bool __sched rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter) { bool cleanup = false; @@ -1905,3 +1775,11 @@ bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock, return cleanup; } + +#ifdef CONFIG_DEBUG_RT_MUTEXES +void rt_mutex_debug_task_free(struct task_struct *task) +{ + DEBUG_LOCKS_WARN_ON(!RB_EMPTY_ROOT(&task->pi_waiters.rb_root)); + DEBUG_LOCKS_WARN_ON(task->pi_blocked_on); +} +#endif diff --git a/kernel/locking/rtmutex.h b/kernel/locking/rtmutex.h deleted file mode 100644 index 732f96abf462..000000000000 --- a/kernel/locking/rtmutex.h +++ /dev/null @@ -1,35 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * RT-Mutexes: blocking mutual exclusion locks with PI support - * - * started by Ingo Molnar and Thomas Gleixner: - * - * Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com> - * Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com> - * - * This file contains macros used solely by rtmutex.c. - * Non-debug version. - */ - -#define rt_mutex_deadlock_check(l) (0) -#define debug_rt_mutex_init_waiter(w) do { } while (0) -#define debug_rt_mutex_free_waiter(w) do { } while (0) -#define debug_rt_mutex_lock(l) do { } while (0) -#define debug_rt_mutex_proxy_lock(l,p) do { } while (0) -#define debug_rt_mutex_proxy_unlock(l) do { } while (0) -#define debug_rt_mutex_unlock(l) do { } while (0) -#define debug_rt_mutex_init(m, n, k) do { } while (0) -#define debug_rt_mutex_deadlock(d, a ,l) do { } while (0) -#define debug_rt_mutex_print_deadlock(w) do { } while (0) -#define debug_rt_mutex_reset_waiter(w) do { } while (0) - -static inline void rt_mutex_print_deadlock(struct rt_mutex_waiter *w) -{ - WARN(1, "rtmutex deadlock detected\n"); -} - -static inline bool debug_rt_mutex_detect_deadlock(struct rt_mutex_waiter *w, - enum rtmutex_chainwalk walk) -{ - return walk == RT_MUTEX_FULL_CHAINWALK; -} diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h index ca6fb489007b..a90c22abdbca 100644 --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -13,6 +13,7 @@ #ifndef __KERNEL_RTMUTEX_COMMON_H #define __KERNEL_RTMUTEX_COMMON_H +#include <linux/debug_locks.h> #include <linux/rtmutex.h> #include <linux/sched/wake_q.h> @@ -23,34 +24,30 @@ * @tree_entry: pi node to enqueue into the mutex waiters tree * @pi_tree_entry: pi node to enqueue into the mutex owner waiters tree * @task: task reference to the blocked task + * @lock: Pointer to the rt_mutex on which the waiter blocks + * @prio: Priority of the waiter + * @deadline: Deadline of the waiter if applicable */ struct rt_mutex_waiter { - struct rb_node tree_entry; - struct rb_node pi_tree_entry; + struct rb_node tree_entry; + struct rb_node pi_tree_entry; struct task_struct *task; struct rt_mutex *lock; -#ifdef CONFIG_DEBUG_RT_MUTEXES - unsigned long ip; - struct pid *deadlock_task_pid; - struct rt_mutex *deadlock_lock; -#endif - int prio; - u64 deadline; + int prio; + u64 deadline; }; /* - * Various helpers to access the waiters-tree: + * Must be guarded because this header is included from rcu/tree_plugin.h + * unconditionally. */ - #ifdef CONFIG_RT_MUTEXES - static inline int rt_mutex_has_waiters(struct rt_mutex *lock) { return !RB_EMPTY_ROOT(&lock->waiters.rb_root); } -static inline struct rt_mutex_waiter * -rt_mutex_top_waiter(struct rt_mutex *lock) +static inline struct rt_mutex_waiter *rt_mutex_top_waiter(struct rt_mutex *lock) { struct rb_node *leftmost = rb_first_cached(&lock->waiters); struct rt_mutex_waiter *w = NULL; @@ -67,42 +64,12 @@ static inline int task_has_pi_waiters(struct task_struct *p) return !RB_EMPTY_ROOT(&p->pi_waiters.rb_root); } -static inline struct rt_mutex_waiter * -task_top_pi_waiter(struct task_struct *p) -{ - return rb_entry(p->pi_waiters.rb_leftmost, - struct rt_mutex_waiter, pi_tree_entry); -} - -#else - -static inline int rt_mutex_has_waiters(struct rt_mutex *lock) -{ - return false; -} - -static inline struct rt_mutex_waiter * -rt_mutex_top_waiter(struct rt_mutex *lock) +static inline struct rt_mutex_waiter *task_top_pi_waiter(struct task_struct *p) { - return NULL; -} - -static inline int task_has_pi_waiters(struct task_struct *p) -{ - return false; + return rb_entry(p->pi_waiters.rb_leftmost, struct rt_mutex_waiter, + pi_tree_entry); } -static inline struct rt_mutex_waiter * -task_top_pi_waiter(struct task_struct *p) -{ - return NULL; -} - -#endif - -/* - * lock->owner state tracking: - */ #define RT_MUTEX_HAS_WAITERS 1UL static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock) @@ -111,6 +78,13 @@ static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock) return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS); } +#else /* CONFIG_RT_MUTEXES */ +/* Used in rcu/tree_plugin.h */ +static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock) +{ + return NULL; +} +#endif /* !CONFIG_RT_MUTEXES */ /* * Constants for rt mutex functions which have a selectable deadlock @@ -127,10 +101,16 @@ enum rtmutex_chainwalk { RT_MUTEX_FULL_CHAINWALK, }; +static inline void __rt_mutex_basic_init(struct rt_mutex *lock) +{ + lock->owner = NULL; + raw_spin_lock_init(&lock->wait_lock); + lock->waiters = RB_ROOT_CACHED; +} + /* * PI-futex support (proxy locking functions, etc.): */ -extern struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock); extern void rt_mutex_init_proxy_locked(struct rt_mutex *lock, struct task_struct *proxy_owner); extern void rt_mutex_proxy_unlock(struct rt_mutex *lock); @@ -156,10 +136,29 @@ extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock, extern void rt_mutex_postunlock(struct wake_q_head *wake_q); -#ifdef CONFIG_DEBUG_RT_MUTEXES -# include "rtmutex-debug.h" -#else -# include "rtmutex.h" -#endif +/* Debug functions */ +static inline void debug_rt_mutex_unlock(struct rt_mutex *lock) +{ + if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES)) + DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current); +} + +static inline void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock) +{ + if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES)) + DEBUG_LOCKS_WARN_ON(!rt_mutex_owner(lock)); +} + +static inline void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter) +{ + if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES)) + memset(waiter, 0x11, sizeof(*waiter)); +} + +static inline void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter) +{ + if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES)) + memset(waiter, 0x22, sizeof(*waiter)); +} #endif diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index abba5df50006..809b0016d344 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -632,7 +632,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) } /* - * The rwsem_spin_on_owner() function returns the folowing 4 values + * The rwsem_spin_on_owner() function returns the following 4 values * depending on the lock owner state. * OWNER_NULL : owner is currently NULL * OWNER_WRITER: when owner changes and is a writer @@ -819,7 +819,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) * we try to get it. The new owner may be a spinnable * writer. * - * To take advantage of two scenarios listed agove, the RT + * To take advantage of two scenarios listed above, the RT * task is made to retry one more time to see if it can * acquire the lock or continue spinning on the new owning * writer. Of course, if the time lag is long enough or the diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c index 0ff08380f531..c8d7ad9fb9b2 100644 --- a/kernel/locking/spinlock.c +++ b/kernel/locking/spinlock.c @@ -58,10 +58,10 @@ EXPORT_PER_CPU_SYMBOL(__mmiowb_state); /* * We build the __lock_function inlines here. They are too large for * inlining all over the place, but here is only one user per function - * which embedds them into the calling _lock_function below. + * which embeds them into the calling _lock_function below. * * This could be a long-held lock. We both prepare to spin for a long - * time (making _this_ CPU preemptable if possible), and we also signal + * time (making _this_ CPU preemptible if possible), and we also signal * towards that other CPU that it should break the lock ASAP. */ #define BUILD_LOCK_OPS(op, locktype) \ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b2890f6e6d6f..347127e73422 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5396,25 +5396,25 @@ static void sched_dynamic_update(int mode) switch (mode) { case preempt_dynamic_none: static_call_update(cond_resched, __cond_resched); - static_call_update(might_resched, (typeof(&__cond_resched)) __static_call_return0); - static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL); - static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL); - static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL); + static_call_update(might_resched, (void *)&__static_call_return0); + static_call_update(preempt_schedule, NULL); + static_call_update(preempt_schedule_notrace, NULL); + static_call_update(irqentry_exit_cond_resched, NULL); pr_info("Dynamic Preempt: none\n"); break; case preempt_dynamic_voluntary: static_call_update(cond_resched, __cond_resched); static_call_update(might_resched, __cond_resched); - static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL); - static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL); - static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL); + static_call_update(preempt_schedule, NULL); + static_call_update(preempt_schedule_notrace, NULL); + static_call_update(irqentry_exit_cond_resched, NULL); pr_info("Dynamic Preempt: voluntary\n"); break; case preempt_dynamic_full: - static_call_update(cond_resched, (typeof(&__cond_resched)) __static_call_return0); - static_call_update(might_resched, (typeof(&__cond_resched)) __static_call_return0); + static_call_update(cond_resched, (void *)&__static_call_return0); + static_call_update(might_resched, (void *)&__static_call_return0); static_call_update(preempt_schedule, __preempt_schedule_func); static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func); static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched); diff --git a/kernel/smp.c b/kernel/smp.c index aeb0adfa0606..f472ef623956 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -24,14 +24,70 @@ #include <linux/sched/clock.h> #include <linux/nmi.h> #include <linux/sched/debug.h> +#include <linux/jump_label.h> #include "smpboot.h" #include "sched/smp.h" #define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK) +#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG +union cfd_seq_cnt { + u64 val; + struct { + u64 src:16; + u64 dst:16; +#define CFD_SEQ_NOCPU 0xffff + u64 type:4; +#define CFD_SEQ_QUEUE 0 +#define CFD_SEQ_IPI 1 +#define CFD_SEQ_NOIPI 2 +#define CFD_SEQ_PING 3 +#define CFD_SEQ_PINGED 4 +#define CFD_SEQ_HANDLE 5 +#define CFD_SEQ_DEQUEUE 6 +#define CFD_SEQ_IDLE 7 +#define CFD_SEQ_GOTIPI 8 +#define CFD_SEQ_HDLEND 9 + u64 cnt:28; + } u; +}; + +static char *seq_type[] = { + [CFD_SEQ_QUEUE] = "queue", + [CFD_SEQ_IPI] = "ipi", + [CFD_SEQ_NOIPI] = "noipi", + [CFD_SEQ_PING] = "ping", + [CFD_SEQ_PINGED] = "pinged", + [CFD_SEQ_HANDLE] = "handle", + [CFD_SEQ_DEQUEUE] = "dequeue (src CPU 0 == empty)", + [CFD_SEQ_IDLE] = "idle", + [CFD_SEQ_GOTIPI] = "gotipi", + [CFD_SEQ_HDLEND] = "hdlend (src CPU 0 == early)", +}; + +struct cfd_seq_local { + u64 ping; + u64 pinged; + u64 handle; + u64 dequeue; + u64 idle; + u64 gotipi; + u64 hdlend; +}; +#endif + +struct cfd_percpu { + call_single_data_t csd; +#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG + u64 seq_queue; + u64 seq_ipi; + u64 seq_noipi; +#endif +}; + struct call_function_data { - call_single_data_t __percpu *csd; + struct cfd_percpu __percpu *pcpu; cpumask_var_t cpumask; cpumask_var_t cpumask_ipi; }; @@ -54,8 +110,8 @@ int smpcfd_prepare_cpu(unsigned int cpu) free_cpumask_var(cfd->cpumask); return -ENOMEM; } - cfd->csd = alloc_percpu(call_single_data_t); - if (!cfd->csd) { + cfd->pcpu = alloc_percpu(struct cfd_percpu); + if (!cfd->pcpu) { free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); return -ENOMEM; @@ -70,7 +126,7 @@ int smpcfd_dead_cpu(unsigned int cpu) free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); - free_percpu(cfd->csd); + free_percpu(cfd->pcpu); return 0; } @@ -102,15 +158,60 @@ void __init call_function_init(void) #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG +static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled); +static DEFINE_STATIC_KEY_FALSE(csdlock_debug_extended); + +static int __init csdlock_debug(char *str) +{ + unsigned int val = 0; + + if (str && !strcmp(str, "ext")) { + val = 1; + static_branch_enable(&csdlock_debug_extended); + } else + get_option(&str, &val); + + if (val) + static_branch_enable(&csdlock_debug_enabled); + + return 0; +} +early_param("csdlock_debug", csdlock_debug); + static DEFINE_PER_CPU(call_single_data_t *, cur_csd); static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); static DEFINE_PER_CPU(void *, cur_csd_info); +static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local); #define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC) static atomic_t csd_bug_count = ATOMIC_INIT(0); +static u64 cfd_seq; + +#define CFD_SEQ(s, d, t, c) \ + (union cfd_seq_cnt){ .u.src = s, .u.dst = d, .u.type = t, .u.cnt = c } + +static u64 cfd_seq_inc(unsigned int src, unsigned int dst, unsigned int type) +{ + union cfd_seq_cnt new, old; + + new = CFD_SEQ(src, dst, type, 0); + + do { + old.val = READ_ONCE(cfd_seq); + new.u.cnt = old.u.cnt + 1; + } while (cmpxchg(&cfd_seq, old.val, new.val) != old.val); + + return old.val; +} + +#define cfd_seq_store(var, src, dst, type) \ + do { \ + if (static_branch_unlikely(&csdlock_debug_extended)) \ + var = cfd_seq_inc(src, dst, type); \ + } while (0) /* Record current CSD work for current CPU, NULL to erase. */ -static void csd_lock_record(call_single_data_t *csd) +static void __csd_lock_record(call_single_data_t *csd) { if (!csd) { smp_mb(); /* NULL cur_csd after unlock. */ @@ -125,7 +226,13 @@ static void csd_lock_record(call_single_data_t *csd) /* Or before unlock, as the case may be. */ } -static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) +static __always_inline void csd_lock_record(call_single_data_t *csd) +{ + if (static_branch_unlikely(&csdlock_debug_enabled)) + __csd_lock_record(csd); +} + +static int csd_lock_wait_getcpu(call_single_data_t *csd) { unsigned int csd_type; @@ -135,12 +242,86 @@ static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) return -1; } +static void cfd_seq_data_add(u64 val, unsigned int src, unsigned int dst, + unsigned int type, union cfd_seq_cnt *data, + unsigned int *n_data, unsigned int now) +{ + union cfd_seq_cnt new[2]; + unsigned int i, j, k; + + new[0].val = val; + new[1] = CFD_SEQ(src, dst, type, new[0].u.cnt + 1); + + for (i = 0; i < 2; i++) { + if (new[i].u.cnt <= now) + new[i].u.cnt |= 0x80000000U; + for (j = 0; j < *n_data; j++) { + if (new[i].u.cnt == data[j].u.cnt) { + /* Direct read value trumps generated one. */ + if (i == 0) + data[j].val = new[i].val; + break; + } + if (new[i].u.cnt < data[j].u.cnt) { + for (k = *n_data; k > j; k--) + data[k].val = data[k - 1].val; + data[j].val = new[i].val; + (*n_data)++; + break; + } + } + if (j == *n_data) { + data[j].val = new[i].val; + (*n_data)++; + } + } +} + +static const char *csd_lock_get_type(unsigned int type) +{ + return (type >= ARRAY_SIZE(seq_type)) ? "?" : seq_type[type]; +} + +static void csd_lock_print_extended(call_single_data_t *csd, int cpu) +{ + struct cfd_seq_local *seq = &per_cpu(cfd_seq_local, cpu); + unsigned int srccpu = csd->node.src; + struct call_function_data *cfd = per_cpu_ptr(&cfd_data, srccpu); + struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu); + unsigned int now; + union cfd_seq_cnt data[2 * ARRAY_SIZE(seq_type)]; + unsigned int n_data = 0, i; + + data[0].val = READ_ONCE(cfd_seq); + now = data[0].u.cnt; + + cfd_seq_data_add(pcpu->seq_queue, srccpu, cpu, CFD_SEQ_QUEUE, data, &n_data, now); + cfd_seq_data_add(pcpu->seq_ipi, srccpu, cpu, CFD_SEQ_IPI, data, &n_data, now); + cfd_seq_data_add(pcpu->seq_noipi, srccpu, cpu, CFD_SEQ_NOIPI, data, &n_data, now); + + cfd_seq_data_add(per_cpu(cfd_seq_local.ping, srccpu), srccpu, CFD_SEQ_NOCPU, CFD_SEQ_PING, data, &n_data, now); + cfd_seq_data_add(per_cpu(cfd_seq_local.pinged, srccpu), srccpu, CFD_SEQ_NOCPU, CFD_SEQ_PINGED, data, &n_data, now); + + cfd_seq_data_add(seq->idle, CFD_SEQ_NOCPU, cpu, CFD_SEQ_IDLE, data, &n_data, now); + cfd_seq_data_add(seq->gotipi, CFD_SEQ_NOCPU, cpu, CFD_SEQ_GOTIPI, data, &n_data, now); + cfd_seq_data_add(seq->handle, CFD_SEQ_NOCPU, cpu, CFD_SEQ_HANDLE, data, &n_data, now); + cfd_seq_data_add(seq->dequeue, CFD_SEQ_NOCPU, cpu, CFD_SEQ_DEQUEUE, data, &n_data, now); + cfd_seq_data_add(seq->hdlend, CFD_SEQ_NOCPU, cpu, CFD_SEQ_HDLEND, data, &n_data, now); + + for (i = 0; i < n_data; i++) { + pr_alert("\tcsd: cnt(%07x): %04x->%04x %s\n", + data[i].u.cnt & ~0x80000000U, data[i].u.src, + data[i].u.dst, csd_lock_get_type(data[i].u.type)); + } + pr_alert("\tcsd: cnt now: %07x\n", now); +} + /* * Complain if too much time spent waiting. Note that only * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU, * so waiting on other types gets much less information. */ -static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) +static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) { int cpu = -1; int cpux; @@ -184,6 +365,8 @@ static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 t *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request"); } if (cpu >= 0) { + if (static_branch_unlikely(&csdlock_debug_extended)) + csd_lock_print_extended(csd, cpu); if (!trigger_single_cpu_backtrace(cpu)) dump_cpu_task(cpu); if (!cpu_cur_csd) { @@ -204,7 +387,7 @@ static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 t * previous function call. For multi-cpu calls its even more interesting * as we'll have to ensure no other cpu is observing our csd. */ -static __always_inline void csd_lock_wait(call_single_data_t *csd) +static void __csd_lock_wait(call_single_data_t *csd) { int bug_id = 0; u64 ts0, ts1; @@ -218,7 +401,36 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd) smp_acquire__after_ctrl_dep(); } +static __always_inline void csd_lock_wait(call_single_data_t *csd) +{ + if (static_branch_unlikely(&csdlock_debug_enabled)) { + __csd_lock_wait(csd); + return; + } + + smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK)); +} + +static void __smp_call_single_queue_debug(int cpu, struct llist_node *node) +{ + unsigned int this_cpu = smp_processor_id(); + struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local); + struct call_function_data *cfd = this_cpu_ptr(&cfd_data); + struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu); + + cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE); + if (llist_add(node, &per_cpu(call_single_queue, cpu))) { + cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI); + cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING); + send_call_function_single_ipi(cpu); + cfd_seq_store(seq->pinged, this_cpu, cpu, CFD_SEQ_PINGED); + } else { + cfd_seq_store(pcpu->seq_noipi, this_cpu, cpu, CFD_SEQ_NOIPI); + } +} #else +#define cfd_seq_store(var, src, dst, type) + static void csd_lock_record(call_single_data_t *csd) { } @@ -256,6 +468,19 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data); void __smp_call_single_queue(int cpu, struct llist_node *node) { +#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG + if (static_branch_unlikely(&csdlock_debug_extended)) { + unsigned int type; + + type = CSD_TYPE(container_of(node, call_single_data_t, + node.llist)); + if (type == CSD_TYPE_SYNC || type == CSD_TYPE_ASYNC) { + __smp_call_single_queue_debug(cpu, node); + return; + } + } +#endif + /* * The list addition should be visible before sending the IPI * handler locks the list to pull the entry off it because of @@ -314,6 +539,8 @@ static int generic_exec_single(int cpu, call_single_data_t *csd) */ void generic_smp_call_function_single_interrupt(void) { + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->gotipi, CFD_SEQ_NOCPU, + smp_processor_id(), CFD_SEQ_GOTIPI); flush_smp_call_function_queue(true); } @@ -341,7 +568,13 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline) lockdep_assert_irqs_disabled(); head = this_cpu_ptr(&call_single_queue); + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->handle, CFD_SEQ_NOCPU, + smp_processor_id(), CFD_SEQ_HANDLE); entry = llist_del_all(head); + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->dequeue, + /* Special meaning of source cpu: 0 == queue empty */ + entry ? CFD_SEQ_NOCPU : 0, + smp_processor_id(), CFD_SEQ_DEQUEUE); entry = llist_reverse_order(entry); /* There shouldn't be any pending callbacks on an offline CPU. */ @@ -400,8 +633,12 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline) } } - if (!entry) + if (!entry) { + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->hdlend, + 0, smp_processor_id(), + CFD_SEQ_HDLEND); return; + } /* * Second; run all !SYNC callbacks. @@ -439,6 +676,9 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline) */ if (entry) sched_ttwu_pending(entry); + + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->hdlend, CFD_SEQ_NOCPU, + smp_processor_id(), CFD_SEQ_HDLEND); } void flush_smp_call_function_from_idle(void) @@ -448,6 +688,8 @@ void flush_smp_call_function_from_idle(void) if (llist_empty(this_cpu_ptr(&call_single_queue))) return; + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->idle, CFD_SEQ_NOCPU, + smp_processor_id(), CFD_SEQ_IDLE); local_irq_save(flags); flush_smp_call_function_queue(true); if (local_softirq_pending()) @@ -664,7 +906,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask, cpumask_clear(cfd->cpumask_ipi); for_each_cpu(cpu, cfd->cpumask) { - call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu); + struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu); + call_single_data_t *csd = &pcpu->csd; if (cond_func && !cond_func(cpu, info)) continue; @@ -678,18 +921,27 @@ static void smp_call_function_many_cond(const struct cpumask *mask, csd->node.src = smp_processor_id(); csd->node.dst = cpu; #endif - if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) + cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE); + if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) { __cpumask_set_cpu(cpu, cfd->cpumask_ipi); + cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI); + } else { + cfd_seq_store(pcpu->seq_noipi, this_cpu, cpu, CFD_SEQ_NOIPI); + } } /* Send a message to all CPUs in the map */ + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->ping, this_cpu, + CFD_SEQ_NOCPU, CFD_SEQ_PING); arch_send_call_function_ipi_mask(cfd->cpumask_ipi); + cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu, + CFD_SEQ_NOCPU, CFD_SEQ_PINGED); if (wait) { for_each_cpu(cpu, cfd->cpumask) { call_single_data_t *csd; - csd = per_cpu_ptr(cfd->csd, cpu); + csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd; csd_lock_wait(csd); } } diff --git a/kernel/static_call.c b/kernel/static_call.c index 2c5950b0b90e..723fcc9d20db 100644 --- a/kernel/static_call.c +++ b/kernel/static_call.c @@ -165,13 +165,13 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func) stop = __stop_static_call_sites; -#ifdef CONFIG_MODULES if (mod) { +#ifdef CONFIG_MODULES stop = mod->static_call_sites + mod->num_static_call_sites; init = mod->state == MODULE_STATE_COMING; - } #endif + } for (site = site_mod->sites; site < stop && static_call_key(site) == key; site++) { diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan index f271ff5fbb5a..0440f373248e 100644 --- a/lib/Kconfig.kcsan +++ b/lib/Kconfig.kcsan @@ -69,8 +69,9 @@ config KCSAN_SELFTEST panic. Recommended to be enabled, ensuring critical functionality works as intended. -config KCSAN_TEST - tristate "KCSAN test for integrated runtime behaviour" +config KCSAN_KUNIT_TEST + tristate "KCSAN test for integrated runtime behaviour" if !KUNIT_ALL_TESTS + default KUNIT_ALL_TESTS depends on TRACEPOINTS && KUNIT select TORTURE_TEST help diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt new file mode 100644 index 000000000000..1ab189f51f55 --- /dev/null +++ b/tools/memory-model/Documentation/access-marking.txt @@ -0,0 +1,479 @@ +MARKING SHARED-MEMORY ACCESSES +============================== + +This document provides guidelines for marking intentionally concurrent +normal accesses to shared memory, that is "normal" as in accesses that do +not use read-modify-write atomic operations. It also describes how to +document these accesses, both with comments and with special assertions +processed by the Kernel Concurrency Sanitizer (KCSAN). This discussion +builds on an earlier LWN article [1]. + + +ACCESS-MARKING OPTIONS +====================== + +The Linux kernel provides the following access-marking options: + +1. Plain C-language accesses (unmarked), for example, "a = b;" + +2. Data-race marking, for example, "data_race(a = b);" + +3. READ_ONCE(), for example, "a = READ_ONCE(b);" + The various forms of atomic_read() also fit in here. + +4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);" + The various forms of atomic_set() also fit in here. + + +These may be used in combination, as shown in this admittedly improbable +example: + + WRITE_ONCE(a, b + data_race(c + d) + READ_ONCE(e)); + +Neither plain C-language accesses nor data_race() (#1 and #2 above) place +any sort of constraint on the compiler's choice of optimizations [2]. +In contrast, READ_ONCE() and WRITE_ONCE() (#3 and #4 above) restrict the +compiler's use of code-motion and common-subexpression optimizations. +Therefore, if a given access is involved in an intentional data race, +using READ_ONCE() for loads and WRITE_ONCE() for stores is usually +preferable to data_race(), which in turn is usually preferable to plain +C-language accesses. + +KCSAN will complain about many types of data races involving plain +C-language accesses, but marking all accesses involved in a given data +race with one of data_race(), READ_ONCE(), or WRITE_ONCE(), will prevent +KCSAN from complaining. Of course, lack of KCSAN complaints does not +imply correct code. Therefore, please take a thoughtful approach +when responding to KCSAN complaints. Churning the code base with +ill-considered additions of data_race(), READ_ONCE(), and WRITE_ONCE() +is unhelpful. + +In fact, the following sections describe situations where use of +data_race() and even plain C-language accesses is preferable to +READ_ONCE() and WRITE_ONCE(). + + +Use of the data_race() Macro +---------------------------- + +Here are some situations where data_race() should be used instead of +READ_ONCE() and WRITE_ONCE(): + +1. Data-racy loads from shared variables whose values are used only + for diagnostic purposes. + +2. Data-racy reads whose values are checked against marked reload. + +3. Reads whose values feed into error-tolerant heuristics. + +4. Writes setting values that feed into error-tolerant heuristics. + + +Data-Racy Reads for Approximate Diagnostics + +Approximate diagnostics include lockdep reports, monitoring/statistics +(including /proc and /sys output), WARN*()/BUG*() checks whose return +values are ignored, and other situations where reads from shared variables +are not an integral part of the core concurrency design. + +In fact, use of data_race() instead READ_ONCE() for these diagnostic +reads can enable better checking of the remaining accesses implementing +the core concurrency design. For example, suppose that the core design +prevents any non-diagnostic reads from shared variable x from running +concurrently with updates to x. Then using plain C-language writes +to x allows KCSAN to detect reads from x from within regions of code +that fail to exclude the updates. In this case, it is important to use +data_race() for the diagnostic reads because otherwise KCSAN would give +false-positive warnings about these diagnostic reads. + +In theory, plain C-language loads can also be used for this use case. +However, in practice this will have the disadvantage of causing KCSAN +to generate false positives because KCSAN will have no way of knowing +that the resulting data race was intentional. + + +Data-Racy Reads That Are Checked Against Marked Reload + +The values from some reads are not implicitly trusted. They are instead +fed into some operation that checks the full value against a later marked +load from memory, which means that the occasional arbitrarily bogus value +is not a problem. For example, if a bogus value is fed into cmpxchg(), +all that happens is that this cmpxchg() fails, which normally results +in a retry. Unless the race condition that resulted in the bogus value +recurs, this retry will with high probability succeed, so no harm done. + +However, please keep in mind that a data_race() load feeding into +a cmpxchg_relaxed() might still be subject to load fusing on some +architectures. Therefore, it is best to capture the return value from +the failing cmpxchg() for the next iteration of the loop, an approach +that provides the compiler much less scope for mischievous optimizations. +Capturing the return value from cmpxchg() also saves a memory reference +in many cases. + +In theory, plain C-language loads can also be used for this use case. +However, in practice this will have the disadvantage of causing KCSAN +to generate false positives because KCSAN will have no way of knowing +that the resulting data race was intentional. + + +Reads Feeding Into Error-Tolerant Heuristics + +Values from some reads feed into heuristics that can tolerate occasional +errors. Such reads can use data_race(), thus allowing KCSAN to focus on +the other accesses to the relevant shared variables. But please note +that data_race() loads are subject to load fusing, which can result in +consistent errors, which in turn are quite capable of breaking heuristics. +Therefore use of data_race() should be limited to cases where some other +code (such as a barrier() call) will force the occasional reload. + +In theory, plain C-language loads can also be used for this use case. +However, in practice this will have the disadvantage of causing KCSAN +to generate false positives because KCSAN will have no way of knowing +that the resulting data race was intentional. + + +Writes Setting Values Feeding Into Error-Tolerant Heuristics + +The values read into error-tolerant heuristics come from somewhere, +for example, from sysfs. This means that some code in sysfs writes +to this same variable, and these writes can also use data_race(). +After all, if the heuristic can tolerate the occasional bogus value +due to compiler-mangled reads, it can also tolerate the occasional +compiler-mangled write, at least assuming that the proper value is in +place once the write completes. + +Plain C-language stores can also be used for this use case. However, +in kernels built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, this +will have the disadvantage of causing KCSAN to generate false positives +because KCSAN will have no way of knowing that the resulting data race +was intentional. + + +Use of Plain C-Language Accesses +-------------------------------- + +Here are some example situations where plain C-language accesses should +used instead of READ_ONCE(), WRITE_ONCE(), and data_race(): + +1. Accesses protected by mutual exclusion, including strict locking + and sequence locking. + +2. Initialization-time and cleanup-time accesses. This covers a + wide variety of situations, including the uniprocessor phase of + system boot, variables to be used by not-yet-spawned kthreads, + structures not yet published to reference-counted or RCU-protected + data structures, and the cleanup side of any of these situations. + +3. Per-CPU variables that are not accessed from other CPUs. + +4. Private per-task variables, including on-stack variables, some + fields in the task_struct structure, and task-private heap data. + +5. Any other loads for which there is not supposed to be a concurrent + store to that same variable. + +6. Any other stores for which there should be neither concurrent + loads nor concurrent stores to that same variable. + + But note that KCSAN makes two explicit exceptions to this rule + by default, refraining from flagging plain C-language stores: + + a. No matter what. You can override this default by building + with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n. + + b. When the store writes the value already contained in + that variable. You can override this default by building + with CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n. + + c. When one of the stores is in an interrupt handler and + the other in the interrupted code. You can override this + default by building with CONFIG_KCSAN_INTERRUPT_WATCHER=y. + +Note that it is important to use plain C-language accesses in these cases, +because doing otherwise prevents KCSAN from detecting violations of your +code's synchronization rules. + + +ACCESS-DOCUMENTATION OPTIONS +============================ + +It is important to comment marked accesses so that people reading your +code, yourself included, are reminded of the synchronization design. +However, it is even more important to comment plain C-language accesses +that are intentionally involved in data races. Such comments are +needed to remind people reading your code, again, yourself included, +of how the compiler has been prevented from optimizing those accesses +into concurrency bugs. + +It is also possible to tell KCSAN about your synchronization design. +For example, ASSERT_EXCLUSIVE_ACCESS(foo) tells KCSAN that any +concurrent access to variable foo by any other CPU is an error, even +if that concurrent access is marked with READ_ONCE(). In addition, +ASSERT_EXCLUSIVE_WRITER(foo) tells KCSAN that although it is OK for there +to be concurrent reads from foo from other CPUs, it is an error for some +other CPU to be concurrently writing to foo, even if that concurrent +write is marked with data_race() or WRITE_ONCE(). + +Note that although KCSAN will call out data races involving either +ASSERT_EXCLUSIVE_ACCESS() or ASSERT_EXCLUSIVE_WRITER() on the one hand +and data_race() writes on the other, KCSAN will not report the location +of these data_race() writes. + + +EXAMPLES +======== + +As noted earlier, the goal is to prevent the compiler from destroying +your concurrent algorithm, to help the human reader, and to inform +KCSAN of aspects of your concurrency design. This section looks at a +few examples showing how this can be done. + + +Lock Protection With Lockless Diagnostic Access +----------------------------------------------- + +For example, suppose a shared variable "foo" is read only while a +reader-writer spinlock is read-held, written only while that same +spinlock is write-held, except that it is also read locklessly for +diagnostic purposes. The code might look as follows: + + int foo; + DEFINE_RWLOCK(foo_rwlock); + + void update_foo(int newval) + { + write_lock(&foo_rwlock); + foo = newval; + do_something(newval); + write_unlock(&foo_rwlock); + } + + int read_foo(void) + { + int ret; + + read_lock(&foo_rwlock); + do_something_else(); + ret = foo; + read_unlock(&foo_rwlock); + return ret; + } + + int read_foo_diagnostic(void) + { + return data_race(foo); + } + +The reader-writer lock prevents the compiler from introducing concurrency +bugs into any part of the main algorithm using foo, which means that +the accesses to foo within both update_foo() and read_foo() can (and +should) be plain C-language accesses. One benefit of making them be +plain C-language accesses is that KCSAN can detect any erroneous lockless +reads from or updates to foo. The data_race() in read_foo_diagnostic() +tells KCSAN that data races are expected, and should be silently +ignored. This data_race() also tells the human reading the code that +read_foo_diagnostic() might sometimes return a bogus value. + +However, please note that your kernel must be built with +CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n in order for KCSAN to +detect a buggy lockless write. If you need KCSAN to detect such a +write even if that write did not change the value of foo, you also +need CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n. If you need KCSAN to +detect such a write happening in an interrupt handler running on the +same CPU doing the legitimate lock-protected write, you also need +CONFIG_KCSAN_INTERRUPT_WATCHER=y. With some or all of these Kconfig +options set properly, KCSAN can be quite helpful, although it is not +necessarily a full replacement for hardware watchpoints. On the other +hand, neither are hardware watchpoints a full replacement for KCSAN +because it is not always easy to tell hardware watchpoint to conditionally +trap on accesses. + + +Lock-Protected Writes With Lockless Reads +----------------------------------------- + +For another example, suppose a shared variable "foo" is updated only +while holding a spinlock, but is read locklessly. The code might look +as follows: + + int foo; + DEFINE_SPINLOCK(foo_lock); + + void update_foo(int newval) + { + spin_lock(&foo_lock); + WRITE_ONCE(foo, newval); + ASSERT_EXCLUSIVE_WRITER(foo); + do_something(newval); + spin_unlock(&foo_wlock); + } + + int read_foo(void) + { + do_something_else(); + return READ_ONCE(foo); + } + +Because foo is read locklessly, all accesses are marked. The purpose +of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy +concurrent lockless write. + + +Lockless Reads and Writes +------------------------- + +For another example, suppose a shared variable "foo" is both read and +updated locklessly. The code might look as follows: + + int foo; + + int update_foo(int newval) + { + int ret; + + ret = xchg(&foo, newval); + do_something(newval); + return ret; + } + + int read_foo(void) + { + do_something_else(); + return READ_ONCE(foo); + } + +Because foo is accessed locklessly, all accesses are marked. It does +not make sense to use ASSERT_EXCLUSIVE_WRITER() in this case because +there really can be concurrent lockless writers. KCSAN would +flag any concurrent plain C-language reads from foo, and given +CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, also any concurrent plain +C-language writes to foo. + + +Lockless Reads and Writes, But With Single-Threaded Initialization +------------------------------------------------------------------ + +For yet another example, suppose that foo is initialized in a +single-threaded manner, but that a number of kthreads are then created +that locklessly and concurrently access foo. Some snippets of this code +might look as follows: + + int foo; + + void initialize_foo(int initval, int nkthreads) + { + int i; + + foo = initval; + ASSERT_EXCLUSIVE_ACCESS(foo); + for (i = 0; i < nkthreads; i++) + kthread_run(access_foo_concurrently, ...); + } + + /* Called from access_foo_concurrently(). */ + int update_foo(int newval) + { + int ret; + + ret = xchg(&foo, newval); + do_something(newval); + return ret; + } + + /* Also called from access_foo_concurrently(). */ + int read_foo(void) + { + do_something_else(); + return READ_ONCE(foo); + } + +The initialize_foo() uses a plain C-language write to foo because there +are not supposed to be concurrent accesses during initialization. The +ASSERT_EXCLUSIVE_ACCESS() allows KCSAN to flag buggy concurrent unmarked +reads, and the ASSERT_EXCLUSIVE_ACCESS() call further allows KCSAN to +flag buggy concurrent writes, even if: (1) Those writes are marked or +(2) The kernel was built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y. + + +Checking Stress-Test Race Coverage +---------------------------------- + +When designing stress tests it is important to ensure that race conditions +of interest really do occur. For example, consider the following code +fragment: + + int foo; + + int update_foo(int newval) + { + return xchg(&foo, newval); + } + + int xor_shift_foo(int shift, int mask) + { + int old, new, newold; + + newold = data_race(foo); /* Checked by cmpxchg(). */ + do { + old = newold; + new = (old << shift) ^ mask; + newold = cmpxchg(&foo, old, new); + } while (newold != old); + return old; + } + + int read_foo(void) + { + return READ_ONCE(foo); + } + +If it is possible for update_foo(), xor_shift_foo(), and read_foo() to be +invoked concurrently, the stress test should force this concurrency to +actually happen. KCSAN can evaluate the stress test when the above code +is modified to read as follows: + + int foo; + + int update_foo(int newval) + { + ASSERT_EXCLUSIVE_ACCESS(foo); + return xchg(&foo, newval); + } + + int xor_shift_foo(int shift, int mask) + { + int old, new, newold; + + newold = data_race(foo); /* Checked by cmpxchg(). */ + do { + old = newold; + new = (old << shift) ^ mask; + ASSERT_EXCLUSIVE_ACCESS(foo); + newold = cmpxchg(&foo, old, new); + } while (newold != old); + return old; + } + + + int read_foo(void) + { + ASSERT_EXCLUSIVE_ACCESS(foo); + return READ_ONCE(foo); + } + +If a given stress-test run does not result in KCSAN complaints from +each possible pair of ASSERT_EXCLUSIVE_ACCESS() invocations, the +stress test needs improvement. If the stress test was to be evaluated +on a regular basis, it would be wise to place the above instances of +ASSERT_EXCLUSIVE_ACCESS() under #ifdef so that they did not result in +false positives when not evaluating the stress test. + + +REFERENCES +========== + +[1] "Concurrency bugs should fear the big bad data-race detector (part 2)" + https://lwn.net/Articles/816854/ + +[2] "Who's afraid of a big bad optimizing compiler?" + https://lwn.net/Articles/793253/ diff --git a/tools/memory-model/Documentation/simple.txt b/tools/memory-model/Documentation/simple.txt index 81e1a0ec5342..4c789ec8334f 100644 --- a/tools/memory-model/Documentation/simple.txt +++ b/tools/memory-model/Documentation/simple.txt @@ -189,7 +189,6 @@ Additional information may be found in these files: Documentation/atomic_t.txt Documentation/atomic_bitops.txt -Documentation/core-api/atomic_ops.rst Documentation/core-api/refcount-vs-atomic.rst Reading code using these primitives is often also quite helpful. |