summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2026-06-02vdso/vsyscall: Gate update_vsyscall() behind CONFIG_GENERIC_GETTIMEOFDAYThomas Weißschuh
Both the compilation of kernel/time/vsyscall.c, which contains the real definition of update_vsyscall() and the other vDSO definitions in timekeeper_internal.h use CONFIG_GENERIC_GETTIMEOFDAY and not CONFIG_GENERIC_TIME_VSYSCALL. Align the code to use a single Kconfig symbol. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260519-vdso-generic_time_vsyscal-v1-2-5c2a5905d5f5@linutronix.de
2026-06-02sched/cputime: Handle dyntick-idle steal time correctlyFrederic Weisbecker
The dyntick-idle steal time is currently accounted when the tick restarts but the stolen idle time is not subtracted from the idle time that was already accounted. This is to avoid observing the idle time going backward as the dyntick-idle cputime accessors can't reliably know in advance the stolen idle time. In order to maintain a forward progressing idle cputime while subtracting idle steal time from it, keep track of the previously accounted idle stolen time and substract it from _later_ idle cputime accounting. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-16-frederic@kernel.org
2026-06-02sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-caseFrederic Weisbecker
The last reason why get_cpu_idle/iowait_time_us() may return -1 now is if the config doesn't support nohz. The ad-hoc replacement solution by cpufreq is to compute jiffies minus the whole busy cputime. Although the intention should provide a coherent low resolution estimation of the idle and iowait time, the implementation is buggy because jiffies don't start at 0. Just provide instead a real get_cpu_[idle|iowait]_time_us() offcase. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-14-frederic@kernel.org
2026-06-02tick/sched: Consolidate idle time fetching APIsFrederic Weisbecker
Fetching the idle cputime is available through a variety of accessors all over the place depending on the different accounting flavours and needs: - idle vtime generic accounting can be accessed by kcpustat_field(), kcpustat_cpu_fetch(), get_idle/iowait_time() and get_cpu_idle/iowait_time_us() - dynticks-idle accounting can only be accessed by get_idle/iowait_time() or get_cpu_idle/iowait_time_us() - CONFIG_NO_HZ_COMMON=n idle accounting can be accessed by kcpustat_field() kcpustat_cpu_fetch(), or get_idle/iowait_time() but not by get_cpu_idle/iowait_time_us() Moreover get_idle/iowait_time() relies on get_cpu_idle/iowait_time_us() with a non-sensical conversion to microseconds and back to nanoseconds on the way. Start consolidating the APIs with removing get_idle/iowait_time() and make kcpustat_field() and kcpustat_cpu_fetch() work for all cases. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-13-frederic@kernel.org
2026-06-02tick/sched: Move dyntick-idle cputime accounting to cputime codeFrederic Weisbecker
Although the dynticks-idle cputime accounting is necessarily tied to the tick subsystem, the actual related accounting code has no business residing there and should be part of the scheduler cputime code. Move away the relevant pieces and state machine to where they belong. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-10-frederic@kernel.org
2026-06-02tick/sched: Unify idle cputime accountingFrederic Weisbecker
The non-vtime dynticks-idle cputime accounting is a big mess that accumulates within two concurrent statistics, each having their own shortcomings: * The accounting for online CPUs which is based on the delta between tick_nohz_start_idle() and tick_nohz_stop_idle(). Pros: - Works when the tick is off - Has nsecs granularity Cons: - Account idle steal time but doesn't substract it from idle cputime. - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but the IRQ time is simply ignored when CONFIG_IRQ_TIME_ACCOUNTING=n - The windows between 1) idle task scheduling and the first call to tick_nohz_start_idle() and 2) idle task between the last tick_nohz_stop_idle() and the rest of the idle time are blindspots wrt. cputime accounting (though mostly insignificant amount) - Relies on private fields outside of kernel stats, with specific accessors. * The accounting for offline CPUs which is based on ticks and the jiffies delta during which the tick was stopped. Pros: - Handles steal time correctly - Handle CONFIG_IRQ_TIME_ACCOUNTING=y and CONFIG_IRQ_TIME_ACCOUNTING=n correctly. - Handles the whole idle task - Accounts directly to kernel stats, without midlayer accumulator. Cons: - Doesn't elapse when the tick is off, which doesn't make it suitable for online CPUs. - Has TICK_NSEC granularity (jiffies) - Needs to track the dyntick-idle ticks that were accounted and substract them from the total jiffies time spent while the tick was stopped. This is an ugly workaround. Having two different accounting for a single context is not the only problem: since those accountings are of different natures, it is possible to observe the global idle time going backward after a CPU goes offline. Clean up the situation with introducing a hybrid approach that stays coherent and works for both online and offline CPUs: * Tick based or native vtime accounting operate before the idle loop is entered and resume once the idle loop prepares to exit. * When the idle loop starts, switch to dynticks-idle accounting as is done currently, except that the statistics accumulate directly to the relevant kernel stat fields. * Private dyntick cputime accounting fields are removed. * Works on both online and offline case. Further improvement will include: * Only switch to dynticks-idle cputime accounting when the tick actually goes in dynticks mode. * Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the dynticks-idle accounting still elapses while on IRQs. * Correctly substract idle steal cputime from idle time Reported-by: Xin Zhao <jackzxcui1989@163.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-8-frederic@kernel.org
2026-06-02powerpc/time: Prepare to stop elapsing in dynticks-idleFrederic Weisbecker
Currently the tick subsystem stores the idle cputime accounting in private fields, allowing cohabitation with architecture idle vtime accounting. The former is fetched on online CPUs, the latter on offline CPUs. For consolidation purpose, architecture vtime accounting will continue to account the cputime but will make a break when the idle tick is stopped. The dyntick cputime accounting will then be relayed by the tick subsystem so that the idle cputime is still seen advancing coherently even when the tick isn't there to flush the idle vtime. Prepare for that and introduce three new APIs which will be used in subsequent patches: - vtime_dynticks_start() is deemed to be called when idle enters in dyntick mode. The idle cputime that elapsed so far is accumulated. - vtime_dynticks_stop() is deemed to be called when idle exits from dyntick mode. The vtime entry clocks are fast-forward to current time so that idle accounting restarts elapsing from now. - vtime_reset() is deemed to be called from dynticks idle IRQ entry to fast-forward the clock to current time so that the IRQ time is still accounted by vtime while nohz cputime is paused. Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid accounting twice the idle cputime, along with nohz accounting. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-6-frederic@kernel.org
2026-06-02sched/cputime: Correctly support generic vtime idle timeFrederic Weisbecker
Currently whether generic vtime is running or not, the idle cputime is fetched from the nohz accounting. However generic vtime already does its own idle cputime accounting. Only the kernel stat accessors are not plugged to support it. Read the idle generic vtime cputime when it's running, this will allow to later more clearly split nohz and vtime cputime accounting. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-5-frederic@kernel.org
2026-06-02sched/cputime: Remove superfluous and error prone kcpustat_field() parameterFrederic Weisbecker
The first parameter to kcpustat_field() is a pointer to the cpu kcpustat to be fetched from. This parameter is error prone because a copy to a kcpustat could be passed by accident instead of the original one. Also the kcpustat structure can already be retrieved with the help of the mandatory CPU argument. Remove the needless parameter. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-4-frederic@kernel.org
2026-06-02bpf: Silence unused-but-set-variable warning in bpf_for_each_reg_in_vstate_maskAmery Hung
The macro requires callers to pass a stack variable, but not all callbacks use it. Add (void)__stack to suppress the clang W=1 warning. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260602175204.624401-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-02Merge branches 'rcutorture.2026.05.24' and 'misc.2026.05.24' into ↵Uladzislau Rezki (Sony)
rcu-merge.2026.05.24 rcutorture.2026.05.24: Torture-test updates misc.2026.05.24: Miscellaneous RCU updates
2026-06-02io_uring/bpf-ops: restrict ctx access to BPFPavel Begunkov
BPF programs should have no need in looking into struct io_ring_ctx, if anything, most of such cases would be anti patterns like looking up ring indices directly via the context. Replace it with a new empty structure, which is just an alias to struct io_ring_ctx. It'll create a new BTF type and fail verification if a BPF program tries to access it (beyond the first byte). It'll also give more flexibility for the future, and otherwise it can be made aligned with io_ring_ctx as before with struct groups if ever needed or extended in a different way. Fixes: d0e437b76bd3c ("io_uring/bpf-ops: implement loop_step with BPF struct_ops") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/5f6ca3649e9e0bae8667db4357e28dd00cd07901.1780394491.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-06-02Merge tag 'mm-hotfixes-stable-2026-06-01-20-58' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "13 hotfixes. All are for MM. 10 are cc:stable and the remaining 3 address post-7.1 issues or aren't considered suitable for backporting. There's a three-patch series "userfaultfd: verify VMA state across UFFDIO_COPY retry" from Mike Rapoport which fixes a few uffd things. The rest are singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: userfaultfd: remove redundant check in vm_uffd_ops() userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs userfaultfd: verify VMA state across UFFDIO_COPY retry mm/huge_memory: update file PMD counter before folio_put() mm/huge_memory: update file PUD counter before folio_put() mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback mm/damon/ops-common: call folio_test_lru() after folio_get() mm/cma: fix reserved page leak on activation failure mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison mm/hugetlb: restore reservation on error in hugetlb folio copy paths mm/cma_debug: fix invalid accesses for inactive CMA areas memcg: use round-robin victim selection in refill_stock mm/hugetlb: avoid false positive lockdep assertion
2026-06-02mm: Make empty_zero_page[] constArd Biesheuvel
The empty zero page is used to back any kernel or user space mapping that is supposed to remain cleared, and so the page itself is never supposed to be modified. So mark it as const, which moves it into .rodata rather than .bss: on most architectures, this ensures that both the kernel's mapping of it and any aliases that are accessible via the kernel direct (linear) map are mapped read-only, and cannot be used (inadvertently or maliciously) to corrupt the contents of the zero page. Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2026-06-02regulator: Use named initializers for platform_device_id arraysMark Brown
Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> says: this series targets to use named initializers for platform_device_id arrays. In general these are better readable for humans and more robust to changes in the respective struct definition. This robustness is needed as I want to do Link: https://patch.msgid.link/cover.1779878004.git.u.kleine-koenig@baylibre.com
2026-06-02Merge tag 'v7.1-rc6' into workJonathan Cameron
Linux 7.1-rc6
2026-06-02rseq: Fix using an uninitialized stack variable in rseq_exit_user_update()Qing Wang
There is an bug in which an uninitialized stack variable is used in rseq_exit_user_update() as reported by syzbot: BUG: KMSAN: kernel-infoleak in rseq_set_ids_get_csaddr include/linux/rseq_entry.h:502 [inline] The local variable: struct rseq_ids ids = { .cpu_id = task_cpu(t), .mm_cid = task_mm_cid(t), .node_id = cpu_to_node(ids.cpu_id), }; According to the C standard, the evaluation order of expressions in an initializer list is indeterminately sequenced. The compiler (Clang, in this KMSAN build) evaluates `cpu_to_node(ids.cpu_id)` *before* `ids.cpu_id` is initialized with `task_cpu(t)`. This is fixed by moving the assignment of ids.node_id outside the structure initialization. Fixes: 82f572449cfe ("rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode") Closes: https://syzkaller.appspot.com/bug?extid=185a631927096f9da2fc Reported-by: syzbot+185a631927096f9da2fc@syzkaller.appspotmail.com Signed-off-by: Qing Wang <wangqing7171@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://patch.msgid.link/20260602030854.574038-1-wangqing7171@gmail.com
2026-06-02sched/proxy: Remove PROXY_WAKINGK Prateek Nayak
Now that the proxy path uses ->is_blocked, use the '->is_blocked && !->blocked_on' state instead of PROXY_WAKING. Notably, this is where a blocked_on relation is broken but the donor task might still need a return migration. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260526113322.596522894%40infradead.org
2026-06-02sched: Add blocked_donor link to task for smarter mutex handoffsPeter Zijlstra
Add link to the task this task is proxying for, and use it so the mutex owner can do an intelligent hand-off of the mutex to the task that the owner is running on behalf. [jstultz: This patch was split out from larger proxy patch] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260512025635.2840817-8-jstultz@google.com
2026-06-02sched: Add is_blocked task flagJohn Stultz
Add a new is_blocked flag to the task struct. This flag is set by try_to_block_task() and cleared by ttwu_do_wakeup() and tracks if the task is blocked. Traditionally this would mirror !p->on_rq, however due things like DELAY_DEQUEUE and PROXY_EXEC, this can diverge, so its useful to manage separately. Additionally with this, we might be able to get rid of the p->se.sched_delayed (ab)use in the core code (eventually). Taken whole cloth from Peter's email: https://lore.kernel.org/lkml/20260501132143.GC1026330@noisy.programming.kicks-ass.net/ With a few additional p->is_blocked = 0 in a few cases where we return current if blocked_on gets zeroed or there is no owner. This may hint that these current special cases might be dropped eventually. This change also helps resolve wait-queue stalls seen with proxy-execution. See previous patch attempts for details: https://lore.kernel.org/lkml/20260430215103.2978955-2-jstultz@google.com/ Reported-by: Vineeth Pillai <vineethrp@google.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260512025635.2840817-7-jstultz@google.com
2026-06-02sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING caseJohn Stultz
This patch adds logic so try_to_wake_up() will notice if we are waking a task where blocked_on == PROXY_WAKING, and if necessary dequeue the task so the wakeup will naturally return-migrate the donor task back to a cpu it can run on. This helps performance as we do the dequeue and wakeup under the locks normally taken in the try_to_wake_up() and avoids having to do proxy_force_return() from __schedule(), which has to re-take similar locks and then force a pick again loop. This was split out from the larger proxy patch, and significantly reworked. Credits for the original patch go to: Peter Zijlstra (Intel) <peterz@infradead.org> Juri Lelli <juri.lelli@redhat.com> Valentin Schneider <valentin.schneider@arm.com> Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260512025635.2840817-6-jstultz@google.com
2026-06-02Merge branch 'tip/sched/urgent'Peter Zijlstra
Pick up urgent fixes. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2026-06-02pps: Convert to ktime_get_snapshot_id()Thomas Gleixner
ktime_get_snapshot() resolves to ktime_get_snapshot_id(CLOCK_REALTIME). Make it obvious in the code and convert the readout to use the snapshot::systime and monoraw fields instead of snapshot::real and raw, which aregoing away. Similar to the PPS generators, avoid the more expensive snapshot when CONFIG_NTP_PPS is disabled. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260529195557.123410250@kernel.org
2026-06-02timekeeping: Provide ktime_get_snapshot_id()Thomas Gleixner
ktime_get_snapshot() provides a snapshot of the underlying clocksource counter value and the corresponding CLOCK_MONOTONIC_RAW, CLOCK_REALTIME and CLOCK_BOOTTIME timestamps. There is no usage of CLOCK_REALTIME and CLOCK_BOOTTIME at the same time and CLOCK_BOOTTIME support was just added for the ARM64 KVM tracing mechanism, which needs CLOCK_BOOTTIME and the underlying clocksource counter value. ktime_get_snapshot() is also not suitable for usage with CLOCK_AUX, but that's a prerequisite to support PTP hardware timestamping for CLOCK_AUX steering. As a first step, rename ktime_get_snapshot() to ktime_get_snapshot_id(), which now takes a clockid argument to select the clock which needs to be captured. The result is stored in system_time_snapshot::systime, which will replace the system_time_snapshot::real/boot members once all usage sites have been converted. ktime_get_snapshot() is a simple wrapper which hands in CLOCK_REALTIME as clockid argument for the conversion period. That means CLOCK_REALTIME is now captured twice, but that redunancy is only temporary. As all usage sites of struct system_time_snapshot has to be updated anyway, rename the 'raw' member to 'monoraw' for clarity. No functional change vs. current users of ktime_get_snapshot() Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260529195556.971591633@kernel.org
2026-06-01libbpf: Reject non-exclusive metadata maps in the signed loaderKP Singh
The loader verifies map->sha against the metadata hash in its instructions. map->sha is calculated when BPF_OBJ_GET_INFO_BY_FD is called on the frozen map. While the map is frozen, the /signed loader/ must also ensure the map is exclusive, as, without exclusivity (which a hostile host could just omit when loading the loader), another BPF program with map access can mutate the contents afterwards, so the check passes on stale data. With the extra check as part of the signed loader, it now refuses to move on with map->sha validation if the host set it up wrongly. Fixes: fb2b0e290147 ("libbpf: Update light skeleton for signing") Signed-off-by: KP Singh <kpsingh@kernel.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Drop redundant hash_buf from map_get_hash operationDaniel Borkmann
bpf_map_get_info_by_fd() is the only caller of the ->map_get_hash and always invokes it with hash_buf == map->sha and hash_buf_size of SHA256_DIGEST_SIZE. array_map_get_hash() in turn lets sha256() write the digest directly into that buffer (map->sha) and then performs a trailing memcpy(), which evaluates to memcpy(map->sha, map->sha, 32): a redundant self-copy. The hash_buf_size argument was never used at all. Simplify this a bit, no functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Unify release handling for helpers and kfuncsAmery Hung
Introduce release_reg() to consolidate the release logic shared by both helpers and kfuncs: dynptr release, kptr_xchg percpu-to-RCU conversion, regular reference release, and NULL pass-through. NULL pass-through is only allowed if the prototype indicates the argument may be null. Determine release_regno from the function prototype/metadata before argument checking, rather than discovering it dynamically during argument processing. For helpers, scan the arg_type array in check_func_proto() via check_proto_release_reg(). For kfuncs, set release_regno to BPF_REG_1 in bpf_fetch_kfunc_arg_meta() when KF_RELEASE is set. In the future when we start adding decl_tag to kfunc arguments, we can just look at the function prototype instead of a release_regno. Extract ref_convert_alloc_rcu_protected() and invalidate_rcu_protected_refs() to make it more clear what the code is doing. For ref_convert_alloc_rcu_protected(), it pre-converts MEM_ALLOC | MEM_PERCPU registers to MEM_RCU (clearing id so they survive), then calls release_reference() to invalidate the remaining registers and release the reference state. Add KF_RELEASE to bpf_dynptr_file_discard() so its release_regno is set via fetch_kfunc_meta rather than being assigned manually in the dynptr argument processing. Set arg_type to ARG_PTR_TO_DYNPTR for KF_ARG_PTR_TO_DYNPTR so that check_func_arg_reg_off() correctly allows non-zero stack offsets for dynptr release arguments same as helper. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-9-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Unify referenced object tracking in verifierAmery Hung
Helpers and kfuncs independently tracked referenced object metadata using standalone id fields in their respective arg_meta structs. This led to duplicated logic and inconsistent error handling between the two paths. Introduce struct ref_obj_desc to consolidate id and parent_id along with a count of how many arguments carry a reference. Add update_ref_obj() to populate it from a bpf_reg_state, replacing open-coded assignments in check_func_arg(), check_kfunc_args(), and process_iter_arg(). Add validate_ref_obj() to check for ambiguous ref_obj before using it. For ref_obj releasing helpers and kfuncs, keep checking it before calling update_ref_obj() for now. A later patch will make these functions not depending on ref_obj. For other users of ref_obj, move the checks to the use locations. For helper, this means moving the checks inside helper_multiple_ref_obj_use() to use locations. is_acquire_function() is dropped as ref_obj is never used. Pass ref_obj_desc into process_dynptr_func()/mark_stack_slots_dynptr() instead of a bare parent_id to make it less confusing. Drop the selftest introduced in 7ec899ac90a2 ("selftests/bpf: Negative test case for ref_obj_id in args") since the verifier no longer complains about ambiguous ref_obj if it is not used. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-8-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Refactor object relationship tracking and fix dynptr UAF bugAmery Hung
Refactor object relationship tracking in the verifier and fix a dynptr use-after-free bug where file/skb dynptrs are not invalidated when the parent referenced object is freed. Add parent_id to bpf_reg_state to precisely track child-parent relationships. A child object's parent_id points to the parent object's id. This replaces the PTR_TO_MEM-specific dynptr_id. Remove ref_obj_id from bpf_reg_state by folding its role into the existing id field. Previously, id tracked pointer identity for null checking while ref_obj_id tracked the owning reference for lifetime management. These are now unified: acquire helpers and kfuncs set id to the acquired reference id, and release paths use id directly. Add reg_is_referenced() which checks if a register is referenced by looking up its id in the reference array. This replaces all former ref_obj_id checks. For release_reference(), invalidating an object now also invalidates all descendants by traversing the object tree. This is done using stack-based DFS to avoid recursive call chains of release_reference() -> unmark_stack_slots_dynptr() -> release_reference(). Referenced objects encountered during tree traversal are reported as leaked references. Add parent_id to bpf_reference_state to enable hierarchical reference tracking. When acquiring a reference, a parent_id can be specified to link the new reference to an existing one (e.g., referenced dynptrs acquire a reference with parent_id linking to the parent object's reference). Pointer casting: For pointer casting helpers (bpf_sk_fullsock, bpf_tcp_sock), instead of propagating ref_obj_id, the cast result reuses the same reference id as the source pointer. Since the cast may return NULL for a non-NULL input, the NULL case is explored as a separate verifier branch. This allows releasing any of the original or cast pointers to invalidate all others. Referenced dynptrs: When constructing a referenced dynptr, acquire a intermediate reference with parent_id linking to the parent referenced object. The dynptr and all clones share the same parent_id (pointing to the intermediate ref) but get unique ids for independent slice tracking. Releasing a referenced dynptr releases the parent reference, which in turn invalidates all clones and their derived slices. Owning to non-owning reference conversion: After converting owning to non-owning by clearing id (e.g., object(id=1) -> object(id=0)), the verifier releases the reference state via release_reference_nomark(). Note that the error message "reference has not been acquired before" in the helper and kfunc release paths is removed. This message was already unreachable. The verifier only calls release_reference() after confirming the reference is valid, so the condition could never trigger in practice. Fixes: 870c28588afa ("bpf: net_sched: Add basic bpf qdisc kfuncs") Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-6-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Unify dynptr handling in the verifierAmery Hung
Simplify dynptr checking for helper and kfunc by unifying it. Remember the initialized dynptr (i.e.,g !(arg_type |= MEM_UNINIT)) pass to a dynptr kfunc during process_dynptr_func() so that we can easily retrieve the information for verification later. By saving it in meta->dynptr, there is no need to call dynptr helpers such as dynptr_id(), dynptr_ref_obj_id() and dynptr_type() in check_func_arg(). Remove and open code the helpers in process_dynptr_func() when saving id, ref_obj_id, and type. Besides, since dynptr ref_obj_id information is now pass around in meta->bpf_dynptr_desc, drop the check in helper_multiple_ref_obj_use. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01net: Remove orphaned ax25_ptr referencesCosta Shulyupin
The AX.25 subsystem was removed in commit dd8d4bc28ad7 ("net: remove ax25 and amateur radio (hamradio) subsystem"), which removed the ax25_ptr field from struct net_device but left behind the kdoc comment and documentation. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260531134837.4111349-1-costa.shul@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01ata: Annotate functions in the issuing path with __must_hold()Bart Van Assche
Annotate the following functions used in the issuing path: ata_qc_issue(), ata_sas_queuecmd(), ata_scsi_qc_issue(), ata_scsi_translate(), __ata_scsi_queuecmd() These functions are all used in the issuing path, so context analysis will be able to verify that the ap lock is held, from it is taken in sas_queuecommand() or ata_scsi_queuecmd() all the way down to ata_qc_issue(). Commenting out the spin_lock_irqsave() successfully results in a compiler error on Clang 23. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Co-developed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
2026-06-01ata: libata: Document when host->eh_mutex should be heldBart Van Assche
Annotate the following functions with __must_hold(&host->eh_mutex): * All ata_port_operations.error_handler() implementations. * ata_eh_reset() and ata_eh_recover() because these functions call ata_eh_release() and ata_eh_acquire(). * All callers of ata_eh_reset() and ata_eh_recover(). Enable Clang's context analysis. This will cause the build to fail if e.g. a locking bug would be introduced in an error path. This patch should not affect the generated assembler code. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [cassel: drop note about clang 23 from commit log] Signed-off-by: Niklas Cassel <cassel@kernel.org>
2026-06-01Merge remote-tracking branch 'tip/locking/context' into for-7.2Niklas Cassel
2026-06-01Merge tag 'v7.1-rc6' into char-misc-nextGreg Kroah-Hartman
We need the char/misc/iio fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-06-01Merge tag 'v7.1-rc6' into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-06-01Merge tag 'v7.1-rc6' into usb-nextGreg Kroah-Hartman
We need the USB and Thunderbolt fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-06-01sunrpc: add a generic netlink family for cache upcallsJeff Layton
The auth.unix.ip and auth.unix.gid caches live in the sunrpc module, so they cannot use the nfsd generic netlink family. Create a new "sunrpc" generic netlink family with its own "exportd" multicast group to support cache upcall notifications for sunrpc-resident caches. Define a YAML spec (sunrpc_cache.yaml) with a cache-type enum (ip_map, unix_gid), a cache-notify multicast event, and the corresponding uapi header. Implement sunrpc_cache_notify() in cache.c, which checks for listeners on the exportd multicast group, builds and sends a SUNRPC_CMD_CACHE_NOTIFY message with the cache-type attribute. Register/unregister the sunrpc_nl_family in init_sunrpc() and cleanup_sunrpc(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-06-01sunrpc: add helpers to count and snapshot pending cache requestsJeff Layton
Add sunrpc_cache_requests_count() and sunrpc_cache_requests_snapshot() to allow callers to count and snapshot the pending upcall request list without exposing struct cache_request outside of cache.c. Both functions skip entries that no longer have CACHE_PENDING set. The snapshot function takes a cache_get() reference on each item so the caller can safely use them after the queue_lock is released. These will be used by the nfsd generic netlink dumpit handler for svc_export upcall requests. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-06-01sunrpc: add a cache_notify callbackJeff Layton
A later patch will be changing the kernel to send a netlink notification when there is a pending cache_request. Add a new cache_notify operation to struct cache_detail for this purpose. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-06-01sunrpc: rename sunrpc_cache_pipe_upcall_timeout()Jeff Layton
This function doesn't have anything to do with a timeout. The only difference is that it warns if there are no listeners. Rename it to sunrpc_cache_upcall_warn(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-06-01sunrpc: rename sunrpc_cache_pipe_upcall() to sunrpc_cache_upcall()Jeff Layton
Since it will soon also send an upcall via netlink, if configured. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-06-01Merge remote-tracking branches 'vfs/vfs-7.2.casefold', ↵Chuck Lever
'vfs/vfs-7.2.directory.delegations' and 'vfs/vfs-7.2.exportfs' into vfs-7.2-merge
2026-06-01ASoC: nau8822: add support for supply regulatorsMark Brown
Alexey Charkov <alchark@flipper.net> says: The Nuvoton NAU8822 codec has four power supply pins: VDDA, VDDB, VDDC and VDDSPK, which must be online and stable before the device can be accessed over I2C. On boards where these rails are software-controlled, probing the codec before the regulators are up results in -ENXIO errors during register access. This short series adds optional regulator support to both the device tree binding and the driver, so platforms that need explicit power sequencing can describe and enforce it: Link: https://patch.msgid.link/20260525-nau8822-reg-v2-0-7d37ae393e46@flipper.net
2026-06-01spi: fsl-lpspi: fix DMA termination issuesMark Brown
Carlos Song (OSS) <carlos.song@oss.nxp.com> says: This series fixes two issues in the fsl-lpspi DMA transfer error paths. Patch 1 replaces the deprecated dmaengine_terminate_all() with dmaengine_terminate_sync() across all error paths in fsl_lpspi_dma_transfer(). Patch 2 fixes a missing RX DMA channel termination when TX descriptor preparation fails. Since the RX channel is already submitted and issued before the TX descriptor is prepared, returning -EINVAL without terminating the RX channel leaves it running against buffers that the SPI core will unmap, potentially causing memory corruption. Link: https://patch.msgid.link/20260525062357.3191349-1-carlos.song@oss.nxp.com
2026-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Conflicts: drivers/net/ethernet/microsoft/mana/mana_en.c: 17bfe0a8c014e ("net: mana: Add NULL guards in teardown path to prevent panic on attach failure") d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-06-01kho: docs: fix typo in ABI documentationLongWei
Replace "Indentifies" with "Identifies". Signed-off-by: Long Wei <longwei27@huawei.com> Link: https://patch.msgid.link/20260516085653.2193872-1-longwei27@huawei.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: Reference count incoming FLB dataDavid Matlack
Increment the incoming FLB refcount in liveupdate_flb_get_incoming() so that the FLB structure cannot be freed while the caller is actively using it. Add an additional liveupdate_flb_put_incoming() function so the caller can explicitly indicate when it is done using the FLB data. During a Live Update, a subsystem might need to hold onto the incoming File-Lifecycle-Bound (FLB) data for an extended period, such as during device enumeration. Incrementing the reference count guarantees that the data remains valid and accessible until the subsystem releases it, preventing future use-after-free bugs. Fixes: cab056f2aae7 ("liveupdate: luo_flb: introduce File-Lifecycle-Bound global state") Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/20260423174032.3140399-3-dmatlack@google.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: Use refcount_t for FLB reference countsDavid Matlack
Use refcount_t instead of a raw integer to keep track of references on incoming and outgoing FLBs. Using refcount_t provides protection from overflow, underflow, and other issues. Fixes: cab056f2aae7 ("liveupdate: luo_flb: introduce File-Lifecycle-Bound global state") Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/20260423174032.3140399-2-dmatlack@google.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01kho: fix deferred initialization of scratch areasMichal Clapinski
Currently, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, kho_release_scratch() will initialize the struct pages and set migratetype of KHO scratch. Unless the whole scratch fits below first_deferred_pfn, some of that will be overwritten either by deferred_init_pages() or memmap_init_reserved_range(). To fix it, make memmap_init_range(), deferred_init_memmap_chunk() and __init_page_from_nid() recognize KHO scratch regions and set migratetype of pageblocks in those regions to MIGRATE_CMA. Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Michal Clapinski <mclapinski@google.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Link: https://patch.msgid.link/20260423122538.140993-2-mclapinski@google.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>