summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
16 hoursMerge tag 'liveupdate-v7.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux Pull liveupdate updates from Mike Rapoport: "Kexec Handover (KHO): - make memory preservation compatible with deferred initialization of the memory map Live Update Orchestrator (LUO): - add LIVEUPDATE_SESSION_GET_NAME ioctl and parameter verification for LIVEUPDATE_IOCTL_CREATE_SESSION ioctl - documentation updates for liveupdate=on command line option, systemd support and the current compatibility status - remove the fixed limits on the number of files that can be preserved within a single session, and the total number of sessions managed by the LUO Misc fixes: - reference count incoming File-Lifecycle-Bound (FLB) data so it cannot be freed while a subsystem is still using it - fixes for a TOCTOU race in luo_session_retrieve(), a use- after-free in the file finish and unpreserve paths, concurrent session mutations during reboot and serialization on preserve_context kexec - make sure ioctls for incoming LUO sessions are blocked for outgoing sessions and vice versa - make sure KHO scratch size is always aligned by CMA_MIN_ALIGNMENT_BYTES - fix memblock tests build issue introduced by KHO changes" * tag 'liveupdate-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux: (36 commits) liveupdate: Document that retrieve failure is permanent docs: memfd_preservation: fix rendering of ABI documentation selftests/liveupdate: Add stress-files kexec test selftests/liveupdate: Add stress-sessions kexec test selftests/liveupdate: Test session and file limit removal liveupdate: Remove limit on the number of files per session liveupdate: Remove limit on the number of sessions liveupdate: defer session block allocation and physical address setting kho: add support for linked-block serialization liveupdate: Extract luo_session_deserialize_one helper liveupdate: Extract luo_file_deserialize_one helper liveupdate: register luo_ser as KHO subtree liveupdate: centralize state management into struct luo_ser liveupdate: avoid mixing cleanup guards with goto in luo_session_retrieve_fd liveupdate: change file_set->count type to u64 for type safety liveupdate: Remove unused ser field from struct luo_session liveupdate: fix u-a-f in luo_file_unpreserve_files() and luo_file_finish() liveupdate: block session mutations during reboot liveupdate: fix TOCTOU race in luo_session_retrieve() liveupdate: skip serialization for context-preserving kexec ...
3 daysMerge tag 'landlock-7.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock updates from Mickaël Salaün: "This adds new Landlock access rights to control UDP bind and connect/send operations, and a new "quiet" feature to mute specific specific audit logs (and other future observability events). A few commits also fix Landlock issues" * tag 'landlock-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (24 commits) selftests/landlock: Add tests for invalid use of quiet flag selftests/landlock: Add tests for quiet flag with scope selftests/landlock: Add tests for quiet flag with net rules selftests/landlock: Add tests for quiet flag with fs rules selftests/landlock: Replace hard-coded 16 with a constant samples/landlock: Add quiet flag support to sandboxer landlock: Suppress logging when quiet flag is present landlock: Add API support and docs for the quiet flags landlock: Add a place for flags to layer rules landlock: Add documentation for UDP support samples/landlock: Add sandboxer UDP access control selftests/landlock: Add tests for UDP send selftests/landlock: Add tests for UDP bind/connect landlock: Add UDP send+connect access control landlock: Add UDP bind() access control landlock: Fix unmarked concurrent access to socket family selftests/landlock: Explicitly disable audit in teardowns selftests/landlock: Test SCOPE_SIGNAL on the SIGIO/fowner pgid path landlock: Fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path landlock: Demonstrate best-effort allowed_access filtering ...
3 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "arm64: This is a bit of an odd merge window on the KVM/arm64 front. There is absolutely no new feature in the pull request. It is purely fixes, because it is simply becoming too hard to review new stuff when so many AI-fuelled fixes hit the list. - Significant cleanup of the vgic-v5 PPI support which was merged in 7.1. This makes the code more maintainable, and squashes a couple of bugs in the meantime - Set of fixes for the handling of the MMU in an NV context, particularly VNCR-triggered faults. S1POE support is fixed as well - Large set of pKVM fixes, mostly addressing recurring issues around hypervisor tracking of donated pages in obscure cases where the donation could fail and leave things in a bizarre state - Fixes for the so-called "lazy vgic init", which resulted in sleeping operations in non-preemptible sections. This turned out to be far more invasive than initially expected.. - Reduce the overhead of L1/L2 context switch by not touching the FP registers - Fix the way non-implemented page sizes are dealt with when a guest insist on using them for S2 translation - The usual set of low-impact fixes and cleanups all over the map Loongarch: - On a request for lazy FPU load, load all FPU state that the VM supports instead of enabling only the part (FPU, LSX or LASX) that caused the FPU load request - Some enhancements about interrupt injection - Some bug fixes and other small changes RISC-V: - Batch G-stage TLB flushes for GPA range based page table updates - Convert HGEI line management to fully per-HART - Fix missing CSR dirty marking when FWFT state updated via ONE_REG - Fix stale FWFT feature exposure to Guest/VM - Speed up dirty logging write faults using MMU rwlock and atomic PTE updates using cmpxchg() for permission-only changes - Use flexible array for APLIC IRQ state - Use kvm_slot_dirty_track_enabled() for logging enable check on a memslot - Avoid skipping valid pages in kvm_riscv_gstage_wp_range() - Avoid skipping valid pages in kvm_riscv_gstage_unmap_range() - Use endian-specific __lelong for NACL shared memory S390: - KVM_PRE_FAULT_MEMORY support - Support for 2G hugepages - Support for the ASTFLEIE 2 facility - Support for fast inject using kvm_arch_set_irq_inatomic - Fix potential leak of uninitialized bytes - A few more misc gmap fixes x86: - Generic support for the more granular permissions allowed by EPT, namely "read" (which was previously usurping the U bit) and separate execution bits for kernel and userspace - Do not assume that all page tables start with U=1/W=1/NX=0 at the root, as AMD GMET needs to have U=0 at the root - Introduce common assembly macros for use within Intel and AMD vendor-specific vmentry code. This touches the SPEC_CTRL handling, which is now entirely done in assembly for Intel (by reusing the AMD code that already existed), and register save/restore which uses some macro magic to compute the offsets in the struct. Both of these are preparatory changes for upcoming APX support - Clean up KVM's register tracking and storage, primarily to prepare for APX support, which expands the maximum number of GPRs from 16 to 32 - Keep a single copy of the PDPTRs rather than two, since architecturally there is just one - Handle EXIT_FASTPATH_EXIT_USERSPACE in vendor code to ensure vendor code gets a chance to handle things like reaping the PML buffer - Update KVM's view of PV async enabling if and only if the MSR write fully succeeds - Fix a variety of issues where the emulator doesn't honor guest-debug state, and clean up related code along the way - Synthesize EPT Violation and #NPF "error code" bits when injecting faults into L1 that didn't originate in hardware (in which case the VMCS/VMCB doesn't hold relevant information) - Add support for virtualizing (well, emulating) AMD's flavor of CPL>0 CPUID faulting - Clean up the GPR APIs so that KVM's use of "raw" is consistent, and fix a variety of minor bugs along the way - Fix an OOB memory access due to not checking the VP ID when handling a Hyper-V PV TLB flush for L2 - Fix a bug in the mediated PMU's handling of fixed counters that allowed the guest to bypass the PMU event filter - Allow userspace to return EAGAIN when handling SNP and TDX hypercalls, so the KVM can forward a "retry" status code to the guest, and reserve all unused error codes for future usage - Overhaul the TDP MMU => S-EPT code to move as much S-EPT specific logic as possible into the TDX code, and to funnel (almost) all S-EPT updates into a single chokepoint. The motivation is largely to prepare for upcoming Dynamic PAMT support, but the cleanups are nice to have on their own - Plug a hole in shadow page table handling, where KVM fails to recursively zap nested EPT/NPT shadow page tables when the nested hypervisor tears down its own EPT/NPT page tables from the bottom up x86 (Intel): - Support for nested MBEC (Mode-Based Execute Control), see above in the generic section; also run with MBEC enabled even for non-nested mode - Use the kernel's "enum pg_level" in the TDX APIs instead of the TDX-Module's level definitions (which are 0-based) - Rework the TDX memory APIs to not require/assume that guest memory is backed by "struct page" (in prepartion for guest_memfd hugepage support) - Fix a largely benign bug where KVM TDX would incorrectly state it could emulate several x2APIC MSRs - Use the "safe" WRMSR API when proxying LBR MSR writes as the to-be-written value is guest controlled and completely unvalidated x86 (AMD): - Support for nested GMET (Guest Mode Execution Trap), see above in the generic section; also run with GMET enabled even for non-nested mode - Fixes and minor cleanups to GHCB handling, on top of the earlier work already merged into 7.1-rc - Ensure KVM's copy of CR0 and CR3 are up-to-date prior to invoking fastpath handlers - Add support for virtualizing gPAT (KVM previously just used L1's PAT when running L2) - Fix goofs where KVM mishandles side effects (e.g. single-step and PMC updates) when emulating VMRUN - Fix a variety of bugs in AVIC's handling of x2APIC MSR interception, most notably where KVM didn't disable interception of IRR, ISR, and TMR regs - Add support for virtualizing Host-Only/Guest-Only bits in the mediated PMU - Don't advertise support for unusable VM types, and account for VM types that are disabled by firmware, e.g. to mitigate security vulnerabilities - Rewrite the SEV {en,de}crypt debug ioctls as they were riddle with bugs and unnecessarily complicated, and add comprehensive tests - Clean up and deduplicate the SEV page pinning code - Fix minor goofs related to writing back CPUID information after firmware rejects a CPUID page for an SNP vCPU Generic: - Rename invalidate_begin() to invalidate_start() throughout KVM to follow the kernel's nomenclature, e.g. for mmu_notifiers - Use guard() to cleanup up various KVM+VFIO flows - Minor cleanups guest_memfd: - Return -EEXIST instead of -EINVAL if userspace attempts to bind a gmem range to multiple memslots, and fix the test that was supposed to ensure KVM returns -EEXIST - Treat memslot binding offsets and sizes as unsigned values to fix a bug where KVM interprets a large "offset + size" as a negative value and allows a nonsensical offset - Use the inode number instead of the page offset for the NUMA interleaving index to fix a bug where the effective index would jump by two for consecutive pages (the caller also adds in the page offset) Selftests: - Randomize the dirty log test's delay when reaping the bitmap on the first pass, as always waiting only 1ms hid a KVM RISC-V bug as the test reaped the bitmap before KVM could build up enough state to hit the bug - A pile of one-off fixes and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (326 commits) KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level KVM: x86: Fix shadow paging use-after-free due to unexpected role KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject KVM: s390: Enable adapter_indicators_set to use mapped pages KVM: s390: Add map/unmap ioctl and clean mappings post-guest riscv: kvm: Use endian-specific __lelong for NACL shared memory KVM: selftests: access_tracking_perf_test: bump number of NUMA nodes to 32 KVM: s390: vsie: Implement ASTFLEIE facility 2 KVM: s390: vsie: Refactor handle_stfle s390/sclp: Detect ASTFLEIE 2 facility KVM: s390: Minor refactor of base/ext facility lists KVM: x86/mmu: move pdptrs out of the MMU KVM: x86: check that kvm_handle_invpcid is only invoked with shadow paging KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions KVM: nVMX: remove unnecessary code in prepare_vmcs02_rare KVM: x86: remove nested_mmu from mmu_is_nested() KVM: arm64: vgic-its: Make ABI commit helpers return void KVM: s390: Initialize KVM_S390_GET_CMMA_BITS memory LoongArch: KVM: Add missing slots_lock for device register/unregister LoongArch: KVM: Validate irqchip index in irqfd routing ...
3 daysMerge tag 'media/v7.2-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - v4l2: - core: fix subdev sensor ownership - subdev: Allow accessing routes with STREAMS client capability - ctrls: Add validation for HEVC active reference counts and background detection control - common: Add YUV24 format info and has_alpha helper - vb2: Change vb2_read() and vb2_write() return types to ssize_t - i2c: cvs: Add driver of Intel Computer Vision Sensing Controller(CVS) - atmel-isc: remove deprecated driver - cec: Add CEC Latency Indication Protocol (LIP) support - imon: Add iMON VFD HID OEM v1.2 key mappings - AVMatrix: new HWS capture driver - isp4: new AMD capture driver - qcom: - iris: Add hierarchical coding, B-frame, and Long-Term Reference support for encoder - camss: Add SM6350 platform support - venus: Add SM6115 platform support - chips-media: wave5: Add support for Packed YUV422, CBP profile, and background detection - csi2rx: Add multistream support and 32 dma chans - Several cleanups and fixes * tag 'media/v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (394 commits) media: v4l2-fwnode: Fix subdev owner overwritten in v4l2_async_register_subdev_sensor() media: qcom: iris: vdec: allow GEN2 decoding into 10bit format media: qcom: iris: vdec: update find_format to handle 8bit and 10bit formats media: qcom: iris: vdec: update size and stride calculations for 10bit formats media: qcom: iris: gen2: add support for 10bit decoding media: qcom: iris: add QC10C & P010 buffer size calculations media: qcom: iris: add helpers for 8bit and 10bit formats media: qcom: iris: Fix FPS calculation and VPP FW overhead media: qcom: camss: vfe-340: Support for PIX client media: qcom: camss: vfe-340: Proper client handling media: qcom: camss: csid-340: Enable PIX interface routing media: qcom: camss: csid-340: Add port-to-interface mapping media: qcom: camss: csid-340: Switch to generic CSID_CFG/CTRL registers media: iris: Initialize HFI ops after firmware load in core init media: iris: drop struct iris_fmt media: iris: Add platform data for X1P42100 media: iris: Add hardware power on/off ops for X1P42100 media: iris: optimize COMV buffer allocation for VPU3x and VPU4x media: iris: add FPS calculation and VPP FW overhead in frequency formula media: qcom: iris: Simplify COMV size calculation ...
4 daysMerge tag 'nfsd-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Jeff Layton wired up netlink upcalls for the auth.unix.ip and auth.unix.gid caches in SunRPC and the svc_export and nfsd.fh caches in NFSD. The new kernel-user API is more extensible and lays the groundwork for retiring the old pipe interface. The default NFS r/w block size rises to 4MB on hosts with at least 16GB of RAM, reducing per-RPC overhead on fast networks. Smaller machines keep their previously computed default, and the value remains tunable through /proc/fs/nfsd/max_block_size. Chuck Lever converted the server's RPCSEC GSS Kerberos code to the kernel's shared crypto/krb5 library. The conversion retires and removes SunRPC's bespoke implementation of Kerberos v5, but keeps RPCSEC GSS-API. Continuing the xdrgen migration that converted the NLMv4 server XDR layer in v7.1, Chuck Lever converted the NLM version 3 server-side XDR layer from hand-written C to xdrgen-generated code. As with the NLMv4 conversion in v7.1, the goals are improved memory safety, lower maintenance burden, and groundwork for generation of Rust code for this layer instead of C. Chuck Lever fixed an issue where lingering NFSv4 state pins a mounted file system after it is unexported. A new netlink-based mechanism can now release NLM locks and NFSv4 state by client address, by filesystem, and by export. Now an administrator can quiesce an export cleanly before unmounting it. The remaining patches are bug fixes, clean-ups, and minor optimizations, including a batch of memory-leak and use-after-free fixes in the ACL, lockd, and TLS handshake paths, many of them reported by Chris Mason. Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.2 NFSD development cycle" * tag 'nfsd-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (106 commits) svcrdma: wake sq waiters when the transport closes nfsd: reset write verifier on deferred writeback errors nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race sunrpc: wait for in-flight TLS handshake callback when cancel loses race sunrpc: pin svc_xprt across the asynchronous TLS handshake callback nfsd: fix posix_acl leak on SETACL decode failure nfsd: fix posix_acl leak and ignored error in nfsd4_create_file nfsd: check get_user() return when reading princhashlen nfsd: fix inverted cp_ttl check in async copy reaper nfsd: fix dead ACL conflict guard in nfsd4_create NFSD: Fix SECINFO_NO_NAME decode error cleanup sunrpc: harden rq_procinfo lifecycle to prevent double-free SUNRPC: Return an error from xdr_buf_to_bvec() on overflow SUNRPC: Bound-check xdr_buf_to_bvec() stores before writing nfsd: release layout stid on setlease failure lockd: Avoid hashing uninitialized bytes in nlm4svc_lookup_file() lockd: Plug nlm_file refcount leak on cached nlm_do_fopen() failure lockd: Plug nlm_file leak when nlm_do_fopen() fails Revert "NFSD: Defer sub-object cleanup in export put callbacks" Revert "svcrdma: Use contiguous pages for RDMA Read sink buffers" ...
4 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Many AI driven bug fixes, and several big driver API cleanups - Driver bug fixes and minor cleanups in mlx5, hns, rxe, efa, siw, rtrs, mana, irdma, mlx4. Commonly error path flows, integer arithmetic overflows on unsafe data, out of bounds access, and use after free issues under races. - Second half of the new udata API for drivers focusing on uAPI response - bnxt_re supports more options for QP creation that will allow a dv path in rdma-core - Untangle the module dependencies so drivers don't link to ib_uverbs.ko as was originall intended - Provide a new way to handle umems with a consistent simplified uAPI and update several drivers to use it. This brings dmabuf support to more places and more drivers - Support for mlx5 rate limit and packet pacing for UD and UC - A batch of fixes for the new shared FRMR pools infrastructure" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (148 commits) RDMA/irdma: Replace waitqueue and flag with completion RDMA/hns: Fix memory leak of bonding resources RDMA/rtrs-srv: Bound RDMA-Write length to chunk size in rdma_write_sg docs: infiniband: correct name of option to enable the ib_uverbs module RDMA/bnxt_re: Reject GET_TOGGLE_MEM when toggle page was not allocated RDMA/bnxt_re: Fail DBR related page allocation UAPIs if the feature is disabled RDMA/bnxt_re: Avoid repeated requests to allocate WC pages RDMA/bnxt_re: Proper rollback if the ioremap fails RDMA/bnxt_re: Add a max slot check for SQ RDMA/bnxt_re: Avoid displaying the kernel pointer RDMA/bnxt_re: Free CQ toggle page after firmware teardown RDMA/bnxt_re: Free SRQ toggle page after firmware teardown RDMA/bnxt_re: Initialize dpi variable to zero ABI: sysfs-class-infiniband: minor cleanup RDMA/mlx5: Release the HW‑provided UAR index rather than the SW one RDMA/mlx5: Fix undefined shift of user RQ WQE size RDMA/mlx5: Remove raw RSS QP restrack tracking RDMA/mlx5: Remove DCT restrack tracking RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles ...
5 daysMerge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "All various fixes: - Typo breaking the veventq uAPI for 32 bit userspace - Several Sashiko found errors in the veventq and fault fd paths - Fix incorrect use of dmabuf locks, and possible races with iommufd destroy and dmabuf revoke - Sashiko errors found in the uAPI validation for IOMMU_HWPT_INVALIDATE" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommu: Avoid copying the user array twice in the full-array copy helper iommufd/selftest: Add invalidation entry_num and entry_len boundary tests iommufd: Set upper bounds on cache invalidation entry_num and entry_len iommufd: Clarify IOAS_MAP_FILE dma-buf support iommufd: Destroy the pages content after detaching from dmabuf iommufd: Take dma_resv lock before dma_buf_unpin() in release path iommufd/selftest: Cover invalid read counts on vEVENTQ FD iommufd: Avoid partial fault group delivery in iommufd_fault_fops_read() iommufd: Break the loop on failure in iommufd_fault_fops_read() iommufd: Reject invalid read count in iommufd_fault_fops_read() iommufd: Propagate allocation failure in iommufd_veventq_deliver_fetch() iommufd: Reject invalid read count in iommufd_veventq_fops_read() iommufd: Rewind header length in done if iommufd_veventq_fops_read() fails iommufd/selftest: Add boundary tests for veventq_depth iommufd: Set veventq_depth upper bound iommufd: Move vevent memory allocation outside spinlock iommufd: Fix data_len byte-count vs element-count mismatch iommufd: Use sizeof(*hdr) instead of sizeof(hdr) in veventq read
5 daysMerge tag 'iommu-updates-v7.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core Code: - Fix dma-iommu scatterlist length handling in the P2PDMA path - Extend the generic IOMMU page-table code with detailed gather support for more precise invalidations - Add pending-gather tracking to generic page-table invalidation handling - Add support for smaller virtual address sizes in the generic AMDv1 page-table format, including KUnit coverage - Fix page-size bitmap calculation for smaller VA configurations - Rework Arm io-pgtable allocation/freeing to consistently use the iommu-pages API and address-conversion helpers - Add PCI ATS infrastructure for devices that require ATS, including always-on ATS handling for pre-CXL devices AMD IOMMU: - Fix several IOTLB invalidation details, including PDE handling, flush-all behavior, and command address encoding - Honor IVINFO[VASIZE] when deriving address limits - Fix premature loop termination in init_iommu_one() - Add Hygon family 18h model 4h IOAPIC support - Clean up legacy-mode handling, stale comments, dead IVMD exclusion-range code, and unused address-size macros Arm SMMU / Arm SMMU v3: - SMMUv2: - Device-tree binding updates for Qualcomm Hawi, Nord and Shikra SoCs - Constrain the clocks which can be specified for recent Qualcomm SoCs - Fix broken compatible string for Qualcomm prefetcher configuration an add new entry for the Glymur MDSS - Ensure SMMU is powered-up when writing context bank for Adreno client - SMMUv3: - Fix off-by-one in queue allocation retry loop - Enable hardware update of access/dirty bits from the SMMU - Re-jig command construction to use separate inline helpers for each command type Intel VT-d: - Add the PCI segment number to DMA fault messages - Improve support for non-PRI mode SVA - Ensure atomicity during context entry teardown - Fix RB-tree corruption in the probe error path RISC-V IOMMU: - Add NAPOT range invalidation support - Use detailed gather information for invalidation decisions - Compute the best stride for single invalidations - Advertise Svpbmt support to the generic page-table code - Add capability definitions and clean up command macro encoding VeriSilicon IOMMU: - Add a new VeriSilicon IOMMU driver - Add devicetree binding documentation and MAINTAINERS coverage - Add the RK3588 VeriSilicon IOMMU node - Apply small cleanups and warning fixes in the new driver Rockchip IOMMU: - Disable the fetch DTE time limit Apple DART: - Correct a stale CONFIG_PCIE_APPLE macro name in a comment" * tag 'iommu-updates-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits) iommu/dma-iommu: Fix wrong scatterlist length assignment in P2PDMA path iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec iommu/amd: Have amd_iommu_domain_flush_pages() use last iommu/amd: Pass last in through to build_inv_address() iommu/amd: Simplify build_inv_address() iommu/apple-dart: correct CONFIG_PCIE_APPLE macro name in comment iommu/vt-d: Fix RB-tree corruption in probe error path iommu/vt-d: Improve IOMMU fault information iommu/vt-d: Remove typo from pasid_pte_config_nested() iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry iommu/vt-d: Avoid WARNING in sva unbind path dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali dt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC iommu/amd: Don't split flush for amd_iommu_domain_flush_all() iommu/rockchip: disable fetch dte time limit iommu/arm-smmu-v3: Allow ATS to be always on PCI: Allow ATS to be always on for pre-CXL devices PCI: Add pci_ats_required() for CXL.cache capable devices iommu/vsi: Use list_for_each_entry() ...
5 daysMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - new virtio CAN driver - support for LoongArch architecture in fw_cfg - support for firmware notifications in vdpa/octeon_ep - support for VFs in virtio core - fixes, cleanups all over the place, notably: - vhost: fix vhost_get_avail_idx for a non empty ring fixing an significant old perf regression - READ_ONCE() annotations mean virtio ring is now free of KCSAN warnings * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits) can: virtio: Fix comment in UAPI header can: virtio: Add virtio CAN driver virtio: add num_vf callback to virtio_bus fw_cfg: Add support for LoongArch architecture vdpa/octeon_ep: fix IRQ-to-ring mapping in interrupt handler vdpa/octeon_ep: Add vDPA device event handling for firmware notifications vdpa/octeon_ep: Use 4 bytes for mailbox signature vdpa/octeon_ep: Fix PF->VF mailbox data address calculation vhost_task_create: kill unnecessary .exit_signal initialization vhost: remove unnecessary module_init/exit functions vdpa/mlx5: Use kvzalloc_flex() for MTT command memory vdpa_sim_net: switch to dynamic root device vdpa_sim_blk: switch to dynamic root device virtio-mem: Destroy mutex before freeing virtio_mem virtio-balloon: Destroy mutex before freeing virtio_balloon tools/virtio: fix build for kmalloc_obj API and missing stubs virtio_ring: Add READ_ONCE annotations for device-writable fields vduse: fix compat handling for VDUSE_IOTLB_GET_FD/VDUSE_VQ_GET_INFO tools/virtio: check mmap return value in vringh_test vhost/net: complete zerocopy ubufs only once ...
5 daysMerge tag 'vfio-v7.2-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Fix out-of-tree vfio selftest builds with make O= (Jason Gunthorpe) - Allow vfio selftests to build when ARCH=x86 is used for 64-bit x86 builds (David Matlack) - Tighten vfio selftest infrastructure with stricter builds, safer path handling, sysfs helpers, and reusable device/VF-token setup. Build on that to add the SR-IOV UAPI selftest across supported IOMMU modes (Raghavendra Rao Ananta) - Conclude earlier vfio PCI BAR work already taken as v7.1 fixes by replacing vfio_pci_core_setup_barmap() and direct barmap[] access with vfio_pci_core_get_iomap(). Fix resulting sparse warnings (Matt Evans) - Simplify hisi_acc vfio-pci variant driver device-info reads by using the mailbox's new direct command-based read helper (Weili Qian) - Avoid duplicate reset handling in the Xe vfio-pci variant driver reset-done path (GuoHan Zhao) - Resolve a lockdep circular dependency splat by tracking active VFs with a private sriov_active flag rather than calling pci_num_vf() under memory_lock (Raghavendra Rao Ananta) - Add CXL DVSEC-based readiness polling for Blackwell-Next in the nvgrace-gpu vfio-pci variant driver, including interruptible, lockless waits to support worst case spec defined timeouts (Ankit Agrawal) - Prevent vfio_mig_get_next_state() from spinning forever on blocked migration state transition (Junrui Luo) - Fix a qat vfio variant driver migration resume race by taking the migration file lock before boundary checks (Giovanni Cabiddu) - Add explicit dependencies between vfio selftest output object files and output directories to ensure directories are always created (David Matlack) * tag 'vfio-v7.2-rc1' of https://github.com/awilliam/linux-vfio: vfio: selftests: Ensure libvfio output dirs are always created vfio/qat: fix f_pos race in qat_vf_resume_write() vfio: prevent infinite loop in vfio_mig_get_next_state() on blocked arc vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC vfio/pci: Use a private flag to prevent power state change with VFs vfio/pci: Fix sparse warning in vfio_pci_core_get_iomap() vfio/xe: avoid duplicate reset in xe_vfio_pci_reset_done hisi_acc_vfio_pci: simplify the command for reading device information vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() vfio: selftests: Add tests to validate SR-IOV UAPI vfio: selftests: Add helpers to alloc/free vfio_pci_device vfio: selftests: Add helper to set/override a vf_token vfio: selftests: Expose more vfio_pci_device functions vfio: selftests: Extend container/iommufd setup for passing vf_token vfio: selftests: Introduce a sysfs lib vfio: selftests: Introduce snprintf_assert() vfio: selftests: Add -Wall and -Werror to the Makefile vfio: selftests: Allow builds when ARCH=x86 vfio: selftests: Fix out-of-tree build with make O=
5 daysMerge tag 'soc-drivers-7.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "There are a few added drivers, but mostly the normal maintenance to drivers for firmware, memory controller and other soc specific hardware: - The NXP QuickEngine gets modern MSI support, which allows some cleanups to the GICv3 irqchip chip driver - A new SoC specific driver for the Renesas R-Car MFIS unit is added, encapsulating support for the on-chip mailbox and hwspinlock implementations that are not easily separated into individual drivers - The Qualcomm SoC drivers add support for additional SoC implementations, and flexibility around power management for the serial-engine driver as well as probing the LLCC driver using custom hardware descriptions inside of the device itself. - Added support for the Samsung thermal management unit - A cleanup to the Tegra 'PMC' driver interfaces to remove legacy APIs and allow multiple PMC instances everywhere. - Updates to the TI SCI and KNAS drivers to improve suspend/resume support. - Minor driver changes for mediatek, xilinx, allwinner, aspeed, tegra, broadcom, amd, microchip and starfive specific drivers - Memory controller updates for Tegra and Renesas for additional SoC types and other improvements. - Firmware driver updates for Arm FF-A, SMCCC and SCMI interfaces, to update driver probing, object lifetimes and address minor bugs" * tag 'soc-drivers-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (189 commits) Revert "firmware: zynqmp: Add dynamic CSU register discovery and sysfs interface" Revert "Documentation: ABI: add sysfs interface for ZynqMP CSU registers" memory: tegra234: drop dead NULL check in tegra234_mc_icc_aggregate() memory: tegra264: drop redundant tegra264_mc_icc_aggregate() memory: tegra186-emc: stop borrowing MC aggregate hook for EMC soc: aspeed: cleanup dead default for ASPEED_SOCINFO firmware: tegra: bpmp: Add support for multi-socket platforms firmware: tegra: bpmp: Propagate debugfs errors soc/tegra: pmc: Add Tegra238 support soc/tegra: pmc: Restrict power-off handler to Nexus 7 soc/tegra: pmc: Populate powergate debugfs only when needed soc/tegra: pmc: Move legacy code behind CONFIG_ARM guard soc/tegra: pmc: Remove unused legacy functions soc/tegra: pmc: Create PMC context dynamically firmware: samsung: acpm: remove compile-testing stubs firmware: samsung: acpm: Add devm_acpm_get_by_phandle helper firmware: samsung: acpm: Add TMU protocol support firmware: samsung: acpm: Make acpm_ops const and access via pointer firmware: samsung: acpm: Drop redundant _ops suffix in acpm_ops members firmware: samsung: acpm: Annotate rx_data->cmd with __counted_by_ptr ...
5 daysMerge tag 'drm-next-2026-06-17' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - xe: add initial CRI platform support - amdgpu: initial HDMI 2.1 FRL support - rust: add some new type concepts for device lifetimes - scheduler: moves to a fair algorithm and lots of cleanups But it's mostly the usual mountain of changes across the board. core: - add docbook for DRM_IOCTL_SYNCOBJ_EVENTFD - change signature of drm_connector_attach_hdr_output_metadata_property - dedup counter and timestamp retrieval in vblank code - parse AMD VSDB v3 in CTA extension blocks - add P230, Y7, XYYY2101010, T430, XVUY210101010 formats - don't call drop master on file close if not master - use drm_printf_indent in atomic / bridge - fix 32b format descriptions - docs: fix toctree - hdmi: add common TMDS character rates - fix drm_syncobj_find_fence leak rust: - introduce Higher-Ranked lifetime types - replace drvdata with scoped registration data - add GPUVM immediate mode abstraction for rust GPU drivers - introduce DeviceContext type state for drm::Device bridge: - clarify drm_bridge_get/put - create drm_get_bridge_by_endpoint and use it - analogix_dp: add panel probing - ite-it6211 - use drm audio hdmi helpers buddy: - add lockdep annotations dp: - add PR and VRR updates - mst: fix buffer overflows - add Adaptive Sync SDP decoding support - fix OOB reads in dp-mst ttm: - bump fpfn/lpfn to 64-bit scheduler: - change default to fair scheduler - map runqueue 1:1 with scheduler dma-buf: - port selftests to kunit - convert dma-buf system/heap allocators to module - add separate DMABUF_HEAPS_SYSTEM_CC_SHARED Kconfig udmabuf: - revert hugetlb support - fix error with CONFIG_DMA_API_DEBUG dma-fence: - fix tracepoints lifetime - remove unused signal on any support ras: - add clear error counter netlink command to drm ras gpusvm: - reject VMAs with VM_IO or VM_PFNMAP when creating SVM ranges - use IOVA allocations pagemap: - use IOVA allocations panels: - update to use ref counts - add support for CSW PNB601LS1-2, LGD LP116WHA-SPB1 - add support for waveshare panels - CMN N116BCN-EA1, CMN N140HCA-EEK, IVO M140NWFQ R5, - IVO, R140NWFW R0, BOE NT140*, BOE NV133FHM-N4F, - AUO B140*, AUO B133HAN06.6 and AUO B116XTN02.3 eDP panels - Surface Pro 12 Panel xe: - add CRI PCI-IDs - debugfs add multi-lrc info - engine init cleanup - PF fair scheduling auto provisioning - system controller support for CRI/Xe3p - PXP state machine fixes - Reset/wedge/unload corner case fixes - Wedge path memory allocation fixes - PAT type cleanups - Reject unsafe PAT for CPU cached memory - OA improvements for CRI device memory - kernel doc syntax in xe headers - xe_drm.h documentation fixes - include guard cleanups - VF CCS memory pool - i915/xe step unification - Xe3p GT tuning fixes - forcewake cleanup in GT and GuC - admin-only PF mode - enable hwmon energy attributes for CRI - enable GT_MI_USER_INTERRUPT - refactor emit functions - oa workarounds - multi_queue: allow QUEUE_TIMESTAMP register - convert stolen memory to ttm range manager - use xe2 style blitter as a feature flag - make drm_driver const - add/use IRQ page to HW engine definition - fix oops when display disabled i915: - enable PIPEDMC_ERROR interrupt - more common display code refactoring - restructure DP/HDMI sink format handling - eliminate FB usage from lowlevel pinning code - panel replay bw optimization - integrate sharpness filter into the scaler - new fb_pin abstraction for xe/i915 fb transparent handling - skip inactive MST connectors on HDCP - start switching to display specific registers - use polling when irq unavailable - Adaptive-sync SDP prep amdgpu: - use drm_display_info for AMD VSDB data - Initial HDMI 2.1 FRL support - Initial DCN 4.2.1 support - GART fixes for non-4k pages - GC 11.5.6/SDMA 6.4.0/and other new IPs - GFX9/DCE6/Hawaii/SDMA4/GART/Userq fixes - Finish support for using multiple SDMA queues for TTM operations - SWSMU updates - GC 12.1 updates - SMU 15.0.8 updates - DCN 4.2 updates - DC type conversion fixes - Enable DC power module - Replay/PSR updates - SMU 13.x updates - Compute queue quantum MQD updates - ASPM fix - Align VKMS with common implementation - DC analog support fixes - UVD 3 fixes - TCC harvesting fixes for SI - GC 11 APU module reload fix - NBIO 6.3.2 support - IH 7.1 updates - DC cursor fixes - VCN/JPEG user fence fixes - DC support for connectors without DDC - Prefer ROM BAR for default VGA device - DC bandwidth fixes - Add PTL support for profiler - Introduce dc_plane_cm and migrate surface update color path - Add FRL registers for HDMI 2.1 - Restructure VM state machine - Auxless ALPM support - GEM_OP locking/warning fixes - switch to system_dfl_wq amdkfd: - GPUVM TLB flush fix - Hotplug fix - Boundary check fixes - SVM fixes - CRIU fixes - add profiler API - MES 12.1 updates msm: - core: - fix shrinker documentation - IFPC enabled for gen8 - PERFCNTR_CONFIG ioctl support - GPU: - reworked UBWC handling - a810 support - MDSS: - add support for Milos platform - reworked UBWC handling - DisplayPort: - reworked HPD handling as prep for MST - DPU: - Milos platform support - reworked UBWC handling - DSI: - Milos platform support nova: - Hopper/Blackwell enablement (GH100/GB100/GB202) - FSP support - 32-bit firmware support - HAL functions - refactor GSP boot/unload - GA100 support - VBIOS hardening/refactoring - Adopt higher order lifetime types tyr: - define register blocks - add shmem backed GEM objects - adopt higher order lifetime types - move clock cleanup into Drop radeon: - Hawaii SMU fixes - CS parser fix - use struct drm_edid instead of edid amdxdna: - export per-client BO memory via fdinfo - AIE4 device support - support medium/lower power modes - expandable device heap support - revert read-only user-pointer BO mappings ivpu: - support frequency limiting panthor: - enable GEM shrinker support - add eviction and reclaim info to fdinfo v3d: - enable runtime PM mgag200: - support XRGB1555 + C8 ast: - support XRGB1555 + C8 - use constants for lots of registers - fix register handling imagination: - fence handling refactoring nouveau: - fix sched double call - expose VBIOS on GSP-RM systems - add GA100 support virtio: - add VIRTIO_GPU_F_BLOB_ALIGNMENT flag - add deferred mapping support gud: - add RCade Display Adapter hibmc: - fix no connectors usage mediatek: - hdmi: convert error handling - simplify mtk_crtc allocation exynos: - move fbdev emulation to drm client buffers - use drm format helpers for geometry/size - adopt core DMA tracking - fix framebuffer offset handling renesas: - add RZ/T2H SOC support versilicon: - add cursor plane support tegra: - use drm client for framebuffer" * tag 'drm-next-2026-06-17' of https://gitlab.freedesktop.org/drm/kernel: (1731 commits) dma-buf: move system_cc_shared heap under separate Kconfig accel/amdxdna: Clear sva pointer after unbind agp/amd64: Fix broken error propagation in agp_amd64_probe() accel/amdxdna: Require carveout when PASID and force_iova are disabled drm/amdkfd: always resume_all after suspend_all drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_fini drm/amd/display: Consult MCCS FreeSync cap only if requested & supported drm/amd/pm: Use strscpy in profile mode parsing drm/amdkfd: Fix infinite loop parsing CRAT with zero subtype length drm/amdkfd: fix sysfs topology prop length on buffer truncation drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pages drm/amd/pm: bound OD parameter parsing to stack array size drm/amd/pm: Stop pp_od_clk_voltage emit at PAGE_SIZE drm/amdkfd: Unwind debug trap enable on copy_to_user failure drm/amdgpu: validate the mes firmware version for gfx12.1 drm/amdgpu: validate the mes firmware version for gfx12 drm/amdgpu: compare MES firmware version ucode for gfx11 drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTS drm/amdgpu: restart the CS if some parts of the VM are still invalidated drm/amd/display: use unsigned types for local pipe and REG_GET counters ...
5 daysMerge tag 'bpf-next-7.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "Major changes: - Recover from BPF arena page faults using a scratch page and add ptep_try_set() for lockless empty-slot installs on x86 and arm64. This allows BPF kfuncs to access arena pointers directly. The 'arena_direct_access' stable branch was created for this work and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar Kartikeya Dwivedi) - Lift old restriction and support 6+ arguments in BPF programs and kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan) Other features and fixes: - Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease addition of new BTF kinds (Alan Maguire) - Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei Starovoitov) - Refactor object relationship tracking in the verifier and fix a dynptr use-after-free bug (Amery Hung) - Harden the signed program loader and reject exclusive maps as inner maps (Daniel Borkmann) - Replace the verifier min/max bounds fields with a circular number (cnum) representation and improve 32->64 bit range refinements (Eduard Zingerman) - Introduce the arena library and runtime (libarena) with a buddy allocator, rbtree and SPMC queue data structures, ASAN support and a parallel test harness. Allow subprograms to return arena pointers and switch to a BTF type-tag based __arena annotation (Emil Tsalapatis) - Cache build IDs in the sleepable stackmap path and avoid faultable build ID reads under mm locks (Ihor Solodrai) - Introduce the tracing_multi link to attach a single BPF program to many kernel functions at once. Allow specifying the uprobe_multi target via FD (Jiri Olsa) - Extend the bpf_list family of kfuncs with bpf_list_add/del(), and bpf_list_is_first/is_last/empty() (Kaitao Cheng) - Extend the BPF syscall with common attributes support for prog_load, btf_load and map_create (Leon Hwang) - Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu) - Add sleepable support for tracepoint programs and fix deadlocks in LRU map due to NMI reentry (Mykyta Yatsenko) - Fix OOB access in bpf_flow_keys, fix nullness analysis of inner arrays, enforce write checks for global subprograms (Nuoqi Gui) - Report the maximum combined stack depth and print a breakdown of instructions processed per subprogram (Paul Chaignon) - Add an XDP load-balancer benchmark and arm64 JIT support for stack arguments (Puranjay Mohan) - Add kfuncs to traverse over wakeup_sources (Samuel Wu) - Allow sleepable BPF programs to use LPM trie maps directly (Vlad Poenaru) - Many more fixes and cleanups across the verifier, BTF, sockmap, devmap, bpffs, security hooks, s390/riscv/loongarch JITs, rqspinlock, libbpf, bpftool, selftests" * tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits) selftests/bpf: Work around llvm stack overflow in crypto progs selftests/bpf: add test for bpf_msg_pop_data() overflow bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check sockmap: Fix use-after-free in udp_bpf_recvmsg() bpf, sockmap: keep sk_msg copy state in sync bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data() bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() selftsets/bpf: Retry map update on helper_fill_hashmap() selftests/bpf: Add test for sleepable lsm_cgroup rejection selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket selftests/bpf: Avoid static LLVM linking for cross builds selftests/bpf: Use common CFLAGS for urandom_read selftests/bpf: Initialize operation name before use tools/bpf: build: Append extra cflags libbpf: Initialize CFLAGS before including Makefile.include bpftool: Append extra host flags bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS bpftool: Pass host flags to bootstrap libbpf selftests/bpf: correct CONFIG_PPC64 macro name in comment ...
5 daysMerge tag 'net-next-7.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Work on removing rtnl_lock protection throughout the stack continues. In this chapter: - don't use rtnl_lock for IPv6 multicast routing configuration - don't take rtnl_lock in ethtool for modern drivers - prepare Qdisc dump callbacks for rtnl_lock removal - Support dumping just ifindex + name of all interfaces, under RCU. It's a common operation for Netlink CLI tools (when translating names to ifindexes) and previously required full rtnl_lock. - Support dumping qdiscs and page pools for a specific netdev. Even tho user space wants a dump of all netdevs, most of the time, the OOO programming model results in repeating the dump for each netdev. Which, in absence of a cache, leads to a O(n^2) behavior. - Flush nexthops once on multi-nexthop removal (e.g. when device goes down), another O(n^2) -> O(n) improvement. - Rehash locally generated traffic to a different nexthop on retransmit timeout. - Honor oif when choosing nexthop for locally generated IPv6 traffic. - Convert TCP Auth Option to crypto library, and drop non-RFC algos. - Increase subflow limits in MPTCP to 64 and endpoint limit to 256. - Support MPTCP signaling of IPv6 address + port (ADD_ADDR). We need to selectively skip reporting of the standard TCP Timestamp option, because they won't fit into the header space together (12 + 30 > 40). - Support using bridge neighbor suppression, Duplicate Address Detection, Gratuitous ARP and unsolicited NA forwarding - in EVPN deployments, e.g. VXLAN fabrics (IPv4 and IPv6). - Improve link state reporting for upper netdevs (e.g. macvlan) over tunnel devices (again, mostly for EVPN deployments). - Support binding GENEVE tunnels to a local address. - Speed up UDP tunnel destruction (remove one synchronize_rcu()). - Support exponential field encoding in multicast (IGMPv3 and MLDv2). - Support attaching PSP crypto offload to containers (veth, netkit). - Add a new IPSec Netlink message XFRM_MSG_MIGRATE_STATE that allows migrating individual IPsec SAs independently of their policies. The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA migration, lacks SPI for unique SA identification, and cannot express reqid changes or migrate Transport mode selectors. The new interface identifies the SA via SPI and mark, supports reqid changes, address family changes, encap removal, and uses an atomic create+install flow under x->lock to prevent SN/IV reuse during AEAD SA migration. - Implement GRO/GSO support for PPPoE. - Convert sockopt callbacks in a number of protocols to iov_iter. Cross-tree stuff: - Remove support for Crypto TFM cloning (unblocked after the TCP Auth Option rework). This feature regressed performance for all crypto API users, since it changed crypto transformation objects into reference-counted objects. - Add FCrypt-PCBC implementation to rxrpc and remove it from the global crypto API as obsolete and insecure. Wireless: - Major rework of station bandwidth handling, fixing issues with lower capability than AP. - Cleanups for EMLSR spec issues (drafts differed). - More Neighbor Awareness Networking (Wi-Fi Aware) work (multicast, schedule improvements, multi-station etc.) - Some Ultra High Reliability (UHR) / IEEE 802.11bn (D1.4) work (e.g. non-primary channel access, UHR DBE support). - Fine Timing Measurement ranging (i.e. distance measurement) APIs. Netfilter: - Use per-rule hash initval in nf_conncount. This avoids unnecessary lock contention with short keys (e.g. conntrack zones) in different namespaces. - Various safety improvements, both in packet parsing and object lifetimes. Notably add refcounts to conntrack timeout policy. Deletions: - Remove TLS + sockmap integration. TLS wants to pin user pages to avoid a copy, and sockmap wants to write to the input stream. More work on this integration is clearly needed, and we can't find any users (original author admitted that they never deployed it). - Remove support for TLS offload with TCP Offload Engine (the far more common opportunistic offload is retained). The locking looks unfixable (driver sleeps under TCP spin locks) and people from the vendor that added this are AWOL. - Remove more ATM code, trying to leave behind only what PPPoATM needs, AAL5 and br2684 with permanent circuits. - Remove AppleTalk. Let it join hamradio in our out of tree protocol graveyard, I mean, repository. - Disable 32-bit x_tables compatibility (32bit binaries on 64bit kernel) interface in user namespaces. To be deleted completely, soon. - Remove 5/10 MHz support from cfg80211/mac80211. Drivers: - Software: - Support DEVMEM/DMABUF Tx over NETMEM_TX_NO_DMA devices (netkit) - bonding: add knob to strictly follow 802.3ad for link state - New drivers: - Alibaba Elastic Ethernet Adaptor (cloud vNIC). - NXP NETC switch within i.MX94. - DPLL: - Add operational state to pins (implement in zl3073x). - Add generic DPLL type, for daisy-chaining DPLLs (implement in ice). - Ethernet high-speed NICs: - Huawei (hinic3): - enhance tc flow offload support with queue selection, tunnels - nVidia/Mellanox: - avoid over-copying payload to the skb's linear part (up to 60% win for LRO on slow CPUs like ARM64 V2) - expose more per-queue stats over the standard API - support additional, unprivileged PFs in the DPU configuration - support Socket Direct (multi-PF) with switchdev offloads - add a pool / frag allocator for DMA mapped buffers for control objects, save memory on systems with 64kB page size - take advantage of the ability to dynamically change RSS table size, even when table is configured by the user - increase the max RSS table size for even traffic distribution - Ethernet NICs: - Marvell/Aquantia: - AQC113 PTP support - Realtek USB (r8152): - support 10Gbit Link Speeds and Energy-Efficient Ethernet (EEE) - support firmware loaded (for RTL8157/RTL8159) - support for the RTL8159 - Intel (ixgbe): - support Energy-Efficient Ethernet (EEE) on E610 devices - Ethernet switches: - Airoha: - support multiple netdevs on a single GDM block / port - Marvell (mv88e6xxx): - support SERDES of mv88e6321 - Microchip (ksz8/9): - rework the driver callbacks to remove one indirection layer - Motorcomm (yt921x): - support port rate policing - support TBF qdisc offload - support ACL/flower offload - nVidia/Mellanox: - expose per-PG rx_discards - Realtek: - rtl8365mb: bridge offloading and VLAN support - Ethernet PHYs: - Airoha: - support Airoha AN8801R Gigabit PHYs. - Micrel: - implement 3 low-loss cable tunables - Realtek: - support MDI swapping for RTL8226-CG - support MDIO for RTL931x - Qualcomm: - at803x: Rx and Tx clock management for IPQ5018 PHY - Motorcomm: - support YT8522 100M RMII PHY - set drive strength in YT8531s RGMII - TI: - dp83822: add optional external PHY clock - Bluetooth: - hci_sync: add support for HCI_LE_Set_Host_Feature [v2] - SMP: use AES-CMAC library API - Intel: - support Product level reset - support smart trigger dump - Mediatek: - add event filter to filter specific event - Realtek: - fix RTL8761B/BU broken LE extended scan - WiFi: - Broadcom (b43): - new support for a 11n device - MediaTek (mt76): - support mt7927 - mt792x: broken usb transport detection - mt7921: regulatory improvements - Qualcomm (ath9k): - GPIO interface improvements - Qualcomm (ath12k): - WDS support - replace dynamic memory allocation in WMI Rx path - thermal throttling/cooling device support - 6 GHz incumbent interference detection - channel 177 in 5 GHz - Realtek (rt89): - RTL8922AU support - USB 3 mode switch for performance - better monitor radiotap support - RTL8922DE preparations" * tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1778 commits) ipv4: fib_rule: Move fib4_rules_exit() to ->exit(). net: serialize netif_running() check in enqueue_to_backlog() net: skmsg: preserve sg.copy across SG transforms appletalk: move the protocol out of tree appletalk: stop storing per-interface state in struct net_device selftests/bpf: test that TLS crypto is rejected on a sockmap socket selftests/bpf: drop the unused kTLS program from test_sockmap selftests/bpf: remove sockmap + ktls tests tls: remove dead sockmap (psock) handling from the SW path tls: reject the combination of TLS and sockmap atm: remove orphaned uAPI for deleted drivers, protocols and SVCs atm: remove unused ATM PHY operations atm: remove the unused pre_send and send_bh device operations atm: remove the unused change_qos device operation atm: remove SVC socket support and the signaling daemon interface atm: remove the local ATM (NSAP) address registry atm: remove dead SONET PHY ioctls atm: remove the unused send_oam / push_oam callbacks atm: remove AAL3/4 transport support net: dsa: sja1105: fix lastused timestamp in flower stats ...
6 daysatm: remove orphaned uAPI for deleted drivers, protocols and SVCsJakub Kicinski
ATM removals have left a number of uAPI headers and ioctl definitions with no in-kernel implementation behind them: - device headers for adapters deleted with the legacy PCI/SBUS drivers: atm_eni.h, atm_he.h, atm_idt77105.h, atm_nicstar.h, atm_zatm.h and the atmtcp pair atm_tcp.h / <linux/atm_tcp.h> - protocol headers for the removed CLIP, LANE and MPOA stacks: atmarp.h, atmclip.h, atmlec.h, atmmpc.h - atmsvc.h and the SVC / p2mp / local-address ioctls in atmdev.h (ATM_{GET,RST,ADD,DEL}ADDR, ATM_{ADD,DEL,GET}LECSADDR, ATM_{ADD,DROP}PARTY) left behind by the SVC and address-registry removals None of these are referenced by any remaining in-tree code. Let's try to delete all this. Chances are nobody cares about these headers any more. I'm keeping this separate from the kernel side code changes for ease of revert, in case I am proven wrong... Link: https://patch.msgid.link/20260615194416.752559-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysMerge tag 'for-7.2/block-20260615' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Per-controller admin and IO timeout sysfs attributes, and letting the block layer set request timeouts (Maurizio, Maximilian) - Multipath passthrough iostats, and PCI P2PDMA enablement for multipath devices (Keith, Kiran) - A new diag sysfs attribute group exporting per-controller counters (retries, multipath failover, error counters, requeue and failure counts, reset and reconnect events) (Nilay) - FDP configuration validation and bounds check fixes (liuxixin) - Various nvmet fixes, including a pre-auth out-of-bounds read in the Discovery Get Log Page handler, auth payload bounds validation, and tcp error-path leak fixes (Bryam, Tianchu, Geliang) - nvme-tcp lockdep and workqueue fixes (Shin'ichiro, Kuniyuki, Eric) - Assorted other fixes and cleanups (John, Yao, Chao, Mateusz, Achkinazi, Wentao) - MD pull request via Yu Kuai: - raid1/raid10 fixes for a deadlock in the read error recovery path, error-path detection and bio accounting with cloned bios, and an nr_pending leak in the REQ_ATOMIC bad-block error path (Abd-Alrhman) - PCI P2PDMA propagation from member devices to the RAID device (Kiran) - dm-raid bio requeue fix, and various smaller fixes and cleanups (Benjamin, Chen, Li, Thorsten) - Enable Clang lock context analysis for the block layer, with the accompanying annotations across queue limits, the blk_holder_ops callbacks, crypto, cgroup, iocost, kyber and mq-deadline (Bart) - Block status code infrastructure work: a tagged status table, a str_to_blk_op() helper, a bio_endio_status() helper, and on top of that a new configurable block-layer error injection facility (Christoph) - DRBD netlink rework, replacing the genl_magic machinery with explicit netlink serialization and moving the DRBD UAPI headers to include/uapi/linux/ (Christoph Böhmwalder) - bvec improvements: a bvec_folio() helper and making the bvec_iter helpers proper inline functions (Willy, Christoph) - ublk cleanups and a canceling-flag fix for the disk-not-allocated case (Caleb, Ming) - Partition handling fixes: bound the AIX pp_count scan, fix an of_node refcount leak, and replace __get_free_page() with kmalloc() (Bryam, Wentao, Mike) - Convert numa_node to int in blk_mq_hw_ctx and ->init_request, and add WQ_PERCPU to the block workqueue users (Mateusz, Marco) - Block statistics and tracing: propagate in-flight to the whole disk on partition IO, export passthrough stats, and a new block_rq_tag_wait tracepoint (Tang, Keith, Aaron) - A round of removals, unexports and cleanups across bio, direct-io and the bvec helpers (Christoph) - Various driver fixes (mtip32xx use-after-free, rbd snap_count validation and strscpy conversion, nbd socket lockdep reclassify, virtio-blk zone report clamp, floppy) and a batch of MAINTAINERS email/list updates (Coly, Li, Yu, Christoph Böhmwalder) - Other little fixes and cleanups all over * tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (117 commits) MAINTAINERS: Update Coly Li's email address block: check bio split for unaligned bvec nbd: Reclassify sockets to avoid lockdep circular dependency block: add configurable error injection block: add a str_to_blk_op helper block: add a "tag" for block status codes block: add a macro to initialize the status table floppy: Drop unused pnp driver data block: propagate in_flight to whole disk on partition I/O virtio-blk: clamp zone report to the report buffer capacity block: optimize I/O merge hot path with unlikely() hints drivers/block/rbd: Use strscpy() to copy strings into arrays partitions: aix: bound the pp_count scan to the ppe array block: Enable lock context analysis block/mq-deadline: Make the lock context annotations compatible with Clang block/Kyber: Make the lock context annotations compatible with Clang block/blk-mq-debugfs: Improve lock context annotations block/blk-iocost: Inline iocg_lock() and iocg_unlock() block/blk-iocost: Split ioc_rqos_throttle() block/crypto: Annotate the crypto functions ...
6 daysMerge tag 'for-7.2/io_uring-20260615' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Rework the task_work infrastructure. Both the local (DEFER_TASKRUN) and the normal (tctx) task_work lists were llist based, which is LIFO ordered, and hence each run had to do an O(n) list reversal pass first to restore queue order. Additionally, to cap the amount of task_work run, each method needed a retry list as well. Add a lockless MPCS FIFO queue (based on Dmitry Vyukov's intrusive MPSC algorithm) and switch both task_work lists to it. It performs better than llists and we can then also ditch the retry lists as well as entries are popped one-at-the-time. On top of those changes, run the tctx fallback task_work directly and remove the now-unused per-ctx fallback machinery entirely. - zcrx user notifications. Add a mechanism for zcrx to communicate conditions back to userspace via a dedicated CQE, with the initial users being notification on running out of buffers and on a frag copy fallback, plus shared-memory notification statistics. Alongside that, a series of zcrx reliability and cleanup fixes: more reliable scrubbing, poisoning pointers on unregistration, dropping an extra ifq close, adding a ctx back-pointer, reordering fd allocation in the export path, and killing a dead 'sock' member. - Allow using io_uring registered buffers for plain SEND and RECV, not just for the zero-copy send path. This enables targets like ublk's NBD backend to push/pull IO data directly to/from a registered buffer over a plain send/recv on a TCP socket. - Registered buffer improvements: account huge pages correctly, bump the io_mapped_ubuf length field to size_t, and raise the previous 1GB registered buffer size limit. - Restrict the ctx access exposed to io_uring BPF struct_ops programs by handing them an opaque type rather than the full io_ring_ctx, and add a separate MAINTAINERS entry for the bpf-ops code. - Allow opcode filtering on IORING_OP_CONNECT. - Validate ring-provided buffer addresses with access_ok(), and align the legacy buffer add limit with MAX_BIDS_PER_BGID. - Various other cleanups and minor fixes, including avoiding msghdr async data on connect/bind, dropping async_size for OP_LISTEN, making the POLL_FIRST receive side checks consistent, re-checking IO_WQ_BIT_EXIT for each linked work item, and using trace_call__##name() at guarded tracepoint call sites. * tag 'for-7.2/io_uring-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (31 commits) io_uring/bpf-ops: add a separate maintainer entry io_uring/net: make POLL_FIRST receive side checks consistent io_uring: remove the per-ctx fallback task_work machinery io_uring: run the tctx task_work fallback directly io_uring: switch normal task_work to a mpscq io_uring: switch local task_work to a mpscq io_uring/mpscq: add lockless multi-producer, single-consumer FIFO queue io_uring: grab RCU read lock marking task run io_uring/zcrx: kill dead 'sock' member in struct io_zcrx_args io_uring/kbuf: validate ring provided buffer addresses with access_ok() io_uring/net: support registered buffer for plain send and recv io_uring/nop: Drop a wrong comment in struct io_nop io_uring/net: Remove async_size for OP_LISTEN io_uring/net: Avoid msghdr on op_connect/op_bind async data io_uring/bpf-ops: restrict ctx access to BPF io_uring/io-wq: re-check IO_WQ_BIT_EXIT for each linked work item io_uring/kbuf: align legacy buffer add limit with MAX_BIDS_PER_BGID io_uring/zcrx: add shared-memory notification statistics io_uring/zcrx: notify user on frag copy fallback io_uring/zcrx: notify user when out of buffers ...
6 daysMerge tag 'for-7.2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The most noticeable change is to enable large folios by default, it's been in testing for a few releases. Related to that is huge folio support (still under experimental config). Otherwise a few ioctl updates, performance improvements and usual fixes and core changes. User visible changes: - enable large folios by default, added in 6.17 (under experimental build), no feature limitations, a big change internally - new ioctl to return raw checksums to userspace (a bit tricky given compression and tail extents), can be used for mkfs and deduplication optimizations - provide stable UUID for e.g. overlayfs and temp_fsid, also reflected in statvfs() field f_fsid, internal dev_t is hashed in to allow cloning - add 32bit compat version of GET_SUBVOL_INFO ioctl - in experimental build, support huge folios (up to 2M) Performance related improvements/changes: - limit bio size to the estimated optimum derived from the queue, this prevents build up of too much data for writeback, which could cause latency spikes (reported improvement 15% on sequential writes) - don't force direct IO to be serialized, forgotten change during mount API port, brings back +60% of throughput - lockless calculation of number of shrinkable extent maps, improve performance with many memcg allocated objects Notable fixes: - in zoned mode, fix a deadlock due to zone reclaim and relocation when space needs to be flushed - don't trim device which is internally not tracked as writeable (e.g. when missing device is being rescanned) - fix deadlock when cloning inline extent and mounted with flushoncommit - fix false IO failures after direct IO falls back to buffered write in some cases Core: - remove COW fixup mechanism completely; detect and fix changes to pages outside of filesystem tracking, guaranteed since 5.8, grace period is over - remove 2K block size support, experimental to test subpage code on x86_64 but now it would block folio changes - tree-checker improvements of: - free-space cache and tree items - root reference and backref items - extent state exceptions in reloc tree - subpage mode updates: - code optimizations, simplify tracking bitmaps - re-enable readahead of compressed extent - extend bitmap size to cover huge folios - add tracepoints related to sync, tree-log and transactions - device stats item tracking unification, remove item if there are no stats recorded, also don't leave stale stats on replaced device - allow extent buffer pages to be allocated as movable, to help page migration - added checks for proper extent buffer release - btrfs.ko code size reduction due to transaction abort call simplifications - several struct size reductions - more auto free conversions - more verbose assertions" * tag 'for-7.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (130 commits) btrfs: fix use-after-free after relocation failure with concurrent COW btrfs: move WARN_ON on unexpected error in __add_tree_block() btrfs: move locking into btrfs_get_reloc_bg_bytenr() btrfs: lzo: reject compressed segment that overflows the compressed input btrfs: retry faulting in the pages after a zero sized short direct write btrfs: fix incorrect buffered IO fallback for append direct writes btrfs: fix false IO failure after falling back to buffered write btrfs: use verbose assertions in backref.c btrfs: print a message when a missing device re-appears btrfs: do not trim a device which is not writeable btrfs: return real error after lookup failure in btrfs_ioctl_default_subvol() btrfs: use mapping shared locking for reading super block btrfs: use lockless read in nr_cached_objects shrinker callback btrfs: switch local indicator variables to bools btrfs: send: pass bool for pending_move and refs_processed parameters btrfs: use shifts for sectorsize and nodesize btrfs: fix deadlock cloning inline extent when using flushoncommit btrfs: allocate eb-attached btree pages as movable btrfs: add 32-bit compat ioctl for BTRFS_IOC_GET_SUBVOL_INFO btrfs: derive f_fsid from on-disk fsid and dev_t ...
6 daysMerge tag 'watchdog-for-v7.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull watchdog updates and fixes from Guenter Roeck: "Subsystem: - Unregister PM notifier on watchdog unregister - Various documentation fixes and improvements Removed drivers: - Remove AMD Elan SC520 processor watchdog driver - Drop SMARC-sAM67 support - Remove driver for integrated WDT of ZFx86 486-based SoC New drivers: - Driver for Andes ATCWDT200 - Driver for Gunyah Watchdog Added support to existing drivers: - Add "apple,t8103-wdt" and "apple,t8122-wdt" compatibles to Apple watchdog driver - Add rockchip,rk3528-wdt and rockchip,rv1103b-wdt to snps,dw-wdt.yaml - Document IPQ9650, IPQ5210, Shikra, Nord, and Hawi in qcom-wdt.yaml Also document sram property and add support to get the bootstatus to qcom wdt driver - lenovo_se10_wdt: Fix use-after-rfree and add support for SE10 Gen 2 platform - ti,rti-wdt: Add ti,am62l-rti-wdt compatible - renesas: Document RZ/G3L support and rework example for renesas,r9a09g057-wdt Other bug fixes and improvements: - Use named initializers (sc1200, ziirave_wdt) - Allow pic32-dmt and pic32-wdt to be built with COMPILE_TEST - realtek-otto: enable clock before using I/O, and prevent PHASE2 underflows - rti_wdt: Add reaction control - renesas,rzn1-wdt: Drop interrupt support and other cleanup - gpio_wdt: Add ACPI support - imx7ulp_wdt: Keep WDOG running until A55 enters WFI on i.MX94 - sprd_wdt: Remove redundant sprd_wdt_disable() on register failure - bcm2835_wdt: Switch to new sys-off handler API - sama5d4_wdt: Fix WDDIS detection on SAM9X60 and SAMA7G5 - hpwdt: Refine hpwdt message for UV platform - Convert TS-4800 bindings to DT schema - menz069_wdt: drop unneeded MODULE_ALIAS - sp5100_tco: Use EFCH MMIO for newer Hygon FCH" * tag 'watchdog-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (58 commits) watchdog: sc1200: Drop unused assignment of pnp_device_id driver data watchdog: unregister PM notifier on watchdog unregister dt-bindings: watchdog: qcom-wdt: Document IPQ5210 watchdog watchdog: dev: convert to kernel-doc comments watchdog: core: clean up some comments watchdog: uapi: add comments for what bit masks apply to watchdog: linux/watchdog.h: repair kernel-doc comments watchdog: add devm_watchdog_register_device() to watchdog-kernel-api watchdog: ziirave_wdt: Use named initializers for struct i2c_device_id watchdog: realtek-otto: enable clock before using I/O watchdog: realtek-otto: prevent PHASE2 underflows dt-bindings: watchdog: qcom-wdt: Document IPQ9650 watchdog dt-bindings: watchdog: renesas,rzn1-wdt: interrupts are not required dt-bindings: watchdog: apple,wdt: Add t8122 compatible watchdog: apple: Add "apple,t8103-wdt" compatible watchdog: rzn1: remove now obsolete interrupt support dt-bindings: watchdog: Add watchdog compatible for RK3528 watchdog: convert the Kconfig dependency on OF_GPIO to OF watchdog: Remove AMD Elan SC520 processor watchdog driver watchdog: lenovo_se10_wdt: Fix use-after-free and resource leak risk ...
7 daysnetdev: expose io_uring rx_page_order order via netlinkDragos Tatulea
This adds observability for the io_uring zcrx rx-buf-len configuration. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/20260612211709.1456966-3-dtatulea@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'kvm-s390-next-7.2-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: New features for 7.2 New features for 7.2 for KVM/s390: * KVM_PRE_FAULT_MEMORY support * Support for 2G hugepages * Support for the ASTFLEIE 2 facility * kvm_arch_set_irq_inatomic Fast Inject * Fix potential leak of uninitialized bytes
7 daysMerge tag 'locking-core-2026-06-14' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Futex updates: - Optimize futex hash bucket access patterns (Peter Zijlstra) - Large series to address the robust futex unlock race for real, by Thomas Gleixner: "The robust futex unlock mechanism is racy in respect to the clearing of the robust_list_head::list_op_pending pointer because unlock and clearing the pointer are not atomic. The race window is between the unlock and clearing the pending op pointer. If the task is forced to exit in this window, exit will access a potentially invalid pending op pointer when cleaning up the robust list. That happens if another task manages to unmap the object containing the lock before the cleanup, which results in an UAF. In the worst case this UAF can lead to memory corruption when unrelated content has been mapped to the same address by the time the access happens. User space can't solve this problem without help from the kernel. This series provides the kernel side infrastructure to help it along: 1) Combined unlock, pointer clearing, wake-up for the contended case 2) VDSO based unlock and pointer clearing helpers with a fix-up function in the kernel when user space was interrupted within the critical section. ... with help by André Almeida: - Add a note about robust list race condition (André Almeida) - Add self-tests for robust release operations (André Almeida) Context analysis updates: - Implement context analysis for 'struct rt_mutex'. (Bart Van Assche) - Bump required Clang version to 23 (Marco Elver) Guard infrastructure updates: - Series to remove NULL check from unconditional guards (Dmitry Ilvokhin) Lockdep updates: - Restore self-test migrate_disable() and sched_rt_mutex state on PREEMPT_RT (Karl Mehltretter) Membarriers updates: - Use per-CPU mutexes for targeted commands (Aniket Gattani) - Modernize membarrier_global_expedited with cleanup guards (Aniket Gattani) - Add rseq stress test for CFS throttle interactions (Aniket Gattani) percpu-rwsems updates: - Extract __percpu_up_read() to optimize inlining overhead (Dmitry Ilvokhin) Seqlocks updates: - Allow UBSAN_ALIGNMENT to fail optimizing (Heiko Carstens) Lock tracing: - Add contended_release tracepoint to sleepable locks such as mutexes, percpu-rwsems, rtmutexes, rwsems and semaphores (Dmitry Ilvokhin) MAINTAINERS updates: - MAINTAINERS: Add RUST [SYNC] entry (Boqun Feng) Misc updates and fixes by Randy Dunlap, YE WEI-HONG, Fabricio Parra, Dmitry Ilvokhin and Peter Zijlstra" * tag 'locking-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (36 commits) locking: Add contended_release tracepoint to sleepable locks locking/percpu-rwsem: Extract __percpu_up_read() tracing/lock: Remove unnecessary linux/sched.h include futex: Optimize futex hash bucket access patterns rust: sync: completion: Mark inline complete_all and wait_for_completion MAINTAINERS: Add RUST [SYNC] entry cleanup: Specify nonnull argument index selftests: futex: Add tests for robust release operations Documentation: futex: Add a note about robust list race condition x86/vdso: Implement __vdso_futex_robust_try_unlock() x86/vdso: Prepare for robust futex unlock support futex: Provide infrastructure to plug the non contended robust futex unlock race futex: Add robust futex unlock IP range futex: Add support for unlocking robust futexes futex: Cleanup UAPI defines x86: Select ARCH_MEMORY_ORDER_TSO uaccess: Provide unsafe_atomic_store_release_user() futex: Provide UABI defines for robust list entry modifiers futex: Move futex related mm_struct data into a struct futex: Make futex_mm_init() void ...
7 daysbpf: Add support to specify uprobe_multi target via file descriptorJiri Olsa
Allow uprobe_multi link to identify the target binary by an already opened file descriptor. Adding new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field for the attr.link_create.uprobe_multi struct. When the flag is set, we resolve the target from path_fd, without the flag, we keep the existing string path behavior. I don't see a use case for supporting O_PATH file descriptors, because we need to read the binary first to get probes offsets, so I'm using the CLASS(fd, f), which fails for O_PATH fds. Assisted-by: Codex:GPT-5.4 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysMerge tag 'vfs-7.2-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Reduce pipe->mutex contention by pre-allocating pages outside the lock in anon_pipe_write(). anon_pipe_write() called alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex. Now up to 8 pages are pre-allocated before the mutex is taken, leftovers are recycled into the per-pipe tmp_page[] cache before unlock, and any remainder is released after unlock, keeping the allocator out of the critical section on both sides. On a writers x readers sweep with 64KB writes against a 1 MB pipe throughput improves 6-28% and average write latency drops 5-22%; under memory pressure - when the cost of holding the mutex across reclaim is highest - throughput improves 21-48% and latency drops 17-33%. The microbenchmark is added to selftests. - uaccess/sockptr: fix the ignored_trailing logic in copy_struct_to_user() to behave as documented and the usize check in copy_struct_from_sockptr() for user pointers, and add copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr() helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC). - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real inode backing a dentry via d_real_inode(). On overlayfs the inode attached to the dentry doesn't carry the underlying device information; this is used by the filesystem restriction BPF program that was merged into systemd. - docs: add guidelines for submitting new filesystems, motivated by the maintenance burden abandoned and untestable filesystems impose on VFS developers, blocking infrastructure work like folio conversions and iomap migration. Fixes: - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() and drop the now-redundant assignments in callers. This began as a one-line dma-buf fix for a path_noexec() warning; a pseudo filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo() callers were audited: the only visible effect is on dma-buf where SB_I_NOEXEC silences the warning. - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs, qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device with a sector size > PAGE_SIZE crashed roughly half of them; the rest had the same missing error handling pattern. Plus a follow-up releasing the superblock buffer_head when setting the minix v3 block size fails. - mount: honour SB_NOUSER in the new mount API. - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by switching the process-group paths of send_sigio() and send_sigurg() from read_lock(&tasklist_lock) to RCU, matching the single-PID path. - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing delegated NFS mounts (fsopen() in a container with the mount performed by a privileged daemon) that broke when non-init s_user_ns was tied to FS_USERNS_MOUNT. - selftests/namespaces: fix a hang in nsid_test where an unreaped grandchild kept the TAP pipe write-end open, a waitpid(-1) race in listns_efault_test, and a false FAIL on kernels without listns() where the tests should SKIP. - filelock: fix the break_lease() stub signature for CONFIG_FILE_LOCKING=n. - init/initramfs_test: wait for the async initramfs unpacking before running; the test and do_populate_rootfs() share the parser state. - fs/coredump: reduce redundant log noise in validate_coredump_safety(). - iomap: pass the correct length to fserror_report_io() in __iomap_write_begin(). - backing-file: fix the backing_file_open() kerneldoc. Cleanups: - initramfs: refactor the cpio hex header parsing to use hex2bin() instead of the hand-rolled simple_strntoul() which is reverted, and extend the initramfs KUnit tests to cover header fields with 0x prefixes. - Replace __get_free_pages() and friends with kmalloc()/kzalloc() across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2, isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the do_mounts init code - part of the larger work of replacing page allocator calls with kmalloc(). - Use clear_and_wake_up_bit() in unlock_buffer() and journal_end_buffer_io_sync() instead of open-coding the sequence. - Drop unused VFS exports: unexport drop_super_exclusive(), remove start_removing_user_path_at(), and fold __start_removing_path() into start_removing_path(). - fs/read_write: narrow the __kernel_write() export with EXPORT_SYMBOL_FOR_MODULES(). - vfs: uapi: retire octal and hex constants in favor of (1 << n) for the O_ flags. Finding a free bit for a new flag across the architectures was needlessly hard with the mixed bases. - dcache: add extra sanity checks of dead dentries in dentry_free() via a new DENTRY_WARN_ONCE() that also prints d_flags. - iov_iter: use kmemdup_array() in dup_iter() to harden the allocation against multiplication overflow. - fs/pipe: write to ->poll_usage only once. - vfs: remove an always-taken if-branch in find_next_fd(). - dcache: use kmalloc_flex() for struct external_name in __d_alloc(). - namei: use QSTR() instead of QSTR_INIT() in path_pts(). - sync_file_range: delete dead S_ISLNK code. - Comment fixes: retire a stale comment in fget_task_next() and fix assorted spelling mistakes" * tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits) backing-file: fix backing_file_open() kerneldoc parameter iomap: pass the correct len to fserror_report_io in __iomap_write_begin vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags bpf: add bpf_real_inode() kfunc fs/read_write: Do not export __kernel_write() to the entire world libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() mount: honour SB_NOUSER in the new mount API fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling selftests/pipe: add pipe_bench microbenchmark fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write fs: retire stale comment in fget_task_next() fs: fix spelling mistakes in comment bfs: replace get_zeroed_page() with kzalloc() binfmt_misc: replace __get_free_page() with kmalloc() configfs: replace __get_free_pages() with kzalloc() fs/namespace: use __getname() to allocate mntpath buffer fs/select: replace __get_free_page() with kmalloc() ...
7 daysMerge tag 'vfs-7.2-rc1.openat2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull openat2 updates from Christian Brauner: "Features: - Add O_EMPTYPATH to openat(2)/openat2(2). To get an operable file descriptor from an O_PATH file descriptor it is possible to use openat(fd, ".", O_DIRECTORY) for directories, but other file types require going through open("/proc/<pid>/fd/<nr>") and thus depend on a functioning procfs. With O_EMPTYPATH an empty path string is accepted and LOOKUP_EMPTY is set at path resolution time, allowing to reopen the file behind the file descriptor directly. Selftests are included. - Add an OPENAT2_REGULAR flag for openat2(2) which refuses to open anything but regular files with the new EFTYPE error code. This implements the "ability to only open regular files" feature requested by userspace via uapi-group.org and protects services from being redirected to fifos, device nodes, and friends. All atomic_open implementations were audited for OPENAT2_REGULAR handling. Explicit checks were added to ceph, gfs2, nfs (v4), and cifs/smb - these are the filesystems whose atomic_open can encounter an existing non-regular file and would otherwise call finish_open() on it or return a misleading error code. The remaining implementations (9p, fuse, vboxsf, nfs v2/v3) only call finish_open() on freshly created files and use finish_no_open() for lookup hits, letting the VFS catch non-regular files via the do_open() safety net. Cleanups: - Migrate the openat2 selftests to the kselftest harness and move them under selftests/filesystems/. The tests were written in the early days of selftests' TAP support and the modern kselftest harness is much easier to follow and maintain. The contents of the tests are unchanged and the new emptypath tests are ported on top. - Make the LAST_XXX last-type constants private to fs/namei.c. The only user outside of fs/namei.c was ksmbd which only needs to know whether the last component is a regular one, so vfs_path_parent_lookup() now performs the LAST_NORM check internally. The ints are replaced with a dedicated enum last_type" * tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: replace ints with enum last_type for LAST_XXX vfs: make LAST_XXX private to fs/namei.c selftests: openat2: port emptypath_test to kselftest harness kselftest/openat2: test for OPENAT2_REGULAR flag openat2: new OPENAT2_REGULAR flag support openat2: introduce EFTYPE error code selftest: add tests for O_EMPTYPATH vfs: add O_EMPTYPATH to openat(2)/openat2(2) selftests: openat2: migrate to kselftest harness selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE selftests: openat2: move helpers to header selftests: move openat2 tests to selftests/filesystems/
7 daysMerge tag 'vfs-7.2-rc1.casefold' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs casefolding updates from Christian Brauner: "This exposes the case folding behavior of local filesystems so that file servers - nfsd, ksmbd, and user space file servers - can report the actual behavior to clients instead of guessing. Filesystems report case-insensitive and case-nonpreserving behavior via new file_kattr flags in their fileattr_get implementations. fat, exfat, ntfs3, hfs, hfsplus, xfs, cifs, nfs, vboxsf, and isofs are wired up. Local filesystems that are not explicitly handled default to the usual POSIX behavior of case-sensitive and case-preserving. nfsd uses this to report case folding via NFSv3 PATHCONF and to implement the NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING attributes - both have been part of the NFS protocols for decades to support clients on non-POSIX systems - and ksmbd reports it via FS_ATTRIBUTE_INFORMATION. Exposing the information through the fileattr uapi covers user space file servers. The immediate motivation is interoperability: Windows NFS clients hard-require servers to report case-insensitivity for Win32 applications to work correctly, and a client that knows the server is case-insensitive can avoid issuing multiple LOOKUP/READDIR requests searching for case variants. The Linux NFS client already grew support for case-insensitive shares years ago in support of the Hammerspace NFS server - negative dentry caching must be disabled (a lookup for "FILE.TXT" failing must not cache a negative entry when "file.txt" exists) and directory change invalidation must drop cached case-folded name variants. Such servers often operate in multi-protocol environments where a single file service instance caters to both NFS and SMB clients, and nfsd needs to report case folding properly to participate as a first-class citizen there. A follow-up series brings fixes for the initial work: the nfsd case-info probe now uses kernel credentials, maps -ESTALE to NFS3ERR_STALE, and has its cost capped across READDIR entries; the nfs client avoids transiently zeroed case capability bits during the probe and skips the pathconf probe when neither field is consumed; the FS_CASEFOLD_FL semantics are clarified in the UAPI header; and the tools UAPI headers are synced" * tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) nfsd: Cap case-folding probe cost across READDIR entries nfsd: Map -ESTALE from case probe to NFS3ERR_STALE nfsd: Use kernel credentials for case-info probe fs: Clarify FS_CASEFOLD_FL semantics in UAPI header nfs: Skip pathconf probe when neither field is consumed nfs: Avoid transient zeroed case capability bits during probe tools headers UAPI: Sync case-sensitivity flags from linux/fs.h ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING nfsd: Report export case-folding via NFSv3 PATHCONF isofs: Implement fileattr_get for case sensitivity vboxsf: Implement fileattr_get for case sensitivity nfs: Implement fileattr_get for case sensitivity cifs: Implement fileattr_get for case sensitivity xfs: Report case sensitivity in fileattr_get hfsplus: Report case sensitivity in fileattr_get hfs: Implement fileattr_get for case sensitivity ntfs3: Implement fileattr_get for case sensitivity exfat: Implement fileattr_get for case sensitivity fat: Implement fileattr_get for case sensitivity ...
8 dayslandlock: Add API support and docs for the quiet flagsTingmao Wang
Adds the UAPI for the quiet flags feature (but not the implementation yet). Even though currently LANDLOCK_ADD_RULE_QUIET only affects audit logging, in the future this can also be used as part of a supervisor mechanism, where it will also suppress denial notifications on a per-object basis. Thus the name is deliberately generic, as opposed to e.g. LANDLOCK_ADD_RULE_LOG_QUIET. According to pahole, even after adding the struct access_masks quiet_masks in struct landlock_hierarchy, the u32 log_* bitfield still only has a size of 2 bytes, so there's minimal wasted space. Assisted-by: GitHub-Copilot:claude-opus-4.8 Signed-off-by: Tingmao Wang <m@maowtm.org> [mic: Update date, fix comment formatting] Link: https://patch.msgid.link/031184748a8e74c0bb02f1fa13d7a3f10918c627.1781228815.git.m@maowtm.org Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 daysdevlink: Implement devlink param multi attribute nested data valuesSaeed Mahameed
Devlink param value attribute is not defined since devlink is handling the value validating and parsing internally, this allows us to implement multi attribute values without breaking any policies. Devlink param multi-attribute values are considered to be dynamically sized arrays of u64 values, by introducing a new devlink param type DEVLINK_PARAM_TYPE_U64_ARRAY, driver and user space can set a variable count of u64 values into the DEVLINK_ATTR_PARAM_VALUE_DATA attribute. Implement get/set parsing and add to the internal value structure passed to drivers. This is useful for devices that need to configure a list of values for a specific configuration. example: $ devlink dev param show pci/... name multi-value-param name multi-value-param type driver-specific values: cmode permanent value: 0,1,2,3,4,5,6,7 $ devlink dev param set pci/... name multi-value-param \ value 4,5,6,7,0,1,2,3 cmode permanent Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260609040453.711932-5-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 dayslandlock: Add UDP send+connect access controlMatthieu Buffet
Add support for a second fine-grained UDP access right. LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP controls the ability to set the remote port of a socket (via connect()) and to specify an explicit destination when sending a datagram, to override any remote peer set on a UDP socket (e.g. in sendto() or sendmsg()). It will be useful for applications that send datagrams, and for some servers too (those creating per-client sockets, which want to receive traffic only from a specific address). Similarly as for bind(), this access control is performed when configuring sockets, not in hot code paths. Add detection of when autobind is about to be required, and deny the operation if the process would not be allowed to call bind(0) explicitly. Autobind can only be performed in udp_lib_get_port() from code paths already controlled by LSM hooks: when connect()ing, sending a first datagram, and in some splice() EOF edge case which, afaiu, can only happen after a remote peer has been set. This invariant needs to be preserved to keep bind policies actually enforced. Signed-off-by: Matthieu Buffet <matthieu@buffet.re> Link: https://patch.msgid.link/20260611162107.49278-3-matthieu@buffet.re [mic: Add quick return for non-sandboxed tasks, fix sa_family dereferencing, fix comment formatting] Signed-off-by: Mickaël Salaün <mic@digikod.net>
8 dayslandlock: Add UDP bind() access controlMatthieu Buffet
Add support for a first fine-grained UDP access right. LANDLOCK_ACCESS_NET_BIND_UDP controls the ability to set the local port of a UDP socket (via bind()). It will be useful for servers (to start receiving datagrams), and for some clients that need to use a specific source port (e.g. mDNS requires to use port 5353) For obvious performance concerns, access control is only enforced when configuring sockets, not when using them for common send/recv operations. Bump ABI to allow userspace to detect and use this new right. Signed-off-by: Matthieu Buffet <matthieu@buffet.re> Link: https://patch.msgid.link/20260611162107.49278-2-matthieu@buffet.re [mic: Fix comment formatting] Signed-off-by: Mickaël Salaün <mic@digikod.net>
9 daysdpll: add generic DPLL typeGrzegorz Nitka
Add DPLL_TYPE_GENERIC to represent DPLL devices which do not fit the existing PPS or EEC classes. The UAPI type is intentionally generic. During netdev discussion, maintainers pointed out that introducing identifiers tied to a specific placement or single design does not scale across ASICs and vendors. The role of a DPLL is already inferable from the spawning driver, bus device, and pin topology, without encoding additional purpose-specific taxonomy in the type name. Using a generic type keeps the UAPI extensible and avoids premature naming that may become incorrect as new hardware topologies are exposed through the DPLL subsystem. Expose the new type through UAPI and netlink specification as "generic". Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Link: https://patch.msgid.link/20260607183045.1213735-2-grzegorz.nitka@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysMerge tag 'ipsec-next-2026-06-12' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2026-06-12 1) Replace the open-coded manual cleanup in xfrm_add_policy() error path with xfrm_policy_destroy() for consistency with xfrm_policy_construct(). From Deepanshu Kartikey. 2) Limit XFRMA_TFCPAD to a sensible maximum (max IP length, 64k) since u32 is excessive for traffic flow confidentiality padding. From David Ahern. 3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that allows migrating individual IPsec SAs independently of their policies. The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA migration, lacks SPI for unique SA identification, and cannot express reqid changes or migrate Transport mode selectors. The new interface identifies the SA via SPI and mark, supports reqid changes, address family changes, encap removal, and uses an atomic create+install flow under x->lock to prevent SN/IV reuse during AEAD SA migration. From Antony Antony. * tag 'ipsec-next-2026-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: add documentation for XFRM_MSG_MIGRATE_STATE xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration xfrm: make xfrm_dev_state_add xuo parameter const xfrm: extract address family and selector validation helpers xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper xfrm: move encap and xuo into struct xfrm_migrate xfrm: add error messages to state migration xfrm: add state synchronization after migration xfrm: check family before comparing addresses in migrate xfrm: split xfrm_state_migrate into create and install functions xfrm: rename reqid in xfrm_migrate xfrm: fix NAT-related field inheritance in SA migration xfrm: allow migration from UDP encapsulated to non-encapsulated ESP xfrm: add extack to xfrm_init_state xfrm: remove redundant assignments xfrm: Reject excessive values for XFRMA_TFCPAD xfrm: cleanup error path in xfrm_add_policy() ==================== Link: https://patch.msgid.link/20260612074725.1760473-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 dayspsp: add new netlink cmd for dev-assoc and dev-disassocWei Wang
The main purpose of this cmd is to be able to associate a non-psp-capable device (e.g. veth or netkit) with a psp device. One use case is if we create a pair of veth/netkit, and assign 1 end inside a netns, while leaving the other end within the default netns, with a real PSP device, e.g. netdevsim or a physical PSP-capable NIC. With this command, we could associate the veth/netkit inside the netns with PSP device, so the virtual device could act as PSP-capable device to initiate PSP connections, and performs PSP encryption/decryption on the real PSP device. Signed-off-by: Wei Wang <weibunny@fb.com> Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20260608233118.2694144-3-weibunny.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daystls: remove tls_toe and the related driverSabrina Dubroca
The tls_toe feature and its single user (chelsio chtls) have been unmaintained for multiple years. It also hooks into the core of the TCP implementation, and bypasses most of the networking stack. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/1f30e73275c07bf879f547589872d0916025a52e.1781165969.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysiommufd: Clarify IOAS_MAP_FILE dma-buf supportAlex Mastro
IOMMU_IOAS_MAP_FILE is documented as mapping a memfd, but the implementation first tries to resolve the fd as a dma-buf and has a special path for supported dma-buf exporters. In particular, VFIO PCI dma-bufs exported through VFIO_DEVICE_FEATURE_DMA_BUF can be mapped when they describe a single DMA range. Update the UAPI comment so userspace understands that certain kinds of dma-buf are supported in addition to memfd. Fixes: 44ebaa1744fd ("iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE") Link: https://patch.msgid.link/r/20260610-tmp-v1-1-b8ccbf557391@fb.com Signed-off-by: Alex Mastro <amastro@fb.com> Assisted-by: Codex:gpt-5.5-high Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
10 daysMerge branches 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', ↵Joerg Roedel
'rockchip', 'verisilicon', 'riscv', 'intel/vt-d', 'amd/amd-vi' and 'core' into next
10 daysnet: ethtool: add KSZ87xx low-loss cable PHY tunablesFidelio Lawson
Introduce vendor-specific PHY tunable identifiers to control the KSZ87xx low-loss cable erratum handling through the ethtool PHY tunable interface. The following tunables are added: - a boolean "short-cable" tunable, applying a documented and conservative preset intended for short or low-loss Ethernet cables; - an integer LPF bandwidth tunable, allowing advanced adjustment of the receiver low-pass filter bandwidth; - an integer DSP EQ initial value tunable, allowing advanced tuning of the PHY equalizer initialization. The actual behavior is implemented by the corresponding PHY and switch drivers. Reviewed-by: Marek Vasut <marex@nabladev.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com> Link: https://patch.msgid.link/20260609-ksz87xx_errata_low_loss_connections-v10-2-9ba4418cf3db@exotec.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 daysliveupdate: Document that retrieve failure is permanentTarun Sahu
Signed-off-by: Tarun Sahu <tarunsahu@google.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Link: https://patch.msgid.link/faec5df2a3e90240cde5897bae2250cd7d44aeac.1781170056.git.tarunsahu@google.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
12 daysbonding: 3ad: add lacp_strict configuration knobLouis Scalbert
When an 802.3ad (LACP) bonding interface has no slaves in the collecting/distributing state, the bonding master still reports carrier as up as long as at least 'min_links' slaves have carrier. In this situation, only one slave is effectively used for TX/RX, while traffic received on other slaves is dropped. Upper-layer daemons therefore consider the interface operational, even though traffic may be blackholed if the lack of LACP negotiation means the partner is not ready to deal with traffic. Introduce a configuration knob to control this behavior. It allows the bonding master to assert carrier only when at least 'min_links' slaves are in Collecting_Distributing state. The default mode preserves the existing behavior. This patch only introduces the knob; its behavior is implemented in the subsequent commit. Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20260603150331.1919611-4-louis.scalbert@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 dayscan: virtio: Fix comment in UAPI headerNathan Chancellor
When compile testing the UAPI headers with clang, there is an warning turned error for using a C++ style ('//') comment, which is explicitly forbidden for UAPI headers. In file included from <built-in>:1: ./usr/include/linux/virtio_can.h:29:35: error: // comments are not allowed in this language [-Werror,-Wcomment] 29 | #define VIRTIO_CAN_MAX_DLEN 64 // this is like CANFD_MAX_DLEN | ^ 1 error generated. Switch to a standard C style comment. Fixes: 2b6b4bb7d96f ("can: virtio: Add virtio CAN driver") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260604-virtio_can-fix-uapi-comment-v1-1-199fa96ec5f0@kernel.org>
12 dayscan: virtio: Add virtio CAN driverMatias Ezequiel Vara Larsen
Add virtio CAN driver based on Virtio 1.4 specification (see https://github.com/oasis-tcs/virtio-spec/tree/virtio-1.4). The driver implements a complete CAN bus interface over Virtio transport, supporting both CAN Classic and CAN-FD Ids. In term of frames, it supports classic and CAN FD. RTR frames are only supported with classic CAN. Usage: - "ip link set up can0" - start controller - "ip link set down can0" - stop controller - "candump can0" - receive frames - "cansend can0 123#DEADBEEF" - send frames Signed-off-by: Harald Mommer <harald.mommer@oss.qualcomm.com> Co-developed-by: Harald Mommer <harald.mommer@oss.qualcomm.com> Signed-off-by: Mikhail Golubev-Ciuchea <mikhail.golubev-ciuchea@oss.qualcomm.com> Co-developed-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Damir Shaikhutdinov <Damir.Shaikhutdinov@opensynergy.com> Reviewed-by: Francesco Valla <francesco@valla.it> Tested-by: Francesco Valla <francesco@valla.it> Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <ahXNb+KzuHYbS24+@fedora>
12 daysvirtio_console: Fix spelling mistake "colums" -> "columns"Ethan Carter Edwards
There is a spelling mistake in a struct description. Fix it. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260418-virtio-typo-v1-1-0df6f943a79d@ethancedwards.com>
13 daysNFSD: Add NFSD_CMD_UNLOCK_EXPORT netlink commandChuck Lever
When a filesystem is exported to NFS clients, NFSv4 state (opens, locks, delegations, layouts) holds references that prevent the underlying filesystem from being unmounted. NFSD_CMD_UNLOCK_FILESYSTEM addresses this at superblock granularity, but administrators unexporting a single path on a shared filesystem (e.g., one of several exports on the same device) need finer control. Add NFSD_CMD_UNLOCK_EXPORT, which revokes NFSv4 state acquired through exports of a specific path. Matching is by path identity (dentry + vfsmount) via the sc_export field on each nfs4_stid, so multiple svc_export objects for the same path -- one per auth_domain -- are handled correctly without requiring the caller to name a specific client. The command takes a single "path" attribute. Userspace (exportfs -u) sends this after removing the last client for a given path, enabling the underlying filesystem to be unmounted. When multiple clients share an export path, individual unexports do not trigger state revocation; only the final one does. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
13 daysNFSD: Add NFSD_CMD_UNLOCK_FILESYSTEM netlink commandChuck Lever
Add NFSD_CMD_UNLOCK_FILESYSTEM as a dedicated netlink command for revoking NFS state under a filesystem path, providing a netlink equivalent of /proc/fs/nfsd/unlock_fs. The command requires a "path" string attribute containing the filesystem path whose state should be released. The handler resolves the path to its superblock, then cancels async copies, releases NLM locks, and revokes NFSv4 state on that superblock. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
13 daysNFSD: Add NFSD_CMD_UNLOCK_IP netlink commandChuck Lever
The existing write_unlock_ip procfs interface releases NLM file locks held by a specific client IP address, but procfs provides no structured way to extend that operation to other scopes such as revoking NFSv4 state. Add NFSD_CMD_UNLOCK_IP as a dedicated netlink command for releasing NLM locks by client address. The command accepts a binary sockaddr_in or sockaddr_in6 in its address attribute. The handler validates the address family and length, then calls nlmsvc_unlock_all_by_ip() to release matching NLM locks. Because lockd is a single global instance, that call operates across all network namespaces regardless of which namespace the caller inhabits. A separate netlink command for filesystem-scoped unlock is added in a subsequent commit. The nfsd_ctl_unlock_ip tracepoint is updated from string-based address logging to __sockaddr, which stores the binary sockaddr and formats it with %pISpc. This affects both the new netlink path and the existing procfs write_unlock_ip path, giving consistent structured output in both cases. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
13 daysKVM: s390: Add capability to support 2G hugepagesClaudio Imbrenda
Add KVM_CAP_S390_HPAGE_2G to signal to userspace that 2G hugepages may be used to back the guest; restrictions apply similar to 1M hugepages. Enable the (for now still ignored) GMAP_FLAG_ALLOW_HPAGE_2G flag for the guest gmap, and propagate / disable it as necessary. Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260609150930.665370-3-imbrenda@linux.ibm.com>
13 daysif_ether.h: add 802.1AC, warn about GRE 0x00FEDavid 'equinox' Lamparter
Because LLC wasn't complicated/annoying enough, there's 2 more "ethertypes" being used for it: - 0x8870 is pretty "normal", it got standardized in 802.1AC-2016/Cor1-2018 for transporting LLC frames > 1500 bytes. It simply replaces the length value (which is no longer encoded, and must now be derived from the packet.) The actual value dates back to 2001; https://datatracker.ietf.org/doc/html/draft-ietf-isis-ext-eth-01 (it was used without "proper" standardization for a long time) - 0x00fe is a doozy - actually "invalid" depending on how you look at it; it's used in GRE (and possibly GENEVE) tunnels to transport the IS-IS routing protocol. https://seclists.org/tcpdump/2002/q4/61 is the best/oldest source I could find. It's inspired by the 0xfe SAP value, a GRE packet with protocol 0x00fe is followed by a payload "as if" it was Ethernet with "<length> 0xfe 0xfe 0x03". (Again the length isn't encoded explicitly anymore.) The 0x00fe value is quite close to other values the kernel is using internally for various things (after all they "won't clash for 1500 types"). Except this one does clash, and if someone unknowingly starts using it for something internal... we end up in a world of pain in getting IS-IS running on GRE tunnels. Hence the "WARNING". Signed-off-by: David 'equinox' Lamparter <equinox@diac24.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Link: https://patch.msgid.link/20260605164144.81184-1-equinox@diac24.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysMerge tag 'batadv-next-pullrequest-20260605' of https://git.open-mesh.org/batadvJakub Kicinski
Simon Wunderlich says: ==================== This cleanup patchset includes the following patches, all by Sven Eckelmann: - tp_meter: initialize last_recv_time during init - convert cancellation of work items to disable helper - clean up wifi detection cache (3 patches) - clean up kernel-doc: corrections, reword, typos (6 patches) * tag 'batadv-next-pullrequest-20260605' of https://git.open-mesh.org/batadv: batman-adv: fix kernel-doc typos and grammar errors batman-adv: fix batadv_v_ogm_packet_recv error handling kernel-doc batman-adv: uapi: keep kernel-doc in struct member order batman-adv: bla: update stale kernel-doc batman-adv: tp_meter: update stale kernel-doc after refactoring batman-adv: correct batadv_wifi_* kernel-doc batman-adv: document cleanup of batadv_wifi_net_devices entries batman-adv: use GFP_KERNEL allocations for the wifi detection cache batman-adv: drop duplicated wifi_flags assignments batman-adv: convert cancellation of work items to disable helper batman-adv: tp_meter: initialize last_recv_time during init ==================== Link: https://patch.msgid.link/20260605072005.490368-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 dayswatchdog: uapi: add comments for what bit masks apply toRandy Dunlap
Add comments similar to those in include/linux/watchdog.h so that the reader/user doesn't have to dig into the API documentation files for this. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
14 daysbtrfs: add ioctl GET_CSUMS to read raw checksums from file rangeMark Harmstone
Add a new unprivileged BTRFS_IOC_GET_CSUMS ioctl, which can be used to query the on-disk csums for a file range. The ioctl is deliberately per-file rather than exposing raw csum tree lookups, to avoid leaking information to users about files they may not have access to. This is done by userspace passing a struct btrfs_ioctl_get_csums_args to the kernel, which details the offset and length we're interested in, and a buffer for the kernel to write its results into. The kernel writes a struct btrfs_ioctl_get_csums_entry into the buffer, followed by the csums if available. The maximum size of the user buffer is capped to 16MiB. If the extent is an uncompressed, non-NODATASUM extent, the kernel sets the entry type to BTRFS_GET_CSUMS_HAS_CSUMS and follows it with the csums. If it is sparse, preallocated, or beyond the EOF, it sets the type to BTRFS_GET_CSUMS_ZEROED - this is so userspace knows it can use the precomputed hash of the zero sector. Otherwise, it sets the type to BTRFS_GET_CSUMS_NODATASUM, BTRFS_GET_CSUMS_COMPRESSED, BTRFS_GET_CSUM_ENCRYPTED, or BTRFS_GET_CSUM_INLINE. For example, a file with a [0, 4K) hole and [4K, 12K) data extent would produce the following output buffer: | [0, 4K) ZEROED | [4K, 12K) HAS_CSUMS | csum data | We do store the csums of compressed extents, but we deliberately don't return them here: they're calculated over the compressed data, not the uncompressed data that's returned to userspace. Similarly for encrypted data, once encryption is supported, in which the csums will be on the ciphertext. The main use case for this is for speeding up mkfs.btrfs --rootdir. For the case when the source FS is btrfs and using the same csum algorithm, we can avoid having to recalculate the csums - in my synthetic benchmarks (16GB file on a spinning-rust drive), this resulted in a ~11% speed-up (218s to 196s). When using the --reflink option added in btrfs-progs v6.16.1, we can forgo reading the data entirely, resulting a ~2200% speed-up on the same test (128s to 6s). # mkdir rootdir # dd if=/dev/urandom of=rootdir/file bs=4096 count=4194304 (without ioctl) # echo 3 > /proc/sys/vm/drop_caches # time mkfs.btrfs --rootdir rootdir testimg ... real 3m37.965s user 0m5.496s sys 0m6.125s # echo 3 > /proc/sys/vm/drop_caches # time mkfs.btrfs --rootdir rootdir --reflink testimg ... real 2m8.342s user 0m5.472s sys 0m1.667s (with ioctl) # echo 3 > /proc/sys/vm/drop_caches # time mkfs.btrfs --rootdir rootdir testimg ... real 3m15.865s user 0m4.258s sys 0m6.261s # echo 3 > /proc/sys/vm/drop_caches # time mkfs.btrfs --rootdir rootdir --reflink testimg ... real 0m5.847s user 0m2.899s sys 0m0.097s Another notable use case is for deduplication, where reading the checksums may serve as a hint instead of reading the whole file data. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>