| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux
Pull liveupdate updates from Mike Rapoport:
"Kexec Handover (KHO):
- make memory preservation compatible with deferred initialization
of the memory map
Live Update Orchestrator (LUO):
- add LIVEUPDATE_SESSION_GET_NAME ioctl and parameter verification
for LIVEUPDATE_IOCTL_CREATE_SESSION ioctl
- documentation updates for liveupdate=on command line option,
systemd support and the current compatibility status
- remove the fixed limits on the number of files that can be
preserved within a single session, and the total number of
sessions managed by the LUO
Misc fixes:
- reference count incoming File-Lifecycle-Bound (FLB) data so
it cannot be freed while a subsystem is still using it
- fixes for a TOCTOU race in luo_session_retrieve(), a use-
after-free in the file finish and unpreserve paths, concurrent
session mutations during reboot and serialization on
preserve_context kexec
- make sure ioctls for incoming LUO sessions are blocked for
outgoing sessions and vice versa
- make sure KHO scratch size is always aligned by
CMA_MIN_ALIGNMENT_BYTES
- fix memblock tests build issue introduced by KHO changes"
* tag 'liveupdate-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux: (36 commits)
liveupdate: Document that retrieve failure is permanent
docs: memfd_preservation: fix rendering of ABI documentation
selftests/liveupdate: Add stress-files kexec test
selftests/liveupdate: Add stress-sessions kexec test
selftests/liveupdate: Test session and file limit removal
liveupdate: Remove limit on the number of files per session
liveupdate: Remove limit on the number of sessions
liveupdate: defer session block allocation and physical address setting
kho: add support for linked-block serialization
liveupdate: Extract luo_session_deserialize_one helper
liveupdate: Extract luo_file_deserialize_one helper
liveupdate: register luo_ser as KHO subtree
liveupdate: centralize state management into struct luo_ser
liveupdate: avoid mixing cleanup guards with goto in luo_session_retrieve_fd
liveupdate: change file_set->count type to u64 for type safety
liveupdate: Remove unused ser field from struct luo_session
liveupdate: fix u-a-f in luo_file_unpreserve_files() and luo_file_finish()
liveupdate: block session mutations during reboot
liveupdate: fix TOCTOU race in luo_session_retrieve()
liveupdate: skip serialization for context-preserving kexec
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This adds new Landlock access rights to control UDP bind and
connect/send operations, and a new "quiet" feature to mute specific
specific audit logs (and other future observability events).
A few commits also fix Landlock issues"
* tag 'landlock-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (24 commits)
selftests/landlock: Add tests for invalid use of quiet flag
selftests/landlock: Add tests for quiet flag with scope
selftests/landlock: Add tests for quiet flag with net rules
selftests/landlock: Add tests for quiet flag with fs rules
selftests/landlock: Replace hard-coded 16 with a constant
samples/landlock: Add quiet flag support to sandboxer
landlock: Suppress logging when quiet flag is present
landlock: Add API support and docs for the quiet flags
landlock: Add a place for flags to layer rules
landlock: Add documentation for UDP support
samples/landlock: Add sandboxer UDP access control
selftests/landlock: Add tests for UDP send
selftests/landlock: Add tests for UDP bind/connect
landlock: Add UDP send+connect access control
landlock: Add UDP bind() access control
landlock: Fix unmarked concurrent access to socket family
selftests/landlock: Explicitly disable audit in teardowns
selftests/landlock: Test SCOPE_SIGNAL on the SIGIO/fowner pgid path
landlock: Fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path
landlock: Demonstrate best-effort allowed_access filtering
...
|
|
Pull kvm updates from Paolo Bonzini:
"arm64:
This is a bit of an odd merge window on the KVM/arm64 front. There
is absolutely no new feature in the pull request. It is purely
fixes, because it is simply becoming too hard to review new stuff
when so many AI-fuelled fixes hit the list.
- Significant cleanup of the vgic-v5 PPI support which was merged in
7.1. This makes the code more maintainable, and squashes a couple
of bugs in the meantime
- Set of fixes for the handling of the MMU in an NV context,
particularly VNCR-triggered faults. S1POE support is fixed as well
- Large set of pKVM fixes, mostly addressing recurring issues around
hypervisor tracking of donated pages in obscure cases where the
donation could fail and leave things in a bizarre state
- Fixes for the so-called "lazy vgic init", which resulted in
sleeping operations in non-preemptible sections. This turned out to
be far more invasive than initially expected..
- Reduce the overhead of L1/L2 context switch by not touching the FP
registers
- Fix the way non-implemented page sizes are dealt with when a guest
insist on using them for S2 translation
- The usual set of low-impact fixes and cleanups all over the map
Loongarch:
- On a request for lazy FPU load, load all FPU state that the VM
supports instead of enabling only the part (FPU, LSX or LASX) that
caused the FPU load request
- Some enhancements about interrupt injection
- Some bug fixes and other small changes
RISC-V:
- Batch G-stage TLB flushes for GPA range based page table updates
- Convert HGEI line management to fully per-HART
- Fix missing CSR dirty marking when FWFT state updated via ONE_REG
- Fix stale FWFT feature exposure to Guest/VM
- Speed up dirty logging write faults using MMU rwlock and atomic PTE
updates using cmpxchg() for permission-only changes
- Use flexible array for APLIC IRQ state
- Use kvm_slot_dirty_track_enabled() for logging enable check on a
memslot
- Avoid skipping valid pages in kvm_riscv_gstage_wp_range()
- Avoid skipping valid pages in kvm_riscv_gstage_unmap_range()
- Use endian-specific __lelong for NACL shared memory
S390:
- KVM_PRE_FAULT_MEMORY support
- Support for 2G hugepages
- Support for the ASTFLEIE 2 facility
- Support for fast inject using kvm_arch_set_irq_inatomic
- Fix potential leak of uninitialized bytes
- A few more misc gmap fixes
x86:
- Generic support for the more granular permissions allowed by EPT,
namely "read" (which was previously usurping the U bit) and
separate execution bits for kernel and userspace
- Do not assume that all page tables start with U=1/W=1/NX=0 at the
root, as AMD GMET needs to have U=0 at the root
- Introduce common assembly macros for use within Intel and AMD
vendor-specific vmentry code. This touches the SPEC_CTRL handling,
which is now entirely done in assembly for Intel (by reusing the
AMD code that already existed), and register save/restore which
uses some macro magic to compute the offsets in the struct. Both of
these are preparatory changes for upcoming APX support
- Clean up KVM's register tracking and storage, primarily to prepare
for APX support, which expands the maximum number of GPRs from 16
to 32
- Keep a single copy of the PDPTRs rather than two, since
architecturally there is just one
- Handle EXIT_FASTPATH_EXIT_USERSPACE in vendor code to ensure vendor
code gets a chance to handle things like reaping the PML buffer
- Update KVM's view of PV async enabling if and only if the MSR write
fully succeeds
- Fix a variety of issues where the emulator doesn't honor
guest-debug state, and clean up related code along the way
- Synthesize EPT Violation and #NPF "error code" bits when injecting
faults into L1 that didn't originate in hardware (in which case the
VMCS/VMCB doesn't hold relevant information)
- Add support for virtualizing (well, emulating) AMD's flavor of
CPL>0 CPUID faulting
- Clean up the GPR APIs so that KVM's use of "raw" is consistent, and
fix a variety of minor bugs along the way
- Fix an OOB memory access due to not checking the VP ID when
handling a Hyper-V PV TLB flush for L2
- Fix a bug in the mediated PMU's handling of fixed counters that
allowed the guest to bypass the PMU event filter
- Allow userspace to return EAGAIN when handling SNP and TDX
hypercalls, so the KVM can forward a "retry" status code to the
guest, and reserve all unused error codes for future usage
- Overhaul the TDP MMU => S-EPT code to move as much S-EPT specific
logic as possible into the TDX code, and to funnel (almost) all
S-EPT updates into a single chokepoint. The motivation is largely
to prepare for upcoming Dynamic PAMT support, but the cleanups are
nice to have on their own
- Plug a hole in shadow page table handling, where KVM fails to
recursively zap nested EPT/NPT shadow page tables when the nested
hypervisor tears down its own EPT/NPT page tables from the bottom
up
x86 (Intel):
- Support for nested MBEC (Mode-Based Execute Control), see above in
the generic section; also run with MBEC enabled even for non-nested
mode
- Use the kernel's "enum pg_level" in the TDX APIs instead of the
TDX-Module's level definitions (which are 0-based)
- Rework the TDX memory APIs to not require/assume that guest memory
is backed by "struct page" (in prepartion for guest_memfd hugepage
support)
- Fix a largely benign bug where KVM TDX would incorrectly state it
could emulate several x2APIC MSRs
- Use the "safe" WRMSR API when proxying LBR MSR writes as the
to-be-written value is guest controlled and completely unvalidated
x86 (AMD):
- Support for nested GMET (Guest Mode Execution Trap), see above in
the generic section; also run with GMET enabled even for non-nested
mode
- Fixes and minor cleanups to GHCB handling, on top of the earlier
work already merged into 7.1-rc
- Ensure KVM's copy of CR0 and CR3 are up-to-date prior to invoking
fastpath handlers
- Add support for virtualizing gPAT (KVM previously just used L1's
PAT when running L2)
- Fix goofs where KVM mishandles side effects (e.g. single-step and
PMC updates) when emulating VMRUN
- Fix a variety of bugs in AVIC's handling of x2APIC MSR
interception, most notably where KVM didn't disable interception of
IRR, ISR, and TMR regs
- Add support for virtualizing Host-Only/Guest-Only bits in the
mediated PMU
- Don't advertise support for unusable VM types, and account for VM
types that are disabled by firmware, e.g. to mitigate security
vulnerabilities
- Rewrite the SEV {en,de}crypt debug ioctls as they were riddle with
bugs and unnecessarily complicated, and add comprehensive tests
- Clean up and deduplicate the SEV page pinning code
- Fix minor goofs related to writing back CPUID information after
firmware rejects a CPUID page for an SNP vCPU
Generic:
- Rename invalidate_begin() to invalidate_start() throughout KVM to
follow the kernel's nomenclature, e.g. for mmu_notifiers
- Use guard() to cleanup up various KVM+VFIO flows
- Minor cleanups
guest_memfd:
- Return -EEXIST instead of -EINVAL if userspace attempts to bind a
gmem range to multiple memslots, and fix the test that was supposed
to ensure KVM returns -EEXIST
- Treat memslot binding offsets and sizes as unsigned values to fix a
bug where KVM interprets a large "offset + size" as a negative
value and allows a nonsensical offset
- Use the inode number instead of the page offset for the NUMA
interleaving index to fix a bug where the effective index would
jump by two for consecutive pages (the caller also adds in the page
offset)
Selftests:
- Randomize the dirty log test's delay when reaping the bitmap on the
first pass, as always waiting only 1ms hid a KVM RISC-V bug as the
test reaped the bitmap before KVM could build up enough state to
hit the bug
- A pile of one-off fixes and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (326 commits)
KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level
KVM: x86: Fix shadow paging use-after-free due to unexpected role
KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
KVM: s390: Enable adapter_indicators_set to use mapped pages
KVM: s390: Add map/unmap ioctl and clean mappings post-guest
riscv: kvm: Use endian-specific __lelong for NACL shared memory
KVM: selftests: access_tracking_perf_test: bump number of NUMA nodes to 32
KVM: s390: vsie: Implement ASTFLEIE facility 2
KVM: s390: vsie: Refactor handle_stfle
s390/sclp: Detect ASTFLEIE 2 facility
KVM: s390: Minor refactor of base/ext facility lists
KVM: x86/mmu: move pdptrs out of the MMU
KVM: x86: check that kvm_handle_invpcid is only invoked with shadow paging
KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
KVM: nVMX: remove unnecessary code in prepare_vmcs02_rare
KVM: x86: remove nested_mmu from mmu_is_nested()
KVM: arm64: vgic-its: Make ABI commit helpers return void
KVM: s390: Initialize KVM_S390_GET_CMMA_BITS memory
LoongArch: KVM: Add missing slots_lock for device register/unregister
LoongArch: KVM: Validate irqchip index in irqfd routing
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- v4l2:
- core: fix subdev sensor ownership
- subdev: Allow accessing routes with STREAMS client capability
- ctrls: Add validation for HEVC active reference counts and
background detection control
- common: Add YUV24 format info and has_alpha helper
- vb2: Change vb2_read() and vb2_write() return types to ssize_t
- i2c: cvs: Add driver of Intel Computer Vision Sensing Controller(CVS)
- atmel-isc: remove deprecated driver
- cec: Add CEC Latency Indication Protocol (LIP) support
- imon: Add iMON VFD HID OEM v1.2 key mappings
- AVMatrix: new HWS capture driver
- isp4: new AMD capture driver
- qcom:
- iris: Add hierarchical coding, B-frame, and Long-Term Reference
support for encoder
- camss: Add SM6350 platform support
- venus: Add SM6115 platform support
- chips-media: wave5: Add support for Packed YUV422, CBP profile, and
background detection
- csi2rx: Add multistream support and 32 dma chans
- Several cleanups and fixes
* tag 'media/v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (394 commits)
media: v4l2-fwnode: Fix subdev owner overwritten in v4l2_async_register_subdev_sensor()
media: qcom: iris: vdec: allow GEN2 decoding into 10bit format
media: qcom: iris: vdec: update find_format to handle 8bit and 10bit formats
media: qcom: iris: vdec: update size and stride calculations for 10bit formats
media: qcom: iris: gen2: add support for 10bit decoding
media: qcom: iris: add QC10C & P010 buffer size calculations
media: qcom: iris: add helpers for 8bit and 10bit formats
media: qcom: iris: Fix FPS calculation and VPP FW overhead
media: qcom: camss: vfe-340: Support for PIX client
media: qcom: camss: vfe-340: Proper client handling
media: qcom: camss: csid-340: Enable PIX interface routing
media: qcom: camss: csid-340: Add port-to-interface mapping
media: qcom: camss: csid-340: Switch to generic CSID_CFG/CTRL registers
media: iris: Initialize HFI ops after firmware load in core init
media: iris: drop struct iris_fmt
media: iris: Add platform data for X1P42100
media: iris: Add hardware power on/off ops for X1P42100
media: iris: optimize COMV buffer allocation for VPU3x and VPU4x
media: iris: add FPS calculation and VPP FW overhead in frequency formula
media: qcom: iris: Simplify COMV size calculation
...
|
|
Pull nfsd updates from Chuck Lever:
"Jeff Layton wired up netlink upcalls for the auth.unix.ip and
auth.unix.gid caches in SunRPC and the svc_export and nfsd.fh caches
in NFSD. The new kernel-user API is more extensible and lays the
groundwork for retiring the old pipe interface.
The default NFS r/w block size rises to 4MB on hosts with at least
16GB of RAM, reducing per-RPC overhead on fast networks. Smaller
machines keep their previously computed default, and the value remains
tunable through /proc/fs/nfsd/max_block_size.
Chuck Lever converted the server's RPCSEC GSS Kerberos code to the
kernel's shared crypto/krb5 library. The conversion retires and
removes SunRPC's bespoke implementation of Kerberos v5, but keeps
RPCSEC GSS-API.
Continuing the xdrgen migration that converted the NLMv4 server XDR
layer in v7.1, Chuck Lever converted the NLM version 3 server-side XDR
layer from hand-written C to xdrgen-generated code. As with the NLMv4
conversion in v7.1, the goals are improved memory safety, lower
maintenance burden, and groundwork for generation of Rust code for
this layer instead of C.
Chuck Lever fixed an issue where lingering NFSv4 state pins a mounted
file system after it is unexported. A new netlink-based mechanism can
now release NLM locks and NFSv4 state by client address, by
filesystem, and by export. Now an administrator can quiesce an export
cleanly before unmounting it.
The remaining patches are bug fixes, clean-ups, and minor
optimizations, including a batch of memory-leak and use-after-free
fixes in the ACL, lockd, and TLS handshake paths, many of them
reported by Chris Mason. Sincere thanks to all contributors,
reviewers, testers, and bug reporters who participated in the v7.2
NFSD development cycle"
* tag 'nfsd-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (106 commits)
svcrdma: wake sq waiters when the transport closes
nfsd: reset write verifier on deferred writeback errors
nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race
sunrpc: wait for in-flight TLS handshake callback when cancel loses race
sunrpc: pin svc_xprt across the asynchronous TLS handshake callback
nfsd: fix posix_acl leak on SETACL decode failure
nfsd: fix posix_acl leak and ignored error in nfsd4_create_file
nfsd: check get_user() return when reading princhashlen
nfsd: fix inverted cp_ttl check in async copy reaper
nfsd: fix dead ACL conflict guard in nfsd4_create
NFSD: Fix SECINFO_NO_NAME decode error cleanup
sunrpc: harden rq_procinfo lifecycle to prevent double-free
SUNRPC: Return an error from xdr_buf_to_bvec() on overflow
SUNRPC: Bound-check xdr_buf_to_bvec() stores before writing
nfsd: release layout stid on setlease failure
lockd: Avoid hashing uninitialized bytes in nlm4svc_lookup_file()
lockd: Plug nlm_file refcount leak on cached nlm_do_fopen() failure
lockd: Plug nlm_file leak when nlm_do_fopen() fails
Revert "NFSD: Defer sub-object cleanup in export put callbacks"
Revert "svcrdma: Use contiguous pages for RDMA Read sink buffers"
...
|
|
Pull rdma updates from Jason Gunthorpe:
"Many AI driven bug fixes, and several big driver API cleanups
- Driver bug fixes and minor cleanups in mlx5, hns, rxe, efa, siw,
rtrs, mana, irdma, mlx4. Commonly error path flows, integer
arithmetic overflows on unsafe data, out of bounds access, and use
after free issues under races.
- Second half of the new udata API for drivers focusing on uAPI
response
- bnxt_re supports more options for QP creation that will allow a dv
path in rdma-core
- Untangle the module dependencies so drivers don't link to
ib_uverbs.ko as was originall intended
- Provide a new way to handle umems with a consistent simplified uAPI
and update several drivers to use it. This brings dmabuf support to
more places and more drivers
- Support for mlx5 rate limit and packet pacing for UD and UC
- A batch of fixes for the new shared FRMR pools infrastructure"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (148 commits)
RDMA/irdma: Replace waitqueue and flag with completion
RDMA/hns: Fix memory leak of bonding resources
RDMA/rtrs-srv: Bound RDMA-Write length to chunk size in rdma_write_sg
docs: infiniband: correct name of option to enable the ib_uverbs module
RDMA/bnxt_re: Reject GET_TOGGLE_MEM when toggle page was not allocated
RDMA/bnxt_re: Fail DBR related page allocation UAPIs if the feature is disabled
RDMA/bnxt_re: Avoid repeated requests to allocate WC pages
RDMA/bnxt_re: Proper rollback if the ioremap fails
RDMA/bnxt_re: Add a max slot check for SQ
RDMA/bnxt_re: Avoid displaying the kernel pointer
RDMA/bnxt_re: Free CQ toggle page after firmware teardown
RDMA/bnxt_re: Free SRQ toggle page after firmware teardown
RDMA/bnxt_re: Initialize dpi variable to zero
ABI: sysfs-class-infiniband: minor cleanup
RDMA/mlx5: Release the HW‑provided UAR index rather than the SW one
RDMA/mlx5: Fix undefined shift of user RQ WQE size
RDMA/mlx5: Remove raw RSS QP restrack tracking
RDMA/mlx5: Remove DCT restrack tracking
RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure
RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"All various fixes:
- Typo breaking the veventq uAPI for 32 bit userspace
- Several Sashiko found errors in the veventq and fault fd paths
- Fix incorrect use of dmabuf locks, and possible races with iommufd
destroy and dmabuf revoke
- Sashiko errors found in the uAPI validation for IOMMU_HWPT_INVALIDATE"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommu: Avoid copying the user array twice in the full-array copy helper
iommufd/selftest: Add invalidation entry_num and entry_len boundary tests
iommufd: Set upper bounds on cache invalidation entry_num and entry_len
iommufd: Clarify IOAS_MAP_FILE dma-buf support
iommufd: Destroy the pages content after detaching from dmabuf
iommufd: Take dma_resv lock before dma_buf_unpin() in release path
iommufd/selftest: Cover invalid read counts on vEVENTQ FD
iommufd: Avoid partial fault group delivery in iommufd_fault_fops_read()
iommufd: Break the loop on failure in iommufd_fault_fops_read()
iommufd: Reject invalid read count in iommufd_fault_fops_read()
iommufd: Propagate allocation failure in iommufd_veventq_deliver_fetch()
iommufd: Reject invalid read count in iommufd_veventq_fops_read()
iommufd: Rewind header length in done if iommufd_veventq_fops_read() fails
iommufd/selftest: Add boundary tests for veventq_depth
iommufd: Set veventq_depth upper bound
iommufd: Move vevent memory allocation outside spinlock
iommufd: Fix data_len byte-count vs element-count mismatch
iommufd: Use sizeof(*hdr) instead of sizeof(hdr) in veventq read
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
"Core Code:
- Fix dma-iommu scatterlist length handling in the P2PDMA path
- Extend the generic IOMMU page-table code with detailed gather
support for more precise invalidations
- Add pending-gather tracking to generic page-table invalidation
handling
- Add support for smaller virtual address sizes in the generic AMDv1
page-table format, including KUnit coverage
- Fix page-size bitmap calculation for smaller VA configurations
- Rework Arm io-pgtable allocation/freeing to consistently use the
iommu-pages API and address-conversion helpers
- Add PCI ATS infrastructure for devices that require ATS, including
always-on ATS handling for pre-CXL devices
AMD IOMMU:
- Fix several IOTLB invalidation details, including PDE handling,
flush-all behavior, and command address encoding
- Honor IVINFO[VASIZE] when deriving address limits
- Fix premature loop termination in init_iommu_one()
- Add Hygon family 18h model 4h IOAPIC support
- Clean up legacy-mode handling, stale comments, dead IVMD
exclusion-range code, and unused address-size macros
Arm SMMU / Arm SMMU v3:
- SMMUv2:
- Device-tree binding updates for Qualcomm Hawi, Nord and Shikra
SoCs
- Constrain the clocks which can be specified for recent Qualcomm
SoCs
- Fix broken compatible string for Qualcomm prefetcher
configuration an add new entry for the Glymur MDSS
- Ensure SMMU is powered-up when writing context bank for Adreno
client
- SMMUv3:
- Fix off-by-one in queue allocation retry loop
- Enable hardware update of access/dirty bits from the SMMU
- Re-jig command construction to use separate inline helpers for
each command type
Intel VT-d:
- Add the PCI segment number to DMA fault messages
- Improve support for non-PRI mode SVA
- Ensure atomicity during context entry teardown
- Fix RB-tree corruption in the probe error path
RISC-V IOMMU:
- Add NAPOT range invalidation support
- Use detailed gather information for invalidation decisions
- Compute the best stride for single invalidations
- Advertise Svpbmt support to the generic page-table code
- Add capability definitions and clean up command macro encoding
VeriSilicon IOMMU:
- Add a new VeriSilicon IOMMU driver
- Add devicetree binding documentation and MAINTAINERS coverage
- Add the RK3588 VeriSilicon IOMMU node
- Apply small cleanups and warning fixes in the new driver
Rockchip IOMMU:
- Disable the fetch DTE time limit
Apple DART:
- Correct a stale CONFIG_PCIE_APPLE macro name in a comment"
* tag 'iommu-updates-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits)
iommu/dma-iommu: Fix wrong scatterlist length assignment in P2PDMA path
iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather
iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
iommu/amd: Have amd_iommu_domain_flush_pages() use last
iommu/amd: Pass last in through to build_inv_address()
iommu/amd: Simplify build_inv_address()
iommu/apple-dart: correct CONFIG_PCIE_APPLE macro name in comment
iommu/vt-d: Fix RB-tree corruption in probe error path
iommu/vt-d: Improve IOMMU fault information
iommu/vt-d: Remove typo from pasid_pte_config_nested()
iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry
iommu/vt-d: Avoid WARNING in sva unbind path
dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali
dt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC
iommu/amd: Don't split flush for amd_iommu_domain_flush_all()
iommu/rockchip: disable fetch dte time limit
iommu/arm-smmu-v3: Allow ATS to be always on
PCI: Allow ATS to be always on for pre-CXL devices
PCI: Add pci_ats_required() for CXL.cache capable devices
iommu/vsi: Use list_for_each_entry()
...
|
|
Pull virtio updates from Michael Tsirkin:
- new virtio CAN driver
- support for LoongArch architecture in fw_cfg
- support for firmware notifications in vdpa/octeon_ep
- support for VFs in virtio core
- fixes, cleanups all over the place, notably:
- vhost: fix vhost_get_avail_idx for a non empty ring
fixing an significant old perf regression
- READ_ONCE() annotations mean virtio ring is now
free of KCSAN warnings
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits)
can: virtio: Fix comment in UAPI header
can: virtio: Add virtio CAN driver
virtio: add num_vf callback to virtio_bus
fw_cfg: Add support for LoongArch architecture
vdpa/octeon_ep: fix IRQ-to-ring mapping in interrupt handler
vdpa/octeon_ep: Add vDPA device event handling for firmware notifications
vdpa/octeon_ep: Use 4 bytes for mailbox signature
vdpa/octeon_ep: Fix PF->VF mailbox data address calculation
vhost_task_create: kill unnecessary .exit_signal initialization
vhost: remove unnecessary module_init/exit functions
vdpa/mlx5: Use kvzalloc_flex() for MTT command memory
vdpa_sim_net: switch to dynamic root device
vdpa_sim_blk: switch to dynamic root device
virtio-mem: Destroy mutex before freeing virtio_mem
virtio-balloon: Destroy mutex before freeing virtio_balloon
tools/virtio: fix build for kmalloc_obj API and missing stubs
virtio_ring: Add READ_ONCE annotations for device-writable fields
vduse: fix compat handling for VDUSE_IOTLB_GET_FD/VDUSE_VQ_GET_INFO
tools/virtio: check mmap return value in vringh_test
vhost/net: complete zerocopy ubufs only once
...
|
|
Pull VFIO updates from Alex Williamson:
- Fix out-of-tree vfio selftest builds with make O= (Jason Gunthorpe)
- Allow vfio selftests to build when ARCH=x86 is used for 64-bit x86
builds (David Matlack)
- Tighten vfio selftest infrastructure with stricter builds, safer path
handling, sysfs helpers, and reusable device/VF-token setup. Build on
that to add the SR-IOV UAPI selftest across supported IOMMU modes
(Raghavendra Rao Ananta)
- Conclude earlier vfio PCI BAR work already taken as v7.1 fixes by
replacing vfio_pci_core_setup_barmap() and direct barmap[] access
with vfio_pci_core_get_iomap(). Fix resulting sparse warnings (Matt
Evans)
- Simplify hisi_acc vfio-pci variant driver device-info reads by using
the mailbox's new direct command-based read helper (Weili Qian)
- Avoid duplicate reset handling in the Xe vfio-pci variant driver
reset-done path (GuoHan Zhao)
- Resolve a lockdep circular dependency splat by tracking active VFs
with a private sriov_active flag rather than calling pci_num_vf()
under memory_lock (Raghavendra Rao Ananta)
- Add CXL DVSEC-based readiness polling for Blackwell-Next in the
nvgrace-gpu vfio-pci variant driver, including interruptible,
lockless waits to support worst case spec defined timeouts (Ankit
Agrawal)
- Prevent vfio_mig_get_next_state() from spinning forever on blocked
migration state transition (Junrui Luo)
- Fix a qat vfio variant driver migration resume race by taking the
migration file lock before boundary checks (Giovanni Cabiddu)
- Add explicit dependencies between vfio selftest output object files
and output directories to ensure directories are always created
(David Matlack)
* tag 'vfio-v7.2-rc1' of https://github.com/awilliam/linux-vfio:
vfio: selftests: Ensure libvfio output dirs are always created
vfio/qat: fix f_pos race in qat_vf_resume_write()
vfio: prevent infinite loop in vfio_mig_get_next_state() on blocked arc
vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC
vfio/pci: Use a private flag to prevent power state change with VFs
vfio/pci: Fix sparse warning in vfio_pci_core_get_iomap()
vfio/xe: avoid duplicate reset in xe_vfio_pci_reset_done
hisi_acc_vfio_pci: simplify the command for reading device information
vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()
vfio: selftests: Add tests to validate SR-IOV UAPI
vfio: selftests: Add helpers to alloc/free vfio_pci_device
vfio: selftests: Add helper to set/override a vf_token
vfio: selftests: Expose more vfio_pci_device functions
vfio: selftests: Extend container/iommufd setup for passing vf_token
vfio: selftests: Introduce a sysfs lib
vfio: selftests: Introduce snprintf_assert()
vfio: selftests: Add -Wall and -Werror to the Makefile
vfio: selftests: Allow builds when ARCH=x86
vfio: selftests: Fix out-of-tree build with make O=
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
"There are a few added drivers, but mostly the normal maintenance to
drivers for firmware, memory controller and other soc specific
hardware:
- The NXP QuickEngine gets modern MSI support, which allows some
cleanups to the GICv3 irqchip chip driver
- A new SoC specific driver for the Renesas R-Car MFIS unit is added,
encapsulating support for the on-chip mailbox and hwspinlock
implementations that are not easily separated into individual
drivers
- The Qualcomm SoC drivers add support for additional SoC
implementations, and flexibility around power management for the
serial-engine driver as well as probing the LLCC driver using
custom hardware descriptions inside of the device itself.
- Added support for the Samsung thermal management unit
- A cleanup to the Tegra 'PMC' driver interfaces to remove legacy
APIs and allow multiple PMC instances everywhere.
- Updates to the TI SCI and KNAS drivers to improve suspend/resume
support.
- Minor driver changes for mediatek, xilinx, allwinner, aspeed,
tegra, broadcom, amd, microchip and starfive specific drivers
- Memory controller updates for Tegra and Renesas for additional SoC
types and other improvements.
- Firmware driver updates for Arm FF-A, SMCCC and SCMI interfaces, to
update driver probing, object lifetimes and address minor bugs"
* tag 'soc-drivers-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (189 commits)
Revert "firmware: zynqmp: Add dynamic CSU register discovery and sysfs interface"
Revert "Documentation: ABI: add sysfs interface for ZynqMP CSU registers"
memory: tegra234: drop dead NULL check in tegra234_mc_icc_aggregate()
memory: tegra264: drop redundant tegra264_mc_icc_aggregate()
memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
soc: aspeed: cleanup dead default for ASPEED_SOCINFO
firmware: tegra: bpmp: Add support for multi-socket platforms
firmware: tegra: bpmp: Propagate debugfs errors
soc/tegra: pmc: Add Tegra238 support
soc/tegra: pmc: Restrict power-off handler to Nexus 7
soc/tegra: pmc: Populate powergate debugfs only when needed
soc/tegra: pmc: Move legacy code behind CONFIG_ARM guard
soc/tegra: pmc: Remove unused legacy functions
soc/tegra: pmc: Create PMC context dynamically
firmware: samsung: acpm: remove compile-testing stubs
firmware: samsung: acpm: Add devm_acpm_get_by_phandle helper
firmware: samsung: acpm: Add TMU protocol support
firmware: samsung: acpm: Make acpm_ops const and access via pointer
firmware: samsung: acpm: Drop redundant _ops suffix in acpm_ops members
firmware: samsung: acpm: Annotate rx_data->cmd with __counted_by_ptr
...
|
|
Pull drm updates from Dave Airlie:
"Highlights:
- xe: add initial CRI platform support
- amdgpu: initial HDMI 2.1 FRL support
- rust: add some new type concepts for device lifetimes
- scheduler: moves to a fair algorithm and lots of cleanups
But it's mostly the usual mountain of changes across the board.
core:
- add docbook for DRM_IOCTL_SYNCOBJ_EVENTFD
- change signature of drm_connector_attach_hdr_output_metadata_property
- dedup counter and timestamp retrieval in vblank code
- parse AMD VSDB v3 in CTA extension blocks
- add P230, Y7, XYYY2101010, T430, XVUY210101010 formats
- don't call drop master on file close if not master
- use drm_printf_indent in atomic / bridge
- fix 32b format descriptions
- docs: fix toctree
- hdmi: add common TMDS character rates
- fix drm_syncobj_find_fence leak
rust:
- introduce Higher-Ranked lifetime types
- replace drvdata with scoped registration data
- add GPUVM immediate mode abstraction for rust GPU drivers
- introduce DeviceContext type state for drm::Device
bridge:
- clarify drm_bridge_get/put
- create drm_get_bridge_by_endpoint and use it
- analogix_dp: add panel probing
- ite-it6211 - use drm audio hdmi helpers
buddy:
- add lockdep annotations
dp:
- add PR and VRR updates
- mst: fix buffer overflows
- add Adaptive Sync SDP decoding support
- fix OOB reads in dp-mst
ttm:
- bump fpfn/lpfn to 64-bit
scheduler:
- change default to fair scheduler
- map runqueue 1:1 with scheduler
dma-buf:
- port selftests to kunit
- convert dma-buf system/heap allocators to module
- add separate DMABUF_HEAPS_SYSTEM_CC_SHARED Kconfig
udmabuf:
- revert hugetlb support
- fix error with CONFIG_DMA_API_DEBUG
dma-fence:
- fix tracepoints lifetime
- remove unused signal on any support
ras:
- add clear error counter netlink command to drm ras
gpusvm:
- reject VMAs with VM_IO or VM_PFNMAP when creating SVM ranges
- use IOVA allocations
pagemap:
- use IOVA allocations
panels:
- update to use ref counts
- add support for CSW PNB601LS1-2, LGD LP116WHA-SPB1
- add support for waveshare panels
- CMN N116BCN-EA1, CMN N140HCA-EEK, IVO M140NWFQ R5,
- IVO, R140NWFW R0, BOE NT140*, BOE NV133FHM-N4F,
- AUO B140*, AUO B133HAN06.6 and AUO B116XTN02.3 eDP panels
- Surface Pro 12 Panel
xe:
- add CRI PCI-IDs
- debugfs add multi-lrc info
- engine init cleanup
- PF fair scheduling auto provisioning
- system controller support for CRI/Xe3p
- PXP state machine fixes
- Reset/wedge/unload corner case fixes
- Wedge path memory allocation fixes
- PAT type cleanups
- Reject unsafe PAT for CPU cached memory
- OA improvements for CRI device memory
- kernel doc syntax in xe headers
- xe_drm.h documentation fixes
- include guard cleanups
- VF CCS memory pool
- i915/xe step unification
- Xe3p GT tuning fixes
- forcewake cleanup in GT and GuC
- admin-only PF mode
- enable hwmon energy attributes for CRI
- enable GT_MI_USER_INTERRUPT
- refactor emit functions
- oa workarounds
- multi_queue: allow QUEUE_TIMESTAMP register
- convert stolen memory to ttm range manager
- use xe2 style blitter as a feature flag
- make drm_driver const
- add/use IRQ page to HW engine definition
- fix oops when display disabled
i915:
- enable PIPEDMC_ERROR interrupt
- more common display code refactoring
- restructure DP/HDMI sink format handling
- eliminate FB usage from lowlevel pinning code
- panel replay bw optimization
- integrate sharpness filter into the scaler
- new fb_pin abstraction for xe/i915 fb transparent handling
- skip inactive MST connectors on HDCP
- start switching to display specific registers
- use polling when irq unavailable
- Adaptive-sync SDP prep
amdgpu:
- use drm_display_info for AMD VSDB data
- Initial HDMI 2.1 FRL support
- Initial DCN 4.2.1 support
- GART fixes for non-4k pages
- GC 11.5.6/SDMA 6.4.0/and other new IPs
- GFX9/DCE6/Hawaii/SDMA4/GART/Userq fixes
- Finish support for using multiple SDMA queues for TTM operations
- SWSMU updates
- GC 12.1 updates
- SMU 15.0.8 updates
- DCN 4.2 updates
- DC type conversion fixes
- Enable DC power module
- Replay/PSR updates
- SMU 13.x updates
- Compute queue quantum MQD updates
- ASPM fix
- Align VKMS with common implementation
- DC analog support fixes
- UVD 3 fixes
- TCC harvesting fixes for SI
- GC 11 APU module reload fix
- NBIO 6.3.2 support
- IH 7.1 updates
- DC cursor fixes
- VCN/JPEG user fence fixes
- DC support for connectors without DDC
- Prefer ROM BAR for default VGA device
- DC bandwidth fixes
- Add PTL support for profiler
- Introduce dc_plane_cm and migrate surface update color path
- Add FRL registers for HDMI 2.1
- Restructure VM state machine
- Auxless ALPM support
- GEM_OP locking/warning fixes
- switch to system_dfl_wq
amdkfd:
- GPUVM TLB flush fix
- Hotplug fix
- Boundary check fixes
- SVM fixes
- CRIU fixes
- add profiler API
- MES 12.1 updates
msm:
- core:
- fix shrinker documentation
- IFPC enabled for gen8
- PERFCNTR_CONFIG ioctl support
- GPU:
- reworked UBWC handling
- a810 support
- MDSS:
- add support for Milos platform
- reworked UBWC handling
- DisplayPort:
- reworked HPD handling as prep for MST
- DPU:
- Milos platform support
- reworked UBWC handling
- DSI:
- Milos platform support
nova:
- Hopper/Blackwell enablement (GH100/GB100/GB202)
- FSP support
- 32-bit firmware support
- HAL functions
- refactor GSP boot/unload
- GA100 support
- VBIOS hardening/refactoring
- Adopt higher order lifetime types
tyr:
- define register blocks
- add shmem backed GEM objects
- adopt higher order lifetime types
- move clock cleanup into Drop
radeon:
- Hawaii SMU fixes
- CS parser fix
- use struct drm_edid instead of edid
amdxdna:
- export per-client BO memory via fdinfo
- AIE4 device support
- support medium/lower power modes
- expandable device heap support
- revert read-only user-pointer BO mappings
ivpu:
- support frequency limiting
panthor:
- enable GEM shrinker support
- add eviction and reclaim info to fdinfo
v3d:
- enable runtime PM
mgag200:
- support XRGB1555 + C8
ast:
- support XRGB1555 + C8
- use constants for lots of registers
- fix register handling
imagination:
- fence handling refactoring
nouveau:
- fix sched double call
- expose VBIOS on GSP-RM systems
- add GA100 support
virtio:
- add VIRTIO_GPU_F_BLOB_ALIGNMENT flag
- add deferred mapping support
gud:
- add RCade Display Adapter
hibmc:
- fix no connectors usage
mediatek:
- hdmi: convert error handling
- simplify mtk_crtc allocation
exynos:
- move fbdev emulation to drm client buffers
- use drm format helpers for geometry/size
- adopt core DMA tracking
- fix framebuffer offset handling
renesas:
- add RZ/T2H SOC support
versilicon:
- add cursor plane support
tegra:
- use drm client for framebuffer"
* tag 'drm-next-2026-06-17' of https://gitlab.freedesktop.org/drm/kernel: (1731 commits)
dma-buf: move system_cc_shared heap under separate Kconfig
accel/amdxdna: Clear sva pointer after unbind
agp/amd64: Fix broken error propagation in agp_amd64_probe()
accel/amdxdna: Require carveout when PASID and force_iova are disabled
drm/amdkfd: always resume_all after suspend_all
drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_fini
drm/amd/display: Consult MCCS FreeSync cap only if requested & supported
drm/amd/pm: Use strscpy in profile mode parsing
drm/amdkfd: Fix infinite loop parsing CRAT with zero subtype length
drm/amdkfd: fix sysfs topology prop length on buffer truncation
drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pages
drm/amd/pm: bound OD parameter parsing to stack array size
drm/amd/pm: Stop pp_od_clk_voltage emit at PAGE_SIZE
drm/amdkfd: Unwind debug trap enable on copy_to_user failure
drm/amdgpu: validate the mes firmware version for gfx12.1
drm/amdgpu: validate the mes firmware version for gfx12
drm/amdgpu: compare MES firmware version ucode for gfx11
drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTS
drm/amdgpu: restart the CS if some parts of the VM are still invalidated
drm/amd/display: use unsigned types for local pipe and REG_GET counters
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"Major changes:
- Recover from BPF arena page faults using a scratch page and add
ptep_try_set() for lockless empty-slot installs on x86 and arm64.
This allows BPF kfuncs to access arena pointers directly.
The 'arena_direct_access' stable branch was created for this work
and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar
Kartikeya Dwivedi)
- Lift old restriction and support 6+ arguments in BPF programs and
kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan)
Other features and fixes:
- Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease
addition of new BTF kinds (Alan Maguire)
- Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei
Starovoitov)
- Refactor object relationship tracking in the verifier and fix a
dynptr use-after-free bug (Amery Hung)
- Harden the signed program loader and reject exclusive maps as inner
maps (Daniel Borkmann)
- Replace the verifier min/max bounds fields with a circular number
(cnum) representation and improve 32->64 bit range refinements
(Eduard Zingerman)
- Introduce the arena library and runtime (libarena) with a buddy
allocator, rbtree and SPMC queue data structures, ASAN support and
a parallel test harness. Allow subprograms to return arena pointers
and switch to a BTF type-tag based __arena annotation (Emil
Tsalapatis)
- Cache build IDs in the sleepable stackmap path and avoid faultable
build ID reads under mm locks (Ihor Solodrai)
- Introduce the tracing_multi link to attach a single BPF program to
many kernel functions at once. Allow specifying the uprobe_multi
target via FD (Jiri Olsa)
- Extend the bpf_list family of kfuncs with bpf_list_add/del(), and
bpf_list_is_first/is_last/empty() (Kaitao Cheng)
- Extend the BPF syscall with common attributes support for
prog_load, btf_load and map_create (Leon Hwang)
- Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu)
- Add sleepable support for tracepoint programs and fix deadlocks in
LRU map due to NMI reentry (Mykyta Yatsenko)
- Fix OOB access in bpf_flow_keys, fix nullness analysis of inner
arrays, enforce write checks for global subprograms (Nuoqi Gui)
- Report the maximum combined stack depth and print a breakdown of
instructions processed per subprogram (Paul Chaignon)
- Add an XDP load-balancer benchmark and arm64 JIT support for stack
arguments (Puranjay Mohan)
- Add kfuncs to traverse over wakeup_sources (Samuel Wu)
- Allow sleepable BPF programs to use LPM trie maps directly (Vlad
Poenaru)
- Many more fixes and cleanups across the verifier, BTF, sockmap,
devmap, bpffs, security hooks, s390/riscv/loongarch JITs,
rqspinlock, libbpf, bpftool, selftests"
* tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits)
selftests/bpf: Work around llvm stack overflow in crypto progs
selftests/bpf: add test for bpf_msg_pop_data() overflow
bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
sockmap: Fix use-after-free in udp_bpf_recvmsg()
bpf, sockmap: keep sk_msg copy state in sync
bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
selftsets/bpf: Retry map update on helper_fill_hashmap()
selftests/bpf: Add test for sleepable lsm_cgroup rejection
selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper
bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket
selftests/bpf: Avoid static LLVM linking for cross builds
selftests/bpf: Use common CFLAGS for urandom_read
selftests/bpf: Initialize operation name before use
tools/bpf: build: Append extra cflags
libbpf: Initialize CFLAGS before including Makefile.include
bpftool: Append extra host flags
bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS
bpftool: Pass host flags to bootstrap libbpf
selftests/bpf: correct CONFIG_PPC64 macro name in comment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Work on removing rtnl_lock protection throughout the stack
continues. In this chapter:
- don't use rtnl_lock for IPv6 multicast routing configuration
- don't take rtnl_lock in ethtool for modern drivers
- prepare Qdisc dump callbacks for rtnl_lock removal
- Support dumping just ifindex + name of all interfaces, under RCU.
It's a common operation for Netlink CLI tools (when translating
names to ifindexes) and previously required full rtnl_lock.
- Support dumping qdiscs and page pools for a specific netdev. Even
tho user space wants a dump of all netdevs, most of the time, the
OOO programming model results in repeating the dump for each
netdev. Which, in absence of a cache, leads to a O(n^2) behavior.
- Flush nexthops once on multi-nexthop removal (e.g. when device goes
down), another O(n^2) -> O(n) improvement.
- Rehash locally generated traffic to a different nexthop on
retransmit timeout.
- Honor oif when choosing nexthop for locally generated IPv6 traffic.
- Convert TCP Auth Option to crypto library, and drop non-RFC algos.
- Increase subflow limits in MPTCP to 64 and endpoint limit to 256.
- Support MPTCP signaling of IPv6 address + port (ADD_ADDR). We need
to selectively skip reporting of the standard TCP Timestamp option,
because they won't fit into the header space together (12 + 30 >
40).
- Support using bridge neighbor suppression, Duplicate Address
Detection, Gratuitous ARP and unsolicited NA forwarding - in EVPN
deployments, e.g. VXLAN fabrics (IPv4 and IPv6).
- Improve link state reporting for upper netdevs (e.g. macvlan) over
tunnel devices (again, mostly for EVPN deployments).
- Support binding GENEVE tunnels to a local address.
- Speed up UDP tunnel destruction (remove one synchronize_rcu()).
- Support exponential field encoding in multicast (IGMPv3 and MLDv2).
- Support attaching PSP crypto offload to containers (veth, netkit).
- Add a new IPSec Netlink message XFRM_MSG_MIGRATE_STATE that allows
migrating individual IPsec SAs independently of their policies.
The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA
migration, lacks SPI for unique SA identification, and cannot
express reqid changes or migrate Transport mode selectors.
The new interface identifies the SA via SPI and mark, supports
reqid changes, address family changes, encap removal, and uses an
atomic create+install flow under x->lock to prevent SN/IV reuse
during AEAD SA migration.
- Implement GRO/GSO support for PPPoE.
- Convert sockopt callbacks in a number of protocols to iov_iter.
Cross-tree stuff:
- Remove support for Crypto TFM cloning (unblocked after the TCP Auth
Option rework). This feature regressed performance for all crypto
API users, since it changed crypto transformation objects into
reference-counted objects.
- Add FCrypt-PCBC implementation to rxrpc and remove it from the
global crypto API as obsolete and insecure.
Wireless:
- Major rework of station bandwidth handling, fixing issues with
lower capability than AP.
- Cleanups for EMLSR spec issues (drafts differed).
- More Neighbor Awareness Networking (Wi-Fi Aware) work (multicast,
schedule improvements, multi-station etc.)
- Some Ultra High Reliability (UHR) / IEEE 802.11bn (D1.4) work
(e.g. non-primary channel access, UHR DBE support).
- Fine Timing Measurement ranging (i.e. distance measurement) APIs.
Netfilter:
- Use per-rule hash initval in nf_conncount. This avoids unnecessary
lock contention with short keys (e.g. conntrack zones) in different
namespaces.
- Various safety improvements, both in packet parsing and object
lifetimes. Notably add refcounts to conntrack timeout policy.
Deletions:
- Remove TLS + sockmap integration. TLS wants to pin user pages to
avoid a copy, and sockmap wants to write to the input stream. More
work on this integration is clearly needed, and we can't find any
users (original author admitted that they never deployed it).
- Remove support for TLS offload with TCP Offload Engine (the far
more common opportunistic offload is retained). The locking looks
unfixable (driver sleeps under TCP spin locks) and people from the
vendor that added this are AWOL.
- Remove more ATM code, trying to leave behind only what PPPoATM
needs, AAL5 and br2684 with permanent circuits.
- Remove AppleTalk. Let it join hamradio in our out of tree protocol
graveyard, I mean, repository.
- Disable 32-bit x_tables compatibility (32bit binaries on 64bit
kernel) interface in user namespaces. To be deleted completely,
soon.
- Remove 5/10 MHz support from cfg80211/mac80211.
Drivers:
- Software:
- Support DEVMEM/DMABUF Tx over NETMEM_TX_NO_DMA devices (netkit)
- bonding: add knob to strictly follow 802.3ad for link state
- New drivers:
- Alibaba Elastic Ethernet Adaptor (cloud vNIC).
- NXP NETC switch within i.MX94.
- DPLL:
- Add operational state to pins (implement in zl3073x).
- Add generic DPLL type, for daisy-chaining DPLLs (implement in ice).
- Ethernet high-speed NICs:
- Huawei (hinic3):
- enhance tc flow offload support with queue selection,
tunnels
- nVidia/Mellanox:
- avoid over-copying payload to the skb's linear part (up to
60% win for LRO on slow CPUs like ARM64 V2)
- expose more per-queue stats over the standard API
- support additional, unprivileged PFs in the DPU
configuration
- support Socket Direct (multi-PF) with switchdev offloads
- add a pool / frag allocator for DMA mapped buffers for
control objects, save memory on systems with 64kB page size
- take advantage of the ability to dynamically change RSS
table size, even when table is configured by the user
- increase the max RSS table size for even traffic
distribution
- Ethernet NICs:
- Marvell/Aquantia:
- AQC113 PTP support
- Realtek USB (r8152):
- support 10Gbit Link Speeds and Energy-Efficient Ethernet
(EEE)
- support firmware loaded (for RTL8157/RTL8159)
- support for the RTL8159
- Intel (ixgbe):
- support Energy-Efficient Ethernet (EEE) on E610 devices
- Ethernet switches:
- Airoha:
- support multiple netdevs on a single GDM block / port
- Marvell (mv88e6xxx):
- support SERDES of mv88e6321
- Microchip (ksz8/9):
- rework the driver callbacks to remove one indirection layer
- Motorcomm (yt921x):
- support port rate policing
- support TBF qdisc offload
- support ACL/flower offload
- nVidia/Mellanox:
- expose per-PG rx_discards
- Realtek:
- rtl8365mb: bridge offloading and VLAN support
- Ethernet PHYs:
- Airoha:
- support Airoha AN8801R Gigabit PHYs.
- Micrel:
- implement 3 low-loss cable tunables
- Realtek:
- support MDI swapping for RTL8226-CG
- support MDIO for RTL931x
- Qualcomm:
- at803x: Rx and Tx clock management for IPQ5018 PHY
- Motorcomm:
- support YT8522 100M RMII PHY
- set drive strength in YT8531s RGMII
- TI:
- dp83822: add optional external PHY clock
- Bluetooth:
- hci_sync: add support for HCI_LE_Set_Host_Feature [v2]
- SMP: use AES-CMAC library API
- Intel:
- support Product level reset
- support smart trigger dump
- Mediatek:
- add event filter to filter specific event
- Realtek:
- fix RTL8761B/BU broken LE extended scan
- WiFi:
- Broadcom (b43):
- new support for a 11n device
- MediaTek (mt76):
- support mt7927
- mt792x: broken usb transport detection
- mt7921: regulatory improvements
- Qualcomm (ath9k):
- GPIO interface improvements
- Qualcomm (ath12k):
- WDS support
- replace dynamic memory allocation in WMI Rx path
- thermal throttling/cooling device support
- 6 GHz incumbent interference detection
- channel 177 in 5 GHz
- Realtek (rt89):
- RTL8922AU support
- USB 3 mode switch for performance
- better monitor radiotap support
- RTL8922DE preparations"
* tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1778 commits)
ipv4: fib_rule: Move fib4_rules_exit() to ->exit().
net: serialize netif_running() check in enqueue_to_backlog()
net: skmsg: preserve sg.copy across SG transforms
appletalk: move the protocol out of tree
appletalk: stop storing per-interface state in struct net_device
selftests/bpf: test that TLS crypto is rejected on a sockmap socket
selftests/bpf: drop the unused kTLS program from test_sockmap
selftests/bpf: remove sockmap + ktls tests
tls: remove dead sockmap (psock) handling from the SW path
tls: reject the combination of TLS and sockmap
atm: remove orphaned uAPI for deleted drivers, protocols and SVCs
atm: remove unused ATM PHY operations
atm: remove the unused pre_send and send_bh device operations
atm: remove the unused change_qos device operation
atm: remove SVC socket support and the signaling daemon interface
atm: remove the local ATM (NSAP) address registry
atm: remove dead SONET PHY ioctls
atm: remove the unused send_oam / push_oam callbacks
atm: remove AAL3/4 transport support
net: dsa: sja1105: fix lastused timestamp in flower stats
...
|
|
ATM removals have left a number of uAPI headers and ioctl
definitions with no in-kernel implementation behind them:
- device headers for adapters deleted with the legacy PCI/SBUS drivers:
atm_eni.h, atm_he.h, atm_idt77105.h, atm_nicstar.h, atm_zatm.h and
the atmtcp pair atm_tcp.h / <linux/atm_tcp.h>
- protocol headers for the removed CLIP, LANE and MPOA stacks:
atmarp.h, atmclip.h, atmlec.h, atmmpc.h
- atmsvc.h and the SVC / p2mp / local-address ioctls in atmdev.h
(ATM_{GET,RST,ADD,DEL}ADDR, ATM_{ADD,DEL,GET}LECSADDR,
ATM_{ADD,DROP}PARTY) left behind by the SVC and address-registry
removals
None of these are referenced by any remaining in-tree code.
Let's try to delete all this. Chances are nobody cares about
these headers any more. I'm keeping this separate from the
kernel side code changes for ease of revert, in case I am
proven wrong...
Link: https://patch.msgid.link/20260615194416.752559-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Per-controller admin and IO timeout sysfs attributes, and
letting the block layer set request timeouts (Maurizio,
Maximilian)
- Multipath passthrough iostats, and PCI P2PDMA enablement for
multipath devices (Keith, Kiran)
- A new diag sysfs attribute group exporting per-controller
counters (retries, multipath failover, error counters, requeue
and failure counts, reset and reconnect events) (Nilay)
- FDP configuration validation and bounds check fixes (liuxixin)
- Various nvmet fixes, including a pre-auth out-of-bounds read in
the Discovery Get Log Page handler, auth payload bounds
validation, and tcp error-path leak fixes (Bryam, Tianchu,
Geliang)
- nvme-tcp lockdep and workqueue fixes (Shin'ichiro, Kuniyuki,
Eric)
- Assorted other fixes and cleanups (John, Yao, Chao, Mateusz,
Achkinazi, Wentao)
- MD pull request via Yu Kuai:
- raid1/raid10 fixes for a deadlock in the read error recovery
path, error-path detection and bio accounting with cloned bios,
and an nr_pending leak in the REQ_ATOMIC bad-block error path
(Abd-Alrhman)
- PCI P2PDMA propagation from member devices to the RAID device
(Kiran)
- dm-raid bio requeue fix, and various smaller fixes and cleanups
(Benjamin, Chen, Li, Thorsten)
- Enable Clang lock context analysis for the block layer, with the
accompanying annotations across queue limits, the blk_holder_ops
callbacks, crypto, cgroup, iocost, kyber and mq-deadline (Bart)
- Block status code infrastructure work: a tagged status table, a
str_to_blk_op() helper, a bio_endio_status() helper, and on top of
that a new configurable block-layer error injection facility
(Christoph)
- DRBD netlink rework, replacing the genl_magic machinery with explicit
netlink serialization and moving the DRBD UAPI headers to
include/uapi/linux/ (Christoph Böhmwalder)
- bvec improvements: a bvec_folio() helper and making the bvec_iter
helpers proper inline functions (Willy, Christoph)
- ublk cleanups and a canceling-flag fix for the disk-not-allocated
case (Caleb, Ming)
- Partition handling fixes: bound the AIX pp_count scan, fix an of_node
refcount leak, and replace __get_free_page() with kmalloc() (Bryam,
Wentao, Mike)
- Convert numa_node to int in blk_mq_hw_ctx and ->init_request, and add
WQ_PERCPU to the block workqueue users (Mateusz, Marco)
- Block statistics and tracing: propagate in-flight to the whole disk
on partition IO, export passthrough stats, and a new
block_rq_tag_wait tracepoint (Tang, Keith, Aaron)
- A round of removals, unexports and cleanups across bio, direct-io and
the bvec helpers (Christoph)
- Various driver fixes (mtip32xx use-after-free, rbd snap_count
validation and strscpy conversion, nbd socket lockdep reclassify,
virtio-blk zone report clamp, floppy) and a batch of MAINTAINERS
email/list updates (Coly, Li, Yu, Christoph Böhmwalder)
- Other little fixes and cleanups all over
* tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (117 commits)
MAINTAINERS: Update Coly Li's email address
block: check bio split for unaligned bvec
nbd: Reclassify sockets to avoid lockdep circular dependency
block: add configurable error injection
block: add a str_to_blk_op helper
block: add a "tag" for block status codes
block: add a macro to initialize the status table
floppy: Drop unused pnp driver data
block: propagate in_flight to whole disk on partition I/O
virtio-blk: clamp zone report to the report buffer capacity
block: optimize I/O merge hot path with unlikely() hints
drivers/block/rbd: Use strscpy() to copy strings into arrays
partitions: aix: bound the pp_count scan to the ppe array
block: Enable lock context analysis
block/mq-deadline: Make the lock context annotations compatible with Clang
block/Kyber: Make the lock context annotations compatible with Clang
block/blk-mq-debugfs: Improve lock context annotations
block/blk-iocost: Inline iocg_lock() and iocg_unlock()
block/blk-iocost: Split ioc_rqos_throttle()
block/crypto: Annotate the crypto functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Rework the task_work infrastructure.
Both the local (DEFER_TASKRUN) and the normal (tctx) task_work lists
were llist based, which is LIFO ordered, and hence each run had to do
an O(n) list reversal pass first to restore queue order.
Additionally, to cap the amount of task_work run, each method needed
a retry list as well.
Add a lockless MPCS FIFO queue (based on Dmitry Vyukov's intrusive
MPSC algorithm) and switch both task_work lists to it. It performs
better than llists and we can then also ditch the retry lists as well
as entries are popped one-at-the-time.
On top of those changes, run the tctx fallback task_work directly and
remove the now-unused per-ctx fallback machinery entirely.
- zcrx user notifications.
Add a mechanism for zcrx to communicate conditions back to userspace
via a dedicated CQE, with the initial users being notification on
running out of buffers and on a frag copy fallback, plus
shared-memory notification statistics.
Alongside that, a series of zcrx reliability and cleanup fixes: more
reliable scrubbing, poisoning pointers on unregistration, dropping an
extra ifq close, adding a ctx back-pointer, reordering fd allocation
in the export path, and killing a dead 'sock' member.
- Allow using io_uring registered buffers for plain SEND and RECV, not
just for the zero-copy send path.
This enables targets like ublk's NBD backend to push/pull IO data
directly to/from a registered buffer over a plain send/recv on a TCP
socket.
- Registered buffer improvements: account huge pages correctly, bump
the io_mapped_ubuf length field to size_t, and raise the previous 1GB
registered buffer size limit.
- Restrict the ctx access exposed to io_uring BPF struct_ops programs
by handing them an opaque type rather than the full io_ring_ctx, and
add a separate MAINTAINERS entry for the bpf-ops code.
- Allow opcode filtering on IORING_OP_CONNECT.
- Validate ring-provided buffer addresses with access_ok(), and align
the legacy buffer add limit with MAX_BIDS_PER_BGID.
- Various other cleanups and minor fixes, including avoiding msghdr
async data on connect/bind, dropping async_size for OP_LISTEN, making
the POLL_FIRST receive side checks consistent, re-checking
IO_WQ_BIT_EXIT for each linked work item, and using
trace_call__##name() at guarded tracepoint call sites.
* tag 'for-7.2/io_uring-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (31 commits)
io_uring/bpf-ops: add a separate maintainer entry
io_uring/net: make POLL_FIRST receive side checks consistent
io_uring: remove the per-ctx fallback task_work machinery
io_uring: run the tctx task_work fallback directly
io_uring: switch normal task_work to a mpscq
io_uring: switch local task_work to a mpscq
io_uring/mpscq: add lockless multi-producer, single-consumer FIFO queue
io_uring: grab RCU read lock marking task run
io_uring/zcrx: kill dead 'sock' member in struct io_zcrx_args
io_uring/kbuf: validate ring provided buffer addresses with access_ok()
io_uring/net: support registered buffer for plain send and recv
io_uring/nop: Drop a wrong comment in struct io_nop
io_uring/net: Remove async_size for OP_LISTEN
io_uring/net: Avoid msghdr on op_connect/op_bind async data
io_uring/bpf-ops: restrict ctx access to BPF
io_uring/io-wq: re-check IO_WQ_BIT_EXIT for each linked work item
io_uring/kbuf: align legacy buffer add limit with MAX_BIDS_PER_BGID
io_uring/zcrx: add shared-memory notification statistics
io_uring/zcrx: notify user on frag copy fallback
io_uring/zcrx: notify user when out of buffers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"The most noticeable change is to enable large folios by default, it's
been in testing for a few releases. Related to that is huge folio
support (still under experimental config). Otherwise a few ioctl
updates, performance improvements and usual fixes and core changes.
User visible changes:
- enable large folios by default, added in 6.17 (under experimental
build), no feature limitations, a big change internally
- new ioctl to return raw checksums to userspace (a bit tricky given
compression and tail extents), can be used for mkfs and
deduplication optimizations
- provide stable UUID for e.g. overlayfs and temp_fsid, also
reflected in statvfs() field f_fsid, internal dev_t is hashed in to
allow cloning
- add 32bit compat version of GET_SUBVOL_INFO ioctl
- in experimental build, support huge folios (up to 2M)
Performance related improvements/changes:
- limit bio size to the estimated optimum derived from the queue,
this prevents build up of too much data for writeback, which could
cause latency spikes (reported improvement 15% on sequential
writes)
- don't force direct IO to be serialized, forgotten change during
mount API port, brings back +60% of throughput
- lockless calculation of number of shrinkable extent maps, improve
performance with many memcg allocated objects
Notable fixes:
- in zoned mode, fix a deadlock due to zone reclaim and relocation
when space needs to be flushed
- don't trim device which is internally not tracked as writeable
(e.g. when missing device is being rescanned)
- fix deadlock when cloning inline extent and mounted with
flushoncommit
- fix false IO failures after direct IO falls back to buffered write
in some cases
Core:
- remove COW fixup mechanism completely; detect and fix changes to
pages outside of filesystem tracking, guaranteed since 5.8, grace
period is over
- remove 2K block size support, experimental to test subpage code on
x86_64 but now it would block folio changes
- tree-checker improvements of:
- free-space cache and tree items
- root reference and backref items
- extent state exceptions in reloc tree
- subpage mode updates:
- code optimizations, simplify tracking bitmaps
- re-enable readahead of compressed extent
- extend bitmap size to cover huge folios
- add tracepoints related to sync, tree-log and transactions
- device stats item tracking unification, remove item if there are no
stats recorded, also don't leave stale stats on replaced device
- allow extent buffer pages to be allocated as movable, to help page
migration
- added checks for proper extent buffer release
- btrfs.ko code size reduction due to transaction abort call
simplifications
- several struct size reductions
- more auto free conversions
- more verbose assertions"
* tag 'for-7.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (130 commits)
btrfs: fix use-after-free after relocation failure with concurrent COW
btrfs: move WARN_ON on unexpected error in __add_tree_block()
btrfs: move locking into btrfs_get_reloc_bg_bytenr()
btrfs: lzo: reject compressed segment that overflows the compressed input
btrfs: retry faulting in the pages after a zero sized short direct write
btrfs: fix incorrect buffered IO fallback for append direct writes
btrfs: fix false IO failure after falling back to buffered write
btrfs: use verbose assertions in backref.c
btrfs: print a message when a missing device re-appears
btrfs: do not trim a device which is not writeable
btrfs: return real error after lookup failure in btrfs_ioctl_default_subvol()
btrfs: use mapping shared locking for reading super block
btrfs: use lockless read in nr_cached_objects shrinker callback
btrfs: switch local indicator variables to bools
btrfs: send: pass bool for pending_move and refs_processed parameters
btrfs: use shifts for sectorsize and nodesize
btrfs: fix deadlock cloning inline extent when using flushoncommit
btrfs: allocate eb-attached btree pages as movable
btrfs: add 32-bit compat ioctl for BTRFS_IOC_GET_SUBVOL_INFO
btrfs: derive f_fsid from on-disk fsid and dev_t
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull watchdog updates and fixes from Guenter Roeck:
"Subsystem:
- Unregister PM notifier on watchdog unregister
- Various documentation fixes and improvements
Removed drivers:
- Remove AMD Elan SC520 processor watchdog driver
- Drop SMARC-sAM67 support
- Remove driver for integrated WDT of ZFx86 486-based SoC
New drivers:
- Driver for Andes ATCWDT200
- Driver for Gunyah Watchdog
Added support to existing drivers:
- Add "apple,t8103-wdt" and "apple,t8122-wdt" compatibles to Apple
watchdog driver
- Add rockchip,rk3528-wdt and rockchip,rv1103b-wdt to snps,dw-wdt.yaml
- Document IPQ9650, IPQ5210, Shikra, Nord, and Hawi in qcom-wdt.yaml
Also document sram property and add support to get the bootstatus
to qcom wdt driver
- lenovo_se10_wdt: Fix use-after-rfree and add support for SE10 Gen 2
platform
- ti,rti-wdt: Add ti,am62l-rti-wdt compatible
- renesas: Document RZ/G3L support and rework example for
renesas,r9a09g057-wdt
Other bug fixes and improvements:
- Use named initializers (sc1200, ziirave_wdt)
- Allow pic32-dmt and pic32-wdt to be built with COMPILE_TEST
- realtek-otto: enable clock before using I/O, and prevent PHASE2 underflows
- rti_wdt: Add reaction control
- renesas,rzn1-wdt: Drop interrupt support and other cleanup
- gpio_wdt: Add ACPI support
- imx7ulp_wdt: Keep WDOG running until A55 enters WFI on i.MX94
- sprd_wdt: Remove redundant sprd_wdt_disable() on register failure
- bcm2835_wdt: Switch to new sys-off handler API
- sama5d4_wdt: Fix WDDIS detection on SAM9X60 and SAMA7G5
- hpwdt: Refine hpwdt message for UV platform
- Convert TS-4800 bindings to DT schema
- menz069_wdt: drop unneeded MODULE_ALIAS
- sp5100_tco: Use EFCH MMIO for newer Hygon FCH"
* tag 'watchdog-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (58 commits)
watchdog: sc1200: Drop unused assignment of pnp_device_id driver data
watchdog: unregister PM notifier on watchdog unregister
dt-bindings: watchdog: qcom-wdt: Document IPQ5210 watchdog
watchdog: dev: convert to kernel-doc comments
watchdog: core: clean up some comments
watchdog: uapi: add comments for what bit masks apply to
watchdog: linux/watchdog.h: repair kernel-doc comments
watchdog: add devm_watchdog_register_device() to watchdog-kernel-api
watchdog: ziirave_wdt: Use named initializers for struct i2c_device_id
watchdog: realtek-otto: enable clock before using I/O
watchdog: realtek-otto: prevent PHASE2 underflows
dt-bindings: watchdog: qcom-wdt: Document IPQ9650 watchdog
dt-bindings: watchdog: renesas,rzn1-wdt: interrupts are not required
dt-bindings: watchdog: apple,wdt: Add t8122 compatible
watchdog: apple: Add "apple,t8103-wdt" compatible
watchdog: rzn1: remove now obsolete interrupt support
dt-bindings: watchdog: Add watchdog compatible for RK3528
watchdog: convert the Kconfig dependency on OF_GPIO to OF
watchdog: Remove AMD Elan SC520 processor watchdog driver
watchdog: lenovo_se10_wdt: Fix use-after-free and resource leak risk
...
|
|
This adds observability for the io_uring zcrx rx-buf-len configuration.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20260612211709.1456966-3-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: New features for 7.2
New features for 7.2 for KVM/s390:
* KVM_PRE_FAULT_MEMORY support
* Support for 2G hugepages
* Support for the ASTFLEIE 2 facility
* kvm_arch_set_irq_inatomic Fast Inject
* Fix potential leak of uninitialized bytes
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Futex updates:
- Optimize futex hash bucket access patterns (Peter Zijlstra)
- Large series to address the robust futex unlock race for real, by
Thomas Gleixner:
"The robust futex unlock mechanism is racy in respect to the
clearing of the robust_list_head::list_op_pending pointer because
unlock and clearing the pointer are not atomic.
The race window is between the unlock and clearing the pending op
pointer. If the task is forced to exit in this window, exit will
access a potentially invalid pending op pointer when cleaning up
the robust list.
That happens if another task manages to unmap the object
containing the lock before the cleanup, which results in an UAF.
In the worst case this UAF can lead to memory corruption when
unrelated content has been mapped to the same address by the time
the access happens.
User space can't solve this problem without help from the kernel.
This series provides the kernel side infrastructure to help it
along:
1) Combined unlock, pointer clearing, wake-up for the
contended case
2) VDSO based unlock and pointer clearing helpers with a
fix-up function in the kernel when user space was interrupted
within the critical section.
... with help by André Almeida:
- Add a note about robust list race condition (André Almeida)
- Add self-tests for robust release operations (André Almeida)
Context analysis updates:
- Implement context analysis for 'struct rt_mutex'. (Bart Van Assche)
- Bump required Clang version to 23 (Marco Elver)
Guard infrastructure updates:
- Series to remove NULL check from unconditional guards (Dmitry
Ilvokhin)
Lockdep updates:
- Restore self-test migrate_disable() and sched_rt_mutex state on
PREEMPT_RT (Karl Mehltretter)
Membarriers updates:
- Use per-CPU mutexes for targeted commands (Aniket Gattani)
- Modernize membarrier_global_expedited with cleanup guards (Aniket
Gattani)
- Add rseq stress test for CFS throttle interactions (Aniket Gattani)
percpu-rwsems updates:
- Extract __percpu_up_read() to optimize inlining overhead (Dmitry
Ilvokhin)
Seqlocks updates:
- Allow UBSAN_ALIGNMENT to fail optimizing (Heiko Carstens)
Lock tracing:
- Add contended_release tracepoint to sleepable locks such as
mutexes, percpu-rwsems, rtmutexes, rwsems and semaphores (Dmitry
Ilvokhin)
MAINTAINERS updates:
- MAINTAINERS: Add RUST [SYNC] entry (Boqun Feng)
Misc updates and fixes by Randy Dunlap, YE WEI-HONG, Fabricio Parra,
Dmitry Ilvokhin and Peter Zijlstra"
* tag 'locking-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (36 commits)
locking: Add contended_release tracepoint to sleepable locks
locking/percpu-rwsem: Extract __percpu_up_read()
tracing/lock: Remove unnecessary linux/sched.h include
futex: Optimize futex hash bucket access patterns
rust: sync: completion: Mark inline complete_all and wait_for_completion
MAINTAINERS: Add RUST [SYNC] entry
cleanup: Specify nonnull argument index
selftests: futex: Add tests for robust release operations
Documentation: futex: Add a note about robust list race condition
x86/vdso: Implement __vdso_futex_robust_try_unlock()
x86/vdso: Prepare for robust futex unlock support
futex: Provide infrastructure to plug the non contended robust futex unlock race
futex: Add robust futex unlock IP range
futex: Add support for unlocking robust futexes
futex: Cleanup UAPI defines
x86: Select ARCH_MEMORY_ORDER_TSO
uaccess: Provide unsafe_atomic_store_release_user()
futex: Provide UABI defines for robust list entry modifiers
futex: Move futex related mm_struct data into a struct
futex: Make futex_mm_init() void
...
|
|
Allow uprobe_multi link to identify the target binary by an already
opened file descriptor.
Adding new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field for
the attr.link_create.uprobe_multi struct.
When the flag is set, we resolve the target from path_fd, without the
flag, we keep the existing string path behavior.
I don't see a use case for supporting O_PATH file descriptors, because
we need to read the binary first to get probes offsets, so I'm using
the CLASS(fd, f), which fails for O_PATH fds.
Assisted-by: Codex:GPT-5.4
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Reduce pipe->mutex contention by pre-allocating pages outside the
lock in anon_pipe_write().
anon_pipe_write() called alloc_page() once per page while holding
pipe->mutex. The allocation can sleep doing direct reclaim and runs
memcg charging, which extends the critical section and stalls any
concurrent reader on the same mutex. Now up to 8 pages are
pre-allocated before the mutex is taken, leftovers are recycled
into the per-pipe tmp_page[] cache before unlock, and any remainder
is released after unlock, keeping the allocator out of the critical
section on both sides. On a writers x readers sweep with 64KB
writes against a 1 MB pipe throughput improves 6-28% and average
write latency drops 5-22%; under memory pressure - when the cost of
holding the mutex across reclaim is highest - throughput improves
21-48% and latency drops 17-33%. The microbenchmark is added to
selftests.
- uaccess/sockptr: fix the ignored_trailing logic in
copy_struct_to_user() to behave as documented and the usize check
in copy_struct_from_sockptr() for user pointers, and add
copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr()
helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).
- bpf: add a sleepable bpf_real_inode() kfunc that resolves the real
inode backing a dentry via d_real_inode(). On overlayfs the inode
attached to the dentry doesn't carry the underlying device
information; this is used by the filesystem restriction BPF program
that was merged into systemd.
- docs: add guidelines for submitting new filesystems, motivated by
the maintenance burden abandoned and untestable filesystems impose
on VFS developers, blocking infrastructure work like folio
conversions and iomap migration.
Fixes:
- libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
and drop the now-redundant assignments in callers. This began as a
one-line dma-buf fix for a path_noexec() warning; a pseudo
filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo()
callers were audited: the only visible effect is on dma-buf where
SB_I_NOEXEC silences the warning.
- Handle set_blocksize() failures in legacy filesystems (bfs, hpfs,
qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a
device with a sector size > PAGE_SIZE crashed roughly half of them;
the rest had the same missing error handling pattern. Plus a
follow-up releasing the superblock buffer_head when setting the
minix v3 block size fails.
- mount: honour SB_NOUSER in the new mount API.
- fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by
switching the process-group paths of send_sigio() and send_sigurg()
from read_lock(&tasklist_lock) to RCU, matching the single-PID
path.
- vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing
delegated NFS mounts (fsopen() in a container with the mount
performed by a privileged daemon) that broke when non-init
s_user_ns was tied to FS_USERNS_MOUNT.
- selftests/namespaces: fix a hang in nsid_test where an unreaped
grandchild kept the TAP pipe write-end open, a waitpid(-1) race in
listns_efault_test, and a false FAIL on kernels without listns()
where the tests should SKIP.
- filelock: fix the break_lease() stub signature for
CONFIG_FILE_LOCKING=n.
- init/initramfs_test: wait for the async initramfs unpacking before
running; the test and do_populate_rootfs() share the parser state.
- fs/coredump: reduce redundant log noise in
validate_coredump_safety().
- iomap: pass the correct length to fserror_report_io() in
__iomap_write_begin().
- backing-file: fix the backing_file_open() kerneldoc.
Cleanups:
- initramfs: refactor the cpio hex header parsing to use hex2bin()
instead of the hand-rolled simple_strntoul() which is reverted, and
extend the initramfs KUnit tests to cover header fields with 0x
prefixes.
- Replace __get_free_pages() and friends with kmalloc()/kzalloc()
across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2,
isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the
do_mounts init code - part of the larger work of replacing page
allocator calls with kmalloc().
- Use clear_and_wake_up_bit() in unlock_buffer() and
journal_end_buffer_io_sync() instead of open-coding the sequence.
- Drop unused VFS exports: unexport drop_super_exclusive(), remove
start_removing_user_path_at(), and fold __start_removing_path()
into start_removing_path().
- fs/read_write: narrow the __kernel_write() export with
EXPORT_SYMBOL_FOR_MODULES().
- vfs: uapi: retire octal and hex constants in favor of (1 << n) for
the O_ flags. Finding a free bit for a new flag across the
architectures was needlessly hard with the mixed bases.
- dcache: add extra sanity checks of dead dentries in dentry_free()
via a new DENTRY_WARN_ONCE() that also prints d_flags.
- iov_iter: use kmemdup_array() in dup_iter() to harden the
allocation against multiplication overflow.
- fs/pipe: write to ->poll_usage only once.
- vfs: remove an always-taken if-branch in find_next_fd().
- dcache: use kmalloc_flex() for struct external_name in __d_alloc().
- namei: use QSTR() instead of QSTR_INIT() in path_pts().
- sync_file_range: delete dead S_ISLNK code.
- Comment fixes: retire a stale comment in fget_task_next() and fix
assorted spelling mistakes"
* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
backing-file: fix backing_file_open() kerneldoc parameter
iomap: pass the correct len to fserror_report_io in __iomap_write_begin
vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
bpf: add bpf_real_inode() kfunc
fs/read_write: Do not export __kernel_write() to the entire world
libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
mount: honour SB_NOUSER in the new mount API
fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
selftests/pipe: add pipe_bench microbenchmark
fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
fs: retire stale comment in fget_task_next()
fs: fix spelling mistakes in comment
bfs: replace get_zeroed_page() with kzalloc()
binfmt_misc: replace __get_free_page() with kmalloc()
configfs: replace __get_free_pages() with kzalloc()
fs/namespace: use __getname() to allocate mntpath buffer
fs/select: replace __get_free_page() with kmalloc()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull openat2 updates from Christian Brauner:
"Features:
- Add O_EMPTYPATH to openat(2)/openat2(2). To get an operable file
descriptor from an O_PATH file descriptor it is possible to use
openat(fd, ".", O_DIRECTORY) for directories, but other file types
require going through open("/proc/<pid>/fd/<nr>") and thus depend
on a functioning procfs.
With O_EMPTYPATH an empty path string is accepted and LOOKUP_EMPTY
is set at path resolution time, allowing to reopen the file behind
the file descriptor directly. Selftests are included.
- Add an OPENAT2_REGULAR flag for openat2(2) which refuses to open
anything but regular files with the new EFTYPE error code.
This implements the "ability to only open regular files" feature
requested by userspace via uapi-group.org and protects services
from being redirected to fifos, device nodes, and friends.
All atomic_open implementations were audited for OPENAT2_REGULAR
handling. Explicit checks were added to ceph, gfs2, nfs (v4), and
cifs/smb - these are the filesystems whose atomic_open can
encounter an existing non-regular file and would otherwise call
finish_open() on it or return a misleading error code.
The remaining implementations (9p, fuse, vboxsf, nfs v2/v3) only
call finish_open() on freshly created files and use
finish_no_open() for lookup hits, letting the VFS catch non-regular
files via the do_open() safety net.
Cleanups:
- Migrate the openat2 selftests to the kselftest harness and move
them under selftests/filesystems/. The tests were written in the
early days of selftests' TAP support and the modern kselftest
harness is much easier to follow and maintain. The contents of the
tests are unchanged and the new emptypath tests are ported on top.
- Make the LAST_XXX last-type constants private to fs/namei.c. The
only user outside of fs/namei.c was ksmbd which only needs to know
whether the last component is a regular one, so
vfs_path_parent_lookup() now performs the LAST_NORM check
internally. The ints are replaced with a dedicated enum last_type"
* tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: replace ints with enum last_type for LAST_XXX
vfs: make LAST_XXX private to fs/namei.c
selftests: openat2: port emptypath_test to kselftest harness
kselftest/openat2: test for OPENAT2_REGULAR flag
openat2: new OPENAT2_REGULAR flag support
openat2: introduce EFTYPE error code
selftest: add tests for O_EMPTYPATH
vfs: add O_EMPTYPATH to openat(2)/openat2(2)
selftests: openat2: migrate to kselftest harness
selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE
selftests: openat2: move helpers to header
selftests: move openat2 tests to selftests/filesystems/
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs casefolding updates from Christian Brauner:
"This exposes the case folding behavior of local filesystems so that
file servers - nfsd, ksmbd, and user space file servers - can report
the actual behavior to clients instead of guessing.
Filesystems report case-insensitive and case-nonpreserving behavior
via new file_kattr flags in their fileattr_get implementations. fat,
exfat, ntfs3, hfs, hfsplus, xfs, cifs, nfs, vboxsf, and isofs are
wired up. Local filesystems that are not explicitly handled default to
the usual POSIX behavior of case-sensitive and case-preserving.
nfsd uses this to report case folding via NFSv3 PATHCONF and to
implement the NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
attributes - both have been part of the NFS protocols for decades to
support clients on non-POSIX systems - and ksmbd reports it via
FS_ATTRIBUTE_INFORMATION. Exposing the information through the
fileattr uapi covers user space file servers.
The immediate motivation is interoperability: Windows NFS clients
hard-require servers to report case-insensitivity for Win32
applications to work correctly, and a client that knows the server is
case-insensitive can avoid issuing multiple LOOKUP/READDIR requests
searching for case variants.
The Linux NFS client already grew support for case-insensitive shares
years ago in support of the Hammerspace NFS server - negative dentry
caching must be disabled (a lookup for "FILE.TXT" failing must not
cache a negative entry when "file.txt" exists) and directory change
invalidation must drop cached case-folded name variants. Such servers
often operate in multi-protocol environments where a single file
service instance caters to both NFS and SMB clients, and nfsd needs to
report case folding properly to participate as a first-class citizen
there.
A follow-up series brings fixes for the initial work: the nfsd
case-info probe now uses kernel credentials, maps -ESTALE to
NFS3ERR_STALE, and has its cost capped across READDIR entries; the nfs
client avoids transiently zeroed case capability bits during the probe
and skips the pathconf probe when neither field is consumed; the
FS_CASEFOLD_FL semantics are clarified in the UAPI header; and the
tools UAPI headers are synced"
* tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
nfsd: Cap case-folding probe cost across READDIR entries
nfsd: Map -ESTALE from case probe to NFS3ERR_STALE
nfsd: Use kernel credentials for case-info probe
fs: Clarify FS_CASEFOLD_FL semantics in UAPI header
nfs: Skip pathconf probe when neither field is consumed
nfs: Avoid transient zeroed case capability bits during probe
tools headers UAPI: Sync case-sensitivity flags from linux/fs.h
ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
nfsd: Report export case-folding via NFSv3 PATHCONF
isofs: Implement fileattr_get for case sensitivity
vboxsf: Implement fileattr_get for case sensitivity
nfs: Implement fileattr_get for case sensitivity
cifs: Implement fileattr_get for case sensitivity
xfs: Report case sensitivity in fileattr_get
hfsplus: Report case sensitivity in fileattr_get
hfs: Implement fileattr_get for case sensitivity
ntfs3: Implement fileattr_get for case sensitivity
exfat: Implement fileattr_get for case sensitivity
fat: Implement fileattr_get for case sensitivity
...
|
|
Adds the UAPI for the quiet flags feature (but not the implementation
yet).
Even though currently LANDLOCK_ADD_RULE_QUIET only affects audit
logging, in the future this can also be used as part of a supervisor
mechanism, where it will also suppress denial notifications on a
per-object basis. Thus the name is deliberately generic, as opposed to
e.g. LANDLOCK_ADD_RULE_LOG_QUIET.
According to pahole, even after adding the struct access_masks
quiet_masks in struct landlock_hierarchy, the u32 log_* bitfield still
only has a size of 2 bytes, so there's minimal wasted space.
Assisted-by: GitHub-Copilot:claude-opus-4.8
Signed-off-by: Tingmao Wang <m@maowtm.org>
[mic: Update date, fix comment formatting]
Link: https://patch.msgid.link/031184748a8e74c0bb02f1fa13d7a3f10918c627.1781228815.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Devlink param value attribute is not defined since devlink is handling
the value validating and parsing internally, this allows us to implement
multi attribute values without breaking any policies.
Devlink param multi-attribute values are considered to be dynamically
sized arrays of u64 values, by introducing a new devlink param type
DEVLINK_PARAM_TYPE_U64_ARRAY, driver and user space can set a variable
count of u64 values into the DEVLINK_ATTR_PARAM_VALUE_DATA attribute.
Implement get/set parsing and add to the internal value structure passed
to drivers.
This is useful for devices that need to configure a list of values for
a specific configuration.
example:
$ devlink dev param show pci/... name multi-value-param
name multi-value-param type driver-specific
values:
cmode permanent value: 0,1,2,3,4,5,6,7
$ devlink dev param set pci/... name multi-value-param \
value 4,5,6,7,0,1,2,3 cmode permanent
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-5-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for a second fine-grained UDP access right.
LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP controls the ability to set the
remote port of a socket (via connect()) and to specify an explicit
destination when sending a datagram, to override any remote peer set on
a UDP socket (e.g. in sendto() or sendmsg()). It will be useful for
applications that send datagrams, and for some servers too (those
creating per-client sockets, which want to receive traffic only from a
specific address).
Similarly as for bind(), this access control is performed when
configuring sockets, not in hot code paths.
Add detection of when autobind is about to be required, and deny the
operation if the process would not be allowed to call bind(0)
explicitly. Autobind can only be performed in udp_lib_get_port() from
code paths already controlled by LSM hooks: when connect()ing, sending a
first datagram, and in some splice() EOF edge case which, afaiu, can
only happen after a remote peer has been set. This invariant needs to be
preserved to keep bind policies actually enforced.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://patch.msgid.link/20260611162107.49278-3-matthieu@buffet.re
[mic: Add quick return for non-sandboxed tasks, fix sa_family
dereferencing, fix comment formatting]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add support for a first fine-grained UDP access right.
LANDLOCK_ACCESS_NET_BIND_UDP controls the ability to set the local port
of a UDP socket (via bind()). It will be useful for servers (to start
receiving datagrams), and for some clients that need to use a specific
source port (e.g. mDNS requires to use port 5353)
For obvious performance concerns, access control is only enforced when
configuring sockets, not when using them for common send/recv
operations.
Bump ABI to allow userspace to detect and use this new right.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://patch.msgid.link/20260611162107.49278-2-matthieu@buffet.re
[mic: Fix comment formatting]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add DPLL_TYPE_GENERIC to represent DPLL devices which do not fit the
existing PPS or EEC classes.
The UAPI type is intentionally generic. During netdev discussion,
maintainers pointed out that introducing identifiers tied to a specific
placement or single design does not scale across ASICs and vendors.
The role of a DPLL is already inferable from the spawning driver,
bus device, and pin topology, without encoding additional
purpose-specific taxonomy in the type name.
Using a generic type keeps the UAPI extensible and avoids premature
naming that may become incorrect as new hardware topologies are
exposed through the DPLL subsystem.
Expose the new type through UAPI and netlink specification as "generic".
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-2-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2026-06-12
1) Replace the open-coded manual cleanup in xfrm_add_policy() error
path with xfrm_policy_destroy() for consistency with
xfrm_policy_construct().
From Deepanshu Kartikey.
2) Limit XFRMA_TFCPAD to a sensible maximum (max IP length, 64k) since
u32 is excessive for traffic flow confidentiality padding.
From David Ahern.
3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that
allows migrating individual IPsec SAs independently of
their policies. The existing XFRM_MSG_MIGRATE is tightly coupled
to policy+SA migration, lacks SPI for unique SA identification,
and cannot express reqid changes or migrate Transport mode
selectors. The new interface identifies the SA via SPI and mark,
supports reqid changes, address family changes, encap removal,
and uses an atomic create+install flow under x->lock to prevent
SN/IV reuse during AEAD SA migration.
From Antony Antony.
* tag 'ipsec-next-2026-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: add documentation for XFRM_MSG_MIGRATE_STATE
xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE
xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration
xfrm: make xfrm_dev_state_add xuo parameter const
xfrm: extract address family and selector validation helpers
xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper
xfrm: move encap and xuo into struct xfrm_migrate
xfrm: add error messages to state migration
xfrm: add state synchronization after migration
xfrm: check family before comparing addresses in migrate
xfrm: split xfrm_state_migrate into create and install functions
xfrm: rename reqid in xfrm_migrate
xfrm: fix NAT-related field inheritance in SA migration
xfrm: allow migration from UDP encapsulated to non-encapsulated ESP
xfrm: add extack to xfrm_init_state
xfrm: remove redundant assignments
xfrm: Reject excessive values for XFRMA_TFCPAD
xfrm: cleanup error path in xfrm_add_policy()
====================
Link: https://patch.msgid.link/20260612074725.1760473-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The main purpose of this cmd is to be able to associate a
non-psp-capable device (e.g. veth or netkit) with a psp device.
One use case is if we create a pair of veth/netkit, and assign 1 end
inside a netns, while leaving the other end within the default netns,
with a real PSP device, e.g. netdevsim or a physical PSP-capable NIC.
With this command, we could associate the veth/netkit inside the netns
with PSP device, so the virtual device could act as PSP-capable device
to initiate PSP connections, and performs PSP encryption/decryption on
the real PSP device.
Signed-off-by: Wei Wang <weibunny@fb.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20260608233118.2694144-3-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The tls_toe feature and its single user (chelsio chtls) have been
unmaintained for multiple years. It also hooks into the core of the
TCP implementation, and bypasses most of the networking stack.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/1f30e73275c07bf879f547589872d0916025a52e.1781165969.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IOMMU_IOAS_MAP_FILE is documented as mapping a memfd, but the
implementation first tries to resolve the fd as a dma-buf and has a
special path for supported dma-buf exporters. In particular, VFIO PCI
dma-bufs exported through VFIO_DEVICE_FEATURE_DMA_BUF can be mapped when
they describe a single DMA range.
Update the UAPI comment so userspace understands that certain kinds of
dma-buf are supported in addition to memfd.
Fixes: 44ebaa1744fd ("iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE")
Link: https://patch.msgid.link/r/20260610-tmp-v1-1-b8ccbf557391@fb.com
Signed-off-by: Alex Mastro <amastro@fb.com>
Assisted-by: Codex:gpt-5.5-high
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
'rockchip', 'verisilicon', 'riscv', 'intel/vt-d', 'amd/amd-vi' and 'core' into next
|
|
Introduce vendor-specific PHY tunable identifiers to control the
KSZ87xx low-loss cable erratum handling through the ethtool PHY
tunable interface.
The following tunables are added:
- a boolean "short-cable" tunable, applying a documented and
conservative preset intended for short or low-loss Ethernet cables;
- an integer LPF bandwidth tunable, allowing advanced adjustment of the
receiver low-pass filter bandwidth;
- an integer DSP EQ initial value tunable, allowing advanced tuning of
the PHY equalizer initialization.
The actual behavior is implemented by the corresponding PHY and switch
drivers.
Reviewed-by: Marek Vasut <marex@nabladev.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
Link: https://patch.msgid.link/20260609-ksz87xx_errata_low_loss_connections-v10-2-9ba4418cf3db@exotec.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Link: https://patch.msgid.link/faec5df2a3e90240cde5897bae2250cd7d44aeac.1781170056.git.tarunsahu@google.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
|
|
When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.
In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.
Introduce a configuration knob to control this behavior. It allows
the bonding master to assert carrier only when at least 'min_links'
slaves are in Collecting_Distributing state.
The default mode preserves the existing behavior. This patch only
introduces the knob; its behavior is implemented in the subsequent
commit.
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20260603150331.1919611-4-louis.scalbert@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When compile testing the UAPI headers with clang, there is an warning turned
error for using a C++ style ('//') comment, which is explicitly forbidden for
UAPI headers.
In file included from <built-in>:1:
./usr/include/linux/virtio_can.h:29:35: error: // comments are not allowed in this language [-Werror,-Wcomment]
29 | #define VIRTIO_CAN_MAX_DLEN 64 // this is like CANFD_MAX_DLEN
| ^
1 error generated.
Switch to a standard C style comment.
Fixes: 2b6b4bb7d96f ("can: virtio: Add virtio CAN driver")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260604-virtio_can-fix-uapi-comment-v1-1-199fa96ec5f0@kernel.org>
|
|
Add virtio CAN driver based on Virtio 1.4 specification (see
https://github.com/oasis-tcs/virtio-spec/tree/virtio-1.4). The driver
implements a complete CAN bus interface over Virtio transport,
supporting both CAN Classic and CAN-FD Ids. In term of frames, it
supports classic and CAN FD. RTR frames are only supported with classic
CAN.
Usage:
- "ip link set up can0" - start controller
- "ip link set down can0" - stop controller
- "candump can0" - receive frames
- "cansend can0 123#DEADBEEF" - send frames
Signed-off-by: Harald Mommer <harald.mommer@oss.qualcomm.com>
Co-developed-by: Harald Mommer <harald.mommer@oss.qualcomm.com>
Signed-off-by: Mikhail Golubev-Ciuchea <mikhail.golubev-ciuchea@oss.qualcomm.com>
Co-developed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Damir Shaikhutdinov <Damir.Shaikhutdinov@opensynergy.com>
Reviewed-by: Francesco Valla <francesco@valla.it>
Tested-by: Francesco Valla <francesco@valla.it>
Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <ahXNb+KzuHYbS24+@fedora>
|
|
There is a spelling mistake in a struct description. Fix it.
Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260418-virtio-typo-v1-1-0df6f943a79d@ethancedwards.com>
|
|
When a filesystem is exported to NFS clients, NFSv4 state
(opens, locks, delegations, layouts) holds references that
prevent the underlying filesystem from being unmounted.
NFSD_CMD_UNLOCK_FILESYSTEM addresses this at superblock
granularity, but administrators unexporting a single path on a
shared filesystem (e.g., one of several exports on the same device)
need finer control.
Add NFSD_CMD_UNLOCK_EXPORT, which revokes NFSv4 state acquired
through exports of a specific path. Matching is by path identity
(dentry + vfsmount) via the sc_export field on each nfs4_stid,
so multiple svc_export objects for the same path -- one per
auth_domain -- are handled correctly without requiring the caller
to name a specific client.
The command takes a single "path" attribute. Userspace (exportfs
-u) sends this after removing the last client for a given path,
enabling the underlying filesystem to be unmounted. When multiple
clients share an export path, individual unexports do not trigger
state revocation; only the final one does.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Add NFSD_CMD_UNLOCK_FILESYSTEM as a dedicated netlink command for
revoking NFS state under a filesystem path, providing a netlink
equivalent of /proc/fs/nfsd/unlock_fs.
The command requires a "path" string attribute containing the
filesystem path whose state should be released. The handler
resolves the path to its superblock, then cancels async copies,
releases NLM locks, and revokes NFSv4 state on that superblock.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The existing write_unlock_ip procfs interface releases NLM file
locks held by a specific client IP address, but procfs provides
no structured way to extend that operation to other scopes such as
revoking NFSv4 state.
Add NFSD_CMD_UNLOCK_IP as a dedicated netlink command for
releasing NLM locks by client address. The command accepts a
binary sockaddr_in or sockaddr_in6 in its address attribute.
The handler validates the address family and length, then calls
nlmsvc_unlock_all_by_ip() to release matching NLM locks. Because
lockd is a single global instance, that call operates across
all network namespaces regardless of which namespace the caller
inhabits.
A separate netlink command for filesystem-scoped unlock is added in
a subsequent commit.
The nfsd_ctl_unlock_ip tracepoint is updated from string-based
address logging to __sockaddr, which stores the binary sockaddr
and formats it with %pISpc. This affects both the new netlink path
and the existing procfs write_unlock_ip path, giving consistent
structured output in both cases.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Add KVM_CAP_S390_HPAGE_2G to signal to userspace that 2G hugepages may
be used to back the guest; restrictions apply similar to 1M hugepages.
Enable the (for now still ignored) GMAP_FLAG_ALLOW_HPAGE_2G flag for
the guest gmap, and propagate / disable it as necessary.
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260609150930.665370-3-imbrenda@linux.ibm.com>
|
|
Because LLC wasn't complicated/annoying enough, there's 2 more
"ethertypes" being used for it:
- 0x8870 is pretty "normal", it got standardized in
802.1AC-2016/Cor1-2018 for transporting LLC frames > 1500 bytes.
It simply replaces the length value (which is no longer encoded, and
must now be derived from the packet.) The actual value dates back to
2001; https://datatracker.ietf.org/doc/html/draft-ietf-isis-ext-eth-01
(it was used without "proper" standardization for a long time)
- 0x00fe is a doozy - actually "invalid" depending on how you look at
it; it's used in GRE (and possibly GENEVE) tunnels to transport the
IS-IS routing protocol. https://seclists.org/tcpdump/2002/q4/61 is
the best/oldest source I could find. It's inspired by the 0xfe SAP
value, a GRE packet with protocol 0x00fe is followed by a payload "as
if" it was Ethernet with "<length> 0xfe 0xfe 0x03". (Again the length
isn't encoded explicitly anymore.)
The 0x00fe value is quite close to other values the kernel is using
internally for various things (after all they "won't clash for 1500
types"). Except this one does clash, and if someone unknowingly starts
using it for something internal... we end up in a world of pain in
getting IS-IS running on GRE tunnels. Hence the "WARNING".
Signed-off-by: David 'equinox' Lamparter <equinox@diac24.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Link: https://patch.msgid.link/20260605164144.81184-1-equinox@diac24.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches, all by
Sven Eckelmann:
- tp_meter: initialize last_recv_time during init
- convert cancellation of work items to disable helper
- clean up wifi detection cache (3 patches)
- clean up kernel-doc: corrections, reword, typos (6 patches)
* tag 'batadv-next-pullrequest-20260605' of https://git.open-mesh.org/batadv:
batman-adv: fix kernel-doc typos and grammar errors
batman-adv: fix batadv_v_ogm_packet_recv error handling kernel-doc
batman-adv: uapi: keep kernel-doc in struct member order
batman-adv: bla: update stale kernel-doc
batman-adv: tp_meter: update stale kernel-doc after refactoring
batman-adv: correct batadv_wifi_* kernel-doc
batman-adv: document cleanup of batadv_wifi_net_devices entries
batman-adv: use GFP_KERNEL allocations for the wifi detection cache
batman-adv: drop duplicated wifi_flags assignments
batman-adv: convert cancellation of work items to disable helper
batman-adv: tp_meter: initialize last_recv_time during init
====================
Link: https://patch.msgid.link/20260605072005.490368-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add comments similar to those in include/linux/watchdog.h
so that the reader/user doesn't have to dig into the API documentation
files for this.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add a new unprivileged BTRFS_IOC_GET_CSUMS ioctl, which can be used to
query the on-disk csums for a file range.
The ioctl is deliberately per-file rather than exposing raw csum tree
lookups, to avoid leaking information to users about files they may not
have access to.
This is done by userspace passing a struct btrfs_ioctl_get_csums_args to
the kernel, which details the offset and length we're interested in, and
a buffer for the kernel to write its results into. The kernel writes a
struct btrfs_ioctl_get_csums_entry into the buffer, followed by the
csums if available. The maximum size of the user buffer is capped to
16MiB.
If the extent is an uncompressed, non-NODATASUM extent, the kernel sets
the entry type to BTRFS_GET_CSUMS_HAS_CSUMS and follows it with the
csums. If it is sparse, preallocated, or beyond the EOF, it sets the
type to BTRFS_GET_CSUMS_ZEROED - this is so userspace knows it can use
the precomputed hash of the zero sector. Otherwise, it sets the type to
BTRFS_GET_CSUMS_NODATASUM, BTRFS_GET_CSUMS_COMPRESSED,
BTRFS_GET_CSUM_ENCRYPTED, or BTRFS_GET_CSUM_INLINE.
For example, a file with a [0, 4K) hole and [4K, 12K) data extent would
produce the following output buffer:
| [0, 4K) ZEROED | [4K, 12K) HAS_CSUMS | csum data |
We do store the csums of compressed extents, but we deliberately don't
return them here: they're calculated over the compressed data, not the
uncompressed data that's returned to userspace. Similarly for encrypted
data, once encryption is supported, in which the csums will be on the
ciphertext.
The main use case for this is for speeding up mkfs.btrfs --rootdir. For
the case when the source FS is btrfs and using the same csum algorithm,
we can avoid having to recalculate the csums - in my synthetic
benchmarks (16GB file on a spinning-rust drive), this resulted in a ~11%
speed-up (218s to 196s).
When using the --reflink option added in btrfs-progs v6.16.1, we can forgo
reading the data entirely, resulting a ~2200% speed-up on the same test
(128s to 6s).
# mkdir rootdir
# dd if=/dev/urandom of=rootdir/file bs=4096 count=4194304
(without ioctl)
# echo 3 > /proc/sys/vm/drop_caches
# time mkfs.btrfs --rootdir rootdir testimg
...
real 3m37.965s
user 0m5.496s
sys 0m6.125s
# echo 3 > /proc/sys/vm/drop_caches
# time mkfs.btrfs --rootdir rootdir --reflink testimg
...
real 2m8.342s
user 0m5.472s
sys 0m1.667s
(with ioctl)
# echo 3 > /proc/sys/vm/drop_caches
# time mkfs.btrfs --rootdir rootdir testimg
...
real 3m15.865s
user 0m4.258s
sys 0m6.261s
# echo 3 > /proc/sys/vm/drop_caches
# time mkfs.btrfs --rootdir rootdir --reflink testimg
...
real 0m5.847s
user 0m2.899s
sys 0m0.097s
Another notable use case is for deduplication, where reading the
checksums may serve as a hint instead of reading the whole file data.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|