| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "selftests/mm: clean up build output and verbosity" (Li Wang)
Remove some noise from the MM selftests build
- "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts)
Speed up the freeing of a batch of 0-order pages by first scanning
them for coalescing opportunities. This is applicable to vfree() and
to the releasing of frozen pages
- "mm/damon: introduce DAMOS failed region quota charge ratio"
(SeongJae Park)
Address a DAMOS usability issue: The DAMOS quota often exhausts
prematurely because it charges for all memory attempted, causing slow
and inconsistent performance when actions fail on unreclaimable
memory.
To fix this, a new feature lets users set a smaller, flexible quota
charge ratio (via a numerator and denominator) for failed regions.
Since failed actions cause less overhead, reducing their quota cost
ensures more predictable and efficient DAMOS processing
- "selftests/cgroup: improve zswap tests robustness and support large
page sizes" (Li Wang)
Fix various spurious failures and improves the overall robustness of
the cgroup zswap selftests
- "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga)
Fix an issue in the mlock selftests on arm32
- "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao)
Some maintenance work in the huge_memory code
- "treewide: fixup gfp_t printks" (Brendan Jackman)
Use the special vprintf() gfp_t conversion in various places
- "mm: Fix vmemmap optimization accounting and initialization" (Muchun
Song)
Fix several bugs in the vmemmap optimization, mainly around incorrect
page accounting and memmap initialization in the DAX and memory
hotplug paths. It also fixes pageblock migratetype initialization and
struct page initialization for ZONE_DEVICE compound pages
- "mm/damon: repost non-hotfix reviewed patches in damon/next tree"
A sprinkle of unrelated minor bugfixes for DAMON
- "mm: remove page_mapped()" (David Hildenbrand)
Remove this function from the tree, replacing it with folio_mapped()
- "mm/damon: let DAMON be paused and resumed" (SeongJae Park)
Allow DAMON to be paused and resumed without losing its current state
- "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad
Usama Anjum)
Simplify and speed up kasan by removing its ineffective tagging of
stacks and page tables
- "mm/damon/reclaim,lru_sort: monitor all system rams by default"
(SeongJae Park)
Simplify deployment on diverse hardware like NUMA systems by updating
DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the
physical address range covering all System RAM areas by default,
replacing the overly restrictive behavior that only targeted the
single largest memory block to save on negligible overhead
- "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae
Park)
Update some DAMON docs
- "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin)
Switch zone->lock handling over to using the guard() mechanisms
- "mm/filemap: tighten mmap_miss hit accounting" (fujunjie)
Fix a flaw where the mmap_miss counter over-credited page cache hits
during fault-arounds and page-fault retries. This results in
significant reduction of redundant synchronous mmap readahead I/O,
drastically cutting down execution time and gigabytes read for sparse
random or strided memory access workloads
- "selftests/cgroup: Fix false positive failures in test_percpu_basic"
(Li Wang)
Fix a couple of false-positives in the cgroup kmem selftests
- "mm/damon/reclaim: support monitoring intervals auto-tuning"
(SeongJae Park)
Add a new parameter to DAMON permitting DAMON_RECLAIM to
automatically tune DAMON's sampling and aggregation intervals
- "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park)
Change DAMON_STAT to provide the pid of its kdamond
- "mm/kmemleak: dedupe verbose scan output" (Breno Leitao)
Remove large amounts of duplicated backtraces from the verbose-mode
kmemleak output
- "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David
Hildenbrand)
Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to
removing it entirely in a later series
- "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan)
Prevent users from passing a non-power-of-2 value of `addr_unit', as
this later results in undesirable behavior
- "mm: document read_pages and simplify usage" (Frederick Mayle)
- "tools/mm/page-types: Fix misc bugs" (Ye Liu)
Fix three issues in tools/mm/page-types.c
- "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman)
Implement several cleanups in the page allocator and related code
- "mm, swap: swap table phase IV: unify allocation" (Kairui Song)
Unify the allocation and charging of anon and shmem swap in folios,
provides better synchronization, consolidates the metadata
management, hence dropping the static array and map, and improves
performance
- "mm/damon: introduce data attributes monitoring" (SeongJae Park(
Extend DAMON to monitor general data attributes other than accesses
- "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra)
Implement the TODO in vrealloc() to unmap and free unused pages when
shrinking across a page boundary
- "mm/damon: documentation and comment fixes" (niecheng)
- "remove mmap_action success, error hooks" (Lorenzo Stoakes)
Eliminate custom hooks from mmap_action by removing the problematic
success_hook which allowed drivers to improperly access uninitialized
VMAs. It replaces the error_hook with a simple error-code field and
updates the memory char driver accordingly
- "mm/damon: minor improvements for code readability and tests"
(SeongJae Park)
- "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym
Shcherba)
- "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike
Rapoport)
- "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and
others)
Clean up and slightly improves MGLRU's reclaim loop and dirty
writeback handling. Large performance improvements are measured
- "use vma locks for proc/pid/{smaps|numa_maps} reads" (Suren
Baghdasaryan)
Use per-vma locks when reading /proc/pid/smaps and numa_maps similar
to reduce contention on central mmap_lock
- "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()"
(Ran Xiaokai)
Some cleanup work in the THP code
- "selftests/memfd: fix compilation warnings" (Konstantin Khorenko)
Fix a few build glitches in the memfd selftest code.
- "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel
Butt)
Resolve a 68% performance regression caused by NUMA-node cache
thrashing around struct obj_stock_pcp by shrinking its existing
fields and expanding it into a multi-slot array that caches up to
five obj_cgroup pointers per CPU, allowing per-node variants of the
same memcg to coexist within a single 64-byte cache line.
- "zram: writeback fixes" (Sergey Senozhatsky)
address a couple of unrelated zram writeback issues
- "mm: switch THP shrinker to list_lru" (Johannes Weiner)
Resolve NUMA-awareness issues and streamlines callsite interaction by
refactoring and extending the list_lru API to completely replace the
complex, open-coded deferred split queue for Transparent Huge Pages
- "mm: improve large folio readahead for exec memory" (Usama Arif)
Improve large-folio readahead on systems like 64K-page arm64 by
preventing the mmap_miss check from permanently disabling
target-oriented VM_EXEC readahead, and by generalizing the
force_thp_readahead gate to support mappings with any usefully large
maximum folio order under the cache cap.
- "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau)
Fix a bunch of minor issues in the userfaultfd/pagemap, all of which
were flagged by Sashiko review of proposed new material
- "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and
vmemmap_check_pmd()" (Muchun Song)
Provide generic versions of these two functions so the four
arch-specific implementations can be removed.
- "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device" (Youngjun Park)
Address a uswsusp-vs-swapoff race and reduces the swap device
reference taking/releasing frequency.
- "mm/hmm: A fix and a selftest" (Dev Jain)
* tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries
fs/proc/task_mmu: do not warn on seeing non-migration pmd entry
lib/test_hmm: check alloc_page_vma() return value and handle OOM
mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX
mm/swap: remove redundant swap device reference in alloc/free
mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
mm/filemap: use folio_next_index() for start
vmalloc: fix NULL pointer dereference in is_vm_area_hugepages()
sparc/mm: drop vmemmap_check_pmd helper and use generic code
loongarch/mm: drop vmemmap_check_pmd helper and use generic code
riscv/mm: drop vmemmap_pmd helpers and use generic code
arm64/mm: drop vmemmap_pmd helpers and use generic code
mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd()
rust: page: mark Page::nid as inline
userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks
userfaultfd: gate must_wait writability check on pte_present()
mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade
fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole()
fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry()
fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Work on removing rtnl_lock protection throughout the stack
continues. In this chapter:
- don't use rtnl_lock for IPv6 multicast routing configuration
- don't take rtnl_lock in ethtool for modern drivers
- prepare Qdisc dump callbacks for rtnl_lock removal
- Support dumping just ifindex + name of all interfaces, under RCU.
It's a common operation for Netlink CLI tools (when translating
names to ifindexes) and previously required full rtnl_lock.
- Support dumping qdiscs and page pools for a specific netdev. Even
tho user space wants a dump of all netdevs, most of the time, the
OOO programming model results in repeating the dump for each
netdev. Which, in absence of a cache, leads to a O(n^2) behavior.
- Flush nexthops once on multi-nexthop removal (e.g. when device goes
down), another O(n^2) -> O(n) improvement.
- Rehash locally generated traffic to a different nexthop on
retransmit timeout.
- Honor oif when choosing nexthop for locally generated IPv6 traffic.
- Convert TCP Auth Option to crypto library, and drop non-RFC algos.
- Increase subflow limits in MPTCP to 64 and endpoint limit to 256.
- Support MPTCP signaling of IPv6 address + port (ADD_ADDR). We need
to selectively skip reporting of the standard TCP Timestamp option,
because they won't fit into the header space together (12 + 30 >
40).
- Support using bridge neighbor suppression, Duplicate Address
Detection, Gratuitous ARP and unsolicited NA forwarding - in EVPN
deployments, e.g. VXLAN fabrics (IPv4 and IPv6).
- Improve link state reporting for upper netdevs (e.g. macvlan) over
tunnel devices (again, mostly for EVPN deployments).
- Support binding GENEVE tunnels to a local address.
- Speed up UDP tunnel destruction (remove one synchronize_rcu()).
- Support exponential field encoding in multicast (IGMPv3 and MLDv2).
- Support attaching PSP crypto offload to containers (veth, netkit).
- Add a new IPSec Netlink message XFRM_MSG_MIGRATE_STATE that allows
migrating individual IPsec SAs independently of their policies.
The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA
migration, lacks SPI for unique SA identification, and cannot
express reqid changes or migrate Transport mode selectors.
The new interface identifies the SA via SPI and mark, supports
reqid changes, address family changes, encap removal, and uses an
atomic create+install flow under x->lock to prevent SN/IV reuse
during AEAD SA migration.
- Implement GRO/GSO support for PPPoE.
- Convert sockopt callbacks in a number of protocols to iov_iter.
Cross-tree stuff:
- Remove support for Crypto TFM cloning (unblocked after the TCP Auth
Option rework). This feature regressed performance for all crypto
API users, since it changed crypto transformation objects into
reference-counted objects.
- Add FCrypt-PCBC implementation to rxrpc and remove it from the
global crypto API as obsolete and insecure.
Wireless:
- Major rework of station bandwidth handling, fixing issues with
lower capability than AP.
- Cleanups for EMLSR spec issues (drafts differed).
- More Neighbor Awareness Networking (Wi-Fi Aware) work (multicast,
schedule improvements, multi-station etc.)
- Some Ultra High Reliability (UHR) / IEEE 802.11bn (D1.4) work
(e.g. non-primary channel access, UHR DBE support).
- Fine Timing Measurement ranging (i.e. distance measurement) APIs.
Netfilter:
- Use per-rule hash initval in nf_conncount. This avoids unnecessary
lock contention with short keys (e.g. conntrack zones) in different
namespaces.
- Various safety improvements, both in packet parsing and object
lifetimes. Notably add refcounts to conntrack timeout policy.
Deletions:
- Remove TLS + sockmap integration. TLS wants to pin user pages to
avoid a copy, and sockmap wants to write to the input stream. More
work on this integration is clearly needed, and we can't find any
users (original author admitted that they never deployed it).
- Remove support for TLS offload with TCP Offload Engine (the far
more common opportunistic offload is retained). The locking looks
unfixable (driver sleeps under TCP spin locks) and people from the
vendor that added this are AWOL.
- Remove more ATM code, trying to leave behind only what PPPoATM
needs, AAL5 and br2684 with permanent circuits.
- Remove AppleTalk. Let it join hamradio in our out of tree protocol
graveyard, I mean, repository.
- Disable 32-bit x_tables compatibility (32bit binaries on 64bit
kernel) interface in user namespaces. To be deleted completely,
soon.
- Remove 5/10 MHz support from cfg80211/mac80211.
Drivers:
- Software:
- Support DEVMEM/DMABUF Tx over NETMEM_TX_NO_DMA devices (netkit)
- bonding: add knob to strictly follow 802.3ad for link state
- New drivers:
- Alibaba Elastic Ethernet Adaptor (cloud vNIC).
- NXP NETC switch within i.MX94.
- DPLL:
- Add operational state to pins (implement in zl3073x).
- Add generic DPLL type, for daisy-chaining DPLLs (implement in ice).
- Ethernet high-speed NICs:
- Huawei (hinic3):
- enhance tc flow offload support with queue selection,
tunnels
- nVidia/Mellanox:
- avoid over-copying payload to the skb's linear part (up to
60% win for LRO on slow CPUs like ARM64 V2)
- expose more per-queue stats over the standard API
- support additional, unprivileged PFs in the DPU
configuration
- support Socket Direct (multi-PF) with switchdev offloads
- add a pool / frag allocator for DMA mapped buffers for
control objects, save memory on systems with 64kB page size
- take advantage of the ability to dynamically change RSS
table size, even when table is configured by the user
- increase the max RSS table size for even traffic
distribution
- Ethernet NICs:
- Marvell/Aquantia:
- AQC113 PTP support
- Realtek USB (r8152):
- support 10Gbit Link Speeds and Energy-Efficient Ethernet
(EEE)
- support firmware loaded (for RTL8157/RTL8159)
- support for the RTL8159
- Intel (ixgbe):
- support Energy-Efficient Ethernet (EEE) on E610 devices
- Ethernet switches:
- Airoha:
- support multiple netdevs on a single GDM block / port
- Marvell (mv88e6xxx):
- support SERDES of mv88e6321
- Microchip (ksz8/9):
- rework the driver callbacks to remove one indirection layer
- Motorcomm (yt921x):
- support port rate policing
- support TBF qdisc offload
- support ACL/flower offload
- nVidia/Mellanox:
- expose per-PG rx_discards
- Realtek:
- rtl8365mb: bridge offloading and VLAN support
- Ethernet PHYs:
- Airoha:
- support Airoha AN8801R Gigabit PHYs.
- Micrel:
- implement 3 low-loss cable tunables
- Realtek:
- support MDI swapping for RTL8226-CG
- support MDIO for RTL931x
- Qualcomm:
- at803x: Rx and Tx clock management for IPQ5018 PHY
- Motorcomm:
- support YT8522 100M RMII PHY
- set drive strength in YT8531s RGMII
- TI:
- dp83822: add optional external PHY clock
- Bluetooth:
- hci_sync: add support for HCI_LE_Set_Host_Feature [v2]
- SMP: use AES-CMAC library API
- Intel:
- support Product level reset
- support smart trigger dump
- Mediatek:
- add event filter to filter specific event
- Realtek:
- fix RTL8761B/BU broken LE extended scan
- WiFi:
- Broadcom (b43):
- new support for a 11n device
- MediaTek (mt76):
- support mt7927
- mt792x: broken usb transport detection
- mt7921: regulatory improvements
- Qualcomm (ath9k):
- GPIO interface improvements
- Qualcomm (ath12k):
- WDS support
- replace dynamic memory allocation in WMI Rx path
- thermal throttling/cooling device support
- 6 GHz incumbent interference detection
- channel 177 in 5 GHz
- Realtek (rt89):
- RTL8922AU support
- USB 3 mode switch for performance
- better monitor radiotap support
- RTL8922DE preparations"
* tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1778 commits)
ipv4: fib_rule: Move fib4_rules_exit() to ->exit().
net: serialize netif_running() check in enqueue_to_backlog()
net: skmsg: preserve sg.copy across SG transforms
appletalk: move the protocol out of tree
appletalk: stop storing per-interface state in struct net_device
selftests/bpf: test that TLS crypto is rejected on a sockmap socket
selftests/bpf: drop the unused kTLS program from test_sockmap
selftests/bpf: remove sockmap + ktls tests
tls: remove dead sockmap (psock) handling from the SW path
tls: reject the combination of TLS and sockmap
atm: remove orphaned uAPI for deleted drivers, protocols and SVCs
atm: remove unused ATM PHY operations
atm: remove the unused pre_send and send_bh device operations
atm: remove the unused change_qos device operation
atm: remove SVC socket support and the signaling daemon interface
atm: remove the local ATM (NSAP) address registry
atm: remove dead SONET PHY ioctls
atm: remove the unused send_oam / push_oam callbacks
atm: remove AAL3/4 transport support
net: dsa: sja1105: fix lastused timestamp in flower stats
...
|
|
AppleTalk has been removed in MacOS X 10.6 (Snow Leopard), in 2009,
according to Wikipedia. We recently got a burst of AI generated
fixes to this protocol which nobody is reviewing.
Let AppleTalk follow AX.25 and hamradio out of the Linux tree.
We we will maintain the code at: github.com/linux-netdev/mod-orphan
for anyone interested in playing with it.
Retain the uAPI for now. No strong reason, simply because I suspect
keeping it will be less controversial.
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://patch.msgid.link/20260615222935.947233-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
overcommit_memory is set to 0 or 1
Looking at __vm_enough_memory() in mm/util.c, user_reserve_kbytes has no
effect when overcommit_memory is set to 0 or 1. The documentation for
overcommit_memory already references user_reserve_kbytes when the flag
is set to 2.
Let's go ahead and add a clarification to user_reserve_kbytes in vm.rst
that it has no effect when overcommit_memory is set to 0 or 1.
Link: https://lore.kernel.org/20260528-mm-clarify-docs-v1-1-aa88e83b4bfd@redhat.com
Signed-off-by: Brian Masney <bmasney@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
These networking subsystems were removed in commit dd8d4bc28ad7
("net: remove ax25 and amateur radio (hamradio) subsystem"),
but the sysctl directory table still listed them.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260515180200.1490926-1-costa.shul@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "pid: make sub-init creation retryable" (Oleg Nesterov)
Make creation of init in a new namespace more robust by clearing away
some historical cruft which is no longer needed. Also some
documentation fixups
- "selftests/fchmodat2: Error handling and general" (Mark Brown)
Fix and a cleanup for the fchmodat2() syscall selftest
- "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)
- "hung_task: Provide runtime reset interface for hung task detector"
(Aaron Tomlin)
Give administrators the ability to zero out
/proc/sys/kernel/hung_task_detect_count
- "tools/getdelays: use the static UAPI headers from
tools/include/uapi" (Thomas Weißschuh)
Teach getdelays to use the in-kernel UAPI headers rather than the
system-provided ones
- "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)
Several cleanups and fixups to the hardlockup detector code and its
documentation
- "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)
A couple of small/theoretical fixes in the bch code
- "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)
- "cleanup the RAID5 XOR library" (Christoph Hellwig)
A quite far-reaching cleanup to this code. I can't do better than to
quote Christoph:
"The XOR library used for the RAID5 parity is a bit of a mess right
now. The main file sits in crypto/ despite not being cryptography
and not using the crypto API, with the generic implementations
sitting in include/asm-generic and the arch implementations
sitting in an asm/ header in theory. The latter doesn't work for
many cases, so architectures often build the code directly into
the core kernel, or create another module for the architecture
code.
Change this to a single module in lib/ that also contains the
architecture optimizations, similar to the library work Eric
Biggers has done for the CRC and crypto libraries later. After
that it changes to better calling conventions that allow for
smarter architecture implementations (although none is contained
here yet), and uses static_call to avoid indirection function call
overhead"
- "lib/list_sort: Clean up list_sort() scheduling workarounds"
(Kuan-Wei Chiu)
Clean up this library code by removing a hacky thing which was added
for UBIFS, which UBIFS doesn't actually need
- "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)
Fix a few bugs in the scatterlist code, add in-kernel tests for the
now-fixed bugs and fix a leak in the test itself
- "kdump: Enable LUKS-encrypted dump target support in ARM64 and
PowerPC" (Coiby Xu)
Enable support of the LUKS-encrypted device dump target on arm64 and
powerpc
- "ocfs2: consolidate extent list validation into block read callbacks"
(Joseph Qi)
Cleanup, simplify, and make more robust ocfs2's validation of extent
list fields (Kernel test robot loves mounting corrupted fs images!)
* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
ocfs2: validate group add input before caching
ocfs2: validate bg_bits during freefrag scan
ocfs2: fix listxattr handling when the buffer is full
doc: watchdog: fix typos etc
update Sean's email address
ocfs2: use get_random_u32() where appropriate
ocfs2: split transactions in dio completion to avoid credit exhaustion
ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
ocfs2: validate extent block list fields during block read
ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
ocfs2: validate dx_root extent list fields during block read
ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
ocfs2: handle invalid dinode in ocfs2_group_extend
.get_maintainer.ignore: add Askar
ocfs2: validate bg_list extent bounds in discontig groups
checkpatch: exclude forward declarations of const structs
tools/accounting: handle truncated taskstats netlink messages
taskstats: set version in TGID exit notifications
ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Support HW queue leasing, allowing containers to be granted access
to HW queues for zero-copy operations and AF_XDP
- Number of code moves to help the compiler with inlining. Avoid
output arguments for returning drop reason where possible
- Rework drop handling within qdiscs to include more metadata about
the reason and dropping qdisc in the tracepoints
- Remove the rtnl_lock use from IP Multicast Routing
- Pack size information into the Rx Flow Steering table pointer
itself. This allows making the table itself a flat array of u32s,
thus making the table allocation size a power of two
- Report TCP delayed ack timer information via socket diag
- Add ip_local_port_step_width sysctl to allow distributing the
randomly selected ports more evenly throughout the allowed space
- Add support for per-route tunsrc in IPv6 segment routing
- Start work of switching sockopt handling to iov_iter
- Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid
buffer size drifting up
- Support MSG_EOR in MPTCP
- Add stp_mode attribute to the bridge driver for STP mode selection.
This addresses concerns about call_usermodehelper() usage
- Remove UDP-Lite support (as announced in 2023)
- Remove support for building IPv6 as a module. Remove the now
unnecessary function calling indirection
Cross-tree stuff:
- Move Michael MIC code from generic crypto into wireless, it's
considered insecure but some WiFi networks still need it
Netfilter:
- Switch nft_fib_ipv6 module to no longer need temporary dst_entry
object allocations by using fib6_lookup() + RCU.
Florian W reports this gets us ~13% higher packet rate
- Convert IPVS's global __ip_vs_mutex to per-net service_mutex and
switch the service tables to be per-net. Convert some code that
walks the service lists to use RCU instead of the service_mutex
- Add more opinionated input validation to lower security exposure
- Make IPVS hash tables to be per-netns and resizable
Wireless:
- Finished assoc frame encryption/EPPKE/802.1X-over-auth
- Radar detection improvements
- Add 6 GHz incumbent signal detection APIs
- Multi-link support for FILS, probe response templates and client
probing
- New APIs and mac80211 support for NAN (Neighbor Aware Networking,
aka Wi-Fi Aware) so less work must be in firmware
Driver API:
- Add numerical ID for devlink instances (to avoid having to create
fake bus/device pairs just to have an ID). Support shared devlink
instances which span multiple PFs
- Add standard counters for reporting pause storm events (implement
in mlx5 and fbnic)
- Add configuration API for completion writeback buffering (implement
in mana)
- Support driver-initiated change of RSS context sizes
- Support DPLL monitoring input frequency (implement in zl3073x)
- Support per-port resources in devlink (implement in mlx5)
Misc:
- Expand the YAML spec for Netfilter
Drivers
- Software:
- macvlan: support multicast rx for bridge ports with shared
source MAC address
- team: decouple receive and transmit enablement for IEEE 802.3ad
LACP "independent control"
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- support high order pages in zero-copy mode (for payload
coalescing)
- support multiple packets in a page (for systems with 64kB
pages)
- Broadcom 25-400GE (bnxt):
- implement XDP RSS hash metadata extraction
- add software fallback for UDP GSO, lowering the IOMMU cost
- Broadcom 800GE (bnge):
- add link status and configuration handling
- add various HW and SW statistics
- Marvell/Cavium:
- NPC HW block support for cn20k
- Huawei (hinic3):
- add mailbox / control queue
- add rx VLAN offload
- add driver info and link management
- Ethernet NICs:
- Marvell/Aquantia:
- support reading SFP module info on some AQC100 cards
- Realtek PCI (r8169):
- add support for RTL8125cp
- Realtek USB (r8152):
- support for the RTL8157 5Gbit chip
- add 2500baseT EEE status/configuration support
- Ethernet NICs embedded and off-the-shelf IP:
- Synopsys (stmmac):
- cleanup and reorganize SerDes handling and PCS support
- cleanup descriptor handling and per-platform data
- cleanup and consolidate MDIO defines and handling
- shrink driver memory use for internal structures
- improve Tx IRQ coalescing
- improve TCP segmentation handling
- add support for Spacemit K3
- Cadence (macb):
- support PHYs that have inband autoneg disabled with GEM
- support IEEE 802.3az EEE
- rework usrio capabilities and handling
- AMD (xgbe):
- improve power management for S0i3
- improve TX resilience for link-down handling
- Virtual:
- Google cloud vNIC:
- support larger ring sizes in DQO-QPL mode
- improve HW-GRO handling
- support UDP GSO for DQO format
- PCIe NTB:
- support queue count configuration
- Ethernet PHYs:
- automatically disable PHY autonomous EEE if MAC is in charge
- Broadcom:
- add BCM84891/BCM84892 support
- Micrel:
- support for LAN9645X internal PHY
- Realtek:
- add RTL8224 pair order support
- support PHY LEDs on RTL8211F-VD
- support spread spectrum clocking (SSC)
- Maxlinear:
- add PHY-level statistics via ethtool
- Ethernet switches:
- Maxlinear (mxl862xx):
- support for bridge offloading
- support for VLANs
- support driver statistics
- Bluetooth:
- large number of fixes and new device IDs
- Mediatek:
- support MT6639 (MT7927)
- support MT7902 SDIO
- WiFi:
- Intel (iwlwifi):
- UNII-9 and continuing UHR work
- MediaTek (mt76):
- mt7996/mt7925 MLO fixes/improvements
- mt7996 NPU support (HW eth/wifi traffic offload)
- Qualcomm (ath12k):
- monitor mode support on IPQ5332
- basic hwmon temperature reporting
- support IPQ5424
- Realtek:
- add USB RX aggregation to improve performance
- add USB TX flow control by tracking in-flight URBs
- Cellular:
- IPA v5.2 support"
* tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits)
net: pse-pd: fix kernel-doc function name for pse_control_find_by_id()
wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit
wireguard: allowedips: remove redundant space
tools: ynl: add sample for wireguard
wireguard: allowedips: Use kfree_rcu() instead of call_rcu()
MAINTAINERS: Add netkit selftest files
selftests/net: Add additional test coverage in nk_qlease
selftests/net: Split netdevsim tests from HW tests in nk_qlease
tools/ynl: Make YnlFamily closeable as a context manager
net: airoha: Add missing PPE configurations in airoha_ppe_hw_init()
net: airoha: Fix VIP configuration for AN7583 SoC
net: caif: clear client service pointer on teardown
net: strparser: fix skb_head leak in strp_abort_strp()
net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete()
selftests/bpf: add test for xdp_master_redirect with bond not up
net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master
net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration
sctp: disable BH before calling udp_tunnel_xmit_skb()
sctp: fix missing encap_port propagation for GSO fragments
net: airoha: Rely on net_device pointer in ETS callbacks
...
|
|
Pull documentation updates from Jonathan Corbet:
"A busier cycle than I had expected for docs, including:
- Translations: some overdue updates to the Japanese translations,
Chinese translations for some of the Rust documentation, and the
beginnings of a Portuguese translation.
- New documents covering CPU isolation, managed interrupts, debugging
Python gbb scripts, and more.
- More tooling work from Mauro, reducing docs-build warnings, adding
self tests, improving man-page output, bringing in a proper C
tokenizer to replace (some of) the mess of kernel-doc regexes, and
more.
- Update and synchronize changes.rst and scripts/ver_linux, and put
both into alphabetical order.
... and a long list of documentation updates, typo fixes, and general
improvements"
* tag 'docs-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (162 commits)
Documentation: core-api: real-time: correct spelling
doc: Add CPU Isolation documentation
Documentation: Add managed interrupts
Documentation: seq_file: drop 2.6 reference
docs/zh_CN: update rust/index.rst translation
docs/zh_CN: update rust/quick-start.rst translation
docs/zh_CN: update rust/coding-guidelines.rst translation
docs/zh_CN: update rust/arch-support.rst translation
docs/zh_CN: sync process/2.Process.rst with English version
docs/zh_CN: fix an inconsistent statement in dev-tools/testing-overview
tracing: Documentation: Update histogram-design.rst for fn() handling
docs: sysctl: Add documentation for /proc/sys/xen/
Docs: hid: intel-ish-hid: make long URL usable
Documentation/kernel-parameters: fix architecture alignment for pt, nopt, and nobypass
sched/doc: Update yield_task description in sched-design-CFS
Documentation/rtla: Convert links to RST format
docs: fix typos and duplicated words across documentation
docs: fix typo in zoran driver documentation
docs: add an Assisted-by mention to submitting-patches.rst
Revert "scripts/checkpatch: add Assisted-by: tag validation"
...
|
|
Add documentation for the Xen hypervisor sysctl controls in
/proc/sys/xen/balloon/.
Documents the hotplug_unpopulated tunable (available when
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is enabled) which controls
whether unpopulated memory regions are automatically hotplugged
when the Xen balloon driver needs to reclaim memory.
The documentation is based on source code analysis of
drivers/xen/balloon.c.
Signed-off-by: Shubham Chakraborty <chakrabortyshubham66@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260304150419.16738-1-chakrabortyshubham66@gmail.com>
|
|
Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot. In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset once
an incident has been resolved. Furthermore, the previous implementation
relied upon implicit ordering, which could not strictly guarantee that
diagnostic metadata published by one CPU was visible to the panic logic on
another.
This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
has been updated to validate this input and atomically reset the counter.
The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace. The application of
atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
and provides the following guarantees:
1. Prevention of Load-Store Reordering via Acquire Semantics By
utilising atomic_long_read_acquire() to snapshot the counter
before initiating the task traversal, we establish a strict
memory barrier. This prevents the compiler or hardware from
reordering the initial load to a point later in the scan. Without
this "acquire" barrier, a delayed load could potentially read a
"0" value resulting from a userspace reset that occurred
mid-scan. This would lead to the subsequent cmpxchg succeeding
erroneously, thereby overwriting the user's reset with stale
increment data.
2. Atomicity of the "Commit" Phase via Release Semantics The
atomic_long_cmpxchg_release() serves as the transaction's commit
point. The "release" barrier ensures that all diagnostic
recordings and task-state observations made during the scan are
globally visible before the counter is incremented.
3. Race Condition Resolution This pairing effectively detects any
"out-of-band" reset of the counter. If
sysctl_hung_task_detect_count is modified via the procfs
interface during the scan, the final cmpxchg will detect the
discrepancy between the current value and the "acquire" snapshot.
Consequently, the update will fail, ensuring that a reset command
from the administrator is prioritised over a scan that may have
been invalidated by that very reset.
Link: https://lkml.kernel.org/r/20260303203031.4097316-3-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Joel Granados <joel.granados@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When no H2G transport is loaded, vsock currently routes all CIDs to the
G2H transport (commit 65b422d9b61b ("vsock: forward all packets to the
host when no H2G is registered"). Extend that existing behavior: when
an H2G transport is loaded but does not claim a given CID, the
connection falls back to G2H in the same way.
This matters in environments like Nitro Enclaves, where an instance may
run nested VMs via vhost-vsock (H2G) while also needing to reach sibling
enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old
code, any CID > 2 was unconditionally routed to H2G when vhost was
loaded, making those enclaves unreachable without setting
VMADDR_FLAG_TO_HOST explicitly on every connect.
Requiring every application to set VMADDR_FLAG_TO_HOST creates friction:
tools like socat, iperf, and others would all need to learn about it.
The flag was introduced 6 years ago and I am still not aware of any tool
that supports it. Even if there was support, it would be cumbersome to
use. The most natural experience is a single CID address space where H2G
only wins for CIDs it actually owns, and everything else falls through to
G2H, extending the behavior that already exists when H2G is absent.
To give user space at least a hint that the kernel applied this logic,
automatically set the VMADDR_FLAG_TO_HOST on the remote address so it
can determine the path taken via getpeername().
Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1).
At 0 it forces strict routing: H2G always wins for CID > VMADDR_CID_HOST,
or ENODEV if H2G is not loaded.
Signed-off-by: Alexander Graf <graf@amazon.com>
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add documentation for the /proc/sys/crypto and /proc/sys/debug
directories in the admin-guide. This includes tunables for FIPS
mode (fips_enabled, fips_name, fips_version), exception-trace,
and kprobes-optimization.
The documentation is based on source code analysis and addresses
stylistic feedback to keep it direct and concise.
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Shubham Chakraborty <chakrabortyshubham66@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260223203724.20874-1-chakrabortyshubham66@gmail.com>
|
|
Update the vsock child_ns_mode documentation to include the new
write-once semantics of setting child_ns_mode. The semantics are
implemented in a preceding patch in this series.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-3-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200"
* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: nfc: nci: Fix parameter validation for packet data
net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
net/mlx5: Fix misidentification of write combining CQE during poll loop
net/mlx5e: Fix misidentification of ASO CQE during poll loop
net/mlx5: Fix multiport device check over light SFs
bonding: alb: fix UAF in rlb_arp_recv during bond up/down
bnge: fix reserving resources from FW
eth: fbnic: Advertise supported XDP features.
rds: tcp: fix uninit-value in __inet_bind
net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
octeontx2-af: Fix default entries mcam entry action
net/mlx5e: XSK, Fix unintended ICOSQ change
ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
inet: move icmp_global_{credit,stamp} to a separate cache line
icmp: prevent possible overflow in icmp_global_allow()
selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
...
|
|
Add documentation for the vsock per-namespace sysctls (`ns_mode` and
`child_ns_mode`) to Documentation/admin-guide/sysctl/net.rst.
These sysctls were introduced by commit eafb64f40ca4 ("vsock: add
netns to vsock core").
Document the two namespace modes (`global` and `local`), the
inheritance behavior of `child_ns_mode`, and the restriction preventing
local namespaces from setting `child_ns_mode` to `global`.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260216163147.236844-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "powerpc/64s: do not re-activate batched TLB flush" makes
arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)
It adds a generic enter/leave layer and switches architectures to use
it. Various hacks were removed in the process.
- "zram: introduce compressed data writeback" implements data
compression for zram writeback (Richard Chang and Sergey Senozhatsky)
- "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
page ranges for hugepages. Large improvements during demand faulting
are demonstrated (David Hildenbrand)
- "memcg cleanups" tidies up some memcg code (Chen Ridong)
- "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
stats" improves DAMOS stat's provided information, deterministic
control, and readability (SeongJae Park)
- "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
issues in the hugetlb cgroup charging selftests (Li Wang)
- "Fix va_high_addr_switch.sh test failure - again" addresses several
issues in the va_high_addr_switch test (Chunyu Hu)
- "mm/damon/tests/core-kunit: extend existing test scenarios" improves
the KUnit test coverage for DAMON (Shu Anzai)
- "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
transiently return -EAGAIN (Shivank Garg)
- "arch, mm: consolidate hugetlb early reservation" reworks and
consolidates a pile of straggly code related to reservation of
hugetlb memory from bootmem and creation of CMA areas for hugetlb
(Mike Rapoport)
- "mm: clean up anon_vma implementation" cleans up the anon_vma
implementation in various ways (Lorenzo Stoakes)
- "tweaks for __alloc_pages_slowpath()" does a little streamlining of
the page allocator's slowpath code (Vlastimil Babka)
- "memcg: separate private and public ID namespaces" cleans up the
memcg ID code and prevents the internal-only private IDs from being
exposed to userspace (Shakeel Butt)
- "mm: hugetlb: allocate frozen gigantic folio" cleans up the
allocation of frozen folios and avoids some atomic refcount
operations (Kefeng Wang)
- "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
of memory betewwn the active and inactive LRUs and adds auto-tuning
of the ratio-based quotas and of monitoring intervals (SeongJae Park)
- "Support page table check on PowerPC" makes
CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)
- "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
nodes_and() and nodes_andnot() propagate the return values from the
underlying bit operations, enabling some cleanup in calling code
(Yury Norov)
- "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
some DAMON internal interfaces (SeongJae Park)
- "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
in khupaged and fixes a scan limit accounting issue (Shivank Garg)
- "mm: balloon infrastructure cleanups" goes to town on the balloon
infrastructure and its page migration function. Mainly cleanups, also
some locking simplification (David Hildenbrand)
- "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
additional tracepoints to the page reclaim code (Jiayuan Chen)
- "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
part of Marco's kernel-wide migration from the legacy workqueue APIs
over to the preferred unbound workqueues (Marco Crivellari)
- "Various mm kselftests improvements/fixes" provides various unrelated
improvements/fixes for the mm kselftests (Kevin Brodsky)
- "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
folio allocation, mainly by avoiding unnecessary work in
pfn_range_valid_contig() (Kefeng Wang)
- "selftests/damon: improve leak detection and wss estimation
reliability" improves the reliability of two of the DAMON selftests
(SeongJae Park)
- "mm/damon: cleanup kdamond, damon_call(), damos filter and
DAMON_MIN_REGION" does some cleanup work in the core DAMON code
(SeongJae Park)
- "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
performs maintenance work on the DAMON documentation (SeongJae Park)
- "mm: add and use vma_assert_stabilised() helper" refactors and cleans
up the core VMA code. The main aim here is to be able to use the mmap
write lock's lockdep state to perform various assertions regarding
the locking which the VMA code requires (Lorenzo Stoakes)
- "mm, swap: swap table phase II: unify swapin use" removes some old
swap code (swap cache bypassing and swap synchronization) which
wasn't working very well. Various other cleanups and simplifications
were made. The end result is a 20% speedup in one benchmark (Kairui
Song)
- "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
available on 64-bit alpha, loongarch, mips, parisc, and um. Various
cleanups were performed along the way (Qi Zheng)
* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
mm/memory: handle non-split locks correctly in zap_empty_pte_table()
mm: move pte table reclaim code to memory.c
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
um: mm: enable MMU_GATHER_RCU_TABLE_FREE
parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
zsmalloc: make common caches global
mm: add SPDX id lines to some mm source files
mm/zswap: use %pe to print error pointers
mm/vmscan: use %pe to print error pointers
mm/readahead: fix typo in comment
mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
mm: refactor vma_map_pages to use vm_insert_pages
mm/damon: unify address range representation with damon_addr_range
mm/cma: replace snprintf with strscpy in cma_new_area
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core & protocols:
- A significant effort all around the stack to guide the compiler to
make the right choice when inlining code, to avoid unneeded calls
for small helper and stack canary overhead in the fast-path.
This generates better and faster code with very small or no text
size increases, as in many cases the call generated more code than
the actual inlined helper.
- Extend AccECN implementation so that is now functionally complete,
also allow the user-space enabling it on a per network namespace
basis.
- Add support for memory providers with large (above 4K) rx buffer.
Paired with hw-gro, larger rx buffer sizes reduce the number of
buffers traversing the stack, dincreasing single stream CPU usage
by up to ~30%.
- Do not add HBH header to Big TCP GSO packets. This simplifies the
RX path, the TX path and the NIC drivers, and is possible because
user-space taps can now interpret correctly such packets without
the HBH hint.
- Allow IPv6 routes to be configured with a gateway address that is
resolved out of a different interface than the one specified,
aligning IPv6 to IPv4 behavior.
- Multi-queue aware sch_cake. This makes it possible to scale the
rate shaper of sch_cake across multiple CPUs, while still enforcing
a single global rate on the interface.
- Add support for the nbcon (new buffer console) infrastructure to
netconsole, enabling lock-free, priority-based console operations
that are safer in crash scenarios.
- Improve the TCP ipv6 output path to cache the flow information,
saving cpu cycles, reducing cache line misses and stack use.
- Improve netfilter packet tracker to resolve clashes for most
protocols, avoiding unneeded drops on rare occasions.
- Add IP6IP6 tunneling acceleration to the flowtable infrastructure.
- Reduce tcp socket size by one cache line.
- Notify neighbour changes atomically, avoiding inconsistencies
between the notification sequence and the actual states sequence.
- Add vsock namespace support, allowing complete isolation of vsocks
across different network namespaces.
- Improve xsk generic performances with cache-alignment-oriented
optimizations.
- Support netconsole automatic target recovery, allowing netconsole
to reestablish targets when underlying low-level interface comes
back online.
Driver API:
- Support for switching the working mode (automatic vs manual) of a
DPLL device via netlink.
- Introduce PHY ports representation to expose multiple front-facing
media ports over a single MAC.
- Introduce "rx-polarity" and "tx-polarity" device tree properties,
to generalize polarity inversion requirements for differential
signaling.
- Add helper to create, prepare and enable managed clocks.
Device drivers:
- Add Huawei hinic3 PF etherner driver.
- Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet
controller.
- Add ethernet driver for MaxLinear MxL862xx switches
- Remove parallel-port Ethernet driver.
- Convert existing driver timestamp configuration reporting to
hwtstamp_get and remove legacy ioctl().
- Convert existing drivers to .get_rx_ring_count(), simplifing the RX
ring count retrieval. Also remove the legacy fallback path.
- Ethernet high-speed NICs:
- Broadcom (bnxt, bng):
- bnxt: add FW interface update to support FEC stats histogram
and NVRAM defragmentation
- bng: add TSO and H/W GRO support
- nVidia/Mellanox (mlx5):
- improve latency of channel restart operations, reducing the
used H/W resources
- add TSO support for UDP over GRE over VLAN
- add flow counters support for hardware steering (HWS) rules
- use a static memory area to store headers for H/W GRO,
leading to 12% RX tput improvement
- Intel (100G, ice, idpf):
- ice: reorganizes layout of Tx and Rx rings for cacheline
locality and utilizes __cacheline_group* macros on the new
layouts
- ice: introduces Synchronous Ethernet (SyncE) support
- Meta (fbnic):
- adds debugfs for firmware mailbox and tx/rx rings vectors
- Ethernet virtual:
- geneve: introduce GRO/GSO support for double UDP encapsulation
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- some code refactoring and cleanups
- RealTek (r8169):
- add support for RTL8127ATF (10G Fiber SFP)
- add dash and LTR support
- Airoha:
- AN8811HB 2.5 Gbps phy support
- Freescale (fec):
- add XDP zero-copy support
- Thunderbolt:
- add get link setting support to allow bonding
- Renesas:
- add support for RZ/G3L GBETH SoC
- Ethernet switches:
- Maxlinear:
- support R(G)MII slow rate configuration
- add support for Intel GSW150
- Motorcomm (yt921x):
- add DCB/QoS support
- TI:
- icssm-prueth: support bridging (STP/RSTP) via the switchdev
framework
- Ethernet PHYs:
- Realtek:
- enable SGMII and 2500Base-X in-band auto-negotiation
- simplify and reunify C22/C45 drivers
- Micrel: convert bindings to DT schema
- CAN:
- move skb headroom content into skb extensions, making CAN
metadata access more robust
- CAN drivers:
- rcar_canfd:
- add support for FD-only mode
- add support for the RZ/T2H SoC
- sja1000: cleanup the CAN state handling
- WiFi:
- implement EPPKE/802.1X over auth frames support
- split up drop reasons better, removing generic RX_DROP
- additional FTM capabilities: 6 GHz support, supported number of
spatial streams and supported number of LTF repetitions
- better mac80211 iterators to enumerate resources
- initial UHR (Wi-Fi 8) support for cfg80211/mac80211
- WiFi drivers:
- Qualcomm/Atheros:
- ath11k: support for Channel Frequency Response measurement
- ath12k: a significant driver refactor to support multi-wiphy
devices and and pave the way for future device support in the
same driver (rather than splitting to ath13k)
- ath12k: support for the QCC2072 chipset
- Intel:
- iwlwifi: partial Neighbor Awareness Networking (NAN) support
- iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
- RealTek (rtw89):
- preparations for RTL8922DE support
- Bluetooth:
- implement setsockopt(BT_PHY) to set the connection packet type/PHY
- set link_policy on incoming ACL connections
- Bluetooth drivers:
- btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
- btqca: add WCN6855 firmware priority selection feature"
* tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits)
bnge/bng_re: Add a new HSI
net: macb: Fix tx/rx malfunction after phy link down and up
af_unix: Fix memleak of newsk in unix_stream_connect().
net: ti: icssg-prueth: Add optional dependency on HSR
net: dsa: add basic initial driver for MxL862xx switches
net: mdio: add unlocked mdiodev C45 bus accessors
net: dsa: add tag format for MxL862xx switches
dt-bindings: net: dsa: add MaxLinear MxL862xx
selftests: drivers: net: hw: Modify toeplitz.c to poll for packets
octeontx2-pf: Unregister devlink on probe failure
net: renesas: rswitch: fix forwarding offload statemachine
ionic: Rate limit unknown xcvr type messages
tcp: inet6_csk_xmit() optimization
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
ipv6: use np->final in inet6_sk_rebuild_header()
ipv6: add daddr/final storage in struct ipv6_pinfo
net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup()
...
|
|
Pull documentation updates from Jonathan Corbet:
"A slightly calmer cycle for docs this time around, though there is
still a fair amount going on, including:
- Some signs of life on the long-moribund Japanese translation
- Documentation on policies around the use of generative tools for
patch submissions, and a separate document intended for consumption
by generative tools
- The completion of the move of the documentation tools to
tools/docs. For now we're leaving a /scripts/kernel-doc symlink
behind to avoid breaking scripts
- Ongoing build-system work includes the incorporation of
documentation in Python code, better support for documenting
variables, and lots of improvements and fixes
- Automatic linking of man-page references -- cat(1), for example --
to the online pages in the HTML build
...and the usual array of typo fixes and such"
* tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (107 commits)
doc: development-process: add notice on testing
tools: sphinx-build-wrapper: improve its help message
docs: sphinx-build-wrapper: allow -v override -q
docs: kdoc: Fix pdfdocs build for tools
docs: ja_JP: process: translate 'Obtain a current source tree'
docs: fix 're-use' -> 'reuse' in documentation
docs: ioctl-number: fix a typo in ioctl-number.rst
docs: filesystems: ensure proc pid substitutable is complete
docs: automarkup.py: Skip common English words as C identifiers
Documentation: use a source-read extension for the index link boilerplate
docs: parse_features: make documentation more consistent
docs: add parse_features module documentation
docs: jobserver: do some documentation improvements
docs: add jobserver module documentation
docs: kabi: helpers: add documentation for each "enum" value
docs: kabi: helpers: add helper for debug bits 7 and 8
docs: kabi: system_symbols: end docstring phrases with a dot
docs: python: abi_regex: do some improvements at documentation
docs: python: abi_parser: do some improvements at documentation
docs: add kabi modules documentation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs initrd removal from Christian Brauner:
"Remove the deprecated linuxrc-based initrd code path and related dead
code. The linuxrc initrd path was deprecated in 2020 and this series
completes its removal. If we see real-life regressions we'll revert.
The core change removes handle_initrd() and init_linuxrc() — the
entire flow that ran /linuxrc from an initrd, pivoted roots, and
handed off to the real root filesystem. With that gone, initrd_load()
becomes void (no longer short-circuits prepare_namespace()),
rd_load_image() is simplified to always load /initrd.image instead of
taking a path, and rd_load_disk() is deleted.
The /proc/sys/kernel/real-root-dev sysctl and its backing variable are
removed since they only existed for linuxrc to communicate the real
root device back to the kernel.
The no-op load_ramdisk= and prompt_ramdisk= parameters are dropped,
and noinitrd and ramdisk_start= gain deprecation warnings.
Initramfs is entirely unaffected. The non-linuxrc initrd path
(root=/dev/ram0) is preserved but now carries a deprecation warning
targeting January 2027 removal"
* tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
init: remove /proc/sys/kernel/real-root-dev
initrd: remove deprecated code path (linuxrc)
init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
|
|
Cross-merge networking fixes after downstream PR (net-6.19-rc8).
No adjacent changes, conflicts:
drivers/net/ethernet/spacemit/k1_emac.c
2c84959167d64 ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()")
f66086798f91f ("net: spacemit: Remove broken flow control support")
https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix the the buggy conversion of fuse_reverse_inval_entry() introduced
during the creation rework
- Disallow nfs delegation requests for directories by setting
simple_nosetlease()
- Require an opt-in for getting readdir flag bits outside of S_DT_MASK
set in d_type
- Fix scheduling delayed writeback work by only scheduling when the
dirty time expiry interval is non-zero and cancel the delayed work if
the interval is set to zero
- Use rounded_jiffies_interval for dirty time work
- Check the return value of sb_set_blocksize() for romfs
- Wait for batched folios to be stable in __iomap_get_folio()
- Use private naming for fuse hash size
- Fix the stale dentry cleanup to prevent a race that causes a UAF
* tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: document d_dispose_if_unused()
fuse: shrink once after all buckets have been scanned
fuse: clean up fuse_dentry_tree_work()
fuse: add need_resched() before unlocking bucket
fuse: make sure dentry is evicted if stale
fuse: fix race when disposing stale dentries
fuse: use private naming for fuse hash size
writeback: use round_jiffies_relative for dirtytime_work
iomap: wait for batched folios to be stable in __iomap_get_folio
romfs: check sb_set_blocksize() return value
docs: clarify that dirtytime_expire_seconds=0 disables writeback
writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
readdir: require opt-in for d_type flags
vboxsf: don't allow delegations to be set on directories
ceph: don't allow delegations to be set on directories
gfs2: don't allow delegations to be set on directories
9p: don't allow delegations to be set on directories
smb/client: properly disallow delegations on directories
nfs: properly disallow delegation requests on directories
fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
|
|
NETDEV_RSS_KEY_LEN has been set to 52 bytes in 2014, until now.
Jakub suggested we bump the size to 128 bytes or more.
Some drivers (like idpf) were already working around the core limit.
Since this change might cause some issues in admin scripts,
bump it directly to 256 in one go.
tjbp26:~# cat /proc/sys/net/core/netdev_rss_key | wc -c
768
tjbp26:~# ethtool -x eth1
RX flow hash indirection table for eth1 with 32 RX ring(s):
...
RSS hash key:
fe:16:5b:2f:93:85:c2:c9:c1:ef:bd:60:c6:e0:2b:99:4d:bf:b7:14:c8:1e:8d:cb:31:17:51:da:55:eb:91:d9:9e:f9:89:9b:44:a1:dc:08:72:3a:b3:d6:31:86:9a:fe:02:3a:0d:eb:a1:7c:f5:a3:51:3b:08:56:c9:3f:71:69:01:ba:70:38
RSS hash function:
toeplitz: on
xor: off
crc32: off
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20260122075206.504ec591@kernel.org/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260122190349.2771064-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reintroduces a concept removed by: commit d6cb41cc44c6 ("mm, hugetlb:
remove hugepages_treat_as_movable sysctl")
This sysctl provides flexibility between ZONE_MOVABLE use cases:
1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
2) onlining memory in ZONE_MOVABLE to make hugepage allocate reliable
When ZONE_MOVABLE is used to make huge page allocation more reliable,
disallowing gigantic pages memory in this region is pointless. If hotplug
is not a requirement, we can loosen the restrictions to allow 1GB gigantic
pages in ZONE_MOVABLE.
Since 1GB can be difficult to migrate / has impacts on compaction /
defragmentation, we don't enable this by default. Notably, 1GB pages can
only be migrated if another 1GB page is available - so hot-unplug will
fail if such a page cannot be found.
However, since there are scenarios where gigantic pages are migratable, we
should allow use of these on movable regions.
When not valid 1GB is available for migration, hot-unplug will retry
indefinitely (or until interrupted). For example:
echo 0 > node0/hugepages/..-1GB/nr_hugepages # clear node0 1GB pages
echo 1 > node1/hugepages/..-1GB/nr_hugepages # reserve node1 1GB page
./alloc_huge_node1 & # Allocate a 1GB page on node1
./node1_offline & # attempt to offline all node1 memory
echo 1 > node0/hugepages/..-1GB/nr_hugepages # reserve node0 1GB page
In this example, node1_offline will block indefinitely until the final
step, when a node0 1GB page is made available.
Note: Boot-time CMA is not possible for driver-managed hotplug memory, as
CMA requires the memory to be registered as SystemRAM at boot time.
Additionally, 1GB huge pages are not supported by THP.
Link: https://lkml.kernel.org/r/20251221125603.2364174-1-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Suggested-by: David Rientjes <rientjes@google.com>
Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Laptop mode was introduced to save battery, by delaying and consolidating
writes and thereby maximize the time rotating hard drives wouldn't have to
spin.
Luckily, rotating hard drives, with their high spin-up times and power
draw, are a thing of the past for battery-powered devices. Reclaim has
also since changed to not write single filesystem pages anymore, and
regular filesystem writeback is lumpy by design.
The juice doesn't appear worth the squeeze anymore. The footprint of the
feature is small, but nevertheless it's a complicating factor in mm,
block, filesystems. Developers don't think about it, and it likely hasn't
been tested with new reclaim and writeback changes in years.
Let's sunset it. Keep the sysctl with a deprecation warning around for a
few more cycles, but remove all functionality behind it.
[akpm@linux-foundation.org: fix Documentation/admin-guide/laptops/index.rst]
Link: https://lkml.kernel.org/r/20251216185201.GH905277@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When CONFIG_MEM_ALLOC_PROFILING_DEBUG=y, /proc/sys/vm/mem_profiling is
read-only to avoid debug warnings in a scenario when an allocation is
made while profiling is disabled (allocation does not get an allocation
tag), then profiling gets enabled and allocation gets freed (warning due
to the allocation missing allocation tag).
Link: https://lkml.kernel.org/r/20260116184423.2708363-1-surenb@google.com
Fixes: ebdf9ad4ca98 ("memprofiling: documentation")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
vsprintf.c uses a mix of the `kernel.kptr_restrict` sysctl and the
`hash_pointers` boot param to control pointer hashing. But that wasn't
possible to tell without looking at the source code.
They have a different focus and purpose. To avoid wasting the time of
users trying to use one instead of the other, simply have them reference
each other in the Documentation.
Signed-off-by: Marc Herbert <marc.herbert@linux.intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260107-doc-hash-ptr-v2-1-cb4c161218d7@linux.intel.com>
|
|
In blamed commit, I added a check against the temporary queue
built in __dev_xmit_skb(). Idea was to drop packets early,
before any spinlock was acquired.
if (unlikely(defer_count > READ_ONCE(q->limit))) {
kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
return NET_XMIT_DROP;
}
It turned out that HTB Qdisc has a zero q->limit.
HTB limits packets on a per-class basis.
Some of our tests became flaky.
Add a new sysctl : net.core.qdisc_max_burst to control
how many packets can be stored in the temporary lockless queue.
Also add a new QDISC_BURST_DROP drop reason to better diagnose
future issues.
Thanks Neal !
Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption")
Reported-and-bisected-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
It is not used anymore.
Signed-off-by: Askar Safin <safinaskar@gmail.com>
Link: https://patch.msgid.link/20251119222407.3333257-4-safinaskar@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Document that setting vm.dirtytime_expire_seconds to zero disables
periodic dirtytime writeback, matching the behavior of the related
dirty_writeback_centisecs sysctl which already documents this.
Signed-off-by: Laveesh Bansal <laveeshb@laveeshbansal.com>
Link: https://patch.msgid.link/20260106145059.543282-3-laveeshb@laveeshbansal.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
fixes a build issue and does some cleanup in ib/sys_info.c
- "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
enhances the 64-bit math code on behalf of a PWM driver and beefs up
the test module for these library functions
- "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
makes BPF symbol names, sizes, and line numbers available to the GDB
debugger
- "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
adds a sysctl which can be used to cause additional info dumping when
the hung-task and lockup detectors fire
- "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
adds a general base64 encoder/decoder to lib/ and migrates several
users away from their private implementations
- "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
makes TCP a little faster
- "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
reworks the KEXEC Handover interfaces in preparation for Live Update
Orchestrator (LUO), and possibly for other future clients
- "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
increases the flexibility of KEXEC Handover. Also preparation for LUO
- "Live Update Orchestrator" (Pasha Tatashin)
is a major new feature targeted at cloud environments. Quoting the
cover letter:
This series introduces the Live Update Orchestrator, a kernel
subsystem designed to facilitate live kernel updates using a
kexec-based reboot. This capability is critical for cloud
environments, allowing hypervisors to be updated with minimal
downtime for running virtual machines. LUO achieves this by
preserving the state of selected resources, such as memory,
devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving
memfd file descriptors, which allows critical in-memory data, such
as guest RAM or any other large memory region, to be maintained in
RAM across the kexec reboot.
Mike Rappaport merits a mention here, for his extensive review and
testing work.
- "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
moves the kexec and kdump sysfs entries from /sys/kernel/ to
/sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day
- "kho: fixes for vmalloc restoration" (Mike Rapoport)
fixes a BUG which was being hit during KHO restoration of vmalloc()
regions
* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
calibrate: update header inclusion
Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
vmcoreinfo: track and log recoverable hardware errors
kho: fix restoring of contiguous ranges of order-0 pages
kho: kho_restore_vmalloc: fix initialization of pages array
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
init: replace simple_strtoul with kstrtoul to improve lpj_setup
KHO: fix boot failure due to kmemleak access to non-PRESENT pages
Documentation/ABI: new kexec and kdump sysfs interface
Documentation/ABI: mark old kexec sysfs deprecated
kexec: move sysfs entries to /sys/kernel/kexec
test_kho: always print restore status
kho: free chunks using free_page() instead of kfree()
selftests/liveupdate: add kexec test for multiple and empty sessions
selftests/liveupdate: add simple kexec-based selftest for LUO
selftests/liveupdate: add userspace API selftests
docs: add documentation for memfd preservation via LUO
mm: memfd_luo: allow preserving memfd
liveupdate: luo_file: add private argument to store runtime state
mm: shmem: export some functions to internal.h
...
|
|
Which serves as a global default sys_info mask. When users want the same
system information for many error cases (panic, hung, lockup ...), they
can chose to set this global knob only once, while not setting up each
individual sys_info knobs.
This just adds a 'lazy' option, and doesn't change existing kernel
behavior as the mask is 0 by default.
Link: https://lkml.kernel.org/r/20251113111039.22701-5-feng.tang@linux.alibaba.com
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When soft/hard lockup happens, developers may need different kinds of
system information (call-stacks, memory info, locks, etc.) to help
debugging.
Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs to take
human readable string like "tasks,mem,timers,locks,ftrace,...", and when
system lockup happens, all requested information will be printed out.
(refer kernel/sys_info.c for more details).
Link: https://lkml.kernel.org/r/20251113111039.22701-4-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When task-hung happens, developers may need different kinds of system
information (call-stacks, memory info, locks, etc.) to help debugging.
Add 'hung_task_sys_info' sysctl knob to take human readable string like
"tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
requested information will be dumped. (refer kernel/sys_info.c for more
details).
Meanwhile, the newly introduced sys_info() call is used to unify some
existing info-dumping knobs.
[feng.tang@linux.alibaba.com: maintain consistecy established behavior, per Lance and Petr]
Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
Link: https://lkml.kernel.org/r/20251113111039.22701-3-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Enable hung_task and lockup cases to dump system info on
demand", v2.
When working on kernel stability issues: panic, task-hung and soft/hard
lockup are frequently met. And to debug them, user may need lots of
system information at that time, like task call stacks, lock info, memory
info, ftrace dump, etc.
panic case already uses sys_info() for this purpose, and has a
'panic_sys_info' sysctl(also support cmdline setup) interface to take
human readable string like "tasks,mem,timers,locks,ftrace,..." to control
what kinds of information is needed. Which is also helpful to debug
task-hung and lockup cases.
So this patchset introduces the similar sys_info sysctl interface for
task-hung and lockup cases.
his is mainly for debugging and the info dumping could be intrusive, like
dumping call stack for all tasks when system has huge number of tasks,
similarly for ftrace dump (we may add tracing_stop() and tracing_start()
around it)
Locally these have been used in our bug chasing for stability issues and
were helpful.
As Andrew suggested, add a configurable global 'kernel_sys_info' knob.
When error scenarios like panic/hung-task/lockup etc doesn't setup their
own sys_info knob and calls sys_info() with parameter "0", this global
knob will take effect. It could be used for other kernel cases like OOM,
which may not need one dedicated sys_info knob.
This patch (of 4):
Some sys_info names wered forgotten to change in patch iterations, while
the right names are defined in kernel/sys_info.c.
Link: https://lkml.kernel.org/r/20251113111039.22701-1-feng.tang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20251113111039.22701-2-feng.tang@linux.alibaba.com
Fixes: d747755917bf ("panic: add 'panic_sys_info' sysctl to take human readable string parameter")
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The hung_task_panic sysctl is currently a blunt instrument: it's all or
nothing.
Panicking on a single hung task can be an overreaction to a transient
glitch. A more reliable indicator of a systemic problem is when
multiple tasks hang simultaneously.
Extend hung_task_panic to accept an integer threshold, allowing the
kernel to panic only when N hung tasks are detected in a single scan.
This provides finer control to distinguish between isolated incidents
and system-wide failures.
The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic on the first hung task (unchanged)
- N > 1: Panic after N hung tasks are detected in a single scan
The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility.
[lance.yang@linux.dev: new changelog]
Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> [aspeed_g5_defconfig]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Florian Wesphal <fw@strlen.de>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
skb_defer_max value is very conservative, and can be increased
to avoid too many calls to kick_defer_list_purge().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251106202935.1776179-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a socket has sk->sk_bypass_prot_mem flagged, the socket opts out
of the global protocol memory accounting.
Let's control the flag by a new sysctl knob.
The flag is written once during socket(2) and is inherited to child
sockets.
Tested with a script that creates local socket pairs and send()s a
bunch of data without recv()ing.
Setup:
# mkdir /sys/fs/cgroup/test
# echo $$ >> /sys/fs/cgroup/test/cgroup.procs
# sysctl -q net.ipv4.tcp_mem="1000 1000 1000"
# ulimit -n 524288
Without net.core.bypass_prot_mem, charged to tcp_mem & memcg
# python3 pressure.py &
# cat /sys/fs/cgroup/test/memory.stat | grep sock
sock 22642688 <-------------------------------------- charged to memcg
# cat /proc/net/sockstat| grep TCP
TCP: inuse 2006 orphan 0 tw 0 alloc 2008 mem 5376 <-- charged to tcp_mem
# ss -tn | head -n 5
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:53188
ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:49972
ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:53868
ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:53554
# nstat | grep Pressure || echo no pressure
TcpExtTCPMemoryPressures 1 0.0
With net.core.bypass_prot_mem=1, charged to memcg only:
# sysctl -q net.core.bypass_prot_mem=1
# python3 pressure.py &
# cat /sys/fs/cgroup/test/memory.stat | grep sock
sock 2757468160 <------------------------------------ charged to memcg
# cat /proc/net/sockstat | grep TCP
TCP: inuse 2006 orphan 0 tw 0 alloc 2008 mem 0 <- NOT charged to tcp_mem
# ss -tn | head -n 5
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 111000 0 127.0.0.1:36019 127.0.0.1:49026
ESTAB 110000 0 127.0.0.1:36019 127.0.0.1:45630
ESTAB 110000 0 127.0.0.1:36019 127.0.0.1:44870
ESTAB 111000 0 127.0.0.1:36019 127.0.0.1:45274
# nstat | grep Pressure || echo no pressure
no pressure
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://patch.msgid.link/20251014235604.3057003-4-kuniyu@google.com
|
|
Add a new sysctl to control how often a queue reselection
can happen even if a flow has a persistent queue of skbs
in a Qdisc or NIC queue.
A value of zero means the feature is disabled.
Default is 1000 (1 second).
This sysctl is used in the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251013152234.842065-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull documentation updates from Jonathan Corbet:
"It has been a relatively busy cycle in docsland, with changes all
over:
- Bring the kernel memory-model docs into the Sphinx build in the
"literal include" mode.
- Lots of build-infrastructure work, further cleaning up long-term
kernel-doc technical debt. The sphinx-pre-install tool has been
converted to Python and updated for current systems.
- A new tool to detect when documents have been moved and generate
HTML redirects; this can be used on kernel.org (or any other site
hosting the rendered docs) to avoid breaking links.
- Automated processing of the YAML files describing the netlink
protocol.
- A significant update of the maintainer's PGP guide.
... and a seemingly endless series of typo fixes, build-problem fixes,
etc"
* tag 'docs-6.18' of git://git.lwn.net/linux: (193 commits)
Documentation/features: Update feature lists for 6.17-rc7
docs: remove cdomain.py
Documentation/process: submitting-patches: fix typo in "were do"
docs: dev-tools/lkmm: Fix typo of missing file extension
Documentation: trace: histogram: Convert ftrace docs cross-reference
Documentation: trace: histogram-design: Wrap introductory note in note:: directive
Documentation: trace: historgram-design: Separate sched_waking histogram section heading and the following diagram
Documentation: trace: histogram-design: Trim trailing vertices in diagram explanation text
Documentation: trace: histogram: Fix histogram trigger subsection number order
docs: driver-api: fix spelling of "buses".
Documentation: fbcon: Use admonition directives
Documentation: fbcon: Reindent 8th step of attach/detach/unload
Documentation: fbcon: Add boot options and attach/detach/unload section headings
docs: filesystems: sysfs: add remaining top level sysfs directory descriptions
docs: filesystems: sysfs: clarify symlink destinations in dev and bus/devices descriptions
docs: filesystems: sysfs: remove top level sysfs net directory
docs: maintainer: Fix ambiguous subheading formatting
docs: kdoc: a few more dump_typedef() tweaks
docs: kdoc: remove redundant comment stripping in dump_typedef()
docs: kdoc: remove some dead code in dump_typedef()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet
completes the removal of this legacy IDR API
- "panic: introduce panic status function family" from Jinchao Wang
provides a number of cleanups to the panic code and its various
helpers, which were rather ad-hoc and scattered all over the place
- "tools/delaytop: implement real-time keyboard interaction support"
from Fan Yu adds a few nice user-facing usability changes to the
delaytop monitoring tool
- "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos
Petrongonas fixes a panic which was happening with the combination of
EFI and KHO
- "Squashfs: performance improvement and a sanity check" from Phillip
Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere
150x speedup was measured for a well-chosen microbenchmark
- plus another 50-odd singleton patches all over the place
* tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits)
Squashfs: reject negative file sizes in squashfs_read_inode()
kallsyms: use kmalloc_array() instead of kmalloc()
MAINTAINERS: update Sibi Sankar's email address
Squashfs: add SEEK_DATA/SEEK_HOLE support
Squashfs: add additional inode sanity checking
lib/genalloc: fix device leak in of_gen_pool_get()
panic: remove CONFIG_PANIC_ON_OOPS_VALUE
ocfs2: fix double free in user_cluster_connect()
checkpatch: suppress strscpy warnings for userspace tools
cramfs: fix incorrect physical page address calculation
kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
Squashfs: fix uninit-value in squashfs_get_parent
kho: only fill kimage if KHO is finalized
ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name()
kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths
sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock
coccinelle: platform_no_drv_owner: handle also built-in drivers
coccinelle: of_table: handle SPI device ID tables
lib/decompress: use designated initializers for struct compress_format
efi: support booting with kexec handover (KHO)
...
|
|
User reported current document about SYS_INFO_PANIC_CONSOLE_REPLAY is
confusing, that people could expect all user space console messages to be
replayed.
Specify that only 'kernel' messages will be replayed to solve the confusion.
Link: https://lkml.kernel.org/r/20250825025701.81921-3-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Reported-by: Askar Safin <safinaskar@zohomail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a few missing directories under /proc/sys.
Fix punctuation and doubled words.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250819075456.113623-1-rdunlap@infradead.org
|
|
SO_RCVBUF and SO_SNDBUF have limited range today, unless
distros or system admins change rmem_max and wmem_max.
Even iproute2 uses 1 MB SO_RCVBUF which is capped by
the kernel.
Decouple [rw]mem_max and [rw]mem_default and increase
[rw]mem_max to 4 MB.
Before:
$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992
After:
$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 4194304
net.core.wmem_default = 212992
net.core.wmem_max = 4194304
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The pipe size limit used when the fs.pipe-user-pages-soft sysctl value
is reached was increased from one to two pages in commit 46c4c9d1beb7;
update the documentation to match the new reality.
Fixes: 46c4c9d1beb7 ("pipe: increase minimum default pipe size to 2 pages")
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250729-pipedoc-v2-1-18b8e735a9c6@smrk.net
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
us closer to being able to remove page->mapping
- "relayfs: misc changes" (Jason Xing) does some maintenance and
minor feature addition work in relayfs
- "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
us from static preallocation of the kdump crashkernel's working
memory over to dynamic allocation. So the difficulty of a-priori
estimation of the second kernel's needs is removed and the first
kernel obtains extra memory
- "generalize panic_print's dump function to be used by other
kernel parts" (Feng Tang) implements some consolidation and
rationalization of the various ways in which a failing kernel
splats information at the operator
* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
tools/getdelays: add backward compatibility for taskstats version
kho: add test for kexec handover
delaytop: enhance error logging and add PSI feature description
samples: Kconfig: fix spelling mistake "instancess" -> "instances"
fat: fix too many log in fat_chain_add()
scripts/spelling.txt: add notifer||notifier to spelling.txt
xen/xenbus: fix typo "notifer"
net: mvneta: fix typo "notifer"
drm/xe: fix typo "notifer"
cxl: mce: fix typo "notifer"
KVM: x86: fix typo "notifer"
MAINTAINERS: add maintainers for delaytop
ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
ucount: fix atomic_long_inc_below() argument type
kexec: enable CMA based contiguous allocation
stackdepot: make max number of pools boot-time configurable
lib/xxhash: remove unused functions
init/Kconfig: restore CONFIG_BROKEN help text
lib/raid6: update recov_rvv.c zero page usage
docs: update docs after introducing delaytop
...
|
|
Pull documentation updates from Jonathan Corbet:
"It has been a relatively busy cycle for docs, especially the build
system:
- The Perl kernel-doc script was added to 2.3.52pre1 just after the
turn of the millennium. Over the following 25 years, it accumulated
a vast amount of cruft, all in a language few people want to deal
with anymore. Mauro's Python replacement in 6.16 faithfully
reproduced all of the cruft in the hope of avoiding regressions.
Now that we have a more reasonable code base, though, we can work
on cleaning it up; many of the changes this time around are toward
that end.
- A reorganization of the ext4 docs into the usual TOC format.
- Various Chinese translations and updates.
- A new script from Mauro to help with docs-build testing.
- A new document for linked lists
- A sweep through MAINTAINERS fixing broken GitHub git:// repository
links.
...and lots of fixes and updates"
* tag 'docs-6.17' of git://git.lwn.net/linux: (147 commits)
scripts: add origin commit identification based on specific patterns
sphinx: kernel_abi: fix performance regression with O=<dir>
Documentation: core-api: entry: Replace deprecated KVM entry/exit functions
docs: fault-injection: drop reference to md-faulty
docs: document linked lists
scripts: kdoc: make it backward-compatible with Python 3.7
docs: kernel-doc: emit warnings for ancient versions of Python
Documentation/rtla: Describe exit status
Documentation/rtla: Add include common_appendix.rst
docs: kernel: Clarify printk_ratelimit_burst reset behavior
Documentation: ioctl-number: Don't repeat macro names
Documentation: ioctl-number: Shorten macros table
Documentation: ioctl-number: Correct full path to papr-physical-attestation.h
Documentation: ioctl-number: Extend "Include File" column width
Documentation: ioctl-number: Fix linuxppc-dev mailto link
overlayfs.rst: fix typos
docs: kdoc: emit a warning for ancient versions of Python
docs: kdoc: clean up check_sections()
docs: kdoc: directly access the always-there KdocItem fields
docs: kdoc: straighten up dump_declaration()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective
subsystems. Only 5 (out of the original 50) will remain in
kernel/sysctl.c file; these handle either sysctl or common arch
variables.
By decentralizing sysctl registrations, subsystem maintainers regain
control over their sysctl interfaces, improving maintainability and
reducing the likelihood of merge conflicts.
- docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented
in the check-sysctl-docs script. This script can now be used to
automatically identify if documentation is missing.
* tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (23 commits)
docs: Downgrade arm64 & riscv from titles to comment
docs: Replace spaces with tabs in check-sysctl-docs
docs: Remove colon from ctltable title in vm.rst
docs: Add awk section for ucount sysctl entries
docs: Use skiplist when checking sysctl admin-guide
docs: nixify check-sysctl-docs
sysctl: rename kern_table -> sysctl_subsys_table
kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
uevent: mv uevent_helper into kobject_uevent.c
sysctl: Removed unused variable
sysctl: Nixify sysctl.sh
sysctl: Remove superfluous includes from kernel/sysctl.c
sysctl: Remove (very) old file changelog
sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
sysctl: move cad_pid into kernel/pid.c
sysctl: Move tainted ctl_table into kernel/panic.c
Input: sysrq: mv sysrq into drivers/tty/sysrq.c
fork: mv threads-max into kernel/fork.c
parisc/power: Move soft-power into power.c
mm: move randomize_va_space into memory.c
...
|
|
Remove the title string ("====") from under arm64 & riscv and move them
to a commment under the perf_user_access sysctl. They are explanations,
*not* sysctls themselves
This effectively removes these two strings from appearing as not
implemented when the check-sysctl-docs script is run
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
Removing them solves an issue where they were incorrectly considered as
not implemented by the check-sysctl-docs script
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
In preparation for adding Clang sanitizer coverage stack depth tracking
that can support stack depth callbacks:
- Add the new top-level CONFIG_KSTACK_ERASE option which will be
implemented either with the stackleak GCC plugin, or with the Clang
stack depth callback support.
- Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE,
but keep it for anything specific to the GCC plugin itself.
- Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named
for what it does rather than what it protects against), but leave as
many of the internals alone as possible to avoid even more churn.
While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS,
since that's the only place it is referenced from.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|