Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
"Address a KUnit stack initialization regression that got tickled on
m68k, and solve a Clang(v14 and earlier) bug found by 0day:
- Fix stackinit KUnit regression on m68k
- Use ARRAY_SIZE() for memtostr*()/strtomem*()"
* tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()
compiler.h: Introduce __must_be_byte_array()
compiler.h: Move C string helpers into C-only kernel section
stackinit: Fix comment for test_small_end
stackinit: Keep selftest union size small on m68k
|
|
In union test_small_end, the small members are three and four.
Fixes: e71a29db79da1946 ("stackinit: Add union initialization to selftests")
Closes: https://lore.kernel.org/CAMuHMdWvcKOc6v5o3-9-SqP_4oh5-GZQjZZb=-krhY=mVRED_Q@mail.gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/3f8faa2d7d0d6b36571093ab0fb1fd5157abd7bb.1738593178.git.geert+renesas@glider.be
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The stack frame on m68k is very sensitive to the size of what needs to
be stored. Like done for long string testing, reduce the size of the
large trailing struct in the union initialization testing.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/CAMuHMdXW8VbtOAixO7w+aDOG70aZtZ50j1Ybcr8B3eYnRUcrcA@mail.gmail.com
Fixes: e71a29db79da ("stackinit: Add union initialization to selftests")
Link: https://lore.kernel.org/r/20250204174509.work.711-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"21 hotfixes. 8 are cc:stable and the remainder address post-6.13
issues. 13 are for MM and 8 are for non-MM.
All are singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
MAINTAINERS: include linux-mm for xarray maintenance
revert "xarray: port tests to kunit"
MAINTAINERS: add lib/test_xarray.c
mailmap, MAINTAINERS, docs: update Carlos's email address
mm/hugetlb: fix hugepage allocation for interleaved memory nodes
mm: gup: fix infinite loop within __get_longterm_locked
mm, swap: fix reclaim offset calculation error during allocation
.mailmap: update email address for Christopher Obbard
kfence: skip __GFP_THISNODE allocations on NUMA systems
nilfs2: fix possible int overflows in nilfs_fiemap()
mm: compaction: use the proper flag to determine watermarks
kernel: be more careful about dup_mmap() failures and uprobe registering
mm/fake-numa: handle cases with no SRAT info
mm: kmemleak: fix upper boundary check for physical address objects
mailmap: add an entry for Hamza Mahfooz
MAINTAINERS: mailmap: update Yosry Ahmed's email address
scripts/gdb: fix aarch64 userspace detection in get_current_task
mm/vmscan: accumulate nr_demoted for accurate demotion statistics
ocfs2: fix incorrect CPU endianness conversion causing mount failure
mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
...
|
|
Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build
when compiing the xarray userspace test harness code.
Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com
Cc: David Gow <davidgow@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Tamir Duberstein <tamird@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
"This is a fix for the soon to be released GCC 15 which has regressed
its initialization of unions when performing explicit initialization
(i.e. a general problem, not specifically a hardening problem; we're
just carrying the fix).
Details in the final patch, Acked by Masahiro, with updated selftests
to validate the fix"
* tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: Use -fzero-init-padding-bits=all
stackinit: Add union initialization to selftests
stackinit: Add old-style zero-init syntax to struct tests
|
|
The stack initialization selftests were checking scalars, strings,
and structs, but not unions. Add union tests (which are mostly identical
setup to structs). This catches the recent union initialization behavioral
changes seen in GCC 15. Before GCC 15, this new test passes:
ok 18 test_small_start_old_zero
With GCC 15, it fails:
not ok 18 test_small_start_old_zero
Specifically, a union with a larger member where a smaller member is
initialized with the older "= { 0 }" syntax:
union test_small_start {
char one:1;
char two;
short three;
unsigned long four;
struct big_struct {
unsigned long array[8];
} big;
};
This is a regression in compiler behavior that Linux has depended on.
GCC does not seem likely to fix it, instead suggesting that affected
projects start using -fzero-init-padding-bits=unions:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118403
Link: https://lore.kernel.org/r/20250127191031.245214-2-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The deprecated way to do a full zero init of a structure is with "= { 0 }",
but we weren't testing this style. Add it.
Link: https://lore.kernel.org/r/20250127191031.245214-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC cleanups from Eric Biggers:
"Simplify the kconfig options for controlling which CRC implementations
are built into the kernel, as was requested by Linus.
This means making the option to disable the arch code visible only
when CONFIG_EXPERT=y, and standardizing on a single generic
implementation of CRC32"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crc32: remove other generic implementations
lib/crc: simplify the kconfig options for CRC implementations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl table constification from Joel Granados:
"All ctl_table declared outside of functions and that remain unmodified
after initialization are const qualified.
This prevents unintended modifications to proc_handler function
pointers by placing them in the .rodata section.
This is a continuation of the tree-wide effort started a few releases
ago with the constification of the ctl_table struct arguments in the
sysctl API done in 78eb4ea25cd5 ("sysctl: treewide: constify the
ctl_table argument of proc_handlers")"
* tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
treewide: const qualify ctl_tables where applicable
|
|
Now that we've standardized on the byte-by-byte implementation of CRC32
as the only generic implementation (see previous commit for the
rationale), remove the code for the other implementations.
Tested with crc_kunit.
Link: https://lore.kernel.org/r/20250123212904.118683-3-ebiggers@kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Make the following simplifications to the kconfig options for choosing
CRC implementations for CRC32 and CRC_T10DIF:
1. Make the option to disable the arch-optimized code be visible only
when CONFIG_EXPERT=y.
2. Make a single option control the inclusion of the arch-optimized code
for all enabled CRC variants.
3. Make CRC32_SARWATE (a.k.a. slice-by-1 or byte-by-byte) be the only
generic CRC32 implementation.
The result is there is now just one option, CRC_OPTIMIZATIONS, which is
default y and can be disabled only when CONFIG_EXPERT=y.
Rationale:
1. Enabling the arch-optimized code is nearly always the right choice.
However, people trying to build the tiniest kernel possible would
find some use in disabling it. Anything we add to CRC32 is de facto
unconditional, given that CRC32 gets selected by something in nearly
all kernels. And unfortunately enabling the arch CRC code does not
eliminate the need to build the generic CRC code into the kernel too,
due to CPU feature dependencies. The size of the arch CRC code will
also increase slightly over time as more CRC variants get added and
more implementations targeting different instruction set extensions
get added. Thus, it seems worthwhile to still provide an option to
disable it, but it should be considered an expert-level tweak.
2. Considering the use case described in (1), there doesn't seem to be
sufficient value in making the arch-optimized CRC code be
independently configurable for different CRC variants. Note also
that multiple variants were already grouped together, e.g.
CONFIG_CRC32 actually enables three different variants of CRC32.
3. The bit-by-bit implementation is uselessly slow, whereas slice-by-n
for n=4 and n=8 use tables that are inconveniently large: 4096 bytes
and 8192 bytes respectively, compared to 1024 bytes for n=1. Higher
n gives higher instruction-level parallelism, so higher n easily wins
on traditional microbenchmarks on most CPUs. However, the larger
tables, which are accessed randomly, can be harmful in real-world
situations where the dcache may be cold or useful data may need be
evicted from the dcache. Meanwhile, today most architectures have
much faster CRC32 implementations using dedicated CRC32 instructions
or carryless multiplication instructions anyway, which make the
generic code obsolete in most cases especially on long messages.
Another reason for going with n=1 is that this is already what is
used by all the other CRC variants in the kernel. CRC32 was unique
in having support for larger tables. But as per the above this can
be considered an outdated optimization.
The standardization on slice-by-1 a.k.a. CRC32_SARWATE makes much of
the code in lib/crc32.c unused. A later patch will clean that up.
Link: https://lore.kernel.org/r/20250123212904.118683-2-ebiggers@kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs updates from Greg KH:
"Here is the big set of driver core and debugfs updates for 6.14-rc1.
Included in here is a bunch of driver core, PCI, OF, and platform rust
bindings (all acked by the different subsystem maintainers), hence the
merge conflict with the rust tree, and some driver core api updates to
mark things as const, which will also require some fixups due to new
stuff coming in through other trees in this merge window.
There are also a bunch of debugfs updates from Al, and there is at
least one user that does have a regression with these, but Al is
working on tracking down the fix for it. In my use (and everyone
else's linux-next use), it does not seem like a big issue at the
moment.
Here's a short list of the things in here:
- driver core rust bindings for PCI, platform, OF, and some i/o
functions.
We are almost at the "write a real driver in rust" stage now,
depending on what you want to do.
- misc device rust bindings and a sample driver to show how to use
them
- debugfs cleanups in the fs as well as the users of the fs api for
places where drivers got it wrong or were unnecessarily doing
things in complex ways.
- driver core const work, making more of the api take const * for
different parameters to make the rust bindings easier overall.
- other small fixes and updates
All of these have been in linux-next with all of the aforementioned
merge conflicts, and the one debugfs issue, which looks to be resolved
"soon""
* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
rust: device: Use as_char_ptr() to avoid explicit cast
rust: device: Replace CString with CStr in property_present()
devcoredump: Constify 'struct bin_attribute'
devcoredump: Define 'struct bin_attribute' through macro
rust: device: Add property_present()
saner replacement for debugfs_rename()
orangefs-debugfs: don't mess with ->d_name
octeontx2: don't mess with ->d_parent or ->d_parent->d_name
arm_scmi: don't mess with ->d_parent->d_name
slub: don't mess with ->d_name
sof-client-ipc-flood-test: don't mess with ->d_name
qat: don't mess with ->d_name
xhci: don't mess with ->d_iname
mtu3: don't mess wiht ->d_iname
greybus/camera - stop messing with ->d_iname
mediatek: stop messing with ->d_iname
netdevsim: don't embed file_operations into your structs
b43legacy: make use of debugfs_get_aux()
b43: stop embedding struct file_operations into their objects
carl9170: stop embedding file_operations into their objects
...
|
|
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc/IIO driver updates from Greg KH:
"Here is the "big" set of char/misc/iio and other smaller driver
subsystem updates for 6.14-rc1. Loads of different things in here this
development cycle, highlights are:
- ntsync "driver" to handle Windows locking types enabling Wine to
work much better on many workloads (i.e. games). The driver
framework was in 6.13, but now it's enabled and fully working
properly. Should make many SteamOS users happy. Even comes with
tests!
- Large IIO driver updates and bugfixes
- FPGA driver updates
- Coresight driver updates
- MHI driver updates
- PPS driver updatesa
- const bin_attribute reworking for many drivers
- binder driver updates
- smaller driver updates and fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits)
ntsync: Fix reference leaks in the remaining create ioctls.
spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe()
spmi: Set fwnode for spmi devices
ntsync: fix a file reference leak in drivers/misc/ntsync.c
scripts/tags.sh: Don't tag usages of DECLARE_BITMAP
dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs
dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible
dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles
interconnect: sm8750: Add missing const to static qcom_icc_desc
memstick: core: fix kernel-doc notation
intel_th: core: fix kernel-doc warnings
binder: log transaction code on failure
iio: dac: ad3552r-hs: clear reset status flag
iio: dac: ad3552r-common: fix ad3541/2r ranges
iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw()
misc: fastrpc: Fix copy buffer page size
misc: fastrpc: Fix registered buffer page address
misc: fastrpc: Deregister device nodes properly in error scenarios
nvmem: core: improve range check for nvmem_cell_write()
nvmem: qcom-spmi-sdam: Set size in struct nvmem_config
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series
in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap
library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven
fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
the new secs_to_jiffies() in various places where that is
appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang
Shao removes now-unneeded calls to get_task_comm() in various
places
- "squashfs: reduce memory usage and update docs" from Phillip
Lougher implements some memory savings in squashfs and performs
some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented
with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does
some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
ocfs2: use str_yes_no() and str_no_yes() helper functions
include/linux/lz4.h: add some missing macros
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Xarray: remove repeat check in xas_squash_marks()
Xarray: distinguish large entries correctly in xas_split_alloc()
Xarray: move forward index correctly in xas_pause()
Xarray: do not return sibling entries from xas_find_marked()
ipc/util.c: complete the kernel-doc function descriptions
gcov: clang: use correct function param names
latencytop: use correct kernel-doc format for func params
minmax.h: remove some #defines that are only expanded once
minmax.h: simplify the variants of clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: update some comments
minmax.h: add whitespace around operators and after commas
nilfs2: do not update mtime of renamed directory that is not moved
nilfs2: handle errors that nilfs_prepare_chunk() may return
CREDITS: fix spelling mistake
...
|
|
Before SLUB initialization, various subsystems used memblock_alloc to
allocate memory. In most cases, when memory allocation fails, an
immediate panic is required. To simplify this behavior and reduce
repetitive checks, introduce `memblock_alloc_or_panic`. This function
ensures that memory allocation failures result in a panic automatically,
improving code readability and consistency across subsystems that require
this behavior.
[guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic]
Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com
Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/
Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When memory allocation profiling is disabled there is no need to update
current->alloc_tag and these manipulations add unnecessary overhead. Fix
the overhead by skipping these extra updates.
I ran comprehensive testing on Pixel 6 on Big, Medium and Little cores:
Overhead before fixes Overhead after fixes
slab alloc page alloc slab alloc page alloc
Big 6.21% 5.32% 3.31% 4.93%
Medium 4.51% 5.05% 3.79% 4.39%
Little 7.62% 1.82% 6.68% 1.02%
This is an allocation microbenchmark doing allocations in a tight loop.
Not a really realistic scenario and useful only to make performance
comparisons.
Link: https://lkml.kernel.org/r/20241226211639.1357704-1-surenb@google.com
Fixes: b951aaff5035 ("mm: enable page allocation tagging")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Wang <00107082@163.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The previous commit removed the page_list argument from
alloc_pages_bulk_noprof() along with the alloc_pages_bulk_list() function.
Now that only the *_array() flavour of the API remains, we can do the
following renaming (along with the _noprof() ones):
alloc_pages_bulk_array -> alloc_pages_bulk
alloc_pages_bulk_array_mempolicy -> alloc_pages_bulk_mempolicy
alloc_pages_bulk_array_node -> alloc_pages_bulk_node
Link: https://lkml.kernel.org/r/275a3bbc0be20fbe9002297d60045e67ab3d4ada.1734991165.git.luizcap@redhat.com
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As of now during link list corruption it prints about cluprit address and
its wrong value, but sometime it is not enough to catch the actual issue
point.
If it prints allocation and free path of that corrupted node, it will be a
lot easier to find and fix the issues.
Adding the same information when data mismatch is found in link list
debug data:
[ 14.243055] slab kmalloc-32 start ffff0000cda19320 data offset 32 pointer offset 8 size 32 allocated at add_to_list+0x28/0xb0
[ 14.245259] __kmalloc_cache_noprof+0x1c4/0x358
[ 14.245572] add_to_list+0x28/0xb0
...
[ 14.248632] do_el0_svc_compat+0x1c/0x34
[ 14.249018] el0_svc_compat+0x2c/0x80
[ 14.249244] Free path:
[ 14.249410] kfree+0x24c/0x2f0
[ 14.249724] do_force_corruption+0xbc/0x100
...
[ 14.252266] el0_svc_common.constprop.0+0x40/0xe0
[ 14.252540] do_el0_svc_compat+0x1c/0x34
[ 14.252763] el0_svc_compat+0x2c/0x80
[ 14.253071] ------------[ cut here ]------------
[ 14.253303] list_del corruption. next->prev should be ffff0000cda192a8, but was 6b6b6b6b6b6b6b6b. (next=ffff0000cda19348)
[ 14.254255] WARNING: CPU: 3 PID: 84 at lib/list_debug.c:65 __list_del_entry_valid_or_report+0x158/0x164
Moved prototype of mem_dump_obj() to bug.h, as mm.h can not be included in
bug.h.
Link: https://lkml.kernel.org/r/20241230101043.53773-1-maninder1.s@samsung.com
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Rohit Thapliyal <r.thapliyal@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When the upper bound of the search is exhausted, the maple state may be
returned in an error state of -EBUSY. This means maple state needs to be
reset before the second search in mas_alloc_cylic() to ensure the search
happens. This test ensures the issue is not recreated.
Link: https://lkml.kernel.org/r/20241216190113.1226145-3-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com> says:
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, LZ4_DISTANCE_MAX and LZ4_DECOMPRESS_INPLACE_MARGIN are
defined in the erofs subsystem for LZ4 in-place decompression, which is
somewhat unsuitable since they should belong to the LZ4 itself and
may change with future LZ4 codebase updates.
Move them to include/linux/lz4.h to match the upstream LZ4 library [1].
No logic changes.
[1] https://github.com/lz4/lz4/blob/v1.10.0/lib/lz4.h#L670
Link: https://lkml.kernel.org/r/20250114130454.1191150-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Yue Hu <zbestahu@gmail.com>
Cc; Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Besides xas_squash_marks(), all functions use xa_mark_t type to iterate
all possible marks. Use xa_mark_t in xas_squash_marks() to keep code
consistent.
Link: https://lkml.kernel.org/r/20241213122523.12764-6-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Caller of xas_squash_marks() has ensured xas->xa_sibs is non-zero. Just
remove repeat check of xas->xa_sibs in xas_squash_marks().
Link: https://lkml.kernel.org/r/20241213122523.12764-5-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We don't support large entries which expand two more level xa_node in
split. For case "xas->xa_shift + 2 * XA_CHUNK_SHIFT == order", we also
need two level of xa_node to expand. Distinguish entry as large entry in
case "xas->xa_shift + 2 * XA_CHUNK_SHIFT == order".
As max order of folio in pagecache (MAX_PAGECACHE_ORDER) is <=
(XA_CHUNK_SHIFT * 2 - 1), this change is more likely a cleanup...
Link: https://lkml.kernel.org/r/20241213122523.12764-4-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After xas_load(), xas->index could point to mid of found multi-index entry
and xas->index's bits under node->shift maybe non-zero. The afterward
xas_pause() will move forward xas->index with xa->node->shift with bits
under node->shift un-masked and thus skip some index unexpectedly.
Consider following case:
Assume XA_CHUNK_SHIFT is 4.
xa_store_range(xa, 16, 31, ...)
xa_store(xa, 32, ...)
XA_STATE(xas, xa, 17);
xas_for_each(&xas,...)
xas_load(&xas)
/* xas->index = 17, xas->xa_offset = 1, xas->xa_node->xa_shift = 4 */
xas_pause()
/* xas->index = 33, xas->xa_offset = 2, xas->xa_node->xa_shift = 4 */
As we can see, index of 32 is skipped unexpectedly.
Fix this by mask bit under node->xa_shift when move forward index in
xas_pause().
For now, this will not cause serious problems. Only minor problem like
cachestat return less number of page status could happen.
Link: https://lkml.kernel.org/r/20241213122523.12764-3-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fixes and cleanups to xarray", v5.
This series contains some random fixes and cleanups to xarray. Patch 1-2
are fixes and patch 3-6 are cleanups. More details can be found in
respective patches.
This patch (of 5):
Similar to issue fixed in commit cbc02854331ed ("XArray: Do not return
sibling entries from xa_load()"), we may return sibling entries from
xas_find_marked as following:
Thread A: Thread B:
xa_store_range(xa, entry, 6, 7, gfp);
xa_set_mark(xa, 6, mark)
XA_STATE(xas, xa, 6);
xas_find_marked(&xas, 7, mark);
offset = xas_find_chunk(xas, advance, mark);
[offset is 6 which points to a valid entry]
xa_store_range(xa, entry, 4, 7, gfp);
entry = xa_entry(xa, node, 6);
[entry is a sibling of 4]
if (!xa_is_node(entry))
return entry;
Skip sibling entry like xas_find() does to protect caller from seeing
sibling entry from xas_find_marked() or caller may use sibling entry
as a valid entry and crash the kernel.
Besides, load_race() test is modified to catch mentioned issue and modified
load_race() only passes after this fix is merged.
Here is an example how this bug could be triggerred in tmpfs which
enables large folio in mapping:
Let's take a look at involved racer:
1. How pages could be created and dirtied in shmem file.
write
ksys_write
vfs_write
new_sync_write
shmem_file_write_iter
generic_perform_write
shmem_write_begin
shmem_get_folio
shmem_allowable_huge_orders
shmem_alloc_and_add_folios
shmem_alloc_folio
__folio_set_locked
shmem_add_to_page_cache
XA_STATE_ORDER(..., index, order)
xax_store()
shmem_write_end
folio_mark_dirty()
2. How dirty pages could be deleted in shmem file.
ioctl
do_vfs_ioctl
file_ioctl
ioctl_preallocate
vfs_fallocate
shmem_fallocate
shmem_truncate_range
shmem_undo_range
truncate_inode_folio
filemap_remove_folio
page_cache_delete
xas_store(&xas, NULL);
3. How dirty pages could be lockless searched
sync_file_range
ksys_sync_file_range
__filemap_fdatawrite_range
filemap_fdatawrite_wbc
do_writepages
writeback_use_writepage
writeback_iter
writeback_get_folio
filemap_get_folios_tag
find_get_entry
folio = xas_find_marked()
folio_try_get(folio)
Kernel will crash as following:
1.Create 2.Search 3.Delete
/* write page 2,3 */
write
...
shmem_write_begin
XA_STATE_ORDER(xas, i_pages, index = 2, order = 1)
xa_store(&xas, folio)
shmem_write_end
folio_mark_dirty()
/* sync page 2 and page 3 */
sync_file_range
...
find_get_entry
folio = xas_find_marked()
/* offset will be 2 */
offset = xas_find_chunk()
/* delete page 2 and page 3 */
ioctl
...
xas_store(&xas, NULL);
/* write page 0-3 */
write
...
shmem_write_begin
XA_STATE_ORDER(xas, i_pages, index = 0, order = 2)
xa_store(&xas, folio)
shmem_write_end
folio_mark_dirty(folio)
/* get sibling entry from offset 2 */
entry = xa_entry(.., 2)
/* use sibling entry as folio and crash kernel */
folio_try_get(folio)
Link: https://lkml.kernel.org/r/20241213122523.12764-1-shikemeng@huaweicloud.com
Link: https://lkml.kernel.org/r/20241213122523.12764-2-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Mattew Wilcox <willy@infradead.org> [English fixes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a detailed explanation in the list_sort() kernel doc comment
specifying that the comparison function must satisfy antisymmetry and
transitivity. These properties are essential for the sorting algorithm to
produce correct results.
Issues have arisen in the past [1][2][3][4] where comparison functions
violated the transitivity property, causing sorting algorithms to fail to
correctly order elements. While these requirements may seem
straightforward, they are commonly misunderstood or overlooked, leading to
bugs. Highlighting these properties in the documentation will help
prevent such mistakes in the future.
Link: https://lore.kernel.org/lkml/20240701205639.117194-1-visitorckw@gmail.com [1]
Link: https://lore.kernel.org/lkml/20241203202228.1274403-1-visitorckw@gmail.com [2]
Link: https://lore.kernel.org/lkml/20241209134226.1939163-1-visitorckw@gmail.com [3]
Link: https://lore.kernel.org/lkml/20241209145728.1975311-1-visitorckw@gmail.com [4]
Link: https://lkml.kernel.org/r/20250106170104.3137845-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: <chuang@cs.nycu.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "lib: clarify comparison function requirements", v2.
Add a detailed explanation in the sort_r/list_sort kernel doc comment
specifying that the comparison function must satisfy antisymmetry and
transitivity. These properties are essential for the sorting algorithm to
produce correct results.
Issues have arisen in the past [1][2][3][4] where comparison functions
violated the transitivity property, causing sorting algorithms to fail to
correctly order elements. While these requirements may seem
straightforward, they are commonly misunderstood or overlooked, leading to
bugs. Highlighting these properties in the documentation will help
prevent such mistakes in the future.
Link: https://lore.kernel.org/lkml/20240701205639.117194-1-visitorckw@gmail.com [1]
Link: https://lore.kernel.org/lkml/20241203202228.1274403-1-visitorckw@gmail.com [2]
Link: https://lore.kernel.org/lkml/20241209134226.1939163-1-visitorckw@gmail.com [3]
Link: https://lore.kernel.org/lkml/20241209145728.1975311-1-visitorckw@gmail.com [4]
This patch (of 2):
Add a detailed explanation in the sort_r() kernel doc comment specifying
that the comparison function must satisfy antisymmetry and transitivity.
These properties are essential for the sorting algorithm to produce
correct results.
Issues have arisen in the past [1][2][3][4] where comparison functions
violated the transitivity property, causing sorting algorithms to fail to
correctly order elements. While these requirements may seem
straightforward, they are commonly misunderstood or overlooked, leading to
bugs. Highlighting these properties in the documentation will help
prevent such mistakes in the future.
Link: https://lkml.kernel.org/r/20250106170104.3137845-1-visitorckw@gmail.com
Link: https://lore.kernel.org/lkml/20240701205639.117194-1-visitorckw@gmail.com [1]
Link: https://lore.kernel.org/lkml/20241203202228.1274403-1-visitorckw@gmail.com [2]
Link: https://lore.kernel.org/lkml/20241209134226.1939163-1-visitorckw@gmail.com [3]
Link: https://lore.kernel.org/lkml/20241209145728.1975311-1-visitorckw@gmail.com [4]
Link: https://lkml.kernel.org/r/20250106170104.3137845-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: <chuang@cs.nycu.edu.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Remove physical address skcipher walking
- Fix boot-up self-test race
Algorithms:
- Optimisations for x86/aes-gcm
- Optimisations for x86/aes-xts
- Remove VMAC
- Remove keywrap
Drivers:
- Remove n2
Others:
- Fixes for padata UAF
- Fix potential rhashtable deadlock by moving schedule_work outside
lock"
* tag 'v6.14-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (75 commits)
rhashtable: Fix rhashtable_try_insert test
dt-bindings: crypto: qcom,inline-crypto-engine: Document the SM8750 ICE
dt-bindings: crypto: qcom,prng: Document SM8750 RNG
dt-bindings: crypto: qcom-qce: Document the SM8750 crypto engine
crypto: asymmetric_keys - Remove unused key_being_used_for[]
padata: avoid UAF for reorder_work
padata: fix UAF in padata_reorder
padata: add pd get/put refcnt helper
crypto: skcipher - call cond_resched() directly
crypto: skcipher - optimize initializing skcipher_walk fields
crypto: skcipher - clean up initialization of skcipher_walk::flags
crypto: skcipher - fold skcipher_walk_skcipher() into skcipher_walk_virt()
crypto: skcipher - remove redundant check for SKCIPHER_WALK_SLOW
crypto: skcipher - remove redundant clamping to page size
crypto: skcipher - remove unnecessary page alignment of bounce buffer
crypto: skcipher - document skcipher_walk_done() and rename some vars
crypto: omap - switch from scatter_walk to plain offset
crypto: powerpc/p10-aes-gcm - simplify handling of linear associated data
crypto: bcm - Drop unused setting of local 'ptr' variable
crypto: hisilicon/qm - support new function communication
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace fing buffer fix from Steven Rostedt:
"Fix atomic64 operations on some architectures for the tracing ring
buffer:
- Have emulating atomic64 use arch_spin_locks instead of
raw_spin_locks
The tracing ring buffer events have a small timestamp that holds
the delta between itself and the event before it. But this can be
tricky to update when interrupts come in. It originally just set
the deltas to zero for events that interrupted the adding of
another event which made all the events in the interrupt have the
same timestamp as the event it interrupted. This was not suitable
for many tools, so it was eventually fixed. But that fix required
adding an atomic64 cmpxchg on the timestamp in cases where an event
was added while another event was in the process of being added.
Originally, for 32 bit architectures, the manipulation of the 64
bit timestamp was done by a structure that held multiple 32bit
words to hold parts of the timestamp and a counter. But as updates
to the ring buffer were done, maintaining this became too complex
and was replaced by the atomic64 generic operations which are now
used by both 64bit and 32bit architectures. Shortly after that, it
was reported that riscv32 and other 32 bit architectures that just
used the generic atomic64 were locking up. This was because the
generic atomic64 operations defined in lib/atomic64.c uses a
raw_spin_lock() to emulate an atomic64 operation. The problem here
was that raw_spin_lock() can also be traced by the function tracer
(which is commonly used for debugging raw spin locks). Since the
function tracer uses the tracing ring buffer, which now is being
traced internally, this was triggering a recursion and setting off
a warning that the spin locks were recusing.
There's no reason for the code that emulates atomic64 operations to
be using raw_spin_locks which have a lot of debugging
infrastructure attached to them (depending on the config options).
Instead it should be using the arch_spin_lock() which does not have
any infrastructure attached to them and is used by low level
infrastructure like RCU locks, lockdep and of course tracing. Using
arch_spin_lock()s fixes this issue.
- Do not trace in NMI if the architecture uses emulated atomic64
operations
Another issue with using the emulated atomic64 operations that uses
spin locks to emulate the atomic64 operations is that they cannot
be used in NMI context. As an NMI can trigger while holding the
atomic64 spin locks it can try to take the same lock and cause a
deadlock.
Have the ring buffer fail recording events if in NMI context and
the architecture uses the emulated atomic64 operations"
* tag 'trace-ringbuffer-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
atomic64: Use arch_spin_locks instead of raw_spin_locks
ring-buffer: Do not allow events in NMI with generic atomic64 cmpxchg()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"A smaller than usual release cycle.
The main changes are:
- Prepare selftest to run with GCC-BPF backend (Ihor Solodrai)
In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile
only mode. Half of the tests are failing, since support for
btf_decl_tag is still WIP, but this is a great milestone.
- Convert various samples/bpf to selftests/bpf/test_progs format
(Alexis Lothoré and Bastien Curutchet)
- Teach verifier to recognize that array lookup with constant
in-range index will always succeed (Daniel Xu)
- Cleanup migrate disable scope in BPF maps (Hou Tao)
- Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao)
- Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin
KaFai Lau)
- Refactor verifier lock support (Kumar Kartikeya Dwivedi)
This is a prerequisite for upcoming resilient spin lock.
- Remove excessive 'may_goto +0' instructions in the verifier that
LLVM leaves when unrolls the loops (Yonghong Song)
- Remove unhelpful bpf_probe_write_user() warning message (Marco
Elver)
- Add fd_array_cnt attribute for prog_load command (Anton Protopopov)
This is a prerequisite for upcoming support for static_branch"
* tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits)
selftests/bpf: Add some tests related to 'may_goto 0' insns
bpf: Remove 'may_goto 0' instruction in opt_remove_nops()
bpf: Allow 'may_goto 0' instruction in verifier
selftests/bpf: Add test case for the freeing of bpf_timer
bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT
bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
bpf: Bail out early in __htab_map_lookup_and_delete_elem()
bpf: Free special fields after unlock in htab_lru_map_delete_node()
tools: Sync if_xdp.h uapi tooling header
libbpf: Work around kernel inconsistently stripping '.llvm.' suffix
bpf: selftests: verifier: Add nullness elision tests
bpf: verifier: Support eliding map lookup nullness
bpf: verifier: Refactor helper access type tracking
bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
bpf: verifier: Add missing newline on verbose() call
selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED
libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED
libbpf: Fix return zero when elf_begin failed
selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test
veristat: Load struct_ops programs only once
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
- Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
directly accessible via the library API, instead of requiring the
crypto API. This is much simpler and more efficient.
- Convert some users such as ext4 to use the CRC32 library API instead
of the crypto API. More conversions like this will come later.
- Add a KUnit test that tests and benchmarks multiple CRC variants.
Remove older, less-comprehensive tests that are made redundant by
this.
- Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
volunteering to maintain it. I have additional cleanups and
optimizations planned for future cycles.
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits)
MAINTAINERS: add entry for CRC library
powerpc/crc: delete obsolete crc-vpmsum_test.c
lib/crc32test: delete obsolete crc32test.c
lib/crc16_kunit: delete obsolete crc16_kunit.c
lib/crc_kunit.c: add KUnit test suite for CRC library functions
powerpc/crc-t10dif: expose CRC-T10DIF function through lib
arm64/crc-t10dif: expose CRC-T10DIF function through lib
arm/crc-t10dif: expose CRC-T10DIF function through lib
x86/crc-t10dif: expose CRC-T10DIF function through lib
crypto: crct10dif - expose arch-optimized lib function
lib/crc-t10dif: add support for arch overrides
lib/crc-t10dif: stop wrapping the crypto API
scsi: target: iscsi: switch to using the crc32c library
f2fs: switch to using the crc32 library
jbd2: switch to using the crc32c library
ext4: switch to using the crc32c library
lib/crc32: make crc32c() go directly to lib
bcachefs: Explicitly select CRYPTO from BCACHEFS_FS
x86/crc32: expose CRC32 functions through lib
x86/crc32: update prototype for crc32_pclmul_le_16()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- fix struct completion warning
- introduce autorun option
- add fallback for os.sched_getaffinity
- enable hardware acceleration when available
* tag 'linux_kselftest-kunit-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Introduce autorun option
kunit: enable hardware acceleration when available
kunit: add fallback for os.sched_getaffinity
kunit: platform: Resolve 'struct completion' warning
|
|
raw_spin_locks can be traced by lockdep or tracing itself. Atomic64
operations can be used in the tracing infrastructure. When an architecture
does not have true atomic64 operations it can use the generic version that
disables interrupts and uses spin_locks.
The tracing ring buffer code uses atomic64 operations for the time
keeping. But because some architectures use the default operations, the
locking inside the atomic operations can cause an infinite recursion.
As atomic64 implementation is architecture specific, it should not be
using raw_spin_locks() but instead arch_spin_locks as that is the purpose
of arch_spin_locks. To be used in architecture specific implementations of
generic infrastructure like atomic64 operations.
Note, by switching from raw_spin_locks to arch_spin_locks, the locks taken
to emulate the atomic64 operations will not have lockdep, mmio, or any
kind of checks done on them. They will not even disable preemption,
although the code will disable interrupts preventing the tasks that hold
the locks from being preempted. As the locks held are done so for very
short periods of time, and the logic is only done to emulate atomic64, not
having them be instrumented should not be an issue.
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/20250122144311.64392baf@gandalf.local.home
Fixes: c84897c0ff592 ("ring-buffer: Remove 32bit timestamp logic")
Closes: https://lore.kernel.org/all/86fb4f86-a0e4-45a2-a2df-3154acc4f086@gaisler.com/
Reported-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"This is slightly smaller than usual, with the most interesting work
being still around RTNL scope reduction.
Core:
- More core refactoring to reduce the RTNL lock contention, including
preparatory work for the per-network namespace RTNL lock, replacing
RTNL lock with a per device-one to protect NAPI-related net device
data and moving synchronize_net() calls outside such lock.
- Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge
and more specific TCP coverage.
- Reduce network namespace tear-down time by removing per-subsystems
synchronize_net() in tipc and sched.
- Add flow label selector support for fib rules, allowing traffic
redirection based on such header field.
Netfilter:
- Do not remove netdev basechain when last device is gone, allowing
netdev basechains without devices.
- Revisit the flowtable teardown strategy, dealing better with fin,
reset and re-open events.
- Scale-up IP-vs connection dumping by avoiding linear search on each
restart.
Protocols:
- A significant XDP socket refactor, consolidating and optimizing
several helpers into the core
- Better scaling of ICMP rate-limiting, by removing false-sharing in
inet peers handling.
- Introduces netlink notifications for multicast IPv4 and IPv6
address changes.
- Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
aggregation and fragmentation of the inner IP.
- Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to
avoid local port exhaustion issues when the average connection
lifetime is very short.
- Support updating keys (re-keying) for connections using kernel TLS
(for TLS 1.3 only).
- Support ipv4-mapped ipv6 address clients in smc-r v2.
- Add support for jumbo data packet transmission in RxRPC sockets,
gluing multiple data packets in a single UDP packet.
- Support RxRPC RACK-TLP to manage packet loss and retransmission in
conjunction with the congestion control algorithm.
Driver API:
- Introduce a unified and structured interface for reporting PHY
statistics, exposing consistent data across different H/W via
ethtool.
- Make timestamping selectable, allow the user to select the desired
hwtstamp provider (PHY or MAC) administratively.
- Add support for configuring a header-data-split threshold (HDS)
value via ethtool, to deal with partial or buggy H/W
implementation.
- Consolidate DSA drivers Energy Efficiency Ethernet support.
- Add EEE management to phylink, making use of the phylib
implementation.
- Add phylib support for in-band capabilities negotiation.
- Simplify how phylib-enabled mac drivers expose the supported
interfaces.
Tests and tooling:
- Make the YNL tool package-friendly to make it easier to deploy it
separately from the kernel.
- Increase TCP selftest coverage importing several packetdrill
test-cases.
- Regenerate the ethtool uapi header from the YNL spec, to ease
maintenance and future development.
- Add YNL support for decoding the link types used in net self-tests,
allowing a single build to run both net and drivers/net.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- add cross E-Switch QoS support
- add SW Steering support for ConnectX-8
- implement support for HW-Managed Flow Steering, improving the
rule deletion/insertion rate
- support for multi-host LAG
- Intel (ixgbe, ice, igb):
- ice: add support for devlink health events
- ixgbe: add initial support for E610 chipset variant
- igb: add support for AF_XDP zero-copy
- Meta:
- add support for basic RSS config
- allow changing the number of channels
- add hardware monitoring support
- Broadcom (bnxt):
- implement TCP data split and HDS threshold ethtool support,
enabling Device Memory TCP.
- Marvell Octeon:
- implement egress ipsec offload support for the cn10k family
- Hisilicon (HIBMC):
- implement unicast MAC filtering
- Ethernet NICs embedded and virtual:
- Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
contented atomic operations for drop counters
- Freescale:
- quicc: phylink conversion
- enetc: support Tx and Rx checksum offload and improve TSO
performances
- MediaTek:
- airoha: introduce support for ETS and HTB Qdisc offload
- Microchip:
- lan78XX USB: preparation work for phylink conversion
- Synopsys (stmmac):
- support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
- refactor EEE support to leverage the new driver API
- optimize DMA and cache access to increase raw RX performances
by 40%
- TI:
- icssg-prueth: add multicast filtering support for VLAN
interface
- netkit:
- add ability to configure head/tailroom
- VXLAN:
- accepts packets with user-defined reserved bit
- Ethernet switches:
- Microchip:
- lan969x: add RGMII support
- lan969x: improve TX and RX performance using the FDMA engine
- nVidia/Mellanox:
- move Tx header handling to PCI driver, to ease XDP support
- Ethernet PHYs:
- Texas Instruments DP83822:
- add support for GPIO2 clock output
- Realtek:
- 8169: add support for RTL8125D rev.b
- rtl822x: add hwmon support for the temperature sensor
- Microchip:
- add support for RDS PTP hardware
- consolidate periodic output signal generation
- CAN:
- several DT-bindings to DT schema conversions
- tcan4x5x:
- add HW standby support
- support nWKRQ voltage selection
- kvaser:
- allowing Bus Error Reporting runtime configuration
- WiFi:
- the on-going Multi-Link Operation (MLO) effort continues,
affecting both the stack and in drivers
- mac80211/cfg80211:
- Emergency Preparedness Communication Services (EPCS) station
mode support
- support for adding and removing station links for MLO
- add support for WiFi 7/EHT mesh over 320 MHz channels
- report Tx power info for each link
- RealTek (rtw88):
- enable USB Rx aggregation and USB 3 to improve performance
- LED support
- RealTek (rtw89):
- refactor power save to support Multi-Link Operations
- add support for RTL8922AE-VS variant
- MediaTek (mt76):
- single wiphy multiband support (preparation for MLO)
- p2p device support
- add TP-Link TXE50UH USB adapter support
- Qualcomm (ath10k):
- support for the QCA6698AQ IP core
- Qualcomm (ath12k):
- enable MLO for QCN9274
- Bluetooth:
- Allow sysfs to trigger hdev reset, to allow recovering devices
not responsive from user-space
- MediaTek: add support for MT7922, MT7925, MT7921e devices
- Realtek: add support for RTL8851BE devices
- Qualcomm: add support for WCN785x devices
- ISO: allow BIG re-sync"
* tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits)
net/rose: prevent integer overflows in rose_setsockopt()
net: phylink: fix regression when binding a PHY
net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup
net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup
net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path
ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
ipv6: Move lifetime validation to inet6_rtm_newaddr().
ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
ipv6: Pass dev to inet6_addr_add().
ipv6: Convert inet6_ioctl() to per-netns RTNL.
ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
ipv6: Add __in6_dev_get_rtnl_net().
net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags
net: mii: Fix the Speed display when the network cable is not connected
sysctl net: Remove macro checks for CONFIG_SYSCTL
eth: bnxt: update header sizing defaults
...
|
|
Pull Documentation updates from Jonathan Corbet:
- Quite a bit of Chinese and Spanish translation work
- Clarifying that Git commit IDs >12chars are OK
- A new nvme-multipath document
- A reorganization of the admin-guide top-level page to make it
readable
- Clarification of the role of Acked-by and maintainer discretion on
their acceptance
- Some reorganization of debugging-oriented docs
... and typo fixes, documentation updates, etc as usual
* tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits)
Documentation: Fix x86_64 UEFI outdated references to elilo
Documentation/sysctl: Add timer_migration to kernel.rst
docs/mm: Physical memory: Remove zone_t
docs: submitting-patches: clarify that signers may use their discretion on tags
docs: submitting-patches: clarify difference between Acked-by and Reviewed-by
docs: submitting-patches: clarify Acked-by and introduce "# Suffix"
Documentation: bug-hunting.rst: remove odd contact information
docs/zh_CN: Add sak index Chinese translation
doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes
doc: module: Fix documented type of namespace
Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst
docs/zh_CN: Add landlock index Chinese translation
Documentation: Fix typo localmodonfig -> localmodconfig
overlayfs.rst: Fix and improve grammar
docs/zh_CN: Add siphash index Chinese translation
docs/zh_CN: Add security IMA-templates Chinese translation
docs/zh_CN: Add security digsig Chinese translation
Align git commit ID abbreviation guidelines and checks
docs: process: submitting-patches: split canonical patch format section
docs/zh_CN: Add security lsm Chinese translation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull kthread updates from Frederic Weisbecker:
"Kthreads affinity follow either of 4 existing different patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never
execute relevant code on any other CPU. This is currently handled
by smpboot code which takes care of CPU-hotplug operations.
Affinity here is a correctness constraint.
2) Some kthreads _have_ to be affine to a specific set of CPUs and
can't run anywhere else. The affinity is set through
kthread_bind_mask() and the subsystem takes care by itself to
handle CPU-hotplug operations. Affinity here is assumed to be a
correctness constraint.
3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
This is not a correctness constraint but merely a preference in
terms of memory locality. kswapd and kcompactd both fall into this
category. The affinity is set manually like for any other task and
CPU-hotplug is supposed to be handled by the relevant subsystem so
that the task is properly reaffined whenever a given CPU from the
node comes up. Also care should be taken so that the node affinity
doesn't cross isolated (nohz_full) cpumask boundaries.
4) Similar to the previous point except kthreads have a _preferred_
affinity different than a node. Both RCU boost kthreads and RCU
exp kworkers fall into this category as they refer to "RCU nodes"
from a distinctly distributed tree.
Currently the preferred affinity patterns (3 and 4) have at least 4
identified users, with more or less success when it comes to handle
CPU-hotplug operations and CPU isolation. Each of which do it in its
own ad-hoc way.
This is an infrastructure proposal to handle this with the following
API changes:
- kthread_create_on_node() automatically affines the created kthread
to its target node unless it has been set as per-cpu or bound with
kthread_bind[_mask]() before the first wake-up.
- kthread_affine_preferred() is a new function that can be called
right after kthread_create_on_node() to specify a preferred
affinity different than the specified node.
When the preferred affinity can't be applied because the possible
targets are offline or isolated (nohz_full), the kthread is affine to
the housekeeping CPUs (which means to all online CPUs most of the time
or only the non-nohz_full CPUs when nohz_full= is set).
kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
converted, along with a few old drivers.
Summary of the changes:
- Consolidate a bunch of ad-hoc implementations of
kthread_run_on_cpu()
- Introduce task_cpu_fallback_mask() that defines the default last
resort affinity of a task to become nohz_full aware
- Add some correctness check to ensure kthread_bind() is always
called before the first kthread wake up.
- Default affine kthread to its preferred node.
- Convert kswapd / kcompactd and remove their halfway working ad-hoc
affinity implementation
- Implement kthreads preferred affinity
- Unify kthread worker and kthread API's style
- Convert RCU kthreads to the new API and remove the ad-hoc affinity
implementation"
* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
kthread: modify kernel-doc function name to match code
rcu: Use kthread preferred affinity for RCU exp kworkers
treewide: Introduce kthread_run_worker[_on_cpu]()
kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
rcu: Use kthread preferred affinity for RCU boost
kthread: Implement preferred affinity
mm: Create/affine kswapd to its preferred node
mm: Create/affine kcompactd to its preferred node
kthread: Default affine kthread to its preferred NUMA node
kthread: Make sure kthread hasn't started while binding it
sched,arm64: Handle CPU isolation on last resort fallback rq selection
arm64: Exclude nohz_full CPUs from 32bits el0 support
lib: test_objpool: Use kthread_run_on_cpu()
kallsyms: Use kthread_run_on_cpu()
soc/qman: test: Use kthread_run_on_cpu()
arm/bL_switcher: Use kthread_run_on_cpu()
|
|
Pull drm updates from Dave Airlie:
"There are two external interactions of note, the msm tree pull in some
opp tree, hopefully the opp tree arrives from the same git tree
however it normally does.
There is also a new cgroup controller for device memory, that is used
by drm, so is merging through my tree. This will hopefully help open
up gpu cgroup usage a bit more and move us forward.
There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs.
Then the usual xe/amdgpu/i915/msm leaders and lots of changes and
refactors across the board:
core:
- device memory cgroup controller added
- Remove driver date from drm_driver
- Add drm_printer based hex dumper
- drm memory stats docs update
- scheduler documentation improvements
new driver:
- amdxdna - Ryzen AI NPU support
connector:
- add a mutex to protect ELD
- make connector setup two-step
panels:
- Introduce backlight quirks infrastructure
- New panels: KDB KD116N2130B12, Tianma TM070JDHG34-00,
- Multi-Inno Technology MI1010Z1T-1CP11
bridge:
- ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
- Provide default implementation of atomic_check for HDMI bridges
- it605: HDCP improvements, MCCS Support
xe:
- make OA buffer size configurable
- GuC capture fixes
- add ufence and g2h flushes
- restore system memory GGTT mappings
- ioctl fixes
- SRIOV PF scheduling priority
- allow fault injection
- lots of improvements/refactors
- Enable GuC's WA_DUAL_QUEUE for newer platforms
- IRQ related fixes and improvements
i915:
- More accurate engine busyness metrics with GuC submission
- Ensure partial BO segment offset never exceeds allowed max
- Flush GuC CT receive tasklet during reset preparation
- Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
- Fix DG1 power gate sequence
- Enabling uncompressed 128b/132b UHBR SST
- Handle hdmi connector init failures, and no HDMI/DP cases
- More robust engine resets on Haswell and older
i915/xe display:
- HDCP fixes for Xe3Lpd
- New GSC FW ARL-H/ARL-U
- support 3 VDSC engines 12 slices
- MBUS joining sanitisation
- reconcile i915/xe display power mgmt
- Xe3Lpd fixes
- UHBR rates for Thunderbolt
amdgpu:
- DRM panic support
- track BO memory stats at runtime
- Fix max surface handling in DC
- Cleaner shader support for gfx10.3 dGPUs
- fix drm buddy trim handling
- SDMA engine reset updates
- Fix doorbell ttm cleanup
- RAS updates
- ISP updates
- SDMA queue reset support
- Rework DPM powergating interfaces
- Documentation updates and cleanups
- DCN 3.5 updates
- Use a pm notifier to more gracefully handle VRAM eviction on
suspend or hibernate
- Add debugfs interfaces for forcing scheduling to specific engine
instances
- GG 9.5 updates
- IH 4.4 updates
- Make missing optional firmware less noisy
- PSP 13.x updates
- SMU 13.x updates
- VCN 5.x updates
- JPEG 5.x updates
- GC 12.x updates
- DC FAMS updates
amdkfd:
- GG 9.5 updates
- Logging improvements
- Shader debugger fixes
- Trap handler cleanup
- Cleanup includes
- Eviction fence wq fix
msm:
- MDSS:
- properly described UBWC registers
- added SM6150 (aka QCS615) support
- DPU:
- added SM6150 (aka QCS615) support
- enabled wide planes if virtual planes are enabled (by using two
SSPPs for a single plane)
- added CWB hardware blocks support
- DSI:
- added SM6150 (aka QCS615) support
- GPU:
- Print GMU core fw version
- GMU bandwidth voting for a740 and a750
- Expose uche trap base via uapi
- UAPI error reporting
rcar-du:
- Add r8a779h0 Support
ivpu:
- Fix qemu crash when using passthrough
nouveau:
- expose GSP-RM logging buffers via debugfs
panfrost:
- Add MT8188 Mali-G57 MC3 support
rockchip:
- Gamma LUT support
hisilicon:
- new HIBMC support
virtio-gpu:
- convert to helpers
- add prime support for scanout buffers
v3d:
- Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
vc4:
- Add support for BCM2712
vkms:
- line-per-line compositing algorithm to improve performance
zynqmp:
- Add DP audio support
mediatek:
- dp: Add sdp path reset
- dp: Support flexible length of DP calibration data
etnaviv:
- add fdinfo memory support
- add explicit reset handling"
* tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits)
drm/bridge: fix documentation for the hdmi_audio_prepare() callback
doc/cgroup: Fix title underline length
drm/doc: Include new drm-compute documentation
cgroup/dmem: Fix parameters documentation
cgroup/dmem: Select PAGE_COUNTER
kernel/cgroup: Remove the unused variable climit
drm/display: hdmi: Do not read EDID on disconnected connectors
drm/tests: hdmi: Add connector disablement test
drm/connector: hdmi: Do atomic check when necessary
drm/amd/display: 3.2.316
drm/amd/display: avoid reset DTBCLK at clock init
drm/amd/display: improve dpia pre-train
drm/amd/display: Apply DML21 Patches
drm/amd/display: Use HW lock mgr for PSR1
drm/amd/display: Revised for Replay Pseudo vblank control
drm/amd/display: Add a new flag for replay low hz
drm/amd/display: Remove unused read_ono_state function from Hwss module
drm/amd/display: Do not elevate mem_type change to full update
drm/amd/display: Do not wait for PSR disable on vbl enable
drm/amd/display: Remove unnecessary eDP power down
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function.
The fprobe and kretprobe logic implements a similar method as the
function graph tracer to trace the end of the function. That is to
hijack the return address and jump to a trampoline to do the trace
when the function exits. To do this, a shadow stack needs to be
created to store the original return address. Fprobes and function
graph do this slightly differently. Fprobes (and kretprobes) has
slots per callsite that are reserved to save the return address. This
is fine when just a few points are traced. But users of fprobes, such
as BPF programs, are starting to add many more locations, and this
method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started,
every task gets its own shadow stack to hold the return address that
is going to be traced. The function graph tracer has been updated to
allow multiple users to use its infrastructure. Now have fprobes be
one of those users. This will also allow for the fprobe and kretprobe
methods to trace the return address to become obsolete. With new
technologies like CFI that need to know about these methods of
hijacking the return address, going toward a solution that has only
one method of doing this will make the kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in
the error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable
interrupts and not trace NMIs. But the code has changed to allow NMIs
and also interrupts. This change was done a long time ago, but the
disabling of interrupts was never removed. Remove the disabling of
interrupts in the function graph tracer is it is not needed. This
greatly improves its performance.
- Allow the :mod: command to enable tracing module functions on the
kernel command line.
The function tracer already has a way to enable functions to be
traced in modules by writing ":mod:<module>" into set_ftrace_filter.
That will enable either all the functions for the module if it is
loaded, or if it is not, it will cache that command, and when the
module is loaded that matches <module>, its functions will be
enabled. This also allows init functions to be traced. But currently
events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
ftrace: Implement :mod: cache filtering on kernel command line
tracing: Adopt __free() and guard() for trace_fprobe.c
bpf: Use ftrace_get_symaddr() for kprobe_multi probes
ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
Documentation: probes: Update fprobe on function-graph tracer
selftests/ftrace: Add a test case for repeating register/unregister fprobe
selftests: ftrace: Remove obsolate maxactive syntax check
tracing/fprobe: Remove nr_maxactive from fprobe
fprobe: Add fprobe_header encoding feature
fprobe: Rewrite fprobe on function-graph tracer
s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
tracing: Add ftrace_fill_perf_regs() for perf event
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
fprobe: Use ftrace_regs in fprobe exit handler
fprobe: Use ftrace_regs in fprobe entry handler
fgraph: Pass ftrace_regs to retfunc
fgraph: Replace fgraph_ret_regs with ftrace_regs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Lockdep:
- Improve and fix lockdep bitsize limits, clarify the Kconfig
documentation (Carlos Llamas)
- Fix lockdep build warning on Clang related to
chain_hlock_class_idx() inlining (Andy Shevchenko)
- Relax the requirements of PROVE_RAW_LOCK_NESTING arch support by
not tying it to ARCH_SUPPORTS_RT unnecessarily (Waiman Long)
Rust integration:
- Support lock pointers managed by the C side (Lyude Paul)
- Support guard types (Lyude Paul)
- Update MAINTAINERS file filters to include the Rust locking code
(Boqun Feng)
Wake-queues:
- Add raw_spin_*wake() helpers to simplify locking code (John Stultz)
SMP cross-calls:
- Fix potential data update race by evaluating the local cond_func()
before IPI side-effects (Mathieu Desnoyers)
Guard primitives:
- Ease [c]tags based searches by including the cleanup/guard type
primitives (Peter Zijlstra)
ww_mutexes:
- Simplify the ww_mutex self-test code via swap() (Thorsten Blum)
Static calls:
- Update the static calls MAINTAINERS file-pattern (Jiri Slaby)"
* tag 'locking-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add static_call_inline.c to STATIC BRANCH/CALL
cleanup, tags: Create tags for the cleanup primitives
sched/wake_q: Add helper to call wake_up_q after unlock with preemption disabled
rust: sync: Add lock::Backend::assert_is_held()
rust: sync: Add SpinLockGuard type alias
rust: sync: Add MutexGuard type alias
rust: sync: Make Guard::new() public
rust: sync: Add Lock::from_raw() for Lock<(), B>
locking: MAINTAINERS: Start watching Rust locking primitives
lockdep: Move lockdep_assert_locked() under #ifdef CONFIG_PROVE_LOCKING
lockdep: Mark chain_hlock_class_idx() with __maybe_unused
lockdep: Document MAX_LOCKDEP_CHAIN_HLOCKS calculation
lockdep: Clarify size for LOCKDEP_*_BITS configs
lockdep: Fix upper limit for LOCKDEP_*_BITS configs
locking/ww_mutex/test: Use swap() macro
smp/scf: Evaluate local cond_func() before IPI side-effects
locking/lockdep: Enforce PROVE_RAW_LOCK_NESTING only if ARCH_SUPPORTS_RT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support caching symlink lengths in inodes
The size is stored in a new union utilizing the same space as
i_devices, thus avoiding growing the struct or taking up any more
space
When utilized it dodges strlen() in vfs_readlink(), giving about
1.5% speed up when issuing readlink on /initrd.img on ext4
- Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set
FOP_DONTCACHE and enable support for RWF_DONTCACHE.
If RWF_DONTCACHE is attempted without the file system supporting
it, it'll get errored with -EOPNOTSUPP
- Enable VBOXGUEST and VBOXSF_FS on ARM64
Now that VirtualBox is able to run as a host on arm64 (e.g. the
Apple M3 processors) we can enable VBOXSF_FS (and in turn
VBOXGUEST) for this architecture.
Tested with various runs of bonnie++ and dbench on an Apple MacBook
Pro with the latest Virtualbox 7.1.4 r165100 installed
Cleanups:
- Delay sysctl_nr_open check in expand_files()
- Use kernel-doc includes in fiemap docbook
- Use page->private instead of page->index in watch_queue
- Use a consume fence in mnt_idmap() as it's heavily used in
link_path_walk()
- Replace magic number 7 with ARRAY_SIZE() in fc_log
- Sort out a stale comment about races between fd alloc and dup2()
- Fix return type of do_mount() from long to int
- Various cosmetic cleanups for the lockref code
Fixes:
- Annotate spinning as unlikely() in __read_seqcount_begin
The annotation already used to be there, but got lost in commit
52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as
statement expressions")
- Fix proc_handler for sysctl_nr_open
- Flush delayed work in delayed fput()
- Fix grammar and spelling in propagate_umount()
- Fix ESP not readable during coredump
In /proc/PID/stat, there is the kstkesp field which is the stack
pointer of a thread. While the thread is active, this field reads
zero. But during a coredump, it should have a valid value
However, at the moment, kstkesp is zero even during coredump
- Don't wake up the writer if the pipe is still full
- Fix unbalanced user_access_end() in select code"
* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
gfs2: use lockref_init for qd_lockref
erofs: use lockref_init for pcl->lockref
dcache: use lockref_init for d_lockref
lockref: add a lockref_init helper
lockref: drop superfluous externs
lockref: use bool for false/true returns
lockref: improve the lockref_get_not_zero description
lockref: remove lockref_put_not_zero
fs: Fix return type of do_mount() from long to int
select: Fix unbalanced user_access_end()
vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
pipe_read: don't wake up the writer if the pipe is still full
selftests: coredump: Add stackdump test
fs/proc: do_task_stat: Fix ESP not readable during coredump
fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
fs: sort out a stale comment about races between fd alloc and dup2
fs: Fix grammar and spelling in propagate_umount()
fs: fc_log replace magic number 7 with ARRAY_SIZE()
fs: use a consume fence in mnt_idmap()
file: flush delayed work in delayed fput()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs netfs updates from Christian Brauner:
"This contains read performance improvements and support for monolithic
single-blob objects that have to be read/written as such (e.g. AFS
directory contents). The implementation of the two parts is interwoven
as each makes the other possible.
- Read performance improvements
The read performance improvements are intended to speed up some
loss of performance detected in cifs and to a lesser extend in afs.
The problem is that we queue too many work items during the
collection of read results: each individual subrequest is collected
by its own work item, and then they have to interact with each
other when a series of subrequests don't exactly align with the
pattern of folios that are being read by the overall request.
Whilst the processing of the pages covered by individual
subrequests as they complete potentially allows folios to be woken
in parallel and with minimum delay, it can shuffle wakeups for
sequential reads out of order - and that is the most common I/O
pattern.
The final assessment and cleanup of an operation is then held up
until the last I/O completes - and for a synchronous sequential
operation, this means the bouncing around of work items just adds
latency.
Two changes have been made to make this work:
(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and
also dispatches retries as necessary).
(2) For readahead and AIO, this work item be done on a workqueue
and can run in parallel with the ultimate consumer of the data;
for synchronous direct or unbuffered reads, the collection is
run in the application thread and not offloaded.
Functions such as smb2_readv_callback() then just tell netfslib
that the subrequest has terminated; netfslib does a minimal bit of
processing on the spot - stat counting and tracing mostly - and
then queues/wakes up the worker. This simplifies the logic as the
collector just walks sequentially through the subrequests as they
complete and walks through the folios, if buffered, unlocking them
as it goes. It also keeps to a minimum the amount of latency
injected into the filesystem's low-level I/O handling
The way netfs supports filesystems using the deprecated
PG_private_2 flag is changed: folios are flagged and added to a
write request as they complete and that takes care of scheduling
the writes to the cache. The originating read request can then just
unlock the pages whatever happens.
- Single-blob object support
Single-blob objects are files for which the content of the file
must be read from or written to the server in a single operation
because reading them in parts may yield inconsistent results. AFS
directories are an example of this as there exists the possibility
that the contents are generated on the fly and would differ between
reads or might change due to third party interference.
Such objects will be written to and retrieved from the cache if one
is present, though we allow/may need to propose multiple
subrequests to do so. The important part is that read from/write to
the *server* is monolithic.
Single blob reading is, for the moment, fully synchronous and does
result collection in the application thread and, also for the
moment, the API is supplied the buffer in the form of a folio_queue
chain rather than using the pagecache.
- Related afs changes
This series makes a number of changes to the kafs filesystem,
primarily in the area of directory handling:
- AFS's FetchData RPC reply processing is made partially
asynchronous which allows the netfs_io_request's outstanding
operation counter to be removed as part of reducing the
collection to a single work item.
- Directory and symlink reading are plumbed through netfslib using
the single-blob object API and are now cacheable with fscache.
This also allows the afs_read struct to be eliminated and
netfs_io_subrequest to be used directly instead.
- Directory and symlink content are now stored in a folio_queue
buffer rather than in the pagecache. This means we don't require
the RCU read lock and xarray iteration to access it, and folios
won't randomly disappear under us because the VM wants them
back.
- The vnode operation lock is changed from a mutex struct to a
private lock implementation. The problem is that the lock now
needs to be dropped in a separate thread and mutexes don't
permit that.
- When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know
what it's likely to look like).
- We now use the in-directory hashtable to reduce the number of
entries we need to scan when doing a lookup. The edit routines
have to maintain the hash chains.
- Cancellation (e.g. by signal) of an async call after the
rxrpc_call has been set up is now offloaded to the worker thread
as there will be a notification from rxrpc upon completion. This
avoids a double cleanup.
- A "rolling buffer" implementation is created to abstract out the
two separate folio_queue chaining implementations I had (one for
read and one for write).
- Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again.
This is used to handle AFS directories, but could also be used to
create bounce buffers for content crypto and transport crypto.
- The was_async argument is dropped from netfs_read_subreq_terminated()
Instead we wake the read collection work item by either queuing it
or waking up the app thread.
- We don't need to use BH-excluding locks when communicating between
the issuing thread and the collection thread as neither of them now
run in BH context.
- Also included are a number of new tracepoints; a split of the
netfslib write collection code to put retrying into its own file
(it gets more complicated with content encryption).
- There are also some minor fixes AFS included, including fixing the
AFS directory format struct layout, reducing some directory
over-invalidation and making afs_mkdir() translate EEXIST to
ENOTEMPY (which is not available on all systems the servers
support).
- Finally, there's a patch to try and detect entry into the folio
unlock function with no folio_queue structs in the buffer (which
isn't allowed in the cases that can get there).
This is a debugging patch, but should be minimal overhead"
* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...
|
|
This merges the vsnprintf internal cleanups I did, which were triggered
by a combination of performance issues (see for example commit
f9ed1f7c2e26: "genirq/proc: Use seq_put_decimal_ull_width() for decimal
values") and discussion about tracing abusing the vsnprintf code in odd
ways.
The intent was to improve code generation, but also to possibly
eventually expose the cleaned-up printf format decoding state machine.
It certainly didn't get to the point where we'd want to expose the
format decoding to external users, but it's an improvement over what we
used to have. Several of the complex case statements have been
simplified, or removed entirely to be replaced by simple table lookups.
* branch 'vsnprintf':
vsnprintf: fix the number base for non-numeric formats
vsnprintf: fix up kerneldoc for argument name changes
vsprintf: don't make the 'binary' version pack small integer arguments
vsnprintf: collapse the number format state into one single state
vsnprintf: mark the indirect width and precision cases unlikely
vsnprintf: inline skip_atoi() again
vsprintf: deal with format specifiers with a lookup table
vsprintf: deal with format flags with a simple lookup table
vsprintf: associate the format state with the format pointer
vsprintf: fix calling convention for format_decode()
vsprintf: avoid nested switch statement on same variable
vsprintf: simplify number handling
|
|
The test on whether rhashtable_insert_one did an insertion relies
on the value returned by rhashtable_lookup_one. Unfortunately that
value is overwritten after rhashtable_insert_one returns. Fix this
by moving the test before data gets overwritten.
Simplify the test as only data == NULL matters.
Finally move atomic_inc back within the lock as otherwise it may
be reordered with the atomic_dec on the removal side, potentially
leading to an underflow.
Reported-by: Michael Kelley <mhklinux@outlook.com>
Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This patch enables to update a selected component from PLDM image
containing multiple components.
Example usage:
struct pldmfw;
data.mode = PLDMFW_UPDATE_MODE_SINGLE_COMPONENT;
data.compontent_identifier = DRIVER_FW_MGMT_COMPONENT_ID;
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Konrad Knitter <konrad.knitter@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Replace int used as bool with the actual bool type for return values that
can only be true or false.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250115094702.504610-4-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
lockref_put_return returns exactly -1 and not "an error" when the lockref
is dead or locked.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250115094702.504610-3-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
lockref_put_not_zero is not used anywhere, and unless I'm missing
something didn't end up being used used at all. Remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250115094702.504610-2-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|