Age | Commit message | Author
2024-01-02 | connector: Fix proc_event_num_listeners count not cleared | wangkeqi
When we register a cn_proc listening event, the proc_event_num_listeners variable is incremented by one, but if PROC_CN_MCAST_IGNORE is not called, the count is never decremented. This causes the proc_*_connector functions to take the wrong path. The problem can be reproduced by exiting the forkstat tool with Ctrl+C. We solve it by checking whether any listeners remain and clearing proc_event_num_listeners if none do. Signed-off-by: wangkeqi <wangkeqiwang@didiglobal.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 | net: phy: linux/phy.h: fix Excess kernel-doc description warning | Randy Dunlap
Remove the @phy_timer: line to prevent the kernel-doc warning: include/linux/phy.h:768: warning: Excess struct member 'phy_timer' description in 'phy_device' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: netdev@vger.kernel.org Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 | net: Implement missing getsockopt(SO_TIMESTAMPING_NEW) | Jörn-Thorben Hinz
Commit 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") added the new socket option SO_TIMESTAMPING_NEW. Setting the option is handled in sk_setsockopt(), querying it was not handled in sk_getsockopt(), though. Following remarks on an earlier submission of this patch, keep the old behavior of getsockopt(SO_TIMESTAMPING_OLD) which returns the active flags even if they actually have been set through SO_TIMESTAMPING_NEW. The new getsockopt(SO_TIMESTAMPING_NEW) is stricter, returning flags only if they have been set through the same option. Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/ Link: https://lore.kernel.org/netdev/0d7cddc9-03fa-43db-a579-14f3e822615b@app.fastmail.com/ Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
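The asymmetry described above can be modeled in a few lines of user-space C. This is only a sketch of the described behavior, not the kernel code: `toy_sock`, `toy_setsockopt()`, `toy_getsockopt()` and the `TS_OPT_*` names are hypothetical stand-ins, and the real option values and socket flags are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two option names (not the uapi values). */
enum ts_opt { TS_OPT_OLD, TS_OPT_NEW };

/* Toy socket: the active timestamping flags plus a marker recording
 * whether they were last set through the NEW option. */
struct toy_sock {
    uint32_t tsflags;
    bool set_via_new;
};

static void toy_setsockopt(struct toy_sock *sk, enum ts_opt opt, uint32_t flags)
{
    sk->tsflags = flags;
    sk->set_via_new = (opt == TS_OPT_NEW);
}

static uint32_t toy_getsockopt(const struct toy_sock *sk, enum ts_opt opt)
{
    /* NEW is strict: report flags only if they were set through NEW. */
    if (opt == TS_OPT_NEW && !sk->set_via_new)
        return 0;
    /* OLD keeps the previous behavior: always report the active flags. */
    return sk->tsflags;
}
```

In this model, flags set through the NEW option are visible to both queries, while flags set through the OLD option are visible only to the OLD query.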
2024-01-01 | net: qrtr: ns: Return 0 if server port is not present | Sarannya S
When a 'DEL_CLIENT' message is received from the remote, the corresponding server port gets deleted. A DEL_SERVER message is then announced for this server. As part of handling the subsequent DEL_SERVER message, the nameserver attempts to delete the server port, which results in a '-ENOENT' error. The return value from server_del() is then propagated back to qrtr_ns_worker, causing excessive error prints. To address this, return 0 from control_cmd_del_server() without checking the return value of server_del(), since the above scenario is not an error case and server_del() doesn't have any other error return value. Signed-off-by: Sarannya Sasikumar <quic_sarannya@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01 | bcachefs: make RO snapshots actually RO | Kent Overstreet
Add checks to all the VFS paths for "are we in a RO snapshot?". Note - we don't check this when setting inode options via our xattr interface, since those generally only affect data placement, not contents of data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01 | bcachefs: bch_sb_field_downgrade | Kent Overstreet
Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix }, that is, a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version and when downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every journal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: bch_sb.recovery_passes_required | Kent Overstreet
Add two new superblock fields. Since the main section of the superblock is now full, we have to add a new variable-length section for them: bch_sb_field_ext. - recovery_passes_required: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and downgrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: Add persistent identifiers for recovery passes | Kent Overstreet
The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: prt_bitflags_vector() | Kent Overstreet
similar to prt_bitflags(), but for ulong arrays Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: move BCH_SB_ERRS() to sb-errors_types.h | Kent Overstreet
we need BCH_SB_ERR_MAX in bcachefs.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: fix buffer overflow in nocow write path | Kent Overstreet
BCH_REPLICAS_MAX isn't the actual maximum number of pointers in an extent, it's the maximum number of dirty pointers. We don't have a real restriction on the number of cached pointers, and we don't want a fixed size array here anyways - so switch to DARRAY_PREALLOCATED(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
2024-01-01 | bcachefs: DARRAY_PREALLOCATED() | Kent Overstreet
Add support to darray for preallocating some number of elements. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
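The idea of a dynamic array with some preallocated elements can be sketched in plain C as follows. This is an illustration only, not the bcachefs darray implementation: `int_darray`, its helpers and the `PREALLOC` capacity are hypothetical, and `malloc()` stands in for whatever allocator the kernel code uses. The array starts out pointing at inline storage and moves to the heap only when it outgrows that storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PREALLOC 4  /* hypothetical inline capacity */

struct int_darray {
    size_t nr, size;
    int *data;              /* points at prealloc[] or at a heap buffer */
    int prealloc[PREALLOC];
};

static void int_darray_init(struct int_darray *d)
{
    d->nr = 0;
    d->size = PREALLOC;
    d->data = d->prealloc;
}

/* Append one element; returns 0 on success, -1 on allocation failure. */
static int int_darray_push(struct int_darray *d, int v)
{
    if (d->nr == d->size) {
        size_t new_size = d->size * 2;
        int *p = malloc(new_size * sizeof(*p));

        if (!p)
            return -1;
        memcpy(p, d->data, d->nr * sizeof(*p));
        /* Only a heap buffer may be freed, never the inline storage. */
        if (d->data != d->prealloc)
            free(d->data);
        d->data = p;
        d->size = new_size;
    }
    d->data[d->nr++] = v;
    return 0;
}

static void int_darray_exit(struct int_darray *d)
{
    if (d->data != d->prealloc)
        free(d->data);
    int_darray_init(d);
}
```

Small cases never allocate, while large cases are not capped by a fixed-size array, which is the property the nocow write path fix above relies on. Because `data` may point into the struct itself, a struct like this must not be copied by value.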
2024-01-01 | bcachefs: Switch darray to kvmalloc() | Kent Overstreet
We sometimes use darrays for quite large buffers - the btree write buffer in particular needs large buffers, since it must be sized to hold all the write buffer keys outstanding in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: Factor out darray resize slowpath | Kent Overstreet
Move the slowpath (actually growing the darray) to an out-of-line function; also, add some helpers for the upcoming btree write buffer rewrite. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: fix setting version_upgrade_complete | Kent Overstreet
If a superblock write hasn't happened (i.e. we never had to go rw), then c->sb.version will be out of date w.r.t. c->disk_sb.sb->version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 | bcachefs: fix invalid free in dio write path | Kent Overstreet
turns out iterate_iovec() mutates __iov, we need to save our own copy Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Marcin Mirosław <marcin@mejor.pl>
2024-01-01 | bcachefs: Fix extents iteration + snapshots interaction | Kent Overstreet
peek_upto() checks against the end position and bails out before FILTER_SNAPSHOTS checks; this is because if we end up at a different inode number than the original search key, none of the keys we see might be visible in the current snapshot - we might be looking at an inode in a completely different subvolume. But this is broken, because when we're iterating over extents we're checking against the extent start position to decide when to bail out, and the extent start position isn't monotonically increasing until after we've run FILTER_SNAPSHOTS. Fix this by adding a simple inode number check where the old bailout check was, and moving the main check to the correct position. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01 | MAINTAINERS: step down as TJA11XX C45 maintainer | Radu Pirea (NXP OSS)
I am stepping down as TJA11XX C45 maintainer. Andrei Botila will take the responsibility to maintain and improve the support for TJA11XX C45 PHYs. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01 | r8169: Fix PCI error on system resume | Kai-Heng Feng
Some r8168 NICs stop working upon system resume: [ 688.051096] r8169 0000:02:00.1 enp2s0f1: rtl_ep_ocp_read_cond == 0 (loop: 10, delay: 10000). [ 688.175131] r8169 0000:02:00.1 enp2s0f1: Link is Down ... [ 691.534611] r8169 0000:02:00.1 enp2s0f1: PCI error (cmd = 0x0407, status_errs = 0x0000) Not sure if it's related, but those NICs have a BMC device at function 0: 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Realtek RealManage BMC [10ec:816e] (rev 1a) Trial and error shows that increasing the loop wait on rtl_ep_ocp_read_cond to 30 eliminates the issue, so let rtl8168ep_driver_start() wait a bit longer. Fixes: e6d6ca6e1204 ("r8169: Add support for another RTL8168FP") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01 | net/tcp_sigpool: Use kref_get_unless_zero() | Dmitry Safonov
The freeing and re-allocation of algorithm are protected by cpool_mutex, so it doesn't fix an actual use-after-free, but avoids a deserved refcount_warn_saturate() warning. A trivial fix for the racy behavior. Fixes: 8c73b26315aa ("net/tcp: Prepare tcp_md5sig_pool for TCP-AO") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
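The "get unless zero" pattern named in the subject can be illustrated with C11 atomics. This is a user-space sketch of the general technique, not the kernel's kref or refcount_t implementation; `ref_get_unless_zero()` is a hypothetical helper. A reference is taken only if the counter is still non-zero, so a count that has already dropped to zero is never incremented again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still live (count > 0). */
static bool ref_get_unless_zero(atomic_uint *refs)
{
    unsigned int old = atomic_load(refs);

    while (old != 0) {
        /* On failure, 'old' is updated to the current value and we retry. */
        if (atomic_compare_exchange_weak(refs, &old, old + 1))
            return true;
    }
    return false;  /* count already hit zero: the object is being freed */
}
```

A plain unconditional increment would instead move the count from 0 to 1, which is the situation a refcount saturation warning is meant to flag.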
2024-01-01 | net: sched: em_text: fix possible memory leak in em_text_destroy() | Hangyu Hua
m->data needs to be freed when em_text_destroy is called. Fixes: d675c989ed2d ("[PKT_SCHED]: Packet classification based on textsearch (ematch)") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-31 | Linux 6.7-rc8 (tag: v6.7-rc8) | Linus Torvalds
2023-12-31 | get_maintainer: remove stray punctuation when cleaning file emails | Alvin Šipraga
When parsing emails from .yaml files in particular, stray punctuation such as a leading '-' can end up in the name. For example, consider a common YAML section such as: maintainers: - devicetree@vger.kernel.org This would previously be processed by get_maintainer.pl as: - <devicetree@vger.kernel.org> Make the logic in clean_file_emails more robust by deleting any sub-names which consist of common single punctuation marks before proceeding to the best-effort name extraction logic. The output is then correct: devicetree@vger.kernel.org Some additional comments are added to the function to make things clearer to future readers. Link: https://lore.kernel.org/all/0173e76a36b3a9b4e7f324dd3a36fd4a9757f302.camel@perches.com/ Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-31 | get_maintainer: correctly parse UTF-8 encoded names in files | Alvin Šipraga
While the script correctly extracts UTF-8 encoded names from the MAINTAINERS file, the regular expressions damage my name when parsing from .yaml files. Fix this by replacing the Latin-1-compatible regular expressions with the unicode property matcher \p{L}, which matches on any letter according to the Unicode General Category of letters. The proposed solution only works if the script uses proper string encoding from the outset, so instruct Perl to unconditionally open all files with UTF-8 encoding. This should be safe, as the entire source tree is either UTF-8 or ASCII encoded anyway. See [1] for a detailed analysis. Furthermore, to prevent the \w expression from matching non-ASCII when checking for whether a name should be escaped with quotes, add the /a flag to the regular expression. The escaping logic was duplicated in two places, so it has been factored out into its own function. The original issue was also identified on the tools mailing list [2]. This should solve the observed side effects there as well. Link: https://lore.kernel.org/all/dzn6uco4c45oaa3ia4u37uo5mlt33obecv7gghj2l756fr4hdh@mt3cprft3tmq/ [1] Link: https://lore.kernel.org/tools/20230726-gush-slouching-a5cd41@meerkat/ [2] Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30 | Merge tag 'trace-v6.7-rc7' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix readers that are blocked on the ring buffer when buffer_percent is 100%. They are supposed to wake up when the buffer is full, but because the sub-buffer that the writer is on is never considered "dirty" in the calculation, dirty pages will never equal nr_pages. Add +1 to the dirty count in order to count for the sub-buffer that the writer is on. - When a reader is blocked on the "snapshot_raw" file, it is to be woken up when a snapshot is done and be able to read the snapshot buffer. But because the snapshot swaps the buffers (the main one with the snapshot one), and the snapshot reader is waiting on the old snapshot buffer, it was not woken up (because it is now on the main buffer after the swap). Worse yet, when it reads the buffer after a snapshot, it's not reading the snapshot buffer, it's reading the live active main buffer. Fix this by forcing a wakeup of all readers on the snapshot buffer when a new snapshot happens, and then update the buffer that the reader is reading to be back on the snapshot buffer. - Fix the modification of the direct_function hash. There was a race when new functions were added to the direct_function hash as when it moved function entries from the old hash to the new one, a direct function trace could be hit and not see its entry. This is fixed by allocating the new hash, copy all the old entries onto it as well as the new entries, and then use rcu_assign_pointer() to update the new direct_function hash with it. This also fixes a memory leak in that code. - Fix eventfs ownership * tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Fix modification of direct_function hash while in use tracing: Fix blocked reader of snapshot buffer ring-buffer: Fix wake ups when buffer_percent is set to 100 eventfs: Fix file and directory uid and gid ownership
2023-12-30 | locking/osq_lock: Clarify osq_wait_next() | David Laight
Directly return NULL or 'next' instead of breaking out of the loop. Signed-off-by: David Laight <david.laight@aculab.com> [ Split original patch into two independent parts - Linus ] Link: https://lore.kernel.org/lkml/7c8828aec72e42eeb841ca0ee3397e9a@AcuMS.aculab.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30 | locking/osq_lock: Clarify osq_wait_next() calling convention | David Laight
osq_wait_next() is passed 'prev' from osq_lock() and NULL from osq_unlock() but only needs the 'cpu' value to write to lock->tail. Just pass prev->cpu or OSQ_UNLOCKED_VAL instead. Should have no effect on the generated code since gcc manages to assume that 'prev != NULL' due to an earlier dereference. Signed-off-by: David Laight <david.laight@aculab.com> [ Changed 'old' to 'old_cpu' by request from Waiman Long - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30 | locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c | David Laight
struct optimistic_spin_node is private to the implementation. Move it into the C file to ensure nothing is accessing it. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30 | ftrace: Fix modification of direct_function hash while in use | Steven Rostedt (Google)
Masami Hiramatsu reported a memory leak in register_ftrace_direct() where if the number of new entries are added is large enough to cause two allocations in the loop: for (i = 0; i < size; i++) { hlist_for_each_entry(entry, &hash->buckets[i], hlist) { new = ftrace_add_rec_direct(entry->ip, addr, &free_hash); if (!new) goto out_remove; entry->direct = addr; } } Where ftrace_add_rec_direct() has: if (ftrace_hash_empty(direct_functions) || direct_functions->count > 2 * (1 << direct_functions->size_bits)) { struct ftrace_hash *new_hash; int size = ftrace_hash_empty(direct_functions) ? 0 : direct_functions->count + 1; if (size < 32) size = 32; new_hash = dup_hash(direct_functions, size); if (!new_hash) return NULL; *free_hash = direct_functions; direct_functions = new_hash; } The "*free_hash = direct_functions;" can happen twice, losing the previous allocation of direct_functions. But this also exposed a more serious bug. The modification of direct_functions above is not safe. As direct_functions can be referenced at any time to find what direct caller it should call, the time between: new_hash = dup_hash(direct_functions, size); and direct_functions = new_hash; can have a race with another CPU (or even this one if it gets interrupted), and the entries being moved to the new hash are not referenced. That's because the "dup_hash()" is really misnamed and is really a "move_hash()". It moves the entries from the old hash to the new one. Now even if that was changed, this code is not proper as direct_functions should not be updated until the end. That is the best way to handle function reference changes, and is the way other parts of ftrace handles this. The following is done: 1. Change add_hash_entry() to return the entry it created and inserted into the hash, and not just return success or not. 2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove the former. 3. 
Allocate a "new_hash" at the start that is made for holding both the new hash entries as well as the existing entries in direct_functions. 4. Copy (not move) the direct_function entries over to the new_hash. 5. Copy the entries of the added hash to the new_hash. 6. If everything succeeds, then use rcu_assign_pointer() to update the direct_functions with the new_hash. This simplifies the code and fixes both the memory leak and the race condition mentioned above. Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/ Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
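The copy-then-publish approach in the steps above can be sketched in user-space C. This is a simplified model under stated assumptions, not the ftrace code: a flat array stands in for the hash, `table`, `live_table`, `table_add()` and `table_contains()` are hypothetical names, a release store stands in for the RCU pointer update, and a single writer is assumed. The point is that readers see either the complete old table or the complete new one, never a table whose entries are in the middle of being moved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct table {
    size_t count;
    unsigned long ips[];   /* entries; a flat array stands in for the hash */
};

static _Atomic(struct table *) live_table;  /* what readers dereference */

/* Build a table holding the old plus the added entries, then publish it
 * with one store. On success returns 0 and hands the previous table back
 * through old_out so the caller can free it once no reader can still be
 * using it. On failure returns -1 and leaves live_table untouched. */
static int table_add(const unsigned long *added, size_t n, struct table **old_out)
{
    struct table *old = atomic_load_explicit(&live_table, memory_order_acquire);
    size_t old_count = old ? old->count : 0;
    struct table *new_table;

    new_table = malloc(sizeof(*new_table) +
                       (old_count + n) * sizeof(new_table->ips[0]));
    if (!new_table)
        return -1;

    /* Copy (not move) the existing entries, then append the new ones. */
    if (old_count)
        memcpy(new_table->ips, old->ips, old_count * sizeof(old->ips[0]));
    if (n)
        memcpy(new_table->ips + old_count, added, n * sizeof(added[0]));
    new_table->count = old_count + n;

    /* Publish only after the new table is fully built. */
    atomic_store_explicit(&live_table, new_table, memory_order_release);
    *old_out = old;
    return 0;
}

/* Reader side: look an entry up in whichever table is currently live. */
static int table_contains(unsigned long ip)
{
    struct table *t = atomic_load_explicit(&live_table, memory_order_acquire);

    for (size_t i = 0; t && i < t->count; i++)
        if (t->ips[i] == ip)
            return 1;
    return 0;
}
```

In the kernel, freeing the old table additionally has to wait until no reader can still hold a pointer to it; this sketch leaves that to the caller.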
2023-12-30 | ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series | Aabish Malik
The HP Pavilion 14 ec1xxx series uses the HP mainboard 8A0F with the ALC287 codec. The mute LED can be enabled using the already existing ALC287_FIXUP_HP_GPIO_LED quirk. Tested on an HP Pavilion ec1003AU. Signed-off-by: Aabish Malik <aabishmalik3337@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231229170352.742261-3-aabishmalik3337@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29 | mlxbf_gige: fix receive packet race condition | David Thompson
Under heavy traffic, the BlueField Gigabit interface can become unresponsive. This is due to a possible race condition in the mlxbf_gige_rx_packet function, where the function exits with producer and consumer indices equal but there are remaining packet(s) to be processed. In order to prevent this situation, read receive consumer index *before* the HW replenish so that the mlxbf_gige_rx_packet function returns an accurate return value even if a packet is received into just-replenished buffer prior to exiting this routine. If the just-replenished buffer is received and occupies the last RX ring entry, the interface would not recover and instead would encounter RX packet drops related to internal buffer shortages since the driver RX logic is not being triggered to drain the RX ring. This patch will address and prevent this "ring full" condition. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29 | Merge tag 'gpio-fixes-for-v6.7-rc8' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - Andy steps down as GPIO reviewer - Kent becomes a reviewer for GPIO uAPI - add missing intel file to the relevant MAINTAINERS section * tag 'gpio-fixes-for-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: Add a missing file to the INTEL GPIO section MAINTAINERS: Remove Andy from GPIO maintainers MAINTAINERS: split out the uAPI into a new section
2023-12-29 | Merge tag 'platform-drivers-x86-v6.7-6' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - Intel PMC GBE LTR regression - P2SB / PCI deadlock fix * tag 'platform-drivers-x86-v6.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback platform/x86/intel/pmc: Allow reenabling LTRs platform/x86/intel/pmc: Add suspend callback platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
2023-12-29 | Merge tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux | Linus Torvalds
Pull block fixes from Jens Axboe: "Fix for a badly numbered flag, and a regression fix for the badblocks updates from this merge window" * tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux: block: renumber QUEUE_FLAG_HW_WC badblocks: avoid checking invalid range in badblocks_check()
2023-12-29 | mailmap: add entries for Mathieu Othacehe | Mathieu Othacehe
Add my gnu.org mail address. Link: https://lkml.kernel.org/r/20231223144226.25740-1-othacehe@gnu.org Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Matt Ranostay <matt@ranostay.sg> Cc: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 | MAINTAINERS: change vmware.com addresses to broadcom.com | Zack Rusin
Update the email addresses for vmwgfx and vmmouse to reflect the fact that VMware is now part of Broadcom. Add a .mailmap entry because the vmware.com address will start bouncing soon. Link: https://lkml.kernel.org/r/20231224052036.603621-1-zack.rusin@broadcom.com Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Cc: Ian Forbes <ian.forbes@broadcom.com> Cc: Martin Krastev <martin.krastev@broadcom.com> Cc: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 | arch/mm/fault: fix major fault accounting when retrying under per-VMA lock | Suren Baghdasaryan
A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. [1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 | mm/mglru: skip special VMAs in lru_gen_look_around() | Yu Zhao
Special VMAs like VM_PFNMAP can contain anon pages from COW. There isn't much profit in doing lookaround on them. Besides, they can trigger the pte_special() warning in get_pte_pfn(). Skip them in lru_gen_look_around(). Link: https://lkml.kernel.org/r/20231223045647.1566043-1-yuzhao@google.com Fixes: 018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: syzbot+03fd9b3f71641f0ebf2d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/000000000000f9ff00060d14c256@google.com/ Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 | MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin | Naoya Horiguchi
Miaohe Lin has contributed to the hwpoison subsystem as a reviewer for more than 1.5 years, and has made many patch contributions to the hwpoison subsystem and the memory management subsystem. So I'd like to pass on the hwpoison maintainership to Miaohe. [nao.horiguchi@gmail.com: update to keep myself as a reviewer] Link: https://lkml.kernel.org/r/20231223031115.GA2883156@u2004 Link: https://lkml.kernel.org/r/20231222024024.1601043-1-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 | MAINTAINERS: remove hugetlb maintainer Mike Kravetz | Mike Kravetz
I am stepping away from my role as hugetlb maintainer. There should be no gap in coverage as Muchun Song is also a hugetlb maintainer. [akpm@linux-foundation.org: update CREDITS] Link: https://lkml.kernel.org/r/20231220220843.73586-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 | mm: fix unmap_mapping_range high bits shift bug | Jiajun Xie
The bug happens when the highest bit of holebegin is 1. Suppose holebegin is 0x8000000111111000; after the shift, hba would be 0xfff8000000111111, and vma_interval_tree_foreach would then either fail the lookup or return the wrong result.

Error call sequence, e.g.:
- mmap(..., offset=0x8000000111111000)
  |- syscall(mmap, ... unsigned long, off):
     |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);

Here pgoff is correctly shifted to 0x8000000111111, but passing 0x8000000111111000 as holebegin to unmap then causes a wrong result, as shown below:
- unmap_mapping_range(..., loff_t const holebegin)
  |- pgoff_t hba = holebegin >> PAGE_SHIFT; /* hba = 0xfff8000000111111 unexpectedly */

The issue happens in heterogeneous computing, where the device (e.g. GPU) and the host share the same virtual address space. A simple workflow pattern which hits the issue is:

/* host */
1. userspace first mmaps a file-backed VA range with a specified offset, e.g. (offset=0x800..., mmap return: va_a)
2. write some data to the corresponding sys page, e.g. (va_a = 0xAABB)
/* device */
3. gpu workload touches VA, triggers a gpu fault and notifies the host.
/* host */
4. received the gpu fault notification, then it will:
   4.1 unmap host pages and also take care of the cpu tlb (use unmap_mapping_range with offset=0x800...)
   4.2 migrate sys page to device
   4.3 set up the device page table and resolve the device fault.
/* device */
5. gpu workload continued, it accessed va_a and got 0xAABB.
6. gpu workload continued, it wrote 0xBBCC to va_a.
/* host */
7. userspace accesses va_a; as expected, it will:
   7.1 trigger a cpu vm fault.
   7.2 driver handles the fault to migrate the gpu local page to the host.
8. userspace then could correctly get 0xBBCC from va_a
9. done

But in step 4.1, if we hit the bug this patch describes, then userspace would never trigger a cpu fault, and would still get the old value: 0xAABB. Making holebegin unsigned first fixes the bug.
Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
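The sign-extension effect described in this commit can be reproduced in user space. The sketch below assumes 4 KiB pages (`PAGE_SHIFT` of 12) and uses `int64_t` as a stand-in for `loff_t`; `pgoff_signed()` and `pgoff_unsigned()` are hypothetical helpers, not kernel functions. Note that right-shifting a negative signed value is implementation-defined in C; GCC and Clang perform an arithmetic shift, which is what produces the unexpected high bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumes 4 KiB pages */

/* Buggy form: the shift happens on the signed value, so the sign bit
 * is replicated into the top PAGE_SHIFT bits (arithmetic shift). */
static uint64_t pgoff_signed(int64_t holebegin)
{
    return (uint64_t)(holebegin >> PAGE_SHIFT);
}

/* Fixed form: convert to unsigned first, then shift (logical shift). */
static uint64_t pgoff_unsigned(int64_t holebegin)
{
    return (uint64_t)holebegin >> PAGE_SHIFT;
}
```

With the offset from the commit message, 0x8000000111111000, the signed variant yields 0xfff8000000111111 on GCC and Clang, while the unsigned variant yields the 0x8000000111111 that ksys_mmap_pgoff() computed.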
2023-12-29 | mm: memcg: fix split queue list crash when large folio migration | Baolin Wang
When running autonuma with enabling multi-size THP, I encountered the following kernel crash issue: [ 134.290216] list_del corruption. prev->next should be fffff9ad42e1c490, but was dead000000000100. (prev=fffff9ad42399890) [ 134.290877] kernel BUG at lib/list_debug.c:62! [ 134.291052] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 134.291210] CPU: 56 PID: 8037 Comm: numa01 Kdump: loaded Tainted: G E 6.7.0-rc4+ #20 [ 134.291649] RIP: 0010:__list_del_entry_valid_or_report+0x97/0xb0 ...... [ 134.294252] Call Trace: [ 134.294362] <TASK> [ 134.294440] ? die+0x33/0x90 [ 134.294561] ? do_trap+0xe0/0x110 ...... [ 134.295681] ? __list_del_entry_valid_or_report+0x97/0xb0 [ 134.295842] folio_undo_large_rmappable+0x99/0x100 [ 134.296003] destroy_large_folio+0x68/0x70 [ 134.296172] migrate_folio_move+0x12e/0x260 [ 134.296264] ? __pfx_remove_migration_pte+0x10/0x10 [ 134.296389] migrate_pages_batch+0x495/0x6b0 [ 134.296523] migrate_pages+0x1d0/0x500 [ 134.296646] ? __pfx_alloc_misplaced_dst_folio+0x10/0x10 [ 134.296799] migrate_misplaced_folio+0x12d/0x2b0 [ 134.296953] do_numa_page+0x1f4/0x570 [ 134.297121] __handle_mm_fault+0x2b0/0x6c0 [ 134.297254] handle_mm_fault+0x107/0x270 [ 134.300897] do_user_addr_fault+0x167/0x680 [ 134.304561] exc_page_fault+0x65/0x140 [ 134.307919] asm_exc_page_fault+0x22/0x30 The reason for the crash is that, the commit 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") removed the charging and uncharging operations of the migration folios and cleared the memcg data of the old folio. During the subsequent release process of the old large folio in destroy_large_folio(), if the large folio needs to be removed from the split queue, an incorrect split queue can be obtained (which is pgdat->deferred_split_queue) because the old folio's memcg is NULL now. This can lead to list operations being performed under the wrong split queue lock protection, resulting in a list crash as above. 
After the migration, the old folio is going to be freed, so we can remove it from the split queue in mem_cgroup_migrate() a bit earlier before clearing the memcg data to avoid getting incorrect split queue. [akpm@linux-foundation.org: fix comment, per Zi Yan] Link: https://lkml.kernel.org/r/61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29  mm: fix arithmetic for max_prop_frac when setting max_ratio  Jingbo Xu

Since bdi->max_ratio is now expressed in parts per million, fix the arithmetic for max_prop_frac when setting max_ratio. Otherwise the miscalculated max_prop_frac affects how the writeout completion count is incremented when max_ratio is not 100%.

Link: https://lkml.kernel.org/r/20231219142508.86265-3-jefflexu@linux.alibaba.com
Fixes: efc3e6ad53ea ("mm: split off __bdi_set_max_ratio() function")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
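As a rough illustration of the arithmetic at stake, the sketch below derives a fixed-point fraction from a ratio given in parts per million. The constants `FRAC_BASE` (1024 standing for 100%) and `RATIO_SCALE` (10000 ppm per percent) and both function names are assumptions made for this example; it is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BASE   1024ULL   /* assumed fixed-point value representing 100% */
#define RATIO_SCALE 10000ULL  /* assumed: 1 percent == 10000 parts per million */

/* Fraction of FRAC_BASE for a ratio expressed in parts per million. */
static uint64_t prop_frac_from_ppm(uint64_t ratio_ppm)
{
    assert(ratio_ppm <= 100 * RATIO_SCALE);
    return FRAC_BASE * ratio_ppm / (100 * RATIO_SCALE);
}

/* Stale formula that still divides as if the ratio were a plain percentage. */
static uint64_t prop_frac_stale(uint64_t ratio_ppm)
{
    return FRAC_BASE * ratio_ppm / 100;
}
```

With a ratio of 50% (500000 ppm), the first function yields half of `FRAC_BASE`, while the stale formula yields a value far above `FRAC_BASE`, which is the kind of miscalculation a unit change can leave behind.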
2023-12-29  mm: fix arithmetic for bdi min_ratio  Jingbo Xu

Since bdi->min_ratio is now expressed in parts per million, fix the arithmetic. Otherwise setting a reasonable min_ratio fails with -EINVAL, because the code tries to set min_ratio to (min_ratio * BDI_RATIO_SCALE) in percentage units, which always exceeds 100%.

# cat /sys/class/bdi/253\:0/min_ratio
0
# cat /sys/class/bdi/253\:0/max_ratio
100
# echo 1 > /sys/class/bdi/253\:0/min_ratio
-bash: echo: write error: Invalid argument

Link: https://lkml.kernel.org/r/20231219142508.86265-2-jefflexu@linux.alibaba.com
Fixes: 8021fb3232f2 ("mm: split off __bdi_set_min_ratio() function")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
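The unit mismatch described in this message can be modelled with a small userspace sketch. It assumes a scale of 10000 parts per million per percent; `RATIO_SCALE`, `percent_to_ppm`, `set_min_ratio_ppm` and `set_min_ratio_wrong_unit` are hypothetical names for illustration and do not reproduce the kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Assumed scale: 1 percent == 10000 parts per million. */
#define RATIO_SCALE 10000U

/* Convert a percentage (0..100) to parts per million. */
static unsigned int percent_to_ppm(unsigned int percent)
{
    assert(percent <= 100);
    return percent * RATIO_SCALE;
}

/* Setter that consistently works in parts per million. */
static int set_min_ratio_ppm(unsigned int *min_ppm, unsigned int max_ppm,
                             unsigned int new_min_ppm)
{
    if (new_min_ppm > max_ppm)
        return -EINVAL;
    *min_ppm = new_min_ppm;
    return 0;
}

/* Mismatched variant: checks an already-scaled value against a percent bound. */
static int set_min_ratio_wrong_unit(unsigned int *min_ppm,
                                    unsigned int new_min_ppm)
{
    if (new_min_ppm > 100) /* bound is in percent, value is in ppm */
        return -EINVAL;
    *min_ppm = new_min_ppm;
    return 0;
}
```

In this model, a request for 1% arrives as 10000 ppm. The mismatched variant rejects it because 10000 is greater than 100, mirroring the write error in the transcript, while the consistent variant accepts it.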
2023-12-29  mm: align larger anonymous mappings on THP boundaries  Rik van Riel

Align larger anonymous memory mappings on THP boundaries by going through thp_get_unmapped_area if THPs are enabled for the current process.

With this patch, larger anonymous mappings are now THP aligned. When a malloc library allocates a 2MB or larger arena, that arena can now be mapped with THPs right from the start, which can result in better TLB hit rates and execution time.

Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
Link: https://lkml.kernel.org/r/20231214223423.1133074-1-yang@os.amperecomputing.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
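To make the alignment idea concrete, here is a minimal sketch of rounding a candidate address up to a huge page boundary. It assumes a 2 MiB huge page size (the size mentioned in the message); `align_up` and `place_mapping` are hypothetical helpers for illustration and are not how thp_get_unmapped_area is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed huge page size: 2 MiB. */
#define HPAGE_SIZE ((uint64_t)2 << 20)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical policy: only mappings of at least one huge page are aligned. */
static uint64_t place_mapping(uint64_t hint, uint64_t len)
{
    if (len < HPAGE_SIZE)
        return hint;
    return align_up(hint, HPAGE_SIZE);
}
```

A mapping that starts on a huge page boundary and spans at least one huge page can be backed by a THP from its first byte, which is why the start address matters for larger arenas.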
2023-12-29  ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset  Eugen Hristev

The correct register for the AUD_PAD_TOP widget is AFE_AUD_PAD_TOP, not zero. With zero as the register, `snd_soc_dapm_new_widgets` would try to read the register at offset zero when getting the power status of this widget, which is incorrect.

Fixes: b65c466220b3 ("ASoC: mediatek: mt8186: support adda in platform driver")
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
Link: https://lore.kernel.org/r/20231229114342.195867-1-eugen.hristev@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2023-12-29  cifs: do not depend on release_iface for maintaining iface_list  Shyam Prasad N

parse_server_interfaces should be in complete charge of maintaining the iface_list linked list. Today, iface entries are removed from the list only when the last refcount is dropped, i.e. in release_iface. However, this can result in undercounting of refcount if the server stops advertising interfaces (which Azure SMB server does).

This change puts parse_server_interfaces in full charge of maintaining the iface_list. So if an empty list is returned by the server, the entries in the list will immediately be removed. This way, a following call to the same function will not find entries in the list.

Fixes: aa45dadd34e4 ("cifs: change iface_list from array to sorted linked list")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-29  cifs: cifs_chan_is_iface_active should be called with chan_lock held  Shyam Prasad N

cifs_chan_is_iface_active checks the channels of a session to see if the associated iface is active. This should always happen with chan_lock held. However, these two callers of this function were missing this locking.

This change makes sure the function calls are protected with proper locking.

Fixes: b54034a73baf ("cifs: during reconnect, update interface if necessary")
Fixes: fa1d0508bdd4 ("cifs: account for primary channel in the interface list")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-29  cifs: after disabling multichannel, mark tcon for reconnect  Shyam Prasad N

Once the server disables multichannel for an active multichannel session, on the following reconnect, the client would reduce the number of channels to 1. However, it could be the case that the tree connect was active on one of these disabled channels. This results in an unrecoverable state.

This change fixes that by making sure that whenever a channel is being terminated, the session and tcon are marked for reconnect too. This could mean a few redundant tree connect calls to the server, but considering that this is not a frequent event, we should be okay.

Fixes: ee1d21794e55 ("cifs: handle when server stops supporting multichannel")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-29  ALSA: scarlett2: Convert meter levels from little-endian  Geoffrey D. Bennett

Add missing conversion from little-endian data to CPU-endian in scarlett2_usb_get_meter_levels().

Fixes: 3473185f31df ("ALSA: scarlett2: Remap Level Meter values")
Signed-off-by: Geoffrey D. Bennett <g@b4.vu>
Link: https://lore.kernel.org/r/ZYsBIE3DSKdi4YC/@m.b4.vu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
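For readers unfamiliar with this class of bug, the following portable sketch shows what a little-endian to host-order conversion does. The 32-bit wire width, the buffer layout, and the names `load_le32` and `decode_levels` are assumptions for illustration; the driver itself would use the kernel's own byte-order helpers rather than this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from raw bytes on any host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: convert `count` little-endian 32-bit words from a
 * device response into host-order 16-bit levels (upper bits are dropped).
 */
static void decode_levels(const uint8_t *resp, size_t count, uint16_t *levels)
{
    assert(resp != NULL && levels != NULL);
    for (size_t i = 0; i < count; i++)
        levels[i] = (uint16_t)load_le32(resp + 4 * i);
}
```

Without such a conversion, code that reads the raw words directly happens to work on little-endian hosts but produces byte-swapped values on big-endian ones.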