summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-09-29tracing/user_events: Update ABI documentation to align to bits vs bytesBeau Belgrave
Update the documentation to reflect the new ABI requirements and how to use the byte index with the mask properly to check event status. Link: https://lkml.kernel.org/r/20220728233309.1896-7-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Use bits vs bytes for enabled status page dataBeau Belgrave
User processes may require many events and when they do the cache performance of a byte index status check is less ideal than a bit index. The previous event limit per-page was 4096, the new limit is 32,768. This change adds a bitwise index to the user_reg struct. Programs check that the bit at status_bit has a bit set within the status page(s). Link: https://lkml.kernel.org/r/20220728233309.1896-6-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Use refcount instead of atomic for ref trackingBeau Belgrave
User processes could open up enough event references to cause rollovers. These could cause use after free scenarios, which we do not want. Switching to refcount APIs prevent this, but will leak memory once saturated. Once saturated, user processes can still use the events. This prevents a bad user process from stopping existing telemetry from being emitted. Link: https://lkml.kernel.org/r/20220728233309.1896-5-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Ensure user provided strings are safely formattedBeau Belgrave
User processes can provide bad strings that may cause issues or leak kernel details back out. Don't trust the content of these strings when formatting strings for matching. This also moves to a consistent dynamic length string creation model. Link: https://lkml.kernel.org/r/20220728233309.1896-4-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Use WRITE instead of READ for io vector importBeau Belgrave
import_single_range expects the direction/rw to be where it came from, not the protection/limit. Since the import is in a write path use WRITE. Link: https://lkml.kernel.org/r/20220728233309.1896-3-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Use NULL for strstr checksBeau Belgrave
Trivial fix to ensure strstr checks use NULL instead of 0. Link: https://lkml.kernel.org/r/20220728233309.1896-2-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing: Fix spelling mistake "preapre" -> "prepare"Colin Ian King
There is a spelling mistake in the trace text. Fix it. Link: https://lkml.kernel.org/r/20220928215828.66325-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing: Wake up waiters when tracing is disabledSteven Rostedt (Google)
When tracing is disabled, there's no reason that waiters should stay waiting, wake them up, otherwise tasks get stuck when they should be flushing the buffers. Cc: stable@vger.kernel.org Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing: Add ioctl() to force ring buffer waiters to wake upSteven Rostedt (Google)
If a process is waiting on the ring buffer for data, there currently isn't a clean way to force it to wake up. Add an ioctl call that will force any tasks that are waiting on the trace_pipe_raw file to wake up. Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-28tracing: Wake up ring buffer waiters on closing of the fileSteven Rostedt (Google)
When the file that represents the ring buffer is closed, there may be waiters waiting on more input from the ring buffer. Call ring_buffer_wake_waiters() to wake up any waiters when the file is closed. Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-28ring-buffer: Add ring_buffer_wake_waiters()Steven Rostedt (Google)
On closing of a file that represents a ring buffer or flushing the file, there may be waiters on the ring buffer that needs to be woken up and exit the ring_buffer_wait() function. Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer and allow them to exit the wait loop. Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ring-buffer: Check pending waiters when doing wake ups as wellSteven Rostedt (Google)
The wake up waiters only checks the "wakeup_full" variable and not the "full_waiters_pending". The full_waiters_pending is set when a waiter is added to the wait queue. The wakeup_full is only set when an event is triggered, and it clears the full_waiters_pending to avoid multiple calls to irq_work_queue(). The irq_work callback really needs to check both wakeup_full as well as full_waiters_pending such that this code can be used to wake up waiters when a file is closed that represents the ring buffer and the waiters need to be woken up. Link: https://lkml.kernel.org/r/20220927231824.209460321@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ring-buffer: Have the shortest_full queue be the shortest not longestSteven Rostedt (Google)
The logic to know when the shortest waiters on the ring buffer should be woken up or not has uses a less than instead of a greater than compare, which causes the shortest_full to actually be the longest. Link: https://lkml.kernel.org/r/20220927231823.718039222@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ring-buffer: Allow splice to read previous partially read pagesSteven Rostedt (Google)
If a page is partially read, and then the splice system call is run against the ring buffer, it will always fail to read, no matter how much is in the ring buffer. That's because the code path for a partial read of the page does will fail if the "full" flag is set. The splice system call wants full pages, so if the read of the ring buffer is not yet full, it should return zero, and the splice will block. But if a previous read was done, where the beginning has been consumed, it should still be given to the splice caller if the rest of the page has been written to. This caused the splice command to never consume data in this scenario, and let the ring buffer just fill up and lose events. Link: https://lkml.kernel.org/r/20220927144317.46be6b80@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 8789a9e7df6bf ("ring-buffer: read page interface") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ftrace: Fix recursive locking direct_mutex in ftrace_modify_direct_callerSong Liu
Naveen reported recursive locking of direct_mutex with sample ftrace-direct-modify.ko: [ 74.762406] WARNING: possible recursive locking detected [ 74.762887] 6.0.0-rc6+ #33 Not tainted [ 74.763216] -------------------------------------------- [ 74.763672] event-sample-fn/1084 is trying to acquire lock: [ 74.764152] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ register_ftrace_function+0x1f/0x180 [ 74.764922] [ 74.764922] but task is already holding lock: [ 74.765421] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ modify_ftrace_direct+0x34/0x1f0 [ 74.766142] [ 74.766142] other info that might help us debug this: [ 74.766701] Possible unsafe locking scenario: [ 74.766701] [ 74.767216] CPU0 [ 74.767437] ---- [ 74.767656] lock(direct_mutex); [ 74.767952] lock(direct_mutex); [ 74.768245] [ 74.768245] *** DEADLOCK *** [ 74.768245] [ 74.768750] May be due to missing lock nesting notation [ 74.768750] [ 74.769332] 1 lock held by event-sample-fn/1084: [ 74.769731] #0: ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ modify_ftrace_direct+0x34/0x1f0 [ 74.770496] [ 74.770496] stack backtrace: [ 74.770884] CPU: 4 PID: 1084 Comm: event-sample-fn Not tainted ... [ 74.771498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 74.772474] Call Trace: [ 74.772696] <TASK> [ 74.772896] dump_stack_lvl+0x44/0x5b [ 74.773223] __lock_acquire.cold.74+0xac/0x2b7 [ 74.773616] lock_acquire+0xd2/0x310 [ 74.773936] ? register_ftrace_function+0x1f/0x180 [ 74.774357] ? lock_is_held_type+0xd8/0x130 [ 74.774744] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.775213] __mutex_lock+0x99/0x1010 [ 74.775536] ? register_ftrace_function+0x1f/0x180 [ 74.775954] ? slab_free_freelist_hook.isra.43+0x115/0x160 [ 74.776424] ? ftrace_set_hash+0x195/0x220 [ 74.776779] ? register_ftrace_function+0x1f/0x180 [ 74.777194] ? kfree+0x3e1/0x440 [ 74.777482] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.777941] ? __schedule+0xb40/0xb40 [ 74.778258] ? register_ftrace_function+0x1f/0x180 [ 74.778672] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.779128] register_ftrace_function+0x1f/0x180 [ 74.779527] ? ftrace_set_filter_ip+0x33/0x70 [ 74.779910] ? __schedule+0xb40/0xb40 [ 74.780231] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.780678] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.781147] ftrace_modify_direct_caller+0x5b/0x90 [ 74.781563] ? 0xffffffffa0201000 [ 74.781859] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.782309] modify_ftrace_direct+0x1b2/0x1f0 [ 74.782690] ? __schedule+0xb40/0xb40 [ 74.783014] ? simple_thread+0x2a/0xb0 [ftrace_direct_modify] [ 74.783508] ? __schedule+0xb40/0xb40 [ 74.783832] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.784294] simple_thread+0x76/0xb0 [ftrace_direct_modify] [ 74.784766] kthread+0xf5/0x120 [ 74.785052] ? kthread_complete_and_exit+0x20/0x20 [ 74.785464] ret_from_fork+0x22/0x30 [ 74.785781] </TASK> Fix this by using register_ftrace_function_nolock in ftrace_modify_direct_caller. Link: https://lkml.kernel.org/r/20220927004146.1215303-1-song@kernel.org Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function") Reported-and-tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ftrace: Properly unset FTRACE_HASH_FL_MODZheng Yejian
When executing following commands like what document said, but the log "#### all functions enabled ####" was not shown as expect: 1. Set a 'mod' filter: $ echo 'write*:mod:ext3' > /sys/kernel/tracing/set_ftrace_filter 2. Invert above filter: $ echo '!write*:mod:ext3' >> /sys/kernel/tracing/set_ftrace_filter 3. Read the file: $ cat /sys/kernel/tracing/set_ftrace_filter By some debugging, I found that flag FTRACE_HASH_FL_MOD was not unset after inversion like above step 2 and then result of ftrace_hash_empty() is incorrect. Link: https://lkml.kernel.org/r/20220926152008.2239274-1-zhengyejian1@huawei.com Cc: <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 8c08f0d5c6fb ("ftrace: Have cached module filters be an active filter") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27tracing/eprobe: Fix alloc event dir failed when event name no setTao Chen
The event dir will alloc failed when event name no set, using the command: "echo "e:esys/ syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events" It seems that dir name="syscalls/sys_enter_openat" is not allowed in debugfs. So just use the "sys_enter_openat" as the event name. Link: https://lkml.kernel.org/r/1664028814-45923-1-git-send-email-chentao.kernel@linux.alibaba.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Linyu Yuan <quic_linyyuan@quicinc.com> Cc: Tao Chen <chentao.kernel@linux.alibaba.com Cc: stable@vger.kernel.org Fixes: 95c104c378dc ("tracing: Auto generate event name when creating a group of events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27x86: kprobes: Remove unused macro stack_addrChen Zhongjin
An unused macro reported by [-Wunused-macros]. This macro is used to access the sp in pt_regs because at that time x86_32 can only get sp by kernel_stack_pointer(regs). '3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")' This commit have unified the pt_regs and from them we can get sp from pt_regs with regs->sp easily. Nowhere is using this macro anymore. Refrencing pt_regs directly is more clear. Remove this macro for code cleaning. Link: https://lkml.kernel.org/r/20220924072629.104759-1-chenzhongjin@huawei.com Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ftrace: Remove obsoleted code from ftrace and task_structGaosheng Cui
The trace of "struct task_struct" was no longer used since commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do"), and the functions about flags for current->trace is useless, so remove them. Link: https://lkml.kernel.org/r/20220923090012.505990-1-cuigaosheng1@huawei.com Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27tracing: Disable interrupt or preemption before acquiring arch_spinlock_tWaiman Long
It was found that some tracing functions in kernel/trace/trace.c acquire an arch_spinlock_t with preemption and irqs enabled. An example is the tracing_saved_cmdlines_size_read() function which intermittently causes a "BUG: using smp_processor_id() in preemptible" warning when the LTP read_all_proc test is run. That can be problematic in case preemption happens after acquiring the lock. Add the necessary preemption or interrupt disabling code in the appropriate places before acquiring an arch_spinlock_t. The convention here is to disable preemption for trace_cmdline_lock and interupt for max_lock. Link: https://lkml.kernel.org/r/20220922145622.1744826-1-longman@redhat.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: stable@vger.kernel.org Fixes: a35873a0993b ("tracing: Add conditional snapshot") Fixes: 939c7a4f04fc ("tracing: Introduce saved_cmdlines_size file") Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26rv/monitor: Add __init/__exit annotations to module init/exit funcsXiu Jianfeng
Add missing __init/__exit annotations to module init/exit funcs. Link: https://lkml.kernel.org/r/20220922103208.162869-1-xiujianfeng@huawei.com Fixes: 24bce201d798 ("tools/rv: Add dot2k") Fixes: 8812d21219b9 ("rv/monitor: Add the wip monitor skeleton created by dot2k") Fixes: ccc319dcb450 ("rv/monitor: Add the wwnr monitor") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/osnoise: Fix possible recursive locking in stop_per_cpu_kthreadsNico Pache
There is a recursive lock on the cpu_hotplug_lock. In kernel/trace/trace_osnoise.c:<start/stop>_per_cpu_kthreads: - start_per_cpu_kthreads calls cpus_read_lock() and if start_kthreads returns a error it will call stop_per_cpu_kthreads. - stop_per_cpu_kthreads then calls cpus_read_lock() again causing deadlock. Fix this by calling cpus_read_unlock() before calling stop_per_cpu_kthreads. This behavior can also be seen in commit f46b16520a08 ("trace/hwlat: Implement the per-cpu mode"). This error was noticed during the LTP ftrace-stress-test: WARNING: possible recursive locking detected -------------------------------------------- sh/275006 is trying to acquire lock: ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: stop_per_cpu_kthreads but task is already holding lock: ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: start_per_cpu_kthreads other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(cpu_hotplug_lock); lock(cpu_hotplug_lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by sh/275006: #0: ffff8881023f0470 (sb_writers#24){.+.+}-{0:0}, at: ksys_write #1: ffffffffb084f430 (trace_types_lock){+.+.}-{3:3}, at: rb_simple_write #2: ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: start_per_cpu_kthreads Link: https://lkml.kernel.org/r/20220919144932.3064014-1-npache@redhat.com Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Signed-off-by: Nico Pache <npache@redhat.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: kprobe: Make gen test module work in arm and riscvYipeng Zou
For now, this selftest module can only work in x86 because of the kprobe cmd was fixed use of x86 registers. This patch adapted to register names under arm and riscv, So that this module can be worked on those platform. Link: https://lkml.kernel.org/r/20220919125629.238242-3-zouyipeng@huawei.com Cc: <linux-riscv@lists.infradead.org> Cc: <mingo@redhat.com> Cc: <paul.walmsley@sifive.com> Cc: <palmer@dabbelt.com> Cc: <aou@eecs.berkeley.edu> Cc: <zanussi@kernel.org> Cc: <liaochang1@huawei.com> Cc: <chris.zjh@huawei.com> Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: kprobe: Fix kprobe event gen test module on exitYipeng Zou
Correct gen_kretprobe_test clr event para on module exit. This will make it can't to delete. Link: https://lkml.kernel.org/r/20220919125629.238242-2-zouyipeng@huawei.com Cc: <linux-riscv@lists.infradead.org> Cc: <mingo@redhat.com> Cc: <paul.walmsley@sifive.com> Cc: <palmer@dabbelt.com> Cc: <aou@eecs.berkeley.edu> Cc: <zanussi@kernel.org> Cc: <liaochang1@huawei.com> Cc: <chris.zjh@huawei.com> Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26x86/kprobes: Remove unused arch_kprobe_override_function() declarationGaosheng Cui
All uses of arch_kprobe_override_function() have been removed by commit 540adea3809f ("error-injection: Separate error-injection from kprobe"), so remove the declaration, too. Link: https://lkml.kernel.org/r/20220914110437.1436353-3-cuigaosheng1@huawei.com Cc: <mingo@redhat.com> Cc: <tglx@linutronix.de> Cc: <bp@alien8.de> Cc: <dave.hansen@linux.intel.com> Cc: <x86@kernel.org> Cc: <hpa@zytor.com> Cc: <mhiramat@kernel.org> Cc: <peterz@infradead.org> Cc: <ast@kernel.org> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26x86/ftrace: Remove unused modifying_ftrace_code declarationGaosheng Cui
All uses of modifying_ftrace_code have been removed by commit 768ae4406a5c ("x86/ftrace: Use text_poke()"), so remove the declaration, too. Link: https://lkml.kernel.org/r/20220914110437.1436353-2-cuigaosheng1@huawei.com Cc: <mingo@redhat.com> Cc: <tglx@linutronix.de> Cc: <bp@alien8.de> Cc: <dave.hansen@linux.intel.com> Cc: <x86@kernel.org> Cc: <hpa@zytor.com> Cc: <mhiramat@kernel.org> Cc: <peterz@infradead.org> Cc: <ast@kernel.org> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracepoint: Optimize the critical region of mutex_lock in ↵Zhen Lei
tracepoint_module_coming() The memory allocation of 'tp_mod' does not require mutex_lock() protection, move it out. Link: https://lkml.kernel.org/r/20220914061416.1630-1-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/filter: Call filter predicate functions directly via a switch statementSteven Rostedt (Google)
Due to retpolines, indirect calls are much more expensive than direct calls. The filters have a select set of functions it uses for the predicates. Instead of using function pointers to call them, create a filter_pred_fn_call() function that uses a switch statement to call the predicate functions directly. This gives almost a 10% speedup to the filter logic. Using the histogram benchmark: Before: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 113 } hitcount: 272 { delta: 114 } hitcount: 840 { delta: 118 } hitcount: 344 { delta: 119 } hitcount: 25428 { delta: 120 } hitcount: 350590 { delta: 121 } hitcount: 1892484 { delta: 122 } hitcount: 6205004 { delta: 123 } hitcount: 11583521 { delta: 124 } hitcount: 37590979 { delta: 125 } hitcount: 108308504 { delta: 126 } hitcount: 131672461 { delta: 127 } hitcount: 88700598 { delta: 128 } hitcount: 65939870 { delta: 129 } hitcount: 45055004 { delta: 130 } hitcount: 33174464 { delta: 131 } hitcount: 31813493 { delta: 132 } hitcount: 29011676 { delta: 133 } hitcount: 22798782 { delta: 134 } hitcount: 22072486 { delta: 135 } hitcount: 17034113 { delta: 136 } hitcount: 8982490 { delta: 137 } hitcount: 2865908 { delta: 138 } hitcount: 980382 { delta: 139 } hitcount: 1651944 { delta: 140 } hitcount: 4112073 { delta: 141 } hitcount: 3963269 { delta: 142 } hitcount: 1712508 { delta: 143 } hitcount: 575941 After: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 103 } hitcount: 60 { delta: 104 } hitcount: 16966 { delta: 105 } hitcount: 396625 { delta: 106 } hitcount: 3223400 { delta: 107 } hitcount: 12053754 { delta: 108 } hitcount: 20241711 { delta: 109 } hitcount: 14850200 { delta: 110 } hitcount: 4946599 { delta: 111 } hitcount: 3479315 { delta: 112 } hitcount: 18698299 { delta: 113 } hitcount: 62388733 { delta: 114 } hitcount: 95803834 { delta: 115 } hitcount: 58278130 { delta: 116 } hitcount: 15364800 { delta: 117 } hitcount: 5586866 { delta: 118 } hitcount: 2346880 { delta: 119 } hitcount: 1131091 { delta: 120 } hitcount: 620896 { delta: 121 } hitcount: 236652 { delta: 122 } hitcount: 105957 { delta: 123 } hitcount: 119107 { delta: 124 } hitcount: 54494 { delta: 125 } hitcount: 63856 { delta: 126 } hitcount: 64454 { delta: 127 } hitcount: 34818 { delta: 128 } hitcount: 41446 { delta: 129 } hitcount: 51242 { delta: 130 } hitcount: 28361 { delta: 131 } hitcount: 23926 The peak before was 126ns per event, after the peak is 114ns, and the fastest time went from 113ns to 103ns. Link: https://lkml.kernel.org/r/20220906225529.781407172@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: Move struct filter_pred into trace_events_filter.cSteven Rostedt (Google)
The structure filter_pred and the typedef of the function used are only referenced by trace_events_filter.c. There's no reason to have it in an external header file. Move them into the only file they are used in. Link: https://lkml.kernel.org/r/20220906225529.598047132@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/hist: Call hist functions directly via a switch statementSteven Rostedt (Google)
Due to retpolines, indirect calls are much more expensive than direct calls. The histograms have a select set of functions it uses for the histograms, instead of using function pointers to call them, create a hist_fn_call() function that uses a switch statement to call the histogram functions directly. This gives a 13% speedup to the histogram logic. Using the histogram benchmark: Before: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 129 } hitcount: 2213 { delta: 130 } hitcount: 285965 { delta: 131 } hitcount: 1146545 { delta: 132 } hitcount: 5185432 { delta: 133 } hitcount: 19896215 { delta: 134 } hitcount: 53118616 { delta: 135 } hitcount: 83816709 { delta: 136 } hitcount: 68329562 { delta: 137 } hitcount: 41859349 { delta: 138 } hitcount: 46257797 { delta: 139 } hitcount: 54400831 { delta: 140 } hitcount: 72875007 { delta: 141 } hitcount: 76193272 { delta: 142 } hitcount: 49504263 { delta: 143 } hitcount: 38821072 { delta: 144 } hitcount: 47702679 { delta: 145 } hitcount: 41357297 { delta: 146 } hitcount: 22058238 { delta: 147 } hitcount: 9720002 { delta: 148 } hitcount: 3193542 { delta: 149 } hitcount: 927030 { delta: 150 } hitcount: 850772 { delta: 151 } hitcount: 1477380 { delta: 152 } hitcount: 2687977 { delta: 153 } hitcount: 2865985 { delta: 154 } hitcount: 1977492 { delta: 155 } hitcount: 2475607 { delta: 156 } hitcount: 3403612 After: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 113 } hitcount: 272 { delta: 114 } hitcount: 840 { delta: 118 } hitcount: 344 { delta: 119 } hitcount: 25428 { delta: 120 } hitcount: 350590 { delta: 121 } hitcount: 1892484 { delta: 122 } hitcount: 6205004 { delta: 123 } hitcount: 11583521 { delta: 124 } hitcount: 37590979 { delta: 125 } hitcount: 108308504 { delta: 126 } hitcount: 131672461 { delta: 127 } hitcount: 88700598 { delta: 128 } hitcount: 65939870 { delta: 129 } hitcount: 45055004 { delta: 130 } hitcount: 33174464 { delta: 131 } hitcount: 31813493 { delta: 132 } hitcount: 29011676 { delta: 133 } hitcount: 22798782 { delta: 134 } hitcount: 22072486 { delta: 135 } hitcount: 17034113 { delta: 136 } hitcount: 8982490 { delta: 137 } hitcount: 2865908 { delta: 138 } hitcount: 980382 { delta: 139 } hitcount: 1651944 { delta: 140 } hitcount: 4112073 { delta: 141 } hitcount: 3963269 { delta: 142 } hitcount: 1712508 { delta: 143 } hitcount: 575941 { delta: 144 } hitcount: 351427 { delta: 145 } hitcount: 218077 { delta: 146 } hitcount: 167297 { delta: 147 } hitcount: 146198 { delta: 148 } hitcount: 116122 { delta: 149 } hitcount: 58993 { delta: 150 } hitcount: 40228 The delta above is in nanoseconds. It brings the fastest time down from 129ns to 113ns, and the peak from 141ns to 126ns. Link: https://lkml.kernel.org/r/20220906225529.411545333@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: Add numeric delta time to the trace event benchmarkSteven Rostedt (Google)
In order to testing filtering and histograms via the trace event benchmark, record the delta time of the last event as a numeric value (currently, it just saves it within the string) so that filters and histograms can use it. Link: https://lkml.kernel.org/r/20220906225529.213677569@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26rv/dot2K: add 'static' qualifier for local variableZeng Heng
Following Daniel's suggestion, fix similar warning in template files, which would prevent new monitors from such warning. Link: https://lkml.kernel.org/r/20220824034357.2014202-3-zengheng4@huawei.com Cc: <mingo@redhat.com> Fixes: 24bce201d798 ("tools/rv: Add dot2k") Suggested-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Zeng Heng <zengheng4@huawei.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26rv/monitors: add 'static' qualifier for local symbolsZeng Heng
The sparse tool complains as follows: kernel/trace/rv/monitors/wwnr/wwnr.c:18:19: warning: symbol 'rv_wwnr' was not declared. Should it be static? The `rv_wwnr` symbol is not dereferenced by other extern files, so add static qualifier for it. So does wip module. Link: https://lkml.kernel.org/r/20220824034357.2014202-2-zengheng4@huawei.com Cc: <mingo@redhat.com> Fixes: ccc319dcb450 ("rv/monitor: Add the wwnr monitor") Fixes: 8812d21219b9 ("rv/monitor: Add the wip monitor skeleton created by dot2k") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26selftests/ftrace: Add eprobe syntax error testcaseMasami Hiramatsu (Google)
Add a syntax error test case for eprobe as same as kprobes. Link: https://lkml.kernel.org/r/165932115471.2850673.8014722990775242727.stgit@devnote2 Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/eprobe: Add eprobe filter supportMasami Hiramatsu (Google)
Add the filter option to the event probe. This is useful if user wants to derive a new event based on the condition of the original event. E.g. echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \ runtime=$runtime:u32 if cpu < 4' >> ../dynamic_events Then it can filter the events only on first 4 cores. Note that the fields used for 'if' must be the fields in the original events, not eprobe events. Link: https://lkml.kernel.org/r/165932114513.2850673.2592206685744598080.stgit@devnote2 Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-25Linux 6.0-rc7v6.0-rc7Linus Torvalds
2022-09-25Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Regression and bug fixes: - Performance regression fix from 5.18 on a Rasberry Pi - Fix extent parsing bug which triggers a BUG_ON when a (corrupted) extent tree has has a non-root node when zero entries. - Fix a livelock where in the right (wrong) circumstances a large number of nfsd threads can try to write to a nearly full file system, and retry for hours(!)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: limit the number of retries after discarding preallocations blocks ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 ext4: use buckets for cr 1 block scan instead of rbtree ext4: use locality group preallocation for small closed files ext4: make directory inode spreading reflect flexbg size ext4: avoid unnecessary spreading of allocations among groups ext4: make mballoc try target group first even with mb_optimize_scan
2022-09-25Merge tag 'dax-and-nvdimm-fixes-v6.0-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull NVDIMM and DAX fixes from Dan Williams: "A recently discovered one-line fix for devdax that further addresses a v5.5 regression, and (a bit embarrassing) a small batch of fixes that have been sitting in my fixes tree for weeks. The older fixes have soaked in linux-next during that time and address an fsdax infinite loop and some other minor fixups. - Fix a infinite loop bug in fsdax - Fix memory-type detection for devdax (EINJ regression) - Small cleanups" * tag 'dax-and-nvdimm-fixes-v6.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: devdax: Fix soft-reservation memory description fsdax: Fix infinite loop in dax_iomap_rw() nvdimm/namespace: drop nested variable in create_namespace_pmem() ndtest: Cleanup all of blk namespace specific code pmem: fix a name collision
2022-09-25Merge tag 'i2c-for-6.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C driver bugfixes for mlxbf and imx, a few documentation fixes after the rework this cycle, and one hardening for the i2c-mux core" * tag 'i2c-for-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mux: harden i2c_mux_alloc() against integer overflows i2c: mlxbf: Fix frequency calculation i2c: mlxbf: prevent stack overflow in mlxbf_i2c_smbus_start_transaction() i2c: mlxbf: incorrect base address passed during io write Documentation: i2c: fix references to other documents MAINTAINERS: remove Nehal Shah from AMD MP2 I2C DRIVER i2c: imx: If pm_runtime_get_sync() returned 1 device access is possible
2022-09-24Merge branch 'for-6.0/dax' into libnvdimm-fixesDan Williams
Pick up another "Soft Reservation" fix for v6.0-final on top of some straggling nvdimm fixes that missed v5.19.
2022-09-24devdax: Fix soft-reservation memory descriptionDan Williams
The "hmem" platform-devices that are created to represent the platform-advertised "Soft Reserved" memory ranges end up inserting a resource that causes the iomem_resource tree to look like this: 340000000-43fffffff : hmem.0 340000000-43fffffff : Soft Reserved 340000000-43fffffff : dax0.0 This is because insert_resource() reparents ranges when they completely intersect an existing range. This matters because code that uses region_intersects() to scan for a given IORES_DESC will only check that top-level 'hmem.0' resource and not the 'Soft Reserved' descendant. So, to support EINJ (via einj_error_inject()) to inject errors into memory hosted by a dax-device, be sure to describe the memory as IORES_DESC_SOFT_RESERVED. This is a follow-on to: commit b13a3e5fd40b ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP") ...that fixed EINJ support for "Soft Reserved" ranges in the first instance. Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration") Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com> Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com> Cc: <stable@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Omar Avelar <omar.avelar@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mark Gross <markgross@kernel.org> Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-09-24Merge tag 'kbuild-fixes-v6.0-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix build error for the combination of SYSTEM_TRUSTED_KEYRING=y and X509_CERTIFICATE_PARSER=m - Fix DEBUG_INFO_SPLIT to generate debug info for GCC 11+ and Clang 12+ - Revive debug info for assembly files - Remove unused code * tag 'kbuild-fixes-v6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: Makefile.debug: re-enable debug info for .S files Makefile.debug: set -g unconditional on CONFIG_DEBUG_INFO_SPLIT certs: make system keyring depend on built-in x509 parser Kconfig: remove unused function 'menu_get_root_menu' scripts/clang-tools: remove unused module
2022-09-24Merge tag 's390-6.0-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Vasily Gorbik: - Fix potential hangs in VFIO AP driver * tag 's390-6.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/vfio-ap: bypass unnecessary processing of AP resources
2022-09-24Merge tag 'pm-6.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an uninitialized variable usage in the operating performance points code and add missing DT bindings for it. Specifics: - Fix uninitialized variable usage in dev_pm_opp_config_clks_simple() (Christophe JAILLET) - Add missing OPP DT properties (Rob Herring)" * tag 'pm-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: dt-bindings: opp: Add missing (unevaluated|additional)Properties on child nodes OPP: Fix an un-initialized variable usage
2022-09-24Merge tag 'char-misc-6.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are three tiny driver fixes for 6.0-rc7. They include: - phy driver reset bugfix - fpga memleak bugfix - counter irq config bugfix The first two have been in linux-next for a while, the last one has only been added to my tree in the past few days, but was in linux-next under a different commit id. I couldn't pull directly from the counter tree due to some gpg key propagation issue, so I took the commit directly from email instead" * tag 'char-misc-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: counter: 104-quad-8: Fix skipped IRQ lines during events configuration fpga: m10bmc-sec: Fix possible memory leak of flash_buf phy: marvell: phy-mvebu-a3700-comphy: Remove broken reset support
2022-09-24Merge tag 'tty-6.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small, and late, serial driver fixes for 6.0-rc7 to resolve some reported problems. Included in here are: - tegra icount accounting fixes, including a framework function that other drivers will be converted over to using in 6.1-rc1. - fsl_lpuart reset bugfix - 8250 omap 485 bugfix - sifive serial clock bugfix The last three patches have not shown up in linux-next due to them being added to my tree only 2 days ago, but they are tiny and self-contained and the developers say they resolve issues that they have with 6.0-rc. The other three have been in linux-next for a while with no reported issues" * tag 'tty-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: sifive: enable clocks for UART when probed serial: 8250: omap: Use serial8250_em485_supported serial: fsl_lpuart: Reset prior to registration serial: tegra-tcu: Use uart_xmit_advance(), fixes icount.tx accounting serial: tegra: Use uart_xmit_advance(), fixes icount.tx accounting serial: Create uart_xmit_advance()
2022-09-24Merge tag 'cgroup-for-6.0-rc6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Add Waiman Long as a cpuset maintainer - cgroup_get_from_id() could be fed a kernfs ID which doesn't point to a cgroup directory but a knob file and then crash. Error out if the lookup kernfs_node isn't a directory. * tag 'cgroup-for-6.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: cgroup_get_from_id() must check the looked-up kn is a directory cpuset: Add Waiman Long as a cpuset maintainer
2022-09-24Merge tag 'wq-for-6.0-rc6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "Just one patch to improve flush lockdep coverage" * tag 'wq-for-6.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: don't skip lockdep work dependency in cancel_work_sync()
2022-09-24Merge tag 'io_uring-6.0-2022-09-23' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix for an issue with un-reaped IOPOLL requests on ring exit" * tag 'io_uring-6.0-2022-09-23' of git://git.kernel.dk/linux: io_uring: ensure that cached task references are always put on exit
2022-09-24Merge tag 'block-6.0-2022-09-22' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "Fix a regression that's been plaguing us by reverting the offending commit, as attempts to both reproduce the issue and fix it in a saner fashion have failed. Fix for a potential oops condition in the s390 dasd block driver" * tag 'block-6.0-2022-09-22' of git://git.kernel.dk/linux: Revert "block: freeze the queue earlier in del_gendisk" s390/dasd: fix Oops in dasd_alias_get_start_dev due to missing pavgroup