| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-03-31 08:52:33 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-03-31 08:52:33 -0700 |
| commit | 802f0d58d52e8e34e08718479475ccdff0caffa0 (patch) | |
| tree | 305f3be98d12b0c6881a6c59eb92e795e6088e51 /tools/perf/util/parse-events.c | |
| parent | 4e82c87058f45e79eeaa4d5bcc3b38dd3dce7209 (diff) | |
| parent | 35d13f841a3d8159ef20d5e32a9ed3faa27875bc (diff) | |
Merge tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf record:
- Introduce latency profiling using scheduler information.
The latency profiling is to show impacts on wall-time rather than
cpu-time. By tracking context switches, it can weight samples and
find which part of the code contributed more to the execution
latency.
The value (period) of the sample is weighted by dividing it by the
number of parallel executions at the moment. The parallelism is
tracked in perf report with sched-switch records. This will reduce
the portion that is run in parallel and in turn increase the
portion of serial executions.
For now, it's limited to profiling processes, IOW system-wide
profiling is not supported. You can add the --latency option to enable
this.
$ perf record --latency -- make -C tools/perf
I've run the above command for perf build which adds -j option to
make with the number of CPUs in the system internally. Normally
it'd show something like below:
$ perf report -F overhead,comm
...
#
# Overhead Command
# ........ ...............
#
78.97% cc1
6.54% python3
4.21% shellcheck
3.28% ld
1.80% as
1.37% cc1plus
0.80% sh
0.62% clang
0.56% gcc
0.44% perl
0.39% make
...
The cc1 takes around 80% of the overhead as it's the actual
compiler. However it runs in parallel so its contribution to
latency may be less than that. Now, perf report will show both
overhead and latency (if --latency was given at record time) like
below:
$ perf report -s comm
...
#
# Overhead Latency Command
# ........ ........ ...............
#
78.97% 48.66% cc1
6.54% 25.68% python3
4.21% 0.39% shellcheck
3.28% 13.70% ld
1.80% 2.56% as
1.37% 3.08% cc1plus
0.80% 0.98% sh
0.62% 0.61% clang
0.56% 0.33% gcc
0.44% 1.71% perl
0.39% 0.83% make
...
You can see latency of cc1 goes down to around 50% and python3 and
ld contribute a lot more than their overhead. You can use --latency
option in perf report to get the same result but ordered by
latency.
$ perf report --latency -s comm
perf report:
- As a side effect of the latency profiling work, it adds a new
output field 'latency' and a sort key 'parallelism'. The below is a
result from my system with 64 CPUs. The build was well-parallelized
but contained some serial portions.
$ perf report -s parallelism
...
#
# Overhead Latency Parallelism
# ........ ........ ...........
#
16.95% 1.54% 62
13.38% 1.24% 61
12.50% 70.47% 1
11.81% 1.06% 63
7.59% 0.71% 60
4.33% 12.20% 2
3.41% 0.33% 59
2.05% 0.18% 64
1.75% 1.09% 9
1.64% 1.85% 5
...
- Support Fedora mini-debuginfo which is an LZMA compressed symbol
table inside the ".gnu_debugdata" ELF section.
perf annotate:
- Add --code-with-type option to enable data-type profiling with the
usual annotate output.
Instead of focusing on data structure, it shows code annotation
together with data type it accesses in case the instruction refers
to a memory location (and it was able to resolve the target data
type). Currently it only works with --stdio.
$ perf annotate --stdio --code-with-type
...
Percent | Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period)
----------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff81050610 <__fdget>:
0.00 : ffffffff81050610: callq 0xffffffff81c01b80 <__fentry__> # data-type: (stack operation)
0.00 : ffffffff81050615: pushq %rbp # data-type: (stack operation)
0.00 : ffffffff81050616: movq %rsp, %rbp
0.00 : ffffffff81050619: pushq %r15 # data-type: (stack operation)
0.00 : ffffffff8105061b: pushq %r14 # data-type: (stack operation)
0.00 : ffffffff8105061d: pushq %rbx # data-type: (stack operation)
0.00 : ffffffff8105061e: subq $0x10, %rsp
0.00 : ffffffff81050622: movl %edi, %ebx
0.00 : ffffffff81050624: movq %gs:0x7efc4814(%rip), %rax # 0x14e40 <current_task> # data-type: struct task_struct* +0
0.00 : ffffffff8105062c: movq 0x8d0(%rax), %r14 # data-type: struct task_struct +0x8d0 (files)
0.00 : ffffffff81050633: movl (%r14), %eax # data-type: struct files_struct +0 (count.counter)
0.00 : ffffffff81050636: cmpl $0x1, %eax
0.00 : ffffffff81050639: je 0xffffffff810506a9 <__fdget+0x99>
0.00 : ffffffff8105063b: movq 0x20(%r14), %rcx # data-type: struct files_struct +0x20 (fdt)
0.00 : ffffffff8105063f: movl (%rcx), %eax # data-type: struct fdtable +0 (max_fds)
0.00 : ffffffff81050641: cmpl %ebx, %eax
0.00 : ffffffff81050643: jbe 0xffffffff810506ef <__fdget+0xdf>
0.00 : ffffffff81050649: movl %ebx, %r15d
5.56 : ffffffff8105064c: movq 0x8(%rcx), %rdx # data-type: struct fdtable +0x8 (fd)
...
The "# data-type:" part was added with this change. The first few
entries are not very interesting. But later you can see that it accesses a
couple of fields in the task_struct, files_struct and fdtable.
perf trace:
- Support syscall tracing for different ABIs. For example it can trace
system calls for 32-bit applications on a 64-bit kernel
transparently.
- Add --summary-mode=total option to show global syscall summary. The
default is 'thread' to show per-thread syscall summary.
Python support:
- Add more interfaces to 'perf' module to parse events, and config,
enable or disable the event list properly so that it can implement
basic functionalities purely in Python. There is an example code
for these new interfaces in python/tracepoint.py.
- Add mypy and pylint support to enable build time checking. Fix some
code based on the findings from these tools.
Internals:
- Introduce io_dir__readdir() API to make directory traversal (usually
for proc or sysfs) efficient with less memory footprint.
JSON vendor events:
- Add events and metrics for ARM Neoverse N3 and V3
- Update events and metrics on various Intel CPUs
- Add/update events for a number of SiFive processors"
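The latency weighting described in the "perf record" notes above (a sample's period divided by the parallelism at that moment) can be illustrated with a small sketch. This is not perf's implementation: `sketch_sample`, `sketch_weight` and `sketch_share` are hypothetical names, and the parallelism value is assumed to be already known for each sample rather than derived from sched-switch records.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sample: a period plus the number of tasks running in parallel. */
struct sketch_sample {
	unsigned long long period;
	unsigned int parallelism;
};

/* Overhead uses the raw period; latency divides it by the parallelism. */
static double sketch_weight(const struct sketch_sample *s, int by_latency)
{
	assert(s->parallelism >= 1);
	return by_latency ? (double)s->period / s->parallelism : (double)s->period;
}

/* Fraction of the total weight contributed by samples[target]. */
static double sketch_share(const struct sketch_sample *samples, size_t n,
			   size_t target, int by_latency)
{
	double total = 0.0;
	size_t i;

	assert(target < n);
	for (i = 0; i < n; i++)
		total += sketch_weight(&samples[i], by_latency);
	return total > 0.0 ? sketch_weight(&samples[target], by_latency) / total : 0.0;
}
```

With one serial sample (period 100, parallelism 1) and four samples of period 100 taken at parallelism 4, the serial sample accounts for 20% of the overhead but 50% of the latency, which is the same kind of shift the cc1/python3/ld numbers above show.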
* tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits)
perf bpf-filter: Fix a parsing error with comma
perf report: Fix a memory leak for perf_env on AMD
perf trace: Fix wrong size to bpf_map__update_elem call
perf tools: annotate asm_pure_loop.S
perf python: Fix setup.py mypy errors
perf test: Address attr.py mypy error
perf build: Add pylint build tests
perf build: Add mypy build tests
perf build: Rename TEST_LOGS to SHELL_TEST_LOGS
tools/build: Don't pass test log files to linker
perf bench sched pipe: fix enforced blocking reads in worker_thread
perf tools: Fix is_compat_mode build break in ppc64
perf build: filter all combinations of -flto for libperl
perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation
perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata
perf trace: Fix evlist memory leak
perf trace: Fix BTF memory leak
perf trace: Make syscall table stable
perf syscalltbl: Mask off ABI type for MIPS system calls
perf build: Remove Makefile.syscalls
...
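The "Internals" note above mentions io_dir__readdir() as a lower-footprint way to walk directories. One plausible way to get that effect on Linux is to read batches of entries with the getdents64 system call into a fixed buffer instead of going through opendir()/readdir(). The sketch below illustrates that idea only; it is an assumption about the general technique, not perf's io_dir code, and every `sketch_` name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Record layout returned by the Linux getdents64 system call. */
struct sketch_dirent64 {
	uint64_t d_ino;
	int64_t d_off;
	unsigned short d_reclen;
	unsigned char d_type;
	char d_name[];
};

/* Hypothetical reader state: one fixed buffer reused for every batch. */
struct sketch_dir {
	int dirfd;
	ssize_t available_bytes;
	struct sketch_dirent64 *next;
	uint64_t buf[4096 / sizeof(uint64_t)];
};

static void sketch_dir_init(struct sketch_dir *d, int dirfd)
{
	d->dirfd = dirfd;
	d->available_bytes = 0;
	d->next = NULL;
}

/* Return the next entry, refilling the buffer when it is exhausted. */
static struct sketch_dirent64 *sketch_dir_readdir(struct sketch_dir *d)
{
	struct sketch_dirent64 *ent;

	if (d->available_bytes <= 0) {
		long rc = syscall(SYS_getdents64, d->dirfd, d->buf, sizeof(d->buf));

		if (rc <= 0)
			return NULL; /* end of directory or error */
		d->available_bytes = rc;
		d->next = (struct sketch_dirent64 *)d->buf;
	}
	ent = d->next;
	assert(ent->d_reclen > 0);
	d->next = (struct sketch_dirent64 *)((char *)ent + ent->d_reclen);
	d->available_bytes -= ent->d_reclen;
	return ent;
}

/* Count entries other than "." and ".."; returns -1 if the open fails. */
static int sketch_count_entries(const char *path)
{
	struct sketch_dir dir;
	struct sketch_dirent64 *ent;
	int count = 0;

	sketch_dir_init(&dir, open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
	if (dir.dirfd < 0)
		return -1;
	while ((ent = sketch_dir_readdir(&dir)) != NULL) {
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		count++;
	}
	close(dir.dirfd);
	return count;
}
```

The caller owns the file descriptor, so cleanup is a plain close(), and no heap allocation is needed for the directory stream. The sketch is Linux-specific because it relies on SYS_getdents64.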
Diffstat (limited to 'tools/perf/util/parse-events.c')
| -rw-r--r-- | tools/perf/util/parse-events.c | 179 |
1 files changed, 116 insertions, 63 deletions
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1e23faa364b1..5152fd5a6ead 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -17,6 +17,7 @@
 #include "strbuf.h"
 #include "debug.h"
 #include <api/fs/tracing_path.h>
+#include <api/io_dir.h>
 #include <perf/cpumap.h>
 #include <util/parse-events-bison.h>
 #include <util/parse-events-flex.h>
@@ -554,8 +555,8 @@ static int add_tracepoint_multi_event(struct parse_events_state *parse_state,
 				      struct parse_events_terms *head_config, YYLTYPE *loc)
 {
 	char *evt_path;
-	struct dirent *evt_ent;
-	DIR *evt_dir;
+	struct io_dirent64 *evt_ent;
+	struct io_dir evt_dir;
 	int ret = 0, found = 0;
 
 	evt_path = get_events_file(sys_name);
@@ -563,14 +564,14 @@ static int add_tracepoint_multi_event(struct parse_events_state *parse_state,
 		tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
 		return -1;
 	}
-	evt_dir = opendir(evt_path);
-	if (!evt_dir) {
+	io_dir__init(&evt_dir, open(evt_path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+	if (evt_dir.dirfd < 0) {
 		put_events_file(evt_path);
 		tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
 		return -1;
 	}
 
-	while (!ret && (evt_ent = readdir(evt_dir))) {
+	while (!ret && (evt_ent = io_dir__readdir(&evt_dir))) {
 		if (!strcmp(evt_ent->d_name, ".")
 		    || !strcmp(evt_ent->d_name, "..")
 		    || !strcmp(evt_ent->d_name, "enable")
@@ -592,7 +593,7 @@ static int add_tracepoint_multi_event(struct parse_events_state *parse_state,
 	}
 
 	put_events_file(evt_path);
-	closedir(evt_dir);
+	close(evt_dir.dirfd);
 	return ret;
 }
 
@@ -615,17 +616,23 @@ static int add_tracepoint_multi_sys(struct parse_events_state *parse_state,
 				    struct parse_events_error *err,
 				    struct parse_events_terms *head_config, YYLTYPE *loc)
 {
-	struct dirent *events_ent;
-	DIR *events_dir;
+	struct io_dirent64 *events_ent;
+	struct io_dir events_dir;
 	int ret = 0;
+	char *events_dir_path = get_tracing_file("events");
 
-	events_dir = tracing_events__opendir();
-	if (!events_dir) {
+	if (!events_dir_path) {
+		tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
+		return -1;
+	}
+	io_dir__init(&events_dir, open(events_dir_path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+	put_events_file(events_dir_path);
+	if (events_dir.dirfd < 0) {
 		tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
 		return -1;
 	}
 
-	while (!ret && (events_ent = readdir(events_dir))) {
+	while (!ret && (events_ent = io_dir__readdir(&events_dir))) {
 		if (!strcmp(events_ent->d_name, ".")
 		    || !strcmp(events_ent->d_name, "..")
 		    || !strcmp(events_ent->d_name, "enable")
@@ -639,8 +646,7 @@ static int add_tracepoint_multi_sys(struct parse_events_state *parse_state,
 		ret = add_tracepoint_event(parse_state, list, events_ent->d_name,
 					   evt_name, err, head_config, loc);
 	}
-
-	closedir(events_dir);
+	close(events_dir.dirfd);
 	return ret;
 }
 
@@ -1660,7 +1666,7 @@ int parse_events_multi_pmu_add_or_add_pmu(struct parse_events_state *parse_state
 	/* Failed to add, try wildcard expansion of event_or_pmu as a PMU name. */
 	while ((pmu = perf_pmus__scan(pmu)) != NULL) {
 		if (!parse_events__filter_pmu(parse_state, pmu) &&
-		    perf_pmu__match(pmu, event_or_pmu)) {
+		    perf_pmu__wildcard_match(pmu, event_or_pmu)) {
 			bool auto_merge_stats = perf_pmu__auto_merge_stats(pmu);
 
 			if (!parse_events_add_pmu(parse_state, *listp, pmu,
@@ -1974,48 +1980,55 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
 	int *force_grouped_idx = _fg_idx;
 	int lhs_sort_idx, rhs_sort_idx, ret;
 	const char *lhs_pmu_name, *rhs_pmu_name;
-	bool lhs_has_group, rhs_has_group;
 
 	/*
-	 * First sort by grouping/leader. Read the leader idx only if the evsel
-	 * is part of a group, by default ungrouped events will be sorted
-	 * relative to grouped events based on where the first ungrouped event
-	 * occurs. If both events don't have a group we want to fall-through to
-	 * the arch specific sorting, that can reorder and fix things like
-	 * Intel's topdown events.
+	 * Get the indexes of the 2 events to sort. If the events are
+	 * in groups then the leader's index is used otherwise the
+	 * event's index is used. An index may be forced for events that
+	 * must be in the same group, namely Intel topdown events.
 	 */
-	if (lhs_core->leader != lhs_core || lhs_core->nr_members > 1) {
-		lhs_has_group = true;
-		lhs_sort_idx = lhs_core->leader->idx;
+	if (*force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)) {
+		lhs_sort_idx = *force_grouped_idx;
 	} else {
-		lhs_has_group = false;
-		lhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)
-			? *force_grouped_idx
-			: lhs_core->idx;
-	}
-	if (rhs_core->leader != rhs_core || rhs_core->nr_members > 1) {
-		rhs_has_group = true;
-		rhs_sort_idx = rhs_core->leader->idx;
+		bool lhs_has_group = lhs_core->leader != lhs_core || lhs_core->nr_members > 1;
+
+		lhs_sort_idx = lhs_has_group ? lhs_core->leader->idx : lhs_core->idx;
+	}
+	if (*force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)) {
+		rhs_sort_idx = *force_grouped_idx;
 	} else {
-		rhs_has_group = false;
-		rhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)
-			? *force_grouped_idx
-			: rhs_core->idx;
+		bool rhs_has_group = rhs_core->leader != rhs_core || rhs_core->nr_members > 1;
+
+		rhs_sort_idx = rhs_has_group ? rhs_core->leader->idx : rhs_core->idx;
 	}
 
+	/* If the indices differ then respect the insertion order. */
 	if (lhs_sort_idx != rhs_sort_idx)
 		return lhs_sort_idx - rhs_sort_idx;
 
-	/* Group by PMU if there is a group. Groups can't span PMUs. */
-	if (lhs_has_group && rhs_has_group) {
-		lhs_pmu_name = lhs->group_pmu_name;
-		rhs_pmu_name = rhs->group_pmu_name;
-		ret = strcmp(lhs_pmu_name, rhs_pmu_name);
-		if (ret)
-			return ret;
-	}
+	/*
+	 * Ignoring forcing, lhs_sort_idx == rhs_sort_idx so lhs and rhs should
+	 * be in the same group. Events in the same group need to be ordered by
+	 * their grouping PMU name as the group will be broken to ensure only
+	 * events on the same PMU are programmed together.
+	 *
+	 * With forcing the lhs_sort_idx == rhs_sort_idx shows that one or both
+	 * events are being forced to be at force_group_index. If only one event
+	 * is being forced then the other event is the group leader of the group
+	 * we're trying to force the event into. Ensure for the force grouped
+	 * case that the PMU name ordering is also respected.
+	 */
+	lhs_pmu_name = lhs->group_pmu_name;
+	rhs_pmu_name = rhs->group_pmu_name;
+	ret = strcmp(lhs_pmu_name, rhs_pmu_name);
+	if (ret)
+		return ret;
 
-	/* Architecture specific sorting. */
+	/*
+	 * Architecture specific sorting, by default sort events in the same
+	 * group with the same PMU by their insertion index. On Intel topdown
+	 * constraints must be adhered to - slots first, etc.
+	 */
 	return arch_evlist__cmp(lhs, rhs);
 }
 
@@ -2024,9 +2037,11 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
 	int idx = 0, force_grouped_idx = -1;
 	struct evsel *pos, *cur_leader = NULL;
 	struct perf_evsel *cur_leaders_grp = NULL;
-	bool idx_changed = false, cur_leader_force_grouped = false;
+	bool idx_changed = false;
 	int orig_num_leaders = 0, num_leaders = 0;
 	int ret;
+	struct evsel *force_grouped_leader = NULL;
+	bool last_event_was_forced_leader = false;
 
 	/*
 	 * Compute index to insert ungrouped events at. Place them where the
@@ -2049,10 +2064,13 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
 		 */
 		pos->core.idx = idx++;
 
-		/* Remember an index to sort all forced grouped events together to. */
-		if (force_grouped_idx == -1 && pos == pos_leader && pos->core.nr_members < 2 &&
-		    arch_evsel__must_be_in_group(pos))
-			force_grouped_idx = pos->core.idx;
+		/*
+		 * Remember an index to sort all forced grouped events
+		 * together to. Use the group leader as some events
+		 * must appear first within the group.
+		 */
+		if (force_grouped_idx == -1 && arch_evsel__must_be_in_group(pos))
+			force_grouped_idx = pos_leader->core.idx;
 	}
 
 	/* Sort events. */
@@ -2080,31 +2098,66 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
 		 * Set the group leader respecting the given groupings and that
 		 * groups can't span PMUs.
 		 */
-		if (!cur_leader)
+		if (!cur_leader) {
 			cur_leader = pos;
+			cur_leaders_grp = &pos->core;
+			if (pos_force_grouped)
+				force_grouped_leader = pos;
+		}
 
 		cur_leader_pmu_name = cur_leader->group_pmu_name;
-		if ((cur_leaders_grp != pos->core.leader &&
-		     (!pos_force_grouped || !cur_leader_force_grouped)) ||
-		    strcmp(cur_leader_pmu_name, pos_pmu_name)) {
-			/* Event is for a different group/PMU than last. */
+		if (strcmp(cur_leader_pmu_name, pos_pmu_name)) {
+			/* PMU changed so the group/leader must change. */
 			cur_leader = pos;
-			/*
-			 * Remember the leader's group before it is overwritten,
-			 * so that later events match as being in the same
-			 * group.
-			 */
 			cur_leaders_grp = pos->core.leader;
+			if (pos_force_grouped && force_grouped_leader == NULL)
+				force_grouped_leader = pos;
+		} else if (cur_leaders_grp != pos->core.leader) {
+			bool split_even_if_last_leader_was_forced = true;
+
 			/*
-			 * Avoid forcing events into groups with events that
-			 * don't need to be in the group.
+			 * Event is for a different group. If the last event was
+			 * the forced group leader then subsequent group events
+			 * and forced events should be in the same group. If
+			 * there are no other forced group events then the
+			 * forced group leader wasn't really being forced into a
+			 * group, it just set arch_evsel__must_be_in_group, and
+			 * we don't want the group to split here.
 			 */
-			cur_leader_force_grouped = pos_force_grouped;
+			if (force_grouped_idx != -1 && last_event_was_forced_leader) {
+				struct evsel *pos2 = pos;
+				/*
+				 * Search the whole list as the group leaders
+				 * aren't currently valid.
+				 */
+				list_for_each_entry_continue(pos2, list, core.node) {
+					if (pos->core.leader == pos2->core.leader &&
+					    arch_evsel__must_be_in_group(pos2)) {
+						split_even_if_last_leader_was_forced = false;
+						break;
+					}
+				}
+			}
+			if (!last_event_was_forced_leader || split_even_if_last_leader_was_forced) {
+				if (pos_force_grouped) {
+					if (force_grouped_leader) {
+						cur_leader = force_grouped_leader;
+						cur_leaders_grp = force_grouped_leader->core.leader;
+					} else {
+						cur_leader = force_grouped_leader = pos;
+						cur_leaders_grp = &pos->core;
+					}
+				} else {
+					cur_leader = pos;
+					cur_leaders_grp = pos->core.leader;
+				}
+			}
 		}
 		if (pos_leader != cur_leader) {
 			/* The leader changed so update it. */
 			evsel__set_leader(pos, cur_leader);
 		}
+		last_event_was_forced_leader = (force_grouped_leader == pos);
 	}
 	list_for_each_entry(pos, list, core.node) {
 		struct evsel *pos_leader = evsel__leader(pos);
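The reworked evlist__cmp() in the diff above orders events by three keys in turn: a sort index (the forced index for events that must be grouped, otherwise the group leader's index), then the grouping PMU name, then an architecture hook that by default falls back to insertion order. The sketch below mirrors that ordering on a simplified, hypothetical `sketch_event` struct. It assumes `leader_idx` equals `idx` for ungrouped events, uses a plain flag in place of arch_evsel__must_be_in_group(), takes the forced index as a parameter rather than computing it from the list, and uses insertion order as the final tie-break in place of arch_evlist__cmp().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an evsel. */
struct sketch_event {
	int idx;              /* insertion index */
	int leader_idx;       /* group leader's index; equals idx when ungrouped */
	int must_be_in_group; /* stand-in for arch_evsel__must_be_in_group() */
	const char *group_pmu_name;
};

/* qsort() has no context argument, so the forced index lives at file scope. */
static int sketch_force_grouped_idx = -1;

/* Forced events sort at the forced index, everything else at its leader. */
static int sketch_sort_idx(const struct sketch_event *e)
{
	if (sketch_force_grouped_idx != -1 && e->must_be_in_group)
		return sketch_force_grouped_idx;
	return e->leader_idx;
}

static int sketch_event_cmp(const void *l, const void *r)
{
	const struct sketch_event *lhs = l, *rhs = r;
	int lhs_idx = sketch_sort_idx(lhs), rhs_idx = sketch_sort_idx(rhs);
	int ret;

	/* Different sort indices: respect the insertion order. */
	if (lhs_idx != rhs_idx)
		return lhs_idx - rhs_idx;

	/* Same sort index: keep events of one PMU together. */
	ret = strcmp(lhs->group_pmu_name, rhs->group_pmu_name);
	if (ret)
		return ret;

	/* Default tie-break: insertion index within the group and PMU. */
	return lhs->idx - rhs->idx;
}

static void sketch_sort_events(struct sketch_event *events, size_t n,
			       int force_grouped_idx)
{
	assert(events != NULL || n == 0);
	if (n == 0)
		return;
	sketch_force_grouped_idx = force_grouped_idx;
	qsort(events, n, sizeof(*events), sketch_event_cmp);
}
```

For example, with a two-member group led by event 0, an ungrouped event 2, and an ungrouped event 3 that must be grouped, passing a forced index of 0 pulls event 3 next to the group, giving the order 0, 1, 3, 2. Because the three keys form a total order over distinct insertion indices, the result does not depend on qsort() being stable.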
