diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-18 06:32:11 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-18 06:32:11 +0200 |
commit | 57d17378a4a042401b0c2fe211e5a0e3a276cb3d (patch) | |
tree | be65e51a7b11cccb903b44708b98b3af47477304 /tools | |
parent | f0033681f0fe8421baf8db125e57fa6157824c2d (diff) | |
parent | 9bce13ea88f85344b765abe5d3dabdd0f44dc177 (diff) | |
download | lwn-57d17378a4a042401b0c2fe211e5a0e3a276cb3d.tar.gz lwn-57d17378a4a042401b0c2fe211e5a0e3a276cb3d.zip |
Merge tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"New features:
- Add 'trace' subcommand for 'perf ftrace', setting the stage for
more 'perf ftrace' subcommands. Not using a subcommand yields the
previous behaviour of 'perf ftrace'.
- Add 'latency' subcommand to 'perf ftrace', that can use the
function graph tracer or a BPF optimized one, via the -b/--use-bpf
option.
E.g.:
$ sudo perf ftrace latency -a -T mutex_lock sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 4596 | ######################## |
1 - 2 us | 1680 | ######### |
2 - 4 us | 1106 | ##### |
4 - 8 us | 546 | ## |
8 - 16 us | 562 | ### |
16 - 32 us | 1 | |
32 - 64 us | 0 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
The original implementation of this command was in the bcc tool.
- Support --cputype option for hybrid events in 'perf stat'.
Improvements:
- Call chain improvements for ARM64.
- No need to do any affinity setup when profiling pids.
- Reduce multiplexing with duration_time in 'perf stat' metrics.
- Improve error message for uncore events, stating that some event
groups are can only be used in system wide (-a) mode.
- perf stat metric group leader fixes/improvements, including arch
specific changes to better support Intel topdown events.
- Probe non-deprecated sysfs path first, i.e. try the path
/sys/devices/system/cpu/cpuN/topology/thread_siblings first, then
the old /sys/devices/system/cpu/cpuN/topology/core_cpus.
- Disable debuginfod by default in 'perf record', to avoid stalls on
distros such as Fedora 35.
- Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.
- Enable ignore_missing_thread in 'perf trace'
Fixes:
- Avoid TUI crash when navigating in the annotation of recursive
functions.
- Fix hex dump character output in 'perf script'.
- Fix JSON indentation to 4 spaces standard in the ARM vendor event
files.
- Fix use after free in metric__new().
- Fix IS_ERR_OR_NULL() usage in the perf BPF loader.
- Fix up cross-arch register support, i.e. when printing register
names take into account the architecture where the perf.data file
was collected.
- Fix SMT fallback with large core counts.
- Don't lower case MetricExpr when parsing JSON files so as not to
lose info such as the ":G" event modifier in metrics.
perf test:
- Add basic stress test for sigtrap handling to 'perf test'.
- Fix 'perf test' failures on s/390
- Enable system wide for metricgroups test in 'perf test´.
- Use 3 digits for test numbering now we can have more tests.
Arch specific:
- Add events for Arm Neoverse N2 in the ARM JSON vendor event files
- Support PERF_MEM_LVLNUM encodings in powerpc, that came from a
single patch series, where I incorrectly merged the kernel bits,
that were then reverted after coordination with Michael Ellerman
and Stephen Rothwell.
- Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.
- Update AMD documentation, with info on raw event encoding.
- Add support for global and local variants of the "p_stage_cyc" sort
key, applicable to perf.data files collected on powerpc.
- Remove duplicate and incorrect aux size checks in the ARM CoreSight
ETM code.
Refactorings:
- Add a perf_cpu abstraction to disambiguate CPUs and CPU map
indexes, fixing problems along the way.
- Document CPU map methods.
UAPI sync:
- Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench
mem memcpy'
- Sync UAPI files with the kernel sources: drm, msr-index,
cpufeatures.
Build system
- Enable warnings through HOSTCFLAGS.
- Drop requirement for libstdc++.so for libopencsd check
libperf:
- Make libperf adopt perf_counts_values__scale() from tools/perf/util/.
- Add a stat multiplexing test to libperf"
* tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits)
perf record: Disable debuginfod by default
perf evlist: No need to do any affinity setup when profiling pids
perf cpumap: Add is_dummy() method
perf metric: Fix metric_leader
perf cputopo: Fix CPU topology reading on s/390
perf metricgroup: Fix use after free in metric__new()
libperf tests: Update a use of the new cpumap API
perf arm: Fix off-by-one directory path
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Update tools's copy of drm.h header
tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
perf pmu-events: Don't lower case MetricExpr
perf expr: Add debug logging for literals
perf tools: Probe non-deprecated sysfs path 1st
perf tools: Fix SMT fallback with large core counts
perf cpumap: Give CPUs their own type
perf stat: Correct first_shadow_cpu to return index
perf script: Fix flipped index and cpu
perf c2c: Use more intention revealing iterator
...
Diffstat (limited to 'tools')
153 files changed, 4685 insertions, 2175 deletions
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index d5b5f2ab87a0..18de5f76f198 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -315,6 +315,7 @@ #define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */ #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */ +#define X86_FEATURE_CPPC (13*32+27) /* Collaborative Processor Performance Control */ /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h index 01e2650b9585..3faf0f97edb1 100644 --- a/tools/arch/x86/include/asm/msr-index.h +++ b/tools/arch/x86/include/asm/msr-index.h @@ -486,6 +486,23 @@ #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f +/* AMD Collaborative Processor Performance Control MSRs */ +#define MSR_AMD_CPPC_CAP1 0xc00102b0 +#define MSR_AMD_CPPC_ENABLE 0xc00102b1 +#define MSR_AMD_CPPC_CAP2 0xc00102b2 +#define MSR_AMD_CPPC_REQ 0xc00102b3 +#define MSR_AMD_CPPC_STATUS 0xc00102b4 + +#define AMD_CPPC_LOWEST_PERF(x) (((x) >> 0) & 0xff) +#define AMD_CPPC_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff) +#define AMD_CPPC_NOMINAL_PERF(x) (((x) >> 16) & 0xff) +#define AMD_CPPC_HIGHEST_PERF(x) (((x) >> 24) & 0xff) + +#define AMD_CPPC_MAX_PERF(x) (((x) & 0xff) << 0) +#define AMD_CPPC_MIN_PERF(x) (((x) & 0xff) << 8) +#define AMD_CPPC_DES_PERF(x) (((x) & 0xff) << 16) +#define AMD_CPPC_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24) + /* Fam 17h MSRs */ #define MSR_F17H_IRPERF 0xc00000e9 diff --git a/tools/arch/x86/lib/memcpy_64.S b/tools/arch/x86/lib/memcpy_64.S index 1cc9da6e29c7..59cf2343f3d9 100644 --- a/tools/arch/x86/lib/memcpy_64.S +++ b/tools/arch/x86/lib/memcpy_64.S @@ -39,7 +39,7 @@ SYM_FUNC_START_WEAK(memcpy) rep movsq movl %edx, %ecx rep movsb - ret + RET SYM_FUNC_END(memcpy) SYM_FUNC_END_ALIAS(__memcpy) EXPORT_SYMBOL(memcpy) @@ -53,7 +53,7 @@ SYM_FUNC_START_LOCAL(memcpy_erms) movq %rdi, %rax movq %rdx, %rcx rep movsb - ret + RET SYM_FUNC_END(memcpy_erms) SYM_FUNC_START_LOCAL(memcpy_orig) @@ -137,7 +137,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig) movq %r9, 1*8(%rdi) movq %r10, -2*8(%rdi, %rdx) movq %r11, -1*8(%rdi, %rdx) - retq + RET .p2align 4 .Lless_16bytes: cmpl $8, %edx @@ -149,7 +149,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig) movq -1*8(%rsi, %rdx), %r9 movq %r8, 0*8(%rdi) movq %r9, -1*8(%rdi, %rdx) - retq + RET .p2align 4 .Lless_8bytes: cmpl $4, %edx @@ -162,7 +162,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig) movl -4(%rsi, %rdx), %r8d movl %ecx, (%rdi) movl %r8d, -4(%rdi, %rdx) - retq + RET .p2align 4 .Lless_3bytes: subl $1, %edx @@ -180,7 +180,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig) movb %cl, (%rdi) .Lend: - retq + RET SYM_FUNC_END(memcpy_orig) .popsection diff --git a/tools/arch/x86/lib/memset_64.S b/tools/arch/x86/lib/memset_64.S index 9827ae267f96..d624f2bc42f1 100644 --- a/tools/arch/x86/lib/memset_64.S +++ b/tools/arch/x86/lib/memset_64.S @@ -40,7 +40,7 @@ SYM_FUNC_START(__memset) movl %edx,%ecx rep stosb movq %r9,%rax - ret + RET SYM_FUNC_END(__memset) SYM_FUNC_END_ALIAS(memset) EXPORT_SYMBOL(memset) @@ -63,7 +63,7 @@ SYM_FUNC_START_LOCAL(memset_erms) movq %rdx,%rcx rep stosb movq %r9,%rax - ret + RET SYM_FUNC_END(memset_erms) SYM_FUNC_START_LOCAL(memset_orig) @@ -125,7 +125,7 @@ SYM_FUNC_START_LOCAL(memset_orig) .Lende: movq %r10,%rax - ret + RET .Lbad_alignment: cmpq $7,%rdx diff --git a/tools/build/Build.include b/tools/build/Build.include index 2cf3b1bde86e..c2a95ab47379 100644 --- a/tools/build/Build.include +++ b/tools/build/Build.include @@ -99,7 +99,7 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXX ### ## HOSTCC C flags -host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj)) +host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj)) # output directory for tests below TMPOUT = .tmp_$$$$ diff --git a/tools/include/uapi/drm/drm.h b/tools/include/uapi/drm/drm.h index 3b810b53ba8b..642808520d92 100644 --- a/tools/include/uapi/drm/drm.h +++ b/tools/include/uapi/drm/drm.h @@ -1096,6 +1096,24 @@ extern "C" { #define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC, struct drm_syncobj_transfer) #define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL DRM_IOWR(0xCD, struct drm_syncobj_timeline_array) +/** + * DRM_IOCTL_MODE_GETFB2 - Get framebuffer metadata. + * + * This queries metadata about a framebuffer. User-space fills + * &drm_mode_fb_cmd2.fb_id as the input, and the kernels fills the rest of the + * struct as the output. + * + * If the client is DRM master or has &CAP_SYS_ADMIN, &drm_mode_fb_cmd2.handles + * will be filled with GEM buffer handles. Planes are valid until one has a + * zero handle -- this can be used to compute the number of planes. + * + * Otherwise, &drm_mode_fb_cmd2.handles will be zeroed and planes are valid + * until one has a zero &drm_mode_fb_cmd2.pitches. + * + * If the framebuffer has a format modifier, &DRM_MODE_FB_MODIFIERS will be set + * in &drm_mode_fb_cmd2.flags and &drm_mode_fb_cmd2.modifier will contain the + * modifier. Otherwise, user-space must ignore &drm_mode_fb_cmd2.modifier. + */ #define DRM_IOCTL_MODE_GETFB2 DRM_IOWR(0xCE, struct drm_mode_fb_cmd2) /* diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index bd8860eeb291..4cd39aaccbe7 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -1332,7 +1332,10 @@ union perf_mem_data_src { /* hop level */ #define PERF_MEM_HOPS_0 0x01 /* remote core, same node */ -/* 2-7 available */ +#define PERF_MEM_HOPS_1 0x02 /* remote node, same socket */ +#define PERF_MEM_HOPS_2 0x03 /* remote socket, same board */ +#define PERF_MEM_HOPS_3 0x04 /* remote board */ +/* 5-7 available */ #define PERF_MEM_HOPS_SHIFT 43 #define PERF_MEM_S(a, s) \ diff --git a/tools/lib/perf/Documentation/libperf.txt b/tools/lib/perf/Documentation/libperf.txt index 63ae5e0195ce..32c5051c24eb 100644 --- a/tools/lib/perf/Documentation/libperf.txt +++ b/tools/lib/perf/Documentation/libperf.txt @@ -48,6 +48,7 @@ SYNOPSIS int perf_cpu_map__nr(const struct perf_cpu_map *cpus); bool perf_cpu_map__empty(const struct perf_cpu_map *map); int perf_cpu_map__max(struct perf_cpu_map *map); + bool perf_cpu_map__has(const struct perf_cpu_map *map, int cpu); #define perf_cpu_map__for_each_cpu(cpu, idx, cpus) -- @@ -135,16 +136,16 @@ SYNOPSIS int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads); void perf_evsel__close(struct perf_evsel *evsel); - void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu); + void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu_map_idx); int perf_evsel__mmap(struct perf_evsel *evsel, int pages); void perf_evsel__munmap(struct perf_evsel *evsel); - void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread); - int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread, + void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu_map_idx, int thread); + int perf_evsel__read(struct perf_evsel *evsel, int cpu_map_idx, int thread, struct perf_counts_values *count); int perf_evsel__enable(struct perf_evsel *evsel); - int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu); + int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu_map_idx); int perf_evsel__disable(struct perf_evsel *evsel); - int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu); + int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu_map_idx); struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel); struct perf_thread_map *perf_evsel__threads(struct perf_evsel *evsel); struct perf_event_attr *perf_evsel__attr(struct perf_evsel *evsel); diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c index adaad3dddf6e..ee66760f1e63 100644 --- a/tools/lib/perf/cpumap.c +++ b/tools/lib/perf/cpumap.c @@ -10,15 +10,24 @@ #include <ctype.h> #include <limits.h> -struct perf_cpu_map *perf_cpu_map__dummy_new(void) +static struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus) { - struct perf_cpu_map *cpus = malloc(sizeof(*cpus) + sizeof(int)); + struct perf_cpu_map *cpus = malloc(sizeof(*cpus) + sizeof(struct perf_cpu) * nr_cpus); if (cpus != NULL) { - cpus->nr = 1; - cpus->map[0] = -1; + cpus->nr = nr_cpus; refcount_set(&cpus->refcnt, 1); + } + return cpus; +} + +struct perf_cpu_map *perf_cpu_map__dummy_new(void) +{ + struct perf_cpu_map *cpus = perf_cpu_map__alloc(1); + + if (cpus) + cpus->map[0].cpu = -1; return cpus; } @@ -54,15 +63,12 @@ static struct perf_cpu_map *cpu_map__default_new(void) if (nr_cpus < 0) return NULL; - cpus = malloc(sizeof(*cpus) + nr_cpus * sizeof(int)); + cpus = perf_cpu_map__alloc(nr_cpus); if (cpus != NULL) { int i; for (i = 0; i < nr_cpus; ++i) - cpus->map[i] = i; - - cpus->nr = nr_cpus; - refcount_set(&cpus->refcnt, 1); + cpus->map[i].cpu = i; } return cpus; @@ -73,31 +79,32 @@ struct perf_cpu_map *perf_cpu_map__default_new(void) return cpu_map__default_new(); } -static int cmp_int(const void *a, const void *b) + +static int cmp_cpu(const void *a, const void *b) { - return *(const int *)a - *(const int*)b; + const struct perf_cpu *cpu_a = a, *cpu_b = b; + + return cpu_a->cpu - cpu_b->cpu; } -static struct perf_cpu_map *cpu_map__trim_new(int nr_cpus, int *tmp_cpus) +static struct perf_cpu_map *cpu_map__trim_new(int nr_cpus, const struct perf_cpu *tmp_cpus) { - size_t payload_size = nr_cpus * sizeof(int); - struct perf_cpu_map *cpus = malloc(sizeof(*cpus) + payload_size); + size_t payload_size = nr_cpus * sizeof(struct perf_cpu); + struct perf_cpu_map *cpus = perf_cpu_map__alloc(nr_cpus); int i, j; if (cpus != NULL) { memcpy(cpus->map, tmp_cpus, payload_size); - qsort(cpus->map, nr_cpus, sizeof(int), cmp_int); + qsort(cpus->map, nr_cpus, sizeof(struct perf_cpu), cmp_cpu); /* Remove dups */ j = 0; for (i = 0; i < nr_cpus; i++) { - if (i == 0 || cpus->map[i] != cpus->map[i - 1]) - cpus->map[j++] = cpus->map[i]; + if (i == 0 || cpus->map[i].cpu != cpus->map[i - 1].cpu) + cpus->map[j++].cpu = cpus->map[i].cpu; } cpus->nr = j; assert(j <= nr_cpus); - refcount_set(&cpus->refcnt, 1); } - return cpus; } @@ -105,7 +112,7 @@ struct perf_cpu_map *perf_cpu_map__read(FILE *file) { struct perf_cpu_map *cpus = NULL; int nr_cpus = 0; - int *tmp_cpus = NULL, *tmp; + struct perf_cpu *tmp_cpus = NULL, *tmp; int max_entries = 0; int n, cpu, prev; char sep; @@ -124,24 +131,24 @@ struct perf_cpu_map *perf_cpu_map__read(FILE *file) if (new_max >= max_entries) { max_entries = new_max + MAX_NR_CPUS / 2; - tmp = realloc(tmp_cpus, max_entries * sizeof(int)); + tmp = realloc(tmp_cpus, max_entries * sizeof(struct perf_cpu)); if (tmp == NULL) goto out_free_tmp; tmp_cpus = tmp; } while (++prev < cpu) - tmp_cpus[nr_cpus++] = prev; + tmp_cpus[nr_cpus++].cpu = prev; } if (nr_cpus == max_entries) { max_entries += MAX_NR_CPUS; - tmp = realloc(tmp_cpus, max_entries * sizeof(int)); + tmp = realloc(tmp_cpus, max_entries * sizeof(struct perf_cpu)); if (tmp == NULL) goto out_free_tmp; tmp_cpus = tmp; } - tmp_cpus[nr_cpus++] = cpu; + tmp_cpus[nr_cpus++].cpu = cpu; if (n == 2 && sep == '-') prev = cpu; else @@ -179,7 +186,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list) unsigned long start_cpu, end_cpu = 0; char *p = NULL; int i, nr_cpus = 0; - int *tmp_cpus = NULL, *tmp; + struct perf_cpu *tmp_cpus = NULL, *tmp; int max_entries = 0; if (!cpu_list) @@ -220,17 +227,17 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list) for (; start_cpu <= end_cpu; start_cpu++) { /* check for duplicates */ for (i = 0; i < nr_cpus; i++) - if (tmp_cpus[i] == (int)start_cpu) + if (tmp_cpus[i].cpu == (int)start_cpu) goto invalid; if (nr_cpus == max_entries) { max_entries += MAX_NR_CPUS; - tmp = realloc(tmp_cpus, max_entries * sizeof(int)); + tmp = realloc(tmp_cpus, max_entries * sizeof(struct perf_cpu)); if (tmp == NULL) goto invalid; tmp_cpus = tmp; } - tmp_cpus[nr_cpus++] = (int)start_cpu; + tmp_cpus[nr_cpus++].cpu = (int)start_cpu; } if (*p) ++p; @@ -250,12 +257,16 @@ out: return cpus; } -int perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx) +struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx) { + struct perf_cpu result = { + .cpu = -1 + }; + if (cpus && idx < cpus->nr) return cpus->map[idx]; - return -1; + return result; } int perf_cpu_map__nr(const struct perf_cpu_map *cpus) @@ -265,21 +276,26 @@ int perf_cpu_map__nr(const struct perf_cpu_map *cpus) bool perf_cpu_map__empty(const struct perf_cpu_map *map) { - return map ? map->map[0] == -1 : true; + return map ? map->map[0].cpu == -1 : true; } -int perf_cpu_map__idx(struct perf_cpu_map *cpus, int cpu) +int perf_cpu_map__idx(const struct perf_cpu_map *cpus, struct perf_cpu cpu) { - int low = 0, high = cpus->nr; + int low, high; + if (!cpus) + return -1; + + low = 0; + high = cpus->nr; while (low < high) { - int idx = (low + high) / 2, - cpu_at_idx = cpus->map[idx]; + int idx = (low + high) / 2; + struct perf_cpu cpu_at_idx = cpus->map[idx]; - if (cpu_at_idx == cpu) + if (cpu_at_idx.cpu == cpu.cpu) return idx; - if (cpu_at_idx > cpu) + if (cpu_at_idx.cpu > cpu.cpu) high = idx; else low = idx + 1; @@ -288,10 +304,19 @@ int perf_cpu_map__idx(struct perf_cpu_map *cpus, int cpu) return -1; } -int perf_cpu_map__max(struct perf_cpu_map *map) +bool perf_cpu_map__has(const struct perf_cpu_map *cpus, struct perf_cpu cpu) { + return perf_cpu_map__idx(cpus, cpu) != -1; +} + +struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map) +{ + struct perf_cpu result = { + .cpu = -1 + }; + // cpu_map__trim_new() qsort()s it, cpu_map__default_new() sorts it as well. - return map->nr > 0 ? map->map[map->nr - 1] : -1; + return map->nr > 0 ? map->map[map->nr - 1] : result; } /* @@ -305,7 +330,7 @@ int perf_cpu_map__max(struct perf_cpu_map *map) struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig, struct perf_cpu_map *other) { - int *tmp_cpus; + struct perf_cpu *tmp_cpus; int tmp_len; int i, j, k; struct perf_cpu_map *merged; @@ -319,19 +344,19 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig, if (!other) return orig; if (orig->nr == other->nr && - !memcmp(orig->map, other->map, orig->nr * sizeof(int))) + !memcmp(orig->map, other->map, orig->nr * sizeof(struct perf_cpu))) return orig; tmp_len = orig->nr + other->nr; - tmp_cpus = malloc(tmp_len * sizeof(int)); + tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu)); if (!tmp_cpus) return NULL; /* Standard merge algorithm from wikipedia */ i = j = k = 0; while (i < orig->nr && j < other->nr) { - if (orig->map[i] <= other->map[j]) { - if (orig->map[i] == other->map[j]) + if (orig->map[i].cpu <= other->map[j].cpu) { + if (orig->map[i].cpu == other->map[j].cpu) j++; tmp_cpus[k++] = orig->map[i++]; } else diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c index e37dfad31383..9a770bfdc804 100644 --- a/tools/lib/perf/evlist.c +++ b/tools/lib/perf/evlist.c @@ -407,7 +407,7 @@ perf_evlist__mmap_cb_get(struct perf_evlist *evlist, bool overwrite, int idx) static int perf_evlist__mmap_cb_mmap(struct perf_mmap *map, struct perf_mmap_param *mp, - int output, int cpu) + int output, struct perf_cpu cpu) { return perf_mmap__mmap(map, mp, output, cpu); } @@ -426,7 +426,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops, int idx, struct perf_mmap_param *mp, int cpu_idx, int thread, int *_output, int *_output_overwrite) { - int evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx); + struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx); struct perf_evsel *evsel; int revent; @@ -643,14 +643,14 @@ perf_evlist__next_mmap(struct perf_evlist *evlist, struct perf_mmap *map, return overwrite ? evlist->mmap_ovw_first : evlist->mmap_first; } -void __perf_evlist__set_leader(struct list_head *list) +void __perf_evlist__set_leader(struct list_head *list, struct perf_evsel *leader) { - struct perf_evsel *evsel, *leader; + struct perf_evsel *first, *last, *evsel; - leader = list_entry(list->next, struct perf_evsel, node); - evsel = list_entry(list->prev, struct perf_evsel, node); + first = list_first_entry(list, struct perf_evsel, node); + last = list_last_entry(list, struct perf_evsel, node); - leader->nr_members = evsel->idx - leader->idx + 1; + leader->nr_members = last->idx - first->idx + 1; __perf_evlist__for_each_entry(list, evsel) evsel->leader = leader; @@ -659,7 +659,10 @@ void __perf_evlist__set_leader(struct list_head *list) void perf_evlist__set_leader(struct perf_evlist *evlist) { if (evlist->nr_entries) { + struct perf_evsel *first = list_entry(evlist->entries.next, + struct perf_evsel, node); + evlist->nr_groups = evlist->nr_entries > 1 ? 1 : 0; - __perf_evlist__set_leader(&evlist->entries); + __perf_evlist__set_leader(&evlist->entries, first); } } diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c index 8441e3e1aaac..7ea86a44eae5 100644 --- a/tools/lib/perf/evsel.c +++ b/tools/lib/perf/evsel.c @@ -43,18 +43,22 @@ void perf_evsel__delete(struct perf_evsel *evsel) free(evsel); } -#define FD(e, x, y) ((int *) xyarray__entry(e->fd, x, y)) -#define MMAP(e, x, y) (e->mmap ? ((struct perf_mmap *) xyarray__entry(e->mmap, x, y)) : NULL) +#define FD(_evsel, _cpu_map_idx, _thread) \ + ((int *)xyarray__entry(_evsel->fd, _cpu_map_idx, _thread)) +#define MMAP(_evsel, _cpu_map_idx, _thread) \ + (_evsel->mmap ? ((struct perf_mmap *) xyarray__entry(_evsel->mmap, _cpu_map_idx, _thread)) \ + : NULL) int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads) { evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int)); if (evsel->fd) { - int cpu, thread; - for (cpu = 0; cpu < ncpus; cpu++) { + int idx, thread; + + for (idx = 0; idx < ncpus; idx++) { for (thread = 0; thread < nthreads; thread++) { - int *fd = FD(evsel, cpu, thread); + int *fd = FD(evsel, idx, thread); if (fd) *fd = -1; @@ -74,13 +78,13 @@ static int perf_evsel__alloc_mmap(struct perf_evsel *evsel, int ncpus, int nthre static int sys_perf_event_open(struct perf_event_attr *attr, - pid_t pid, int cpu, int group_fd, + pid_t pid, struct perf_cpu cpu, int group_fd, unsigned long flags) { - return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); + return syscall(__NR_perf_event_open, attr, pid, cpu.cpu, group_fd, flags); } -static int get_group_fd(struct perf_evsel *evsel, int cpu, int thread, int *group_fd) +static int get_group_fd(struct perf_evsel *evsel, int cpu_map_idx, int thread, int *group_fd) { struct perf_evsel *leader = evsel->leader; int *fd; @@ -97,7 +101,7 @@ static int get_group_fd(struct perf_evsel *evsel, int cpu, int thread, int *grou if (!leader->fd) return -ENOTCONN; - fd = FD(leader, cpu, thread); + fd = FD(leader, cpu_map_idx, thread); if (fd == NULL || *fd == -1) return -EBADF; @@ -109,7 +113,8 @@ static int get_group_fd(struct perf_evsel *evsel, int cpu, int thread, int *grou int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads) { - int cpu, thread, err = 0; + struct perf_cpu cpu; + int idx, thread, err = 0; if (cpus == NULL) { static struct perf_cpu_map *empty_cpu_map; @@ -139,21 +144,21 @@ int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus, perf_evsel__alloc_fd(evsel, cpus->nr, threads->nr) < 0) return -ENOMEM; - for (cpu = 0; cpu < cpus->nr; cpu++) { + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { for (thread = 0; thread < threads->nr; thread++) { int fd, group_fd, *evsel_fd; - evsel_fd = FD(evsel, cpu, thread); + evsel_fd = FD(evsel, idx, thread); if (evsel_fd == NULL) return -EINVAL; - err = get_group_fd(evsel, cpu, thread, &group_fd); + err = get_group_fd(evsel, idx, thread, &group_fd); if (err < 0) return err; fd = sys_perf_event_open(&evsel->attr, threads->map[thread].pid, - cpus->map[cpu], group_fd, 0); + cpu, group_fd, 0); if (fd < 0) return -errno; @@ -165,12 +170,12 @@ int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus, return err; } -static void perf_evsel__close_fd_cpu(struct perf_evsel *evsel, int cpu) +static void perf_evsel__close_fd_cpu(struct perf_evsel *evsel, int cpu_map_idx) { int thread; for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { - int *fd = FD(evsel, cpu, thread); + int *fd = FD(evsel, cpu_map_idx, thread); if (fd && *fd >= 0) { close(*fd); @@ -181,10 +186,8 @@ static void perf_evsel__close_fd_cpu(struct perf_evsel *evsel, int cpu) void perf_evsel__close_fd(struct perf_evsel *evsel) { - int cpu; - - for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) - perf_evsel__close_fd_cpu(evsel, cpu); + for (int idx = 0; idx < xyarray__max_x(evsel->fd); idx++) + perf_evsel__close_fd_cpu(evsel, idx); } void perf_evsel__free_fd(struct perf_evsel *evsel) @@ -202,29 +205,29 @@ void perf_evsel__close(struct perf_evsel *evsel) perf_evsel__free_fd(evsel); } -void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu) +void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu_map_idx) { if (evsel->fd == NULL) return; - perf_evsel__close_fd_cpu(evsel, cpu); + perf_evsel__close_fd_cpu(evsel, cpu_map_idx); } void perf_evsel__munmap(struct perf_evsel *evsel) { - int cpu, thread; + int idx, thread; if (evsel->fd == NULL || evsel->mmap == NULL) return; - for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) { + for (idx = 0; idx < xyarray__max_x(evsel->fd); idx++) { for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) { - int *fd = FD(evsel, cpu, thread); + int *fd = FD(evsel, idx, thread); if (fd == NULL || *fd < 0) continue; - perf_mmap__munmap(MMAP(evsel, cpu, thread)); + perf_mmap__munmap(MMAP(evsel, idx, thread)); } } @@ -234,7 +237,7 @@ void perf_evsel__munmap(struct perf_evsel *evsel) int perf_evsel__mmap(struct perf_evsel *evsel, int pages) { - int ret, cpu, thread; + int ret, idx, thread; struct perf_mmap_param mp = { .prot = PROT_READ | PROT_WRITE, .mask = (pages * page_size) - 1, @@ -246,15 +249,16 @@ int perf_evsel__mmap(struct perf_evsel *evsel, int pages) if (perf_evsel__alloc_mmap(evsel, xyarray__max_x(evsel->fd), xyarray__max_y(evsel->fd)) < 0) return -ENOMEM; - for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) { + for (idx = 0; idx < xyarray__max_x(evsel->fd); idx++) { for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) { - int *fd = FD(evsel, cpu, thread); + int *fd = FD(evsel, idx, thread); struct perf_mmap *map; + struct perf_cpu cpu = perf_cpu_map__cpu(evsel->cpus, idx); if (fd == NULL || *fd < 0) continue; - map = MMAP(evsel, cpu, thread); + map = MMAP(evsel, idx, thread); perf_mmap__init(map, NULL, false, NULL); ret = perf_mmap__mmap(map, &mp, *fd, cpu); @@ -268,14 +272,14 @@ int perf_evsel__mmap(struct perf_evsel *evsel, int pages) return 0; } -void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread) +void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu_map_idx, int thread) { - int *fd = FD(evsel, cpu, thread); + int *fd = FD(evsel, cpu_map_idx, thread); - if (fd == NULL || *fd < 0 || MMAP(evsel, cpu, thread) == NULL) + if (fd == NULL || *fd < 0 || MMAP(evsel, cpu_map_idx, thread) == NULL) return NULL; - return MMAP(evsel, cpu, thread)->base; + return MMAP(evsel, cpu_map_idx, thread)->base; } int perf_evsel__read_size(struct perf_evsel *evsel) @@ -303,19 +307,19 @@ int perf_evsel__read_size(struct perf_evsel *evsel) return size; } -int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread, +int perf_evsel__read(struct perf_evsel *evsel, int cpu_map_idx, int thread, struct perf_counts_values *count) { size_t size = perf_evsel__read_size(evsel); - int *fd = FD(evsel, cpu, thread); + int *fd = FD(evsel, cpu_map_idx, thread); memset(count, 0, sizeof(*count)); if (fd == NULL || *fd < 0) return -EINVAL; - if (MMAP(evsel, cpu, thread) && - !perf_mmap__read_self(MMAP(evsel, cpu, thread), count)) + if (MMAP(evsel, cpu_map_idx, thread) && + !perf_mmap__read_self(MMAP(evsel, cpu_map_idx, thread), count)) return 0; if (readn(*fd, count->values, size) <= 0) @@ -326,13 +330,13 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread, static int perf_evsel__run_ioctl(struct perf_evsel *evsel, int ioc, void *arg, - int cpu) + int cpu_map_idx) { int thread; for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) { int err; - int *fd = FD(evsel, cpu, thread); + int *fd = FD(evsel, cpu_map_idx, thread); if (fd == NULL || *fd < 0) return -1; @@ -346,9 +350,9 @@ static int perf_evsel__run_ioctl(struct perf_evsel *evsel, return 0; } -int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu) +int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu_map_idx) { - return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, NULL, cpu); + return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, NULL, cpu_map_idx); } int perf_evsel__enable(struct perf_evsel *evsel) @@ -361,9 +365,9 @@ int perf_evsel__enable(struct perf_evsel *evsel) return err; } -int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu) +int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu_map_idx) { - return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, NULL, cpu); + return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, NULL, cpu_map_idx); } int perf_evsel__disable(struct perf_evsel *evsel) @@ -431,3 +435,22 @@ void perf_evsel__free_id(struct perf_evsel *evsel) zfree(&evsel->id); evsel->ids = 0; } + +void perf_counts_values__scale(struct perf_counts_values *count, + bool scale, __s8 *pscaled) +{ + s8 scaled = 0; + + if (scale) { + if (count->run == 0) { + scaled = -1; + count->val = 0; + } else if (count->run < count->ena) { + scaled = 1; + count->val = (u64)((double)count->val * count->ena / count->run); + } + } + + if (pscaled) + *pscaled = scaled; +} diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h index 840d4032587b..581f9ffb4237 100644 --- a/tools/lib/perf/include/internal/cpumap.h +++ b/tools/lib/perf/include/internal/cpumap.h @@ -4,16 +4,30 @@ #include <linux/refcount.h> +/** A wrapper around a CPU to avoid confusion with the perf_cpu_map's map's indices. */ +struct perf_cpu { + int cpu; +}; + +/** + * A sized, reference counted, sorted array of integers representing CPU + * numbers. This is commonly used to capture which CPUs a PMU is associated + * with. The indices into the cpumap are frequently used as they avoid having + * gaps if CPU numbers were used. For events associated with a pid, rather than + * a CPU, a single dummy map with an entry of -1 is used. + */ struct perf_cpu_map { refcount_t refcnt; + /** Length of the map array. */ int nr; - int map[]; + /** The CPU values. */ + struct perf_cpu map[]; }; #ifndef MAX_NR_CPUS #define MAX_NR_CPUS 2048 #endif -int perf_cpu_map__idx(struct perf_cpu_map *cpus, int cpu); +int perf_cpu_map__idx(const struct perf_cpu_map *cpus, struct perf_cpu cpu); #endif /* __LIBPERF_INTERNAL_CPUMAP_H */ diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h index f366dbad6a88..4cefade540bd 100644 --- a/tools/lib/perf/include/internal/evlist.h +++ b/tools/lib/perf/include/internal/evlist.h @@ -4,6 +4,7 @@ #include <linux/list.h> #include <api/fd/array.h> +#include <internal/cpumap.h> #include <internal/evsel.h> #define PERF_EVLIST__HLIST_BITS 8 @@ -36,7 +37,7 @@ typedef void typedef struct perf_mmap* (*perf_evlist_mmap__cb_get_t)(struct perf_evlist*, bool, int); typedef int -(*perf_evlist_mmap__cb_mmap_t)(struct perf_mmap*, struct perf_mmap_param*, int, int); +(*perf_evlist_mmap__cb_mmap_t)(struct perf_mmap*, struct perf_mmap_param*, int, struct perf_cpu); struct perf_evlist_mmap_ops { perf_evlist_mmap__cb_idx_t idx; @@ -127,5 +128,5 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist, void perf_evlist__reset_id_hash(struct perf_evlist *evlist); -void __perf_evlist__set_leader(struct list_head *list); +void __perf_evlist__set_leader(struct list_head *list, struct perf_evsel *leader); #endif /* __LIBPERF_INTERNAL_EVLIST_H */ diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h index 1f3eacbad2e8..cfc9ebd7968e 100644 --- a/tools/lib/perf/include/internal/evsel.h +++ b/tools/lib/perf/include/internal/evsel.h @@ -6,8 +6,8 @@ #include <linux/perf_event.h> #include <stdbool.h> #include <sys/types.h> +#include <internal/cpumap.h> -struct perf_cpu_map; struct perf_thread_map; struct xyarray; @@ -27,7 +27,7 @@ struct perf_sample_id { * queue number. */ int idx; - int cpu; + struct perf_cpu cpu; pid_t tid; /* Holds total ID period value for PERF_SAMPLE_READ processing. */ diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h index 5e3422f40ed5..5a062af8e9d8 100644 --- a/tools/lib/perf/include/internal/mmap.h +++ b/tools/lib/perf/include/internal/mmap.h @@ -6,6 +6,7 @@ #include <linux/refcount.h> #include <linux/types.h> #include <stdbool.h> +#include <internal/cpumap.h> /* perf sample has 16 bits size limit */ #define PERF_SAMPLE_MAX_SIZE (1 << 16) @@ -24,7 +25,7 @@ struct perf_mmap { void *base; int mask; int fd; - int cpu; + struct perf_cpu cpu; refcount_t refcnt; u64 prev; u64 start; @@ -46,7 +47,7 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map); void perf_mmap__init(struct perf_mmap *map, struct perf_mmap *prev, bool overwrite, libperf_unmap_cb_t unmap_cb); int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp, - int fd, int cpu); + int fd, struct perf_cpu cpu); void perf_mmap__munmap(struct perf_mmap *map); void perf_mmap__get(struct perf_mmap *map); void perf_mmap__put(struct perf_mmap *map); diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h index 7c27766ea0bf..15b8faafd615 100644 --- a/tools/lib/perf/include/perf/cpumap.h +++ b/tools/lib/perf/include/perf/cpumap.h @@ -3,11 +3,10 @@ #define __LIBPERF_CPUMAP_H #include <perf/core.h> +#include <perf/cpumap.h> #include <stdio.h> #include <stdbool.h> -struct perf_cpu_map; - LIBPERF_API struct perf_cpu_map *perf_cpu_map__dummy_new(void); LIBPERF_API struct perf_cpu_map *perf_cpu_map__default_new(void); LIBPERF_API struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list); @@ -16,10 +15,11 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map); LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig, struct perf_cpu_map *other); LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map); -LIBPERF_API int perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx); +LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx); LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus); LIBPERF_API bool perf_cpu_map__empty(const struct perf_cpu_map *map); -LIBPERF_API int perf_cpu_map__max(struct perf_cpu_map *map); +LIBPERF_API struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map); +LIBPERF_API bool perf_cpu_map__has(const struct perf_cpu_map *map, struct perf_cpu cpu); #define perf_cpu_map__for_each_cpu(cpu, idx, cpus) \ for ((idx) = 0, (cpu) = perf_cpu_map__cpu(cpus, idx); \ diff --git a/tools/lib/perf/include/perf/evsel.h b/tools/lib/perf/include/perf/evsel.h index 60eae25076d3..2a9516b42d15 100644 --- a/tools/lib/perf/include/perf/evsel.h +++ b/tools/lib/perf/include/perf/evsel.h @@ -4,6 +4,8 @@ #include <stdint.h> #include <perf/core.h> +#include <stdbool.h> +#include <linux/types.h> struct perf_evsel; struct perf_event_attr; @@ -26,18 +28,20 @@ LIBPERF_API void perf_evsel__delete(struct perf_evsel *evsel); LIBPERF_API int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads); LIBPERF_API void perf_evsel__close(struct perf_evsel *evsel); -LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu); +LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu_map_idx); LIBPERF_API int perf_evsel__mmap(struct perf_evsel *evsel, int pages); LIBPERF_API void perf_evsel__munmap(struct perf_evsel *evsel); -LIBPERF_API void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread); -LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread, +LIBPERF_API void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu_map_idx, int thread); +LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu_map_idx, int thread, struct perf_counts_values *count); LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel); -LIBPERF_API int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu); +LIBPERF_API int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu_map_idx); LIBPERF_API int perf_evsel__disable(struct perf_evsel *evsel); -LIBPERF_API int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu); +LIBPERF_API int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu_map_idx); LIBPERF_API struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel); LIBPERF_API struct perf_thread_map *perf_evsel__threads(struct perf_evsel *evsel); LIBPERF_API struct perf_event_attr *perf_evsel__attr(struct perf_evsel *evsel); +LIBPERF_API void perf_counts_values__scale(struct perf_counts_values *count, + bool scale, __s8 *pscaled); #endif /* __LIBPERF_EVSEL_H */ diff --git a/tools/lib/perf/libperf.map b/tools/lib/perf/libperf.map index 71468606e8a7..93696affda2e 100644 --- a/tools/lib/perf/libperf.map +++ b/tools/lib/perf/libperf.map @@ -10,6 +10,7 @@ LIBPERF_0.0.1 { perf_cpu_map__cpu; perf_cpu_map__empty; perf_cpu_map__max; + perf_cpu_map__has; perf_thread_map__new_dummy; perf_thread_map__set_pid; perf_thread_map__comm; @@ -50,6 +51,7 @@ LIBPERF_0.0.1 { perf_mmap__read_init; perf_mmap__read_done; perf_mmap__read_event; + perf_counts_values__scale; local: *; }; diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c index c89dfa5f67b3..f7ee07cb5818 100644 --- a/tools/lib/perf/mmap.c +++ b/tools/lib/perf/mmap.c @@ -32,7 +32,7 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map) } int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp, - int fd, int cpu) + int fd, struct perf_cpu cpu) { map->prev = 0; map->mask = mp->mask; @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count count->ena += delta; if (idx) count->run += delta; - - cnt = mul_u64_u64_div64(cnt, count->ena, count->run); } count->val = cnt; diff --git a/tools/lib/perf/tests/test-evlist.c b/tools/lib/perf/tests/test-evlist.c index ce91a582f0e4..b3479dfa9a1c 100644 --- a/tools/lib/perf/tests/test-evlist.c +++ b/tools/lib/perf/tests/test-evlist.c @@ -21,6 +21,9 @@ #include "tests.h" #include <internal/evsel.h> +#define EVENT_NUM 15 +#define WAIT_COUNT 100000000UL + static int libperf_print(enum libperf_print_level level, const char *fmt, va_list ap) { @@ -331,7 +334,8 @@ static int test_mmap_cpus(void) }; cpu_set_t saved_mask; char path[PATH_MAX]; - int id, err, cpu, tmp; + int id, err, tmp; + struct perf_cpu cpu; union perf_event *event; int count = 0; @@ -374,7 +378,7 @@ static int test_mmap_cpus(void) cpu_set_t mask; CPU_ZERO(&mask); - CPU_SET(cpu, &mask); + CPU_SET(cpu.cpu, &mask); err = sched_setaffinity(0, sizeof(mask), &mask); __T("sched_setaffinity failed", err == 0); @@ -413,6 +417,159 @@ static int test_mmap_cpus(void) return 0; } +static double display_error(long long average, + long long high, + long long low, + long long expected) +{ + double error; + + error = (((double)average - expected) / expected) * 100.0; + + __T_VERBOSE(" Expected: %lld\n", expected); + __T_VERBOSE(" High: %lld Low: %lld Average: %lld\n", + high, low, average); + + __T_VERBOSE(" Average Error = %.2f%%\n", error); + + return error; +} + +static int test_stat_multiplexing(void) +{ + struct perf_counts_values expected_counts = { .val = 0 }; + struct perf_counts_values counts[EVENT_NUM] = {{ .val = 0 },}; + struct perf_thread_map *threads; + struct perf_evlist *evlist; + struct perf_evsel *evsel; + struct perf_event_attr attr = { + .type = PERF_TYPE_HARDWARE, + .config = PERF_COUNT_HW_INSTRUCTIONS, + .read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | + PERF_FORMAT_TOTAL_TIME_RUNNING, + .disabled = 1, + }; + int err, i, nonzero = 0; + unsigned long count; + long long max = 0, min = 0, avg = 0; + double error = 0.0; + s8 scaled = 0; + + /* read for non-multiplexing event count */ + threads = perf_thread_map__new_dummy(); + __T("failed to create threads", threads); + + perf_thread_map__set_pid(threads, 0, 0); + + evsel = perf_evsel__new(&attr); + __T("failed to create evsel", evsel); + + err = perf_evsel__open(evsel, NULL, threads); + __T("failed to open evsel", err == 0); + + err = perf_evsel__enable(evsel); + __T("failed to enable evsel", err == 0); + + /* wait loop */ + count = WAIT_COUNT; + while (count--) + ; + + perf_evsel__read(evsel, 0, 0, &expected_counts); + __T("failed to read value for evsel", expected_counts.val != 0); + __T("failed to read non-multiplexing event count", + expected_counts.ena == expected_counts.run); + + err = perf_evsel__disable(evsel); + __T("failed to enable evsel", err == 0); + + perf_evsel__close(evsel); + perf_evsel__delete(evsel); + + perf_thread_map__put(threads); + + /* read for multiplexing event count */ + threads = perf_thread_map__new_dummy(); + __T("failed to create threads", threads); + + perf_thread_map__set_pid(threads, 0, 0); + + evlist = perf_evlist__new(); + __T("failed to create evlist", evlist); + + for (i = 0; i < EVENT_NUM; i++) { + evsel = perf_evsel__new(&attr); + __T("failed to create evsel", evsel); + + perf_evlist__add(evlist, evsel); + } + perf_evlist__set_maps(evlist, NULL, threads); + + err = perf_evlist__open(evlist); + __T("failed to open evsel", err == 0); + + perf_evlist__enable(evlist); + + /* wait loop */ + count = WAIT_COUNT; + while (count--) + ; + + i = 0; + perf_evlist__for_each_evsel(evlist, evsel) { + perf_evsel__read(evsel, 0, 0, &counts[i]); + __T("failed to read value for evsel", counts[i].val != 0); + i++; + } + + perf_evlist__disable(evlist); + + min = counts[0].val; + for (i = 0; i < EVENT_NUM; i++) { + __T_VERBOSE("Event %2d -- Raw count = %lu, run = %lu, enable = %lu\n", + i, counts[i].val, counts[i].run, counts[i].ena); + + perf_counts_values__scale(&counts[i], true, &scaled); + if (scaled == 1) { + __T_VERBOSE("\t Scaled count = %lu (%.2lf%%, %lu/%lu)\n", + counts[i].val, + (double)counts[i].run / (double)counts[i].ena * 100.0, + counts[i].run, counts[i].ena); + } else if (scaled == -1) { + __T_VERBOSE("\t Not Running\n"); + } else { + __T_VERBOSE("\t Not Scaling\n"); + } + + if (counts[i].val > max) + max = counts[i].val; + + if (counts[i].val < min) + min = counts[i].val; + + avg += counts[i].val; + + if (counts[i].val != 0) + nonzero++; + } + + if (nonzero != 0) + avg = avg / nonzero; + else + avg = 0; + + error = display_error(avg, max, min, expected_counts.val); + + __T("Error out of range!", ((error <= 1.0) && (error >= -1.0))); + + perf_evlist__close(evlist); + perf_evlist__delete(evlist); + + perf_thread_map__put(threads); + + return 0; +} + int test_evlist(int argc, char **argv) { __T_START; @@ -424,6 +581,7 @@ int test_evlist(int argc, char **argv) test_stat_thread_enable(); test_mmap_thread(); test_mmap_cpus(); + test_stat_multiplexing(); __T_END; return tests_failed == 0 ? 0 : -1; diff --git a/tools/perf/Documentation/perf-buildid-cache.txt b/tools/perf/Documentation/perf-buildid-cache.txt index cd8ce6e8ec12..7e44b419d301 100644 --- a/tools/perf/Documentation/perf-buildid-cache.txt +++ b/tools/perf/Documentation/perf-buildid-cache.txt @@ -74,12 +74,15 @@ OPTIONS used when creating a uprobe for a process that resides in a different mount namespace from the perf(1) utility. ---debuginfod=URLs:: +--debuginfod[=URLs]:: Specify debuginfod URL to be used when retrieving perf.data binaries, it follows the same syntax as the DEBUGINFOD_URLS variable, like: buildid-cache.debuginfod=http://192.168.122.174:8002 + If the URLs is not specified, the value of DEBUGINFOD_URLS + system environment variable is used. + SEE ALSO -------- linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-buildid-list[1] diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt index 3bb75c1f25e8..0420e71698ee 100644 --- a/tools/perf/Documentation/perf-config.txt +++ b/tools/perf/Documentation/perf-config.txt @@ -587,6 +587,15 @@ record.*:: Use 'n' control blocks in asynchronous (Posix AIO) trace writing mode ('n' default: 1, max: 4). + record.debuginfod:: + Specify debuginfod URL to be used when cacheing perf.data binaries, + it follows the same syntax as the DEBUGINFOD_URLS variable, like: + + http://192.168.122.174:8002 + + If the URLs is 'system', the value of DEBUGINFOD_URLS system environment + variable is used. + diff.*:: diff.order:: This option sets the number of columns to sort the result. diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index 4dc8d0af19df..57384a97c04f 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -81,7 +81,11 @@ On AMD systems it is implemented using IBS (up to precise-level 2). The precise modifier works with event types 0x76 (cpu-cycles, CPU clocks not halted) and 0xC1 (micro-ops retired). Both events map to IBS execution sampling (IBS op) with the IBS Op Counter Control bit -(IbsOpCntCtl) set respectively (see AMD64 Architecture Programmer’s +(IbsOpCntCtl) set respectively (see the +Core Complex (CCX) -> Processor x86 Core -> Instruction Based Sampling (IBS) +section of the [AMD Processor Programming Reference (PPR)] relevant to the +family, model and stepping of the processor being used). + Manual Volume 2: System Programming, 13.3 Instruction-Based Sampling). Examples to use IBS: @@ -94,10 +98,12 @@ RAW HARDWARE EVENT DESCRIPTOR Even when an event is not available in a symbolic form within perf right now, it can be encoded in a per processor specific way. -For instance For x86 CPUs NNN represents the raw register encoding with the +For instance on x86 CPUs, N is a hexadecimal value that represents the raw register encoding with the layout of IA32_PERFEVTSELx MSRs (see [Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide] Figure 30-1 Layout -of IA32_PERFEVTSELx MSRs) or AMD's PerfEvtSeln (see [AMD64 Architecture Programmer’s Manual Volume 2: System Programming], Page 344, -Figure 13-7 Performance Event-Select Register (PerfEvtSeln)). +of IA32_PERFEVTSELx MSRs) or AMD's PERF_CTL MSRs (see the +Core Complex (CCX) -> Processor x86 Core -> MSR Registers section of the +[AMD Processor Programming Reference (PPR)] relevant to the family, model +and stepping of the processor being used). Note: Only the following bit fields can be set in x86 counter registers: event, umask, edge, inv, cmask. Esp. guest/host only and @@ -126,6 +132,38 @@ It's also possible to use pmu syntax: perf record -e cpu/r1a8/ ... perf record -e cpu/r0x1a8/ ... +Some processors, like those from AMD, support event codes and unit masks +larger than a byte. In such cases, the bits corresponding to the event +configuration parameters can be seen with: + + cat /sys/bus/event_source/devices/<pmu>/format/<config> + +Example: + +If the AMD docs for an EPYC 7713 processor describe an event as: + + Event Umask Event Mask + Num. Value Mnemonic Description + + 28FH 03H op_cache_hit_miss.op_cache_hit Counts Op Cache micro-tag + hit events. + +raw encoding of 0x0328F cannot be used since the upper nibble of the +EventSelect bits have to be specified via bits 32-35 as can be seen with: + + cat /sys/bus/event_source/devices/cpu/format/event + +raw encoding of 0x20000038F should be used instead: + + perf stat -e r20000038f -a sleep 1 + perf record -e r20000038f ... + +It's also possible to use pmu syntax: + + perf record -e r20000038f -a sleep 1 + perf record -e cpu/r20000038f/ ... + perf record -e cpu/r0x20000038f/ ... + You should refer to the processor specific documentation for getting these details. Some of them are referenced in the SEE ALSO section below. @@ -316,4 +354,4 @@ SEE ALSO linkperf:perf-stat[1], linkperf:perf-top[1], linkperf:perf-record[1], http://www.intel.com/sdm/[Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide], -http://support.amd.com/us/Processor_TechDocs/24593_APM_v2.pdf[AMD64 Architecture Programmer’s Manual Volume 2: System Programming] +https://bugzilla.kernel.org/show_bug.cgi?id=206537[AMD Processor Programming Reference (PPR)] diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 3cf7bac67239..9ccc75935bc5 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -30,8 +30,10 @@ OPTIONS - a symbolic event name (use 'perf list' to list all events) - - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a - hexadecimal event descriptor. + - a raw PMU event in the form of rN where N is a hexadecimal value + that represents the raw register encoding with the layout of the + event control registers as described by entries in + /sys/bus/event_sources/devices/cpu/format/*. - a symbolic or raw PMU event followed by an optional colon and a list of event modifiers, e.g., cpu-cycles:p. See the @@ -713,6 +715,15 @@ measurements: include::intel-hybrid.txt[] +--debuginfod[=URLs]:: + Specify debuginfod URL to be used when cacheing perf.data binaries, + it follows the same syntax as the DEBUGINFOD_URLS variable, like: + + http://192.168.122.174:8002 + + If the URLs is not specified, the value of DEBUGINFOD_URLS + system environment variable is used. + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1] diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 7e6fb7cbc0f4..c06c341e72b9 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -36,8 +36,10 @@ report:: - a symbolic event name (use 'perf list' to list all events) - - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a - hexadecimal event descriptor. + - a raw PMU event in the form of rN where N is a hexadecimal value + that represents the raw register encoding with the layout of the + event control registers as described by entries in + /sys/bus/event_sources/devices/cpu/format/*. - a symbolic or raw PMU event followed by an optional colon and a list of event modifiers, e.g., cpu-cycles:p. See the @@ -493,6 +495,10 @@ This option can be enabled in perf config by setting the variable $ perf config stat.no-csv-summary=true +--cputype:: +Only enable events on applying cpu with this type for hybrid platform +(e.g. core or atom)" + EXAMPLES -------- diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 9898a32b8d9c..cac3dfbee7d8 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -38,9 +38,10 @@ Default is to monitor all CPUS. -e <event>:: --event=<event>:: Select the PMU event. Selection can be a symbolic event name - (use 'perf list' to list all events) or a raw PMU - event (eventsel+umask) in the form of rNNN where NNN is a - hexadecimal event descriptor. + (use 'perf list' to list all events) or a raw PMU event in the form + of rN where N is a hexadecimal value that represents the raw register + encoding with the layout of the event control registers as described + by entries in /sys/bus/event_sources/devices/cpu/format/*. -E <entries>:: --entries=<entries>:: diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index 3df74cf5651a..96ad944ca6a8 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -17,6 +17,7 @@ detected = $(shell echo "$(1)=y" >> $(OUTPUT).config-detected) detected_var = $(shell echo "$(1)=$($(1))" >> $(OUTPUT).config-detected) CFLAGS := $(EXTRA_CFLAGS) $(filter-out -Wnested-externs,$(EXTRA_WARNINGS)) +HOSTCFLAGS := $(filter-out -Wnested-externs,$(EXTRA_WARNINGS)) include $(srctree)/tools/scripts/Makefile.arch @@ -143,7 +144,10 @@ FEATURE_CHECK_LDFLAGS-libcrypto = -lcrypto ifdef CSINCLUDES LIBOPENCSD_CFLAGS := -I$(CSINCLUDES) endif -OPENCSDLIBS := -lopencsd_c_api -lopencsd -lstdc++ +OPENCSDLIBS := -lopencsd_c_api +ifeq ($(findstring -static,${LDFLAGS}),-static) + OPENCSDLIBS += -lopencsd -lstdc++ +endif ifdef CSLIBS LIBOPENCSD_LDFLAGS := -L$(CSLIBS) endif @@ -211,6 +215,7 @@ endif ifneq ($(WERROR),0) CORE_CFLAGS += -Werror CXXFLAGS += -Werror + HOSTCFLAGS += -Werror endif ifndef DEBUG @@ -290,6 +295,9 @@ CXXFLAGS += -ggdb3 CXXFLAGS += -funwind-tables CXXFLAGS += -Wno-strict-aliasing +HOSTCFLAGS += -Wall +HOSTCFLAGS += -Wextra + # Enforce a non-executable stack, as we may regress (again) in the future by # adding assembler files missing the .GNU-stack linker note. LDFLAGS += -Wl,-z,noexecstack diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf index 80522bcfafe0..ac861e42c8f7 100644 --- a/tools/perf/Makefile.perf +++ b/tools/perf/Makefile.perf @@ -226,7 +226,7 @@ else endif export srctree OUTPUT RM CC CXX LD AR CFLAGS CXXFLAGS V BISON FLEX AWK -export HOSTCC HOSTLD HOSTAR +export HOSTCC HOSTLD HOSTAR HOSTCFLAGS include $(srctree)/tools/build/Makefile.include @@ -1041,7 +1041,7 @@ SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel) SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp) SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h -SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h +SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h $(SKEL_TMP_OUT) $(LIBBPF_OUTPUT): $(Q)$(MKDIR) -p $@ diff --git a/tools/perf/arch/arm/include/perf_regs.h b/tools/perf/arch/arm/include/perf_regs.h index 4085419283d0..99a06550e25d 100644 --- a/tools/perf/arch/arm/include/perf_regs.h +++ b/tools/perf/arch/arm/include/perf_regs.h @@ -15,46 +15,4 @@ void perf_regs_load(u64 *regs); #define PERF_REG_IP PERF_REG_ARM_PC #define PERF_REG_SP PERF_REG_ARM_SP -static inline const char *__perf_reg_name(int id) -{ - switch (id) { - case PERF_REG_ARM_R0: - return "r0"; - case PERF_REG_ARM_R1: - return "r1"; - case PERF_REG_ARM_R2: - return "r2"; - case PERF_REG_ARM_R3: - return "r3"; - case PERF_REG_ARM_R4: - return "r4"; - case PERF_REG_ARM_R5: - return "r5"; - case PERF_REG_ARM_R6: - return "r6"; - case PERF_REG_ARM_R7: - return "r7"; - case PERF_REG_ARM_R8: - return "r8"; - case PERF_REG_ARM_R9: - return "r9"; - case PERF_REG_ARM_R10: - return "r10"; - case PERF_REG_ARM_FP: - return "fp"; - case PERF_REG_ARM_IP: - return "ip"; - case PERF_REG_ARM_SP: - return "sp"; - case PERF_REG_ARM_LR: - return "lr"; - case PERF_REG_ARM_PC: - return "pc"; - default: - return NULL; - } - - return NULL; -} - #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index 293a23bf8be3..2e8b2c4365a0 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -203,9 +203,11 @@ static int cs_etm_set_option(struct auxtrace_record *itr, struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL); /* Set option of each CPU we have */ - for (i = 0; i < cpu__max_cpu(); i++) { - if (!cpu_map__has(event_cpus, i) || - !cpu_map__has(online_cpus, i)) + for (i = 0; i < cpu__max_cpu().cpu; i++) { + struct perf_cpu cpu = { .cpu = i, }; + + if (!perf_cpu_map__has(event_cpus, cpu) || + !perf_cpu_map__has(online_cpus, cpu)) continue; if (option & BIT(ETM_OPT_CTXTID)) { @@ -407,25 +409,6 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, } - /* Validate auxtrace_mmap_pages provided by user */ - if (opts->auxtrace_mmap_pages) { - unsigned int max_page = (KiB(128) / page_size); - size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size; - - if (!privileged && - opts->auxtrace_mmap_pages > max_page) { - opts->auxtrace_mmap_pages = max_page; - pr_err("auxtrace too big, truncating to %d\n", - max_page); - } - - if (!is_power_of_2(sz)) { - pr_err("Invalid mmap size for %s: must be a power of 2\n", - CORESIGHT_ETM_PMU_NAME); - return -EINVAL; - } - } - if (opts->auxtrace_snapshot_mode) pr_debug2("%s snapshot size: %zu\n", CORESIGHT_ETM_PMU_NAME, opts->auxtrace_snapshot_size); @@ -541,9 +524,11 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused, /* cpu map is not empty, we have specific CPUs to work with */ if (!perf_cpu_map__empty(event_cpus)) { - for (i = 0; i < cpu__max_cpu(); i++) { - if (!cpu_map__has(event_cpus, i) || - !cpu_map__has(online_cpus, i)) + for (i = 0; i < cpu__max_cpu().cpu; i++) { + struct perf_cpu cpu = { .cpu = i, }; + + if (!perf_cpu_map__has(event_cpus, cpu) || + !perf_cpu_map__has(online_cpus, cpu)) continue; if (cs_etm_is_ete(itr, i)) @@ -555,8 +540,10 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused, } } else { /* get configuration for all CPUs in the system */ - for (i = 0; i < cpu__max_cpu(); i++) { - if (!cpu_map__has(online_cpus, i)) + for (i = 0; i < cpu__max_cpu().cpu; i++) { + struct perf_cpu cpu = { .cpu = i, }; + + if (!perf_cpu_map__has(online_cpus, cpu)) continue; if (cs_etm_is_ete(itr, i)) @@ -741,8 +728,10 @@ static int cs_etm_info_fill(struct auxtrace_record *itr, } else { /* Make sure all specified CPUs are online */ for (i = 0; i < perf_cpu_map__nr(event_cpus); i++) { - if (cpu_map__has(event_cpus, i) && - !cpu_map__has(online_cpus, i)) + struct perf_cpu cpu = { .cpu = i, }; + + if (perf_cpu_map__has(event_cpus, cpu) && + !perf_cpu_map__has(online_cpus, cpu)) return -EINVAL; } @@ -762,9 +751,12 @@ static int cs_etm_info_fill(struct auxtrace_record *itr, offset = CS_ETM_SNAPSHOT + 1; - for (i = 0; i < cpu__max_cpu() && offset < priv_size; i++) - if (cpu_map__has(cpu_map, i)) + for (i = 0; i < cpu__max_cpu().cpu && offset < priv_size; i++) { + struct perf_cpu cpu = { .cpu = i, }; + + if (perf_cpu_map__has(cpu_map, cpu)) cs_etm_get_metadata(i, &offset, itr, info); + } perf_cpu_map__put(online_cpus); diff --git a/tools/perf/arch/arm64/include/perf_regs.h b/tools/perf/arch/arm64/include/perf_regs.h index fa3e07459f76..35a3cc775b39 100644 --- a/tools/perf/arch/arm64/include/perf_regs.h +++ b/tools/perf/arch/arm64/include/perf_regs.h @@ -4,7 +4,9 @@ #include <stdlib.h> #include <linux/types.h> +#define perf_event_arm_regs perf_event_arm64_regs #include <asm/perf_regs.h> +#undef perf_event_arm_regs void perf_regs_load(u64 *regs); @@ -15,80 +17,4 @@ void perf_regs_load(u64 *regs); #define PERF_REG_IP PERF_REG_ARM64_PC #define PERF_REG_SP PERF_REG_ARM64_SP -static inline const char *__perf_reg_name(int id) -{ - switch (id) { - case PERF_REG_ARM64_X0: - return "x0"; - case PERF_REG_ARM64_X1: - return "x1"; - case PERF_REG_ARM64_X2: - return "x2"; - case PERF_REG_ARM64_X3: - return "x3"; - case PERF_REG_ARM64_X4: - return "x4"; - case PERF_REG_ARM64_X5: - return "x5"; - case PERF_REG_ARM64_X6: - return "x6"; - case PERF_REG_ARM64_X7: - return "x7"; - case PERF_REG_ARM64_X8: - return "x8"; - case PERF_REG_ARM64_X9: - return "x9"; - case PERF_REG_ARM64_X10: - return "x10"; - case PERF_REG_ARM64_X11: - return "x11"; - case PERF_REG_ARM64_X12: - return "x12"; - case PERF_REG_ARM64_X13: - return "x13"; - case PERF_REG_ARM64_X14: - return "x14"; - case PERF_REG_ARM64_X15: - return "x15"; - case PERF_REG_ARM64_X16: - return "x16"; - case PERF_REG_ARM64_X17: - return "x17"; - case PERF_REG_ARM64_X18: - return "x18"; - case PERF_REG_ARM64_X19: - return "x19"; - case PERF_REG_ARM64_X20: - return "x20"; - case PERF_REG_ARM64_X21: - return "x21"; - case PERF_REG_ARM64_X22: - return "x22"; - case PERF_REG_ARM64_X23: - return "x23"; - case PERF_REG_ARM64_X24: - return "x24"; - case PERF_REG_ARM64_X25: - return "x25"; - case PERF_REG_ARM64_X26: - return "x26"; - case PERF_REG_ARM64_X27: - return "x27"; - case PERF_REG_ARM64_X28: - return "x28"; - case PERF_REG_ARM64_X29: - return "x29"; - case PERF_REG_ARM64_SP: - return "sp"; - case PERF_REG_ARM64_LR: - return "lr"; - case PERF_REG_ARM64_PC: - return "pc"; - default: - return NULL; - } - - return NULL; -} - #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/arm64/util/machine.c b/tools/perf/arch/arm64/util/machine.c index 7e7714290a87..d2ce31e28cd7 100644 --- a/tools/perf/arch/arm64/util/machine.c +++ b/tools/perf/arch/arm64/util/machine.c @@ -5,6 +5,8 @@ #include <string.h> #include "debug.h" #include "symbol.h" +#include "callchain.h" +#include "record.h" /* On arm64, kernel text segment starts at high memory address, * for example 0xffff 0000 8xxx xxxx. Modules start at a low memory @@ -26,3 +28,8 @@ void arch__symbols__fixup_end(struct symbol *p, struct symbol *c) p->end = c->start; pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end); } + +void arch__add_leaf_frame_record_opts(struct record_opts *opts) +{ + opts->sample_user_regs |= sample_reg_masks[PERF_REG_ARM64_LR].mask; +} diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c index d3a18f9c85f6..79124bba713e 100644 --- a/tools/perf/arch/arm64/util/pmu.c +++ b/tools/perf/arch/arm64/util/pmu.c @@ -15,7 +15,7 @@ const struct pmu_events_map *pmu_events_map__find(void) * The cpumap should cover all CPUs. Otherwise, some CPUs may * not support some events or have different event IDs. */ - if (pmu->cpus->nr != cpu__max_cpu()) + if (pmu->cpus->nr != cpu__max_cpu().cpu) return NULL; return perf_pmu__find_map(pmu); diff --git a/tools/perf/arch/csky/include/perf_regs.h b/tools/perf/arch/csky/include/perf_regs.h index 25ac3bdcb9d1..1afcc0e916c2 100644 --- a/tools/perf/arch/csky/include/perf_regs.h +++ b/tools/perf/arch/csky/include/perf_regs.h @@ -15,86 +15,4 @@ #define PERF_REG_IP PERF_REG_CSKY_PC #define PERF_REG_SP PERF_REG_CSKY_SP -static inline const char *__perf_reg_name(int id) -{ - switch (id) { - case PERF_REG_CSKY_A0: - return "a0"; - case PERF_REG_CSKY_A1: - return "a1"; - case PERF_REG_CSKY_A2: - return "a2"; - case PERF_REG_CSKY_A3: - return "a3"; - case PERF_REG_CSKY_REGS0: - return "regs0"; - case PERF_REG_CSKY_REGS1: - return "regs1"; - case PERF_REG_CSKY_REGS2: - return "regs2"; - case PERF_REG_CSKY_REGS3: - return "regs3"; - case PERF_REG_CSKY_REGS4: - return "regs4"; - case PERF_REG_CSKY_REGS5: - return "regs5"; - case PERF_REG_CSKY_REGS6: - return "regs6"; - case PERF_REG_CSKY_REGS7: - return "regs7"; - case PERF_REG_CSKY_REGS8: - return "regs8"; - case PERF_REG_CSKY_REGS9: - return "regs9"; - case PERF_REG_CSKY_SP: - return "sp"; - case PERF_REG_CSKY_LR: - return "lr"; - case PERF_REG_CSKY_PC: - return "pc"; -#if defined(__CSKYABIV2__) - case PERF_REG_CSKY_EXREGS0: - return "exregs0"; - case PERF_REG_CSKY_EXREGS1: - return "exregs1"; - case PERF_REG_CSKY_EXREGS2: - return "exregs2"; - case PERF_REG_CSKY_EXREGS3: - return "exregs3"; - case PERF_REG_CSKY_EXREGS4: - return "exregs4"; - case PERF_REG_CSKY_EXREGS5: - return "exregs5"; - case PERF_REG_CSKY_EXREGS6: - return "exregs6"; - case PERF_REG_CSKY_EXREGS7: - return "exregs7"; - case PERF_REG_CSKY_EXREGS8: - return "exregs8"; - case PERF_REG_CSKY_EXREGS9: - return "exregs9"; - case PERF_REG_CSKY_EXREGS10: - return "exregs10"; - case PERF_REG_CSKY_EXREGS11: - return "exregs11"; - case PERF_REG_CSKY_EXREGS12: - return "exregs12"; - case PERF_REG_CSKY_EXREGS13: - return "exregs13"; - case PERF_REG_CSKY_EXREGS14: - return "exregs14"; - case PERF_REG_CSKY_TLS: - return "tls"; - case PERF_REG_CSKY_HI: - return "hi"; - case PERF_REG_CSKY_LO: - return "lo"; -#endif - default: - return NULL; - } - - return NULL; -} - #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/mips/include/perf_regs.h b/tools/perf/arch/mips/include/perf_regs.h index ee73b36a14d1..b8cd8bbb37ba 100644 --- a/tools/perf/arch/mips/include/perf_regs.h +++ b/tools/perf/arch/mips/include/perf_regs.h @@ -12,73 +12,4 @@ #define PERF_REGS_MASK ((1ULL << PERF_REG_MIPS_MAX) - 1) -static inline const char *__perf_reg_name(int id) -{ - switch (id) { - case PERF_REG_MIPS_PC: - return "PC"; - case PERF_REG_MIPS_R1: - return "$1"; - case PERF_REG_MIPS_R2: - return "$2"; - case PERF_REG_MIPS_R3: - return "$3"; - case PERF_REG_MIPS_R4: - return "$4"; - case PERF_REG_MIPS_R5: - return "$5"; - case PERF_REG_MIPS_R6: - return "$6"; - case PERF_REG_MIPS_R7: - return "$7"; - case PERF_REG_MIPS_R8: - return "$8"; - case PERF_REG_MIPS_R9: - return "$9"; - case PERF_REG_MIPS_R10: - return "$10"; - case PERF_REG_MIPS_R11: - return "$11"; - case PERF_REG_MIPS_R12: - return "$12"; - case PERF_REG_MIPS_R13: - return "$13"; - case PERF_REG_MIPS_R14: - return "$14"; - case PERF_REG_MIPS_R15: - return "$15"; - case PERF_REG_MIPS_R16: - return "$16"; - case PERF_REG_MIPS_R17: - return "$17"; - case PERF_REG_MIPS_R18: - return "$18"; - case PERF_REG_MIPS_R19: - return "$19"; - case PERF_REG_MIPS_R20: - return "$20"; - case PERF_REG_MIPS_R21: - return "$21"; - case PERF_REG_MIPS_R22: - return "$22"; - case PERF_REG_MIPS_R23: - return "$23"; - case PERF_REG_MIPS_R24: - return "$24"; - case PERF_REG_MIPS_R25: - return "$25"; - case PERF_REG_MIPS_R28: - return "$28"; - case PERF_REG_MIPS_R29: - return "$29"; - case PERF_REG_MIPS_R30: - return "$30"; - case PERF_REG_MIPS_R31: - return "$31"; - default: - break; - } - return NULL; -} - #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h index 93339d17acc4..9bb17c3f370b 100644 --- a/tools/perf/arch/powerpc/include/perf_regs.h +++ b/tools/perf/arch/powerpc/include/perf_regs.h @@ -19,70 +19,4 @@ void perf_regs_load(u64 *regs); #define PERF_REG_IP PERF_REG_POWERPC_NIP #define PERF_REG_SP PERF_REG_POWERPC_R1 -static const char *reg_names[] = { - [PERF_REG_POWERPC_R0] = "r0", - [PERF_REG_POWERPC_R1] = "r1", - [PERF_REG_POWERPC_R2] = "r2", - [PERF_REG_POWERPC_R3] = "r3", - [PERF_REG_POWERPC_R4] = "r4", - [PERF_REG_POWERPC_R5] = "r5", - [PERF_REG_POWERPC_R6] = "r6", - [PERF_REG_POWERPC_R7] = "r7", - [PERF_REG_POWERPC_R8] = "r8", - [PERF_REG_POWERPC_R9] = "r9", - [PERF_REG_POWERPC_R10] = "r10", - [PERF_REG_POWERPC_R11] = "r11", - [PERF_REG_POWERPC_R12] = "r12", - [PERF_REG_POWERPC_R13] = "r13", - [PERF_REG_POWERPC_R14] = "r14", - [PERF_REG_POWERPC_R15] = "r15", - [PERF_REG_POWERPC_R16] = "r16", - [PERF_REG_POWERPC_R17] = "r17", - [PERF_REG_POWERPC_R18] = "r18", - [PERF_REG_POWERPC_R19] = "r19", - [PERF_REG_POWERPC_R20] = "r20", - [PERF_REG_POWERPC_R21] = "r21", - [PERF_REG_POWERPC_R22] = "r22", - [PERF_REG_POWERPC_R23] = "r23", - [PERF_REG_POWERPC_R24] = "r24", - [PERF_REG_POWERPC_R25] = "r25", - [PERF_REG_POWERPC_R26] = "r26", - [PERF_REG_POWERPC_R27] = "r27", - [PERF_REG_POWERPC_R28] = "r28", - [PERF_REG_POWERPC_R29] = "r29", - [PERF_REG_POWERPC_R30] = "r30", - [PERF_REG_POWERPC_R31] = "r31", - [PERF_REG_POWERPC_NIP] = "nip", - [PERF_REG_POWERPC_MSR] = "msr", - [PERF_REG_POWERPC_ORIG_R3] = "orig_r3", - [PERF_REG_POWERPC_CTR] = "ctr", - [PERF_REG_POWERPC_LINK] = "link", - [PERF_REG_POWERPC_XER] = "xer", - [PERF_REG_POWERPC_CCR] = "ccr", - [PERF_REG_POWERPC_SOFTE] = "softe", - [PERF_REG_POWERPC_TRAP] = "trap", - [PERF_REG_POWERPC_DAR] = "dar", - [PERF_REG_POWERPC_DSISR] = "dsisr", - [PERF_REG_POWERPC_SIER] = "sier", - [PERF_REG_POWERPC_MMCRA] = "mmcra", - [PERF_REG_POWERPC_MMCR0] = "mmcr0", - [PERF_REG_POWERPC_MMCR1] = "mmcr1", - [PERF_REG_POWERPC_MMCR2] = "mmcr2", - [PERF_REG_POWERPC_MMCR3] = "mmcr3", - [PERF_REG_POWERPC_SIER2] = "sier2", - [PERF_REG_POWERPC_SIER3] = "sier3", - [PERF_REG_POWERPC_PMC1] = "pmc1", - [PERF_REG_POWERPC_PMC2] = "pmc2", - [PERF_REG_POWERPC_PMC3] = "pmc3", - [PERF_REG_POWERPC_PMC4] = "pmc4", - [PERF_REG_POWERPC_PMC5] = "pmc5", - [PERF_REG_POWERPC_PMC6] = "pmc6", - [PERF_REG_POWERPC_SDAR] = "sdar", - [PERF_REG_POWERPC_SIAR] = "siar", -}; - -static inline const char *__perf_reg_name(int id) -{ - return reg_names[id]; -} #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c index 3bf441257466..cf430a4c55b9 100644 --- a/tools/perf/arch/powerpc/util/event.c +++ b/tools/perf/arch/powerpc/util/event.c @@ -40,8 +40,12 @@ const char *arch_perf_header_entry(const char *se_header) { if (!strcmp(se_header, "Local INSTR Latency")) return "Finish Cyc"; - else if (!strcmp(se_header, "Pipeline Stage Cycle")) + else if (!strcmp(se_header, "INSTR Latency")) + return "Global Finish_cyc"; + else if (!strcmp(se_header, "Local Pipeline Stage Cycle")) return "Dispatch Cyc"; + else if (!strcmp(se_header, "Pipeline Stage Cycle")) + return "Global Dispatch_cyc"; return se_header; } @@ -49,5 +53,7 @@ int arch_support_sort_key(const char *sort_key) { if (!strcmp(sort_key, "p_stage_cyc")) return 1; + if (!strcmp(sort_key, "local_p_stage_cyc")) + return 1; return 0; } diff --git a/tools/perf/arch/riscv/include/perf_regs.h b/tools/perf/arch/riscv/include/perf_regs.h index 6b02a767c918..6944bf0de53e 100644 --- a/tools/perf/arch/riscv/include/perf_regs.h +++ b/tools/perf/arch/riscv/include/perf_regs.h @@ -19,78 +19,4 @@ #define PERF_REG_IP PERF_REG_RISCV_PC #define PERF_REG_SP PERF_REG_RISCV_SP -static inline const char *__perf_reg_name(int id) -{ - switch (id) { - case PERF_REG_RISCV_PC: - return "pc"; - case PERF_REG_RISCV_RA: - return "ra"; - case PERF_REG_RISCV_SP: - return "sp"; - case PERF_REG_RISCV_GP: - return "gp"; - case PERF_REG_RISCV_TP: - return "tp"; - case PERF_REG_RISCV_T0: - return "t0"; - case PERF_REG_RISCV_T1: - return "t1"; - case PERF_REG_RISCV_T2: - return "t2"; - case PERF_REG_RISCV_S0: - return "s0"; - case PERF_REG_RISCV_S1: - return "s1"; - case PERF_REG_RISCV_A0: - return "a0"; - case PERF_REG_RISCV_A1: - return "a1"; - case PERF_REG_RISCV_A2: - return "a2"; - case PERF_REG_RISCV_A3: - return "a3"; - case PERF_REG_RISCV_A4: - return "a4"; - case PERF_REG_RISCV_A5: - return "a5"; - case PERF_REG_RISCV_A6: - return "a6"; - case PERF_REG_RISCV_A7: - return "a7"; - case PERF_REG_RISCV_S2: - return "s2"; - case PERF_REG_RISCV_S3: - return "s3"; - case PERF_REG_RISCV_S4: - return "s4"; - case PERF_REG_RISCV_S5: - return "s5"; - case PERF_REG_RISCV_S6: - return "s6"; - case PERF_REG_RISCV_S7: - return "s7"; - case PERF_REG_RISCV_S8: - return "s8"; - case PERF_REG_RISCV_S9: - return "s9"; - case PERF_REG_RISCV_S10: - return "s10"; - case PERF_REG_RISCV_S11: - return "s11"; - case PERF_REG_RISCV_T3: - return "t3"; - case PERF_REG_RISCV_T4: - return "t4"; - case PERF_REG_RISCV_T5: - return "t5"; - case PERF_REG_RISCV_T6: - return "t6"; - default: - return NULL; - } - - return NULL; -} - #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/s390/include/perf_regs.h b/tools/perf/arch/s390/include/perf_regs.h index ce3031526623..52fcc0891da6 100644 --- a/tools/perf/arch/s390/include/perf_regs.h +++ b/tools/perf/arch/s390/include/perf_regs.h @@ -14,82 +14,4 @@ void perf_regs_load(u64 *regs); #define PERF_REG_IP PERF_REG_S390_PC #define PERF_REG_SP PERF_REG_S390_R15 -static inline const char *__perf_reg_name(int id) -{ - switch (id) { - case PERF_REG_S390_R0: - return "R0"; - case PERF_REG_S390_R1: - return "R1"; - case PERF_REG_S390_R2: - return "R2"; - case PERF_REG_S390_R3: - return "R3"; - case PERF_REG_S390_R4: - return "R4"; - case PERF_REG_S390_R5: - return "R5"; - case PERF_REG_S390_R6: - return "R6"; - case PERF_REG_S390_R7: - return "R7"; - case PERF_REG_S390_R8: - return "R8"; - case PERF_REG_S390_R9: - return "R9"; - case PERF_REG_S390_R10: - return "R10"; - case PERF_REG_S390_R11: - return "R11"; - case PERF_REG_S390_R12: - return "R12"; - case PERF_REG_S390_R13: - return "R13"; - case PERF_REG_S390_R14: - return "R14"; - case PERF_REG_S390_R15: - return "R15"; - case PERF_REG_S390_FP0: - return "FP0"; - case PERF_REG_S390_FP1: - return "FP1"; - case PERF_REG_S390_FP2: - return "FP2"; - case PERF_REG_S390_FP3: - return "FP3"; - case PERF_REG_S390_FP4: - return "FP4"; - case PERF_REG_S390_FP5: - return "FP5"; - case PERF_REG_S390_FP6: - return "FP6"; - case PERF_REG_S390_FP7: - return "FP7"; - case PERF_REG_S390_FP8: - return "FP8"; - case PERF_REG_S390_FP9: - return "FP9"; - case PERF_REG_S390_FP10: - return "FP10"; - case PERF_REG_S390_FP11: - return "FP11"; - case PERF_REG_S390_FP12: - return "FP12"; - case PERF_REG_S390_FP13: - return "FP13"; - case PERF_REG_S390_FP14: - return "FP14"; - case PERF_REG_S390_FP15: - return "FP15"; - case PERF_REG_S390_MASK: - return "MASK"; - case PERF_REG_S390_PC: - return "PC"; - default: - return NULL; - } - - return NULL; -} - #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/include/perf_regs.h index cddc4cdc0d9b..16e23b722042 100644 --- a/tools/perf/arch/x86/include/perf_regs.h +++ b/tools/perf/arch/x86/include/perf_regs.h @@ -23,86 +23,4 @@ void perf_regs_load(u64 *regs); #define PERF_REG_IP PERF_REG_X86_IP #define PERF_REG_SP PERF_REG_X86_SP -static inline const char *__perf_reg_name(int id) -{ - switch (id) { - case PERF_REG_X86_AX: - return "AX"; - case PERF_REG_X86_BX: - return "BX"; - case PERF_REG_X86_CX: - return "CX"; - case PERF_REG_X86_DX: - return "DX"; - case PERF_REG_X86_SI: - return "SI"; - case PERF_REG_X86_DI: - return "DI"; - case PERF_REG_X86_BP: - return "BP"; - case PERF_REG_X86_SP: - return "SP"; - case PERF_REG_X86_IP: - return "IP"; - case PERF_REG_X86_FLAGS: - return "FLAGS"; - case PERF_REG_X86_CS: - return "CS"; - case PERF_REG_X86_SS: - return "SS"; - case PERF_REG_X86_DS: - return "DS"; - case PERF_REG_X86_ES: - return "ES"; - case PERF_REG_X86_FS: - return "FS"; - case PERF_REG_X86_GS: - return "GS"; -#ifdef HAVE_ARCH_X86_64_SUPPORT - case PERF_REG_X86_R8: - return "R8"; - case PERF_REG_X86_R9: - return "R9"; - case PERF_REG_X86_R10: - return "R10"; - case PERF_REG_X86_R11: - return "R11"; - case PERF_REG_X86_R12: - return "R12"; - case PERF_REG_X86_R13: - return "R13"; - case PERF_REG_X86_R14: - return "R14"; - case PERF_REG_X86_R15: - return "R15"; -#endif /* HAVE_ARCH_X86_64_SUPPORT */ - -#define XMM(x) \ - case PERF_REG_X86_XMM ## x: \ - case PERF_REG_X86_XMM ## x + 1: \ - return "XMM" #x; - XMM(0) - XMM(1) - XMM(2) - XMM(3) - XMM(4) - XMM(5) - XMM(6) - XMM(7) - XMM(8) - XMM(9) - XMM(10) - XMM(11) - XMM(12) - XMM(13) - XMM(14) - XMM(15) -#undef XMM - default: - return NULL; - } - - return NULL; -} - #endif /* ARCH_PERF_REGS_H */ diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c index 0b0951030a2f..f924246eff78 100644 --- a/tools/perf/arch/x86/util/evlist.c +++ b/tools/perf/arch/x86/util/evlist.c @@ -17,3 +17,20 @@ int arch_evlist__add_default_attrs(struct evlist *evlist) else return parse_events(evlist, TOPDOWN_L1_EVENTS, NULL); } + +struct evsel *arch_evlist__leader(struct list_head *list) +{ + struct evsel *evsel, *first; + + first = list_first_entry(list, struct evsel, core.node); + + if (!pmu_have_event("cpu", "slots")) + return first; + + __evlist__for_each_entry(list, evsel) { + if (evsel->pmu_name && !strcmp(evsel->pmu_name, "cpu") && + evsel->name && strstr(evsel->name, "slots")) + return evsel; + } + return first; +} diff --git a/tools/perf/bench/epoll-ctl.c b/tools/perf/bench/epoll-ctl.c index ddaca75c3bc0..1a17ec83d3c4 100644 --- a/tools/perf/bench/epoll-ctl.c +++ b/tools/perf/bench/epoll-ctl.c @@ -253,7 +253,7 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu) if (!noaffinity) { CPU_ZERO(&cpuset); - CPU_SET(cpu->map[i % cpu->nr], &cpuset); + CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset); ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset); if (ret) diff --git a/tools/perf/bench/epoll-wait.c b/tools/perf/bench/epoll-wait.c index 79d13dbc0a47..0d1dd8879197 100644 --- a/tools/perf/bench/epoll-wait.c +++ b/tools/perf/bench/epoll-wait.c @@ -342,7 +342,7 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu) if (!noaffinity) { CPU_ZERO(&cpuset); - CPU_SET(cpu->map[i % cpu->nr], &cpuset); + CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset); ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset); if (ret) diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index fcdea3e44937..9627b6ab8670 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -177,7 +177,7 @@ int bench_futex_hash(int argc, const char **argv) goto errmem; CPU_ZERO(&cpuset); - CPU_SET(cpu->map[i % cpu->nr], &cpuset); + CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset); ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset); if (ret) diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c index 137890f78e17..a512a320df74 100644 --- a/tools/perf/bench/futex-lock-pi.c +++ b/tools/perf/bench/futex-lock-pi.c @@ -136,7 +136,7 @@ static void create_threads(struct worker *w, pthread_attr_t thread_attr, worker[i].futex = &global_futex; CPU_ZERO(&cpuset); - CPU_SET(cpu->map[i % cpu->nr], &cpuset); + CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c index f7a5ffebb940..aca47ce8b1e7 100644 --- a/tools/perf/bench/futex-requeue.c +++ b/tools/perf/bench/futex-requeue.c @@ -131,7 +131,7 @@ static void block_threads(pthread_t *w, /* create and block all threads */ for (i = 0; i < params.nthreads; i++) { CPU_ZERO(&cpuset); - CPU_SET(cpu->map[i % cpu->nr], &cpuset); + CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c index 0983f40b4b40..888ee6037945 100644 --- a/tools/perf/bench/futex-wake-parallel.c +++ b/tools/perf/bench/futex-wake-parallel.c @@ -152,7 +152,7 @@ static void block_threads(pthread_t *w, pthread_attr_t thread_attr, /* create and block all threads */ for (i = 0; i < params.nthreads; i++) { CPU_ZERO(&cpuset); - CPU_SET(cpu->map[i % cpu->nr], &cpuset); + CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c index 2226a475e782..aa82db51c0ab 100644 --- a/tools/perf/bench/futex-wake.c +++ b/tools/perf/bench/futex-wake.c @@ -105,7 +105,7 @@ static void block_threads(pthread_t *w, /* create and block all threads */ for (i = 0; i < params.nthreads; i++) { CPU_ZERO(&cpuset); - CPU_SET(cpu->map[i % cpu->nr], &cpuset); + CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c index d0895162c2ba..d291f3a8af5f 100644 --- a/tools/perf/builtin-bench.c +++ b/tools/perf/builtin-bench.c @@ -226,7 +226,6 @@ static void run_collection(struct collection *coll) if (!bench->fn) break; printf("# Running %s/%s benchmark...\n", coll->name, bench->name); - fflush(stdout); argv[1] = bench->name; run_bench(coll->name, bench->name, bench->fn, 1, argv); @@ -247,6 +246,9 @@ int cmd_bench(int argc, const char **argv) struct collection *coll; int ret = 0; + /* Unbuffered output */ + setvbuf(stdout, NULL, _IONBF, 0); + if (argc < 2) { /* No collection specified. */ print_usage(); @@ -300,7 +302,6 @@ int cmd_bench(int argc, const char **argv) if (bench_format == BENCH_FORMAT_DEFAULT) printf("# Running '%s/%s' benchmark:\n", coll->name, bench->name); - fflush(stdout); ret = run_bench(coll->name, bench->name, bench->fn, argc-1, argv+1); goto end; } diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c index 0db3cfc04c47..cd381693658b 100644 --- a/tools/perf/builtin-buildid-cache.c +++ b/tools/perf/builtin-buildid-cache.c @@ -351,10 +351,14 @@ static int build_id_cache__show_all(void) static int perf_buildid_cache_config(const char *var, const char *value, void *cb) { - const char **debuginfod = cb; + struct perf_debuginfod *di = cb; - if (!strcmp(var, "buildid-cache.debuginfod")) - *debuginfod = strdup(value); + if (!strcmp(var, "buildid-cache.debuginfod")) { + di->urls = strdup(value); + if (!di->urls) + return -ENOMEM; + di->set = true; + } return 0; } @@ -373,8 +377,8 @@ int cmd_buildid_cache(int argc, const char **argv) *purge_name_list_str = NULL, *missing_filename = NULL, *update_name_list_str = NULL, - *kcore_filename = NULL, - *debuginfod = NULL; + *kcore_filename = NULL; + struct perf_debuginfod debuginfod = { }; char sbuf[STRERR_BUFSIZE]; struct perf_data data = { @@ -399,8 +403,10 @@ int cmd_buildid_cache(int argc, const char **argv) OPT_BOOLEAN('f', "force", &force, "don't complain, do it"), OPT_STRING('u', "update", &update_name_list_str, "file list", "file(s) to update"), - OPT_STRING(0, "debuginfod", &debuginfod, "debuginfod url", - "set debuginfod url"), + OPT_STRING_OPTARG_SET(0, "debuginfod", &debuginfod.urls, + &debuginfod.set, "debuginfod urls", + "Enable debuginfod data retrieval from DEBUGINFOD_URLS or specified urls", + "system"), OPT_INCR('v', "verbose", &verbose, "be more verbose"), OPT_INTEGER(0, "target-ns", &ns_id, "target pid for namespace context"), OPT_END() @@ -425,10 +431,7 @@ int cmd_buildid_cache(int argc, const char **argv) if (argc || !(list_files || opts_flag)) usage_with_options(buildid_cache_usage, buildid_cache_options); - if (debuginfod) { - pr_debug("DEBUGINFOD_URLS=%s\n", debuginfod); - setenv("DEBUGINFOD_URLS", debuginfod, 1); - } + perf_debuginfod_setup(&debuginfod); /* -l is exclusive. It can not be used with other options. */ if (list_files && opts_flag) { diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c index b5c67ef73862..77dd4afacca4 100644 --- a/tools/perf/builtin-c2c.c +++ b/tools/perf/builtin-c2c.c @@ -2015,7 +2015,8 @@ static int setup_nodes(struct perf_session *session) { struct numa_node *n; unsigned long **nodes; - int node, cpu; + int node, idx; + struct perf_cpu cpu; int *cpu2node; if (c2c.node_info > 2) @@ -2038,8 +2039,8 @@ static int setup_nodes(struct perf_session *session) if (!cpu2node) return -ENOMEM; - for (cpu = 0; cpu < c2c.cpus_cnt; cpu++) - cpu2node[cpu] = -1; + for (idx = 0; idx < c2c.cpus_cnt; idx++) + cpu2node[idx] = -1; c2c.cpu2node = cpu2node; @@ -2057,13 +2058,13 @@ static int setup_nodes(struct perf_session *session) if (perf_cpu_map__empty(map)) continue; - for (cpu = 0; cpu < map->nr; cpu++) { - set_bit(map->map[cpu], set); + perf_cpu_map__for_each_cpu(cpu, idx, map) { + set_bit(cpu.cpu, set); - if (WARN_ONCE(cpu2node[map->map[cpu]] != -1, "node/cpu topology bug")) + if (WARN_ONCE(cpu2node[cpu.cpu] != -1, "node/cpu topology bug")) return -EINVAL; - cpu2node[map->map[cpu]] = node; + cpu2node[cpu.cpu] = node; } } diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c index 87cb11a7a3ee..71452599f87d 100644 --- a/tools/perf/builtin-ftrace.c +++ b/tools/perf/builtin-ftrace.c @@ -13,7 +13,9 @@ #include <signal.h> #include <stdlib.h> #include <fcntl.h> +#include <math.h> #include <poll.h> +#include <ctype.h> #include <linux/capability.h> #include <linux/string.h> @@ -28,36 +30,12 @@ #include "strfilter.h" #include "util/cap.h" #include "util/config.h" +#include "util/ftrace.h" #include "util/units.h" #include "util/parse-sublevel-options.h" #define DEFAULT_TRACER "function_graph" -struct perf_ftrace { - struct evlist *evlist; - struct target target; - const char *tracer; - struct list_head filters; - struct list_head notrace; - struct list_head graph_funcs; - struct list_head nograph_funcs; - int graph_depth; - unsigned long percpu_buffer_size; - bool inherit; - int func_stack_trace; - int func_irq_info; - int graph_nosleep_time; - int graph_noirqs; - int graph_verbose; - int graph_thresh; - unsigned int initial_delay; -}; - -struct filter_entry { - struct list_head list; - char name[]; -}; - static volatile int workload_exec_errno; static bool done; @@ -303,7 +281,7 @@ static int set_tracing_cpumask(struct perf_cpu_map *cpumap) int ret; int last_cpu; - last_cpu = cpu_map__cpu(cpumap, cpumap->nr - 1); + last_cpu = perf_cpu_map__cpu(cpumap, cpumap->nr - 1).cpu; mask_size = last_cpu / 4 + 2; /* one more byte for EOS */ mask_size += last_cpu / 32; /* ',' is needed for every 32th cpus */ @@ -565,7 +543,24 @@ static int set_tracing_options(struct perf_ftrace *ftrace) return 0; } -static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv) +static void select_tracer(struct perf_ftrace *ftrace) +{ + bool graph = !list_empty(&ftrace->graph_funcs) || + !list_empty(&ftrace->nograph_funcs); + bool func = !list_empty(&ftrace->filters) || + !list_empty(&ftrace->notrace); + + /* The function_graph has priority over function tracer. */ + if (graph) + ftrace->tracer = "function_graph"; + else if (func) + ftrace->tracer = "function"; + /* Otherwise, the default tracer is used. */ + + pr_debug("%s tracer is used\n", ftrace->tracer); +} + +static int __cmd_ftrace(struct perf_ftrace *ftrace) { char *trace_file; int trace_fd; @@ -586,10 +581,7 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv) return -1; } - signal(SIGINT, sig_handler); - signal(SIGUSR1, sig_handler); - signal(SIGCHLD, sig_handler); - signal(SIGPIPE, sig_handler); + select_tracer(ftrace); if (reset_tracing_files(ftrace) < 0) { pr_err("failed to reset ftrace\n"); @@ -600,11 +592,6 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv) if (write_tracing_file("trace", "0") < 0) goto out; - if (argc && evlist__prepare_workload(ftrace->evlist, &ftrace->target, argv, false, - ftrace__workload_exec_failed_signal) < 0) { - goto out; - } - if (set_tracing_options(ftrace) < 0) goto out_reset; @@ -693,6 +680,270 @@ out: return (done && !workload_exec_errno) ? 0 : -1; } +static void make_histogram(int buckets[], char *buf, size_t len, char *linebuf) +{ + char *p, *q; + char *unit; + double num; + int i; + + /* ensure NUL termination */ + buf[len] = '\0'; + + /* handle data line by line */ + for (p = buf; (q = strchr(p, '\n')) != NULL; p = q + 1) { + *q = '\0'; + /* move it to the line buffer */ + strcat(linebuf, p); + + /* + * parse trace output to get function duration like in + * + * # tracer: function_graph + * # + * # CPU DURATION FUNCTION CALLS + * # | | | | | | | + * 1) + 10.291 us | do_filp_open(); + * 1) 4.889 us | do_filp_open(); + * 1) 6.086 us | do_filp_open(); + * + */ + if (linebuf[0] == '#') + goto next; + + /* ignore CPU */ + p = strchr(linebuf, ')'); + if (p == NULL) + p = linebuf; + + while (*p && !isdigit(*p) && (*p != '|')) + p++; + + /* no duration */ + if (*p == '\0' || *p == '|') + goto next; + + num = strtod(p, &unit); + if (!unit || strncmp(unit, " us", 3)) + goto next; + + i = log2(num); + if (i < 0) + i = 0; + if (i >= NUM_BUCKET) + i = NUM_BUCKET - 1; + + buckets[i]++; + +next: + /* empty the line buffer for the next output */ + linebuf[0] = '\0'; + } + + /* preserve any remaining output (before newline) */ + strcat(linebuf, p); +} + +static void display_histogram(int buckets[]) +{ + int i; + int total = 0; + int bar_total = 46; /* to fit in 80 column */ + char bar[] = "###############################################"; + int bar_len; + + for (i = 0; i < NUM_BUCKET; i++) + total += buckets[i]; + + if (total == 0) { + printf("No data found\n"); + return; + } + + printf("# %14s | %10s | %-*s |\n", + " DURATION ", "COUNT", bar_total, "GRAPH"); + + bar_len = buckets[0] * bar_total / total; + printf(" %4d - %-4d %s | %10d | %.*s%*s |\n", + 0, 1, "us", buckets[0], bar_len, bar, bar_total - bar_len, ""); + + for (i = 1; i < NUM_BUCKET - 1; i++) { + int start = (1 << (i - 1)); + int stop = 1 << i; + const char *unit = "us"; + + if (start >= 1024) { + start >>= 10; + stop >>= 10; + unit = "ms"; + } + bar_len = buckets[i] * bar_total / total; + printf(" %4d - %-4d %s | %10d | %.*s%*s |\n", + start, stop, unit, buckets[i], bar_len, bar, + bar_total - bar_len, ""); + } + + bar_len = buckets[NUM_BUCKET - 1] * bar_total / total; + printf(" %4d - %-4s %s | %10d | %.*s%*s |\n", + 1, "...", " s", buckets[NUM_BUCKET - 1], bar_len, bar, + bar_total - bar_len, ""); + +} + +static int prepare_func_latency(struct perf_ftrace *ftrace) +{ + char *trace_file; + int fd; + + if (ftrace->target.use_bpf) + return perf_ftrace__latency_prepare_bpf(ftrace); + + if (reset_tracing_files(ftrace) < 0) { + pr_err("failed to reset ftrace\n"); + return -1; + } + + /* reset ftrace buffer */ + if (write_tracing_file("trace", "0") < 0) + return -1; + + if (set_tracing_options(ftrace) < 0) + return -1; + + /* force to use the function_graph tracer to track duration */ + if (write_tracing_file("current_tracer", "function_graph") < 0) { + pr_err("failed to set current_tracer to function_graph\n"); + return -1; + } + + trace_file = get_tracing_file("trace_pipe"); + if (!trace_file) { + pr_err("failed to open trace_pipe\n"); + return -1; + } + + fd = open(trace_file, O_RDONLY); + if (fd < 0) + pr_err("failed to open trace_pipe\n"); + + put_tracing_file(trace_file); + return fd; +} + +static int start_func_latency(struct perf_ftrace *ftrace) +{ + if (ftrace->target.use_bpf) + return perf_ftrace__latency_start_bpf(ftrace); + + if (write_tracing_file("tracing_on", "1") < 0) { + pr_err("can't enable tracing\n"); + return -1; + } + + return 0; +} + +static int stop_func_latency(struct perf_ftrace *ftrace) +{ + if (ftrace->target.use_bpf) + return perf_ftrace__latency_stop_bpf(ftrace); + + write_tracing_file("tracing_on", "0"); + return 0; +} + +static int read_func_latency(struct perf_ftrace *ftrace, int buckets[]) +{ + if (ftrace->target.use_bpf) + return perf_ftrace__latency_read_bpf(ftrace, buckets); + + return 0; +} + +static int cleanup_func_latency(struct perf_ftrace *ftrace) +{ + if (ftrace->target.use_bpf) + return perf_ftrace__latency_cleanup_bpf(ftrace); + + reset_tracing_files(ftrace); + return 0; +} + +static int __cmd_latency(struct perf_ftrace *ftrace) +{ + int trace_fd; + char buf[4096]; + char line[256]; + struct pollfd pollfd = { + .events = POLLIN, + }; + int buckets[NUM_BUCKET] = { }; + + if (!(perf_cap__capable(CAP_PERFMON) || + perf_cap__capable(CAP_SYS_ADMIN))) { + pr_err("ftrace only works for %s!\n", +#ifdef HAVE_LIBCAP_SUPPORT + "users with the CAP_PERFMON or CAP_SYS_ADMIN capability" +#else + "root" +#endif + ); + return -1; + } + + trace_fd = prepare_func_latency(ftrace); + if (trace_fd < 0) + goto out; + + fcntl(trace_fd, F_SETFL, O_NONBLOCK); + pollfd.fd = trace_fd; + + if (start_func_latency(ftrace) < 0) + goto out; + + evlist__start_workload(ftrace->evlist); + + line[0] = '\0'; + while (!done) { + if (poll(&pollfd, 1, -1) < 0) + break; + + if (pollfd.revents & POLLIN) { + int n = read(trace_fd, buf, sizeof(buf) - 1); + if (n < 0) + break; + + make_histogram(buckets, buf, n, line); + } + } + + stop_func_latency(ftrace); + + if (workload_exec_errno) { + const char *emsg = str_error_r(workload_exec_errno, buf, sizeof(buf)); + pr_err("workload failed: %s\n", emsg); + goto out; + } + + /* read remaining buffer contents */ + while (!ftrace->target.use_bpf) { + int n = read(trace_fd, buf, sizeof(buf) - 1); + if (n <= 0) + break; + make_histogram(buckets, buf, n, line); + } + + read_func_latency(ftrace, buckets); + + display_histogram(buckets); + +out: + close(trace_fd); + cleanup_func_latency(ftrace); + + return (done && !workload_exec_errno) ? 0 : -1; +} + static int perf_ftrace_config(const char *var, const char *value, void *cb) { struct perf_ftrace *ftrace = cb; @@ -855,22 +1106,11 @@ static int parse_graph_tracer_opts(const struct option *opt, return 0; } -static void select_tracer(struct perf_ftrace *ftrace) -{ - bool graph = !list_empty(&ftrace->graph_funcs) || - !list_empty(&ftrace->nograph_funcs); - bool func = !list_empty(&ftrace->filters) || - !list_empty(&ftrace->notrace); - - /* The function_graph has priority over function tracer. */ - if (graph) - ftrace->tracer = "function_graph"; - else if (func) - ftrace->tracer = "function"; - /* Otherwise, the default tracer is used. */ - - pr_debug("%s tracer is used\n", ftrace->tracer); -} +enum perf_ftrace_subcommand { + PERF_FTRACE_NONE, + PERF_FTRACE_TRACE, + PERF_FTRACE_LATENCY, +}; int cmd_ftrace(int argc, const char **argv) { @@ -879,17 +1119,7 @@ int cmd_ftrace(int argc, const char **argv) .tracer = DEFAULT_TRACER, .target = { .uid = UINT_MAX, }, }; - const char * const ftrace_usage[] = { - "perf ftrace [<options>] [<command>]", - "perf ftrace [<options>] -- <command> [<options>]", - NULL - }; - const struct option ftrace_options[] = { - OPT_STRING('t', "tracer", &ftrace.tracer, "tracer", - "Tracer to use: function_graph(default) or function"), - OPT_CALLBACK_DEFAULT('F', "funcs", NULL, "[FILTER]", - "Show available functions to filter", - opt_list_avail_functions, "*"), + const struct option common_options[] = { OPT_STRING('p', "pid", &ftrace.target.pid, "pid", "Trace on existing process id"), /* TODO: Add short option -t after -t/--tracer can be removed. */ @@ -901,6 +1131,14 @@ int cmd_ftrace(int argc, const char **argv) "System-wide collection from all CPUs"), OPT_STRING('C', "cpu", &ftrace.target.cpu_list, "cpu", "List of cpus to monitor"), + OPT_END() + }; + const struct option ftrace_options[] = { + OPT_STRING('t', "tracer", &ftrace.tracer, "tracer", + "Tracer to use: function_graph(default) or function"), + OPT_CALLBACK_DEFAULT('F', "funcs", NULL, "[FILTER]", + "Show available functions to filter", + opt_list_avail_functions, "*"), OPT_CALLBACK('T', "trace-funcs", &ftrace.filters, "func", "Trace given functions using function tracer", parse_filter_func), @@ -923,24 +1161,65 @@ int cmd_ftrace(int argc, const char **argv) "Trace children processes"), OPT_UINTEGER('D', "delay", &ftrace.initial_delay, "Number of milliseconds to wait before starting tracing after program start"), - OPT_END() + OPT_PARENT(common_options), + }; + const struct option latency_options[] = { + OPT_CALLBACK('T', "trace-funcs", &ftrace.filters, "func", + "Show latency of given function", parse_filter_func), +#ifdef HAVE_BPF_SKEL + OPT_BOOLEAN('b', "use-bpf", &ftrace.target.use_bpf, + "Use BPF to measure function latency"), +#endif + OPT_PARENT(common_options), + }; + const struct option *options = ftrace_options; + + const char * const ftrace_usage[] = { + "perf ftrace [<options>] [<command>]", + "perf ftrace [<options>] -- [<command>] [<options>]", + "perf ftrace {trace|latency} [<options>] [<command>]", + "perf ftrace {trace|latency} [<options>] -- [<command>] [<options>]", + NULL }; + enum perf_ftrace_subcommand subcmd = PERF_FTRACE_NONE; INIT_LIST_HEAD(&ftrace.filters); INIT_LIST_HEAD(&ftrace.notrace); INIT_LIST_HEAD(&ftrace.graph_funcs); INIT_LIST_HEAD(&ftrace.nograph_funcs); + signal(SIGINT, sig_handler); + signal(SIGUSR1, sig_handler); + signal(SIGCHLD, sig_handler); + signal(SIGPIPE, sig_handler); + ret = perf_config(perf_ftrace_config, &ftrace); if (ret < 0) return -1; - argc = parse_options(argc, argv, ftrace_options, ftrace_usage, - PARSE_OPT_STOP_AT_NON_OPTION); - if (!argc && target__none(&ftrace.target)) - ftrace.target.system_wide = true; + if (argc > 1) { + if (!strcmp(argv[1], "trace")) { + subcmd = PERF_FTRACE_TRACE; + } else if (!strcmp(argv[1], "latency")) { + subcmd = PERF_FTRACE_LATENCY; + options = latency_options; + } + + if (subcmd != PERF_FTRACE_NONE) { + argc--; + argv++; + } + } + /* for backward compatibility */ + if (subcmd == PERF_FTRACE_NONE) + subcmd = PERF_FTRACE_TRACE; - select_tracer(&ftrace); + argc = parse_options(argc, argv, options, ftrace_usage, + PARSE_OPT_STOP_AT_NON_OPTION); + if (argc < 0) { + ret = -EINVAL; + goto out_delete_filters; + } ret = target__validate(&ftrace.target); if (ret) { @@ -961,7 +1240,35 @@ int cmd_ftrace(int argc, const char **argv) if (ret < 0) goto out_delete_evlist; - ret = __cmd_ftrace(&ftrace, argc, argv); + if (argc) { + ret = evlist__prepare_workload(ftrace.evlist, &ftrace.target, + argv, false, + ftrace__workload_exec_failed_signal); + if (ret < 0) + goto out_delete_evlist; + } + + switch (subcmd) { + case PERF_FTRACE_TRACE: + if (!argc && target__none(&ftrace.target)) + ftrace.target.system_wide = true; + ret = __cmd_ftrace(&ftrace); + break; + case PERF_FTRACE_LATENCY: + if (list_empty(&ftrace.filters)) { + pr_err("Should provide a function to measure\n"); + parse_options_usage(ftrace_usage, options, "T", 1); + ret = -EINVAL; + goto out_delete_evlist; + } + ret = __cmd_latency(&ftrace); + break; + case PERF_FTRACE_NONE: + default: + pr_err("Invalid subcommand\n"); + ret = -EINVAL; + break; + } out_delete_evlist: evlist__delete(ftrace.evlist); diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index da03a341c63c..99d7ff9a8eff 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -192,7 +192,7 @@ static int evsel__process_alloc_node_event(struct evsel *evsel, struct perf_samp int ret = evsel__process_alloc_event(evsel, sample); if (!ret) { - int node1 = cpu__get_node(sample->cpu), + int node1 = cpu__get_node((struct perf_cpu){.cpu = sample->cpu}), node2 = evsel__intval(evsel, sample, "node"); if (node1 != node2) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 0338b813585a..bb716c953d02 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -111,6 +111,7 @@ struct record { unsigned long long samples; struct mmap_cpu_mask affinity_mask; unsigned long output_max_size; /* = 0: unlimited */ + struct perf_debuginfod debuginfod; }; static volatile int done; @@ -2177,6 +2178,12 @@ static int perf_record_config(const char *var, const char *value, void *cb) rec->opts.nr_cblocks = nr_cblocks_default; } #endif + if (!strcmp(var, "record.debuginfod")) { + rec->debuginfod.urls = strdup(value); + if (!rec->debuginfod.urls) + return -ENOMEM; + rec->debuginfod.set = true; + } return 0; } @@ -2267,6 +2274,10 @@ out_free: return ret; } +void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused) +{ +} + static int parse_control_option(const struct option *opt, const char *str, int unset __maybe_unused) @@ -2663,6 +2674,10 @@ static struct option __record_options[] = { parse_control_option), OPT_CALLBACK(0, "synth", &record.opts, "no|all|task|mmap|cgroup", "Fine-tune event synthesis: default=all", parse_record_synth_option), + OPT_STRING_OPTARG_SET(0, "debuginfod", &record.debuginfod.urls, + &record.debuginfod.set, "debuginfod urls", + "Enable debuginfod data retrieval from DEBUGINFOD_URLS or specified urls", + "system"), OPT_END() }; @@ -2716,6 +2731,8 @@ int cmd_record(int argc, const char **argv) if (err) return err; + perf_debuginfod_setup(&record.debuginfod); + /* Make system wide (-a) the default target. */ if (!argc && target__none(&rec->opts.target)) rec->opts.target.system_wide = true; @@ -2792,7 +2809,7 @@ int cmd_record(int argc, const char **argv) symbol__init(NULL); if (rec->opts.affinity != PERF_AFFINITY_SYS) { - rec->affinity_mask.nbits = cpu__max_cpu(); + rec->affinity_mask.nbits = cpu__max_cpu().cpu; rec->affinity_mask.bits = bitmap_zalloc(rec->affinity_mask.nbits); if (!rec->affinity_mask.bits) { pr_err("Failed to allocate thread mask for %zd cpus\n", rec->affinity_mask.nbits); @@ -2898,6 +2915,10 @@ int cmd_record(int argc, const char **argv) } rec->opts.target.hybrid = perf_pmu__has_hybrid(); + + if (callchain_param.enabled && callchain_param.record_mode == CALLCHAIN_FP) + arch__add_leaf_frame_record_opts(&rec->opts); + err = -ENOMEM; if (evlist__create_maps(rec->evlist, &rec->opts.target) < 0) usage_with_options(record_usage, record_options); diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 8ae400429870..1dd92d8c9279 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -410,7 +410,7 @@ static int report__setup_sample_type(struct report *rep) } } - callchain_param_setup(sample_type); + callchain_param_setup(sample_type, perf_env__arch(&rep->session->header.env)); if (rep->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) { ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n" @@ -1127,7 +1127,7 @@ static int process_attr(struct perf_tool *tool __maybe_unused, * on events sample_type. */ sample_type = evlist__combined_sample_type(*pevlist); - callchain_param_setup(sample_type); + callchain_param_setup(sample_type, perf_env__arch((*pevlist)->env)); return 0; } diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c index 4527f632ebe4..72d446de9c60 100644 --- a/tools/perf/builtin-sched.c +++ b/tools/perf/builtin-sched.c @@ -167,7 +167,7 @@ struct trace_sched_handler { struct perf_sched_map { DECLARE_BITMAP(comp_cpus_mask, MAX_CPUS); - int *comp_cpus; + struct perf_cpu *comp_cpus; bool comp; struct perf_thread_map *color_pids; const char *color_pids_str; @@ -191,7 +191,7 @@ struct perf_sched { * Track the current task - that way we can know whether there's any * weird events, such as a task being switched away that is not current. */ - int max_cpu; + struct perf_cpu max_cpu; u32 curr_pid[MAX_CPUS]; struct thread *curr_thread[MAX_CPUS]; char next_shortname1; @@ -1535,28 +1535,31 @@ static int map_switch_event(struct perf_sched *sched, struct evsel *evsel, int new_shortname; u64 timestamp0, timestamp = sample->time; s64 delta; - int i, this_cpu = sample->cpu; + int i; + struct perf_cpu this_cpu = { + .cpu = sample->cpu, + }; int cpus_nr; bool new_cpu = false; const char *color = PERF_COLOR_NORMAL; char stimestamp[32]; - BUG_ON(this_cpu >= MAX_CPUS || this_cpu < 0); + BUG_ON(this_cpu.cpu >= MAX_CPUS || this_cpu.cpu < 0); - if (this_cpu > sched->max_cpu) + if (this_cpu.cpu > sched->max_cpu.cpu) sched->max_cpu = this_cpu; if (sched->map.comp) { cpus_nr = bitmap_weight(sched->map.comp_cpus_mask, MAX_CPUS); - if (!test_and_set_bit(this_cpu, sched->map.comp_cpus_mask)) { + if (!test_and_set_bit(this_cpu.cpu, sched->map.comp_cpus_mask)) { sched->map.comp_cpus[cpus_nr++] = this_cpu; new_cpu = true; } } else - cpus_nr = sched->max_cpu; + cpus_nr = sched->max_cpu.cpu; - timestamp0 = sched->cpu_last_switched[this_cpu]; - sched->cpu_last_switched[this_cpu] = timestamp; + timestamp0 = sched->cpu_last_switched[this_cpu.cpu]; + sched->cpu_last_switched[this_cpu.cpu] = timestamp; if (timestamp0) delta = timestamp - timestamp0; else @@ -1577,7 +1580,7 @@ static int map_switch_event(struct perf_sched *sched, struct evsel *evsel, return -1; } - sched->curr_thread[this_cpu] = thread__get(sched_in); + sched->curr_thread[this_cpu.cpu] = thread__get(sched_in); printf(" "); @@ -1608,8 +1611,10 @@ static int map_switch_event(struct perf_sched *sched, struct evsel *evsel, } for (i = 0; i < cpus_nr; i++) { - int cpu = sched->map.comp ? sched->map.comp_cpus[i] : i; - struct thread *curr_thread = sched->curr_thread[cpu]; + struct perf_cpu cpu = { + .cpu = sched->map.comp ? sched->map.comp_cpus[i].cpu : i, + }; + struct thread *curr_thread = sched->curr_thread[cpu.cpu]; struct thread_runtime *curr_tr; const char *pid_color = color; const char *cpu_color = color; @@ -1617,19 +1622,19 @@ static int map_switch_event(struct perf_sched *sched, struct evsel *evsel, if (curr_thread && thread__has_color(curr_thread)) pid_color = COLOR_PIDS; - if (sched->map.cpus && !cpu_map__has(sched->map.cpus, cpu)) + if (sched->map.cpus && !perf_cpu_map__has(sched->map.cpus, cpu)) continue; - if (sched->map.color_cpus && cpu_map__has(sched->map.color_cpus, cpu)) + if (sched->map.color_cpus && perf_cpu_map__has(sched->map.color_cpus, cpu)) cpu_color = COLOR_CPUS; - if (cpu != this_cpu) + if (cpu.cpu != this_cpu.cpu) color_fprintf(stdout, color, " "); else color_fprintf(stdout, cpu_color, "*"); - if (sched->curr_thread[cpu]) { - curr_tr = thread__get_runtime(sched->curr_thread[cpu]); + if (sched->curr_thread[cpu.cpu]) { + curr_tr = thread__get_runtime(sched->curr_thread[cpu.cpu]); if (curr_tr == NULL) { thread__put(sched_in); return -1; @@ -1639,7 +1644,7 @@ static int map_switch_event(struct perf_sched *sched, struct evsel *evsel, color_fprintf(stdout, color, " "); } - if (sched->map.cpus && !cpu_map__has(sched->map.cpus, this_cpu)) + if (sched->map.cpus && !perf_cpu_map__has(sched->map.cpus, this_cpu)) goto out; timestamp__scnprintf_usec(timestamp, stimestamp, sizeof(stimestamp)); @@ -1929,7 +1934,7 @@ static char *timehist_get_commstr(struct thread *thread) static void timehist_header(struct perf_sched *sched) { - u32 ncpus = sched->max_cpu + 1; + u32 ncpus = sched->max_cpu.cpu + 1; u32 i, j; printf("%15s %6s ", "time", "cpu"); @@ -2008,7 +2013,7 @@ static void timehist_print_sample(struct perf_sched *sched, struct thread_runtime *tr = thread__priv(thread); const char *next_comm = evsel__strval(evsel, sample, "next_comm"); const u32 next_pid = evsel__intval(evsel, sample, "next_pid"); - u32 max_cpus = sched->max_cpu + 1; + u32 max_cpus = sched->max_cpu.cpu + 1; char tstr[64]; char nstr[30]; u64 wait_time; @@ -2389,7 +2394,7 @@ static void timehist_print_wakeup_event(struct perf_sched *sched, timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr)); printf("%15s [%04d] ", tstr, sample->cpu); if (sched->show_cpu_visual) - printf(" %*s ", sched->max_cpu + 1, ""); + printf(" %*s ", sched->max_cpu.cpu + 1, ""); printf(" %-*s ", comm_width, timehist_get_commstr(thread)); @@ -2449,13 +2454,13 @@ static void timehist_print_migration_event(struct perf_sched *sched, { struct thread *thread; char tstr[64]; - u32 max_cpus = sched->max_cpu + 1; + u32 max_cpus; u32 ocpu, dcpu; if (sched->summary_only) return; - max_cpus = sched->max_cpu + 1; + max_cpus = sched->max_cpu.cpu + 1; ocpu = evsel__intval(evsel, sample, "orig_cpu"); dcpu = evsel__intval(evsel, sample, "dest_cpu"); @@ -2918,7 +2923,7 @@ static void timehist_print_summary(struct perf_sched *sched, printf(" Total scheduling time (msec): "); print_sched_time(hist_time, 2); - printf(" (x %d)\n", sched->max_cpu); + printf(" (x %d)\n", sched->max_cpu.cpu); } typedef int (*sched_handler)(struct perf_tool *tool, @@ -2935,9 +2940,11 @@ static int perf_timehist__process_sample(struct perf_tool *tool, { struct perf_sched *sched = container_of(tool, struct perf_sched, tool); int err = 0; - int this_cpu = sample->cpu; + struct perf_cpu this_cpu = { + .cpu = sample->cpu, + }; - if (this_cpu > sched->max_cpu) + if (this_cpu.cpu > sched->max_cpu.cpu) sched->max_cpu = this_cpu; if (evsel->handler != NULL) { @@ -3054,10 +3061,10 @@ static int perf_sched__timehist(struct perf_sched *sched) goto out; /* pre-allocate struct for per-CPU idle stats */ - sched->max_cpu = session->header.env.nr_cpus_online; - if (sched->max_cpu == 0) - sched->max_cpu = 4; - if (init_idle_threads(sched->max_cpu)) + sched->max_cpu.cpu = session->header.env.nr_cpus_online; + if (sched->max_cpu.cpu == 0) + sched->max_cpu.cpu = 4; + if (init_idle_threads(sched->max_cpu.cpu)) goto out; /* summary_only implies summary option, but don't overwrite summary if set */ @@ -3209,10 +3216,10 @@ static int setup_map_cpus(struct perf_sched *sched) { struct perf_cpu_map *map; - sched->max_cpu = sysconf(_SC_NPROCESSORS_CONF); + sched->max_cpu.cpu = sysconf(_SC_NPROCESSORS_CONF); if (sched->map.comp) { - sched->map.comp_cpus = zalloc(sched->max_cpu * sizeof(int)); + sched->map.comp_cpus = zalloc(sched->max_cpu.cpu * sizeof(int)); if (!sched->map.comp_cpus) return -1; } diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index c82b033e8942..ecd4f99a6c14 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -15,6 +15,7 @@ #include "util/symbol.h" #include "util/thread.h" #include "util/trace-event.h" +#include "util/env.h" #include "util/evlist.h" #include "util/evsel.h" #include "util/evsel_fprintf.h" @@ -648,7 +649,7 @@ out: return 0; } -static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask, +static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask, const char *arch, FILE *fp) { unsigned i = 0, r; @@ -661,7 +662,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask, for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) { u64 val = regs->regs[i++]; - printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r), val); + printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r, arch), val); } return printed; @@ -718,17 +719,17 @@ tod_scnprintf(struct perf_script *script, char *buf, int buflen, } static int perf_sample__fprintf_iregs(struct perf_sample *sample, - struct perf_event_attr *attr, FILE *fp) + struct perf_event_attr *attr, const char *arch, FILE *fp) { return perf_sample__fprintf_regs(&sample->intr_regs, - attr->sample_regs_intr, fp); + attr->sample_regs_intr, arch, fp); } static int perf_sample__fprintf_uregs(struct perf_sample *sample, - struct perf_event_attr *attr, FILE *fp) + struct perf_event_attr *attr, const char *arch, FILE *fp) { return perf_sample__fprintf_regs(&sample->user_regs, - attr->sample_regs_user, fp); + attr->sample_regs_user, arch, fp); } static int perf_sample__fprintf_start(struct perf_script *script, @@ -2000,6 +2001,7 @@ static void process_event(struct perf_script *script, struct evsel_script *es = evsel->priv; FILE *fp = es->fp; char str[PAGE_SIZE_NAME_LEN]; + const char *arch = perf_env__arch(machine->env); if (output[type].fields == 0) return; @@ -2066,10 +2068,10 @@ static void process_event(struct perf_script *script, } if (PRINT_FIELD(IREGS)) - perf_sample__fprintf_iregs(sample, attr, fp); + perf_sample__fprintf_iregs(sample, attr, arch, fp); if (PRINT_FIELD(UREGS)) - perf_sample__fprintf_uregs(sample, attr, fp); + perf_sample__fprintf_uregs(sample, attr, arch, fp); if (PRINT_FIELD(BRSTACK)) perf_sample__fprintf_brstack(sample, thread, attr, fp); @@ -2113,8 +2115,8 @@ static struct scripting_ops *scripting_ops; static void __process_stat(struct evsel *counter, u64 tstamp) { int nthreads = perf_thread_map__nr(counter->core.threads); - int ncpus = evsel__nr_cpus(counter); - int cpu, thread; + int idx, thread; + struct perf_cpu cpu; static int header_printed; if (counter->core.system_wide) @@ -2127,13 +2129,13 @@ static void __process_stat(struct evsel *counter, u64 tstamp) } for (thread = 0; thread < nthreads; thread++) { - for (cpu = 0; cpu < ncpus; cpu++) { + perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) { struct perf_counts_values *counts; - counts = perf_counts(counter->counts, cpu, thread); + counts = perf_counts(counter->counts, idx, thread); printf("%3d %8d %15" PRIu64 " %15" PRIu64 " %15" PRIu64 " %15" PRIu64 " %s\n", - counter->core.cpus->map[cpu], + cpu.cpu, perf_thread_map__pid(counter->core.threads, thread), counts->val, counts->ena, @@ -2316,7 +2318,7 @@ static int process_attr(struct perf_tool *tool, union perf_event *event, * on events sample_type. */ sample_type = evlist__combined_sample_type(evlist); - callchain_param_setup(sample_type); + callchain_param_setup(sample_type, perf_env__arch((*pevlist)->env)); /* Enable fields for callchain entries */ if (symbol_conf.use_callchain && @@ -3466,16 +3468,7 @@ static void script__setup_sample_type(struct perf_script *script) struct perf_session *session = script->session; u64 sample_type = evlist__combined_sample_type(session->evlist); - if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) { - if ((sample_type & PERF_SAMPLE_REGS_USER) && - (sample_type & PERF_SAMPLE_STACK_USER)) { - callchain_param.record_mode = CALLCHAIN_DWARF; - dwarf_callchain_users = true; - } else if (sample_type & PERF_SAMPLE_BRANCH_STACK) - callchain_param.record_mode = CALLCHAIN_LBR; - else - callchain_param.record_mode = CALLCHAIN_FP; - } + callchain_param_setup(sample_type, perf_env__arch(session->machines.host.env)); if (script->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) { pr_warning("Can't find LBR callchain. Switch off --stitch-lbr.\n" diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 7974933dbc77..973ade18b72a 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -234,7 +234,7 @@ static bool cpus_map_matched(struct evsel *a, struct evsel *b) return false; for (int i = 0; i < a->core.cpus->nr; i++) { - if (a->core.cpus->map[i] != b->core.cpus->map[i]) + if (a->core.cpus->map[i].cpu != b->core.cpus->map[i].cpu) return false; } @@ -327,34 +327,35 @@ static int write_stat_round_event(u64 tm, u64 type) #define SID(e, x, y) xyarray__entry(e->core.sample_id, x, y) -static int evsel__write_stat_event(struct evsel *counter, u32 cpu, u32 thread, +static int evsel__write_stat_event(struct evsel *counter, int cpu_map_idx, u32 thread, struct perf_counts_values *count) { - struct perf_sample_id *sid = SID(counter, cpu, thread); + struct perf_sample_id *sid = SID(counter, cpu_map_idx, thread); + struct perf_cpu cpu = perf_cpu_map__cpu(evsel__cpus(counter), cpu_map_idx); return perf_event__synthesize_stat(NULL, cpu, thread, sid->id, count, process_synthesized_event, NULL); } -static int read_single_counter(struct evsel *counter, int cpu, +static int read_single_counter(struct evsel *counter, int cpu_map_idx, int thread, struct timespec *rs) { if (counter->tool_event == PERF_TOOL_DURATION_TIME) { u64 val = rs->tv_nsec + rs->tv_sec*1000000000ULL; struct perf_counts_values *count = - perf_counts(counter->counts, cpu, thread); + perf_counts(counter->counts, cpu_map_idx, thread); count->ena = count->run = val; count->val = val; return 0; } - return evsel__read_counter(counter, cpu, thread); + return evsel__read_counter(counter, cpu_map_idx, thread); } /* * Read out the results of a single counter: * do not aggregate counts across CPUs in system-wide mode */ -static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu) +static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu_map_idx) { int nthreads = perf_thread_map__nr(evsel_list->core.threads); int thread; @@ -368,24 +369,24 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu) for (thread = 0; thread < nthreads; thread++) { struct perf_counts_values *count; - count = perf_counts(counter->counts, cpu, thread); + count = perf_counts(counter->counts, cpu_map_idx, thread); /* * The leader's group read loads data into its group members * (via evsel__read_counter()) and sets their count->loaded. */ - if (!perf_counts__is_loaded(counter->counts, cpu, thread) && - read_single_counter(counter, cpu, thread, rs)) { + if (!perf_counts__is_loaded(counter->counts, cpu_map_idx, thread) && + read_single_counter(counter, cpu_map_idx, thread, rs)) { counter->counts->scaled = -1; - perf_counts(counter->counts, cpu, thread)->ena = 0; - perf_counts(counter->counts, cpu, thread)->run = 0; + perf_counts(counter->counts, cpu_map_idx, thread)->ena = 0; + perf_counts(counter->counts, cpu_map_idx, thread)->run = 0; return -1; } - perf_counts__set_loaded(counter->counts, cpu, thread, false); + perf_counts__set_loaded(counter->counts, cpu_map_idx, thread, false); if (STAT_RECORD) { - if (evsel__write_stat_event(counter, cpu, thread, count)) { + if (evsel__write_stat_event(counter, cpu_map_idx, thread, count)) { pr_err("failed to write stat event\n"); return -1; } @@ -395,7 +396,8 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu) fprintf(stat_config.output, "%s: %d: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n", evsel__name(counter), - cpu, + perf_cpu_map__cpu(evsel__cpus(counter), + cpu_map_idx).cpu, count->val, count->ena, count->run); } } @@ -405,36 +407,33 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu) static int read_affinity_counters(struct timespec *rs) { - struct evsel *counter; - struct affinity affinity; - int i, ncpus, cpu; + struct evlist_cpu_iterator evlist_cpu_itr; + struct affinity saved_affinity, *affinity; if (all_counters_use_bpf) return 0; - if (affinity__setup(&affinity) < 0) + if (!target__has_cpu(&target) || target__has_per_thread(&target)) + affinity = NULL; + else if (affinity__setup(&saved_affinity) < 0) return -1; + else + affinity = &saved_affinity; - ncpus = perf_cpu_map__nr(evsel_list->core.all_cpus); - if (!target__has_cpu(&target) || target__has_per_thread(&target)) - ncpus = 1; - evlist__for_each_cpu(evsel_list, i, cpu) { - if (i >= ncpus) - break; - affinity__set(&affinity, cpu); + evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) { + struct evsel *counter = evlist_cpu_itr.evsel; - evlist__for_each_entry(evsel_list, counter) { - if (evsel__cpu_iter_skip(counter, cpu)) - continue; - if (evsel__is_bpf(counter)) - continue; - if (!counter->err) { - counter->err = read_counter_cpu(counter, rs, - counter->cpu_iter - 1); - } + if (evsel__is_bpf(counter)) + continue; + + if (!counter->err) { + counter->err = read_counter_cpu(counter, rs, + evlist_cpu_itr.cpu_map_idx); } } - affinity__cleanup(&affinity); + if (affinity) + affinity__cleanup(&saved_affinity); + return 0; } @@ -788,8 +787,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx) int status = 0; const bool forks = (argc > 0); bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false; + struct evlist_cpu_iterator evlist_cpu_itr; struct affinity affinity; - int i, cpu, err; + int err; bool second_pass = false; if (forks) { @@ -813,102 +813,97 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx) all_counters_use_bpf = false; } - evlist__for_each_cpu (evsel_list, i, cpu) { + evlist__for_each_cpu(evlist_cpu_itr, evsel_list, &affinity) { + counter = evlist_cpu_itr.evsel; + /* * bperf calls evsel__open_per_cpu() in bperf__load(), so * no need to call it again here. */ if (target.use_bpf) break; - affinity__set(&affinity, cpu); - evlist__for_each_entry(evsel_list, counter) { - if (evsel__cpu_iter_skip(counter, cpu)) + if (counter->reset_group || counter->errored) + continue; + if (evsel__is_bpf(counter)) + continue; +try_again: + if (create_perf_stat_counter(counter, &stat_config, &target, + evlist_cpu_itr.cpu_map_idx) < 0) { + + /* + * Weak group failed. We cannot just undo this here + * because earlier CPUs might be in group mode, and the kernel + * doesn't support mixing group and non group reads. Defer + * it to later. + * Don't close here because we're in the wrong affinity. + */ + if ((errno == EINVAL || errno == EBADF) && + evsel__leader(counter) != counter && + counter->weak_group) { + evlist__reset_weak_group(evsel_list, counter, false); + assert(counter->reset_group); + second_pass = true; continue; - if (counter->reset_group || counter->errored) + } + + switch (stat_handle_error(counter)) { + case COUNTER_FATAL: + return -1; + case COUNTER_RETRY: + goto try_again; + case COUNTER_SKIP: continue; - if (evsel__is_bpf(counter)) + default: + break; + } + + } + counter->supported = true; + } + + if (second_pass) { + /* + * Now redo all the weak group after closing them, + * and also close errored counters. + */ + + /* First close errored or weak retry */ + evlist__for_each_cpu(evlist_cpu_itr, evsel_list, &affinity) { + counter = evlist_cpu_itr.evsel; + + if (!counter->reset_group && !counter->errored) continue; -try_again: + + perf_evsel__close_cpu(&counter->core, evlist_cpu_itr.cpu_map_idx); + } + /* Now reopen weak */ + evlist__for_each_cpu(evlist_cpu_itr, evsel_list, &affinity) { + counter = evlist_cpu_itr.evsel; + + if (!counter->reset_group && !counter->errored) + continue; + if (!counter->reset_group) + continue; +try_again_reset: + pr_debug2("reopening weak %s\n", evsel__name(counter)); if (create_perf_stat_counter(counter, &stat_config, &target, - counter->cpu_iter - 1) < 0) { - - /* - * Weak group failed. We cannot just undo this here - * because earlier CPUs might be in group mode, and the kernel - * doesn't support mixing group and non group reads. Defer - * it to later. - * Don't close here because we're in the wrong affinity. - */ - if ((errno == EINVAL || errno == EBADF) && - evsel__leader(counter) != counter && - counter->weak_group) { - evlist__reset_weak_group(evsel_list, counter, false); - assert(counter->reset_group); - second_pass = true; - continue; - } + evlist_cpu_itr.cpu_map_idx) < 0) { switch (stat_handle_error(counter)) { case COUNTER_FATAL: return -1; case COUNTER_RETRY: - goto try_again; + goto try_again_reset; case COUNTER_SKIP: continue; default: break; } - } counter->supported = true; } } - - if (second_pass) { - /* - * Now redo all the weak group after closing them, - * and also close errored counters. - */ - - evlist__for_each_cpu(evsel_list, i, cpu) { - affinity__set(&affinity, cpu); - /* First close errored or weak retry */ - evlist__for_each_entry(evsel_list, counter) { - if (!counter->reset_group && !counter->errored) - continue; - if (evsel__cpu_iter_skip_no_inc(counter, cpu)) - continue; - perf_evsel__close_cpu(&counter->core, counter->cpu_iter); - } - /* Now reopen weak */ - evlist__for_each_entry(evsel_list, counter) { - if (!counter->reset_group && !counter->errored) - continue; - if (evsel__cpu_iter_skip(counter, cpu)) - continue; - if (!counter->reset_group) - continue; -try_again_reset: - pr_debug2("reopening weak %s\n", evsel__name(counter)); - if (create_perf_stat_counter(counter, &stat_config, &target, - counter->cpu_iter - 1) < 0) { - - switch (stat_handle_error(counter)) { - case COUNTER_FATAL: - return -1; - case COUNTER_RETRY: - goto try_again_reset; - case COUNTER_SKIP: - continue; - default: - break; - } - } - counter->supported = true; - } - } - } affinity__cleanup(&affinity); evlist__for_each_entry(evsel_list, counter) { @@ -1168,6 +1163,26 @@ static int parse_stat_cgroups(const struct option *opt, return parse_cgroups(opt, str, unset); } +static int parse_hybrid_type(const struct option *opt, + const char *str, + int unset __maybe_unused) +{ + struct evlist *evlist = *(struct evlist **)opt->value; + + if (!list_empty(&evlist->core.entries)) { + fprintf(stderr, "Must define cputype before events/metrics\n"); + return -1; + } + + evlist->hybrid_pmu_name = perf_pmu__hybrid_type_to_pmu(str); + if (!evlist->hybrid_pmu_name) { + fprintf(stderr, "--cputype %s is not supported!\n", str); + return -1; + } + + return 0; +} + static struct option stat_options[] = { OPT_BOOLEAN('T', "transaction", &transaction_run, "hardware transaction statistics"), @@ -1282,6 +1297,10 @@ static struct option stat_options[] = { "don't print 'summary' for CSV summary output"), OPT_BOOLEAN(0, "quiet", &stat_config.quiet, "don't print output (useful with record)"), + OPT_CALLBACK(0, "cputype", &evsel_list, "hybrid cpu type", + "Only enable events on applying cpu with this type " + "for hybrid platform (e.g. core or atom)", + parse_hybrid_type), #ifdef HAVE_LIBPFM OPT_CALLBACK(0, "pfm-events", &evsel_list, "event", "libpfm4 event selector. use 'perf list' to list available events", @@ -1298,70 +1317,75 @@ static struct option stat_options[] = { OPT_END() }; +static const char *const aggr_mode__string[] = { + [AGGR_CORE] = "core", + [AGGR_DIE] = "die", + [AGGR_GLOBAL] = "global", + [AGGR_NODE] = "node", + [AGGR_NONE] = "none", + [AGGR_SOCKET] = "socket", + [AGGR_THREAD] = "thread", + [AGGR_UNSET] = "unset", +}; + static struct aggr_cpu_id perf_stat__get_socket(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int cpu) + struct perf_cpu cpu) { - return cpu_map__get_socket(map, cpu, NULL); + return aggr_cpu_id__socket(cpu, /*data=*/NULL); } static struct aggr_cpu_id perf_stat__get_die(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int cpu) + struct perf_cpu cpu) { - return cpu_map__get_die(map, cpu, NULL); + return aggr_cpu_id__die(cpu, /*data=*/NULL); } static struct aggr_cpu_id perf_stat__get_core(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int cpu) + struct perf_cpu cpu) { - return cpu_map__get_core(map, cpu, NULL); + return aggr_cpu_id__core(cpu, /*data=*/NULL); } static struct aggr_cpu_id perf_stat__get_node(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int cpu) + struct perf_cpu cpu) { - return cpu_map__get_node(map, cpu, NULL); + return aggr_cpu_id__node(cpu, /*data=*/NULL); } static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config, - aggr_get_id_t get_id, struct perf_cpu_map *map, int idx) + aggr_get_id_t get_id, struct perf_cpu cpu) { - int cpu; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id = aggr_cpu_id__empty(); - if (idx >= map->nr) - return id; + if (aggr_cpu_id__is_empty(&config->cpus_aggr_map->map[cpu.cpu])) + config->cpus_aggr_map->map[cpu.cpu] = get_id(config, cpu); - cpu = map->map[idx]; - - if (cpu_map__aggr_cpu_id_is_empty(config->cpus_aggr_map->map[cpu])) - config->cpus_aggr_map->map[cpu] = get_id(config, map, idx); - - id = config->cpus_aggr_map->map[cpu]; + id = config->cpus_aggr_map->map[cpu.cpu]; return id; } static struct aggr_cpu_id perf_stat__get_socket_cached(struct perf_stat_config *config, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_stat__get_aggr(config, perf_stat__get_socket, map, idx); + return perf_stat__get_aggr(config, perf_stat__get_socket, cpu); } static struct aggr_cpu_id perf_stat__get_die_cached(struct perf_stat_config *config, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_stat__get_aggr(config, perf_stat__get_die, map, idx); + return perf_stat__get_aggr(config, perf_stat__get_die, cpu); } static struct aggr_cpu_id perf_stat__get_core_cached(struct perf_stat_config *config, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_stat__get_aggr(config, perf_stat__get_core, map, idx); + return perf_stat__get_aggr(config, perf_stat__get_core, cpu); } static struct aggr_cpu_id perf_stat__get_node_cached(struct perf_stat_config *config, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_stat__get_aggr(config, perf_stat__get_node, map, idx); + return perf_stat__get_aggr(config, perf_stat__get_node, cpu); } static bool term_percore_set(void) @@ -1376,54 +1400,67 @@ static bool term_percore_set(void) return false; } -static int perf_stat_init_aggr_mode(void) +static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode) { - int nr; + switch (aggr_mode) { + case AGGR_SOCKET: + return aggr_cpu_id__socket; + case AGGR_DIE: + return aggr_cpu_id__die; + case AGGR_CORE: + return aggr_cpu_id__core; + case AGGR_NODE: + return aggr_cpu_id__node; + case AGGR_NONE: + if (term_percore_set()) + return aggr_cpu_id__core; + + return NULL; + case AGGR_GLOBAL: + case AGGR_THREAD: + case AGGR_UNSET: + default: + return NULL; + } +} - switch (stat_config.aggr_mode) { +static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode) +{ + switch (aggr_mode) { case AGGR_SOCKET: - if (cpu_map__build_socket_map(evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build socket map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_socket_cached; - break; + return perf_stat__get_socket_cached; case AGGR_DIE: - if (cpu_map__build_die_map(evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build die map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_die_cached; - break; + return perf_stat__get_die_cached; case AGGR_CORE: - if (cpu_map__build_core_map(evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build core map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_core_cached; - break; + return perf_stat__get_core_cached; case AGGR_NODE: - if (cpu_map__build_node_map(evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build core map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_node_cached; - break; + return perf_stat__get_node_cached; case AGGR_NONE: if (term_percore_set()) { - if (cpu_map__build_core_map(evsel_list->core.cpus, - &stat_config.aggr_map)) { - perror("cannot build core map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_core_cached; + return perf_stat__get_core_cached; } - break; + return NULL; case AGGR_GLOBAL: case AGGR_THREAD: case AGGR_UNSET: default: - break; + return NULL; + } +} + +static int perf_stat_init_aggr_mode(void) +{ + int nr; + aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode); + + if (get_id) { + stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, + get_id, /*data=*/NULL); + if (!stat_config.aggr_map) { + pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]); + return -1; + } + stat_config.aggr_get_id = aggr_mode__get_id(stat_config.aggr_mode); } /* @@ -1431,7 +1468,7 @@ static int perf_stat_init_aggr_mode(void) * taking the highest cpu number to be the size of * the aggregation translate cpumap. */ - nr = perf_cpu_map__max(evsel_list->core.cpus); + nr = perf_cpu_map__max(evsel_list->core.cpus).cpu; stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1); return stat_config.cpus_aggr_map ? 0 : -ENOMEM; } @@ -1459,169 +1496,139 @@ static void perf_stat__exit_aggr_mode(void) stat_config.cpus_aggr_map = NULL; } -static inline int perf_env__get_cpu(struct perf_env *env, struct perf_cpu_map *map, int idx) -{ - int cpu; - - if (idx > map->nr) - return -1; - - cpu = map->map[idx]; - - if (cpu >= env->nr_cpus_avail) - return -1; - - return cpu; -} - -static struct aggr_cpu_id perf_env__get_socket(struct perf_cpu_map *map, int idx, void *data) +static struct aggr_cpu_id perf_env__get_socket_aggr_by_cpu(struct perf_cpu cpu, void *data) { struct perf_env *env = data; - int cpu = perf_env__get_cpu(env, map, idx); - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id = aggr_cpu_id__empty(); - if (cpu != -1) - id.socket = env->cpu[cpu].socket_id; + if (cpu.cpu != -1) + id.socket = env->cpu[cpu.cpu].socket_id; return id; } -static struct aggr_cpu_id perf_env__get_die(struct perf_cpu_map *map, int idx, void *data) +static struct aggr_cpu_id perf_env__get_die_aggr_by_cpu(struct perf_cpu cpu, void *data) { struct perf_env *env = data; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); - int cpu = perf_env__get_cpu(env, map, idx); + struct aggr_cpu_id id = aggr_cpu_id__empty(); - if (cpu != -1) { + if (cpu.cpu != -1) { /* * die_id is relative to socket, so start * with the socket ID and then add die to * make a unique ID. */ - id.socket = env->cpu[cpu].socket_id; - id.die = env->cpu[cpu].die_id; + id.socket = env->cpu[cpu.cpu].socket_id; + id.die = env->cpu[cpu.cpu].die_id; } return id; } -static struct aggr_cpu_id perf_env__get_core(struct perf_cpu_map *map, int idx, void *data) +static struct aggr_cpu_id perf_env__get_core_aggr_by_cpu(struct perf_cpu cpu, void *data) { struct perf_env *env = data; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); - int cpu = perf_env__get_cpu(env, map, idx); + struct aggr_cpu_id id = aggr_cpu_id__empty(); - if (cpu != -1) { + if (cpu.cpu != -1) { /* * core_id is relative to socket and die, * we need a global id. So we set * socket, die id and core id */ - id.socket = env->cpu[cpu].socket_id; - id.die = env->cpu[cpu].die_id; - id.core = env->cpu[cpu].core_id; + id.socket = env->cpu[cpu.cpu].socket_id; + id.die = env->cpu[cpu.cpu].die_id; + id.core = env->cpu[cpu.cpu].core_id; } return id; } -static struct aggr_cpu_id perf_env__get_node(struct perf_cpu_map *map, int idx, void *data) +static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, void *data) { - int cpu = perf_env__get_cpu(data, map, idx); - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id = aggr_cpu_id__empty(); id.node = perf_env__numa_node(data, cpu); return id; } -static int perf_env__build_socket_map(struct perf_env *env, struct perf_cpu_map *cpus, - struct cpu_aggr_map **sockp) -{ - return cpu_map__build_map(cpus, sockp, perf_env__get_socket, env); -} - -static int perf_env__build_die_map(struct perf_env *env, struct perf_cpu_map *cpus, - struct cpu_aggr_map **diep) -{ - return cpu_map__build_map(cpus, diep, perf_env__get_die, env); -} - -static int perf_env__build_core_map(struct perf_env *env, struct perf_cpu_map *cpus, - struct cpu_aggr_map **corep) -{ - return cpu_map__build_map(cpus, corep, perf_env__get_core, env); -} - -static int perf_env__build_node_map(struct perf_env *env, struct perf_cpu_map *cpus, - struct cpu_aggr_map **nodep) -{ - return cpu_map__build_map(cpus, nodep, perf_env__get_node, env); -} - static struct aggr_cpu_id perf_stat__get_socket_file(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_env__get_socket(map, idx, &perf_stat.session->header.env); + return perf_env__get_socket_aggr_by_cpu(cpu, &perf_stat.session->header.env); } static struct aggr_cpu_id perf_stat__get_die_file(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_env__get_die(map, idx, &perf_stat.session->header.env); + return perf_env__get_die_aggr_by_cpu(cpu, &perf_stat.session->header.env); } static struct aggr_cpu_id perf_stat__get_core_file(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_env__get_core(map, idx, &perf_stat.session->header.env); + return perf_env__get_core_aggr_by_cpu(cpu, &perf_stat.session->header.env); } static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *config __maybe_unused, - struct perf_cpu_map *map, int idx) + struct perf_cpu cpu) { - return perf_env__get_node(map, idx, &perf_stat.session->header.env); + return perf_env__get_node_aggr_by_cpu(cpu, &perf_stat.session->header.env); } -static int perf_stat_init_aggr_mode_file(struct perf_stat *st) +static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode) { - struct perf_env *env = &st->session->header.env; + switch (aggr_mode) { + case AGGR_SOCKET: + return perf_env__get_socket_aggr_by_cpu; + case AGGR_DIE: + return perf_env__get_die_aggr_by_cpu; + case AGGR_CORE: + return perf_env__get_core_aggr_by_cpu; + case AGGR_NODE: + return perf_env__get_node_aggr_by_cpu; + case AGGR_NONE: + case AGGR_GLOBAL: + case AGGR_THREAD: + case AGGR_UNSET: + default: + return NULL; + } +} - switch (stat_config.aggr_mode) { +static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode) +{ + switch (aggr_mode) { case AGGR_SOCKET: - if (perf_env__build_socket_map(env, evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build socket map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_socket_file; - break; + return perf_stat__get_socket_file; case AGGR_DIE: - if (perf_env__build_die_map(env, evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build die map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_die_file; - break; + return perf_stat__get_die_file; case AGGR_CORE: - if (perf_env__build_core_map(env, evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build core map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_core_file; - break; + return perf_stat__get_core_file; case AGGR_NODE: - if (perf_env__build_node_map(env, evsel_list->core.cpus, &stat_config.aggr_map)) { - perror("cannot build core map"); - return -1; - } - stat_config.aggr_get_id = perf_stat__get_node_file; - break; + return perf_stat__get_node_file; case AGGR_NONE: case AGGR_GLOBAL: case AGGR_THREAD: case AGGR_UNSET: default: - break; + return NULL; } +} + +static int perf_stat_init_aggr_mode_file(struct perf_stat *st) +{ + struct perf_env *env = &st->session->header.env; + aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode); + + if (!get_id) + return 0; + stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, get_id, env); + if (!stat_config.aggr_map) { + pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]); + return -1; + } + stat_config.aggr_get_id = aggr_mode__get_id_file(stat_config.aggr_mode); return 0; } diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index 273237a9c5e6..32844d8a0ea5 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -3964,6 +3964,9 @@ static int trace__run(struct trace *trace, int argc, const char **argv) evlist__add(evlist, pgfault_min); } + /* Enable ignoring missing threads when -u/-p option is defined. */ + trace->opts.ignore_missing_thread = trace->opts.target.uid != UINT_MAX || trace->opts.target.pid; + if (trace->sched && evlist__add_newtp(evlist, "sched", "sched_stat_runtime", trace__sched_stat_runtime)) goto out_error_sched_stat_runtime; diff --git a/tools/perf/dlfilters/dlfilter-test-api-v0.c b/tools/perf/dlfilters/dlfilter-test-api-v0.c index 7565a1852c74..b17eb52a0694 100644 --- a/tools/perf/dlfilters/dlfilter-test-api-v0.c +++ b/tools/perf/dlfilters/dlfilter-test-api-v0.c @@ -308,8 +308,6 @@ int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, vo int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx) { - struct filter_data *d = data; - pr_debug("%s API\n", __func__); return do_checks(data, sample, ctx, false); diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json new file mode 100644 index 000000000000..79f2016c53b0 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/branch.json @@ -0,0 +1,8 @@ +[ + { + "ArchStdEvent": "BR_MIS_PRED" + }, + { + "ArchStdEvent": "BR_PRED" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json new file mode 100644 index 000000000000..579c1c993d17 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/bus.json @@ -0,0 +1,20 @@ +[ + { + "ArchStdEvent": "CPU_CYCLES" + }, + { + "ArchStdEvent": "BUS_ACCESS" + }, + { + "ArchStdEvent": "BUS_CYCLES" + }, + { + "ArchStdEvent": "BUS_ACCESS_RD" + }, + { + "ArchStdEvent": "BUS_ACCESS_WR" + }, + { + "ArchStdEvent": "CNT_CYCLES" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json new file mode 100644 index 000000000000..0141f749bff3 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/cache.json @@ -0,0 +1,155 @@ +[ + { + "ArchStdEvent": "L1I_CACHE_REFILL" + }, + { + "ArchStdEvent": "L1I_TLB_REFILL" + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL" + }, + { + "ArchStdEvent": "L1D_CACHE" + }, + { + "ArchStdEvent": "L1D_TLB_REFILL" + }, + { + "ArchStdEvent": "L1I_CACHE" + }, + { + "ArchStdEvent": "L1D_CACHE_WB" + }, + { + "ArchStdEvent": "L2D_CACHE" + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL" + }, + { + "ArchStdEvent": "L2D_CACHE_WB" + }, + { + "ArchStdEvent": "L2D_CACHE_ALLOCATE" + }, + { + "ArchStdEvent": "L1D_TLB" + }, + { + "ArchStdEvent": "L1I_TLB" + }, + { + "ArchStdEvent": "L3D_CACHE_ALLOCATE" + }, + { + "ArchStdEvent": "L3D_CACHE_REFILL" + }, + { + "ArchStdEvent": "L3D_CACHE" + }, + { + "ArchStdEvent": "L2D_TLB_REFILL" + }, + { + "ArchStdEvent": "L2D_TLB" + }, + { + "ArchStdEvent": "DTLB_WALK" + }, + { + "ArchStdEvent": "ITLB_WALK" + }, + { + "ArchStdEvent": "LL_CACHE_RD" + }, + { + "ArchStdEvent": "LL_CACHE_MISS_RD" + }, + { + "ArchStdEvent": "L1D_CACHE_LMISS_RD" + }, + { + "ArchStdEvent": "L1D_CACHE_RD" + }, + { + "ArchStdEvent": "L1D_CACHE_WR" + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_RD" + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_WR" + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_INNER" + }, + { + "ArchStdEvent": "L1D_CACHE_REFILL_OUTER" + }, + { + "ArchStdEvent": "L1D_CACHE_WB_VICTIM" + }, + { + "ArchStdEvent": "L1D_CACHE_WB_CLEAN" + }, + { + "ArchStdEvent": "L1D_CACHE_INVAL" + }, + { + "ArchStdEvent": "L1D_TLB_REFILL_RD" + }, + { + "ArchStdEvent": "L1D_TLB_REFILL_WR" + }, + { + "ArchStdEvent": "L1D_TLB_RD" + }, + { + "ArchStdEvent": "L1D_TLB_WR" + }, + { + "ArchStdEvent": "L2D_CACHE_RD" + }, + { + "ArchStdEvent": "L2D_CACHE_WR" + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL_RD" + }, + { + "ArchStdEvent": "L2D_CACHE_REFILL_WR" + }, + { + "ArchStdEvent": "L2D_CACHE_WB_VICTIM" + }, + { + "ArchStdEvent": "L2D_CACHE_WB_CLEAN" + }, + { + "ArchStdEvent": "L2D_CACHE_INVAL" + }, + { + "ArchStdEvent": "L2D_TLB_REFILL_RD" + }, + { + "ArchStdEvent": "L2D_TLB_REFILL_WR" + }, + { + "ArchStdEvent": "L2D_TLB_RD" + }, + { + "ArchStdEvent": "L2D_TLB_WR" + }, + { + "ArchStdEvent": "L3D_CACHE_RD" + }, + { + "ArchStdEvent": "L1I_CACHE_LMISS" + }, + { + "ArchStdEvent": "L2D_CACHE_LMISS_RD" + }, + { + "ArchStdEvent": "L3D_CACHE_LMISS_RD" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json new file mode 100644 index 000000000000..344a2d552ad5 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/exception.json @@ -0,0 +1,47 @@ +[ + { + "ArchStdEvent": "EXC_TAKEN" + }, + { + "ArchStdEvent": "MEMORY_ERROR" + }, + { + "ArchStdEvent": "EXC_UNDEF" + }, + { + "ArchStdEvent": "EXC_SVC" + }, + { + "ArchStdEvent": "EXC_PABORT" + }, + { + "ArchStdEvent": "EXC_DABORT" + }, + { + "ArchStdEvent": "EXC_IRQ" + }, + { + "ArchStdEvent": "EXC_FIQ" + }, + { + "ArchStdEvent": "EXC_SMC" + }, + { + "ArchStdEvent": "EXC_HVC" + }, + { + "ArchStdEvent": "EXC_TRAP_PABORT" + }, + { + "ArchStdEvent": "EXC_TRAP_DABORT" + }, + { + "ArchStdEvent": "EXC_TRAP_OTHER" + }, + { + "ArchStdEvent": "EXC_TRAP_IRQ" + }, + { + "ArchStdEvent": "EXC_TRAP_FIQ" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json new file mode 100644 index 000000000000..e57cd55937c6 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/instruction.json @@ -0,0 +1,143 @@ +[ + { + "ArchStdEvent": "SW_INCR" + }, + { + "ArchStdEvent": "INST_RETIRED" + }, + { + "ArchStdEvent": "EXC_RETURN" + }, + { + "ArchStdEvent": "CID_WRITE_RETIRED" + }, + { + "ArchStdEvent": "INST_SPEC" + }, + { + "ArchStdEvent": "TTBR_WRITE_RETIRED" + }, + { + "ArchStdEvent": "BR_RETIRED" + }, + { + "ArchStdEvent": "BR_MIS_PRED_RETIRED" + }, + { + "ArchStdEvent": "OP_RETIRED" + }, + { + "ArchStdEvent": "OP_SPEC" + }, + { + "ArchStdEvent": "LDREX_SPEC" + }, + { + "ArchStdEvent": "STREX_PASS_SPEC" + }, + { + "ArchStdEvent": "STREX_FAIL_SPEC" + }, + { + "ArchStdEvent": "STREX_SPEC" + }, + { + "ArchStdEvent": "LD_SPEC" + }, + { + "ArchStdEvent": "ST_SPEC" + }, + { + "ArchStdEvent": "DP_SPEC" + }, + { + "ArchStdEvent": "ASE_SPEC" + }, + { + "ArchStdEvent": "VFP_SPEC" + }, + { + "ArchStdEvent": "PC_WRITE_SPEC" + }, + { + "ArchStdEvent": "CRYPTO_SPEC" + }, + { + "ArchStdEvent": "BR_IMMED_SPEC" + }, + { + "ArchStdEvent": "BR_RETURN_SPEC" + }, + { + "ArchStdEvent": "BR_INDIRECT_SPEC" + }, + { + "ArchStdEvent": "ISB_SPEC" + }, + { + "ArchStdEvent": "DSB_SPEC" + }, + { + "ArchStdEvent": "DMB_SPEC" + }, + { + "ArchStdEvent": "RC_LD_SPEC" + }, + { + "ArchStdEvent": "RC_ST_SPEC" + }, + { + "ArchStdEvent": "ASE_INST_SPEC" + }, + { + "ArchStdEvent": "SVE_INST_SPEC" + }, + { + "ArchStdEvent": "FP_HP_SPEC" + }, + { + "ArchStdEvent": "FP_SP_SPEC" + }, + { + "ArchStdEvent": "FP_DP_SPEC" + }, + { + "ArchStdEvent": "SVE_PRED_SPEC" + }, + { + "ArchStdEvent": "SVE_PRED_EMPTY_SPEC" + }, + { + "ArchStdEvent": "SVE_PRED_FULL_SPEC" + }, + { + "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC" + }, + { + "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC" + }, + { + "ArchStdEvent": "SVE_LDFF_SPEC" + }, + { + "ArchStdEvent": "SVE_LDFF_FAULT_SPEC" + }, + { + "ArchStdEvent": "FP_SCALE_OPS_SPEC" + }, + { + "ArchStdEvent": "FP_FIXED_OPS_SPEC" + }, + { + "ArchStdEvent": "ASE_SVE_INT8_SPEC" + }, + { + "ArchStdEvent": "ASE_SVE_INT16_SPEC" + }, + { + "ArchStdEvent": "ASE_SVE_INT32_SPEC" + }, + { + "ArchStdEvent": "ASE_SVE_INT64_SPEC" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json new file mode 100644 index 000000000000..e522113aeb96 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/memory.json @@ -0,0 +1,38 @@ +[ + { + "ArchStdEvent": "MEM_ACCESS" + }, + { + "ArchStdEvent": "MEM_ACCESS_RD" + }, + { + "ArchStdEvent": "MEM_ACCESS_WR" + }, + { + "ArchStdEvent": "UNALIGNED_LD_SPEC" + }, + { + "ArchStdEvent": "UNALIGNED_ST_SPEC" + }, + { + "ArchStdEvent": "UNALIGNED_LDST_SPEC" + }, + { + "ArchStdEvent": "LDST_ALIGN_LAT" + }, + { + "ArchStdEvent": "LD_ALIGN_LAT" + }, + { + "ArchStdEvent": "ST_ALIGN_LAT" + }, + { + "ArchStdEvent": "MEM_ACCESS_CHECKED" + }, + { + "ArchStdEvent": "MEM_ACCESS_CHECKED_RD" + }, + { + "ArchStdEvent": "MEM_ACCESS_CHECKED_WR" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json new file mode 100644 index 000000000000..20d8365756c5 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/other.json @@ -0,0 +1,5 @@ +[ + { + "ArchStdEvent": "REMOTE_ACCESS" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json new file mode 100644 index 000000000000..f9fae15f7555 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json @@ -0,0 +1,23 @@ +[ + { + "ArchStdEvent": "STALL_FRONTEND" + }, + { + "ArchStdEvent": "STALL_BACKEND" + }, + { + "ArchStdEvent": "STALL" + }, + { + "ArchStdEvent": "STALL_SLOT_BACKEND" + }, + { + "ArchStdEvent": "STALL_SLOT_FRONTEND" + }, + { + "ArchStdEvent": "STALL_SLOT" + }, + { + "ArchStdEvent": "STALL_BACKEND_MEM" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json new file mode 100644 index 000000000000..20f2165c85fe --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/spe.json @@ -0,0 +1,14 @@ +[ + { + "ArchStdEvent": "SAMPLE_POP" + }, + { + "ArchStdEvent": "SAMPLE_FEED" + }, + { + "ArchStdEvent": "SAMPLE_FILTRATE" + }, + { + "ArchStdEvent": "SAMPLE_COLLISION" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json new file mode 100644 index 000000000000..3116135c59e2 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/trace.json @@ -0,0 +1,29 @@ +[ + { + "ArchStdEvent": "TRB_WRAP" + }, + { + "ArchStdEvent": "TRCEXTOUT0" + }, + { + "ArchStdEvent": "TRCEXTOUT1" + }, + { + "ArchStdEvent": "TRCEXTOUT2" + }, + { + "ArchStdEvent": "TRCEXTOUT3" + }, + { + "ArchStdEvent": "CTI_TRIGOUT4" + }, + { + "ArchStdEvent": "CTI_TRIGOUT5" + }, + { + "ArchStdEvent": "CTI_TRIGOUT6" + }, + { + "ArchStdEvent": "CTI_TRIGOUT7" + } +] diff --git a/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json index 423767510aff..80d7a70829a0 100644 --- a/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json +++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json @@ -300,6 +300,30 @@ "BriefDescription": "No operation sent for execution on a slot" }, { + "PublicDescription": "Sample Population", + "EventCode": "0x4000", + "EventName": "SAMPLE_POP", + "BriefDescription": "Sample Population" + }, + { + "PublicDescription": "Sample Taken", + "EventCode": "0x4001", + "EventName": "SAMPLE_FEED", + "BriefDescription": "Sample Taken" + }, + { + "PublicDescription": "Sample Taken and not removed by filtering", + "EventCode": "0x4002", + "EventName": "SAMPLE_FILTRATE", + "BriefDescription": "Sample Taken and not removed by filtering" + }, + { + "PublicDescription": "Sample collided with previous sample", + "EventCode": "0x4003", + "EventName": "SAMPLE_COLLISION", + "BriefDescription": "Sample collided with previous sample" + }, + { "PublicDescription": "Constant frequency cycles. The counter increments at a constant frequency equal to the rate of increment of the system counter, CNTPCT_EL0.", "EventCode": "0x4004", "EventName": "CNT_CYCLES", @@ -330,6 +354,96 @@ "BriefDescription": "Level 3 data cache long-latency read miss" }, { + "PublicDescription": "Trace buffer current write pointer wrapped", + "EventCode": "0x400C", + "EventName": "TRB_WRAP", + "BriefDescription": "Trace buffer current write pointer wrapped" + }, + { + "PublicDescription": "PE Trace Unit external output 0", + "EventCode": "0x4010", + "EventName": "TRCEXTOUT0", + "BriefDescription": "PE Trace Unit external output 0" + }, + { + "PublicDescription": "PE Trace Unit external output 1", + "EventCode": "0x4011", + "EventName": "TRCEXTOUT1", + "BriefDescription": "PE Trace Unit external output 1" + }, + { + "PublicDescription": "PE Trace Unit external output 2", + "EventCode": "0x4012", + "EventName": "TRCEXTOUT2", + "BriefDescription": "PE Trace Unit external output 2" + }, + { + "PublicDescription": "PE Trace Unit external output 3", + "EventCode": "0x4013", + "EventName": "TRCEXTOUT3", + "BriefDescription": "PE Trace Unit external output 3" + }, + { + "PublicDescription": "Cross-trigger Interface output trigger 4", + "EventCode": "0x4018", + "EventName": "CTI_TRIGOUT4", + "BriefDescription": "Cross-trigger Interface output trigger 4" + }, + { + "PublicDescription": "Cross-trigger Interface output trigger 5 ", + "EventCode": "0x4019", + "EventName": "CTI_TRIGOUT5", + "BriefDescription": "Cross-trigger Interface output trigger 5 " + }, + { + "PublicDescription": "Cross-trigger Interface output trigger 6", + "EventCode": "0x401A", + "EventName": "CTI_TRIGOUT6", + "BriefDescription": "Cross-trigger Interface output trigger 6" + }, + { + "PublicDescription": "Cross-trigger Interface output trigger 7", + "EventCode": "0x401B", + "EventName": "CTI_TRIGOUT7", + "BriefDescription": "Cross-trigger Interface output trigger 7" + }, + { + "PublicDescription": "Access with additional latency from alignment", + "EventCode": "0x4020", + "EventName": "LDST_ALIGN_LAT", + "BriefDescription": "Access with additional latency from alignment" + }, + { + "PublicDescription": "Load with additional latency from alignment", + "EventCode": "0x4021", + "EventName": "LD_ALIGN_LAT", + "BriefDescription": "Load with additional latency from alignment" + }, + { + "PublicDescription": "Store with additional latency from alignment", + "EventCode": "0x4022", + "EventName": "ST_ALIGN_LAT", + "BriefDescription": "Store with additional latency from alignment" + }, + { + "PublicDescription": "Checked data memory access", + "EventCode": "0x4024", + "EventName": "MEM_ACCESS_CHECKED", + "BriefDescription": "Checked data memory access" + }, + { + "PublicDescription": "Checked data memory access, read", + "EventCode": "0x4025", + "EventName": "MEM_ACCESS_CHECKED_RD", + "BriefDescription": "Checked data memory access, read" + }, + { + "PublicDescription": "Checked data memory access, write", + "EventCode": "0x4026", + "EventName": "MEM_ACCESS_CHECKED_WR", + "BriefDescription": "Checked data memory access, write" + }, + { "PublicDescription": "SIMD Instruction architecturally executed.", "EventCode": "0x8000", "EventName": "SIMD_INST_RETIRED", @@ -342,6 +456,18 @@ "BriefDescription": "Instruction architecturally executed, SVE." }, { + "PublicDescription": "ASE operations speculatively executed", + "EventCode": "0x8005", + "EventName": "ASE_INST_SPEC", + "BriefDescription": "ASE operations speculatively executed" + }, + { + "PublicDescription": "SVE operations speculatively executed", + "EventCode": "0x8006", + "EventName": "SVE_INST_SPEC", + "BriefDescription": "SVE operations speculatively executed" + }, + { "PublicDescription": "Microarchitectural operation, Operations speculatively executed.", "EventCode": "0x8008", "EventName": "UOP_SPEC", @@ -360,6 +486,24 @@ "BriefDescription": "Floating-point Operations speculatively executed." }, { + "PublicDescription": "Floating-point half-precision operations speculatively executed", + "EventCode": "0x8014", + "EventName": "FP_HP_SPEC", + "BriefDescription": "Floating-point half-precision operations speculatively executed" + }, + { + "PublicDescription": "Floating-point single-precision operations speculatively executed", + "EventCode": "0x8018", + "EventName": "FP_SP_SPEC", + "BriefDescription": "Floating-point single-precision operations speculatively executed" + }, + { + "PublicDescription": "Floating-point double-precision operations speculatively executed", + "EventCode": "0x801C", + "EventName": "FP_DP_SPEC", + "BriefDescription": "Floating-point double-precision operations speculatively executed" + }, + { "PublicDescription": "Floating-point FMA Operations speculatively executed.", "EventCode": "0x8028", "EventName": "FP_FMA_SPEC", @@ -390,6 +534,30 @@ "BriefDescription": "SVE predicated Operations speculatively executed." }, { + "PublicDescription": "SVE predicated operations with no active predicates speculatively executed", + "EventCode": "0x8075", + "EventName": "SVE_PRED_EMPTY_SPEC", + "BriefDescription": "SVE predicated operations with no active predicates speculatively executed" + }, + { + "PublicDescription": "SVE predicated operations speculatively executed with all active predicates", + "EventCode": "0x8076", + "EventName": "SVE_PRED_FULL_SPEC", + "BriefDescription": "SVE predicated operations speculatively executed with all active predicates" + }, + { + "PublicDescription": "SVE predicated operations speculatively executed with partially active predicates", + "EventCode": "0x8077", + "EventName": "SVE_PRED_PARTIAL_SPEC", + "BriefDescription": "SVE predicated operations speculatively executed with partially active predicates" + }, + { + "PublicDescription": "SVE predicated operations with empty or partially active predicates", + "EventCode": "0x8079", + "EventName": "SVE_PRED_NOT_FULL_SPEC", + "BriefDescription": "SVE predicated operations with empty or partially active predicates" + }, + { "PublicDescription": "SVE MOVPRFX Operations speculatively executed.", "EventCode": "0x807C", "EventName": "SVE_MOVPRFX_SPEC", @@ -498,6 +666,12 @@ "BriefDescription": "SVE First-fault load Operations speculatively executed." }, { + "PublicDescription": "SVE first-fault load operations speculatively executed which set FFR bit to 0", + "EventCode": "0x80BD", + "EventName": "SVE_LDFF_FAULT_SPEC", + "BriefDescription": "SVE first-fault load operations speculatively executed which set FFR bit to 0" + }, + { "PublicDescription": "Scalable floating-point element Operations speculatively executed.", "EventCode": "0x80C0", "EventName": "FP_SCALE_OPS_SPEC", @@ -544,5 +718,29 @@ "EventCode": "0x80C7", "EventName": "FP_DP_FIXED_OPS_SPEC", "BriefDescription": "Non-scalable double-precision floating-point element Operations speculatively executed." + }, + { + "PublicDescription": "Advanced SIMD and SVE 8-bit integer operations speculatively executed", + "EventCode": "0x80E3", + "EventName": "ASE_SVE_INT8_SPEC", + "BriefDescription": "Advanced SIMD and SVE 8-bit integer operations speculatively executed" + }, + { + "PublicDescription": "Advanced SIMD and SVE 16-bit integer operations speculatively executed", + "EventCode": "0x80E7", + "EventName": "ASE_SVE_INT16_SPEC", + "BriefDescription": "Advanced SIMD and SVE 16-bit integer operations speculatively executed" + }, + { + "PublicDescription": "Advanced SIMD and SVE 32-bit integer operations speculatively executed", + "EventCode": "0x80EB", + "EventName": "ASE_SVE_INT32_SPEC", + "BriefDescription": "Advanced SIMD and SVE 32-bit integer operations speculatively executed" + }, + { + "PublicDescription": "Advanced SIMD and SVE 64-bit integer operations speculatively executed", + "EventCode": "0x80EF", + "EventName": "ASE_SVE_INT64_SPEC", + "BriefDescription": "Advanced SIMD and SVE 64-bit integer operations speculatively executed" } ] diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv index 31d8b57ca9bb..b899db48c12a 100644 --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv @@ -19,6 +19,7 @@ 0x00000000410fd0b0,v1,arm/cortex-a76-n1,core 0x00000000410fd0c0,v1,arm/cortex-a76-n1,core 0x00000000410fd400,v1,arm/neoverse-v1,core +0x00000000410fd490,v1,arm/neoverse-n2,core 0x00000000420f5160,v1,cavium/thunderx2,core 0x00000000430f0af0,v1,cavium/thunderx2,core 0x00000000460f0010,v1,fujitsu/a64fx,core diff --git a/tools/perf/pmu-events/arch/arm64/armv8-recommended.json b/tools/perf/pmu-events/arch/arm64/recommended.json index d0a19866563d..210afa856091 100644 --- a/tools/perf/pmu-events/arch/arm64/armv8-recommended.json +++ b/tools/perf/pmu-events/arch/arm64/recommended.json @@ -148,305 +148,305 @@ "EventCode": "0x60", "EventName": "BUS_ACCESS_RD", "BriefDescription": "Bus access read" - }, - { + }, + { "PublicDescription": "Bus access write", "EventCode": "0x61", "EventName": "BUS_ACCESS_WR", "BriefDescription": "Bus access write" - }, - { + }, + { "PublicDescription": "Bus access, Normal, Cacheable, Shareable", "EventCode": "0x62", "EventName": "BUS_ACCESS_SHARED", "BriefDescription": "Bus access, Normal, Cacheable, Shareable" - }, - { + }, + { "PublicDescription": "Bus access, not Normal, Cacheable, Shareable", "EventCode": "0x63", "EventName": "BUS_ACCESS_NOT_SHARED", "BriefDescription": "Bus access, not Normal, Cacheable, Shareable" - }, - { + }, + { "PublicDescription": "Bus access, Normal", "EventCode": "0x64", "EventName": "BUS_ACCESS_NORMAL", "BriefDescription": "Bus access, Normal" - }, - { + }, + { "PublicDescription": "Bus access, peripheral", "EventCode": "0x65", "EventName": "BUS_ACCESS_PERIPH", "BriefDescription": "Bus access, peripheral" - }, - { + }, + { "PublicDescription": "Data memory access, read", "EventCode": "0x66", "EventName": "MEM_ACCESS_RD", "BriefDescription": "Data memory access, read" - }, - { + }, + { "PublicDescription": "Data memory access, write", "EventCode": "0x67", "EventName": "MEM_ACCESS_WR", "BriefDescription": "Data memory access, write" - }, - { + }, + { "PublicDescription": "Unaligned access, read", "EventCode": "0x68", "EventName": "UNALIGNED_LD_SPEC", "BriefDescription": "Unaligned access, read" - }, - { + }, + { "PublicDescription": "Unaligned access, write", "EventCode": "0x69", "EventName": "UNALIGNED_ST_SPEC", "BriefDescription": "Unaligned access, write" - }, - { + }, + { "PublicDescription": "Unaligned access", "EventCode": "0x6a", "EventName": "UNALIGNED_LDST_SPEC", "BriefDescription": "Unaligned access" - }, - { + }, + { "PublicDescription": "Exclusive operation speculatively executed, LDREX or LDX", "EventCode": "0x6c", "EventName": "LDREX_SPEC", "BriefDescription": "Exclusive operation speculatively executed, LDREX or LDX" - }, - { + }, + { "PublicDescription": "Exclusive operation speculatively executed, STREX or STX pass", "EventCode": "0x6d", "EventName": "STREX_PASS_SPEC", "BriefDescription": "Exclusive operation speculatively executed, STREX or STX pass" - }, - { + }, + { "PublicDescription": "Exclusive operation speculatively executed, STREX or STX fail", "EventCode": "0x6e", "EventName": "STREX_FAIL_SPEC", "BriefDescription": "Exclusive operation speculatively executed, STREX or STX fail" - }, - { + }, + { "PublicDescription": "Exclusive operation speculatively executed, STREX or STX", "EventCode": "0x6f", "EventName": "STREX_SPEC", "BriefDescription": "Exclusive operation speculatively executed, STREX or STX" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, load", "EventCode": "0x70", "EventName": "LD_SPEC", "BriefDescription": "Operation speculatively executed, load" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, store", "EventCode": "0x71", "EventName": "ST_SPEC", "BriefDescription": "Operation speculatively executed, store" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, load or store", "EventCode": "0x72", "EventName": "LDST_SPEC", "BriefDescription": "Operation speculatively executed, load or store" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, integer data processing", "EventCode": "0x73", "EventName": "DP_SPEC", "BriefDescription": "Operation speculatively executed, integer data processing" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, Advanced SIMD instruction", "EventCode": "0x74", "EventName": "ASE_SPEC", "BriefDescription": "Operation speculatively executed, Advanced SIMD instruction" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, floating-point instruction", "EventCode": "0x75", "EventName": "VFP_SPEC", "BriefDescription": "Operation speculatively executed, floating-point instruction" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, software change of the PC", "EventCode": "0x76", "EventName": "PC_WRITE_SPEC", "BriefDescription": "Operation speculatively executed, software change of the PC" - }, - { + }, + { "PublicDescription": "Operation speculatively executed, Cryptographic instruction", "EventCode": "0x77", "EventName": "CRYPTO_SPEC", "BriefDescription": "Operation speculatively executed, Cryptographic instruction" - }, - { + }, + { "PublicDescription": "Branch speculatively executed, immediate branch", "EventCode": "0x78", "EventName": "BR_IMMED_SPEC", "BriefDescription": "Branch speculatively executed, immediate branch" - }, - { + }, + { "PublicDescription": "Branch speculatively executed, procedure return", "EventCode": "0x79", "EventName": "BR_RETURN_SPEC", "BriefDescription": "Branch speculatively executed, procedure return" - }, - { + }, + { "PublicDescription": "Branch speculatively executed, indirect branch", "EventCode": "0x7a", "EventName": "BR_INDIRECT_SPEC", "BriefDescription": "Branch speculatively executed, indirect branch" - }, - { + }, + { "PublicDescription": "Barrier speculatively executed, ISB", "EventCode": "0x7c", "EventName": "ISB_SPEC", "BriefDescription": "Barrier speculatively executed, ISB" - }, - { + }, + { "PublicDescription": "Barrier speculatively executed, DSB", "EventCode": "0x7d", "EventName": "DSB_SPEC", "BriefDescription": "Barrier speculatively executed, DSB" - }, - { + }, + { "PublicDescription": "Barrier speculatively executed, DMB", "EventCode": "0x7e", "EventName": "DMB_SPEC", "BriefDescription": "Barrier speculatively executed, DMB" - }, - { + }, + { "PublicDescription": "Exception taken, Other synchronous", "EventCode": "0x81", "EventName": "EXC_UNDEF", "BriefDescription": "Exception taken, Other synchronous" - }, - { + }, + { "PublicDescription": "Exception taken, Supervisor Call", "EventCode": "0x82", "EventName": "EXC_SVC", "BriefDescription": "Exception taken, Supervisor Call" - }, - { + }, + { "PublicDescription": "Exception taken, Instruction Abort", "EventCode": "0x83", "EventName": "EXC_PABORT", "BriefDescription": "Exception taken, Instruction Abort" - }, - { + }, + { "PublicDescription": "Exception taken, Data Abort and SError", "EventCode": "0x84", "EventName": "EXC_DABORT", "BriefDescription": "Exception taken, Data Abort and SError" - }, - { + }, + { "PublicDescription": "Exception taken, IRQ", "EventCode": "0x86", "EventName": "EXC_IRQ", "BriefDescription": "Exception taken, IRQ" - }, - { + }, + { "PublicDescription": "Exception taken, FIQ", "EventCode": "0x87", "EventName": "EXC_FIQ", "BriefDescription": "Exception taken, FIQ" - }, - { + }, + { "PublicDescription": "Exception taken, Secure Monitor Call", "EventCode": "0x88", "EventName": "EXC_SMC", "BriefDescription": "Exception taken, Secure Monitor Call" - }, - { + }, + { "PublicDescription": "Exception taken, Hypervisor Call", "EventCode": "0x8a", "EventName": "EXC_HVC", "BriefDescription": "Exception taken, Hypervisor Call" - }, - { + }, + { "PublicDescription": "Exception taken, Instruction Abort not taken locally", "EventCode": "0x8b", "EventName": "EXC_TRAP_PABORT", "BriefDescription": "Exception taken, Instruction Abort not taken locally" - }, - { + }, + { "PublicDescription": "Exception taken, Data Abort or SError not taken locally", "EventCode": "0x8c", "EventName": "EXC_TRAP_DABORT", "BriefDescription": "Exception taken, Data Abort or SError not taken locally" - }, - { + }, + { "PublicDescription": "Exception taken, Other traps not taken locally", "EventCode": "0x8d", "EventName": "EXC_TRAP_OTHER", "BriefDescription": "Exception taken, Other traps not taken locally" - }, - { + }, + { "PublicDescription": "Exception taken, IRQ not taken locally", "EventCode": "0x8e", "EventName": "EXC_TRAP_IRQ", "BriefDescription": "Exception taken, IRQ not taken locally" - }, - { + }, + { "PublicDescription": "Exception taken, FIQ not taken locally", "EventCode": "0x8f", "EventName": "EXC_TRAP_FIQ", "BriefDescription": "Exception taken, FIQ not taken locally" - }, - { + }, + { "PublicDescription": "Release consistency operation speculatively executed, Load-Acquire", "EventCode": "0x90", "EventName": "RC_LD_SPEC", "BriefDescription": "Release consistency operation speculatively executed, Load-Acquire" - }, - { + }, + { "PublicDescription": "Release consistency operation speculatively executed, Store-Release", "EventCode": "0x91", "EventName": "RC_ST_SPEC", "BriefDescription": "Release consistency operation speculatively executed, Store-Release" - }, - { + }, + { "PublicDescription": "Attributable Level 3 data or unified cache access, read", "EventCode": "0xa0", "EventName": "L3D_CACHE_RD", "BriefDescription": "Attributable Level 3 data or unified cache access, read" - }, - { + }, + { "PublicDescription": "Attributable Level 3 data or unified cache access, write", "EventCode": "0xa1", "EventName": "L3D_CACHE_WR", "BriefDescription": "Attributable Level 3 data or unified cache access, write" - }, - { + }, + { "PublicDescription": "Attributable Level 3 data or unified cache refill, read", "EventCode": "0xa2", "EventName": "L3D_CACHE_REFILL_RD", "BriefDescription": "Attributable Level 3 data or unified cache refill, read" - }, - { + }, + { "PublicDescription": "Attributable Level 3 data or unified cache refill, write", "EventCode": "0xa3", "EventName": "L3D_CACHE_REFILL_WR", "BriefDescription": "Attributable Level 3 data or unified cache refill, write" - }, - { + }, + { "PublicDescription": "Attributable Level 3 data or unified cache Write-Back, victim", "EventCode": "0xa6", "EventName": "L3D_CACHE_WB_VICTIM", "BriefDescription": "Attributable Level 3 data or unified cache Write-Back, victim" - }, - { + }, + { "PublicDescription": "Attributable Level 3 data or unified cache Write-Back, cache clean", "EventCode": "0xa7", "EventName": "L3D_CACHE_WB_CLEAN", "BriefDescription": "Attributable Level 3 data or unified cache Write-Back, cache clean" - }, - { + }, + { "PublicDescription": "Attributable Level 3 data or unified cache access, invalidate", "EventCode": "0xa8", "EventName": "L3D_CACHE_INVAL", "BriefDescription": "Attributable Level 3 data or unified cache access, invalidate" - } + } ] diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c index 2e7c4153875b..1a57c3f81dd4 100644 --- a/tools/perf/pmu-events/jevents.c +++ b/tools/perf/pmu-events/jevents.c @@ -672,8 +672,6 @@ static int json_events(const char *fn, addfield(map, &je.metric_constraint, "", "", val); } else if (json_streq(map, field, "MetricExpr")) { addfield(map, &je.metric_expr, "", "", val); - for (s = je.metric_expr; *s; s++) - *s = tolower(*s); } else if (json_streq(map, field, "ArchStdEvent")) { addfield(map, &arch_std, "", "", val); for (s = arch_std; *s; s++) diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build index 803ca426f8e6..af2b37ef7c70 100644 --- a/tools/perf/tests/Build +++ b/tools/perf/tests/Build @@ -65,6 +65,7 @@ perf-y += pe-file-parsing.o perf-y += expand-cgroup.o perf-y += perf-time-to-tsc.o perf-y += dlfilter-test.o +perf-y += sigtrap.o $(OUTPUT)tests/llvm-src-base.c: tests/bpf-script-example.c tests/Build $(call rule_mkdir) diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c index 0f73e300f207..56fba08a3037 100644 --- a/tools/perf/tests/attr.c +++ b/tools/perf/tests/attr.c @@ -65,7 +65,7 @@ do { \ #define WRITE_ASS(field, fmt) __WRITE_ASS(field, fmt, attr->field) -static int store_event(struct perf_event_attr *attr, pid_t pid, int cpu, +static int store_event(struct perf_event_attr *attr, pid_t pid, struct perf_cpu cpu, int fd, int group_fd, unsigned long flags) { FILE *file; @@ -93,7 +93,7 @@ static int store_event(struct perf_event_attr *attr, pid_t pid, int cpu, /* syscall arguments */ __WRITE_ASS(fd, "d", fd); __WRITE_ASS(group_fd, "d", group_fd); - __WRITE_ASS(cpu, "d", cpu); + __WRITE_ASS(cpu, "d", cpu.cpu); __WRITE_ASS(pid, "d", pid); __WRITE_ASS(flags, "lu", flags); @@ -144,7 +144,7 @@ static int store_event(struct perf_event_attr *attr, pid_t pid, int cpu, return 0; } -void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu, +void test_attr__open(struct perf_event_attr *attr, pid_t pid, struct perf_cpu cpu, int fd, int group_fd, unsigned long flags) { int errno_saved = errno; diff --git a/tools/perf/tests/bitmap.c b/tools/perf/tests/bitmap.c index 384856347236..0bf399c49849 100644 --- a/tools/perf/tests/bitmap.c +++ b/tools/perf/tests/bitmap.c @@ -18,7 +18,7 @@ static unsigned long *get_bitmap(const char *str, int nbits) if (map && bm) { for (i = 0; i < map->nr; i++) - set_bit(map->map[i], bm); + set_bit(map->map[i].cpu, bm); } if (map) diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index 8cb5a1c3489e..fac3717d9ba1 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -107,6 +107,7 @@ static struct test_suite *generic_tests[] = { &suite__expand_cgroup_events, &suite__perf_time_to_tsc, &suite__dlfilter, + &suite__sigtrap, NULL, }; @@ -420,7 +421,7 @@ static int run_shell_tests(int argc, const char *argv[], int i, int width, continue; st.file = ent->d_name; - pr_info("%2d: %-*s:", i, width, test_suite.desc); + pr_info("%3d: %-*s:", i, width, test_suite.desc); if (intlist__find(skiplist, i)) { color_fprintf(stderr, PERF_COLOR_YELLOW, " Skip (user override)\n"); @@ -470,7 +471,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist) continue; } - pr_info("%2d: %-*s:", i, width, test_description(t, -1)); + pr_info("%3d: %-*s:", i, width, test_description(t, -1)); if (intlist__find(skiplist, i)) { color_fprintf(stderr, PERF_COLOR_YELLOW, " Skip (user override)\n"); @@ -510,7 +511,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist) curr, argc, argv)) continue; - pr_info("%2d.%1d: %-*s:", i, subi + 1, subw, + pr_info("%3d.%1d: %-*s:", i, subi + 1, subw, test_description(t, subi)); test_and_print(t, subi); } @@ -545,7 +546,7 @@ static int perf_test__list_shell(int argc, const char **argv, int i) if (!perf_test__matches(t.desc, curr, argc, argv)) continue; - pr_info("%2d: %s\n", i, t.desc); + pr_info("%3d: %s\n", i, t.desc); } @@ -567,14 +568,14 @@ static int perf_test__list(int argc, const char **argv) if (!perf_test__matches(test_description(t, -1), curr, argc, argv)) continue; - pr_info("%2d: %s\n", i, test_description(t, -1)); + pr_info("%3d: %s\n", i, test_description(t, -1)); if (has_subtests(t)) { int subn = num_subtests(t); int subi; for (subi = 0; subi < subn; subi++) - pr_info("%2d:%1d: %s\n", i, subi + 1, + pr_info("%3d:%1d: %s\n", i, subi + 1, test_description(t, subi)); } } @@ -606,6 +607,9 @@ int cmd_test(int argc, const char **argv) if (ret < 0) return ret; + /* Unbuffered output */ + setvbuf(stdout, NULL, _IONBF, 0); + argc = parse_options_subcommand(argc, argv, test_options, test_subcommands, test_usage, 0); if (argc >= 1 && !strcmp(argv[0], "list")) return perf_test__list(argc - 1, argv + 1); diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c index 89a155092f85..84e87e31f119 100644 --- a/tools/perf/tests/cpumap.c +++ b/tools/perf/tests/cpumap.c @@ -38,7 +38,7 @@ static int process_event_mask(struct perf_tool *tool __maybe_unused, TEST_ASSERT_VAL("wrong nr", map->nr == 20); for (i = 0; i < 20; i++) { - TEST_ASSERT_VAL("wrong cpu", map->map[i] == i); + TEST_ASSERT_VAL("wrong cpu", map->map[i].cpu == i); } perf_cpu_map__put(map); @@ -67,8 +67,8 @@ static int process_event_cpus(struct perf_tool *tool __maybe_unused, map = cpu_map__new_data(data); TEST_ASSERT_VAL("wrong nr", map->nr == 2); - TEST_ASSERT_VAL("wrong cpu", map->map[0] == 1); - TEST_ASSERT_VAL("wrong cpu", map->map[1] == 256); + TEST_ASSERT_VAL("wrong cpu", map->map[0].cpu == 1); + TEST_ASSERT_VAL("wrong cpu", map->map[1].cpu == 256); TEST_ASSERT_VAL("wrong refcnt", refcount_read(&map->refcnt) == 1); perf_cpu_map__put(map); return 0; diff --git a/tools/perf/tests/event_update.c b/tools/perf/tests/event_update.c index d01532d40acb..16b6d6f47f38 100644 --- a/tools/perf/tests/event_update.c +++ b/tools/perf/tests/event_update.c @@ -76,9 +76,9 @@ static int process_event_cpus(struct perf_tool *tool __maybe_unused, TEST_ASSERT_VAL("wrong id", ev->id == 123); TEST_ASSERT_VAL("wrong type", ev->type == PERF_EVENT_UPDATE__CPUS); TEST_ASSERT_VAL("wrong cpus", map->nr == 3); - TEST_ASSERT_VAL("wrong cpus", map->map[0] == 1); - TEST_ASSERT_VAL("wrong cpus", map->map[1] == 2); - TEST_ASSERT_VAL("wrong cpus", map->map[2] == 3); + TEST_ASSERT_VAL("wrong cpus", map->map[0].cpu == 1); + TEST_ASSERT_VAL("wrong cpus", map->map[1].cpu == 2); + TEST_ASSERT_VAL("wrong cpus", map->map[2].cpu == 3); perf_cpu_map__put(map); return 0; } diff --git a/tools/perf/tests/mem2node.c b/tools/perf/tests/mem2node.c index b17b86391383..f4a4aba33f76 100644 --- a/tools/perf/tests/mem2node.c +++ b/tools/perf/tests/mem2node.c @@ -31,7 +31,7 @@ static unsigned long *get_bitmap(const char *str, int nbits) if (map && bm) { for (i = 0; i < map->nr; i++) { - set_bit(map->map[i], bm); + set_bit(map->map[i].cpu, bm); } } diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c index 90b2feda31ac..0ad62914b4d7 100644 --- a/tools/perf/tests/mmap-basic.c +++ b/tools/perf/tests/mmap-basic.c @@ -59,11 +59,11 @@ static int test__basic_mmap(struct test_suite *test __maybe_unused, int subtest } CPU_ZERO(&cpu_set); - CPU_SET(cpus->map[0], &cpu_set); + CPU_SET(cpus->map[0].cpu, &cpu_set); sched_setaffinity(0, sizeof(cpu_set), &cpu_set); if (sched_setaffinity(0, sizeof(cpu_set), &cpu_set) < 0) { pr_debug("sched_setaffinity() failed on CPU %d: %s ", - cpus->map[0], str_error_r(errno, sbuf, sizeof(sbuf))); + cpus->map[0].cpu, str_error_r(errno, sbuf, sizeof(sbuf))); goto out_free_cpus; } diff --git a/tools/perf/tests/openat-syscall-all-cpus.c b/tools/perf/tests/openat-syscall-all-cpus.c index cd3dd463783f..1ab362323d25 100644 --- a/tools/perf/tests/openat-syscall-all-cpus.c +++ b/tools/perf/tests/openat-syscall-all-cpus.c @@ -22,7 +22,8 @@ static int test__openat_syscall_event_on_all_cpus(struct test_suite *test __maybe_unused, int subtest __maybe_unused) { - int err = -1, fd, cpu; + int err = -1, fd, idx; + struct perf_cpu cpu; struct perf_cpu_map *cpus; struct evsel *evsel; unsigned int nr_openat_calls = 111, i; @@ -58,23 +59,23 @@ static int test__openat_syscall_event_on_all_cpus(struct test_suite *test __mayb goto out_evsel_delete; } - for (cpu = 0; cpu < cpus->nr; ++cpu) { - unsigned int ncalls = nr_openat_calls + cpu; + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + unsigned int ncalls = nr_openat_calls + idx; /* * XXX eventually lift this restriction in a way that * keeps perf building on older glibc installations * without CPU_ALLOC. 1024 cpus in 2010 still seems * a reasonable upper limit tho :-) */ - if (cpus->map[cpu] >= CPU_SETSIZE) { - pr_debug("Ignoring CPU %d\n", cpus->map[cpu]); + if (cpu.cpu >= CPU_SETSIZE) { + pr_debug("Ignoring CPU %d\n", cpu.cpu); continue; } - CPU_SET(cpus->map[cpu], &cpu_set); + CPU_SET(cpu.cpu, &cpu_set); if (sched_setaffinity(0, sizeof(cpu_set), &cpu_set) < 0) { pr_debug("sched_setaffinity() failed on CPU %d: %s ", - cpus->map[cpu], + cpu.cpu, str_error_r(errno, sbuf, sizeof(sbuf))); goto out_close_fd; } @@ -82,37 +83,29 @@ static int test__openat_syscall_event_on_all_cpus(struct test_suite *test __mayb fd = openat(0, "/etc/passwd", O_RDONLY); close(fd); } - CPU_CLR(cpus->map[cpu], &cpu_set); + CPU_CLR(cpu.cpu, &cpu_set); } - /* - * Here we need to explicitly preallocate the counts, as if - * we use the auto allocation it will allocate just for 1 cpu, - * as we start by cpu 0. - */ - if (evsel__alloc_counts(evsel, cpus->nr, 1) < 0) { - pr_debug("evsel__alloc_counts(ncpus=%d)\n", cpus->nr); - goto out_close_fd; - } + evsel->core.cpus = perf_cpu_map__get(cpus); err = 0; - for (cpu = 0; cpu < cpus->nr; ++cpu) { + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { unsigned int expected; - if (cpus->map[cpu] >= CPU_SETSIZE) + if (cpu.cpu >= CPU_SETSIZE) continue; - if (evsel__read_on_cpu(evsel, cpu, 0) < 0) { + if (evsel__read_on_cpu(evsel, idx, 0) < 0) { pr_debug("evsel__read_on_cpu\n"); err = -1; break; } - expected = nr_openat_calls + cpu; - if (perf_counts(evsel->counts, cpu, 0)->val != expected) { + expected = nr_openat_calls + idx; + if (perf_counts(evsel->counts, idx, 0)->val != expected) { pr_debug("evsel__read_on_cpu: expected to intercept %d calls on cpu %d, got %" PRIu64 "\n", - expected, cpus->map[cpu], perf_counts(evsel->counts, cpu, 0)->val); + expected, cpu.cpu, perf_counts(evsel->counts, idx, 0)->val); err = -1; } } diff --git a/tools/perf/tests/shell/stat_all_metricgroups.sh b/tools/perf/tests/shell/stat_all_metricgroups.sh index de24d374ce24..cb35e488809a 100755 --- a/tools/perf/tests/shell/stat_all_metricgroups.sh +++ b/tools/perf/tests/shell/stat_all_metricgroups.sh @@ -6,7 +6,7 @@ set -e for m in $(perf list --raw-dump metricgroups); do echo "Testing $m" - perf stat -M "$m" true + perf stat -M "$m" -a true done exit 0 diff --git a/tools/perf/tests/sigtrap.c b/tools/perf/tests/sigtrap.c new file mode 100644 index 000000000000..1f147fe6595f --- /dev/null +++ b/tools/perf/tests/sigtrap.c @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Basic test for sigtrap support. + * + * Copyright (C) 2021, Google LLC. + */ + +#include <errno.h> +#include <stdint.h> +#include <stdlib.h> +#include <linux/hw_breakpoint.h> +#include <linux/string.h> +#include <pthread.h> +#include <signal.h> +#include <sys/ioctl.h> +#include <sys/syscall.h> +#include <unistd.h> + +#include "cloexec.h" +#include "debug.h" +#include "event.h" +#include "tests.h" +#include "../perf-sys.h" + +/* + * PowerPC and S390 do not support creation of instruction breakpoints using the + * perf_event interface. + * + * Just disable the test for these architectures until these issues are + * resolved. + */ +#if defined(__powerpc__) || defined(__s390x__) +#define BP_ACCOUNT_IS_SUPPORTED 0 +#else +#define BP_ACCOUNT_IS_SUPPORTED 1 +#endif + +#define NUM_THREADS 5 + +static struct { + int tids_want_signal; /* Which threads still want a signal. */ + int signal_count; /* Sanity check number of signals received. */ + volatile int iterate_on; /* Variable to set breakpoint on. */ + siginfo_t first_siginfo; /* First observed siginfo_t. */ +} ctx; + +#define TEST_SIG_DATA (~(unsigned long)(&ctx.iterate_on)) + +static struct perf_event_attr make_event_attr(void) +{ + struct perf_event_attr attr = { + .type = PERF_TYPE_BREAKPOINT, + .size = sizeof(attr), + .sample_period = 1, + .disabled = 1, + .bp_addr = (unsigned long)&ctx.iterate_on, + .bp_type = HW_BREAKPOINT_RW, + .bp_len = HW_BREAKPOINT_LEN_1, + .inherit = 1, /* Children inherit events ... */ + .inherit_thread = 1, /* ... but only cloned with CLONE_THREAD. */ + .remove_on_exec = 1, /* Required by sigtrap. */ + .sigtrap = 1, /* Request synchronous SIGTRAP on event. */ + .sig_data = TEST_SIG_DATA, + .exclude_kernel = 1, /* To allow */ + .exclude_hv = 1, /* running as !root */ + }; + return attr; +} + +static void +sigtrap_handler(int signum __maybe_unused, siginfo_t *info, void *ucontext __maybe_unused) +{ + if (!__atomic_fetch_add(&ctx.signal_count, 1, __ATOMIC_RELAXED)) + ctx.first_siginfo = *info; + __atomic_fetch_sub(&ctx.tids_want_signal, syscall(SYS_gettid), __ATOMIC_RELAXED); +} + +static void *test_thread(void *arg) +{ + pthread_barrier_t *barrier = (pthread_barrier_t *)arg; + pid_t tid = syscall(SYS_gettid); + int i; + + pthread_barrier_wait(barrier); + + __atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED); + for (i = 0; i < ctx.iterate_on - 1; i++) + __atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED); + + return NULL; +} + +static int run_test_threads(pthread_t *threads, pthread_barrier_t *barrier) +{ + int i; + + pthread_barrier_wait(barrier); + for (i = 0; i < NUM_THREADS; i++) + TEST_ASSERT_EQUAL("pthread_join() failed", pthread_join(threads[i], NULL), 0); + + return TEST_OK; +} + +static int run_stress_test(int fd, pthread_t *threads, pthread_barrier_t *barrier) +{ + int ret; + + ctx.iterate_on = 3000; + + TEST_ASSERT_EQUAL("misfired signal?", ctx.signal_count, 0); + TEST_ASSERT_EQUAL("enable failed", ioctl(fd, PERF_EVENT_IOC_ENABLE, 0), 0); + ret = run_test_threads(threads, barrier); + TEST_ASSERT_EQUAL("disable failed", ioctl(fd, PERF_EVENT_IOC_DISABLE, 0), 0); + + TEST_ASSERT_EQUAL("unexpected sigtraps", ctx.signal_count, NUM_THREADS * ctx.iterate_on); + TEST_ASSERT_EQUAL("missing signals or incorrectly delivered", ctx.tids_want_signal, 0); + TEST_ASSERT_VAL("unexpected si_addr", ctx.first_siginfo.si_addr == &ctx.iterate_on); +#if 0 /* FIXME: enable when libc's signal.h has si_perf_{type,data} */ + TEST_ASSERT_EQUAL("unexpected si_perf_type", ctx.first_siginfo.si_perf_type, + PERF_TYPE_BREAKPOINT); + TEST_ASSERT_EQUAL("unexpected si_perf_data", ctx.first_siginfo.si_perf_data, + TEST_SIG_DATA); +#endif + + return ret; +} + +static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused) +{ + struct perf_event_attr attr = make_event_attr(); + struct sigaction action = {}; + struct sigaction oldact; + pthread_t threads[NUM_THREADS]; + pthread_barrier_t barrier; + char sbuf[STRERR_BUFSIZE]; + int i, fd, ret = TEST_FAIL; + + if (!BP_ACCOUNT_IS_SUPPORTED) { + pr_debug("Test not supported on this architecture"); + return TEST_SKIP; + } + + pthread_barrier_init(&barrier, NULL, NUM_THREADS + 1); + + action.sa_flags = SA_SIGINFO | SA_NODEFER; + action.sa_sigaction = sigtrap_handler; + sigemptyset(&action.sa_mask); + if (sigaction(SIGTRAP, &action, &oldact)) { + pr_debug("FAILED sigaction(): %s\n", str_error_r(errno, sbuf, sizeof(sbuf))); + goto out; + } + + fd = sys_perf_event_open(&attr, 0, -1, -1, perf_event_open_cloexec_flag()); + if (fd < 0) { + pr_debug("FAILED sys_perf_event_open(): %s\n", str_error_r(errno, sbuf, sizeof(sbuf))); + goto out_restore_sigaction; + } + + for (i = 0; i < NUM_THREADS; i++) { + if (pthread_create(&threads[i], NULL, test_thread, &barrier)) { + pr_debug("FAILED pthread_create(): %s\n", str_error_r(errno, sbuf, sizeof(sbuf))); + goto out_close_perf_event; + } + } + + ret = run_stress_test(fd, threads, &barrier); + +out_close_perf_event: + close(fd); +out_restore_sigaction: + sigaction(SIGTRAP, &oldact, NULL); +out: + pthread_barrier_destroy(&barrier); + return ret; +} + +DEFINE_SUITE("Sigtrap", sigtrap); diff --git a/tools/perf/tests/stat.c b/tools/perf/tests/stat.c index 2eb096b5e6da..500974040fe3 100644 --- a/tools/perf/tests/stat.c +++ b/tools/perf/tests/stat.c @@ -87,7 +87,8 @@ static int test__synthesize_stat(struct test_suite *test __maybe_unused, int sub count.run = 300; TEST_ASSERT_VAL("failed to synthesize stat_config", - !perf_event__synthesize_stat(NULL, 1, 2, 3, &count, process_stat_event, NULL)); + !perf_event__synthesize_stat(NULL, (struct perf_cpu){.cpu = 1}, 2, 3, + &count, process_stat_event, NULL)); return 0; } diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index 8f65098110fc..5bbb8f6a48fc 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -146,6 +146,7 @@ DECLARE_SUITE(pe_file_parsing); DECLARE_SUITE(expand_cgroup_events); DECLARE_SUITE(perf_time_to_tsc); DECLARE_SUITE(dlfilter); +DECLARE_SUITE(sigtrap); /* * PowerPC and S390 do not support creation of instruction breakpoints using the diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c index 869986139146..c4ef0c7002f1 100644 --- a/tools/perf/tests/topology.c +++ b/tools/perf/tests/topology.c @@ -112,62 +112,83 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map) TEST_ASSERT_VAL("Session header CPU map not set", session->header.env.cpu); for (i = 0; i < session->header.env.nr_cpus_avail; i++) { - if (!cpu_map__has(map, i)) + struct perf_cpu cpu = { .cpu = i }; + + if (!perf_cpu_map__has(map, cpu)) continue; pr_debug("CPU %d, core %d, socket %d\n", i, session->header.env.cpu[i].core_id, session->header.env.cpu[i].socket_id); } + // Test that CPU ID contains socket, die, core and CPU + for (i = 0; i < map->nr; i++) { + id = aggr_cpu_id__cpu(perf_cpu_map__cpu(map, i), NULL); + TEST_ASSERT_VAL("Cpu map - CPU ID doesn't match", map->map[i].cpu == id.cpu.cpu); + + TEST_ASSERT_VAL("Cpu map - Core ID doesn't match", + session->header.env.cpu[map->map[i].cpu].core_id == id.core); + TEST_ASSERT_VAL("Cpu map - Socket ID doesn't match", + session->header.env.cpu[map->map[i].cpu].socket_id == id.socket); + + TEST_ASSERT_VAL("Cpu map - Die ID doesn't match", + session->header.env.cpu[map->map[i].cpu].die_id == id.die); + TEST_ASSERT_VAL("Cpu map - Node ID is set", id.node == -1); + TEST_ASSERT_VAL("Cpu map - Thread is set", id.thread == -1); + } + // Test that core ID contains socket, die and core for (i = 0; i < map->nr; i++) { - id = cpu_map__get_core(map, i, NULL); + id = aggr_cpu_id__core(perf_cpu_map__cpu(map, i), NULL); TEST_ASSERT_VAL("Core map - Core ID doesn't match", - session->header.env.cpu[map->map[i]].core_id == id.core); + session->header.env.cpu[map->map[i].cpu].core_id == id.core); TEST_ASSERT_VAL("Core map - Socket ID doesn't match", - session->header.env.cpu[map->map[i]].socket_id == id.socket); + session->header.env.cpu[map->map[i].cpu].socket_id == id.socket); TEST_ASSERT_VAL("Core map - Die ID doesn't match", - session->header.env.cpu[map->map[i]].die_id == id.die); + session->header.env.cpu[map->map[i].cpu].die_id == id.die); TEST_ASSERT_VAL("Core map - Node ID is set", id.node == -1); TEST_ASSERT_VAL("Core map - Thread is set", id.thread == -1); } // Test that die ID contains socket and die for (i = 0; i < map->nr; i++) { - id = cpu_map__get_die(map, i, NULL); + id = aggr_cpu_id__die(perf_cpu_map__cpu(map, i), NULL); TEST_ASSERT_VAL("Die map - Socket ID doesn't match", - session->header.env.cpu[map->map[i]].socket_id == id.socket); + session->header.env.cpu[map->map[i].cpu].socket_id == id.socket); TEST_ASSERT_VAL("Die map - Die ID doesn't match", - session->header.env.cpu[map->map[i]].die_id == id.die); + session->header.env.cpu[map->map[i].cpu].die_id == id.die); TEST_ASSERT_VAL("Die map - Node ID is set", id.node == -1); TEST_ASSERT_VAL("Die map - Core is set", id.core == -1); + TEST_ASSERT_VAL("Die map - CPU is set", id.cpu.cpu == -1); TEST_ASSERT_VAL("Die map - Thread is set", id.thread == -1); } // Test that socket ID contains only socket for (i = 0; i < map->nr; i++) { - id = cpu_map__get_socket(map, i, NULL); + id = aggr_cpu_id__socket(perf_cpu_map__cpu(map, i), NULL); TEST_ASSERT_VAL("Socket map - Socket ID doesn't match", - session->header.env.cpu[map->map[i]].socket_id == id.socket); + session->header.env.cpu[map->map[i].cpu].socket_id == id.socket); TEST_ASSERT_VAL("Socket map - Node ID is set", id.node == -1); TEST_ASSERT_VAL("Socket map - Die ID is set", id.die == -1); TEST_ASSERT_VAL("Socket map - Core is set", id.core == -1); + TEST_ASSERT_VAL("Socket map - CPU is set", id.cpu.cpu == -1); TEST_ASSERT_VAL("Socket map - Thread is set", id.thread == -1); } // Test that node ID contains only node for (i = 0; i < map->nr; i++) { - id = cpu_map__get_node(map, i, NULL); + id = aggr_cpu_id__node(perf_cpu_map__cpu(map, i), NULL); TEST_ASSERT_VAL("Node map - Node ID doesn't match", cpu__get_node(map->map[i]) == id.node); TEST_ASSERT_VAL("Node map - Socket is set", id.socket == -1); TEST_ASSERT_VAL("Node map - Die ID is set", id.die == -1); TEST_ASSERT_VAL("Node map - Core is set", id.core == -1); + TEST_ASSERT_VAL("Node map - CPU is set", id.cpu.cpu == -1); TEST_ASSERT_VAL("Node map - Thread is set", id.thread == -1); } perf_session__delete(session); diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index e81c2493efdf..44ba900828f6 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -966,6 +966,7 @@ int symbol__tui_annotate(struct map_symbol *ms, struct evsel *evsel, .opts = opts, }; int ret = -1, err; + int not_annotated = list_empty(¬es->src->source); if (sym == NULL) return -1; @@ -973,13 +974,15 @@ int symbol__tui_annotate(struct map_symbol *ms, struct evsel *evsel, if (ms->map->dso->annotate_warned) return -1; - err = symbol__annotate2(ms, evsel, opts, &browser.arch); - if (err) { - char msg[BUFSIZ]; - ms->map->dso->annotate_warned = true; - symbol__strerror_disassemble(ms, err, msg, sizeof(msg)); - ui__error("Couldn't annotate %s:\n%s", sym->name, msg); - goto out_free_offsets; + if (not_annotated) { + err = symbol__annotate2(ms, evsel, opts, &browser.arch); + if (err) { + char msg[BUFSIZ]; + ms->map->dso->annotate_warned = true; + symbol__strerror_disassemble(ms, err, msg, sizeof(msg)); + ui__error("Couldn't annotate %s:\n%s", sym->name, msg); + goto out_free_offsets; + } } ui_helpline__push("Press ESC to exit"); @@ -994,9 +997,11 @@ int symbol__tui_annotate(struct map_symbol *ms, struct evsel *evsel, ret = annotate_browser__run(&browser, evsel, hbt); - annotated_source__purge(notes->src); + if(not_annotated) + annotated_source__purge(notes->src); out_free_offsets: - zfree(¬es->offsets); + if(not_annotated) + zfree(¬es->offsets); return ret; } diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 2e5bfbb69960..2a403cefcaf2 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -1,3 +1,4 @@ +perf-y += arm64-frame-pointer-unwind-support.o perf-y += annotate.o perf-y += block-info.o perf-y += block-range.o @@ -144,6 +145,7 @@ perf-$(CONFIG_LIBBPF) += bpf-loader.o perf-$(CONFIG_LIBBPF) += bpf_map.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter_cgroup.o +perf-$(CONFIG_PERF_BPF_SKEL) += bpf_ftrace.o perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o perf-$(CONFIG_LIBELF) += symbol-elf.o perf-$(CONFIG_LIBELF) += probe-file.o diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c index 7b12bd7a3080..f1e30d566db3 100644 --- a/tools/perf/util/affinity.c +++ b/tools/perf/util/affinity.c @@ -11,7 +11,7 @@ static int get_cpu_set_size(void) { - int sz = cpu__max_cpu() + 8 - 1; + int sz = cpu__max_cpu().cpu + 8 - 1; /* * sched_getaffinity doesn't like masks smaller than the kernel. * Hopefully that's big enough. diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c index 3fc528c9270c..5e390a1a79ab 100644 --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c @@ -179,6 +179,8 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder) decoder->record.phys_addr = ip; break; case ARM_SPE_COUNTER: + if (idx == SPE_CNT_PKT_HDR_INDEX_TOTAL_LAT) + decoder->record.latency = payload; break; case ARM_SPE_CONTEXT: decoder->record.context_id = payload; diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h index 46a8556a9e95..69b31084d6be 100644 --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h @@ -33,6 +33,7 @@ struct arm_spe_record { enum arm_spe_sample_type type; int err; u32 op; + u32 latency; u64 from_ip; u64 to_ip; u64 timestamp; diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index fccac06b573a..d2b64e3f588b 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -58,6 +58,8 @@ struct arm_spe { u8 sample_branch; u8 sample_remote_access; u8 sample_memory; + u8 sample_instructions; + u64 instructions_sample_period; u64 l1d_miss_id; u64 l1d_access_id; @@ -68,6 +70,7 @@ struct arm_spe { u64 branch_miss_id; u64 remote_access_id; u64 memory_id; + u64 instructions_id; u64 kernel_start; @@ -90,6 +93,7 @@ struct arm_spe_queue { u64 time; u64 timestamp; struct thread *thread; + u64 period_instructions; }; static void arm_spe_dump(struct arm_spe *spe __maybe_unused, @@ -202,6 +206,7 @@ static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe, speq->pid = -1; speq->tid = -1; speq->cpu = -1; + speq->period_instructions = 0; /* params set */ params.get_trace = arm_spe_get_trace; @@ -330,6 +335,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq, sample.addr = record->virt_addr; sample.phys_addr = record->phys_addr; sample.data_src = data_src; + sample.weight = record->latency; return arm_spe_deliver_synth_event(spe, speq, event, &sample); } @@ -347,6 +353,36 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq, sample.id = spe_events_id; sample.stream_id = spe_events_id; sample.addr = record->to_ip; + sample.weight = record->latency; + + return arm_spe_deliver_synth_event(spe, speq, event, &sample); +} + +static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq, + u64 spe_events_id, u64 data_src) +{ + struct arm_spe *spe = speq->spe; + struct arm_spe_record *record = &speq->decoder->record; + union perf_event *event = speq->event_buf; + struct perf_sample sample = { .ip = 0, }; + + /* + * Handles perf instruction sampling period. + */ + speq->period_instructions++; + if (speq->period_instructions < spe->instructions_sample_period) + return 0; + speq->period_instructions = 0; + + arm_spe_prep_sample(spe, speq, event, &sample); + + sample.id = spe_events_id; + sample.stream_id = spe_events_id; + sample.addr = record->virt_addr; + sample.phys_addr = record->phys_addr; + sample.data_src = data_src; + sample.period = spe->instructions_sample_period; + sample.weight = record->latency; return arm_spe_deliver_synth_event(spe, speq, event, &sample); } @@ -480,6 +516,12 @@ static int arm_spe_sample(struct arm_spe_queue *speq) return err; } + if (spe->sample_instructions) { + err = arm_spe__synth_instruction_sample(speq, spe->instructions_id, data_src); + if (err) + return err; + } + return 0; } @@ -993,7 +1035,8 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session) attr.type = PERF_TYPE_HARDWARE; attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK; attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID | - PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC; + PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC | + PERF_SAMPLE_WEIGHT; if (spe->timeless_decoding) attr.sample_type &= ~(u64)PERF_SAMPLE_TIME; else @@ -1107,7 +1150,29 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session) return err; spe->memory_id = id; arm_spe_set_event_name(evlist, id, "memory"); + id += 1; + } + + if (spe->synth_opts.instructions) { + if (spe->synth_opts.period_type != PERF_ITRACE_PERIOD_INSTRUCTIONS) { + pr_warning("Only instruction-based sampling period is currently supported by Arm SPE.\n"); + goto synth_instructions_out; + } + if (spe->synth_opts.period > 1) + pr_warning("Arm SPE has a hardware-based sample period.\n" + "Additional instruction events will be discarded by --itrace\n"); + + spe->sample_instructions = true; + attr.config = PERF_COUNT_HW_INSTRUCTIONS; + attr.sample_period = spe->synth_opts.period; + spe->instructions_sample_period = attr.sample_period; + err = arm_spe_synth_event(session, &attr, id); + if (err) + return err; + spe->instructions_id = id; + arm_spe_set_event_name(evlist, id, "instructions"); } +synth_instructions_out: return 0; } diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c new file mode 100644 index 000000000000..2242a885fbd7 --- /dev/null +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c @@ -0,0 +1,63 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "arm64-frame-pointer-unwind-support.h" +#include "callchain.h" +#include "event.h" +#include "perf_regs.h" // SMPL_REG_MASK +#include "unwind.h" + +#define perf_event_arm_regs perf_event_arm64_regs +#include "../../arch/arm64/include/uapi/asm/perf_regs.h" +#undef perf_event_arm_regs + +struct entries { + u64 stack[2]; + size_t length; +}; + +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample) +{ + return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs + && sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_LR); +} + +static int add_entry(struct unwind_entry *entry, void *arg) +{ + struct entries *entries = arg; + + entries->stack[entries->length++] = entry->ip; + return 0; +} + +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx) +{ + int ret; + struct entries entries = {}; + struct regs_dump old_regs = sample->user_regs; + + if (!get_leaf_frame_caller_enabled(sample)) + return 0; + + /* + * If PC and SP are not recorded, get the value of PC from the stack + * and set its mask. SP is not used when doing the unwinding but it + * still needs to be set to prevent failures. + */ + + if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) { + sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC); + sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1]; + } + + if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) { + sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP); + sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0; + } + + ret = unwind__get_entries(add_entry, &entries, thread, sample, 2); + sample->user_regs = old_regs; + + if (ret || entries.length != 2) + return ret; + + return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1]; +} diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h new file mode 100644 index 000000000000..32af9ce94398 --- /dev/null +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H + +#include "event.h" +#include "thread.h" + +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx); + +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */ diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index c679394b898d..5632efc44738 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -123,7 +123,7 @@ int auxtrace_mmap__mmap(struct auxtrace_mmap *mm, mm->prev = 0; mm->idx = mp->idx; mm->tid = mp->tid; - mm->cpu = mp->cpu; + mm->cpu = mp->cpu.cpu; if (!mp->len) { mm->base = NULL; @@ -180,7 +180,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp, else mp->tid = -1; } else { - mp->cpu = -1; + mp->cpu.cpu = -1; mp->tid = perf_thread_map__pid(evlist->core.threads, idx); } } @@ -292,7 +292,7 @@ static int auxtrace_queues__queue_buffer(struct auxtrace_queues *queues, if (!queue->set) { queue->set = true; queue->tid = buffer->tid; - queue->cpu = buffer->cpu; + queue->cpu = buffer->cpu.cpu; } buffer->buffer_nr = queues->next_buffer_nr++; @@ -339,11 +339,11 @@ static int auxtrace_queues__split_buffer(struct auxtrace_queues *queues, return 0; } -static bool filter_cpu(struct perf_session *session, int cpu) +static bool filter_cpu(struct perf_session *session, struct perf_cpu cpu) { unsigned long *cpu_bitmap = session->itrace_synth_opts->cpu_bitmap; - return cpu_bitmap && cpu != -1 && !test_bit(cpu, cpu_bitmap); + return cpu_bitmap && cpu.cpu != -1 && !test_bit(cpu.cpu, cpu_bitmap); } static int auxtrace_queues__add_buffer(struct auxtrace_queues *queues, @@ -399,7 +399,7 @@ int auxtrace_queues__add_event(struct auxtrace_queues *queues, struct auxtrace_buffer buffer = { .pid = -1, .tid = event->auxtrace.tid, - .cpu = event->auxtrace.cpu, + .cpu = { event->auxtrace.cpu }, .data_offset = data_offset, .offset = event->auxtrace.offset, .reference = event->auxtrace.reference, diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index bbf0d78c6401..19910b9011f3 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -15,6 +15,7 @@ #include <linux/list.h> #include <linux/perf_event.h> #include <linux/types.h> +#include <internal/cpumap.h> #include <asm/bitsperlong.h> #include <asm/barrier.h> @@ -240,7 +241,7 @@ struct auxtrace_buffer { size_t size; pid_t pid; pid_t tid; - int cpu; + struct perf_cpu cpu; void *data; off_t data_offset; void *mmap_addr; @@ -350,7 +351,7 @@ struct auxtrace_mmap_params { int prot; int idx; pid_t tid; - int cpu; + struct perf_cpu cpu; }; /** diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c index 528aeb0ab79d..7ecfaac7536a 100644 --- a/tools/perf/util/bpf-loader.c +++ b/tools/perf/util/bpf-loader.c @@ -424,7 +424,7 @@ preproc_gen_prologue(struct bpf_program *prog, int n, size_t prologue_cnt = 0; int i, err; - if (IS_ERR(priv) || !priv || priv->is_tp) + if (IS_ERR_OR_NULL(priv) || priv->is_tp) goto errout; pev = &priv->pev; @@ -573,7 +573,7 @@ static int hook_load_preprocessor(struct bpf_program *prog) bool need_prologue = false; int err, i; - if (IS_ERR(priv) || !priv) { + if (IS_ERR_OR_NULL(priv)) { pr_debug("Internal error when hook preprocessor\n"); return -BPF_LOADER_ERRNO__INTERNAL; } @@ -645,8 +645,11 @@ int bpf__probe(struct bpf_object *obj) goto out; priv = bpf_program__priv(prog); - if (IS_ERR(priv) || !priv) { - err = PTR_ERR(priv); + if (IS_ERR_OR_NULL(priv)) { + if (!priv) + err = -BPF_LOADER_ERRNO__INTERNAL; + else + err = PTR_ERR(priv); goto out; } @@ -696,7 +699,7 @@ int bpf__unprobe(struct bpf_object *obj) struct bpf_prog_priv *priv = bpf_program__priv(prog); int i; - if (IS_ERR(priv) || !priv || priv->is_tp) + if (IS_ERR_OR_NULL(priv) || priv->is_tp) continue; for (i = 0; i < priv->pev.ntevs; i++) { @@ -754,7 +757,7 @@ int bpf__foreach_event(struct bpf_object *obj, struct perf_probe_event *pev; int i, fd; - if (IS_ERR(priv) || !priv) { + if (IS_ERR_OR_NULL(priv)) { pr_debug("bpf: failed to get private field\n"); return -BPF_LOADER_ERRNO__INTERNAL; } diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c index 5a97fd7d0a71..3ce8d03cb7ec 100644 --- a/tools/perf/util/bpf_counter.c +++ b/tools/perf/util/bpf_counter.c @@ -265,7 +265,7 @@ static int bpf_program_profiler__read(struct evsel *evsel) return 0; } -static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu, +static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu_map_idx, int fd) { struct bpf_prog_profiler_bpf *skel; @@ -277,7 +277,7 @@ static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu, assert(skel != NULL); ret = bpf_map_update_elem(bpf_map__fd(skel->maps.events), - &cpu, &fd, BPF_ANY); + &cpu_map_idx, &fd, BPF_ANY); if (ret) return ret; } @@ -554,7 +554,7 @@ static int bperf__load(struct evsel *evsel, struct target *target) filter_type == BPERF_FILTER_TGID) key = evsel->core.threads->map[i].pid; else if (filter_type == BPERF_FILTER_CPU) - key = evsel->core.cpus->map[i]; + key = evsel->core.cpus->map[i].cpu; else break; @@ -580,12 +580,12 @@ out: return err; } -static int bperf__install_pe(struct evsel *evsel, int cpu, int fd) +static int bperf__install_pe(struct evsel *evsel, int cpu_map_idx, int fd) { struct bperf_leader_bpf *skel = evsel->leader_skel; return bpf_map_update_elem(bpf_map__fd(skel->maps.events), - &cpu, &fd, BPF_ANY); + &cpu_map_idx, &fd, BPF_ANY); } /* @@ -598,7 +598,7 @@ static int bperf_sync_counters(struct evsel *evsel) num_cpu = all_cpu_map->nr; for (i = 0; i < num_cpu; i++) { - cpu = all_cpu_map->map[i]; + cpu = all_cpu_map->map[i].cpu; bperf_trigger_reading(evsel->bperf_leader_prog_fd, cpu); } return 0; @@ -619,15 +619,17 @@ static int bperf__disable(struct evsel *evsel) static int bperf__read(struct evsel *evsel) { struct bperf_follower_bpf *skel = evsel->follower_skel; - __u32 num_cpu_bpf = cpu__max_cpu(); + __u32 num_cpu_bpf = cpu__max_cpu().cpu; struct bpf_perf_event_value values[num_cpu_bpf]; int reading_map_fd, err = 0; - __u32 i, j, num_cpu; + __u32 i; + int j; bperf_sync_counters(evsel); reading_map_fd = bpf_map__fd(skel->maps.accum_readings); for (i = 0; i < bpf_map__max_entries(skel->maps.accum_readings); i++) { + struct perf_cpu entry; __u32 cpu; err = bpf_map_lookup_elem(reading_map_fd, &i, values); @@ -637,16 +639,15 @@ static int bperf__read(struct evsel *evsel) case BPERF_FILTER_GLOBAL: assert(i == 0); - num_cpu = all_cpu_map->nr; - for (j = 0; j < num_cpu; j++) { - cpu = all_cpu_map->map[j]; + perf_cpu_map__for_each_cpu(entry, j, all_cpu_map) { + cpu = entry.cpu; perf_counts(evsel->counts, cpu, 0)->val = values[cpu].counter; perf_counts(evsel->counts, cpu, 0)->ena = values[cpu].enabled; perf_counts(evsel->counts, cpu, 0)->run = values[cpu].running; } break; case BPERF_FILTER_CPU: - cpu = evsel->core.cpus->map[i]; + cpu = evsel->core.cpus->map[i].cpu; perf_counts(evsel->counts, i, 0)->val = values[cpu].counter; perf_counts(evsel->counts, i, 0)->ena = values[cpu].enabled; perf_counts(evsel->counts, i, 0)->run = values[cpu].running; @@ -771,11 +772,11 @@ static inline bool bpf_counter_skip(struct evsel *evsel) evsel->follower_skel == NULL; } -int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd) +int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd) { if (bpf_counter_skip(evsel)) return 0; - return evsel->bpf_counter_ops->install_pe(evsel, cpu, fd); + return evsel->bpf_counter_ops->install_pe(evsel, cpu_map_idx, fd); } int bpf_counter__load(struct evsel *evsel, struct target *target) diff --git a/tools/perf/util/bpf_counter.h b/tools/perf/util/bpf_counter.h index 65ebaa6694fb..4dbf26408b69 100644 --- a/tools/perf/util/bpf_counter.h +++ b/tools/perf/util/bpf_counter.h @@ -16,7 +16,7 @@ typedef int (*bpf_counter_evsel_op)(struct evsel *evsel); typedef int (*bpf_counter_evsel_target_op)(struct evsel *evsel, struct target *target); typedef int (*bpf_counter_evsel_install_pe_op)(struct evsel *evsel, - int cpu, + int cpu_map_idx, int fd); struct bpf_counter_ops { @@ -40,7 +40,7 @@ int bpf_counter__enable(struct evsel *evsel); int bpf_counter__disable(struct evsel *evsel); int bpf_counter__read(struct evsel *evsel); void bpf_counter__destroy(struct evsel *evsel); -int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd); +int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd); #else /* HAVE_BPF_SKEL */ diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c index cbc6c2bca488..631e34a0b66f 100644 --- a/tools/perf/util/bpf_counter_cgroup.c +++ b/tools/perf/util/bpf_counter_cgroup.c @@ -48,7 +48,7 @@ static int bperf_load_program(struct evlist *evlist) struct cgroup *cgrp, *leader_cgrp; __u32 i, cpu; __u32 nr_cpus = evlist->core.all_cpus->nr; - int total_cpus = cpu__max_cpu(); + int total_cpus = cpu__max_cpu().cpu; int map_size, map_fd; int prog_fd, err; @@ -125,7 +125,7 @@ static int bperf_load_program(struct evlist *evlist) for (cpu = 0; cpu < nr_cpus; cpu++) { int fd = FD(evsel, cpu); __u32 idx = evsel->core.idx * total_cpus + - evlist->core.all_cpus->map[cpu]; + evlist->core.all_cpus->map[cpu].cpu; err = bpf_map_update_elem(map_fd, &idx, &fd, BPF_ANY); @@ -212,7 +212,7 @@ static int bperf_cgrp__sync_counters(struct evlist *evlist) int prog_fd = bpf_program__fd(skel->progs.trigger_read); for (i = 0; i < nr_cpus; i++) { - cpu = evlist->core.all_cpus->map[i]; + cpu = evlist->core.all_cpus->map[i].cpu; bperf_trigger_reading(prog_fd, cpu); } @@ -245,7 +245,7 @@ static int bperf_cgrp__read(struct evsel *evsel) { struct evlist *evlist = evsel->evlist; int i, cpu, nr_cpus = evlist->core.all_cpus->nr; - int total_cpus = cpu__max_cpu(); + int total_cpus = cpu__max_cpu().cpu; struct perf_counts_values *counts; struct bpf_perf_event_value *values; int reading_map_fd, err = 0; @@ -272,7 +272,7 @@ static int bperf_cgrp__read(struct evsel *evsel) } for (i = 0; i < nr_cpus; i++) { - cpu = evlist->core.all_cpus->map[i]; + cpu = evlist->core.all_cpus->map[i].cpu; counts = perf_counts(evsel->counts, i, 0); counts->val = values[cpu].counter; diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c new file mode 100644 index 000000000000..d756cc66eef3 --- /dev/null +++ b/tools/perf/util/bpf_ftrace.c @@ -0,0 +1,152 @@ +#include <stdio.h> +#include <fcntl.h> +#include <stdint.h> +#include <stdlib.h> + +#include <linux/err.h> + +#include "util/ftrace.h" +#include "util/cpumap.h" +#include "util/thread_map.h" +#include "util/debug.h" +#include "util/evlist.h" +#include "util/bpf_counter.h" + +#include "util/bpf_skel/func_latency.skel.h" + +static struct func_latency_bpf *skel; + +int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace) +{ + int fd, err; + int i, ncpus = 1, ntasks = 1; + struct filter_entry *func; + + if (!list_is_singular(&ftrace->filters)) { + pr_err("ERROR: %s target function(s).\n", + list_empty(&ftrace->filters) ? "No" : "Too many"); + return -1; + } + + func = list_first_entry(&ftrace->filters, struct filter_entry, list); + + skel = func_latency_bpf__open(); + if (!skel) { + pr_err("Failed to open func latency skeleton\n"); + return -1; + } + + /* don't need to set cpu filter for system-wide mode */ + if (ftrace->target.cpu_list) { + ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus); + bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus); + } + + if (target__has_task(&ftrace->target) || target__none(&ftrace->target)) { + ntasks = perf_thread_map__nr(ftrace->evlist->core.threads); + bpf_map__set_max_entries(skel->maps.task_filter, ntasks); + } + + set_max_rlimit(); + + err = func_latency_bpf__load(skel); + if (err) { + pr_err("Failed to load func latency skeleton\n"); + goto out; + } + + if (ftrace->target.cpu_list) { + u32 cpu; + u8 val = 1; + + skel->bss->has_cpu = 1; + fd = bpf_map__fd(skel->maps.cpu_filter); + + for (i = 0; i < ncpus; i++) { + cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu; + bpf_map_update_elem(fd, &cpu, &val, BPF_ANY); + } + } + + if (target__has_task(&ftrace->target) || target__none(&ftrace->target)) { + u32 pid; + u8 val = 1; + + skel->bss->has_task = 1; + fd = bpf_map__fd(skel->maps.task_filter); + + for (i = 0; i < ntasks; i++) { + pid = perf_thread_map__pid(ftrace->evlist->core.threads, i); + bpf_map_update_elem(fd, &pid, &val, BPF_ANY); + } + } + + skel->links.func_begin = bpf_program__attach_kprobe(skel->progs.func_begin, + false, func->name); + if (IS_ERR(skel->links.func_begin)) { + pr_err("Failed to attach fentry program\n"); + err = PTR_ERR(skel->links.func_begin); + goto out; + } + + skel->links.func_end = bpf_program__attach_kprobe(skel->progs.func_end, + true, func->name); + if (IS_ERR(skel->links.func_end)) { + pr_err("Failed to attach fexit program\n"); + err = PTR_ERR(skel->links.func_end); + goto out; + } + + /* XXX: we don't actually use this fd - just for poll() */ + return open("/dev/null", O_RDONLY); + +out: + return err; +} + +int perf_ftrace__latency_start_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + skel->bss->enabled = 1; + return 0; +} + +int perf_ftrace__latency_stop_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + skel->bss->enabled = 0; + return 0; +} + +int perf_ftrace__latency_read_bpf(struct perf_ftrace *ftrace __maybe_unused, + int buckets[]) +{ + int i, fd, err; + u32 idx; + u64 *hist; + int ncpus = cpu__max_cpu().cpu; + + fd = bpf_map__fd(skel->maps.latency); + + hist = calloc(ncpus, sizeof(*hist)); + if (hist == NULL) + return -ENOMEM; + + for (idx = 0; idx < NUM_BUCKET; idx++) { + err = bpf_map_lookup_elem(fd, &idx, hist); + if (err) { + buckets[idx] = 0; + continue; + } + + for (i = 0; i < ncpus; i++) + buckets[idx] += hist[i]; + } + + free(hist); + return 0; +} + +int perf_ftrace__latency_cleanup_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + func_latency_bpf__destroy(skel); + return 0; +} diff --git a/tools/perf/util/bpf_skel/func_latency.bpf.c b/tools/perf/util/bpf_skel/func_latency.bpf.c new file mode 100644 index 000000000000..ea94187fe443 --- /dev/null +++ b/tools/perf/util/bpf_skel/func_latency.bpf.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +// Copyright (c) 2021 Google +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +// This should be in sync with "util/ftrace.h" +#define NUM_BUCKET 22 + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u64)); + __uint(value_size, sizeof(__u64)); + __uint(max_entries, 10000); +} functime SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} cpu_filter SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} task_filter SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u64)); + __uint(max_entries, NUM_BUCKET); +} latency SEC(".maps"); + + +int enabled = 0; +int has_cpu = 0; +int has_task = 0; + +SEC("kprobe/func") +int BPF_PROG(func_begin) +{ + __u64 key, now; + + if (!enabled) + return 0; + + key = bpf_get_current_pid_tgid(); + + if (has_cpu) { + __u32 cpu = bpf_get_smp_processor_id(); + __u8 *ok; + + ok = bpf_map_lookup_elem(&cpu_filter, &cpu); + if (!ok) + return 0; + } + + if (has_task) { + __u32 pid = key & 0xffffffff; + __u8 *ok; + + ok = bpf_map_lookup_elem(&task_filter, &pid); + if (!ok) + return 0; + } + + now = bpf_ktime_get_ns(); + + // overwrite timestamp for nested functions + bpf_map_update_elem(&functime, &key, &now, BPF_ANY); + return 0; +} + +SEC("kretprobe/func") +int BPF_PROG(func_end) +{ + __u64 tid; + __u64 *start; + + if (!enabled) + return 0; + + tid = bpf_get_current_pid_tgid(); + + start = bpf_map_lookup_elem(&functime, &tid); + if (start) { + __s64 delta = bpf_ktime_get_ns() - *start; + __u32 key; + __u64 *hist; + + bpf_map_delete_elem(&functime, &tid); + + if (delta < 0) + return 0; + + // calculate index using delta in usec + for (key = 0; key < (NUM_BUCKET - 1); key++) { + if (delta < ((1000UL) << key)) + break; + } + + hist = bpf_map_lookup_elem(&latency, &key); + if (!hist) + return 0; + + *hist += 1; + } + + return 0; +} diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c index 8e2777133bd9..131207b91d15 100644 --- a/tools/perf/util/callchain.c +++ b/tools/perf/util/callchain.c @@ -1600,7 +1600,7 @@ void callchain_cursor_reset(struct callchain_cursor *cursor) map__zput(node->ms.map); } -void callchain_param_setup(u64 sample_type) +void callchain_param_setup(u64 sample_type, const char *arch) { if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) { if ((sample_type & PERF_SAMPLE_REGS_USER) && @@ -1612,6 +1612,18 @@ void callchain_param_setup(u64 sample_type) else callchain_param.record_mode = CALLCHAIN_FP; } + + /* + * It's necessary to use libunwind to reliably determine the caller of + * a leaf function on aarch64, as otherwise we cannot know whether to + * start from the LR or FP. + * + * Always starting from the LR can result in duplicate or entirely + * erroneous entries. Always skipping the LR and starting from the FP + * can result in missing entries. + */ + if (callchain_param.record_mode == CALLCHAIN_FP && !strcmp(arch, "arm64")) + dwarf_callchain_users = true; } static bool chain_match(struct callchain_list *base_chain, diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h index 5824134f983b..d95615daed73 100644 --- a/tools/perf/util/callchain.h +++ b/tools/perf/util/callchain.h @@ -280,6 +280,8 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused, } #endif +void arch__add_leaf_frame_record_opts(struct record_opts *opts); + char *callchain_list__sym_name(struct callchain_list *cl, char *bf, size_t bfsize, bool show_dso); char *callchain_node__scnprintf_value(struct callchain_node *node, @@ -298,7 +300,7 @@ int callchain_branch_counts(struct callchain_root *root, u64 *branch_count, u64 *predicted_count, u64 *abort_count, u64 *cycles_count); -void callchain_param_setup(u64 sample_type); +void callchain_param_setup(u64 sample_type, const char *arch); bool callchain_cnode_matched(struct callchain_node *base_cnode, struct callchain_node *pair_cnode); diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c index 582f3aeaf5e4..2b81707b9dba 100644 --- a/tools/perf/util/counts.c +++ b/tools/perf/util/counts.c @@ -4,6 +4,7 @@ #include <string.h> #include "evsel.h" #include "counts.h" +#include <perf/threadmap.h> #include <linux/zalloc.h> struct perf_counts *perf_counts__new(int ncpus, int nthreads) @@ -55,9 +56,12 @@ void evsel__reset_counts(struct evsel *evsel) perf_counts__reset(evsel->counts); } -int evsel__alloc_counts(struct evsel *evsel, int ncpus, int nthreads) +int evsel__alloc_counts(struct evsel *evsel) { - evsel->counts = perf_counts__new(ncpus, nthreads); + struct perf_cpu_map *cpus = evsel__cpus(evsel); + int nthreads = perf_thread_map__nr(evsel->core.threads); + + evsel->counts = perf_counts__new(cpus ? cpus->nr : 1, nthreads); return evsel->counts != NULL ? 0 : -ENOMEM; } diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h index 7ff36bf6d644..5de275194f2b 100644 --- a/tools/perf/util/counts.h +++ b/tools/perf/util/counts.h @@ -18,21 +18,21 @@ struct perf_counts { static inline struct perf_counts_values* -perf_counts(struct perf_counts *counts, int cpu, int thread) +perf_counts(struct perf_counts *counts, int cpu_map_idx, int thread) { - return xyarray__entry(counts->values, cpu, thread); + return xyarray__entry(counts->values, cpu_map_idx, thread); } static inline bool -perf_counts__is_loaded(struct perf_counts *counts, int cpu, int thread) +perf_counts__is_loaded(struct perf_counts *counts, int cpu_map_idx, int thread) { - return *((bool *) xyarray__entry(counts->loaded, cpu, thread)); + return *((bool *) xyarray__entry(counts->loaded, cpu_map_idx, thread)); } static inline void -perf_counts__set_loaded(struct perf_counts *counts, int cpu, int thread, bool loaded) +perf_counts__set_loaded(struct perf_counts *counts, int cpu_map_idx, int thread, bool loaded) { - *((bool *) xyarray__entry(counts->loaded, cpu, thread)) = loaded; + *((bool *) xyarray__entry(counts->loaded, cpu_map_idx, thread)) = loaded; } struct perf_counts *perf_counts__new(int ncpus, int nthreads); @@ -40,7 +40,7 @@ void perf_counts__delete(struct perf_counts *counts); void perf_counts__reset(struct perf_counts *counts); void evsel__reset_counts(struct evsel *evsel); -int evsel__alloc_counts(struct evsel *evsel, int ncpus, int nthreads); +int evsel__alloc_counts(struct evsel *evsel); void evsel__free_counts(struct evsel *evsel); #endif /* __PERF_COUNTS_H */ diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c index 87d3eca9b872..12b2243222b0 100644 --- a/tools/perf/util/cpumap.c +++ b/tools/perf/util/cpumap.c @@ -13,9 +13,13 @@ #include <linux/ctype.h> #include <linux/zalloc.h> -static int max_cpu_num; -static int max_present_cpu_num; +static struct perf_cpu max_cpu_num; +static struct perf_cpu max_present_cpu_num; static int max_node_num; +/** + * The numa node X as read from /sys/devices/system/node/nodeX indexed by the + * CPU number. + */ static int *cpunode_map; static struct perf_cpu_map *cpu_map__from_entries(struct cpu_map_entries *cpus) @@ -33,9 +37,9 @@ static struct perf_cpu_map *cpu_map__from_entries(struct cpu_map_entries *cpus) * otherwise it would become 65535. */ if (cpus->cpu[i] == (u16) -1) - map->map[i] = -1; + map->map[i].cpu = -1; else - map->map[i] = (int) cpus->cpu[i]; + map->map[i].cpu = (int) cpus->cpu[i]; } } @@ -54,7 +58,7 @@ static struct perf_cpu_map *cpu_map__from_mask(struct perf_record_record_cpu_map int cpu, i = 0; for_each_set_bit(cpu, mask->mask, nbits) - map->map[i++] = cpu; + map->map[i++].cpu = cpu; } return map; @@ -87,7 +91,7 @@ struct perf_cpu_map *perf_cpu_map__empty_new(int nr) cpus->nr = nr; for (i = 0; i < nr; i++) - cpus->map[i] = -1; + cpus->map[i].cpu = -1; refcount_set(&cpus->refcnt, 1); } @@ -104,7 +108,7 @@ struct cpu_aggr_map *cpu_aggr_map__empty_new(int nr) cpus->nr = nr; for (i = 0; i < nr; i++) - cpus->map[i] = cpu_map__empty_aggr_cpu_id(); + cpus->map[i] = aggr_cpu_id__empty(); refcount_set(&cpus->refcnt, 1); } @@ -122,28 +126,21 @@ static int cpu__get_topology_int(int cpu, const char *name, int *value) return sysfs__read_int(path, value); } -int cpu_map__get_socket_id(int cpu) +int cpu__get_socket_id(struct perf_cpu cpu) { - int value, ret = cpu__get_topology_int(cpu, "physical_package_id", &value); + int value, ret = cpu__get_topology_int(cpu.cpu, "physical_package_id", &value); return ret ?: value; } -struct aggr_cpu_id cpu_map__get_socket(struct perf_cpu_map *map, int idx, - void *data __maybe_unused) +struct aggr_cpu_id aggr_cpu_id__socket(struct perf_cpu cpu, void *data __maybe_unused) { - int cpu; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id = aggr_cpu_id__empty(); - if (idx > map->nr) - return id; - - cpu = map->map[idx]; - - id.socket = cpu_map__get_socket_id(cpu); + id.socket = cpu__get_socket_id(cpu); return id; } -static int cmp_aggr_cpu_id(const void *a_pointer, const void *b_pointer) +static int aggr_cpu_id__cmp(const void *a_pointer, const void *b_pointer) { struct aggr_cpu_id *a = (struct aggr_cpu_id *)a_pointer; struct aggr_cpu_id *b = (struct aggr_cpu_id *)b_pointer; @@ -160,57 +157,64 @@ static int cmp_aggr_cpu_id(const void *a_pointer, const void *b_pointer) return a->thread - b->thread; } -int cpu_map__build_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **res, - struct aggr_cpu_id (*f)(struct perf_cpu_map *map, int cpu, void *data), - void *data) +struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus, + aggr_cpu_id_get_t get_id, + void *data) { - int nr = cpus->nr; - struct cpu_aggr_map *c = cpu_aggr_map__empty_new(nr); - int cpu, s2; - struct aggr_cpu_id s1; + int idx; + struct perf_cpu cpu; + struct cpu_aggr_map *c = cpu_aggr_map__empty_new(cpus->nr); if (!c) - return -1; + return NULL; /* Reset size as it may only be partially filled */ c->nr = 0; - for (cpu = 0; cpu < nr; cpu++) { - s1 = f(cpus, cpu, data); - for (s2 = 0; s2 < c->nr; s2++) { - if (cpu_map__compare_aggr_cpu_id(s1, c->map[s2])) + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + bool duplicate = false; + struct aggr_cpu_id cpu_id = get_id(cpu, data); + + for (int j = 0; j < c->nr; j++) { + if (aggr_cpu_id__equal(&cpu_id, &c->map[j])) { + duplicate = true; break; + } } - if (s2 == c->nr) { - c->map[c->nr] = s1; + if (!duplicate) { + c->map[c->nr] = cpu_id; c->nr++; } } + /* Trim. */ + if (c->nr != cpus->nr) { + struct cpu_aggr_map *trimmed_c = + realloc(c, + sizeof(struct cpu_aggr_map) + sizeof(struct aggr_cpu_id) * c->nr); + + if (trimmed_c) + c = trimmed_c; + } /* ensure we process id in increasing order */ - qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), cmp_aggr_cpu_id); + qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp); + + return c; - *res = c; - return 0; } -int cpu_map__get_die_id(int cpu) +int cpu__get_die_id(struct perf_cpu cpu) { - int value, ret = cpu__get_topology_int(cpu, "die_id", &value); + int value, ret = cpu__get_topology_int(cpu.cpu, "die_id", &value); return ret ?: value; } -struct aggr_cpu_id cpu_map__get_die(struct perf_cpu_map *map, int idx, void *data) +struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data) { - int cpu, die; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id; + int die; - if (idx > map->nr) - return id; - - cpu = map->map[idx]; - - die = cpu_map__get_die_id(cpu); + die = cpu__get_die_id(cpu); /* There is no die_id on legacy system. */ if (die == -1) die = 0; @@ -220,79 +224,59 @@ struct aggr_cpu_id cpu_map__get_die(struct perf_cpu_map *map, int idx, void *dat * with the socket ID and then add die to * make a unique ID. */ - id = cpu_map__get_socket(map, idx, data); - if (cpu_map__aggr_cpu_id_is_empty(id)) + id = aggr_cpu_id__socket(cpu, data); + if (aggr_cpu_id__is_empty(&id)) return id; id.die = die; return id; } -int cpu_map__get_core_id(int cpu) +int cpu__get_core_id(struct perf_cpu cpu) { - int value, ret = cpu__get_topology_int(cpu, "core_id", &value); + int value, ret = cpu__get_topology_int(cpu.cpu, "core_id", &value); return ret ?: value; } -int cpu_map__get_node_id(int cpu) -{ - return cpu__get_node(cpu); -} - -struct aggr_cpu_id cpu_map__get_core(struct perf_cpu_map *map, int idx, void *data) +struct aggr_cpu_id aggr_cpu_id__core(struct perf_cpu cpu, void *data) { - int cpu; - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); - - if (idx > map->nr) - return id; + struct aggr_cpu_id id; + int core = cpu__get_core_id(cpu); - cpu = map->map[idx]; - - cpu = cpu_map__get_core_id(cpu); - - /* cpu_map__get_die returns a struct with socket and die set*/ - id = cpu_map__get_die(map, idx, data); - if (cpu_map__aggr_cpu_id_is_empty(id)) + /* aggr_cpu_id__die returns a struct with socket and die set. */ + id = aggr_cpu_id__die(cpu, data); + if (aggr_cpu_id__is_empty(&id)) return id; /* * core_id is relative to socket and die, we need a global id. * So we combine the result from cpu_map__get_die with the core id */ - id.core = cpu; + id.core = core; return id; + } -struct aggr_cpu_id cpu_map__get_node(struct perf_cpu_map *map, int idx, void *data __maybe_unused) +struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data) { - struct aggr_cpu_id id = cpu_map__empty_aggr_cpu_id(); + struct aggr_cpu_id id; - if (idx < 0 || idx >= map->nr) + /* aggr_cpu_id__core returns a struct with socket, die and core set. */ + id = aggr_cpu_id__core(cpu, data); + if (aggr_cpu_id__is_empty(&id)) return id; - id.node = cpu_map__get_node_id(map->map[idx]); + id.cpu = cpu; return id; -} -int cpu_map__build_socket_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **sockp) -{ - return cpu_map__build_map(cpus, sockp, cpu_map__get_socket, NULL); } -int cpu_map__build_die_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **diep) +struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data __maybe_unused) { - return cpu_map__build_map(cpus, diep, cpu_map__get_die, NULL); -} - -int cpu_map__build_core_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **corep) -{ - return cpu_map__build_map(cpus, corep, cpu_map__get_core, NULL); -} + struct aggr_cpu_id id = aggr_cpu_id__empty(); -int cpu_map__build_node_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **numap) -{ - return cpu_map__build_map(cpus, numap, cpu_map__get_node, NULL); + id.node = cpu__get_node(cpu); + return id; } /* setup simple routines to easily access node numbers given a cpu number */ @@ -335,8 +319,8 @@ static void set_max_cpu_num(void) int ret = -1; /* set up default */ - max_cpu_num = 4096; - max_present_cpu_num = 4096; + max_cpu_num.cpu = 4096; + max_present_cpu_num.cpu = 4096; mnt = sysfs__mountpoint(); if (!mnt) @@ -349,7 +333,7 @@ static void set_max_cpu_num(void) goto out; } - ret = get_max_num(path, &max_cpu_num); + ret = get_max_num(path, &max_cpu_num.cpu); if (ret) goto out; @@ -360,11 +344,11 @@ static void set_max_cpu_num(void) goto out; } - ret = get_max_num(path, &max_present_cpu_num); + ret = get_max_num(path, &max_present_cpu_num.cpu); out: if (ret) - pr_err("Failed to read max cpus, using default of %d\n", max_cpu_num); + pr_err("Failed to read max cpus, using default of %d\n", max_cpu_num.cpu); } /* Determine highest possible node in the system for sparse allocation */ @@ -403,31 +387,31 @@ int cpu__max_node(void) return max_node_num; } -int cpu__max_cpu(void) +struct perf_cpu cpu__max_cpu(void) { - if (unlikely(!max_cpu_num)) + if (unlikely(!max_cpu_num.cpu)) set_max_cpu_num(); return max_cpu_num; } -int cpu__max_present_cpu(void) +struct perf_cpu cpu__max_present_cpu(void) { - if (unlikely(!max_present_cpu_num)) + if (unlikely(!max_present_cpu_num.cpu)) set_max_cpu_num(); return max_present_cpu_num; } -int cpu__get_node(int cpu) +int cpu__get_node(struct perf_cpu cpu) { if (unlikely(cpunode_map == NULL)) { pr_debug("cpu_map not initialized\n"); return -1; } - return cpunode_map[cpu]; + return cpunode_map[cpu.cpu]; } static int init_cpunode_map(void) @@ -437,13 +421,13 @@ static int init_cpunode_map(void) set_max_cpu_num(); set_max_node_num(); - cpunode_map = calloc(max_cpu_num, sizeof(int)); + cpunode_map = calloc(max_cpu_num.cpu, sizeof(int)); if (!cpunode_map) { pr_err("%s: calloc failed\n", __func__); return -1; } - for (i = 0; i < max_cpu_num; i++) + for (i = 0; i < max_cpu_num.cpu; i++) cpunode_map[i] = -1; return 0; @@ -502,47 +486,39 @@ int cpu__setup_cpunode_map(void) return 0; } -bool cpu_map__has(struct perf_cpu_map *cpus, int cpu) -{ - return perf_cpu_map__idx(cpus, cpu) != -1; -} - -int cpu_map__cpu(struct perf_cpu_map *cpus, int idx) -{ - return cpus->map[idx]; -} - size_t cpu_map__snprint(struct perf_cpu_map *map, char *buf, size_t size) { - int i, cpu, start = -1; + int i, start = -1; bool first = true; size_t ret = 0; #define COMMA first ? "" : "," for (i = 0; i < map->nr + 1; i++) { + struct perf_cpu cpu = { .cpu = INT_MAX }; bool last = i == map->nr; - cpu = last ? INT_MAX : map->map[i]; + if (!last) + cpu = map->map[i]; if (start == -1) { start = i; if (last) { ret += snprintf(buf + ret, size - ret, "%s%d", COMMA, - map->map[i]); + map->map[i].cpu); } - } else if (((i - start) != (cpu - map->map[start])) || last) { + } else if (((i - start) != (cpu.cpu - map->map[start].cpu)) || last) { int end = i - 1; if (start == end) { ret += snprintf(buf + ret, size - ret, "%s%d", COMMA, - map->map[start]); + map->map[start].cpu); } else { ret += snprintf(buf + ret, size - ret, "%s%d-%d", COMMA, - map->map[start], map->map[end]); + map->map[start].cpu, map->map[end].cpu); } first = false; start = i; @@ -569,23 +545,23 @@ size_t cpu_map__snprint_mask(struct perf_cpu_map *map, char *buf, size_t size) int i, cpu; char *ptr = buf; unsigned char *bitmap; - int last_cpu = cpu_map__cpu(map, map->nr - 1); + struct perf_cpu last_cpu = perf_cpu_map__cpu(map, map->nr - 1); if (buf == NULL) return 0; - bitmap = zalloc(last_cpu / 8 + 1); + bitmap = zalloc(last_cpu.cpu / 8 + 1); if (bitmap == NULL) { buf[0] = '\0'; return 0; } for (i = 0; i < map->nr; i++) { - cpu = cpu_map__cpu(map, i); + cpu = perf_cpu_map__cpu(map, i).cpu; bitmap[cpu / 8] |= 1 << (cpu % 8); } - for (cpu = last_cpu / 4 * 4; cpu >= 0; cpu -= 4) { + for (cpu = last_cpu.cpu / 4 * 4; cpu >= 0; cpu -= 4) { unsigned char bits = bitmap[cpu / 8]; if (cpu % 8) @@ -614,32 +590,35 @@ const struct perf_cpu_map *cpu_map__online(void) /* thread unsafe */ return online; } -bool cpu_map__compare_aggr_cpu_id(struct aggr_cpu_id a, struct aggr_cpu_id b) +bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b) { - return a.thread == b.thread && - a.node == b.node && - a.socket == b.socket && - a.die == b.die && - a.core == b.core; + return a->thread == b->thread && + a->node == b->node && + a->socket == b->socket && + a->die == b->die && + a->core == b->core && + a->cpu.cpu == b->cpu.cpu; } -bool cpu_map__aggr_cpu_id_is_empty(struct aggr_cpu_id a) +bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a) { - return a.thread == -1 && - a.node == -1 && - a.socket == -1 && - a.die == -1 && - a.core == -1; + return a->thread == -1 && + a->node == -1 && + a->socket == -1 && + a->die == -1 && + a->core == -1 && + a->cpu.cpu == -1; } -struct aggr_cpu_id cpu_map__empty_aggr_cpu_id(void) +struct aggr_cpu_id aggr_cpu_id__empty(void) { struct aggr_cpu_id ret = { .thread = -1, .node = -1, .socket = -1, .die = -1, - .core = -1 + .core = -1, + .cpu = (struct perf_cpu){ .cpu = -1 }, }; return ret; } diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h index a27eeaf086e8..0d3c2006a15d 100644 --- a/tools/perf/util/cpumap.h +++ b/tools/perf/util/cpumap.h @@ -2,71 +2,135 @@ #ifndef __PERF_CPUMAP_H #define __PERF_CPUMAP_H +#include <stdbool.h> #include <stdio.h> #include <stdbool.h> #include <internal/cpumap.h> #include <perf/cpumap.h> +/** Identify where counts are aggregated, -1 implies not to aggregate. */ struct aggr_cpu_id { + /** A value in the range 0 to number of threads. */ int thread; + /** The numa node X as read from /sys/devices/system/node/nodeX. */ int node; + /** + * The socket number as read from + * /sys/devices/system/cpu/cpuX/topology/physical_package_id. + */ int socket; + /** The die id as read from /sys/devices/system/cpu/cpuX/topology/die_id. */ int die; + /** The core id as read from /sys/devices/system/cpu/cpuX/topology/core_id. */ int core; + /** CPU aggregation, note there is one CPU for each SMT thread. */ + struct perf_cpu cpu; }; +/** A collection of aggr_cpu_id values, the "built" version is sorted and uniqued. */ struct cpu_aggr_map { refcount_t refcnt; + /** Number of valid entries. */ int nr; + /** The entries. */ struct aggr_cpu_id map[]; }; struct perf_record_cpu_map_data; struct perf_cpu_map *perf_cpu_map__empty_new(int nr); -struct cpu_aggr_map *cpu_aggr_map__empty_new(int nr); struct perf_cpu_map *cpu_map__new_data(struct perf_record_cpu_map_data *data); size_t cpu_map__snprint(struct perf_cpu_map *map, char *buf, size_t size); size_t cpu_map__snprint_mask(struct perf_cpu_map *map, char *buf, size_t size); size_t cpu_map__fprintf(struct perf_cpu_map *map, FILE *fp); -int cpu_map__get_socket_id(int cpu); -struct aggr_cpu_id cpu_map__get_socket(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__get_die_id(int cpu); -struct aggr_cpu_id cpu_map__get_die(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__get_core_id(int cpu); -struct aggr_cpu_id cpu_map__get_core(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__get_node_id(int cpu); -struct aggr_cpu_id cpu_map__get_node(struct perf_cpu_map *map, int idx, void *data); -int cpu_map__build_socket_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **sockp); -int cpu_map__build_die_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **diep); -int cpu_map__build_core_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **corep); -int cpu_map__build_node_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **nodep); const struct perf_cpu_map *cpu_map__online(void); /* thread unsafe */ -static inline int cpu_map__socket(struct perf_cpu_map *sock, int s) +int cpu__setup_cpunode_map(void); + +int cpu__max_node(void); +struct perf_cpu cpu__max_cpu(void); +struct perf_cpu cpu__max_present_cpu(void); + +/** + * cpu_map__is_dummy - Events associated with a pid, rather than a CPU, use a single dummy map with an entry of -1. + */ +static inline bool cpu_map__is_dummy(struct perf_cpu_map *cpus) { - if (!sock || s > sock->nr || s < 0) - return 0; - return sock->map[s]; + return cpus->nr == 1 && cpus->map[0].cpu == -1; } -int cpu__setup_cpunode_map(void); +/** + * cpu__get_node - Returns the numa node X as read from + * /sys/devices/system/node/nodeX for the given CPU. + */ +int cpu__get_node(struct perf_cpu cpu); +/** + * cpu__get_socket_id - Returns the socket number as read from + * /sys/devices/system/cpu/cpuX/topology/physical_package_id for the given CPU. + */ +int cpu__get_socket_id(struct perf_cpu cpu); +/** + * cpu__get_die_id - Returns the die id as read from + * /sys/devices/system/cpu/cpuX/topology/die_id for the given CPU. + */ +int cpu__get_die_id(struct perf_cpu cpu); +/** + * cpu__get_core_id - Returns the core id as read from + * /sys/devices/system/cpu/cpuX/topology/core_id for the given CPU. + */ +int cpu__get_core_id(struct perf_cpu cpu); -int cpu__max_node(void); -int cpu__max_cpu(void); -int cpu__max_present_cpu(void); -int cpu__get_node(int cpu); +/** + * cpu_aggr_map__empty_new - Create a cpu_aggr_map of size nr with every entry + * being empty. + */ +struct cpu_aggr_map *cpu_aggr_map__empty_new(int nr); + +typedef struct aggr_cpu_id (*aggr_cpu_id_get_t)(struct perf_cpu cpu, void *data); + +/** + * cpu_aggr_map__new - Create a cpu_aggr_map with an aggr_cpu_id for each cpu in + * cpus. The aggr_cpu_id is created with 'get_id' that may have a data value + * passed to it. The cpu_aggr_map is sorted with duplicate values removed. + */ +struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus, + aggr_cpu_id_get_t get_id, + void *data); -int cpu_map__build_map(struct perf_cpu_map *cpus, struct cpu_aggr_map **res, - struct aggr_cpu_id (*f)(struct perf_cpu_map *map, int cpu, void *data), - void *data); +bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b); +bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a); +struct aggr_cpu_id aggr_cpu_id__empty(void); -int cpu_map__cpu(struct perf_cpu_map *cpus, int idx); -bool cpu_map__has(struct perf_cpu_map *cpus, int cpu); -bool cpu_map__compare_aggr_cpu_id(struct aggr_cpu_id a, struct aggr_cpu_id b); -bool cpu_map__aggr_cpu_id_is_empty(struct aggr_cpu_id a); -struct aggr_cpu_id cpu_map__empty_aggr_cpu_id(void); +/** + * aggr_cpu_id__socket - Create an aggr_cpu_id with the socket populated with + * the socket for cpu. The function signature is compatible with + * aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__socket(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__die - Create an aggr_cpu_id with the die and socket populated + * with the die and socket for cpu. The function signature is compatible with + * aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__core - Create an aggr_cpu_id with the core, die and socket + * populated with the core, die and socket for cpu. The function signature is + * compatible with aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__core(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__core - Create an aggr_cpu_id with the cpu, core, die and socket + * populated with the cpu, core, die and socket for cpu. The function signature + * is compatible with aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data); +/** + * aggr_cpu_id__node - Create an aggr_cpu_id with the numa node populated for + * cpu. The function signature is compatible with aggr_cpu_id_get_t. + */ +struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data); #endif /* __PERF_CPUMAP_H */ diff --git a/tools/perf/util/cputopo.c b/tools/perf/util/cputopo.c index 51b429c86f98..e20b835a1194 100644 --- a/tools/perf/util/cputopo.c +++ b/tools/perf/util/cputopo.c @@ -165,7 +165,8 @@ static bool has_die_topology(void) if (uname(&uts) < 0) return false; - if (strncmp(uts.machine, "x86_64", 6)) + if (strncmp(uts.machine, "x86_64", 6) && + strncmp(uts.machine, "s390x", 5)) return false; scnprintf(filename, MAXPATHLEN, DIE_CPUS_FMT, @@ -187,7 +188,7 @@ struct cpu_topology *cpu_topology__new(void) struct perf_cpu_map *map; bool has_die = has_die_topology(); - ncpus = cpu__max_present_cpu(); + ncpus = cpu__max_present_cpu().cpu; /* build online CPU map */ map = perf_cpu_map__new(NULL); @@ -218,7 +219,7 @@ struct cpu_topology *cpu_topology__new(void) tp->core_cpus_list = addr; for (i = 0; i < nr; i++) { - if (!cpu_map__has(map, i)) + if (!perf_cpu_map__has(map, (struct perf_cpu){ .cpu = i })) continue; ret = build_cpu_topology(tp, i); @@ -333,7 +334,7 @@ struct numa_topology *numa_topology__new(void) tp->nr = nr; for (i = 0; i < nr; i++) { - if (load_numa_node(&tp->nodes[i], node_map->map[i])) { + if (load_numa_node(&tp->nodes[i], node_map->map[i].cpu)) { numa_topology__delete(tp); tp = NULL; break; diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c index 2c06abf6dcd2..65e6c22f38e4 100644 --- a/tools/perf/util/debug.c +++ b/tools/perf/util/debug.c @@ -179,7 +179,7 @@ static int trace_event_printer(enum binary_printer_ops op, break; case BINARY_PRINT_CHAR_DATA: printed += color_fprintf(fp, color, "%c", - isprint(ch) ? ch : '.'); + isprint(ch) && isascii(ch) ? ch : '.'); break; case BINARY_PRINT_CHAR_PAD: printed += color_fprintf(fp, color, " "); diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index b9904896eb97..579e44c59914 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -285,13 +285,13 @@ out_enomem: int perf_env__read_cpu_topology_map(struct perf_env *env) { - int cpu, nr_cpus; + int idx, nr_cpus; if (env->cpu != NULL) return 0; if (env->nr_cpus_avail == 0) - env->nr_cpus_avail = cpu__max_present_cpu(); + env->nr_cpus_avail = cpu__max_present_cpu().cpu; nr_cpus = env->nr_cpus_avail; if (nr_cpus == -1) @@ -301,10 +301,12 @@ int perf_env__read_cpu_topology_map(struct perf_env *env) if (env->cpu == NULL) return -ENOMEM; - for (cpu = 0; cpu < nr_cpus; ++cpu) { - env->cpu[cpu].core_id = cpu_map__get_core_id(cpu); - env->cpu[cpu].socket_id = cpu_map__get_socket_id(cpu); - env->cpu[cpu].die_id = cpu_map__get_die_id(cpu); + for (idx = 0; idx < nr_cpus; ++idx) { + struct perf_cpu cpu = { .cpu = idx }; + + env->cpu[idx].core_id = cpu__get_core_id(cpu); + env->cpu[idx].socket_id = cpu__get_socket_id(cpu); + env->cpu[idx].die_id = cpu__get_die_id(cpu); } env->nr_cpus_avail = nr_cpus; @@ -381,7 +383,7 @@ static int perf_env__read_arch(struct perf_env *env) static int perf_env__read_nr_cpus_avail(struct perf_env *env) { if (env->nr_cpus_avail == 0) - env->nr_cpus_avail = cpu__max_present_cpu(); + env->nr_cpus_avail = cpu__max_present_cpu().cpu; return env->nr_cpus_avail ? 0 : -ENOENT; } @@ -487,7 +489,7 @@ const char *perf_env__pmu_mappings(struct perf_env *env) return env->pmu_mappings; } -int perf_env__numa_node(struct perf_env *env, int cpu) +int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu) { if (!env->nr_numa_map) { struct numa_node *nn; @@ -495,7 +497,7 @@ int perf_env__numa_node(struct perf_env *env, int cpu) for (i = 0; i < env->nr_numa_nodes; i++) { nn = &env->numa_nodes[i]; - nr = max(nr, perf_cpu_map__max(nn->map)); + nr = max(nr, perf_cpu_map__max(nn->map).cpu); } nr++; @@ -514,13 +516,14 @@ int perf_env__numa_node(struct perf_env *env, int cpu) env->nr_numa_map = nr; for (i = 0; i < env->nr_numa_nodes; i++) { - int tmp, j; + struct perf_cpu tmp; + int j; nn = &env->numa_nodes[i]; - perf_cpu_map__for_each_cpu(j, tmp, nn->map) - env->numa_map[j] = i; + perf_cpu_map__for_each_cpu(tmp, j, nn->map) + env->numa_map[tmp.cpu] = i; } } - return cpu >= 0 && cpu < env->nr_numa_map ? env->numa_map[cpu] : -1; + return cpu.cpu >= 0 && cpu.cpu < env->nr_numa_map ? env->numa_map[cpu.cpu] : -1; } diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index 163e5ec503a2..a3541f98e1fc 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -4,6 +4,7 @@ #include <linux/types.h> #include <linux/rbtree.h> +#include "cpumap.h" #include "rwsem.h" struct perf_cpu_map; @@ -170,5 +171,5 @@ struct bpf_prog_info_node *perf_env__find_bpf_prog_info(struct perf_env *env, bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node); struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id); -int perf_env__numa_node(struct perf_env *env, int cpu); +int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu); #endif /* __PERF_ENV_H */ diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 5f92319ce258..6e88d404b5b3 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -342,36 +342,65 @@ static int evlist__nr_threads(struct evlist *evlist, struct evsel *evsel) return perf_thread_map__nr(evlist->core.threads); } -void evlist__cpu_iter_start(struct evlist *evlist) -{ - struct evsel *pos; - - /* - * Reset the per evsel cpu_iter. This is needed because - * each evsel's cpumap may have a different index space, - * and some operations need the index to modify - * the FD xyarray (e.g. open, close) - */ - evlist__for_each_entry(evlist, pos) - pos->cpu_iter = 0; -} +struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity) +{ + struct evlist_cpu_iterator itr = { + .container = evlist, + .evsel = evlist__first(evlist), + .cpu_map_idx = 0, + .evlist_cpu_map_idx = 0, + .evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.all_cpus), + .cpu = (struct perf_cpu){ .cpu = -1}, + .affinity = affinity, + }; -bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu) -{ - if (ev->cpu_iter >= ev->core.cpus->nr) - return true; - if (cpu >= 0 && ev->core.cpus->map[ev->cpu_iter] != cpu) - return true; - return false; + if (itr.affinity) { + itr.cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0); + affinity__set(itr.affinity, itr.cpu.cpu); + itr.cpu_map_idx = perf_cpu_map__idx(itr.evsel->core.cpus, itr.cpu); + /* + * If this CPU isn't in the evsel's cpu map then advance through + * the list. + */ + if (itr.cpu_map_idx == -1) + evlist_cpu_iterator__next(&itr); + } + return itr; +} + +void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr) +{ + while (evlist_cpu_itr->evsel != evlist__last(evlist_cpu_itr->container)) { + evlist_cpu_itr->evsel = evsel__next(evlist_cpu_itr->evsel); + evlist_cpu_itr->cpu_map_idx = + perf_cpu_map__idx(evlist_cpu_itr->evsel->core.cpus, + evlist_cpu_itr->cpu); + if (evlist_cpu_itr->cpu_map_idx != -1) + return; + } + evlist_cpu_itr->evlist_cpu_map_idx++; + if (evlist_cpu_itr->evlist_cpu_map_idx < evlist_cpu_itr->evlist_cpu_map_nr) { + evlist_cpu_itr->evsel = evlist__first(evlist_cpu_itr->container); + evlist_cpu_itr->cpu = + perf_cpu_map__cpu(evlist_cpu_itr->container->core.all_cpus, + evlist_cpu_itr->evlist_cpu_map_idx); + if (evlist_cpu_itr->affinity) + affinity__set(evlist_cpu_itr->affinity, evlist_cpu_itr->cpu.cpu); + evlist_cpu_itr->cpu_map_idx = + perf_cpu_map__idx(evlist_cpu_itr->evsel->core.cpus, + evlist_cpu_itr->cpu); + /* + * If this CPU isn't in the evsel's cpu map then advance through + * the list. + */ + if (evlist_cpu_itr->cpu_map_idx == -1) + evlist_cpu_iterator__next(evlist_cpu_itr); + } } -bool evsel__cpu_iter_skip(struct evsel *ev, int cpu) +bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr) { - if (!evsel__cpu_iter_skip_no_inc(ev, cpu)) { - ev->cpu_iter++; - return false; - } - return true; + return evlist_cpu_itr->evlist_cpu_map_idx >= evlist_cpu_itr->evlist_cpu_map_nr; } static int evsel__strcmp(struct evsel *pos, char *evsel_name) @@ -400,31 +429,26 @@ static int evlist__is_enabled(struct evlist *evlist) static void __evlist__disable(struct evlist *evlist, char *evsel_name) { struct evsel *pos; + struct evlist_cpu_iterator evlist_cpu_itr; struct affinity affinity; - int cpu, i, imm = 0; bool has_imm = false; if (affinity__setup(&affinity) < 0) return; /* Disable 'immediate' events last */ - for (imm = 0; imm <= 1; imm++) { - evlist__for_each_cpu(evlist, i, cpu) { - affinity__set(&affinity, cpu); - - evlist__for_each_entry(evlist, pos) { - if (evsel__strcmp(pos, evsel_name)) - continue; - if (evsel__cpu_iter_skip(pos, cpu)) - continue; - if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd) - continue; - if (pos->immediate) - has_imm = true; - if (pos->immediate != imm) - continue; - evsel__disable_cpu(pos, pos->cpu_iter - 1); - } + for (int imm = 0; imm <= 1; imm++) { + evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) { + pos = evlist_cpu_itr.evsel; + if (evsel__strcmp(pos, evsel_name)) + continue; + if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd) + continue; + if (pos->immediate) + has_imm = true; + if (pos->immediate != imm) + continue; + evsel__disable_cpu(pos, evlist_cpu_itr.cpu_map_idx); } if (!has_imm) break; @@ -462,24 +486,19 @@ void evlist__disable_evsel(struct evlist *evlist, char *evsel_name) static void __evlist__enable(struct evlist *evlist, char *evsel_name) { struct evsel *pos; + struct evlist_cpu_iterator evlist_cpu_itr; struct affinity affinity; - int cpu, i; if (affinity__setup(&affinity) < 0) return; - evlist__for_each_cpu(evlist, i, cpu) { - affinity__set(&affinity, cpu); - - evlist__for_each_entry(evlist, pos) { - if (evsel__strcmp(pos, evsel_name)) - continue; - if (evsel__cpu_iter_skip(pos, cpu)) - continue; - if (!evsel__is_group_leader(pos) || !pos->core.fd) - continue; - evsel__enable_cpu(pos, pos->cpu_iter - 1); - } + evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) { + pos = evlist_cpu_itr.evsel; + if (evsel__strcmp(pos, evsel_name)) + continue; + if (!evsel__is_group_leader(pos) || !pos->core.fd) + continue; + evsel__enable_cpu(pos, evlist_cpu_itr.cpu_map_idx); } affinity__cleanup(&affinity); evlist__for_each_entry(evlist, pos) { @@ -800,7 +819,7 @@ perf_evlist__mmap_cb_get(struct perf_evlist *_evlist, bool overwrite, int idx) static int perf_evlist__mmap_cb_mmap(struct perf_mmap *_map, struct perf_mmap_param *_mp, - int output, int cpu) + int output, struct perf_cpu cpu) { struct mmap *map = container_of(_map, struct mmap, core); struct mmap_params *mp = container_of(_mp, struct mmap_params, core); @@ -1264,14 +1283,14 @@ void evlist__set_selected(struct evlist *evlist, struct evsel *evsel) void evlist__close(struct evlist *evlist) { struct evsel *evsel; + struct evlist_cpu_iterator evlist_cpu_itr; struct affinity affinity; - int cpu, i; /* * With perf record core.cpus is usually NULL. * Use the old method to handle this for now. */ - if (!evlist->core.cpus) { + if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) { evlist__for_each_entry_reverse(evlist, evsel) evsel__close(evsel); return; @@ -1279,15 +1298,12 @@ void evlist__close(struct evlist *evlist) if (affinity__setup(&affinity) < 0) return; - evlist__for_each_cpu(evlist, i, cpu) { - affinity__set(&affinity, cpu); - evlist__for_each_entry_reverse(evlist, evsel) { - if (evsel__cpu_iter_skip(evsel, cpu)) - continue; - perf_evsel__close_cpu(&evsel->core, evsel->cpu_iter - 1); - } + evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) { + perf_evsel__close_cpu(&evlist_cpu_itr.evsel->core, + evlist_cpu_itr.cpu_map_idx); } + affinity__cleanup(&affinity); evlist__for_each_entry_reverse(evlist, evsel) { perf_evsel__free_fd(&evsel->core); diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 97bfb8d0be4f..64cba56fbc74 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -64,6 +64,7 @@ struct evlist { struct evsel *selected; struct events_stats stats; struct perf_env *env; + const char *hybrid_pmu_name; void (*trace_event_sample_raw)(struct evlist *evlist, union perf_event *event, struct perf_sample *sample); @@ -110,6 +111,7 @@ int __evlist__add_default_attrs(struct evlist *evlist, __evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array)) int arch_evlist__add_default_attrs(struct evlist *evlist); +struct evsel *arch_evlist__leader(struct list_head *list); int evlist__add_dummy(struct evlist *evlist); @@ -325,17 +327,53 @@ void evlist__to_front(struct evlist *evlist, struct evsel *move_evsel); #define evlist__for_each_entry_safe(evlist, tmp, evsel) \ __evlist__for_each_entry_safe(&(evlist)->core.entries, tmp, evsel) -#define evlist__for_each_cpu(evlist, index, cpu) \ - evlist__cpu_iter_start(evlist); \ - perf_cpu_map__for_each_cpu (cpu, index, (evlist)->core.all_cpus) +/** Iterator state for evlist__for_each_cpu */ +struct evlist_cpu_iterator { + /** The list being iterated through. */ + struct evlist *container; + /** The current evsel of the iterator. */ + struct evsel *evsel; + /** The CPU map index corresponding to the evsel->core.cpus for the current CPU. */ + int cpu_map_idx; + /** + * The CPU map index corresponding to evlist->core.all_cpus for the + * current CPU. Distinct from cpu_map_idx as the evsel's cpu map may + * contain fewer entries. + */ + int evlist_cpu_map_idx; + /** The number of CPU map entries in evlist->core.all_cpus. */ + int evlist_cpu_map_nr; + /** The current CPU of the iterator. */ + struct perf_cpu cpu; + /** If present, used to set the affinity when switching between CPUs. */ + struct affinity *affinity; +}; + +/** + * evlist__for_each_cpu - without affinity, iterate over the evlist. With + * affinity, iterate over all CPUs and then the evlist + * for each evsel on that CPU. When switching between + * CPUs the affinity is set to the CPU to avoid IPIs + * during syscalls. + * @evlist_cpu_itr: the iterator instance. + * @evlist: evlist instance to iterate. + * @affinity: NULL or used to set the affinity to the current CPU. + */ +#define evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity) \ + for ((evlist_cpu_itr) = evlist__cpu_begin(evlist, affinity); \ + !evlist_cpu_iterator__end(&evlist_cpu_itr); \ + evlist_cpu_iterator__next(&evlist_cpu_itr)) + +/** Returns an iterator set to the first CPU/evsel of evlist. */ +struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity); +/** Move to next element in iterator, updating CPU, evsel and the affinity. */ +void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr); +/** Returns true when iterator is at the end of the CPUs and evlist. */ +bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr); struct evsel *evlist__get_tracking_event(struct evlist *evlist); void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel); -void evlist__cpu_iter_start(struct evlist *evlist); -bool evsel__cpu_iter_skip(struct evsel *ev, int cpu); -bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu); - struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str); struct evsel *evlist__event2evsel(struct evlist *evlist, union perf_event *event); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index f29d37004f55..2f6b18af49e5 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1372,9 +1372,9 @@ int evsel__append_addr_filter(struct evsel *evsel, const char *filter) } /* Caller has to clear disabled after going through all CPUs. */ -int evsel__enable_cpu(struct evsel *evsel, int cpu) +int evsel__enable_cpu(struct evsel *evsel, int cpu_map_idx) { - return perf_evsel__enable_cpu(&evsel->core, cpu); + return perf_evsel__enable_cpu(&evsel->core, cpu_map_idx); } int evsel__enable(struct evsel *evsel) @@ -1387,9 +1387,9 @@ int evsel__enable(struct evsel *evsel) } /* Caller has to set disabled after going through all CPUs. */ -int evsel__disable_cpu(struct evsel *evsel, int cpu) +int evsel__disable_cpu(struct evsel *evsel, int cpu_map_idx) { - return perf_evsel__disable_cpu(&evsel->core, cpu); + return perf_evsel__disable_cpu(&evsel->core, cpu_map_idx); } int evsel__disable(struct evsel *evsel) @@ -1455,7 +1455,7 @@ void evsel__delete(struct evsel *evsel) free(evsel); } -void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, +void evsel__compute_deltas(struct evsel *evsel, int cpu_map_idx, int thread, struct perf_counts_values *count) { struct perf_counts_values tmp; @@ -1463,12 +1463,12 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, if (!evsel->prev_raw_counts) return; - if (cpu == -1) { + if (cpu_map_idx == -1) { tmp = evsel->prev_raw_counts->aggr; evsel->prev_raw_counts->aggr = *count; } else { - tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread); - *perf_counts(evsel->prev_raw_counts, cpu, thread) = *count; + tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread); + *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count; } count->val = count->val - tmp.val; @@ -1476,46 +1476,28 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, count->run = count->run - tmp.run; } -void perf_counts_values__scale(struct perf_counts_values *count, - bool scale, s8 *pscaled) +static int evsel__read_one(struct evsel *evsel, int cpu_map_idx, int thread) { - s8 scaled = 0; + struct perf_counts_values *count = perf_counts(evsel->counts, cpu_map_idx, thread); - if (scale) { - if (count->run == 0) { - scaled = -1; - count->val = 0; - } else if (count->run < count->ena) { - scaled = 1; - count->val = (u64)((double) count->val * count->ena / count->run); - } - } - - if (pscaled) - *pscaled = scaled; -} - -static int evsel__read_one(struct evsel *evsel, int cpu, int thread) -{ - struct perf_counts_values *count = perf_counts(evsel->counts, cpu, thread); - - return perf_evsel__read(&evsel->core, cpu, thread, count); + return perf_evsel__read(&evsel->core, cpu_map_idx, thread, count); } -static void evsel__set_count(struct evsel *counter, int cpu, int thread, u64 val, u64 ena, u64 run) +static void evsel__set_count(struct evsel *counter, int cpu_map_idx, int thread, + u64 val, u64 ena, u64 run) { struct perf_counts_values *count; - count = perf_counts(counter->counts, cpu, thread); + count = perf_counts(counter->counts, cpu_map_idx, thread); count->val = val; count->ena = ena; count->run = run; - perf_counts__set_loaded(counter->counts, cpu, thread, true); + perf_counts__set_loaded(counter->counts, cpu_map_idx, thread, true); } -static int evsel__process_group_data(struct evsel *leader, int cpu, int thread, u64 *data) +static int evsel__process_group_data(struct evsel *leader, int cpu_map_idx, int thread, u64 *data) { u64 read_format = leader->core.attr.read_format; struct sample_read_value *v; @@ -1534,7 +1516,7 @@ static int evsel__process_group_data(struct evsel *leader, int cpu, int thread, v = (struct sample_read_value *) data; - evsel__set_count(leader, cpu, thread, v[0].value, ena, run); + evsel__set_count(leader, cpu_map_idx, thread, v[0].value, ena, run); for (i = 1; i < nr; i++) { struct evsel *counter; @@ -1543,13 +1525,13 @@ static int evsel__process_group_data(struct evsel *leader, int cpu, int thread, if (!counter) return -EINVAL; - evsel__set_count(counter, cpu, thread, v[i].value, ena, run); + evsel__set_count(counter, cpu_map_idx, thread, v[i].value, ena, run); } return 0; } -static int evsel__read_group(struct evsel *leader, int cpu, int thread) +static int evsel__read_group(struct evsel *leader, int cpu_map_idx, int thread) { struct perf_stat_evsel *ps = leader->stats; u64 read_format = leader->core.attr.read_format; @@ -1570,67 +1552,67 @@ static int evsel__read_group(struct evsel *leader, int cpu, int thread) ps->group_data = data; } - if (FD(leader, cpu, thread) < 0) + if (FD(leader, cpu_map_idx, thread) < 0) return -EINVAL; - if (readn(FD(leader, cpu, thread), data, size) <= 0) + if (readn(FD(leader, cpu_map_idx, thread), data, size) <= 0) return -errno; - return evsel__process_group_data(leader, cpu, thread, data); + return evsel__process_group_data(leader, cpu_map_idx, thread, data); } -int evsel__read_counter(struct evsel *evsel, int cpu, int thread) +int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread) { u64 read_format = evsel->core.attr.read_format; if (read_format & PERF_FORMAT_GROUP) - return evsel__read_group(evsel, cpu, thread); + return evsel__read_group(evsel, cpu_map_idx, thread); - return evsel__read_one(evsel, cpu, thread); + return evsel__read_one(evsel, cpu_map_idx, thread); } -int __evsel__read_on_cpu(struct evsel *evsel, int cpu, int thread, bool scale) +int __evsel__read_on_cpu(struct evsel *evsel, int cpu_map_idx, int thread, bool scale) { struct perf_counts_values count; size_t nv = scale ? 3 : 1; - if (FD(evsel, cpu, thread) < 0) + if (FD(evsel, cpu_map_idx, thread) < 0) return -EINVAL; - if (evsel->counts == NULL && evsel__alloc_counts(evsel, cpu + 1, thread + 1) < 0) + if (evsel->counts == NULL && evsel__alloc_counts(evsel) < 0) return -ENOMEM; - if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) <= 0) + if (readn(FD(evsel, cpu_map_idx, thread), &count, nv * sizeof(u64)) <= 0) return -errno; - evsel__compute_deltas(evsel, cpu, thread, &count); + evsel__compute_deltas(evsel, cpu_map_idx, thread, &count); perf_counts_values__scale(&count, scale, NULL); - *perf_counts(evsel->counts, cpu, thread) = count; + *perf_counts(evsel->counts, cpu_map_idx, thread) = count; return 0; } static int evsel__match_other_cpu(struct evsel *evsel, struct evsel *other, - int cpu) + int cpu_map_idx) { - int cpuid; + struct perf_cpu cpu; - cpuid = perf_cpu_map__cpu(evsel->core.cpus, cpu); - return perf_cpu_map__idx(other->core.cpus, cpuid); + cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx); + return perf_cpu_map__idx(other->core.cpus, cpu); } -static int evsel__hybrid_group_cpu(struct evsel *evsel, int cpu) +static int evsel__hybrid_group_cpu_map_idx(struct evsel *evsel, int cpu_map_idx) { struct evsel *leader = evsel__leader(evsel); if ((evsel__is_hybrid(evsel) && !evsel__is_hybrid(leader)) || (!evsel__is_hybrid(evsel) && evsel__is_hybrid(leader))) { - return evsel__match_other_cpu(evsel, leader, cpu); + return evsel__match_other_cpu(evsel, leader, cpu_map_idx); } - return cpu; + return cpu_map_idx; } -static int get_group_fd(struct evsel *evsel, int cpu, int thread) +static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread) { struct evsel *leader = evsel__leader(evsel); int fd; @@ -1644,11 +1626,11 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread) */ BUG_ON(!leader->core.fd); - cpu = evsel__hybrid_group_cpu(evsel, cpu); - if (cpu == -1) + cpu_map_idx = evsel__hybrid_group_cpu_map_idx(evsel, cpu_map_idx); + if (cpu_map_idx == -1) return -1; - fd = FD(leader, cpu, thread); + fd = FD(leader, cpu_map_idx, thread); BUG_ON(fd == -1); return fd; @@ -1662,16 +1644,16 @@ static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int } static int update_fds(struct evsel *evsel, - int nr_cpus, int cpu_idx, + int nr_cpus, int cpu_map_idx, int nr_threads, int thread_idx) { struct evsel *pos; - if (cpu_idx >= nr_cpus || thread_idx >= nr_threads) + if (cpu_map_idx >= nr_cpus || thread_idx >= nr_threads) return -EINVAL; evlist__for_each_entry(evsel->evlist, pos) { - nr_cpus = pos != evsel ? nr_cpus : cpu_idx; + nr_cpus = pos != evsel ? nr_cpus : cpu_map_idx; evsel__remove_fd(pos, nr_cpus, nr_threads, thread_idx); @@ -1685,10 +1667,10 @@ static int update_fds(struct evsel *evsel, return 0; } -bool evsel__ignore_missing_thread(struct evsel *evsel, - int nr_cpus, int cpu, - struct perf_thread_map *threads, - int thread, int err) +static bool evsel__ignore_missing_thread(struct evsel *evsel, + int nr_cpus, int cpu_map_idx, + struct perf_thread_map *threads, + int thread, int err) { pid_t ignore_pid = perf_thread_map__pid(threads, thread); @@ -1711,7 +1693,7 @@ bool evsel__ignore_missing_thread(struct evsel *evsel, * We should remove fd for missing_thread first * because thread_map__remove() will decrease threads->nr. */ - if (update_fds(evsel, nr_cpus, cpu, threads->nr, thread)) + if (update_fds(evsel, nr_cpus, cpu_map_idx, threads->nr, thread)) return false; if (thread_map__remove(threads, thread)) @@ -1993,9 +1975,9 @@ bool evsel__increase_rlimit(enum rlimit_action *set_rlimit) static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads, - int start_cpu, int end_cpu) + int start_cpu_map_idx, int end_cpu_map_idx) { - int cpu, thread, nthreads; + int idx, thread, nthreads; int pid = -1, err, old_errno; enum rlimit_action set_rlimit = NO_CHANGE; @@ -2022,7 +2004,7 @@ fallback_missing_features: display_attr(&evsel->core.attr); - for (cpu = start_cpu; cpu < end_cpu; cpu++) { + for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) { for (thread = 0; thread < nthreads; thread++) { int fd, group_fd; @@ -2033,17 +2015,17 @@ retry_open: if (!evsel->cgrp && !evsel->core.system_wide) pid = perf_thread_map__pid(threads, thread); - group_fd = get_group_fd(evsel, cpu, thread); + group_fd = get_group_fd(evsel, idx, thread); test_attr__ready(); pr_debug2_peo("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx", - pid, cpus->map[cpu], group_fd, evsel->open_flags); + pid, cpus->map[idx].cpu, group_fd, evsel->open_flags); - fd = sys_perf_event_open(&evsel->core.attr, pid, cpus->map[cpu], + fd = sys_perf_event_open(&evsel->core.attr, pid, cpus->map[idx].cpu, group_fd, evsel->open_flags); - FD(evsel, cpu, thread) = fd; + FD(evsel, idx, thread) = fd; if (fd < 0) { err = -errno; @@ -2053,10 +2035,10 @@ retry_open: goto try_fallback; } - bpf_counter__install_pe(evsel, cpu, fd); + bpf_counter__install_pe(evsel, idx, fd); if (unlikely(test_attr__enabled)) { - test_attr__open(&evsel->core.attr, pid, cpus->map[cpu], + test_attr__open(&evsel->core.attr, pid, cpus->map[idx], fd, group_fd, evsel->open_flags); } @@ -2097,7 +2079,7 @@ try_fallback: if (evsel__precise_ip_fallback(evsel)) goto retry_open; - if (evsel__ignore_missing_thread(evsel, cpus->nr, cpu, threads, thread, err)) { + if (evsel__ignore_missing_thread(evsel, cpus->nr, idx, threads, thread, err)) { /* We just removed 1 thread, so lower the upper nthreads limit. */ nthreads--; @@ -2112,7 +2094,7 @@ try_fallback: if (err == -EMFILE && evsel__increase_rlimit(&set_rlimit)) goto retry_open; - if (err != -EINVAL || cpu > 0 || thread > 0) + if (err != -EINVAL || idx > 0 || thread > 0) goto out_close; if (evsel__detect_missing_features(evsel)) @@ -2124,12 +2106,12 @@ out_close: old_errno = errno; do { while (--thread >= 0) { - if (FD(evsel, cpu, thread) >= 0) - close(FD(evsel, cpu, thread)); - FD(evsel, cpu, thread) = -1; + if (FD(evsel, idx, thread) >= 0) + close(FD(evsel, idx, thread)); + FD(evsel, idx, thread) = -1; } thread = nthreads; - } while (--cpu >= 0); + } while (--idx >= 0); errno = old_errno; return err; } @@ -2146,13 +2128,13 @@ void evsel__close(struct evsel *evsel) perf_evsel__free_id(&evsel->core); } -int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu) +int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu_map_idx) { - if (cpu == -1) + if (cpu_map_idx == -1) return evsel__open_cpu(evsel, cpus, NULL, 0, cpus ? cpus->nr : 1); - return evsel__open_cpu(evsel, cpus, NULL, cpu, cpu + 1); + return evsel__open_cpu(evsel, cpus, NULL, cpu_map_idx, cpu_map_idx + 1); } int evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads) @@ -2952,6 +2934,10 @@ int evsel__open_strerror(struct evsel *evsel, struct target *target, return scnprintf(msg, size, "wrong clockid (%d).", clockid); if (perf_missing_features.aux_output) return scnprintf(msg, size, "The 'aux_output' feature is not supported, update the kernel."); + if (!target__has_cpu(target)) + return scnprintf(msg, size, + "Invalid event (%s) in per-thread mode, enable system wide with '-a'.", + evsel__name(evsel)); break; case ENODATA: return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. " @@ -2975,15 +2961,15 @@ struct perf_env *evsel__env(struct evsel *evsel) static int store_evsel_ids(struct evsel *evsel, struct evlist *evlist) { - int cpu, thread; + int cpu_map_idx, thread; - for (cpu = 0; cpu < xyarray__max_x(evsel->core.fd); cpu++) { + for (cpu_map_idx = 0; cpu_map_idx < xyarray__max_x(evsel->core.fd); cpu_map_idx++) { for (thread = 0; thread < xyarray__max_y(evsel->core.fd); thread++) { - int fd = FD(evsel, cpu, thread); + int fd = FD(evsel, cpu_map_idx, thread); if (perf_evlist__id_add_fd(&evlist->core, &evsel->core, - cpu, thread, fd) < 0) + cpu_map_idx, thread, fd) < 0) return -1; } } diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 29d49a8c1e92..5720ceebffac 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -121,7 +121,6 @@ struct evsel { bool errored; struct hashmap *per_pkg_mask; int err; - int cpu_iter; struct { evsel__sb_cb_t *cb; void *data; @@ -195,9 +194,6 @@ static inline int evsel__nr_cpus(struct evsel *evsel) return evsel__cpus(evsel)->nr; } -void perf_counts_values__scale(struct perf_counts_values *count, - bool scale, s8 *pscaled); - void evsel__compute_deltas(struct evsel *evsel, int cpu, int thread, struct perf_counts_values *count); @@ -288,12 +284,12 @@ void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr); int evsel__set_filter(struct evsel *evsel, const char *filter); int evsel__append_tp_filter(struct evsel *evsel, const char *filter); int evsel__append_addr_filter(struct evsel *evsel, const char *filter); -int evsel__enable_cpu(struct evsel *evsel, int cpu); +int evsel__enable_cpu(struct evsel *evsel, int cpu_map_idx); int evsel__enable(struct evsel *evsel); int evsel__disable(struct evsel *evsel); -int evsel__disable_cpu(struct evsel *evsel, int cpu); +int evsel__disable_cpu(struct evsel *evsel, int cpu_map_idx); -int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu); +int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu_map_idx); int evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads); int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads); @@ -305,10 +301,6 @@ bool evsel__detect_missing_features(struct evsel *evsel); enum rlimit_action { NO_CHANGE, SET_TO_MAX, INCREASED_MAX }; bool evsel__increase_rlimit(enum rlimit_action *set_rlimit); -bool evsel__ignore_missing_thread(struct evsel *evsel, - int nr_cpus, int cpu, - struct perf_thread_map *threads, - int thread, int err); bool evsel__precise_ip_fallback(struct evsel *evsel); struct perf_sample; @@ -337,32 +329,32 @@ static inline bool evsel__match2(struct evsel *e1, struct evsel *e2) (e1->core.attr.config == e2->core.attr.config); } -int evsel__read_counter(struct evsel *evsel, int cpu, int thread); +int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread); -int __evsel__read_on_cpu(struct evsel *evsel, int cpu, int thread, bool scale); +int __evsel__read_on_cpu(struct evsel *evsel, int cpu_map_idx, int thread, bool scale); /** * evsel__read_on_cpu - Read out the results on a CPU and thread * * @evsel - event selector to read value - * @cpu - CPU of interest + * @cpu_map_idx - CPU of interest * @thread - thread of interest */ -static inline int evsel__read_on_cpu(struct evsel *evsel, int cpu, int thread) +static inline int evsel__read_on_cpu(struct evsel *evsel, int cpu_map_idx, int thread) { - return __evsel__read_on_cpu(evsel, cpu, thread, false); + return __evsel__read_on_cpu(evsel, cpu_map_idx, thread, false); } /** * evsel__read_on_cpu_scaled - Read out the results on a CPU and thread, scaled * * @evsel - event selector to read value - * @cpu - CPU of interest + * @cpu_map_idx - CPU of interest * @thread - thread of interest */ -static inline int evsel__read_on_cpu_scaled(struct evsel *evsel, int cpu, int thread) +static inline int evsel__read_on_cpu_scaled(struct evsel *evsel, int cpu_map_idx, int thread) { - return __evsel__read_on_cpu(evsel, cpu, thread, true); + return __evsel__read_on_cpu(evsel, cpu_map_idx, thread, true); } int evsel__parse_sample(struct evsel *evsel, union perf_event *event, diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c index 666b59baeb70..675f318ce7c1 100644 --- a/tools/perf/util/expr.c +++ b/tools/perf/util/expr.c @@ -405,12 +405,17 @@ double expr_id_data__source_count(const struct expr_id_data *data) double expr__get_literal(const char *literal) { static struct cpu_topology *topology; + double result = NAN; - if (!strcmp("#smt_on", literal)) - return smt_on() > 0 ? 1.0 : 0.0; + if (!strcasecmp("#smt_on", literal)) { + result = smt_on() > 0 ? 1.0 : 0.0; + goto out; + } - if (!strcmp("#num_cpus", literal)) - return cpu__max_present_cpu(); + if (!strcmp("#num_cpus", literal)) { + result = cpu__max_present_cpu().cpu; + goto out; + } /* * Assume that topology strings are consistent, such as CPUs "0-1" @@ -422,16 +427,24 @@ double expr__get_literal(const char *literal) topology = cpu_topology__new(); if (!topology) { pr_err("Error creating CPU topology"); - return NAN; + goto out; } } - if (!strcmp("#num_packages", literal)) - return topology->package_cpus_lists; - if (!strcmp("#num_dies", literal)) - return topology->die_cpus_lists; - if (!strcmp("#num_cores", literal)) - return topology->core_cpus_lists; + if (!strcmp("#num_packages", literal)) { + result = topology->package_cpus_lists; + goto out; + } + if (!strcmp("#num_dies", literal)) { + result = topology->die_cpus_lists; + goto out; + } + if (!strcmp("#num_cores", literal)) { + result = topology->core_cpus_lists; + goto out; + } pr_err("Unrecognized literal '%s'", literal); - return NAN; +out: + pr_debug2("literal: %s = %f\n", literal, result); + return result; } diff --git a/tools/perf/util/ftrace.h b/tools/perf/util/ftrace.h new file mode 100644 index 000000000000..887f68a185f7 --- /dev/null +++ b/tools/perf/util/ftrace.h @@ -0,0 +1,81 @@ +#ifndef __PERF_FTRACE_H__ +#define __PERF_FTRACE_H__ + +#include <linux/list.h> + +#include "target.h" + +struct evlist; + +struct perf_ftrace { + struct evlist *evlist; + struct target target; + const char *tracer; + struct list_head filters; + struct list_head notrace; + struct list_head graph_funcs; + struct list_head nograph_funcs; + unsigned long percpu_buffer_size; + bool inherit; + int graph_depth; + int func_stack_trace; + int func_irq_info; + int graph_nosleep_time; + int graph_noirqs; + int graph_verbose; + int graph_thresh; + unsigned int initial_delay; +}; + +struct filter_entry { + struct list_head list; + char name[]; +}; + +#define NUM_BUCKET 22 /* 20 + 2 (for outliers in both direction) */ + +#ifdef HAVE_BPF_SKEL + +int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace); +int perf_ftrace__latency_start_bpf(struct perf_ftrace *ftrace); +int perf_ftrace__latency_stop_bpf(struct perf_ftrace *ftrace); +int perf_ftrace__latency_read_bpf(struct perf_ftrace *ftrace, + int buckets[]); +int perf_ftrace__latency_cleanup_bpf(struct perf_ftrace *ftrace); + +#else /* !HAVE_BPF_SKEL */ + +static inline int +perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_start_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_stop_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_read_bpf(struct perf_ftrace *ftrace __maybe_unused, + int buckets[] __maybe_unused) +{ + return -1; +} + +static inline int +perf_ftrace__latency_cleanup_bpf(struct perf_ftrace *ftrace __maybe_unused) +{ + return -1; +} + +#endif /* HAVE_BPF_SKEL */ + +#endif /* __PERF_FTRACE_H__ */ diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index e3c1a532d059..6da12e522edc 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -472,7 +472,7 @@ static int write_nrcpus(struct feat_fd *ff, u32 nrc, nra; int ret; - nrc = cpu__max_present_cpu(); + nrc = cpu__max_present_cpu().cpu; nr = sysconf(_SC_NPROCESSORS_ONLN); if (nr < 0) @@ -1163,7 +1163,7 @@ static int build_caches(struct cpu_cache_level caches[], u32 *cntp) u32 nr, cpu; u16 level; - nr = cpu__max_cpu(); + nr = cpu__max_cpu().cpu; for (cpu = 0; cpu < nr; cpu++) { for (level = 0; level < MAX_CACHE_LVL; level++) { @@ -1195,7 +1195,7 @@ static int build_caches(struct cpu_cache_level caches[], u32 *cntp) static int write_cache(struct feat_fd *ff, struct evlist *evlist __maybe_unused) { - u32 max_caches = cpu__max_cpu() * MAX_CACHE_LVL; + u32 max_caches = cpu__max_cpu().cpu * MAX_CACHE_LVL; struct cpu_cache_level caches[max_caches]; u32 cnt = 0, i, version = 1; int ret; diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b776465e04ef..0a8033b09e28 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -211,7 +211,9 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h) hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10); hists__new_col_len(hists, HISTC_LOCAL_INS_LAT, 13); hists__new_col_len(hists, HISTC_GLOBAL_INS_LAT, 13); - hists__new_col_len(hists, HISTC_P_STAGE_CYC, 13); + hists__new_col_len(hists, HISTC_LOCAL_P_STAGE_CYC, 13); + hists__new_col_len(hists, HISTC_GLOBAL_P_STAGE_CYC, 13); + if (symbol_conf.nanosecs) hists__new_col_len(hists, HISTC_TIME, 16); else diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 621f35ae1efa..2a15e22fb89c 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -75,7 +75,8 @@ enum hist_column { HISTC_MEM_BLOCKED, HISTC_LOCAL_INS_LAT, HISTC_GLOBAL_INS_LAT, - HISTC_P_STAGE_CYC, + HISTC_LOCAL_P_STAGE_CYC, + HISTC_GLOBAL_P_STAGE_CYC, HISTC_NR_COLS, /* Last entry */ }; diff --git a/tools/perf/util/libunwind/arm64.c b/tools/perf/util/libunwind/arm64.c index c397be0c2e32..15f60fd09424 100644 --- a/tools/perf/util/libunwind/arm64.c +++ b/tools/perf/util/libunwind/arm64.c @@ -23,7 +23,9 @@ #include "unwind.h" #include "libunwind-aarch64.h" +#define perf_event_arm_regs perf_event_arm64_regs #include <../../../../arch/arm64/include/uapi/asm/perf_regs.h> +#undef perf_event_arm_regs #include "../../arch/arm64/util/unwind-libunwind.c" /* NO_LIBUNWIND_DEBUG_FRAME is a feature flag for local libunwind, diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index fb8496df8432..3901440aeff9 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -34,6 +34,7 @@ #include "bpf-event.h" #include <internal/lib.h> // page_size #include "cgroup.h" +#include "arm64-frame-pointer-unwind-support.h" #include <linux/ctype.h> #include <symbol/kallsyms.h> @@ -2710,6 +2711,15 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread, return err; } +static u64 get_leaf_frame_caller(struct perf_sample *sample, + struct thread *thread, int usr_idx) +{ + if (machine__normalized_is(thread->maps->machine, "arm64")) + return get_leaf_frame_caller_aarch64(sample, thread, usr_idx); + else + return 0; +} + static int thread__resolve_callchain_sample(struct thread *thread, struct callchain_cursor *cursor, struct evsel *evsel, @@ -2723,9 +2733,10 @@ static int thread__resolve_callchain_sample(struct thread *thread, struct ip_callchain *chain = sample->callchain; int chain_nr = 0; u8 cpumode = PERF_RECORD_MISC_USER; - int i, j, err, nr_entries; + int i, j, err, nr_entries, usr_idx; int skip_idx = -1; int first_call = 0; + u64 leaf_frame_caller; if (chain) chain_nr = chain->nr; @@ -2850,6 +2861,34 @@ check_calls: continue; } + /* + * PERF_CONTEXT_USER allows us to locate where the user stack ends. + * Depending on callchain_param.order and the position of PERF_CONTEXT_USER, + * the index will be different in order to add the missing frame + * at the right place. + */ + + usr_idx = callchain_param.order == ORDER_CALLEE ? j-2 : j-1; + + if (usr_idx >= 0 && chain->ips[usr_idx] == PERF_CONTEXT_USER) { + + leaf_frame_caller = get_leaf_frame_caller(sample, thread, usr_idx); + + /* + * check if leaf_frame_Caller != ip to not add the same + * value twice. + */ + + if (leaf_frame_caller && leaf_frame_caller != ip) { + + err = add_callchain_ip(thread, cursor, parent, + root_al, &cpumode, leaf_frame_caller, + false, NULL, NULL, 0); + if (err) + return (err < 0) ? err : 0; + } + } + err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip, false, NULL, NULL, 0); @@ -3079,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid, } /* - * Compares the raw arch string. N.B. see instead perf_env__arch() if a - * normalized arch is needed. + * Compares the raw arch string. N.B. see instead perf_env__arch() or + * machine__normalized_is() if a normalized arch is needed. */ bool machine__is(struct machine *machine, const char *arch) { return machine && !strcmp(perf_env__raw_arch(machine->env), arch); } +bool machine__normalized_is(struct machine *machine, const char *arch) +{ + return machine && !strcmp(perf_env__arch(machine->env), arch); +} + int machine__nr_cpus_avail(struct machine *machine) { return machine ? perf_env__nr_cpus_avail(machine->env) : 0; diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h index a143087eeb47..c5a45dc8df4c 100644 --- a/tools/perf/util/machine.h +++ b/tools/perf/util/machine.h @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine) } bool machine__is(struct machine *machine, const char *arch); +bool machine__normalized_is(struct machine *machine, const char *arch); int machine__nr_cpus_avail(struct machine *machine); struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid); diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c index 3167b4628b6d..ed0ab838bcc5 100644 --- a/tools/perf/util/mem-events.c +++ b/tools/perf/util/mem-events.c @@ -309,6 +309,9 @@ static const char * const mem_hops[] = { * to be set with mem_hops field. */ "core, same node", + "node, same socket", + "socket, same board", + "board", }; int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) @@ -316,7 +319,7 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) size_t i, l = 0; u64 m = PERF_MEM_LVL_NA; u64 hit, miss; - int printed; + int printed = 0; if (mem_info) m = mem_info->data_src.mem_lvl; @@ -335,18 +338,22 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) l += 7; } - if (mem_info && mem_info->data_src.mem_hops) + /* + * Incase mem_hops field is set, we can skip printing data source via + * PERF_MEM_LVL namespace. + */ + if (mem_info && mem_info->data_src.mem_hops) { l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]); - - printed = 0; - for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) { - if (!(m & 0x1)) - continue; - if (printed++) { - strcat(out, " or "); - l += 4; + } else { + for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) { + if (!(m & 0x1)) + continue; + if (printed++) { + strcat(out, " or "); + l += 4; + } + l += scnprintf(out + l, sz - l, mem_lvl[i]); } - l += scnprintf(out + l, sz - l, mem_lvl[i]); } if (mem_info && mem_info->data_src.mem_lvl_num) { diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index fffe02aae3ed..d8492e339521 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -209,8 +209,8 @@ static struct metric *metric__new(const struct pmu_event *pe, m->metric_name = pe->metric_name; m->modifier = modifier ? strdup(modifier) : NULL; if (modifier && !m->modifier) { - free(m); expr__ctx_free(m->pctx); + free(m); return NULL; } m->metric_expr = pe->metric_expr; @@ -314,7 +314,7 @@ static int setup_metric_events(struct hashmap *ids, */ metric_id = evsel__metric_id(ev); evlist__for_each_entry_continue(metric_evlist, ev) { - if (!strcmp(evsel__metric_id(metric_events[i]), metric_id)) + if (!strcmp(evsel__metric_id(ev), metric_id)) ev->metric_leader = metric_events[i]; } } @@ -1115,13 +1115,27 @@ out: return ret; } +/** + * metric_list_cmp - list_sort comparator that sorts metrics with more events to + * the front. duration_time is excluded from the count. + */ static int metric_list_cmp(void *priv __maybe_unused, const struct list_head *l, const struct list_head *r) { const struct metric *left = container_of(l, struct metric, nd); const struct metric *right = container_of(r, struct metric, nd); + struct expr_id_data *data; + int left_count, right_count; + + left_count = hashmap__size(left->pctx->ids); + if (!expr__get_id(left->pctx, "duration_time", &data)) + left_count--; + + right_count = hashmap__size(right->pctx->ids); + if (!expr__get_id(right->pctx, "duration_time", &data)) + right_count--; - return hashmap__size(right->pctx->ids) - hashmap__size(left->pctx->ids); + return right_count - left_count; } /** @@ -1299,14 +1313,16 @@ err_out: /** * parse_ids - Build the event string for the ids and parse them creating an * evlist. The encoded metric_ids are decoded. + * @metric_no_merge: is metric sharing explicitly disabled. * @fake_pmu: used when testing metrics not supported by the current CPU. * @ids: the event identifiers parsed from a metric. * @modifier: any modifiers added to the events. * @has_constraint: false if events should be placed in a weak group. * @out_evlist: the created list of events. */ -static int parse_ids(struct perf_pmu *fake_pmu, struct expr_parse_ctx *ids, - const char *modifier, bool has_constraint, struct evlist **out_evlist) +static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu, + struct expr_parse_ctx *ids, const char *modifier, + bool has_constraint, struct evlist **out_evlist) { struct parse_events_error parse_error; struct evlist *parsed_evlist; @@ -1314,12 +1330,19 @@ static int parse_ids(struct perf_pmu *fake_pmu, struct expr_parse_ctx *ids, int ret; *out_evlist = NULL; - if (hashmap__size(ids->ids) == 0) { + if (!metric_no_merge || hashmap__size(ids->ids) == 0) { char *tmp; /* - * No ids/events in the expression parsing context. Events may - * have been removed because of constant evaluation, e.g.: - * event1 if #smt_on else 0 + * We may fail to share events between metrics because + * duration_time isn't present in one metric. For example, a + * ratio of cache misses doesn't need duration_time but the same + * events may be used for a misses per second. Events without + * sharing implies multiplexing, that is best avoided, so place + * duration_time in every group. + * + * Also, there may be no ids/events in the expression parsing + * context because of constant evaluation, e.g.: + * event1 if #smt_on else 0 * Add a duration_time event to avoid a parse error on an empty * string. */ @@ -1387,7 +1410,8 @@ static int parse_groups(struct evlist *perf_evlist, const char *str, ret = build_combined_expr_ctx(&metric_list, &combined); if (!ret && combined && hashmap__size(combined->ids)) { - ret = parse_ids(fake_pmu, combined, /*modifier=*/NULL, + ret = parse_ids(metric_no_merge, fake_pmu, combined, + /*modifier=*/NULL, /*has_constraint=*/true, &combined_evlist); } @@ -1435,7 +1459,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str, } } if (!metric_evlist) { - ret = parse_ids(fake_pmu, m->pctx, m->modifier, + ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier, m->has_constraint, &m->evlist); if (ret) goto out; diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index 23ecdba9e670..12261ed8c15b 100644 --- a/tools/perf/util/mmap.c +++ b/tools/perf/util/mmap.c @@ -94,7 +94,7 @@ static void perf_mmap__aio_free(struct mmap *map, int idx) } } -static int perf_mmap__aio_bind(struct mmap *map, int idx, int cpu, int affinity) +static int perf_mmap__aio_bind(struct mmap *map, int idx, struct perf_cpu cpu, int affinity) { void *data; size_t mmap_len; @@ -138,7 +138,7 @@ static void perf_mmap__aio_free(struct mmap *map, int idx) } static int perf_mmap__aio_bind(struct mmap *map __maybe_unused, int idx __maybe_unused, - int cpu __maybe_unused, int affinity __maybe_unused) + struct perf_cpu cpu __maybe_unused, int affinity __maybe_unused) { return 0; } @@ -240,7 +240,8 @@ void mmap__munmap(struct mmap *map) static void build_node_mask(int node, struct mmap_cpu_mask *mask) { - int c, cpu, nr_cpus; + int idx, nr_cpus; + struct perf_cpu cpu; const struct perf_cpu_map *cpu_map = NULL; cpu_map = cpu_map__online(); @@ -248,16 +249,16 @@ static void build_node_mask(int node, struct mmap_cpu_mask *mask) return; nr_cpus = perf_cpu_map__nr(cpu_map); - for (c = 0; c < nr_cpus; c++) { - cpu = cpu_map->map[c]; /* map c index to online cpu index */ + for (idx = 0; idx < nr_cpus; idx++) { + cpu = cpu_map->map[idx]; /* map c index to online cpu index */ if (cpu__get_node(cpu) == node) - set_bit(cpu, mask->bits); + set_bit(cpu.cpu, mask->bits); } } static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp) { - map->affinity_mask.nbits = cpu__max_cpu(); + map->affinity_mask.nbits = cpu__max_cpu().cpu; map->affinity_mask.bits = bitmap_zalloc(map->affinity_mask.nbits); if (!map->affinity_mask.bits) return -1; @@ -265,12 +266,12 @@ static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params * if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1) build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask); else if (mp->affinity == PERF_AFFINITY_CPU) - set_bit(map->core.cpu, map->affinity_mask.bits); + set_bit(map->core.cpu.cpu, map->affinity_mask.bits); return 0; } -int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu) +int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu cpu) { if (perf_mmap__mmap(&map->core, &mp->core, fd, cpu)) { pr_debug2("failed to mmap perf event ring buffer, error %d\n", diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h index 8e259b9610f8..83f6bd4d4082 100644 --- a/tools/perf/util/mmap.h +++ b/tools/perf/util/mmap.h @@ -7,6 +7,7 @@ #include <linux/types.h> #include <linux/ring_buffer.h> #include <linux/bitops.h> +#include <perf/cpumap.h> #include <stdbool.h> #include <pthread.h> // for cpu_set_t #ifdef HAVE_AIO_SUPPORT @@ -52,7 +53,7 @@ struct mmap_params { struct auxtrace_mmap_params auxtrace_mp; }; -int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu); +int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu cpu); void mmap__munmap(struct mmap *map); union perf_event *perf_mmap__read_forward(struct mmap *map); diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c index 608b20c72a5c..48aa3217300b 100644 --- a/tools/perf/util/namespaces.c +++ b/tools/perf/util/namespaces.c @@ -60,17 +60,49 @@ void namespaces__free(struct namespaces *namespaces) free(namespaces); } +static int nsinfo__get_nspid(struct nsinfo *nsi, const char *path) +{ + FILE *f = NULL; + char *statln = NULL; + size_t linesz = 0; + char *nspid; + + f = fopen(path, "r"); + if (f == NULL) + return -1; + + while (getline(&statln, &linesz, f) != -1) { + /* Use tgid if CONFIG_PID_NS is not defined. */ + if (strstr(statln, "Tgid:") != NULL) { + nsi->tgid = (pid_t)strtol(strrchr(statln, '\t'), + NULL, 10); + nsi->nstgid = nsi->tgid; + } + + if (strstr(statln, "NStgid:") != NULL) { + nspid = strrchr(statln, '\t'); + nsi->nstgid = (pid_t)strtol(nspid, NULL, 10); + /* + * If innermost tgid is not the first, process is in a different + * PID namespace. + */ + nsi->in_pidns = (statln + sizeof("NStgid:") - 1) != nspid; + break; + } + } + + fclose(f); + free(statln); + return 0; +} + int nsinfo__init(struct nsinfo *nsi) { char oldns[PATH_MAX]; char spath[PATH_MAX]; char *newns = NULL; - char *statln = NULL; - char *nspid; struct stat old_stat; struct stat new_stat; - FILE *f = NULL; - size_t linesz = 0; int rv = -1; if (snprintf(oldns, PATH_MAX, "/proc/self/ns/mnt") >= PATH_MAX) @@ -100,34 +132,9 @@ int nsinfo__init(struct nsinfo *nsi) if (snprintf(spath, PATH_MAX, "/proc/%d/status", nsi->pid) >= PATH_MAX) goto out; - f = fopen(spath, "r"); - if (f == NULL) - goto out; - - while (getline(&statln, &linesz, f) != -1) { - /* Use tgid if CONFIG_PID_NS is not defined. */ - if (strstr(statln, "Tgid:") != NULL) { - nsi->tgid = (pid_t)strtol(strrchr(statln, '\t'), - NULL, 10); - nsi->nstgid = nsi->tgid; - } - - if (strstr(statln, "NStgid:") != NULL) { - nspid = strrchr(statln, '\t'); - nsi->nstgid = (pid_t)strtol(nspid, NULL, 10); - /* If innermost tgid is not the first, process is in a different - * PID namespace. - */ - nsi->in_pidns = (statln + sizeof("NStgid:") - 1) != nspid; - break; - } - } - rv = 0; + rv = nsinfo__get_nspid(nsi, spath); out: - if (f != NULL) - (void) fclose(f); - free(statln); free(newns); return rv; } @@ -299,3 +306,12 @@ int nsinfo__stat(const char *filename, struct stat *st, struct nsinfo *nsi) return ret; } + +bool nsinfo__is_in_root_namespace(void) +{ + struct nsinfo nsi; + + memset(&nsi, 0x0, sizeof(nsi)); + nsinfo__get_nspid(&nsi, "/proc/self/status"); + return !nsi.in_pidns; +} diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h index ad9775db7b9c..9ceea9643507 100644 --- a/tools/perf/util/namespaces.h +++ b/tools/perf/util/namespaces.h @@ -59,6 +59,8 @@ void nsinfo__mountns_exit(struct nscookie *nc); char *nsinfo__realpath(const char *path, struct nsinfo *nsi); int nsinfo__stat(const char *filename, struct stat *st, struct nsinfo *nsi); +bool nsinfo__is_in_root_namespace(void); + static inline void __nsinfo__zput(struct nsinfo **nsip) { if (nsip) { diff --git a/tools/perf/util/parse-events-hybrid.c b/tools/perf/util/parse-events-hybrid.c index 9fc86971027b..284f8eabd3b9 100644 --- a/tools/perf/util/parse-events-hybrid.c +++ b/tools/perf/util/parse-events-hybrid.c @@ -63,10 +63,13 @@ static int create_event_hybrid(__u32 config_type, int *idx, static int pmu_cmp(struct parse_events_state *parse_state, struct perf_pmu *pmu) { - if (!parse_state->hybrid_pmu_name) - return 0; + if (parse_state->evlist && parse_state->evlist->hybrid_pmu_name) + return strcmp(parse_state->evlist->hybrid_pmu_name, pmu->name); + + if (parse_state->hybrid_pmu_name) + return strcmp(parse_state->hybrid_pmu_name, pmu->name); - return strcmp(parse_state->hybrid_pmu_name, pmu->name); + return 0; } static int add_hw_hybrid(struct parse_events_state *parse_state, diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index ba74fdf74af9..acf20ce98ce9 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -1824,6 +1824,11 @@ out: return ret; } +__weak struct evsel *arch_evlist__leader(struct list_head *list) +{ + return list_first_entry(list, struct evsel, core.node); +} + void parse_events__set_leader(char *name, struct list_head *list, struct parse_events_state *parse_state) { @@ -1837,9 +1842,10 @@ void parse_events__set_leader(char *name, struct list_head *list, if (parse_events__set_leader_for_uncore_aliase(name, list, parse_state)) return; - __perf_evlist__set_leader(list); - leader = list_entry(list->next, struct evsel, core.node); + leader = arch_evlist__leader(list); + __perf_evlist__set_leader(list, &leader->core); leader->group_name = name ? strdup(name) : NULL; + list_move(&leader->core.node, list); } /* list_event is assumed to point to malloc'ed memory */ diff --git a/tools/perf/util/perf_api_probe.c b/tools/perf/util/perf_api_probe.c index 020411682a3c..734d006d9a8c 100644 --- a/tools/perf/util/perf_api_probe.c +++ b/tools/perf/util/perf_api_probe.c @@ -11,7 +11,7 @@ typedef void (*setup_probe_fn_t)(struct evsel *evsel); -static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) +static int perf_do_probe_api(setup_probe_fn_t fn, struct perf_cpu cpu, const char *str) { struct evlist *evlist; struct evsel *evsel; @@ -29,7 +29,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) evsel = evlist__first(evlist); while (1) { - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu.cpu, -1, flags); if (fd < 0) { if (pid == -1 && errno == EACCES) { pid = 0; @@ -43,7 +43,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) fn(evsel); - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu.cpu, -1, flags); if (fd < 0) { if (errno == EINVAL) err = -EINVAL; @@ -61,7 +61,8 @@ static bool perf_probe_api(setup_probe_fn_t fn) { const char *try[] = {"cycles:u", "instructions:u", "cpu-clock:u", NULL}; struct perf_cpu_map *cpus; - int cpu, ret, i = 0; + struct perf_cpu cpu; + int ret, i = 0; cpus = perf_cpu_map__new(NULL); if (!cpus) @@ -136,15 +137,17 @@ bool perf_can_record_cpu_wide(void) .exclude_kernel = 1, }; struct perf_cpu_map *cpus; - int cpu, fd; + struct perf_cpu cpu; + int fd; cpus = perf_cpu_map__new(NULL); if (!cpus) return false; + cpu = cpus->map[0]; perf_cpu_map__put(cpus); - fd = sys_perf_event_open(&attr, -1, cpu, -1, 0); + fd = sys_perf_event_open(&attr, -1, cpu.cpu, -1, 0); if (fd < 0) return false; close(fd); diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 06a7461ba864..a982e40ee5a9 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include <errno.h> +#include <string.h> #include "perf_regs.h" #include "event.h" @@ -20,6 +21,671 @@ uint64_t __weak arch__user_reg_mask(void) } #ifdef HAVE_PERF_REGS_SUPPORT + +#define perf_event_arm_regs perf_event_arm64_regs +#include "../../arch/arm64/include/uapi/asm/perf_regs.h" +#undef perf_event_arm_regs + +#include "../../arch/arm/include/uapi/asm/perf_regs.h" +#include "../../arch/csky/include/uapi/asm/perf_regs.h" +#include "../../arch/mips/include/uapi/asm/perf_regs.h" +#include "../../arch/powerpc/include/uapi/asm/perf_regs.h" +#include "../../arch/riscv/include/uapi/asm/perf_regs.h" +#include "../../arch/s390/include/uapi/asm/perf_regs.h" +#include "../../arch/x86/include/uapi/asm/perf_regs.h" + +static const char *__perf_reg_name_arm64(int id) +{ + switch (id) { + case PERF_REG_ARM64_X0: + return "x0"; + case PERF_REG_ARM64_X1: + return "x1"; + case PERF_REG_ARM64_X2: + return "x2"; + case PERF_REG_ARM64_X3: + return "x3"; + case PERF_REG_ARM64_X4: + return "x4"; + case PERF_REG_ARM64_X5: + return "x5"; + case PERF_REG_ARM64_X6: + return "x6"; + case PERF_REG_ARM64_X7: + return "x7"; + case PERF_REG_ARM64_X8: + return "x8"; + case PERF_REG_ARM64_X9: + return "x9"; + case PERF_REG_ARM64_X10: + return "x10"; + case PERF_REG_ARM64_X11: + return "x11"; + case PERF_REG_ARM64_X12: + return "x12"; + case PERF_REG_ARM64_X13: + return "x13"; + case PERF_REG_ARM64_X14: + return "x14"; + case PERF_REG_ARM64_X15: + return "x15"; + case PERF_REG_ARM64_X16: + return "x16"; + case PERF_REG_ARM64_X17: + return "x17"; + case PERF_REG_ARM64_X18: + return "x18"; + case PERF_REG_ARM64_X19: + return "x19"; + case PERF_REG_ARM64_X20: + return "x20"; + case PERF_REG_ARM64_X21: + return "x21"; + case PERF_REG_ARM64_X22: + return "x22"; + case PERF_REG_ARM64_X23: + return "x23"; + case PERF_REG_ARM64_X24: + return "x24"; + case PERF_REG_ARM64_X25: + return "x25"; + case PERF_REG_ARM64_X26: + return "x26"; + case PERF_REG_ARM64_X27: + return "x27"; + case PERF_REG_ARM64_X28: + return "x28"; + case PERF_REG_ARM64_X29: + return "x29"; + case PERF_REG_ARM64_SP: + return "sp"; + case PERF_REG_ARM64_LR: + return "lr"; + case PERF_REG_ARM64_PC: + return "pc"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_arm(int id) +{ + switch (id) { + case PERF_REG_ARM_R0: + return "r0"; + case PERF_REG_ARM_R1: + return "r1"; + case PERF_REG_ARM_R2: + return "r2"; + case PERF_REG_ARM_R3: + return "r3"; + case PERF_REG_ARM_R4: + return "r4"; + case PERF_REG_ARM_R5: + return "r5"; + case PERF_REG_ARM_R6: + return "r6"; + case PERF_REG_ARM_R7: + return "r7"; + case PERF_REG_ARM_R8: + return "r8"; + case PERF_REG_ARM_R9: + return "r9"; + case PERF_REG_ARM_R10: + return "r10"; + case PERF_REG_ARM_FP: + return "fp"; + case PERF_REG_ARM_IP: + return "ip"; + case PERF_REG_ARM_SP: + return "sp"; + case PERF_REG_ARM_LR: + return "lr"; + case PERF_REG_ARM_PC: + return "pc"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_csky(int id) +{ + switch (id) { + case PERF_REG_CSKY_A0: + return "a0"; + case PERF_REG_CSKY_A1: + return "a1"; + case PERF_REG_CSKY_A2: + return "a2"; + case PERF_REG_CSKY_A3: + return "a3"; + case PERF_REG_CSKY_REGS0: + return "regs0"; + case PERF_REG_CSKY_REGS1: + return "regs1"; + case PERF_REG_CSKY_REGS2: + return "regs2"; + case PERF_REG_CSKY_REGS3: + return "regs3"; + case PERF_REG_CSKY_REGS4: + return "regs4"; + case PERF_REG_CSKY_REGS5: + return "regs5"; + case PERF_REG_CSKY_REGS6: + return "regs6"; + case PERF_REG_CSKY_REGS7: + return "regs7"; + case PERF_REG_CSKY_REGS8: + return "regs8"; + case PERF_REG_CSKY_REGS9: + return "regs9"; + case PERF_REG_CSKY_SP: + return "sp"; + case PERF_REG_CSKY_LR: + return "lr"; + case PERF_REG_CSKY_PC: + return "pc"; +#if defined(__CSKYABIV2__) + case PERF_REG_CSKY_EXREGS0: + return "exregs0"; + case PERF_REG_CSKY_EXREGS1: + return "exregs1"; + case PERF_REG_CSKY_EXREGS2: + return "exregs2"; + case PERF_REG_CSKY_EXREGS3: + return "exregs3"; + case PERF_REG_CSKY_EXREGS4: + return "exregs4"; + case PERF_REG_CSKY_EXREGS5: + return "exregs5"; + case PERF_REG_CSKY_EXREGS6: + return "exregs6"; + case PERF_REG_CSKY_EXREGS7: + return "exregs7"; + case PERF_REG_CSKY_EXREGS8: + return "exregs8"; + case PERF_REG_CSKY_EXREGS9: + return "exregs9"; + case PERF_REG_CSKY_EXREGS10: + return "exregs10"; + case PERF_REG_CSKY_EXREGS11: + return "exregs11"; + case PERF_REG_CSKY_EXREGS12: + return "exregs12"; + case PERF_REG_CSKY_EXREGS13: + return "exregs13"; + case PERF_REG_CSKY_EXREGS14: + return "exregs14"; + case PERF_REG_CSKY_TLS: + return "tls"; + case PERF_REG_CSKY_HI: + return "hi"; + case PERF_REG_CSKY_LO: + return "lo"; +#endif + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_mips(int id) +{ + switch (id) { + case PERF_REG_MIPS_PC: + return "PC"; + case PERF_REG_MIPS_R1: + return "$1"; + case PERF_REG_MIPS_R2: + return "$2"; + case PERF_REG_MIPS_R3: + return "$3"; + case PERF_REG_MIPS_R4: + return "$4"; + case PERF_REG_MIPS_R5: + return "$5"; + case PERF_REG_MIPS_R6: + return "$6"; + case PERF_REG_MIPS_R7: + return "$7"; + case PERF_REG_MIPS_R8: + return "$8"; + case PERF_REG_MIPS_R9: + return "$9"; + case PERF_REG_MIPS_R10: + return "$10"; + case PERF_REG_MIPS_R11: + return "$11"; + case PERF_REG_MIPS_R12: + return "$12"; + case PERF_REG_MIPS_R13: + return "$13"; + case PERF_REG_MIPS_R14: + return "$14"; + case PERF_REG_MIPS_R15: + return "$15"; + case PERF_REG_MIPS_R16: + return "$16"; + case PERF_REG_MIPS_R17: + return "$17"; + case PERF_REG_MIPS_R18: + return "$18"; + case PERF_REG_MIPS_R19: + return "$19"; + case PERF_REG_MIPS_R20: + return "$20"; + case PERF_REG_MIPS_R21: + return "$21"; + case PERF_REG_MIPS_R22: + return "$22"; + case PERF_REG_MIPS_R23: + return "$23"; + case PERF_REG_MIPS_R24: + return "$24"; + case PERF_REG_MIPS_R25: + return "$25"; + case PERF_REG_MIPS_R28: + return "$28"; + case PERF_REG_MIPS_R29: + return "$29"; + case PERF_REG_MIPS_R30: + return "$30"; + case PERF_REG_MIPS_R31: + return "$31"; + default: + break; + } + return NULL; +} + +static const char *__perf_reg_name_powerpc(int id) +{ + switch (id) { + case PERF_REG_POWERPC_R0: + return "r0"; + case PERF_REG_POWERPC_R1: + return "r1"; + case PERF_REG_POWERPC_R2: + return "r2"; + case PERF_REG_POWERPC_R3: + return "r3"; + case PERF_REG_POWERPC_R4: + return "r4"; + case PERF_REG_POWERPC_R5: + return "r5"; + case PERF_REG_POWERPC_R6: + return "r6"; + case PERF_REG_POWERPC_R7: + return "r7"; + case PERF_REG_POWERPC_R8: + return "r8"; + case PERF_REG_POWERPC_R9: + return "r9"; + case PERF_REG_POWERPC_R10: + return "r10"; + case PERF_REG_POWERPC_R11: + return "r11"; + case PERF_REG_POWERPC_R12: + return "r12"; + case PERF_REG_POWERPC_R13: + return "r13"; + case PERF_REG_POWERPC_R14: + return "r14"; + case PERF_REG_POWERPC_R15: + return "r15"; + case PERF_REG_POWERPC_R16: + return "r16"; + case PERF_REG_POWERPC_R17: + return "r17"; + case PERF_REG_POWERPC_R18: + return "r18"; + case PERF_REG_POWERPC_R19: + return "r19"; + case PERF_REG_POWERPC_R20: + return "r20"; + case PERF_REG_POWERPC_R21: + return "r21"; + case PERF_REG_POWERPC_R22: + return "r22"; + case PERF_REG_POWERPC_R23: + return "r23"; + case PERF_REG_POWERPC_R24: + return "r24"; + case PERF_REG_POWERPC_R25: + return "r25"; + case PERF_REG_POWERPC_R26: + return "r26"; + case PERF_REG_POWERPC_R27: + return "r27"; + case PERF_REG_POWERPC_R28: + return "r28"; + case PERF_REG_POWERPC_R29: + return "r29"; + case PERF_REG_POWERPC_R30: + return "r30"; + case PERF_REG_POWERPC_R31: + return "r31"; + case PERF_REG_POWERPC_NIP: + return "nip"; + case PERF_REG_POWERPC_MSR: + return "msr"; + case PERF_REG_POWERPC_ORIG_R3: + return "orig_r3"; + case PERF_REG_POWERPC_CTR: + return "ctr"; + case PERF_REG_POWERPC_LINK: + return "link"; + case PERF_REG_POWERPC_XER: + return "xer"; + case PERF_REG_POWERPC_CCR: + return "ccr"; + case PERF_REG_POWERPC_SOFTE: + return "softe"; + case PERF_REG_POWERPC_TRAP: + return "trap"; + case PERF_REG_POWERPC_DAR: + return "dar"; + case PERF_REG_POWERPC_DSISR: + return "dsisr"; + case PERF_REG_POWERPC_SIER: + return "sier"; + case PERF_REG_POWERPC_MMCRA: + return "mmcra"; + case PERF_REG_POWERPC_MMCR0: + return "mmcr0"; + case PERF_REG_POWERPC_MMCR1: + return "mmcr1"; + case PERF_REG_POWERPC_MMCR2: + return "mmcr2"; + case PERF_REG_POWERPC_MMCR3: + return "mmcr3"; + case PERF_REG_POWERPC_SIER2: + return "sier2"; + case PERF_REG_POWERPC_SIER3: + return "sier3"; + case PERF_REG_POWERPC_PMC1: + return "pmc1"; + case PERF_REG_POWERPC_PMC2: + return "pmc2"; + case PERF_REG_POWERPC_PMC3: + return "pmc3"; + case PERF_REG_POWERPC_PMC4: + return "pmc4"; + case PERF_REG_POWERPC_PMC5: + return "pmc5"; + case PERF_REG_POWERPC_PMC6: + return "pmc6"; + case PERF_REG_POWERPC_SDAR: + return "sdar"; + case PERF_REG_POWERPC_SIAR: + return "siar"; + default: + break; + } + return NULL; +} + +static const char *__perf_reg_name_riscv(int id) +{ + switch (id) { + case PERF_REG_RISCV_PC: + return "pc"; + case PERF_REG_RISCV_RA: + return "ra"; + case PERF_REG_RISCV_SP: + return "sp"; + case PERF_REG_RISCV_GP: + return "gp"; + case PERF_REG_RISCV_TP: + return "tp"; + case PERF_REG_RISCV_T0: + return "t0"; + case PERF_REG_RISCV_T1: + return "t1"; + case PERF_REG_RISCV_T2: + return "t2"; + case PERF_REG_RISCV_S0: + return "s0"; + case PERF_REG_RISCV_S1: + return "s1"; + case PERF_REG_RISCV_A0: + return "a0"; + case PERF_REG_RISCV_A1: + return "a1"; + case PERF_REG_RISCV_A2: + return "a2"; + case PERF_REG_RISCV_A3: + return "a3"; + case PERF_REG_RISCV_A4: + return "a4"; + case PERF_REG_RISCV_A5: + return "a5"; + case PERF_REG_RISCV_A6: + return "a6"; + case PERF_REG_RISCV_A7: + return "a7"; + case PERF_REG_RISCV_S2: + return "s2"; + case PERF_REG_RISCV_S3: + return "s3"; + case PERF_REG_RISCV_S4: + return "s4"; + case PERF_REG_RISCV_S5: + return "s5"; + case PERF_REG_RISCV_S6: + return "s6"; + case PERF_REG_RISCV_S7: + return "s7"; + case PERF_REG_RISCV_S8: + return "s8"; + case PERF_REG_RISCV_S9: + return "s9"; + case PERF_REG_RISCV_S10: + return "s10"; + case PERF_REG_RISCV_S11: + return "s11"; + case PERF_REG_RISCV_T3: + return "t3"; + case PERF_REG_RISCV_T4: + return "t4"; + case PERF_REG_RISCV_T5: + return "t5"; + case PERF_REG_RISCV_T6: + return "t6"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_s390(int id) +{ + switch (id) { + case PERF_REG_S390_R0: + return "R0"; + case PERF_REG_S390_R1: + return "R1"; + case PERF_REG_S390_R2: + return "R2"; + case PERF_REG_S390_R3: + return "R3"; + case PERF_REG_S390_R4: + return "R4"; + case PERF_REG_S390_R5: + return "R5"; + case PERF_REG_S390_R6: + return "R6"; + case PERF_REG_S390_R7: + return "R7"; + case PERF_REG_S390_R8: + return "R8"; + case PERF_REG_S390_R9: + return "R9"; + case PERF_REG_S390_R10: + return "R10"; + case PERF_REG_S390_R11: + return "R11"; + case PERF_REG_S390_R12: + return "R12"; + case PERF_REG_S390_R13: + return "R13"; + case PERF_REG_S390_R14: + return "R14"; + case PERF_REG_S390_R15: + return "R15"; + case PERF_REG_S390_FP0: + return "FP0"; + case PERF_REG_S390_FP1: + return "FP1"; + case PERF_REG_S390_FP2: + return "FP2"; + case PERF_REG_S390_FP3: + return "FP3"; + case PERF_REG_S390_FP4: + return "FP4"; + case PERF_REG_S390_FP5: + return "FP5"; + case PERF_REG_S390_FP6: + return "FP6"; + case PERF_REG_S390_FP7: + return "FP7"; + case PERF_REG_S390_FP8: + return "FP8"; + case PERF_REG_S390_FP9: + return "FP9"; + case PERF_REG_S390_FP10: + return "FP10"; + case PERF_REG_S390_FP11: + return "FP11"; + case PERF_REG_S390_FP12: + return "FP12"; + case PERF_REG_S390_FP13: + return "FP13"; + case PERF_REG_S390_FP14: + return "FP14"; + case PERF_REG_S390_FP15: + return "FP15"; + case PERF_REG_S390_MASK: + return "MASK"; + case PERF_REG_S390_PC: + return "PC"; + default: + return NULL; + } + + return NULL; +} + +static const char *__perf_reg_name_x86(int id) +{ + switch (id) { + case PERF_REG_X86_AX: + return "AX"; + case PERF_REG_X86_BX: + return "BX"; + case PERF_REG_X86_CX: + return "CX"; + case PERF_REG_X86_DX: + return "DX"; + case PERF_REG_X86_SI: + return "SI"; + case PERF_REG_X86_DI: + return "DI"; + case PERF_REG_X86_BP: + return "BP"; + case PERF_REG_X86_SP: + return "SP"; + case PERF_REG_X86_IP: + return "IP"; + case PERF_REG_X86_FLAGS: + return "FLAGS"; + case PERF_REG_X86_CS: + return "CS"; + case PERF_REG_X86_SS: + return "SS"; + case PERF_REG_X86_DS: + return "DS"; + case PERF_REG_X86_ES: + return "ES"; + case PERF_REG_X86_FS: + return "FS"; + case PERF_REG_X86_GS: + return "GS"; + case PERF_REG_X86_R8: + return "R8"; + case PERF_REG_X86_R9: + return "R9"; + case PERF_REG_X86_R10: + return "R10"; + case PERF_REG_X86_R11: + return "R11"; + case PERF_REG_X86_R12: + return "R12"; + case PERF_REG_X86_R13: + return "R13"; + case PERF_REG_X86_R14: + return "R14"; + case PERF_REG_X86_R15: + return "R15"; + +#define XMM(x) \ + case PERF_REG_X86_XMM ## x: \ + case PERF_REG_X86_XMM ## x + 1: \ + return "XMM" #x; + XMM(0) + XMM(1) + XMM(2) + XMM(3) + XMM(4) + XMM(5) + XMM(6) + XMM(7) + XMM(8) + XMM(9) + XMM(10) + XMM(11) + XMM(12) + XMM(13) + XMM(14) + XMM(15) +#undef XMM + default: + return NULL; + } + + return NULL; +} + +const char *perf_reg_name(int id, const char *arch) +{ + const char *reg_name = NULL; + + if (!strcmp(arch, "csky")) + reg_name = __perf_reg_name_csky(id); + else if (!strcmp(arch, "mips")) + reg_name = __perf_reg_name_mips(id); + else if (!strcmp(arch, "powerpc")) + reg_name = __perf_reg_name_powerpc(id); + else if (!strcmp(arch, "riscv")) + reg_name = __perf_reg_name_riscv(id); + else if (!strcmp(arch, "s390")) + reg_name = __perf_reg_name_s390(id); + else if (!strcmp(arch, "x86")) + reg_name = __perf_reg_name_x86(id); + else if (!strcmp(arch, "arm")) + reg_name = __perf_reg_name_arm(id); + else if (!strcmp(arch, "arm64")) + reg_name = __perf_reg_name_arm64(id); + + return reg_name ?: "unknown"; +} + int perf_reg_value(u64 *valp, struct regs_dump *regs, int id) { int i, idx = 0; diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index eeac181ebccf..ce1127af05e4 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -11,8 +11,11 @@ struct sample_reg { const char *name; uint64_t mask; }; -#define SMPL_REG(n, b) { .name = #n, .mask = 1ULL << (b) } -#define SMPL_REG2(n, b) { .name = #n, .mask = 3ULL << (b) } + +#define SMPL_REG_MASK(b) (1ULL << (b)) +#define SMPL_REG(n, b) { .name = #n, .mask = SMPL_REG_MASK(b) } +#define SMPL_REG2_MASK(b) (3ULL << (b)) +#define SMPL_REG2(n, b) { .name = #n, .mask = SMPL_REG2_MASK(b) } #define SMPL_REG_END { .name = NULL } enum { @@ -31,22 +34,16 @@ extern const struct sample_reg sample_reg_masks[]; #define DWARF_MINIMAL_REGS ((1ULL << PERF_REG_IP) | (1ULL << PERF_REG_SP)) +const char *perf_reg_name(int id, const char *arch); int perf_reg_value(u64 *valp, struct regs_dump *regs, int id); -static inline const char *perf_reg_name(int id) -{ - const char *reg_name = __perf_reg_name(id); - - return reg_name ?: "unknown"; -} - #else #define PERF_REGS_MASK 0 #define PERF_REGS_MAX 0 #define DWARF_MINIMAL_REGS PERF_REGS_MASK -static inline const char *perf_reg_name(int id __maybe_unused) +static inline const char *perf_reg_name(int id __maybe_unused, const char *arch __maybe_unused) { return "unknown"; } diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index 82c7f034d91a..f3e5131f183c 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -1059,7 +1059,7 @@ static struct mmap *get_md(struct evlist *evlist, int cpu) for (i = 0; i < evlist->core.nr_mmaps; i++) { struct mmap *md = &evlist->mmap[i]; - if (md->core.cpu == cpu) + if (md->core.cpu.cpu == cpu) return md; } @@ -1445,7 +1445,7 @@ error: * Dummy, to avoid dragging all the test_attr infrastructure in the python * binding. */ -void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu, +void test_attr__open(struct perf_event_attr *attr, pid_t pid, struct perf_cpu cpu, int fd, int group_fd, unsigned long flags) { } diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c index bff669b615ee..20461f174991 100644 --- a/tools/perf/util/record.c +++ b/tools/perf/util/record.c @@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call if (opts->group) evlist__set_leader(evlist); - if (evlist->core.cpus->map[0] < 0) + if (evlist->core.cpus->map[0].cpu < 0) opts->no_inherit = true; use_comm_exec = perf_can_comm_exec(); @@ -229,7 +229,8 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str) { struct evlist *temp_evlist; struct evsel *evsel; - int err, fd, cpu; + int err, fd; + struct perf_cpu cpu = { .cpu = 0 }; bool ret = false; pid_t pid = -1; @@ -246,14 +247,16 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str) if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) { struct perf_cpu_map *cpus = perf_cpu_map__new(NULL); - cpu = cpus ? cpus->map[0] : 0; + if (cpus) + cpu = cpus->map[0]; + perf_cpu_map__put(cpus); } else { cpu = evlist->core.cpus->map[0]; } while (1) { - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu.cpu, -1, perf_event_open_cloexec_flag()); if (fd < 0) { if (pid == -1 && errno == EACCES) { diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c index d1f1501ce7fc..f5ad0e62227a 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -36,6 +36,7 @@ #include "../debug.h" #include "../dso.h" #include "../callchain.h" +#include "../env.h" #include "../evsel.h" #include "../event.h" #include "../thread.h" @@ -687,7 +688,7 @@ static void set_sample_datasrc_in_dict(PyObject *dict, _PyUnicode_FromString(decode)); } -static void regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size) +static void regs_map(struct regs_dump *regs, uint64_t mask, const char *arch, char *bf, int size) { unsigned int i = 0, r; int printed = 0; @@ -702,7 +703,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size) printed += scnprintf(bf + printed, size - printed, "%5s:0x%" PRIx64 " ", - perf_reg_name(r), val); + perf_reg_name(r, arch), val); } } @@ -711,6 +712,7 @@ static void set_regs_in_dict(PyObject *dict, struct evsel *evsel) { struct perf_event_attr *attr = &evsel->core.attr; + const char *arch = perf_env__arch(evsel__env(evsel)); /* * Here value 28 is a constant size which can be used to print @@ -722,12 +724,12 @@ static void set_regs_in_dict(PyObject *dict, int size = __sw_hweight64(attr->sample_regs_intr) * 28; char bf[size]; - regs_map(&sample->intr_regs, attr->sample_regs_intr, bf, sizeof(bf)); + regs_map(&sample->intr_regs, attr->sample_regs_intr, arch, bf, sizeof(bf)); pydict_set_item_string_decref(dict, "iregs", _PyUnicode_FromString(bf)); - regs_map(&sample->user_regs, attr->sample_regs_user, bf, sizeof(bf)); + regs_map(&sample->user_regs, attr->sample_regs_user, arch, bf, sizeof(bf)); pydict_set_item_string_decref(dict, "uregs", _PyUnicode_FromString(bf)); @@ -1555,7 +1557,7 @@ static void get_handler_name(char *str, size_t size, } static void -process_stat(struct evsel *counter, int cpu, int thread, u64 tstamp, +process_stat(struct evsel *counter, struct perf_cpu cpu, int thread, u64 tstamp, struct perf_counts_values *count) { PyObject *handler, *t; @@ -1575,7 +1577,7 @@ process_stat(struct evsel *counter, int cpu, int thread, u64 tstamp, return; } - PyTuple_SetItem(t, n++, _PyLong_FromLong(cpu)); + PyTuple_SetItem(t, n++, _PyLong_FromLong(cpu.cpu)); PyTuple_SetItem(t, n++, _PyLong_FromLong(thread)); tuple_set_u64(t, n++, tstamp); @@ -1599,7 +1601,7 @@ static void python_process_stat(struct perf_stat_config *config, int cpu, thread; if (config->aggr_mode == AGGR_GLOBAL) { - process_stat(counter, -1, -1, tstamp, + process_stat(counter, (struct perf_cpu){ .cpu = -1 }, -1, tstamp, &counter->counts->aggr); return; } diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index d8857d1b6d7c..f19348dddd55 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -15,6 +15,7 @@ #include "map_symbol.h" #include "branch.h" #include "debug.h" +#include "env.h" #include "evlist.h" #include "evsel.h" #include "memswap.h" @@ -1168,7 +1169,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack) } } -static void regs_dump__printf(u64 mask, u64 *regs) +static void regs_dump__printf(u64 mask, u64 *regs, const char *arch) { unsigned rid, i = 0; @@ -1176,7 +1177,7 @@ static void regs_dump__printf(u64 mask, u64 *regs) u64 val = regs[i++]; printf(".... %-5s 0x%016" PRIx64 "\n", - perf_reg_name(rid), val); + perf_reg_name(rid, arch), val); } } @@ -1194,7 +1195,7 @@ static inline const char *regs_dump_abi(struct regs_dump *d) return regs_abi[d->abi]; } -static void regs__printf(const char *type, struct regs_dump *regs) +static void regs__printf(const char *type, struct regs_dump *regs, const char *arch) { u64 mask = regs->mask; @@ -1203,23 +1204,23 @@ static void regs__printf(const char *type, struct regs_dump *regs) mask, regs_dump_abi(regs)); - regs_dump__printf(mask, regs->regs); + regs_dump__printf(mask, regs->regs, arch); } -static void regs_user__printf(struct perf_sample *sample) +static void regs_user__printf(struct perf_sample *sample, const char *arch) { struct regs_dump *user_regs = &sample->user_regs; if (user_regs->regs) - regs__printf("user", user_regs); + regs__printf("user", user_regs, arch); } -static void regs_intr__printf(struct perf_sample *sample) +static void regs_intr__printf(struct perf_sample *sample, const char *arch) { struct regs_dump *intr_regs = &sample->intr_regs; if (intr_regs->regs) - regs__printf("intr", intr_regs); + regs__printf("intr", intr_regs, arch); } static void stack_user__printf(struct stack_dump *dump) @@ -1304,7 +1305,7 @@ char *get_page_size_name(u64 size, char *str) } static void dump_sample(struct evsel *evsel, union perf_event *event, - struct perf_sample *sample) + struct perf_sample *sample, const char *arch) { u64 sample_type; char str[PAGE_SIZE_NAME_LEN]; @@ -1325,10 +1326,10 @@ static void dump_sample(struct evsel *evsel, union perf_event *event, branch_stack__printf(sample, evsel__has_branch_callstack(evsel)); if (sample_type & PERF_SAMPLE_REGS_USER) - regs_user__printf(sample); + regs_user__printf(sample, arch); if (sample_type & PERF_SAMPLE_REGS_INTR) - regs_intr__printf(sample); + regs_intr__printf(sample, arch); if (sample_type & PERF_SAMPLE_STACK_USER) stack_user__printf(&sample->user_stack); @@ -1502,7 +1503,7 @@ static int machines__deliver_event(struct machines *machines, ++evlist->stats.nr_unknown_id; return 0; } - dump_sample(evsel, event, sample); + dump_sample(evsel, event, sample, perf_env__arch(machine->env)); if (machine == NULL) { ++evlist->stats.nr_unprocessable_samples; return 0; @@ -2537,15 +2538,15 @@ int perf_session__cpu_bitmap(struct perf_session *session, } for (i = 0; i < map->nr; i++) { - int cpu = map->map[i]; + struct perf_cpu cpu = map->map[i]; - if (cpu >= nr_cpus) { + if (cpu.cpu >= nr_cpus) { pr_err("Requested CPU %d too large. " - "Consider raising MAX_NR_CPUS\n", cpu); + "Consider raising MAX_NR_CPUS\n", cpu.cpu); goto out_delete_map; } - set_bit(cpu, cpu_bitmap); + set_bit(cpu.cpu, cpu_bitmap); } err = 0; @@ -2597,7 +2598,7 @@ int perf_event__process_id_index(struct perf_session *session, if (!sid) return -ENOENT; sid->idx = e->idx; - sid->cpu = e->cpu; + sid->cpu.cpu = e->cpu; sid->tid = e->tid; } return 0; diff --git a/tools/perf/util/smt.c b/tools/perf/util/smt.c index 34f1b1b1176c..2b0a36ebf27a 100644 --- a/tools/perf/util/smt.c +++ b/tools/perf/util/smt.c @@ -5,6 +5,56 @@ #include "api/fs/fs.h" #include "smt.h" +/** + * hweight_str - Returns the number of bits set in str. Stops at first non-hex + * or ',' character. + */ +static int hweight_str(char *str) +{ + int result = 0; + + while (*str) { + switch (*str++) { + case '0': + case ',': + break; + case '1': + case '2': + case '4': + case '8': + result++; + break; + case '3': + case '5': + case '6': + case '9': + case 'a': + case 'A': + case 'c': + case 'C': + result += 2; + break; + case '7': + case 'b': + case 'B': + case 'd': + case 'D': + case 'e': + case 'E': + result += 3; + break; + case 'f': + case 'F': + result += 4; + break; + default: + goto done; + } + } +done: + return result; +} + int smt_on(void) { static bool cached; @@ -15,9 +65,12 @@ int smt_on(void) if (cached) return cached_result; - if (sysfs__read_int("devices/system/cpu/smt/active", &cached_result) >= 0) - goto done; + if (sysfs__read_int("devices/system/cpu/smt/active", &cached_result) >= 0) { + cached = true; + return cached_result; + } + cached_result = 0; ncpu = sysconf(_SC_NPROCESSORS_CONF); for (cpu = 0; cpu < ncpu; cpu++) { unsigned long long siblings; @@ -26,27 +79,21 @@ int smt_on(void) char fn[256]; snprintf(fn, sizeof fn, - "devices/system/cpu/cpu%d/topology/core_cpus", cpu); + "devices/system/cpu/cpu%d/topology/thread_siblings", cpu); if (sysfs__read_str(fn, &str, &strlen) < 0) { snprintf(fn, sizeof fn, - "devices/system/cpu/cpu%d/topology/thread_siblings", - cpu); + "devices/system/cpu/cpu%d/topology/core_cpus", cpu); if (sysfs__read_str(fn, &str, &strlen) < 0) continue; } /* Entry is hex, but does not have 0x, so need custom parser */ - siblings = strtoull(str, NULL, 16); + siblings = hweight_str(str); free(str); - if (hweight64(siblings) > 1) { + if (siblings > 1) { cached_result = 1; - cached = true; break; } } - if (!cached) { - cached_result = 0; -done: - cached = true; - } + cached = true; return cached_result; } diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index d9a106f0edb2..cfba8c337783 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -37,7 +37,7 @@ const char default_parent_pattern[] = "^sys_|^do_page_fault"; const char *parent_pattern = default_parent_pattern; const char *default_sort_order = "comm,dso,symbol"; const char default_branch_sort_order[] = "comm,dso_from,symbol_from,symbol_to,cycles"; -const char default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc"; +const char default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc"; const char default_top_sort_order[] = "dso,symbol"; const char default_diff_sort_order[] = "dso,symbol"; const char default_tracepoint_sort_order[] = "trace"; @@ -46,8 +46,8 @@ const char *field_order; regex_t ignore_callees_regex; int have_ignore_callees = 0; enum sort_mode sort__mode = SORT_MODE__NORMAL; -const char *dynamic_headers[] = {"local_ins_lat", "p_stage_cyc"}; -const char *arch_specific_sort_keys[] = {"p_stage_cyc"}; +static const char *const dynamic_headers[] = {"local_ins_lat", "ins_lat", "local_p_stage_cyc", "p_stage_cyc"}; +static const char *const arch_specific_sort_keys[] = {"local_p_stage_cyc", "p_stage_cyc"}; /* * Replaces all occurrences of a char used with the: @@ -1392,22 +1392,37 @@ struct sort_entry sort_global_ins_lat = { }; static int64_t -sort__global_p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right) +sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right) { return left->p_stage_cyc - right->p_stage_cyc; } +static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf, + size_t size, unsigned int width) +{ + return repsep_snprintf(bf, size, "%-*u", width, + he->p_stage_cyc * he->stat.nr_events); +} + + static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf, size_t size, unsigned int width) { return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc); } -struct sort_entry sort_p_stage_cyc = { - .se_header = "Pipeline Stage Cycle", - .se_cmp = sort__global_p_stage_cyc_cmp, +struct sort_entry sort_local_p_stage_cyc = { + .se_header = "Local Pipeline Stage Cycle", + .se_cmp = sort__p_stage_cyc_cmp, .se_snprintf = hist_entry__p_stage_cyc_snprintf, - .se_width_idx = HISTC_P_STAGE_CYC, + .se_width_idx = HISTC_LOCAL_P_STAGE_CYC, +}; + +struct sort_entry sort_global_p_stage_cyc = { + .se_header = "Pipeline Stage Cycle", + .se_cmp = sort__p_stage_cyc_cmp, + .se_snprintf = hist_entry__global_p_stage_cyc_snprintf, + .se_width_idx = HISTC_GLOBAL_P_STAGE_CYC, }; struct sort_entry sort_mem_daddr_sym = { @@ -1858,7 +1873,8 @@ static struct sort_dimension common_sort_dimensions[] = { DIM(SORT_CODE_PAGE_SIZE, "code_page_size", sort_code_page_size), DIM(SORT_LOCAL_INS_LAT, "local_ins_lat", sort_local_ins_lat), DIM(SORT_GLOBAL_INS_LAT, "ins_lat", sort_global_ins_lat), - DIM(SORT_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_p_stage_cyc), + DIM(SORT_LOCAL_PIPELINE_STAGE_CYC, "local_p_stage_cyc", sort_local_p_stage_cyc), + DIM(SORT_GLOBAL_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_global_p_stage_cyc), }; #undef DIM diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index 7b7145501933..f994261888e1 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -235,7 +235,8 @@ enum sort_type { SORT_CODE_PAGE_SIZE, SORT_LOCAL_INS_LAT, SORT_GLOBAL_INS_LAT, - SORT_PIPELINE_STAGE_CYC, + SORT_LOCAL_PIPELINE_STAGE_CYC, + SORT_GLOBAL_PIPELINE_STAGE_CYC, /* branch stack specific sort keys */ __SORT_BRANCH_STACK, diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index 588601000f3f..5db83e51ceef 100644 --- a/tools/perf/util/stat-display.c +++ b/tools/perf/util/stat-display.c @@ -4,6 +4,7 @@ #include <linux/string.h> #include <linux/time64.h> #include <math.h> +#include <perf/cpumap.h> #include "color.h" #include "counts.h" #include "evlist.h" @@ -120,11 +121,10 @@ static void aggr_printout(struct perf_stat_config *config, id.die, config->csv_output ? 0 : -3, id.core, config->csv_sep); - } else if (id.core > -1) { + } else if (id.cpu.cpu > -1) { fprintf(config->output, "CPU%*d%s", config->csv_output ? 0 : -7, - evsel__cpus(evsel)->map[id.core], - config->csv_sep); + id.cpu.cpu, config->csv_sep); } break; case AGGR_THREAD: @@ -327,26 +327,24 @@ static void print_metric_header(struct perf_stat_config *config, fprintf(os->fh, "%*s ", config->metric_only_len, unit); } -static int first_shadow_cpu(struct perf_stat_config *config, - struct evsel *evsel, struct aggr_cpu_id id) +static int first_shadow_cpu_map_idx(struct perf_stat_config *config, + struct evsel *evsel, const struct aggr_cpu_id *id) { - struct evlist *evlist = evsel->evlist; - int i; + struct perf_cpu_map *cpus = evsel__cpus(evsel); + struct perf_cpu cpu; + int idx; if (config->aggr_mode == AGGR_NONE) - return id.core; + return perf_cpu_map__idx(cpus, id->cpu); if (!config->aggr_get_id) return 0; - for (i = 0; i < evsel__nr_cpus(evsel); i++) { - int cpu2 = evsel__cpus(evsel)->map[i]; + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu); - if (cpu_map__compare_aggr_cpu_id( - config->aggr_get_id(config, evlist->core.cpus, cpu2), - id)) { - return cpu2; - } + if (aggr_cpu_id__equal(&cpu_id, id)) + return idx; } return 0; } @@ -505,7 +503,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int } perf_stat__print_shadow_stats(config, counter, uval, - first_shadow_cpu(config, counter, id), + first_shadow_cpu_map_idx(config, counter, &id), &out, &config->metric_events, st); if (!config->csv_output && !config->metric_only) { print_noise(config, counter, noise); @@ -516,23 +514,26 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int static void aggr_update_shadow(struct perf_stat_config *config, struct evlist *evlist) { - int cpu, s; + int idx, s; + struct perf_cpu cpu; struct aggr_cpu_id s2, id; u64 val; struct evsel *counter; + struct perf_cpu_map *cpus; for (s = 0; s < config->aggr_map->nr; s++) { id = config->aggr_map->map[s]; evlist__for_each_entry(evlist, counter) { + cpus = evsel__cpus(counter); val = 0; - for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) { - s2 = config->aggr_get_id(config, evlist->core.cpus, cpu); - if (!cpu_map__compare_aggr_cpu_id(s2, id)) + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + s2 = config->aggr_get_id(config, cpu); + if (!aggr_cpu_id__equal(&s2, &id)) continue; - val += perf_counts(counter->counts, cpu, 0)->val; + val += perf_counts(counter->counts, idx, 0)->val; } perf_stat__update_shadow_stats(counter, val, - first_shadow_cpu(config, counter, id), + first_shadow_cpu_map_idx(config, counter, &id), &rt_stat); } } @@ -627,25 +628,28 @@ struct aggr_data { u64 ena, run, val; struct aggr_cpu_id id; int nr; - int cpu; + int cpu_map_idx; }; static void aggr_cb(struct perf_stat_config *config, struct evsel *counter, void *data, bool first) { struct aggr_data *ad = data; - int cpu; + int idx; + struct perf_cpu cpu; + struct perf_cpu_map *cpus; struct aggr_cpu_id s2; - for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) { + cpus = evsel__cpus(counter); + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { struct perf_counts_values *counts; - s2 = config->aggr_get_id(config, evsel__cpus(counter), cpu); - if (!cpu_map__compare_aggr_cpu_id(s2, ad->id)) + s2 = config->aggr_get_id(config, cpu); + if (!aggr_cpu_id__equal(&s2, &ad->id)) continue; if (first) ad->nr++; - counts = perf_counts(counter->counts, cpu, 0); + counts = perf_counts(counter->counts, idx, 0); /* * When any result is bad, make them all to give * consistent output in interval mode. @@ -665,7 +669,7 @@ static void aggr_cb(struct perf_stat_config *config, static void print_counter_aggrdata(struct perf_stat_config *config, struct evsel *counter, int s, char *prefix, bool metric_only, - bool *first, int cpu) + bool *first, struct perf_cpu cpu) { struct aggr_data ad; FILE *output = config->output; @@ -695,10 +699,9 @@ static void print_counter_aggrdata(struct perf_stat_config *config, fprintf(output, "%s", prefix); uval = val * counter->scale; - if (cpu != -1) { - id = cpu_map__empty_aggr_cpu_id(); - id.core = cpu; - } + if (cpu.cpu != -1) + id = aggr_cpu_id__cpu(cpu, /*data=*/NULL); + printout(config, id, nr, counter, uval, prefix, run, ena, 1.0, &rt_stat); if (!metric_only) @@ -731,8 +734,8 @@ static void print_aggr(struct perf_stat_config *config, first = true; evlist__for_each_entry(evlist, counter) { print_counter_aggrdata(config, counter, s, - prefix, metric_only, - &first, -1); + prefix, metric_only, + &first, (struct perf_cpu){ .cpu = -1 }); } if (metric_only) fputc('\n', output); @@ -778,7 +781,7 @@ static struct perf_aggr_thread_value *sort_aggr_thread( continue; buf[i].counter = counter; - buf[i].id = cpu_map__empty_aggr_cpu_id(); + buf[i].id = aggr_cpu_id__empty(); buf[i].id.thread = thread; buf[i].uval = uval; buf[i].val = val; @@ -866,7 +869,7 @@ static void print_counter_aggr(struct perf_stat_config *config, fprintf(output, "%s", prefix); uval = cd.avg * counter->scale; - printout(config, cpu_map__empty_aggr_cpu_id(), 0, counter, uval, prefix, cd.avg_running, + printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled, cd.avg, &rt_stat); if (!metric_only) fprintf(output, "\n"); @@ -878,9 +881,9 @@ static void counter_cb(struct perf_stat_config *config __maybe_unused, { struct aggr_data *ad = data; - ad->val += perf_counts(counter->counts, ad->cpu, 0)->val; - ad->ena += perf_counts(counter->counts, ad->cpu, 0)->ena; - ad->run += perf_counts(counter->counts, ad->cpu, 0)->run; + ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val; + ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena; + ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run; } /* @@ -893,11 +896,12 @@ static void print_counter(struct perf_stat_config *config, FILE *output = config->output; u64 ena, run, val; double uval; - int cpu; + int idx; + struct perf_cpu cpu; struct aggr_cpu_id id; - for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) { - struct aggr_data ad = { .cpu = cpu }; + perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) { + struct aggr_data ad = { .cpu_map_idx = idx }; if (!collect_data(config, counter, counter_cb, &ad)) return; @@ -909,8 +913,7 @@ static void print_counter(struct perf_stat_config *config, fprintf(output, "%s", prefix); uval = val * counter->scale; - id = cpu_map__empty_aggr_cpu_id(); - id.core = cpu; + id = aggr_cpu_id__cpu(cpu, /*data=*/NULL); printout(config, id, 0, counter, uval, prefix, run, ena, 1.0, &rt_stat); @@ -922,29 +925,32 @@ static void print_no_aggr_metric(struct perf_stat_config *config, struct evlist *evlist, char *prefix) { - int cpu; - int nrcpus = 0; - struct evsel *counter; - u64 ena, run, val; - double uval; - struct aggr_cpu_id id; + int all_idx; + struct perf_cpu cpu; - nrcpus = evlist->core.cpus->nr; - for (cpu = 0; cpu < nrcpus; cpu++) { + perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) { + struct evsel *counter; bool first = true; if (prefix) fputs(prefix, config->output); evlist__for_each_entry(evlist, counter) { - id = cpu_map__empty_aggr_cpu_id(); - id.core = cpu; + u64 ena, run, val; + double uval; + struct aggr_cpu_id id; + int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu); + + if (counter_idx < 0) + continue; + + id = aggr_cpu_id__cpu(cpu, /*data=*/NULL); if (first) { aggr_printout(config, counter, id, 0); first = false; } - val = perf_counts(counter->counts, cpu, 0)->val; - ena = perf_counts(counter->counts, cpu, 0)->ena; - run = perf_counts(counter->counts, cpu, 0)->run; + val = perf_counts(counter->counts, counter_idx, 0)->val; + ena = perf_counts(counter->counts, counter_idx, 0)->ena; + run = perf_counts(counter->counts, counter_idx, 0)->run; uval = val * counter->scale; printout(config, id, 0, counter, uval, prefix, @@ -1208,19 +1214,23 @@ static void print_percore_thread(struct perf_stat_config *config, { int s; struct aggr_cpu_id s2, id; + struct perf_cpu_map *cpus; bool first = true; + int idx; + struct perf_cpu cpu; - for (int i = 0; i < evsel__nr_cpus(counter); i++) { - s2 = config->aggr_get_id(config, evsel__cpus(counter), i); + cpus = evsel__cpus(counter); + perf_cpu_map__for_each_cpu(cpu, idx, cpus) { + s2 = config->aggr_get_id(config, cpu); for (s = 0; s < config->aggr_map->nr; s++) { id = config->aggr_map->map[s]; - if (cpu_map__compare_aggr_cpu_id(s2, id)) + if (aggr_cpu_id__equal(&s2, &id)) break; } print_counter_aggrdata(config, counter, s, prefix, false, - &first, i); + &first, cpu); } } @@ -1243,8 +1253,8 @@ static void print_percore(struct perf_stat_config *config, fprintf(output, "%s", prefix); print_counter_aggrdata(config, counter, s, - prefix, metric_only, - &first, -1); + prefix, metric_only, + &first, (struct perf_cpu){ .cpu = -1 }); } if (metric_only) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 5c7308efa768..10af7804e482 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -32,7 +32,7 @@ struct saved_value { struct evsel *evsel; enum stat_type type; int ctx; - int cpu; + int cpu_map_idx; struct cgroup *cgrp; struct runtime_stat *stat; struct stats stats; @@ -47,8 +47,8 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry) rb_node); const struct saved_value *b = entry; - if (a->cpu != b->cpu) - return a->cpu - b->cpu; + if (a->cpu_map_idx != b->cpu_map_idx) + return a->cpu_map_idx - b->cpu_map_idx; /* * Previously the rbtree was used to link generic metrics. @@ -105,7 +105,7 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused, } static struct saved_value *saved_value_lookup(struct evsel *evsel, - int cpu, + int cpu_map_idx, bool create, enum stat_type type, int ctx, @@ -115,7 +115,7 @@ static struct saved_value *saved_value_lookup(struct evsel *evsel, struct rblist *rblist; struct rb_node *nd; struct saved_value dm = { - .cpu = cpu, + .cpu_map_idx = cpu_map_idx, .evsel = evsel, .type = type, .ctx = ctx, @@ -213,10 +213,10 @@ struct runtime_stat_data { static void update_runtime_stat(struct runtime_stat *st, enum stat_type type, - int cpu, u64 count, + int cpu_map_idx, u64 count, struct runtime_stat_data *rsd) { - struct saved_value *v = saved_value_lookup(NULL, cpu, true, type, + struct saved_value *v = saved_value_lookup(NULL, cpu_map_idx, true, type, rsd->ctx, st, rsd->cgrp); if (v) @@ -229,7 +229,7 @@ static void update_runtime_stat(struct runtime_stat *st, * instruction rates, etc: */ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, - int cpu, struct runtime_stat *st) + int cpu_map_idx, struct runtime_stat *st) { u64 count_ns = count; struct saved_value *v; @@ -241,88 +241,88 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, count *= counter->scale; if (evsel__is_clock(counter)) - update_runtime_stat(st, STAT_NSECS, cpu, count_ns, &rsd); + update_runtime_stat(st, STAT_NSECS, cpu_map_idx, count_ns, &rsd); else if (evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) - update_runtime_stat(st, STAT_CYCLES, cpu, count, &rsd); + update_runtime_stat(st, STAT_CYCLES, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, CYCLES_IN_TX)) - update_runtime_stat(st, STAT_CYCLES_IN_TX, cpu, count, &rsd); + update_runtime_stat(st, STAT_CYCLES_IN_TX, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TRANSACTION_START)) - update_runtime_stat(st, STAT_TRANSACTION, cpu, count, &rsd); + update_runtime_stat(st, STAT_TRANSACTION, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, ELISION_START)) - update_runtime_stat(st, STAT_ELISION, cpu, count, &rsd); + update_runtime_stat(st, STAT_ELISION, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS)) update_runtime_stat(st, STAT_TOPDOWN_TOTAL_SLOTS, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED)) update_runtime_stat(st, STAT_TOPDOWN_SLOTS_ISSUED, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED)) update_runtime_stat(st, STAT_TOPDOWN_SLOTS_RETIRED, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES)) update_runtime_stat(st, STAT_TOPDOWN_FETCH_BUBBLES, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES)) update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING)) update_runtime_stat(st, STAT_TOPDOWN_RETIRING, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC)) update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND)) update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND)) update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_HEAVY_OPS)) update_runtime_stat(st, STAT_TOPDOWN_HEAVY_OPS, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_BR_MISPREDICT)) update_runtime_stat(st, STAT_TOPDOWN_BR_MISPREDICT, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_LAT)) update_runtime_stat(st, STAT_TOPDOWN_FETCH_LAT, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, TOPDOWN_MEM_BOUND)) update_runtime_stat(st, STAT_TOPDOWN_MEM_BOUND, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND)) update_runtime_stat(st, STAT_STALLED_CYCLES_BACK, - cpu, count, &rsd); + cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS)) - update_runtime_stat(st, STAT_BRANCHES, cpu, count, &rsd); + update_runtime_stat(st, STAT_BRANCHES, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES)) - update_runtime_stat(st, STAT_CACHEREFS, cpu, count, &rsd); + update_runtime_stat(st, STAT_CACHEREFS, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1D)) - update_runtime_stat(st, STAT_L1_DCACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_L1_DCACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1I)) - update_runtime_stat(st, STAT_L1_ICACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_L1_ICACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_LL)) - update_runtime_stat(st, STAT_LL_CACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_LL_CACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_DTLB)) - update_runtime_stat(st, STAT_DTLB_CACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_DTLB_CACHE, cpu_map_idx, count, &rsd); else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB)) - update_runtime_stat(st, STAT_ITLB_CACHE, cpu, count, &rsd); + update_runtime_stat(st, STAT_ITLB_CACHE, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, SMI_NUM)) - update_runtime_stat(st, STAT_SMI_NUM, cpu, count, &rsd); + update_runtime_stat(st, STAT_SMI_NUM, cpu_map_idx, count, &rsd); else if (perf_stat_evsel__is(counter, APERF)) - update_runtime_stat(st, STAT_APERF, cpu, count, &rsd); + update_runtime_stat(st, STAT_APERF, cpu_map_idx, count, &rsd); if (counter->collect_stat) { - v = saved_value_lookup(counter, cpu, true, STAT_NONE, 0, st, + v = saved_value_lookup(counter, cpu_map_idx, true, STAT_NONE, 0, st, rsd.cgrp); update_stats(&v->stats, count); if (counter->metric_leader) v->metric_total += count; } else if (counter->metric_leader) { v = saved_value_lookup(counter->metric_leader, - cpu, true, STAT_NONE, 0, st, rsd.cgrp); + cpu_map_idx, true, STAT_NONE, 0, st, rsd.cgrp); v->metric_total += count; v->metric_other++; } @@ -464,12 +464,12 @@ void perf_stat__collect_metric_expr(struct evlist *evsel_list) } static double runtime_stat_avg(struct runtime_stat *st, - enum stat_type type, int cpu, + enum stat_type type, int cpu_map_idx, struct runtime_stat_data *rsd) { struct saved_value *v; - v = saved_value_lookup(NULL, cpu, false, type, rsd->ctx, st, rsd->cgrp); + v = saved_value_lookup(NULL, cpu_map_idx, false, type, rsd->ctx, st, rsd->cgrp); if (!v) return 0.0; @@ -477,12 +477,12 @@ static double runtime_stat_avg(struct runtime_stat *st, } static double runtime_stat_n(struct runtime_stat *st, - enum stat_type type, int cpu, + enum stat_type type, int cpu_map_idx, struct runtime_stat_data *rsd) { struct saved_value *v; - v = saved_value_lookup(NULL, cpu, false, type, rsd->ctx, st, rsd->cgrp); + v = saved_value_lookup(NULL, cpu_map_idx, false, type, rsd->ctx, st, rsd->cgrp); if (!v) return 0.0; @@ -490,7 +490,7 @@ static double runtime_stat_n(struct runtime_stat *st, } static void print_stalled_cycles_frontend(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -498,7 +498,7 @@ static void print_stalled_cycles_frontend(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_CYCLES, cpu, rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -513,7 +513,7 @@ static void print_stalled_cycles_frontend(struct perf_stat_config *config, } static void print_stalled_cycles_backend(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -521,7 +521,7 @@ static void print_stalled_cycles_backend(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_CYCLES, cpu, rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -532,7 +532,7 @@ static void print_stalled_cycles_backend(struct perf_stat_config *config, } static void print_branch_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -540,7 +540,7 @@ static void print_branch_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_BRANCHES, cpu, rsd); + total = runtime_stat_avg(st, STAT_BRANCHES, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -551,7 +551,7 @@ static void print_branch_misses(struct perf_stat_config *config, } static void print_l1_dcache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -559,7 +559,7 @@ static void print_l1_dcache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_L1_DCACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_L1_DCACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -570,7 +570,7 @@ static void print_l1_dcache_misses(struct perf_stat_config *config, } static void print_l1_icache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -578,7 +578,7 @@ static void print_l1_icache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_L1_ICACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_L1_ICACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -588,7 +588,7 @@ static void print_l1_icache_misses(struct perf_stat_config *config, } static void print_dtlb_cache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -596,7 +596,7 @@ static void print_dtlb_cache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_DTLB_CACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_DTLB_CACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -606,7 +606,7 @@ static void print_dtlb_cache_misses(struct perf_stat_config *config, } static void print_itlb_cache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -614,7 +614,7 @@ static void print_itlb_cache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_ITLB_CACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_ITLB_CACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -624,7 +624,7 @@ static void print_itlb_cache_misses(struct perf_stat_config *config, } static void print_ll_cache_misses(struct perf_stat_config *config, - int cpu, double avg, + int cpu_map_idx, double avg, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -632,7 +632,7 @@ static void print_ll_cache_misses(struct perf_stat_config *config, double total, ratio = 0.0; const char *color; - total = runtime_stat_avg(st, STAT_LL_CACHE, cpu, rsd); + total = runtime_stat_avg(st, STAT_LL_CACHE, cpu_map_idx, rsd); if (total) ratio = avg / total * 100.0; @@ -690,61 +690,61 @@ static double sanitize_val(double x) return x; } -static double td_total_slots(int cpu, struct runtime_stat *st, +static double td_total_slots(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { - return runtime_stat_avg(st, STAT_TOPDOWN_TOTAL_SLOTS, cpu, rsd); + return runtime_stat_avg(st, STAT_TOPDOWN_TOTAL_SLOTS, cpu_map_idx, rsd); } -static double td_bad_spec(int cpu, struct runtime_stat *st, +static double td_bad_spec(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { double bad_spec = 0; double total_slots; double total; - total = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_ISSUED, cpu, rsd) - - runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, cpu, rsd) + - runtime_stat_avg(st, STAT_TOPDOWN_RECOVERY_BUBBLES, cpu, rsd); + total = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_ISSUED, cpu_map_idx, rsd) - + runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, cpu_map_idx, rsd) + + runtime_stat_avg(st, STAT_TOPDOWN_RECOVERY_BUBBLES, cpu_map_idx, rsd); - total_slots = td_total_slots(cpu, st, rsd); + total_slots = td_total_slots(cpu_map_idx, st, rsd); if (total_slots) bad_spec = total / total_slots; return sanitize_val(bad_spec); } -static double td_retiring(int cpu, struct runtime_stat *st, +static double td_retiring(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { double retiring = 0; - double total_slots = td_total_slots(cpu, st, rsd); + double total_slots = td_total_slots(cpu_map_idx, st, rsd); double ret_slots = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, - cpu, rsd); + cpu_map_idx, rsd); if (total_slots) retiring = ret_slots / total_slots; return retiring; } -static double td_fe_bound(int cpu, struct runtime_stat *st, +static double td_fe_bound(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { double fe_bound = 0; - double total_slots = td_total_slots(cpu, st, rsd); + double total_slots = td_total_slots(cpu_map_idx, st, rsd); double fetch_bub = runtime_stat_avg(st, STAT_TOPDOWN_FETCH_BUBBLES, - cpu, rsd); + cpu_map_idx, rsd); if (total_slots) fe_bound = fetch_bub / total_slots; return fe_bound; } -static double td_be_bound(int cpu, struct runtime_stat *st, +static double td_be_bound(int cpu_map_idx, struct runtime_stat *st, struct runtime_stat_data *rsd) { - double sum = (td_fe_bound(cpu, st, rsd) + - td_bad_spec(cpu, st, rsd) + - td_retiring(cpu, st, rsd)); + double sum = (td_fe_bound(cpu_map_idx, st, rsd) + + td_bad_spec(cpu_map_idx, st, rsd) + + td_retiring(cpu_map_idx, st, rsd)); if (sum == 0) return 0; return sanitize_val(1.0 - sum); @@ -755,15 +755,15 @@ static double td_be_bound(int cpu, struct runtime_stat *st, * the ratios we need to recreate the sum. */ -static double td_metric_ratio(int cpu, enum stat_type type, +static double td_metric_ratio(int cpu_map_idx, enum stat_type type, struct runtime_stat *stat, struct runtime_stat_data *rsd) { - double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu, rsd); - double d = runtime_stat_avg(stat, type, cpu, rsd); + double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu_map_idx, rsd) + + runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu_map_idx, rsd) + + runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu_map_idx, rsd) + + runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu_map_idx, rsd); + double d = runtime_stat_avg(stat, type, cpu_map_idx, rsd); if (sum) return d / sum; @@ -775,23 +775,23 @@ static double td_metric_ratio(int cpu, enum stat_type type, * We allow two missing. */ -static bool full_td(int cpu, struct runtime_stat *stat, +static bool full_td(int cpu_map_idx, struct runtime_stat *stat, struct runtime_stat_data *rsd) { int c = 0; - if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, cpu_map_idx, rsd) > 0) c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, cpu_map_idx, rsd) > 0) c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, cpu_map_idx, rsd) > 0) c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu, rsd) > 0) + if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, cpu_map_idx, rsd) > 0) c++; return c >= 2; } -static void print_smi_cost(struct perf_stat_config *config, int cpu, +static void print_smi_cost(struct perf_stat_config *config, int cpu_map_idx, struct perf_stat_output_ctx *out, struct runtime_stat *st, struct runtime_stat_data *rsd) @@ -799,9 +799,9 @@ static void print_smi_cost(struct perf_stat_config *config, int cpu, double smi_num, aperf, cycles, cost = 0.0; const char *color = NULL; - smi_num = runtime_stat_avg(st, STAT_SMI_NUM, cpu, rsd); - aperf = runtime_stat_avg(st, STAT_APERF, cpu, rsd); - cycles = runtime_stat_avg(st, STAT_CYCLES, cpu, rsd); + smi_num = runtime_stat_avg(st, STAT_SMI_NUM, cpu_map_idx, rsd); + aperf = runtime_stat_avg(st, STAT_APERF, cpu_map_idx, rsd); + cycles = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, rsd); if ((cycles == 0) || (aperf == 0)) return; @@ -818,7 +818,7 @@ static void print_smi_cost(struct perf_stat_config *config, int cpu, static int prepare_metric(struct evsel **metric_events, struct metric_ref *metric_refs, struct expr_parse_ctx *pctx, - int cpu, + int cpu_map_idx, struct runtime_stat *st) { double scale; @@ -836,7 +836,7 @@ static int prepare_metric(struct evsel **metric_events, scale = 1e-9; source_count = 1; } else { - v = saved_value_lookup(metric_events[i], cpu, false, + v = saved_value_lookup(metric_events[i], cpu_map_idx, false, STAT_NONE, 0, st, metric_events[i]->cgrp); if (!v) @@ -874,7 +874,7 @@ static void generic_metric(struct perf_stat_config *config, const char *metric_name, const char *metric_unit, int runtime, - int cpu, + int cpu_map_idx, struct perf_stat_output_ctx *out, struct runtime_stat *st) { @@ -889,7 +889,7 @@ static void generic_metric(struct perf_stat_config *config, return; pctx->runtime = runtime; - i = prepare_metric(metric_events, metric_refs, pctx, cpu, st); + i = prepare_metric(metric_events, metric_refs, pctx, cpu_map_idx, st); if (i < 0) { expr__ctx_free(pctx); return; @@ -934,7 +934,7 @@ static void generic_metric(struct perf_stat_config *config, expr__ctx_free(pctx); } -double test_generic_metric(struct metric_expr *mexp, int cpu, struct runtime_stat *st) +double test_generic_metric(struct metric_expr *mexp, int cpu_map_idx, struct runtime_stat *st) { struct expr_parse_ctx *pctx; double ratio = 0.0; @@ -943,7 +943,7 @@ double test_generic_metric(struct metric_expr *mexp, int cpu, struct runtime_sta if (!pctx) return NAN; - if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, cpu, st) < 0) + if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, cpu_map_idx, st) < 0) goto out; if (expr__parse(&ratio, pctx, mexp->metric_expr)) @@ -956,7 +956,7 @@ out: void perf_stat__print_shadow_stats(struct perf_stat_config *config, struct evsel *evsel, - double avg, int cpu, + double avg, int cpu_map_idx, struct perf_stat_output_ctx *out, struct rblist *metric_events, struct runtime_stat *st) @@ -975,7 +975,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, if (config->iostat_run) { iostat_print_metric(config, evsel, out); } else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) { - total = runtime_stat_avg(st, STAT_CYCLES, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, &rsd); if (total) { ratio = avg / total; @@ -985,11 +985,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0); } - total = runtime_stat_avg(st, STAT_STALLED_CYCLES_FRONT, cpu, &rsd); + total = runtime_stat_avg(st, STAT_STALLED_CYCLES_FRONT, cpu_map_idx, &rsd); total = max(total, runtime_stat_avg(st, STAT_STALLED_CYCLES_BACK, - cpu, &rsd)); + cpu_map_idx, &rsd)); if (total && avg) { out->new_line(config, ctxp); @@ -999,8 +999,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ratio); } } else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) { - if (runtime_stat_n(st, STAT_BRANCHES, cpu, &rsd) != 0) - print_branch_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_BRANCHES, cpu_map_idx, &rsd) != 0) + print_branch_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all branches", 0); } else if ( @@ -1009,8 +1009,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_L1_DCACHE, cpu, &rsd) != 0) - print_l1_dcache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_L1_DCACHE, cpu_map_idx, &rsd) != 0) + print_l1_dcache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all L1-dcache accesses", 0); } else if ( @@ -1019,8 +1019,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_L1_ICACHE, cpu, &rsd) != 0) - print_l1_icache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_L1_ICACHE, cpu_map_idx, &rsd) != 0) + print_l1_icache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all L1-icache accesses", 0); } else if ( @@ -1029,8 +1029,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_DTLB_CACHE, cpu, &rsd) != 0) - print_dtlb_cache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_DTLB_CACHE, cpu_map_idx, &rsd) != 0) + print_dtlb_cache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all dTLB cache accesses", 0); } else if ( @@ -1039,8 +1039,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_ITLB_CACHE, cpu, &rsd) != 0) - print_itlb_cache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_ITLB_CACHE, cpu_map_idx, &rsd) != 0) + print_itlb_cache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all iTLB cache accesses", 0); } else if ( @@ -1049,27 +1049,27 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - if (runtime_stat_n(st, STAT_LL_CACHE, cpu, &rsd) != 0) - print_ll_cache_misses(config, cpu, avg, out, st, &rsd); + if (runtime_stat_n(st, STAT_LL_CACHE, cpu_map_idx, &rsd) != 0) + print_ll_cache_misses(config, cpu_map_idx, avg, out, st, &rsd); else print_metric(config, ctxp, NULL, NULL, "of all LL-cache accesses", 0); } else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) { - total = runtime_stat_avg(st, STAT_CACHEREFS, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CACHEREFS, cpu_map_idx, &rsd); if (total) ratio = avg * 100 / total; - if (runtime_stat_n(st, STAT_CACHEREFS, cpu, &rsd) != 0) + if (runtime_stat_n(st, STAT_CACHEREFS, cpu_map_idx, &rsd) != 0) print_metric(config, ctxp, NULL, "%8.3f %%", "of all cache refs", ratio); else print_metric(config, ctxp, NULL, NULL, "of all cache refs", 0); } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) { - print_stalled_cycles_frontend(config, cpu, avg, out, st, &rsd); + print_stalled_cycles_frontend(config, cpu_map_idx, avg, out, st, &rsd); } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) { - print_stalled_cycles_backend(config, cpu, avg, out, st, &rsd); + print_stalled_cycles_backend(config, cpu_map_idx, avg, out, st, &rsd); } else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) { - total = runtime_stat_avg(st, STAT_NSECS, cpu, &rsd); + total = runtime_stat_avg(st, STAT_NSECS, cpu_map_idx, &rsd); if (total) { ratio = avg / total; @@ -1078,7 +1078,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, NULL, NULL, "Ghz", 0); } } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) { - total = runtime_stat_avg(st, STAT_CYCLES, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, &rsd); if (total) print_metric(config, ctxp, NULL, @@ -1088,8 +1088,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, NULL, NULL, "transactional cycles", 0); } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) { - total = runtime_stat_avg(st, STAT_CYCLES, cpu, &rsd); - total2 = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES, cpu_map_idx, &rsd); + total2 = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd); if (total2 < avg) total2 = avg; @@ -1099,19 +1099,19 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, else print_metric(config, ctxp, NULL, NULL, "aborted cycles", 0); } else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) { - total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd); if (avg) ratio = total / avg; - if (runtime_stat_n(st, STAT_CYCLES_IN_TX, cpu, &rsd) != 0) + if (runtime_stat_n(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd) != 0) print_metric(config, ctxp, NULL, "%8.0f", "cycles / transaction", ratio); else print_metric(config, ctxp, NULL, NULL, "cycles / transaction", 0); } else if (perf_stat_evsel__is(evsel, ELISION_START)) { - total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu, &rsd); + total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, cpu_map_idx, &rsd); if (avg) ratio = total / avg; @@ -1124,28 +1124,28 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, else print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0); } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) { - double fe_bound = td_fe_bound(cpu, st, &rsd); + double fe_bound = td_fe_bound(cpu_map_idx, st, &rsd); if (fe_bound > 0.2) color = PERF_COLOR_RED; print_metric(config, ctxp, color, "%8.1f%%", "frontend bound", fe_bound * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) { - double retiring = td_retiring(cpu, st, &rsd); + double retiring = td_retiring(cpu_map_idx, st, &rsd); if (retiring > 0.7) color = PERF_COLOR_GREEN; print_metric(config, ctxp, color, "%8.1f%%", "retiring", retiring * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) { - double bad_spec = td_bad_spec(cpu, st, &rsd); + double bad_spec = td_bad_spec(cpu_map_idx, st, &rsd); if (bad_spec > 0.1) color = PERF_COLOR_RED; print_metric(config, ctxp, color, "%8.1f%%", "bad speculation", bad_spec * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) { - double be_bound = td_be_bound(cpu, st, &rsd); + double be_bound = td_be_bound(cpu_map_idx, st, &rsd); const char *name = "backend bound"; static int have_recovery_bubbles = -1; @@ -1158,14 +1158,14 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, if (be_bound > 0.2) color = PERF_COLOR_RED; - if (td_total_slots(cpu, st, &rsd) > 0) + if (td_total_slots(cpu_map_idx, st, &rsd) > 0) print_metric(config, ctxp, color, "%8.1f%%", name, be_bound * 100.); else print_metric(config, ctxp, NULL, NULL, name, 0); } else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) && - full_td(cpu, st, &rsd)) { - double retiring = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double retiring = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_RETIRING, st, &rsd); if (retiring > 0.7) @@ -1173,8 +1173,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "retiring", retiring * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) && - full_td(cpu, st, &rsd)) { - double fe_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double fe_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_FE_BOUND, st, &rsd); if (fe_bound > 0.2) @@ -1182,8 +1182,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "frontend bound", fe_bound * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) && - full_td(cpu, st, &rsd)) { - double be_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double be_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BE_BOUND, st, &rsd); if (be_bound > 0.2) @@ -1191,8 +1191,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "backend bound", be_bound * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) && - full_td(cpu, st, &rsd)) { - double bad_spec = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd)) { + double bad_spec = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BAD_SPEC, st, &rsd); if (bad_spec > 0.1) @@ -1200,11 +1200,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "bad speculation", bad_spec * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_HEAVY_OPS) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double retiring = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double retiring = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_RETIRING, st, &rsd); - double heavy_ops = td_metric_ratio(cpu, + double heavy_ops = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_HEAVY_OPS, st, &rsd); double light_ops = retiring - heavy_ops; @@ -1220,11 +1220,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "light operations", light_ops * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_BR_MISPREDICT) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double bad_spec = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double bad_spec = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BAD_SPEC, st, &rsd); - double br_mis = td_metric_ratio(cpu, + double br_mis = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BR_MISPREDICT, st, &rsd); double m_clears = bad_spec - br_mis; @@ -1240,11 +1240,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "machine clears", m_clears * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_LAT) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double fe_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double fe_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_FE_BOUND, st, &rsd); - double fetch_lat = td_metric_ratio(cpu, + double fetch_lat = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_FETCH_LAT, st, &rsd); double fetch_bw = fe_bound - fetch_lat; @@ -1260,11 +1260,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, print_metric(config, ctxp, color, "%8.1f%%", "fetch bandwidth", fetch_bw * 100.); } else if (perf_stat_evsel__is(evsel, TOPDOWN_MEM_BOUND) && - full_td(cpu, st, &rsd) && (config->topdown_level > 1)) { - double be_bound = td_metric_ratio(cpu, + full_td(cpu_map_idx, st, &rsd) && (config->topdown_level > 1)) { + double be_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_BE_BOUND, st, &rsd); - double mem_bound = td_metric_ratio(cpu, + double mem_bound = td_metric_ratio(cpu_map_idx, STAT_TOPDOWN_MEM_BOUND, st, &rsd); double core_bound = be_bound - mem_bound; @@ -1281,12 +1281,12 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, core_bound * 100.); } else if (evsel->metric_expr) { generic_metric(config, evsel->metric_expr, evsel->metric_events, NULL, - evsel->name, evsel->metric_name, NULL, 1, cpu, out, st); - } else if (runtime_stat_n(st, STAT_NSECS, cpu, &rsd) != 0) { + evsel->name, evsel->metric_name, NULL, 1, cpu_map_idx, out, st); + } else if (runtime_stat_n(st, STAT_NSECS, cpu_map_idx, &rsd) != 0) { char unit = ' '; char unit_buf[10] = "/sec"; - total = runtime_stat_avg(st, STAT_NSECS, cpu, &rsd); + total = runtime_stat_avg(st, STAT_NSECS, cpu_map_idx, &rsd); if (total) ratio = convert_unit_double(1000000000.0 * avg / total, &unit); @@ -1294,7 +1294,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit); print_metric(config, ctxp, NULL, "%8.3f", unit_buf, ratio); } else if (perf_stat_evsel__is(evsel, SMI_NUM)) { - print_smi_cost(config, cpu, out, st, &rsd); + print_smi_cost(config, cpu_map_idx, out, st, &rsd); } else { num = 0; } @@ -1307,7 +1307,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, out->new_line(config, ctxp); generic_metric(config, mexp->metric_expr, mexp->metric_events, mexp->metric_refs, evsel->name, mexp->metric_name, - mexp->metric_unit, mexp->runtime, cpu, out, st); + mexp->metric_unit, mexp->runtime, cpu_map_idx, out, st); } } if (num == 0) diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c index 09ea334586f2..ee6f03481215 100644 --- a/tools/perf/util/stat.c +++ b/tools/perf/util/stat.c @@ -152,11 +152,13 @@ static void evsel__free_stat_priv(struct evsel *evsel) zfree(&evsel->stats); } -static int evsel__alloc_prev_raw_counts(struct evsel *evsel, int ncpus, int nthreads) +static int evsel__alloc_prev_raw_counts(struct evsel *evsel) { + int cpu_map_nr = evsel__nr_cpus(evsel); + int nthreads = perf_thread_map__nr(evsel->core.threads); struct perf_counts *counts; - counts = perf_counts__new(ncpus, nthreads); + counts = perf_counts__new(cpu_map_nr, nthreads); if (counts) evsel->prev_raw_counts = counts; @@ -177,12 +179,9 @@ static void evsel__reset_prev_raw_counts(struct evsel *evsel) static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw) { - int ncpus = evsel__nr_cpus(evsel); - int nthreads = perf_thread_map__nr(evsel->core.threads); - if (evsel__alloc_stat_priv(evsel) < 0 || - evsel__alloc_counts(evsel, ncpus, nthreads) < 0 || - (alloc_raw && evsel__alloc_prev_raw_counts(evsel, ncpus, nthreads) < 0)) + evsel__alloc_counts(evsel) < 0 || + (alloc_raw && evsel__alloc_prev_raw_counts(evsel) < 0)) return -ENOMEM; return 0; @@ -293,11 +292,12 @@ static bool pkg_id_equal(const void *__key1, const void *__key2, return *key1 == *key2; } -static int check_per_pkg(struct evsel *counter, - struct perf_counts_values *vals, int cpu, bool *skip) +static int check_per_pkg(struct evsel *counter, struct perf_counts_values *vals, + int cpu_map_idx, bool *skip) { struct hashmap *mask = counter->per_pkg_mask; struct perf_cpu_map *cpus = evsel__cpus(counter); + struct perf_cpu cpu = perf_cpu_map__cpu(cpus, cpu_map_idx); int s, d, ret = 0; uint64_t *key; @@ -328,7 +328,7 @@ static int check_per_pkg(struct evsel *counter, if (!(vals->run && vals->ena)) return 0; - s = cpu_map__get_socket(cpus, cpu, NULL).socket; + s = cpu__get_socket_id(cpu); if (s < 0) return -1; @@ -336,7 +336,7 @@ static int check_per_pkg(struct evsel *counter, * On multi-die system, die_id > 0. On no-die system, die_id = 0. * We use hashmap(socket, die) to check the used socket+die pair. */ - d = cpu_map__get_die(cpus, cpu, NULL).die; + d = cpu__get_die_id(cpu); if (d < 0) return -1; @@ -345,9 +345,10 @@ static int check_per_pkg(struct evsel *counter, return -ENOMEM; *key = (uint64_t)d << 32 | s; - if (hashmap__find(mask, (void *)key, NULL)) + if (hashmap__find(mask, (void *)key, NULL)) { *skip = true; - else + free(key); + } else ret = hashmap__add(mask, (void *)key, (void *)1); return ret; @@ -355,14 +356,14 @@ static int check_per_pkg(struct evsel *counter, static int process_counter_values(struct perf_stat_config *config, struct evsel *evsel, - int cpu, int thread, + int cpu_map_idx, int thread, struct perf_counts_values *count) { struct perf_counts_values *aggr = &evsel->counts->aggr; static struct perf_counts_values zero; bool skip = false; - if (check_per_pkg(evsel, count, cpu, &skip)) { + if (check_per_pkg(evsel, count, cpu_map_idx, &skip)) { pr_err("failed to read per-pkg counter\n"); return -1; } @@ -378,11 +379,11 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel, case AGGR_NODE: case AGGR_NONE: if (!evsel->snapshot) - evsel__compute_deltas(evsel, cpu, thread, count); + evsel__compute_deltas(evsel, cpu_map_idx, thread, count); perf_counts_values__scale(count, config->scale, NULL); if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) { perf_stat__update_shadow_stats(evsel, count->val, - cpu, &rt_stat); + cpu_map_idx, &rt_stat); } if (config->aggr_mode == AGGR_THREAD) { @@ -411,15 +412,15 @@ static int process_counter_maps(struct perf_stat_config *config, { int nthreads = perf_thread_map__nr(counter->core.threads); int ncpus = evsel__nr_cpus(counter); - int cpu, thread; + int idx, thread; if (counter->core.system_wide) nthreads = 1; for (thread = 0; thread < nthreads; thread++) { - for (cpu = 0; cpu < ncpus; cpu++) { - if (process_counter_values(config, counter, cpu, thread, - perf_counts(counter->counts, cpu, thread))) + for (idx = 0; idx < ncpus; idx++) { + if (process_counter_values(config, counter, idx, thread, + perf_counts(counter->counts, idx, thread))) return -1; } } @@ -531,7 +532,7 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp) int create_perf_stat_counter(struct evsel *evsel, struct perf_stat_config *config, struct target *target, - int cpu) + int cpu_map_idx) { struct perf_event_attr *attr = &evsel->core.attr; struct evsel *leader = evsel__leader(evsel); @@ -585,7 +586,7 @@ int create_perf_stat_counter(struct evsel *evsel, } if (target__has_cpu(target) && !target__has_per_thread(target)) - return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu); + return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu_map_idx); return evsel__open_per_thread(evsel, evsel->core.threads); } diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h index 32c8527de347..335d19cc3063 100644 --- a/tools/perf/util/stat.h +++ b/tools/perf/util/stat.h @@ -108,8 +108,7 @@ struct runtime_stat { struct rblist value_list; }; -typedef struct aggr_cpu_id (*aggr_get_id_t)(struct perf_stat_config *config, - struct perf_cpu_map *m, int cpu); +typedef struct aggr_cpu_id (*aggr_get_id_t)(struct perf_stat_config *config, struct perf_cpu cpu); struct perf_stat_config { enum aggr_mode aggr_mode; @@ -209,7 +208,7 @@ void perf_stat__init_shadow_stats(void); void perf_stat__reset_shadow_stats(void); void perf_stat__reset_shadow_per_stat(struct runtime_stat *st); void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, - int cpu, struct runtime_stat *st); + int cpu_map_idx, struct runtime_stat *st); struct perf_stat_output_ctx { void *ctx; print_metric_t print_metric; @@ -249,10 +248,10 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp); int create_perf_stat_counter(struct evsel *evsel, struct perf_stat_config *config, struct target *target, - int cpu); + int cpu_map_idx); void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *config, struct target *_target, struct timespec *ts, int argc, const char **argv); struct metric_expr; -double test_generic_metric(struct metric_expr *mexp, int cpu, struct runtime_stat *st); +double test_generic_metric(struct metric_expr *mexp, int cpu_map_idx, struct runtime_stat *st); #endif diff --git a/tools/perf/util/svghelper.c b/tools/perf/util/svghelper.c index 96f941e01681..4c9f211249db 100644 --- a/tools/perf/util/svghelper.c +++ b/tools/perf/util/svghelper.c @@ -728,7 +728,7 @@ static int str_to_bitmap(char *s, cpumask_t *b, int nr_cpus) int i; int ret = 0; struct perf_cpu_map *m; - int c; + struct perf_cpu c; m = perf_cpu_map__new(s); if (!m) @@ -736,12 +736,12 @@ static int str_to_bitmap(char *s, cpumask_t *b, int nr_cpus) for (i = 0; i < m->nr; i++) { c = m->map[i]; - if (c >= nr_cpus) { + if (c.cpu >= nr_cpus) { ret = -1; break; } - set_bit(c, cpumask_bits(b)); + set_bit(c.cpu, cpumask_bits(b)); } perf_cpu_map__put(m); diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 198982109f0f..c9ba8050cc2b 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1191,7 +1191,7 @@ static void synthesize_cpus(struct cpu_map_entries *cpus, cpus->nr = map->nr; for (i = 0; i < map->nr; i++) - cpus->cpu[i] = map->map[i]; + cpus->cpu[i] = map->map[i].cpu; } static void synthesize_mask(struct perf_record_record_cpu_map *mask, @@ -1203,7 +1203,7 @@ static void synthesize_mask(struct perf_record_record_cpu_map *mask, mask->long_size = sizeof(long); for (i = 0; i < map->nr; i++) - set_bit(map->map[i], mask->mask); + set_bit(map->map[i].cpu, mask->mask); } static size_t cpus_size(struct perf_cpu_map *map) @@ -1219,7 +1219,7 @@ static size_t mask_size(struct perf_cpu_map *map, int *max) for (i = 0; i < map->nr; i++) { /* bit position of the cpu is + 1 */ - int bit = map->map[i] + 1; + int bit = map->map[i].cpu + 1; if (bit > *max) *max = bit; @@ -1354,7 +1354,7 @@ int perf_event__synthesize_stat_config(struct perf_tool *tool, } int perf_event__synthesize_stat(struct perf_tool *tool, - u32 cpu, u32 thread, u64 id, + struct perf_cpu cpu, u32 thread, u64 id, struct perf_counts_values *count, perf_event__handler_t process, struct machine *machine) @@ -1366,7 +1366,7 @@ int perf_event__synthesize_stat(struct perf_tool *tool, event.header.misc = 0; event.id = id; - event.cpu = cpu; + event.cpu = cpu.cpu; event.thread = thread; event.val = count->val; event.ena = count->ena; @@ -1763,7 +1763,7 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_ } e->idx = sid->idx; - e->cpu = sid->cpu; + e->cpu = sid->cpu.cpu; e->tid = sid->tid; } } diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h index c931433bacbf..78a0450db164 100644 --- a/tools/perf/util/synthetic-events.h +++ b/tools/perf/util/synthetic-events.h @@ -6,6 +6,7 @@ #include <sys/types.h> // pid_t #include <linux/compiler.h> #include <linux/types.h> +#include <perf/cpumap.h> struct auxtrace_record; struct dso; @@ -63,7 +64,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); int perf_event__synthesize_stat_round(struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); -int perf_event__synthesize_stat(struct perf_tool *tool, u32 cpu, u32 thread, u64 id, struct perf_counts_values *count, perf_event__handler_t process, struct machine *machine); +int perf_event__synthesize_stat(struct perf_tool *tool, struct perf_cpu cpu, u32 thread, u64 id, struct perf_counts_values *count, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_thread_map2(struct perf_tool *tool, struct perf_thread_map *threads, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_thread_map(struct perf_tool *tool, struct perf_thread_map *threads, perf_event__handler_t process, struct machine *machine, bool needs_mmap, bool mmap_data); int perf_event__synthesize_threads(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine, bool needs_mmap, bool mmap_data, unsigned int nr_threads_synthesize); diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index df3c4671be72..fb4f6616b5fa 100644 --- a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -416,3 +416,18 @@ char *perf_exe(char *buf, int len) } return strcpy(buf, "perf"); } + +void perf_debuginfod_setup(struct perf_debuginfod *di) +{ + /* + * By default '!di->set' we clear DEBUGINFOD_URLS, so debuginfod + * processing is not triggered, otherwise we set it to 'di->urls' + * value. If 'di->urls' is "system" we keep DEBUGINFOD_URLS value. + */ + if (!di->set) + setenv("DEBUGINFOD_URLS", "", 1); + else if (di->urls && strcmp(di->urls, "system")) + setenv("DEBUGINFOD_URLS", di->urls, 1); + + pr_debug("DEBUGINFOD_URLS=%s\n", getenv("DEBUGINFOD_URLS")); +} diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h index 9f0d36ba77f2..7b625cbd2dd8 100644 --- a/tools/perf/util/util.h +++ b/tools/perf/util/util.h @@ -11,6 +11,9 @@ #include <stddef.h> #include <linux/compiler.h> #include <sys/types.h> +#ifndef __cplusplus +#include <internal/cpumap.h> +#endif /* General helper functions */ void usage(const char *err) __noreturn; @@ -66,6 +69,12 @@ extern bool test_attr__enabled; void test_attr__ready(void); void test_attr__init(void); struct perf_event_attr; -void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu, +void test_attr__open(struct perf_event_attr *attr, pid_t pid, struct perf_cpu cpu, int fd, int group_fd, unsigned long flags); + +struct perf_debuginfod { + const char *urls; + bool set; +}; +void perf_debuginfod_setup(struct perf_debuginfod *di); #endif /* GIT_COMPAT_UTIL_H */ |