author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-02-23 10:29:51 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-02-23 10:29:51 -0800 |
commit | 0df82189bc42037678fa590a77ed0116f428c90d (patch) | |
tree | 2d75bef59c8bf1d44d7c7899b9fa7512bba10fa0 /tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json | |
parent | b72b5fecc1b8a2e595bd03d7d257c88ea3f9fd45 (diff) | |
parent | f9fa0778ee7349a9aa3d2ea10e9f2ab843a0b44e (diff) | |
Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
"Miscellaneous:
- Add Ian Rogers to MAINTAINERS as a perf tools reviewer.
- Add support for retire latency feature (pipeline stall of a
instruction compared to the previous one, in cycles) present on
some Intel processors.
- Add 'perf c2c' report option to show false sharing with adjacent
cachelines, to be used on machines with cacheline prefetching,
where an access to a cacheline brings in the next one too.
- Skip 'perf test bpf' when the required kernel-debuginfo package
isn't installed.
- Avoid d3-flame-graph package dependency in 'perf script flamegraph',
making this feature more generally available.
- Add JSON metric events to present CPI stall cycles in Power10.
- Assorted improvements/refactorings on the JSON metrics parsing
code.
perf lock contention:
- Add -o/--lock-owner option:
$ sudo ./perf lock contention -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome
The owner is unknown in most cases. Filtering for mutex locks only
makes it more likely that the owners are found.
- -S/--callstack-filter limits the displayed entries to those having
the given string in the callstack:
$ sudo ./perf lock contention -abv -S net sleep 1
...
contended total wait max wait avg wait type caller
5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d
0xffffffffa5dd1c60 _raw_spin_lock+0x30
0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d
0xffffffffa5cd8267 ip6_finish_output2+0x2c7
0xffffffffa5cdac14 ip6_finish_output+0x1d4
0xffffffffa5cdb477 ip6_xmit+0x457
0xffffffffa5d1fd17 inet6_csk_xmit+0xd7
0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a
0xffffffffa5c6467d tcp_keepalive_timer+0x2fd
Please note that to have the -b option (BPF) working above one has
to build with BUILD_BPF_SKEL=1.
- Add more 'perf test' entries to test these new features.
perf script:
- Add 'cgroup' field for 'perf script' output:
$ perf record --all-cgroups -- true
$ perf script -F comm,pid,cgroup
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
- Add support for showing branch speculation information in 'perf
script' and in the 'perf report' raw dump (-D).
perf record:
- Fix 'perf record' segfault with --overwrite and --max-size.
perf test/bench:
- Switch basic BPF filtering test to use a syscall tracepoint, avoiding
the variable number of probes that the previous probe point
(do_epoll_wait) produces on different CPU architectures.
- Fix DWARF unwind test by adding non-inline to expected function in
a backtrace.
- Use 'grep -c' where the longer form 'grep | wc -l' was being used.
- Add getpid and execve benchmarks to 'perf bench syscall'.
Intel PT:
- Add support for synthesizing "cycle" events from Intel PT traces, as
is already done for "instruction" events, when Intel PT CYC packets
are available. This enables much more accurate profiles than when
using the regular 'perf record -e cycles' (the default) when the
workload lasts for very short periods (<10ms).
- .plt symbol handling improvements, done in the context of decoding
Intel PT processor traces: better handling of IBT (in the past MPX),
IFUNC symbols on x86_64, static executables, and .plt.got symbols on
x86_64.
- Add a 'perf test' to test symbol resolution, part of the .plt
improvements series, this tests things like symbol size in contexts
where only the symbol start is available (kallsyms), etc.
- Better handle auxtrace/Intel PT data when using pipe mode (perf
record sleep 1|perf report).
- Fix symbol lookup with kcore when multiple segments match stext,
which made symbol resolution just show DSOs as unknown.
ARM:
- Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace
Macrocell v4).
- Ensure ARM64 CoreSight timestamps don't go backwards.
- Document that ARM64 SPE (Statistical Profiling Extension) is used
with 'perf c2c/mem'.
- Add raw decoding for ARM64 SPEv1.2 previous branch address.
- Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1,
TLB, cache, branch, PE utilization and instruction mix metrics.
- Update decoder code for OpenCSD version 1.4, on ARM64 systems.
- Fix command line auto-complete of CPU events on aarch64.
Build:
- Fix 'perf probe' and 'perf test' when libtraceevent isn't linked,
as several tests use tracepoints, those should be skipped.
- More fallout fixes for the removal of tools/lib/traceevent/.
- Fix build error when linking with libpfm"
* tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits)
perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc
perf intel-pt: Synthesize cycle events
perf c2c: Add report option to show false sharing in adjacent cachelines
perf record: Fix segfault with --overwrite and --max-size
perf stat: Avoid merging/aggregating metric counts twice
perf tools: Fix perf tool build error in util/pfm.c
perf tools: Fix auto-complete on aarch64
perf lock contention: Support old rw_semaphore type
perf lock contention: Add -o/--lock-owner option
perf lock contention: Fix to save callstack for the default modified
perf test bpf: Skip test if kernel-debuginfo is not present
perf probe: Update the exit error codes in function try_to_find_probe_trace_event
perf script: Fix missing Retire Latency fields option documentation
perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT
perf test x86: Support the retire_lat (Retire Latency) sample_type check
perf test bpf: Check for libtraceevent support
perf script: Support Retire Latency
perf report: Support Retire Latency
perf lock contention: Support filters for different aggregation
...
Diffstat (limited to 'tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json')
-rw-r--r-- | tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json | 273 |
1 files changed, 273 insertions, 0 deletions
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
new file mode 100644
index 000000000000..8ad15b726dca
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -0,0 +1,273 @@
+[
+    {
+        "ArchStdEvent": "FRONTEND_BOUND",
+        "MetricExpr": "((stall_slot_frontend) if (#slots - 5) else (stall_slot_frontend - cpu_cycles)) / (#slots * cpu_cycles)"
+    },
+    {
+        "ArchStdEvent": "BAD_SPECULATION",
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+    },
+    {
+        "ArchStdEvent": "RETIRING",
+        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+    },
+    {
+        "ArchStdEvent": "BACKEND_BOUND"
+    },
+    {
+        "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+        "BriefDescription": "The rate of L1D TLB refill to the overall L1D TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "l1d_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+        "BriefDescription": "The rate of L1I TLB refill to the overall L1I TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "l1i_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+        "BriefDescription": "The rate of L2D TLB refill to the overall L2D TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "l2_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / L1D_TLB",
+        "BriefDescription": "The rate of DTLB Walks to the overall L1D TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_walk_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / L1I_TLB",
+        "BriefDescription": "The rate of ITLB Walks to the overall L1I TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_walk_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_hit_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of branches mis-predicted per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of branches retired per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_pki",
+        "ScaleUnit": "1PKI"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+        "BriefDescription": "The rate of branches mis-predited to the overall branches",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_miss_pred_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "instructions / CPU_CYCLES",
+        "BriefDescription": "The average number of instructions executed for each cycle.",
+        "MetricGroup": "PEutilization",
+        "MetricName": "ipc"
+    },
+    {
+        "MetricExpr": "ipc / 5",
+        "BriefDescription": "IPC percentage of peak. The peak of IPC is 5.",
+        "MetricGroup": "PEutilization",
+        "MetricName": "ipc_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_ipc"
+    },
+    {
+        "MetricExpr": "INST_SPEC / CPU_CYCLES",
+        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "spec_ipc"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are retired(committed)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are not retired(committed)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "wasted_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT if (#slots - 5) else (STALL_SLOT - CPU_CYCLES)) / (#slots * CPU_CYCLES))",
+        "BriefDescription": "The truly effective ratio of micro-operations executed by the CPU, which means that misprediction and stall are not included",
+        "MetricGroup": "PEutilization",
+        "MetricName": "cpu_utilization",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "load_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "store_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "DP_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "data_process_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "advanced_simd_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "float_point_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "crypto_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_immed_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_return_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speclatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_indirect_spec_rate",
+        "ScaleUnit": "100%"
+    }
+]
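
The MetricExpr strings in the diff above are plain arithmetic over event counts, so they can be checked by hand. The Python sketch below is not part of the commit: it re-implements a few of the expressions to show how the numbers combine. It assumes that a non-zero condition in "X if C else Y" selects X, so the "(#slots - 5)" test subtracts cpu_cycles from the stall count only when the slot count is exactly 5. The function names and the sample counts are hypothetical, chosen only for illustration.

```python
def adjusted_stall(stall_slots, cpu_cycles, slots):
    """Mirror 'X if (#slots - 5) else (X - cpu_cycles)' from the MetricExpr strings.

    Assumption: a non-zero condition selects the first operand, so the
    cpu_cycles correction is applied only when slots == 5.
    """
    return stall_slots if (slots - 5) != 0 else stall_slots - cpu_cycles


def frontend_bound(stall_slot_frontend, cpu_cycles, slots):
    # FRONTEND_BOUND: adjusted frontend stall slots over all available slots.
    return adjusted_stall(stall_slot_frontend, cpu_cycles, slots) / (slots * cpu_cycles)


def retiring(op_retired, op_spec, stall_slot, cpu_cycles, slots):
    # RETIRING: retired share of speculated ops, scaled by the non-stalled slot share.
    not_stalled = 1 - adjusted_stall(stall_slot, cpu_cycles, slots) / (slots * cpu_cycles)
    return (op_retired / op_spec) * not_stalled


def bad_speculation(op_retired, op_spec, stall_slot, cpu_cycles, slots):
    # BAD_SPECULATION: non-retired share of speculated ops, same scaling as RETIRING.
    not_stalled = 1 - adjusted_stall(stall_slot, cpu_cycles, slots) / (slots * cpu_cycles)
    return (1 - op_retired / op_spec) * not_stalled


def per_kilo_instructions(events, inst_retired):
    # The "*_mpki" / "*_pki" pattern: events / INST_RETIRED * 1000.
    return events / inst_retired * 1000


def ratio(events, total):
    # The "*_rate" pattern: a plain fraction, displayed as a percentage via ScaleUnit "100%".
    return events / total


if __name__ == "__main__":
    # Made-up counter values, not measurements from any real system.
    cycles, slots = 1000, 5
    print("frontend_bound :", frontend_bound(1500, cycles, slots))
    print("retiring       :", retiring(1800, 2000, 3000, cycles, slots))
    print("bad_speculation:", bad_speculation(1800, 2000, 3000, cycles, slots))
    print("l1d_cache_mpki :", per_kilo_instructions(25, 10000))
    print("l1d_miss_rate  :", ratio(30, 1200))
```

Since retiring and bad_speculation share the same scaling factor, their sum equals the non-stalled slot share, which is a quick sanity check when reading these metrics side by side.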