diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2015-08-31 19:49:05 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2015-08-31 19:49:05 -0700 |
commit | 41d859a83c567a9c9f50a34082cc64aab0abb0cd (patch) | |
tree | ab911ea521701401413d041e1b92225f3dbdab41 /tools/perf/ui/browsers/annotate.c | |
parent | 4658000955d1864b54890214434e171949c7f1c5 (diff) | |
parent | bac2e4a96d1c0bcce5e9654dcc902f75576b9b03 (diff) | |
download | lwn-41d859a83c567a9c9f50a34082cc64aab0abb0cd.tar.gz lwn-41d859a83c567a9c9f50a34082cc64aab0abb0cd.zip |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Main perf kernel side changes:
- uprobes updates/fixes. (Oleg Nesterov)
- Add PERF_RECORD_SWITCH to indicate context switches and use it in
tooling. (Adrian Hunter)
- Support BPF programs attached to uprobes and first steps for BPF
tooling support. (Wang Nan)
- x86 generic x86 MSR-to-perf PMU driver. (Andy Lutomirski)
- x86 Intel PT, LBR and BTS updates. (Alexander Shishkin)
- x86 Intel Skylake support. (Andi Kleen)
- x86 Intel Knights Landing (KNL) RAPL support. (Dasaratharaman
Chandramouli)
- x86 Intel Broadwell-DE uncore support. (Kan Liang)
- x86 hw breakpoints robustization (Andy Lutomirski)
Main perf tooling side changes:
- Support Intel PT in several tools, enabling the use of the
processor trace feature introduced in Intel Broadwell processors:
(Adrian Hunter)
# dmesg | grep Performance
# [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
# perf record -e intel_pt//u -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.216 MB perf.data ]
# perf script # then navigate in the tool output to some area, like this one:
184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so)
185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so)
186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so)
187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so)
188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so)
189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so)
190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so)
191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so)
192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so)
193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so)
194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so)
195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so)
196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so)
197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so)
198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so)
199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so)
200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so)
201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so)
- Add support for using several Intel PT features (CYC, MTC packets),
the relevant documentation was updated in:
tools/perf/Documentation/intel-pt.txt
briefly describing those packets, its purposes, how to configure
them in the event config terms and relevant external documentation
for further reading. (Adrian Hunter)
- Introduce support for probing at an absolute address, for user and
kernel 'perf probe's, useful when one have the symbol maps on a
developer machine but not on an embedded system. (Wang Nan)
- Add Intel BTS support, with a call-graph script to show it and PT
in use in a GUI using 'perf script' python scripting with
postgresql and Qt. (Adrian Hunter)
- Allow selecting the type of callchains per event, including
disabling callchains in all but one entry in an event list, to save
space, and also to ask for the callchains collected in one event to
be used in other events. (Kan Liang)
- Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho
de Melo)
* A bunch more translate file/pathnames from pointers to strings.
* Convert numbers to strings for the 'keyctl' syscall 'option'
arg.
* Add missing 'clockid' entries.
- Introduce 'srcfile' sort key: (Andi Kleen)
# perf record -F 10000 usleep 1
# perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile
<SNIP>
# Overhead Source File
26.49% copy_page_64.S
5.49% signal.c
0.51% msr.h
#
It can be combined with other fields, for instance, experiment with
'-s srcfile,symbol'.
There are some oddities in some distros and with some specific
DSOs, being investigated, so your mileage may vary.
- Support per-event 'freq' term: (Namhyung Kim)
$ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1
$ perf evlist -F
cpu/instructions,freq=1234/: sample_freq=1234
cycles: sample_period=1000
$
- Deref sys_enter pointer args with contents from probe:vfs_getname,
showing pathnames instead of pointers in many syscalls in 'perf
trace'. (Arnaldo Carvalho de Melo)
- Stop collecting /proc/kallsyms in perf.data files, saving about
4.5MB on a typical x86-64 system, use the the symbol resolution
routines used in all the other tools (report, top, etc) now that we
can ask libtraceevent to use perf's symbol resolution code.
(Arnaldo Carvalho de Melo)
- Allow filtering out of perf's PID via 'perf record --exclude-perf'.
(Wang Nan)
- 'perf trace' now supports syscall groups, like strace, i.e:
$ trace -e file touch file
Will expand 'file' into multiple, file related, syscalls. More
work needed to add extra groups for other syscall groups, and also
to complement what was added for the 'file' group, included as a
proof of concept. (Arnaldo Carvalho de Melo)
- Add lock_pi stresser to 'perf bench futex', to test the kernel code
related to FUTEX_(UN)LOCK_PI. (Davidlohr Bueso)
- Let user have timestamps with per-thread recording in 'perf record'
(Adrian Hunter)
- ... and tons of other changes, see the shortlog and the Git log for
details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits)
perf evlist: Add backpointer for perf_env to evlist
perf tools: Rename perf_session_env to perf_env
perf tools: Do not change lib/api/fs/debugfs directly
perf tools: Add tracing_path and remove unneeded functions
perf buildid: Introduce sysfs/filename__sprintf_build_id
perf evsel: Add a backpointer to the evlist a evsel is in
perf trace: Add header with copyright and background info
perf scripts python: Add new compaction-times script
perf stat: Get correct cpu id for print_aggr
tools lib traceeveent: Allow for negative numbers in print format
perf script: Add --[no-]-demangle/--[no-]-demangle-kernel
tracing/uprobes: Do not print '0x (null)' when offset is 0
perf probe: Support probing at absolute address
perf probe: Fix error reported when offset without function
perf probe: Fix list result when address is zero
perf probe: Fix list result when symbol can't be found
tools build: Allow duplicate objects in the object list
perf tools: Remove export.h from MANIFEST
perf probe: Prevent segfault when reading probe point with absolute address
perf tools: Update Intel PT documentation
...
Diffstat (limited to 'tools/perf/ui/browsers/annotate.c')
-rw-r--r-- | tools/perf/ui/browsers/annotate.c | 149 |
1 files changed, 122 insertions, 27 deletions
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 5995a8bd7c69..29739b347599 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -1,7 +1,6 @@ #include "../../util/util.h" #include "../browser.h" #include "../helpline.h" -#include "../libslang.h" #include "../ui.h" #include "../util.h" #include "../../util/annotate.h" @@ -16,6 +15,9 @@ struct disasm_line_samples { u64 nr; }; +#define IPC_WIDTH 6 +#define CYCLES_WIDTH 6 + struct browser_disasm_line { struct rb_node rb_node; u32 idx; @@ -53,6 +55,7 @@ struct annotate_browser { int max_jump_sources; int nr_jumps; bool searching_backwards; + bool have_cycles; u8 addr_width; u8 jumps_width; u8 target_width; @@ -96,6 +99,15 @@ static int annotate_browser__set_jumps_percent_color(struct annotate_browser *br return ui_browser__set_color(&browser->b, color); } +static int annotate_browser__pcnt_width(struct annotate_browser *ab) +{ + int w = 7 * ab->nr_events; + + if (ab->have_cycles) + w += IPC_WIDTH + CYCLES_WIDTH; + return w; +} + static void annotate_browser__write(struct ui_browser *browser, void *entry, int row) { struct annotate_browser *ab = container_of(browser, struct annotate_browser, b); @@ -106,7 +118,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int (!current_entry || (browser->use_navkeypressed && !browser->navkeypressed))); int width = browser->width, printed; - int i, pcnt_width = 7 * ab->nr_events; + int i, pcnt_width = annotate_browser__pcnt_width(ab); double percent_max = 0.0; char bf[256]; @@ -116,19 +128,36 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int } if (dl->offset != -1 && percent_max != 0.0) { - for (i = 0; i < ab->nr_events; i++) { - ui_browser__set_percent_color(browser, - bdl->samples[i].percent, - current_entry); - if (annotate_browser__opts.show_total_period) - slsmg_printf("%6" PRIu64 " ", - bdl->samples[i].nr); - else - slsmg_printf("%6.2f ", bdl->samples[i].percent); + if (percent_max != 0.0) { + for (i = 0; i < ab->nr_events; i++) { + ui_browser__set_percent_color(browser, + bdl->samples[i].percent, + current_entry); + if (annotate_browser__opts.show_total_period) { + ui_browser__printf(browser, "%6" PRIu64 " ", + bdl->samples[i].nr); + } else { + ui_browser__printf(browser, "%6.2f ", + bdl->samples[i].percent); + } + } + } else { + ui_browser__write_nstring(browser, " ", 7 * ab->nr_events); } } else { ui_browser__set_percent_color(browser, 0, current_entry); - slsmg_write_nstring(" ", pcnt_width); + ui_browser__write_nstring(browser, " ", 7 * ab->nr_events); + } + if (ab->have_cycles) { + if (dl->ipc) + ui_browser__printf(browser, "%*.2f ", IPC_WIDTH - 1, dl->ipc); + else + ui_browser__write_nstring(browser, " ", IPC_WIDTH); + if (dl->cycles) + ui_browser__printf(browser, "%*" PRIu64 " ", + CYCLES_WIDTH - 1, dl->cycles); + else + ui_browser__write_nstring(browser, " ", CYCLES_WIDTH); } SLsmg_write_char(' '); @@ -138,7 +167,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int width += 1; if (!*dl->line) - slsmg_write_nstring(" ", width - pcnt_width); + ui_browser__write_nstring(browser, " ", width - pcnt_width); else if (dl->offset == -1) { if (dl->line_nr && annotate_browser__opts.show_linenr) printed = scnprintf(bf, sizeof(bf), "%-*d ", @@ -146,8 +175,8 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int else printed = scnprintf(bf, sizeof(bf), "%*s ", ab->addr_width, " "); - slsmg_write_nstring(bf, printed); - slsmg_write_nstring(dl->line, width - printed - pcnt_width + 1); + ui_browser__write_nstring(browser, bf, printed); + ui_browser__write_nstring(browser, dl->line, width - printed - pcnt_width + 1); } else { u64 addr = dl->offset; int color = -1; @@ -166,7 +195,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int bdl->jump_sources); prev = annotate_browser__set_jumps_percent_color(ab, bdl->jump_sources, current_entry); - slsmg_write_nstring(bf, printed); + ui_browser__write_nstring(browser, bf, printed); ui_browser__set_color(browser, prev); } @@ -180,7 +209,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int if (change_color) color = ui_browser__set_color(browser, HE_COLORSET_ADDR); - slsmg_write_nstring(bf, printed); + ui_browser__write_nstring(browser, bf, printed); if (change_color) ui_browser__set_color(browser, color); if (dl->ins && dl->ins->ops->scnprintf) { @@ -194,11 +223,11 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int ui_browser__write_graph(browser, SLSMG_RARROW_CHAR); SLsmg_write_char(' '); } else { - slsmg_write_nstring(" ", 2); + ui_browser__write_nstring(browser, " ", 2); } } else { if (strcmp(dl->name, "retq")) { - slsmg_write_nstring(" ", 2); + ui_browser__write_nstring(browser, " ", 2); } else { ui_browser__write_graph(browser, SLSMG_LARROW_CHAR); SLsmg_write_char(' '); @@ -206,7 +235,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int } disasm_line__scnprintf(dl, bf, sizeof(bf), !annotate_browser__opts.use_offset); - slsmg_write_nstring(bf, width - pcnt_width - 3 - printed); + ui_browser__write_nstring(browser, bf, width - pcnt_width - 3 - printed); } if (current_entry) @@ -231,7 +260,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) unsigned int from, to; struct map_symbol *ms = ab->b.priv; struct symbol *sym = ms->sym; - u8 pcnt_width = 7; + u8 pcnt_width = annotate_browser__pcnt_width(ab); /* PLT symbols contain external offsets */ if (strstr(sym->name, "@plt")) @@ -255,8 +284,6 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) to = (u64)btarget->idx; } - pcnt_width *= ab->nr_events; - ui_browser__set_color(browser, HE_COLORSET_CODE); __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width, from, to); @@ -266,9 +293,7 @@ static unsigned int annotate_browser__refresh(struct ui_browser *browser) { struct annotate_browser *ab = container_of(browser, struct annotate_browser, b); int ret = ui_browser__list_head_refresh(browser); - int pcnt_width; - - pcnt_width = 7 * ab->nr_events; + int pcnt_width = annotate_browser__pcnt_width(ab); if (annotate_browser__opts.jump_arrows) annotate_browser__draw_current_jump(browser); @@ -390,7 +415,7 @@ static void annotate_browser__calc_percent(struct annotate_browser *browser, max_percent = bpos->samples[i].percent; } - if (max_percent < 0.01) { + if (max_percent < 0.01 && pos->ipc == 0) { RB_CLEAR_NODE(&bpos->rb_node); continue; } @@ -869,6 +894,75 @@ int hist_entry__tui_annotate(struct hist_entry *he, struct perf_evsel *evsel, return map_symbol__tui_annotate(&he->ms, evsel, hbt); } + +static unsigned count_insn(struct annotate_browser *browser, u64 start, u64 end) +{ + unsigned n_insn = 0; + u64 offset; + + for (offset = start; offset <= end; offset++) { + if (browser->offsets[offset]) + n_insn++; + } + return n_insn; +} + +static void count_and_fill(struct annotate_browser *browser, u64 start, u64 end, + struct cyc_hist *ch) +{ + unsigned n_insn; + u64 offset; + + n_insn = count_insn(browser, start, end); + if (n_insn && ch->num && ch->cycles) { + float ipc = n_insn / ((double)ch->cycles / (double)ch->num); + + /* Hide data when there are too many overlaps. */ + if (ch->reset >= 0x7fff || ch->reset >= ch->num / 2) + return; + + for (offset = start; offset <= end; offset++) { + struct disasm_line *dl = browser->offsets[offset]; + + if (dl) + dl->ipc = ipc; + } + } +} + +/* + * This should probably be in util/annotate.c to share with the tty + * annotate, but right now we need the per byte offsets arrays, + * which are only here. + */ +static void annotate__compute_ipc(struct annotate_browser *browser, size_t size, + struct symbol *sym) +{ + u64 offset; + struct annotation *notes = symbol__annotation(sym); + + if (!notes->src || !notes->src->cycles_hist) + return; + + pthread_mutex_lock(¬es->lock); + for (offset = 0; offset < size; ++offset) { + struct cyc_hist *ch; + + ch = ¬es->src->cycles_hist[offset]; + if (ch && ch->cycles) { + struct disasm_line *dl; + + if (ch->have_start) + count_and_fill(browser, ch->start, offset, ch); + dl = browser->offsets[offset]; + if (dl && ch->num_aggr) + dl->cycles = ch->cycles_aggr / ch->num_aggr; + browser->have_cycles = true; + } + } + pthread_mutex_unlock(¬es->lock); +} + static void annotate_browser__mark_jump_targets(struct annotate_browser *browser, size_t size) { @@ -991,6 +1085,7 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, } annotate_browser__mark_jump_targets(&browser, size); + annotate__compute_ipc(&browser, size, sym); browser.addr_width = browser.target_width = browser.min_addr_width = hex_width(size); browser.max_addr_width = hex_width(sym->end); |