Age | Commit message (Collapse) | Author |
|
commit caa470475d9b59eeff093ae650800d34612c4379 upstream.
The original patch introducing this header wrote the number of CPUs available
and online in one order and then swapped those values when reading, fix it.
Before:
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 4
# echo 0 > /sys/devices/system/cpu/cpu2/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 3
# echo 0 > /sys/devices/system/cpu/cpu1/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 2
After the fix, bringing back the CPUs online:
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 2
# nrcpus avail : 4
# echo 1 > /sys/devices/system/cpu/cpu2/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 3
# nrcpus avail : 4
# echo 1 > /sys/devices/system/cpu/cpu1/online
# perf record usleep 1
# perf report --header-only | grep 'nrcpus \(online\|avail\)'
# nrcpus online : 4
# nrcpus avail : 4
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: fbe96f29ce4b ("perf tools: Make perf.data more self-descriptive (v8)")
Link: http://lkml.kernel.org/r/20150911153323.GP23511@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 93df8a1ed6231727c5db94a80b1a6bd5ee67cec3 upstream.
perf currently fails to build on MIPS as there is no
tools/perf/arch/mips/Build file. Adding an empty file fixes this as
there are no MIPS-specific sources to build.
It looks like the same is needed for Alpha and PA-RISC, though I
haven't been able to test those.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 5e8c0fb6a957 ("perf build: Add arch x86 objects building")
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1438704627.7315.2.camel@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 601083cffb7cabdcc55b8195d732f0f7028570fa upstream.
print_aggr() fails to print per-core/per-socket statistics after commit
582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
used to get aggregated id.
Here is an example:
Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
cpumask 0,18)
$ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
Without this patch, it failes to get CPU 18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 7526851 cycles
S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/
S1-C0 0 <not counted> cycles
S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/
With this patch, it can get both CPU0 and CPU18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 6327768 cycles
S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/
S1-C0 1 330228 cycles
S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events")
Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e8e6d37e73e6b950c891c780745460b87f4755b6 upstream.
When we introduce a new sort key, we need to update the
hists__calc_col_len() function accordingly, otherwise the width
will be limited to strlen(header).
We can't update it when obtaining a line value for a column (for
instance, in sort__srcline_cmp()), because we reset it all when doing a
resort (see hists__output_recalc_col_len()), so we need to, from what is
in the hist_entry fields, set each of the column widths.
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Fixes: 409a8be61560 ("perf tools: Add sort by src line/number")
Link: http://lkml.kernel.org/n/tip-jgbe0yx8v1gs89cslr93pvz2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b5cabbcbd157a4bf5a92dfc85134999a3b55342d upstream.
A copy of /proc/kcore containing the kernel text can be made to the
buildid cache. e.g.
perf buildid-cache -v -k /proc/kcore
To workaround objdump limitations, a copy is also made when annotating
against /proc/kcore.
The copying process stops working from libelf about v1.62 onwards (the
problem was found with v1.63).
The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails
because additional validation has been added to gelf_getphdr().
The use of gelf_getphdr() is a misguided attempt to get default
initialization of the Gelf_Phdr structure. That should not be
necessary because every member of the Gelf_Phdr structure is
subsequently assigned. So just remove the call to gelf_getphdr().
Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be
removed also.
Committer notes:
Note to stable@kernel.org, from Adrian in the cover letter for this
patchkit:
The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think
it is important enough for stable.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0bc2f2f7d080561cc484d2d0a162a9396bed3383 upstream.
When setting yup the symbols library we setup several filter lists,
for dsos, comms, symbols, etc, and there is code that, if there are
filters, do certain operations, like recalculate the number of non
filtered histogram entries in the top/report TUI.
But they were considering just the "Zoom" filters, when they need to
take into account as well the above mentioned filters (perf top --comms,
--dsos, etc).
So store in symbol_conf.has_filter true if any of those filters is in
place.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-f5edfmhq69vfvs1kmikq1wep@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andre Tomt <lkml@tomt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9c0fa8dd3d58de8b688fda758eea1719949c7f0a upstream.
At some point:
commit 2c86c7ca7606
Author: Namhyung Kim <namhyung@kernel.org>
Date: Mon Mar 17 18:18:54 2014 -0300
perf report: Merge al->filtered with hist_entry->filtered
We stopped dropping samples for things filtered via the --comms, --dsos,
--symbols, etc, i.e. things marked as filtered in the symbol resolution
routines (thread__find_addr_map(), perf_event__preprocess_sample(),
etc).
But then, in:
commit 268397cb2a47
Author: Namhyung Kim <namhyung@kernel.org>
Date: Tue Apr 22 14:49:31 2014 +0900
perf top/tui: Update nr_entries properly after a filter is applied
We don't take into account entries that were filtered in
perf_event__preprocess_sample() and friends, which leads to
inconsistency in the browser seek routines, that expects the number of
hist_entry->filtered entries to match what it thinks is the number of
unfiltered, browsable entries.
So, for instance, when we do:
perf top --symbols ___non_existent_symbol___
the hist_browser__nr_entries() routine thinks there are no filters in
place, uses the hists->nr_entries but all entries are filtered, leading
to a segfault.
Tested with:
perf top --symbols malloc,free --percentage=relative
Freezing, by pressing 'f', at any time and doing the math on the
percentages ends up with 100%, ditto for:
perf top --dsos libpthread-2.20.so,libxul.so --percentage=relative
Both were segfaulting, all fixed now.
More work needed to do away with checking if filters are in place, we
should just use the nr_non_filtered_samples counter, no need to
conditionally use it or hists.nr_filter, as what the browser does is
just show unfiltered stuff. An audit of how it is being accounted is
needed, this is the minimal fix.
Reported-by: Michael Petlan <mpetlan@redhat.com>
Fixes: 268397cb2a47 ("perf top/tui: Update nr_entries properly after a filter is applied")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-6w01d5q97qk0d64kuojme5in@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2b42b09b88c831ba4da2d669581dde371c38c2af upstream.
With commit: e1e455f4f4d3 (perf tools: Work around lack of sched_getcpu
in glibc < 2.6), perf_bench numa mem with -c or -m option is not able to
correctly calculate convergence.
With the above commit, sched_getcpu always seems to return -1. The
intention of commit e1e455f was to add a sched_getcpu in glibc < 2.6.
Hence keep the sched_getcpu definition under an ifdef.
This regression happened occurred between v4.0 and v4.1
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vinson Lee <vlee@twitter.com>
Fixes: e1e455f4f4d3 ("perf tools: Work around lack of sched_getcpu in glibc < 2.6")
Link: http://lkml.kernel.org/r/20150624111004.GA5220@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Parsing /proc/cpuinfo is a fiddly, arch-dependent business and a recent
change to get it working for Sparc broke arm and arm64 platforms.
Use sysconf to determine the number of online CPUs only parsing
/proc/cpuinfo when sysconf is not available.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <david.ahern@oracle.com>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150423140454.GJ1652@arm.com
[ Made it fall back to parsing /proc when getconf not found ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
PMU driver hardware-enablement addition"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Fix segfault if passed with ''.
perf report: Fix -T/--threads option to work again
perf bench numa: Fix immediate meeting of convergence condition
perf bench numa: Fixes of --quiet argument
perf bench futex: Fix hung wakeup tasks after requeueing
perf probe: Fix bug with global variables handling
perf top: Fix a segfault when kernel map is restricted.
tools lib traceevent: Fix build failure on 32-bit arch
perf kmem: Fix compiles on RHEL6/OL6
tools lib api: Undefine _FORTIFY_SOURCE before setting it
perf kmem: Consistently use PRIu64 for printing u64 values
perf trace: Disable events and drain events when forked workload ends
perf trace: Enable events when doing system wide tracing and starting a workload
perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
|
|
Since parse_perf_probe_point() deals with a user passed argument, we
should not assume it to be a valid string.
Without this patch, if pass '' to perf probe, a segfault raises:
$ perf probe -a ''
Segmentation fault
This patch checks argument of parse_perf_probe_point() before
string processing.
After this patch:
$ perf probe -a ''
usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
...
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1430210769-94177-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The commit 512ae1bd6acb ("perf tools: Consolidate management of default
sort orders") changed default value of the 'sort_order' variable to NULL
indicating that users don't set any sort keys on the command line.
However it missed to update a check in perf_evlist__tty_browse_hists()
so that 'perf report -T' cannot show the per-thread values after the
normal output. This patch fixes it to work again.
Note that the -T option only works on --stdio and neither --sort nor
--parent option was given.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1430309328-28317-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This patch fixes the race in the beginning of benchmark run when some
threads hasn't got assigned curr_cpu yet so they don't occur in
nodes-of-process stats and benchmark concludes that all remaining
threads are converged already.
The race can be reproduced with small amount of threads and some bigger
amount of shared process memory, e.g. one process, two threads and 5GB
of process memory.
Signed-off-by: Petr Holasek <pholasek@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1429198699-25039-4-git-send-email-pholasek@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Corrected description and fixed function of --quiet argument.
Signed-off-by: Petr Holasek <pholasek@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1429198699-25039-2-git-send-email-pholasek@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The futex-requeue benchmark can hang because of missing wakeups once the
benchmark is done, ie:
[Run 1]: Requeued 1024 of 1024 threads in 0.3290 ms
perf: couldn't wakeup all tasks (135/1024)
This bug, while perhaps suggesting missing wakeups in kernel futex code,
is merely a consequence of the crappy FUTEX_CMP_REQUEUE man page,
incorrectly mentioning that the number of requeued tasks is in fact
returned, not the wakeups.
This patch acknowledges this and updates the corresponding futex_wake
code around it.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1429894848.10273.44.camel@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There are missing curly braces which causes find_variable() return wrong
value when probing with global variables.
This problem can be reproduced as following:
$ perf probe -v --add='generic_perform_write global_variable_for_test'
...
Try to find probe point from debuginfo.
Probe point found: generic_perform_write+0
Searching 'global_variable_for_test' variable in context.
An error occurred in debuginfo analysis (-2).
Error: Failed to add events. Reason: No such file or directory (Code: -2)
After this patch:
$ perf probe -v --add='generic_perform_write global_variable_for_test'
...
Converting variable global_variable_for_test into trace event.
global_variable_for_test type is int.
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Added new event:
Writing event: p:probe/generic_perform_write _stext+1237464
global_variable_for_test=@global_variable_for_test+0:s32
probe:generic_perform_write (on generic_perform_write with
global_variable_for_test)
You can now use it in all perf tools, such as:
perf record -e probe:generic_perform_write -aR sleep 1
Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1429949338-18678-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Perf top raise a warning if a kernel sample is collected but kernel map
is restricted. The warning message needs to dereference al.map->dso...
However, previous perf_event__preprocess_sample() doesn't always
guarantee al.map != NULL, for example, when kernel map is restricted.
This patch validates al.map before dereferencing, avoid the segfault.
Before this patch:
$ cat /proc/sys/kernel/kptr_restrict
1
$ perf top -p 120183
perf: Segmentation fault
-------- backtrace --------
/path/to/perf[0x509868]
/lib64/libc.so.6(+0x3545f)[0x7f9a1540045f]
/path/to/perf[0x448820]
/path/to/perf(cmd_top+0xe3c)[0x44a5dc]
/path/to/perf[0x4766a2]
/path/to/perf(main+0x5f5)[0x42e545]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x7f9a153ecbd4]
/path/to/perf[0x42e674]
And gdb call trace:
Program received signal SIGSEGV, Segmentation fault.
perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0)
at builtin-top.c:736
736 !RB_EMPTY_ROOT(&al.map->dso->symbols[MAP__FUNCTION]) ?
(gdb) bt
#0 perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0)
at builtin-top.c:736
#1 perf_top__mmap_read_idx (top=top@entry=0x7fffffffa8a0, idx=idx@entry=0) at builtin-top.c:855
#2 0x000000000044a5dd in perf_top__mmap_read (top=0x7fffffffa8a0) at builtin-top.c:872
#3 __cmd_top (top=0x7fffffffa8a0) at builtin-top.c:997
#4 cmd_top (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-top.c:1267
#5 0x00000000004766a3 in run_builtin (p=p@entry=0x8a6ce8 <commands+264>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdf70)
at perf.c:371
#6 0x000000000042e546 in handle_internal_command (argv=0x7fffffffdf70, argc=3) at perf.c:430
#7 run_argv (argv=0x7fffffffdcf0, argcp=0x7fffffffdcfc) at perf.c:474
#8 main (argc=3, argv=0x7fffffffdf70) at perf.c:589
(gdb)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1429946703-80807-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
0d68bc92c48 breaks compiles on RHEL6/OL6:
cc1: warnings being treated as errors
builtin-kmem.c: In function ‘search_page_alloc_stat’:
builtin-kmem.c:322: error: declaration of ‘stat’ shadows a global declaration
node = &parent->rb_left;
/usr/include/sys/stat.h:455: error: shadowed declaration is here
builtin-kmem.c: In function ‘perf_evsel__process_page_alloc_event’:
builtin-kmem.c:378: error: declaration of ‘stat’ shadows a global declaration
/usr/include/sys/stat.h:455: error: shadowed declaration is here
builtin-kmem.c: In function ‘perf_evsel__process_page_free_event’:
builtin-kmem.c:431: error: declaration of ‘stat’ shadows a global declaration
/usr/include/sys/stat.h:455: error: shadowed declaration is here
Rename local variable to pstat to avoid the name conflict.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Link: http://lkml.kernel.org/r/1429033773-31383-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Building the perf tool for 32-bit ARM results in the following build
error due to a combination of an incorrect conversion specifier and
compiling with -Werror:
builtin-kmem.c: In function ‘print_page_summary’:
builtin-kmem.c:644:9: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘u64’ [-Werror=format=]
nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
^
builtin-kmem.c:647:9: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘u64’ [-Werror=format=]
(total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
^
cc1: all warnings being treated as errors
This patch fixes the problem by consistently using PRIu64 for printing
out u64 values.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1429796437-1790-1-git-send-email-will.deacon@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We were not checking in the inner event processing loop if the forked workload
had finished, which, on a busy system, may make it take a long time trying to
drain events, entering a seemingly neverending loop, waiting for the system to
get idle enough to make it drain the buffers.
Fix it by disabling the events when 'done' is true, in the inner loop, to start
draining what is in the buffers.
Now:
[root@ssdandy ~]# time trace --filter-pids 14003 -a sleep 1 | tail
996.748 ( 0.002 ms): sh/30296 rt_sigprocmask(how: SETMASK, nset: 0x7ffc83418160, sigsetsize: 8) = 0
996.751 ( 0.002 ms): sh/30296 rt_sigprocmask(how: BLOCK, nset: 0x7ffc834181f0, oset: 0x7ffc83418270, sigsetsize: 8) = 0
996.755 ( 0.002 ms): sh/30296 rt_sigaction(sig: INT, act: 0x7ffc83417f50, oact: 0x7ffc83417ff0, sigsetsize: 8) = 0
1004.543 ( 0.362 ms): tail/30198 ... [continued]: read()) = 4096
1004.548 ( 7.791 ms): sh/30296 wait4(upid: -1, stat_addr: 0x7ffc834181a0) ...
1004.975 ( 0.427 ms): tail/30198 read(buf: 0x7633f0, count: 8192) = 4096
1005.390 ( 0.410 ms): tail/30198 read(buf: 0x765410, count: 8192) = 4096
1005.743 ( 0.348 ms): tail/30198 read(buf: 0x7633f0, count: 8192) = 4096
1006.197 ( 0.449 ms): tail/30198 read(buf: 0x765410, count: 8192) = 4096
1006.492 ( 0.290 ms): tail/30198 read(buf: 0x7633f0, count: 8192) = 4096
real 0m1.219s
user 0m0.704s
sys 0m0.331s
[root@ssdandy ~]#
Reported-by: Michael Petlan <mpetlan@redhat.com>
Suggested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-p6kpn1b26qcbe47pufpw0tex@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
commit f7aa222ff397
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Tue Feb 3 13:25:39 2015 -0300
perf trace: No need to enable evsels for workload started from perf
The assumption was that whenever a workload is specified, the
attr.enable_on_exec evsel flag would be set, but that is not happening
when perf_record_opts.system_wide is set, for instance
That resulted in both perf_evlist__enable() and attr.enable_on_exec
being not called/set, which made the events to remain disabled while the
workload runs, producing no output.
Fix it, by calling perf_evlist__enable() in the 'trace' tool
when forking and not targetting a workload started from trace
v2: Test against !target__none(), as suggested by Namhyung Kim, that is
what is used in perf_evsel__config() when deciding if the
attr.enable_on_exec flag to be set. More work is needed to cover other
cases such as opts->initial_delay.
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-27z7169pvfxgj8upic636syv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"This update has mostly fixes, but also other bits:
- perf tooling fixes
- PMU driver fixes
- Intel Broadwell PMU driver HW-enablement for LBR callstacks
- a late coming 'perf kmem' tool update that enables it to also
analyze page allocation data. Note, this comes with MM tracepoint
changes that we believe to not break anything: because it changes
the formerly opaque 'struct page *' field that uniquely identifies
pages to 'pfn' which identifies pages uniquely too, but isn't as
opaque and can be used for other purposes as well"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
perf/x86/intel: Add Broadwell support for the LBR callstack
perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units
perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
perf/x86: Fix hw_perf_event::flags collision
perf probe: Fix segfault when probe with lazy_line to file
perf probe: Find compilation directory path for lazy matching
perf probe: Set retprobe flag when probe in address-based alternative mode
perf kmem: Analyze page allocator events also
tracing, mm: Record pfn instead of pointer to struct page
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
"Core kernel changes:
- One of the more interesting features in this cycle is the ability
to attach eBPF programs (user-defined, sandboxed bytecode executed
by the kernel) to kprobes.
This allows user-defined instrumentation on a live kernel image
that can never crash, hang or interfere with the kernel negatively.
(Right now it's limited to root-only, but in the future we might
allow unprivileged use as well.)
(Alexei Starovoitov)
- Another non-trivial feature is per event clockid support: this
allows, amongst other things, the selection of different clock
sources for event timestamps traced via perf.
This feature is sought by people who'd like to merge perf generated
events with external events that were measured with different
clocks:
- cluster wide profiling
- for system wide tracing with user-space events,
- JIT profiling events
etc. Matching perf tooling support is added as well, available via
the -k, --clockid <clockid> parameter to perf record et al.
(Peter Zijlstra)
Hardware enablement kernel changes:
- x86 Intel Processor Trace (PT) support: which is a hardware tracer
on steroids, available on Broadwell CPUs.
The hardware trace stream is directly output into the user-space
ring-buffer, using the 'AUX' data format extension that was added
to the perf core to support hardware constraints such as the
necessity to have the tracing buffer physically contiguous.
This patch-set was developed for two years and this is the result.
A simple way to make use of this is to use BTS tracing, the PT
driver emulates BTS output - available via the 'intel_bts' PMU.
More explicit PT specific tooling support is in the works as well -
will probably be ready by 4.2.
(Alexander Shishkin, Peter Zijlstra)
- x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
feature of Intel Xeon CPUs that allows the measurement and
allocation/partitioning of caches to individual workloads.
These kernel changes expose the measurement side as a new PMU
driver, which exposes various QoS related PMU events. (The
partitioning change is work in progress and is planned to be merged
as a cgroup extension.)
(Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
Waskiewicz Jr)
- x86 Intel Haswell LBR call stack support: this is a new Haswell
feature that allows the hardware recording of call chains, plus
tooling support. To activate this feature you have to enable it
via the new 'lbr' call-graph recording option:
perf record --call-graph lbr
perf report
or:
perf top --call-graph lbr
This hardware feature is a lot faster than stack walk or dwarf
based unwinding, but has some limitations:
- It reuses the current LBR facility, so LBR call stack and
branch record can not be enabled at the same time.
- It is only available for user-space callchains.
(Yan, Zheng)
- x86 Intel Broadwell CPU support and various event constraints and
event table fixes for earlier models.
(Andi Kleen)
- x86 Intel HT CPUs event scheduling workarounds. This is a complex
CPU bug affecting the SNB,IVB,HSW families that results in counter
value corruption. The mitigation code is automatically enabled and
is transparent.
(Maria Dimakopoulou, Stephane Eranian)
The perf tooling side had a ton of changes in this cycle as well, so
I'm only able to list the user visible changes here, in addition to
the tooling changes outlined above:
User visible changes affecting all tools:
- Improve support of compressed kernel modules (Jiri Olsa)
- Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
- Bash completion for subcommands (Yunlong Song)
- Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
- Support missing -f to override perf.data file ownership. (Yunlong Song)
- Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
User visible changes in individual tools:
'perf data':
New tool for converting perf.data to other formats, initially
for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
Sebastian Siewior)
'perf diff':
Add --kallsyms option (David Ahern)
'perf list':
Allow listing events with 'tracepoint' prefix (Yunlong Song)
Sort the output of the command (Yunlong Song)
'perf kmem':
Respect -i option (Jiri Olsa)
Print big numbers using thousands' group (Namhyung Kim)
Allow -v option (Namhyung Kim)
Fix alignment of slab result table (Namhyung Kim)
'perf probe':
Support multiple probes on different binaries on the same command line (Masami Hiramatsu)
Support unnamed union/structure members data collection. (Masami Hiramatsu)
Check kprobes blacklist when adding new events. (Masami Hiramatsu)
'perf record':
Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)
Support recording running/enabled time (Andi Kleen)
'perf sched':
Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)
'perf report' and 'perf top':
Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)
Indicate which callchain entries are annotated in the
TUI hists browser (Arnaldo Carvalho de Melo)
Add pid/tid filtering to 'report' and 'script' commands (David Ahern)
Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
events (Arnaldo Carvalho de Melo)
'perf stat':
Report unsupported events properly (Suzuki K. Poulose)
Output running time and run/enabled ratio in CSV mode (Andi Kleen)
'perf trace':
Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)
Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)
Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)
Dump stack on segfaults (Arnaldo Carvalho de Melo)
No need to explicitely enable evsels for workload started from perf, let it
be enabled via perf_event_attr.enable_on_exec, removing some events that take
place in the 'perf trace' before a workload is really started by it.
(Arnaldo Carvalho de Melo)
Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)
There's also been a ton of infrastructure work done, such as the
split-out of perf's build system into tools/build/ and other changes -
see the shortlog and changelog for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
perf evlist: Fix type for references to data_head/tail
perf probe: Check the orphaned -x option
perf probe: Support multiple probes on different binaries
perf buildid-list: Fix segfault when show DSOs with hits
perf tools: Fix cross-endian analysis
perf tools: Fix error path to do closedir() when synthesizing threads
perf tools: Fix synthesizing fork_event.ppid for non-main thread
perf tools: Add 'I' event modifier for exclude_idle bit
perf report: Don't call map__kmap if map is NULL.
perf tests: Fix attr tests
perf probe: Fix ARM 32 building error
perf tools: Merge all perf_event_attr print functions
perf record: Add clockid parameter
perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
perf sched replay: Support using -f to override perf.data file ownership
perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
"Usual trivial tree updates. Nothing outstanding -- mostly printk()
and comment fixes and unused identifier removals"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
goldfish: goldfish_tty_probe() is not using 'i' any more
powerpc: Fix comment in smu.h
qla2xxx: Fix printks in ql_log message
lib: correct link to the original source for div64_u64
si2168, tda10071, m88ds3103: Fix firmware wording
usb: storage: Fix printk in isd200_log_config()
qla2xxx: Fix printk in qla25xx_setup_mode
init/main: fix reset_device comment
ipwireless: missing assignment
goldfish: remove unreachable line of code
coredump: Fix do_coredump() comment
stacktrace.h: remove duplicate declaration task_struct
smpboot.h: Remove unused function prototype
treewide: Fix typo in printk messages
treewide: Fix typo in printk messages
mod_devicetable: fix comment for match_flags
|
|
The first argument passed to find_probe_point_lazy() should be CU die,
which will be passed to die_walk_lines() when lazy_line matches.
Currently, when we probe with lazy_line pattern to file without function
name, NULL pointer is passed and causes a segment fault.
Can be reproduced as following:
$ perf probe -k vmlinux --add='fs/super.c;s->s_count=1;'
[ 1958.984658] perf[1020]: segfault at 10 ip 00007fc6e10d8c71 sp
00007ffcbfaaf900 error 4 in libdw-0.161.so[7fc6e10ce000+34000]
Segmentation fault
After this patch:
$ perf probe -k vmlinux --add='fs/super.c;s->s_count=1;'
Added new event:
probe:_stext (on @fs/super.c)
You can now use it in all perf tools, such as:
perf record -e probe:_stext -aR sleep 1
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1428925290-5623-3-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If we use lazy matching, it failed to open a souce file if perf command
is invoked outside of compilation directory:
$ perf probe -a '__schedule;clear_*'
Failed to open kernel/sched/core.c: No such file or directory
Error: Failed to add events. (-2)
OTOH, other commands like "probe -L" can solve the souce directory by
themselves. Let's make it possible for lazy matching too!
Signed-off-by: Naohiro Aota <naota@elisp.net>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1426223923-1493-1-git-send-email-naota@elisp.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When perf probe searched in a debuginfo file and failed, it tried with
an alternative, in function get_alternative_probe_event():
memcpy(tmp, &pev->point, sizeof(*tmp));
memset(&pev->point, 0, sizeof(pev->point));
In this case, it drops the retprobe flag and forgets to set it back in
find_alternative_probe_point(), so the problem occurs.
Can be reproduced as following:
$ perf probe -v -k vmlinux --add='sys_write%return'
...
Added new event:
Writing event: p:probe/sys_write _stext+1584952
probe:sys_write (on sys_write%return)
$ cat /sys/kernel/debug/tracing/kprobe_events
p:probe/sys_write _stext+1584952
After this patch:
$ perf probe -v -k vmlinux --add='sys_write%return'
Added new event:
Writing event: r:probe/sys_write SyS_write+0
probe:sys_write (on sys_write%return)
$ cat /sys/kernel/debug/tracing/kprobe_events
r:probe/sys_write SyS_write
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1428925290-5623-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The perf kmem command records and analyze kernel memory allocation only
for SLAB objects. This patch implement a simple page allocator analyzer
using kmem:mm_page_alloc and kmem:mm_page_free events.
It adds two new options of --slab and --page. The --slab option is for
analyzing SLAB allocator and that's what perf kmem currently does.
The new --page option enables page allocator events and analyze kernel
memory usage in page unit. Currently, 'stat --alloc' subcommand is
implemented only.
If none of these --slab nor --page is specified, --slab is implied.
First run 'perf kmem record' to generate a suitable perf.data file:
# perf kmem record --page sleep 5
Then run 'perf kmem stat' to postprocess the perf.data file:
# perf kmem stat --page --alloc --line 10
-------------------------------------------------------------------------------
PFN | Total alloc (KB) | Hits | Order | Mig.type | GFP flags
-------------------------------------------------------------------------------
4045014 | 16 | 1 | 2 | RECLAIM | 00285250
4143980 | 16 | 1 | 2 | RECLAIM | 00285250
3938658 | 16 | 1 | 2 | RECLAIM | 00285250
4045400 | 16 | 1 | 2 | RECLAIM | 00285250
3568708 | 16 | 1 | 2 | RECLAIM | 00285250
3729824 | 16 | 1 | 2 | RECLAIM | 00285250
3657210 | 16 | 1 | 2 | RECLAIM | 00285250
4120750 | 16 | 1 | 2 | RECLAIM | 00285250
3678850 | 16 | 1 | 2 | RECLAIM | 00285250
3693874 | 16 | 1 | 2 | RECLAIM | 00285250
... | ... | ... | ... | ... | ...
-------------------------------------------------------------------------------
SUMMARY (page allocator)
========================
Total allocation requests : 44,260 [ 177,256 KB ]
Total free requests : 117 [ 468 KB ]
Total alloc+freed requests : 49 [ 196 KB ]
Total alloc-only requests : 44,211 [ 177,060 KB ]
Total free-only requests : 68 [ 272 KB ]
Total allocation failures : 0 [ 0 KB ]
Order Unmovable Reclaimable Movable Reserved CMA/Isolated
----- ------------ ------------ ------------ ------------ ------------
0 32 . 44,210 . .
1 . . . . .
2 . 18 . . .
3 . . . . .
4 . . . . .
5 . . . . .
6 . . . . .
7 . . . . .
8 . . . . .
9 . . . . .
10 . . . . .
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The data_head and data_tail fields are defined as __u64 in
linux/perf_event.h, but perf userspace uses int and unsigned int.
Convert all references to u64 for consistency.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1428420037-26599-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To avoid probing in unintended binary, the orphaned -x option must be
checked and warned.
Without this patch, following command sets up the probe in the kernel.
-----
# perf probe -a strcpy -x ./perf
Added new event:
probe:strcpy (on strcpy)
You can now use it in all perf tools, such as:
perf record -e probe:strcpy -aR sleep 1
-----
But in this case, it seems that the user may want to probe in the perf
binary. With this patch, perf-probe correctly handles the orphaned -x.
-----
# perf probe -a strcpy -x ./perf
Error: -x/-m must follow the probe definitions.
...
-----
Reported-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150401102541.17137.75477.stgit@localhost.localdomain
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support multiple probes on different binaries with just
one command.
In the result, this example sets up the probes on icmp_rcv in
kernel, on main and set_target in perf, and on pcspkr_event
in pcspker.ko driver.
-----
# perf probe -a icmp_rcv -x ./perf -a main -a set_target \
-m /lib/modules/4.0.0-rc5+/kernel/drivers/input/misc/pcspkr.ko \
-a pcspkr_event
Added new event:
probe:icmp_rcv (on icmp_rcv)
You can now use it in all perf tools, such as:
perf record -e probe:icmp_rcv -aR sleep 1
Added new event:
probe_perf:main (on main in /home/mhiramat/ksrc/linux-3/tools/perf/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:main -aR sleep 1
Added new event:
probe_perf:set_target (on set_target in /home/mhiramat/ksrc/linux-3/tools/perf/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:set_target -aR sleep 1
Added new event:
probe:pcspkr_event (on pcspkr_event in pcspkr)
You can now use it in all perf tools, such as:
perf record -e probe:pcspkr_event -aR sleep 1
-----
Reported-by: Arnaldo Carvalho de Melo <acme@infradead.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150401102539.17137.46454.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
commit: f3b623b8490a ("perf tools: Reference count struct thread")
appends every thread->node to dead_threads in machine__remove_thread()
and list_del_init() this node in thread__put().
perf_event__exit_del_thread() releases thread wihout using
machine__remove_thread(), and causes a NULL pointer crash when
list_del_init(&thread->node) is called. Fix this by using
machine_remove_thread() instead of using thread__put() directly.
This problem can be reproduced as following:
$ perf record ls
$ perf buildid-list --with-hits
[ 3874.195070] perf[1018]: segfault at 0 ip 00000000004b0b15 sp
00007ffc35b44780 error 6 in perf[400000+166000]
Segmentation fault
After this patch:
$ perf record ls
$ perf buildid-list --with-hits
bc23e7c3281e542650ba4324421d6acf78f4c23e /proc/kcore
643324cb0e969f30c56d660f167f84a150845511 [vdso]
0000000000000000000000000000000000000000 /bin/busybox
...
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1428658500-6483-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Trying to analyze a big endian data file on little endian system fails
with the error:
0xa9b40 [0x70]: failed to process type: 9
The problem is that header parsing is not done correctly because the
file attributes are not swapped. Make it so. With this patch able to
analyze a sparc64 data file on x86_64.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1428610546-178789-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When traversing /proc to synthesize the PERF_RECORD_FORK et al events we
were bailing out on errors without calling closedir(), fix it.
Reported-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-vxtp593rfztgbi8noy0m967p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit ca6c41c59b9 sets the ppid based on what is read from the
/proc/pid/status file when synthesizing fork events.
This is correct thing to do for new processes but not threads of a
process.
Fix ppid for threads to be the main thread when synthesizing fork events
(ie., assume main thread spawned all sub-threads in a process).
Reported-by: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Signed-off-by: David Ahern <david.ahern@oracle.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1428598107-178999-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Adding 'I' event modifier to have complete set of modifiers for
perf_event_attr:exclude_* bits.
Any event specified with 'I' modifier will have the
perf_event_attr:exclude_idle bit set.
$ perf record -e cycles:I -vv ls 2>&1 | grep exclude_idle
exclude_hv 0 exclude_idle 1
Adding automated tests.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: William Cohen <wcohen@redhat.com>
Link: http://lkml.kernel.org/r/1428441919-23099-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
report__warn_kptr_restrict() calls map__kmap(kernel_map) before checking
kernel_map againest NULL.
Which is dangerous, since map__kmap() will return a invalid and not NULL
address.
It will trigger a warning message in map__kmap() after the patch "perf:
kmaps: enforce usage of kmaps to protect futher bugs." was applied.
This patch fixes it by adding the missing checking.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1428490772-135393-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Following commit:
1a5941312414 perf: Add wakeup watermark control to the AUX area
enlarged perf_event_attr, but did not updated attr tests.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: http://lkml.kernel.org/n/20150407171715.GA22603@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit 9b118acae310f57baee770b5db402500d8695e50 ("perf probe: Fix to
handle aliased symbols in glibc") uses an absolute format '%lx' to
print u64 argument, which causes compiling error on ARM 32.
This patch replaces it with PRIx64.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1428459274-138470-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently there's 3 (that I found) different and incomplete
implementations of printing perf_event_attr.
This is quite silly. Merge the lot.
While this patch does not retain the exact form all printing that I
found is debug output and thus it should not be critical.
Also, I cannot find a single print_event_desc() caller.
Pre:
$ perf record -vv -e cycles -- sleep 1
------------------------------------------------------------
perf_event_attr:
type 0
size 104
config 0
sample_period 4000
sample_freq 4000
sample_type 0x107
read_format 0
disabled 1 inherit 1
pinned 0 exclusive 0
exclude_user 0 exclude_kernel 0
exclude_hv 0 exclude_idle 0
mmap 1 comm 1
mmap2 1 comm_exec 1
freq 1 inherit_stat 0
enable_on_exec 1 task 1
watermark 0 precise_ip 0
mmap_data 0 sample_id_all 1
exclude_host 0 exclude_guest 1
excl.callchain_kern 0 excl.callchain_user 0
wakeup_events 0
wakeup_watermark 0
bp_type 0
bp_addr 0
config1 0
bp_len 0
config2 0
branch_sample_type 0
sample_regs_user 0
sample_stack_user 0
sample_regs_intr 0
------------------------------------------------------------
$ perf evlist -vv
cycles: sample_freq=4000, size: 104, sample_type: IP|TID|TIME|PERIOD,
disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, comm_exec: 1,
freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1
Post:
$ ./perf record -vv -e cycles -- sleep 1
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
------------------------------------------------------------
$ ./perf evlist -vv
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type:
IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq:
1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1,
mmap2: 1, comm_exec: 1
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150407091150.644238729@infradead.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Teach perf-record about the new perf_event_attr::{use_clockid, clockid}
fields. Add a simple parameter to set the clock (if any) to be used for
the events to be recorded into the data file.
Since we store the entire perf_event_attr in the EVENT_DESC section we
also already store the used clockid in the data file.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yunlong Song <yunlong.song@huawei.com>
Link: http://lkml.kernel.org/r/20150407154851.GR23123@twins.programming.kicks-ass.net
[ Conditionally define CLOCK_BOOTTIME, at least rhel6 doesn't have it - dsahern
Ditto for CLOCK_MONOTONIC_RAW, sles11sp2 doesn't have it - yunlong.song ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instead of the default value 10
Since sched->replay_repeat is set to 10 as default, the sched->run_avg,
sched->runavg_cpu_usage, and sched->runavg_parent_cpu_usage all use
10 to calculate their value.
However, the replay_repeat can be changed to other value by using -r
option, so the calculation above should use replay_repeat to achieve
more accurate results instead of the default value 10.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Enable to use perf.data when it is not owned by current user or root.
Example:
$ ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data
$ sudo id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
$ sudo perf sched replay -f
run measurement overhead: 98 nsecs
sleep measurement overhead: 52909 nsecs
the run test took 1000015 nsecs
the sleep test took 1054253 nsecs
File perf.data not owned by current user or root (use -f to override)
As shown above, the -f option does not work at all.
After this patch:
$ sudo perf sched replay -f
run measurement overhead: 221 nsecs
sleep measurement overhead: 40514 nsecs
the run test took 1000003 nsecs
the sleep test took 1056098 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
...
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
------------------------------------------------------------
#1 : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18
#2 : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17
#3 : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48
#4 : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77
#5 : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43
#6 : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56
#7 : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62
#8 : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45
#9 : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87
#10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45
As shown above, the -f option really works now.
Besides for replay, -f option can also work for latency and map.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
maximum open files
The soft maximum number of open files for a calling process is 1024,
which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the
hard maximum number of open files for a calling process is 4096, which
is defined as INR_OPEN_MAX in include/uapi/linux/fs.h.
Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of
RLIMIT_NOFILE in include/asm-generic/resource.h.
And the soft maximum number finally decides the limitation of the
maximum files which are allowed to be opened.
That is to say a process can use at most 1024 file descriptors for its
o pened files, or an EMFILE error will happen.
This error can be fixed by increasing the soft maximum number, under the
constraint that the soft maximum number can not exceed the hard maximum
number, or both soft and hard maximum number should be increased
simultaneously with privilege.
For perf sched replay, it uses sys_perf_event_open to create the file
descriptor for each of the tasks in order to handle information of perf
events.
That is to say each task needs a unique file descriptor. In x86_64,
there may be over 1024 or 4096 tasks correspoinding to the record in
perf.data, which causes that no enough file descriptors can be used.
As a result, EMFILE error happens and stops the replay process. To solve
this problem, we adaptively increase the soft and hard maximum number of
open files with a '-f' option.
Example:
Test environment: x86_64 with 160 cores
$ cat /proc/sys/kernel/pid_max
163840
$ cat /proc/sys/fs/file-max
6815744
$ ulimit -Sn
1024
$ ulimit -Hn
4096
Before this patch:
$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
After this patch:
$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
Have a try with -f option
$ perf sched replay -f
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
------------------------------------------------------------
#1 : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
#2 : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
#3 : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
#4 : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
#5 : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
#6 : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
#7 : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
#8 : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
#9 : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
#10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
fails for any task
Since there is sem_wait for each task in the wait_for_tasks(), e.g.
sem_wait(&task->work_done_sem).
The sem_wait can continue only when work_done_sem is greater than 0, or
it will be blocked.
For perf sched replay, one task may sem_post the work_done_sem of
another task, which causes the work_done_sem of that task processed in a
reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...
This sequence simulates the sched process of the running tasks at the
time when perf sched record runs.
As a result, all the tasks are required and their threads must be
successfully created.
If any one (task A) of the tasks fails to create its thread, then
another task (task B), whose work_done_sem needs sem_post from that
failed task A, may likely block itself due to seg_wait.
And this is a dead halt, since task B's thread_func cannot continue at
all.
To solve this problem, perf sched replay should exit once any task fails
to create its thread.
Example:
Test environment: x86_64 with 160 cores
Before this patch:
$ perf sched replay
...
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
------------------------------------------------------------ <- dead halt
After this patch:
$ perf sched replay
...
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
$
As shown above, perf sched replay finishes the process after printing an
error message and does not block itself.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
threads
The pr_err in self_open_counters() prints error message to stderr.
Unlike stdout, stderr uses memory buffer on the stack of each calling
process.
The pr_err in self_open_counters() works in a thread called thread_func
created in function create_tasks, which concurrently creates
sched->nr_tasks threads.
If the error happens and pr_err prints the error message in each of
these threads, the stack size of the perf process (default is 8192
kbytes) will quickly run out and the segmentation fault will happen
then.
To solve this problem, pr_err with self_open_counters() should be moved
from newly created threads to the old main thread of the perf process.
Then the pr_err can work in a stable situation without the strange
segmentation fault problem.
Example:
Test environment: x86_64 with 160 cores
Before this patch:
$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
Segmentation fault
After this patch:
$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
...
As shown above, the result continues without any segmentation fault.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
the different pid_max configurations
Although the memory of pid_to_task can be allocated via calloc according
to the value of /proc/sys/kernel/pid_max, it cannot handle the case when
pid_max is changed after 'perf sched record' has created its perf.data.
If the new pid_max configured in 'perf sched replay' is smaller than the
old pid_max configured in 'perf sched record', then it will cause the
assertion failure problem.
To solve this problem, we realloc the memory of pid_to_task stepwise
once the passed-in pid parameter in register_pid is larger than the
current pid_max.
Example:
Test environment: x86_64 with 160 cores
$ cat /proc/sys/kernel/pid_max
163840
$ perf sched record ls
$ echo 5000 > /proc/sys/kernel/pid_max
$ cat /proc/sys/kernel/pid_max
5000
Before this patch:
$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55356 nsecs
the run test took 1000011 nsecs
the sleep test took 1060940 nsecs
perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
long)pid_max)' failed.
Aborted
After this patch:
$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55611 nsecs
the run test took 1000026 nsecs
the sleep test took 1060486 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
task 3 ( :5: 5), nr_events: 1
...
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
the unexpected change of pid_max
The current memory allocation of struct task_desc *pid_to_task[MAX_PID]
is in a permanent and preset way, and it has two problems:
Problem 1: If the pid_max, which is the max number of pids in the
system, is much smaller than MAX_PID (1024*1000), then it causes a waste
of stack memory. This may happen in the case where the number of cpu
cores is much smaller than 1000.
Problem 2: If the pid_max is changed from the default value to a value
larger than MAX_PID, then it will cause assertion failure problem. The
maximum value of pid_max can be set to pid_max_max (see pidmap_init
defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64,
PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This
value is much larger than MAX_PID, and will take up 32768 Kbytes
(4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much
larger than the default 8192 Kbytes of the stack size of calling
process.
Due to these two problems, we use calloc to allocate the memory of
pid_to_task dynamically.
Example:
Test environment: x86_64 with 160 cores
$ cat /proc/sys/kernel/pid_max
163840
$ echo 1025000 > /proc/sys/kernel/pid_max
$ cat /proc/sys/kernel/pid_max
1025000
Run some applications until the pid of some process is greater than
the value of MAX_PID (1024*1000).
Before this patch:
$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55480 nsecs
the run test took 1000008 nsecs
the sleep test took 1063151 nsecs
perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
failed.
Aborted
After this patch:
$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55435 nsecs
the run test took 1000004 nsecs
the sleep test took 1059312 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
task 3 ( :5: 5), nr_events: 1
...
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Current MAX_PID is only 65536, which will cause assertion failure problem
when CPU cores are more than 64 in x86_64.
This is because the pid_max value in x86_64 is at least
PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init
defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in
include/linux/threads.h).
Thus for MAX_PID = 65536, the correspoinding CPU cores are
65536/1024=64. This is obviously not enough at all for x86_64, and will
cause an assertion failure problem due to BUG_ON(pid >= MAX_PID) in the
codes.
We increase MAX_PID value from 65536 to 1024*1000, which can be used in
x86_64 with 1000 cores.
This number is finally decided according to the limitation of stack size
of calling process.
Use 'ulimit -a', the result shows the stack size of any process is 8192
Kbytes, which is defined in include/uapi/linux/resource.h (#define
_STK_LIM (8*1024*1024)).
Thus we choose a large enough value for MAX_PID, and make it satisfy to
the limitation of the stack size, i.e., making the perf process take up
a memory space just smaller than 8192 Kbytes.
We have calculated and tested that 1024*1000 is OK for MAX_PID.
This means perf sched replay can now be used with at most 1000 cores in
x86_64 without any assertion failure problem.
Example:
Test environment: x86_64 with 160 cores
$ cat /proc/sys/kernel/pid_max
163840
Before this patch:
$ perf sched replay
run measurement overhead: 240 nsecs
sleep measurement overhead: 55379 nsecs
the run test took 1000004 nsecs
the sleep test took 1059424 nsecs
perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)'
failed.
Aborted
After this patch:
$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55397 nsecs
the run test took 999920 nsecs
the sleep test took 1053313 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
task 3 ( :5: 5), nr_events: 1
...
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
correct meaning
There is no struct task_task at all, thus it is a typo error in the old
commits, now fix it to what it should be in order to avoid unnecessary
misunderstanding.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|