summaryrefslogtreecommitdiff
path: root/tools/perf/builtin-c2c.c
AgeCommit message (Collapse)Author
2023-06-12perf callchain: Use pthread keys for tls callchain_cursorIan Rogers
Pthread keys are more portable than __thread and allow the association of a destructor with the key. Use the destructor to clean up TLS callchain cursors to aid understanding memory leaks. Committer notes: Had to fixup a series of unconverted places and also check for the return of get_tls_callchain_cursor() as it may fail and return NULL. In that unlikely case we now either print something to a file, if the caller was expecting to print a callchain, or return an error code to state that resolving the callchain isn't possible. In some cases this was made easier because thread__resolve_callchain() already can fail for other reasons, so this new one (cursor == NULL) can be added and the callers don't have to explicitely check for this new condition. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-25-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12perf addr_location: Add init/exit/copy functionsIan Rogers
struct addr_location holds references to multiple reference counted objects. Add init/exit functions to make maintenance of those more consistent with the rest of the code and to try to avoid leaks. Modification of thread reference counts isn't included in this change. Committer notes: I needed to initialize result to sample->ip to make sure is set to something, fixing a compile time error, mostly keeping the previous logic as build_alloc_func_list() already does debugging/error prints about what went wrong if it takes the 'goto out'. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12perf thread: Add accessor functions for threadIan Rogers
Using accessors will make it easier to add reference count checking in later patches. Committer notes: thread->nsinfo wasn't wrapped as it is used together with nsinfo__zput(), where does a trick to set the field with a refcount being dropped to NULL, and that doesn't work well with using thread__nsinfo(thread), that loses the &thread->nsinfo pointer. When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-27perf pmu: Separate pmu and pmusIan Rogers
Separate and hide the pmus list in pmus.[ch]. Move pmus functionality out of pmu.[ch] into pmus.[ch] renaming pmus functions which were prefixed perf_pmu__ to perf_pmus__. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230527072210.2900565-28-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-27perf mem: Avoid hybrid PMU listIan Rogers
Add perf_pmu__num_mem_pmus that scans/counts the number of PMUs for mem events. Switch perf_pmu__for_each_hybrid_pmu to iterating all PMUs and only handling is_core ones. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230527072210.2900565-24-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-12perf c2c: Use zfree() to reduce chances of use after freeArnaldo Carvalho de Melo
Do defensive programming by using zfree() to initialize freed pointers to NULL, so that eventual use after free result in a NULL pointer deref instead of more subtle behaviour. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-10perf util: Move input_name to utilIan Rogers
'input_name' is the name of the input perf.data file, it is used by data convert and ui code. Move it to util to make it more consistent with other global state. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chengdong Li <chengdongli@tencent.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raul Silvera <rsilvera@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230410162511.3055900-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-15perf hist: Add 'kvm_info' field in histograms entryLeo Yan
__hists__add_entry() creates a temporary entry and compare it with existed histograms entries, if any existed entry equals to the temporary entry it skips to allocation to avoid duplication. The problem for support KVM event in histograms is it doesn't contain any info to identify KVM event and can be used for comparison entries. This patch adds 'kvm_info' field in the histograms entry which contains the KVM event's key, this identifier will be used for comparison histograms entries in later change. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-16perf c2c: Add report option to show false sharing in adjacent cachelinesFeng Tang
Many platforms have feature of adjacent cachelines prefetch, when it is enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity, if one is fetched to cache, the other one could likely be fetched too, which sort of extends the cacheline size to double, thus the false sharing could happens in adjacent cachelines. 0Day has captured performance changed related with this [1], and some commercial software explicitly makes its hot global variables 128 bytes aligned (2 cache lines) to avoid this kind of extended false sharing. So add an option "--double-cl" for 'perf c2c report' to show false sharing in double cache line granularity, which acts just like the cacheline size is doubled. There is no change to c2c record. The hardware events of shared cacheline are still per cacheline, and this option just changes the granularity of how events are grouped and displayed. In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case on old kernel): ---------------------------------------------------------------------- 26 31 2 0 0 0 0xffff888103ec6000 ---------------------------------------------------------------------- 35.48% 50.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff8133148b 1153 66 971 3748 74 [k] get_mem_cgroup_from_mm 6.45% 0.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff813396e4 570 0 1531 879 75 [k] mem_cgroup_charge 25.81% 50.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81331472 949 70 593 3359 74 [k] get_mem_cgroup_from_mm 19.35% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81339686 1352 0 1073 1022 74 [k] mem_cgroup_charge 9.68% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff813396d6 1401 0 863 768 74 [k] mem_cgroup_charge 3.23% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81333106 618 0 804 11 9 [k] uncharge_batch The offset 0x10 and 0x54 used to displayed in 2 groups, and now they are listed together to give users a hint of extended false sharing. [1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/ Committer notes: Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx Removed -a, leaving just as --double-cl, as this probably is not used so frequently and perhaps will be even auto-detected if we manage to record the MSR where this is configured. Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Feng Tang <feng.tang@intel.com> Tested-by: Leo Yan <leo.yan@linaro.org> Acked-by: Joe Mario <jmario@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-02perf tools: Use dedicated non-atomic clear/set bit helpersSean Christopherson
Use the dedicated non-atomic helpers for {clear,set}_bit() and their test variants, i.e. the double-underscore versions. Depsite being defined in atomic.h, and despite the kernel versions being atomic in the kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move to the double-underscore versions so that the versions that are expected to be atomic (for kernel developers) can be made atomic without affecting users that don't want atomic operations. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Message-Id: <20221119013450.2643007-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-06perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE eventsRavi Bangoria
Currently perf sets PERF_SAMPLE_WEIGHT flag only for mem load events. Set it for combined load-store event as well which will enable recording of load latency by default on arch that does not support independent mem load event. Also document missing -W in perf-record man page. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-5-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf c2c: Add helpers to get counts of loads or storesShang XiaoJing
Wrap repeated code in helper functions get_load_llc_misses, get_load_cache_hits. For consistence, helper function get_stores is wraped as well. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220906032906.21395-3-shangxiaojing@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-06perf c2c: Prevent potential memory leak in c2c_he_zalloc()Shang XiaoJing
Free allocated resources when zalloc() fails for members in c2c_he, to prevent potential memory leak in c2c_he_zalloc(). Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220906032906.21395-4-shangxiaojing@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Use 'peer' as default display for Arm64Leo Yan
Since Arm64 arch doesn't support HITMs flags, this patch changes to use 'peer' as default display if user doesn't specify any type; for other arches, it still uses 'tot' as default display type if user doesn't specify it. This patch changes to call perf_session__new() in an earlier place, so session environment can be initialized ahead and arch info can be used for setting display type. Suggested-by: Ali Saidi <alisaidi@amazon.com> Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-15-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Sort on peer snooping for load operationsLeo Yan
This patch adds a new option 'peer' so can sort on the cache hit for peer snooping. For displaying with option 'peer', the "Shared Data Cache Line Table" and "Shared Cache Line Distribution Pareto" both sort with the metrics "tot_peer". As result, we can get the 'peer' display: # perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio ================================================= Shared Data Cache Line Table ================================================= # # ----------- Cacheline ---------- Peer ------- Load Peer ------- Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ---- # Index Address Node PA cnt Snoop Total Local Remote records Loads Stores L1Hit L1Miss N/A FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........ # 0 0xaaaac17d6000 N/A 0 100.00% 99 99 0 18851 18851 0 0 0 0 0 18752 0 99 0 0 0 0 0 ================================================= Shared Cache Line Distribution Pareto ================================================= # # -- Peer Snoop -- ------- Store Refs ------ --------- Data address --------- ---------- cycles ---------- Total cpu Shared # Num Rmt Lcl L1 Hit L1 Miss N/A Offset Node PA cnt Pid Tid Code address rmt peer lcl peer load records cnt Symbol Object Source:Line Node{cpus %peers %stores} # ..... ....... ....... ....... ....... ....... .................. .... ...... ....... ................. .................. ........ ........ ........ ....... ........ ...................... ................ ............... .... # ---------------------------------------------------------------------- 0 0 99 0 0 0 0xaaaac17d6000 ---------------------------------------------------------------------- 0.00% 3.03% 0.00% 0.00% 0.00% 0x20 N/A 0 3603 3603:memstress 0xaaaac17c25ac 0 376 41 9314 2 [.] 0x00000000000025ac memstress memstress[25ac] 0{ 2 100.0% n/a} 0.00% 3.03% 0.00% 0.00% 0.00% 0x20 N/A 0 3603 3606:memstress 0xaaaac17c25ac 0 375 44 9155 1 [.] 0x00000000000025ac memstress memstress[25ac] 0{ 1 100.0% n/a} 0.00% 48.48% 0.00% 0.00% 0.00% 0x29 N/A 0 3603 3606:memstress 0xaaaac17c3e88 0 180 170 65 1 [.] 0x0000000000003e88 memstress memstress[3e88] 0{ 1 100.0% n/a} 0.00% 45.45% 0.00% 0.00% 0.00% 0x29 N/A 0 3603 3603:memstress 0xaaaac17c3e88 0 180 175 70 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0{ 2 100.0% n/a} Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-14-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Refactor display stringLeo Yan
The display type is shown by combination the display string array and a suffix string "HITMs", which is not friendly to extend display for other sorting type (e.g. extension for peer operations). This patch moves the suffix string "HITMs" into display string array for HITM types, so it can allow us to not necessarily to output string "HITMs" for new incoming display type. Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-13-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Refactor node headerLeo Yan
The node header array contains 3 items, each item is used for one of the 3 flavors for node accessing info. To extend sorting on other snooping type and not always stick to HITMs, the second header string "Node{cpus %hitms %stores}" should be adjusted (e.g. it's changed as "Node{cpus %peer %stores}"). For this reason, this patch changes the node header array to three flat variables and uses switch-case in function setup_nodes_header(), thus it is easier for altering the header string. Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-12-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop'Leo Yan
Use more general naming for the main sort dimension, this can allow us not to sort only on HITM snoop type, so it can be extended to support other costly snooping operations. So rename the dimension to the prefix 'percent_costly_". Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-11-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Use explicit names for display macrosLeo Yan
Perf c2c tool has an assumption that it heavily depends on HITM snoop type to detect cache false sharing, unfortunately, HITM is not supported on some architectures. Essentially, perf c2c tool wants to find some very costly snooping operations for false cache sharing, this means it's not necessarily to stick using HITM tags and we can explore other snooping types (e.g. SNOOPX_PEER). For this reason, this patch renames HITM related display macros with suffix '_HITM', so it can be distinct if later add more display types for on other snooping type. Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-10-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Add mean dimensions for peer operationsLeo Yan
This patch adds two dimensions for the mean value of peer operations. Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-9-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Add dimensions of peer metrics for cache line viewLeo Yan
This patch adds dimensions of peer ops, which will be used for Shared cache line distribution pareto. It adds the percentage dimensions for local and remote peer operations, and the dimensions for accounting operation numbers which is used for stdio mode. Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-8-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Add dimensions for peer load operationsLeo Yan
This patch adds three dimensions for peer load operations of 'lcl_peer', 'rmt_peer' and 'tot_peer'. These three dimensions will be used in the shared data cache line table. Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf c2c: Output statistics for peer snoopingLeo Yan
This patch outputs statistics for peer snooping for whole trace events and global shared cache line. Reviewed-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-03perf c2c: Fix sorting in percent_rmt_hitm_cmp()Leo Yan
The function percent_rmt_hitm_cmp() wrongly uses local HITMs for sorting remote HITMs. Since this function is to sort cache lines for remote HITMs, this patch changes to use 'rmt_hitm' field for correct sorting. Fixes: 9cb3500afc0980c5 ("perf c2c report: Add hitm/store percent related sort keys") Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220530084253.750190-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26perf c2c: Use stdio interface if slang is not supportedLeo Yan
If the slang lib is not installed on the system, perf c2c tool disables TUI mode and roll back to use stdio mode; but the flag 'c2c.use_stdio' is missed to set true and thus it wrongly applies UI quirks in the function ui_quirks(). This commit forces to use stdio interface if slang is not supported, and it can avoid to apply the UI quirks and show the correct metric header. Before: ================================================= Shared Cache Line Distribution Pareto ================================================= ------------------------------------------------------------------------------- 0 0 0 99 0 0 0 0xaaaac17d6000 ------------------------------------------------------------------------------- 0.00% 0.00% 6.06% 0.00% 0.00% 0.00% 0x20 N/A 0 0xaaaac17c25ac 0 0 43 375 18469 2 [.] 0x00000000000025ac memstress memstress[25ac] 0 0.00% 0.00% 93.94% 0.00% 0.00% 0.00% 0x29 N/A 0 0xaaaac17c3e88 0 0 173 180 135 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0 After: ================================================= Shared Cache Line Distribution Pareto ================================================= ------------------------------------------------------------------------------- 0 0 0 99 0 0 0 0xaaaac17d6000 ------------------------------------------------------------------------------- 0.00% 0.00% 6.06% 0.00% 0.00% 0.00% 0x20 N/A 0 0xaaaac17c25ac 0 0 43 375 18469 2 [.] 0x00000000000025ac memstress memstress[25ac] 0 0.00% 0.00% 93.94% 0.00% 0.00% 0.00% 0x29 N/A 0 0xaaaac17c3e88 0 0 173 180 135 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0 Fixes: 5a1a99cd2e4e1557 ("perf c2c report: Add main TUI browser") Reported-by: Joe Mario <jmario@redhat.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220526145400.611249-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23perf c2c: Add dimensions for 'N/A' metrics of store operationLeo Yan
Since now we have the statistics 'st_na' for store operations, add dimensions for the 'N/A' (no available memory level) metrics and the associated percentage calculation for the single cache line view. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adam Li <adamli@amperemail.onmicrosoft.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Alyssa Ross <hi@alyssa.is> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220518055729.1869566-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26perf tools: Enhance the matching of sub-commands abbreviationsWei Li
We support short command 'rec*' for 'record' and 'rep*' for 'report' in lots of sub-commands, but the matching is not quite strict currnetly. It may be puzzling sometime, like we mis-type a 'recport' to report but it will perform 'record' in fact without any message. To fix this, add a check to ensure that the short cmd is valid prefix of the real command. Committer testing: [root@quaco ~]# perf c2c re sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # perf c2c rec sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (16 samples) ] # perf c2c recport sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # perf c2c record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (15 samples) ] # perf c2c records sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # Signed-off-by: Wei Li <liwei391@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rui Xiang <rui.xiang@huawei.com> Link: http://lore.kernel.org/lkml/20220325092032.2956161-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15perf c2c: Replace bitmap_weight() with bitmap_empty() where appropriateYury Norov
Some code in builtin-c2c.c calls bitmap_weight() to check if any bit of a given bitmap is set. It's better to use bitmap_empty() in that case because bitmap_empty() stops traversing the bitmap as soon as it finds first set bit, while bitmap_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Laight <david.laight@aculab.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: http://lore.kernel.org/lkml/20220123183925.1052919-13-yury.norov@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf cpumap: Give CPUs their own typeIan Rogers
A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf c2c: Use more intention revealing iteratorIan Rogers
Use perf_cpu_map__for_each_cpu() in setup_nodes. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-46-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07perf tools: Check vmlinux/kallsyms arguments in all toolsJames Clark
Only perf report checked the validity of these arguments so apply the same check to all tools that read them for consistency. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20211018134844.2627174-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f. Subsystems affected by this patch series: mm (memory-hotplug, rmap, ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan), alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests, ipc, and scripts" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) scripts: check_extable: fix typo in user error message mm/workingset: correct kernel-doc notations ipc: replace costly bailout check in sysvipc_find_ipc() selftests/memfd: remove unused variable Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH configs: remove the obsolete CONFIG_INPUT_POLLDEV prctl: allow to setup brk for et_dyn executables pid: cleanup the stale comment mentioning pidmap_init(). kernel/fork.c: unexport get_{mm,task}_exe_file coredump: fix memleak in dump_vma_snapshot() fs/coredump.c: log if a core dump is aborted due to changed file permissions nilfs2: use refcount_dec_and_lock() to fix potential UAF nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix NULL pointer in nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_device_group trap: cleanup trap_init() init: move usermodehelper_enable() to populate_rootfs() ...
2021-09-08tools: rename bitmap_alloc() to bitmap_zalloc()Andy Shevchenko
Rename bitmap_alloc() to bitmap_zalloc() in tools to follow the bitmap API in the kernel. No functional changes intended. Link: https://lkml.kernel.org/r/20210814211713.180533-14-yury.norov@gmail.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Suggested-by: Yury Norov <yury.norov@gmail.com> Acked-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-02perf tools: Remove repipe argument from perf_session__new()Namhyung Kim
The repipe argument is only used by perf inject and the all others passes 'false'. Let's remove it from the function signature and add __perf_session__new() to be called from perf inject directly. This is a preparation of the change the pipe input/output. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210719223153.1618812-2-namhyung@kernel.org [ Fixed up some trivial conflicts as this patchset fell thru the cracks ;-( ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-01perf c2c: Support record for hybrid platformJin Yao
Support 'perf c2c record' for hybrid platform. On hybrid platform, such as Alderlake, when executing 'perf c2c record', it actually calls: record -W -d --phys-data --sample-cpu -e {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}:P -e cpu_atom/mem-loads,ldlat=30/P -e cpu_core/mem-stores/P -e cpu_atom/mem-stores/P Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210527001610.10553-9-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-01perf tools: Support pmu prefix for mem-load eventJin Yao
The perf_mem_events__name() can generate the mem-load event name. It uses a variable 'mem_loads_name__init' to avoid generating the event name every time (because perf_pmu__scan takes some time). The perf_mem_events__name() assumes the pmu is "cpu" but it's not correct for hybrid platform. For Alderlake, the pmu is "cpu_core" or "cpu_atom" Introduce a new parameter 'pmu_name' in perf_mem_events__name to let the caller specify a pmu name. Considering such event name is x86 specific, so move perf_mem_events[] to arch/x86/util/mem-events.c. We still keep the variable 'mem_loads_name__init' but it's only used when pmu_name is NULL (compatible for original behavior). When pmu_name is not NULL (e.g. "cpu_core"), this patch doesn't have optimization. That can be implemented in follow up patch. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210527001610.10553-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08perf c2c: Support data block and addr blockKan Liang
'perf c2c' is also a memory profiling tool. Apply the two new data source fields to 'perf c2c' as well. Extend 'perf c2c' to display the number of loads which blocked by data or address conflict. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf c2c: Add local variables for output metricsLeo Yan
This patch adds several local variables: "cl_output": pointer for outputting single cache line metrics; "output_str": pointer for outputting cache line metrics; "sort_str": pointer to the sorting metrics. This can improve readability for the code and it's more flexible when later extend to different strings for the output metrics. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210114154646.209024-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf c2c: Refactor node displayLeo Yan
The macro DISPLAY_HITM() is used to calculate HITM percentage introduced by every node and it's shown for the node info. This patch introduces the static function display_metrics() to replace the macro, and the parameters are refined for passing the metric's statistic and sum value. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210114154646.209024-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf c2c: Fix argument type for percent()Leo Yan
For percent() its arguments are defined as integers; this is not consistent with its consumers which pass u32 arguments. Thus this patch makes argument type as u32 for percent(). Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210114154646.209024-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf c2c: Refactor display filterLeo Yan
When sorting on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm), the FILTER_HITM macro is used to filter out the cache line entries if its overhead is less than 1%. This patch introduces a static function filter_display() to replace that macro and refines its parameters with a more flexible way, rather than passing field name, it changes to pass the cache line's statistic and sum value. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210114154646.209024-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf c2c: Refactor hist entry validationLeo Yan
This patch has no functionality changes but refactors hist entry validation for cache line resorting. It renames function "valid_hitm_or_store()" to "is_valid_hist_entry()", changes return type from integer type to bool type. In the function, it uses switch-case instead of ternary operators, which is easier to extend for more display types. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210114154646.209024-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf c2c: Rename for shared cache line statsLeo Yan
For shared cache line statistics, 'perf c2c' relies on HITM. We can use more general naming rather than only binding to HITM, so replace "hitm_stats" with "shared_clines_stats" in structure perf_c2c, and rename function resort_hitm_cb() to resort_shared_cl_cb(). Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20210114154646.209024-2-leo.yan@linaro.org
2020-11-11perf c2c: Support AUX traceLeo Yan
This patch adds the AUX callbacks in session structure, so support AUX trace for "perf c2c" tool; make itrace memory event as default for "perf c2c", this tells the AUX trace decoder to synthesize samples and can be used for statistics. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-9-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORELeo Yan
When user doesn't specify event name, perf c2c tool enables both the load and store events, and this leads to failure for opening the duplicate PMU device for AUX trace. After the memory event PERF_MEM_EVENTS__LOAD_STORE is introduced, when the user doesn't specify event name, this patch converts the required operation to PERF_MEM_EVENTS__LOAD_STORE if the arch supports it. Otherwise, the tool still rolls back to enable events PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf mem: Introduce weak function perf_mem_events__ptr()Leo Yan
Different architectures might use different event or different event parameters for memory profiling, this patch introduces a weak perf_mem_events__ptr() function which allows to return back a architecture specific memory event. Since the variable 'perf_mem_events' can be only accessed by the perf_mem_events__ptr() function, mark the variable as 'static', this allows the architectures to define its own memory event array. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15perf c2c: Add metrics "RMT Load Hit"Leo Yan
The metrics "LLC Ld Miss" and "Load Dram" overlap with each other for accouting items: "LLC Ld Miss" = "lcl_dram" + "rmt_dram" + "rmt_hit" + "rmt_hitm" "Load Dram" = "lcl_dram" + "rmt_dram" Furthermore, the metrics "LLC Ld Miss" is not directive to show statistics due to it contains summary value and cannot give out breakdown details. For this reason, add a new metrics "RMT Load Hit" which is used to present the remote cache hit; it contains two items: "RMT Load Hit" = remote hit ("rmt_hit") + remote hitm ("rmt_hitm") As result, the metrics "LLC Ld Miss" is perfectly divided into two metrics "RMT Load Hit" and "Load Dram". It's not necessary to keep metrics "LLC Ld Miss", so remove it. Before: # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- LLC --- Load Dram ---- # Index Address Node PA cnt Hitm Total LclHitm RmtHitm records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm Ld Miss Lcl Rmt # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ....... ........ ........ # 0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 548 2615 66 169 481 0 0 0 1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 187 361 27 11 78 0 0 0 2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 131 0 10 263 1 0 0 0 After: # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ---- # Index Address Node PA cnt Hitm Total LclHitm RmtHitm records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........ # 0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 548 2615 66 169 481 0 0 0 0 1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 187 361 27 11 78 0 0 0 0 2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 131 0 10 263 1 0 0 0 0 Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-9-leo.yan@linaro.org
2020-10-15perf c2c: Correct LLC load hit metricsLeo Yan
"rmt_hit" is accounted into two metrics: one is accounted into the metrics "LLC Ld Miss" (see the function llc_miss() for calculation "llcmiss"); and it's accounted into metrics "LLC Load Hit". Thus, for the literal meaning, it is contradictory that "rmt_hit" is accounted for both "LLC Ld Miss" (LLC miss) and "LLC Load Hit" (LLC hit). Thus this is easily to introduce confusion: "LLC Load Hit" gives impression that all items belong to it are LLC hit; in fact "rmt_hit" is LLC miss and remote cache hit. To give out clear semantics for metric "LLC Load Hit", "rmt_hit" is moved out from it and changes "LLC Load Hit" to contain two items: LLC Load Hit = LLC's hit ("ld_llchit") + LLC's hitm ("lcl_hitm") For output alignment, adjusts the header for "LLC Load Hit". Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-8-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15perf c2c: Change header for LLC local hitLeo Yan
Replace the header string "Lcl" with "LclHit", which is more explicit to express the event type is LLC local hit. Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15perf c2c: Use more explicit headers for HITMLeo Yan
Local and remote HITM use the headers 'Lcl' and 'Rmt' respectively, suppose if we want to extend the tool to display these two dimensions under any one metrics, users cannot understand the semantics if only based on the header string 'Lcl' or 'Rmt'. To explicit express the meaning for HITM items, this patch changes the headers string as "LclHitm" and "RmtHitm", the strings are more readable and this allows to extend metrics for using HITM items. Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>