diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-01-11 14:39:17 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-01-11 14:39:17 -0800 |
commit | 5cb52b5e1654f3f1ed9c32e34456d98559c85aa0 (patch) | |
tree | 737c73d6aef99a17f57c2974f1e2a142a5f1a377 /arch | |
parent | 24af98c4cf5f5e69266e270c7f3fb34b82ff6656 (diff) | |
parent | 3eb9ede23bdd96e9ba60e2b4d4d17a7c35d58448 (diff) | |
download | lwn-5cb52b5e1654f3f1ed9c32e34456d98559c85aa0.tar.gz lwn-5cb52b5e1654f3f1ed9c32e34456d98559c85aa0.zip |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Intel Knights Landing support. (Harish Chegondi)
- Intel Broadwell-EP uncore PMU support. (Kan Liang)
- Core code improvements. (Peter Zijlstra.)
- Event filter, LBR and PEBS fixes. (Stephane Eranian)
- Enable cycles:pp on Intel Atom. (Stephane Eranian)
- Add cycles:ppp support for Skylake. (Andi Kleen)
- Various x86 NMI overhead optimizations. (Andi Kleen)
- Intel PT enhancements. (Takao Indoh)
- AMD cache events fix. (Vince Weaver)
Tons of tooling changes:
- Show random perf tool tips in the 'perf report' bottom line
(Namhyung Kim)
- perf report now defaults to --group if the perf.data file has
grouped events, try it with:
# perf record -e '{cycles,instructions}' -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
# perf report
# Samples: 1K of event 'anon group { cycles, instructions }'
# Event count (approx.): 1955219195
#
# Overhead Command Shared Object Symbol
2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle
1.05% 0.33% firefox libxul.so [.] js::SetObjectElement
1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno
0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab
0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1>
0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
0.62% 1.27% firefox libxul.so [.] js::GetIterator
0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty
0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining
- Introduce the 'perf stat record/report' workflow:
Generate perf.data files from 'perf stat', to tap into the
scripting capabilities perf has instead of defining a 'perf stat'
specific scripting support to calculate event ratios, etc.
Simple example:
$ perf stat record -e cycles usleep 1
Performance counter stats for 'usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$ perf stat report
Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$
It generates PERF_RECORD_ userspace records to store the details:
$ perf report -D | grep PERF_RECORD
0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
0x12a [0x40]: PERF_RECORD_STAT_CONFIG
0x16a [0x30]: PERF_RECORD_STAT
-1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
0x1da [0x18]: PERF_RECORD_STAT_ROUND
[acme@ssdandy linux]$
An effort was made to make perf.data files generated like this to
not generate cryptic messages when processed by older tools.
The 'perf script' bits need rebasing, will go up later.
- Make command line options always available, even when they depend
on some feature being enabled, warning the user about use of such
options (Wang Nan)
- Support hw breakpoint events (mem:0xAddress) in the default output
mode in 'perf script' (Wang Nan)
- Fixes and improvements for supporting annotating ARM binaries,
support ARM call and jump instructions, more work needed to have
arch specific stuff separated into tools/perf/arch/*/annotate/
(Russell King)
- Add initial 'perf config' command, for now just with a --list
command to the contents of the configuration file in use and a
basic man page describing its format, commands for doing edits and
detailed documentation are being reviewed and proof-read. (Taeung
Song)
- Allows BPF scriptlets specify arguments to be fetched using DWARF
info, using a prologue generated at compile/build time (He Kuang,
Wang Nan)
- Allow attaching BPF scriptlets to module symbols (Wang Nan)
- Allow attaching BPF scriptlets to userspace code using uprobe (Wang
Nan)
- BPF programs now can specify 'perf probe' tunables via its section
name, separating key=val values using semicolons (Wang Nan)
Testing some of these new BPF features:
Use case: get callchains when receiving SSL packets, filter then in the
kernel, at arbitrary place.
# cat ssl.bpf.c
#define SEC(NAME) __attribute__((section(NAME), used))
struct pt_regs;
SEC("func=__inet_lookup_established hnum")
int func(struct pt_regs *ctx, int err, unsigned short port)
{
return err == 0 && port == 443;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
#
# perf record -a -g -e ssl.bpf.c
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
# perf script | head -30
swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
1163ffa start_kernel ([kernel.vmlinux].init.text)
11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)
qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
430a br_handle_frame_finish ([bridge])
48bc br_handle_frame ([bridge])
855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
#
- Use 'perf probe' various options to list functions, see what
variables can be collected at any given point, experiment first
collecting without a filter, then filter, use it together with
'perf trace', 'perf top', with or without callchains, if it
explodes, please tell us!
- Introduce a new callchain mode: "folded", that will list per line
representations of all callchains for a give histogram entry,
facilitating 'perf report' output processing by other tools, such
as Brendan Gregg's flamegraph tools (Namhyung Kim)
E.g:
# perf report | grep -v ^# | head
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
|
---cpu_startup_entry
|
|--12.07%--start_secondary
|
--6.30%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
#
Becomes, in "folded" mode:
# perf report -g folded | grep -v ^# | head -5
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
12.07% cpu_startup_entry;start_secondary
6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
11.23% call_cpuidle;cpu_startup_entry;start_secondary
5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state
#
The user can also select one of "count", "period" or "percent" as
the first column.
... and lots of infrastructure enhancements, plus fixes and other
changes, features I failed to list - see the shortlog and the git log
for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
perf evlist: Add --trace-fields option to show trace fields
perf record: Store data mmaps for dwarf unwind
perf libdw: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Use find_map function in access_dso_mem
perf evlist: Remove perf_evlist__(enable|disable)_event functions
perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
perf report: Show random usage tip on the help line
perf hists: Export a couple of hist functions
perf diff: Use perf_hpp__register_sort_field interface
perf tools: Add overhead/overhead_children keys defaults via string
perf tools: Remove list entry from struct sort_entry
perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
perf script: Align event name properly
perf tools: Add missing headers in perf's MANIFEST
perf tools: Do not show trace command if it's not compiled in
perf report: Change default to use event group view
perf top: Decay periods in callchains
tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
tools lib: Sync tools/lib/find_bit.c with the kernel
...
Diffstat (limited to 'arch')
-rw-r--r-- | arch/x86/include/asm/atomic.h | 1 | ||||
-rw-r--r-- | arch/x86/include/asm/atomic64_32.h | 1 | ||||
-rw-r--r-- | arch/x86/include/asm/intel_pt.h | 10 | ||||
-rw-r--r-- | arch/x86/include/asm/msr-trace.h | 57 | ||||
-rw-r--r-- | arch/x86/include/asm/msr.h | 31 | ||||
-rw-r--r-- | arch/x86/include/asm/uaccess.h | 9 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event.c | 36 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event.h | 21 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_amd.c | 2 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel.c | 115 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_ds.c | 39 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_lbr.c | 42 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_pt.c | 9 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_rapl.c | 25 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_uncore.c | 17 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_uncore.h | 3 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c | 2 | ||||
-rw-r--r-- | arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c | 635 | ||||
-rw-r--r-- | arch/x86/kernel/crash.c | 11 | ||||
-rw-r--r-- | arch/x86/lib/msr.c | 26 |
20 files changed, 1029 insertions, 63 deletions
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h index ae5fb83e6d91..3e8674288198 100644 --- a/arch/x86/include/asm/atomic.h +++ b/arch/x86/include/asm/atomic.h @@ -3,7 +3,6 @@ #include <linux/compiler.h> #include <linux/types.h> -#include <asm/processor.h> #include <asm/alternative.h> #include <asm/cmpxchg.h> #include <asm/rmwcc.h> diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h index a11c30b77fb5..a984111135b1 100644 --- a/arch/x86/include/asm/atomic64_32.h +++ b/arch/x86/include/asm/atomic64_32.h @@ -3,7 +3,6 @@ #include <linux/compiler.h> #include <linux/types.h> -#include <asm/processor.h> //#include <asm/cmpxchg.h> /* An 64bit atomic type */ diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h new file mode 100644 index 000000000000..e1a411786bf5 --- /dev/null +++ b/arch/x86/include/asm/intel_pt.h @@ -0,0 +1,10 @@ +#ifndef _ASM_X86_INTEL_PT_H +#define _ASM_X86_INTEL_PT_H + +#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL) +void cpu_emergency_stop_pt(void); +#else +static inline void cpu_emergency_stop_pt(void) {} +#endif + +#endif /* _ASM_X86_INTEL_PT_H */ diff --git a/arch/x86/include/asm/msr-trace.h b/arch/x86/include/asm/msr-trace.h new file mode 100644 index 000000000000..7567225747d8 --- /dev/null +++ b/arch/x86/include/asm/msr-trace.h @@ -0,0 +1,57 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM msr + +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_FILE msr-trace + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH asm/ + +#if !defined(_TRACE_MSR_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_MSR_H + +#include <linux/tracepoint.h> + +/* + * Tracing for x86 model specific registers. Directly maps to the + * RDMSR/WRMSR instructions. + */ + +DECLARE_EVENT_CLASS(msr_trace_class, + TP_PROTO(unsigned msr, u64 val, int failed), + TP_ARGS(msr, val, failed), + TP_STRUCT__entry( + __field( unsigned, msr ) + __field( u64, val ) + __field( int, failed ) + ), + TP_fast_assign( + __entry->msr = msr; + __entry->val = val; + __entry->failed = failed; + ), + TP_printk("%x, value %llx%s", + __entry->msr, + __entry->val, + __entry->failed ? " #GP" : "") +); + +DEFINE_EVENT(msr_trace_class, read_msr, + TP_PROTO(unsigned msr, u64 val, int failed), + TP_ARGS(msr, val, failed) +); + +DEFINE_EVENT(msr_trace_class, write_msr, + TP_PROTO(unsigned msr, u64 val, int failed), + TP_ARGS(msr, val, failed) +); + +DEFINE_EVENT(msr_trace_class, rdpmc, + TP_PROTO(unsigned msr, u64 val, int failed), + TP_ARGS(msr, val, failed) +); + +#endif /* _TRACE_MSR_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h index 77d8b284e4a7..fedd6e6d1e43 100644 --- a/arch/x86/include/asm/msr.h +++ b/arch/x86/include/asm/msr.h @@ -57,11 +57,34 @@ static inline unsigned long long native_read_tscp(unsigned int *aux) #define EAX_EDX_RET(val, low, high) "=A" (val) #endif +#ifdef CONFIG_TRACEPOINTS +/* + * Be very careful with includes. This header is prone to include loops. + */ +#include <asm/atomic.h> +#include <linux/tracepoint-defs.h> + +extern struct tracepoint __tracepoint_read_msr; +extern struct tracepoint __tracepoint_write_msr; +extern struct tracepoint __tracepoint_rdpmc; +#define msr_tracepoint_active(t) static_key_false(&(t).key) +extern void do_trace_write_msr(unsigned msr, u64 val, int failed); +extern void do_trace_read_msr(unsigned msr, u64 val, int failed); +extern void do_trace_rdpmc(unsigned msr, u64 val, int failed); +#else +#define msr_tracepoint_active(t) false +static inline void do_trace_write_msr(unsigned msr, u64 val, int failed) {} +static inline void do_trace_read_msr(unsigned msr, u64 val, int failed) {} +static inline void do_trace_rdpmc(unsigned msr, u64 val, int failed) {} +#endif + static inline unsigned long long native_read_msr(unsigned int msr) { DECLARE_ARGS(val, low, high); asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr)); + if (msr_tracepoint_active(__tracepoint_read_msr)) + do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), 0); return EAX_EDX_VAL(val, low, high); } @@ -78,6 +101,8 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr, _ASM_EXTABLE(2b, 3b) : [err] "=r" (*err), EAX_EDX_RET(val, low, high) : "c" (msr), [fault] "i" (-EIO)); + if (msr_tracepoint_active(__tracepoint_read_msr)) + do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), *err); return EAX_EDX_VAL(val, low, high); } @@ -85,6 +110,8 @@ static inline void native_write_msr(unsigned int msr, unsigned low, unsigned high) { asm volatile("wrmsr" : : "c" (msr), "a"(low), "d" (high) : "memory"); + if (msr_tracepoint_active(__tracepoint_read_msr)) + do_trace_write_msr(msr, ((u64)high << 32 | low), 0); } /* Can be uninlined because referenced by paravirt */ @@ -102,6 +129,8 @@ notrace static inline int native_write_msr_safe(unsigned int msr, : "c" (msr), "0" (low), "d" (high), [fault] "i" (-EIO) : "memory"); + if (msr_tracepoint_active(__tracepoint_read_msr)) + do_trace_write_msr(msr, ((u64)high << 32 | low), err); return err; } @@ -160,6 +189,8 @@ static inline unsigned long long native_read_pmc(int counter) DECLARE_ARGS(val, low, high); asm volatile("rdpmc" : EAX_EDX_RET(val, low, high) : "c" (counter)); + if (msr_tracepoint_active(__tracepoint_rdpmc)) + do_trace_rdpmc(counter, EAX_EDX_VAL(val, low, high), 0); return EAX_EDX_VAL(val, low, high); } diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 09b1b0ab94b7..660458af425d 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -745,5 +745,14 @@ copy_to_user(void __user *to, const void *from, unsigned long n) #undef __copy_from_user_overflow #undef __copy_to_user_overflow +/* + * We rely on the nested NMI work to allow atomic faults from the NMI path; the + * nested NMI paths are careful to preserve CR2. + * + * Caller must use pagefault_enable/disable, or run in interrupt context, + * and also do a uaccess_ok() check + */ +#define __copy_from_user_nmi __copy_from_user_inatomic + #endif /* _ASM_X86_UACCESS_H */ diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 2bf79d7c97df..1b443db2db50 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -482,6 +482,9 @@ int x86_pmu_hw_config(struct perf_event *event) /* Support for IP fixup */ if (x86_pmu.lbr_nr || x86_pmu.intel_cap.pebs_format >= 2) precise++; + + if (x86_pmu.pebs_prec_dist) + precise++; } if (event->attr.precise_ip > precise) @@ -1531,6 +1534,7 @@ static void __init filter_events(struct attribute **attrs) { struct device_attribute *d; struct perf_pmu_events_attr *pmu_attr; + int offset = 0; int i, j; for (i = 0; attrs[i]; i++) { @@ -1539,7 +1543,7 @@ static void __init filter_events(struct attribute **attrs) /* str trumps id */ if (pmu_attr->event_str) continue; - if (x86_pmu.event_map(i)) + if (x86_pmu.event_map(i + offset)) continue; for (j = i; attrs[j]; j++) @@ -1547,6 +1551,14 @@ static void __init filter_events(struct attribute **attrs) /* Check the shifted attr. */ i--; + + /* + * event_map() is index based, the attrs array is organized + * by increasing event index. If we shift the events, then + * we need to compensate for the event_map(), otherwise + * we are looking up the wrong event in the map + */ + offset++; } } @@ -2250,12 +2262,19 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry) ss_base = get_segment_base(regs->ss); fp = compat_ptr(ss_base + regs->bp); + pagefault_disable(); while (entry->nr < PERF_MAX_STACK_DEPTH) { unsigned long bytes; frame.next_frame = 0; frame.return_address = 0; - bytes = copy_from_user_nmi(&frame, fp, sizeof(frame)); + if (!access_ok(VERIFY_READ, fp, 8)) + break; + + bytes = __copy_from_user_nmi(&frame.next_frame, fp, 4); + if (bytes != 0) + break; + bytes = __copy_from_user_nmi(&frame.return_address, fp+4, 4); if (bytes != 0) break; @@ -2265,6 +2284,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry) perf_callchain_store(entry, cs_base + frame.return_address); fp = compat_ptr(ss_base + frame.next_frame); } + pagefault_enable(); return 1; } #else @@ -2302,12 +2322,19 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs) if (perf_callchain_user32(regs, entry)) return; + pagefault_disable(); while (entry->nr < PERF_MAX_STACK_DEPTH) { unsigned long bytes; frame.next_frame = NULL; frame.return_address = 0; - bytes = copy_from_user_nmi(&frame, fp, sizeof(frame)); + if (!access_ok(VERIFY_READ, fp, 16)) + break; + + bytes = __copy_from_user_nmi(&frame.next_frame, fp, 8); + if (bytes != 0) + break; + bytes = __copy_from_user_nmi(&frame.return_address, fp+8, 8); if (bytes != 0) break; @@ -2315,8 +2342,9 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs) break; perf_callchain_store(entry, frame.return_address); - fp = frame.next_frame; + fp = (void __user *)frame.next_frame; } + pagefault_enable(); } /* diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h index d0e35ebb2adb..7bb61e32fb29 100644 --- a/arch/x86/kernel/cpu/perf_event.h +++ b/arch/x86/kernel/cpu/perf_event.h @@ -14,17 +14,7 @@ #include <linux/perf_event.h> -#if 0 -#undef wrmsrl -#define wrmsrl(msr, val) \ -do { \ - unsigned int _msr = (msr); \ - u64 _val = (val); \ - trace_printk("wrmsrl(%x, %Lx)\n", (unsigned int)(_msr), \ - (unsigned long long)(_val)); \ - native_write_msr((_msr), (u32)(_val), (u32)(_val >> 32)); \ -} while (0) -#endif +/* To enable MSR tracing please use the generic trace points. */ /* * | NHM/WSM | SNB | @@ -318,6 +308,10 @@ struct cpu_hw_events { #define INTEL_UEVENT_CONSTRAINT(c, n) \ EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK) +/* Constraint on specific umask bit only + event */ +#define INTEL_UBIT_EVENT_CONSTRAINT(c, n) \ + EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT|(c)) + /* Like UEVENT_CONSTRAINT, but match flags too */ #define INTEL_FLAGS_UEVENT_CONSTRAINT(c, n) \ EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS) @@ -589,7 +583,8 @@ struct x86_pmu { bts_active :1, pebs :1, pebs_active :1, - pebs_broken :1; + pebs_broken :1, + pebs_prec_dist :1; int pebs_record_size; void (*drain_pebs)(struct pt_regs *regs); struct event_constraint *pebs_constraints; @@ -907,6 +902,8 @@ void intel_pmu_lbr_init_hsw(void); void intel_pmu_lbr_init_skl(void); +void intel_pmu_lbr_init_knl(void); + int intel_pmu_setup_lbr_filter(struct perf_event *event); void intel_pt_interrupt(void); diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c index 1cee5d2d7ece..05e76bf65781 100644 --- a/arch/x86/kernel/cpu/perf_event_amd.c +++ b/arch/x86/kernel/cpu/perf_event_amd.c @@ -18,7 +18,7 @@ static __initconst const u64 amd_hw_cache_event_ids [ C(RESULT_MISS) ] = 0x0141, /* Data Cache Misses */ }, [ C(OP_WRITE) ] = { - [ C(RESULT_ACCESS) ] = 0x0142, /* Data Cache Refills :system */ + [ C(RESULT_ACCESS) ] = 0, [ C(RESULT_MISS) ] = 0, }, [ C(OP_PREFETCH) ] = { diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index e2a430021e46..a667078a5180 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -185,6 +185,14 @@ struct event_constraint intel_skl_event_constraints[] = { EVENT_CONSTRAINT_END }; +static struct extra_reg intel_knl_extra_regs[] __read_mostly = { + INTEL_UEVENT_EXTRA_REG(0x01b7, + MSR_OFFCORE_RSP_0, 0x7f9ffbffffull, RSP_0), + INTEL_UEVENT_EXTRA_REG(0x02b7, + MSR_OFFCORE_RSP_1, 0x3f9ffbffffull, RSP_1), + EVENT_EXTRA_END +}; + static struct extra_reg intel_snb_extra_regs[] __read_mostly = { /* must define OFFCORE_RSP_X first, see intel_fixup_er() */ INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3f807f8fffull, RSP_0), @@ -255,7 +263,7 @@ struct event_constraint intel_bdw_event_constraints[] = { FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */ FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */ INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */ - INTEL_UEVENT_CONSTRAINT(0x8a3, 0x4), /* CYCLE_ACTIVITY.CYCLES_L1D_MISS */ + INTEL_UBIT_EVENT_CONSTRAINT(0x8a3, 0x4), /* CYCLE_ACTIVITY.CYCLES_L1D_MISS */ EVENT_CONSTRAINT_END }; @@ -1457,6 +1465,42 @@ static __initconst const u64 slm_hw_cache_event_ids }, }; +#define KNL_OT_L2_HITE BIT_ULL(19) /* Other Tile L2 Hit */ +#define KNL_OT_L2_HITF BIT_ULL(20) /* Other Tile L2 Hit */ +#define KNL_MCDRAM_LOCAL BIT_ULL(21) +#define KNL_MCDRAM_FAR BIT_ULL(22) +#define KNL_DDR_LOCAL BIT_ULL(23) +#define KNL_DDR_FAR BIT_ULL(24) +#define KNL_DRAM_ANY (KNL_MCDRAM_LOCAL | KNL_MCDRAM_FAR | \ + KNL_DDR_LOCAL | KNL_DDR_FAR) +#define KNL_L2_READ SLM_DMND_READ +#define KNL_L2_WRITE SLM_DMND_WRITE +#define KNL_L2_PREFETCH SLM_DMND_PREFETCH +#define KNL_L2_ACCESS SLM_LLC_ACCESS +#define KNL_L2_MISS (KNL_OT_L2_HITE | KNL_OT_L2_HITF | \ + KNL_DRAM_ANY | SNB_SNP_ANY | \ + SNB_NON_DRAM) + +static __initconst const u64 knl_hw_cache_extra_regs + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = { + [C(LL)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = KNL_L2_READ | KNL_L2_ACCESS, + [C(RESULT_MISS)] = 0, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = KNL_L2_WRITE | KNL_L2_ACCESS, + [C(RESULT_MISS)] = KNL_L2_WRITE | KNL_L2_MISS, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = KNL_L2_PREFETCH | KNL_L2_ACCESS, + [C(RESULT_MISS)] = KNL_L2_PREFETCH | KNL_L2_MISS, + }, + }, +}; + /* * Use from PMIs where the LBRs are already disabled. */ @@ -2475,6 +2519,44 @@ static void intel_pebs_aliases_snb(struct perf_event *event) } } +static void intel_pebs_aliases_precdist(struct perf_event *event) +{ + if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) { + /* + * Use an alternative encoding for CPU_CLK_UNHALTED.THREAD_P + * (0x003c) so that we can use it with PEBS. + * + * The regular CPU_CLK_UNHALTED.THREAD_P event (0x003c) isn't + * PEBS capable. However we can use INST_RETIRED.PREC_DIST + * (0x01c0), which is a PEBS capable event, to get the same + * count. + * + * The PREC_DIST event has special support to minimize sample + * shadowing effects. One drawback is that it can be + * only programmed on counter 1, but that seems like an + * acceptable trade off. + */ + u64 alt_config = X86_CONFIG(.event=0xc0, .umask=0x01, .inv=1, .cmask=16); + + alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK); + event->hw.config = alt_config; + } +} + +static void intel_pebs_aliases_ivb(struct perf_event *event) +{ + if (event->attr.precise_ip < 3) + return intel_pebs_aliases_snb(event); + return intel_pebs_aliases_precdist(event); +} + +static void intel_pebs_aliases_skl(struct perf_event *event) +{ + if (event->attr.precise_ip < 3) + return intel_pebs_aliases_core2(event); + return intel_pebs_aliases_precdist(event); +} + static unsigned long intel_pmu_free_running_flags(struct perf_event *event) { unsigned long flags = x86_pmu.free_running_flags; @@ -3332,6 +3414,7 @@ __init int intel_pmu_init(void) x86_pmu.event_constraints = intel_gen_event_constraints; x86_pmu.pebs_constraints = intel_atom_pebs_event_constraints; + x86_pmu.pebs_aliases = intel_pebs_aliases_core2; pr_cont("Atom events, "); break; @@ -3431,7 +3514,8 @@ __init int intel_pmu_init(void) x86_pmu.event_constraints = intel_ivb_event_constraints; x86_pmu.pebs_constraints = intel_ivb_pebs_event_constraints; - x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + x86_pmu.pebs_aliases = intel_pebs_aliases_ivb; + x86_pmu.pebs_prec_dist = true; if (boot_cpu_data.x86_model == 62) x86_pmu.extra_regs = intel_snbep_extra_regs; else @@ -3464,7 +3548,8 @@ __init int intel_pmu_init(void) x86_pmu.event_constraints = intel_hsw_event_constraints; x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints; x86_pmu.extra_regs = intel_snbep_extra_regs; - x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + x86_pmu.pebs_aliases = intel_pebs_aliases_ivb; + x86_pmu.pebs_prec_dist = true; /* all extra regs are per-cpu when HT is on */ x86_pmu.flags |= PMU_FL_HAS_RSP_1; x86_pmu.flags |= PMU_FL_NO_HT_SHARING; @@ -3499,7 +3584,8 @@ __init int intel_pmu_init(void) x86_pmu.event_constraints = intel_bdw_event_constraints; x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints; x86_pmu.extra_regs = intel_snbep_extra_regs; - x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + x86_pmu.pebs_aliases = intel_pebs_aliases_ivb; + x86_pmu.pebs_prec_dist = true; /* all extra regs are per-cpu when HT is on */ x86_pmu.flags |= PMU_FL_HAS_RSP_1; x86_pmu.flags |= PMU_FL_NO_HT_SHARING; @@ -3511,6 +3597,24 @@ __init int intel_pmu_init(void) pr_cont("Broadwell events, "); break; + case 87: /* Knights Landing Xeon Phi */ + memcpy(hw_cache_event_ids, + slm_hw_cache_event_ids, sizeof(hw_cache_event_ids)); + memcpy(hw_cache_extra_regs, + knl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); + intel_pmu_lbr_init_knl(); + + x86_pmu.event_constraints = intel_slm_event_constraints; + x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints; + x86_pmu.extra_regs = intel_knl_extra_regs; + + /* all extra regs are per-cpu when HT is on */ + x86_pmu.flags |= PMU_FL_HAS_RSP_1; + x86_pmu.flags |= PMU_FL_NO_HT_SHARING; + + pr_cont("Knights Landing events, "); + break; + case 78: /* 14nm Skylake Mobile */ case 94: /* 14nm Skylake Desktop */ x86_pmu.late_ack = true; @@ -3521,7 +3625,8 @@ __init int intel_pmu_init(void) x86_pmu.event_constraints = intel_skl_event_constraints; x86_pmu.pebs_constraints = intel_skl_pebs_event_constraints; x86_pmu.extra_regs = intel_skl_extra_regs; - x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + x86_pmu.pebs_aliases = intel_pebs_aliases_skl; + x86_pmu.pebs_prec_dist = true; /* all extra regs are per-cpu when HT is on */ x86_pmu.flags |= PMU_FL_HAS_RSP_1; x86_pmu.flags |= PMU_FL_NO_HT_SHARING; diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c index 5db1c7755548..10602f0a438f 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c @@ -620,6 +620,8 @@ struct event_constraint intel_atom_pebs_event_constraints[] = { INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0x1), /* MEM_LOAD_RETIRED.* */ /* INST_RETIRED.ANY_P, inv=1, cmask=16 (cycles:p). */ INTEL_FLAGS_EVENT_CONSTRAINT(0x108000c0, 0x01), + /* Allow all events as PEBS with no flags */ + INTEL_ALL_EVENT_CONSTRAINT(0, 0x1), EVENT_CONSTRAINT_END }; @@ -686,6 +688,8 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = { INTEL_PST_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORES */ /* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */ INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf), + /* INST_RETIRED.PREC_DIST, inv=1, cmask=16 (cycles:ppp). */ + INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c0, 0x2), INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf), /* MEM_UOP_RETIRED.* */ INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */ INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */ @@ -700,6 +704,8 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = { INTEL_PLD_CONSTRAINT(0x01cd, 0xf), /* MEM_TRANS_RETIRED.* */ /* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */ INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf), + /* INST_RETIRED.PREC_DIST, inv=1, cmask=16 (cycles:ppp). */ + INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c0, 0x2), INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XLD(0x11d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XLD(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */ @@ -718,9 +724,10 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = { struct event_constraint intel_skl_pebs_event_constraints[] = { INTEL_FLAGS_UEVENT_CONSTRAINT(0x1c0, 0x2), /* INST_RETIRED.PREC_DIST */ - INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */ - /* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */ - INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf), + /* INST_RETIRED.PREC_DIST, inv=1, cmask=16 (cycles:ppp). */ + INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c0, 0x2), + /* INST_RETIRED.TOTAL_CYCLES_PS (inv=1, cmask=16) (cycles:p). */ + INTEL_FLAGS_EVENT_CONSTRAINT(0x108000c0, 0x0f), INTEL_PLD_CONSTRAINT(0x1cd, 0xf), /* MEM_TRANS_RETIRED.* */ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_LOADS */ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_STORES */ @@ -1101,6 +1108,13 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit) void *at; u64 pebs_status; + /* + * fmt0 does not have a status bitfield (does not use + * perf_record_nhm format) + */ + if (x86_pmu.intel_cap.pebs_format < 1) + return base; + if (base == NULL) return NULL; @@ -1186,7 +1200,7 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs) if (!event->attr.precise_ip) return; - n = (top - at) / x86_pmu.pebs_record_size; + n = top - at; if (n <= 0) return; @@ -1230,12 +1244,21 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs) pebs_status = p->status & cpuc->pebs_enabled; pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1; + /* + * On some CPUs the PEBS status can be zero when PEBS is + * racing with clearing of GLOBAL_STATUS. + * + * Normally we would drop that record, but in the + * case when there is only a single active PEBS event + * we can assume it's for that event. + */ + if (!pebs_status && cpuc->pebs_enabled && + !(cpuc->pebs_enabled & (cpuc->pebs_enabled-1))) + pebs_status = cpuc->pebs_enabled; + bit = find_first_bit((unsigned long *)&pebs_status, x86_pmu.max_pebs_events); - if (WARN(bit >= x86_pmu.max_pebs_events, - "PEBS record without PEBS event! status=%Lx pebs_enabled=%Lx active_mask=%Lx", - (unsigned long long)p->status, (unsigned long long)cpuc->pebs_enabled, - *(unsigned long long *)cpuc->active_mask)) + if (bit >= x86_pmu.max_pebs_events) continue; /* diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c index 659f01e165d5..653f88d25987 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c @@ -42,6 +42,13 @@ static enum { #define LBR_FAR_BIT 8 /* do not capture far branches */ #define LBR_CALL_STACK_BIT 9 /* enable call stack */ +/* + * Following bit only exists in Linux; we mask it out before writing it to + * the actual MSR. But it helps the constraint perf code to understand + * that this is a separate configuration. + */ +#define LBR_NO_INFO_BIT 63 /* don't read LBR_INFO. */ + #define LBR_KERNEL (1 << LBR_KERNEL_BIT) #define LBR_USER (1 << LBR_USER_BIT) #define LBR_JCC (1 << LBR_JCC_BIT) @@ -52,6 +59,7 @@ static enum { #define LBR_IND_JMP (1 << LBR_IND_JMP_BIT) #define LBR_FAR (1 << LBR_FAR_BIT) #define LBR_CALL_STACK (1 << LBR_CALL_STACK_BIT) +#define LBR_NO_INFO (1ULL << LBR_NO_INFO_BIT) #define LBR_PLM (LBR_KERNEL | LBR_USER) @@ -152,8 +160,8 @@ static void __intel_pmu_lbr_enable(bool pmi) * did not change. */ if (cpuc->lbr_sel) - lbr_select = cpuc->lbr_sel->config; - if (!pmi) + lbr_select = cpuc->lbr_sel->config & x86_pmu.lbr_sel_mask; + if (!pmi && cpuc->lbr_sel) wrmsrl(MSR_LBR_SELECT, lbr_select); rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); @@ -422,6 +430,7 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc) */ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) { + bool need_info = false; unsigned long mask = x86_pmu.lbr_nr - 1; int lbr_format = x86_pmu.intel_cap.lbr_format; u64 tos = intel_pmu_lbr_tos(); @@ -429,8 +438,11 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) int out = 0; int num = x86_pmu.lbr_nr; - if (cpuc->lbr_sel->config & LBR_CALL_STACK) - num = tos; + if (cpuc->lbr_sel) { + need_info = !(cpuc->lbr_sel->config & LBR_NO_INFO); + if (cpuc->lbr_sel->config & LBR_CALL_STACK) + num = tos; + } for (i = 0; i < num; i++) { unsigned long lbr_idx = (tos - i) & mask; @@ -442,7 +454,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) rdmsrl(x86_pmu.lbr_from + lbr_idx, from); rdmsrl(x86_pmu.lbr_to + lbr_idx, to); - if (lbr_format == LBR_FORMAT_INFO) { + if (lbr_format == LBR_FORMAT_INFO && need_info) { u64 info; rdmsrl(MSR_LBR_INFO_0 + lbr_idx, info); @@ -590,6 +602,7 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event) if (v != LBR_IGN) mask |= v; } + reg = &event->hw.branch_reg; reg->idx = EXTRA_REG_LBR; @@ -600,6 +613,11 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event) */ reg->config = mask ^ x86_pmu.lbr_sel_mask; + if ((br_type & PERF_SAMPLE_BRANCH_NO_CYCLES) && + (br_type & PERF_SAMPLE_BRANCH_NO_FLAGS) && + (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)) + reg->config |= LBR_NO_INFO; + return 0; } @@ -1028,3 +1046,17 @@ void __init intel_pmu_lbr_init_atom(void) */ pr_cont("8-deep LBR, "); } + +/* Knights Landing */ +void intel_pmu_lbr_init_knl(void) +{ + x86_pmu.lbr_nr = 8; + x86_pmu.lbr_tos = MSR_LBR_TOS; + x86_pmu.lbr_from = MSR_LBR_NHM_FROM; + x86_pmu.lbr_to = MSR_LBR_NHM_TO; + + x86_pmu.lbr_sel_mask = LBR_SEL_MASK; + x86_pmu.lbr_sel_map = snb_lbr_sel_map; + + pr_cont("8-deep LBR, "); +} diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c index 868e1194337f..c0bbd1033b7c 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_pt.c +++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c @@ -27,6 +27,7 @@ #include <asm/perf_event.h> #include <asm/insn.h> #include <asm/io.h> +#include <asm/intel_pt.h> #include "perf_event.h" #include "intel_pt.h" @@ -1122,6 +1123,14 @@ static int pt_event_init(struct perf_event *event) return 0; } +void cpu_emergency_stop_pt(void) +{ + struct pt *pt = this_cpu_ptr(&pt_ctx); + + if (pt->handle.event) + pt_event_stop(pt->handle.event, PERF_EF_UPDATE); +} + static __init int pt_init(void) { int ret, cpu, prior_warn = 0; diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c index ed446bdcbf31..24a351ad628d 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c +++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c @@ -63,7 +63,7 @@ #define INTEL_RAPL_PP1 0x4 /* pseudo-encoding */ #define NR_RAPL_DOMAINS 0x4 -static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = { +static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = { "pp0-core", "package", "dram", @@ -109,11 +109,11 @@ static struct kobj_attribute format_attr_##_var = \ #define RAPL_CNTR_WIDTH 32 /* 32-bit rapl counters */ -#define RAPL_EVENT_ATTR_STR(_name, v, str) \ -static struct perf_pmu_events_attr event_attr_##v = { \ - .attr = __ATTR(_name, 0444, rapl_sysfs_show, NULL), \ - .id = 0, \ - .event_str = str, \ +#define RAPL_EVENT_ATTR_STR(_name, v, str) \ +static struct perf_pmu_events_attr event_attr_##v = { \ + .attr = __ATTR(_name, 0444, perf_event_sysfs_show, NULL), \ + .id = 0, \ + .event_str = str, \ }; struct rapl_pmu { @@ -405,19 +405,6 @@ static struct attribute_group rapl_pmu_attr_group = { .attrs = rapl_pmu_attrs, }; -static ssize_t rapl_sysfs_show(struct device *dev, - struct device_attribute *attr, - char *page) -{ - struct perf_pmu_events_attr *pmu_attr = \ - container_of(attr, struct perf_pmu_events_attr, attr); - - if (pmu_attr->event_str) - return sprintf(page, "%s", pmu_attr->event_str); - - return 0; -} - RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01"); RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02"); RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03"); diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c index 61215a69b03d..f97f8075bf04 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -884,6 +884,15 @@ static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id * each box has a different function id. */ pmu = &type->pmus[UNCORE_PCI_DEV_IDX(id->driver_data)]; + /* Knights Landing uses a common PCI device ID for multiple instances of + * an uncore PMU device type. There is only one entry per device type in + * the knl_uncore_pci_ids table inspite of multiple devices present for + * some device types. Hence PCI device idx would be 0 for all devices. + * So increment pmu pointer to point to an unused array element. + */ + if (boot_cpu_data.x86_model == 87) + while (pmu->func_id >= 0) + pmu++; if (pmu->func_id < 0) pmu->func_id = pdev->devfn; else @@ -966,6 +975,7 @@ static int __init uncore_pci_init(void) case 63: /* Haswell-EP */ ret = hswep_uncore_pci_init(); break; + case 79: /* BDX-EP */ case 86: /* BDX-DE */ ret = bdx_uncore_pci_init(); break; @@ -982,6 +992,9 @@ static int __init uncore_pci_init(void) case 61: /* Broadwell */ ret = bdw_uncore_pci_init(); break; + case 87: /* Knights Landing */ + ret = knl_uncore_pci_init(); + break; default: return 0; } @@ -1287,9 +1300,13 @@ static int __init uncore_cpu_init(void) case 63: /* Haswell-EP */ hswep_uncore_cpu_init(); break; + case 79: /* BDX-EP */ case 86: /* BDX-DE */ bdx_uncore_cpu_init(); break; + case 87: /* Knights Landing */ + knl_uncore_cpu_init(); + break; default: return 0; } diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h index 2f0a4a98e16b..07aa2d6bd710 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h @@ -338,6 +338,7 @@ int hsw_uncore_pci_init(void); int bdw_uncore_pci_init(void); void snb_uncore_cpu_init(void); void nhm_uncore_cpu_init(void); +int snb_pci2phy_map_init(int devid); /* perf_event_intel_uncore_snbep.c */ int snbep_uncore_pci_init(void); @@ -348,6 +349,8 @@ int hswep_uncore_pci_init(void); void hswep_uncore_cpu_init(void); int bdx_uncore_pci_init(void); void bdx_uncore_cpu_init(void); +int knl_uncore_pci_init(void); +void knl_uncore_cpu_init(void); /* perf_event_intel_uncore_nhmex.c */ void nhmex_uncore_cpu_init(void); diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c index 845256158a10..0b934820fafd 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c @@ -417,7 +417,7 @@ static void snb_uncore_imc_event_del(struct perf_event *event, int flags) } } -static int snb_pci2phy_map_init(int devid) +int snb_pci2phy_map_init(int devid) { struct pci_dev *dev = NULL; struct pci2phy_map *map; diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c b/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c index f0f4fcba252e..33acb884ccf1 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c @@ -209,31 +209,98 @@ #define HSWEP_PCU_MSR_PMON_BOX_CTL 0x710 #define HSWEP_PCU_MSR_PMON_BOX_FILTER 0x715 +/* KNL Ubox */ +#define KNL_U_MSR_PMON_RAW_EVENT_MASK \ + (SNBEP_U_MSR_PMON_RAW_EVENT_MASK | \ + SNBEP_CBO_PMON_CTL_TID_EN) +/* KNL CHA */ +#define KNL_CHA_MSR_OFFSET 0xc +#define KNL_CHA_MSR_PMON_CTL_QOR (1 << 16) +#define KNL_CHA_MSR_PMON_RAW_EVENT_MASK \ + (SNBEP_CBO_MSR_PMON_RAW_EVENT_MASK | \ + KNL_CHA_MSR_PMON_CTL_QOR) +#define KNL_CHA_MSR_PMON_BOX_FILTER_TID 0x1ff +#define KNL_CHA_MSR_PMON_BOX_FILTER_STATE (7 << 18) +#define KNL_CHA_MSR_PMON_BOX_FILTER_OP (0xfffffe2aULL << 32) + +/* KNL EDC/MC UCLK */ +#define KNL_UCLK_MSR_PMON_CTR0_LOW 0x400 +#define KNL_UCLK_MSR_PMON_CTL0 0x420 +#define KNL_UCLK_MSR_PMON_BOX_CTL 0x430 +#define KNL_UCLK_MSR_PMON_UCLK_FIXED_LOW 0x44c +#define KNL_UCLK_MSR_PMON_UCLK_FIXED_CTL 0x454 +#define KNL_PMON_FIXED_CTL_EN 0x1 + +/* KNL EDC */ +#define KNL_EDC0_ECLK_MSR_PMON_CTR0_LOW 0xa00 +#define KNL_EDC0_ECLK_MSR_PMON_CTL0 0xa20 +#define KNL_EDC0_ECLK_MSR_PMON_BOX_CTL 0xa30 +#define KNL_EDC0_ECLK_MSR_PMON_ECLK_FIXED_LOW 0xa3c +#define KNL_EDC0_ECLK_MSR_PMON_ECLK_FIXED_CTL 0xa44 + +/* KNL MC */ +#define KNL_MC0_CH0_MSR_PMON_CTR0_LOW 0xb00 +#define KNL_MC0_CH0_MSR_PMON_CTL0 0xb20 +#define KNL_MC0_CH0_MSR_PMON_BOX_CTL 0xb30 +#define KNL_MC0_CH0_MSR_PMON_FIXED_LOW 0xb3c +#define KNL_MC0_CH0_MSR_PMON_FIXED_CTL 0xb44 + +/* KNL IRP */ +#define KNL_IRP_PCI_PMON_BOX_CTL 0xf0 +#define KNL_IRP_PCI_PMON_RAW_EVENT_MASK (SNBEP_PMON_RAW_EVENT_MASK | \ + KNL_CHA_MSR_PMON_CTL_QOR) +/* KNL PCU */ +#define KNL_PCU_PMON_CTL_EV_SEL_MASK 0x0000007f +#define KNL_PCU_PMON_CTL_USE_OCC_CTR (1 << 7) +#define KNL_PCU_MSR_PMON_CTL_TRESH_MASK 0x3f000000 +#define KNL_PCU_MSR_PMON_RAW_EVENT_MASK \ + (KNL_PCU_PMON_CTL_EV_SEL_MASK | \ + KNL_PCU_PMON_CTL_USE_OCC_CTR | \ + SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \ + SNBEP_PMON_CTL_EDGE_DET | \ + SNBEP_CBO_PMON_CTL_TID_EN | \ + SNBEP_PMON_CTL_EV_SEL_EXT | \ + SNBEP_PMON_CTL_INVERT | \ + KNL_PCU_MSR_PMON_CTL_TRESH_MASK | \ + SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \ + SNBEP_PCU_MSR_PMON_CTL_OCC_EDGE_DET) DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7"); +DEFINE_UNCORE_FORMAT_ATTR(event2, event, "config:0-6"); DEFINE_UNCORE_FORMAT_ATTR(event_ext, event, "config:0-7,21"); +DEFINE_UNCORE_FORMAT_ATTR(use_occ_ctr, use_occ_ctr, "config:7"); DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15"); +DEFINE_UNCORE_FORMAT_ATTR(qor, qor, "config:16"); DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18"); DEFINE_UNCORE_FORMAT_ATTR(tid_en, tid_en, "config:19"); DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23"); DEFINE_UNCORE_FORMAT_ATTR(thresh8, thresh, "config:24-31"); +DEFINE_UNCORE_FORMAT_ATTR(thresh6, thresh, "config:24-29"); DEFINE_UNCORE_FORMAT_ATTR(thresh5, thresh, "config:24-28"); DEFINE_UNCORE_FORMAT_ATTR(occ_sel, occ_sel, "config:14-15"); DEFINE_UNCORE_FORMAT_ATTR(occ_invert, occ_invert, "config:30"); DEFINE_UNCORE_FORMAT_ATTR(occ_edge, occ_edge, "config:14-51"); +DEFINE_UNCORE_FORMAT_ATTR(occ_edge_det, occ_edge_det, "config:31"); DEFINE_UNCORE_FORMAT_ATTR(filter_tid, filter_tid, "config1:0-4"); DEFINE_UNCORE_FORMAT_ATTR(filter_tid2, filter_tid, "config1:0"); DEFINE_UNCORE_FORMAT_ATTR(filter_tid3, filter_tid, "config1:0-5"); +DEFINE_UNCORE_FORMAT_ATTR(filter_tid4, filter_tid, "config1:0-8"); DEFINE_UNCORE_FORMAT_ATTR(filter_cid, filter_cid, "config1:5"); DEFINE_UNCORE_FORMAT_ATTR(filter_link, filter_link, "config1:5-8"); DEFINE_UNCORE_FORMAT_ATTR(filter_link2, filter_link, "config1:6-8"); +DEFINE_UNCORE_FORMAT_ATTR(filter_link3, filter_link, "config1:12"); DEFINE_UNCORE_FORMAT_ATTR(filter_nid, filter_nid, "config1:10-17"); DEFINE_UNCORE_FORMAT_ATTR(filter_nid2, filter_nid, "config1:32-47"); DEFINE_UNCORE_FORMAT_ATTR(filter_state, filter_state, "config1:18-22"); DEFINE_UNCORE_FORMAT_ATTR(filter_state2, filter_state, "config1:17-22"); DEFINE_UNCORE_FORMAT_ATTR(filter_state3, filter_state, "config1:17-23"); +DEFINE_UNCORE_FORMAT_ATTR(filter_state4, filter_state, "config1:18-20"); +DEFINE_UNCORE_FORMAT_ATTR(filter_local, filter_local, "config1:33"); +DEFINE_UNCORE_FORMAT_ATTR(filter_all_op, filter_all_op, "config1:35"); +DEFINE_UNCORE_FORMAT_ATTR(filter_nnm, filter_nnm, "config1:37"); DEFINE_UNCORE_FORMAT_ATTR(filter_opc, filter_opc, "config1:23-31"); DEFINE_UNCORE_FORMAT_ATTR(filter_opc2, filter_opc, "config1:52-60"); +DEFINE_UNCORE_FORMAT_ATTR(filter_opc3, filter_opc, "config1:41-60"); DEFINE_UNCORE_FORMAT_ATTR(filter_nc, filter_nc, "config1:62"); DEFINE_UNCORE_FORMAT_ATTR(filter_c6, filter_c6, "config1:61"); DEFINE_UNCORE_FORMAT_ATTR(filter_isoc, filter_isoc, "config1:63"); @@ -315,8 +382,9 @@ static u64 snbep_uncore_pci_read_counter(struct intel_uncore_box *box, struct pe static void snbep_uncore_pci_init_box(struct intel_uncore_box *box) { struct pci_dev *pdev = box->pci_dev; + int box_ctl = uncore_pci_box_ctl(box); - pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, SNBEP_PMON_BOX_CTL_INT); + pci_write_config_dword(pdev, box_ctl, SNBEP_PMON_BOX_CTL_INT); } static void snbep_uncore_msr_disable_box(struct intel_uncore_box *box) @@ -1728,6 +1796,419 @@ int ivbep_uncore_pci_init(void) } /* end of IvyTown uncore support */ +/* KNL uncore support */ +static struct attribute *knl_uncore_ubox_formats_attr[] = { + &format_attr_event.attr, + &format_attr_umask.attr, + &format_attr_edge.attr, + &format_attr_tid_en.attr, + &format_attr_inv.attr, + &format_attr_thresh5.attr, + NULL, +}; + +static struct attribute_group knl_uncore_ubox_format_group = { + .name = "format", + .attrs = knl_uncore_ubox_formats_attr, +}; + +static struct intel_uncore_type knl_uncore_ubox = { + .name = "ubox", + .num_counters = 2, + .num_boxes = 1, + .perf_ctr_bits = 48, + .fixed_ctr_bits = 48, + .perf_ctr = HSWEP_U_MSR_PMON_CTR0, + .event_ctl = HSWEP_U_MSR_PMON_CTL0, + .event_mask = KNL_U_MSR_PMON_RAW_EVENT_MASK, + .fixed_ctr = HSWEP_U_MSR_PMON_UCLK_FIXED_CTR, + .fixed_ctl = HSWEP_U_MSR_PMON_UCLK_FIXED_CTL, + .ops = &snbep_uncore_msr_ops, + .format_group = &knl_uncore_ubox_format_group, +}; + +static struct attribute *knl_uncore_cha_formats_attr[] = { + &format_attr_event.attr, + &format_attr_umask.attr, + &format_attr_qor.attr, + &format_attr_edge.attr, + &format_attr_tid_en.attr, + &format_attr_inv.attr, + &format_attr_thresh8.attr, + &format_attr_filter_tid4.attr, + &format_attr_filter_link3.attr, + &format_attr_filter_state4.attr, + &format_attr_filter_local.attr, + &format_attr_filter_all_op.attr, + &format_attr_filter_nnm.attr, + &format_attr_filter_opc3.attr, + &format_attr_filter_nc.attr, + &format_attr_filter_isoc.attr, + NULL, +}; + +static struct attribute_group knl_uncore_cha_format_group = { + .name = "format", + .attrs = knl_uncore_cha_formats_attr, +}; + +static struct event_constraint knl_uncore_cha_constraints[] = { + UNCORE_EVENT_CONSTRAINT(0x11, 0x1), + UNCORE_EVENT_CONSTRAINT(0x1f, 0x1), + UNCORE_EVENT_CONSTRAINT(0x36, 0x1), + EVENT_CONSTRAINT_END +}; + +static struct extra_reg knl_uncore_cha_extra_regs[] = { + SNBEP_CBO_EVENT_EXTRA_REG(SNBEP_CBO_PMON_CTL_TID_EN, + SNBEP_CBO_PMON_CTL_TID_EN, 0x1), + SNBEP_CBO_EVENT_EXTRA_REG(0x3d, 0xff, 0x2), + SNBEP_CBO_EVENT_EXTRA_REG(0x35, 0xff, 0x4), + SNBEP_CBO_EVENT_EXTRA_REG(0x36, 0xff, 0x4), + EVENT_EXTRA_END +}; + +static u64 knl_cha_filter_mask(int fields) +{ + u64 mask = 0; + + if (fields & 0x1) + mask |= KNL_CHA_MSR_PMON_BOX_FILTER_TID; + if (fields & 0x2) + mask |= KNL_CHA_MSR_PMON_BOX_FILTER_STATE; + if (fields & 0x4) + mask |= KNL_CHA_MSR_PMON_BOX_FILTER_OP; + return mask; +} + +static struct event_constraint * +knl_cha_get_constraint(struct intel_uncore_box *box, struct perf_event *event) +{ + return __snbep_cbox_get_constraint(box, event, knl_cha_filter_mask); +} + +static int knl_cha_hw_config(struct intel_uncore_box *box, + struct perf_event *event) +{ + struct hw_perf_event_extra *reg1 = &event->hw.extra_reg; + struct extra_reg *er; + int idx = 0; + + for (er = knl_uncore_cha_extra_regs; er->msr; er++) { + if (er->event != (event->hw.config & er->config_mask)) + continue; + idx |= er->idx; + } + + if (idx) { + reg1->reg = HSWEP_C0_MSR_PMON_BOX_FILTER0 + + KNL_CHA_MSR_OFFSET * box->pmu->pmu_idx; + reg1->config = event->attr.config1 & knl_cha_filter_mask(idx); + reg1->idx = idx; + } + return 0; +} + +static void hswep_cbox_enable_event(struct intel_uncore_box *box, + struct perf_event *event); + +static struct intel_uncore_ops knl_uncore_cha_ops = { + .init_box = snbep_uncore_msr_init_box, + .disable_box = snbep_uncore_msr_disable_box, + .enable_box = snbep_uncore_msr_enable_box, + .disable_event = snbep_uncore_msr_disable_event, + .enable_event = hswep_cbox_enable_event, + .read_counter = uncore_msr_read_counter, + .hw_config = knl_cha_hw_config, + .get_constraint = knl_cha_get_constraint, + .put_constraint = snbep_cbox_put_constraint, +}; + +static struct intel_uncore_type knl_uncore_cha = { + .name = "cha", + .num_counters = 4, + .num_boxes = 38, + .perf_ctr_bits = 48, + .event_ctl = HSWEP_C0_MSR_PMON_CTL0, + .perf_ctr = HSWEP_C0_MSR_PMON_CTR0, + .event_mask = KNL_CHA_MSR_PMON_RAW_EVENT_MASK, + .box_ctl = HSWEP_C0_MSR_PMON_BOX_CTL, + .msr_offset = KNL_CHA_MSR_OFFSET, + .num_shared_regs = 1, + .constraints = knl_uncore_cha_constraints, + .ops = &knl_uncore_cha_ops, + .format_group = &knl_uncore_cha_format_group, +}; + +static struct attribute *knl_uncore_pcu_formats_attr[] = { + &format_attr_event2.attr, + &format_attr_use_occ_ctr.attr, + &format_attr_occ_sel.attr, + &format_attr_edge.attr, + &format_attr_tid_en.attr, + &format_attr_inv.attr, + &format_attr_thresh6.attr, + &format_attr_occ_invert.attr, + &format_attr_occ_edge_det.attr, + NULL, +}; + +static struct attribute_group knl_uncore_pcu_format_group = { + .name = "format", + .attrs = knl_uncore_pcu_formats_attr, +}; + +static struct intel_uncore_type knl_uncore_pcu = { + .name = "pcu", + .num_counters = 4, + .num_boxes = 1, + .perf_ctr_bits = 48, + .perf_ctr = HSWEP_PCU_MSR_PMON_CTR0, + .event_ctl = HSWEP_PCU_MSR_PMON_CTL0, + .event_mask = KNL_PCU_MSR_PMON_RAW_EVENT_MASK, + .box_ctl = HSWEP_PCU_MSR_PMON_BOX_CTL, + .ops = &snbep_uncore_msr_ops, + .format_group = &knl_uncore_pcu_format_group, +}; + +static struct intel_uncore_type *knl_msr_uncores[] = { + &knl_uncore_ubox, + &knl_uncore_cha, + &knl_uncore_pcu, + NULL, +}; + +void knl_uncore_cpu_init(void) +{ + uncore_msr_uncores = knl_msr_uncores; +} + +static void knl_uncore_imc_enable_box(struct intel_uncore_box *box) +{ + struct pci_dev *pdev = box->pci_dev; + int box_ctl = uncore_pci_box_ctl(box); + + pci_write_config_dword(pdev, box_ctl, 0); +} + +static void knl_uncore_imc_enable_event(struct intel_uncore_box *box, + struct perf_event *event) +{ + struct pci_dev *pdev = box->pci_dev; + struct hw_perf_event *hwc = &event->hw; + + if ((event->attr.config & SNBEP_PMON_CTL_EV_SEL_MASK) + == UNCORE_FIXED_EVENT) + pci_write_config_dword(pdev, hwc->config_base, + hwc->config | KNL_PMON_FIXED_CTL_EN); + else + pci_write_config_dword(pdev, hwc->config_base, + hwc->config | SNBEP_PMON_CTL_EN); +} + +static struct intel_uncore_ops knl_uncore_imc_ops = { + .init_box = snbep_uncore_pci_init_box, + .disable_box = snbep_uncore_pci_disable_box, + .enable_box = knl_uncore_imc_enable_box, + .read_counter = snbep_uncore_pci_read_counter, + .enable_event = knl_uncore_imc_enable_event, + .disable_event = snbep_uncore_pci_disable_event, +}; + +static struct intel_uncore_type knl_uncore_imc_uclk = { + .name = "imc_uclk", + .num_counters = 4, + .num_boxes = 2, + .perf_ctr_bits = 48, + .fixed_ctr_bits = 48, + .perf_ctr = KNL_UCLK_MSR_PMON_CTR0_LOW, + .event_ctl = KNL_UCLK_MSR_PMON_CTL0, + .event_mask = SNBEP_PMON_RAW_EVENT_MASK, + .fixed_ctr = KNL_UCLK_MSR_PMON_UCLK_FIXED_LOW, + .fixed_ctl = KNL_UCLK_MSR_PMON_UCLK_FIXED_CTL, + .box_ctl = KNL_UCLK_MSR_PMON_BOX_CTL, + .ops = &knl_uncore_imc_ops, + .format_group = &snbep_uncore_format_group, +}; + +static struct intel_uncore_type knl_uncore_imc_dclk = { + .name = "imc", + .num_counters = 4, + .num_boxes = 6, + .perf_ctr_bits = 48, + .fixed_ctr_bits = 48, + .perf_ctr = KNL_MC0_CH0_MSR_PMON_CTR0_LOW, + .event_ctl = KNL_MC0_CH0_MSR_PMON_CTL0, + .event_mask = SNBEP_PMON_RAW_EVENT_MASK, + .fixed_ctr = KNL_MC0_CH0_MSR_PMON_FIXED_LOW, + .fixed_ctl = KNL_MC0_CH0_MSR_PMON_FIXED_CTL, + .box_ctl = KNL_MC0_CH0_MSR_PMON_BOX_CTL, + .ops = &knl_uncore_imc_ops, + .format_group = &snbep_uncore_format_group, +}; + +static struct intel_uncore_type knl_uncore_edc_uclk = { + .name = "edc_uclk", + .num_counters = 4, + .num_boxes = 8, + .perf_ctr_bits = 48, + .fixed_ctr_bits = 48, + .perf_ctr = KNL_UCLK_MSR_PMON_CTR0_LOW, + .event_ctl = KNL_UCLK_MSR_PMON_CTL0, + .event_mask = SNBEP_PMON_RAW_EVENT_MASK, + .fixed_ctr = KNL_UCLK_MSR_PMON_UCLK_FIXED_LOW, + .fixed_ctl = KNL_UCLK_MSR_PMON_UCLK_FIXED_CTL, + .box_ctl = KNL_UCLK_MSR_PMON_BOX_CTL, + .ops = &knl_uncore_imc_ops, + .format_group = &snbep_uncore_format_group, +}; + +static struct intel_uncore_type knl_uncore_edc_eclk = { + .name = "edc_eclk", + .num_counters = 4, + .num_boxes = 8, + .perf_ctr_bits = 48, + .fixed_ctr_bits = 48, + .perf_ctr = KNL_EDC0_ECLK_MSR_PMON_CTR0_LOW, + .event_ctl = KNL_EDC0_ECLK_MSR_PMON_CTL0, + .event_mask = SNBEP_PMON_RAW_EVENT_MASK, + .fixed_ctr = KNL_EDC0_ECLK_MSR_PMON_ECLK_FIXED_LOW, + .fixed_ctl = KNL_EDC0_ECLK_MSR_PMON_ECLK_FIXED_CTL, + .box_ctl = KNL_EDC0_ECLK_MSR_PMON_BOX_CTL, + .ops = &knl_uncore_imc_ops, + .format_group = &snbep_uncore_format_group, +}; + +static struct event_constraint knl_uncore_m2pcie_constraints[] = { + UNCORE_EVENT_CONSTRAINT(0x23, 0x3), + EVENT_CONSTRAINT_END +}; + +static struct intel_uncore_type knl_uncore_m2pcie = { + .name = "m2pcie", + .num_counters = 4, + .num_boxes = 1, + .perf_ctr_bits = 48, + .constraints = knl_uncore_m2pcie_constraints, + SNBEP_UNCORE_PCI_COMMON_INIT(), +}; + +static struct attribute *knl_uncore_irp_formats_attr[] = { + &format_attr_event.attr, + &format_attr_umask.attr, + &format_attr_qor.attr, + &format_attr_edge.attr, + &format_attr_inv.attr, + &format_attr_thresh8.attr, + NULL, +}; + +static struct attribute_group knl_uncore_irp_format_group = { + .name = "format", + .attrs = knl_uncore_irp_formats_attr, +}; + +static struct intel_uncore_type knl_uncore_irp = { + .name = "irp", + .num_counters = 2, + .num_boxes = 1, + .perf_ctr_bits = 48, + .perf_ctr = SNBEP_PCI_PMON_CTR0, + .event_ctl = SNBEP_PCI_PMON_CTL0, + .event_mask = KNL_IRP_PCI_PMON_RAW_EVENT_MASK, + .box_ctl = KNL_IRP_PCI_PMON_BOX_CTL, + .ops = &snbep_uncore_pci_ops, + .format_group = &knl_uncore_irp_format_group, +}; + +enum { + KNL_PCI_UNCORE_MC_UCLK, + KNL_PCI_UNCORE_MC_DCLK, + KNL_PCI_UNCORE_EDC_UCLK, + KNL_PCI_UNCORE_EDC_ECLK, + KNL_PCI_UNCORE_M2PCIE, + KNL_PCI_UNCORE_IRP, +}; + +static struct intel_uncore_type *knl_pci_uncores[] = { + [KNL_PCI_UNCORE_MC_UCLK] = &knl_uncore_imc_uclk, + [KNL_PCI_UNCORE_MC_DCLK] = &knl_uncore_imc_dclk, + [KNL_PCI_UNCORE_EDC_UCLK] = &knl_uncore_edc_uclk, + [KNL_PCI_UNCORE_EDC_ECLK] = &knl_uncore_edc_eclk, + [KNL_PCI_UNCORE_M2PCIE] = &knl_uncore_m2pcie, + [KNL_PCI_UNCORE_IRP] = &knl_uncore_irp, + NULL, +}; + +/* + * KNL uses a common PCI device ID for multiple instances of an Uncore PMU + * device type. prior to KNL, each instance of a PMU device type had a unique + * device ID. + * + * PCI Device ID Uncore PMU Devices + * ---------------------------------- + * 0x7841 MC0 UClk, MC1 UClk + * 0x7843 MC0 DClk CH 0, MC0 DClk CH 1, MC0 DClk CH 2, + * MC1 DClk CH 0, MC1 DClk CH 1, MC1 DClk CH 2 + * 0x7833 EDC0 UClk, EDC1 UClk, EDC2 UClk, EDC3 UClk, + * EDC4 UClk, EDC5 UClk, EDC6 UClk, EDC7 UClk + * 0x7835 EDC0 EClk, EDC1 EClk, EDC2 EClk, EDC3 EClk, + * EDC4 EClk, EDC5 EClk, EDC6 EClk, EDC7 EClk + * 0x7817 M2PCIe + * 0x7814 IRP +*/ + +static const struct pci_device_id knl_uncore_pci_ids[] = { + { /* MC UClk */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7841), + .driver_data = UNCORE_PCI_DEV_DATA(KNL_PCI_UNCORE_MC_UCLK, 0), + }, + { /* MC DClk Channel */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7843), + .driver_data = UNCORE_PCI_DEV_DATA(KNL_PCI_UNCORE_MC_DCLK, 0), + }, + { /* EDC UClk */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7833), + .driver_data = UNCORE_PCI_DEV_DATA(KNL_PCI_UNCORE_EDC_UCLK, 0), + }, + { /* EDC EClk */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7835), + .driver_data = UNCORE_PCI_DEV_DATA(KNL_PCI_UNCORE_EDC_ECLK, 0), + }, + { /* M2PCIe */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7817), + .driver_data = UNCORE_PCI_DEV_DATA(KNL_PCI_UNCORE_M2PCIE, 0), + }, + { /* IRP */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7814), + .driver_data = UNCORE_PCI_DEV_DATA(KNL_PCI_UNCORE_IRP, 0), + }, + { /* end: all zeroes */ } +}; + +static struct pci_driver knl_uncore_pci_driver = { + .name = "knl_uncore", + .id_table = knl_uncore_pci_ids, +}; + +int knl_uncore_pci_init(void) +{ + int ret; + + /* All KNL PCI based PMON units are on the same PCI bus except IRP */ + ret = snb_pci2phy_map_init(0x7814); /* IRP */ + if (ret) + return ret; + ret = snb_pci2phy_map_init(0x7817); /* M2PCIe */ + if (ret) + return ret; + uncore_pci_uncores = knl_pci_uncores; + uncore_pci_driver = &knl_uncore_pci_driver; + return 0; +} + +/* end of KNL uncore support */ + /* Haswell-EP uncore support */ static struct attribute *hswep_uncore_ubox_formats_attr[] = { &format_attr_event.attr, @@ -2338,7 +2819,7 @@ int hswep_uncore_pci_init(void) } /* end of Haswell-EP uncore support */ -/* BDX-DE uncore support */ +/* BDX uncore support */ static struct intel_uncore_type bdx_uncore_ubox = { .name = "ubox", @@ -2360,13 +2841,14 @@ static struct event_constraint bdx_uncore_cbox_constraints[] = { UNCORE_EVENT_CONSTRAINT(0x09, 0x3), UNCORE_EVENT_CONSTRAINT(0x11, 0x1), UNCORE_EVENT_CONSTRAINT(0x36, 0x1), + UNCORE_EVENT_CONSTRAINT(0x3e, 0x1), EVENT_CONSTRAINT_END }; static struct intel_uncore_type bdx_uncore_cbox = { .name = "cbox", .num_counters = 4, - .num_boxes = 8, + .num_boxes = 24, .perf_ctr_bits = 48, .event_ctl = HSWEP_C0_MSR_PMON_CTL0, .perf_ctr = HSWEP_C0_MSR_PMON_CTR0, @@ -2379,9 +2861,24 @@ static struct intel_uncore_type bdx_uncore_cbox = { .format_group = &hswep_uncore_cbox_format_group, }; +static struct intel_uncore_type bdx_uncore_sbox = { + .name = "sbox", + .num_counters = 4, + .num_boxes = 4, + .perf_ctr_bits = 48, + .event_ctl = HSWEP_S0_MSR_PMON_CTL0, + .perf_ctr = HSWEP_S0_MSR_PMON_CTR0, + .event_mask = HSWEP_S_MSR_PMON_RAW_EVENT_MASK, + .box_ctl = HSWEP_S0_MSR_PMON_BOX_CTL, + .msr_offset = HSWEP_SBOX_MSR_OFFSET, + .ops = &hswep_uncore_sbox_msr_ops, + .format_group = &hswep_uncore_sbox_format_group, +}; + static struct intel_uncore_type *bdx_msr_uncores[] = { &bdx_uncore_ubox, &bdx_uncore_cbox, + &bdx_uncore_sbox, &hswep_uncore_pcu, NULL, }; @@ -2396,7 +2893,7 @@ void bdx_uncore_cpu_init(void) static struct intel_uncore_type bdx_uncore_ha = { .name = "ha", .num_counters = 4, - .num_boxes = 1, + .num_boxes = 2, .perf_ctr_bits = 48, SNBEP_UNCORE_PCI_COMMON_INIT(), }; @@ -2404,7 +2901,7 @@ static struct intel_uncore_type bdx_uncore_ha = { static struct intel_uncore_type bdx_uncore_imc = { .name = "imc", .num_counters = 5, - .num_boxes = 2, + .num_boxes = 8, .perf_ctr_bits = 48, .fixed_ctr_bits = 48, .fixed_ctr = SNBEP_MC_CHy_PCI_PMON_FIXED_CTR, @@ -2424,6 +2921,19 @@ static struct intel_uncore_type bdx_uncore_irp = { .format_group = &snbep_uncore_format_group, }; +static struct intel_uncore_type bdx_uncore_qpi = { + .name = "qpi", + .num_counters = 4, + .num_boxes = 3, + .perf_ctr_bits = 48, + .perf_ctr = SNBEP_PCI_PMON_CTR0, + .event_ctl = SNBEP_PCI_PMON_CTL0, + .event_mask = SNBEP_QPI_PCI_PMON_RAW_EVENT_MASK, + .box_ctl = SNBEP_PCI_PMON_BOX_CTL, + .num_shared_regs = 1, + .ops = &snbep_uncore_qpi_ops, + .format_group = &snbep_uncore_qpi_format_group, +}; static struct event_constraint bdx_uncore_r2pcie_constraints[] = { UNCORE_EVENT_CONSTRAINT(0x10, 0x3), @@ -2432,6 +2942,8 @@ static struct event_constraint bdx_uncore_r2pcie_constraints[] = { UNCORE_EVENT_CONSTRAINT(0x23, 0x1), UNCORE_EVENT_CONSTRAINT(0x25, 0x1), UNCORE_EVENT_CONSTRAINT(0x26, 0x3), + UNCORE_EVENT_CONSTRAINT(0x28, 0x3), + UNCORE_EVENT_CONSTRAINT(0x2c, 0x3), UNCORE_EVENT_CONSTRAINT(0x2d, 0x3), EVENT_CONSTRAINT_END }; @@ -2445,18 +2957,65 @@ static struct intel_uncore_type bdx_uncore_r2pcie = { SNBEP_UNCORE_PCI_COMMON_INIT(), }; +static struct event_constraint bdx_uncore_r3qpi_constraints[] = { + UNCORE_EVENT_CONSTRAINT(0x01, 0x7), + UNCORE_EVENT_CONSTRAINT(0x07, 0x7), + UNCORE_EVENT_CONSTRAINT(0x08, 0x7), + UNCORE_EVENT_CONSTRAINT(0x09, 0x7), + UNCORE_EVENT_CONSTRAINT(0x0a, 0x7), + UNCORE_EVENT_CONSTRAINT(0x0e, 0x7), + UNCORE_EVENT_CONSTRAINT(0x10, 0x3), + UNCORE_EVENT_CONSTRAINT(0x11, 0x3), + UNCORE_EVENT_CONSTRAINT(0x13, 0x1), + UNCORE_EVENT_CONSTRAINT(0x14, 0x3), + UNCORE_EVENT_CONSTRAINT(0x15, 0x3), + UNCORE_EVENT_CONSTRAINT(0x1f, 0x3), + UNCORE_EVENT_CONSTRAINT(0x20, 0x3), + UNCORE_EVENT_CONSTRAINT(0x21, 0x3), + UNCORE_EVENT_CONSTRAINT(0x22, 0x3), + UNCORE_EVENT_CONSTRAINT(0x23, 0x3), + UNCORE_EVENT_CONSTRAINT(0x25, 0x3), + UNCORE_EVENT_CONSTRAINT(0x26, 0x3), + UNCORE_EVENT_CONSTRAINT(0x28, 0x3), + UNCORE_EVENT_CONSTRAINT(0x29, 0x3), + UNCORE_EVENT_CONSTRAINT(0x2c, 0x3), + UNCORE_EVENT_CONSTRAINT(0x2d, 0x3), + UNCORE_EVENT_CONSTRAINT(0x2e, 0x3), + UNCORE_EVENT_CONSTRAINT(0x2f, 0x3), + UNCORE_EVENT_CONSTRAINT(0x33, 0x3), + UNCORE_EVENT_CONSTRAINT(0x34, 0x3), + UNCORE_EVENT_CONSTRAINT(0x36, 0x3), + UNCORE_EVENT_CONSTRAINT(0x37, 0x3), + UNCORE_EVENT_CONSTRAINT(0x38, 0x3), + UNCORE_EVENT_CONSTRAINT(0x39, 0x3), + EVENT_CONSTRAINT_END +}; + +static struct intel_uncore_type bdx_uncore_r3qpi = { + .name = "r3qpi", + .num_counters = 3, + .num_boxes = 3, + .perf_ctr_bits = 48, + .constraints = bdx_uncore_r3qpi_constraints, + SNBEP_UNCORE_PCI_COMMON_INIT(), +}; + enum { BDX_PCI_UNCORE_HA, BDX_PCI_UNCORE_IMC, BDX_PCI_UNCORE_IRP, + BDX_PCI_UNCORE_QPI, BDX_PCI_UNCORE_R2PCIE, + BDX_PCI_UNCORE_R3QPI, }; static struct intel_uncore_type *bdx_pci_uncores[] = { [BDX_PCI_UNCORE_HA] = &bdx_uncore_ha, [BDX_PCI_UNCORE_IMC] = &bdx_uncore_imc, [BDX_PCI_UNCORE_IRP] = &bdx_uncore_irp, + [BDX_PCI_UNCORE_QPI] = &bdx_uncore_qpi, [BDX_PCI_UNCORE_R2PCIE] = &bdx_uncore_r2pcie, + [BDX_PCI_UNCORE_R3QPI] = &bdx_uncore_r3qpi, NULL, }; @@ -2465,6 +3024,10 @@ static const struct pci_device_id bdx_uncore_pci_ids[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f30), .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_HA, 0), }, + { /* Home Agent 1 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f38), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_HA, 1), + }, { /* MC0 Channel 0 */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb0), .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 0), @@ -2473,14 +3036,74 @@ static const struct pci_device_id bdx_uncore_pci_ids[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb1), .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 1), }, + { /* MC0 Channel 2 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb4), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 2), + }, + { /* MC0 Channel 3 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb5), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 3), + }, + { /* MC1 Channel 0 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fd0), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 4), + }, + { /* MC1 Channel 1 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fd1), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 5), + }, + { /* MC1 Channel 2 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fd4), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 6), + }, + { /* MC1 Channel 3 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fd5), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 7), + }, { /* IRP */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f39), .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IRP, 0), }, + { /* QPI0 Port 0 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f32), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_QPI, 0), + }, + { /* QPI0 Port 1 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f33), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_QPI, 1), + }, + { /* QPI1 Port 2 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f3a), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_QPI, 2), + }, { /* R2PCIe */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f34), .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_R2PCIE, 0), }, + { /* R3QPI0 Link 0 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f36), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_R3QPI, 0), + }, + { /* R3QPI0 Link 1 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f37), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_R3QPI, 1), + }, + { /* R3QPI1 Link 2 */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f3e), + .driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_R3QPI, 2), + }, + { /* QPI Port 0 filter */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f86), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 0), + }, + { /* QPI Port 1 filter */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f96), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 1), + }, + { /* QPI Port 2 filter */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f46), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 2), + }, { /* end: all zeroes */ } }; @@ -2500,4 +3123,4 @@ int bdx_uncore_pci_init(void) return 0; } -/* end of BDX-DE uncore support */ +/* end of BDX uncore support */ diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 2c1910f6717e..58f34319b29a 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -35,6 +35,7 @@ #include <asm/cpu.h> #include <asm/reboot.h> #include <asm/virtext.h> +#include <asm/intel_pt.h> /* Alignment required for elf header segment */ #define ELF_CORE_HEADER_ALIGN 4096 @@ -125,6 +126,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) cpu_emergency_vmxoff(); cpu_emergency_svm_disable(); + /* + * Disable Intel PT to stop its logging + */ + cpu_emergency_stop_pt(); + disable_local_APIC(); } @@ -169,6 +175,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs) cpu_emergency_vmxoff(); cpu_emergency_svm_disable(); + /* + * Disable Intel PT to stop its logging + */ + cpu_emergency_stop_pt(); + #ifdef CONFIG_X86_IO_APIC /* Prevent crash_kexec() from deadlocking on ioapic_lock. */ ioapic_zap_locks(); diff --git a/arch/x86/lib/msr.c b/arch/x86/lib/msr.c index 43623739c7cf..004c861b1648 100644 --- a/arch/x86/lib/msr.c +++ b/arch/x86/lib/msr.c @@ -1,6 +1,8 @@ #include <linux/module.h> #include <linux/preempt.h> #include <asm/msr.h> +#define CREATE_TRACE_POINTS +#include <asm/msr-trace.h> struct msr *msrs_alloc(void) { @@ -108,3 +110,27 @@ int msr_clear_bit(u32 msr, u8 bit) { return __flip_bit(msr, bit, false); } + +#ifdef CONFIG_TRACEPOINTS +void do_trace_write_msr(unsigned msr, u64 val, int failed) +{ + trace_write_msr(msr, val, failed); +} +EXPORT_SYMBOL(do_trace_write_msr); +EXPORT_TRACEPOINT_SYMBOL(write_msr); + +void do_trace_read_msr(unsigned msr, u64 val, int failed) +{ + trace_read_msr(msr, val, failed); +} +EXPORT_SYMBOL(do_trace_read_msr); +EXPORT_TRACEPOINT_SYMBOL(read_msr); + +void do_trace_rdpmc(unsigned counter, u64 val, int failed) +{ + trace_rdpmc(counter, val, failed); +} +EXPORT_SYMBOL(do_trace_rdpmc); +EXPORT_TRACEPOINT_SYMBOL(rdpmc); + +#endif |