lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2015-04-02	perf/x86/intel/pt: Fix the 32-bit build	Ingo Molnar
	On a 32-bit build I got: arch/x86/kernel/cpu/perf_event_intel_pt.c:413:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] arch/x86/kernel/cpu/perf_event_intel_bts.c:162:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] Fix it. The code should probably be (re-)tested on 32-bit systems to make sure all is fine. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: linux-kernel@vger.kernel.org Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Avoid rewriting DEBUGCTL with the same value for LBRs	Andi Kleen
	perf with LBRs on has a tendency to rewrite the DEBUGCTL MSR with the same value. Add a little optimization to skip the unnecessary write. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1426871484-21285-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Streamline LBR MSR handling in PMI	Andi Kleen
	The perf PMI currently does unnecessary MSR accesses when LBRs are enabled. We use LBR freezing, or when in callstack mode force the LBRs to only filter on ring 3. So there is no need to disable the LBRs explicitely in the PMI handler. Also we always unnecessarily rewrite LBR_SELECT in the LBR handler, even though it can never change. 5) \| /* write_msr: MSR_LBR_SELECT(1c8), value 0 / 5) \| / read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 / 5) \| / write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 / 5) \| / write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f / 5) \| / write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 / 5) \| / write_msr: MSR_LBR_SELECT(1c8), value 0 / 5) \| / read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 / 5) \| / write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */ This patch: - Avoids disabling already frozen LBRs unnecessarily in the PMI - Avoids changing LBR_SELECT in the PMI Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1426871484-21285-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86: Only dump PEBS register when PEBS has been detected	Andi Kleen
	Technically PEBS_ENABLED is only guaranteed to exist when we detected PEBS. So add a check for this to the PMU dump function. I don't think it can happen on a real CPU, but could in a VM. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86: Dump DEBUGCTL in PMU dump	Andi Kleen
	LBRs and LBR freezing are controlled through the DEBUGCTL MSR. So dump the state of DEBUGCTL too when dumping the PMU state. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Reset more state in PMU reset	Andi Kleen
	The PMU reset code didn't quite keep up with newer PMU features. Improve it a bit to really reset a modern PMU: - Clear all overflow status - Clear LBRs and freezing state - Disable fixed counters too Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Make the HT bug workaround conditional on HT enabled	Stephane Eranian
	This patch disables the PMU HT bug when Hyperthreading (HT) is disabled. We cannot do this test immediately when perf_events is initialized. We need to wait until the topology information is setup properly. As such, we register a later initcall, check the topology and potentially disable the workaround. To do this, we need to ensure there is no user of the PMU. At this point of the boot, the only user is the NMI watchdog, thus we disable it during the switch and re-enable it right after. Having the workaround disabled when it is not needed provides some benefits by limiting the overhead is time and space. The workaround still ensures correct scheduling of the corrupting memory events (0xd0, 0xd1, 0xd2) when HT is off. Those events can only be measured on counters 0-3. Something else the current kernel did not handle correctly. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-13-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Limit to half counters when the HT workaround is enabled, to ↵	Stephane Eranian
	avoid exclusive mode starvation This patch limits the number of counters available to each CPU when the HT bug workaround is enabled. This is necessary to avoid situation of counter starvation. Such can arise from configuration where one HT thread, HT0, is using all 4 counters with corrupting events which require exclusion the the sibling HT, HT1. In such case, HT1 would not be able to schedule any event until HT0 is done. To mitigate this problem, this patch artificially limits the number of counters to 2. That way, we can gurantee that at least 2 counters are not in exclusive mode and therefore allow the sibling thread to schedule events of the same type (system vs. per-thread). The 2 counters are not determined in advance. We simply set the limit to two events per HT. This helps mitigate starvation in case of events with specific counter constraints such a PREC_DIST. Note that this does not elimintate the starvation is all cases. But it is better than not having it. (Solution suggested by Peter Zjilstra.) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Fix intel_get_event_constraints() for dynamic constraints	Stephane Eranian
	With dynamic constraint, we need to restart from the static constraints each time the intel_get_event_constraints() is called. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Enforce HT bug workaround with PEBS for SNB/IVB/HSW	Maria Dimakopoulou
	This patch modifies the PEBS constraint tables for SNB/IVB/HSW such that corrupting events supporting PEBS activate the HT workaround. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-9-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Enforce HT bug workaround for SNB/IVB/HSW	Maria Dimakopoulou
	This patches activates the HT bug workaround for the SNB/IVB/HSW processors. This covers non-PEBS mode. Activation is done thru the constraint tables. Both client and server processors needs this workaround. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Implement cross-HT corruption bug workaround	Maria Dimakopoulou
	This patch implements a software workaround for a HW erratum on Intel SandyBridge, IvyBridge and Haswell processors with Hyperthreading enabled. The errata are documented for each processor in their respective specification update documents: - SandyBridge: BJ122 - IvyBridge: BV98 - Haswell: HSD29 The bug causes silent counter corruption across hyperthreads only when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3). Counters measuring those events may leak counts to the sibling counter. For instance, counter 0, thread 0 measuring event 0xd0, may leak to counter 0, thread 1, regardless of the event measured there. The size of the leak is not predictible. It all depends on the workload and the state of each sibling hyper-thread. The corrupting events do undercount as a consequence of the leak. The leak is compensated automatically only when the sibling counter measures the exact same corrupting event AND the workload is on the two threads is the same. Given, there is no way to guarantee this, a work-around is necessary. Furthermore, there is a serious problem if the leaked count is added to a low-occurrence event. In that case the corruption on the low occurrence event can be very large, e.g., orders of magnitude. There is no HW or FW workaround for this problem. The bug is very easy to reproduce on a loaded system. Here is an example on a Haswell client, where CPU0, CPU4 are siblings. We load the CPUs with a simple triad app streaming large floating-point vector. We use 0x81d0 corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and 0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not using the LBR, the 0x20cc event should be zero. $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 139 277 291 r20cc 10,000969126 seconds time elapsed In this example, 0x81d0 and r20cc ar eusing sinling counters on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it from 0 to 139 millions occurrences. This patch provides a software workaround to this problem by modifying the way events are scheduled onto counters by the kernel. The patch forces cross-thread mutual exclusion between counters in case a corrupting event is measured by one of the hyper-threads. If thread 0, counter 0 is measuring event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting event is measured on any hyper-thread, event scheduling proceeds as before. The same example run with the workaround enabled, yield the correct answer: $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 0 r20cc 10,000969126 seconds time elapsed The patch does provide correctness for all non-corrupting events. It does not "repatriate" the leaked counts back to the leaking counter. This is planned for a second patch series. This patch series makes this repatriation more easy by guaranteeing the sibling counter is not measuring any useful event. The patch introduces dynamic constraints for events. That means that events which did not have constraints, i.e., could be measured on any counters, may now be constrained to a subset of the counters depending on what is going on the sibling thread. The algorithm is similar to a cache coherency protocol. We call it XSU in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU counter. As a consequence of the workaround, users may see an increased amount of event multiplexing, even in situtations where there are fewer events than counters measured on a CPU. Patch has been tested on all three impacted processors. Note that when HT is off, there is no corruption. However, the workaround is still enabled, yet not costing too much. Adding a dynamic detection of HT on turned out to be complex are requiring too much to code to be justified. This patch addresses the issue when PEBS is not used. A subsequent patch fixes the problem when PEBS is used. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Add cross-HT counter exclusion infrastructure	Maria Dimakopoulou
	This patch adds a new shared_regs style structure to the per-cpu x86 state (cpuc). It is used to coordinate access between counters which must be used with exclusion across HyperThreads on Intel processors. This new struct is not needed on each PMU, thus is is allocated on demand. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [peterz: spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86: Add 'index' param to get_event_constraint() callback	Stephane Eranian
	This patch adds an index parameter to the get_event_constraint() x86_pmu callback. It is expected to represent the index of the event in the cpuc->event_list[] array. When the callback is used for fake_cpuc (evnet validation), then the index must be -1. The motivation for passing the index is to use it to index into another cpuc array. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86: Add 3 new scheduling callbacks	Maria Dimakopoulou
	This patch adds 3 new PMU model specific callbacks during the event scheduling done by x86_schedule_events(). ->start_scheduling(): invoked when entering the schedule routine. ->stop_scheduling(): invoked at the end of the schedule routine ->commit_scheduling(): invoked for each committed event To be used optionally by model-specific code. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86: Vectorize cpuc->kfree_on_online	Stephane Eranian
	Make the cpuc->kfree_on_online a vector to accommodate more than one entry and add the second entry to be used by a later patch. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86: Rename x86_pmu::er_flags to 'flags'	Stephane Eranian
	Because it will be used for more than just tracking the presence of extra registers. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	Merge branch 'perf/urgent' into perf/core, before applying dependent patches	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel/bts: Add BTS PMU driver	Alexander Shishkin
	Add support for Branch Trace Store (BTS) via kernel perf event infrastructure. The difference with the existing implementation of BTS support is that this one is a separate PMU that exports events' trace buffers to userspace by means of AUX area of the perf buffer, which is zero-copy mapped into userspace. The immediate benefit is that the buffer size can be much bigger, resulting in fewer interrupts and no kernel side copying is involved and little to no trace data loss. Also, kernel code can be traced with this driver. The old way of collecting BTS traces still works. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel/pt: Add Intel PT PMU driver	Alexander Shishkin
	Add support for Intel Processor Trace (PT) to kernel's perf events. PT is an extension of Intel Architecture that collects information about software execuction such as control flow, execution modes and timings and formats it into highly compressed binary packets. Even being compressed, these packets are generated at hundreds of megabytes per second per core, which makes it impractical to decode them on the fly in the kernel. This driver exports trace data by through AUX space in the perf ring buffer, which is zero-copy mapped into userspace for faster data retrieval. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86: Mark Intel PT and LBR/BTS as mutually exclusive	Alexander Shishkin
	Intel PT cannot be used at the same time as LBR or BTS and will cause a general protection fault if they are used together. In order to avoid fixing up GPs in the fast path, instead we disallow creating LBR/BTS events when PT events are present and vice versa. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-12-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection	Alexander Shishkin
	Intel Processor Trace is an architecture extension that allows for program flow tracing. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-11-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints	Andi Kleen
	Some of the CYCLE_ACTIVITY.* events can only be scheduled on counter 2. Due to a typo Haswell matched those with INTEL_EVENT_CONSTRAINT, which lead to the events never matching as the comparison does not expect anything in the umask too. Fix the typo. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02	perf/x86/intel: Filter branches for PEBS event	Kan Liang
	For supporting Intel LBR branches filtering, Intel LBR sharing logic mechanism is introduced from commit b36817e88630 ("perf/x86: Add Intel LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to config lbr_sel, which is finally used to set LBR_SELECT. However, the intel_shared_regs_constraints() function is called after intel_pebs_constraints(). The PEBS event will return immediately after intel_pebs_constraints(). So it's impossible to filter branches for PEBS events. This patch moves intel_shared_regs_constraints() ahead of intel_pebs_constraints(). We can safely do that because the intel_shared_regs_constraints() function only returns empty constraint if its rejecting the event, otherwise it returns NULL such that we continue calling intel_pebs_constraints() and x86_get_event_constraint(). Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-29	Merge tag 'armsoc-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "The latest and greatest fixes for ARM platform code. Worth pointing out are: - Lines-wise, largest is a PXA fix for dealing with interrupts on DT that was quite broken. It's still newish code so while we could have held this off, it seemed appropriate to include now - Some GPIO fixes for OMAP platforms added a few lines. This was also fixes for code recently added (this release). - Small OMAP timer fix to behave better with partially upstreamed platforms, which is quite welcome. - Allwinner fixes about operating point control, reducing overclocking in some cases for better stability. plus a handful of other smaller fixes across the map" * tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: juno: Fix misleading name of UART reference clock ARM: dts: sunxi: Remove overclocked/overvoltaged OPP ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting ARM: socfpga: dts: fix spi1 interrupt ARM: dts: Fix gpio interrupts for dm816x ARM: dts: dra7: remove ti,hwmod property from pcie phy ARM: OMAP: dmtimer: disable pm runtime on remove ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure ARM: OMAP2+: Fix socbus family info for AM33xx devices ARM: dts: omap3: Add missing dmas for crypto ARM: dts: rockchip: disable gmac by default in rk3288.dtsi MAINTAINERS: add rockchip regexp to the ARM/Rockchip entry ARM: pxa: fix pxa interrupts handling in DT ARM: pxa: Fix typo in zeus.c ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage
2015-03-29	Merge tag 'sunxi-fixes-for-4.0' of ↵	Olof Johansson
	https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes Allwinner fixes for 4.0 There's a few fixes to merge for 4.0, one to add a select in the machine Kconfig option to fix a potential build failure, and two fixing cpufreq related issues. * tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux: ARM: dts: sunxi: Remove overclocked/overvoltaged OPP ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage Signed-off-by: Olof Johansson <olof@lixom.net>
2015-03-29	Merge tag 'fixes-v4.0-rc4' of ↵	Olof Johansson
	git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Fixes for omaps for the -rc cycle: - Fix a device tree based booting vs legacy booting regression for omap3 crypto hardware by adding the missing DMA channels. - Fix /sys/bus/soc/devices/soc0/family for am33xx devices. - Fix two timer issues that can cause hangs if the timer related hwmod data is missing like it often initially is for new SoCs. - Remove pcie hwmods entry from dts as that causes runtime PM to fail for the PHYs. - A paper bag type dts configuration fix for dm816x GPIO interrupts that I just noticed. This is most of the changes diffstat wise, but as it's a basic feature for connecting devices and things work otherwise, it should be fixed. * tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: Fix gpio interrupts for dm816x ARM: dts: dra7: remove ti,hwmod property from pcie phy ARM: OMAP: dmtimer: disable pm runtime on remove ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure ARM: OMAP2+: Fix socbus family info for AM33xx devices ARM: dts: omap3: Add missing dmas for crypto Signed-off-by: Olof Johansson <olof@lixom.net>
2015-03-29	Merge tag 'socfpga_fix_for_v4.0_2' of ↵	Olof Johansson
	git://git.rocketboards.org/linux-socfpga-next into fixes Late fix for v4.0 on the SoCFPGA platform: - Fix interrupt number for SPI1 interface * tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next: ARM: socfpga: dts: fix spi1 interrupt Signed-off-by: Olof Johansson <olof@lixom.net>
2015-03-29	arm64: juno: Fix misleading name of UART reference clock	Dave Martin
	The UART reference clock speed is 7273.8 kHz, not 72738 kHz. Dots aren't usually used in node names even though ePAPR permits them. However, this can easily be avoided by expressing the frequency in Hz, not kHz. This patch changes the name to refclk7273800hz, reflecting the actual clock speed. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2015-03-29	Merge tag 'fixes-for-v4.0-rc5' of https://github.com/rjarzmik/linux into fixes	Olof Johansson
	arm: pxa: fixes for v4.0-rc5 There are only 2 fixes, one for the zeus board about the regulator changes, where a typo prevented the zeus board from having a working can regulator, and one regression triggered by the interrupts IRQ shift of 16 affecting all boards. * tag 'fixes-for-v4.0-rc5' of https://github.com/rjarzmik/linux: ARM: pxa: fix pxa interrupts handling in DT ARM: pxa: Fix typo in zeus.c Signed-off-by: Olof Johansson <olof@lixom.net>
2015-03-28	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix x86 syscall exit code bug that resulted in spurious non-execution of TIF-driven user-return worklets, causing big trouble for things like KVM that rely on user notifiers for correctness of their vcpu model, causing crashes like double faults" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/entry: Check for syscall exit work with IRQs disabled
2015-03-28	Merge branch 'parisc-4.0-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parsic fixes from Helge Deller: "One patch from Mikulas fixes a bug on parisc by artifically incrementing the counter in pmd_free when the kernel tries to free the preallocated pmd. Other than that we now prevent that syscalls gets added without incrementing __NR_Linux_syscalls and fix the initial pmd setup code if a default page size greater than 4k has been selected" * 'parisc-4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix pmd code to depend on PT_NLEVELS value, not on CONFIG_64BIT parisc: mm: don't count preallocated pmds parisc: Add compile-time check when adding new syscalls
2015-03-28	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm ppc bugfixes from Marcelo Tosatti. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S HV: Fix instruction emulation KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()
2015-03-28	Merge tag 'arc-4.0-fixes-part-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: "We found some issues with signal handling taking down the system. I know its late, but these are important and all marked for stable. ARC signal handling related fixes uncovered during recent testing of NPTL tools" * tag 'arc-4.0-fixes-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: signal handling robustify ARC: SA_SIGINFO ucontext regs off-by-one
2015-03-27	perf: Add per event clockid support	Peter Zijlstra
	While thinking on the whole clock discussion it occurred to me we have two distinct uses of time: 1) the tracking of event/ctx/cgroup enabled/running/stopped times which includes the self-monitoring support in struct perf_event_mmap_page. 2) the actual timestamps visible in the data records. And we've been conflating them. The first is all about tracking time deltas, nobody should really care in what time base that happens, its all relative information, as long as its internally consistent it works. The second however is what people are worried about when having to merge their data with external sources. And here we have the discussion on MONOTONIC vs MONOTONIC_RAW etc.. Where MONOTONIC is good for correlating between machines (static offset), MONOTNIC_RAW is required for correlating against a fixed rate hardware clock. This means configurability; now 1) makes that hard because it needs to be internally consistent across groups of unrelated events; which is why we had to have a global perf_clock(). However, for 2) it doesn't really matter, perf itself doesn't care what it writes into the buffer. The below patch makes the distinction between these two cases by adding perf_event_clock() which is used for the second case. It further makes this configurable on a per-event basis, but adds a few sanity checks such that we cannot combine events with different clocks in confusing ways. And since we then have per-event configurability we might as well retain the 'legacy' behaviour as a default. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	Merge branch 'perf/core' into perf/timer, before applying new changes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	Merge branch 'timers/core' into perf/timer, to apply dependent patch	Ingo Molnar
	An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(), so merge timers/core here in a separate topic branch until it's all cooked and timers/core is merged upstream. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	perf/x86: Remove redundant calls to perf_pmu_{dis\|en}able()	David Ahern
	perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called afterwards. No need to call these inside of x86_pmu_add() as well. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	Merge branch 'perf/x86' into perf/core, because it's ready	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	Merge branch 'perf/urgent' into perf/core, to pick up fixes and to refresh ↵	Ingo Molnar
	the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	time: Rename timekeeper::tkr to timekeeper::tkr_mono	Peter Zijlstra
	In preparation of adding another tkr field, rename this one to tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base, since the structure is not specific to CLOCK_MONOTONIC and the mono name got added to the tk_read_base instance. Lots of trivial churn. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	perf/x86/intel: Add INST_RETIRED.ALL workarounds	Andi Kleen
	On Broadwell INST_RETIRED.ALL cannot be used with any period that doesn't have the lowest 6 bits cleared. And the period should not be smaller than 128. This is erratum BDM11 and BDM55: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf BDM11: When using a period < 100; we may get incorrect PEBS/PMI interrupts and/or an invalid counter state. BDM55: When bit0-5 of the period are !0 we may get redundant PEBS records on overflow. Add a new callback to enforce this, and set it for Broadwell. How does this handle the case when an app requests a specific period with some of the bottom bits set? Short answer: Any useful instruction sampling period needs to be 4-6 orders of magnitude larger than 128, as an PMI every 128 instructions would instantly overwhelm the system and be throttled. So the +-64 error from this is really small compared to the period, much smaller than normal system jitter. Long answer (by Peterz): IFF we guarantee perf_event_attr::sample_period >= 128. Suppose we start out with sample_period=192; then we'll set period_left to 192, we'll end up with left = 128 (we truncate the lower bits). We get an interrupt, find that period_left = 64 (>0 so we return 0 and don't get an overflow handler), up that to 128. Then we trigger again, at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get an overflow). We increment with sample_period so we get left = 128. We fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an overflow). And on and on. So while the individual interrupts are 'wrong' we get then with interval=256,128 in exactly the right ratio to average out at 192. And this works for everything >=128. So the num_samples*fixed_period thing is still entirely correct +- 127, which is good enough I'd say, as you already have that error anyhow. So no need to 'fix' the tools, al we need to do is refuse to create INST_RETIRED:ALL events with sample_period < 128. Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Updated comments and changelog a bit. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	perf/x86/intel: Add Broadwell core support	Andi Kleen
	Add Broadwell support for Broadwell to perf. The basic support is very similar to Haswell. We use the new cache event list added for Haswell earlier. The only differences are a few bits related to remote nodes. To avoid an extra, mostly identical, table these are patched up in the initialization code. The constraint list has one new event that needs to be handled over Haswell. Includes code and testing from Kan Liang. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27	perf/x86/intel: Add new cache events table for Haswell	Andi Kleen
	Haswell offcore events are quite different from Sandy Bridge. Add a new table to handle Haswell properly. Note that the offcore bits listed in the SDM are not quite correct (this is currently being fixed). An uptodate list of bits is in the patch. The basic setup is similar to Sandy Bridge. The prefetch columns have been removed, as prefetch counting is not very reliable on Haswell. One L1 event that is not in the event list anymore has been also removed. - data reads do not include code reads (comparable to earlier Sandy Bridge tables) - data counts include speculative execution (except L1 write, dtlb, bpu) - remote node access includes both remote memory, remote cache, remote mmio. - prefetches are not included in the counts for consistency (different from Sandy Bridge, which includes prefetches in the remote node) Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-26	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "A couple of bug fixes for s390. The ftrace comile fix is quite large for a -rc6 release, but it would be nice to have it in 4.0" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/smp: reenable smt after resume s390/mm: limit STACK_RND_MASK for compat tasks s390/ftrace: fix compile error if CONFIG_KPROBES is disabled s390/cpum_sf: add diagnostic sampling event only if it is authorized
2015-03-26	ARC: signal handling robustify	Vineet Gupta
	A malicious signal handler / restorer can DOS the system by fudging the user regs saved on stack, causing weird things such as sigreturn returning to user mode PC but cpu state still being kernel mode.... Ensure that in sigreturn path status32 always has U bit; any other bogosity (gargbage PC etc) will be taken care of by normal user mode exceptions mechanisms. Reproducer signal handler: void handle_sig(int signo, siginfo_t info, void context) { ucontext_t uc = context; struct user_regs_struct regs = &(uc->uc_mcontext.regs); regs->scratch.status32 = 0; } Before the fix, kernel would go off to weeds like below: --------->8----------- [ARCLinux]$ ./signal-test Path: /signal-test CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65 task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000 [ECR ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698 [EFA ]: 0x00000010 [BLINK ]: 0x2007c1ee [ERET ]: 0x10698 [STAT32]: 0x00000000 : <-------- BTA: 0x00010680 SP: 0x5ffe7e48 FP: 0x00000000 LPS: 0x20003c6c LPE: 0x20003c70 LPC: 0x00000000 ... --------->8----------- Reported-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: <stable@vger.kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-03-26	ARC: SA_SIGINFO ucontext regs off-by-one	Vineet Gupta
	The regfile provided to SA_SIGINFO signal handler as ucontext was off by one due to pt_regs gutter cleanups in 2013. Before handling signal, user pt_regs are copied onto user_regs_struct and copied back later. Both structs are binary compatible. This was all fine until commit 2fa919045b72 (ARC: pt_regs update #2) which removed the empty stack slot at top of pt_regs (corresponding to first pad) and made the corresponding fixup in struct user_regs_struct (the pad in there was moved out of @scratch - not removed altogether as it is part of ptrace ABI) struct user_regs_struct { + long pad; struct { - long pad; long bta, lp_start, lp_end,.... } scratch; ... } This meant that now user_regs_struct was off by 1 reg w.r.t pt_regs and signal code needs to user_regs_struct.scratch to reflect it as pt_regs, which is what this commit does. This problem was hidden for 2 years, because both save/restore, despite using wrong location, were using the same location. Only an interim inspection (reproducer below) exposed the issue. void handle_segv(int signo, siginfo_t info, void context) { ucontext_t uc = context; struct user_regs_struct regs = &(uc->uc_mcontext.regs); printf("regs %x %x\n", <=== prints 7 8 (vs. 8 9) regs->scratch.r8, regs->scratch.r9); } int main() { struct sigaction sa; sa.sa_sigaction = handle_segv; sa.sa_flags = SA_SIGINFO; sigemptyset(&sa.sa_mask); sigaction(SIGSEGV, &sa, NULL); asm volatile( "mov r7, 7 \n" "mov r8, 8 \n" "mov r9, 9 \n" "mov r10, 10 \n" :::"r7","r8","r9","r10"); ((unsigned int)0x10) = 0; } Fixes: 2fa919045b72ec892e "ARC: pt_regs update #2: Remove unused gutter at start of pt_regs" CC: <stable@vger.kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-03-25	Merge tag 'metag-fixes-v4.0-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag Pull arch/metag fix from James Hogan: "Another metag architecture fix for v4.0 This is another single fix, for an include dependency problem when using ioremap_wc() from asm/io.h without also including asm/pgtable.h" * tag 'metag-fixes-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: metag: Fix ioremap_wc/ioremap_cached build errors
2015-03-25	Merge tag 'signed-for-4.0' of git://github.com/agraf/linux-2.6	Marcelo Tosatti
	Patch queue for 4.0 - 2015-03-25 A few bug fixes for Book3S HV KVM: - Fix spinlock ordering - Fix idle guests on LE hosts - Fix instruction emulation
2015-03-25	s390/smp: reenable smt after resume	Heiko Carstens
	After a suspend/resume cycle we missed to enable smt again, which leads to all sorts of bugs, since the kernel assumes smt is enabled, while the hardware thinks it is not. Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reported-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>