perf: Fix loss of notification with multi-event

When you do:

  $ perf record -e cycles,cycles,cycles noploop 10

You expect about 10,000 samples for each event, i.e., 10s at
1000 samples/sec. However, this is not what's happening. You
get far fewer samples, maybe 3700 samples/event:

  $ perf report -D | tail -15
  Aggregated stats:
             TOTAL events:      10998
              MMAP events:         66
              COMM events:          2
            SAMPLE events:      10930
  cycles stats:
             TOTAL events:       3644
            SAMPLE events:       3644
  cycles stats:
             TOTAL events:       3642
            SAMPLE events:       3642
  cycles stats:
             TOTAL events:       3644
            SAMPLE events:       3644

On an Intel Nehalem or even AMD64, there are 4 counters capable
of measuring cycles, so there is plenty of space to measure those
events without multiplexing (even with the NMI watchdog active).
And even with multiplexing, we'd expect roughly the same number
of samples per event.

The root of the problem was that when the event that caused the
buffer to become full was not the first event passed on the
cmdline, the user notification would get lost. The notification
was sent to the file descriptor of the overflowed event, but the
perf tool was not polling on it. The perf tool aggregates all
samples into a single buffer, i.e., the buffer of the first
event. Consequently, it assumes notifications for any event will
come via that descriptor.

The seemingly straightforward solution of moving the waitq into
the ringbuffer object doesn't work because of lifetime issues.
One could call perf_event_set_output() on an fd that one is also
blocking on, and so cause the old rb object to be freed while its
waitq is still referenced by the blocked thread -> FAIL.

Therefore link all events to the ringbuffer and broadcast the
wakeup from the ringbuffer object to all possible events that
could be waited upon. This is rather ugly, and we're open to
better solutions, but it works for now.

Reported-by: Stephane Eranian <eranian@google.com>
Finished-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>

author: Peter Zijlstra <a.p.zijlstra@chello.nl> 2011-11-26 02:47:31 +0100
committer: Ingo Molnar <mingo@elte.hu> 2011-12-05 09:33:03 +0100
commit: 10c6db110d0eb4466b59812c49088ab56218fc2e (patch)
tree: d1d4e8debcf7415df49ce691b4c3da7443919f11 /include/linux/perf_event.h
parent: 16e5294e5f8303756a179cf218e37dfb9ed34417 (diff)
1 file changed, 1 insertion, 0 deletions
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e9ebe5e0091..b1f89122bf6a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -822,6 +822,7 @@ struct perf_event {
 	int				mmap_locked;
 	struct user_struct		*mmap_user;
 	struct ring_buffer		*rb;
+	struct list_head		rb_entry;
 
 	/* poll related */
 	wait_queue_head_t		waitq;
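
The new rb_entry member is what lets an event be linked into a list owned by its ring buffer, so that a wakeup can be broadcast to every event that outputs into that buffer. The following is only a rough userland model of that idea, not the kernel code: the names model_event, model_rb, model_rb_attach, wakeup_single and wakeup_broadcast are all hypothetical, a plain next pointer stands in for the list_head, and a counter stands in for waking the event's waitq. Locking and lifetime handling, which the commit message identifies as the hard part, are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model; this is not the kernel implementation. */

struct model_event {
	int pending_wakeups;       /* stands in for a wakeup on the event's waitq */
	struct model_event *next;  /* stands in for the rb_entry list linkage */
};

struct model_rb {
	struct model_event *events; /* every event that outputs into this buffer */
};

/* Link an event to the buffer it writes its samples into. */
void model_rb_attach(struct model_rb *rb, struct model_event *ev)
{
	assert(rb != NULL && ev != NULL);
	ev->next = rb->events;
	rb->events = ev;
}

/*
 * Old behaviour as described in the commit message: only the event
 * that overflowed is notified, so a poller waiting on a different
 * event's descriptor never sees the wakeup.
 */
void wakeup_single(struct model_event *overflowed)
{
	overflowed->pending_wakeups++;
}

/*
 * New behaviour: walk the buffer's list and notify every linked
 * event, so whichever descriptor is being polled gets woken.
 */
void wakeup_broadcast(struct model_rb *rb)
{
	for (struct model_event *ev = rb->events; ev != NULL; ev = ev->next)
		ev->pending_wakeups++;
}
```

In this model, attaching three events to one model_rb and calling wakeup_single() on the third leaves the first event, the one the perf tool would be polling, with no pending wakeup. Calling wakeup_broadcast() on the buffer notifies all three.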