lwn.git/drivers/gpu/drm/i915/intel_ringbuffer.h, branch v4.5-rc4

drm/i915: intel_ring_initialized() must be simple and inline

2015-12-10T13:14:36+00:00

Based on Chris Wilson's patch from 6 months ago, rebased and adapted. The current implementation of intel_ring_initialized() is too heavyweight; it's a non-inlined function that chases several levels of pointers. This wouldn't matter too much if it were rarely called, but it's used inside the iterator test of for_each_ring() and is therefore called quite frequently. So let's make it simple and inline ... The idea here is to use ring->dev as an indicator showing which engines have been initialised and are therefore to be included in iterations that use for_each_ring(). This allows us to avoid multiple memory references and a (non-inlined) function call on each iteration of each such loop. Fixes regression from commit 48d823878d64f93163f5a949623346748bbce1b4 Author: Oscar Mateo Date: Thu Jul 24 17:04:23 2014 +0100 drm/i915/bdw: Generic logical ring init and cleanup Signed-off-by: Chris Wilson Signed-off-by: Dave Gordon Signed-off-by: Daniel Vetter Link: http://patchwork.freedesktop.org/patch/msgid/1449586956-32360-2-git-send-email-david.s.gordon@intel.com

drm/i915: Type safe register read/write

2015-11-18T13:39:11+00:00

Make I915_READ and I915_WRITE more type safe by wrapping the register offset in a struct. This should eliminate most of the fumbles we've had with misplaced parens. This only takes care of normal mmio registers. We could extend the idea to other register types and define each with its own struct. That way you wouldn't be able to accidentally pass the wrong thing to a specific register access function. The gpio_reg setup is probably the ugliest thing left. But I figure I'd just leave it for now, and wait for some divine inspiration to strike before making it nice. As for the generated code, it's actually a bit better sometimes. Eg. looking at i915_irq_handler(), we can see the following change: lea 0x70024(%rdx,%rax,1),%r9d mov $0x1,%edx - movslq %r9d,%r9 - mov %r9,%rsi - mov %r9,-0x58(%rbp) - callq *0xd8(%rbx) + mov %r9d,%esi + mov %r9d,-0x48(%rbp) callq *0xd8(%rbx) So previously gcc thought the register offset might be signed and decided to sign extend it, just in case. The rest appears to be mostly just minor shuffling of instructions. v2: i915_mmio_reg_{offset,equal,valid}() helpers added s/_REG/_MMIO/ in the register defines mo more switch statements left to worry about ring_emit stuff got sorted in a prep patch cmd parser, lrc context and w/a batch buildup also in prep patch vgpu stuff cleaned up and moved to a prep patch all other unrelated changes split out v3: Rebased due to BXT DSI/BLC, MOCS, etc. v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/ Signed-off-by: Ville Syrjälä Reviewed-by: Chris Wilson Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com

drm/i915: Add functions to emit register offsets to the ring

2015-11-18T12:35:24+00:00

When register type safety happens, we can't just try to emit the register itself to the ring. Instead we'll need to extract the offset from it first. Add some convenience functions that will do that. v2: Convert MOCS setup too Signed-off-by: Ville Syrjälä Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson

drm/i915: Recover all available ringbuffer space following reset

2015-10-28T17:10:31+00:00

Having flushed all requests from all queues, we know that all ringbuffers must now be empty. However, since we do not reclaim all space when retiring the request (to prevent HEADs colliding with rapid ringbuffer wraparound) the amount of available space on each ringbuffer upon reset is less than when we start. Do one more pass over all the ringbuffers to reset the available space Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala Cc: Arun Siluvery Cc: Mika Kuoppala Cc: Dave Gordon

drm/i915: Refactor common ringbuffer allocation code

2015-09-04T08:17:00+00:00

A small, very small, step to sharing the duplicate code between execlists and legacy submission engines, starting with the ringbuffer allocation code. Signed-off-by: Chris Wilson Cc: Arun Siluvery Cc: Mika Kuoppala Cc: Dave Gordon Reviewed-by: Paulo Zanoni Reviewed-by: Mika Kuoppala Signed-off-by: Daniel Vetter

drm/i915/bxt: work around HW coherency issue when accessing GPU seqno

2015-08-26T07:39:13+00:00

By running igt/store_dword_loop_render on BXT we can hit a coherency problem where the seqno written at GPU command completion time is not seen by the CPU. This results in __i915_wait_request seeing the stale seqno and not completing the request (not considering the lost interrupt/GPU reset mechanism). I also verified that this isn't a case of a lost interrupt, or that the command didn't complete somehow: when the coherency issue occured I read the seqno via an uncached GTT mapping too. While the cached version of the seqno still showed the stale value the one read via the uncached mapping was the correct one. Work around this issue by clflushing the corresponding CPU cacheline following any store of the seqno and preceding any reading of it. When reading it do this only when the caller expects a coherent view. v2: - fix using the proper logical && instead of a bitwise & (Jani, Mika) - limit the workaround to A stepping, on later steppings this HW issue is fixed v3: - use a separate get_seqno/set_seqno vfunc (Chris) Testcase: igt/store_dword_loop_render Signed-off-by: Imre Deak Reviewed-by: Chris Wilson Signed-off-by: Daniel Vetter

Merge tag 'drm-intel-fixes-2015-07-15' into drm-intel-next-queued

2015-07-15T14:36:50+00:00

Backmerge fixes since it's getting out of hand again with the massive split due to atomic between -next and 4.2-rc. All the bugfixes in 4.2-rc are addressed already (by converting more towards atomic instead of minimal duct-tape) so just always pick the version in next for the conflicts in modeset code. All the other conflicts are just adjacent lines changed. Conflicts: drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem_gtt.c drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_drv.h drivers/gpu/drm/i915/intel_ringbuffer.h Signed-off-by: Daniel Vetter

drm/i915: Snapshot seqno of most recently submitted request.

2015-07-13T20:42:39+00:00

The hang checker needs to inspect whether or not the ring request list is empty as well as if the given engine has reached or passed the most recently submitted request. The problem with this is that the hang checker cannot grab the struct_mutex, which is required in order to safely inspect requests since requests might be deallocated during inspection. In the past we've had kernel panics due to this very unsynchronized access in the hang checker. One solution to this problem is to not inspect the requests directly since we're only interested in the seqno of the most recently submitted request - not the request itself. Instead the seqno of the most recently submitted request is stored separately, which the hang checker then inspects, circumventing the issue of synchronization from the hang checker entirely. This fixes a regression introduced in commit 44cdd6d219bc64f6810b8ed0023a4d4db9e0fe68 Author: John Harrison Date: Mon Nov 24 18:49:40 2014 +0000 drm/i915: Convert 'ring_idle()' to use requests not seqnos v2 (Chris Wilson): - Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather than compute it over again. - Remove extra whitespace. Issue: VIZ-5998 Signed-off-by: Tomas Elf Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson [danvet: Add regressing commit citation provided by Chris.] Signed-off-by: Daniel Vetter

drm/i915: Enable resource streamer bits on MI_BATCH_BUFFER_START

2015-07-06T08:25:57+00:00

Adds support for enabling the resource streamer on the legacy ringbuffer for HSW and GEN8. Reviewed-by: Chris Wilson Signed-off-by: Abdiel Janulgue Signed-off-by: Daniel Vetter

drm/i915: Reserve space improvements

2015-07-03T05:38:59+00:00

An earlier patch was added to reserve space in the ring buffer for the commands issued during 'add_request()'. The initial version was pessimistic in the way it handled buffer wrapping and would cause premature wraps and thus waste ring space. This patch updates the code to better handle the wrap case. It no longer enforces that the space being asked for and the reserved space are a single contiguous block. Instead, it allows the reserve to be on the far end of a wrap operation. It still guarantees that the space is available so when the wrap occurs, no wait will happen. Thus the wrap cannot fail which is the whole point of the exercise. Also fixed a merge failure with some comments from the original patch. v2: Incorporated suggestion by David Gordon to move the wrap code inside the prepare function and thus allow a single combined wait_for_space() call rather than doing one before the wrap and another after. This also makes the prepare code much simpler and easier to follow. v3: Fix for 'effective_size' vs 'size' during ring buffer remainder calculations (spotted by Tomas Elf). For: VIZ-5115 CC: Daniel Vetter Signed-off-by: John Harrison Reviewed-by: Tomas Elf Signed-off-by: Daniel Vetter