lwn.git/drivers/gpu/host1x/job.c, branch docs-mw

gpu: host1x: Implement job tracking using DMA fences

2023-01-26T14:55:38+00:00

In anticipation of removal of the intr API, implement job tracking using DMA fences instead. The main two things about this are making cdma_update schedule the work since fence completion can now be called from interrupt context, and some complication in ensuring the callback is not running when we free the fence. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding

gpu: host1x: Do not use mapping cache for job submissions

2022-04-06T13:12:36+00:00

Buffer mappings used in job submissions are usually small and not rapidly reused as opposed to framebuffers (which are usually large and rapidly reused, for example when page-flipping between double-buffered framebuffers). Avoid going through the mapping cache for these buffers since the cache would also lead to leaks if nobody is ever releasing the cache's last reference. For DRM/KMS these last references are dropped when the framebuffers are removed and therefore no longer needed. While at it, also add a note about the need to explicitly remove the final reference to the mapping in the cache. Reviewed-by: Jon Hunter Tested-by: Jon Hunter Signed-off-by: Thierry Reding

drm/tegra: Implement buffer object cache

2021-12-16T13:07:06+00:00

This cache is used to avoid mapping and unmapping buffer objects unnecessarily. Mappings are cached per client and stay hot until the buffer object is destroyed. Signed-off-by: Thierry Reding

drm/tegra: Implement correct DMA-BUF semantics

2021-12-16T13:07:06+00:00

DMA-BUF requires that each device that accesses a DMA-BUF attaches to it separately. To do so the host1x_bo_pin() and host1x_bo_unpin() functions need to be reimplemented so that they can return a mapping, which either represents an attachment or a map of the driver's own GEM object. Signed-off-by: Thierry Reding

gpu: host1x: Add option to skip firewall for a job

2021-08-10T12:42:49+00:00

The new UAPI will have its own firewall, and we don't want to run the firewall in the Host1x driver for those jobs. As such, add a parameter to host1x_job_alloc to specify if we want to skip the firewall in the Host1x driver. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding

gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer

2021-08-10T12:41:19+00:00

Add support for inserting syncpoint waits in the CDMA pushbuffer. These waits need to be done in HOST1X class, while gather submitted by the application execute in engine class. Support is added by converting the gather list of job into a command list that can include both gathers and waits. When the job is submitted, these commands are pushed as the appropriate opcodes on the CDMA pushbuffer. Also supported are waits relative to the start of the job, which are useful for jobs doing multiple things with an engine that doesn't natively support pipelining. While at it, use 32-bit waits on chips that support them. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding

gpu: host1x: Add job release callback

2021-08-10T12:41:02+00:00

Add a callback field to the job structure, to be called just before the job is to be freed. This allows the job's submitter to clean up any of its own state, like decrement runtime PM refcounts. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding

gpu: host1x: Add no-recovery mode

2021-08-10T12:40:23+00:00

Add a new property for jobs to enable or disable recovery i.e. CPU increments of syncpoints to max value on job timeout. This allows for a more solid model for hanged jobs, where userspace doesn't need to guess if a syncpoint increment happened because the job completed, or because job timeout was triggered. On job timeout, we stop the channel, NOP all future jobs on the channel using the same syncpoint, mark the syncpoint as locked and resume the channel from the next job, if any. The future jobs are NOPed, since because we don't do the CPU increments, the value of the syncpoint is no longer synchronized, and any waiters would become confused if a future job incremented the syncpoint. The syncpoint is marked locked to ensure that any future jobs cannot increment the syncpoint either, until the application has recognized the situation and reallocated the syncpoint. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding

gpu: host1x: Cleanup and refcounting for syncpoints

2021-03-31T15:42:13+00:00

Add reference counting for allocated syncpoints to allow keeping them allocated while jobs are referencing them. Additionally, clean up various places using syncpoint IDs to use host1x_syncpt pointers instead. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding

drm: host1x: fix common struct sg_table related issues

2020-09-10T06:18:35+00:00

The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function returns the number of the created entries in the DMA address space. However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and dma_unmap_sg must be called with the original number of the entries passed to the dma_map_sg(). struct sg_table is a common structure used for describing a non-contiguous memory buffer, used commonly in the DRM and graphics subsystems. It consists of a scatterlist with memory pages and DMA addresses (sgl entry), as well as the number of scatterlist entries: CPU pages (orig_nents entry) and DMA mapped pages (nents entry). It turned out that it was a common mistake to misuse nents and orig_nents entries, calling DMA-mapping functions with a wrong number of entries or ignoring the number of mapped entries returned by the dma_map_sg() function. To avoid such issues, lets use a common dma-mapping wrappers operating directly on the struct sg_table objects and use scatterlist page iterators where possible. This, almost always, hides references to the nents and orig_nents entries, making the code robust, easier to follow and copy/paste safe. Signed-off-by: Marek Szyprowski Reviewed-by: Robin Murphy