<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/drivers/gpu/host1x/job.c, branch docs-mw</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-mw</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-mw'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2023-01-26T14:55:38+00:00</updated>
<entry>
<title>gpu: host1x: Implement job tracking using DMA fences</title>
<updated>2023-01-26T14:55:38+00:00</updated>
<author>
<name>Mikko Perttunen</name>
<email>mperttunen@nvidia.com</email>
</author>
<published>2023-01-19T13:09:19+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c24973ed795fec5c12d8a822a0de99a4b7bab394'/>
<id>urn:sha1:c24973ed795fec5c12d8a822a0de99a4b7bab394</id>
<content type='text'>
In anticipation of removal of the intr API, implement job tracking
using DMA fences instead. The main two things about this are
making cdma_update schedule the work since fence completion can
now be called from interrupt context, and some complication in
ensuring the callback is not running when we free the fence.

Signed-off-by: Mikko Perttunen &lt;mperttunen@nvidia.com&gt;
Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>gpu: host1x: Do not use mapping cache for job submissions</title>
<updated>2022-04-06T13:12:36+00:00</updated>
<author>
<name>Thierry Reding</name>
<email>treding@nvidia.com</email>
</author>
<published>2022-03-24T10:30:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3e9c4584336149146fe15cb5703fc10a2ca2d2a0'/>
<id>urn:sha1:3e9c4584336149146fe15cb5703fc10a2ca2d2a0</id>
<content type='text'>
Buffer mappings used in job submissions are usually small and not
rapidly reused as opposed to framebuffers (which are usually large and
rapidly reused, for example when page-flipping between double-buffered
framebuffers). Avoid going through the mapping cache for these buffers
since the cache would also lead to leaks if nobody is ever releasing
the cache's last reference. For DRM/KMS these last references are
dropped when the framebuffers are removed and therefore no longer
needed.

While at it, also add a note about the need to explicitly remove the
final reference to the mapping in the cache.

Reviewed-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Tested-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>drm/tegra: Implement buffer object cache</title>
<updated>2021-12-16T13:07:06+00:00</updated>
<author>
<name>Thierry Reding</name>
<email>treding@nvidia.com</email>
</author>
<published>2020-02-07T15:50:52+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1f39b1dfa53c84b56d7ad37fed44afda7004959d'/>
<id>urn:sha1:1f39b1dfa53c84b56d7ad37fed44afda7004959d</id>
<content type='text'>
This cache is used to avoid mapping and unmapping buffer objects
unnecessarily. Mappings are cached per client and stay hot until
the buffer object is destroyed.

Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>drm/tegra: Implement correct DMA-BUF semantics</title>
<updated>2021-12-16T13:07:06+00:00</updated>
<author>
<name>Thierry Reding</name>
<email>treding@nvidia.com</email>
</author>
<published>2021-09-09T13:51:24+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c6aeaf56f468a565f6d2f27325fc07d35cdcd3cb'/>
<id>urn:sha1:c6aeaf56f468a565f6d2f27325fc07d35cdcd3cb</id>
<content type='text'>
DMA-BUF requires that each device that accesses a DMA-BUF attaches to it
separately. To do so the host1x_bo_pin() and host1x_bo_unpin() functions
need to be reimplemented so that they can return a mapping, which either
represents an attachment or a map of the driver's own GEM object.

Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>gpu: host1x: Add option to skip firewall for a job</title>
<updated>2021-08-10T12:42:49+00:00</updated>
<author>
<name>Mikko Perttunen</name>
<email>mperttunen@nvidia.com</email>
</author>
<published>2021-06-10T11:04:46+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0fddaa85d66140466df8e848afcda452b7d7b416'/>
<id>urn:sha1:0fddaa85d66140466df8e848afcda452b7d7b416</id>
<content type='text'>
The new UAPI will have its own firewall, and we don't want to run
the firewall in the Host1x driver for those jobs. As such, add a
parameter to host1x_job_alloc to specify if we want to skip the
firewall in the Host1x driver.

Signed-off-by: Mikko Perttunen &lt;mperttunen@nvidia.com&gt;
Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer</title>
<updated>2021-08-10T12:41:19+00:00</updated>
<author>
<name>Mikko Perttunen</name>
<email>mperttunen@nvidia.com</email>
</author>
<published>2021-06-10T11:04:45+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e902585fc8b639f1a1258eaa6265e98994e34ef8'/>
<id>urn:sha1:e902585fc8b639f1a1258eaa6265e98994e34ef8</id>
<content type='text'>
Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in HOST1X class, while gather submitted
by the application execute in engine class.

Support is added by converting the gather list of job into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
on the CDMA pushbuffer.

Also supported are waits relative to the start of the job,
which are useful for jobs doing multiple things with an engine
that doesn't natively support pipelining.

While at it, use 32-bit waits on chips that support them.

Signed-off-by: Mikko Perttunen &lt;mperttunen@nvidia.com&gt;
Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>gpu: host1x: Add job release callback</title>
<updated>2021-08-10T12:41:02+00:00</updated>
<author>
<name>Mikko Perttunen</name>
<email>mperttunen@nvidia.com</email>
</author>
<published>2021-06-10T11:04:44+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=17a298e9ac7c011e64a9c0b6f807b43f9af22eac'/>
<id>urn:sha1:17a298e9ac7c011e64a9c0b6f807b43f9af22eac</id>
<content type='text'>
Add a callback field to the job structure, to be called just before
the job is to be freed. This allows the job's submitter to clean
up any of its own state, like decrement runtime PM refcounts.

Signed-off-by: Mikko Perttunen &lt;mperttunen@nvidia.com&gt;
Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>gpu: host1x: Add no-recovery mode</title>
<updated>2021-08-10T12:40:23+00:00</updated>
<author>
<name>Mikko Perttunen</name>
<email>mperttunen@nvidia.com</email>
</author>
<published>2021-06-10T11:04:43+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c78f837ae3d1e532ff4eb90155b42d7a2e892a3f'/>
<id>urn:sha1:c78f837ae3d1e532ff4eb90155b42d7a2e892a3f</id>
<content type='text'>
Add a new property for jobs to enable or disable recovery i.e.
CPU increments of syncpoints to max value on job timeout. This
allows for a more solid model for hanged jobs, where userspace
doesn't need to guess if a syncpoint increment happened because
the job completed, or because job timeout was triggered.

On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.

The future jobs are NOPed, since because we don't do the CPU
increments, the value of the syncpoint is no longer synchronized,
and any waiters would become confused if a future job incremented
the syncpoint. The syncpoint is marked locked to ensure that any
future jobs cannot increment the syncpoint either, until the
application has recognized the situation and reallocated the
syncpoint.

Signed-off-by: Mikko Perttunen &lt;mperttunen@nvidia.com&gt;
Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>gpu: host1x: Cleanup and refcounting for syncpoints</title>
<updated>2021-03-31T15:42:13+00:00</updated>
<author>
<name>Mikko Perttunen</name>
<email>mperttunen@nvidia.com</email>
</author>
<published>2021-03-29T13:38:32+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2aed4f5ab04af922a7cf1b616701845c9ed2473f'/>
<id>urn:sha1:2aed4f5ab04af922a7cf1b616701845c9ed2473f</id>
<content type='text'>
Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.

Signed-off-by: Mikko Perttunen &lt;mperttunen@nvidia.com&gt;
Signed-off-by: Thierry Reding &lt;treding@nvidia.com&gt;
</content>
</entry>
<entry>
<title>drm: host1x: fix common struct sg_table related issues</title>
<updated>2020-09-10T06:18:35+00:00</updated>
<author>
<name>Marek Szyprowski</name>
<email>m.szyprowski@samsung.com</email>
</author>
<published>2020-04-28T11:11:16+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=67ed9f9d95189d0f99c38f34b8bb1a8fa135c877'/>
<id>urn:sha1:67ed9f9d95189d0f99c38f34b8bb1a8fa135c877</id>
<content type='text'>
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, lets use a common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Reviewed-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
</content>
</entry>
</feed>
