Age | Commit message (Collapse) | Author |
|
Now that SCM can be a loadable module, we have to add another
dependency to avoid link failures when ipa or adreno-gpu are
built-in:
aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe':
ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available'
ld.lld: error: undefined symbol: qcom_scm_is_available
>>> referenced by adreno_gpu.c
>>> gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in archive drivers/built-in.a
This can happen when CONFIG_ARCH_QCOM is disabled and we don't select
QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd
use a similar dependency here to what we have for QCOM_RPROC_COMMON,
but that causes dependency loops from other things selecting QCOM_SCM.
This appears to be an endless problem, so try something different this
time:
- CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on'
but that is simply selected by all of its users
- All the stubs in include/linux/qcom_scm.h can go away
- arm-smccc.h needs to provide a stub for __arm_smccc_smc() to
allow compile-testing QCOM_SCM on all architectures.
- To avoid a circular dependency chain involving RESET_CONTROLLER
and PINCTRL_SUNXI, drop the 'select RESET_CONTROLLER' statement.
According to my testing this still builds fine, and the QCOM
platform selects this symbol already.
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Alex Elder <elder@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
'x86/vt-d' and 'core' into next
|
|
Add the missing unlock before return from function arm_smmu_device_group()
in the error handling case.
Fixes: b1a1347912a7 ("iommu/arm-smmu: Fix race condition during iommu_group creation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210820074949.1946576-1-yangyingliang@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
In preparation for the strict vs. non-strict decision for DMA domains to
be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/8f217ef285bd0bb9456c27ef622d2efdbbca1ad8.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
IO_PGTABLE_QUIRK_NON_STRICT was never a very comfortable fit, since it's
not a quirk of the pagetable format itself. Now that we have a more
appropriate way to convey non-strict unmaps, though, this last of the
non-quirk quirks can also go, and with the flush queue code also now
enforcing its own ordering we can have a lovely cleanup all round.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/155b5c621cd8936472e273a8b07a182f62c6c20d.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The core code bakes its own cookies now.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/7ae3680dad9735cc69c3618866666896bd11e031.1628682048.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Pre-zeroing the batched commands structure is inefficient, as individual
commands are zeroed later in arm_smmu_cmdq_build_cmd(). The size is quite
large and commonly most commands won't even be used:
struct arm_smmu_cmdq_batch cmds = {};
345c: 52800001 mov w1, #0x0 // #0
3460: d2808102 mov x2, #0x408 // #1032
3464: 910143a0 add x0, x29, #0x50
3468: 94000000 bl 0 <memset>
Stop pre-zeroing the complete structure and only zero the num member.
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1628696966-88386-1-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When SMMU_GERROR.CMDQP_ERR is different to SMMU_GERRORN.CMDQP_ERR, it
indicates that one or more errors have been encountered on a command queue
control page interface. We need to traverse all ECMDQs in that control
page to find all errors. For each ECMDQ error handling, it is much the
same as the CMDQ error handling. This common processing part is extracted
as a new function __arm_smmu_cmdq_skip_err().
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-5-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
One SMMU has only one normal CMDQ. Therefore, this CMDQ is used regardless
of the core on which the command is inserted. It can be referenced
directly through "smmu->cmdq". However, one SMMU has multiple ECMDQs, and
the ECMDQ used by the core on which the command insertion is executed may
be different. So the helper function arm_smmu_get_cmdq() is added, which
returns the CMDQ/ECMDQ that the current core should use. Currently, the
code that supports ECMDQ is not added. just simply returns "&smmu->cmdq".
Many subfunctions of arm_smmu_cmdq_issue_cmdlist() use "&smmu->cmdq" or
"&smmu->cmdq.q" directly. To support ECMDQ, they need to call the newly
added function arm_smmu_get_cmdq() instead.
Note that normal CMDQ is still required until ECMDQ is available.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-4-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
arm_smmu_cmdq_issue_cmd_with_sync()
The obvious key to the performance optimization of commit 587e6c10a7ce
("iommu/arm-smmu-v3: Reduce contention during command-queue insertion") is
to allow multiple cores to insert commands in parallel after a brief mutex
contention.
Obviously, inserting as many commands at a time as possible can reduce the
number of times the mutex contention participates, thereby improving the
overall performance. At least it reduces the number of calls to function
arm_smmu_cmdq_issue_cmdlist().
Therefore, function arm_smmu_cmdq_issue_cmd_with_sync() is added to insert
the 'cmd+sync' commands at a time.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-3-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The obvious key to the performance optimization of commit 587e6c10a7ce
("iommu/arm-smmu-v3: Reduce contention during command-queue insertion") is
to allow multiple cores to insert commands in parallel after a brief mutex
contention.
Obviously, inserting as many commands at a time as possible can reduce the
number of times the mutex contention participates, thereby improving the
overall performance. At least it reduces the number of calls to function
arm_smmu_cmdq_issue_cmdlist().
Therefore, use command queue batching helpers to insert multiple commands
at a time.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-2-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently for iommu_unmap() of large scatter-gather list with page size
elements, the majority of time is spent in flushing of partial walks in
__arm_lpae_unmap() which is a VA based TLB invalidation invalidating
page-by-page on iommus like arm-smmu-v2 (TLBIVA).
For example: to unmap a 32MB scatter-gather list with page size elements
(8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K)
resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
overhead.
On qcom implementation, there are several performance improvements for
TLB cache invalidations in HW like wait-for-safe (for realtime clients
such as camera and display) and few others to allow for cache
lookups/updates when TLBI is in progress for the same context bank.
So the cost of over-invalidation is less compared to the unmap latency
on several usecases like camera which deals with large buffers. So,
ASID based TLB invalidations (TLBIASID) can be used to invalidate the
entire context for partial walk flush thereby improving the unmap
latency.
For this example of 32MB scatter-gather list unmap, this change results
in just 16 ASID based TLB invalidations (TLBIASIDs) as opposed to 8192
TLBIVAs thereby increasing the performance of unmaps drastically.
Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
(average over 10 iterations)
Before this optimization:
size iommu_map_sg iommu_unmap
4K 2.067 us 1.854 us
64K 9.598 us 8.802 us
1M 148.890 us 130.718 us
2M 305.864 us 67.291 us
12M 1793.604 us 390.838 us
16M 2386.848 us 518.187 us
24M 3563.296 us 775.989 us
32M 4747.171 us 1033.364 us
After this optimization:
size iommu_map_sg iommu_unmap
4K 1.723 us 1.765 us
64K 9.880 us 8.869 us
1M 155.364 us 135.223 us
2M 303.906 us 5.385 us
12M 1786.557 us 21.250 us
16M 2391.890 us 27.437 us
24M 3570.895 us 39.937 us
32M 4755.234 us 51.797 us
Real world data also shows big difference in unmap performance as below:
There were reports of camera frame drops because of high overhead in
iommu unmap without this optimization because of frequent unmaps issued
by camera of about 100MB/s taking more than 100ms thereby causing frame
drops.
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/20210811160426.10312-1-saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When two devices with same SID are getting probed concurrently through
iommu_probe_device(), the iommu_group sometimes is getting allocated more
than once as call to arm_smmu_device_group() is not protected for
concurrency. Furthermore, it leads to each device holding a different
iommu_group and domain pointer, separate IOVA space and only one of the
devices' domain is used for translations from IOMMU. This causes accesses
from other device to fault or see incorrect translations.
Fix this by protecting iommu_group allocation from concurrency in
arm_smmu_device_group().
Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Link: https://lore.kernel.org/r/1628570641-9127-3-git-send-email-amhetre@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
such cases, we would need to drop the XO clock vote in unprepare call and
this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
clock driver which controls RPMh managed clock resources for new QTI SoCs.
Given we cannot have a sleeping calls such as clk_bulk_prepare() and
clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
operations like map and unmap can be in atomic context and are in fast
path, add this prepare and unprepare call to drop the XO vote only for
system pm callbacks since it is not a fast path and we expect the system
to enter deep sleep states with system pm as opposed to runtime pm.
This is a similar sequence of clock requests (prepare,enable and
disable,unprepare) in arm-smmu probe and remove.
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Co-developed-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
Link: https://lore.kernel.org/r/20210810064808.32486-1-saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Implement the map_pages() callback for ARM SMMUV3 driver to allow calls
from iommu_map to map multiple pages of the same size in one call.
Also remove the map() callback for the ARM SMMUV3 driver as it will no
longer be used.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1627697831-158822-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Implement the unmap_pages() callback for ARM SMMUV3 driver to allow calls
from iommu_unmap to unmap multiple pages of the same size in one call.
Also remove the unmap() callback for the ARM SMMUV3 driver as it will
no longer be used.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1627697831-158822-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Members of struct "llq" will be zero-inited, apart from member max_n_shift.
But we write llq.val straight after the init, so it was pointless to zero
init those other members. As such, separately init member max_n_shift
only.
In addition, struct "head" is initialised to "llq" only so that member
max_n_shift is set. But that member is never referenced for "head", so
remove any init there.
Removing these initializations is seen as a small performance optimisation,
as this code is (very) hot path.
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1624293394-202509-1-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
If people are going to insist on calling iommu_iova_to_phys()
pointlessly and expecting it to work, we can at least do ourselves a
favour by handling those cases in the core code, rather than repeatedly
across an inconsistent handful of drivers.
Since all the existing drivers implement the internal callback, and any
future ones are likely to want to work with iommu-dma which relies on
iova_to_phys a fair bit, we may as well remove that currently-redundant
check as well and consider it mandatory.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/f564f3f6ff731b898ff7a898919bf871c2c7745a.1626354264.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Implement the map_pages() callback for the ARM SMMU driver
to allow calls from iommu_map to map multiple pages of
the same size in one call. Also, remove the map() callback
for the ARM SMMU driver, as it will no longer be used.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-16-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Implement the unmap_pages() callback for the ARM SMMU driver
to allow calls from iommu_unmap to unmap multiple pages of
the same size in one call. Also, remove the unmap() callback
for the SMMU driver, as it will no longer be used.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-15-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull fallthrough fixes from Gustavo Silva:
"This fixes many fall-through warnings when building with Clang and
-Wimplicit-fallthrough, and also enables -Wimplicit-fallthrough for
Clang, globally.
It's also important to notice that since we have adopted the use of
the pseudo-keyword macro fallthrough, we also want to avoid having
more /* fall through */ comments being introduced. Contrary to GCC,
Clang doesn't recognize any comments as implicit fall-through markings
when the -Wimplicit-fallthrough option is enabled.
So, in order to avoid having more comments being introduced, we use
the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang,
will cause a warning in case a code comment is intended to be used as
a fall-through marking. The patch for Makefile also enforces this.
We had almost 4,000 of these issues for Clang in the beginning, and
there might be a couple more out there when building some
architectures with certain configurations. However, with the recent
fixes I think we are in good shape and it is now possible to enable
the warning for Clang"
* tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
Makefile: Enable -Wimplicit-fallthrough for Clang
powerpc/smp: Fix fall-through warning for Clang
dmaengine: mpc512x: Fix fall-through warning for Clang
usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang
powerpc/powernv: Fix fall-through warning for Clang
MIPS: Fix unreachable code issue
MIPS: Fix fall-through warnings for Clang
ASoC: Mediatek: MT8183: Fix fall-through warning for Clang
power: supply: Fix fall-through warnings for Clang
dmaengine: ti: k3-udma: Fix fall-through warning for Clang
s390: Fix fall-through warnings for Clang
dmaengine: ipu: Fix fall-through warning for Clang
iommu/arm-smmu-v3: Fix fall-through warning for Clang
mmc: jz4740: Fix fall-through warning for Clang
PCI: Fix fall-through warning for Clang
scsi: libsas: Fix fall-through warning for Clang
video: fbdev: Fix fall-through warning for Clang
math-emu: Fix fall-through warning
cpufreq: Fix fall-through warning for Clang
drm/msm: Fix fall-through warning in msm_gem_new_impl()
...
|
|
QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
what fails for the second and latter IOMMU devices. This is intended and
must be not fatal to the driver registration process. Also the cleanup
path should take care of the runtime PM state, what is missing in the
current patch. Revert relevant changes to the QCOM IOMMU driver until
a proper fix is prepared.
This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155.
Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210705065657.30356-1-m.szyprowski@samsung.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Fix the following fallthrough warning (arm64-randconfig with Clang):
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:382:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/60edca25.k00ut905IFBjPyt5%25lkp@intel.com/
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM driver updates from Olof Johansson:
- Reset controllers: Adding support for Microchip Sparx5 Switch.
- Memory controllers: ARM Primecell PL35x SMC memory controller driver
cleanups and improvements.
- i.MX SoC drivers: Power domain support for i.MX8MM and i.MX8MN.
- Rockchip: RK3568 power domains support + DT binding updates,
cleanups.
- Qualcomm SoC drivers: Amend socinfo with more SoC/PMIC details,
including support for MSM8226, MDM9607, SM6125 and SC8180X.
- ARM FFA driver: "Firmware Framework for ARMv8-A", defining management
interfaces and communication (including bus model) between partitions
both in Normal and Secure Worlds.
- Tegra Memory controller changes, including major rework to deal with
identity mappings at boot and integration with ARM SMMU pieces.
* tag 'arm-drivers-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (120 commits)
firmware: turris-mox-rwtm: add marvell,armada-3700-rwtm-firmware compatible string
firmware: turris-mox-rwtm: show message about HWRNG registration
firmware: turris-mox-rwtm: fail probing when firmware does not support hwrng
firmware: turris-mox-rwtm: report failures better
firmware: turris-mox-rwtm: fix reply status decoding function
soc: imx: gpcv2: add support for i.MX8MN power domains
dt-bindings: add defines for i.MX8MN power domains
firmware: tegra: bpmp: Fix Tegra234-only builds
iommu/arm-smmu: Use Tegra implementation on Tegra186
iommu/arm-smmu: tegra: Implement SID override programming
iommu/arm-smmu: tegra: Detect number of instances at runtime
dt-bindings: arm-smmu: Add Tegra186 compatible string
firmware: qcom_scm: Add MDM9607 compatible
soc: qcom: rpmpd: Add MDM9607 RPM Power Domains
soc: renesas: Add support to read LSI DEVID register of RZ/G2{L,LC} SoC's
soc: renesas: Add ARCH_R9A07G044 for the new RZ/G2L SoC's
dt-bindings: soc: rockchip: drop unnecessary #phy-cells from grf.yaml
memory: emif: remove unused frequency and voltage notifiers
memory: fsl_ifc: fix leak of private memory on probe failure
memory: fsl_ifc: fix leak of IO mapping on probe failure
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- SMMU Updates from Will Deacon:
- SMMUv3:
- Support stalling faults for platform devices
- Decrease defaults sizes for the event and PRI queues
- SMMUv2:
- Support for a new '->probe_finalize' hook, needed by Nvidia
- Even more Qualcomm compatible strings
- Avoid Adreno TTBR1 quirk for DB820C platform
- Intel VT-d updates from Lu Baolu:
- Convert Intel IOMMU to use sva_lib helpers in iommu core
- ftrace and debugfs supports for page fault handling
- Support asynchronous nested capabilities
- Various misc cleanups
- Support for new VIOT ACPI table to make the VirtIO IOMMU
available on x86
- Add the amd_iommu=force_enable command line option to enable
the IOMMU on platforms where they are known to cause problems
- Support for version 2 of the Rockchip IOMMU
- Various smaller fixes, cleanups and refactorings
* tag 'iommu-updates-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits)
iommu/virtio: Enable x86 support
iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
ACPI: Add driver for the VIOT table
ACPI: Move IOMMU setup code out of IORT
ACPI: arm64: Move DMA setup operations out of IORT
iommu/vt-d: Fix dereference of pointer info before it is null checked
iommu: Update "iommu.strict" documentation
iommu/arm-smmu: Check smmu->impl pointer before dereferencing
iommu/arm-smmu-v3: Remove unnecessary oom message
iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation
iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails
iommu/vt-d: Fix linker error on 32-bit
iommu/vt-d: No need to typecast
iommu/vt-d: Define counter explicitly as unsigned int
iommu/vt-d: Remove unnecessary braces
iommu/vt-d: Removed unused iommu_count in dmar domain
iommu/vt-d: Use bitfields for DMAR capabilities
iommu/vt-d: Use DEVICE_ATTR_RO macro
iommu/vt-d: Fix out-bounds-warning in intel/svm.c
iommu/vt-d: Add PRQ handling latency sampling
...
|
|
'x86/amd', 'virtio' and 'core' into next
|
|
Add, via the adreno-smmu-priv interface, a way for the GPU to request
the SMMU to stall translation on faults, and then later resume the
translation, either retrying or terminating the current translation.
This will be used on the GPU side to "freeze" the GPU while we snapshot
useful state for devcoredump.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210610214431.539029-5-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Add a callback in adreno-smmu-priv to read interesting SMMU
registers to provide an opportunity for a richer debug experience
in the GPU driver.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210610214431.539029-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Call report_iommu_fault() to allow upper-level drivers to register their
own fault handlers.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210610214431.539029-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Merge in support for the Arm SMMU '->probe_finalize()' implementation
callback, which is required to prevent early faults in conjunction with
Nvidia's memory controller.
* for-thierry/arm-smmu:
iommu/arm-smmu: Check smmu->impl pointer before dereferencing
iommu/arm-smmu: Implement ->probe_finalize()
|
|
Commit 0d97174aeadf ("iommu/arm-smmu: Implement ->probe_finalize()")
added a new optional ->probe_finalize callback to 'struct arm_smmu_impl'
but neglected to check that 'smmu->impl' is present prior to checking
if the new callback is present.
Add the missing check, which avoids dereferencing NULL when probing an
SMMU which doesn't require any implementation-specific callbacks:
| Unable to handle kernel NULL pointer dereference at virtual address
| 0000000000000070
|
| Call trace:
| arm_smmu_probe_finalize+0x14/0x48
| of_iommu_configure+0xe4/0x1b8
| of_dma_configure_id+0xf8/0x2d8
| pci_dma_configure+0x44/0x88
| really_probe+0xc0/0x3c0
Fixes: 0d97174aeadf ("iommu/arm-smmu: Implement ->probe_finalize()")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fixes scripts/checkpatch.pl warning:
WARNING: Possible unnecessary 'out of memory' message
Remove it can help us save a bit of memory.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210609125438.14369-1-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The reference counting issue happens in several exception handling paths
of arm_smmu_iova_to_phys_hard(). When those error scenarios occur, the
function forgets to decrease the refcount of "smmu" increased by
arm_smmu_rpm_get(), causing a refcount leak.
Fix this issue by jumping to "out" label when those error scenarios
occur.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/1623293391-17261-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Will Deacon <will@kernel.org>
|
|
arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the
refcount of the "smmu" even though the return value is less than 0.
The reference counting issue happens in some error handling paths of
arm_smmu_rpm_get() in its caller functions. When arm_smmu_rpm_get()
fails, the caller functions forget to decrease the refcount of "smmu"
increased by arm_smmu_rpm_get(), causing a refcount leak.
Fix this issue by calling pm_runtime_resume_and_get() instead of
pm_runtime_get_sync() in arm_smmu_rpm_get(), which can keep the refcount
balanced in case of failure.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Link: https://lore.kernel.org/r/1623293672-17954-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Tegra186 requires the same SID override programming as Tegra194 in order
to seamlessly transition from the firmware framebuffer to the Linux
framebuffer, so the Tegra implementation needs to be used on Tegra186
devices as well.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20210603164632.1000458-7-thierry.reding@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
|
|
The secure firmware keeps some SID override registers set as passthrough
in order to allow devices such as the display controller to operate with
no knowledge of SMMU translations until an operating system driver takes
over. This is needed in order to seamlessly transition from the firmware
framebuffer to the OS framebuffer.
Upon successfully attaching a device to the SMMU and in the process
creating identity mappings for memory regions that are being accessed,
the Tegra implementation will call into the memory controller driver to
program the override SIDs appropriately.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20210603164632.1000458-6-thierry.reding@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
|
|
Parse the reg property in device tree and detect the number of instances
represented by a device tree node. This is subsequently needed in order
to support single-instance SMMUs with the Tegra implementation because
additional programming is needed to properly configure the SID override
registers in the memory controller.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20210603164632.1000458-5-thierry.reding@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
|
|
The struct acpi_platform_list and function acpi_match_platform_list()
defined in include/linux/acpi.h are available only when CONFIG_ACPI is
enabled. Add protection to fix the build issues with !CONFIG_ACPI.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20210609015511.3955-1-shawn.guo@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
If device registration fails, remove sysfs attribute
and if setting bus callbacks fails, unregister the device
and cleanup the sysfs attribute.
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Link: https://lore.kernel.org/r/20210608164559.204023-1-ameynarkhede03@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Adreno(GPU) SMMU and APSS(Application Processor SubSystem) SMMU
both implement "arm,mmu-500" in some QTI SoCs and to run through
adreno smmu specific implementation such as enabling split pagetables
support, we need to match the "qcom,adreno-smmu" compatible first
before apss smmu or else we will be running apps smmu implementation
for adreno smmu and the additional features for adreno smmu is never
set. For ex: we have "qcom,sc7280-smmu-500" compatible for both apps
and adreno smmu implementing "arm,mmu-500", so the adreno smmu
implementation is never reached because the current sequence checks
for apps smmu compatible(qcom,sc7280-smmu-500) first and runs that
specific impl and we never reach adreno smmu specific implementation.
Suggested-by: Akhil P Oommen <akhilpo@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/c42181d313fdd440011541a28cde8cd10fffb9d3.1623155117.git.saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add compatible for SC7280 SMMU to use the Qualcomm Technologies, Inc.
specific implementation.
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/53a50cd91c97b5b598a73941985b79b51acefa14.1623155117.git.saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The only place of_iommu.h is needed is in drivers/of/device.c. Remove it
from everywhere else.
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Yong Wu <yong.wu@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20210527193710.1281746-2-robh@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Commit d25f6ead162e ("iommu/arm-smmu-v3: Increase maximum size of queues")
expands the cmdq queue size to improve the success rate of concurrent
command queue space allocation by multiple cores. However, this extension
does not apply to evtq and priq, because for both of them, the SMMU driver
is the consumer. Instead, memory resources are wasted. Therefore, the
queue size of evtq and priq is restored to the original setting, one page.
Fixes: d25f6ead162e ("iommu/arm-smmu-v3: Increase maximum size of queues")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210531123553.9602-1-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When a device or driver misbehaves, it is possible to receive DMA fault
events much faster than we can print them out, causing a lock up of the
system and inability to cancel the source of the problem. Ratelimit
printing of events to help recovery.
Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210531095648.118282-1-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCIe PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked
and the OS is given a chance to fix the page tables and retry the
transaction.
Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler.
If the fault is recoverable, it will call us back to terminate or
continue the stall.
To use stall device drivers need to enable IOMMU_DEV_FEAT_IOPF, which
initializes the fault queue for the device.
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210526161927.24268-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
db820c wants to use the qcom smmu path to get HUPCF set (which keeps
the GPU from wedging and then sometimes wedging the kernel after a
page fault), but it doesn't have separate pagetables support yet in
drm/msm so we can't go all the way to the TTBR1 path.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210326231303.3071950-1-eric@anholt.net
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add compatible for SM6125 SoC
Signed-off-by: Martin Botka <martin.botka@somainline.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210523212535.740979-1-martin.botka@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Implement a ->probe_finalize() callback that can be used by vendor
implementations to perform extra programming necessary after devices
have been attached to the SMMU.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20210603164632.1000458-4-thierry.reding@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fix checkpatch warning in arm-smmu-v3.c:
static const char * array should probably be static const char * const
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The hookup with qcom_smmu_impl is required to do ACPI boot on SC8180X
based devices like Lenovo Flex 5G laptop and Microsoft Surface Pro X.
Define acpi_platform_list for these platforms and match them using
acpi_match_platform_list() call, and create qcom_smmu_impl accordingly.
(np == NULL) is used to check ACPI boot, because fwnode of SMMU device
is a static allocation and thus helpers like has_acpi_companion() don't
work here.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210509022607.17534-1-shawn.guo@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|