lwn.git/kernel/dma/swiotlb.c, branch docs-6.9

Merge tag 'dma-mapping-6.8-2024-01-18' of git://git.infradead.org/users/hch/dma-mapping

2024-01-19T00:49:34+00:00

Pull dma-mapping fixes from Christoph Hellwig: - fix kerneldoc warnings (Randy Dunlap) - better bounds checking in swiotlb (ZhangPeng) * tag 'dma-mapping-6.8-2024-01-18' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: fix kernel-doc warnings swiotlb: check alloc_size before the allocation of a new memory pool

Merge tag 'dma-mapping-6.8-2024-01-08' of git://git.infradead.org/users/hch/dma-mapping

2024-01-11T21:46:50+00:00

Pull dma-mapping updates from Christoph Hellwig: - reduce area lock contention for non-primary IO TLB pools (Petr Tesarik) - don't store redundant offsets in the dma_ranges stuctures (Robin Murphy) - clear dev->dma_mem when freeing per-device pools (Joakim Zhang) * tag 'dma-mapping-6.8-2024-01-08' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: clear dev->dma_mem to NULL after freeing it swiotlb: reduce area lock contention for non-primary IO TLB pools dma-mapping: don't store redundant offsets

swiotlb: check alloc_size before the allocation of a new memory pool

2024-01-09T15:58:36+00:00

The allocation request for swiotlb contiguous memory greater than 128*2KB cannot be fulfilled because it exceeds the maximum contiguous memory limit. If the swiotlb memory we allocate is larger than 128*2KB, swiotlb_find_slots() will still schedule the allocation of a new memory pool, which will increase memory overhead. Fix it by adding a check with alloc_size no more than 128*2KB before scheduling the allocation of a new memory pool in swiotlb_find_slots(). Signed-off-by: ZhangPeng Reviewed-by: Petr Tesarik Signed-off-by: Christoph Hellwig

mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER

2024-01-08T23:27:15+00:00

commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov Cc: Linus Torvalds Signed-off-by: Andrew Morton

swiotlb: reduce area lock contention for non-primary IO TLB pools

2023-12-15T11:32:45+00:00

If multiple areas and multiple IO TLB pools exist, first iterate the current CPU specific area in all pools. Then move to the next area index. This is best illustrated by a diagram: area 0 | area 1 | ... | area M | pool 0 A B C pool 1 D E ... pool N F G H Currently, each pool is searched before moving on to the next pool, i.e. the search order is A, B ... C, D, E ... F, G ... H. With this patch, each area is searched in all pools before moving on to the next area, i.e. the search order is A, D ... F, B, E ... G ... C ... H. Note that preemption is not disabled, and raw_smp_processor_id() may not return a stable result, but it is called only once to determine the initial area index. The search will iterate over all areas eventually, even if the current task is preempted. Next, some pools may have less (but not more) areas than default_nareas. Skip such pools if the distance from the initial area index is greater than pool->nareas. This logic ensures that for every pool the search starts in the initial CPU's own area and never tries any area twice. To verify performance impact, I booted the kernel with a minimum pool size ("swiotlb=512,4,force"), so multiple pools get allocated, and I ran these benchmarks: - small: single-threaded I/O of 4 KiB blocks, - big: single-threaded I/O of 64 KiB blocks, - 4way: 4-way parallel I/O of 4 KiB blocks. The "var" column in the tables below is the coefficient of variance over 5 runs of the test, the "diff" column is the relative difference against base in read-write I/O bandwidth (MiB/s). Tested on an x86 VM against a QEMU virtio SATA driver backed by a RAM-based block device on the host: base patched var var diff small 0.69% 0.62% +25.4% big 2.14% 2.27% +25.7% 4way 2.65% 1.70% +23.6% Tested on a Raspberry Pi against a class-10 A1 microSD card: base patched var var diff small 0.53% 1.96% -0.3% big 0.02% 0.57% +0.8% 4way 6.17% 0.40% +0.3% These results confirm that there is significant performance boost in the software IO TLB slot allocation itself. Where performance is dominated by actual hardware, there is no measurable change. Signed-off-by: Petr Tesarik Reviewed-by: Mirsad Todorovac Signed-off-by: Christoph Hellwig

swiotlb: fix out-of-bounds TLB allocations with CONFIG_SWIOTLB_DYNAMIC

2023-11-08T15:27:05+00:00

Limit the free list length to the size of the IO TLB. Transient pool can be smaller than IO_TLB_SEGSIZE, but the free list is initialized with the assumption that the total number of slots is a multiple of IO_TLB_SEGSIZE. As a result, swiotlb_area_find_slots() may allocate slots past the end of a transient IO TLB buffer. Reported-by: Niklas Schnelle Closes: https://lore.kernel.org/linux-iommu/104a8c8fedffd1ff8a2890983e2ec1c26bff6810.camel@linux.ibm.com/ Fixes: 79636caad361 ("swiotlb: if swiotlb is full, fall back to a transient memory pool") Cc: stable@vger.kernel.org Signed-off-by: Petr Tesarik Reviewed-by: Halil Pasic Signed-off-by: Christoph Hellwig

swiotlb: do not free decrypted pages if dynamic

2023-11-03T08:33:45+00:00

Fix these two error paths: 1. When set_memory_decrypted() fails, pages may be left fully or partially decrypted. 2. Decrypted pages may be freed if swiotlb_alloc_tlb() determines that the physical address is too high. To fix the first issue, call set_memory_encrypted() on the allocated region after a failed decryption attempt. If that also fails, leak the pages. To fix the second issue, check that the TLB physical address is below the requested limit before decrypting. Let the caller differentiate between unsuitable physical address (=> retry from a lower zone) and allocation failures (=> no point in retrying). Cc: stable@vger.kernel.org Fixes: 79636caad361 ("swiotlb: if swiotlb is full, fall back to a transient memory pool") Signed-off-by: Petr Tesarik Reviewed-by: Rick Edgecombe Signed-off-by: Christoph Hellwig

Merge tag 'dma-mapping-6.7-2023-10-30' of git://git.infradead.org/users/hch/dma-mapping

2023-11-01T23:15:54+00:00

Pull dma-mapping updates from Christoph Hellwig: - get rid of the fake support for coherent DMA allocation on coldfire with caches (Christoph Hellwig) - add a few Kconfig dependencies so that Kconfig catches the use of invalid configurations (Christoph Hellwig) - fix a type in dma-debug output (Chuck Lever) - rewrite a comment in swiotlb (Sean Christopherson) * tag 'dma-mapping-6.7-2023-10-30' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: Fix a typo in a debugging eye-catcher swiotlb: rewrite comment explaining why the source is preserved on DMA_FROM_DEVICE m68k: remove unused includes from dma.c m68k: don't provide arch_dma_alloc for nommu/coldfire net: fec: use dma_alloc_noncoherent for data cache enabled coldfire m68k: use the coherent DMA code for coldfire without data cache dma-direct: warn when coherent allocations aren't supported dma-direct: simplify the use atomic pool logic in dma_direct_alloc dma-direct: add a CONFIG_ARCH_HAS_DMA_ALLOC symbol dma-direct: add dependencies to CONFIG_DMA_GLOBAL_POOL

swiotlb: do not try to allocate a TLB bigger than MAX_ORDER pages

2023-10-25T14:26:20+00:00

When allocating a new pool at runtime, reduce the number of slabs so that the allocation order is at most MAX_ORDER. This avoids a kernel warning in __alloc_pages(). The warning is relatively benign, because the pool size is subsequently reduced when allocation fails, but it is silly to start with a request that is known to fail, especially since this is the default behavior if the kernel is built with CONFIG_SWIOTLB_DYNAMIC=y and booted without any swiotlb= parameter. Reported-by: Ben Greear Closes: https://lore.kernel.org/netdev/4f173dd2-324a-0240-ff8d-abf5c191be18@candelatech.com/ Fixes: 1aaa736815eb ("swiotlb: allocate a new memory pool when existing pools are full") Signed-off-by: Petr Tesarik Signed-off-by: Christoph Hellwig

swiotlb: rewrite comment explaining why the source is preserved on DMA_FROM_DEVICE

2023-10-23T05:51:36+00:00

Rewrite the comment explaining why swiotlb copies the original buffer to the TLB buffer before initiating DMA *from* the device, i.e. before the device DMAs into the TLB buffer. The existing comment's argument that preserving the original data can prevent a kernel memory leak is bogus. If the driver that triggered the mapping _knows_ that the device will overwrite the entire mapping, or the driver will consume only the written parts, then copying from the original memory is completely pointless. If neither of the above holds true, then copying from the original adds value only if preserving the data is necessary for functional correctness, or the driver explicitly initialized the original memory. If the driver didn't initialize the memory, then copying the original buffer to the TLB buffer simply changes what kernel data is leaked to user space. Writing the entire TLB buffer _does_ prevent leaking stale TLB buffer data from a previous bounce, but that can be achieved by simply zeroing the TLB buffer when grabbing a slot. The real reason swiotlb ended up initializing the TLB buffer with the original buffer is that it's necessary to make swiotlb operate as transparently as possible, i.e. to behave as closely as possible to hardware, and to avoid corrupting the original buffer, e.g. if the driver knows the device will do partial writes and is relying on the unwritten data to be preserved. Reviewed-by: Robin Murphy Link: https://lore.kernel.org/all/ZN5elYQ5szQndN8n@google.com Signed-off-by: Sean Christopherson Signed-off-by: Christoph Hellwig