Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r-- | Documentation/memory-barriers.txt | 249 |
1 files changed, 100 insertions, 149 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1c22b21ae922..f70ebcdfe592 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1937,21 +1937,6 @@ There are some more advanced barrier functions:
      information on consistent memory.
 
 
-MMIO WRITE BARRIER
-------------------
-
-The Linux kernel also has a special barrier for use with memory-mapped I/O
-writes:
-
-	mmiowb();
-
-This is a variation on the mandatory write barrier that causes writes to weakly
-ordered I/O regions to be partially ordered. Its effects may go beyond the
-CPU->Hardware interface and actually affect the hardware at some level.
-
-See the subsection "Acquires vs I/O accesses" for more information.
-
-
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
 ===============================
@@ -2317,75 +2302,6 @@ But it won't see any of:
 	*E, *F or *G following RELEASE Q
 
 
-
-ACQUIRES VS I/O ACCESSES
-------------------------
-
-Under certain circumstances (especially involving NUMA), I/O accesses within
-two spinlocked sections on two different CPUs may be seen as interleaved by the
-PCI bridge, because the PCI bridge does not necessarily participate in the
-cache-coherence protocol, and is therefore incapable of issuing the required
-read memory barriers.
-
-For example:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	writel(1, DATA);
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					writel(5, DATA);
-					spin_unlock(Q);
-
-may be seen by the PCI bridge as follows:
-
-	STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5
-
-which would probably cause the hardware to malfunction.
-
-
-What is necessary here is to intervene with an mmiowb() before dropping the
-spinlock, for example:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	writel(1, DATA);
-	mmiowb();
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					writel(5, DATA);
-					mmiowb();
-					spin_unlock(Q);
-
-this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
-before either of the stores issued on CPU 2.
-
-
-Furthermore, following a store by a load from the same device obviates the need
-for the mmiowb(), because the load forces the store to complete before the load
-is performed:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	spin_lock(Q)
-	writel(0, ADDR)
-	a = readl(DATA);
-	spin_unlock(Q);
-					spin_lock(Q);
-					writel(4, ADDR);
-					b = readl(DATA);
-					spin_unlock(Q);
-
-
-See Documentation/driver-api/device-io.rst for more information.
-
-
 =================================
 WHERE ARE MEMORY BARRIERS NEEDED?
 =================================
@@ -2532,16 +2448,9 @@ the device to malfunction.
 Inside of the Linux kernel, I/O should be done through the appropriate accessor
 routines - such as inb() or writel() - which know how to make such accesses
 appropriately sequential. While this, for the most part, renders the explicit
-use of memory barriers unnecessary, there are a couple of situations where they
-might be needed:
-
- (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
-     so for _all_ general drivers locks should be used and mmiowb() must be
-     issued prior to unlocking the critical section.
-
- (2) If the accessor functions are used to refer to an I/O memory window with
-     relaxed memory access properties, then _mandatory_ memory barriers are
-     required to enforce ordering.
+use of memory barriers unnecessary, if the accessor functions are used to refer
+to an I/O memory window with relaxed memory access properties, then _mandatory_
+memory barriers are required to enforce ordering.
 
 See Documentation/driver-api/device-io.rst for more information.
 
@@ -2586,8 +2495,7 @@ explicit barriers are used.
 
 Normally this won't be a problem because the I/O accesses done inside such
 sections will include synchronous load operations on strictly ordered I/O
-registers that form implicit I/O barriers. If this isn't sufficient then an
-mmiowb() may need to be used explicitly.
+registers that form implicit I/O barriers.
 
 
 A similar situation may occur between an interrupt routine and two routines
@@ -2599,71 +2507,114 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
 KERNEL I/O BARRIER EFFECTS
 ==========================
 
-When accessing I/O memory, drivers should use the appropriate accessor
-functions:
-
- (*) inX(), outX():
-
-     These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept. The i386 and x86_64 processors
-     do indeed have special I/O space access cycles and instructions, but many
-     CPUs don't have such a concept.
-
-     The PCI bus, amongst others, defines an I/O space concept which - on such
-     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
-     space. However, it may also be mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate I/O
-     spaces.
-
-     Accesses to this space may be fully synchronous (as on i386), but
-     intermediary bridges (such as the PCI host bridge) may not fully honour
-     that.
-
-     They are guaranteed to be fully ordered with respect to each other.
-
-     They are not guaranteed to be fully ordered with respect to other types of
-     memory and I/O operation.
+Interfacing with peripherals via I/O accesses is deeply architecture and device
+specific. Therefore, drivers which are inherently non-portable may rely on
+specific behaviours of their target systems in order to achieve synchronization
+in the most lightweight manner possible. For drivers intending to be portable
+between multiple architectures and bus implementations, the kernel offers a
+series of accessor functions that provide various degrees of ordering
+guarantees:
 
  (*) readX(), writeX():
 
-     Whether these are guaranteed to be fully ordered and uncombined with
-     respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing. On later
-     i386 architecture machines, for example, this is controlled by way of the
-     MTRR registers.
+	The readX() and writeX() MMIO accessors take a pointer to the
+	peripheral being accessed as an __iomem * parameter. For pointers
+	mapped with the default I/O attributes (e.g. those returned by
+	ioremap()), the ordering guarantees are as follows:
+
+	1. All readX() and writeX() accesses to the same peripheral are ordered
+	   with respect to each other. This ensures that MMIO register accesses
+	   by the same CPU thread to a particular device will arrive in program
+	   order.
+
+	2. A writeX() issued by a CPU thread holding a spinlock is ordered
+	   before a writeX() to the same peripheral from another CPU thread
+	   issued after a later acquisition of the same spinlock. This ensures
+	   that MMIO register writes to a particular device issued while holding
+	   a spinlock will arrive in an order consistent with acquisitions of
+	   the lock.
+
+	3. A writeX() by a CPU thread to the peripheral will first wait for the
+	   completion of all prior writes to memory either issued by, or
+	   propagated to, the same thread. This ensures that writes by the CPU
+	   to an outbound DMA buffer allocated by dma_alloc_coherent() will be
+	   visible to a DMA engine when the CPU writes to its MMIO control
+	   register to trigger the transfer.
+
+	4. A readX() by a CPU thread from the peripheral will complete before
+	   any subsequent reads from memory by the same thread can begin. This
+	   ensures that reads by the CPU from an incoming DMA buffer allocated
+	   by dma_alloc_coherent() will not see stale data after reading from
+	   the DMA engine's MMIO status register to establish that the DMA
+	   transfer has completed.
+
+	5. A readX() by a CPU thread from the peripheral will complete before
+	   any subsequent delay() loop can begin execution on the same thread.
+	   This ensures that two MMIO register writes by the CPU to a peripheral
+	   will arrive at least 1us apart if the first write is immediately read
+	   back with readX() and udelay(1) is called prior to the second
+	   writeX():
+
+		writel(42, DEVICE_REGISTER_0); // Arrives at the device...
+		readl(DEVICE_REGISTER_0);
+		udelay(1);
+		writel(42, DEVICE_REGISTER_1); // ...at least 1us before this.
+
+	The ordering properties of __iomem pointers obtained with non-default
+	attributes (e.g. those returned by ioremap_wc()) are specific to the
+	underlying architecture and therefore the guarantees listed above cannot
+	generally be relied upon for accesses to these types of mappings.
+
+ (*) readX_relaxed(), writeX_relaxed():
+
+	These are similar to readX() and writeX(), but provide weaker memory
+	ordering guarantees. Specifically, they do not guarantee ordering with
+	respect to locking, normal memory accesses or delay() loops (i.e.
+	bullets 2-5 above) but they are still guaranteed to be ordered with
+	respect to other accesses from the same CPU thread to the same
+	peripheral when operating on __iomem pointers mapped with the default
+	I/O attributes.
+
+ (*) readsX(), writesX():
+
+	The readsX() and writesX() MMIO accessors are designed for accessing
+	register-based, memory-mapped FIFOs residing on peripherals that are not
+	capable of performing DMA. Consequently, they provide only the ordering
+	guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
-     provided they're not accessing a prefetchable device.
+ (*) inX(), outX():
 
-     However, intermediary hardware (such as a PCI bridge) may indulge in
-     deferral if it so wishes; to flush a store, a load from the same location
-     is preferred[*], but a load from the same device or from configuration
-     space should suffice for PCI.
+	The inX() and outX() accessors are intended to access legacy port-mapped
+	I/O peripherals, which may require special instructions on some
+	architectures (notably x86). The port number of the peripheral being
+	accessed is passed as an argument.
 
-     [*] NOTE! attempting to load from the same location as was written to may
-	 cause a malfunction - consider the 16550 Rx/Tx serial registers for
-	 example.
+	Since many CPU architectures ultimately access these peripherals via an
+	internal virtual memory mapping, the portable ordering guarantees
+	provided by inX() and outX() are the same as those provided by readX()
+	and writeX() respectively when accessing a mapping with the default I/O
+	attributes.
 
-     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
-     force stores to be ordered.
+	Device drivers may expect outX() to emit a non-posted write transaction
+	that waits for a completion response from the I/O peripheral before
+	returning. This is not guaranteed by all architectures and is therefore
+	not part of the portable ordering semantics.
 
-     Please refer to the PCI specification for more information on interactions
-     between PCI transactions.
+ (*) insX(), outsX():
 
- (*) readX_relaxed(), writeX_relaxed()
+	As above, the insX() and outsX() accessors provide the same ordering
+	guarantees as readsX() and writesX() respectively when accessing a
+	mapping with the default I/O attributes.
 
-     These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees. Specifically, they do not guarantee ordering with
-     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations. If the latter is
-     required, an mmiowb() barrier can be used. Note that relaxed accesses to
-     the same peripheral are guaranteed to be ordered with respect to each
-     other.
+ (*) ioreadX(), iowriteX():
 
- (*) ioreadX(), iowriteX()
+	These will perform appropriately for the type of access they're actually
+	doing, be it inX()/outX() or readX()/writeX().
 
-     These will perform appropriately for the type of access they're actually
-     doing, be it inX()/outX() or readX()/writeX().
+With the exception of the string accessors (insX(), outsX(), readsX() and
+writesX()), all of the above assume that the underlying peripheral is
+little-endian and will therefore perform byte-swapping operations on big-endian
+architectures.
 
 
 ========================================
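
Illustration (not part of the patch): the DMA-related guarantees that the new
readX()/writeX() text lists as items 3 and 4 can be modelled in userspace. What
follows is only a sketch under stated assumptions. struct fake_dev,
fake_dev_init(), mock_writel_ctrl(), mock_readl_status(), start_transfer() and
transfer_done() are hypothetical names invented here and are not kernel APIs.
The C11 release/acquire atomics merely approximate the ordering that writel()
and readl() are documented to provide on a default ioremap() mapping; the
"device" is simulated in the same thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN     16
#define CTRL_START  0x1u
#define STATUS_DONE 0x1u

/* Hypothetical device model: one DMA buffer and two MMIO-like registers. */
struct fake_dev {
	uint8_t dma_buf[BUF_LEN];  /* stands in for dma_alloc_coherent() memory */
	uint8_t latched[BUF_LEN];  /* what the simulated DMA engine fetched */
	_Atomic uint32_t ctrl;     /* stands in for the MMIO control register */
	_Atomic uint32_t status;   /* stands in for the MMIO status register */
};

void fake_dev_init(struct fake_dev *dev)
{
	memset(dev->dma_buf, 0, BUF_LEN);
	memset(dev->latched, 0, BUF_LEN);
	atomic_init(&dev->ctrl, 0);
	atomic_init(&dev->status, 0);
}

/*
 * Stand-in for writel() to the control register. The release store models
 * item 3: earlier writes to memory are visible before the register write.
 * The simulated engine then fetches the buffer and flags completion.
 */
void mock_writel_ctrl(struct fake_dev *dev, uint32_t val)
{
	atomic_store_explicit(&dev->ctrl, val, memory_order_release);
	if (val & CTRL_START) {
		memcpy(dev->latched, dev->dma_buf, BUF_LEN);
		atomic_store_explicit(&dev->status, STATUS_DONE,
				      memory_order_release);
	}
}

/*
 * Stand-in for readl() from the status register. The acquire load models
 * item 4: later reads from memory cannot begin before this read completes.
 */
uint32_t mock_readl_status(struct fake_dev *dev)
{
	return atomic_load_explicit(&dev->status, memory_order_acquire);
}

/* Driver-side pattern: fill the buffer first, then kick the device. */
int start_transfer(struct fake_dev *dev, const uint8_t *src, size_t len)
{
	if (len > BUF_LEN)
		return -1;
	memset(dev->dma_buf, 0, BUF_LEN);
	memcpy(dev->dma_buf, src, len);     /* plain memory writes ... */
	mock_writel_ctrl(dev, CTRL_START);  /* ... ordered before the kick */
	return 0;
}

/* Driver-side pattern: check status before touching the DMA data. */
int transfer_done(struct fake_dev *dev)
{
	return (mock_readl_status(dev) & STATUS_DONE) != 0;
}
```

In a real driver the two mock accessors would simply be writel() and readl()
on an __iomem pointer, with no explicit barrier between filling the buffer and
writing the control register. With the _relaxed variants the same code would
need an explicit barrier, since those accessors are documented above as not
ordered against normal memory accesses.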