diff options
author | Nathan Lynch <nathanl@linux.ibm.com> | 2023-02-10 12:42:00 -0600 |
---|---|---|
committer | Michael Ellerman <mpe@ellerman.id.au> | 2023-02-13 22:35:02 +1100 |
commit | 43033bc62d349d8d852855a336c91d046de819bd (patch) | |
tree | 9f7ce98c09f955129e605446b45fedf442da0e20 /arch/powerpc/kernel | |
parent | 24098f580e2b5ceb2cec4f02833e0a0bb5d46d2e (diff) | |
download | lwn-43033bc62d349d8d852855a336c91d046de819bd.tar.gz lwn-43033bc62d349d8d852855a336c91d046de819bd.zip |
powerpc/pseries: add RTAS work area allocator
Various pseries-specific RTAS functions take a temporary "work area"
parameter - a buffer in memory accessible to RTAS. Typically such
functions are passed the statically allocated rtas_data_buf buffer as
the argument. This buffer is protected by a global spinlock. So users
of rtas_data_buf cannot perform sleeping operations while accessing
the buffer.
Most RTAS functions that have a work area parameter can return a
status (-2/990x) that indicates that the caller should retry. Before
retrying, the caller may need to reschedule or sleep (see
rtas_busy_delay() for details). This combination of factors
leads to uncomfortable constructions like this:
do {
spin_lock(&rtas_data_buf_lock);
rc = rtas_call(token, __pa(rtas_data_buf, ...);
if (rc == 0) {
/* parse or copy out rtas_data_buf contents */
}
spin_unlock(&rtas_data_buf_lock);
} while (rtas_busy_delay(rc));
Another unfortunately common way of handling this is for callers to
blithely ignore the possibility of a -2/990x status and hope for the
best.
If users were allowed to perform blocking operations while owning a
work area, the programming model would become less tedious and
error-prone. Users could schedule away, sleep, or perform other
blocking operations without having to release and re-acquire
resources.
We could continue to use a single work area buffer, and convert
rtas_data_buf_lock to a mutex. But that would impose an unnecessarily
coarse serialization on all users. As awkward as the current design
is, it prevents longer running operations that need to repeatedly use
rtas_data_buf from blocking the progress of others.
There are more considerations. One is that while 4KB is fine for all
current in-kernel uses, some RTAS calls can take much smaller buffers,
and some (VPD, platform dumps) would likely benefit from larger
ones. Another is that at least one RTAS function (ibm,get-vpd)
has *two* work area parameters. And finally, we should expect the
number of work area users in the kernel to increase over time as we
introduce lockdown-compatible ABIs to replace less safe use cases
based on sys_rtas/librtas.
So a special-purpose allocator for RTAS work area buffers seems worth
trying.
Properties:
* The backing memory for the allocator is reserved early in boot in
order to satisfy RTAS addressing requirements, and then managed with
genalloc.
* Allocations can block, but they never fail (mempool-like).
* Prioritizes first-come, first-serve fairness over throughput.
* Early boot allocations before the allocator has been initialized are
served via an internal static buffer.
Intended to replace rtas_data_buf. New code that needs RTAS work area
buffers should prefer this API.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-12-26929c8cce78@linux.ibm.com
Diffstat (limited to 'arch/powerpc/kernel')
-rw-r--r-- | arch/powerpc/kernel/rtas.c | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index a6d0c46b4512..1f80f0b8a9ad 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -36,6 +36,7 @@ #include <asm/machdep.h> #include <asm/mmu.h> #include <asm/page.h> +#include <asm/rtas-work-area.h> #include <asm/rtas.h> #include <asm/time.h> #include <asm/trace.h> @@ -1939,6 +1940,8 @@ void __init rtas_initialize(void) #endif ibm_open_errinjct_token = rtas_token("ibm,open-errinjct"); ibm_errinjct_token = rtas_token("ibm,errinjct"); + + rtas_work_area_reserve_arena(rtas_region); } int __init early_init_dt_scan_rtas(unsigned long node, |