diff options
Diffstat (limited to 'Documentation/process/adding-syscalls.rst')
| -rw-r--r-- | Documentation/process/adding-syscalls.rst | 104 |
1 files changed, 94 insertions, 10 deletions
diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst index 906c47f1a9e5..91fc88681b1e 100644 --- a/Documentation/process/adding-syscalls.rst +++ b/Documentation/process/adding-syscalls.rst @@ -111,13 +111,13 @@ should use a file descriptor as the handle for that object -- don't invent a new type of userspace object handle when the kernel already has mechanisms and well-defined semantics for using file descriptors. -If your new :manpage:`xyzzy(2)` system call does return a new file descriptor, +If your new xyzzy(2) system call does return a new file descriptor, then the flags argument should include a value that is equivalent to setting ``O_CLOEXEC`` on the new FD. This makes it possible for userspace to close the timing window between ``xyzzy()`` and calling ``fcntl(fd, F_SETFD, FD_CLOEXEC)``, where an unexpected ``fork()`` and ``execve()`` in another thread could leak a descriptor to -the exec'ed program. (However, resist the temptation to re-use the actual value +the exec'ed program. (However, resist the temptation to reuse the actual value of the ``O_CLOEXEC`` constant, as it is architecture-specific and is part of a numbering space of ``O_*`` flags that is fairly full.) @@ -127,18 +127,18 @@ descriptor. Making a file descriptor ready for reading or writing is the normal way for the kernel to indicate to userspace that an event has occurred on the corresponding kernel object. -If your new :manpage:`xyzzy(2)` system call involves a filename argument:: +If your new xyzzy(2) system call involves a filename argument:: int sys_xyzzy(const char __user *path, ..., unsigned int flags); -you should also consider whether an :manpage:`xyzzyat(2)` version is more appropriate:: +you should also consider whether an xyzzyat(2) version is more appropriate:: int sys_xyzzyat(int dfd, const char __user *path, ..., unsigned int flags); This allows more flexibility for how userspace specifies the file in question; in particular it allows userspace to request the functionality for an already-opened file descriptor using the ``AT_EMPTY_PATH`` flag, effectively -giving an :manpage:`fxyzzy(3)` operation for free:: +giving an fxyzzy(3) operation for free:: - xyzzyat(AT_FDCWD, path, ..., 0) is equivalent to xyzzy(path,...) - xyzzyat(fd, "", ..., AT_EMPTY_PATH) is equivalent to fxyzzy(fd, ...) @@ -147,11 +147,11 @@ giving an :manpage:`fxyzzy(3)` operation for free:: :manpage:`openat(2)` man page; for an example of AT_EMPTY_PATH, see the :manpage:`fstatat(2)` man page.) -If your new :manpage:`xyzzy(2)` system call involves a parameter describing an +If your new xyzzy(2) system call involves a parameter describing an offset within a file, make its type ``loff_t`` so that 64-bit offsets can be supported even on 32-bit architectures. -If your new :manpage:`xyzzy(2)` system call involves privileged functionality, +If your new xyzzy(2) system call involves privileged functionality, it needs to be governed by the appropriate Linux capability bit (checked with a call to ``capable()``), as described in the :manpage:`capabilities(7)` man page. Choose an existing capability bit that governs related functionality, @@ -160,7 +160,7 @@ under the same bit, as this goes against capabilities' purpose of splitting the power of root. In particular, avoid adding new uses of the already overly-general ``CAP_SYS_ADMIN`` capability. -If your new :manpage:`xyzzy(2)` system call manipulates a process other than +If your new xyzzy(2) system call manipulates a process other than the calling process, it should be restricted (using a call to ``ptrace_may_access()``) so that only a calling process with the same permissions as the target process, or with the necessary capabilities, can @@ -196,7 +196,7 @@ be cc'ed to linux-api@vger.kernel.org. Generic System Call Implementation ---------------------------------- -The main entry point for your new :manpage:`xyzzy(2)` system call will be called +The main entry point for your new xyzzy(2) system call will be called ``sys_xyzzy()``, but you add this entry point with the appropriate ``SYSCALL_DEFINEn()`` macro rather than explicitly. The 'n' indicates the number of arguments to the system call, and the macro takes the system call name @@ -248,6 +248,52 @@ To summarize, you need a commit that includes: - fallback stub in ``kernel/sys_ni.c`` +.. _syscall_generic_6_11: + +Since 6.11 +~~~~~~~~~~ + +Starting with kernel version 6.11, general system call implementation for the +following architectures no longer requires modifications to +``include/uapi/asm-generic/unistd.h``: + + - arc + - arm64 + - csky + - hexagon + - loongarch + - nios2 + - openrisc + - riscv + +Instead, you need to update ``scripts/syscall.tbl`` and, if applicable, adjust +``arch/*/kernel/Makefile.syscalls``. + +As ``scripts/syscall.tbl`` serves as a common syscall table across multiple +architectures, a new entry is required in this table:: + + 468 common xyzzy sys_xyzzy + +Note that adding an entry to ``scripts/syscall.tbl`` with the "common" ABI +also affects all architectures that share this table. For more limited or +architecture-specific changes, consider using an architecture-specific ABI or +defining a new one. + +If a new ABI, say ``xyz``, is introduced, the corresponding updates should be +made to ``arch/*/kernel/Makefile.syscalls`` as well:: + + syscall_abis_{32,64} += xyz (...) + +To summarize, you need a commit that includes: + + - ``CONFIG`` option for the new function, normally in ``init/Kconfig`` + - ``SYSCALL_DEFINEn(xyzzy, ...)`` for the entry point + - corresponding prototype in ``include/linux/syscalls.h`` + - new entry in ``scripts/syscall.tbl`` + - (if needed) Makefile updates in ``arch/*/kernel/Makefile.syscalls`` + - fallback stub in ``kernel/sys_ni.c`` + + x86 System Call Implementation ------------------------------ @@ -353,6 +399,41 @@ To summarize, you need: ``include/uapi/asm-generic/unistd.h`` +Since 6.11 +~~~~~~~~~~ + +This applies to all the architectures listed in :ref:`Since 6.11<syscall_generic_6_11>` +under "Generic System Call Implementation", except arm64. See +:ref:`Compatibility System Calls (arm64)<compat_arm64>` for more information. + +You need to extend the entry in ``scripts/syscall.tbl`` with an extra column +to indicate that a 32-bit userspace program running on a 64-bit kernel should +hit the compat entry point:: + + 468 common xyzzy sys_xyzzy compat_sys_xyzzy + +To summarize, you need: + + - ``COMPAT_SYSCALL_DEFINEn(xyzzy, ...)`` for the compat entry point + - corresponding prototype in ``include/linux/compat.h`` + - modification of the entry in ``scripts/syscall.tbl`` to include an extra + "compat" column + - (if needed) 32-bit mapping struct in ``include/linux/compat.h`` + + +.. _compat_arm64: + +Compatibility System Calls (arm64) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +On arm64, there is a dedicated syscall table for compatibility system calls +targeting 32-bit (AArch32) userspace: ``arch/arm64/tools/syscall_32.tbl``. +You need to add an additional line to this table specifying the compat +entry point:: + + 468 common xyzzy sys_xyzzy compat_sys_xyzzy + + Compatibility System Calls (x86) -------------------------------- @@ -378,7 +459,7 @@ the compatibility wrapper:: ... 555 x32 xyzzy __x32_compat_sys_xyzzy -If no pointers are involved, then it is preferable to re-use the 64-bit system +If no pointers are involved, then it is preferable to reuse the 64-bit system call for the x32 ABI (and consequently the entry in arch/x86/entry/syscalls/syscall_64.tbl is unchanged). @@ -575,3 +656,6 @@ References and Sources - Recommendation from Linus Torvalds that x32 system calls should prefer compatibility with 64-bit versions rather than 32-bit versions: https://lore.kernel.org/r/CA+55aFxfmwfB7jbbrXxa=K7VBYPfAvmu3XOkGrLbB1UFjX1+Ew@mail.gmail.com + - Patch series revising system call table infrastructure to use + scripts/syscall.tbl across multiple architectures: + https://lore.kernel.org/lkml/20240704143611.2979589-1-arnd@kernel.org |
