author | Jonathan Corbet <corbet@lwn.net> | 2020-03-02 15:26:38 -0700 |
---|---|---|
committer | Jonathan Corbet <corbet@lwn.net> | 2020-03-10 11:21:08 -0600 |
commit | 6505a18e66876e0f502dcba5a563bd3048094048 (patch) | |
tree | 78feaf379f5bd3dd03eef8f2d81867e33ffc917e /Documentation/core-api | |
parent | 2b4cbd5c950525b6d4d2cd384dcefdd95fedabe3 (diff) | |
docs: move core-api/ioctl.rst to driver-api/
The ioctl() documentation belongs with the rest of the driver-oriented
info, so move it there.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/index.rst | 1 |
-rw-r--r-- | Documentation/core-api/ioctl.rst | 253 |
2 files changed, 0 insertions, 254 deletions
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 9836a0ac09a3..0897ad12c119 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -102,7 +102,6 @@ Documents that don't fit elsewhere or which have yet to be categorized.
    :maxdepth: 1
 
    librs
-   ioctl
 
 .. only:: subproject and html
 
diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst
deleted file mode 100644
index c455db0e1627..000000000000
--- a/Documentation/core-api/ioctl.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-======================
-ioctl based interfaces
-======================
-
-ioctl() is the most common way for applications to interface
-with device drivers. It is flexible and easily extended by adding new
-commands and can be passed through character devices, block devices as
-well as sockets and other special file descriptors.
-
-However, it is also very easy to get ioctl command definitions wrong,
-and hard to fix them later without breaking existing applications,
-so this documentation tries to help developers get it right.
-
-Command number definitions
-==========================
-
-The command number, or request number, is the second argument passed to
-the ioctl system call. While this can be any 32-bit number that uniquely
-identifies an action for a particular driver, there are a number of
-conventions around defining them.
-
-``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
-ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
-``_IOW``, and ``_IOWR``. These should be used for all new commands,
-with the correct parameters:
-
-_IO/_IOR/_IOW/_IOWR
-    The macro name specifies how the argument will be used. It may be a
-    pointer to data to be passed into the kernel (_IOW), out of the kernel
-    (_IOR), or both (_IOWR). _IO can indicate either commands with no
-    argument or those passing an integer value instead of a pointer.
-    It is recommended to only use _IO for commands without arguments,
-    and use pointers for passing data.
-
-type
-    An 8-bit number, often a character literal, specific to a subsystem
-    or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`
-
-nr
-    An 8-bit number identifying the specific command, unique for a give
-    value of 'type'
-
-data_type
-    The name of the data type pointed to by the argument, the command number
-    encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
-    leading to a limit of 8191 bytes for the maximum size of the argument.
-    Note: do not pass sizeof(data_type) type into _IOR/_IOW/IOWR, as that
-    will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
-    _IO does not have a data_type parameter.
-
-
-Interface versions
-==================
-
-Some subsystems use version numbers in data structures to overload
-commands with different interpretations of the argument.
-
-This is generally a bad idea, since changes to existing commands tend
-to break existing applications.
-
-A better approach is to add a new ioctl command with a new number. The
-old command still needs to be implemented in the kernel for compatibility,
-but this can be a wrapper around the new implementation.
-
-Return code
-===========
-
-ioctl commands can return negative error codes as documented in errno(3);
-these get turned into errno values in user space. On success, the return
-code should be zero. It is also possible but not recommended to return
-a positive 'long' value.
-
-When the ioctl callback is called with an unknown command number, the
-handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
--ENOTTY being returned from the system call. Some subsystems return
--ENOSYS or -EINVAL here for historic reasons, but this is wrong.
-
-Prior to Linux 5.5, compat_ioctl handlers were required to return
--ENOIOCTLCMD in order to use the fallback conversion into native
-commands. As all subsystems are now responsible for handling compat
-mode themselves, this is no longer needed, but it may be important to
-consider when backporting bug fixes to older kernels.
-
-Timestamps
-==========
-
-Traditionally, timestamps and timeout values are passed as ``struct
-timespec`` or ``struct timeval``, but these are problematic because of
-incompatible definitions of these structures in user space after the
-move to 64-bit time_t.
-
-The ``struct __kernel_timespec`` type can be used instead to be embedded
-in other data structures when separate second/nanosecond values are
-desired, or passed to user space directly. This is still not ideal though,
-as the structure matches neither the kernel's timespec64 nor the user
-space timespec exactly. The get_timespec64() and put_timespec64() helper
-functions can be used to ensure that the layout remains compatible with
-user space and the padding is treated correctly.
-
-As it is cheap to convert seconds to nanoseconds, but the opposite
-requires an expensive 64-bit division, a simple __u64 nanosecond value
-can be simpler and more efficient.
-
-Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
-as returned by ktime_get_ns() or ktime_get_ts64(). Unlike
-CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
-or forwards due to leap second adjustments and clock_settime() calls.
-
-ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
-need to be persistent across a reboot or between multiple machines.
-
-32-bit compat mode
-==================
-
-In order to support 32-bit user space running on a 64-bit machine, each
-subsystem or driver that implements an ioctl callback handler must also
-implement the corresponding compat_ioctl handler.
-
-As long as all the rules for data structures are followed, this is as
-easy as setting the .compat_ioctl pointer to a helper function such as
-compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
-
-compat_ptr()
-------------
-
-On the s390 architecture, 31-bit user space has ambiguous representations
-for data pointers, with the upper bit being ignored. When running such
-a process in compat mode, the compat_ptr() helper must be used to
-clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
-pointer. On other architectures, this macro only performs a cast to a
-``void __user *`` pointer.
-
-In an compat_ioctl() callback, the last argument is an unsigned long,
-which can be interpreted as either a pointer or a scalar depending on
-the command. If it is a scalar, then compat_ptr() must not be used, to
-ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
-for arguments with the upper bit set.
-
-The compat_ptr_ioctl() helper can be used in place of a custom
-compat_ioctl file operation for drivers that only take arguments that
-are pointers to compatible data structures.
-
-Structure layout
-----------------
-
-Compatible data structures have the same layout on all architectures,
-avoiding all problematic members:
-
-* ``long`` and ``unsigned long`` are the size of a register, so
-  they can be either 32-bit or 64-bit wide and cannot be used in portable
-  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
-  ``__s64`` and ``__u64``.
-
-* Pointers have the same problem, in addition to requiring the
-  use of compat_ptr(). The best workaround is to use ``__u64``
-  in place of pointers, which requires a cast to ``uintptr_t`` in user
-  space, and the use of u64_to_user_ptr() in the kernel to convert
-  it back into a user pointer.
-
-* On the x86-32 (i386) architecture, the alignment of 64-bit variables
-  is only 32-bit, but they are naturally aligned on most other
-  architectures including x86-64. This means a structure like::
-
-    struct foo {
-        __u32 a;
-        __u64 b;
-        __u32 c;
-    };
-
-  has four bytes of padding between a and b on x86-64, plus another four
-  bytes of padding at the end, but no padding on i386, and it needs a
-  compat_ioctl conversion handler to translate between the two formats.
-
-  To avoid this problem, all structures should have their members
-  naturally aligned, or explicit reserved fields added in place of the
-  implicit padding. The ``pahole`` tool can be used for checking the
-  alignment.
-
-* On ARM OABI user space, structures are padded to multiples of 32-bit,
-  making some structs incompatible with modern EABI kernels if they
-  do not end on a 32-bit boundary.
-
-* On the m68k architecture, struct members are not guaranteed to have an
-  alignment greater than 16-bit, which is a problem when relying on
-  implicit padding.
-
-* Bitfields and enums generally work as one would expect them to,
-  but some properties of them are implementation-defined, so it is better
-  to avoid them completely in ioctl interfaces.
-
-* ``char`` members can be either signed or unsigned, depending on
-  the architecture, so the __u8 and __s8 types should be used for 8-bit
-  integer values, though char arrays are clearer for fixed-length strings.
-
-Information leaks
-=================
-
-Uninitialized data must not be copied back to user space, as this can
-cause an information leak, which can be used to defeat kernel address
-space layout randomization (KASLR), helping in an attack.
-
-For this reason (and for compat support) it is best to avoid any
-implicit padding in data structures. Where there is implicit padding
-in an existing structure, kernel drivers must be careful to fully
-initialize an instance of the structure before copying it to user
-space. This is usually done by calling memset() before assigning to
-individual members.
-
-Subsystem abstractions
-======================
-
-While some device drivers implement their own ioctl function, most
-subsystems implement the same command for multiple drivers. Ideally the
-subsystem has an .ioctl() handler that copies the arguments from and
-to user space, passing them into subsystem specific callback functions
-through normal kernel pointers.
-
-This helps in various ways:
-
-* Applications written for one driver are more likely to work for
-  another one in the same subsystem if there are no subtle differences
-  in the user space ABI.
-
-* The complexity of user space access and data structure layout is done
-  in one place, reducing the potential for implementation bugs.
-
-* It is more likely to be reviewed by experienced developers
-  that can spot problems in the interface when the ioctl is shared
-  between multiple drivers than when it is only used in a single driver.
-
-Alternatives to ioctl
-=====================
-
-There are many cases in which ioctl is not the best solution for a
-problem. Alternatives include:
-
-* System calls are a better choice for a system-wide feature that
-  is not tied to a physical device or constrained by the file system
-  permissions of a character device node
-
-* netlink is the preferred way of configuring any network related
-  objects through sockets.
-
-* debugfs is used for ad-hoc interfaces for debugging functionality
-  that does not need to be exposed as a stable interface to applications.
-
-* sysfs is a good way to expose the state of an in-kernel object
-  that is not tied to a file descriptor.
-
-* configfs can be used for more complex configuration than sysfs
-
-* A custom file system can provide extra flexibility with a simple
-  user interface but adds a lot of complexity to the implementation.
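
The command-number and structure-layout rules in the removed ``ioctl.rst`` text above can be sketched in user-space C. This is only an illustration under stated assumptions: ``struct foo_args``, ``FOO_IOC_SET``, ``FOO_IOC_WRONG``, ``foo_cmd_size()``, the type character ``'x'`` and the command number 1 are hypothetical and are not taken from any real driver or from the ioctl-number list. The sketch relies on the ``_IOW`` and ``_IOC_*`` macros that ``<linux/ioctl.h>`` exports to user space.

```c
#include <assert.h>       /* static_assert (C11) */
#include <stddef.h>       /* offsetof, size_t */
#include <linux/ioctl.h>  /* _IOW, _IOC_SIZE, _IOC_TYPE, _IOC_NR, _IOC_DIR */
#include <linux/types.h>  /* __u32, __u64 */

/*
 * Hypothetical argument structure: fixed-width members, naturally
 * aligned, with an explicit reserved field where the compiler would
 * otherwise insert implicit padding on most 64-bit architectures.
 */
struct foo_args {
	__u32 a;
	__u32 reserved;	/* explicit, instead of an implicit hole before b */
	__u64 b;
};

/* The layout is the same whether __u64 is 4-byte or 8-byte aligned. */
static_assert(offsetof(struct foo_args, b) == 8, "b must sit at offset 8");
static_assert(sizeof(struct foo_args) == 16, "no trailing padding expected");

/* Pass the type itself; the macro applies sizeof() internally. */
#define FOO_IOC_SET	_IOW('x', 1, struct foo_args)

/* The mistake the text warns about: this encodes sizeof(size_t). */
#define FOO_IOC_WRONG	_IOW('x', 1, sizeof(struct foo_args))

/* Return the argument size encoded in a command number. */
static inline unsigned int foo_cmd_size(unsigned int cmd)
{
	return _IOC_SIZE(cmd);
}
```

With these definitions, ``foo_cmd_size(FOO_IOC_SET)`` yields 16, the size of ``struct foo_args``, while ``foo_cmd_size(FOO_IOC_WRONG)`` yields ``sizeof(size_t)``, so the two commands get different numbers even though they were meant to be the same.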