Diffstat (limited to 'Documentation')
23 files changed, 355 insertions, 85 deletions
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block index 09a9d4aca0fd..900b3fc4c72d 100644 --- a/Documentation/ABI/stable/sysfs-block +++ b/Documentation/ABI/stable/sysfs-block @@ -886,6 +886,21 @@ Description: zone commands, they will be treated as regular block devices and zoned will report "none". +What: /sys/block/<disk>/queue/zoned_qd1_writes +Date: January 2026 +Contact: Damien Le Moal <dlemoal@kernel.org> +Description: + [RW] zoned_qd1_writes indicates if write operations to a zoned + block device are being handled using a single issuer context (a + kernel thread) operating at a maximum queue depth of 1. This + attribute is visible only for zoned block devices. The default + value for zoned block devices that are not rotational devices + (e.g. ZNS SSDs or zoned UFS devices) is 0. For rotational zoned + block devices (e.g. SMR HDDs) the default value is 1. Since + this default may not be appropriate for some devices, e.g. + remotely connected devices over high latency networks, the user + can disable this feature by setting this attribute to 0. + What: /sys/block/<disk>/hidden Date: March 2023 diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 3a05604c21bf..82d10d556cc8 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -327,6 +327,24 @@ Description: Energy performance preference This file is only present if the cppc-cpufreq driver is in use. +What: /sys/devices/system/cpu/cpuX/cpufreq/perf_limited +Date: February 2026 +Contact: linux-pm@vger.kernel.org +Description: Performance Limited + + Read to check if platform throttling (thermal/power/current + limits) caused delivered performance to fall below the + requested level. A non-zero value indicates throttling occurred. 
+ + Write the bitmask of bits to clear: + + - 0x1 = clear bit 0 (desired performance excursion) + - 0x2 = clear bit 1 (minimum performance excursion) + - 0x3 = clear both bits + + The platform sets these bits; OSPM can only clear them. + + This file is only present if the cppc-cpufreq driver is in use. What: /sys/devices/system/cpu/cpu*/cache/index3/cache_disable_{0,1} Date: August 2008 diff --git a/Documentation/ABI/testing/sysfs-firmware-acpi b/Documentation/ABI/testing/sysfs-firmware-acpi index 72e7c9161ce7..fa33dda331f2 100644 --- a/Documentation/ABI/testing/sysfs-firmware-acpi +++ b/Documentation/ABI/testing/sysfs-firmware-acpi @@ -41,6 +41,12 @@ Description: platform runtime firmware S3 resume, just prior to handoff to the OS waking vector. In nanoseconds. + FBPT: The raw binary contents of the Firmware Basic Boot + Performance Table (FBPT) subtable. + + S3PT: The raw binary contents of the S3 Performance Table + (S3PT) subtable. + What: /sys/firmware/acpi/bgrt/ Date: January 2012 Contact: Matthew Garrett <mjg@redhat.com> diff --git a/Documentation/ABI/testing/sysfs-nvme b/Documentation/ABI/testing/sysfs-nvme new file mode 100644 index 000000000000..499d5f843cd4 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-nvme @@ -0,0 +1,13 @@ +What: /sys/devices/virtual/nvme-fabrics/ctl/.../tls_configured_key +Date: November 2025 +KernelVersion: 6.19 +Contact: Linux NVMe mailing list <linux-nvme@lists.infradead.org> +Description: + The file is available when using a secure concatenation + connection to an NVMe target. Reading the file will return + the serial of the currently negotiated key. + + Writing 0 to the file will trigger a PSK reauthentication + (REPLACETLSPSK) with the target. After a reauthentication + the value returned by tls_configured_key will be the new + serial.
diff --git a/Documentation/PCI/tph.rst b/Documentation/PCI/tph.rst index e8993be64fd6..b6cf22b9bd90 100644 --- a/Documentation/PCI/tph.rst +++ b/Documentation/PCI/tph.rst @@ -79,10 +79,10 @@ To retrieve a Steering Tag for a target memory associated with a specific CPU, use the following function:: int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type type, - unsigned int cpu_uid, u16 *tag); + unsigned int cpu, u16 *tag); The `type` argument is used to specify the memory type, either volatile -or persistent, of the target memory. The `cpu_uid` argument specifies the +or persistent, of the target memory. The `cpu` argument specifies the CPU where the memory is associated to. After the ST value is retrieved, the device driver can use the following diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst index b5cdbba3ec2e..4d886e7c7a95 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.rst +++ b/Documentation/RCU/Design/Requirements/Requirements.rst @@ -2787,6 +2787,13 @@ which avoids the read-side memory barriers, at least for architectures that apply noinstr to kernel entry/exit code (or that build with ``CONFIG_TASKS_TRACE_RCU_NO_MB=y``. +Now that the implementation is based on SRCU-fast, a call +to synchronize_rcu_tasks_trace() implies at least one call to +synchronize_rcu(), that is, every Tasks Trace RCU grace period contains +at least one plain vanilla RCU grace period. Should there ever +be a synchronize_rcu_tasks_trace_expedited(), this guarantee would +*not* necessarily apply to this hypothetical API member. 
+ The tasks-trace-RCU API is also reasonably compact, consisting of rcu_read_lock_trace(), rcu_read_unlock_trace(), rcu_read_lock_trace_held(), call_rcu_tasks_trace(), diff --git a/Documentation/admin-guide/blockdev/zoned_loop.rst b/Documentation/admin-guide/blockdev/zoned_loop.rst index 6aa865424ac3..f4f1f3121bf9 100644 --- a/Documentation/admin-guide/blockdev/zoned_loop.rst +++ b/Documentation/admin-guide/blockdev/zoned_loop.rst @@ -62,7 +62,7 @@ The options available for the add command can be listed by reading the /dev/zloop-control device:: $ cat /dev/zloop-control - add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io + add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,max_open_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io,zone_append=%u,ordered_zone_append,discard_write_cache remove id=%d In more details, the options that can be used with the "add" command are as @@ -80,6 +80,9 @@ zone_capacity_mb Device zone capacity (must always be equal to or lower conv_zones Total number of conventioanl zones starting from sector 0 Default: 8 +max_open_zones Maximum number of open sequential write required zones + (0 for no limit). + Default: 0 base_dir Path to the base directory where to create the directory containing the zone files of the device. Default=/var/local/zloop. @@ -104,6 +107,11 @@ ordered_zone_append Enable zloop mitigation of zone append reordering. (extents), as when enabled, this can significantly reduce the number of data extents needed to for a file data mapping. +discard_write_cache Discard all data that was not explicitly persisted using a + flush operation when the device is removed by truncating + each zone file to the size recorded during the last flush + operation. This simulates power fail events where + uncommitted data is lost. 
=================== ========================================================= 3) Deleting a Zoned Device diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9552819051cd..2075e7a9dcde 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -190,6 +190,14 @@ Kernel parameters unusable. The "log_buf_len" parameter may be useful if you need to capture more output. + acpi.poweroff_on_fatal= [ACPI] + {0 | 1} + Causes the system to poweroff when the ACPI bytecode signals + a fatal error. The default value of this setting is 1. + Overriding this value should only be done for diagnosing + ACPI firmware problems, as the system might behave erratically + after having encountered a fatal ACPI error. + acpi_enforce_resources= [ACPI] { strict | lax | no } Check for resource conflicts between native drivers diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst index 746ea60eed3f..acdd4b65964c 100644 --- a/Documentation/admin-guide/xfs.rst +++ b/Documentation/admin-guide/xfs.rst @@ -550,6 +550,10 @@ For zoned file systems, the following attributes are exposed in: is limited by the capabilities of the backing zoned device, file system size and the max_open_zones mount option. + nr_open_zones (Min: 0 Default: Varies Max: UINTMAX) + This read-only attribute exposes the current number of open zones + used by the file system. + zonegc_low_space (Min: 0 Default: 0 Max: 100) Define a percentage for how much of the unused space that GC should keep available for writing. A high value will reclaim more of the space diff --git a/Documentation/arch/riscv/zicfilp.rst b/Documentation/arch/riscv/zicfilp.rst index 78a3e01ff68c..ab7d8e62ddaf 100644 --- a/Documentation/arch/riscv/zicfilp.rst +++ b/Documentation/arch/riscv/zicfilp.rst @@ -76,34 +76,49 @@ the program. 4. 
prctl() enabling -------------------- -:c:macro:`PR_SET_INDIR_BR_LP_STATUS` / :c:macro:`PR_GET_INDIR_BR_LP_STATUS` / -:c:macro:`PR_LOCK_INDIR_BR_LP_STATUS` are three prctls added to manage indirect -branch tracking. These prctls are architecture-agnostic and return -EINVAL if -the underlying functionality is not supported. +Per-task indirect branch tracking state can be monitored and +controlled via the :c:macro:`PR_GET_CFI` and :c:macro:`PR_SET_CFI` +``prctl()`` arguments (respectively), by supplying +:c:macro:`PR_CFI_BRANCH_LANDING_PADS` as the second argument. These +are architecture-agnostic, and will return -EINVAL if the underlying +functionality is not supported. -* prctl(PR_SET_INDIR_BR_LP_STATUS, unsigned long arg) +* prctl(:c:macro:`PR_SET_CFI`, :c:macro:`PR_CFI_BRANCH_LANDING_PADS`, unsigned long arg) -If arg1 is :c:macro:`PR_INDIR_BR_LP_ENABLE` and if CPU supports -``zicfilp`` then the kernel will enable indirect branch tracking for the -task. The dynamic loader can issue this :c:macro:`prctl` once it has -determined that all the objects loaded in the address space support -indirect branch tracking. Additionally, if there is a `dlopen` to an -object which wasn't compiled with ``zicfilp``, the dynamic loader can -issue this prctl with arg1 set to 0 (i.e. :c:macro:`PR_INDIR_BR_LP_ENABLE` -cleared). - -* prctl(PR_GET_INDIR_BR_LP_STATUS, unsigned long * arg) +arg is a bitmask. -Returns the current status of indirect branch tracking. If enabled -it'll return :c:macro:`PR_INDIR_BR_LP_ENABLE` - -* prctl(PR_LOCK_INDIR_BR_LP_STATUS, unsigned long arg) +If :c:macro:`PR_CFI_ENABLE` is set in arg, and the CPU supports +``zicfilp``, then the kernel will enable indirect branch tracking for +the task. The dynamic loader can issue this ``prctl()`` once it has +determined that all the objects loaded in the address space support +indirect branch tracking. + +Indirect branch tracking state can also be locked once enabled.
This +prevents the task from subsequently disabling it. This is done by +setting the bit :c:macro:`PR_CFI_LOCK` in arg. Either indirect branch +tracking must already be enabled for the task, or the bit +:c:macro:`PR_CFI_ENABLE` must also be set in arg. This is intended +for environments that wish to run with a strict security posture that +do not wish to load objects without ``zicfilp`` support. + +Indirect branch tracking can also be disabled for the task, assuming +that it has not previously been enabled and locked. If there is a +``dlopen()`` to an object which wasn't compiled with ``zicfilp``, the +dynamic loader can issue this ``prctl()`` with arg set to +:c:macro:`PR_CFI_DISABLE`. Disabling indirect branch tracking for the +task is not possible if it has previously been enabled and locked. + + +* prctl(:c:macro:`PR_GET_CFI`, :c:macro:`PR_CFI_BRANCH_LANDING_PADS`, unsigned long * arg) + +Returns the current status of indirect branch tracking into a bitmask +stored into the memory location pointed to by arg. The bitmask will +have the :c:macro:`PR_CFI_ENABLE` bit set if indirect branch tracking +is currently enabled for the task, and if it is locked, will +additionally have the :c:macro:`PR_CFI_LOCK` bit set. If indirect +branch tracking is currently disabled for the task, the +:c:macro:`PR_CFI_DISABLE` bit will be set. -Locks the current status of indirect branch tracking on the task. User -space may want to run with a strict security posture and wouldn't want -loading of objects without ``zicfilp`` support in them, to disallow -disabling of indirect branch tracking. In this case, user space can -use this prctl to lock the current settings. 5. 
violations related to indirect branch tracking -------------------------------------------------- diff --git a/Documentation/block/inline-encryption.rst b/Documentation/block/inline-encryption.rst index 7e0703a12dfb..cae23949a626 100644 --- a/Documentation/block/inline-encryption.rst +++ b/Documentation/block/inline-encryption.rst @@ -153,7 +153,7 @@ blk-crypto-fallback completes the original bio. If the original bio is too large, multiple bounce bios may be required; see the code for details. For decryption, blk-crypto-fallback "wraps" the bio's completion callback -(``bi_complete``) and private data (``bi_private``) with its own, unsets the +(``bi_end_io``) and private data (``bi_private``) with its own, unsets the bio's encryption context, then submits the bio. If the read completes successfully, blk-crypto-fallback restores the bio's original completion callback and private data, then decrypts the bio's data in-place using the diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst index 6ad28039663d..0413dcd9ef69 100644 --- a/Documentation/block/ublk.rst +++ b/Documentation/block/ublk.rst @@ -485,6 +485,125 @@ Limitations in case that too many ublk devices are handled by this single io_ring_ctx and each one has very large queue depth +Shared Memory Zero Copy (UBLK_F_SHMEM_ZC) +------------------------------------------ + +The ``UBLK_F_SHMEM_ZC`` feature provides an alternative zero-copy path +that works by sharing physical memory pages between the client application +and the ublk server. Unlike the io_uring fixed buffer approach above, +shared memory zero copy does not require io_uring buffer registration +per I/O — instead, it relies on the kernel matching physical pages +at I/O time. This allows the ublk server to access the shared +buffer directly, unlike the io_uring fixed buffer +approach.
+ +Motivation +~~~~~~~~~~ + +Shared memory zero copy takes a different approach: if the client +application and the ublk server both map the same physical memory, there is +nothing to copy. The kernel detects the shared pages automatically and +tells the server where the data already lives. + +``UBLK_F_SHMEM_ZC`` can be thought of as a supplement for optimized client +applications — when the client is willing to allocate I/O buffers from +shared memory, the entire data path becomes zero-copy. + +Use Cases +~~~~~~~~~ + +This feature is useful when the client application can be configured to +use a specific shared memory region for its I/O buffers: + +- **Custom storage clients** that allocate I/O buffers from shared memory + (memfd, hugetlbfs) and issue direct I/O to the ublk device +- **Database engines** that use pre-allocated buffer pools with O_DIRECT + +How It Works +~~~~~~~~~~~~ + +1. The ublk server and client both ``mmap()`` the same file (memfd or + hugetlbfs) with ``MAP_SHARED``. This gives both processes access to the + same physical pages. + +2. The ublk server registers its mapping with the kernel:: + + struct ublk_shmem_buf_reg buf = { .addr = mmap_va, .len = size }; + ublk_ctrl_cmd(UBLK_U_CMD_REG_BUF, .addr = &buf); + + The kernel pins the pages and builds a PFN lookup tree. + +3. When the client issues direct I/O (``O_DIRECT``) to ``/dev/ublkb*``, + the kernel checks whether the I/O buffer pages match any registered + pages by comparing PFNs. + +4. On a match, the kernel sets ``UBLK_IO_F_SHMEM_ZC`` in the I/O + descriptor and encodes the buffer index and offset in ``addr``:: + + if (iod->op_flags & UBLK_IO_F_SHMEM_ZC) { + /* Data is already in our shared mapping — zero copy */ + index = ublk_shmem_zc_index(iod->addr); + offset = ublk_shmem_zc_offset(iod->addr); + buf = shmem_table[index].mmap_base + offset; + } + +5. If pages do not match (e.g., the client used a non-shared buffer), + the I/O falls back to the normal copy path silently. 
+ +The shared memory can be set up via two methods: + +- **Socket-based**: the client sends a memfd to the ublk server via + ``SCM_RIGHTS`` on a unix socket. The server mmaps and registers it. +- **Hugetlbfs-based**: both processes ``mmap(MAP_SHARED)`` the same + hugetlbfs file. No IPC needed — same file gives same physical pages. + +Advantages +~~~~~~~~~~ + +- **Simple**: no per-I/O buffer registration or unregistration commands. + Once the shared buffer is registered, all matching I/O is zero-copy + automatically. +- **Direct buffer access**: the ublk server can read and write the shared + buffer directly via its own mmap, without going through io_uring fixed + buffer operations. This is more friendly for server implementations. +- **Fast**: PFN matching is a single maple tree lookup per bvec. No + io_uring command round-trips for buffer management. +- **Compatible**: non-matching I/O silently falls back to the copy path. + The device works normally for any client, with zero-copy as an + optimization when shared memory is available. + +Limitations +~~~~~~~~~~~ + +- **Requires client cooperation**: the client must allocate its I/O + buffers from the shared memory region. This requires a custom or + configured client — standard applications using their own buffers + will not benefit. +- **Direct I/O only**: buffered I/O (without ``O_DIRECT``) goes through + the page cache, which allocates its own pages. These kernel-allocated + pages will never match the registered shared buffer. Only ``O_DIRECT`` + puts the client's buffer pages directly into the block I/O. +- **Contiguous data only**: each I/O request's data must be contiguous + within a single registered buffer. Scatter/gather I/O that spans + multiple non-adjacent registered buffers cannot use the zero-copy path. + +Control Commands +~~~~~~~~~~~~~~~~ + +- ``UBLK_U_CMD_REG_BUF`` + + Register a shared memory buffer. 
``ctrl_cmd.addr`` points to a + ``struct ublk_shmem_buf_reg`` containing the buffer virtual address and size. + Returns the assigned buffer index (>= 0) on success. The kernel pins + pages and builds the PFN lookup tree. Queue freeze is handled + internally. + +- ``UBLK_U_CMD_UNREG_BUF`` + + Unregister a previously registered buffer. ``ctrl_cmd.data[0]`` is the + buffer index. Unpins pages and removes PFN entries from the lookup + tree. + References ========== diff --git a/Documentation/devicetree/bindings/display/msm/qcom,qcm2290-mdss.yaml b/Documentation/devicetree/bindings/display/msm/qcom,qcm2290-mdss.yaml index f0cdb5422688..bb09ecd1a5b4 100644 --- a/Documentation/devicetree/bindings/display/msm/qcom,qcm2290-mdss.yaml +++ b/Documentation/devicetree/bindings/display/msm/qcom,qcm2290-mdss.yaml @@ -33,7 +33,7 @@ properties: - const: core iommus: - maxItems: 2 + maxItems: 1 interconnects: items: @@ -107,8 +107,7 @@ examples: interconnect-names = "mdp0-mem", "cpu-cfg"; - iommus = <&apps_smmu 0x420 0x2>, - <&apps_smmu 0x421 0x0>; + iommus = <&apps_smmu 0x420 0x2>; ranges; display-controller@5e01000 { diff --git a/Documentation/devicetree/bindings/media/qcom,qcm2290-venus.yaml b/Documentation/devicetree/bindings/media/qcom,qcm2290-venus.yaml index 3f3ee82fc878..7e6dc410c2d2 100644 --- a/Documentation/devicetree/bindings/media/qcom,qcm2290-venus.yaml +++ b/Documentation/devicetree/bindings/media/qcom,qcm2290-venus.yaml @@ -42,7 +42,7 @@ properties: - const: vcodec0_bus iommus: - maxItems: 5 + maxItems: 2 interconnects: maxItems: 2 @@ -102,10 +102,7 @@ examples: memory-region = <&pil_video_mem>; iommus = <&apps_smmu 0x860 0x0>, - <&apps_smmu 0x880 0x0>, - <&apps_smmu 0x861 0x04>, - <&apps_smmu 0x863 0x0>, - <&apps_smmu 0x804 0xe0>; + <&apps_smmu 0x880 0x0>; interconnects = <&mmnrt_virt MASTER_VIDEO_P0 RPM_ALWAYS_TAG &bimc SLAVE_EBI1 RPM_ALWAYS_TAG>, diff --git a/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml 
b/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml index 2bd3efff2485..215f14d1897d 100644 --- a/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml +++ b/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml @@ -42,7 +42,7 @@ properties: - const: mgbe - const: mac - const: mac-divider - - const: ptp-ref + - const: ptp_ref - const: rx-input-m - const: rx-input - const: tx @@ -133,7 +133,7 @@ examples: <&bpmp TEGRA234_CLK_MGBE0_RX_PCS_M>, <&bpmp TEGRA234_CLK_MGBE0_RX_PCS>, <&bpmp TEGRA234_CLK_MGBE0_TX_PCS>; - clock-names = "mgbe", "mac", "mac-divider", "ptp-ref", "rx-input-m", + clock-names = "mgbe", "mac", "mac-divider", "ptp_ref", "rx-input-m", "rx-input", "tx", "eee-pcs", "rx-pcs-input", "rx-pcs-m", "rx-pcs", "tx-pcs"; resets = <&bpmp TEGRA234_RESET_MGBE0_MAC>, diff --git a/Documentation/devicetree/bindings/sound/ti,tas2552.yaml b/Documentation/devicetree/bindings/sound/ti,tas2552.yaml index 10369aa5f0a8..85e3ebd2acd8 100644 --- a/Documentation/devicetree/bindings/sound/ti,tas2552.yaml +++ b/Documentation/devicetree/bindings/sound/ti,tas2552.yaml @@ -12,8 +12,8 @@ maintainers: - Baojun Xu <baojun.xu@ti.com> description: > - The TAS2552 can receive its reference clock via MCLK, BCLK, IVCLKIN pin or - use the internal 1.8MHz. This CLKIN is used by the PLL. In addition to PLL, + The TAS2552 can receive its reference clock via MCLK, BCLK, IVCLKIN pin or + use the internal 1.8MHz. This CLKIN is used by the PLL. In addition to PLL, the PDM reference clock is also selectable: PLL, IVCLKIN, BCLK or MCLK. 
For system integration the dt-bindings/sound/tas2552.h header file provides @@ -34,6 +34,9 @@ properties: maxItems: 1 description: gpio pin to enable/disable the device + '#sound-dai-cells': + const: 0 + required: - compatible - reg @@ -41,7 +44,10 @@ required: - iovdd-supply - avdd-supply -additionalProperties: false +allOf: + - $ref: dai-common.yaml# + +unevaluatedProperties: false examples: - | @@ -54,6 +60,7 @@ examples: audio-codec@41 { compatible = "ti,tas2552"; reg = <0x41>; + #sound-dai-cells = <0>; vbat-supply = <&reg_vbat>; iovdd-supply = <&reg_iovdd>; avdd-supply = <&reg_avdd>; diff --git a/Documentation/filesystems/mount_api.rst b/Documentation/filesystems/mount_api.rst index a064234fed5b..e8b94357b4df 100644 --- a/Documentation/filesystems/mount_api.rst +++ b/Documentation/filesystems/mount_api.rst @@ -647,9 +647,7 @@ The members are as follows: fs_param_is_u64 64-bit unsigned int result->uint_64 fs_param_is_enum Enum value name result->uint_32 fs_param_is_string Arbitrary string param->string - fs_param_is_blob Binary blob param->blob fs_param_is_blockdev Blockdev path * Needs lookup - fs_param_is_path Path * Needs lookup fs_param_is_fd File descriptor result->int_32 fs_param_is_uid User ID (u32) result->uid fs_param_is_gid Group ID (u32) result->gid @@ -681,9 +679,7 @@ The members are as follows: fsparam_u64() fs_param_is_u64 fsparam_enum() fs_param_is_enum fsparam_string() fs_param_is_string - fsparam_blob() fs_param_is_blob fsparam_bdev() fs_param_is_blockdev - fsparam_path() fs_param_is_path fsparam_fd() fs_param_is_fd fsparam_uid() fs_param_is_uid fsparam_gid() fs_param_is_gid diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index 52ff1d19405b..d02aa57e4477 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -1361,3 +1361,17 @@ to match what strlen() would return if it was ran on the string.
However, if the string is freely accessible for the duration of inode's lifetime, consider using inode_set_cached_link() instead. + +--- + +**mandatory** + +lookup_one_qstr_excl() is no longer exported - use start_creating() or +similar. +--- + +**mandatory** + +lock_rename(), lock_rename_child(), unlock_rename() are no +longer available. Use start_renaming() or similar. + diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst index 6b373e193548..84156d031365 100644 --- a/Documentation/process/changes.rst +++ b/Documentation/process/changes.rst @@ -31,8 +31,8 @@ you probably needn't concern yourself with pcmciautils. ====================== =============== ======================================== GNU C 8.1 gcc --version Clang/LLVM (optional) 15.0.0 clang --version -Rust (optional) 1.78.0 rustc --version -bindgen (optional) 0.65.1 bindgen --version +Rust (optional) 1.85.0 rustc --version +bindgen (optional) 0.71.1 bindgen --version GNU make 4.0 make --version bash 4.2 bash --version binutils 2.30 ld -v diff --git a/Documentation/rust/general-information.rst b/Documentation/rust/general-information.rst index 6146b49b6a98..09234bed272c 100644 --- a/Documentation/rust/general-information.rst +++ b/Documentation/rust/general-information.rst @@ -157,5 +157,5 @@ numerical comparisons, one may define a new Kconfig symbol: ..
code-block:: kconfig - config RUSTC_VERSION_MIN_107900 - def_bool y if RUSTC_VERSION >= 107900 + config RUSTC_HAS_SPAN_FILE + def_bool RUSTC_VERSION >= 108800 diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst index 152289f0bed2..a6ec3fa94d33 100644 --- a/Documentation/rust/quick-start.rst +++ b/Documentation/rust/quick-start.rst @@ -57,8 +57,8 @@ of the box, e.g.:: Gentoo Linux ************ -Gentoo Linux (and especially the testing branch) provides recent Rust releases -and thus it should generally work out of the box, e.g.:: +Gentoo Linux provides recent Rust releases and thus it should generally work out +of the box, e.g.:: USE='rust-src rustfmt clippy' emerge dev-lang/rust dev-util/bindgen @@ -68,8 +68,8 @@ and thus it should generally work out of the box, e.g.:: Nix *** -Nix (unstable channel) provides recent Rust releases and thus it should -generally work out of the box, e.g.:: +Nix provides recent Rust releases and thus it should generally work out of the +box, e.g.:: { pkgs ? 
import <nixpkgs> {} }: pkgs.mkShell { @@ -84,16 +84,13 @@ openSUSE openSUSE Slowroll and openSUSE Tumbleweed provide recent Rust releases and thus they should generally work out of the box, e.g.:: - zypper install rust rust1.79-src rust-bindgen clang + zypper install rust rust-src rust-bindgen clang Ubuntu ****** -25.04 -~~~~~ - -The latest Ubuntu releases provide recent Rust releases and thus they should +Ubuntu 25.10 and 26.04 LTS provide recent Rust releases and thus they should generally work out of the box, e.g.:: apt install rustc rust-src bindgen rustfmt rust-clippy @@ -112,33 +109,33 @@ Though Ubuntu 24.04 LTS and older versions still provide recent Rust releases, they require some additional configuration to be set, using the versioned packages, e.g.:: - apt install rustc-1.80 rust-1.80-src bindgen-0.65 rustfmt-1.80 \ - rust-1.80-clippy - ln -s /usr/lib/rust-1.80/bin/rustfmt /usr/bin/rustfmt-1.80 - ln -s /usr/lib/rust-1.80/bin/clippy-driver /usr/bin/clippy-driver-1.80 + apt install rustc-1.85 rust-1.85-src bindgen-0.71 rustfmt-1.85 \ + rust-1.85-clippy + ln -s /usr/lib/rust-1.85/bin/rustfmt /usr/bin/rustfmt-1.85 + ln -s /usr/lib/rust-1.85/bin/clippy-driver /usr/bin/clippy-driver-1.85 None of these packages set their tools as defaults; therefore they should be specified explicitly, e.g.:: - make LLVM=1 RUSTC=rustc-1.80 RUSTDOC=rustdoc-1.80 RUSTFMT=rustfmt-1.80 \ - CLIPPY_DRIVER=clippy-driver-1.80 BINDGEN=bindgen-0.65 + make LLVM=1 RUSTC=rustc-1.85 RUSTDOC=rustdoc-1.85 RUSTFMT=rustfmt-1.85 \ + CLIPPY_DRIVER=clippy-driver-1.85 BINDGEN=bindgen-0.71 -Alternatively, modify the ``PATH`` variable to place the Rust 1.80 binaries +Alternatively, modify the ``PATH`` variable to place the Rust 1.85 binaries first and set ``bindgen`` as the default, e.g.:: - PATH=/usr/lib/rust-1.80/bin:$PATH + PATH=/usr/lib/rust-1.85/bin:$PATH update-alternatives --install /usr/bin/bindgen bindgen \ - /usr/bin/bindgen-0.65 100 - update-alternatives --set bindgen /usr/bin/bindgen-0.65 + 
/usr/bin/bindgen-0.71 100 + update-alternatives --set bindgen /usr/bin/bindgen-0.71 -``RUST_LIB_SRC`` needs to be set when using the versioned packages, e.g.:: +``RUST_LIB_SRC`` may need to be set when using the versioned packages, e.g.:: - RUST_LIB_SRC=/usr/src/rustc-$(rustc-1.80 --version | cut -d' ' -f2)/library + RUST_LIB_SRC=/usr/src/rustc-$(rustc-1.85 --version | cut -d' ' -f2)/library For convenience, ``RUST_LIB_SRC`` can be exported to the global environment. -In addition, ``bindgen-0.65`` is available in newer releases (24.04 LTS and -24.10), but it may not be available in older ones (20.04 LTS and 22.04 LTS), +In addition, ``bindgen-0.71`` is available in newer releases (24.04 LTS), +but it may not be available in older ones (20.04 LTS and 22.04 LTS), thus ``bindgen`` may need to be built manually (please see below). @@ -355,12 +352,3 @@ Hacking To dive deeper, take a look at the source code of the samples at ``samples/rust/``, the Rust support code under ``rust/`` and the ``Rust hacking`` menu under ``Kernel hacking``. - -If GDB/Binutils is used and Rust symbols are not getting demangled, the reason -is the toolchain does not support Rust's new v0 mangling scheme yet. -There are a few ways out: - -- Install a newer release (GDB >= 10.2, Binutils >= 2.36). - -- Some versions of GDB (e.g. vanilla GDB 10.1) are able to use - the pre-demangled names embedded in the debug info (``CONFIG_DEBUG_INFO``). diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst index 3e4d4d04cfae..c5186526e76f 100644 --- a/Documentation/security/landlock.rst +++ b/Documentation/security/landlock.rst @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation ================================== :Author: Mickaël Salaün -:Date: September 2025 +:Date: March 2026 Landlock's goal is to create scoped access-control (i.e. sandboxing). 
To harden a whole system, this feature should be available to any process, @@ -89,6 +89,46 @@ this is required to keep access controls consistent over the whole system, and this avoids unattended bypasses through file descriptor passing (i.e. confused deputy attack). +.. _scoped-flags-interaction: + +Interaction between scoped flags and other access rights +-------------------------------------------------------- + +The ``scoped`` flags in &struct landlock_ruleset_attr restrict the +use of *outgoing* IPC from the created Landlock domain, while they +permit reaching out to IPC endpoints *within* the created Landlock +domain. + +In the future, scoped flags *may* interact with other access rights, +e.g. so that abstract UNIX sockets can be allow-listed by name, or so +that signals can be allow-listed by signal number or target process. + +When introducing ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``, we defined it to +implicitly have the same scoping semantics as a +``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` flag would have: connecting to +UNIX sockets within the same domain (where +``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` is used) is unconditionally +allowed. + +The reasoning is: + +* Like other IPC mechanisms, connecting to named UNIX sockets in the + same domain should be expected and harmless. (If needed, users can + further refine their Landlock policies with nested domains or by + restricting ``LANDLOCK_ACCESS_FS_MAKE_SOCK``.) +* We reserve the option to still introduce + ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the future. (This would + be useful if we wanted to have a Landlock rule to permit IPC access + to other Landlock domains.) +* But we can postpone the point in time when users have to deal with + two interacting flags visible in the userspace API. (In particular, + it is possible that it won't be needed in practice, in which case we + can avoid the second flag altogether.) 
+* If we *do* introduce ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the + future, setting this scoped flag in a ruleset does *not reduce* the + restrictions, because access within the same scope is already + allowed based on ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``. + Tests ===== diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst index 7f86d7a37dc2..fd8b78c31f2f 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -77,7 +77,8 @@ to be explicit about the denied-by-default access rights. LANDLOCK_ACCESS_FS_MAKE_SYM | LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_TRUNCATE | - LANDLOCK_ACCESS_FS_IOCTL_DEV, + LANDLOCK_ACCESS_FS_IOCTL_DEV | + LANDLOCK_ACCESS_FS_RESOLVE_UNIX, .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | LANDLOCK_ACCESS_NET_CONNECT_TCP, @@ -127,6 +128,10 @@ version, and only use the available subset of access rights: /* Removes LANDLOCK_SCOPE_* for ABI < 6 */ ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET | LANDLOCK_SCOPE_SIGNAL); + __attribute__((fallthrough)); + case 6 ... 8: + /* Removes LANDLOCK_ACCESS_FS_RESOLVE_UNIX for ABI < 9 */ + ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_RESOLVE_UNIX; } This enables the creation of an inclusive ruleset that will contain our rules. @@ -378,8 +383,8 @@ Truncating files The operations covered by ``LANDLOCK_ACCESS_FS_WRITE_FILE`` and ``LANDLOCK_ACCESS_FS_TRUNCATE`` both change the contents of a file and sometimes -overlap in non-intuitive ways. It is recommended to always specify both of -these together. +overlap in non-intuitive ways. It is strongly recommended to always specify +both of these together (either granting both, or granting none). A particularly surprising example is :manpage:`creat(2)`. The name suggests that this system call requires the rights to create and write files. 
However, @@ -391,6 +396,10 @@ It should also be noted that truncating files does not require the system call, this can also be done through :manpage:`open(2)` with the flags ``O_RDONLY | O_TRUNC``. +At the same time, on some filesystems, :manpage:`fallocate(2)` offers a way to +shorten file contents with ``FALLOC_FL_COLLAPSE_RANGE`` when the file is opened +for writing, sidestepping the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right. + The truncate right is associated with the opened file (see below). Rights associated with file descriptors @@ -700,6 +709,13 @@ enforce Landlock rulesets across all threads of the calling process using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to sys_landlock_restrict_self(). +Pathname UNIX sockets (ABI < 9) +------------------------------- + +Starting with the Landlock ABI version 9, it is possible to restrict +connections to pathname UNIX domain sockets (:manpage:`unix(7)`) using +the new ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` right. + .. _kernel_support: Kernel support
