summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide/LSM
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide/LSM')
-rw-r--r--Documentation/admin-guide/LSM/SELinux.rst11
-rw-r--r--Documentation/admin-guide/LSM/SafeSetID.rst2
-rw-r--r--Documentation/admin-guide/LSM/Smack.rst16
-rw-r--r--Documentation/admin-guide/LSM/index.rst1
-rw-r--r--Documentation/admin-guide/LSM/ipe.rst86
-rw-r--r--Documentation/admin-guide/LSM/landlock.rst189
6 files changed, 279 insertions, 26 deletions
diff --git a/Documentation/admin-guide/LSM/SELinux.rst b/Documentation/admin-guide/LSM/SELinux.rst
index 520a1c2c6fd2..cdd65164ca96 100644
--- a/Documentation/admin-guide/LSM/SELinux.rst
+++ b/Documentation/admin-guide/LSM/SELinux.rst
@@ -2,6 +2,17 @@
SELinux
=======
+Information about the SELinux kernel subsystem can be found at the
+following links:
+
+ https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git/tree/README.md
+
+ https://github.com/selinuxproject/selinux-kernel/wiki
+
+Information about the SELinux userspace can be found at:
+
+ https://github.com/SELinuxProject/selinux/wiki
+
If you want to use SELinux, chances are you will want
to use the distro-provided policies, or install the
latest reference policy release from
diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
index 0ec34863c674..6d439c987563 100644
--- a/Documentation/admin-guide/LSM/SafeSetID.rst
+++ b/Documentation/admin-guide/LSM/SafeSetID.rst
@@ -41,7 +41,7 @@ namespace). The higher level goal is to allow for uid-based sandboxing of system
services without having to give out CAP_SETUID all over the place just so that
non-root programs can drop to even-lesser-privileged uids. This is especially
relevant when one non-root daemon on the system should be allowed to spawn other
-processes as different uids, but its undesirable to give the daemon a
+processes as different uids, but it's undesirable to give the daemon a
basically-root-equivalent CAP_SETUID.
diff --git a/Documentation/admin-guide/LSM/Smack.rst b/Documentation/admin-guide/LSM/Smack.rst
index 6d44f4fdbf59..c5ed775f2d10 100644
--- a/Documentation/admin-guide/LSM/Smack.rst
+++ b/Documentation/admin-guide/LSM/Smack.rst
@@ -601,10 +601,15 @@ specification.
Task Attribute
~~~~~~~~~~~~~~
-The Smack label of a process can be read from /proc/<pid>/attr/current. A
-process can read its own Smack label from /proc/self/attr/current. A
+The Smack label of a process can be read from ``/proc/<pid>/attr/current``. A
+process can read its own Smack label from ``/proc/self/attr/current``. A
privileged process can change its own Smack label by writing to
-/proc/self/attr/current but not the label of another process.
+``/proc/self/attr/current`` but not the label of another process.
+
+Format of writing is : only the label or the label followed by one of the
+3 trailers: ``\n`` (by common agreement for ``/proc/...`` interfaces),
+``\0`` (because some applications incorrectly include it),
+``\n\0`` (because we think some applications may incorrectly include it).
File Attribute
~~~~~~~~~~~~~~
@@ -696,6 +701,11 @@ sockets.
A privileged program may set this to match the label of another
task with which it hopes to communicate.
+UNIX domain socket (UDS) with a BSD address functions both as a file in a
+filesystem and as a socket. As a file, it carries the SMACK64 attribute. This
+attribute is not involved in Smack security enforcement and is immutably
+assigned the label "*".
+
Smack Netlabel Exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
index ce63be6d64ad..b44ef68f6e4d 100644
--- a/Documentation/admin-guide/LSM/index.rst
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -48,3 +48,4 @@ subdirectories.
Yama
SafeSetID
ipe
+ landlock
diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
index f93a467db628..a756d8158531 100644
--- a/Documentation/admin-guide/LSM/ipe.rst
+++ b/Documentation/admin-guide/LSM/ipe.rst
@@ -95,7 +95,20 @@ languages when these scripts are invoked by passing these program files
to the interpreter. This is because the way interpreters execute these
files; the scripts themselves are not evaluated as executable code
through one of IPE's hooks, but they are merely text files that are read
-(as opposed to compiled executables) [#interpreters]_.
+(as opposed to compiled executables). However, with the introduction of the
+``AT_EXECVE_CHECK`` flag (:doc:`AT_EXECVE_CHECK </userspace-api/check_exec>`),
+interpreters can use it to signal the kernel that a script file will be executed,
+and request the kernel to perform LSM security checks on it.
+
+IPE's EXECUTE operation enforcement differs between compiled executables and
+interpreted scripts: For compiled executables, enforcement is triggered
+automatically by the kernel during ``execve()``, ``execveat()``, ``mmap()``
+and ``mprotect()`` syscalls when loading executable content. For interpreted
+scripts, enforcement requires explicit interpreter integration using
+``execveat()`` with ``AT_EXECVE_CHECK`` flag. Unlike exec syscalls that IPE
+intercepts during the execution process, this mechanism needs the interpreter
+to take the initiative, and existing interpreters won't be automatically
+supported unless the signal call is added.
Threat Model
------------
@@ -423,7 +436,7 @@ Field descriptions:
Event Example::
- type=1422 audit(1653425529.927:53): policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1
+ type=1422 audit(1653425529.927:53): policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1 errno=0
type=1300 audit(1653425529.927:53): arch=c000003e syscall=1 success=yes exit=2567 a0=3 a1=5596fcae1fb0 a2=a07 a3=2 items=0 ppid=184 pid=229 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="python3" exe="/usr/bin/python3.10" key=(null)
type=1327 audit(1653425529.927:53): PROCTITLE proctitle=707974686F6E3300746573742F6D61696E2E7079002D66002E2E
@@ -433,24 +446,55 @@ This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record
Field descriptions:
-+----------------+------------+-----------+---------------------------------------------------+
-| Field | Value Type | Optional? | Description of Value |
-+================+============+===========+===================================================+
-| policy_name | string | No | The policy_name |
-+----------------+------------+-----------+---------------------------------------------------+
-| policy_version | string | No | The policy_version |
-+----------------+------------+-----------+---------------------------------------------------+
-| policy_digest | string | No | The policy hash |
-+----------------+------------+-----------+---------------------------------------------------+
-| auid | integer | No | The login user ID |
-+----------------+------------+-----------+---------------------------------------------------+
-| ses | integer | No | The login session ID |
-+----------------+------------+-----------+---------------------------------------------------+
-| lsm | string | No | The lsm name associated with the event |
-+----------------+------------+-----------+---------------------------------------------------+
-| res | integer | No | The result of the audited operation(success/fail) |
-+----------------+------------+-----------+---------------------------------------------------+
-
++----------------+------------+-----------+-------------------------------------------------------------+
+| Field | Value Type | Optional? | Description of Value |
++================+============+===========+=============================================================+
+| policy_name | string | Yes | The policy_name |
++----------------+------------+-----------+-------------------------------------------------------------+
+| policy_version | string | Yes | The policy_version |
++----------------+------------+-----------+-------------------------------------------------------------+
+| policy_digest | string | Yes | The policy hash |
++----------------+------------+-----------+-------------------------------------------------------------+
+| auid | integer | No | The login user ID |
++----------------+------------+-----------+-------------------------------------------------------------+
+| ses | integer | No | The login session ID |
++----------------+------------+-----------+-------------------------------------------------------------+
+| lsm | string | No | The lsm name associated with the event |
++----------------+------------+-----------+-------------------------------------------------------------+
+| res | integer | No | The result of the audited operation(success/fail) |
++----------------+------------+-----------+-------------------------------------------------------------+
+| errno | integer | No | Error code from policy loading operations (see table below) |
++----------------+------------+-----------+-------------------------------------------------------------+
+
+Policy error codes (errno):
+
+The following table lists the error codes that may appear in the errno field while loading or updating the policy:
+
++----------------+--------------------------------------------------------+
+| Error Code | Description |
++================+========================================================+
+| 0 | Success |
++----------------+--------------------------------------------------------+
+| -EPERM | Insufficient permission |
++----------------+--------------------------------------------------------+
+| -EEXIST | Same name policy already deployed |
++----------------+--------------------------------------------------------+
+| -EBADMSG | Policy is invalid |
++----------------+--------------------------------------------------------+
+| -ENOMEM | Out of memory (OOM) |
++----------------+--------------------------------------------------------+
+| -ERANGE | Policy version number overflow |
++----------------+--------------------------------------------------------+
+| -EINVAL | Policy version parsing error |
++----------------+--------------------------------------------------------+
+| -ENOKEY | Key used to sign the IPE policy not found in keyring |
++----------------+--------------------------------------------------------+
+| -EKEYREJECTED | Policy signature verification failed |
++----------------+--------------------------------------------------------+
+| -ESTALE | Attempting to update an IPE policy with older version |
++----------------+--------------------------------------------------------+
+| -ENOENT | Policy was deleted while updating |
++----------------+--------------------------------------------------------+
1404 AUDIT_MAC_STATUS
^^^^^^^^^^^^^^^^^^^^^
@@ -775,8 +819,6 @@ A:
.. [#digest_cache_lsm] https://lore.kernel.org/lkml/20240415142436.2545003-1-roberto.sassu@huaweicloud.com/
-.. [#interpreters] There is `some interest in solving this issue <https://lore.kernel.org/lkml/20220321161557.495388-1-mic@digikod.net/>`_.
-
.. [#devdoc] Please see :doc:`the design docs </security/ipe>` for more on
this topic.
diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
new file mode 100644
index 000000000000..9923874e2156
--- /dev/null
+++ b/Documentation/admin-guide/LSM/landlock.rst
@@ -0,0 +1,189 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright © 2025 Microsoft Corporation
+
+================================
+Landlock: system-wide management
+================================
+
+:Author: Mickaël Salaün
+:Date: January 2026
+
+Landlock can leverage the audit framework to log events.
+
+User space documentation can be found here:
+Documentation/userspace-api/landlock.rst.
+
+Audit
+=====
+
+Denied access requests are logged by default for a sandboxed program if `audit`
+is enabled. This default behavior can be changed with the
+sys_landlock_restrict_self() flags (cf.
+Documentation/userspace-api/landlock.rst). Landlock logs can also be masked
+thanks to audit rules. Landlock can generate 2 audit record types.
+
+Record types
+------------
+
+AUDIT_LANDLOCK_ACCESS
+ This record type identifies a denied access request to a kernel resource.
+ The ``domain`` field indicates the ID of the domain that blocked the
+ request. The ``blockers`` field indicates the cause(s) of this denial
+ (separated by a comma), and the following fields identify the kernel object
+ (similar to SELinux). There may be more than one of this record type per
+ audit event.
+
+ Example with a file link request generating two records in the same event::
+
+ domain=195ba459b blockers=fs.refer path="/usr/bin" dev="vda2" ino=351
+ domain=195ba459b blockers=fs.make_reg,fs.refer path="/usr/local" dev="vda2" ino=365
+
+
+ The ``blockers`` field uses dot-separated prefixes to indicate the type of
+ restriction that caused the denial:
+
+ **fs.*** - Filesystem access rights (ABI 1+):
+ - fs.execute, fs.write_file, fs.read_file, fs.read_dir
+ - fs.remove_dir, fs.remove_file
+ - fs.make_char, fs.make_dir, fs.make_reg, fs.make_sock
+ - fs.make_fifo, fs.make_block, fs.make_sym
+ - fs.refer (ABI 2+)
+ - fs.truncate (ABI 3+)
+ - fs.ioctl_dev (ABI 5+)
+
+ **net.*** - Network access rights (ABI 4+):
+ - net.bind_tcp - TCP port binding was denied
+ - net.connect_tcp - TCP connection was denied
+
+ **scope.*** - IPC scoping restrictions (ABI 6+):
+ - scope.abstract_unix_socket - Abstract UNIX socket connection denied
+ - scope.signal - Signal sending denied
+
+ Multiple blockers can appear in a single event (comma-separated) when
+ multiple access rights are missing. For example, creating a regular file
+ in a directory that lacks both ``make_reg`` and ``refer`` rights would show
+ ``blockers=fs.make_reg,fs.refer``.
+
+ The object identification fields (path, dev, ino for filesystem; opid,
+ ocomm for signals) depend on the type of access being blocked and provide
+ context about what resource was involved in the denial.
+
+
+AUDIT_LANDLOCK_DOMAIN
+ This record type describes the status of a Landlock domain. The ``status``
+ field can be either ``allocated`` or ``deallocated``.
+
+ The ``allocated`` status is part of the same audit event and follows
+ the first logged ``AUDIT_LANDLOCK_ACCESS`` record of a domain. It identifies
+ Landlock domain information at the time of the sys_landlock_restrict_self()
+ call with the following fields:
+
+ - the ``domain`` ID
+ - the enforcement ``mode``
+ - the domain creator's ``pid``
+ - the domain creator's ``uid``
+ - the domain creator's executable path (``exe``)
+ - the domain creator's command line (``comm``)
+
+ Example::
+
+ domain=195ba459b status=allocated mode=enforcing pid=300 uid=0 exe="/root/sandboxer" comm="sandboxer"
+
+ The ``deallocated`` status is an event on its own and it identifies a
+ Landlock domain release. After such event, it is guarantee that the
+ related domain ID will never be reused during the lifetime of the system.
+ The ``domain`` field indicates the ID of the domain which is released, and
+ the ``denials`` field indicates the total number of denied access request,
+ which might not have been logged according to the audit rules and
+ sys_landlock_restrict_self()'s flags.
+
+ Example::
+
+ domain=195ba459b status=deallocated denials=3
+
+
+Event samples
+--------------
+
+Here are two examples of log events (see serial numbers).
+
+In this example a sandboxed program (``kill``) tries to send a signal to the
+init process, which is denied because of the signal scoping restriction
+(``LL_SCOPED=s``)::
+
+ $ LL_FS_RO=/ LL_FS_RW=/ LL_SCOPED=s LL_FORCE_LOG=1 ./sandboxer kill 1
+
+This command generates two events, each identified with a unique serial
+number following a timestamp (``msg=audit(1729738800.268:30)``). The first
+event (serial ``30``) contains 4 records. The first record
+(``type=LANDLOCK_ACCESS``) shows an access denied by the domain `1a6fdc66f`.
+The cause of this denial is signal scoping restriction
+(``blockers=scope.signal``). The process that would have receive this signal
+is the init process (``opid=1 ocomm="systemd"``).
+
+The second record (``type=LANDLOCK_DOMAIN``) describes (``status=allocated``)
+domain `1a6fdc66f`. This domain was created by process ``286`` executing the
+``/root/sandboxer`` program launched by the root user.
+
+The third record (``type=SYSCALL``) describes the syscall, its provided
+arguments, its result (``success=no exit=-1``), and the process that called it.
+
+The fourth record (``type=PROCTITLE``) shows the command's name as an
+hexadecimal value. This can be translated with ``python -c
+'print(bytes.fromhex("6B696C6C0031"))'``.
+
+Finally, the last record (``type=LANDLOCK_DOMAIN``) is also the only one from
+the second event (serial ``31``). It is not tied to a direct user space action
+but an asynchronous one to free resources tied to a Landlock domain
+(``status=deallocated``). This can be useful to know that the following logs
+will not concern the domain ``1a6fdc66f`` anymore. This record also summarize
+the number of requests this domain denied (``denials=1``), whether they were
+logged or not.
+
+.. code-block::
+
+ type=LANDLOCK_ACCESS msg=audit(1729738800.268:30): domain=1a6fdc66f blockers=scope.signal opid=1 ocomm="systemd"
+ type=LANDLOCK_DOMAIN msg=audit(1729738800.268:30): domain=1a6fdc66f status=allocated mode=enforcing pid=286 uid=0 exe="/root/sandboxer" comm="sandboxer"
+ type=SYSCALL msg=audit(1729738800.268:30): arch=c000003e syscall=62 success=no exit=-1 [..] ppid=272 pid=286 auid=0 uid=0 gid=0 [...] comm="kill" [...]
+ type=PROCTITLE msg=audit(1729738800.268:30): proctitle=6B696C6C0031
+ type=LANDLOCK_DOMAIN msg=audit(1729738800.324:31): domain=1a6fdc66f status=deallocated denials=1
+
+Here is another example showcasing filesystem access control::
+
+ $ LL_FS_RO=/ LL_FS_RW=/tmp LL_FORCE_LOG=1 ./sandboxer sh -c "echo > /etc/passwd"
+
+The related audit logs contains 8 records from 3 different events (serials 33,
+34 and 35) created by the same domain `1a6fdc679`::
+
+ type=LANDLOCK_ACCESS msg=audit(1729738800.221:33): domain=1a6fdc679 blockers=fs.write_file path="/dev/tty" dev="devtmpfs" ino=9
+ type=LANDLOCK_DOMAIN msg=audit(1729738800.221:33): domain=1a6fdc679 status=allocated mode=enforcing pid=289 uid=0 exe="/root/sandboxer" comm="sandboxer"
+ type=SYSCALL msg=audit(1729738800.221:33): arch=c000003e syscall=257 success=no exit=-13 [...] ppid=272 pid=289 auid=0 uid=0 gid=0 [...] comm="sh" [...]
+ type=PROCTITLE msg=audit(1729738800.221:33): proctitle=7368002D63006563686F203E202F6574632F706173737764
+ type=LANDLOCK_ACCESS msg=audit(1729738800.221:34): domain=1a6fdc679 blockers=fs.write_file path="/etc/passwd" dev="vda2" ino=143821
+ type=SYSCALL msg=audit(1729738800.221:34): arch=c000003e syscall=257 success=no exit=-13 [...] ppid=272 pid=289 auid=0 uid=0 gid=0 [...] comm="sh" [...]
+ type=PROCTITLE msg=audit(1729738800.221:34): proctitle=7368002D63006563686F203E202F6574632F706173737764
+ type=LANDLOCK_DOMAIN msg=audit(1729738800.261:35): domain=1a6fdc679 status=deallocated denials=2
+
+
+Event filtering
+---------------
+
+If you get spammed with audit logs related to Landlock, this is either an
+attack attempt or a bug in the security policy. We can put in place some
+filters to limit noise with two complementary ways:
+
+- with sys_landlock_restrict_self()'s flags if we can fix the sandboxed
+ programs,
+- or with audit rules (see :manpage:`auditctl(8)`).
+
+Additional documentation
+========================
+
+* `Linux Audit Documentation`_
+* Documentation/userspace-api/landlock.rst
+* Documentation/security/landlock.rst
+* https://landlock.io
+
+.. Links
+.. _Linux Audit Documentation:
+ https://github.com/linux-audit/audit-documentation/wiki