author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-06-01 13:23:59 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-06-01 13:23:59 -0700 |
commit | a7092c82042b4ba3000cf7b369d1032161c5d4c9 (patch) | |
tree | 240c3b73ac4b25fcca218871f109383ed95fd52a /Documentation | |
parent | 69fc06f70f4569c9969f99fe25bdc9a6bb537b43 (diff) | |
parent | 5cde265384cad739b162cf08afba6da8857778bd (diff) | |
Merge tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add AMD Fam17h RAPL support

   - Introduce CAP_PERFMON to kernel and user space

   - Add Zhaoxin CPU support

   - Misc fixes and cleanups

  Tooling changes:

   - perf record:

       Introduce '--switch-output-event' to use arbitrary events to be
       setup and read from a side band thread and, when they take place a
       signal be sent to the main 'perf record' thread, reusing the core
       for '--switch-output' to take perf.data snapshots from the ring
       buffer used for '--overwrite', e.g.:

          # perf record --overwrite -e sched:* \
                        --switch-output-event syscalls:*connect* \
                        workload

       will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
       connect syscalls.

       Add '--num-synthesize-threads' option to control degree of
       parallelism of the synthesize_mmap() code which is scanning
       /proc/PID/task/PID/maps and can be time consuming. This mimics
       pre-existing behaviour in 'perf top'.

   - perf bench:

       Add a multi-threaded synthesize benchmark and kallsyms parsing
       benchmark.

   - Intel PT support:

       Stitch LBR records from multiple samples to get deeper backtraces,
       there are caveats, see the csets for details.

       Allow using Intel PT to synthesize callchains for regular events.

       Add support for synthesizing branch stacks for regular events
       (cycles, instructions, etc) from Intel PT data.

  Misc changes:

   - Updated perf vendor events for power9 and Coresight.

   - Add flamegraph.py script via 'perf flamegraph'

   - Misc other changes, fixes and cleanups - see the Git log for details

  Also, since over the last couple of years perf tooling has matured and
  decoupled from the kernel perf changes to a large degree, going
  forward Arnaldo is going to send perf tooling changes via direct pull
  requests"

* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
perf/x86/rapl: Add AMD Fam17h RAPL support
perf/x86/rapl: Make perf_probe_msr() more robust and flexible
perf/x86/rapl: Flip logic on default events visibility
perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
perf/x86/rapl: Move RAPL support to common x86 code
perf/core: Replace zero-length array with flexible-array
perf/x86: Replace zero-length array with flexible-array
perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
perf/x86/rapl: Add Ice Lake RAPL support
perf flamegraph: Use /bin/bash for report and record scripts
perf cs-etm: Move definition of 'traceid_list' global variable from header file
libsymbols kallsyms: Move hex2u64 out of header
libsymbols kallsyms: Parse using io api
perf bench: Add kallsyms parsing
perf: cs-etm: Update to build with latest opencsd version.
perf symbol: Fix kernel symbol address display
perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/perf-security.rst | 86 |
-rw-r--r-- | Documentation/admin-guide/sysctl/kernel.rst | 16 |
2 files changed, 72 insertions, 30 deletions
diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst
index 72effa7c23b9..1307b5274a0f 100644
--- a/Documentation/admin-guide/perf-security.rst
+++ b/Documentation/admin-guide/perf-security.rst
@@ -1,6 +1,6 @@
 .. _perf_security:
 
-Perf Events and tool security
+Perf events and tool security
 =============================
 
 Overview
@@ -42,11 +42,11 @@ categories:
 Data that belong to the fourth category can potentially contain
 sensitive process data. If PMUs in some monitoring modes capture values
 of execution context registers or data from process memory then access
-to such monitoring capabilities requires to be ordered and secured
-properly. So, perf_events/Perf performance monitoring is the subject for
-security access control management [5]_ .
+to such monitoring modes requires to be ordered and secured properly.
+So, perf_events performance monitoring and observability operations are
+the subject for security access control management [5]_ .
 
-perf_events/Perf access control
+perf_events access control
 -------------------------------
 
 To perform security checks, the Linux implementation splits processes
@@ -66,11 +66,25 @@ into distinct units, known as capabilities [6]_ , which can be
 independently enabled and disabled on per-thread basis for processes and
 files of unprivileged users.
 
-Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
+Unprivileged processes with enabled CAP_PERFMON capability are treated
 as privileged processes with respect to perf_events performance
-monitoring and bypass *scope* permissions checks in the kernel.
-
-Unprivileged processes using perf_events system call API is also subject
+monitoring and observability operations, thus, bypass *scope* permissions
+checks in the kernel. CAP_PERFMON implements the principle of least
+privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and
+observability operations in the kernel and provides a secure approach to
+perfomance monitoring and observability in the system.
+
+For backward compatibility reasons the access to perf_events monitoring and
+observability operations is also open for CAP_SYS_ADMIN privileged
+processes but CAP_SYS_ADMIN usage for secure monitoring and observability
+use cases is discouraged with respect to the CAP_PERFMON capability.
+If system audit records [14]_ for a process using perf_events system call
+API contain denial records of acquiring both CAP_PERFMON and CAP_SYS_ADMIN
+capabilities then providing the process with CAP_PERFMON capability singly
+is recommended as the preferred secure approach to resolve double access
+denial logging related to usage of performance monitoring and observability.
+
+Unprivileged processes using perf_events system call are also subject
 for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose
 outcome determines whether monitoring is permitted. So unprivileged
 processes provided with CAP_SYS_PTRACE capability are effectively
@@ -82,14 +96,14 @@ performance analysis of monitored processes or a system. For example,
 CAP_SYSLOG capability permits reading kernel space memory addresses from
 /proc/kallsyms file.
 
-perf_events/Perf privileged users
+Privileged Perf users groups
 ---------------------------------
 
 Mechanisms of capabilities, privileged capability-dumb files [6]_ and
-file system ACLs [10]_ can be used to create a dedicated group of
-perf_events/Perf privileged users who are permitted to execute
-performance monitoring without scope limits. The following steps can be
-taken to create such a group of privileged Perf users.
+file system ACLs [10]_ can be used to create dedicated groups of
+privileged Perf users who are permitted to execute performance monitoring
+and observability without scope limits. The following steps can be
+taken to create such groups of privileged Perf users.
 
 1. Create perf_users group of privileged Perf users, assign perf_users
    group to Perf tool executable and limit access to the executable for
@@ -108,30 +122,51 @@ taken to create such a group of privileged Perf users.
    -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf
 
 2. Assign the required capabilities to the Perf tool executable file and
-   enable members of perf_users group with performance monitoring
+   enable members of perf_users group with monitoring and observability
    privileges [6]_ :
 
 ::
 
-   # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
-   # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
+   # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
+   # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
    perf: OK
    # getcap perf
-   perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep
+   perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
+
+If the libcap installed doesn't yet support "cap_perfmon", use "38" instead,
+i.e.:
+
+::
+
+   # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
+
+Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
+'perf top', alternatively use 'perf top -m N', to reduce the memory that
+it uses for the perf ring buffer, see the memory allocation section below.
+
+Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
+CAP_EFFECTIVE, &val) fail, which will lead the default event to be 'cycles:u',
+so as a workaround explicitly ask for the 'cycles' event, i.e.:
+
+::
+
+  # perf top -e cycles
+
+To get kernel and user samples with a perf binary with just CAP_PERFMON.
 
 As a result, members of perf_users group are capable of conducting
-performance monitoring by using functionality of the configured Perf
-tool executable that, when executes, passes perf_events subsystem scope
-checks.
+performance monitoring and observability by using functionality of the
+configured Perf tool executable that, when executes, passes perf_events
+subsystem scope checks.
 
 This specific access control management is only available to superuser
 or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
 capabilities.
 
-perf_events/Perf unprivileged users
+Unprivileged users
 -----------------------------------
 
-perf_events/Perf *scope* and *access* control for unprivileged processes
+perf_events *scope* and *access* control for unprivileged processes
 is governed by perf_event_paranoid [2]_ setting:
 
 -1:
@@ -166,7 +201,7 @@ is governed by perf_event_paranoid [2]_ setting:
      perf_event_mlock_kb locking limit is imposed but ignored for
      unprivileged processes with CAP_IPC_LOCK capability.
 
-perf_events/Perf resource control
+Resource control
 ---------------------------------
 
 Open file descriptors
@@ -227,4 +262,5 @@ Bibliography
 .. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
 .. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
 .. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
-
+.. [13] `<https://sites.google.com/site/fullycapable>`_
+.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0d427fd10941..8d25892a18f8 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -721,7 +721,13 @@ perf_event_paranoid
 ===================
 
 Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN). The default value is 2.
+users (without CAP_PERFMON). The default value is 2.
+
+For backward compatibility reasons access to system performance
+monitoring and observability remains open for CAP_SYS_ADMIN
+privileged processes but CAP_SYS_ADMIN usage for secure system
+performance monitoring and observability operations is discouraged
+with respect to CAP_PERFMON use cases.
 
 ===  ==================================================================
  -1  Allow use of (almost) all events by all users.
@@ -730,13 +736,13 @@ users (without CAP_SYS_ADMIN). The default value is 2.
      ``CAP_IPC_LOCK``.
 
 >=0  Disallow ftrace function tracepoint by users without
-     ``CAP_SYS_ADMIN``.
+     ``CAP_PERFMON``.
 
-     Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``.
+     Disallow raw tracepoint access by users without ``CAP_PERFMON``.
 
->=1  Disallow CPU event access by users without ``CAP_SYS_ADMIN``.
+>=1  Disallow CPU event access by users without ``CAP_PERFMON``.
 
->=2  Disallow kernel profiling by users without ``CAP_SYS_ADMIN``.
+>=2  Disallow kernel profiling by users without ``CAP_PERFMON``.
 ===  ==================================================================
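
The access rules documented by this patch can be sketched in a few lines of Python. This is an illustrative model only, not kernel or perf code: the helper names (`has_capability`, `read_cap_eff`, `disallowed_operations`) are hypothetical, the capability number 38 for CAP_PERFMON is the one quoted in the patch, the number 21 for CAP_SYS_ADMIN is taken from `linux/capability.h` rather than from this patch, and the model treats a holder of either capability as unrestricted, which is a simplification of the real checks.

```python
# Illustrative model of the perf_event_paranoid table in the patch above.
CAP_SYS_ADMIN = 21   # assumption: value from linux/capability.h, not this patch
CAP_PERFMON = 38     # number the patch says to pass to setcap on older libcap

# (minimum perf_event_paranoid level, operation disallowed without CAP_PERFMON)
PARANOID_RULES = [
    (0, "ftrace function tracepoint"),
    (0, "raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def has_capability(cap_mask_hex, cap):
    """Return True if bit `cap` is set in a hex mask such as a CapEff value."""
    return bool((int(cap_mask_hex, 16) >> cap) & 1)


def read_cap_eff(status_path="/proc/self/status"):
    """Return the CapEff hex string for this process, or None if absent."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    return None


def disallowed_operations(paranoid, cap_mask_hex):
    """List the operations the table disallows at this paranoid level."""
    # Simplification: CAP_PERFMON, or CAP_SYS_ADMIN for backward
    # compatibility, lifts every restriction listed in the table.
    if has_capability(cap_mask_hex, CAP_PERFMON) or has_capability(
        cap_mask_hex, CAP_SYS_ADMIN
    ):
        return []
    return [what for level, what in PARANOID_RULES if paranoid >= level]
```

On a Linux system, `disallowed_operations(2, read_cap_eff())` gives a rough idea of what the current process would be refused at the default paranoid level; the kernel remains the authority on the actual outcome.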