diff options
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/amd_hsmp.rst | 86 | ||||
-rw-r--r-- | Documentation/x86/index.rst | 2 | ||||
-rw-r--r-- | Documentation/x86/intel-hfi.rst | 72 | ||||
-rw-r--r-- | Documentation/x86/sva.rst | 53 | ||||
-rw-r--r-- | Documentation/x86/x86_64/boot-options.rst | 9 |
5 files changed, 202 insertions, 20 deletions
diff --git a/Documentation/x86/amd_hsmp.rst b/Documentation/x86/amd_hsmp.rst new file mode 100644 index 000000000000..440e4b645a1c --- /dev/null +++ b/Documentation/x86/amd_hsmp.rst @@ -0,0 +1,86 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================ +AMD HSMP interface +============================================ + +Newer Fam19h EPYC server line of processors from AMD support system +management functionality via HSMP (Host System Management Port). + +The Host System Management Port (HSMP) is an interface to provide +OS-level software with access to system management functions via a +set of mailbox registers. + +More details on the interface can be found in chapter +"7 Host System Management Port (HSMP)" of the family/model PPR +Eg: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip + +HSMP interface is supported on EPYC server CPU models only. + + +HSMP device +============================================ + +amd_hsmp driver under the drivers/platforms/x86/ creates miscdevice +/dev/hsmp to let user space programs run hsmp mailbox commands. + +$ ls -al /dev/hsmp +crw-r--r-- 1 root root 10, 123 Jan 21 21:41 /dev/hsmp + +Characteristics of the dev node: + * Write mode is used for running set/configure commands + * Read mode is used for running get/status monitor commands + +Access restrictions: + * Only root user is allowed to open the file in write mode. + * The file can be opened in read mode by all the users. + +In-kernel integration: + * Other subsystems in the kernel can use the exported transport + function hsmp_send_message(). + * Locking across callers is taken care by the driver. + + +An example +========== + +To access hsmp device from a C program. +First, you need to include the headers:: + + #include <linux/amd_hsmp.h> + +Which defines the supported messages/message IDs. + +Next thing, open the device file, as follows:: + + int file; + + file = open("/dev/hsmp", O_RDWR); + if (file < 0) { + /* ERROR HANDLING; you can check errno to see what went wrong */ + exit(1); + } + +The following IOCTL is defined: + +``ioctl(file, HSMP_IOCTL_CMD, struct hsmp_message *msg)`` + The argument is a pointer to a:: + + struct hsmp_message { + __u32 msg_id; /* Message ID */ + __u16 num_args; /* Number of input argument words in message */ + __u16 response_sz; /* Number of expected output/response words */ + __u32 args[HSMP_MAX_MSG_LEN]; /* argument/response buffer */ + __u16 sock_ind; /* socket number */ + }; + +The ioctl would return a non-zero on failure; you can read errno to see +what happened. The transaction returns 0 on success. + +More details on the interface and message definitions can be found in chapter +"7 Host System Management Port (HSMP)" of the respective family/model PPR +eg: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip + +User space C-APIs are made available by linking against the esmi library, +which is provided by the E-SMS project https://developer.amd.com/e-sms/. +See: https://github.com/amd/esmi_ib_library diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index f498f1d36cd3..91b2fa456618 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -21,9 +21,11 @@ x86-specific Documentation tlb mtrr pat + intel-hfi intel-iommu intel_txt amd-memory-encryption + amd_hsmp pti mds microcode diff --git a/Documentation/x86/intel-hfi.rst b/Documentation/x86/intel-hfi.rst new file mode 100644 index 000000000000..49dea58ea4fb --- /dev/null +++ b/Documentation/x86/intel-hfi.rst @@ -0,0 +1,72 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================================ +Hardware-Feedback Interface for scheduling on Intel Hardware +============================================================ + +Overview +-------- + +Intel has described the Hardware Feedback Interface (HFI) in the Intel 64 and +IA-32 Architectures Software Developer's Manual (Intel SDM) Volume 3 Section +14.6 [1]_. + +The HFI gives the operating system a performance and energy efficiency +capability data for each CPU in the system. Linux can use the information from +the HFI to influence task placement decisions. + +The Hardware Feedback Interface +------------------------------- + +The Hardware Feedback Interface provides to the operating system information +about the performance and energy efficiency of each CPU in the system. Each +capability is given as a unit-less quantity in the range [0-255]. Higher values +indicate higher capability. Energy efficiency and performance are reported in +separate capabilities. Even though on some systems these two metrics may be +related, they are specified as independent capabilities in the Intel SDM. + +These capabilities may change at runtime as a result of changes in the +operating conditions of the system or the action of external factors. The rate +at which these capabilities are updated is specific to each processor model. On +some models, capabilities are set at boot time and never change. On others, +capabilities may change every tens of milliseconds. For instance, a remote +mechanism may be used to lower Thermal Design Power. Such change can be +reflected in the HFI. Likewise, if the system needs to be throttled due to +excessive heat, the HFI may reflect reduced performance on specific CPUs. + +The kernel or a userspace policy daemon can use these capabilities to modify +task placement decisions. For instance, if either the performance or energy +capabilities of a given logical processor becomes zero, it is an indication that +the hardware recommends to the operating system to not schedule any tasks on +that processor for performance or energy efficiency reasons, respectively. + +Implementation details for Linux +-------------------------------- + +The infrastructure to handle thermal event interrupts has two parts. In the +Local Vector Table of a CPU's local APIC, there exists a register for the +Thermal Monitor Register. This register controls how interrupts are delivered +to a CPU when the thermal monitor generates and interrupt. Further details +can be found in the Intel SDM Vol. 3 Section 10.5 [1]_. + +The thermal monitor may generate interrupts per CPU or per package. The HFI +generates package-level interrupts. This monitor is configured and initialized +via a set of machine-specific registers. Specifically, the HFI interrupt and +status are controlled via designated bits in the IA32_PACKAGE_THERM_INTERRUPT +and IA32_PACKAGE_THERM_STATUS registers, respectively. There exists one HFI +table per package. Further details can be found in the Intel SDM Vol. 3 +Section 14.9 [1]_. + +The hardware issues an HFI interrupt after updating the HFI table and is ready +for the operating system to consume it. CPUs receive such interrupt via the +thermal entry in the Local APIC's Local Vector Table. + +When servicing such interrupt, the HFI driver parses the updated table and +relays the update to userspace using the thermal notification framework. Given +that there may be many HFI updates every second, the updates relayed to +userspace are throttled at a rate of CONFIG_HZ jiffies. + +References +---------- + +.. [1] https://www.intel.com/sdm diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst index 076efd51ef1f..2e9b8b0f9a0f 100644 --- a/Documentation/x86/sva.rst +++ b/Documentation/x86/sva.rst @@ -104,18 +104,47 @@ The MSR must be configured on each logical CPU before any application thread can interact with a device. Threads that belong to the same process share the same page tables, thus the same MSR value. -PASID is cleared when a process is created. The PASID allocation and MSR -programming may occur long after a process and its threads have been created. -One thread must call iommu_sva_bind_device() to allocate the PASID for the -process. If a thread uses ENQCMD without the MSR first being populated, a #GP -will be raised. The kernel will update the PASID MSR with the PASID for all -threads in the process. A single process PASID can be used simultaneously -with multiple devices since they all share the same address space. - -One thread can call iommu_sva_unbind_device() to free the allocated PASID. -The kernel will clear the PASID MSR for all threads belonging to the process. - -New threads inherit the MSR value from the parent. +PASID Life Cycle Management +=========================== + +PASID is initialized as INVALID_IOASID (-1) when a process is created. + +Only processes that access SVA-capable devices need to have a PASID +allocated. This allocation happens when a process opens/binds an SVA-capable +device but finds no PASID for this process. Subsequent binds of the same, or +other devices will share the same PASID. + +Although the PASID is allocated to the process by opening a device, +it is not active in any of the threads of that process. It's loaded to the +IA32_PASID MSR lazily when a thread tries to submit a work descriptor +to a device using the ENQCMD. + +That first access will trigger a #GP fault because the IA32_PASID MSR +has not been initialized with the PASID value assigned to the process +when the device was opened. The Linux #GP handler notes that a PASID has +been allocated for the process, and so initializes the IA32_PASID MSR +and returns so that the ENQCMD instruction is re-executed. + +On fork(2) or exec(2) the PASID is removed from the process as it no +longer has the same address space that it had when the device was opened. + +On clone(2) the new task shares the same address space, so will be +able to use the PASID allocated to the process. The IA32_PASID is not +preemptively initialized as the PASID value might not be allocated yet or +the kernel does not know whether this thread is going to access the device +and the cleared IA32_PASID MSR reduces context switch overhead by xstate +init optimization. Since #GP faults have to be handled on any threads that +were created before the PASID was assigned to the mm of the process, newly +created threads might as well be treated in a consistent way. + +Due to complexity of freeing the PASID and clearing all IA32_PASID MSRs in +all threads in unbind, free the PASID lazily only on mm exit. + +If a process does a close(2) of the device file descriptor and munmap(2) +of the device MMIO portal, then the driver will unbind the device. The +PASID is still marked VALID in the PASID_MSR for any threads in the +process that accessed the device. But this is harmless as without the +MMIO portal they cannot submit new work to the device. Relationships ============= diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst index ccb7e86bf8d9..07aa0007f346 100644 --- a/Documentation/x86/x86_64/boot-options.rst +++ b/Documentation/x86/x86_64/boot-options.rst @@ -47,14 +47,7 @@ Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables. in a reboot. On Intel systems it is enabled by default. mce=nobootlog Disable boot machine check logging. - mce=tolerancelevel[,monarchtimeout] (number,number) - tolerance levels: - 0: always panic on uncorrected errors, log corrected errors - 1: panic or SIGBUS on uncorrected errors, log corrected errors - 2: SIGBUS or log uncorrected errors, log corrected errors - 3: never panic or SIGBUS, log all errors (for testing only) - Default is 1 - Can be also set using sysfs which is preferable. + mce=monarchtimeout (number) monarchtimeout: Sets the time in us to wait for other CPUs on machine checks. 0 to disable. |