diff options
Diffstat (limited to 'Documentation/userspace-api/fwctl')
-rw-r--r-- | Documentation/userspace-api/fwctl/fwctl-cxl.rst | 142 | ||||
-rw-r--r-- | Documentation/userspace-api/fwctl/fwctl.rst | 286 | ||||
-rw-r--r-- | Documentation/userspace-api/fwctl/index.rst | 14 | ||||
-rw-r--r-- | Documentation/userspace-api/fwctl/pds_fwctl.rst | 46 |
4 files changed, 488 insertions, 0 deletions
diff --git a/Documentation/userspace-api/fwctl/fwctl-cxl.rst b/Documentation/userspace-api/fwctl/fwctl-cxl.rst new file mode 100644 index 000000000000..670b43b72949 --- /dev/null +++ b/Documentation/userspace-api/fwctl/fwctl-cxl.rst @@ -0,0 +1,142 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +fwctl cxl driver +================ + +:Author: Dave Jiang + +Overview +======== + +The CXL spec defines a set of commands that can be issued to the mailbox of a +CXL device or switch. It also left room for vendor specific commands to be +issued to the mailbox as well. fwctl provides a path to issue a set of allowed +mailbox commands from user space to the device moderated by the kernel driver. + +The following 3 commands will be used to support CXL Features: +CXL spec r3.1 8.2.9.6.1 Get Supported Features (Opcode 0500h) +CXL spec r3.1 8.2.9.6.2 Get Feature (Opcode 0501h) +CXL spec r3.1 8.2.9.6.3 Set Feature (Opcode 0502h) + +The "Get Supported Features" return data may be filtered by the kernel driver to +drop any features that are forbidden by the kernel or being exclusively used by +the kernel. The driver will set the "Set Feature Size" of the "Get Supported +Features Supported Feature Entry" to 0 to indicate that the Feature cannot be +modified. The "Get Supported Features" command and the "Get Features" falls +under the fwctl policy of FWCTL_RPC_CONFIGURATION. + +For "Set Feature" command, the access policy currently is broken down into two +categories depending on the Set Feature effects reported by the device. If the +Set Feature will cause immediate change to the device, the fwctl access policy +must be FWCTL_RPC_DEBUG_WRITE_FULL. The effects for this level are +"immediate config change", "immediate data change", "immediate policy change", +or "immediate log change" for the set effects mask. If the effects are "config +change with cold reset" or "config change with conventional reset", then the +fwctl access policy must be FWCTL_RPC_DEBUG_WRITE or higher. + +fwctl cxl User API +================== + +.. kernel-doc:: include/uapi/fwctl/cxl.h + +1. Driver info query +-------------------- + +First step for the app is to issue the ioctl(FWCTL_CMD_INFO). Successful +invocation of the ioctl implies the Features capability is operational and +returns an all zeros 32bit payload. A ``struct fwctl_info`` needs to be filled +out with the ``fwctl_info.out_device_type`` set to ``FWCTL_DEVICE_TYPE_CXL``. +The return data should be ``struct fwctl_info_cxl`` that contains a reserved +32bit field that should be all zeros. + +2. Send hardware commands +------------------------- + +Next step is to send the 'Get Supported Features' command to the driver from +user space via ioctl(FWCTL_RPC). A ``struct fwctl_rpc_cxl`` is pointed to +by ``fwctl_rpc.in``. ``struct fwctl_rpc_cxl.in_payload`` points to +the hardware input structure that is defined by the CXL spec. ``fwctl_rpc.out`` +points to the buffer that contains a ``struct fwctl_rpc_cxl_out`` that includes +the hardware output data inlined as ``fwctl_rpc_cxl_out.payload``. This command +is called twice. First time to retrieve the number of features supported. +A second time to retrieve the specific feature details as the output data. + +After getting the specific feature details, a Get/Set Feature command can be +appropriately programmed and sent. For a "Set Feature" command, the retrieved +feature info contains an effects field that details the resulting +"Set Feature" command will trigger. That will inform the user whether +the system is configured to allowed the "Set Feature" command or not. + +Code example of a Get Feature +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: c + + static int cxl_fwctl_rpc_get_test_feature(int fd, struct test_feature *feat_ctx, + const uint32_t expected_data) + { + struct cxl_mbox_get_feat_in *feat_in; + struct fwctl_rpc_cxl_out *out; + struct fwctl_rpc rpc = {0}; + struct fwctl_rpc_cxl *in; + size_t out_size, in_size; + uint32_t val; + void *data; + int rc; + + in_size = sizeof(*in) + sizeof(*feat_in); + rc = posix_memalign((void **)&in, 16, in_size); + if (rc) + return -ENOMEM; + memset(in, 0, in_size); + feat_in = &in->get_feat_in; + + uuid_copy(feat_in->uuid, feat_ctx->uuid); + feat_in->count = feat_ctx->get_size; + + out_size = sizeof(*out) + feat_ctx->get_size; + rc = posix_memalign((void **)&out, 16, out_size); + if (rc) + goto free_in; + memset(out, 0, out_size); + + in->opcode = CXL_MBOX_OPCODE_GET_FEATURE; + in->op_size = sizeof(*feat_in); + + rpc.size = sizeof(rpc); + rpc.scope = FWCTL_RPC_CONFIGURATION; + rpc.in_len = in_size; + rpc.out_len = out_size; + rpc.in = (uint64_t)(uint64_t *)in; + rpc.out = (uint64_t)(uint64_t *)out; + + rc = send_command(fd, &rpc, out); + if (rc) + goto free_all; + + data = out->payload; + val = le32toh(*(__le32 *)data); + if (memcmp(&val, &expected_data, sizeof(val)) != 0) { + rc = -ENXIO; + goto free_all; + } + + free_all: + free(out); + free_in: + free(in); + return rc; + } + +Take a look at CXL CLI test directory +<https://github.com/pmem/ndctl/tree/main/test/fwctl.c> for a detailed user code +for examples on how to exercise this path. + + +fwctl cxl Kernel API +==================== + +.. kernel-doc:: drivers/cxl/core/features.c + :export: +.. kernel-doc:: include/cxl/features.h diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst new file mode 100644 index 000000000000..fdcfe418a83f --- /dev/null +++ b/Documentation/userspace-api/fwctl/fwctl.rst @@ -0,0 +1,286 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +fwctl subsystem +=============== + +:Author: Jason Gunthorpe + +Overview +======== + +Modern devices contain extensive amounts of FW, and in many cases, are largely +software-defined pieces of hardware. The evolution of this approach is largely a +reaction to Moore's Law where a chip tape out is now highly expensive, and the +chip design is extremely large. Replacing fixed HW logic with a flexible and +tightly coupled FW/HW combination is an effective risk mitigation against chip +respin. Problems in the HW design can be counteracted in device FW. This is +especially true for devices which present a stable and backwards compatible +interface to the operating system driver (such as NVMe). + +The FW layer in devices has grown to incredible size and devices frequently +integrate clusters of fast processors to run it. For example, mlx5 devices have +over 30MB of FW code, and big configurations operate with over 1GB of FW managed +runtime state. + +The availability of such a flexible layer has created quite a variety in the +industry where single pieces of silicon are now configurable software-defined +devices and can operate in substantially different ways depending on the need. +Further, we often see cases where specific sites wish to operate devices in ways +that are highly specialized and require applications that have been tailored to +their unique configuration. + +Further, devices have become multi-functional and integrated to the point they +no longer fit neatly into the kernel's division of subsystems. Modern +multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many +subsystems while sharing the underlying hardware using the auxiliary device +system. + +All together this creates a challenge for the operating system, where devices +have an expansive FW environment that needs robust device-specific debugging +support, and FW-driven functionality that is not well suited to “generic” +interfaces. fwctl seeks to allow access to the full device functionality from +user space in the areas of debuggability, management, and first-boot/nth-boot +provisioning. + +fwctl is aimed at the common device design pattern where the OS and FW +communicate via an RPC message layer constructed with a queue or mailbox scheme. +In this case the driver will typically have some layer to deliver RPC messages +and collect RPC responses from device FW. The in-kernel subsystem drivers that +operate the device for its primary purposes will use these RPCs to build their +drivers, but devices also usually have a set of ancillary RPCs that don't really +fit into any specific subsystem. For example, a HW RAID controller is primarily +operated by the block layer but also comes with a set of RPCs to administer the +construction of drives within the HW RAID. + +In the past when devices were more single function, individual subsystems would +grow different approaches to solving some of these common problems. For instance +monitoring device health, manipulating its FLASH, debugging the FW, +provisioning, all have various unique interfaces across the kernel. + +fwctl's purpose is to define a common set of limited rules, described below, +that allow user space to securely construct and execute RPCs inside device FW. +The rules serve as an agreement between the operating system and FW on how to +correctly design the RPC interface. As a uAPI the subsystem provides a thin +layer of discovery and a generic uAPI to deliver the RPCs and collect the +response. It supports a system of user space libraries and tools which will +use this interface to control the device using the device native protocols. + +Scope of Action +--------------- + +fwctl drivers are strictly restricted to being a way to operate the device FW. +It is not an avenue to access random kernel internals, or other operating system +SW states. + +fwctl instances must operate on a well-defined device function, and the device +should have a well-defined security model for what scope within the physical +device the function is permitted to access. For instance, the most complex PCIe +device today may broadly have several function-level scopes: + + 1. A privileged function with full access to the on-device global state and + configuration + + 2. Multiple hypervisor functions with control over itself and child functions + used with VMs + + 3. Multiple VM functions tightly scoped within the VM + +The device may create a logical parent/child relationship between these scopes. +For instance a child VM's FW may be within the scope of the hypervisor FW. It is +quite common in the VFIO world that the hypervisor environment has a complex +provisioning/profiling/configuration responsibility for the function VFIO +assigns to the VM. + +Further, within the function, devices often have RPC commands that fall within +some general scopes of action (see enum fwctl_rpc_scope): + + 1. Access to function & child configuration, FLASH, etc. that becomes live at a + function reset. Access to function & child runtime configuration that is + transparent or non-disruptive to any driver or VM. + + 2. Read-only access to function debug information that may report on FW objects + in the function & child, including FW objects owned by other kernel + subsystems. + + 3. Write access to function & child debug information strictly compatible with + the principles of kernel lockdown and kernel integrity protection. Triggers + a kernel Taint. + + 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO. + +User space will provide a scope label on each RPC and the kernel must enforce the +above CAPs and taints based on that scope. A combination of kernel and FW can +enforce that RPCs are placed in the correct scope by user space. + +Denied behavior +--------------- + +There are many things this interface must not allow user space to do (without a +Taint or CAP), broadly derived from the principles of kernel lockdown. Some +examples: + + 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with + untrusted code, or otherwise compromise device or system security and + integrity. + + 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel + objects owned by kernel drivers. + + 3. Directly configure or otherwise control kernel drivers. A subsystem kernel + driver can react to the device configuration at function reset/driver load + time, but otherwise must not be coupled to fwctl. + + 4. Operate the HW in a way that overlaps with the core purpose of another + primary kernel subsystem, such as read/write to LBAs, send/receive of + network packets, or operate an accelerator's data plane. + +fwctl is not a replacement for device direct access subsystems like uacce or +VFIO. + +Operations exposed through fwctl's non-taining interfaces should be fully +sharable with other users of the device. For instance exposing a RPC through +fwctl should never prevent a kernel subsystem from also concurrently using that +same RPC or hardware unit down the road. In such cases fwctl will be less +important than proper kernel subsystems that eventually emerge. Mistakes in this +area resulting in clashes will be resolved in favour of a kernel implementation. + +fwctl User API +============== + +.. kernel-doc:: include/uapi/fwctl/fwctl.h +.. kernel-doc:: include/uapi/fwctl/mlx5.h +.. kernel-doc:: include/uapi/fwctl/pds.h + +sysfs Class +----------- + +fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices +(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device +operates the iotcl uAPI described above. + +fwctl devices can be related to driver components in other subsystems through +sysfs:: + + $ ls /sys/class/fwctl/fwctl0/device/infiniband/ + ibp0s10f0 + + $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/ + fwctl0/ + + $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0 + dev device power subsystem uevent + +User space Community +-------------------- + +Drawing inspiration from nvme-cli, participating in the kernel side must come +with a user space in a common TBD git tree, at a minimum to usefully operate the +kernel driver. Providing such an implementation is a pre-condition to merging a +kernel driver. + +The goal is to build user space community around some of the shared problems +we all have, and ideally develop some common user space programs with some +starting themes of: + + - Device in-field debugging + + - HW provisioning + + - VFIO child device profiling before VM boot + + - Confidential Compute topics (attestation, secure provisioning) + +that stretch across all subsystems in the kernel. fwupd is a great example of +how an excellent user space experience can emerge out of kernel-side diversity. + +fwctl Kernel API +================ + +.. kernel-doc:: drivers/fwctl/main.c + :export: +.. kernel-doc:: include/linux/fwctl.h + +fwctl Driver design +------------------- + +In many cases a fwctl driver is going to be part of a larger cross-subsystem +device possibly using the auxiliary_device mechanism. In that case several +subsystems are going to be sharing the same device and FW interface layer so the +device design must already provide for isolation and cooperation between kernel +subsystems. fwctl should fit into that same model. + +Part of the driver should include a description of how its scope restrictions +and security model work. The driver and FW together must ensure that RPCs +provided by user space are mapped to the appropriate scope. If the validation is +done in the driver then the validation can read a 'command effects' report from +the device, or hardwire the enforcement. If the validation is done in the FW, +then the driver should pass the fwctl_rpc_scope to the FW along with the command. + +The driver and FW must cooperate to ensure that either fwctl cannot allocate +any FW resources, or any resources it does allocate are freed on FD closure. A +driver primarily constructed around FW RPCs may find that its core PCI function +and RPC layer belongs under fwctl with auxiliary devices connecting to other +subsystems. + +Each device type must be mindful of Linux's philosophy for stable ABI. The FW +RPC interface does not have to meet a strictly stable ABI, but it does need to +meet an expectation that userspace tools that are deployed and in significant +use don't needlessly break. FW upgrade and kernel upgrade should keep widely +deployed tooling working. + +Development and debugging focused RPCs under more permissive scopes can have +less stabilitiy if the tools using them are only run under exceptional +circumstances and not for every day use of the device. Debugging tools may even +require exact version matching as they may require something similar to DWARF +debug information from the FW binary. + +Security Response +================= + +The kernel remains the gatekeeper for this interface. If violations of the +scopes, security or isolation principles are found, we have options to let +devices fix them with a FW update, push a kernel patch to parse and block RPC +commands or push a kernel patch to block entire firmware versions/devices. + +While the kernel can always directly parse and restrict RPCs, it is expected +that the existing kernel pattern of allowing drivers to delegate validation to +FW to be a useful design. + +Existing Similar Examples +========================= + +The approach described in this document is not a new idea. Direct, or near +direct device access has been offered by the kernel in different areas for +decades. With more devices wanting to follow this design pattern it is becoming +clear that it is not entirely well understood and, more importantly, the +security considerations are not well defined or agreed upon. + +Some examples: + + - HW RAID controllers. This includes RPCs to do things like compose drives into + a RAID volume, configure RAID parameters, monitor the HW and more. + + - Baseboard managers. RPCs for configuring settings in the device and more + + - NVMe vendor command capsules. nvme-cli provides access to some monitoring + functions that different products have defined, but more exist. + + - CXL also has a NVMe-like vendor command system. + + - DRM allows user space drivers to send commands to the device via kernel + mediation + + - RDMA allows user space drivers to directly push commands to the device + without kernel involvement + + - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc. + +The first 4 are examples of areas that fwctl intends to cover. The latter three +are examples of denied behavior as they fully overlap with the primary purpose +of a kernel subsystem. + +Some key lessons learned from these past efforts are the importance of having a +common user space project to use as a pre-condition for obtaining a kernel +driver. Developing good community around useful software in user space is key to +getting companies to fund participation to enable their products. diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst new file mode 100644 index 000000000000..316ac456ad3b --- /dev/null +++ b/Documentation/userspace-api/fwctl/index.rst @@ -0,0 +1,14 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Firmware Control (FWCTL) Userspace API +====================================== + +A framework that define a common set of limited rules that allows user space +to securely construct and execute RPCs inside device firmware. + +.. toctree:: + :maxdepth: 1 + + fwctl + fwctl-cxl + pds_fwctl diff --git a/Documentation/userspace-api/fwctl/pds_fwctl.rst b/Documentation/userspace-api/fwctl/pds_fwctl.rst new file mode 100644 index 000000000000..b5a31f82c883 --- /dev/null +++ b/Documentation/userspace-api/fwctl/pds_fwctl.rst @@ -0,0 +1,46 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +fwctl pds driver +================ + +:Author: Shannon Nelson + +Overview +======== + +The PDS Core device makes a fwctl service available through an +auxiliary_device named pds_core.fwctl.N. The pds_fwctl driver binds to +this device and registers itself with the fwctl subsystem. The resulting +userspace interface is used by an application that is a part of the +AMD Pensando software package for the Distributed Service Card (DSC). + +The pds_fwctl driver has little knowledge of the firmware's internals. +It only knows how to send commands through pds_core's message queue to the +firmware for fwctl requests. The set of fwctl operations available +depends on the firmware in the DSC, and the userspace application +version must match the firmware so that they can talk to each other. + +When a connection is created the pds_fwctl driver requests from the +firmware a list of firmware object endpoints, and for each endpoint the +driver requests a list of operations for that endpoint. + +Each operation description includes a firmware defined command attribute +that maps to the FWCTL scope levels. The driver translates those firmware +values into the FWCTL scope values which can then be used for filtering the +scoped user requests. + +pds_fwctl User API +================== + +Each RPC request includes the target endpoint and the operation id, and in +and out buffer lengths and pointers. The driver verifies the existence +of the requested endpoint and operations, then checks the request scope +against the required scope of the operation. The request is then put +together with the request data and sent through pds_core's message queue +to the firmware, and the results are returned to the caller. + +The RPC endpoints, operations, and buffer contents are defined by the +particular firmware package in the device, which varies across the +available product configurations. The details are available in the +specific product SDK documentation. |