Diffstat (limited to 'Documentation/PCI')
-rw-r--r-- | Documentation/PCI/acpi-info.rst (renamed from Documentation/PCI/acpi-info.txt) | 15
-rw-r--r-- | Documentation/PCI/endpoint/index.rst | 13
-rw-r--r-- | Documentation/PCI/endpoint/pci-endpoint-cfs.rst (renamed from Documentation/PCI/endpoint/pci-endpoint-cfs.txt) | 99
-rw-r--r-- | Documentation/PCI/endpoint/pci-endpoint.rst (renamed from Documentation/PCI/endpoint/pci-endpoint.txt) | 92
-rw-r--r-- | Documentation/PCI/endpoint/pci-test-function.rst (renamed from Documentation/PCI/endpoint/pci-test-function.txt) | 84
-rw-r--r-- | Documentation/PCI/endpoint/pci-test-howto.rst (renamed from Documentation/PCI/endpoint/pci-test-howto.txt) | 81
-rw-r--r-- | Documentation/PCI/index.rst | 18
-rw-r--r-- | Documentation/PCI/msi-howto.rst (renamed from Documentation/PCI/MSI-HOWTO.txt) | 85
-rw-r--r-- | Documentation/PCI/pci-error-recovery.rst (renamed from Documentation/PCI/pci-error-recovery.txt) | 287
-rw-r--r-- | Documentation/PCI/pci-iov-howto.rst (renamed from Documentation/PCI/pci-iov-howto.txt) | 161
-rw-r--r-- | Documentation/PCI/pci.rst (renamed from Documentation/PCI/pci.txt) | 356
-rw-r--r-- | Documentation/PCI/pcieaer-howto.rst (renamed from Documentation/PCI/pcieaer-howto.txt) | 156
-rw-r--r-- | Documentation/PCI/picebus-howto.rst (renamed from Documentation/PCI/PCIEBUS-HOWTO.txt) | 140
13 files changed, 879 insertions, 708 deletions
diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.rst index 3ffa3b03970e..060217081c79 100644 --- a/Documentation/PCI/acpi-info.txt +++ b/Documentation/PCI/acpi-info.rst @@ -1,4 +1,8 @@ - ACPI considerations for PCI host bridges +.. SPDX-License-Identifier: GPL-2.0 + +======================================== +ACPI considerations for PCI host bridges +======================================== The general rule is that the ACPI namespace should describe everything the OS might use unless there's another way for the OS to find it [1, 2]. @@ -131,12 +135,13 @@ address always corresponds to bus 0, even if the bus range below the bridge [4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4: QWord/DWord/Word Address Space Descriptor (.1, .2, .3) - General Flags: Bit [0] Ignored + General Flags: Bit [0] Ignored Extended Address Space Descriptor (.4) - General Flags: Bit [0] Consumer/Producer: - 1–This device consumes this resource - 0–This device produces and consumes this resource + General Flags: Bit [0] Consumer/Producer: + + * 1 – This device consumes this resource + * 0 – This device produces and consumes this resource [5] ACPI 6.2, sec 19.6.43: ResourceUsage specifies whether the Memory range is consumed by diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst new file mode 100644 index 000000000000..d114ea74b444 --- /dev/null +++ b/Documentation/PCI/endpoint/index.rst @@ -0,0 +1,13 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================== +PCI Endpoint Framework +====================== + +.. 
toctree:: + :maxdepth: 2 + + pci-endpoint + pci-endpoint-cfs + pci-test-function + pci-test-howto diff --git a/Documentation/PCI/endpoint/pci-endpoint-cfs.txt b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst index d740f29960a4..b6d39cdec56e 100644 --- a/Documentation/PCI/endpoint/pci-endpoint-cfs.txt +++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst @@ -1,41 +1,51 @@ - CONFIGURING PCI ENDPOINT USING CONFIGFS - Kishon Vijay Abraham I <kishon@ti.com> +.. SPDX-License-Identifier: GPL-2.0 + +======================================= +Configuring PCI Endpoint Using CONFIGFS +======================================= + +:Author: Kishon Vijay Abraham I <kishon@ti.com> The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the PCI endpoint function and to bind the endpoint function with the endpoint controller. (For introducing other mechanisms to configure the PCI Endpoint Function refer to [1]). -*) Mounting configfs +Mounting configfs +================= The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs -directory. configfs can be mounted using the following command. +directory. configfs can be mounted using the following command:: mount -t configfs none /sys/kernel/config -*) Directory Structure +Directory Structure +=================== The pci_ep configfs has two directories at its root: controllers and functions. Every EPC device present in the system will have an entry in the *controllers* directory and and every EPF driver present in the system will have an entry in the *functions* directory. +:: -/sys/kernel/config/pci_ep/ - .. controllers/ - .. functions/ + /sys/kernel/config/pci_ep/ + .. controllers/ + .. functions/ -*) Creating EPF Device +Creating EPF Device +=================== Every registered EPF driver will be listed in controllers directory. The entries corresponding to EPF driver will be created by the EPF core. +:: -/sys/kernel/config/pci_ep/functions/ - .. <EPF Driver1>/ - ... <EPF Device 11>/ - ... 
<EPF Device 21>/ - .. <EPF Driver2>/ - ... <EPF Device 12>/ - ... <EPF Device 22>/ + /sys/kernel/config/pci_ep/functions/ + .. <EPF Driver1>/ + ... <EPF Device 11>/ + ... <EPF Device 21>/ + .. <EPF Driver2>/ + ... <EPF Device 12>/ + ... <EPF Device 22>/ In order to create a <EPF device> of the type probed by <EPF Driver>, the user has to create a directory inside <EPF DriverN>. @@ -44,34 +54,37 @@ Every <EPF device> directory consists of the following entries that can be used to configure the standard configuration header of the endpoint function. (These entries are created by the framework when any new <EPF Device> is created) - - .. <EPF Driver1>/ - ... <EPF Device 11>/ - ... vendorid - ... deviceid - ... revid - ... progif_code - ... subclass_code - ... baseclass_code - ... cache_line_size - ... subsys_vendor_id - ... subsys_id - ... interrupt_pin - -*) EPC Device +:: + + .. <EPF Driver1>/ + ... <EPF Device 11>/ + ... vendorid + ... deviceid + ... revid + ... progif_code + ... subclass_code + ... baseclass_code + ... cache_line_size + ... subsys_vendor_id + ... subsys_id + ... interrupt_pin + +EPC Device +========== Every registered EPC device will be listed in controllers directory. The entries corresponding to EPC device will be created by the EPC core. - -/sys/kernel/config/pci_ep/controllers/ - .. <EPC Device1>/ - ... <Symlink EPF Device11>/ - ... <Symlink EPF Device12>/ - ... start - .. <EPC Device2>/ - ... <Symlink EPF Device21>/ - ... <Symlink EPF Device22>/ - ... start +:: + + /sys/kernel/config/pci_ep/controllers/ + .. <EPC Device1>/ + ... <Symlink EPF Device11>/ + ... <Symlink EPF Device12>/ + ... start + .. <EPC Device2>/ + ... <Symlink EPF Device21>/ + ... <Symlink EPF Device22>/ + ... start The <EPC Device> directory will have a list of symbolic links to <EPF Device>. These symbolic links should be created by the user to @@ -81,7 +94,7 @@ The <EPC Device> directory will also have a *start* field. 
Once "1" is written to this field, the endpoint device will be ready to establish the link with the host. This is usually done after all the EPF devices are created and linked with the EPC device. - +:: | controllers/ | <Directory: EPC name>/ @@ -102,4 +115,4 @@ all the EPF devices are created and linked with the EPC device. | interrupt_pin | function -[1] -> Documentation/PCI/endpoint/pci-endpoint.txt +[1] :doc:`pci-endpoint` diff --git a/Documentation/PCI/endpoint/pci-endpoint.txt b/Documentation/PCI/endpoint/pci-endpoint.rst index e86a96b66a6a..0e2311b5617b 100644 --- a/Documentation/PCI/endpoint/pci-endpoint.txt +++ b/Documentation/PCI/endpoint/pci-endpoint.rst @@ -1,11 +1,13 @@ - PCI ENDPOINT FRAMEWORK - Kishon Vijay Abraham I <kishon@ti.com> +.. SPDX-License-Identifier: GPL-2.0 + +:Author: Kishon Vijay Abraham I <kishon@ti.com> This document is a guide to use the PCI Endpoint Framework in order to create endpoint controller driver, endpoint function driver, and using configfs interface to bind the function driver to the controller driver. -1. Introduction +Introduction +============ Linux has a comprehensive PCI subsystem to support PCI controllers that operates in Root Complex mode. The subsystem has capability to scan PCI bus, @@ -19,26 +21,30 @@ add endpoint mode support in Linux. This will help to run Linux in an EP system which can have a wide variety of use cases from testing or validation, co-processor accelerator, etc. -2. PCI Endpoint Core +PCI Endpoint Core +================= The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller library, the Endpoint Function library, and the configfs layer to bind the endpoint function with the endpoint controller. -2.1 PCI Endpoint Controller(EPC) Library +PCI Endpoint Controller(EPC) Library +------------------------------------ The EPC library provides APIs to be used by the controller that can operate in endpoint mode. 
It also provides APIs to be used by function driver/library in order to implement a particular endpoint function. -2.1.1 APIs for the PCI controller Driver +APIs for the PCI controller Driver +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This section lists the APIs that the PCI Endpoint core provides to be used by the PCI controller driver. -*) devm_pci_epc_create()/pci_epc_create() +* devm_pci_epc_create()/pci_epc_create() The PCI controller driver should implement the following ops: + * write_header: ops to populate configuration space header * set_bar: ops to configure the BAR * clear_bar: ops to reset the BAR @@ -51,110 +57,116 @@ by the PCI controller driver. The PCI controller driver can then create a new EPC device by invoking devm_pci_epc_create()/pci_epc_create(). -*) devm_pci_epc_destroy()/pci_epc_destroy() +* devm_pci_epc_destroy()/pci_epc_destroy() The PCI controller driver can destroy the EPC device created by either devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or pci_epc_destroy(). -*) pci_epc_linkup() +* pci_epc_linkup() In order to notify all the function devices that the EPC device to which they are linked has established a link with the host, the PCI controller driver should invoke pci_epc_linkup(). -*) pci_epc_mem_init() +* pci_epc_mem_init() Initialize the pci_epc_mem structure used for allocating EPC addr space. -*) pci_epc_mem_exit() +* pci_epc_mem_exit() Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init(). -2.1.2 APIs for the PCI Endpoint Function Driver + +APIs for the PCI Endpoint Function Driver +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This section lists the APIs that the PCI Endpoint core provides to be used by the PCI endpoint function driver. -*) pci_epc_write_header() +* pci_epc_write_header() The PCI endpoint function driver should use pci_epc_write_header() to write the standard configuration header to the endpoint controller. 
-*) pci_epc_set_bar() +* pci_epc_set_bar() The PCI endpoint function driver should use pci_epc_set_bar() to configure the Base Address Register in order for the host to assign PCI addr space. Register space of the function driver is usually configured using this API. -*) pci_epc_clear_bar() +* pci_epc_clear_bar() The PCI endpoint function driver should use pci_epc_clear_bar() to reset the BAR. -*) pci_epc_raise_irq() +* pci_epc_raise_irq() The PCI endpoint function driver should use pci_epc_raise_irq() to raise Legacy Interrupt, MSI or MSI-X Interrupt. -*) pci_epc_mem_alloc_addr() +* pci_epc_mem_alloc_addr() The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to allocate memory address from EPC addr space which is required to access RC's buffer -*) pci_epc_mem_free_addr() +* pci_epc_mem_free_addr() The PCI endpoint function driver should use pci_epc_mem_free_addr() to free the memory space allocated using pci_epc_mem_alloc_addr(). -2.1.3 Other APIs +Other APIs +~~~~~~~~~~ There are other APIs provided by the EPC library. These are used for binding the EPF device with EPC device. pci-ep-cfs.c can be used as reference for using these APIs. -*) pci_epc_get() +* pci_epc_get() Get a reference to the PCI endpoint controller based on the device name of the controller. -*) pci_epc_put() +* pci_epc_put() Release the reference to the PCI endpoint controller obtained using pci_epc_get() -*) pci_epc_add_epf() +* pci_epc_add_epf() Add a PCI endpoint function to a PCI endpoint controller. A PCIe device can have up to 8 functions according to the specification. -*) pci_epc_remove_epf() +* pci_epc_remove_epf() Remove the PCI endpoint function from PCI endpoint controller. -*) pci_epc_start() +* pci_epc_start() The PCI endpoint function driver should invoke pci_epc_start() once it has configured the endpoint function and wants to start the PCI link. 
-*) pci_epc_stop() +* pci_epc_stop() The PCI endpoint function driver should invoke pci_epc_stop() to stop the PCI LINK. -2.2 PCI Endpoint Function(EPF) Library + +PCI Endpoint Function(EPF) Library +---------------------------------- The EPF library provides APIs to be used by the function driver and the EPC library to provide endpoint mode functionality. -2.2.1 APIs for the PCI Endpoint Function Driver +APIs for the PCI Endpoint Function Driver +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This section lists the APIs that the PCI Endpoint core provides to be used by the PCI endpoint function driver. -*) pci_epf_register_driver() +* pci_epf_register_driver() The PCI Endpoint Function driver should implement the following ops: * bind: ops to perform when a EPC device has been bound to EPF device @@ -166,50 +178,54 @@ by the PCI endpoint function driver. The PCI Function driver can then register the PCI EPF driver by using pci_epf_register_driver(). -*) pci_epf_unregister_driver() +* pci_epf_unregister_driver() The PCI Function driver can unregister the PCI EPF driver by using pci_epf_unregister_driver(). -*) pci_epf_alloc_space() +* pci_epf_alloc_space() The PCI Function driver can allocate space for a particular BAR using pci_epf_alloc_space(). -*) pci_epf_free_space() +* pci_epf_free_space() The PCI Function driver can free the allocated space (using pci_epf_alloc_space) by invoking pci_epf_free_space(). -2.2.2 APIs for the PCI Endpoint Controller Library +APIs for the PCI Endpoint Controller Library +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + This section lists the APIs that the PCI Endpoint core provides to be used by the PCI endpoint controller library. -*) pci_epf_linkup() +* pci_epf_linkup() The PCI endpoint controller library invokes pci_epf_linkup() when the EPC device has established the connection to the host. -2.2.2 Other APIs +Other APIs +~~~~~~~~~~ + There are other APIs provided by the EPF library. 
These are used to notify the function driver when the EPF device is bound to the EPC device. pci-ep-cfs.c can be used as reference for using these APIs. -*) pci_epf_create() +* pci_epf_create() Create a new PCI EPF device by passing the name of the PCI EPF device. This name will be used to bind the the EPF device to a EPF driver. -*) pci_epf_destroy() +* pci_epf_destroy() Destroy the created PCI EPF device. -*) pci_epf_bind() +* pci_epf_bind() pci_epf_bind() should be invoked when the EPF device has been bound to a EPC device. -*) pci_epf_unbind() +* pci_epf_unbind() pci_epf_unbind() should be invoked when the binding between EPC device and EPF device is lost. diff --git a/Documentation/PCI/endpoint/pci-test-function.txt b/Documentation/PCI/endpoint/pci-test-function.rst index 5916f1f592bb..3c8521d7aa31 100644 --- a/Documentation/PCI/endpoint/pci-test-function.txt +++ b/Documentation/PCI/endpoint/pci-test-function.rst @@ -1,5 +1,10 @@ - PCI TEST - Kishon Vijay Abraham I <kishon@ti.com> +.. SPDX-License-Identifier: GPL-2.0 + +================= +PCI Test Function +================= + +:Author: Kishon Vijay Abraham I <kishon@ti.com> Traditionally PCI RC has always been validated by using standard PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards. @@ -23,65 +28,76 @@ The PCI endpoint test device has the following registers: 8) PCI_ENDPOINT_TEST_IRQ_TYPE 9) PCI_ENDPOINT_TEST_IRQ_NUMBER -*) PCI_ENDPOINT_TEST_MAGIC +* PCI_ENDPOINT_TEST_MAGIC This register will be used to test BAR0. A known pattern will be written and read back from MAGIC register to verify BAR0. -*) PCI_ENDPOINT_TEST_COMMAND: +* PCI_ENDPOINT_TEST_COMMAND This register will be used by the host driver to indicate the function that the endpoint device must perform. 
-Bitfield Description: - Bit 0 : raise legacy IRQ - Bit 1 : raise MSI IRQ - Bit 2 : raise MSI-X IRQ - Bit 3 : read command (read data from RC buffer) - Bit 4 : write command (write data to RC buffer) - Bit 5 : copy command (copy data from one RC buffer to another - RC buffer) +======== ================================================================ +Bitfield Description +======== ================================================================ +Bit 0 raise legacy IRQ +Bit 1 raise MSI IRQ +Bit 2 raise MSI-X IRQ +Bit 3 read command (read data from RC buffer) +Bit 4 write command (write data to RC buffer) +Bit 5 copy command (copy data from one RC buffer to another RC buffer) +======== ================================================================ -*) PCI_ENDPOINT_TEST_STATUS +* PCI_ENDPOINT_TEST_STATUS This register reflects the status of the PCI endpoint device. -Bitfield Description: - Bit 0 : read success - Bit 1 : read fail - Bit 2 : write success - Bit 3 : write fail - Bit 4 : copy success - Bit 5 : copy fail - Bit 6 : IRQ raised - Bit 7 : source address is invalid - Bit 8 : destination address is invalid - -*) PCI_ENDPOINT_TEST_SRC_ADDR +======== ============================== +Bitfield Description +======== ============================== +Bit 0 read success +Bit 1 read fail +Bit 2 write success +Bit 3 write fail +Bit 4 copy success +Bit 5 copy fail +Bit 6 IRQ raised +Bit 7 source address is invalid +Bit 8 destination address is invalid +======== ============================== + +* PCI_ENDPOINT_TEST_SRC_ADDR This register contains the source address (RC buffer address) for the COPY/READ command. -*) PCI_ENDPOINT_TEST_DST_ADDR +* PCI_ENDPOINT_TEST_DST_ADDR This register contains the destination address (RC buffer address) for the COPY/WRITE command. -*) PCI_ENDPOINT_TEST_IRQ_TYPE +* PCI_ENDPOINT_TEST_IRQ_TYPE This register contains the interrupt type (Legacy/MSI) triggered for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands. 
Possible types: - - Legacy : 0 - - MSI : 1 - - MSI-X : 2 -*) PCI_ENDPOINT_TEST_IRQ_NUMBER +====== == +Legacy 0 +MSI 1 +MSI-X 2 +====== == + +* PCI_ENDPOINT_TEST_IRQ_NUMBER This register contains the triggered ID interrupt. Admissible values: - - Legacy : 0 - - MSI : [1 .. 32] - - MSI-X : [1 .. 2048] + +====== =========== +Legacy 0 +MSI [1 .. 32] +MSI-X [1 .. 2048] +====== =========== diff --git a/Documentation/PCI/endpoint/pci-test-howto.txt b/Documentation/PCI/endpoint/pci-test-howto.rst index 040479f437a5..909f770a07d6 100644 --- a/Documentation/PCI/endpoint/pci-test-howto.txt +++ b/Documentation/PCI/endpoint/pci-test-howto.rst @@ -1,38 +1,51 @@ - PCI TEST USERGUIDE - Kishon Vijay Abraham I <kishon@ti.com> +.. SPDX-License-Identifier: GPL-2.0 + +=================== +PCI Test User Guide +=================== + +:Author: Kishon Vijay Abraham I <kishon@ti.com> This document is a guide to help users use pci-epf-test function driver and pci_endpoint_test host driver for testing PCI. The list of steps to be followed in the host side and EP side is given below. -1. 
Endpoint Device +Endpoint Device +=============== -1.1 Endpoint Controller Devices +Endpoint Controller Devices +--------------------------- -To find the list of endpoint controller devices in the system: +To find the list of endpoint controller devices in the system:: # ls /sys/class/pci_epc/ 51000000.pcie_ep -If PCI_ENDPOINT_CONFIGFS is enabled +If PCI_ENDPOINT_CONFIGFS is enabled:: + # ls /sys/kernel/config/pci_ep/controllers 51000000.pcie_ep -1.2 Endpoint Function Drivers -To find the list of endpoint function drivers in the system: +Endpoint Function Drivers +------------------------- + +To find the list of endpoint function drivers in the system:: # ls /sys/bus/pci-epf/drivers pci_epf_test -If PCI_ENDPOINT_CONFIGFS is enabled +If PCI_ENDPOINT_CONFIGFS is enabled:: + # ls /sys/kernel/config/pci_ep/functions pci_epf_test -1.3 Creating pci-epf-test Device + +Creating pci-epf-test Device +---------------------------- PCI endpoint function device can be created using the configfs. To create -pci-epf-test device, the following commands can be used +pci-epf-test device, the following commands can be used:: # mount -t configfs none /sys/kernel/config # cd /sys/kernel/config/pci_ep/ @@ -42,7 +55,7 @@ The "mkdir func1" above creates the pci-epf-test function device that will be probed by pci_epf_test driver. The PCI endpoint framework populates the directory with the following -configurable fields. +configurable fields:: # ls functions/pci_epf_test/func1 baseclass_code interrupt_pin progif_code subsys_id @@ -51,67 +64,83 @@ configurable fields. The PCI endpoint function driver populates these entries with default values when the device is bound to the driver. 
The pci-epf-test driver populates -vendorid with 0xffff and interrupt_pin with 0x0001 +vendorid with 0xffff and interrupt_pin with 0x0001:: # cat functions/pci_epf_test/func1/vendorid 0xffff # cat functions/pci_epf_test/func1/interrupt_pin 0x0001 -1.4 Configuring pci-epf-test Device + +Configuring pci-epf-test Device +------------------------------- The user can configure the pci-epf-test device using configfs entry. In order to change the vendorid and the number of MSI interrupts used by the function -device, the following commands can be used. +device, the following commands can be used:: # echo 0x104c > functions/pci_epf_test/func1/vendorid # echo 0xb500 > functions/pci_epf_test/func1/deviceid # echo 16 > functions/pci_epf_test/func1/msi_interrupts # echo 8 > functions/pci_epf_test/func1/msix_interrupts -1.5 Binding pci-epf-test Device to EP Controller + +Binding pci-epf-test Device to EP Controller +-------------------------------------------- In order for the endpoint function device to be useful, it has to be bound to a PCI endpoint controller driver. Use the configfs to bind the function -device to one of the controller driver present in the system. +device to one of the controller driver present in the system:: # ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/ Once the above step is completed, the PCI endpoint is ready to establish a link with the host. -1.6 Start the Link + +Start the Link +-------------- In order for the endpoint device to establish a link with the host, the _start_ -field should be populated with '1'. +field should be populated with '1':: # echo 1 > controllers/51000000.pcie_ep/start -2. 
RootComplex Device -2.1 lspci Output +RootComplex Device +================== + +lspci Output +------------ -Note that the devices listed here correspond to the value populated in 1.4 above +Note that the devices listed here correspond to the value populated in 1.4 +above:: 00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01) 01:00.0 Unassigned class [ff00]: Texas Instruments Device b500 -2.2 Using Endpoint Test function Device + +Using Endpoint Test function Device +----------------------------------- pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint -tests. To compile this tool the following commands should be used: +tests. To compile this tool the following commands should be used:: # cd <kernel-dir> # make -C tools/pci -or if you desire to compile and install in your system: +or if you desire to compile and install in your system:: # cd <kernel-dir> # make -C tools/pci install The tool and script will be located in <rootfs>/usr/bin/ -2.2.1 pcitest.sh Output + +pcitest.sh Output +~~~~~~~~~~~~~~~~~ +:: + # pcitest.sh BAR tests diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst new file mode 100644 index 000000000000..f4c6121868c3 --- /dev/null +++ b/Documentation/PCI/index.rst @@ -0,0 +1,18 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================= +Linux PCI Bus Subsystem +======================= + +.. 
toctree:: + :maxdepth: 2 + :numbered: + + pci + picebus-howto + pci-iov-howto + msi-howto + acpi-info + pci-error-recovery + pcieaer-howto + endpoint/index diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/msi-howto.rst index 618e13d5e276..994cbb660ade 100644 --- a/Documentation/PCI/MSI-HOWTO.txt +++ b/Documentation/PCI/msi-howto.rst @@ -1,13 +1,16 @@ - The MSI Driver Guide HOWTO - Tom L Nguyen tom.l.nguyen@intel.com - 10/03/2003 - Revised Feb 12, 2004 by Martine Silbermann - email: Martine.Silbermann@hp.com - Revised Jun 25, 2004 by Tom L Nguyen - Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com> - Copyright 2003, 2008 Intel Corporation +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> -1. About this guide +========================== +The MSI Driver Guide HOWTO +========================== + +:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox + +:Copyright: 2003, 2008 Intel Corporation + +About this guide +================ This guide describes the basics of Message Signaled Interrupts (MSIs), the advantages of using MSI over traditional interrupt mechanisms, how @@ -15,7 +18,8 @@ to change your driver to use MSI or MSI-X and some basic diagnostics to try if a device doesn't support MSIs. -2. What are MSIs? +What are MSIs? +============== A Message Signaled Interrupt is a write from the device to a special address which causes an interrupt to be received by the CPU. @@ -29,7 +33,8 @@ Devices may support both MSI and MSI-X, but only one can be enabled at a time. -3. Why use MSIs? +Why use MSIs? +============= There are three reasons why using MSIs can give an advantage over traditional pin-based interrupts. @@ -61,14 +66,16 @@ Other possible designs include giving one interrupt to each packet queue in a network card or each port in a storage controller. -4. How to use MSIs +How to use MSIs +=============== PCI devices are initialised to use pin-based interrupts. 
The device driver has to set up the device to use MSI or MSI-X. Not all machines support MSIs correctly, and for those machines, the APIs described below will simply fail and the device will continue to use pin-based interrupts. -4.1 Include kernel support for MSIs +Include kernel support for MSIs +------------------------------- To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI option enabled. This option is only available on some architectures, @@ -76,14 +83,15 @@ and it may depend on some other options also being set. For example, on x86, you must also enable X86_UP_APIC or SMP in order to see the CONFIG_PCI_MSI option. -4.2 Using MSI +Using MSI +--------- Most of the hard work is done for the driver in the PCI layer. The driver simply has to request that the PCI layer set up the MSI capability for this device. To automatically use MSI or MSI-X interrupt vectors, use the following -function: +function:: int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, unsigned int max_vecs, unsigned int flags); @@ -101,12 +109,12 @@ any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set, pci_alloc_irq_vectors() will spread the interrupts around the available CPUs. To get the Linux IRQ numbers passed to request_irq() and free_irq() and the -vectors, use the following function: +vectors, use the following function:: int pci_irq_vector(struct pci_dev *dev, unsigned int nr); Any allocated resources should be freed before removing the device using -the following function: +the following function:: void pci_free_irq_vectors(struct pci_dev *dev); @@ -126,7 +134,7 @@ The typical usage of MSI or MSI-X interrupts is to allocate as many vectors as possible, likely up to the limit supported by the device. 
If nvec is larger than the number supported by the device it will automatically be capped to the supported limit, so there is no need to query the number of -vectors supported beforehand: +vectors supported beforehand:: nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES) if (nvec < 0) @@ -135,7 +143,7 @@ vectors supported beforehand: If a driver is unable or unwilling to deal with a variable number of MSI interrupts it can request a particular number of interrupts by passing that number to pci_alloc_irq_vectors() function as both 'min_vecs' and -'max_vecs' parameters: +'max_vecs' parameters:: ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES); if (ret < 0) @@ -143,23 +151,24 @@ number to pci_alloc_irq_vectors() function as both 'min_vecs' and The most notorious example of the request type described above is enabling the single MSI mode for a device. It could be done by passing two 1s as -'min_vecs' and 'max_vecs': +'min_vecs' and 'max_vecs':: ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES); if (ret < 0) goto out_err; Some devices might not support using legacy line interrupts, in which case -the driver can specify that only MSI or MSI-X is acceptable: +the driver can specify that only MSI or MSI-X is acceptable:: nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX); if (nvec < 0) goto out_err; -4.3 Legacy APIs +Legacy APIs +----------- The following old APIs to enable and disable MSI or MSI-X interrupts should -not be used in new code: +not be used in new code:: pci_enable_msi() /* deprecated */ pci_disable_msi() /* deprecated */ @@ -174,9 +183,11 @@ number of vectors. If you have a legitimate special use case for the count of vectors we might have to revisit that decision and add a pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently. 
-4.4 Considerations when using MSIs +Considerations when using MSIs +------------------------------ -4.4.1 Spinlocks +Spinlocks +~~~~~~~~~ Most device drivers have a per-device spinlock which is taken in the interrupt handler. With pin-based interrupts or a single MSI, it is not @@ -188,7 +199,8 @@ acquire the spinlock. Such deadlocks can be avoided by using spin_lock_irqsave() or spin_lock_irq() which disable local interrupts and acquire the lock (see Documentation/kernel-hacking/locking.rst). -4.5 How to tell whether MSI/MSI-X is enabled on a device +How to tell whether MSI/MSI-X is enabled on a device +---------------------------------------------------- Using 'lspci -v' (as root) may show some devices with "MSI", "Message Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities @@ -196,7 +208,8 @@ has an 'Enable' flag which is followed with either "+" (enabled) or "-" (disabled). -5. MSI quirks +MSI quirks +========== Several PCI chipsets or devices are known not to support MSIs. The PCI stack provides three ways to disable MSIs: @@ -205,7 +218,8 @@ The PCI stack provides three ways to disable MSIs: 2. on all devices behind a specific bridge 3. on a single device -5.1. Disabling MSIs globally +Disabling MSIs globally +----------------------- Some host chipsets simply don't support MSIs properly. If we're lucky, the manufacturer knows this and has indicated it in the ACPI @@ -219,7 +233,8 @@ on the kernel command line to disable MSIs on all devices. It would be in your best interests to report the problem to linux-pci@vger.kernel.org including a full 'lspci -v' so we can add the quirks to the kernel. -5.2. Disabling MSIs below a bridge +Disabling MSIs below a bridge +----------------------------- Some PCI bridges are not able to route MSIs between busses properly. In this case, MSIs must be disabled on all devices behind the bridge. @@ -230,7 +245,7 @@ as the nVidia nForce and Serverworks HT2000). 
As with host chipsets, Linux mostly knows about them and automatically enables MSIs if it can. If you have a bridge unknown to Linux, you can enable MSIs in configuration space using whatever method you know works, then -enable MSIs on that bridge by doing: +enable MSIs on that bridge by doing:: echo 1 > /sys/bus/pci/devices/$bridge/msi_bus @@ -244,7 +259,8 @@ below this bridge. Again, please notify linux-pci@vger.kernel.org of any bridges that need special handling. -5.3. Disabling MSIs on a single device +Disabling MSIs on a single device +--------------------------------- Some devices are known to have faulty MSI implementations. Usually this is handled in the individual device driver, but occasionally it's necessary @@ -252,7 +268,8 @@ to handle this with a quirk. Some drivers have an option to disable use of MSI. While this is a convenient workaround for the driver author, it is not good practice, and should not be emulated. -5.4. Finding why MSIs are disabled on a device +Finding why MSIs are disabled on a device +----------------------------------------- From the above three sections, you can see that there are many reasons why MSIs may not be enabled for a given device. Your first step should @@ -260,8 +277,8 @@ be to examine your dmesg carefully to determine whether MSIs are enabled for your machine. You should also check your .config to be sure you have enabled CONFIG_PCI_MSI. -Then, 'lspci -t' gives the list of bridges above a device. Reading -/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1) +Then, 'lspci -t' gives the list of bridges above a device. Reading +`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1) or disabled (0). If 0 is found in any of the msi_bus files belonging to bridges between the PCI root and the device, MSIs are disabled. 
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.rst index 0b6bb3ef449e..83db42092935 100644 --- a/Documentation/PCI/pci-error-recovery.txt +++ b/Documentation/PCI/pci-error-recovery.rst @@ -1,12 +1,13 @@ +.. SPDX-License-Identifier: GPL-2.0 - PCI Error Recovery - ------------------ - February 2, 2006 +================== +PCI Error Recovery +================== - Current document maintainer: - Linas Vepstas <linasvepstas@gmail.com> - updated by Richard Lary <rlary@us.ibm.com> - and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009 + +:Authors: - Linas Vepstas <linasvepstas@gmail.com> + - Richard Lary <rlary@us.ibm.com> + - Mike Mason <mmlnx@us.ibm.com> Many PCI bus controllers are able to detect a variety of hardware @@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets. Detailed Design ---------------- +=============== + Design and implementation details below, based on a chain of public email discussions with Ben Herrenschmidt, circa 5 April 2005. @@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware", and the actual recovery steps taken are platform dependent. The arch/powerpc implementation will simulate a PCI hotplug remove/add. 
-This structure has the form: -struct pci_error_handlers -{ - int (*error_detected)(struct pci_dev *dev, enum pci_channel_state); - int (*mmio_enabled)(struct pci_dev *dev); - int (*slot_reset)(struct pci_dev *dev); - void (*resume)(struct pci_dev *dev); -}; - -The possible channel states are: -enum pci_channel_state { - pci_channel_io_normal, /* I/O channel is in normal state */ - pci_channel_io_frozen, /* I/O to channel is blocked */ - pci_channel_io_perm_failure, /* PCI card is dead */ -}; - -Possible return values are: -enum pci_ers_result { - PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */ - PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */ - PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */ - PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ - PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ -}; +This structure has the form:: + + struct pci_error_handlers + { + int (*error_detected)(struct pci_dev *dev, enum pci_channel_state); + int (*mmio_enabled)(struct pci_dev *dev); + int (*slot_reset)(struct pci_dev *dev); + void (*resume)(struct pci_dev *dev); + }; + +The possible channel states are:: + + enum pci_channel_state { + pci_channel_io_normal, /* I/O channel is in normal state */ + pci_channel_io_frozen, /* I/O to channel is blocked */ + pci_channel_io_perm_failure, /* PCI card is dead */ + }; + +Possible return values are:: + + enum pci_ers_result { + PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */ + PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */ + PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. 
*/ + PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ + PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ + }; A driver does not have to implement all of these callbacks; however, if it implements any, it must implement error_detected(). If a callback @@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a All drivers participating in this system must implement this call. The driver must return one of the following result codes: - - PCI_ERS_RESULT_CAN_RECOVER: - Driver returns this if it thinks it might be able to recover - the HW by just banging IOs or if it wants to be given - a chance to extract some diagnostic information (see - mmio_enable, below). - - PCI_ERS_RESULT_NEED_RESET: - Driver returns this if it can't recover without a - slot reset. - - PCI_ERS_RESULT_DISCONNECT: - Driver returns this if it doesn't want to recover at all. + + - PCI_ERS_RESULT_CAN_RECOVER + Driver returns this if it thinks it might be able to recover + the HW by just banging IOs or if it wants to be given + a chance to extract some diagnostic information (see + mmio_enable, below). + - PCI_ERS_RESULT_NEED_RESET + Driver returns this if it can't recover without a + slot reset. + - PCI_ERS_RESULT_DISCONNECT + Driver returns this if it doesn't want to recover at all. The next step taken will depend on the result codes returned by the drivers. @@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset). If the platform is unable to recover the slot, the next step is STEP 6 (Permanent Failure). ->>> The current powerpc implementation assumes that a device driver will ->>> *not* schedule or semaphore in this routine; the current powerpc ->>> implementation uses one kernel thread to notify all devices; ->>> thus, if one device sleeps/schedules, all devices are affected. ->>> Doing better requires complex multi-threaded logic in the error ->>> recovery implementation (e.g. 
waiting for all notification threads ->>> to "join" before proceeding with recovery.) This seems excessively ->>> complex and not worth implementing. - ->>> The current powerpc implementation doesn't much care if the device ->>> attempts I/O at this point, or not. I/O's will fail, returning ->>> a value of 0xff on read, and writes will be dropped. If more than ->>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH ->>> assumes that the device driver has gone into an infinite loop ->>> and prints an error to syslog. A reboot is then required to ->>> get the device working again. +.. note:: + + The current powerpc implementation assumes that a device driver will + *not* schedule or semaphore in this routine; the current powerpc + implementation uses one kernel thread to notify all devices; + thus, if one device sleeps/schedules, all devices are affected. + Doing better requires complex multi-threaded logic in the error + recovery implementation (e.g. waiting for all notification threads + to "join" before proceeding with recovery.) This seems excessively + complex and not worth implementing. + + The current powerpc implementation doesn't much care if the device + attempts I/O at this point, or not. I/O's will fail, returning + a value of 0xff on read, and writes will be dropped. If more than + EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH + assumes that the device driver has gone into an infinite loop + and prints an error to syslog. A reboot is then required to + get the device working again. STEP 2: MMIO Enabled -------------------- +-------------------- The platform re-enables MMIO to the device (but typically not the DMA), and then calls the mmio_enabled() callback on all affected device drivers. @@ -192,34 +200,36 @@ link reset was performed by the HW. 
If the platform can't just re-enable IOs without a slot reset or a link reset, it will not call this callback, and instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) ->>> The following is proposed; no platform implements this yet: ->>> Proposal: All I/O's should be done _synchronously_ from within ->>> this callback, errors triggered by them will be returned via ->>> the normal pci_check_whatever() API, no new error_detected() ->>> callback will be issued due to an error happening here. However, ->>> such an error might cause IOs to be re-blocked for the whole ->>> segment, and thus invalidate the recovery that other devices ->>> on the same segment might have done, forcing the whole segment ->>> into one of the next states, that is, link reset or slot reset. +.. note:: + + The following is proposed; no platform implements this yet: + Proposal: All I/O's should be done _synchronously_ from within + this callback, errors triggered by them will be returned via + the normal pci_check_whatever() API, no new error_detected() + callback will be issued due to an error happening here. However, + such an error might cause IOs to be re-blocked for the whole + segment, and thus invalidate the recovery that other devices + on the same segment might have done, forcing the whole segment + into one of the next states, that is, link reset or slot reset. The driver should return one of the following result codes: - - PCI_ERS_RESULT_RECOVERED - Driver returns this if it thinks the device is fully - functional and thinks it is ready to start - normal driver operations again. There is no - guarantee that the driver will actually be - allowed to proceed, as another driver on the - same segment might have failed and thus triggered a - slot reset on platforms that support it. - - - PCI_ERS_RESULT_NEED_RESET - Driver returns this if it thinks the device is not - recoverable in its current state and it needs a slot - reset to proceed. 
- - - PCI_ERS_RESULT_DISCONNECT - Same as above. Total failure, no recovery even after - reset driver dead. (To be defined more precisely) + - PCI_ERS_RESULT_RECOVERED + Driver returns this if it thinks the device is fully + functional and thinks it is ready to start + normal driver operations again. There is no + guarantee that the driver will actually be + allowed to proceed, as another driver on the + same segment might have failed and thus triggered a + slot reset on platforms that support it. + + - PCI_ERS_RESULT_NEED_RESET + Driver returns this if it thinks the device is not + recoverable in its current state and it needs a slot + reset to proceed. + + - PCI_ERS_RESULT_DISCONNECT + Same as above. Total failure, no recovery even after + reset driver dead. (To be defined more precisely) The next step taken depends on the results returned by the drivers. If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform @@ -293,31 +303,33 @@ device will be considered "dead" in this case. Drivers for multi-function cards will need to coordinate among themselves as to which driver instance will perform any "one-shot" or global device initialization. For example, the Symbios sym53cxx2 -driver performs device init only from PCI function 0: +driver performs device init only from PCI function 0:: -+ if (PCI_FUNC(pdev->devfn) == 0) -+ sym_reset_scsi_bus(np, 0); + + if (PCI_FUNC(pdev->devfn) == 0) + + sym_reset_scsi_bus(np, 0); - Result codes: - - PCI_ERS_RESULT_DISCONNECT - Same as above. +Result codes: + - PCI_ERS_RESULT_DISCONNECT + Same as above. Drivers for PCI Express cards that require a fundamental reset must set the needs_freset bit in the pci_dev structure in their probe function. 
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain -PCI card types: +PCI card types:: -+ /* Set EEH reset type to fundamental if required by hba */ -+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha)) -+ pdev->needs_freset = 1; -+ + + /* Set EEH reset type to fundamental if required by hba */ + + if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha)) + + pdev->needs_freset = 1; + + Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent Failure). ->>> The current powerpc implementation does not try a power-cycle ->>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT. ->>> However, it probably should. +.. note:: + + The current powerpc implementation does not try a power-cycle + reset if the driver returned PCI_ERS_RESULT_DISCONNECT. + However, it probably should. STEP 5: Resume Operations @@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy. That is, the recovery API only requires that: - There is no guarantee that interrupt delivery can proceed from any -device on the segment starting from the error detection and until the -slot_reset callback is called, at which point interrupts are expected -to be fully operational. + device on the segment starting from the error detection and until the + slot_reset callback is called, at which point interrupts are expected + to be fully operational. - There is no guarantee that interrupt delivery is stopped, that is, -a driver that gets an interrupt after detecting an error, or that detects -an error within the interrupt handler such that it prevents proper -ack'ing of the interrupt (and thus removal of the source) should just -return IRQ_NOTHANDLED. It's up to the platform to deal with that -condition, typically by masking the IRQ source during the duration of -the error handling. 
It is expected that the platform "knows" which -interrupts are routed to error-management capable slots and can deal -with temporarily disabling that IRQ number during error processing (this -isn't terribly complex). That means some IRQ latency for other devices -sharing the interrupt, but there is simply no other way. High end -platforms aren't supposed to share interrupts between many devices -anyway :) - ->>> Implementation details for the powerpc platform are discussed in ->>> the file Documentation/powerpc/eeh-pci-error-recovery.txt - ->>> As of this writing, there is a growing list of device drivers with ->>> patches implementing error recovery. Not all of these patches are in ->>> mainline yet. These may be used as "examples": ->>> ->>> drivers/scsi/ipr ->>> drivers/scsi/sym53c8xx_2 ->>> drivers/scsi/qla2xxx ->>> drivers/scsi/lpfc ->>> drivers/next/bnx2.c ->>> drivers/next/e100.c ->>> drivers/net/e1000 ->>> drivers/net/e1000e ->>> drivers/net/ixgb ->>> drivers/net/ixgbe ->>> drivers/net/cxgb3 ->>> drivers/net/s2io.c ->>> drivers/net/qlge - -The End -------- + a driver that gets an interrupt after detecting an error, or that detects + an error within the interrupt handler such that it prevents proper + ack'ing of the interrupt (and thus removal of the source) should just + return IRQ_NOTHANDLED. It's up to the platform to deal with that + condition, typically by masking the IRQ source during the duration of + the error handling. It is expected that the platform "knows" which + interrupts are routed to error-management capable slots and can deal + with temporarily disabling that IRQ number during error processing (this + isn't terribly complex). That means some IRQ latency for other devices + sharing the interrupt, but there is simply no other way. High end + platforms aren't supposed to share interrupts between many devices + anyway :) + +.. 
note:: + + Implementation details for the powerpc platform are discussed in + the file Documentation/powerpc/eeh-pci-error-recovery.txt + + As of this writing, there is a growing list of device drivers with + patches implementing error recovery. Not all of these patches are in + mainline yet. These may be used as "examples": + + - drivers/scsi/ipr + - drivers/scsi/sym53c8xx_2 + - drivers/scsi/qla2xxx + - drivers/scsi/lpfc + - drivers/next/bnx2.c + - drivers/next/e100.c + - drivers/net/e1000 + - drivers/net/e1000e + - drivers/net/ixgb + - drivers/net/ixgbe + - drivers/net/cxgb3 + - drivers/net/s2io.c + - drivers/net/qlge diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.rst index d2a84151e99c..b9fd003206f1 100644 --- a/Documentation/PCI/pci-iov-howto.txt +++ b/Documentation/PCI/pci-iov-howto.rst @@ -1,14 +1,19 @@ - PCI Express I/O Virtualization Howto - Copyright (C) 2009 Intel Corporation - Yu Zhao <yu.zhao@intel.com> +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> - Update: November 2012 - -- sysfs-based SRIOV enable-/disable-ment - Donald Dutile <ddutile@redhat.com> +==================================== +PCI Express I/O Virtualization Howto +==================================== -1. Overview +:Copyright: |copy| 2009 Intel Corporation +:Authors: - Yu Zhao <yu.zhao@intel.com> + - Donald Dutile <ddutile@redhat.com> -1.1 What is SR-IOV +Overview +======== + +What is SR-IOV +-------------- Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended capability which makes one physical device appear as multiple virtual @@ -23,9 +28,11 @@ Memory Space, which is used to map its register set. VF device driver operates on the register set so it can be functional and appear as a real existing PCI device. -2. User Guide +User Guide +========== -2.1 How can I enable SR-IOV capability +How can I enable SR-IOV capability +---------------------------------- Multiple methods are available for SR-IOV enablement. 
In the first method, the device driver (PF driver) will control the @@ -43,105 +50,123 @@ checks, e.g., check numvfs == 0 if enabling VFs, ensure numvfs <= totalvfs. The second method is the recommended method for new/future VF devices. -2.2 How can I use the Virtual Functions +How can I use the Virtual Functions +----------------------------------- The VF is treated as hot-plugged PCI devices in the kernel, so they should be able to work in the same way as real PCI devices. The VF requires device driver that is same as a normal PCI device's. -3. Developer Guide +Developer Guide +=============== -3.1 SR-IOV API +SR-IOV API +---------- To enable SR-IOV capability: -(a) For the first method, in the driver: + +(a) For the first method, in the driver:: + int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); - 'nr_virtfn' is number of VFs to be enabled. -(b) For the second method, from sysfs: + +'nr_virtfn' is number of VFs to be enabled. + +(b) For the second method, from sysfs:: + echo 'nr_virtfn' > \ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs To disable SR-IOV capability: -(a) For the first method, in the driver: + +(a) For the first method, in the driver:: + void pci_disable_sriov(struct pci_dev *dev); -(b) For the second method, from sysfs: + +(b) For the second method, from sysfs:: + echo 0 > \ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs To enable auto probing VFs by a compatible driver on the host, run command below before enabling SR-IOV capabilities. This is the default behavior. +:: + echo 1 > \ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe To disable auto probing VFs by a compatible driver on the host, run command below before enabling SR-IOV capabilities. Updating this entry will not affect VFs which are already probed. 
+:: + echo 0 > \ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe -3.2 Usage example +Usage example +------------- Following piece of code illustrates the usage of the SR-IOV API. +:: -static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id) -{ - pci_enable_sriov(dev, NR_VIRTFN); + static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id) + { + pci_enable_sriov(dev, NR_VIRTFN); - ... - - return 0; -} + ... -static void dev_remove(struct pci_dev *dev) -{ - pci_disable_sriov(dev); + return 0; + } - ... -} + static void dev_remove(struct pci_dev *dev) + { + pci_disable_sriov(dev); -static int dev_suspend(struct pci_dev *dev, pm_message_t state) -{ - ... + ... + } - return 0; -} + static int dev_suspend(struct pci_dev *dev, pm_message_t state) + { + ... -static int dev_resume(struct pci_dev *dev) -{ - ... + return 0; + } - return 0; -} + static int dev_resume(struct pci_dev *dev) + { + ... -static void dev_shutdown(struct pci_dev *dev) -{ - ... -} + return 0; + } -static int dev_sriov_configure(struct pci_dev *dev, int numvfs) -{ - if (numvfs > 0) { - ... - pci_enable_sriov(dev, numvfs); + static void dev_shutdown(struct pci_dev *dev) + { ... - return numvfs; } - if (numvfs == 0) { - .... - pci_disable_sriov(dev); - ... - return 0; + + static int dev_sriov_configure(struct pci_dev *dev, int numvfs) + { + if (numvfs > 0) { + ... + pci_enable_sriov(dev, numvfs); + ... + return numvfs; + } + if (numvfs == 0) { + .... + pci_disable_sriov(dev); + ... 
+ return 0; + } } -} - -static struct pci_driver dev_driver = { - .name = "SR-IOV Physical Function driver", - .id_table = dev_id_table, - .probe = dev_probe, - .remove = dev_remove, - .suspend = dev_suspend, - .resume = dev_resume, - .shutdown = dev_shutdown, - .sriov_configure = dev_sriov_configure, -}; + + static struct pci_driver dev_driver = { + .name = "SR-IOV Physical Function driver", + .id_table = dev_id_table, + .probe = dev_probe, + .remove = dev_remove, + .suspend = dev_suspend, + .resume = dev_resume, + .shutdown = dev_shutdown, + .sriov_configure = dev_sriov_configure, + }; diff --git a/Documentation/PCI/pci.txt b/Documentation/PCI/pci.rst index badb26ac33dc..6864f9a70f5f 100644 --- a/Documentation/PCI/pci.txt +++ b/Documentation/PCI/pci.rst @@ -1,10 +1,12 @@ +.. SPDX-License-Identifier: GPL-2.0 - How To Write Linux PCI Drivers +============================== +How To Write Linux PCI Drivers +============================== - by Martin Mares <mj@ucw.cz> on 07-Feb-2000 - updated by Grant Grundler <grundler@parisc-linux.org> on 23-Dec-2006 +:Authors: - Martin Mares <mj@ucw.cz> + - Grant Grundler <grundler@parisc-linux.org> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The world of PCI is vast and full of (mostly unpleasant) surprises. Since each CPU architecture implements different chip-sets and PCI devices have different requirements (erm, "features"), the result is the PCI support @@ -15,8 +17,7 @@ PCI device drivers. A more complete resource is the third edition of "Linux Device Drivers" by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman. LDD3 is available for free (under Creative Commons License) from: - - http://lwn.net/Kernel/LDD3/ +http://lwn.net/Kernel/LDD3/. However, keep in mind that all documents are subject to "bit rot". Refer to the source code if things are not working as described here. 
@@ -25,9 +26,8 @@ Please send questions/comments/patches about Linux PCI API to the "Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list. - -0. Structure of PCI drivers -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Structure of PCI drivers +======================== PCI drivers "discover" PCI devices in a system via pci_register_driver(). Actually, it's the other way around. When the PCI generic code discovers a new device, the driver with a matching "description" will be notified. @@ -42,24 +42,25 @@ pointers and thus dictates the high level structure of a driver. Once the driver knows about a PCI device and takes ownership, the driver generally needs to perform the following initialization: - Enable the device - Request MMIO/IOP resources - Set the DMA mask size (for both coherent and streaming DMA) - Allocate and initialize shared control data (pci_allocate_coherent()) - Access device configuration space (if needed) - Register IRQ handler (request_irq()) - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip) - Enable DMA/processing engines + - Enable the device + - Request MMIO/IOP resources + - Set the DMA mask size (for both coherent and streaming DMA) + - Allocate and initialize shared control data (pci_allocate_coherent()) + - Access device configuration space (if needed) + - Register IRQ handler (request_irq()) + - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip) + - Enable DMA/processing engines When done using the device, and perhaps the module needs to be unloaded, the driver needs to take the follow steps: - Disable the device from generating IRQs - Release the IRQ (free_irq()) - Stop all DMA activity - Release DMA buffers (both streaming and coherent) - Unregister from other subsystems (e.g. 
scsi or netdev) - Release MMIO/IOP resources - Disable the device + + - Disable the device from generating IRQs + - Release the IRQ (free_irq()) + - Stop all DMA activity + - Release DMA buffers (both streaming and coherent) + - Unregister from other subsystems (e.g. scsi or netdev) + - Release MMIO/IOP resources + - Disable the device Most of these topics are covered in the following sections. For the rest look at LDD3 or <linux/pci.h> . @@ -70,99 +71,38 @@ completely empty or just returning an appropriate error codes to avoid lots of ifdefs in the drivers. +pci_register_driver() call +========================== -1. pci_register_driver() call -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -PCI device drivers call pci_register_driver() during their +PCI device drivers call ``pci_register_driver()`` during their initialization with a pointer to a structure describing the driver -(struct pci_driver): - - field name Description - ---------- ------------------------------------------------------ - id_table Pointer to table of device ID's the driver is - interested in. Most drivers should export this - table using MODULE_DEVICE_TABLE(pci,...). - - probe This probing function gets called (during execution - of pci_register_driver() for already existing - devices or later if a new device gets inserted) for - all PCI devices which match the ID table and are not - "owned" by the other drivers yet. This function gets - passed a "struct pci_dev *" for each device whose - entry in the ID table matches the device. The probe - function returns zero when the driver chooses to - take "ownership" of the device or an error code - (negative number) otherwise. - The probe function always gets called from process - context, so it can sleep. - - remove The remove() function gets called whenever a device - being handled by this driver is removed (either during - deregistration of the driver or when it's manually - pulled out of a hot-pluggable slot). 
- The remove function always gets called from process - context, so it can sleep. - - suspend Put device into low power state. - suspend_late Put device into low power state. - - resume_early Wake device from low power state. - resume Wake device from low power state. - - (Please see Documentation/power/pci.txt for descriptions - of PCI Power Management and the related functions.) - - shutdown Hook into reboot_notifier_list (kernel/sys.c). - Intended to stop any idling DMA operations. - Useful for enabling wake-on-lan (NIC) or changing - the power state of a device before reboot. - e.g. drivers/net/e100.c. - - err_handler See Documentation/PCI/pci-error-recovery.txt - - -The ID table is an array of struct pci_device_id entries ending with an -all-zero entry. Definitions with static const are generally preferred. - -Each entry consists of: - - vendor,device Vendor and device ID to match (or PCI_ANY_ID) +(``struct pci_driver``): - subvendor, Subsystem vendor and device ID to match (or PCI_ANY_ID) - subdevice, +.. kernel-doc:: include/linux/pci.h + :functions: pci_driver - class Device class, subclass, and "interface" to match. - See Appendix D of the PCI Local Bus Spec or - include/linux/pci_ids.h for a full list of classes. - Most drivers do not need to specify class/class_mask - as vendor/device is normally sufficient. - - class_mask limit which sub-fields of the class field are compared. - See drivers/scsi/sym53c8xx_2/ for example of usage. - - driver_data Data private to the driver. - Most drivers don't need to use driver_data field. - Best practice is to use driver_data as an index - into a static list of equivalent device types, - instead of using it as a pointer. +The ID table is an array of ``struct pci_device_id`` entries ending with an +all-zero entry. Definitions with static const are generally preferred. +.. 
kernel-doc:: include/linux/mod_devicetable.h + :functions: pci_device_id -Most drivers only need PCI_DEVICE() or PCI_DEVICE_CLASS() to set up +Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up a pci_device_id table. New PCI IDs may be added to a device driver pci_ids table at runtime -as shown below: +as shown below:: -echo "vendor device subvendor subdevice class class_mask driver_data" > \ -/sys/bus/pci/drivers/{driver}/new_id + echo "vendor device subvendor subdevice class class_mask driver_data" > \ + /sys/bus/pci/drivers/{driver}/new_id All fields are passed in as hexadecimal values (no leading 0x). The vendor and device fields are mandatory, the others are optional. Users need pass only as many optional fields as necessary: - o subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF) - o class and classmask fields default to 0 - o driver_data defaults to 0UL. + + - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF) + - class and classmask fields default to 0 + - driver_data defaults to 0UL. Note that driver_data must match the value used by any of the pci_device_id entries defined in the driver. This makes the driver_data field mandatory @@ -175,29 +115,31 @@ When the driver exits, it just calls pci_unregister_driver() and the PCI layer automatically calls the remove hook for all devices handled by the driver. -1.1 "Attributes" for driver functions/data +"Attributes" for driver functions/data +-------------------------------------- Please mark the initialization and cleanup functions where appropriate (the corresponding macros are defined in <linux/init.h>): + ====== ================================================= __init Initialization code. Thrown away after the driver initializes. __exit Exit code. Ignored for non-modular drivers. 
+ ====== ================================================= Tips on when/where to use the above attributes: - o The module_init()/module_exit() functions (and all + - The module_init()/module_exit() functions (and all initialization functions called _only_ from these) should be marked __init/__exit. - o Do not mark the struct pci_driver. + - Do not mark the struct pci_driver. - o Do NOT mark a function if you are not sure which mark to use. + - Do NOT mark a function if you are not sure which mark to use. Better to not mark the function than mark the function wrong. - -2. How to find PCI devices manually -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +How to find PCI devices manually +================================ PCI drivers should have a really good reason for not using the pci_register_driver() interface to search for PCI devices. @@ -207,17 +149,17 @@ E.g. combined serial/parallel port/floppy controller. A manual search may be performed using the following constructs: -Searching by vendor and device ID: +Searching by vendor and device ID:: struct pci_dev *dev = NULL; while (dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)) configure_device(dev); -Searching by class ID (iterate in a similar way): +Searching by class ID (iterate in a similar way):: pci_get_class(CLASS_ID, dev) -Searching by both vendor/device and subsystem vendor/device ID: +Searching by both vendor/device and subsystem vendor/device ID:: pci_get_subsys(VENDOR_ID,DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev). @@ -230,21 +172,20 @@ the pci_dev that they return. You must eventually (possibly at module unload) decrement the reference count on these devices by calling pci_dev_put(). - -3. 
Device Initialization Steps -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Device Initialization Steps +=========================== As noted in the introduction, most PCI drivers need the following steps for device initialization: - Enable the device - Request MMIO/IOP resources - Set the DMA mask size (for both coherent and streaming DMA) - Allocate and initialize shared control data (pci_allocate_coherent()) - Access device configuration space (if needed) - Register IRQ handler (request_irq()) - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip) - Enable DMA/processing engines. + - Enable the device + - Request MMIO/IOP resources + - Set the DMA mask size (for both coherent and streaming DMA) + - Allocate and initialize shared control data (pci_allocate_coherent()) + - Access device configuration space (if needed) + - Register IRQ handler (request_irq()) + - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip) + - Enable DMA/processing engines. The driver can access PCI config space registers at any time. (Well, almost. When running BIST, config space can go away...but @@ -252,26 +193,29 @@ that will just result in a PCI Bus Master Abort and config reads will return garbage). -3.1 Enable the PCI device -~~~~~~~~~~~~~~~~~~~~~~~~~ +Enable the PCI device +--------------------- Before touching any device registers, the driver needs to enable the PCI device by calling pci_enable_device(). This will: - o wake up the device if it was in suspended state, - o allocate I/O and memory regions of the device (if BIOS did not), - o allocate an IRQ (if BIOS did not). -NOTE: pci_enable_device() can fail! Check the return value. + - wake up the device if it was in suspended state, + - allocate I/O and memory regions of the device (if BIOS did not), + - allocate an IRQ (if BIOS did not). -[ OS BUG: we don't check resource allocations before enabling those - resources. The sequence would make more sense if we called - pci_request_resources() before calling pci_enable_device(). 
- Currently, the device drivers can't detect the bug when two - devices have been allocated the same range. This is not a common - problem and unlikely to get fixed soon. +.. note:: + pci_enable_device() can fail! Check the return value. + +.. warning:: + OS BUG: we don't check resource allocations before enabling those + resources. The sequence would make more sense if we called + pci_request_resources() before calling pci_enable_device(). + Currently, the device drivers can't detect the bug when two + devices have been allocated the same range. This is not a common + problem and unlikely to get fixed soon. + + This has been discussed before but not changed as of 2.6.19: + http://lkml.org/lkml/2006/3/2/194 - This has been discussed before but not changed as of 2.6.19: - http://lkml.org/lkml/2006/3/2/194 -] pci_set_master() will enable DMA by setting the bus master bit in the PCI_COMMAND register. It also fixes the latency timer value if @@ -288,8 +232,8 @@ pci_try_set_mwi() to have the system do its best effort at enabling Mem-Wr-Inval. -3.2 Request MMIO/IOP resources -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Request MMIO/IOP resources +-------------------------- Memory (MMIO) and I/O port addresses should NOT be read directly from the PCI device config space. Use the values in the pci_dev structure as the PCI "bus address" might have been remapped to a "host physical" @@ -304,9 +248,10 @@ Conversely, drivers should call pci_release_region() AFTER calling pci_disable_device(). The idea is to prevent two devices colliding on the same address range. -[ See OS BUG comment above. Currently (2.6.19), the driver can only - determine MMIO and IO Port resource availability _after_ calling - pci_enable_device(). ] +.. tip:: + See OS BUG comment above. Currently (2.6.19), the driver can only + determine MMIO and IO Port resource availability _after_ calling + pci_enable_device(). 
Generic flavors of pci_request_region() are request_mem_region() (for MMIO ranges) and request_region() (for IO Port ranges). @@ -316,12 +261,13 @@ BARs. Also see pci_request_selected_regions() below. -3.3 Set the DMA mask size -~~~~~~~~~~~~~~~~~~~~~~~~~ -[ If anything below doesn't make sense, please refer to - Documentation/DMA-API.txt. This section is just a reminder that - drivers need to indicate DMA capabilities of the device and is not - an authoritative source for DMA interfaces. ] +Set the DMA mask size +--------------------- +.. note:: + If anything below doesn't make sense, please refer to + Documentation/DMA-API.txt. This section is just a reminder that + drivers need to indicate DMA capabilities of the device and is not + an authoritative source for DMA interfaces. While all drivers should explicitly indicate the DMA capability (e.g. 32 or 64 bit) of the PCI bus master, devices with more than @@ -342,23 +288,23 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are ("consistent") data. -3.4 Setup shared control data -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Setup shared control data +------------------------- Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared) memory. See Documentation/DMA-API.txt for a full description of the DMA APIs. This section is just a reminder that it needs to be done before enabling DMA on the device. -3.5 Initialize device registers -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Initialize device registers +--------------------------- Some drivers will need specific "capability" fields programmed or other "vendor specific" register initialized or reset. E.g. clearing pending interrupts. -3.6 Register IRQ handler -~~~~~~~~~~~~~~~~~~~~~~~~ +Register IRQ handler +-------------------- While calling request_irq() is the last step described here, this is often just another intermediate step to initialize a device. This step can often be deferred until the device is opened for use. 
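The initialization order described in the hunks above can be modeled outside the kernel. What follows is a minimal user-space sketch, not driver code: `step()` and `undo()` are hypothetical stubs that only record which action ran, and the `EV_*` names are illustrative labels for the kernel calls named in the text. It assumes a goto-based unwind, and it follows the text's request that pci_release_region() run after pci_disable_device().

```c
#include <assert.h>

/* Labels for the actions in the text; these are NOT kernel symbols. */
enum event {
	EV_ENABLE_DEVICE,	/* stands in for pci_enable_device() */
	EV_REQUEST_REGIONS,	/* stands in for pci_request_region() */
	EV_SET_DMA_MASK,	/* "Set the DMA mask size" */
	EV_ALLOC_SHARED,	/* "Setup shared control data" */
	EV_REQUEST_IRQ,		/* stands in for request_irq() */
	EV_FREE_SHARED,		/* cleanup of the shared control data */
	EV_DISABLE_DEVICE,	/* stands in for pci_disable_device() */
	EV_RELEASE_REGIONS,	/* stands in for pci_release_region() */
};

#define MAX_EVENTS 16

/* Ordered record of every action the model performed. */
struct trace {
	enum event ev[MAX_EVENTS];
	int n;
};

/* Hypothetical setup stub: fails if this is the step chosen to fail. */
static int step(struct trace *t, enum event e, int fail_at)
{
	if ((int)e == fail_at)
		return -1;
	assert(t->n < MAX_EVENTS);
	t->ev[t->n++] = e;
	return 0;
}

/* Hypothetical cleanup stub: cleanup is modeled as never failing. */
static void undo(struct trace *t, enum event e)
{
	assert(t->n < MAX_EVENTS);
	t->ev[t->n++] = e;
}

/*
 * Run the setup steps in the documented order. Pass fail_at = -1 for
 * success, or an EV_* setup label to make that step fail.
 */
int model_probe(struct trace *t, int fail_at)
{
	t->n = 0;

	if (step(t, EV_ENABLE_DEVICE, fail_at))
		return -1;
	if (step(t, EV_REQUEST_REGIONS, fail_at))
		goto err_disable;
	if (step(t, EV_SET_DMA_MASK, fail_at))
		goto err_release;
	if (step(t, EV_ALLOC_SHARED, fail_at))
		goto err_release;
	if (step(t, EV_REQUEST_IRQ, fail_at))
		goto err_free_shared;
	return 0;

err_free_shared:
	undo(t, EV_FREE_SHARED);
err_release:
	/* Regions are released only after the device is disabled. */
	undo(t, EV_DISABLE_DEVICE);
	undo(t, EV_RELEASE_REGIONS);
	return -1;
err_disable:
	undo(t, EV_DISABLE_DEVICE);
	return -1;
}
```

Calling `model_probe(&t, EV_REQUEST_IRQ)` leaves a trace that ends with the free-shared, disable and release actions in that order, which is the cleanup sequence a real probe routine would have to reproduce with the actual kernel calls.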
@@ -396,6 +342,7 @@ and msix_enabled flags in the pci_dev structure after calling pci_alloc_irq_vectors. There are (at least) two really good reasons for using MSI: + 1) MSI is an exclusive interrupt vector by definition. This means the interrupt handler doesn't have to verify its device caused the interrupt. @@ -410,24 +357,23 @@ See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples of MSI/MSI-X usage. - -4. PCI device shutdown -~~~~~~~~~~~~~~~~~~~~~~~ +PCI device shutdown +=================== When a PCI device driver is being unloaded, most of the following steps need to be performed: - Disable the device from generating IRQs - Release the IRQ (free_irq()) - Stop all DMA activity - Release DMA buffers (both streaming and consistent) - Unregister from other subsystems (e.g. scsi or netdev) - Disable device from responding to MMIO/IO Port addresses - Release MMIO/IO Port resource(s) + - Disable the device from generating IRQs + - Release the IRQ (free_irq()) + - Stop all DMA activity + - Release DMA buffers (both streaming and consistent) + - Unregister from other subsystems (e.g. scsi or netdev) + - Disable device from responding to MMIO/IO Port addresses + - Release MMIO/IO Port resource(s) -4.1 Stop IRQs on the device -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Stop IRQs on the device +----------------------- How to do this is chip/device specific. If it's not done, it opens the possibility of a "screaming interrupt" if (and only if) the IRQ is shared with another device. @@ -446,16 +392,16 @@ MSI and MSI-X are defined to be exclusive interrupts and thus are not susceptible to the "screaming interrupt" problem. -4.2 Release the IRQ -~~~~~~~~~~~~~~~~~~~ +Release the IRQ +--------------- Once the device is quiesced (no more IRQs), one can call free_irq(). This function will return control once any pending IRQs are handled, "unhook" the drivers IRQ handler from that IRQ, and finally release the IRQ if no one else is using it. 
-4.3 Stop all DMA activity -~~~~~~~~~~~~~~~~~~~~~~~~~ +Stop all DMA activity +--------------------- It's extremely important to stop all DMA operations BEFORE attempting to deallocate DMA control data. Failure to do so can result in memory corruption, hangs, and on some chip-sets a hard crash. @@ -467,8 +413,8 @@ While this step sounds obvious and trivial, several "mature" drivers didn't get this step right in the past. -4.4 Release DMA buffers -~~~~~~~~~~~~~~~~~~~~~~~ +Release DMA buffers +------------------- Once DMA is stopped, clean up streaming DMA first. I.e. unmap data buffers and return buffers to "upstream" owners if there is one. @@ -478,8 +424,8 @@ Then clean up "consistent" buffers which contain the control data. See Documentation/DMA-API.txt for details on unmapping interfaces. -4.5 Unregister from other subsystems -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Unregister from other subsystems +-------------------------------- Most low level PCI device drivers support some other subsystem like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your driver isn't losing resources from that other subsystem. @@ -487,31 +433,30 @@ If this happens, typically the symptom is an Oops (panic) when the subsystem attempts to call into a driver that has been unloaded. -4.6 Disable Device from responding to MMIO/IO Port addresses -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Disable Device from responding to MMIO/IO Port addresses +-------------------------------------------------------- io_unmap() MMIO or IO Port resources and then call pci_disable_device(). This is the symmetric opposite of pci_enable_device(). Do not access device registers after calling pci_disable_device(). -4.7 Release MMIO/IO Port Resource(s) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Release MMIO/IO Port Resource(s) +-------------------------------- Call pci_release_region() to mark the MMIO or IO Port range as available. 
Failure to do so usually results in the inability to reload the driver. +How to access PCI config space +============================== -5. How to access PCI config space -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You can use pci_(read|write)_config_(byte|word|dword) to access the config -space of a device represented by struct pci_dev *. All these functions return 0 -when successful or an error code (PCIBIOS_...) which can be translated to a text -string by pcibios_strerror. Most drivers expect that accesses to valid PCI +You can use `pci_(read|write)_config_(byte|word|dword)` to access the config +space of a device represented by `struct pci_dev *`. All these functions return +0 when successful or an error code (`PCIBIOS_...`) which can be translated to a +text string by pcibios_strerror. Most drivers expect that accesses to valid PCI devices don't fail. If you don't have a struct pci_dev available, you can call -pci_bus_(read|write)_config_(byte|word|dword) to access a given device +`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device and function on that bus. If you access fields in the standard portion of the config header, please @@ -522,10 +467,10 @@ pci_find_capability() for the particular capability and it will find the corresponding register block for you. +Other interesting functions +=========================== -6. Other interesting functions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - +============================= ================================================ pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain, bus and slot and number. If the device is found, its reference count is increased. @@ -539,11 +484,11 @@ pci_set_drvdata() Set private driver data pointer for a pci_dev pci_get_drvdata() Return private driver data pointer for a pci_dev pci_set_mwi() Enable Memory-Write-Invalidate transactions. pci_clear_mwi() Disable Memory-Write-Invalidate transactions. 
+============================= ================================================ - -7. Miscellaneous hints -~~~~~~~~~~~~~~~~~~~~~~ +Miscellaneous hints +=================== When displaying PCI device names to the user (for example when a driver wants to tell the user what card has it found), please use pci_name(pci_dev). @@ -559,9 +504,8 @@ on the bus need to be capable of doing it, so this is something which needs to be handled by platform and generic code, not individual drivers. - -8. Vendor and device identifications -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Vendor and device identifications +================================= Do not add new device or vendor IDs to include/linux/pci_ids.h unless they are shared across multiple drivers. You can add private definitions in @@ -575,28 +519,27 @@ There are mirrors of the pci.ids file at http://pciids.sourceforge.net/ and https://github.com/pciutils/pciids. - -9. Obsolete functions -~~~~~~~~~~~~~~~~~~~~~ +Obsolete functions +================== There are several functions which you might come across when trying to port an old driver to the new PCI interface. They are no longer present in the kernel as they aren't compatible with hotplug or PCI domains or having sane locking. +================= =========================================== pci_find_device() Superseded by pci_get_device() pci_find_subsys() Superseded by pci_get_subsys() pci_find_slot() Superseded by pci_get_domain_bus_and_slot() pci_get_slot() Superseded by pci_get_domain_bus_and_slot() - +================= =========================================== The alternative is the traditional PCI device driver that walks PCI device lists. This is still possible but discouraged. - -10. MMIO Space and "Write Posting" -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +MMIO Space and "Write Posting" +============================== Converting a driver from using I/O Port space to using MMIO space often requires some additional changes. 
Specifically, "write posting" @@ -609,14 +552,14 @@ the CPU before the transaction has reached its destination. Thus, timing sensitive code should add readl() where the CPU is expected to wait before doing other work. The classic "bit banging" -sequence works fine for I/O Port space: +sequence works fine for I/O Port space:: for (i = 8; --i; val >>= 1) { outb(val & 1, ioport_reg); /* write bit */ udelay(10); } -The same sequence for MMIO space should be: +The same sequence for MMIO space should be:: for (i = 8; --i; val >>= 1) { writeb(val & 1, mmio_reg); /* write bit */ @@ -633,4 +576,3 @@ handle the PCI master abort on all platforms if the PCI device is expected to not respond to a readl(). Most x86 platforms will allow MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage (e.g. ~0). But many RISC platforms will crash (a.k.a."Hard Fail"). - diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.rst index 48ce7903e3c6..18bdefaafd1a 100644 --- a/Documentation/PCI/pcieaer-howto.txt +++ b/Documentation/PCI/pcieaer-howto.rst @@ -1,21 +1,29 @@ - The PCI Express Advanced Error Reporting Driver Guide HOWTO - T. Long Nguyen <tom.l.nguyen@intel.com> - Yanmin Zhang <yanmin.zhang@intel.com> - 07/29/2006 +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> +=========================================================== +The PCI Express Advanced Error Reporting Driver Guide HOWTO +=========================================================== -1. Overview +:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com> + - Yanmin Zhang <yanmin.zhang@intel.com> -1.1 About this guide +:Copyright: |copy| 2006 Intel Corporation + +Overview +=========== + +About this guide +---------------- This guide describes the basics of the PCI Express Advanced Error Reporting (AER) driver and provides information on how to use it, as well as how to enable the drivers of endpoint devices to conform with PCI Express AER driver. 
-1.2 Copyright (C) Intel Corporation 2006. -1.3 What is the PCI Express AER Driver? +What is the PCI Express AER Driver? +----------------------------------- PCI Express error signaling can occur on the PCI Express link itself or on behalf of transactions initiated on the link. PCI Express @@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI Express Advanced Error Reporting capability. The PCI Express AER driver provides three basic functions: -- Gathers the comprehensive error information if errors occurred. -- Reports error to the users. -- Performs error recovery actions. + - Gathers the comprehensive error information if errors occurred. + - Reports error to the users. + - Performs error recovery actions. AER driver only attaches root ports which support PCI-Express AER capability. -2. User Guide +User Guide +========== -2.1 Include the PCI Express AER Root Driver into the Linux Kernel +Include the PCI Express AER Root Driver into the Linux Kernel +------------------------------------------------------------- The PCI Express AER Root driver is a Root Port service driver attached to the PCI Express Port Bus driver. If a user wants to use it, the driver @@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and CONFIG_PCIEAER = y. -2.2 Load PCI Express AER Root Driver +Load PCI Express AER Root Driver +-------------------------------- Some systems have AER support in firmware. Enabling Linux AER support at the same time the firmware handles AER may result in unpredictable @@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0 Specification for details regarding _OSC usage. -2.3 AER error output +AER error output +---------------- When a PCIe AER error is captured, an error message will be output to console. 
If it's a correctable error, it is output as a warning. Otherwise, it is printed as an error. So users could choose different log level to filter out correctable error messages. -Below shows an example: -0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID) -0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000 -0000:50:00.0: [20] Unsupported Request (First) -0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100 +Below shows an example:: + + 0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID) + 0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000 + 0000:50:00.0: [20] Unsupported Request (First) + 0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100 In the example, 'Requester ID' means the ID of the device who sends the error message to root port. Pls. refer to pci express specs for other fields. -2.4 AER Statistics / Counters +AER Statistics / Counters +------------------------- When PCIe AER errors are captured, the counters / statistics are also exposed in the form of sysfs attributes which are documented at Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats -3. Developer Guide +Developer Guide +=============== To enable AER aware support requires a software driver to configure the AER capability structure within its device and to provide callbacks. @@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific errors because device specific errors will still get sent directly to the device driver. -3.1 Configure the AER capability structure +Configure the AER capability structure +-------------------------------------- AER aware drivers of PCI Express component need change the device control registers to enable AER. They also could change AER registers, @@ -128,9 +144,11 @@ including mask and severity registers. 
Helper function pci_enable_pcie_error_reporting could be used to enable AER. See section 3.3. -3.2. Provide callbacks +Provide callbacks +----------------- -3.2.1 callback reset_link to reset pci express link +callback reset_link to reset pci express link +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This callback is used to reset the pci express physical link when a fatal error happens. The root port aer service driver provides a @@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions. In struct pcie_port_service_driver, a new pointer, reset_link, is added. +:: -pci_ers_result_t (*reset_link) (struct pci_dev *dev); + pci_ers_result_t (*reset_link) (struct pci_dev *dev); Section 3.2.2.2 provides more detailed info on when to call reset_link. -3.2.2 PCI error-recovery callbacks +PCI error-recovery callbacks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The PCI Express AER Root driver uses error callbacks to coordinate with downstream device drivers associated with a hierarchy in question @@ -161,7 +181,8 @@ definitions of the callbacks. Below sections specify when to call the error callback functions. -3.2.2.1 Correctable errors +Correctable errors +~~~~~~~~~~~~~~~~~~ Correctable errors pose no impacts on the functionality of the interface. The PCI Express protocol can recover without any @@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not require any recovery actions. The AER driver clears the device's correctable error status register accordingly and logs these errors. -3.2.2.2 Non-correctable (non-fatal and fatal) errors +Non-correctable (non-fatal and fatal) errors +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If an error message indicates a non-fatal error, performing link reset at upstream is not required. The AER driver calls error_detected(dev, pci_channel_io_normal) to all drivers associated within a hierarchy in -question. for example, -EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort. +question. 
for example:: + + EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort + If Upstream port A captures an AER error, the hierarchy consists of Downstream port B and EndPoint. @@ -199,53 +223,72 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes to mmio_enabled. -3.3 helper functions +helper functions +---------------- +:: + + int pci_enable_pcie_error_reporting(struct pci_dev *dev); -3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev); pci_enable_pcie_error_reporting enables the device to send error messages to root port when an error is detected. Note that devices don't enable the error reporting by default, so device drivers need to call this function to enable it. -3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev); +:: + + int pci_disable_pcie_error_reporting(struct pci_dev *dev); + pci_disable_pcie_error_reporting stops the device from sending error messages to root port when an error is detected. -3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev); +:: + + int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev); + pci_cleanup_aer_uncorrect_error_status cleans up the uncorrectable error status register. -3.4 Frequently Asked Questions +Frequently Asked Questions +-------------------------- -Q: What happens if a PCI Express device driver does not provide an -error recovery handler (pci_driver->err_handler is equal to NULL)? +Q: + What happens if a PCI Express device driver does not provide an + error recovery handler (pci_driver->err_handler is equal to NULL)? -A: The devices attached with the driver won't be recovered. If the -error is fatal, kernel will print out warning messages. Please refer -to section 3 for more information. +A: + The devices attached with the driver won't be recovered. If the + error is fatal, kernel will print out warning messages. Please refer + to section 3 for more information. 
-Q: What happens if an upstream port service driver does not provide -callback reset_link? +Q: + What happens if an upstream port service driver does not provide + callback reset_link? -A: Fatal error recovery will fail if the errors are reported by the -upstream ports who are attached by the service driver. +A: + Fatal error recovery will fail if the errors are reported by the + upstream ports who are attached by the service driver. -Q: How does this infrastructure deal with driver that is not PCI -Express aware? +Q: + How does this infrastructure deal with driver that is not PCI + Express aware? -A: This infrastructure calls the error callback functions of the -driver when an error happens. But if the driver is not aware of -PCI Express, the device might not report its own errors to root -port. +A: + This infrastructure calls the error callback functions of the + driver when an error happens. But if the driver is not aware of + PCI Express, the device might not report its own errors to root + port. -Q: What modifications will that driver need to make it compatible -with the PCI Express AER Root driver? +Q: + What modifications will that driver need to make it compatible + with the PCI Express AER Root driver? -A: It could call the helper functions to enable AER in devices and -cleanup uncorrectable status register. Pls. refer to section 3.3. +A: + It could call the helper functions to enable AER in devices and + cleanup uncorrectable status register. Pls. refer to section 3.3. -4. Software error injection +Software error injection +======================== Debugging PCIe AER error recovery code is quite difficult because it is hard to trigger real hardware errors. 
Software based error @@ -261,6 +304,7 @@ After reboot with new kernel or insert the module, a device file named Then, you need a user space tool named aer-inject, which can be gotten from: + https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ More information about aer-inject can be found in the document comes diff --git a/Documentation/PCI/PCIEBUS-HOWTO.txt b/Documentation/PCI/picebus-howto.rst index 15f0bb3b5045..f882ff62c51f 100644 --- a/Documentation/PCI/PCIEBUS-HOWTO.txt +++ b/Documentation/PCI/picebus-howto.rst @@ -1,16 +1,23 @@ - The PCI Express Port Bus Driver Guide HOWTO - Tom L Nguyen tom.l.nguyen@intel.com - 11/03/2004 +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> -1. About this guide +=========================================== +The PCI Express Port Bus Driver Guide HOWTO +=========================================== + +:Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004 +:Copyright: |copy| 2004 Intel Corporation + +About this guide +================ This guide describes the basics of the PCI Express Port Bus driver and provides information on how to enable the service drivers to register/unregister with the PCI Express Port Bus Driver. -2. Copyright 2004 Intel Corporation -3. What is the PCI Express Port Bus Driver +What is the PCI Express Port Bus Driver +======================================= A PCI Express Port is a logical PCI-PCI Bridge structure. There are two types of PCI Express Port: the Root Port and the Switch @@ -30,7 +37,8 @@ support (AER), and virtual channel support (VC). These services may be handled by a single complex driver or be individually distributed and handled by corresponding service drivers. -4. Why use the PCI Express Port Bus Driver? +Why use the PCI Express Port Bus Driver? +======================================== In existing Linux kernels, the Linux Device Driver Model allows a physical device to be handled by only a single driver. 
The PCI @@ -51,28 +59,31 @@ PCI Express Ports and distributes all provided service requests to the corresponding service drivers as required. Some key advantages of using the PCI Express Port Bus driver are listed below: - - Allow multiple service drivers to run simultaneously on - a PCI-PCI Bridge Port device. + - Allow multiple service drivers to run simultaneously on + a PCI-PCI Bridge Port device. - - Allow service drivers implemented in an independent - staged approach. + - Allow service drivers implemented in an independent + staged approach. - - Allow one service driver to run on multiple PCI-PCI Bridge - Port devices. + - Allow one service driver to run on multiple PCI-PCI Bridge + Port devices. - - Manage and distribute resources of a PCI-PCI Bridge Port - device to requested service drivers. + - Manage and distribute resources of a PCI-PCI Bridge Port + device to requested service drivers. -5. Configuring the PCI Express Port Bus Driver vs. Service Drivers +Configuring the PCI Express Port Bus Driver vs. Service Drivers +=============================================================== -5.1 Including the PCI Express Port Bus Driver Support into the Kernel +Including the PCI Express Port Bus Driver Support into the Kernel +----------------------------------------------------------------- Including the PCI Express Port Bus driver depends on whether the PCI Express support is included in the kernel config. The kernel will automatically include the PCI Express Port Bus driver as a kernel driver when the PCI Express support is enabled in the kernel. -5.2 Enabling Service Driver Support +Enabling Service Driver Support +------------------------------- PCI device drivers are implemented based on Linux Device Driver Model. All service drivers are PCI device drivers. As discussed above, it is @@ -89,9 +100,11 @@ header file /include/linux/pcieport_if.h, before calling these APIs. 
Failure to do so will result in an identity mismatch, which prevents the PCI Express Port Bus driver from loading a service driver. -5.2.1 pcie_port_service_register +pcie_port_service_register +~~~~~~~~~~~~~~~~~~~~~~~~~~ +:: -int pcie_port_service_register(struct pcie_port_service_driver *new) + int pcie_port_service_register(struct pcie_port_service_driver *new) This API replaces the Linux Driver Model's pci_register_driver API. A service driver should always call pcie_port_service_register at @@ -99,69 +112,76 @@ module init. Note that after the service driver is loaded, calls such as pci_enable_device(dev) and pci_set_master(dev) are no longer necessary since these calls are executed by the PCI Port Bus driver. -5.2.2 pcie_port_service_unregister +pcie_port_service_unregister +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +:: -void pcie_port_service_unregister(struct pcie_port_service_driver *new) + void pcie_port_service_unregister(struct pcie_port_service_driver *new) pcie_port_service_unregister replaces the Linux Driver Model's pci_unregister_driver. It's always called by the service driver when a module exits. -5.2.3 Sample Code +Sample Code +~~~~~~~~~~~ Below is sample service driver code to initialize the port service driver data structure. 
+:: -static struct pcie_port_service_id service_id[] = { { - .vendor = PCI_ANY_ID, - .device = PCI_ANY_ID, - .port_type = PCIE_RC_PORT, - .service_type = PCIE_PORT_SERVICE_AER, - }, { /* end: all zeroes */ } -}; + static struct pcie_port_service_id service_id[] = { { + .vendor = PCI_ANY_ID, + .device = PCI_ANY_ID, + .port_type = PCIE_RC_PORT, + .service_type = PCIE_PORT_SERVICE_AER, + }, { /* end: all zeroes */ } + }; -static struct pcie_port_service_driver root_aerdrv = { - .name = (char *)device_name, - .id_table = &service_id[0], + static struct pcie_port_service_driver root_aerdrv = { + .name = (char *)device_name, + .id_table = &service_id[0], - .probe = aerdrv_load, - .remove = aerdrv_unload, + .probe = aerdrv_load, + .remove = aerdrv_unload, - .suspend = aerdrv_suspend, - .resume = aerdrv_resume, -}; + .suspend = aerdrv_suspend, + .resume = aerdrv_resume, + }; Below is a sample code for registering/unregistering a service driver. +:: -static int __init aerdrv_service_init(void) -{ - int retval = 0; + static int __init aerdrv_service_init(void) + { + int retval = 0; - retval = pcie_port_service_register(&root_aerdrv); - if (!retval) { - /* - * FIX ME - */ - } - return retval; -} + retval = pcie_port_service_register(&root_aerdrv); + if (!retval) { + /* + * FIX ME + */ + } + return retval; + } -static void __exit aerdrv_service_exit(void) -{ - pcie_port_service_unregister(&root_aerdrv); -} + static void __exit aerdrv_service_exit(void) + { + pcie_port_service_unregister(&root_aerdrv); + } -module_init(aerdrv_service_init); -module_exit(aerdrv_service_exit); + module_init(aerdrv_service_init); + module_exit(aerdrv_service_exit); -6. Possible Resource Conflicts +Possible Resource Conflicts +=========================== Since all service drivers of a PCI-PCI Bridge Port device are allowed to run simultaneously, below lists a few of possible resource conflicts with proposed solutions. 
-6.1 MSI and MSI-X Vector Resource +MSI and MSI-X Vector Resource +----------------------------- Once MSI or MSI-X interrupts are enabled on a device, it stays in this mode until they are disabled again. Since service drivers of the same @@ -179,7 +199,8 @@ driver. Service drivers should use (struct pcie_device*)dev->irq to call request_irq/free_irq. In addition, the interrupt mode is stored in the field interrupt_mode of struct pcie_device. -6.3 PCI Memory/IO Mapped Regions +PCI Memory/IO Mapped Regions +---------------------------- Service drivers for PCI Express Power Management (PME), Advanced Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access @@ -188,7 +209,8 @@ registers accessed are independent of each other. This patch assumes that all service drivers will be well behaved and not overwrite other service drivers' configuration settings. -6.4 PCI Config Registers +PCI Config Registers +-------------------- Each service driver runs its PCI config operations on its own capability structure except the PCI Express capability structure, in