What:		/sys/bus/platform/devices/smpro-errmon.*/error_[core|mem|pcie|other]_[ce|ue]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record printed
		in hex format according to the table below:

		+--------+---------------+-------------+------------------------------------------------------------+
		| Offset |     Field     | Size (byte) |                     Description                            |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 00     | Error Type    | 1           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 01     | Subtype       | 1           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 02     | Instance      | 2           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 04     | Error Status  | 4           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 08     | Error Address | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 16     | Error Misc 0  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 24     | Error Misc 1  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 32     | Error Misc 2  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 40     | Error Misc 3  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
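
As a minimal sketch of how a consumer might split the record into the fields listed above, the Python snippet below unpacks the hex string with ``struct``. The little-endian byte order, the ``ErrorRecord`` and ``parse_error_record`` names, and the sample values are assumptions made for illustration; they are not taken from the Ampere specification.

```python
import struct
from typing import NamedTuple, Tuple

RECORD_SIZE = 48  # bytes, per the layout table

# Assumption: multi-byte fields are little-endian (the byte order is not stated here).
# B = Error Type, B = Subtype, H = Instance, I = Error Status,
# Q = Error Address, 4Q = Error Misc 0..3
RECORD_FORMAT = "<BBHIQ4Q"


class ErrorRecord(NamedTuple):
    error_type: int
    subtype: int
    instance: int
    status: int
    address: int
    misc: Tuple[int, ...]


def parse_error_record(hex_text: str) -> ErrorRecord:
    """Decode the hex text read from an error_* attribute into its fields."""
    raw = bytes.fromhex(hex_text.strip())
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    error_type, subtype, instance, status, address, *misc = struct.unpack(
        RECORD_FORMAT, raw
    )
    return ErrorRecord(error_type, subtype, instance, status, address, tuple(misc))


# Synthetic record built for illustration only (not real hardware output).
sample_hex = struct.pack(RECORD_FORMAT, 1, 2, 3, 0x10, 0x1000, 0, 0, 0, 0).hex()
record = parse_error_record(sample_hex)
print(record)
```

Rejecting any input that is not exactly 48 bytes keeps a truncated or empty read from being silently decoded as a valid record.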

		The table below defines the error type values and their subtypes, subcomponents, and instances:

		.. _smpro-error-types:

		+-----------------+------------+----------+----------------+----------------------------------------+
		|   Error Group   | Error Type | Subtype  | Subcomponent   |               Instance                 |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| CPM (core)      | 0          | 0        | Snoop-Logic    | CPM #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| CPM (core)      | 0          | 2        | Armv8 Core 1   | CPM #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 1        | ERR1           | MCU # \| SLOT << 11                    |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 2        | ERR2           | MCU # \| SLOT << 11                    |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 3        | ERR3           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 4        | ERR4           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 5        | ERR5           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 6        | ERR6           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 7        | Link Error     | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 0        | Cross Point    | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 1        | Home Node(IO)  | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 2        | Home Node(Mem) | X \| (Y << 5) \| NS <<11 \| device<<12 |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 4        | CCIX Node      | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| 2P Link (other) | 3          | 0        | N/A            | Altra 2P Link #                        |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 2        | ERR2           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 3        | ERR3           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 4        | ERR4           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 5        | ERR5           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 6        | ERR6           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 7        | ERR7           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 8        | ERR8           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 9        | ERR9           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 10       | ERR10          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 11       | ERR11          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 12       | ERR12          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 13-21    | ERR13          | RC # + 1                               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 100      | TCU            | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 0        | TBU0           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 1        | TBU1           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 2        | TBU2           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 3        | TBU3           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 4        | TBU4           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 5        | TBU5           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 6        | TBU6           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 7        | TBU7           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 8        | TBU8           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 9        | TBU9           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe AER (pcie) | 7          | 0        | Root           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe AER (pcie) | 7          | 1        | Device         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 0        | RCA HB         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 1        | RCB HB         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 8        | RASDP          | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 2        | ERR2           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 2        | MPA_ERR        | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 2        | MPA_ERR        | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
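
The packed Instance encodings in the table can be unpacked with shifts and masks. The Python sketch below is illustrative only: the function names are hypothetical, and the field widths (5 bits for X, 6 for Y, 1 for NS, 4 for device, 11 for the MCU number, 5 for SLOT) are inferred from the shift amounts and the 2-byte Instance size rather than stated by the specification.

```python
def decode_mesh_instance(instance: int) -> dict:
    """Split a Mesh instance value: X | (Y << 5) | NS << 11 | device << 12.

    Assumed widths: X = bits 0-4, Y = bits 5-10, NS = bit 11, device = bits 12-15.
    The device field is only listed for the Home Node(Mem) row.
    """
    return {
        "x": instance & 0x1F,
        "y": (instance >> 5) & 0x3F,
        "ns": (instance >> 11) & 0x1,
        "device": (instance >> 12) & 0xF,
    }


def decode_mcu_instance(instance: int) -> dict:
    """Split an MCU instance value: MCU # | SLOT << 11.

    Assumed widths: MCU # = bits 0-10, SLOT = bits 11-15.
    SLOT is only listed for the ERR1 and ERR2 rows.
    """
    return {
        "mcu": instance & 0x7FF,
        "slot": (instance >> 11) & 0x1F,
    }


# Values composed by hand for illustration, not taken from hardware.
mesh = decode_mesh_instance(3 | (2 << 5) | (1 << 11) | (4 << 12))
mcu = decode_mcu_instance(5 | (1 << 11))
print(mesh, mcu)
```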

		Example::

		 # cat error_other_ue
		 880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000

		The details of each sysfs entry are as follows:

		+-------------+---------------------------------------------------------+----------------------------------+
		|   Error     |                   Sysfs entry                           |   Description (when triggered)   |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/error_core_ce  | Core has CE error                |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/error_core_ue  | Core has UE error                |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ce   | Memory has CE error              |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ue   | Memory has UE error              |
		+-------------+---------------------------------------------------------+----------------------------------+
		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ce  | Any PCIe controller has CE error |
		+-------------+---------------------------------------------------------+----------------------------------+
		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ue  | Any PCIe controller has UE error |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/error_other_ce | Any other CE error               |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/error_other_ue | Any other UE error               |
		+-------------+---------------------------------------------------------+----------------------------------+

		where:

		  - UE: Uncorrectable Error
		  - CE: Correctable Error

		For details, see section `3.3 Ampere (Vendor-Specific) Error Record Formats,
		Altra Family RAS Supplement`.
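
The eight ``error_[core|mem|pcie|other]_[ce|ue]`` attribute names follow a regular pattern, so a monitoring script can generate them instead of hard-coding each path. The Python sketch below is one possible approach; the helper names ``error_attr_names`` and ``read_error_attrs`` are hypothetical, and skipping attributes that are not present is a design assumption, not driver behaviour.

```python
from itertools import product
from pathlib import Path

GROUPS = ("core", "mem", "pcie", "other")
SEVERITIES = ("ce", "ue")


def error_attr_names() -> list:
    """Return the eight error_<group>_<severity> attribute names."""
    return [f"error_{group}_{sev}" for group, sev in product(GROUPS, SEVERITIES)]


def read_error_attrs(device_dir) -> dict:
    """Map each attribute found in device_dir to its stripped contents.

    Attributes that do not exist in device_dir are skipped.
    """
    result = {}
    for name in error_attr_names():
        path = Path(device_dir) / name
        if path.is_file():
            result[name] = path.read_text().strip()
    return result
```

On a system with the driver loaded, the device directories could be located with ``Path("/sys/bus/platform/devices").glob("smpro-errmon.*")`` and each one passed to ``read_error_attrs``.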


What:		/sys/bus/platform/devices/smpro-errmon.*/overflow_[core|mem|pcie|other]_[ce|ue]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Returns the overflow status for each type of reported HW error:

		  - 0      : No overflow
		  - 1      : There is an overflow and the oldest HW errors are dropped

		The details of each sysfs entry are as follows:

		+-------------+-----------------------------------------------------------+---------------------------------------+
		|   Overflow  |                   Sysfs entry                             |             Description               |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ce | Core CE error overflow                |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ue | Core UE error overflow                |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ce  | Memory CE error overflow              |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ue  | Memory UE error overflow              |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ce | any PCIe controller CE error overflow |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ue | any PCIe controller UE error overflow |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ce| any other CE error overflow           |
		+-------------+-----------------------------------------------------------+---------------------------------------+
		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ue| any other UE error overflow           |
		+-------------+-----------------------------------------------------------+---------------------------------------+

		where:

		  - UE: Uncorrectable Error
		  - CE: Correctable Error
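
Because the overflow attributes only report ``0`` or ``1``, a reader can map them straight to a boolean. The Python sketch below uses a hypothetical helper name, ``parse_overflow``, and assumes that any other content should be treated as an error rather than ignored.

```python
def parse_overflow(text: str) -> bool:
    """Return True if the overflow_* attribute text reports an overflow.

    Accepts "0" (no overflow) or "1" (overflow, oldest HW errors dropped),
    with optional surrounding whitespace; anything else raises ValueError.
    """
    value = text.strip()
    if value not in ("0", "1"):
        raise ValueError(f"unexpected overflow status: {value!r}")
    return value == "1"


print(parse_overflow("1\n"))
print(parse_overflow("0\n"))
```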

What:		/sys/bus/platform/devices/smpro-errmon.*/[error|warn]_[smpro|pmpro]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the internal firmware error/warning printed in hex format.

		The details of each sysfs entry are as follows:

		+---------------+------------------------------------------------------+--------------------------+
		|   Error       |                   Sysfs entry                        |        Description       |
		+---------------+------------------------------------------------------+--------------------------+
		| SMpro error   | /sys/bus/platform/devices/smpro-errmon.*/error_smpro | system has SMpro error   |
		+---------------+------------------------------------------------------+--------------------------+
		| SMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_smpro  | system has SMpro warning |
		+---------------+------------------------------------------------------+--------------------------+
		| PMpro error   | /sys/bus/platform/devices/smpro-errmon.*/error_pmpro | system has PMpro error   |
		+---------------+------------------------------------------------------+--------------------------+
		| PMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_pmpro  | system has PMpro warning |
		+---------------+------------------------------------------------------+--------------------------+

		For details, see section `5.10 RAS Internal Error Register Definitions,
		Altra Family SoC BMC Interface Specification`.

What:		/sys/bus/platform/devices/smpro-errmon.*/event_[vrd_warn_fault|vrd_hot|dimm_hot]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains detailed information about VRD/DIMM warning/hot events,
		in hex format as below::

		    AAAA

		where:

		  - ``AAAA``: The event detail data

		The details of each sysfs entry are as follows:

		+---------------+---------------------------------------------------------------+---------------------+
		|   Event       |                        Sysfs entry                            |     Description     |
		+---------------+---------------------------------------------------------------+---------------------+
		| VRD HOT       | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_hot        | VRD Hot             |
		+---------------+---------------------------------------------------------------+---------------------+
		| VR Warn/Fault | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_warn_fault | VR Warning or Fault |
		+---------------+---------------------------------------------------------------+---------------------+
		| DIMM HOT      | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_hot       | DIMM Hot            |
		+---------------+---------------------------------------------------------------+---------------------+

		For more details, see section `5.7 GPI Status Registers,
		Altra Family SoC BMC Interface Specification`.
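
The ``AAAA`` event value is plain hex text, so it converts to an integer directly. The Python sketch below is illustrative: ``parse_event`` and ``set_bits`` are hypothetical helper names, and the meaning of individual bits is left to the GPI Status Registers section of the specification rather than guessed at here.

```python
def parse_event(text: str) -> int:
    """Convert the hex text of an event_* attribute to an integer."""
    return int(text.strip(), 16)


def set_bits(value: int) -> list:
    """Return the positions of the bits that are set, lowest bit first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Sample text chosen for illustration, not captured from hardware.
event = parse_event("0005\n")
print(event, set_bits(event))
```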