diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 7 | ||||
-rw-r--r-- | Documentation/admin-guide/perf/alibaba_pmu.rst | 100 | ||||
-rw-r--r-- | Documentation/admin-guide/perf/index.rst | 1 | ||||
-rw-r--r-- | Documentation/arm64/elf_hwcaps.rst | 3 | ||||
-rw-r--r-- | Documentation/arm64/silicon-errata.rst | 2 | ||||
-rw-r--r-- | Documentation/arm64/sme.rst | 3 | ||||
-rw-r--r-- | Documentation/arm64/sve.rst | 22 |
7 files changed, 136 insertions, 2 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index e92d63d3e878..faa9f4137679 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3203,6 +3203,7 @@ spectre_v2_user=off [X86] spec_store_bypass_disable=off [X86,PPC] ssbd=force-off [ARM64] + nospectre_bhb [ARM64] l1tf=off [X86] mds=off [X86] tsx_async_abort=off [X86] @@ -3609,7 +3610,7 @@ nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings. - nohugevmalloc [PPC] Disable kernel huge vmalloc mappings. + nohugevmalloc [KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings. nosmt [KNL,S390] Disable symmetric multithreading (SMT). Equivalent to smt=1. @@ -3627,6 +3628,10 @@ vulnerability. System may allow data leaks with this option. + nospectre_bhb [ARM64] Disable all mitigations for Spectre-BHB (branch + history injection) vulnerability. System may allow data leaks + with this option. + nospec_store_bypass_disable [HW] Disable all mitigations for the Speculative Store Bypass vulnerability diff --git a/Documentation/admin-guide/perf/alibaba_pmu.rst b/Documentation/admin-guide/perf/alibaba_pmu.rst new file mode 100644 index 000000000000..11de998bb480 --- /dev/null +++ b/Documentation/admin-guide/perf/alibaba_pmu.rst @@ -0,0 +1,100 @@ +============================================================= +Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU) +============================================================= + +The Yitian 710, custom-built by Alibaba Group's chip development business, +T-Head, implements uncore PMU for performance and functional debugging to +facilitate system maintenance. + +DDR Sub-System Driveway (DRW) PMU Driver +========================================= + +Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel +is independent of others to service system memory requests. And one DDR5 +channel is split into two independent sub-channels. The DDR Sub-System Driveway +implements separate PMUs for each sub-channel to monitor various performance +metrics. + +The Driveway PMU devices are named as ali_drw_<sys_base_addr> with perf. +For example, ali_drw_21000 and ali_drw_21080 are two PMU devices for two +sub-channels of the same channel in die 0. And the PMU device of die 1 is +prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000. + +Each sub-channel has 36 PMU counters in total, which is classified into +four groups: + +- Group 0: PMU Cycle Counter. This group has one pair of counters + pmu_cycle_cnt_low and pmu_cycle_cnt_high, that is used as the cycle count + based on DDRC core clock. + +- Group 1: PMU Bandwidth Counters. This group has 8 counters that are used + to count the total access number of either the eight bank groups in a + selected rank, or four ranks separately in the first 4 counters. The base + transfer unit is 64B. + +- Group 2: PMU Retry Counters. This group has 10 counters, that intend to + count the total retry number of each type of uncorrectable error. + +- Group 3: PMU Common Counters. This group has 16 counters, that are used + to count the common events. + +For now, the Driveway PMU driver only uses counters in group 0 and group 3. + +The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution +for connecting an SoC application bus to DDR memory devices. The DDRCTL +receives transactions Host Interface (HIF) which is custom-defined by Synopsys. +These transactions are queued internally and scheduled for access while +satisfying the SDRAM protocol timing requirements, transaction priorities, and +dependencies between the transactions. The DDRCTL in turn issues commands on +the DDR PHY Interface (DFI) to the PHY module, which launches and captures data +to and from the SDRAM. The driveway PMUs have hardware logic to gather +statistics and performance logging signals on HIF, DFI, etc. + +By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF +interface, we could calculate the bandwidth. Example usage of counting memory +data bandwidth:: + + perf stat \ + -e ali_drw_21000/hif_wr/ \ + -e ali_drw_21000/hif_rd/ \ + -e ali_drw_21000/hif_rmw/ \ + -e ali_drw_21000/cycle/ \ + -e ali_drw_21080/hif_wr/ \ + -e ali_drw_21080/hif_rd/ \ + -e ali_drw_21080/hif_rmw/ \ + -e ali_drw_21080/cycle/ \ + -e ali_drw_23000/hif_wr/ \ + -e ali_drw_23000/hif_rd/ \ + -e ali_drw_23000/hif_rmw/ \ + -e ali_drw_23000/cycle/ \ + -e ali_drw_23080/hif_wr/ \ + -e ali_drw_23080/hif_rd/ \ + -e ali_drw_23080/hif_rmw/ \ + -e ali_drw_23080/cycle/ \ + -e ali_drw_25000/hif_wr/ \ + -e ali_drw_25000/hif_rd/ \ + -e ali_drw_25000/hif_rmw/ \ + -e ali_drw_25000/cycle/ \ + -e ali_drw_25080/hif_wr/ \ + -e ali_drw_25080/hif_rd/ \ + -e ali_drw_25080/hif_rmw/ \ + -e ali_drw_25080/cycle/ \ + -e ali_drw_27000/hif_wr/ \ + -e ali_drw_27000/hif_rd/ \ + -e ali_drw_27000/hif_rmw/ \ + -e ali_drw_27000/cycle/ \ + -e ali_drw_27080/hif_wr/ \ + -e ali_drw_27080/hif_rd/ \ + -e ali_drw_27080/hif_rmw/ \ + -e ali_drw_27080/cycle/ -- sleep 10 + +The average DRAM bandwidth can be calculated as follows: + +- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle +- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle + +Here, DDRC_WIDTH = 64 bytes. + +The current driver does not support sampling. So "perf record" is +unsupported. Also attach to a task is unsupported as the events are all +uncore. diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index 9c9ece88ce53..793e1970bc05 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -18,3 +18,4 @@ Performance monitor support xgene-pmu arm_dsu_pmu thunderx2-pmu + alibaba_pmu diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst index 311021f2e560..bb34287c8e01 100644 --- a/Documentation/arm64/elf_hwcaps.rst +++ b/Documentation/arm64/elf_hwcaps.rst @@ -272,6 +272,9 @@ HWCAP2_WFXT HWCAP2_EBF16 Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010. +HWCAP2_SVE_EBF16 + Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010. + 4. Unused AT_HWCAP bits ----------------------- diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index fda97b3fcf01..17d9fc5d14fb 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -110,6 +110,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 | diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst index 937147f58cc5..16d2db4c2e2e 100644 --- a/Documentation/arm64/sme.rst +++ b/Documentation/arm64/sme.rst @@ -331,6 +331,9 @@ The regset data starts with struct user_za_header, containing: been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread when the coredump was generated. +* The NT_ARM_TLS note will be extended to two registers, the second register + will contain TPIDR2_EL0 on systems that support SME and will be read as + zero with writes ignored otherwise. 9. System runtime configuration -------------------------------- diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst index 93c2c2990584..f338ee2df46d 100644 --- a/Documentation/arm64/sve.rst +++ b/Documentation/arm64/sve.rst @@ -111,7 +111,7 @@ the SVE instruction set architecture. * On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR - become unspecified on return from a syscall. + become zero on return from a syscall. * The SVE registers are not used to pass arguments to or receive results from any syscall. @@ -452,6 +452,24 @@ The regset data starts with struct user_sve_header, containing: * Modifying the system default vector length does not affect the vector length of any existing process or thread that does not make an execve() call. +10. Perf extensions +-------------------------------- + +* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register + at index 46. This register is used for DWARF unwinding when variable length + SVE registers are pushed onto the stack. + +* Its value is equivalent to the current SVE vector length (VL) in bits divided + by 64. + +* The value is included in Perf samples in the regs[46] field if + PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set. + +* The value is the current value at the time the sample was taken, and it can + change over time. + +* If the system doesn't support SVE when perf_event_open is called with these + settings, the event will fail to open. Appendix A. SVE programmer's model (informative) ================================================= @@ -593,3 +611,5 @@ References http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html Procedure Call Standard for the ARM 64-bit Architecture (AArch64) + +[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst |