Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Follow up fixes for the BHI mitigations code
- Fix !SPECULATION_MITIGATIONS bug not turning off mitigations as
expected
- Work around an APIC emulation bug when the kernel is built with Clang
and run as a SEV guest
- Follow up x86 topology fixes
* tag 'x86-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Move TOPOEXT enablement into the topology parser
x86/cpu/amd: Make the NODEID_MSR union actually work
x86/cpu/amd: Make the CPUID 0x80000008 parser correct
x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
x86/bugs: Fix BHI handling of RRSBA
x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
x86/bugs: Fix BHI documentation
x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
x86/topology: Don't update cpu_possible_map in topo_set_cpuids()
x86/bugs: Fix return type of spectre_bhi_state()
x86/apic: Force native_apic_mem_read() to use the MOV instruction
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix the x86 PMU multi-counter code returning invalid data in certain
circumstances"
* tag 'perf-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix out of range data
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix the TLBI RANGE operand calculation causing live migration under
KVM/arm64 to miss dirty pages due to stale TLB entries"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: tlb: Fix TLBI RANGE operand
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"The device tree changes this time are all for NXP i.MX platforms,
addressing issues with clocks and regulators on i.MX7 and i.MX8.
The old OMAP2 based Nokia N8x0 tablet get a couple of code fixes for
regressions that came in.
The ARM SCMI and FF-A firmware interfaces get a couple of minor bug
fixes.
A regression fix for RISC-V cache management addresses a problem with
probe order on Sifive cores"
* tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits)
MAINTAINERS: Change Krzysztof Kozlowski's email address
arm64: dts: imx8qm-ss-dma: fix can lpcg indices
arm64: dts: imx8-ss-dma: fix can lpcg indices
arm64: dts: imx8-ss-dma: fix adc lpcg indices
arm64: dts: imx8-ss-dma: fix pwm lpcg indices
arm64: dts: imx8-ss-dma: fix spi lpcg indices
arm64: dts: imx8-ss-conn: fix usb lpcg indices
arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
ARM: dts: imx7-mba7: Use 'no-mmc' property
arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
cache: sifive_ccache: Partially convert to a platform driver
firmware: arm_scmi: Make raw debugfs entries non-seekable
firmware: arm_scmi: Fix wrong fastchannel initialization
firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
ARM: OMAP2+: fix USB regression on Nokia N8x0
mmc: omap: restore original power up/down steps
mmc: omap: fix deferred probe
...
|
|
Commit d96c36004e31 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig
entry") removed a hidden tab because it apparently showed breakage in
some third-party kernel config parsing tool.
It wasn't clear what tool it was, but let's make sure it gets fixed.
Because if you can't parse tabs as whitespace, you should not be parsing
the kernel Kconfig files.
In fact, let's make such breakage more obvious than some esoteric ftrace
record size option. If you can't parse tabs, you can't have page sizes.
Yes, tab-vs-space confusion is sadly a traditional Unix thing, and
'make' is famous for being broken in this regard. But no, that does not
mean that it's ok.
I'd add more random tabs to our Kconfig files, but I don't want to make
things uglier than necessary. But it *might* bbe necessary if it turns
out we see more of this kind of silly tooling.
Fixes: d96c36004e31 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry")
Link: https://lore.kernel.org/lkml/CAHk-=wj-hLLN_t_m5OL4dXLaxvXKy_axuoJYXif7iczbfgAevQ@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
"Fix for syscall_get_nr() to make it work even if tracing is disabled"
* tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: scall: Save thread_info.syscall unconditionally on entry
|
|
The topology rework missed that early_init_amd() tries to re-enable the
Topology Extensions when the BIOS disabled them.
The new parser is invoked before early_init_amd() so the re-enable attempt
happens too late.
Move it into the AMD specific topology parser code where it belongs.
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/878r1j260l.ffs@tglx
|
|
A system with NODEID_MSR was reported to crash during early boot without
any output.
The reason is that the union which is used for accessing the bitfields in
the MSR is written wrongly and the resulting executable code accesses the
wrong part of the MSR data.
As a consequence a later division by that value results in 0 and that
result is used for another division as divisor, which obviously does not
work well.
The magic world of C, unions and bitfields:
union {
u64 bita : 3,
bitb : 3;
u64 all;
} x;
x.all = foo();
a = x.bita;
b = x.bitb;
results in the effective executable code of:
a = b = x.bita;
because bita and bitb are treated as union members and therefore both end
up at bit offset 0.
Wrapping the bitfield into an anonymous struct:
union {
struct {
u64 bita : 3,
bitb : 3;
};
u64 all;
} x;
works like expected.
Rework the NODEID_MSR union in exactly that way to cure the problem.
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reported-by: Laura Nao <laura.nao@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Laura Nao <laura.nao@collabora.com>
Link: https://lore.kernel.org/r/20240410194311.596282919@linutronix.de
Closes: https://lore.kernel.org/all/20240322175210.124416-1-laura.nao@collabora.com/
|
|
CPUID 0x80000008 ECX.cpu_nthreads describes the number of threads in the
package. The parser uses this value to initialize the SMT domain level.
That's wrong because cpu_nthreads does not describe the number of threads
per physical core. So this needs to set the CORE domain level and let the
later parsers set the SMT shift if available.
Preset the SMT domain level with the assumption of one thread per core,
which is correct ifrt here are no other CPUID leafs to parse, and propagate
cpu_nthreads and the core level APIC bitwidth into the CORE domain.
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reported-by: Laura Nao <laura.nao@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Laura Nao <laura.nao@collabora.com>
Link: https://lore.kernel.org/r/20240410194311.535206450@linutronix.de
|
|
For consistency with the other CONFIG_MITIGATION_* options, replace the
CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
CONFIG_MITIGATION_SPECTRE_BHI option.
[ mingo: Fix ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
|
|
Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.
Remove it.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Some cosmetic changes (Erni Sri Satya Vennela, Li Zhijian)
- Introduce hv_numa_node_to_pxm_info() (Nuno Das Neves)
- Fix KVP daemon to handle IPv4 and IPv6 combination for keyfile format
(Shradha Gupta)
- Avoid freeing decrypted memory in a confidential VM (Rick Edgecombe
and Michael Kelley)
* tag 'hyperv-fixes-signed-20240411' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted
uio_hv_generic: Don't free decrypted memory
hv_netvsc: Don't free decrypted memory
Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl
Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails
hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format
hv: vmbus: Convert sprintf() family to sysfs_emit() family
mshyperv: Introduce hv_numa_node_to_pxm_info()
x86/hyperv: Cosmetic changes for hv_apic.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- make {virt, phys, page, pfn} translation work with KFENCE for
LoongArch (otherwise NVMe and virtio-blk cannot work with KFENCE
enabled)
- update dts files for Loongson-2K series to make devices work
correctly
- fix a build error
* tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors
LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNET
LoongArch: Update dts for Loongson-2K2000 to support PCI-MSI
LoongArch: Update dts for Loongson-2K2000 to support ISA/LPC
LoongArch: Update dts for Loongson-2K1000 to support ISA/LPC
LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCE
LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE
mm: Move lowmem_page_address() a little later
|
|
While syscall hardening helps prevent some BHI attacks, there's still
other low-hanging fruit remaining. Don't classify it as a mitigation
and make it clear that the system may still be vulnerable if it doesn't
have a HW or SW mitigation enabled.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
|
|
The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
disabled by the Spectre v2 mitigation (or can otherwise be disabled by
the BHI mitigation itself if needed). In that case retpolines are fine.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
|
|
So we are using the 'ia32_cap' value in a number of places,
which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register.
But there's very little 'IA32' about it - this isn't 32-bit only
code, nor does it originate from there, it's just a historic
quirk that many Intel MSR names are prefixed with IA32_.
This is already clear from the helper method around the MSR:
x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.
So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
its role and with the naming of the helper function.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
|
|
There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
over. It's even read in the BHI sysfs function which is a big no-no.
Just read it once and cache it.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
|
|
KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
pages are collected by VMM and the page table entries become write
protected during live migration. Unfortunately, the operand passed
to the TLBI RANGE instruction isn't correctly sorted out due to the
commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
It leads to crash on the destination VM after live migration because
TLBs aren't flushed completely and some of the dirty pages are missed.
For example, I have a VM where 8GB memory is assigned, starting from
0x40000000 (1GB). Note that the host has 4KB as the base page size.
In the middile of migration, kvm_tlb_flush_vmid_range() is executed
to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
__kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported
by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
in the __flush_tlb_range_op() until the variable @scale underflows
and becomes -9, 0xffff708000040000 is set as the operand. The operand
is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
invalid @scale and @num.
Fix it by extending __TLBI_RANGE_NUM() to support the combination of
SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
be returned from the macro, meaning the TLBs for 0x200000 pages in the
above example can be flushed in one shoot with SCALE#3 and NUM#31. The
macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
comments are also adjusted accordingly.
Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
Cc: stable@kernel.org # v6.6+
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240405035852.1532010-2-gshan@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
topo_set_cpuids() updates cpu_present_map and cpu_possible map. It is
invoked during enumeration and "physical hotplug" operations. In the
latter case this results in a kernel crash because cpu_possible_map is
marked read only after init completes.
There is no reason to update cpu_possible_map in that function. During
enumeration cpu_possible_map is not relevant and gets fully initialized
after enumeration completed. On "physical hotplug" the bit is already set
because the kernel allows only CPUs to be plugged which have been
enumerated and associated to a CPU number during early boot.
Remove the bogus update of cpu_possible_map.
Fixes: 0e53e7b656cf ("x86/cpu/topology: Sanitize the APIC admission logic")
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87ttkc6kwx.ffs@tglx
|
|
LoongArch's include/asm/addrspace.h uses SZ_32M and SZ_16K, so add
<linux/sizes.h> to provide those macros to prevent build errors:
In file included from ../arch/loongarch/include/asm/io.h:11,
from ../include/linux/io.h:13,
from ../include/linux/io-64-nonatomic-lo-hi.h:5,
from ../drivers/cxl/pci.c:4:
../include/asm-generic/io.h: In function 'ioport_map':
../arch/loongarch/include/asm/addrspace.h:124:25: error: 'SZ_32M' undeclared (first use in this function); did you mean 'PS_32M'?
124 | #define PCI_IOSIZE SZ_32M
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Current dts file for Loongson-2K2000's GMAC/GNET is incomplete, both irq
and phy descriptions are missing. Add them to make GMAC/GNET work.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Current dts file for Loongson-2K2000 misses the interrupt-controller &
interrupt-cells descriptions in the msi-controller node, and misses the
msi-parent link in the pci root node. Add them to support PCI-MSI.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Some Loongson-2K2000 platforms have ISA/LPC devices such as Super-IO,
define an ISA node in the dts file to avoid access error. Also adjust
the PCI io resource range to avoid confliction.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Some Loongson-2K1000 platforms have ISA/LPC devices such as Super-IO,
define an ISA node in the dts file to avoid access error.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
When enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the
following backtraces when running LongArch kernels.
[ 2.496257] kernel BUG at include/linux/scatterlist.h:187!
...
[ 2.501925] Call Trace:
[ 2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
[ 2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
[ 2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
[ 2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
[ 2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
[ 2.502775] [<90000000041b4508>] kthread+0x138/0x158
[ 2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4
The backtrace is always similar but not exactly the same. It is always
triggered from cryptomgr_test, but not always from the same test.
Analysis shows that with CONFIG_KFENCE active, the address returned from
kmalloc() and friends is not always below vm_map_base. It is allocated
by kfence_alloc() which at least sometimes seems to get its memory from
an address space above vm_map_base. This causes __virt_addr_valid() to
return false for the affected objects.
Let __virt_addr_valid() return 1 for kfence pool addresses, this make
virt_addr_valid()/__virt_addr_valid() work with KFENCE.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
KFENCE changes virt_to_page() to be able to translate tlb mapped virtual
addresses, but forget to change virt_to_phys()/phys_to_virt() and other
translation functions as well. This patch fix it, otherwise some drivers
(such as nvme and virtio-blk) cannot work with KFENCE.
All {virt, phys, page, pfn} translation functions are updated:
1, virt_to_pfn()/pfn_to_virt();
2, virt_to_page()/page_to_virt();
3, virt_to_phys()/phys_to_virt().
DMW/TLB mapped addresses are distinguished by comparing the vaddress
with vm_map_base in virt_to_xyz(), and we define WANT_PAGE_VIRTUAL in
the KFENCE case for the reverse translations, xyz_to_virt().
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The definition of spectre_bhi_state() incorrectly returns a const char
* const. This causes the a compiler warning when building with W=1:
warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
2812 | static const char * const spectre_bhi_state(void)
Remove the const qualifier from the pointer.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240409230806.1545822-1-daniel.sneddon@linux.intel.com
|
|
Prepare to fix aspects of the new BHI code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
On x86 each struct cpu_hw_events maintains a table for counter assignment but
it missed to update one for the deleted event in x86_pmu_del(). This
can make perf_clear_dirty_counters() reset used counter if it's called
before event scheduling or enabling. Then it would return out of range
data which doesn't make sense.
The following code can reproduce the problem.
$ cat repro.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
.disabled = 1,
};
void *worker(void *arg)
{
int cpu = (long)arg;
int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
void *p;
do {
ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
munmap(p, 4096);
ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
} while (1);
return NULL;
}
int main(void)
{
int i;
int n = sysconf(_SC_NPROCESSORS_ONLN);
pthread_t *th = calloc(n, sizeof(*th));
for (i = 0; i < n; i++)
pthread_create(&th[i], NULL, worker, (void *)(long)i);
for (i = 0; i < n; i++)
pthread_join(th[i], NULL);
free(th);
return 0;
}
And you can see the out of range data using perf stat like this.
Probably it'd be easier to see on a large machine.
$ gcc -o repro repro.c -pthread
$ ./repro &
$ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%)
1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%)
1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%)
2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%)
2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%)
2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%)
2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%)
3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%)
3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%)
3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%)
3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle
It should move the contents in the cpuc->assign as well.
Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org
|
|
thread_info.syscall is used by syscall_get_nr to supply syscall nr
over a thread stack frame.
Previously, thread_info.syscall is only saved at syscall_trace_enter
when syscall tracing is enabled. However rest of the kernel code do
expect syscall_get_nr to be available without syscall tracing. The
previous design breaks collect_syscall.
Move saving process to syscall entry to fix it.
Reported-by: Xi Ruoyao <xry111@xry111.site>
Link: https://github.com/util-linux/util-linux/issues/2867
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 6.9:
- A couple of i.MX7 board fixes from Fabio Estevam that use correct
'no-mmc' property and pass 'link-frequencies' for OV2680.
- A series from Frank Li to fix LPCG clock indices for i.MX8 subsystems.
- A couple of changes from Tim Harvey that fix USB VBUS regulator for
imx8mp-venice board.
* tag 'imx-fixes-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8qm-ss-dma: fix can lpcg indices
arm64: dts: imx8-ss-dma: fix can lpcg indices
arm64: dts: imx8-ss-dma: fix adc lpcg indices
arm64: dts: imx8-ss-dma: fix pwm lpcg indices
arm64: dts: imx8-ss-dma: fix spi lpcg indices
arm64: dts: imx8-ss-conn: fix usb lpcg indices
arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
ARM: dts: imx7-mba7: Use 'no-mmc' property
arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
Link: https://lore.kernel.org/r/Zg5rfaVVvD9egoBK@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
GPIO regression fixes for n8x0
A series of fixes for n8x0 GPIO regressions caused by the changes to use
GPIO descriptors.
* tag 'omap-for-v6.9/n8x0-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: fix USB regression on Nokia N8x0
mmc: omap: restore original power up/down steps
mmc: omap: fix deferred probe
mmc: omap: fix broken slot switch lookup
ARM: OMAP2+: fix N810 MMC gpiod table
ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
Link: https://lore.kernel.org/r/pull-1712135932-125424@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Pull x86 mitigations from Thomas Gleixner:
"Mitigations for the native BHI hardware vulnerabilty:
Branch History Injection (BHI) attacks may allow a malicious
application to influence indirect branch prediction in kernel by
poisoning the branch history. eIBRS isolates indirect branch targets
in ring0. The BHB can still influence the choice of indirect branch
predictor entry, and although branch predictor entries are isolated
between modes when eIBRS is enabled, the BHB itself is not isolated
between modes.
Add mitigations against it either with the help of microcode or with
software sequences for the affected CPUs"
[ This also ends up enabling the full mitigation by default despite the
system call hardening, because apparently there are other indirect
calls that are still sufficiently reachable, and the 'auto' case just
isn't hardened enough.
We'll have some more inevitable tweaking in the future - Linus ]
* tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM: x86: Add BHI_NO
x86/bhi: Mitigate KVM by default
x86/bhi: Add BHI mitigation knob
x86/bhi: Enumerate Branch History Injection (BHI) bug
x86/bhi: Define SPEC_CTRL_BHI_DIS_S
x86/bhi: Add support for clearing branch history at syscall entry
x86/syscall: Don't force use of indirect calls for system calls
x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
|
|
Intel processors that aren't vulnerable to BHI will set
MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1;. Guests may use this BHI_NO bit to
determine if they need to implement BHI mitigations or not. Allow this bit
to be passed to the guests.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
BHI mitigation mode spectre_bhi=auto does not deploy the software
mitigation by default. In a cloud environment, it is a likely scenario
where userspace is trusted but the guests are not trusted. Deploying
system wide mitigation in such cases is not desirable.
Update the auto mode to unconditionally mitigate against malicious
guests. Deploy the software sequence at VMexit in auto mode also, when
hardware mitigation is not available. Unlike the force =on mode,
software sequence is not deployed at syscalls in auto mode.
Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Branch history clearing software sequences and hardware control
BHI_DIS_S were defined to mitigate Branch History Injection (BHI).
Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:
auto - Deploy the hardware mitigation BHI_DIS_S, if available.
on - Deploy the hardware mitigation BHI_DIS_S, if available,
otherwise deploy the software sequence at syscall entry and
VMexit.
off - Turn off BHI mitigation.
The default is auto mode which does not deploy the software sequence
mitigation. This is because of the hardening done in the syscall
dispatch path, which is the likely target of BHI.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Mitigation for BHI is selected based on the bug enumeration. Add bits
needed to enumerate BHI bug.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Newer processors supports a hardware control BHI_DIS_S to mitigate
Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel
from userspace BHI attacks without having to manually overwrite the
branch history.
Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL.
Mitigation is enabled later.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Branch History Injection (BHI) attacks may allow a malicious application to
influence indirect branch prediction in kernel by poisoning the branch
history. eIBRS isolates indirect branch targets in ring0. The BHB can
still influence the choice of indirect branch predictor entry, and although
branch predictor entries are isolated between modes when eIBRS is enabled,
the BHB itself is not isolated between modes.
Alder Lake and new processors supports a hardware control BHI_DIS_S to
mitigate BHI. For older processors Intel has released a software sequence
to clear the branch history on parts that don't support BHI_DIS_S. Add
support to execute the software sequence at syscall entry and VMexit to
overwrite the branch history.
For now, branch history is not cleared at interrupt entry, as malicious
applications are not believed to have sufficient control over the
registers, since previous register state is cleared at interrupt
entry. Researchers continue to poke at this area and it may become
necessary to clear at interrupt entry as well in the future.
This mitigation is only defined here. It is enabled later.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Co-developed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Make <asm/syscall.h> build a switch statement instead, and the compiler can
either decide to generate an indirect jump, or - more likely these days due
to mitigations - just a series of conditional branches.
Yes, the conditional branches also have branch prediction, but the branch
prediction is much more controlled, in that it just causes speculatively
running the wrong system call (harmless), rather than speculatively running
possibly wrong random less controlled code gadgets.
This doesn't mitigate other indirect calls, but the system call indirection
is the first and most easily triggered case.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Change the format of the 'spectre_v2' vulnerabilities sysfs file
slightly by converting the commas to semicolons, so that mitigations for
future variants can be grouped together and separated by commas.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When done from a virtual machine, instructions that touch APIC memory
must be emulated. By convention, MMIO accesses are typically performed
via io.h helpers such as readl() or writeq() to simplify instruction
emulation/decoding (ex: in KVM hosts and SEV guests) [0].
Currently, native_apic_mem_read() does not follow this convention,
allowing the compiler to emit instructions other than the MOV
instruction generated by readl(). In particular, when the kernel is
compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler
would emit a TESTL instruction which is not supported by the SEV-ES
emulator, causing a boot failure in that environment. It is likely the
same problem would happen in a TDX guest as that uses the same
instruction emulator as SEV-ES.
To make sure all emulators can emulate APIC memory reads via MOV, use
the readl() function in native_apic_mem_read(). It is expected that any
emulator would support MOV in any addressing mode as it is the most
generic and is what is usually emitted currently.
The TESTL instruction is emitted when native_apic_mem_read() is inlined
into apic_mem_wait_icr_idle(). The emulator comes from
insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to
extend insn_decode_mmio() to support more instructions since, in theory,
the compiler could choose to output nearly any instruction for such
reads which would bloat the emulator beyond reason.
[0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/
[ bp: Massage commit message, fix typos. ]
Signed-off-by: Adam Dunlap <acdunlap@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Kevin Loughlin <kevinloughlin@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix MCE timer reinit locking
- Fix/improve CoCo guest random entropy pool init
- Fix SEV-SNP late disable bugs
- Fix false positive objtool build warning
- Fix header dependency bug
- Fix resctrl CPU offlining bug
* tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
x86/CPU/AMD: Track SNP host status with cc_platform_*()
x86/cc: Add cc_platform_set/_clear() helpers
x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
x86/coco: Require seeding RNG with RDRAND on CoCo systems
x86/numa/32: Include missing <asm/pgtable_areas.h>
x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Fix various timer bugs:
- Fix a timer migration bug that may result in missed events
- Fix timer migration group hierarchy event updates
- Fix a PowerPC64 build warning
- Fix a handful of DocBook annotation bugs"
* tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Return early on deactivation
timers/migration: Fix ignored event due to missing CPU update
vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h
timers: Fix text inconsistencies and spelling
tick/sched: Fix struct tick_sched doc warnings
tick/sched: Fix various kernel-doc warnings
timers: Fix kernel-doc format and add Return values
time/timekeeping: Fix kernel-doc warnings and typos
time/timecounter: Fix inline documentation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Ingo Molnar:
"Fix a combined PEBS events bug on x86 Intel CPUs"
* tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
|
|
srso_alias_untrain_ret() is special code, even if it is a dummy
which is called in the !SRSO case, so annotate it like its real
counterpart, to address the following objtool splat:
vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0
Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org
|
|
We want to fix:
0e110732473e ("x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO")
So merge in Linus's latest into x86/urgent to have it available.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix NIOS2 boot with external DTB
- Add missing synchronization needed between fw_devlink and DT overlay
removals
- Fix some unit-address regex's to be hex only
- Drop some 10+ year old "unstable binding" statements
- Add new SoCs to QCom UFS binding
- Add TPM bindings to TPM maintainers
* tag 'devicetree-fixes-for-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
nios2: Only use built-in devicetree blob if configured to do so
dt-bindings: timer: narrow regex for unit address to hex numbers
dt-bindings: soc: fsl: narrow regex for unit address to hex numbers
dt-bindings: remoteproc: ti,davinci: remove unstable remark
dt-bindings: clock: ti: remove unstable remark
dt-bindings: clock: keystone: remove unstable remark
of: module: prevent NULL pointer dereference in vsnprintf()
dt-bindings: ufs: qcom: document SM6125 UFS
dt-bindings: ufs: qcom: document SC7180 UFS
dt-bindings: ufs: qcom: document SC8180X UFS
of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
driver core: Introduce device_link_wait_removal()
docs: dt-bindings: add missing address/size-cells to example
MAINTAINERS: Add TPM DT bindings to TPM maintainers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"8 hotfixes, 3 are cc:stable
There are a couple of fixups for this cycle's vmalloc changes and one
for the stackdepot changes. And a fix for a very old x86 PAT issue
which can cause a warning splat"
* tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
stackdepot: rename pool_index to pool_index_plus_1
x86/mm/pat: fix VM_PAT handling in COW mappings
MAINTAINERS: change vmware.com addresses to broadcom.com
selftests/mm: include strings.h for ffsl
mm: vmalloc: fix lockdep warning
mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
init: open output files from cpio unpacking with O_LARGEFILE
mm/secretmem: fix GUP-fast succeeding on secretmem folios
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"arm64/ptrace fix to use the correct SVE layout based on the saved
floating point state rather than the TIF_SVE flag. The latter may be
left on during syscalls even if the SVE state is discarded"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/ptrace: Use saved floating point state type to determine SVE layout
|