Age | Commit message (Collapse) | Author |
|
kexec on pseries disables AIL (reloc_on_exc), required for scv
instruction support, before other CPUs have been shut down. This means
they can execute scv instructions after AIL is disabled, which causes an
interrupt at an unexpected entry location that crashes the kernel.
Change the kexec sequence to disable AIL after other CPUs have been
brought down.
As a refresher, the real-mode scv interrupt vector is 0x17000, and the
fixed-location head code probably couldn't easily deal with implementing
such high addresses so it was just decided not to support that interrupt
at all.
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Closes: https://lore.kernel.org/3b4b2943-49ad-4619-b195-bc416f1d1409@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Link: https://msgid.link/20240625134047.298759-1-npiggin@gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.
kernel BUG at mm/usercopy.c:102!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 #85
Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e
CFAR: c0000000001fdc80 IRQMASK: 0
[ ... GPRs omitted ... ]
NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
Call Trace:
usercopy_abort+0x74/0xb0 (unreliable)
__check_heap_object+0xf8/0x120
check_heap_object+0x218/0x240
__check_object_size+0x84/0x1a4
dtl_file_read+0x17c/0x2c4
full_proxy_read+0x8c/0x110
vfs_read+0xdc/0x3a0
ksys_read+0x84/0x144
system_call_exception+0x124/0x330
system_call_vectored_common+0x15c/0x2ec
--- interrupt: 3000 at 0x7fff81f3ab34
Commit 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.
Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
|
|
Replace all usages of of_root by of_find_node_by_path("/")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214103152.12269-5-mpe@ellerman.id.au
|
|
lppaca_shared_proc() takes a pointer to the lppaca which is typically
accessed through get_lppaca(). With DEBUG_PREEMPT enabled, this leads
to checking if preemption is enabled, for example:
BUG: using smp_processor_id() in preemptible [00000000] code: grep/10693
caller is lparcfg_data+0x408/0x19a0
CPU: 4 PID: 10693 Comm: grep Not tainted 6.5.0-rc3 #2
Call Trace:
dump_stack_lvl+0x154/0x200 (unreliable)
check_preemption_disabled+0x214/0x220
lparcfg_data+0x408/0x19a0
...
This isn't actually a problem however, as it does not matter which
lppaca is accessed, the shared proc state will be the same.
vcpudispatch_stats_procfs_init() already works around this by disabling
preemption, but the lparcfg code does not, erroring any time
/proc/powerpc/lparcfg is accessed with DEBUG_PREEMPT enabled.
Instead of disabling preemption on the caller side, rework
lppaca_shared_proc() to not take a pointer and instead directly access
the lppaca, bypassing any potential preemption checks.
Fixes: f13c13a00512 ("powerpc: Stop using non-architected shared_proc field in lppaca")
Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Rework to avoid needing a definition in paca.h and lppaca.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-4-mpe@ellerman.id.au
|
|
Parse the device tree in early init to find the memory block size to be
used by the kernel. Consolidate the memory block size device tree parsing
to one helper and use that on both powernv and pseries. We still want to
use machine-specific callback because on all machine types other than
powernv and pseries we continue to return MIN_MEMORY_BLOCK_SIZE.
pseries_memory_block_size used to look for the second memory
block (memory@x) to determine the memory_block_size value. This patch
changed that to look at all memory blocks and make sure we can map them all
correctly using the computed memory block size value.
Add workaround to force 256MB memory block size if device driver managed
memory such as GPU memory is present. This helps to add GPU memory
that is not aligned to 1G.
Co-developed-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230801044447.11275-1-aneesh.kumar@linux.ibm.com
|
|
As part of the generic HOTPLUG_SMT code, there is support for disabling
secondary SMT threads at boot time, by passing "nosmt" on the kernel
command line.
The way that is implemented is the secondary threads are brought partly
online, and then taken back offline again. That is done to support x86
CPUs needing certain initialisation done on all threads. However powerpc
has similar needs, see commit d70a54e2d085 ("powerpc/powernv: Ignore
smt-enabled on Power8 and later").
For that to work the powerpc CPU hotplug callbacks need to be registered
before secondary CPUs are brought online, otherwise __cpu_disable()
fails due to smp_ops->cpu_disable being NULL.
So split the basic initialisation into pseries_cpu_hotplug_init() which
can be called early from setup_arch(). The DLPAR related initialisation
can still be done later, because it needs to do allocations.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230705145143.40545-9-ldufour@linux.ibm.com
|
|
ppc_md.calibrate_decr() is a mandatory item. Its nullity is never
checked so it must be non null on all platforms.
Most platforms define generic_calibrate_decr() as their
ppc_md.calibrate_decr(). Have time_init() call
generic_calibrate_decr() when ppc_md.calibrate_decr() is NULL,
and remove default assignment from all machines.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/6cb9865d916231c38401ba34ad1a98c249fae135.1676711562.git.christophe.leroy@csgroup.eu
|
|
Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in the
Type1 VFIO driver. Recent development added 2 uses of iommu_ops to the
generic VFIO which broke POWER:
- a coherency capability check;
- blocking IOMMU domain - iommu_group_dma_owner_claimed()/...
This adds a simple iommu_ops which reports support for cache coherency
and provides a basic support for blocking domains. No other domain types
are implemented so the default domain is NULL.
Since now iommu_ops controls the group ownership, this takes it out of
VFIO.
This adds an IOMMU device into a pci_controller (=PHB) and registers it
in the IOMMU subsystem, iommu_ops is registered at this point. This
setup is done in postcore_initcall_sync.
This replaces iommu_group_add_device() with iommu_probe_device() as the
former misses necessary steps in connecting PCI devices to IOMMU
devices. This adds a comment about why explicit iommu_probe_device() is
still needed.
The previous discussion is here:
https://lore.kernel.org/r/20220707135552.3688927-1-aik@ozlabs.ru/
https://lore.kernel.org/r/20220701061751.1955857-1-aik@ozlabs.ru/
Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[mpe: Fix CONFIG_IOMMU_API=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2000135730.16998523.1678123860135.JavaMail.zimbra@raptorengineeringinc.com
|
|
With the tokens for all implemented RTAS functions now available via
rtas_function_token(), which is optimal and safe for arbitrary
contexts, there is no need to use rtas_token() or cache its result.
Most conversions are trivial, but a few are worth describing in more
detail:
* Error injection token comparisons for lockdown purposes are
consolidated into a simple predicate: token_is_restricted_errinjct().
* A couple of special cases in block_rtas_call() do not use
rtas_token() but perform string comparisons against names in the
function table. These are converted to compare against token values
instead, which is logically equivalent but less expensive.
* The lookup for the ibm,os-term token can be deferred until needed,
instead of caching it at boot to avoid device tree traversal during
panic.
* Since rtas_function_token() accesses a read-only data structure
without taking any locks, xmon's lookup of set-indicator can be
performed as needed instead of cached at startup.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-20-26929c8cce78@linux.ibm.com
|
|
Convert the direct invocation of the ibm,get-system-parameter RTAS
function to papr_sysparm_get().
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-15-26929c8cce78@linux.ibm.com
|
|
The ibm,get-system-parameter RTAS function may return -2 or 990x,
which indicate that the caller should try again.
pSeries_cmo_feature_init() ignores this, making it possible to fail to
detect cooperative memory overcommit capabilities during boot.
Move the RTAS call into a conventional rtas_busy_delay()-based
loop, dropping unnecessary clearing of rtas_data_buf.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-5-26929c8cce78@linux.ibm.com
|
|
Add firmware version details to the hardware description, which is
printed at boot and in case of an oops.
Use /hypervisor if we find it, though currently it only exists if we're
running under qemu.
Look for "ibm,powervm-partition" which is specified in PAPR+ v2.11 and
tells us we're running under PowerVM.
Failing that look for "ibm,fw-net-version" which is seen on PowerVM
going back to at least Power6.
eg: Hardware name: ... of:IBM,FW860.42 (SV860_138) hv:phyp
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-6-mpe@ellerman.id.au
|
|
CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen
time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled.
Implement this using the VPA accumulated wait counters.
Note this will not work on current KVM hosts because KVM does not
implement the VPA dispatch counters (yet). It could be implemented
with the dispatch trace log as it is for VIRT_CPU_ACCOUNTING_NATIVE,
but that is not necessary for the more limited accounting provided
by PARAVIRT_TIME_ACCOUNTING, and it is more expensive, complex, and
has downsides like potential log wrap.
From Shrikanth:
[...] it was tested on Power10 [PowerVM] Shared LPAR. system has two
LPAR. we will call first one LPAR1 and second one as LPAR2. Test was
carried out in SMT=1. Similar observation was seen in SMT=8 as well.
LPAR config header from each LPAR is below. LPAR1 is twice as big as
LPAR2. Since Both are sharing the same underlying hardware, work
stealing will happen when both the LPAR's are contending for the same
resource.
LPAR1:
type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00
LPAR2:
type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00
mpstat was used to check for the utilization. stress-ng has been used
as the workload. Few cases are tested. when the both LPAR are idle
there is no steal time. when LPAR1 starts running at 100% which
consumes all of the physical resource, steal time starts to get
accounted. With LPAR1 running at 100% and LPAR2 starts running, steal
time starts increasing. This is as expected. When the LPAR2 Load is
increased further, steal time increases further.
Case 1: 0% LPAR1; 0% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 99.95
Case 2: 100% LPAR1; 0% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
97.68 0.00 0.00 0.00 0.00 0.00 2.32 0.00 0.00 0.00
Case 3: 100% LPAR1; 50% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
86.34 0.00 0.10 0.00 0.00 0.03 13.54 0.00 0.00 0.00
Case 4: 100% LPAR1; 100% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
78.54 0.00 0.07 0.00 0.00 0.02 21.36 0.00 0.00 0.00
Case 5: 50% LPAR1; 100% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
49.37 0.00 0.00 0.00 0.00 0.00 1.17 0.00 0.00 49.47
Patch is accounting for the steal time and basic tests are holding
good.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
[mpe: Add SPDX tag to new paravirt_api_clock.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com
|
|
PAPR v2.12 defines a new hypercall, H_WATCHDOG. The hypercall permits
guest control of one or more virtual watchdog timers.
These timers do not conform to PowerPC device conventions. They are
not affixed to any extant bus, nor do they have full representation in
the device tree.
As a workaround we represent them as platform devices.
This patch registers a single platform device, "pseries-wdt", with the
platform bus if the FW_FEATURE_WATCHDOG flag is set.
A driver for this device, "pseries-wdt", will be introduced in a
subsequent patch.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713202335.1217647-4-cheloha@linux.ibm.com
|
|
Merge our fixes branch. In particular this brings in commit
986481618023 ("powerpc/book3e: Fix PUD allocation size in
map_kernel_page()") which fixes a build failure in next, because commit
2db2008e6363 ("powerpc/64e: Rewrite p4d_populate() as a static inline
function") depends on it.
|
|
There is no need to read the H_BLOCK_REMOVE characteristics when running in
Radix mode because this hcall is never called.
Furthermore since the commit 387e220a2e5e ("powerpc/64s: Move hash MMU
support code under CONFIG_PPC_64S_HASH_MMU") define
pseries_lpar_read_hblkrm_characteristics as un empty function if
CONFIG_PPC_64S_HASH_MMU is not set, the #ifdef block can be removed.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220523164353.26441-1-ldufour@linux.ibm.com
|
|
asm/machdep.h doesn't need asm/setup.h
Remove it.
Add it directly in files that needs it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3b1dfb19a2c3265fb4abc2bfc7b6eae9261a998b.1654966508.git.christophe.leroy@csgroup.eu
|
|
The platform's RNG must be available before random_init() in order to be
useful for initial seeding, which in turn means that it needs to be
called from setup_arch(), rather than from an init call. Fortunately,
each platform already has a setup_arch function pointer, which means
it's easy to wire this up. This commit also removes some noisy log
messages that don't add much.
Fixes: a489043f4626 ("powerpc/pseries: Implement arch_get_random_long() based on H_RANDOM")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220611151015.548325-4-Jason@zx2c4.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Convert to the generic mmap support (ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT)
- Add support for outline-only KASAN with 64-bit Radix MMU (P9 or later)
- Increase SIGSTKSZ and MINSIGSTKSZ and add support for AT_MINSIGSTKSZ
- Enable the DAWR (Data Address Watchpoint) on POWER9 DD2.3 or later
- Drop support for system call instruction emulation
- Many other small features and fixes
Thanks to Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Bagas
Sanjaya, Bjorn Helgaas, Bo Liu, Chen Huang, Christophe Leroy, Colin Ian
King, Daniel Axtens, Dwaipayan Ray, Fabiano Rosas, Finn Thain, Frank
Rowand, Fuqian Huang, Guilherme G. Piccoli, Hangyu Hua, Haowen Bai,
Haren Myneni, Hari Bathini, He Ying, Jason Wang, Jiapeng Chong, Jing
Yangyang, Joel Stanley, Julia Lawall, Kajol Jain, Kevin Hao, Krzysztof
Kozlowski, Laurent Dufour, Lv Ruyi, Madhavan Srinivasan, Magali Lemes,
Miaoqian Lin, Minghao Chi, Nathan Chancellor, Naveen N. Rao, Nicholas
Piggin, Oliver O'Halloran, Oscar Salvador, Pali Rohár, Paul Mackerras,
Peng Wu, Qing Wang, Randy Dunlap, Reza Arbab, Russell Currey, Sohaib
Mohamed, Vaibhav Jain, Vasant Hegde, Wang Qing, Wang Wensheng, Xiang
wangx, Xiaomeng Tong, Xu Wang, Yang Guang, Yang Li, Ye Bin, YueHaibing,
Yu Kuai, Zheng Bin, Zou Wei, and Zucheng Zheng.
* tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits)
powerpc/64: Include cache.h directly in paca.h
powerpc/64s: Only set HAVE_ARCH_UNMAPPED_AREA when CONFIG_PPC_64S_HASH_MMU is set
powerpc/xics: Include missing header
powerpc/powernv/pci: Drop VF MPS fixup
powerpc/fsl_book3e: Don't set rodata RO too early
powerpc/microwatt: Add mmu bits to device tree
powerpc/powernv/flash: Check OPAL flash calls exist before using
powerpc/powermac: constify device_node in of_irq_parse_oldworld()
powerpc/powermac: add missing g5_phy_disable_cpu1() declaration
selftests/powerpc/pmu: fix spelling mistake "mis-match" -> "mismatch"
powerpc: Enable the DAWR on POWER9 DD2.3 and above
powerpc/64s: Add CPU_FTRS_POWER10 to ALWAYS mask
powerpc/64s: Add CPU_FTRS_POWER9_DD2_2 to CPU_FTRS_ALWAYS mask
powerpc: Fix all occurences of "the the"
selftests/powerpc/pmu/ebb: remove fixed_instruction.S
powerpc/platforms/83xx: Use of_device_get_match_data()
powerpc/eeh: Drop redundant spinlock initialization
powerpc/iommu: Add missing of_node_put in iommu_init_early_dart
powerpc/pseries/vas: Call misc_deregister if sysfs init fails
powerpc/papr_scm: Fix leaking nvdimm_events_map elements
...
|
|
The kexec code paths involve code that necessarily run in real mode, as
CPUs are disabled and control is transferred to the new kernel. Disable
address sanitization for the kexec code and the functions called in real
mode on CPUs being disabled.
[paulus@ozlabs.org: combined a few work-in-progress commits of
Daniel's and wrote the commit message.]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Move pseries_machine_kexec() into kexec.c so setup.c can be instrumented]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTFSQ2TUSEaDdVC@cleo
|
|
Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h,
asm/pci.h etc...
Include the needed headers, and remove asm/prom.h when it was
needed exclusively for pulling necessary headers.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
|
|
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
|
|
Pass a boolean flag to indicate if swiotlb needs to be enabled based on
the addressing needs, and replace the verbose argument with a set of
flags, including one to force enable bounce buffering.
Note that this patch removes the possibility to force xen-swiotlb use
with the swiotlb=force parameter on the command line on x86 (arm and
arm64 never supported that), but this interface will be restored shortly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
Merge some more commits from our KVM topic branch. In particular this
brings in some commits that depend on a new capability that was merged
via the KVM tree for v5.18.
|
|
Use KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL
resource mode to 3 with the H_SET_MODE hypercall. This capability
differs between processor types and KVM types (PR, HV, Nested HV), and
affects guest-visible behaviour.
QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and
use the KVM CAP if available to determine KVM support[2].
[1] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00437.html
[2] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00439.html
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
[mpe: Rebase onto 93b71801a827 from kvm-ppc-cap-210 branch, add EXPORT_SYMBOL]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220222064727.2314380-4-npiggin@gmail.com
|
|
In realmode mce handler we use irq_work_queue() to defer
the processing of mce events, irq_work_queue() can only
be called when translation is enabled because it touches
memory outside RMA, hence we enable translation before
calling irq_work_queue and disable on return, though it
is not safe to do in realmode.
To avoid this, program the decrementer and call the event
processing functions from timer handler.
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220120121931.517974-1-ganeshgr@linux.ibm.com
|
|
Some functions defined in 'arch/powerpc/platforms/pseries' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.
Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-13-nick.child@ibm.com
|
|
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
Fix radix_enabled() use in setup_initial_memory_limit(). Add some
stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
|
|
Merge misc updates from Andrew Morton:
"257 patches.
Subsystems affected by this patch series: scripts, ocfs2, vfs, and
mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
cleanups, kfence, and damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
mm/damon: remove return value from before_terminate callback
mm/damon: fix a few spelling mistakes in comments and a pr_debug message
mm/damon: simplify stop mechanism
Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
Docs/admin-guide/mm/damon/start: simplify the content
Docs/admin-guide/mm/damon/start: fix a wrong link
Docs/admin-guide/mm/damon/start: fix wrong example commands
mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
mm/damon: remove unnecessary variable initialization
Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
selftests/damon: support watermarks
mm/damon/dbgfs: support watermarks
mm/damon/schemes: activate schemes based on a watermarks mechanism
tools/selftests/damon: update for regions prioritization of schemes
mm/damon/dbgfs: support prioritization weights
mm/damon/vaddr,paddr: support pageout prioritization
mm/damon/schemes: prioritize regions within the quotas
mm/damon/selftests: support schemes quotas
mm/damon/dbgfs: support quotas of schemes
...
|
|
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.
Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org> [kselftest]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On powerpc, pci_dev_is_added() is called as part of SR-IOV fixups
that are done under pcibios_add_device() which in turn is only called in
pci_device_add() whih is called when a PCI device is scanned.
pci_dev_assign_added() is called in pci_bus_add_device() which is only
called after scanning the device. Thus pci_dev_is_added() is always
false and can be dropped.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Tweak change log slightly to reflect Oliver's comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210910141940.2598035-2-schnelle@linux.ibm.com
|
|
Merge our fixes branch into next.
That lets us resolve a conflict in arch/powerpc/sysdev/xive/common.c.
Between cbc06f051c52 ("powerpc/xive: Do not skip CPU-less nodes when
creating the IPIs"), which moved request_irq() out of xive_init_ipis(),
and 17df41fec5b8 ("powerpc: use IRQF_NO_DEBUG for IPIs") which added
IRQF_NO_DEBUG to that request_irq() call, which has now moved.
|
|
Two IRQ domains are added on top of default machine IRQ domain.
First, the top level "pSeries-PCI-MSI" domain deals with the MSI
specificities. In this domain, the HW IRQ numbers are generated by the
PCI MSI layer, they compose a unique ID for an MSI source with the PCI
device identifier and the MSI vector number.
These numbers can be quite large on a pSeries machine running under
the IBM Hypervisor and /sys/kernel/irq/ and /proc/interrupts will
require small fixes to show them correctly.
Second domain is the in-the-middle "pSeries-MSI" domain which acts as
a proxy between the PCI MSI subsystem and the machine IRQ subsystem.
It usually allocate the MSI vector numbers but, on pSeries machines,
this is done by the RTAS FW and RTAS returns IRQ numbers in the IRQ
number space of the machine. This is why the in-the-middle "pSeries-MSI"
domain has the same HW IRQ numbers as its parent domain.
Only the XIVE (P9/P10) parent domain is supported for now. We still
need to add support for IRQ domain hierarchy under XICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-6-clg@kaod.org
|
|
After LPM, when migrating from a system with security mitigation enabled
to a system with mitigation disabled, the security flavor exposed in
/proc is not correctly set back to 0.
Do not assume the value of the security flavor is set to 0 when entering
init_cpu_char_feature_flags(), so when called after a LPM, the value is
set correctly even if the mitigation are not turned off.
Fixes: 6ce56e1ac380 ("powerpc/pseries: export LPAR security flavor in lparcfg")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210805152308.33988-1-ldufour@linux.ibm.com
|
|
With commit c9f3401313a5 ("powerpc: Always enable queued spinlocks for
64s, disable for others") CONFIG_PPC_QUEUED_SPINLOCKS is always
enabled on ppc64le, external modules that use spinlock APIs are
failing.
ERROR: modpost: GPL-incompatible module XXX.ko uses GPL-only symbol 'shared_processor'
Before the above commit, modules were able to build without any
issues. Also this problem is not seen on other architectures. This
problem can be workaround if CONFIG_UNINLINE_SPIN_UNLOCK is enabled in
the config. However CONFIG_UNINLINE_SPIN_UNLOCK is not enabled by
default and only enabled in certain conditions like
CONFIG_DEBUG_SPINLOCKS is set in the kernel config.
#include <linux/module.h>
spinlock_t spLock;
static int __init spinlock_test_init(void)
{
spin_lock_init(&spLock);
spin_lock(&spLock);
spin_unlock(&spLock);
return 0;
}
static void __exit spinlock_test_exit(void)
{
printk("spinlock_test unloaded\n");
}
module_init(spinlock_test_init);
module_exit(spinlock_test_exit);
MODULE_DESCRIPTION ("spinlock_test");
MODULE_LICENSE ("non-GPL");
MODULE_AUTHOR ("Srikar Dronamraju");
Given that spin locks are one of the basic facilities for module code,
this effectively makes it impossible to build/load almost any non GPL
modules on ppc64le.
This was first reported at https://github.com/openzfs/zfs/issues/11172
Currently shared_processor is exported as GPL only symbol.
Fix this for parity with other architectures by exposing
shared_processor to non-GPL modules too.
Fixes: 14c73bd344da ("powerpc/vcpu: Assume dedicated processors as non-preempt")
Cc: stable@vger.kernel.org # v5.5+
Reported-by: marc.c.dionne@gmail.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729060449.292780-1-srikar@linux.vnet.ibm.com
|
|
This allows the hypervisor / firmware to describe this workarounds to
the guest.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-4-npiggin@gmail.com
|
|
H_GET_CPU_CHARACTERISTICS
This allows the hypervisor / firmware to describe these workarounds to
the guest.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-2-npiggin@gmail.com
|
|
This is helpful to read the security flavor from inside the LPAR.
In /sys/kernel/debug/powerpc/security_features it can be seen if
mitigations are on or off but not the level set through the ASMI menu.
Furthermore, reporting it through /proc/powerpc/lparcfg allows an easy
processing by the lparstat command [1].
Export it like this in /proc/powerpc/lparcfg:
$ grep security_flavor /proc/powerpc/lparcfg
security_flavor=1
Value follows what is documented on the IBM support page [2]:
0 Speculative execution fully enabled
1 Speculative execution controls to mitigate user-to-kernel attacks
2 Speculative execution controls to mitigate user-to-kernel and
user-to-user side-channel attacks
[1] https://groups.google.com/g/powerpc-utils-devel/c/NaKXvdyl_UI/m/wa2stpIDAQAJ
[2] https://www.ibm.com/support/pages/node/715841
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210305125554.5165-1-ldufour@linux.ibm.com
|
|
Fixes the following W=1 kernel build warning(s):
drivers/tty/hvc/hvc_vio.c:385:13: warning: no previous prototype for ‘hvc_vio_init_early’
385 | void __init hvc_vio_init_early(void)
| ^~~~~~~~~~~~~~~~~~
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210303124603.3150175-1-lee.jones@linaro.org
|
|
Make powernv, pseries, powermac and maple use ppc_mc.discover_phbs.
These platforms need to be done together because they all depend on
pci_dn's being created from the DT. The pci_dn contains a pointer to
the relevant pci_controller so they need to be created after the
pci_controller structures are available, but before PCI devices are
scanned. Currently this ordering is provided by initcalls and the
sequence is:
1. PHBs are discovered (setup_arch) (early boot, pre-initcalls)
2. pci_dn are created from the unflattended DT (core initcall)
3. PHBs are scanned pcibios_init() (subsys initcall)
The new ppc_md.discover_phbs() function is also a core_initcall so we
can't guarantee ordering between the creation of pci_controllers and
the creation of pci_dn's which require a pci_controller. We could use
the postcore, or core_sync initcall levels, but it's cleaner to just
move the pci_dn setup into the per-PHB inits which occur inside of
.discover_phb() for these platforms. This brings the boot-time path in
line with the PHB hotplug path that is used for pseries DLPAR
operations too.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Squash powermac & maple in to avoid breakage those platforms,
convert memblock allocs to use kmalloc to avoid warnings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201103043523.916109-2-oohall@gmail.com
|
|
These are only used locally. It fixes these W=1 compile errors :
../arch/powerpc/platforms/pseries/setup.c:610:17: error: no previous prototype for ‘pseries_get_iov_fw_value’ [-Werror=missing-prototypes]
610 | resource_size_t pseries_get_iov_fw_value(struct pci_dev *dev, int resno,
| ^~~~~~~~~~~~~~~~~~~~~~~~
../arch/powerpc/platforms/pseries/setup.c:646:6: error: no previous prototype for ‘of_pci_set_vf_bar_size’ [-Werror=missing-prototypes]
646 | void of_pci_set_vf_bar_size(struct pci_dev *dev, const int *indexes)
| ^~~~~~~~~~~~~~~~~~~~~~
../arch/powerpc/platforms/pseries/setup.c:668:6: error: no previous prototype for ‘of_pci_parse_iov_addrs’ [-Werror=missing-prototypes]
668 | void of_pci_parse_iov_addrs(struct pci_dev *dev, const int *indexes)
| ^~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210104143206.695198-22-clg@kaod.org
|
|
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and
we just added entry and uaccess flushes. So the name is not very accurate
any more. In both platforms we then also immediately setup the STF flush.
Rename them to _setup_security_mitigations and fold the STF flush in.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache after user accesses.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache on kernel entry.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The hypervisor interface has defined branch prediction security bits for
handling the link stack. Wire them up.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200825075612.224656-1-npiggin@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on
Power9 or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be
unsupported on Power10 and future processors, leaving us with no way
to implement the functionality it requests. This risks breaking
userspace, though we believe it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion
checking. We now allow stack expansion up to the rlimit, like other
architectures.
- Remove the remnants of our (previously disabled) topology update
code, which tried to react to NUMA layout changes on virtualised
systems, but was prone to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link
stack (branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as
usual.
Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.
* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
selftests/powerpc: Fix pkey syscall redefinitions
powerpc: Fix circular dependency between percpu.h and mmu.h
powerpc/powernv/sriov: Fix use of uninitialised variable
selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
powerpc/40x: Fix assembler warning about r0
powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
cpuidle: pseries: Fixup exit latency for CEDE(0)
cpuidle: pseries: Add function to parse extended CEDE records
cpuidle: pseries: Set the latency-hint before entering CEDE
selftests/powerpc: Fix online CPU selection
powerpc/perf: Consolidate perf_callchain_user_[64|32]()
powerpc/pseries/hotplug-cpu: Remove double free in error path
powerpc/pseries/mobility: Add pr_debug() for device tree changes
powerpc/pseries/mobility: Set pr_fmt()
powerpc/cacheinfo: Warn if cache object chain becomes unordered
powerpc/cacheinfo: Improve diagnostics about malformed cache lists
powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
powerpc/cacheinfo: Set pr_fmt()
powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
...
|
|
This adds a kernel command line option that can be used to disable GTSE support.
Disabling GTSE implies kernel will make hcalls to invalidate TLB entries.
This was done so that we can do VM migration between configs that enable/disable
GTSE support via hypervisor. To migrate a VM from a system that supports
GTSE to a system that doesn't, we can boot the guest with
radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB
invalidates.
The check for hcall availability is done in pSeries_setup_arch so that
the panic message appears on the console. This should only happen on
a hypervisor that doesn't force the guest to hash translation even
though it can't handle the radix GTSE=0 request via CAS. With
radix_hcall_invalidate=on if the hypervisor doesn't support hcall_rpt_invalidate
hcall it should force the LPAR to hash translation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727085908.420806-1-aneesh.kumar@linux.ibm.com
|
|
This implements the generic paravirt qspinlocks using H_PROD and
H_CONFER to kick and wait.
This uses an un-directed yield to any CPU rather than the directed
yield to a pre-empted lock holder that paravirtualised simple
spinlocks use, that requires no kick hcall. This is something that
could be investigated and improved in future.
Performance results can be found in the commit which added queued
spinlocks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-5-npiggin@gmail.com
|
|
2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5.
Enable powervm guest support with that. This has no effect on kvm guest
because kvm will return error if guest does hcall with resource value 5.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200723090813.303838-9-ravi.bangoria@linux.ibm.com
|
|
Add support for the scv instruction on POWER9 and later CPUs.
For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that LR is not preserved, nor
are volatile CR registers, and error is not indicated with CR0[SO],
but by returning a negative errno.
rfscv is implemented to return from scv type system calls. It can not
be used to return from sc system calls because those are defined to
preserve LR.
getpid syscall throughput on POWER9 is improved by 26% (428 to 318
cycles), largely due to reducing mtmsr and mtspr.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix ppc64e build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
|