Age | Commit message (Collapse) | Author |
|
commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream.
init_espfix_ap() is currently off by one level when informing hypervisor
that allocated pages will be used for ministacks' page tables.
The most immediate effect of this on a PV guest is that if
'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
will refuse to use it for a page table (which it shouldn't be anyway). This will
result in warnings by both Xen and Linux.
More importantly, a subsequent write to that page (again, by a PV guest) is
likely to result in fatal page fault.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 17290231df16eeee5dfc198dbf5ee4b419996dcd upstream.
There are two FIXMEs in the double exception handler 'for the extremely
unlikely case'. This case gets hit by gcc during kernel build once in
a few hours, resulting in an unrecoverable exception condition.
Provide missing fixup routine to handle this case. Double exception
literals now need 8 more bytes, add them to the linker script.
Also replace bbsi instructions with bbsi.l as we're branching depending
on 8th and 7th LSB-based bits of exception address.
This may be tested by adding the explicit DTLB invalidation to window
overflow handlers, like the following:
# --- a/arch/xtensa/kernel/vectors.S
# +++ b/arch/xtensa/kernel/vectors.S
# @@ -592,6 +592,14 @@ ENDPROC(_WindowUnderflow4)
# ENTRY_ALIGN64(_WindowOverflow8)
#
# s32e a0, a9, -16
# + bbsi.l a9, 31, 1f
# + rsr a0, ccount
# + bbsi.l a0, 4, 1f
# + pdtlb a0, a9
# + idtlb a0
# + movi a0, 9
# + idtlb a0
# +1:
# l32e a0, a1, -12
# s32e a2, a9, -8
# s32e a1, a9, -12
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea9f9274bf4337ba7cbab241c780487651642d63 upstream.
Remove xen_enable_nmi() to fix a 64-bit guest crash when registering
the NMI callback on Xen 3.1 and earlier.
It's not needed since the NMI callback is set by a set_trap_table
hypercall (in xen_load_idt() or xen_write_idt_entry()).
It's also broken since it only set the current VCPU's callback.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e120fb459693bbc1ac3eabdd65c3659d7cfbfd2a upstream.
After clarification from the hardware team it was found that
this 1.8V PHY supply can't be switched OFF when SoC is Active.
Since the PHY IPs don't contain isolation logic built in the design to
allow the power rail to be switched off, there is a very high risk
of IP reliability and additional leakage paths which can result in
additional power consumption.
The only scenario where this rail can be switched off is part of Power on
reset sequencing, but it needs to be kept always-on during operation.
This patch is required for proper functionality of USB, SATA
and PCIe on DRA7-evm.
CC: Rajendra Nayak <rnayak@ti.com>
CC: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream.
This moves the espfix64 logic into native_iret. To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.
This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).
[ hpa: this is a nonzero cost on native, but probably not enough to
measure. Xen needs to fix this in their own code, probably doing
something equivalent to espfix64. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream.
Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software
whatsoever.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 197725de65477bc8509b41388157c1a2283542bb upstream.
Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.
This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream.
Header guard is #ifndef, not #ifdef...
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream.
Sparse warns that the percpu variables aren't declared before they are
defined. Rather than hacking around it, move espfix definitions into
a proper header file.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer. This
causes some 16-bit software to break, but it also leaks kernel state
to user space. We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.
In checkin:
b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.
This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart. When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace. The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.
(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)
Special thanks to:
- Andy Lutomirski, for the suggestion of using very small stack slots
and copy (as opposed to map) the IRET frame there, and for the
suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.
Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream.
This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in
preparation of merging in the proper fix (espfix64).
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 811a2407a3cf7bbd027fbe92d73416f17485a3d8 upstream.
On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2
(pmd) entries map 2MiB.
When the identity mapping is created on LPAE, the pgd pointers are copied
from the swapper_pg_dir. If we find that we need to modify the contents
of a pmd, we allocate a new empty pmd table and insert it into the
appropriate 1GB slot, before then filling it with the identity mapping.
However, if the 1GB slot covers the kernel lowmem mappings, we obliterate
those mappings.
When replacing a PMD, first copy the old PMD contents to the new PMD, so
that we preserve the existing mappings, particularly the mappings of the
kernel itself.
[rewrote commit message and added code comment -- rmk]
Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format")
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 823a19cd3b91b0729d7417f1848413846be61712 upstream.
If init_mm.brk is not section aligned, the LPAE fixup code will miss
updating the final PMD. Fix this by aligning map_end.
Fixes: a77e0c7b2774 ("ARM: mm: Recreate kernel mappings in early_paging_init()")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 33753cd2ba41c72a0756edc5dc094d91602deda5 upstream.
This patch adds bch8 ecc software fallback which is mostly used by
omap3s because they lack hardware elm support.
Fixes: 0611c41934ab35ce84dea34ab291897ad3cbc7be (ARM: OMAP2+: gpmc:
update gpmc_hwecc_bch_capable() for new platforms and ECC schemes)
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Reviewed-by: Pekon Gupta <pekon@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 28c9770bcbd2b6dbab99669825a2f8fa69e6d35b upstream.
Fix the address of L2 controler register in hi3620 SoC.
This has been wrong from the point that the file was merged
in v3.14.
Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Acked-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f3c400ef473e00c680ea713a66196b05870b3710 upstream.
Fix the same alignment bug as in arm64 - we need to pass residue
unprocessed bytes as the last argument to blkcipher_walk_done.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8903461c9bc56fcb041fb92d054e2529951770b6 upstream.
In the recent commit b50a6c584bb4 "Clear MMCR2 when enabling PMU", I
screwed up the handling of MMCR2 for tasks using EBB.
We must make sure we set MMCR2 *before* ebb_switch_in(), otherwise we
overwrite the value of MMCR2 that userspace may have written. That
potentially breaks a task that uses EBB and manually uses MMCR2 for
event freezing.
Fixes: b50a6c584bb4 ("powerpc/perf: Clear MMCR2 when enabling PMU")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8142b215501f8b291a108a202b3a053a265b03dd upstream.
Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.
The following code:
> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
results in:
> result=666 errno=0 error=Success
Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:
> result=-1 errno=38 error=Function not implemented
The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.
Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 20dbea494543aefaace874cc3ec93a39b94b1ec4 upstream.
The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation. However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 97a9a7179aad701ab676e6f29eb90766a1acfde2 upstream.
Commit 75b57ecf9 refactored device tree nodes to use kobjects such that they
can be exposed via /sysfs. A secondary commit 0829f6d1f furthered this rework
by moving the kobect initialization logic out of of_node_add into its own
of_node_init function. The inital commit removed the existing kref_init calls
in the pseries dlpar code with the assumption kobject initialization would
occur in of_node_add. The second commit had the side effect of triggering a
BUG_ON during DLPAR, migration and suspend/resume operations as a result of
dynamically added nodes being uninitialized.
This patch fixes this by adding of_node_init calls in place of the previously
removed kref_init calls.
Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dab6cf55f81a6e16b8147aed9a843e1691dcd318 upstream.
The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect.
The PSW_MASK_USER define contains the PSW_MASK_ASC bits, the ptrace
interface accepts all combinations for the address-space-control
bits. To protect the kernel space the PSW mask check in ptrace needs
to reject the address-space-control bit combination for home space.
Fixes CVE-2014-3534
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit be6ae382dc153da51cf066c8dd523aa955f02531 upstream.
When sysmem reservation occurs exactly at the end of an existing block
that block is deleted, because it is incorrectly included in the range
of memblocks to remove. Fix that by skipping such block.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a4b6cb735b25aa84a462a1985e3e43bebaf5beb4 upstream.
This patch adds implementation of GET_THREAD_AREA ptrace request type. This
is required by GDB to debug NPTL applications.
Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 22970070e027cbbb9b2878f8f7c31d0d7f29e94d upstream.
Add alias for FEC ethernet on i.MX to allow bootloaders (like U-Boot)
patch-in the MAC address for FEC using this alias.
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream.
The PE/COFF headers currently describe only the initialised-data
portions of the image, and result in no space being allocated for the
uninitialised-data portions. Consequently, the EFI boot stub will end
up overwriting unexpected areas of memory, with unpredictable results.
Fix by including a .bss section in the PE/COFF headers (functionally
equivalent to the init_size field in the bzImage header).
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Cc: Thomas Bächler <thomas@archlinux.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream.
The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.
There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.
Opt in for known good archs.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3896c329df8092661dac80f55a8c3f60136fd61a upstream.
Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver
locked up when doing cpu hotplug.
Because we called set_cyc2ns_scale() from the time_cpufreq_notifier()
unconditionally, it gets called multiple times for each freq change,
instead of only the once, when the tsc_khz value actually changes.
Because it gets called more than once, we run out of cyc2ns data slots
and stall, waiting for a free one, but because we're half way offline,
there's no consumers to free slots.
By placing the call inside the condition that actually changes tsc_khz
we avoid superfluous calls and avoid the problem.
Reported-by: Mauro <registosites@hotmail.com>
Tested-by: Mauro <registosites@hotmail.com>
Fixes: 20d1c86a5776 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs")
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Bin Gao <bin.gao@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream.
Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
For example, we use external NMI to make system panic to get crash
dump, but in this case, the external NMI is falsely handled do to the
issue.
This commit deals with the issue simply by ignoring CondChgd bit.
Here is explanation in detail.
On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.
intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.
The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's.
As a result, if CondChgd bit is set, any NMI is falsely handled by the
NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
action is never performed until CondChgd bit is cleared.
I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.
On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.
I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:
In 18.9.1:
"The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
indicate changes to the state of performancmonitoring hardware"
In Table 35-2 IA-32 Architectural MSRs
63 CondChg: status bits of this register has changed.
These are different from the bahviour I see on the actual system as I
explained above.
At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c81c8a1eeede61e92a15103748c23d100880cc8a upstream.
In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and checks each one individually with page_is_ram().
For large ioremaps, this can be very slow. For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!
Internally, page_is_ram() calls walk_system_ram_range() on a single
page. Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find. For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.
With this change, ioremap on our 256 GiB BAR takes less than 1 second.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fb43e8477ed9006c4f397f904c691a120503038c upstream.
powerpc:allmodconfig has been failing for some time with the following
error.
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
A number of attempts to fix the problem by moving around code have been
unsuccessful and resulted in failed builds for some configurations and
the discovery of toolchain bugs.
Fix the problem by disabling RELOCATABLE for COMPILE_TEST builds instead.
While this is less than perfect, it avoids substantial code changes
which would otherwise be necessary just to make COMPILE_TEST builds
happy and might have undesired side effects.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fa2ec3ea10bd377f9d55772b1dab65178425a1a2 upstream.
include/linux/sched.h implements TASK_SIZE_OF as TASK_SIZE if it
is not set by the architecture headers. TASK_SIZE uses the
current task to determine the size of the virtual address space.
On a 64-bit kernel this will cause reading /proc/pid/pagemap of a
64-bit process from a 32-bit process to return EOF when it reads
past 0xffffffff.
Implement TASK_SIZE_OF exactly the same as TASK_SIZE with
test_tsk_thread_flag instead of test_thread_flag.
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cfe82d4f45c7cc39332a2be7c4c1d3bf279bbd3d upstream.
Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparce with warning:
CHECK arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b50a6c584bb47b370f84bfd746770c0bbe7129b7 upstream.
On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze
the PMU counters. Aside from on boot they are then never reset,
resulting in stuck perf counters for any user in the guest or host.
We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane
state for perf to use the PMU counters under either the guest or the
host.
This was manifesting as a bug with ppc64_cpu --frequency:
$ sudo ppc64_cpu --frequency
WARNING: couldn't run on cpu 0
WARNING: couldn't run on cpu 8
...
WARNING: couldn't run on cpu 144
WARNING: couldn't run on cpu 152
min: 18446744073.710 GHz (cpu -1)
max: 0.000 GHz (cpu -1)
avg: 0.000 GHz
The command uses a perf counter to measure CPU cycles over a fixed
amount of time, in order to approximate the frequency of the machine.
The counters were returning zero once a guest was started, regardless of
weather it was still running or had been shut down.
By dumping the value of MMCR2, it was observed that once a guest is
running MMCR2 is set to 1s - which stops counters from running:
$ sudo sh -c 'echo p > /proc/sysrq-trigger'
CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6
PMC1: 5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
PMC5: 1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef
MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000
MMCR2: fffffffffffffc00 EBBHR: 0000000000000000
EBBRR: 0000000000000000 BESCR: 0000000000000000
SIAR: 00000000000a51cc SDAR: c00000000fc40000 SIER: 0000000001000000
This is done unconditionally in book3s_hv_interrupts.S upon entering the
guest, and the original value is only save/restored if the host has
indicated it was using the PMU. This is okay, however the user of the
PMU needs to ensure that it is in a defined state when it starts using
it.
Fixes: e05b9b9e5c10 ("powerpc/perf: Power8 PMU support")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4d9690dd56b0d18f2af8a9d4a279cb205aae3345 upstream.
Instead of separate bits for every POWER8 PMU feature, have a single one
for v2.07 of the architecture.
This saves us adding a MMCR2 define for a future patch.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f56029410a13cae3652d1f34788045c40a13ffc7 upstream.
We are seeing a lot of PMU warnings on POWER8:
Can't find PMC that caused IRQ
Looking closer, the active PMC is 0 at this point and we took a PMU
exception on the transition from negative to 0. Some versions of POWER8
have an issue where they edge detect and not level detect PMC overflows.
A number of places program the PMC with (0x80000000 - period_left),
where period_left can be negative. We can either fix all of these or
just ensure that period_left is always >= 1.
This patch takes the second option.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f73128f4f680e8be68cda831f2710214559583cb upstream.
These two registers are already saved in the block above. Aside from
being unnecessary, by the time we get down to the second save location
r8 no longer contains MMCR2, so we are clobbering the saved value with
PMC5.
MMCR2 primarily consists of counter freeze bits. So restoring the value
of PMC5 into MMCR2 will most likely have the effect of freezing
counters.
Fixes: 72cde5a88d37 ("KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ab8a261ba5e2dd9206da640de5870cc31d568a7c upstream.
On parisc we can not use the existing compat implementation for fanotify_mark()
because for the 64bit mask parameter the higher and lower 32bits are ordered
differently than what the compat function expects from big endian
architectures.
Specifically:
It finally turned out, that on hppa we end up with different assignments
of parameters to kernel arguments depending on if we call the glibc
wrapper function
int fanotify_mark (int __fanotify_fd, unsigned int __flags,
uint64_t __mask, int __dfd, const char *__pathname);
or directly calling the syscall manually
syscall(__NR_fanotify_mark, ...)
Reason is, that the syscall() function is implemented as C-function and
because we now have the sysno as first parameter in front of the other
parameters the compiler will unexpectedly add an empty paramenter in
front of the u64 value to ensure the correct calling alignment for 64bit
values.
This means, on hppa you can't simply use syscall() to call the kernel
fanotify_mark() function directly, but you have to use the glibc
function instead.
This patch fixes the kernel in the hppa-arch specifc coding to adjust
the parameters in a way as if userspace calls the glibc wrapper function
fanotify_mark().
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit eadcc7208a2237016be7bdff4551ba7614da85c8 upstream.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1ff38c56cbd095c4c0dfa581a859ba3557830f78 upstream.
Need include "asm/pgtable.h" to include "asm-generic/pgtable-nopmd.h",
so can let 'pmd_t' defined. The related error with allmodconfig:
CC arch/unicore32/mm/alignment.o
In file included from arch/unicore32/mm/alignment.c:24:
arch/unicore32/include/asm/tlbflush.h:135: error: expected .). before .*. token
arch/unicore32/include/asm/tlbflush.h:154: error: expected .). before .*. token
In file included from arch/unicore32/mm/alignment.c:27:
arch/unicore32/mm/mm.h:15: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
arch/unicore32/mm/mm.h:20: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
arch/unicore32/mm/mm.h:25: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
make[1]: *** [arch/unicore32/mm/alignment.o] Error 1
make: *** [arch/unicore32/mm] Error 2
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ba25915fb2cd18152cb14b144dbe8bf2f2bd8e45 upstream.
Fixes: ec7ac6afd07b (ARC: switch to generic ENTRY/END assembler annotations)
Reported-by: Anton Kolesov <akolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e1fa108d24697b78348fd4e5a531029a50d0d36d upstream.
When kvm_write_guest writes the tsc_ref structure to the guest, or it will lead
the low HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT bits of the TSC page address
must be cleared, or the guest can see a non-zero sequence number.
Otherwise Windows guests would not be able to get a correct clocksource
(QueryPerformanceCounter will always return 0) which causes serious chaos.
Signed-off-by: Xiaoming Gao <newtongao@tencnet.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7cb060a91c0efc5ff94f83c6df3ed705e143cdb9 upstream.
KVM does not really do much with the PAT, so this went unnoticed for a
long time. It is exposed however if you try to do rdmsr on the PAT
register.
Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 682367c494869008eb89ef733f196e99415ae862 upstream.
Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3906c2b53cd23c2ae03e6ce41432c8e7f0a3cbbb upstream.
The value of ESR has been stored into x1, and should be directly pass to
do_sp_pc_abort function, "MOV x1, x25" is an extra operation and do_sp_pc_abort
will get the wrong value of ESR.
Signed-off-by: ChiaHao <andy.jhshiu@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c021f241f4fab2bb4fc4120a38a828a03dd3f970 upstream.
Fix a parser-bug in the omap2 muxing code where muxtable-entries will be
wrongly selected if the requested muxname is a *prefix* of their
m0-entry and they have a matching mN-entry. Fix by additionally checking
that the length of the m0_entry is equal.
For example muxing of "dss_data2.dss_data2" on omap32xx will fail
because the prefix "dss_data2" will match the mux-entries "dss_data2" as
well as "dss_data20", with the suffix "dss_data2" matching m0 (for
dss_data2) and m4 (for dss_data20). Thus both are recognized as signal
path candidates:
Relevant muxentries from mux34xx.c:
_OMAP3_MUXENTRY(DSS_DATA20, 90,
"dss_data20", NULL, "mcspi3_somi", "dss_data2",
"gpio_90", NULL, NULL, "safe_mode"),
_OMAP3_MUXENTRY(DSS_DATA2, 72,
"dss_data2", NULL, NULL, NULL,
"gpio_72", NULL, NULL, "safe_mode"),
This will result in a failure to mux the pin at all:
_omap_mux_get_by_name: Multiple signal paths (2) for dss_data2.dss_data2
Patch should apply to linus' latest master down to rather old linux-2.6
trees.
Signed-off-by: David R. Piegdon <lkml@p23q.org>
Cc: stable@vger.kernel.org
[tony@atomide.com: updated description to include full description]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 923b8f5044da753e4985ab15c1374ced2cdf616c upstream.
The __sync_icache_dcache routine will only flush the dcache for the
first page of a compound page, potentially leading to stale icache
data residing further on in a hugetlb page.
This patch addresses this issue by taking into consideration the
order of the page when flushing the dcache.
Reported-by: Mark Brown <broonie@linaro.org>
Tested-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
wildcard
commit 8dbdb8e704db34085f5978c335c10256b0fb9629 upstream.
Wildcards in compatible strings should be avoid. "marvell,armada38x"
was recently introduced but was not yet used.
The armada 385 SoC is a superset of the armada 380 SoC (with more CPUs
and more PCIe slots). So this patch replaces the use of
"marvell,armada38x" by the "marvell,armada380" string.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1403533011-21339-1-git-send-email-gregory.clement@free-electrons.com
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f9af420fc8208d3add2fe3198dc5d8215f5a81ba upstream.
fs/notify/fanotify/fanotify_user.c: In function 'SYSC_fanotify_init':
fs/notify/fanotify/fanotify_user.c:726: error: implicit declaration of function 'personality'
fs/notify/fanotify/fanotify_user.c:726: error: 'PER_LINUX32' undeclared (first use in this function)
fs/notify/fanotify/fanotify_user.c:726: error: (Each undeclared identifier is reported only once
fs/notify/fanotify/fanotify_user.c:726: error: for each function it appears in.)
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Will Woods <wwoods@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a upstream.
The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values. That is
very much part of why it's faster than 'iret'.
Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.
However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'. Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.
Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8c9eb041cf76038eb3b62ee259607eec9b89f48d upstream.
kvm_arch_vcpu_free() is called in 2 code paths:
1) kvm_vm_ioctl()
kvm_vm_ioctl_create_vcpu()
kvm_arch_vcpu_destroy()
kvm_arch_vcpu_free()
2) kvm_put_kvm()
kvm_destroy_vm()
kvm_arch_destroy_vm()
kvm_mips_free_vcpus()
kvm_arch_vcpu_free()
Neither of the paths handles VCPU free. We need to do it in
kvm_arch_vcpu_free() corresponding to the memory allocation in
kvm_arch_vcpu_create().
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|