lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2016-06-24	sparc64: Fix return from trap window fill crashes.	David S. Miller
	[ Upstream commit 7cafc0b8bf130f038b0ec2dcdd6a9de6dc59b65a ] We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. Otherwise we can get an OOPS that looks like this: ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. The userland register window fill handler is: add %sp, STACK_BIAS + 0x00, %g1; \ ldxa [%g1 + %g0] ASI, %l0; \ mov 0x08, %g2; \ mov 0x10, %g3; \ ldxa [%g1 + %g2] ASI, %l1; \ mov 0x18, %g5; \ ldxa [%g1 + %g3] ASI, %l2; \ ldxa [%g1 + %g5] ASI, %l3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %l4; \ ldxa [%g1 + %g2] ASI, %l5; \ ldxa [%g1 + %g3] ASI, %l6; \ ldxa [%g1 + %g5] ASI, %l7; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i0; \ ldxa [%g1 + %g2] ASI, %i1; \ ldxa [%g1 + %g3] ASI, %i2; \ ldxa [%g1 + %g5] ASI, %i3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i4; \ ldxa [%g1 + %g2] ASI, %i5; \ ldxa [%g1 + %g3] ASI, %i6; \ ldxa [%g1 + %g5] ASI, %i7; \ restored; \ retry; nop; nop; nop; nop; \ b,a,pt %xcc, fill_fixup_dax; \ b,a,pt %xcc, fill_fixup_mna; \ b,a,pt %xcc, fill_fixup; And the way this works is that if any of those memory accesses generate an exception, the exception handler can revector to one of those final three branch instructions depending upon which kind of exception the memory access took. In this way, the fault handler doesn't have to know if it was a spill or a fill that it's handling the fault for. It just always branches to the last instruction in the parent trap's handler. For example, for a regular fault, the code goes: winfix_trampoline: rdpr %tpc, %g3 or %g3, 0x7c, %g3 wrpr %g3, %tnpc done All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the trap time program counter, we'll get that final instruction in the trap handler. On return from trap, we have to pull the register window in but we do this by hand instead of just executing a "restore" instruction for several reasons. The largest being that from Niagara and onward we simply don't have enough levels in the trap stack to fully resolve all possible exception cases of a window fault when we are already at trap level 1 (which we enter to get ready to return from the original trap). This is executed inline via the FILL__RTRAP handlers. rtrap_64.S's code branches directly to these to do the window fill by hand if necessary. Now if you look at them, we'll see at the end: ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; And oops, all three cases are handled like a fault. This doesn't work because each of these trap types (data access exception, memory address unaligned, and faults) store their auxiliary info in different registers to pass on to the C handler which does the real work. So in the case where the stack was unaligned, the unaligned trap handler sets up the arg registers one way, and then we branched to the fault handler which expects them setup another way. So the FAULT_TYPE_ value ends up basically being garbage, and randomly would generate the backtrace seen above. Reported-by: Nick Alcock <nix@esperi.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	sparc: Harden signal return frame checks.	David S. Miller
	[ Upstream commit d11c2a0de2824395656cf8ed15811580c9dd38aa ] All signal frames must be at least 16-byte aligned, because that is the alignment we explicitly create when we build signal return stack frames. All stack pointers must be at least 8-byte aligned. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	sparc64: Take ctx_alloc_lock properly in hugetlb_setup().	David S. Miller
	[ Upstream commit 9ea46abe22550e3366ff7cee2f8391b35b12f730 ] On cheetahplus chips we take the ctx_alloc_lock in order to modify the TLB lookup parameters for the indexed TLBs, which are stored in the context register. This is called with interrupts disabled, however ctx_alloc_lock is an IRQ safe lock, therefore we must take acquire/release it properly with spin_{lock,unlock}_irq(). Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	sparc64: Reduce TLB flushes during hugepte changes	Nitin Gupta
	[ Upstream commit 24e49ee3d76b70853a96520e46b8837e5eae65b2 ] During hugepage map/unmap, TSB and TLB flushes are currently issued at every PAGE_SIZE'd boundary which is unnecessary. We now issue the flush at REAL_HPAGE_SIZE boundaries only. Without this patch workloads which unmap a large hugepage backed VMA region get CPU lockups due to excessive TLB flush calls. Orabug: 22365539, 22643230, 22995196 Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	x86/entry/traps: Don't force in_interrupt() to return true in IST handlers	Andy Lutomirski
	commit aaee8c3c5cce2d9107310dd9f3026b4f901d441c upstream. Forcing in_interrupt() to return true if we're not in a bona fide interrupt confuses the softirq code. This fixes warnings like: NOHZ: local_softirq_pending 282 ... which can happen when running things like selftests/x86. This will change perf's static percpu buffer usage in IST context. I think this is okay, and it's changing the behavior to match historical (pre-4.0) behavior. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 959274753857 ("x86, traps: Track entry into and exit from IST context") Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	parisc: Fix pagefault crash in unaligned __get_user() call	Helge Deller
	commit 8b78f260887df532da529f225c49195d18fef36b upstream. One of the debian buildd servers had this crash in the syslog without any other information: Unaligned handler failed, ret = -2 clock_adjtime (pid 22578): Unaligned data reference (code 28) CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G E 4.5.0-2-parisc64-smp #1 Debian 4.5.4-1 task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001001111100000001111 Tainted: G E r00-03 000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0 r04-07 00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff r08-11 0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4 r12-15 000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b r16-19 0000000000028800 0000000000000001 0000000000000070 00000001bde7c218 r20-23 0000000000000000 00000001bde7c210 0000000000000002 0000000000000000 r24-27 0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0 r28-31 0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218 sr00-03 0000000001200000 0000000001200000 0000000000000000 0000000001200000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88 IIR: 0ca0d089 ISR: 0000000001200000 IOR: 00000000fa6f7fff CPU: 1 CR30: 00000001bde7c000 CR31: ffffffffffffffff ORIG_R28: 00000002369fe628 IAOQ[0]: compat_get_timex+0x2dc/0x3c0 IAOQ[1]: compat_get_timex+0x2e0/0x3c0 RP(r2): compat_get_timex+0x40/0x3c0 Backtrace: [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0 [<0000000040205024>] syscall_exit+0x0/0x14 This means the userspace program clock_adjtime called the clock_adjtime() syscall and then crashed inside the compat_get_timex() function. Syscalls should never crash programs, but instead return EFAULT. The IIR register contains the executed instruction, which disassebles into "ldw 0(sr3,r5),r9". This load-word instruction is part of __get_user() which tried to read the word at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The unaligned handler is able to emulate all ldw instructions, but it fails if it fails to read the source e.g. because of page fault. The following program reproduces the problem: #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/mman.h> int main(void) { /* allocate 8k / char ptr = mmap(NULL, 24096, PROT_READ\|PROT_WRITE, MAP_PRIVATE\|MAP_ANONYMOUS, -1, 0); / free second half (upper 4k) and make it invalid. / munmap(ptr+4096, 4096); / syscall where first int is unaligned and clobbers into invalid memory region / / syscall should return EFAULT */ return syscall(__NR_clock_adjtime, 0, ptr+4095); } To fix this issue we simply need to check if the faulting instruction address is in the exception fixup table when the unaligned handler failed. If it is, call the fixup routine instead of crashing. While looking at the unaligned handler I found another issue as well: The target register should not be modified if the handler was unsuccessful. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	powerpc/mm/hash: Fix the reference bit update when handling hash fault	Aneesh Kumar K.V
	commit dc47c0c1f8099fccb2c1e2f3775855066a9e4484 upstream. When we converted the asm routines to C functions, we missed updating HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower bits from pte via. andi. r3,r30,0x1fe /* Get basic set of flags */ We also update the code such that we won't update the Change bit ('C' bit) always. This was added by commit c5cf0e30bf3d8 ("powerpc: Fix buglet with MMU hash management"). With hash64, we need to make sure that hardware doesn't do a pte update directly. This is because we do end up with entries in TLB with no hash page table entry. This happens because when we find a hash bucket full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". For more info look at commit 0608d692463("powerpc/mm: Always invalidate tlb on hpte invalidate and update"). Thus it's critical that valid hash PTEs always have reference bit set and writeable ones have change bit set. We do this by hashing a non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and thus R) when hashing anything else in. Any attempt by Linux at clearing those bits also removes the corresponding hash entry. Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always. We don't really need to do that because we never map a RW pte entry without setting 'C' bit. On READ fault on a RW pte entry, we still map it READ only, hence a store update in the page will still cause a hash pte fault. This patch reverts the part of commit c5cf0e30bf3d8 ("[PATCH] powerpc: Fix buglet with MMU hash management") and retain the updatepp part. - If we hit the updatepp path on native, the old code without that commit, would fail to set C bcause native_hpte_updatepp() was implemented to filter the same bits as H_PROTECT and not let C through thus we would "upgrade" a RO HPTE to RW without setting C thus causing the bug. So the real fix in that commit was the change to native_hpte_updatepp Fixes: 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call	Thomas Huth
	commit 7cc851039d643a2ee7df4d18177150f2c3a484f5 upstream. If we do not provide the PVR for POWER8NVL, a guest on this system currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU does not provide a generic PowerISA 2.07 mode yet. So some new instructions from POWER8 (like "mtvsrd") get disabled for the guest, resulting in crashes when using code compiled explicitly for POWER8 (e.g. with the "-mcpu=power8" option of GCC). Fixes: ddee09c099c3 ("powerpc: Add PVR for POWER8NVL processor") Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	powerpc: Use privileged SPR number for MMCR2	Thomas Huth
	commit 8dd75ccb571f3c92c48014b3dabd3d51a115ab41 upstream. We are already using the privileged versions of MMCR0, MMCR1 and MMCRA in the kernel, so for MMCR2, we should better use the privileged versions, too, to be consistent. Fixes: 240686c13687 ("powerpc: Initialise PMU related regs on Power8") Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	powerpc: Fix definition of SIAR and SDAR registers	Thomas Huth
	commit d23fac2b27d94aeb7b65536a50d32bfdc21fe01e upstream. The SIAR and SDAR registers are available twice, one time as SPRs 780 / 781 (unprivileged, but read-only), and one time as the SPRs 796 / 797 (privileged, but read and write). The Linux kernel code currently uses the unprivileged SPRs - while this is OK for reading, writing to that register of course does not work. Since the KVM code tries to write to this register, too (see the mtspr in book3s_hv_rmhandlers.S), the contents of this register sometimes get lost for the guests, e.g. during migration of a VM. To fix this issue, simply switch to the privileged SPR numbers instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge	Russell Currey
	commit 871e178e0f2c4fa788f694721a10b4758d494ce1 upstream. In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the spec states that values of 9900-9905 can be returned, indicating that software should delay for 10^x (where x is the last digit, i.e. 990x) milliseconds and attempt the call again. Currently, the kernel doesn't know about this, and respecting it fixes some PCI failures when the hypervisor is busy. The delay is capped at 0.2 seconds. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	arm64: mm: always take dirty state from new pte in ptep_set_access_flags	Will Deacon
	commit 0106d456c4cb1770253fefc0ab23c9ca760b43f7 upstream. Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") ensured that pte flags are updated atomically in the face of potential concurrent, hardware-assisted updates. However, Alex reports that: \| This patch breaks swapping for me. \| In the broken case, you'll see either systemd cpu time spike (because \| it's stuck in a page fault loop) or the system hang (because the \| application owning the screen is stuck in a page fault loop). It turns out that this is because the 'dirty' argument to ptep_set_access_flags is always 0 for read faults, and so we can't use it to set PTE_RDONLY. The failing sequence is: 1. We put down a PTE_WRITE \| PTE_DIRTY \| PTE_AF pte 2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF 3. A read faults due to the missing access flag 4. ptep_set_access_flags is called with dirty = 0, due to the read fault 5. pte is then made PTE_WRITE \| PTE_DIRTY \| PTE_AF \| PTE_RDONLY (!) 6. A write faults, but pte_write is true so we get stuck The solution is to check the new page table entry (as would be done by the generic, non-atomic definition of ptep_set_access_flags that just calls set_pte_at) to establish the dirty state. Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Alexander Graf <agraf@suse.de> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks	Catalin Marinas
	commit e47b020a323d1b2a7b1e9aac86e99eae19463630 upstream. This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with the 32-bit ARM one by providing an additional line: model name : ARMv8 Processor rev X (v8l) Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	s390/bpf: reduce maximum program size to 64 KB	Michael Holzheu
	commit 0fa963553a5c28d8f8aabd8878326d3f782045fc upstream. The s390 BFP compiler currently uses relative branch instructions that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj", etc. Currently the maximum size of s390 BPF programs is set to 0x7ffff. If branches over 64 KB are generated the, kernel can crash due to incorrect code. So fix this an reduce the maximum size to 64 KB. Programs larger than that will be interpreted. Fixes: ce2b6ad9c185 ("s390/bpf: increase BPF_SIZE_MAX") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop	Michael Holzheu
	commit 6edf0aa4f8bbdfbb4d6d786892fa02728d05dc36 upstream. In case of usage of skb_vlan_push/pop, in the prologue we store the SKB pointer on the stack and restore it after BPF_JMP_CALL to skb_vlan_push/pop. Unfortunately currently there are two bugs in the code: 1) The wrong stack slot (offset 170 instead of 176) is used 2) The wrong register (W1 instead of B1) is saved So fix this and use correct stack slot and register. Fixes: 9db7f2b81880 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	ARM: fix PTRACE_SETVFPREGS on SMP systems	Russell King
	commit e2dfb4b880146bfd4b6aa8e138c0205407cebbaf upstream. PTRACE_SETVFPREGS fails to properly mark the VFP register set to be reloaded, because it undoes one of the effects of vfp_flush_hwstate(). Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to an invalid CPU number, but vfp_set() overwrites this with the original CPU number, thereby rendering the hardware state as apparently "valid", even though the software state is more recent. Fix this by reverting the previous change. Fixes: 8130b9d7b9d8 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Simon Marchi <simon.marchi@ericsson.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS	Paolo Bonzini
	commit d14bdb553f9196169f003058ae1cdabe514470e6 upstream. MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS time, and the next KVM_RUN oopses: general protection fault: 0000 [#1] SMP CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 [...] Call Trace: [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm] [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 RIP [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40 RSP <ffff88005836bd50> Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[8]; int main() { struct kvm_debugregs dr = { 0 }; r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); memcpy(&dr, "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72" "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8" "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9" "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb", 48); r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr); r[6] = ioctl(r[4], KVM_RUN, 0); } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24	KVM: arm/arm64: vgic-v3: Clear all dirty LRs	Christoffer Dall
	commit fa89c77e891917b5913f9be080f9131a9457bb3e upstream. When saving the state of the list registers, it is critical to reset them zero, as we could otherwise leave unexpected EOI interrupts pending for virtual level interrupts. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	xen: use same main loop for counting and remapping pages	Juergen Gross
	commit dd14be92fbf5bc1ef7343f34968440e44e21b46a upstream. Instead of having two functions for cycling through the E820 map in order to count to be remapped pages and remap them later, just use one function with a caller supplied sub-function called for each region to be processed. This eliminates the possibility of a mismatch between both loops which showed up in certain configurations. Suggested-by: Ed Swierk <eswierk@skyportsystems.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()	Gavin Shan
	commit 5a0cdbfd17b90a89c64a71d8aec9773ecdb20d0d upstream. The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrou device are transferred to guest and backwards. The content in the device's config space will be lost on PE reset issued in the middle of the recovery. The function saves/restores it before/after the reset. However, config access to some adapters like Broadcom BCM5719 at this point will causes fenced PHB. The config space is always blocked and we save 0xFF's that are restored at late point. The memory BARs are totally corrupted, causing another EEH error upon access to one of the memory BARs. This restores the config space on those adapters like BCM5719 from the content saved to the EEH device when it's populated, to resolve above issue. Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"	Guilherme G. Piccoli
	commit c2078d9ef600bdbe568c89e5ddc2c6f15b7982c8 upstream. This reverts commit 89a51df5ab1d38b257300b8ac940bbac3bb0eb9b. The function eeh_add_device_early() is used to perform EEH initialization in devices added later on the system, like in hotplug/DLPAR scenarios. Since the commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") a new check was introduced in this function - Cell has no EEH capabilities which led to kernel oops if hotplug was performed, so checking for eeh_enabled() was introduced to avoid the issue. However, in architectures that EEH is present like pSeries or PowerNV, we might reach a case in which no PCI devices are present on boot time and so EEH is not initialized. Then, if a device is added via DLPAR for example, eeh_add_device_early() fails because eeh_enabled() is false, and EEH end up not being enabled at all. This reverts the aforementioned patch since a new verification was introduced by the commit d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug") and so the original Cell issue does not happen anymore. Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()	Gavin Shan
	commit affeb0f2d3a9af419ad7ef4ac782e1540b2f7b28 upstream. The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrough device are transferred to guest and backwards, meaning the device's driver is vfio-pci or none. When the driver is vfio-pci that provides error_detected() error handler only, the handler simply stops the guest and it's not expected behaviour. On the other hand, no error handlers will be called if we don't have a bound driver. This ignores the error handler in eeh_pe_reset_and_recover() that reports the error to device driver to avoid the exceptional behaviour. Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel	Hari Bathini
	commit 8ed8ab40047a570fdd8043a40c104a57248dd3fd upstream. Some of the interrupt vectors on 64-bit POWER server processors are only 32 bytes long (8 instructions), which is not enough for the full first-level interrupt handler. For these we need to branch to an out-of-line (OOL) handler. But when we are running a relocatable kernel, interrupt vectors till __end_interrupts marker are copied down to real address 0x100. So, branching to labels (ie. OOL handlers) outside this section must be handled differently (see LOAD_HANDLER()), considering relocatable kernel, which would need at least 4 instructions. However, branching from interrupt vector means that we corrupt the CFAR (come-from address register) on POWER7 and later processors as mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions) that contains the part up to the point where the CFAR is saved in the PACA should be part of the short interrupt vectors before we branch out to OOL handlers. But as mentioned already, there are interrupt vectors on 64-bit POWER server processors that are only 32 bytes long (like vectors 0x4f00, 0x4f20, etc.), which cannot accomodate the above two cases at the same time owing to space constraint. Currently, in these interrupt vectors, we simply branch out to OOL handlers, without using LOAD_HANDLER(), which leaves us vulnerable when running a relocatable kernel (eg. kdump case). While this has been the case for sometime now and kdump is used widely, we were fortunate not to see any problems so far, for three reasons: 1. In almost all cases, production kernel (relocatable) is used for kdump as well, which would mean that crashed kernel's OOL handler would be at the same place where we end up branching to, from short interrupt vector of kdump kernel. 2. Also, OOL handler was unlikely the reason for crash in almost all the kdump scenarios, which meant we had a sane OOL handler from crashed kernel that we branched to. 3. On most 64-bit POWER server processors, page size is large enough that marking interrupt vector code as executable (see commit 429d2e83) leads to marking OOL handler code from crashed kernel, that sits right below interrupt vector code from kdump kernel, as executable as well. Let us fix this by moving the __end_interrupts marker down past OOL handlers to make sure that we also copy OOL handlers to real address 0x100 when running a relocatable kernel. This fix has been tested successfully in kdump scenario, on an LPAR with 4K page size by using different default/production kernel and kdump kernel. Also tested by manually corrupting the OOL handlers in the first kernel and then kdump'ing, and then causing the OOL handlers to fire - mpe. Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers") Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats	Marek Szyprowski
	commit 330d12764e15f6e3e94ff34cda29db96d2589c24 upstream. MAX8997 PMIC requires interrupt and fails probing without it. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: d105f0b1215d ("ARM: dts: Add basic dts file for Samsung Trats board") [k.kozlowski: Write commit message, add CC-stable] Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	ARM: dts: at91: fix typo in sama5d2 PIN_PD24 description	Florian Vallee
	commit b1f3a3b03eb5f61b4051e2da9aa15653e705e111 upstream. Fix a typo on PIN_PD24 for UTXD2 and FLEXCOM4_IO3 which were wrongly linked to PIN_PD23). Signed-off-by: Florian Vallee <fvallee@eukrea.fr> Fixes: 7f16cb676c00 ("ARM: at91/dt: add sama5d2 pinmux") [nicolas.ferre@atmel.com: add commit message, changed subject] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	ARM: mvebu: fix GPIO config on the Linksys boards	Imre Kaloz
	commit 9800917cf92f5b5fe5cae706cb70db8d014f663c upstream. Some of the GPIO configs were wrong in the submitted DTS files, this patch fixes all affected boards. Signed-off-by: Imre Kaloz <kaloz@openwrt.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
2016-06-07	ARM: sun7i: dt: Enable dram gate 5 (tve0 clock) for simplefb TV output	Priit Laes
	commit 4b8ccef22fb547007ac38c4e5a28a773adee1e6e upstream. Seems like dram_gate 5 was forgotten when DRAM gating driver was added. Add it. Fixes: 0b4bf5a5200b (ARM: dts: sun7i: Add DRAM gates) Signed-off-by: Priit Laes <plaes@plaes.org> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	ARM: sun4i: dt: Enable dram gate 5 (tve0 clock) for simplefb TV output	Priit Laes
	commit bec38aaafd9ec1463dd3857f02bc029707e4213d upstream. Seems like dram_gate 5 was forgotten when DRAM gate driver was added. Enable it. Fixes: 82f8582feef4 (ARM: dts: sun4i: Add DRAM gates) Signed-off-by: Priit Laes <plaes@plaes.org> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: VDSO: Build with `-fno-strict-aliasing'	Maciej W. Rozycki
	commit 94cc36b84acc29f543b48bc5ed786011b112a666 upstream. Avoid an aliasing issue causing a build error in VDSO: In file included from include/linux/srcu.h:34:0, from include/linux/notifier.h:15, from ./arch/mips/include/asm/uprobes.h:9, from include/linux/uprobes.h:61, from include/linux/mm_types.h:13, from ./arch/mips/include/asm/vdso.h:14, from arch/mips/vdso/vdso.h:27, from arch/mips/vdso/gettimeofday.c:11: include/linux/workqueue.h: In function 'work_static': include/linux/workqueue.h:186:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] return work_data_bits(work) & WORK_STRUCT_STATIC; ^ cc1: all warnings being treated as errors make[2]: ** [arch/mips/vdso/gettimeofday.o] Error 1 with a CONFIG_DEBUG_OBJECTS_WORK configuration and GCC 5.2.0. Include `-fno-strict-aliasing' along with compiler options used, as required for kernel code, fixing a problem present since the introduction of VDSO with commit ebb5e78cc634 ("MIPS: Initial implementation of a VDSO"). Thanks to Tejun for diagnosing this properly! Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Cc: Tejun Heo <tj@kernel.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13357/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: lib: Mark intrinsics notrace	Harvey Hunt
	commit aedcfbe06558a9f53002e82d5be64c6c94687726 upstream. On certain MIPS32 devices, the ftrace tracer "function_graph" uses __lshrdi3() during the capturing of trace data. ftrace then attempts to trace __lshrdi3() which leads to infinite recursion and a stack overflow. Fix this by marking __lshrdi3() as notrace. Mark the other compiler intrinsics as notrace in case the compiler decides to use them in the ftrace path. Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com> Cc: <linux-mips@linux-mips.org> Cc: <linux-kernel@vger.kernel.org> Patchwork: https://patchwork.linux-mips.org/patch/13354/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Build microMIPS VDSO for microMIPS kernels	James Hogan
	commit bb93078e655be1e24d68f28f2756676e62c037ce upstream. MicroMIPS kernels may be expected to run on microMIPS only cores which don't support the normal MIPS instruction set, so be sure to pass the -mmicromips flag through to the VDSO cflags. Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13349/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Fix sigreturn via VDSO on microMIPS kernel	James Hogan
	commit 13eb192d10bcc9ac518d57356179071d603bcb4e upstream. In microMIPS kernels, handle_signal() sets the isa16 mode bit in the vdso address so that the sigreturn trampolines (which are offset from the VDSO) get executed as microMIPS. However commit ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") changed the offsets to come from the VDSO image, which already have the isa16 mode bit set correctly since they're extracted from the VDSO shared library symbol table. Drop the isa16 mode bit handling from handle_signal() to fix sigreturn for cores which support both microMIPS and normal MIPS. This doesn't fix microMIPS only cores, since the VDSO is still built for normal MIPS, but thats a separate problem. Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13348/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: ptrace: Prevent writes to read-only FCSR bits	Maciej W. Rozycki
	commit abf378be49f38c4d3e23581d3df3fa9f1b1b11d2 upstream. Correct the cases missed with commit 9b26616c8d9d ("MIPS: Respect the ISA level in FCSR handling") and prevent writes to read-only FCSR bits there. This in particular applies to FP context initialisation where any IEEE 754-2008 bits preset by `mips_set_personality_nan' are cleared before the relevant ptrace(2) call takes effect and the PTRACE_POKEUSR request addressing FPC_CSR where no masking of read-only FCSR bits is done. Remove the FCSR clearing from FP context initialisation then and unify PTRACE_POKEUSR/FPC_CSR and PTRACE_SETFPREGS handling, by factoring out code from `ptrace_setfpregs' and calling it from both places. This mostly matters to soft float configurations where the emulator can be switched this way to a mode which should not be accessible and cannot be set with the CTC1 instruction. With hard float configurations any effect is transient anyway as read-only bits will retain their values at the time the FP context is restored. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13239/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: ptrace: Fix FP context restoration FCSR regression	Maciej W. Rozycki
	commit 4249548454f7ba4581aeee26bd83f42b48a14d15 upstream. Fix a floating-point context restoration regression introduced with commit 9b26616c8d9d ("MIPS: Respect the ISA level in FCSR handling") that causes a Floating Point exception and consequently a kernel oops with hard float configurations when one or more FCSR Enable and their corresponding Cause bits are set both at a time via a ptrace(2) call. To do so reinstate Cause bit masking originally introduced with commit b1442d39fac2 ("MIPS: Prevent user from setting FCSR cause bits") to address this exact problem and then inadvertently removed from the PTRACE_SETFPREGS request with the commit referred above. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13238/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Disable preemption during prctl(PR_SET_FP_MODE, ...)	Paul Burton
	commit bd239f1e1429e7781096bf3884bdb1b2b1bb4f28 upstream. Whilst a PR_SET_FP_MODE prctl is performed there are decisions made based upon whether the task is executing on the current CPU. This may change if we're preempted, so disable preemption to avoid such changes for the lifetime of the mode switch. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 9791554b45a2 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS") Reviewed-by: Maciej W. Rozycki <macro@imgtec.com> Tested-by: Aurelien Jarno <aurelien@aurel32.net> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13144/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Prevent "restoration" of MSA context in non-MSA kernels	Paul Burton
	commit 6533af4d4831c421cd9aa4dce7cfc19a3514cc09 upstream. If a kernel doesn't support MSA context (ie. CONFIG_CPU_HAS_MSA=n) then it will only keep 64 bits per FP register in thread context, and the calls to set_fpr64 in restore_msa_extcontext will overrun the end of the FP register context into the FCSR & MSACSR values. GCC 6.x has become smart enough to detect this & complain like so: arch/mips/kernel/signal.c: In function 'protected_restore_fp_context': ./arch/mips/include/asm/processor.h:114:17: error: array subscript is above array bounds [-Werror=array-bounds] fpr->val##width[FPR_IDX(width, idx)] = val; \ ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~ ./arch/mips/include/asm/processor.h:118:1: note: in expansion of macro 'BUILD_FPR_ACCESS' BUILD_FPR_ACCESS(64) The only way to trigger this code to run would be for a program to set up an artificial extended MSA context structure following a sigframe & execute sigreturn. Whilst this doesn't allow a program to write to any state that it couldn't already, it makes little sense to allow this "restoration" of MSA context in a system that doesn't support MSA. Fix this by killing a program with SIGSYS if it tries something as crazy as "restoring" fake MSA context in this way, also fixing the build error & allowing for most of restore_msa_extcontext to be optimised out of kernels without support for MSA. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Reported-by: Michal Toman <michal.toman@imgtec.com> Fixes: bf82cb30c7e5 ("MIPS: Save MSA extended context around signals") Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Toman <michal.toman@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13164/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Force CPUs to lose FP context during mode switches	Paul Burton
	commit 6b8322576e9d325b65c54fbef64e4e8690ad70ce upstream. Commit 9791554b45a2 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS") added support for the PR_SET_FP_MODE prctl, which allows a userland program to modify its FP mode at runtime. This is most notably required if dynamic linking leads to the FP mode requirement changing at runtime from that indicated in the initial executable's ELF header. In order to avoid overhead in the general FP context restore code, it aimed to have threads in the process become unable to enable the FPU during a mode switch & have the thread calling the prctl syscall wait for all other threads in the process to be context switched at least once. Once that happens we can know that no thread in the process whose mode will be switched has live FP context, and it's safe to perform the mode switch. However in the (rare) case of modeswitches occurring in multithreaded programs this can lead to indeterminate delays for the thread invoking the prctl syscall, and the code monitoring for those context switches was woefully inadequate for all but the simplest cases. Fix this by broadcasting an IPI if other CPUs may have live FP context for an affected thread, with a handler causing those CPUs to relinquish their FPU ownership. Threads will then be allowed to continue running but will stall on the wait_on_atomic_t in enable_restore_fp_context if they attempt to use FP again whilst the mode switch is still in progress. The end result is less fragile poking at scheduler context switch counts & a more expedient completion of the mode switch. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 9791554b45a2 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS") Reviewed-by: Maciej W. Rozycki <macro@imgtec.com> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13145/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Fix MSA ld_/st_ asm macros to use PTR_ADDU	James Hogan
	commit ea1688573426adc2587ed52d086b51c7c62eaca3 upstream. The MSA ld_/st_ assembler macros for when the toolchain doesn't support MSA use addu to offset the base address. However it is a virtual memory pointer so fix it to use PTR_ADDU which expands to daddu for 64-bit kernels. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13062/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Use copy_s.fmt rather than copy_u.fmt	Paul Burton
	commit 8a3c8b48aca8771bff3536e40aa26ffb311699d1 upstream. In revision 1.12 of the MSA specification, the copy_u.w instruction has been removed for MIPS32 & the copy_u.d instruction has been removed for MIPS64. Newer toolchains (eg. Codescape SDK essentials 2015.10) will complain about this like so: arch/mips/kernel/r4k_fpu.S:290: Error: opcode not supported on this processor: mips32r2 (mips32r2) `copy_u.w $1,$w26[3]' Since we always copy to the width of a GPR, simply use copy_s instead of copy_u to fix this. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13061/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Loongson-3: Reserve 32MB for RS780E integrated GPU	Huacai Chen
	commit 3484de7bcbed20ecbf2b8d80671619e7059e2dd7 upstream. Due to datasheet, reserving 0xff800000~0xffffffff (8MB below 4GB) is not enough for RS780E integrated GPU's TOM (top of memory) registers and MSI/MSI-x memory region, so we reserve 0xfe000000~0xffffffff (32MB below 4GB). Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Steven J . Hill <sjhill@realitydiluted.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12889/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Reserve nosave data for hibernation	Huacai Chen
	commit a95d069204e178f18476f5499abab0d0d9cbc32c upstream. After commit 92923ca3aacef63c92d ("mm: meminit: only set page reserved in the memblock region"), the MIPS hibernation is broken. Because pages in nosave data section should be "reserved", but currently they aren't set to "reserved" at initialization. This patch makes hibernation work again. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Steven J . Hill <sjhill@realitydiluted.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12888/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: ath79: make bootconsole wait for both THRE and TEMT	Matthias Schiffer
	commit f5b556c94c8490d42fea79d7b4ae0ecbc291e69d upstream. This makes the ath79 bootconsole behave the same way as the generic 8250 bootconsole. Also waiting for TEMT (transmit buffer is empty) instead of just THRE (transmit buffer is not full) ensures that all characters have been transmitted before the real serial driver starts reconfiguring the serial controller (which would sometimes result in garbage being transmitted.) This change does not cause a visible performance loss. In addition, this seems to fix a hang observed in certain configurations on many AR7xxx/AR9xxx SoCs during autoconfig of the real serial driver. A more complete follow-up patch will disable 8250 autoconfig for ath79 altogether (the serial controller is detected as a 16550A, which is not fully compatible with the ath79 serial, and the autoconfig may lead to undefined behavior on ath79.) Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Loongson-3: Fix build error after ld-version.sh modification	Huacai Chen
	commit 820880cdba0137baff6cc0e828c3c418c363ae44 upstream. Commit d5ece1cb074b2c ("Fix ld-version.sh to handle large 3rd version part") modifies the ld version description. This causes a build error on Loongson-3, so fix it. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Steven J . Hill <sjhill@realitydiluted.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12890/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Sync icache & dcache in set_pte_at	Paul Burton
	commit 37d22a0d798b5c938b277d32cfd86dc231381342 upstream. It's possible for pages to become visible prior to update_mmu_cache running if a thread within the same address space preempts the current thread or runs simultaneously on another CPU. That is, the following scenario is possible: CPU0 CPU1 write to page flush_dcache_page flush_icache_page set_pte_at map page update_mmu_cache If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid & visible, and update_mmu_cache where the dcache flush occurs then CPU1s icache will fill from stale data (unless it fills from the dcache, in which case all is good, but most MIPS CPUs don't have this property). Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.") attempted to fix that by performing the dcache flush in flush_icache_page such that it occurs before the set_pte_at call makes the page visible. However it has the problem that not all code that writes to pages exposed to userland call flush_icache_page. There are many callers of set_pte_at under mm/ and only 2 of them do call flush_icache_page. Thus the race window between a page becoming visible & being coherent between the icache & dcache remains open in some cases. To illustrate some of the cases, a WARN was added to __update_cache with this patch applied that triggered in cases where a page about to be flushed from the dcache was not the last page provided to flush_icache_page. That is, backtraces were obtained for cases in which the race window is left open without this patch. The 2 standout examples follow. When forking a process: [ 15.271842] [<80417630>] __update_cache+0xcc/0x188 [ 15.277274] [<80530394>] copy_page_range+0x56c/0x6ac [ 15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac [ 15.289028] [<80429f80>] do_fork+0xe4/0x420 [ 15.293747] [<80413808>] handle_sys+0x128/0x14c When exec'ing an ELF binary: [ 14.445964] [<80417630>] __update_cache+0xcc/0x188 [ 14.451369] [<80538d88>] move_page_tables+0x414/0x498 [ 14.457075] [<8055d848>] setup_arg_pages+0x220/0x318 [ 14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0 [ 14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214 [ 14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c [ 14.480324] [<8055f938>] do_execve+0x38/0x44 [ 14.485137] [<80413808>] handle_sys+0x128/0x14c These code paths write into a page, call flush_dcache_page then call set_pte_at without flush_icache_page inbetween. The end result is that the icache can become corrupted & userland processes may execute unexpected or invalid code, typically resulting in a reserved instruction exception, a trap or a segfault. Fix this race condition fully by performing any cache maintenance required to keep the icache & dcache in sync in set_pte_at, before the page is made valid. This has the added bonus of ensuring the cache maintenance always happens in one location, rather than being duplicated in flush_icache_page & update_mmu_cache. It also matches the way other architectures solve the same problem (see arm, ia64 & powerpc). Signed-off-by: Paul Burton <paul.burton@imgtec.com> Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com> Cc: Lars Persson <lars.persson@axis.com> Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.") Cc: Steven J. Hill <sjhill@realitydiluted.com> Cc: David Daney <david.daney@cavium.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12722/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Handle highmem pages in __update_cache	Paul Burton
	commit f4281bba818105c7c91799abe40bc05c0dbdaa25 upstream. The following patch will expose __update_cache to highmem pages. Handle them by mapping them in for the duration of the cache maintenance, just like in __flush_dcache_page. The code for that isn't shared because we need the page address in __update_cache so sharing became messy. Given that the entirity is an extra 5 lines, just duplicate it. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Lars Persson <lars.persson@axis.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12721/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Flush highmem pages in __flush_dcache_page	Paul Burton
	commit 234859e49a15323cf1b2331bdde7f658c4cb45fb upstream. When flush_dcache_page is called on an executable page, that page is about to be provided to userland & we can presume that the icache contains no valid entries for its address range. However if the icache does not fill from the dcache then we cannot presume that the pages content has been written back as far as the memories that the dcache will fill from (ie. L2 or further out). This was being done for lowmem pages, but not for highmem which can lead to icache corruption. Fix this by mapping highmem pages & flushing their content from the dcache in __flush_dcache_page before providing the page to userland, just as is done for lowmem pages. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Lars Persson <lars.persson@axis.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12720/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Fix watchpoint restoration	James Hogan
	commit a7e89326b415b5d81c4b1016fd4a40db861eb58d upstream. Commit f51246efee2b ("MIPS: Get rid of finish_arch_switch().") moved the __restore_watch() call from finish_arch_switch() (i.e. after resume() returns) to before the resume() call in switch_to(). This results in watchpoints only being restored when a task is descheduled, preventing the watchpoints from being effective most of the time, except due to chance before the watchpoints are lazily removed. Fix the call sequence from switch_to() through to mips_install_watch_registers() to pass the task_struct pointer of the next task, instead of using current. This allows the watchpoints for the next (non-current) task to be restored without reintroducing finish_arch_switch(). Fixes: f51246efee2b ("MIPS: Get rid of finish_arch_switch().") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12726/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Fix uapi include in exported asm/siginfo.h	James Hogan
	commit 987e5b834467c9251ca584febda65ef8f66351a9 upstream. Since commit 8cb48fe169dd ("MIPS: Provide correct siginfo_t.si_stime"), MIPS' uapi/asm/siginfo.h has included uapi/asm-generic/siginfo.h directly before defining MIPS' struct siginfo, in order to get the necessary definitions needed for the siginfo struct without the generic copy_siginfo() hitting compiler errors due to struct siginfo not yet being defined. Now that the generic copy_siginfo() is moved out to linux/signal.h we can safely include asm-generic/siginfo.h before defining the MIPS specific struct siginfo, which avoids the uapi/ include as well as breakage due to generic copy_siginfo() being defined before struct siginfo. Reported-by: Christopher Ferris <cferris@google.com> Fixes: 8cb48fe169dd ("MIPS: Provide correct siginfo_t.si_stime") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Petr Malat <oss@malat.biz> Cc: linux-mips@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Fix siginfo.h to use strict posix types	James Hogan
	commit 5daebc477da4dfeb31ae193d83084def58fd2697 upstream. Commit 85efde6f4e0d ("make exported headers use strict posix types") changed the asm-generic siginfo.h to use the __kernel_* types, and commit 3a471cbc081b ("remove __KERNEL_STRICT_NAMES") make the internal types accessible only to the kernel, but the MIPS implementation hasn't been updated to match. Switch to proper types now so that the exported asm/siginfo.h won't produce quite so many compiler errors when included alone by a user program. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Christopher Ferris <cferris@google.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12477/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07	MIPS: Avoid using unwind_stack() with usermode	James Hogan
	commit 81a76d7119f63c359750e4adeff922a31ad1135f upstream. When showing backtraces in response to traps, for example crashes and address errors (usually unaligned accesses) when they are set in debugfs to be reported, unwind_stack will be used if the PC was in the kernel text address range. However since EVA it is possible for user and kernel address ranges to overlap, and even without EVA userland can still trigger an address error by jumping to a KSeg0 address. Adjust the check to also ensure that it was running in kernel mode. I don't believe any harm can come of this problem, since unwind_stack() is sufficiently defensive, however it is only meant for unwinding kernel code, so to be correct it should use the raw backtracing instead. Signed-off-by: James Hogan <james.hogan@imgtec.com> Reviewed-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/11701/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> (cherry picked from commit d2941a975ac745c607dfb590e92bb30bc352dad9) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>