summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-01-10um: Add seccomp supportMickaël Salaün
This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64 subarchitectures. secure_computing() is called first in handle_syscall() so that the syscall emulation will be aborted quickly if matching a seccomp rule. This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: James Hogan <james.hogan@imgtec.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: Add full asm/syscall.h supportMickaël Salaün
Add subarchitecture-independent implementation of asm-generic/syscall.h allowing access to user system call parameters and results: * syscall_get_nr() * syscall_rollback() * syscall_get_error() * syscall_get_return_value() * syscall_set_return_value() * syscall_get_arguments() * syscall_set_arguments() * syscall_get_arch() provided by arch/x86/um/asm/syscall.h This provides the necessary syscall helpers needed by HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error(). This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10selftests/seccomp: Remove the need for HAVE_ARCH_TRACEHOOKMickaël Salaün
Some architectures do not implement PTRACE_GETREGSET nor PTRACE_SETREGSET (required by HAVE_ARCH_TRACEHOOK) but only implement PTRACE_GETREGS and PTRACE_SETREGS (e.g. User-mode Linux). This improve seccomp selftest portability for architectures without HAVE_ARCH_TRACEHOOK support by defining a new trigger HAVE_GETREGS. For now, this is only enabled for i386 and x86_64 architectures. This is required to be able to run this tests on User-mode Linux. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: Fix ptrace GETREGS/SETREGS bugsMickaël Salaün
This fix two related bugs: * PTRACE_GETREGS doesn't get the right orig_ax (syscall) value * PTRACE_SETREGS can't set the orig_ax value (erased by initial value) Get rid of the now useless and error-prone get_syscall(). Fix inconsistent behavior in the ptrace implementation for i386 when updating orig_eax automatically update the syscall number as well. This is now updated in handle_syscall(). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Anton Ivanov <aivanov@brocade.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: link with -lpthreadVegard Nossum
Similarly to commit fb1770aa78a43530940d0c2dd161e77bc705bdac, with gcc 5 on Ubuntu and CONFIG_STATIC_LINK=y I was seeing these linker errors: /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0xcd): undefined reference to `pthread_once' /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0x126): undefined reference to `pthread_attr_init' /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0x168): undefined reference to `pthread_attr_setdetachstate' [...] Obviously we also need -lpthread for librt.a. Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Update UBD to use pread/pwrite family of functionsAnton Ivanov
This decreases the number of syscalls per read/write by half. Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Do not change hard IRQ flags in soft IRQ processingAnton Ivanov
Software IRQ processing in generic architectures assumes that the exit out of hard IRQ may have re-enabled interrupts (some architectures may have an implicit EOI). It presumes them enabled and toggles the flags once more just in case unless this is turned off in the architecture specific hardirq.h by setting __ARCH_IRQ_EXIT_IRQS_DISABLED This patch adds this to UML where due to the way IRQs are handled it is an optimization (it works fine without it too). Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Prevent IRQ handler reentrancyAnton Ivanov
The existing IRQ handler design in UML does not prevent reentrancy This is mitigated by fd-enable/fd-disable semantics for the IO portion of the UML subsystem. The timer, however, can and is re-entered resulting in very deep stack usage and occasional stack exhaustion. This patch prevents this by checking if there is a timer interrupt in-flight before processing any pending timer interrupts. Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10uml: flush stdout before forkingVegard Nossum
I was seeing some really weird behaviour where piping UML's output somewhere would cause output to get duplicated: $ ./vmlinux | head -n 40 Checking that ptrace can change system call numbers...Core dump limits : soft - 0 hard - NONE OK Checking syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Checking advanced syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Core dump limits : soft - 0 hard - NONE This is because these tests do a fork() which duplicates the non-empty stdout buffer, then glibc flushes the duplicated buffer as each child exits. A simple workaround is to flush before forking. Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10uml: fix hostfs mknod()Vegard Nossum
An inverted return value check in hostfs_mknod() caused the function to return success after handling it as an error (and cleaning up). It resulted in the following segfault when trying to bind() a named unix socket: Pid: 198, comm: a.out Not tainted 4.4.0-rc4 RIP: 0033:[<0000000061077df6>] RSP: 00000000daae5d60 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208 RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600 RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000 R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000 R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88 Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6 CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1 Stack: e027d620 dfc54208 0000006f da981398 61bee000 0000c1ed daae5de0 0000006e e027d620 dfcd4208 00000005 6092a460 Call Trace: [<60dedc67>] SyS_bind+0xf7/0x110 [<600587be>] handle_syscall+0x7e/0x80 [<60066ad7>] userspace+0x3e7/0x4e0 [<6006321f>] ? save_registers+0x1f/0x40 [<6006c88e>] ? arch_prctl+0x1be/0x1f0 [<60054985>] fork_handler+0x85/0x90 Let's also get rid of the "cosmic ray protection" while we're at it. Fixes: e9193059b1b3 "hostfs: fix races in dentry_name() and inode_name()" Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10m68k: Provide __phys_to_pfn() and __pfn_to_phys()Sudip Mukherjee
The defconfig build of m68k was failing with the error: implicit declaration of function '__pfn_to_phys' Other architectures have added <asm/memory.h>, but if we do so here then we will also get redeclaration of some other functions. So it is better to copy these macros into page.h. Fixes: 0a3c3bf11240 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()") Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> (m68knommu) [geert: Apply to page.h instead of page_mm.h to cover nommu, reword] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2016-01-10m68k/atari, m68k/sun3: Fix SCSI platform device registration when driver is ↵Finn Thain
modular Fixes: 3ff228af84b5 ("atari_scsi: Convert to platform device") Fixes: 0d31f8759109 ("sun3_scsi: Convert to platform device") Reported-by: Michael Schmitz <schmitzmic@gmail.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2016-01-09Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "A single fix for machines with pages > 4k (PPC mostly). There's a bug in our optimal transfer size code where we don't account for pages > 4k and can set the transfer size to be less than the page size causing nasty failures" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: sd: Reject optimal transfer length smaller than page size
2016-01-09Merge tag 'pci-v4.4-fixes-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixlet from Bjorn Helgaas: "This marks the TI DRA7xx host bridge driver as broken. Apparently it has never worked without some additional out-of-tree code, so I'm going to mark it broken now and remove it completely next cycle unless it's fixed" * tag 'pci-v4.4-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: dra7xx: Mark driver as broken
2016-01-09Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New features: - Allow using trace events fields as sort order keys, making 'perf evlist --trace_fields' show those, and then the user can select a subset and use like: perf top -e sched:sched_switch -s prev_comm,next_comm That works as well in 'perf report' when handling files containing tracepoints. The default when just tracepoint events are found in a perf.data file is to format it like ftrace, using the libtraceevent formatters, plugins, etc (Namhyung Kim) - Add support in 'perf script' to process 'perf stat record' generated files, culminating in a python perf script that calculates CPI (Cycles per Instruction) (Jiri Olsa) - Show random perf tool tips in the 'perf report' bottom line (Namhyung Kim) - perf report now defaults to --group if the perf.data file has grouped events, try it with: # perf record -e '{cycles,instructions}' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ] # perf report # Samples: 1K of event 'anon group { cycles, instructions }' # Event count (approx.): 1955219195 # # Overhead Command Shared Object Symbol 2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle 1.05% 0.33% firefox libxul.so [.] js::SetObjectElement 1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno 0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab 0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1> 0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay 0.62% 1.27% firefox libxul.so [.] js::GetIterator 0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty 0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining User visible fixes: - Coect data mmaps so that the DWARF unwinder can handle usecases needing them, like softice (Jiri Olsa) - Decay callchains in fractal mode, fixing up cases where 'perf top -g' would show entries with more than 100% (Namhyung Kim) Infrastructure changes: - Sync tools/lib with the lib/ in the kernel sources for find_bit.c and move bitmap.[ch] from tools/perf/util/ to tools/lib/ (Arnaldo Carvalho de Melo) - No need to set attr.sample_freq in some 'perf test' entries that only want to deal with PERF_RECORD_ meta-events, improve a bit error output for CQM test (Arnaldo Carvalho de Melo) - Fix python binding build, adding some missing object files now required due to cpumap using find_bit stuff (Arnaldo Carvalho de Melo) - tools/build improvemnts (Jiri Olsa) - Add more files to cscope/ctags databases (Jiri Olsa) - Do not show 'trace' in 'perf help' if it is not compiled in (Jiri Olsa) - Make perf_evlist__open() open evsels with their cpus and threads, like perf record does, making them consistent (Adrian Hunter) - Fix pmu snapshot initialization bug (Stephane Eranian) - Add missing headers in perf's MANIFEST (Wang Nan) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-09hwmon: (nct6683) Add basic support for NCT6683 on Mitac boardsGuenter Roeck
Mitac microcode differs from Intel microcode. One key difference is that pwm values can be written. Detect vendor from customer ID field and no longer use DMI data to identify which microcode is running on the chip. Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-01-08vmstat: allocate vmstat_wq before it is usedMichal Hocko
kernel test robot has reported the following crash: BUG: unable to handle kernel NULL pointer dereference at 00000100 IP: [<c1074df6>] __queue_work+0x26/0x390 *pdpt = 0000000000000000 *pde = f000ff53f000ff53 *pde = f000ff53f000ff53 Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.4.0-rc4-00139-g373ccbe #1 Workqueue: events vmstat_shepherd task: cb684600 ti: cb7ba000 task.ti: cb7ba000 EIP: 0060:[<c1074df6>] EFLAGS: 00010046 CPU: 0 EIP is at __queue_work+0x26/0x390 EAX: 00000046 EBX: cbb37800 ECX: cbb37800 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: cb7bbe68 ESP: cb7bbe38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 00000100 CR3: 01fd5000 CR4: 000006b0 Stack: Call Trace: __queue_delayed_work+0xa1/0x160 queue_delayed_work_on+0x36/0x60 vmstat_shepherd+0xad/0xf0 process_one_work+0x1aa/0x4c0 worker_thread+0x41/0x440 kthread+0xb0/0xd0 ret_from_kernel_thread+0x21/0x40 The reason is that start_shepherd_timer schedules the shepherd work item which uses vmstat_wq (vmstat_shepherd) before setup_vmstat allocates that workqueue so if the further initialization takes more than HZ we might end up scheduling on a NULL vmstat_wq. This is really unlikely but not impossible. Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress") Reported-by: kernel test robot <ying.huang@linux.intel.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-08compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)Jann Horn
This replaces all code in fs/compat_ioctl.c that translated ioctl arguments into a in-kernel structure, then performed do_ioctl under set_fs(KERNEL_DS), with code that allocates data on the user stack and can call the VFS ioctl handler under USER_DS. This is done as a hardening measure because the caller does not know what kind of ioctl handler will be invoked, only that no corresponding compat_ioctl handler exists and what the ioctl command number is. The accidental invocation of an unlocked_ioctl handler that unexpectedly calls copy_to_user could be a severe security issue. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08compat_ioctl: don't pass fd around when not neededAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08compat_ioctl: don't look up the fd twiceJann Horn
In code in fs/compat_ioctl.c that translates ioctl arguments into a in-kernel structure, then performs sys_ioctl, possibly under set_fs(KERNEL_DS), this commit changes the sys_ioctl calls to do_ioctl calls. do_ioctl is a new function that does the same thing as sys_ioctl, but doesn't look up the fd again. This change is made to avoid (potential) security issues because of ioctl handlers that accept one of the ioctl commands I2C_FUNCS, VIDEO_GET_EVENT, MTIOCPOS, MTIOCGET, TIOCGSERIAL, TIOCSSERIAL, RTC_IRQP_READ, RTC_EPOCH_READ. This can happen for multiple reasons: - The ioctl command number could be reused. - The ioctl handler might not check the full ioctl command. This is e.g. true for drm_ioctl. - The ioctl handler is very special, e.g. cuse_file_ioctl The real issue is that set_fs(KERNEL_DS) is used here, but that's fixed in a separate commit "compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)". This change mitigates potential security issues by preventing a race that permits invocation of unlocked_ioctl handlers under KERNEL_DS through compat code even if a corresponding compat_ioctl handler exists. So far, no way has been identified to use this to damage kernel memory without having CAP_SYS_ADMIN in the init ns (with the capability, doing reads/writes at arbitrary kernel addresses should be easy through CUSE's ioctl handler with FUSE_IOCTL_UNRESTRICTED set). [AV: two missed sys_ioctl() taken care of] Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08dm snapshot: fix hung bios when copy error occursMikulas Patocka
When there is an error copying a chunk dm-snapshot can incorrectly hold associated bios indefinitely, resulting in hung IO. The function copy_callback sets pe->error if there was error copying the chunk, and then calls complete_exception. complete_exception calls pending_complete on error, otherwise it calls commit_exception with commit_callback (and commit_callback calls complete_exception). The persistent exception store (dm-snap-persistent.c) assumes that calls to prepare_exception and commit_exception are paired. persistent_prepare_exception increases ps->pending_count and persistent_commit_exception decreases it. If there is a copy error, persistent_prepare_exception is called but persistent_commit_exception is not. This results in the variable ps->pending_count never returning to zero and that causes some pending exceptions (and their associated bios) to be held forever. Fix this by unconditionally calling commit_exception regardless of whether the copy was successful. A new "valid" parameter is added to commit_exception -- when the copy fails this parameter is set to zero so that the chunk that failed to copy (and all following chunks) is not recorded in the snapshot store. Also, remove commit_callback now that it is merely a wrapper around pending_complete. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2016-01-08Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "This is the final small set of ARM SoC bug fixes for linux-4.4, almost all regressions: OMAP: - data corruption on the Nokia N900 flash Allwinner: - Two defconfig change to get USB working again ARM Versatile: - Interrupt numbers gone bad after an older bug fix Nomadik: - Crashes from incorrect L2 cache settings VIA vt8500: - SD/MMC support on WM8650 never worked" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: dts: vt8500: Add SDHC node to DTS file for WM8650 ARM: Fix broken USB support in multi_v7_defconfig for sunxi devices ARM: versatile: fix MMC/SD interrupt assignment ARM: nomadik: set latencies to 8 cycles ARM: OMAP2+: Fix onenand rate detection to avoid filesystem corruption ARM: Fix broken USB support in sunxi_defconfig
2016-01-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fix from Paolo Bonzini: "A simple fix. I'm sending it before the merge window, because it refines a patch found in your master branch but not yet in the kvm/next branch that is destined for 4.5" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: only channel 0 of the i8254 is linked to the HPET
2016-01-08Merge tag 'pm+acpi-4.4-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Just one obvious fix that adds a missing function argument in ACPI code introduced recently (Kees Cook)" * tag 'pm+acpi-4.4-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / property: avoid leaking format string into kobject name
2016-01-08Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A handful of x86 fixes: - a syscall ABI fix, fixing an Android breakage - a Xen PV guest fix relating to the RTC device, causing a non-working console - a Xen guest syscall stack frame fix - an MCE hotplug CPU crash fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/numachip: Fix NumaConnect2 MMCFG PCI access x86/entry: Restore traditional SYSENTER calling convention x86/entry: Fix some comments x86/paravirt: Prevent rtc_cmos platform device init on PV guests x86/xen: Avoid fast syscall path for Xen PV guests x86/mce: Ensure offline CPUs don't participate in rendezvous process
2016-01-08Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc scheduler fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Reset task's lockless wake-queues on fork() sched/core: Fix unserialized r-m-w scribbling stuff sched/core: Check tgid in is_global_init() sched/fair: Fix multiplication overflow on 32-bit systems
2016-01-08Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two core subsystem fixes, plus a handful of tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix race in swevent hash perf: Fix race in perf_event_exec() perf list: Robustify event printing routine perf list: Add support for PERF_COUNT_SW_BPF_OUT perf hists browser: Fix segfault if use symbol filter in cmdline perf hists browser: Reset selection when refresh perf hists browser: Add NULL pointer check to prevent crash perf buildid-list: Fix return value of perf buildid-list -k perf buildid-list: Show running kernel build id fix
2016-01-08Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fixes a core IRQ subsystem deadlock" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent chip buslock deadlock
2016-01-08Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block revert from Jens Axboe: "The previous pull request had a split fix for NVMe, however there are corner cases where that ends up blowing up. So let's revert it for 4.4. The regression isn't introduced in this cycle, and it's "just" a performance regression, not a stability/integrity issue" * 'for-linus' of git://git.kernel.dk/linux-block: Revert "block: Split bios on chunk boundaries"
2016-01-08Merge tag 'dmaengine-fix-4.4' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull dmaengine fixes from Vinod Koul: "Late fixes for 4.4 are three fixes for drivers which include a revert of mic-x100 fix which is causing regression, xgene fix for double IRQ and async_tx fix to use GFP_NOWAIT" * tag 'dmaengine-fix-4.4' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: xgene-dma: Fix double IRQ issue by setting IRQ_DISABLE_UNLAZY flag async_tx: use GFP_NOWAIT rather than GFP_IO dmaengine: Revert "dmaengine: mic_x100: add missing spin_unlock"
2016-01-08Merge branch 'dmi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull dmi fix from Jean Delvare. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6
2016-01-08Merge tag 'sound-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A slightly higher volume than a new year's wish, but not too worrisome: a large LOC is only for HD-audio device-specific quirks, so fairly safe to apply. The rest ASoC fixes are all trivial and small; a simple replacement of mutex call with nested lock version, a few Arizona and Realtek codec fixes, and a regression fix for Skylake firmware handling" * tag 'sound-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: Intel: Skylake: Fix the memory leak ASoC: Intel: Skylake: Revert previous broken fix memory leak fix ASoC: Use nested lock for snd_soc_dapm_mutex_lock ASoC: rt5645: add sys clk detection ALSA: hda - Add keycode map for alc input device ALSA: hda - Add mic mute hotkey quirk for Lenovo ThinkCentre AIO ASoC: arizona: Fix bclk for sample rates that are multiple of 4kHz
2016-01-08x86/mm: Micro-optimise clflush_cache_range()Chris Wilson
Whilst inspecting the asm for clflush_cache_range() and some perf profiles that required extensive flushing of single cachelines (from part of the intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading boot_cpu_data.x86_clflush_size on every iteration of the loop. We can manually hoist that read which perf regarded as taking ~25% of the function time for a single cacheline flush. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Borislav Petkov <bp@suse.de> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com> Link: http://lkml.kernel.org/r/1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-08kvm/x86: Hyper-V SynIC timers tracepointsAndrey Smetanin
Trace the following Hyper SynIC timers events: * periodic timer start * one-shot timer start * timer callback * timer expiration and message delivery result * timer config setup * timer count setup * timer cleanup Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Hyper-V SynIC tracepointsAndrey Smetanin
Trace the following Hyper SynIC events: * set msr * set sint irq * ack sint * sint irq eoi Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Update SynIC timers on guest entry onlyAndrey Smetanin
Consolidate updating the Hyper-V SynIC timers in a single place: on guest entry in processing KVM_REQ_HV_STIMER request. This simplifies the overall logic, and makes sure the most current state of msrs and guest clock is used for arming the timers (to achieve that, KVM_REQ_HV_STIMER has to be processed after KVM_REQ_CLOCK_UPDATE). Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Skip SynIC vector check for QEMU sideAndrey Smetanin
QEMU zero-inits Hyper-V SynIC vectors. We should allow that, and don't reject zero values if set by the host. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Hyper-V fix SynIC timer disabling conditionAndrey Smetanin
Hypervisor Function Specification(HFS) doesn't require to disable SynIC timer at timer config write if timer->count = 0. So drop this check, this allow to load timers MSR's during migration restore, because config are set before count in QEMU side. Also fix condition according to HFS doc(15.3.1): "It is not permitted to set the SINTx field to zero for an enabled timer. If attempted, the timer will be marked disabled (that is, bit 0 cleared) immediately." Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Reorg stimer_expiration() to better control timer restartAndrey Smetanin
Split stimer_expiration() into two parts - timer expiration message sending and timer restart/cleanup based on timer state(config). This also fixes a bug where a one-shot timer message whose delivery failed once would get lost for good. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Hyper-V unify stimer_start() and stimer_restart()Andrey Smetanin
This will be used in future to start Hyper-V SynIC timer in several places by one logic in one function. Changes v2: * drop stimer->count == 0 check inside stimer_start() * comment stimer_start() assumptions Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Drop stimer_stop() functionAndrey Smetanin
The function stimer_stop() is called in one place so remove the function and replace it's call by function content. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08kvm/x86: Hyper-V timers fix incorrect logical operationAndrey Smetanin
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08KVM: move architecture-dependent requests to arch/Paolo Bonzini
Since the numbers now overlap, it makes sense to enumerate them in asm/kvm_host.h rather than linux/kvm_host.h. Functions that refer to architecture-specific requests are also moved to arch/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08KVM: renumber vcpu->request bitsPaolo Bonzini
Leave room for 4 more arch-independent requests. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08KVM: document which architecture uses each request bitPaolo Bonzini
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requestsPaolo Bonzini
Suggested-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> [Takuya moved all subsequent constants to fill the void, but that is useless in view of the following patches. So this change looks nothing like the original. - Paolo] Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08Merge tag 'kvm-s390-next-4.5-3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Feature and fix for 4.5 - allow the runtime instrumentation support inside the guests - remove a useless memory barrier
2016-01-08perf evlist: Add --trace-fields option to show trace fieldsNamhyung Kim
To use dynamic sort keys, it might be good to add an option to see the list of field names. $ perf evlist -i perf.data.sched sched:sched_switch sched:sched_stat_wait sched:sched_stat_sleep sched:sched_stat_iowait sched:sched_stat_runtime sched:sched_process_fork sched:sched_wakeup sched:sched_wakeup_new sched:sched_migrate_task # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events $ perf evlist -i perf.data.sched --trace-fields sched:sched_switch: trace_fields: prev_comm,prev_pid,prev_prio,prev_state,next_comm,next_pid,next_prio sched:sched_stat_wait: trace_fields: comm,pid,delay sched:sched_stat_sleep: trace_fields: comm,pid,delay sched:sched_stat_iowait: trace_fields: comm,pid,delay sched:sched_stat_runtime: trace_fields: comm,pid,runtime,vruntime sched:sched_process_fork: trace_fields: parent_comm,parent_pid,child_comm,child_pid sched:sched_wakeup: trace_fields: comm,pid,prio,success,target_cpu sched:sched_wakeup_new: trace_fields: comm,pid,prio,success,target_cpu sched:sched_migrate_task: trace_fields: comm,pid,prio,orig_cpu,dest_cpu Committer notes: For another file, in verbose mode: # perf evlist -v --trace-fields sched:sched_switch: type: 2, size: 112, config: 0x10b, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, trace_fields: prev_comm,prev_pid,prev_prio,prev_state,next_comm,next_pid,next_prio # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1452125549-1511-5-git-send-email-namhyung@kernel.org [ Replaced 'trace_fields=' with 'trace_fields: ' to make the output consistent in -v mode ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-08perf record: Store data mmaps for dwarf unwindJiri Olsa
Currently we don't synthesize data mmap by default. It depends on -d option, that enables data address sampling. But we've seen cases (softice) where DWARF unwinder went through non executable mmaps, which we need to lookup in MAP__VARIABLE tree. Making data mmaps to be synthesized for dwarf unwind as well. Reported-by: Noel Grandin <noelgrandin@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160107133022.GA32115@krava.brq.redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-08perf libdw: Check for mmaps also in MAP__VARIABLE treeJiri Olsa
We've seen cases (softice) where DWARF unwinder went through non executable mmaps, which we need to lookup in MAP__VARIABLE tree. Reported-and-Tested-by: Noel Grandin <noelgrandin@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1452158050-28061-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>