|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This patch adds a Userspace IO driver for netX-based fieldbus cards by
Hilscher (see http://www.hilscher.com). ATM, cifX and comX cards are
supported. The userspace part for this driver is provided by Hilscher
and should come with the card.
The driver has been in use for several months now and has been tested by
people at Hilscher and Linutronix.
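For readers unfamiliar with UIO: a driver of this kind essentially just exports
the card's PCI BAR (and interrupt) to userspace. Below is a hedged, minimal
sketch of that registration pattern; the function name, BAR layout and omission
of IRQ handling are illustrative assumptions, not the actual uio_netx code.

#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/uio_driver.h>

/* Illustrative sketch: export the card's first PCI BAR to userspace
 * via the UIO framework. The real driver also handles the card IRQ. */
static int netx_sketch_probe(struct pci_dev *pdev,
                             const struct pci_device_id *id)
{
    struct uio_info *info;

    info = kzalloc(sizeof(*info), GFP_KERNEL);
    if (!info)
        return -ENOMEM;

    if (pci_enable_device(pdev))
        goto err_free;

    info->name = "netx";                              /* shows up under /sys/class/uio */
    info->version = "0.0.1";
    info->mem[0].addr = pci_resource_start(pdev, 0);  /* assumed BAR 0 */
    info->mem[0].size = pci_resource_len(pdev, 0);
    info->mem[0].memtype = UIO_MEM_PHYS;              /* userspace mmap()s this */
    info->irq = UIO_IRQ_NONE;                         /* IRQ handling omitted in this sketch */

    if (uio_register_device(&pdev->dev, info))
        goto err_disable;

    pci_set_drvdata(pdev, info);
    return 0;

err_disable:
    pci_disable_device(pdev);
err_free:
    kfree(info);
    return -ENODEV;
}

Userspace then mmap()s the corresponding /dev/uioX node to reach the card's
memory window.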
Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Compiling stuff helps even if there is no hardware to test on.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When the plugged CPU sets the online flag it enables interrupts and
goes idle. When an interrupt happens and tries to wake up a softirq
thread before the other CPU sets the active flag, the softirq
thread is put on one of the active CPUs. Prevent this by waiting for
the cpu_active bit.
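A minimal sketch of such a wait; where exactly it sits in the hotplug path is
an assumption, not part of this changelog:

    /* sketch: do not proceed until the boot cpu has also marked us
     * active, so wakeups cannot be routed away from this cpu */
    while (!cpu_active(smp_processor_id()))
        cpu_relax();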
Temporary workaround. Needs more thought.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
drivers/dca: convert dca_lock to a raw spinlock
[ 25.607517] igb 0000:01:00.0: DCA enabled
[ 25.607524] BUG: sleeping function called from invalid context at kernel/rtmutex.c:684
[ 25.607528] pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 0, pid: 1805, name: modprobe
[ 25.607532] Pid: 1805, comm: modprobe Tainted: G M W N 2.6.33.5-rt23-rt_debug #2
[ 25.607536] Call Trace:
[ 25.607557] [<ffffffff820078a1>] try_stack_unwind+0x151/0x1a0
[ 25.607566] [<ffffffff820062c2>] dump_trace+0x92/0x370
[ 25.607573] [<ffffffff8200731c>] show_trace_log_lvl+0x5c/0x80
[ 25.607578] [<ffffffff82007355>] show_trace+0x15/0x20
[ 25.607587] [<ffffffff823f4588>] dump_stack+0x77/0x8f
[ 25.607595] [<ffffffff82043f2a>] __might_sleep+0x11a/0x130
[ 25.607602] [<ffffffff823f7b93>] rt_spin_lock+0x83/0x90
[ 25.607611] [<ffffffffa0209138>] dca_common_get_tag+0x28/0x80 [dca]
[ 25.607622] [<ffffffffa02091c8>] dca3_get_tag+0x18/0x20 [dca]
[ 25.607634] [<ffffffffa0244e71>] igb_update_dca+0xb1/0x1d0 [igb]
[ 25.607649] [<ffffffffa0244ff5>] igb_setup_dca+0x65/0x80 [igb]
[ 25.607663] [<ffffffffa02535a6>] igb_probe+0x946/0xe4d [igb]
[ 25.607678] [<ffffffff82247517>] local_pci_probe+0x17/0x20
[ 25.607686] [<ffffffff82248661>] pci_device_probe+0x121/0x130
[ 25.607699] [<ffffffff822e4832>] driver_probe_device+0xd2/0x2e0
[ 25.607707] [<ffffffff822e4adb>] __driver_attach+0x9b/0xa0
[ 25.607714] [<ffffffff822e3d1b>] bus_for_each_dev+0x6b/0xa0
[ 25.607720] [<ffffffff822e4591>] driver_attach+0x21/0x30
[ 25.607727] [<ffffffff822e3425>] bus_add_driver+0x1e5/0x350
[ 25.607734] [<ffffffff822e4e41>] driver_register+0x81/0x160
[ 25.607742] [<ffffffff8224890f>] __pci_register_driver+0x6f/0xf0
[ 25.607752] [<ffffffffa011505b>] igb_init_module+0x5b/0x5d [igb]
[ 25.607769] [<ffffffff820001dd>] do_one_initcall+0x3d/0x1a0
[ 25.607778] [<ffffffff820961f6>] sys_init_module+0xe6/0x270
[ 25.607786] [<ffffffff82003232>] system_call_fastpath+0x16/0x1b
[ 25.607794] [<00007f84d6783f4a>] 0x7f84d6783f4a
[ tglx: Fixed the domain allocation which was calling kzalloc from the
irq disabled section ]
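The conversion itself follows the usual raw-spinlock pattern; a hedged sketch
of one critical section (the helper name is illustrative, not the actual dca
code):

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(dca_lock);   /* was DEFINE_SPINLOCK(dca_lock) */

static u8 dca_get_tag_sketch(int cpu)
{
    unsigned long flags;
    u8 tag = 0;

    /* raw spinlocks do not sleep on RT, so this is safe from the
     * atomic context shown in the trace above */
    raw_spin_lock_irqsave(&dca_lock, flags);
    /* walk the provider list and compute the tag (elided) */
    raw_spin_unlock_irqrestore(&dca_lock, flags);

    return tag;
}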
Signed-off-by: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1278491341.7926.4.camel@marge.simson.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When the plugged CPU sets the online flag it enables interrupts and
goes idle. When an interrupt happens and tries to wake up a softirq
thread before the other CPU sets the active flag, the softirq
thread is put on one of the active CPUs. Prevent this by waiting for
the cpu_active bit.
Temporary workaround. Needs more thought.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When the idle task exits we do not want to wake the cpu-bound
desched thread.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
cmci_discover_lock() is taken in atomic contexts (cpu bring
up). Convert it to a raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
xt_info locking is an open-coded rw_lock which works fine in mainline,
but on RT it's racy. Replace it with a real rwlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
timekeeping suspend/resume calls read_persistent_clock(), which takes
rtc_lock. That results in might-sleep warnings because at that point
we run with interrupts disabled.
We cannot convert rtc_lock to a raw spinlock as that would trigger
other might-sleep warnings.
As a temporary workaround we disable the might-sleep warnings by
setting system_state to SYSTEM_SUSPEND before calling sysdev_suspend()
and restoring it to SYSTEM_RUNNING after sysdev_resume().
Needs to be revisited.
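Sketched, the workaround looks roughly like this (SYSTEM_SUSPEND is assumed to
be a new system_state value introduced next to SYSTEM_RUNNING; the surrounding
suspend code is elided):

    /* interrupts are already disabled here; silence might-sleep
     * checks while the sysdevs are suspended */
    system_state = SYSTEM_SUSPEND;
    error = sysdev_suspend(PMSG_SUSPEND);
    if (!error) {
        /* ... enter the low-power state and come back ... */
        sysdev_resume();
    }
    system_state = SYSTEM_RUNNING;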
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The requeue_pi mechanism introduced proxy locking of the rtmutex. This
creates a scenario where a task can wakeup, not knowing it has been
enqueued on an rtmutex. Blocking on an hb->lock() can overwrite a
valid value in current->pi_blocked_on, leading to an inconsistent
state.
Prevent overwriting pi_blocked_on by serializing on the waiter's
pi_lock and using the new PI_WAKEUP_INPROGRESS state flag to indicate
a waiter that has been woken by a timeout or signal. This prevents the
rtmutex code from adding the waiter to the rtmutex wait list,
returning EAGAIN to futex_requeue(), which will in turn ignore the
waiter during a requeue. Care is taken to allow current to block on
locks even if PI_WAKEUP_INPROGRESS is set.
During normal wakeup, this results in one less hb->lock protected
section. In the pre-requeue timeout-or-signal wakeup case, this removes
the "greedy locking" behavior; no attempt will be made to acquire the
lock.
[ tglx: take pi_lock with lock_irq(), removed paranoid warning,
plugged pi_state and pi_blocked_on leak, adjusted some
comments ]
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <4C3C1DCF.9090509@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
commit 0017d735092844118bef006696a750a0e4ef6ebd
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Mar 24 18:34:10 2010 +0100
sched: Fix TASK_WAKING vs fork deadlock
Oleg noticed a few races with the TASK_WAKING usage on fork.
- since TASK_WAKING is basically a spinlock, it should be IRQ safe
- since we set TASK_WAKING (*) without holding rq->lock it could
be that there still is an rq->lock holder, thereby not actually
providing full serialization.
(*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.
Cure the second issue by not setting TASK_WAKING in sched_fork(), but
only temporarily in wake_up_new_task() while calling select_task_rq().
Cure the first by holding rq->lock around the select_task_rq() call;
this will disable IRQs. This, however, requires that we push down the
rq->lock release into select_task_rq_fair()'s cgroup stuff.
Because select_task_rq_fair() still needs to drop the rq->lock we
cannot fully get rid of TASK_WAKING.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
with select_fallback_rq(). It can be called from any context and can't use
any cpuset locks including task_lock(). It is called when the task doesn't
have online cpus in ->cpus_allowed but ttwu/etc must be able to find a
suitable cpu.
I am not proud of this patch. Everything which needs such a fat comment
can't be good even if correct. But I'd prefer to not change the locking
rules in the code I hardly understand, and in any case I believe this
simple change makes the code much more correct compared to the deadlocks
we currently have.
[ upstream commit: 9084bb8246ea935b98320554229e2f371f7f52fa ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091027.GA9155@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
_cpu_down() changes the current task's affinity and then recovers it at
the end. The problems are well known: we can't restore old_allowed if it
was bound to the now-dead-cpu, and we can race with the userspace which
can change cpu-affinity during unplug.
_cpu_down() should not play with current->cpus_allowed at all. Instead,
take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
removes the dying cpu from cpu_online_mask.
[ upstream commit: 6a1bdc1b577ebcb65f6603c57f8347309bc4ab13 ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091023.GA9148@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
sched_exec()->select_task_rq() reads/updates ->cpus_allowed lockless.
This can race with other CPUs updating our ->cpus_allowed, and this
looks meaningless to me.
The task is current and running, it must have online cpus in ->cpus_allowed,
the fallback mode is bogus. And, if ->sched_class returns the "wrong" cpu,
this likely means we raced with set_cpus_allowed() which was called
for a reason; why should sched_exec() retry and call ->select_task_rq()
again?
Change the code to call sched_class->select_task_rq() directly and do
nothing if the returned cpu is wrong after re-checking under rq->lock.
From now on, task_struct->cpus_allowed is always stable under TASK_WAKING,
and select_fallback_rq() is always called either under rq->lock or when the
caller owns TASK_WAKING (select_task_rq).
[ upstream commit: 30da688ef6b76e01969b00608202fff1eed2accc ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091019.GA9141@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The previous patch preserved the retry logic, but it looks unneeded.
__migrate_task() can only fail if we raced with migration after we dropped
the lock, but in this case the caller of set_cpus_allowed/etc must initiate
migration itself if ->on_rq == T.
We already fixed p->cpus_allowed; the changes in the active/online masks
must be visible to the racer, which should migrate the task to an online
cpu correctly.
[ upstream commit: c1804d547dc098363443667609c272d1e4d15ee8 ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091014.GA9138@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(): we must recheck
that the task was not migrated after we take the lock, and other callers do
not need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
[ upstream: 1445c08d06c5594895b4fae952ef8a457e89c390 ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091010.GA9131@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
cpuset_lock/cpuset_cpus_allowed_locked code
This patch just states the fact that the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.
- cpuset_lock() doesn't really work. It is needed for
cpuset_cpus_allowed_locked() but we can't take this lock in
try_to_wake_up()->select_fallback_rq() path.
- cpuset_lock() is deadlockable. Suppose that a task T bound to a CPU takes
callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex,
stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
cpuset_lock() and hangs forever because the CPU is already dead and thus
T can't be scheduled.
- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
which is not irq-safe, but try_to_wake_up() can be called from irq.
Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
we currently do without CONFIG_CPUSETS.
Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() paths.
The subsequent patches try to fix these problems.
[ upstream: 897f0b3c3ff40b443c84e271bef19bd6ae885195 ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091003.GA9123@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Trivial typo fix. rq->migration_thread can be NULL after
task_rq_unlock(); this is why we have "mt", which should be
used instead.
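In sketch form, the fixed pattern snapshots the pointer while still holding the
lock and only uses the snapshot afterwards (illustrative, not the exact
upstream hunk):

    struct task_struct *mt = rq->migration_thread;  /* snapshot under rq->lock */

    task_rq_unlock(rq, &flags);
    wake_up_process(mt);            /* rq->migration_thread may be NULL by now */
    wait_for_completion(&req.done);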
[ upstream: 47a70985e5c093ae03d8ccf633c70a93761d86f2 ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100330165829.GA18284@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Conflicts:
Makefile
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
We still have sporadic and hard to debug problems. Revert it for now
and revisit with Nick's new version.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
commit 2e3219b5c8a2e44e0b83ae6e04f52f20a82ac0f2 upstream.
Commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 ("sctp: Fix skb_over_panic
resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)")
caused the 'error cause' to never be added to the ERROR chunk, due to a
typo in the valid-length check in sctp_init_cause_fixed().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6377a7ae1ab82859edccdbc8eaea63782efb134d upstream.
On specific platforms, MSI is unreliable on some of the QLA24xx chips, resulting
in fatal I/O errors under load, as reported in <http://bugs.debian.org/572322>
and by some RHEL customers.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cea7daa3589d6b550546a8c8963599f7c1a3ae5c upstream.
find_keyring_by_name() can gain access to a keyring that has had its reference
count reduced to zero, and is thus ready to be freed. This then allows the
dead keyring to be brought back into use whilst it is being destroyed.
The following timeline illustrates the process:
|(cleaner) (user)
|
| free_user(user) sys_keyctl()
| | |
| key_put(user->session_keyring) keyctl_get_keyring_ID()
| || //=> keyring->usage = 0 |
| |schedule_work(&key_cleanup_task) lookup_user_key()
| || |
| kmem_cache_free(,user) |
| . |[KEY_SPEC_USER_KEYRING]
| . install_user_keyrings()
| . ||
| key_cleanup() [<= worker_thread()] ||
| | ||
| [spin_lock(&key_serial_lock)] |[mutex_lock(&key_user_keyr..mutex)]
| | ||
| atomic_read() == 0 ||
| |{ rb_ease(&key->serial_node,) } ||
| | ||
| [spin_unlock(&key_serial_lock)] |find_keyring_by_name()
| | |||
| keyring_destroy(keyring) ||[read_lock(&keyring_name_lock)]
| || |||
| |[write_lock(&keyring_name_lock)] ||atomic_inc(&keyring->usage)
| |. ||| *** GET freeing keyring ***
| |. ||[read_unlock(&keyring_name_lock)]
| || ||
| |list_del() |[mutex_unlock(&key_user_k..mutex)]
| || |
| |[write_unlock(&keyring_name_lock)] ** INVALID keyring is returned **
| | .
| kmem_cache_free(,keyring) .
| .
| atomic_dec(&keyring->usage)
v *** DESTROYED ***
TIME
If CONFIG_SLUB_DEBUG=y then we may see the following message generated:
=============================================================================
BUG key_jar: Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff880197a7e200-0xffff880197a7e200. First byte 0x6a instead of 0x6b
INFO: Allocated in key_alloc+0x10b/0x35f age=25 cpu=1 pid=5086
INFO: Freed in key_cleanup+0xd0/0xd5 age=12 cpu=1 pid=10
INFO: Slab 0xffffea000592cb90 objects=16 used=2 fp=0xffff880197a7e200 flags=0x200000000000c3
INFO: Object 0xffff880197a7e200 @offset=512 fp=0xffff880197a7e300
Bytes b4 0xffff880197a7e1f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xffff880197a7e200: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk
Alternatively, we may see a system panic happen, such as:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
PGD 6b2b4067 PUD 6a80d067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/kexec_crash_loaded
CPU 1
...
Pid: 31245, comm: su Not tainted 2.6.34-rc5-nofixed-nodebug #2 D2089/PRIMERGY
RIP: 0010:[<ffffffff810e61a3>] [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
RSP: 0018:ffff88006af3bd98 EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88007d19900b
RDX: 0000000100000000 RSI: 00000000000080d0 RDI: ffffffff81828430
RBP: ffffffff81828430 R08: ffff88000a293750 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000100000 R12: 00000000000080d0
R13: 00000000000080d0 R14: 0000000000000296 R15: ffffffff810f20ce
FS: 00007f97116bc700(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000001 CR3: 000000006a91c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process su (pid: 31245, threadinfo ffff88006af3a000, task ffff8800374414c0)
Stack:
0000000512e0958e 0000000000008000 ffff880037f8d180 0000000000000001
0000000000000000 0000000000008001 ffff88007d199000 ffffffff810f20ce
0000000000008000 ffff88006af3be48 0000000000000024 ffffffff810face3
Call Trace:
[<ffffffff810f20ce>] ? get_empty_filp+0x70/0x12f
[<ffffffff810face3>] ? do_filp_open+0x145/0x590
[<ffffffff810ce208>] ? tlb_finish_mmu+0x2a/0x33
[<ffffffff810ce43c>] ? unmap_region+0xd3/0xe2
[<ffffffff810e4393>] ? virt_to_head_page+0x9/0x2d
[<ffffffff81103916>] ? alloc_fd+0x69/0x10e
[<ffffffff810ef4ed>] ? do_sys_open+0x56/0xfc
[<ffffffff81008a02>] ? system_call_fastpath+0x16/0x1b
Code: 0f 1f 44 00 00 49 89 c6 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 60 e8 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 14 4c 89 f9 83 ca ff 44 89 e6 48 89 ef
RIP [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
The problem is that find_keyring_by_name() does not confirm that the keyring
is valid before accepting it.
Skipping keyrings that have been reduced to a zero count seems the way to go.
To this end, use atomic_inc_not_zero() to increment the usage count and skip
the candidate keyring if that returns false.
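In sketch form, the check in find_keyring_by_name()'s scan loop becomes
something like:

    /* a zero usage count means key_cleanup() is already tearing this
     * keyring down; skip it rather than resurrecting it */
    if (!atomic_inc_not_zero(&keyring->usage))
        continue;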
The following script _may_ cause the bug to happen, but there's no guarantee
as the window of opportunity is small:
#!/bin/sh
LOOP=100000
USER=dummy_user
/bin/su -c "exit;" $USER || { /usr/sbin/adduser -m $USER; add=1; }
for ((i=0; i<LOOP; i++))
do
/bin/su -c "echo '$i' > /dev/null" $USER
done
(( add == 1 )) && /usr/sbin/userdel -r $USER
exit
Note that the nominated user must not be in use.
An alternative way of testing this may be:
for ((i=0; i<100000; i++))
do
keyctl session foo /bin/true || break
done >&/dev/null
as that uses a keyring named "foo" rather than relying on the user and
user-session named keyrings.
Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4d09ec0f705cf88a12add029c058b53f288cfaa2 upstream.
We were using the wrong variable here so the error codes weren't being returned
properly. The original code returns -ENOKEY.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 550f0d922286556c7ea43974bb7921effb5a5278 upstream.
Clear the floating point exception flag before returning to
user space. This is needed, else the libc trampoline handler
may hit the same SIGFPE again while building up a trampoline
to a signal handler.
Fixes debian bug #559406.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This patch disables the possibility for an l2-guest to do a
VMMCALL directly into the host. This would happen if the
l1-hypervisor doesn't intercept VMMCALL and the l2-guest
executes this instruction.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 0d945bd9351199744c1e89d57a70615b6ee9f394)
|
|
This patch fixes a bug in the KVM efer-msr write path. If a
guest writes to a reserved efer bit the set_efer function
injects the #GP directly. The architecture dependent wrmsr
function does not see this, assumes success and advances the
rip. This results in a #GP in the guest with the wrong rip.
This patch fixes this by reporting efer write errors back to
the architectural wrmsr function.
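A hedged sketch of the reporting change; efer_reserved_bits and the exact call
chain are assumed from the description above, not copied from the patch:

static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
    if (efer & efer_reserved_bits)
        return 1;   /* caller injects #GP and leaves rip unchanged */

    /* remaining validity checks and the actual efer update (elided) */
    return 0;
}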
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit b69e8caef5b190af48c525f6d715e7b7728a77f6)
|
|
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 8fbf065d625617bbbf6b72d5f78f84ad13c8b547)
|
|
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 98001d8d017cea1ee0f9f35c6227bbd63ef5005b)
|
|
Wallclock writing uses an unprotected global variable to hold the version;
this can cause one guest to interfere with another if both write their
wallclock at the same time.
Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 9ed3c444ab8987c7b219173a2f7807e3f71e234e)
|
|
On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.
Push the spinlock into mmu_alloc_roots(), and only take it after we've read
the pdptr.
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 8facbbff071ff2b19268d3732e31badc60471e21)
|
|
Per document, for feature control MSR:
Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
of VMXON in SMX operation causes a general-protection exception.
Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
of VMXON outside SMX operation causes a general-protection exception.
This patch enables this kind of SMX-aware check for VMXON in KVM.
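A hedged sketch of the check; the bit numbers are the ones quoted above, while
the MSR constant name and the in_smx flag are assumptions:

    u64 msr, required;

    rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
    required = 1ULL << 0;                              /* lock bit */
    required |= in_smx ? (1ULL << 1) : (1ULL << 2);    /* VMXON in/outside SMX */

    if ((msr & required) != required)
        return -EBUSY;  /* VMXON would #GP, so refuse to enable VMX */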
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit cafd66595d92591e4bd25c3904e004fc6f897e2d)
|
|
When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte
having u/s=0 and r/w=1. This allows excessive access if the guest sets
cr0.wp=1 and accesses through this spte.
Fix by making cr0.wp part of the base role; we'll have different sptes for
the two cases and the problem disappears.
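In sketch form, the fix makes cr0.wp part of the role that keys shadow pages
(the field and helper names are assumptions), so wp=0 and wp=1 shadows no
longer alias:

    /* shadow pages created while the guest ran with cr0.wp=0 must not
     * be reused once it sets cr0.wp=1; make wp part of the lookup key */
    vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);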
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 3dbe141595faa48a067add3e47bba3205b79d33c)
|
|
kvm_x86_ops->set_efer() would execute vcpu->arch.efer = efer, so the
checking of LMA bit didn't work.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit a3d204e28579427609c3d15d2310127ebaa47d94)
|
|
The current lmsw implementation allows the guest to clear cr0.pe, contrary
to the manual, which breaks EMM386.EXE.
Fix by ORing the old cr0.pe with lmsw's operand.
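A sketch of the corrected semantics (lmsw writes only the low four CR0 bits,
and PE may be set but never cleared); variable names are illustrative:

    /* keep everything except MP/EM/TS from the old cr0, then OR in the
     * operand's low nibble: PE can go 0 -> 1 but never 1 -> 0 */
    unsigned long new_cr0 = (old_cr0 & ~0x0eUL) | (msw & 0x0fUL);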
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit f78e917688edbf1f14c318d2e50dc8e7dad20445)
|
|
In recent stress tests, it was found that pvclock-based systems
could seriously warp on smp systems. Using Ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5 million warps a minute on some
systems (to be fair, it wasn't that bad in most of them). Investigating
further, I found out that such warps were caused by the very offset-based
calculation pvclock is built on.
This happens even on some machines that report constant_tsc in their tsc
flags, especially on multi-socket ones.
Two reads of the same kernel timestamp at approximately the same time will
likely have been tsc-timestamped on different occasions too. This means the
delta we calculate is unpredictable at best, and can probably be smaller on
a cpu that is legitimately reading the clock at a later occasion.
Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.
A while ago, I thought about using a shared variable anyway, to hold the
clock's last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.
We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detect that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.
OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new reading. This does cause contention on big
boxes (but big here means *BIG*), but it seems like a good trade-off, since
it provides us with a time source guaranteed to be stable wrt time warps.
After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.
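A hedged sketch of the lock-free scheme described above (names are
illustrative, not the exact kernel code):

static atomic64_t last_value = ATOMIC64_INIT(0);

static u64 pvclock_monotonic(u64 raw)
{
    u64 last;

    do {
        last = atomic64_read(&last_value);
        if (raw < last)
            return last;    /* another cpu already returned a later time */
        /* try to publish our newer reading; retry if another cpu
         * raced us with an even newer one */
    } while (atomic64_cmpxchg(&last_value, last, raw) != last);

    return raw;
}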
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 489fb490dbf8dab0249ad82b56688ae3842a79e8)
|
|
This patch implements the reporting of the emulated SVM
features to userspace instead of the real hardware
capabilities. Every real hardware capability needs emulation
in nested svm, so the old behavior was broken.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit c2c63a493924e09a1984d1374a0e60dfd54fc0b0)
|
|
This patch adds the get_supported_cpuid callback to
kvm_x86_ops. It will be used in do_cpuid_ent to delegate the
decision about some supported cpuid bits to the
architecture modules.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit d4330ef2fb2236a1e3a176f0f68360f4c0a8661b)
|
|
If we fail to create the vcpu, we should not create the debugfs
entry for it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Alexander Graf <agraf@suse.de>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 06056bfb944a0302a8f22eb45f09123de7fb417b)
|
|
This patch fixes a possible memory leak in kvm_arch_vcpu_create()
on s390, which would happen when kvm_arch_vcpu_create() fails.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 7b06bf2ffa15e119c7439ed0b024d44f66d7b605)
|
|
The nested_svm_intr() function does not execute the vmexit
anymore. Therefore we may still be in the nested state after
that function ran. This patch changes the nested_svm_intr()
function to return whether the irq window could be enabled.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 8fe546547cf6857a9d984bfe2f2194910f3fc5d0)
|
|
This patch makes syncing of the guest tpr to the lapic
conditional on !nested. Otherwise a nested guest using the
TPR could freeze the guest.
Another important change this patch introduces is that the
cr8 intercept bits are no longer ORed at vmrun emulation if
the guest sets VINTR_MASKING in its VMCB. The reason is that
nested cr8 accesses always need to be handled by the nested
hypervisor because they change the shadow version of the
tpr.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 88ab24adc7142506c8583ac36a34fa388300b750)
|
|
The nested_svm_exit_handled_msr() function maps only one
page of the guest's msr permission bitmap. This patch changes
the code to use kvm_read_guest to fix the bug.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 4c7da8cb43c09e71a405b5aeaa58a1dbac3c39e9)
|
|
Currently the vmexit emulation does not sync control registers whose
accesses are typically intercepted by the nested hypervisor. But we
cannot count on those intercepts, so sync these registers too and make
the code architecturally more correct.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit cdbbdc1210223879450555fee04c29ebf116576b)
|
|
Move the actual vmexit routine out of code that runs with
irqs and preemption disabled.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit b8e88bc8ffba5fe53fb8d8a0a4be3bbcffeebe56)
|