|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
A source of system latencies not yet considered in the histograms
of effective latencies is delayed timer interrupts. Such latencies
are mainly due to disabled interrupts. Recording effective latencies
allows continuous monitoring of a system's real-time capabilities
under real-world conditions.
This patch adds latency histograms of missed timer offsets. If the
timer belongs to a sleeper that is about to wake up a task and the
latency is higher than previous latencies of such timers, some data
about this task are recorded as well.
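A minimal sketch of the recorded quantity (the helper name is
hypothetical; the actual histogram code differs in detail):

	ktime_t now = ktime_get();
	s64 missed_ns = ktime_to_ns(ktime_sub(now,
					      hrtimer_get_expires(timer)));

	if (missed_ns > 0)
		latency_hist_record(missed_ns);	/* hypothetical helper */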
Adapted and expanded Documentation/trace/histograms.txt.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The algorithm used so far to trace the process with the highest priority
requires that no other processes with the same priority are being woken
up simultaneously. Otherwise, a process with a lower priority may be
picked up for tracing, which leads to an erroneously high latency value.
Generally, the wakeup latency of a process that exclusively uses the
highest priority of the system is due to software or hardware issues we
would like to solve or, at least, keep as small as possible. This is
what latency measurements are made for, after all. The wakeup latency of
a process that shares the highest priority of the system with other
processes is quite another story. It may contain the worst-case runtime
durations of the other processes; thus, it is the result of the priority
design of a given system and nothing a kernel developer or hardware
engineer may want to fix.
This said, we need to separately record latencies i) of processes that
exclusively use the highest priority of the system and ii) of processes
that share the highest priority of the system with other processes.
The above-mentioned shortcoming of the tracing algorithm also applies to
the variable tracing_max_latency that the wakeup latency tracer uses,
since it is based on the same procedure as the original version of the
latency histogram. In consequence, if several processes share the
highest priority of the system, the variable tracing_max_latency may
contain erroneously high values. We could now patch the wakeup latency
tracer as well and separately record the various latencies, but it is
better to document this behavior and recommend the latency histograms
for reliably determining a system's worst-case wakeup latency.
Simplified and cleaned up a bit. Added some more help info to Kconfig.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Jon Masters <jcm@jonmasters.org>
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Commit 9ef1d4c7c7aca1cd436612b6ca785b726ffb8ed8 ("[NETLINK]: Missing
initializations in dumped data") introduced a typo in
initialization. This patch fixes it.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Almost all r128's private ioctls require that the CCE state has
already been initialised. However, most do not test that this has
been done, and will proceed to dereference a null pointer. This may
result in a security vulnerability, since some ioctls are
unprivileged.
This adds a macro for the common initialisation test and changes all
ioctl implementations that require prior initialisation to use that
macro.
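A minimal sketch of what such a guard macro might look like (the macro
name and error message here are assumptions, not the driver's actual
macro):

#define R128_CCE_INIT_TEST_WITH_RETURN(dev_priv)			\
do {									\
	if (!(dev_priv)) {						\
		DRM_ERROR("called with no initialisation\n");		\
		return -EINVAL;						\
	}								\
} while (0)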
Also, r128_do_init_cce() does not test that the CCE state has not
been initialised already. Repeated initialisation may lead to a crash
or resource leak. This adds that test.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
I found a deadlock bug in UNIX domain sockets that allows non-root
users to mount a DoS attack against the local machine.
How to reproduce:
1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
namespace (*), and shutdown(2) it.
2. Repeat connect(2)ing to the listening socket from other sockets
until the connection backlog is filled.
3. connect(2) takes the CPU forever. If every core is taken, the
system hangs.
PoC code (run as many instances as there are cores on SMP machines):
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
	int ret;
	int csd;
	int lsd;
	struct sockaddr_un sun;

	/* make an abstract name address (*) */
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = PF_UNIX;
	sprintf(&sun.sun_path[1], "%d", getpid());

	/* create the listening socket and shut it down */
	lsd = socket(AF_UNIX, SOCK_STREAM, 0);
	bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
	listen(lsd, 1);
	shutdown(lsd, SHUT_RDWR);

	/* connect loop */
	alarm(15); /* forcibly exit the loop after 15 sec */
	for (;;) {
		csd = socket(AF_UNIX, SOCK_STREAM, 0);
		ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
		if (ret == -1) {
			perror("connect()");
			break;
		}
		puts("Connection OK");
	}
	return 0;
}
(*) Make sun_path[0] = 0 to use the abstract namespace.
If a file-based socket is used, the system doesn't deadlock because
of context switches in the file system layer.
Why this happens:
Error checks between unix_socket_connect() and unix_wait_for_peer() are
inconsistent. The former calls the latter to wait until the backlog is
processed. Although the latter returns without doing anything when the
socket is shut down, the former does not check the shutdown state and
just retries calling the latter forever.
Patch:
The patch below adds a shutdown check to unix_socket_connect(), so
connect(2) to a shut-down socket returns -ECONNREFUSED.
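A minimal sketch of the added check (exact placement within the
connect path of net/unix/af_unix.c is assumed):

	err = -ECONNREFUSED;
	if (other->sk_state != TCP_LISTEN)
		goto out_unlock;
	if (other->sk_shutdown & RCV_SHUTDOWN)	/* the new check */
		goto out_unlock;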
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The destination keyring specified to request_key() and co. is made available to
the process that instantiates the key (typically the slave process
started by /sbin/request-key). This is passed in the request_key_auth
struct as
the dest_keyring member.
keyctl_instantiate_key() and keyctl_negate_key() call get_instantiation_keyring()
to get the keyring to attach the newly constructed key to at the end of
instantiation. This may be given a specific keyring into which a link will be
made later, or it may be asked to find the keyring passed to request_key(). In
the former case, it returns a keyring with the refcount incremented by
lookup_user_key(); in the latter case, it returns the keyring from the
request_key_auth struct - and does _not_ increment the refcount.
The latter case will eventually result in an oops when the keyring prematurely
runs out of references and gets destroyed. The effect may take some time to
show up as the key is destroyed lazily.
To fix this, the keyring returned by get_instantiation_keyring() must always
have its refcount incremented, no matter where it comes from.
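A minimal sketch of the fix in the request_key_auth branch (variable
names assumed from the description above):

	/* return a real reference, not a borrowed pointer, so the
	 * caller's key_put() is always balanced */
	return key_get(rka->dest_keyring);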
This can be tested by setting /etc/request-key.conf to:
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...
#====== ======= =============== =============== ===============================
create * test:* * |/bin/false %u %g %d %{user:_display}
negate * * * /bin/keyctl negate %k 10 @u
and then doing:
keyctl add user _display aaaaaaaa @u
while keyctl request2 user test:x test:x @u &&
keyctl list @u;
do
keyctl request2 user test:x test:x @u;
sleep 31;
keyctl list @u;
done
which will oops eventually. Changing the negate line to have @u rather than
%S at the end is important as that forces the latter case by passing a special
keyring ID rather than an actual keyring ID.
Reported-by: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch fixes a null pointer exception in pipe_rdwr_open() which
generates the stack trace:
> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
> [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
> [<ffffffff8028125c>] __dentry_open+0x13c/0x230
> [<ffffffff8028143d>] do_filp_open+0x2d/0x40
> [<ffffffff802814aa>] do_sys_open+0x5a/0x100
> [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67
The failure mode is triggered by an attempt to open an anonymous
pipe via /proc/pid/fd/* as exemplified by this script:
=============================================================
while : ; do
{ echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
PID=$!
OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
{ read PID REST ; echo $PID; } )
OUT="${OUT%% *}"
DELAY=$((RANDOM * 1000 / 32768))
usleep $((DELAY * 1000 + RANDOM % 1000 ))
echo n > /proc/$OUT/fd/1 # Trigger defect
done
=============================================================
Note that the failure window is quite small and I could only
reliably reproduce the defect by inserting a small delay
in pipe_rdwr_open(). For example:
static int
pipe_rdwr_open(struct inode *inode, struct file *filp)
{
	msleep(100);	/* injected delay to widen the race window */
	mutex_lock(&inode->i_mutex);
Although the defect was observed in pipe_rdwr_open(), I think it
makes sense to replicate the change through all the pipe_*_open()
functions.
The core of the change is to verify that inode->i_pipe has not
been released before attempting to manipulate it. If inode->i_pipe
is no longer present, return ENOENT to indicate so.
The comment about potentially using atomic_t for i_pipe->readers
and i_pipe->writers has also been removed because it is no longer
relevant in this context. The inode->i_mutex lock must be used so
that inode->i_pipe can be dealt with correctly.
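A minimal sketch of the guarded open, following the description above
(the actual patch applies the same pattern to all pipe_*_open()
functions):

static int
pipe_rdwr_open(struct inode *inode, struct file *filp)
{
	int ret = -ENOENT;

	mutex_lock(&inode->i_mutex);
	if (inode->i_pipe) {		/* may already have been released */
		inode->i_pipe->readers++;
		inode->i_pipe->writers++;
		ret = 0;
	}
	mutex_unlock(&inode->i_mutex);

	return ret;
}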
Signed-off-by: Earl Chew <earl_chew@agilent.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.31.y
into rt/2.6.31
Conflicts:
Makefile
kernel/futex.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
This crash:
[ 1774.088275] divide error: 0000 [#1] SMP
[ 1774.100355] CPU 13
[ 1774.102498] Modules linked in:
[ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN
[ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>]
sched_balance_self+0x19b/0x2d4
Triggers because update_group_power() modifies the sd tree and does
temporary calculations there - not considering that other CPUs
could observe intermediate values, such as the zero initial value.
Calculate it in a temporary variable instead. (we need no memory
barrier as these are all statistical values anyway)
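A minimal sketch of the change (field names follow the report; the
real patch differs in detail):

	unsigned long power = SCHED_LOAD_SCALE;	/* local accumulator */

	/* ... apply frequency and RT scaling to 'power' ... */

	sdg->cpu_power = power;	/* publish once; no transient zero */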
Got the same oops with the backport to -rt
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
wake_affine() would always fail under low-load situations where
both prev and this were idle, because adding a single task will
always be a significant imbalance, even if there's nothing
around that could balance it.
Deal with this by allowing imbalance when there's nothing you
can do about it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
A more readable version, with a few differences:
- don't check against the root domain, but instead check
SD_LOAD_BALANCE
- don't re-iterate the cpus already iterated on the previous SD
- use rcu_read_lock() around the sd iteration
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Hopefully a more readable version of the same.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
APERF/MPERF support for cpu_power.
APERF/MPERF is arch-defined to be a relative scale of work capacity
per logical cpu; this is assumed to include SMT and Turbo mode.
APERF/MPERF are specified to both reset to 0 when either counter
wraps, which is highly inconvenient, since that'll give a blip when
that happens. The manual specifies writing 0 to the counters after
each read, but that's 1) too expensive, and 2) destroys the
possibility of sharing these counters with other users, so we live
with the blip - the other existing user does too.
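A minimal sketch of sampling the ratio (the MSRs are the documented
IA32_APERF/IA32_MPERF; the surrounding kernel plumbing is assumed):

	u64 aperf, mperf;

	rdmsrl(MSR_IA32_APERF, aperf);
	rdmsrl(MSR_IA32_MPERF, mperf);
	/* delta(aperf) / delta(mperf) gives relative work capacity */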
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Move some of the aperf/mperf code out of the cpufreq driver so that
other people can enjoy it too.
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Move the APERFMPERF capability into an X86_FEATURE flag so that it can
be used outside of the acpi cpufreq driver.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
It's a source of fail; also, now that cpu_power is dynamic, it's a
waste of time.
before:
<idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1
after:
bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
[andreas.herrmann3@amd.com: remove include]
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When the capacity drops low, we want to migrate load away. Allow the
load-balancer to remove all tasks when we hit rock bottom.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
[ego@in.ibm.com: fix to update_sd_power_savings_stats]
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Keep an average of the amount of time spent on RT tasks and use that
fraction to scale down the cpu_power for regular tasks.
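A minimal sketch of the scaling (names are assumed; the real helper in
kernel/sched.c differs):

static unsigned long scale_rt_power_sketch(unsigned long power,
					   u64 total, u64 avg_rt)
{
	u64 available = total - avg_rt;	/* time left for regular tasks */

	return (unsigned long)div64_u64(power * available, total);
}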
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Recompute the cpu_power for each cpu during load-balance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The idea is that multi-threading a core yields more work capacity than
a single thread; provide a way to express a static gain for threads.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In order to prepare for a more dynamic cpu_power, update the group sum
while walking the sched domains during load-balance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Do the placement thing using SD flags
XXX: consider degenerate bits
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
cpu_power is supposed to be a representation of the processing
capacity of the cpu, not a value to randomly tweak in order to affect
placement.
Remove the placement hacks.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The queue_me/unqueue_me commentary is oddly placed and out of date.
Clean it up and correct the inaccurate bits.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053015.8717.71713.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
When requeuing tasks from one futex to another, the reference held
by the requeued task to the original futex location needs to be
dropped eventually.
Dropping the reference may ultimately lead to a call to
iput_final() and subsequently call into filesystem-specific code -
which may be non-atomic.
It is therefore safer to defer this drop operation until after the
futex_hash_bucket spinlock has been dropped.
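A minimal sketch of the reordering, based on the description above:

	double_unlock_hb(hb1, hb2);	/* drop the spinlocks first */
	drop_futex_key_refs(&key1);	/* may now reach iput_final() */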
Originally-From: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <4AD7A298.5040802@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The memory barrier semantics of futex_wait_queue_me() are
non-obvious. Add some commentary to try and clarify it.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090924185447.694.38948.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The state machine described in the comments wasn't updated with
a follow-on fix. Address that and clean up the corresponding
commentary in the function.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <4A737C2A.9090001@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Rich reported a lock imbalance in the futex code:
http://bugzilla.kernel.org/show_bug.cgi?id=14288
It's caused by the displacement of the retry_private label in
futex_wake_op(). The code unlocks the hash bucket locks in the
error handling path and retries without locking them again which
makes the next unlock fail.
Move retry_private so we lock the hash bucket locks when we retry.
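A minimal sketch of the corrected layout (surrounding logic elided):

retry_private:
	double_lock_hb(hb1, hb2);
	/* ... op/wake logic; error paths unlock the buckets and jump
	 * back to retry_private, which now re-acquires them ... */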
Reported-by: Rich Ercolany <rercola@acm.jhu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: stable-2.6.31 <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Correct various typos and formatting inconsistencies in the
commentary of futex_wait_requeue_pi().
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922052958.8717.21932.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Make the existing function kernel-doc consistent throughout
futex.c, following Documentation/kernel-doc-nano-howto.txt as
closely as possible.
When unsure, at least be consistent within futex.c.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053022.8717.13339.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Use kernel-doc format to describe struct futex_q.
Correct the wakeup definition to eliminate the statement about
waking the waiter between the plist_del() and the q->lock_ptr = 0.
Note in the comment that PI futexes have a different definition of
the woken state.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053029.8717.62798.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
it will find the futex_q->requeue_pi_key to be NULL and OOPS.
Check for NULL in match_futex() instead of doing explicit NULL pointer
checks on all call sites. While match_futex(NULL, NULL) returning
false is a little odd, it's still correct as we expect valid key
references.
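A minimal sketch of the NULL-tolerant comparison:

static int match_futex(union futex_key *key1, union futex_key *key2)
{
	return (key1 && key2
		&& key1->both.word == key2->both.word
		&& key1->both.ptr == key2->both.ptr
		&& key1->both.offset == key2->both.offset);
}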
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org
LKML-Reference: <4AD60687.10306@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test), nor does it use the wake_list of futex_wake(), which were
the reasons for commit 41890f2 (futex: Handle spurious wake up).
See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>
The changes in this fix to the wait_requeue_pi path were considered
likely unnecessary, but a harmless safety net. But it turns out that,
because for unknown $@#!*( reasons EWOULDBLOCK is defined as EAGAIN,
we built an endless loop in the code path which correctly returns
EWOULDBLOCK.
Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
take the easy solution and return EWOULDBLOCK^WEAGAIN to user space
and let it deal with the spurious wakeup.
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
There is currently no check to ensure that userspace uses the same
futex requeue target (uaddr2) in futex_requeue() that the waiter used
in futex_wait_requeue_pi(). A mismatch here could cause very
unexpected results, as the waiter assumes it either wakes on uaddr1 or
uaddr2. We could detect this on wakeup in the waiter, but the cleanup
is more intense after the improper requeue has occurred.
This patch stores the waiter's expected requeue target in a new
requeue_pi_key pointer in the futex_q, which futex_requeue() checks
prior to attempting to do a proxy lock acquisition or a requeue when
requeue_pi=1. If they don't match, return -EINVAL from futex_requeue(),
aborting the requeue of any remaining waiters.
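A minimal sketch of the check in futex_requeue()'s waiter loop
(variable names assumed):

	if (requeue_pi && !match_futex(this->requeue_pi_key, &key2)) {
		ret = -EINVAL;
		break;
	}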
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090814003650.14634.63916.stgit@Aeon>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Conflicts:
kernel/futex.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c8e33141911bf8fe87dc6c92793b9a59b2be0130 upstream.
The locking logic in this function is extremely subtle, and it broke
when we started doing potentially concurrent 'flush_to_ldisc()' calls in
commit e043e42bdb66885b3ac10d27a01ccb9972e2b0a3 ("pty: avoid forcing
'low_latency' tty flag").
The code in flush_to_ldisc() used to set 'tty->buf.head' to NULL, with
the intention that this would then cause any other concurrent calls to
not do anything (locking note: we have to drop the buf.lock over the
call to ->receive_buf that can block, which is why we can have
concurrency here at all in the first place).
It also used to set the TTY_FLUSHING bit, which would then cause any
concurrent 'tty_buffer_flush()' to not free all the tty buffers and
clear 'tty->buf.tail'. And with 'buf.head' being NULL, and 'buf.tail'
being non-NULL, new data would never touch 'buf.head'.
Does that sound a bit too subtle? It was. If another concurrent call to
'flush_to_ldisc()' were to come in, the NULL buf.head would indeed cause
it to not process the buffer list, but it would still clear TTY_FLUSHING
afterwards, making the buffer protection against 'tty_buffer_flush()' no
longer work.
So this clears it all up. We depend purely on TTY_FLUSHING for handling
re-entrancy, and stop playing games with the buffer list entirely. In
fact, the buffer list handling is now robust enough that we could
probably stop doing the whole "protect against 'tty_buffer_flush()'"
thing entirely.
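A minimal sketch of the re-entrancy guard (the real flush_to_ldisc()
in drivers/char/tty_buffer.c is considerably more involved):

	if (test_and_set_bit(TTY_FLUSHING, &tty->flags))
		return;		/* another flush is already in progress */
	/* ... walk the buffer list, dropping buf.lock around the
	 * blocking ->receive_buf() calls ... */
	clear_bit(TTY_FLUSHING, &tty->flags);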
However, Alan also points out that we would probably be better off
simplifying the locking even further, and just take the tty ldisc_mutex
around all the buffer flushing calls. That seems like a good idea, but
in the meantime this is a conceptually minimal fix (with the patch
itself being bigger than required just to clean the code up and make it
readable).
This fixes keyboard trouble under X:
http://bugzilla.kernel.org/show_bug.cgi?id=14388
Reported-and-tested-by: Frédéric Meunier <fredlwm@gmail.com>
Reported-and-tested-by: Boyan <btanastasov@yahoo.co.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit fbc44bf7177dfd61381da55405550b693943a432 upstream.
When receiving data frames, we can send them only to
the interface they belong to based on the transmitting
station (this doesn't work for probe requests). Also,
don't try to handle other frames for AP_VLAN at all
since those interfaces should only receive data.
Additionally, the transmit side must check that the
station we're sending a frame to is actually on the
interface we're transmitting on, and not transmit
packets to functions that live on other interfaces,
so validate that as well.
Another bug fix is needed in sta_info.c: in the VLAN case, when
adding/removing stations, we overwrite the sdata variable we still
need.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2facba769d7f9e563cf706de709074a2d20f1bba upstream.
The address stored in the next link address is a word address but when
reading the OTP blocks, a byte address is used. Also, if the blocks are
full and the last link pointer is not zero, then none of the blocks are
valid, so return an error.
The algorithm is simply: valid blocks have a next address, and that
address's contents are zero.
Using the wrong address for the next link address reads arbitrary
data, obviously. In the cases seen, the first block is considered
valid when it is not.
If the block has in fact been invalidated, there may be old data, no
data, bad data, or partial data; there is no way of telling. Without
this patch it is possible that a device with valid OTP data is unable
to work.
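A minimal sketch of the address fix (variable names assumed):

	/* the link stores a word address; convert to bytes */
	next_link_addr = le16_to_cpu(link_value) * sizeof(u16);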
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b8430e1b82b7e514d76a88eb70a7d8831d50df1e upstream.
usb-storage: Workaround devices with bogus sense size
Some devices, such as Huawei E169, advertise more than the standard
amount of sense data, causing us to set US_FL_SANE_SENSE, assuming
they support it. However, they subsequently fail the request sense
with that size.
This works around it generically: when a sense request fails because
the device returned an error, US_FL_SANE_SENSE was set, and that sense
request used a larger sense size, we retry with a smaller size before
giving up.
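A minimal sketch of the fallback (variable and label names are
assumed; US_FL_SANE_SENSE is the flag named above, and the standard
sense size is 18 bytes):

	if (sense_failed && (us->fflags & US_FL_SANE_SENSE) &&
	    sense_size != US_SENSE_SIZE) {
		sense_size = US_SENSE_SIZE;	/* retry with 18 bytes */
		goto retry_sense;
	}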
Based on an original patch by Ben Efros <ben@pc-doctor.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0af49167b1e5ba154e90d2c454bf4624ee47df80 upstream.
This fixes a panic which is triggered when the hardware "disappears" from
beneath the driver, i.e. when wireless is toggled off via Fn-F2 on various
EeePC models.
Ref. bug report http://bugzilla.kernel.org/show_bug.cgi?id=13390
panic http://bugzilla.kernel.org/attachment.cgi?id=21928
Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 83db93f4de2d9ae441a491d1dc61c2204f0195de upstream.
sysfs_notify_dirent is a simple atomic operation that can be used to
alert user-space that new data can be read from a sysfs attribute.
Unfortunately it cannot currently be called from non-process context
because of its use of spin_lock which is sometimes taken with
interrupts enabled.
So change all lockers of sysfs_open_dirent_lock to disable interrupts,
thus making sysfs_notify_dirent safe to be called from non-process
context (as drivers/md does in md_safemode_timeout).
sysfs_get_open_dirent is (documented as being) only called from
process context, so it uses spin_lock_irq. Other places
use spin_lock_irqsave.
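A minimal sketch of the irqsave conversion:

	unsigned long flags;

	spin_lock_irqsave(&sysfs_open_dirent_lock, flags);
	/* ... manipulate the open dirent ... */
	spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags);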
The usage for sysfs_notify_dirent in md_safemode_timeout was
introduced in 2.6.28, so this patch is suitable for that and more
recent kernels.
Reported-by: Joel Andres Granados <jgranado@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d8e180dcd5bbbab9cd3ff2e779efcf70692ef541 upstream.
When process accounting is enabled, every exiting process writes a log to
the account file. In addition, every once in a while one of the exiting
processes checks whether there's enough free space for the log.
SELinux policy may or may not allow the exiting process to stat the fs.
So unsuspecting processes start generating AVC denials just because
someone enabled process accounting.
For these filesystem operations, the exiting process's credentials should
be temporarily switched to that of the process which enabled accounting,
because it's really that process which wanted to have the accounting
information logged.
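A minimal sketch of the credential switch around the free-space check
(the helper name is assumed):

	const struct cred *orig_cred;

	orig_cred = override_creds(file->f_cred); /* accounting enabler */
	res = check_free_space(acct, file);	  /* assumed helper */
	revert_creds(orig_cred);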
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 18c4078489fe064cc0ed08be3381cf2f26657f5f upstream.
The client->driver pointer can be NULL when i2c-device probing fails
in i2c_new_device(). This patch adds NULL checks for client->driver
and returns an error instead of blindly assuming the driver is
available.
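A minimal sketch of the guard (the call site is assumed):

	if (!client || !client->driver)
		return -ENODEV;	/* probing failed, no driver bound */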
Reported-by: Tim Shepard <shep@alum.mit.edu>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 18669eabde2ff5fc446e72e043f0539059763438 upstream.
When an ACPI resource conflict is detected, error messages are already
printed by ACPI. There's no point in causing the driver core to print
more error messages, so return one of the error codes for which no
message is printed.
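A minimal sketch of the pattern (driver-specific details assumed):

	if (acpi_check_resource_conflict(&res))
		return -ENODEV;	/* silent code; ACPI already complained */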
This fixes bug #14293:
http://bugzilla.kernel.org/show_bug.cgi?id=14293
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|