summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-11-06v2.6.31.5-rt17v2.6.31.5-rt17Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-06Merge branch 'rt/head' into rt/2.6.31Thomas Gleixner
2009-11-06ftrace: Add latency histograms of missed timer offsetsCarsten Emde
A source of system latencies not yet considered in the histograms of effective latencies are delayed timer interrupts. Such latencies are mainly due to disabled interrupts. Recording of effective latencies allows to continuously monitor a system's real-time capabilities under real-world conditions. This patch adds latency histograms of missed timer offsets. If the timer belongs to a sleeper that is about to wakeup a task and the latency is higher than previous latencies of such timers, some data of this task are recorded as well. Adapted and expanded Documentation/trace/histograms.txt. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-06ftrace: Consider shared max priority in latency histogramsCarsten Emde
The algorithm used so far to trace the process with the highest priority requires that no other processes with the same priority are being woken up simultaneously. Otherwise, a process with a lower priority may be picked up for tracing which leads to an erroneously high latency value. Generally, the wakeup latency of a process that exclusively uses the highest priority of the system is due to software or hardware issues we would like to solve or, at least, keep as small as possible. This is what latency measurements are made for, after all. The wakeup latency of a process that shares the highest priority of the system with other processes, is quite another story. It may contain the worst-case runtime durations of the other processes; thus, it is the result of the priority design of a given system and nothing a kernel developer or hardware engineer may want to fix. This said, we need to separately record latencies i) of processes that exclusively use the highest priority of the system and ii) of processes that share the highest priority of the system with other processes. The above mentioned shortcoming of the tracing algorithm also applies to the variable tracing_max_latency that the wakeup latency tracer uses, since it is based on the same procedure as the original version of the latency histogram. In consequence, if several processes share the highest priority of the system, the variable tracing_max_latency may contain erroneously high values. We could now patch the wakeup latency tracer as well and separately record the various latencies, but we better document this behavior and recommend the latency histograms to reliably determine a system's worst-case wakeup latency. Simplified and cleaned up a bit. Added some more help info to Kconfig. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-06hwlat: Fix Kconfig and check kthread_stop resultJon Masters
Signed-off-by: Jon Masters <jcm@jonmasters.org> Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-06netlink: fix typo in initializationJiri Pirko
Commit 9ef1d4c7c7aca1cd436612b6ca785b726ffb8ed8 ("[NETLINK]: Missing initializations in dumped data") introduced a typo in initialization. This patch fixes this. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06drm/r128: Add test for initialisation to all ioctls that require itBen Hutchings
Almost all r128's private ioctls require that the CCE state has already been initialised. However, most do not test that this has been done, and will proceed to dereference a null pointer. This may result in a security vulnerability, since some ioctls are unprivileged. This adds a macro for the common initialisation test and changes all ioctl implementations that require prior initialisation to use that macro. Also, r128_do_init_cce() does not test that the CCE state has not been initialised already. Repeated initialisation may lead to a crash or resource leak. This adds that test. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-11-06AF_UNIX: Fix deadlock on connecting to shutdown socketTomoki Sekiyama
I found a deadlock bug in UNIX domain socket, which makes able to DoS attack against the local machine by non-root users. How to reproduce: 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstruct namespace(*), and shutdown(2) it. 2. Repeat connect(2)ing to the listening socket from the other sockets until the connection backlog is full-filled. 3. connect(2) takes the CPU forever. If every core is taken, the system hangs. PoC code: (Run as many times as cores on SMP machines.) int main(void) { int ret; int csd; int lsd; struct sockaddr_un sun; /* make an abstruct name address (*) */ memset(&sun, 0, sizeof(sun)); sun.sun_family = PF_UNIX; sprintf(&sun.sun_path[1], "%d", getpid()); /* create the listening socket and shutdown */ lsd = socket(AF_UNIX, SOCK_STREAM, 0); bind(lsd, (struct sockaddr *)&sun, sizeof(sun)); listen(lsd, 1); shutdown(lsd, SHUT_RDWR); /* connect loop */ alarm(15); /* forcely exit the loop after 15 sec */ for (;;) { csd = socket(AF_UNIX, SOCK_STREAM, 0); ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun)); if (-1 == ret) { perror("connect()"); break; } puts("Connection OK"); } return 0; } (*) Make sun_path[0] = 0 to use the abstruct namespace. If a file-based socket is used, the system doesn't deadlock because of context switches in the file system layer. Why this happens: Error checks between unix_socket_connect() and unix_wait_for_peer() are inconsistent. The former calls the latter to wait until the backlog is processed. Despite the latter returns without doing anything when the socket is shutdown, the former doesn't check the shutdown state and just retries calling the latter forever. Patch: The patch below adds shutdown check into unix_socket_connect(), so connect(2) to the shutdown socket will return -ECONREFUSED. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06KEYS: get_instantiation_keyring() should inc the keyring refcount in all casesDavid Howells
The destination keyring specified to request_key() and co. is made available to the process that instantiates the key (the slave process started by /sbin/request-key typically). This is passed in the request_key_auth struct as the dest_keyring member. keyctl_instantiate_key and keyctl_negate_key() call get_instantiation_keyring() to get the keyring to attach the newly constructed key to at the end of instantiation. This may be given a specific keyring into which a link will be made later, or it may be asked to find the keyring passed to request_key(). In the former case, it returns a keyring with the refcount incremented by lookup_user_key(); in the latter case, it returns the keyring from the request_key_auth struct - and does _not_ increment the refcount. The latter case will eventually result in an oops when the keyring prematurely runs out of references and gets destroyed. The effect may take some time to show up as the key is destroyed lazily. To fix this, the keyring returned by get_instantiation_keyring() must always have its refcount incremented, no matter where it comes from. This can be tested by setting /etc/request-key.conf to: #OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... #====== ======= =============== =============== =============================== create * test:* * |/bin/false %u %g %d %{user:_display} negate * * * /bin/keyctl negate %k 10 @u and then doing: keyctl add user _display aaaaaaaa @u while keyctl request2 user test:x test:x @u && keyctl list @u; do keyctl request2 user test:x test:x @u; sleep 31; keyctl list @u; done which will oops eventually. Changing the negate line to have @u rather than %S at the end is important as that forces the latter case by passing a special keyring ID rather than an actual keyring ID. Reported-by: Alexander Zangerl <az@bond.edu.au> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Alexander Zangerl <az@bond.edu.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-06fs: pipe.c null pointer dereferenceEarl Chew
This patch fixes a null pointer exception in pipe_rdwr_open() which generates the stack trace: > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: > [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70 > [<ffffffff8028125c>] __dentry_open+0x13c/0x230 > [<ffffffff8028143d>] do_filp_open+0x2d/0x40 > [<ffffffff802814aa>] do_sys_open+0x5a/0x100 > [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67 The failure mode is triggered by an attempt to open an anonymous pipe via /proc/pid/fd/* as exemplified by this script: ============================================================= while : ; do { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } & PID=$! OUT=$(ps -efl | grep 'sleep 1' | grep -v grep | { read PID REST ; echo $PID; } ) OUT="${OUT%% *}" DELAY=$((RANDOM * 1000 / 32768)) usleep $((DELAY * 1000 + RANDOM % 1000 )) echo n > /proc/$OUT/fd/1 # Trigger defect done ============================================================= Note that the failure window is quite small and I could only reliably reproduce the defect by inserting a small delay in pipe_rdwr_open(). For example: static int pipe_rdwr_open(struct inode *inode, struct file *filp) { msleep(100); mutex_lock(&inode->i_mutex); Although the defect was observed in pipe_rdwr_open(), I think it makes sense to replicate the change through all the pipe_*_open() functions. The core of the change is to verify that inode->i_pipe has not been released before attempting to manipulate it. If inode->i_pipe is no longer present, return ENOENT to indicate so. The comment about potentially using atomic_t for i_pipe->readers and i_pipe->writers has also been removed because it is no longer relevant in this context. The inode->i_mutex lock must be used so that inode->i_pipe can be dealt with correctly. Signed-off-by: Earl Chew <earl_chew@agilent.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29v2.6.31.5-rt16Thomas Gleixner
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.31.y into rt/2.6.31 Conflicts: Makefile kernel/futex.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29Merge branch 'rt/head' into rt/2.6.31Thomas Gleixner
2009-10-29sched: Fix dynamic power-balancing crashDinakar Guniguntala
This crash: [ 1774.088275] divide error: 0000 [#1] SMP [ 1774.100355] CPU 13 [ 1774.102498] Modules linked in: [ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN [ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4 Triggers because update_group_power() modifies the sd tree and does temporary calculations there - not considering that other CPUs could observe intermediate values, such as the zero initial value. Calculate it in a temporary variable instead. (we need no memory barrier as these are all statistical values anyway) Got the same oops with the backport to -rt Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: Deal with low-load in wake_affine()Peter Zijlstra
wake_affine() would always fail under low-load situations where both prev and this were idle, because adding a single task will always be a significant imbalance, even if there's nothing around that could balance it. Deal with this by allowing imbalance when there's nothing you can do about it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: Add a missing =Dinakar Guniguntala
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: cleanup wake_idlePeter Zijlstra
A more readable version, with a few differences: - don't check against the root domain, but instead check SD_LOAD_BALANCE - don't re-iterate the cpus already iterated on the previous SD - use rcu_read_lock() around the sd iteration Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: cleanup wake_idle power savingPeter Zijlstra
Hopefully a more readable version of the same. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29x86: sched: provide arch implementations using aperf/mperfPeter Zijlstra
APERF/MPERF support for cpu_power. APERF/MPERF is arch defined to be a relative scale of work capacity per logical cpu, this is assumed to include SMT and Turbo mode. APERF/MPERF are specified to both reset to 0 when either counter wraps, which is highly inconvenient, since that'll give a blimp when that happens. The manual specifies writing 0 to the counters after each read, but that's 1) too expensive, and 2) destroys the possibility of sharing these counters with other users, so we live with the blimp - the other existing user does too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29Provide an arch specific hook for cpufreq based scaling of cpu_power.Peter Zijlstra
[ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29x86: Add generic aperf/mperf codeDinakar Guniguntala
Move some of the aperf/mperf code out from the cpufreq driver thingy so that other people can enjoy it too. Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29x86: move APERF/MPERF into a X86_FEATUREPeter Zijlstra
Move the APERFMPERF capacility into a X86_FEATURE flag so that it can be used outside of the acpi cpufreq driver. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: remove reciprocal for cpu_powerPeter Zijlstra
Its a source of fail, also, now that cpu_power is dynamical, its a waste of time. before: <idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1 after: bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1 [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> [andreas.herrmann3@amd.com: remove include] Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: try to deal with low capacityPeter Zijlstra
When the capacity drops low, we want to migrate load away. Allow the load-balancer to remove all tasks when we hit rock bottom. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> [ego@in.ibm.com: fix to update_sd_power_savings_stats] Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: scale down cpu_power due to RT tasksPeter Zijlstra
Keep an average on the amount of time spend on RT tasks and use that fraction to scale down the cpu_power for regular tasks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: dynamic cpu_powerPeter Zijlstra
Recompute the cpu_power for each cpu during load-balance Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: add smt_gainPeter Zijlstra
The idea is that multi-threading a core yields more work capacity than a single thread, provide a way to express a static gain for threads. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: update the cpu_power sum during load-balancePeter Zijlstra
In order to prepare for a more dynamic cpu_power, update the group sum while walking the sched domains during load-balance. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: SD_PREFER_SIBLINGPeter Zijlstra
Do the placement thing using SD flags XXX: consider degenerate bits [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-29sched: restore __cpu_power to a straight sum of powerPeter Zijlstra
cpu_power is supposed to be a representation of the process capacity of the cpu, not a value to randomly tweak in order to affect placement. Remove the placement hacks. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-28futex: Correct queue_me and unqueue_me commentaryDarren Hart
The queue_me/unqueue_me commentary is oddly placed and out of date. Clean it up and correct the inaccurate bits. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922053015.8717.71713.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Move drop_futex_key_refs out of spinlock'ed regionDarren Hart
When requeuing tasks from one futex to another, the reference held by the requeued task to the original futex location needs to be dropped eventually. Dropping the reference may ultimately lead to a call to "iput_final" and subsequently call into filesystem- specific code - which may be non-atomic. It is therefore safer to defer this drop operation until after the futex_hash_bucket spinlock has been dropped. Originally-From: Helge Bahmann <hcb@chaoticmind.net> Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: <stable@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@linux.vnet.ibm.com> Cc: Sven-Thorsten Dietrich <sdietrich@novell.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <4AD7A298.5040802@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Add memory barrier commentary to futex_wait_queue_me()Darren Hart
The memory barrier semantics of futex_wait_queue_me() are non-obvious. Add some commentary to try and clarify it. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090924185447.694.38948.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Correct futex_wait_requeue_pi() commentaryDarren Hart
The state machine described in the comments wasn't updated with a follow-on fix. Address that and cleanup the corresponding commentary in the function. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <4A737C2A.9090001@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Fix locking imbalanceThomas Gleixner
Rich reported a lock imbalance in the futex code: http://bugzilla.kernel.org/show_bug.cgi?id=14288 It's caused by the displacement of the retry_private label in futex_wake_op(). The code unlocks the hash bucket locks in the error handling path and retries without locking them again which makes the next unlock fail. Move retry_private so we lock the hash bucket locks when we retry. Reported-by: Rich Ercolany <rercola@acm.jhu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: stable-2.6.31 <stable@kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Correct futex_wait_requeue_pi() commentaryDarren Hart
Correct various typos and formatting inconsistencies in the commentary of futex_wait_requeue_pi(). Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922052958.8717.21932.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Make function kernel-doc commentary consistentDarren Hart
Make the existing function kernel-doc consistent throughout futex.c, following Documentation/kernel-doc-nano-howto.txt as closely as possible. When unsure, at least be consistent within futex.c. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922053022.8717.13339.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Correct futex_q woken state commentaryDarren Hart
Use kernel-doc format to describe struct futex_q. Correct the wakeup definition to eliminate the statement about waking the waiter between the plist_del() and the q->lock_ptr = 0. Note in the comment that PI futexes have a different definition of the woken state. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922053029.8717.62798.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-28futex: Check for NULL keys in match_futexDarren Hart
If userspace tries to perform a requeue_pi on a non-requeue_pi waiter, it will find the futex_q->requeue_pi_key to be NULL and OOPS. Check for NULL in match_futex() instead of doing explicit NULL pointer checks on all call sites. While match_futex(NULL, NULL) returning false is a little odd, it's still correct as we expect valid key references. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Dinakar Guniguntala <dino@in.ibm.com> CC: John Stultz <johnstul@us.ibm.com> Cc: stable@kernel.org LKML-Reference: <4AD60687.10306@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-28futex: Fix spurious wakeup for requeue_pi reallyThomas Gleixner
The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr == NULL test) nor does it use the wake_list of futex_wake() which where the reason for commit 41890f2 (futex: Handle spurious wake up) See debugging discussing on LKML Message-ID: <4AD4080C.20703@us.ibm.com> The changes in this fix to the wait_requeue_pi path were considered to be a likely unecessary, but harmless safety net. But it turns out that due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined as EAGAIN we built an endless loop in the code path which returns correctly EWOULDBLOCK. Spurious wakeups in wait_requeue_pi code path are unlikely so we do the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let it deal with the spurious wakeup. Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Stultz <johnstul@linux.vnet.ibm.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> LKML-Reference: <4AE23C74.1090502@us.ibm.com> Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-28futex: Detect mismatched requeue targetsThomas Gleixner
There is currently no check to ensure that userspace uses the same futex requeue target (uaddr2) in futex_requeue() that the waiter used in futex_wait_requeue_pi(). A mismatch here could very unexpected results as the waiter assumes it either wakes on uaddr1 or uaddr2. We could detect this on wakeup in the waiter, but the cleanup is more intense after the improper requeue has occured. This patch stores the waiter's expected requeue target in a new requeue_pi_key pointer in the futex_q which futex_requeue() checks prior to attempting to do a proxy lock acquistion or a requeue when requeue_pi=1. If they don't match, return -EINVAL from futex_requeue, aborting the requeue of any remaining waiters. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090814003650.14634.63916.stgit@Aeon> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Conflicts: kernel/futex.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-22Linux 2.6.31.5v2.6.31.5Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22tty: Make flush_to_ldisc() locking more robustLinus Torvalds
commit c8e33141911bf8fe87dc6c92793b9a59b2be0130 upstream. The locking logic in this function is extremely subtle, and it broke when we started doing potentially concurrent 'flush_to_ldisc()' calls in commit e043e42bdb66885b3ac10d27a01ccb9972e2b0a3 ("pty: avoid forcing 'low_latency' tty flag"). The code in flush_to_ldisc() used to set 'tty->buf.head' to NULL, with the intention that this would then cause any other concurrent calls to not do anything (locking note: we have to drop the buf.lock over the call to ->receive_buf that can block, which is why we can have concurrency here at all in the first place). It also used to set the TTY_FLUSHING bit, which would then cause any concurrent 'tty_buffer_flush()' to not free all the tty buffers and clear 'tty->buf.tail'. And with 'buf.head' being NULL, and 'buf.tail' being non-NULL, new data would never touch 'buf.head'. Does that sound a bit too subtle? It was. If another concurrent call to 'flush_to_ldisc()' were to come in, the NULL buf.head would indeed cause it to not process the buffer list, but it would still clear TTY_FLUSHING afterwards, making the buffer protection against 'tty_buffer_flush()' no longer work. So this clears it all up. We depend purely on TTY_FLUSHING for handling re-entrancy, and stop playing games with the buffer list entirely. In fact, the buffer list handling is now robust enough that we could probably stop doing the whole "protect against 'tty_buffer_flush()'" thing entirely. However, Alan also points out that we would probably be better off simplifying the locking even further, and just take the tty ldisc_mutex around all the buffer flushing calls. That seems like a good idea, but in the meantime this is a conceptually minimal fix (with the patch itself being bigger than required just to clean the code up and make it readable). This fixes keyboard trouble under X: http://bugzilla.kernel.org/show_bug.cgi?id=14388 Reported-and-tested-by: Frédéric Meunier <fredlwm@gmail.com> Reported-and-tested-by: Boyan <btanastasov@yahoo.co.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22mac80211: fix vlan and optimise RXJohannes Berg
commit fbc44bf7177dfd61381da55405550b693943a432 upstream. When receiving data frames, we can send them only to the interface they belong to based on transmitting station (this doesn't work for probe requests). Also, don't try to handle other frames for AP_VLAN at all since those interface should only receive data. Additionally, the transmit side must check that the station we're sending a frame to is actually on the interface we're transmitting on, and not transmit packets to functions that live on other interfaces, so validate that as well. Another bug fix is needed in sta_info.c where in the VLAN case when adding/removing stations we overwrite the sdata variable we still need. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22iwlwifi: incorrect method used for finding valid OTP blocksJay Sternberg
commit 2facba769d7f9e563cf706de709074a2d20f1bba upstream. The address stored in the next link address is a word address but when reading the OTP blocks, a byte address is used. Also if the blocks are full and the last link pointer is not zero, then none of the blocks are valid so return an error. The algorithm is simply valid blocks have a next address and that address's contents is zero. Using the wrong address for the next link address gets arbitrary data, obviously. In cases seen, the first block is considered valid when it is not. If the block has in fact been invalidated there may be old data or there may be no data, bad data, or partial data, there is no way of telling. Without this patch it is possible that a device with valid OTP data is unable to work. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22usb-storage: Workaround devices with bogus sense sizeBenjamin Herrenschmidt
commit b8430e1b82b7e514d76a88eb70a7d8831d50df1e upstream. usb-storage: Workaround devices with bogus sense size Some devices, such as Huawei E169, advertise more than the standard amount of sense data, causing us to set US_FL_SANE_SENSE, assuming they support it. However, they subsequently fail the request sense with that size. This works around it generically. When a sense request fails due to a device returning an error, US_FL_SANE_SENSE was set, and that sense request used a larger sense size, we retry with a smaller size before giving up. Based on an original patch by Ben Efros <ben@pc-doctor.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22Staging: rt2860sta: prevent a panic when disabling when associatedDarren Salt
commit 0af49167b1e5ba154e90d2c454bf4624ee47df80 upstream. This fixes a panic which is triggered when the hardware "disappears" from beneath the driver, i.e. when wireless is toggled off via Fn-F2 on various EeePC models. Ref. bug report http://bugzilla.kernel.org/show_bug.cgi?id=13390 panic http://bugzilla.kernel.org/attachment.cgi?id=21928 Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22sysfs: Allow sysfs_notify_dirent to be called from interrupt context.Neil Brown
commit 83db93f4de2d9ae441a491d1dc61c2204f0195de upstream. sysfs_notify_dirent is a simple atomic operation that can be used to alert user-space that new data can be read from a sysfs attribute. Unfortunately it cannot currently be called from non-process context because of its use of spin_lock which is sometimes taken with interrupts enabled. So change all lockers of sysfs_open_dirent_lock to disable interrupts, thus making sysfs_notify_dirent safe to be called from non-process context (as drivers/md does in md_safemode_timeout). sysfs_get_open_dirent is (documented as being) only called from process context, so it uses spin_lock_irq. Other places use spin_lock_irqsave. The usage for sysfs_notify_dirent in md_safemode_timeout was introduced in 2.6.28, so this patch is suitable for that and more recent kernels. Reported-by: Joel Andres Granados <jgranado@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22bsdacct: switch credentials for writing to the accounting fileMichal Schmidt
commit d8e180dcd5bbbab9cd3ff2e779efcf70692ef541 upstream. When process accounting is enabled, every exiting process writes a log to the account file. In addition, every once in a while one of the exiting processes checks whether there's enough free space for the log. SELinux policy may or may not allow the exiting process to stat the fs. So unsuspecting processes start generating AVC denials just because someone enabled process accounting. For these filesystem operations, the exiting process's credentials should be temporarily switched to that of the process which enabled accounting, because it's really that process which wanted to have the accounting information logged. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22ALSA: Don't assume i2c device probing always succeedsTakashi Iwai
commit 18c4078489fe064cc0ed08be3381cf2f26657f5f upstream. The client->driver pointer can be NULL when i2c-device probing fails in i2c_new_device(). This patch adds the NULL checks for client->driver and return the error instead of blind assumption of driver availability. Reported-by: Tim Shepard <shep@alum.mit.edu> Cc: Jean Delvare <khali@linux-fr.org> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22i2c: Hide probe errors caused by ACPI resource conflictsJean Delvare
commit 18669eabde2ff5fc446e72e043f0539059763438 upstream. When an ACPI resource conflict is detected, error messages are already printed by ACPI. There's no point in causing the driver core to print more error messages, so return one of the error codes for which no message is printed. This fixes bug #14293: http://bugzilla.kernel.org/show_bug.cgi?id=14293 Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>