|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
A source of system latencies not yet considered in the histograms
of effective latencies is delayed timer interrupts. Such latencies
are mainly due to disabled interrupts. Recording effective latencies
allows continuous monitoring of a system's real-time capabilities
under real-world conditions.
This patch adds latency histograms of missed timer offsets. If the
timer belongs to a sleeper that is about to wake up a task and the
latency is higher than previous latencies of such timers, some data
about this task are recorded as well.
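A minimal sketch of the recorded quantity (the helper name is
hypothetical; the actual histogram code differs in detail):

	ktime_t now = ktime_get();
	s64 missed_ns = ktime_to_ns(ktime_sub(now,
					      hrtimer_get_expires(timer)));

	if (missed_ns > 0)
		latency_hist_record(missed_ns);	/* hypothetical helper */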
Adapted and expanded Documentation/trace/histograms.txt.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The algorithm used so far to trace the process with the highest priority
requires that no other processes with the same priority are being woken
up simultaneously. Otherwise, a process with a lower priority may be
picked up for tracing, which leads to an erroneously high latency value.
Generally, the wakeup latency of a process that exclusively uses the
highest priority of the system is due to software or hardware issues we
would like to solve or, at least, keep as small as possible. This is
what latency measurements are made for, after all. The wakeup latency of
a process that shares the highest priority of the system with other
processes is quite another story. It may contain the worst-case runtime
durations of the other processes; thus, it is the result of the priority
design of a given system and nothing a kernel developer or hardware
engineer may want to fix.
This said, we need to separately record latencies i) of processes that
exclusively use the highest priority of the system and ii) of processes
that share the highest priority of the system with other processes.
The above-mentioned shortcoming of the tracing algorithm also applies to
the variable tracing_max_latency that the wakeup latency tracer uses,
since it is based on the same procedure as the original version of the
latency histogram. In consequence, if several processes share the
highest priority of the system, the variable tracing_max_latency may
contain erroneously high values. We could now patch the wakeup latency
tracer as well and separately record the various latencies, but it is
better to document this behavior and recommend the latency histograms
for reliably determining a system's worst-case wakeup latency.
Simplified and cleaned up a bit. Added some more help info to Kconfig.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Jon Masters <jcm@jonmasters.org>
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Commit 9ef1d4c7c7aca1cd436612b6ca785b726ffb8ed8 ("[NETLINK]: Missing
initializations in dumped data") introduced a typo in
initialization. This patch fixes it.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Almost all r128's private ioctls require that the CCE state has
already been initialised. However, most do not test that this has
been done, and will proceed to dereference a null pointer. This may
result in a security vulnerability, since some ioctls are
unprivileged.
This adds a macro for the common initialisation test and changes all
ioctl implementations that require prior initialisation to use that
macro.
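A minimal sketch of what such a guard macro might look like (the macro
name and error message here are assumptions, not the driver's actual
macro):

#define R128_CCE_INIT_TEST_WITH_RETURN(dev_priv)			\
do {									\
	if (!(dev_priv)) {						\
		DRM_ERROR("called with no initialisation\n");		\
		return -EINVAL;						\
	}								\
} while (0)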
Also, r128_do_init_cce() does not test that the CCE state has not
been initialised already. Repeated initialisation may lead to a crash
or resource leak. This adds that test.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
I found a deadlock bug in UNIX domain sockets that allows non-root
users to mount a DoS attack against the local machine.
How to reproduce:
1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
namespace (*), and shutdown(2) it.
2. Repeat connect(2)ing to the listening socket from other sockets
until the connection backlog is filled.
3. connect(2) takes the CPU forever. If every core is taken, the
system hangs.
PoC code (run as many instances as there are cores on SMP machines):
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
	int ret;
	int csd;
	int lsd;
	struct sockaddr_un sun;

	/* make an abstract name address (*) */
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = PF_UNIX;
	sprintf(&sun.sun_path[1], "%d", getpid());

	/* create the listening socket and shut it down */
	lsd = socket(AF_UNIX, SOCK_STREAM, 0);
	bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
	listen(lsd, 1);
	shutdown(lsd, SHUT_RDWR);

	/* connect loop */
	alarm(15); /* forcibly exit the loop after 15 sec */
	for (;;) {
		csd = socket(AF_UNIX, SOCK_STREAM, 0);
		ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
		if (ret == -1) {
			perror("connect()");
			break;
		}
		puts("Connection OK");
	}
	return 0;
}
(*) Make sun_path[0] = 0 to use the abstract namespace.
If a file-based socket is used, the system doesn't deadlock because
of context switches in the file system layer.
Why this happens:
Error checks between unix_socket_connect() and unix_wait_for_peer() are
inconsistent. The former calls the latter to wait until the backlog is
processed. Although the latter returns without doing anything when the
socket is shut down, the former does not check the shutdown state and
just retries calling the latter forever.
Patch:
The patch below adds a shutdown check to unix_socket_connect(), so
connect(2) to a shut-down socket returns -ECONNREFUSED.
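A minimal sketch of the added check (exact placement within the
connect path of net/unix/af_unix.c is assumed):

	err = -ECONNREFUSED;
	if (other->sk_state != TCP_LISTEN)
		goto out_unlock;
	if (other->sk_shutdown & RCV_SHUTDOWN)	/* the new check */
		goto out_unlock;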
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The destination keyring specified to request_key() and co. is made available to
the process that instantiates the key (typically the slave process
started by /sbin/request-key). This is passed in the request_key_auth
struct as
the dest_keyring member.
keyctl_instantiate_key() and keyctl_negate_key() call get_instantiation_keyring()
to get the keyring to attach the newly constructed key to at the end of
instantiation. This may be given a specific keyring into which a link will be
made later, or it may be asked to find the keyring passed to request_key(). In
the former case, it returns a keyring with the refcount incremented by
lookup_user_key(); in the latter case, it returns the keyring from the
request_key_auth struct - and does _not_ increment the refcount.
The latter case will eventually result in an oops when the keyring prematurely
runs out of references and gets destroyed. The effect may take some time to
show up as the key is destroyed lazily.
To fix this, the keyring returned by get_instantiation_keyring() must always
have its refcount incremented, no matter where it comes from.
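A minimal sketch of the fix in the request_key_auth branch (variable
names assumed from the description above):

	/* return a real reference, not a borrowed pointer, so the
	 * caller's key_put() is always balanced */
	return key_get(rka->dest_keyring);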
This can be tested by setting /etc/request-key.conf to:
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...
#====== ======= =============== =============== ===============================
create * test:* * |/bin/false %u %g %d %{user:_display}
negate * * * /bin/keyctl negate %k 10 @u
and then doing:
keyctl add user _display aaaaaaaa @u
while keyctl request2 user test:x test:x @u &&
keyctl list @u;
do
keyctl request2 user test:x test:x @u;
sleep 31;
keyctl list @u;
done
which will oops eventually. Changing the negate line to have @u rather than
%S at the end is important as that forces the latter case by passing a special
keyring ID rather than an actual keyring ID.
Reported-by: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch fixes a null pointer exception in pipe_rdwr_open() which
generates the stack trace:
> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
> [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
> [<ffffffff8028125c>] __dentry_open+0x13c/0x230
> [<ffffffff8028143d>] do_filp_open+0x2d/0x40
> [<ffffffff802814aa>] do_sys_open+0x5a/0x100
> [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67
The failure mode is triggered by an attempt to open an anonymous
pipe via /proc/pid/fd/* as exemplified by this script:
=============================================================
while : ; do
{ echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
PID=$!
OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
{ read PID REST ; echo $PID; } )
OUT="${OUT%% *}"
DELAY=$((RANDOM * 1000 / 32768))
usleep $((DELAY * 1000 + RANDOM % 1000 ))
echo n > /proc/$OUT/fd/1 # Trigger defect
done
=============================================================
Note that the failure window is quite small and I could only
reliably reproduce the defect by inserting a small delay
in pipe_rdwr_open(). For example:
static int
pipe_rdwr_open(struct inode *inode, struct file *filp)
{
	msleep(100);	/* injected delay to widen the race window */
	mutex_lock(&inode->i_mutex);
Although the defect was observed in pipe_rdwr_open(), I think it
makes sense to replicate the change through all the pipe_*_open()
functions.
The core of the change is to verify that inode->i_pipe has not
been released before attempting to manipulate it. If inode->i_pipe
is no longer present, return ENOENT to indicate so.
The comment about potentially using atomic_t for i_pipe->readers
and i_pipe->writers has also been removed because it is no longer
relevant in this context. The inode->i_mutex lock must be used so
that inode->i_pipe can be dealt with correctly.
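A minimal sketch of the guarded open, following the description above
(the actual patch applies the same pattern to all pipe_*_open()
functions):

static int
pipe_rdwr_open(struct inode *inode, struct file *filp)
{
	int ret = -ENOENT;

	mutex_lock(&inode->i_mutex);
	if (inode->i_pipe) {		/* may already have been released */
		inode->i_pipe->readers++;
		inode->i_pipe->writers++;
		ret = 0;
	}
	mutex_unlock(&inode->i_mutex);

	return ret;
}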
Signed-off-by: Earl Chew <earl_chew@agilent.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.31.y
into rt/2.6.31
Conflicts:
Makefile
kernel/futex.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
This crash:
[ 1774.088275] divide error: 0000 [#1] SMP
[ 1774.100355] CPU 13
[ 1774.102498] Modules linked in:
[ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN
[ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>]
sched_balance_self+0x19b/0x2d4
Triggers because update_group_power() modifies the sd tree and does
temporary calculations there - not considering that other CPUs
could observe intermediate values, such as the zero initial value.
Calculate it in a temporary variable instead. (we need no memory
barrier as these are all statistical values anyway)
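A minimal sketch of the change (field names follow the report; the
real patch differs in detail):

	unsigned long power = SCHED_LOAD_SCALE;	/* local accumulator */

	/* ... apply frequency and RT scaling to 'power' ... */

	sdg->cpu_power = power;	/* publish once; no transient zero */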
Got the same oops with the backport to -rt
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
wake_affine() would always fail under low-load situations where
both prev and this were idle, because adding a single task will
always be a significant imbalance, even if there's nothing
around that could balance it.
Deal with this by allowing imbalance when there's nothing you
can do about it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
A more readable version, with a few differences:
- don't check against the root domain, but instead check
SD_LOAD_BALANCE
- don't re-iterate the cpus already iterated on the previous SD
- use rcu_read_lock() around the sd iteration
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Hopefully a more readable version of the same.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
APERF/MPERF support for cpu_power.
APERF/MPERF is arch-defined to be a relative scale of work capacity
per logical cpu; this is assumed to include SMT and Turbo mode.
APERF/MPERF are specified to both reset to 0 when either counter
wraps, which is highly inconvenient, since that'll give a blip when
that happens. The manual specifies writing 0 to the counters after
each read, but that's 1) too expensive, and 2) destroys the
possibility of sharing these counters with other users, so we live
with the blip - the other existing user does too.
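A minimal sketch of sampling the ratio (the MSRs are the documented
IA32_APERF/IA32_MPERF; the surrounding kernel plumbing is assumed):

	u64 aperf, mperf;

	rdmsrl(MSR_IA32_APERF, aperf);
	rdmsrl(MSR_IA32_MPERF, mperf);
	/* delta(aperf) / delta(mperf) gives relative work capacity */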
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Move some of the aperf/mperf code out of the cpufreq driver so that
other people can enjoy it too.
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Move the APERFMPERF capability into an X86_FEATURE flag so that it can
be used outside of the acpi cpufreq driver.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
It's a source of fail; also, now that cpu_power is dynamic, it's a
waste of time.
before:
<idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1
after:
bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
[andreas.herrmann3@amd.com: remove include]
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When the capacity drops low, we want to migrate load away. Allow the
load-balancer to remove all tasks when we hit rock bottom.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
[ego@in.ibm.com: fix to update_sd_power_savings_stats]
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Keep an average of the amount of time spent on RT tasks and use that
fraction to scale down the cpu_power for regular tasks.
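A minimal sketch of the scaling (names are assumed; the real helper in
kernel/sched.c differs):

static unsigned long scale_rt_power_sketch(unsigned long power,
					   u64 total, u64 avg_rt)
{
	u64 available = total - avg_rt;	/* time left for regular tasks */

	return (unsigned long)div64_u64(power * available, total);
}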
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Recompute the cpu_power for each cpu during load-balance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The idea is that multi-threading a core yields more work capacity than
a single thread; provide a way to express a static gain for threads.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In order to prepare for a more dynamic cpu_power, update the group sum
while walking the sched domains during load-balance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Do the placement thing using SD flags
XXX: consider degenerate bits
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
cpu_power is supposed to be a representation of the processing
capacity of the cpu, not a value to randomly tweak in order to affect
placement.
Remove the placement hacks.
[ dino: backport to 31-rt ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The queue_me/unqueue_me commentary is oddly placed and out of date.
Clean it up and correct the inaccurate bits.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053015.8717.71713.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
When requeuing tasks from one futex to another, the reference held
by the requeued task to the original futex location needs to be
dropped eventually.
Dropping the reference may ultimately lead to a call to
iput_final() and subsequently call into filesystem-specific code -
which may be non-atomic.
It is therefore safer to defer this drop operation until after the
futex_hash_bucket spinlock has been dropped.
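A minimal sketch of the reordering, based on the description above:

	double_unlock_hb(hb1, hb2);	/* drop the spinlocks first */
	drop_futex_key_refs(&key1);	/* may now reach iput_final() */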
Originally-From: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <4AD7A298.5040802@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The memory barrier semantics of futex_wait_queue_me() are
non-obvious. Add some commentary to try and clarify it.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090924185447.694.38948.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The state machine described in the comments wasn't updated with
a follow-on fix. Address that and clean up the corresponding
commentary in the function.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <4A737C2A.9090001@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Rich reported a lock imbalance in the futex code:
http://bugzilla.kernel.org/show_bug.cgi?id=14288
It's caused by the displacement of the retry_private label in
futex_wake_op(). The code unlocks the hash bucket locks in the
error handling path and retries without locking them again which
makes the next unlock fail.
Move retry_private so we lock the hash bucket locks when we retry.
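A minimal sketch of the corrected layout (surrounding logic elided):

retry_private:
	double_lock_hb(hb1, hb2);
	/* ... op/wake logic; error paths unlock the buckets and jump
	 * back to retry_private, which now re-acquires them ... */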
Reported-by: Rich Ercolany <rercola@acm.jhu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: stable-2.6.31 <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Correct various typos and formatting inconsistencies in the
commentary of futex_wait_requeue_pi().
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922052958.8717.21932.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Make the existing function kernel-doc consistent throughout
futex.c, following Documentation/kernel-doc-nano-howto.txt as
closely as possible.
When unsure, at least be consistent within futex.c.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053022.8717.13339.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Use kernel-doc format to describe struct futex_q.
Correct the wakeup definition to eliminate the statement about
waking the waiter between the plist_del() and the q->lock_ptr = 0.
Note in the comment that PI futexes have a different definition of
the woken state.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053029.8717.62798.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
it will find the futex_q->requeue_pi_key to be NULL and OOPS.
Check for NULL in match_futex() instead of doing explicit NULL pointer
checks on all call sites. While match_futex(NULL, NULL) returning
false is a little odd, it's still correct as we expect valid key
references.
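A minimal sketch of the NULL-tolerant comparison:

static int match_futex(union futex_key *key1, union futex_key *key2)
{
	return (key1 && key2
		&& key1->both.word == key2->both.word
		&& key1->both.ptr == key2->both.ptr
		&& key1->both.offset == key2->both.offset);
}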
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org
LKML-Reference: <4AD60687.10306@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test), nor does it use the wake_list of futex_wake(), which were
the reasons for commit 41890f2 (futex: Handle spurious wake up).
See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>
The changes in this fix to the wait_requeue_pi path were considered
likely unnecessary, but a harmless safety net. But it turns out that,
because for unknown $@#!*( reasons EWOULDBLOCK is defined as EAGAIN,
we built an endless loop in the code path which correctly returns
EWOULDBLOCK.
Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
take the easy solution and return EWOULDBLOCK^WEAGAIN to user space
and let it deal with the spurious wakeup.
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
There is currently no check to ensure that userspace uses the same
futex requeue target (uaddr2) in futex_requeue() that the waiter used
in futex_wait_requeue_pi(). A mismatch here could cause very
unexpected results, as the waiter assumes it either wakes on uaddr1 or
uaddr2. We could detect this on wakeup in the waiter, but the cleanup
is more intense after the improper requeue has occurred.
This patch stores the waiter's expected requeue target in a new
requeue_pi_key pointer in the futex_q, which futex_requeue() checks
prior to attempting to do a proxy lock acquisition or a requeue when
requeue_pi=1. If they don't match, return -EINVAL from futex_requeue(),
aborting the requeue of any remaining waiters.
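A minimal sketch of the check in futex_requeue()'s waiter loop
(variable names assumed):

	if (requeue_pi && !match_futex(this->requeue_pi_key, &key2)) {
		ret = -EINVAL;
		break;
	}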
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090814003650.14634.63916.stgit@Aeon>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Conflicts:
kernel/futex.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c8e33141911bf8fe87dc6c92793b9a59b2be0130 upstream.
The locking logic in this function is extremely subtle, and it broke
when we started doing potentially concurrent 'flush_to_ldisc()' calls in
commit e043e42bdb66885b3ac10d27a01ccb9972e2b0a3 ("pty: avoid forcing
'low_latency' tty flag").
The code in flush_to_ldisc() used to set 'tty->buf.head' to NULL, with
the intention that this would then cause any other concurrent calls to
not do anything (locking note: we have to drop the buf.lock over the
call to ->receive_buf that can block, which is why we can have
concurrency here at all in the first place).
It also used to set the TTY_FLUSHING bit, which would then cause any
concurrent 'tty_buffer_flush()' to not free all the tty buffers and
clear 'tty->buf.tail'. And with 'buf.head' being NULL, and 'buf.tail'
being non-NULL, new data would never touch 'buf.head'.
Does that sound a bit too subtle? It was. If another concurrent call to
'flush_to_ldisc()' were to come in, the NULL buf.head would indeed cause
it to not process the buffer list, but it would still clear TTY_FLUSHING
afterwards, making the buffer protection against 'tty_buffer_flush()' no
longer work.
So this clears it all up. We depend purely on TTY_FLUSHING for handling
re-entrancy, and stop playing games with the buffer list entirely. In
fact, the buffer list handling is now robust enough that we could
probably stop doing the whole "protect against 'tty_buffer_flush()'"
thing entirely.
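A minimal sketch of the re-entrancy guard (the real flush_to_ldisc()
in drivers/char/tty_buffer.c is considerably more involved):

	if (test_and_set_bit(TTY_FLUSHING, &tty->flags))
		return;		/* another flush is already in progress */
	/* ... walk the buffer list, dropping buf.lock around the
	 * blocking ->receive_buf() calls ... */
	clear_bit(TTY_FLUSHING, &tty->flags);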
However, Alan also points out that we would probably be better off
simplifying the locking even further, and just take the tty ldisc_mutex
around all the buffer flushing calls. That seems like a good idea, but
in the meantime this is a conceptually minimal fix (with the patch
itself being bigger than required just to clean the code up and make it
readable).
This fixes keyboard trouble under X:
http://bugzilla.kernel.org/show_bug.cgi?id=14388
Reported-and-tested-by: Frédéric Meunier <fredlwm@gmail.com>
Reported-and-tested-by: Boyan <btanastasov@yahoo.co.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit fbc44bf7177dfd61381da55405550b693943a432 upstream.
When receiving data frames, we can send them only to
the interface they belong to based on the transmitting
station (this doesn't work for probe requests). Also,
don't try to handle other frames for AP_VLAN at all
since those interfaces should only receive data.
Additionally, the transmit side must check that the
station we're sending a frame to is actually on the
interface we're transmitting on, and not transmit
packets to functions that live on other interfaces,
so validate that as well.
Another bug fix is needed in sta_info.c: in the VLAN case, when
adding/removing stations, we overwrite the sdata variable we still
need.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2facba769d7f9e563cf706de709074a2d20f1bba upstream.
The address stored in the next link address is a word address but when
reading the OTP blocks, a byte address is used. Also, if the blocks are
full and the last link pointer is not zero, then none of the blocks are
valid, so return an error.
The algorithm is simply: valid blocks have a next address, and that
address's contents are zero.
Using the wrong address for the next link address reads arbitrary
data, obviously. In the cases seen, the first block is considered
valid when it is not.
If the block has in fact been invalidated, there may be old data, no
data, bad data, or partial data; there is no way of telling. Without
this patch it is possible that a device with valid OTP data is unable
to work.
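A minimal sketch of the address fix (variable names assumed):

	/* the link stores a word address; convert to bytes */
	next_link_addr = le16_to_cpu(link_value) * sizeof(u16);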
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b8430e1b82b7e514d76a88eb70a7d8831d50df1e upstream.
usb-storage: Workaround devices with bogus sense size
Some devices, such as Huawei E169, advertise more than the standard
amount of sense data, causing us to set US_FL_SANE_SENSE, assuming
they support it. However, they subsequently fail the request sense
with that size.
This works around it generically: when a sense request fails because
the device returned an error, US_FL_SANE_SENSE was set, and that sense
request used a larger sense size, we retry with a smaller size before
giving up.
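A minimal sketch of the fallback (variable and label names are
assumed; US_FL_SANE_SENSE is the flag named above, and the standard
sense size is 18 bytes):

	if (sense_failed && (us->fflags & US_FL_SANE_SENSE) &&
	    sense_size != US_SENSE_SIZE) {
		sense_size = US_SENSE_SIZE;	/* retry with 18 bytes */
		goto retry_sense;
	}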
Based on an original patch by Ben Efros <ben@pc-doctor.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0af49167b1e5ba154e90d2c454bf4624ee47df80 upstream.
This fixes a panic which is triggered when the hardware "disappears" from
beneath the driver, i.e. when wireless is toggled off via Fn-F2 on various
EeePC models.
Ref. bug report http://bugzilla.kernel.org/show_bug.cgi?id=13390
panic http://bugzilla.kernel.org/attachment.cgi?id=21928
Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 83db93f4de2d9ae441a491d1dc61c2204f0195de upstream.
sysfs_notify_dirent is a simple atomic operation that can be used to
alert user-space that new data can be read from a sysfs attribute.
Unfortunately it cannot currently be called from non-process context
because of its use of spin_lock which is sometimes taken with
interrupts enabled.
So change all lockers of sysfs_open_dirent_lock to disable interrupts,
thus making sysfs_notify_dirent safe to be called from non-process
context (as drivers/md does in md_safemode_timeout).
sysfs_get_open_dirent is (documented as being) only called from
process context, so it uses spin_lock_irq. Other places
use spin_lock_irqsave.
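A minimal sketch of the irqsave conversion:

	unsigned long flags;

	spin_lock_irqsave(&sysfs_open_dirent_lock, flags);
	/* ... manipulate the open dirent ... */
	spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags);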
The usage for sysfs_notify_dirent in md_safemode_timeout was
introduced in 2.6.28, so this patch is suitable for that and more
recent kernels.
Reported-by: Joel Andres Granados <jgranado@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d8e180dcd5bbbab9cd3ff2e779efcf70692ef541 upstream.
When process accounting is enabled, every exiting process writes a log to
the account file. In addition, every once in a while one of the exiting
processes checks whether there's enough free space for the log.
SELinux policy may or may not allow the exiting process to stat the fs.
So unsuspecting processes start generating AVC denials just because
someone enabled process accounting.
For these filesystem operations, the exiting process's credentials should
be temporarily switched to that of the process which enabled accounting,
because it's really that process which wanted to have the accounting
information logged.
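A minimal sketch of the credential switch around the free-space check
(the helper name is assumed):

	const struct cred *orig_cred;

	orig_cred = override_creds(file->f_cred); /* accounting enabler */
	res = check_free_space(acct, file);	  /* assumed helper */
	revert_creds(orig_cred);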
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 18c4078489fe064cc0ed08be3381cf2f26657f5f upstream.
The client->driver pointer can be NULL when i2c-device probing fails
in i2c_new_device(). This patch adds NULL checks for client->driver
and returns an error instead of blindly assuming the driver is
available.
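A minimal sketch of the guard (the call site is assumed):

	if (!client || !client->driver)
		return -ENODEV;	/* probing failed, no driver bound */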
Reported-by: Tim Shepard <shep@alum.mit.edu>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 18669eabde2ff5fc446e72e043f0539059763438 upstream.
When an ACPI resource conflict is detected, error messages are already
printed by ACPI. There's no point in causing the driver core to print
more error messages, so return one of the error codes for which no
message is printed.
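A minimal sketch of the pattern (driver-specific details assumed):

	if (acpi_check_resource_conflict(&res))
		return -ENODEV;	/* silent code; ACPI already complained */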
This fixes bug #14293:
http://bugzilla.kernel.org/show_bug.cgi?id=14293
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|