lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2012-03-12	NOMMU: Don't need to clear vm_mm when deleting a VMA	David Howells
	commit b94cfaf6685d691dc3fab023cf32f65e9b7be09c upstream. Don't clear vm_mm in a deleted VMA as it's unnecessary and might conceivably break the filesystem or driver VMA close routine. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	mm: memcg: Correct unregistring of events attached to the same eventfd	Anton Vorontsov
	commit 371528caec553785c37f73fa3926ea0de84f986f upstream. There is an issue when memcg unregisters events that were attached to the same eventfd: - On the first call mem_cgroup_usage_unregister_event() removes all events attached to a given eventfd, and if there were no events left, thresholds->primary would become NULL; - Since there were several events registered, cgroups core will call mem_cgroup_usage_unregister_event() again, but now kernel will oops, as the function doesn't expect that threshold->primary may be NULL. That's a good question whether mem_cgroup_usage_unregister_event() should actually remove all events in one go, but nowadays it can't do any better as cftype->unregister_event callback doesn't pass any private event-associated cookie. So, let's fix the issue by simply checking for threshold->primary. FWIW, w/o the patch the following oops may be observed: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0 Pid: 574, comm: kworker/0:2 Not tainted 3.3.0-rc4+ #9 Bochs Bochs RIP: 0010:[<ffffffff810be32c>] [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0 RSP: 0018:ffff88001d0b9d60 EFLAGS: 00010246 Process kworker/0:2 (pid: 574, threadinfo ffff88001d0b8000, task ffff88001de91cc0) Call Trace: [<ffffffff8107092b>] cgroup_event_remove+0x2b/0x60 [<ffffffff8103db94>] process_one_work+0x174/0x450 [<ffffffff8103e413>] worker_thread+0x123/0x2d0 Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	mmc: sdhci-esdhc-imx: fix for mmc cards on i.MX5	Sascha Hauer
	commit 5b6b0ad6e572b32a641116aaa5f897ffebe31e44 upstream. On i.MX53 we have to write a special SDHCI_CMD_ABORTCMD to the SDHCI_TRANSFER_MODE register during a MMC_STOP_TRANSMISSION command. This works for SD cards. However, with MMC cards the MMC_SET_BLOCK_COUNT command is used instead, but this needs the same handling. Fix MMC cards by testing for the MMC_SET_BLOCK_COUNT command aswell. Tested on a custom i.MX53 board with a Transcend MMC+ card and eMMC. The kernel started used MMC_SET_BLOCK_COUNT in 3.0, so this is a regression for these boards introduced in 3.0; it should go to 3.0/3.1/3.2-stable. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	alpha: fix 32/64-bit bug in futex support	Andrew Morton
	commit 62aca403657fe30e5235c5331e9871e676d9ea0a upstream. Michael Cree said: : : I have noticed some user space problems (pulseaudio crashes in pthread : : code, glibc/nptl test suite failures, java compiler freezes on SMP alpha : : systems) that arise when using a 2.6.39 or later kernel on Alpha. : : Bisecting between 2.6.38 and 2.6.39 (using glibc/nptl test suite as : : criterion for good/bad kernel) eventually leads to: : : : : 8d7718aa082aaf30a0b4989e1f04858952f941bc is the first bad commit : : commit 8d7718aa082aaf30a0b4989e1f04858952f941bc : : Author: Michel Lespinasse <walken@google.com> : : Date: Thu Mar 10 18:50:58 2011 -0800 : : : : futex: Sanitize futex ops argument types : : : : Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic : : prototypes to use u32 types for the futex as this is the data type the : : futex core code uses all over the place. : : : : Looking at the commit I see there is a change of the uaddr argument in : : the Alpha architecture specific code for futexes from int to u32, but I : : don't see why this should cause a problem. Richard Henderson said: : futex_atomic_cmpxchg_inatomic(u32 uval, u32 __user uaddr, : u32 oldval, u32 newval) : ... : : "r"(uaddr), "r"((long)oldval), "r"(newval) : : : There is no 32-bit compare instruction. These are implemented by : consistently extending the values to a 64-bit type. Since the : load instruction sign-extends, we want to sign-extend the other : quantity as well (despite the fact it's logically unsigned). : : So: : : - : "r"(uaddr), "r"((long)oldval), "r"(newval) : + : "r"(uaddr), "r"((long)(int)oldval), "r"(newval) : : should do the trick. Michael said: : This fixes the glibc test suite failures and the pulseaudio related : crashes, but it does not fix the java compiiler lockups that I was (and : are still) observing. That is some other problem. Reported-by: Michael Cree <mcree@orcon.net.nz> Tested-by: Michael Cree <mcree@orcon.net.nz> Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Michel Lespinasse <walken@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	Move Logitech Harmony 900 from cdc_ether to zaurus	Scott Talbert
	commit ee932bf9acb2e2c6a309e808000f24856330e3f9 upstream. In the current kernel implementation, the Logitech Harmony 900 remote control is matched to the cdc_ether driver through the generic USB_CDC_SUBCLASS_MDLM entry. However, this device appears to be of the pseudo-MDLM (Belcarra) type, rather than the standard one. This patch blacklists the Harmony 900 from the cdc_ether driver and whitelists it for the pseudo-MDLM driver in zaurus. Signed-off-by: Scott Talbert <talbert@techie.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ARM: S3C24XX: DMA resume regression fix	Gusakov Andrey
	commit e39d40c65dfd8390b50c03482ae9e289b8a8f351 upstream. s3c2410_dma_suspend suspends channels from 0 to dma_channels. s3c2410_dma_resume resumes channels in reverse order. So pointer should be decremented instead of being incremented. Signed-off-by: Gusakov Andrey <dron0gus@gmail.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	genirq: Clear action->thread_mask if IRQ_ONESHOT is not set	Thomas Gleixner
	commit 52abb700e16a9aa4cbc03f3d7f80206cbbc80680 upstream. Xommit ac5637611(genirq: Unmask oneshot irqs when thread was not woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled by handle_level_irq. This happens because thread_mask is or'ed unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts never cleared. So the check for !desc->thread_active fails and keeps the interrupt disabled. Keep the thread_mask zero for !IRQ_ONESHOT interrupts. Document the thread_mask magic while at it. Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de> Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	mfd: Fix ACPI conflict check	Jean Delvare
	commit 81b5482c32769abb6dfb979560dab2f952ba86fa upstream. The code is currently always checking the first resource of every device only (several times.) This has been broken since the ACPI check was added in February 2010 in commit 91fedede0338eb6203cdd618d8ece873fdb7c22c. Fix the check to run on each resource individually, once. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	regset: Return -EFAULT, not -EIO, on host-side memory fault	H. Peter Anvin
	commit 5189fa19a4b2b4c3bec37c3a019d446148827717 upstream. There is only one error code to return for a bad user-space buffer pointer passed to a system call in the same address space as the system call is executed, and that is EFAULT. Furthermore, the low-level access routines, which catch most of the faults, return EFAULT already. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	regset: Prevent null pointer reference on readonly regsets	H. Peter Anvin
	commit c8e252586f8d5de906385d8cf6385fee289a825e upstream. The regset common infrastructure assumed that regsets would always have .get and .set methods, but not necessarily .active methods. Unfortunately people have since written regsets without .set methods. Rather than putting in stub functions everywhere, handle regsets with null .get or .set methods explicitly. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ALSA: hda - Always set HP pin in unsol handler for STAC/IDT codecs	Takashi Iwai
	commit 7bff172a352a2fbe9856bba517d71a2072aab041 upstream. A bug report with an old Sony laptop showed that we can't rely on BIOS setting the pins of headphones but the driver should set always by itself. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ALSA: hda - Add a fake mute feature	Takashi Iwai
	commit 3868137ea41866773e75d9ac4b9988dcc361ff1d upstream. Some codecs don't supply the mute amp-capabilities although the lowest volume gives the mute. It'd be handy if the parser provides the mute mixers in such a case. This patch adds an extension amp-cap bit (which is used only in the driver) to represent the min volume = mute state. Also modified the amp cache code to support the fake mute feature when this bit is set but the real mute bit is unset. In addition, conexant cx5051 parser uses this new feature to implement the missing mute controls. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42825 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	S390: KEYS: Enable the compat keyctl wrapper on s390x	David Howells
	commit 1d057720609ed052a6371fe1d53300e5e6328e94 upstream. Enable the compat keyctl wrapper on s390x so that 32-bit s390 userspace can call the keyctl() syscall. There's an s390x assembly wrapper that truncates all the register values to 32-bits and this then calls compat_sys_keyctl() - but the latter only exists if CONFIG_KEYS_COMPAT is enabled, and the s390 Kconfig doesn't enable it. Without this patch, 32-bit calls to the keyctl() syscall are given an ENOSYS error: [root@devel4 ~]# keyctl show Session Keyring -3: key inaccessible (Function not implemented) Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: dan@danny.cz Cc: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: linux-s390@vger.kernel.org Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	regulator: fix the ldo configure according to 88pm860x spec	Jett.Zhou
	commit 3380643b0eaa7ecf99c4f095bdfcb6e5df471616 upstream. Signed-off-by: Jett.Zhou <jtzhou@marvell.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	i2c: mxs: only flag completion when queue is completely done	Wolfram Sang
	commit 844990daa2e69a4258049ba9c2bae1180657dac3 upstream. The hardware generates an interrupt for every completed command in the queue while the code assumed that it will only generate one interrupt when the queue is empty. So, explicitly check if the queue is really empty. This patch fixed problems which occurred due to high traffic on the bus. While we are here, move the completion-initialization after the parameter error checking. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Marek Vasut <marek.vasut@gmail.com> Cc: Lothar Waßmann <LW@KARO-electronics.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	watchdog: hpwdt: clean up set_memory_x call for 32 bit	Maxim Uvarov
	commit 97d2a10d5804d585ab0b58efbd710948401b886a upstream. 1. address has to be page aligned. 2. set_memory_x uses page size argument, not size. Bug causes with following commit: commit da28179b4e90dda56912ee825c7eaa62fc103797 Author: Mingarelli, Thomas <Thomas.Mingarelli@hp.com> Date: Mon Nov 7 10:59:00 2011 +0100 watchdog: hpwdt: Changes to handle NX secure bit in 32bit path commit e67d668e147c3b4fec638c9e0ace04319f5ceccd upstream. This patch makes use of the set_memory_x() kernel API in order to make necessary BIOS calls to source NMIs. Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ARM: LPC32xx: Fix irq on GPI_28	Roland Stigge
	commit f6737055c1c432a9628a9a731f9881ad8e0a9eee upstream. The GPI_28 IRQ was not registered properly. The registration of IRQ_LPC32XX_GPI_28 was added and the (wrong) IRQ_LPC32XX_GPI_11 at LPC32XX_SIC1_IRQ(4) was replaced by IRQ_LPC32XX_GPI_28 (see manual of LPC32xx / interrupt controller). Signed-off-by: Roland Stigge <stigge@antcom.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ARM: LPC32xx: Fix interrupt controller init	Roland Stigge
	commit 35dd0a75d4a382e7f769dd0277732e7aa5235718 upstream. This patch fixes the initialization of the interrupt controller of the LPC32xx by correctly setting up SIC1 and SIC2 instead of (wrongly) using the same value as for the Main Interrupt Controller (MIC). Signed-off-by: Roland Stigge <stigge@antcom.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ARM: LPC32xx: irq.c: Clear latched event	Roland Stigge
	commit 94ed7830cba4dce57b18a2926b5d826bfd184bd6 upstream. This patch fixes the wakeup disable function by clearing latched events. Signed-off-by: Roland Stigge <stigge@antcom.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ARM: LPC32xx: serial.c: Fixed loop limit	Roland Stigge
	commit ff424aa4c89d19082e8ae5a3351006bc8a4cd91b upstream. This patch fixes a wrong loop limit on UART init. Signed-off-by: Roland Stigge <stigge@antcom.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	ARM: LPC32xx: serial.c: HW bug workaround	Roland Stigge
	commit 2707208ee8a80dbbd5426f5aa1a934f766825bb5 upstream. This patch fixes a HW bug by flushing RX FIFOs of the UARTs on init. It was ported from NXP's git.lpclinux.com tree. Signed-off-by: Roland Stigge <stigge@antcom.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	drm/i915: Prevent a machine hang by checking crtc->active before loading lut	Alban Browaeys
	commit aed3f09db39596e539f90b11a5016aea4d8442e1 upstream. Before loading the lut (gamma), check the active state of intel_crtc, otherwise at least on gen2 hang ensue. This is reproducible in Xorg via: xset dpms force off then xgamma -rgamma 2.0 # freeze. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44505 Signed-off-by: Alban Browaeys <prahal@yahoo.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	compat: fix compile breakage on s390	Heiko Carstens
	commit 048cd4e51d24ebf7f3552226d03c769d6ad91658 upstream. The new is_compat_task() define for the !COMPAT case in include/linux/compat.h conflicts with a similar define in arch/s390/include/asm/compat.h. This is the minimal patch which fixes the build issues. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	Fix autofs compile without CONFIG_COMPAT	Linus Torvalds
	commit 3c761ea05a8900a907f32b628611873f6bef24b2 upstream. The autofs compat handling fix caused a compile failure when CONFIG_COMPAT isn't defined. Instead of adding random #ifdef'fery in autofs, let's just make the compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task() just hardcodes to zero. We could probably do something similar for a number of other cases where we have #ifdef's in code, but this is the low-hanging fruit. Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12	autofs: work around unhappy compat problem on x86-64	Ian Kent
	commit a32744d4abae24572eff7269bc17895c41bd0085 upstream. When the autofs protocol version 5 packet type was added in commit 5c0a32fc2cd0 ("autofs4: add new packet type for v5 communications"), it obvously tried quite hard to be word-size agnostic, and uses explicitly sized fields that are all correctly aligned. However, with the final "char name[NAME_MAX+1]" array at the end, the actual size of the structure ends up being not very well defined: because the struct isn't marked 'packed', doing a "sizeof()" on it will align the size of the struct up to the biggest alignment of the members it has. And despite all the members being the same, the alignment of them is different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round number (256), the name[] array starts out a 4-byte aligned. End result: the "packed" size of the structure is 300 bytes: 4-byte, but not 8-byte aligned. As a result, despite all the fields being in the same place on all architectures, sizeof() will round up that size to 304 bytes on architectures that have 8-byte alignment for u64. Note that this is not a problem for 32-bit compat mode on POWER, since there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit and 64-bit alignment is different for 64-bit entities, and as a result the structure that has exactly the same layout has different sizes. So on x86-64, but no other architecture, we will just subtract 4 from the size of the structure when running in a compat task. That way we will write the properly sized packet that user mode expects. Not pretty. Sadly, this very subtle, and unnecessary, size difference has been encoded in user space that wants to read packets of exactly the right size, and will refuse to touch anything else. Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	Linux 3.0.23v3.0.23	Greg Kroah-Hartman

2012-02-29	cdrom: use copy_to_user() without the underscores	Dan Carpenter
	commit 822bfa51ce44f2c63c300fdb76dc99c4d5a5ca9f upstream. "nframes" comes from the user and "nframes * CD_FRAMESIZE_RAW" can wrap on 32 bit systems. That would have been ok if we used the same wrapped value for the copy, but we use a shifted value. We should just use the checked version of copy_to_user() because it's not going to make a difference to the speed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	epoll: limit paths	Jason Baron
	commit 28d82dc1c4edbc352129f97f4ca22624d1fe61de upstream. The current epoll code can be tickled to run basically indefinitely in both loop detection path check (on ep_insert()), and in the wakeup paths. The programs that tickle this behavior set up deeply linked networks of epoll file descriptors that cause the epoll algorithms to traverse them indefinitely. A couple of these sample programs have been previously posted in this thread: https://lkml.org/lkml/2011/2/25/297. To fix the loop detection path check algorithms, I simply keep track of the epoll nodes that have been already visited. Thus, the loop detection becomes proportional to the number of epoll file descriptor and links. This dramatically decreases the run-time of the loop check algorithm. In one diabolical case I tried it reduced the run-time from 15 mintues (all in kernel time) to .3 seconds. Fixing the wakeup paths could be done at wakeup time in a similar manner by keeping track of nodes that have already been visited, but the complexity is harder, since there can be multiple wakeups on different cpus...Thus, I've opted to limit the number of possible wakeup paths when the paths are created. This is accomplished, by noting that the end file descriptor points that are found during the loop detection pass (from the newly added link), are actually the sources for wakeup events. I keep a list of these file descriptors and limit the number and length of these paths that emanate from these 'source file descriptors'. In the current implemetation I allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of length 4 and 10 of length 5. Note that it is sufficient to check the 'source file descriptors' reachable from the newly added link, since no other 'source file descriptors' will have newly added links. This allows us to check only the wakeup paths that may have gotten too long, and not re-check all possible wakeup paths on the system. In terms of the path limit selection, I think its first worth noting that the most common case for epoll, is probably the model where you have 1 epoll file descriptor that is monitoring n number of 'source file descriptors'. In this case, each 'source file descriptor' has a 1 path of length 1. Thus, I believe that the limits I'm proposing are quite reasonable and in fact may be too generous. Thus, I'm hoping that the proposed limits will not prevent any workloads that currently work to fail. In terms of locking, I have extended the use of the 'epmutex' to all epoll_ctl add and remove operations. Currently its only used in a subset of the add paths. I need to hold the epmutex, so that we can correctly traverse a coherent graph, to check the number of paths. I believe that this additional locking is probably ok, since its in the setup/teardown paths, and doesn't affect the running paths, but it certainly is going to add some extra overhead. Also, worth noting is that the epmuex was recently added to the ep_ctl add operations in the initial path loop detection code using the argument that it was not on a critical path. Another thing to note here, is the length of epoll chains that is allowed. Currently, eventpoll.c defines: /* Maximum number of nesting allowed inside epoll sets */ #define EP_MAX_NESTS 4 This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS + 1). However, this limit is currently only enforced during the loop check detection code, and only when the epoll file descriptors are added in a certain order. Thus, this limit is currently easily bypassed. The newly added check for wakeup paths, stricly limits the wakeup paths to a length of 5, regardless of the order in which ep's are linked together. Thus, a side-effect of the new code is a more consistent enforcement of the graph depth. Thus far, I've tested this, using the sample programs previously mentioned, which now either return quickly or return -EINVAL. I've also testing using the piptest.c epoll tester, which showed no difference in performance. I've also created a number of different epoll networks and tested that they behave as expectded. I believe this solves the original diabolical test cases, while still preserving the sane epoll nesting. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Nelson Elhage <nelhage@ksplice.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	epoll: ep_unregister_pollwait() can use the freed pwq->whead	Oleg Nesterov
	commit 971316f0503a5c50633d07b83b6db2f15a3a5b00 upstream. signalfd_cleanup() ensures that ->signalfd_wqh is not used, but this is not enough. eppoll_entry->whead still points to the memory we are going to free, ep_unregister_pollwait()->remove_wait_queue() is obviously unsafe. Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL, change ep_unregister_pollwait() to check pwq->whead != NULL under rcu_read_lock() before remove_wait_queue(). We add the new helper, ep_remove_wait_queue(), for this. This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because ->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand. ep_unregister_pollwait()->remove_wait_queue() can play with already freed and potentially reused ->sighand, but this is fine. This memory must have the valid ->signalfd_wqh until rcu_read_unlock(). Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()	Oleg Nesterov
	commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream. This patch is intentionally incomplete to simplify the review. It ignores ep_unregister_pollwait() which plays with the same wqh. See the next change. epoll assumes that the EPOLL_CTL_ADD'ed file controls everything f_op->poll() needs. In particular it assumes that the wait queue can't go away until eventpoll_release(). This is not true in case of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand which is not connected to the file. This patch adds the special event, POLLFREE, currently only for epoll. It expects that init_poll_funcptr()'ed hook should do the necessary cleanup. Perhaps it should be defined as EPOLLFREE in eventpoll. __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if ->signalfd_wqh is not empty, we add the new signalfd_cleanup() helper. ep_poll_callback(POLLFREE) simply does list_del_init(task_list). This make this poll entry inconsistent, but we don't care. If you share epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special". I simply do not know how we can define the "right" semantics if it used with epoll. The main problem is, epoll calls signalfd_poll() once to establish the connection with the wait queue, after that signalfd_poll(NULL) returns the different/inconsistent results depending on who does EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd has nothing to do with the file, it works with the current thread. In short: this patch is the hack which tries to fix the symptoms. It also assumes that nobody can take tasklist_lock under epoll locks, this seems to be true. Note: - we do not have wake_up_all_poll() but wake_up_poll() is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE. - signalfd_cleanup() uses POLLHUP along with POLLFREE, we need a couple of simple changes in eventpoll.c to make sure it can't be "lost". Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	hwmon: (f75375s) Fix register write order when setting fans to full speed	Nikolaus Schulz
	commit c1c1a3d012fe5e82a9a025fb4b5a4f8ee67a53f6 upstream. By hwmon sysfs interface convention, setting pwm_enable to zero sets a fan to full speed. In the f75375s driver, this need be done by enabling manual fan control, plus duty mode for the F875387 chip, and then setting the maximum duty cycle. Fix a bug where the two necessary register writes were swapped, effectively discarding the setting to full-speed. Signed-off-by: Nikolaus Schulz <mail@microschulz.de> Cc: Riku Voipio <riku.voipio@iki.fi> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	hdpvr: fix race conditon during start of streaming	Janne Grunau
	commit afa159538af61f1a65d48927f4e949fe514fb4fc upstream. status has to be set to STREAMING before the streaming worker is queued. hdpvr_transmit_buffers() will exit immediately otherwise. Reported-by: Joerg Desch <vvd.joede@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	builddeb: Don't create files in /tmp with predictable names	Ben Hutchings
	commit 6c635224602d760c1208ada337562f40d8ae93a5 upstream. The current use of /tmp for file lists is insecure. Put them under $objtree/debian instead. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: maximilian attems <max@stro.at> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	davinci_emac: Do not free all rx dma descriptors during init	Christian Riesch
	commit 5d69703263d588dbb03f4e57091afd8942d96e6d upstream. This patch fixes a regression that was introduced by commit 0a5f38467765ee15478db90d81e40c269c8dda20 davinci_emac: Add Carrier Link OK check in Davinci RX Handler Said commit adds a check whether the carrier link is ok. If the link is not ok, the skb is freed and no new dma descriptor added to the rx dma channel. This causes trouble during initialization when the carrier status has not yet been updated. If a lot of packets are received while netif_carrier_ok returns false, all dma descriptors are freed and the rx dma transfer is stopped. The bug occurs when the board is connected to a network with lots of traffic and the ifconfig down/up is done, e.g., when reconfiguring the interface with DHCP. The bug can be reproduced by flood pinging the davinci board while doing ifconfig eth0 down ifconfig eth0 up on the board. After that, the rx path stops working and the overrun value reported by ifconfig is counting up. This patch reverts commit 0a5f38467765ee15478db90d81e40c269c8dda20 and instead issues warnings only if cpdma_chan_submit returns -ENOMEM. Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Cc: <stable@vger.kernel.org> Cc: Cyril Chemparathy <cyril@ti.com> Cc: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: Rajashekhara, Sudhakar <sudhakar.raj@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	jme: Fix FIFO flush issue	Guo-Fu Tseng
	commit ba9adbe67e288823ac1deb7f11576ab5653f833e upstream. Set the RX FIFO flush watermark lower. According to Federico and JMicron's reply, setting it to 16QW would be stable on most platforms. Otherwise, user might experience packet drop issue. Reported-by: Federico Quagliata <federico@quagliata.org> Fixed-by: Federico Quagliata <federico@quagliata.org> Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	ipvs: fix matching of fwmark templates during scheduling	Simon Horman
	commit e0aac52e17a3db68fe2ceae281780a70fc69957f upstream. Commit f11017ec2d1859c661f4e2b12c4a8d250e1f47cf (2.6.37) moved the fwmark variable in subcontext that is invalidated before reaching the ip_vs_ct_in_get call. As vaddr is provided as pointer in the param structure make sure the fwmark variable is in same context. As the fwmark templates can not be matched, more and more template connections are created and the controlled connections can not go to single real server. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	scsi_pm: Fix bug in the SCSI power management handler	Alan Stern
	commit fea6d607e154cf96ab22254ccb48addfd43d4cb5 upstream. This patch (as1520) fixes a bug in the SCSI layer's power management implementation. LUN scanning can be carried out asynchronously in do_scan_async(), and sd uses an asynchronous thread for the time-consuming parts of disk probing in sd_probe_async(). Currently nothing coordinates these async threads with system sleep transitions; they can and do attempt to continue scanning/probing SCSI devices even after the host adapter has been suspended. As one might expect, the outcome is not ideal. This is what the "prepare" stage of system suspend was created for. After the prepare callback has been called for a host, target, or device, drivers are not allowed to register any children underneath them. Currently the SCSI prepare callback is not implemented; this patch rectifies that omission. For SCSI hosts, the prepare routine calls scsi_complete_async_scans() to wait until async scanning is finished. It might be slightly more efficient to wait only until the host in question has been scanned, but there's currently no way to do that. Besides, during a sleep transition we will ultimately have to wait until all the host scanning has finished anyway. For SCSI devices, the prepare routine calls async_synchronize_full() to wait until sd probing is finished. The routine does nothing for SCSI targets, because asynchronous target scanning is done only as part of host scanning. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	scsi_scan: Fix 'Poison overwritten' warning caused by using freed 'shost'	Huajun Li
	commit 267a6ad4aefaafbde607804c60945bcf97f91c1b upstream. In do_scan_async(), calling scsi_autopm_put_host(shost) may reference freed shost, and cause Posison overwitten warning. Yes, this case can happen, for example, an USB is disconnected just when do_scan_async() thread starts to run, then scsi_host_put() called in scsi_finish_async_scan() will lead to shost be freed(because the refcount of shost->shost_gendev decreases to 1 after USB disconnects), at this point, if references shost again, system will show following warning msg. To make scsi_autopm_put_host(shost) always reference a valid shost, put it just before scsi_host_put() in function scsi_finish_async_scan(). [ 299.281565] ============================================================================= [ 299.281634] BUG kmalloc-4096 (Tainted: G I ): Poison overwritten [ 299.281682] ----------------------------------------------------------------------------- [ 299.281684] [ 299.281752] INFO: 0xffff880056c305d0-0xffff880056c305d0. First byte 0x6a instead of 0x6b [ 299.281816] INFO: Allocated in scsi_host_alloc+0x4a/0x490 age=1688 cpu=1 pid=2004 [ 299.281870] __slab_alloc+0x617/0x6c1 [ 299.281901] __kmalloc+0x28c/0x2e0 [ 299.281931] scsi_host_alloc+0x4a/0x490 [ 299.281966] usb_stor_probe1+0x5b/0xc40 [usb_storage] [ 299.282010] storage_probe+0xa4/0xe0 [usb_storage] [ 299.282062] usb_probe_interface+0x172/0x330 [usbcore] [ 299.282105] driver_probe_device+0x257/0x3b0 [ 299.282138] __driver_attach+0x103/0x110 [ 299.282171] bus_for_each_dev+0x8e/0xe0 [ 299.282201] driver_attach+0x26/0x30 [ 299.282230] bus_add_driver+0x1c4/0x430 [ 299.282260] driver_register+0xb6/0x230 [ 299.282298] usb_register_driver+0xe5/0x270 [usbcore] [ 299.282337] 0xffffffffa04ab03d [ 299.282364] do_one_initcall+0x47/0x230 [ 299.282396] sys_init_module+0xa0f/0x1fe0 [ 299.282429] INFO: Freed in scsi_host_dev_release+0x18a/0x1d0 age=85 cpu=0 pid=2008 [ 299.282482] __slab_free+0x3c/0x2a1 [ 299.282510] kfree+0x296/0x310 [ 299.282536] scsi_host_dev_release+0x18a/0x1d0 [ 299.282574] device_release+0x74/0x100 [ 299.282606] kobject_release+0xc7/0x2a0 [ 299.282637] kobject_put+0x54/0xa0 [ 299.282668] put_device+0x27/0x40 [ 299.282694] scsi_host_put+0x1d/0x30 [ 299.282723] do_scan_async+0x1fc/0x2b0 [ 299.282753] kthread+0xdf/0xf0 [ 299.282782] kernel_thread_helper+0x4/0x10 [ 299.282817] INFO: Slab 0xffffea00015b0c00 objects=7 used=7 fp=0x (null) flags=0x100000000004080 [ 299.282882] INFO: Object 0xffff880056c30000 @offset=0 fp=0x (null) [ 299.282884] ... Signed-off-by: Huajun Li <huajun.li.lee@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	genirq: Handle pending irqs in irq_startup()	Thomas Gleixner
	commit b4bc724e82e80478cba5fe9825b62e71ddf78757 upstream. An interrupt might be pending when irq_startup() is called, but the startup code does not invoke the resend logic. In some cases this prevents the device from issuing another interrupt which renders the device non functional. Call the resend function in irq_startup() to keep things going. Reported-and-tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	genirq: Unmask oneshot irqs when thread was not woken	Thomas Gleixner
	commit ac5637611150281f398bb7a47e3fcb69a09e7803 upstream. When the primary handler of an interrupt which is marked IRQ_ONESHOT returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not woken and the unmask logic of the interrupt line is never invoked. This keeps the interrupt masked forever. This was not noticed as most IRQ_ONESHOT users wake the thread unconditionally (usually because they cannot access the underlying device from hard interrupt context). Though this behaviour was nowhere documented and not necessarily intentional. Some drivers can avoid the thread wakeup in certain cases and run into the situation where the interrupt line s kept masked. Handle it gracefully. Reported-and-tested-by: Lothar Wassmann <lw@karo-electronics.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	ath9k: stop on rates with idx -1 in ath9k rate control's .tx_status	Pavel Roskin
	commit 2504a6423b9ab4c36df78227055995644de19edb upstream. Rate control algorithms are supposed to stop processing when they encounter a rate with the index -1. Checking for rate->count not being zero is not enough. Allowing a rate with negative index leads to memory corruption in ath_debug_stat_rc(). One consequence of the bug is discussed at https://bugzilla.redhat.com/show_bug.cgi?id=768639 Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors	Andreas Herrmann
	commit 32c3233885eb10ac9cb9410f2f8cd64b8df2b2a1 upstream. For L1 instruction cache and L2 cache the shared CPU information is wrong. On current AMD family 15h CPUs those caches are shared between both cores of a compute unit. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=42607 Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Petkov Borislav <Borislav.Petkov@amd.com> Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/20120208195229.GA17523@alberich.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	USB: Don't fail USB3 probe on missing legacy PCI IRQ.	Sarah Sharp
	commit 68d07f64b8a11a852d48d1b05b724c3e20c0d94b upstream Intel has a PCI USB xhci host controller on a new platform. It doesn't have a line IRQ definition in BIOS. The Linux driver refuses to initialize this controller, but Windows works well because it only depends on MSI. Actually, Linux also can work for MSI. This patch avoids the line IRQ checking for USB3 HCDs in usb core PCI probe. It allows the xHCI driver to try to enable MSI or MSI-X first. It will fail the probe if MSI enabling failed and there's no legacy PCI IRQ. This patch should be backported to kernels as old as 2.6.32. [Maintainer note: This patch is a backport of commit 68d07f64b8a11a852d48d1b05b724c3e20c0d94b "USB: Don't fail USB3 probe on missing legacy PCI IRQ." to the 3.0 kernel. Note, the original patch description was wrong. We should not back port this to kernels older than 2.6.36, since that was the first kernel to support MSI and MSI-X for xHCI hosts. These systems will just not work without MSI support, so the probe should fail on kernels older than 2.6.36.] Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	usb-storage: fix freezing of the scanning thread	Alan Stern
	commit bb94a406682770a35305daaa241ccdb7cab399de upstream. This patch (as1521b) fixes the interaction between usb-storage's scanning thread and the freezer. The current implementation has a race: If the device is unplugged shortly after being plugged in and just as a system sleep begins, the scanning thread may get frozen before the khubd task. Khubd won't be able to freeze until the disconnect processing is complete, and the disconnect processing can't proceed until the scanning thread finishes, so the sleep transition will fail. The implementation in the 3.2 kernel suffers from an additional problem. There the scanning thread calls set_freezable_with_signal(), and the signals sent by the freezer will mess up the thread's I/O delays, which are all interruptible. The solution to both problems is the same: Replace the kernel thread used for scanning with a delayed-work routine on the system freezable work queue. Freezable work queues have the nice property that you can cancel a work item even while the work queue is frozen, and no signals are needed. The 3.2 version of this patch solves the problem in Bugzilla #42730. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	i387: re-introduce FPU state preloading at context switch time	Linus Torvalds
	commit 34ddc81a230b15c0e345b6b253049db731499f7e upstream. After all the FPU state cleanups and finally finding the problem that caused all our FPU save/restore problems, this re-introduces the preloading of FPU state that was removed in commit b3b0870ef3ff ("i387: do not preload FPU state at task switch time"). However, instead of simply reverting the removal, this reimplements preloading with several fixes, most notably - properly abstracted as a true FPU state switch, rather than as open-coded save and restore with various hacks. In particular, implementing it as a proper FPU state switch allows us to optimize the CR0.TS flag accesses: there is no reason to set the TS bit only to then almost immediately clear it again. CR0 accesses are quite slow and expensive, don't flip the bit back and forth for no good reason. - Make sure that the same model works for both x86-32 and x86-64, so that there are no gratuitous differences between the two due to the way they save and restore segment state differently due to architectural differences that really don't matter to the FPU state. - Avoid exposing the "preload" state to the context switch routines, and in particular allow the concept of lazy state restore: if nothing else has used the FPU in the meantime, and the process is still on the same CPU, we can avoid restoring state from memory entirely, just re-expose the state that is still in the FPU unit. That optimized lazy restore isn't actually implemented here, but the infrastructure is set up for it. Of course, older CPU's that use 'fnsave' to save the state cannot take advantage of this, since the state saving also trashes the state. In other words, there is now an actual _design_ to the FPU state saving, rather than just random historical baggage. Hopefully it's easier to follow as a result. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	i387: move TS_USEDFPU flag from thread_info to task_struct	Linus Torvalds
	commit f94edacf998516ac9d849f7bc6949a703977a7f3 upstream. This moves the bit that indicates whether a thread has ownership of the FPU from the TS_USEDFPU bit in thread_info->status to a word of its own (called 'has_fpu') in task_struct->thread.has_fpu. This fixes two independent bugs at the same time: - changing 'thread_info->status' from the scheduler causes nasty problems for the other users of that variable, since it is defined to be thread-synchronous (that's what the "TS_" part of the naming was supposed to indicate). So perfectly valid code could (and did) do ti->status \|= TS_RESTORE_SIGMASK; and the compiler was free to do that as separate load, or and store instructions. Which can cause problems with preemption, since a task switch could happen in between, and change the TS_USEDFPU bit. The change to TS_USEDFPU would be overwritten by the final store. In practice, this seldom happened, though, because the 'status' field was seldom used more than once, so gcc would generally tend to generate code that used a read-modify-write instruction and thus happened to avoid this problem - RMW instructions are naturally low fat and preemption-safe. - On x86-32, the current_thread_info() pointer would, during interrupts and softirqs, point to a copy of the real thread_info, because x86-32 uses %esp to calculate the thread_info address, and thus the separate irq (and softirq) stacks would cause these kinds of odd thread_info copy aliases. This is normally not a problem, since interrupts aren't supposed to look at thread information anyway (what thread is running at interrupt time really isn't very well-defined), but it confused the heck out of irq_fpu_usable() and the code that tried to squirrel away the FPU state. (It also caused untold confusion for us poor kernel developers). It also turns out that using 'task_struct' is actually much more natural for most of the call sites that care about the FPU state, since they tend to work with the task struct for other reasons anyway (ie scheduling). And the FPU data that we are going to save/restore is found there too. Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to the %esp issue. Cc: Arjan van de Ven <arjan@linux.intel.com> Reported-and-tested-by: Raphael Prevost <raphael@buro.asia> Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	i387: move AMD K7/K8 fpu fxsave/fxrstor workaround from save to restore	Linus Torvalds
	commit 4903062b5485f0e2c286a23b44c9b59d9b017d53 upstream. The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is pending. In order to not leak FIP state from one process to another, we need to do a floating point load after the fxsave of the old process, and before the fxrstor of the new FPU state. That resets the state to the (uninteresting) kernel load, rather than some potentially sensitive user information. We used to do this directly after the FPU state save, but that is actually very inconvenient, since it (a) corrupts what is potentially perfectly good FPU state that we might want to lazy avoid restoring later and (b) on x86-64 it resulted in a very annoying ordering constraint, where "__unlazy_fpu()" in the task switch needs to be delayed until after the DS segment has been reloaded just to get the new DS value. Coupling it to the fxrstor instead of the fxsave automatically avoids both of these issues, and also ensures that we only do it when actually necessary (the FP state after a save may never actually get used). It's simply a much more natural place for the leaked state cleanup. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	i387: do not preload FPU state at task switch time	Linus Torvalds
	commit b3b0870ef3ffed72b92415423da864f440f57ad6 upstream. Yes, taking the trap to re-load the FPU/MMX state is expensive, but so is spending several days looking for a bug in the state save/restore code. And the preload code has some rather subtle interactions with both paravirtualization support and segment state restore, so it's not nearly as simple as it should be. Also, now that we no longer necessarily depend on a single bit (ie TS_USEDFPU) for keeping track of the state of the FPU, we migth be able to do better. If we are really switching between two processes that keep touching the FP state, save/restore is inevitable, but in the case of having one process that does most of the FPU usage, we may actually be able to do much better than the preloading. In particular, we may be able to keep track of which CPU the process ran on last, and also per CPU keep track of which process' FP state that CPU has. For modern CPU's that don't destroy the FPU contents on save time, that would allow us to do a lazy restore by just re-enabling the existing FPU state - with no restore cost at all! Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	i387: don't ever touch TS_USEDFPU directly, use helper functions	Linus Torvalds
	commit 6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 upstream. This creates three helper functions that do the TS_USEDFPU accesses, and makes everybody that used to do it by hand use those helpers instead. In addition, there's a couple of helper functions for the "change both CR0.TS and TS_USEDFPU at the same time" case, and the places that do that together have been changed to use those. That means that we have fewer random places that open-code this situation. The intent is partly to clarify the code without actually changing any semantics yet (since we clearly still have some hard to reproduce bug in this area), but also to make it much easier to use another approach entirely to caching the CR0.TS bit for software accesses. Right now we use a bit in the thread-info 'status' variable (this patch does not change that), but we might want to make it a full field of its own or even make it a per-cpu variable. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29	i387: move TS_USEDFPU clearing out of __save_init_fpu and into callers	Linus Torvalds
	commit b6c66418dcad0fcf83cd1d0a39482db37bf4fc41 upstream. Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do it. By moving it into the callers, we always do the TS_USEDFPU next to the CR0.TS accesses in the source code, and it's much easier to see how the two go hand in hand. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>