lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2011-08-04	Linux 3.0.1v3.0.1	Greg Kroah-Hartman

2011-08-04	dm: fix idr leak on module removal	Alasdair G Kergon
	commit d15b774c2920d55e3d58275c97fbe3adc3afde38 upstream. Destroy _minor_idr when unloading the core dm module. (Found by kmemleak.) Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	dm mpath: fix potential NULL pointer in feature arg processing	Mike Snitzer
	commit 286f367dad40beb3234a18c17391d03ba939a7f3 upstream. Avoid dereferencing a NULL pointer if the number of feature arguments supplied is fewer than indicated. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	dm snapshot: flush disk cache when merging	Mikulas Patocka
	commit 762a80d9fc9f690a3a35983f3b4619a220650808 upstream. This patch makes dm-snapshot flush disk cache when writing metadata for merging snapshot. Without cache flushing the disk may reorder metadata write and other data writes and there is a possibility of data corruption in case of power fault. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	dm io: flush cpu cache with vmapped io	Mikulas Patocka
	commit bb91bc7bacb906c9f3a9b22744c53fa7564b51ba upstream. For normal kernel pages, CPU cache is synchronized by the dma layer. However, this is not done for pages allocated with vmalloc. If we do I/O to/from vmallocated pages, we must synchronize CPU cache explicitly. Prior to doing I/O on vmallocated page we must call flush_kernel_vmap_range to flush dirty cache on the virtual address. After finished read we must call invalidate_kernel_vmap_range to invalidate cache on the virtual address, so that accesses to the virtual address return newly read data and not stale data from CPU cache. This patch fixes metadata corruption on dm-snapshots on PA-RISC and possibly other architectures with caches indexed by virtual address. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	ALSA: sound/core/pcm_compat.c: adjust array index	Julia Lawall
	commit ca9380fd68514c7bc952282c1b4fc70607e9fe43 upstream. Convert array index from the loop bound to the loop index. A simplified version of the semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e1,e2,ar; @@ for(e1 = 0; e1 < e2; e1++) { <... ar[ - e2 + e1 ] ...> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	watchdog: shwdt: fix usage of mod_timer	David Engraf
	commit bea1906620ce72b63f83735c4cc2642b25ec54ae upstream. Fix the usage of mod_timer() and make the driver usable. mod_timer() must be called with an absolute timeout in jiffies. The old implementation used a relative timeout thus the hardware watchdog was never triggered. Signed-off-by: David Engraf <david.engraf@sysgo.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Wim Van sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	GFS2: Fix mount hang caused by certain access pattern to sysfs files	Steven Whitehouse
	commit 19237039919088781b4191a00bdc1284d8fea1dd upstream. Depending upon the order of userspace/kernel during the mount process, this can result in a hang without the _all version of the completion. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	rt2x00: Add device ID for RT539F device.	Gertjan van Wingerde
	commit 71e0b38c2914018b01f3f08b43ee9e3328197699 upstream. Reported-by: Wim Vander Schelden <wim@fixnum.org> Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	oom: task->mm == NULL doesn't mean the memory was freed	Oleg Nesterov
	commit c027a474a68065391c8773f6e83ed5412657e369 upstream. exit_mm() sets ->mm == NULL then it does mmput()->exit_mmap() which frees the memory. However select_bad_process() checks ->mm != NULL before TIF_MEMDIE, so it continues to kill other tasks even if we have the oom-killed task freeing its memory. Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip the tasks which have already passed exit_notify() to ensure a zombie with TIF_MEMDIE set can't block oom-killer. Alternatively we could probably clear TIF_MEMDIE after exit_mmap(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	AppArmor: Fix masking of capabilities in complain mode	John Johansen
	commit 25e75dff519bcce2cb35023105e7df51d7b9e691 upstream. AppArmor is masking the capabilities returned by capget against the capabilities mask in the profile. This is wrong, in complain mode the profile has effectively all capabilities, as the profile restrictions are not being enforced, merely tested against to determine if an access is known by the profile. This can result in the wrong behavior of security conscience applications like sshd which examine their capability set, and change their behavior accordingly. In this case because of the masked capability set being returned sshd fails due to DAC checks, even when the profile is in complain mode. Kernels affected: 2.6.36 - 3.0. Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	AppArmor: Fix reference to rcu protected pointer outside of rcu_read_lock	John Johansen
	commit 04fdc099f9c80c7775dbac388fc97e156d4d47e7 upstream. The pointer returned from tracehook_tracer_task() is only valid inside the rcu_read_lock. However the tracer pointer obtained is being passed to aa_may_ptrace outside of the rcu_read_lock critical section. Mover the aa_may_ptrace test into the rcu_read_lock critical section, to fix this. Kernels affected: 2.6.36 - 3.0 Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	ipc/sem.c: fix race with concurrent semtimedop() timeouts and IPC_RMID	Manfred Spraul
	commit d694ad62bf539dbb20a0899ac2a954555f9e4a83 upstream. If a semaphore array is removed and in parallel a sleeping task is woken up (signal or timeout, does not matter), then the woken up task does not wait until wake_up_sem_queue_do() is completed. This will cause crashes, because wake_up_sem_queue_do() will read from a stale pointer. The fix is simple: Regardless of anything, always call get_queue_result(). This function waits until wake_up_sem_queue_do() has finished it's task. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142 Reported-by: Yuriy Yevtukhov <yuriy@ucoz.com> Reported-by: Harald Laabs <kernel@dasr.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	hvc_console: Improve tty/console put_chars handling	Hendrik Brueckner
	commit 8c2381af0d3ef62a681dac5a141b6dabb27bf2e1 upstream. Currently, the hvc_console_print() function drops console output if the hvc backend's put_chars() returns 0. This patch changes this behavior to allow a retry through returning -EAGAIN. This change also affects the hvc_push() function. Both functions are changed to handle -EAGAIN and to retry the put_chars() operation. If a hvc backend returns -EAGAIN, the retry handling differs: - hvc_console_print() spins to write the complete console output. - hvc_push() behaves the same way as for returning 0. Now hvc backends can indirectly control the way how console output is handled through the hvc console layer. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	powerpc/pseries/hvconsole: Fix dropped console output	Anton Blanchard
	commit 51d33021425e1f905beb4208823146f2fb6517da upstream. Return -EAGAIN when we get H_BUSY back from the hypervisor. This makes the hvc console driver retry, avoiding dropped printks. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	SERIAL: SC26xx: Fix link error.	Ralf Baechle
	commit f2eb3cdf14457fccb14ae8c4d7d7cee088cd3957 upstream. Kconfig allows enabling console support for the SC26xx driver even when it's configured as a module resulting in a: ERROR: "uart_console_device" [drivers/tty/serial/sc26xx.ko] undefined! modpost error since the driver was merged in eea63e0e8a60d00485b47fb6e75d9aa2566b989b [SC26XX: New serial driver for SC2681 uarts] in 2.6.25. Fixed by only allowing console support to be enabled if the driver is builtin. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: linux-serial@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	tty/serial: Fix XSCALE serial ports, e.g. ce4100	Stephen Warren
	commit 5568181f188ae9485a0cdbea5ea48f63d186a298 upstream. Commit 4539c24fe4f92c09ee668ef959d3e8180df619b9 "tty/serial: Add explicit PORT_TEGRA type" introduced separate flags describing the need for IER bits UUE and RTOIE. Both bits are required for the XSCALE port type. While that patch updated uart_config[] as required, the auto-probing code wasn't updated to set the RTOIE flag when an XSCALE port type was detected. This caused such ports to stop working. This patch rectifies that. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	memcg: fix behavior of mem_cgroup_resize_limit()	Daisuke Nishimura
	commit 108b6a78463bb8c7163e4f9779f36ad8bbade334 upstream. Commit 22a668d7c3ef ("memcg: fix behavior under memory.limit equals to memsw.limit") introduced "memsw_is_minimum" flag, which becomes true when mem_limit == memsw_limit. The flag is checked at the beginning of reclaim, and "noswap" is set if the flag is true, because using swap is meaningless in this case. This works well in most cases, but when we try to shrink mem_limit, which is the same as memsw_limit now, we might fail to shrink mem_limit because swap doesn't used. This patch fixes this behavior by: - check MEM_CGROUP_RECLAIM_SHRINK at the begining of reclaim - If it is set, don't set "noswap" flag even if memsw_is_minimum is true. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	cfg80211: really ignore the regulatory request	Sven Neumann
	commit a203c2aa4cefccb879c879b8e1cad1a09a679e55 upstream. At the beginning of wiphy_update_regulatory() a check is performed whether the request is to be ignored. Then the request is sent to the driver nevertheless. This happens even if last_request points to NULL, leading to a crash in the driver: [<bf01d864>] (lbs_set_11d_domain_info+0x28/0x1e4 [libertas]) from [<c03b714c>] (wiphy_update_regulatory+0x4d0/0x4f4) [<c03b714c>] (wiphy_update_regulatory+0x4d0/0x4f4) from [<c03b4008>] (wiphy_register+0x354/0x420) [<c03b4008>] (wiphy_register+0x354/0x420) from [<bf01b17c>] (lbs_cfg_register+0x80/0x164 [libertas]) [<bf01b17c>] (lbs_cfg_register+0x80/0x164 [libertas]) from [<bf020e64>] (lbs_start_card+0x20/0x88 [libertas]) [<bf020e64>] (lbs_start_card+0x20/0x88 [libertas]) from [<bf02cbd8>] (if_sdio_probe+0x898/0x9c0 [libertas_sdio]) Fix this by returning early. Also remove the out: label as it is not any longer needed. Signed-off-by: Sven Neumann <s.neumann@raumfeld.com> Cc: linux-wireless@vger.kernel.org Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Daniel Mack <daniel@zonque.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	EHCI: fix direction handling for interrupt data toggles	Alan Stern
	commit e04f5f7e423018bcec84c11af2058cdce87816f3 upstream. This patch (as1480) fixes a rather obscure bug in ehci-hcd. The qh_update() routine needs to know the number and direction of the endpoint corresponding to its QH argument. The number can be taken directly from the QH data structure, but the direction isn't stored there. The direction is taken instead from the first qTD linked to the QH. However, it turns out that for interrupt transfers, qh_update() gets called before the qTDs are linked to the QH. As a result, qh_update() computes a bogus direction value, which messes up the endpoint toggle handling. Under the right combination of circumstances this causes usb_reset_endpoint() not to work correctly, which causes packets to be dropped and communications to fail. Now, it's silly for the QH structure not to have direct access to all the descriptor information for the corresponding endpoint. Ultimately it may get a pointer to the usb_host_endpoint structure; for now, adding a copy of the direction flag solves the immediate problem. This allows the Spyder2 color-calibration system (a low-speed USB device that sends all its interrupt data packets with the toggle set to 0 and hance requires constant use of usb_reset_endpoint) to work when connected through a high-speed hub. Thanks to Graeme Gill for supplying the hardware that allowed me to track down this bug. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Graeme Gill <graeme@argyllcms.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	EHCI: only power off port if over-current is active	Sergei Shtylyov
	commit 81463c1d707186adbbe534016cd1249edeab0dac upstream. MAX4967 USB power supply chip we use on our boards signals over-current when power is not enabled; once it's enabled, over-current signal returns to normal. That unfortunately caused the endless stream of "over-current change on port" messages. The EHCI root hub code reacts on every over-current signal change with powering off the port -- such change event is generated the moment the port power is enabled, so once enabled the power is immediately cut off. I think we should only cut off power when we're seeing the active over-current signal, so I'm adding such check to that code. I also think that the fact that we've cut off the port power should be reflected in the result of GetPortStatus request immediately, hence I'm adding a PORTSCn register readback after write... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	n_gsm: fix the wrong FCS handling	Du, Alek
	commit f086ced17191fa0c5712539d2b680eae3dc972a1 upstream. FCS could be GSM0_SOF, so will break state machine... [This byte isn't quoted in any way so a SOF here doesn't imply an error occurred.] Signed-off-by: Alek Du <alek.du@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> [Trivial but best backported once its in 3.1rc I think] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	proc: fix a race in do_io_accounting()	Vasiliy Kulikov
	commit 293eb1e7772b25a93647c798c7b89bf26c2da2e0 upstream. If an inode's mode permits opening /proc/PID/io and the resulting file descriptor is kept across execve() of a setuid or similar binary, the ptrace_may_access() check tries to prevent using this fd against the task with escalated privileges. Unfortunately, there is a race in the check against execve(). If execve() is processed after the ptrace check, but before the actual io information gathering, io statistics will be gathered from the privileged process. At least in theory this might lead to gathering sensible information (like ssh/ftp password length) that wouldn't be available otherwise. Holding task->signal->cred_guard_mutex while gathering the io information should protect against the race. The order of locking is similar to the one inside of ptrace_attach(): first goes cred_guard_mutex, then lock_task_sighand(). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	NFS: Fix spurious readdir cookie loop messages	Trond Myklebust
	commit 0c0308066ca53fdf1423895f3a42838b67b3a5a8 upstream. If the directory contents change, then we have to accept that the file->f_pos value may shrink if we do a 'search-by-cookie'. In that case, we should turn off the loop detection and let the NFS client try to recover. The patch also fixes a second loop detection bug by ensuring that after turning on the ctx->duped flag, we read at least one new cookie into ctx->dir_cookie before attempting to match with ctx->dup_cookie. Reported-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()	Trond Myklebust
	commit ed1e6211a0a134ff23592c6f057af982ad5dab52 upstream. nfs_mark_return_delegation() is usually called without any locking, and so it is not safe to dereference delegation->inode. Since the inode is only used to discover the nfs_client anyway, it makes more sense to have the callers pass a valid pointer to the nfs_server as a parameter. Reported-by: Ian Kent <raven@themaw.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	svcrpc: fix list-corrupting race on nfsd shutdown	J. Bruce Fields
	commit ebc63e531cc6a457595dd110b07ac530eae788c3 upstream. After commit 3262c816a3d7fb1eaabce633caa317887ed549ae "[PATCH] knfsd: split svc_serv into pools", svc_delete_xprt (then svc_delete_socket) no longer removed its xpt_ready (then sk_ready) field from whatever list it was on, noting that there was no point since the whole list was about to be destroyed anyway. That was mostly true, but forgot that a few svc_xprt_enqueue()'s might still be hanging around playing with the about-to-be-destroyed list, and could get themselves into trouble writing to freed memory if we left this xprt on the list after freeing it. (This is actually functionally identical to a patch made first by Ben Greear, but with more comments.) Cc: gnb@fmeh.org Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	nfsd4: fix file leak on open_downgrade	J. Bruce Fields
	commit f197c27196a5e7631b89e2e92daa096fcf7c302c upstream. Stateid's hold a read reference for a read open, a write reference for a write open, and an additional one of each for each read+write open. The latter wasn't getting put on a downgrade, so something like: open RW open R downgrade to R was resulting in a file leak. Also fix an imbalance in an error path. Regression from 7d94784293096c0a46897acdb83be5abd9278ece "nfsd4: fix downgrade/lock logic". Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	nfsd4: remember to put RW access on stateid destruction	J. Bruce Fields
	commit 499f3edc23ca0431f3a0a6736b3a40944c81bf3b upstream. Without this, for example, open read open read+write close will result in a struct file leak. Regression from 7d94784293096c0a46897acdb83be5abd9278ece "nfsd4: fix downgrade/lock logic". Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	nfsd: don't break lease on CLAIM_DELEGATE_CUR	Casey Bodley
	commit 0c12eaffdf09466f36a9ffe970dda8f4aeb6efc0 upstream. CLAIM_DELEGATE_CUR is used in response to a broken lease; allowing it to break the lease and return EAGAIN leaves the client unable to make progress in returning the delegation nfs4_get_vfs_file() now takes struct nfsd4_open for access to the claim type, and calls nfsd_open() with NFSD_MAY_NOT_BREAK_LEASE when claim type is CLAIM_DELEGATE_CUR Signed-off-by: Casey Bodley <cbodley@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	eCryptfs: Unlock keys needed by ecryptfsd	Tyler Hicks
	commit b2987a5e05ec7a1af7ca42e5d5349d7a22753031 upstream. Fixes a regression caused by b5695d04634fa4ccca7dcbc05bb4a66522f02e0b Kernel keyring keys containing eCryptfs authentication tokens should not be write locked when calling out to ecryptfsd to wrap and unwrap file encryption keys. The eCryptfs kernel code can not hold the key's write lock because ecryptfsd needs to request the key after receiving such a request from the kernel. Without this fix, all file opens and creates will timeout and fail when using the eCryptfs PKI infrastructure. This is not an issue when using passphrase-based mount keys, which is the most widely deployed eCryptfs configuration. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: Roberto Sassu <roberto.sassu@polito.it> Tested-by: Roberto Sassu <roberto.sassu@polito.it> Tested-by: Alexis Hafner1 <haf@zurich.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	ecryptfs: Make inode bdi consistent with superblock bdi	Thieu Le
	commit 985ca0e626e195ea08a1a82b8dbeb6719747429a upstream. Make the inode mapping bdi consistent with the superblock bdi so that dirty pages are flushed properly. Signed-off-by: Thieu Le <thieule@chromium.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	ext3: Fix oops in ext3_try_to_allocate_with_rsv()	Jan Kara
	commit ad95c5e9bc8b5885f94dce720137cac8fa8da4c9 upstream. Block allocation is called from two places: ext3_get_blocks_handle() and ext3_xattr_block_set(). These two callers are not necessarily synchronized because xattr code holds only xattr_sem and i_mutex, and ext3_get_blocks_handle() may hold only truncate_mutex when called from writepage() path. Block reservation code does not expect two concurrent allocations to happen to the same inode and thus assertions can be triggered or reservation structure corruption can occur. Fix the problem by taking truncate_mutex in xattr code to serialize allocations. CC: Sage Weil <sage@newdream.net> Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails	Jiaying Zhang
	commit 575a1d4bdfa2ea9fc10733013136145b497e1be0 upstream. Upon corrupted inode or disk failures, we may fail after we already allocate some blocks from the inode or take some blocks from the inode's preallocation list, but before we successfully insert the corresponding extent to the extent tree. In this case, we should free any allocated blocks and discard the inode's preallocated blocks because the entries in the inode's preallocation list may be in an inconsistent state. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	ext4: fix i_blocks/quota accounting when extent insertion fails	Maxim Patlasov
	commit 7132de744ba76930d13033061018ddd7e3e8cd91 upstream. The current implementation of ext4_free_blocks() always calls dquot_free_block This looks quite sensible in the most cases: blocks to be freed are associated with inode and were accounted in quota and i_blocks some time ago. However, there is a case when blocks to free were not accounted by the time calling ext4_free_blocks() yet: 1. delalloc is on, write_begin pre-allocated some space in quota 2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks() 3. then ext4_ext_map_blocks() gets an error (e.g. ENOSPC) from ext4_ext_insert_extent() and calls ext4_free_blocks(). In this scenario, ext4_free_blocks() calls dquot_free_block() who, in turn, decrements i_blocks for blocks which were not accounted yet (due to delalloc) After clean umount, e2fsck reports something like: > Inode 21, i_blocks is 5080, should be 5128. Fix<y>? because i_blocks was erroneously decremented as explained above. The patch fixes the problem by passing the new flag EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request that the dquot_free_block() call be skipped. Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	xtensa: prevent arbitrary read in ptrace	Dan Rosenberg
	commit 0d0138ebe24b94065580bd2601f8bb7eb6152f56 upstream. Prevent an arbitrary kernel read. Check the user pointer with access_ok() before copying data in. [akpm@linux-foundation.org: s/EIO/EFAULT/] Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Christian Zankel <chris@zankel.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	mm/backing-dev.c: reset bdi min_ratio in bdi_unregister()	Peter Zijlstra
	commit ccb6108f5b0b541d3eb332c3a73e645c0f84278e upstream. Vito said: : The system has many usb disks coming and going day to day, with their : respective bdi's having min_ratio set to 1 when inserted. It works for : some time until eventually min_ratio can no longer be set, even when the : active set of bdi's seen in /sys/class/bdi/*/min_ratio doesn't add up to : anywhere near 100. : : This then leads to an unrelated starvation problem caused by write-heavy : fuse mounts being used atop the usb disks, a problem the min_ratio setting : at the underlying devices bdi effectively prevents. Fix this leakage by resetting the bdi min_ratio when unregistering the BDI. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Vito Caputo <lkml@pengaru.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	mm/futex: fix futex writes on archs with SW tracking of dirty & young	Benjamin Herrenschmidt
	commit 2efaca927f5cd7ecd0f1554b8f9b6a9a2c329c03 upstream. I haven't reproduced it myself but the fail scenario is that on such machines (notably ARM and some embedded powerpc), if you manage to hit that futex path on a writable page whose dirty bit has gone from the PTE, you'll livelock inside the kernel from what I can tell. It will go in a loop of trying the atomic access, failing, trying gup to "fix it up", getting succcess from gup, go back to the atomic access, failing again because dirty wasn't fixed etc... So I think you essentially hang in the kernel. The scenario is probably rare'ish because affected architecture are embedded and tend to not swap much (if at all) so we probably rarely hit the case where dirty is missing or young is missing, but I think Shan has a piece of SW that can reliably reproduce it using a shared writable mapping & fork or something like that. On archs who use SW tracking of dirty & young, a page without dirty is effectively mapped read-only and a page without young unaccessible in the PTE. Additionally, some architectures might lazily flush the TLB when relaxing write protection (by doing only a local flush), and expect a fault to invalidate the stale entry if it's still present on another processor. The futex code assumes that if the "in_atomic()" access -EFAULT's, it can "fix it up" by causing get_user_pages() which would then be equivalent to taking the fault. However that isn't the case. get_user_pages() will not call handle_mm_fault() in the case where the PTE seems to have the right permissions, regardless of the dirty and young state. It will eventually update those bits ... in the struct page, but not in the PTE. Additionally, it will not handle the lazy TLB flushing that can be required by some architectures in the fault case. Basically, gup is the wrong interface for the job. The patch provides a more appropriate one which boils down to just calling handle_mm_fault() since what we are trying to do is simulate a real page fault. The futex code currently attempts to write to user memory within a pagefault disabled section, and if that fails, tries to fix it up using get_user_pages(). This doesn't work on archs where the dirty and young bits are maintained by software, since they will gate access permission in the TLB, and will not be updated by gup(). In addition, there's an expectation on some archs that a spurious write fault triggers a local TLB flush, and that is missing from the picture as well. I decided that adding those "features" to gup() would be too much for this already too complex function, and instead added a new simpler fixup_user_fault() which is essentially a wrapper around handle_mm_fault() which the futex code can call. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Shan Hai <haishan.bai@gmail.com> Tested-by: Shan Hai <haishan.bai@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Darren Hart <darren.hart@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	geode: reflect mfgpt dependency on mfd	Philip A. Prindeville
	commit 703f03c896fdbd726b809066ae279df513992f0e upstream. As stated in drivers/mfd/cs5535-mfd.c, the mfd driver exposes the BARs which then make the GPIO, MFGPT, ACPI, etc. all visible to the system. So the dependencies of the MFGPT stuff have changed, and most people expect Kconfig to bring in the necessary dependencies. Without them, the module fails to load and most people don't understand why because the details of the rewrite aren't captured anywhere most people who know to look. This dependency needs to be reflected in Kconfig. Signed-off-by: Philip A. Prindeville <philipp@redfish-solutions.com> Acked-by: Alexandros C. Couloumbis <alex@ozo.com> Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	drivers/firmware/sigma.c needs MODULE_LICENSE	Randy Dunlap
	commit 27c46a2546c75c6814562e85b751e3d64c188ad5 upstream. Fix module tainting message: sigma: module license 'unspecified' taints kernel. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	cciss: do not attempt to read from a write-only register	Stephen M. Cameron
	commit 07d0c38e7d84f911c72058a124c7f17b3c779a65 upstream. Most smartarrays will tolerate it, but some new ones don't. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Note: this is a regression caused by commit 1ddd5049 Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	PCI: ARI is a PCIe v2 feature	Chris Wright
	commit 864d296cf948aef0fa32b81407541572583f7572 upstream. The function pci_enable_ari() may mistakenly set the downstream port of a v1 PCIe switch in ARI Forwarding mode. This is a PCIe v2 feature, and with an SR-IOV device on that switch port believing the switch above is ARI capable it may attempt to use functions 8-255, translating into invalid (non-zero) device numbers for that bus. This has been seen to cause Completion Timeouts and general misbehaviour including hangs and panics. Acked-by: Don Dutile <ddutile@redhat.com> Tested-by: Don Dutile <ddutile@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	XZ: Fix missing <linux/kernel.h> include	Lasse Collin
	commit 81d67439855a7f928d90965d832aa4f2fb677342 upstream. <linux/kernel.h> is needed for min_t. The old version happened to work on x86 because <asm/unaligned.h> indirectly includes <linux/kernel.h>, but it didn't work on ARM. <linux/kernel.h> includes <asm/byteorder.h> so it's not necessary to include it explicitly anymore. Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	tracing: Have "enable" file use refcounts like the "filter" file	Steven Rostedt
	commit 40ee4dffff061399eb9358e0c8fcfbaf8de4c8fe upstream. The "enable" file for the event system can be removed when a module is unloaded and the event system only has events from that module. As the event system nr_events count goes to zero, it may be freed if its ref_count is also set to zero. Like the "filter" file, the "enable" file may be opened by a task and referenced later, after a module has been unloaded and the events for that event system have been removed. Although the "filter" file referenced the event system structure, the "enable" file only references a pointer to the event system name. Since the name is freed when the event system is removed, it is possible that an access to the "enable" file may reference a freed pointer. Update the "enable" file to use the subsystem_open() routine that the "filter" file uses, to keep a reference to the event system structure while the "enable" file is opened. Reported-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	tracing: Fix bug when reading system filters on module removal	Steven Rostedt
	commit e9dbfae53eeb9fc3d4bb7da3df87fa9875f5da02 upstream. The event system is freed when its nr_events is set to zero. This happens when a module created an event system and then later the module is removed. Modules may share systems, so the system is allocated when it is created and freed when the modules are unloaded and all the events under the system are removed (nr_events set to zero). The problem arises when a task opened the "filter" file for the system. If the module is unloaded and it removed the last event for that system, the system structure is freed. If the task that opened the filter file accesses the "filter" file after the system has been freed, the system will access an invalid pointer. By adding a ref_count, and using it to keep track of what is using the event system, we can free it after all users are finished with the event system. Reported-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	irq_work, alpha: Fix up arch hooks	Peter Zijlstra
	commit 0f933625e7b6c3d91878ae95e341bf1984db7eaf upstream. Commit e360adbe29 ("irq_work: Add generic hardirq context callbacks") fouled up the Alpha bit, not properly naming the arch specific function that raises the 'self-IPI'. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Link: http://lkml.kernel.org/n/tip-gukh0txmql2l4thgrekzzbfy@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	powerpc/kdump: Fix timeout in crash_kexec_wait_realmode	Michael Neuling
	commit 63f21a56f1cc0b800a4c00349c59448f82473d19 upstream. The existing code it pretty ugly. How about we clean it up even more like this? From: Anton Blanchard <anton@samba.org> We check for timeout expiry in the outer loop, but we also need to check it in the inner loop or we can lock up forever waiting for a CPU to hit real mode. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	oprofile, x86: Fix nmi-unsafe callgraph support	Robert Richter
	commit a0e3e70243f5b270bc3eca718f0a9fa5e6b8262e upstream. Current oprofile's x86 callgraph support may trigger page faults throwing the BUG_ON(in_nmi()) message below. This patch fixes this by using the same nmi-safe copy-from-user code as in perf. ------------[ cut here ]------------ kernel BUG at .../arch/x86/kernel/traps.c:436! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:07:00.0/0000:08:04.0/net/eth0/broadcast CPU 5 Modules linked in: Pid: 8611, comm: opcontrol Not tainted 2.6.39-00007-gfe47ae7 #1 Advanced Micro Device Anaheim/Anaheim RIP: 0010:[<ffffffff813e8e35>] [<ffffffff813e8e35>] do_nmi+0x22/0x1ee RSP: 0000:ffff88042fd47f28 EFLAGS: 00010002 RAX: ffff88042c0a7fd8 RBX: 0000000000000001 RCX: 00000000c0000101 RDX: 00000000ffff8804 RSI: ffffffffffffffff RDI: ffff88042fd47f58 RBP: ffff88042fd47f48 R08: 0000000000000004 R09: 0000000000001484 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88042fd47f58 R13: 0000000000000000 R14: ffff88042fd47d98 R15: 0000000000000020 FS: 00007fca25e56700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000074 CR3: 000000042d28b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process opcontrol (pid: 8611, threadinfo ffff88042c0a6000, task ffff88042c532310) Stack: 0000000000000000 0000000000000001 ffff88042c0a7fd8 0000000000000000 ffff88042fd47de8 ffffffff813e897a 0000000000000020 ffff88042fd47d98 0000000000000000 ffff88042c0a7fd8 ffff88042fd47de8 0000000000000074 Call Trace: <NMI> [<ffffffff813e897a>] nmi+0x1a/0x20 [<ffffffff813f08ab>] ? bad_to_user+0x25/0x771 <<EOE>> Code: ff 59 5b 41 5c 41 5d c9 c3 55 65 48 8b 04 25 88 b5 00 00 48 89 e5 41 55 41 54 49 89 fc 53 48 83 ec 08 f6 80 47 e0 ff ff 04 74 04 <0f> 0b eb fe 81 80 44 e0 ff ff 00 00 01 04 65 ff 04 25 c4 0f 01 RIP [<ffffffff813e8e35>] do_nmi+0x22/0x1ee RSP <ffff88042fd47f28> ---[ end trace ed6752185092104b ]--- Kernel panic - not syncing: Fatal exception in interrupt Pid: 8611, comm: opcontrol Tainted: G D 2.6.39-00007-gfe47ae7 #1 Call Trace: <NMI> [<ffffffff813e5e0a>] panic+0x8c/0x188 [<ffffffff813e915c>] oops_end+0x81/0x8e [<ffffffff8100403d>] die+0x55/0x5e [<ffffffff813e8c45>] do_trap+0x11c/0x12b [<ffffffff810023c8>] do_invalid_op+0x91/0x9a [<ffffffff813e8e35>] ? do_nmi+0x22/0x1ee [<ffffffff8131e6fa>] ? oprofile_add_sample+0x83/0x95 [<ffffffff81321670>] ? op_amd_check_ctrs+0x4f/0x2cf [<ffffffff813ee4d5>] invalid_op+0x15/0x20 [<ffffffff813e8e35>] ? do_nmi+0x22/0x1ee [<ffffffff813e8e7a>] ? do_nmi+0x67/0x1ee [<ffffffff813e897a>] nmi+0x1a/0x20 [<ffffffff813f08ab>] ? bad_to_user+0x25/0x771 <<EOE>> Cc: John Lumby <johnlumby@hotmail.com> Cc: Maynard Johnson <maynardj@us.ibm.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	kexec, x86: Fix incorrect jump back address if not preserving context	Huang Ying
	commit 050438ed5a05b25cdf287f5691e56a58c2606997 upstream. In kexec jump support, jump back address passed to the kexeced kernel via function calling ABI, that is, the function call return address is the jump back entry. Furthermore, jump back entry == 0 should be used to signal that the jump back or preserve context is not enabled in the original kernel. But in the current implementation the stack position used for function call return address is not cleared context preservation is disabled. The patch fixes this bug. Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	pnfs: use lwb as layoutcommit length	Peng Tao
	commit 3557c6c3be5b2ca0b11365db7f8a813253eb520b upstream. Using NFS4_MAX_UINT64 will break current protocol. [Needed in v3.0] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04	pnfs: let layoutcommit handle a list of lseg	Peng Tao
	commit a9bae5666d0510ad69bdb437371c9a3e6b770705 upstream. There can be multiple lseg per file, so layoutcommit should be able to handle it. [Needed in v3.0] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>