lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2013-03-06	Linux 3.2.40v3.2.40	Ben Hutchings

2013-03-06	ext4: fix kernel BUG on large-scale rm -rf commands	Theodore Ts'o
	commit 89a4e48f8479f8145eca9698f39fe188c982212f upstream. Commit 968dee7722: "ext4: fix hole punch failure when depth is greater than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel crashes when users ran run "rm -rf" on large directory hierarchy on ext4 filesystems on RAID devices: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710) Call Trace: [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110 [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0 [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0 [<ffffffff81207e05>] ext4_truncate+0xf5/0x100 [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490 [<ffffffff811a1312>] evict+0xa2/0x1a0 [<ffffffff811a1513>] iput+0x103/0x1f0 [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0 [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40 [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50 [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0 This could be reproduced as follows: The problem in commit 968dee7722 was that caused the variable 'i' to be left uninitialized if the truncate required more space than was available in the journal. This resulted in the function ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused ext4_ext_remove_space() to restart the truncate operation after starting a new jbd2 handle. Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: Marti Raudsepp <marti@juffo.org> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ext4: fix hole punch failure when depth is greater than 0	Ashish Sangwan
	commit 968dee77220768a5f52cf8b21d0bdb73486febef upstream. Whether to continue removing extents or not is decided by the return value of function ext4_ext_more_to_rm() which checks 2 conditions: a) if there are no more indexes to process. b) if the number of entries are decreased in the header of "depth -1". In case of hole punch, if the last block to be removed is not part of the last extent index than this index will not be deleted, hence the number of valid entries in the extent header of "depth - 1" will remain as it is and ext4_ext_more_to_rm will return 0 although the required blocks are not yet removed. This patch fixes the above mentioned problem as instead of removing the extents from the end of file, it starts removing the blocks from the particular extent from which removing blocks is actually required and continue backward until done. Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ext4: rewrite punch hole to use ext4_ext_remove_space()	Lukas Czerner
	commit 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 upstream. This commit rewrites ext4 punch hole implementation to use ext4_ext_remove_space() instead of its home gown way of doing this via ext4_ext_map_blocks(). There are several reasons for changing this. Firstly it is quite non obvious that punching hole needs to ext4_ext_map_blocks() to punch a hole, especially given that this function should map blocks, not unmap it. It also required a lot of new code in ext4_ext_map_blocks(). Secondly the design of it is not very effective. The reason is that we are trying to punch out blocks in ext4_ext_punch_hole() in opposite direction than in ext4_ext_rm_leaf() which causes the ext4_ext_rm_leaf() to iterate through the whole tree from the end to the start to find the requested extent for every extent we are going to punch out. And finally the current implementation does not use the existing code, but bring a lot of new code, which is IMO unnecessary since there already is some infrastructure we can use. Specifically ext4_ext_remove_space(). This commit changes ext4_ext_remove_space() to accept 'end' parameter so we can not only truncate to the end of file, but also remove the space in the middle of the file (punch a hole). Moreover, because the last block to punch out, might be in the middle of the extent, we have to split the extent at 'end + 1' so ext4_ext_rm_leaf() can easily either remove the whole fist part of split extent, or change its size. ext4_ext_remove_space() is then used to actually remove the space (extents) from within the hole, instead of ext4_ext_map_blocks(). Note that this also fix the issue with punch hole, where we would forget to remove empty index blocks from the extent tree, resulting in double free block error and file system corruption. This is simply because we now use different code path, where this problem does not exist. This has been tested with fsx running for several days and xfstests, plus xfstest #251 with '-o discard' run on the loop image (which converts discard requestes into punch hole to the backing file). All of it on 1K and 4K file system block size. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> [bwh: Backported to 3.2.y: move EXT4_EXT_DATA_VALID{1,2} along with the other extent splitting flags] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	fs: cachefiles: add support for large files in filesystem caching	Justin Lecher
	commit 98c350cda2c14a343d34ea01a3d9c24fea5ec66d upstream. Support the caching of large files. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182 Signed-off-by: Justin Lecher <jlec@gentoo.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Suresh Jayaraman <sjayaraman@suse.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjust context - dentry_open() takes dentry and vfsmount pointers, not a path pointer] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	exec: use -ELOOP for max recursion depth	Kees Cook
	commit d740269867021faf4ce38a449353d2b986c34a67 upstream. To avoid an explosion of request_module calls on a chain of abusive scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon as maximum recursion depth is hit, the error will fail all the way back up the chain, aborting immediately. This also has the side-effect of stopping the user's shell from attempting to reexecute the top-level file as a shell script. As seen in the dash source: if (cmd != path_bshell && errno == ENOEXEC) { argv-- = cmd; argv = cmd = path_bshell; goto repeat; } The above logic was designed for running scripts automatically that lacked the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC, things continue to behave as the shell expects. Additionally, when tracking recursion, the binfmt handlers should not be involved. The recursion being tracked is the depth of calls through search_binary_handler(), so that function should be exclusively responsible for tracking the depth. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	kmod: make __request_module() killable	Oleg Nesterov
	commit 1cc684ab75123efe7ff446eb821d44375ba8fa30 upstream. As Tetsuo Handa pointed out, request_module() can stress the system while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE. The task T uses "almost all" memory, then it does something which triggers request_module(). Say, it can simply call sys_socket(). This in turn needs more memory and leads to OOM. oom-killer correctly chooses T and kills it, but this can't help because it sleeps in TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the TIF_MEMDIE task T. Make __request_module() killable. The only necessary change is that call_modprobe() should kmalloc argv and module_name, they can't live in the stack if we use UMH_KILLABLE. This memory is freed via call_usermodehelper_freeinfo()->cleanup. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	kmod: introduce call_modprobe() helper	Oleg Nesterov
	commit 3e63a93b987685f02421e18b2aa452d20553a88b upstream. No functional changes. Move the call_usermodehelper code from __request_module() into the new simple helper, call_modprobe(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	usermodehelper: ____call_usermodehelper() doesn't need do_exit()	Oleg Nesterov
	commit 5b9bd473e3b8a8c6c4ae99be475e6e9b27568555 upstream. Minor cleanup. ____call_usermodehelper() can simply return, no need to call do_exit() explicitely. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	usermodehelper: implement UMH_KILLABLE	Oleg Nesterov
	commit d0bd587a80960d7ba7e0c8396e154028c9045c54 upstream. Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC. The caller must ensure that subprocess_info->path/etc can not go away until call_usermodehelper_freeinfo(). call_usermodehelper_exec(UMH_KILLABLE) does wait_for_completion_killable. If it fails, it uses xchg(&sub_info->complete, NULL) to serialize with umh_complete() which does the same xhcg() to access sub_info->complete. If call_usermodehelper_exec wins, it can safely return. umh_complete() should get NULL and call call_usermodehelper_freeinfo(). Otherwise we know that umh_complete() was already called, in this case call_usermodehelper_exec() falls back to wait_for_completion() which should succeed "very soon". Note: UMH_NO_WAIT == -1 but it obviously should not be used with UMH_KILLABLE. We delay the neccessary cleanup to simplify the back porting. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	usermodehelper: introduce umh_complete(sub_info)	Oleg Nesterov
	commit b3449922502f5a161ee2b5022a33aec8472fbf18 upstream. Preparation. Add the new trivial helper, umh_complete(). Currently it simply does complete(sub_info->complete). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	asus-laptop: Do not call HWRS on init	Ben Hutchings
	commit cb7da022450cdaaebd33078b6b32fb7dd2aaf6db upstream. Since commit 8871e99f89b7 ('asus-laptop: HRWS/HWRS typo'), module initialisation is very slow on the Asus UL30A. The HWRS method takes about 12 seconds to run, and subsequent initialisation also seems to be delayed. Since we don't really need the result, don't bother calling it on init. Those who are curious can still get the result through the 'infos' device attribute. Update the comment about HWRS in show_infos(). Reported-by: ryan <draziw+deb@gmail.com> References: http://bugs.debian.org/692436 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Corentin Chary <corentin.chary@gmail.com> Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
2013-03-06	speakup: lower default software speech rate	Samuel Thibault
	commit cfd757010691eae4e17acc246f74e7622c3a2f05 upstream. Speech synthesis beginners need a low speech rate, and trained people want a high speech rate. A medium speech rate is thus actually not a good default for neither. Since trained people will typically know how to change the rate, better default for a low speech rate, which beginners can grasp and learn how to increase it afterwards This was agreed with users on the speakup mailing list. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	usb: Add USB_QUIRK_RESET_RESUME for all Logitech UVC webcams	Laurent Pinchart
	commit e387ef5c47ddeaeaa3cbdc54424cdb7a28dae2c0 upstream. Most Logitech UVC webcams (both early models that don't advertise UVC compatibility and newer UVC-advertised devices) require the RESET_RESUME quirk. Instead of listing each and every model, match the devices based on the UVC interface information. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Adjust context to apply after 3.2.38] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	usb: Add quirk detection based on interface information	Laurent Pinchart
	commit 80da2e0df5af700518611b7d1cc4fc9945bcaf95 upstream. When a whole class of devices (possibly from a specific vendor, or across multiple vendors) require a quirk, explictly listing all devices in the class make the quirks table unnecessarily large. Fix this by allowing matching devices based on interface information. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	8250: use correct value for PORT_BRCM_TRUMANAGE	Ben Hutchings
	When backporting commit ebebd49a8eab ('8250/16?50: Add support for Broadcom TruManage redirected serial port') I took the next available port type number for PORT_BRCM_TRUMANAGE (22). However, the 8250 port type numbers are exposed to userland through the TIOC{G,S}SERIAL ioctls and so must remain stable. Redefine PORT_BRCM_TRUMANAGE as 25, matching mainline as of commit 85f024401bf807. This leaves port types 22-24 within the valid range for 8250 but not implemented there. Change serial8250_verify_port() to specifically reject these and change serial8250_type() to return "unknown" for them (though I'm not sure why it would ever see them). Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	vhost: fix length for cross region descriptor	Michael S. Tsirkin
	commit bd97120fc3d1a11f3124c7c9ba1d91f51829eb85 upstream. If a single descriptor crosses a region, the second chunk length should be decremented by size translated so far, instead it includes the full descriptor length. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	rc: unlock on error in show_protocols()	Dan Carpenter
	commit 30ebc5e44d057a1619ad63fe32c8c1670c37c4b8 upstream. We recently introduced a new return -ENODEV in this function but we need to unlock before returning. [mchehab@redhat.com: found two patches with the same fix. Merged SOB's/acks into one patch] Acked-by: Herton R. Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Douglas Bagnall <douglas@paradise.net.nz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	Avoid sysfs oops when an rc_dev's raw device is absent	Douglas Bagnall
	commit 720bb6436ff30fccad05cf5bdf961ea5b1f5686d upstream. For some reason, when the lirc daemon learns that a usb remote control has been unplugged, it wants to read the sysfs attributes of the disappearing device. This is useful for uncovering transient inconsistencies, but less so for keeping the system running when such inconsistencies exist. Under some circumstances (like every time I unplug my dvb stick from my laptop), lirc catches an rc_dev whose raw event handler has been removed (presumably by ir_raw_event_unregister), and proceeds to interrogate the raw protocols supported by the NULL pointer. This patch avoids the NULL dereference, and ignores the issue of how this state of affairs came about in the first place. Version 2 incorporates changes recommended by Mauro Carvalho Chehab (-ENODEV instead of -EINVAL, and a signed-off-by). Signed-off-by: Douglas Bagnall <douglas@paradise.net.nz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	usb hid quirks for Masterkit MA901 usb radio	Alexey Klimov
	commit 0322bd3980b3ebf7dde8474e22614cb443d6479a upstream. Don't let Masterkit MA901 USB radio be handled by usb hid drivers. This device will be handled by radio-ma901.c driver. Signed-off-by: Alexey Klimov <klimov.linux@gmail.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ata_piix: Add Device IDs for Intel Wellsburg PCH	James Ralston
	commit 3aee8bc52c415aba8148f144e5e5359b0fd75dd1 upstream. This patch adds the IDE-mode SATA Device IDs for the Intel Wellsburg PCH Signed-off-by: James Ralston <james.d.ralston@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ata_piix: IDE-mode SATA patch for Intel Avoton DeviceIDs	Seth Heasley
	commit aaa515277db9585eeb4fdeb4637b9f9df50a1dd9 upstream. This patch adds the IDE-mode SATA DeviceIDs for the Intel Avoton SOC. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ata_piix: Add Device IDs for Intel Lynx Point-LP PCH	James Ralston
	commit 389cd784969e9148fedcde0608f15bd74d6b769e upstream. This patch adds the IDE-mode SATA Device IDs for the Intel Lynx Point-LP PCH Signed-off-by: James Ralston <james.d.ralston@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ata_piix: IDE-mode SATA patch for Intel DH89xxCC DeviceIDs	Seth Heasley
	commit 96d5d96aedc29c75bb16433f6ecf8664ec3c1b46 upstream. This patch adds the IDE-mode SATA DeviceIDs for the Intel DH89xxCC PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ata_piix: IDE-mode SATA patch for Intel Lynx Point DeviceIDs	Seth Heasley
	commit 78140cfec503c60a178b11fbaae2fef63e9abdc0 upstream. This patch adds the IDE-mode SATA DeviceIDs for the Intel Lynx Point PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	pstore: Avoid deadlock in panic and emergency-restart path	Seiji Aguchi
	commit 9f244e9cfd70c7c0f82d3c92ce772ab2a92d9f64 upstream. [Issue] When pstore is in panic and emergency-restart paths, it may be blocked in those paths because it simply takes spin_lock. This is an example scenario which pstore may hang up in a panic path: - cpuA grabs psinfo->buf_lock - cpuB panics and calls smp_send_stop - smp_send_stop sends IRQ to cpuA - after 1 second, cpuB gives up on cpuA and sends an NMI instead - cpuA is now in an NMI handler while still holding buf_lock - cpuB is deadlocked This case may happen if a firmware has a bug and cpuA is stuck talking with it more than one second. Also, this is a similar scenario in an emergency-restart path: - cpuA grabs psinfo->buf_lock and stucks in a firmware - cpuB kicks emergency-restart via either sysrq-b or hangcheck timer. And then, cpuB is deadlocked by taking psinfo->buf_lock again. [Solution] This patch avoids the deadlocking issues in both panic and emergency_restart paths by introducing a function, is_non_blocking_path(), to check if a cpu can be blocked in current path. With this patch, pstore is not blocked even if another cpu has taken a spin_lock, in those paths by changing from spin_lock_irqsave to spin_trylock_irqsave. In addition, according to a comment of emergency_restart() in kernel/sys.c, spin_lock shouldn't be taken in an emergency_restart path to avoid deadlock. This patch fits the comment below. <snip> /** * emergency_restart - reboot the system * * Without shutting down any hardware or taking any locks * reboot the system. This is called when we know we are in * trouble so this is our best effort to reboot. This is * safe to call in interrupt context. */ void emergency_restart(void) <snip> Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> [bwh: Backported to 3.2: - Adjust context - Add #include <linux/kmsg_dump.h>] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	staging: comedi: ni_labpc: set up command4 register after command3	Ian Abbott
	commit 22056e2b46246d97ff0f7c6e21a77b8daa07f02c upstream. Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting meaningful output from a Lab-PC+ in differential mode for AI cmds, but AI insn reads gave correct readings. He tracked it down to two problems, one of which is addressed by this patch. It seems that writing to the command3 register after writing to the command4 register in `labpc_ai_cmd()` messes up the differential reference bit setting in the command4 register. Set up the command4 register after the command3 register (as in `labpc_ai_rinsn()`) to avoid the problem. Thanks to Tuomas for suggesting the fix. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	staging: comedi: ni_labpc: correct differential channel sequence for AI commands	Ian Abbott
	commit 4c4bc25d0fa6beaf054c0b4c3b324487f266c820 upstream. Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting meaningful output from a Lab-PC+ in differential mode for AI cmds, but AI insn reads gave correct readings. He tracked it down to two problems, one of which is addressed by this patch. It seems the setting of the channel bits for particular scanning modes was incorrect for differential mode. (Only half the number of channels are available in differential mode; comedi refers to them as channels 0, 1, 2 and 3, but the hardware documentation refers to them as channels 0, 2, 4 and 6.) In differential mode, the setting of the channel enable bits in the command1 register should depend on whether the scan enable bit is set. Effectively, we need to double the comedi channel number when the scan enable bit is not set in differential mode. The scan enable bit gets set when the AI scan mode is `MODE_MULT_CHAN_UP` or `MODE_MULT_CHAN_DOWN`, and gets cleared when the AI scan mode is `MODE_SINGLE_CHAN` or `MODE_SINGLE_CHAN_INTERVAL`. The existing test for whether the comedi channel number needs to be doubled in differential mode is incorrect in `labpc_ai_cmd()`. This patch corrects the test. Thanks to Tuomas for suggesting the fix. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ipv6: use a stronger hash for tcp	Eric Dumazet
	[ Upstream commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 ] It looks like its possible to open thousands of TCP IPv6 sessions on a server, all landing in a single slot of TCP hash table. Incoming packets have to lookup sockets in a very long list. We should hash all bits from foreign IPv6 addresses, using a salt and hash mix, not a simple XOR. inet6_ehashfn() can also separately use the ports, instead of xoring them. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ipv4: fix a bug in ping_err().	Li Wei
	[ Upstream commit b531ed61a2a2a77eeb2f7c88b49aa5ec7d9880d8 ] We should get 'type' and 'code' from the outer ICMP header. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	xen-netback: cancel the credit timer when taking the vif down	David Vrabel
	[ Upstream commit 3e55f8b306cf305832a4ac78aa82e1b40e818ece ] If the credit timer is left armed after calling xen_netbk_remove_xenvif(), then it may fire and attempt to schedule the vif which will then oops as vif->netbk == NULL. This may happen both in the fatal error path and during normal disconnection from the front end. The sequencing during shutdown is critical to ensure that: a) vif->netbk doesn't become unexpectedly NULL; and b) the net device/vif is not freed. 1. Mark as unschedulable (netif_carrier_off()). 2. Synchronously cancel the timer. 3. Remove the vif from the schedule list. 4. Remove it from it netback thread group. 5. Wait for vif->refcnt to become 0. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reported-by: Christopher S. Aker <caker@theshore.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	xen-netback: correctly return errors from netbk_count_requests()	David Vrabel
	[ Upstream commit 35876b5ffc154c357476b2c3bdab10feaf4bd8f0 ] netbk_count_requests() could detect an error, call netbk_fatal_tx_error() but return 0. The vif may then be used afterwards (e.g., in a call to netbk_tx_error(). Since netbk_fatal_tx_error() could set vif->refcnt to 1, the vif may be freed immediately after the call to netbk_fatal_tx_error() (e.g., if the vif is also removed). Netback thread Xenwatch thread ------------------------------------------- netbk_fatal_tx_err() netback_remove() xenvif_disconnect() ... free_netdev() netbk_tx_err() Oops! Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Christopher S. Aker <caker@theshore.net> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	bridge: set priority of STP packets	Stephen Hemminger
	[ Upstream commit 547b4e718115eea74087e28d7fa70aec619200db ] Spanning Tree Protocol packets should have always been marked as control packets, this causes them to get queued in the high prirority FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge gets overloaded and can't communicate. This is a long-standing bug back to the first versions of Linux bridge. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	xen-pciback: rate limit error messages from xen_pcibk_enable_msi{,x}()	Jan Beulich
	commit 51ac8893a7a51b196501164e645583bf78138699 upstream. ... as being guest triggerable (e.g. by invoking XEN_PCI_OP_enable_msi{,x} on a device not being MSI/MSI-X capable). This is CVE-2013-0231 / XSA-43. Also make the two messages uniform in both their wording and severity. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [bwh: Backported to 3.2: add #include <linux/ratelimited.h>, needed by printk_ratelimited()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	unbreak automounter support on 64-bit kernel with 32-bit userspace (v2)	Helge Deller
	commit 4f4ffc3a5398ef9bdbb32db04756d7d34e356fcf upstream. automount-support is broken on the parisc architecture, because the existing #if list does not include a check for defined(__hppa__). The HPPA (parisc) architecture is similiar to other 64bit Linux targets where we have to define autofs_wqt_t (which is passed back and forth to user space) as int type which has a size of 32bit across 32 and 64bit kernels. During the discussion on the mailing list, H. Peter Anvin suggested to invert the #if list since only specific platforms (specifically those who do not have a 32bit userspace, like IA64 and Alpha) should have autofs_wqt_t as unsigned long type. This suggestion is probably the best way to go, since Arm64 (and maybe others?) seems to have a non-working automounter. So in the long run even for other new upcoming architectures this inverted check seem to be the best solution, since it will not require them to change this #if again (unless they are 64bit only). Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Ian Kent <raven@themaw.net> Acked-by: Catalin Marinas <catalin.marinas@arm.com> CC: James Bottomley <James.Bottomley@HansenPartnership.com> CC: Rolf Eike Beer <eike-kernel@sf-tec.de> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	s390/timer: avoid overflow when programming clock comparator	Heiko Carstens
	commit d911e03d097bdc01363df5d81c43f69432eb785c upstream. Since ed4f209 "s390/time: fix sched_clock() overflow" a new helper function is used to avoid overflows when converting TOD format values to nanosecond values. The kvm interrupt code formerly however only worked by accident because of an overflow. It tried to program a timer that would expire in more than ~29 years. Because of the old TOD-to-nanoseconds overflow bug the real expiry value however was much smaller, but now it isn't anymore. This however triggers yet another bug in the function that programs the clock comparator s390_next_ktime(): if the absolute "expires" value is after 2042 this will result in an overflow and the programmed value is lower than the current TOD value which immediatly triggers a clock comparator (= timer) interrupt. Since the timer isn't expired it will be programmed immediately again and so on... the result is a dead system. To fix this simply program the maximum possible value if an overflow is detected. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	drm/radeon/evergreen+: wait for the MC to settle after MC blackout	Alex Deucher
	commit ed39fadd6df01095378e499fac3674883f16b853 upstream. Some chips seem to need a little delay after blacking out the MC before the requests actually stop. May fix: https://bugs.freedesktop.org/show_bug.cgi?id=56139 https://bugs.freedesktop.org/show_bug.cgi?id=57567 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	igb: Remove artificial restriction on RQDPC stat reading	Alexander Duyck
	commit ae1c07a6b7ced6c0c94c99e3b53f4e7856fa8bff upstream. For some reason the reading of the RQDPC register was being artificially limited to 4K. Instead of limiting the value we should read the value and add the full amount. Otherwise this can lead to a misleading number of dropped packets when the actual value is in fact much higher. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	nbd: fsync and kill block device on shutdown	Paolo Bonzini
	commit 3a2d63f87989e01437ba994df5f297528c353d7d upstream. There are two problems with shutdown in the NBD driver. 1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem. This patch adds the sync operation into __nbd_ioctl()'s NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted to processes that have CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the block device does not sync the filesystem, either). 2: Once we clear the socket we have no guarantee that later reads will come from the same backing storage. The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket clearing code so the page cache is cleaned, lest reads that hit on the page cache will return stale data from the previously-accessible disk. Example: # qemu-nbd -r -c/dev/nbd0 /dev/sr0 # file -s /dev/nbd0 /dev/stdin: # UDF filesystem data (version 1.5) etc. # qemu-nbd -d /dev/nbd0 # qemu-nbd -r -c/dev/nbd0 /dev/sda # file -s /dev/nbd0 /dev/stdin: # UDF filesystem data (version 1.5) etc. While /dev/sda has: # file -s /dev/sda /dev/sda: x86 boot sector; etc. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paul Clements <Paul.Clements@steeleye.com> Cc: Alex Bligh <alex@alex.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjusted context - s/\bnbd\b/lo/ - Incorporate export of kill_bdev() from commit ff01bb483265 ('fs: move code out of buffer.c')] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	sysctl: fix null checking in bin_dn_node_address()	Xi Wang
	commit df1778be1a33edffa51d094eeda87c858ded6560 upstream. The null check of `strchr() + 1' is broken, which is always non-null, leading to OOB read. Instead, check the result of strchr(). Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	idr: fix top layer handling	Tejun Heo
	commit 326cf0f0f308933c10236280a322031f0097205d upstream. Most functions in idr fail to deal with the high bits when the idr tree grows to the maximum height. * idr_get_empty_slot() stops growing idr tree once the depth reaches MAX_IDR_LEVEL - 1, which is one depth shallower than necessary to cover the whole range. The function doesn't even notice that it didn't grow the tree enough and ends up allocating the wrong ID given sufficiently high @starting_id. For example, on 64 bit, if the starting id is 0x7fffff01, idr_get_empty_slot() will grow the tree 5 layer deep, which only covers the 30 bits and then proceed to allocate as if the bit 30 wasn't specified. It ends up allocating 0x3fffff01 without the bit 30 but still returns 0x7fffff01. * __idr_remove_all() will not remove anything if the tree is fully grown. * idr_find() can't find anything if the tree is fully grown. * idr_for_each() and idr_get_next() can't iterate anything if the tree is fully grown. Fix it by introducing idr_max() which returns the maximum possible ID given the depth of tree and replacing the id limit checks in all affected places. As the idr_layer pointer array pa[] needs to be 1 larger than the maximum depth, enlarge pa[] arrays by one. While this plugs the discovered issues, the whole code base is horrible and in desparate need of rewrite. It's fragile like hell, Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjust context - s/MAX_IDR_LEVEL/MAX_LEVEL/; s/MAX_IDR_SHIFT/MAX_ID_SHIFT/ - Drop change to idr_alloc()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	idr: make idr_get_next() good for rcu_read_lock()	Hugh Dickins
	commit 9f7de8275b46d9d11b1505adbfe6c2bb48df4741 upstream. Make one small adjustment to idr_get_next(): take the height from the top layer (stable under RCU) instead of from the root (unprotected by RCU), as idr_find() does: so that it can be used with RCU locking. Copied comment on RCU locking from idr_find(). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	firewire: add minor number range check to fw_device_init()	Tejun Heo
	commit 3bec60d511179853138836ae6e1b61fe34d9235f upstream. fw_device_init() didn't check whether the allocated minor number isn't too large. Fail if it goes overflows MINORBITS. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	block: fix synchronization and limit check in blk_alloc_devt()	Tejun Heo
	commit ce23bba842aee98092225d9576dba47c82352521 upstream. idr allocation in blk_alloc_devt() wasn't synchronized against lookup and removal, and its limit check was off by one - 1 << MINORBITS is the number of minors allowed, not the maximum allowed minor. Add locking and rename MAX_EXT_DEVT to NR_EXT_DEVT and fix limit checking. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	idr: fix a subtle bug in idr_get_next()	Tejun Heo
	commit 6cdae7416a1c45c2ce105a78187d9b7e8feb9e24 upstream. The iteration logic of idr_get_next() is borrowed mostly verbatim from idr_for_each(). It walks down the tree looking for the slot matching the current ID. If the matching slot is not found, the ID is incremented by the distance of single slot at the given level and repeats. The implementation assumes that during the whole iteration id is aligned to the layer boundaries of the level closest to the leaf, which is true for all iterations starting from zero or an existing element and thus is fine for idr_for_each(). However, idr_get_next() may be given any point and if the starting id hits in the middle of a non-existent layer, increment to the next layer will end up skipping the same offset into it. For example, an IDR with IDs filled between [64, 127] would look like the following. [ 0 64 ... ] /----/ \| \| \| NULL [ 64 ... 127 ] If idr_get_next() is called with 63 as the starting point, it will try to follow down the pointer from 0. As it is NULL, it will then try to proceed to the next slot in the same level by adding the slot distance at that level which is 64 - making the next try 127. It goes around the loop and finds and returns 127 skipping [64, 126]. Note that this bug also triggers in idr_for_each_entry() loop which deletes during iteration as deletions can make layers go away leaving the iteration with unaligned ID into missing layers. Fix it by ensuring proceeding to the next slot doesn't carry over the unaligned offset - ie. use round_up(id + 1, slot_distance) instead of id += slot_distance. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Teigland <teigland@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	block: fix ext_devt_idr handling	Tomas Henzl
	commit 7b74e912785a11572da43292786ed07ada7e3e0c upstream. While adding and removing a lot of disks disks and partitions this sometimes shows up: WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted) Hardware name: sysfs: cannot create duplicate filename '/dev/block/259:751' Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan] Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1 Call Trace: warn_slowpath_common+0x87/0xc0 warn_slowpath_fmt+0x46/0x50 sysfs_add_one+0xc9/0x130 sysfs_do_create_link+0x12b/0x170 sysfs_create_link+0x13/0x20 device_add+0x317/0x650 idr_get_new+0x13/0x50 add_partition+0x21c/0x390 rescan_partitions+0x32b/0x470 sd_open+0x81/0x1f0 [sd_mod] __blkdev_get+0x1b6/0x3c0 blkdev_get+0x10/0x20 register_disk+0x155/0x170 add_disk+0xa6/0x160 sd_probe_async+0x13b/0x210 [sd_mod] add_wait_queue+0x46/0x60 async_thread+0x102/0x250 default_wake_function+0x0/0x20 async_thread+0x0/0x250 kthread+0x96/0xa0 child_rip+0xa/0x20 kthread+0x0/0xa0 child_rip+0x0/0x20 This most likely happens because dev_t is freed while the number is still used and idr_get_new() is not protected on every use. The fix adds a mutex where it wasn't before and moves the dev_t free function so it is called after device del. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ocfs2: ac->ac_allow_chain_relink=0 won't disable group relink	Xiaowei.Hu
	commit 309a85b6861fedbb48a22d45e0e079d1be993b3a upstream. ocfs2_block_group_alloc_discontig() disables chain relink by setting ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple cluster groups. It doesn't keep the credits for all chain relink,but ocfs2_claim_suballoc_bits overrides this in this call trace: ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()-> __ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits() ocfs2_claim_suballoc_bits set ac->ac_allow_chain_relink = 1; then call ocfs2_search_chain() one time and disable it again, and then we run out of credits. Fix is to allow relink by default and disable it in ocfs2_block_group_alloc_discontig. Without this patch, End-users will run into a crash due to run out of credits, backtrace like this: RIP: 0010:[<ffffffffa0808b14>] [<ffffffffa0808b14>] jbd2_journal_dirty_metadata+0x164/0x170 [jbd2] RSP: 0018:ffff8801b919b5b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0 RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8 RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0 R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800 FS: 00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0) Call Trace: ocfs2_journal_dirty+0x2f/0x70 [ocfs2] ocfs2_relink_block_group+0x111/0x480 [ocfs2] ocfs2_search_chain+0x455/0x9a0 [ocfs2] ... Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctly	Jeff Liu
	commit 32918dd9f19e5960af4cdfa41190bb843fb2247b upstream. We need to re-initialize the security for a new reflinked inode with its parent dirs if it isn't specified to be preserved for ocfs2_reflink(). However, the code logic is broken at ocfs2_init_security_and_acl() although ocfs2_init_security_get() succeed. As a result, ocfs2_acl_init() does not involked and therefore the default ACL of parent dir was missing on the new inode. Note this was introduced by 9d8f13ba3 ("security: new security_inode_init_security API adds function callback") To reproduce: set default ACL for the parent dir(ocfs2 in this case): $ setfacl -m default:user:jeff:rwx ../ocfs2/ $ getfacl ../ocfs2/ # file: ../ocfs2/ # owner: jeff # group: jeff user::rwx group::r-x other::r-x default:user::rwx default:user:jeff:rwx default:group::r-x default:mask::rwx default:other::r-x $ touch a $ getfacl a # file: a # owner: jeff # group: jeff user::rw- group::rw- other::r-- Before patching, create reflink file b from a, the user default ACL entry(user:jeff:rwx)was missing: $ ./ocfs2_reflink a b $ getfacl b # file: b # owner: jeff # group: jeff user::rw- group::rw- other::r-- In this case, the end user can also observed an error message at syslog: (ocfs2_reflink,3229,2):ocfs2_init_security_and_acl:7193 ERROR: status = 0 After applying this patch, create reflink file c from a: $ ./ocfs2_reflink a c $ getfacl c # file: c # owner: jeff # group: jeff user::rw- user:jeff:rwx #effective:rw- group::r-x #effective:r-- mask::rw- other::r-- Test program: /* Usage: reflink <source> <dest> / #include <stdio.h> #include <stdint.h> #include <stdbool.h> #include <string.h> #include <errno.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/ioctl.h> static int reflink_file(char const src_name, char const dst_name, bool preserve_attrs) { int fd; #ifndef REFLINK_ATTR_NONE # define REFLINK_ATTR_NONE 0 #endif #ifndef REFLINK_ATTR_PRESERVE # define REFLINK_ATTR_PRESERVE 1 #endif #ifndef OCFS2_IOC_REFLINK struct reflink_arguments { uint64_t old_path; uint64_t new_path; uint64_t preserve; }; # define OCFS2_IOC_REFLINK _IOW ('o', 4, struct reflink_arguments) #endif struct reflink_arguments args = { .old_path = (unsigned long) src_name, .new_path = (unsigned long) dst_name, .preserve = preserve_attrs ? REFLINK_ATTR_PRESERVE : REFLINK_ATTR_NONE, }; fd = open(src_name, O_RDONLY); if (fd < 0) { fprintf(stderr, "Failed to open %s: %s\n", src_name, strerror(errno)); return -1; } if (ioctl(fd, OCFS2_IOC_REFLINK, &args) < 0) { fprintf(stderr, "Failed to reflink %s to %s: %s\n", src_name, dst_name, strerror(errno)); return -1; } } int main(int argc, char argv[]) { if (argc != 3) { fprintf(stdout, "Usage: %s source dest\n", argv[0]); return 1; } return reflink_file(argv[1], argv[2], 0); } Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Tao Ma <boyu.mt@taobao.com> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	x86: Make sure we can boot in the case the BDA contains pure garbage	H. Peter Anvin
	commit 7c10093692ed2e6f318387d96b829320aa0ca64c upstream. On non-BIOS platforms it is possible that the BIOS data area contains garbage instead of being zeroed or something equivalent (firmware people: we are talking of 1.5K here, so please do the sane thing.) We need on the order of 20-30K of low memory in order to boot, which may grow up to < 64K in the future. We probably want to avoid the lowest of the low memory. At the same time, it seems extremely unlikely that a legitimate EBDA would ever reach down to the 128K (which would require it to be over half a megabyte in size.) Thus, pick 128K as the cutoff for "this is insane, ignore." We may still end up reserving a bunch of extra memory on the low megabyte, but that is not really a major issue these days. In the worst case we lose 512K of RAM. This code really should be merged with trim_bios_range() in arch/x86/kernel/setup.c, but that is a bigger patch for a later merge window. Reported-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.org Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06	ocfs2: fix possible use-after-free with AIO	Jan Kara
	commit 9b171e0c74ca0549d0610990a862dd895870f04a upstream. Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>