lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2010-07-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.33.y	Thomas Gleixner
	Conflicts: Makefile Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-13	vfs: Revert the scalability patches	Thomas Gleixner
	We still have sporadic and hard to debug problems. Revert it for now and revisit with Nick's new version. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-05	Btrfs: should add a permission check for setfacl	Shi Weihua
	commit 2f26afba46f0ebf155cf9be746496a0304a5b7cf upstream. On btrfs, do the following ------------------ # su user1 # cd btrfs-part/ # touch aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rw- group::rw- other::r-- # su user2 # cd btrfs-part/ # setfacl -m u::rwx aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rwx <- successed to setfacl group::rw- other::r-- ------------------ but we should prohibit it that user2 changing user1's acl. In fact, on ext3 and other fs, a message occurs: setfacl: aaa: Operation not permitted This patch fixed it. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	vfs: add NOFOLLOW flag to umount(2)	Miklos Szeredi
	commit db1f05bb85d7966b9176e293f3ceead1cb8b5d79 upstream. Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent symlink attacks in unprivileged unmounts (fuse, samba, ncpfs). Additionally, return -EINVAL if an unknown flag is used (and specify an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for the caller to determine if a flag is supported or not. CC: Eugene Teo <eugene@redhat.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	CIFS: Allow null nd (as nfs server uses) on create	Steve French
	commit fa588e0c57048b3d4bfcd772d80dc0615f83fd35 upstream. While creating a file on a server which supports unix extensions such as Samba, if a file is being created which does not supply nameidata (i.e. nd is null), cifs client can oops when calling cifs_posix_open. Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	GFS2: Fix permissions checking for setflags ioctl()	Steven Whitehouse
	commit 7df0e0397b9a18358573274db9fdab991941062f upstream. We should be checking for the ownership of the file for which flags are being set, rather than just for write access. Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files	Theodore Ts'o
	commit 1f5a81e41f8b1a782c68d3843e9ec1bfaadf7d72 upstream. Dan Roseberg has reported a problem with the MOVE_EXT ioctl. If the donor file is an append-only file, we should not allow the operation to proceed, lest we end up overwriting the contents of an append-only file. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	ext4: check s_log_groups_per_flex in online resize code	Eric Sandeen
	commit 42007efd569f1cf3bfb9a61da60ef6c2179508ca upstream. If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase: # truncate --size 128M fsfile # mkfs.ext3 -F fsfile # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile # e2fsck -yDf -C0 fsfile # truncate --size 132M fsfile # losetup /dev/loop0 fsfile # mount /dev/loop0 mnt # resize2fs -p /dev/loop0 https://bugzilla.kernel.org/show_bug.cgi?id=13549 Reported-by: Alessandro Polverini <alex@nibbles.it> Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	wrong type for 'magic' argument in simple_fill_super()	Roberto Sassu
	commit 7d683a09990ff095a91b6e724ecee0ff8733274a upstream. It's used to superblock ->s_magic, which is unsigned long. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Reviewed-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	exofs: confusion between kmap() and kmap_atomic() api	Dan Carpenter
	commit ddf08f4b90a413892bbb9bb2e8a57aed991cd47d upstream. For kmap_atomic() we call kunmap_atomic() on the returned pointer. That's different from kmap() and kunmap() and so it's easy to get them backwards. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	writeback: disable periodic old data writeback for !dirty_writeback_centisecs	Jens Axboe
	commit 69b62d01ec44fe0d505d89917392347732135a4d upstream. Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled periodic dirty writeback from kupdate. This got broken and now causes excessive sys CPU usage if set to zero, as we'll keep beating on schedule(). Reported-by: Justin Maggard <jmaggard10@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	NFSD: don't report compiled-out versions as present	Pavel Emelyanov
	commit 15ddb4aec54422ead137b03ea4e9b3f5db3f7cc2 upstream. The /proc/fs/nfsd/versions file calls nfsd_vers() to check whether the particular nfsd version is present/available. The problem is that once I turn off e.g. NFSD-V4 this call returns -1 which is true from the callers POV which is wrong. The proposal is to report false in that case. The bug has existed since 6658d3a7bbfd1768 "[PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions". Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-06-08	dcache: Prevent d_genocide() from decrementing d_count more than once	Steven Rostedt
	When d_genocide is called, it traverses the dentries decrementing the d_count. After the loops, a check is made to see if a rename or other change has been made, if so, it traverse the tree again. Unfortunately, this will decrement the same dentries that it decremented the first time. To avoid this multiple decrement, I added a DCACHE_GENOCIDE flag that will be set to a dentry when the d_genocide has decremented its counter. It will only decrement the counter if this flag is not set. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <npiggin@suse.de> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Clark Williams <williams@redhat.com> Acked-by: john stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-06-08	nfs: Avoid igrab deadlock.	john stultz
	On Fri, 2010-05-21 at 11:04 -0400, Jan Kansky wrote: > It's crashing when I try to login as a user that has a nfs4 mounted home > directory. See attached. > > I saw the discussion here: > > http://kerneltrap.org/mailarchive/linux-kernel/2010/5/7/4566886/thread > > That patch you mentioned there is already included in 2.6.33.4, and yet > I still see: > > kernel BUG at kernel/rtmutex.c:808! So the issue is nfs4_get_open_state() is calling igrab() while holding the inode->i_lock. With the vfs scalability patches, igrab() takes that lock as well, so we deadlock. I believe the following resolves the issue, however I'm not sure if its totally correct or not, since I'm not totally sure if the __iget() rules. With this patch, Jan Kansky (the original reporter) still had booting issues, but was unable to provide details. However, this patch has been tested by John Kacur and resolves the hang for him, however we still see MNT_MOUNTED WARN_ONs that I'm still trying to decide how to fix, but are otherwise harmless. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Jan Kansky <kansky@ll.mit.edu> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-06-08	dcache: Fix select_parent dentry traversal locking	john stultz
	With the vfs-scalability patches, the dcache_lock was replaced by the dentry->d_lock for protecting the dentry->d_subdirs as well as child dentries from being killed from under us. However, in select_parent, to minimize lock holds, when traversing down a directory, the parent d_lock was released. Then on ascending, the parent d_lock needed to be reacquired. This causes problems because we must aquire the parent lock first, so we would have to release the child lock, aquire the parent lock, then reaquire the next child lock. The problem being, when we released the child lock, we held no locks. If we were preempted here, the next dentry we were going to look at may be killed under us, and the d_subdirs list modified, so when we then reacquire the parent lock, the next value is then stale and we've lost our place in the d_subdirs list. This results in semi random null pointer dereferences and gpf oopses. To resolve this, I simply made select_parent continue to hold the parent lock as it descends down the directory tree. This avoids the lock acquisition dance on ascend and avoids the race illustrated above. This may results in higher contention due to us holding more locks, but its still better then the global dcache_lock originally being replaced. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Clark Williams <williams@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-31	Merge stable/linux-2.6.33.y into rt/2.6.33	Thomas Gleixner
	Conflicts: Makefile Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-26	nilfs2: fix sync silent failure	Ryusuke Konishi
	commit 973bec34bfc1bc2465646181653d67f767d418c8 upstream. As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set. And nilfs does not set s_bdi anywhere. I noticed this problem by the warning introduced by the recent commit 5129a469 ("Catch filesystem lacking s_bdi"). WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e() Hardware name: PowerEdge 2850 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38 Call Trace: [<c1028422>] warn_slowpath_common+0x60/0x90 [<c102845f>] warn_slowpath_null+0xd/0x10 [<c1095936>] vfs_kern_mount+0xc5/0x14e [<c1095a03>] do_kern_mount+0x32/0xbd [<c10a811e>] do_mount+0x671/0x6d0 [<c1073794>] ? __get_free_pages+0x1f/0x21 [<c10a684f>] ? copy_mount_options+0x2b/0xe2 [<c107b634>] ? strndup_user+0x48/0x67 [<c10a81de>] sys_mount+0x61/0x8f [<c100280c>] sysenter_do_call+0x12/0x32 This ensures to set s_bdi for nilfs and fixes the sync silent failure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	CacheFiles: Fix error handling in cachefiles_determine_cache_security()	David Howells
	commit 7ac512aa8237c43331ffaf77a4fd8b8d684819ba upstream. cachefiles_determine_cache_security() is expected to return with a security override in place. However, if set_create_files_as() fails, we fail to do this. In this case, we should just reinstate the security override that was set by the caller. Furthermore, if set_create_files_as() fails, we should dispose of the new credentials we were in the process of creating. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	Btrfs: check for read permission on src file in the clone ioctl	Dan Rosenberg
	commit 5dc6416414fb3ec6e2825fd4d20c8bf1d7fe0395 upstream. The existing code would have allowed you to clone a file that was only open for writing Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	inotify: don't leak user struct on inotify release	Pavel Emelyanov
	commit b3b38d842fa367d862b83e7670af4e0fd6a80fc0 upstream. inotify_new_group() receives a get_uid-ed user_struct and saves the reference on group->inotify_data.user. The problem is that free_uid() is never called on it. Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify using fsnotify) after 2.6.30. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	inotify: race use after free/double free in inotify inode marks	Eric Paris
	commit e08733446e72b983fed850fc5d8bd21b386feb29 upstream. There is a race in the inotify add/rm watch code. A task can find and remove a mark which doesn't have all of it's references. This can result in a use after free/double free situation. Task A Task B ------------ ----------- inotify_new_watch() allocate a mark (refcnt == 1) add it to the idr inotify_rm_watch() inotify_remove_from_idr() fsnotify_put_mark() refcnt hits 0, free take reference because we are on idr [at this point it is a use after free] [time goes on] refcnt may hit 0 again, double free The fix is to take the reference BEFORE the object can be found in the idr. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	cifs: guard against hardlinking directories	Jeff Layton
	commit 3d69438031b00c601c991ab447cafb7d5c3c59a6 upstream. When we made serverino the default, we trusted that the field sent by the server in the "uniqueid" field was actually unique. It turns out that it isn't reliably so. Samba, in particular, will just put the st_ino in the uniqueid field when unix extensions are enabled. When a share spans multiple filesystems, it's quite possible that there will be collisions. This is a server bug, but when the inodes in question are a directory (as is often the case) and there is a collision with the root inode of the mount, the result is a kernel panic on umount. Fix this by checking explicitly for directory inodes with the same uniqueid. If that is the case, then we can assume that using server inode numbers will be a problem and that they should be disabled. Fixes Samba bugzilla 7407 Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	revert "procfs: provide stack information for threads" and its fixup commits	Robin Holt
	commit 34441427aab4bdb3069a4ffcda69a99357abcb2e upstream. Originally, commit d899bf7b ("procfs: provide stack information for threads") attempted to introduce a new feature for showing where the threadstack was located and how many pages are being utilized by the stack. Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was applied to fix the NO_MMU case. Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") was applied to fix a bug in ia32 executables being loaded. Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel threads") was applied to fix a bug which had kernel threads printing a userland stack address. Commit 1306d603f ('proc: partially revert "procfs: provide stack information for threads"') was then applied to revert the stack pages being used to solve a significant performance regression. This patch nearly undoes the effect of all these patches. The reason for reverting these is it provides an unusable value in field 28. For x86_64, a fork will result in the task->stack_start value being updated to the current user top of stack and not the stack start address. This unpredictability of the stack_start value makes it worthless. That includes the intended use of showing how much stack space a thread has. Other architectures will get different values. As an example, ia64 gets 0. The do_fork() and copy_process() functions appear to treat the stack_start and stack_size parameters as architecture specific. I only partially reverted c44972f1 ("procfs: disable per-task stack usage on NOMMU") . If I had completely reverted it, I would have had to change mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the builds without significant effort, I decided to not change mm/Makefile. I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") . I left the KSTK_ESP() change in place as that seemed worthwhile. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-20	fs: namespace: Fix fuse mount fallout	john stultz
	> WARNING: at fs/namespace.c:648 commit_tree+0xf1/0x10b() > Pid: 14002, comm: fusermount Not tainted 2.6.33.3-rt19 #32 > Call Trace: > [<ffffffff81040e67>] warn_slowpath_common+0x7c/0x94 > [<ffffffff81040e93>] warn_slowpath_null+0x14/0x16 > [<ffffffff81115811>] commit_tree+0xf1/0x10b > [<ffffffff8111661d>] attach_recursive_mnt+0xf2/0x188 > [<ffffffff811167b3>] graft_tree+0x100/0x102 > [<ffffffff8111765b>] do_mount+0x386/0x7ae > [<ffffffff810d55f2>] ? strndup_user+0x5d/0x85 > [<ffffffff81117b0b>] sys_mount+0x88/0xc2 > [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b Looks like the fuse mounts are more interesting here and are tripping up the MNT_MOUNTED logic. Trivial cases: MNT_MOUNTED gets set by: attach_mnt commit_tree MNT_MOUNTED gets unset by: detach_mnt unmount_tree So there's a nice symmetry there. We also clear MNT_MOUNTED in clone_mnt(), since we're creating a unmounted copy that we will latter call attach_mnt() upon. Now, here's where things get messy: copy_tree(): Earlier, we didn't set MNT_MOUNTED on the first clone on the root of the mnt to be copied. This caused problems with new namespaces since after it is copied, we don't call attach_mnt or commit_tree. So when the namespace is removed, and we call unmount_tree, and hit a WARN_ON. Similarly, if we bombed out in copy_tree due to a ENOMEM, we call umount_tree on the mnt and will hit the WARN_ON as well. The same issue hits us with collect_mounts and drop_collected_mounts, where we copy_tree() and then unmount_tree() and hit the WARN_ON. This seemed broken, so I set MNT_MOUNTED on the root cloned mnt in copy_tree and it resolved the above asymmetries. However, do_loopback is more complicated, since it calls either copy_tree or clone_mnt (depending on the recursive flag) and then grafts that mnt which calls commit_tree()/attach_mnt(). Leaving clone_mnt(), the mnt is not set as MNT_MOUNTED, but now with my change to copy_tree(), it sets the root as MNT_MOUNTED. This then causes a WARN_ON in the commit_tree() called by graft_tree(). The fix below simply clears the recently set MNT_MOUNTED flag after copy_tree() returns, before calling graft_tree(). Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Carsten Emde <C.Emde@osadl.org> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-13	fs: Resolve mntput_no_expire issues.	john stultz
	In testing the mnt_count typo fix, I hit a few BUG_ON/WARN_ON messages in the mntput_no_expire code. The first issue was a race against the MNT_MOUNTED flag, where if after the optimistic lock free check is done, someone changes the value, we might BUG_ON after getting the lock. The fix is after getting the lock, re-check the MNT_MOUNTED bit and drop the lock and try again if its changed. The second issue was a call to smp_processor_id() in add_mnt_count() that was done while preemptable. This was missed in my earlier commit 070976b5b038218900648ea4cc88786d5dfcd58d. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Clark Williams <williams@redhat.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Nick Piggin <npiggin@suse.de> LKML-Reference: <1273711934.2856.22.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-13	fs: Fix mnt_count typo	john stultz
	Clark noticed the following snippit in commit 070976b5b038218900648ea4cc88786d5dfcd58d : if (mnt->mnt_pinned) { - inc_mnt_count(mnt); + preempt_disable(); + dec_mnt_count(mnt); + preempt_enable(); mnt->mnt_pinned--; } vfsmount_write_unlock(); I accidentally replaced an inc_mnt_count() with a dec_mnt_count(). The issue went unnoticed, as the only user of mnt_unpin in the acct syscall. This patch corrects the mistake. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Clark Williams <williams@redhat.com> Cc: Darren Hart <dvhltc@us.ibm.com> LKML-Reference: <1273711544.2856.15.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-13	Merge branch '2.6.33.4' into rt/2.6.33	Thomas Gleixner
	Conflicts: Makefile Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-12	xfs: add a shrinker to background inode reclaim	Dave Chinner
	commit 9bf729c0af67897ea8498ce17c29b0683f7f2028 upstream On low memory boxes or those with highmem, kernel can OOM before the background reclaims inodes via xfssyncd. Add a shrinker to run inode reclaim so that it inode reclaim is expedited when memory is low. This is more complex than it needs to be because the VM folk don't want a context added to the shrinker infrastructure. Hence we need to add a global list of XFS mount structures so the shrinker can traverse them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	jfs: fix diAllocExt error in resizing filesystem	Bill Pemberton
	commit 2b0b39517d1af5294128dbc2fd7ed39c8effa540 upstream. Resizing the filesystem would result in an diAllocExt error in some instances because changes in bmp->db_agsize would not get noticed if goto extendBmap was called. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: jfs-discussion@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	ext4: correctly calculate number of blocks for fiemap	Leonard Michlmayr
	commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1 upstream. ext4_fiemap() rounds the length of the requested range down to blocksize, which is is not the true number of blocks that cover the requested region. This problem is especially impressive if the user requests only the first byte of a file: not a single extent will be reported. We fix this by calculating the last block of the region and then subtract to find the number of blocks in the extents. Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	NFS: rsize and wsize settings ignored on v4 mounts	Chuck Lever
	commit 356e76b855bdbfd8d1c5e75bcf0c6bf0dfe83496 upstream. NFSv4 mounts ignore the rsize and wsize mount options, and always use the default transfer size for both. This seems to be because all NFSv4 mounts are now cloned, and the cloning logic doesn't copy the rsize and wsize settings from the parent nfs_server. I tested Fedora's 2.6.32.11-99 and it seems to have this problem as well, so I'm guessing that .33, .32, and perhaps older kernels have this issue as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	nfs d_revalidate() is too trigger-happy with d_drop()	Al Viro
	commit d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f upstream. If dentry found stale happens to be a root of disconnected tree, we can't d_drop() it; its d_hash is actually part of s_anon and d_drop() would simply hide it from shrink_dcache_for_umount(), leading to all sorts of fun, including busy inodes on umount and oopsen after that. Bug had been there since at least 2006 (commit c636eb already has it), so it's definitely -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	ocfs2_dlmfs: Fix math error when reading LVB.	Joel Becker
	commit a36d515c7a2dfacebcf41729f6812dbc424ebcf0 upstream. When asked for a partial read of the LVB in a dlmfs file, we can accidentally calculate a negative count. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	ocfs2: Compute metaecc for superblocks during online resize.	Joel Becker
	commit a42ab8e1a37257da37e0f018e707bf365ac24531 upstream. Online resize writes out the new superblock and its backups directly. The metaecc data wasn't being recomputed. Let's do that directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	ocfs2: potential ERR_PTR dereference on error paths	Dan Carpenter
	commit 0350cb078f5035716ebdad4ad4709d02fe466a8a upstream. If "handle" is non null at the end of the function then we assume it's a valid pointer and pass it to ocfs2_commit_trans(); Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	ocfs2: Update VFS inode's id info after reflink.	Tao Ma
	commit c21a534e2f24968cf74976a4e721ac194db30ded upstream. In reflink we update the id info on the disk but forgot to update the corresponding information in the VFS inode. Update them accordingly when we want to preserve the attributes. Reported-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	nfsd4: bug in read_buf	Neil Brown
	commit 2bc3c1179c781b359d4f2f3439cb3df72afc17fc upstream. When read_buf is called to move over to the next page in the pagelist of an NFSv4 request, it sets argp->end to essentially a random number, certainly not an address within the page which argp->p now points to. So subsequent calls to READ_BUF will think there is much more than a page of spare space (the cast to u32 ensures an unsigned comparison) so we can expect to fall off the end of the second page. We never encountered thsi in testing because typically the only operations which use more than two pages are write-like operations, which have their own decoding logic. Something like a getattr after a write may cross a page boundary, but it would be very unusual for it to cross another boundary after that. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	procfs: fix tid fdinfo	Jerome Marchand
	commit 3835541dd481091c4dbf5ef83c08aed12e50fd61 upstream. Correct the file_operations struct in fdinfo entry of tid_base_stuff[]. Presently /proc//task//fdinfo contains symlinks to opened files like /proc/*/fd/. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	reiserfs: fix corruption during shrinking of xattrs	Jeff Mahoney
	commit fb2162df74bb19552db3d988fd11c787cf5fad56 upstream. Commit 48b32a3553a54740d236b79a90f20147a25875e3 ("reiserfs: use generic xattr handlers") introduced a problem that causes corruption when extended attributes are replaced with a smaller value. The issue is that the reiserfs_setattr to shrink the xattr file was moved from before the write to after the write. The root issue has always been in the reiserfs xattr code, but was papered over by the fact that in the shrink case, the file would just be expanded again while the xattr was written. The end result is that the last 8 bytes of xattr data are lost. This patch fixes it to use new_size. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jethro Beekman <kernel@jbeekman.nl> Cc: Greg Surbey <gregsurbey@hotmail.com> Cc: Marco Gatti <marco.gatti@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12	reiserfs: fix permissions on .reiserfs_priv	Jeff Mahoney
	commit cac36f707119b792b2396aed371d6b5cdc194890 upstream. Commit 677c9b2e393a0cd203bd54e9c18b012b2c73305a ("reiserfs: remove privroot hiding in lookup") removed the magic from the lookup code to hide the .reiserfs_priv directory since it was getting loaded at mount-time instead. The intent was that the entry would be hidden from the user via a poisoned d_compare, but this was faulty. This introduced a security issue where unprivileged users could access and modify extended attributes or ACLs belonging to other users, including root. This patch resolves the issue by properly hiding .reiserfs_priv. This was the intent of the xattr poisoning code, but it appears to have never worked as expected. This is fixed by using d_revalidate instead of d_compare. This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way. The effort involved in working out the corner cases wrt permissions and caching outweigh the benefit of the feature. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Reported-by: Matt McCutchen <matt@mattmccutchen.net> Tested-by: Matt McCutchen <matt@mattmccutchen.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-11	autofs4: Remove another autofs4_lock deadlock	John Stultz
	Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-10	autofs: Remove deadlock	john stultz
	Apparently the conversion from using the dcache_lock -> autofs4_lock forgot that this function already grabs the autofs_lock for a small moment, so we end up grabbing the lock, then a moment later grab it again. Splat. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> LKML-Reference: <1273279153.2776.7.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-02	fs: Add missing parantheses	Olaf Hering
	Fix for this compile warning: fs/namespace.c:757: warning: suggest parentheses around operand \ of '!' or change '&' to '&&' or '!' to '~' Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-02	fs: Prevent dput race	Thomas Gleixner
	dput() drops dentry->d_lock when it fails to lock inode->i_lock or parent->d_lock. dentry->d_count is 0 at this point so dentry kann be killed and freed by someone else. This leaves dput with a stale pointer in the retry code which results in interesting kernel crashes. Prevent this by incrementing dentry->d_count before dropping the lock. Go back to start after dropping the lock so d_count is decremented again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-30	fs: Use s_inodes not s_files for inode lists	Thomas Gleixner
	The VFS scalability rework broke UP due to a stupid typo which enqueued inodes on the file list. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-29	fs: Fix namespace related hangs	John Stultz
	Nick converted the dentry->d_mounted counter to a flag, however with namespaces, dentries can be mounted multiple times (and more importantly unmounted multiple times). If a namespace was created and then released, the unmount_tree would remove the DCACHE_MOUNTED flag and that would make d_mountpoint fail, causing the mounts to be lost. This patch coverts it back to a counter, and adds some extra WARN_ONs to make sure things are accounted properly. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Nick Piggin <npiggin@suse.de> LKML-Reference: <1272522942.1967.12.camel@work-vm> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-29	xfs: Make i_count access non-atomic	John Stultz
	i_count is not longer atomic. Fix up the leftover. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-28	fs: Fix d_count fallout	Thomas Gleixner
	d_count got converted to int and back to atomic_t. Two instances were missed in the backward conversion. Fix them up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-28	fs: namespace: Fix MNT_MOUNTED handling for cloned rootfs	John Stultz
	We don't call attach_mnt on a cloned rootfs so set the MNT_MOUNTED flag in copy_tree(). Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-28	fs: namespace: Make put_mnt_ns rt aware	Thomas Gleixner
	On RT the lock() inside the preempt disabled region of get_cpu_var() results in a might sleep warning. Restructure the code and check the atomic transition to 0 open coded to avoid vfsmount_write_lock() in the case when ns->count is > 1. If ns->count == 1 then do the atomic decrement under full locking of namespace_sem and vfsmount_write_lock(). In most cases the atomic_dec_and_test() will have dropped ns->count to 0 so we need the full locking anyway. Based on a patch from John Stultz Signed-off-by: Thomas Gleixner <tglx@linutronix.de>