summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-08-08proc: restrict access to /proc/PID/ioVasiliy Kulikov
commit 1d1221f375c94ef961ba8574ac4f85c8870ddd51 upstream. /proc/PID/io may be used for gathering private information. E.g. for openssh and vsftpd daemons wchars/rchars may be used to learn the precise password length. Restrict it to processes being able to ptrace the target process. ptrace_may_access() is needed to prevent keeping open file descriptor of "io" file, executing setuid binary and gathering io information of the setuid'ed process. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08cifs: check for NULL session passwordJeff Layton
commit 24e6cf92fde1f140d8eb0bf7cd24c2c78149b6b2 upstream. It's possible for a cifsSesInfo struct to have a NULL password, so we need to check for that prior to running strncmp on it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08cifs: fix NULL pointer dereference in cifs_find_smb_sesJeff Layton
commit fc87a40677bbe0937e2ff0642c7e83c9a4813f3d upstream. cifs_find_smb_ses assumes that the vol->password field is a valid pointer, but that's only the case if a password was passed in via the options string. It's possible that one won't be if there is no mount helper on the box. Reported-by: diabel <gacek-2004@wp.pl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08cifs: clean up cifs_find_smb_ses (try #2)Jeff Layton
commit 4ff67b720c02c36e54d55b88c2931879b7db1cd2 upstream. This patch replaces the earlier patch by the same name. The only difference is that MAX_PASSWORD_SIZE has been increased to attempt to match the limits that windows enforces. Do a better job of matching sessions by authtype. Matching by username for a Kerberos session is incorrect, and anonymous sessions need special handling. Also, in the case where we do match by username, we also need to match by password. That ensures that someone else doesn't "borrow" an existing session without needing to know the password. Finally, passwords can be longer than 16 bytes. Bump MAX_PASSWORD_SIZE to 512 to match the size that the userspace mount helper allows. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> [dannf: backported to Debian's 2.6.32] Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08Revert "block: rescan partitions on invalidated devices on -ENOMEDIA too"Greg Kroah-Hartman
This reverts commit 5b2745db12a3f97a9ec9efd4ffa077da707d3e4c (commit 02e352287a40bd456eb78df705bf888bc3161d3f upstream) This should have only been commited on .38 and newer, not older kernels like this one, sorry. Cc: Tejun Heo <tj@kernel.org> Cc: David Zeuthen <zeuthen@gmail.com> Cc: Martin Pitt <martin.pitt@ubuntu.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08ext3: Fix oops in ext3_try_to_allocate_with_rsv()Jan Kara
commit ad95c5e9bc8b5885f94dce720137cac8fa8da4c9 upstream. Block allocation is called from two places: ext3_get_blocks_handle() and ext3_xattr_block_set(). These two callers are not necessarily synchronized because xattr code holds only xattr_sem and i_mutex, and ext3_get_blocks_handle() may hold only truncate_mutex when called from writepage() path. Block reservation code does not expect two concurrent allocations to happen to the same inode and thus assertions can be triggered or reservation structure corruption can occur. Fix the problem by taking truncate_mutex in xattr code to serialize allocations. CC: Sage Weil <sage@newdream.net> Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08NFSv4.1: update nfs4_fattr_bitmap_maxszAndy Adamson
commit e5012d1f3861d18c7f3814e757c1c3ab3741dbcd upstream. Attribute IDs assigned in RFC 5661 now require three bitmaps. Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-13mm: prevent concurrent unmap_mapping_range() on the same inodeMiklos Szeredi
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream. Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23exec: delay address limit change until point of no returnMathias Krause
commit dac853ae89043f1b7752875300faf614de43c74b upstream. Unconditionally changing the address limit to USER_DS and not restoring it to its old value in the error path is wrong because it prevents us using kernel memory on repeated calls to this function. This, in fact, breaks the fallback of hard coded paths to the init program from being ever successful if the first candidate fails to load. With this patch applied switching to USER_DS is delayed until the point of no return is reached which makes it possible to have a multi-arch rootfs with one arch specific init binary for each of the (hard coded) probed paths. Since the address limit is already set to USER_DS when start_thread() will be invoked, this redundancy can be safely removed. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23xfs: properly account for reclaimed inodesJohannes Weiner
commit 081003fff467ea0e727f66d5d435b4f473a789b3 upstream. When marking an inode reclaimable, a per-AG counter is increased, the inode is tagged reclaimable in its per-AG tree, and, when this is the first reclaimable inode in the AG, the AG entry in the per-mount tree is also tagged. When an inode is finally reclaimed, however, it is only deleted from the per-AG tree. Neither the counter is decreased, nor is the parent tree's AG entry untagged properly. Since the tags in the per-mount tree are not cleared, the inode shrinker iterates over all AGs that have had reclaimable inodes at one point in time. The counters on the other hand signal an increasing amount of slab objects to reclaim. Since "70e60ce xfs: convert inode shrinker to per-filesystem context" this is not a real issue anymore because the shrinker bails out after one iteration. But the problem was observable on a machine running v2.6.34, where the reclaimable work increased and each process going into direct reclaim eventually got stuck on the xfs inode shrinking path, trying to scan several million objects. Fix this by properly unwinding the reclaimable-state tracking of an inode when it is reclaimed. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> Backported-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23oprofile, dcookies: Fix possible circular locking dependencyRobert Richter
commit fe47ae7f53e179d2ef6771024feb000cbb86640f upstream. The lockdep warning below detects a possible A->B/B->A locking dependency of mm->mmap_sem and dcookie_mutex. The order in sync_buffer() is mm->mmap_sem/dcookie_mutex, while in sys_lookup_dcookie() it is vice versa. Fixing it in sys_lookup_dcookie() by unlocking dcookie_mutex before copy_to_user(). oprofiled/4432 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff810b444b>] might_fault+0x53/0xa3 but task is already holding lock: (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (dcookie_mutex){+.+.+.}: [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309 [<ffffffff81124e5c>] get_dcookie+0x30/0x144 [<ffffffffa0000fba>] sync_buffer+0x196/0x3ec [oprofile] [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile] [<ffffffff81467b96>] notifier_call_chain+0x37/0x63 [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67 [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c [<ffffffff81039e8f>] do_exit+0x2a/0x6fc [<ffffffff8103a5e4>] do_group_exit+0x83/0xae [<ffffffff8103a626>] sys_exit_group+0x17/0x1b [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff810b4478>] might_fault+0x80/0xa3 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by oprofiled/4432: #0: (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149 stack backtrace: Pid: 4432, comm: oprofiled Not tainted 2.6.39-00008-ge5a450d #9 Call Trace: [<ffffffff81063193>] print_circular_bug+0xae/0xbc [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffff8102ef13>] ? get_parent_ip+0x11/0x42 [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff810d7d54>] ? path_put+0x22/0x27 [<ffffffff810b4478>] might_fault+0x80/0xa3 [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b References: https://bugzilla.kernel.org/show_bug.cgi?id=13809 Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23fat: Fix corrupt inode flags when remove ATTR_SYS flagOGAWA Hirofumi
commit 1adffbae22332bb558c2a29de19d9aca391869f6 upstream. We are clearly missing '~' in fat_ioctl_set_attributes(). Reported-by: Dmitry Dmitriev <dimondmm@yandex.ru> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23UBIFS: fix memory leak on error pathArtem Bityutskiy
commit 812eb258311f89bcd664a34a620f249d54a2cd83 upstream. UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write failure because it forgets to free the 'struct ubifs_dent_node *dent' object. Although the object is small, the alignment can make it large - e.g., 2KiB if the min. I/O unit is 2KiB. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23UBIFS: fix shrinker object count reportsArtem Bityutskiy
commit cf610bf4199770420629d3bc273494bd27ad6c1d upstream. Sometimes VM asks the shrinker to return amount of objects it can shrink, and we return the ubifs_clean_zn_cnt in that case. However, it is possible that this counter is negative for a short period of time, due to the way UBIFS TNC code updates it. And I can observe the following warnings sometimes: shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788 This patch makes sure UBIFS never returns negative count of objects. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23UBIFS: fix a rare memory leak in ro to rw remounting pathArtem Bityutskiy
commit eaeee242c531cd4b0a4a46e8b5dd7ef504380c42 upstream. When re-mounting from R/O mode to R/W mode and the LEB count in the superblock is not up-to date, because for the underlying UBI volume became larger, we re-write the superblock. We allocate RAM for these purposes, but never free it. So this is a memory leak, although very rare one. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23eCryptfs: Allow 2 scatterlist entries for encrypted filenamesTyler Hicks
commit 8d08dab786ad5cc2aca2bf870de370144b78c85a upstream. The buffers allocated while encrypting and decrypting long filenames can sometimes straddle two pages. In this situation, virt_to_scatterlist() will return -ENOMEM, causing the operation to fail and the user will get scary error messages in their logs: kernel: ecryptfs_write_tag_70_packet: Internal error whilst attempting to convert filename memory to scatterlist; expected rc = 1; got rc = [-12]. block_aligned_filename_size = [272] kernel: ecryptfs_encrypt_filename: Error attempting to generate tag 70 packet; rc = [-12] kernel: ecryptfs_encrypt_and_encode_filename: Error attempting to encrypt filename; rc = [-12] kernel: ecryptfs_lookup: Error attempting to encrypt and encode filename; rc = [-12] The solution is to allow up to 2 scatterlist entries to be used. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23Fix for buffer overflow in ldm_frag_add not sufficientTimo Warns
commit cae13fe4cc3f24820ffb990c09110626837e85d4 upstream. As Ben Hutchings discovered [1], the patch for CVE-2011-1017 (buffer overflow in ldm_frag_add) is not sufficient. The original patch in commit c340b1d64000 ("fs/partitions/ldm.c: fix oops caused by corrupted partition table") does not consider that, for subsequent fragments, previously allocated memory is used. [1] http://lkml.org/lkml/2011/5/6/407 Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23ext4: release page cache in ext4_mb_load_buddy error pathYang Ruirui
commit 26626f1172fb4f3f323239a6a5cf4e082643fa46 upstream. Add missing page_cache_release in the error path of ext4_mb_load_buddy Signed-off-by: Yang Ruirui <ruirui.r.yang@tieto.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23jbd: fix fsync() tid wraparound bugTed Ts'o
commit d9b01934d56a96d9f4ae2d6204d4ea78a36f5f36 upstream. If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test, since there's only one valid transaction id that is valid for us to start: namely, the currently running transaction (if it exists). Reported-by: Martin_Zielinski@McAfee.com Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23jbd: Fix forever sleeping process in do_get_write_access()Jan Kara
commit 2842bb20eed2e25cde5114298edc62c8883a1d9a upstream. In do_get_write_access() we wait on BH_Unshadow bit for buffer to get from shadow state. The waking code in journal_commit_transaction() has a bug because it does not issue a memory barrier after the buffer is moved from the shadow state and before wake_up_bit() is called. Thus a waitqueue check can happen before the buffer is actually moved from the shadow state and waiting process may never be woken. Fix the problem by issuing proper barrier. Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23ext3: Fix fs corruption when make_indexed_dir() failsJan Kara
commit 86c4f6d85595cd7da635dc6985d27bfa43b1ae10 upstream. When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated block for index tree root, we did not properly mark all changed buffers dirty. This lead to only some of these buffers being written out and thus effectively corrupting the directory. Fix the issue by marking all changed data dirty even in the error failure case. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23block: rescan partitions on invalidated devices on -ENOMEDIA tooTejun Heo
commit 02e352287a40bd456eb78df705bf888bc3161d3f upstream. __blkdev_get() doesn't rescan partitions if disk->fops->open() fails, which leads to ghost partition devices lingering after medimum removal is known to both the kernel and userland. The behavior also creates a subtle inconsistency where O_NONBLOCK open, which doesn't fail even if there's no medium, clears the ghots partitions, which is exploited to work around the problem from userland. Fix it by updating __blkdev_get() to issue partition rescan after -ENOMEDIA too. This was reported in the following bz. https://bugzilla.kernel.org/show_bug.cgi?id=13029 Stable: 2.6.38 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Zeuthen <zeuthen@gmail.com> Reported-by: Martin Pitt <martin.pitt@ubuntu.com> Reported-by: Kay Sievers <kay.sievers@vrfy.org> Tested-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23cifs: add fallback in is_path_accessible for old serversJeff Layton
commit 221d1d797202984cb874e3ed9f1388593d34ee22 upstream. The is_path_accessible check uses a QPathInfo call, which isn't supported by ancient win9x era servers. Fall back to an older SMBQueryInfo call if it fails with the magic error codes. Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23CIFS: Fix memory over bound bug in cifs_parse_mount_optionsPavel Shilovsky
commit 4906e50b37e6f6c264e7ee4237343eb2b7f8d16d upstream. While password processing we can get out of options array bound if the next character after array is delimiter. The patch adds a check if we reach the end. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23Validate size of EFI GUID partition entries.Timo Warns
commit fa039d5f6b126fbd65eefa05db2f67e44df8f121 upstream. Otherwise corrupted EFI partition tables can cause total confusion. Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23cifs: check for bytes_remaining going to zero in CIFS_SessSetupJeff Layton
commit fcda7f4578bbf9717444ca6da8a421d21489d078 upstream. It's possible that when we go to decode the string area in the SESSION_SETUP response, that bytes_remaining will be 0. Decrementing it at that point will mean that it can go "negative" and wrap. Check for a bytes_remaining value of 0, and don't try to decode the string area if that's the case. Reported-and-Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09btrfs: Require CAP_SYS_ADMIN for filesystem rebalanceBen Hutchings
commit 6f88a4403def422bd8e276ddf6863d6ac71435d2 upstream. Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire filesystem and may run uninterruptibly for a long time. This does not seem to be something that an unprivileged user should be able to do. Reported-by: Aron Xu <happyaron.xu@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09cifs: fix another memleak, in cifs_root_igetOskar Schirmer
commit a7851ce73b9fdef53f251420e6883cf4f3766534 upstream. cifs_root_iget allocates full_path through cifs_build_path_to_root, but fails to kfree it upon cifs_get_inode_info* failure. Make all failure exit paths traverse clean up handling at the end of the function. Signed-off-by: Oskar Schirmer <oskar@scara.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09Increase OSF partition limit from 8 to 18Linus Torvalds
commit 34d211a2d5df4984a35b18d8ccacbe1d10abb067 upstream. It turns out that while a maximum of 8 partitions may be what people "should" have had, you can actually fit up to 18 entries(*) in a sector. And some people clearly were taking advantage of that, like Michael Cree, who had ten partitions on one of his OSF disks. (*) The OSF partition data starts at byte offset 64 in the first sector, and the array of 16-byte partition entries start at offset 148 in the on-disk partition structure. Reported-by: Michael Cree <mcree@orcon.net.nz> Cc: stable@kernel.org (v2.6.38) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09Fix corrupted OSF partition table parsingTimo Warns
commit 1eafbfeb7bdf59cfe173304c76188f3fd5f1fd05 upstream. The kernel automatically evaluates partition tables of storage devices. The code for evaluating OSF partitions contains a bug that leaks data from kernel heap memory to userspace for certain corrupted OSF partitions. In more detail: for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) { iterates from 0 to d_npartitions - 1, where d_npartitions is read from the partition table without validation and partition is a pointer to an array of at most 8 d_partitions. Add the proper and obvious validation. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org [ Changed the patch trivially to not repeat the whole le16_to_cpu() thing, and to use an explicit constant for the magic value '8' ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09nfs: fix compilation warningJovi Zhang
commit 43b7c3f051dea504afccc39bcb56d8e26c2e0b77 upstream. this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)Neil Horman
commit e9e3d724e2145f5039b423c290ce2b2c3d8f94bc upstream. The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_Slab set (indicating it was allocated from the Slab allocator (which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to fill convert the passed in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_Slab set or 2) not use a page list to send this data Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked. We do a put page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DOS triggerable by an uprivlidged user, so I'm CCing security on this as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: security@kernel.org CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09GFS2: BUG in gfs2_adjust_quotaAbhijith Das
commit 8b4216018bdbfbb1b76150d202b15ee68c38e991 upstream. HighMem pages on i686 do not get mapped to the buffer_heads and this was causing a NULL pointer dereference when we were trying to memset page buffers to zero. We now use zero_user() that kmaps the page and directly manipulates page data. This patch also fixes a boundary condition that was incorrect. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> [Adjusted to apply to 2.6.32 by dann frazier <dannf@debian.org>] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09GFS2: Fix writing to non-page aligned gfs2_quota structuresAbhijith Das
commit 7e619bc3e6252dc746f64ac3b486e784822e9533 upstream. This is the upstream fix for this bug. This patch differs from the RHEL5 fix (Red Hat bz #555754) which simply writes to the 8-byte value field of the quota. In upstream quota code, we're required to write the entire quota (88 bytes) which can be split across a page boundary. We check for such quotas, and read/write the two parts from/to the corresponding pages holding these parts. With this patch, I don't see the bug anymore using the reproducer in Red Hat bz 555754. I successfully ran a couple of simple tests/mounts/ umounts and it doesn't seem like this patch breaks anything else. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> [Backported to 2.6.32 by dann frazier <dannf@debian.org>] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09GFS2: Clean up gfs2_adjust_quota() and do_glock()Steven Whitehouse
commit 1e72c0f7c40e665d2ed40014750fdd2fa9968bcf upstream. Both of these functions contained confusing and in one case duplicate code. This patch adds a new check in do_glock() so that we report -ENOENT if we are asked to sync a quota entry which doesn't exist. Due to the previous patch this is now reported correctly to userspace. Also there are a few new comments, and I hope that the code is easier to understand now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09fs/partitions/ldm.c: fix oops caused by corrupted partition tableTimo Warns
commit c340b1d640001c8c9ecff74f68fd90422ae2448a upstream. The kernel automatically evaluates partition tables of storage devices. The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains a bug that causes a kernel oops on certain corrupted LDM partitions. A kernel subsystem seems to crash, because, after the oops, the kernel no longer recognizes newly connected storage devices. The patch validates the value of vblk_size. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Richard Russon <rich@flatcap.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09Open with O_CREAT flag set fails to open existing files on non writable ↵Sachin Prabhu
directories commit 1574dff8996ab1ed92c09012f8038b5566fce313 upstream. An open on a NFS4 share using the O_CREAT flag on an existing file for which we have permissions to open but contained in a directory with no write permissions will fail with EACCES. A tcpdump shows that the client had set the open mode to UNCHECKED which indicates that the file should be created if it doesn't exist and encountering an existing flag is not an error. Since in this case the file exists and can be opened by the user, the NFS server is wrong in attempting to check create permissions on the parent directory. The patch adds a conditional statement to check for create permissions only if the file doesn't exist. Signed-off-by: Sachin S. Prabhu <sprabhu@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09NFSv4.1: Ensure state manager thread dies on last umountTrond Myklebust
commit 47c2199b6eb5fbe38ddb844db7cdbd914d304f9c upstream. Currently, the state manager may continue to try recovering state forever even after the last filesystem to reference that nfs_client has umounted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09nfs: don't lose MS_SYNCHRONOUS on remount of noac mountJeff Layton
commit 26c4c170731f00008f4317a2888a0a07ac99d90d upstream. On a remount, the VFS layer will clear the MS_SYNCHRONOUS bit on the assumption that the flags on the mount syscall will have it set if the remounted fs is supposed to keep it. In the case of "noac" though, MS_SYNCHRONOUS is implied. A remount of such a mount will lose the MS_SYNCHRONOUS flag since "sync" isn't part of the mount options. Reported-by: Max Matveev <makc@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09UBIFS: fix master node recoveryArtem Bityutskiy
commit 6e0d9fd38b750d678bf9fd07db23582f52fafa55 upstream. This patch fixes the following symptoms: 1. Unmount UBIFS cleanly. 2. Start mounting UBIFS R/W and have a power cut immediately 3. Start mounting UBIFS R/O, this succeeds 4. Try to re-mount UBIFS R/W - this fails immediately or later on, because UBIFS will write the master node to the flash area which has been written before. The analysis of the problem: 1. UBIFS is unmounted cleanly, both copies of the master node are clean. 2. UBIFS is being mounter R/W, starts changing master node copy 1, and a power cut happens. The copy N1 becomes corrupted. 3. UBIFS is being mounted R/O. It notices the copy N1 is corrupted and reads copy N2. Copy N2 is clean. 4. Because of R/O mode, UBIFS cannot recover copy 1. 5. The mount code (ubifs_mount()) sees that the master node is clean, so it decides that no recovery is needed. 6. We are re-mounting R/W. UBIFS believes no recovery is needed and starts updating the master node, but copy N1 is still corrupted and was not recovered! Fix this problem by marking the master node as dirty every time we recover it and we are in R/O mode. This forces further recovery and the UBIFS cleans-up the corruptions and recovers the copy N1 when re-mounting R/W later. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09NFS: nfs_wcc_update_inode() should set nfsi->attr_gencountTrond Myklebust
commit 27dc1cd3ad9300f81e1219e5fc305d91d85353f8 upstream. If the call to nfs_wcc_update_inode() results in an attribute update, we need to ensure that the inode's attr_gencount gets bumped too, otherwise we are not protected against races with other GETATTR calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22proc: do proper range check on readdir offsetLinus Torvalds
commit d8bdc59f215e62098bc5b4256fd9928bf27053a1 upstream. Rather than pass in some random truncated offset to the pid-related functions, check that the offset is in range up-front. This is just cleanup, the previous commit fixed the real problem. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22UBIFS: fix oops when R/O file-system is fsync'edArtem Bityutskiy
commit 78530bf7f2559b317c04991b52217c1608d5a58d upstream. This patch fixes severe UBIFS bug: UBIFS oopses when we 'fsync()' an file on R/O-mounter file-system. We (the UBIFS authors) incorrectly thought that VFS would not propagate 'fsync()' down to the file-system if it is read-only, but this is not the case. It is easy to exploit this bug using the following simple perl script: use strict; use File::Sync qw(fsync sync); die "File path is not specified" if not defined $ARGV[0]; my $path = $ARGV[0]; open FILE, "<", "$path" or die "Cannot open $path: $!"; fsync(\*FILE) or die "cannot fsync $path: $!"; close FILE or die "Cannot close $path: $!"; Thanks to Reuben Dowle <Reuben.Dowle@navico.com> for reporting about this issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reported-by: Reuben Dowle <Reuben.Dowle@navico.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22ramfs: fix memleak on no-mmu archBob Liu
commit b836aec53e2bce71de1d5415313380688c851477 upstream. On no-mmu arch, there is a memleak during shmem test. The cause of this memleak is ramfs_nommu_expand_for_mapping() added page refcount to 2 which makes iput() can't free that pages. The simple test file is like this: int main(void) { int i; key_t k = ftok("/etc", 42); for ( i=0; i<100; ++i) { int id = shmget(k, 10000, 0644|IPC_CREAT); if (id == -1) { printf("shmget error\n"); } if(shmctl(id, IPC_RMID, NULL ) == -1) { printf("shm rm error\n"); return -1; } } printf("run ok...\n"); return 0; } And the result: root:/> free total used free shared buffers Mem: 60320 17912 42408 0 0 -/+ buffers: 17912 42408 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 19096 41224 0 0 -/+ buffers: 19096 41224 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 20296 40024 0 0 -/+ buffers: 20296 40024 ... After this patch the test result is:(no memleak anymore) root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22UBIFS: restrict world-writable debugfs filesVasiliy Kulikov
commit 8c559d30b4e59cf6994215ada1fe744928f494bf upstream. Don't allow everybody to dump sensitive information about filesystems. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22cifs: always do is_path_accessible check in cifs_mountJeff Layton
commit 70945643722ffeac779d2529a348f99567fa5c33 upstream. Currently, we skip doing the is_path_accessible check in cifs_mount if there is no prefixpath. I have a report of at least one server however that allows a TREE_CONNECT to a share that has a DFS referral at its root. The reporter in this case was using a UNC that had no prefixpath, so the is_path_accessible check was not triggered and the box later hit a BUG() because we were chasing a DFS referral on the root dentry for the mount. This patch fixes this by removing the check for a zero-length prefixpath. That should make the is_path_accessible check be done in this situation and should allow the client to chase the DFS referral at mount time instead. Reported-and-Tested-by: Yogesh Sharma <ysharma@cymer.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14xfs: zero proper structure size for geometry callsAlex Elder
commit af24ee9ea8d532e16883251a6684dfa1be8eec29 upstream. Commit 493f3358cb289ccf716c5a14fa5bb52ab75943e5 added this call to xfs_fs_geometry() in order to avoid passing kernel stack data back to user space: + memset(geo, 0, sizeof(*geo)); Unfortunately, one of the callers of that function passes the address of a smaller data type, cast to fit the type that xfs_fs_geometry() requires. As a result, this can happen: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: f87aca93 Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1 Call Trace: [<c12991ac>] ? panic+0x50/0x150 [<c102ed71>] ? __stack_chk_fail+0x10/0x18 [<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs] Fix this by fixing that one caller to pass the right type and then copy out the subset it is interested in. Note: This patch is an alternative to one originally proposed by Eric Sandeen. Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14exec: copy-and-paste the fixes into compat_do_execve() pathsOleg Nesterov
commit 114279be2120a916e8a04feeb2ac976a10016f2f upstream. Note: this patch targets 2.6.37 and tries to be as simple as possible. That is why it adds more copy-and-paste horror into fs/compat.c and uglifies fs/exec.c, this will be cleanuped later. compat_copy_strings() plays with bprm->vma/mm directly and thus has two problems: it lacks the RLIMIT_STACK check and argv/envp memory is not visible to oom killer. Export acct_arg_size() and get_arg_page(), change compat_copy_strings() to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0) as do_execve() does. Add the fatal_signal_pending/cond_resched checks into compat_count() and compat_copy_strings(), this matches the code in fs/exec.c and certainly makes sense. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14exec: make argv/envp memory visible to oom-killerOleg Nesterov
commit 3c77f845722158206a7209c45ccddc264d19319c upstream. Brad Spengler published a local memory-allocation DoS that evades the OOM-killer (though not the virtual memory RLIMIT): http://www.grsecurity.net/~spender/64bit_dos.c execve()->copy_strings() can allocate a lot of memory, but this is not visible to oom-killer, nobody can see the nascent bprm->mm and take it into account. With this patch get_arg_page() increments current's MM_ANONPAGES counter every time we allocate the new page for argv/envp. When do_execve() succeds or fails, we change this counter back. Technically this is not 100% correct, we can't know if the new page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but I don't think this really matters and everything becomes correct once exec changes ->mm or fails. Reported-by: Brad Spengler <spender@grsecurity.net> Reviewed-and-discussed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14nfsd: fix auth_domain reference leak on nlm operationsJ. Bruce Fields
commit 954032d2527f2fce7355ba70709b5e143d6b686f upstream. This was noticed by users who performed more than 2^32 lock operations and hence made this counter overflow (eventually leading to use-after-free's). Setting rq_client to NULL here means that it won't later get auth_domain_put() when it should be. Appears to have been introduced in 2.5.42 by "[PATCH] kNFSd: Move auth domain lookup into svcauth" which moved most of the rq_client handling to common svcauth code, but left behind this one line. Cc: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>