lwn.git/fs, branch v3.4.80

hpfs: deadlock and race in directory lseek()

2014-02-13T19:51:18+00:00

commit 31abdab9c11bb1694ecd1476a7edbe8e964d94ac upstream. For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex in hpfs_dir_lseek() - there's a lot of methods that grab the former with the caller already holding the latter, so it must take i_mutex first. For another, locking the damn thing, carefully validating the offset, then dropping locks and assigning the offset is obviously racy. Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c won't modify the sucker on B-tree surgeries. Signed-off-by: Al Viro Cc: Mikulas Patocka Signed-off-by: Greg Kroah-Hartman

nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME

2014-02-13T19:51:13+00:00

commit 78b19bae0813bd6f921ca58490196abd101297bd upstream. Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP by nfs4_stat_to_errno. This allows the client to mount v4.1 servers that don't support SECINFO_NO_NAME by falling back to the "guess and check" method of nfs4_find_root_sec. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman

NFSv4: OPEN must handle the NFS4ERR_IO return code correctly

2014-02-13T19:51:12+00:00

commit c7848f69ec4a8c03732cde5c949bd2aa711a9f4b upstream. decode_op_hdr() cannot distinguish between an XDR decoding error and the perfectly valid errorcode NFS4ERR_IO. This is normally not a problem, but for the particular case of OPEN, we need to be able to increment the NFSv4 open sequence id when the server returns a valid response. Reported-by: J Bruce Fields Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.org Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman

ore: Fix wrong math in allocation of per device BIO

2014-02-13T19:51:11+00:00

commit aad560b7f63b495f48a7232fd086c5913a676e6f upstream. At IO preparation we calculate the max pages at each device and allocate a BIO per device of that size. The calculation was wrong on some unaligned corner cases offset/length combination and would make prepare return with -ENOMEM. This would be bad for pnfs-objects that would in that case IO through MDS. And fatal for exofs were it would fail writes with EIO. Fix it by doing the proper math, that will work in all cases. (I ran a test with all possible offset/length combinations this time round). Also when reading we do not need to allocate for the parity units since we jump over them. Also lower the max_io_length to take into account the parity pages so not to allocate BIOs bigger than PAGE_SIZE Signed-off-by: Boaz Harrosh Signed-off-by: Greg Kroah-Hartman

Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot()

2014-02-06T19:05:48+00:00

commit 90515e7f5d7d24cbb2a4038a3f1b5cfa2921aa17 upstream. We may return early in btrfs_drop_snapshot(), we shouldn't call btrfs_std_err() for this case, fix it. Signed-off-by: Wang Shilong Signed-off-by: Josef Bacik Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman

nilfs2: fix segctor bug that causes file system corruption

2014-01-29T13:10:42+00:00

commit 70f2fe3a26248724d8a5019681a869abdaf3e89a upstream. There is a bug in the function nilfs_segctor_collect, which results in active data being written to a segment, that is marked as clean. It is possible, that this segment is selected for a later segment construction, whereby the old data is overwritten. The problem shows itself with the following kernel log message: nilfs_sufile_do_cancel_free: segment 6533 must be clean Usually a few hours later the file system gets corrupted: NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0 NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660) The issue can be reproduced with a file system that is nearly full and with the cleaner running, while some IO intensive task is running. Although it is quite hard to reproduce. This is what happens: 1. The cleaner starts the segment construction 2. nilfs_segctor_collect is called 3. sc_stage is on NILFS_ST_SUFILE and segments are freed 4. sc_stage is on NILFS_ST_DAT current segment is full 5. nilfs_segctor_extend_segments is called, which allocates a new segment 6. The new segment is one of the segments freed in step 3 7. nilfs_sufile_cancel_freev is called and produces an error message 8. Loop around and the collection starts again 9. sc_stage is on NILFS_ST_SUFILE and segments are freed including the newly allocated segment, which will contain active data and can be allocated at a later time 10. A few hours later another segment construction allocates the segment and causes file system corruption This can be prevented by simply reordering the statements. If nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments the freed segments are marked as dirty and cannot be allocated any more. Signed-off-by: Andreas Rohner Reviewed-by: Ryusuke Konishi Tested-by: Andreas Rohner Signed-off-by: Ryusuke Konishi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

jbd2: don't BUG but return ENOSPC if a handle runs out of space

2014-01-08T17:42:12+00:00

commit f6c07cad081ba222d63623d913aafba5586c1d2c upstream. If a handle runs out of space, we currently stop the kernel with a BUG in jbd2_journal_dirty_metadata(). This makes it hard to figure out what might be going on. So return an error of ENOSPC, so we can let the file system layer figure out what is going on, to make it more likely we can get useful debugging information). This should make it easier to debug problems such as the one which was reported by: https://bugzilla.kernel.org/show_bug.cgi?id=44731 The only two callers of this function are ext4_handle_dirty_metadata() and ocfs2_journal_dirty(). The ocfs2 function will trigger a BUG_ON(), which means there will be no change in behavior. The ext4 function will call ext4_error_inode() which will print the useful debugging information and then handle the situation using ext4's error handling mechanisms (i.e., which might mean halting the kernel or remounting the file system read-only). Also, since both file systems already call WARN_ON(), drop the WARN_ON from jbd2_journal_dirty_metadata() to avoid two stack traces from being displayed. Signed-off-by: "Theodore Ts'o" Cc: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker Signed-off-by: Greg Kroah-Hartman

GFS2: Fix incorrect invalidation for DIO/buffered I/O

2014-01-08T17:42:12+00:00

commit dfd11184d894cd0a92397b25cac18831a1a6a5bc upstream. In patch 209806aba9d540dde3db0a5ce72307f85f33468f we allowed local deferred locks to be granted against a cached exclusive lock. That opened up a corner case which this patch now fixes. The solution to the problem is to check whether we have cached pages each time we do direct I/O and if so to unmap, flush and invalidate those pages. Since the glock state machine normally does that for us, mostly the code will be a no-op. Signed-off-by: Steven Whitehouse Signed-off-by: Greg Kroah-Hartman

GFS2: don't hold s_umount over blkdev_put

2014-01-08T17:42:12+00:00

commit dfe5b9ad83a63180f358b27d1018649a27b394a9 upstream. This is a GFS2 version of Tejun's patch: 4f331f01b9c43bf001d3ffee578a97a1e0633eac vfs: don't hold s_umount over close_bdev_exclusive() call In this case its blkdev_put itself that is the issue and this patch uses the same solution of dropping and retaking s_umount. Reported-by: Tejun Heo Reported-by: Al Viro Signed-off-by: Steven Whitehouse Signed-off-by: Greg Kroah-Hartman

ceph: Avoid data inconsistency due to d-cache aliasing in readpage()

2014-01-08T17:42:11+00:00

commit 56f91aad69444d650237295f68c195b74d888d95 upstream. If the length of data to be read in readpage() is exactly PAGE_CACHE_SIZE, the original code does not flush d-cache for data consistency after finishing reading. This patches fixes this. Signed-off-by: Li Wang Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman