Age | Commit message (Collapse) | Author |
|
commit a6bbce54efa9145dbcf3029c885549f7ebc40a3b upstream.
The recent refactoring of the bulkstat code left a small landmine in
the code. If a inobt read fails, then the tree walk is aborted and
returns without releasing the AGI buffer or freeing the cursor. This
can lead to a subsequent bulkstat call hanging trying to grab the
AGI buffer again.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6e5aafb27419f32575b27ef9d6a31e5d54661aca upstream.
If we hit any errors in btrfs_lookup_csums_range, we'll loop through all
the csums we allocate and free them. But the code was using list_entry
incorrectly, and ended up trying to free the on-stack list_head instead.
This bug came from commit 0678b6185
btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range()
Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Erik Berg <btrfs@slipsprogrammoer.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7e8631e8b9d4e9f698c09c7e7309c96249180ff9 upstream.
uninitialized msghdr. Broken in "ocfs2: don't open-code kernel_recvmsg()"
by me ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 474d2605d119479e5aa050f738632e63589d4bb5 upstream.
Due to a switched left and right side of an assignment,
dquot_writeback_dquots() never returned error. This could result in
errors during quota writeback to not be reported to userspace properly.
Fix it.
Coverity-id: 1226884
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7938db449bbc55bbeb164bec7af406212e7e98f1 upstream.
The check whether quota format is set even though there are no
quota files with journalled quota is pointless and it actually
makes it impossible to turn off journalled quotas (as there's
no way to unset journalled quota format). Just remove the check.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 51904b08072a8bf2b9ed74d1bd7a5300a614471d upstream.
Unknown operation numbers are caught in nfsd4_decode_compound() which
sets op->opnum to OP_ILLEGAL and op->status to nfserr_op_illegal. The
error causes the main loop in nfsd4_proc_compound() to skip most
processing. But nfsd4_proc_compound also peeks ahead at the next
operation in one case and doesn't take similar precautions there.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d1d84c9626bb3a519863b3ffc40d347166f9fb83 upstream.
We added this new estimator function but forgot to hook it up. The
effect is that NFSv4.1 (and greater) won't do zero-copy reads.
The estimate was also wrong by 8 bytes.
Fixes: ccae70a9ee41 "nfsd4: estimate sequence response size"
Reported-by: Chuck Lever <chucklever@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 51486b900ee92856b977eacfc5bfbe6565028070 upstream.
d_splice_alias() callers expect it to either stash the inode reference
into a new alias, or drop the inode reference. That makes it possible
to just return d_splice_alias() result from ->lookup() instance, without
any extra housekeeping required.
Unfortunately, that should include the failure exits. If d_splice_alias()
returns an error, it leaves the dentry it has been given negative and
thus it *must* drop the inode reference. Easily fixed, but it goes way
back and will need backporting.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a41537e69b4aa43f0fea02498c2595a81267383b upstream.
O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked
twice inside ext4_file_write_iter() and __generic_file_write() which
result in BUG_ON inside ext4_direct_IO.
Let's initialize iocb->private unconditionally.
TESTCASE: xfstest:generic/036 https://patchwork.ozlabs.org/patch/402445/
#TYPICAL STACK TRACE:
kernel BUG at fs/ext4/inode.c:2960!
invalid opcode: 0000 [#1] SMP
Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000
RIP: 0010:[<ffffffff811fabf2>] [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
RSP: 0018:ffff88080f90bb58 EFLAGS: 00010246
RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818
RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001
RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80
R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28
FS: 00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0
Stack:
ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30
0000000000000200 0000000000000200 0000000000000001 0000000000000200
ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200
Call Trace:
[<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180
[<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370
[<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400
[<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
[<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
[<ffffffff810990e5>] ? local_clock+0x25/0x30
[<ffffffff810abd94>] ? __lock_acquire+0x274/0x700
[<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0
[<ffffffff811bd756>] aio_run_iocb+0x286/0x410
[<ffffffff810990e5>] ? local_clock+0x25/0x30
[<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
[<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0
[<ffffffff811bde3b>] do_io_submit+0x55b/0x740
[<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740
[<ffffffff811be030>] SyS_io_submit+0x10/0x20
[<ffffffff815ce192>] system_call_fastpath+0x16/0x1b
Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c
24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89
RIP [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
RSP <ffff88080f90bb58>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 98c1a7593fa355fda7f5a5940c8bf5326ca964ba upstream.
If metadata checksumming is turned on for the FS, we need to tell the
journal to use checksumming too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9378c6768e4fca48971e7b6a9075bc006eda981d upstream.
When there are no meta block groups update_backups() will compute the
backup block in 32-bit arithmetics thus possibly overflowing the block
number and corrupting the filesystem. OTOH filesystems without meta
block groups larger than 16 TB should be rare. Fix the problem by doing
the counting in 64-bit arithmetics.
Coverity-id: 741252
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 599a9b77ab289d85c2d5c8607624efbe1f552b0f upstream.
When we fail to load block bitmap in __ext4_new_inode() we will
dereference NULL pointer in ext4_journal_get_write_access(). So check
for error from ext4_read_block_bitmap().
Coverity-id: 989065
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 813d32f91333e4c33d5a19b67167c4bae42dae75 upstream.
Convert the ext4_has_group_desc_csum predicate to look for a checksum
driver instead of the metadata_csum flag and change the bg checksum
calculation function to look for GDT_CSUM before taking the crc16
path.
Without this patch, if we mount with ^uninit_bg,^metadata_csum and
later metadata_csum gets turned on by accident, the block group
checksum functions will incorrectly assume that checksumming is
enabled (metadata_csum) but that crc16 should be used
(!s_chksum_driver). This is totally wrong, so fix the predicate
and the checksum formula selection.
(Granted, if the metadata_csum feature bit gets enabled on a live FS
then something underhanded is going on, but we could at least avoid
writing garbage into the on-disk fields.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit aef4885ae14f1df75b58395c5314d71f613d26d9 upstream.
Error report likely result in IO so it is bad idea to do it from
atomic context.
This patch should fix following issue:
BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349
in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1
5 locks held by kworker/u128:1/137:
#0: ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
#1: ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
#2: (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0
#3: (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430
#4: (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630
CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 #165
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
Workqueue: writeback bdi_writeback_workfn (flush-1:0)
0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288
ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38
ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000
Call Trace:
[<ffffffff815c7fdc>] dump_stack+0x51/0x6d
[<ffffffff8108fb30>] __might_sleep+0xf0/0x100
[<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0
[<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20
[<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230
[<ffffffff8120fa03>] save_error_info+0x23/0x30
[<ffffffff8120fd06>] __ext4_error+0xb6/0xd0
[<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190
[<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630
[<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0
[<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60
[<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80
[<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280
[<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190
[<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410
[<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380
[<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0
[<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180
[<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0
[<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0
[<ffffffff81221797>] ? ext4_find_extent+0x97/0x320
[<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050
[<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430
[<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430
[<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50
[<ffffffff811feb44>] ext4_writepages+0x564/0xaf0
[<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0
[<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
[<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
[<ffffffff811377e3>] do_writepages+0x23/0x40
[<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280
[<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490
[<ffffffff811a0664>] wb_writeback+0x174/0x2d0
[<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
[<ffffffff811a0863>] wb_do_writeback+0xa3/0x200
[<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230
[<ffffffff81085618>] ? process_one_work+0x228/0x4d0
[<ffffffff810856cd>] process_one_work+0x2dd/0x4d0
[<ffffffff81085618>] ? process_one_work+0x228/0x4d0
[<ffffffff81085c1d>] worker_thread+0x35d/0x460
[<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
[<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
[<ffffffff8108a885>] kthread+0xf5/0x100
[<ffffffff810990e5>] ? local_clock+0x25/0x30
[<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70
[<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0
[<ffffffff8108a790>] ? __init_kthread_work
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9aa5d32ba269bec0e7eaba2697a986a7b0bc8528 upstream.
Besides the fact that this replacement improves code readability
it also protects from errors caused direct EXT4_S(sb)->s_es manipulation
which may result attempt to use uninitialized csum machinery.
#Testcase_BEGIN
IMG=/dev/ram0
MNT=/mnt
mkfs.ext4 $IMG
mount $IMG $MNT
#Enable feature directly on disk, on mounted fs
tune2fs -O metadata_csum $IMG
# Provoke metadata update, likey result in OOPS
touch $MNT/test
umount $MNT
#Testcase_END
# Replacement script
@@
expression E;
@@
- EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+ ext4_has_metadata_csum(E)
https://bugzilla.kernel.org/show_bug.cgi?id=82201
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0ff8947fc5f700172b37cbca811a38eb9cb81e08 upstream.
Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary. However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.
This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:
dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
conv=notrunc of=testfile
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
conv=notrunc of=testfile
leads to:
EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
Adjust the number of credits based on whether the flag is
already set, and whether the current write may extend past the
LARGE_FILE limit.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e2bfb088fac03c0f621886a04cffc7faa2b49b1d upstream.
The boot loader inode (inode #5) should never be visible in the
directory hierarchy, but it's possible if the file system is corrupted
that there will be a directory entry that points at inode #5. In
order to avoid accidentally trashing it, when such a directory inode
is opened, the inode will be marked as a bad inode, so that it's not
possible to modify (or read) the inode from userspace.
Unfortunately, when we unlink this (invalid/illegal) directory entry,
we will put the bad inode on the ophan list, and then when try to
unlink the directory, we don't actually remove the bad inode from the
orphan list before freeing in-memory inode structure. This means the
in-memory orphan list is corrupted, leading to a kernel oops.
In addition, avoid truncating a bad inode in ext4_destroy_inode(),
since truncating the boot loader inode is not a smart thing to do.
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f4bb2981024fc91b23b4d09a8817c415396dbabb upstream.
If there is a corrupted file system which has directory entries that
point at reserved, metadata inodes, prohibit them from being used by
treating them the same way we treat Boot Loader inodes --- that is,
mark them to be bad inodes. This prohibits them from being opened,
deleted, or modified via chmod, chown, utimes, etc.
In particular, this prevents a corrupted file system which has a
directory entry which points at the journal inode from being deleted
and its blocks released, after which point Much Hilarity Ensues.
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3e67cfad22230ebed85c56cbe413876f33fea82b upstream.
Otherwise this provokes complain like follows:
WARNING: CPU: 12 PID: 5795 at fs/ext4/ext4_jbd2.c:48 ext4_journal_check_start+0x4e/0xa0()
Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 5795 Comm: python Not tainted 3.17.0-rc2-00175-gae5344f #158
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
0000000000000030 ffff8808116cfd28 ffffffff815c7dfc 0000000000000030
0000000000000000 ffff8808116cfd68 ffffffff8106ce8c ffff8808116cfdc8
ffff880813b16000 ffff880806ad6ae8 ffffffff81202008 0000000000000000
Call Trace:
[<ffffffff815c7dfc>] dump_stack+0x51/0x6d
[<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81202008>] ? ext4_ioctl+0x9e8/0xeb0
[<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
[<ffffffff8122867e>] ext4_journal_check_start+0x4e/0xa0
[<ffffffff81228c10>] __ext4_journal_start_sb+0x90/0x110
[<ffffffff81202008>] ext4_ioctl+0x9e8/0xeb0
[<ffffffff8107b0bd>] ? ptrace_stop+0x24d/0x2f0
[<ffffffff81088530>] ? alloc_pid+0x480/0x480
[<ffffffff8107b1f2>] ? ptrace_do_notify+0x92/0xb0
[<ffffffff81186545>] do_vfs_ioctl+0x4e5/0x550
[<ffffffff815cdbcb>] ? _raw_spin_unlock_irq+0x2b/0x40
[<ffffffff81186603>] SyS_ioctl+0x53/0x80
[<ffffffff815ce2ce>] tracesys+0xd0/0xd5
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d6320cbfc92910a3e5f10c42d98c231c98db4f60 upstream.
Use truncate_isize_extended() when hole is being created in a file so that
->page_mkwrite() will get called for the partial tail page if it is
mmaped (see the first patch in the series for details).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 279bf6d390933d5353ab298fcc306c391a961469 upstream.
The check whether quota format is set even though there are no
quota files with journalled quota is pointless and it actually
makes it impossible to turn off journalled quotas (as there's
no way to unset journalled quota format). Just remove the check.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a0626e75954078cfacddb00a4545dde821170bc5 upstream.
When loading extended attributes, check each entry's value offset to
make sure it doesn't collide with the entries.
Without this check it is easy to crash the kernel by mounting a
malicious FS containing a file with an EA wherein e_value_offs = 0 and
e_value_size > 0 and then deleting the EA, which corrupts the name
list.
(See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 064d83892e9ba547f7d4eae22cbca066d95210ce upstream.
Free the buffer head if the journal descriptor block fails checksum
verification.
This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum
verify error in do_one_pass".
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d4bf205da618bbd0b038e404d646f14e76915718 upstream.
The pstore filesystem still creates duplicate filename/inode pairs for
some pstore types. Add the id to the filename to prevent that.
Before patch:
[/sys/fs/pstore] ls -li
total 0
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
1250 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi
After:
[/sys/fs/pstore] ls -li
total 0
1232 -r--r--r--. 1 root root 148 Sep 29 17:09 console-efi-141202499100000
1231 -r--r--r--. 1 root root 67 Sep 29 17:09 console-efi-141202499200000
1230 -r--r--r--. 1 root root 148 Sep 29 17:44 console-efi-141202705400000
1229 -r--r--r--. 1 root root 67 Sep 29 17:44 console-efi-141202705500000
1228 -r--r--r--. 1 root root 67 Sep 29 20:42 console-efi-141203772600000
1227 -r--r--r--. 1 root root 148 Sep 29 23:42 console-efi-141204854900000
1226 -r--r--r--. 1 root root 67 Sep 29 23:42 console-efi-141204855000000
1225 -r--r--r--. 1 root root 148 Sep 29 23:59 console-efi-141204954200000
1224 -r--r--r--. 1 root root 67 Sep 29 23:59 console-efi-141204954400000
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0d0826019e529f21c84687521d03f60cd241ca7d upstream.
Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.
In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another. Resulting in all kinds of ugliness including a leak of that
mounts involved in the leak of the mount loop.
Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.
[Added stable cc. Fixes CVE-2014-7970. --Andy]
Reported-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 99358a1ca53e8e6ce09423500191396f0e6584d2 upstream.
schedule_delayed_work() happening when the work is already pending is
a cheap no-op. Don't bother with ->wbuf_queued logics - it's both
broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris)
and pointless. It's cheaper to let schedule_delayed_work() handle that
case.
Reported-by: Jeff Harris <jefftharris@gmail.com>
Tested-by: Jeff Harris <jefftharris@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6d13f69444bd3d4888e43f7756449748f5a98bad upstream.
AFAICS, prepend_name() is broken on SMP alpha. Disclaimer: I don't have
SMP alpha boxen to reproduce it on. However, it really looks like the race
is real.
CPU1: d_path() on /mnt/ramfs/<255-character>/foo
CPU2: mv /mnt/ramfs/<255-character> /mnt/ramfs/<63-character>
CPU2 does d_alloc(), which allocates an external name, stores the name there
including terminating NUL, does smp_wmb() and stores its address in
dentry->d_name.name. It proceeds to d_add(dentry, NULL) and d_move()
old dentry over to that. ->d_name.name value ends up in that dentry.
In the meanwhile, CPU1 gets to prepend_name() for that dentry. It fetches
->d_name.name and ->d_name.len; the former ends up pointing to new name
(64-byte kmalloc'ed array), the latter - 255 (length of the old name).
Nothing to force the ordering there, and normally that would be OK, since we'd
run into the terminating NUL and stop. Except that it's alpha, and we'd need
a data dependency barrier to guarantee that we see that store of NUL
__d_alloc() has done. In a similar situation dentry_cmp() would survive; it
does explicit smp_read_barrier_depends() after fetching ->d_name.name.
prepend_name() doesn't and it risks walking past the end of kmalloc'ed object
and possibly oops due to taking a page fault in kernel mode.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90a8020278c1598fafd071736a0846b38510309c upstream.
->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.
However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
ftruncate(fd, 0);
pwrite(fd, buf, 1024, 0);
map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
map[0] = 'a'; ----> page_mkwrite() for index 0 is called
ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
mremap(map, 1024, 10000, 0);
map[4095] = 'a'; ----> no page_mkwrite() called
At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.
This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ba29e721eb2df6df8f33c1f248388bb037a47914 upstream.
Hu (hujianyang <hujianyang@huawei.com>) discovered an issue in the
'empty_log_bytes()' function, which calculates how many bytes are left in the
log:
"
If 'c->lhead_lnum + 1 == c->ltail_lnum' and 'c->lhead_offs == c->leb_size', 'h'
would equalent to 't' and 'empty_log_bytes()' would return 'c->log_bytes'
instead of 0.
"
At this point it is not clear what would be the consequences of this, and
whether this may lead to any problems, but this patch addresses the issue just
in case.
Tested-by: hujianyang <hujianyang@huawei.com>
Reported-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 052c28073ff26f771d44ef33952a41d18dadd255 upstream.
Hu (hujianyang@huawei.com) discovered a race condition which may lead to a
situation when UBIFS is unable to mount the file-system after an unclean
reboot. The problem is theoretical, though.
In UBIFS, we have the log, which basically a set of LEBs in a certain area. The
log has the tail and the head.
Every time user writes data to the file-system, the UBIFS journal grows, and
the log grows as well, because we append new reference nodes to the head of the
log. So the head moves forward all the time, while the log tail stays at the
same position.
At any time, the UBIFS master node points to the tail of the log. When we mount
the file-system, we scan the log, and we always start from its tail, because
this is where the master node points to. The only occasion when the tail of the
log changes is the commit operation.
The commit operation has 2 phases - "commit start" and "commit end". The former
is relatively short, and does not involve much I/O. During this phase we mostly
just build various in-memory lists of the things which have to be written to
the flash media during "commit end" phase.
During the commit start phase, what we do is we "clean" the log. Indeed, the
commit operation will index all the data in the journal, so the entire journal
"disappears", and therefore the data in the log become unneeded. So we just
move the head of the log to the next LEB, and write the CS node there. This LEB
will be the tail of the new log when the commit operation finishes.
When the "commit start" phase finishes, users may write more data to the
file-system, in parallel with the ongoing "commit end" operation. At this point
the log tail was not changed yet, it is the same as it had been before we
started the commit. The log head keeps moving forward, though.
The commit operation now needs to write the new master node, and the new master
node should point to the new log tail. After this the LEBs between the old log
tail and the new log tail can be unmapped and re-used again.
And here is the possible problem. We do 2 operations: (a) We first update the
log tail position in memory (see 'ubifs_log_end_commit()'). (b) And then we
write the master node (see the big lock of code in 'do_commit()').
But nothing prevents the log head from moving forward between (a) and (b), and
the log head may "wrap" now to the old log tail. And when the "wrap" happens,
the contends of the log tail gets erased. Now a power cut happens and we are in
trouble. We end up with the old master node pointing to the old tail, which was
erased. And replay fails because it expects the master node to point to the
correct log tail at all times.
This patch merges the abovementioned (a) and (b) operations by moving the master
node change code to the 'ubifs_log_end_commit()' function, so that it runs with
the log mutex locked, which will prevent the log from being changed benween
operations (a) and (b).
Reported-by: hujianyang <hujianyang@huawei.com>
Tested-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 69a91c237ab0ebe4e9fdeaf6d0090c85275594ec upstream.
The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:
Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.
The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.
O_TMPFILE, however, behaves differently:
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);
For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}
But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().
This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().
A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523
The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.
On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.
But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.
Signed-off-by: Eric Rannaud <e@nanocritical.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 475d0db742e3755c6b267f48577ff7cbb7dfda0d upstream.
total_objects could be 0 and is used as a denom.
While total_objects is a "long", total_objects == 0 unlikely happens for
3.12 and later kernels because 32-bit architectures would not be able to
hold (1 << 32) objects. However, total_objects == 0 may happen for kernels
between 3.1 and 3.11 because total_objects in prune_super() was an "int"
and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c2ca0fcd202863b14bd041a7fece2e789926c225 upstream.
This patch makes it possible to kill a process looping in
cont_expand_zero. A process may spend a lot of time in this function, so
it is desirable to be able to kill it.
It happened to me that I wanted to copy a piece data from the disk to a
file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
to the "seek" parameter, dd attempted to extend the file and became stuck
doing so - the only possibility was to reset the machine or wait many
hours until the filesystem runs out of space and cont_expand_zero fails.
We need this patch to be able to terminate the process.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 173b3afceebe76fa2205b2c8808682d5b541fe3c upstream.
If rpc.statd is restarted, upcalls to monitor hosts can fail with
ECONNREFUSED. In that case force a lookup of statd's new port and retry the
upcall.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3034a146820c26fe6da66a45f6340fe87fe0983a upstream.
Empty files and missing xattrs do not guarantee that a file was
just created. This patch passes FILE_CREATED flag to IMA to
reliably identify new files.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0b37e097a648aa71d4db1ad108001e95b69a2da4 upstream.
According to commit 80af258867648 ("fanotify: groups can specify their
f_flags for new fd"), file descriptors created as part of file access
notification events inherit flags from the event_f_flags argument passed
to syscall fanotify_init(2)[1].
Unfortunately O_CLOEXEC is currently silently ignored.
Indeed, event_f_flags are only given to dentry_open(), which only seems to
care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
open_check_o_direct() and O_LARGEFILE in generic_file_open().
It's a pity, since, according to some lookup on various search engines and
http://codesearch.debian.net/, there's already some userspace code which
use O_CLOEXEC:
- in systemd's readahead[2]:
fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
- in clsync[3]:
#define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
- in examples [4] from "Filesystem monitoring in the Linux
kernel" article[5] by Aleksander Morgado:
if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
Additionally, since commit 48149e9d3a7e ("fanotify: check file flags
passed in fanotify_init"). having O_CLOEXEC as part of fanotify_init()
second argument is expressly allowed.
So it seems expected to set close-on-exec flag on the file descriptors if
userspace is allowed to request it with O_CLOEXEC.
But Andrew Morton raised[6] the concern that enabling now close-on-exec
might break existing applications which ask for O_CLOEXEC but expect the
file descriptor to be inherited across exec().
In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
descriptor returned as part of file access notify can break applications
due to deadlock. So close-on-exec is needed for most applications.
More, applications asking for close-on-exec are likely expecting it to be
enabled, relying on O_CLOEXEC being effective. If not, it might weaken
their security, as noted by Jan Kara[8].
So this patch replaces call to macro get_unused_fd() by a call to function
get_unused_fd_flags() with event_f_flags value as argument. This way
O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
interpreted and close-on-exec get enabled when requested.
[1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
[7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
[8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz
Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Mihai Don\u021bu <mihai.dontu@gmail.com>
Cc: Pádraig Brady <P@draigBrady.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a8b1ee8bafc765ebf029d03c5479a69aebff9693 upstream.
caused a regression in xfs_inumbers, which in turn broke
xfsdump, causing incomplete dumps.
The loop in xfs_inumbers() needs to fill the user-supplied
buffers, and iterates via xfs_btree_increment, reading new
ags as needed.
But the first time through the loop, if xfs_btree_increment()
succeeds, we continue, which triggers the ++agno at the bottom
of the loop, and we skip to soon to the next ag - without
the proper setup under next_ag to read the next ag.
Fix this by removing the agno increment from the loop conditional,
and only increment agno if we have actually hit the code under
the next_ag: target.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0d085a529b427d97710e6a41f8a4f23e1757cd12 upstream.
XFS has been having trouble with stray delayed allocation extents
beyond EOF for a long time. Recent changes to the collapse range
code has triggered erroneous EBUSY errors on page invalidtion for
block size smaller than page size filesystems. These
have been caused by dirty buffers beyond EOF on a partial page which
do not get written to disk during a sync.
The issue is that write-ahead in xfs_cluster_write() finds such a
partial page and handles it by leaving the page dirty but pushing it
into a writeback state. This used to work just fine, as the
write_cache_pages() code would then find the dirty partial page in
the next mapping tree lookup as the dirty tag is still set.
Unfortunately, when we moved to a mark and sweep approach to
writeback to fix other writeback sync issues, we broken this. THe
act of marking the page as under writeback now clears the TOWRITE
tag in the radix tree, even though the page is still dirty. This
causes the TOWRITE tag to be cleared, and hence the next lookup on
the mapping tree does not find the dirty partial page and so doesn't
try to write it again.
This same writeback bug was found recently in ext4 and fixed in
commit 1c8349a ("ext4: fix data integrity sync in ordered mode")
without communication to the wider filesystem community. We can use
exactly the same fix here so the TOWRITE flag is not cleared on
partial page writes.
cc: stable@vger.kernel.org # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e
Root-cause-found-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6174c2eb8ecef271159bdcde460ce8af54d8f72f upstream.
Some UDF media have special inodes (like VAT or metadata partition
inodes) whose link_count is 0. Thus commit 4071b9136223 (udf: Properly
detect stale inodes) broke loading these inodes because udf_iget()
started returning -ESTALE for them. Since we still need to properly
detect stale inodes queried by NFS, create two variants of udf_iget() -
one which is used for looking up special inodes (which ignores
link_count == 0) and one which is used for other cases which return
ESTALE when link_count == 0.
Fixes: 4071b913622316970d0e1919f7d82b4403fec5f2
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 35425ea2492175fd39f6116481fe98b2b3ddd4ca upstream.
Christopher Head 2014-06-28 05:26:20 UTC described:
"I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
PGD d7840067 PUD b2c3c067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nvidia(PO)
CPU: 3 PID: 3566 Comm: bash Tainted: P O 3.12.21-gentoo-r1 #2
Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
RIP: 0010:[<ffffffff8110eb39>] [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP: 0018:ffff8800bad71c10 EFLAGS: 00010246
RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
FS: 00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
Stack:
ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
Call Trace:
[<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
[<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
[<ffffffff81082c2c>] ? should_resched+0x5/0x23
[<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
[<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
[<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
[<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
[<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
[<ffffffff810fa2a3>] ? path_openat+0x224/0x472
[<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
[<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
[<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
[<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
RIP [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP <ffff8800bad71c10>
CR2: 0000000000000000
---[ end trace df9dba5f1ddb8565 ]---"
If we create a file when we mount with ecryptfs_xattr_metadata option, we will
encounter a crash in this path:
->ecryptfs_create
->ecryptfs_initialize_file
->ecryptfs_write_metadata
->ecryptfs_write_metadata_to_xattr
->ecryptfs_setxattr
->fsstack_copy_attr_all
It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
will be initialized when ecryptfs_initialize_file finish.
So we should skip copying attr from lower inode when the value of ->d_inode is
invalid.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6543f803670530f6aa93790d9fa116d8395a537d upstream.
You cannot call pnfs_put_lseg_async() more than once per lseg, so it
is really an inappropriate way to deal with a refcount issue.
Instead, replace it with a function that decrements the refcount, and
puts the final 'free' operation (which is incompatible with locks) on
the workqueue.
Cc: Weston Andros Adamson <dros@primarydata.com>
Fixes: e6cf82d1830f: pnfs: add pnfs_put_lseg_async
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b8fb9c30f25e45dab5d2cd310ab6913b6861d00f upstream.
It is OK for pageused == pagecount in the loop, as long as we don't add
another entry to the *pages array. Move the test so that it only triggers
in that case.
Reported-by: Steve Dickson <SteveD@redhat.com>
Fixes: bba5c1887a92 (nfs: disallow duplicate pages in pgio page vectors)
Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3caa0c6ed754d91b15266abf222498edbef982bd upstream.
SteveD reports the following Oops:
RIP: 0010:[<ffffffffa053461d>] [<ffffffffa053461d>] __put_nfs_open_context+0x1d/0x100 [nfs]
RSP: 0018:ffff880fed687b90 EFLAGS: 00010286
RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880fed687bc0 R08: 0000000000000092 R09: 000000000000047a
R10: 0000000000000000 R11: ffff880fed6878d6 R12: ffff880fed687d20
R13: ffff880fed687d20 R14: 0000000000000070 R15: ffffea000aa33ec0
FS: 00007fce290f0740(0000) GS:ffff8807ffc60000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000070 CR3: 00000007f2e79000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
0000000000000000 ffff880036c5e510 ffff880fed687d20 ffff880fed687d20
ffff880036c5e200 ffffea000aa33ec0 ffff880fed687bd0 ffffffffa0534710
ffff880fed687be8 ffffffffa053d5f0 ffff880036c5e200 ffff880fed687c08
Call Trace:
[<ffffffffa0534710>] put_nfs_open_context+0x10/0x20 [nfs]
[<ffffffffa053d5f0>] nfs_pgio_data_destroy+0x20/0x40 [nfs]
[<ffffffffa053d672>] nfs_pgio_error+0x22/0x40 [nfs]
[<ffffffffa053d8f4>] nfs_generic_pgio+0x74/0x2e0 [nfs]
[<ffffffffa06b18c3>] pnfs_generic_pg_writepages+0x63/0x210 [nfsv4]
[<ffffffffa053d579>] nfs_pageio_doio+0x19/0x50 [nfs]
[<ffffffffa053eb84>] nfs_pageio_complete+0x24/0x30 [nfs]
[<ffffffffa053cb25>] nfs_direct_write_schedule_iovec+0x115/0x1f0 [nfs]
[<ffffffffa053675f>] ? nfs_get_lock_context+0x4f/0x120 [nfs]
[<ffffffffa053d252>] nfs_file_direct_write+0x262/0x420 [nfs]
[<ffffffffa0532d91>] nfs_file_write+0x131/0x1d0 [nfs]
[<ffffffffa0532c60>] ? nfs_need_sync_write.isra.17+0x40/0x40 [nfs]
[<ffffffff812127b8>] do_io_submit+0x3b8/0x840
[<ffffffff81212c50>] SyS_io_submit+0x10/0x20
[<ffffffff81610f29>] system_call_fastpath+0x16/0x1b
This is due to the calls to nfs_pgio_error() in nfs_generic_pgio(), which
happen before the nfs_pgio_header's open context is referenced in
nfs_pgio_rpcsetup().
Reported-by: Steve Dickson <SteveD@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f7b43d0c992c3ec3e8d9285c3fb5e1e0eb0d031a upstream.
As of 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space", we permit the server to process a LOCK operation even if
there might not be space to return the conflicting lockowner, because
we've made returning the conflicting lockowner optional.
However, the rpc server still wants to know the most we might possibly
return, so we need to take into account the possible conflicting
lockowner in the svc_reserve_space() call here.
Symptoms were log messages like "RPC request reserved 88 but used 108".
Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space"
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d1f456b0b9545f1606a54cd17c20775f159bd2ce upstream.
Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
call, on the wire to renew the NFSv4.1 state if the flag was not set.
The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
(cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
sometimes does the following:
In normal operation, the only way a future state renewal call is put on the
wire is via a call to nfs4_schedule_state_renewal, which schedules a
nfs4_renew_state workqueue task. nfs4_renew_state determines if the
NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
Then the nfs41_proc_async_sequence rpc_release function schedules
another state remewal via nfs4_schedule_state_renewal.
Without this change we can get into a state where an application stops
accessing the NFSv4.1 share, state renewal calls stop due to the
NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
from this situation is with a clientid re-establishment, once the application
resumes and the server has timed out the lease and so returns
NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.
An example application:
open, lock, write a file.
sleep for 6 * lease (could be less)
ulock, close.
In the above example with NFSv4.1 delegations enabled, without this change,
there are no OP_SEQUENCE state renewal calls during the sleep, and the
clientid is recovered due to lease expiration on the close.
This issue does not occur with NFSv4.1 delegations disabled, nor with
NFSv4.0, with or without delegations enabled.
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit df817ba35736db2d62b07de6f050a4db53492ad8 upstream.
The current open/lock state recovery unfortunately does not handle errors
such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
just proceeds as if the state manager is finished recovering.
This patch ensures that we loop back, handle higher priority errors
and complete the open/lock state recovery.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a4339b7b686b4acc8b6de2b07d7bacbe3ae44b83 upstream.
If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
a second time, then the client will currently take this to mean that it must
declare all locks to be stale, and hence ineligible for reboot recovery.
RFC3530 and RFC5661 both suggest that the client should instead rely on the
server to respond to inelegible open share, lock and delegation reclaim
requests with NFS4ERR_NO_GRACE in this situation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2f3169fb18f4643ac9a6a097a6a6c71f0b2cef75 upstream.
Commit 65b38851a174
("NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes")
updated the following function:
static int nfs_volume_list_open(struct inode *inode, struct file *file)
it used &nfs_server_list_ops instead of &nfs_volume_list_ops
which means cat /proc/fs/nfsfs/volumes = /proc/fs/nfsfs/servers
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Fixes: 65b38851a174 (NFS: Fix /proc/fs/nfsfs/servers and...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8faaa6d5d48b201527e0451296d9e71d23afb362 upstream.
Commit c9fdeb28 removed a 'continue' after checking if the lease needs
to be renewed. However, if client hasn't moved, the code falls down to
starting reboot recovery erroneously (ie., sends open reclaim and gets
back stale_clientid error) before recovering from getting stale_clientid
on the renew operation.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: c9fdeb280b8c (NFS: Add basic migration support to state manager thread)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a1480dcc3c706e309a88884723446f2e84fedd5b upstream.
Accessing do_remount_sb should require global CAP_SYS_ADMIN, but
only one of the two call sites was appropriately protected.
Fixes CVE-2014-7975.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|