Age | Commit message (Collapse) | Author |
|
We only need to check lock exclusive/shared types against open mode when
flock() is used on NFS, so move it into the flock-specific path instead of
checking it for all locks.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
flock64_to_posix_lock() is already doing this check
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It is not used outside the NFSv4 module.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
They are not used outside the NFSv4 module.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
encode_layoutreturn and encode_layoutcommit are now unused. Let's
remove them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The objlayout code has been in the tree, but it's been unmaintained and
no server product for it actually ever shipped.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The check in nfs4_ff_layout_prepare_ds() seems to be missing.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: a33e4b036d461 ("pNFS: return status from nfs4_pnfs_ds_connect")
Cc: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org # v4.11
|
|
If the server fails to return the attributes as part of an OPEN
reply, and then reboots, we can end up hanging. The reason is that
the client attempts to send a GETATTR in order to pick up the
missing OPEN call, but fails to release the slot first, causing
reboot recovery to deadlock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 2e80dbe7ac51a ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...")
Cc: stable@vger.kernel.org # v4.8+
|
|
Let's try to have it in a cacheline in nfs4_proc_pgio_rpc_prepare().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Since commit 00bfa30abe86 ("NFS: Create a common pgio_alloc and
pgio_release function"), nfs_pgarray_set() has only a single caller. Let's
open code it.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Prevent a deadlock that can occur if we wait on allocations
that try to write back our pages.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 00bfa30abe869 ("NFS: Create a common pgio_alloc and pgio_release...")
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Fixes: 0bcbf039f6b2b ("nfs: handle request add failure properly")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit a7d42ddb3099727f58366fa006f850a219cce6c8 ("nfs: add mirroring
support to pgio layer") moved pg_cleanup out of the path when there was
non-sequental I/O that needed to be flushed. The result is that for
layouts that have more than one layout segment per file, the pg_lseg is not
cleared, so we can end up hitting the WARN_ON_ONCE(req_start >= seg_end) in
pnfs_generic_pg_test since the pg_lseg will be pointing to that
previously-flushed layout segment.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When passed GFP flags that allow sleeping (such as
GFP_NOIO), mempool_alloc() will never return NULL, it will
wait until memory is available.
This means that we don't need to handle failure, but that we
do need to ensure one thread doesn't call mempool_alloc()
twice on the one pool without queuing or freeing the first
allocation. If multiple threads did this during times of
high memory pressure, the pool could be exhausted and a
deadlock could result.
pnfs_generic_alloc_ds_commits() attempts to allocate from
the nfs_commit_mempool while already holding an allocation
from that pool. This is not safe. So change
nfs_commitdata_alloc() to take a flag that indicates whether
failure is acceptable.
In pnfs_generic_alloc_ds_commits(), accept failure and
handle it as we currently do. Else where, do not accept
failure, and do not handle it.
Even when failure is acceptable, we want to succeed if
possible. That means both
- using an entry from the pool if there is one
- waiting for direct reclaim is there isn't.
We call mempool_alloc(GFP_NOWAIT) to achieve the first, then
kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the
second. Each of these can fail, but together they do the
best they can without blocking indefinitely.
The objects returned by kmem_cache_alloc() will still be freed
by mempool_free(). This is safe as mempool_alloc() uses
exactly the same function to allocate objects (since the mempool
was created with mempool_create_slab_pool()). The object returned
by mempool_alloc() and kmem_cache_alloc() are indistinguishable
so mempool_free() will handle both identically, either adding to the
pool or calling kmem_cache_free().
Also, don't test for failure when allocating from
nfs_wdata_mempool.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Returning errors directly even lets us remove the goto
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we cut out the dprintk()s, then we can return error codes directly
and cut out the goto.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This puts all the common code in a single place for the
walk_client_list() functions.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Once again, we can remove the function and compare integer values
directly.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we cut out the dprintk()s, then we don't even need this to be a
separate function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Just remove the function and have the caller use nfs_release_request()
instead.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We always call nfs_mark_client_ready() even if nfs_create_rpc_client()
returns an error, so we can rearrange nfs_init_client() to mark the
client ready from a single place.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Let's cut out the goto and return any errors immedately
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Additionally, this change lets us cut out the goto by returning errors
immediately.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Removing the dprintk() lets us simplify the function by returning status
codes directly, rather than using a goto.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Removing the dprintk() lets us return the status value directly, rather
than jumping to a label if an error occurs.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
In addition to removing the dprintk(), this patch also initializes "res"
to the default return value instead of doing this through an else
condition.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Removing the dprintk()s lets us simplify the function by removing the
else condition entirely and returning the status of
initiate_{file,bulk}_draining() directly.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Flexfilelayout supports data servers which talk NFS v3 and v4.{0,1,2}.
However, this code path is disabled and v3 only servers are accepted.
This change removes this limitation.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
NFS has some optimizations for readdir to choose between using READDIR or
READDIRPLUS based on workload, and which NFS operation to use is determined
by subsequent interactions with lookup, d_revalidate, and getattr.
Concurrent use of nfs_readdir() via ->iterate_shared() can cause those
optimizations to repeatedly invalidate the pagecache used to store
directory entries during readdir(), which causes some very bad performance
for directories with many entries (more than about 10000).
There's a couple ways to fix this in NFS, but no fix would be as simple as
going back to ->iterate() to serialize nfs_readdir(), and neither fix I
tested performed as well as going back to ->iterate().
The first required taking the directory's i_lock for each entry, with the
result of terrible contention.
The second way adds another flag to the nfs_inode, and so keeps the
optimizations working for large directories. The difference from using
->iterate() here is that much more memory is consumed for a given workload
without any performance gain.
The workings of nfs_readdir() are such that concurrent users are serialized
within read_cache_page() waiting to retrieve pages of entries from the
server. By serializing this work in iterate_dir() instead, contention for
cache pages is reduced. Waiting processes can have an uncontended pass at
the entirety of the directory's pagecache once previous processes have
completed filling it.
v2 - Keep the bits needed for parallel lookup
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Otherwise lockdep says:
[ 1337.483798] ================================================
[ 1337.483999] [ BUG: lock held when returning to user space! ]
[ 1337.484252] 4.11.0-rc6 #19 Not tainted
[ 1337.484423] ------------------------------------------------
[ 1337.484626] mount/14766 is leaving the kernel with locks still held!
[ 1337.484841] 1 lock held by mount/14766:
[ 1337.485017] #0: (&type->s_umount_key#33/1){+.+.+.}, at: [<ffffffff8124171f>] sget_userns+0x2af/0x520
Caught by xfstests generic/413 which tried to mount with the unsupported
mount option dax. Then xfstests generic/422 ran sync which deadlocks.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Normal pathname lookup doesn't allow empty pathnames, but using
AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you
can trigger an empty pathname lookup.
And not only is the RCU lookup in that case entirely unnecessary
(because we'll obviously immediately finalize the end result), it is
actively wrong.
Why? An empth path is a special case that will return the original
'dirfd' dentry - and that dentry may not actually be RCU-free'd,
resulting in a potential use-after-free if we were to initialize the
path lazily under the RCU read lock and depend on complete_walk()
finalizing the dentry.
Found by syzkaller and KASAN.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Dave Sterba collected a few more fixes for the last rc.
These aren't marked for stable, but I'm putting them in with a batch
were testing/sending by hand for this release"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix potential use-after-free for cloned bio
Btrfs: fix segmentation fault when doing dio read
Btrfs: fix invalid dereference in btrfs_retry_endio
btrfs: drop the nossd flag when remounting with -o ssd
|
|
Pull more CIFS fixes from Steve French:
"As promised, here is the remaining set of cifs/smb3 fixes for stable
(and a fix for one regression) now that they have had additional
review and testing"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix SMB3 mount without specifying a security mechanism
CIFS: store results of cifs_reopen_file to avoid infinite wait
CIFS: remove bad_network_name flag
CIFS: reconnect thread reschedule itself
CIFS: handle guest access errors to Windows shares
CIFS: Fix null pointer deref during read resp processing
|
|
If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start. Offset could be a negative value when
represented as a loff_t. The offset plus length will be used to update
the file size (i_size) which is also a loff_t.
Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.
Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied. Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.
To reproduce:
mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);
Resulted in,
kernel BUG at mm/hugetlb.c:742!
Call Trace:
hugetlbfs_evict_inode+0x80/0xa0
evict+0x24a/0x620
iput+0x48f/0x8c0
dentry_unlink_inode+0x31f/0x4d0
__dentry_kill+0x292/0x5e0
dput+0x730/0x830
__fput+0x438/0x720
____fput+0x1a/0x20
task_work_run+0xfe/0x180
exit_to_usermode_loop+0x133/0x150
syscall_return_slowpath+0x184/0x1c0
entry_SYSCALL_64_fastpath+0xab/0xad
Fixes: ff8c0c53c475 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|