author Linus Torvalds <torvalds@linux-foundation.org> 2026-07-03 05:48:05 -1000
committer Linus Torvalds <torvalds@linux-foundation.org> 2026-07-03 05:48:05 -1000
commit 71dfdfb0209b43dfd6f494f84f5548e4cfd18cb5
tree cfe70d8de248fc18924b14f05d6315282d6febc7
parent 025d0d6221d9b060bce251427c671cd0080d9dae
parent 5c6ce05e406520290c1d89da97fb3cd70c09137d
Merge tag 'vfs-7.2-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - netfs:

    - Fix the decision when to disallow write-streaming with fscache in
      use, handling of asynchronous cache object creation, a double fput
      in cachefiles, clearing S_KERNEL_FILE without the inode lock held,
      page extraction bugs in the iov_iter helpers (a potential
      underflow, a missing allocation failure check, a memory leak, and
      a folio offset miscalculation), writeback error and ENOMEM
      handling, DIO write retry for filesystems without a
      ->prepare_write() method, and the replacement of the wb_lock mutex
      with a bit lock plus writethrough collection offload so that
      multiple asynchronous writebacks don't interfere with each other.

    - Fix the barriering when walking the netfs subrequest list during
      retries as it was possible to see a subrequest that was just added
      by the application thread.

 - iomap:

    - Change iomap to submit read bios after each extent instead of
      building them up across extents. The old behavior was considered
      problematic for a while and now caused an actual erofs bug.

    - Guard the ioend io_size EOF trim in iomap against underflow when a
      concurrent truncate moves EOF below the start of the ioend,
      wrapping io_size to a huge value.

 - overlayfs:

    - Fix a stale overlayfs comment about the locking order.

    - Store the linked-in upper dentry instead of the disconnected
      O_TMPFILE dentry during overlayfs tmpfile copy-up. With a FUSE or
      virtiofs upper layer ->d_revalidate() would try to look up "/" in
      the workdir and fail, causing persistent ESTALE errors that broke
      dpkg and apt.

 - vfs-bpf: Have the bpf_real_data_inode() kfunc take a struct file
   instead of a dentry so it is usable from the bprm_check_security,
   mmap_file, and file_mprotect hooks, and rename it from
   bpf_real_inode() to make the data-inode semantics explicit. The kfunc
   landed this cycle so the change is safe.

 - afs: Fix NULL pointer dereferences in the callback service and in
   afs_get_tree(), several memory and refcount leaks, missing locking
   around the dynamic root inode numbers and premature cell exposure
   through /afs, a netns destruction hang caused by a misplaced increment
   of net->cells_outstanding, a bulk lookup malfunction caused by the
   dir_emit() API change, inode (re)initialisation issues, and assorted
   smaller fixes to error codes, seqlock handling, and debug output.

 - vfs: Refuse O_TMPFILE creation with an unmapped fsuid or fsgid and add
   a selftest for it.

 - vboxsf: Add Jori Koolstra as vboxsf maintainer, taking over from Hans
   de Goede.

 - dio: Release the pages attached to a short atomic dio bio; the
   REQ_ATOMIC size check error path leaked them.

 - procfs: Only bump the parent directory link count when registering
   directories in procfs. Registering regular files inflated the count
   and leaked a link on every create and remove cycle.

 - minix: Avoid an unsigned overflow in the minix bitmap block count
   calculation that let crafted images with huge inode or zone counts
   pass superblock validation and crash the kernel during mount.

 - cachefiles: Fix a double unlock in the cachefiles nomem_d_alloc error
   path left over from the start_creating() conversion.

 - fat: Stop fat from reading directory entries past the 0x00
   end-of-directory marker. If the trailing on-disk slots aren't
   zero-filled the driver surfaced arbitrary garbage as directory
   entries.

 - freevxfs: Don't BUG() on unknown typed-extent types in freevxfs,
   reachable via ioctl(FIBMAP) on a crafted image; fail with an I/O error
   instead.

 - orangefs: Keep the readdir entry size 64-bit in orangefs
   fill_from_part(). Truncating it to __u32 bypassed the bounds check and
   led to out-of-bounds reads triggerable by the userspace client.

 - xfs: Fix the error unwind in xfs_open_devices() which released the rt
   device file twice and left dangling buftarg pointers behind that were
   freed again when the failed mount was torn down.

 - exec: Fix an off-by-one in the comment documenting the maximum binfmt
   rewrite depth in exec_binprm(). The code allows five rewrites, not
   four; restricting the code would break userspace so the comment is
   fixed instead.

 - file handles: Reject detached mounts in capable_wrt_mount(). A
   detached mount can be dissolved concurrently, leaving a NULL mount
   namespace that open_by_handle_at() would dereference.

* tag 'vfs-7.2-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (57 commits)
  netfs: Fix barriering when walking subrequest list
  iomap: submit read bio after each extent
  fuse: call fuse_send_readpages explicitly from fuse_readahead
  iomap: consolidate bio submission
  fhandle: reject detached mounts in capable_wrt_mount()
  netfs: Fix DIO write retry for filesystems without a ->prepare_write()
  netfs: Fix folio state after ENOMEM whilst under writeback iteration
  netfs: Fix writeback error handling
  netfs: Fix writethrough to use collection offload
  netfs: Replace wb_lock with a bit lock for asynchronicity
  netfs: Fix kdoc warning
  scatterlist: Fix offset in folio calc in extract_xarray_to_sg()
  iov_iter: Remove unused variable in kunit_iov_iter.c
  iov_iter: Fix a memory leak in iov_iter_extract_user_pages()
  iov_iter: Fix missing alloc fail check in iov_iter_extract_bvec_pages()
  iov_iter: Fix potential underflow in iov_iter_extract_xarray_pages()
  cachefiles: Fix file burial to take lock when unsetting S_KERNEL_FILE
  cachefiles: Fix double fput
  netfs: Fix netfs_create_write_req() to handle async cache object creation
  netfs: Fix decision whether to disallow write-streaming due to fscache use
  ...
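
The iomap entry in the pull message describes an unsigned wrap: if EOF moves below the start of an ioend, trimming io_size to "eof minus start" yields a huge value instead of a small one. A minimal userspace sketch of that failure mode follows. The names trim_unguarded() and trim_guarded() are hypothetical and this is not the kernel patch; in particular, shrinking the ioend to zero in the guarded case is an assumption made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins, not kernel code: an ioend covers
 * [io_offset, io_offset + io_size) and is trimmed so that it does not
 * extend past end-of-file (eof).
 */

/* Unguarded trim: the subtraction wraps when eof < io_offset. */
static uint64_t trim_unguarded(uint64_t io_offset, uint64_t io_size,
			       uint64_t eof)
{
	if (io_offset + io_size > eof)
		io_size = eof - io_offset;	/* wraps if eof < io_offset */
	return io_size;
}

/*
 * Guarded trim: check the ordering before subtracting.  Treating an
 * ioend that lies wholly beyond EOF as zero-sized is an assumption of
 * this sketch.
 */
static uint64_t trim_guarded(uint64_t io_offset, uint64_t io_size,
			     uint64_t eof)
{
	if (eof <= io_offset)
		return 0;
	if (io_size > eof - io_offset)
		io_size = eof - io_offset;
	return io_size;
}
```

With io_offset = 8192, io_size = 4096 and eof = 4096, the unguarded version returns a value just below 2^64, while the guarded version returns 0. Both agree whenever EOF is at or beyond the start of the ioend.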
-rw-r--r-- MAINTAINERS | 2
-rw-r--r-- fs/afs/callback.c | 17
-rw-r--r-- fs/afs/cell.c | 27
-rw-r--r-- fs/afs/cmservice.c | 7
-rw-r--r-- fs/afs/dir.c | 40
-rw-r--r-- fs/afs/dynroot.c | 2
-rw-r--r-- fs/afs/fs_operation.c | 2
-rw-r--r-- fs/afs/inode.c | 15
-rw-r--r-- fs/afs/internal.h | 3
-rw-r--r-- fs/afs/super.c | 5
-rw-r--r-- fs/afs/symlink.c | 4
-rw-r--r-- fs/afs/vl_list.c | 24
-rw-r--r-- fs/afs/volume.c | 2
-rw-r--r-- fs/bpf_fs_kfuncs.c | 23
-rw-r--r-- fs/cachefiles/namei.c | 4
-rw-r--r-- fs/exec.c | 2
-rw-r--r-- fs/exfat/iomap.c | 5
-rw-r--r-- fs/fat/dir.c | 44
-rw-r--r-- fs/fhandle.c | 2
-rw-r--r-- fs/freevxfs/vxfs_bmap.c | 3
-rw-r--r-- fs/fuse/file.c | 14
-rw-r--r-- fs/iomap/bio.c | 15
-rw-r--r-- fs/iomap/buffered-io.c | 16
-rw-r--r-- fs/iomap/direct-io.c | 7
-rw-r--r-- fs/iomap/ioend.c | 8
-rw-r--r-- fs/minix/minix.h | 2
-rw-r--r-- fs/namei.c | 4
-rw-r--r-- fs/netfs/buffered_read.c | 2
-rw-r--r-- fs/netfs/buffered_write.c | 2
-rw-r--r-- fs/netfs/direct_write.c | 18
-rw-r--r-- fs/netfs/internal.h | 12
-rw-r--r-- fs/netfs/locking.c | 95
-rw-r--r-- fs/netfs/read_retry.c | 7
-rw-r--r-- fs/netfs/write_collect.c | 10
-rw-r--r-- fs/netfs/write_issue.c | 55
-rw-r--r-- fs/netfs/write_retry.c | 7
-rw-r--r-- fs/ntfs/aops.c | 6
-rw-r--r-- fs/ntfs3/inode.c | 5
-rw-r--r-- fs/orangefs/dir.c | 7
-rw-r--r-- fs/overlayfs/copy_up.c | 12
-rw-r--r-- fs/overlayfs/inode.c | 4
-rw-r--r-- fs/proc/generic.c | 9
-rw-r--r-- fs/xfs/xfs_aops.c | 3
-rw-r--r-- fs/xfs/xfs_super.c | 3
-rw-r--r-- include/linux/iomap.h | 2
-rw-r--r-- include/linux/netfs.h | 13
-rw-r--r-- lib/iov_iter.c | 20
-rw-r--r-- lib/scatterlist.c | 1
-rw-r--r-- lib/tests/kunit_iov_iter.c | 5
-rw-r--r-- tools/testing/selftests/filesystems/.gitignore | 1
-rw-r--r-- tools/testing/selftests/filesystems/Makefile | 4
-rw-r--r-- tools/testing/selftests/filesystems/idmapped_tmpfile.c | 168
52 files changed, 592 insertions, 178 deletions
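
The exec entry in the pull message depends on how the loop bound in exec_binprm() is counted. The sketch below copies only the `for (depth = 0;; depth++)` and `if (depth > 5)` shape from the fs/exec.c hunk into a userspace function. The function max_rewrites_allowed() and its model of a handler that always asks for another rewrite are assumptions for illustration, not kernel code.

```c
/*
 * Hypothetical model, not kernel code: every pass through the loop
 * stands for one attempt to load a binary, and every pass after the
 * first exists because the previous handler rewrote the binary (for
 * example a "#!" script handing over to its interpreter).
 */
static int max_rewrites_allowed(int limit)
{
	int rewrites = 0;
	int depth;

	for (depth = 0;; depth++) {
		if (depth > limit)
			break;		/* the kernel fails the exec here */
		/* Assume the handler always asks for one more rewrite. */
		rewrites++;
	}
	/* The last rewrite counted was requested but then rejected. */
	return rewrites - 1;
}
```

Under this model a limit of 5 lets depth run from 0 to 5, so five rewrites succeed and the sixth is refused, which matches the corrected comment.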
diff --git a/MAINTAINERS b/MAINTAINERS
index 25453040dffb..7cc4bca5a2c5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -28725,7 +28725,7 @@ F: include/linux/vbox_utils.h
F: include/uapi/linux/vbox*.h
VIRTUAL BOX SHARED FOLDER VFS DRIVER
-M: Hans de Goede <hansg@kernel.org>
+M: Jori Koolstra <jkoolstra@xs4all.nl>
L: linux-fsdevel@vger.kernel.org
S: Maintained
F: fs/vboxsf/*
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 894d2bad6b6c..61354003c006 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -113,16 +113,12 @@ static struct afs_volume *afs_lookup_volume_rcu(struct afs_cell *cell,
{
struct afs_volume *volume = NULL;
struct rb_node *p;
- int seq = 1;
- for (;;) {
+ scoped_seqlock_read(&cell->volume_lock, ss_lock) {
/* Unfortunately, rbtree walking doesn't give reliable results
* under just the RCU read lock, so we have to check for
* changes.
*/
- seq++; /* 2 on the 1st/lockless path, otherwise odd */
- read_seqbegin_or_lock(&cell->volume_lock, &seq);
-
p = rcu_dereference_raw(cell->volumes.rb_node);
while (p) {
volume = rb_entry(p, struct afs_volume, cell_node);
@@ -138,12 +134,9 @@ static struct afs_volume *afs_lookup_volume_rcu(struct afs_cell *cell,
if (volume && afs_try_get_volume(volume, afs_volume_trace_get_callback))
break;
- if (!need_seqretry(&cell->volume_lock, seq))
- break;
- seq |= 1; /* Want a lock next time */
+ volume = NULL;
}
- done_seqretry(&cell->volume_lock, seq);
return volume;
}
@@ -221,7 +214,11 @@ static void afs_break_some_callbacks(struct afs_server *server,
rcu_read_lock();
volume = afs_lookup_volume_rcu(server->cell, vid);
- if (cbb->fid.vnode == 0 && cbb->fid.unique == 0) {
+ if (!volume) {
+ /* Ignore breaks on unknown volumes. */
+ rcu_read_unlock();
+ *_count = 0;
+ } else if (cbb->fid.vnode == 0 && cbb->fid.unique == 0) {
afs_break_volume_callback(server, volume);
*_count -= 1;
if (*_count)
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 9738684dbdd2..47a2645768d7 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -206,11 +206,6 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net,
cell->dns_status = vllist->status;
smp_store_release(&cell->dns_lookup_count, 1); /* vs source/status */
atomic_inc(&net->cells_outstanding);
- ret = idr_alloc_cyclic(&net->cells_dyn_ino, cell,
- 2, INT_MAX / 2, GFP_KERNEL);
- if (ret < 0)
- goto error;
- cell->dynroot_ino = ret;
cell->debug_id = atomic_inc_return(&cell_debug_id);
trace_afs_cell(cell->debug_id, 1, 0, afs_cell_trace_alloc);
@@ -304,6 +299,13 @@ struct afs_cell *afs_lookup_cell(struct afs_net *net,
goto cell_already_exists;
}
+ ret = idr_alloc_cyclic(&net->cells_dyn_ino, candidate,
+ 2, INT_MAX / 2, GFP_KERNEL);
+ if (ret < 0)
+ goto cant_alloc_ino;
+ candidate->dynroot_ino = ret;
+ set_bit(AFS_CELL_FL_HAVE_INO, &candidate->flags);
+
cell = candidate;
candidate = NULL;
afs_use_cell(cell, trace);
@@ -378,6 +380,11 @@ no_wait:
_leave(" = %p [cell]", cell);
return cell;
+cant_alloc_ino:
+ up_write(&net->cells_lock);
+ afs_put_cell(candidate, afs_cell_trace_put_candidate);
+ goto error_noput;
+
cell_already_exists:
_debug("cell exists");
cell = cursor;
@@ -547,6 +554,8 @@ static int afs_update_cell(struct afs_cell *cell)
rcu_assign_pointer(cell->vl_servers, vllist);
cell->dns_source = vllist->source;
old = p;
+ } else {
+ old = vllist;
}
write_unlock(&cell->vl_servers_lock);
afs_put_vlserverlist(cell->net, old);
@@ -577,7 +586,6 @@ static void afs_cell_destroy(struct rcu_head *rcu)
afs_put_vlserverlist(net, rcu_access_pointer(cell->vl_servers));
afs_unuse_cell(cell->alias_of, afs_cell_trace_unuse_alias);
key_put(cell->anonymous_key);
- idr_remove(&net->cells_dyn_ino, cell->dynroot_ino);
kfree(cell->name - 1);
kfree(cell);
@@ -592,6 +600,13 @@ static void afs_destroy_cell_work(struct work_struct *work)
afs_see_cell(cell, afs_cell_trace_destroy);
timer_delete_sync(&cell->management_timer);
cancel_work_sync(&cell->manager);
+
+ if (test_bit(AFS_CELL_FL_HAVE_INO, &cell->flags)) {
+ down_write(&cell->net->cells_lock);
+ idr_remove(&cell->net->cells_dyn_ino, cell->dynroot_ino);
+ up_write(&cell->net->cells_lock);
+ }
+
call_rcu(&cell->rcu, afs_cell_destroy);
}
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 5540ae1cad59..db394f101fc6 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -334,7 +334,6 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call)
ret = afs_extract_data(call, false);
switch (ret) {
case 0: break;
- case -EAGAIN: return 0;
default: return ret;
}
@@ -364,6 +363,11 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call)
if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
return afs_io_error(call, afs_io_error_cm_reply);
+ if (!call->server) {
+ trace_afs_cm_no_server_u(call, call->request);
+ return 0;
+ }
+
if (memcmp(call->request, &call->server->_uuid, sizeof(call->server->_uuid)) != 0) {
pr_notice("Callback UUID does not match fileserver UUID\n");
trace_afs_cm_no_server_u(call, call->request);
@@ -451,7 +455,6 @@ static int afs_deliver_cb_probe_uuid(struct afs_call *call)
ret = afs_extract_data(call, false);
switch (ret) {
case 0: break;
- case -EAGAIN: return 0;
default: return ret;
}
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 498b99ccdf0e..6df56fe9163f 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -28,9 +28,11 @@ static int afs_d_revalidate(struct inode *dir, const struct qstr *name,
static int afs_d_delete(const struct dentry *dentry);
static void afs_d_iput(struct dentry *dentry, struct inode *inode);
static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name, int nlen,
- loff_t fpos, u64 ino, unsigned dtype);
+ u64 ino, u32 uniquifier);
+#define AFS_LOOKUP_ONE ((filldir_t)0x123UL)
static bool afs_lookup_filldir(struct dir_context *ctx, const char *name, int nlen,
- loff_t fpos, u64 ino, unsigned dtype);
+ u64 ino, u32 uniquifier);
+#define AFS_LOOKUP ((filldir_t)0x137UL)
static int afs_create(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode, bool excl);
static struct dentry *afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
@@ -421,11 +423,18 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode,
}
/* found the next entry */
- if (!dir_emit(ctx, dire->u.name, nlen,
- ntohl(dire->u.vnode),
- (ctx->actor == afs_lookup_filldir ||
- ctx->actor == afs_lookup_one_filldir)?
- ntohl(dire->u.unique) : DT_UNKNOWN)) {
+ if (ctx->actor == AFS_LOOKUP) {
+ if (!afs_lookup_filldir(ctx, dire->u.name, nlen,
+ ntohl(dire->u.vnode),
+ ntohl(dire->u.unique)))
+ return 0;
+ } else if (ctx->actor == AFS_LOOKUP_ONE) {
+ if (!afs_lookup_one_filldir(ctx, dire->u.name, nlen,
+ ntohl(dire->u.vnode),
+ ntohl(dire->u.unique)))
+ return 0;
+ } else if (!dir_emit(ctx, dire->u.name, nlen,
+ ntohl(dire->u.vnode), DT_UNKNOWN)) {
_leave(" = 0 [full]");
return 0;
}
@@ -545,6 +554,7 @@ static int afs_readdir(struct file *file, struct dir_context *ctx)
{
afs_dataversion_t dir_version;
+ ctx->dt_flags_mask = UINT_MAX;
return afs_dir_iterate(file_inode(file), ctx, file, &dir_version);
}
@@ -554,14 +564,14 @@ static int afs_readdir(struct file *file, struct dir_context *ctx)
* uniquifier through dtype
*/
static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name,
- int nlen, loff_t fpos, u64 ino, unsigned dtype)
+ int nlen, u64 ino, u32 uniquifier)
{
struct afs_lookup_one_cookie *cookie =
container_of(ctx, struct afs_lookup_one_cookie, ctx);
_enter("{%s,%u},%s,%u,,%llu,%u",
cookie->name.name, cookie->name.len, name, nlen,
- (unsigned long long) ino, dtype);
+ (unsigned long long) ino, uniquifier);
/* insanity checks first */
BUILD_BUG_ON(sizeof(union afs_xdr_dir_block) != 2048);
@@ -574,7 +584,7 @@ static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name,
}
cookie->fid.vnode = ino;
- cookie->fid.unique = dtype;
+ cookie->fid.unique = uniquifier;
cookie->found = 1;
_leave(" = false [found]");
@@ -591,7 +601,7 @@ static int afs_do_lookup_one(struct inode *dir, const struct qstr *name,
{
struct afs_super_info *as = dir->i_sb->s_fs_info;
struct afs_lookup_one_cookie cookie = {
- .ctx.actor = afs_lookup_one_filldir,
+ .ctx.actor = AFS_LOOKUP_ONE,
.name = *name,
.fid.vid = as->volume->vid
};
@@ -622,14 +632,14 @@ static int afs_do_lookup_one(struct inode *dir, const struct qstr *name,
* uniquifier through dtype
*/
static bool afs_lookup_filldir(struct dir_context *ctx, const char *name,
- int nlen, loff_t fpos, u64 ino, unsigned dtype)
+ int nlen, u64 ino, u32 uniquifier)
{
struct afs_lookup_cookie *cookie =
container_of(ctx, struct afs_lookup_cookie, ctx);
_enter("{%s,%u},%s,%u,,%llu,%u",
cookie->name.name, cookie->name.len, name, nlen,
- (unsigned long long) ino, dtype);
+ (unsigned long long) ino, uniquifier);
/* insanity checks first */
BUILD_BUG_ON(sizeof(union afs_xdr_dir_block) != 2048);
@@ -637,7 +647,7 @@ static bool afs_lookup_filldir(struct dir_context *ctx, const char *name,
if (cookie->nr_fids < 50) {
cookie->fids[cookie->nr_fids].vnode = ino;
- cookie->fids[cookie->nr_fids].unique = dtype;
+ cookie->fids[cookie->nr_fids].unique = uniquifier;
cookie->nr_fids++;
}
@@ -778,7 +788,7 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry)
for (i = 0; i < ARRAY_SIZE(cookie->fids); i++)
cookie->fids[i].vid = dvnode->fid.vid;
- cookie->ctx.actor = afs_lookup_filldir;
+ cookie->ctx.actor = AFS_LOOKUP;
cookie->name = dentry->d_name;
cookie->nr_fids = 2; /* slot 1 is saved for the fid we actually want
* and slot 0 for the directory */
diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c
index 1d5e33bc7502..6e3c8c691ba9 100644
--- a/fs/afs/dynroot.c
+++ b/fs/afs/dynroot.c
@@ -278,7 +278,7 @@ static struct dentry *afs_lookup_atcell(struct inode *dir, struct dentry *dentry
}
/*
- * Transcribe the cell database into readdir content under the RCU read lock.
+ * Transcribe the cell database into readdir content under net->cells_lock.
* Each cell produces two entries, one prefixed with a dot and one not.
*/
static int afs_dynroot_readdir_cells(struct afs_net *net, struct dir_context *ctx)
diff --git a/fs/afs/fs_operation.c b/fs/afs/fs_operation.c
index c0dbbc6d3716..20801b29521d 100644
--- a/fs/afs/fs_operation.c
+++ b/fs/afs/fs_operation.c
@@ -348,7 +348,7 @@ int afs_put_operation(struct afs_operation *op)
for (i = 0; i < op->nr_files - 2; i++)
if (op->more_files[i].put_vnode)
iput(&op->more_files[i].vnode->netfs.inode);
- kfree(op->more_files);
+ kvfree(op->more_files);
}
if (op->estate) {
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 3f48458694ba..14f39a9bea6c 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -52,9 +52,9 @@ static noinline void dump_vnode(struct afs_vnode *vnode, struct afs_vnode *paren
/*
* Set parameters for the netfs library
*/
-static void afs_set_netfs_context(struct afs_vnode *vnode)
+static void afs_set_netfs_context(struct afs_vnode *vnode, bool is_file)
{
- netfs_inode_init(&vnode->netfs, &afs_req_ops, true);
+ netfs_inode_init(&vnode->netfs, &afs_req_ops, is_file);
}
/*
@@ -93,6 +93,10 @@ static int afs_inode_init_from_status(struct afs_operation *op,
inode->i_gid = make_kgid(&init_user_ns, status->group);
set_nlink(&vnode->netfs.inode, status->nlink);
+ i_size_write(inode, status->size);
+ inode_set_bytes(inode, status->size);
+ afs_set_netfs_context(vnode, status->type == AFS_FTYPE_FILE);
+
switch (status->type) {
case AFS_FTYPE_FILE:
inode->i_mode = S_IFREG | (status->mode & S_IALLUGO);
@@ -126,7 +130,6 @@ static int afs_inode_init_from_status(struct afs_operation *op,
}
inode->i_mapping->a_ops = &afs_symlink_aops;
inode_nohighmem(inode);
- mapping_set_release_always(inode->i_mapping);
break;
default:
dump_vnode(vnode, op->file[0].vnode != vnode ? op->file[0].vnode : NULL);
@@ -134,10 +137,6 @@ static int afs_inode_init_from_status(struct afs_operation *op,
return afs_protocol_error(NULL, afs_eproto_file_type);
}
- i_size_write(inode, status->size);
- inode_set_bytes(inode, status->size);
- afs_set_netfs_context(vnode);
-
vnode->invalid_before = status->data_version;
trace_afs_set_dv(vnode, status->data_version);
inode_set_iversion_raw(&vnode->netfs.inode, status->data_version);
@@ -566,7 +565,6 @@ struct inode *afs_root_iget(struct super_block *sb, struct key *key)
vnode = AFS_FS_I(inode);
vnode->cb_v_check = atomic_read(&as->volume->cb_v_break);
- afs_set_netfs_context(vnode);
op = afs_alloc_operation(key, as->volume);
if (IS_ERR(op)) {
@@ -682,6 +680,7 @@ void afs_evict_inode(struct inode *inode)
inode->i_mapping->a_ops->writepages(inode->i_mapping, &wbc);
}
+ flush_delayed_work(&vnode->lock_work);
netfs_wait_for_outstanding_io(inode);
truncate_inode_pages_final(&inode->i_data);
netfs_free_folioq_buffer(vnode->directory);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 0b72a8566299..601f01e5c15f 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -388,6 +388,7 @@ struct afs_cell {
#define AFS_CELL_FL_NO_GC 0 /* The cell was added manually, don't auto-gc */
#define AFS_CELL_FL_DO_LOOKUP 1 /* DNS lookup requested */
#define AFS_CELL_FL_CHECK_ALIAS 2 /* Need to check for aliases */
+#define AFS_CELL_FL_HAVE_INO 3 /* Have dynroot_ino */
enum afs_cell_state state;
short error;
enum dns_record_source dns_source:8; /* Latest source of data from lookup */
@@ -750,8 +751,6 @@ static inline void afs_vnode_set_cache(struct afs_vnode *vnode,
{
#ifdef CONFIG_AFS_FSCACHE
vnode->netfs.cache = cookie;
- if (cookie)
- mapping_set_release_always(vnode->netfs.inode.i_mapping);
#endif
}
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 942f3e9800d7..82bb713825a0 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -587,7 +587,8 @@ static int afs_get_tree(struct fs_context *fc)
}
fc->root = dget(sb->s_root);
- trace_afs_get_tree(as->cell, as->volume);
+ if (!ctx->dyn_root)
+ trace_afs_get_tree(as->cell, as->volume);
_leave(" = 0 [%p]", sb);
return 0;
@@ -659,7 +660,6 @@ static void afs_i_init_once(void *_vnode)
INIT_LIST_HEAD(&vnode->wb_keys);
INIT_LIST_HEAD(&vnode->pending_locks);
INIT_LIST_HEAD(&vnode->granted_locks);
- INIT_DELAYED_WORK(&vnode->lock_work, afs_lock_work);
INIT_LIST_HEAD(&vnode->cb_mmap_link);
seqlock_init(&vnode->cb_lock);
}
@@ -693,6 +693,7 @@ static struct inode *afs_alloc_inode(struct super_block *sb)
init_rwsem(&vnode->rmdir_lock);
INIT_WORK(&vnode->cb_work, afs_invalidate_mmap_work);
+ INIT_DELAYED_WORK(&vnode->lock_work, afs_lock_work);
_leave(" = %p", &vnode->netfs.inode);
return &vnode->netfs.inode;
diff --git a/fs/afs/symlink.c b/fs/afs/symlink.c
index ed5868369f37..16b4823cb7b7 100644
--- a/fs/afs/symlink.c
+++ b/fs/afs/symlink.c
@@ -255,11 +255,11 @@ int afs_symlink_writepages(struct address_space *mapping,
}
if (ret == 0) {
- mutex_lock(&vnode->netfs.wb_lock);
+ netfs_wb_begin(&vnode->netfs, false);
netfs_free_folioq_buffer(vnode->directory);
vnode->directory = NULL;
vnode->directory_size = 0;
- mutex_unlock(&vnode->netfs.wb_lock);
+ netfs_wb_end(&vnode->netfs);
} else if (ret == 1) {
ret = 0; /* Skipped write due to lock conflict. */
}
diff --git a/fs/afs/vl_list.c b/fs/afs/vl_list.c
index 3e4966915ea4..c1dac5dbed0d 100644
--- a/fs/afs/vl_list.c
+++ b/fs/afs/vl_list.c
@@ -92,7 +92,7 @@ static struct afs_addr_list *afs_extract_vl_addrs(struct afs_net *net,
{
struct afs_addr_list *alist;
const u8 *b = *_b;
- int ret = -EINVAL;
+ int ret;
alist = afs_alloc_addrlist(nr_addrs);
if (!alist)
@@ -110,6 +110,7 @@ static struct afs_addr_list *afs_extract_vl_addrs(struct afs_net *net,
case DNS_ADDRESS_IS_IPV4:
if (end - b < 4) {
_leave(" = -EINVAL [short inet]");
+ ret = -EINVAL;
goto error;
}
memcpy(x, b, 4);
@@ -122,6 +123,7 @@ static struct afs_addr_list *afs_extract_vl_addrs(struct afs_net *net,
case DNS_ADDRESS_IS_IPV6:
if (end - b < 16) {
_leave(" = -EINVAL [short inet6]");
+ ret = -EINVAL;
goto error;
}
memcpy(x, b, 16);
@@ -198,6 +200,8 @@ struct afs_vlserver_list *afs_extract_vlserver_list(struct afs_cell *cell,
b += sizeof(*hdr);
while (end - b >= sizeof(bs)) {
+ int nlen;
+
bs.name_len = afs_extract_le16(&b);
bs.priority = afs_extract_le16(&b);
bs.weight = afs_extract_le16(&b);
@@ -207,10 +211,12 @@ struct afs_vlserver_list *afs_extract_vlserver_list(struct afs_cell *cell,
bs.protocol = *b++;
bs.nr_addrs = *b++;
+ nlen = min3(bs.name_len, end - b, 255);
+
_debug("extract %u %u %u %u %u %u %*.*s",
bs.name_len, bs.priority, bs.weight,
bs.port, bs.protocol, bs.nr_addrs,
- bs.name_len, bs.name_len, b);
+ bs.name_len, nlen, b);
if (end - b < bs.name_len)
break;
@@ -287,8 +293,20 @@ struct afs_vlserver_list *afs_extract_vlserver_list(struct afs_cell *cell,
afs_put_addrlist(old, afs_alist_trace_put_vlserver_old);
}
+ /* Check for duplicates in the server list */
+ for (j = 0; j < vllist->nr_servers; j++) {
+ struct afs_vlserver *s = vllist->servers[j].server;
- /* TODO: Might want to check for duplicates */
+ if (s->name_len == server->name_len &&
+ s->port == server->port &&
+ strncasecmp(s->name, server->name, server->name_len) == 0) {
+ afs_put_vlserver(cell->net, server);
+ server = NULL;
+ break;
+ }
+ }
+ if (!server)
+ continue;
/* Insertion-sort by priority and weight */
for (j = 0; j < vllist->nr_servers; j++) {
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 9ae5c8ad2e04..4f79d25ec37f 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -40,7 +40,7 @@ static struct afs_volume *afs_insert_volume_into_cell(struct afs_cell *cell,
goto found;
}
- set_bit(AFS_VOLUME_RM_TREE, &volume->flags);
+ set_bit(AFS_VOLUME_RM_TREE, &p->flags);
rb_replace_node_rcu(&p->cell_node, &volume->cell_node, &cell->volumes);
}
}
diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 768aca2dc0f0..f1863a891db6 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -360,18 +360,23 @@ __bpf_kfunc int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__s
#endif /* CONFIG_CGROUPS */
/**
- * bpf_real_inode - get the real inode backing a dentry
- * @dentry: dentry to resolve
+ * bpf_real_data_inode - get the real inode hosting a file's data
+ * @file: file to resolve
*
- * If the dentry is on a union/overlay filesystem, return the underlying, real
- * inode that hosts the data. Otherwise return the inode attached to the
- * dentry itself.
+ * Resolve @file to the inode that hosts its data. For a regular file on a
+ * union/overlay filesystem this is the underlying (upper or lower) inode that
+ * stores the data, not the overlay inode.
*
- * Return: The real inode backing the dentry, or NULL for a negative dentry.
+ * Data resolution only applies to regular files. For a non-regular file (e.g.
+ * a device node, fifo or socket) on a union/overlay filesystem the overlay
+ * inode itself is returned; for any file on a non-union filesystem the inode
+ * attached to @file is returned.
+ *
+ * Return: The inode hosting @file's data, or NULL.
*/
-__bpf_kfunc struct inode *bpf_real_inode(struct dentry *dentry)
+__bpf_kfunc struct inode *bpf_real_data_inode(struct file *file)
{
- return d_real_inode(dentry);
+ return d_real_inode(file_dentry(file));
}
__bpf_kfunc_end_defs();
@@ -384,7 +389,7 @@ BTF_ID_FLAGS(func, bpf_get_dentry_xattr, KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_set_dentry_xattr, KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_remove_dentry_xattr, KF_SLEEPABLE)
-BTF_ID_FLAGS(func, bpf_real_inode, KF_SLEEPABLE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_real_data_inode, KF_SLEEPABLE | KF_RET_NULL)
BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 2937db690b40..8a9f6be15828 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -209,7 +209,6 @@ lookup_error:
return ERR_PTR(ret);
nomem_d_alloc:
- inode_unlock(d_inode(dir));
_leave(" = -ENOMEM");
return ERR_PTR(-ENOMEM);
}
@@ -375,7 +374,7 @@ try_again:
"Rename failed with error %d", ret);
}
- __cachefiles_unmark_inode_in_use(object, d_inode(rep));
+ cachefiles_do_unmark_inode_in_use(object, d_inode(rep));
end_renaming(&rd);
_leave(" = 0");
return 0;
@@ -467,7 +466,6 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
ret = -EINVAL;
if (unlikely(!file->f_op->read_iter) ||
unlikely(!file->f_op->write_iter)) {
- fput(file);
pr_notice("Cache does not support read_iter and write_iter\n");
goto err_unuse;
}
diff --git a/fs/exec.c b/fs/exec.c
index b92fe7db176c..d5993cedc829 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1717,7 +1717,7 @@ static int exec_binprm(struct linux_binprm *bprm)
old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
rcu_read_unlock();
- /* This allows 4 levels of binfmt rewrites before failing hard. */
+ /* This allows 5 levels of binfmt rewrites before failing hard. */
for (depth = 0;; depth++) {
struct file *exec;
if (depth > 5)
diff --git a/fs/exfat/iomap.c b/fs/exfat/iomap.c
index 1aac38e63fe6..190fc6471f84 100644
--- a/fs/exfat/iomap.c
+++ b/fs/exfat/iomap.c
@@ -253,10 +253,7 @@ static void exfat_iomap_read_end_io(struct bio *bio)
static void exfat_iomap_bio_submit_read(const struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx)
{
- struct bio *bio = ctx->read_ctx;
-
- bio->bi_end_io = exfat_iomap_read_end_io;
- submit_bio(bio);
+ iomap_bio_submit_read_endio(iter, ctx, exfat_iomap_read_end_io);
}
const struct iomap_read_ops exfat_iomap_bio_read_ops = {
diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index 4f6f42f33613..c6cca5d00ffd 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -131,6 +131,31 @@ static inline int fat_get_entry(struct inode *dir, loff_t *pos,
}
/*
+ * Like fat_get_entry(), but honour the FAT end-of-directory marker:
+ * a dirent whose first name byte is NUL terminates iteration per the
+ * spec, which also guarantees that every following slot is zeroed.
+ * Skip straight to the end of the directory so the next call returns
+ * -1 from fat_bmap() without re-reading the trailing zero slots, and
+ * so callers that persist *pos across invocations (e.g. readdir's
+ * ctx->pos) keep reporting EOD. Release *bh and set it to NULL to
+ * match fat_get_entry()'s contract that *bh is NULL on the -1 return.
+ */
+static int fat_get_entry_eod(struct inode *dir, loff_t *pos,
+ struct buffer_head **bh,
+ struct msdos_dir_entry **de)
+{
+ int err = fat_get_entry(dir, pos, bh, de);
+
+ if (err == 0 && (*de)->name[0] == 0) {
+ brelse(*bh);
+ *bh = NULL;
+ *pos = dir->i_size;
+ return -1;
+ }
+ return err;
+}
+
+/*
* Convert Unicode 16 to UTF-8, translated Unicode, or ASCII.
* If uni_xlate is enabled and we can't get a 1:1 conversion, use a
* colon as an escape character since it is normally invalid on the vfat
@@ -327,7 +352,7 @@ parse_long:
if (ds->id & 0x40)
(*unicode)[offset + 13] = 0;
- if (fat_get_entry(dir, pos, bh, de) < 0)
+ if (fat_get_entry_eod(dir, pos, bh, de) < 0)
return PARSE_EOF;
if (slot == 0)
break;
@@ -489,7 +514,7 @@ int fat_search_long(struct inode *inode, const unsigned char *name,
err = -ENOENT;
while (1) {
- if (fat_get_entry(inode, &cpos, &bh, &de) == -1)
+ if (fat_get_entry_eod(inode, &cpos, &bh, &de) == -1)
goto end_of_dir;
parse_record:
nr_slots = 0;
@@ -601,7 +626,7 @@ static int __fat_readdir(struct inode *inode, struct file *file,
bh = NULL;
get_new:
- if (fat_get_entry(inode, &cpos, &bh, &de) == -1)
+ if (fat_get_entry_eod(inode, &cpos, &bh, &de) == -1)
goto end_of_dir;
parse_record:
nr_slots = 0;
@@ -885,7 +910,7 @@ static int fat_get_short_entry(struct inode *dir, loff_t *pos,
struct buffer_head **bh,
struct msdos_dir_entry **de)
{
- while (fat_get_entry(dir, pos, bh, de) >= 0) {
+ while (fat_get_entry_eod(dir, pos, bh, de) >= 0) {
/* free entry or long name entry or volume label */
if (!IS_FREE((*de)->name) && !((*de)->attr & ATTR_VOLUME))
return 0;
@@ -1302,6 +1327,7 @@ int fat_add_entries(struct inode *dir, void *slots, int nr_slots,
struct msdos_dir_entry *de;
int err, free_slots, i, nr_bhs;
loff_t pos;
+ bool saw_eod;
sinfo->nr_slots = nr_slots;
@@ -1310,12 +1336,15 @@ int fat_add_entries(struct inode *dir, void *slots, int nr_slots,
bh = prev = NULL;
pos = 0;
err = -ENOSPC;
+ saw_eod = false;
while (fat_get_entry(dir, &pos, &bh, &de) > -1) {
/* check the maximum size of directory */
if (pos >= FAT_MAX_DIR_SIZE)
goto error;
if (IS_FREE(de->name)) {
+ if (de->name[0] == 0)
+ saw_eod = true;
if (prev != bh) {
get_bh(bh);
bhs[nr_bhs] = prev = bh;
@@ -1325,6 +1354,13 @@ int fat_add_entries(struct inode *dir, void *slots, int nr_slots,
if (free_slots == nr_slots)
goto found;
} else {
+ if (saw_eod) {
+ fat_fs_error_ratelimit(sb,
+ "allocated dir entry found after end-of-directory marker (i_pos %lld)",
+ MSDOS_I(dir)->i_pos);
+ err = -EIO;
+ goto error;
+ }
for (i = 0; i < nr_bhs; i++)
brelse(bhs[i]);
prev = NULL;
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 1ca7eb3a6cb5..f8829231e3d7 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -295,7 +295,7 @@ static bool capable_wrt_mount(struct mount *mount)
*/
guard(rcu)();
mnt_ns = READ_ONCE(mount->mnt_ns);
- return ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN);
+ return mnt_ns && ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN);
}
static inline int may_decode_fh(struct handle_to_path_ctx *ctx,
diff --git a/fs/freevxfs/vxfs_bmap.c b/fs/freevxfs/vxfs_bmap.c
index e85222892038..1b8216eb1d90 100644
--- a/fs/freevxfs/vxfs_bmap.c
+++ b/fs/freevxfs/vxfs_bmap.c
@@ -227,7 +227,8 @@ vxfs_bmap_typed(struct inode *ip, long iblock)
return 0;
}
default:
- BUG();
+ WARN_ON_ONCE(1);
+ return 0;
}
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e052a0d44dee..ceada75310b8 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -981,19 +981,8 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
return ret;
}
-static void fuse_iomap_submit_read(const struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx)
-{
- struct fuse_fill_read_data *data = ctx->read_ctx;
-
- if (data->ia)
- fuse_send_readpages(data->ia, data->file, data->nr_bytes,
- data->fc->async_read);
-}
-
static const struct iomap_read_ops fuse_iomap_read_ops = {
.read_folio_range = fuse_iomap_read_folio_range_async,
- .submit_read = fuse_iomap_submit_read,
};
static int fuse_read_folio(struct file *file, struct folio *folio)
@@ -1116,6 +1105,9 @@ static void fuse_readahead(struct readahead_control *rac)
return;
iomap_readahead(&fuse_iomap_ops, &ctx, NULL);
+ if (data.ia)
+ fuse_send_readpages(data.ia, data.file, data.nr_bytes,
+ fc->async_read);
}
static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
index 4504f4633f17..dc8ac7e370a5 100644
--- a/fs/iomap/bio.c
+++ b/fs/iomap/bio.c
@@ -78,14 +78,24 @@ u32 iomap_finish_ioend_buffered_read(struct iomap_ioend *ioend)
return __iomap_read_end_io(&ioend->io_bio, ioend->io_error);
}
-static void iomap_bio_submit_read(const struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx)
+void iomap_bio_submit_read_endio(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, bio_end_io_t end_io)
{
struct bio *bio = ctx->read_ctx;
+ bio->bi_end_io = end_io;
if (iter->iomap.flags & IOMAP_F_INTEGRITY)
fs_bio_integrity_alloc(bio);
submit_bio(bio);
+
+ ctx->read_ctx = NULL;
+}
+EXPORT_SYMBOL_GPL(iomap_bio_submit_read_endio);
+
+static void iomap_bio_submit_read(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx)
+{
+ return iomap_bio_submit_read_endio(iter, ctx, iomap_read_end_io);
}
static struct bio_set *iomap_read_bio_set(struct iomap_read_folio_ctx *ctx)
@@ -127,7 +137,6 @@ static void iomap_read_alloc_bio(const struct iomap_iter *iter,
if (ctx->rac)
bio->bi_opf |= REQ_RAHEAD;
bio->bi_iter.bi_sector = iomap_sector(iomap, iter->pos);
- bio->bi_end_io = iomap_read_end_io;
bio_add_folio_nofail(bio, folio, plen,
offset_in_folio(folio, iter->pos));
ctx->read_ctx = bio;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8d4806dc46d4..276720bc18dc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -642,12 +642,12 @@ void iomap_read_folio(const struct iomap_ops *ops,
fsverity_readahead(ctx->vi, folio->index,
folio_nr_pages(folio));
- while ((ret = iomap_iter(&iter, ops)) > 0)
+ while ((ret = iomap_iter(&iter, ops)) > 0) {
iter.status = iomap_read_folio_iter(&iter, ctx,
&bytes_submitted);
-
- if (ctx->read_ctx && ctx->ops->submit_read)
- ctx->ops->submit_read(&iter, ctx);
+ if (ctx->read_ctx && ctx->ops->submit_read)
+ ctx->ops->submit_read(&iter, ctx);
+ }
if (ctx->cur_folio)
iomap_read_end(ctx->cur_folio, bytes_submitted);
@@ -718,12 +718,12 @@ void iomap_readahead(const struct iomap_ops *ops,
fsverity_readahead(ctx->vi, readahead_index(rac),
readahead_count(rac));
- while (iomap_iter(&iter, ops) > 0)
+ while (iomap_iter(&iter, ops) > 0) {
iter.status = iomap_readahead_iter(&iter, ctx,
&cur_bytes_submitted);
-
- if (ctx->read_ctx && ctx->ops->submit_read)
- ctx->ops->submit_read(&iter, ctx);
+ if (ctx->read_ctx && ctx->ops->submit_read)
+ ctx->ops->submit_read(&iter, ctx);
+ }
if (ctx->cur_folio)
iomap_read_end(ctx->cur_folio, cur_bytes_submitted);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b485e3b191da..e2cd5f92babe 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -369,7 +369,7 @@ static ssize_t iomap_dio_bio_iter_one(struct iomap_iter *iter,
*/
if ((op & REQ_ATOMIC) && WARN_ON_ONCE(ret != iomap_length(iter))) {
ret = -EINVAL;
- goto out_put_bio;
+ goto out_bio_release_pages;
}
if (iter->iomap.flags & IOMAP_F_INTEGRITY) {
@@ -393,6 +393,11 @@ static ssize_t iomap_dio_bio_iter_one(struct iomap_iter *iter,
iomap_dio_submit_bio(iter, dio, bio, pos);
return ret;
+out_bio_release_pages:
+ if (dio->flags & IOMAP_DIO_BOUNCE)
+ bio_iov_iter_unbounce(bio, true, false);
+ else
+ bio_release_pages(bio, false);
out_put_bio:
bio_put(bio);
return ret;
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index f7c3e0c70fd7..0565328764c1 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -298,8 +298,12 @@ new_ioend:
* appending writes.
*/
ioend->io_size += map_len;
- if (ioend->io_offset + ioend->io_size > end_pos)
- ioend->io_size = end_pos - ioend->io_offset;
+ if (ioend->io_offset + ioend->io_size > end_pos) {
+ if (ioend->io_offset >= end_pos)
+ ioend->io_size = 0;
+ else
+ ioend->io_size = end_pos - ioend->io_offset;
+ }
wbc_account_cgroup_owner(wpc->wbc, folio, map_len);
return map_len;
diff --git a/fs/minix/minix.h b/fs/minix/minix.h
index f2025c9b5825..9e52d4302f0d 100644
--- a/fs/minix/minix.h
+++ b/fs/minix/minix.h
@@ -97,7 +97,7 @@ static inline struct minix_inode_info *minix_i(struct inode *inode)
static inline unsigned minix_blocks_needed(unsigned bits, unsigned blocksize)
{
- return DIV_ROUND_UP(bits, blocksize * 8);
+ return DIV_ROUND_UP_POW2(bits, blocksize * 8);
}
#if defined(CONFIG_MINIX_FS_NATIVE_ENDIAN) && \
diff --git a/fs/namei.c b/fs/namei.c
index 5cc9f0f466b8..19ce43c9a6e6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4736,6 +4736,10 @@ int vfs_tmpfile(struct mnt_idmap *idmap,
int error;
int open_flag = file->f_flags;
+ /* A tmpfile is I_LINKABLE, so guard its owner like may_o_create(). */
+ if (!fsuidgid_has_mapping(dir->i_sb, idmap))
+ return -EOVERFLOW;
+
/* we want directory to be writable */
error = inode_permission(idmap, dir, MAY_WRITE | MAY_EXEC);
if (error)
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 76d0f6a29aba..24a8a5418e31 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -659,7 +659,7 @@ retry:
* within the cache granule containing the EOF, in which case we need
* to preload the granule.
*/
- if (!netfs_is_cache_enabled(ctx) &&
+ if (!netfs_is_cache_maybe_enabled(ctx) &&
netfs_skip_folio_read(folio, pos, len, false)) {
netfs_stat(&netfs_n_rh_write_zskip);
goto have_folio_no_wait;
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 6bde3320bcec..2cdb68e6b16f 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -277,7 +277,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
* caching service temporarily because the backing store got
* culled.
*/
- if (netfs_is_cache_enabled(ctx)) {
+ if (netfs_is_cache_maybe_enabled(ctx)) {
if (finfo) {
netfs_stat(&netfs_n_wh_wstream_conflict);
goto flush_content;
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 25f8ceb15fad..c16fbad286a1 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -166,13 +166,16 @@ static int netfs_unbuffered_write(struct netfs_io_request *wreq)
*/
subreq->error = -EAGAIN;
trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
- if (subreq->transferred > 0)
+ if (subreq->transferred > 0) {
iov_iter_advance(&wreq->buffer.iter, subreq->transferred);
+ wreq->transferred += subreq->transferred;
+ }
if (stream->source == NETFS_UPLOAD_TO_SERVER &&
wreq->netfs_ops->retry_request)
wreq->netfs_ops->retry_request(wreq, stream);
+ __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
__clear_bit(NETFS_SREQ_BOUNDARY, &subreq->flags);
__clear_bit(NETFS_SREQ_FAILED, &subreq->flags);
@@ -186,17 +189,10 @@ static int netfs_unbuffered_write(struct netfs_io_request *wreq)
netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
- if (stream->prepare_write) {
+ if (stream->prepare_write)
stream->prepare_write(subreq);
- __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
- netfs_stat(&netfs_n_wh_retry_write_subreq);
- } else {
- struct iov_iter source;
-
- netfs_reset_iter(subreq);
- source = subreq->io_iter;
- netfs_reissue_write(stream, subreq, &source);
- }
+ __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
+ netfs_stat(&netfs_n_wh_retry_write_subreq);
}
netfs_unbuffered_write_done(wreq);
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 645996ecfc80..d889caa401dc 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -239,6 +239,18 @@ static inline bool netfs_is_cache_enabled(struct netfs_inode *ctx)
#endif
}
+static inline bool netfs_is_cache_maybe_enabled(struct netfs_inode *ctx)
+{
+#if IS_ENABLED(CONFIG_FSCACHE)
+ struct fscache_cookie *cookie = ctx->cache;
+
+ return fscache_cookie_valid(cookie) &&
+ test_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags);
+#else
+ return false;
+#endif
+}
+
/*
* Get a ref on a netfs group attached to a dirty page (e.g. a ceph snap).
*/
diff --git a/fs/netfs/locking.c b/fs/netfs/locking.c
index 2249ecd09d0a..4e3be2b81504 100644
--- a/fs/netfs/locking.c
+++ b/fs/netfs/locking.c
@@ -9,6 +9,11 @@
#include <linux/netfs.h>
#include "internal.h"
+struct netfs_wb_waiter {
+ struct list_head link; /* Link in ictx->wb_queue */
+ struct task_struct *waiter; /* Waiter task; cleared when lock granted */
+};
+
/*
* inode_dio_wait_interruptible - wait for outstanding DIO requests to finish
* @inode: inode to wait for
@@ -203,3 +208,93 @@ void netfs_end_io_direct(struct inode *inode)
up_read(&inode->i_rwsem);
}
EXPORT_SYMBOL(netfs_end_io_direct);
+
+/*
+ * Wait to have exclusive access to writeback.
+ */
+static bool netfs_wb_begin_wait(struct netfs_inode *ictx)
+{
+ struct netfs_wb_waiter waiter = {};
+ struct task_struct *tsk = current;
+ bool got = false;
+
+ spin_lock(&ictx->lock);
+
+ if (test_and_set_bit_lock(NETFS_ICTX_WB_LOCK, &ictx->flags)) {
+ get_task_struct(tsk);
+ waiter.waiter = tsk;
+ list_add_tail(&waiter.link, &ictx->wb_queue);
+ } else {
+ got = true;
+ }
+ spin_unlock(&ictx->lock);
+
+ if (!got) {
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ /* Read waiter before accessing inode state. */
+ if (smp_load_acquire(&waiter.waiter) == NULL)
+ break;
+ schedule();
+ }
+ }
+ __set_current_state(TASK_RUNNING);
+ return true;
+}
+
+/**
+ * netfs_wb_begin - Begin writeback, waiting if need be
+ * @ictx: The inode to get writeback access on
+ * @nowait: Return failure immediately rather than waiting if true
+ *
+ * Begin writeback to an inode, waiting for exclusive access if @nowait is
+ * false. This prevents collection from being done out of order with respect
+ * to the issuance of write subrequests.
+ *
+ * Note that writeback may be ended in a different process (e.g. the collection
+ * function on a workqueue) from the one that started it.
+ *
+ * Return: True if the caller may proceed, false if denied.
+ */
+bool netfs_wb_begin(struct netfs_inode *ictx, bool nowait)
+{
+ if (!test_and_set_bit_lock(NETFS_ICTX_WB_LOCK, &ictx->flags))
+ return true;
+ if (nowait) {
+ netfs_stat(&netfs_n_wb_lock_skip);
+ return false;
+ }
+ netfs_stat(&netfs_n_wb_lock_wait);
+ return netfs_wb_begin_wait(ictx);
+}
+EXPORT_SYMBOL(netfs_wb_begin);
+
+/* netfs_wb_end - End writeback
+ * @ictx: The inode we have writeback access to
+ *
+ * End writeback access on an inode, waking up the next writeback request.
+ */
+void netfs_wb_end(struct netfs_inode *ictx)
+{
+ struct netfs_wb_waiter *waiter;
+ struct task_struct *tsk;
+
+ WARN_ON_ONCE(!test_bit(NETFS_ICTX_WB_LOCK, &ictx->flags));
+
+ spin_lock(&ictx->lock);
+
+ waiter = list_first_entry_or_null(&ictx->wb_queue, struct netfs_wb_waiter, link);
+ if (waiter) {
+ list_del(&waiter->link);
+ tsk = waiter->waiter;
+ /* Write inode state before clearing waiter. */
+ smp_store_release(&waiter->waiter, NULL);
+ wake_up_process(tsk);
+ put_task_struct(tsk);
+ } else {
+ clear_bit_unlock(NETFS_ICTX_WB_LOCK, &ictx->flags);
+ }
+
+ spin_unlock(&ictx->lock);
+}
+EXPORT_SYMBOL(netfs_wb_end);
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index f59a70f3a086..2b42758e01ec 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -98,7 +98,12 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
goto abandon;
}
- list_for_each_continue(next, &stream->subrequests) {
+ for (;;) {
+ /* Read pointer to subreq before reading subreq state. */
+ next = smp_load_acquire(&next->next);
+ if (next == &stream->subrequests)
+ break;
+
subreq = list_entry(next, struct netfs_io_subrequest, rreq_link);
if (subreq->start + subreq->transferred != start + len ||
test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags) ||
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 24fc2bb2f8a4..210eb8f3958d 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -408,6 +408,16 @@ bool netfs_write_collection(struct netfs_io_request *wreq)
netfs_wake_rreq_flag(wreq, NETFS_RREQ_IN_PROGRESS, netfs_rreq_trace_wake_ip);
/* As we cleared NETFS_RREQ_IN_PROGRESS, we acquired its ref. */
+ switch (wreq->origin) {
+ case NETFS_WRITEBACK:
+ case NETFS_WRITEBACK_SINGLE:
+ case NETFS_WRITETHROUGH:
+ netfs_wb_end(ictx);
+ break;
+ default:
+ break;
+ }
+
if (wreq->iocb) {
size_t written = min(wreq->transferred, wreq->len);
wreq->iocb->ki_pos += written;
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index c03c7cc45e47..f2761c99795a 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -106,7 +106,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
_enter("R=%x", wreq->debug_id);
ictx = netfs_inode(wreq->inode);
- if (is_cacheable && netfs_is_cache_enabled(ictx))
+ if (is_cacheable)
fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
if (rolling_buffer_init(&wreq->buffer, wreq->debug_id, ITER_SOURCE) < 0)
goto nomem;
@@ -551,14 +551,8 @@ int netfs_writepages(struct address_space *mapping,
struct folio *folio;
int error = 0;
- if (!mutex_trylock(&ictx->wb_lock)) {
- if (wbc->sync_mode == WB_SYNC_NONE) {
- netfs_stat(&netfs_n_wb_lock_skip);
- return 0;
- }
- netfs_stat(&netfs_n_wb_lock_wait);
- mutex_lock(&ictx->wb_lock);
- }
+ if (!netfs_wb_begin(ictx, wbc->sync_mode == WB_SYNC_NONE))
+ return 0;
/* Need the first folio to be able to set up the op. */
folio = writeback_iter(mapping, wbc, NULL, &error);
@@ -588,13 +582,13 @@ int netfs_writepages(struct address_space *mapping,
}
error = netfs_write_folio(wreq, wbc, folio);
- if (error < 0)
- break;
+ if (error == -ENOMEM) {
+ folio_redirty_for_writepage(wbc, folio);
+ folio_unlock(folio);
+ }
} while ((folio = writeback_iter(mapping, wbc, folio, &error)));
netfs_end_issue_write(wreq);
-
- mutex_unlock(&ictx->wb_lock);
netfs_wake_collector(wreq);
netfs_put_request(wreq, netfs_rreq_trace_put_return);
@@ -602,9 +596,16 @@ int netfs_writepages(struct address_space *mapping,
return error;
couldnt_start:
- netfs_kill_dirty_pages(mapping, wbc, folio);
+ if (error == -ENOMEM) {
+ folio_redirty_for_writepage(wbc, folio);
+ folio_unlock(folio);
+ folio = writeback_iter(mapping, wbc, folio, &error);
+ WARN_ON_ONCE(folio != NULL);
+ } else {
+ netfs_kill_dirty_pages(mapping, wbc, folio);
+ }
out:
- mutex_unlock(&ictx->wb_lock);
+ netfs_wb_end(ictx);
_leave(" = %d", error);
return error;
}
@@ -618,16 +619,17 @@ struct netfs_io_request *netfs_begin_writethrough(struct kiocb *iocb, size_t len
struct netfs_io_request *wreq = NULL;
struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
- mutex_lock(&ictx->wb_lock);
+ netfs_wb_begin(ictx, false);
wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
iocb->ki_pos, NETFS_WRITETHROUGH);
if (IS_ERR(wreq)) {
- mutex_unlock(&ictx->wb_lock);
+ netfs_wb_end(ictx);
return wreq;
}
wreq->io_streams[0].avail = true;
+ __set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &wreq->flags);
trace_netfs_write(wreq, netfs_write_trace_writethrough);
return wreq;
}
@@ -685,7 +687,6 @@ int netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_c
ssize_t netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
struct folio *writethrough_cache)
{
- struct netfs_inode *ictx = netfs_inode(wreq->inode);
ssize_t ret;
_enter("R=%x", wreq->debug_id);
@@ -699,8 +700,6 @@ ssize_t netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_c
netfs_end_issue_write(wreq);
- mutex_unlock(&ictx->wb_lock);
-
if (wreq->iocb)
ret = -EIOCBQUEUED;
else
@@ -847,15 +846,10 @@ int netfs_writeback_single(struct address_space *mapping,
if (WARN_ON_ONCE(!iov_iter_is_folioq(iter)))
return -EIO;
- if (!mutex_trylock(&ictx->wb_lock)) {
- if (wbc->sync_mode == WB_SYNC_NONE) {
- /* The VFS will have undirtied the inode. */
- netfs_single_mark_inode_dirty(&ictx->inode);
- netfs_stat(&netfs_n_wb_lock_skip);
- return 1;
- }
- netfs_stat(&netfs_n_wb_lock_wait);
- mutex_lock(&ictx->wb_lock);
+ if (!netfs_wb_begin(ictx, wbc->sync_mode == WB_SYNC_NONE)) {
+ /* The VFS will have undirtied the inode. */
+ netfs_single_mark_inode_dirty(&ictx->inode);
+ return 1;
}
wreq = netfs_create_write_req(mapping, NULL, 0, NETFS_WRITEBACK_SINGLE);
@@ -893,7 +887,6 @@ stop:
smp_wmb(); /* Write lists before ALL_QUEUED. */
set_bit(NETFS_RREQ_ALL_QUEUED, &wreq->flags);
- mutex_unlock(&ictx->wb_lock);
netfs_wake_collector(wreq);
netfs_put_request(wreq, netfs_rreq_trace_put_return);
@@ -901,7 +894,7 @@ stop:
return ret;
couldnt_start:
- mutex_unlock(&ictx->wb_lock);
+ netfs_wb_end(ictx);
_leave(" = %d", ret);
return ret;
}
diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index 32735abfa03f..058bc7a166a5 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -72,7 +72,12 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
!test_bit(NETFS_SREQ_NEED_RETRY, &from->flags))
return;
- list_for_each_continue(next, &stream->subrequests) {
+ for (;;) {
+ /* Read pointer to subreq before reading subreq state. */
+ next = smp_load_acquire(&next->next);
+ if (next == &stream->subrequests)
+ break;
+
subreq = list_entry(next, struct netfs_io_subrequest, rreq_link);
if (subreq->start + subreq->transferred != start + len ||
test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags) ||
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 1fbf832ad165..f2bb56506046 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -38,11 +38,9 @@ static void ntfs_iomap_read_end_io(struct bio *bio)
}
static void ntfs_iomap_bio_submit_read(const struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx)
{
- struct bio *bio = ctx->read_ctx;
- bio->bi_end_io = ntfs_iomap_read_end_io;
- submit_bio(bio);
+ iomap_bio_submit_read_endio(iter, ctx, ntfs_iomap_read_end_io);
}
static const struct iomap_read_ops ntfs_iomap_bio_read_ops = {
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index c43101cc064d..0c9bd669117d 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -608,10 +608,7 @@ static void ntfs_iomap_read_end_io(struct bio *bio)
static void ntfs_iomap_bio_submit_read(const struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx)
{
- struct bio *bio = ctx->read_ctx;
-
- bio->bi_end_io = ntfs_iomap_read_end_io;
- submit_bio(bio);
+ iomap_bio_submit_read_endio(iter, ctx, ntfs_iomap_read_end_io);
}
static const struct iomap_read_ops ntfs_iomap_bio_read_ops = {
diff --git a/fs/orangefs/dir.c b/fs/orangefs/dir.c
index 6e2ebc8b9867..115b2c2f5269 100644
--- a/fs/orangefs/dir.c
+++ b/fs/orangefs/dir.c
@@ -191,7 +191,8 @@ static int fill_from_part(struct orangefs_dir_part *part,
{
const int offset = sizeof(struct orangefs_readdir_response_s);
struct orangefs_khandle *khandle;
- __u32 *len, padlen;
+ __u32 *len;
+ u64 padlen;
loff_t i;
char *s;
i = ctx->pos & ~PART_MASK;
@@ -215,8 +216,8 @@ static int fill_from_part(struct orangefs_dir_part *part,
* len is the size of the string itself. padlen is the
* total size of the encoded string.
*/
- padlen = (sizeof *len + *len + 1) +
- (8 - (sizeof *len + *len + 1)%8)%8;
+ padlen = (u64)sizeof *len + *len + 1;
+ padlen += (8 - padlen % 8) % 8;
if (part->len < i + padlen + sizeof *khandle)
goto next;
s = (void *)part + offset + i + sizeof *len;
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 13cb60b52bd6..e963701b4c87 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -853,7 +853,7 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
{
struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
struct inode *udir = d_inode(c->destdir);
- struct dentry *temp, *upper;
+ struct dentry *temp, *upper, *newdentry = NULL;
struct file *tmpfile;
int err;
@@ -889,6 +889,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
err = PTR_ERR(upper);
if (!IS_ERR(upper)) {
err = ovl_do_link(ofs, temp, udir, upper);
+ if (!err) {
+ /*
+ * Record the linked dentry -- not the disconnected
+ * O_TMPFILE dentry -- so that ->d_revalidate() on
+ * the upper fs sees the real parent/name.
+ */
+ newdentry = dget(upper);
+ }
end_creating(upper);
}
@@ -903,7 +911,7 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
if (!c->metacopy)
ovl_set_upperdata(d_inode(c->dentry));
- ovl_inode_update(d_inode(c->dentry), dget(temp));
+ ovl_inode_update(d_inode(c->dentry), newdentry);
out:
ovl_end_write(c->dentry);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 00c69707bda9..bc71231cad53 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -783,8 +783,8 @@ static const struct address_space_operations ovl_aops = {
*
* This chain is valid:
* - inode->i_rwsem (inode_lock[2])
- * - upper_mnt->mnt_sb->s_writers (ovl_want_write[0])
* - OVL_I(inode)->lock (ovl_inode_lock[2])
+ * - upper_mnt->mnt_sb->s_writers (ovl_want_write[0])
* - OVL_I(lowerinode)->lock (ovl_inode_lock[1])
*
* And this chain is valid:
@@ -797,8 +797,8 @@ static const struct address_space_operations ovl_aops = {
* held, because it is in reverse order of the non-nested case using the same
* upper fs:
* - inode->i_rwsem (inode_lock[1])
- * - upper_mnt->mnt_sb->s_writers (ovl_want_write[0])
* - OVL_I(inode)->lock (ovl_inode_lock[1])
+ * - upper_mnt->mnt_sb->s_writers (ovl_want_write[0])
*/
#define OVL_MAX_NESTING FILESYSTEM_MAX_STACK_DEPTH
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index adc9b9a092b0..26086a283672 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -112,6 +112,8 @@ static bool pde_subdir_insert(struct proc_dir_entry *dir,
/* Add new node and rebalance tree. */
rb_link_node(&de->subdir_node, parent, new);
rb_insert_color(&de->subdir_node, root);
+ if (S_ISDIR(de->mode))
+ dir->nlink++;
return true;
}
@@ -404,7 +406,6 @@ struct proc_dir_entry *proc_register(struct proc_dir_entry *dir,
write_unlock(&proc_subdir_lock);
goto out_free_inum;
}
- dir->nlink++;
write_unlock(&proc_subdir_lock);
return dp;
@@ -706,6 +707,8 @@ static void pde_erase(struct proc_dir_entry *pde, struct proc_dir_entry *parent)
{
rb_erase(&pde->subdir_node, &parent->subdir);
RB_CLEAR_NODE(&pde->subdir_node);
+ if (S_ISDIR(pde->mode))
+ parent->nlink--;
}
/*
@@ -731,8 +734,6 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
de = NULL;
} else {
pde_erase(de, parent);
- if (S_ISDIR(de->mode))
- parent->nlink--;
}
}
write_unlock(&proc_subdir_lock);
@@ -791,8 +792,6 @@ int remove_proc_subtree(const char *name, struct proc_dir_entry *parent)
continue;
}
next = de->parent;
- if (S_ISDIR(de->mode))
- next->nlink--;
write_unlock(&proc_subdir_lock);
proc_entry_rundown(de);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2a0c54256e93..51293b6f331f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -764,8 +764,7 @@ xfs_bio_submit_read(
/* defer read completions to the ioend workqueue */
iomap_init_ioend(iter->inode, bio, ctx->read_ctx_file_offset, 0);
- bio->bi_end_io = xfs_end_bio;
- submit_bio(bio);
+ iomap_bio_submit_read_endio(iter, ctx, xfs_end_bio);
}
static const struct iomap_read_ops xfs_iomap_read_ops = {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index eac7f9503805..8531d526fc44 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -534,8 +534,11 @@ xfs_open_devices(
out_free_rtdev_targ:
if (mp->m_rtdev_targp)
xfs_free_buftarg(mp->m_rtdev_targp);
+ mp->m_rtdev_targp = NULL;
+ rtdev_file = NULL; /* released by xfs_free_buftarg() */
out_free_ddev_targ:
xfs_free_buftarg(mp->m_ddev_targp);
+ mp->m_ddev_targp = NULL;
out_close_rtdev:
if (rtdev_file)
bdev_fput(rtdev_file);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 3582ed1fe236..56b43d594e6e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -622,6 +622,8 @@ extern struct bio_set iomap_ioend_bioset;
#ifdef CONFIG_BLOCK
int iomap_bio_read_folio_range(const struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx, size_t plen);
+void iomap_bio_submit_read_endio(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, bio_end_io_t end_io);
extern const struct iomap_read_ops iomap_bio_read_ops;
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 243c0f737938..1bc120d61c5b 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -61,14 +61,16 @@ struct netfs_inode {
#if IS_ENABLED(CONFIG_FSCACHE)
struct fscache_cookie *cache;
#endif
- struct mutex wb_lock; /* Writeback serialisation */
+ struct list_head wb_queue; /* Queue of processes wanting to do writeback */
loff_t _remote_i_size; /* Size of the remote file */
loff_t _zero_point; /* Size after which we assume there's no data
* on the server */
+ spinlock_t lock; /* Lock covering wb_queue */
atomic_t io_count; /* Number of outstanding reqs */
unsigned long flags;
#define NETFS_ICTX_ODIRECT 0 /* The file has DIO in progress */
#define NETFS_ICTX_UNBUFFERED 1 /* I/O should not use the pagecache */
+#define NETFS_ICTX_WB_LOCK 2 /* Writeback serialisation lock */
#define NETFS_ICTX_MODIFIED_ATTR 3 /* Indicate change in mtime/ctime */
#define NETFS_ICTX_SINGLE_NO_UPLOAD 4 /* Monolithic payload, cache but no upload */
};
@@ -462,6 +464,10 @@ int netfs_alloc_folioq_buffer(struct address_space *mapping,
size_t *_cur_size, ssize_t size, gfp_t gfp);
void netfs_free_folioq_buffer(struct folio_queue *fq);
+/* Writeback exclusion API. */
+bool netfs_wb_begin(struct netfs_inode *ictx, bool nowait);
+void netfs_wb_end(struct netfs_inode *ictx);
+
/**
* netfs_inode - Get the netfs inode context from the inode
* @inode: The inode to query
@@ -743,7 +749,8 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
#if IS_ENABLED(CONFIG_FSCACHE)
ctx->cache = NULL;
#endif
- mutex_init(&ctx->wb_lock);
+ INIT_LIST_HEAD(&ctx->wb_queue);
+ spin_lock_init(&ctx->lock);
/* ->releasepage() drives zero_point */
if (use_zero_point) {
ctx->_zero_point = ctx->_remote_i_size;
@@ -753,7 +760,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
/**
* netfs_resize_file - Note that a file got resized
- * @ctx: The netfs inode being resized
+ * @ictx: The netfs inode being resized
* @new_i_size: The new file size
* @changed_on_server: The change was applied to the server
*
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 273919b16161..c2484551a4e8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1568,6 +1568,7 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
struct folio *folio;
unsigned int nr = 0, offset;
loff_t pos = i->xarray_start + i->iov_offset;
+ bool will_alloc = !*pages;
XA_STATE(xas, i->xarray, pos >> PAGE_SHIFT);
offset = pos & ~PAGE_MASK;
@@ -1595,6 +1596,14 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
}
rcu_read_unlock();
+ if (!nr) {
+ if (will_alloc) {
+ kvfree(*pages);
+ *pages = NULL;
+ }
+ return 0;
+ }
+
maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
iov_iter_advance(i, maxsize);
return maxsize;
@@ -1628,6 +1637,8 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
bi.bi_bvec_done = skip;
maxpages = want_pages_array(pages, maxsize, skip, maxpages);
+ if (!maxpages)
+ return -ENOMEM;
while (bi.bi_size && bi.bi_idx < i->nr_segs) {
struct bio_vec bv = bvec_iter_bvec(i->bvec, bi);
@@ -1745,6 +1756,7 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
unsigned long addr;
unsigned int gup_flags = 0;
size_t offset;
+ bool will_alloc = !*pages;
int res;
if (i->data_source == ITER_DEST)
@@ -1761,8 +1773,14 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
if (!maxpages)
return -ENOMEM;
res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages);
- if (unlikely(res <= 0))
+ if (unlikely(res <= 0)) {
+ if (will_alloc) {
+ kvfree(*pages);
+ *pages = NULL;
+ }
return res;
+ }
+
maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
iov_iter_advance(i, maxsize);
return maxsize;
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index b7fe91ef35b8..6ea40d2e6247 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1366,6 +1366,7 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
sg_max--;
maxsize -= len;
+ start += len;
ret += len;
if (maxsize <= 0 || sg_max == 0)
break;
diff --git a/lib/tests/kunit_iov_iter.c b/lib/tests/kunit_iov_iter.c
index 1e6fce9cb255..d9690ba1db88 100644
--- a/lib/tests/kunit_iov_iter.c
+++ b/lib/tests/kunit_iov_iter.c
@@ -283,7 +283,7 @@ static void __init iov_kunit_copy_to_bvec(struct kunit *test)
struct page **spages, **bpages;
u8 *scratch, *buffer;
size_t bufsize, npages, size, copied;
- int i, b, patt;
+ int i, patt;
bufsize = 0x100000;
npages = bufsize / PAGE_SIZE;
@@ -306,10 +306,9 @@ static void __init iov_kunit_copy_to_bvec(struct kunit *test)
KUNIT_EXPECT_EQ(test, iter.nr_segs, 0);
/* Build the expected image in the scratch buffer. */
- b = 0;
patt = 0;
memset(scratch, 0, bufsize);
- for (pr = bvec_test_ranges; pr->from >= 0; pr++, b++) {
+ for (pr = bvec_test_ranges; pr->from >= 0; pr++) {
u8 *p = scratch + pr->page * PAGE_SIZE;
for (i = pr->from; i < pr->to; i++)
diff --git a/tools/testing/selftests/filesystems/.gitignore b/tools/testing/selftests/filesystems/.gitignore
index 64ac0dfa46b7..a78f894157de 100644
--- a/tools/testing/selftests/filesystems/.gitignore
+++ b/tools/testing/selftests/filesystems/.gitignore
@@ -5,3 +5,4 @@ fclog
file_stressor
anon_inode_test
kernfs_test
+idmapped_tmpfile
diff --git a/tools/testing/selftests/filesystems/Makefile b/tools/testing/selftests/filesystems/Makefile
index 85427d7f19b9..a7ec2ba2dd83 100644
--- a/tools/testing/selftests/filesystems/Makefile
+++ b/tools/testing/selftests/filesystems/Makefile
@@ -2,6 +2,10 @@
CFLAGS += $(KHDR_INCLUDES)
TEST_GEN_PROGS := devpts_pts file_stressor anon_inode_test kernfs_test fclog
+TEST_GEN_PROGS += idmapped_tmpfile
TEST_GEN_PROGS_EXTENDED := dnotify_test
include ../lib.mk
+
+$(OUTPUT)/idmapped_tmpfile: LDLIBS += -lcap
+$(OUTPUT)/idmapped_tmpfile: utils.c
diff --git a/tools/testing/selftests/filesystems/idmapped_tmpfile.c b/tools/testing/selftests/filesystems/idmapped_tmpfile.c
new file mode 100644
index 000000000000..bc411ab8281e
--- /dev/null
+++ b/tools/testing/selftests/filesystems/idmapped_tmpfile.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <sched.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/fsuid.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+
+#include <linux/mount.h>
+#include <linux/types.h>
+
+#include "kselftest_harness.h"
+#include "wrappers.h"
+#include "utils.h"
+
+/*
+ * The test mount maps caller-visible ids [0, MAP_RANGE) onto the on-disk range
+ * [MAP_HOST, MAP_HOST + MAP_RANGE). An id outside [0, MAP_RANGE) therefore has
+ * no mapping in the mount and is not representable in the filesystem.
+ */
+#define MAP_HOST 10000
+#define MAP_RANGE 10000
+#define UNMAPPED 50000
+
+#ifndef MOUNT_ATTR_IDMAP
+#define MOUNT_ATTR_IDMAP 0x00100000
+#endif
+
+#ifndef __NR_mount_setattr
+#define __NR_mount_setattr 442
+#endif
+
+static inline int sys_mount_setattr(int dfd, const char *path,
+ unsigned int flags,
+ struct mount_attr *attr, size_t size)
+{
+ return syscall(__NR_mount_setattr, dfd, path, flags, attr, size);
+}
+
+/*
+ * Clone @path into a detached mount idmapped so that caller-visible ids
+ * [0, MAP_RANGE) map onto the on-disk ids [MAP_HOST, MAP_HOST + MAP_RANGE).
+ * Returns the mount fd, or -1 if idmapped mounts are not available.
+ */
+static int idmapped_clone(const char *path)
+{
+ struct mount_attr attr = {
+ .attr_set = MOUNT_ATTR_IDMAP,
+ };
+ int fd_tree, userns_fd, ret;
+
+ fd_tree = sys_open_tree(AT_FDCWD, path,
+ OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
+ if (fd_tree < 0)
+ return -1;
+
+ userns_fd = get_userns_fd(MAP_HOST, 0, MAP_RANGE);
+ if (userns_fd < 0) {
+ close(fd_tree);
+ return -1;
+ }
+
+ attr.userns_fd = userns_fd;
+ ret = sys_mount_setattr(fd_tree, "", AT_EMPTY_PATH, &attr, sizeof(attr));
+ close(userns_fd);
+ if (ret) {
+ close(fd_tree);
+ return -1;
+ }
+
+ return fd_tree;
+}
+
+FIXTURE(idmapped_tmpfile) {
+ char dir[64]; /* non-idmapped path to the test directory */
+};
+
+FIXTURE_SETUP(idmapped_tmpfile)
+{
+ /* Private mount namespace so test mounts need no cleanup. */
+ ASSERT_EQ(unshare(CLONE_NEWNS), 0);
+ ASSERT_EQ(sys_mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL), 0);
+ ASSERT_EQ(sys_mount("tmpfs", "/tmp", "tmpfs", 0, NULL), 0);
+
+ snprintf(self->dir, sizeof(self->dir), "/tmp/d");
+ ASSERT_EQ(mkdir(self->dir, 0777), 0);
+ /* World-writable so an unmapped caller still passes permission(). */
+ ASSERT_EQ(chmod(self->dir, 0777), 0);
+}
+
+FIXTURE_TEARDOWN(idmapped_tmpfile)
+{
+}
+
+/*
+ * A caller whose fsuid/fsgid have no mapping in the idmapped mount must not be
+ * able to create an O_TMPFILE. Without the check in vfs_tmpfile() the inode
+ * would be created owned by (uid_t)-1 and could then be linked into the
+ * namespace.
+ */
+TEST_F(idmapped_tmpfile, unmapped_caller_is_refused)
+{
+ int mfd, fd;
+
+ mfd = idmapped_clone(self->dir);
+ if (mfd < 0)
+ SKIP(return, "idmapped mounts not supported");
+
+ /* Become a caller outside the mount's [0, MAP_RANGE) range. */
+ setfsgid(UNMAPPED);
+ setfsuid(UNMAPPED);
+ ASSERT_EQ(setfsuid(-1), UNMAPPED);
+
+ fd = openat(mfd, ".", O_TMPFILE | O_WRONLY, 0644);
+ ASSERT_LT(fd, 0);
+ EXPECT_EQ(errno, EOVERFLOW);
+ if (fd >= 0)
+ close(fd);
+
+ EXPECT_EQ(close(mfd), 0);
+}
+
+/*
+ * A mapped caller can create an O_TMPFILE and link it into the namespace; the
+ * ownership round-trips through the mount idmap. This is what makes refusing
+ * the unmapped case above necessary in the first place.
+ */
+TEST_F(idmapped_tmpfile, mapped_caller_creates_and_links)
+{
+ char path[PATH_MAX];
+ struct stat st;
+ int mfd, fd;
+
+ mfd = idmapped_clone(self->dir);
+ if (mfd < 0)
+ SKIP(return, "idmapped mounts not supported");
+
+ /* Caller is uid/gid 0, which maps to MAP_HOST through the mount. */
+ fd = openat(mfd, ".", O_TMPFILE | O_RDWR, 0600);
+ ASSERT_GE(fd, 0);
+
+ ASSERT_EQ(fstat(fd, &st), 0);
+ EXPECT_EQ(st.st_uid, 0);
+ EXPECT_EQ(st.st_gid, 0);
+
+ /* The tmpfile is linkable: give it a name in the directory. */
+ ASSERT_EQ(linkat(fd, "", mfd, "linked", AT_EMPTY_PATH), 0);
+ EXPECT_EQ(close(fd), 0);
+
+ ASSERT_EQ(fstatat(mfd, "linked", &st, 0), 0);
+ EXPECT_EQ(st.st_uid, 0);
+ EXPECT_EQ(st.st_gid, 0);
+
+ /* On the underlying, non-idmapped tmpfs it is stored as MAP_HOST. */
+ snprintf(path, sizeof(path), "%s/linked", self->dir);
+ ASSERT_EQ(stat(path, &st), 0);
+ EXPECT_EQ(st.st_uid, MAP_HOST);
+ EXPECT_EQ(st.st_gid, MAP_HOST);
+
+ EXPECT_EQ(close(mfd), 0);
+}
+
+TEST_HARNESS_MAIN
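
The two tests above depend on the id arithmetic described in the comment on MAP_HOST and MAP_RANGE. As a rough userspace sketch of that arithmetic only (not the kernel's implementation), the mapping can be modelled as below. The names idmap_to_disk(), idmap_from_disk(), idmap_selfcheck() and INVALID_ID are hypothetical and exist only for this illustration; the constants are copied from the test.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as MAP_HOST, MAP_RANGE and UNMAPPED in the selftest. */
#define MAP_HOST  10000u
#define MAP_RANGE 10000u
#define UNMAPPED  50000u

/* Hypothetical sentinel for "no mapping"; stands in for (uid_t)-1. */
#define INVALID_ID UINT32_MAX

/* Caller-visible id -> on-disk id, or INVALID_ID if outside [0, MAP_RANGE). */
static uint32_t idmap_to_disk(uint32_t id)
{
	return id < MAP_RANGE ? MAP_HOST + id : INVALID_ID;
}

/* On-disk id -> caller-visible id, or INVALID_ID if outside the mapped range. */
static uint32_t idmap_from_disk(uint32_t id)
{
	if (id < MAP_HOST || id - MAP_HOST >= MAP_RANGE)
		return INVALID_ID;
	return id - MAP_HOST;
}

/* Mirrors the expectations of the two tests in this simplified model. */
static void idmap_selfcheck(void)
{
	/* Mapped caller: uid 0 is stored as MAP_HOST and reads back as 0. */
	assert(idmap_to_disk(0) == MAP_HOST);
	assert(idmap_from_disk(MAP_HOST) == 0);

	/* The last id inside the range is still representable. */
	assert(idmap_to_disk(MAP_RANGE - 1) == MAP_HOST + MAP_RANGE - 1);

	/* Unmapped caller: no on-disk id exists, so creation must be refused. */
	assert(idmap_to_disk(UNMAPPED) == INVALID_ID);

	/* An on-disk id below MAP_HOST has no caller-visible counterpart. */
	assert(idmap_from_disk(0) == INVALID_ID);
}
```

In this model the mapped_caller_creates_and_links case corresponds to the 0 <-> MAP_HOST round trip, and the unmapped_caller_is_refused case corresponds to idmap_to_disk(UNMAPPED) having no valid result, which the test expects to surface as EOVERFLOW from openat().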