lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2025-01-29	cifs: Add mount option -o symlink= for choosing symlink create type	Pali Rohár
	Currently Linux CIFS client creates a new symlink of the first flavor which is allowed by mount options, parsed in this order: -o (no)mfsymlinks, -o (no)sfu, -o (no)unix (+ its aliases) and -o reparse=[type]. Introduce a new mount option -o symlink= for explicitly choosing a symlink flavor. Possible options are: -o symlink=default - The default behavior, like before this change. -o symlink=none - Disallow creating a new symlinks -o symlink=native - Create as native SMB symlink reparse point -o symlink=unix - Create via SMB1 unix extension command -o symlink=mfsymlinks - Create as regular file of mfsymlinks format -o symlink=sfu - Create as regular system file of SFU format -o symlink=nfs - Create as NFS reparse point -o symlink=wsl - Create as WSL reparse point So for example specifying -o sfu,mfsymlinks,symlink=native will allow to parse symlinks also of SFU and mfsymlinks types (which are disabled by default unless mount option is explicitly specified), but new symlinks will be created under native SMB type (which parsing is always enabled). Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Fix creating and resolving absolute NT-style symlinks	Pali Rohár
	If the SMB symlink is stored on NT server in absolute form then it points to the NT object hierarchy, which is different from POSIX one and needs some conversion / mapping. To make interoperability with Windows SMB server and WSL subsystem, reuse its logic of mapping between NT paths and POSIX paths into Linux SMB client. WSL subsystem on Windows uses for -t drvfs mount option -o symlinkroot= which specifies the POSIX path where are expected to be mounted lowercase Windows drive letters (without colon). Do same for Linux SMB client and add a new mount option -o symlinkroot= which mimics the drvfs mount option of the same name. It specifies where in the Linux VFS hierarchy is the root of the DOS / Windows drive letters, and translates between absolute NT-style symlinks and absolute Linux VFS symlinks. Default value of symlinkroot is "/mnt", same what is using WSL. Note that DOS / Windows drive letter symlinks are just subset of all possible NT-style symlinks. Drive letters live in NT subtree \??\ and important details about NT paths and object hierarchy are in the comments in this change. When symlink target location from non-POSIX SMB server is in absolute form (indicated by absence of SYMLINK_FLAG_RELATIVE) then it is converted to Linux absolute symlink according to symlinkroot configuration. And when creating a new symlink on non-POSIX SMB server in absolute form then Linux absolute target is converted to NT-style according to symlinkroot configuration. When SMB server is POSIX, then this change does not affect neither reading target location of symlink, nor creating a new symlink. It is expected that POSIX SMB server works with POSIX paths where the absolute root is /. This change improves interoperability of absolute SMB symlinks with Windows SMB servers. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Simplify reparse point check in cifs_query_path_info() function	Pali Rohár
	For checking if path is reparse point and setting data->reparse_point member, it is enough to check if ATTR_REPARSE is present. It is not required to call CIFS_open() without OPEN_REPARSE_POINT and checking for -EOPNOTSUPP error code. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove symlink member from cifs_open_info_data union	Pali Rohár
	Member 'symlink' is part of the union in struct cifs_open_info_data. Its value is assigned on few places, but is always read through another union member 'reparse_point'. So to make code more readable, always use only 'reparse_point' member and drop whole union structure. No function change. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Update description about ACL permissions	Pali Rohár
	There are some incorrect information about individual SMB permission constants like WRITE_DAC can change ownership, or incomplete information to distinguish between ACL types (discretionary vs system) and there is completely missing information how permissions apply for directory objects and what is meaning of GENERIC_* bits. Also there is missing constant for MAXIMUM_ALLOWED permission. Fix and extend description of all SMB permission constants to match the reality, how the reference Windows SMB / NTFS implementation handles them. Links to official Microsoft documentation related to permissions: https://learn.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants https://learn.microsoft.com/en-us/windows/win32/secauthz/access-mask https://learn.microsoft.com/en-us/windows/win32/secauthz/standard-access-rights https://learn.microsoft.com/en-us/windows/win32/secauthz/generic-access-rights https://learn.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntcreatefile Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move ↵	Pali Rohár
	to common/smb2pdu.h Function parse_reparse_posix() parses NFS-style reparse points, which are used only by Windows NFS server since Windows Server 2012 version. This style is not understood by Microsoft POSIX/Interix/SFU/SUA subsystems. So make it clear that parse_reparse_posix() function and reparse_posix_data structure are not POSIX general, but rather NFS specific. All reparse buffer structures are defined in common/smb2pdu.h and have _buffer suffix. So move struct reparse_posix_data from client/cifspdu.h to common/smb2pdu.h and rename it to reparse_nfs_data_buffer for consistency. Note that also SMB specification in [MS-FSCC] document, section 2.1.2.6 defines it under name "Network File System (NFS) Reparse Data Buffer". So use this name for consistency. Having this structure in common/smb2pdu.h can be useful for ksmbd server code as NFS-style reparse points is the preferred way for implementing support for special files. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove struct reparse_posix_data from struct cifs_open_info_data	Pali Rohár
	Linux SMB client already supports more reparse point types but only the reparse_posix_data is defined in union of struct cifs_open_info_data. This union is currently used as implicit casting between point types. With this code style, it hides information that union is used for pointer casting, and just in mknod_nfs() and posix_reparse_to_fattr() functions. Other reparse point buffers do not use this kind of casting. So remove reparse_posix_data from reparse part of struct cifs_open_info_data and for all cases of reparse buffer use just struct reparse_data_buffer *buf. Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove unicode parameter from parse_reparse_point() function	Pali Rohár
	This parameter is always true, so remove it and also remove dead code which is never called (for all false code paths). Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Fix getting and setting SACLs over SMB1	Pali Rohár
	SMB1 callback get_cifs_acl_by_fid() currently ignores its last argument and therefore ignores request for SACL_SECINFO. Fix this issue by correctly propagating info argument from get_cifs_acl() and get_cifs_acl_by_fid() to CIFSSMBGetCIFSACL() function and pass SACL_SECINFO when requested. For accessing SACLs it is needed to open object with SYSTEM_SECURITY access. Pass this flag when trying to get or set SACLs. Same logic is in the SMB2+ code path. This change fixes getting and setting of "system.cifs_ntsd_full" and "system.smb3_ntsd_full" xattrs over SMB1 as currently it silentely ignored SACL part of passed xattr buffer. Fixes: 3970acf7ddb9 ("SMB3: Add support for getting and setting SACLs") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove intermediate object of failed create SFU call	Pali Rohár
	Check if the server honored ATTR_SYSTEM flag by CREATE_OPTION_SPECIAL option. If not then server does not support ATTR_SYSTEM and newly created file is not SFU compatible, which means that the call failed. If CREATE was successful but either setting ATTR_SYSTEM failed or writing type/data information failed then remove the intermediate object created by CREATE. Otherwise intermediate empty object stay on the server. This ensures that if the creating of SFU files with system attribute is unsupported by the server then no empty file stay on the server as a result of unsupported operation. This is for example case with Samba server and Linux tmpfs storage without enabled xattr support (where Samba stores ATTR_SYSTEM bit). Cc: stable@vger.kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Validate EAs for WSL reparse points	Pali Rohár
	Major and minor numbers for char and block devices are mandatory for stat. So check that the WSL EA $LXDEV is present for WSL CHR and BLK reparse points. WSL reparse point tag determinate type of the file. But file type is present also in the WSL EA $LXMOD. So check that both file types are same. Fixes: 78e26bec4d6d ("smb: client: parse uid, gid, mode and dev from WSL reparse points") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM	Pali Rohár
	STATUS_PRIVILEGE_NOT_HELD indicates that user does not have privilege to issue some operation, for example to create symlink. Currently STATUS_PRIVILEGE_NOT_HELD is translated to -EIO. Change it to -EPERM which better describe this error code. Note that there is no ERR* code usable in ntstatus_to_dos_map[] table which can be used to -EPERM translation, so do explicit translation in map_smb_to_linux_error() function. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	Merge tag 'constfy-sysctl-6.14-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl table constification from Joel Granados: "All ctl_table declared outside of functions and that remain unmodified after initialization are const qualified. This prevents unintended modifications to proc_handler function pointers by placing them in the .rodata section. This is a continuation of the tree-wide effort started a few releases ago with the constification of the ctl_table struct arguments in the sysctl API done in 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers")" * tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: treewide: const qualify ctl_tables where applicable
2025-01-29	Merge tag 'fuse-update-6.14' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "Add support for io-uring communication between kernel and userspace using IORING_OP_URING_CMD (Bernd Schubert). Following features enable gains in performance compared to the regular interface: - Allow processing multiple requests with less syscall overhead - Combine commit of old and fetch of new fuse request - CPU/NUMA affinity of queues Patches were reviewed by several people, including Pavel Begunkov, io-uring co-maintainer" * tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: prevent disabling io-uring on active connections fuse: enable fuse-over-io-uring fuse: block request allocation until io-uring init is complete fuse: {io-uring} Prevent mount point hang on fuse-server termination fuse: Allow to queue bg requests through io-uring fuse: Allow to queue fg requests through io-uring fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static fuse: {io-uring} Handle teardown of ring entries fuse: Add io-uring sqe commit and fetch support fuse: {io-uring} Make hash-list req unique finding functions non-static fuse: Add fuse-io-uring handling into fuse_copy fuse: Make fuse_copy non static fuse: {io-uring} Handle SQEs - register commands fuse: make args->in_args[0] to be always the header fuse: Add fuse-io-uring design documentation fuse: Move request bits fuse: Move fuse_get_dev to header file fuse: rename to fuse_dev_end_requests and make non-static
2025-01-28	Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs	Linus Torvalds
	Pull NFS client updates from Anna Schumaker: "New Features: - Enable using direct IO with localio - Added localio related tracepoints Bugfixes: - Sunrpc fixes for working with a very large cl_tasks list - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client() - Fixes for handling reconnections with localio - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT - Fix COPY_NOTIFY xdr_buf size calculations - pNFS/Flexfiles fix for retrying requesting a layout segment for reads - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired Cleanups: - Various other nfs & nfsd localio cleanups - Prepratory patches for async copy improvements that are under development - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts - Add netns inum and srcaddr to debugfs rpc_xprt info" * tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits) SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info pnfs/flexfiles: retry getting layout segment for reads NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE NFSv4.2: fix COPY_NOTIFY xdr buf size calculation NFS: Rename struct nfs4_offloadcancel_data NFS: Fix typo in OFFLOAD_CANCEL comment NFS: CB_OFFLOAD can return NFS4ERR_DELAY nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it nfs: fix incorrect error handling in LOCALIO nfs: probe for LOCALIO when v3 client reconnects to server nfs: probe for LOCALIO when v4 client reconnects to server nfs/localio: remove redundant code and simplify LOCALIO enablement nfs_common: add nfs_localio trace events nfs_common: track all open nfsd_files per LOCALIO nfs_client nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ nfsd: update percpu_ref to manage references on nfsd_net ...
2025-01-28	Merge tag 'driver-core-6.14-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the big set of driver core and debugfs updates for 6.14-rc1. Included in here is a bunch of driver core, PCI, OF, and platform rust bindings (all acked by the different subsystem maintainers), hence the merge conflict with the rust tree, and some driver core api updates to mark things as const, which will also require some fixups due to new stuff coming in through other trees in this merge window. There are also a bunch of debugfs updates from Al, and there is at least one user that does have a regression with these, but Al is working on tracking down the fix for it. In my use (and everyone else's linux-next use), it does not seem like a big issue at the moment. Here's a short list of the things in here: - driver core rust bindings for PCI, platform, OF, and some i/o functions. We are almost at the "write a real driver in rust" stage now, depending on what you want to do. - misc device rust bindings and a sample driver to show how to use them - debugfs cleanups in the fs as well as the users of the fs api for places where drivers got it wrong or were unnecessarily doing things in complex ways. - driver core const work, making more of the api take const * for different parameters to make the rust bindings easier overall. - other small fixes and updates All of these have been in linux-next with all of the aforementioned merge conflicts, and the one debugfs issue, which looks to be resolved "soon"" * tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits) rust: device: Use as_char_ptr() to avoid explicit cast rust: device: Replace CString with CStr in property_present() devcoredump: Constify 'struct bin_attribute' devcoredump: Define 'struct bin_attribute' through macro rust: device: Add property_present() saner replacement for debugfs_rename() orangefs-debugfs: don't mess with ->d_name octeontx2: don't mess with ->d_parent or ->d_parent->d_name arm_scmi: don't mess with ->d_parent->d_name slub: don't mess with ->d_name sof-client-ipc-flood-test: don't mess with ->d_name qat: don't mess with ->d_name xhci: don't mess with ->d_iname mtu3: don't mess wiht ->d_iname greybus/camera - stop messing with ->d_iname mediatek: stop messing with ->d_iname netdevsim: don't embed file_operations into your structs b43legacy: make use of debugfs_get_aux() b43: stop embedding struct file_operations into their objects carl9170: stop embedding file_operations into their objects ...
2025-01-28	treewide: const qualify ctl_tables where applicable	Joel Granados
	Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-28	xfs: remove xfs_buf_cache.bc_lock	Christoph Hellwig
	xfs_buf_cache.bc_lock serializes adding buffers to and removing them from the hashtable. But as the rhashtable code already uses fine grained internal locking for inserts and removals the extra protection isn't actually required. It also happens to fix a lock order inversion vs b_lock added by the recent lookup race fix. Fixes: ee10f6fcdb96 ("xfs: fix buffer lookup vs release race") Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-01-27	Merge tag 'f2fs-for-6.14-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this series, there are several major improvements such as folio conversion by Matthew, speed-up of block truncation, and caching more dentry pages. In addition, we implemented a linear dentry search to address recent unicode regression, and figured out some false alarms that we could get rid of. Enhancements: - foilio conversion in various IO paths - optimize f2fs_truncate_data_blocks_range() - cache more dentry pages - remove unnecessary blk_finish_plug - procfs: show mtime in segment_bits Bug fixes: - introduce linear search for dentries - don't call block truncation for aliased file - fix using wrong 'submitted' value in f2fs_write_cache_pages - fix to do sanity check correctly on i_inline_xattr_size - avoid trying to get invalid block address - fix inconsistent dirty state of atomic file" * tag 'f2fs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits) f2fs: fix inconsistent dirty state of atomic file f2fs: fix to avoid changing 'check only' behaior of recovery f2fs: Clean up the loop outside of f2fs_invalidate_blocks() f2fs: procfs: show mtime in segment_bits f2fs: fix to avoid return invalid mtime from f2fs_get_section_mtime() f2fs: Fix format specifier in sanity_check_inode() f2fs: avoid trying to get invalid block address f2fs: fix to do sanity check correctly on i_inline_xattr_size f2fs: remove blk_finish_plug f2fs: Optimize f2fs_truncate_data_blocks_range() f2fs: fix using wrong 'submitted' value in f2fs_write_cache_pages f2fs: add parameter @len to f2fs_invalidate_blocks() f2fs: update_sit_entry_for_release() supports consecutive blocks. f2fs: introduce update_sit_entry_for_release/alloc() f2fs: don't call block truncation for aliased file f2fs: Introduce linear search for dentries f2fs: add parameter @len to f2fs_invalidate_internal_cache() f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range() f2fs: ensure that node info flags are always initialized f2fs: The GC triggered by ioctl also needs to mark the segno as victim ...
2025-01-27	Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux	Linus Torvalds
	Pull nfsd updates from Chuck Lever: "Jeff Layton contributed an implementation of NFSv4.2+ attribute delegation, as described here: https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html This interoperates with similar functionality introduced into the Linux NFS client in v6.11. An attribute delegation permits an NFS client to manage a file's mtime, rather than flushing dirty data to the NFS server so that the file's mtime reflects the last write, which is considerably slower. Neil Brown contributed dynamic NFSv4.1 session slot table resizing. This facility enables NFSD to increase or decrease the number of slots per NFS session depending on server memory availability. More session slots means greater parallelism. Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND encoding screws up when crossing a page boundary in the encoding buffer. This is a zero-day bug, but hitting it is rare and depends on the NFS client implementation. The Linux NFS client does not happen to trigger this issue. A variety of bug fixes and other incremental improvements fill out the list of commits in this release. Great thanks to all contributors, reviewers, testers, and bug reporters who participated during this development cycle" * tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits) sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode sunrpc: Remove gss_generic_token deadcode sunrpc: Remove unused xprt_iter_get_xprt Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages" nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION nfsd: handle delegated timestamps in SETATTR nfsd: add support for delegated timestamps nfsd: rework NFS4_SHARE_WANT_* flag handling nfsd: add support for FATTR4_OPEN_ARGUMENTS nfsd: prepare delegation code for handing out _ATTRS_DELEG delegations nfsd: rename NFS4_SHARE_WANT_ constants to OPEN4_SHARE_ACCESS_WANT_* nfsd: switch to autogenerated definitions for open_delegation_type4 nfs_common: make include/linux/nfs4.h include generated nfs4_1.h nfsd: fix handling of delegated change attr in CB_GETATTR SUNRPC: Document validity guarantees of the pointer returned by reserve_space NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer NFSD: Refactor nfsd4_do_encode_secinfo() again NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer ...
2025-01-27	add a string-to-qstr constructor	Al Viro
	Quite a few places want to build a struct qstr by given string; it would be convenient to have a primitive doing that, rather than open-coding it via QSTR_INIT(). The closest approximation was in bcachefs, but that expands to initializer list - {.len = strlen(string), .name = string}. It would be more useful to have it as compound literal - (struct qstr){.len = strlen(string), .name = string}. Unlike initializer list it's a valid expression. What's more, it's a valid lvalue - it's an equivalent of anonymous local variable with such initializer, so the things like path->dentry = d_alloc_pseudo(mnt->mnt_sb, &QSTR(name)); are valid. It can also be used as initializer, with identical effect - struct qstr x = (struct qstr){.name = s, .len = strlen(s)}; is equivalent to struct qstr anon_variable = {.name = s, .len = strlen(s)}; struct qstr x = anon_variable; // anon_variable is never used after that point and any even remotely sane compiler will manage to collapse that into struct qstr x = {.name = s, .len = strlen(s)}; What compound literals can't be used for is initialization of global variables, but those are covered by QSTR_INIT(). This commit lifts definition(s) of QSTR() into linux/dcache.h, converts it to compound literal (all bcachefs users are fine with that) and converts assorted open-coded instances to using that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	9p: fix ->rename_sem exclusion	Al Viro
	9p wants to be able to build a path from given dentry to fs root and keep it valid over a blocking operation. ->s_vfs_rename_mutex would be a natural candidate, but there are places where we need that and where we have no way to tell if ->s_vfs_rename_mutex is already held deeper in callchain. Moreover, it's only held for cross-directory renames; name changes within the same directory happen without it. Solution: * have d_move() done in ->rename() rather than in its caller * maintain a 9p-private rwsem (per-filesystem) * hold it exclusive over the relevant part of ->rename() * hold it shared over the places where we want the path. That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move() and d_exchange() calls under filesystem's control. However, there's also __d_unalias(), which isn't covered by any of that. If ->lookup() hits a directory inode with preexisting dentry elsewhere (due to e.g. rename done on server behind our back), d_splice_alias() called by ->lookup() will move/rename that alias. Add a couple of optional methods, so that __d_unalias() would do if alias->d_op->d_unalias_trylock != NULL if (!alias->d_op->d_unalias_trylock(alias)) fail (resulting in -ESTALE from lookup) __d_move(...) if alias->d_op->d_unalias_unlock != NULL alias->d_unalias_unlock(alias) where it currently does __d_move(). 9p instances do down_write_trylock() and up_write() of ->rename_mutex. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	orangefs_d_revalidate(): use stable parent inode and name passed by caller	Al Viro
	->d_name use is a UAF if the userland side of things can be slowed down by attacker. Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller	Al Viro
	theoretically, ->d_name use in there is a UAF, but only if you are messing with tracepoints... Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	nfs: fix ->d_revalidate() UAF on ->d_name accesses	Al Viro
	Pass the stable name all the way down to ->rpc_ops->lookup() instances. Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it is stable there, as it is in ->create() et.al. dget_parent() in nfs_instantiate() should be redundant - it'd better be stable there; if it's not, we have more trouble, since ->d_name would also be unsafe in such case. nfs_submount() and nfs4_submount() may or may not require fixes - if they ever get moved on server with fhandle preserved, we are in trouble there... UAF window is fairly narrow here and exfiltration requires the ability to watch the traffic. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	nfs{,4}_lookup_validate(): use stable parent inode passed by caller	Al Viro
	we can't kill __nfs_lookup_revalidate() completely, but ->d_parent boilerplate in it is gone Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	gfs2_drevalidate(): use stable parent inode and name passed by caller	Al Viro
	No need to mess with dget_parent() for the former; for the latter we really should not rely upon ->d_name.name remaining stable. Theoretically a UAF, but it's hard to exfiltrate the information... Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	fuse_dentry_revalidate(): use stable parent inode and name passed by caller	Al Viro
	No need to mess with dget_parent() for the former; for the latter we really should not rely upon ->d_name.name remaining stable - it's a real-life UAF. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	vfat_revalidate{,_ci}(): use stable parent inode passed by caller	Al Viro
	Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	exfat_d_revalidate(): use stable parent inode passed by caller	Al Viro
	... no need to bother with ->d_lock and ->d_parent->d_inode. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	fscrypt_d_revalidate(): use stable parent inode passed by caller	Al Viro
	The only thing it's using is parent directory inode and we are already given a stable reference to that - no need to bother with boilerplate. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	ceph_d_revalidate(): propagate stable name down into request encoding	Al Viro
	Currently get_fscrypt_altname() requires ->r_dentry->d_name to be stable and it gets that in almost all cases. The only exception is ->d_revalidate(), where we have a stable name, but it's passed separately - dentry->d_name is not stable there. Propagate it down to get_fscrypt_altname() as a new field of struct ceph_mds_request - ->r_dname, to be used instead ->r_dentry->d_name when non-NULL. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	ceph_d_revalidate(): use stable parent inode passed by caller	Al Viro
	No need to mess with the boilerplate for obtaining what we already have. Note that ceph is one of the "will want a path from filesystem root if we want to talk to server" cases, so the name of the last component is of little use - it is passed to fscrypt_d_revalidate() and it's used to deal with (also crypt-related) case in request marshalling, when encrypted name turns out to be too long. The former is not a problem, but the latter is racy; that part will be handled in the next commit. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	afs_d_revalidate(): use stable name and parent inode passed by caller	Al Viro
	No need to bother with boilerplate for obtaining the latter and for the former we really should not count upon ->d_name.name remaining stable under us. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	Pass parent directory inode and expected name to ->d_revalidate()	Al Viro
	->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	generic_ci_d_compare(): use shortname_storage	Al Viro
	... and check the "name might be unstable" predicate the right way. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	ext4 fast_commit: make use of name_snapshot primitives	Al Viro
	... rather than open-coding them. As a bonus, that avoids the pointless work with extra allocations, etc. for long names. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	dissolve external_name.u into separate members	Al Viro
	... and document the constraints on the layout. Kept separate from the previous commit to keep the noise separate from actual changes. The reason for explicit __aligned() on ->name[] rather than relying upon the alignment of the previous field is that the previous iteration of that commit tried to save 4 bytes on 64bit by eliminating a hole in there, which broke the assumptions in dentry_string_cmp(). Better spell it out and avoid the temptation for the future... Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio updates from Michael Tsirkin: "A small number of improvements all over the place: - vdpa/octeon support for multiple interrupts - virtio-pci support for error recovery - vp_vdpa support for notification with data - vhost/net fix to set num_buffers for spec compliance - virtio-mem now works with kdump on s390 And small cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (23 commits) virtio_blk: Add support for transport error recovery virtio_pci: Add support for PCIe Function Level Reset vhost/net: Set num_buffers for virtio 1.0 vdpa/octeon_ep: read vendor-specific PCI capability virtio-pci: define type and header for PCI vendor data vdpa/octeon_ep: handle device config change events vdpa/octeon_ep: enable support for multiple interrupts per device vdpa: solidrun: Replace deprecated PCI functions s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM) virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM virtio-mem: remember usable region size virtio-mem: mark device ready before registering callbacks in kdump mode fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel fs/proc/vmcore: factor out freeing a list of vmcore ranges fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list fs/proc/vmcore: move vmcore definitions out of kcore.h fs/proc/vmcore: prefix all pr_* with "vmcore:" fs/proc/vmcore: disallow vmcore modifications while the vmcore is open fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex ...
2025-01-27	fuse: prevent disabling io-uring on active connections	Bernd Schubert
	The enable_uring module parameter allows administrators to enable/disable io-uring support for FUSE at runtime. However, disabling io-uring while connections already have it enabled can lead to an inconsistent state. Fix this by keeping io-uring enabled on connections that were already using it, even if the module parameter is later disabled. This ensures active FUSE mounts continue to function correctly. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: enable fuse-over-io-uring	Bernd Schubert
	All required parts are handled now, fuse-io-uring can be enabled. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: block request allocation until io-uring init is complete	Bernd Schubert
	Avoid races and block request allocation until io-uring queues are ready. This is a especially important for background requests, as bg request completion might cause lock order inversion of the typical queue->lock and then fc->bg_lock fuse_request_end spin_lock(&fc->bg_lock); flush_bg_queue fuse_send_one fuse_uring_queue_fuse_req spin_lock(&queue->lock); Signed-off-by: Bernd Schubert <bernd@bsbernd.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: {io-uring} Prevent mount point hang on fuse-server termination	Bernd Schubert
	When the fuse-server terminates while the fuse-client or kernel still has queued URING_CMDs, these commands retain references to the struct file used by the fuse connection. This prevents fuse_dev_release() from being invoked, resulting in a hung mount point. This patch addresses the issue by making queued URING_CMDs cancelable, allowing fuse_dev_release() to proceed as expected and preventing the mount point from hanging. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: Allow to queue bg requests through io-uring	Bernd Schubert
	This prepares queueing and sending background requests through io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: Allow to queue fg requests through io-uring	Bernd Schubert
	This prepares queueing and sending foreground requests through io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static	Bernd Schubert
	These functions are also needed by fuse-over-io-uring. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: {io-uring} Handle teardown of ring entries	Bernd Schubert
	On teardown struct file_operations::uring_cmd requests need to be completed by calling io_uring_cmd_done(). Not completing all ring entries would result in busy io-uring tasks giving warning messages in intervals and unreleased struct file. Additionally the fuse connection and with that the ring can only get released when all io-uring commands are completed. Completion is done with ring entries that are a) in waiting state for new fuse requests - io_uring_cmd_done is needed b) already in userspace - io_uring_cmd_done through teardown is not needed, the request can just get released. If fuse server is still active and commits such a ring entry, fuse_uring_cmd() already checks if the connection is active and then complete the io-uring itself with -ENOTCONN. I.e. special handling is not needed. This scheme is basically represented by the ring entry state FRRS_WAIT and FRRS_USERSPACE. Entries in state: - FRRS_INIT: No action needed, do not contribute to ring->queue_refs yet - All other states: Are currently processed by other tasks, async teardown is needed and it has to wait for the two states above. It could be also solved without an async teardown task, but would require additional if conditions in hot code paths. Also in my personal opinion the code looks cleaner with async teardown. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	fuse: Add io-uring sqe commit and fetch support	Bernd Schubert
	This adds support for fuse request completion through ring SQEs (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing the ring entry it becomes available for new fuse requests. Handling of requests through the ring (SQE/CQE handling) is complete now. Fuse request data are copied through the mmaped ring buffer, there is no support for any zero copy yet. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-27	ceph: exchange hardcoded value on NAME_MAX	Viacheslav Dubeyko
	Initially, ceph_fs_debugfs_init() had temporary name buffer with hardcoded length of 80 symbols. Then, it was hardcoded again for 100 symbols. Finally, it makes sense to exchange hardcoded value on properly defined constant and 255 symbols should be enough for any name case. Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Patrick Donnelly <pdonnell@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-01-27	ceph: streamline request head structures in MDS client	Liang Jie
	The existence of the ceph_mds_request_head_old structure in the MDS client code is no longer required due to improvements in handling different MDS request header versions. This patch removes the now redundant ceph_mds_request_head_old structure and replaces its usage with the flexible and extensible ceph_mds_request_head structure. Changes include: - Modification of find_legacy_request_head to directly cast the pointer to ceph_mds_request_head_legacy without going through the old structure. - Update sizeof calculations in create_request_message to use offsetofend for consistency and future-proofing, rather than referencing the old structure. - Use of the structured ceph_mds_request_head directly instead of the old one. Additionally, this consolidation normalizes the handling of request_head_version v1 to align with versions v2 and v3, leading to a more consistent and maintainable codebase. These changes simplify the codebase and reduce potential confusion stemming from the existence of an obsolete structure. Signed-off-by: Liang Jie <liangjie@lixiang.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>