summaryrefslogtreecommitdiff
path: root/include/linux/sunrpc
AgeCommit message (Collapse)Author
16 hoursMerge tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: - filehandle signing to defend against filehandle-guessing attacks (Benjamin Coddington) The server now appends a SipHash-2-4 MAC to each filehandle when the new "sign_fh" export option is enabled. NFSD then verifies filehandles received from clients against the expected MAC; mismatches return NFS error STALE - convert the entire NLMv4 server-side XDR layer from hand-written C to xdrgen-generated code, spanning roughly thirty patches (Chuck Lever) XDR functions are generally boilerplate code and are easy to get wrong. The goals of this conversion are improved memory safety, lower maintenance burden, and groundwork for eventual Rust code generation for these functions. - improve pNFS block/SCSI layout robustness with two related changes (Dai Ngo) SCSI persistent reservation fencing is now tracked per client and per device via an xarray, to avoid both redundant preempt operations on devices already fenced and a potential NFSD deadlock when all nfsd threads are waiting for a layout return. - scalability and infrastructure improvements Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.1 NFSD development cycle. * tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (83 commits) NFSD: Docs: clean up pnfs server timeout docs nfsd: fix comment typo in nfsxdr nfsd: fix comment typo in nfs3xdr NFSD: convert callback RPC program to per-net namespace NFSD: use per-operation statidx for callback procedures svcrdma: Use contiguous pages for RDMA Read sink buffers SUNRPC: Add svc_rqst_page_release() helper SUNRPC: xdr.h: fix all kernel-doc warnings svcrdma: Factor out WR chain linking into helper svcrdma: Add Write chunk WRs to the RPC's Send WR chain svcrdma: Clean up use of rdma->sc_pd->device svcrdma: Clean up use of rdma->sc_pd->device in Receive paths svcrdma: Add fair queuing for Send Queue access SUNRPC: Optimize rq_respages allocation in svc_alloc_arg SUNRPC: Track consumed rq_pages entries svcrdma: preserve rq_next_page in svc_rdma_save_io_pages SUNRPC: Handle NULL entries in svc_rqst_release_pages SUNRPC: Allocate a separate Reply page array SUNRPC: Tighten bounds checking in svc_rqst_replace_page NFSD: Sign filehandles ...
2026-04-05folio_batch: rename pagevec.h to folio_batch.hTal Zussman
struct pagevec was removed in commit 1e0877d58b1e ("mm: remove struct pagevec"). Rename include/linux/pagevec.h to reflect reality and update includes tree-wide. Add the new filename to MAINTAINERS explicitly, as it no longer matches the "include/linux/page[-_]*" pattern in MEMORY MANAGEMENT - CORE. Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-3-716868cc2d11@columbia.edu Signed-off-by: Tal Zussman <tz2294@columbia.edu> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-03SUNRPC: Add svc_rqst_page_release() helperChuck Lever
svc_rqst_replace_page() releases displaced pages through a per-rqst folio batch, but exposes the add-or-flush sequence directly. svc_tcp_restore_pages() releases displaced pages individually with put_page(). Introduce svc_rqst_page_release() to encapsulate the batched release mechanism. Convert svc_rqst_replace_page() and svc_tcp_restore_pages() to use it. The latter now benefits from the same batched release that svc_rqst_replace_page() already uses. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: xdr.h: fix all kernel-doc warningsRandy Dunlap
Correct a function parameter name (s/page/folio/) and add function return value sections for multiple functions to eliminate kernel-doc warnings: Warning: include/linux/sunrpc/xdr.h:298 function parameter 'folio' not described in 'xdr_set_scratch_folio' Warning: include/linux/sunrpc/xdr.h:337 No description found for return value of 'xdr_stream_remaining' Warning: include/linux/sunrpc/xdr.h:357 No description found for return value of 'xdr_align_size' Warning: include/linux/sunrpc/xdr.h:374 No description found for return value of 'xdr_pad_size' Warning: include/linux/sunrpc/xdr.h:387 No description found for return value of 'xdr_stream_encode_item_present' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Add Write chunk WRs to the RPC's Send WR chainChuck Lever
Previously, Write chunk RDMA Writes were posted via a separate ib_post_send() call with their own completion handler. Each Write chunk incurred a doorbell and generated a completion event. Link Write chunk WRs onto the RPC Reply's Send WR chain so that a single ib_post_send() call posts both the RDMA Writes and the Send WR. A single completion event signals that all operations have finished. This reduces both doorbell rate and completion rate, as well as eliminating the latency of a round-trip between the Write chunk completion and the subsequent Send WR posting. The lifecycle of Write chunk resources changes: previously, the svc_rdma_write_done() completion handler released Write chunk resources when RDMA Writes completed. With WR chaining, resources remain live until the Send completion. A new sc_write_info_list tracks Write chunk metadata attached to each Send context, and svc_rdma_write_chunk_release() frees these resources when the Send context is released. The svc_rdma_write_done() handler now handles only error cases. On success it returns immediately since the Send completion handles resource release. On failure (WR flush), it closes the connection to signal to the client that the RPC Reply is incomplete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Add fair queuing for Send Queue accessChuck Lever
When the Send Queue fills, multiple threads may wait for SQ slots. The previous implementation had no ordering guarantee, allowing starvation when one thread repeatedly acquires slots while others wait indefinitely. Introduce a ticket-based fair queuing system. Each waiter takes a ticket number and is served in FIFO order. This ensures forward progress for all waiters when SQ capacity is constrained. The implementation has two phases: 1. Fast path: attempt to reserve SQ slots without waiting 2. Slow path: take a ticket, wait for turn, then wait for slots The ticket system adds two atomic counters to the transport: - sc_sq_ticket_head: next ticket to issue - sc_sq_ticket_tail: ticket currently being served A dedicated wait queue (sc_sq_ticket_wait) handles ticket ordering, separate from sc_send_wait which handles SQ capacity. This separation ensures that send completions (the high-frequency wake source) wake only the current ticket holder rather than all queued waiters. Ticket handoff wakes only the ticket wait queue, and each ticket holder that exits via connection close propagates the wake to the next waiter in line. When a waiter successfully reserves slots, it advances the tail counter and wakes the next waiter. This creates an orderly handoff that prevents starvation while maintaining good throughput on the fast path when contention is low. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Optimize rq_respages allocation in svc_alloc_argChuck Lever
svc_alloc_arg() invokes alloc_pages_bulk() with the full rq_maxpages count (~259 for 1MB messages) for the rq_respages array, causing a full-array scan despite most slots holding valid pages. svc_rqst_release_pages() NULLs only the range [rq_respages, rq_next_page) after each RPC, so only that range contains NULL entries. Limit the rq_respages fill in svc_alloc_arg() to that range instead of scanning the full array. svc_init_buffer() initializes rq_next_page to span the entire rq_respages array, so the first svc_alloc_arg() call fills all slots. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Track consumed rq_pages entriesChuck Lever
The rq_pages array holds pages allocated for incoming RPC requests. Two transport receive paths NULL entries in rq_pages to prevent svc_rqst_release_pages() from freeing pages that the transport has taken ownership of: - svc_tcp_save_pages() moves partial request data pages to svsk->sk_pages during multi-fragment TCP reassembly. - svc_rdma_clear_rqst_pages() moves request data pages to head->rc_pages because they are targets of active RDMA Read WRs. A new rq_pages_nfree field in struct svc_rqst records how many entries were NULLed. svc_alloc_arg() uses it to refill only those entries rather than scanning the full rq_pages array. In steady state, the transport NULLs a handful of entries per RPC, so the allocator visits only those entries instead of the full ~259 slots (for 1MB messages). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Allocate a separate Reply page arrayChuck Lever
struct svc_rqst uses a single dynamically-allocated page array (rq_pages) for both the incoming RPC Call message and the outgoing RPC Reply message. rq_respages is a sliding pointer into rq_pages that each transport receive path must compute based on how many pages the Call consumed. This boundary tracking is a source of confusion and bugs, and prevents an RPC transaction from having both a large Call and a large Reply simultaneously. Allocate rq_respages as its own page array, eliminating the boundary arithmetic. This decouples Call and Reply buffer lifetimes, following the precedent set by rq_bvec (a separate dynamically- allocated array for I/O vectors). Each svc_rqst now pins twice as many pages as before. For a server running 16 threads with a 1MB maximum payload, the additional cost is roughly 16MB of pinned memory. The new dynamic svc thread count facility keeps this overhead minimal on an idle server. A subsequent patch in this series limits per-request repopulation to only the pages released during the previous RPC, avoiding a full-array scan on each call to svc_alloc_arg(). Note: We've considered several alternatives to maintaining a full second array. Each alternative reintroduces either boundary logic complexity or I/O-path allocation pressure. rq_next_page is initialized in svc_alloc_arg() and svc_process() during Reply construction, and in svc_rdma_recvfrom() as a precaution on error paths. Transport receive paths no longer compute it from the Call size. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: split cache_detail queue into request and reader listsJeff Layton
Replace the single interleaved queue (which mixed cache_request and cache_reader entries distinguished by a ->reader flag) with two dedicated lists: cd->requests for upcall requests and cd->readers for open file handles. Readers now track their position via a monotonically increasing sequence number (next_seqno) rather than by their position in the shared list. Each cache_request is assigned a seqno when enqueued, and a new cache_next_request() helper finds the next request at or after a given seqno. This eliminates the cache_queue wrapper struct entirely, simplifies the reader-skipping loops in cache_read/cache_poll/cache_ioctl/ cache_release, and makes the data flow easier to reason about. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: convert queue_wait from global to per-cache-detail waitqueueJeff Layton
The queue_wait waitqueue is currently a file-scoped global, so a wake_up for one cache_detail wakes pollers on all caches. Convert it to a per-cache-detail field so that only pollers on the relevant cache are woken. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: convert queue_lock from global spinlock to per-cache-detail lockJeff Layton
The global queue_lock serializes all upcall queue operations across every cache_detail instance. Convert it to a per-cache-detail spinlock so that different caches (e.g. auth.unix.ip vs nfsd.fh) no longer contend with each other on queue operations. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29Documentation: Add the RPC language description of NLM version 4Chuck Lever
In order to generate source code to encode and decode NLMv4 protocol elements, include a copy of the RPC language description of NLMv4 for xdrgen to process. The language description is an amalgam of RFC 1813 and the Open Group's XNFS specification: https://pubs.opengroup.org/onlinepubs/9629799/chap10.htm The C code committed here was generated from the new nlm4.x file using tools/net/sunrpc/xdrgen/xdrgen. The goals of replacing hand-written XDR functions with ones that are tool-generated are to improve memory safety and make XDR encoding and decoding less brittle to maintain. The xdrgen utility derives both the type definitions and the encode/decode functions directly from protocol specifications, using names and symbols familiar to anyone who knows those specs. Unlike hand-written code that can inadvertently diverge from the specification, xdrgen guarantees that the generated code matches the specification exactly. We would eventually like xdrgen to generate Rust code as well, making the conversion of the kernel's NFS stacks to use Rust just a little easier for us. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: Fix compilation error (`make W=1`) when dprintk() is no-opAndy Shevchenko
Clang compiler is not happy about set but unused variables: .../flexfilelayout/flexfilelayoutdev.c:56:9: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] .../flexfilelayout/flexfilelayout.c:1505:6: error: variable 'err' set but not used [-Werror,-Wunused-but-set-variable] .../nfs4proc.c:9244:12: error: variable 'ptr' set but not used [-Werror,-Wunused-but-set-variable] Fix these by forwarding parameters of dprintk() to no_printk(). The positive side-effect is a format-string checker enabled even for the cases when dprintk() is no-op. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Fixes: fc931582c260 ("nfs41: create_session operation") Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: Kill RPC_IFDEBUG()Andy Shevchenko
RPC_IFDEBUG() is used in only two places. In one the user of the definition is guarded by ifdeffery, in the second one it's implied due to dprintk() usage. Kill the macro and move the ifdeffery to the regular condition with the variable defined inside, while in the second case add the same conditional and move the respective code there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29nfsd/sunrpc: move rq_cachetype into struct nfsd_thread_local_infoJeff Layton
The svc_rqst->rq_cachetype field is only accessed by nfsd. Move it into the nfsd_thread_local_info instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29nfsd/sunrpc: add svc_rqst->rq_private pointer and remove rq_lease_breakerJeff Layton
rq_lease_breaker has always been a NFSv4 specific layering violation in svc_rqst. The reason it's there though is that we need a place that is thread-local, and accessible from the svc_rqst pointer. Add a new rq_private pointer to struct svc_rqst. This is intended for use by the threads that are handling the service. sunrpc code doesn't touch it. In nfsd, define a new struct nfsd_thread_local_info. nfsd declares one of these on the stack and puts a pointer to it in rq_private. Add a new ntli_lease_breaker field to the new struct and convert all of the places that access rq_lease_breaker to use the new field instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-02-12Merge tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Use an LRU list for returning unused delegations - Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default Bugfixes: - NFS/localio: - Handle short writes by retrying - Prevent direct reclaim recursion into NFS via nfs_writepages - Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit - Remove -EAGAIN handling in nfs_local_doio() - pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN - fs/nfs: Fix a readdir slow-start regression - SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path Other cleanups and improvements: - A few other NFS/localio cleanups - Various other delegation handling cleanups from Christoph - Unify security_inode_listsecurity() calls - Improvements to NFSv4 lease handling - Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set" * tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits) SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path nfs: nfs4proc: Convert comma to semicolon SUNRPC: Change list definition method sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset NFSv4: limit lease period in nfs4_set_lease_period() NFSv4: pass lease period in seconds to nfs4_set_lease_period() nfs: unify security_inode_listsecurity() calls fs/nfs: Fix readdir slow-start regression pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN NFS: fix delayed delegation return handling NFS: simplify error handling in nfs_end_delegation_return NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return NFS: remove the delegation == NULL check in nfs_end_delegation_return NFS: use bool for the issync argument to nfs_end_delegation_return NFS: return void from ->return_delegation NFS: return void from nfs4_inode_make_writeable NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4 NFS: Add a way to disable NFS v4.0 via KConfig NFS: Move sequence slot operations into minorversion operations NFS: Pass a struct nfs_client to nfs4_init_sequence() ...
2026-02-09sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unsetBen Dooks
The rpc_debug, nfs_debug, nfsd_debug and nlm_debug are exported even if CONFIG_SUNRPC_DEBUG is not set. This means that the debug header should also define these to remove the following sparse warnings: net/sunrpc/sysctl.c:29:17: warning: symbol 'rpc_debug' was not declared. Should it be static? net/sunrpc/sysctl.c:32:17: warning: symbol 'nfs_debug' was not declared. Should it be static? net/sunrpc/sysctl.c:35:17: warning: symbol 'nfsd_debug' was not declared. Should it be static? net/sunrpc/sysctl.c:38:17: warning: symbol 'nlm_debug' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-29Add RPC language definition of NFSv4 POSIX ACL extensionChuck Lever
The language definition was extracted from the new draft-ietf-nfsv4-posix-acls specification. This ensures good constant and type name alignment between the spec and the Linux kernel source code, and brings in some basic XDR utilities for handling NFSv4 POSIX draft ACLs. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29xdrgen: Implement pass-through lines in specificationsChuck Lever
XDR specification files can contain lines prefixed with '%' that pass through unchanged to generated output. Traditional rpcgen removes the '%' and emits the remainder verbatim, allowing direct insertion of C includes, pragma directives, or other language- specific content into the generated code. Until now, xdrgen silently discarded these lines during parsing. This prevented specifications from including necessary headers or preprocessor directives that might be required for the generated code to compile correctly. The grammar now captures pass-through lines instead of ignoring them. A new AST node type represents pass-through content, and the AST transformer strips the leading '%' character. Definition and source generators emit pass-through content in document order, preserving the original placement within the specification. This brings xdrgen closer to feature parity with traditional rpcgen while maintaining the existing document-order processing model. Existing generated xdrgen source code has been regenerated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSYJeff Layton
To dynamically adjust the thread count, nfsd requires some information about how busy things are. Change svc_recv() to take a timeout value, and then allow the wait for work to time out if it's set. If a timeout is not defined, then the schedule will be set to MAX_SCHEDULE_TIMEOUT. If the task waits for the full timeout, then have it return -ETIMEDOUT to the caller. If it wakes up, finds that there is more work and that no threads are available, then attempt to set SP_TASK_STARTING. If wasn't already set, have the task return -EBUSY to cue to the caller that the service could use more threads. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: split new thread creation into a separate functionJeff Layton
Break out the part of svc_start_kthreads() that creates a thread into svc_new_thread(), as a new exported helper function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: introduce the concept of a minimum number of threads per poolJeff Layton
Add a new pool->sp_nrthrmin field to track the minimum number of threads in a pool. Add min_threads parameters to both svc_set_num_threads() and svc_set_pool_threads(). If min_threads is non-zero and less than the max, svc_set_num_threads() will ensure that the number of running threads is between the min and the max. If the min is 0 or greater than the max, then it is ignored, and the maximum number of threads will be started, and never spun down. For now, the min_threads is always 0, but a later patch will pass the proper value through from nfsd. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: track the max number of requested threads in a poolJeff Layton
The kernel currently tracks the number of threads running in a pool in the "sp_nrthreads" field. In the future, where threads are dynamically spun up and down, it'll be necessary to keep track of the maximum number of requested threads separately from the actual number running. Add a pool->sp_nrthrmax parameter to track this. When userland changes the number of threads in a pool, update that value accordingly. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: split svc_set_num_threads() into two functionsJeff Layton
svc_set_num_threads() will set the number of running threads for a given pool. If the pool argument is set to NULL however, it will distribute the threads among all of the pools evenly. These divergent codepaths complicate the move to dynamic threading. Simplify the API by splitting these two cases into different helpers: Add a new svc_set_pool_threads() function that sets the number of threads in a single, given pool. Modify svc_set_num_threads() to distribute the threads evenly between all of the pools and then call svc_set_pool_threads() for each. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26xdrgen: Add enum value validation to generated decodersChuck Lever
XDR enum decoders generated by xdrgen do not verify that incoming values are valid members of the enum. Incoming out-of-range values from malicious or buggy peers propagate through the system unchecked. Add validation logic to generated enum decoders using a switch statement that explicitly lists valid enumerator values. The compiler optimizes this to a simple range check when enum values are dense (contiguous), while correctly rejecting invalid values for sparse enums with gaps in their value ranges. The --no-enum-validation option on the source subcommand disables this validation when not needed. The minimum and maximum fields in _XdrEnum, which were previously unused placeholders for a range-based validation approach, have been removed since the switch-based validation handles both dense and sparse enums correctly. Because the new mechanism results in substantive changes to generated code, existing .x files are regenerated. Unrelated white space and semicolon changes in the generated code are due to recent commit 1c873a2fd110 ("xdrgen: Don't generate unnecessary semicolon") and commit 38c4df91242b ("xdrgen: Address some checkpatch whitespace complaints"). Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26xdrgen: Initialize data pointer for zero-length itemsChuck Lever
The xdrgen decoders for strings and opaque data had an optimization that skipped calling xdr_inline_decode() when the item length was zero. This left the data pointer uninitialized, which could lead to unpredictable behavior when callers access it. Remove the zero-length check and always call xdr_inline_decode(). When passed a length of zero, xdr_inline_decode() returns the current buffer position, which is valid and matches the behavior of hand-coded XDR decoders throughout the kernel. Fixes: 4b132aacb076 ("tools: Add xdrgen") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26xdrgen: Implement short (16-bit) integer typesChuck Lever
"short" and "unsigned short" types are not defined in RFC 4506, but are supported by the rpcgen program. An upcoming protocol specification includes at least one "unsigned short" field, so xdrgen needs to implement support for these types. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-12Merge tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Bugfixes: - Fix 'nlink' attribute update races when unlinking a file - Add missing initialisers for the directory verifier in various places - Don't regress the NFSv4 open state due to misordered racing replies - Ensure the NFSv4.x callback server uses the correct transport connection - Fix potential use-after-free races when shutting down the NFSv4.x callback server - Fix a pNFS layout commit crash - Assorted fixes to ensure correct propagation of mount options when the client crosses a filesystem boundary and triggers the VFS automount code - More localio fixes Features and cleanups: - Add initial support for basic directory delegations - SunRPC back channel code cleanups" * tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits) NFSv4: Handle NFS4ERR_NOTSUPP errors for directory delegations nfs/localio: remove 61 byte hole from needless ____cacheline_aligned nfs/localio: remove alignment size checking in nfs_is_local_dio_possible NFS: Fix up the automount fs_context to use the correct cred NFS: Fix inheritance of the block sizes when automounting NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags Revert "nfs: ignore SB_RDONLY when mounting nfs" Revert "nfs: clear SB_RDONLY before getting superblock" Revert "nfs: ignore SB_RDONLY when remounting nfs" NFS: Add a module option to disable directory delegations NFS: Shortcut lookup revalidations if we have a directory delegation NFS: Request a directory delegation during RENAME NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK NFS: Add support for sending GDD_GETATTR NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalid NFSv4.1: protect destroying and nullifying bc_serv structure SUNRPC: new helper function for stopping backchannel server SUNRPC: cleanup common code in backchannel request NFSv4.1: pass transport for callback shutdown NFSv4: ensure the open stateid seqid doesn't go backwards ...
2025-11-23SUNRPC: new helper function for stopping backchannel serverOlga Kornievskaia
Create a new backchannel function to stop the backchannel server and clear the bc_serv in transport protected under the bc_pa_lock. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23SUNRPC: cleanup common code in backchannel requestOlga Kornievskaia
Create a helper function for common code between rdma and tcp backchannel handling of the backchannel request. Make sure that access is protected by the bc_pa_lock lock. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-16sunrpc: allocate a separate bvec array for socket sendsJeff Layton
svc_tcp_sendmsg() calls xdr_buf_to_bvec() with the second slot of rq_bvec as the start, but doesn't reduce the array length by one, which could lead to an array overrun. Also, rq_bvec is always rq_maxpages in length, which can be too short in some cases, since the TCP record marker consumes a slot. Fix both problems by adding a separate bvec array to the svc_sock that is specifically for sending. For TCP, make this array one slot longer than rq_maxpages, to account for the record marker. For UDP, only allocate as large an array as we need since it's limited to 64k of payload. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-16svcrdma: Increase the server's default RPC/RDMA credit grantChuck Lever
The range of commits from commit e3274026e2ec ("SUNRPC: move all of xprt handling into svc_xprt_handle()") to commit 15d39883ee7d ("SUNRPC: change the back-channel queue to lwq") enabled NFSD performance to scale better as the number of nfsd threads is increased. These commits were merged in v6.7. Now that the nfsd thread count can scale to more threads, permit individual clients to make more use of those threads. Increase the RPC/RDMA per-connection credit grant from 64 to 128 -- same as the Linux NFS client. Simple single client fio-based benchmarking so far shows only improvement, no regression. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-06Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Mike Snitzer has prototyped a mechanism for disabling I/O caching in NFSD. This is introduced in v6.18 as an experimental feature. This enables scaling NFSD in /both/ directions: - NFS service can be supported on systems with small memory footprints, such as low-cost cloud instances - Large NFS workloads will be less likely to force the eviction of server-local activity, helping it avoid thrashing Jeff Layton contributed a number of fixes to the new attribute delegation implementation (based on a pending Internet RFC) that we hope will make attribute delegation reliable enough to enable by default, as it is on the Linux NFS client. The remaining patches in this pull request are clean-ups and minor optimizations. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.18 NFSD development cycle" * tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits) nfsd: discard nfserr_dropit SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it NFSD: Add io_cache_{read,write} controls to debugfs NFSD: Do the grace period check in ->proc_layoutget nfsd: delete unnecessary NULL check in __fh_verify() NFSD: Allow layoutcommit during grace period NFSD: Disallow layoutget during grace period sunrpc: fix "occurence"->"occurrence" nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in nfsd: nfserr_jukebox in nlm_fopen should lead to a retry NFSD: Reduce DRC bucket size NFSD: Delay adding new entries to LRU SUNRPC: Move the svc_rpcb_cleanup() call sites NFS: Remove rpcbind cleanup for NFSv4.0 callback nfsd: unregister with rpcbind when deleting a transport NFSD: Drop redundant conversion to bool sunrpc: eliminate return pointer in svc_tcp_sendmsg() sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length nfsd: decouple the xprtsec policy check from check_nfsd_access() NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() ...
2025-09-23SUNRPC: Update gssx_accept_sec_context() to use xdr_set_scratch_folio()Anna Schumaker
This was the last caller of xdr_set_scratch_page(), so I remove this function while I'm at it. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23SUNRPC: Update svcxdr_init_decode() to call xdr_set_scratch_folio()Anna Schumaker
The only snag here is that __folio_alloc_node() doesn't handle NUMA_NO_NODE, so I also need to update svc_pool_map_get_node() to return numa_mem_id() instead. I arrived at this approach by looking at what other users of __folio_alloc_node() do for this case. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23SUNRPC: Introduce xdr_set_scratch_folio()Anna Schumaker
This will replace xdr_set_scratch_page() when we switch pages to folios. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23SUNRPC: Move the svc_rpcb_cleanup() call sitesChuck Lever
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23sunrpc: add a Kconfig option to redirect dfprintk() output to trace bufferJeff Layton
We have a lot of old dprintk() call sites that aren't going anywhere anytime soon. At the same time, turning them up is a serious burden on the host due to the console locking overhead. Add a new Kconfig option that redirects dfprintk() output to the trace buffer. This is more efficient than logging to the console and allows for proper interleaving of dprintk and static tracepoint events. Since using trace_printk() causes scary warnings to pop at boot time, this new option defaults to "n". Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23sunrpc: remove dfprintk_cont() and dfprintk_rcu_cont()Jeff Layton
KERN_CONT hails from a simpler time, when SMP wasn't the norm. These days, it doesn't quite work right since another printk() can always race in between the first one and the one being "continued". Nothing calls dprintk_rcu_cont(), so just remove it. The only caller of dprintk_cont() is in nfs_commit_release_pages(). Just use a normal dprintk() there instead, since this is not SMP-safe anyway. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-21SUNRPC: Move the svc_rpcb_cleanup() call sitesChuck Lever
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: unregister with rpcbind when deleting a transportOlga Kornievskaia
When a listener is added, a part of creation of transport also registers program/port with rpcbind. However, when the listener is removed, while transport goes away, rpcbind still has the entry for that port/type. When deleting the transport, unregister with rpcbind when appropriate. ---v2 created a new xpt_flag XPT_RPCB_UNREG to mark TCP and UDP transport and at xprt destroy send rpcbind unregister if flag set. Suggested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: d093c9089260 ("nfsd: fix management of listener transports") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21sunrpc: Change ret code of xdr_stream_decode_opaque_fixedSergey Bashirov
Since the opaque is fixed in size, the caller already knows how many bytes were decoded, on success. Thus, xdr_stream_decode_opaque_fixed() doesn't need to return that value. And, xdr_stream_decode_u32 and _u64 both return zero on success. This patch simplifies the caller's error checking to avoid potential integer promotion issues. Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-08-09Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - don't inherit NFS filesystem capabilities when crossing from one filesystem to another Bugfixes: - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers - NFS improve bounds checking in nfs_fh_to_dentry() - NFS Fix allocation errors when writing to a NFS file backed loopback device - NFSv4: More listxattr fixes - SUNRPC: fix client handling of TLS alerts - pNFS block/scsi layout fix for an uninitialised pointer dereference - pNFS block/scsi layout fixes for the extent encoding, stripe mapping, and disk offset overflows - pNFS layoutcommit work around for RPC size limitations - pNFS/flexfiles avoid looping when handling fatal errors after layoutget - localio: fix various race conditions Features and cleanups: - Add NFSv4 support for retrieving the btime - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC - NFS: Support using a kernel keyring to store TLS certificates - NFSv4: Speed up delegation lookup using a hash table - Assorted cleanups to remove unused variables and struct fields - Assorted new tracepoints to improve debugging" * tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits) NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh() NFS/localio: nfs_close_local_fh() fix check for file closed NFSv4: Remove duplicate lookups, capability probes and fsinfo calls NFS: Fix the setting of capabilities when automounting a new filesystem sunrpc: fix client side handling of tls alerts nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock() NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY NFSv4.2: another fix for listxattr NFS: Fix filehandle bounds checking in nfs_fh_to_dentry() SUNRPC: Silence warnings about parameters not being described NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final() NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate() NFS: use a hash table for delegation lookup NFS: track active delegations per-server NFS: move the delegation_watermark module parameter NFS: cleanup nfs_inode_reclaim_delegation NFS: cleanup error handling in nfs4_server_common_setup pNFS/flexfiles: don't attempt pnfs on fatal DS errors NFS: drop __exit from nfs_exit_keyring ...
2025-07-28Merge tag 'pull-rpc_pipefs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rpc_pipefs updates from Al Viro: "Massage rpc_pipefs to use saner primitives and clean up the APIs provided to the rest of the kernel" * tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rpc_create_client_dir(): return 0 or -E... rpc_create_client_dir(): don't bother with rpc_populate() rpc_new_dir(): the last argument is always NULL rpc_pipe: expand the calls of rpc_mkdir_populate() rpc_gssd_dummy_populate(): don't bother with rpc_populate() rpc_mkpipe_dentry(): switch to simple_start_creating() rpc_pipe: saner primitive for creating regular files rpc_pipe: saner primitive for creating subdirectories rpc_pipe: don't overdo directory locking rpc_mkpipe_dentry(): saner calling conventions rpc_unlink(): saner calling conventions rpc_populate(): lift cleanup into callers rpc_unlink(): use simple_recursive_removal() rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead rpc_pipe: clean failure exits in fill_super new helper: simple_start_creating()
2025-07-14SUNRPC: Remove unused xdr functionsDr. David Alan Gilbert
Remove a bunch of unused xdr_*decode* functions: The last use of xdr_decode_netobj() was removed in 2021 by: commit 7cf96b6d0104 ("lockd: Update the NLMv4 SHARE arguments decoder to use struct xdr_stream") The last use of xdr_decode_string_inplace() was removed in 2021 by: commit 3049e974a7c7 ("lockd: Update the NLMv4 FREE_ALL arguments decoder to use struct xdr_stream") The last use of xdr_stream_decode_opaque() was removed in 2024 by: commit fed8a17c61ff ("xdrgen: typedefs should use the built-in string and opaque functions") The functions xdr_stream_decode_string() and xdr_stream_decode_opaque_dup() were both added in 2018 by the commit 0e779aa70308 ("SUNRPC: Add helpers for decoding opaque and string types") but never used. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250712233006.403226-1-linux@treblig.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-07-14sunrpc: rearrange struct svc_rqst for fewer cachelinesJeff Layton
This shrinks the struct by 4 bytes, but also takes it from 19 to 18 cachelines on x86_64. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: remove SVC_SYSERRJeff Layton
Nothing returns this error code. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: fix handling of unknown auth status codesJeff Layton
In the case of an unknown error code from svc_authenticate or pg_authenticate, return AUTH_ERROR with a status of AUTH_FAILED. Also add the other auth_stat value from RFC 5531, and document all the status codes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>