<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/sunrpc/xprtrdma/rpc_rdma.c, branch docs-6.0</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-6.0</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-6.0'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2022-05-31T21:09:30+00:00</updated>
<entry>
<title>xprtrdma: treat all calls not a bcall when bc_serv is NULL</title>
<updated>2022-05-31T21:09:30+00:00</updated>
<author>
<name>Kinglong Mee</name>
<email>kinglongmee@gmail.com</email>
</author>
<published>2022-05-22T12:36:48+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=11270e7ca268e8d61b5d9e5c3a54bd1550642c9c'/>
<id>urn:sha1:11270e7ca268e8d61b5d9e5c3a54bd1550642c9c</id>
<content type='text'>
When a rdma server returns a fault format reply, nfs v3 client may
treats it as a bcall when bc service is not exist.

The debug message at rpcrdma_bc_receive_call are,

[56579.837169] RPC:       rpcrdma_bc_receive_call: callback XID
00000001, length=20
[56579.837174] RPC:       rpcrdma_bc_receive_call: 00 00 00 01 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 04

After that, rpcrdma_bc_receive_call will meets NULL pointer as,

[  226.057890] BUG: unable to handle kernel NULL pointer dereference at
00000000000000c8
...
[  226.058704] RIP: 0010:_raw_spin_lock+0xc/0x20
...
[  226.059732] Call Trace:
[  226.059878]  rpcrdma_bc_receive_call+0x138/0x327 [rpcrdma]
[  226.060011]  __ib_process_cq+0x89/0x170 [ib_core]
[  226.060092]  ib_cq_poll_work+0x26/0x80 [ib_core]
[  226.060257]  process_one_work+0x1a7/0x360
[  226.060367]  ? create_worker+0x1a0/0x1a0
[  226.060440]  worker_thread+0x30/0x390
[  226.060500]  ? create_worker+0x1a0/0x1a0
[  226.060574]  kthread+0x116/0x130
[  226.060661]  ? kthread_flush_work_fn+0x10/0x10
[  226.060724]  ret_from_fork+0x35/0x40
...

Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
Reviewed-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Remove definitions of RPCDBG_FACILITY</title>
<updated>2022-01-14T15:35:08+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2022-01-13T17:20:30+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c0f26167ddcf94fb94e80fcb20aaac7f7db13c1a'/>
<id>urn:sha1:c0f26167ddcf94fb94e80fcb20aaac7f7db13c1a</id>
<content type='text'>
Deprecated. dprintk is no longer used in xprtrdma.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Provide a buffer to pad Write chunks of unaligned length</title>
<updated>2021-10-20T22:09:54+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-10-05T14:17:59+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=21037b8c2258ec40de3b31be9ced43ceb3b784f7'/>
<id>urn:sha1:21037b8c2258ec40de3b31be9ced43ceb3b784f7</id>
<content type='text'>
This is a buffer to be left persistently registered while a
connection is up. Connection tear-down will automatically DMA-unmap,
invalidate, and dereg the MR. A persistently registered buffer is
lower in cost to provide, and it can never be coalesced into the
RDMA segment that carries the data payload.

An RPC that provisions a Write chunk with a non-aligned length now
uses this MR rather than the tail buffer of the RPC's rq_rcv_buf.

Reviewed-By: Tom Talpey &lt;tom@talpey.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Revert 586a0787ce35</title>
<updated>2021-05-27T12:46:19+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-05-26T19:35:20+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ae605ee9830840f14566a3b1cde27fa8096dbdd4'/>
<id>urn:sha1:ae605ee9830840f14566a3b1cde27fa8096dbdd4</id>
<content type='text'>
Commit 9ed5af268e88 ("SUNRPC: Clean up the handling of page padding
in rpc_prepare_reply_pages()") [Dec 2020] affects RPC Replies that
have a data payload (i.e., Write chunks).

rpcrdma_prepare_readch(), as its name suggests, sets up Read chunks
which are data payloads within RPC Calls. Those payloads are
constructed by xdr_write_pages(), which continues to stuff the call
buffer's tail kvec with the payload's XDR roundup. Thus removing
the tail buffer logic in rpcrdma_prepare_readch() was the wrong
thing to do.

Fixes: 586a0787ce35 ("xprtrdma: Clean up rpcrdma_prepare_readch()")
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Do not wake RPC consumer on a failed LocalInv</title>
<updated>2021-04-26T13:25:19+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-04-19T18:03:19+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8a053433de00380a9c5758d94c7c2ec2e25321fe'/>
<id>urn:sha1:8a053433de00380a9c5758d94c7c2ec2e25321fe</id>
<content type='text'>
Throw away any reply where the LocalInv flushes or could not be
posted. The registered memory region is in an unknown state until
the disconnect completes.

rpcrdma_xprt_disconnect() will find and release the MR. No need to
put it back on the MR free list in this case.

The client retransmits pending RPC requests once it reestablishes a
fresh connection, so a replacement reply should be forthcoming on
the next connection instance.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Delete rpcrdma_recv_buffer_put()</title>
<updated>2021-04-26T13:24:12+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-04-19T18:02:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c35ca60d490e32b7e7d21f344693ea29d4f4a9d3'/>
<id>urn:sha1:c35ca60d490e32b7e7d21f344693ea29d4f4a9d3</id>
<content type='text'>
Clean up: The name recv_buffer_put() is a vestige of older code,
and the function is just a wrapper for the newer rpcrdma_rep_put().
In most of the existing call sites, a pointer to the owning
rpcrdma_buffer is already available.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Fix cwnd update ordering</title>
<updated>2021-04-26T13:24:05+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-04-19T18:02:41+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=35d8b10a25884050bb3b0149b62c3818ec59f77c'/>
<id>urn:sha1:35d8b10a25884050bb3b0149b62c3818ec59f77c</id>
<content type='text'>
After a reconnect, the reply handler is opening the cwnd (and thus
enabling more RPC Calls to be sent) /before/ rpcrdma_post_recvs()
can post enough Receive WRs to receive their replies. This causes an
RNR and the new connection is lost immediately.

The race is most clearly exposed when KASAN and disconnect injection
are enabled. This slows down rpcrdma_rep_create() enough to allow
the send side to post a bunch of RPC Calls before the Receive
completion handler can invoke ib_post_recv().

Fixes: 2ae50ad68cd7 ("xprtrdma: Close window between waking RPC senders and posting Receives")
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Clean up rpcrdma_prepare_readch()</title>
<updated>2021-02-05T20:54:03+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-02-05T16:31:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=586a0787ce35f2e32d399b7e344e634e07de2997'/>
<id>urn:sha1:586a0787ce35f2e32d399b7e344e634e07de2997</id>
<content type='text'>
Since commit 9ed5af268e88 ("SUNRPC: Clean up the handling of page
padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS client
passes payload data to the transport with the padding in xdr-&gt;pages
instead of in the send buffer's tail kvec. There's no need for the
extra logic to advance the base of the tail kvec because the upper
layer no longer places XDR padding there.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
</content>
</entry>
<entry>
<title>xprtrdma: Pad optimization, revisited</title>
<updated>2021-02-05T16:16:56+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-02-04T16:59:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2324fbedc207568b11354b146c3fc4ff2b89d588'/>
<id>urn:sha1:2324fbedc207568b11354b146c3fc4ff2b89d588</id>
<content type='text'>
The NetApp Linux team discovered that with NFS/RDMA servers that do
not support RFC 8797, the Linux client is forming NFSv4.x WRITE
requests incorrectly.

In this case, the Linux NFS client disables implicit chunk round-up
for odd-length Read and Write chunks. The goal was to support old
servers that needed that padding to be sent explicitly by clients.

In that case the Linux NFS included the tail kvec in the Read chunk,
since the tail contains any needed padding. That meant a separate
memory registration is needed for the tail kvec, adding to the cost
of forming such requests. To avoid that cost for a mere 3 bytes of
zeroes that are always ignored by receivers, we try to use implicit
roundup when possible.

For NFSv4.x, the tail kvec also sometimes contains a trailing
GETATTR operation. The Linux NFS client unintentionally includes
that GETATTR operation in the Read chunk as well as inline.

The fix is simply to /never/ include the tail kvec when forming a
data payload Read chunk. The padding is thus now always present.

Note that since commit 9ed5af268e88 ("SUNRPC: Clean up the handling
of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS
client passes payload data to the transport with the padding in
xdr-&gt;pages instead of in the send buffer's tail kvec. So now the
Linux NFS client appends XDR padding to all odd-sized Read chunks.
This shouldn't be a problem because:

 - RFC 8166-compliant servers are supposed to work with or without
   that XDR padding in Read chunks.

 - Since the padding is now in the same memory region as the data
   payload, a separate memory registration is not needed. In
   addition, the link layer extends data in RDMA Read responses to
   4-byte boundaries anyway. Thus there is now no savings when the
   padding is not included.

Because older kernels include the payload's XDR padding in the
tail kvec, a fix there will be more complicated. Thus backporting
this patch is not recommended.

Reported by: Olga Kornievskaia &lt;Olga.Kornievskaia@netapp.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Reviewed-by: Tom Talpey &lt;tom@talpey.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
</content>
</entry>
<entry>
<title>rpcrdma: Fix comments about reverse-direction operation</title>
<updated>2021-02-05T16:16:56+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2021-02-04T16:59:19+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=84dff5eb86ceb2a356d5409fc1af21a366bf3c35'/>
<id>urn:sha1:84dff5eb86ceb2a356d5409fc1af21a366bf3c35</id>
<content type='text'>
During the final stages of publication of RFC 8167, reviewers
requested that we use the term "reverse direction" rather than
"backwards direction". Update comments to reflect this preference.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Reviewed-by: Tom Talpey &lt;tom@talpey.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
</content>
</entry>
</feed>
