<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/smc/smc_clc.c, branch v6.5</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v6.5</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v6.5'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2023-08-09T10:20:28+00:00</updated>
<entry>
<title>net/smc: Fix setsockopt and sysctl to specify same buffer size again</title>
<updated>2023-08-09T10:20:28+00:00</updated>
<author>
<name>Gerd Bayer</name>
<email>gbayer@linux.ibm.com</email>
</author>
<published>2023-08-04T17:06:23+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=833bac7ec392bf75053c8a4fa4c36d4148dac77d'/>
<id>urn:sha1:833bac7ec392bf75053c8a4fa4c36d4148dac77d</id>
<content type='text'>
Commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock
and make them tunable") introduced the net.smc.rmem and net.smc.wmem
sysctls to specify the size of buffers to be used for SMC type
connections. This created a regression for users that specified the
buffer size via setsockopt() as the effective buffer size was now
doubled.

Re-introduce the division by 2 in the SMC buffer create code and level
this out by duplicating the net.smc.[rw]mem values used for initializing
sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both
methods (setsockopt or sysctl) the effective buffer size that they
expect.

Initialize net.smc.[rw]mem from its own constant of 64kB, respectively.
Internal performance tests show that this value is a good compromise
between throughput/latency and memory consumption. Also, this decouples
it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the
module for SMC protocol was loaded. Check that no more than INT_MAX / 2
is assigned to net.smc.[rw]mem, in order to avoid any overflow condition
when that is doubled for use in sk_sndbuf or sk_rcvbuf.

While at it, drop the confusing sk_buf_size variable from
__smc_buf_create and name "compressed" buffer size variables more
consistently.

Background:

Before the commit mentioned above, SMC's buffer allocator in
__smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf
value as initial value to search for appropriate buffers. If the search
resorted to using a bigger buffer when all buffers of the specified
size were busy, the duplicate of the used effective buffer size is
stored back to sk_rcvbuf/sk_sndbuf.

When available, buffers of exactly the size that a user had specified as
input to setsockopt() were used, despite setsockopt()'s documentation in
"man 7 socket" talking of a mandatory duplication:

[...]
       SO_SNDBUF
              Sets  or  gets the maximum socket send buffer in bytes.
              The kernel doubles this value (to allow space for book‐
              keeping  overhead)  when it is set using setsockopt(2),
              and this doubled value is  returned  by  getsockopt(2).
              The     default     value     is     set     by     the
              /proc/sys/net/core/wmem_default file  and  the  maximum
              allowed value is set by the /proc/sys/net/core/wmem_max
              file.  The minimum (doubled) value for this  option  is
              2048.
[...]

Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable")
Co-developed-by: Jan Karcher &lt;jaka@linux.ibm.com&gt;
Signed-off-by: Jan Karcher &lt;jaka@linux.ibm.com&gt;
Reviewed-by: Wenjia Zhang &lt;wenjia@linux.ibm.com&gt;
Reviewed-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: Gerd Bayer &lt;gbayer@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/smc: Separate SMC-D and ISM APIs</title>
<updated>2023-01-25T09:46:48+00:00</updated>
<author>
<name>Stefan Raspl</name>
<email>raspl@linux.ibm.com</email>
</author>
<published>2023-01-23T18:17:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9de4df7b6be1cfca500f8ba21137d53eec45418a'/>
<id>urn:sha1:9de4df7b6be1cfca500f8ba21137d53eec45418a</id>
<content type='text'>
We separate the code implementing the struct smcd_ops API in the ISM
device driver from the functions that may be used by other exploiters of
ISM devices.
Note: We start out small, and don't offer the whole breadth of the ISM
device for public use, as many functions are specific to or likely only
ever used in the context of SMC-D.
This is the third part of a bigger overhaul of the interfaces between SMC
and ISM.

Signed-off-by: Stefan Raspl &lt;raspl@linux.ibm.com&gt;
Signed-off-by: Jan Karcher &lt;jaka@linux.ibm.com&gt;
Signed-off-by: Wenjia Zhang &lt;wenjia@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>use less confusing names for iov_iter direction initializers</title>
<updated>2022-11-25T18:01:55+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2022-09-16T00:25:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=de4eda9de2d957ef2d6a8365a01e26a435e958cb'/>
<id>urn:sha1:de4eda9de2d957ef2d6a8365a01e26a435e958cb</id>
<content type='text'>
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R</title>
<updated>2022-07-18T10:19:17+00:00</updated>
<author>
<name>Wen Gu</name>
<email>guwen@linux.alibaba.com</email>
</author>
<published>2022-07-14T09:44:04+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b8d199451c99b3796b840c350eb74b830c5c869b'/>
<id>urn:sha1:b8d199451c99b3796b840c350eb74b830c5c869b</id>
<content type='text'>
On long-running enterprise production servers, high-order contiguous
memory pages are usually very rare and in most cases we can only get
fragmented pages.

When replacing TCP with SMC-R in such production scenarios, attempting
to allocate high-order physically contiguous sndbufs and RMBs may result
in frequent memory compaction, which will cause unexpected hung issue
and further stability risks.

So this patch is aimed to allow SMC-R link group to use virtually
contiguous sndbufs and RMBs to avoid potential issues mentioned above.
Whether to use physically or virtually contiguous buffers can be set
by sysctl smcr_buf_type.

Note that using virtually contiguous buffers will bring an acceptable
performance regression, which can be mainly divided into two parts:

1) regression in data path, which is brought by additional address
   translation of sndbuf by RNIC in Tx. But in general, translating
   address through MTT is fast.

   Taking 256KB sndbuf and RMB as an example, the comparisons in qperf
   latency and bandwidth test with physically and virtually contiguous
   buffers are as follows:

- client:
  smc_run taskset -c &lt;cpu&gt; qperf &lt;server&gt; -oo msg_size:1:64K:*2\
  -t 5 -vu tcp_{bw|lat}
- server:
  smc_run taskset -c &lt;cpu&gt; qperf

   [latency]
   msgsize              tcp            smcr        smcr-use-virt-buf
   1               11.17 us         7.56 us         7.51 us (-0.67%)
   2               10.65 us         7.74 us         7.56 us (-2.31%)
   4               11.11 us         7.52 us         7.59 us ( 0.84%)
   8               10.83 us         7.55 us         7.51 us (-0.48%)
   16              11.21 us         7.46 us         7.51 us ( 0.71%)
   32              10.65 us         7.53 us         7.58 us ( 0.61%)
   64              10.95 us         7.74 us         7.80 us ( 0.76%)
   128             11.14 us         7.83 us         7.87 us ( 0.47%)
   256             10.97 us         7.94 us         7.92 us (-0.28%)
   512             11.23 us         7.94 us         8.20 us ( 3.25%)
   1024            11.60 us         8.12 us         8.20 us ( 0.96%)
   2048            14.04 us         8.30 us         8.51 us ( 2.49%)
   4096            16.88 us         9.13 us         9.07 us (-0.64%)
   8192            22.50 us        10.56 us        11.22 us ( 6.26%)
   16384           28.99 us        12.88 us        13.83 us ( 7.37%)
   32768           40.13 us        16.76 us        16.95 us ( 1.16%)
   65536           68.70 us        24.68 us        24.85 us ( 0.68%)
   [bandwidth]
   msgsize                tcp              smcr          smcr-use-virt-buf
   1                1.65 MB/s         1.59 MB/s         1.53 MB/s (-3.88%)
   2                3.32 MB/s         3.17 MB/s         3.08 MB/s (-2.67%)
   4                6.66 MB/s         6.33 MB/s         6.09 MB/s (-3.85%)
   8               13.67 MB/s        13.45 MB/s        11.97 MB/s (-10.99%)
   16              25.36 MB/s        27.15 MB/s        24.16 MB/s (-11.01%)
   32              48.22 MB/s        54.24 MB/s        49.41 MB/s (-8.89%)
   64             106.79 MB/s       107.32 MB/s        99.05 MB/s (-7.71%)
   128            210.21 MB/s       202.46 MB/s       201.02 MB/s (-0.71%)
   256            400.81 MB/s       416.81 MB/s       393.52 MB/s (-5.59%)
   512            746.49 MB/s       834.12 MB/s       809.99 MB/s (-2.89%)
   1024          1292.33 MB/s      1641.96 MB/s      1571.82 MB/s (-4.27%)
   2048          2007.64 MB/s      2760.44 MB/s      2717.68 MB/s (-1.55%)
   4096          2665.17 MB/s      4157.44 MB/s      4070.76 MB/s (-2.09%)
   8192          3159.72 MB/s      4361.57 MB/s      4270.65 MB/s (-2.08%)
   16384         4186.70 MB/s      4574.13 MB/s      4501.17 MB/s (-1.60%)
   32768         4093.21 MB/s      4487.42 MB/s      4322.43 MB/s (-3.68%)
   65536         4057.14 MB/s      4735.61 MB/s      4555.17 MB/s (-3.81%)

2) regression in buffer initialization and destruction path, which is
   brought by additional MR operations of sndbufs. But thanks to link
   group buffer reuse mechanism, the impact of this kind of regression
   decreases as times of buffer reuse increases.

   Taking 256KB sndbuf and RMB as an example, latency of some key SMC-R
   buffer-related function obtained by bpftrace are as follows:

   Function                         Phys-bufs           Virt-bufs
   smcr_new_buf_create()             67154 ns            79164 ns
   smc_ib_buf_map_sg()                 525 ns              928 ns
   smc_ib_get_memory_region()       162294 ns           161191 ns
   smc_wr_reg_send()                  9957 ns             9635 ns
   smc_ib_put_memory_region()       203548 ns           198374 ns
   smc_ib_buf_unmap_sg()               508 ns             1158 ns

------------
Test environment notes:
1. Above tests run on 2 VMs within the same Host.
2. The NIC is ConnectX-4Lx, using SRIOV and passing through 2 VFs to
   the each VM respectively.
3. VMs' vCPUs are binded to different physical CPUs, and the binded
   physical CPUs are isolated by `isolcpus=xxx` cmdline.
4. NICs' queue number are set to 1.

Signed-off-by: Wen Gu &lt;guwen@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/smc: use memcpy instead of snprintf to avoid out of bounds read</title>
<updated>2022-04-12T01:28:02+00:00</updated>
<author>
<name>Karsten Graul</name>
<email>kgraul@linux.ibm.com</email>
</author>
<published>2022-04-08T15:10:33+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b1871fd48efc567650dbdc974e5a2342a03fe0d2'/>
<id>urn:sha1:b1871fd48efc567650dbdc974e5a2342a03fe0d2</id>
<content type='text'>
Using snprintf() to convert not null-terminated strings to null
terminated strings may cause out of bounds read in the source string.
Therefore use memcpy() and terminate the target string with a null
afterwards.

Fixes: fa0866625543 ("net/smc: add support for user defined EIDs")
Fixes: 3c572145c24e ("net/smc: add generic netlink support for system EID")
Signed-off-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/smc: Introduce a new conn-&gt;lgr validity check helper</title>
<updated>2022-01-13T13:14:53+00:00</updated>
<author>
<name>Wen Gu</name>
<email>guwen@linux.alibaba.com</email>
</author>
<published>2022-01-13T08:36:41+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ea89c6c0983c39702a4a52ccaa4702e0cb71179b'/>
<id>urn:sha1:ea89c6c0983c39702a4a52ccaa4702e0cb71179b</id>
<content type='text'>
It is no longer suitable to identify whether a smc connection
is registered in a link group through checking if conn-&gt;lgr
is NULL, because conn-&gt;lgr won't be reset even the connection
is unregistered from a link group.

So this patch introduces a new helper smc_conn_lgr_valid() and
replaces all the check of conn-&gt;lgr in original implementation
with the new helper to judge if conn-&gt;lgr is valid to use.

Signed-off-by: Wen Gu &lt;guwen@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/smc: remove redundant re-assignment of pointer link</title>
<updated>2022-01-02T12:14:10+00:00</updated>
<author>
<name>Colin Ian King</name>
<email>colin.i.king@gmail.com</email>
</author>
<published>2021-12-30T15:39:00+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3a856c14c31bb8ba35534e9a2f2273c1b322f372'/>
<id>urn:sha1:3a856c14c31bb8ba35534e9a2f2273c1b322f372</id>
<content type='text'>
The pointer link is being re-assigned the same value that it was
initialized with in the previous declaration statement. The
re-assignment is redundant and can be removed.

Fixes: 387707fdf486 ("net/smc: convert static link ID to dynamic references")
Signed-off-by: Colin Ian King &lt;colin.i.king@gmail.com&gt;
Reviewed-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/smc: add v2 format of CLC decline message</title>
<updated>2021-10-16T13:58:13+00:00</updated>
<author>
<name>Karsten Graul</name>
<email>kgraul@linux.ibm.com</email>
</author>
<published>2021-10-16T09:37:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8ade200c269f8530efde05b616801ed0612d7d72'/>
<id>urn:sha1:8ade200c269f8530efde05b616801ed0612d7d72</id>
<content type='text'>
The CLC decline message changed with SMC-Rv2 and supports up to
4 additional diagnosis codes.

Signed-off-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/smc: add SMC-Rv2 connection establishment</title>
<updated>2021-10-16T13:58:12+00:00</updated>
<author>
<name>Karsten Graul</name>
<email>kgraul@linux.ibm.com</email>
</author>
<published>2021-10-16T09:37:45+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e5c4744cfb598f98672f8d21d59ef2c1fa9c9b5f'/>
<id>urn:sha1:e5c4744cfb598f98672f8d21d59ef2c1fa9c9b5f</id>
<content type='text'>
Send a CLC proposal message, and the remote side process this type of
message and determine the target GID. Check for a valid route to this
GID, and complete the connection establishment.

Signed-off-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/smc: prepare for SMC-Rv2 connection</title>
<updated>2021-10-16T13:58:12+00:00</updated>
<author>
<name>Karsten Graul</name>
<email>kgraul@linux.ibm.com</email>
</author>
<published>2021-10-16T09:37:44+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=42042dbbc2ebb926e594b22490374d9343c746ef'/>
<id>urn:sha1:42042dbbc2ebb926e594b22490374d9343c746ef</id>
<content type='text'>
Prepare the connection establishment with SMC-Rv2. Detect eligible
RoCE cards and indicate all supported SMC modes for the connection.

Signed-off-by: Karsten Graul &lt;kgraul@linux.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
