diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-08-29 11:33:01 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-08-29 11:33:01 -0700 |
commit | bd6c11bc43c496cddfc6cf603b5d45365606dbd5 (patch) | |
tree | 36318fa68f784d397111991177d65bd6325189c4 /net/smc | |
parent | 68cf01760bc0891074e813b9bb06d2696cac1c01 (diff) | |
parent | c873512ef3a39cc1a605b7a5ff2ad0a33d619aa8 (diff) | |
download | lwn-bd6c11bc43c496cddfc6cf603b5d45365606dbd5.tar.gz lwn-bd6c11bc43c496cddfc6cf603b5d45365606dbd5.zip |
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory
pressure
- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers:
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...
Diffstat (limited to 'net/smc')
-rw-r--r-- | net/smc/af_smc.c | 88 | ||||
-rw-r--r-- | net/smc/smc.h | 5 | ||||
-rw-r--r-- | net/smc/smc_clc.c | 147 | ||||
-rw-r--r-- | net/smc/smc_clc.h | 53 | ||||
-rw-r--r-- | net/smc/smc_core.c | 13 | ||||
-rw-r--r-- | net/smc/smc_core.h | 26 | ||||
-rw-r--r-- | net/smc/smc_ib.h | 1 | ||||
-rw-r--r-- | net/smc/smc_llc.c | 25 |
8 files changed, 298 insertions, 60 deletions
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index f5834af5fad5..bacdd971615e 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -641,20 +641,22 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc) smc_llc_link_active(link); smcr_lgr_set_type(link->lgr, SMC_LGR_SINGLE); - /* optional 2nd link, receive ADD LINK request from server */ - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, - SMC_LLC_ADD_LINK); - if (!qentry) { - struct smc_clc_msg_decline dclc; - - rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), - SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT); - if (rc == -EAGAIN) - rc = 0; /* no DECLINE received, go with one link */ - return rc; + if (link->lgr->max_links > 1) { + /* optional 2nd link, receive ADD LINK request from server */ + qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, + SMC_LLC_ADD_LINK); + if (!qentry) { + struct smc_clc_msg_decline dclc; + + rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), + SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT); + if (rc == -EAGAIN) + rc = 0; /* no DECLINE received, go with one link */ + return rc; + } + smc_llc_flow_qentry_clr(&link->lgr->llc_flow_lcl); + smc_llc_cli_add_link(link, qentry); } - smc_llc_flow_qentry_clr(&link->lgr->llc_flow_lcl); - smc_llc_cli_add_link(link, qentry); return 0; } @@ -1144,7 +1146,7 @@ static int smc_connect_ism_vlan_cleanup(struct smc_sock *smc, #define SMC_CLC_MAX_ACCEPT_LEN \ (sizeof(struct smc_clc_msg_accept_confirm_v2) + \ - sizeof(struct smc_clc_first_contact_ext) + \ + sizeof(struct smc_clc_first_contact_ext_v2x) + \ sizeof(struct smc_clc_msg_trail)) /* CLC handshake during connect */ @@ -1198,8 +1200,8 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, struct smc_clc_msg_accept_confirm_v2 *clc_v2 = (struct smc_clc_msg_accept_confirm_v2 *)aclc; struct smc_clc_first_contact_ext *fce = - (struct smc_clc_first_contact_ext *) - (((u8 *)clc_v2) + sizeof(*clc_v2)); + smc_get_clc_first_contact_ext(clc_v2, false); + int rc; if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) return 0; @@ -1218,6 +1220,12 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, return SMC_CLC_DECL_NOINDIRECT; } } + + ini->release_nr = fce->release; + rc = smc_clc_clnt_v2x_features_validate(fce, ini); + if (rc) + return rc; + return 0; } @@ -1236,6 +1244,8 @@ static int smc_connect_rdma(struct smc_sock *smc, memcpy(ini->peer_systemid, aclc->r0.lcl.id_for_peer, SMC_SYSTEMID_LEN); memcpy(ini->peer_gid, aclc->r0.lcl.gid, SMC_GID_SIZE); memcpy(ini->peer_mac, aclc->r0.lcl.mac, ETH_ALEN); + ini->max_conns = SMC_CONN_PER_LGR_MAX; + ini->max_links = SMC_LINKS_ADD_LNK_MAX; reason_code = smc_connect_rdma_v2_prepare(smc, aclc, ini); if (reason_code) @@ -1386,6 +1396,16 @@ static int smc_connect_ism(struct smc_sock *smc, struct smc_clc_msg_accept_confirm_v2 *aclc_v2 = (struct smc_clc_msg_accept_confirm_v2 *)aclc; + if (ini->first_contact_peer) { + struct smc_clc_first_contact_ext *fce = + smc_get_clc_first_contact_ext(aclc_v2, true); + + ini->release_nr = fce->release; + rc = smc_clc_clnt_v2x_features_validate(fce, ini); + if (rc) + return rc; + } + rc = smc_v2_determine_accepted_chid(aclc_v2, ini); if (rc) return rc; @@ -1420,7 +1440,7 @@ static int smc_connect_ism(struct smc_sock *smc, } rc = smc_clc_send_confirm(smc, ini->first_contact_local, - aclc->hdr.version, eid, NULL); + aclc->hdr.version, eid, ini); if (rc) goto connect_abort; mutex_unlock(&smc_server_lgr_pending); @@ -1820,7 +1840,7 @@ void smc_close_non_accepted(struct sock *sk) lock_sock(sk); if (!sk->sk_lingertime) /* wait for peer closing */ - sk->sk_lingertime = SMC_MAX_STREAM_WAIT_TIMEOUT; + WRITE_ONCE(sk->sk_lingertime, SMC_MAX_STREAM_WAIT_TIMEOUT); __smc_release(smc); release_sock(sk); sock_put(sk); /* sock_hold above */ @@ -1870,10 +1890,12 @@ static int smcr_serv_conf_first_link(struct smc_sock *smc) smc_llc_link_active(link); smcr_lgr_set_type(link->lgr, SMC_LGR_SINGLE); - down_write(&link->lgr->llc_conf_mutex); - /* initial contact - try to establish second link */ - smc_llc_srv_add_link(link, NULL); - up_write(&link->lgr->llc_conf_mutex); + if (link->lgr->max_links > 1) { + down_write(&link->lgr->llc_conf_mutex); + /* initial contact - try to establish second link */ + smc_llc_srv_add_link(link, NULL); + up_write(&link->lgr->llc_conf_mutex); + } return 0; } @@ -1996,6 +2018,10 @@ static int smc_listen_v2_check(struct smc_sock *new_smc, } } + ini->release_nr = pclc_v2_ext->hdr.flag.release; + if (pclc_v2_ext->hdr.flag.release > SMC_RELEASE) + ini->release_nr = SMC_RELEASE; + out: if (!ini->smcd_version && !ini->smcr_version) return rc; @@ -2430,6 +2456,10 @@ static void smc_listen_work(struct work_struct *work) if (rc) goto out_decl; + rc = smc_clc_srv_v2x_features_validate(pclc, ini); + if (rc) + goto out_decl; + mutex_lock(&smc_server_lgr_pending); smc_close_init(new_smc); smc_rx_init(new_smc); @@ -2443,7 +2473,7 @@ static void smc_listen_work(struct work_struct *work) /* send SMC Accept CLC message */ accept_version = ini->is_smcd ? ini->smcd_version : ini->smcr_version; rc = smc_clc_send_accept(new_smc, ini->first_contact_local, - accept_version, ini->negotiated_eid); + accept_version, ini->negotiated_eid, ini); if (rc) goto out_unlock; @@ -2462,6 +2492,18 @@ static void smc_listen_work(struct work_struct *work) goto out_decl; } + rc = smc_clc_v2x_features_confirm_check(cclc, ini); + if (rc) { + if (!ini->is_smcd) + goto out_unlock; + goto out_decl; + } + + /* fce smc release version is needed in smc_listen_rdma_finish, + * so save fce info here. + */ + smc_conn_save_peer_info_fce(new_smc, cclc); + /* finish worker */ if (!ini->is_smcd) { rc = smc_listen_rdma_finish(new_smc, cclc, diff --git a/net/smc/smc.h b/net/smc/smc.h index 1f2b912c43d1..24745fde4ac2 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -21,7 +21,10 @@ #define SMC_V1 1 /* SMC version V1 */ #define SMC_V2 2 /* SMC version V2 */ -#define SMC_RELEASE 0 + +#define SMC_RELEASE_0 0 +#define SMC_RELEASE_1 1 +#define SMC_RELEASE SMC_RELEASE_1 /* the latest release version */ #define SMCPROTO_SMC 0 /* SMC protocol, IPv4 */ #define SMCPROTO_SMC6 1 /* SMC protocol, IPv6 */ diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c index c90d9e5dda54..8deb46c28f1d 100644 --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -391,9 +391,7 @@ smc_clc_msg_acc_conf_valid(struct smc_clc_msg_accept_confirm_v2 *clc_v2) return false; } else { if (hdr->typev1 == SMC_TYPE_D && - ntohs(hdr->length) != SMCD_CLC_ACCEPT_CONFIRM_LEN_V2 && - (ntohs(hdr->length) != SMCD_CLC_ACCEPT_CONFIRM_LEN_V2 + - sizeof(struct smc_clc_first_contact_ext))) + ntohs(hdr->length) < SMCD_CLC_ACCEPT_CONFIRM_LEN_V2) return false; if (hdr->typev1 == SMC_TYPE_R && ntohs(hdr->length) < SMCR_CLC_ACCEPT_CONFIRM_LEN_V2) @@ -420,13 +418,29 @@ smc_clc_msg_decl_valid(struct smc_clc_msg_decline *dclc) return true; } -static void smc_clc_fill_fce(struct smc_clc_first_contact_ext *fce, int *len) +static int smc_clc_fill_fce(struct smc_clc_first_contact_ext_v2x *fce, + struct smc_init_info *ini) { + int ret = sizeof(*fce); + memset(fce, 0, sizeof(*fce)); - fce->os_type = SMC_CLC_OS_LINUX; - fce->release = SMC_RELEASE; - memcpy(fce->hostname, smc_hostname, sizeof(smc_hostname)); - (*len) += sizeof(*fce); + fce->fce_v2_base.os_type = SMC_CLC_OS_LINUX; + fce->fce_v2_base.release = ini->release_nr; + memcpy(fce->fce_v2_base.hostname, smc_hostname, sizeof(smc_hostname)); + if (ini->is_smcd && ini->release_nr < SMC_RELEASE_1) { + ret = sizeof(struct smc_clc_first_contact_ext); + goto out; + } + + if (ini->release_nr >= SMC_RELEASE_1) { + if (!ini->is_smcd) { + fce->max_conns = ini->max_conns; + fce->max_links = ini->max_links; + } + } + +out: + return ret; } /* check if received message has a correct header length and contains valid @@ -927,8 +941,11 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_init_info *ini) sizeof(struct smc_clc_smcd_gid_chid); } } - if (smcr_indicated(ini->smc_type_v2)) + if (smcr_indicated(ini->smc_type_v2)) { memcpy(v2_ext->roce, ini->smcrv2.ib_gid_v2, SMC_GID_SIZE); + v2_ext->max_conns = SMC_CONN_PER_LGR_PREFER; + v2_ext->max_links = SMC_LINKS_PER_LGR_MAX_PREFER; + } pclc_base->hdr.length = htons(plen); memcpy(trl->eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER)); @@ -986,13 +1003,13 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, u8 *eid, struct smc_init_info *ini) { struct smc_connection *conn = &smc->conn; + struct smc_clc_first_contact_ext_v2x fce; struct smc_clc_msg_accept_confirm *clc; - struct smc_clc_first_contact_ext fce; struct smc_clc_fce_gid_ext gle; struct smc_clc_msg_trail trl; + int i, len, fce_len; struct kvec vec[5]; struct msghdr msg; - int i, len; /* send SMC Confirm CLC msg */ clc = (struct smc_clc_msg_accept_confirm *)clc_v2; @@ -1018,8 +1035,10 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, if (eid && eid[0]) memcpy(clc_v2->d1.eid, eid, SMC_MAX_EID_LEN); len = SMCD_CLC_ACCEPT_CONFIRM_LEN_V2; - if (first_contact) - smc_clc_fill_fce(&fce, &len); + if (first_contact) { + fce_len = smc_clc_fill_fce(&fce, ini); + len += fce_len; + } clc_v2->hdr.length = htons(len); } memcpy(trl.eyecatcher, SMCD_EYECATCHER, @@ -1063,15 +1082,14 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, memcpy(clc_v2->r1.eid, eid, SMC_MAX_EID_LEN); len = SMCR_CLC_ACCEPT_CONFIRM_LEN_V2; if (first_contact) { - smc_clc_fill_fce(&fce, &len); - fce.v2_direct = !link->lgr->uses_gateway; - memset(&gle, 0, sizeof(gle)); - if (ini && clc->hdr.type == SMC_CLC_CONFIRM) { + fce_len = smc_clc_fill_fce(&fce, ini); + len += fce_len; + fce.fce_v2_base.v2_direct = !link->lgr->uses_gateway; + if (clc->hdr.type == SMC_CLC_CONFIRM) { + memset(&gle, 0, sizeof(gle)); gle.gid_cnt = ini->smcrv2.gidlist.len; len += sizeof(gle); len += gle.gid_cnt * sizeof(gle.gid[0]); - } else { - len += sizeof(gle.reserved); } } clc_v2->hdr.length = htons(len); @@ -1094,7 +1112,7 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, sizeof(trl); if (version > SMC_V1 && first_contact) { vec[i].iov_base = &fce; - vec[i++].iov_len = sizeof(fce); + vec[i++].iov_len = fce_len; if (!conn->lgr->is_smcd) { if (clc->hdr.type == SMC_CLC_CONFIRM) { vec[i].iov_base = &gle; @@ -1102,9 +1120,6 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, vec[i].iov_base = &ini->smcrv2.gidlist.list; vec[i++].iov_len = gle.gid_cnt * sizeof(gle.gid[0]); - } else { - vec[i].iov_base = &gle.reserved; - vec[i++].iov_len = sizeof(gle.reserved); } } } @@ -1141,7 +1156,7 @@ int smc_clc_send_confirm(struct smc_sock *smc, bool clnt_first_contact, /* send CLC ACCEPT message across internal TCP socket */ int smc_clc_send_accept(struct smc_sock *new_smc, bool srv_first_contact, - u8 version, u8 *negotiated_eid) + u8 version, u8 *negotiated_eid, struct smc_init_info *ini) { struct smc_clc_msg_accept_confirm_v2 aclc_v2; int len; @@ -1149,13 +1164,95 @@ int smc_clc_send_accept(struct smc_sock *new_smc, bool srv_first_contact, memset(&aclc_v2, 0, sizeof(aclc_v2)); aclc_v2.hdr.type = SMC_CLC_ACCEPT; len = smc_clc_send_confirm_accept(new_smc, &aclc_v2, srv_first_contact, - version, negotiated_eid, NULL); + version, negotiated_eid, ini); if (len < ntohs(aclc_v2.hdr.length)) len = len >= 0 ? -EPROTO : -new_smc->clcsock->sk->sk_err; return len > 0 ? 0 : len; } +int smc_clc_srv_v2x_features_validate(struct smc_clc_msg_proposal *pclc, + struct smc_init_info *ini) +{ + struct smc_clc_v2_extension *pclc_v2_ext; + + ini->max_conns = SMC_CONN_PER_LGR_MAX; + ini->max_links = SMC_LINKS_ADD_LNK_MAX; + + if ((!(ini->smcd_version & SMC_V2) && !(ini->smcr_version & SMC_V2)) || + ini->release_nr < SMC_RELEASE_1) + return 0; + + pclc_v2_ext = smc_get_clc_v2_ext(pclc); + if (!pclc_v2_ext) + return SMC_CLC_DECL_NOV2EXT; + + if (ini->smcr_version & SMC_V2) { + ini->max_conns = min_t(u8, pclc_v2_ext->max_conns, SMC_CONN_PER_LGR_PREFER); + if (ini->max_conns < SMC_CONN_PER_LGR_MIN) + return SMC_CLC_DECL_MAXCONNERR; + + ini->max_links = min_t(u8, pclc_v2_ext->max_links, SMC_LINKS_PER_LGR_MAX_PREFER); + if (ini->max_links < SMC_LINKS_ADD_LNK_MIN) + return SMC_CLC_DECL_MAXLINKERR; + } + + return 0; +} + +int smc_clc_clnt_v2x_features_validate(struct smc_clc_first_contact_ext *fce, + struct smc_init_info *ini) +{ + struct smc_clc_first_contact_ext_v2x *fce_v2x = + (struct smc_clc_first_contact_ext_v2x *)fce; + + if (ini->release_nr < SMC_RELEASE_1) + return 0; + + if (!ini->is_smcd) { + if (fce_v2x->max_conns < SMC_CONN_PER_LGR_MIN) + return SMC_CLC_DECL_MAXCONNERR; + ini->max_conns = fce_v2x->max_conns; + + if (fce_v2x->max_links > SMC_LINKS_ADD_LNK_MAX || + fce_v2x->max_links < SMC_LINKS_ADD_LNK_MIN) + return SMC_CLC_DECL_MAXLINKERR; + ini->max_links = fce_v2x->max_links; + } + + return 0; +} + +int smc_clc_v2x_features_confirm_check(struct smc_clc_msg_accept_confirm *cclc, + struct smc_init_info *ini) +{ + struct smc_clc_msg_accept_confirm_v2 *clc_v2 = + (struct smc_clc_msg_accept_confirm_v2 *)cclc; + struct smc_clc_first_contact_ext *fce = + smc_get_clc_first_contact_ext(clc_v2, ini->is_smcd); + struct smc_clc_first_contact_ext_v2x *fce_v2x = + (struct smc_clc_first_contact_ext_v2x *)fce; + + if (cclc->hdr.version == SMC_V1 || + !(cclc->hdr.typev2 & SMC_FIRST_CONTACT_MASK)) + return 0; + + if (ini->release_nr != fce->release) + return SMC_CLC_DECL_RELEASEERR; + + if (fce->release < SMC_RELEASE_1) + return 0; + + if (!ini->is_smcd) { + if (fce_v2x->max_conns != ini->max_conns) + return SMC_CLC_DECL_MAXCONNERR; + if (fce_v2x->max_links != ini->max_links) + return SMC_CLC_DECL_MAXLINKERR; + } + + return 0; +} + void smc_clc_get_hostname(u8 **host) { *host = &smc_hostname[0]; diff --git a/net/smc/smc_clc.h b/net/smc/smc_clc.h index 5fee545c9a10..c5c8e7db775a 100644 --- a/net/smc/smc_clc.h +++ b/net/smc/smc_clc.h @@ -45,6 +45,9 @@ #define SMC_CLC_DECL_NOSEID 0x03030006 /* peer sent no SEID */ #define SMC_CLC_DECL_NOSMCD2DEV 0x03030007 /* no SMC-Dv2 device found */ #define SMC_CLC_DECL_NOUEID 0x03030008 /* peer sent no UEID */ +#define SMC_CLC_DECL_RELEASEERR 0x03030009 /* release version negotiate failed */ +#define SMC_CLC_DECL_MAXCONNERR 0x0303000a /* max connections negotiate failed */ +#define SMC_CLC_DECL_MAXLINKERR 0x0303000b /* max links negotiate failed */ #define SMC_CLC_DECL_MODEUNSUPP 0x03040000 /* smc modes do not match (R or D)*/ #define SMC_CLC_DECL_RMBE_EC 0x03050000 /* peer has eyecatcher in RMBE */ #define SMC_CLC_DECL_OPTUNSUPP 0x03060000 /* fastopen sockopt not supported */ @@ -133,7 +136,9 @@ struct smc_clc_smcd_gid_chid { struct smc_clc_v2_extension { struct smc_clnt_opts_area_hdr hdr; u8 roce[16]; /* RoCEv2 GID */ - u8 reserved[16]; + u8 max_conns; + u8 max_links; + u8 reserved[14]; u8 user_eids[][SMC_MAX_EID_LEN]; }; @@ -147,7 +152,9 @@ struct smc_clc_msg_proposal_prefix { /* prefix part of clc proposal message*/ struct smc_clc_msg_smcd { /* SMC-D GID information */ struct smc_clc_smcd_gid_chid ism; /* ISM native GID+CHID of requestor */ __be16 v2_ext_offset; /* SMC Version 2 Extension Offset */ - u8 reserved[28]; + u8 vendor_oui[3]; /* vendor organizationally unique identifier */ + u8 vendor_exp_options[5]; + u8 reserved[20]; }; struct smc_clc_smcd_v2_extension { @@ -231,8 +238,19 @@ struct smc_clc_first_contact_ext { u8 hostname[SMC_MAX_HOSTNAME_LEN]; }; +struct smc_clc_first_contact_ext_v2x { + struct smc_clc_first_contact_ext fce_v2_base; + u8 max_conns; /* for SMC-R only */ + u8 max_links; /* for SMC-R only */ + u8 reserved3[2]; + __be32 vendor_exp_options; + u8 reserved4[8]; +} __packed; /* format defined in + * IBM Shared Memory Communications Version 2 (Third Edition) + * (https://www.ibm.com/support/pages/node/7009315) + */ + struct smc_clc_fce_gid_ext { - u8 reserved[16]; u8 gid_cnt; u8 reserved2[3]; u8 gid[][SMC_GID_SIZE]; @@ -370,6 +388,27 @@ smc_get_clc_smcd_v2_ext(struct smc_clc_v2_extension *prop_v2ext) ntohs(prop_v2ext->hdr.smcd_v2_ext_offset)); } +static inline struct smc_clc_first_contact_ext * +smc_get_clc_first_contact_ext(struct smc_clc_msg_accept_confirm_v2 *clc_v2, + bool is_smcd) +{ + int clc_v2_len; + + if (clc_v2->hdr.version == SMC_V1 || + !(clc_v2->hdr.typev2 & SMC_FIRST_CONTACT_MASK)) + return NULL; + + if (is_smcd) + clc_v2_len = + offsetofend(struct smc_clc_msg_accept_confirm_v2, d1); + else + clc_v2_len = + offsetofend(struct smc_clc_msg_accept_confirm_v2, r1); + + return (struct smc_clc_first_contact_ext *)(((u8 *)clc_v2) + + clc_v2_len); +} + struct smcd_dev; struct smc_init_info; @@ -382,7 +421,13 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_init_info *ini); int smc_clc_send_confirm(struct smc_sock *smc, bool clnt_first_contact, u8 version, u8 *eid, struct smc_init_info *ini); int smc_clc_send_accept(struct smc_sock *smc, bool srv_first_contact, - u8 version, u8 *negotiated_eid); + u8 version, u8 *negotiated_eid, struct smc_init_info *ini); +int smc_clc_srv_v2x_features_validate(struct smc_clc_msg_proposal *pclc, + struct smc_init_info *ini); +int smc_clc_clnt_v2x_features_validate(struct smc_clc_first_contact_ext *fce, + struct smc_init_info *ini); +int smc_clc_v2x_features_confirm_check(struct smc_clc_msg_accept_confirm *cclc, + struct smc_init_info *ini); void smc_clc_init(void) __init; void smc_clc_exit(void); void smc_clc_get_hostname(u8 **host); diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 6b78075404d7..bd01dd31e4bd 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -319,6 +319,10 @@ static int smc_nl_fill_smcr_lgr_v2(struct smc_link_group *lgr, goto errattr; if (nla_put_u8(skb, SMC_NLA_LGR_R_V2_DIRECT, !lgr->uses_gateway)) goto errv2attr; + if (nla_put_u8(skb, SMC_NLA_LGR_R_V2_MAX_CONNS, lgr->max_conns)) + goto errv2attr; + if (nla_put_u8(skb, SMC_NLA_LGR_R_V2_MAX_LINKS, lgr->max_links)) + goto errv2attr; nla_nest_end(skb, v2_attrs); return 0; @@ -895,9 +899,13 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini) lgr->uses_gateway = ini->smcrv2.uses_gateway; memcpy(lgr->nexthop_mac, ini->smcrv2.nexthop_mac, ETH_ALEN); + lgr->max_conns = ini->max_conns; + lgr->max_links = ini->max_links; } else { ibdev = ini->ib_dev; ibport = ini->ib_port; + lgr->max_conns = SMC_CONN_PER_LGR_MAX; + lgr->max_links = SMC_LINKS_ADD_LNK_MAX; } memcpy(lgr->pnet_id, ibdev->pnetid[ibport - 1], SMC_MAX_PNETID_LEN); @@ -1664,6 +1672,9 @@ void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport) !rdma_dev_access_netns(smcibdev->ibdev, lgr->net)) continue; + if (lgr->type == SMC_LGR_SINGLE && lgr->max_links <= 1) + continue; + /* trigger local add link processing */ link = smc_llc_usable_link(lgr); if (link) @@ -1888,7 +1899,7 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini) (ini->smcd_version == SMC_V2 || lgr->vlan_id == ini->vlan_id) && (role == SMC_CLNT || ini->is_smcd || - (lgr->conns_num < SMC_RMBS_PER_LGR_MAX && + (lgr->conns_num < lgr->max_conns && !bitmap_full(lgr->rtokens_used_mask, SMC_RMBS_PER_LGR_MAX)))) { /* link group found */ ini->first_contact_local = 0; diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 1645fba0d2d3..120027d40469 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -22,6 +22,15 @@ #include "smc_ib.h" #define SMC_RMBS_PER_LGR_MAX 255 /* max. # of RMBs per link group */ +#define SMC_CONN_PER_LGR_MIN 16 /* min. # of connections per link group */ +#define SMC_CONN_PER_LGR_MAX 255 /* max. # of connections per link group, + * also is the default value for SMC-R v1 and v2.0 + */ +#define SMC_CONN_PER_LGR_PREFER 255 /* Preferred connections per link group used for + * SMC-R v2.1 and later negotiation, vendors or + * distrubutions may modify it to a value between + * 16-255 as needed. + */ struct smc_lgr_list { /* list of link group definition */ struct list_head list; @@ -164,6 +173,15 @@ struct smc_link { */ #define SMC_LINKS_PER_LGR_MAX 3 #define SMC_SINGLE_LINK 0 +#define SMC_LINKS_ADD_LNK_MIN 1 /* min. # of links per link group */ +#define SMC_LINKS_ADD_LNK_MAX 2 /* max. # of links per link group, also is the + * default value for smc-r v1.0 and v2.0 + */ +#define SMC_LINKS_PER_LGR_MAX_PREFER 2 /* Preferred max links per link group used for + * SMC-R v2.1 and later negotiation, vendors or + * distrubutions may modify it to a value between + * 1-2 as needed. + */ /* tx/rx buffer list element for sndbufs list and rmbs list of a lgr */ struct smc_buf_desc { @@ -331,6 +349,10 @@ struct smc_link_group { __be32 saddr; /* net namespace */ struct net *net; + u8 max_conns; + /* max conn can be assigned to lgr */ + u8 max_links; + /* max links can be added in lgr */ }; struct { /* SMC-D */ u64 peer_gid; @@ -374,6 +396,9 @@ struct smc_init_info { u8 is_smcd; u8 smc_type_v1; u8 smc_type_v2; + u8 release_nr; + u8 max_conns; + u8 max_links; u8 first_contact_peer; u8 first_contact_local; unsigned short vlan_id; @@ -539,7 +564,6 @@ int smc_vlan_by_tcpsk(struct socket *clcsock, struct smc_init_info *ini); void smc_conn_free(struct smc_connection *conn); int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini); -void smc_lgr_schedule_free_work_fast(struct smc_link_group *lgr); int smc_core_init(void); void smc_core_exit(void); diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h index 034295676e88..4df5f8c8a0a1 100644 --- a/net/smc/smc_ib.h +++ b/net/smc/smc_ib.h @@ -96,7 +96,6 @@ void smc_ib_destroy_queue_pair(struct smc_link *lnk); int smc_ib_create_queue_pair(struct smc_link *lnk); int smc_ib_ready_link(struct smc_link *lnk); int smc_ib_modify_qp_rts(struct smc_link *lnk); -int smc_ib_modify_qp_reset(struct smc_link *lnk); int smc_ib_modify_qp_error(struct smc_link *lnk); long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev); int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags, diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c index 90f0b60b196a..018ce8133b02 100644 --- a/net/smc/smc_llc.c +++ b/net/smc/smc_llc.c @@ -52,14 +52,13 @@ struct smc_llc_msg_confirm_link { /* type 0x01 */ u8 link_num; u8 link_uid[SMC_LGR_ID_SIZE]; u8 max_links; - u8 reserved[9]; + u8 max_conns; + u8 reserved[8]; }; #define SMC_LLC_FLAG_ADD_LNK_REJ 0x40 #define SMC_LLC_REJ_RSN_NO_ALT_PATH 1 -#define SMC_LLC_ADD_LNK_MAX_LINKS 2 - struct smc_llc_msg_add_link { /* type 0x02 */ struct smc_llc_hdr hd; u8 sender_mac[ETH_ALEN]; @@ -471,7 +470,12 @@ int smc_llc_send_confirm_link(struct smc_link *link, hton24(confllc->sender_qp_num, link->roce_qp->qp_num); confllc->link_num = link->link_id; memcpy(confllc->link_uid, link->link_uid, SMC_LGR_ID_SIZE); - confllc->max_links = SMC_LLC_ADD_LNK_MAX_LINKS; + confllc->max_links = SMC_LINKS_ADD_LNK_MAX; + if (link->lgr->smc_version == SMC_V2 && + link->lgr->peer_smc_release >= SMC_RELEASE_1) { + confllc->max_conns = link->lgr->max_conns; + confllc->max_links = link->lgr->max_links; + } /* send llc message */ rc = smc_wr_tx_send(link, pend); put_out: @@ -1041,6 +1045,11 @@ int smc_llc_cli_add_link(struct smc_link *link, struct smc_llc_qentry *qentry) goto out_reject; } + if (lgr->type == SMC_LGR_SINGLE && lgr->max_links <= 1) { + rc = 0; + goto out_reject; + } + ini->vlan_id = lgr->vlan_id; if (lgr->smc_version == SMC_V2) { ini->check_smcrv2 = true; @@ -1165,6 +1174,9 @@ static void smc_llc_cli_add_link_invite(struct smc_link *link, lgr->type == SMC_LGR_ASYMMETRIC_PEER) goto out; + if (lgr->type == SMC_LGR_SINGLE && lgr->max_links <= 1) + goto out; + ini = kzalloc(sizeof(*ini), GFP_KERNEL); if (!ini) goto out; @@ -1410,6 +1422,11 @@ int smc_llc_srv_add_link(struct smc_link *link, goto out; } + if (lgr->type == SMC_LGR_SINGLE && lgr->max_links <= 1) { + rc = 0; + goto out; + } + /* ignore client add link recommendation, start new flow */ ini->vlan_id = lgr->vlan_id; if (lgr->smc_version == SMC_V2) { |