summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-12-26net/smc: compatible with 128-bits extended GID of virtual ISM deviceWen Gu
According to virtual ISM support feature defined by SMCv2.1, GIDs of virtual ISM device are UUIDs defined by RFC4122, which are 128-bits long. So some adaptation work is required. And note that the GIDs of existing platform firmware ISM devices still remain 64-bits long. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: define a reserved CHID range for virtual ISM devicesWen Gu
According to virtual ISM support feature defined by SMCv2.1, CHIDs in the range 0xFF00 to 0xFFFF are reserved for use by virtual ISM devices. And two helpers are introduced to distinguish virtual ISM devices from the existing platform firmware ISM devices. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: introduce virtual ISM device support featureWen Gu
This introduces virtual ISM device support feature to SMCv2.1 as the first supplemental feature. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: support SMCv2.x supplemental features negotiationWen Gu
This patch adds SMCv2.x supplemental features negotiation. Supported SMCv2.x supplemental features are represented by feature_mask in FCE field. The negotiation process is as follows. Server Client Proposal(features(c-mask bits)) <----------------------------------------- Accept(features(s-mask bits)) -----------------------------------------> Confirm(features(s&c-mask bits)) <----------------------------------------- Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: unify the structs of accept or confirm message for v1 and v2Wen Gu
The structs of CLC accept and confirm messages for SMCv1 and SMCv2 are separately defined and often casted to each other in the code, which may increase the risk of errors caused by future divergence of them. So unify them into one struct for better maintainability. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: introduce sub-functions for smc_clc_send_confirm_accept()Wen Gu
There is a large if-else block in smc_clc_send_confirm_accept() and it is better to split it into two sub-functions. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: rename some 'fce' to 'fce_v2x' for clarityWen Gu
Rename some functions or variables with 'fce' in their name but used in SMCv2.1 as 'fce_v2x' for clarity. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tipc: Remove some excess struct member documentationJonathan Corbet
Remove documentation for nonexistent struct members, addressing these warnings: ./net/tipc/link.c:228: warning: Excess struct member 'media_addr' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'timer' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'refcnt' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'proto_msg' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'pmsg' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'backlog_limit' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'exp_msg_count' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'reset_rcv_checkpt' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'transmitq' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'snt_nxt' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'deferred_queue' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'unacked_window' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'next_out' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'long_msg_seq_no' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'bc_rcvr' description in 'tipc_link' Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Remove dead code and fields for bhash2.Kuniyuki Iwashima
Now all sockets including TIME_WAIT are linked to bhash2 using sock_common.skc_bind_node. We no longer use inet_bind2_bucket.deathrow, sock.sk_bind2_node, and inet_timewait_sock.tw_bind2_node. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Link sk and twsk to tb2->owners using skc_bind_node.Kuniyuki Iwashima
Now we can use sk_bind_node/tw_bind_node for bhash2, which means we need not link TIME_WAIT sockets separately. The dead code and sk_bind2_node will be removed in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Unlink sk from bhash.Kuniyuki Iwashima
Now we do not use tb->owners and can unlink sockets from bhash. sk_bind_node/tw_bind_node are available for bhash2 and will be used in the following patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Check hlist_empty(&tb->bhash2) instead of hlist_empty(&tb->owners).Kuniyuki Iwashima
We use hlist_empty(&tb->owners) to check if the bhash bucket has a socket. We can check the child bhash2 buckets instead. For this to work, the bhash2 bucket must be freed before the bhash bucket. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Iterate tb->bhash2 in inet_csk_bind_conflict().Kuniyuki Iwashima
Sockets in bhash are also linked to bhash2, but TIME_WAIT sockets are linked separately in tb2->deathrow. Let's replace tb->owners iteration in inet_csk_bind_conflict() with two iterations over tb2->owners and tb2->deathrow. This can be done safely under bhash's lock because socket insertion/ deletion in bhash2 happens with bhash's lock held. Note that twsk_for_each_bound_bhash() will be removed later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Rearrange tests in inet_csk_bind_conflict().Kuniyuki Iwashima
The following patch adds code in the !inet_use_bhash2_on_bind(sk) case in inet_csk_bind_conflict(). To avoid adding nest and make the change cleaner, this patch rearranges tests in inet_csk_bind_conflict(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Link bhash2 to bhash.Kuniyuki Iwashima
bhash2 added a new member sk_bind2_node in struct sock to link sockets to bhash2 in addition to bhash. bhash is still needed to search conflicting sockets efficiently from a port for the wildcard address. However, bhash itself need not have sockets. If we link each bhash2 bucket to the corresponding bhash bucket, we can iterate the same set of the sockets from bhash2 via bhash. This patch links bhash2 to bhash only, and the actual use will be in the later patches. Finally, we will remove sk_bind2_node. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Rename tb in inet_bind2_bucket_(init|create)().Kuniyuki Iwashima
Later, we no longer link sockets to bhash. Instead, each bhash2 bucket is linked to the corresponding bhash bucket. Then, we pass the bhash bucket to bhash2 allocation functions as tb. However, tb is already used in inet_bind2_bucket_create() and inet_bind2_bucket_init() as the bhash2 bucket. To make the following diff clear, let's use tb2 for the bhash2 bucket there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Save address type in inet_bind2_bucket.Kuniyuki Iwashima
inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() are called for each bhash2 bucket to check conflicts. Thus, we call ipv6_addr_any() and ipv6_addr_v4mapped() over and over during bind(). Let's avoid calling them by saving the address type in inet_bind2_bucket. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Save v4 address as v4-mapped-v6 in inet_bind2_bucket.v6_rcv_saddr.Kuniyuki Iwashima
In bhash2, IPv4/IPv6 addresses are saved in two union members, which complicate address checks in inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() considering uninitialised memory and v4-mapped-v6 conflicts. Let's simplify that by saving IPv4 address as v4-mapped-v6 address and defining tb2.rcv_saddr as tb2.v6_rcv_saddr.s6_addr32[3]. Then, we can compare v6 address as is, and after checking v4-mapped-v6, we can compare v4 address easily. Also, we can remove tb2->family. Note these functions will be further refactored in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Rearrange tests in inet_bind2_bucket_(addr_match|match_addr_any)().Kuniyuki Iwashima
The protocol family tests in inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() are ordered as follows. if (sk->sk_family != tb2->family) else if (sk->sk_family == AF_INET6) else This patch rearranges them so that AF_INET6 socket is handled first to make the following patch tidy, where tb2->family will be removed. if (sk->sk_family == AF_INET6) else if (tb2->family == AF_INET6) else Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22tcp: Use bhash2 for v4-mapped-v6 non-wildcard address.Kuniyuki Iwashima
While checking port availability in bind() or listen(), we used only bhash for all v4-mapped-v6 addresses. But there is no good reason not to use bhash2 for v4-mapped-v6 non-wildcard addresses. Let's do it by returning true in inet_use_bhash2_on_bind(). Then, we also need to add a test in inet_bind2_bucket_match_addr_any() so that ::ffff:X.X.X.X will match with 0.0.0.0. Note that sk->sk_rcv_saddr is initialised for v4-mapped-v6 sk in __inet6_bind(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 23c93c3b6275 ("bnxt_en: do not map packet buffers twice") 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") tools/testing/selftests/net/Makefile 2258b666482d ("selftests: add vlan hw filter tests") a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21Merge tag 'net-6.7-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and bpf. Current release - regressions: - bpf: syzkaller found null ptr deref in unix_bpf proto add - eth: i40e: fix ST code value for clause 45 Previous releases - regressions: - core: return error from sk_stream_wait_connect() if sk_wait_event() fails - ipv6: revert remove expired routes with a separated list of routes - wifi rfkill: - set GPIO direction - fix crash with WED rx support enabled - bluetooth: - fix deadlock in vhci_send_frame - fix use-after-free in bt_sock_recvmsg - eth: mlx5e: fix a race in command alloc flow - eth: ice: fix PF with enabled XDP going no-carrier after reset - eth: bnxt_en: do not map packet buffers twice Previous releases - always broken: - core: - check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() - check dev->gso_max_size in gso_features_check() - mptcp: fix inconsistent state on fastopen race - phy: skip LED triggers on PHYs on SFP modules - eth: mlx5e: - fix double free of encap_header - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()" * tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) net: check dev->gso_max_size in gso_features_check() kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail net/ipv6: Revert remove expired routes with a separated list of routes net: avoid build bug in skb extension length calculation net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean() net: stmmac: fix incorrect flag check in timestamp interrupt selftests: add vlan hw filter tests net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() net: hns3: add new maintainer for the HNS3 ethernet driver net: mana: select PAGE_POOL net: ks8851: Fix TX stall caused by TX buffer overrun ice: Fix PF with enabled XDP going no-carrier after reset ice: alter feature support check for SRIOV and LAG ice: stop trashing VF VSI aggregator node ID information mailmap: add entries for Geliang Tang mptcp: fill in missing MODULE_DESCRIPTION() mptcp: fix inconsistent state on fastopen race selftests: mptcp: join: fix subflow_send_ack lookup net: phy: skip LED triggers on PHYs on SFP modules bpf: Add missing BPF_LINK_TYPE invocations ...
2023-12-21Merge tag 'for-netdev' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-12-21 Hi David, hi Jakub, hi Paolo, hi Eric, The following pull-request contains BPF updates for your *net* tree. We've added 3 non-merge commits during the last 5 day(s) which contain a total of 4 files changed, 45 insertions(+). The main changes are: 1) Fix a syzkaller splat which triggered an oob issue in bpf_link_show_fdinfo(), from Jiri Olsa. 2) Fix another syzkaller-found issue which triggered a NULL pointer dereference in BPF sockmap for unconnected unix sockets, from John Fastabend. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add missing BPF_LINK_TYPE invocations bpf: sockmap, test for unconnected af_unix sock bpf: syzkaller found null ptr deref in unix_bpf proto add ==================== Link: https://lore.kernel.org/r/20231221104844.1374-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21net: check dev->gso_max_size in gso_features_check()Eric Dumazet
Some drivers might misbehave if TSO packets get too big. GVE for instance uses a 16bit field in its TX descriptor, and will do bad things if a packet is bigger than 2^16 bytes. Linux TCP stack honors dev->gso_max_size, but there are other ways for too big packets to reach an ndo_start_xmit() handler : virtio_net, af_packet, GRO... Add a generic check in gso_features_check() and fallback to GSO when needed. gso_max_size was added in the blamed commit. Fixes: 82cc1a7a5687 ("[NET]: Add per-connection option to set max TSO frame size") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231219125331.4127498-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21net/ipv6: Revert remove expired routes with a separated list of routesDavid Ahern
This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87. The commit has some race conditions given how expires is managed on a fib6_info in relation to gc start, adding the entry to the gc list and setting the timer value leading to UAF. Revert the commit and try again in a later release. Fixes: 3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes") Cc: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231219030243.25687-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21net: avoid build bug in skb extension length calculationThomas Weißschuh
GCC seems to incorrectly fail to evaluate skb_ext_total_length() at compile time under certain conditions. The issue even occurs if all values in skb_ext_type_len[] are "0", ruling out the possibility of an actual overflow. As the patch has been in mainline since v6.6 without triggering the problem it seems to be a very uncommon occurrence. As the issue only occurs when -fno-tree-loop-im is specified as part of CFLAGS_GCOV, disable the BUILD_BUG_ON() only when building with coverage reporting enabled. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312171924.4FozI5FG-lkp@intel.com/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/lkml/487cfd35-fe68-416f-9bfd-6bb417f98304@app.fastmail.com/ Fixes: 5d21d0a65b57 ("net: generalize calculation of skb extensions length") Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231218-net-skbuff-build-bug-v1-1-eefc2fb0a7d3@weissschuh.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-20Merge tag 'nfsd-6.7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address a few recently-introduced issues * tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806 NFSD: Revert 738401a9bd1ac34ccd5723d69640a4adbb1a4bc0 NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500d nfsd: hold nfsd_mutex across entire netlink operation nfsd: call nfsd_last_thread() before final nfsd_put()
2023-12-20net: sched: Add initial TC error skb drop reasonsVictor Nogueira
Continue expanding Daniel's patch by adding new skb drop reasons that are idiosyncratic to TC. More specifically: - SKB_DROP_REASON_TC_COOKIE_ERROR: An error occurred whilst processing a tc ext cookie. - SKB_DROP_REASON_TC_CHAIN_NOTFOUND: tc chain lookup failed. - SKB_DROP_REASON_TC_RECLASSIFY_LOOP: tc exceeded max reclassify loop iterations Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20net: sched: Make tc-related drop reason more flexible for remaining qdiscsVictor Nogueira
Incrementing on Daniel's patch[1], make tc-related drop reason more flexible for remaining qdiscs - that is, all qdiscs aside from clsact. In essence, the drop reason will be set by cls_api and act_api in case any error occurred in the data path. With that, we can give the user more detailed information so that they can distinguish between a policy drop or an error drop. [1] https://lore.kernel.org/all/20231009092655.22025-1-daniel@iogearbox.net Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20net: sched: Move drop_reason to struct tc_skb_cbVictor Nogueira
Move drop_reason from struct tcf_result to skb cb - more specifically to struct tc_skb_cb. With that, we'll be able to also set the drop reason for the remaining qdiscs (aside from clsact) that do not have access to tcf_result when time comes to set the skb drop reason. Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20rtnetlink: bridge: Enable MDB bulk deletionIdo Schimmel
Now that both the common code as well as individual drivers support MDB bulk deletion, allow user space to make such requests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20bridge: mdb: Add MDB bulk deletion supportIdo Schimmel
Implement MDB bulk deletion support in the bridge driver, allowing MDB entries to be deleted in bulk according to provided parameters. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20rtnetlink: bridge: Invoke MDB bulk deletion when neededIdo Schimmel
Invoke the new MDB bulk deletion device operation when the 'NLM_F_BULK' flag is set in the netlink message header. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20rtnetlink: bridge: Use a different policy for MDB bulk deleteIdo Schimmel
For MDB bulk delete we will need to validate 'MDBA_SET_ENTRY' differently compared to regular delete. Specifically, allow the ifindex to be zero (in case not filtering on bridge port) and force the address to be zero as bulk delete based on address is not supported. Do that by introducing a new policy and choosing the correct policy based on the presence of the 'NLM_F_BULK' flag in the netlink message header. Use nlmsg_parse() for strict validation. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20Merge tag 'for-net-2023-12-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Add encryption key size check when acting as peripheral - Shut up false-positive build warning - Send reject if L2CAP command request is corrupted - Fix Use-After-Free in bt_sock_recvmsg - Fix not notifying when connection encryption changes - Fix not checking if HCI_OP_INQUIRY has been sent - Fix address type send over to the MGMT interface - Fix deadlock in vhci_send_frame ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-19Merge tag 'for-netdev' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-12-19 Hi David, hi Jakub, hi Paolo, hi Eric, The following pull-request contains BPF updates for your *net-next* tree. We've added 2 non-merge commits during the last 1 day(s) which contain a total of 40 files changed, 642 insertions(+), 2926 deletions(-). The main changes are: 1) Revert all of BPF token-related patches for now as per list discussion [0], from Andrii Nakryiko. [0] https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com 2) Fix a syzbot-reported use-after-free read in nla_find() triggered from bpf_skb_get_nlattr_nest() helper, from Jakub Kicinski. bpf-next-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: Revert BPF token-related functionality bpf: Use nla_ok() instead of checking nla_len directly ==================== Link: https://lore.kernel.org/r/20231219170359.11035-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19Revert BPF token-related functionalityAndrii Nakryiko
This patch includes the following revert (one conflicting BPF FS patch and three token patch sets, represented by merge commits): - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'"; - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs"; - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'"; - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'". Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-19devlink: extend multicast filtering by port indexJiri Pirko
Expose the previously introduced notification multicast messages filtering infrastructure and allow the user to select messages using port index. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19devlink: add a command to set notification filter and use it for multicastsJiri Pirko
Currently the user listening on a socket for devlink notifications gets always all messages for all existing instances, even if he is interested only in one of those. That may cause unnecessary overhead on setups with thousands of instances present. User is currently able to narrow down the devlink objects replies to dump commands by specifying select attributes. Allow similar approach for notifications. Introduce a new devlink NOTIFY_FILTER_SET which the user passes the select attributes. Store these per-socket and use them for filtering messages during multicast send. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19netlink: introduce typedef for filter functionJiri Pirko
Make the code using filter function a bit nicer by consolidating the filter function arguments using typedef. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19genetlink: introduce per-sock family private storageJiri Pirko
Introduce an xarray for Generic netlink family to store per-socket private. Initialize this xarray only if family uses per-socket privs. Introduce genl_sk_priv_get() to get the socket priv pointer for a family and initialize it in case it does not exist. Introduce __genl_sk_priv_get() to obtain socket priv pointer for a family under RCU read lock. Allow family to specify the priv size, init() and destroy() callbacks. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19devlink: introduce a helper for netlink multicast sendJiri Pirko
Introduce a helper devlink_nl_notify_send() so each object notification function does not have to call genlmsg_multicast_netns() with the same arguments. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19devlink: send notifications only if there are listenersJiri Pirko
Introduce devlink_nl_notify_need() helper and using it to check at the beginning of notification functions to avoid overhead of composing notification messages in case nobody listens. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19devlink: introduce __devl_is_registered() helper and use it instead of ↵Jiri Pirko
xa_get_mark() Introduce __devl_is_registered() which does not assert on devlink instance lock and use it in notifications which may be called without devlink instance lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19devlink: use devl_is_registered() helper instead xa_get_mark()Jiri Pirko
Instead of checking the xarray mark directly using xa_get_mark() helper use devl_is_registered() helper which wraps it up. Note that there are couple more users of xa_get_mark() left which are going to be handled by the next patch. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19bpf: Use nla_ok() instead of checking nla_len directlyJakub Kicinski
nla_len may also be too short to be sane, in which case after recent changes nla_len() will return a wrapped value. Fixes: 172db56d90d2 ("netlink: Return unsigned value for nla_len()") Reported-by: syzbot+f43a23b6e622797c7a28@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/bpf/20231218231904.260440-1-kuba@kernel.org
2023-12-19net: check vlan filter feature in vlan_vids_add_by_dev() and ↵Liu Jian
vlan_vids_del_by_dev() I got the below warning trace: WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0 Call Trace: rtnl_dellink rtnetlink_rcv_msg netlink_rcv_skb netlink_unicast netlink_sendmsg __sock_sendmsg ____sys_sendmsg ___sys_sendmsg __sys_sendmsg do_syscall_64 entry_SYSCALL_64_after_hwframe It can be repoduced via: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode 0 ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 ip netns exec ns1 ip link set bond_slave_1 master bond0 [1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off [2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0 [3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0 [4] ip netns exec ns1 ip link set bond_slave_1 nomaster [5] ip netns exec ns1 ip link del veth2 ip netns del ns1 This is all caused by command [1] turning off the rx-vlan-filter function of bond0. The reason is the same as commit 01f4fd270870 ("bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands [2] [3] add the same vid to slave and master respectively, causing command [4] to empty slave->vlan_info. The following command [5] triggers this problem. To fix this problem, we should add VLAN_FILTER feature checks in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect addition or deletion of vlan_vid information. Fixes: 348a1443cc43 ("vlan: introduce functions to do mass addition/deletion of vids by another device") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-18Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-12-18 This PR is larger than usual and contains changes in various parts of the kernel. The main changes are: 1) Fix kCFI bugs in BPF, from Peter Zijlstra. End result: all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y. 2) Introduce BPF token object, from Andrii Nakryiko. It adds an ability to delegate a subset of BPF features from privileged daemon (e.g., systemd) through special mount options for userns-bound BPF FS to a trusted unprivileged application. The design accommodates suggestions from Christian Brauner and Paul Moore. Example: $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp 3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei. - Complete precision tracking support for register spills - Fix verification of possibly-zero-sized stack accesses - Fix access to uninit stack slots - Track aligned STACK_ZERO cases as imprecise spilled registers. It improves the verifier "instructions processed" metric from single digit to 50-60% for some programs. - Fix verifier retval logic 4) Support for VLAN tag in XDP hints, from Larysa Zaremba. 5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu. End result: better memory utilization and lower I$ miss for calls to BPF via BPF trampoline. 6) Fix race between BPF prog accessing inner map and parallel delete, from Hou Tao. 7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu. It allows BPF interact with IPSEC infra. The intent is to support software RSS (via XDP) for the upcoming ipsec pcpu work. Experiments on AWS demonstrate single tunnel pcpu ipsec reaching line rate on 100G ENA nics. 8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao. 9) BPF file verification via fsverity, from Song Liu. It allows BPF progs get fsverity digest. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits) bpf: Ensure precise is reset to false in __mark_reg_const_zero() selftests/bpf: Add more uprobe multi fail tests bpf: Fail uprobe multi link with negative offset selftests/bpf: Test the release of map btf s390/bpf: Fix indirect trampoline generation selftests/bpf: Temporarily disable dummy_struct_ops test on s390 x86/cfi,bpf: Fix bpf_exception_cb() signature bpf: Fix dtor CFI cfi: Add CFI_NOSEAL() x86/cfi,bpf: Fix bpf_struct_ops CFI x86/cfi,bpf: Fix bpf_callback_t CFI x86/cfi,bpf: Fix BPF JIT call cfi: Flip headers selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment bpf: Limit the number of kprobes when attaching program to multiple kprobes bpf: Limit the number of uprobes when attaching program to multiple uprobes bpf: xdp: Register generic_kfunc_set with XDP programs selftests/bpf: utilize string values for delegate_xxx mount options ... ==================== Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18Merge tag 'wireless-next-2023-12-18' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.8 The second features pull request for v6.8. A bigger one this time with changes both to stack and drivers. We have a new Wifi band RFI (WBRF) mitigation feature for which we pulled an immutable branch shared with other subsystems. And, as always, other new features and bug fixes all over. Major changes: cfg80211/mac80211 * AMD ACPI based Wifi band RFI (WBRF) mitigation feature * Basic Service Set (BSS) usage reporting * TID to link mapping support * mac80211 hardware flag to disallow puncturing iwlwifi * new debugfs file fw_dbg_clear mt76 * NVMEM EEPROM improvements * mt7996 Extremely High Throughpu (EHT) improvements * mt7996 Wireless Ethernet Dispatcher (WED) support * mt7996 36-bit DMA support ath12k * support one MSI vector * WCN7850: support AP mode * tag 'wireless-next-2023-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits) wifi: mt76: mt7996: Use DECLARE_FLEX_ARRAY() and fix -Warray-bounds warnings wifi: ath11k: workaround too long expansion sparse warnings Revert "wifi: ath12k: use ATH12K_PCI_IRQ_DP_OFFSET for DP IRQ" wifi: rt2x00: remove useless code in rt2x00queue_create_tx_descriptor() wifi: rtw89: only reset BB/RF for existing WiFi 6 chips while starting up wifi: rtw89: add DBCC H2C to notify firmware the status wifi: rtw89: mac: add suffix _ax to MAC functions wifi: rtw89: mac: add flags to check if CMAC and DMAC are enabled wifi: rtw89: 8922a: add power on/off functions wifi: rtw89: add XTAL SI for WiFi 7 chips wifi: rtw89: phy: print out RFK log with formatted string wifi: rtw89: parse and print out RFK log from C2H events wifi: rtw89: add C2H event handlers of RFK log and report wifi: rtw89: load RFK log format string from firmware file wifi: rtw89: fw: add version field to BB MCU firmware element wifi: rtw89: fw: load TX power track tables from fw_element wifi: mwifiex: configure BSSID consistently when starting AP wifi: mwifiex: add extra delay for firmware ready wifi: mac80211: sta_info.c: fix sentence grammar wifi: mac80211: rx.c: fix sentence grammar ... ==================== Link: https://lore.kernel.org/r/20231218163900.C031DC433C9@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806Chuck Lever
Guillaume says: > I believe commit 5f7fc5d69f6e ("SUNRPC: Resupply rq_pages from > node-local memory") in Linux 6.5+ is incorrect. It passes > unconditionally rq_pool->sp_id as the NUMA node. > > While the comment in the svc_pool declaration in sunrpc/svc.h says > that sp_id is also the NUMA node id, it might not be the case if > the svc is created using svc_create_pooled(). svc_created_pooled() > can use the per-cpu pool mode therefore in this case sp_id would > be the cpu id. Fix this by reverting now. At a later point this minor optimization, and the deceptive labeling of the sp_id field, can be revisited. Reported-by: Guillaume Morin <guillaume@morinfr.org> Closes: https://lore.kernel.org/linux-nfs/ZYC9rsno8qYggVt9@bender.morinfr.org/T/#u Fixes: 5f7fc5d69f6e ("SUNRPC: Resupply rq_pages from node-local memory") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>