summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-02-15devlink: Move devlink health get and set code to health fileMoshe Shemesh
Move devlink health get and set callbacks and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: health: Fix nla_nest_end in error flowMoshe Shemesh
devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix it to call nla_nest_cancel() instead. Note the bug is harmless as genlmsg_cancel() cancel the entire message, so no fixes tag added. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Split out health reporter create codeMoshe Shemesh
Move devlink health reporter create/destroy and related dev code to new file health.c. This file shall include all callbacks and functionality that are related to devlink health. In addition, fix kdoc indentation and make reporter create/destroy kdoc more clear. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15net: no longer support SOCK_REFCNT_DEBUG featureJason Xing
Commit e48c414ee61f ("[INET]: Generalise the TCP sock ID lookup routines") commented out the definition of SOCK_REFCNT_DEBUG in 2005 and later another commit 463c84b97f24 ("[NET]: Introduce inet_connection_sock") removed it. Since we could track all of them through bpf and kprobe related tools and the feature could print loads of information which might not be that helpful even under a little bit pressure, the whole feature which has been inactive for many years is no longer supported. Link: https://lore.kernel.org/lkml/20230211065153.54116-1-kerneljasonxing@gmail.com/ Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-14net-sysfs: make kobj_type structures constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14net: bridge: make kobj_type structure constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14devlink: don't allow to change net namespace for FW_ACTIVATE reload actionJiri Pirko
The change on network namespace only makes sense during re-init reload action. For FW activation it is not applicable. So check if user passed an ATTR indicating network namespace change request and forbid it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230213115836.3404039-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-14net/sched: support per action hw statsOz Shlomo
There are currently two mechanisms for populating hardware stats: 1. Using flow_offload api to query the flow's statistics. The api assumes that the same stats values apply to all the flow's actions. This assumption breaks when action drops or jumps over following actions. 2. Using hw_action api to query specific action stats via a driver callback method. This api assures the correct action stats for the offloaded action, however, it does not apply to the rest of the actions in the flow's actions array. Extend the flow_offload stats callback to indicate that a per action stats update is required. Use the existing flow_offload_action api to query the action's hw stats. In addition, currently the tc action stats utility only updates hw actions. Reuse the existing action stats cb infrastructure to query any action stats. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-14net/sched: introduce flow_offload action cookieOz Shlomo
Currently a hardware action is uniquely identified by the <id, hw_index> tuple. However, the id is set by the flow_act_setup callback and tc core cannot enforce this, and it is possible that a future change could break this. In addition, <id, hw_index> are not unique across network namespaces. Uniquely identify the action by setting an action cookie by the tc core. Use the unique action cookie to query the action's hardware stats. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-14net/sched: pass flow_stats instead of multiple stats argsOz Shlomo
Instead of passing 6 stats related args, pass the flow_stats. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-14net/sched: act_pedit, setup offload action for action stats queryOz Shlomo
A single tc pedit action may be translated to multiple flow_offload actions. Offload only actions that translate to a single pedit command value. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-14net/sched: optimize action stats api callsOz Shlomo
Currently the hw action stats update is called from tcf_exts_hw_stats_update, when a tc filter is dumped, and from tcf_action_copy_stats, when a hw action is dumped. However, the tcf_action_copy_stats is also called from tcf_action_dump. As such, the hw action stats update cb is called 3 times for every tc flower filter dump. Move the tc action hw stats update from tcf_action_copy_stats to tcf_dump_walker to update the hw action stats when tc action is dumped. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-13ipv6: icmp6: add drop reason support to ndisc_rcv()Eric Dumazet
Creates three new drop reasons: SKB_DROP_REASON_IPV6_NDISC_FRAG: invalid frag (suppress_frag_ndisc). SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT: invalid hop limit. SKB_DROP_REASON_IPV6_NDISC_BAD_CODE: invalid NDISC icmp6 code. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-13ipv6: icmp6: add drop reason support to icmpv6_notify()Eric Dumazet
Accurately reports what happened in icmpv6_notify() when handling a packet. This makes use of the new IPV6_BAD_EXTHDR drop reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-13net: ethtool: extend ringparam set/get APIs for rx_pushShannon Nelson
Similar to what was done for TX_PUSH, add an RX_PUSH concept to the ethtool interfaces. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13net/sched: fix error recovery in qdisc_create()Eric Dumazet
If TCA_STAB attribute is malformed, qdisc_get_stab() returns an error, and we end up calling ops->destroy() while ops->init() has not been called yet. While we are at it, call qdisc_put_stab() after ops->destroy(). Fixes: 1f62879e3632 ("net/sched: make stab available before ops->init() call") Reported-by: syzbot+d44d88f1d11e6ca8576b@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: add forgotten devlink instance lock assertion to ↵Jiri Pirko
devl_param_driverinit_value_set() Driver calling devl_param_driverinit_value_set() has to hold devlink instance lock while doing that. Put an assertion there. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: allow to call devl_param_driverinit_value_get() without holding ↵Jiri Pirko
instance lock If the driver maintains following basic sane behavior, the devl_param_driverinit_value_get() function could be called without holding instance lock: 1) Driver ensures a call to devl_param_driverinit_value_get() cannot race with registering/unregistering the parameter with the same parameter ID. 2) Driver ensures a call to devl_param_driverinit_value_get() cannot race with devl_param_driverinit_value_set() call with the same parameter ID. 3) Driver ensures a call to devl_param_driverinit_value_get() cannot race with reload operation. By the nature of params usage, these requirements should be trivially achievable. If the driver for some off reason is not able to comply, it has to take the devlink->lock while calling devl_param_driverinit_value_get(). Remove the lock assertion and add comment describing the locking requirements. This fixes a splat in mlx5 driver introduced by the commit referenced in the "Fixes" tag. Lore: https://lore.kernel.org/netdev/719de4f0-76ac-e8b9-38a9-167ae239efc7@amd.com/ Reported-by: Kim Phillips <kim.phillips@amd.com> Fixes: 075935f0ae0f ("devlink: protect devlink param list by instance lock") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: convert param list to xarrayJiri Pirko
Loose the linked list for params and use xarray instead. Note that this is required to be eventually possible to call devl_param_driverinit_value_get() without holding instance lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: use xa_for_each_start() helper in devlink_nl_cmd_port_get_dump_one()Jiri Pirko
As xarray has an iterator helper that allows to start from specified index, use this directly and avoid repeated iteration from 0. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: fix the name of value arg of devl_param_driverinit_value_get()Jiri Pirko
Probably due to copy-paste error, the name of the arg is "init_val" which is misleading, as the pointer is used to point to struct where to store the current value. Rename it to "val" and change the arg comment a bit on the way. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: make sure driver does not read updated driverinit param before reloadJiri Pirko
The driverinit param purpose is to serve the driver during init/reload time to provide a value, either default or set by user. Make sure that driver does not read value updated by user before the reload is performed. Hold the new value in a separate struct and switch it during reload. Note that this is required to be eventually possible to call devl_param_driverinit_value_get() without holding instance lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: don't use strcpy() to copy param valueJiri Pirko
No need to treat string params any different comparing to other types. Rely on the struct assign to copy the whole struct, including the string. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: stop using NL_SET_ERR_MSG_MODJacob Keller
NL_SET_ERR_MSG_MOD inserts the KBUILD_MODNAME and a ':' before the actual extended error message. The devlink feature hasn't been able to be compiled as a module since commit f4b6bcc7002f ("net: devlink: turn devlink into a built-in"). Stop using NL_SET_ERR_MSG_MOD, and just use the base NL_SET_ERR_MSG. This aligns the extended error messages better with the NL_SET_ERR_MSG_ATTR messages as well. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13rds: rds_rm_zerocopy_callback() correct order for list_add_tail()Pietro Borrello
rds_rm_zerocopy_callback() uses list_add_tail() with swapped arguments. This links the list head with the new entry, losing the references to the remaining part of the list. Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10bridge: mcast: Move validation to a policyIdo Schimmel
Future patches are going to move parts of the bridge MDB code to the common rtnetlink code in preparation for VXLAN MDB support. To facilitate code sharing between both drivers, move the validation of the top level attributes in RTM_{NEW,DEL}MDB messages to a policy that will eventually be moved to the rtnetlink code. Use 'NLA_NESTED' for 'MDBA_SET_ENTRY_ATTRS' instead of NLA_POLICY_NESTED() as this attribute is going to be validated using different policies in the underlying drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10bridge: mcast: Remove pointless sequence generation counter assignmentIdo Schimmel
The purpose of the sequence generation counter in the netlink callback is to identify if a multipart dump is consistent or not by calling nl_dump_check_consistent() whenever a message is generated. The function is not invoked by the MDB code, rendering the sequence generation counter assignment pointless. Remove it. Note that even if the function was invoked, we still could not accurately determine if the dump is consistent or not, as there is no sequence generation counter for MDB entries, unlike nexthop objects, for example. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10bridge: mcast: Use correct define in MDB dumpIdo Schimmel
'MDB_PG_FLAGS_PERMANENT' and 'MDB_PERMANENT' happen to have the same value, but the latter is uAPI and cannot change, so use it when dumping an MDB entry. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge tag 'for-net-next-2023-02-09' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== pull-request: bluetooth-next - Add new PID/VID 0489:e0f2 for MT7921 - Add VID:PID 13d3:3529 for Realtek RTL8821CE - Add CIS feature bits to controller information - Set Per Platform Antenna Gain(PPAG) for Intel controllers * tag 'for-net-next-2023-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: Bluetooth: btintel: Set Per Platform Antenna Gain(PPAG) Bluetooth: Make sure LE create conn cancel is sent when timeout Bluetooth: Free potentially unfreed SCO connection Bluetooth: hci_qca: get wakeup status from serdev device handle Bluetooth: L2CAP: Fix potential user-after-free Bluetooth: MGMT: add CIS feature bits to controller information Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds Bluetooth: HCI: Replace zero-length arrays with flexible-array members Bluetooth: qca: Fix sparse warnings Bluetooth: btusb: Add VID:PID 13d3:3529 for Realtek RTL8821CE Bluetooth: btusb: Add new PID/VID 0489:e0f2 for MT7921 Bluetooth: Fix issue with Actions Semi ATS2851 based devices ==================== Link: https://lore.kernel.org/r/20230209234922.3756173-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Daniel Borkmann says:Jakub Kicinski
==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: extract nf_ct_handle_fragments to nf_conntrack_ovsXin Long
Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: sched: move frag check and tc_skb_cb update out of handle_fragmentsXin Long
This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10openvswitch: move key and ovs_cb update out of handle_fragmentsXin Long
This patch has no functional changes and just moves key and ovs_cb update out of handle_fragments, and skb_clear_hash() and skb->ignore_df change into handle_fragments(), to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Note that it changes to pass info->family to handle_fragments() instead of key for the packet type check, as info->family is set according to key->eth.type in ovs_ct_copy_action() when creating the action. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: extract nf_ct_skb_network_trim function to nf_conntrack_ovsXin Long
There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: create nf_conntrack_ovs for ovs and tc useXin Long
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only. There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: skbuff: drop the word head from skb cacheJakub Kicinski
skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10Merge tag 'rxrpc-next-20230208' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc development Here are some miscellaneous changes for rxrpc: (1) Use consume_skb() rather than kfree_skb_reason(). (2) Fix unnecessary waking when poking and already-poked call. (3) Add ack.rwind to the rxrpc_tx_ack tracepoint as this indicates how many incoming DATA packets we're telling the peer that we are currently willing to accept on this call. (4) Reduce duplicate ACK transmission. We send ACKs to let the peer know that we're increasing the receive window (ack.rwind) as we consume packets locally. Normal ACK transmission is triggered in three places and that leads to duplicates being sent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-09openvswitch: Use string_is_terminated() helperAndy Shevchenko
Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09genetlink: Use string_is_terminated() helperAndy Shevchenko
Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09string_helpers: Move string_is_valid() to the headerAndy Shevchenko
Move string_is_valid() to the header for wider use. While at it, rename to string_is_terminated() to be precise about its semantics. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net: introduce default_rps_mask netns attributePaolo Abeni
If RPS is enabled, this allows configuring a default rps mask, which is effective since receive queue creation time. A default RPS mask allows the system admin to ensure proper isolation, avoiding races at network namespace or device creation time. The default RPS mask is initially empty, and can be modified via a newly added sysctl entry. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net-sysctl: factor-out rpm mask manipulation helpersPaolo Abeni
Will simplify the following patch. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net-sysctl: factor out cpumask parsing helperPaolo Abeni
Will be used by the following patch to avoid code duplication. No functional changes intended. The only difference is that now flow_limit_cpu_sysctl() will always compute the flow limit mask on each read operation, even when read() will not return any byte to user-space. Note that the new helper is placed under a new #ifdef at the file start to better fit the usage in the later patch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Bluetooth: Make sure LE create conn cancel is sent when timeoutArchie Pusaka
When sending LE create conn command, we set a timer with a duration of HCI_LE_CONN_TIMEOUT before timing out and calling create_le_conn_complete. Additionally, when receiving the command complete, we also set a timer with the same duration to call le_conn_timeout. Usually the latter will be triggered first, which then sends a LE create conn cancel command. However, due to the nature of racing, it is possible for the former to be called first, thereby calling the chain hci_conn_failed -> hci_conn_del -> cancel_delayed_work, thereby preventing LE create conn cancel to be sent. In this situation, the controller will be stuck in trying the LE connection. This patch flushes le_conn_timeout on create_le_conn_complete to make sure we always send LE create connection cancel, if necessary. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-09Bluetooth: Free potentially unfreed SCO connectionArchie Pusaka
It is possible to initiate a SCO connection while deleting the corresponding ACL connection, e.g. in below scenario: (1) < hci setup sync connect command (2) > hci disconn complete event (for the acl connection) (3) > hci command complete event (for(1), failure) When it happens, hci_cs_setup_sync_conn won't be able to obtain the reference to the SCO connection, so it will be stuck and potentially hinder subsequent connections to the same device. This patch prevents that by also deleting the SCO connection if it is still not established when the corresponding ACL connection is deleted. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-09Bluetooth: L2CAP: Fix potential user-after-freeLuiz Augusto von Dentz
This fixes all instances of which requires to allocate a buffer calling alloc_skb which may release the chan lock and reacquire later which makes it possible that the chan is disconnected in the meantime. Fixes: a6a5568c03c4 ("Bluetooth: Lock the L2CAP channel when sending") Reported-by: Alexander Coffin <alex.coffin@matician.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-09Bluetooth: MGMT: add CIS feature bits to controller informationPauli Virtanen
Userspace needs to know whether the adapter has feature support for Connected Isochronous Stream - Central/Peripheral, so it can set up LE Audio features accordingly. Expose these feature bits as settings in MGMT controller info. Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-09Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeedsKees Cook
The compiler thinks "conn" might be NULL after a call to hci_bind_bis(), which cannot happen. Avoid any confusion by just making it not return a value since it cannot fail. Fixes the warnings seen with GCC 13: In function 'arch_atomic_dec_and_test', inlined from 'atomic_dec_and_test' at ../include/linux/atomic/atomic-instrumented.h:576:9, inlined from 'hci_conn_drop' at ../include/net/bluetooth/hci_core.h:1391:6, inlined from 'hci_connect_bis' at ../net/bluetooth/hci_conn.c:2124:3: ../arch/x86/include/asm/rmwcc.h:37:9: warning: array subscript 0 is outside array bounds of 'atomic_t[0]' [-Warray-bounds=] 37 | asm volatile (fullop CC_SET(cc) \ | ^~~ ... In function 'hci_connect_bis': cc1: note: source object is likely at address zero Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/devlink/leftover.c / net/core/devlink.c: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") f05bd8ebeb69 ("devlink: move code to a dedicated directory") 687125b5799c ("devlink: split out core code") https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net: enable usercopy for skb_small_head_cacheEric Dumazet
syzbot and other bots reported that we have to enable user copy to/from skb->head. [1] We can prevent access to skb_shared_info, which is a nice improvement over standard kmem_cache. Layout of these kmem_cache objects is: < SKB_SMALL_HEAD_HEADROOM >< struct skb_shared_info > usercopy: Kernel memory overwrite attempt detected to SLUB object 'skbuff_small_head' (offset 32, size 20)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102 ! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6-syzkaller-01425-gcb6b2e11a42d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102 Code: e8 ee ad ba f7 49 89 d9 4d 89 e8 4c 89 e1 41 56 48 89 ee 48 c7 c7 20 2b 5b 8a ff 74 24 08 41 57 48 8b 54 24 20 e8 7a 17 fe ff <0f> 0b e8 c2 ad ba f7 e8 7d fb 08 f8 48 8b 0c 24 49 89 d8 44 89 ea RSP: 0000:ffffc90000067a48 EFLAGS: 00010286 RAX: 000000000000006b RBX: ffffffff8b5b6ea0 RCX: 0000000000000000 RDX: ffff8881401c0000 RSI: ffffffff8166195c RDI: fffff5200000cf3b RBP: ffffffff8a5b2a60 R08: 000000000000006b R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000000 R12: ffffffff8bf2a925 R13: ffffffff8a5b29a0 R14: 0000000000000014 R15: ffffffff8a5b2960 FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000c48e000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __check_heap_object+0xdd/0x110 mm/slub.c:4761 check_heap_object mm/usercopy.c:196 [inline] __check_object_size mm/usercopy.c:251 [inline] __check_object_size+0x1da/0x5a0 mm/usercopy.c:213 check_object_size include/linux/thread_info.h:199 [inline] check_copy_size include/linux/thread_info.h:235 [inline] copy_from_iter include/linux/uio.h:186 [inline] copy_from_iter_full include/linux/uio.h:194 [inline] memcpy_from_msg include/linux/skbuff.h:3977 [inline] qrtr_sendmsg+0x65f/0x970 net/qrtr/af_qrtr.c:965 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 say_hello+0xf6/0x170 net/qrtr/ns.c:325 qrtr_ns_init+0x220/0x2b0 net/qrtr/ns.c:804 qrtr_proto_init+0x59/0x95 net/qrtr/af_qrtr.c:1296 do_one_initcall+0x141/0x790 init/main.c:1306 do_initcall_level init/main.c:1379 [inline] do_initcalls init/main.c:1395 [inline] do_basic_setup init/main.c:1414 [inline] kernel_init_freeable+0x6f9/0x782 init/main.c:1634 kernel_init+0x1e/0x1d0 init/main.c:1522 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Fixes: bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/linux-next/CA+G9fYs-i-c2KTSA7Ai4ES_ZESY1ZnM=Zuo8P1jN00oed6KHMA@mail.gmail.com Link: https://lore.kernel.org/r/20230208142508.3278406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>