Age | Commit message (Collapse) | Author |
|
Michael Chan says:
====================
bnxt_en: Updates for net-next
The first patch converts the sw_stats field in the completion
ring structure to a pointer. This allows the group of
completion rings using the same MSIX to share the same sw_stats
structure. Prior to this, the correct completion ring must be
used to count packets.
The next four patches remove the RTNL lock when calling the RoCE
driver for asynchronous stop and start during error recovery and
firmware reset. The RTNL ilock is replaced with a private mutex
used to synchronize RoCE register, unregister, stop, and start.
The last patch adds VF PCI IDs for the 5760X chips.
v2: Dropped patch #1 from v1. Will work with David to get that
patch in separately.
====================
Link: https://lore.kernel.org/r/20240501003056.100607-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No driver logic changes are required to support the VFs, so just add
the VF PCI ID.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the error recovery path (AER, firmware recovery, etc), the
driver notifies the RoCE driver via ULP_STOP before the reset
and via ULP_START after the reset, all under RTNL_LOCK. The
RoCE driver can take a long time if there are a lot of QPs to
destroy, so it is not ideal to hold the global RTNL lock.
Rely on the new en_dev_lock mutex instead for ULP_STOP and
ULP_START. For the most part, we move the ULP_STOP call before
we take the RTNL lock and move the ULP_START after RTNL unlock.
Note that SRIOV re-enablement must be done after ULP_START
or RoCE on the VFs will not resume properly after reset.
The one scenario in bnxt_hwrm_if_change() where the RTNL lock
is already taken in the .ndo_open() context requires the ULP
restart to be deferred to the bnxt_sp_task() workqueue.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current scheme relies heavily on the RTNL lock for all ULP
operations between the L2 and the RoCE driver. Add a new en_dev_lock
mutex so that the asynchronous ULP_STOP and ULP_START operations
can be serialized with bnxt_register_dev() and bnxt_unregister_dev()
calls without relying on the RTNL lock. The next patch will remove
the RTNL lock from the ULP_STOP and ULP_START calls.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is no need to call ULP_STOP and ULP_START before and after the
L2 reset in bnxt_reset_task(). This L2 reset is done after detecting
TX timeout, RX ring errors, or VF config changes. The L2 reset does
not affect RoCE since the firmware is not reset and the backing store
is left alone.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Offline self test is a very disruptive operation for RoCE and requires
all active QPs to be destroyed. With a large number of QPs, it can
take a long time to destroy all the QPs and can timeout. Do not allow
ethtool offline self test if the RoCE driver is registered on the
device.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On P5_PLUS chips and later, the NQ rings have subrings for RX and TX
completions respectively. These subrings are passed to the poll
function instead of the base NQ, but each ring carries its own
copy of the software ring statistics.
For stats to be conveniently accessible in __bnxt_poll_work(), the
statistics memory should either be shared between the NQ and its
subrings or the subrings need to be included in the ethtool stats
aggregation logic. This patch opts for the former, because it's more
efficient and less confusing having the software statistics for a
ring exist in a single place.
Before this patch, the counter will not be displayed if the "wrong"
cpr->sw_stats was used to increment a counter.
Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
i40e: cleanups & refactors
Ivan Vecera says:
This series do following:
Patch 1 - Removes write-only flags field from i40e_veb structure and
from i40e_veb_setup() parameters
Patch 2 - Refactors parameter of i40e_notify_client_of_l2_param_changes()
and i40e_notify_client_of_netdev_close()
Patch 3 - Refactors parameter of i40e_detect_recover_hung()
Patch 4 - Adds helper i40e_pf_get_main_vsi() to get main VSI and uses it
in existing code
Patch 5 - Consolidates checks whether given VSI is the main one
Patch 6 - Adds helper i40e_pf_get_main_veb() to get main VEB and uses it
in existing code
Patch 7 - Adds helper i40e_vsi_reconfig_tc() to reconfigure TC for
particular and uses it to replace existing open-coded pieces
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: Add and use helper to reconfigure TC for given VSI
i40e: Add helper to access main VEB
i40e: Consolidate checks whether given VSI is main
i40e: Add helper to access main VSI
i40e: Refactor argument of i40e_detect_recover_hung()
i40e: Refactor argument of several client notification functions
i40e: Remove flags field from i40e_veb
====================
Link: https://lore.kernel.org/r/20240430180639.1938515-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Naresh and Eric report several errors (corrupted elements in the dynamic
key hash list), when running tdc.py or syzbot. The error path of
qdisc_alloc() and qdisc_create() frees the qdisc memory, but it forgets
to unregister the lockdep key, thus causing use-after-free like the
following one:
==================================================================
BUG: KASAN: slab-use-after-free in lockdep_register_key+0x5f2/0x700
Read of size 8 at addr ffff88811236f2a8 by task ip/7925
CPU: 26 PID: 7925 Comm: ip Kdump: loaded Not tainted 6.9.0-rc2+ #648
Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013
Call Trace:
<TASK>
dump_stack_lvl+0x7c/0xc0
print_report+0xc9/0x610
kasan_report+0x89/0xc0
lockdep_register_key+0x5f2/0x700
qdisc_alloc+0x21d/0xb60
qdisc_create_dflt+0x63/0x3c0
attach_one_default_qdisc.constprop.37+0x8e/0x170
dev_activate+0x4bd/0xc30
__dev_open+0x275/0x380
__dev_change_flags+0x3f1/0x570
dev_change_flags+0x7c/0x160
do_setlink+0x1ea1/0x34b0
__rtnl_newlink+0x8c9/0x1510
rtnl_newlink+0x61/0x90
rtnetlink_rcv_msg+0x2f0/0xbc0
netlink_rcv_skb+0x120/0x380
netlink_unicast+0x420/0x630
netlink_sendmsg+0x732/0xbc0
__sock_sendmsg+0x1ea/0x280
____sys_sendmsg+0x5a9/0x990
___sys_sendmsg+0xf1/0x180
__sys_sendmsg+0xd3/0x180
do_syscall_64+0x96/0x180
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7f9503f4fa07
Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
RSP: 002b:00007fff6c729068 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000006630c681 RCX: 00007f9503f4fa07
RDX: 0000000000000000 RSI: 00007fff6c7290d0 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000078
R10: 000000000000009b R11: 0000000000000246 R12: 0000000000000001
R13: 00007fff6c729180 R14: 0000000000000000 R15: 000055bf67dd9040
</TASK>
Allocated by task 7745:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
__kasan_kmalloc+0x7b/0x90
__kmalloc_node+0x1ff/0x460
qdisc_alloc+0xae/0xb60
qdisc_create+0xdd/0xfb0
tc_modify_qdisc+0x37e/0x1960
rtnetlink_rcv_msg+0x2f0/0xbc0
netlink_rcv_skb+0x120/0x380
netlink_unicast+0x420/0x630
netlink_sendmsg+0x732/0xbc0
__sock_sendmsg+0x1ea/0x280
____sys_sendmsg+0x5a9/0x990
___sys_sendmsg+0xf1/0x180
__sys_sendmsg+0xd3/0x180
do_syscall_64+0x96/0x180
entry_SYSCALL_64_after_hwframe+0x71/0x79
Freed by task 7745:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x36/0x60
__kasan_slab_free+0xfe/0x180
kfree+0x113/0x380
qdisc_create+0xafb/0xfb0
tc_modify_qdisc+0x37e/0x1960
rtnetlink_rcv_msg+0x2f0/0xbc0
netlink_rcv_skb+0x120/0x380
netlink_unicast+0x420/0x630
netlink_sendmsg+0x732/0xbc0
__sock_sendmsg+0x1ea/0x280
____sys_sendmsg+0x5a9/0x990
___sys_sendmsg+0xf1/0x180
__sys_sendmsg+0xd3/0x180
do_syscall_64+0x96/0x180
entry_SYSCALL_64_after_hwframe+0x71/0x79
Fix this ensuring that lockdep_unregister_key() is called before the
qdisc struct is freed, also in the error path of qdisc_create() and
qdisc_alloc().
Fixes: af0cb3fa3f9e ("net/sched: fix false lockdep warning on qdisc root lock")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/netdev/20240429221706.1492418-1-naresh.kamboju@linaro.org/
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/2aa1ca0c0a3aa0acc15925c666c777a4b5de553c.1714496886.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Florian Fainelli says:
====================
net: dsa: adjust_link removal
Now that the last in-tree driver (b53) has been converted to PHYLINK, we
can get rid of all of code that catered to working with drivers
implementing only PHYLIB's adjust_link callback.
====================
Link: https://lore.kernel.org/r/20240430164816.2400606-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that we no longer any drivers using PHYLIB's adjust_link callback,
remove all paths that made use of adjust_link as well as the associated
functions.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240430164816.2400606-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have not had a switch driver use a fixed_link_update callback since
58d56fcc3964f9be0a9ca42fd126bcd9dc7afc90 ("net: dsa: bcm_sf2: Get rid of
PHYLIB functions") remove this callback.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240430164816.2400606-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add SW IRQ coalescing based on hrtimers for RX and TX data path for ICSSG
driver, which can be enabled by ethtool commands:
- RX coalescing
ethtool -C eth1 rx-usecs 50
- TX coalescing can be enabled per TX queue
- by default enables coalescing for TX0
ethtool -C eth1 tx-usecs 50
- configure TX0
ethtool -Q eth0 queue_mask 1 --coalesce tx-usecs 100
- configure TX1
ethtool -Q eth0 queue_mask 2 --coalesce tx-usecs 100
- configure TX0 and TX1
ethtool -Q eth0 queue_mask 3 --coalesce tx-usecs 100 --coalesce
tx-usecs 100
Minimum value for both rx-usecs and tx-usecs is 20us.
Compared to gro_flush_timeout and napi_defer_hard_irqs this patch allows
to enable IRQ coalescing for RX path separately.
Benchmarking numbers:
===============================================================
| Method | Tput_TX | CPU_TX | Tput_RX | CPU_RX |
| ==============================================================
| Default Driver 943 Mbps 31% 517 Mbps 38% |
| IRQ Coalescing (Patch) 943 Mbps 28% 518 Mbps 25% |
===============================================================
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240430120634.1558998-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kuniyuki Iwashima says:
====================
arp: Random clean up and RCU conversion for ioctl(SIOCGARP).
arp_ioctl() holds rtnl_lock() regardless of cmd (SIOCDARP, SIOCSARP,
and SIOCGARP) to get net_device by __dev_get_by_name() and copy
dev->name safely.
In the SIOCGARP path, arp_req_get() calls neigh_lookup(), which looks
up a neighbour entry under RCU.
This series cleans up ioctl() code a bit and extends the RCU section
not to take rtnl_lock() and instead use dev_get_by_name_rcu() and
netdev_copy_name() for SIOCGARP.
v2: https://lore.kernel.org/netdev/20240425170002.68160-1-kuniyu@amazon.com/
v1: https://lore.kernel.org/netdev/20240422194755.4221-1-kuniyu@amazon.com/
====================
Link: https://lore.kernel.org/r/20240430015813.71143-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ioctl(SIOCGARP) holds rtnl_lock() to get netdev by __dev_get_by_name()
and copy dev->name safely and calls neigh_lookup() later, which looks
up a neighbour entry under RCU.
Let's replace __dev_get_by_name() with dev_get_by_name_rcu() and strscpy()
with netdev_copy_name() to avoid locking rtnl_lock().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240430015813.71143-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We will convert ioctl(SIOCGARP) to RCU, and then we need to copy
dev->name which is currently protected by rtnl_lock().
This patch does the following:
1) Add seqlock netdev_rename_lock to protect dev->name
2) Add netdev_copy_name() that copies dev->name to buffer
under netdev_rename_lock
3) Use netdev_copy_name() in netdev_get_name() and drop
devnet_rename_sem
Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/CANn89iJEWs7AYSJqGCUABeVqOCTkErponfZdT5kV-iD=-SajnQ@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240430015813.71143-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
arp_ioctl() holds rtnl_lock() first regardless of cmd (SIOCDARP,
SIOCSARP, and SIOCGARP) to get net_device by __dev_get_by_name()
and copy dev->name safely.
In the SIOCGARP path, arp_req_get() calls neigh_lookup(), which
looks up a neighbour entry under RCU.
We will extend the RCU section not to take rtnl_lock() and instead
use dev_get_by_name_rcu() for SIOCGARP.
As a preparation, let's move __dev_get_by_name() into another
function and call it from arp_req_delete(), arp_req_set(), and
arp_req_get().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240430015813.71143-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a prep patch to make the following changes tidy.
No functional change intended.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240430015813.71143-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When ioctl(SIOCDARP/SIOCSARP) is issued for non-proxy entry (no ATF_COM)
without arpreq.arp_dev[] set, arp_req_set() and arp_req_delete() looks up
dev based on IPv4 address by ip_route_output().
Let's factorise the same code as arp_req_dev().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240430015813.71143-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When ioctl(SIOCDARP/SIOCSARP) is issued with ATF_PUBL, r.arp_netmask
must be 0.0.0.0 or 255.255.255.255.
Currently, the netmask is validated in arp_req_delete_public() or
arp_req_set_public() under rtnl_lock().
We have ATF_NETMASK test in arp_ioctl() before holding rtnl_lock(),
so let's move the netmask validation there.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240430015813.71143-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In arp_req_set(), if ATF_PERM is set in arpreq.arp_flags,
ATF_COM is set automatically.
The flag will be used later for neigh_update() only when
a neighbour entry is found.
Let's set ATF_COM just before calling neigh_update().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240430015813.71143-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Even a 1h timeout isn't enough for nft_concat_range.sh to complete on
debug kernels.
Reduce test complexity and only match on single entry if
KSFT_MACHINE_SLOW is set.
To spot 'slow' tests, print the subtest duration (in seconds) in
addition to the status.
Add new nft_concat_range_perf.sh script, not executed via kselftest,
to run the performance (pps match rate) tests.
Those need about 25m to complete which seems too much to run this
via 'make run_tests'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240430145810.23447-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a followup of commit b5327b9a300e ("ipv6: use
call_rcu_hurry() in fib6_info_release()").
I had another pmtu.sh failure, and found another lazy
call_rcu() causing this failure.
aca_free_rcu() calls fib6_info_release() which releases
devices references.
We must not delay it too much or risk unregister_netdevice/ref_tracker
traces because references to netdev are not released in time.
This should speedup device/netns dismantles when CONFIG_RCU_LAZY=y
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KCSAN detected a race condition in netpoll:
BUG: KCSAN: data-race in net_rx_action / netpoll_send_skb
write (marked) to 0xffff8881164168b0 of 4 bytes by interrupt on cpu 10:
net_rx_action (./include/linux/netpoll.h:90 net/core/dev.c:6712 net/core/dev.c:6822)
<snip>
read to 0xffff8881164168b0 of 4 bytes by task 1 on cpu 2:
netpoll_send_skb (net/core/netpoll.c:319 net/core/netpoll.c:345 net/core/netpoll.c:393)
netpoll_send_udp (net/core/netpoll.c:?)
<snip>
value changed: 0x0000000a -> 0xffffffff
This happens because netpoll_owner_active() needs to check if the
current CPU is the owner of the lock, touching napi->poll_owner
non atomically. The ->poll_owner field contains the current CPU holding
the lock.
Use an atomic read to check if the poll owner is the current CPU.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240429100437.3487432-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the loopback driver and leverage the network
core allocation instead.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240429085559.2841918-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Flavio Suligoi says:
====================
dt-bindings: net: snps, dwmac: remove tx-sched-sp property
Strict priority for the tx scheduler is by default in Linux driver, so the
tx-sched-sp property was removed in commit aed6864035b1 ("net: stmmac:
platform: Delete a redundant condition branch").
This property is still in use in the following DT (and it will be removed
in a separate patch series):
- arch/arm64/boot/dts/freescale/imx8mp-beacon-som.dtsi
- arch/arm64/boot/dts/freescale/imx8mp-evk.dts
- arch/arm64/boot/dts/freescale/imx8mp-verdin.dtsi
- arch/arm64/boot/dts/qcom/sa8540p-ride.dts
- arch/arm64/boot/dts/qcom/sa8775p-ride.dts
There is no problem if that property is still used in the DTs above,
since, as seen above, it is a default property of the driver.
====================
Link: https://lore.kernel.org/r/20240429092654.31390-1-f.suligoi@asem.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Strict priority for the tx scheduler is by default in Linux driver, so the
tx-sched-sp property was removed in commit aed6864035b1 ("net: stmmac:
platform: Delete a redundant condition branch").
This property is still in use in the following DT (and it will be removed
in a separate patch series):
- arch/arm64/boot/dts/freescale/imx8mp-beacon-som.dtsi
- arch/arm64/boot/dts/freescale/imx8mp-evk.dts
- arch/arm64/boot/dts/freescale/imx8mp-verdin.dtsi
- arch/arm64/boot/dts/qcom/sa8540p-ride.dts
- arch/arm64/boot/dts/qcom/sa8775p-ride.dts
There is no problem if that property is still used in the DTs above,
since, as seen above, it is a default property of the driver.
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Adam Ford <aford173@gmail.com>
Link: https://lore.kernel.org/r/20240429092654.31390-2-f.suligoi@asem.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net: three additions to net_hotdata
This series moves three fast path sysctls to net_hotdata.
To avoid <net/hotdata.h> inclusion from <net/sock.h>,
create <net/proto_memory.h> to hold proto memory definitions.
====================
Link: https://lore.kernel.org/r/20240429134025.1233626-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sysctl_mem_pcpu_rsv is used in TCP fast path,
move it to net_hodata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240429134025.1233626-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move some proto memory definitions out of <net/sock.h>
Very few files need them, and following patch
will include <net/hotdata.h> from <net/proto_memory.h>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240429134025.1233626-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp_out_of_memory() has a single caller: tcp_check_oom().
Following patch will also make sk_memory_allocated()
not anymore visible from <net/sock.h> and <net/tcp.h>
Add const qualifier to sock argument of tcp_out_of_memory()
and tcp_check_oom().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240429134025.1233626-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sysctl_skb_defer_max is used in TCP fast path,
move it to net_hodata.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240429134025.1233626-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sysctl_max_skb_frags is used in TCP and MPTCP fast paths,
move it to net_hodata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240429134025.1233626-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I added dst_rt6_info() in commit
e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper")
This patch does a similar change for IPv4.
Instead of (struct rtable *)dst casts, we can use :
#define dst_rtable(_ptr) \
container_of_const(_ptr, struct rtable, dst)
Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add helper i40e_vsi_reconfig_tc(vsi) that configures TC
for given VSI using previously stored TC bitmap.
Effectively replaces open-coded patterns:
enabled_tc = vsi->tc_config.enabled_tc;
vsi->tc_config.enabled_tc = 0;
i40e_vsi_config_tc(vsi, enabled_tc);
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add a helper to access main VEB:
i40e_pf_get_main_veb(pf) replaces 'pf->veb[pf->lan_veb]'
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In the driver code there are 3 types of checks whether given
VSI is main or not:
1. vsi->type ==/!= I40E_VSI_MAIN
2. vsi ==/!= pf->vsi[pf->lan_vsi]
3. vsi->seid ==/!= pf->vsi[pf->lan_vsi]->seid
All of them are equivalent and can be consolidated. Convert cases
2 and 3 to case 1.
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add simple helper i40e_pf_get_main_vsi(pf) to access main VSI
that replaces pattern 'pf->vsi[pf->lan_vsi]'
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 07d44190a389 ("i40e/i40evf: Detect and recover hung queue
scenario") changes i40e_detect_recover_hung() argument type from
i40e_pf* to i40e_vsi* to be shareable by both i40e and i40evf.
Because the i40evf does not exist anymore and the function is
exclusively used by i40e we can revert this change.
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 0ef2d5afb12d ("i40e: KISS the client interface") simplified
the client interface so in practice it supports only one client
per i40e netdev. But we have still 2 notification functions that
uses as parameter a pointer to VSI of netdevice associated with
the client. After the mentioned commit only possible and used
VSI is the main (LAN) VSI.
So refactor these functions so they are called with PF pointer argument
and the associated VSI (LAN) is taken inside them.
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The field is initialized always to zero and it is never read.
Remove it.
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Jakub Kicinski says:
====================
selftests: net: page_poll allocation error injection
Add a test for exercising driver memory allocation failure paths.
page pool is a bit tricky to inject errors into at the page allocator
level because of the bulk alloc and recycling, so add explicit error
injection support "in front" of the caches.
Add a test to exercise that using only the standard APIs.
This is the first useful test for the new tests with an endpoint.
There's no point testing netdevsim here, so this is also the first
HW-only test in Python.
I'm not super happy with the traffic generation using iperf3,
my initial approach was to use mausezahn. But it turned out to be
5x slower in terms of PPS. Hopefully this is good enough for now.
v1: https://lore.kernel.org/all/20240426232400.624864-1-kuba@kernel.org/
====================
Link: https://lore.kernel.org/r/20240429144426.743476-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Bugs in memory allocation failure paths are quite common.
Add a test exercising those paths based on qstat and page pool
failure hook.
Running on bnxt:
# ./drivers/net/hw/pp_alloc_fail.py
KTAP version 1
1..1
# ethtool -G change retval: success
ok 1 pp_alloc_fail.test_pp_alloc
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
I initially wrote this test to validate commit be43b7489a3c ("net/mlx5e:
RX, Fix page_pool allocation failure recovery for striding rq") but mlx5
still doesn't have qstat. So I run it on bnxt, and while bnxt survives
I found the problem fixed in commit 730117730709 ("eth: bnxt: fix counting
packets discarded due to OOM and netpoll").
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240429144426.743476-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While we are not very interested in testing performance
it's useful to be able to generate a lot of traffic.
iperf is the simplest way of getting relatively high PPS.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240429144426.743476-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When picking TCP ports to use, avoid all below 10k.
This should lower the chance of collision or running
afoul whatever random policies may be on the host.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240429144426.743476-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The main use of the ip() wrapper over cmd() is that it can parse JSON.
cmd("ip -j link show") will return stdout as a string, and test has
to call json.loads(). With ip("link show", json=True) the return value
will be already parsed.
More tools (ethtool, bpftool etc.) support the --json switch.
To avoid having to wrap all of them individually create a tool()
helper.
Switch from -j to --json (for ethtool).
While at it consume the netns attribute at the ip() level.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240429144426.743476-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We created a separate directory for HW-only tests, recently.
Glue in the Python test library there, Python is a bit annoying
when it comes to using library code located "lower"
in the directory structure.
Reuse the Env class, but let tests require non-nsim setup.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240429144426.743476-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Because of caching / recycling using the general page allocation
failures to induce errors in page pool allocation is very hard.
Add direct error injection support to page_pool_alloc_pages().
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240429144426.743476-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub reports that some tests fail on netdev CI when executed in a debug
kernel.
Increase test timeout to 30m, this should hopefully be enough.
Also reduce test duration where possible for "slow" machines.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240429105736.22677-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sfp_select_interface() does not modify its link_modes argument, so
make this a const pointer.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/E1s15s0-00AHyq-8E@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|