lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2024-10-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-06	sfc: implement per-queue rx drop and overrun stats	Edward Cree
	Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-06	sfc: implement basic per-queue stats	Edward Cree
	Just RX and TX packet counts and TX bytes for now. We do not have per-queue RX byte counts, which causes us to fail stats.pkt_byte_sum selftest with "Drivers should always report basic keys" error. Per-queue counts are since the last time the queue was inited (typically by efx_start_datapath(), on ifup or reconfiguration); device-wide total (efx_get_base_stats()) is since driver probe. This is not the same lifetime as rtnl_link_stats64, which uses firmware stats which count since FW (re)booted; this can cause a "Qstats are lower" or "RTNL stats are lower" failure in stats.pkt_byte_sum selftest. Move the increment of rx_queue->rx_packets to match the semantics specified for netdev per-queue stats, i.e. just before handing the packet to XDP (if present) or the netstack (through GRO). This will affect the existing ethtool -S output which also reports these counters. XDP TX packets are not yet counted into base_stats. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-03	sfc: Don't invoke xdp_do_flush() from netpoll.	Sebastian Andrzej Siewior
	Yury reported a crash in the sfc driver originated from netpoll_send_udp(). The netconsole sends a message and then netpoll invokes the driver's NAPI function with a budget of zero. It is dedicated to allow driver to free TX resources, that it may have used while sending the packet. In the netpoll case the driver invokes xdp_do_flush() unconditionally, leading to crash because bpf_net_context was never assigned. Invoke xdp_do_flush() only if budget is not zero. Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Reported-by: Yury Vostrikov <mon@unformed.ru> Closes: https://lore.kernel.org/5627f6d1-5491-4462-9d75-bc0612c26a22@app.fastmail.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20241002125837.utOcRo6Y@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03	net: Tree wide: Replace xdp_do_flush_map() with xdp_do_flush().	Sebastian Andrzej Siewior
	xdp_do_flush_map() is deprecated and new code should use xdp_do_flush() instead. Replace xdp_do_flush_map() with xdp_do_flush(). Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Clark Wang <xiaoning.wang@nxp.com> Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: David Arinzon <darinzon@amazon.com> Cc: Edward Cree <ecree.xilinx@gmail.com> Cc: Felix Fietkau <nbd@nbd.name> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Jassi Brar <jaswinder.singh@linaro.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: John Crispin <john@phrozen.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Bianconi <lorenzo@kernel.org> Cc: Louis Peens <louis.peens@corigine.com> Cc: Marcin Wojtas <mw@semihalf.com> Cc: Mark Lee <Mark-MC.Lee@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Noam Dagan <ndagan@amazon.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Saeed Bishara <saeedb@amazon.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Shay Agroskin <shayagr@amazon.com> Cc: Shenwei Wang <shenwei.wang@nxp.com> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Cc: Wei Fang <wei.fang@nxp.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Arthur Kiyanovski <akiyano@amazon.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230908143215.869913-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28	sfc: Remove struct efx_special_buffer	Martin Habets
	The attributes index and entries are no longer needed, so use struct efx_buffer instead. next_buffer_table was also Siena specific. Removed some checkpatch warnings. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-15	sfc: fix XDP queues mode with legacy IRQ	Íñigo Huguet
	In systems without MSI-X capabilities, xdp_txq_queues_mode is calculated in efx_allocate_msix_channels, but when enabling MSI-X fails, it was not changed to a proper default value. This was leading to the driver thinking that it has dedicated XDP queues, when it didn't. Fix it by setting xdp_txq_queues_mode to the correct value if the driver fallbacks to MSI or legacy IRQ mode. The correct value is EFX_XDP_TX_QUEUES_BORROWED because there are no XDP dedicated queues. The issue can be easily visible if the kernel is started with pci=nomsi, then a call trace is shown. It is not shown only with sfc's modparam interrupt_mode=2. Call trace example: WARNING: CPU: 2 PID: 663 at drivers/net/ethernet/sfc/efx_channels.c:828 efx_set_xdp_channels+0x124/0x260 [sfc] [...skip...] Call Trace: <TASK> efx_set_channels+0x5c/0xc0 [sfc] efx_probe_nic+0x9b/0x15a [sfc] efx_probe_all+0x10/0x1a2 [sfc] efx_pci_probe_main+0x12/0x156 [sfc] efx_pci_probe_post_io+0x18/0x103 [sfc] efx_pci_probe.cold+0x154/0x257 [sfc] local_pci_probe+0x42/0x80 Fixes: 6215b608a8c4 ("sfc: last resort fallback for lack of xdp tx queues") Reported-by: Yanghang Liu <yanghliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add start and stop methods to channels	Edward Cree
	The TC extra channel needs to do extra work in efx_{start,stop}_channels() to start/stop MAE counter streaming from the hardware. Add callbacks for it to implement. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-28	net: drop the weight argument from netif_napi_add	Jakub Kicinski
	We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19	sfc: fix TX channel offset when using legacy interrupts	Íñigo Huguet
	In legacy interrupt mode the tx_channel_offset was hardcoded to 1, but that's not correct if efx_sepparate_tx_channels is false. In that case, the offset is 0 because the tx queues are in the single existing channel at index 0, together with the rx queue. Without this fix, as soon as you try to send any traffic, it tries to get the tx queues from an uninitialized channel getting these errors: WARNING: CPU: 1 PID: 0 at drivers/net/ethernet/sfc/tx.c:540 efx_hard_start_xmit+0x12e/0x170 [sfc] [...] RIP: 0010:efx_hard_start_xmit+0x12e/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [...] RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] Fixes: c308dfd1b43e ("sfc: fix wrong tx channel offset with efx_separate_tx_channels") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220914103648.16902-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-29	sfc: fix wrong tx channel offset with efx_separate_tx_channels	Íñigo Huguet
	tx_channel_offset is calculated in efx_allocate_msix_channels, but it is also calculated again in efx_set_channels because it was originally done there, and when efx_allocate_msix_channels was introduced it was forgotten to be removed from efx_set_channels. Moreover, the old calculation is wrong when using efx_separate_tx_channels because now we can have XDP channels after the TX channels, so n_channels - n_tx_channels doesn't point to the first TX channel. Remove the old calculation from efx_set_channels, and add the initialization of this variable if MSI or legacy interrupts are used, next to the initialization of the rest of the related variables, where it was missing. Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	eth: sfc: remove remnants of the out-of-tree napi_weight module param	Jakub Kicinski
	Remove napi_weight statics which are set to 64 and never modified, remnants of the out-of-tree napi_weight module param. Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220512205603.1536771-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09	net: sfc: fix memory leak due to ptp channel	Taehee Yoo
	It fixes memory leak in ring buffer change logic. When ring buffer size is changed(ethtool -G eth0 rx 4096), sfc driver works like below. 1. stop all channels and remove ring buffers. 2. allocates new buffer array. 3. allocates rx buffers. 4. start channels. While the above steps are working, it skips some steps if the channel doesn't have a ->copy callback function. Due to ptp channel doesn't have ->copy callback, these above steps are skipped for ptp channel. It eventually makes some problems. a. ptp channel's ring buffer size is not changed, it works only 1024(default). b. memory leak. The reason for memory leak is to use the wrong ring buffer values. There are some values, which is related to ring buffer size. a. efx->rxq_entries - This is global value of rx queue size. b. rx_queue->ptr_mask - used for access ring buffer as circular ring. - roundup_pow_of_two(efx->rxq_entries) - 1 c. rx_queue->max_fill - efx->rxq_entries - EFX_RXD_HEAD_ROOM These all values should be based on ring buffer size consistently. But ptp channel's values are not. a. efx->rxq_entries - This is global(for sfc) value, always new ring buffer size. b. rx_queue->ptr_mask - This is always 1023(default). c. rx_queue->max_fill - This is new ring buffer size - EFX_RXD_HEAD_ROOM. Let's assume we set 4096 for rx ring buffer, normal channel ptp channel efx->rxq_entries 4096 4096 rx_queue->ptr_mask 4095 1023 rx_queue->max_fill 4086 4086 sfc driver allocates rx ring buffers based on these values. When it allocates ptp channel's ring buffer, 4086 ring buffers are allocated then, these buffers are attached to the allocated array. But ptp channel's ring buffer array size is still 1024(default) and ptr_mask is still 1023 too. So, 3062 ring buffers will be overwritten to the array. This is the reason for memory leak. Test commands: ethtool -G <interface name> rx 4096 while : do ip link set <interface name> up ip link set <interface name> down done In order to avoid this problem, it adds ->copy callback to ptp channel type. So that rx_queue->ptr_mask value will be updated correctly. Fixes: 7c236c43b838 ("sfc: Add support for IEEE-1588 PTP") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08	eth: switch to netif_napi_add_weight()	Jakub Kicinski
	Switch all Ethernet drivers which use custom napi weights to the new API. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-12	sfc: efx_default_channel_type APIs can be static	Martin Habets
	This means we can remove them from efx_channel.h and avoid naming conflicts later. efx_channel_dummy_op_void() cannot be static as it is used in ef100_nic.c. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-06	net: sfc: fix using uninitialized xdp tx_queue	Taehee Yoo
	In some cases, xdp tx_queue can get used before initialization. 1. interface up/down 2. ring buffer size change When CPU cores are lower than maximum number of channels of sfc driver, it creates new channels only for XDP. When an interface is up or ring buffer size is changed, all channels are initialized. But xdp channels are always initialized later. So, the below scenario is possible. Packets are received to rx queue of normal channels and it is acted XDP_TX and tx_queue of xdp channels get used. But these tx_queues are not initialized yet. If so, TX DMA or queue error occurs. In order to avoid this problem. 1. initializes xdp tx_queues earlier than other rx_queue in efx_start_channels(). 2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers(). Splat looks like: sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250 sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL) sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22 (raw=22) arg=789 sfc 0000:08:00.1 enp8s0f1np1: has been disabled Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)") Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01	net: sfc: add missing xdp queue reinitialization	Taehee Yoo
	After rx/tx ring buffer size is changed, kernel panic occurs when it acts XDP_TX or XDP_REDIRECT. When tx/rx ring buffer size is changed(ethtool -G), sfc driver reallocates and reinitializes rx and tx queues and their buffer (tx_queue->buffer). But it misses reinitializing xdp queues(efx->xdp_tx_queues). So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized tx_queue->buffer. A new function efx_set_xdp_channels() is separated from efx_set_channels() to handle only xdp queues. Splat looks like: BUG: kernel NULL pointer dereference, address: 000000000000002a #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#4] PREEMPT SMP NOPTI RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 5.17.0+ #55 e8beeee8289528f11357029357cf Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297 RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0 RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0 FS: 0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0 RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297 PKRU: 55555554 RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700 RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700 FS: 0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0 PKRU: 55555554 Call Trace: <IRQ> efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] ? enqueue_task_fair+0x95/0x550 efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-30	sfc: Avoid NULL pointer dereference on systems without numa awareness	Martin Habets
	On such systems cpumask_of_node() returns NULL, which bitmap operations are not happy with. Fixes: c265b569a45f ("sfc: default config to 1 channel/core in local NUMA node only") Fixes: 09a99ab16c60 ("sfc: set affinity hints in local NUMA node only") Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Íñigo Huguet <ihuguet@redhat.com> Link: https://lore.kernel.org/r/164857006953.8140.3265568858101821256.stgit@palantir17.mph.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01	sfc: set affinity hints in local NUMA node only	Íñigo Huguet
	Affinity hints were being set to CPUs in local NUMA node first, and then in other CPUs. This was creating 2 unintended issues: 1. Channels created to be assigned each to a different physical core were assigned to hyperthreading siblings because of being in same NUMA node. Since the patch previous to this one, this did not longer happen with default rss_cpus modparam because less channels are created. 2. XDP channels could be assigned to CPUs in different NUMA nodes, decreasing performance too much (to less than half in some of my tests). This patch sets the affinity hints spreading the channels only in local NUMA node's CPUs. A fallback for the case that no CPU in local NUMA node is online has been added too. Example of CPUs being assigned in a non optimal way before this and the previous patch (note: in this system, xdp-8 to xdp-15 are created because num_possible_cpus == 64, but num_present_cpus == 32 so they're never used): $ lscpu \| grep -i numa NUMA node(s): 2 NUMA node0 CPU(s): 0-7,16-23 NUMA node1 CPU(s): 8-15,24-31 $ grep -H . /proc/irq//0000:07:00.0/../smp_affinity_list /proc/irq/141/0000:07:00.0-0/../smp_affinity_list:0 /proc/irq/142/0000:07:00.0-1/../smp_affinity_list:1 /proc/irq/143/0000:07:00.0-2/../smp_affinity_list:2 /proc/irq/144/0000:07:00.0-3/../smp_affinity_list:3 /proc/irq/145/0000:07:00.0-4/../smp_affinity_list:4 /proc/irq/146/0000:07:00.0-5/../smp_affinity_list:5 /proc/irq/147/0000:07:00.0-6/../smp_affinity_list:6 /proc/irq/148/0000:07:00.0-7/../smp_affinity_list:7 /proc/irq/149/0000:07:00.0-8/../smp_affinity_list:16 /proc/irq/150/0000:07:00.0-9/../smp_affinity_list:17 /proc/irq/151/0000:07:00.0-10/../smp_affinity_list:18 /proc/irq/152/0000:07:00.0-11/../smp_affinity_list:19 /proc/irq/153/0000:07:00.0-12/../smp_affinity_list:20 /proc/irq/154/0000:07:00.0-13/../smp_affinity_list:21 /proc/irq/155/0000:07:00.0-14/../smp_affinity_list:22 /proc/irq/156/0000:07:00.0-15/../smp_affinity_list:23 /proc/irq/157/0000:07:00.0-xdp-0/../smp_affinity_list:8 /proc/irq/158/0000:07:00.0-xdp-1/../smp_affinity_list:9 /proc/irq/159/0000:07:00.0-xdp-2/../smp_affinity_list:10 /proc/irq/160/0000:07:00.0-xdp-3/../smp_affinity_list:11 /proc/irq/161/0000:07:00.0-xdp-4/../smp_affinity_list:12 /proc/irq/162/0000:07:00.0-xdp-5/../smp_affinity_list:13 /proc/irq/163/0000:07:00.0-xdp-6/../smp_affinity_list:14 /proc/irq/164/0000:07:00.0-xdp-7/../smp_affinity_list:15 /proc/irq/165/0000:07:00.0-xdp-8/../smp_affinity_list:24 /proc/irq/166/0000:07:00.0-xdp-9/../smp_affinity_list:25 /proc/irq/167/0000:07:00.0-xdp-10/../smp_affinity_list:26 /proc/irq/168/0000:07:00.0-xdp-11/../smp_affinity_list:27 /proc/irq/169/0000:07:00.0-xdp-12/../smp_affinity_list:28 /proc/irq/170/0000:07:00.0-xdp-13/../smp_affinity_list:29 /proc/irq/171/0000:07:00.0-xdp-14/../smp_affinity_list:30 /proc/irq/172/0000:07:00.0-xdp-15/../smp_affinity_list:31 CPUs assignments after this and previous patch, so normal channels created only one per core in NUMA node and affinities set only to local NUMA node: $ grep -H . /proc/irq//0000:07:00.0/../smp_affinity_list /proc/irq/116/0000:07:00.0-0/../smp_affinity_list:0 /proc/irq/117/0000:07:00.0-1/../smp_affinity_list:1 /proc/irq/118/0000:07:00.0-2/../smp_affinity_list:2 /proc/irq/119/0000:07:00.0-3/../smp_affinity_list:3 /proc/irq/120/0000:07:00.0-4/../smp_affinity_list:4 /proc/irq/121/0000:07:00.0-5/../smp_affinity_list:5 /proc/irq/122/0000:07:00.0-6/../smp_affinity_list:6 /proc/irq/123/0000:07:00.0-7/../smp_affinity_list:7 /proc/irq/124/0000:07:00.0-xdp-0/../smp_affinity_list:16 /proc/irq/125/0000:07:00.0-xdp-1/../smp_affinity_list:17 /proc/irq/126/0000:07:00.0-xdp-2/../smp_affinity_list:18 /proc/irq/127/0000:07:00.0-xdp-3/../smp_affinity_list:19 /proc/irq/128/0000:07:00.0-xdp-4/../smp_affinity_list:20 /proc/irq/129/0000:07:00.0-xdp-5/../smp_affinity_list:21 /proc/irq/130/0000:07:00.0-xdp-6/../smp_affinity_list:22 /proc/irq/131/0000:07:00.0-xdp-7/../smp_affinity_list:23 /proc/irq/132/0000:07:00.0-xdp-8/../smp_affinity_list:0 /proc/irq/133/0000:07:00.0-xdp-9/../smp_affinity_list:1 /proc/irq/134/0000:07:00.0-xdp-10/../smp_affinity_list:2 /proc/irq/135/0000:07:00.0-xdp-11/../smp_affinity_list:3 /proc/irq/136/0000:07:00.0-xdp-12/../smp_affinity_list:4 /proc/irq/137/0000:07:00.0-xdp-13/../smp_affinity_list:5 /proc/irq/138/0000:07:00.0-xdp-14/../smp_affinity_list:6 /proc/irq/139/0000:07:00.0-xdp-15/../smp_affinity_list:7 Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01	sfc: default config to 1 channel/core in local NUMA node only	Íñigo Huguet
	Handling channels from CPUs in different NUMA node can penalize performance, so better configure only one channel per core in the same NUMA node than the NIC, and not per each core in the system. Fallback to all other online cores if there are not online CPUs in local NUMA node. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06	sfc: Use swap() instead of open coding it	Jiapeng Chong
	Clean the following coccicheck warning: ./drivers/net/ethernet/sfc/efx_channels.c:870:36-37: WARNING opportunity for swap(). ./drivers/net/ethernet/sfc/efx_channels.c:824:36-37: WARNING opportunity for swap(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-29	net: Don't include filter.h from net/sock.h	Jakub Kicinski
	sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-09-09	sfc: last resort fallback for lack of xdp tx queues	Íñigo Huguet
	Previous patch addressed the situation of having some free resources for xdp tx but not enough for one tx queue per CPU. This patch address the worst case of not having resources at all for xdp tx. Instead of using queues dedicated to xdp, normal queues used by network stack are shared for both cases, using __netif_tx_lock for synchronization. Also queue stop/restart must be considered in the xdp path to avoid freezing the queue. This is not the ideal situation we might want to be, and a performance penalty is expected both for normal and xdp traffic, but at least XDP will work in all possible situations (with a warning in the logs), improving a bit the pain of not knowing in what situations we can use it and in what situations we cannot. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-09	sfc: fallback for lack of xdp tx queues	Íñigo Huguet
	If there are not enough resources to allocate one TX queue per core for XDP TX it was completely disabled. This patch implements a fallback solution for sharing the available queues using __netif_tx_lock for synchronization. In the normal case that there is one TX queue per CPU, no locking is done, as it was before. With this fallback solution, XDP TX will work in much more cases that were failing, specially in machines with many CPUs. It's hard for XDP users to know what features are supported across different NICs and configurations, so they will benefit on having wider support. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13	sfc: add logs explaining XDP_TX/REDIRECT is not available	Íñigo Huguet
	If it's not possible to allocate enough channels for XDP, XDP_TX and XDP_REDIRECT don't work. However, only a message saying that not enough channels were available was shown, but not saying what are the consequences in that case. The user didn't know if he/she can use XDP or not, if the performance is reduced, or what. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13	sfc: ensure correct number of XDP queues	Íñigo Huguet
	Commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") intended to fix a problem caused by a round up when calculating the number of XDP channels and queues. However, this was not the real problem. The real problem was that the number of XDP TX queues had been reduced to half in commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"), but the variable xdp_tx_queue_count had remained the same. Once the correct number of XDP TX queues is created again in the previous patch of this series, this also can be reverted since the error doesn't actually exist. Only in the case that there is a bug in the code we can have different values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this, and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it happens again in the future. Note that the number of allocated queues can be higher than the number of used ones due to the round up, as explained in the existing comment in the code. That's why we also have to stop increasing xdp_queue_number beyond efx->xdp_tx_queue_count. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13	sfc: fix lack of XDP TX queues - error XDP TX failed (-22)	Íñigo Huguet
	Fixes: e26ca4b53582 sfc: reduce the number of requested xdp ev queues The buggy commit intended to allocate less channels for XDP in order to be more unlikely to reach the limit of 32 channels of the driver. The idea was to use each IRQ/eventqeue for more XDP TX queues than before, calculating which is the maximum number of TX queues that one event queue can handle. For example, in EF10 each event queue could handle up to 8 queues, better than the 4 they were handling before the change. This way, it would have to allocate half of channels than before for XDP TX. The problem is that the TX queues are also contained inside the channel structs, and there are only 4 queues per channel. Reducing the number of channels means also reducing the number of queues, resulting in not having the desired number of 1 queue per CPU. This leads to getting errors on XDP_TX and XDP_REDIRECT if they're executed from a high numbered CPU, because there only exist queues for the low half of CPUs, actually. If XDP_TX/REDIRECT is executed in a low numbered CPU, the error doesn't happen. This is the error in the logs (repeated many times, even rate limited): sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) This errors happens in function efx_xdp_tx_buffers, where it expects to have a dedicated XDP TX queue per CPU. Reverting the change makes again more likely to reach the limit of 32 channels in machines with many CPUs. If this happen, no XDP_TX/REDIRECT will be possible at all, and we will have this log error messages: At interface probe: sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32) At every subsequent XDP_TX/REDIRECT failure, rate limited: sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) However, without reverting the change, it makes the user to think that everything is OK at probe time, but later it fails in an unpredictable way, depending on the CPU that handles the packet. It is better to restore the predictable behaviour. If the user sees the error message at probe time, he/she can try to configure the best way it fits his/her needs. At least, he/she will have 2 options: - Accept that XDP_TX/REDIRECT is not available (he/she may not need it) - Load sfc module with modparam 'rss_cpus' with a lower number, thus creating less normal RX queues/channels, letting more free resources for XDP, with some performance penalty. Anyway, let the calculation of maximum TX queues that can be handled by a single event queue, and use it only if it's less than the number of TX queues per channel. This doesn't happen in practice, but could happen if some constant values are tweaked in the future, such us EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE. Related mailing list thread: https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/ Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27	sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues	Ignat Korchagin
	efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is later used to allocate and traverse efx->xdp_tx_queues lookup array. However, we may end up not initializing all the array slots with real queues during probing. This results, for example, in a NULL pointer dereference, when running "# ethtool -S <iface>", similar to below [2570283.664955][T4126959] BUG: kernel NULL pointer dereference, address: 00000000000000f8 [2570283.681283][T4126959] #PF: supervisor read access in kernel mode [2570283.695678][T4126959] #PF: error_code(0x0000) - not-present page [2570283.710013][T4126959] PGD 0 P4D 0 [2570283.721649][T4126959] Oops: 0000 [#1] SMP PTI [2570283.734108][T4126959] CPU: 23 PID: 4126959 Comm: ethtool Tainted: G O 5.10.20-cloudflare-2021.3.1 #1 [2570283.752641][T4126959] Hardware name: <redacted> [2570283.781408][T4126959] RIP: 0010:efx_ethtool_get_stats+0x2ca/0x330 [sfc] [2570283.796073][T4126959] Code: 00 85 c0 74 39 48 8b 95 a8 0f 00 00 48 85 d2 74 2d 31 c0 eb 07 48 8b 95 a8 0f 00 00 48 63 c8 49 83 c4 08 83 c0 01 48 8b 14 ca <48> 8b 92 f8 00 00 00 49 89 54 24 f8 39 85 a0 0f 00 00 77 d7 48 8b [2570283.831259][T4126959] RSP: 0018:ffffb79a77657ce8 EFLAGS: 00010202 [2570283.845121][T4126959] RAX: 0000000000000019 RBX: ffffb799cd0c9280 RCX: 0000000000000018 [2570283.860872][T4126959] RDX: 0000000000000000 RSI: ffff96dd970ce000 RDI: 0000000000000005 [2570283.876525][T4126959] RBP: ffff96dd86f0a000 R08: ffff96dd970ce480 R09: 000000000000005f [2570283.892014][T4126959] R10: ffffb799cd0c9fff R11: ffffb799cd0c9000 R12: ffffb799cd0c94f8 [2570283.907406][T4126959] R13: ffffffffc11b1090 R14: ffff96dd970ce000 R15: ffffffffc11cd66c [2570283.922705][T4126959] FS: 00007fa7723f8740(0000) GS:ffff96f51fac0000(0000) knlGS:0000000000000000 [2570283.938848][T4126959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [2570283.952524][T4126959] CR2: 00000000000000f8 CR3: 0000001a73e6e006 CR4: 00000000007706e0 [2570283.967529][T4126959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [2570283.982400][T4126959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [2570283.997308][T4126959] PKRU: 55555554 [2570284.007649][T4126959] Call Trace: [2570284.017598][T4126959] dev_ethtool+0x1832/0x2830 Fix this by adjusting efx->xdp_tx_queue_count after probing to reflect the true value of initialized slots in efx->xdp_tx_queues. Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Fixes: e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues") Cc: <stable@vger.kernel.org> # 5.12.x Signed-off-by: David S. Miller <davem@davemloft.net>
2021-01-22	sfc: reduce the number of requested xdp ev queues	Ivan Babrou
	Without this change the driver tries to allocate too many queues, breaching the number of available msi-x interrupts on machines with many logical cpus and default adapter settings: Insufficient resources for 12 XDP event queues (24 other channels, max 32) Which in turn triggers EINVAL on XDP processing: sfc 0000:86:00.0 ext0: XDP TX failed (-22) Signed-off-by: Ivan Babrou <ivan@cloudflare.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20210120212759.81548-1-ivan@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-11	sfc: decouple TXQ type from label	Edward Cree
	Make it possible to have an arbitrary mapping from types to labels, because when we add inner-csum-offload TXQs there will no longer be a convenient nesting hierarchy of NIC types (EF10 will have inner-csum TXQs, while Siena will have HIGHPRI). Correct a misleading comment on efx_hard_start_xmit(). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11	sfc: cleanups around efx_alloc_channel	Edward Cree
	The old_channel argument is never used, so remove it. The function is only called from elsewhere in efx_channels.c, so make it static and remove the declaration from the header file. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02	sfc: assign TXQs without gaps	Edward Cree
	Since we only allocate VIs for the number of TXQs we actually need, we cannot naively use "channel * TXQ_TYPES + txq" for the TXQ number, as this has gaps (when efx->tx_queues_per_channel < EFX_TXQ_TYPES) and thus overruns the driver's VI allocations, causing the firmware to reject the MC_CMD_INIT_TXQ based on INSTANCE. Thus, we distinguish INSTANCE (stored in tx_queue->queue) from LABEL (tx_queue->label); the former is allocated starting from 0 in efx_set_channels(), while the latter is simply the txq type (index in channel->tx_queue array). To simplify things, rather than changing tx_queues_per_channel after setting up TXQs, make Siena always probe its HIGHPRI queues at start of day, rather than deferring it until tc mqprio enables them. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02	sfc: commonise netif_set_real_num[tr]x_queues calls	Edward Cree
	While we're at it, also check them for failure. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02	sfc: make tx_queues_per_channel variable at runtime	Edward Cree
	Siena needs four TX queues (csum * highpri), EF10 needs two (csum), and EF100 only needs one (as checksumming is controlled entirely by the transmit descriptor). Rather than having various bits of ad-hoc code to decide which queues to set up etc., put the knowledge of how many TXQs a channel has in one place. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02	sfc: move modparam 'rss_cpus' out of common channel code	Edward Cree
	Instead of exposing this old module parameter on the new driver (thus having to keep it forever after for compatibility), let's confine it to the old one; if we find later that we need the feature, we ought to support it properly, with ethtool set-channels. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02	sfc: move modparam 'interrupt_mode' out of common channel code	Edward Cree
	EF100 only supports MSI-X, so there's no need for the new driver to expose this old module parameter. Since it's now visible to the linker, we have to rename it internally to efx_interrupt_mode to avoid symbol collisions in non-modular builds. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02	sfc: remove max_interrupt_mode	Edward Cree
	All NICs supported by this driver are capable of MSI-X interrupts (only Falcon A1 wasn't, and that's now hived off into its own driver), so no need for a nic-type parameter. Besides, the code that checked it was buggy anyway (the following assignment that checked min_interrupt_mode overrode it). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30	sfc: initialise max_[tx_]channels in efx_init_channels()	Edward Cree
	Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29	sfc: don't try to create more channels than we can have VIs	Edward Cree
	Calculate efx->max_vis at probe time, and check against it in efx_allocate_msix_channels() when considering whether to create XDP TX channels. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller
	Minor overlapping changes, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09	sfc: detach from cb_page in efx_copy_channel()	Edward Cree
	It's a resource, not a parameter, so we can't copy it into the new channel's TX queues, otherwise aliasing will lead to resource- management bugs if the channel is subsequently torn down without being initialised. Before the Fixes:-tagged commit there was a similar bug with tsoh_page, but I'm not sure it's worth doing another fix for such old kernels. Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2") Suggested-by: Derek Shute <Derek.Shute@stratus.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17	sfc: move some ARFS code out of headers	Edward Cree
	efx_filter_rfs_expire() is a work-function, so it being inline makes no sense. It's only ever used in efx_channels.c, so move it there. While we're at it, clean out some related unused cruft. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17	sfc: only schedule asynchronous filter work if needed	Edward Cree
	Prevent excessive CPU time spent running a workitem with nothing to do. We avoid any races by keeping the same check in efx_filter_rfs_expire(). Suggested-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09	sfc: conditioned some functionality	Alex Maftei (amaftei)
	Before calling certain function pointers, check that they are non-NULL. Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08	sfc: move event queue management code	Alex Maftei (amaftei)
	Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08	sfc: move channel interrupt management code	Alex Maftei (amaftei)
	Small code styling fixes included. Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08	sfc: move channel alloc/removal code	Alex Maftei (amaftei)
	Reallocation and copying code is included, as well as some housekeeping code. Other files have been patched up a bit to accommodate the changes. Small code styling fixes included. Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08	sfc: move channel start/stop code	Alex Maftei (amaftei)
	Also includes interrupt enabling/disabling code. Small code styling fixes included. Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08	sfc: move some channel-related code	Alex Maftei (amaftei)
	Just a handful of function, but also removed many 'static' identifiers so the code builds. These will, of course, be moved. Module parameters for IRQ moderation threshold also moved. Small code styling fixes included. Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>