summaryrefslogtreecommitdiff
path: root/drivers/net/veth.c
diff options
context:
space:
mode:
authorDaniel Borkmann <daniel@iogearbox.net>2022-01-06 01:46:06 +0100
committerDavid S. Miller <davem@davemloft.net>2022-01-06 13:49:54 +0000
commit710ad98c363a66a0cd8526465426c5c5f8377ee0 (patch)
treeaf6727f8d4f07e9495f6a606ad338b99f7df58b9 /drivers/net/veth.c
parentc288bc0db2d1938691ef283ce61ae6122e562bc3 (diff)
downloadlwn-710ad98c363a66a0cd8526465426c5c5f8377ee0.tar.gz
lwn-710ad98c363a66a0cd8526465426c5c5f8377ee0.zip
veth: Do not record rx queue hint in veth_xmit
Laurent reported that they have seen a significant amount of TCP retransmissions at high throughput from applications residing in network namespaces talking to the outside world via veths. The drops were seen on the qdisc layer (fq_codel, as per systemd default) of the phys device such as ena or virtio_net due to all traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the setup was _not_ using XDP on veths as the issue is generic.) More specifically, after edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence") which made it all the way back to v4.19.184+, skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX queue by default for veths) instead of leaving at 0. This is eventually retained and callbacks like ena_select_queue() will also pick single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic is forwarded to that device via upper stack or other means. Similarly, for others not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well. In general, it is a _bad_ idea for virtual devices like veth to mess around with queue selection [by default]. Given dev->real_num_tx_queues is by default 1, the skb->queue_mapping was left untouched, and so prior to edbea9220251 the netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device. Unbreak this and restore prior behavior by removing the skb_record_rx_queue() from veth_xmit() altogether. If the veth peer has an XDP program attached, then it would return the first RX queue index in xdp_md->rx_queue_index (unless configured in non-default manner). However, this is still better than breaking the generic case. Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence") Fixes: 638264dc9022 ("veth: Support per queue XDP ring") Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Toshiaki Makita <toshiaki.makita1@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'drivers/net/veth.c')
-rw-r--r--drivers/net/veth.c1
1 files changed, 0 insertions, 1 deletions
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index d21dd25f429e..354a963075c5 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -335,7 +335,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
*/
use_napi = rcu_access_pointer(rq->napi) &&
veth_skb_is_eligible_for_gro(dev, rcv, skb);
- skb_record_rx_queue(skb, rxq);
}
skb_tx_timestamp(skb);