diff options
author | Alexei Starovoitov <ast@kernel.org> | 2023-02-02 20:48:24 -0800 |
---|---|---|
committer | Alexei Starovoitov <ast@kernel.org> | 2023-02-02 20:48:24 -0800 |
commit | 0a312cf8dbec111df6d5c72e0a6b017d48a829c6 (patch) | |
tree | 2b2ed1a72d237cba78d026aedc5f6afb205659b5 | |
parent | 150809082aab95f7078f343c1c82770d6c9ebe68 (diff) | |
parent | 4dba3e7852b7717a20ba206ab23ad289a0a5c504 (diff) | |
download | lwn-0a312cf8dbec111df6d5c72e0a6b017d48a829c6.tar.gz lwn-0a312cf8dbec111df6d5c72e0a6b017d48a829c6.zip |
Merge branch 'xdp: introduce xdp-feature support'
Lorenzo Bianconi says:
====================
Introduce the capability to export the XDP features supported by the NIC.
Introduce a XDP compliance test tool (xdp_features) to check the features
exported by the NIC match the real features supported by the driver.
Allow XDP_REDIRECT of non-linear XDP frames into a devmap.
Export XDP features for each XDP capable driver.
Extend libbpf netlink implementation in order to support netlink_generic
protocol.
Introduce a simple generic netlink family for netdev data.
Changes since v4:
- rebase on top of bpf-next
- get rid of XDP_FEATURE_* enum in XDP compliance test tool
- rely on AF_INET6 address family and on IPv6-mapped-IPv6 addresses for IPv4
in XDP compliance test tool
- add tsnep driver support
Changes since v3:
- add IPv6 support to XDP compliance test tool
- rely on network_helpers in XDP compliance test tool
- cosmetics changes
Changes since v2:
- rebase on top of bpf-next
- fix compilation error
Changes since v1:
- add Documentation to netdev.yaml
- use flags instead of enum as type for netdev.yaml definitions
- squash XDP_PASS, XDP_DROP, XDP_TX and XDP_ABORTED into XDP_BASIC since they
are supported by all drivers.
- add notifier event to xdp_features_set_redirect_target() and
xdp_features_clear_redirect_target()
- add selftest for xdp-features support in bpf_xdp_detach()
- add IPv6 preliminary support to XDP compliance test tool
Changes since RFCv2:
- do not assume fixed layout for genl kernel messages
- fix warnings in netdev_nl_dev_fill
- fix capabilities for nfp driver
- add supported_sg parameter to xdp_features_set_redirect_target and drop
__xdp_features_set_redirect_target routine
Changes since RFCv1:
- Introduce netdev-genl implementation and get rid of rtnl one.
- Introduce netlink_generic support in libbpf netlink implementation
- Rename XDP_FEATURE_* in NETDEV_XDP_ACT_*
- Rename XDP_FEATURE_REDIRECT_TARGET in NETDEV_XDP_ACT_NDO_XMIT
- Rename XDP_FEATURE_FRAG_RX in NETDEV_XDP_ACT_RX_SG
- Rename XDP_FEATURE_FRAG_TARFET in NETDEV_XDP_ACT_NDO_XMIT
- Get rid of XDP_LOCK feature.
- Move xdp_feature field in a netdevice struct hole in the 4th cacheline.
Jakub Kicinski (1):
netdev-genl: create a simple family for netdev stuff
Lorenzo Bianconi (5):
libbpf: add the capability to specify netlink proto in
libbpf_netlink_send_recv
libbpf: add API to get XDP/XSK supported features
bpf: devmap: check XDP features in __xdp_enqueue routine
selftests/bpf: add test for bpf_xdp_query xdp-features support
selftests/bpf: introduce XDP compliance test tool
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
63 files changed, 1945 insertions, 33 deletions
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml new file mode 100644 index 000000000000..b4dcdae54ffd --- /dev/null +++ b/Documentation/netlink/specs/netdev.yaml @@ -0,0 +1,100 @@ +name: netdev + +doc: + netdev configuration over generic netlink. + +definitions: + - + type: flags + name: xdp-act + entries: + - + name: basic + doc: + XDP feautues set supported by all drivers + (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) + - + name: redirect + doc: + The netdev supports XDP_REDIRECT + - + name: ndo-xmit + doc: + This feature informs if netdev implements ndo_xdp_xmit callback. + - + name: xsk-zerocopy + doc: + This feature informs if netdev supports AF_XDP in zero copy mode. + - + name: hw-offload + doc: + This feature informs if netdev supports XDP hw oflloading. + - + name: rx-sg + doc: + This feature informs if netdev implements non-linear XDP buffer + support in the driver napi callback. + - + name: ndo-xmit-sg + doc: + This feature informs if netdev implements non-linear XDP buffer + support in ndo_xdp_xmit callback. + +attribute-sets: + - + name: dev + attributes: + - + name: ifindex + doc: netdev ifindex + type: u32 + value: 1 + checks: + min: 1 + - + name: pad + type: pad + - + name: xdp-features + doc: Bitmask of enabled xdp-features. + type: u64 + enum: xdp-act + enum-as-flags: true + +operations: + list: + - + name: dev-get + doc: Get / dump information about a netdev. + value: 1 + attribute-set: dev + do: + request: + attributes: + - ifindex + reply: &dev-all + attributes: + - ifindex + - xdp-features + dump: + reply: *dev-all + - + name: dev-add-ntf + doc: Notification about device appearing. + notify: dev-get + mcgrp: mgmt + - + name: dev-del-ntf + doc: Notification about device disappearing. + notify: dev-get + mcgrp: mgmt + - + name: dev-change-ntf + doc: Notification about device configuration being changed. + notify: dev-get + mcgrp: mgmt + +mcast-groups: + list: + - + name: mgmt diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index e8ad5ea31aff..d3999db7c6a2 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -597,7 +597,9 @@ static int ena_xdp_set(struct net_device *netdev, struct netdev_bpf *bpf) if (rc) return rc; } + xdp_features_set_redirect_target(netdev, false); } else if (old_bpf_prog) { + xdp_features_clear_redirect_target(netdev); rc = ena_destroy_and_free_all_xdp_queues(adapter); if (rc) return rc; @@ -4103,6 +4105,8 @@ static void ena_set_conf_feat_params(struct ena_adapter *adapter, /* Set offload features */ ena_set_dev_offloads(feat, netdev); + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; + adapter->max_mtu = feat->dev_attr.max_mtu; netdev->max_mtu = adapter->max_mtu; netdev->min_mtu = ENA_MIN_MTU; diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c index 06508eebb585..d6d6d5d37ff3 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c @@ -384,6 +384,11 @@ void aq_nic_ndev_init(struct aq_nic_s *self) self->ndev->mtu = aq_nic_cfg->mtu - ETH_HLEN; self->ndev->max_mtu = aq_hw_caps->mtu - ETH_FCS_LEN - ETH_HLEN; + self->ndev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; } void aq_nic_set_tx_ring(struct aq_nic_s *self, unsigned int idx, diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 240a7e8a7652..a1b4356dfb6c 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -13686,6 +13686,9 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) netif_set_tso_max_size(dev, GSO_MAX_SIZE); + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_RX_SG; + #ifdef CONFIG_BNXT_SRIOV init_waitqueue_head(&bp->sriov_cfg_wait); #endif diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 36d5202c0aee..5843c93b1711 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -422,9 +422,11 @@ static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog) if (prog) { bnxt_set_rx_skb_mode(bp, true); + xdp_features_set_redirect_target(dev, true); } else { int rx, tx; + xdp_features_clear_redirect_target(dev); bnxt_set_rx_skb_mode(bp, false); bnxt_get_max_rings(bp, &rx, &tx, true); if (rx > 1) { diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index f2f95493ec89..8b25313c7f6b 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -2218,6 +2218,8 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) netdev->netdev_ops = &nicvf_netdev_ops; netdev->watchdog_timeo = NICVF_TX_TIMEOUT; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC; + /* MTU range: 64 - 9200 */ netdev->min_mtu = NIC_HW_MIN_FRS; netdev->max_mtu = NIC_HW_MAX_FRS; diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c index c3cf427a9409..6982aaa928b5 100644 --- a/drivers/net/ethernet/engleder/tsnep_main.c +++ b/drivers/net/ethernet/engleder/tsnep_main.c @@ -1926,6 +1926,10 @@ static int tsnep_probe(struct platform_device *pdev) netdev->features = NETIF_F_SG; netdev->hw_features = netdev->features | NETIF_F_LOOPBACK; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_NDO_XMIT_SG; + /* carrier off reporting is important to ethtool even BEFORE open */ netif_carrier_off(netdev); diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 027fff9f7db0..9318a2554056 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -244,6 +244,10 @@ static int dpaa_netdev_init(struct net_device *net_dev, net_dev->features |= net_dev->hw_features; net_dev->vlan_features = net_dev->features; + net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + if (is_valid_ether_addr(mac_addr)) { memcpy(net_dev->perm_addr, mac_addr, net_dev->addr_len); eth_hw_addr_set(net_dev, mac_addr); diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 2e79d18fc3c7..746ccfde7255 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -4596,6 +4596,10 @@ static int dpaa2_eth_netdev_init(struct net_device *net_dev) NETIF_F_LLTX | NETIF_F_HW_TC | NETIF_F_TSO; net_dev->gso_max_segs = DPAA2_ETH_ENQUEUE_MAX_FDS; net_dev->hw_features = net_dev->features; + net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY | + NETDEV_XDP_ACT_NDO_XMIT; if (priv->dpni_attrs.vlan_filter_entries) net_dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER; diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index 7facc7d5261e..6b54071d4ecc 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -807,6 +807,9 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev, ndev->hw_features |= NETIF_F_RXHASH; ndev->priv_flags |= IFF_UNICAST_FLT; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) { priv->active_offloads |= ENETC_F_QCI; diff --git a/drivers/net/ethernet/fungible/funeth/funeth_main.c b/drivers/net/ethernet/fungible/funeth/funeth_main.c index b4cce30e526a..df86770731ad 100644 --- a/drivers/net/ethernet/fungible/funeth/funeth_main.c +++ b/drivers/net/ethernet/fungible/funeth/funeth_main.c @@ -1160,6 +1160,11 @@ static int fun_xdp_setup(struct net_device *dev, struct netdev_bpf *xdp) WRITE_ONCE(rxqs[i]->xdp_prog, prog); } + if (prog) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); + dev->max_mtu = prog ? XDP_MAX_MTU : FUN_MAX_MTU; old_prog = xchg(&fp->xdp_prog, prog); if (old_prog) @@ -1765,6 +1770,7 @@ static int fun_create_netdev(struct fun_ethdev *ed, unsigned int portid) netdev->vlan_features = netdev->features & VLAN_FEAT; netdev->mpls_features = netdev->vlan_features; netdev->hw_enc_features = netdev->hw_features; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; netdev->min_mtu = ETH_MIN_MTU; netdev->max_mtu = FUN_MAX_MTU; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 53d0083e35da..8a79cc18c428 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -13339,9 +13339,11 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog, old_prog = xchg(&vsi->xdp_prog, prog); if (need_reset) { - if (!prog) + if (!prog) { + xdp_features_clear_redirect_target(vsi->netdev); /* Wait until ndo_xsk_wakeup completes. */ synchronize_rcu(); + } i40e_reset_and_rebuild(pf, true, true); } @@ -13362,11 +13364,13 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog, /* Kick start the NAPI context if there is an AF_XDP socket open * on that queue id. This so that receiving will start. */ - if (need_reset && prog) + if (need_reset && prog) { for (i = 0; i < vsi->num_queue_pairs; i++) if (vsi->xdp_rings[i]->xsk_pool) (void)i40e_xsk_wakeup(vsi->netdev, i, XDP_WAKEUP_RX); + xdp_features_set_redirect_target(vsi->netdev, true); + } return 0; } @@ -13783,6 +13787,8 @@ static int i40e_config_netdev(struct i40e_vsi *vsi) netdev->hw_enc_features |= NETIF_F_TSO_MANGLEID; netdev->features &= ~NETIF_F_HW_TC; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; if (vsi->type == I40E_VSI_MAIN) { SET_NETDEV_DEV(netdev, &pf->pdev->dev); diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 26a8910a41ff..074b0e6d0e2d 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -22,6 +22,7 @@ #include "ice_eswitch.h" #include "ice_tc_lib.h" #include "ice_vsi_vlan_ops.h" +#include <net/xdp_sock_drv.h> #define DRV_SUMMARY "Intel(R) Ethernet Connection E800 Series Linux Driver" static const char ice_driver_string[] = DRV_SUMMARY; @@ -2912,11 +2913,13 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed"); } + xdp_features_set_redirect_target(vsi->netdev, false); /* reallocate Rx queues that are used for zero-copy */ xdp_ring_err = ice_realloc_zc_buf(vsi, true); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Rx resources failed"); } else if (ice_is_xdp_ena_vsi(vsi) && !prog) { + xdp_features_clear_redirect_target(vsi->netdev); xdp_ring_err = ice_destroy_xdp_rings(vsi); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Freeing XDP Tx resources failed"); @@ -3459,6 +3462,8 @@ static int ice_cfg_netdev(struct ice_vsi *vsi) np->vsi = vsi; ice_set_netdev_features(netdev); + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 3c0c35ecea10..0e11a082f7a1 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2871,8 +2871,14 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf) bpf_prog_put(old_prog); /* bpf is just replaced, RXQ and MTU are already setup */ - if (!need_reset) + if (!need_reset) { return 0; + } else { + if (prog) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); + } if (running) igb_open(dev); @@ -3317,6 +3323,7 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent) netdev->priv_flags |= IFF_SUPP_NOFCS; netdev->priv_flags |= IFF_UNICAST_FLT; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; /* MTU range: 68 - 9216 */ netdev->min_mtu = ETH_MIN_MTU; diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index e86b15efaeb8..8b572cd2c350 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -6533,6 +6533,9 @@ static int igc_probe(struct pci_dev *pdev, netdev->mpls_features |= NETIF_F_HW_CSUM; netdev->hw_enc_features |= netdev->vlan_features; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; + /* MTU range: 68 - 9216 */ netdev->min_mtu = ETH_MIN_MTU; netdev->max_mtu = MAX_STD_JUMBO_FRAME_SIZE; diff --git a/drivers/net/ethernet/intel/igc/igc_xdp.c b/drivers/net/ethernet/intel/igc/igc_xdp.c index aeeb34e64610..e27af72aada8 100644 --- a/drivers/net/ethernet/intel/igc/igc_xdp.c +++ b/drivers/net/ethernet/intel/igc/igc_xdp.c @@ -29,6 +29,11 @@ int igc_xdp_set_prog(struct igc_adapter *adapter, struct bpf_prog *prog, if (old_prog) bpf_prog_put(old_prog); + if (prog) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); + if (if_running) igc_open(dev); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 43a44c1e1576..af4c12b6059f 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -10301,6 +10301,8 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog) if (err) return -EINVAL; + if (!prog) + xdp_features_clear_redirect_target(dev); } else { for (i = 0; i < adapter->num_rx_queues; i++) { WRITE_ONCE(adapter->rx_ring[i]->xdp_prog, @@ -10321,6 +10323,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog) if (adapter->xdp_ring[i]->xsk_pool) (void)ixgbe_xsk_wakeup(adapter->netdev, i, XDP_WAKEUP_RX); + xdp_features_set_redirect_target(dev, true); } return 0; @@ -11018,6 +11021,9 @@ skip_sriov: netdev->priv_flags |= IFF_UNICAST_FLT; netdev->priv_flags |= IFF_SUPP_NOFCS; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; + /* MTU range: 68 - 9710 */ netdev->min_mtu = ETH_MIN_MTU; netdev->max_mtu = IXGBE_MAX_JUMBO_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN); diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index ea0a230c1153..a44e4bd56142 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -4634,6 +4634,7 @@ static int ixgbevf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) NETIF_F_HW_VLAN_CTAG_TX; netdev->priv_flags |= IFF_UNICAST_FLT; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC; /* MTU range: 68 - 1504 or 9710 */ netdev->min_mtu = ETH_MIN_MTU; diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index f8925cac61e4..dc2989103a77 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -5612,6 +5612,9 @@ static int mvneta_probe(struct platform_device *pdev) NETIF_F_TSO | NETIF_F_RXCSUM; dev->hw_features |= dev->features; dev->vlan_features |= dev->features; + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; netif_set_tso_max_segs(dev, MVNETA_MAX_TSO_SEGS); diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 4da45c5abba5..9b4ecbe4f36d 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -6866,6 +6866,10 @@ static int mvpp2_port_probe(struct platform_device *pdev, dev->vlan_features |= features; netif_set_tso_max_segs(dev, MVPP2_MAX_TSO_SEGS); + + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + dev->priv_flags |= IFF_UNICAST_FLT; /* MTU range: 68 - 9704 */ diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index c1ea60bc2630..179433d0a54a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -2512,10 +2512,13 @@ static int otx2_xdp_setup(struct otx2_nic *pf, struct bpf_prog *prog) /* Network stack and XDP shared same rx queues. * Use separate tx queues for XDP and network stack. */ - if (pf->xdp_prog) + if (pf->xdp_prog) { pf->hw.xdp_queues = pf->hw.rx_queues; - else + xdp_features_set_redirect_target(dev, false); + } else { pf->hw.xdp_queues = 0; + xdp_features_clear_redirect_target(dev); + } pf->hw.tot_tx_queues += pf->hw.xdp_queues; @@ -2878,6 +2881,7 @@ static int otx2_probe(struct pci_dev *pdev, const struct pci_device_id *id) netdev->watchdog_timeo = OTX2_TX_TIMEOUT; netdev->netdev_ops = &otx2_netdev_ops; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; netdev->min_mtu = OTX2_MIN_MTU; netdev->max_mtu = otx2_get_max_mtu(pf); diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 801deac58bf7..ac54b6f2bb5c 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -4447,6 +4447,12 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) register_netdevice_notifier(&mac->device_notifier); } + if (mtk_page_pool_enabled(eth)) + eth->netdev[id]->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_NDO_XMIT_SG; + return 0; free_netdev: diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index af4c4858f397..e11bc0ac880e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -3416,6 +3416,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port, priv->rss_hash_fn = ETH_RSS_HASH_TOP; } + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; + /* MTU range: 68 - hw-specific max */ dev->min_mtu = ETH_MIN_MTU; dev->max_mtu = priv->max_mtu; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 0e87432ec6f1..e4996ef04d86 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4780,6 +4780,13 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog) if (old_prog) bpf_prog_put(old_prog); + if (reset) { + if (prog) + xdp_features_set_redirect_target(netdev, true); + else + xdp_features_clear_redirect_target(netdev); + } + if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset) goto unlock; @@ -5175,6 +5182,10 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) netdev->features |= NETIF_F_HIGHDMA; netdev->features |= NETIF_F_HW_VLAN_STAG_FILTER; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY | + NETDEV_XDP_ACT_RX_SG; + netdev->priv_flags |= IFF_UNICAST_FLT; netif_set_tso_max_size(netdev, GSO_MAX_SIZE); diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index 2f6a048dee90..6120f2b6684f 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -2160,6 +2160,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx, ndev->hw_features |= NETIF_F_RXHASH; ndev->features = ndev->hw_features; ndev->vlan_features = 0; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; err = register_netdev(ndev); if (err) { diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 18fc9971f1c8..e4825d885560 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -2529,10 +2529,15 @@ static void nfp_net_netdev_init(struct nfp_net *nn) netdev->features &= ~NETIF_F_HW_VLAN_STAG_RX; nn->dp.ctrl &= ~NFP_NET_CFG_CTRL_RXQINQ; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC; + if (nn->app && nn->app->type->id == NFP_APP_BPF_NIC) + netdev->xdp_features |= NETDEV_XDP_ACT_HW_OFFLOAD; + /* Finalise the netdev setup */ switch (nn->dp.ops->version) { case NFP_NFD_VER_NFD3: netdev->netdev_ops = &nfp_nfd3_netdev_ops; + netdev->xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY; break; case NFP_NFD_VER_NFDK: netdev->netdev_ops = &nfp_nfdk_netdev_ops; diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 953f304b8588..b6d999927e86 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -892,6 +892,9 @@ static void qede_init_ndev(struct qede_dev *edev) ndev->hw_features = hw_features; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + /* MTU range: 46 - 9600 */ ndev->min_mtu = ETH_ZLEN - ETH_HLEN; ndev->max_mtu = QEDE_MAX_JUMBO_PACKET_SIZE; diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c index 0556542d7a6b..18ff8d8cff42 100644 --- a/drivers/net/ethernet/sfc/efx.c +++ b/drivers/net/ethernet/sfc/efx.c @@ -1078,6 +1078,10 @@ static int efx_pci_probe(struct pci_dev *pci_dev, pci_info(pci_dev, "Solarflare NIC detected\n"); + efx->net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + if (!efx->type->is_vf) efx_probe_vpd_strings(efx); diff --git a/drivers/net/ethernet/sfc/siena/efx.c b/drivers/net/ethernet/sfc/siena/efx.c index 60e5b7c8ccf9..a6ef21845224 100644 --- a/drivers/net/ethernet/sfc/siena/efx.c +++ b/drivers/net/ethernet/sfc/siena/efx.c @@ -1048,6 +1048,10 @@ static int efx_pci_probe(struct pci_dev *pci_dev, pci_info(pci_dev, "Solarflare NIC detected\n"); + efx->net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + if (!efx->type->is_vf) efx_probe_vpd_strings(efx); diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c index 9b46579b5a10..2d7347b71c41 100644 --- a/drivers/net/ethernet/socionext/netsec.c +++ b/drivers/net/ethernet/socionext/netsec.c @@ -2104,6 +2104,9 @@ static int netsec_probe(struct platform_device *pdev) NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM; ndev->hw_features = ndev->features; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + priv->rx_cksum_offload_flag = true; ret = netsec_register_mdio(priv, phy_addr); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index b7e5af58ab75..734d84263fd2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7150,6 +7150,8 @@ int stmmac_dvr_probe(struct device *device, ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; ret = stmmac_tc_init(priv, priv); if (!ret) { diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 13c9c2d6b79b..37f0b62ec5d6 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -1458,6 +1458,8 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv) priv_sl2->emac_port = 1; cpsw->slaves[1].ndev = ndev; ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; ndev->netdev_ops = &cpsw_netdev_ops; ndev->ethtool_ops = &cpsw_ethtool_ops; @@ -1635,6 +1637,8 @@ static int cpsw_probe(struct platform_device *pdev) cpsw->slaves[0].ndev = ndev; ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; ndev->netdev_ops = &cpsw_netdev_ops; ndev->ethtool_ops = &cpsw_ethtool_ops; diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c index 83596ec0c7cb..35128dd45ffc 100644 --- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@ -1405,6 +1405,10 @@ static int cpsw_create_ports(struct cpsw_common *cpsw) ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_NETNS_LOCAL | NETIF_F_HW_TC; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + ndev->netdev_ops = &cpsw_netdev_ops; ndev->ethtool_ops = &cpsw_ethtool_ops; SET_NETDEV_DEV(ndev, dev); diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index f9b219e6cd58..a9b139bbdb2c 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -2559,6 +2559,8 @@ static int netvsc_probe(struct hv_device *dev, netdev_lockdep_set_classes(net); + net->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; + /* MTU range: 68 - 1500 or 65521 */ net->min_mtu = NETVSC_MTU_MIN; if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2) diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c index 6db6a75ff9b9..35fa1ca98671 100644 --- a/drivers/net/netdevsim/netdev.c +++ b/drivers/net/netdevsim/netdev.c @@ -286,6 +286,7 @@ static void nsim_setup(struct net_device *dev) NETIF_F_TSO; dev->hw_features |= NETIF_F_HW_TC; dev->max_mtu = ETH_MAX_MTU; + dev->xdp_features = NETDEV_XDP_ACT_HW_OFFLOAD; } static int nsim_init_netdevsim(struct netdevsim *ns) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a7d17c680f4a..36620afde373 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1401,6 +1401,11 @@ static void tun_net_initialize(struct net_device *dev) eth_hw_addr_random(dev); + /* Currently tun does not support XDP, only tap does. */ + dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + break; } diff --git a/drivers/net/veth.c b/drivers/net/veth.c index ba3e05832843..1bb54de7124d 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1686,6 +1686,10 @@ static void veth_setup(struct net_device *dev) dev->hw_enc_features = VETH_FEATURES; dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE; netif_set_tso_max_size(dev, GSO_MAX_SIZE); + + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; } /* diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 7e1a98430190..692dff071782 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3280,7 +3280,10 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, if (i == 0 && !old_prog) virtnet_clear_guest_offloads(vi); } + if (!old_prog) + xdp_features_set_redirect_target(dev, false); } else { + xdp_features_clear_redirect_target(dev); vi->xdp_enabled = false; } @@ -3910,6 +3913,7 @@ static int virtnet_probe(struct virtio_device *vdev) dev->hw_features |= NETIF_F_GRO_HW; dev->vlan_features = dev->features; + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; /* MTU range: 68 - 65535 */ dev->min_mtu = MIN_MTU; diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 12b074286df9..47d54d8ea59d 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1741,6 +1741,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev) * negotiate with the backend regarding supported features. */ netdev->features |= netdev->hw_features; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; netdev->ethtool_ops = &xennet_ethtool_ops; netdev->min_mtu = ETH_MIN_MTU; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2466afa25078..0f7967591288 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -47,6 +47,7 @@ #include <uapi/linux/netdevice.h> #include <uapi/linux/if_bonding.h> #include <uapi/linux/pkt_cls.h> +#include <uapi/linux/netdev.h> #include <linux/hashtable.h> #include <linux/rbtree.h> #include <net/net_trackers.h> @@ -2055,6 +2056,7 @@ struct net_device { /* Read-mostly cache-line for fast-path access */ unsigned int flags; + xdp_features_t xdp_features; unsigned long long priv_flags; const struct net_device_ops *netdev_ops; const struct xdp_metadata_ops *xdp_metadata_ops; @@ -2839,6 +2841,7 @@ enum netdev_cmd { NETDEV_OFFLOAD_XSTATS_DISABLE, NETDEV_OFFLOAD_XSTATS_REPORT_USED, NETDEV_OFFLOAD_XSTATS_REPORT_DELTA, + NETDEV_XDP_FEAT_CHANGE, }; const char *netdev_cmd_to_name(enum netdev_cmd cmd); diff --git a/include/net/xdp.h b/include/net/xdp.h index 91292aa13bc0..d517bfac937b 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -7,6 +7,7 @@ #define __LINUX_NET_XDP_H__ #include <linux/skbuff.h> /* skb_shared_info */ +#include <uapi/linux/netdev.h> /** * DOC: XDP RX-queue information @@ -43,6 +44,8 @@ enum xdp_mem_type { MEM_TYPE_MAX, }; +typedef u32 xdp_features_t; + /* XDP flags for ndo_xdp_xmit */ #define XDP_XMIT_FLUSH (1U << 0) /* doorbell signal consumer */ #define XDP_XMIT_FLAGS_MASK XDP_XMIT_FLUSH @@ -425,9 +428,21 @@ MAX_XDP_METADATA_KFUNC, #ifdef CONFIG_NET u32 bpf_xdp_metadata_kfunc_id(int id); bool bpf_dev_bound_kfunc_id(u32 btf_id); +void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg); +void xdp_features_clear_redirect_target(struct net_device *dev); #else static inline u32 bpf_xdp_metadata_kfunc_id(int id) { return 0; } static inline bool bpf_dev_bound_kfunc_id(u32 btf_id) { return false; } + +static inline void +xdp_features_set_redirect_target(struct net_device *dev, bool support_sg) +{ +} + +static inline void +xdp_features_clear_redirect_target(struct net_device *dev) +{ +} #endif #endif /* __LINUX_NET_XDP_H__ */ diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h new file mode 100644 index 000000000000..9ee459872600 --- /dev/null +++ b/include/uapi/linux/netdev.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN uapi header */ + +#ifndef _UAPI_LINUX_NETDEV_H +#define _UAPI_LINUX_NETDEV_H + +#define NETDEV_FAMILY_NAME "netdev" +#define NETDEV_FAMILY_VERSION 1 + +/** + * enum netdev_xdp_act + * @NETDEV_XDP_ACT_BASIC: XDP feautues set supported by all drivers + * (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) + * @NETDEV_XDP_ACT_REDIRECT: The netdev supports XDP_REDIRECT + * @NETDEV_XDP_ACT_NDO_XMIT: This feature informs if netdev implements + * ndo_xdp_xmit callback. + * @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP + * in zero copy mode. + * @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw + * oflloading. + * @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear + * XDP buffer support in the driver napi callback. + * @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements + * non-linear XDP buffer support in ndo_xdp_xmit callback. + */ +enum netdev_xdp_act { + NETDEV_XDP_ACT_BASIC = 1, + NETDEV_XDP_ACT_REDIRECT = 2, + NETDEV_XDP_ACT_NDO_XMIT = 4, + NETDEV_XDP_ACT_XSK_ZEROCOPY = 8, + NETDEV_XDP_ACT_HW_OFFLOAD = 16, + NETDEV_XDP_ACT_RX_SG = 32, + NETDEV_XDP_ACT_NDO_XMIT_SG = 64, +}; + +enum { + NETDEV_A_DEV_IFINDEX = 1, + NETDEV_A_DEV_PAD, + NETDEV_A_DEV_XDP_FEATURES, + + __NETDEV_A_DEV_MAX, + NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1) +}; + +enum { + NETDEV_CMD_DEV_GET = 1, + NETDEV_CMD_DEV_ADD_NTF, + NETDEV_CMD_DEV_DEL_NTF, + NETDEV_CMD_DEV_CHANGE_NTF, + + __NETDEV_CMD_MAX, + NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1) +}; + +#define NETDEV_MCGRP_MGMT "mgmt" + +#endif /* _UAPI_LINUX_NETDEV_H */ diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index d01e4c55b376..2675fefc6cb6 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -474,7 +474,11 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf, { int err; - if (!dev->netdev_ops->ndo_xdp_xmit) + if (!(dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT)) + return -EOPNOTSUPP; + + if (unlikely(!(dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT_SG) && + xdp_frame_has_frags(xdpf))) return -EOPNOTSUPP; err = xdp_ok_fwd_dev(dev, xdp_get_frame_len(xdpf)); @@ -532,8 +536,14 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf, static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf) { - if (!obj || - !obj->dev->netdev_ops->ndo_xdp_xmit) + if (!obj) + return false; + + if (!(obj->dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT)) + return false; + + if (unlikely(!(obj->dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT_SG) && + xdp_frame_has_frags(xdpf))) return false; if (xdp_ok_fwd_dev(obj->dev, xdp_get_frame_len(xdpf))) diff --git a/net/core/Makefile b/net/core/Makefile index 10edd66a8a37..8f367813bc68 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -12,7 +12,8 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o obj-y += dev.o dev_addr_lists.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o \ sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \ - fib_notifier.o xdp.o flow_offload.o gro.o + fib_notifier.o xdp.o flow_offload.o gro.o \ + netdev-genl.o netdev-genl-gen.o obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o diff --git a/net/core/dev.c b/net/core/dev.c index f72f5c4ee7e2..9ac0eeb2c8cd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1614,6 +1614,7 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd) N(SVLAN_FILTER_PUSH_INFO) N(SVLAN_FILTER_DROP_INFO) N(PRE_CHANGEADDR) N(OFFLOAD_XSTATS_ENABLE) N(OFFLOAD_XSTATS_DISABLE) N(OFFLOAD_XSTATS_REPORT_USED) N(OFFLOAD_XSTATS_REPORT_DELTA) + N(XDP_FEAT_CHANGE) } #undef N return "UNKNOWN_NETDEV_EVENT"; diff --git a/net/core/filter.c b/net/core/filter.c index 0039cf16713e..2ce06a72a5ba 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4318,16 +4318,13 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); enum bpf_map_type map_type = ri->map_type; - /* XDP_REDIRECT is not fully supported yet for xdp frags since - * not all XDP capable drivers can map non-linear xdp_frame in - * ndo_xdp_xmit. - */ - if (unlikely(xdp_buff_has_frags(xdp) && - map_type != BPF_MAP_TYPE_CPUMAP)) - return -EOPNOTSUPP; + if (map_type == BPF_MAP_TYPE_XSKMAP) { + /* XDP_REDIRECT is not supported AF_XDP yet. */ + if (unlikely(xdp_buff_has_frags(xdp))) + return -EOPNOTSUPP; - if (map_type == BPF_MAP_TYPE_XSKMAP) return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); + } return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp), xdp_prog); diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c new file mode 100644 index 000000000000..48812ec843f5 --- /dev/null +++ b/net/core/netdev-genl-gen.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: BSD-3-Clause +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN kernel source */ + +#include <net/netlink.h> +#include <net/genetlink.h> + +#include "netdev-genl-gen.h" + +#include <linux/netdev.h> + +/* NETDEV_CMD_DEV_GET - do */ +static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1] = { + [NETDEV_A_DEV_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1), +}; + +/* Ops table for netdev */ +static const struct genl_split_ops netdev_nl_ops[2] = { + { + .cmd = NETDEV_CMD_DEV_GET, + .doit = netdev_nl_dev_get_doit, + .policy = netdev_dev_get_nl_policy, + .maxattr = NETDEV_A_DEV_IFINDEX, + .flags = GENL_CMD_CAP_DO, + }, + { + .cmd = NETDEV_CMD_DEV_GET, + .dumpit = netdev_nl_dev_get_dumpit, + .flags = GENL_CMD_CAP_DUMP, + }, +}; + +static const struct genl_multicast_group netdev_nl_mcgrps[] = { + [NETDEV_NLGRP_MGMT] = { "mgmt", }, +}; + +struct genl_family netdev_nl_family __ro_after_init = { + .name = NETDEV_FAMILY_NAME, + .version = NETDEV_FAMILY_VERSION, + .netnsok = true, + .parallel_ops = true, + .module = THIS_MODULE, + .split_ops = netdev_nl_ops, + .n_split_ops = ARRAY_SIZE(netdev_nl_ops), + .mcgrps = netdev_nl_mcgrps, + .n_mcgrps = ARRAY_SIZE(netdev_nl_mcgrps), +}; diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h new file mode 100644 index 000000000000..b16dc7e026bb --- /dev/null +++ b/net/core/netdev-genl-gen.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN kernel header */ + +#ifndef _LINUX_NETDEV_GEN_H +#define _LINUX_NETDEV_GEN_H + +#include <net/netlink.h> +#include <net/genetlink.h> + +#include <linux/netdev.h> + +int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info); +int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb); + +enum { + NETDEV_NLGRP_MGMT, +}; + +extern struct genl_family netdev_nl_family; + +#endif /* _LINUX_NETDEV_GEN_H */ diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c new file mode 100644 index 000000000000..a4270fafdf11 --- /dev/null +++ b/net/core/netdev-genl.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <linux/netdevice.h> +#include <linux/notifier.h> +#include <linux/rtnetlink.h> +#include <net/net_namespace.h> +#include <net/sock.h> + +#include "netdev-genl-gen.h" + +static int +netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp, + u32 portid, u32 seq, int flags, u32 cmd) +{ + void *hdr; + + hdr = genlmsg_put(rsp, portid, seq, &netdev_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) || + nla_put_u64_64bit(rsp, NETDEV_A_DEV_XDP_FEATURES, + netdev->xdp_features, NETDEV_A_DEV_PAD)) { + genlmsg_cancel(rsp, hdr); + return -EINVAL; + } + + genlmsg_end(rsp, hdr); + + return 0; +} + +static void +netdev_genl_dev_notify(struct net_device *netdev, int cmd) +{ + struct sk_buff *ntf; + + if (!genl_has_listeners(&netdev_nl_family, dev_net(netdev), + NETDEV_NLGRP_MGMT)) + return; + + ntf = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!ntf) + return; + + if (netdev_nl_dev_fill(netdev, ntf, 0, 0, 0, cmd)) { + nlmsg_free(ntf); + return; + } + + genlmsg_multicast_netns(&netdev_nl_family, dev_net(netdev), ntf, + 0, NETDEV_NLGRP_MGMT, GFP_KERNEL); +} + +int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct net_device *netdev; + struct sk_buff *rsp; + u32 ifindex; + int err; + + if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_DEV_IFINDEX)) + return -EINVAL; + + ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]); + + rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!rsp) + return -ENOMEM; + + rtnl_lock(); + + netdev = __dev_get_by_index(genl_info_net(info), ifindex); + if (netdev) + err = netdev_nl_dev_fill(netdev, rsp, info->snd_portid, + info->snd_seq, 0, info->genlhdr->cmd); + else + err = -ENODEV; + + rtnl_unlock(); + + if (err) + goto err_free_msg; + + return genlmsg_reply(rsp, info); + +err_free_msg: + nlmsg_free(rsp); + return err; +} + +int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) +{ + struct net *net = sock_net(skb->sk); + struct net_device *netdev; + int idx = 0, s_idx; + int h, s_h; + int err; + + s_h = cb->args[0]; + s_idx = cb->args[1]; + + rtnl_lock(); + + for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) { + struct hlist_head *head; + + idx = 0; + head = &net->dev_index_head[h]; + hlist_for_each_entry(netdev, head, index_hlist) { + if (idx < s_idx) + goto cont; + err = netdev_nl_dev_fill(netdev, skb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, 0, + NETDEV_CMD_DEV_GET); + if (err < 0) + break; +cont: + idx++; + } + } + + rtnl_unlock(); + + if (err != -EMSGSIZE) + return err; + + cb->args[1] = idx; + cb->args[0] = h; + cb->seq = net->dev_base_seq; + + return skb->len; +} + +static int netdev_genl_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr) +{ + struct net_device *netdev = netdev_notifier_info_to_dev(ptr); + + switch (event) { + case NETDEV_REGISTER: + netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_ADD_NTF); + break; + case NETDEV_UNREGISTER: + netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_DEL_NTF); + break; + case NETDEV_XDP_FEAT_CHANGE: + netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_CHANGE_NTF); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block netdev_genl_nb = { + .notifier_call = netdev_genl_netdevice_event, +}; + +static int __init netdev_genl_init(void) +{ + int err; + + err = register_netdevice_notifier(&netdev_genl_nb); + if (err) + return err; + + err = genl_register_family(&netdev_nl_family); + if (err) + goto err_unreg_ntf; + + return 0; + +err_unreg_ntf: + unregister_netdevice_notifier(&netdev_genl_nb); + return err; +} + +subsys_initcall(netdev_genl_init); diff --git a/net/core/xdp.c b/net/core/xdp.c index 787fb9f92b36..26483935b7a4 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -774,3 +774,21 @@ static int __init xdp_metadata_init(void) return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set); } late_initcall(xdp_metadata_init); + +void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg) +{ + dev->xdp_features |= NETDEV_XDP_ACT_NDO_XMIT; + if (support_sg) + dev->xdp_features |= NETDEV_XDP_ACT_NDO_XMIT_SG; + + call_netdevice_notifiers(NETDEV_XDP_FEAT_CHANGE, dev); +} +EXPORT_SYMBOL_GPL(xdp_features_set_redirect_target); + +void xdp_features_clear_redirect_target(struct net_device *dev) +{ + dev->xdp_features &= ~(NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_NDO_XMIT_SG); + call_netdevice_notifiers(NETDEV_XDP_FEAT_CHANGE, dev); +} +EXPORT_SYMBOL_GPL(xdp_features_clear_redirect_target); diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index ed6c71826d31..b2df1e0f8153 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -140,6 +140,10 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool) } } +#define NETDEV_XDP_ACT_ZC (NETDEV_XDP_ACT_BASIC | \ + NETDEV_XDP_ACT_REDIRECT | \ + NETDEV_XDP_ACT_XSK_ZEROCOPY) + int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *netdev, u16 queue_id, u16 flags) { @@ -178,8 +182,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool, /* For copy-mode, we are done. */ return 0; - if (!netdev->netdev_ops->ndo_bpf || - !netdev->netdev_ops->ndo_xsk_wakeup) { + if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) { err = -EOPNOTSUPP; goto err_unreg_pool; } diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h new file mode 100644 index 000000000000..9ee459872600 --- /dev/null +++ b/tools/include/uapi/linux/netdev.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN uapi header */ + +#ifndef _UAPI_LINUX_NETDEV_H +#define _UAPI_LINUX_NETDEV_H + +#define NETDEV_FAMILY_NAME "netdev" +#define NETDEV_FAMILY_VERSION 1 + +/** + * enum netdev_xdp_act + * @NETDEV_XDP_ACT_BASIC: XDP feautues set supported by all drivers + * (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) + * @NETDEV_XDP_ACT_REDIRECT: The netdev supports XDP_REDIRECT + * @NETDEV_XDP_ACT_NDO_XMIT: This feature informs if netdev implements + * ndo_xdp_xmit callback. + * @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP + * in zero copy mode. + * @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw + * oflloading. + * @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear + * XDP buffer support in the driver napi callback. + * @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements + * non-linear XDP buffer support in ndo_xdp_xmit callback. + */ +enum netdev_xdp_act { + NETDEV_XDP_ACT_BASIC = 1, + NETDEV_XDP_ACT_REDIRECT = 2, + NETDEV_XDP_ACT_NDO_XMIT = 4, + NETDEV_XDP_ACT_XSK_ZEROCOPY = 8, + NETDEV_XDP_ACT_HW_OFFLOAD = 16, + NETDEV_XDP_ACT_RX_SG = 32, + NETDEV_XDP_ACT_NDO_XMIT_SG = 64, +}; + +enum { + NETDEV_A_DEV_IFINDEX = 1, + NETDEV_A_DEV_PAD, + NETDEV_A_DEV_XDP_FEATURES, + + __NETDEV_A_DEV_MAX, + NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1) +}; + +enum { + NETDEV_CMD_DEV_GET = 1, + NETDEV_CMD_DEV_ADD_NTF, + NETDEV_CMD_DEV_DEL_NTF, + NETDEV_CMD_DEV_CHANGE_NTF, + + __NETDEV_CMD_MAX, + NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1) +}; + +#define NETDEV_MCGRP_MGMT "mgmt" + +#endif /* _UAPI_LINUX_NETDEV_H */ diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 8777ff21ea1d..b18581277eb2 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1048,9 +1048,10 @@ struct bpf_xdp_query_opts { __u32 hw_prog_id; /* output */ __u32 skb_prog_id; /* output */ __u8 attach_mode; /* output */ + __u64 feature_flags; /* output */ size_t :0; }; -#define bpf_xdp_query_opts__last_field attach_mode +#define bpf_xdp_query_opts__last_field feature_flags LIBBPF_API int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags, const struct bpf_xdp_attach_opts *opts); diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 35104580870c..32b13b7a11b0 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -9,6 +9,7 @@ #include <linux/if_ether.h> #include <linux/pkt_cls.h> #include <linux/rtnetlink.h> +#include <linux/netdev.h> #include <sys/socket.h> #include <errno.h> #include <time.h> @@ -39,9 +40,15 @@ struct xdp_id_md { int ifindex; __u32 flags; struct xdp_link_info info; + __u64 feature_flags; }; -static int libbpf_netlink_open(__u32 *nl_pid) +struct xdp_features_md { + int ifindex; + __u64 flags; +}; + +static int libbpf_netlink_open(__u32 *nl_pid, int proto) { struct sockaddr_nl sa; socklen_t addrlen; @@ -51,7 +58,7 @@ static int libbpf_netlink_open(__u32 *nl_pid) memset(&sa, 0, sizeof(sa)); sa.nl_family = AF_NETLINK; - sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE); + sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, proto); if (sock < 0) return -errno; @@ -212,14 +219,14 @@ done: } static int libbpf_netlink_send_recv(struct libbpf_nla_req *req, - __dump_nlmsg_t parse_msg, + int proto, __dump_nlmsg_t parse_msg, libbpf_dump_nlmsg_t parse_attr, void *cookie) { __u32 nl_pid = 0; int sock, ret; - sock = libbpf_netlink_open(&nl_pid); + sock = libbpf_netlink_open(&nl_pid, proto); if (sock < 0) return sock; @@ -238,6 +245,43 @@ out: return ret; } +static int parse_genl_family_id(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn, + void *cookie) +{ + struct genlmsghdr *gnl = NLMSG_DATA(nh); + struct nlattr *na = (struct nlattr *)((void *)gnl + GENL_HDRLEN); + struct nlattr *tb[CTRL_ATTR_FAMILY_ID + 1]; + __u16 *id = cookie; + + libbpf_nla_parse(tb, CTRL_ATTR_FAMILY_ID, na, + NLMSG_PAYLOAD(nh, sizeof(*gnl)), NULL); + if (!tb[CTRL_ATTR_FAMILY_ID]) + return NL_CONT; + + *id = libbpf_nla_getattr_u16(tb[CTRL_ATTR_FAMILY_ID]); + return NL_DONE; +} + +static int libbpf_netlink_resolve_genl_family_id(const char *name, + __u16 len, __u16 *id) +{ + struct libbpf_nla_req req = { + .nh.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN), + .nh.nlmsg_type = GENL_ID_CTRL, + .nh.nlmsg_flags = NLM_F_REQUEST, + .gnl.cmd = CTRL_CMD_GETFAMILY, + .gnl.version = 2, + }; + int err; + + err = nlattr_add(&req, CTRL_ATTR_FAMILY_NAME, name, len); + if (err < 0) + return err; + + return libbpf_netlink_send_recv(&req, NETLINK_GENERIC, + parse_genl_family_id, NULL, id); +} + static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, __u32 flags) { @@ -271,7 +315,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, } nlattr_end_nested(&req, nla); - return libbpf_netlink_send_recv(&req, NULL, NULL, NULL); + return libbpf_netlink_send_recv(&req, NETLINK_ROUTE, NULL, NULL, NULL); } int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags, const struct bpf_xdp_attach_opts *opts) @@ -357,6 +401,29 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb) return 0; } +static int parse_xdp_features(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn, + void *cookie) +{ + struct genlmsghdr *gnl = NLMSG_DATA(nh); + struct nlattr *na = (struct nlattr *)((void *)gnl + GENL_HDRLEN); + struct nlattr *tb[NETDEV_CMD_MAX + 1]; + struct xdp_features_md *md = cookie; + __u32 ifindex; + + libbpf_nla_parse(tb, NETDEV_CMD_MAX, na, + NLMSG_PAYLOAD(nh, sizeof(*gnl)), NULL); + + if (!tb[NETDEV_A_DEV_IFINDEX] || !tb[NETDEV_A_DEV_XDP_FEATURES]) + return NL_CONT; + + ifindex = libbpf_nla_getattr_u32(tb[NETDEV_A_DEV_IFINDEX]); + if (ifindex != md->ifindex) + return NL_CONT; + + md->flags = libbpf_nla_getattr_u64(tb[NETDEV_A_DEV_XDP_FEATURES]); + return NL_DONE; +} + int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts) { struct libbpf_nla_req req = { @@ -366,6 +433,10 @@ int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts) .ifinfo.ifi_family = AF_PACKET, }; struct xdp_id_md xdp_id = {}; + struct xdp_features_md md = { + .ifindex = ifindex, + }; + __u16 id; int err; if (!OPTS_VALID(opts, bpf_xdp_query_opts)) @@ -382,7 +453,7 @@ int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts) xdp_id.ifindex = ifindex; xdp_id.flags = xdp_flags; - err = libbpf_netlink_send_recv(&req, __dump_link_nlmsg, + err = libbpf_netlink_send_recv(&req, NETLINK_ROUTE, __dump_link_nlmsg, get_xdp_info, &xdp_id); if (err) return libbpf_err(err); @@ -393,6 +464,31 @@ int bpf_xdp_query(int ifindex, int xdp_flags, struct bpf_xdp_query_opts *opts) OPTS_SET(opts, skb_prog_id, xdp_id.info.skb_prog_id); OPTS_SET(opts, attach_mode, xdp_id.info.attach_mode); + if (!OPTS_HAS(opts, feature_flags)) + return 0; + + err = libbpf_netlink_resolve_genl_family_id("netdev", sizeof("netdev"), &id); + if (err < 0) + return libbpf_err(err); + + memset(&req, 0, sizeof(req)); + req.nh.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN); + req.nh.nlmsg_flags = NLM_F_REQUEST; + req.nh.nlmsg_type = id; + req.gnl.cmd = NETDEV_CMD_DEV_GET; + req.gnl.version = 2; + + err = nlattr_add(&req, NETDEV_A_DEV_IFINDEX, &ifindex, sizeof(ifindex)); + if (err < 0) + return err; + + err = libbpf_netlink_send_recv(&req, NETLINK_GENERIC, + parse_xdp_features, NULL, &md); + if (err) + return libbpf_err(err); + + opts->feature_flags = md.flags; + return 0; } @@ -493,7 +589,7 @@ static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags) if (ret < 0) return ret; - return libbpf_netlink_send_recv(&req, NULL, NULL, NULL); + return libbpf_netlink_send_recv(&req, NETLINK_ROUTE, NULL, NULL, NULL); } static int tc_qdisc_create_excl(struct bpf_tc_hook *hook) @@ -673,7 +769,8 @@ int bpf_tc_attach(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts) info.opts = opts; - ret = libbpf_netlink_send_recv(&req, get_tc_info, NULL, &info); + ret = libbpf_netlink_send_recv(&req, NETLINK_ROUTE, get_tc_info, NULL, + &info); if (ret < 0) return libbpf_err(ret); if (!info.processed) @@ -739,7 +836,7 @@ static int __bpf_tc_detach(const struct bpf_tc_hook *hook, return ret; } - return libbpf_netlink_send_recv(&req, NULL, NULL, NULL); + return libbpf_netlink_send_recv(&req, NETLINK_ROUTE, NULL, NULL, NULL); } int bpf_tc_detach(const struct bpf_tc_hook *hook, @@ -804,7 +901,8 @@ int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts) info.opts = opts; - ret = libbpf_netlink_send_recv(&req, get_tc_info, NULL, &info); + ret = libbpf_netlink_send_recv(&req, NETLINK_ROUTE, get_tc_info, NULL, + &info); if (ret < 0) return libbpf_err(ret); if (!info.processed) diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h index 4d15ae2ff812..d92d1c1de700 100644 --- a/tools/lib/bpf/nlattr.h +++ b/tools/lib/bpf/nlattr.h @@ -14,6 +14,7 @@ #include <errno.h> #include <linux/netlink.h> #include <linux/rtnetlink.h> +#include <linux/genetlink.h> /* avoid multiple definition of netlink features */ #define __LINUX_NETLINK_H @@ -58,6 +59,7 @@ struct libbpf_nla_req { union { struct ifinfomsg ifinfo; struct tcmsg tc; + struct genlmsghdr gnl; }; char buf[128]; }; @@ -89,11 +91,21 @@ static inline uint8_t libbpf_nla_getattr_u8(const struct nlattr *nla) return *(uint8_t *)libbpf_nla_data(nla); } +static inline uint16_t libbpf_nla_getattr_u16(const struct nlattr *nla) +{ + return *(uint16_t *)libbpf_nla_data(nla); +} + static inline uint32_t libbpf_nla_getattr_u32(const struct nlattr *nla) { return *(uint32_t *)libbpf_nla_data(nla); } +static inline uint64_t libbpf_nla_getattr_u64(const struct nlattr *nla) +{ + return *(uint64_t *)libbpf_nla_data(nla); +} + static inline const char *libbpf_nla_getattr_str(const struct nlattr *nla) { return (const char *)libbpf_nla_data(nla); diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index 4aa5bba956ff..116fecf80ca1 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -48,3 +48,4 @@ xskxceiver xdp_redirect_multi xdp_synproxy xdp_hw_metadata +xdp_features diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index e79039726173..b2eb3201b85a 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -73,7 +73,8 @@ TEST_PROGS := test_kmod.sh \ test_bpftool.sh \ test_bpftool_metadata.sh \ test_doc_build.sh \ - test_xsk.sh + test_xsk.sh \ + test_xdp_features.sh TEST_PROGS_EXTENDED := with_addr.sh \ with_tunnels.sh ima_setup.sh verify_sig_setup.sh \ @@ -83,7 +84,8 @@ TEST_PROGS_EXTENDED := with_addr.sh \ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ - xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata + xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \ + xdp_features TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file TEST_GEN_FILES += liburandom_read.so @@ -385,6 +387,7 @@ test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o +xdp_features.skel.h-deps := xdp_features.bpf.o LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps))) @@ -587,6 +590,10 @@ $(OUTPUT)/xdp_hw_metadata: xdp_hw_metadata.c $(OUTPUT)/network_helpers.o $(OUTPU $(call msg,BINARY,,$@) $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ +$(OUTPUT)/xdp_features: xdp_features.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xdp_features.skel.h | $(OUTPUT) + $(call msg,BINARY,,$@) + $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ + # Make sure we are able to include and link libbpf against c++. $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ) $(call msg,CXX,,$@) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c index ac70e871d62f..2666c84dbd01 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c @@ -4,10 +4,12 @@ #include <net/if.h> #include <linux/if_ether.h> #include <linux/if_packet.h> +#include <linux/if_link.h> #include <linux/ipv6.h> #include <linux/in6.h> #include <linux/udp.h> #include <bpf/bpf_endian.h> +#include <uapi/linux/netdev.h> #include "test_xdp_do_redirect.skel.h" #define SYS(fmt, ...) \ @@ -96,7 +98,7 @@ void test_xdp_do_redirect(void) struct test_xdp_do_redirect *skel = NULL; struct nstoken *nstoken = NULL; struct bpf_link *link; - + LIBBPF_OPTS(bpf_xdp_query_opts, query_opts); struct xdp_md ctx_in = { .data = sizeof(__u32), .data_end = sizeof(data) }; DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts, @@ -157,6 +159,29 @@ void test_xdp_do_redirect(void) !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst")) goto out; + /* Check xdp features supported by veth driver */ + err = bpf_xdp_query(ifindex_src, XDP_FLAGS_DRV_MODE, &query_opts); + if (!ASSERT_OK(err, "veth_src bpf_xdp_query")) + goto out; + + if (!ASSERT_EQ(query_opts.feature_flags, + NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG, + "veth_src query_opts.feature_flags")) + goto out; + + err = bpf_xdp_query(ifindex_dst, XDP_FLAGS_DRV_MODE, &query_opts); + if (!ASSERT_OK(err, "veth_dst bpf_xdp_query")) + goto out; + + if (!ASSERT_EQ(query_opts.feature_flags, + NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG, + "veth_dst query_opts.feature_flags")) + goto out; + memcpy(skel->rodata->expect_dst, &pkt_udp.eth.h_dest, ETH_ALEN); skel->rodata->ifindex_out = ifindex_src; /* redirect back to the same iface */ skel->rodata->ifindex_in = ifindex_src; diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_info.c b/tools/testing/selftests/bpf/prog_tests/xdp_info.c index cd3aa340e65e..286c21ecdc65 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_info.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_info.c @@ -8,6 +8,7 @@ void serial_test_xdp_info(void) { __u32 len = sizeof(struct bpf_prog_info), duration = 0, prog_id; const char *file = "./xdp_dummy.bpf.o"; + LIBBPF_OPTS(bpf_xdp_query_opts, opts); struct bpf_prog_info info = {}; struct bpf_object *obj; int err, prog_fd; @@ -61,6 +62,13 @@ void serial_test_xdp_info(void) if (CHECK(prog_id, "prog_id_drv", "unexpected prog_id=%u\n", prog_id)) goto out; + /* Check xdp features supported by lo device */ + opts.feature_flags = ~0; + err = bpf_xdp_query(IFINDEX_LO, XDP_FLAGS_DRV_MODE, &opts); + if (!ASSERT_OK(err, "bpf_xdp_query")) + goto out; + + ASSERT_EQ(opts.feature_flags, 0, "opts.feature_flags"); out: bpf_xdp_detach(IFINDEX_LO, 0, NULL); out_close: diff --git a/tools/testing/selftests/bpf/progs/xdp_features.c b/tools/testing/selftests/bpf/progs/xdp_features.c new file mode 100644 index 000000000000..87c247d56f72 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/xdp_features.c @@ -0,0 +1,269 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <stdbool.h> +#include <linux/bpf.h> +#include <linux/netdev.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_endian.h> +#include <bpf/bpf_tracing.h> +#include <linux/if_ether.h> +#include <linux/ip.h> +#include <linux/ipv6.h> +#include <linux/in.h> +#include <linux/in6.h> +#include <linux/udp.h> +#include <asm-generic/errno-base.h> + +#include "xdp_features.h" + +#define ipv6_addr_equal(a, b) ((a).s6_addr32[0] == (b).s6_addr32[0] && \ + (a).s6_addr32[1] == (b).s6_addr32[1] && \ + (a).s6_addr32[2] == (b).s6_addr32[2] && \ + (a).s6_addr32[3] == (b).s6_addr32[3]) + +struct net_device; +struct bpf_prog; + +struct xdp_cpumap_stats { + unsigned int redirect; + unsigned int pass; + unsigned int drop; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, __u32); + __type(value, __u32); + __uint(max_entries, 1); +} stats SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, __u32); + __type(value, __u32); + __uint(max_entries, 1); +} dut_stats SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_CPUMAP); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(struct bpf_cpumap_val)); + __uint(max_entries, 1); +} cpu_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(struct bpf_devmap_val)); + __uint(max_entries, 1); +} dev_map SEC(".maps"); + +const volatile struct in6_addr tester_addr; +const volatile struct in6_addr dut_addr; + +static __always_inline int +xdp_process_echo_packet(struct xdp_md *xdp, bool dut) +{ + void *data_end = (void *)(long)xdp->data_end; + void *data = (void *)(long)xdp->data; + struct ethhdr *eh = data; + struct tlv_hdr *tlv; + struct udphdr *uh; + __be16 port; + __u8 *cmd; + + if (eh + 1 > (struct ethhdr *)data_end) + return -EINVAL; + + if (eh->h_proto == bpf_htons(ETH_P_IP)) { + struct iphdr *ih = (struct iphdr *)(eh + 1); + __be32 saddr = dut ? tester_addr.s6_addr32[3] + : dut_addr.s6_addr32[3]; + __be32 daddr = dut ? dut_addr.s6_addr32[3] + : tester_addr.s6_addr32[3]; + + ih = (struct iphdr *)(eh + 1); + if (ih + 1 > (struct iphdr *)data_end) + return -EINVAL; + + if (saddr != ih->saddr) + return -EINVAL; + + if (daddr != ih->daddr) + return -EINVAL; + + if (ih->protocol != IPPROTO_UDP) + return -EINVAL; + + uh = (struct udphdr *)(ih + 1); + } else if (eh->h_proto == bpf_htons(ETH_P_IPV6)) { + struct in6_addr saddr = dut ? tester_addr : dut_addr; + struct in6_addr daddr = dut ? dut_addr : tester_addr; + struct ipv6hdr *ih6 = (struct ipv6hdr *)(eh + 1); + + if (ih6 + 1 > (struct ipv6hdr *)data_end) + return -EINVAL; + + if (!ipv6_addr_equal(saddr, ih6->saddr)) + return -EINVAL; + + if (!ipv6_addr_equal(daddr, ih6->daddr)) + return -EINVAL; + + if (ih6->nexthdr != IPPROTO_UDP) + return -EINVAL; + + uh = (struct udphdr *)(ih6 + 1); + } else { + return -EINVAL; + } + + if (uh + 1 > (struct udphdr *)data_end) + return -EINVAL; + + port = dut ? uh->dest : uh->source; + if (port != bpf_htons(DUT_ECHO_PORT)) + return -EINVAL; + + tlv = (struct tlv_hdr *)(uh + 1); + if (tlv + 1 > data_end) + return -EINVAL; + + return bpf_htons(tlv->type) == CMD_ECHO ? 0 : -EINVAL; +} + +static __always_inline int +xdp_update_stats(struct xdp_md *xdp, bool tx, bool dut) +{ + __u32 *val, key = 0; + + if (xdp_process_echo_packet(xdp, tx)) + return -EINVAL; + + if (dut) + val = bpf_map_lookup_elem(&dut_stats, &key); + else + val = bpf_map_lookup_elem(&stats, &key); + + if (val) + __sync_add_and_fetch(val, 1); + + return 0; +} + +/* Tester */ + +SEC("xdp") +int xdp_tester_check_tx(struct xdp_md *xdp) +{ + xdp_update_stats(xdp, true, false); + + return XDP_PASS; +} + +SEC("xdp") +int xdp_tester_check_rx(struct xdp_md *xdp) +{ + xdp_update_stats(xdp, false, false); + + return XDP_PASS; +} + +/* DUT */ + +SEC("xdp") +int xdp_do_pass(struct xdp_md *xdp) +{ + xdp_update_stats(xdp, true, true); + + return XDP_PASS; +} + +SEC("xdp") +int xdp_do_drop(struct xdp_md *xdp) +{ + if (xdp_update_stats(xdp, true, true)) + return XDP_PASS; + + return XDP_DROP; +} + +SEC("xdp") +int xdp_do_aborted(struct xdp_md *xdp) +{ + if (xdp_process_echo_packet(xdp, true)) + return XDP_PASS; + + return XDP_ABORTED; +} + +SEC("xdp") +int xdp_do_tx(struct xdp_md *xdp) +{ + void *data = (void *)(long)xdp->data; + struct ethhdr *eh = data; + __u8 tmp_mac[ETH_ALEN]; + + if (xdp_update_stats(xdp, true, true)) + return XDP_PASS; + + __builtin_memcpy(tmp_mac, eh->h_source, ETH_ALEN); + __builtin_memcpy(eh->h_source, eh->h_dest, ETH_ALEN); + __builtin_memcpy(eh->h_dest, tmp_mac, ETH_ALEN); + + return XDP_TX; +} + +SEC("xdp") +int xdp_do_redirect(struct xdp_md *xdp) +{ + if (xdp_process_echo_packet(xdp, true)) + return XDP_PASS; + + return bpf_redirect_map(&cpu_map, 0, 0); +} + +SEC("tp_btf/xdp_exception") +int BPF_PROG(xdp_exception, const struct net_device *dev, + const struct bpf_prog *xdp, __u32 act) +{ + __u32 *val, key = 0; + + val = bpf_map_lookup_elem(&dut_stats, &key); + if (val) + __sync_add_and_fetch(val, 1); + + return 0; +} + +SEC("tp_btf/xdp_cpumap_kthread") +int BPF_PROG(tp_xdp_cpumap_kthread, int map_id, unsigned int processed, + unsigned int drops, int sched, struct xdp_cpumap_stats *xdp_stats) +{ + __u32 *val, key = 0; + + val = bpf_map_lookup_elem(&dut_stats, &key); + if (val) + __sync_add_and_fetch(val, 1); + + return 0; +} + +SEC("xdp/cpumap") +int xdp_do_redirect_cpumap(struct xdp_md *xdp) +{ + void *data = (void *)(long)xdp->data; + struct ethhdr *eh = data; + __u8 tmp_mac[ETH_ALEN]; + + if (xdp_process_echo_packet(xdp, true)) + return XDP_PASS; + + __builtin_memcpy(tmp_mac, eh->h_source, ETH_ALEN); + __builtin_memcpy(eh->h_source, eh->h_dest, ETH_ALEN); + __builtin_memcpy(eh->h_dest, tmp_mac, ETH_ALEN); + + return bpf_redirect_map(&dev_map, 0, 0); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_xdp_features.sh b/tools/testing/selftests/bpf/test_xdp_features.sh new file mode 100755 index 000000000000..0aa71c4455c0 --- /dev/null +++ b/tools/testing/selftests/bpf/test_xdp_features.sh @@ -0,0 +1,107 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +readonly NS="ns1-$(mktemp -u XXXXXX)" +readonly V0_IP4=10.10.0.11 +readonly V1_IP4=10.10.0.1 +readonly V0_IP6=2001:db8::11 +readonly V1_IP6=2001:db8::1 + +ret=1 + +setup() { + { + ip netns add ${NS} + + ip link add v1 type veth peer name v0 netns ${NS} + + ip link set v1 up + ip addr add $V1_IP4/24 dev v1 + ip addr add $V1_IP6/64 nodad dev v1 + ip -n ${NS} link set dev v0 up + ip -n ${NS} addr add $V0_IP4/24 dev v0 + ip -n ${NS} addr add $V0_IP6/64 nodad dev v0 + + # Enable XDP mode and disable checksum offload + ethtool -K v1 gro on + ethtool -K v1 tx-checksumming off + ip netns exec ${NS} ethtool -K v0 gro on + ip netns exec ${NS} ethtool -K v0 tx-checksumming off + } > /dev/null 2>&1 +} + +cleanup() { + ip link del v1 2> /dev/null + ip netns del ${NS} 2> /dev/null + [ "$(pidof xdp_features)" = "" ] || kill $(pidof xdp_features) 2> /dev/null +} + +wait_for_dut_server() { + while sleep 1; do + ss -tlp | grep -q xdp_features + [ $? -eq 0 ] && break + done +} + +test_xdp_features() { + setup + + ## XDP_PASS + ./xdp_features -f XDP_PASS -D $V1_IP6 -T $V0_IP6 v1 & + wait_for_dut_server + ip netns exec ${NS} ./xdp_features -t -f XDP_PASS \ + -D $V1_IP6 -C $V1_IP6 \ + -T $V0_IP6 v0 + [ $? -ne 0 ] && exit + + ## XDP_DROP + ./xdp_features -f XDP_DROP -D ::ffff:$V1_IP4 -T ::ffff:$V0_IP4 v1 & + wait_for_dut_server + ip netns exec ${NS} ./xdp_features -t -f XDP_DROP \ + -D ::ffff:$V1_IP4 \ + -C ::ffff:$V1_IP4 \ + -T ::ffff:$V0_IP4 v0 + [ $? -ne 0 ] && exit + + ## XDP_ABORTED + ./xdp_features -f XDP_ABORTED -D $V1_IP6 -T $V0_IP6 v1 & + wait_for_dut_server + ip netns exec ${NS} ./xdp_features -t -f XDP_ABORTED \ + -D $V1_IP6 -C $V1_IP6 \ + -T $V0_IP6 v0 + [ $? -ne 0 ] && exit + + ## XDP_TX + ./xdp_features -f XDP_TX -D ::ffff:$V1_IP4 -T ::ffff:$V0_IP4 v1 & + wait_for_dut_server + ip netns exec ${NS} ./xdp_features -t -f XDP_TX \ + -D ::ffff:$V1_IP4 \ + -C ::ffff:$V1_IP4 \ + -T ::ffff:$V0_IP4 v0 + [ $? -ne 0 ] && exit + + ## XDP_REDIRECT + ./xdp_features -f XDP_REDIRECT -D $V1_IP6 -T $V0_IP6 v1 & + wait_for_dut_server + ip netns exec ${NS} ./xdp_features -t -f XDP_REDIRECT \ + -D $V1_IP6 -C $V1_IP6 \ + -T $V0_IP6 v0 + [ $? -ne 0 ] && exit + + ## XDP_NDO_XMIT + ./xdp_features -f XDP_NDO_XMIT -D ::ffff:$V1_IP4 -T ::ffff:$V0_IP4 v1 & + wait_for_dut_server + ip netns exec ${NS} ./xdp_features -t -f XDP_NDO_XMIT \ + -D ::ffff:$V1_IP4 \ + -C ::ffff:$V1_IP4 \ + -T ::ffff:$V0_IP4 v0 + ret=$? + cleanup +} + +set -e +trap cleanup 2 3 6 9 + +test_xdp_features + +exit $ret diff --git a/tools/testing/selftests/bpf/xdp_features.c b/tools/testing/selftests/bpf/xdp_features.c new file mode 100644 index 000000000000..10fad1243573 --- /dev/null +++ b/tools/testing/selftests/bpf/xdp_features.c @@ -0,0 +1,699 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <uapi/linux/bpf.h> +#include <uapi/linux/netdev.h> +#include <linux/if_link.h> +#include <signal.h> +#include <argp.h> +#include <net/if.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <netinet/tcp.h> +#include <unistd.h> +#include <arpa/inet.h> +#include <bpf/bpf.h> +#include <bpf/libbpf.h> +#include <pthread.h> + +#include <network_helpers.h> + +#include "xdp_features.skel.h" +#include "xdp_features.h" + +#define RED(str) "\033[0;31m" str "\033[0m" +#define GREEN(str) "\033[0;32m" str "\033[0m" +#define YELLOW(str) "\033[0;33m" str "\033[0m" + +static struct env { + bool verbosity; + int ifindex; + bool is_tester; + struct { + enum netdev_xdp_act drv_feature; + enum xdp_action action; + } feature; + struct sockaddr_storage dut_ctrl_addr; + struct sockaddr_storage dut_addr; + struct sockaddr_storage tester_addr; +} env; + +#define BUFSIZE 128 + +void test__fail(void) { /* for network_helpers.c */ } + +static int libbpf_print_fn(enum libbpf_print_level level, + const char *format, va_list args) +{ + if (level == LIBBPF_DEBUG && !env.verbosity) + return 0; + return vfprintf(stderr, format, args); +} + +static volatile bool exiting; + +static void sig_handler(int sig) +{ + exiting = true; +} + +const char *argp_program_version = "xdp-features 0.0"; +const char argp_program_doc[] = +"XDP features detecion application.\n" +"\n" +"XDP features application checks the XDP advertised features match detected ones.\n" +"\n" +"USAGE: ./xdp-features [-vt] [-f <xdp-feature>] [-D <dut-data-ip>] [-T <tester-data-ip>] [-C <dut-ctrl-ip>] <iface-name>\n" +"\n" +"dut-data-ip, tester-data-ip, dut-ctrl-ip: IPv6 or IPv4-mapped-IPv6 addresses;\n" +"\n" +"XDP features\n:" +"- XDP_PASS\n" +"- XDP_DROP\n" +"- XDP_ABORTED\n" +"- XDP_REDIRECT\n" +"- XDP_NDO_XMIT\n" +"- XDP_TX\n"; + +static const struct argp_option opts[] = { + { "verbose", 'v', NULL, 0, "Verbose debug output" }, + { "tester", 't', NULL, 0, "Tester mode" }, + { "feature", 'f', "XDP-FEATURE", 0, "XDP feature to test" }, + { "dut_data_ip", 'D', "DUT-DATA-IP", 0, "DUT IP data channel" }, + { "dut_ctrl_ip", 'C', "DUT-CTRL-IP", 0, "DUT IP control channel" }, + { "tester_data_ip", 'T', "TESTER-DATA-IP", 0, "Tester IP data channel" }, + {}, +}; + +static int get_xdp_feature(const char *arg) +{ + if (!strcmp(arg, "XDP_PASS")) { + env.feature.action = XDP_PASS; + env.feature.drv_feature = NETDEV_XDP_ACT_BASIC; + } else if (!strcmp(arg, "XDP_DROP")) { + env.feature.drv_feature = NETDEV_XDP_ACT_BASIC; + env.feature.action = XDP_DROP; + } else if (!strcmp(arg, "XDP_ABORTED")) { + env.feature.drv_feature = NETDEV_XDP_ACT_BASIC; + env.feature.action = XDP_ABORTED; + } else if (!strcmp(arg, "XDP_TX")) { + env.feature.drv_feature = NETDEV_XDP_ACT_BASIC; + env.feature.action = XDP_TX; + } else if (!strcmp(arg, "XDP_REDIRECT")) { + env.feature.drv_feature = NETDEV_XDP_ACT_REDIRECT; + env.feature.action = XDP_REDIRECT; + } else if (!strcmp(arg, "XDP_NDO_XMIT")) { + env.feature.drv_feature = NETDEV_XDP_ACT_NDO_XMIT; + } else { + return -EINVAL; + } + + return 0; +} + +static char *get_xdp_feature_str(void) +{ + switch (env.feature.action) { + case XDP_PASS: + return YELLOW("XDP_PASS"); + case XDP_DROP: + return YELLOW("XDP_DROP"); + case XDP_ABORTED: + return YELLOW("XDP_ABORTED"); + case XDP_TX: + return YELLOW("XDP_TX"); + case XDP_REDIRECT: + return YELLOW("XDP_REDIRECT"); + default: + break; + } + + if (env.feature.drv_feature == NETDEV_XDP_ACT_NDO_XMIT) + return YELLOW("XDP_NDO_XMIT"); + + return ""; +} + +static error_t parse_arg(int key, char *arg, struct argp_state *state) +{ + switch (key) { + case 'v': + env.verbosity = true; + break; + case 't': + env.is_tester = true; + break; + case 'f': + if (get_xdp_feature(arg) < 0) { + fprintf(stderr, "Invalid xdp feature: %s\n", arg); + argp_usage(state); + return ARGP_ERR_UNKNOWN; + } + break; + case 'D': + if (make_sockaddr(AF_INET6, arg, DUT_ECHO_PORT, + &env.dut_addr, NULL)) { + fprintf(stderr, "Invalid DUT address: %s\n", arg); + return ARGP_ERR_UNKNOWN; + } + break; + case 'C': + if (make_sockaddr(AF_INET6, arg, DUT_CTRL_PORT, + &env.dut_ctrl_addr, NULL)) { + fprintf(stderr, "Invalid DUT CTRL address: %s\n", arg); + return ARGP_ERR_UNKNOWN; + } + break; + case 'T': + if (make_sockaddr(AF_INET6, arg, 0, &env.tester_addr, NULL)) { + fprintf(stderr, "Invalid Tester address: %s\n", arg); + return ARGP_ERR_UNKNOWN; + } + break; + case ARGP_KEY_ARG: + errno = 0; + if (strlen(arg) >= IF_NAMESIZE) { + fprintf(stderr, "Invalid device name: %s\n", arg); + argp_usage(state); + return ARGP_ERR_UNKNOWN; + } + + env.ifindex = if_nametoindex(arg); + if (!env.ifindex) + env.ifindex = strtoul(arg, NULL, 0); + if (!env.ifindex) { + fprintf(stderr, + "Bad interface index or name (%d): %s\n", + errno, strerror(errno)); + argp_usage(state); + return ARGP_ERR_UNKNOWN; + } + break; + default: + return ARGP_ERR_UNKNOWN; + } + + return 0; +} + +static const struct argp argp = { + .options = opts, + .parser = parse_arg, + .doc = argp_program_doc, +}; + +static void set_env_default(void) +{ + env.feature.drv_feature = NETDEV_XDP_ACT_NDO_XMIT; + env.feature.action = -EINVAL; + env.ifindex = -ENODEV; + make_sockaddr(AF_INET6, "::ffff:127.0.0.1", DUT_CTRL_PORT, + &env.dut_ctrl_addr, NULL); + make_sockaddr(AF_INET6, "::ffff:127.0.0.1", DUT_ECHO_PORT, + &env.dut_addr, NULL); + make_sockaddr(AF_INET6, "::ffff:127.0.0.1", 0, &env.tester_addr, NULL); +} + +static void *dut_echo_thread(void *arg) +{ + unsigned char buf[sizeof(struct tlv_hdr)]; + int sockfd = *(int *)arg; + + while (!exiting) { + struct tlv_hdr *tlv = (struct tlv_hdr *)buf; + struct sockaddr_storage addr; + socklen_t addrlen; + size_t n; + + n = recvfrom(sockfd, buf, sizeof(buf), MSG_WAITALL, + (struct sockaddr *)&addr, &addrlen); + if (n != ntohs(tlv->len)) + continue; + + if (ntohs(tlv->type) != CMD_ECHO) + continue; + + sendto(sockfd, buf, sizeof(buf), MSG_NOSIGNAL | MSG_CONFIRM, + (struct sockaddr *)&addr, addrlen); + } + + pthread_exit((void *)0); + close(sockfd); + + return NULL; +} + +static int dut_run_echo_thread(pthread_t *t, int *sockfd) +{ + int err; + + sockfd = start_reuseport_server(AF_INET6, SOCK_DGRAM, NULL, + DUT_ECHO_PORT, 0, 1); + if (!sockfd) { + fprintf(stderr, "Failed to create echo socket\n"); + return -errno; + } + + /* start echo channel */ + err = pthread_create(t, NULL, dut_echo_thread, sockfd); + if (err) { + fprintf(stderr, "Failed creating dut_echo thread: %s\n", + strerror(-err)); + free_fds(sockfd, 1); + return -EINVAL; + } + + return 0; +} + +static int dut_attach_xdp_prog(struct xdp_features *skel, int flags) +{ + enum xdp_action action = env.feature.action; + struct bpf_program *prog; + unsigned int key = 0; + int err, fd = 0; + + if (env.feature.drv_feature == NETDEV_XDP_ACT_NDO_XMIT) { + struct bpf_devmap_val entry = { + .ifindex = env.ifindex, + }; + + err = bpf_map__update_elem(skel->maps.dev_map, + &key, sizeof(key), + &entry, sizeof(entry), 0); + if (err < 0) + return err; + + fd = bpf_program__fd(skel->progs.xdp_do_redirect_cpumap); + action = XDP_REDIRECT; + } + + switch (action) { + case XDP_TX: + prog = skel->progs.xdp_do_tx; + break; + case XDP_DROP: + prog = skel->progs.xdp_do_drop; + break; + case XDP_ABORTED: + prog = skel->progs.xdp_do_aborted; + break; + case XDP_PASS: + prog = skel->progs.xdp_do_pass; + break; + case XDP_REDIRECT: { + struct bpf_cpumap_val entry = { + .qsize = 2048, + .bpf_prog.fd = fd, + }; + + err = bpf_map__update_elem(skel->maps.cpu_map, + &key, sizeof(key), + &entry, sizeof(entry), 0); + if (err < 0) + return err; + + prog = skel->progs.xdp_do_redirect; + break; + } + default: + return -EINVAL; + } + + err = bpf_xdp_attach(env.ifindex, bpf_program__fd(prog), flags, NULL); + if (err) + fprintf(stderr, + "Failed to attach XDP program to ifindex %d\n", + env.ifindex); + return err; +} + +static int recv_msg(int sockfd, void *buf, size_t bufsize, void *val, + size_t val_size) +{ + struct tlv_hdr *tlv = (struct tlv_hdr *)buf; + size_t len; + + len = recv(sockfd, buf, bufsize, 0); + if (len != ntohs(tlv->len) || len < sizeof(*tlv)) + return -EINVAL; + + if (val) { + len -= sizeof(*tlv); + if (len > val_size) + return -ENOMEM; + + memcpy(val, tlv->data, len); + } + + return 0; +} + +static int dut_run(struct xdp_features *skel) +{ + int flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE; + int state, err, *sockfd, ctrl_sockfd, echo_sockfd; + struct sockaddr_storage ctrl_addr; + pthread_t dut_thread; + socklen_t addrlen; + + sockfd = start_reuseport_server(AF_INET6, SOCK_STREAM, NULL, + DUT_CTRL_PORT, 0, 1); + if (!sockfd) { + fprintf(stderr, "Failed to create DUT socket\n"); + return -errno; + } + + ctrl_sockfd = accept(*sockfd, (struct sockaddr *)&ctrl_addr, &addrlen); + if (ctrl_sockfd < 0) { + fprintf(stderr, "Failed to accept connection on DUT socket\n"); + free_fds(sockfd, 1); + return -errno; + } + + /* CTRL loop */ + while (!exiting) { + unsigned char buf[BUFSIZE] = {}; + struct tlv_hdr *tlv = (struct tlv_hdr *)buf; + + err = recv_msg(ctrl_sockfd, buf, BUFSIZE, NULL, 0); + if (err) + continue; + + switch (ntohs(tlv->type)) { + case CMD_START: { + if (state == CMD_START) + continue; + + state = CMD_START; + /* Load the XDP program on the DUT */ + err = dut_attach_xdp_prog(skel, flags); + if (err) + goto out; + + err = dut_run_echo_thread(&dut_thread, &echo_sockfd); + if (err < 0) + goto out; + + tlv->type = htons(CMD_ACK); + tlv->len = htons(sizeof(*tlv)); + err = send(ctrl_sockfd, buf, sizeof(*tlv), 0); + if (err < 0) + goto end_thread; + break; + } + case CMD_STOP: + if (state != CMD_START) + break; + + state = CMD_STOP; + + exiting = true; + bpf_xdp_detach(env.ifindex, flags, NULL); + + tlv->type = htons(CMD_ACK); + tlv->len = htons(sizeof(*tlv)); + err = send(ctrl_sockfd, buf, sizeof(*tlv), 0); + goto end_thread; + case CMD_GET_XDP_CAP: { + LIBBPF_OPTS(bpf_xdp_query_opts, opts); + unsigned long long val; + size_t n; + + err = bpf_xdp_query(env.ifindex, XDP_FLAGS_DRV_MODE, + &opts); + if (err) { + fprintf(stderr, + "Failed to query XDP cap for ifindex %d\n", + env.ifindex); + goto end_thread; + } + + tlv->type = htons(CMD_ACK); + n = sizeof(*tlv) + sizeof(opts.feature_flags); + tlv->len = htons(n); + + val = htobe64(opts.feature_flags); + memcpy(tlv->data, &val, sizeof(val)); + + err = send(ctrl_sockfd, buf, n, 0); + if (err < 0) + goto end_thread; + break; + } + case CMD_GET_STATS: { + unsigned int key = 0, val; + size_t n; + + err = bpf_map__lookup_elem(skel->maps.dut_stats, + &key, sizeof(key), + &val, sizeof(val), 0); + if (err) { + fprintf(stderr, "bpf_map_lookup_elem failed\n"); + goto end_thread; + } + + tlv->type = htons(CMD_ACK); + n = sizeof(*tlv) + sizeof(val); + tlv->len = htons(n); + + val = htonl(val); + memcpy(tlv->data, &val, sizeof(val)); + + err = send(ctrl_sockfd, buf, n, 0); + if (err < 0) + goto end_thread; + break; + } + default: + break; + } + } + +end_thread: + pthread_join(dut_thread, NULL); +out: + bpf_xdp_detach(env.ifindex, flags, NULL); + close(ctrl_sockfd); + free_fds(sockfd, 1); + + return err; +} + +static bool tester_collect_detected_cap(struct xdp_features *skel, + unsigned int dut_stats) +{ + unsigned int err, key = 0, val; + + if (!dut_stats) + return false; + + err = bpf_map__lookup_elem(skel->maps.stats, &key, sizeof(key), + &val, sizeof(val), 0); + if (err) { + fprintf(stderr, "bpf_map_lookup_elem failed\n"); + return false; + } + + switch (env.feature.action) { + case XDP_PASS: + case XDP_TX: + case XDP_REDIRECT: + return val > 0; + case XDP_DROP: + case XDP_ABORTED: + return val == 0; + default: + break; + } + + if (env.feature.drv_feature == NETDEV_XDP_ACT_NDO_XMIT) + return val > 0; + + return false; +} + +static int send_and_recv_msg(int sockfd, enum test_commands cmd, void *val, + size_t val_size) +{ + unsigned char buf[BUFSIZE] = {}; + struct tlv_hdr *tlv = (struct tlv_hdr *)buf; + int err; + + tlv->type = htons(cmd); + tlv->len = htons(sizeof(*tlv)); + + err = send(sockfd, buf, sizeof(*tlv), 0); + if (err < 0) + return err; + + err = recv_msg(sockfd, buf, BUFSIZE, val, val_size); + if (err < 0) + return err; + + return ntohs(tlv->type) == CMD_ACK ? 0 : -EINVAL; +} + +static int send_echo_msg(void) +{ + unsigned char buf[sizeof(struct tlv_hdr)]; + struct tlv_hdr *tlv = (struct tlv_hdr *)buf; + int sockfd, n; + + sockfd = socket(AF_INET6, SOCK_DGRAM, 0); + if (sockfd < 0) { + fprintf(stderr, "Failed to create echo socket\n"); + return -errno; + } + + tlv->type = htons(CMD_ECHO); + tlv->len = htons(sizeof(*tlv)); + + n = sendto(sockfd, buf, sizeof(*tlv), MSG_NOSIGNAL | MSG_CONFIRM, + (struct sockaddr *)&env.dut_addr, sizeof(env.dut_addr)); + close(sockfd); + + return n == ntohs(tlv->len) ? 0 : -EINVAL; +} + +static int tester_run(struct xdp_features *skel) +{ + int flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE; + unsigned long long advertised_feature; + struct bpf_program *prog; + unsigned int stats; + int i, err, sockfd; + bool detected_cap; + + sockfd = socket(AF_INET6, SOCK_STREAM, 0); + if (sockfd < 0) { + fprintf(stderr, "Failed to create tester socket\n"); + return -errno; + } + + if (settimeo(sockfd, 1000) < 0) + return -EINVAL; + + err = connect(sockfd, (struct sockaddr *)&env.dut_ctrl_addr, + sizeof(env.dut_ctrl_addr)); + if (err) { + fprintf(stderr, "Failed to connect to the DUT\n"); + return -errno; + } + + err = send_and_recv_msg(sockfd, CMD_GET_XDP_CAP, &advertised_feature, + sizeof(advertised_feature)); + if (err < 0) { + close(sockfd); + return err; + } + + advertised_feature = be64toh(advertised_feature); + + if (env.feature.drv_feature == NETDEV_XDP_ACT_NDO_XMIT || + env.feature.action == XDP_TX) + prog = skel->progs.xdp_tester_check_tx; + else + prog = skel->progs.xdp_tester_check_rx; + + err = bpf_xdp_attach(env.ifindex, bpf_program__fd(prog), flags, NULL); + if (err) { + fprintf(stderr, "Failed to attach XDP program to ifindex %d\n", + env.ifindex); + goto out; + } + + err = send_and_recv_msg(sockfd, CMD_START, NULL, 0); + if (err) + goto out; + + for (i = 0; i < 10 && !exiting; i++) { + err = send_echo_msg(); + if (err < 0) + goto out; + + sleep(1); + } + + err = send_and_recv_msg(sockfd, CMD_GET_STATS, &stats, sizeof(stats)); + if (err) + goto out; + + /* stop the test */ + err = send_and_recv_msg(sockfd, CMD_STOP, NULL, 0); + /* send a new echo message to wake echo thread of the dut */ + send_echo_msg(); + + detected_cap = tester_collect_detected_cap(skel, ntohl(stats)); + + fprintf(stdout, "Feature %s: [%s][%s]\n", get_xdp_feature_str(), + detected_cap ? GREEN("DETECTED") : RED("NOT DETECTED"), + env.feature.drv_feature & advertised_feature ? GREEN("ADVERTISED") + : RED("NOT ADVERTISED")); +out: + bpf_xdp_detach(env.ifindex, flags, NULL); + close(sockfd); + return err < 0 ? err : 0; +} + +int main(int argc, char **argv) +{ + struct xdp_features *skel; + int err; + + libbpf_set_strict_mode(LIBBPF_STRICT_ALL); + libbpf_set_print(libbpf_print_fn); + + signal(SIGINT, sig_handler); + signal(SIGTERM, sig_handler); + + set_env_default(); + + /* Parse command line arguments */ + err = argp_parse(&argp, argc, argv, 0, NULL, NULL); + if (err) + return err; + + if (env.ifindex < 0) { + fprintf(stderr, "Invalid ifindex\n"); + return -ENODEV; + } + + /* Load and verify BPF application */ + skel = xdp_features__open(); + if (!skel) { + fprintf(stderr, "Failed to open and load BPF skeleton\n"); + return -EINVAL; + } + + skel->rodata->tester_addr = + ((struct sockaddr_in6 *)&env.tester_addr)->sin6_addr; + skel->rodata->dut_addr = + ((struct sockaddr_in6 *)&env.dut_addr)->sin6_addr; + + /* Load & verify BPF programs */ + err = xdp_features__load(skel); + if (err) { + fprintf(stderr, "Failed to load and verify BPF skeleton\n"); + goto cleanup; + } + + err = xdp_features__attach(skel); + if (err) { + fprintf(stderr, "Failed to attach BPF skeleton\n"); + goto cleanup; + } + + if (env.is_tester) { + /* Tester */ + fprintf(stdout, "Starting tester on device %d\n", env.ifindex); + err = tester_run(skel); + } else { + /* DUT */ + fprintf(stdout, "Starting DUT on device %d\n", env.ifindex); + err = dut_run(skel); + } + +cleanup: + xdp_features__destroy(skel); + + return err < 0 ? -err : 0; +} diff --git a/tools/testing/selftests/bpf/xdp_features.h b/tools/testing/selftests/bpf/xdp_features.h new file mode 100644 index 000000000000..2670c541713b --- /dev/null +++ b/tools/testing/selftests/bpf/xdp_features.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* test commands */ +enum test_commands { + CMD_STOP, /* CMD */ + CMD_START, /* CMD */ + CMD_ECHO, /* CMD */ + CMD_ACK, /* CMD + data */ + CMD_GET_XDP_CAP, /* CMD */ + CMD_GET_STATS, /* CMD */ +}; + +#define DUT_CTRL_PORT 12345 +#define DUT_ECHO_PORT 12346 + +struct tlv_hdr { + __be16 type; + __be16 len; + __u8 data[]; +}; |