diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 19:06:09 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 19:06:09 -0800 |
commit | 8efd0d9c316af470377894a6a0f9ff63ce18c177 (patch) | |
tree | 65d00bf8c7fd8f938a42d38e44bad11d4cf08664 /samples | |
parent | 9bcbf894b6872216ef61faf17248ec234e3db6bc (diff) | |
parent | 8aaaf2f3af2ae212428f4db1af34214225f5cec3 (diff) | |
download | lwn-8efd0d9c316af470377894a6a0f9ff63ce18c177.tar.gz lwn-8efd0d9c316af470377894a6a0f9ff63ce18c177.zip |
Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Defer freeing TCP skbs to the BH handler, whenever possible, or at
least perform the freeing outside of the socket lock section to
decrease cross-CPU allocator work and improve latency.
- Add netdevice refcount tracking to locate sources of netdevice and
net namespace refcount leaks.
- Make Tx watchdog less intrusive - avoid pausing Tx and restarting
all queues from a single CPU removing latency spikes.
- Various small optimizations throughout the stack from Eric Dumazet.
- Make netdev->dev_addr[] constant, force modifications to go via
appropriate helpers to allow us to keep addresses in ordered data
structures.
- Replace unix_table_lock with per-hash locks, improving performance
of bind() calls.
- Extend skb drop tracepoint with a drop reason.
- Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
BPF
---
- New helpers:
- bpf_find_vma(), find and inspect VMAs for profiling use cases
- bpf_loop(), runtime-bounded loop helper trading some execution
time for much faster (if at all converging) verification
- bpf_strncmp(), improve performance, avoid compiler flakiness
- bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
for tracing programs, all inlined by the verifier
- Support BPF relocations (CO-RE) in the kernel loader.
- Further the support for BTF_TYPE_TAG annotations.
- Allow access to local storage in sleepable helpers.
- Convert verifier argument types to a composable form with different
attributes which can be shared across types (ro, maybe-null).
- Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
creating new, extensible ones where missing and deprecating those
to be removed.
Protocols
---------
- WiFi (mac80211/cfg80211):
- notify user space about long "come back in N" AP responses,
allow it to react to such temporary rejections
- allow non-standard VHT MCS 10/11 rates
- use coarse time in airtime fairness code to save CPU cycles
- Bluetooth:
- rework of HCI command execution serialization to use a common
queue and work struct, and improve handling errors reported in
the middle of a batch of commands
- rework HCI event handling to use skb_pull_data, avoiding packet
parsing pitfalls
- support AOSP Bluetooth Quality Report
- SMC:
- support net namespaces, following the RDMA model
- improve connection establishment latency by pre-clearing buffers
- introduce TCP ULP for automatic redirection to SMC
- Multi-Path TCP:
- support ioctls: SIOCINQ, OUTQ, and OUTQNSD
- support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
- support cmsgs: TCP_INQ
- improvements in the data scheduler (assigning data to subflows)
- support fastclose option (quick shutdown of the full MPTCP
connection, similar to TCP RST in regular TCP)
- MCTP (Management Component Transport) over serial, as defined by
DMTF spec DSP0253 - "MCTP Serial Transport Binding".
Driver API
----------
- Support timestamping on bond interfaces in active/passive mode.
- Introduce generic phylink link mode validation for drivers which
don't have any quirks and where MAC capability bits fully express
what's supported. Allow PCS layer to participate in the validation.
Convert a number of drivers.
- Add support to set/get size of buffers on the Rx rings and size of
the tx copybreak buffer via ethtool.
- Support offloading TC actions as first-class citizens rather than
only as attributes of filters, improve sharing and device resource
utilization.
- WiFi (mac80211/cfg80211):
- support forwarding offload (ndo_fill_forward_path)
- support for background radar detection hardware
- SA Query Procedures offload on the AP side
New hardware / drivers
----------------------
- tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
real-time requirements for isochronous communication with protocols
like OPC UA Pub/Sub.
- Qualcomm BAM-DMUX WWAN - driver for data channels of modems
integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
(qcom_bam_dmux).
- Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
with support for bridging, VLANs and multicast forwarding
(lan966x).
- iwlmei driver for co-operating between Intel's WiFi driver and
Intel's Active Management Technology (AMT) devices.
- mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
- Bluetooth:
- MediaTek MT7921 SDIO devices
- Foxconn MT7922A
- Realtek RTL8852AE
Drivers
-------
- Significantly improve performance in the datapaths of: lan78xx,
ax88179_178a, lantiq_xrx200, bnxt.
- Intel Ethernet NICs:
- igb: support PTP/time PEROUT and EXTTS SDP functions on
82580/i354/i350 adapters
- ixgbevf: new PF -> VF mailbox API which avoids the risk of
mailbox corruption with ESXi
- iavf: support configuration of VLAN features of finer
granularity, stacked tags and filtering
- ice: PTP support for new E822 devices with sub-ns precision
- ice: support firmware activation without reboot
- Mellanox Ethernet NICs (mlx5):
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- support TC forwarding when tunnel encap and decap happen between
two ports of the same NIC
- dynamically size and allow disabling various features to save
resources for running in embedded / SmartNIC scenarios
- Broadcom Ethernet NICs (bnxt):
- use page frag allocator to improve Rx performance
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- Other Ethernet NICs:
- amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
- Microsoft cloud/virtual NIC (mana):
- add XDP support (PASS, DROP, TX)
- Mellanox Ethernet switches (mlxsw):
- initial support for Spectrum-4 ASICs
- VxLAN with IPv6 underlay
- Marvell Ethernet switches (prestera):
- support flower flow templates
- add basic IP forwarding support
- NXP embedded Ethernet switches (ocelot & felix):
- support Per-Stream Filtering and Policing (PSFP)
- enable cut-through forwarding between ports by default
- support FDMA to improve packet Rx/Tx to CPU
- Other embedded switches:
- hellcreek: improve trapping management (STP and PTP) packets
- qca8k: support link aggregation and port mirroring
- Qualcomm 802.11ax WiFi (ath11k):
- qca6390, wcn6855: enable 802.11 power save mode in station mode
- BSS color change support
- WCN6855 hw2.1 support
- 11d scan offload support
- scan MAC address randomization support
- full monitor mode, only supported on QCN9074
- qca6390/wcn6855: report signal and tx bitrate
- qca6390: rfkill support
- qca6390/wcn6855: regdb.bin support
- Intel WiFi (iwlwifi):
- support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
in cooperation with the BIOS
- support for Optimized Connectivity Experience (OCE) scan
- support firmware API version 68
- lots of preparatory work for the upcoming Bz device family
- MediaTek WiFi (mt76):
- Specific Absorption Rate (SAR) support
- mt7921: 160 MHz channel support
- RealTek WiFi (rtw88):
- Specific Absorption Rate (SAR) support
- scan offload
- Other WiFi NICs
- ath10k: support fetching (pre-)calibration data from nvmem
- brcmfmac: configure keep-alive packet on suspend
- wcn36xx: beacon filter support"
* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
tcp: tcp_send_challenge_ack delete useless param `skb`
net/qla3xxx: Remove useless DMA-32 fallback configuration
rocker: Remove useless DMA-32 fallback configuration
hinic: Remove useless DMA-32 fallback configuration
lan743x: Remove useless DMA-32 fallback configuration
net: enetc: Remove useless DMA-32 fallback configuration
cxgb4vf: Remove useless DMA-32 fallback configuration
cxgb4: Remove useless DMA-32 fallback configuration
cxgb3: Remove useless DMA-32 fallback configuration
bnx2x: Remove useless DMA-32 fallback configuration
et131x: Remove useless DMA-32 fallback configuration
be2net: Remove useless DMA-32 fallback configuration
vmxnet3: Remove useless DMA-32 fallback configuration
bna: Simplify DMA setting
net: alteon: Simplify DMA setting
myri10ge: Simplify DMA setting
qlcnic: Simplify DMA setting
net: allwinner: Fix print format
page_pool: remove spinlock in page_pool_refill_alloc_cache()
amt: fix wrong return type of amt_send_membership_update()
...
Diffstat (limited to 'samples')
-rw-r--r-- | samples/bpf/Makefile | 18 | ||||
-rw-r--r-- | samples/bpf/Makefile.target | 11 | ||||
-rw-r--r-- | samples/bpf/cookie_uid_helper_example.c | 14 | ||||
-rw-r--r-- | samples/bpf/fds_example.c | 29 | ||||
-rw-r--r-- | samples/bpf/hbm.c | 11 | ||||
-rw-r--r-- | samples/bpf/lwt_len_hist_kern.c | 7 | ||||
-rw-r--r-- | samples/bpf/map_perf_test_user.c | 15 | ||||
-rw-r--r-- | samples/bpf/sock_example.c | 12 | ||||
-rw-r--r-- | samples/bpf/sockex1_user.c | 15 | ||||
-rw-r--r-- | samples/bpf/sockex2_user.c | 14 | ||||
-rw-r--r-- | samples/bpf/test_cgrp2_array_pin.c | 4 | ||||
-rw-r--r-- | samples/bpf/test_cgrp2_attach.c | 13 | ||||
-rw-r--r-- | samples/bpf/test_cgrp2_sock.c | 8 | ||||
-rw-r--r-- | samples/bpf/test_lru_dist.c | 11 | ||||
-rw-r--r-- | samples/bpf/trace_output_user.c | 4 | ||||
-rw-r--r-- | samples/bpf/xdp_fwd_user.c | 12 | ||||
-rw-r--r-- | samples/bpf/xdp_redirect_cpu.bpf.c | 4 | ||||
-rw-r--r-- | samples/bpf/xdp_sample_pkts_user.c | 22 | ||||
-rw-r--r-- | samples/bpf/xdp_sample_user.h | 2 | ||||
-rw-r--r-- | samples/bpf/xdpsock_ctrl_proc.c | 3 | ||||
-rw-r--r-- | samples/bpf/xdpsock_user.c | 366 | ||||
-rw-r--r-- | samples/bpf/xsk_fwd.c | 3 |
22 files changed, 489 insertions, 109 deletions
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index a886dff1ba89..38638845db9d 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -215,6 +215,11 @@ TPROGS_LDFLAGS := -L$(SYSROOT)/usr/lib endif TPROGS_LDLIBS += $(LIBBPF) -lelf -lz +TPROGLDLIBS_xdp_monitor += -lm +TPROGLDLIBS_xdp_redirect += -lm +TPROGLDLIBS_xdp_redirect_cpu += -lm +TPROGLDLIBS_xdp_redirect_map += -lm +TPROGLDLIBS_xdp_redirect_map_multi += -lm TPROGLDLIBS_tracex4 += -lrt TPROGLDLIBS_trace_output += -lrt TPROGLDLIBS_map_perf_test += -lrt @@ -328,7 +333,7 @@ $(BPF_SAMPLES_PATH)/*.c: verify_target_bpf $(LIBBPF) $(src)/*.c: verify_target_bpf $(LIBBPF) libbpf_hdrs: $(LIBBPF) -$(obj)/$(TRACE_HELPERS): | libbpf_hdrs +$(obj)/$(TRACE_HELPERS) $(obj)/$(CGROUP_HELPERS) $(obj)/$(XDP_SAMPLE): | libbpf_hdrs .PHONY: libbpf_hdrs @@ -343,6 +348,17 @@ $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h $(obj)/hbm.o: $(src)/hbm.h $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h +# Override includes for xdp_sample_user.o because $(srctree)/usr/include in +# TPROGS_CFLAGS causes conflicts +XDP_SAMPLE_CFLAGS += -Wall -O2 \ + -I$(src)/../../tools/include \ + -I$(src)/../../tools/include/uapi \ + -I$(LIBBPF_INCLUDE) \ + -I$(src)/../../tools/testing/selftests/bpf + +$(obj)/$(XDP_SAMPLE): TPROGS_CFLAGS = $(XDP_SAMPLE_CFLAGS) +$(obj)/$(XDP_SAMPLE): $(src)/xdp_sample_user.h $(src)/xdp_sample_shared.h + -include $(BPF_SAMPLES_PATH)/Makefile.target VMLINUX_BTF_PATHS ?= $(abspath $(if $(O),$(O)/vmlinux)) \ diff --git a/samples/bpf/Makefile.target b/samples/bpf/Makefile.target index 5a368affa038..7621f55e2947 100644 --- a/samples/bpf/Makefile.target +++ b/samples/bpf/Makefile.target @@ -73,14 +73,3 @@ quiet_cmd_tprog-cobjs = CC $@ cmd_tprog-cobjs = $(CC) $(tprogc_flags) -c -o $@ $< $(tprog-cobjs): $(obj)/%.o: $(src)/%.c FORCE $(call if_changed_dep,tprog-cobjs) - -# Override includes for xdp_sample_user.o because $(srctree)/usr/include in -# TPROGS_CFLAGS causes conflicts -XDP_SAMPLE_CFLAGS += -Wall -O2 -lm \ - -I./tools/include \ - -I./tools/include/uapi \ - -I./tools/lib \ - -I./tools/testing/selftests/bpf -$(obj)/xdp_sample_user.o: $(src)/xdp_sample_user.c \ - $(src)/xdp_sample_user.h $(src)/xdp_sample_shared.h - $(CC) $(XDP_SAMPLE_CFLAGS) -c -o $@ $< diff --git a/samples/bpf/cookie_uid_helper_example.c b/samples/bpf/cookie_uid_helper_example.c index 54958802c032..f0df3dda4b1f 100644 --- a/samples/bpf/cookie_uid_helper_example.c +++ b/samples/bpf/cookie_uid_helper_example.c @@ -67,8 +67,8 @@ static bool test_finish; static void maps_create(void) { - map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(uint32_t), - sizeof(struct stats), 100, 0); + map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(uint32_t), + sizeof(struct stats), 100, NULL); if (map_fd < 0) error(1, errno, "map create failed!\n"); } @@ -157,9 +157,13 @@ static void prog_load(void) offsetof(struct __sk_buff, len)), BPF_EXIT_INSN(), }; - prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, - ARRAY_SIZE(prog), "GPL", 0, - log_buf, sizeof(log_buf)); + LIBBPF_OPTS(bpf_prog_load_opts, opts, + .log_buf = log_buf, + .log_size = sizeof(log_buf), + ); + + prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", + prog, ARRAY_SIZE(prog), &opts); if (prog_fd < 0) error(1, errno, "failed to load prog\n%s\n", log_buf); } diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c index 59f45fef5110..16dbf49e0f19 100644 --- a/samples/bpf/fds_example.c +++ b/samples/bpf/fds_example.c @@ -46,12 +46,6 @@ static void usage(void) printf(" -h Display this help.\n"); } -static int bpf_map_create(void) -{ - return bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(uint32_t), - sizeof(uint32_t), 1024, 0); -} - static int bpf_prog_create(const char *object) { static struct bpf_insn insns[] = { @@ -60,16 +54,22 @@ static int bpf_prog_create(const char *object) }; size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn); struct bpf_object *obj; - int prog_fd; + int err; if (object) { - assert(!bpf_prog_load(object, BPF_PROG_TYPE_UNSPEC, - &obj, &prog_fd)); - return prog_fd; + obj = bpf_object__open_file(object, NULL); + assert(!libbpf_get_error(obj)); + err = bpf_object__load(obj); + assert(!err); + return bpf_program__fd(bpf_object__next_program(obj, NULL)); } else { - return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, - insns, insns_cnt, "GPL", 0, - bpf_log_buf, BPF_LOG_BUF_SIZE); + LIBBPF_OPTS(bpf_prog_load_opts, opts, + .log_buf = bpf_log_buf, + .log_size = BPF_LOG_BUF_SIZE, + ); + + return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", + insns, insns_cnt, &opts); } } @@ -79,7 +79,8 @@ static int bpf_do_map(const char *file, uint32_t flags, uint32_t key, int fd, ret; if (flags & BPF_F_PIN) { - fd = bpf_map_create(); + fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(uint32_t), + sizeof(uint32_t), 1024, NULL); printf("bpf: map fd:%d (%s)\n", fd, strerror(errno)); assert(fd > 0); diff --git a/samples/bpf/hbm.c b/samples/bpf/hbm.c index b0c18efe7928..1fe5bcafb3bc 100644 --- a/samples/bpf/hbm.c +++ b/samples/bpf/hbm.c @@ -120,6 +120,9 @@ static void do_error(char *msg, bool errno_flag) static int prog_load(char *prog) { + struct bpf_program *pos; + const char *sec_name; + obj = bpf_object__open_file(prog, NULL); if (libbpf_get_error(obj)) { printf("ERROR: opening BPF object file failed\n"); @@ -132,7 +135,13 @@ static int prog_load(char *prog) goto err; } - bpf_prog = bpf_object__find_program_by_title(obj, "cgroup_skb/egress"); + bpf_object__for_each_program(pos, obj) { + sec_name = bpf_program__section_name(pos); + if (sec_name && !strcmp(sec_name, "cgroup_skb/egress")) { + bpf_prog = pos; + break; + } + } if (!bpf_prog) { printf("ERROR: finding a prog in obj file failed\n"); goto err; diff --git a/samples/bpf/lwt_len_hist_kern.c b/samples/bpf/lwt_len_hist_kern.c index 9ed63e10e170..1fa14c54963a 100644 --- a/samples/bpf/lwt_len_hist_kern.c +++ b/samples/bpf/lwt_len_hist_kern.c @@ -16,13 +16,6 @@ #include <uapi/linux/in.h> #include <bpf/bpf_helpers.h> -# define printk(fmt, ...) \ - ({ \ - char ____fmt[] = fmt; \ - bpf_trace_printk(____fmt, sizeof(____fmt), \ - ##__VA_ARGS__); \ - }) - struct bpf_elf_map { __u32 type; __u32 size_key; diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c index 9db949290a78..319fd31522f3 100644 --- a/samples/bpf/map_perf_test_user.c +++ b/samples/bpf/map_perf_test_user.c @@ -134,19 +134,22 @@ static void do_test_lru(enum test_type test, int cpu) */ int outer_fd = map_fd[array_of_lru_hashs_idx]; unsigned int mycpu, mynode; + LIBBPF_OPTS(bpf_map_create_opts, opts, + .map_flags = BPF_F_NUMA_NODE, + ); assert(cpu < MAX_NR_CPUS); ret = syscall(__NR_getcpu, &mycpu, &mynode, NULL); assert(!ret); + opts.numa_node = mynode; inner_lru_map_fds[cpu] = - bpf_create_map_node(BPF_MAP_TYPE_LRU_HASH, - test_map_names[INNER_LRU_HASH_PREALLOC], - sizeof(uint32_t), - sizeof(long), - inner_lru_hash_size, 0, - mynode); + bpf_map_create(BPF_MAP_TYPE_LRU_HASH, + test_map_names[INNER_LRU_HASH_PREALLOC], + sizeof(uint32_t), + sizeof(long), + inner_lru_hash_size, &opts); if (inner_lru_map_fds[cpu] == -1) { printf("cannot create BPF_MAP_TYPE_LRU_HASH %s(%d)\n", strerror(errno), errno); diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c index 23d1930e1927..a88f69504c08 100644 --- a/samples/bpf/sock_example.c +++ b/samples/bpf/sock_example.c @@ -37,8 +37,8 @@ static int test_sock(void) int sock = -1, map_fd, prog_fd, i, key; long long value = 0, tcp_cnt, udp_cnt, icmp_cnt; - map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value), - 256, 0); + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(key), sizeof(value), + 256, NULL); if (map_fd < 0) { printf("failed to create map '%s'\n", strerror(errno)); goto cleanup; @@ -59,9 +59,13 @@ static int test_sock(void) BPF_EXIT_INSN(), }; size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); + LIBBPF_OPTS(bpf_prog_load_opts, opts, + .log_buf = bpf_log_buf, + .log_size = BPF_LOG_BUF_SIZE, + ); - prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, insns_cnt, - "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE); + prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", + prog, insns_cnt, &opts); if (prog_fd < 0) { printf("failed to load prog '%s'\n", strerror(errno)); goto cleanup; diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c index 3c83722877dc..9e8d39e245c1 100644 --- a/samples/bpf/sockex1_user.c +++ b/samples/bpf/sockex1_user.c @@ -11,17 +11,26 @@ int main(int ac, char **argv) { struct bpf_object *obj; + struct bpf_program *prog; int map_fd, prog_fd; char filename[256]; - int i, sock; + int i, sock, err; FILE *f; snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); - if (bpf_prog_load(filename, BPF_PROG_TYPE_SOCKET_FILTER, - &obj, &prog_fd)) + obj = bpf_object__open_file(filename, NULL); + if (libbpf_get_error(obj)) return 1; + prog = bpf_object__next_program(obj, NULL); + bpf_program__set_type(prog, BPF_PROG_TYPE_SOCKET_FILTER); + + err = bpf_object__load(obj); + if (err) + return 1; + + prog_fd = bpf_program__fd(prog); map_fd = bpf_object__find_map_fd_by_name(obj, "my_map"); sock = open_raw_sock("lo"); diff --git a/samples/bpf/sockex2_user.c b/samples/bpf/sockex2_user.c index bafa567b840c..6a3fd369d3fc 100644 --- a/samples/bpf/sockex2_user.c +++ b/samples/bpf/sockex2_user.c @@ -16,18 +16,26 @@ struct pair { int main(int ac, char **argv) { + struct bpf_program *prog; struct bpf_object *obj; int map_fd, prog_fd; char filename[256]; - int i, sock; + int i, sock, err; FILE *f; snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); + obj = bpf_object__open_file(filename, NULL); + if (libbpf_get_error(obj)) + return 1; + + prog = bpf_object__next_program(obj, NULL); + bpf_program__set_type(prog, BPF_PROG_TYPE_SOCKET_FILTER); - if (bpf_prog_load(filename, BPF_PROG_TYPE_SOCKET_FILTER, - &obj, &prog_fd)) + err = bpf_object__load(obj); + if (err) return 1; + prog_fd = bpf_program__fd(prog); map_fd = bpf_object__find_map_fd_by_name(obj, "hash_map"); sock = open_raw_sock("lo"); diff --git a/samples/bpf/test_cgrp2_array_pin.c b/samples/bpf/test_cgrp2_array_pin.c index 6d564aa75447..05e88aa63009 100644 --- a/samples/bpf/test_cgrp2_array_pin.c +++ b/samples/bpf/test_cgrp2_array_pin.c @@ -64,9 +64,9 @@ int main(int argc, char **argv) } if (create_array) { - array_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_ARRAY, + array_fd = bpf_map_create(BPF_MAP_TYPE_CGROUP_ARRAY, NULL, sizeof(uint32_t), sizeof(uint32_t), - 1, 0); + 1, NULL); if (array_fd < 0) { fprintf(stderr, "bpf_create_map(BPF_MAP_TYPE_CGROUP_ARRAY,...): %s(%d)\n", diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c index 390ff38d2ac6..6d90874b09c3 100644 --- a/samples/bpf/test_cgrp2_attach.c +++ b/samples/bpf/test_cgrp2_attach.c @@ -71,10 +71,13 @@ static int prog_load(int map_fd, int verdict) BPF_EXIT_INSN(), }; size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); + LIBBPF_OPTS(bpf_prog_load_opts, opts, + .log_buf = bpf_log_buf, + .log_size = BPF_LOG_BUF_SIZE, + ); - return bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB, - prog, insns_cnt, "GPL", 0, - bpf_log_buf, BPF_LOG_BUF_SIZE); + return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB, NULL, "GPL", + prog, insns_cnt, &opts); } static int usage(const char *argv0) @@ -90,9 +93,9 @@ static int attach_filter(int cg_fd, int type, int verdict) int prog_fd, map_fd, ret, key; long long pkt_cnt, byte_cnt; - map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(key), sizeof(byte_cnt), - 256, 0); + 256, NULL); if (map_fd < 0) { printf("Failed to create map: '%s'\n", strerror(errno)); return EXIT_FAILURE; diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c index b0811da5a00f..a0811df888f4 100644 --- a/samples/bpf/test_cgrp2_sock.c +++ b/samples/bpf/test_cgrp2_sock.c @@ -70,6 +70,10 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio) BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, priority)), BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, priority)), }; + LIBBPF_OPTS(bpf_prog_load_opts, opts, + .log_buf = bpf_log_buf, + .log_size = BPF_LOG_BUF_SIZE, + ); struct bpf_insn *prog; size_t insns_cnt; @@ -115,8 +119,8 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio) insns_cnt /= sizeof(struct bpf_insn); - ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt, - "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE); + ret = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", + prog, insns_cnt, &opts); free(prog); diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c index c92c5c06b965..75e877853596 100644 --- a/samples/bpf/test_lru_dist.c +++ b/samples/bpf/test_lru_dist.c @@ -105,10 +105,10 @@ struct pfect_lru { static void pfect_lru_init(struct pfect_lru *lru, unsigned int lru_size, unsigned int nr_possible_elems) { - lru->map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, + lru->map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(unsigned long long), sizeof(struct pfect_lru_node *), - nr_possible_elems, 0); + nr_possible_elems, NULL); assert(lru->map_fd != -1); lru->free_nodes = malloc(lru_size * sizeof(struct pfect_lru_node)); @@ -207,10 +207,13 @@ static unsigned int read_keys(const char *dist_file, static int create_map(int map_type, int map_flags, unsigned int size) { + LIBBPF_OPTS(bpf_map_create_opts, opts, + .map_flags = map_flags, + ); int map_fd; - map_fd = bpf_create_map(map_type, sizeof(unsigned long long), - sizeof(unsigned long long), size, map_flags); + map_fd = bpf_map_create(map_type, NULL, sizeof(unsigned long long), + sizeof(unsigned long long), size, &opts); if (map_fd == -1) perror("bpf_create_map"); diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c index 364b98764d54..371732f9cf8e 100644 --- a/samples/bpf/trace_output_user.c +++ b/samples/bpf/trace_output_user.c @@ -43,7 +43,6 @@ static void print_bpf_output(void *ctx, int cpu, void *data, __u32 size) int main(int argc, char **argv) { - struct perf_buffer_opts pb_opts = {}; struct bpf_link *link = NULL; struct bpf_program *prog; struct perf_buffer *pb; @@ -84,8 +83,7 @@ int main(int argc, char **argv) goto cleanup; } - pb_opts.sample_cb = print_bpf_output; - pb = perf_buffer__new(map_fd, 8, &pb_opts); + pb = perf_buffer__new(map_fd, 8, print_bpf_output, NULL, NULL, NULL); ret = libbpf_get_error(pb); if (ret) { printf("failed to setup perf_buffer: %d\n", ret); diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c index 00061261a8da..4ad896782f77 100644 --- a/samples/bpf/xdp_fwd_user.c +++ b/samples/bpf/xdp_fwd_user.c @@ -79,7 +79,9 @@ int main(int argc, char **argv) .prog_type = BPF_PROG_TYPE_XDP, }; const char *prog_name = "xdp_fwd"; - struct bpf_program *prog; + struct bpf_program *prog = NULL; + struct bpf_program *pos; + const char *sec_name; int prog_fd, map_fd = -1; char filename[PATH_MAX]; struct bpf_object *obj; @@ -134,7 +136,13 @@ int main(int argc, char **argv) return 1; } - prog = bpf_object__find_program_by_title(obj, prog_name); + bpf_object__for_each_program(pos, obj) { + sec_name = bpf_program__section_name(pos); + if (sec_name && !strcmp(sec_name, prog_name)) { + prog = pos; + break; + } + } prog_fd = bpf_program__fd(prog); if (prog_fd < 0) { printf("program not found: %s\n", strerror(prog_fd)); diff --git a/samples/bpf/xdp_redirect_cpu.bpf.c b/samples/bpf/xdp_redirect_cpu.bpf.c index f10fe3cf25f6..25e3a405375f 100644 --- a/samples/bpf/xdp_redirect_cpu.bpf.c +++ b/samples/bpf/xdp_redirect_cpu.bpf.c @@ -100,7 +100,6 @@ u16 get_dest_port_ipv4_udp(struct xdp_md *ctx, u64 nh_off) void *data = (void *)(long)ctx->data; struct iphdr *iph = data + nh_off; struct udphdr *udph; - u16 dport; if (iph + 1 > data_end) return 0; @@ -111,8 +110,7 @@ u16 get_dest_port_ipv4_udp(struct xdp_md *ctx, u64 nh_off) if (udph + 1 > data_end) return 0; - dport = bpf_ntohs(udph->dest); - return dport; + return bpf_ntohs(udph->dest); } static __always_inline diff --git a/samples/bpf/xdp_sample_pkts_user.c b/samples/bpf/xdp_sample_pkts_user.c index f4382ccdcbb1..587eacb49103 100644 --- a/samples/bpf/xdp_sample_pkts_user.c +++ b/samples/bpf/xdp_sample_pkts_user.c @@ -110,12 +110,9 @@ static void usage(const char *prog) int main(int argc, char **argv) { - struct bpf_prog_load_attr prog_load_attr = { - .prog_type = BPF_PROG_TYPE_XDP, - }; - struct perf_buffer_opts pb_opts = {}; const char *optstr = "FS"; int prog_fd, map_fd, opt; + struct bpf_program *prog; struct bpf_object *obj; struct bpf_map *map; char filename[256]; @@ -144,15 +141,19 @@ int main(int argc, char **argv) } snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); - prog_load_attr.file = filename; - if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)) + obj = bpf_object__open_file(filename, NULL); + if (libbpf_get_error(obj)) return 1; - if (!prog_fd) { - printf("bpf_prog_load_xattr: %s\n", strerror(errno)); + prog = bpf_object__next_program(obj, NULL); + bpf_program__set_type(prog, BPF_PROG_TYPE_XDP); + + err = bpf_object__load(obj); + if (err) return 1; - } + + prog_fd = bpf_program__fd(prog); map = bpf_object__next_map(obj, NULL); if (!map) { @@ -181,8 +182,7 @@ int main(int argc, char **argv) return 1; } - pb_opts.sample_cb = print_bpf_output; - pb = perf_buffer__new(map_fd, 8, &pb_opts); + pb = perf_buffer__new(map_fd, 8, print_bpf_output, NULL, NULL, NULL); err = libbpf_get_error(pb); if (err) { perror("perf_buffer setup failed"); diff --git a/samples/bpf/xdp_sample_user.h b/samples/bpf/xdp_sample_user.h index d97465ff8c62..5f44b877ecf5 100644 --- a/samples/bpf/xdp_sample_user.h +++ b/samples/bpf/xdp_sample_user.h @@ -45,7 +45,9 @@ const char *get_driver_name(int ifindex); int get_mac_addr(int ifindex, void *mac_addr); #pragma GCC diagnostic push +#ifndef __clang__ #pragma GCC diagnostic ignored "-Wstringop-truncation" +#endif __attribute__((unused)) static inline char *safe_strncpy(char *dst, const char *src, size_t size) { diff --git a/samples/bpf/xdpsock_ctrl_proc.c b/samples/bpf/xdpsock_ctrl_proc.c index 384e62e3c6d6..cc4408797ab7 100644 --- a/samples/bpf/xdpsock_ctrl_proc.c +++ b/samples/bpf/xdpsock_ctrl_proc.c @@ -15,6 +15,9 @@ #include <bpf/xsk.h> #include "xdpsock.h" +/* libbpf APIs for AF_XDP are deprecated starting from v0.7 */ +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" + static const char *opt_if = ""; static struct option long_options[] = { diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index 49d7a6ad7e39..aa50864e4415 100644 --- a/samples/bpf/xdpsock_user.c +++ b/samples/bpf/xdpsock_user.c @@ -14,6 +14,7 @@ #include <arpa/inet.h> #include <locale.h> #include <net/ethernet.h> +#include <netinet/ether.h> #include <net/if.h> #include <poll.h> #include <pthread.h> @@ -30,12 +31,16 @@ #include <sys/un.h> #include <time.h> #include <unistd.h> +#include <sched.h> #include <bpf/libbpf.h> #include <bpf/xsk.h> #include <bpf/bpf.h> #include "xdpsock.h" +/* libbpf APIs for AF_XDP are deprecated starting from v0.7 */ +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" + #ifndef SOL_XDP #define SOL_XDP 283 #endif @@ -53,12 +58,27 @@ #define DEBUG_HEXDUMP 0 +#define VLAN_PRIO_MASK 0xe000 /* Priority Code Point */ +#define VLAN_PRIO_SHIFT 13 +#define VLAN_VID_MASK 0x0fff /* VLAN Identifier */ +#define VLAN_VID__DEFAULT 1 +#define VLAN_PRI__DEFAULT 0 + +#define NSEC_PER_SEC 1000000000UL +#define NSEC_PER_USEC 1000 + +#define SCHED_PRI__DEFAULT 0 + typedef __u64 u64; typedef __u32 u32; typedef __u16 u16; typedef __u8 u8; static unsigned long prev_time; +static long tx_cycle_diff_min; +static long tx_cycle_diff_max; +static double tx_cycle_diff_ave; +static long tx_cycle_cnt; enum benchmark_type { BENCH_RXDROP = 0, @@ -78,14 +98,23 @@ static u32 opt_batch_size = 64; static int opt_pkt_count; static u16 opt_pkt_size = MIN_PKT_SIZE; static u32 opt_pkt_fill_pattern = 0x12345678; +static bool opt_vlan_tag; +static u16 opt_pkt_vlan_id = VLAN_VID__DEFAULT; +static u16 opt_pkt_vlan_pri = VLAN_PRI__DEFAULT; +static struct ether_addr opt_txdmac = {{ 0x3c, 0xfd, 0xfe, + 0x9e, 0x7f, 0x71 }}; +static struct ether_addr opt_txsmac = {{ 0xec, 0xb1, 0xd7, + 0x98, 0x3a, 0xc0 }}; static bool opt_extra_stats; static bool opt_quiet; static bool opt_app_stats; static const char *opt_irq_str = ""; static u32 irq_no; static int irqs_at_init = -1; +static u32 sequence; static int opt_poll; static int opt_interval = 1; +static int opt_retries = 3; static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP; static u32 opt_umem_flags; static int opt_unaligned_chunks; @@ -97,6 +126,27 @@ static u32 opt_num_xsks = 1; static u32 prog_id; static bool opt_busy_poll; static bool opt_reduced_cap; +static clockid_t opt_clock = CLOCK_MONOTONIC; +static unsigned long opt_tx_cycle_ns; +static int opt_schpolicy = SCHED_OTHER; +static int opt_schprio = SCHED_PRI__DEFAULT; +static bool opt_tstamp; + +struct vlan_ethhdr { + unsigned char h_dest[6]; + unsigned char h_source[6]; + __be16 h_vlan_proto; + __be16 h_vlan_TCI; + __be16 h_vlan_encapsulated_proto; +}; + +#define PKTGEN_MAGIC 0xbe9be955 +struct pktgen_hdr { + __be32 pgh_magic; + __be32 seq_num; + __be32 tv_sec; + __be32 tv_usec; +}; struct xsk_ring_stats { unsigned long rx_npkts; @@ -153,15 +203,63 @@ struct xsk_socket_info { u32 outstanding_tx; }; +static const struct clockid_map { + const char *name; + clockid_t clockid; +} clockids_map[] = { + { "REALTIME", CLOCK_REALTIME }, + { "TAI", CLOCK_TAI }, + { "BOOTTIME", CLOCK_BOOTTIME }, + { "MONOTONIC", CLOCK_MONOTONIC }, + { NULL } +}; + +static const struct sched_map { + const char *name; + int policy; +} schmap[] = { + { "OTHER", SCHED_OTHER }, + { "FIFO", SCHED_FIFO }, + { NULL } +}; + static int num_socks; struct xsk_socket_info *xsks[MAX_SOCKS]; int sock; +static int get_clockid(clockid_t *id, const char *name) +{ + const struct clockid_map *clk; + + for (clk = clockids_map; clk->name; clk++) { + if (strcasecmp(clk->name, name) == 0) { + *id = clk->clockid; + return 0; + } + } + + return -1; +} + +static int get_schpolicy(int *policy, const char *name) +{ + const struct sched_map *sch; + + for (sch = schmap; sch->name; sch++) { + if (strcasecmp(sch->name, name) == 0) { + *policy = sch->policy; + return 0; + } + } + + return -1; +} + static unsigned long get_nsecs(void) { struct timespec ts; - clock_gettime(CLOCK_MONOTONIC, &ts); + clock_gettime(opt_clock, &ts); return ts.tv_sec * 1000000000UL + ts.tv_nsec; } @@ -254,6 +352,15 @@ static void dump_app_stats(long dt) xsks[i]->app_stats.prev_tx_wakeup_sendtos = xsks[i]->app_stats.tx_wakeup_sendtos; xsks[i]->app_stats.prev_opt_polls = xsks[i]->app_stats.opt_polls; } + + if (opt_tx_cycle_ns) { + printf("\n%-18s %-10s %-10s %-10s %-10s %-10s\n", + "", "period", "min", "ave", "max", "cycle"); + printf("%-18s %-10lu %-10lu %-10lu %-10lu %-10lu\n", + "Cyclic TX", opt_tx_cycle_ns, tx_cycle_diff_min, + (long)(tx_cycle_diff_ave / tx_cycle_cnt), + tx_cycle_diff_max, tx_cycle_cnt); + } } static bool get_interrupt_number(void) @@ -737,29 +844,69 @@ static inline u16 udp_csum(u32 saddr, u32 daddr, u32 len, #define ETH_FCS_SIZE 4 -#define PKT_HDR_SIZE (sizeof(struct ethhdr) + sizeof(struct iphdr) + \ - sizeof(struct udphdr)) +#define ETH_HDR_SIZE (opt_vlan_tag ? sizeof(struct vlan_ethhdr) : \ + sizeof(struct ethhdr)) +#define PKTGEN_HDR_SIZE (opt_tstamp ? sizeof(struct pktgen_hdr) : 0) +#define PKT_HDR_SIZE (ETH_HDR_SIZE + sizeof(struct iphdr) + \ + sizeof(struct udphdr) + PKTGEN_HDR_SIZE) +#define PKTGEN_HDR_OFFSET (ETH_HDR_SIZE + sizeof(struct iphdr) + \ + sizeof(struct udphdr)) +#define PKTGEN_SIZE_MIN (PKTGEN_HDR_OFFSET + sizeof(struct pktgen_hdr) + \ + ETH_FCS_SIZE) #define PKT_SIZE (opt_pkt_size - ETH_FCS_SIZE) -#define IP_PKT_SIZE (PKT_SIZE - sizeof(struct ethhdr)) +#define IP_PKT_SIZE (PKT_SIZE - ETH_HDR_SIZE) #define UDP_PKT_SIZE (IP_PKT_SIZE - sizeof(struct iphdr)) -#define UDP_PKT_DATA_SIZE (UDP_PKT_SIZE - sizeof(struct udphdr)) +#define UDP_PKT_DATA_SIZE (UDP_PKT_SIZE - \ + (sizeof(struct udphdr) + PKTGEN_HDR_SIZE)) static u8 pkt_data[XSK_UMEM__DEFAULT_FRAME_SIZE]; static void gen_eth_hdr_data(void) { - struct udphdr *udp_hdr = (struct udphdr *)(pkt_data + + struct pktgen_hdr *pktgen_hdr; + struct udphdr *udp_hdr; + struct iphdr *ip_hdr; + + if (opt_vlan_tag) { + struct vlan_ethhdr *veth_hdr = (struct vlan_ethhdr *)pkt_data; + u16 vlan_tci = 0; + + udp_hdr = (struct udphdr *)(pkt_data + + sizeof(struct vlan_ethhdr) + + sizeof(struct iphdr)); + ip_hdr = (struct iphdr *)(pkt_data + + sizeof(struct vlan_ethhdr)); + pktgen_hdr = (struct pktgen_hdr *)(pkt_data + + sizeof(struct vlan_ethhdr) + + sizeof(struct iphdr) + + sizeof(struct udphdr)); + /* ethernet & VLAN header */ + memcpy(veth_hdr->h_dest, &opt_txdmac, ETH_ALEN); + memcpy(veth_hdr->h_source, &opt_txsmac, ETH_ALEN); + veth_hdr->h_vlan_proto = htons(ETH_P_8021Q); + vlan_tci = opt_pkt_vlan_id & VLAN_VID_MASK; + vlan_tci |= (opt_pkt_vlan_pri << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK; + veth_hdr->h_vlan_TCI = htons(vlan_tci); + veth_hdr->h_vlan_encapsulated_proto = htons(ETH_P_IP); + } else { + struct ethhdr *eth_hdr = (struct ethhdr *)pkt_data; + + udp_hdr = (struct udphdr *)(pkt_data + + sizeof(struct ethhdr) + + sizeof(struct iphdr)); + ip_hdr = (struct iphdr *)(pkt_data + + sizeof(struct ethhdr)); + pktgen_hdr = (struct pktgen_hdr *)(pkt_data + sizeof(struct ethhdr) + - sizeof(struct iphdr)); - struct iphdr *ip_hdr = (struct iphdr *)(pkt_data + - sizeof(struct ethhdr)); - struct ethhdr *eth_hdr = (struct ethhdr *)pkt_data; + sizeof(struct iphdr) + + sizeof(struct udphdr)); + /* ethernet header */ + memcpy(eth_hdr->h_dest, &opt_txdmac, ETH_ALEN); + memcpy(eth_hdr->h_source, &opt_txsmac, ETH_ALEN); + eth_hdr->h_proto = htons(ETH_P_IP); + } - /* ethernet header */ - memcpy(eth_hdr->h_dest, "\x3c\xfd\xfe\x9e\x7f\x71", ETH_ALEN); - memcpy(eth_hdr->h_source, "\xec\xb1\xd7\x98\x3a\xc0", ETH_ALEN); - eth_hdr->h_proto = htons(ETH_P_IP); /* IP header */ ip_hdr->version = IPVERSION; @@ -782,6 +929,9 @@ static void gen_eth_hdr_data(void) udp_hdr->dest = htons(0x1000); udp_hdr->len = htons(UDP_PKT_SIZE); + if (opt_tstamp) + pktgen_hdr->pgh_magic = htonl(PKTGEN_MAGIC); + /* UDP data */ memset32_htonl(pkt_data + PKT_HDR_SIZE, opt_pkt_fill_pattern, UDP_PKT_DATA_SIZE); @@ -905,6 +1055,7 @@ static struct option long_options[] = { {"xdp-skb", no_argument, 0, 'S'}, {"xdp-native", no_argument, 0, 'N'}, {"interval", required_argument, 0, 'n'}, + {"retries", required_argument, 0, 'O'}, {"zero-copy", no_argument, 0, 'z'}, {"copy", no_argument, 0, 'c'}, {"frame-size", required_argument, 0, 'f'}, @@ -913,10 +1064,20 @@ static struct option long_options[] = { {"shared-umem", no_argument, 0, 'M'}, {"force", no_argument, 0, 'F'}, {"duration", required_argument, 0, 'd'}, + {"clock", required_argument, 0, 'w'}, {"batch-size", required_argument, 0, 'b'}, {"tx-pkt-count", required_argument, 0, 'C'}, {"tx-pkt-size", required_argument, 0, 's'}, {"tx-pkt-pattern", required_argument, 0, 'P'}, + {"tx-vlan", no_argument, 0, 'V'}, + {"tx-vlan-id", required_argument, 0, 'J'}, + {"tx-vlan-pri", required_argument, 0, 'K'}, + {"tx-dmac", required_argument, 0, 'G'}, + {"tx-smac", required_argument, 0, 'H'}, + {"tx-cycle", required_argument, 0, 'T'}, + {"tstamp", no_argument, 0, 'y'}, + {"policy", required_argument, 0, 'W'}, + {"schpri", required_argument, 0, 'U'}, {"extra-stats", no_argument, 0, 'x'}, {"quiet", no_argument, 0, 'Q'}, {"app-stats", no_argument, 0, 'a'}, @@ -940,6 +1101,7 @@ static void usage(const char *prog) " -S, --xdp-skb=n Use XDP skb-mod\n" " -N, --xdp-native=n Enforce XDP native mode\n" " -n, --interval=n Specify statistics update interval (default 1 sec).\n" + " -O, --retries=n Specify time-out retries (1s interval) attempt (default 3).\n" " -z, --zero-copy Force zero-copy mode.\n" " -c, --copy Force copy mode.\n" " -m, --no-need-wakeup Turn off use of driver need wakeup flag.\n" @@ -949,6 +1111,7 @@ static void usage(const char *prog) " -F, --force Force loading the XDP prog\n" " -d, --duration=n Duration in secs to run command.\n" " Default: forever.\n" + " -w, --clock=CLOCK Clock NAME (default MONOTONIC).\n" " -b, --batch-size=n Batch size for sending or receiving\n" " packets. Default: %d\n" " -C, --tx-pkt-count=n Number of packets to send.\n" @@ -957,6 +1120,15 @@ static void usage(const char *prog) " (Default: %d bytes)\n" " Min size: %d, Max size %d.\n" " -P, --tx-pkt-pattern=nPacket fill pattern. Default: 0x%x\n" + " -V, --tx-vlan Send VLAN tagged packets (For -t|--txonly)\n" + " -J, --tx-vlan-id=n Tx VLAN ID [1-4095]. Default: %d (For -V|--tx-vlan)\n" + " -K, --tx-vlan-pri=n Tx VLAN Priority [0-7]. Default: %d (For -V|--tx-vlan)\n" + " -G, --tx-dmac=<MAC> Dest MAC addr of TX frame in aa:bb:cc:dd:ee:ff format (For -V|--tx-vlan)\n" + " -H, --tx-smac=<MAC> Src MAC addr of TX frame in aa:bb:cc:dd:ee:ff format (For -V|--tx-vlan)\n" + " -T, --tx-cycle=n Tx cycle time in micro-seconds (For -t|--txonly).\n" + " -y, --tstamp Add time-stamp to packet (For -t|--txonly).\n" + " -W, --policy=POLICY Schedule policy. Default: SCHED_OTHER\n" + " -U, --schpri=n Schedule priority. Default: %d\n" " -x, --extra-stats Display extra statistics.\n" " -Q, --quiet Do not display any stats.\n" " -a, --app-stats Display application (syscall) statistics.\n" @@ -966,7 +1138,9 @@ static void usage(const char *prog) "\n"; fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE, opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE, - XSK_UMEM__DEFAULT_FRAME_SIZE, opt_pkt_fill_pattern); + XSK_UMEM__DEFAULT_FRAME_SIZE, opt_pkt_fill_pattern, + VLAN_VID__DEFAULT, VLAN_PRI__DEFAULT, + SCHED_PRI__DEFAULT); exit(EXIT_FAILURE); } @@ -978,7 +1152,8 @@ static void parse_command_line(int argc, char **argv) opterr = 0; for (;;) { - c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:BR", + c = getopt_long(argc, argv, + "Frtli:q:pSNn:w:O:czf:muMd:b:C:s:P:VJ:K:G:H:T:yW:U:xQaI:BR", long_options, &option_index); if (c == -1) break; @@ -1012,6 +1187,17 @@ static void parse_command_line(int argc, char **argv) case 'n': opt_interval = atoi(optarg); break; + case 'w': + if (get_clockid(&opt_clock, optarg)) { + fprintf(stderr, + "ERROR: Invalid clock %s. Default to CLOCK_MONOTONIC.\n", + optarg); + opt_clock = CLOCK_MONOTONIC; + } + break; + case 'O': + opt_retries = atoi(optarg); + break; case 'z': opt_xdp_bind_flags |= XDP_ZEROCOPY; break; @@ -1059,6 +1245,49 @@ static void parse_command_line(int argc, char **argv) case 'P': opt_pkt_fill_pattern = strtol(optarg, NULL, 16); break; + case 'V': + opt_vlan_tag = true; + break; + case 'J': + opt_pkt_vlan_id = atoi(optarg); + break; + case 'K': + opt_pkt_vlan_pri = atoi(optarg); + break; + case 'G': + if (!ether_aton_r(optarg, + (struct ether_addr *)&opt_txdmac)) { + fprintf(stderr, "Invalid dmac address:%s\n", + optarg); + usage(basename(argv[0])); + } + break; + case 'H': + if (!ether_aton_r(optarg, + (struct ether_addr *)&opt_txsmac)) { + fprintf(stderr, "Invalid smac address:%s\n", + optarg); + usage(basename(argv[0])); + } + break; + case 'T': + opt_tx_cycle_ns = atoi(optarg); + opt_tx_cycle_ns *= NSEC_PER_USEC; + break; + case 'y': + opt_tstamp = 1; + break; + case 'W': + if (get_schpolicy(&opt_schpolicy, optarg)) { + fprintf(stderr, + "ERROR: Invalid policy %s. Default to SCHED_OTHER.\n", + optarg); + opt_schpolicy = SCHED_OTHER; + } + break; + case 'U': + opt_schprio = atoi(optarg); + break; case 'x': opt_extra_stats = 1; break; @@ -1264,16 +1493,22 @@ static void rx_drop_all(void) } } -static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size) +static int tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, + int batch_size, unsigned long tx_ns) { - u32 idx; + u32 idx, tv_sec, tv_usec; unsigned int i; while (xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx) < batch_size) { complete_tx_only(xsk, batch_size); if (benchmark_done) - return; + return 0; + } + + if (opt_tstamp) { + tv_sec = (u32)(tx_ns / NSEC_PER_SEC); + tv_usec = (u32)((tx_ns % NSEC_PER_SEC) / 1000); } for (i = 0; i < batch_size; i++) { @@ -1281,6 +1516,21 @@ static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size) idx + i); tx_desc->addr = (*frame_nb + i) * opt_xsk_frame_size; tx_desc->len = PKT_SIZE; + + if (opt_tstamp) { + struct pktgen_hdr *pktgen_hdr; + u64 addr = tx_desc->addr; + char *pkt; + + pkt = xsk_umem__get_data(xsk->umem->buffer, addr); + pktgen_hdr = (struct pktgen_hdr *)(pkt + PKTGEN_HDR_OFFSET); + + pktgen_hdr->seq_num = htonl(sequence++); + pktgen_hdr->tv_sec = htonl(tv_sec); + pktgen_hdr->tv_usec = htonl(tv_usec); + + hex_dump(pkt, PKT_SIZE, addr); + } } xsk_ring_prod__submit(&xsk->tx, batch_size); @@ -1289,6 +1539,8 @@ static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size) *frame_nb += batch_size; *frame_nb %= NUM_FRAMES; complete_tx_only(xsk, batch_size); + + return batch_size; } static inline int get_batch_size(int pkt_cnt) @@ -1315,23 +1567,48 @@ static void complete_tx_only_all(void) pending = !!xsks[i]->outstanding_tx; } } - } while (pending); + sleep(1); + } while (pending && opt_retries-- > 0); } static void tx_only_all(void) { struct pollfd fds[MAX_SOCKS] = {}; u32 frame_nb[MAX_SOCKS] = {}; + unsigned long next_tx_ns = 0; int pkt_cnt = 0; int i, ret; + if (opt_poll && opt_tx_cycle_ns) { + fprintf(stderr, + "Error: --poll and --tx-cycles are both set\n"); + return; + } + for (i = 0; i < num_socks; i++) { fds[0].fd = xsk_socket__fd(xsks[i]->xsk); fds[0].events = POLLOUT; } + if (opt_tx_cycle_ns) { + /* Align Tx time to micro-second boundary */ + next_tx_ns = (get_nsecs() / NSEC_PER_USEC + 1) * + NSEC_PER_USEC; + next_tx_ns += opt_tx_cycle_ns; + + /* Initialize periodic Tx scheduling variance */ + tx_cycle_diff_min = 1000000000; + tx_cycle_diff_max = 0; + tx_cycle_diff_ave = 0.0; + } + while ((opt_pkt_count && pkt_cnt < opt_pkt_count) || !opt_pkt_count) { int batch_size = get_batch_size(pkt_cnt); + unsigned long tx_ns = 0; + struct timespec next; + int tx_cnt = 0; + long diff; + int err; if (opt_poll) { for (i = 0; i < num_socks; i++) @@ -1344,13 +1621,43 @@ static void tx_only_all(void) continue; } + if (opt_tx_cycle_ns) { + next.tv_sec = next_tx_ns / NSEC_PER_SEC; + next.tv_nsec = next_tx_ns % NSEC_PER_SEC; + err = clock_nanosleep(opt_clock, TIMER_ABSTIME, &next, NULL); + if (err) { + if (err != EINTR) + fprintf(stderr, + "clock_nanosleep failed. Err:%d errno:%d\n", + err, errno); + break; + } + + /* Measure periodic Tx scheduling variance */ + tx_ns = get_nsecs(); + diff = tx_ns - next_tx_ns; + if (diff < tx_cycle_diff_min) + tx_cycle_diff_min = diff; + + if (diff > tx_cycle_diff_max) + tx_cycle_diff_max = diff; + + tx_cycle_diff_ave += (double)diff; + tx_cycle_cnt++; + } else if (opt_tstamp) { + tx_ns = get_nsecs(); + } + for (i = 0; i < num_socks; i++) - tx_only(xsks[i], &frame_nb[i], batch_size); + tx_cnt += tx_only(xsks[i], &frame_nb[i], batch_size, tx_ns); - pkt_cnt += batch_size; + pkt_cnt += tx_cnt; if (benchmark_done) break; + + if (opt_tx_cycle_ns) + next_tx_ns += opt_tx_cycle_ns; } if (opt_pkt_count) @@ -1581,6 +1888,7 @@ int main(int argc, char **argv) struct __user_cap_data_struct data[2] = { { 0 } }; struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; bool rx = false, tx = false; + struct sched_param schparam; struct xsk_umem_info *umem; struct bpf_object *obj; int xsks_map_fd = 0; @@ -1643,6 +1951,9 @@ int main(int argc, char **argv) apply_setsockopt(xsks[i]); if (opt_bench == BENCH_TXONLY) { + if (opt_tstamp && opt_pkt_size < PKTGEN_SIZE_MIN) + opt_pkt_size = PKTGEN_SIZE_MIN; + gen_eth_hdr_data(); for (i = 0; i < NUM_FRAMES; i++) @@ -1682,6 +1993,16 @@ int main(int argc, char **argv) prev_time = get_nsecs(); start_time = prev_time; + /* Configure sched priority for better wake-up accuracy */ + memset(&schparam, 0, sizeof(schparam)); + schparam.sched_priority = opt_schprio; + ret = sched_setscheduler(0, opt_schpolicy, &schparam); + if (ret) { + fprintf(stderr, "Error(%d) in setting priority(%d): %s\n", + errno, opt_schprio, strerror(errno)); + goto out; + } + if (opt_bench == BENCH_RXDROP) rx_drop_all(); else if (opt_bench == BENCH_TXONLY) @@ -1689,6 +2010,7 @@ int main(int argc, char **argv) else l2fwd_all(); +out: benchmark_done = true; if (!opt_quiet) diff --git a/samples/bpf/xsk_fwd.c b/samples/bpf/xsk_fwd.c index 1cd97c84c337..52e7c4ffd228 100644 --- a/samples/bpf/xsk_fwd.c +++ b/samples/bpf/xsk_fwd.c @@ -27,6 +27,9 @@ #include <bpf/xsk.h> #include <bpf/bpf.h> +/* libbpf APIs for AF_XDP are deprecated starting from v0.7 */ +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" + #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) typedef __u64 u64; |