diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 19:06:09 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 19:06:09 -0800 |
commit | 8efd0d9c316af470377894a6a0f9ff63ce18c177 (patch) | |
tree | 65d00bf8c7fd8f938a42d38e44bad11d4cf08664 /lib | |
parent | 9bcbf894b6872216ef61faf17248ec234e3db6bc (diff) | |
parent | 8aaaf2f3af2ae212428f4db1af34214225f5cec3 (diff) | |
download | lwn-8efd0d9c316af470377894a6a0f9ff63ce18c177.tar.gz lwn-8efd0d9c316af470377894a6a0f9ff63ce18c177.zip |
Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Defer freeing TCP skbs to the BH handler, whenever possible, or at
least perform the freeing outside of the socket lock section to
decrease cross-CPU allocator work and improve latency.
- Add netdevice refcount tracking to locate sources of netdevice and
net namespace refcount leaks.
- Make Tx watchdog less intrusive - avoid pausing Tx and restarting
all queues from a single CPU removing latency spikes.
- Various small optimizations throughout the stack from Eric Dumazet.
- Make netdev->dev_addr[] constant, force modifications to go via
appropriate helpers to allow us to keep addresses in ordered data
structures.
- Replace unix_table_lock with per-hash locks, improving performance
of bind() calls.
- Extend skb drop tracepoint with a drop reason.
- Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
BPF
---
- New helpers:
- bpf_find_vma(), find and inspect VMAs for profiling use cases
- bpf_loop(), runtime-bounded loop helper trading some execution
time for much faster (if at all converging) verification
- bpf_strncmp(), improve performance, avoid compiler flakiness
- bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
for tracing programs, all inlined by the verifier
- Support BPF relocations (CO-RE) in the kernel loader.
- Further the support for BTF_TYPE_TAG annotations.
- Allow access to local storage in sleepable helpers.
- Convert verifier argument types to a composable form with different
attributes which can be shared across types (ro, maybe-null).
- Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
creating new, extensible ones where missing and deprecating those
to be removed.
Protocols
---------
- WiFi (mac80211/cfg80211):
- notify user space about long "come back in N" AP responses,
allow it to react to such temporary rejections
- allow non-standard VHT MCS 10/11 rates
- use coarse time in airtime fairness code to save CPU cycles
- Bluetooth:
- rework of HCI command execution serialization to use a common
queue and work struct, and improve handling errors reported in
the middle of a batch of commands
- rework HCI event handling to use skb_pull_data, avoiding packet
parsing pitfalls
- support AOSP Bluetooth Quality Report
- SMC:
- support net namespaces, following the RDMA model
- improve connection establishment latency by pre-clearing buffers
- introduce TCP ULP for automatic redirection to SMC
- Multi-Path TCP:
- support ioctls: SIOCINQ, OUTQ, and OUTQNSD
- support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
- support cmsgs: TCP_INQ
- improvements in the data scheduler (assigning data to subflows)
- support fastclose option (quick shutdown of the full MPTCP
connection, similar to TCP RST in regular TCP)
- MCTP (Management Component Transport) over serial, as defined by
DMTF spec DSP0253 - "MCTP Serial Transport Binding".
Driver API
----------
- Support timestamping on bond interfaces in active/passive mode.
- Introduce generic phylink link mode validation for drivers which
don't have any quirks and where MAC capability bits fully express
what's supported. Allow PCS layer to participate in the validation.
Convert a number of drivers.
- Add support to set/get size of buffers on the Rx rings and size of
the tx copybreak buffer via ethtool.
- Support offloading TC actions as first-class citizens rather than
only as attributes of filters, improve sharing and device resource
utilization.
- WiFi (mac80211/cfg80211):
- support forwarding offload (ndo_fill_forward_path)
- support for background radar detection hardware
- SA Query Procedures offload on the AP side
New hardware / drivers
----------------------
- tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
real-time requirements for isochronous communication with protocols
like OPC UA Pub/Sub.
- Qualcomm BAM-DMUX WWAN - driver for data channels of modems
integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
(qcom_bam_dmux).
- Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
with support for bridging, VLANs and multicast forwarding
(lan966x).
- iwlmei driver for co-operating between Intel's WiFi driver and
Intel's Active Management Technology (AMT) devices.
- mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
- Bluetooth:
- MediaTek MT7921 SDIO devices
- Foxconn MT7922A
- Realtek RTL8852AE
Drivers
-------
- Significantly improve performance in the datapaths of: lan78xx,
ax88179_178a, lantiq_xrx200, bnxt.
- Intel Ethernet NICs:
- igb: support PTP/time PEROUT and EXTTS SDP functions on
82580/i354/i350 adapters
- ixgbevf: new PF -> VF mailbox API which avoids the risk of
mailbox corruption with ESXi
- iavf: support configuration of VLAN features of finer
granularity, stacked tags and filtering
- ice: PTP support for new E822 devices with sub-ns precision
- ice: support firmware activation without reboot
- Mellanox Ethernet NICs (mlx5):
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- support TC forwarding when tunnel encap and decap happen between
two ports of the same NIC
- dynamically size and allow disabling various features to save
resources for running in embedded / SmartNIC scenarios
- Broadcom Ethernet NICs (bnxt):
- use page frag allocator to improve Rx performance
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- Other Ethernet NICs:
- amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
- Microsoft cloud/virtual NIC (mana):
- add XDP support (PASS, DROP, TX)
- Mellanox Ethernet switches (mlxsw):
- initial support for Spectrum-4 ASICs
- VxLAN with IPv6 underlay
- Marvell Ethernet switches (prestera):
- support flower flow templates
- add basic IP forwarding support
- NXP embedded Ethernet switches (ocelot & felix):
- support Per-Stream Filtering and Policing (PSFP)
- enable cut-through forwarding between ports by default
- support FDMA to improve packet Rx/Tx to CPU
- Other embedded switches:
- hellcreek: improve trapping management (STP and PTP) packets
- qca8k: support link aggregation and port mirroring
- Qualcomm 802.11ax WiFi (ath11k):
- qca6390, wcn6855: enable 802.11 power save mode in station mode
- BSS color change support
- WCN6855 hw2.1 support
- 11d scan offload support
- scan MAC address randomization support
- full monitor mode, only supported on QCN9074
- qca6390/wcn6855: report signal and tx bitrate
- qca6390: rfkill support
- qca6390/wcn6855: regdb.bin support
- Intel WiFi (iwlwifi):
- support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
in cooperation with the BIOS
- support for Optimized Connectivity Experience (OCE) scan
- support firmware API version 68
- lots of preparatory work for the upcoming Bz device family
- MediaTek WiFi (mt76):
- Specific Absorption Rate (SAR) support
- mt7921: 160 MHz channel support
- RealTek WiFi (rtw88):
- Specific Absorption Rate (SAR) support
- scan offload
- Other WiFi NICs
- ath10k: support fetching (pre-)calibration data from nvmem
- brcmfmac: configure keep-alive packet on suspend
- wcn36xx: beacon filter support"
* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
tcp: tcp_send_challenge_ack delete useless param `skb`
net/qla3xxx: Remove useless DMA-32 fallback configuration
rocker: Remove useless DMA-32 fallback configuration
hinic: Remove useless DMA-32 fallback configuration
lan743x: Remove useless DMA-32 fallback configuration
net: enetc: Remove useless DMA-32 fallback configuration
cxgb4vf: Remove useless DMA-32 fallback configuration
cxgb4: Remove useless DMA-32 fallback configuration
cxgb3: Remove useless DMA-32 fallback configuration
bnx2x: Remove useless DMA-32 fallback configuration
et131x: Remove useless DMA-32 fallback configuration
be2net: Remove useless DMA-32 fallback configuration
vmxnet3: Remove useless DMA-32 fallback configuration
bna: Simplify DMA setting
net: alteon: Simplify DMA setting
myri10ge: Simplify DMA setting
qlcnic: Simplify DMA setting
net: allwinner: Fix print format
page_pool: remove spinlock in page_pool_refill_alloc_cache()
amt: fix wrong return type of amt_send_membership_update()
...
Diffstat (limited to 'lib')
-rw-r--r-- | lib/Kconfig | 5 | ||||
-rw-r--r-- | lib/Kconfig.debug | 15 | ||||
-rw-r--r-- | lib/Makefile | 4 | ||||
-rw-r--r-- | lib/objagg.c | 7 | ||||
-rw-r--r-- | lib/ref_tracker.c | 140 | ||||
-rw-r--r-- | lib/test_bpf.c | 4 | ||||
-rw-r--r-- | lib/test_ref_tracker.c | 115 |
7 files changed, 282 insertions, 8 deletions
diff --git a/lib/Kconfig b/lib/Kconfig index 5e7165e6a346..655b0e43f260 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -680,6 +680,11 @@ config STACK_HASH_ORDER Select the hash size as a power of 2 for the stackdepot hash table. Choose a lower value to reduce the memory impact. +config REF_TRACKER + bool + depends on STACKTRACE_SUPPORT + select STACKDEPOT + config SBITMAP bool diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 5e14e32056ad..c77fe36bb3d8 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -599,6 +599,11 @@ config DEBUG_MISC Say Y here if you need to enable miscellaneous debug code that should be under a more specific debug option but isn't. +menu "Networking Debugging" + +source "net/Kconfig.debug" + +endmenu # "Networking Debugging" menu "Memory Debugging" @@ -2107,6 +2112,16 @@ config BACKTRACE_SELF_TEST Say N if you are unsure. +config TEST_REF_TRACKER + tristate "Self test for reference tracker" + depends on DEBUG_KERNEL && STACKTRACE_SUPPORT + select REF_TRACKER + help + This option provides a kernel module performing tests + using reference tracker infrastructure. + + Say N if you are unsure. + config RBTREE_TEST tristate "Red-Black tree test" depends on DEBUG_KERNEL diff --git a/lib/Makefile b/lib/Makefile index 364c23f15578..b213a7bbf3fd 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -101,7 +101,7 @@ obj-$(CONFIG_TEST_LOCKUP) += test_lockup.o obj-$(CONFIG_TEST_HMM) += test_hmm.o obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o - +obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o # # CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS @@ -270,6 +270,8 @@ obj-$(CONFIG_STACKDEPOT) += stackdepot.o KASAN_SANITIZE_stackdepot.o := n KCOV_INSTRUMENT_stackdepot.o := n +obj-$(CONFIG_REF_TRACKER) += ref_tracker.o + libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \ fdt_empty_tree.o fdt_addresses.o $(foreach file, $(libfdt_files), \ diff --git a/lib/objagg.c b/lib/objagg.c index 5e1676ccdadd..1e248629ed64 100644 --- a/lib/objagg.c +++ b/lib/objagg.c @@ -781,7 +781,6 @@ static struct objagg_tmp_graph *objagg_tmp_graph_create(struct objagg *objagg) struct objagg_tmp_node *node; struct objagg_tmp_node *pnode; struct objagg_obj *objagg_obj; - size_t alloc_size; int i, j; graph = kzalloc(sizeof(*graph), GFP_KERNEL); @@ -793,9 +792,7 @@ static struct objagg_tmp_graph *objagg_tmp_graph_create(struct objagg *objagg) goto err_nodes_alloc; graph->nodes_count = nodes_count; - alloc_size = BITS_TO_LONGS(nodes_count * nodes_count) * - sizeof(unsigned long); - graph->edges = kzalloc(alloc_size, GFP_KERNEL); + graph->edges = bitmap_zalloc(nodes_count * nodes_count, GFP_KERNEL); if (!graph->edges) goto err_edges_alloc; @@ -833,7 +830,7 @@ err_nodes_alloc: static void objagg_tmp_graph_destroy(struct objagg_tmp_graph *graph) { - kfree(graph->edges); + bitmap_free(graph->edges); kfree(graph->nodes); kfree(graph); } diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c new file mode 100644 index 000000000000..0ae2e66dcf0f --- /dev/null +++ b/lib/ref_tracker.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include <linux/export.h> +#include <linux/ref_tracker.h> +#include <linux/slab.h> +#include <linux/stacktrace.h> +#include <linux/stackdepot.h> + +#define REF_TRACKER_STACK_ENTRIES 16 + +struct ref_tracker { + struct list_head head; /* anchor into dir->list or dir->quarantine */ + bool dead; + depot_stack_handle_t alloc_stack_handle; + depot_stack_handle_t free_stack_handle; +}; + +void ref_tracker_dir_exit(struct ref_tracker_dir *dir) +{ + struct ref_tracker *tracker, *n; + unsigned long flags; + bool leak = false; + + spin_lock_irqsave(&dir->lock, flags); + list_for_each_entry_safe(tracker, n, &dir->quarantine, head) { + list_del(&tracker->head); + kfree(tracker); + dir->quarantine_avail++; + } + list_for_each_entry_safe(tracker, n, &dir->list, head) { + pr_err("leaked reference.\n"); + if (tracker->alloc_stack_handle) + stack_depot_print(tracker->alloc_stack_handle); + leak = true; + list_del(&tracker->head); + kfree(tracker); + } + spin_unlock_irqrestore(&dir->lock, flags); + WARN_ON_ONCE(leak); + WARN_ON_ONCE(refcount_read(&dir->untracked) != 1); +} +EXPORT_SYMBOL(ref_tracker_dir_exit); + +void ref_tracker_dir_print(struct ref_tracker_dir *dir, + unsigned int display_limit) +{ + struct ref_tracker *tracker; + unsigned long flags; + unsigned int i = 0; + + spin_lock_irqsave(&dir->lock, flags); + list_for_each_entry(tracker, &dir->list, head) { + if (i < display_limit) { + pr_err("leaked reference.\n"); + if (tracker->alloc_stack_handle) + stack_depot_print(tracker->alloc_stack_handle); + i++; + } else { + break; + } + } + spin_unlock_irqrestore(&dir->lock, flags); +} +EXPORT_SYMBOL(ref_tracker_dir_print); + +int ref_tracker_alloc(struct ref_tracker_dir *dir, + struct ref_tracker **trackerp, + gfp_t gfp) +{ + unsigned long entries[REF_TRACKER_STACK_ENTRIES]; + struct ref_tracker *tracker; + unsigned int nr_entries; + unsigned long flags; + + *trackerp = tracker = kzalloc(sizeof(*tracker), gfp | __GFP_NOFAIL); + if (unlikely(!tracker)) { + pr_err_once("memory allocation failure, unreliable refcount tracker.\n"); + refcount_inc(&dir->untracked); + return -ENOMEM; + } + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1); + nr_entries = filter_irq_stacks(entries, nr_entries); + tracker->alloc_stack_handle = stack_depot_save(entries, nr_entries, gfp); + + spin_lock_irqsave(&dir->lock, flags); + list_add(&tracker->head, &dir->list); + spin_unlock_irqrestore(&dir->lock, flags); + return 0; +} +EXPORT_SYMBOL_GPL(ref_tracker_alloc); + +int ref_tracker_free(struct ref_tracker_dir *dir, + struct ref_tracker **trackerp) +{ + unsigned long entries[REF_TRACKER_STACK_ENTRIES]; + struct ref_tracker *tracker = *trackerp; + depot_stack_handle_t stack_handle; + unsigned int nr_entries; + unsigned long flags; + + if (!tracker) { + refcount_dec(&dir->untracked); + return -EEXIST; + } + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1); + nr_entries = filter_irq_stacks(entries, nr_entries); + stack_handle = stack_depot_save(entries, nr_entries, GFP_ATOMIC); + + spin_lock_irqsave(&dir->lock, flags); + if (tracker->dead) { + pr_err("reference already released.\n"); + if (tracker->alloc_stack_handle) { + pr_err("allocated in:\n"); + stack_depot_print(tracker->alloc_stack_handle); + } + if (tracker->free_stack_handle) { + pr_err("freed in:\n"); + stack_depot_print(tracker->free_stack_handle); + } + spin_unlock_irqrestore(&dir->lock, flags); + WARN_ON_ONCE(1); + return -EINVAL; + } + tracker->dead = true; + + tracker->free_stack_handle = stack_handle; + + list_move_tail(&tracker->head, &dir->quarantine); + if (!dir->quarantine_avail) { + tracker = list_first_entry(&dir->quarantine, struct ref_tracker, head); + list_del(&tracker->head); + } else { + dir->quarantine_avail--; + tracker = NULL; + } + spin_unlock_irqrestore(&dir->lock, flags); + + kfree(tracker); + return 0; +} +EXPORT_SYMBOL_GPL(ref_tracker_free); diff --git a/lib/test_bpf.c b/lib/test_bpf.c index adae39567264..0c5cb2d6436a 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -14683,7 +14683,7 @@ static struct tail_call_test tail_call_tests[] = { BPF_EXIT_INSN(), }, .flags = FLAG_NEED_STATE | FLAG_RESULT_IN_STATE, - .result = (MAX_TAIL_CALL_CNT + 1 + 1) * MAX_TESTRUNS, + .result = (MAX_TAIL_CALL_CNT + 1) * MAX_TESTRUNS, }, { "Tail call count preserved across function calls", @@ -14705,7 +14705,7 @@ static struct tail_call_test tail_call_tests[] = { }, .stack_depth = 8, .flags = FLAG_NEED_STATE | FLAG_RESULT_IN_STATE, - .result = (MAX_TAIL_CALL_CNT + 1 + 1) * MAX_TESTRUNS, + .result = (MAX_TAIL_CALL_CNT + 1) * MAX_TESTRUNS, }, { "Tail call error path, NULL target", diff --git a/lib/test_ref_tracker.c b/lib/test_ref_tracker.c new file mode 100644 index 000000000000..19d7dec70cc6 --- /dev/null +++ b/lib/test_ref_tracker.c @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Referrence tracker self test. + * + * Copyright (c) 2021 Eric Dumazet <edumazet@google.com> + */ +#include <linux/init.h> +#include <linux/module.h> +#include <linux/delay.h> +#include <linux/ref_tracker.h> +#include <linux/slab.h> +#include <linux/timer.h> + +static struct ref_tracker_dir ref_dir; +static struct ref_tracker *tracker[20]; + +#define TRT_ALLOC(X) static noinline void \ + alloctest_ref_tracker_alloc##X(struct ref_tracker_dir *dir, \ + struct ref_tracker **trackerp) \ + { \ + ref_tracker_alloc(dir, trackerp, GFP_KERNEL); \ + } + +TRT_ALLOC(1) +TRT_ALLOC(2) +TRT_ALLOC(3) +TRT_ALLOC(4) +TRT_ALLOC(5) +TRT_ALLOC(6) +TRT_ALLOC(7) +TRT_ALLOC(8) +TRT_ALLOC(9) +TRT_ALLOC(10) +TRT_ALLOC(11) +TRT_ALLOC(12) +TRT_ALLOC(13) +TRT_ALLOC(14) +TRT_ALLOC(15) +TRT_ALLOC(16) +TRT_ALLOC(17) +TRT_ALLOC(18) +TRT_ALLOC(19) + +#undef TRT_ALLOC + +static noinline void +alloctest_ref_tracker_free(struct ref_tracker_dir *dir, + struct ref_tracker **trackerp) +{ + ref_tracker_free(dir, trackerp); +} + + +static struct timer_list test_ref_tracker_timer; +static atomic_t test_ref_timer_done = ATOMIC_INIT(0); + +static void test_ref_tracker_timer_func(struct timer_list *t) +{ + ref_tracker_alloc(&ref_dir, &tracker[0], GFP_ATOMIC); + atomic_set(&test_ref_timer_done, 1); +} + +static int __init test_ref_tracker_init(void) +{ + int i; + + ref_tracker_dir_init(&ref_dir, 100); + + timer_setup(&test_ref_tracker_timer, test_ref_tracker_timer_func, 0); + mod_timer(&test_ref_tracker_timer, jiffies + 1); + + alloctest_ref_tracker_alloc1(&ref_dir, &tracker[1]); + alloctest_ref_tracker_alloc2(&ref_dir, &tracker[2]); + alloctest_ref_tracker_alloc3(&ref_dir, &tracker[3]); + alloctest_ref_tracker_alloc4(&ref_dir, &tracker[4]); + alloctest_ref_tracker_alloc5(&ref_dir, &tracker[5]); + alloctest_ref_tracker_alloc6(&ref_dir, &tracker[6]); + alloctest_ref_tracker_alloc7(&ref_dir, &tracker[7]); + alloctest_ref_tracker_alloc8(&ref_dir, &tracker[8]); + alloctest_ref_tracker_alloc9(&ref_dir, &tracker[9]); + alloctest_ref_tracker_alloc10(&ref_dir, &tracker[10]); + alloctest_ref_tracker_alloc11(&ref_dir, &tracker[11]); + alloctest_ref_tracker_alloc12(&ref_dir, &tracker[12]); + alloctest_ref_tracker_alloc13(&ref_dir, &tracker[13]); + alloctest_ref_tracker_alloc14(&ref_dir, &tracker[14]); + alloctest_ref_tracker_alloc15(&ref_dir, &tracker[15]); + alloctest_ref_tracker_alloc16(&ref_dir, &tracker[16]); + alloctest_ref_tracker_alloc17(&ref_dir, &tracker[17]); + alloctest_ref_tracker_alloc18(&ref_dir, &tracker[18]); + alloctest_ref_tracker_alloc19(&ref_dir, &tracker[19]); + + /* free all trackers but first 0 and 1. */ + for (i = 2; i < ARRAY_SIZE(tracker); i++) + alloctest_ref_tracker_free(&ref_dir, &tracker[i]); + + /* Attempt to free an already freed tracker. */ + alloctest_ref_tracker_free(&ref_dir, &tracker[2]); + + while (!atomic_read(&test_ref_timer_done)) + msleep(1); + + /* This should warn about tracker[0] & tracker[1] being not freed. */ + ref_tracker_dir_exit(&ref_dir); + + return 0; +} + +static void __exit test_ref_tracker_exit(void) +{ +} + +module_init(test_ref_tracker_init); +module_exit(test_ref_tracker_exit); + +MODULE_LICENSE("GPL v2"); |