linux-next.git/net/core/dev.c, branch master

Merge branch 'master' of git://git.code.sf.net/p/tomoyo/tomoyo.git

2026-07-03T15:21:11+00:00

net: add "struct dst_entry" debugging

2026-06-29T11:36:46+00:00

This change is not for upstream. This change is for linux-next only.

syzbot is reporting "struct dst_entry" leaks.

  unregister_netdevice: waiting for lo to become free. Usage count = 2
  ref_tracker: netdev@ffff88807a79a630 has 1/1 users at
       __netdev_tracker_alloc include/linux/netdevice.h:4418 [inline]
       netdev_hold include/linux/netdevice.h:4447 [inline]
       dst_init+0xe6/0x490 net/core/dst.c:52
       dst_alloc+0x12a/0x170 net/core/dst.c:94
       rt_dst_alloc net/ipv4/route.c:1651 [inline]
       __mkroute_output net/ipv4/route.c:2655 [inline]
       (...snipped...)

Let's try to report all trace hold/release calls.

Signed-off-by: Tetsuo Handa

net: update dev_put()/dev_hold() debugging

2026-06-29T11:26:36+00:00

This change is not for upstream. This change is for linux-next only.

syzbot is still reporting

  unregister_netdevice: waiting for DEV to become free

problem. Since commit 4c6c11ea0f7b ("net: refine dev_put()/dev_hold() debugging") is not sufficient for me, let's try to report all locations which called dev_put()/dev_hold(), with a hope that we can find some hints for locations where dev_put() is missing.

Signed-off-by: Tetsuo Handa

xfrm: propagate -EINPROGRESS from validate_xmit_xfrm()

2026-06-26T06:13:54+00:00

validate_xmit_xfrm() returns NULL both when a packet is dropped and when it is stolen by async crypto (-EINPROGRESS from ->xmit()). Callers cannot distinguish the two cases.

f53c723902d1 ("net: Add asynchronous callbacks for xfrm on layer 2.") changed the semantics of a NULL return from "dropped" to "stolen or dropped", but __dev_queue_xmit() was not updated. On virtual/bridge interfaces (noqueue qdisc) __dev_queue_xmit() initialises rc=-ENOMEM and jumps to out: when skb is NULL, returning -ENOMEM to the caller even though the packet will be delivered correctly via xfrm_dev_resume().

Return ERR_PTR(-EINPROGRESS) from validate_xmit_xfrm() for the async case so callers can tell it apart from a real drop. Update __dev_queue_xmit() to handle ERR_PTR(-EINPROGRESS) from validate_xmit_skb() correctly. Update validate_xmit_skb_list() to use IS_ERR_OR_NULL() so that ERR_PTR(-EINPROGRESS) is not mistakenly added to the transmitted list.

Fixes: f53c723902d1 ("net: Add asynchronous callbacks for xfrm on layer 2.")
Suggested-by: Sabrina Dubroca
Signed-off-by: Petr Wozniak
Signed-off-by: Steffen Klassert
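The three-way return convention described above can be sketched in userspace C. This is only an illustration, not the patch itself: the ERR_PTR helpers are simplified stand-ins for the kernel's include/linux/err.h, and `struct pkt`, `validate()`, `xmit_one()` and the `verdict` enum are hypothetical names. Treating the "stolen" case as success (0) is an assumption of this sketch; the commit message only says the case is handled "correctly".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

struct pkt { int id; };              /* hypothetical stand-in for struct sk_buff */

enum verdict { PASS, DROP, STOLEN }; /* hypothetical: outcome of validation */

/* Hypothetical validate step with three distinguishable outcomes. */
static struct pkt *validate(struct pkt *p, enum verdict v)
{
	switch (v) {
	case DROP:
		return NULL;                  /* packet really dropped */
	case STOLEN:
		return ERR_PTR(-EINPROGRESS); /* async crypto owns the packet now */
	default:
		return p;                     /* ready to transmit */
	}
}

/* Hypothetical caller: only a real drop is reported as an error. */
static int xmit_one(struct pkt *p, enum verdict v)
{
	struct pkt *out = validate(p, v);

	if (IS_ERR(out))
		return PTR_ERR(out) == -EINPROGRESS ? 0 : (int)PTR_ERR(out);
	if (!out)
		return -ENOMEM;
	return 0;
}
```

A list-building caller would skip any result for which `IS_ERR_OR_NULL()` is true, so that neither a dropped nor a stolen packet ends up on the list of packets to transmit.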

vlan: defer real device state propagation to netdev_work

2026-06-25T17:18:40+00:00

vlan_device_event() generates nested UP/DOWN, MTU and feature change events. It executes an event for the VLAN device directly from the notifier - while the locks of the lower device are held. This causes deadlocks, for example:

    bond   (3) bond_update_speed_duplex(vlan)
                          |
      ^                   v
    vlan   (2) UP(vlan)   (4) vlan_ethtool_get_link_ksettings()
                          |
      ^                   v
    dummy  (1) UP(dummy)  (5) __ethtool_get_link_ksettings()

The dummy device is ops locked, vlan creates a nested event (2), then bond wants to ask vlan for link state (3). bond uses the "I'm already holding the instance lock" flavor of API. But in this case the lock held refers to vlan itself. We hit vlan's link settings trampoline (4) and call __ethtool_get_link_ksettings() which tries to lock dummy. Deadlock.

There's no clean way for us to tell the vlan_ethtool_get_link_ksettings() that the caller is already in lower device's critical section.

Defer the propagation to the per-netdev work facility instead: the notifier only schedules netdev_work_sched(vlandev, VLAN_WORK_*), and ndo_work (vlan_dev_work) applies the change later.

Hopefully nobody expects the VLAN state changes to be instantaneous. If someone does expect the changes to be instantaneous we will have to do the same thing Stan did for rx_mode and "strategically" place sync calls, to make sure such delayed works are executed after we drop the ops lock but before we drop rtnl_lock.

Stan suggests that if we need that down the line we may consider reshaping the mechanism into "async notifications". AFAICT only vlan does this sort of netdev open chaining, so as a first try I think that sticking the complexity into the vlan code makes sense.

One corner case is that we need to cancel the event if user explicitly changes the state before work could run. Consider the following operations with vlan0 on top of dummy0:

    ip link set dev dummy0 up    # queues work to up vlan0
    ip link set dev vlan0 down   # user explicitly downs the vlan
    ndo_work                     # acts on the stale event

Reported-by: syzbot+09da62a8b78959ceb8bb@syzkaller.appspotmail.com
Reported-by: syzbot+cb67c392b0b8f0fd0fc1@syzkaller.appspotmail.com
Reported-by: syzbot+9bb8bd77f3966641f298@syzkaller.appspotmail.com
Fixes: 9f275c2e9020 ("net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked")
Reviewed-by: Kuniyuki Iwashima
Reviewed-by: Nicolai Buchwitz
Acked-by: Stanislav Fomichev
Link: https://patch.msgid.link/20260624182018.2445732-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski
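The stale-event corner case can be modelled with a small state machine. This is a minimal sketch under stated assumptions, not the vlan code: `struct vlan_model`, `lower_changed()`, `user_set()` and `run_work()` are hypothetical names, and a single pending slot stands in for whatever bookkeeping the real VLAN_WORK_* events use.

```c
#include <stdbool.h>

/* Hypothetical: the one deferred request that may be outstanding. */
enum pending { PENDING_NONE, PENDING_UP, PENDING_DOWN };

struct vlan_model {
	bool up;            /* current administrative state of the vlan */
	enum pending work;  /* request queued by the notifier, if any */
};

/* Notifier path: the lower device changed state; only record the request. */
static void lower_changed(struct vlan_model *v, bool lower_up)
{
	v->work = lower_up ? PENDING_UP : PENDING_DOWN;
}

/* User path: an explicit state change wins and cancels any queued event. */
static void user_set(struct vlan_model *v, bool up)
{
	v->work = PENDING_NONE;
	v->up = up;
}

/* Deferred path: apply whatever is still pending, then clear it. */
static void run_work(struct vlan_model *v)
{
	if (v->work == PENDING_UP)
		v->up = true;
	else if (v->work == PENDING_DOWN)
		v->up = false;
	v->work = PENDING_NONE;
}
```

Replaying the sequence from the commit message (`lower_changed(&v, true)`, then `user_set(&v, false)`, then `run_work(&v)`) leaves the model down, because the user's explicit change cancelled the queued UP before the work ran.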

net: turn the rx_mode work into a generic netdev_work facility

2026-06-25T17:18:40+00:00

The rx_mode update runs from a workqueue: drivers have their ndo_set_rx_mode_async() callback executed by a single global work item under RTNL and ops lock. This is a useful pattern. Support multiple "events" that need to be serviced and make RX_MODE sync the first one.

Call the events "core" because later on we will let drivers define and schedule their own.

Reviewed-by: Kuniyuki Iwashima
Acked-by: Stanislav Fomichev
Link: https://patch.msgid.link/20260624182018.2445732-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski
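One common way to let a single work item service several kinds of events is a per-device pending bitmask. The sketch below assumes that design purely for illustration; the commit message does not say how events are stored, and `struct dev_model`, `work_sched()`, `work_run()` and the `WORK_*` constants are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical event numbers; RX_MODE is "the first one" per the text. */
enum { WORK_RX_MODE = 0, WORK_OTHER = 1, WORK_MAX };

struct dev_model {
	uint32_t pending;       /* one bit per event awaiting service */
	int handled[WORK_MAX];  /* how many times each event was serviced */
};

/* Mark an event pending; scheduling it twice before a run coalesces. */
static void work_sched(struct dev_model *d, unsigned int ev)
{
	d->pending |= UINT32_C(1) << ev;
}

/* Service every pending event once; returns the number serviced. */
static int work_run(struct dev_model *d)
{
	uint32_t todo = d->pending;
	int serviced = 0;

	d->pending = 0;
	for (unsigned int ev = 0; ev < WORK_MAX; ev++) {
		if (todo & (UINT32_C(1) << ev)) {
			d->handled[ev]++;
			serviced++;
		}
	}
	return serviced;
}
```

In this model, scheduling `WORK_RX_MODE` twice and `WORK_OTHER` once before a run services each event exactly once; a second run with nothing pending does no work.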

net: serialize netif_running() check in enqueue_to_backlog()

2026-06-16T22:42:53+00:00

Syzbot reported a KASAN slab-use-after-free in fib_rules_lookup(). The root cause is a race condition where packets can escape the backlog flushing during device unregistration (e.g., during netns exit).

Commit e9e4dd3267d0 ("net: do not process device backlog during unregistration") introduced a lockless netif_running() check in enqueue_to_backlog() to prevent queuing packets to an unregistering device.

However, this creates a TOCTOU race window. A lockless transmitter (like veth_xmit) can pass the check before dev_close() clears IFF_UP. If the transmitter is then delayed, flush_all_backlogs() can run and finish before the transmitter grabs the backlog lock and queues the packet. The packet then escapes the flush and triggers UAF later when processed.

Fix this by moving the netif_running() check inside the backlog lock. This serializes the check with the flush work (which also grabs the lock). We then either queue the packet before the flush runs (so it gets flushed), or check netif_running() after the flush/close completes (so it gets dropped).

Fixes: e9e4dd3267d0 ("net: do not process device backlog during unregistration")
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a315824.b0403584.28d0ff.0000.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet
Cc: Julian Anastasov
Reviewed-by: Kuniyuki Iwashima
Link: https://patch.msgid.link/20260616141317.407791-1-edumazet@google.com
Signed-off-by: Jakub Kicinski
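The "check inside the lock" pattern can be sketched in userspace with a pthread mutex. This is a simplified model, not the kernel code: `struct backlog`, `enqueue()` and `close_and_flush()` are hypothetical names, an atomic flag stands in for netif_running(), and a counter stands in for the per-CPU backlog queue.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one device and its backlog. */
struct backlog {
	pthread_mutex_t lock;
	atomic_bool running;  /* stands in for netif_running(dev) */
	int queued;           /* packets currently sitting in the backlog */
	int dropped;          /* packets refused because the device was down */
};

/* Test the flag only while holding the lock that the flush also takes. */
static bool enqueue(struct backlog *b)
{
	bool ok;

	pthread_mutex_lock(&b->lock);
	ok = atomic_load(&b->running);
	if (ok)
		b->queued++;
	else
		b->dropped++;
	pthread_mutex_unlock(&b->lock);
	return ok;
}

/* Clear the flag first, then flush under the lock; returns packets flushed. */
static int close_and_flush(struct backlog *b)
{
	int flushed;

	atomic_store(&b->running, false);  /* like dev_close() clearing IFF_UP */
	pthread_mutex_lock(&b->lock);
	flushed = b->queued;
	b->queued = 0;
	pthread_mutex_unlock(&b->lock);
	return flushed;
}
```

Because the flag is cleared before the flush takes the lock, an `enqueue()` that wins the lock first has its packet flushed, and one that takes the lock after the flush sees the cleared flag and drops the packet. Either way nothing is left queued once `close_and_flush()` returns.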

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2026-06-16T21:59:58+00:00

Merge in late fixes in preparation for the net-next PR.

Conflicts:

net/tls/tls_sw.c
  406e8a651a7b ("net: skmsg: preserve sg.copy across SG transforms")
  79511603a65b ("tls: remove dead sockmap (psock) handling from the SW path")

drivers/net/ethernet/microsoft/mana/mana_en.c
  f8fd56977eeea ("net: mana: guard TX wq object destroy with INVALID_MANA_HANDLE check")
  d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
  https://lore.kernel.org/ajAPXu-C_PuTgV-a@sirena.org.uk

No adjacent changes.

Signed-off-by: Jakub Kicinski

net: watchdog: fix refcount tracking races

2026-06-13T00:34:57+00:00

Blamed commit converted the untracked dev_hold()/dev_put() calls in the watchdog code to use the tracked dev_hold_track()/dev_put_track() (which were later renamed/interfaced to netdev_hold() and netdev_put()).

By introducing dev->watchdog_dev_tracker to store the reference tracking information without adding synchronization between netdev_watchdog_up() and dev_watchdog(), it enabled the race condition where this pointer could be overwritten or freed concurrently, leading to the list corruption crash syzbot reported:

list_del corruption, ffff888114a18c00->next is NULL
kernel BUG at lib/list_debug.c:52!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 91 Comm: kworker/u8:5 Not tainted syzkaller #0 PREEMPT(lazy)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Workqueue: events_unbound linkwatch_event
RIP: 0010:__list_del_entry_valid_or_report.cold+0x22/0x2a lib/list_debug.c:52
Call Trace:
  __list_del_entry_valid include/linux/list.h:132 [inline]
  __list_del_entry include/linux/list.h:246 [inline]
  list_move_tail include/linux/list.h:341 [inline]
  ref_tracker_free+0x1a7/0x6c0 lib/ref_tracker.c:329
  netdev_tracker_free include/linux/netdevice.h:4491 [inline]
  netdev_put include/linux/netdevice.h:4508 [inline]
  netdev_put include/linux/netdevice.h:4504 [inline]
  netdev_watchdog_down net/sched/sch_generic.c:600 [inline]
  dev_deactivate_many+0x28c/0xfe0 net/sched/sch_generic.c:1363
  dev_deactivate+0x109/0x1d0 net/sched/sch_generic.c:1397
  linkwatch_do_dev net/core/link_watch.c:184 [inline]
  linkwatch_do_dev+0xd3/0x120 net/core/link_watch.c:166
  __linkwatch_run_queue+0x3a5/0x810 net/core/link_watch.c:240
  linkwatch_event+0x8f/0xc0 net/core/link_watch.c:314
  process_one_work+0xa0e/0x1980 kernel/workqueue.c:3314
  process_scheduled_works kernel/workqueue.c:3397 [inline]
  worker_thread+0x5ef/0xe50 kernel/workqueue.c:3478
  kthread+0x370/0x450 kernel/kthread.c:436
  ret_from_fork+0x69a/0xc80 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

This patch has three coordinated parts:

1) Add dev->watchdog_lock and dev->watchdog_ref_held to serialize watchdog operations.

2) Remove netdev_watchdog_up() call from netif_carrier_on(): This ensures netdev_watchdog_up() is only called from process/BH context (via linkwatch workqueue dev_activate()), allowing us to use spin_lock_bh() for synchronization.

3) Synchronize watchdog up and watchdog timer: Protect netdev_watchdog_up() with tx_global_lock and watchdog_lock. Only allocate a new tracker in netdev_watchdog_up() if one is not already present. In dev_watchdog(), ensure we don't release the tracker if the timer was rescheduled either by dev_watchdog() itself or concurrently by netdev_watchdog_up().

Fixes: f12bf6f3f942 ("net: watchdog: add net device refcount tracker")
Reported-by: syzbot+381d82bbf0253710b35d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a26b751.c25708ab.1b19ef.0013.GAE@google.com/T/#u
Tested-by: syzbot+3479efbc2821cb2a79f2@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet
Link: https://patch.msgid.link/20260611152737.2580480-1-edumazet@google.com
Signed-off-by: Jakub Kicinski

net: add retry mechanism to ndo_set_rx_mode_async

2026-06-10T01:15:30+00:00

When ndo_set_rx_mode_async returns an error, schedule a retry with exponential backoff (1s, 2s, 4s, 8s -- 15s total). Give up after the 4th retry and log an error via netdev_err(). This moves retry logic from individual drivers into the core stack.

Timer callback does not hold a ref on dev. Safe because the timer can only be armed when dev is IFF_UP, and __dev_close_many runs timer_delete_sync before clearing IFF_UP. Unregister always closes IFF_UP devices first, so by the time dev can be freed the timer is dead and cannot be re-armed.

Reviewed-by: Jakub Kicinski
Signed-off-by: Stanislav Fomichev
Link: https://patch.msgid.link/20260608154014.227538-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski
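The backoff schedule quoted above (1s, 2s, 4s, 8s, then give up) works out as follows. This is a small sketch of the arithmetic only; `RX_MODE_MAX_RETRIES`, `retry_delay_secs()` and `total_backoff_secs()` are hypothetical names, and the real code presumably works in jiffies with a kernel timer rather than plain seconds.

```c
/* Hypothetical constant: the text says to give up after the 4th retry. */
#define RX_MODE_MAX_RETRIES 4

/*
 * Delay in seconds before retry number `attempt` (1-based).
 * Returns 0 when no further retry should be scheduled.
 */
static unsigned int retry_delay_secs(unsigned int attempt)
{
	if (attempt == 0 || attempt > RX_MODE_MAX_RETRIES)
		return 0;
	return 1u << (attempt - 1);  /* doubles each time: 1, 2, 4, 8 */
}

/* Sum of all retry delays, i.e. the worst-case time before giving up. */
static unsigned int total_backoff_secs(void)
{
	unsigned int total = 0;

	for (unsigned int i = 1; i <= RX_MODE_MAX_RETRIES; i++)
		total += retry_delay_secs(i);
	return total;
}
```

The four delays sum to 1 + 2 + 4 + 8 = 15 seconds, which matches the "15s total" in the commit message.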