summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
44 hoursReapply "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"Jakub Kicinski
This reverts commit 850d9248d2eac662f869c766a598c877690c74e5. This reapplies commit 325eb217e41f ("bnxt_en: bring back rtnl_lock() in the bnxt_open() path"). Breno reports a lockdep warning in bnxt. During FW reset the driver may end up calling netif_set_real_num_tx_queues() (if queue count changes), so calls to bnxt_open() still require rtnl_lock. net/sched/sch_generic.c:1416 suspicious rcu_dereference_protected() usage! dev_qdisc_change_real_num_tx+0x54/0xe0 netif_set_real_num_tx_queues+0x4ed/0xa80 __bnxt_open_nic+0x9cb/0x3490 bnxt_open+0x1cb/0x370 bnxt_fw_reset_task+0x80d/0x1e80 process_scheduled_works+0x9c1/0x13b0 The reverted commit was just an optimization / experiment so let's go back to taking the lock. Reported-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/ah726OtFX-Qw3U-R@gmail.com Fixes: 850d9248d2ea ("Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260603195845.2574426-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
44 hoursbonding: annotate data-races arcound churn variablesEric Dumazet
These fields are updated asynchronously by the bonding state machine in ad_churn_machine() while holding bond->mode_lock. bond_info_show_slave() and bond_fill_slave_info() read them without bond->mode_lock being held, we need to add READ_ONCE() and WRITE_ONCE() annotations. Note that AD_CHURN_MONITOR, AD_CHURN, and AD_NO_CHURN are defined exclusively in (kernel private) include/net/bond_3ad.h header. They should be moved to include/uapi/linux/if_bonding.h or userspace tools will have to hardcode their values. Fixes: 4916f2e2f3fc ("bonding: print churn state via netlink") Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260603123514.388226-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
44 hoursrtase: Avoid sleeping in get_stats64()Justin Lai
The .ndo_get_stats64 callback must not sleep because it can be called when reading /proc/net/dev. rtase_get_stats64() calls rtase_dump_tally_counter(), which polls the tally counter dump bit with read_poll_timeout(). This may sleep while waiting for the hardware counter dump to complete. Use read_poll_timeout_atomic() instead to avoid sleeping in the get_stats64() path. Fixes: 079600489960 ("rtase: Implement net_device_ops") Cc: stable@vger.kernel.org Signed-off-by: Justin Lai <justinlai0215@realtek.com> Link: https://patch.msgid.link/20260603061816.31356-1-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
44 hoursvxlan: vnifilter: fix spurious notification on VNI updateAndy Roulin
When a VNI is re-added with the same attributes (e.g. same group or no group), vxlan_vni_update() sends a spurious RTM_NEWTUNNEL notification even though nothing changed. The bug is that 'if (changed)' tests whether the pointer is non-NULL, not the bool value it points to. Since every caller passes a valid pointer, the condition is always true and the notification fires unconditionally. Fix by dereferencing the pointer: 'if (*changed)'. Reproducer: # ip link add vxlan100 type vxlan dstport 4789 local 10.0.0.1 \ nolearning external vnifilter # ip link set vxlan100 up # bridge monitor vni & # bridge vni add vni 1000 dev vxlan100 # bridge vni add vni 1000 dev vxlan100 # spurious notification Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260602185138.253265-3-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
44 hoursvxlan: vnifilter: send notification on VNI addAndy Roulin
When a new VNI is added to a vxlan device with vnifilter enabled, no RTM_NEWTUNNEL notification is sent to userspace. This means 'bridge monitor vni' never shows VNI add events, even though VNI delete events are reported correctly. The bug is in vxlan_vni_add(), where the notification is guarded by 'if (changed)'. The 'changed' flag is set by vxlan_vni_update_group() only when the multicast group or remote IP is modified, but for a new VNI added without a group (e.g. in L3 VxLAN interface scenarios), the function returns early without setting changed=true. Since this is a new VNI, the notification should be sent unconditionally. The notification is not guarded by the return value of vxlan_vni_update_group() because, at this point, the VNI has already been inserted into the hash table and list with no rollback on error. The VNI will be visible in 'bridge vni show' regardless, so userspace should be informed. This is consistent with vxlan_vni_del() which also notifies unconditionally. The 'if (changed)' guard remains correct in vxlan_vni_update(), which handles the case where a VNI already exists and is being re-added -- there, we only want to notify if the group/remote actually changed. Reproducer: # ip link add vxlan100 type vxlan dstport 4789 local 10.0.0.1 \ nolearning external vnifilter # ip link set vxlan100 up # bridge monitor vni & # bridge vni add vni 1000 dev vxlan100 # no notification # bridge vni delete vni 1000 dev vxlan100 # notification received Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Reported-by: Chirag Shah <chirag@nvidia.com> Signed-off-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260602185138.253265-2-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
45 hoursrtase: Reset TX subqueue when clearing TX ringJustin Lai
rtase_tx_clear() clears the TX ring and resets the ring indexes. However, the TX queue state and BQL accounting are not reset at the same time. This may leave __QUEUE_STATE_STACK_XOFF asserted after rtase_sw_reset(), preventing new TX packets from being scheduled. Reset the TX subqueue when clearing the TX ring so the TX queue state and BQL accounting are restored together. Fixes: 5a2a2f15244c ("rtase: Implement the rtase_down function") Cc: stable@vger.kernel.org Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260602114659.12335-1-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
45 hoursocteontx2-af: npc: Fix CPT channel mask in npc_install_flowNithin Dabilpuram
Use the CPT-aware NIX channel mask in the npc_install_flow path so that when the host PF installs steering rules in kernel for a VF used from userspace (e.g. DPDK), MCAM entries see the same channel mask semantics as other RX paths. Fixes: 56bcef528bd8 ("octeontx2-af: Use npc_install_flow API for promisc and broadcast entries") Cc: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260602045853.1558530-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysnet: bonding: fix NULL pointer dereference in bond_do_ioctl()ZhaoJinming
In bond_do_ioctl(), slave_dev is obtained via __dev_get_by_name() which can return NULL if the requested interface name does not exist. However, the subsequent slave_dbg() call is placed before the NULL check: slave_dev = __dev_get_by_name(net, ifr->ifr_slave); slave_dbg(bond_dev, slave_dev, "slave_dev=%p:\n", slave_dev); //here if (!slave_dev) return -ENODEV; The slave_dbg() macro expands to netdev_dbg(bond_dev, "(slave %s): " fmt, (slave_dev)->name, ...) which unconditionally dereferences slave_dev->name before the NULL check is performed. This results in a NULL pointer dereference kernel oops when a user calls bonding ioctl (e.g. SIOCBONDENSLAVE, SIOCBONDRELEASE, etc.) with a non-existent slave interface name. This is reachable from userspace via the bonding ioctl interface with CAP_NET_ADMIN capability, making it a potential local denial-of-service vector. Fix by moving the slave_dbg() call after the NULL check. Fixes: e2a7420df2e0 ("bonding/main: convert to using slave printk macros") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com> Link: https://patch.msgid.link/20260601085649.4029067-1-zhaojinming@uniontech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 daysgeneve: fix length used in GRO hint UDP checksum adjustmentAntoine Tenart
In geneve_post_decap_hint the length used for adjusting the UDP checksum should be 'skb->len - gro_hint->nested_tp_offset' (UDP length) instead of 'skb->len - gro_hint->nested_nh_offset' (IP length). Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path") Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260521131436.748832-1-jhs%40mojatatu.com Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260529144713.780938-1-atenart@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 daysnet: ethernet: mtk_eth_soc: Fix use-after-free in metadata dst teardownLorenzo Bianconi
mtk_free_dev() calls metadata_dst_free() which frees the metadata_dst with kfree() immediately, bypassing the RCU grace period. In the RX path, skb_dst_set_noref() sets a non-refcounted pointer from the skb to the metadata_dst. This function requires RCU read-side protection and the dst must remain valid until all RCU readers complete. Since metadata_dst_free() calls kfree() directly, a use-after-free can occur if any skb still holds a noref pointer to the dst when the driver tears it down. Replace metadata_dst_free() with dst_release() which properly goes through the refcount path: when the refcount drops to zero, it schedules the actual free via call_rcu_hurry(), ensuring all RCU readers have completed before the memory is freed. Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260602-airoha-mtk-metadata-uaf-fix-v1-2-3aaa99d83351@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysnet: airoha: Fix use-after-free in metadata dst teardownLorenzo Bianconi
airoha_metadata_dst_free() runs metadata_dst_free() which frees the metadata_dst with kfree() immediately, bypassing the RCU grace period. In the RX path, skb_dst_set_noref() sets a non-refcounted pointer from the skb to the metadata_dst. This function requires RCU read-side protection and the dst must remain valid until all RCU readers complete. Since metadata_dst_free() calls kfree() directly, an use-after-free can occur if any skb still holds a noref pointer to the dst when the driver tears it down. Replace metadata_dst_free() with dst_release() which properly goes through the refcount path: when the refcount drops to zero, it schedules the actual free via call_rcu_hurry(), ensuring all RCU readers have completed before the memory is freed. Fixes: af3cf757d5c9 ("net: airoha: Move DSA tag in DMA descriptor") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260602-airoha-mtk-metadata-uaf-fix-v1-1-3aaa99d83351@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysMerge tag 'wireless-2026-06-03' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Things are finally quieting down: - iwlwifi: - FW reset handshake removal for older devices - NIC access fix in fast resume - avoid too large command for some BIOSes - fix TX power constraints in AP mode - cfg80211: - fix netlink parse overflow - fix potential 6 GHz scan memory leak - enforce HE/EHT consistency to avoid mac80211 crash - mac80211: guard radiotap antenna parsing * tag 'wireless-2026-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: cfg80211: enforce HE/EHT cap/oper consistency wifi: fix leak if split 6 GHz scanning fails wifi: mac80211: limit injected antenna index in ieee80211_parse_tx_radiotap wifi: nl80211: reject oversized EMA RNR lists wifi: iwlwifi: pcie: simplify the resume flow if fast resume is not used wifi: iwlwifi: mvm: avoid oversized UATS command copy wifi: iwlwifi: mld: send tx power constraints before link activation wifi: iwlwifi: mvm: don't support the reset handshake for old firmwares ==================== Link: https://patch.msgid.link/20260603113208.171874-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysocteontx2-af: Fix initialization of mcam's entry2target_pffunc fieldSuman Ghosh
NPC mcam entry stores a mapping between mcam entry and target pcifunc. During initialization of this field, API kmalloc_array has been used which caused some junk values to array. Whereas, the array is expected to be initialized by 0. This patch fixes the same by using kcalloc instead of kmalloc_array. Fixes: 55307fcb9258 ("octeontx2-af: Add mbox messages to install and delete MCAM rules") Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1780054625-17090-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysocteontx2-pf: Fix NDC sync operation errorsGeetha sowjanya
On system reboot "rvu_nicpf 0002:03:00.0: NDC sync operation failed" error messages are shown, even if the operations is successful. This is due to wrong if error check in ndc_syc() function. Fixes: 42c45ac1419c ("octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1780054677-17249-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysnet: sfp: initialize i2c_block_size at adapter configure timeJonas Jelonek
sfp->i2c_block_size is only assigned in sfp_sm_mod_probe(), which runs from the state machine timer after SFP_F_PRESENT has been set. Between those two points, sfp_module_eeprom() (the ethtool -m callback) gates only on SFP_F_PRESENT and can be entered with i2c_block_size still at its kzalloc'd value of 0. On a pure-I2C adapter, sfp_i2c_read() then issues an i2c_transfer() with msgs[1].len = 0 inside a loop that subtracts this_len from len each iteration; on adapters that succeed a zero-length read the loop never advances, spinning while holding rtnl_lock. This was previously addressed by initializing i2c_block_size in sfp_alloc() (commit 813c2dd78618), but the initialization was dropped when i2c_block_size was split from i2c_max_block_size. Initialize sfp->i2c_block_size from sfp->i2c_max_block_size in sfp_i2c_configure(), so the field is valid as soon as the adapter is known. sfp_sm_mod_probe() still reassigns it on each module insertion to recover from a per-module clamp to 1 (sfp_id_needs_byte_io). Fixes: 7662abf4db94 ("net: phy: sfp: Add support for SMBus module access") Cc: stable@vger.kernel.org Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Link: https://patch.msgid.link/20260528205242.971410-2-jelonek.jonas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysMerge tag 'iwlwifi-fixes-2026-05-31' of ↵Johannes Berg
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next wifi: iwlwifi: fixes - 2026-05-31 Miri Korenblit says: ==================== This contains a few fixes: - Don't grab nic access in non-fast-resume - Don't send a large hcmd than transport supports - In AP mode, don't send tx power constraints command before activating the link - Don't do sw reset handshake on older firmwares. ==================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 daysnet: fec: fix pinctrl default state restore order on resumeTapio Reijonen
In fec_resume(), fec_enet_clk_enable() is called before pinctrl_pm_select_default_state() in the non-WoL path, inverting the ordering used in fec_suspend() which correctly switches to the sleep pinctrl state before disabling clocks. For PHYs with the PHY_RST_AFTER_CLK_EN flag (e.g. TI DP83848 or SMSC LAN87xx), fec_enet_clk_enable() triggers a hardware reset pulse via the phy-reset GPIO. With the GPIO pin still in sleep pinctrl state at that point, the GPIO write has no physical effect and the PHY never receives the required reset after clock enable, leading to unreliable link establishment after system resume. Fix by restoring the default pinctrl state before enabling clocks, making resume the proper mirror of suspend. The call is made unconditionally: fec_suspend() only switches to the sleep pinctrl state on the non-WoL path and leaves the pins in the default state when WoL is enabled, so on a WoL resume the device is already in the default state and pinctrl_pm_select_default_state() is a no-op. Fixes: de40ed31b3c5 ("net: fec: add Wake-on-LAN support") Signed-off-by: Tapio Reijonen <tapio.reijonen@vaisala.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260529-b4-fec-resume-pinctrl-order-v3-1-6eda0f592fca@vaisala.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysnet: lan743x: permit VLAN-tagged packets up to configured MTUDavid Thompson
VLAN-tagged interfaces on lan743x devices were previously unreachable via SSH and failed to respond to large ping packets (e.g. "ping -s 1469" given MTU=1500). In these scenarios, "ethtool -S" reports non-zero "RX Oversize Frame Errors". According to Microchip AN2948, the MAC_RX FSE (VLAN field size enforcement) bit determines whether frames with VLAN tags exceeding the base MTU plus tag length are discarded. The driver must set the MAC_RX.FSE bit before setting MAC_RX.RXEN to allow VLAN-tagged frames up to the interface MTU, preventing them from being treated as oversized. As a result, both the base and VLAN-tagged interfaces can use the same MTU without receive errors. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Thangaraj Samynathan <Thangaraj.s@microchip.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Tested-by: Nicolai Buchwitz <nb@tipi-net.de> # lan7430 on arm64 (RevPi Link: https://patch.msgid.link/20260529210300.433135-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 dayspcnet32: stop holding device spin lock during napi_complete_doneOscar Maes
napi_complete_done may call gro_flush_normal (though not currently, as GRO is unsupported at the moment), which may result in packet TX. This will eventually result in calling pcnet32_start_xmit - resulting in a deadlock while trying to re-acquire the already locked spin lock. It is safe to split the spinlock block into two, because the hardware registers are still protected from concurrent access, and the two blocks perform unrelated operations that don't need to happen atomically. Fixes: 5b2ec6f2be51 ("pcnet32: use napi_complete_done()") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Oscar Maes <oscmaes92@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260528140320.5556-1-oscmaes92@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysnet/mlx5: Reorder completion before putting command entry in cmd_work_handlerNikolay Kuratov
Assuming callback != NULL && !page_queue, cmd_work_handler takes command entry with refcnt == 1 from mlx5_cmd_invoke. If either semaphore timeout or index allocation error happens, it does final cmd_ent_put(ent). To avoid access to freed memory, notify slotted completion before cmd_ent_put. This is theoretical issue found by Svace static analyser. Cc: stable@vger.kernel.org Fixes: 485d65e135712 ("net/mlx5: Add a timeout to acquire the command queue semaphore") Fixes: 0e2909c6bec90 ("net/mlx5: Fix variable not being completed when function returns") Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Reviewed-by: Md Haris Iqbal <haris.iqbal@linux.dev> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260526162932.501584-1-kniv@yandex-team.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 dayswifi: iwlwifi: pcie: simplify the resume flow if fast resume is not usedEmmanuel Grumbach
In most distributions, NetworkManager shuts the device down before entering system suspend, so fast suspend is typically not used. On older devices, resume currently tries to grab NIC access to infer whether the device was powered off while suspended. That probe is only meaningful for the fast-suspend path where the device is expected to remain alive. Unfortunately, for unclear reasons, grabbing NIC access was harmful as reported in the bugzilla ticket below. Workaround this issue by simply not grabbing NIC access if fast suspend is not used. Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221501 Assisted-by: GitHub Copilot:gpt-5.3-codex Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://patch.msgid.link/20260531133005.e2ed9e0cd44f.If283625983a843933e0c01561a421daff184e9e9@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
7 dayswifi: iwlwifi: mvm: avoid oversized UATS command copyEmmanuel Grumbach
MCC_ALLOWED_AP_TYPE_CMD exceeds the fixed copied host-command buffer and triggers warnings in the gen2 enqueue path when command 0xc05 is sent. Use IWL_HCMD_DFL_NOCOPY as it was done before the offending commit. Fixes: 078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v2") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260529085453.9af349ab459b.I348df3980764c15efce0099a35fe8a88fb2a6ee2@changeid
7 dayswifi: iwlwifi: mld: send tx power constraints before link activationPagadala Yesu Anjaneyulu
TX power constraints must be sent to the firmware before link activation. If not, the firmware will use default power values. Fix this by moving the iwl_mld_send_ap_tx_power_constraint_cmd() call from iwl_mld_start_ap_ibss() to iwl_mld_assign_vif_chanctx(), before iwl_mld_activate_link() for AP interfaces. Also update the guard in the function to allow it to run before link activation for AP interfaces. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260529085453.06c94b01efd2.Id43bdfe5eb030061c23348779687ba71b5f58182@changeid
7 dayswifi: iwlwifi: mvm: don't support the reset handshake for old firmwaresEmmanuel Grumbach
-77.ucode doesn't contain the fixes for this flow it seems. Don't use the firmware reset handshake even if the firmware claims support for it. Fixes: 906d4eb84408 ("iwlwifi: support firmware reset handshake") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220600 Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260529085453.9307b81d9b02.I21bba9e649f4cd0e35d3ea6cd97a03258be5832f@changeid
8 dayswireguard: send: append trailer after expanding headJason A. Donenfeld
With how this is currently written, we add the trailer, zero it out, and then add the header space on. If that header space requires a reallocation + copy, the zeros in the trailer aren't copied, because the skb len hasn't actually been yet expanded to cover that. Instead add the padding at the end of the process rather than at the beginning. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20260529173134.3080773-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysnet: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configurationFrank Wunderlich
Commit 8871389da151 introduces common pcs dts properties which writes rx=normal,tx=normal polarity to register SGMSYS_QPHY_WRAP_CTRL of switch. This is initialized with tx-bit set and so change inverts polarity compared to before. It looks like mt7531 has tx polarity inverted in hardware and set tx-bit by default to restore the normal polarity. The MT7531 datasheet quite clearly states: Register 000050EC QPHY_WRAP_CTRL -- QPHY wrapper control Reset value: 0x00000501 BIT 1 RX_BIT_POLARITY -- RX bit polarity control 1'b0: normal 1'b1: inverted BIT 0 TX_BIT_POLARITY -- TX bit polarity control (TX default inversed in MT7531) 1'b0: normal 1'b1: inverted Till this patch the register write was only called when mediatek,pnswap property was set which cannot be done for switch because the fw-node param was always NULL from switch driver in the mtk_pcs_lynxi_create call. Do not configure switch side like it's done before. Fixes: 8871389da151 ("net: pcs: pcs-mtk-lynxi: deprecate "mediatek,pnswap"") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260526153239.30194-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: mana: Skip redundant detach on already-detached portDipayaan Roy
When mana_per_port_queue_reset_work_handler() runs after a previous detach succeeded but attach failed, the port is left in a detached state with apc->tx_qp and apc->rxqs already freed. Calling mana_detach() again unconditionally leads to NULL pointer dereferences during queue teardown. Add an early exit in mana_detach() when the port is already in detached state (!netif_device_present) for non-close callers, making it safe to call idempotently. This allows the queue reset handler and other recovery paths to simply retry mana_attach() without redundant teardown. Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-3-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: mana: Add NULL guards in teardown path to prevent panic on attach failureDipayaan Roy
When queue allocation fails partway through, the error cleanup frees and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as mana_remove(), mana_change_mtu() recovery, and internal error handling in mana_alloc_queues() can subsequently call into functions that dereference these pointers without NULL checks: - mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26). - mana_destroy_vport() iterates apc->rxqs without a NULL check. - mana_fence_rqs() iterates apc->rxqs without a NULL check. - mana_dealloc_queues() iterates apc->tx_qp without a NULL check. Add NULL guards for apc->rxqs in mana_fence_rqs(), mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue draining when TX queues were never allocated or already freed. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-2-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysMerge tag 'net-7.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "This is again significantly bigger than the same point into the previous cycle, but at least smaller than last week. I'm not aware of any pending regression for the current cycle. Including fixes from netfilter. Current release - regressions: - netfilter: walk fib6_siblings under RCU Previous releases - regressions: - netlink: fix sending unassigned nsid after assigned one - bridge: fix sleep in atomic context in netlink path - sched: fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop - ipv4: fix net->ipv4.sysctl_local_reserved_ports UaF - eth: tun: free page on short-frame rejection in tun_xdp_one() Previous releases - always broken: - skbuff: fix missing zerocopy reference in pskb_carve helpers - handshake: drain pending requests at net namespace exit - ethtool: - rss: avoid modifying the RSS context response - module: avoid leaking a netdev ref on module flash errors - coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES - netfilter: fix dst corruption in same register operation - nfc: hci: fix out-of-bounds read in HCP header parsing - ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo() - eth: - vti: use ip6_tnl.net in vti6_changelink(). - vxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu()" * tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) dpll: zl3073x: make frequency monitor a per-device attribute dpll: zl3073x: use __dpll_device_change_ntf() and remove change_work dpll: export __dpll_device_change_ntf() for use under dpll_lock net/handshake: Drain pending requests at net namespace exit net/handshake: Verify file-reference balance in submit paths net/handshake: Close the submit-side sock_hold race net/handshake: hand off the pinned file reference to accept_doit net/handshake: Take a long-lived file reference at submit net/handshake: Pass negative errno through handshake_complete() nvme-tcp: store negative errno in queue->tls_err net/handshake: Use spin_lock_bh for hn_lock net: skbuff: fix missing zerocopy reference in pskb_carve helpers net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path net: hibmcge: disable Relaxed Ordering to fix RX packet corruption selftests/tc-testing: Add netem test case exercising loops selftests/tc-testing: Add mirred test cases exercising loops net/sched: act_mirred: Fix return code in early mirred redirect error paths net/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow net/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop net/sched: fix packet loop on netem when duplicate is on ...
9 daysnet: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX pathJijie Shao
The dma_rmb() barrier was placed before dma_sync_single_for_cpu(), which is incorrect. DMA sync must complete first to make the buffer accessible to the CPU, then the rmb barrier ensures subsequent descriptor reads observe the latest data written by the hardware. Reorder the operations so dma_sync_single_for_cpu() is called before dma_rmb() to guarantee the driver reads consistent data from the DMA buffer. Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260525144525.94884-3-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysnet: hibmcge: disable Relaxed Ordering to fix RX packet corruptionJijie Shao
When SMMU is disabled, the hibmcge driver may receive corrupted packets. The hardware writes packet data and descriptors to the same page, but with Relaxed Ordering enabled, PCI write transactions may not be strictly ordered. This can cause the driver to observe a valid descriptor before the corresponding packet data is fully written. Fix this by clearing PCI_EXP_DEVCTL_RELAX_EN in the PCI bridge control register to ensure strict write ordering between packet data and descriptors. Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260525144525.94884-2-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 daysbonding: refuse to enslave CAN devicesOliver Hartkopp
syzbot reported a kernel paging request crash in can_rx_unregister() inside net/can/af_can.c. The crash occurs because a virtual CAN device (vxcan) is being enslaved to a bonding master. During the enslavement process, the bonding driver mutates and modifies the network device states to fit an Ethernet-like aggregation model. However, CAN devices operate on a completely different Layer 2 architecture, relying on the CAN mid-layer private data structure (can_ml_priv) instead of standard Ethernet structures. Since bonding does not initialize or maintain these CAN structures, subsequent operations on the half-enslaved interface (such as closing associated sockets via isotp_release) lead to a null-pointer dereference when accessing the CAN receiver lists. Bonding CAN interfaces is architecturally invalid as CAN lacks MAC addresses, ARP capabilities, and standard Ethernet link-layer mechanisms. While generic loopback devices are blocked globally in net/core/dev.c, virtual CAN devices bypass this check because they do not carry the IFF_LOOPBACK flag, despite acting as local software-loopbacks. Fix this by explicitly blocking network devices of type ARPHRD_CAN from being enslaved at the very beginning of bond_enslave(). This prevents illegal state mutations, eliminates the resulting KASAN crashes, and avoids potential memory leaks from incomplete socket cleanups. As the CAN support has been added a long time after bonding the Fixes-tag points to the introduction of ARPHRD_CAN that would have needed a specific handling in bonding_main.c. Fixes: cd05acfe65ed ("[CAN]: Allocate protocol numbers for PF_CAN") Reported-by: syzbot+8ed98cbd0161632bce95@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8ed98cbd0161632bce95 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20260526-bonding-candev-v1-1-ba1df400918a@hartkopp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysvxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu()Eric Dumazet
skb_tunnel_check_pmtu() can change skb->head. Reusing old_iph afer skb_tunnel_check_pmtu() can cause an UAF. Use instead ip_hdr(skb) as done in drivers/net/bareudp.c and drivers/net/geneve.c. Found by Sashiko. Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Link: https://patch.msgid.link/20260525203642.2389723-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet: phy: air_en8811h: add AN8811HB MCU assert/deassert supportLucien.Jheng
AN8811HB needs a MCU soft-reset cycle before firmware loading begins. Assert the MCU (hold it in reset) and immediately deassert (release) via a dedicated PBUS register pair (0x5cf9f8 / 0x5cf9fc), accessed through a registered mdio_device at PHY-addr+8. Add __air_pbus_reg_write() as a low-level helper taking a struct mdio_device *, create and register the PBUS mdio_device in an8811hb_probe() and store it in priv->pbusdev, then implement an8811hb_mcu_assert() / _deassert() on top of it. Add an8811hb_remove() to unregister the PBUS device on teardown. Wire both calls into an8811hb_load_firmware() and en8811h_restart_mcu() so every firmware load or MCU restart on AN8811HB correctly sequences the reset control registers. Fixes: 5afda1d734ed ("net: phy: air_en8811h: add Airoha AN8811HB support") Signed-off-by: Lucien Jheng <lucienzx159@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260524063915.47961-1-lucienzx159@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 daysMerge tag 'mm-hotfixes-stable-2026-05-25-16-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes. 9 are for MM. 9 are cc:stable and the remaining 4 address post-7.1 issues or aren't considered suitable for backporting. All patches are singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-05-25-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "mm: introduce a new page type for page pool in page type" mm/vmalloc: do not trigger BUG() on BH disabled context MAINTAINERS, mailmap: change email for Eugen Hristev mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page kernel/fork: validate exit_signal in kernel_clone() mm: memcontrol: propagate NMI slab stats to memcg vmstats mm/damon/sysfs-schemes: delete tried region in regions_rmdirs() mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one zram: fix use-after-free in zram_writeback_endio memfd: deny writeable mappings when implying SEAL_WRITE ipc: limit next_id allocation to the valid ID range Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare" MAINTAINERS: .mailmap: update after GEHC spin-off
11 daysnet: team: fix NULL pointer dereference in team_xmit during mode changeWeiming Shi
__team_change_mode() clears team->ops with memset() before restoring safe dummy handlers via team_adjust_ops(). A concurrent team_xmit() running under RCU on another CPU can read team->ops.transmit during this window and call a NULL function pointer, crashing the kernel. The race requires a mode change (CAP_NET_ADMIN) concurrent with transmit on the team device. BUG: kernel NULL pointer dereference, address: 0000000000000000 Oops: 0010 [#1] SMP KASAN NOPTI RIP: 0010:0x0 Call Trace: team_xmit (drivers/net/team/team_core.c:1853) dev_hard_start_xmit (net/core/dev.c:3904) __dev_queue_xmit (net/core/dev.c:4871) packet_sendmsg (net/packet/af_packet.c:3109) __sys_sendto (net/socket.c:2265) The original code assumed that no ports means no traffic, so mode changes could freely memset()/memcpy() the ops. AF_PACKET with forced carrier breaks that assumption. Prevent the race instead of making it safe: replace memset()/memcpy() with per-field updates that never touch transmit or receive. Those two handlers are managed solely by team_adjust_ops(), which already installs dummies when tx_en_port_count == 0 (always true during mode change since no ports are present). WRITE_ONCE/READ_ONCE prevent store/load tearing on the handler pointers. synchronize_net() before exit_op() drains in-flight readers that may still reference old mode state from before port removal switched the handlers to dummies. Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260521081159.1491563-3-bestswngs@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 daysocteontx2-af: validate body pcifunc in rvu_mbox_handler_rep_event_notifyMichael Bommarito
rvu_mbox_handler_rep_event_notify() in drivers/net/ethernet/marvell/ octeontx2/af/rvu_rep.c queues a sender-controlled REP_EVENT_NOTIFY request body verbatim, and rvu_rep_up_notify() then forwards event->pcifunc (the nested body field, distinct from the AF-normalised header pcifunc) into rvu_get_pfvf(), rvu_get_pf() and the AF->PF mailbox device index without any bounds check. A VF attached to a PF that has been put into switchdev representor mode reaches this path: the VF mailbox handler otx2_pfvf_mbox_handler() forwards every message id including MBOX_MSG_REP_EVENT_NOTIFY to AF without an allowlist, and the AF dispatcher rewrites only msg->pcifunc, leaving struct rep_event::pcifunc attacker-controlled. The sibling rvu_mbox_handler_esw_cfg() refuses requests whose header pcifunc is not rvu->rep_pcifunc; this handler has no equivalent gate. An out-of-range body pcifunc selects an &rvu->pf[]/&rvu->hwvf[] element past the allocated array and, for RVU_EVENT_MAC_ADDR_CHANGE, turns into a six-byte attacker-chosen OOB ether_addr_copy() target inside the queued worker; KASAN reports a slab-out-of-bounds write in rvu_rep_wq_handler. Reject malformed requests at the handler entry by gating on is_pf_func_valid(), which is already the canonical PF/VF range check in this driver; expose it via rvu.h so callers in rvu_rep.c can use it instead of open-coding the same range arithmetic. Fixes: b8fea84a0468 ("octeontx2-pf: Add support to sync link state between representor and VFs") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260520154157.1439319-1-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysmacsec: fix replay protection at XPN lower-PN wrapJunrui Luo
In macsec_post_decrypt(), when pn is U32_MAX, pn + 1 overflows u32 to 0 and the first branch never fires. If next_pn_halves.lower is also in the upper half, pn_same_half(pn, lower) is true and the XPN else-if does not fire either, leaving next_pn_halves unchanged. An attacker that captures the legitimate frame carrying pn == 0xFFFFFFFF on an XPN association can then replay it indefinitely, since lowest_pn never rises above the captured pn and macsec_decrypt() reconstructs the same IV. Extend the XPN else-if to also fire when pn + 1 wraps to 0, so receipt of pn == U32_MAX advances next_pn_halves to (upper + 1, 0). Fixes: a21ecf0e0338 ("macsec: Support XPN frame handling - IEEE 802.1AEbw") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Link: https://patch.msgid.link/SYBPR01MB78813FD49E58F253B989F197AF012@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysnet/mlx5: HWS: Reject unsupported remove-header actionPrathamesh Deshpande
mlx5_cmd_hws_packet_reformat_alloc() handles MLX5_REFORMAT_TYPE_REMOVE_HDR by looking up a matching HWS remove-header action. If mlx5_fs_get_action_remove_header_vlan() returns NULL, the code only logs an error and continues. The function then returns success with a NULL HWS action stored in the packet-reformat object. Return an error when no matching remove-header action is available. Fixes: aecd9d1020e3 ("net/mlx5: fs, add HWS packet reformat API function") Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260506000054.51797-1-prathameshdeshpande7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-22tun: free page on build_skb failure in tun_xdp_one()Weiming Shi
When build_skb() fails in tun_xdp_one(), the function sets ret to -ENOMEM and jumps to the out label, which returns without freeing the page that vhost_net_build_xdp() allocated for the frame. As with the short-frame rejection path, tun_sendmsg() discards the per-buffer error and still returns total_len, so vhost_tx_batch() takes the success path and never frees the page. Each build_skb() failure in a batch leaks one page-frag chunk. Free the page before taking the error path, matching the put_page() the other error exits of tun_xdp_one() already perform. Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260521163312.1479805-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-22tap: free page on error paths in tap_get_user_xdp()Weiming Shi
tap_get_user_xdp() rejects a frame shorter than ETH_HLEN with -EINVAL, and returns -ENOMEM when build_skb() fails. Both paths jump to the err label without freeing the page that vhost_net_build_xdp() allocated for the frame. tap_sendmsg() discards the per-buffer return value and always returns 0, so vhost_tx_batch() takes the success path and never frees the page; each rejected frame in a batch leaks one page-frag chunk. Free the page on both error paths, before the skb is built. This is the tap counterpart of the same leak in tun_xdp_one(). Fixes: 0efac27791ee ("tap: accept an array of XDP buffs through sendmsg()") Fixes: ed7f2afdd0e0 ("tap: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260521163230.1478627-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-22tun: free page on short-frame rejection in tun_xdp_one()Weiming Shi
tun_xdp_one() returns -EINVAL on a frame shorter than ETH_HLEN without freeing the page that vhost_net_build_xdp() allocated for it. tun_sendmsg() discards that -EINVAL and still returns total_len, so vhost_tx_batch() takes the success path and never frees the page; each short frame in a batch leaks one page-frag chunk. A local process that can open /dev/net/tun and /dev/vhost-net can hit this path: it attaches a tun/tap device as the vhost-net backend and feeds TX descriptors whose length minus the virtio-net header is below ETH_HLEN. Each kick leaks the page-frag chunks for that batch, and a tight submission loop exhausts host memory and triggers an OOM panic. Free the page before returning -EINVAL, matching the XDP-program error path in the same function. Fixes: 049584807f1d ("tun: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260520160020.375349-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21Revert "mm: introduce a new page type for page pool in page type"Byungchul Park
This reverts commit db359fccf212 ("mm: introduce a new page type for page pool in page type") and a part of 735a309b4bfb9e ("net: add net_iov_init() and use it to initialize ->page_type"). Netpp page_type'ed pages might be used in mapping so as to use @_mapcount. However, since @page_type and @_mapcount are union'ed in struct page, these two can't be used at the same time. Revert the commit introducing page_type for Netpp for now. The patch will be retried once @page_type and @_mapcount get allowed to be used at the same time. The revert also includes removal of @page_type initialization part introduced by commit 735a309b4bfb9e ("net: add net_iov_init() and use it to initialize ->page_type"), which will be restored on the retry. Link: https://lore.kernel.org/20260515034701.17027-1-byungchul@sk.com Fixes: db359fccf212 ("mm: introduce a new page type for page pool in page type") Signed-off-by: Byungchul Park <byungchul@sk.com> Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Closes: https://lore.kernel.org/all/982b9bc1-0a0a-4fc5-8e3a-3672db2b29a1@nvidia.com Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Mark Bloch <mbloch@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Simon Horman <horms@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Toke Hoiland-Jorgensen <toke@redhat.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-21Merge tag 'wireless-2026-05-21' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a few more updates: - cfg80211/mac80211: - various security(-ish) fixes - fix A-MSDU subframe handling - fix multi-link element parsing - ath10: avoid sending commands to dead device - ath11k: - fix WMI buffer leaks on error conditions - fix UAF in RX MSDU coalesce path - allow peer ID 0 on RX path (legal for mobile devices) - reinitialize shared SRNG pointers on restart - ath12k: - fix 20 MHz-only parsing of EHT-MCS map - iwlwifi: - fix TSO segmentation explosion - don't TX to dead device - fix warning in WoWLAN - fix TX rates on old devices - disconnect on beacon loss only if also no other traffic - fill NULL-ptr deref - fix STEP_URM hardware access * tag 'wireless-2026-05-21' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (24 commits) wifi: cfg80211: wext: validate chandef in monitor mode wifi: mac80211: consume only present negotiated TTLM maps wifi: wilc1000: fix dma_buffer leak on bus acquire failure wifi: mac80211: capture fast-RX rate before mesh reuses skb->cb wifi: mac80211: fix multi-link element inheritance wifi: mac80211: fix MLE defragmentation wifi: mac80211: don't override max_amsdu_subframes wifi: mac80211: bounds-check link_id in ieee80211_ml_epcs wifi: ath12k: fix EHT TX MCS limitation due to wrong 20 MHz-only parsing wifi: ath11k: clear shared SRNG pointer state on restart wifi: ath11k: fix use after free in ath11k_dp_rx_msdu_coalesce() wifi: ath11k: fix peer resolution on rx path when peer_id=0 wifi: iwlwifi: mld: disconnect only after 6 beacons without Rx wifi: iwlwifi: mld: don't WARN on WoWLAN suspend w/o BSS vif wifi: iwlwifi: use correct function to read STEP_URM register wifi: iwlwifi: mvm: fix driver-set TX rates on old devices wifi: iwlwifi: mld: don't dereference a pointer before NULL checking it wifi: iwlwifi: mld: stop TX during firmware restart wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled wifi: ath10k: skip WMI and beacon transmission when device is wedged ... ==================== Link: https://patch.msgid.link/20260521152903.374070-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21net: enetc: avoid VF->PF mailbox timeout during SR-IOV teardownWei Fang
During SR-IOV teardown, enetc_msg_psi_free() disables the MR interrupt before pci_disable_sriov() removes the VFs. If a VF sends a mailbox message during this window, the PF cannot receive it, causing the VF to timeout waiting for a reply. Since the timeout occurs during SR-IOV teardown when the VF is about to be removed anyway, it has no functional impact on operation. However, more messages will be added in the future, some visible error logs may confuse users. So fix it by calling pci_disable_sriov() first to remove all VFs, then safely clean up the mailbox resources. This eliminates the race window where VFs could send messages to an unresponsive PF. Fixes: beb74ac878c8 ("enetc: Add vf to pf messaging support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260520064421.91569-10-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21net: enetc: fix init and teardown order to prevent use of unsafe resourcesWei Fang
Sashiko reported a potential issue in enetc_msg_psi_init() where the IRQ handler is registered before DMA resources are fully initialized [1]. The current initialization sequence is: 1. request_irq(enetc_msg_psi_msix) <- IRQ handler registered 2. INIT_WORK(&pf->msg_task, ...) <- work_struct initialized 3. enetc_msg_alloc_mbx() <- mailbox DMA allocated This ordering is unsafe because if a spurious interrupt or pending interrupt from a previous device state fires immediately after request_irq() returns, the registered ISR enetc_msg_psi_msix() will execute and unconditionally call: schedule_work(&pf->msg_task) At this point, pf->msg_task has not been initialized by INIT_WORK(), so the work_struct contains garbage values in its internal linked list pointers (work_struct->entry). Passing an uninitialized work_struct to schedule_work() could corrupt the kernel's workqueue linked lists, potentially leading to: - Kernel panic in __queue_work() - Memory corruption in workqueue data structures - System deadlock or undefined behavior Additionally, even if the work_struct was initialized, the mailbox DMA buffers (pf->rxmsg[]) may not yet be allocated when the work handler enetc_msg_task() runs, resulting in NULL pointer dereference. Fix by reordering the initialization sequence to ensure all resources are properly initialized before the interrupt handler can execute: 1. enetc_msg_alloc_mbx() <- Allocate all mailboxes 2. INIT_WORK(&pf->msg_task, ...) <- Initialize work first 3. request_irq(enetc_msg_psi_msix) <- Register IRQ last 4. Configure hardware & enable MR interrupts This guarantees that when enetc_msg_psi_msix() runs: - pf->msg_task is properly initialized (safe for schedule_work) - pf->rxmsg[] buffers are allocated (safe for work handler access) - Hardware is configured appropriately As the inverse of enetc_msg_psi_init(), enetc_msg_psi_free() also has similar problems. For example, if a pending interrupt fires between enetc_msg_free_mbx() and free_irq(), the ISR enetc_msg_psi_msix() may schedule the work handler again via schedule_work(), which could then access already-freed DMA buffers (pf->rxmsg[]), leading to use-after-free and potential memory corruption. Therefore, the order of enetc_msg_psi_free() is adjusted: 1. enetc_msg_disable_mr_int() <- Stop new interrupts first 2. free_irq() <- Ensure no IRQ handler can run 3. cancel_work_sync() <- Wait for any pending work 4. enetc_msg_disable_mr_int() <- Re-disable in case work re-enabled it 5. enetc_msg_free_mbx() <- Safe to free DMA buffers now Link: https://sashiko.dev/#/patchset/20260511080805.2052495-1-wei.fang%40nxp.com #1 Fixes: beb74ac878c8 ("enetc: Add vf to pf messaging support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260520064421.91569-9-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21net: enetc: fix unbounded loop and interrupt handling in VF-to-PF messagingWei Fang
The enetc_msg_task() function has several issues that need to be addressed: 1. Unbounded loop causing potential DoS: enetc_msg_task() processes VF-to-PF mailbox messages in an unbounded for(;;) loop that keeps polling ENETC_PSIMSGRR until no MR bits are set. A malicious guest VM can exploit this by continuously sending messages at a high rate - immediately sending a new message as soon as the PF acknowledges the previous one. Since the worker thread never yields or enforces a processing budget, the mr_mask check frequently evaluates to non-zero, causing the PF to spin indefinitely and starving other tasks. Fix this by replacing the unbounded loop with a single snapshot read at task entry. The task processes only the VFs whose MR bits were set at that point, then re-enables message interrupts before returning. This bounds work per invocation to at most num_vfs iterations. No messages are lost because the message interrupt is disabled in enetc_msg_psi_msix() before scheduling enetc_msg_task(), so any new messages arriving during processing will trigger a fresh interrupt once re-enabled, scheduling another task invocation. 2. Write order of ENETC_PSIIDR and ENETC_PSIMSGRR: Both ENETC_PSIIDR and ENETC_PSIMSGRR contain MR bits indicating messages have been received from VSIs, but only ENETC_PSIIDR trigger the CPU interrupt. Previously, ENETC_PSIMSGRR was written before ENETC_PSIIDR. Writing ENETC_PSIMSGRR returns the message code to the VSI in its upper 16 bits, signaling to the VF that message processing is complete and it may send the next message. If the VF sends a new message before ENETC_PSIIDR is written, the subsequent w1c write to ENETC_PSIIDR would inadvertently clear the MR bit set by the new message, causing the interrupt to be lost and the new message to go unprocessed. Therefore, write ENETC_PSIIDR first to clear the interrupt source, then write ENETC_PSIMSGRR to acknowledge the message to the VSI. 3. Check both ENETC_PSIMSGRR and ENETC_PSIIDR for mr_status: The write order change above introduces a potential race: if a VF sends a new message in the window between the ENETC_PSIIDR w1c and the ENETC_PSIMSGRR w1c, the ENETC_PSIMSGRR MR bit for the new message may not be set. If mr_status was derived solely from ENETC_PSIMSGRR, this message would never be detected despite ENETC_PSIIDR retaining its MR bit, leading to an unacknowledged interrupt storm. Fix this by computing mr_status as the union of both ENETC_PSIMSGRR and ENETC_PSIIDR MR bits, ensuring all pending messages are detected regardless of which register reflects the new message state. Additionally, rename the per-register MR macros (ENETC_PSI*_MR_MASK, ENETC_PSI*_MR) to register-agnostic names (ENETC_PSIMR_MASK, ENETC_PSIMR_BIT) since the MR bit layout is shared across ENETC_PSIMSGRR, ENETC_PSIIER, and ENETC_PSIIDR. Make the mask macro dynamic based on the actual number of active VFs rather than hardcoded. Fixes: beb74ac878c8 ("enetc: Add vf to pf messaging support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260520064421.91569-8-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21net: enetc: fix DMA write to freed memory in enetc_msg_free_mbx()Wei Fang
The teardown sequence in enetc_msg_psi_free() frees the DMA buffer before clearing the device's DMA address registers. If a VF sends a message or a pending DMA transfer completes within this window, the hardware will perform a DMA write into the kernel memory that has already been returned to the allocator. The result is silent memory corruption that can affect arbitrary kernel data structures. Therefore, clear the DMA address registers before the DMA buffer is freed. Fixes: beb74ac878c8 ("enetc: Add vf to pf messaging support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260520064421.91569-7-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21net: enetc: fix race condition in VF MAC address configurationWei Fang
Sashiko reported a potential race condition between the VF message handler and administrative VF MAC configuration from the host [1]. The VF message handler (enetc_msg_pf_set_vf_primary_mac_addr) runs asynchronously in a workqueue context and accesses vf_state->flags without any locking. Concurrently, the host can administratively change the VF MAC address via enetc_pf_set_vf_mac(), which executes under RTNL lock and modifies both vf_state->flags and hardware registers. This creates two race windows: 1) TOCTOU race on vf_state->flags: The check of ENETC_VF_FLAG_PF_SET_MAC and subsequent MAC programming are not atomic, allowing the flag state to change between check and use. 2) Torn MAC address writes: Hardware MAC programming requires multiple non-atomic register writes (__raw_writel for lower 32 bits and __raw_writew for upper 16 bits). Concurrent updates from VF mailbox and PF admin paths can interleave these operations, resulting in a corrupted MAC address being programmed into the hardware. Fix by introducing a per-VF mutex to serialize access to vf_state and hardware MAC register updates. Both enetc_pf_set_vf_mac() and enetc_msg_pf_set_vf_primary_mac_addr() now acquire this lock before accessing vf_state->flags or programming the MAC address, ensuring atomic read-modify-write sequences and preventing register write interleaving. Link: https://sashiko.dev/#/patchset/20260511080805.2052495-1-wei.fang%40nxp.com #1 Fixes: beb74ac878c8 ("enetc: Add vf to pf messaging support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260520064421.91569-6-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21net: enetc: fix TOCTOU race and validate VF MAC addressWei Fang
Sashiko reported that the PF driver accepts arbitrary MAC address from from VF mailbox messages without proper validation, creating a security vulnerability [1]. In enetc_msg_pf_set_vf_primary_mac_addr(), the MAC address is extracted directly from the message buffer (cmd->mac.sa_data) and programmed into hardware via pf->ops->set_si_primary_mac() without any validity checks. A malicious VF can configure a multicast, broadcast, or all-zero MAC address. Therefore, a validation to check the MAC address provided by VF is required. However, simply checking the MAC address is not enough, because it also has the potential TOCTOU race [2]: The code reads the MAC address from the DMA buffer to validate it via is_valid_ether_addr(), if validation passes, reads the same DMA buffer a second time when calling enetc_pf_set_primary_mac_addr() to program the hardware. A malicious VF can exploit this window by overwriting the MAC address in the DMA buffer between the validation check and the hardware programming, bypassing the validation entirely. Therefore, allocate a local buffer in enetc_msg_handle_rxmsg() and copy the message content from the DMA buffer via memcpy() before processing. This ensures the PF operates on a stable snapshot that the VF cannot modify. Link: https://sashiko.dev/#/patchset/20260511080805.2052495-1-wei.fang%40nxp.com #1 Link: https://sashiko.dev/#/patchset/20260513103021.2190593-1-wei.fang%40nxp.com #2 Fixes: beb74ac878c8 ("enetc: Add vf to pf messaging support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260520064421.91569-5-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>