summaryrefslogtreecommitdiff
path: root/net/vmw_vsock
AgeCommit message (Collapse)Author
2025-02-18vsock/bpf: Warn on socket without transportMichal Luczaj
In the spirit of commit 91751e248256 ("vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]"), armorize the "impossible" cases with a warning. Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-14vsock/virtio: fix variables initialization during resumingJunnan Wu
When executing suspend to ram twice in a row, the `rx_buf_nr` and `rx_buf_max_nr` increase to three times vq->num_free. Then after virtqueue_get_buf and `rx_buf_nr` decreased in function virtio_transport_rx_work, the condition to fill rx buffer (rx_buf_nr < rx_buf_max_nr / 2) will never be met. It is because that `rx_buf_nr` and `rx_buf_max_nr` are initialized only in virtio_vsock_probe(), but they should be reset whenever virtqueues are recreated, like after a suspend/resume. Move the `rx_buf_nr` and `rx_buf_max_nr` initialization in virtio_vsock_vqs_init(), so we are sure that they are properly initialized, every time we initialize the virtqueues, either when we load the driver or after a suspend/resume. To prevent erroneous atomic load operations on the `queued_replies` in the virtio_transport_send_pkt_work() function which may disrupt the scheduling of vsock->rx_work when transmitting reply-required socket packets, this atomic variable must undergo synchronized initialization alongside the preceding two variables after a suspend/resume. Fixes: bd50c5dc182b ("vsock/virtio: add support for device suspend/resume") Link: https://lore.kernel.org/virtualization/20250207052033.2222629-1-junnan01.wu@samsung.com/ Co-developed-by: Ying Gao <ying01.gao@samsung.com> Signed-off-by: Ying Gao <ying01.gao@samsung.com> Signed-off-by: Junnan Wu <junnan01.wu@samsung.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250214012200.1883896-1-junnan01.wu@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12vsock: Orphan socket after transport releaseMichal Luczaj
During socket release, sock_orphan() is called without considering that it sets sk->sk_wq to NULL. Later, if SO_LINGER is enabled, this leads to a null pointer dereferenced in virtio_transport_wait_close(). Orphan the socket only after transport release. Partially reverts the 'Fixes:' commit. KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] lock_acquire+0x19e/0x500 _raw_spin_lock_irqsave+0x47/0x70 add_wait_queue+0x46/0x230 virtio_transport_release+0x4e7/0x7f0 __vsock_release+0xfd/0x490 vsock_release+0x90/0x120 __sock_release+0xa3/0x250 sock_close+0x14/0x20 __fput+0x35e/0xa90 __x64_sys_close+0x78/0xd0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Reported-by: syzbot+9d55b199192a4be7d02c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c Fixes: fcdd2242c023 ("vsock: Keep the binding until socket destruction") Tested-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250210-vsock-linger-nullderef-v3-1-ef6244d02b54@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30Merge tag 'net-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from IPSec, netfilter and Bluetooth. Nothing really stands out, but as usual there's a slight concentration of fixes for issues added in the last two weeks before the merge window, and driver bugs from 6.13 which tend to get discovered upon wider distribution. Current release - regressions: - net: revert RTNL changes in unregister_netdevice_many_notify() - Bluetooth: fix possible infinite recursion of btusb_reset - eth: adjust locking in some old drivers which protect their state with spinlocks to avoid sleeping in atomic; core protects netdev state with a mutex now Previous releases - regressions: - eth: - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node() - bgmac: reduce max frame size to support just 1500 bytes; the jumbo frame support would previously cause OOB writes, but now fails outright - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid false detection of MPTCP blackholing Previous releases - always broken: - mptcp: handle fastopen disconnect correctly - xfrm: - make sure skb->sk is a full sock before accessing its fields - fix taking a lock with preempt disabled for RT kernels - usb: ipheth: improve safety of packet metadata parsing; prevent potential OOB accesses - eth: renesas: fix missing rtnl lock in suspend/resume path" * tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) MAINTAINERS: add Neal to TCP maintainers net: revert RTNL changes in unregister_netdevice_many_notify() net: hsr: fix fill_frame_info() regression vs VLAN packets doc: mptcp: sysctl: blackhole_timeout is per-netns mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted netfilter: nf_tables: reject mismatching sum of field_len with set key length net: sh_eth: Fix missing rtnl lock in suspend/resume path net: ravb: Fix missing rtnl lock in suspend/resume path selftests/net: Add test for loading devbound XDP program in generic mode net: xdp: Disallow attaching device-bound programs in generic mode tcp: correct handling of extreme memory squeeze bgmac: reduce max frame size to support just MTU 1500 vsock/test: Add test for connect() retries vsock/test: Add test for UAF due to socket unbinding vsock/test: Introduce vsock_connect_fd() vsock/test: Introduce vsock_bind() vsock: Allow retrying on connect() failure vsock: Keep the binding until socket destruction Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming ...
2025-01-29vsock: Allow retrying on connect() failureMichal Luczaj
sk_err is set when a (connectible) connect() fails. Effectively, this makes an otherwise still healthy SS_UNCONNECTED socket impossible to use for any subsequent connection attempts. Clear sk_err upon trying to establish a connection. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-2-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29vsock: Keep the binding until socket destructionMichal Luczaj
Preserve sockets bindings; this includes both resulting from an explicit bind() and those implicitly bound through autobind during connect(). Prevents socket unbinding during a transport reassignment, which fixes a use-after-free: 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2) 2. transport->release() calls vsock_remove_bound() without checking if sk was bound and moved to bound list (refcnt=1) 3. vsock_bind() assumes sk is in unbound list and before __vsock_insert_bound(vsock_bound_sockets()) calls __vsock_remove_bound() which does: list_del_init(&vsk->bound_table); // nop sock_put(&vsk->sk); // refcnt=0 BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730 Read of size 4 at addr ffff88816b46a74c by task a.out/2057 dump_stack_lvl+0x68/0x90 print_report+0x174/0x4f6 kasan_report+0xb9/0x190 __vsock_bind+0x62e/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 2057: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x85/0x90 kmem_cache_alloc_noprof+0x131/0x450 sk_prot_alloc+0x5b/0x220 sk_alloc+0x2c/0x870 __vsock_create.constprop.0+0x2e/0xb60 vsock_create+0xe4/0x420 __sock_create+0x241/0x650 __sys_socket+0xf2/0x1a0 __x64_sys_socket+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 2057: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x60 __kasan_slab_free+0x4b/0x70 kmem_cache_free+0x1a1/0x590 __sk_destruct+0x388/0x5a0 __vsock_bind+0x5e1/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e refcount_t: addition on 0; use-after-free. WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150 RIP: 0010:refcount_warn_saturate+0xce/0x150 __vsock_bind+0x66d/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e refcount_t: underflow; use-after-free. WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150 RIP: 0010:refcount_warn_saturate+0xee/0x150 vsock_remove_bound+0x187/0x1e0 __vsock_release+0x383/0x4a0 vsock_release+0x90/0x120 __sock_release+0xa3/0x250 sock_close+0x14/0x20 __fput+0x359/0xa80 task_work_run+0x107/0x1d0 do_exit+0x847/0x2560 do_group_exit+0xb8/0x250 __x64_sys_exit_group+0x3a/0x50 x64_sys_call+0xfec/0x14f0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-1-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-25Merge tag 'hyperv-next-signed-20250123' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves) - Fixes for the Hyper-V VTL mode (Roman Kisel) - Fixes for cpu mask usage in Hyper-V code (Michael Kelley) - Document the guest VM hibernation behaviour (Michael Kelley) - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain) * tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Documentation: hyperv: Add overview of guest VM hibernation hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() hyperv: Do not overlap the hvcall IO areas in get_vtl() hyperv: Enable the hypercall output page for the VTL mode hv_balloon: Fallback to generic_online_page() for non-HV hot added mem Drivers: hv: vmbus: Log on missing offers if any Drivers: hv: vmbus: Wait for boot-time offers during boot and resume uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup iommu/hyper-v: Don't assume cpu_possible_mask is dense Drivers: hv: Don't assume cpu_possible_mask is dense x86/hyperv: Don't assume cpu_possible_mask is dense hyperv: Remove the now unused hyperv-tlfs.h files hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h hyperv: Add new Hyper-V headers in include/hyperv hyperv: Clean up unnecessary #includes hyperv: Move hv_connection_id to hyperv-tlfs.h
2025-01-14vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]Stefano Garzarella
Recent reports have shown how we sometimes call vsock_*_has_data() when a vsock socket has been de-assigned from a transport (see attached links), but we shouldn't. Previous commits should have solved the real problems, but we may have more in the future, so to avoid null-ptr-deref, we can return 0 (no space, no data available) but with a warning. This way the code should continue to run in a nearly consistent state and have a warning that allows us to debug future problems. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/ Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/ Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/ Co-developed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Co-developed-by: Wongi Lee <qwerty@theori.io> Signed-off-by: Wongi Lee <qwerty@theori.io> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock: reset socket state when de-assigning the transportStefano Garzarella
Transport's release() and destruct() are called when de-assigning the vsock transport. These callbacks can touch some socket state like sock flags, sk_state, and peer_shutdown. Since we are reassigning the socket to a new transport during vsock_connect(), let's reset these fields to have a clean state with the new transport. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock/virtio: cancel close work in the destructorStefano Garzarella
During virtio_transport_release() we can schedule a delayed work to perform the closing of the socket before destruction. The destructor is called either when the socket is really destroyed (reference counter to zero), or it can also be called when we are de-assigning the transport. In the former case, we are sure the delayed work has completed, because it holds a reference until it completes, so the destructor will definitely be called after the delayed work is finished. But in the latter case, the destructor is called by AF_VSOCK core, just after the release(), so there may still be delayed work scheduled. Refactor the code, moving the code to delete the close work already in the do_close() to a new function. Invoke it during destruction to make sure we don't leave any pending work. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Reported-by: Hyunwoo Kim <v4bel@theori.io> Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/ Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Tested-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock/bpf: return early if transport is not assignedStefano Garzarella
Some of the core functions can only be called if the transport has been assigned. As Michal reported, a socket might have the transport at NULL, for example after a failed connect(), causing the following trace: BUG: kernel NULL pointer dereference, address: 00000000000000a0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+ RIP: 0010:vsock_connectible_has_data+0x1f/0x40 Call Trace: vsock_bpf_recvmsg+0xca/0x5e0 sock_recvmsg+0xb9/0xc0 __sys_recvfrom+0xb3/0x130 __x64_sys_recvfrom+0x20/0x30 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e So we need to check the `vsk->transport` in vsock_bpf_recvmsg(), especially for connected sockets (stream/seqpacket) as we already do in __vsock_connectible_recvmsg(). Fixes: 634f1a7110b4 ("vsock: support sockmap") Cc: stable@vger.kernel.org Reported-by: Michal Luczaj <mhal@rbox.co> Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/ Tested-by: Michal Luczaj <mhal@rbox.co> Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/ Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com Reviewed-by: Hyunwoo Kim <v4bel@theori.io> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock/virtio: discard packets if the transport changesStefano Garzarella
If the socket has been de-assigned or assigned to another transport, we must discard any packets received because they are not expected and would cause issues when we access vsk->transport. A possible scenario is described by Hyunwoo Kim in the attached link, where after a first connect() interrupted by a signal, and a second connect() failed, we can find `vsk->transport` at NULL, leading to a NULL pointer dereference. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Reported-by: Hyunwoo Kim <v4bel@theori.io> Reported-by: Wongi Lee <qwerty@theori.io> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/ Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-10hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.hNuno Das Neves
Switch to using hvhdk.h everywhere in the kernel. This header includes all the new Hyper-V headers in include/hyperv, which form a superset of the definitions found in hyperv-tlfs.h. This makes it easier to add new Hyper-V interfaces without being restricted to those in the TLFS doc (reflected in hyperv-tlfs.h). To be more consistent with the original Hyper-V code, the names of some definitions are changed slightly. Update those where needed. Update comments in mshyperv.h files to point to include/hyperv for adding new definitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2024-11-25bpf, vsock: Invoke proto::close on close()Michal Luczaj
vsock defines a BPF callback to be invoked when close() is called. However, this callback is never actually executed. As a result, a closed vsock socket is not automatically removed from the sockmap/sockhash. Introduce a dummy vsock_close() and make vsock_release() call proto::close. Note: changes in __vsock_release() look messy, but it's only due to indent level reduction and variables xmas tree reorder. Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-3-f1b9669cacdc@rbox.co Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com>
2024-11-25bpf, vsock: Fix poll() missing a queueMichal Luczaj
When a verdict program simply passes a packet without redirection, sk_msg is enqueued on sk_psock::ingress_msg. Add a missing check to poll(). Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-1-f1b9669cacdc@rbox.co Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com>
2024-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc8). Conflicts: tools/testing/selftests/net/.gitignore 252e01e68241 ("selftests: net: add netlink-dumps to .gitignore") be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw") https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/ drivers/net/phy/phylink.c 671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled") 7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"") Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14Merge tag 'net-6.12-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth. Quite calm week. No new regression under investigation. Current release - regressions: - eth: revert "igb: Disable threaded IRQ for igb_msix_other" Current release - new code bugs: - bluetooth: btintel: direct exception event to bluetooth stack Previous releases - regressions: - core: fix data-races around sk->sk_forward_alloc - netlink: terminate outstanding dump on socket close - mptcp: error out earlier on disconnect - vsock: fix accept_queue memory leak - phylink: ensure PHY momentary link-fails are handled - eth: mlx5: - fix null-ptr-deref in add rule err flow - lock FTE when checking if active - eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol Previous releases - always broken: - sched: fix u32's systematic failure to free IDR entries for hnodes. - sctp: fix possible UAF in sctp_v6_available() - eth: bonding: add ns target multicast address to slave device - eth: mlx5: fix msix vectors to respect platform limit - eth: icssg-prueth: fix 1 PPS sync" * tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) net: sched: u32: Add test case for systematic hnode IDR leaks selftests: bonding: add ns multicast group testing bonding: add ns target multicast address to slave device net: ti: icssg-prueth: Fix 1 PPS sync stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines net: Make copy_safe_from_sockptr() match documentation net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol ipmr: Fix access to mfc_cache_list without lock held samples: pktgen: correct dev to DEV net: phylink: ensure PHY momentary link-fails are handled mptcp: pm: use _rcu variant under rcu_read_lock mptcp: hold pm lock when deleting entry mptcp: update local address flags when setting it net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. MAINTAINERS: Re-add cancelled Renesas driver sections Revert "igb: Disable threaded IRQ for igb_msix_other" Bluetooth: btintel: Direct exception event to bluetooth stack Bluetooth: hci_core: Fix calling mgmt_device_connected virtio/vsock: Improve MSG_ZEROCOPY error handling vsock: Fix sk_error_queue memory leak ...
2024-11-12virtio/vsock: Improve MSG_ZEROCOPY error handlingMichal Luczaj
Add a missing kfree_skb() to prevent memory leaks. Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12vsock: Fix sk_error_queue memory leakMichal Luczaj
Kernel queues MSG_ZEROCOPY completion notifications on the error queue. Where they remain, until explicitly recv()ed. To prevent memory leaks, clean up the queue when the socket is destroyed. unreferenced object 0xffff8881028beb00 (size 224): comm "vsock_test", pid 1218, jiffies 4294694897 hex dump (first 32 bytes): 90 b0 21 17 81 88 ff ff 90 b0 21 17 81 88 ff ff ..!.......!..... 00 00 00 00 00 00 00 00 00 b0 21 17 81 88 ff ff ..........!..... backtrace (crc 6c7031ca): [<ffffffff81418ef7>] kmem_cache_alloc_node_noprof+0x2f7/0x370 [<ffffffff81d35882>] __alloc_skb+0x132/0x180 [<ffffffff81d2d32b>] sock_omalloc+0x4b/0x80 [<ffffffff81d3a8ae>] msg_zerocopy_realloc+0x9e/0x240 [<ffffffff81fe5cb2>] virtio_transport_send_pkt_info+0x412/0x4c0 [<ffffffff81fe6183>] virtio_transport_stream_enqueue+0x43/0x50 [<ffffffff81fe0813>] vsock_connectible_sendmsg+0x373/0x450 [<ffffffff81d233d5>] ____sys_sendmsg+0x365/0x3a0 [<ffffffff81d246f4>] ___sys_sendmsg+0x84/0xd0 [<ffffffff81d26f47>] __sys_sendmsg+0x47/0x80 [<ffffffff820d3df3>] do_syscall_64+0x93/0x180 [<ffffffff8220012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12virtio/vsock: Fix accept_queue memory leakMichal Luczaj
As the final stages of socket destruction may be delayed, it is possible that virtio_transport_recv_listen() will be called after the accept_queue has been flushed, but before the SOCK_DONE flag has been set. As a result, sockets enqueued after the flush would remain unremoved, leading to a memory leak. vsock_release __vsock_release lock virtio_transport_release virtio_transport_close schedule_delayed_work(close_work) sk_shutdown = SHUTDOWN_MASK (!) flush accept_queue release virtio_transport_recv_pkt vsock_find_bound_socket lock if flag(SOCK_DONE) return virtio_transport_recv_listen child = vsock_create_connected (!) vsock_enqueue_accept(child) release close_work lock virtio_transport_do_close set_flag(SOCK_DONE) virtio_transport_remove_sock vsock_remove_sock vsock_remove_bound release Introduce a sk_shutdown check to disallow vsock_enqueue_accept() during socket destruction. unreferenced object 0xffff888109e3f800 (size 2040): comm "kworker/5:2", pid 371, jiffies 4294940105 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 28 00 0b 40 00 00 00 00 00 00 00 00 00 00 00 00 (..@............ backtrace (crc 9e5f4e84): [<ffffffff81418ff1>] kmem_cache_alloc_noprof+0x2c1/0x360 [<ffffffff81d27aa0>] sk_prot_alloc+0x30/0x120 [<ffffffff81d2b54c>] sk_alloc+0x2c/0x4b0 [<ffffffff81fe049a>] __vsock_create.constprop.0+0x2a/0x310 [<ffffffff81fe6d6c>] virtio_transport_recv_pkt+0x4dc/0x9a0 [<ffffffff81fe745d>] vsock_loopback_work+0xfd/0x140 [<ffffffff810fc6ac>] process_one_work+0x20c/0x570 [<ffffffff810fce3f>] worker_thread+0x1bf/0x3a0 [<ffffffff811070dd>] kthread+0xdd/0x110 [<ffffffff81044fdd>] ret_from_fork+0x2d/0x50 [<ffffffff8100785a>] ret_from_fork_asm+0x1a/0x30 Fixes: 3fe356d58efa ("vsock/virtio: discard packets only when socket is really closed") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-09hv_sock: Initializing vsk->trans to NULL to prevent a dangling pointerHyunwoo Kim
When hvs is released, there is a possibility that vsk->trans may not be initialized to NULL, which could lead to a dangling pointer. This issue is resolved by initializing vsk->trans to NULL. Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/Zys4hCj61V+mQfX2@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06vsock/virtio: Initialization of the dangling pointer occurring in vsk->transHyunwoo Kim
During loopback communication, a dangling pointer can be created in vsk->trans, potentially leading to a Use-After-Free condition. This issue is resolved by initializing vsk->trans to NULL. Cc: stable <stable@kernel.org> Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Wongi Lee <qwerty@theori.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <2024102245-strive-crib-c8d3@gregkh> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-10-28vsock: do not leave dangling sk pointer in vsock_create()Eric Dumazet
syzbot was able to trigger the following warning after recent core network cleanup. On error vsock_create() frees the allocated sk object, but sock_init_data() has already attached it to the provided sock object. We must clear sock->sk to avoid possible use-after-free later. WARNING: CPU: 0 PID: 5282 at net/socket.c:1581 __sock_create+0x897/0x950 net/socket.c:1581 Modules linked in: CPU: 0 UID: 0 PID: 5282 Comm: syz.2.43 Not tainted 6.12.0-rc2-syzkaller-00667-g53bac8330865 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__sock_create+0x897/0x950 net/socket.c:1581 Code: 7f 06 01 65 48 8b 34 25 00 d8 03 00 48 81 c6 b0 08 00 00 48 c7 c7 60 0b 0d 8d e8 d4 9a 3c 02 e9 11 f8 ff ff e8 0a ab 0d f8 90 <0f> 0b 90 e9 82 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c c7 f8 ff RSP: 0018:ffffc9000394fda8 EFLAGS: 00010293 RAX: ffffffff89873c46 RBX: ffff888079f3c818 RCX: ffff8880314b9e00 RDX: 0000000000000000 RSI: 00000000ffffffed RDI: 0000000000000000 RBP: ffffffff8d3337f0 R08: ffffffff8987384e R09: ffffffff8989473a R10: dffffc0000000000 R11: fffffbfff203a276 R12: 00000000ffffffed R13: ffff888079f3c8c0 R14: ffffffff898736e7 R15: dffffc0000000000 FS: 00005555680ab500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f22b11196d0 CR3: 00000000308c0000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sock_create net/socket.c:1632 [inline] __sys_socket_create net/socket.c:1669 [inline] __sys_socket+0x150/0x3c0 net/socket.c:1716 __do_sys_socket net/socket.c:1730 [inline] __se_sys_socket net/socket.c:1728 [inline] __x64_sys_socket+0x7a/0x90 net/socket.c:1728 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f22b117dff9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff56aec0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000029 RAX: ffffffffffffffda RBX: 00007f22b1335f80 RCX: 00007f22b117dff9 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000028 RBP: 00007f22b11f0296 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f22b1335f80 R14: 00007f22b1335f80 R15: 00000000000012dd Fixes: 48156296a08c ("net: warn, if pf->create does not clear sock->sk on error") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20241022134819.1085254-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-18Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to not affect subreg_def marks in its range propagation (Eduard Zingerman) - Fix a truncation bug in the BPF verifier's handling of coerce_reg_to_size_sx (Dimitar Kanaliev) - Fix the BPF verifier's delta propagation between linked registers under 32-bit addition (Daniel Borkmann) - Fix a NULL pointer dereference in BPF devmap due to missing rxq information (Florian Kauer) - Fix a memory leak in bpf_core_apply (Jiri Olsa) - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for arrays of nested structs (Hou Tao) - Fix build ID fetching where memory areas backing the file were created with memfd_secret (Andrii Nakryiko) - Fix BPF task iterator tid filtering which was incorrectly using pid instead of tid (Jordan Rome) - Several fixes for BPF sockmap and BPF sockhash redirection in combination with vsocks (Michal Luczaj) - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri) - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility of an infinite BPF tailcall (Pu Lehui) - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot be resolved (Thomas Weißschuh) - Fix a bug in kfunc BTF caching for modules where the wrong BTF object was returned (Toke Høiland-Jørgensen) - Fix a BPF selftest compilation error in cgroup-related tests with musl libc (Tony Ambardar) - Several fixes to BPF link info dumps to fill missing fields (Tyrone Wu) - Add BPF selftests for kfuncs from multiple modules, checking that the correct kfuncs are called (Simon Sundberg) - Ensure that internal and user-facing bpf_redirect flags don't overlap (Toke Høiland-Jørgensen) - Switch to use kvzmalloc to allocate BPF verifier environment (Rik van Riel) - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat under RT (Wander Lairson Costa) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits) lib/buildid: Handle memfd_secret() files in build_id_parse() selftests/bpf: Add test case for delta propagation bpf: Fix print_reg_state's constant scalar dump bpf: Fix incorrect delta propagation between linked registers bpf: Properly test iter/task tid filtering bpf: Fix iter/task tid filtering riscv, bpf: Make BPF_CMPXCHG fully ordered bpf, vsock: Drop static vsock_bpf_prot initialization vsock: Update msg_count on read_skb() vsock: Update rx_bytes on read_skb() bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock selftests/bpf: Add asserts for netfilter link info bpf: Fix link info netfilter flags to populate defrag flag selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx() selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx() bpf: Fix truncation bug in coerce_reg_to_size_sx() selftests/bpf: Assert link info uprobe_multi count & path_size if unset bpf: Fix unpopulated path_size when uprobe_multi fields unset selftests/bpf: Fix cross-compiling urandom_read selftests/bpf: Add test for kfunc module order ...
2024-10-17bpf, vsock: Drop static vsock_bpf_prot initializationMichal Luczaj
vsock_bpf_prot is set up at runtime. Remove the superfluous init. No functional change intended. Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-4-d6577bbfe742@rbox.co
2024-10-17vsock: Update msg_count on read_skb()Michal Luczaj
Dequeuing via vsock_transport::read_skb() left msg_count outdated, which then confused SOCK_SEQPACKET recv(). Decrease the counter. Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-3-d6577bbfe742@rbox.co
2024-10-17vsock: Update rx_bytes on read_skb()Michal Luczaj
Make sure virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() calls are balanced (i.e. virtio_vsock_sock::rx_bytes doesn't lie) after vsock_transport::read_skb(). While here, also inform the peer that we've freed up space and it has more credit. Failing to update rx_bytes after packet is dequeued leads to a warning on SOCK_STREAM recv(): [ 233.396654] rx_queue is empty, but rx_bytes is non-zero [ 233.396702] WARNING: CPU: 11 PID: 40601 at net/vmw_vsock/virtio_transport_common.c:589 Fixes: 634f1a7110b4 ("vsock: support sockmap") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-2-d6577bbfe742@rbox.co
2024-10-07Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Several small bugfixes all over the place. Most notably, fixes the vsock allocation with GFP_KERNEL in atomic context, which has been triggering warnings for lots of testers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/scsi: null-ptr-dereference in vhost_scsi_get_req() vsock/virtio: use GFP_ATOMIC under RCU read lock virtio_console: fix misc probe bugs virtio_ring: tag event_triggered as racy for KCSAN vdpa/octeon_ep: Fix format specifier for pointers in debug messages
2024-10-07vsock/virtio: use GFP_ATOMIC under RCU read lockMichael S. Tsirkin
virtio_transport_send_pkt in now called on transport fast path, under RCU read lock. In that case, we have a bug: virtio_add_sgs is called with GFP_KERNEL, and might sleep. Pass the gfp flags as an argument, and use GFP_ATOMIC on the fast path. Link: https://lore.kernel.org/all/hfcr2aget2zojmqpr4uhlzvnep4vgskblx5b6xf2ddosbsrke7@nt34bxgp7j2x Fixes: efcd71af38be ("vsock/virtio: avoid queuing packets when intermediate queue is empty") Reported-by: Christian Brauner <brauner@kernel.org> Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: Luigi Leonardi <luigi.leonardi@outlook.com> Message-ID: <3fbfb6e871f625f89eb578c7228e127437b1975a.1727876449.git.mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2024-09-26Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-balloon supports new stats - vdpa supports setting mac address - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster - virtio_fs supports new sysfs entries for queue info - virtio/vsock performance has been improved And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vsock/virtio: avoid queuing packets when intermediate queue is empty vsock/virtio: refactor virtio_transport_send_pkt_work fw_cfg: Constify struct kobj_type vdpa/mlx5: Postpone MR deletion vdpa/mlx5: Introduce init/destroy for MR resources vdpa/mlx5: Rename mr_mtx -> lock vdpa/mlx5: Extract mr members in own resource struct vdpa/mlx5: Rename function vdpa/mlx5: Delete direct MKEYs in parallel vdpa/mlx5: Create direct MKEYs in parallel MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section virtio_fs: add sysfs entries for queue information virtio_fs: introduce virtio_fs_put_locked helper vdpa: Remove unused declarations vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command vdpa/mlx5: Small improvement for change_num_qps() vdpa/mlx5: Keep notifiers during suspend but ignore vdpa/mlx5: Parallelize device resume vdpa/mlx5: Parallelize device suspend vdpa/mlx5: Use async API for vq modify commands ...
2024-09-25vsock/virtio: avoid queuing packets when intermediate queue is emptyLuigi Leonardi
When the driver needs to send new packets to the device, it always queues the new sk_buffs into an intermediate queue (send_pkt_queue) and schedules a worker (send_pkt_work) to then queue them into the virtqueue exposed to the device. This increases the chance of batching, but also introduces a lot of latency into the communication. So we can optimize this path by adding a fast path to be taken when there is no element in the intermediate queue, there is space available in the virtqueue, and no other process that is sending packets (tx_lock held). The following benchmarks were run to check improvements in latency and throughput. The test bed is a host with Intel i7-10700KF CPU @ 3.80GHz and L1 guest running on QEMU/KVM with vhost process and all vCPUs pinned individually to pCPUs. - Latency Tool: Fio version 3.37-56 Mode: pingpong (h-g-h) Test runs: 50 Runtime-per-test: 50s Type: SOCK_STREAM In the following fio benchmark (pingpong mode) the host sends a payload to the guest and waits for the same payload back. fio process pinned both inside the host and the guest system. Before: Linux 6.9.8 Payload 64B: 1st perc. overall 99th perc. Before 12.91 16.78 42.24 us After 9.77 13.57 39.17 us Payload 512B: 1st perc. overall 99th perc. Before 13.35 17.35 41.52 us After 10.25 14.11 39.58 us Payload 4K: 1st perc. overall 99th perc. Before 14.71 19.87 41.52 us After 10.51 14.96 40.81 us - Throughput Tool: iperf-vsock The size represents the buffer length (-l) to read/write P represents the number of parallel streams P=1 4K 64K 128K Before 6.87 29.3 29.5 Gb/s After 10.5 39.4 39.9 Gb/s P=2 4K 64K 128K Before 10.5 32.8 33.2 Gb/s After 17.8 47.7 48.5 Gb/s P=4 4K 64K 128K Before 12.7 33.6 34.2 Gb/s After 16.9 48.1 50.5 Gb/s The performance improvement is related to this optimization, I used a ebpf kretprobe on virtio_transport_send_skb to check that each packet was sent directly to the virtqueue Co-developed-by: Marco Pinna <marco.pinn95@gmail.com> Signed-off-by: Marco Pinna <marco.pinn95@gmail.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Message-Id: <20240730-pinna-v4-2-5c9179164db5@outlook.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2024-09-25vsock/virtio: refactor virtio_transport_send_pkt_workMarco Pinna
Preliminary patch to introduce an optimization to the enqueue system. All the code used to enqueue a packet into the virtqueue is removed from virtio_transport_send_pkt_work() and moved to the new virtio_transport_send_skb() function. Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Signed-off-by: Marco Pinna <marco.pinn95@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20240730-pinna-v4-1-5c9179164db5@outlook.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml c25504a0ba36 ("dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property phys") be034ee6c33d ("dt-bindings: net: fsl,qoriq-mc-dpmac: using unevaluatedProperties") https://lore.kernel.org/20240815110934.56ae623a@canb.auug.org.au drivers/net/dsa/vitesse-vsc73xx-core.c 5b9eebc2c7a5 ("net: dsa: vsc73xx: pass value in phy_write operation") fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations") 2524d6c28bdc ("net: dsa: vsc73xx: use defined values in phy operations") https://lore.kernel.org/20240813104039.429b9fe6@canb.auug.org.au Resolve by using FIELD_PREP(), Stephen's resolution is simpler. Adjacent changes: net/vmw_vsock/af_vsock.c 69139d2919dd ("vsock: fix recursive ->recvmsg calls") 744500d81f81 ("vsock: add support for SIOCOUTQ ioctl") Link: https://patch.msgid.link/20240815141149.33862-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-15vsock: fix recursive ->recvmsg callsCong Wang
After a vsock socket has been added to a BPF sockmap, its prot->recvmsg has been replaced with vsock_bpf_recvmsg(). Thus the following recursiion could happen: vsock_bpf_recvmsg() -> __vsock_recvmsg() -> vsock_connectible_recvmsg() -> prot->recvmsg() -> vsock_bpf_recvmsg() again We need to fix it by calling the original ->recvmsg() without any BPF sockmap logic in __vsock_recvmsg(). Fixes: 634f1a7110b4 ("vsock: support sockmap") Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Cc: Bobby Eshleman <bobby.eshleman@bytedance.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-02vsock/virtio: add SIOCOUTQ support for all virtio based transportsLuigi Leonardi
Introduce support for virtio_transport_unsent_bytes ioctl for virtio_transport, vhost_vsock and vsock_loopback. For all transports the unsent bytes counter is incremented in virtio_transport_get_credit. In virtio_transport (G2H) and in vhost-vsock (H2G) the counter is decremented when the skbuff is consumed. In vsock_loopback the same skbuff is passed from the transmitter to the receiver, so the counter is decremented before queuing the skbuff to the receiver. Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02vsock: add support for SIOCOUTQ ioctlLuigi Leonardi
Add support for ioctl(s) in AF_VSOCK. The only ioctl available is SIOCOUTQ/TIOCOUTQ, which returns the number of unsent bytes in the socket. This information is transport-specific and is delegated to them using a callback. Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-17virtio: rename virtio_find_vqs_info() to virtio_find_vqs()Jiri Pirko
Since the original virtio_find_vqs() is no longer present, rename virtio_find_vqs_info() back to virtio_find_vqs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-20-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()Jiri Pirko
Instead of passing separate names and callbacks arrays to virtio_find_vqs(), have one of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-18-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-23Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...
2024-05-22vsock/virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-19-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-13net: change proto and proto_ops accept typeJens Axboe
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information. No functional changes in this patch. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-02vsock/virtio: fix packet delivery to tap deviceMarco Pinna
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added virtio_transport_deliver_tap_pkt() for handing packets to the vsockmon device. However, in virtio_transport_send_pkt_work(), the function is called before actually sending the packet (i.e. before placing it in the virtqueue with virtqueue_add_sgs() and checking whether it returned successfully). Queuing the packet in the virtqueue can fail even multiple times. However, in virtio_transport_deliver_tap_pkt() we deliver the packet to the monitoring tap interface only the first time we call it. This certainly avoids seeing the same packet replicated multiple times in the monitoring interface, but it can show the packet sent with the wrong timestamp or even before we succeed to queue it in the virtqueue. Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs() and making sure it returned successfully. Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") Cc: stable@vge.kernel.org Signed-off-by: Marco Pinna <marco.pinn95@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23sock_diag: add module pointer to "struct sock_diag_handler"Eric Dumazet
Following patch is going to use RCU instead of sock_diag_table_mutex acquisition. This patch is a preparation, no change of behavior yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-03vsock/virtio: use skb_frag_*() helpersMina Almasry
Minor fix for virtio: code wanting to access the fields inside an skb frag should use the skb_frag_*() helpers, instead of accessing the fields directly. This allows for extensions where the underlying memory is not a page. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20240102205905.793738-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-15virtio/vsock: send credit update during setting SO_RCVLOWATArseniy Krasnov
Send credit update message when SO_RCVLOWAT is updated and it is bigger than number of bytes in rx queue. It is needed, because 'poll()' will wait until number of bytes in rx queue will be not smaller than O_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup for tx/rx is possible: sender waits for free space and receiver is waiting data in 'poll()'. Rename 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set 'sk->sk_rcvlowat' only in one place (i.e. 'vsock_set_rcvlowat'), so the transport doesn't need to do it. Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-15virtio/vsock: fix logic which reduces credit update messagesArseniy Krasnov
Add one more condition for sending credit update during dequeue from stream socket: when number of bytes in the rx queue is smaller than SO_RCVLOWAT value of the socket. This is actual for non-default value of SO_RCVLOWAT (e.g. not 1) - idea is to "kick" peer to continue data transmission, because we need at least SO_RCVLOWAT bytes in our rx queue to wake up user for reading data (in corner case it is also possible to stuck both tx and rx sides, this is why 'Fixes' is used). Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()Nikolay Kuratov
We need to do signed arithmetic if we expect condition `if (bytes < 0)` to be possible Found by Linux Verification Center (linuxtesting.org) with SVACE Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20231211162317.4116625-1-kniv@yandex-team.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warningStefano Garzarella
After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") in CentOS Stream 9, CI reported the following error: In file included from ./include/linux/kernel.h:17, from ./include/linux/list.h:9, from ./include/linux/preempt.h:11, from ./include/linux/spinlock.h:56, from net/vmw_vsock/virtio_transport_common.c:9: net/vmw_vsock/virtio_transport_common.c: In function ‘virtio_transport_can_zcopy‘: ./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror] 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck‘ 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp‘ 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ ./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp‘ 45 | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro ‘min‘ 63 | int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); We could solve it by using min_t(), but this operation seems entirely unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(), which performs almost the same check, returning at most MAX_SKB_FRAGS elements. So, let's eliminate this unnecessary comparison. Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Cc: avkrasnov@salutedevices.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Link: https://lore.kernel.org/r/20231206164143.281107-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-19net: fill in MODULE_DESCRIPTION()s for SOCK_DIAG modulesJakub Kicinski
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the sock diag modules in one fell swoop. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-07virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()Shigeru Yoshida
KMSAN reported the following uninit-value access issue: ===================================================== BUG: KMSAN: uninit-value in virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421 virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784 kthread+0x3cc/0x520 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 Uninit was stored to memory at: virtio_transport_space_update net/vmw_vsock/virtio_transport_common.c:1274 [inline] virtio_transport_recv_pkt+0x1ee8/0x26a0 net/vmw_vsock/virtio_transport_common.c:1415 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784 kthread+0x3cc/0x520 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 Uninit was created at: slab_post_alloc_hook+0x105/0xad0 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x5a2/0xaf0 mm/slub.c:3523 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:559 __alloc_skb+0x2fd/0x770 net/core/skbuff.c:650 alloc_skb include/linux/skbuff.h:1286 [inline] virtio_vsock_alloc_skb include/linux/virtio_vsock.h:66 [inline] virtio_transport_alloc_skb+0x90/0x11e0 net/vmw_vsock/virtio_transport_common.c:58 virtio_transport_reset_no_sock net/vmw_vsock/virtio_transport_common.c:957 [inline] virtio_transport_recv_pkt+0x1279/0x26a0 net/vmw_vsock/virtio_transport_common.c:1387 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784 kthread+0x3cc/0x520 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 CPU: 1 PID: 10664 Comm: kworker/1:5 Not tainted 6.6.0-rc3-00146-g9f3ebbef746f #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 Workqueue: vsock-loopback vsock_loopback_work ===================================================== The following simple reproducer can cause the issue described above: int main(void) { int sock; struct sockaddr_vm addr = { .svm_family = AF_VSOCK, .svm_cid = VMADDR_CID_ANY, .svm_port = 1234, }; sock = socket(AF_VSOCK, SOCK_STREAM, 0); connect(sock, (struct sockaddr *)&addr, sizeof(addr)); return 0; } This issue occurs because the `buf_alloc` and `fwd_cnt` fields of the `struct virtio_vsock_hdr` are not initialized when a new skb is allocated in `virtio_transport_init_hdr()`. This patch resolves the issue by initializing these fields during allocation. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Reported-and-tested-by: syzbot+0c8ce1da0ac31abbadcd@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0c8ce1da0ac31abbadcd Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20231104150531.257952-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>