path: root/drivers/vdpa
Age | Commit message | Author
2026-06-10 | vdpa/octeon_ep: fix IRQ-to-ring mapping in interrupt handler | Srujana Challa
Look up the IRQ index in oct_hw->irqs instead of assuming irq - irqs[0]. This supports non-contiguous IRQ numbers and avoids incorrect ring indexing when irqs[0] is not the base. Fixes: 26f8ce06af64 ("vdpa/octeon_ep: enable support for multiple interrupts per device") Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260224095226.1001151-5-schalla@marvell.com>
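The lookup described above can be sketched in plain userspace C. This is an illustration only, not the driver code: `find_ring_index`, its parameters, and any IRQ numbers used with it are hypothetical. Only the idea of searching the IRQ table instead of computing `irq - irqs[0]` comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the position of 'irq' in the irqs[] table,
 * or -1 if it is not present. Unlike 'irq - irqs[0]', this makes no
 * assumption that IRQ numbers are contiguous or that irqs[0] is the base.
 */
static int find_ring_index(const int *irqs, size_t nb_irqs, int irq)
{
    for (size_t i = 0; i < nb_irqs; i++) {
        if (irqs[i] == irq)
            return (int)i;
    }
    return -1;
}
```

With a table such as `{ 47, 52, 45, 60 }`, the subtraction approach would give a negative index for IRQ 45, while the search returns its real position in the table.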
2026-06-10 | vdpa/octeon_ep: Add vDPA device event handling for firmware notifications | Vamsi Attunuru
Handle vDPA device add and remove events from Octeon firmware. Use irq 0 for event delivery as device interrupts are multiplexed. Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260224095226.1001151-4-schalla@marvell.com>
2026-06-10 | vdpa/octeon_ep: Use 4 bytes for mailbox signature | Vamsi Attunuru
The upper 4 bytes are reserved by the firmware for storing meta data. Use only lower 4 bytes to update the signature details. Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260224095226.1001151-3-schalla@marvell.com>
2026-06-10 | vdpa/octeon_ep: Fix PF->VF mailbox data address calculation | Srujana Challa
The mailbox address was computed assuming 1 ring per VF. Read the actual rings-per-VF from OCTEP_EPF_RINFO and use it when calculating OCTEP_PF_MBOX_DATA offsets, fixing VF initialization when rings per VF > 1. Fixes: 8b6c724cdab8 ("virtio: vdpa: vDPA driver for Marvell OCTEON DPU devices") Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260224095226.1001151-2-schalla@marvell.com>
2026-06-10 | vdpa/mlx5: Use kvzalloc_flex() for MTT command memory | Rosen Penev
The create mkey command memory embeds the MTT array as a flexible array member. Use kvzalloc_flex() to allocate it directly instead of open-coding the struct_size() calculation with kvcalloc(). The MTT allocation still needs to be aligned to MLX5_VDPA_MTT_ALIGN bytes. Since each MTT entry is __be64, align the entry count directly and avoid carrying a separate byte length variable. Assisted-by: Codex:GPT-5.5 Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260508051837.1744409-1-rosenp@gmail.com>
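A userspace sketch of the idea: allocate a structure with a flexible array member in one call, after rounding the entry count (rather than a byte length) up to the alignment unit. Everything here is an assumption for illustration: `struct create_cmd`, `MTT_ALIGN_BYTES` (a stand-in for MLX5_VDPA_MTT_ALIGN, with an assumed value of 16), and both helper functions are hypothetical, and `calloc()` stands in for the kernel allocator. Unlike the kernel helpers, this sketch does not check for size overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment in bytes; a stand-in for MLX5_VDPA_MTT_ALIGN. */
#define MTT_ALIGN_BYTES 16
/* Each entry is 8 bytes, so the alignment unit expressed in entries. */
#define MTT_PER_ALIGN (MTT_ALIGN_BYTES / sizeof(uint64_t))

/* Hypothetical command layout with a trailing flexible array member. */
struct create_cmd {
    uint32_t nentries;
    uint64_t mtt[];
};

/* Round the entry count up to a whole number of alignment units. */
static size_t align_mtt_count(size_t n)
{
    return (n + MTT_PER_ALIGN - 1) / MTT_PER_ALIGN * MTT_PER_ALIGN;
}

/* Allocate header plus aligned, zeroed entry array in a single call. */
static struct create_cmd *alloc_create_cmd(size_t n)
{
    size_t count = align_mtt_count(n);
    struct create_cmd *cmd;

    cmd = calloc(1, sizeof(*cmd) + count * sizeof(cmd->mtt[0]));
    if (cmd)
        cmd->nentries = (uint32_t)count;
    return cmd;
}
```

Aligning the count directly means a separate byte-length variable is not needed: the allocation size follows from the count and the element size.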
2026-06-10 | vdpa_sim_net: switch to dynamic root device | Johan Hovold
Driver core expects devices to be dynamically allocated and will, for example, complain loudly when no release function has been provided. Use root_device_register() to allocate and register the root device instead of open coding using a static device. Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260424104703.2619093-3-johan@kernel.org>
2026-06-10 | vdpa_sim_blk: switch to dynamic root device | Johan Hovold
Driver core expects devices to be dynamically allocated and will, for example, complain loudly when no release function has been provided. Use root_device_register() to allocate and register the root device instead of open coding using a static device. Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260424104703.2619093-2-johan@kernel.org>
2026-06-10 | vduse: fix compat handling for VDUSE_IOTLB_GET_FD/VDUSE_VQ_GET_INFO | Arnd Bergmann
These two ioctls are incompatible on 32-bit x86 userspace, because the data structures are shorter than they are on 64-bit. Add a proper .compat_ioctl handler for x86 that reads the structures with the smaller padding before calling the internal handlers. On all other architectures, CONFIG_COMPAT_FOR_U64_ALIGNMENT is disabled and no special handling is required. Fixes: ad146355bfad ("vduse: Support querying information of IOVA regions") Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260213154051.4172275-1-arnd@kernel.org>
2026-06-10 | VDUSE: avoid leaking information to userspace | Jason Wang
The bouncing is not necessarily page aligned, so current VDUSE can leak kernel information through mapping bounce pages to userspace. Allocate bounce pages with __GFP_ZERO to avoid leaking information to userspace. Fixes: 8c773d53fb7b ("vduse: Implement an MMU-based software IOTLB") Cc: stable@vger.kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260130050750.4050-1-jasowang@redhat.com>
2026-06-10 | vduse: Fix race in vduse_dev_msg_sync and vduse_dev_read_iter | Zhang Tianci
There is one race case in vduse_dev_msg_sync and vduse_dev_read_iter: vduse_dev_read_iter(): lock(msg_lock); dequeue_msg(send_list); unlock(msg_lock); vduse_dev_msg_sync(): wait_timeout() finish lock(msg_lock); check msg->complete is false list_del(msg); <- double list_del() crash! To fix this case, we shall ensure vduse_msg is on send_list or recv_list outside the msg_lock critical section. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: stable@vger.kernel.org Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260226115550.1814-3-zhangtianci.1997@bytedance.com>
2026-06-10 | vduse: Requeue failed read to send_list head | Zhang Tianci
When copy_to_iter() fails in vduse_dev_read_iter(), put the message back at the head of send_list to preserve FIFO ordering and retry the oldest pending request first. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Reported-by: Michael S. Tsirkin <mst@redhat.com> Suggested-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260226115550.1814-2-zhangtianci.1997@bytedance.com>
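The requeue-at-head behaviour can be sketched with a small userspace queue. This is not the VDUSE code: `struct msg`, `struct queue`, `read_one`, and the `copy_ok` flag (which models whether the copy to the reader succeeded) are hypothetical, and moving a delivered message to a receive queue is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct msg {
    int id;
    struct msg *next;
};

struct queue {
    struct msg *head;
    struct msg *tail;
};

/* Append at the tail: the normal path for new requests. */
static void enqueue_tail(struct queue *q, struct msg *m)
{
    m->next = NULL;
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

/* Put a message back at the head so it stays the oldest pending one. */
static void enqueue_head(struct queue *q, struct msg *m)
{
    m->next = q->head;
    q->head = m;
    if (!q->tail)
        q->tail = m;
}

/* Remove and return the oldest message, or NULL if the queue is empty. */
static struct msg *dequeue(struct queue *q)
{
    struct msg *m = q->head;

    if (!m)
        return NULL;
    q->head = m->next;
    if (!q->head)
        q->tail = NULL;
    m->next = NULL;
    return m;
}

/*
 * Hypothetical read path: on a failed copy the message returns to the
 * head of the send queue; on success it moves to the receive queue.
 */
static struct msg *read_one(struct queue *send, struct queue *recv, int copy_ok)
{
    struct msg *m = dequeue(send);

    if (!m)
        return NULL;
    if (!copy_ok) {
        enqueue_head(send, m);
        return NULL;
    }
    enqueue_tail(recv, m);
    return m;
}
```

Requeueing at the tail instead would let a newer request overtake the failed one, which is the ordering problem the commit message describes.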
2026-06-10 | vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr() | Cindy Lu
Improve MAC address handling in mlx5_vdpa_set_attr() to ensure that old MAC entries are properly removed from the MPFS table before adding a new one. The new MAC address is then added to both the MPFS and VLAN tables. This change fixes an issue where the updated MAC address would not take effect until QEMU was rebooted. Signed-off-by: Cindy Lu <lulu@redhat.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260126094848.9601-4-lulu@redhat.com>
2026-06-10 | vdpa/mlx5: update mlx_features with driver state check | Cindy Lu
Add logic in mlx5_vdpa_set_attr() to ensure the VIRTIO_NET_F_MAC feature bit is properly set only when the device is not yet in the DRIVER_OK (running) state. This makes the MAC address visible in the output of: vdpa dev config show -jp when the device is created without an initial MAC address. Signed-off-by: Cindy Lu <lulu@redhat.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260126094848.9601-2-lulu@redhat.com>
2026-06-10 | vdpa/ifcvf: handle dev_set_name() failure in ifcvf_vdpa_dev_add() | Evgenii Burenchev
dev_set_name() may fail and return an error, but its return value is currently ignored and overwritten by _vdpa_register_device(). Abort device creation if dev_set_name() fails and release the device reference to avoid continuing with an improperly initialized struct device. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Zhu Lingshan <lingshan.zhu@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260226152924.38790-1-evg28bur@yandex.ru>
2026-06-10 | vduse: hold vduse_lock across IDR lookup in open path | Qihang Tang
vduse_dev_open() looks up struct vduse_dev through the IDR and then acquires dev->lock only after vduse_lock has been dropped. This leaves a window where a concurrent VDUSE_DESTROY_DEV can remove the same object from the IDR and free it before the open path locks the device, leading to a use-after-free. Close this race by keeping vduse_lock held until dev->lock has been acquired in the open path, matching the lock ordering already used by the destroy path. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Qihang Tang <q.h.hack.winter@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20260508094659.94647-1-q.h.hack.winter@gmail.com>
2026-04-04 | vdpa: use generic driver_override infrastructure | Danilo Krummrich
When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 539fec78edb4 ("vdpa: add driver_override support") Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260324005919.2408620-9-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-02-22 | Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses | Kees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | Convert more 'alloc_obj' cases to default GFP_KERNEL arguments | Linus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that for a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | Convert 'alloc_obj' family to use the new default GFP_KERNEL argument | Linus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | treewide: Replace kmalloc with kmalloc_obj for non-scalar types | Kees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
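The three replacement patterns can be mimicked in userspace to show why the typed forms return "TYPE *" rather than "void *". The macros below are hypothetical analogues built on `calloc()`, not the kernel definitions: the names `alloc_obj`, `alloc_objs`, and `alloc_flex` are invented for this sketch, they take no GFP argument, they do no overflow checking, and they rely on the `__typeof__` extension of GCC and Clang so that the first argument may be either a type or an expression such as `*ptr`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace analogues; each returns a typed pointer. */

/* One zeroed object of TYPE (a type name or an expression like *ptr). */
#define alloc_obj(TYPE) \
    ((__typeof__(TYPE) *)calloc(1, sizeof(TYPE)))

/* COUNT zeroed objects of TYPE. */
#define alloc_objs(TYPE, COUNT) \
    ((__typeof__(TYPE) *)calloc((COUNT), sizeof(TYPE)))

/* One zeroed TYPE followed by COUNT elements of flexible array FAM. */
#define alloc_flex(TYPE, FAM, COUNT) \
    ((__typeof__(TYPE) *)calloc(1, sizeof(TYPE) + \
        (COUNT) * sizeof(((__typeof__(TYPE) *)0)->FAM[0])))

/* Example types used only for this illustration. */
struct point {
    int x;
    int y;
};

struct packet {
    size_t len;
    unsigned char data[];
};
```

Because each macro casts to a pointer to its own argument type, assigning the result to a pointer of a different struct type produces a compiler diagnostic, which a plain `void *` return would not.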
2026-02-09 | vduse: avoid adding implicit padding | Arnd Bergmann
The vduse_iova_range_v2 and vduse_iotlb_entry_v2 structures are both defined in a way that adds implicit padding and is incompatible between i386 and x86_64 userspace because of the different structure alignment requirements. Building the header with -Wpadded shows these new warnings: vduse.h:305:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] vduse.h:374:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] Change the amount of padding in these two structures to align them to 64 bit words and avoid those problems. Since the v1 vduse_iotlb_entry already has an inconsistent size, do not attempt to reuse the structure but rather list the members individually, with a fixed amount of padding. Fixes: 079212f6877e ("vduse: add vq group asid support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260202224835.559538-1-arnd@kernel.org>
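A minimal sketch of the padding problem, using made-up structures rather than the real vduse.h definitions: the field names and layouts below are hypothetical. On x86_64 a 64-bit integer is 8-byte aligned inside a struct, while on i386 it is 4-byte aligned, so a struct that ends in a lone 32-bit field gets 4 bytes of tail padding on one ABI and none on the other. Making the padding an explicit member fixes the size on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout with implicit tail padding: typically 24 bytes on
 * x86_64 but 20 bytes on i386, so the two ABIs disagree on the size.
 */
struct range_implicit {
    uint64_t start;
    uint64_t last;
    uint32_t asid;
};

/*
 * Same fields with the padding spelled out: the size is a multiple of
 * 8 bytes and no compiler-inserted padding is needed on either ABI.
 */
struct range_explicit {
    uint64_t start;
    uint64_t last;
    uint32_t asid;
    uint32_t padding;
};

/* Compile-time checks that the explicit layout is what we expect. */
_Static_assert(sizeof(struct range_explicit) == 24,
               "explicit layout must be 24 bytes");
_Static_assert(offsetof(struct range_explicit, padding) == 20,
               "padding must follow asid directly");
```

Building such a header with -Wpadded, as the commit message does, reports the implicit variant and stays silent for the explicit one.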
2026-02-04 | vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr() | Cindy Lu
Improve MAC address handling in mlx5_vdpa_set_attr() to ensure that old MAC entries are properly removed from the MPFS table before adding a new one. The new MAC address is then added to both the MPFS and VLAN tables. This change fixes an issue where the updated MAC address would not take effect until QEMU was rebooted. Signed-off-by: Cindy Lu <lulu@redhat.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260126094848.9601-4-lulu@redhat.com>
2026-02-04 | vdpa/mlx5: reuse common function for MAC address updates | Cindy Lu
Factor out MAC address update logic and reuse it from handle_ctrl_mac(). This ensures that old MAC entries are removed from the MPFS table before adding a new one and that the forwarding rules are updated accordingly. If updating the flow table fails, the original MAC and rules are restored as much as possible to keep the software and hardware state consistent. Signed-off-by: Cindy Lu <lulu@redhat.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260126094848.9601-3-lulu@redhat.com>
2026-02-04 | vdpa/mlx5: update mlx_features with driver state check | Cindy Lu
Add logic in mlx5_vdpa_set_attr() to ensure the VIRTIO_NET_F_MAC feature bit is properly set only when the device is not yet in the DRIVER_OK (running) state. This makes the MAC address visible in the output of: vdpa dev config show -jp when the device is created without an initial MAC address. Signed-off-by: Cindy Lu <lulu@redhat.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260126094848.9601-2-lulu@redhat.com>
2026-01-28 | vduse: bump version number | Eugenio Pérez
Finalize the series by advertising VDUSE API v1 support to userspace. Now that all required infrastructure for v1 (ASIDs, VQ groups, update_iotlb_v2) is in place, VDUSE devices can opt in to the new features. Assume API version 0 if the VDUSE instance does not call VDUSE_GET_API_VERSION to maintain compatibility. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-13-eperezma@redhat.com>
2026-01-28 | vduse: add vq group asid support | Eugenio Pérez
Add support for assigning Address Space Identifiers (ASIDs) to each VQ group. This enables mapping each group into a distinct memory space. The vq group to ASID association is now protected by a rwlock. The mutex domain_lock still protects the domains of all ASIDs, as some operations, such as those related to the bounce buffer size, still require locking all the ASIDs. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-12-eperezma@redhat.com>
2026-01-28 | vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls | Eugenio Pérez
The next patch adds a new ioctl with the ASID member per entry. Abstract these two so it can be built on top easily. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-11-eperezma@redhat.com>
2026-01-28 | vduse: take out allocations from vduse_dev_alloc_coherent | Eugenio Pérez
The function vduse_dev_alloc_coherent will be called under an rwlock in later patches. Move its allocations out of the lock to avoid increasing its failure rate. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-10-eperezma@redhat.com>
2026-01-28 | vduse: remove unused vaddr parameter of vduse_domain_free_coherent | Eugenio Pérez
We will modify the function in next patches so let's clean it first. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-9-eperezma@redhat.com>
2026-01-28 | vduse: refactor vdpa_dev_add for goto err handling | Eugenio Pérez
Next patches introduce more error paths in this function. Refactor it so they can be accommodated through gotos. Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-8-eperezma@redhat.com>
2026-01-28 | vduse: return internal vq group struct as map token | Eugenio Pérez
Return the internal struct that represents the vq group as virtqueue map token, instead of the device. This allows the map functions to access the information per group. At this moment all the virtqueues share the same vq group, that only can point to ASID 0. This change prepares the infrastructure for actual per-group address space handling Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-5-eperezma@redhat.com>
2026-01-28 | vduse: add vq group support | Eugenio Pérez
This allows separating the different virtqueues into groups that share the same address space. Ask the VDUSE device for the groups of the vq at the beginning, as they're needed for the DMA API. Allocate 3 vq groups, as net is the device that needs the most groups: * Dataplane (guest passthrough) * CVQ * Shadowed vrings. Future versions of the series can include dynamic allocation of the groups array so VDUSE can declare more groups. Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-4-eperezma@redhat.com>
2026-01-28 | vhost: move vdpa group bound check to vhost_vdpa | Eugenio Pérez
Remove duplication by consolidating these here. This reduces the possibility of a parent driver missing them. While we're at it, fix a bug in vdpa_sim where a valid ASID can be assigned to a group equal to ngroups, causing an out-of-bounds write. Cc: stable@vger.kernel.org Fixes: bda324fd037a ("vdpasim: control virtqueue support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260119143306.1818855-2-eperezma@redhat.com>
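The off-by-one described above can be illustrated with a small userspace sketch. The names `struct sim_dev`, `set_group_asid`, `NGROUPS`, and `NAS` are hypothetical and the table sizes are arbitrary; the point is only that a group index equal to the number of groups must be rejected, which requires `>=` rather than `>`.

```c
#include <assert.h>
#include <errno.h>

#define NGROUPS 2 /* hypothetical number of vq groups */
#define NAS     2 /* hypothetical number of address spaces */

struct sim_dev {
    unsigned int group_asid[NGROUPS];
};

/*
 * Bind a group to an ASID. Valid group indexes are 0..NGROUPS-1, so the
 * check must use '>='; with '>' a group equal to NGROUPS would write
 * one element past the end of group_asid[].
 */
static int set_group_asid(struct sim_dev *dev, unsigned int group,
                          unsigned int asid)
{
    if (group >= NGROUPS)
        return -EINVAL;
    if (asid >= NAS)
        return -EINVAL;

    dev->group_asid[group] = asid;
    return 0;
}
```

Doing this check once in the common caller, as the commit describes, means individual parent drivers no longer have to repeat it.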
2025-12-05 | Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds
Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOMIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. 
These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() 
failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. 
This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce 
VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
2025-11-27 | vduse: add WQ_PERCPU to alloc_workqueue users | Marco Crivellari
Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20251107154917.313090-3-marco.crivellari@suse.com>
2025-11-27 | vdpa/pds: use %pe for ERR_PTR() in event handler registration | Alok Tiwari
Use %pe instead of %ps when printing ERR_PTR() values. %ps is intended for string pointers, while %pe correctly prints symbolic error names for error pointers returned via ERR_PTR(). This shows the returned error value more clearly. Fixes: 67f27b8b3a34 ("pds_vdpa: subscribe to the pds_core events") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20251018174705.1511982-1-alok.a.tiwari@oracle.com>
2025-11-27 | virtio: vdpa: Fix reference count leak in octep_sriov_enable() | Miaoqian Lin
pci_get_device() will increase the reference count for the returned pci_dev, and also decrease the reference count for the input parameter 'from' if it is not NULL. If we break out of the loop with 'vf_pdev' not NULL, we need to call pci_dev_put() to decrease the reference count. Found via static analysis and this is similar to commit c508eb042d97 ("perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology()") Fixes: 8b6c724cdab8 ("virtio: vdpa: vDPA driver for Marvell OCTEON DPU devices") Cc: stable@vger.kernel.org Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20251027060737.33815-1-linmq006@gmail.com>
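The reference-counting rule can be modelled in userspace. Nothing below is the PCI API: `struct fake_dev`, `fake_get_next`, `fake_put`, and `find_dev` are hypothetical stand-ins that only imitate the behaviour described in the commit message, where the iterator drops the reference on its input and takes one on the device it returns.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev {
    int refcount;
    int id;
};

/* A tiny fixed "bus" of three devices for the illustration. */
static struct fake_dev devs[3] = { { 0, 10 }, { 0, 11 }, { 0, 12 } };

/*
 * Model of the iterator: drop the reference held on 'from' (if any),
 * then take a reference on the next device and return it.
 */
static struct fake_dev *fake_get_next(struct fake_dev *from)
{
    size_t next = from ? (size_t)(from - devs) + 1 : 0;

    if (from)
        from->refcount--;
    if (next >= 3)
        return NULL;
    devs[next].refcount++;
    return &devs[next];
}

/* Model of the put operation; NULL is accepted and ignored. */
static void fake_put(struct fake_dev *dev)
{
    if (dev)
        dev->refcount--;
}

/* Return 1 if a device with 'wanted_id' exists, without leaking a ref. */
static int find_dev(int wanted_id)
{
    struct fake_dev *dev = NULL;
    int found = 0;

    while ((dev = fake_get_next(dev)) != NULL) {
        if (dev->id == wanted_id) {
            found = 1;
            break; /* we still hold a reference on 'dev' here */
        }
    }
    /* 'dev' is non-NULL only after an early break: release that ref. */
    fake_put(dev);
    return found;
}
```

If the loop runs to completion, the iterator itself has already released every reference, so only the early-exit path needs the explicit put.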
2025-11-27vdpa/mlx5: Fix incorrect error code reporting in query_virtqueuesAlok Tiwari
When query_virtqueues() fails, the error log prints the variable err instead of cmd->err. Since err may still be zero at this point, the log message can misleadingly report a success value 0 even though the command actually failed. Even worse, once err is set to the first failure, subsequent logs print that same stale value. This makes the error reporting appear one step behind the actual failing queue index, which is confusing and misleading. Fix the log to report cmd->err, which reflects the real failure code returned by the firmware. Fixes: 1fcdf43ea69e ("vdpa/mlx5: Use async API for vq query command") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20250929134258.80956-1-alok.a.tiwari@oracle.com>
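The stale-value effect described in the commit above can be illustrated with a small userspace sketch. This is not the driver code: `struct vq_cmd`, `collect_stale()` and `collect_fixed()` are hypothetical names, and the `logged` array stands in for what each error message would print.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one async query command and its result. */
struct vq_cmd {
	int err;
};

/*
 * Pre-fix behaviour: the log prints the accumulated 'err', which is still 0
 * at the first failure and stays at the first failure code afterwards.
 * 'logged' must have room for n entries.
 */
static int collect_stale(const struct vq_cmd *cmds, size_t n,
			 int *logged, size_t *nlogged)
{
	int err = 0;

	assert(cmds && logged && nlogged);
	*nlogged = 0;
	for (size_t i = 0; i < n; i++) {
		if (!cmds[i].err)
			continue;
		logged[(*nlogged)++] = err;	/* stale value is reported */
		if (!err)
			err = cmds[i].err;
	}
	return err;
}

/* Post-fix behaviour: the log prints the failing command's own error. */
static int collect_fixed(const struct vq_cmd *cmds, size_t n,
			 int *logged, size_t *nlogged)
{
	int err = 0;

	assert(cmds && logged && nlogged);
	*nlogged = 0;
	for (size_t i = 0; i < n; i++) {
		if (!cmds[i].err)
			continue;
		logged[(*nlogged)++] = cmds[i].err;	/* real failure code */
		if (!err)
			err = cmds[i].err;
	}
	return err;
}
```

Both variants return the first failure code; only the values that reach the log differ.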
2025-11-16mm: make INVALID_PHYS_ADDR a generic macroAnshuman Khandual
INVALID_PHYS_ADDR has very similar definitions across the code base. Hence just move that inside header <linux/mm.h> for more generic usage. Also drop the now redundant ones which are no longer required. Link: https://lkml.kernel.org/r/20251021025638.2420216-1-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-11mlx5: Fix default values in create CQAkiva Goldberger
Currently, CQs without a completion function are assigned the mlx5_add_cq_to_tasklet function by default. This is problematic since only user CQs created through the mlx5_ib driver are intended to use this function. Additionally, all CQs that will use doorbells instead of polling for completions must call mlx5_cq_arm. However, the default CQ creation flow leaves a valid value in the CQ's arm_db field, allowing FW to send interrupts to polling-only CQs in certain corner cases. These two factors would allow a polling-only kernel CQ to be triggered by an EQ interrupt and call a completion function intended only for user CQs, causing a null pointer exception. Some areas in the driver have prevented this issue with one-off fixes but did not address the root cause. This patch fixes the described issue by adding defaults to the create CQ flow. It adds a default dummy completion function to protect against null pointer exceptions, and it sets an invalid command sequence number by default in kernel CQs to prevent the FW from sending an interrupt to the CQ until it is armed. User CQs are responsible for their own initialization values. Callers of mlx5_core_create_cq are responsible for changing the completion function and arming the CQ per their needs. Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Leon Romanovsky <leon@kernel.org> Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-01vduse: Use fixed 4KB bounce pages for non-4KB page sizeSheng Zhao
The allocation granularity of bounce pages is PAGE_SIZE. This may cause even small IO requests to occupy an entire bounce page exclusively. This kind of memory waste will be more significant when PAGE_SIZE is larger than 4KB (e.g. arm64 with 64KB pages). So, optimize it by using fixed 4KB bounce maps and iova allocation granularity. A single IO request occupies at least a 4KB bounce page instead of the entire memory page of PAGE_SIZE. Signed-off-by: Sheng Zhao <sheng.zhao@bytedance.com> Message-Id: <20250925113516.60305-1-sheng.zhao@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
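The memory-waste arithmetic in the commit above can be sketched as a rounding calculation. This is a userspace illustration only, assuming that a request holds a whole number of allocation granules; `bounce_bytes()` is a hypothetical helper, not a VDUSE function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of bounce memory held by one IO request of 'len' bytes when the
 * allocation granularity is 'granule' bytes (len rounded up to a multiple
 * of granule).
 */
static size_t bounce_bytes(size_t len, size_t granule)
{
	assert(granule != 0);
	return (len + granule - 1) / granule * granule;
}
```

Under these assumptions, a 512-byte request holds 65536 bytes with a 64KB granule but only 4096 bytes with a fixed 4KB granule.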
2025-10-01vduse: switch to use virtio map API instead of DMA APIJason Wang
Because virtio lacked support for device-specific mappings, VDUSE had to trick the DMA API in order to make the virtio-vdpa transport work. This was done by advertising the vDPA device as a DMA device with VDUSE-specific dma_ops, even though it doesn't do DMA at all. This patch fixes that. Thanks to the new mapping operations supported by virtio and vDPA, VDUSE can simply advertise its specific mapping operations to virtio via virtio-vdpa; the DMA API is then no longer needed for VDUSE, and the iova domain can be used as the mapping token instead. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250924070045.10361-3-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2025-10-01vdpa: introduce map opsJason Wang
Virtio core allows the transport to provide device or transport specific mapping functions. This patch adds this support to vDPA. We can simply do this by allowing the vDPA parent to register a virtio_map_ops. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250924070045.10361-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2025-10-01vdpa: support virtio_mapJason Wang
Virtio core switches from DMA device to virtio_map, let's do that as well for vDPA. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250821064641.5025-8-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2025-08-01vdpa: Fix IDR memory leak in VDUSE module exitAnders Roxell
Add missing idr_destroy() call in vduse_exit() to properly free the vduse_idr radix tree nodes. Without this, module load/unload cycles leak 576-byte radix tree node allocations, detectable by kmemleak as: unreferenced object (size 576): backtrace: [<ffffffff81234567>] radix_tree_node_alloc+0xa0/0xf0 [<ffffffff81234568>] idr_get_free+0x128/0x280 The vduse_idr is initialized via DEFINE_IDR() at line 136 and used throughout the VDUSE (vDPA Device in Userspace) driver for device ID management. The fix follows the documented pattern in lib/idr.c and matches the cleanup approach used by other drivers. This leak was discovered through comprehensive module testing with cumulative kmemleak detection across 10 load/unload iterations per module. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Message-Id: <20250704125335.1084649-1-anders.roxell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01vdpa/mlx5: Fix release of uninitialized resources on error pathDragos Tatulea
The commit in the fixes tag made sure that mlx5_vdpa_free() is the single entrypoint for removing the vdpa device resources added in mlx5_vdpa_dev_add(), even in the cleanup path of mlx5_vdpa_dev_add(). This means that all functions from mlx5_vdpa_free() should be able to handle uninitialized resources. This was not the case though: mlx5_vdpa_destroy_mr_resources() and mlx5_cmd_cleanup_async_ctx() were not able to do so. This caused the splat below when adding a vdpa device without a MAC address. This patch fixes these remaining issues: - Makes mlx5_vdpa_destroy_mr_resources() return early if called on uninitialized resources. - Moves mlx5_cmd_init_async_ctx() early on during device addition because it can't fail. This means that mlx5_cmd_cleanup_async_ctx() also can't fail. To mirror this, move the call site of mlx5_cmd_cleanup_async_ctx() in mlx5_vdpa_free(). An additional comment was added in mlx5_vdpa_free() to document the expectations of functions called from this context. Splat: mlx5_core 0000:b5:03.2: mlx5_vdpa_dev_add:3950:(pid 2306) warning: No mac address provisioned? ------------[ cut here ]------------ WARNING: CPU: 13 PID: 2306 at kernel/workqueue.c:4207 __flush_work+0x9a/0xb0 [...] Call Trace: <TASK> ? __try_to_del_timer_sync+0x61/0x90 ? __timer_delete_sync+0x2b/0x40 mlx5_vdpa_destroy_mr_resources+0x1c/0x40 [mlx5_vdpa] mlx5_vdpa_free+0x45/0x160 [mlx5_vdpa] vdpa_release_dev+0x1e/0x50 [vdpa] device_release+0x31/0x90 kobject_cleanup+0x37/0x130 mlx5_vdpa_dev_add+0x327/0x890 [mlx5_vdpa] vdpa_nl_cmd_dev_add_set_doit+0x2c1/0x4d0 [vdpa] genl_family_rcv_msg_doit+0xd8/0x130 genl_family_rcv_msg+0x14b/0x220 ? __pfx_vdpa_nl_cmd_dev_add_set_doit+0x10/0x10 [vdpa] genl_rcv_msg+0x47/0xa0 ? __pfx_genl_rcv_msg+0x10/0x10 netlink_rcv_skb+0x53/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x27b/0x3b0 netlink_sendmsg+0x1f7/0x430 __sys_sendto+0x1fa/0x210 ? ___pte_offset_map+0x17/0x160 ? next_uptodate_folio+0x85/0x2b0 ? percpu_counter_add_batch+0x51/0x90 ? 
filemap_map_pages+0x515/0x660 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x7b/0x2c0 ? do_read_fault+0x108/0x220 ? do_pte_missing+0x14a/0x3e0 ? __handle_mm_fault+0x321/0x730 ? count_memcg_events+0x13f/0x180 ? handle_mm_fault+0x1fb/0x2d0 ? do_user_addr_fault+0x20c/0x700 ? syscall_exit_work+0x104/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f0c25b0feca [...] ---[ end trace 0000000000000000 ]--- Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Fixes: 83e445e64f48 ("vdpa/mlx5: Fix error path during device add") Reported-by: Wenli Quan <wquan@redhat.com> Closes: https://lore.kernel.org/virtualization/CADZSLS0r78HhZAStBaN1evCSoPqRJU95Lt8AqZNJ6+wwYQ6vPQ@mail.gmail.com/ Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20250708120424.2363354-2-dtatulea@nvidia.com> Tested-by: Wenli Quan <wquan@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01vdpa/mlx5: Fix needs_teardown flag calculationDragos Tatulea
needs_teardown is a device flag that indicates when virtual queues need to be recreated. This happens for certain configuration changes: queue size and some specific features. Currently, the needs_teardown state can be incorrectly reset by subsequent .set_vq_num() calls. For example, for 1 rx VQ with size 512 and 1 tx VQ with size 256: .set_vq_num(0, 512) -> sets needs_teardown to true (rx queue has a non-default size) .set_vq_num(1, 256) -> sets needs_teardown to false (tx queue has a default size) This change takes into account the previous value of the needs_teardown flag when re-calculating it during VQ size configuration. Fixes: 0fe963d6fc16 ("vdpa/mlx5: Re-create HW VQs under certain conditions") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <20250604184802.2625300-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
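The flag reset described in the commit above can be reproduced with a minimal userspace sketch. The names `struct dev_state`, `set_vq_num_overwrite()`, `set_vq_num_sticky()` and `DEFAULT_VQ_NUM` are hypothetical, and the default size of 256 is assumed only to match the example in the commit message; the real driver logic may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical default queue size, chosen to match the example above. */
#define DEFAULT_VQ_NUM 256u

/* Hypothetical stand-in for the per-device state. */
struct dev_state {
	bool needs_teardown;
};

/* Pre-fix behaviour: every call overwrites the flag, so a later queue with
 * the default size clears a teardown request made for an earlier queue. */
static void set_vq_num_overwrite(struct dev_state *d, uint16_t idx, uint32_t num)
{
	assert(d != NULL);
	(void)idx;
	d->needs_teardown = (num != DEFAULT_VQ_NUM);
}

/* Post-fix behaviour: the previous value is taken into account, so the
 * flag stays set once any queue has requested a non-default size. */
static void set_vq_num_sticky(struct dev_state *d, uint16_t idx, uint32_t num)
{
	assert(d != NULL);
	(void)idx;
	d->needs_teardown = d->needs_teardown || (num != DEFAULT_VQ_NUM);
}
```

Calling either helper with (0, 512) and then (1, 256) shows the difference: the overwriting variant ends with the flag cleared, the sticky variant keeps it set.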
2025-05-27vdpa/octeon_ep: Control PCI dev enabling manuallyPhilipp Stanner
PCI region request functions such as pci_request_region() currently have the problem of becoming sometimes managed functions, if pcim_enable_device() instead of pci_enable_device() was called. The PCI subsystem wants to remove this deprecated behavior from its interfaces. octeon_ep enables its device with pcim_enable_device() (for VF. PF uses manual management), but does so only to get automatic disablement. The driver wants to manage its PCI resources for VF manually, without devres. The easiest way not to use automatic resource management at all is by also handling device enable- and disablement manually. Replace pcim_enable_device() with pci_enable_device(). Add the necessary calls to pci_disable_device(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Acked-by: Vamsi Attunuru <vattunuru@marvell.com> Message-Id: <20250508085134.24084-2-phasta@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2025-02-25vduse: add virtio_fs to allowed dev idEugenio Pérez
A VDUSE device that implements virtiofs device works fine just by adding the device id to the whitelist. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20250121103346.1030165-1-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-02-25vdpa/mlx5: Fix oversized null mkey longer than 32bitSi-Wei Liu
create_user_mr() has correct code to count the number of null keys used to fill in a hole for the memory map. However, fill_indir() does not follow the same to cap the range up to the 1GB limit correspondingly. Fill in more null keys for the gaps in between, so that null keys are correctly populated. Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Cc: stable@vger.kernel.org Reported-by: Cong Meng <cong.meng@oracle.com> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20250220193732.521462-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
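The capping described in the commit above amounts to splitting a hole into chunks of at most 1GB, one null key per chunk. The sketch below is a userspace illustration under that assumption; `MAX_NULL_KEY_SIZE`, `null_keys_for_hole()` and `null_key_len()` are hypothetical names, not the driver's identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-key cap of 1GB, taken from the limit named in the commit. */
#define MAX_NULL_KEY_SIZE (1ULL << 30)

/* Number of null keys needed to cover a hole of 'hole' bytes. */
static uint64_t null_keys_for_hole(uint64_t hole)
{
	return hole / MAX_NULL_KEY_SIZE + (hole % MAX_NULL_KEY_SIZE != 0);
}

/* Length covered by the i-th null key (0-based); only the last key may be
 * shorter than the cap. */
static uint64_t null_key_len(uint64_t hole, uint64_t i)
{
	uint64_t remaining;

	assert(i < null_keys_for_hole(hole));
	remaining = hole - i * MAX_NULL_KEY_SIZE;
	return remaining < MAX_NULL_KEY_SIZE ? remaining : MAX_NULL_KEY_SIZE;
}
```

With this split, no single key ever describes more than 1GB, so its length always fits in 32 bits.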