summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2025-02-23RDMA/bnxt_re: Fix the page details for the srq created by kernel consumersKashyap Desai
While using nvme target with use_srq on, below kernel panic is noticed. [ 549.698111] bnxt_en 0000:41:00.0 enp65s0np0: FEC autoneg off encoding: Clause 91 RS(544,514) [ 566.393619] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI .. [ 566.393799] <TASK> [ 566.393807] ? __die_body+0x1a/0x60 [ 566.393823] ? die+0x38/0x60 [ 566.393835] ? do_trap+0xe4/0x110 [ 566.393847] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393867] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393881] ? do_error_trap+0x7c/0x120 [ 566.393890] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393911] ? exc_divide_error+0x34/0x50 [ 566.393923] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393939] ? asm_exc_divide_error+0x16/0x20 [ 566.393966] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393997] bnxt_qplib_create_srq+0xc9/0x340 [bnxt_re] [ 566.394040] bnxt_re_create_srq+0x335/0x3b0 [bnxt_re] [ 566.394057] ? srso_return_thunk+0x5/0x5f [ 566.394068] ? __init_swait_queue_head+0x4a/0x60 [ 566.394090] ib_create_srq_user+0xa7/0x150 [ib_core] [ 566.394147] nvmet_rdma_queue_connect+0x7d0/0xbe0 [nvmet_rdma] [ 566.394174] ? lock_release+0x22c/0x3f0 [ 566.394187] ? srso_return_thunk+0x5/0x5f Page size and shift info is set only for the user space SRQs. Set page size and page shift for kernel space SRQs also. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1740237621-29291-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-23RDMA/mlx5: Fix bind QP error cleanup flowPatrisious Haddad
When there is a failure during bind QP, the cleanup flow destroys the counter regardless if it is the one that created it or not, which is problematic since if it isn't the one that created it, that counter could still be in use. Fix that by destroying the counter only if it was created during this call. Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://patch.msgid.link/25dfefddb0ebefa668c32e06a94d84e3216257cf.1740033937.git.leon@kernel.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-20RDMA/mlx5: Fix AH static rate parsingPatrisious Haddad
Previously static rate wasn't translated according to our PRM but simply used the 4 lower bytes. Correctly translate static rate value passed in AH creation attribute according to our PRM expected values. In addition change 800GB mapping to zero, which is the PRM specified value. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://patch.msgid.link/18ef4cc5396caf80728341eb74738cd777596f60.1739187089.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-20RDMA/mlx5: Fix implicit ODP hang on parent deregistrationYishai Hadas
Fix the destroy_unused_implicit_child_mr() to prevent hanging during parent deregistration as of below [1]. Upon entering destroy_unused_implicit_child_mr(), the reference count for the implicit MR parent is incremented using: refcount_inc_not_zero(). A corresponding decrement must be performed if free_implicit_child_mr_work() is not called. The code has been updated to properly manage the reference count that was incremented. [1] INFO: task python3:2157 blocked for more than 120 seconds. Not tainted 6.12.0-rc7+ #1633 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:python3 state:D stack:0 pid:2157 tgid:2157 ppid:1685 flags:0x00000000 Call Trace: <TASK> __schedule+0x420/0xd30 schedule+0x47/0x130 __mlx5_ib_dereg_mr+0x379/0x5d0 [mlx5_ib] ? __pfx_autoremove_wake_function+0x10/0x10 ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs] uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs] uobj_destroy+0x3f/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs] ? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs] ? lock_acquire+0xc1/0x2f0 ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0x116/0x170 [ib_uverbs] ? lock_release+0xc6/0x280 ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] __x64_sys_ioctl+0x1b0/0xa70 ? kmem_cache_free+0x221/0x400 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f20f21f017b RSP: 002b:00007ffcfc4a77c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffcfc4a78d8 RCX: 00007f20f21f017b RDX: 00007ffcfc4a78c0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffcfc4a78a0 R08: 000056147d125190 R09: 00007f20f1f14c60 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffcfc4a7890 R13: 000000000000001c R14: 000056147d100fc0 R15: 00007f20e365c9d0 </TASK> Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Link: https://patch.msgid.link/80f2fcd19952dfa7d9981d93fd6359b4471f8278.1739186929.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Fix the statistics for Gen P7 VFSelvin Xavier
Gen P7 VF support the extended stats and is prevented by a VF check. Fix the check to issue the FW command for GenP7 VFs also. Fixes: 1801d87b3598 ("RDMA/bnxt_re: Support new 5760X P7 devices") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Fix issue in the unload pathKalesh AP
The cited comment removed the netdev notifier register call from the driver. But, it did not remove the cleanup code from the unload path. As a result, driver unload is not clean and resulted in undesired behaviour. Fixes: d3b15fcc4201 ("RDMA/bnxt_re: Remove deliver net device event") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Add sanity checks on rdev validityKalesh AP
There is a possibility that ulp_irq_stop and ulp_irq_start callbacks will be called when the device is in detached state. This can cause a crash due to NULL pointer dereference as the rdev is already freed. Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Fix an issue in bnxt_re_async_notifierKalesh AP
In the bnxt_re_async_notifier() callback, the way driver retrieves rdev pointer is wrong. The rdev pointer should be parsed from adev pointer as while registering with the L2 for ULP, driver uses the aux device pointer for the handle. Fixes: 7fea32784068 ("RDMA/bnxt_re: Add Async event handling support") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-09RDMA/hns: Fix mbox timing out by adding retry mechanismJunxian Huang
If a QP is modified to error state and a flush CQE process is triggered, the subsequent QP destruction mbox can still be successfully posted but will be blocked in HW until the flush CQE process finishes. This causes further mbox posting timeouts in driver. The blocking time is related to QP depth. Considering an extreme case where SQ depth and RQ depth are both 32K, the blocking time can reach about 135ms. This patch adds a retry mechanism for mbox posting. For each try, FW waits 15ms for HW to complete the previous mbox, otherwise return a timeout error code to driver. Counting other time consumption in FW, set 8 tries for mbox posting and a 5ms time gap before each retry to increase to a sufficient timeout limit. Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250208105930.522796-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mana_ib: Allocate PAGE aligned doorbell indexKonstantin Taranov
Allocate a PAGE aligned doorbell index to ensure each process gets a separate PAGE sized doorbell area space remapped to it in mana_ib_mmap Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1738751405-15041-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mlx5: Fix a WARN during dereg_mr for DM typeYishai Hadas
Memory regions (MR) of type DM (device memory) do not have an associated umem. In the __mlx5_ib_dereg_mr() -> mlx5_free_priv_descs() flow, the code incorrectly takes the wrong branch, attempting to call dma_unmap_single() on a DMA address that is not mapped. This results in a WARN [1], as shown below. The issue is resolved by properly accounting for the DM type and ensuring the correct branch is selected in mlx5_free_priv_descs(). [1] WARNING: CPU: 12 PID: 1346 at drivers/iommu/dma-iommu.c:1230 iommu_dma_unmap_page+0x79/0x90 Modules linked in: ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry ovelay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core CPU: 12 UID: 0 PID: 1346 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1631 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:iommu_dma_unmap_page+0x79/0x90 Code: 2b 49 3b 29 72 26 49 3b 69 08 73 20 4d 89 f0 44 89 e9 4c 89 e2 48 89 ee 48 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 07 b8 88 ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 66 0f 1f 44 00 RSP: 0018:ffffc90001913a10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810194b0a8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88810194b0a8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f537abdd740(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f537aeb8000 CR3: 000000010c248001 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x84/0x190 ? iommu_dma_unmap_page+0x79/0x90 ? report_bug+0xf8/0x1c0 ? handle_bug+0x55/0x90 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? iommu_dma_unmap_page+0x79/0x90 dma_unmap_page_attrs+0xe6/0x290 mlx5_free_priv_descs+0xb0/0xe0 [mlx5_ib] __mlx5_ib_dereg_mr+0x37e/0x520 [mlx5_ib] ? _raw_spin_unlock_irq+0x24/0x40 ? wait_for_completion+0xfe/0x130 ? rdma_restrack_put+0x63/0xe0 [ib_core] ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs] uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs] uobj_destroy+0x3f/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs] ? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs] ? lock_acquire+0xc1/0x2f0 ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0x116/0x170 [ib_uverbs] ? lock_release+0xc6/0x280 ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] __x64_sys_ioctl+0x1b0/0xa70 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f537adaf17b Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffff218f0b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffff218f1d8 RCX: 00007f537adaf17b RDX: 00007ffff218f1c0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffff218f1a0 R08: 00007f537aa8d010 R09: 0000561ee2e4f270 R10: 00007f537aace3a8 R11: 0000000000000246 R12: 00007ffff218f190 R13: 000000000000001c R14: 0000561ee2e4d7c0 R15: 00007ffff218f450 </TASK> Fixes: f18ec4223117 ("RDMA/mlx5: Use a union inside mlx5_ib_mr") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/2039c22cfc3df02378747ba4d623a558b53fc263.1738587076.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mlx5: Fix a race for DMABUF MR which can lead to CQE with errorYishai Hadas
This patch addresses a potential race condition for a DMABUF MR that can result in a CQE with an error on the UMR QP. During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs: mlx5_revoke_mr() mlx5r_umr_revoke_mr() mlx5r_umr_post_send_wait() At this point, the lkey is freed from the hardware's perspective. However, concurrently, mlx5_ib_dmabuf_invalidate_cb() might be triggered by another task attempting to invalidate the MR having that freed lkey. Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state. To resolve this race condition, the dma_resv_lock() which was hold as part of the mlx5_ib_dmabuf_invalidate_cb() is now also acquired as part of the mlx5_revoke_mr() scope. Upon a successful revoke, we set umem_dmabuf->private which points to that MR to NULL, preventing any further invalidation attempts on its lkey. Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@mnvidia.com> Link: https://patch.msgid.link/70617067abbfaa0c816a2544c922e7f4346def58.1738587016.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-03IB/mlx5: Set and get correct qp_num for a DCT QPMark Zhang
When a DCT QP is created on an active lag, it's dctc.port is assigned in a round-robin way, which is from 1 to dev->lag_port. In this case when querying this QP, we may get qp_attr.port_num > 2. Fix this by setting qp->port when modifying a DCT QP, and read port_num from qp->port instead of dctc.port when querying it. Fixes: 7c4b1ab9f167 ("IB/mlx5: Add DCT RoCE LAG support") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/94c76bf0adbea997f87ffa27674e0a7118ad92a9.1737290358.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-03RDMA/mlx5: Fix the recovery flow of the UMR QPYishai Hadas
This patch addresses an issue in the recovery flow of the UMR QP, ensuring tasks do not get stuck, as highlighted by the call trace [1]. During recovery, before transitioning the QP to the RESET state, the software must wait for all outstanding WRs to complete. Failing to do so can cause the firmware to skip sending some flushed CQEs with errors and simply discard them upon the RESET, as per the IB specification. This race condition can result in lost CQEs and tasks becoming stuck. To resolve this, the patch sends a final WR which serves only as a barrier before moving the QP state to RESET. Once a CQE is received for that final WR, it guarantees that no outstanding WRs remain, making it safe to transition the QP to RESET and subsequently back to RTS, restoring proper functionality. Note: For the barrier WR, we simply reuse the failed and ready WR. Since the QP is in an error state, it will only receive IB_WC_WR_FLUSH_ERR. However, as it serves only as a barrier we don't care about its status. [1] INFO: task rdma_resource_l:1922 blocked for more than 120 seconds. Tainted: G W 6.12.0-rc7+ #1626 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:rdma_resource_l state:D stack:0 pid:1922 tgid:1922 ppid:1369 flags:0x00004004 Call Trace: <TASK> __schedule+0x420/0xd30 schedule+0x47/0x130 schedule_timeout+0x280/0x300 ? mark_held_locks+0x48/0x80 ? lockdep_hardirqs_on_prepare+0xe5/0x1a0 wait_for_completion+0x75/0x130 mlx5r_umr_post_send_wait+0x3c2/0x5b0 [mlx5_ib] ? __pfx_mlx5r_umr_done+0x10/0x10 [mlx5_ib] mlx5r_umr_revoke_mr+0x93/0xc0 [mlx5_ib] __mlx5_ib_dereg_mr+0x299/0x520 [mlx5_ib] ? _raw_spin_unlock_irq+0x24/0x40 ? wait_for_completion+0xfe/0x130 ? rdma_restrack_put+0x63/0xe0 [ib_core] ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs] uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs] uobj_destroy+0x3f/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs] ? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs] ? __lock_acquire+0x64e/0x2080 ? mark_held_locks+0x48/0x80 ? find_held_lock+0x2d/0xa0 ? lock_acquire+0xc1/0x2f0 ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] ? __fget_files+0xc3/0x1b0 ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] __x64_sys_ioctl+0x1b0/0xa70 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f99c918b17b RSP: 002b:00007ffc766d0468 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffc766d0578 RCX: 00007f99c918b17b RDX: 00007ffc766d0560 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffc766d0540 R08: 00007f99c8f99010 R09: 000000000000bd7e R10: 00007f99c94c1c70 R11: 0000000000000246 R12: 00007ffc766d0530 R13: 000000000000001c R14: 0000000040246a80 R15: 0000000000000000 </TASK> Fixes: 158e71bb69e3 ("RDMA/mlx5: Add a umr recovery flow") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/27b51b92ec42dfb09d8096fcbd51878f397ce6ec.1737290141.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-26Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ...
2025-01-26Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, fnic, qla2xx, mpi3mr). The major core change is the renaming of the slave_ methods plus a bit of constification. The rest are minor updates and fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (103 commits) scsi: fnic: Propagate SCSI error code from fnic_scsi_drv_init() scsi: fnic: Test for memory allocation failure and return error code scsi: fnic: Return appropriate error code from failure of scsi drv init scsi: fnic: Return appropriate error code for mem alloc failure scsi: fnic: Remove always-true IS_FNIC_FCP_INITIATOR macro scsi: fnic: Fix use of uninitialized value in debug message scsi: fnic: Delete incorrect debugfs error handling scsi: fnic: Remove unnecessary else to fix warning in FDLS FIP scsi: fnic: Remove extern definition from .c files scsi: fnic: Remove unnecessary else and unnecessary break in FDLS scsi: mpi3mr: Fix possible crash when setting up bsg fails scsi: ufs: bsg: Set bsg_queue to NULL after removal scsi: ufs: bsg: Delete bsg_dev when setting up bsg fails scsi: st: Don't set pos_unknown just after device recognition scsi: aic7xxx: Fix build 'aicasm' warning scsi: Revert "scsi: ufs: core: Probe for EXT_IID support" scsi: storvsc: Ratelimit warning logs to prevent VM denial of service scsi: scsi_debug: Constify sdebug_driver_template scsi: documentation: Corrections for struct updates scsi: driver-api: documentation: Change what is added to docbook ...
2025-01-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Lighter that normal, but the now usual collection of driver fixes and small improvements: - Small fixes and minor improvements to cxgb4, bnxt_re, rxe, srp, efa, cxgb4 - Update mlx4 to use the new umem APIs, avoiding direct use of scatterlist - Support ROCEv2 in erdma - Remove various uncalled functions, constify bin_attribute - Provide core infrastructure to catch netdev events and route them to drivers, consolidating duplicated driver code - Fix rare race condition crashes in mlx5 ODP flows" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (63 commits) RDMA/mlx5: Fix implicit ODP use after free RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error RDMA/qib: Constify 'struct bin_attribute' RDMA/hfi1: Constify 'struct bin_attribute' RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]" RDMA/cxgb4: Notify rdma stack for IB_EVENT_QP_LAST_WQE_REACHED event RDMA/bnxt_re: Allocate dev_attr information dynamically RDMA/bnxt_re: Pass the context for ulp_irq_stop RDMA/bnxt_re: Add support to handle DCB_CONFIG_CHANGE event RDMA/bnxt_re: Query firmware defaults of CC params during probe RDMA/bnxt_re: Add Async event handling support bnxt_en: Add ULP call to notify async events RDMA/mlx5: Fix indirect mkey ODP page count MAINTAINERS: Update the bnxt_re maintainers RDMA/hns: Clean up the legacy CONFIG_INFINIBAND_HNS RDMA/rtrs: Add missing deinit() call RDMA/efa: Align interrupt related fields to same type RDMA/bnxt_re: Fix to drop reference to the mmap entry in case of error RDMA/mlx5: Fix link status down event for MPV RDMA/erdma: Support create_ah/destroy_ah in non-sleepable contexts ...
2025-01-21RDMA/mlx5: Fix implicit ODP use after freePatrisious Haddad
Prevent double queueing of implicit ODP mr destroy work by using __xa_cmpxchg() to make sure this is the only time we are destroying this specific mr. Without this change, we could try to invalidate this mr twice, which in turn could result in queuing a MR work destroy twice, and eventually the second work could execute after the MR was freed due to the first work, causing a user after free and trace below. refcount_t: underflow; use-after-free. WARNING: CPU: 2 PID: 12178 at lib/refcount.c:28 refcount_warn_saturate+0x12b/0x130 Modules linked in: bonding ib_ipoib vfio_pci ip_gre geneve nf_tables ip6_gre gre ip6_tunnel tunnel6 ipip tunnel4 ib_umad rdma_ucm mlx5_vfio_pci vfio_pci_core vfio_iommu_type1 mlx5_ib vfio ib_uverbs mlx5_core iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs] CPU: 2 PID: 12178 Comm: kworker/u20:5 Not tainted 6.5.0-rc1_net_next_mlx5_58c644e #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound free_implicit_child_mr_work [mlx5_ib] RIP: 0010:refcount_warn_saturate+0x12b/0x130 Code: 48 c7 c7 38 95 2a 82 c6 05 bc c6 fe 00 01 e8 0c 66 aa ff 0f 0b 5b c3 48 c7 c7 e0 94 2a 82 c6 05 a7 c6 fe 00 01 e8 f5 65 aa ff <0f> 0b 5b c3 90 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 8d 50 ff RSP: 0018:ffff8881008e3e40 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 RDX: ffff88852c91b5c8 RSI: 0000000000000001 RDI: ffff88852c91b5c0 RBP: ffff8881dacd4e00 R08: 00000000ffffffff R09: 0000000000000019 R10: 000000000000072e R11: 0000000063666572 R12: ffff88812bfd9e00 R13: ffff8881c792d200 R14: ffff88810011c005 R15: ffff8881002099c0 FS: 0000000000000000(0000) GS:ffff88852c900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5694b5e000 CR3: 00000001153f6003 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? refcount_warn_saturate+0x12b/0x130 free_implicit_child_mr_work+0x180/0x1b0 [mlx5_ib] process_one_work+0x1cc/0x3c0 worker_thread+0x218/0x3c0 kthread+0xc6/0xf0 ret_from_fork+0x1f/0x30 </TASK> Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/r/c96b8645a81085abff739e6b06e286a350d1283d.1737274283.git.leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-21RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with errorYishai Hadas
This patch addresses a race condition for an ODP MR that can result in a CQE with an error on the UMR QP. During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs: mlx5_revoke_mr() mlx5r_umr_revoke_mr() mlx5r_umr_post_send_wait() At this point, the lkey is freed from the hardware's perspective. However, concurrently, mlx5_ib_invalidate_range() might be triggered by another task attempting to invalidate a range for the same freed lkey. This task will: - Acquire the umem_odp->umem_mutex lock. - Call mlx5r_umr_update_xlt() on the UMR QP. - Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state [1]. To resolve this race condition, the umem_odp->umem_mutex lock is now also acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke, we set umem_odp->private which points to that MR to NULL, preventing any further invalidation attempts on its lkey. [1] From dmesg: infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2 WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] [..] Call Trace: <TASK> mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib] mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib] __mmu_notifier_invalidate_range_start+0x1e1/0x240 zap_page_range_single+0xf1/0x1a0 madvise_vma_behavior+0x677/0x6e0 do_madvise+0x1a2/0x4b0 __x64_sys_madvise+0x25/0x30 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/r/68a1e007c25b2b8fe5d625f238cc3b63e5341f77.1737290229.git.leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-19RDMA/qib: Constify 'struct bin_attribute'Thomas Weißschuh
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20250114-sysfs-const-bin_attr-infiniband-v1-2-397aaa94d453@weissschuh.net Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-19RDMA/hfi1: Constify 'struct bin_attribute'Thomas Weißschuh
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20250114-sysfs-const-bin_attr-infiniband-v1-1-397aaa94d453@weissschuh.net Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]"Zhu Yanjun
The Call Trace is as below: " <TASK> ? show_regs.cold+0x1a/0x1f ? __rxe_cleanup+0x12c/0x170 [rdma_rxe] ? __warn+0x84/0xd0 ? __rxe_cleanup+0x12c/0x170 [rdma_rxe] ? report_bug+0x105/0x180 ? handle_bug+0x46/0x80 ? exc_invalid_op+0x19/0x70 ? asm_exc_invalid_op+0x1b/0x20 ? __rxe_cleanup+0x12c/0x170 [rdma_rxe] ? __rxe_cleanup+0x124/0x170 [rdma_rxe] rxe_destroy_qp.cold+0x24/0x29 [rdma_rxe] ib_destroy_qp_user+0x118/0x190 [ib_core] rdma_destroy_qp.cold+0x43/0x5e [rdma_cm] rtrs_cq_qp_destroy.cold+0x1d/0x2b [rtrs_core] rtrs_srv_close_work.cold+0x1b/0x31 [rtrs_server] process_one_work+0x21d/0x3f0 worker_thread+0x4a/0x3c0 ? process_one_work+0x3f0/0x3f0 kthread+0xf0/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> " When too many rdma resources are allocated, rxe needs more time to handle these rdma resources. Sometimes with the current timeout, rxe can not release the rdma resources correctly. Compared with other rdma drivers, a bigger timeout is used. Fixes: 215d0a755e1b ("RDMA/rxe: Stop lookup of partially built objects") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250110160927.55014-1-yanjun.zhu@linux.dev Tested-by: Joe Klein <joe.klein812@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14RDMA/cxgb4: Notify rdma stack for IB_EVENT_QP_LAST_WQE_REACHED eventAnumula Murali Mohan Reddy
This patch sends IB_EVENT_QP_LAST_WQE_REACHED event on a QP that is in error state and associated with an SRQ. This behaviour is incorporated in flush_qp() which is called when QP transitions to error state. Supports SRQ drain functionality added by commit 844bc12e6da3 ("IB/core: add support for draining Shared receive queues") Fixes: 844bc12e6da3 ("IB/core: add support for draining Shared receive queues") Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://patch.msgid.link/20250107095053.81007-1-anumula@chelsio.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14RDMA/bnxt_re: Allocate dev_attr information dynamicallyKalesh AP
In order to optimize the size of driver private structure, the memory for dev_attr is allocated dynamically during the chip context initialization. In order to make certain runtime decisions, store dev_attr in the qplib_res structure. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1736446693-6692-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14RDMA/bnxt_re: Pass the context for ulp_irq_stopKalesh AP
ulp_irq_stop() can be invoked from a context where FW is healthy or when FW is in a reset state. In the latter case, ULP must stop all interactions with HW/FW and also with application and stack. Added a new parameter to the ulp_irq_stop() function to achieve that. Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1736446693-6692-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14RDMA/bnxt_re: Add support to handle DCB_CONFIG_CHANGE eventKalesh AP
QP1 context in HW needs to be updated when there is a change in the default DSCP values used for RoCE traffic. Handle the event from FW and modify the dscp value used by QP1. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-5-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14RDMA/bnxt_re: Query firmware defaults of CC params during probeKalesh AP
Added function to query firmware default values of CC parameters during driver init. These values will be stored in driver local structure and used in subsequent patch. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-4-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14RDMA/bnxt_re: Add Async event handling supportKalesh AP
Using the option provided by Ethernet driver, register for FW Async event. During probe, while registeriung with Ethernet driver, provide the ulp hook 'ulp_async_notifier' for receiving the firmware events. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-3-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-13RDMA/mlx5: Fix indirect mkey ODP page countMichael Guralnik
Restrict the check for the number of pages handled during an ODP page fault to direct mkeys. Perform the check right after handling the page fault and don't propagate the number of handled pages to callers. Indirect mkeys and their associated direct mkeys can have different start addresses. As a result, the calculation of the number of pages to handle for an indirect mkey may not match the actual page fault handling done on the direct mkey. For example: A 4K sized page fault on a KSM mkey that has a start address that is not aligned to a page will result a calculation that assumes the number of pages required to handle are 2. While the underlying MTT might be aligned will require fetching only a single page. Thus, do the calculation and compare number of pages handled only per direct mkey. Fixes: db570d7deafb ("IB/mlx5: Add ODP support to MW") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Link: https://patch.msgid.link/86c483d9e75ce8fe14e9ff85b62df72b779f8ab1.1736187990.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-12kernel-wide: add explicity||explicitly to spelling.txtShivam Chaudhary
Correct the spelling dictionary so that future instances will be caught by checkpatch, and fix the instances found. Link: https://lkml.kernel.org/r/20241211154903.47027-1-cvam0000@gmail.com Signed-off-by: Shivam Chaudhary <cvam0000@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Shivam Chaudhary <cvam0000@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-06RDMA/hns: Clean up the legacy CONFIG_INFINIBAND_HNSJunxian Huang
hns driver used to support hip06 and hip08 devices with CONFIG_INFINIBAND_HNS_HIP06 and CONFIG_INFINIBAND_HNS_HIP08 respectively, which both depended on CONFIG_INFINIBAND_HNS. But we no longer provide support for hip06 and only support hip08 and higher since the commit in fixes line, so there is no need to have CONFIG_INFINIBAND_HNS any more. Remove it and only keep CONFIG_INFINIBAND_HNS_HIP08. Fixes: 38d220882426 ("RDMA/hns: Remove support for HIP06") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250106111211.3945051-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-06RDMA/rtrs: Add missing deinit() callLi Zhijian
A warning is triggered when repeatedly connecting and disconnecting the rnbd: list_add corruption. prev->next should be next (ffff88800b13e480), but was ffff88801ecd1338. (prev=ffff88801ecd1340). WARNING: CPU: 1 PID: 36562 at lib/list_debug.c:32 __list_add_valid_or_report+0x7f/0xa0 Workqueue: ib_cm cm_work_handler [ib_cm] RIP: 0010:__list_add_valid_or_report+0x7f/0xa0 ? __list_add_valid_or_report+0x7f/0xa0 ib_register_event_handler+0x65/0x93 [ib_core] rtrs_srv_ib_dev_init+0x29/0x30 [rtrs_server] rtrs_ib_dev_find_or_add+0x124/0x1d0 [rtrs_core] __alloc_path+0x46c/0x680 [rtrs_server] ? rtrs_rdma_connect+0xa6/0x2d0 [rtrs_server] ? rcu_is_watching+0xd/0x40 ? __mutex_lock+0x312/0xcf0 ? get_or_create_srv+0xad/0x310 [rtrs_server] ? rtrs_rdma_connect+0xa6/0x2d0 [rtrs_server] rtrs_rdma_connect+0x23c/0x2d0 [rtrs_server] ? __lock_release+0x1b1/0x2d0 cma_cm_event_handler+0x4a/0x1a0 [rdma_cm] cma_ib_req_handler+0x3a0/0x7e0 [rdma_cm] cm_process_work+0x28/0x1a0 [ib_cm] ? _raw_spin_unlock_irq+0x2f/0x50 cm_req_handler+0x618/0xa60 [ib_cm] cm_work_handler+0x71/0x520 [ib_cm] Commit 667db86bcbe8 ("RDMA/rtrs: Register ib event handler") introduced a new element .deinit but never used it at all. Fix it by invoking the `deinit()` to appropriately unregister the IB event handler. Cc: Jinpu Wang <jinpu.wang@ionos.com> Fixes: 667db86bcbe8 ("RDMA/rtrs: Register ib event handler") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20250106004516.16611-1-lizhijian@fujitsu.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-06RDMA/efa: Align interrupt related fields to same typeYonatan Nachum
There is a lot of implicit casting of interrupt related fields. Use u32 as common type since this is what the device use as type for max supported EQs and what IB core expects in num_comp_vectors field. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Link: https://patch.msgid.link/20250105131421.29030-1-ynachum@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-05RDMA/bnxt_re: Fix to drop reference to the mmap entry in case of errorKalesh AP
In the error handling path of bnxt_re_mmap(), driver should invoke rdma_user_mmap_entry_put() to free the reference of mmap entry in case the error happens after rdma_user_mmap_entry_get was called. Fixes: ea2224857882 ("RDMA/bnxt_re: Update alloc_page uapi for pacing") Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/20250104061519.2540178-1-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.13-rc6). No conflicts. Adjacent changes: include/linux/if_vlan.h f91a5b808938 ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK") 3f330db30638 ("net: reformat kdoc return statements") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A lot of fixes accumulated over the holiday break: - Static tool fixes, value is already proven to be NULL, possible integer overflow - Many bnxt_re fixes: - Crashes due to a mismatch in the maximum SGE list size - Don't waste memory for user QPs by creating kernel-only structures - Fix compatability issues with older HW in some of the new HW features recently introduced: RTS->RTS feature, work around 9096 - Do not allow destroy_qp to fail - Validate QP MTU against device limits - Add missing validation on madatory QP attributes for RTR->RTS - Report port_num in query_qp as required by the spec - Fix creation of QPs of the maximum queue size, and in the variable mode - Allow all QPs to be used on newer HW by limiting a work around only to HW it affects - Use the correct MSN table size for variable mode QPs - Add missing locking in create_qp() accessing the qp_tbl - Form WQE buffers correctly when some of the buffers are 0 hop - Don't crash on QP destroy if the userspace doesn't setup the dip_ctx - Add the missing QP flush handler call on the DWQE path to avoid hanging on error recovery - Consistently use ENXIO for return codes if the devices is fatally errored - Try again to fix VLAN support on iwarp, previous fix was reverted due to breaking other cards - Correct error path return code for rdma netlink events - Remove the seperate net_device pointer in siw and rxe which syzkaller found a way to UAF - Fix a UAF of a stack ib_sge in rtrs - Fix a regression where old mlx5 devices and FW were wrongly activing new device features and failing" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits) RDMA/mlx5: Enable multiplane mode only when it is supported RDMA/bnxt_re: Fix error recovery sequence RDMA/rtrs: Ensure 'ib_sge list' is accessible RDMA/rxe: Remove the direct link to net_device RDMA/hns: Fix missing flush CQE for DWQE RDMA/hns: Fix warning storm caused by invalid input in IO path RDMA/hns: Fix accessing invalid dip_ctx during destroying QP RDMA/hns: Fix mapping error of zero-hop WQE buffer RDMA/bnxt_re: Fix the locking while accessing the QP table RDMA/bnxt_re: Fix MSN table size for variable wqe mode RDMA/bnxt_re: Add send queue size check for variable wqe RDMA/bnxt_re: Disable use of reserved wqes RDMA/bnxt_re: Fix max_qp_wrs reported RDMA/siw: Remove direct link to net_device RDMA/nldev: Set error code in rdma_nl_notify_event RDMA/bnxt_re: Fix reporting hw_ver in query_device RDMA/bnxt_re: Fix to export port num to ib_query_qp RDMA/bnxt_re: Fix setting mandatory attributes for modify_qp RDMA/bnxt_re: Add check for path mtu in modify_qp RDMA/bnxt_re: Fix the check for 9060 condition ...
2025-01-03RDMA/mlx5: Enable multiplane mode only when it is supportedMark Zhang
Driver queries vport_cxt.num_plane and enables multiplane when it is greater then 0, but some old FWs (versions from x.40.1000 till x.42.1000), report vport_cxt.num_plane = 1 unexpectedly. Fix it by querying num_plane only when HCA_CAP2.multiplane bit is set. Fixes: 2a5db20fa532 ("RDMA/mlx5: Add support to multi-plane device and port") Link: https://patch.msgid.link/r/1ef901acdf564716fcf550453cf5e94f343777ec.1734610916.git.leon@kernel.org Cc: stable@vger.kernel.org Reported-by: Francesco Poli <invernomuto@paranoici.org> Closes: https://lore.kernel.org/all/nvs4i2v7o6vn6zhmtq4sgazy2hu5kiulukxcntdelggmznnl7h@so3oul6uwgbl/ Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-02RDMA/mlx5: Fix link status down event for MPVPatrisious Haddad
The commit below prevented MPV from unloading correctly due to blocking the netdev down event, allow sending the event for MPV mode to maintain proper unload flow. Fixes: 379013776222 ("RDMA/mlx5: Handle link status event only for LAG device") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://patch.msgid.link/d7731478e456f61255af798a7fd4e64b006ddebb.1735567976.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-31RDMA/bnxt_re: Fix error recovery sequenceKalesh AP
Fixed to return ENXIO from __send_message_basic_sanity() to indicate that device is in error state. In the case of ERR_DEVICE_DETACHED state, the driver should not post the commands to the firmware as it will time out eventually. Removed bnxt_re_modify_qp() call from bnxt_re_dev_stop() as it is a no-op. Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Link: https://patch.msgid.link/20241231025008.2267162-1-kalesh-anakkur.purayil@broadcom.com Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-31RDMA/rtrs: Ensure 'ib_sge list' is accessibleLi Zhijian
Move the declaration of the 'ib_sge list' variable outside the 'always_invalidate' block to ensure it remains accessible for use throughout the function. Previously, 'ib_sge list' was declared within the 'always_invalidate' block, limiting its accessibility, then caused a 'BUG: kernel NULL pointer dereference'[1]. ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2d0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? memcpy_orig+0xd5/0x140 rxe_mr_copy+0x1c3/0x200 [rdma_rxe] ? rxe_pool_get_index+0x4b/0x80 [rdma_rxe] copy_data+0xa5/0x230 [rdma_rxe] rxe_requester+0xd9b/0xf70 [rdma_rxe] ? finish_task_switch.isra.0+0x99/0x2e0 rxe_sender+0x13/0x40 [rdma_rxe] do_task+0x68/0x1e0 [rdma_rxe] process_one_work+0x177/0x330 worker_thread+0x252/0x390 ? __pfx_worker_thread+0x10/0x10 This change ensures the variable is available for subsequent operations that require it. [1] https://lore.kernel.org/linux-rdma/6a1f3e8f-deb0-49f9-bc69-a9b03ecfcda7@fujitsu.com/ Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20241231013416.1290920-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-30RDMA/erdma: Support create_ah/destroy_ah in non-sleepable contextsBoshi Yu
The RDMA CM module might invoke erdma_create_ah() or erdma_destroy_ah() in a non-sleepable context. Both of these functions will call the erdma_post_cmd_wait(), which can potentially sleep and occasionally lead to a hard lockup. Therefore, post the create_ah and destroy_ah commands in polling mode if the RDMA_CREATE_AH_SLEEPABLE and RDMA_DESTROY_AH_SLEEPABLE flags are not set, respectively. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241226084141.74823-5-boshiyu@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-30RDMA/erdma: Support non-sleeping erdma_post_cmd_wait()Boshi Yu
Several scenarios require posting commands to the cmdq in a non-sleepable context. For example, the cm_alloc_msg() might call erdma_create_ah() while still holding a spinlock. So we add support for non-sleeping erdma_post_cmd_wait(). Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241226084141.74823-4-boshiyu@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-30RDMA/erdma: Fix incorrect response returned from query_qpBoshi Yu
The erdma_post_cmd_wait() function returns the cmdq response only when both resp0 and resp1 are not NULL. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241226084141.74823-3-boshiyu@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-30RDMA/erdma: Add missing fields to the erdma_device_ops_rocev2Boshi Yu
Set the query_ah field to the erdma_create_ah() function and set the size_ib_ah field to the size of struct erdma_ah. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20241226084141.74823-2-boshiyu@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-30RDMA/efa: Reset device on probe failureMichael Margolin
Make sure the device is being reset on driver exit whatever the reason is, to keep the device aligned and allow it to close shared resources (e.g. admin queue). Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20241225131548.15155-1-mrgolin@amazon.com Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-25RDMA/hns: Support fast path for link-down events dispatchingYuyu Li
hns3 NIC driver can directly notify the RoCE driver about link status events bypassing the netdev notifier. This can provide more timely event dispatching for ULPs. Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-25RDMA/mlx5: Handle link status event only for LAG deviceYuyu Li
The link status events of non-LAG devices are now handled in ib_core, so only LAG device events need to be handled in driver. Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-25RDMA/pvrdma: Support report_port_event() opsYuyu Li
In addition to dispatching event, some private stuffs need to be done in this driver's link status event handler. Implement the new report_port_event() ops with the link status event codes. Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-25RDMA/mlx4: Support report_port_event() opsYuyu Li
In addition to dispatching event, some private stuffs need to be done in this driver's link status event handler. Implement the new report_port_event() ops with the link status event codes. Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-25RDMA/usnic: Support report_port_event() opsYuyu Li
In addition to dispatching event, some private stuffs need to be done in this driver's link status event handler. Implement the new report_port_event() ops with the link status event codes. Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>