author | Daisuke Matsuda <matsuda-daisuke@fujitsu.com> | 2023-04-18 18:06:42 +0900 |
---|---|---|
committer | Jason Gunthorpe <jgg@nvidia.com> | 2023-04-21 12:33:00 -0300 |
commit | 10af303192bc5490bb39b29541ecb0ead2eff1ce (patch) | |
tree | b19bf2e684c8b32a28a1c80888da76a8723bb9b5 | |
parent | 3e358ea8614ddfbc59ca7a3f5dff5dde2b350b2c (diff) | |
RDMA/rxe: Fix spinlock recursion deadlock on requester
The following deadlock is observed:
Call Trace:
<IRQ>
_raw_spin_lock_bh+0x29/0x30
check_type_state.constprop.0+0x4e/0xc0 [rdma_rxe]
rxe_rcv+0x173/0x3d0 [rdma_rxe]
rxe_udp_encap_recv+0x69/0xd0 [rdma_rxe]
? __pfx_rxe_udp_encap_recv+0x10/0x10 [rdma_rxe]
udp_queue_rcv_one_skb+0x258/0x520
udp_unicast_rcv_skb+0x75/0x90
__udp4_lib_rcv+0x364/0x5c0
ip_protocol_deliver_rcu+0xa7/0x160
ip_local_deliver_finish+0x73/0xa0
ip_sublist_rcv_finish+0x80/0x90
ip_sublist_rcv+0x191/0x220
ip_list_rcv+0x132/0x160
__netif_receive_skb_list_core+0x297/0x2c0
netif_receive_skb_list_internal+0x1c5/0x300
napi_complete_done+0x6f/0x1b0
virtnet_poll+0x1f4/0x2d0 [virtio_net]
__napi_poll+0x2c/0x1b0
net_rx_action+0x293/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xcb/0x2ab
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x80/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:_raw_spin_lock+0x17/0x30
rxe_requester+0xe4/0x8f0 [rdma_rxe]
? xas_load+0x9/0xa0
? xa_load+0x70/0xb0
do_task+0x64/0x1f0 [rdma_rxe]
rxe_post_send+0x54/0x110 [rdma_rxe]
ib_uverbs_post_send+0x5f8/0x680 [ib_uverbs]
? netif_receive_skb_list_internal+0x1e3/0x300
ib_uverbs_write+0x3c8/0x500 [ib_uverbs]
vfs_write+0xc5/0x3b0
ksys_write+0xab/0xe0
? syscall_trace_enter.constprop.0+0x126/0x1a0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
</TASK>
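The deadlock is easily reproducible with perftest. The requester takes
qp->state_lock in process context with softirqs enabled, so the receive
softirq can interrupt it on the same CPU and spin on the lock forever.
Fix it by disabling softirqs when acquiring the lock in process context.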
Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock")
Link: https://lore.kernel.org/r/20230418090642.1849358-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
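The trace above shows a softirq spinning on a lock that the task it interrupted already holds on the same CPU. The following is a minimal userspace sketch of that pattern, not kernel code: every toy_* name is hypothetical, the "CPU" is a plain struct, and a real spinlock would spin forever where this model only sets a flag. It assumes, as the kernel does, that softirqs deferred while bottom halves are disabled run when they are re-enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. None of these names are kernel APIs. */
struct toy_cpu {
	bool lock_held;        /* stands in for qp->state_lock */
	int  bh_disable_depth; /* stands in for the softirq-disable count */
	bool softirq_pending;  /* a receive softirq is waiting to run */
	bool deadlocked;       /* softirq found the lock held on its own CPU */
	int  handled;          /* times the softirq handler completed */
};

/* Models the receive path taking the lock from softirq context. */
static void toy_softirq_handler(struct toy_cpu *cpu)
{
	if (cpu->lock_held) {
		/* A real spinlock would spin here forever: the holder is
		 * the interrupted task, which cannot run until we return. */
		cpu->deadlocked = true;
		return;
	}
	cpu->lock_held = true;
	cpu->handled++;
	cpu->lock_held = false;
}

/* Models an interrupt: softirqs run on exit unless they are disabled. */
static void toy_interrupt(struct toy_cpu *cpu)
{
	cpu->softirq_pending = true;
	if (cpu->bh_disable_depth == 0) {
		cpu->softirq_pending = false;
		toy_softirq_handler(cpu);
	}
}

static void toy_spin_lock(struct toy_cpu *cpu)
{
	assert(!cpu->lock_held);
	cpu->lock_held = true;
}

static void toy_spin_unlock(struct toy_cpu *cpu)
{
	cpu->lock_held = false;
}

/* Models the _bh variant: disable softirqs first, then take the lock. */
static void toy_spin_lock_bh(struct toy_cpu *cpu)
{
	cpu->bh_disable_depth++;
	toy_spin_lock(cpu);
}

/* Release the lock, re-enable softirqs, then run anything deferred. */
static void toy_spin_unlock_bh(struct toy_cpu *cpu)
{
	toy_spin_unlock(cpu);
	if (--cpu->bh_disable_depth == 0 && cpu->softirq_pending) {
		cpu->softirq_pending = false;
		toy_softirq_handler(cpu);
	}
}

/* Process-context critical section that is interrupted while the lock
 * is held. Returns true if the model reaches the deadlock state. */
static bool toy_requester(bool use_bh)
{
	struct toy_cpu cpu = {0};

	if (use_bh)
		toy_spin_lock_bh(&cpu);
	else
		toy_spin_lock(&cpu);

	toy_interrupt(&cpu); /* packet arrives inside the critical section */

	if (use_bh)
		toy_spin_unlock_bh(&cpu);
	else
		toy_spin_unlock(&cpu);

	return cpu.deadlocked;
}
```

In this model, toy_requester(false) reaches the deadlock state, while toy_requester(true) defers the handler until the lock has been released. That is the effect the patch relies on when it switches to the _bh lock variants.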
-rw-r--r-- | drivers/infiniband/sw/rxe/rxe_req.c | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 8e50d116d273..65134a9aefe7 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -180,13 +180,13 @@ static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
 	if (wqe == NULL)
 		return NULL;
 
-	spin_lock(&qp->state_lock);
+	spin_lock_bh(&qp->state_lock);
 	if (unlikely((qp_state(qp) == IB_QPS_SQD) &&
 		     (wqe->state != wqe_state_processing))) {
-		spin_unlock(&qp->state_lock);
+		spin_unlock_bh(&qp->state_lock);
 		return NULL;
 	}
-	spin_unlock(&qp->state_lock);
+	spin_unlock_bh(&qp->state_lock);
 
 	wqe->mask = wr_opcode_mask(wqe->wr.opcode, qp);
 	return wqe;