diff options
author | Cong Wang <cong.wang@bytedance.com> | 2022-09-12 10:35:53 -0700 |
---|---|---|
committer | Paolo Abeni <pabeni@redhat.com> | 2022-09-20 14:47:21 +0200 |
commit | db4192a754ebd52300a28abe1a50dd18eae0eb12 (patch) | |
tree | b5c8303e4305c06b00f2bc92540299c5cbac07f1 /net/ipv4 | |
parent | 90fdd1c1e9c49bcb46cde589dbdee94a6977086a (diff) | |
download | lwn-db4192a754ebd52300a28abe1a50dd18eae0eb12.tar.gz lwn-db4192a754ebd52300a28abe1a50dd18eae0eb12.zip |
tcp: read multiple skbs in tcp_read_skb()
Before we switched to ->read_skb(), ->read_sock() was passed with
desc.count=1, which technically indicates we only read one skb per
->sk_data_ready() call. However, for TCP, this is not true.
TCP at least has sk_rcvlowat which intentionally holds skb's in
receive queue until this watermark is reached. This means when
->sk_data_ready() is invoked there could be multiple skb's in the
queue, therefore we have to read multiple skbs in tcp_read_skb()
instead of one.
Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()")
Reported-by: Peilin Ye <peilin.ye@bytedance.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220912173553.235838-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Diffstat (limited to 'net/ipv4')
-rw-r--r-- | net/ipv4/tcp.c | 29 |
1 files changed, 19 insertions, 10 deletions
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 3488388eea5d..e373dde1f46f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1761,19 +1761,28 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) if (sk->sk_state == TCP_LISTEN) return -ENOTCONN; - skb = tcp_recv_skb(sk, seq, &offset); - if (!skb) - return 0; + while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) { + u8 tcp_flags; + int used; - __skb_unlink(skb, &sk->sk_receive_queue); - WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk)); - copied = recv_actor(sk, skb); - if (copied >= 0) { - seq += copied; - if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) + __skb_unlink(skb, &sk->sk_receive_queue); + WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk)); + tcp_flags = TCP_SKB_CB(skb)->tcp_flags; + used = recv_actor(sk, skb); + consume_skb(skb); + if (used < 0) { + if (!copied) + copied = used; + break; + } + seq += used; + copied += used; + + if (tcp_flags & TCPHDR_FIN) { ++seq; + break; + } } - consume_skb(skb); WRITE_ONCE(tp->copied_seq, seq); tcp_rcv_space_adjust(sk); |