<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/ipv6/udp.c, branch v4.5-rc4</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v4.5-rc4</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v4.5-rc4'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2016-01-19T18:52:25+00:00</updated>
<entry>
<title>udp: fix potential infinite loop in SO_REUSEPORT logic</title>
<updated>2016-01-19T18:52:25+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-01-19T16:36:43+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ed0dfffd7dcd3f517b1507929642c2aed4ef00fb'/>
<id>urn:sha1:ed0dfffd7dcd3f517b1507929642c2aed4ef00fb</id>
<content type='text'>
Using a combination of connected and un-connected sockets, Dmitry
was able to trigger soft lockups with his fuzzer.

The problem is that sockets in the SO_REUSEPORT array might have
different scores.

Right after sk2=socket(), setsockopt(sk2,...,SO_REUSEPORT, on) and
bind(sk2, ...), but _before_ connect(sk2) is done, sk2 is added to
the soreuseport array with a score smaller than the score of the
first socket sk1 found in the hash table (I am speaking of the regular
UDP hash table), if sk1 has already done connect(), which adds 8 to its score.

hash bucket [X] -&gt; sk1 -&gt; sk2 -&gt; NULL

sk1 score = 14  (because it did a connect())
sk2 score = 6

SO_REUSEPORT fast selection is an optimization. If it turns out the
score of the selected socket does not match the score of the first socket,
just fall back to the old SO_REUSEPORT logic instead of trying to be too smart.

Normal SO_REUSEPORT users do not mix different kinds of sockets, as this
mechanism is used to load-balance traffic.
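<![CDATA[
A toy model may make the failure mode easier to picture. The sketch below is hypothetical and is not the kernel code. toy_sock, toy_group, toy_fast_select(), toy_slow_select() and toy_lookup() are invented names, scores are precomputed, and the fast path simply picks a group member from the packet hash. It shows only the idea of the fix: accept the fast-path choice only when its score equals the best score in the bucket, otherwise fall back to a scan that, roughly like the old logic, hash-picks among equal-score reuseport sockets. It does not reproduce the soft lockup itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy socket: one entry in a UDP hash bucket. */
struct toy_sock {
	int score;     /* what a compute_score()-like function returned */
	int reuseport; /* SO_REUSEPORT set */
	int group;     /* index of its reuseport group, or -1 */
};

/* Hypothetical toy reuseport group: bucket indices of its members. */
struct toy_group {
	const size_t *members;
	size_t n;
};

/* Fast path: choose a member purely from the packet hash. */
static long toy_fast_select(const struct toy_group *g, uint32_t hash)
{
	return g->n ? (long)g->members[hash % g->n] : -1;
}

/* Slow path: keep the best score; among equal-score reuseport sockets,
 * replace the current choice with probability 1/matches. */
static long toy_slow_select(const struct toy_sock *b, size_t n, uint32_t hash)
{
	long result = -1;
	int best = -1;
	uint32_t matches = 0;

	for (size_t i = 0; i < n; i++) {
		if (b[i].score > best) {
			result = (long)i;
			best = b[i].score;
			matches = 1;
		} else if (b[i].score == best && result >= 0 &&
			   b[result].reuseport && b[i].reuseport) {
			matches++;
			if (hash % matches == 0)
				result = (long)i;
			hash = hash * 1664525u + 1013904223u; /* toy PRNG step */
		}
	}
	return result;
}

static long toy_lookup(const struct toy_sock *b, size_t n,
		       const struct toy_group *groups, uint32_t hash)
{
	long first = -1;
	int best = -1;

	/* The first socket with the highest score in the bucket. */
	for (size_t i = 0; i < n; i++) {
		if (b[i].score > best) {
			best = b[i].score;
			first = (long)i;
		}
	}
	if (first < 0)
		return -1;

	if (b[first].reuseport && b[first].group >= 0) {
		long sel = toy_fast_select(&groups[b[first].group], hash);

		/* The fix: trust the fast path only if the chosen socket
		 * scores as well as the best socket in the bucket. */
		if (sel >= 0 && b[sel].score == best)
			return sel;
	}
	return toy_slow_select(b, n, hash);
}
```

With the bucket from the example (sk1 scoring 14, sk2 scoring 6, both in one group), a hash that lands on sk2 is rejected by the score check and the scan returns sk1.
]]>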

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Craig Gallek &lt;kraigatgoog@gmail.com&gt;
Acked-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>soreuseport: pass skb to secondary UDP socket lookup</title>
<updated>2016-01-06T06:28:04+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-01-05T20:08:07+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1134158ba3d656b8dbc79a23d482129a531ba0ae'/>
<id>urn:sha1:1134158ba3d656b8dbc79a23d482129a531ba0ae</id>
<content type='text'>
This socket-lookup path did not pass along the skb in question
in my original BPF-based socket selection patch.  The skb in the
udpN_lib_lookup2 path can be used for BPF-based socket selection just
like it is in the 'traditional' udpN_lib_lookup path.

udpN_lib_lookup2 kicks in when there are more than 10 sockets in
the same hlist slot.  Coincidentally, I chose 10 sockets per
reuseport group in my functional test, so the lookup2 path was not
exercised. This adds an additional set of tests with 20 sockets.
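<![CDATA[
The sketch below models only why a 10-socket test could miss the bug. Everything in it is hypothetical. toy_pkt stands in for the skb and toy_bpf_select() for an attached BPF program. The single threshold test ignores any other conditions the real lookup may check before it takes the lookup2 path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only what a selector might read. */
struct toy_pkt {
	unsigned int flow_hash;
};

/* Hypothetical stand-in for an attached BPF program. Without a packet
 * it has nothing to steer on and returns -1 (no selection). */
static int toy_bpf_select(const struct toy_pkt *pkt, unsigned int group_size)
{
	if (!pkt || group_size == 0)
		return -1;
	return (int)(pkt->flow_hash % group_size);
}

/* "More than 10 sockets in the same hlist slot" switches to lookup2. */
#define TOY_LOOKUP2_THRESHOLD 10u

/* pass_skb_in_lookup2 == 0 models the bug, 1 models the fix. */
static int toy_udp_lookup(const struct toy_pkt *pkt, unsigned int slot_count,
			  unsigned int group_size, int pass_skb_in_lookup2)
{
	if (slot_count > TOY_LOOKUP2_THRESHOLD)
		return toy_bpf_select(pass_skb_in_lookup2 ? pkt : NULL,
				      group_size);
	return toy_bpf_select(pkt, group_size); /* traditional lookup path */
}
```

With 10 sockets the traditional path is used and steering works even in the buggy variant. Only the 20-socket case reaches the branch that dropped the packet.
]]>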

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Fixes: 3ca8e4029969 ("soreuseport: BPF selection functional test")
Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF</title>
<updated>2016-01-05T03:49:59+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-01-04T22:41:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=538950a1b7527a0a52ccd9337e3fcd304f027f13'/>
<id>urn:sha1:538950a1b7527a0a52ccd9337e3fcd304f027f13</id>
<content type='text'>
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>soreuseport: fast reuseport UDP socket selection</title>
<updated>2016-01-05T03:49:58+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-01-04T22:41:46+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e32ea7e747271a0abcd37e265005e97cc81d9df5'/>
<id>urn:sha1:e32ea7e747271a0abcd37e265005e97cc81d9df5</id>
<content type='text'>
Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional argument to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.
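<![CDATA[
The following is a hypothetical illustration of the extra argument. toy_rcv_saddr_equal() and TOY_ANY are invented, and the real functions compare socket structures, not bare integers. With match_wildcard true, a wildcard bind matches a specific address, which is what bind() conflict checks need. With it false, only identical addresses match, which is what joining a reuseport group needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ANY 0u /* stands in for INADDR_ANY, i.e. a wildcard bind */

/* Hypothetical source-address equality check with a match_wildcard flag. */
static bool toy_rcv_saddr_equal(uint32_t a1, uint32_t a2, bool match_wildcard)
{
	if (a1 == a2)
		return true; /* exact match, including two wildcard binds */
	return match_wildcard && (a1 == TOY_ANY || a2 == TOY_ANY);
}
```
]]>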

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record the number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -&gt; 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -&gt; 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -&gt; 3.4M pkts/s)
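<![CDATA[
For readers unfamiliar with recvmmsg(), here is a sketch of what one receiver thread's inner step might look like. It is an assumption, not the benchmark code behind the numbers above. recv_batch(), BATCH and MSG_MAX are hypothetical.

```c
#define _GNU_SOURCE /* recvmmsg() and struct mmsghdr are GNU extensions in glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BATCH 8
#define MSG_MAX 2048

/* Pull up to BATCH datagrams from fd with a single recvmmsg() call.
 * MSG_WAITFORONE blocks until the first datagram arrives, then takes
 * whatever else is already queued. Returns the count, or -1 on error. */
static int recv_batch(int fd)
{
	char bufs[BATCH][MSG_MAX];
	struct iovec iov[BATCH];
	struct mmsghdr msgs[BATCH];

	memset(msgs, 0, sizeof(msgs));
	for (int i = 0; i < BATCH; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = sizeof(bufs[i]);
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}
	return recvmmsg(fd, msgs, BATCH, MSG_WAITFORONE, NULL);
}
```

In a setup like the one described, each thread would own one SO_REUSEPORT socket of the group, be pinned to its own core, and call something like recv_batch() in a loop.
]]>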

This work is based on a similar implementation written by
Ying Cai &lt;ycai@google.com&gt; for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: properly support MSG_PEEK with truncated buffers</title>
<updated>2016-01-04T22:23:36+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-12-30T13:51:12+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=197c949e7798fbf28cfadc69d9ca0c2abbf93191'/>
<id>urn:sha1:197c949e7798fbf28cfadc69d9ca0c2abbf93191</id>
<content type='text'>
Backporting this upstream commit into stable kernels:
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in the UDP stack's MSG_PEEK support when the user provides
a buffer smaller than the skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                 msg-&gt;msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job replacing this with:
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe with short buffers.

For the time being, instead of reverting Herbert Xu's patch and adding
back the invalid skb-&gt;ip_summed changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied to recent kernels, as it avoids double
checksumming, and then backported to stable kernels as a bug fix.
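<![CDATA[
The userspace sketch below exercises the case the message describes: peeking at a UDP datagram through a buffer smaller than its payload. bind_loopback() and peek_len() are hypothetical helpers. It is not a reproducer, because whether the kernel takes the software checksum path depends on how the packet arrived, and loopback traffic may not need it. On a fixed kernel, MSG_PEEK | MSG_TRUNC returns the full datagram length and leaves the datagram queued.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 on an ephemeral port; fills *addr
 * with the bound address. Returns the fd, or -1 on error. */
static int bind_loopback(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) != 0 ||
	    getsockname(fd, (struct sockaddr *)addr, &len) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Peek at the next datagram through a deliberately short buffer. With
 * MSG_TRUNC, Linux reports the full datagram length; with MSG_PEEK,
 * the datagram stays queued for the next recv(). */
static ssize_t peek_len(int fd, char *small, size_t small_len)
{
	return recv(fd, small, small_len, MSG_PEEK | MSG_TRUNC);
}
```
]]>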

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: add complete rcu protection around np-&gt;opt</title>
<updated>2015-12-03T04:37:16+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-11-30T03:37:57+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=45f6fad84cc305103b28d73482b344d7f5b76f39'/>
<id>urn:sha1:45f6fad84cc305103b28d73482b344d7f5b76f39</id>
<content type='text'>
This patch addresses multiple problems:

UDP/RAW sendmsg() needs to get a stable struct ipv6_txoptions
while the socket is not locked: other threads can change np-&gt;opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program demonstrating
a use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np-&gt;opt once (before calling ipv6_dup_options()).

This patch adds full RCU protection to np-&gt;opt.
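<![CDATA[
The read-once pattern can be illustrated in userspace with C11 atomics. This is an analogy, not the kernel RCU API (rcu_dereference(), rcu_assign_pointer() and friends). toy_np_opt, toy_set_opt() and toy_sendmsg_hop_limit() are hypothetical. The grace period that makes freeing the old object safe is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ipv6_txoptions. */
struct toy_txoptions {
	int hop_limit;
};

/* Hypothetical stand-in for np->opt: a pointer that writers replace. */
static _Atomic(struct toy_txoptions *) toy_np_opt;

/* Writer: publish a fully initialised object and return the old one.
 * The caller may free the old object only after every reader is done
 * with it (RCU's grace period, not modelled here). */
static struct toy_txoptions *toy_set_opt(struct toy_txoptions *newopt)
{
	return atomic_exchange_explicit(&toy_np_opt, newopt,
					memory_order_acq_rel);
}

/* Reader: load the pointer once and use only that snapshot, instead of
 * re-reading np->opt while another thread may be replacing it. */
static int toy_sendmsg_hop_limit(int dflt)
{
	struct toy_txoptions *opt =
		atomic_load_explicit(&toy_np_opt, memory_order_acquire);

	return opt ? opt->hop_limit : dflt;
}
```
]]>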

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: SO_INCOMING_CPU setsockopt() support</title>
<updated>2015-10-13T02:28:20+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-10-09T02:33:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=70da268b569d32a9fddeea85dc18043de9d89f89'/>
<id>urn:sha1:70da268b569d32a9fddeea85dc18043de9d89f89</id>
<content type='text'>
SO_INCOMING_CPU, as added in commit 2c8c56e15df3, was a getsockopt() command
to fetch the incoming CPU handling a particular TCP flow after accept().

This commit adds setsockopt() support and extends the SO_REUSEPORT selection
logic: if a TCP listener or UDP socket has this option set, a packet is
delivered to this socket only if the CPU handling the packet matches the
specified one.

This allows building very efficient TCP servers, using one listener per
RX queue, as the associated TCP listener should only accept flows handled
in softirq by the same CPU.
This provides optimal NUMA behavior and keeps CPU caches hot.

Note that __inet_lookup_listener() still has to iterate over the list of
all listeners. The following patch moves sk_refcnt to a different cache line
so that this iteration touches only shared, read-mostly cache lines.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: trivial whitespace fix</title>
<updated>2015-08-17T21:34:48+00:00</updated>
<author>
<name>Ian Morris</name>
<email>ipm@chirality.org.uk</email>
</author>
<published>2015-08-14T21:43:38+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ec120da6f0fe59f175c2a8faa0a7700280c39644'/>
<id>urn:sha1:ec120da6f0fe59f175c2a8faa0a7700280c39644</id>
<content type='text'>
Change brace placement to be in line with coding standards

Signed-off-by: Ian Morris &lt;ipm@chirality.org.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: fix behavior of wrong checksums</title>
<updated>2015-06-01T04:42:18+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-05-30T16:16:53+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=beb39db59d14990e401e235faf66a6b9b31240b0'/>
<id>urn:sha1:beb39db59d14990e401e235faf66a6b9b31240b0</id>
<content type='text'>
We have two problems in the UDP stack related to bogus checksums:

1) We return -EAGAIN to the application even if the receive queue is not empty.
   This breaks applications using edge-triggered epoll().

2) Under UDP flood, we can loop forever without yielding to other
   processes, potentially hanging the host, especially on non-SMP systems.
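<![CDATA[
For context, this is the drain loop an edge-triggered epoll() reader typically runs. It relies on EAGAIN meaning that the receive queue is really empty, which is the guarantee problem 1 broke. drain_udp() and watch_edge_triggered() are hypothetical helpers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read every queued datagram without blocking. With EPOLLET, another
 * event is only reported for new data, so the reader must continue
 * until recv() fails with EAGAIN. Returns the number of datagrams
 * read, or -1 on a real error. */
static int drain_udp(int fd)
{
	char buf[2048];
	int count = 0;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (n >= 0) {
			count++;
			continue;
		}
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return count;
		if (errno != EINTR)
			return -1;
	}
}

/* Register fd for edge-triggered input on epoll instance ep. */
static int watch_edge_triggered(int ep, int fd)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN | EPOLLET;
	ev.data.fd = fd;
	return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}
```

If recv() returned EAGAIN while datagrams were still queued, this loop would stop early. No new edge would arrive for those datagrams, so they would sit unread.
]]>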

This patch is an attempt to make things better.

In the future we might add extra support for real-time applications
wanting better control over the time spent in recv() in a hostile
environment. For example, we could validate checksums before queuing
packets in the socket receive queue.
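<![CDATA[
Validating a checksum comes down to computing the Internet checksum (RFC 1071). What follows is a minimal sketch, not the kernel's csum helpers. inet_csum() is a hypothetical name, and a real UDP checksum additionally covers the IPv6 (or IPv4) pseudo-header, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): ones' complement of the ones' complement
 * sum of 16-bit big-endian words; an odd trailing byte is padded with
 * zero. A 32-bit accumulator is ample for any UDP-sized buffer. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)data[0] << 8;
	while (sum >> 16) /* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Running the function over data that already contains its correct checksum yields 0, which is how a receiver validates a packet.
]]>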

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/ipv6/udp: Fix ipv6 multicast socket filter regression</title>
<updated>2015-05-19T20:34:43+00:00</updated>
<author>
<name>Henning Rogge</name>
<email>hrogge@gmail.com</email>
</author>
<published>2015-05-18T19:08:49+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=33b4b015e1a1ca7a8fdce40af5e71642a8ea355c'/>
<id>urn:sha1:33b4b015e1a1ca7a8fdce40af5e71642a8ea355c</id>
<content type='text'>
Commit 5cf3d46192fc ("udp: Simplify__udp*_lib_mcast_deliver")
simplified the filter for incoming IPv6 multicast but removed
the check of the local socket address and the UDP destination
address.

This patch restores the filter to prevent sockets bound to an IPv6
multicast IP from receiving other UDP traffic, like unicast.
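<![CDATA[
The following hypothetical sketch shows the kind of check being restored. toy_mcast_sock_accepts() is invented and is not the kernel's helper. A socket bound to a specific IPv6 address, such as a multicast group, should accept only datagrams whose destination address and port match. A socket bound to :: accepts any destination address on its port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Should a socket bound to (bound_addr, bound_port) receive a datagram
 * addressed to (daddr, dport)? Ports are in network byte order. */
static bool toy_mcast_sock_accepts(const struct in6_addr *bound_addr,
				   in_port_t bound_port,
				   const struct in6_addr *daddr, in_port_t dport)
{
	if (bound_port != dport)
		return false;
	/* Bound to :: (wildcard): any destination address is fine. */
	if (memcmp(bound_addr, &in6addr_any, sizeof(*bound_addr)) == 0)
		return true;
	/* Bound to a specific address, e.g. a multicast group: exact match only. */
	return memcmp(bound_addr, daddr, sizeof(*daddr)) == 0;
}
```
]]>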

Signed-off-by: Henning Rogge &lt;hrogge@gmail.com&gt;
Fixes: 5cf3d46192fc ("udp: Simplify__udp*_lib_mcast_deliver")
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
