<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/ipv4/inet_timewait_sock.c, branch standardize-docs</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=standardize-docs</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=standardize-docs'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2017-07-01T14:39:08+00:00</updated>
<entry>
<title>net: convert sock.sk_refcnt from atomic_t to refcount_t</title>
<updated>2017-07-01T14:39:08+00:00</updated>
<author>
<name>Reshetova, Elena</name>
<email>elena.reshetova@intel.com</email>
</author>
<published>2017-06-30T10:08:01+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=41c6d650f6537e55a1b53438c646fbc3f49176bf'/>
<id>urn:sha1:41c6d650f6537e55a1b53438c646fbc3f49176bf</id>
<content type='text'>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob</title>
<updated>2016-12-29T16:38:31+00:00</updated>
<author>
<name>Haishuang Yan</name>
<email>yanhaishuang@cmss.chinamobile.com</email>
</author>
<published>2016-12-28T09:52:32+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1946e672c173559155a3e210fe95dbf8b7b8ddf7'/>
<id>urn:sha1:1946e672c173559155a3e210fe95dbf8b7b8ddf7</id>
<content type='text'>
Different namespace application might require fast recycling
TIME-WAIT sockets independently of the host.

Signed-off-by: Haishuang Yan &lt;yanhaishuang@cmss.chinamobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>timers, net/ipv4/inet: Initialize connection request timers as pinned</title>
<updated>2016-07-07T08:35:06+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-07-04T09:50:23+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f3438bc7813c439a06104ba074301c8bdd64ac8b'/>
<id>urn:sha1:f3438bc7813c439a06104ba074301c8bdd64ac8b</id>
<content type='text'>
Pinned timers must carry the pinned attribute in the timer structure
itself, so convert the code to the new API.

No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.617891430@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: must block bh in __inet_twsk_hashdance()</title>
<updated>2016-05-04T20:55:11+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-05-04T00:10:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=614bdd4d6e61d260d82945f5f52a5dc288f64783'/>
<id>urn:sha1:614bdd4d6e61d260d82945f5f52a5dc288f64783</id>
<content type='text'>
__inet_twsk_hashdance() might be called from process context,
better block BH before acquiring bind hash and established locks

Fixes: c10d9310edf5 ("tcp: do not assume TCP code is non preemptible")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: rename NET_{ADD|INC}_STATS_BH()</title>
<updated>2016-04-28T02:48:24+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-27T23:44:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=02a1d6e7a6bb025a77da77012190e1efc1970f1c'/>
<id>urn:sha1:02a1d6e7a6bb025a77da77012190e1efc1970f1c</id>
<content type='text'>
Rename NET_INC_STATS_BH() to __NET_INC_STATS()
and NET_ADD_STATS_BH() to __NET_ADD_STATS()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: fix timewait races in timer handling</title>
<updated>2015-09-21T23:32:29+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-09-19T16:08:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054'/>
<id>urn:sha1:ed2e923945892a8372ab70d2f61d364b0b6d9054</id>
<content type='text'>
When creating a timewait socket, we need to arm the timer before
allowing other cpus to find it. The signal allowing cpus to find
the socket is setting tw_refcnt to non zero value.

As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
call inet_twsk_schedule() first.

This also means we need to remove tw_refcnt changes from
inet_twsk_schedule() and let the caller handle it.

Note that because we use mod_timer_pinned(), we have the guarantee
the timer wont expire before we set tw_refcnt as we run in BH context.

To make things more readable I introduced inet_twsk_reschedule() helper.

When rearming the timer, we can use mod_timer_pending() to make sure
we do not rearm a canceled timer.

Note: This bug can possibly trigger if packets of a flow can hit
multiple cpus. This does not normally happen, unless flow steering
is broken somehow. This explains this bug was spotted ~5 months after
its introduction.

A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
but will be provided in a separate patch for proper tracking.

Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Ying Cai &lt;ycai@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: inet_twsk_deschedule factorization</title>
<updated>2015-07-09T22:12:20+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-07-08T21:28:30+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=dbe7faa4045ea83a37b691b12bb02a8f86c2d2e9'/>
<id>urn:sha1:dbe7faa4045ea83a37b691b12bb02a8f86c2d2e9</id>
<content type='text'>
inet_twsk_deschedule() calls are followed by inet_twsk_put().

Only particular case is in inet_twsk_purge() but there is no point
to defer the inet_twsk_put() after re-enabling BH.

Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put()
and move the inet_twsk_put() inside.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: simplify timewait refcounting</title>
<updated>2015-07-09T22:12:20+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-07-08T21:28:29+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=fc01538f9fb75572c969ca9988176ffc2a8741d6'/>
<id>urn:sha1:fc01538f9fb75572c969ca9988176ffc2a8741d6</id>
<content type='text'>
timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()

In particular, deferred inet_twsk_put() added in commit
13475a30b66cd ("tcp: connect() race with timewait reuse")
looks unecessary : When removing a timewait socket from
ehash or bhash, caller must own a reference on the socket
anyway.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: tw_timer_handler() is static</title>
<updated>2015-05-13T19:21:33+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-05-12T13:22:56+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=216f8bb9f67ff636769064f86812e1ce6466e4af'/>
<id>urn:sha1:216f8bb9f67ff636769064f86812e1ce6466e4af</id>
<content type='text'>
tw_timer_handler() is only used from net/ipv4/inet_timewait_sock.c

Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: get rid of central timewait timer</title>
<updated>2015-04-13T20:40:05+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-04-13T01:51:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=789f558cfb3680aeb52de137418637f6b04b7d22'/>
<id>urn:sha1:789f558cfb3680aeb52de137418637f6b04b7d22</id>
<content type='text'>
Using a timer wheel for timewait sockets was nice ~15 years ago when
memory was expensive and machines had a single processor.

This does not scale, code is ugly and source of huge latencies
(Typically 30 ms have been seen, cpus spinning on death_lock spinlock.)

We can afford to use an extra 64 bytes per timewait sock and spread
timewait load to all cpus to have better behavior.

Tested:

On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1
on the target (lpaa24)

Before patch :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
419594

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
437171

While test is running, we can observe 25 or even 33 ms latencies.

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2

After patch :

About 90% increase of throughput :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
810442

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
800992

And latencies are kept to minimal values during this load, even
if network utilization is 90% higher :

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
