<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/ipv4/inet_connection_sock.c, branch v3.8-rc5</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.8-rc5</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.8-rc5'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2012-12-14T18:14:07+00:00</updated>
<entry>
<title>inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock</title>
<updated>2012-12-14T18:14:07+00:00</updated>
<author>
<name>Christoph Paasch</name>
<email>christoph.paasch@uclouvain.be</email>
</author>
<published>2012-12-14T04:07:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e337e24d6624e74a558aa69071e112a65f7b5758'/>
<id>urn:sha1:e337e24d6624e74a558aa69071e112a65f7b5758</id>
<content type='text'>
If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
  comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
  hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff8153d190&gt;] kmemleak_alloc+0x21/0x3e
    [&lt;ffffffff810ab3e7&gt;] kmem_cache_alloc+0xb5/0xc5
    [&lt;ffffffff8149b65b&gt;] sk_prot_alloc.isra.53+0x2b/0xcd
    [&lt;ffffffff8149b784&gt;] sk_clone_lock+0x16/0x21e
    [&lt;ffffffff814d711a&gt;] inet_csk_clone_lock+0x10/0x7b
    [&lt;ffffffff814ebbc3&gt;] tcp_create_openreq_child+0x21/0x481
    [&lt;ffffffff814e8fa5&gt;] tcp_v4_syn_recv_sock+0x3a/0x23b
    [&lt;ffffffff814ec5ba&gt;] tcp_check_req+0x29f/0x416
    [&lt;ffffffff814e8e10&gt;] tcp_v4_do_rcv+0x161/0x2bc
    [&lt;ffffffff814eb917&gt;] tcp_v4_rcv+0x6c9/0x701
    [&lt;ffffffff814cea9f&gt;] ip_local_deliver_finish+0x70/0xc4
    [&lt;ffffffff814cec20&gt;] ip_local_deliver+0x4e/0x7f
    [&lt;ffffffff814ce9f8&gt;] ip_rcv_finish+0x1fc/0x233
    [&lt;ffffffff814cee68&gt;] ip_rcv+0x217/0x267
    [&lt;ffffffff814a7bbe&gt;] __netif_receive_skb+0x49e/0x553
    [&lt;ffffffff814a7cc3&gt;] netif_receive_skb+0x50/0x82

This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
xfrm,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().

This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version &gt;= 2.6.37.

Signed-off-by: Christoph Paasch &lt;christoph.paasch@uclouvain.be&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: better retrans tracking for defer-accept</title>
<updated>2012-11-03T18:45:00+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-10-27T23:16:46+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e6c022a4fa2d2d9ca9d0a7ac3b05ad988f39fc30'/>
<id>urn:sha1:e6c022a4fa2d2d9ca9d0a7ac3b05ad988f39fc30</id>
<content type='text'>
For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req-&gt;retrans each time timeout triggers
while no SYNACK is sent.

SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
which we received the ACK from client). Only the last SYNACK is sent
so that we can receive again an ACK from client, to move the req into
accept queue. We plan to change this later to avoid the useless
retransmit (and potential problem as this SYNACK could be lost)

TCP_INFO later gives wrong information to user, claiming imaginary
retransmits.

Decouple req-&gt;retrans field into two independent fields :

num_retrans : number of retransmit
num_timeout : number of timeouts

num_timeout is the counter that is incremented at each timeout,
regardless of actual SYNACK being sent or not, and used to
compute the exponential timeout.

Introduce inet_rtx_syn_ack() helper to increment num_retrans
only if -&gt;rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a (retransmitted) SYN.
Prior to this patch, we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfully queued.

Reported-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Cc: Elliott Hughes &lt;enh@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: introduce rt_uses_gateway</title>
<updated>2012-10-08T21:42:36+00:00</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2012-10-08T11:41:18+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=155e8336c373d14d87a7f91e356d85ef4b93b8f9'/>
<id>urn:sha1:155e8336c373d14d87a7f91e356d85ef4b93b8f9</id>
<content type='text'>
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix TFO regression</title>
<updated>2012-09-06T18:21:10+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-09-06T08:07:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7ab4551f3b391818e29263279031dca1e26417c6'/>
<id>urn:sha1:7ab4551f3b391818e29263279031dca1e26417c6</id>
<content type='text'>
Fengguang Wu reported various panics and bisected to commit
8336886f786fdac (tcp: TCP Fast Open Server - support TFO listeners)

Fix this by making sure socket is a TCP socket before accessing TFO data
structures.

[  233.046014] kfree_debugcheck: out of range ptr ea6000000bb8h.
[  233.047399] ------------[ cut here ]------------
[  233.048393] kernel BUG at /c/kernel-tests/src/stable/mm/slab.c:3074!
[  233.048393] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  233.048393] Modules linked in:
[  233.048393] CPU 0
[  233.048393] Pid: 3929, comm: trinity-watchdo Not tainted 3.6.0-rc3+
#4192 Bochs Bochs
[  233.048393] RIP: 0010:[&lt;ffffffff81169653&gt;]  [&lt;ffffffff81169653&gt;]
kfree_debugcheck+0x27/0x2d
[  233.048393] RSP: 0018:ffff88000facbca8  EFLAGS: 00010092
[  233.048393] RAX: 0000000000000031 RBX: 0000ea6000000bb8 RCX:
00000000a189a188
[  233.048393] RDX: 000000000000a189 RSI: ffffffff8108ad32 RDI:
ffffffff810d30f9
[  233.048393] RBP: ffff88000facbcb8 R08: 0000000000000002 R09:
ffffffff843846f0
[  233.048393] R10: ffffffff810ae37c R11: 0000000000000908 R12:
0000000000000202
[  233.048393] R13: ffffffff823dbd5a R14: ffff88000ec5bea8 R15:
ffffffff8363c780
[  233.048393] FS:  00007faa6899c700(0000) GS:ffff88001f200000(0000)
knlGS:0000000000000000
[  233.048393] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  233.048393] CR2: 00007faa6841019c CR3: 0000000012c82000 CR4:
00000000000006f0
[  233.048393] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  233.048393] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  233.048393] Process trinity-watchdo (pid: 3929, threadinfo
ffff88000faca000, task ffff88000faec600)
[  233.048393] Stack:
[  233.048393]  0000000000000000 0000ea6000000bb8 ffff88000facbce8
ffffffff8116ad81
[  233.048393]  ffff88000ff588a0 ffff88000ff58850 ffff88000ff588a0
0000000000000000
[  233.048393]  ffff88000facbd08 ffffffff823dbd5a ffffffff823dbcb0
ffff88000ff58850
[  233.048393] Call Trace:
[  233.048393]  [&lt;ffffffff8116ad81&gt;] kfree+0x5f/0xca
[  233.048393]  [&lt;ffffffff823dbd5a&gt;] inet_sock_destruct+0xaa/0x13c
[  233.048393]  [&lt;ffffffff823dbcb0&gt;] ? inet_sk_rebuild_header
+0x319/0x319
[  233.048393]  [&lt;ffffffff8231c307&gt;] __sk_free+0x21/0x14b
[  233.048393]  [&lt;ffffffff8231c4bd&gt;] sk_free+0x26/0x2a
[  233.048393]  [&lt;ffffffff825372db&gt;] sctp_close+0x215/0x224
[  233.048393]  [&lt;ffffffff810d6835&gt;] ? lock_release+0x16f/0x1b9
[  233.048393]  [&lt;ffffffff823daf12&gt;] inet_release+0x7e/0x85
[  233.048393]  [&lt;ffffffff82317d15&gt;] sock_release+0x1f/0x77
[  233.048393]  [&lt;ffffffff82317d94&gt;] sock_close+0x27/0x2b
[  233.048393]  [&lt;ffffffff81173bbe&gt;] __fput+0x101/0x20a
[  233.048393]  [&lt;ffffffff81173cd5&gt;] ____fput+0xe/0x10
[  233.048393]  [&lt;ffffffff810a3794&gt;] task_work_run+0x5d/0x75
[  233.048393]  [&lt;ffffffff8108da70&gt;] do_exit+0x290/0x7f5
[  233.048393]  [&lt;ffffffff82707415&gt;] ? retint_swapgs+0x13/0x1b
[  233.048393]  [&lt;ffffffff8108e23f&gt;] do_group_exit+0x7b/0xba
[  233.048393]  [&lt;ffffffff8108e295&gt;] sys_exit_group+0x17/0x17
[  233.048393]  [&lt;ffffffff8270de10&gt;] tracesys+0xdd/0xe2
[  233.048393] Code: 59 01 5d c3 55 48 89 e5 53 41 50 0f 1f 44 00 00 48
89 fb e8 d4 b0 f0 ff 84 c0 75 11 48 89 de 48 c7 c7 fc fa f7 82 e8 0d 0f
57 01 &lt;0f&gt; 0b 5f 5b 5d c3 55 48 89 e5 0f 1f 44 00 00 48 63 87 d8 00 00
[  233.048393] RIP  [&lt;ffffffff81169653&gt;] kfree_debugcheck+0x27/0x2d
[  233.048393]  RSP &lt;ffff88000facbca8&gt;

Reported-by: Fengguang Wu &lt;wfg@linux.intel.com&gt;
Tested-by: Fengguang Wu &lt;wfg@linux.intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: "H.K. Jerry Chu" &lt;hkchu@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: TCP Fast Open Server - support TFO listeners</title>
<updated>2012-09-01T00:02:19+00:00</updated>
<author>
<name>Jerry Chu</name>
<email>hkchu@google.com</email>
</author>
<published>2012-08-31T12:29:12+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8336886f786fdacbc19b719c1f7ea91eb70706d4'/>
<id>urn:sha1:8336886f786fdacbc19b719c1f7ea91eb70706d4</id>
<content type='text'>
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() to use to check a TFO socket as well
as request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Use newinet-&gt;inet_opt in inet_csk_route_child_sock()</title>
<updated>2012-08-21T21:49:11+00:00</updated>
<author>
<name>Christoph Paasch</name>
<email>christoph.paasch@uclouvain.be</email>
</author>
<published>2012-08-20T02:52:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1a7b27c97ce675b42eeb7bfaf6e15c34f35c8f95'/>
<id>urn:sha1:1a7b27c97ce675b42eeb7bfaf6e15c34f35c8f95</id>
<content type='text'>
Since 0e734419923bd ("ipv4: Use inet_csk_route_child_sock() in DCCP and
TCP."), inet_csk_route_child_sock() is called instead of
inet_csk_route_req().

However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
ireq-&gt;opt is set to NULL, before calling inet_csk_route_child_sock().
Thus, inside inet_csk_route_child_sock() opt is always NULL and the
SRR-options are not respected anymore.
Packets sent by the server won't have the correct destination-IP.

This patch fixes it by accessing newinet-&gt;inet_opt instead of ireq-&gt;opt
inside inet_csk_route_child_sock().

Reported-by: Luca Boccassi &lt;luca.boccassi@gmail.com&gt;
Signed-off-by: Christoph Paasch &lt;christoph.paasch@uclouvain.be&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.</title>
<updated>2012-07-20T20:36:54+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-17T21:02:46+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ba3f7f04ef2b19aace38f855aedd17fe43035d50'/>
<id>urn:sha1:ba3f7f04ef2b19aace38f855aedd17fe43035d50</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Adjust semantics of rt-&gt;rt_gateway.</title>
<updated>2012-07-20T20:31:20+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-13T12:03:45+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f8126f1d5136be1ca1a3536d43ad7a710b5620f8'/>
<id>urn:sha1:f8126f1d5136be1ca1a3536d43ad7a710b5620f8</id>
<content type='text'>
In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.

The new interpretation is:

1) rt_gateway == 0, destination is on-link, nexthop is iph-&gt;daddr

2) rt_gateway != 0, destination requires a nexthop gateway

Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Tested-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
</content>
</entry>
<entry>
<title>ipv4: fix rcu splat</title>
<updated>2012-07-17T20:47:33+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-17T20:42:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5abf7f7e0f6bdbfcac737f636497d7016d9507eb'/>
<id>urn:sha1:5abf7f7e0f6bdbfcac737f636497d7016d9507eb</id>
<content type='text'>
free_nh_exceptions() should use rcu_dereference_protected(..., 1)
since its called after one RCU grace period.

Also add some const-ification in recent code.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Pass optional SKB and SK arguments to dst_ops-&gt;{update_pmtu,redirect}()</title>
<updated>2012-07-17T10:29:28+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-17T10:29:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6700c2709c08d74ae2c3c29b84a30da012dbc7f1'/>
<id>urn:sha1:6700c2709c08d74ae2c3c29b84a30da012dbc7f1</id>
<content type='text'>
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more.  In the
future the routes will be without destination address, source address,
etc. keying.  One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information.  This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here.  Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
