<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/ipv4/fib_hash.c, branch v3.13-rc3</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.13-rc3</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.13-rc3'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2011-02-01T23:35:25+00:00</updated>
<entry>
<title>ipv4: Remove fib_hash.</title>
<updated>2011-02-01T23:35:25+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-01T23:15:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3630b7c050d9c3564f143d595339fc06b888d6f3'/>
<id>urn:sha1:3630b7c050d9c3564f143d595339fc06b888d6f3</id>
<content type='text'>
The time has finally come to remove the hash based routing table
implementation in ipv4.

FIB Trie is mature, well tested, and I've done an audit of it's code
to confirm that it implements insert, delete, and lookup with the same
identical semantics as fib_hash did.

If there are any semantic differences found in fib_trie, we should
simply fix them.

I've placed the trie statistic config option under advanced router
configuration.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
</content>
</entry>
<entry>
<title>ipv4: Consolidate all default route selection implementations.</title>
<updated>2011-02-01T00:16:50+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-01T00:16:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0c838ff1ade71162775afffd9e5c6478a60bdca6'/>
<id>urn:sha1:0c838ff1ade71162775afffd9e5c6478a60bdca6</id>
<content type='text'>
Both fib_trie and fib_hash have a local implementation of
fib_table_select_default().  This is completely unnecessary
code duplication.

Since we now remember the fib_table and the head of the fib
alias list of the default route, we can implement one single
generic version of this routine.

Looking at the fib_hash implementation you may get the impression
that it's possible for there to be multiple top-level routes in
the table for the default route.  The truth is, it isn't, the
insert code will only allow one entry to exist in the zero
prefix hash table, because all keys evaluate to zero and all
keys in a hash table must be unique.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Remember FIB alias list head and table in lookup results.</title>
<updated>2011-02-01T00:10:03+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-01T00:10:03+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5b4704419cbd0b7597a91c19f9e8e8b17c1af071'/>
<id>urn:sha1:5b4704419cbd0b7597a91c19f9e8e8b17c1af071</id>
<content type='text'>
This will be used later to implement fib_select_default() in a
completely generic manner, instead of the current situation where the
default route is re-looked up in the TRIE/HASH table and then the
available aliases are analyzed.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>fib: Fix fib zone and its hash leak on namespace stop</title>
<updated>2010-10-28T17:27:03+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@parallels.com</email>
</author>
<published>2010-10-28T02:00:43+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4aa2c466a7733af093a526e9d1cdd0b3b90d47e9'/>
<id>urn:sha1:4aa2c466a7733af093a526e9d1cdd0b3b90d47e9</id>
<content type='text'>
When we stop a namespace we flush the table and free one, but the
added fn_zone-s (and their hashes if grown) are leaked. Need to free.
Tries releases all its stuff in the flushing code.

Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>fib_hash: fix rcu sparse and logical errors</title>
<updated>2010-10-26T18:42:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-26T03:24:16+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ded85aa86bff953190cb893fceeecaadcab53a80'/>
<id>urn:sha1:ded85aa86bff953190cb893fceeecaadcab53a80</id>
<content type='text'>
While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to
fz-&gt;fz_hash for real.

-	&amp;fz-&gt;fz_hash[fn_hash(f-&gt;fn_key, fz)]
+	rcu_dereference(fz-&gt;fz_hash) + fn_hash(f-&gt;fn_key, fz)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>fib: introduce fib_alias_accessed() helper</title>
<updated>2010-10-21T10:09:41+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-20T22:03:38+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9b0c290e78d667e6a483bde8c7cef7dd15f49017'/>
<id>urn:sha1:9b0c290e78d667e6a483bde8c7cef7dd15f49017</id>
<content type='text'>
Perf tools session at NFWS 2010 pointed out a false sharing on struct
fib_alias that can be avoided pretty easily, if we set FA_S_ACCESSED bit
only if needed (ie : not already set)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>fib_hash: RCU conversion phase 2</title>
<updated>2010-10-17T20:53:16+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-14T20:56:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=19f572565ef66a0439574fd2299a7c804147e133'/>
<id>urn:sha1:19f572565ef66a0439574fd2299a7c804147e133</id>
<content type='text'>
Get rid of fib_hash_lock rwlock.

The fn_zone hash table resize is the noticeable part of this patch.

I added a seqlock per fn_zone, so that readers can restart their lookup
in the (very rare) case a writer expanded the hash table.

Add rcu heads in fib_alias and fib_node, use call_rcu() to defer their
freeing, and use appropriate _rcu list manipulations.

Stress test (160.000.000 udp frames sent, IP route cache disabled to
mimic DDOS attack, FIB_HASH)

Before:
real	0m41.191s
user	0m13.137s
sys	8m55.241s

After:
real	0m38.091s
user	0m13.189s
sys	7m53.018s

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>fib_hash: RCU conversion phase 1</title>
<updated>2010-10-17T20:53:15+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-14T20:53:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=117a8cdea3647e8e11fac10d14eafefc20f9bda5'/>
<id>urn:sha1:117a8cdea3647e8e11fac10d14eafefc20f9bda5</id>
<content type='text'>
First step for RCU conversion of fib_hash :

struct fn_zone are created and never deleted.

Very classic conversion, using rcu_assign_pointer(), rcu_dereference()
and rtnl_dereference() verbs.

__rcu markers on fz_next and fn_zone_list

They are created under RTNL, we dont need fib_hash_lock anymore in
fn_new_zone().

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>fib_hash: embed initial hash table in fn_zone</title>
<updated>2010-10-17T20:53:15+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-14T20:53:04+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9bef83edfba72ba58b42c14fb046da2199574bc0'/>
<id>urn:sha1:9bef83edfba72ba58b42c14fb046da2199574bc0</id>
<content type='text'>
While looking for false sharing problems, I noticed
sizeof(struct fn_zone) was small (28 bytes) and possibly sharing a cache
line with an often written kernel structure.

Most of the time, fn_zone uses its initial hash table of 16 slots.

We can avoid the false sharing problem by embedding this initial hash
table in fn_zone itself, so that sizeof(fn_zone) &gt; L1_CACHE_BYTES

We did a similar optimization in commit a6501e080c (Reduce memory needs
and speedup lookups)

Add a fz_revorder field to speedup fn_hash() a bit.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>fib: RCU conversion of fib_lookup()</title>
<updated>2010-10-06T03:39:38+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-05T10:41:36+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ebc0ffae5dfb4447e0a431ffe7fe1d467c48bbb9'/>
<id>urn:sha1:ebc0ffae5dfb4447e0a431ffe7fe1d467c48bbb9</id>
<content type='text'>
fib_lookup() converted to be called in RCU protected context, no
reference taken and released on a contended cache line (fib_clntref)

fib_table_lookup() and fib_semantic_match() get an additional parameter.

struct fib_info gets an rcu_head field, and is freed after an rcu grace
period.

Stress test :
(Sending 160.000.000 UDP frames on same neighbour,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_HASH) (about same results for FIB_TRIE)

Before patch :

real	1m31.199s
user	0m13.761s
sys	23m24.780s

After patch:

real	1m5.375s
user	0m14.997s
sys	15m50.115s

Before patch Profile :

13044.00 15.4% __ip_route_output_key vmlinux
 8438.00 10.0% dst_destroy           vmlinux
 5983.00  7.1% fib_semantic_match    vmlinux
 5410.00  6.4% fib_rules_lookup      vmlinux
 4803.00  5.7% neigh_lookup          vmlinux
 4420.00  5.2% _raw_spin_lock        vmlinux
 3883.00  4.6% rt_set_nexthop        vmlinux
 3261.00  3.9% _raw_read_lock        vmlinux
 2794.00  3.3% fib_table_lookup      vmlinux
 2374.00  2.8% neigh_resolve_output  vmlinux
 2153.00  2.5% dst_alloc             vmlinux
 1502.00  1.8% _raw_read_lock_bh     vmlinux
 1484.00  1.8% kmem_cache_alloc      vmlinux
 1407.00  1.7% eth_header            vmlinux
 1406.00  1.7% ipv4_dst_destroy      vmlinux
 1298.00  1.5% __copy_from_user_ll   vmlinux
 1174.00  1.4% dev_queue_xmit        vmlinux
 1000.00  1.2% ip_output             vmlinux

After patch Profile :

13712.00 15.8% dst_destroy             vmlinux
 8548.00  9.9% __ip_route_output_key   vmlinux
 7017.00  8.1% neigh_lookup            vmlinux
 4554.00  5.3% fib_semantic_match      vmlinux
 4067.00  4.7% _raw_read_lock          vmlinux
 3491.00  4.0% dst_alloc               vmlinux
 3186.00  3.7% neigh_resolve_output    vmlinux
 3103.00  3.6% fib_table_lookup        vmlinux
 2098.00  2.4% _raw_read_lock_bh       vmlinux
 2081.00  2.4% kmem_cache_alloc        vmlinux
 2013.00  2.3% _raw_spin_lock          vmlinux
 1763.00  2.0% __copy_from_user_ll     vmlinux
 1763.00  2.0% ip_output               vmlinux
 1761.00  2.0% ipv4_dst_destroy        vmlinux
 1631.00  1.9% eth_header              vmlinux
 1440.00  1.7% _raw_read_unlock_bh     vmlinux

Reference results, if IP route cache is enabled :

real	0m29.718s
user	0m10.845s
sys	7m37.341s

25213.00 29.5% __ip_route_output_key   vmlinux
 9011.00 10.5% dst_release             vmlinux
 4817.00  5.6% ip_push_pending_frames  vmlinux
 4232.00  5.0% ip_finish_output        vmlinux
 3940.00  4.6% udp_sendmsg             vmlinux
 3730.00  4.4% __copy_from_user_ll     vmlinux
 3716.00  4.4% ip_route_output_flow    vmlinux
 2451.00  2.9% __xfrm_lookup           vmlinux
 2221.00  2.6% ip_append_data          vmlinux
 1718.00  2.0% _raw_spin_lock_bh       vmlinux
 1655.00  1.9% __alloc_skb             vmlinux
 1572.00  1.8% sock_wfree              vmlinux
 1345.00  1.6% kfree                   vmlinux

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
