summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorWillem de Bruijn <willemb@google.com>2012-10-26 11:52:08 +0000
committerDavid S. Miller <davem@davemloft.net>2012-10-31 13:56:40 -0400
commitecd5cf5dc5b07bc57aedf4c92453d6c343e3d840 (patch)
tree1e1b88386c2f7ceaabf56d362bf3b7789c91f9e1
parent9add4d8174907c9977ad29cf6e71460829ec954e (diff)
downloadlwn-ecd5cf5dc5b07bc57aedf4c92453d6c343e3d840.tar.gz
lwn-ecd5cf5dc5b07bc57aedf4c92453d6c343e3d840.zip
net: compute skb->rxhash if nic hash may be 3-tuple
Network device drivers can communicate a Toeplitz hash in skb->rxhash, but devices differ in their hashing capabilities. All compute a 5-tuple hash for TCP over IPv4, but for other connection-oriented protocols, they may compute only a 3-tuple. This breaks RPS load balancing, e.g., for TCP over IPv6 flows. Additionally, for GRE and other tunnels, the kernel computes a 5-tuple hash over the inner packet if possible, but devices do not. This patch recomputes the rxhash in software in all cases where it cannot be certain that a 5-tuple was computed. Device drivers can avoid recomputation by setting the skb->l4_rxhash flag. Recomputing adds cycles to each packet when RPS is enabled or the packet arrives over a tunnel. A comparison of 200x TCP_STREAM between two servers running unmodified netnext with rxhash computation in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows how much time is spent in __skb_get_rxhash in this worst case: 0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash 0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash 0.05% swapper [kernel.kallsyms] [k] __skb_get_rxhash With 200x TCP_RR it increases to 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash I considered having the patch explicitly skips recomputation when it knows that it will not improve the hash (TCP over IPv4), but that conditional complicates code without saving many cycles in practice, because it has to take place after flow dissector. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-rw-r--r--include/linux/skbuff.h2
1 files changed, 1 insertions, 1 deletions
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6a2c34e6d962..a2a0bdb95a8f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -643,7 +643,7 @@ extern unsigned int skb_find_text(struct sk_buff *skb, unsigned int from,
extern void __skb_get_rxhash(struct sk_buff *skb);
static inline __u32 skb_get_rxhash(struct sk_buff *skb)
{
- if (!skb->rxhash)
+ if (!skb->l4_rxhash)
__skb_get_rxhash(skb);
return skb->rxhash;