lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message	Author
2011-05-08	inet: Pass flowi to ->queue_xmit().	David S. Miller
	This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.	David S. Miller
	Operation order is now transposed: we first create the child socket, then we try to hook up the route. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ipv4: Create inet_csk_route_child_sock().	David S. Miller
	This is just like inet_csk_route_req() except that it operates after we've created the new child socket. In this way we can use the new socket's cork flow for proper route key storage. This will be used by DCCP and TCP child socket creation handling. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	sctp: Store a flowi in transports to provide persistent keying.	David S. Miller
	Several future simplifications are possible now because of this. For example, the sctp_addr unions can simply refer directly to the flowi information. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ipv4: Use cork flow in ip_queue_xmit()	David S. Miller
	All invokers of ip_queue_xmit() must make certain that the socket is locked. All of SCTP, TCP, DCCP, and L2TP now make sure this is the case. Therefore we can use the cork flow during output route lookup in ip_queue_xmit() when the socket route check fails. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ipv4: Use cork flow in inet_sk_{reselect_saddr,rebuild_header}()	David S. Miller
	These two functions must be invoked only when the socket is locked (because socket identity modifications are made non-atomically). Therefore we can use the cork flow for output route lookups. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ipv4: Lock socket and use cork flow in ip4_datagram_connect().	David S. Miller
	This is to make sure that an l2tp socket's inet cork flow is fully filled in, when it's encapsulated in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	l2tp: Use cork flow in l2tp_ip_connect() and l2tp_ip_sendmsg()	David S. Miller
	Now that the socket is consistently locked in these two routines, this transformation is legal. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	l2tp: Fix locking in l2tp_core.c	David S. Miller
	l2tp_xmit_skb() must take the socket lock. It makes use of ip_queue_xmit(), which expects to execute in a socket atomic context. Since we execute this function in software interrupts, we cannot use the usual lock_sock()/release_sock() sequence; instead we have to use bh_lock_sock(), check whether a user has the socket locked, and if so drop the packet. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	l2tp: Fix locking in l2tp_ip.c	David S. Miller
	Both l2tp_ip_connect() and l2tp_ip_sendmsg() must take the socket lock. They both modify socket state non-atomically, and in particular l2tp_ip_sendmsg() increments socket private counters without using atomic operations. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	tcp: Use cork flow in tcp_v4_connect()	David S. Miller
	Since this is invoked from inet_stream_connect() the socket is locked and therefore this usage is safe. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	dccp: Use cork flow in dccp_v4_connect()	David S. Miller
	Since this is invoked from inet_stream_connect() the socket is locked and therefore this usage is safe. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ethtool: remove phys_id from ethtool_ops	Stephen Hemminger
	Now that all the upstream kernel drivers use set_phys_id, the old ethtool_ops interface (phys_id) can be removed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-06	ipv4: Initialize cork->opt using NULL not 0.	David S. Miller
	Noticed by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-06	ipv4: Initialize on-stack cork more efficiently.	David S. Miller
	ip_setup_cork() explicitly initializes every member of inet_cork except flags, addr, and opt. So we can simply set those three members to zero instead of using a memset() via an empty struct assignment. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-05-06	inet: Decrease overhead of on-stack inet_cork.	David S. Miller
	When we fast path datagram sends to avoid locking by putting the inet_cork on the stack, we use up lots of space that isn't necessary. This is because inet_cork contains a "struct flowi" which isn't used in these code paths. Split inet_cork into two parts, "inet_cork" and "inet_cork_full". Only the latter has the "struct flowi", and it is what is stored in inet_sock. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-05-05	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	David S. Miller
	Conflicts: drivers/net/tg3.c
2011-05-05	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	David S. Miller
2011-05-05	net: Add sendmmsg socket system call	Anton Blanchard
	This patch adds a multiple message send syscall and is the send version of the existing recvmmsg syscall. This is heavily based on the patch by Arnaldo that added recvmmsg. I wrote a microbenchmark to test the performance gains of using this new syscall: http://ozlabs.org/~anton/junkcode/sendmmsg_test.c The test was run on a ppc64 box with a 10 Gbit network card. The benchmark can send both UDP and RAW ethernet packets.
	64B UDP
	batch	pkts/sec
	1	804570
	2	872800 (+ 8 %)
	4	916556 (+14 %)
	8	939712 (+17 %)
	16	952688 (+18 %)
	32	956448 (+19 %)
	64	964800 (+20 %)
	64B raw socket
	batch	pkts/sec
	1	1201449
	2	1350028 (+12 %)
	4	1461416 (+22 %)
	8	1513080 (+26 %)
	16	1541216 (+28 %)
	32	1553440 (+29 %)
	64	1557888 (+30 %)
	We see a 20% improvement in throughput on UDP send and 30% on raw socket send. [ Add sparc syscall entries. -DaveM ] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>
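A minimal userspace sketch of how the batched send might be used, assuming the glibc sendmmsg() wrapper is available. The helper names send_batch() and udp_loopback_pair() and the MAX_BATCH limit are hypothetical and are not taken from the patch or from the linked microbenchmark.

```c
#define _GNU_SOURCE		/* needed for sendmmsg() and struct mmsghdr */
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_BATCH 64		/* hypothetical cap chosen for this sketch */

/* Create a UDP receiver bound to a kernel-chosen loopback port and a
 * sender connected to it. Returns 0 on success, -1 on failure. */
int udp_loopback_pair(int *rx, int *tx)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	*rx = socket(AF_INET, SOCK_DGRAM, 0);
	*tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (*rx < 0 || *tx < 0)
		return -1;
	if (bind(*rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return -1;
	if (getsockname(*rx, (struct sockaddr *)&addr, &len) < 0)
		return -1;
	return connect(*tx, (struct sockaddr *)&addr, sizeof(addr));
}

/* Send up to MAX_BATCH NUL-terminated payloads on a connected datagram
 * socket with one sendmmsg() call. Returns the number of messages
 * sent, or -1 with errno set. */
int send_batch(int fd, const char *const *payloads, unsigned int count)
{
	struct mmsghdr msgs[MAX_BATCH];
	struct iovec iov[MAX_BATCH];
	unsigned int i;

	assert(payloads != NULL);
	if (count > MAX_BATCH)
		count = MAX_BATCH;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < count; i++) {
		/* One iovec per message; each message is one datagram. */
		iov[i].iov_base = (void *)payloads[i];
		iov[i].iov_len = strlen(payloads[i]);
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}
	return sendmmsg(fd, msgs, count, 0);
}
```

A caller would build the socket pair once and then hand an array of payloads to send_batch(); the return value can be smaller than the requested count, so real code would loop over the remainder.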
2011-05-05	net: call dev_alloc_name from register_netdevice	Jiri Pirko
	Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). This allows removing multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name() in advance remains. This also fixes the veth creation regression caused by 84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-05	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	John W. Linville
	Conflicts: drivers/net/wireless/libertas/if_cs.c drivers/net/wireless/rtlwifi/pci.c net/bluetooth/l2cap_sock.c
2011-05-05	mac80211: Fix a warning due to skipping tailroom reservation for IV	Mohammed Shafi Shajakhan
	Devices that require IV generation in software need tailroom reserved for the ICVs used in TKIP or WEP encryption. Currently, the decision to skip the tailroom reservation in the tx path depends only on whether the driver wants the MMIC to be generated in software. This patch adds an IV generation check to that decision and fixes the following warning:
	WARNING: at net/mac80211/wep.c:101 ieee80211_wep_add_iv+0x56/0xf3()
	Hardware name: 64756D6
	Modules linked in: ath9k ath9k_common ath9k_hw
	Pid: 0, comm: swapper Tainted: G W 2.6.39-rc5-wl
	Call Trace:
	[<c102fd29>] warn_slowpath_common+0x65/0x7a
	[<c1465c4e>] ? ieee80211_wep_add_iv+0x56/0xf3
	[<c102fd4d>] warn_slowpath_null+0xf/0x13
	[<c1465c4e>] ieee80211_wep_add_iv+0x56/0xf3
	[<c1466007>] ieee80211_crypto_wep_encrypt+0x63/0x88
	[<c1478bf3>] ieee80211_tx_h_encrypt+0x2f/0x63
	[<c1478cba>] invoke_tx_handlers+0x93/0xe1
	[<c1478eda>] ieee80211_tx+0x4b/0x6d
	[<c147907c>] ieee80211_xmit+0x180/0x188
	[<c147779d>] ? ieee80211_skb_resize+0x95/0xd9
	[<c1479edf>] ieee80211_subif_start_xmit+0x64f/0x668
	[<c13956fc>] dev_hard_start_xmit+0x368/0x48c
	[<c13a8bd6>] sch_direct_xmit+0x4d/0x101
	[<c1395ae1>] dev_queue_xmit+0x2c1/0x43f
	[<c13a74a2>] ? eth_header+0x1e/0x90
	[<c13a7400>] ? eth_type_trans+0x91/0xc2
	[<c13a7484>] ? eth_rebuild_header+0x53/0x53
	[<c139f079>] neigh_resolve_output+0x223/0x27e
	[<c13c6b23>] ip_finish_output2+0x1d4/0x1fe
	[<c13c6bc6>] ip_finish_output+0x79/0x7d
	[<c13c6cbe>] T.1075+0x43/0x48
	[<c13c6e6e>] ip_output+0x75/0x7b
	[<c13c4970>] dst_output+0xc/0xe
	[<c13c62c9>] ip_local_out+0x17/0x1a
	[<c13c67bb>] ip_queue_xmit+0x2aa/0x2f8
	[<c138b742>] ? sk_setup_caps+0x21/0x92
	[<c13d95ea>] ? __tcp_v4_send_check+0x7e/0xb7
	[<c13d5d2e>] tcp_transmit_skb+0x6a1/0x6d7
	[<c13d533b>] ? tcp_established_options+0x20/0x8b
	[<c13d6f28>] tcp_retransmit_skb+0x43a/0x527
	[<c13d8d6d>] tcp_retransmit_timer+0x32e/0x45d
	[<c13d8f23>] tcp_write_timer+0x87/0x16c
	[<c103a030>] run_timer_softirq+0x156/0x1f9
	[<c13d8e9c>] ? tcp_retransmit_timer+0x45d/0x45d
	[<c1034d65>] __do_softirq+0x97/0x14a
	[<c1034cce>] ? irq_enter+0x4d/0x4d
	Cc: Yogesh Powar <yogeshp@marvell.com> Reported-by: Fabio Rossi <rossi.f@inwind.it> Tested-by: Fabio Rossi <rossi.f@inwind.it> Signed-off-by: Mohammed Shafi Shajakhan <mshajakhan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-05-04	can: rename can_try_module_get to can_get_proto	Kurt Van Dijck
	can_try_module_get does return a struct can_proto. The name explains what is done in so much detail that a caller may not notice that a struct can_proto is locked/unlocked. Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04	can: make struct can_proto const	Kurt Van Dijck
	commit 53914b67993c724cec585863755c9ebc8446e83b had the same message. That commit did put everything in place but did not make can_proto const itself. Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04	net: ip_expire() must revalidate route	Eric Dumazet
	Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, in case timeout is fired. When a frame is defragmented, we use last skb dst field when building final skb. Its dst is valid, since we are in rcu read section. But if a timeout occurs, we take first queued fragment to build one ICMP TIME EXCEEDED message. Problem is all queued skb have weak dst pointers, since we escaped RCU critical section after their queueing. icmp_send() might dereference a now freed (and possibly reused) part of memory. Calling skb_dst_drop() and ip_route_input_noref() to revalidate route is the only possible choice. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04	ipv6: Use flowi4->{daddr,saddr} in ipip6_tunnel_xmit().	David S. Miller
	Instead of rt->rt_{dst,src} Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04	ipv4: Kill rt->rt_{src, dst} usage in IP GRE tunnels.	David S. Miller
	First, make callers pass on-stack flowi4 to ip_route_output_gre() so they can get at the fully resolved flow key. Next, use that in ipgre_tunnel_xmit() to avoid the need to use rt->rt_{dst,src}. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04	ipv4: Pass explicit saddr/daddr args to ipmr_get_route().	David S. Miller
	This eliminates the need to use rt->rt_{src,dst}. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04	ipv4: In ip_build_and_send_pkt() use 'saddr' and 'daddr' args passed in.	David S. Miller
	Instead of rt->rt_{dst,src}. The only tricky part is source route option handling. If the source route option is enabled we can't just use plain 'daddr'; we have to use opt->opt.faddr. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04	ipv4: Use flowi4->{daddr,saddr} in ipip_tunnel_xmit().	David S. Miller
	Instead of rt->rt_{dst,src} Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03	sctp: Use flowi4's {saddr,daddr} in sctp_v4_dst_saddr() and sctp_v4_get_dst()	David S. Miller
	Instead of rt->rt_{src,dst} Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03	ipv4: Use flowi4's {saddr,daddr} in igmpv3_newpack() and igmp_send_report()	David S. Miller
	Instead of rt->rt_{src,dst} Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03	ipv4: Make caller provide on-stack flow key to ip_route_output_ports().	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03	dccp: Use flowi4->saddr in dccp_v4_connect()	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03	ipv4: Rename struct rtable's rt_tos to rt_key_tos.	David S. Miller
	To more accurately reflect that it is purely a routing cache lookup key and is used in no other context. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03	ipv4: Rework ipmr_rt_fib_lookup() flow key initialization.	David S. Miller
	Use information from the skb as much as possible, currently this means daddr, saddr, and TOS. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02	sysctl: net: call unregister_net_sysctl_table where needed	Lucian Adrian Grijincu
	ctl_table_headers registered with register_net_sysctl_table should have been unregistered with the equivalent unregister_net_sysctl_table. Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02	networking: inappropriate ioctl operation should return ENOTTY	Lifeng Sun
	ioctl() calls against a socket with an inappropriate ioctl operation are incorrectly returning EINVAL rather than ENOTTY: [ENOTTY] Inappropriate I/O control operation. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=33992 Signed-off-by: Lifeng Sun <lifongsun@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
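The behaviour described above can be probed from userspace with a sketch like the following. It issues a terminal-only request (TCGETS) against a UDP socket and reports the errno; on a kernel carrying this change the result is expected to be ENOTTY rather than EINVAL. The function name socket_tty_ioctl_errno() is hypothetical, and the choice of TCGETS on an AF_INET datagram socket is an assumption made for the example, not something the commit specifies.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <termios.h>
#include <unistd.h>

/* Issue a tty-specific ioctl against a socket and return the errno it
 * produced (0 if the call unexpectedly succeeded). */
int socket_tty_ioctl_errno(void)
{
	struct termios t;
	int fd, err = 0;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	assert(fd >= 0);

	/* TCGETS only makes sense for terminals, so it is an
	 * "inappropriate I/O control operation" for a socket. */
	if (ioctl(fd, TCGETS, &t) < 0)
		err = errno;	/* save before close() can change it */

	close(fd);
	return err;
}
```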
2011-05-02	net: dont hold rtnl mutex during netlink dump callbacks	Eric Dumazet
	Four years ago, Patrick made a change to hold the rtnl mutex during netlink dump callbacks. I believe it was a wrong move. This slows down concurrent dumps, making good old /proc/net/ files faster than rtnetlink in some situations. This occurred to me because one "ip link show dev ..." was _very_ slow on a workload adding/removing network devices in the background. All dump callbacks are able to use RCU locking now, so this patch roughly reverts these commits:
	1c2d670f366 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
	6313c1e0992 : [RTNETLINK]: Remove unnecessary locking in dump callbacks
	This lets writers fight for the rtnl mutex while readers go at full speed. It also takes care of phonet: phonet_route_get() is now called from an RCU read section, so I renamed it to phonet_route_get_rcu(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02	Merge branch 'batman-adv/next' of git://git.open-mesh.org/ecsv/linux-merge	David S. Miller

2011-05-02	ipv4: Make sure flowi4->{saddr,daddr} are always set.	David S. Miller
	Slow path output route resolution always makes sure that ->{saddr,daddr} are set, and also if we trigger into IPSEC resolution we initialize them as well, because xfrm_lookup() expects them to be fully resolved. But if we hit the fast path and flowi4->flowi4_proto is zero, we won't do this initialization. Therefore, move the IPSEC path initialization to the route cache lookup fast path to make sure these are always set. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02	mac80211: consolidate MIC failure report handling	Christian Lamparter
	Currently, mac80211 handles MIC failures differently depending on whether they are detected by the stack's own software crypto or handed down from the driver. This patch tries to unify both by moving the special branch out of the mac80211 rx hotpath and into the software crypto part. This has the advantage that we can run a few more sanity checks on the data and verify whether the key type was TKIP. This is very handy because several devices, such as carl9170, ath9k and wl12xx, generate false positive MIC failure reports: <http://www.spinics.net/lists/linux-wireless/msg68494.html> "mac80211: report MIC failure for truncated packets in AP mode" Cc: Luciano Coelho <coelho@ti.com> Cc: Arik Nemtsov <arik@wizery.com> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-05-01	ipv4: don't spam dmesg with "Using LC-trie" messages	Alexey Dobriyan
	fib_trie_table() is called during netns creation and Chromium uses clone(CLONE_NEWNET) to sandbox renderer process. Don't print anything. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-01	af_unix: Only allow recv on connected seqpacket sockets.	Eric W. Biederman
	This fixes the following oops discovered by Dan Aloni:
	> Anyway, the following is the output of the Oops that I got on the
	> Ubuntu kernel on which I first detected the problem
	> (2.6.37-12-generic). The Oops that followed will be more useful, I
	> guess.
	> [ 5594.669852] BUG: unable to handle kernel NULL pointer dereference at (null)
	> [ 5594.681606] IP: [<ffffffff81550b7b>] unix_dgram_recvmsg+0x1fb/0x420
	> [ 5594.687576] PGD 2a05d067 PUD 2b951067 PMD 0
	> [ 5594.693720] Oops: 0002 [#1] SMP
	> [ 5594.699888] last sysfs file:
	The bug was that unix domain sockets use a pseudo packet for connecting, and accept uses that pseudo packet to get the socket. In the buggy seqpacket case we were allowing unconnected sockets to call recvmsg and try to receive the pseudo packet. That is always wrong, and as of commit 7361c36c5 the pseudo packet had become different enough from a normal packet that the kernel started oopsing. Do for seqpacket_recv what was done for seqpacket_send in 2.5 and only allow it on connected seqpacket sockets. Cc: stable@kernel.org Tested-by: Dan Aloni <dan@aloni.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
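The restriction described above can be observed from userspace with a sketch like the one below: it calls recv() on a SOCK_SEQPACKET unix socket that was never connected and returns the errno. With the fix in place the call is expected to be refused (ENOTCONN on current kernels) instead of reaching the datagram receive path. The function name is hypothetical, and the exact error code is an assumption about kernel behaviour that the commit message itself does not state.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to receive on an unconnected AF_UNIX SOCK_SEQPACKET socket and
 * return the resulting errno (0 if recv() unexpectedly succeeded). */
int unconnected_seqpacket_recv_errno(void)
{
	char buf[16];
	int fd, err = 0;

	fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	assert(fd >= 0);

	/* MSG_DONTWAIT keeps the probe from blocking if the kernel
	 * were to accept the call. */
	if (recv(fd, buf, sizeof(buf), MSG_DONTWAIT) < 0)
		err = errno;	/* save before close() can change it */

	close(fd);
	return err;
}
```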
2011-05-01	batman-adv: Make bat_priv->primary_if an rcu protected pointer	Marek Lindner
	The rcu protected macros rcu_dereference() and rcu_assign_pointer() for the bat_priv->primary_if need to be used, as well as spin/rcu locking. Otherwise we might end up using a primary_if pointer pointing to already freed memory. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-05-01	batman-adv: fix gw_node_update() and gw_election()	Antonio Quartulli
	This is a regression from c4aac1ab9b973798163b34939b522f01e4d28ac9:
	- gw_node_update() doesn't add a new gw_node in case of empty curr_gw. This means that at the beginning no gw_node is added, leading to an empty gateway list.
	- gw_election() is terminating in case of curr_gw == NULL. It has to terminate in case of curr_gw != NULL.
	Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-05-01	batman-adv: Move definition of atomic_dec_not_zero() into main.h	Antonio Quartulli
	atomic_dec_not_zero() is very useful and is currently defined multiple times, so move the definition into main.h. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Sven Eckelmann <sven@narfation.org>
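The commit does not show the definition of atomic_dec_not_zero(), so the following is only a userspace sketch of the semantics the name suggests: decrement a counter unless it is already zero, and report whether the decrement happened. It uses C11 atomics rather than the kernel's atomic_t, and demo_dec_not_zero() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Decrement *v unless it is zero. Returns true if the counter was
 * decremented, false if it was already zero and left untouched. */
bool demo_dec_not_zero(atomic_int *v)
{
	int old;

	assert(v != NULL);
	old = atomic_load(v);
	while (old != 0) {
		/* On failure the compare-exchange reloads 'old' with the
		 * current value, so the loop re-checks for zero. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}
```

This pattern is handy for reference counts where a caller may only take an action while at least one reference remains.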
2011-05-01	batman-adv: orig_hash_find() manages rcu_lock/unlock internally	Antonio Quartulli
	orig_hash_find() manages rcu_lock/unlock internally and doesn't need to be surrounded by rcu_read_lock() / rcu_read_unlock() anymore Signed-off-by: Antonio Quartulli <ordex@autistici.org> Acked-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-04-29	ethtool: cosmetic: Use ethtool ethtool_cmd_speed API	David Decotigny
	This updates the network drivers so that they don't access the ethtool_cmd::speed field directly, but use ethtool_cmd_speed() instead. For most of the drivers, these changes are purely cosmetic and don't fix any problem, such as for those 1GbE/10GbE drivers that indirectly call their own ethtool get_settings()/mii_ethtool_gset(). The changes are meant to enforce code consistency and provide robustness with future larger throughputs, at the expense of a few CPU cycles for each ethtool operation. All drivers compiled with make allyesconfig on x86_64 have been updated. Tested: make allyesconfig on x86_64 + e1000e/bnx2x work Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-29	ethtool: Use full 32 bit speed range in ethtool's set_settings	David Decotigny
	This makes sure that the ethtool set_settings() callbacks of network drivers don't ignore the 16 most significant bits of the speed when ethtool calls them. All drivers compiled with make allyesconfig on x86_64 have been updated. Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
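A standalone sketch of why an accessor matters here, assuming the 32-bit speed is stored as two 16-bit halves. The struct demo_ethtool_cmd, its speed_hi field name, and both helper functions are hypothetical stand-ins written for this example; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding a link speed as two 16-bit halves. */
struct demo_ethtool_cmd {
	uint16_t speed;		/* low 16 bits of the speed */
	uint16_t speed_hi;	/* high 16 bits of the speed */
};

/* Store a full 32-bit speed across both halves. */
void demo_cmd_speed_set(struct demo_ethtool_cmd *cmd, uint32_t speed)
{
	assert(cmd != NULL);
	cmd->speed = (uint16_t)(speed & 0xffff);
	cmd->speed_hi = (uint16_t)(speed >> 16);
}

/* Reassemble the full 32-bit speed. Reading only cmd->speed would
 * silently drop the 16 most significant bits. */
uint32_t demo_cmd_speed(const struct demo_ethtool_cmd *cmd)
{
	assert(cmd != NULL);
	return ((uint32_t)cmd->speed_hi << 16) | cmd->speed;
}
```

For a speed of 100000, the low half alone reads 34464, which is the kind of truncation a driver would see if it ignored the upper bits.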