2013-12-14  ipv6: fix compiler warning in ipv6_exthdrs_len  (Hannes Frederic Sowa)
Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") used an uninitialized variable which leads to the following compiler warning:

    net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’:
    net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       opth = (void *)opth + optlen;
                           ^
    net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here
       int len = 0, proto, optlen;
                           ^

Fix it up.

Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
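[Editor's note] A minimal sketch of the kind of fix implied here: give optlen a defined value before the first loop iteration so the first pointer advance is well defined. This is not the upstream diff; the loop structure and termination test are assumptions, only ipv6_optlen()/ipv6_ext_hdr() are existing helpers from <net/ipv6.h>.

    /* Fragment in the style of net/ipv6/ip6_offload.c; sketch only. */
    static int ipv6_exthdrs_len_sketch(struct ipv6hdr *iph)
    {
            struct ipv6_opt_hdr *opth = (void *)iph;
            /* Initialized before first use: the first advance skips the
             * fixed IPv6 header, later advances use the previous
             * extension header's length. */
            unsigned int optlen = sizeof(struct ipv6hdr);
            int len = 0;
            u8 proto = iph->nexthdr;

            while (ipv6_ext_hdr(proto) && proto != NEXTHDR_NONE) {
                    opth = (void *)opth + optlen;
                    proto = opth->nexthdr;
                    optlen = ipv6_optlen(opth);
                    len += optlen;
            }
            return len;
    }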
2013-12-14  Merge branch 'bonding_rcu'  (David S. Miller)
Ding Tianhong says:

====================
bonding: rebuild the lock use for bond monitor

The bond slave list is now protected only by RTNL, not by the bond lock, but the monitor functions still take the bond lock to protect the slave list, which is useless. According to Veaceslav's opinion, there were three ways to fix the protection problem:

1. Call bond_master_upper_dev_link() and bond_upper_dev_unlink() under bond->lock, but it is unsafe to call call_netdevice_notifiers() while holding a write lock.

2. Remove the unused bond->lock from the monitor functions and rely only on the existing rtnl_lock(), but this costs performance in the fast path.

3. Use RCU to protect the slave list; performance is better, and the cost in the slow path can be ignored.

Obviously solution 1 does not fit here, so I considered solutions 2 and 3. My principle is simple: in the fast path RCU is better, while in the slow path either is fine. According to Jay Vosburgh's opinion, the monitor would lose performance if RTNL protected the whole slave list walk, so remove the bond lock and replace it with RCU.

The second problem is the bond's curr_slave_lock, which is old and unneeded in many places, because curr_active_slave is only changed in three places:

1. enslaving a slave.
2. releasing a slave.
3. changing the active slave.

All of the above already hold the bond lock, RTNL and curr_slave_lock together, which is tedious; there is no need for so many locks. When changing curr_active_slave you have to hold RTNL and curr_slave_lock together, but when reading curr_active_slave, holding either RTNL or curr_slave_lock is enough.

For stability I did not change the logic of the monitors; all changes are clear and simple. I have tested the patch set under lockdep and it works well and is stable.

v2: accept Jay Vosburgh's opinion, remove RTNL and replace it with RCU, and add some RCU helpers for bonding, so the patch set grows to 10 patches.

v3: accept Nikolay Aleksandrov's opinion, remove the unneeded bond_has_slave_rcu(), add protection for several 3ad mode handler functions and current_arp_slave, and rework bond_first_slave_rcu() to make it clearer.

v4: because struct netdev_adjacent should not live in netdevice.h, add a new function to support the bond_first_slave_rcu() macro; also add a new patch to simplify bond_resend_igmp_join_requests_delayed().

v5: according to Jay Vosburgh's opinion on patches 2 and 6, the peer notification can hardly race with bond_xxx_commit() while the monitor is running, so the performance impact of going from two RTNL round trips to one is minimal and there is no need for that change; modify patches 2 and 6 to keep the peer notification under RTNL alone.
====================

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: rebuild the bond_resend_igmp_join_requests_delayed()  (dingtianhong)
bond_resend_igmp_join_requests_delayed() and bond_resend_igmp_join_requests() should be merged, because bond_resend_igmp_join_requests_delayed() does nothing except call bond_resend_igmp_join_requests().

The bond's igmp_retrans can only be changed in bond_change_active_slave() and here. bond_change_active_slave() is called under RTNL and curr_slave_lock, and bond_resend_igmp_join_requests() already holds RTNL, so there is no need to drop RTNL and take curr_slave_lock again; it is a small optimization. Move the igmp_retrans update under RTNL and remove the curr_slave_lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
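[Editor's note] A hedged sketch of what the combined helper can look like under the locking described above. The igmp_retrans, mcast_work and wq fields and the NETDEV_RESEND_IGMP notifier are taken from the text and existing bonding code; the re-arm intervals and exact structure are assumptions, not the upstream diff.

    /* Sketch: the delayed work takes RTNL and runs the resend directly,
     * so igmp_retrans no longer needs curr_slave_lock. */
    static void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
    {
            struct bonding *bond = container_of(work, struct bonding,
                                                mcast_work.work);

            if (!rtnl_trylock()) {
                    /* RTNL busy: retry shortly. */
                    queue_delayed_work(bond->wq, &bond->mcast_work, 1);
                    return;
            }
            call_netdevice_notifiers(NETDEV_RESEND_IGMP, bond->dev);

            /* igmp_retrans is only written here and in
             * bond_change_active_slave(), both under RTNL. */
            if (bond->igmp_retrans > 1) {
                    bond->igmp_retrans--;
                    queue_delayed_work(bond->wq, &bond->mcast_work, HZ / 5);
            }
            rtnl_unlock();
    }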
2013-12-14  bonding: remove unwanted lock for bond_store_primaryxxx()  (dingtianhong)
bond_select_active_slave() no longer releases and re-acquires the bond lock, so there is no need to take the bond lock for read around it, and bond_store_primaryxxx() already runs under RTNL, so remove the unwanted lock.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: remove unwanted lock for bond_option_active_slave_set()  (dingtianhong)
bond_option_active_slave_set() is always called under RTNL, and RTNL protects the bond slave list, so remove the unwanted bond lock.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add RCU for bond_3ad_state_machine_handler()  (dingtianhong)
bond_3ad_state_machine_handler() uses the bond lock to protect the bond slave list and the slave ports together, but that is not enough: the slave list is linked and unlinked under RTNL, not the bond lock, so add RCU to keep the slave list from going away under the walk.

The bond lock is still used here, because when a slave has been removed from the list by the time the state machine runs, it is possible for both functions to manipulate the same aggregator->lag_ports by finding the aggregator via two different ports that are both members of that aggregator (i.e., port A of the agg is being unbound while port B of the agg is running its state machine). If the bond lock were removed, nothing would serialize changes to aggregator->lag_ports between bond_3ad_state_machine_handler() and bond_3ad_unbind_slave(), so the bond lock is the simplest way to protect aggregator->lag_ports.

Many functions need RCU protection, and there were two ways to make them RCU-safe: (1) create new, similar functions that walk the slave list under RCU, or (2) modify the existing functions so they run inside a read-side critical section, since RCU read-side critical sections may be nested. I chose (2), because there is no need to create more near-duplicate functions.

The comments in the function were also stale; clean them up.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
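[Editor's note] For illustration, a hedged sketch of the resulting nesting: RCU pins the slave list while the read-side bond lock still serializes aggregator state against bond_3ad_unbind_slave(). bond_for_each_slave_rcu() exists in the bonding code of this era; the surrounding details (re-arm interval, per-port work) are assumptions.

    void bond_3ad_state_machine_handler(struct work_struct *work)
    {
            struct bonding *bond = container_of(work, struct bonding,
                                                ad_work.work);
            unsigned long delta = msecs_to_jiffies(100); /* assumed re-arm interval */
            struct list_head *iter;
            struct slave *slave;

            /* bond lock: mutual exclusion for aggregator->lag_ports */
            read_lock(&bond->lock);
            /* RCU: the slave list cannot disappear under the walk */
            rcu_read_lock();

            bond_for_each_slave_rcu(bond, slave, iter) {
                    /* run the per-port rx/periodic/selection/mux/tx
                     * state machines for this slave here */
            }

            rcu_read_unlock();
            read_unlock(&bond->lock);

            queue_delayed_work(bond->wq, &bond->ad_work, delta);
    }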
2013-12-14  bonding: remove unwanted lock for bond enslave and release  (dingtianhong)
bond_change_active_slave() and bond_select_active_slave() don't need the bond lock anymore, so remove the unwanted bond lock around these two functions. bond_select_active_slave() releases and re-acquires curr_slave_lock, so curr_slave_lock still needs to be held around it.

In bond enslave and bond release the slave list is also protected by RTNL, so the bond lock does not need to exist there; remove the lock and clean up the functions.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: rebuild the lock use for bond_activebackup_arp_mon()  (dingtianhong)
bond_activebackup_arp_mon() takes the bond lock for read to protect the slave list, which has no effect; RTNL is only needed for bond_ab_arp_commit() and the peer notification. For better performance, replace the bond lock with RCU. Walking the slave list under RCU needs a helper, so add a new bond_first_slave_rcu() to get the first slave under RCU protection.

In bond_ab_arp_probe(), bond->current_arp_slave may change if a slave is released concurrently, for example:

    bond_ab_arp_probe()                  bond_release()
         cpu 0                                cpu 1
          ...
    if (bond->current_arp_slave...)            ...
          ...                       bond->current_arp_slave = NULL
    bond->current_arp_slave->dev->name
          ...

So current_arp_slave needs to be dereferenced once inside the read-side critical section.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
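[Editor's note] A hedged sketch of the dereference pattern described above: take one stable snapshot of current_arp_slave inside the read-side section instead of reading the pointer twice. The field name follows the text; the surrounding probe logic is an assumption.

    static void bond_ab_arp_probe_sketch(struct bonding *bond)
    {
            struct slave *curr_arp_slave;

            rcu_read_lock();
            /* one snapshot: a concurrent bond_release() clearing
             * bond->current_arp_slave cannot NULL it out between the
             * check and the use below */
            curr_arp_slave = rcu_dereference(bond->current_arp_slave);
            if (curr_arp_slave)
                    pr_info("PROBE: c_arp %s\n", curr_arp_slave->dev->name);
            rcu_read_unlock();
    }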
2013-12-14  bonding: create bond_first_slave_rcu()  (dingtianhong)
bond_first_slave_rcu() will be used instead of bond_first_slave() inside rcu_read_lock(). Following Jay Vosburgh's suggestion, struct netdev_adjacent should stay hidden from users who would otherwise access it directly, so wrap the lookup in a new function that returns the first slave of the bond.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
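[Editor's note] A hedged sketch of what such a wrapper and its use can look like. The name of the lower-device RCU accessor added to netdevice.h is an assumption here, since the commit text does not show it.

    /* Sketch: hide struct netdev_adjacent behind a bonding-local helper.
     * The accessor name below is assumed. */
    #define bond_first_slave_rcu(bond) \
            netdev_lower_get_first_private_rcu((bond)->dev)

    /* typical use, always inside rcu_read_lock(): */
    static bool bond_has_first_slave_sketch(struct bonding *bond)
    {
            struct slave *slave;
            bool found;

            rcu_read_lock();
            slave = bond_first_slave_rcu(bond);
            found = slave != NULL;
            rcu_read_unlock();
            return found;
    }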
2013-12-14  bonding: rebuild the lock use for bond_loadbalance_arp_mon()  (dingtianhong)
bond_loadbalance_arp_mon() uses the bond lock to protect the bond slave list, which has no effect, so replace it with either RTNL or RCU; considering the performance impact, RCU is the better choice here, so replace the bond lock with RCU. bond_select_active_slave() needs RTNL and curr_slave_lock together, but RTNL is not held here, so add an rtnl_trylock().

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: rebuild the lock use for bond_alb_monitor()  (dingtianhong)
bond_alb_monitor() uses the bond lock to protect the bond slave list, which has no effect here; RTNL or RCU is needed instead. The monitor is called 10 times per second, so RTNL would cost performance, so replace the bond lock with RCU to protect the slave list. The existing RTNL use is preserved and the logic of the monitor is unchanged.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: rebuild the lock use for bond_mii_monitor()  (dingtianhong)
bond_mii_monitor() still uses the bond lock to protect the bond slave list, which has no effect. There were two ways to fix this: move RTNL to the top of the function, or use RCU to protect the slave list. According to Jay Vosburgh's opinion, taking RTNL 10 times per second for the whole monitor would be a truly big performance loss, so take that advice and use RCU to protect the slave list.

bond_has_slaves() is not protected by anything; nothing bad happens if the slave list changes under it, unless the bond is freed, which cannot happen before the monitor stops because the bond is closed before it is freed.

The peer notification for the bond reads curr_active_slave, so dereference the slave once to make sure we keep accessing the same slave even if curr_active_slave changes. The rcu_dereference() must happen inside a read-side critical section, and bond_change_active_slave() calls the notification with no RCU held, so do the peer notification inside rcu_read_lock(), which may be nested inside the monitor's own read-side section.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: remove the no effect lock for bond_select_active_slave()  (dingtianhong)
The bond slave list is no longer protected by the bond lock, only by RTNL or RCU, so anywhere the bond lock is taken to protect the slave list is meaningless. Remove the release and re-acquire of the bond lock around bond_select_active_slave().

curr_active_slave can only be changed in three places:

1. enslaving a slave.
2. releasing a slave.
3. changing the active slave.

All of the above hold the bond lock, RTNL and curr_slave_lock together, which is tedious and meaningless; the bond lock is clearly not needed here, but RTNL or curr_slave_lock is. So to access the active slave you have to hold one of them: if RTNL is already held there is no need to add curr_slave_lock, otherwise curr_slave_lock is preferable for performance reasons.

Several places call bond_select_active_slave() and bond_change_active_slave(); the next step is to clean up those call sites and remove the locks that have no effect. Some documentation is updated together with the functions.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  pkt_sched: set root qdisc before change() in attach_default_qdiscs()  (Eric Dumazet)
After commit 95dc19299f74 ("pkt_sched: give visibility to mq slave qdiscs") we call qdisc_list_add() while the device qdisc might still be the noop_qdisc one. This shows up as duplicates in "tc qdisc show", as all inactive devices point to noop_qdisc.

Fix this by setting dev->qdisc to the new qdisc before calling ops->change() in attach_default_qdiscs(). Add a WARN_ON_ONCE() to catch any future similar problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
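[Editor's note] A reconstructed ordering sketch, not the upstream diff: publish the new root on dev->qdisc before invoking the qdisc's change() callback, so any qdisc_list_add() run from there no longer sees noop_qdisc as the root. The use of mq_qdisc_ops and the exact shape of attach_default_qdiscs() are assumptions.

    static void attach_default_qdiscs_sketch(struct net_device *dev)
    {
            struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
            struct Qdisc *qdisc;

            qdisc = qdisc_create_dflt(txq, &mq_qdisc_ops, TC_H_ROOT);
            if (!qdisc)
                    return;

            dev->qdisc = qdisc;             /* set the root first ... */
            if (qdisc->ops->change)         /* ... then let change() register it */
                    WARN_ON_ONCE(qdisc->ops->change(qdisc, NULL));
    }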
2013-12-14  Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next  (David S. Miller)
Ben Hutchings says:

====================
An assortment of changes for Linux 3.14:

1. Merge the sfc fixes that you have already merged into net.git. (The branch point for those was such that this does not bring in any other changes.)
2. Reduce log level for a generally useless warning message, from Robert Stonehouse.
3. Include BISTs in ethtool offline self-test for EF10 and recover from BISTs initiated through other functions, from Jon Cooper.
4. Improve a sanity check on RX completions.
5. Avoid incrementing RX dropped count while the interface is down, from Jon Cooper.
6. Improve hardware sensor naming and log messages, from Edward Cree.
7. Log all unexpected errors returned by firmware, from Edward Cree.
8. Expose another NVRAM partition to userland.
9. Some refactoring of the PTP code in preparation for EF10 support.
10. Various minor cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  Merge branch 'bonding_netlink'  (David S. Miller)
Scott Feldman says:

====================
bonding: add more netlink attributes

v2: Addressed v1 review comments, in particular Jay's concern about current sysfs ordering limitations carrying over to iproute. Netlink attributes are processed in a priority order in bond_netlink.c:bond_changelink(). Lower-priority attributes can't undo higher-priority attributes when attempting to set both with one iproute command. For example, this command will fail:

    ip link add bond1 type bond mode active-backup miimon 10 arp_interval 10

because we're trying to create a new bond with incompatible miimon and ARP interval attributes. However, if attributes are applied one at a time, previously applied attributes can be overridden:

    ip link add bond1 type bond mode active-backup miimon 10
    ip link set dev bond1 type bond arp_interval 10

These two commands succeed. The bond is first created to use miimon; next, the bond is converted to use ARP interval, which undoes miimon.

v1: Following Jiri Pirko's lead, add more bonding netlink attributes. Sending matching iproute2 patch separately. sysfs access to attributes is retained.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add arp_all_targets netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_ARP_ALL_TARGETS to allow get/set of bonding parameter arp_all_targets via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add arp_validate netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_ARP_VALIDATE to allow get/set of bonding parameter arp_validate via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add arp_ip_target netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_ARP_IP_TARGET to allow get/set of bonding parameter arp_ip_target via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add arp_interval netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_ARP_INTERVAL to allow get/set of bonding parameter arp_interval via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add use_carrier netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_USE_CARRIER to allow get/set of bonding parameter use_carrier via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add downdelay netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_DOWNDELAY to allow get/set of bonding parameter downdelay via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add updelay netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_UPDELAY to allow get/set of bonding parameter updelay via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  bonding: add miimon netlink support  (sfeldma@cumulusnetworks.com)
Add IFLA_BOND_MIIMON to allow get/set of bonding parameter miimon via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14  packet: fix using smp_processor_id() in preemptible code  (Li Zhong)
This patch fixes the following warning by replacing smp_processor_id() with raw_smp_processor_id():

    [ 11.120893] BUG: using smp_processor_id() in preemptible [00000000] code: arping/3510
    [ 11.120913] caller is .packet_sendmsg+0xc14/0xe68
    [ 11.120920] CPU: 13 PID: 3510 Comm: arping Not tainted 3.13.0-rc3-next-20131211-dirty #1
    [ 11.120926] Call Trace:
    [ 11.120932] [c0000001f803f6f0] [c0000000000138dc] .show_stack+0x110/0x25c (unreliable)
    [ 11.120942] [c0000001f803f7e0] [c00000000083dd24] .dump_stack+0xa0/0x37c
    [ 11.120951] [c0000001f803f870] [c000000000493fd4] .debug_smp_processor_id+0xfc/0x12c
    [ 11.120959] [c0000001f803f900] [c0000000007eba78] .packet_sendmsg+0xc14/0xe68
    [ 11.120968] [c0000001f803fa80] [c000000000700968] .sock_sendmsg+0xa0/0xe0
    [ 11.120975] [c0000001f803fbf0] [c0000000007014d8] .SyS_sendto+0x100/0x148
    [ 11.120983] [c0000001f803fd60] [c0000000006fff10] .SyS_socketcall+0x1c4/0x2e8
    [ 11.120990] [c0000001f803fe30] [c00000000000a1e4] syscall_exit+0x0/0x9c

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
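[Editor's note] A brief, hedged illustration of the general pattern. smp_processor_id() in preemptible context triggers the splat above because the task may migrate between CPUs; when the CPU number is only used as a hint, raw_smp_processor_id() is acceptable. The exact af_packet call site is assumed here.

    /* Sketch: the CPU number is only a hint for picking a TX queue, so a
     * stale value after migration is harmless and the raw accessor can
     * be used without disabling preemption. */
    static u16 packet_pick_tx_queue_sketch(struct net_device *dev)
    {
            return (u16)raw_smp_processor_id() % dev->real_num_tx_queues;
    }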
2013-12-14  netconf: add proxy-arp support  (stephen hemminger)
Add support to netconf to show changes to proxy-arp status on a per interface basis via netlink in a manner similar to forwarding and reverse path state. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12  sfc: Remove dependency of PTP on having a dedicated channel  (Ben Hutchings)
We need a dedicated channel on Siena to ensure we can match up the separate RX and timestamp events for each PTP packet. We won't do this for EF10 as timestamps are delivered inline. Pass a channel index of 0 to MC_CMD_PTP_OP_ENABLE when there is no dedicated channel. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Split PTP multicast filter insertion/removal out of efx_ptp_{start,stop}()  (Ben Hutchings)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Return EBUSY for filter insertion on EF10, matching Falcon/Siena  (Ben Hutchings)
The MC firmware will return error MC_CMD_ERR_ENOSPC if filter insertion fails due to lack of resources. The net driver's filter implementation for Falcon-architecture returns EBUSY. They should behave consistently, so for EF10 change ENOSPC to EBUSY. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Expose NVRAM_PARTITION_TYPE_LICENSE on EF10  (Ben Hutchings)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Fold efx_flush_all() into efx_stop_port() and update comments  (Ben Hutchings)
efx_flush_all() is a really misleading name - it has nothing to do with e.g. flushing DMA queues. Since it's called immediately after efx_stop_port() and is highly dependent on what that does, combine the two functions. Update comments to explain what this is doing a little better. Also update a related and erroneous comment in efx_start_port().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Map MCDI error MC_CMD_ERR_ENOTSUP to Linux EOPNOTSUPP  (Ben Hutchings)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Log all unexpected MCDI errors  (Edward Cree)
Split each of efx_mcdi_rpc, efx_mcdi_rpc_finish, and efx_mcdi_rpc_async into a normal and a _quiet version; the former logs MCDI errors with netif_err (including the raw MCDI error code), and the latter never logs them at all. Various callers are changed: any for which some errors are expected (but others are not) call the _quiet version and then, if necessary, log the MCDI error themselves. That logging is done by the new efx_mcdi_display_error.

Callers of the efx_mcdi_rpc*_quiet functions that may want to log the error need to ensure their outbuf is big enough to hold an MCDI error; to this end, they now use MCDI_DECLARE_BUF_OUT_OR_ERR, which always allocates at least 8 bytes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Add new sensor names  (Ben Hutchings)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Revise sensor names to be more understandable and consistent  (Edward Cree)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Report units in sensor warnings  (Edward Cree)
Add units to the "Sensor reports condition X for raw value Y" messages. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Correct RX dropped count for drops while interface is down  (Jon Cooper)
We don't directly control RX ingress on Siena or any later controllers, and so we cannot prevent packets from entering the RX datapath while the RX queues are not set up. This results in the hardware incrementing RX_NODESC_DROP_CNT, but it's not an error and we should not include it in error stats. When bringing an interface up or down, pull (or wait for) stats and count the number of packets that were dropped while the interface was down. Subtract this from the reported RX dropped count. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Make initial fill of RX descriptors synchronous  (Jon Cooper)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Tighten the check for RX merged completion events  (Ben Hutchings)
The addition of RX event merging support means we don't reliably detect dropped RX events now. Currently we will only detect them if the previous event for the RX queue had the CONT bit set. Only accept RX completion events as merged if the GET_CAPABILITIES_OUT_RX_BATCHING bit is set in datapath_caps (which it won't be for the low-latency datapath) and the CONT bit is not set on the event. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  sfc: Add MC BISTs to ethtool offline self test on EF10  (Jon Cooper)
To run BISTs, the MC goes down into a special mode where it only responds to MCDI from the testing PF, and TX, RX and event queues are torn down. Other PFs get a message as it goes down telling them it's going down. When the other PFs get this message, they check the soft status register to tell when the MC has rebooted after BIST mode, so they can start recovery.

[bwh: Convert the test result to 1 or -1 as for earlier NICs]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-12-12  ipv6: fix incorrect type in declaration  (Florent Fourcot)
Introduced by commit 1397ed35f22d7c30d0b89ba74b6b7829220dfcfd ("ipv6: add flowinfo for tcp6 pkt_options for all cases").

Reported-by: kbuild test robot <fengguang.wu@intel.com>

V2: fix the title, add an empty line after the declaration (feedback from Sergei Shtylyov)

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12  net: eth: 8390: remove section warning in etherh.c  (Olof Johansson)
Commit c45f812f0280 ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature') ended up moving the printout of version[] from something that will be compiled out due to defines, to something that is now evaluated at runtime. That means that what always used to be an access to an __initdata string from non-__init code started showing up as a section mismatch when it didn't before. All other 8390 versions skip __initdata on the version string, and starting to annotate the whole chain of callers with __init seems like more churn than it's worth on this driver, so remove it from etherh.c as well. Fixes: c45f812f0280 ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature') Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12  net-gro: Prepare GRO stack for the upcoming tunneling support  (Jerry Chu)
This patch modifies the GRO stack to avoid the use of "network_header" and associated macros like ip_hdr() and ipv6_hdr() in order to allow an arbitrary number of IP hdrs (v4 or v6) to be used in the encapsulation chain. This lays the foundation for various IP tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.

With this patch, the GRO stack traversal is now mostly based on skb_gro_offset rather than special hdr offsets saved in the skb (e.g., skb->network_header). As a result, all but the top layer (i.e., the transport layer) must have hdrs of the same length in order for a pkt to be considered for aggregation. Therefore when adding a new encap layer (e.g., for tunneling), one must check and skip flows (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a different hdr length.

Note that unlike the network header, the transport header can and will continue to be set by the GRO code since there will be at most one "transport layer" in the encap chain.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
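[Editor's note] A hedged sketch of how a layer's gro_complete() callback operates under this offset-based scheme: it receives the offset of its own header and derives the next layer's offset from its own header length, instead of relying on skb->network_header. This is reconstructed for illustration, not quoted from the patch.

    static int inet_gro_complete_sketch(struct sk_buff *skb, int nhoff)
    {
            struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
            __be16 newlen = htons(skb->len - nhoff);
            const struct net_offload *ops;
            int err = -ENOSYS;

            /* fix up the outer header for the merged super-packet */
            csum_replace2(&iph->check, iph->tot_len, newlen);
            iph->tot_len = newlen;

            rcu_read_lock();
            ops = rcu_dereference(inet_offloads[iph->protocol]);
            if (ops && ops->callbacks.gro_complete)
                    /* next layer's header starts right after this one */
                    err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
            rcu_read_unlock();

            return err;
    }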
2013-12-12  Merge branch 'macvtap_capture'  (David S. Miller)
Vlad Yasevich says:

====================
Add packet capture support on macvtap device

Change from RFC:
 - moved to the rx_handler approach.

This series adds support for packet capturing on the macvtap device. The initial approach was to simply export the capturing code as a function from the core network. While simple, it was not a very architecturally clean approach. The new approach is to provide macvtap with its own rx_handler, which is attached to the macvtap device itself. Macvlan will simply requeue the packet with an updated skb->dev; the macvlan layer already does this for macvlan devices. So now macvtap and macvlan have almost exactly the same input path.

I've toyed with short-circuiting the input path for macvtap by returning RX_HANDLER_ANOTHER, but that just made the code more complicated and didn't provide any measurable gain (at least according to netperf and perf runs on the host).

To see if there was a performance regression, I ran 1, 2 and 4 netperf STREAM and MAERTS tests against the VM from both a remote host and another guest on the same system. The command run was:

    netperf -H $host -t $test -l 20 -i 10 -I 95 -c -C

The numbers I was getting with the new code were consistently very slightly (1-2%) better than with the old code. I don't consider this an improvement, but it's not a regression! :) Running 'perf record' on the host didn't show any new hot spots and CPU utilization stayed about the same. This was better than I expected from simply looking at the code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12  macvlan: Remove custom receive and forward handlers  (Vlad Yasevich)
Now that macvlan and macvtap use the same receive and forward handlers, we can remove them completely and use netif_rx and dev_forward_skb() directly.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12  macvtap: Add support of packet capture on macvtap device.  (Vlad Yasevich)
The macvtap device currently does not allow a user to capture traffic on it, because it steals packets from the network stack before skb->dev is set correctly on the receive side, and because it uses the macvlan transmit path directly on the send side. As a result, we never get a chance to give traffic to the taps while the correct device is set in the skb.

This patch makes the macvtap device behave almost exactly like macvlan. On the send side, we switch to using dev_queue_xmit(). On the receive side, to deliver packets to macvtap, we now use netif_rx and dev_forward_skb just like macvlan. The only difference now is that macvtap has its own rx_handler, which is attached to the macvtap netdev. It is here that we steal the packet and hand it to the socket. As a result, we can now capture traffic on the macvtap device:

    tcpdump -i macvtap0

It also gives us the ability to add tc actions to the macvtap device and actually utilize different bandwidth management queues on output.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
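[Editor's note] A hedged sketch of the shape of such a per-device rx_handler. netdev_rx_handler_register() is the standard kernel API for attaching it; the queue lookup and wake-up details below are assumptions, not the macvtap code itself.

    static rx_handler_result_t macvtap_handle_frame_sketch(struct sk_buff **pskb)
    {
            struct sk_buff *skb = *pskb;
            /* rx_handler_data was passed at registration time; assumed to
             * lead to the macvtap queue/socket for skb->dev */
            struct macvtap_queue *q = rcu_dereference(skb->dev->rx_handler_data);

            if (!q) {
                    kfree_skb(skb);
                    return RX_HANDLER_CONSUMED;
            }

            /* By now skb->dev is the macvtap device, so taps (tcpdump -i
             * macvtap0) have already seen the frame; steal it here and
             * hand it to the attached socket. */
            skb_queue_tail(&q->sk.sk_receive_queue, skb);
            wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM);
            return RX_HANDLER_CONSUMED;
    }

    /* attached on link creation; must run under RTNL */
    static int macvtap_attach_handler_sketch(struct net_device *dev, void *priv)
    {
            return netdev_rx_handler_register(dev, macvtap_handle_frame_sketch, priv);
    }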
2013-12-11  Merge branch 'bpf'  (David S. Miller)
Daniel Borkmann says:

====================
bpf/filter updates

This set adds just two minimal helper tools that complement the already available bpf_jit_disasm and complete the BPF tooling, plus an extensive documentation update of filter.txt. Please see the individual descriptions for details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11  filter: doc: improve BPF documentation  (Daniel Borkmann)
This patch significantly updates the BPF documentation and describes its internal architecture, Linux extensions, and handling of the kernel's BPF and JIT engine, plus documents how development can be facilitated with the help of bpf_dbg, bpf_asm, bpf_jit_disasm. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
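[Editor's note] Since the documentation covers attaching classic BPF filters from user space, a small hedged example of the standard SO_ATTACH_FILTER usage; the particular filter (accept only ARP frames, EtherType 0x0806) is chosen here purely for illustration.

    /* Minimal user-space example: attach a classic BPF filter to a
     * packet socket so only ARP frames are delivered. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/filter.h>

    int main(void)
    {
            struct sock_filter code[] = {
                    { 0x28, 0, 0, 0x0000000c },   /* ldh [12]          */
                    { 0x15, 0, 1, 0x00000806 },   /* jeq #0x806, L1, L2 */
                    { 0x06, 0, 0, 0x0000ffff },   /* L1: ret #0xffff   */
                    { 0x06, 0, 0, 0x00000000 },   /* L2: ret #0        */
            };
            struct sock_fprog bpf = {
                    .len = sizeof(code) / sizeof(code[0]),
                    .filter = code,
            };
            int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

            if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                                     &bpf, sizeof(bpf)) < 0) {
                    perror("socket/setsockopt");
                    return 1;
            }
            /* read filtered frames from fd ... */
            return 0;
    }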
2013-12-11  filter: bpf_asm: add minimal bpf asm tool  (Daniel Borkmann)
There are a couple of valid use cases for a minimal, low-level BPF asm tool: for example, using/linking to libpcap is not an option, the required BPF filters use Linux extensions that are not supported by libpcap's compiler, a filter might be more complex and not cleanly implementable with libpcap's compiler, particular filter code should be optimized differently than libpcap's internal BPF compiler does, or security audits of emitted BPF JIT code for a prepared set of BPF instructions resp. BPF JIT compiler development in general.

In such cases, writing a filter in low-level syntax can be a good alternative; for example, xt_bpf and cls_bpf users might have requirements that result in more complex filter code, or code that cannot be expressed with libpcap (e.g. different return codes in cls_bpf for flowids on various BPF code paths). Moreover, BPF JIT implementors may wish to manually write test cases in order to verify the resulting JIT image, and thus need low-level access to BPF code generation as well.

Therefore, complete the available toolchain for BPF with this small bpf_asm helper tool in the tools/net/ directory. These 3 complementary minimal helper tools round up and facilitate BPF development.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11  filter: bpf_dbg: add minimal bpf debugger  (Daniel Borkmann)
This patch adds a minimal BPF debugger that "emulates" the kernel's BPF engine (w/o extensions) and allows for single stepping (forwards and backwards through BPF code) or running with >=1 breakpoints through selected or all packets from a pcap file with a provided user filter, in order to facilitate verification of a BPF program. When a breakpoint is hit, it dumps all register contents, decoded instructions and, in the case of branches, both decoded branch targets, as well as other useful information.

Having this facility is particularly useful for verifying BPF programs against given test traffic *before* attaching them to a live system. With the general availability of cls_bpf, xt_bpf, socket filters, the team driver and e.g. PTP code, BPF users quite often run a single, more complex BPF program. The main reason for a more complex BPF program is to optimize execution time for making a verdict when multiple simple BPF programs are combined into one, in order to avoid parsing the same headers multiple times. In particular, this can be the case for cls_bpf, which can have various return paths for encoding flowids, and for xt_bpf, which has to come to a firewall verdict. As this can result in more complex and harder to debug code, it is very useful to have this minimal tool for testing purposes.

It can also help BPF JIT developers, as filters are "test attached" to the kernel on a temporary socket, thus triggering a JIT image dump when enabled. The tool uses an interactive libreadline shell with auto-completion and history support.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>