lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2014-04-14	xen-netback: disable rogue vif in kthread context	Wei Liu
	[ Upstream commit e9d8b2c2968499c1f96563e6522c56958d5a1d0d ] When netback discovers frontend is sending malformed packet it will disables the interface which serves that frontend. However disabling a network interface involving taking a mutex which cannot be done in softirq context, so we need to defer this process to kthread context. This patch does the following: 1. introduce a flag to indicate the interface is disabled. 2. check that flag in TX path, don't do any work if it's true. 3. check that flag in RX path, turn off that interface if it's true. The reason to disable it in RX path is because RX uses kthread. After this change the behavior of netback is still consistent -- it won't do any TX work for a rogue frontend, and the interface will be eventually turned off. Also change a "continue" to "break" after xenvif_fatal_tx_err, as it doesn't make sense to continue processing packets if frontend is rogue. This is a fix for XSA-90. Reported-by: Török Edwin <edwin@etorok.net> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14	xen-netback: BUG_ON in xenvif_rx_action() not catching overflow	Paul Durrant
	[ Upstream commit 1425c7a4e8d3d2eebf308bcbdc3fa3c1247686b4 ] The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption that meta_slots_used == ring slots used. This is not necessarily the case for GSO packets, because the non-prefix GSO protocol consumes one more ring slot than meta-slot for the 'extra_info'. This patch changes the test to actually check ring slots. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14	xen-netback: worse-case estimate in xenvif_rx_action is underestimating	Paul Durrant
	[ Upstream commit a02eb4732cf975d7fc71b6d1a71c058c9988b949 ] The worse-case estimate for skb ring slot usage in xenvif_rx_action() fails to take fragment page_offset into account. The page_offset does, however, affect the number of times the fragmentation code calls start_new_rx_buffer() (i.e. consume another slot) and the worse-case should assume that will always return true. This patch adds the page_offset into the DIV_ROUND_UP for each frag. Unfortunately some frontends aggressively limit the number of requests they post into the shared ring so to avoid an estimate that is 'too' pessimal it is capped at MAX_SKB_FRAGS. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14	xen-netback: remove pointless clause from if statement	Paul Durrant
	[ Upstream commit 0576eddf24df716d8570ef8ca11452a9f98eaab2 ] This patch removes a test in start_new_rx_buffer() that checks whether a copy operation is less than MAX_BUFFER_OFFSET in length, since MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Reported-By: Sander Eikelenboom <linux@eikelenboom.it> Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-10	Xen-netback: Fix issue caused by using gso_type wrongly	Annie Li
	Current netback uses gso_type to check whether the skb contains gso offload, and this is wrong. Gso_size is the right one to check gso existence, and gso_type is only used to check gso type. Some skbs contains nonzero gso_type and zero gso_size, current netback would treat these skbs as gso and create wrong response for this. This also causes ssh failure to domu from other server. V2: use skb_is_gso function as Paul Durrant suggested Signed-off-by: Annie Li <annie.li@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-05	xen-netback: Fix Rx stall due to race condition	Zoltan Kiss
	The recent patch to fix receive side flow control (11b57f90257c1d6a91cee720151b69e0c2020cf6: xen-netback: stop vif thread spinning if frontend is unresponsive) solved the spinning thread problem, however caused an another one. The receive side can stall, if: - [THREAD] xenvif_rx_action sets rx_queue_stopped to true - [INTERRUPT] interrupt happens, and sets rx_event to true - [THREAD] then xenvif_kthread sets rx_event to false - [THREAD] rx_work_todo doesn't return true anymore Also, if interrupt sent but there is still no room in the ring, it take quite a long time until xenvif_rx_action realize it. This patch ditch that two variable, and rework rx_work_todo. If the thread finds it can't fit more skb's into the ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's kept as 0. Then rx_work_todo will check if: - there is something to send to the ring (like before) - there is space for the topmost packet in the queue I think that's more natural and optimal thing to test than two bool which are set somewhere else. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14	xen-netback: use new skb_checksum_setup function	Paul Durrant
	Use skb_checksum_setup to set up partial checksum offsets rather then a private implementation. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09	xen-netback: stop vif thread spinning if frontend is unresponsive	Paul Durrant
	The recent patch to improve guest receive side flow control (ca2f09f2) had a slight flaw in the wait condition for the vif thread in that any remaining skbs in the guest receive side netback internal queue would prevent the thread from sleeping. An unresponsive frontend can lead to a permanently non-empty internal queue and thus the thread will spin. In this case the thread should really sleep until the frontend becomes responsive again. This patch adds an extra flag to the vif which is set if the shared ring is full and cleared when skbs are drained into the shared ring. Thus, if the thread runs, finds the shared ring full and can make no progress the flag remains set. If the flag remains set then the thread will sleep, regardless of a non-empty queue, until the next event from the frontend. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29	xen-netback: fix guest-receive-side array sizes	Paul Durrant
	The sizes chosen for the metadata and grant_copy_op arrays on the guest receive size are wrong; - The meta array is needlessly twice the ring size, when we only ever consume a single array element per RX ring slot - The grant_copy_op array is way too small. It's sized based on a bogus assumption: that at most two copy ops will be used per ring slot. This may have been true at some point in the past but it's clear from looking at start_new_rx_buffer() that a new ring slot is only consumed if a frag would overflow the current slot (plus some other conditions) so the actual limit is MAX_SKB_FRAGS grant_copy_ops per ring slot. This patch fixes those two sizing issues and, because grant_copy_ops grows so much, it pulls it out into a separate chunk of vmalloc()ed memory. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19	xen-netback: add gso_segs calculation	Paul Durrant
	netback already has code which parses IPv4 and v6 headers to set up checksum offsets and these are always applied to GSO packets being sent from frontends. It's therefore suboptimal that GSOs are being marked SKB_GSO_DODGY to defer the gso_segs calculation when netback already has all necessary information to hand to do the calculation. This patch adds that calculation. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19	xen-netback: fix some error return code	Wei Yongjun
	'err' is overwrited to 0 after maybe_pull_tail() call, so the error code was not set if skb_partial_csum_set() call failed. Fix to return error -EPROTO from those error handling case instead of 0. Fixes: d52eb0d46f36 ('xen-netback: make sure skb linear area covers checksum field') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17	xen-netback: fix fragments error handling in checksum_setup_ip()	Wei Yongjun
	Fix to return -EPROTO error if fragments detected in checksum_setup_ip(). Fixes: 1431fb31ecba ('xen-netback: fix fragment detection in checksum setup') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12	xen-netback: fix gso_prefix check	Paul Durrant
	There is a mistake in checking the gso_prefix mask when passing large packets to a guest. The wrong shift is applied to the bit - the raw skb gso type is used rather then the translated one. This leads to large packets being handed to the guest without the GSO metadata. This patch fixes the check. The mistake manifested as errors whilst running Microsoft HCK large packet offload tests between a pair of Windows 8 VMs. I have verified this patch fixes those errors. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12	xen-netback: napi: don't prematurely request a tx event	Paul Durrant
	This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the former call has the side effect of advancing the ring event pointer and therefore inviting another interrupt from the frontend before the napi poll has actually finished, thereby defeating the point of napi. The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in xenvif_poll, the napi poll function, if the work done is less than the budget i.e. when actually transitioning back to interrupt mode. Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12	xen-netback: napi: fix abuse of budget	Paul Durrant
	netback seems to be somewhat confused about the napi budget parameter. The parameter is supposed to limit the number of skbs processed in each poll, but netback has this confused with grant operations. This patch fixes that, properly limiting the work done in each poll. Note that this limit makes sure we do not process any more data from the shared ring than we intend to pass back from the poll. This is important to prevent tx_queue potentially growing without bound. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11	xen-netback: make sure skb linear area covers checksum field	Paul Durrant
	skb_partial_csum_set requires that the linear area of the skb covers the checksum field. The checksum setup code in netback was only doing that pullup in the case when the pseudo header checksum was being recalculated though. This patch makes that pullup unconditional. (I pullup the whole transport header just for simplicity; the requirement is only for the check field but in the case of UDP this is the last field in the header and in the case of TCP it's the last but one). The lack of pullup manifested as failures running Microsoft HCK network tests on a pair of Windows 8 VMs and it has been verified that this patch fixes the problem. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09	xen-netback: improve guest-receive-side flow control	Paul Durrant
	The way that flow control works without this patch is that, in start_xmit() the code uses xenvif_count_skb_slots() to predict how many slots xenvif_gop_skb() will consume and then adds this to a 'req_cons_peek' counter which it then uses to determine if the shared ring has that amount of space available by checking whether 'req_prod' has passed that value. If the ring doesn't have space the tx queue is stopped. xenvif_gop_skb() will then consume slots and update 'req_cons' and issue responses, updating 'rsp_prod' as it goes. The frontend will consume those responses and post new requests, by updating req_prod. So, req_prod chases req_cons which chases rsp_prod, and can never exceed that value. Thus if xenvif_count_skb_slots() ever returns a number of slots greater than xenvif_gop_skb() uses, req_cons_peek will get to a value that req_prod cannot possibly achieve (since it's limited by the 'real' req_cons) and, if this happens enough times, req_cons_peek gets more than a ring size ahead of req_cons and the tx queue then remains stopped forever waiting for an unachievable amount of space to become available in the ring. Having two routines trying to calculate the same value is always going to be fragile, so this patch does away with that. All we essentially need to do is make sure that we have 'enough stuff' on our internal queue without letting it build up uncontrollably. So start_xmit() makes a cheap optimistic check of how much space is needed for an skb and only turns the queue off if that is unachievable. net_rx_action() is the place where we could do with an accurate predicition but, since that has proven tricky to calculate, a cheap worse-case (but not too bad) estimate is all we really need since the only thing we must prevent is xenvif_gop_skb() consuming more slots than are available. Without this patch I can trivially stall netback permanently by just doing a large guest to guest file copy between two Windows Server 2008R2 VMs on a single host. Patch tested with frontends in: - Windows Server 2008R2 - CentOS 6.0 - Debian Squeeze - Debian Wheezy - SLES11 Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Annie Li <annie.li@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-05	xen-netback: fix fragment detection in checksum setup	Paul Durrant
	The code to detect fragments in checksum_setup() was missing for IPv4 and too eager for IPv6. (It transpires that Windows seems to send IPv6 packets with a fragment header even if they are not a fragment - i.e. offset is zero, and M bit is not set). This patch also incorporates a fix to callers of maybe_pull_tail() where skb->network_header was being erroneously added to the length argument. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> cc: David Miller <davem@davemloft.net> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28	xen-netback: include definition of csum_ipv6_magic	Andy Whitcroft
	We are now using csum_ipv6_magic, include the appropriate header. Avoids the following error: drivers/net/xen-netback/netback.c:1313:4: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] tcph->check = ~csum_ipv6_magic(&ipv6h->saddr, Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29	xen-netback: use jiffies_64 value to calculate credit timeout	Wei Liu
	time_after_eq() only works if the delta is < MAX_ULONG/2. For a 32bit Dom0, if netfront sends packets at a very low rate, the time between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for timer_after_eq() will be incorrect. Credit will not be replenished and the guest may become unable to send packets (e.g., if prior to the long gap, all credit was exhausted). Use jiffies_64 variant to mitigate this problem for 32bit Dom0. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jason Luan <jianhai.luan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17	xen-netback: enable IPv6 TCP GSO to the guest	Paul Durrant
	This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate extra or prefix segments to pass the large packet to the frontend. New xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled to determine if the frontend is capable of handling such packets. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17	xen-netback: handle IPv6 TCP GSO packets from the guest	Paul Durrant
	This patch adds a xenstore feature flag, festure-gso-tcpv6, to advertise that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs if the frontend passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17	xen-netback: add support for IPv6 checksum offload from guest	Paul Durrant
	For performance of VM to VM traffic on a single host it is better to avoid calculation of TCP/UDP checksum in the sending frontend. To allow this this patch adds the code necessary to set up partial checksum for IPv6 packets and xenstore flag feature-ipv6-csum-offload to advertise that fact to frontends. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08	Revert "xen-netback: improve ring effeciency for guest RX"	Wei Liu
	This reverts commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c. The named changeset is causing problem. Let's aim to make this part less fragile before trying to improve things. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Annie Li <annie.li@oracle.com> Cc: Matt Wilson <msw@amazon.com> Cc: Xi Xiong <xixiong@amazon.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30	xen-netback: improve ring effeciency for guest RX	Wei Liu
	There was a bug that netback routines netbk/xenvif_skb_count_slots and netbk/xenvif_gop_frag_copy disagreed with each other, which caused netback to push wrong number of responses to netfront, which caused netfront to eventually crash. The bug was fixed in 6e43fc04a ("xen-netback: count number required slots for an skb more carefully"). Commit 6e43fc04a focused on backport-ability. The drawback with the existing packing scheme is that the ring is not used effeciently, as stated in 6e43fc04a. skb->data like: \| 1111\|222222222222\|3333 \| is arranged as: \|1111 \|222222222222\|3333 \| If we can do this: \|111122222222\|22223333 \| That would save one ring slot, which improves ring effeciency. This patch effectively reverts 6e43fc04a. That patch made count_slots agree with gop_frag_copy, while this patch goes the other way around -- make gop_frag_copy agree with count_slots. The end result is that they still agree with each other, and the ring is now arranged like: \|111122222222\|22223333 \| The patch that improves packing was first posted by Xi Xong and Matt Wilson. I only rebase it on top of net-next and rewrite commit message, so I retain all their SoBs. For more infomation about the original bug please refer to email listed below and commit message of 6e43fc04a. Original patch: http://lists.xen.org/archives/html/xen-devel/2013-07/msg00760.html Signed-off-by: Xi Xiong <xixiong@amazon.com> Reviewed-by: Matt Wilson <msw@amazon.com> [ msw: minor code cleanups, rewrote commit message, adjusted code to count RX slots instead of meta structures ] Signed-off-by: Matt Wilson <msw@amazon.com> Cc: Annie Li <annie.li@oracle.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> [ liuw: rebased on top of net-next tree, rewrote commit message, coding style cleanup. ] Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12	xen-netback: count number required slots for an skb more carefully	David Vrabel
	When a VM is providing an iSCSI target and the LUN is used by the backend domain, the generated skbs for direct I/O writes to the disk have large, multi-page skb->data but no frags. With some lengths and starting offsets, xen_netbk_count_skb_slots() would be one short because the simple calculation of DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the decisions made by start_new_rx_buffer() which does not guarantee responses are fully packed. For example, a skb with length < 2 pages but which spans 3 pages would be counted as requiring 2 slots but would actually use 3 slots. skb->data: \| 1111\|222222222222\|3333 \| Fully packed, this would need 2 slots: \|111122222222\|22223333 \| But because the 2nd page wholy fits into a slot it is not split across slots and goes into a slot of its own: \|1111 \|222222222222\|3333 \| Miscounting the number of slots means netback may push more responses than the number of available requests. This will cause the frontend to get very confused and report "Too many frags/slots". The frontend never recovers and will eventually BUG. Fix this by counting the number of required slots more carefully. In xen_netbk_count_skb_slots(), more closely follow the algorithm used by xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which is the dry-run equivalent of netbk_gop_frag_copy(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29	xen-netback: rename functions	Wei Liu
	As we move to 1:1 model and melt xen_netbk and xenvif together, it would be better to use single prefix for all functions in xen-netback. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29	xen-netback: switch to NAPI + kthread 1:1 model	Wei Liu
	This patch implements 1:1 model netback. NAPI and kthread are utilized to do the weight-lifting job: - NAPI is used for guest side TX (host side RX) - kthread is used for guest side RX (host side TX) Xenvif and xen_netbk are made into one structure to reduce code size. This model provides better scheduling fairness among vifs. It is also prerequisite for implementing multiqueue for Xen netback. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29	xen-netback: remove page tracking facility	Wei Liu
	The data flow from DomU to DomU on the same host in current copying scheme with tracking facility: copy DomU --------> Dom0 DomU \| ^ \|____________________________\| copy The page in Dom0 is a page with valid MFN. So we can always copy from page Dom0, thus removing the need for a tracking facility. copy copy DomU --------> Dom0 -------> DomU Simple iperf test shows no performance regression (obviously we copy twice either way): W/ tracking: ~5.3Gb/s W/o tracking: ~5.4Gb/s Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Matt Wilson <msw@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01	xen: Use more current logging styles	Joe Perches
	Instead of mixing printk and pr_<level> forms, just use pr_<level> Miscellaneous changes around these conversions: Add a missing newline to avoid message interleaving, coalesce formats, reflow modified lines to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-24	xen-netback: double free on unload	Dan Carpenter
	There is a typo here, "i" vs "j", so we would crash on module_exit(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Conflicts: drivers/net/wireless/ath/ath9k/Kconfig drivers/net/xen-netback/netback.c net/batman-adv/bat_iv_ogm.c net/wireless/nl80211.c The ath9k Kconfig conflict was a change of a Kconfig option name right next to the deletion of another option. The xen-netback conflict was overlapping changes involving the handling of the notify list in xen_netbk_rx_action(). Batman conflict resolution provided by Antonio Quartulli, basically keep everything in both conflict hunks. The nl80211 conflict is a little more involved. In 'net' we added a dynamic memory allocation to nl80211_dump_wiphy() to fix a race that Linus reported. Meanwhile in 'net-next' the handlers were converted to use pre and post doit handlers which use a flag to determine whether to hold the RTNL mutex around the operation. However, the dump handlers to not use this logic. Instead they have to explicitly do the locking. There were apparent bugs in the conversion of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in all the return paths, and it seems we very much should be doing so. So I fixed that whilst handling the overlapping changes. To simplify the initial returns, I take the RTNL mutex after we try to allocate 'tb'. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13	xen-netback: don't de-reference vif pointer after having called xenvif_put()	Jan Beulich
	When putting vif-s on the rx notify list, calling xenvif_put() must be deferred until after the removal from the list and the issuing of the notification, as both operations dereference the pointer. Changing this got me to notice that the "irq" variable was effectively unused (and was of too narrow type anyway). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23	xen-netback: split event channels support for Xen backend driver	Wei Liu
	Netback and netfront only use one event channel to do TX / RX notification, which may cause unnecessary wake-up of processing routines. This patch adds a new feature called feature-split-event-channels to netback, enabling it to handle TX and RX events separately. Netback will use tx_irq to notify guest for TX completion, rx_irq for RX notification. If frontend doesn't support this feature, tx_irq equals to rx_irq. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-17	xen-netback: enable user to unload netback module	Wei Liu
	This patch enables user to unload netback module, which is useful when user wants to upgrade to a newer netback module without rebooting the host. Netfront cannot handle netback removal event. As we cannot fix all possible frontends we add module get / put along with vif get / put to avoid mis-unloading of netback. To unload netback module, user needs to shutdown all VMs or migrate them to another host or unplug all vifs before hand. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>¬ Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-17	xen-netback: remove dead code	Wei Liu
	The array mmap_pages is never touched in the initialization function. This is remnant of mapping mechanism, which does not exist upstream. In current upstream code this array only tracks usage of pages inside netback. Those pages are allocated when contructing a SKB and passed directly to network subsystem. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-02	xen-netback: better names for thresholds	Wei Liu
	This patch only changes some names to avoid confusion. In this patch we have: MAX_SKB_SLOTS_DEFAULT -> FATAL_SKB_SLOTS_DEFAULT max_skb_slots -> fatal_skb_slots #define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN The fatal_skb_slots is the threshold to determine whether a packet is malicious. XEN_NETBK_LEGACY_SLOTS_MAX is the maximum slots a valid packet can have at this point. It is defined to be XEN_NETIF_NR_SLOTS_MIN because that's guaranteed to be supported by all backends. Suggested-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-02	xen-netback: avoid allocating variable size array on stack	Wei Liu
	Tune xen_netbk_count_requests to not touch working array beyond limit, so that we can make working array size constant. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-02	xen-netback: remove redundent parameter in netbk_count_requests	Wei Liu
	Tracking down from the caller, first_idx is always equal to vif->tx.req_cons. Remove it to avoid confusion. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22	xen-netback: don't disconnect frontend when seeing oversize packet	Wei Liu
	Some frontend drivers are sending packets > 64 KiB in length. This length overflows the length field in the first slot making the following slots have an invalid length. Turn this error back into a non-fatal error by dropping the packet. To avoid having the following slots having fatal errors, consume all slots in the packet. This does not reopen the security hole in XSA-39 as if the packet as an invalid number of slots it will still hit fatal error case. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22	xen-netback: coalesce slots in TX path and fix regressions	Wei Liu
	This patch tries to coalesce tx requests when constructing grant copy structures. It enables netback to deal with situation when frontend's MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS. With the help of coalescing, this patch tries to address two regressions avoid reopening the security hole in XSA-39. Regression 1. The reduction of the number of supported ring entries (slots) per packet (from 18 to 17). This regression has been around for some time but remains unnoticed until XSA-39 security fix. This is fixed by coalescing slots. Regression 2. The XSA-39 security fix turning "too many frags" errors from just dropping the packet to a fatal error and disabling the VIF. This is fixed by coalescing slots (handling 18 slots when backend's MAX_SKB_FRAGS is 17) which rules out false positive (using 18 slots is legit) and dropping packets using 19 to `max_skb_slots` slots. To avoid reopening security hole in XSA-39, frontend sending packet using more than max_skb_slots is considered malicious. The behavior of netback for packet is thus: 1-18 slots: valid 19-max_skb_slots slots: drop and respond with an error max_skb_slots+ slots: fatal error max_skb_slots is configurable by admin, default value is 20. Also change variable name from "frags" to "slots" in netbk_count_requests. Please note that RX path still has dependency on MAX_SKB_FRAGS. This will be fixed with separate patch. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12	xen-netback: switch to use skb_partial_csum_set()	Jason Wang
	Switch to use skb_partial_csum_set() to simplify the codes. Cc: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-10	xen-netback: fix sparse warning	stephen hemminger
	Fix warning about 0 used as NULL. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27	net: switch to use skb_probe_transport_header()	Jason Wang
	Switch to use the new help skb_probe_transport_header() to do the l4 header probing for untrusted sources. For packets with partial csum, the header should already been set by skb_partial_csum_set(). Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26	netback: set transport header before passing it to kernel	Jason Wang
	Currently, for the packets receives from netback, before doing header check, kernel just reset the transport header in netif_receive_skb() which pretends non l4 header. This is suboptimal for precise packet length estimation (introduced in 1def9238: net_sched: more precise pkt_len computation) which needs correct l4 header for gso packets. The patch just reuse the header probed by netback for partial checksum packets and tries to use skb_flow_dissect() for other cases, if both fail, just pretend no l4 header. Cc: Eric Dumazet <edumazet@google.com> Cc: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25	xen-netback: remove skb in xen_netbk_alloc_page	Wei Liu
	This variable is never used. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19	Revert "xen: netback: remove redundant xenvif_put"	David S. Miller
	This reverts commit d37204566a61d5116d385ae909db8e14a734b30f. This change is incorrect, as per Jan Beulich: ==================== But this is wrong from all we can tell, we discussed this before (Wei pointed to the discussion in an earlier reply). The core of it is that the put here parallels the one in netbk_tx_err(), and the one in xenvif_carrier_off() matches the get from xenvif_connect() (which normally would be done on the path coming through xenvif_disconnect()). ==================== And a previous discussion of this issue is at: http://marc.info/?l=xen-devel&m=136084174026977&w=2 Signed-off-by: David S. Miller <davem@davemloft.net>