From dca145ffaa8d39ea1904491ac81b92b7049372c0 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 27 Oct 2014 21:45:24 -0700
Subject: tcp: allow for bigger reordering level

While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.

Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.

Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.

[1] Aggregation of links, per packet load balancing, fabrics not doing
 deep packet inspections, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/bonding.txt   |  7 ++-----
 Documentation/networking/ip-sysctl.txt | 10 +++++++++-
 2 files changed, 11 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index eeb5b2e97bed..83bf4986baea 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -2230,11 +2230,8 @@ balance-rr: This mode is the only mode that will permit a single
 
 	It is possible to adjust TCP/IP's congestion limits by
 	altering the net.ipv4.tcp_reordering sysctl parameter.  The
-	usual default value is 3, and the maximum useful value is 127.
-	For a four interface balance-rr bond, expect that a single
-	TCP/IP stream will utilize no more than approximately 2.3
-	interface's worth of throughput, even after adjusting
-	tcp_reordering.
+	usual default value is 3. But keep in mind TCP stack is able
+	to automatically increase this when it detects reorders.
 
 	Note that the fraction of packets that will be delivered out of
 	order is highly variable, and is unlikely to be zero.  The level
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 0307e2875f21..9028b879a97b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -376,9 +376,17 @@ tcp_orphan_retries - INTEGER
 	may consume significant resources. Cf. tcp_max_orphans.
 
 tcp_reordering - INTEGER
-	Maximal reordering of packets in a TCP stream.
+	Initial reordering level of packets in a TCP stream.
+	TCP stack can then dynamically adjust flow reordering level
+	between this initial value and tcp_max_reordering
 	Default: 3
 
+tcp_max_reordering - INTEGER
+	Maximal reordering level of packets in a TCP stream.
+	300 is a fairly conservative value, but you might increase it
+	if paths are using per packet load balancing (like bonding rr mode)
+	Default: 300
+
 tcp_retrans_collapse - BOOLEAN
 	Bug-to-bug compatibility with some broken printers.
 	On retransmit try to send bigger packets to work around bugs in
-- 
cgit v1.2.3


From 7fd2561e4ebdd070ebba6d3326c4c5b13942323f Mon Sep 17 00:00:00 2001
From: Erik Kline <ek@google.com>
Date: Tue, 28 Oct 2014 18:11:14 +0900
Subject: net: ipv6: Add a sysctl to make optimistic addresses useful
 candidates

Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes.  Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.

This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).

The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s), but not in the multiple distinct
networks case.

For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.

Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
flag appropriately set).  A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.

Also: add an entry in ip-sysctl.txt for optimistic_dad.

Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt | 13 ++++++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 net/ipv6/addrconf.c                    | 46 ++++++++++++++++++++++++++++++++--
 4 files changed, 59 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 9028b879a97b..368e3251c553 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1460,6 +1460,19 @@ suppress_frag_ndisc - INTEGER
 	1 - (default) discard fragmented neighbor discovery packets
 	0 - allow fragmented neighbor discovery packets
 
+optimistic_dad - BOOLEAN
+	Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
+		0: disabled (default)
+		1: enabled
+
+use_optimistic - BOOLEAN
+	If enabled, do not classify optimistic addresses as deprecated during
+	source address selection.  Preferred addresses will still be chosen
+	before optimistic addresses, subject to other ranking in the source
+	address selection algorithm.
+		0: disabled (default)
+		1: enabled
+
 icmp/*:
 ratelimit - INTEGER
 	Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ff560537dd61..7121a2e97ce2 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -42,6 +42,7 @@ struct ipv6_devconf {
 	__s32		accept_ra_from_local;
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
 	__s32		optimistic_dad;
+	__s32		use_optimistic;
 #endif
 #ifdef CONFIG_IPV6_MROUTE
 	__s32		mc_forwarding;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index efa2666f4b8a..e863d088b9a5 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -164,6 +164,7 @@ enum {
 	DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL,
 	DEVCONF_SUPPRESS_FRAG_NDISC,
 	DEVCONF_ACCEPT_RA_FROM_LOCAL,
+	DEVCONF_USE_OPTIMISTIC,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 50b95b2db87c..8d12b7c7018f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1170,6 +1170,9 @@ enum {
 	IPV6_SADDR_RULE_PRIVACY,
 	IPV6_SADDR_RULE_ORCHID,
 	IPV6_SADDR_RULE_PREFIX,
+#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
+	IPV6_SADDR_RULE_NOT_OPTIMISTIC,
+#endif
 	IPV6_SADDR_RULE_MAX
 };
 
@@ -1197,6 +1200,15 @@ static inline int ipv6_saddr_preferred(int type)
 	return 0;
 }
 
+static inline bool ipv6_use_optimistic_addr(struct inet6_dev *idev)
+{
+#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
+	return idev && idev->cnf.optimistic_dad && idev->cnf.use_optimistic;
+#else
+	return false;
+#endif
+}
+
 static int ipv6_get_saddr_eval(struct net *net,
 			       struct ipv6_saddr_score *score,
 			       struct ipv6_saddr_dst *dst,
@@ -1257,10 +1269,16 @@ static int ipv6_get_saddr_eval(struct net *net,
 		score->scopedist = ret;
 		break;
 	case IPV6_SADDR_RULE_PREFERRED:
+	    {
 		/* Rule 3: Avoid deprecated and optimistic addresses */
+		u8 avoid = IFA_F_DEPRECATED;
+
+		if (!ipv6_use_optimistic_addr(score->ifa->idev))
+			avoid |= IFA_F_OPTIMISTIC;
 		ret = ipv6_saddr_preferred(score->addr_type) ||
-		      !(score->ifa->flags & (IFA_F_DEPRECATED|IFA_F_OPTIMISTIC));
+		      !(score->ifa->flags & avoid);
 		break;
+	    }
 #ifdef CONFIG_IPV6_MIP6
 	case IPV6_SADDR_RULE_HOA:
 	    {
@@ -1306,6 +1324,14 @@ static int ipv6_get_saddr_eval(struct net *net,
 			ret = score->ifa->prefix_len;
 		score->matchlen = ret;
 		break;
+#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
+	case IPV6_SADDR_RULE_NOT_OPTIMISTIC:
+		/* Optimistic addresses still have lower precedence than other
+		 * preferred addresses.
+		 */
+		ret = !(score->ifa->flags & IFA_F_OPTIMISTIC);
+		break;
+#endif
 	default:
 		ret = 0;
 	}
@@ -3222,8 +3248,15 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 	 * Optimistic nodes can start receiving
 	 * Frames right away
 	 */
-	if (ifp->flags & IFA_F_OPTIMISTIC)
+	if (ifp->flags & IFA_F_OPTIMISTIC) {
 		ip6_ins_rt(ifp->rt);
+		if (ipv6_use_optimistic_addr(idev)) {
+			/* Because optimistic nodes can use this address,
+			 * notify listeners. If DAD fails, RTM_DELADDR is sent.
+			 */
+			ipv6_ifa_notify(RTM_NEWADDR, ifp);
+		}
+	}
 
 	addrconf_dad_kick(ifp);
 out:
@@ -4330,6 +4363,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_ACCEPT_SOURCE_ROUTE] = cnf->accept_source_route;
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
 	array[DEVCONF_OPTIMISTIC_DAD] = cnf->optimistic_dad;
+	array[DEVCONF_USE_OPTIMISTIC] = cnf->use_optimistic;
 #endif
 #ifdef CONFIG_IPV6_MROUTE
 	array[DEVCONF_MC_FORWARDING] = cnf->mc_forwarding;
@@ -5155,6 +5189,14 @@ static struct addrconf_sysctl_table
 			.proc_handler   = proc_dointvec,
 
 		},
+		{
+			.procname       = "use_optimistic",
+			.data           = &ipv6_devconf.use_optimistic,
+			.maxlen         = sizeof(int),
+			.mode           = 0644,
+			.proc_handler   = proc_dointvec,
+
+		},
 #endif
 #ifdef CONFIG_IPV6_MROUTE
 		{
-- 
cgit v1.2.3


From 06745729c48e3677a64db63481184cc7aef1ea69 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Wed, 29 Oct 2014 10:45:02 -0700
Subject: dsa: Add new optional devicetree property to describe EEPROM size

The dsa core now supports reading from and writing to a switch EEPROM
if connected. Describe optional devicetree property indicating that
an EEPROM is present and its size.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index a62c889aafca..e124847443f8 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -10,7 +10,7 @@ Required properties:
 - dsa,ethernet		: Should be a phandle to a valid Ethernet device node
 - dsa,mii-bus		: Should be a phandle to a valid MDIO bus device node
 
-Optionnal properties:
+Optional properties:
 - interrupts		: property with a value describing the switch
 			  interrupt number (not supported by the driver)
 
@@ -23,6 +23,13 @@ Each of these switch child nodes should have the following required properties:
 - #address-cells	: Must be 1
 - #size-cells		: Must be 0
 
+A switch child node has the following optional property:
+
+- eeprom-length		: Set to the length of an EEPROM connected to the
+			  switch. Must be set if the switch can not detect
+			  the presence and/or size of a connected EEPROM,
+			  otherwise optional.
+
 A switch may have multiple "port" children nodes
 
 Each port children node must have the following mandatory properties:
-- 
cgit v1.2.3


From 9227dc5e579b6b2ef58ad0d3d0d23ddac77846ef Mon Sep 17 00:00:00 2001
From: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>
Date: Tue, 4 Nov 2014 16:06:56 -0600
Subject: amd-xgbe: Add support for per DMA channel interrupts

This patch provides support for interrupts that are generated by the
Tx/Rx DMA channel pairs of the device.  This allows for Tx and Rx
processing to run across multiple processsors.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/amd-xgbe.txt |  12 +-
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c           |  12 +-
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c           | 226 +++++++++++++++++----
 drivers/net/ethernet/amd/xgbe/xgbe-main.c          |  10 +-
 drivers/net/ethernet/amd/xgbe/xgbe.h               |  10 +-
 5 files changed, 219 insertions(+), 51 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/amd-xgbe.txt b/Documentation/devicetree/bindings/net/amd-xgbe.txt
index 41354f730beb..26efd526d16c 100644
--- a/Documentation/devicetree/bindings/net/amd-xgbe.txt
+++ b/Documentation/devicetree/bindings/net/amd-xgbe.txt
@@ -7,7 +7,10 @@ Required properties:
    - PCS registers
 - interrupt-parent: Should be the phandle for the interrupt controller
   that services interrupts for this device
-- interrupts: Should contain the amd-xgbe interrupt
+- interrupts: Should contain the amd-xgbe interrupt(s). The first interrupt
+  listed is required and is the general device interrupt. If the optional
+  amd,per-channel-interrupt property is specified, then one additional
+  interrupt for each DMA channel supported by the device should be specified
 - clocks:
    - DMA clock for the amd-xgbe device (used for calculating the
      correct Rx interrupt watchdog timer value on a DMA channel
@@ -23,6 +26,9 @@ Optional properties:
 - mac-address: mac address to be assigned to the device. Can be overridden
   by UEFI.
 - dma-coherent: Present if dma operations are coherent
+- amd,per-channel-interrupt: Indicates that Rx and Tx complete will generate
+  a unique interrupt for each DMA channel - this requires an additional
+  interrupt be configured for each DMA channel
 
 Example:
 	xgbe@e0700000 {
@@ -30,7 +36,9 @@ Example:
 		reg = <0 0xe0700000 0 0x80000>,
 		      <0 0xe0780000 0 0x80000>;
 		interrupt-parent = <&gic>;
-		interrupts = <0 325 4>;
+		interrupts = <0 325 4>,
+			     <0 326 1>, <0 327 1>, <0 328 1>, <0 329 1>;
+		amd,per-channel-interrupt;
 		clocks = <&xgbe_dma_clk>, <&xgbe_ptp_clk>;
 		clock-names = "dma_clk", "ptp_clk";
 		phy-handle = <&phy>;
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
index b3719f154637..ac3d319ffab3 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
@@ -481,17 +481,21 @@ static void xgbe_enable_dma_interrupts(struct xgbe_prv_data *pdata)
 
 		if (channel->tx_ring) {
 			/* Enable the following Tx interrupts
-			 *   TIE  - Transmit Interrupt Enable (unless polling)
+			 *   TIE  - Transmit Interrupt Enable (unless using
+			 *          per channel interrupts)
 			 */
-			XGMAC_SET_BITS(dma_ch_ier, DMA_CH_IER, TIE, 1);
+			if (!pdata->per_channel_irq)
+				XGMAC_SET_BITS(dma_ch_ier, DMA_CH_IER, TIE, 1);
 		}
 		if (channel->rx_ring) {
 			/* Enable following Rx interrupts
 			 *   RBUE - Receive Buffer Unavailable Enable
-			 *   RIE  - Receive Interrupt Enable
+			 *   RIE  - Receive Interrupt Enable (unless using
+			 *          per channel interrupts)
 			 */
 			XGMAC_SET_BITS(dma_ch_ier, DMA_CH_IER, RBUE, 1);
-			XGMAC_SET_BITS(dma_ch_ier, DMA_CH_IER, RIE, 1);
+			if (!pdata->per_channel_irq)
+				XGMAC_SET_BITS(dma_ch_ier, DMA_CH_IER, RIE, 1);
 		}
 
 		XGMAC_DMA_IOWRITE(channel, DMA_CH_IER, dma_ch_ier);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 07e2d216323a..c3533e104c61 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -114,6 +114,7 @@
  *     THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <linux/platform_device.h>
 #include <linux/spinlock.h>
 #include <linux/tcp.h>
 #include <linux/if_vlan.h>
@@ -126,7 +127,8 @@
 #include "xgbe.h"
 #include "xgbe-common.h"
 
-static int xgbe_poll(struct napi_struct *, int);
+static int xgbe_one_poll(struct napi_struct *, int);
+static int xgbe_all_poll(struct napi_struct *, int);
 static void xgbe_set_rx_mode(struct net_device *);
 
 static int xgbe_alloc_channels(struct xgbe_prv_data *pdata)
@@ -134,6 +136,7 @@ static int xgbe_alloc_channels(struct xgbe_prv_data *pdata)
 	struct xgbe_channel *channel_mem, *channel;
 	struct xgbe_ring *tx_ring, *rx_ring;
 	unsigned int count, i;
+	int ret = -ENOMEM;
 
 	count = max_t(unsigned int, pdata->tx_ring_count, pdata->rx_ring_count);
 
@@ -158,6 +161,19 @@ static int xgbe_alloc_channels(struct xgbe_prv_data *pdata)
 		channel->dma_regs = pdata->xgmac_regs + DMA_CH_BASE +
 				    (DMA_CH_INC * i);
 
+		if (pdata->per_channel_irq) {
+			/* Get the DMA interrupt (offset 1) */
+			ret = platform_get_irq(pdata->pdev, i + 1);
+			if (ret < 0) {
+				netdev_err(pdata->netdev,
+					   "platform_get_irq %u failed\n",
+					   i + 1);
+				goto err_irq;
+			}
+
+			channel->dma_irq = ret;
+		}
+
 		if (i < pdata->tx_ring_count) {
 			spin_lock_init(&tx_ring->lock);
 			channel->tx_ring = tx_ring++;
@@ -168,9 +184,9 @@ static int xgbe_alloc_channels(struct xgbe_prv_data *pdata)
 			channel->rx_ring = rx_ring++;
 		}
 
-		DBGPR("  %s - queue_index=%u, dma_regs=%p, tx=%p, rx=%p\n",
+		DBGPR("  %s: queue=%u, dma_regs=%p, dma_irq=%d, tx=%p, rx=%p\n",
 		      channel->name, channel->queue_index, channel->dma_regs,
-		      channel->tx_ring, channel->rx_ring);
+		      channel->dma_irq, channel->tx_ring, channel->rx_ring);
 	}
 
 	pdata->channel = channel_mem;
@@ -178,6 +194,9 @@ static int xgbe_alloc_channels(struct xgbe_prv_data *pdata)
 
 	return 0;
 
+err_irq:
+	kfree(rx_ring);
+
 err_rx_ring:
 	kfree(tx_ring);
 
@@ -185,9 +204,7 @@ err_tx_ring:
 	kfree(channel_mem);
 
 err_channel:
-	netdev_err(pdata->netdev, "channel allocation failed\n");
-
-	return -ENOMEM;
+	return ret;
 }
 
 static void xgbe_free_channels(struct xgbe_prv_data *pdata)
@@ -287,11 +304,7 @@ static irqreturn_t xgbe_isr(int irq, void *data)
 	if (!dma_isr)
 		goto isr_done;
 
-	DBGPR("-->xgbe_isr\n");
-
 	DBGPR("  DMA_ISR = %08x\n", dma_isr);
-	DBGPR("  DMA_DS0 = %08x\n", XGMAC_IOREAD(pdata, DMA_DSR0));
-	DBGPR("  DMA_DS1 = %08x\n", XGMAC_IOREAD(pdata, DMA_DSR1));
 
 	for (i = 0; i < pdata->channel_count; i++) {
 		if (!(dma_isr & (1 << i)))
@@ -302,6 +315,10 @@ static irqreturn_t xgbe_isr(int irq, void *data)
 		dma_ch_isr = XGMAC_DMA_IOREAD(channel, DMA_CH_SR);
 		DBGPR("  DMA_CH%u_ISR = %08x\n", i, dma_ch_isr);
 
+		/* If we get a TI or RI interrupt that means per channel DMA
+		 * interrupts are not enabled, so we use the private data napi
+		 * structure, not the per channel napi structure
+		 */
 		if (XGMAC_GET_BITS(dma_ch_isr, DMA_CH_SR, TI) ||
 		    XGMAC_GET_BITS(dma_ch_isr, DMA_CH_SR, RI)) {
 			if (napi_schedule_prep(&pdata->napi)) {
@@ -344,12 +361,28 @@ static irqreturn_t xgbe_isr(int irq, void *data)
 
 	DBGPR("  DMA_ISR = %08x\n", XGMAC_IOREAD(pdata, DMA_ISR));
 
-	DBGPR("<--xgbe_isr\n");
-
 isr_done:
 	return IRQ_HANDLED;
 }
 
+static irqreturn_t xgbe_dma_isr(int irq, void *data)
+{
+	struct xgbe_channel *channel = data;
+
+	/* Per channel DMA interrupts are enabled, so we use the per
+	 * channel napi structure and not the private data napi structure
+	 */
+	if (napi_schedule_prep(&channel->napi)) {
+		/* Disable Tx and Rx interrupts */
+		disable_irq(channel->dma_irq);
+
+		/* Turn on polling */
+		__napi_schedule(&channel->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
 static enum hrtimer_restart xgbe_tx_timer(struct hrtimer *timer)
 {
 	struct xgbe_channel *channel = container_of(timer,
@@ -357,18 +390,24 @@ static enum hrtimer_restart xgbe_tx_timer(struct hrtimer *timer)
 						    tx_timer);
 	struct xgbe_ring *ring = channel->tx_ring;
 	struct xgbe_prv_data *pdata = channel->pdata;
+	struct napi_struct *napi;
 	unsigned long flags;
 
 	DBGPR("-->xgbe_tx_timer\n");
 
+	napi = (pdata->per_channel_irq) ? &channel->napi : &pdata->napi;
+
 	spin_lock_irqsave(&ring->lock, flags);
 
-	if (napi_schedule_prep(&pdata->napi)) {
+	if (napi_schedule_prep(napi)) {
 		/* Disable Tx and Rx interrupts */
-		xgbe_disable_rx_tx_ints(pdata);
+		if (pdata->per_channel_irq)
+			disable_irq(channel->dma_irq);
+		else
+			xgbe_disable_rx_tx_ints(pdata);
 
 		/* Turn on polling */
-		__napi_schedule(&pdata->napi);
+		__napi_schedule(napi);
 	}
 
 	channel->tx_timer_active = 0;
@@ -504,18 +543,46 @@ void xgbe_get_all_hw_features(struct xgbe_prv_data *pdata)
 
 static void xgbe_napi_enable(struct xgbe_prv_data *pdata, unsigned int add)
 {
-	if (add)
-		netif_napi_add(pdata->netdev, &pdata->napi, xgbe_poll,
-			       NAPI_POLL_WEIGHT);
-	napi_enable(&pdata->napi);
+	struct xgbe_channel *channel;
+	unsigned int i;
+
+	if (pdata->per_channel_irq) {
+		channel = pdata->channel;
+		for (i = 0; i < pdata->channel_count; i++, channel++) {
+			if (add)
+				netif_napi_add(pdata->netdev, &channel->napi,
+					       xgbe_one_poll, NAPI_POLL_WEIGHT);
+
+			napi_enable(&channel->napi);
+		}
+	} else {
+		if (add)
+			netif_napi_add(pdata->netdev, &pdata->napi,
+				       xgbe_all_poll, NAPI_POLL_WEIGHT);
+
+		napi_enable(&pdata->napi);
+	}
 }
 
 static void xgbe_napi_disable(struct xgbe_prv_data *pdata, unsigned int del)
 {
-	napi_disable(&pdata->napi);
+	struct xgbe_channel *channel;
+	unsigned int i;
+
+	if (pdata->per_channel_irq) {
+		channel = pdata->channel;
+		for (i = 0; i < pdata->channel_count; i++, channel++) {
+			napi_disable(&channel->napi);
 
-	if (del)
-		netif_napi_del(&pdata->napi);
+			if (del)
+				netif_napi_del(&channel->napi);
+		}
+	} else {
+		napi_disable(&pdata->napi);
+
+		if (del)
+			netif_napi_del(&pdata->napi);
+	}
 }
 
 void xgbe_init_tx_coalesce(struct xgbe_prv_data *pdata)
@@ -828,7 +895,9 @@ static void xgbe_stop(struct xgbe_prv_data *pdata)
 
 static void xgbe_restart_dev(struct xgbe_prv_data *pdata, unsigned int reset)
 {
+	struct xgbe_channel *channel;
 	struct xgbe_hw_if *hw_if = &pdata->hw_if;
+	unsigned int i;
 
 	DBGPR("-->xgbe_restart_dev\n");
 
@@ -837,7 +906,12 @@ static void xgbe_restart_dev(struct xgbe_prv_data *pdata, unsigned int reset)
 		return;
 
 	xgbe_stop(pdata);
-	synchronize_irq(pdata->irq_number);
+	synchronize_irq(pdata->dev_irq);
+	if (pdata->per_channel_irq) {
+		channel = pdata->channel;
+		for (i = 0; i < pdata->channel_count; i++, channel++)
+			synchronize_irq(channel->dma_irq);
+	}
 
 	xgbe_free_tx_data(pdata);
 	xgbe_free_rx_data(pdata);
@@ -1165,6 +1239,9 @@ static int xgbe_open(struct net_device *netdev)
 	struct xgbe_prv_data *pdata = netdev_priv(netdev);
 	struct xgbe_hw_if *hw_if = &pdata->hw_if;
 	struct xgbe_desc_if *desc_if = &pdata->desc_if;
+	struct xgbe_channel *channel = NULL;
+	char dma_irq_name[IFNAMSIZ + 32];
+	unsigned int i = 0;
 	int ret;
 
 	DBGPR("-->xgbe_open\n");
@@ -1208,14 +1285,32 @@ static int xgbe_open(struct net_device *netdev)
 	INIT_WORK(&pdata->tx_tstamp_work, xgbe_tx_tstamp);
 
 	/* Request interrupts */
-	ret = devm_request_irq(pdata->dev, netdev->irq, xgbe_isr, 0,
+	ret = devm_request_irq(pdata->dev, pdata->dev_irq, xgbe_isr, 0,
 			       netdev->name, pdata);
 	if (ret) {
 		netdev_alert(netdev, "error requesting irq %d\n",
-			     pdata->irq_number);
+			     pdata->dev_irq);
 		goto err_rings;
 	}
-	pdata->irq_number = netdev->irq;
+
+	if (pdata->per_channel_irq) {
+		channel = pdata->channel;
+		for (i = 0; i < pdata->channel_count; i++, channel++) {
+			snprintf(dma_irq_name, sizeof(dma_irq_name) - 1,
+				 "%s-TxRx-%u", netdev_name(netdev),
+				 channel->queue_index);
+
+			ret = devm_request_irq(pdata->dev, channel->dma_irq,
+					       xgbe_dma_isr, 0, dma_irq_name,
+					       channel);
+			if (ret) {
+				netdev_alert(netdev,
+					     "error requesting irq %d\n",
+					     channel->dma_irq);
+				goto err_irq;
+			}
+		}
+	}
 
 	ret = xgbe_start(pdata);
 	if (ret)
@@ -1228,8 +1323,14 @@ static int xgbe_open(struct net_device *netdev)
 err_start:
 	hw_if->exit(pdata);
 
-	devm_free_irq(pdata->dev, pdata->irq_number, pdata);
-	pdata->irq_number = 0;
+err_irq:
+	if (pdata->per_channel_irq) {
+		/* Using an unsigned int, 'i' will go to UINT_MAX and exit */
+		for (i--, channel--; i < pdata->channel_count; i--, channel--)
+			devm_free_irq(pdata->dev, channel->dma_irq, channel);
+	}
+
+	devm_free_irq(pdata->dev, pdata->dev_irq, pdata);
 
 err_rings:
 	desc_if->free_ring_resources(pdata);
@@ -1254,6 +1355,8 @@ static int xgbe_close(struct net_device *netdev)
 	struct xgbe_prv_data *pdata = netdev_priv(netdev);
 	struct xgbe_hw_if *hw_if = &pdata->hw_if;
 	struct xgbe_desc_if *desc_if = &pdata->desc_if;
+	struct xgbe_channel *channel;
+	unsigned int i;
 
 	DBGPR("-->xgbe_close\n");
 
@@ -1269,10 +1372,12 @@ static int xgbe_close(struct net_device *netdev)
 	/* Free the channel and ring structures */
 	xgbe_free_channels(pdata);
 
-	/* Release the interrupt */
-	if (pdata->irq_number != 0) {
-		devm_free_irq(pdata->dev, pdata->irq_number, pdata);
-		pdata->irq_number = 0;
+	/* Release the interrupts */
+	devm_free_irq(pdata->dev, pdata->dev_irq, pdata);
+	if (pdata->per_channel_irq) {
+		channel = pdata->channel;
+		for (i = 0; i < pdata->channel_count; i++, channel++)
+			devm_free_irq(pdata->dev, channel->dma_irq, channel);
 	}
 
 	/* Disable the clocks */
@@ -1505,14 +1610,20 @@ static int xgbe_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto,
 static void xgbe_poll_controller(struct net_device *netdev)
 {
 	struct xgbe_prv_data *pdata = netdev_priv(netdev);
+	struct xgbe_channel *channel;
+	unsigned int i;
 
 	DBGPR("-->xgbe_poll_controller\n");
 
-	disable_irq(pdata->irq_number);
-
-	xgbe_isr(pdata->irq_number, pdata);
-
-	enable_irq(pdata->irq_number);
+	if (pdata->per_channel_irq) {
+		channel = pdata->channel;
+		for (i = 0; i < pdata->channel_count; i++, channel++)
+			xgbe_dma_isr(channel->dma_irq, channel);
+	} else {
+		disable_irq(pdata->dev_irq);
+		xgbe_isr(pdata->dev_irq, pdata);
+		enable_irq(pdata->dev_irq);
+	}
 
 	DBGPR("<--xgbe_poll_controller\n");
 }
@@ -1704,6 +1815,7 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget)
 	struct xgbe_ring_data *rdata;
 	struct xgbe_packet_data *packet;
 	struct net_device *netdev = pdata->netdev;
+	struct napi_struct *napi;
 	struct sk_buff *skb;
 	struct skb_shared_hwtstamps *hwtstamps;
 	unsigned int incomplete, error, context_next, context;
@@ -1717,6 +1829,8 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget)
 	if (!ring)
 		return 0;
 
+	napi = (pdata->per_channel_irq) ? &channel->napi : &pdata->napi;
+
 	rdata = XGBE_GET_DESC_DATA(ring, ring->cur);
 	packet = &ring->packet_data;
 	while (packet_count < budget) {
@@ -1849,10 +1963,10 @@ read_again:
 		skb->dev = netdev;
 		skb->protocol = eth_type_trans(skb, netdev);
 		skb_record_rx_queue(skb, channel->queue_index);
-		skb_mark_napi_id(skb, &pdata->napi);
+		skb_mark_napi_id(skb, napi);
 
 		netdev->last_rx = jiffies;
-		napi_gro_receive(&pdata->napi, skb);
+		napi_gro_receive(napi, skb);
 
 next_packet:
 		packet_count++;
@@ -1874,7 +1988,35 @@ next_packet:
 	return packet_count;
 }
 
-static int xgbe_poll(struct napi_struct *napi, int budget)
+static int xgbe_one_poll(struct napi_struct *napi, int budget)
+{
+	struct xgbe_channel *channel = container_of(napi, struct xgbe_channel,
+						    napi);
+	int processed = 0;
+
+	DBGPR("-->xgbe_one_poll: budget=%d\n", budget);
+
+	/* Cleanup Tx ring first */
+	xgbe_tx_poll(channel);
+
+	/* Process Rx ring next */
+	processed = xgbe_rx_poll(channel, budget);
+
+	/* If we processed everything, we are done */
+	if (processed < budget) {
+		/* Turn off polling */
+		napi_complete(napi);
+
+		/* Enable Tx and Rx interrupts */
+		enable_irq(channel->dma_irq);
+	}
+
+	DBGPR("<--xgbe_one_poll: received = %d\n", processed);
+
+	return processed;
+}
+
+static int xgbe_all_poll(struct napi_struct *napi, int budget)
 {
 	struct xgbe_prv_data *pdata = container_of(napi, struct xgbe_prv_data,
 						   napi);
@@ -1883,7 +2025,7 @@ static int xgbe_poll(struct napi_struct *napi, int budget)
 	int processed, last_processed;
 	unsigned int i;
 
-	DBGPR("-->xgbe_poll: budget=%d\n", budget);
+	DBGPR("-->xgbe_all_poll: budget=%d\n", budget);
 
 	processed = 0;
 	ring_budget = budget / pdata->rx_ring_count;
@@ -1911,7 +2053,7 @@ static int xgbe_poll(struct napi_struct *napi, int budget)
 		xgbe_enable_rx_tx_ints(pdata);
 	}
 
-	DBGPR("<--xgbe_poll: received = %d\n", processed);
+	DBGPR("<--xgbe_all_poll: received = %d\n", processed);
 
 	return processed;
 }
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index e5077fd5b012..cff9902d1456 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -264,12 +264,18 @@ static int xgbe_probe(struct platform_device *pdev)
 		pdata->awcache = XGBE_DMA_SYS_AWCACHE;
 	}
 
+	/* Check for per channel interrupt support */
+	if (of_property_read_bool(dev->of_node, XGBE_DMA_IRQS))
+		pdata->per_channel_irq = 1;
+
 	ret = platform_get_irq(pdev, 0);
 	if (ret < 0) {
-		dev_err(dev, "platform_get_irq failed\n");
+		dev_err(dev, "platform_get_irq 0 failed\n");
 		goto err_io;
 	}
-	netdev->irq = ret;
+	pdata->dev_irq = ret;
+
+	netdev->irq = pdata->dev_irq;
 	netdev->base_addr = (unsigned long)pdata->xgmac_regs;
 
 	/* Set all the function pointers */
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h
index 1480c9d41821..55c935f4884a 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe.h
@@ -173,6 +173,7 @@
 /* Device-tree clock names */
 #define XGBE_DMA_CLOCK		"dma_clk"
 #define XGBE_PTP_CLOCK		"ptp_clk"
+#define XGBE_DMA_IRQS		"amd,per-channel-interrupt"
 
 /* Timestamp support - values based on 50MHz PTP clock
  *   50MHz => 20 nsec
@@ -359,6 +360,12 @@ struct xgbe_channel {
 	unsigned int queue_index;
 	void __iomem *dma_regs;
 
+	/* Per channel interrupt irq number */
+	int dma_irq;
+
+	/* Netdev related settings */
+	struct napi_struct napi;
+
 	unsigned int saved_ier;
 
 	unsigned int tx_timer_active;
@@ -609,7 +616,8 @@ struct xgbe_prv_data {
 	/* XPCS indirect addressing mutex */
 	struct mutex xpcs_mutex;
 
-	int irq_number;
+	int dev_irq;
+	unsigned int per_channel_irq;
 
 	struct xgbe_hw_if hw_if;
 	struct xgbe_desc_if desc_if;
-- 
cgit v1.2.3


From ba7a46f16dd29f93303daeb1fee8af316c5a07f4 Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 11 Nov 2014 10:59:17 -0800
Subject: net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited

Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_<LEVEL> uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled.  Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it has no function.  The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/sysctl/net.txt | 12 ++++++++----
 include/net/sock.h           |  7 -------
 include/net/udplite.h        |  6 +++---
 net/core/sysctl_net_core.c   |  2 ++
 net/core/utils.c             |  3 ---
 net/ipv4/icmp.c              |  8 ++++----
 net/ipv4/inet_fragment.c     |  2 +-
 net/ipv4/ip_fragment.c       |  3 +--
 net/ipv4/tcp_input.c         |  8 ++++----
 net/ipv4/tcp_timer.c         | 18 ++++++++++--------
 net/ipv4/udp.c               | 30 +++++++++++++++---------------
 net/ipv6/addrconf.c          |  6 ++----
 net/ipv6/ah6.c               |  7 +++----
 net/ipv6/datagram.c          |  4 ++--
 net/ipv6/esp6.c              |  4 ++--
 net/ipv6/exthdrs.c           | 18 +++++++++---------
 net/ipv6/icmp.c              | 15 +++++++--------
 net/ipv6/mip6.c              | 11 ++++++-----
 net/ipv6/netfilter.c         |  2 +-
 net/ipv6/udp.c               | 31 +++++++++++++------------------
 net/phonet/af_phonet.c       |  9 +++++----
 net/phonet/pep-gprs.c        |  3 +--
 net/phonet/pep.c             | 12 ++++++------
 23 files changed, 105 insertions(+), 116 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 04892b821157..e26c607468a6 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -120,10 +120,14 @@ seconds.
 warnings
 --------
 
-This controls console messages from the networking stack that can occur because
-of problems on the network like duplicate address or bad checksums. Normally,
-this should be enabled, but if the problem persists the messages can be
-disabled.
+This sysctl is now unused.
+
+This was used to control console messages from the networking stack that
+occur because of problems on the network like duplicate address or bad
+checksums.
+
+These messages are now emitted at KERN_DEBUG and can generally be enabled
+and controlled by the dynamic_debug facility.
 
 netdev_budget
 -------------
diff --git a/include/net/sock.h b/include/net/sock.h
index 7789b59c0c40..83a669f83bae 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2288,13 +2288,6 @@ bool sk_ns_capable(const struct sock *sk,
 bool sk_capable(const struct sock *sk, int cap);
 bool sk_net_capable(const struct sock *sk, int cap);
 
-/*
- *	Enable debug/info messages
- */
-extern int net_msg_warn;
-#define LIMIT_NETDEBUG(fmt, args...) \
-	do { if (net_msg_warn && net_ratelimit()) printk(fmt,##args); } while(0)
-
 extern __u32 sysctl_wmem_max;
 extern __u32 sysctl_rmem_max;
 
diff --git a/include/net/udplite.h b/include/net/udplite.h
index 2caadabcd07b..9a28a5179400 100644
--- a/include/net/udplite.h
+++ b/include/net/udplite.h
@@ -40,7 +40,7 @@ static inline int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh)
          * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
          * with a zero checksum field are illegal.                            */
 	if (uh->check == 0) {
-		LIMIT_NETDEBUG(KERN_DEBUG "UDPLite: zeroed checksum field\n");
+		net_dbg_ratelimited("UDPLite: zeroed checksum field\n");
 		return 1;
 	}
 
@@ -52,8 +52,8 @@ static inline int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh)
 		/*
 		 * Coverage length violates RFC 3828: log and discard silently.
 		 */
-		LIMIT_NETDEBUG(KERN_DEBUG "UDPLite: bad csum coverage %d/%d\n",
-			       cscov, skb->len);
+		net_dbg_ratelimited("UDPLite: bad csum coverage %d/%d\n",
+				    cscov, skb->len);
 		return 1;
 
 	} else if (cscov < skb->len) {
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cf9cd13509a7..f93f092fe226 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -26,6 +26,8 @@ static int zero = 0;
 static int one = 1;
 static int ushort_max = USHRT_MAX;
 
+static int net_msg_warn;	/* Unused, but still a sysctl */
+
 #ifdef CONFIG_RPS
 static int rps_sock_flow_sysctl(struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp, loff_t *ppos)
diff --git a/net/core/utils.c b/net/core/utils.c
index efc76dd9dcd1..7b803884c162 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -33,9 +33,6 @@
 #include <asm/byteorder.h>
 #include <asm/uaccess.h>
 
-int net_msg_warn __read_mostly = 1;
-EXPORT_SYMBOL(net_msg_warn);
-
 DEFINE_RATELIMIT_STATE(net_ratelimit_state, 5 * HZ, 10);
 /*
  * All net warning printk()s should be guarded by this function.
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5882f584910e..36b7bfa609d6 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -784,8 +784,8 @@ static void icmp_unreach(struct sk_buff *skb)
 			 */
 			switch (net->ipv4.sysctl_ip_no_pmtu_disc) {
 			default:
-				LIMIT_NETDEBUG(KERN_INFO pr_fmt("%pI4: fragmentation needed and DF set\n"),
-					       &iph->daddr);
+				net_dbg_ratelimited("%pI4: fragmentation needed and DF set\n",
+						    &iph->daddr);
 				break;
 			case 2:
 				goto out;
@@ -798,8 +798,8 @@ static void icmp_unreach(struct sk_buff *skb)
 			}
 			break;
 		case ICMP_SR_FAILED:
-			LIMIT_NETDEBUG(KERN_INFO pr_fmt("%pI4: Source Route Failed\n"),
-				       &iph->daddr);
+			net_dbg_ratelimited("%pI4: Source Route Failed\n",
+					    &iph->daddr);
 			break;
 		default:
 			break;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 19419b60cb37..e7920352646a 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -458,6 +458,6 @@ void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
 		". Dropping fragment.\n";
 
 	if (PTR_ERR(q) == -ENOBUFS)
-		LIMIT_NETDEBUG(KERN_WARNING "%s%s", prefix, msg);
+		net_dbg_ratelimited("%s%s", prefix, msg);
 }
 EXPORT_SYMBOL(inet_frag_maybe_warn_overflow);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 4d964dadd655..e5b6d0ddcb58 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -618,8 +618,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 	return 0;
 
 out_nomem:
-	LIMIT_NETDEBUG(KERN_ERR pr_fmt("queue_glue: no memory for gluing queue %p\n"),
-		       qp);
+	net_dbg_ratelimited("queue_glue: no memory for gluing queue %p\n", qp);
 	err = -ENOMEM;
 	goto out_fail;
 out_oversize:
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5f979c7f5135..d91436ba17ea 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5854,12 +5854,12 @@ static inline void pr_drop_req(struct request_sock *req, __u16 port, int family)
 	struct inet_request_sock *ireq = inet_rsk(req);
 
 	if (family == AF_INET)
-		LIMIT_NETDEBUG(KERN_DEBUG pr_fmt("drop open request from %pI4/%u\n"),
-			       &ireq->ir_rmt_addr, port);
+		net_dbg_ratelimited("drop open request from %pI4/%u\n",
+				    &ireq->ir_rmt_addr, port);
 #if IS_ENABLED(CONFIG_IPV6)
 	else if (family == AF_INET6)
-		LIMIT_NETDEBUG(KERN_DEBUG pr_fmt("drop open request from %pI6/%u\n"),
-			       &ireq->ir_v6_rmt_addr, port);
+		net_dbg_ratelimited("drop open request from %pI6/%u\n",
+				    &ireq->ir_v6_rmt_addr, port);
 #endif
 }
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 9b21ae8b2e31..1829c7fbc77e 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -374,17 +374,19 @@ void tcp_retransmit_timer(struct sock *sk)
 		 */
 		struct inet_sock *inet = inet_sk(sk);
 		if (sk->sk_family == AF_INET) {
-			LIMIT_NETDEBUG(KERN_DEBUG pr_fmt("Peer %pI4:%u/%u unexpectedly shrunk window %u:%u (repaired)\n"),
-				       &inet->inet_daddr,
-				       ntohs(inet->inet_dport), inet->inet_num,
-				       tp->snd_una, tp->snd_nxt);
+			net_dbg_ratelimited("Peer %pI4:%u/%u unexpectedly shrunk window %u:%u (repaired)\n",
+					    &inet->inet_daddr,
+					    ntohs(inet->inet_dport),
+					    inet->inet_num,
+					    tp->snd_una, tp->snd_nxt);
 		}
 #if IS_ENABLED(CONFIG_IPV6)
 		else if (sk->sk_family == AF_INET6) {
-			LIMIT_NETDEBUG(KERN_DEBUG pr_fmt("Peer %pI6:%u/%u unexpectedly shrunk window %u:%u (repaired)\n"),
-				       &sk->sk_v6_daddr,
-				       ntohs(inet->inet_dport), inet->inet_num,
-				       tp->snd_una, tp->snd_nxt);
+			net_dbg_ratelimited("Peer %pI6:%u/%u unexpectedly shrunk window %u:%u (repaired)\n",
+					    &sk->sk_v6_daddr,
+					    ntohs(inet->inet_dport),
+					    inet->inet_num,
+					    tp->snd_una, tp->snd_nxt);
 		}
 #endif
 		if (tcp_time_stamp - tp->rcv_tstamp > TCP_RTO_MAX) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d13751685f44..1b6e9d5fadef 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1051,7 +1051,7 @@ back_from_confirm:
 		/* ... which is an evident application bug. --ANK */
 		release_sock(sk);
 
-		LIMIT_NETDEBUG(KERN_DEBUG pr_fmt("cork app bug 2\n"));
+		net_dbg_ratelimited("cork app bug 2\n");
 		err = -EINVAL;
 		goto out;
 	}
@@ -1133,7 +1133,7 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
 	if (unlikely(!up->pending)) {
 		release_sock(sk);
 
-		LIMIT_NETDEBUG(KERN_DEBUG pr_fmt("udp cork app bug 3\n"));
+		net_dbg_ratelimited("udp cork app bug 3\n");
 		return -EINVAL;
 	}
 
@@ -1547,8 +1547,8 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		 * provided by the application."
 		 */
 		if (up->pcrlen == 0) {          /* full coverage was set  */
-			LIMIT_NETDEBUG(KERN_WARNING "UDPLite: partial coverage %d while full coverage %d requested\n",
-				       UDP_SKB_CB(skb)->cscov, skb->len);
+			net_dbg_ratelimited("UDPLite: partial coverage %d while full coverage %d requested\n",
+					    UDP_SKB_CB(skb)->cscov, skb->len);
 			goto drop;
 		}
 		/* The next case involves violating the min. coverage requested
@@ -1558,8 +1558,8 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		 * Therefore the above ...()->partial_cov statement is essential.
 		 */
 		if (UDP_SKB_CB(skb)->cscov  <  up->pcrlen) {
-			LIMIT_NETDEBUG(KERN_WARNING "UDPLite: coverage %d too small, need min %d\n",
-				       UDP_SKB_CB(skb)->cscov, up->pcrlen);
+			net_dbg_ratelimited("UDPLite: coverage %d too small, need min %d\n",
+					    UDP_SKB_CB(skb)->cscov, up->pcrlen);
 			goto drop;
 		}
 	}
@@ -1828,11 +1828,11 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	return 0;
 
 short_packet:
-	LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: short packet: From %pI4:%u %d/%d to %pI4:%u\n",
-		       proto == IPPROTO_UDPLITE ? "Lite" : "",
-		       &saddr, ntohs(uh->source),
-		       ulen, skb->len,
-		       &daddr, ntohs(uh->dest));
+	net_dbg_ratelimited("UDP%s: short packet: From %pI4:%u %d/%d to %pI4:%u\n",
+			    proto == IPPROTO_UDPLITE ? "Lite" : "",
+			    &saddr, ntohs(uh->source),
+			    ulen, skb->len,
+			    &daddr, ntohs(uh->dest));
 	goto drop;
 
 csum_error:
@@ -1840,10 +1840,10 @@ csum_error:
 	 * RFC1122: OK.  Discards the bad packet silently (as far as
 	 * the network is concerned, anyway) as per 4.1.3.4 (MUST).
 	 */
-	LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: bad checksum. From %pI4:%u to %pI4:%u ulen %d\n",
-		       proto == IPPROTO_UDPLITE ? "Lite" : "",
-		       &saddr, ntohs(uh->source), &daddr, ntohs(uh->dest),
-		       ulen);
+	net_dbg_ratelimited("UDP%s: bad checksum. From %pI4:%u to %pI4:%u ulen %d\n",
+			    proto == IPPROTO_UDPLITE ? "Lite" : "",
+			    &saddr, ntohs(uh->source), &daddr, ntohs(uh->dest),
+			    ulen);
 	UDP_INC_STATS_BH(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
 drop:
 	UDP_INC_STATS_BH(net, UDP_MIB_INERRORS, proto == IPPROTO_UDPLITE);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 06e897832a7a..251fcb48b216 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1411,10 +1411,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 
 			if (unlikely(score->addr_type == IPV6_ADDR_ANY ||
 				     score->addr_type & IPV6_ADDR_MULTICAST)) {
-				LIMIT_NETDEBUG(KERN_DEBUG
-					       "ADDRCONF: unspecified / multicast address "
-					       "assigned as unicast address on %s",
-					       dev->name);
+				net_dbg_ratelimited("ADDRCONF: unspecified / multicast address assigned as unicast address on %s",
+						    dev->name);
 				continue;
 			}
 
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 6d16eb0e0c7f..8ab1989198f6 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -272,10 +272,9 @@ static int ipv6_clear_mutable_options(struct ipv6hdr *iph, int len, int dir)
 				ipv6_rearrange_destopt(iph, exthdr.opth);
 		case NEXTHDR_HOP:
 			if (!zero_out_mutable_opts(exthdr.opth)) {
-				LIMIT_NETDEBUG(
-					KERN_WARNING "overrun %sopts\n",
-					nexthdr == NEXTHDR_HOP ?
-						"hop" : "dest");
+				net_dbg_ratelimited("overrun %sopts\n",
+						    nexthdr == NEXTHDR_HOP ?
+						    "hop" : "dest");
 				return -EINVAL;
 			}
 			break;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 5c6996e44b14..cc1139687fd7 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -893,8 +893,8 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 			break;
 		    }
 		default:
-			LIMIT_NETDEBUG(KERN_DEBUG "invalid cmsg type: %d\n",
-				       cmsg->cmsg_type);
+			net_dbg_ratelimited("invalid cmsg type: %d\n",
+					    cmsg->cmsg_type);
 			err = -EINVAL;
 			goto exit_f;
 		}
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index d21d7b22eebc..d2c2d749b6db 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -286,8 +286,8 @@ static int esp_input_done2(struct sk_buff *skb, int err)
 	err = -EINVAL;
 	padlen = nexthdr[0];
 	if (padlen + 2 + alen >= elen) {
-		LIMIT_NETDEBUG(KERN_WARNING "ipsec esp packet is garbage "
-			       "padlen=%d, elen=%d\n", padlen + 2, elen - alen);
+		net_dbg_ratelimited("ipsec esp packet is garbage padlen=%d, elen=%d\n",
+				    padlen + 2, elen - alen);
 		goto out;
 	}
 
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 601d896f22d0..a7bbbe45570b 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -184,7 +184,7 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
 	int ret;
 
 	if (opt->dsthao) {
-		LIMIT_NETDEBUG(KERN_DEBUG "hao duplicated\n");
+		net_dbg_ratelimited("hao duplicated\n");
 		goto discard;
 	}
 	opt->dsthao = opt->dst1;
@@ -193,14 +193,14 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
 	hao = (struct ipv6_destopt_hao *)(skb_network_header(skb) + optoff);
 
 	if (hao->length != 16) {
-		LIMIT_NETDEBUG(
-			KERN_DEBUG "hao invalid option length = %d\n", hao->length);
+		net_dbg_ratelimited("hao invalid option length = %d\n",
+				    hao->length);
 		goto discard;
 	}
 
 	if (!(ipv6_addr_type(&hao->addr) & IPV6_ADDR_UNICAST)) {
-		LIMIT_NETDEBUG(
-			KERN_DEBUG "hao is not an unicast addr: %pI6\n", &hao->addr);
+		net_dbg_ratelimited("hao is not an unicast addr: %pI6\n",
+				    &hao->addr);
 		goto discard;
 	}
 
@@ -551,8 +551,8 @@ static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
 		memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
 		return true;
 	}
-	LIMIT_NETDEBUG(KERN_DEBUG "ipv6_hop_ra: wrong RA length %d\n",
-		       nh[optoff + 1]);
+	net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
+			    nh[optoff + 1]);
 	kfree_skb(skb);
 	return false;
 }
@@ -566,8 +566,8 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
 	u32 pkt_len;
 
 	if (nh[optoff + 1] != 4 || (optoff & 3) != 2) {
-		LIMIT_NETDEBUG(KERN_DEBUG "ipv6_hop_jumbo: wrong jumbo opt length/alignment %d\n",
-			       nh[optoff+1]);
+		net_dbg_ratelimited("ipv6_hop_jumbo: wrong jumbo opt length/alignment %d\n",
+				    nh[optoff+1]);
 		IP6_INC_STATS_BH(net, ipv6_skb_idev(skb),
 				 IPSTATS_MIB_INHDRERRORS);
 		goto drop;
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 62c1037d9e83..092934032077 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -338,7 +338,7 @@ static struct dst_entry *icmpv6_route_lookup(struct net *net,
 	 * anycast.
 	 */
 	if (((struct rt6_info *)dst)->rt6i_flags & RTF_ANYCAST) {
-		LIMIT_NETDEBUG(KERN_DEBUG "icmp6_send: acast source\n");
+		net_dbg_ratelimited("icmp6_send: acast source\n");
 		dst_release(dst);
 		return ERR_PTR(-EINVAL);
 	}
@@ -452,7 +452,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	 *	and anycast addresses will be checked later.
 	 */
 	if ((addr_type == IPV6_ADDR_ANY) || (addr_type & IPV6_ADDR_MULTICAST)) {
-		LIMIT_NETDEBUG(KERN_DEBUG "icmp6_send: addr_any/mcast source\n");
+		net_dbg_ratelimited("icmp6_send: addr_any/mcast source\n");
 		return;
 	}
 
@@ -460,7 +460,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	 *	Never answer to a ICMP packet.
 	 */
 	if (is_ineligible(skb)) {
-		LIMIT_NETDEBUG(KERN_DEBUG "icmp6_send: no reply to icmp error\n");
+		net_dbg_ratelimited("icmp6_send: no reply to icmp error\n");
 		return;
 	}
 
@@ -509,7 +509,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	len = skb->len - msg.offset;
 	len = min_t(unsigned int, len, IPV6_MIN_MTU - sizeof(struct ipv6hdr) - sizeof(struct icmp6hdr));
 	if (len < 0) {
-		LIMIT_NETDEBUG(KERN_DEBUG "icmp: len problem\n");
+		net_dbg_ratelimited("icmp: len problem\n");
 		goto out_dst_release;
 	}
 
@@ -706,9 +706,8 @@ static int icmpv6_rcv(struct sk_buff *skb)
 	daddr = &ipv6_hdr(skb)->daddr;
 
 	if (skb_checksum_validate(skb, IPPROTO_ICMPV6, ip6_compute_pseudo)) {
-		LIMIT_NETDEBUG(KERN_DEBUG
-			       "ICMPv6 checksum failed [%pI6c > %pI6c]\n",
-			       saddr, daddr);
+		net_dbg_ratelimited("ICMPv6 checksum failed [%pI6c > %pI6c]\n",
+				    saddr, daddr);
 		goto csum_error;
 	}
 
@@ -781,7 +780,7 @@ static int icmpv6_rcv(struct sk_buff *skb)
 		if (type & ICMPV6_INFOMSG_MASK)
 			break;
 
-		LIMIT_NETDEBUG(KERN_DEBUG "icmpv6: msg of unknown type\n");
+		net_dbg_ratelimited("icmpv6: msg of unknown type\n");
 
 		/*
 		 * error of unknown type.
diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
index f61429d391d3..b9779d441b12 100644
--- a/net/ipv6/mip6.c
+++ b/net/ipv6/mip6.c
@@ -97,16 +97,17 @@ static int mip6_mh_filter(struct sock *sk, struct sk_buff *skb)
 		return -1;
 
 	if (mh->ip6mh_hdrlen < mip6_mh_len(mh->ip6mh_type)) {
-		LIMIT_NETDEBUG(KERN_DEBUG "mip6: MH message too short: %d vs >=%d\n",
-			       mh->ip6mh_hdrlen, mip6_mh_len(mh->ip6mh_type));
+		net_dbg_ratelimited("mip6: MH message too short: %d vs >=%d\n",
+				    mh->ip6mh_hdrlen,
+				    mip6_mh_len(mh->ip6mh_type));
 		mip6_param_prob(skb, 0, offsetof(struct ip6_mh, ip6mh_hdrlen) +
 				skb_network_header_len(skb));
 		return -1;
 	}
 
 	if (mh->ip6mh_proto != IPPROTO_NONE) {
-		LIMIT_NETDEBUG(KERN_DEBUG "mip6: MH invalid payload proto = %d\n",
-			       mh->ip6mh_proto);
+		net_dbg_ratelimited("mip6: MH invalid payload proto = %d\n",
+				    mh->ip6mh_proto);
 		mip6_param_prob(skb, 0, offsetof(struct ip6_mh, ip6mh_proto) +
 				skb_network_header_len(skb));
 		return -1;
@@ -288,7 +289,7 @@ static int mip6_destopt_offset(struct xfrm_state *x, struct sk_buff *skb,
 			 * XXX: packet if HAO exists.
 			 */
 			if (ipv6_find_tlv(skb, offset, IPV6_TLV_HAO) >= 0) {
-				LIMIT_NETDEBUG(KERN_WARNING "mip6: hao exists already, override\n");
+				net_dbg_ratelimited("mip6: hao exists already, override\n");
 				return offset;
 			}
 
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index d38e6a8d8b9f..398377a9d018 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -36,7 +36,7 @@ int ip6_route_me_harder(struct sk_buff *skb)
 	err = dst->error;
 	if (err) {
 		IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
-		LIMIT_NETDEBUG(KERN_DEBUG "ip6_route_me_harder: No more route.\n");
+		net_dbg_ratelimited("ip6_route_me_harder: No more route\n");
 		dst_release(dst);
 		return err;
 	}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d1fe36274906..0ba3de4f2368 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -660,15 +660,13 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if ((is_udplite & UDPLITE_RECV_CC)  &&  UDP_SKB_CB(skb)->partial_cov) {
 
 		if (up->pcrlen == 0) {          /* full coverage was set  */
-			LIMIT_NETDEBUG(KERN_WARNING "UDPLITE6: partial coverage"
-				" %d while full coverage %d requested\n",
-				UDP_SKB_CB(skb)->cscov, skb->len);
+			net_dbg_ratelimited("UDPLITE6: partial coverage %d while full coverage %d requested\n",
+					    UDP_SKB_CB(skb)->cscov, skb->len);
 			goto drop;
 		}
 		if (UDP_SKB_CB(skb)->cscov  <  up->pcrlen) {
-			LIMIT_NETDEBUG(KERN_WARNING "UDPLITE6: coverage %d "
-						    "too small, need min %d\n",
-				       UDP_SKB_CB(skb)->cscov, up->pcrlen);
+			net_dbg_ratelimited("UDPLITE6: coverage %d too small, need min %d\n",
+					    UDP_SKB_CB(skb)->cscov, up->pcrlen);
 			goto drop;
 		}
 	}
@@ -761,9 +759,9 @@ static void udp6_csum_zero_error(struct sk_buff *skb)
 	/* RFC 2460 section 8.1 says that we SHOULD log
 	 * this error. Well, it is reasonable.
 	 */
-	LIMIT_NETDEBUG(KERN_INFO "IPv6: udp checksum is 0 for [%pI6c]:%u->[%pI6c]:%u\n",
-		       &ipv6_hdr(skb)->saddr, ntohs(udp_hdr(skb)->source),
-		       &ipv6_hdr(skb)->daddr, ntohs(udp_hdr(skb)->dest));
+	net_dbg_ratelimited("IPv6: udp checksum is 0 for [%pI6c]:%u->[%pI6c]:%u\n",
+			    &ipv6_hdr(skb)->saddr, ntohs(udp_hdr(skb)->source),
+			    &ipv6_hdr(skb)->daddr, ntohs(udp_hdr(skb)->dest));
 }
 
 /*
@@ -931,14 +929,11 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	return 0;
 
 short_packet:
-	LIMIT_NETDEBUG(KERN_DEBUG "UDP%sv6: short packet: From [%pI6c]:%u %d/%d to [%pI6c]:%u\n",
-		       proto == IPPROTO_UDPLITE ? "-Lite" : "",
-		       saddr,
-		       ntohs(uh->source),
-		       ulen,
-		       skb->len,
-		       daddr,
-		       ntohs(uh->dest));
+	net_dbg_ratelimited("UDP%sv6: short packet: From [%pI6c]:%u %d/%d to [%pI6c]:%u\n",
+			    proto == IPPROTO_UDPLITE ? "-Lite" : "",
+			    saddr, ntohs(uh->source),
+			    ulen, skb->len,
+			    daddr, ntohs(uh->dest));
 	goto discard;
 csum_error:
 	UDP6_INC_STATS_BH(net, UDP_MIB_CSUMERRORS, proto == IPPROTO_UDPLITE);
@@ -1290,7 +1285,7 @@ back_from_confirm:
 		/* ... which is an evident application bug. --ANK */
 		release_sock(sk);
 
-		LIMIT_NETDEBUG(KERN_DEBUG "udp cork app bug 2\n");
+		net_dbg_ratelimited("udp cork app bug 2\n");
 		err = -EINVAL;
 		goto out;
 	}
diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
index 5a940dbd74a3..32ab87d34828 100644
--- a/net/phonet/af_phonet.c
+++ b/net/phonet/af_phonet.c
@@ -426,16 +426,17 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev,
 
 		out_dev = phonet_route_output(net, pn_sockaddr_get_addr(&sa));
 		if (!out_dev) {
-			LIMIT_NETDEBUG(KERN_WARNING"No Phonet route to %02X\n",
-					pn_sockaddr_get_addr(&sa));
+			net_dbg_ratelimited("No Phonet route to %02X\n",
+					    pn_sockaddr_get_addr(&sa));
 			goto out;
 		}
 
 		__skb_push(skb, sizeof(struct phonethdr));
 		skb->dev = out_dev;
 		if (out_dev == dev) {
-			LIMIT_NETDEBUG(KERN_ERR"Phonet loop to %02X on %s\n",
-					pn_sockaddr_get_addr(&sa), dev->name);
+			net_dbg_ratelimited("Phonet loop to %02X on %s\n",
+					    pn_sockaddr_get_addr(&sa),
+					    dev->name);
 			goto out_dev;
 		}
 		/* Some drivers (e.g. TUN) do not allocate HW header space */
diff --git a/net/phonet/pep-gprs.c b/net/phonet/pep-gprs.c
index e9a83a637185..fa8237fdc57b 100644
--- a/net/phonet/pep-gprs.c
+++ b/net/phonet/pep-gprs.c
@@ -203,8 +203,7 @@ static netdev_tx_t gprs_xmit(struct sk_buff *skb, struct net_device *dev)
 	len = skb->len;
 	err = pep_write(sk, skb);
 	if (err) {
-		LIMIT_NETDEBUG(KERN_WARNING"%s: TX error (%d)\n",
-				dev->name, err);
+		net_dbg_ratelimited("%s: TX error (%d)\n", dev->name, err);
 		dev->stats.tx_aborted_errors++;
 		dev->stats.tx_errors++;
 	} else {
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 44b2123e22b8..9cd069dfaf65 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -272,8 +272,8 @@ static int pipe_rcv_status(struct sock *sk, struct sk_buff *skb)
 
 	hdr = pnp_hdr(skb);
 	if (hdr->data[0] != PN_PEP_TYPE_COMMON) {
-		LIMIT_NETDEBUG(KERN_DEBUG"Phonet unknown PEP type: %u\n",
-				(unsigned int)hdr->data[0]);
+		net_dbg_ratelimited("Phonet unknown PEP type: %u\n",
+				    (unsigned int)hdr->data[0]);
 		return -EOPNOTSUPP;
 	}
 
@@ -304,8 +304,8 @@ static int pipe_rcv_status(struct sock *sk, struct sk_buff *skb)
 		break;
 
 	default:
-		LIMIT_NETDEBUG(KERN_DEBUG"Phonet unknown PEP indication: %u\n",
-				(unsigned int)hdr->data[1]);
+		net_dbg_ratelimited("Phonet unknown PEP indication: %u\n",
+				    (unsigned int)hdr->data[1]);
 		return -EOPNOTSUPP;
 	}
 	if (wake)
@@ -451,8 +451,8 @@ static int pipe_do_rcv(struct sock *sk, struct sk_buff *skb)
 		break;
 
 	default:
-		LIMIT_NETDEBUG(KERN_DEBUG"Phonet unknown PEP message: %u\n",
-				hdr->message_id);
+		net_dbg_ratelimited("Phonet unknown PEP message: %u\n",
+				    hdr->message_id);
 		err = -EINVAL;
 	}
 out:
-- 
cgit v1.2.3


From 71783576b5345d63df048c0f18974037eea6e4f9 Mon Sep 17 00:00:00 2001
From: Hauke Mehrtens <hauke@hauke-m.de>
Date: Sat, 1 Nov 2014 16:54:56 +0100
Subject: bcma: get IRQ numbers from dt

It is not possible to auto detect the irq numbers used by the cores on
an arm SoC. If bcma was registered with device tree it will search for
some device tree nodes with the irq number and add it to the core
configuration.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
 Documentation/devicetree/bindings/bus/bcma.txt | 21 +++++++++++
 drivers/bcma/main.c                            | 52 +++++++++++++++++++++++++-
 2 files changed, 72 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/bus/bcma.txt b/Documentation/devicetree/bindings/bus/bcma.txt
index 62a48348ac15..edd44d802139 100644
--- a/Documentation/devicetree/bindings/bus/bcma.txt
+++ b/Documentation/devicetree/bindings/bus/bcma.txt
@@ -8,6 +8,11 @@ Required properties:
 
 The cores on the AXI bus are automatically detected by bcma with the
 memory ranges they are using and they get registered afterwards.
+Automatic detection of the IRQ number is not working on
+BCM47xx/BCM53xx ARM SoCs. To assign IRQ numbers to the cores, provide
+them manually through device tree. Use an interrupt-map to specify the
+IRQ used by the devices on the bus. The first address is just an index,
+because we do not have any special register.
 
 The top-level axi bus may contain children representing attached cores
 (devices). This is needed since some hardware details can't be auto
@@ -22,6 +27,22 @@ Example:
 		ranges = <0x00000000 0x18000000 0x00100000>;
 		#address-cells = <1>;
 		#size-cells = <1>;
+		#interrupt-cells = <1>;
+		interrupt-map-mask = <0x000fffff 0xffff>;
+		interrupt-map =
+			/* Ethernet Controller 0 */
+			<0x00024000 0 &gic GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>,
+
+			/* Ethernet Controller 1 */
+			<0x00025000 0 &gic GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
+
+			/* PCIe Controller 0 */
+			<0x00012000 0 &gic GIC_SPI 126 IRQ_TYPE_LEVEL_HIGH>,
+			<0x00012000 1 &gic GIC_SPI 127 IRQ_TYPE_LEVEL_HIGH>,
+			<0x00012000 2 &gic GIC_SPI 128 IRQ_TYPE_LEVEL_HIGH>,
+			<0x00012000 3 &gic GIC_SPI 129 IRQ_TYPE_LEVEL_HIGH>,
+			<0x00012000 4 &gic GIC_SPI 130 IRQ_TYPE_LEVEL_HIGH>,
+			<0x00012000 5 &gic GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>;
 
 		chipcommon {
 			reg = <0x00000000 0x1000>;
diff --git a/drivers/bcma/main.c b/drivers/bcma/main.c
index 6d1cf5701452..122086ef9fe1 100644
--- a/drivers/bcma/main.c
+++ b/drivers/bcma/main.c
@@ -11,6 +11,7 @@
 #include <linux/bcma/bcma.h>
 #include <linux/slab.h>
 #include <linux/of_address.h>
+#include <linux/of_irq.h>
 
 MODULE_DESCRIPTION("Broadcom's specific AMBA driver");
 MODULE_LICENSE("GPL");
@@ -153,6 +154,46 @@ static struct device_node *bcma_of_find_child_device(struct platform_device *par
 	return NULL;
 }
 
+static int bcma_of_irq_parse(struct platform_device *parent,
+			     struct bcma_device *core,
+			     struct of_phandle_args *out_irq, int num)
+{
+	__be32 laddr[1];
+	int rc;
+
+	if (core->dev.of_node) {
+		rc = of_irq_parse_one(core->dev.of_node, num, out_irq);
+		if (!rc)
+			return rc;
+	}
+
+	out_irq->np = parent->dev.of_node;
+	out_irq->args_count = 1;
+	out_irq->args[0] = num;
+
+	laddr[0] = cpu_to_be32(core->addr);
+	return of_irq_parse_raw(laddr, out_irq);
+}
+
+static unsigned int bcma_of_get_irq(struct platform_device *parent,
+				    struct bcma_device *core, int num)
+{
+	struct of_phandle_args out_irq;
+	int ret;
+
+	if (!parent || !parent->dev.of_node)
+		return 0;
+
+	ret = bcma_of_irq_parse(parent, core, &out_irq, num);
+	if (ret) {
+		bcma_debug(core->bus, "bcma_of_get_irq() failed with rc=%d\n",
+			   ret);
+		return 0;
+	}
+
+	return irq_create_of_mapping(&out_irq);
+}
+
 static void bcma_of_fill_device(struct platform_device *parent,
 				struct bcma_device *core)
 {
@@ -161,12 +202,19 @@ static void bcma_of_fill_device(struct platform_device *parent,
 	node = bcma_of_find_child_device(parent, core);
 	if (node)
 		core->dev.of_node = node;
+
+	core->irq = bcma_of_get_irq(parent, core, 0);
 }
 #else
 static void bcma_of_fill_device(struct platform_device *parent,
 				struct bcma_device *core)
 {
 }
+static inline unsigned int bcma_of_get_irq(struct platform_device *parent,
+					   struct bcma_device *core, int num)
+{
+	return 0;
+}
 #endif /* CONFIG_OF */
 
 unsigned int bcma_core_irq(struct bcma_device *core, int num)
@@ -182,7 +230,9 @@ unsigned int bcma_core_irq(struct bcma_device *core, int num)
 			mips_irq = bcma_core_mips_irq(core);
 			return mips_irq <= 4 ? mips_irq + 2 : 0;
 		}
-		break;
+		if (bus->host_pdev)
+			return bcma_of_get_irq(bus->host_pdev, core, num);
+		return 0;
 	case BCMA_HOSTTYPE_SDIO:
 		return 0;
 	}
-- 
cgit v1.2.3


From c88c7d3204e66faf1f24402d551ac1d0b400e3ff Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan@kernel.org>
Date: Tue, 11 Nov 2014 20:00:07 +0100
Subject: dt/bindings: fix documentation of ethernet-phy compatible property

A recent commit extended the documentation of the ethernet-phy
compatible property, but placed the new paragraph under the max-speed
property.

Fixes: f00e756ed12d ("dt: Document a compatible entry for MDIO ethernet
Phys")
Cc: devicetree@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/phy.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
index 5b8c58903077..40831fbaff72 100644
--- a/Documentation/devicetree/bindings/net/phy.txt
+++ b/Documentation/devicetree/bindings/net/phy.txt
@@ -19,7 +19,6 @@ Optional Properties:
   specifications. If neither of these are specified, the default is to
   assume clause 22. The compatible list may also contain other
   elements.
-- max-speed: Maximum PHY supported speed (10, 100, 1000...)
 
   If the phy's identifier is known then the list may contain an entry
   of the form: "ethernet-phy-idAAAA.BBBB" where
@@ -29,6 +28,8 @@ Optional Properties:
             4 hex digits. This is the chip vendor OUI bits 19:24,
             followed by 10 bits of a vendor specific ID.
 
+- max-speed: Maximum PHY supported speed (10, 100, 1000...)
+
 Example:
 
 ethernet-phy@0 {
-- 
cgit v1.2.3


From 7b52314cc44569f56aa07abdbe43e6ccfcef9478 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan@kernel.org>
Date: Tue, 11 Nov 2014 20:00:15 +0100
Subject: net: phy: micrel: enable led-mode for KSZ8081/KSZ8091

Enable led-mode configuration for KSZ8081 and KSZ8091.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/micrel.txt | 2 ++
 drivers/net/phy/micrel.c                         | 1 +
 2 files changed, 3 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt
index e1d99b95c4ec..30062fae5623 100644
--- a/Documentation/devicetree/bindings/net/micrel.txt
+++ b/Documentation/devicetree/bindings/net/micrel.txt
@@ -14,6 +14,8 @@ Optional properties:
 	      KSZ8021: register 0x1f, bits 5..4
 	      KSZ8031: register 0x1f, bits 5..4
 	      KSZ8051: register 0x1f, bits 5..4
+	      KSZ8081: register 0x1f, bits 5..4
+	      KSZ8091: register 0x1f, bits 5..4
 
               See the respective PHY datasheet for the mode values.
 
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 12e18f7273ce..30e894d6ffbd 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -265,6 +265,7 @@ static int ks8051_config_init(struct phy_device *phydev)
 static int ksz8081_config_init(struct phy_device *phydev)
 {
 	kszphy_broadcast_disable(phydev);
+	kszphy_setup_led(phydev, MII_KSZPHY_CTRL_2);
 
 	return 0;
 }
-- 
cgit v1.2.3


From 9488e1e5b319751c71eebfd49027bf9e2377f38c Mon Sep 17 00:00:00 2001
From: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com>
Date: Thu, 13 Nov 2014 15:59:07 +0900
Subject: net: sh_eth: Add r8a7793 support

The device tree probing for R-Car M2N (r8a7793) is added.

Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/sh_eth.txt | 1 +
 drivers/net/ethernet/renesas/sh_eth.c            | 2 ++
 2 files changed, 3 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/sh_eth.txt b/Documentation/devicetree/bindings/net/sh_eth.txt
index 34d4db1a4e25..2f6ec85fda8e 100644
--- a/Documentation/devicetree/bindings/net/sh_eth.txt
+++ b/Documentation/devicetree/bindings/net/sh_eth.txt
@@ -9,6 +9,7 @@ Required properties:
 	      "renesas,ether-r8a7779"  if the device is a part of R8A7779 SoC.
 	      "renesas,ether-r8a7790"  if the device is a part of R8A7790 SoC.
 	      "renesas,ether-r8a7791"  if the device is a part of R8A7791 SoC.
+	      "renesas,ether-r8a7793"  if the device is a part of R8A7793 SoC.
 	      "renesas,ether-r8a7794"  if the device is a part of R8A7794 SoC.
 	      "renesas,ether-r7s72100" if the device is a part of R7S72100 SoC.
 - reg: offset and length of (1) the E-DMAC/feLic register block (required),
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 1f79ed61f9cd..71f59ee51304 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2747,6 +2747,7 @@ static const struct of_device_id sh_eth_match_table[] = {
 	{ .compatible = "renesas,ether-r8a7779", .data = &r8a777x_data },
 	{ .compatible = "renesas,ether-r8a7790", .data = &r8a779x_data },
 	{ .compatible = "renesas,ether-r8a7791", .data = &r8a779x_data },
+	{ .compatible = "renesas,ether-r8a7793", .data = &r8a779x_data },
 	{ .compatible = "renesas,ether-r8a7794", .data = &r8a779x_data },
 	{ .compatible = "renesas,ether-r7s72100", .data = &r7s72100_data },
 	{ }
@@ -2973,6 +2974,7 @@ static struct platform_device_id sh_eth_id_table[] = {
 	{ "r8a777x-ether", (kernel_ulong_t)&r8a777x_data },
 	{ "r8a7790-ether", (kernel_ulong_t)&r8a779x_data },
 	{ "r8a7791-ether", (kernel_ulong_t)&r8a779x_data },
+	{ "r8a7793-ether", (kernel_ulong_t)&r8a779x_data },
 	{ "r8a7794-ether", (kernel_ulong_t)&r8a779x_data },
 	{ }
 };
-- 
cgit v1.2.3


From 960fb622f85180f36d3aff82af53e2be3db2f888 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Sun, 16 Nov 2014 06:23:05 -0800
Subject: net: provide a per host RSS key generic infrastructure

RSS (Receive Side Scaling) typically uses Toeplitz hash and a 40 or 52 bytes
RSS key.

Some drivers use a constant (and well known key), some drivers use a random
key per port, making bonding setups hard to tune. Well known keys increase
attack surface, considering that number of queues is usually a power of two.

This patch provides infrastructure to help drivers doing the right thing.

netdev_rss_key_fill() should be used by drivers to initialize their RSS key,
even if they provide ethtool -X support to let user redefine the key later.

A new /proc/sys/net/core/netdev_rss_key file can be used to get the host
RSS key even for drivers not providing ethtool -x support, in case some
applications want to precisely setup flows to match some RX queues.

Tested:

myhost:~# cat /proc/sys/net/core/netdev_rss_key
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/sysctl/net.txt | 22 ++++++++++++++++++++++
 include/linux/netdevice.h    |  6 ++++++
 net/core/ethtool.c           | 11 +++++++++++
 net/core/sysctl_net_core.c   | 19 +++++++++++++++++++
 4 files changed, 58 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index e26c607468a6..666594b43cff 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -142,6 +142,28 @@ netdev_max_backlog
 Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
 receives packets faster than kernel can process them.
 
+netdev_rss_key
+--------------
+
+RSS (Receive Side Scaling) enabled drivers use a 40 bytes host key that is
+randomly generated.
+Some user space might need to gather its content even if drivers do not
+provide ethtool -x support yet.
+
+myhost:~# cat /proc/sys/net/core/netdev_rss_key
+84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
+
+File contains nul bytes if no driver ever called netdev_rss_key_fill() function.
+Note:
+/proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
+but most drivers only use 40 bytes of it.
+
+myhost:~# ethtool -x eth0
+RX flow hash indirection table for eth0 with 8 RX ring(s):
+    0:    0     1     2     3     4     5     6     7
+RSS hash key:
+84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
+
 netdev_tstamp_prequeue
 ----------------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4a6f770377d3..db63cf459ba1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3422,6 +3422,12 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname);
 void *netdev_lower_dev_get_private(struct net_device *dev,
 				   struct net_device *lower_dev);
+
+/* RSS keys are 40 or 52 bytes long */
+#define NETDEV_RSS_KEY_LEN 52
+extern u8 netdev_rss_key[NETDEV_RSS_KEY_LEN];
+void netdev_rss_key_fill(void *buffer, size_t len);
+
 int dev_get_nest_level(struct net_device *dev,
 		       bool (*type_check)(struct net_device *dev));
 int skb_checksum_help(struct sk_buff *skb);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index b0f84f5ddda8..715f51f321e9 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -25,6 +25,7 @@
 #include <linux/slab.h>
 #include <linux/rtnetlink.h>
 #include <linux/sched.h>
+#include <linux/net.h>
 
 /*
  * Some useful ethtool_ops methods that're device independent.
@@ -573,6 +574,16 @@ static int ethtool_copy_validate_indir(u32 *indir, void __user *useraddr,
 	return 0;
 }
 
+u8 netdev_rss_key[NETDEV_RSS_KEY_LEN];
+
+void netdev_rss_key_fill(void *buffer, size_t len)
+{
+	BUG_ON(len > sizeof(netdev_rss_key));
+	net_get_random_once(netdev_rss_key, sizeof(netdev_rss_key));
+	memcpy(buffer, netdev_rss_key, len);
+}
+EXPORT_SYMBOL(netdev_rss_key_fill);
+
 static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev,
 						     void __user *useraddr)
 {
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index f93f092fe226..31baba2a71ce 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -217,6 +217,18 @@ static int set_default_qdisc(struct ctl_table *table, int write,
 }
 #endif
 
+static int proc_do_rss_key(struct ctl_table *table, int write,
+			   void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table fake_table;
+	char buf[NETDEV_RSS_KEY_LEN * 3];
+
+	snprintf(buf, sizeof(buf), "%*phC", NETDEV_RSS_KEY_LEN, netdev_rss_key);
+	fake_table.data = buf;
+	fake_table.maxlen = sizeof(buf);
+	return proc_dostring(&fake_table, write, buffer, lenp, ppos);
+}
+
 static struct ctl_table net_core_table[] = {
 #ifdef CONFIG_NET
 	{
@@ -265,6 +277,13 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "netdev_rss_key",
+		.data		= &netdev_rss_key,
+		.maxlen		= sizeof(int),
+		.mode		= 0444,
+		.proc_handler	= proc_do_rss_key,
+	},
 #ifdef CONFIG_BPF_JIT
 	{
 		.procname	= "bpf_jit_enable",
-- 
cgit v1.2.3


From 3ff9027ca6b00e194d2eae353febf7233cfcc1ea Mon Sep 17 00:00:00 2001
From: Roger Quadros <rogerq@ti.com>
Date: Fri, 14 Nov 2014 17:37:39 +0200
Subject: can: c_can: Add syscon/regmap RAMINIT mechanism

Some TI SoCs like DRA7 have a RAMINIT register specification
different from the other AMxx SoCs and as expected by the
existing driver.

To add more insanity, this register is shared with other
IPs like DSS, PCIe and PWM.

Provides a more generic mechanism to specify the RAMINIT
register location and START/DONE bit position and use the
syscon/regmap framework to access the register.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 .../devicetree/bindings/net/can/c_can.txt          |   3 +
 drivers/net/can/c_can/c_can.h                      |  10 +-
 drivers/net/can/c_can/c_can_platform.c             | 109 +++++++++++++--------
 3 files changed, 81 insertions(+), 41 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt
index 8f1ae81228e3..a3ca3ee53546 100644
--- a/Documentation/devicetree/bindings/net/can/c_can.txt
+++ b/Documentation/devicetree/bindings/net/can/c_can.txt
@@ -12,6 +12,9 @@ Required properties:
 Optional properties:
 - ti,hwmods		: Must be "d_can<n>" or "c_can<n>", n being the
 			  instance number
+- syscon-raminit	: Handle to system control region that contains the
+			  RAMINIT register, register offset to the RAMINIT
+			  register and the CAN instance number (0 offset).
 
 Note: "ti,hwmods" field is used to fetch the base address and irq
 resources from TI, omap hwmod data base during device registration.
diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 3f111f4f0f6e..28a73d14ea8d 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -183,6 +183,13 @@ struct c_can_driver_data {
 	bool raminit_pulse;	/* If set, sets and clears START bit (pulse) */
 };
 
+/* Out of band RAMINIT register access via syscon regmap */
+struct c_can_raminit {
+	struct regmap *syscon;	/* for raminit ctrl. reg. access */
+	unsigned int reg;	/* register index within syscon */
+	struct raminit_bits bits;
+};
+
 /* c_can private data structure */
 struct c_can_priv {
 	struct can_priv can;	/* must be the first member */
@@ -200,8 +207,7 @@ struct c_can_priv {
 	const u16 *regs;
 	void *priv;		/* for board-specific data */
 	enum c_can_dev_id type;
-	u32 __iomem *raminit_ctrlreg;
-	int instance;
+	struct c_can_raminit raminit_sys;	/* RAMINIT via syscon regmap */
 	void (*raminit) (const struct c_can_priv *priv, bool enable);
 	u32 comm_rcv_high;
 	u32 rxmasked;
diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c
index 44c293926f78..1fbfa1d59c29 100644
--- a/drivers/net/can/c_can/c_can_platform.c
+++ b/drivers/net/can/c_can/c_can_platform.c
@@ -32,14 +32,13 @@
 #include <linux/clk.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
 
 #include <linux/can/dev.h>
 
 #include "c_can.h"
 
-#define CAN_RAMINIT_START_MASK(i)	(0x001 << (i))
-#define CAN_RAMINIT_DONE_MASK(i)	(0x100 << (i))
-#define CAN_RAMINIT_ALL_MASK(i)		(0x101 << (i))
 #define DCAN_RAM_INIT_BIT		(1 << 3)
 static DEFINE_SPINLOCK(raminit_lock);
 /*
@@ -72,48 +71,57 @@ static void c_can_plat_write_reg_aligned_to_32bit(const struct c_can_priv *priv,
 	writew(val, priv->base + 2 * priv->regs[index]);
 }
 
-static void c_can_hw_raminit_wait_ti(const struct c_can_priv *priv, u32 mask,
-				  u32 val)
+static void c_can_hw_raminit_wait_syscon(const struct c_can_priv *priv,
+					 u32 mask, u32 val)
 {
+	const struct c_can_raminit *raminit = &priv->raminit_sys;
 	int timeout = 0;
+	u32 ctrl = 0;
 
 	/* We look only at the bits of our instance. */
 	val &= mask;
-	while ((readl(priv->raminit_ctrlreg) & mask) != val) {
+	do {
 		udelay(1);
 		timeout++;
 
+		regmap_read(raminit->syscon, raminit->reg, &ctrl);
 		if (timeout == 1000) {
 			dev_err(&priv->dev->dev, "%s: time out\n", __func__);
 			break;
 		}
-	}
+	} while ((ctrl & mask) != val);
 }
 
-static void c_can_hw_raminit_ti(const struct c_can_priv *priv, bool enable)
+static void c_can_hw_raminit_syscon(const struct c_can_priv *priv, bool enable)
 {
-	u32 mask = CAN_RAMINIT_ALL_MASK(priv->instance);
-	u32 ctrl;
+	const struct c_can_raminit *raminit = &priv->raminit_sys;
+	u32 ctrl = 0;
+	u32 mask;
 
 	spin_lock(&raminit_lock);
 
-	ctrl = readl(priv->raminit_ctrlreg);
+	mask = 1 << raminit->bits.start | 1 << raminit->bits.done;
+	regmap_read(raminit->syscon, raminit->reg, &ctrl);
+
 	/* We clear the done and start bit first. The start bit is
 	 * looking at the 0 -> transition, but is not self clearing;
 	 * And we clear the init done bit as well.
+	 * NOTE: DONE must be written with 1 to clear it.
 	 */
-	ctrl &= ~CAN_RAMINIT_START_MASK(priv->instance);
-	ctrl |= CAN_RAMINIT_DONE_MASK(priv->instance);
-	writel(ctrl, priv->raminit_ctrlreg);
-	ctrl &= ~CAN_RAMINIT_DONE_MASK(priv->instance);
-	c_can_hw_raminit_wait_ti(priv, mask, ctrl);
+	ctrl &= ~(1 << raminit->bits.start);
+	ctrl |= 1 << raminit->bits.done;
+	regmap_write(raminit->syscon, raminit->reg, ctrl);
+
+	ctrl &= ~(1 << raminit->bits.done);
+	c_can_hw_raminit_wait_syscon(priv, mask, ctrl);
 
 	if (enable) {
 		/* Set start bit and wait for the done bit. */
-		ctrl |= CAN_RAMINIT_START_MASK(priv->instance);
-		writel(ctrl, priv->raminit_ctrlreg);
-		ctrl |= CAN_RAMINIT_DONE_MASK(priv->instance);
-		c_can_hw_raminit_wait_ti(priv, mask, ctrl);
+		ctrl |= 1 << raminit->bits.start;
+		regmap_write(raminit->syscon, raminit->reg, ctrl);
+
+		ctrl |= 1 << raminit->bits.done;
+		c_can_hw_raminit_wait_syscon(priv, mask, ctrl);
 	}
 	spin_unlock(&raminit_lock);
 }
@@ -207,10 +215,11 @@ static int c_can_plat_probe(struct platform_device *pdev)
 	struct net_device *dev;
 	struct c_can_priv *priv;
 	const struct of_device_id *match;
-	struct resource *mem, *res;
+	struct resource *mem;
 	int irq;
 	struct clk *clk;
 	const struct c_can_driver_data *drvdata;
+	struct device_node *np = pdev->dev.of_node;
 
 	match = of_match_device(c_can_of_table, &pdev->dev);
 	if (match) {
@@ -278,27 +287,49 @@ static int c_can_plat_probe(struct platform_device *pdev)
 		priv->read_reg32 = d_can_plat_read_reg32;
 		priv->write_reg32 = d_can_plat_write_reg32;
 
-		if (pdev->dev.of_node)
-			priv->instance = of_alias_get_id(pdev->dev.of_node, "d_can");
-		else
-			priv->instance = pdev->id;
-
-		res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-		/* Not all D_CAN modules have a separate register for the D_CAN
-		 * RAM initialization. Use default RAM init bit in D_CAN module
-		 * if not specified in DT.
+		/* Check if we need custom RAMINIT via syscon. Mostly for TI
+		 * platforms. Only supported with DT boot.
 		 */
-		if (!res) {
+		if (np && of_property_read_bool(np, "syscon-raminit")) {
+			u32 id;
+			struct c_can_raminit *raminit = &priv->raminit_sys;
+
+			ret = -EINVAL;
+			raminit->syscon = syscon_regmap_lookup_by_phandle(np,
+									  "syscon-raminit");
+			if (IS_ERR(raminit->syscon)) {
+				/* can fail with -EPROBE_DEFER */
+				ret = PTR_ERR(raminit->syscon);
+				free_c_can_dev(dev);
+				return ret;
+			}
+
+			if (of_property_read_u32_index(np, "syscon-raminit", 1,
+						       &raminit->reg)) {
+				dev_err(&pdev->dev,
+					"couldn't get the RAMINIT reg. offset!\n");
+				goto exit_free_device;
+			}
+
+			if (of_property_read_u32_index(np, "syscon-raminit", 2,
+						       &id)) {
+				dev_err(&pdev->dev,
+					"couldn't get the CAN instance ID\n");
+				goto exit_free_device;
+			}
+
+			if (id >= drvdata->raminit_num) {
+				dev_err(&pdev->dev,
+					"Invalid CAN instance ID\n");
+				goto exit_free_device;
+			}
+
+			raminit->bits = drvdata->raminit_bits[id];
+
+			priv->raminit = c_can_hw_raminit_syscon;
+		} else {
 			priv->raminit = c_can_hw_raminit;
-			break;
 		}
-
-		priv->raminit_ctrlreg = devm_ioremap(&pdev->dev, res->start,
-						     resource_size(res));
-		if (!priv->raminit_ctrlreg || priv->instance < 0)
-			dev_info(&pdev->dev, "control memory is not used for raminit\n");
-		else
-			priv->raminit = c_can_hw_raminit_ti;
 		break;
 	default:
 		ret = -EINVAL;
-- 
cgit v1.2.3


From 0f4da3a8da5fe3a079b1adf613d121a0fafd63f1 Mon Sep 17 00:00:00 2001
From: Roger Quadros <rogerq@ti.com>
Date: Fri, 7 Nov 2014 16:49:21 +0200
Subject: can: c_can: Add support for TI DRA7 DCAN

DRA7 SoC has 2 CAN IPs. Provide compatible IDs and RAMINIT
register data for both.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 Documentation/devicetree/bindings/net/can/c_can.txt |  1 +
 drivers/net/can/c_can/c_can_platform.c              | 13 +++++++++++++
 2 files changed, 14 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt
index a3ca3ee53546..f682fdb6d921 100644
--- a/Documentation/devicetree/bindings/net/can/c_can.txt
+++ b/Documentation/devicetree/bindings/net/can/c_can.txt
@@ -4,6 +4,7 @@ Bosch C_CAN/D_CAN controller Device Tree Bindings
 Required properties:
 - compatible		: Should be "bosch,c_can" for C_CAN controllers and
 			  "bosch,d_can" for D_CAN controllers.
+			  Can be "ti,dra7-d_can".
 - reg			: physical base address and size of the C_CAN/D_CAN
 			  registers map
 - interrupts		: property with a value describing the interrupt
diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c
index 41fa460c3592..570da5f56aae 100644
--- a/drivers/net/can/c_can/c_can_platform.c
+++ b/drivers/net/can/c_can/c_can_platform.c
@@ -190,6 +190,18 @@ static const struct c_can_driver_data d_can_drvdata = {
 	.id = BOSCH_D_CAN,
 };
 
+static const struct raminit_bits dra7_raminit_bits[] = {
+	[0] = { .start = 3, .done = 1, },
+	[1] = { .start = 5, .done = 2, },
+};
+
+static const struct c_can_driver_data dra7_dcan_drvdata = {
+	.id = BOSCH_D_CAN,
+	.raminit_num = ARRAY_SIZE(dra7_raminit_bits),
+	.raminit_bits = dra7_raminit_bits,
+	.raminit_pulse = true,
+};
+
 static struct platform_device_id c_can_id_table[] = {
 	{
 		.name = KBUILD_MODNAME,
@@ -210,6 +222,7 @@ MODULE_DEVICE_TABLE(platform, c_can_id_table);
 static const struct of_device_id c_can_of_table[] = {
 	{ .compatible = "bosch,c_can", .data = &c_can_drvdata },
 	{ .compatible = "bosch,d_can", .data = &d_can_drvdata },
+	{ .compatible = "ti,dra7-d_can", .data = &dra7_dcan_drvdata },
 	{ /* sentinel */ },
 };
 MODULE_DEVICE_TABLE(of, c_can_of_table);
-- 
cgit v1.2.3


From c71d0b31bf71b84ad566292ea2412755da7934b2 Mon Sep 17 00:00:00 2001
From: Roger Quadros <rogerq@ti.com>
Date: Fri, 7 Nov 2014 16:49:22 +0200
Subject: can: c_can: Add support for TI am3352 DCAN

AM3352 SoC has 2 DCAN modules. Add compatible id and
raminit driver data for am3352 DCAN.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 Documentation/devicetree/bindings/net/can/c_can.txt |  2 +-
 drivers/net/can/c_can/c_can_platform.c              | 12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt
index f682fdb6d921..6731730eec0d 100644
--- a/Documentation/devicetree/bindings/net/can/c_can.txt
+++ b/Documentation/devicetree/bindings/net/can/c_can.txt
@@ -4,7 +4,7 @@ Bosch C_CAN/D_CAN controller Device Tree Bindings
 Required properties:
 - compatible		: Should be "bosch,c_can" for C_CAN controllers and
 			  "bosch,d_can" for D_CAN controllers.
-			  Can be "ti,dra7-d_can".
+			  Can be "ti,dra7-d_can" or "ti,am3352-d_can".
 - reg			: physical base address and size of the C_CAN/D_CAN
 			  registers map
 - interrupts		: property with a value describing the interrupt
diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c
index 570da5f56aae..f4488e5d5d68 100644
--- a/drivers/net/can/c_can/c_can_platform.c
+++ b/drivers/net/can/c_can/c_can_platform.c
@@ -202,6 +202,17 @@ static const struct c_can_driver_data dra7_dcan_drvdata = {
 	.raminit_pulse = true,
 };
 
+static const struct raminit_bits am3352_raminit_bits[] = {
+	[0] = { .start = 0, .done = 8, },
+	[1] = { .start = 1, .done = 9, },
+};
+
+static const struct c_can_driver_data am3352_dcan_drvdata = {
+	.id = BOSCH_D_CAN,
+	.raminit_num = ARRAY_SIZE(am3352_raminit_bits),
+	.raminit_bits = am3352_raminit_bits,
+};
+
 static struct platform_device_id c_can_id_table[] = {
 	{
 		.name = KBUILD_MODNAME,
@@ -223,6 +234,7 @@ static const struct of_device_id c_can_of_table[] = {
 	{ .compatible = "bosch,c_can", .data = &c_can_drvdata },
 	{ .compatible = "bosch,d_can", .data = &d_can_drvdata },
 	{ .compatible = "ti,dra7-d_can", .data = &dra7_dcan_drvdata },
+	{ .compatible = "ti,am3352-d_can", .data = &am3352_dcan_drvdata },
 	{ /* sentinel */ },
 };
 MODULE_DEVICE_TABLE(of, c_can_of_table);
-- 
cgit v1.2.3


From f2bf2589834faec7af8c02c3949c90788d21b790 Mon Sep 17 00:00:00 2001
From: Roger Quadros <rogerq@ti.com>
Date: Mon, 17 Nov 2014 14:09:03 +0200
Subject: net: can: c_can: Add support for TI am4372 DCAN

AM4372 SoC has 2 DCAN modules. Add compatible id and
raminit driver data for it. The driver data is same as AM3352
but this gives us flexibility to add AM4372 specific quirks
if required later.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 Documentation/devicetree/bindings/net/can/c_can.txt | 3 ++-
 drivers/net/can/c_can/c_can_platform.c              | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt
index 6731730eec0d..5a1d8b0c39e9 100644
--- a/Documentation/devicetree/bindings/net/can/c_can.txt
+++ b/Documentation/devicetree/bindings/net/can/c_can.txt
@@ -4,7 +4,8 @@ Bosch C_CAN/D_CAN controller Device Tree Bindings
 Required properties:
 - compatible		: Should be "bosch,c_can" for C_CAN controllers and
 			  "bosch,d_can" for D_CAN controllers.
-			  Can be "ti,dra7-d_can" or "ti,am3352-d_can".
+			  Can be "ti,dra7-d_can", "ti,am3352-d_can" or
+			  "ti,am4372-d_can".
 - reg			: physical base address and size of the C_CAN/D_CAN
 			  registers map
 - interrupts		: property with a value describing the interrupt
diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c
index f4488e5d5d68..a4535d2142a7 100644
--- a/drivers/net/can/c_can/c_can_platform.c
+++ b/drivers/net/can/c_can/c_can_platform.c
@@ -235,6 +235,7 @@ static const struct of_device_id c_can_of_table[] = {
 	{ .compatible = "bosch,d_can", .data = &d_can_drvdata },
 	{ .compatible = "ti,dra7-d_can", .data = &dra7_dcan_drvdata },
 	{ .compatible = "ti,am3352-d_can", .data = &am3352_dcan_drvdata },
+	{ .compatible = "ti,am4372-d_can", .data = &am3352_dcan_drvdata },
 	{ /* sentinel */ },
 };
 MODULE_DEVICE_TABLE(of, c_can_of_table);
-- 
cgit v1.2.3


From 098ea6bc4cb5890d09b1b79154fa11182e0f71e0 Mon Sep 17 00:00:00 2001
From: Amitkumar Karwar <akarwar@marvell.com>
Date: Wed, 19 Nov 2014 01:28:32 -0800
Subject: Bluetooth: btmrvl: add DT bindings documentation

Calibration data can be downloaded through device tree method. This
patch adds the documentation. Also, instead of searching device tree
node by name using of_find_node_by_name() API, let's use
for_each_compatible_node().

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 Documentation/devicetree/bindings/btmrvl.txt | 22 ++++++++++++++++++++
 drivers/bluetooth/btmrvl_main.c              | 31 +++++++++++++---------------
 2 files changed, 36 insertions(+), 17 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/btmrvl.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/btmrvl.txt b/Documentation/devicetree/bindings/btmrvl.txt
new file mode 100644
index 000000000000..cdd57680a749
--- /dev/null
+++ b/Documentation/devicetree/bindings/btmrvl.txt
@@ -0,0 +1,22 @@
+btmrvl
+------
+
+Required properties:
+
+  - compatible : must be "btmrvl,cfgdata"
+
+Optional properties:
+
+  - btmrvl,cal-data : Calibration data downloaded to the device during
+		      initialization. This is an array of 28 values(u8).
+
+Example:
+
+btmrvl {
+	compatible = "btmrvl,cfgdata";
+
+	btmrvl,cal-data = /bits/ 8 <
+		0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02
+		0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00
+		0x00 0x00 0xf0 0x00>;
+};
diff --git a/drivers/bluetooth/btmrvl_main.c b/drivers/bluetooth/btmrvl_main.c
index 1d7db2064889..2909bca6b8b6 100644
--- a/drivers/bluetooth/btmrvl_main.c
+++ b/drivers/bluetooth/btmrvl_main.c
@@ -496,25 +496,22 @@ static int btmrvl_cal_data_dt(struct btmrvl_private *priv)
 {
 	struct device_node *dt_node;
 	u8 cal_data[BT_CAL_HDR_LEN + BT_CAL_DATA_SIZE];
-	const char name[] = "btmrvl_caldata";
-	const char property[] = "btmrvl,caldata";
 	int ret;
 
-	dt_node = of_find_node_by_name(NULL, name);
-	if (!dt_node)
-		return -ENODEV;
-
-	ret = of_property_read_u8_array(dt_node, property,
-					cal_data + BT_CAL_HDR_LEN,
-					BT_CAL_DATA_SIZE);
-	if (ret)
-		return ret;
-
-	BT_DBG("Use cal data from device tree");
-	ret = btmrvl_download_cal_data(priv, cal_data, BT_CAL_DATA_SIZE);
-	if (ret) {
-		BT_ERR("Fail to download calibrate data");
-		return ret;
+	for_each_compatible_node(dt_node, NULL, "btmrvl,cfgdata") {
+		ret = of_property_read_u8_array(dt_node, "btmrvl,cal-data",
+						cal_data + BT_CAL_HDR_LEN,
+						BT_CAL_DATA_SIZE);
+		if (ret)
+			return ret;
+
+		BT_DBG("Use cal data from device tree");
+		ret = btmrvl_download_cal_data(priv, cal_data,
+					       BT_CAL_DATA_SIZE);
+		if (ret) {
+			BT_ERR("Fail to download calibrate data");
+			return ret;
+		}
 	}
 
 	return 0;
-- 
cgit v1.2.3


From 025a60a752261fd479a4bb74cf7cb04fdc313ec7 Mon Sep 17 00:00:00 2001
From: Amitkumar Karwar <akarwar@marvell.com>
Date: Wed, 19 Nov 2014 01:28:33 -0800
Subject: Bluetooth: btmrvl: add DT-bindings for gpio-gap

This can be used to have GPIO host wakeup method suitable for the
platform and configurable GAP for host sleep handshake.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 Documentation/devicetree/bindings/btmrvl.txt |  7 +++++++
 drivers/bluetooth/btmrvl_main.c              | 12 +++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/btmrvl.txt b/Documentation/devicetree/bindings/btmrvl.txt
index cdd57680a749..58f964bb0a52 100644
--- a/Documentation/devicetree/bindings/btmrvl.txt
+++ b/Documentation/devicetree/bindings/btmrvl.txt
@@ -10,8 +10,14 @@ Optional properties:
   - btmrvl,cal-data : Calibration data downloaded to the device during
 		      initialization. This is an array of 28 values(u8).
 
+  - btmrvl,gpio-gap : gpio and gap (in msecs) combination to be
+		      configured.
+
 Example:
 
+GPIO pin 13 is configured as a wakeup source and GAP is set to 100 msecs
+in below example.
+
 btmrvl {
 	compatible = "btmrvl,cfgdata";
 
@@ -19,4 +25,5 @@ btmrvl {
 		0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02
 		0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00
 		0x00 0x00 0xf0 0x00>;
+	btmrvl,gpio-gap = <0x0d64>;
 };
diff --git a/drivers/bluetooth/btmrvl_main.c b/drivers/bluetooth/btmrvl_main.c
index 2909bca6b8b6..e43c495e8181 100644
--- a/drivers/bluetooth/btmrvl_main.c
+++ b/drivers/bluetooth/btmrvl_main.c
@@ -492,13 +492,18 @@ static int btmrvl_download_cal_data(struct btmrvl_private *priv,
 	return 0;
 }
 
-static int btmrvl_cal_data_dt(struct btmrvl_private *priv)
+static int btmrvl_check_device_tree(struct btmrvl_private *priv)
 {
 	struct device_node *dt_node;
 	u8 cal_data[BT_CAL_HDR_LEN + BT_CAL_DATA_SIZE];
 	int ret;
+	u32 val;
 
 	for_each_compatible_node(dt_node, NULL, "btmrvl,cfgdata") {
+		ret = of_property_read_u32(dt_node, "btmrvl,gpio-gap", &val);
+		if (!ret)
+			priv->btmrvl_dev.gpio_gap = val;
+
 		ret = of_property_read_u8_array(dt_node, "btmrvl,cal-data",
 						cal_data + BT_CAL_HDR_LEN,
 						BT_CAL_DATA_SIZE);
@@ -523,14 +528,15 @@ static int btmrvl_setup(struct hci_dev *hdev)
 
 	btmrvl_send_module_cfg_cmd(priv, MODULE_BRINGUP_REQ);
 
-	btmrvl_cal_data_dt(priv);
+	priv->btmrvl_dev.gpio_gap = 0xffff;
+
+	btmrvl_check_device_tree(priv);
 
 	btmrvl_pscan_window_reporting(priv, 0x01);
 
 	priv->btmrvl_dev.psmode = 1;
 	btmrvl_enable_ps(priv);
 
-	priv->btmrvl_dev.gpio_gap = 0xffff;
 	btmrvl_send_hscfg_cmd(priv);
 
 	return 0;
-- 
cgit v1.2.3


From 233b36cf1f83ef4a7b3670ffa5eeac1ed9d41889 Mon Sep 17 00:00:00 2001
From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 18 Nov 2014 09:46:59 +0100
Subject: stmmac: update driver documentation

Recently many changes have been done inside the driver
so this patch updates the driver's doc for example reviewing
information for the rx and tx processes that are managed
by napi method, adding new information for missing glue-logic files
etc.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/stmmac.txt | 132 ++++++++++++++++++------------------
 1 file changed, 65 insertions(+), 67 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 2090895b08d4..e655e2453c98 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -1,12 +1,12 @@
        STMicroelectronics 10/100/1000 Synopsys Ethernet driver
 
-Copyright (C) 2007-2013  STMicroelectronics Ltd
+Copyright (C) 2007-2014  STMicroelectronics Ltd
 Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
 
 This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers
 (Synopsys IP blocks).
 
-Currently this network device driver is for all STM embedded MAC/GMAC
+Currently this network device driver is for all STi embedded MAC/GMAC
 (i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XLINX XC2V3000
 FF1152AMT0221 D1215994A VIRTEX FPGA board.
 
@@ -22,6 +22,9 @@ The kernel configuration option is STMMAC_ETH:
  Device Drivers ---> Network device support ---> Ethernet (1000 Mbit) --->
  STMicroelectronics 10/100/1000 Ethernet driver (STMMAC_ETH)
 
+CONFIG_STMMAC_PLATFORM: is to enable the platform driver.
+CONFIG_STMMAC_PCI: is to enable the pci driver.
+
 2) Driver parameters list:
 	debug: message level (0: no output, 16: all);
 	phyaddr: to manually provide the physical address to the PHY device;
@@ -45,10 +48,11 @@ Driver parameters can be also passed in command line by using:
 The xmit method is invoked when the kernel needs to transmit a packet; it sets
 the descriptors in the ring and informs the DMA engine that there is a packet
 ready to be transmitted.
-Once the controller has finished transmitting the packet, an interrupt is
-triggered; So the driver will be able to release the socket buffers.
 By default, the driver sets the NETIF_F_SG bit in the features field of the
-net_device structure enabling the scatter/gather feature.
+net_device structure enabling the scatter-gather feature. This is true on
+chips and configurations where the checksum can be done in hardware.
+Once the controller has finished transmitting the packet, napi will be
+scheduled to release the transmit resources.
 
 4.2) Receive process
 When one or more packets are received, an interrupt happens. The interrupts
@@ -58,20 +62,12 @@ This is based on NAPI so the interrupt handler signals only if there is work
 to be done, and it exits.
 Then the poll method will be scheduled at some future point.
 The incoming packets are stored, by the DMA, in a list of pre-allocated socket
-buffers in order to avoid the memcpy (Zero-copy).
+buffers in order to avoid the memcpy (zero-copy).
 
 4.3) Interrupt Mitigation
 The driver is able to mitigate the number of its DMA interrupts
 using NAPI for the reception on chips older than the 3.50.
 New chips have an HW RX-Watchdog used for this mitigation.
-
-On Tx-side, the mitigation schema is based on a SW timer that calls the
-tx function (stmmac_tx) to reclaim the resource after transmitting the
-frames.
-Also there is another parameter (like a threshold) used to program
-the descriptors avoiding to set the interrupt on completion bit in
-when the frame is sent (xmit).
-
 Mitigation parameters can be tuned by ethtool.
 
 4.4) WOL
@@ -79,7 +75,7 @@ Wake up on Lan feature through Magic and Unicast frames are supported for the
 GMAC core.
 
 4.5) DMA descriptors
-Driver handles both normal and enhanced descriptors. The latter has been only
+Driver handles both normal and alternate descriptors. The latter has been only
 tested on DWC Ether MAC 10/100/1000 Universal version 3.41a and later.
 
 STMMAC supports DMA descriptor to operate both in dual buffer (RING)
@@ -91,9 +87,20 @@ In CHAINED mode each descriptor will have pointer to next descriptor in
 the list, hence creating the explicit chaining in the descriptor itself,
 whereas such explicit chaining is not possible in RING mode.
 
+4.5.1) Extended descriptors
+	The extended descriptors give us information about the Ethernet payload
+	when it is carrying PTP packets or TCP/UDP/ICMP over IP.
+	These are not available on GMAC Synopsys chips older than the 3.50.
+	At probe time the driver will decide if these can be actually used.
+	This support also is mandatory for PTPv2 because the extra descriptors
+	are used for saving the hardware timestamps and Extended Status.
+
 4.6) Ethtool support
-Ethtool is supported. Driver statistics and internal errors can be taken using:
-ethtool -S ethX command. It is possible to dump registers etc.
+Ethtool is supported.
+
+For example, driver statistics (including RMON), internal errors can be taken
+using:
+  # ethtool -S ethX command
 
 4.7) Jumbo and Segmentation Offloading
 Jumbo frames are supported and tested for the GMAC.
@@ -101,12 +108,11 @@ The GSO has been also added but it's performed in software.
 LRO is not supported.
 
 4.8) Physical
-The driver is compatible with PAL to work with PHY and GPHY devices.
+The driver is compatible with Physical Abstraction Layer to be connected with
+PHY and GPHY devices.
 
 4.9) Platform information
-Several driver's information can be passed through the platform
-These are included in the include/linux/stmmac.h header file
-and detailed below as well:
+Several information can be passed through the platform and device-tree.
 
 struct plat_stmmacenet_data {
 	char *phy_bus_name;
@@ -125,15 +131,18 @@ struct plat_stmmacenet_data {
 	int force_sf_dma_mode;
 	int force_thresh_dma_mode;
 	int riwt_off;
+	int max_speed;
+	int maxmtu;
 	void (*fix_mac_speed)(void *priv, unsigned int speed);
 	void (*bus_setup)(void __iomem *ioaddr);
 	void *(*setup)(struct platform_device *pdev);
+	void (*free)(struct platform_device *pdev, void *priv);
 	int (*init)(struct platform_device *pdev, void *priv);
 	void (*exit)(struct platform_device *pdev, void *priv);
 	void *custom_cfg;
 	void *custom_data;
 	void *bsp_priv;
- };
+};
 
 Where:
  o phy_bus_name: phy bus name to attach to the stmmac.
@@ -258,32 +267,43 @@ and the second one, with a real PHY device attached to the bus,
 by using the stmmac_mdio_bus_data structure (to provide the id, the
 reset procedure etc).
 
-4.10) List of source files:
- o Kconfig
- o Makefile
- o stmmac_main.c: main network device driver;
- o stmmac_mdio.c: mdio functions;
- o stmmac_pci: PCI driver;
- o stmmac_platform.c: platform driver
- o stmmac_ethtool.c: ethtool support;
- o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
-		      (only tested on ST40 platforms based);
+Note that, starting from new chips, where it is available the HW capability
+register, many configurations are discovered at run-time for example to
+understand if EEE, HW csum, PTP, enhanced descriptor etc are actually
+available. As strategy adopted in this driver, the information from the HW
+capability register can replace what has been passed from the platform.
+
+4.10) Device-tree support.
+
+Please see the following document:
+	Documentation/devicetree/bindings/net/stmmac.txt
+
+and the stmmac_of_data structure inside the include/linux/stmmac.h header file.
+
+4.11) This is a summary of the content of some relevant files:
+ o stmmac_main.c: to implement the main network device driver;
+ o stmmac_mdio.c: to provide mdio functions;
+ o stmmac_pci: this the PCI driver;
+ o stmmac_platform.c: this the platform driver (OF supported)
+ o stmmac_ethtool.c: to implement the ethtool support;
  o stmmac.h: private driver structure;
  o common.h: common definitions and VFTs;
  o descs.h: descriptor structure definitions;
- o dwmac1000_core.c: GMAC core functions;
- o dwmac1000_dma.c:  dma functions for the GMAC chip;
- o dwmac1000.h: specific header file for the GMAC;
- o dwmac100_core: MAC 100 core and dma code;
- o dwmac100_dma.c: dma functions for the MAC chip;
+ o dwmac1000_core.c: dwmac GiGa core functions;
+ o dwmac1000_dma.c: dma functions for the GMAC chip;
+ o dwmac1000.h: specific header file for the dwmac GiGa;
+ o dwmac100_core: dwmac 100 core code;
+ o dwmac100_dma.c: dma functions for the dwmac 100 chip;
  o dwmac1000.h: specific header file for the MAC;
- o dwmac_lib.c: generic DMA functions shared among chips;
+ o dwmac_lib.c: generic DMA functions;
  o enh_desc.c: functions for handling enhanced descriptors;
  o norm_desc.c: functions for handling normal descriptors;
  o chain_mode.c/ring_mode.c:: functions to manage RING/CHAINED modes;
  o mmc_core.c/mmc.h: Management MAC Counters;
- o stmmac_hwtstamp.c: HW timestamp support for PTP
- o stmmac_ptp.c: PTP 1588 clock
+ o stmmac_hwtstamp.c: HW timestamp support for PTP;
+ o stmmac_ptp.c: PTP 1588 clock;
+ o dwmac-<XXX>.c: these are for the platform glue-logic file; e.g. dwmac-sti.c
+   for STMicroelectronics SoCs.
 
 5) Debug Information
 
@@ -298,23 +318,14 @@ to get statistics: e.g. using: ethtool -S ethX
 (that shows the Management counters (MMC) if supported)
 or sees the MAC/DMA registers: e.g. using: ethtool -d ethX
 
-Compiling the Kernel with CONFIG_DEBUG_FS and enabling the
-STMMAC_DEBUG_FS option the driver will export the following
+Compiling the Kernel with CONFIG_DEBUG_FS the driver will export the following
 debugfs entries:
 
 /sys/kernel/debug/stmmaceth/descriptors_status
   To show the DMA TX/RX descriptor rings
 
-Developer can also use the "debug" module parameter to get
-further debug information.
-
-In the end, there are other macros (that cannot be enabled
-via menuconfig) to turn-on the RX/TX DMA debugging,
-specific MAC core debug printk etc. Others to enable the
-debug in the TX and RX processes.
-All these are only useful during the developing stage
-and should never enabled inside the code for general usage.
-In fact, these can generate an huge amount of debug messages.
+Developer can also use the "debug" module parameter to get further debug
+information (please see: NETIF Msg Level).
 
 6) Energy Efficient Ethernet
 
@@ -337,15 +348,7 @@ To enter in Tx LPI mode the driver needs to have a software timer
 that enable and disable the LPI mode when there is nothing to be
 transmitted.
 
-7) Extended descriptors
-The extended descriptors give us information about the receive Ethernet payload
-when it is carrying PTP packets or TCP/UDP/ICMP over IP.
-These are not available on GMAC Synopsys chips older than the 3.50.
-At probe time the driver will decide if these can be actually used.
-This support also is mandatory for PTPv2 because the extra descriptors 6 and 7
-are used for saving the hardware timestamps.
-
-8) Precision Time Protocol (PTP)
+7) Precision Time Protocol (PTP)
 The driver supports the IEEE 1588-2002, Precision Time Protocol (PTP),
 which enables precise synchronization of clocks in measurement and
 control systems implemented with technologies such as network
@@ -355,7 +358,7 @@ In addition to the basic timestamp features mentioned in IEEE 1588-2002
 Timestamps, new GMAC cores support the advanced timestamp features.
 IEEE 1588-2008 that can be enabled when configure the Kernel.
 
-9) SGMII/RGMII supports
+8) SGMII/RGMII supports
 New GMAC devices provide own way to manage RGMII/SGMII.
 This information is available at run-time by looking at the
 HW capability register. This means that the stmmac can manage
@@ -364,8 +367,3 @@ In fact, the HW provides a subset of extended registers to
 restart the ANE, verify Full/Half duplex mode and Speed.
 Also thanks to these registers it is possible to look at the
 Auto-negotiated Link Parter Ability.
-
-10) TODO:
- o XGMAC is not supported.
- o Complete the TBI & RTBI support.
- o extend VLAN support for 3.70a SYNP GMAC.
-- 
cgit v1.2.3


From 86dc1342bcbb1905b2ac9653a559b303f62bd728 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan@kernel.org>
Date: Wed, 19 Nov 2014 12:59:19 +0100
Subject: net: phy: micrel: add support for clock-mode select to
 KSZ8081/KSZ8091

Micrel KSZ8081 and KSZ8091 PHYs have the RMII Reference Clock Select
bit, which is used to select 25 or 50 MHz clock mode.

Note that on some revisions of the PHY (e.g. KSZ8081RND) the function of
this bit is inverted so that setting it enables 25 rather than 50 MHz
mode. Add a new device-tree property
"micrel,rmii-reference-clock-select-25-mhz" to describe this.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/micrel.txt |  4 ++--
 drivers/net/phy/micrel.c                         | 11 ++++++-----
 2 files changed, 8 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt
index 30062fae5623..a1bab5eaae02 100644
--- a/Documentation/devicetree/bindings/net/micrel.txt
+++ b/Documentation/devicetree/bindings/net/micrel.txt
@@ -22,5 +22,5 @@ Optional properties:
  - clocks, clock-names: contains clocks according to the common clock bindings.
 
               supported clocks:
-	      - KSZ8021, KSZ8031: "rmii-ref": The RMII refence input clock. Used
-		to determine the XI input clock.
+	      - KSZ8021, KSZ8031, KSZ8081, KSZ8091: "rmii-ref": The RMII
+		refence input clock. Used to determine the XI input clock.
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index d2e790cd3651..04fbee846b66 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -102,6 +102,7 @@ static const struct kszphy_type ksz8051_type = {
 static const struct kszphy_type ksz8081_type = {
 	.led_mode_reg		= MII_KSZPHY_CTRL_2,
 	.has_broadcast_disable	= true,
+	.has_rmii_ref_clk_sel	= true,
 };
 
 static int kszphy_extended_write(struct phy_device *phydev,
@@ -548,16 +549,16 @@ static int kszphy_probe(struct phy_device *phydev)
 	clk = devm_clk_get(&phydev->dev, "rmii-ref");
 	if (!IS_ERR(clk)) {
 		unsigned long rate = clk_get_rate(clk);
+		bool rmii_ref_clk_sel_25_mhz;
 
 		priv->rmii_ref_clk_sel = type->has_rmii_ref_clk_sel;
+		rmii_ref_clk_sel_25_mhz = of_property_read_bool(np,
+				"micrel,rmii-reference-clock-select-25-mhz");
 
-		/* FIXME: add support for PHY revisions that have this bit
-		 * inverted (e.g. through new property or based on PHY ID).
-		 */
 		if (rate > 24500000 && rate < 25500000) {
-			priv->rmii_ref_clk_sel_val = false;
+			priv->rmii_ref_clk_sel_val = rmii_ref_clk_sel_25_mhz;
 		} else if (rate > 49500000 && rate < 50500000) {
-			priv->rmii_ref_clk_sel_val = true;
+			priv->rmii_ref_clk_sel_val = !rmii_ref_clk_sel_25_mhz;
 		} else {
 			dev_err(&phydev->dev, "Clock rate out of range: %ld\n", rate);
 			return -EINVAL;
-- 
cgit v1.2.3


From 6d01329444725a5c17cf75ba6c5c0c5e42843613 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan@kernel.org>
Date: Wed, 19 Nov 2014 12:59:20 +0100
Subject: dt/bindings: reformat micrel eth-phy documentation

Reduce indentation of Micrel PHY binding documentations somewhat.

Also fix "reference input clock" typo while at it.

Cc: devicetree@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/micrel.txt | 26 ++++++++++++------------
 1 file changed, 13 insertions(+), 13 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt
index a1bab5eaae02..20a6cac7abc6 100644
--- a/Documentation/devicetree/bindings/net/micrel.txt
+++ b/Documentation/devicetree/bindings/net/micrel.txt
@@ -6,21 +6,21 @@ Optional properties:
 
  - micrel,led-mode : LED mode value to set for PHYs with configurable LEDs.
 
-              Configure the LED mode with single value. The list of PHYs and
-	      the bits that are currently supported:
+	Configure the LED mode with single value. The list of PHYs and the
+	bits that are currently supported:
 
-	      KSZ8001: register 0x1e, bits 15..14
-	      KSZ8041: register 0x1e, bits 15..14
-	      KSZ8021: register 0x1f, bits 5..4
-	      KSZ8031: register 0x1f, bits 5..4
-	      KSZ8051: register 0x1f, bits 5..4
-	      KSZ8081: register 0x1f, bits 5..4
-	      KSZ8091: register 0x1f, bits 5..4
+	KSZ8001: register 0x1e, bits 15..14
+	KSZ8041: register 0x1e, bits 15..14
+	KSZ8021: register 0x1f, bits 5..4
+	KSZ8031: register 0x1f, bits 5..4
+	KSZ8051: register 0x1f, bits 5..4
+	KSZ8081: register 0x1f, bits 5..4
+	KSZ8091: register 0x1f, bits 5..4
 
-              See the respective PHY datasheet for the mode values.
+	See the respective PHY datasheet for the mode values.
 
  - clocks, clock-names: contains clocks according to the common clock bindings.
 
-              supported clocks:
-	      - KSZ8021, KSZ8031, KSZ8081, KSZ8091: "rmii-ref": The RMII
-		refence input clock. Used to determine the XI input clock.
+	supported clocks:
+	- KSZ8021, KSZ8031, KSZ8081, KSZ8091: "rmii-ref": The RMII reference
+	  input clock. Used to determine the XI input clock.
-- 
cgit v1.2.3


From c3a8e1eddd6c140fbaaf957f7da8f2175f79623d Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan@kernel.org>
Date: Wed, 19 Nov 2014 12:59:21 +0100
Subject: dt/bindings: add clock-select function property to micrel phy binding

Add "micrel,rmii-reference-clock-select-25-mhz" to Micrel ethernet PHY
binding documentation.

This property is needed to properly describe some revisions of Micrel
PHYs which has the function of this configuration bit inverted so that
setting it enables 25 MHz rather than 50 MHz clock mode.

Note that a clock reference ("rmii-ref") is still needed to actually
select either mode.

Cc: devicetree@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/micrel.txt | 11 +++++++++++
 1 file changed, 11 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt
index 20a6cac7abc6..87496a8c64ab 100644
--- a/Documentation/devicetree/bindings/net/micrel.txt
+++ b/Documentation/devicetree/bindings/net/micrel.txt
@@ -19,6 +19,17 @@ Optional properties:
 
 	See the respective PHY datasheet for the mode values.
 
+ - micrel,rmii-reference-clock-select-25-mhz: RMII Reference Clock Select
+						bit selects 25 MHz mode
+
+	Setting the RMII Reference Clock Select bit enables 25 MHz rather
+	than 50 MHz clock mode.
+
+	Note that this option in only needed for certain PHY revisions with a
+	non-standard, inverted function of this configuration bit.
+	Specifically, a clock reference ("rmii-ref" below) is always needed to
+	actually select a mode.
+
  - clocks, clock-names: contains clocks according to the common clock bindings.
 
 	supported clocks:
-- 
cgit v1.2.3


From 2ad7bf3638411cb547f2823df08166c13ab04269 Mon Sep 17 00:00:00 2001
From: Mahesh Bandewar <maheshb@google.com>
Date: Sun, 23 Nov 2014 23:07:46 -0800
Subject: ipvlan: Initial check-in of the IPVLAN driver.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.

This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.

The device operates in two different modes and the difference
in these two modes in primarily in the TX side.

(a) L2 mode : In this mode, the device behaves as a L2 device.
TX processing upto L2 happens on the stack of the virtual device
associated with (namespace). Packets are switched after that
into the main device (default-ns) and queued for xmit.

RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.

(b) L3 mode : In this mode, the device behaves like a L3 device.
TX processing upto L3 happens on the stack of the virtual device
associated with (namespace). Packets are switched to the
main-device (default-ns) for the L2 processing. Hence the routing
table of the default-ns will be used in this mode.

RX processins is somewhat similar to the L2 mode except that in
this mode only Unicast packets are delivered to the virtual device
while main-dev will handle all other packets.

The devices can be added using the "ip" command from the iproute2
package -

	ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ipvlan.txt | 107 +++++
 drivers/net/Kconfig                 |  18 +
 drivers/net/Makefile                |   1 +
 drivers/net/ipvlan/Makefile         |   7 +
 drivers/net/ipvlan/ipvlan.h         | 130 ++++++
 drivers/net/ipvlan/ipvlan_core.c    | 607 +++++++++++++++++++++++++++
 drivers/net/ipvlan/ipvlan_main.c    | 789 ++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h           |   4 +
 include/uapi/linux/if_link.h        |  15 +
 9 files changed, 1678 insertions(+)
 create mode 100644 Documentation/networking/ipvlan.txt
 create mode 100644 drivers/net/ipvlan/Makefile
 create mode 100644 drivers/net/ipvlan/ipvlan.h
 create mode 100644 drivers/net/ipvlan/ipvlan_core.c
 create mode 100644 drivers/net/ipvlan/ipvlan_main.c

(limited to 'Documentation')

diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
new file mode 100644
index 000000000000..cf996394e466
--- /dev/null
+++ b/Documentation/networking/ipvlan.txt
@@ -0,0 +1,107 @@
+
+                            IPVLAN Driver HOWTO
+
+Initial Release:
+	Mahesh Bandewar <maheshb AT google.com>
+
+1. Introduction:
+	This is conceptually very similar to the macvlan driver with one major
+exception of using L3 for mux-ing /demux-ing among slaves. This property makes
+the master device share the L2 with it's slave devices. I have developed this
+driver in conjuntion with network namespaces and not sure if there is use case
+outside of it.
+
+
+2. Building and Installation:
+	In order to build the driver, please select the config item CONFIG_IPVLAN.
+The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
+(CONFIG_IPVLAN=m).
+
+
+3. Configuration:
+	There are no module parameters for this driver and it can be configured
+using IProute2/ip utility.
+
+	ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 }
+
+	e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
+
+
+4. Operating modes:
+	IPvlan has two modes of operation - L2 and L3. For a given master device,
+you can select one of these two modes and all slaves on that master will
+operate in the same (selected) mode. The RX mode is almost identical except
+that in L3 mode the slaves wont receive any multicast / broadcast traffic.
+L3 mode is more restrictive since routing is controlled from the other (mostly)
+default namespace.
+
+4.1 L2 mode:
+	In this mode TX processing happens on the stack instance attached to the
+slave device and packets are switched and queued to the master device to send
+out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
+as well.
+
+4.2 L3 mode:
+	In this mode TX processing upto L3 happens on the stack instance attached
+to the slave device and packets are switched to the stack instance of the
+master device for the L2 processing and routing from that instance will be
+used before packets are queued on the outbound device. In this mode the slaves
+will not receive nor can send multicast / broadcast traffic.
+
+
+5. What to choose (macvlan vs. ipvlan)?
+	These two devices are very similar in many regards and the specific use
+case could very well define which device to choose. if one of the following
+situations defines your use case then you can choose to use ipvlan -
+	(a) The Linux host that is connected to the external switch / router has
+policy configured that allows only one mac per port.
+	(b) No of virtual devices created on a master exceed the mac capacity and
+puts the NIC in promiscous mode and degraded performance is a concern.
+	(c) If the slave device is to be put into the hostile / untrusted network
+namespace where L2 on the slave could be changed / misused.
+
+
+6. Example configuration:
+
+  +=============================================================+
+  |  Host: host1                                                |
+  |                                                             |
+  |   +----------------------+      +----------------------+    |
+  |   |   NS:ns0             |      |  NS:ns1              |    |
+  |   |                      |      |                      |    |
+  |   |                      |      |                      |    |
+  |   |        ipvl0         |      |         ipvl1        |    |
+  |   +----------#-----------+      +-----------#----------+    |
+  |              #                              #               |
+  |              ################################               |
+  |                              # eth0                         |
+  +==============================#==============================+
+
+
+	(a) Create two network namespaces - ns0, ns1
+		ip netns add ns0
+		ip netns add ns1
+
+	(b) Create two ipvlan slaves on eth0 (master device)
+		ip link add link eth0 ipvl0 type ipvlan mode l2
+		ip link add link eth0 ipvl1 type ipvlan mode l2
+
+	(c) Assign slaves to the respective network namespaces
+		ip link set dev ipvl0 netns ns0
+		ip link set dev ipvl1 netns ns1
+
+	(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices
+		- For ns0
+			(1) ip netns exec ns0 bash
+			(2) ip link set dev ipvl0 up
+			(3) ip link set dev lo up
+			(4) ip -4 addr add 127.0.0.1 dev lo
+			(5) ip -4 addr add $IPADDR dev ipvl0
+			(6) ip -4 route add default via $ROUTER dev ipvl0
+		- For ns1
+			(1) ip netns exec ns1 bash
+			(2) ip link set dev ipvl1 up
+			(3) ip link set dev lo up
+			(4) ip -4 addr add 127.0.0.1 dev lo
+			(5) ip -4 addr add $IPADDR dev ipvl1
+			(6) ip -4 route add default via $ROUTER dev ipvl1
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f9009be3f307..b6d64f546574 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -145,6 +145,24 @@ config MACVTAP
 	  To compile this driver as a module, choose M here: the module
 	  will be called macvtap.
 
+
+config IPVLAN
+    tristate "IP-VLAN support"
+    ---help---
+      This allows one to create virtual devices off of a main interface
+      and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
+      on packets. All interfaces (including the main interface) share L2
+      making it transparent to the connected L2 switch.
+
+      Ipvlan devices can be added using the "ip" command from the
+      iproute2 package starting with the iproute2-X.Y.ZZ release:
+
+      "ip link add link <main-dev> [ NAME ] type ipvlan"
+
+      To compile this driver as a module, choose M here: the module
+      will be called ipvlan.
+
+
 config VXLAN
        tristate "Virtual eXtensible Local Area Network (VXLAN)"
        depends on INET
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 61aefdd1e173..e25fdd7d905e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -6,6 +6,7 @@
 # Networking Core Drivers
 #
 obj-$(CONFIG_BONDING) += bonding/
+obj-$(CONFIG_IPVLAN) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile
new file mode 100644
index 000000000000..df79910192d6
--- /dev/null
+++ b/drivers/net/ipvlan/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Ethernet Ipvlan driver
+#
+
+obj-$(CONFIG_IPVLAN) += ipvlan.o
+
+ipvlan-objs := ipvlan_core.o ipvlan_main.o
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
new file mode 100644
index 000000000000..ab3e7614ed71
--- /dev/null
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+#ifndef __IPVLAN_H
+#define __IPVLAN_H
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rculist.h>
+#include <linux/notifier.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/if_arp.h>
+#include <linux/if_link.h>
+#include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/inetdevice.h>
+#include <net/rtnetlink.h>
+#include <net/gre.h>
+#include <net/route.h>
+#include <net/addrconf.h>
+
+#define IPVLAN_DRV	"ipvlan"
+#define IPV_DRV_VER	"0.1"
+
+#define IPVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
+#define IPVLAN_HASH_MASK	(IPVLAN_HASH_SIZE - 1)
+
+#define IPVLAN_MAC_FILTER_BITS	8
+#define IPVLAN_MAC_FILTER_SIZE	(1 << IPVLAN_MAC_FILTER_BITS)
+#define IPVLAN_MAC_FILTER_MASK	(IPVLAN_MAC_FILTER_SIZE - 1)
+
+typedef enum {
+	IPVL_IPV6 = 0,
+	IPVL_ICMPV6,
+	IPVL_IPV4,
+	IPVL_ARP,
+} ipvl_hdr_type;
+
+struct ipvl_pcpu_stats {
+	u64			rx_pkts;
+	u64			rx_bytes;
+	u64			rx_mcast;
+	u64			tx_pkts;
+	u64			tx_bytes;
+	struct u64_stats_sync	syncp;
+	u32			rx_errs;
+	u32			tx_drps;
+};
+
+struct ipvl_port;
+
+struct ipvl_dev {
+	struct net_device	*dev;
+	struct list_head	pnode;
+	struct ipvl_port	*port;
+	struct net_device	*phy_dev;
+	struct list_head	addrs;
+	int			ipv4cnt;
+	int			ipv6cnt;
+	struct ipvl_pcpu_stats	*pcpu_stats;
+	DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
+	netdev_features_t	sfeatures;
+	u32			msg_enable;
+	u16			mtu_adj;
+};
+
+struct ipvl_addr {
+	struct ipvl_dev		*master; /* Back pointer to master */
+	union {
+		struct in6_addr	ip6;	 /* IPv6 address on logical interface */
+		struct in_addr	ip4;	 /* IPv4 address on logical interface */
+	} ipu;
+#define ip6addr	ipu.ip6
+#define ip4addr ipu.ip4
+	struct hlist_node	hlnode;  /* Hash-table linkage */
+	struct list_head	anode;   /* logical-interface linkage */
+	struct rcu_head		rcu;
+	ipvl_hdr_type		atype;
+};
+
+struct ipvl_port {
+	struct net_device	*dev;
+	struct hlist_head	hlhead[IPVLAN_HASH_SIZE];
+	struct list_head	ipvlans;
+	struct rcu_head		rcu;
+	int			count;
+	u16			mode;
+};
+
+static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
+{
+	return rcu_dereference(d->rx_handler_data);
+}
+
+static inline struct ipvl_port *ipvlan_port_get_rtnl(const struct net_device *d)
+{
+	return rtnl_dereference(d->rx_handler_data);
+}
+
+static inline bool ipvlan_dev_master(struct net_device *d)
+{
+	return d->priv_flags & IFF_IPVLAN_MASTER;
+}
+
+static inline bool ipvlan_dev_slave(struct net_device *d)
+{
+	return d->priv_flags & IFF_IPVLAN_SLAVE;
+}
+
+void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev);
+void ipvlan_set_port_mode(struct ipvl_port *port, u32 nval);
+void ipvlan_init_secret(void);
+unsigned int ipvlan_mac_hash(const unsigned char *addr);
+rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb);
+int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev);
+void ipvlan_ht_addr_add(struct ipvl_dev *ipvlan, struct ipvl_addr *addr);
+bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
+struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
+					const void *iaddr, bool is_v6);
+void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
+#endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
new file mode 100644
index 000000000000..a14d87783245
--- /dev/null
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -0,0 +1,607 @@
+/* Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+
+#include "ipvlan.h"
+
+static u32 ipvlan_jhash_secret;
+
+void ipvlan_init_secret(void)
+{
+	net_get_random_once(&ipvlan_jhash_secret, sizeof(ipvlan_jhash_secret));
+}
+
+static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
+			    unsigned int len, bool success, bool mcast)
+{
+	if (!ipvlan)
+		return;
+
+	if (likely(success)) {
+		struct ipvl_pcpu_stats *pcptr;
+
+		pcptr = this_cpu_ptr(ipvlan->pcpu_stats);
+		u64_stats_update_begin(&pcptr->syncp);
+		pcptr->rx_pkts++;
+		pcptr->rx_bytes += len;
+		if (mcast)
+			pcptr->rx_mcast++;
+		u64_stats_update_end(&pcptr->syncp);
+	} else {
+		this_cpu_inc(ipvlan->pcpu_stats->rx_errs);
+	}
+}
+
+static u8 ipvlan_get_v6_hash(const void *iaddr)
+{
+	const struct in6_addr *ip6_addr = iaddr;
+
+	return __ipv6_addr_jhash(ip6_addr, ipvlan_jhash_secret) &
+	       IPVLAN_HASH_MASK;
+}
+
+static u8 ipvlan_get_v4_hash(const void *iaddr)
+{
+	const struct in_addr *ip4_addr = iaddr;
+
+	return jhash_1word(ip4_addr->s_addr, ipvlan_jhash_secret) &
+	       IPVLAN_HASH_MASK;
+}
+
+struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
+					const void *iaddr, bool is_v6)
+{
+	struct ipvl_addr *addr;
+	u8 hash;
+
+	hash = is_v6 ? ipvlan_get_v6_hash(iaddr) :
+	       ipvlan_get_v4_hash(iaddr);
+	hlist_for_each_entry_rcu(addr, &port->hlhead[hash], hlnode) {
+		if (is_v6 && addr->atype == IPVL_IPV6 &&
+		    ipv6_addr_equal(&addr->ip6addr, iaddr))
+			return addr;
+		else if (!is_v6 && addr->atype == IPVL_IPV4 &&
+			 addr->ip4addr.s_addr ==
+				((struct in_addr *)iaddr)->s_addr)
+			return addr;
+	}
+	return NULL;
+}
+
+void ipvlan_ht_addr_add(struct ipvl_dev *ipvlan, struct ipvl_addr *addr)
+{
+	struct ipvl_port *port = ipvlan->port;
+	u8 hash;
+
+	hash = (addr->atype == IPVL_IPV6) ?
+	       ipvlan_get_v6_hash(&addr->ip6addr) :
+	       ipvlan_get_v4_hash(&addr->ip4addr);
+	hlist_add_head_rcu(&addr->hlnode, &port->hlhead[hash]);
+}
+
+void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync)
+{
+	hlist_del_rcu(&addr->hlnode);
+	if (sync)
+		synchronize_rcu();
+}
+
+bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6)
+{
+	struct ipvl_port *port = ipvlan->port;
+	struct ipvl_addr *addr;
+
+	list_for_each_entry(addr, &ipvlan->addrs, anode) {
+		if ((is_v6 && addr->atype == IPVL_IPV6 &&
+		    ipv6_addr_equal(&addr->ip6addr, iaddr)) ||
+		    (!is_v6 && addr->atype == IPVL_IPV4 &&
+		    addr->ip4addr.s_addr == ((struct in_addr *)iaddr)->s_addr))
+			return true;
+	}
+
+	if (ipvlan_ht_addr_lookup(port, iaddr, is_v6))
+		return true;
+
+	return false;
+}
+
+static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
+{
+	void *lyr3h = NULL;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_ARP): {
+		struct arphdr *arph;
+
+		if (unlikely(!pskb_may_pull(skb, sizeof(*arph))))
+			return NULL;
+
+		arph = arp_hdr(skb);
+		*type = IPVL_ARP;
+		lyr3h = arph;
+		break;
+	}
+	case htons(ETH_P_IP): {
+		u32 pktlen;
+		struct iphdr *ip4h;
+
+		if (unlikely(!pskb_may_pull(skb, sizeof(*ip4h))))
+			return NULL;
+
+		ip4h = ip_hdr(skb);
+		pktlen = ntohs(ip4h->tot_len);
+		if (ip4h->ihl < 5 || ip4h->version != 4)
+			return NULL;
+		if (skb->len < pktlen || pktlen < (ip4h->ihl * 4))
+			return NULL;
+
+		*type = IPVL_IPV4;
+		lyr3h = ip4h;
+		break;
+	}
+	case htons(ETH_P_IPV6): {
+		struct ipv6hdr *ip6h;
+
+		if (unlikely(!pskb_may_pull(skb, sizeof(*ip6h))))
+			return NULL;
+
+		ip6h = ipv6_hdr(skb);
+		if (ip6h->version != 6)
+			return NULL;
+
+		*type = IPVL_IPV6;
+		lyr3h = ip6h;
+		/* Only Neighbour Solicitation pkts need different treatment */
+		if (ipv6_addr_any(&ip6h->saddr) &&
+		    ip6h->nexthdr == NEXTHDR_ICMP) {
+			*type = IPVL_ICMPV6;
+			lyr3h = ip6h + 1;
+		}
+		break;
+	}
+	default:
+		return NULL;
+	}
+
+	return lyr3h;
+}
+
+unsigned int ipvlan_mac_hash(const unsigned char *addr)
+{
+	u32 hash = jhash_1word(__get_unaligned_cpu32(addr+2),
+			       ipvlan_jhash_secret);
+
+	return hash & IPVLAN_MAC_FILTER_MASK;
+}
+
+static void ipvlan_multicast_frame(struct ipvl_port *port, struct sk_buff *skb,
+				   const struct ipvl_dev *in_dev, bool local)
+{
+	struct ethhdr *eth = eth_hdr(skb);
+	struct ipvl_dev *ipvlan;
+	struct sk_buff *nskb;
+	unsigned int len;
+	unsigned int mac_hash;
+	int ret;
+
+	if (skb->protocol == htons(ETH_P_PAUSE))
+		return;
+
+	list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
+		if (local && (ipvlan == in_dev))
+			continue;
+
+		mac_hash = ipvlan_mac_hash(eth->h_dest);
+		if (!test_bit(mac_hash, ipvlan->mac_filters))
+			continue;
+
+		ret = NET_RX_DROP;
+		len = skb->len + ETH_HLEN;
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			goto mcast_acct;
+
+		if (ether_addr_equal(eth->h_dest, ipvlan->phy_dev->broadcast))
+			nskb->pkt_type = PACKET_BROADCAST;
+		else
+			nskb->pkt_type = PACKET_MULTICAST;
+
+		nskb->dev = ipvlan->dev;
+		if (local)
+			ret = dev_forward_skb(ipvlan->dev, nskb);
+		else
+			ret = netif_rx(nskb);
+mcast_acct:
+		ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, true);
+	}
+
+	/* Locally generated? ...Forward a copy to the main-device as
+	 * well. On the RX side we'll ignore it (wont give it to any
+	 * of the virtual devices.
+	 */
+	if (local) {
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (nskb) {
+			if (ether_addr_equal(eth->h_dest, port->dev->broadcast))
+				nskb->pkt_type = PACKET_BROADCAST;
+			else
+				nskb->pkt_type = PACKET_MULTICAST;
+
+			dev_forward_skb(port->dev, nskb);
+		}
+	}
+}
+
+static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct sk_buff *skb,
+			    bool local)
+{
+	struct ipvl_dev *ipvlan = addr->master;
+	struct net_device *dev = ipvlan->dev;
+	unsigned int len;
+	rx_handler_result_t ret = RX_HANDLER_CONSUMED;
+	bool success = false;
+
+	len = skb->len + ETH_HLEN;
+	if (unlikely(!(dev->flags & IFF_UP))) {
+		kfree_skb(skb);
+		goto out;
+	}
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (!skb)
+		goto out;
+
+	skb->dev = dev;
+	skb->pkt_type = PACKET_HOST;
+
+	if (local) {
+		if (dev_forward_skb(ipvlan->dev, skb) == NET_RX_SUCCESS)
+			success = true;
+	} else {
+		ret = RX_HANDLER_ANOTHER;
+		success = true;
+	}
+
+out:
+	ipvlan_count_rx(ipvlan, len, success, false);
+	return ret;
+}
+
+static struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port,
+					    void *lyr3h, int addr_type,
+					    bool use_dest)
+{
+	struct ipvl_addr *addr = NULL;
+
+	if (addr_type == IPVL_IPV6) {
+		struct ipv6hdr *ip6h;
+		struct in6_addr *i6addr;
+
+		ip6h = (struct ipv6hdr *)lyr3h;
+		i6addr = use_dest ? &ip6h->daddr : &ip6h->saddr;
+		addr = ipvlan_ht_addr_lookup(port, i6addr, true);
+	} else if (addr_type == IPVL_ICMPV6) {
+		struct nd_msg *ndmh;
+		struct in6_addr *i6addr;
+
+		/* Make sure that the NeighborSolicitation ICMPv6 packets
+		 * are handled to avoid DAD issue.
+		 */
+		ndmh = (struct nd_msg *)lyr3h;
+		if (ndmh->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) {
+			i6addr = &ndmh->target;
+			addr = ipvlan_ht_addr_lookup(port, i6addr, true);
+		}
+	} else if (addr_type == IPVL_IPV4) {
+		struct iphdr *ip4h;
+		__be32 *i4addr;
+
+		ip4h = (struct iphdr *)lyr3h;
+		i4addr = use_dest ? &ip4h->daddr : &ip4h->saddr;
+		addr = ipvlan_ht_addr_lookup(port, i4addr, false);
+	} else if (addr_type == IPVL_ARP) {
+		struct arphdr *arph;
+		unsigned char *arp_ptr;
+		__be32 dip;
+
+		arph = (struct arphdr *)lyr3h;
+		arp_ptr = (unsigned char *)(arph + 1);
+		if (use_dest)
+			arp_ptr += (2 * port->dev->addr_len) + 4;
+		else
+			arp_ptr += port->dev->addr_len;
+
+		memcpy(&dip, arp_ptr, 4);
+		addr = ipvlan_ht_addr_lookup(port, &dip, false);
+	}
+
+	return addr;
+}
+
+static int ipvlan_process_v4_outbound(struct sk_buff *skb)
+{
+	const struct iphdr *ip4h = ip_hdr(skb);
+	struct net_device *dev = skb->dev;
+	struct rtable *rt;
+	int err, ret = NET_XMIT_DROP;
+	struct flowi4 fl4 = {
+		.flowi4_oif = dev->iflink,
+		.flowi4_tos = RT_TOS(ip4h->tos),
+		.flowi4_flags = FLOWI_FLAG_ANYSRC,
+		.daddr = ip4h->daddr,
+		.saddr = ip4h->saddr,
+	};
+
+	rt = ip_route_output_flow(dev_net(dev), &fl4, NULL);
+	if (IS_ERR(rt))
+		goto err;
+
+	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+		ip_rt_put(rt);
+		goto err;
+	}
+	skb_dst_drop(skb);
+	skb_dst_set(skb, &rt->dst);
+	err = ip_local_out(skb);
+	if (unlikely(net_xmit_eval(err)))
+		dev->stats.tx_errors++;
+	else
+		ret = NET_XMIT_SUCCESS;
+	goto out;
+err:
+	dev->stats.tx_errors++;
+	kfree_skb(skb);
+out:
+	return ret;
+}
+
+static int ipvlan_process_v6_outbound(struct sk_buff *skb)
+{
+	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct net_device *dev = skb->dev;
+	struct dst_entry *dst;
+	int err, ret = NET_XMIT_DROP;
+	struct flowi6 fl6 = {
+		.flowi6_iif = skb->dev->ifindex,
+		.daddr = ip6h->daddr,
+		.saddr = ip6h->saddr,
+		.flowi6_flags = FLOWI_FLAG_ANYSRC,
+		.flowlabel = ip6_flowinfo(ip6h),
+		.flowi6_mark = skb->mark,
+		.flowi6_proto = ip6h->nexthdr,
+	};
+
+	dst = ip6_route_output(dev_net(dev), NULL, &fl6);
+	if (IS_ERR(dst))
+		goto err;
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, dst);
+	err = ip6_local_out(skb);
+	if (unlikely(net_xmit_eval(err)))
+		dev->stats.tx_errors++;
+	else
+		ret = NET_XMIT_SUCCESS;
+	goto out;
+err:
+	dev->stats.tx_errors++;
+	kfree_skb(skb);
+out:
+	return ret;
+}
+
+static int ipvlan_process_outbound(struct sk_buff *skb,
+				   const struct ipvl_dev *ipvlan)
+{
+	struct ethhdr *ethh = eth_hdr(skb);
+	int ret = NET_XMIT_DROP;
+
+	/* In this mode we dont care about multicast and broadcast traffic */
+	if (is_multicast_ether_addr(ethh->h_dest)) {
+		pr_warn_ratelimited("Dropped {multi|broad}cast of type= [%x]\n",
+				    ntohs(skb->protocol));
+		kfree_skb(skb);
+		goto out;
+	}
+
+	/* The ipvlan is a pseudo-L2 device, so the packets that we receive
+	 * will have L2; which need to discarded and processed further
+	 * in the net-ns of the main-device.
+	 */
+	if (skb_mac_header_was_set(skb)) {
+		skb_pull(skb, sizeof(*ethh));
+		skb->mac_header = (typeof(skb->mac_header))~0U;
+		skb_reset_network_header(skb);
+	}
+
+	if (skb->protocol == htons(ETH_P_IPV6))
+		ret = ipvlan_process_v6_outbound(skb);
+	else if (skb->protocol == htons(ETH_P_IP))
+		ret = ipvlan_process_v4_outbound(skb);
+	else {
+		pr_warn_ratelimited("Dropped outbound packet type=%x\n",
+				    ntohs(skb->protocol));
+		kfree_skb(skb);
+	}
+out:
+	return ret;
+}
+
+static int ipvlan_xmit_mode_l3(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+	void *lyr3h;
+	struct ipvl_addr *addr;
+	int addr_type;
+
+	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
+	if (!lyr3h)
+		goto out;
+
+	addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
+	if (addr)
+		return ipvlan_rcv_frame(addr, skb, true);
+
+out:
+	skb->dev = ipvlan->phy_dev;
+	return ipvlan_process_outbound(skb, ipvlan);
+}
+
+static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ethhdr *eth = eth_hdr(skb);
+	struct ipvl_addr *addr;
+	void *lyr3h;
+	int addr_type;
+
+	if (ether_addr_equal(eth->h_dest, eth->h_source)) {
+		lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
+		if (lyr3h) {
+			addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
+			if (addr)
+				return ipvlan_rcv_frame(addr, skb, true);
+		}
+		skb = skb_share_check(skb, GFP_ATOMIC);
+		if (!skb)
+			return NET_XMIT_DROP;
+
+		/* Packet definitely does not belong to any of the
+		 * virtual devices, but the dest is local. So forward
+		 * the skb for the main-dev. At the RX side we just return
+		 * RX_PASS for it to be processed further on the stack.
+		 */
+		return dev_forward_skb(ipvlan->phy_dev, skb);
+
+	} else if (is_multicast_ether_addr(eth->h_dest)) {
+		u8 ip_summed = skb->ip_summed;
+
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		ipvlan_multicast_frame(ipvlan->port, skb, ipvlan, true);
+		skb->ip_summed = ip_summed;
+	}
+
+	skb->dev = ipvlan->phy_dev;
+	return dev_queue_xmit(skb);
+}
+
+int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_port *port = ipvlan_port_get_rcu(ipvlan->phy_dev);
+
+	if (!port)
+		goto out;
+
+	if (unlikely(!pskb_may_pull(skb, sizeof(struct ethhdr))))
+		goto out;
+
+	switch(port->mode) {
+	case IPVLAN_MODE_L2:
+		return ipvlan_xmit_mode_l2(skb, dev);
+	case IPVLAN_MODE_L3:
+		return ipvlan_xmit_mode_l3(skb, dev);
+	}
+
+	/* Should not reach here */
+	WARN_ONCE(true, "ipvlan_queue_xmit() called for mode = [%hx]\n",
+			  port->mode);
+out:
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+}
+
+static bool ipvlan_external_frame(struct sk_buff *skb, struct ipvl_port *port)
+{
+	struct ethhdr *eth = eth_hdr(skb);
+	struct ipvl_addr *addr;
+	void *lyr3h;
+	int addr_type;
+
+	if (ether_addr_equal(eth->h_source, skb->dev->dev_addr)) {
+		lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
+		if (!lyr3h)
+			return true;
+
+		addr = ipvlan_addr_lookup(port, lyr3h, addr_type, false);
+		if (addr)
+			return false;
+	}
+
+	return true;
+}
+
+static rx_handler_result_t ipvlan_handle_mode_l3(struct sk_buff **pskb,
+						 struct ipvl_port *port)
+{
+	void *lyr3h;
+	int addr_type;
+	struct ipvl_addr *addr;
+	struct sk_buff *skb = *pskb;
+	rx_handler_result_t ret = RX_HANDLER_PASS;
+
+	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
+	if (!lyr3h)
+		goto out;
+
+	addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true);
+	if (addr)
+		ret = ipvlan_rcv_frame(addr, skb, false);
+
+out:
+	return ret;
+}
+
+static rx_handler_result_t ipvlan_handle_mode_l2(struct sk_buff **pskb,
+						 struct ipvl_port *port)
+{
+	struct sk_buff *skb = *pskb;
+	struct ethhdr *eth = eth_hdr(skb);
+	rx_handler_result_t ret = RX_HANDLER_PASS;
+	void *lyr3h;
+	int addr_type;
+
+	if (is_multicast_ether_addr(eth->h_dest)) {
+		if (ipvlan_external_frame(skb, port))
+			ipvlan_multicast_frame(port, skb, NULL, false);
+	} else {
+		struct ipvl_addr *addr;
+
+		lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
+		if (!lyr3h)
+			return ret;
+
+		addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true);
+		if (addr)
+			ret = ipvlan_rcv_frame(addr, skb, false);
+	}
+
+	return ret;
+}
+
+rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
+{
+	struct sk_buff *skb = *pskb;
+	struct ipvl_port *port = ipvlan_port_get_rcu(skb->dev);
+
+	if (!port)
+		return RX_HANDLER_PASS;
+
+	switch (port->mode) {
+	case IPVLAN_MODE_L2:
+		return ipvlan_handle_mode_l2(pskb, port);
+	case IPVLAN_MODE_L3:
+		return ipvlan_handle_mode_l3(pskb, port);
+	}
+
+	/* Should not reach here */
+	WARN_ONCE(true, "ipvlan_handle_frame() called for mode = [%hx]\n",
+			  port->mode);
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
new file mode 100644
index 000000000000..c3df84bd2857
--- /dev/null
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -0,0 +1,789 @@
+/* Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+
+#include "ipvlan.h"
+
+void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
+{
+	ipvlan->dev->mtu = dev->mtu - ipvlan->mtu_adj;
+}
+
+void ipvlan_set_port_mode(struct ipvl_port *port, u32 nval)
+{
+	struct ipvl_dev *ipvlan;
+
+	if (port->mode != nval) {
+		list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
+			if (nval == IPVLAN_MODE_L3)
+				ipvlan->dev->flags |= IFF_NOARP;
+			else
+				ipvlan->dev->flags &= ~IFF_NOARP;
+		}
+		port->mode = nval;
+	}
+}
+
+static int ipvlan_port_create(struct net_device *dev)
+{
+	struct ipvl_port *port;
+	int err, idx;
+
+	if (dev->type != ARPHRD_ETHER || dev->flags & IFF_LOOPBACK) {
+		netdev_err(dev, "Master is either lo or non-ether device\n");
+		return -EINVAL;
+	}
+	port = kzalloc(sizeof(struct ipvl_port), GFP_KERNEL);
+	if (!port)
+		return -ENOMEM;
+
+	port->dev = dev;
+	port->mode = IPVLAN_MODE_L3;
+	INIT_LIST_HEAD(&port->ipvlans);
+	for (idx = 0; idx < IPVLAN_HASH_SIZE; idx++)
+		INIT_HLIST_HEAD(&port->hlhead[idx]);
+
+	err = netdev_rx_handler_register(dev, ipvlan_handle_frame, port);
+	if (err)
+		goto err;
+
+	dev->priv_flags |= IFF_IPVLAN_MASTER;
+	return 0;
+
+err:
+	kfree_rcu(port, rcu);
+	return err;
+}
+
+static void ipvlan_port_destroy(struct net_device *dev)
+{
+	struct ipvl_port *port = ipvlan_port_get_rtnl(dev);
+
+	dev->priv_flags &= ~IFF_IPVLAN_MASTER;
+	netdev_rx_handler_unregister(dev);
+	kfree_rcu(port, rcu);
+}
+
+/* ipvlan network devices have devices nesting below it and are a special
+ * "super class" of normal network devices; split their locks off into a
+ * separate class since they always nest.
+ */
+static struct lock_class_key ipvlan_netdev_xmit_lock_key;
+static struct lock_class_key ipvlan_netdev_addr_lock_key;
+
+#define IPVLAN_FEATURES \
+	(NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
+	 NETIF_F_GSO | NETIF_F_TSO | NETIF_F_UFO | NETIF_F_GSO_ROBUST | \
+	 NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
+	 NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
+
+#define IPVLAN_STATE_MASK \
+	((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))
+
+static void ipvlan_set_lockdep_class_one(struct net_device *dev,
+					 struct netdev_queue *txq,
+					 void *_unused)
+{
+	lockdep_set_class(&txq->_xmit_lock, &ipvlan_netdev_xmit_lock_key);
+}
+
+static void ipvlan_set_lockdep_class(struct net_device *dev)
+{
+	lockdep_set_class(&dev->addr_list_lock, &ipvlan_netdev_addr_lock_key);
+	netdev_for_each_tx_queue(dev, ipvlan_set_lockdep_class_one, NULL);
+}
+
+static int ipvlan_init(struct net_device *dev)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	const struct net_device *phy_dev = ipvlan->phy_dev;
+
+	dev->state = (dev->state & ~IPVLAN_STATE_MASK) |
+		     (phy_dev->state & IPVLAN_STATE_MASK);
+	dev->features = phy_dev->features & IPVLAN_FEATURES;
+	dev->features |= NETIF_F_LLTX;
+	dev->gso_max_size = phy_dev->gso_max_size;
+	dev->iflink = phy_dev->ifindex;
+	dev->hard_header_len = phy_dev->hard_header_len;
+
+	ipvlan_set_lockdep_class(dev);
+
+	ipvlan->pcpu_stats = alloc_percpu(struct ipvl_pcpu_stats);
+	if (!ipvlan->pcpu_stats)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void ipvlan_uninit(struct net_device *dev)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_port *port = ipvlan->port;
+
+	if (ipvlan->pcpu_stats)
+		free_percpu(ipvlan->pcpu_stats);
+
+	port->count -= 1;
+	if (!port->count)
+		ipvlan_port_destroy(port->dev);
+}
+
+static int ipvlan_open(struct net_device *dev)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct net_device *phy_dev = ipvlan->phy_dev;
+	struct ipvl_addr *addr;
+
+	if (ipvlan->port->mode == IPVLAN_MODE_L3)
+		dev->flags |= IFF_NOARP;
+	else
+		dev->flags &= ~IFF_NOARP;
+
+	if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
+		list_for_each_entry(addr, &ipvlan->addrs, anode)
+			ipvlan_ht_addr_add(ipvlan, addr);
+	}
+	return dev_uc_add(phy_dev, phy_dev->dev_addr);
+}
+
+static int ipvlan_stop(struct net_device *dev)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct net_device *phy_dev = ipvlan->phy_dev;
+	struct ipvl_addr *addr;
+
+	dev_uc_unsync(phy_dev, dev);
+	dev_mc_unsync(phy_dev, dev);
+
+	dev_uc_del(phy_dev, phy_dev->dev_addr);
+
+	if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
+		list_for_each_entry(addr, &ipvlan->addrs, anode)
+			ipvlan_ht_addr_del(addr, !dev->dismantle);
+	}
+	return 0;
+}
+
+netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+	int skblen = skb->len;
+	int ret;
+
+	ret = ipvlan_queue_xmit(skb, dev);
+	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
+		struct ipvl_pcpu_stats *pcptr;
+
+		pcptr = this_cpu_ptr(ipvlan->pcpu_stats);
+
+		u64_stats_update_begin(&pcptr->syncp);
+		pcptr->tx_pkts++;
+		pcptr->tx_bytes += skblen;
+		u64_stats_update_end(&pcptr->syncp);
+	} else {
+		this_cpu_inc(ipvlan->pcpu_stats->tx_drps);
+	}
+	return ret;
+}
+
+static netdev_features_t ipvlan_fix_features(struct net_device *dev,
+					     netdev_features_t features)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+	return features & (ipvlan->sfeatures | ~IPVLAN_FEATURES);
+}
+
+static void ipvlan_change_rx_flags(struct net_device *dev, int change)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct net_device *phy_dev = ipvlan->phy_dev;
+
+	if (change & IFF_ALLMULTI)
+		dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
+}
+
+static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
+{
+	struct net_device *dev = ipvlan->dev;
+	unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
+
+	if (set && !test_bit(hashbit, ipvlan->mac_filters))
+		__set_bit(hashbit, ipvlan->mac_filters);
+	else if (!set && test_bit(hashbit, ipvlan->mac_filters))
+		__clear_bit(hashbit, ipvlan->mac_filters);
+}
+
+static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+	if (dev->flags & (IFF_PROMISC | IFF_ALLMULTI)) {
+		bitmap_fill(ipvlan->mac_filters, IPVLAN_MAC_FILTER_SIZE);
+	} else {
+		struct netdev_hw_addr *ha;
+		DECLARE_BITMAP(mc_filters, IPVLAN_MAC_FILTER_SIZE);
+
+		bitmap_zero(mc_filters, IPVLAN_MAC_FILTER_SIZE);
+		netdev_for_each_mc_addr(ha, dev)
+			__set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
+
+		bitmap_copy(ipvlan->mac_filters, mc_filters,
+			    IPVLAN_MAC_FILTER_SIZE);
+	}
+	dev_uc_sync(ipvlan->phy_dev, dev);
+	dev_mc_sync(ipvlan->phy_dev, dev);
+}
+
+static struct rtnl_link_stats64 *ipvlan_get_stats64(struct net_device *dev,
+						    struct rtnl_link_stats64 *s)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+	if (ipvlan->pcpu_stats) {
+		struct ipvl_pcpu_stats *pcptr;
+		u64 rx_pkts, rx_bytes, rx_mcast, tx_pkts, tx_bytes;
+		u32 rx_errs = 0, tx_drps = 0;
+		u32 strt;
+		int idx;
+
+		for_each_possible_cpu(idx) {
+			pcptr = per_cpu_ptr(ipvlan->pcpu_stats, idx);
+			do {
+				strt= u64_stats_fetch_begin_irq(&pcptr->syncp);
+				rx_pkts = pcptr->rx_pkts;
+				rx_bytes = pcptr->rx_bytes;
+				rx_mcast = pcptr->rx_mcast;
+				tx_pkts = pcptr->tx_pkts;
+				tx_bytes = pcptr->tx_bytes;
+			} while (u64_stats_fetch_retry_irq(&pcptr->syncp,
+							   strt));
+
+			s->rx_packets += rx_pkts;
+			s->rx_bytes += rx_bytes;
+			s->multicast += rx_mcast;
+			s->tx_packets += tx_pkts;
+			s->tx_bytes += tx_bytes;
+
+			/* u32 values are updated without syncp protection. */
+			rx_errs += pcptr->rx_errs;
+			tx_drps += pcptr->tx_drps;
+		}
+		s->rx_errors = rx_errs;
+		s->rx_dropped = rx_errs;
+		s->tx_dropped = tx_drps;
+	}
+	return s;
+}
+
+static int ipvlan_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct net_device *phy_dev = ipvlan->phy_dev;
+
+	return vlan_vid_add(phy_dev, proto, vid);
+}
+
+static int ipvlan_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
+				   u16 vid)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct net_device *phy_dev = ipvlan->phy_dev;
+
+	vlan_vid_del(phy_dev, proto, vid);
+	return 0;
+}
+
+static const struct net_device_ops ipvlan_netdev_ops = {
+	.ndo_init		= ipvlan_init,
+	.ndo_uninit		= ipvlan_uninit,
+	.ndo_open		= ipvlan_open,
+	.ndo_stop		= ipvlan_stop,
+	.ndo_start_xmit		= ipvlan_start_xmit,
+	.ndo_fix_features	= ipvlan_fix_features,
+	.ndo_change_rx_flags	= ipvlan_change_rx_flags,
+	.ndo_set_rx_mode	= ipvlan_set_multicast_mac_filter,
+	.ndo_get_stats64	= ipvlan_get_stats64,
+	.ndo_vlan_rx_add_vid	= ipvlan_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid	= ipvlan_vlan_rx_kill_vid,
+};
+
+static int ipvlan_hard_header(struct sk_buff *skb, struct net_device *dev,
+			      unsigned short type, const void *daddr,
+			      const void *saddr, unsigned len)
+{
+	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct net_device *phy_dev = ipvlan->phy_dev;
+
+	/* TODO Probably use a different field than dev_addr so that the
+	 * mac-address on the virtual device is portable and can be carried
+	 * while the packets use the mac-addr on the physical device.
+	 */
+	return dev_hard_header(skb, phy_dev, type, daddr,
+			       saddr ? : dev->dev_addr, len);
+}
+
+static const struct header_ops ipvlan_header_ops = {
+	.create  	= ipvlan_hard_header,
+	.rebuild	= eth_rebuild_header,
+	.parse		= eth_header_parse,
+	.cache		= eth_header_cache,
+	.cache_update	= eth_header_cache_update,
+};
+
+static int ipvlan_ethtool_get_settings(struct net_device *dev,
+				       struct ethtool_cmd *cmd)
+{
+	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+	return __ethtool_get_settings(ipvlan->phy_dev, cmd);
+}
+
+static void ipvlan_ethtool_get_drvinfo(struct net_device *dev,
+				       struct ethtool_drvinfo *drvinfo)
+{
+	strlcpy(drvinfo->driver, IPVLAN_DRV, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, IPV_DRV_VER, sizeof(drvinfo->version));
+}
+
+static u32 ipvlan_ethtool_get_msglevel(struct net_device *dev)
+{
+	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+	return ipvlan->msg_enable;
+}
+
+static void ipvlan_ethtool_set_msglevel(struct net_device *dev, u32 value)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+	ipvlan->msg_enable = value;
+}
+
+static const struct ethtool_ops ipvlan_ethtool_ops = {
+	.get_link	= ethtool_op_get_link,
+	.get_settings	= ipvlan_ethtool_get_settings,
+	.get_drvinfo	= ipvlan_ethtool_get_drvinfo,
+	.get_msglevel	= ipvlan_ethtool_get_msglevel,
+	.set_msglevel	= ipvlan_ethtool_set_msglevel,
+};
+
+static int ipvlan_nl_changelink(struct net_device *dev,
+				struct nlattr *tb[], struct nlattr *data[])
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_port *port = ipvlan_port_get_rtnl(ipvlan->phy_dev);
+
+	if (data && data[IFLA_IPVLAN_MODE]) {
+		u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
+
+		ipvlan_set_port_mode(port, nmode);
+	}
+	return 0;
+}
+
+static size_t ipvlan_nl_getsize(const struct net_device *dev)
+{
+	return (0
+		+ nla_total_size(2) /* IFLA_IPVLAN_MODE */
+		);
+}
+
+static int ipvlan_nl_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+	if (data && data[IFLA_IPVLAN_MODE]) {
+		u16 mode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
+
+		if (mode < IPVLAN_MODE_L2 || mode >= IPVLAN_MODE_MAX)
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static int ipvlan_nl_fillinfo(struct sk_buff *skb,
+			      const struct net_device *dev)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_port *port = ipvlan_port_get_rtnl(ipvlan->phy_dev);
+	int ret = -EINVAL;
+
+	if (!port)
+		goto err;
+
+	ret = -EMSGSIZE;
+	if (nla_put_u16(skb, IFLA_IPVLAN_MODE, port->mode))
+		goto err;
+
+	return 0;
+
+err:
+	return ret;
+}
+
+static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
+			   struct nlattr *tb[], struct nlattr *data[])
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_port *port;
+	struct net_device *phy_dev;
+	int err;
+
+	if (!tb[IFLA_LINK])
+		return -EINVAL;
+
+	phy_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+	if (!phy_dev)
+		return -ENODEV;
+
+	if (ipvlan_dev_slave(phy_dev)) {
+		struct ipvl_dev *tmp = netdev_priv(phy_dev);
+
+		phy_dev = tmp->phy_dev;
+	} else if (!ipvlan_dev_master(phy_dev)) {
+		err = ipvlan_port_create(phy_dev);
+		if (err < 0)
+			return err;
+	}
+
+	port = ipvlan_port_get_rtnl(phy_dev);
+	if (data && data[IFLA_IPVLAN_MODE])
+		port->mode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
+
+	ipvlan->phy_dev = phy_dev;
+	ipvlan->dev = dev;
+	ipvlan->port = port;
+	ipvlan->sfeatures = IPVLAN_FEATURES;
+	INIT_LIST_HEAD(&ipvlan->addrs);
+	ipvlan->ipv4cnt = 0;
+	ipvlan->ipv6cnt = 0;
+
+	/* TODO Probably put random address here to be presented to the
+	 * world but keep using the physical-dev address for the outgoing
+	 * packets.
+	 */
+	memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN);
+
+	dev->priv_flags |= IFF_IPVLAN_SLAVE;
+
+	port->count += 1;
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto ipvlan_destroy_port;
+
+	err = netdev_upper_dev_link(phy_dev, dev);
+	if (err)
+		goto ipvlan_destroy_port;
+
+	list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
+	netif_stacked_transfer_operstate(phy_dev, dev);
+	return 0;
+
+ipvlan_destroy_port:
+	port->count -= 1;
+	if (!port->count)
+		ipvlan_port_destroy(phy_dev);
+
+	return err;
+}
+
+static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
+{
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_addr *addr, *next;
+
+	if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
+		list_for_each_entry_safe(addr, next, &ipvlan->addrs, anode) {
+			ipvlan_ht_addr_del(addr, !dev->dismantle);
+			list_del_rcu(&addr->anode);
+		}
+	}
+	list_del_rcu(&ipvlan->pnode);
+	unregister_netdevice_queue(dev, head);
+	netdev_upper_dev_unlink(ipvlan->phy_dev, dev);
+}
+
+static void ipvlan_link_setup(struct net_device *dev)
+{
+	ether_setup(dev);
+
+	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
+	dev->priv_flags |= IFF_UNICAST_FLT;
+	dev->netdev_ops = &ipvlan_netdev_ops;
+	dev->destructor = free_netdev;
+	dev->header_ops = &ipvlan_header_ops;
+	dev->ethtool_ops = &ipvlan_ethtool_ops;
+	dev->tx_queue_len = 0;
+}
+
+static const struct nla_policy ipvlan_nl_policy[IFLA_IPVLAN_MAX + 1] =
+{
+	[IFLA_IPVLAN_MODE] = { .type = NLA_U16 },
+};
+
+static struct rtnl_link_ops ipvlan_link_ops = {
+	.kind		= "ipvlan",
+	.priv_size	= sizeof(struct ipvl_dev),
+
+	.get_size	= ipvlan_nl_getsize,
+	.policy		= ipvlan_nl_policy,
+	.validate	= ipvlan_nl_validate,
+	.fill_info	= ipvlan_nl_fillinfo,
+	.changelink	= ipvlan_nl_changelink,
+	.maxtype	= IFLA_IPVLAN_MAX,
+
+	.setup		= ipvlan_link_setup,
+	.newlink	= ipvlan_link_new,
+	.dellink	= ipvlan_link_delete,
+};
+
+int ipvlan_link_register(struct rtnl_link_ops *ops)
+{
+	return rtnl_link_register(ops);
+}
+
+static int ipvlan_device_event(struct notifier_block *unused,
+			       unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct ipvl_dev *ipvlan, *next;
+	struct ipvl_port *port;
+	LIST_HEAD(lst_kill);
+
+	if (!ipvlan_dev_master(dev))
+		return NOTIFY_DONE;
+
+	port = ipvlan_port_get_rtnl(dev);
+
+	switch (event) {
+	case NETDEV_CHANGE:
+		list_for_each_entry(ipvlan, &port->ipvlans, pnode)
+			netif_stacked_transfer_operstate(ipvlan->phy_dev,
+							 ipvlan->dev);
+		break;
+
+	case NETDEV_UNREGISTER:
+		if (dev->reg_state != NETREG_UNREGISTERING)
+			break;
+
+		list_for_each_entry_safe(ipvlan, next, &port->ipvlans,
+					 pnode)
+			ipvlan->dev->rtnl_link_ops->dellink(ipvlan->dev,
+							    &lst_kill);
+		unregister_netdevice_many(&lst_kill);
+		break;
+
+	case NETDEV_FEAT_CHANGE:
+		list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
+			ipvlan->dev->features = dev->features & IPVLAN_FEATURES;
+			ipvlan->dev->gso_max_size = dev->gso_max_size;
+			netdev_features_change(ipvlan->dev);
+		}
+		break;
+
+	case NETDEV_CHANGEMTU:
+		list_for_each_entry(ipvlan, &port->ipvlans, pnode)
+			ipvlan_adjust_mtu(ipvlan, dev);
+		break;
+
+	case NETDEV_PRE_TYPE_CHANGE:
+		/* Forbid underlying device to change its type. */
+		return NOTIFY_BAD;
+	}
+	return NOTIFY_DONE;
+}
+
+static int ipvlan_add_addr6(struct ipvl_dev *ipvlan, struct in6_addr *ip6_addr)
+{
+	struct ipvl_addr *addr;
+
+	if (ipvlan_addr_busy(ipvlan, ip6_addr, true)) {
+		netif_err(ipvlan, ifup, ipvlan->dev,
+			  "Failed to add IPv6=%pI6c addr for %s intf\n",
+			  ip6_addr, ipvlan->dev->name);
+		return -EINVAL;
+	}
+	addr = kzalloc(sizeof(struct ipvl_addr), GFP_ATOMIC);
+	if (!addr)
+		return -ENOMEM;
+
+	addr->master = ipvlan;
+	memcpy(&addr->ip6addr, ip6_addr, sizeof(struct in6_addr));
+	addr->atype = IPVL_IPV6;
+	list_add_tail_rcu(&addr->anode, &ipvlan->addrs);
+	ipvlan->ipv6cnt++;
+	ipvlan_ht_addr_add(ipvlan, addr);
+
+	return 0;
+}
+
+static void ipvlan_del_addr6(struct ipvl_dev *ipvlan, struct in6_addr *ip6_addr)
+{
+	struct ipvl_addr *addr;
+
+	addr = ipvlan_ht_addr_lookup(ipvlan->port, ip6_addr, true);
+	if (!addr)
+		return;
+
+	ipvlan_ht_addr_del(addr, true);
+	list_del_rcu(&addr->anode);
+	ipvlan->ipv6cnt--;
+	WARN_ON(ipvlan->ipv6cnt < 0);
+	kfree_rcu(addr, rcu);
+
+	return;
+}
+
+static int ipvlan_addr6_event(struct notifier_block *unused,
+			      unsigned long event, void *ptr)
+{
+	struct inet6_ifaddr *if6 = (struct inet6_ifaddr *)ptr;
+	struct net_device *dev = (struct net_device *)if6->idev->dev;
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+	if (!ipvlan_dev_slave(dev))
+		return NOTIFY_DONE;
+
+	if (!ipvlan || !ipvlan->port)
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_UP:
+		if (ipvlan_add_addr6(ipvlan, &if6->addr))
+			return NOTIFY_BAD;
+		break;
+
+	case NETDEV_DOWN:
+		ipvlan_del_addr6(ipvlan, &if6->addr);
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static int ipvlan_add_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
+{
+	struct ipvl_addr *addr;
+
+	if (ipvlan_addr_busy(ipvlan, ip4_addr, false)) {
+		netif_err(ipvlan, ifup, ipvlan->dev,
+			  "Failed to add IPv4=%pI4 on %s intf.\n",
+			  ip4_addr, ipvlan->dev->name);
+		return -EINVAL;
+	}
+	addr = kzalloc(sizeof(struct ipvl_addr), GFP_KERNEL);
+	if (!addr)
+		return -ENOMEM;
+
+	addr->master = ipvlan;
+	memcpy(&addr->ip4addr, ip4_addr, sizeof(struct in_addr));
+	addr->atype = IPVL_IPV4;
+	list_add_tail_rcu(&addr->anode, &ipvlan->addrs);
+	ipvlan->ipv4cnt++;
+	ipvlan_ht_addr_add(ipvlan, addr);
+	ipvlan_set_broadcast_mac_filter(ipvlan, true);
+
+	return 0;
+}
+
+static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
+{
+	struct ipvl_addr *addr;
+
+	addr = ipvlan_ht_addr_lookup(ipvlan->port, ip4_addr, false);
+	if (!addr)
+		return;
+
+	ipvlan_ht_addr_del(addr, true);
+	list_del_rcu(&addr->anode);
+	ipvlan->ipv4cnt--;
+	WARN_ON(ipvlan->ipv4cnt < 0);
+	if (!ipvlan->ipv4cnt)
+	    ipvlan_set_broadcast_mac_filter(ipvlan, false);
+	kfree_rcu(addr, rcu);
+
+	return;
+}
+
+static int ipvlan_addr4_event(struct notifier_block *unused,
+			      unsigned long event, void *ptr)
+{
+	struct in_ifaddr *if4 = (struct in_ifaddr *)ptr;
+	struct net_device *dev = (struct net_device *)if4->ifa_dev->dev;
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct in_addr ip4_addr;
+
+	if (!ipvlan_dev_slave(dev))
+		return NOTIFY_DONE;
+
+	if (!ipvlan || !ipvlan->port)
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_UP:
+		ip4_addr.s_addr = if4->ifa_address;
+		if (ipvlan_add_addr4(ipvlan, &ip4_addr))
+			return NOTIFY_BAD;
+		break;
+
+	case NETDEV_DOWN:
+		ip4_addr.s_addr = if4->ifa_address;
+		ipvlan_del_addr4(ipvlan, &ip4_addr);
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ipvlan_addr4_notifier_block __read_mostly = {
+	.notifier_call = ipvlan_addr4_event,
+};
+
+static struct notifier_block ipvlan_notifier_block __read_mostly = {
+	.notifier_call = ipvlan_device_event,
+};
+
+static struct notifier_block ipvlan_addr6_notifier_block __read_mostly = {
+	.notifier_call = ipvlan_addr6_event,
+};
+
+static int __init ipvlan_init_module(void)
+{
+	int err;
+
+	ipvlan_init_secret();
+	register_netdevice_notifier(&ipvlan_notifier_block);
+	register_inet6addr_notifier(&ipvlan_addr6_notifier_block);
+	register_inetaddr_notifier(&ipvlan_addr4_notifier_block);
+
+	err = ipvlan_link_register(&ipvlan_link_ops);
+	if (err < 0)
+		goto error;
+
+	return 0;
+error:
+	unregister_inetaddr_notifier(&ipvlan_addr4_notifier_block);
+	unregister_inet6addr_notifier(&ipvlan_addr6_notifier_block);
+	unregister_netdevice_notifier(&ipvlan_notifier_block);
+	return err;
+}
+
+static void __exit ipvlan_cleanup_module(void)
+{
+	rtnl_link_unregister(&ipvlan_link_ops);
+	unregister_netdevice_notifier(&ipvlan_notifier_block);
+	unregister_inetaddr_notifier(&ipvlan_addr4_notifier_block);
+	unregister_inet6addr_notifier(&ipvlan_addr6_notifier_block);
+}
+
+module_init(ipvlan_init_module);
+module_exit(ipvlan_cleanup_module);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mahesh Bandewar <maheshb@google.com>");
+MODULE_DESCRIPTION("Driver for L3 (IPv6/IPv4) based VLANs");
+MODULE_ALIAS_RTNL_LINK("ipvlan");
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cd508787572..2cb772495f7a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1230,6 +1230,8 @@ enum netdev_priv_flags {
 	IFF_LIVE_ADDR_CHANGE		= 1<<20,
 	IFF_MACVLAN			= 1<<21,
 	IFF_XMIT_DST_RELEASE_PERM	= 1<<22,
+	IFF_IPVLAN_MASTER		= 1<<23,
+	IFF_IPVLAN_SLAVE		= 1<<24,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1255,6 +1257,8 @@ enum netdev_priv_flags {
 #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
 #define IFF_MACVLAN			IFF_MACVLAN
 #define IFF_XMIT_DST_RELEASE_PERM	IFF_XMIT_DST_RELEASE_PERM
+#define IFF_IPVLAN_MASTER		IFF_IPVLAN_MASTER
+#define IFF_IPVLAN_SLAVE		IFF_IPVLAN_SLAVE
 
 /**
  *	struct net_device - The DEVICE structure.
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 7072d8325016..36bddc233633 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -330,6 +330,21 @@ enum macvlan_macaddr_mode {
 
 #define MACVLAN_FLAG_NOPROMISC	1
 
+/* IPVLAN section */
+enum {
+	IFLA_IPVLAN_UNSPEC,
+	IFLA_IPVLAN_MODE,
+	__IFLA_IPVLAN_MAX
+};
+
+#define IFLA_IPVLAN_MAX (__IFLA_IPVLAN_MAX - 1)
+
+enum ipvlan_mode {
+	IPVLAN_MODE_L2 = 0,
+	IPVLAN_MODE_L3,
+	IPVLAN_MODE_MAX
+};
+
 /* VXLAN section */
 enum {
 	IFLA_VXLAN_UNSPEC,
-- 
cgit v1.2.3


From 007f790c8276271de26416f90d55561bcc96588a Mon Sep 17 00:00:00 2001
From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 28 Nov 2014 14:34:17 +0100
Subject: net: introduce generic switch devices support

The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
 MAINTAINERS                            |  7 ++++
 include/linux/netdevice.h              | 10 ++++++
 include/net/switchdev.h                | 30 +++++++++++++++++
 net/Kconfig                            |  1 +
 net/Makefile                           |  3 ++
 net/switchdev/Kconfig                  | 13 ++++++++
 net/switchdev/Makefile                 |  5 +++
 net/switchdev/switchdev.c              | 33 +++++++++++++++++++
 9 files changed, 161 insertions(+)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c

(limited to 'Documentation')

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
new file mode 100644
index 000000000000..f981a9295a39
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,59 @@
+Switch (and switch-ish) device drivers HOWTO
+===========================
+
+Please note that the word "switch" is here used in very generic meaning.
+This include devices supporting L2/L3 but also various flow offloading chips,
+including switches embedded into SR-IOV NICs.
+
+Lets describe a topology a bit. Imagine the following example:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  NIC0 NIC1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+In this example, there are two independent lines between the switch silicon
+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
+separate from the switch driver. SOME switch chip is by managed by a driver
+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
+connected to some other type of bus.
+
+Now, for the previous example show the representation in kernel:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  eth0 eth1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
+created for each port of a switch. These netdevices are instances
+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference of the switch-port netdevice from the ordinary netdevice
+is that is implements couple more NDOs:
+
+  ndo_switch_parent_id_get - This returns the same ID for two port netdevices
+			     of the same physical switch chip. This is
+			     mandatory to be implemented by all switch drivers
+			     and serves the caller for recognition of a port
+			     netdevice.
+  ndo_switch_parent_* - Functions that serve for a manipulation of the switch
+			chip itself (it can be though of as a "parent" of the
+			port, therefore the name). They are not port-specific.
+			Caller might use arbitrary port netdevice of the same
+			switch and it will make no difference.
+  ndo_switch_port_* - Functions that serve for a port-specific manipulation.
diff --git a/MAINTAINERS b/MAINTAINERS
index 6b880deae3d2..3aba0ac2d5a3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9059,6 +9059,13 @@ F:	lib/swiotlb.c
 F:	arch/*/kernel/pci-swiotlb.c
 F:	include/linux/swiotlb.h
 
+SWITCHDEV
+M:	Jiri Pirko <jiri@resnulli.us>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	net/switchdev/
+F:	include/net/switchdev.h
+
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4bd41d72559d..3603f31e78f3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1018,6 +1018,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	performing GSO on a packet. The device returns true if it is
  *	able to GSO the packet, false otherwise. If the return value is
  *	false the stack will do software GSO.
+ *
+ * int (*ndo_switch_parent_id_get)(struct net_device *dev,
+ *				   struct netdev_phys_item_id *psid);
+ *	Called to get an ID of the switch chip this port is part of.
+ *	If driver implements this, it indicates that it represents a port
+ *	of a switch chip.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1171,6 +1177,10 @@ struct net_device_ops {
 	int			(*ndo_get_lock_subclass)(struct net_device *dev);
 	bool			(*ndo_gso_check) (struct sk_buff *skb,
 						  struct net_device *dev);
+#ifdef CONFIG_NET_SWITCHDEV
+	int			(*ndo_switch_parent_id_get)(struct net_device *dev,
+							    struct netdev_phys_item_id *psid);
+#endif
 };
 
 /**
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
new file mode 100644
index 000000000000..7a52360a1446
--- /dev/null
+++ b/include/net/switchdev.h
@@ -0,0 +1,30 @@
+/*
+ * include/net/switchdev.h - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _LINUX_SWITCHDEV_H_
+#define _LINUX_SWITCHDEV_H_
+
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_SWITCHDEV
+
+int netdev_switch_parent_id_get(struct net_device *dev,
+				struct netdev_phys_item_id *psid);
+
+#else
+
+static inline int netdev_switch_parent_id_get(struct net_device *dev,
+					      struct netdev_phys_item_id *psid)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
+
+#endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 99815b5454bf..ff9ffc17fa0e 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
+source "net/switchdev/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 7ed1970074b0..95fc694e4ddc 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
 obj-$(CONFIG_HSR)		+= hsr/
+ifneq ($(CONFIG_NET_SWITCHDEV),)
+obj-y				+= switchdev/
+endif
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
new file mode 100644
index 000000000000..155754588fd6
--- /dev/null
+++ b/net/switchdev/Kconfig
@@ -0,0 +1,13 @@
+#
+# Configuration for Switch device support
+#
+
+config NET_SWITCHDEV
+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
+	depends on INET
+	---help---
+	  This module provides glue between core networking code and device
+	  drivers in order to support hardware switch chips in very generic
+	  meaning of the word "switch". This include devices supporting L2/L3 but
+	  also various flow offloading chips, including switches embedded into
+	  SR-IOV NICs.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
new file mode 100644
index 000000000000..5ed63ed324d0
--- /dev/null
+++ b/net/switchdev/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Switch device API
+#
+
+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
new file mode 100644
index 000000000000..66973deaae56
--- /dev/null
+++ b/net/switchdev/switchdev.c
@@ -0,0 +1,33 @@
+/*
+ * net/switchdev/switchdev.c - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <net/switchdev.h>
+
+/**
+ *	netdev_switch_parent_id_get - Get ID of a switch
+ *	@dev: port device
+ *	@psid: switch ID
+ *
+ *	Get ID of a switch this port is part of.
+ */
+int netdev_switch_parent_id_get(struct net_device *dev,
+				struct netdev_phys_item_id *psid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_switch_parent_id_get)
+		return -EOPNOTSUPP;
+	return ops->ndo_switch_parent_id_get(dev, psid);
+}
+EXPORT_SYMBOL(netdev_switch_parent_id_get);
-- 
cgit v1.2.3


From aecbe01e7410ad2de022796472f531ae6941f15e Mon Sep 17 00:00:00 2001
From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 28 Nov 2014 14:34:19 +0100
Subject: net-sysfs: expose physical switch id for particular device

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/ABI/testing/sysfs-class-net |  8 ++++++++
 net/core/net-sysfs.c                      | 24 ++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index e1b2e785bba8..beb8ec4dabbc 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -216,3 +216,11 @@ Contact:	netdev@vger.kernel.org
 Description:
 		Indicates the interface protocol type as a decimal value. See
 		include/uapi/linux/if_arp.h for all possible values.
+
+What:		/sys/class/net/<iface>/phys_switch_id
+Date:		November 2014
+KernelVersion:	3.19
+Contact:	netdev@vger.kernel.org
+Description:
+		Indicates the unique physical switch identifier of a switch this
+		port belongs to, as a string.
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 26c46f4726c5..999341244434 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -12,6 +12,7 @@
 #include <linux/capability.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
+#include <net/switchdev.h>
 #include <linux/if_arp.h>
 #include <linux/slab.h>
 #include <linux/nsproxy.h>
@@ -416,6 +417,28 @@ static ssize_t phys_port_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_port_id);
 
+static ssize_t phys_switch_id_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	ssize_t ret = -EINVAL;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (dev_isalive(netdev)) {
+		struct netdev_phys_item_id ppid;
+
+		ret = netdev_switch_parent_id_get(netdev, &ppid);
+		if (!ret)
+			ret = sprintf(buf, "%*phN\n", ppid.id_len, ppid.id);
+	}
+	rtnl_unlock();
+
+	return ret;
+}
+static DEVICE_ATTR_RO(phys_switch_id);
+
 static struct attribute *net_class_attrs[] = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -441,6 +464,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_tx_queue_len.attr,
 	&dev_attr_gro_flush_timeout.attr,
 	&dev_attr_phys_port_id.attr,
+	&dev_attr_phys_switch_id.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);
-- 
cgit v1.2.3


From 829ae9d611651467fe6cd7be834bd33ca6b28dfe Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Sun, 30 Nov 2014 22:22:34 -0500
Subject: net-timestamp: allow reading recv cmsg on errqueue with origin tstamp

Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/timestamping.txt | 12 +++++++++++-
 include/uapi/linux/net_tstamp.h           |  3 ++-
 net/ipv4/ip_sockglue.c                    | 22 ++++++++++++++++++++--
 net/ipv6/datagram.c                       | 21 +++++++++++++++++++--
 4 files changed, 52 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 1d6d02d6ba52..b08e27261ff9 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -122,7 +122,7 @@ SOF_TIMESTAMPING_RAW_HARDWARE:
 
 1.3.3 Timestamp Options
 
-The interface supports one option
+The interface supports the options
 
 SOF_TIMESTAMPING_OPT_ID:
 
@@ -145,6 +145,16 @@ SOF_TIMESTAMPING_OPT_ID:
   stream sockets, it increments with every byte.
 
 
+SOF_TIMESTAMPING_OPT_CMSG:
+
+  Support recv() cmsg for all timestamped packets. Control messages
+  are already supported unconditionally on all packets with receive
+  timestamps and on IPv6 packets with transmit timestamp. This option
+  extends them to IPv4 packets with transmit timestamp. One use case
+  is to correlate packets with their egress device, by enabling socket
+  option IP_PKTINFO simultaneously.
+
+
 1.4 Bytestream Timestamps
 
 The SO_TIMESTAMPING interface supports timestamping of bytes in a
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index ff354021bb69..edbc888ceb51 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -23,8 +23,9 @@ enum {
 	SOF_TIMESTAMPING_OPT_ID = (1<<7),
 	SOF_TIMESTAMPING_TX_SCHED = (1<<8),
 	SOF_TIMESTAMPING_TX_ACK = (1<<9),
+	SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_ACK,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_CMSG,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 59eba6c7a512..640f26c6a9fe 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -399,6 +399,22 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
 		kfree_skb(skb);
 }
 
+static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
+					  const struct sk_buff *skb,
+					  int ee_origin)
+{
+	struct in_pktinfo *info = PKTINFO_SKB_CB(skb);
+
+	if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
+	    (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
+	    (!skb->dev))
+		return false;
+
+	info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
+	info->ipi_ifindex = skb->dev->ifindex;
+	return true;
+}
+
 /*
  *	Handle MSG_ERRQUEUE
  */
@@ -446,7 +462,9 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 	memcpy(&errhdr.ee, &serr->ee, sizeof(struct sock_extended_err));
 	sin = &errhdr.offender;
 	sin->sin_family = AF_UNSPEC;
-	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
+
+	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
+	    ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin)) {
 		struct inet_sock *inet = inet_sk(sk);
 
 		sin->sin_family = AF_INET;
@@ -1051,7 +1069,7 @@ e_inval:
 }
 
 /**
- * ipv4_pktinfo_prepare - transfert some info from rtable to skb
+ * ipv4_pktinfo_prepare - transfer some info from rtable to skb
  * @sk: socket
  * @skb: buffer
  *
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index cc1139687fd7..2464a00e36ab 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -325,6 +325,16 @@ void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu)
 	kfree_skb(skb);
 }
 
+static void ip6_datagram_prepare_pktinfo_errqueue(struct sk_buff *skb)
+{
+	int ifindex = skb->dev ? skb->dev->ifindex : -1;
+
+	if (skb->protocol == htons(ETH_P_IPV6))
+		IP6CB(skb)->iif = ifindex;
+	else
+		PKTINFO_SKB_CB(skb)->ipi_ifindex = ifindex;
+}
+
 /*
  *	Handle MSG_ERRQUEUE
  */
@@ -388,8 +398,12 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 		sin->sin6_family = AF_INET6;
 		sin->sin6_flowinfo = 0;
 		sin->sin6_port = 0;
-		if (np->rxopt.all)
+		if (np->rxopt.all) {
+			if (serr->ee.ee_origin != SO_EE_ORIGIN_ICMP &&
+			    serr->ee.ee_origin != SO_EE_ORIGIN_ICMP6)
+				ip6_datagram_prepare_pktinfo_errqueue(skb);
 			ip6_datagram_recv_common_ctl(sk, msg, skb);
+		}
 		if (skb->protocol == htons(ETH_P_IPV6)) {
 			sin->sin6_addr = ipv6_hdr(skb)->saddr;
 			if (np->rxopt.all)
@@ -491,7 +505,10 @@ void ip6_datagram_recv_common_ctl(struct sock *sk, struct msghdr *msg,
 			ipv6_addr_set_v4mapped(ip_hdr(skb)->daddr,
 					       &src_info.ipi6_addr);
 		}
-		put_cmsg(msg, SOL_IPV6, IPV6_PKTINFO, sizeof(src_info), &src_info);
+
+		if (src_info.ipi6_ifindex >= 0)
+			put_cmsg(msg, SOL_IPV6, IPV6_PKTINFO,
+				 sizeof(src_info), &src_info);
 	}
 }
 
-- 
cgit v1.2.3


From cbd3aad5ce66f5a266a185aa37e0eb9be9ba4154 Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Sun, 30 Nov 2014 22:22:35 -0500
Subject: net-timestamp: expand documentation and test

Documentation:
  expand explanation of timestamp counter

Test:
  new: flag -I requests and prints PKTINFO
  new: flag -x prints payload (possibly truncated)
  fix: remove pretty print that breaks common flag '-l 1'

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/timestamping.txt          | 23 ++++--
 .../networking/timestamping/txtimestamp.c          | 90 +++++++++++++++++++---
 2 files changed, 93 insertions(+), 20 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index b08e27261ff9..a5c784c89312 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -130,19 +130,26 @@ SOF_TIMESTAMPING_OPT_ID:
   have multiple concurrent timestamping requests outstanding. Packets
   can be reordered in the transmit path, for instance in the packet
   scheduler. In that case timestamps will be queued onto the error
-  queue out of order from the original send() calls. This option
-  embeds a counter that is incremented at send() time, to order
-  timestamps within a flow.
+  queue out of order from the original send() calls. It is not always
+  possible to uniquely match timestamps to the original send() calls
+  based on timestamp order or payload inspection alone, then.
+
+  This option associates each packet at send() with a unique
+  identifier and returns that along with the timestamp. The identifier
+  is derived from a per-socket u32 counter (that wraps). For datagram
+  sockets, the counter increments with each sent packet. For stream
+  sockets, it increments with every byte.
+
+  The counter starts at zero. It is initialized the first time that
+  the socket option is enabled. It is reset each time the option is
+  enabled after having been disabled. Resetting the counter does not
+  change the identifiers of existing packets in the system.
 
   This option is implemented only for transmit timestamps. There, the
   timestamp is always looped along with a struct sock_extended_err.
   The option modifies field ee_data to pass an id that is unique
   among all possibly concurrently outstanding timestamp requests for
-  that socket. In practice, it is a monotonically increasing u32
-  (that wraps).
-
-  In datagram sockets, the counter increments on each send call. In
-  stream sockets, it increments with every byte.
+  that socket.
 
 
 SOF_TIMESTAMPING_OPT_CMSG:
diff --git a/Documentation/networking/timestamping/txtimestamp.c b/Documentation/networking/timestamping/txtimestamp.c
index b32fc2a07734..876f71c5625a 100644
--- a/Documentation/networking/timestamping/txtimestamp.c
+++ b/Documentation/networking/timestamping/txtimestamp.c
@@ -46,6 +46,7 @@
 #include <netpacket/packet.h>
 #include <poll.h>
 #include <stdarg.h>
+#include <stdbool.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -58,6 +59,14 @@
 #include <time.h>
 #include <unistd.h>
 
+/* ugly hack to work around netinet/in.h and linux/ipv6.h conflicts */
+#ifndef in6_pktinfo
+struct in6_pktinfo {
+	struct in6_addr	ipi6_addr;
+	int		ipi6_ifindex;
+};
+#endif
+
 /* command line parameters */
 static int cfg_proto = SOCK_STREAM;
 static int cfg_ipproto = IPPROTO_TCP;
@@ -65,6 +74,8 @@ static int cfg_num_pkts = 4;
 static int do_ipv4 = 1;
 static int do_ipv6 = 1;
 static int cfg_payload_len = 10;
+static bool cfg_show_payload;
+static bool cfg_do_pktinfo;
 static uint16_t dest_port = 9000;
 
 static struct sockaddr_in daddr;
@@ -131,6 +142,30 @@ static void print_timestamp(struct scm_timestamping *tss, int tstype,
 	__print_timestamp(tsname, &tss->ts[0], tskey, payload_len);
 }
 
+/* TODO: convert to check_and_print payload once API is stable */
+static void print_payload(char *data, int len)
+{
+	int i;
+
+	if (len > 70)
+		len = 70;
+
+	fprintf(stderr, "payload: ");
+	for (i = 0; i < len; i++)
+		fprintf(stderr, "%02hhx ", data[i]);
+	fprintf(stderr, "\n");
+}
+
+static void print_pktinfo(int family, int ifindex, void *saddr, void *daddr)
+{
+	char sa[INET6_ADDRSTRLEN], da[INET6_ADDRSTRLEN];
+
+	fprintf(stderr, "         pktinfo: ifindex=%u src=%s dst=%s\n",
+		ifindex,
+		saddr ? inet_ntop(family, saddr, sa, sizeof(sa)) : "unknown",
+		daddr ? inet_ntop(family, daddr, da, sizeof(da)) : "unknown");
+}
+
 static void __poll(int fd)
 {
 	struct pollfd pollfd;
@@ -156,10 +191,9 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
 		    cm->cmsg_type == SCM_TIMESTAMPING) {
 			tss = (void *) CMSG_DATA(cm);
 		} else if ((cm->cmsg_level == SOL_IP &&
-		     cm->cmsg_type == IP_RECVERR) ||
-		    (cm->cmsg_level == SOL_IPV6 &&
-		     cm->cmsg_type == IPV6_RECVERR)) {
-
+			    cm->cmsg_type == IP_RECVERR) ||
+			   (cm->cmsg_level == SOL_IPV6 &&
+			    cm->cmsg_type == IPV6_RECVERR)) {
 			serr = (void *) CMSG_DATA(cm);
 			if (serr->ee_errno != ENOMSG ||
 			    serr->ee_origin != SO_EE_ORIGIN_TIMESTAMPING) {
@@ -168,6 +202,16 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
 						serr->ee_origin);
 				serr = NULL;
 			}
+		} else if (cm->cmsg_level == SOL_IP &&
+			   cm->cmsg_type == IP_PKTINFO) {
+			struct in_pktinfo *info = (void *) CMSG_DATA(cm);
+			print_pktinfo(AF_INET, info->ipi_ifindex,
+				      &info->ipi_spec_dst, &info->ipi_addr);
+		} else if (cm->cmsg_level == SOL_IPV6 &&
+			   cm->cmsg_type == IPV6_PKTINFO) {
+			struct in6_pktinfo *info6 = (void *) CMSG_DATA(cm);
+			print_pktinfo(AF_INET6, info6->ipi6_ifindex,
+				      NULL, &info6->ipi6_addr);
 		} else
 			fprintf(stderr, "unknown cmsg %d,%d\n",
 					cm->cmsg_level, cm->cmsg_type);
@@ -206,7 +250,11 @@ static int recv_errmsg(int fd)
 	if (ret == -1 && errno != EAGAIN)
 		error(1, errno, "recvmsg");
 
-	__recv_errmsg_cmsg(&msg, ret);
+	if (ret > 0) {
+		__recv_errmsg_cmsg(&msg, ret);
+		if (cfg_show_payload)
+			print_payload(data, cfg_payload_len);
+	}
 
 	free(data);
 	return ret == -1;
@@ -215,9 +263,9 @@ static int recv_errmsg(int fd)
 static void do_test(int family, unsigned int opt)
 {
 	char *buf;
-	int fd, i, val, total_len;
+	int fd, i, val = 1, total_len;
 
-	if (family == IPPROTO_IPV6 && cfg_proto != SOCK_STREAM) {
+	if (family == AF_INET6 && cfg_proto != SOCK_STREAM) {
 		/* due to lack of checksum generation code */
 		fprintf(stderr, "test: skipping datagram over IPv6\n");
 		return;
@@ -239,7 +287,6 @@ static void do_test(int family, unsigned int opt)
 		error(1, errno, "socket");
 
 	if (cfg_proto == SOCK_STREAM) {
-		val = 1;
 		if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
 			       (char*) &val, sizeof(val)))
 			error(1, 0, "setsockopt no nagle");
@@ -253,7 +300,20 @@ static void do_test(int family, unsigned int opt)
 		}
 	}
 
+	if (cfg_do_pktinfo) {
+		if (family == AF_INET6) {
+			if (setsockopt(fd, SOL_IPV6, IPV6_RECVPKTINFO,
+				       &val, sizeof(val)))
+				error(1, errno, "setsockopt pktinfo ipv6");
+		} else {
+			if (setsockopt(fd, SOL_IP, IP_PKTINFO,
+				       &val, sizeof(val)))
+				error(1, errno, "setsockopt pktinfo ipv4");
+		}
+	}
+
 	opt |= SOF_TIMESTAMPING_SOFTWARE |
+	       SOF_TIMESTAMPING_OPT_CMSG |
 	       SOF_TIMESTAMPING_OPT_ID;
 	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
 		       (char *) &opt, sizeof(opt)))
@@ -262,8 +322,6 @@ static void do_test(int family, unsigned int opt)
 	for (i = 0; i < cfg_num_pkts; i++) {
 		memset(&ts_prev, 0, sizeof(ts_prev));
 		memset(buf, 'a' + i, total_len);
-		buf[total_len - 2] = '\n';
-		buf[total_len - 1] = '\0';
 
 		if (cfg_proto == SOCK_RAW) {
 			struct udphdr *udph;
@@ -324,11 +382,13 @@ static void __attribute__((noreturn)) usage(const char *filepath)
 			"  -4:   only IPv4\n"
 			"  -6:   only IPv6\n"
 			"  -h:   show this message\n"
+			"  -I:   request PKTINFO\n"
 			"  -l N: send N bytes at a time\n"
 			"  -r:   use raw\n"
 			"  -R:   use raw (IP_HDRINCL)\n"
 			"  -p N: connect to port N\n"
-			"  -u:   use udp\n",
+			"  -u:   use udp\n"
+			"  -x:   show payload (up to 70 bytes)\n",
 			filepath);
 	exit(1);
 }
@@ -338,7 +398,7 @@ static void parse_opt(int argc, char **argv)
 	int proto_count = 0;
 	char c;
 
-	while ((c = getopt(argc, argv, "46hl:p:rRu")) != -1) {
+	while ((c = getopt(argc, argv, "46hIl:p:rRux")) != -1) {
 		switch (c) {
 		case '4':
 			do_ipv6 = 0;
@@ -346,6 +406,9 @@ static void parse_opt(int argc, char **argv)
 		case '6':
 			do_ipv4 = 0;
 			break;
+		case 'I':
+			cfg_do_pktinfo = true;
+			break;
 		case 'r':
 			proto_count++;
 			cfg_proto = SOCK_RAW;
@@ -367,6 +430,9 @@ static void parse_opt(int argc, char **argv)
 		case 'p':
 			dest_port = strtoul(optarg, NULL, 10);
 			break;
+		case 'x':
+			cfg_show_payload = true;
+			break;
 		case 'h':
 		default:
 			usage(argv[0]);
-- 
cgit v1.2.3


From 5d330cddb907df0911652c447baf9ee1a137245d Mon Sep 17 00:00:00 2001
From: Andrew Shewmaker <agshew@gmail.com>
Date: Wed, 3 Dec 2014 14:07:31 -0800
Subject: Update old iproute2 and Xen Remus links

Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/Changes | 2 +-
 net/sched/Kconfig     | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/Changes b/Documentation/Changes
index 1de131bb49fb..74bdda9272a4 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -383,7 +383,7 @@ o  <http://www.iptables.org/downloads.html>
 
 Ip-route2
 ---------
-o  <ftp://ftp.tux.org/pub/net/ip-routing/iproute2-2.2.4-now-ss991023.tar.gz>
+o  <https://www.kernel.org/pub/linux/utils/net/iproute2/>
 
 OProfile
 --------
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a1a8e29e5fc9..d17053d9e9e2 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -22,8 +22,9 @@ menuconfig NET_SCHED
 	  This code is considered to be experimental.
 
 	  To administer these schedulers, you'll need the user-level utilities
-	  from the package iproute2+tc at <ftp://ftp.tux.org/pub/net/ip-routing/>.
-	  That package also contains some documentation; for more, check out
+	  from the package iproute2+tc at
+	  <https://www.kernel.org/pub/linux/utils/net/iproute2/>.  That package
+	  also contains some documentation; for more, check out
 	  <http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2>.
 
 	  This Quality of Service (QoS) support will enable you to use
@@ -336,7 +337,7 @@ config NET_SCH_PLUG
 	  of virtual machines by allowing the generated network output to be rolled
 	  back if needed.
 
-	  For more information, please refer to http://wiki.xensource.com/xenwiki/Remus
+	  For more information, please refer to <http://wiki.xenproject.org/wiki/Remus>
 
 	  Say Y here if you are using this kernel for Xen dom0 and
 	  want to protect Xen guests with Remus.
-- 
cgit v1.2.3


From 6dc696401a819d01428301bb0236f47ab14b7cda Mon Sep 17 00:00:00 2001
From: Rami Rosen <ramirose@gmail.com>
Date: Fri, 5 Dec 2014 19:35:43 +0200
Subject: Documentation (ixgbe.txt): use a decimal address.

This patch fixes the erronous usage of an hexadecimal address in the
example, by replacing it with a decimal address.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ixgbe.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ixgbe.txt b/Documentation/networking/ixgbe.txt
index 96cccebb839b..0ace6e776ac8 100644
--- a/Documentation/networking/ixgbe.txt
+++ b/Documentation/networking/ixgbe.txt
@@ -138,7 +138,7 @@ Other ethtool Commands:
 To enable Flow Director
 	ethtool -K ethX ntuple on
 To add a filter
-	Use -U switch. e.g., ethtool -U ethX flow-type tcp4 src-ip 0x178000a
+	Use -U switch. e.g., ethtool -U ethX flow-type tcp4 src-ip 10.0.128.23
         action 1
 To see the list of filters currently present:
 	ethtool -u ethX
-- 
cgit v1.2.3