lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2017-04-24	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus	Linus Torvalds
	Pull MIPS fixes from Ralf Baechle: "Another round of 4.11 for the MIPS architecture. This time around it's mostly arch but little platforms-specific code. - PCI: Register controllers in the right order to aoid a PCI error - KGDB: Use kernel context for sleeping threads - smp-cps: Fix potentially uninitialised value of core - KASLR: Fix build - ELF: Fix BUG() warning in arch_check_elf - Fix modversioning of _mcount symbol - fix out-of-tree defconfig target builds - cevt-r4k: Fix out-of-bounds array access - perf: fix deadlock - Malta: Fix i8259 irqchip setup" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: PCI: add controllers before the specified head MIPS: KGDB: Use kernel context for sleeping threads MIPS: smp-cps: Fix potentially uninitialised value of core MIPS: KASLR: Add missing header files MIPS: Avoid BUG warning in arch_check_elf MIPS: Fix modversioning of _mcount symbol MIPS: generic: fix out-of-tree defconfig target builds MIPS: cevt-r4k: Fix out-of-bounds array access MIPS: perf: fix deadlock MIPS: Malta: Fix i8259 irqchip setup
2017-04-24	Merge tag 'mlx5-fixes-2017-04-22' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2017-04-22 This series contains some mlx5 fixes for net. For your convenience, the series doesn't introduce any conflict with the ongoing net-next pull request. Please pull and let me know if there's any problem. For -stable: ("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10 ("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8 ("net/mlx5e: Fix small packet threshold") kernels >= 4.7 ("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24	net: ipv6: send unsolicited NA if enabled for all interfaces	David Ahern
	When arp_notify is set to 1 for either a specific interface or for 'all' interfaces, gratuitous arp requests are sent. Since ndisc_notify is the ipv6 equivalent to arp_notify, it should follow the same semantics. Commit 4a6e3c5def13 ("net: ipv6: send unsolicited NA on admin up") sends the NA on admin up. The final piece is checking devconf_all->ndisc_notify in addition to the per device setting. Add it. Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24	ravb: Double free on error in ravb_start_xmit()	Dan Carpenter
	If skb_put_padto() fails then it frees the skb. I shifted that code up a bit to make my error handling a little simpler. Fixes: a0d2f20650e8 ("Renesas Ethernet AVB PTP clock driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24	udp: disable inner UDP checksum offloads in IPsec case	Ansis Atteka
	Otherwise, UDP checksum offloads could corrupt ESP packets by attempting to calculate UDP checksum when this inner UDP packet is already protected by IPsec. One way to reproduce this bug is to have a VM with virtio_net driver (UFO set to ON in the guest VM); and then encapsulate all guest's Ethernet frames in Geneve; and then further encrypt Geneve with IPsec. In this case following symptoms are observed: 1. If using ixgbe NIC, then it will complain with following error message: ixgbe 0000:01:00.1: partial checksum but l4 proto=32! 2. Receiving IPsec stack will drop all the corrupted ESP packets and increase XfrmInStateProtoError counter in /proc/net/xfrm_stat. 3. iperf UDP test from the VM with packet sizes above MTU will not work at all. 4. iperf TCP test from the VM will get ridiculously low performance because. Signed-off-by: Ansis Atteka <aatteka@ovn.org> Co-authored-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24	macsec: avoid heap overflow in skb_to_sgvec	Jason A. Donenfeld
	While this may appear as a humdrum one line change, it's actually quite important. An sk_buff stores data in three places: 1. A linear chunk of allocated memory in skb->data. This is the easiest one to work with, but it precludes using scatterdata since the memory must be linear. 2. The array skb_shinfo(skb)->frags, which is of maximum length MAX_SKB_FRAGS. This is nice for scattergather, since these fragments can point to different pages. 3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff, which in turn can have data in either (1) or (2). The first two are rather easy to deal with, since they're of a fixed maximum length, while the third one is not, since there can be potentially limitless chains of fragments. Fortunately dealing with frag_list is opt-in for drivers, so drivers don't actually have to deal with this mess. For whatever reason, macsec decided it wanted pain, and so it explicitly specified NETIF_F_FRAGLIST. Because dealing with (1), (2), and (3) is insane, most users of sk_buff doing any sort of crypto or paging operation calls a convenient function called skb_to_sgvec (which happens to be recursive if (3) is in use!). This takes a sk_buff as input, and writes into its output pointer an array of scattergather list items. Sometimes people like to declare a fixed size scattergather list on the stack; othertimes people like to allocate a fixed size scattergather list on the heap. However, if you're doing it in a fixed-size fashion, you really shouldn't be using NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its frag_list children arent't shared and then you check the number of fragments in total required.) Macsec specifically does this: size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1); tmp = kmalloc(size, GFP_ATOMIC); sg = (struct scatterlist )(tmp + sg_offset); ... sg_init_table(sg, MAX_SKB_FRAGS + 1); skb_to_sgvec(skb, sg, 0, skb->len); Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will overflow the heap, and disaster ensues. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: stable@vger.kernel.org Cc: security@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24	ipv4: Avoid caching l3mdev dst on mismatched local route	Robert Shearman
	David reported that doing the following: ip li add red type vrf table 10 ip link set dev eth1 vrf red ip addr add 127.0.0.1/8 dev red ip link set dev eth1 up ip li set red up ping -c1 -w1 -I red 127.0.0.1 ip li del red when either policy routing IP rules are present or the local table lookup ip rule is before the l3mdev lookup results in a hang with these messages: unregister_netdevice: waiting for red to become free. Usage count = 1 The problem is caused by caching the dst used for sending the packet out of the specified interface on a local route with a different nexthop interface. Thus the dst could stay around until the route in the table the lookup was done is deleted which may be never. Address the problem by not forcing output device to be the l3mdev in the flow's output interface if the lookup didn't use the l3mdev. This then results in the dst using the right device according to the route. Changes in v2: - make the dev_out passed in by __ip_route_output_key_hash correct instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as suggested by David. Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback") Reported-by: David Ahern <dsa@cumulusnetworks.com> Suggested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24	net: tc35815: move free after the dereference	Dan Carpenter
	We dereference "skb" to get "skb->len" so we should probably do that step before freeing the skb. Fixes: eea221ce4880 ("tc35815 driver update (take 2)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24	net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats	Martin KaFai Lau
	We have observed a sudden spike in rx/tx_packets and rx/tx_bytes reported under /proc/net/dev. There is a race in mlx5e_update_stats() and some of the get-stats functions (the one that we hit is the mlx5e_get_stats() which is called by ndo_get_stats64()). In particular, the very first thing mlx5e_update_sw_counters() does is 'memset(s, 0, sizeof(*s))'. For example, if mlx5e_get_stats() is unlucky at one point, rx_bytes and rx_packets could be 0. One second later, a normal (and much bigger than 0) value will be reported. This patch is to use a 'struct mlx5e_sw_stats temp' to avoid a direct memset zero on priv->stats.sw. mlx5e_update_vport_counters() has a similar race. Hence, addressed together. However, memset zero is removed instead because it is not needed. I am lucky enough to catch this 0-reset in rx multicast: eth0: 41457665 76804 70 0 0 70 0 47085 15586634 87502 3 0 0 0 3 0 eth0: 41459860 76815 70 0 0 70 0 47094 15588376 87516 3 0 0 0 3 0 eth0: 41460577 76822 70 0 0 70 0 0 15589083 87521 3 0 0 0 3 0 eth0: 41463293 76838 70 0 0 70 0 47108 15595872 87538 3 0 0 0 3 0 eth0: 41463379 76839 70 0 0 70 0 47116 15596138 87539 3 0 0 0 3 0 v2: Remove memset zero from mlx5e_update_vport_counters() v1: Use temp and memcpy Fixes: 9218b44dcc05 ("net/mlx5e: Statistics handling refactoring") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-23	sparc: Update syscall tables.	David S. Miller
	Hook up statx. Ignore pkeys system calls, we don't have protection keeys on SPARC. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-23	sparc64: Fill in rest of HAVE_REGS_AND_STACK_ACCESS_API	David S. Miller
	This lets us enable KPROBE_EVENTS. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-23	Linux 4.11-rc8v4.11-rc8	Linus Torvalds

2017-04-23	Merge tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs	Linus Torvalds
	Pull UBI/UBIFS fixes from Richard Weinberger: "This contains fixes for issues in both UBI and UBIFS: - more O_TMPFILE fallout - RENAME_WHITEOUT regression due to a mis-merge - memory leak in ubifs_mknod() - power-cut problem in UBI's update volume feature" * tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs: ubifs: Fix O_TMPFILE corner case in ubifs_link() ubifs: Fix RENAME_WHITEOUT support ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode ubifs: Fix debug messages for an invalid filename in ubifs_dump_node ubifs: Remove filename from debug messages in ubifs_readdir ubifs: Fix memory leak in error path in ubifs_mknod ubi/upd: Always flush after prepared for an update
2017-04-23	Merge branch 'ras-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Thomas Gleixner: "The MCE atomic notifier callchain invokes callbacks which might sleep. Convert it to a blocking notifier and prevent calls from atomic context" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make the MCE notifier a blocking one
2017-04-23	Merge branch 'irq-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "The (hopefully) final fix for the irq affinity spreading logic" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Fix calculating vectors to assign
2017-04-22	net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling	Ilan Tayari
	Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size of the table, regardless of the amount of entries in it. Existing code does not do that, and this breaks all usage of ethtool -N or -n without explicit location, with this error: rmgr: Invalid RX class rules table size: Success Set info->data to the table size. Tested: ethtool -n ens8 ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55 ethtool -n ens8 ethtool -N ens8 delete 1023 ethtool -N ens8 delete 55 Fixes: f913a72aa008 ("net/mlx5e: Add support to get ethtool flow rules") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22	net/mlx5e: Fix small packet threshold	Eugenia Emantayev
	RX packet headers are meant to be contained in SKB linear part, and chose a threshold of 128. It turns out this is not enough, i.e. for IPv6 packet over VxLAN. In this case, UDP/IPv4 needs 42 bytes, GENEVE header is 8 bytes, and 86 bytes for TCP/IPv6. In total 136 bytes that is more than current 128 bytes. In this case expand header flow is reached. The warning in skb_try_coalesce() caused by a wrong truesize was already fixed here: commit 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()"). Still, we prefer to totally avoid the expand header flow for performance reasons. Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found. Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22	net/mlx5: Fix UAR memory leak	Maor Gottlieb
	When UAR is released, we deallocate the device resource, but don't unmmap the UAR mapping memory. Fix the leak by unmapping this memory. Fixes: a6d51b68611e9 ('net/mlx5: Introduce blue flame register allocator) Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22	net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels	Or Gerlitz
	Otherwise the code that fills the ipv6 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22	net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels	Or Gerlitz
	Otherwise the code that fills the ipv4 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22	net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5	Or Gerlitz
	On ConnectX5 the wqe inline mode is "none" and hence the FW reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED. Fix our devlink callbacks to deal with that on get and set. Also fix the tc flow parsing code not to fail anything when inline isn't required. Fixes: bffaa916588e ('net/mlx5: E-Switch, Add control for inline mode') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22	net/mlx5: Fix driver load bad flow when having fw initializing timeout	Mohamad Haj Yahia
	If FW is stuck in initializing state we will skip the driver load, but current error handling flow doesn't clean previously allocated command interface resources. Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-21	Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux	Linus Torvalds
	Pull nfsd bugfix from Bruce Fields: "Fix a 4.11 regression that triggers a BUG() on an attempt to use an unsupported NFSv4 compound op" * tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux: nfsd: fix oops on unsupported operation
2017-04-21	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Don't race in IPSEC dumps, from Yuejie Shi. 2) Verify lengths properly in IPSEC reqeusts, from Herbert Xu. 3) Fix out of bounds access in ipv6 segment routing code, from David Lebrun. 4) Don't write into the header of cloned SKBs in smsc95xx driver, from James Hughes. 5) Several other drivers have this bug too, fix them. From Eric Dumazet. 6) Fix access to uninitialized data in TC action cookie code, from Wolfgang Bumiller. 7) Fix double free in IPV6 segment routing, again from David Lebrun. 8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern. 9) Fix use after free in qrtr code, from Dan Carpenter. 10) Don't double-destroy devices in ip6mr code, from Nikolay Aleksandrov. 11) Don't pass out-of-range TX queue indices into drivers, from Tushar Dave. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) netpoll: Check for skb->queue_mapping ip6mr: fix notification device destruction bpf, doc: update bpf maintainers entry net: qrtr: potential use after free in qrtr_sendmsg() bpf: Fix values type used in test_maps net: ipv6: RTF_PCPU should not be settable from userspace gso: Validate assumption of frag_list segementation kaweth: use skb_cow_head() to deal with cloned skbs ch9200: use skb_cow_head() to deal with cloned skbs lan78xx: use skb_cow_head() to deal with cloned skbs sr9700: use skb_cow_head() to deal with cloned skbs cx82310_eth: use skb_cow_head() to deal with cloned skbs smsc75xx: use skb_cow_head() to deal with cloned skbs ipv6: sr: fix double free of skb after handling invalid SRH MAINTAINERS: Add "B:" field for networking. net sched actions: allocate act cookie early qed: Fix issue in populating the PFC config paramters. qed: Fix possible system hang in the dcbnl-getdcbx() path. qed: Fix sending an invalid PFC error mask to MFW. qed: Fix possible error in populating max_tc field. ...
2017-04-21	netpoll: Check for skb->queue_mapping	Tushar Dave
	Reducing real_num_tx_queues needs to be in sync with skb queue_mapping otherwise skbs with queue_mapping greater than real_num_tx_queues can be sent to the underlying driver and can result in kernel panic. One such event is running netconsole and enabling VF on the same device. Or running netconsole and changing number of tx queues via ethtool on same device. e.g. Unable to handle kernel NULL pointer dereference tsk->{mm,active_mm}->context = 0000000000001525 tsk->{mm,active_mm}->pgd = fff800130ff9a000 \\|/ ____ \\|/ "@'/ .. \`@" /_\| \__/ \|_\ \__U_/ kworker/48:1(475): Oops [#1] CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE 4.11.0-rc3-davem-net+ #7 Workqueue: events queue_process task: fff80013113299c0 task.stack: fff800131132c000 TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y: 00000000 Tainted: G OE TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]> g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3: 0000000000000001 g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7: 00000000000000c0 o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3: 0000000000000003 o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc: 000000000049ed94 RPC: <set_next_entity+0x34/0xb80> l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3: 0000000000000000 l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7: fff8001fa7605028 i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7: 00000000103fa4b0 I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]> Call Trace: [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe] [0000000000998c74] netpoll_start_xmit+0xf4/0x200 [0000000000998e10] queue_process+0x90/0x160 [0000000000485fa8] process_one_work+0x188/0x480 [0000000000486410] worker_thread+0x170/0x4c0 [000000000048c6b8] kthread+0xd8/0x120 [0000000000406064] ret_from_fork+0x1c/0x2c [0000000000000000] (null) Disabling lock debugging due to kernel taint Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe] Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200 Caller[0000000000998e10]: queue_process+0x90/0x160 Caller[0000000000485fa8]: process_one_work+0x188/0x480 Caller[0000000000486410]: worker_thread+0x170/0x4c0 Caller[000000000048c6b8]: kthread+0xd8/0x120 Caller[0000000000406064]: ret_from_fork+0x1c/0x2c Caller[0000000000000000]: (null) Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	ip6mr: fix notification device destruction	Nikolay Aleksandrov
	Andrey Konovalov reported a BUG caused by the ip6mr code which is caused because we call unregister_netdevice_many for a device that is already being destroyed. In IPv4's ipmr that has been resolved by two commits long time ago by introducing the "notify" parameter to the delete function and avoiding the unregister when called from a notifier, so let's do the same for ip6mr. The trace from Andrey: ------------[ cut here ]------------ kernel BUG at net/core/dev.c:6813! invalid opcode: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: netns cleanup_net task: ffff880069208000 task.stack: ffff8800692d8000 RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813 RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297 RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569 RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070 R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000 FS: 0000000000000000(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0 Call Trace: unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647 call_netdevice_notifiers net/core/dev.c:1663 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many net/core/dev.c:7880 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89 47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f> 0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00 RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0 ---[ end trace e0b29c57e9b3292c ]--- Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	bpf, doc: update bpf maintainers entry	Daniel Borkmann
	Add various related files that have been missing under BPF entry covering essential parts of its infrastructure and also add myself as co-maintainer. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	net: qrtr: potential use after free in qrtr_sendmsg()	Dan Carpenter
	If skb_pad() fails then it frees the skb so we should check for errors. Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	bpf: Fix values type used in test_maps	David Miller
	Maps of per-cpu type have their value element size adjusted to 8 if it is specified smaller during various map operations. This makes test_maps as a 32-bit binary fail, in fact the kernel writes past the end of the value's array on the user's stack. To be quite honest, I think the kernel should reject creation of a per-cpu map that doesn't have a value size of at least 8 if that's what the kernel is going to silently adjust to later. If the user passed something smaller, it is a sizeof() calcualtion based upon the type they will actually use (just like in this testcase code) in later calls to the map operations. Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2017-04-21	net: ipv6: RTF_PCPU should not be settable from userspace	David Ahern
	Andrey reported a fault in the IPv6 route code: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880069809600 task.stack: ffff880062dc8000 RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975 RSP: 0018:ffff880062dced30 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006 RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018 RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0 Call Trace: ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212 ... Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit set. Flags passed to the kernel are blindly copied to the allocated rt6_info by ip6_route_info_create making a newly inserted route appear as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set and expects rt->dst.from to be set - which it is not since it is not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then generates the fault. Fix by checking for the flag and failing with EINVAL. Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	gso: Validate assumption of frag_list segementation	Ilan Tayari
	Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") assumes that all SKBs in a frag_list (except maybe the last one) contain the same amount of GSO payload. This assumption is not always correct, resulting in the following warning message in the log: skb_segment: too many frags For example, mlx5 driver in Striding RQ mode creates some RX SKBs with one frag, and some with 2 frags. After GRO, the frag_list SKBs end up having different amounts of payload. If this frag_list SKB is then forwarded, the aforementioned assumption is violated. Validate the assumption, and fall back to software GSO if it not true. Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212 Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	Merge branch 'skb_cow_head'	David S. Miller
	Eric Dumazet says: ==================== net: use skb_cow_head() to deal with cloned skbs James Hughes found an issue with smsc95xx driver. Same problematic code is found in other drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	kaweth: use skb_cow_head() to deal with cloned skbs	Eric Dumazet
	We can use skb_cow_head() to properly deal with clones, especially the ones coming from TCP stack that allow their head being modified. This avoids a copy. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	ch9200: use skb_cow_head() to deal with cloned skbs	Eric Dumazet
	We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	lan78xx: use skb_cow_head() to deal with cloned skbs	Eric Dumazet
	We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	sr9700: use skb_cow_head() to deal with cloned skbs	Eric Dumazet
	We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	cx82310_eth: use skb_cow_head() to deal with cloned skbs	Eric Dumazet
	We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	smsc75xx: use skb_cow_head() to deal with cloned skbs	Eric Dumazet
	We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	ipv6: sr: fix double free of skb after handling invalid SRH	David Lebrun
	The icmpv6_param_prob() function already does a kfree_skb(), this patch removes the duplicate one. Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	Merge tag 'powerpc-4.11-8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Just two fixes. The first fixes kprobing a stdu, and is marked for stable as it's been broken for ~ever. In hindsight this could have gone in next. The other is a fix for a change we merged this cycle, where if we take a certain exception when the kernel is running relocated (currently only used for kdump), we checkstop the box. Thanks to Ravi Bangoria" * tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
2017-04-21	Merge tag 'pci-v4.11-fixes-5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Sorry this is so late. It's been in -next for over a week, but I forgot to send it on until now. A single fix to the DT binding of the HiSilicon PCIe host support" * tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)
2017-04-21	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull block layer fixes from Jens Axboe: "A couple of last minute fixes for regressions in this cycle. More specifically: - Two patches from Andy, adjusting the NVMe APST quirks to avoid some issues specific to one Toshiba drive, and some variant of Samsung on two specific Dell laptops. - A fix for mtip32xx, turning off mq scheduling on that device. We have a real fix for this, but it's too late in the cycle. Thankfully we already have a NO_SCHED flag we can apply here. A prep patch for this is ensuring that we honor the NO_SCHED flag when attempting to online switch schedulers, previsouly we only did so for drive load time. From Ming. - Fixing an oops in blk-mq polling with scheduling attached. This one is easily reproducible, it would be a shame to release 4.11 with that issue. From me. I'd prefer not having to send in patches at this point in time, but the above are all things that have regressed in this cycle and the fixes are relatively straight forward" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: fix potential oops with polling and blk-mq scheduler nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA" nvme: Adjust the Samsung APST quirk mtip32xx: pass BLK_MQ_F_NO_SCHED block: respect BLK_MQ_F_NO_SCHED
2017-04-21	Merge tag 'acpi-4.11-final' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI build fix from Rafael Wysocki: "This avoids a false-positive build warning from the compiler. Specifics: - Avoid a false-positive warning regarding a variable that may not be initialized that started to trigger after a previous general build fix (Arnd Bergmann)" * tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / power: Avoid maybe-uninitialized warning
2017-04-21	Merge tag 'mmc-v4.11-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - kmalloc sdio scratch buffer to make it DMA-friendly MMC host: - dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used - sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50 cards" * tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card mmc: dw_mmc: Don't allow Runtime PM for SDIO cards mmc: sdio: fix alignment issue in struct sdio_func
2017-04-21	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixlet from Dmitry Torokhov: "An update to Elan PS/2 driver to allow working on yet another Lifebook" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled
2017-04-21	MAINTAINERS: Add "B:" field for networking.	David S. Miller
	We want people to report bugs to the netdev list. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20	ARCv2: entry: save Accumulator register pair (r58:59) if present	Vineet Gupta
	Accumulator is present in configs with FPU and/or DSP MPY (mpy > 6) Instead of doing this in pt_regs (and thus every kernel entry/exit), this could have been done in context switch (and for user task only) as currently kernel doesn't clobber these registers for its own accord. However we will soon start using 64-bit multiply instructions for kernel which can clobber these. Also gcc folks also plan to start using these as GPRs, hence better to always save/restore them Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-04-20	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge two mm fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: prevent NR_ISOLATE_* stats from going negative Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"
2017-04-20	mm: prevent NR_ISOLATE_* stats from going negative	Rabin Vincent
	Commit 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn based migration") moved the dec_node_page_state() call (along with the page_is_file_cache() call) to after putback_lru_page(). But page_is_file_cache() can change after putback_lru_page() is called, so it should be called before putback_lru_page(), as it was before that patch, to prevent NR_ISOLATE_* stats from going negative. Without this fix, non-CONFIG_SMP kernels end up hanging in the while(too_many_isolated()) { congestion_wait() } loop in shrink_active_list() due to the negative stats. Mem-Info: active_anon:32567 inactive_anon:121 isolated_anon:1 active_file:6066 inactive_file:6639 isolated_file:4294967295 ^^^^^^^^^^ unevictable:0 dirty:115 writeback:0 unstable:0 slab_reclaimable:2086 slab_unreclaimable:3167 mapped:3398 shmem:18366 pagetables:1145 bounce:0 free:1798 free_pcp:13 free_cma:0 Fixes: 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn based migration") Link: http://lkml.kernel.org/r/1492683865-27549-1-git-send-email-rabin.vincent@axis.com Signed-off-by: Rabin Vincent <rabinv@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ming Ling <ming.ling@spreadtrum.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-20	Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"	Mel Gorman
	This reverts commit 374ad05ab64. While the patch worked great for userspace allocations, the fact that softirq loses the per-cpu allocator caused problems. It needs to be redone taking into account that a separate list is needed for hard/soft IRQs or alternatively find a cheap way of detecting reentry due to an interrupt. Both are possible but sufficiently tricky that it shouldn't be rushed. Jesper had one method for allowing softirqs but reported that the cost was high enough that it performed similarly to a plain revert. His figures for netperf TCP_STREAM were as follows Baseline v4.10.0 : 60316 Mbit/s Current 4.11.0-rc6: 47491 Mbit/s Jesper's patch : 60662 Mbit/s This patch : 60106 Mbit/s As this is a regression, I wish to revert to noirq allocator for now and go back to the drawing board. Link: http://lkml.kernel.org/r/20170415145350.ixy7vtrzdzve57mh@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Tariq Toukan <ttoukan.linux@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>