summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-12netfilter: flowtable: Add API for registering to flow table eventsPaul Blakey
Let drivers to add their cb allowing them to receive flow offload events of type TC_SETUP_CLSFLOWER (REPLACE/DEL/STATS) for flows managed by the flow table. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads tablePaul Blakey
The eswitch offloads table, which has the reps (vport) rx miss rules, was moved from OFFLOADS namespace [0,0] (prio, level), to [1,0], so the restore table (the new [0,0]) can come before it. The destinations of these miss rules is the rep root ft (ttc for non uplink reps). Uplink rep root ft is created as OFFLOADS namespace [0,1], and is used as a hook to next RX prio (either ethtool or ttc), but this fails to pass fs_core level's check. Move uplink rep root ft to OFFLOADS prio 1, level 1 ([1,1]), so it will keep the same relative position after the restore table change. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net/mlx5: E-Switch, Enable reg c1 loopback when possiblePaul Blakey
Enable reg c1 loopback if firmware reports it's supported, as this is needed for restoring packet metadata (e.g chain). Also define helper to query if it is enabled. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12Merge branch 'ct-offload' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
2020-03-12Merge branch 'bind_addr_zero'David S. Miller
Kuniyuki Iwashima says: ==================== Improve bind(addr, 0) behaviour. Currently we fail to bind sockets to ephemeral ports when all of the ports are exhausted even if all sockets have SO_REUSEADDR enabled. In this case, we still have a chance to connect to the different remote hosts. These patches add net.ipv4.ip_autobind_reuse option and fix the behaviour to fully utilize all space of the local (addr, port) tuples. Changes in v5: - Add more description to documents. - Fix sysctl option to use proc_dointvec_minmax. - Remove the Fixes: tag and squash two commits. Changes in v4: - Add net.ipv4.ip_autobind_reuse option to not change the current behaviour. - Modify .gitignore for test. https://lore.kernel.org/netdev/20200308181615.90135-1-kuniyu@amazon.co.jp/ Changes in v3: - Change the title and write more specific description of the 3rd patch. - Add a test in tools/testing/selftests/net/ as the 4th patch. https://lore.kernel.org/netdev/20200229113554.78338-1-kuniyu@amazon.co.jp/ Changes in v2: - Change the description of the 2nd patch ('localhost' -> 'address'). - Correct the description and the if statement of the 3rd patch. https://lore.kernel.org/netdev/20200226074631.67688-1-kuniyu@amazon.co.jp/ v1 with tests: https://lore.kernel.org/netdev/20200220152020.13056-1-kuniyu@amazon.co.jp/ ==================== Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized.Kuniyuki Iwashima
This commit adds a test to check if we can fully utilize 4-tuples for connect() when all ephemeral ports are exhausted. The test program changes the local port range to use only one port and binds two sockets with or without SO_REUSEADDR and SO_REUSEPORT, and with the same EUID or with different EUIDs, then do listen(). We should be able to bind only one socket having both SO_REUSEADDR and SO_REUSEPORT per EUID, which restriction is to prevent unintentional listen(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and ↵Kuniyuki Iwashima
SO_REUSEPORT per EUID. If there is no TCP_LISTEN socket on a ephemeral port, we can bind multiple sockets having SO_REUSEADDR to the same port. Then if all sockets bound to the port have also SO_REUSEPORT enabled and have the same EUID, all of them can be listened. This is not safe. Let's say, an application has root privilege and binds sockets to an ephemeral port with both of SO_REUSEADDR and SO_REUSEPORT. When none of sockets is not listened yet, a malicious user can use sudo, exhaust ephemeral ports, and bind sockets to the same ephemeral port, so he or she can call listen and steal the port. To prevent this issue, we must not bind more than one sockets that have the same EUID and both of SO_REUSEADDR and SO_REUSEPORT. On the other hand, if the sockets have different EUIDs, the issue above does not occur. After sockets with different EUIDs are bound to the same port and one of them is listened, no more socket can be listened. This is because the condition below is evaluated true and listen() for the second socket fails. } else if (!reuseport_ok || !reuseport || !sk2->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) || (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid(sk2)))) { if (inet_rcv_saddr_equal(sk, sk2, true)) break; } Therefore, on the same port, we cannot do listen() for multiple sockets with different EUIDs and any other listen syscalls fail, so the problem does not happen. In this case, we can still call connect() for other sockets that cannot be listened, so we have to succeed to call bind() in order to fully utilize 4-tuples. Summarizing the above, we should be able to bind only one socket having SO_REUSEADDR and SO_REUSEPORT per EUID. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are ↵Kuniyuki Iwashima
exhausted. Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger condition for bind_conflict") introduced a restriction to forbid to bind SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to assign ports dispersedly so that we can connect to the same remote host. The change results in accelerating port depletion so that we fail to bind sockets to the same local port even if we want to connect to the different remote hosts. You can reproduce this issue by following instructions below. 1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768" 2. set SO_REUSEADDR to two sockets. 3. bind two sockets to (localhost, 0) and the latter fails. Therefore, when ephemeral ports are exhausted, bind(0) should fallback to the legacy behaviour to enable the SO_REUSEADDR option and make it possible to connect to different remote (addr, port) tuples. This patch allows us to bind SO_REUSEADDR enabled sockets to the same (addr, port) only when net.ipv4.ip_autobind_reuse is set 1 and all ephemeral ports are exhausted. This also allows connect() and listen() to share ports in the following way and may break some applications. So the ip_autobind_reuse is 0 by default and disables the feature. 1. setsockopt(sk1, SO_REUSEADDR) 2. setsockopt(sk2, SO_REUSEADDR) 3. bind(sk1, saddr, 0) 4. bind(sk2, saddr, 0) 5. connect(sk1, daddr) 6. listen(sk2) If it is set 1, we can fully utilize the 4-tuples, but we should use IP_BIND_ADDRESS_NO_PORT for bind()+connect() as possible. The notable thing is that if all sockets bound to the same port have both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an ephemeral port and also do listen(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12tcp: Remove unnecessary conditions in inet_csk_bind_conflict().Kuniyuki Iwashima
When we get an ephemeral port, the relax is false, so the SO_REUSEADDR conditions may be evaluated twice. We do not need to check the conditions again. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12Merge branch 'hns3-fixes'David S. Miller
Huazhong Tan says: ==================== net: hns3: fixes for -net This series includes several bugfixes for the HNS3 ethernet driver. [patch 1] fixes an "tc qdisc del" failure. [patch 2] fixes SW & HW VLAN table not consistent issue. [patch 3] fixes a RMW issue related to VLAN filter switch. [patch 4] clears port based VLAN when uploading PF. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: hns3: clear port base VLAN when unload PFJian Shen
Currently, PF missed to clear the port base VLAN for VF when unload. In this case, the VLAN id will remain in the VLAN table. This patch fixes it. Fixes: 92f11ea177cd ("net: hns3: fix set port based VLAN issue for VF") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: hns3: fix RMW issue for VLAN filter switchJian Shen
According to the user manual, the ingress and egress VLAN filter are configured at the same time. Currently, hclge_init_vlan_config() and hclge_set_vlan_spoofchk() will both change the VLAN filter switch. So it's necessary to read the old configuration before modifying it. Fixes: 22044f95faa0 ("net: hns3: add support for spoof check setting") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: hns3: fix VF VLAN table entries inconsistent issueJian Shen
Currently, if VF is loaded on the host side, the host doesn't clear the VF's VLAN table entries when VF removing. In this case, when doing reset and disabling sriov at the same time the VLAN device over VF will be removed, but the VLAN table entries in hardware are remained. This patch fixes it by asking PF to clear the VLAN table entries for VF when VF is removing. It also clears the VLAN table full bit after VF VLAN table entries being cleared. Fixes: c6075b193462 ("net: hns3: Record VF vlan tables") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: hns3: fix "tc qdisc del" failed issueYonglong Liu
The HNS3 driver supports to configure TC numbers and TC to priority map via "tc" tool. But when delete the rule, will fail, because the HNS3 driver needs at least one TC, but the "tc" tool sets TC number to zero when delete. This patch makes sure that the TC number is at least one. Fixes: 30d240dfa2e8 ("net: hns3: Add mqprio hardware offload support in hns3 driver") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12Merge branch 'ethtool-consolidate-irq-coalescing-part-4'David S. Miller
Jakub Kicinski says: ==================== ethtool: consolidate irq coalescing - part 4 Convert more drivers following the groundwork laid in a recent patch set [1] and continued in [2], [3]. The aim of the effort is to consolidate irq coalescing parameter validation in the core. This set converts 15 drivers in drivers/net/ethernet - remaining Intel drivers, Freescale/NXP, and others. 2 more conversion sets to come. [1] https://lore.kernel.org/netdev/20200305051542.991898-1-kuba@kernel.org/ [2] https://lore.kernel.org/netdev/20200306010602.1620354-1-kuba@kernel.org/ [3] https://lore.kernel.org/netdev/20200310021512.1861626-1-kuba@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: ixgbevf: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: ixgbe: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: igc: let core reject the unsupported coalescing parametersJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver was rejecting almost all unsupported parameters already, it was only missing a check for tx_max_coalesced_frames_irq. As a side effect of these changes the error code for unsupported params changes from ENOTSUPP to EOPNOTSUPP. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: igbvf: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: igb: let core reject the unsupported coalescing parametersJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver was rejecting almost all unsupported parameters already, it was only missing a check for tx_max_coalesced_frames_irq. As a side effect of these changes the error code for unsupported params changes from ENOTSUPP to EOPNOTSUPP. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: iavf: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: i40e: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: fm10k: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: e1000: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: hns3: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: hns: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: gianfar: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: fec: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: dpaa: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters (other than adaptive rx, which will now be rejected by core). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: be2net: reject unsupported coalescing paramsJakub Kicinski
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12sfc: ethtool: Refactor to remove fallthrough comments in case blocksJoe Perches
Converting fallthrough comments to fallthrough; creates warnings in this code when compiled with gcc. This code is overly complicated and reads rather better with a little refactoring and no fallthrough uses at all. Remove the fallthrough comments and simplify the written source code while reducing the object code size. Consolidate duplicated switch/case blocks for IPV4 and IPV6. defconfig x86-64 with sfc: $ size drivers/net/ethernet/sfc/ethtool.o* text data bss dec hex filename 10055 12 0 10067 2753 drivers/net/ethernet/sfc/ethtool.o.new 10135 12 0 10147 27a3 drivers/net/ethernet/sfc/ethtool.o.old Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12taprio: Fix sending packets without dequeueing themVinicius Costa Gomes
There was a bug that was causing packets to be sent to the driver without first calling dequeue() on the "child" qdisc. And the KASAN report below shows that sending a packet without calling dequeue() leads to bad results. The problem is that when checking the last qdisc "child" we do not set the returned skb to NULL, which can cause it to be sent to the driver, and so after the skb is sent, it may be freed, and in some situations a reference to it may still be in the child qdisc, because it was never dequeued. The crash log looks like this: [ 19.937538] ================================================================== [ 19.938300] BUG: KASAN: use-after-free in taprio_dequeue_soft+0x620/0x780 [ 19.938968] Read of size 4 at addr ffff8881128628cc by task swapper/1/0 [ 19.939612] [ 19.939772] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc3+ #97 [ 19.940397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qe4 [ 19.941523] Call Trace: [ 19.941774] <IRQ> [ 19.941985] dump_stack+0x97/0xe0 [ 19.942323] print_address_description.constprop.0+0x3b/0x60 [ 19.942884] ? taprio_dequeue_soft+0x620/0x780 [ 19.943325] ? taprio_dequeue_soft+0x620/0x780 [ 19.943767] __kasan_report.cold+0x1a/0x32 [ 19.944173] ? taprio_dequeue_soft+0x620/0x780 [ 19.944612] kasan_report+0xe/0x20 [ 19.944954] taprio_dequeue_soft+0x620/0x780 [ 19.945380] __qdisc_run+0x164/0x18d0 [ 19.945749] net_tx_action+0x2c4/0x730 [ 19.946124] __do_softirq+0x268/0x7bc [ 19.946491] irq_exit+0x17d/0x1b0 [ 19.946824] smp_apic_timer_interrupt+0xeb/0x380 [ 19.947280] apic_timer_interrupt+0xf/0x20 [ 19.947687] </IRQ> [ 19.947912] RIP: 0010:default_idle+0x2d/0x2d0 [ 19.948345] Code: 00 00 41 56 41 55 65 44 8b 2d 3f 8d 7c 7c 41 54 55 53 0f 1f 44 00 00 e8 b1 b2 c5 fd e9 07 00 3 [ 19.950166] RSP: 0018:ffff88811a3efda0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 [ 19.950909] RAX: 0000000080000000 RBX: ffff88811a3a9600 RCX: ffffffff8385327e [ 19.951608] RDX: 1ffff110234752c0 RSI: 0000000000000000 RDI: ffffffff8385262f [ 19.952309] RBP: ffffed10234752c0 R08: 0000000000000001 R09: ffffed10234752c1 [ 19.953009] R10: ffffed10234752c0 R11: ffff88811a3a9607 R12: 0000000000000001 [ 19.953709] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [ 19.954408] ? default_idle_call+0x2e/0x70 [ 19.954816] ? default_idle+0x1f/0x2d0 [ 19.955192] default_idle_call+0x5e/0x70 [ 19.955584] do_idle+0x3d4/0x500 [ 19.955909] ? arch_cpu_idle_exit+0x40/0x40 [ 19.956325] ? _raw_spin_unlock_irqrestore+0x23/0x30 [ 19.956829] ? trace_hardirqs_on+0x30/0x160 [ 19.957242] cpu_startup_entry+0x19/0x20 [ 19.957633] start_secondary+0x2a6/0x380 [ 19.958026] ? set_cpu_sibling_map+0x18b0/0x18b0 [ 19.958486] secondary_startup_64+0xa4/0xb0 [ 19.958921] [ 19.959078] Allocated by task 33: [ 19.959412] save_stack+0x1b/0x80 [ 19.959747] __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 19.960222] kmem_cache_alloc+0xe4/0x230 [ 19.960617] __alloc_skb+0x91/0x510 [ 19.960967] ndisc_alloc_skb+0x133/0x330 [ 19.961358] ndisc_send_ns+0x134/0x810 [ 19.961735] addrconf_dad_work+0xad5/0xf80 [ 19.962144] process_one_work+0x78e/0x13a0 [ 19.962551] worker_thread+0x8f/0xfa0 [ 19.962919] kthread+0x2ba/0x3b0 [ 19.963242] ret_from_fork+0x3a/0x50 [ 19.963596] [ 19.963753] Freed by task 33: [ 19.964055] save_stack+0x1b/0x80 [ 19.964386] __kasan_slab_free+0x12f/0x180 [ 19.964830] kmem_cache_free+0x80/0x290 [ 19.965231] ip6_mc_input+0x38a/0x4d0 [ 19.965617] ipv6_rcv+0x1a4/0x1d0 [ 19.965948] __netif_receive_skb_one_core+0xf2/0x180 [ 19.966437] netif_receive_skb+0x8c/0x3c0 [ 19.966846] br_handle_frame_finish+0x779/0x1310 [ 19.967302] br_handle_frame+0x42a/0x830 [ 19.967694] __netif_receive_skb_core+0xf0e/0x2a90 [ 19.968167] __netif_receive_skb_one_core+0x96/0x180 [ 19.968658] process_backlog+0x198/0x650 [ 19.969047] net_rx_action+0x2fa/0xaa0 [ 19.969420] __do_softirq+0x268/0x7bc [ 19.969785] [ 19.969940] The buggy address belongs to the object at ffff888112862840 [ 19.969940] which belongs to the cache skbuff_head_cache of size 224 [ 19.971202] The buggy address is located 140 bytes inside of [ 19.971202] 224-byte region [ffff888112862840, ffff888112862920) [ 19.972344] The buggy address belongs to the page: [ 19.972820] page:ffffea00044a1800 refcount:1 mapcount:0 mapping:ffff88811a2bd1c0 index:0xffff8881128625c0 compo0 [ 19.973930] flags: 0x8000000000010200(slab|head) [ 19.974388] raw: 8000000000010200 ffff88811a2ed650 ffff88811a2ed650 ffff88811a2bd1c0 [ 19.975151] raw: ffff8881128625c0 0000000000190013 00000001ffffffff 0000000000000000 [ 19.975915] page dumped because: kasan: bad access detected [ 19.976461] page_owner tracks the page as allocated [ 19.976946] page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NO) [ 19.978332] prep_new_page+0x24b/0x330 [ 19.978707] get_page_from_freelist+0x2057/0x2c90 [ 19.979170] __alloc_pages_nodemask+0x218/0x590 [ 19.979619] new_slab+0x9d/0x300 [ 19.979948] ___slab_alloc.constprop.0+0x2f9/0x6f0 [ 19.980421] __slab_alloc.constprop.0+0x30/0x60 [ 19.980870] kmem_cache_alloc+0x201/0x230 [ 19.981269] __alloc_skb+0x91/0x510 [ 19.981620] alloc_skb_with_frags+0x78/0x4a0 [ 19.982043] sock_alloc_send_pskb+0x5eb/0x750 [ 19.982476] unix_stream_sendmsg+0x399/0x7f0 [ 19.982904] sock_sendmsg+0xe2/0x110 [ 19.983262] ____sys_sendmsg+0x4de/0x6d0 [ 19.983660] ___sys_sendmsg+0xe4/0x160 [ 19.984032] __sys_sendmsg+0xab/0x130 [ 19.984396] do_syscall_64+0xe7/0xae0 [ 19.984761] page last free stack trace: [ 19.985142] __free_pages_ok+0x432/0xbc0 [ 19.985533] qlist_free_all+0x56/0xc0 [ 19.985907] quarantine_reduce+0x149/0x170 [ 19.986315] __kasan_kmalloc.constprop.0+0x9e/0xd0 [ 19.986791] kmem_cache_alloc+0xe4/0x230 [ 19.987182] prepare_creds+0x24/0x440 [ 19.987548] do_faccessat+0x80/0x590 [ 19.987906] do_syscall_64+0xe7/0xae0 [ 19.988276] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 19.988775] [ 19.988930] Memory state around the buggy address: [ 19.989402] ffff888112862780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 19.990111] ffff888112862800: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb [ 19.990822] >ffff888112862880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 19.991529] ^ [ 19.992081] ffff888112862900: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc [ 19.992796] ffff888112862980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Reported-by: Michael Schmidt <michael.schmidt@eti.uni-siegen.de> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12Revert "net: sched: make newly activated qdiscs visible"Julian Wiedmann
This reverts commit 4cda75275f9f89f9485b0ca4d6950c95258a9bce from net-next. Brown bag time. Michal noticed that this change doesn't work at all when netif_set_real_num_tx_queues() gets called prior to an initial dev_activate(), as for instance igb does. Doing so dies with: [ 40.579142] BUG: kernel NULL pointer dereference, address: 0000000000000400 [ 40.586922] #PF: supervisor read access in kernel mode [ 40.592668] #PF: error_code(0x0000) - not-present page [ 40.598405] PGD 0 P4D 0 [ 40.601234] Oops: 0000 [#1] PREEMPT SMP PTI [ 40.605909] CPU: 18 PID: 1681 Comm: wickedd Tainted: G E 5.6.0-rc3-ethnl.50-default #1 [ 40.616205] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013 [ 40.627377] RIP: 0010:qdisc_hash_add.part.22+0x2e/0x90 [ 40.633115] Code: 00 55 53 89 f5 48 89 fb e8 2f 9b fb ff 85 c0 74 44 48 8b 43 40 48 8b 08 69 43 38 47 86 c8 61 c1 e8 1c 48 83 e8 80 48 8d 14 c1 <48> 8b 04 c1 48 8d 4b 28 48 89 53 30 48 89 43 28 48 85 c0 48 89 0a [ 40.654080] RSP: 0018:ffffb879864934d8 EFLAGS: 00010203 [ 40.659914] RAX: 0000000000000080 RBX: ffffffffb8328d80 RCX: 0000000000000000 [ 40.667882] RDX: 0000000000000400 RSI: 0000000000000000 RDI: ffffffffb831faa0 [ 40.675849] RBP: 0000000000000000 R08: ffffa0752c8b9088 R09: ffffa0752c8b9208 [ 40.683816] R10: 0000000000000006 R11: 0000000000000000 R12: ffffa0752d734000 [ 40.691783] R13: 0000000000000008 R14: 0000000000000000 R15: ffffa07113c18000 [ 40.699750] FS: 00007f94548e5880(0000) GS:ffffa0752e980000(0000) knlGS:0000000000000000 [ 40.708782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 40.715189] CR2: 0000000000000400 CR3: 000000082b6ae006 CR4: 00000000001606e0 [ 40.723156] Call Trace: [ 40.725888] dev_qdisc_set_real_num_tx_queues+0x61/0x90 [ 40.731725] netif_set_real_num_tx_queues+0x94/0x1d0 [ 40.737286] __igb_open+0x19a/0x5d0 [igb] [ 40.741767] __dev_open+0xbb/0x150 [ 40.745567] __dev_change_flags+0x157/0x1a0 [ 40.750240] dev_change_flags+0x23/0x60 [...] Fixes: 4cda75275f9f ("net: sched: make newly activated qdiscs visible") Reported-by: Michal Kubecek <mkubecek@suse.cz> CC: Michal Kubecek <mkubecek@suse.cz> CC: Eric Dumazet <edumazet@google.com> CC: Jamal Hadi Salim <jhs@mojatatu.com> CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12Merge tag 'for-linus-5.6-2' of git://github.com/cminyard/linux-ipmiLinus Torvalds
Pull IPMI fix from Corey Minyard: "Fix a message spew on some system The call to platform_get_irq() was changed to print a log if the interrupt was not available, and that was causing bogus messages to spew out for the IPMI driver. People have requested that this get in to 5.6 so I'm sending it along" * tag 'for-linus-5.6-2' of git://github.com/cminyard/linux-ipmi: ipmi_si: Avoid spurious errors for optional IRQs
2020-03-12Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a build problem with x86/curve25519" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/curve25519 - support assemblers with no adx support
2020-03-12dt-bindings: soc: qcom: fix IPA bindingAlex Elder
The definitions for the "qcom,smem-states" and "qcom,smem-state-names" properties need to list their "$ref" under an "allOf" keyword. In addition, fix two problems in the example at the end: - Use #include for header files that define needed symbolic values - Terminate the line that includes the "ipa-shared" register space name with a comma rather than a semicolon Finally, update some white space in the example for better alignment. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: mvmdio: avoid error message for optional IRQChris Packham
Per the dt-binding the interrupt is optional so use platform_get_irq_optional() instead of platform_get_irq(). Since commit 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()") platform_get_irq() produces an error message orion-mdio f1072004.mdio: IRQ index 0 not found which is perfectly normal if one hasn't specified the optional property in the device tree. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12net: dsa: mv88e6xxx: Add missing mask of ATU occupancy registerAndrew Lunn
Only the bottom 12 bits contain the ATU bin occupancy statistics. The upper bits need masking off. Fixes: e0c69ca7dfbb ("net: dsa: mv88e6xxx: Add ATU occupancy via devlink resources") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11net: mptcp: don't hang before sending 'MP capable with data'Davide Caratti
the following packetdrill script socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 flags[flag_h] nokey> < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscale 8,mpcapable v1 flags[flag_h] key[skey=2]> > . 1:1(0) ack 1 win 256 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h] key[ckey,skey]> getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 fcntl(3, F_SETFL, O_RDWR) = 0 write(3, ..., 1000) = 1000 doesn't transmit 1KB data packet after a successful three-way-handshake, using mp_capable with data as required by protocol v1, and write() hangs forever: PID: 973 TASK: ffff97dd399cae80 CPU: 1 COMMAND: "packetdrill" #0 [ffffa9b94062fb78] __schedule at ffffffff9c90a000 #1 [ffffa9b94062fc08] schedule at ffffffff9c90a4a0 #2 [ffffa9b94062fc18] schedule_timeout at ffffffff9c90e00d #3 [ffffa9b94062fc90] wait_woken at ffffffff9c120184 #4 [ffffa9b94062fcb0] sk_stream_wait_connect at ffffffff9c75b064 #5 [ffffa9b94062fd20] mptcp_sendmsg at ffffffff9c8e801c #6 [ffffa9b94062fdc0] sock_sendmsg at ffffffff9c747324 #7 [ffffa9b94062fdd8] sock_write_iter at ffffffff9c7473c7 #8 [ffffa9b94062fe48] new_sync_write at ffffffff9c302976 #9 [ffffa9b94062fed0] vfs_write at ffffffff9c305685 #10 [ffffa9b94062ff00] ksys_write at ffffffff9c305985 #11 [ffffa9b94062ff38] do_syscall_64 at ffffffff9c004475 #12 [ffffa9b94062ff50] entry_SYSCALL_64_after_hwframe at ffffffff9ca0008c RIP: 00007f959407eaf7 RSP: 00007ffe9e95a910 RFLAGS: 00000293 RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f959407eaf7 RDX: 00000000000003e8 RSI: 0000000001785fe0 RDI: 0000000000000008 RBP: 0000000001785fe0 R8: 0000000000000000 R9: 0000000000000003 R10: 0000000000000007 R11: 0000000000000293 R12: 00000000000003e8 R13: 00007ffe9e95ae30 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b Fix it ensuring that socket state is TCP_ESTABLISHED on reception of the third ack. Fixes: 1954b86016cf ("mptcp: Check connection state before attempting send") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11net: memcg: fix lockdep splat in inet_csk_accept()Eric Dumazet
Locking newsk while still holding the listener lock triggered a lockdep splat [1] We can simply move the memcg code after we release the listener lock, as this can also help if multiple threads are sharing a common listener. Also fix a typo while reading socket sk_rmem_alloc. [1] WARNING: possible recursive locking detected 5.6.0-rc3-syzkaller #0 Not tainted -------------------------------------------- syz-executor598/9524 is trying to acquire lock: ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline] ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492 but task is already holding lock: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline] ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_INET6); lock(sk_lock-AF_INET6); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by syz-executor598/9524: #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline] #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445 stack backtrace: CPU: 0 PID: 9524 Comm: syz-executor598 Not tainted 5.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_deadlock_bug kernel/locking/lockdep.c:2370 [inline] check_deadlock kernel/locking/lockdep.c:2411 [inline] validate_chain kernel/locking/lockdep.c:2954 [inline] __lock_acquire.cold+0x114/0x288 kernel/locking/lockdep.c:3954 lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4484 lock_sock_nested+0xc5/0x110 net/core/sock.c:2947 lock_sock include/net/sock.h:1541 [inline] inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492 inet_accept+0xe9/0x7c0 net/ipv4/af_inet.c:734 __sys_accept4_file+0x3ac/0x5b0 net/socket.c:1758 __sys_accept4+0x53/0x90 net/socket.c:1809 __do_sys_accept4 net/socket.c:1821 [inline] __se_sys_accept4 net/socket.c:1818 [inline] __x64_sys_accept4+0x93/0xf0 net/socket.c:1818 do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4445c9 Code: e8 0c 0d 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc35b37608 EFLAGS: 00000246 ORIG_RAX: 0000000000000120 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004445c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000306777 R09: 0000000000306777 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00000000004053d0 R14: 0000000000000000 R15: 0000000000000000 Fixes: d752a4986532 ("net: memcg: late association of sock to memcg") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Shakeel Butt <shakeelb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11Merge branch 's390-qeth-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2020-03-11 please apply the following patch series for qeth to netdev's net tree. Just one fix to get the RX buffer pool resizing right, with two preparatory cleanups. This is on the larger side given where we are in the -rc cycle, but a big chunk of the delta is just refactoring to make the fix look nice. I intentionally split these off from yesterday's series. No objections if you'd rather punt them to net-next, the series should apply cleanly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11s390/qeth: implement smarter resizing of the RX buffer poolJulian Wiedmann
The RX buffer pool is allocated in qeth_alloc_qdio_queues(). A subsequent pool resizing is then handled in a very simple way: first free the current pool, then allocate a new pool of the requested size. There's two ways where this can go wrong: 1. if the resize action happens _before_ the initial pool was allocated, then a subsequent initialization will call qeth_alloc_qdio_queues() and fill the pool with a second(!) set of pages. We consume twice the planned amount of memory. This is easy to fix - just skip the resizing if the queues haven't been allocated yet. 2. if the initial pool was created by qeth_alloc_qdio_queues() but a subsequent resizing fails, then the device has no(!) RX buffer pool. The next initialization will _not_ call qeth_alloc_qdio_queues(), and attempting to back the RX buffers with pages in qeth_init_qdio_queues() will fail. Not very difficult to fix either - instead of re-allocating the whole pool, just allocate/free as many entries to match the desired size. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11s390/qeth: refactor buffer pool codeJulian Wiedmann
In preparation for a subsequent fix, split out helpers to allocate/free individual pool entries. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11s390/qeth: use page pointers to manage RX buffer poolJulian Wiedmann
The RX buffer elements are always backed with full pages, reflect this in the pointer type. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol numberPaolo Lungaroni
The Internet Assigned Numbers Authority (IANA) has recently assigned a protocol number value of 143 for Ethernet [1]. Before this assignment, encapsulation mechanisms such as Segment Routing used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated payload is an Ethernet frame. In this patch, we add the definition of the Ethernet protocol number to the kernel headers and update the SRv6 L2 tunnels to use it. [1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it> Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11net: dsa: Don't instantiate phylink for CPU/DSA ports unless neededAndrew Lunn
By default, DSA drivers should configure CPU and DSA ports to their maximum speed. In many configurations this is sufficient to make the link work. In some cases it is necessary to configure the link to run slower, e.g. because of limitations of the SoC it is connected to. Or back to back PHYs are used and the PHY needs to be driven in order to establish link. In this case, phylink is used. Only instantiate phylink if it is required. If there is no PHY, or no fixed link properties, phylink can upset a link which works in the default configuration. Fixes: 0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11soc: qcom: ipa: fix spelling mistake "cahces" -> "caches"Colin Ian King
There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11net: ibm: remove set but not used variables 'err'Chen Zhou
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/ibm/emac/core.c: In function __emac_mdio_write: drivers/net/ethernet/ibm/emac/core.c:875:9: warning: variable err set but not used [-Wunused-but-set-variable] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11net: Add missing annotation for *netlink_seq_start()Jules Irenge
Sparse reports a warning at netlink_seq_start() warning: context imbalance in netlink_seq_start() - wrong count at exit The root cause is the missing annotation at netlink_seq_start() Add the missing __acquires(RCU) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-11tcp: Add missing annotation for tcp_child_process()Jules Irenge
Sparse reports warning at tcp_child_process() warning: context imbalance in tcp_child_process() - unexpected unlock The root cause is the missing annotation at tcp_child_process() Add the missing __releases(&((child)->sk_lock.slock)) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>