2021-04-05  mld: change lockdep annotation for ip6_sf_socklist and ipv6_mc_socklist  (Taehee Yoo)

struct ip6_sf_socklist and struct ipv6_mc_socklist are per-socket MLD data, protected by the rtnl lock, the socket lock, and RCU. The lockdep annotation therefore verifies that the rtnl lock is held when they are dereferenced. However, ip6_mc_msfget() is called by do_ipv6_getsockopt(), whose callers do not acquire the rtnl lock, so lockdep warns when the data is accessed there. The access is actually safe because the socket lock was acquired by do_ipv6_getsockopt(). So, change the lockdep annotation from the rtnl lock to the socket lock (rtnl_dereference -> sock_dereference).

The locking graph for mld data is like below:

When writing mld data:
  do_ipv6_setsockopt()
    rtnl_lock
      lock_sock
        (mld functions)
          idev->mc_lock (if per-interface mld data is modified)

When reading mld data:
  do_ipv6_getsockopt()
    lock_sock
      ip6_mc_msfget()

Splat looks like:

=============================
WARNING: suspicious RCU usage
5.12.0-rc4+ #503 Not tainted
-----------------------------
net/ipv6/mcast.c:610 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by mcast-listener-/923:
 #0: ffff888007958a70 (sk_lock-AF_INET6){+.+.}-{0:0}, at: ipv6_get_msfilter+0xaf/0x190

stack backtrace:
CPU: 1 PID: 923 Comm: mcast-listener- Not tainted 5.12.0-rc4+ #503
Call Trace:
 dump_stack+0xa4/0xe5
 ip6_mc_msfget+0x553/0x6c0
 ? ipv6_sock_mc_join_ssm+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? mark_held_locks+0xb7/0x120
 ? lockdep_hardirqs_on_prepare+0x27c/0x3e0
 ? __local_bh_enable_ip+0xa5/0xf0
 ? lock_sock_nested+0x82/0xf0
 ipv6_get_msfilter+0xc3/0x190
 ? compat_ipv6_get_msfilter+0x300/0x300
 ? lock_downgrade+0x690/0x690
 do_ipv6_getsockopt.isra.6.constprop.13+0x1809/0x29e0
 ? do_ipv6_mcast_group_source+0x150/0x150
 ? register_lock_class+0x1750/0x1750
 ? kvm_sched_clock_read+0x14/0x30
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0x18/0x170
 ? find_held_lock+0x3a/0x1c0
 ? lock_downgrade+0x690/0x690
 ? ipv6_getsockopt+0xdb/0x1b0
 ipv6_getsockopt+0xdb/0x1b0
 [ ... ]

Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
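A minimal sketch of the substitution named in the message (the surrounding variable and field names are illustrative, not the exact net/ipv6/mcast.c source):

    /* Before: lockdep key is the rtnl lock, which the getsockopt
     * path does not hold, hence the splat.
     */
    psl = rtnl_dereference(pmc->sflist);

    /* After: lockdep checks the socket lock instead, which
     * do_ipv6_getsockopt() does hold.
     */
    psl = sock_dereference(pmc->sflist, sk);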
2021-04-05  qede: Use 'skb_add_rx_frag()' instead of hand coding it  (Christophe JAILLET)

Some lines of code can be merged into an equivalent 'skb_add_rx_frag()' call which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
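A hedged sketch of the kind of hand-coded sequence skb_add_rx_frag() replaces (variable names are illustrative, not the exact qede code):

    /* Before: open-coded frag attach plus length/truesize accounting */
    skb_fill_page_desc(skb, nr_frags, page, offset, len);
    skb->len += len;
    skb->data_len += len;
    skb->truesize += PAGE_SIZE;

    /* After: one helper performs the same bookkeeping */
    skb_add_rx_frag(skb, nr_frags, page, offset, len, PAGE_SIZE);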
2021-04-05  qede: Remove an erroneous ++ in 'qede_rx_build_jumbo()'  (Christophe JAILLET)

This ++ is confusing: it looks like a duplicate of the update already performed by 'skb_fill_page_desc()'. In fact, it is harmless: 'nr_frags' is written twice with the same value, once because of the nr_frags++ and once because of the 'nr_frags = i + 1' inside 'skb_fill_page_desc()'. So axe this post-increment to avoid confusion.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
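A sketch of the double write (the page/off/size arguments are illustrative):

    /* Before: the post-increment stores nr_frags = old + 1, then
     * skb_fill_page_desc() stores nr_frags = i + 1 with i == old,
     * i.e. the very same value again.
     */
    skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++, page, off, size);

    /* After: let skb_fill_page_desc() perform the single update */
    skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, off, size);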
2021-04-05  sfc: Use 'skb_add_rx_frag()' instead of hand coding it  (Christophe JAILLET)

Some lines of code can be merged into an equivalent 'skb_add_rx_frag()' call which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-05  ibmvnic: Use 'skb_frag_address()' instead of hand coding it  (Christophe JAILLET)

'page_address(skb_frag_page()) + skb_frag_off()' can be replaced by an equivalent 'skb_frag_address()' which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
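The equivalence, sketched (frag is an skb_frag_t pointer; 'va' is illustrative):

    /* Before */
    va = page_address(skb_frag_page(frag)) + skb_frag_off(frag);

    /* After */
    va = skb_frag_address(frag);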
2021-04-05  net: ag71xx: Slightly simplify 'ag71xx_rx_packets()'  (Christophe JAILLET)

There is no need to use 'list_for_each_entry_safe' here, as nothing is removed from the list in the 'for' loop. Use 'list_for_each_entry' instead, it is slightly less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
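A sketch of the simplification (the loop body is illustrative; the point is that nothing is deleted, so the extra 'next' cursor of the _safe variant is unnecessary):

    /* Before */
    list_for_each_entry_safe(skb, next, &rx_list, list)
            skb->protocol = eth_type_trans(skb, ndev);

    /* After */
    list_for_each_entry(skb, &rx_list, list)
            skb->protocol = eth_type_trans(skb, ndev);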
2021-04-05  net: x25: Queue received packets in the drivers instead of per-CPU queues  (Xie He)

X.25 Layer 3 (the Packet Layer) expects Layer 2 to provide a reliable datalink service such that no packets are reordered or dropped, and X.25 Layer 2 (the LAPB layer) is indeed designed to provide such a service. However, this reliability is not preserved when a driver calls "netif_rx" to deliver received packets to Layer 3, because "netif_rx" puts the packets into per-CPU queues before they are delivered. If there are multiple CPUs, the order of the packets may not be preserved, and the per-CPU queues may drop packets if there are too many. Therefore, we should not call "netif_rx" to queue the packets; instead, we should use our own queue that won't reorder or drop packets.

This patch changes all X.25 drivers to use their own queues instead of calling "netif_rx". The patch also documents this requirement in the "x25-iface" documentation.

Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
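A minimal sketch of the per-driver queue pattern, assuming a tasklet drains the queue in FIFO order (the struct and function names are illustrative; the individual drivers differ in detail):

    /* Drained in softirq context: order preserved, nothing dropped.
     * rx_queue and rx_tasklet are assumed to be initialized at
     * device setup time.
     */
    static void x25drv_rx_tasklet(struct tasklet_struct *t)
    {
            struct x25drv *dev = from_tasklet(dev, t, rx_tasklet);
            struct sk_buff *skb;

            while ((skb = skb_dequeue(&dev->rx_queue)) != NULL)
                    netif_receive_skb(skb);
    }

    /* Receive path: instead of netif_rx(skb). */
    skb_queue_tail(&dev->rx_queue, skb);
    tasklet_schedule(&dev->rx_tasklet);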
2021-04-04  net: openvswitch: Use 'skb_push_rcsum()' instead of hand coding it  (Christophe JAILLET)

'skb_push()'/'skb_postpush_rcsum()' can be replaced by an equivalent 'skb_push_rcsum()' which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
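The merge, sketched:

    /* Before */
    skb_push(skb, len);
    skb_postpush_rcsum(skb, skb->data, len);

    /* After: skb_push_rcsum() does exactly this pair */
    skb_push_rcsum(skb, len);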
2021-04-04  Merge tag 'mlx5-updates-2021-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (David S. Miller)

Saeed Mahameed says:

====================
mlx5-updates-2021-04-02

This series provides trivial updates and cleanups to the mlx5 driver:

1) Support for matching on the ct_state inv and rel flags in connection tracking
2) Reject TC rules that redirect from a VF to itself
3) E-Switch cleanups from Parav, summarized as:
   3.1) Packing and reducing structure sizes
   3.2) Dynamic allocation of rate limit tables and structures
4) Vu makes the netdev arfs and vlan table allocations dynamic
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-03  Merge branch 'stmmac-xdp'  (David S. Miller)

Ong Boon Leong says:

====================
stmmac: Add XDP support

This is the v4 patch series for adding XDP native support to stmmac.

Changes in v4:
5/6: Move the TX clean timer setup to the end of the NAPI RX process and group it under stmmac_finalize_xdp_rx(). Also, fix stmmac_xdp_xmit_back() to return STMMAC_XDP_CONSUMED if the XDP buffer to XDP frame conversion fails.
6/6: Move xdp_do_flush() into stmmac_finalize_xdp_rx() and combine the XDP verdicts of XDP TX and XDP REDIRECT together.

I retested the patch series with 'xdp2' and 'xdp_redirect' against the changes above and found the results satisfactory.

History of previous patch series:
v3: https://patchwork.kernel.org/project/netdevbpf/cover/20210331154135.8507-1-boon.leong.ong@intel.com/
v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=457757
v1: https://patchwork.kernel.org/project/netdevbpf/list/?series=457139

It would be great if the community could help to test or review the v4 series and provide input.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-03  net: stmmac: Add support for XDP_REDIRECT action  (Ong Boon Leong)

This patch adds support for XDP_REDIRECT to another remote CPU for further action. It also implements the ndo_xdp_xmit ops, enabling the driver to transmit packets forwarded to it by an XDP program running on another interface.

This patch has been tested using "xdp_redirect_cpu" for XDP_REDIRECT + drop testing. It has also been tested with the "xdp_redirect" sample app, which can be used to exercise the ndo_xdp_xmit ops. The burst traffic is generated using pktgen_sample03_burst_single_flow.sh in the samples/pktgen directory.

v4: Moved xdp_do_flush() processing into stmmac_finalize_xdp_rx() and combined the XDP verdicts of XDP TX and REDIRECT together.
v3: Added 'nq->trans_start = jiffies' to avoid TX time-outs, as we are sharing the TX queue between the slow path and XDP. Thanks to Jakub Kicinski for pointing this out.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-03  net: stmmac: Add support for XDP_TX action  (Ong Boon Leong)

This patch adds support for the XDP_TX action, which enables an XDP program to transmit back received frames.

This patch has been tested with the "xdp2" app located in the samples/bpf dir. The DUT receives burst traffic generated using the pktgen script 'pktgen_sample03_burst_single_flow.sh'.

v4: Moved stmmac_tx_timer_arm() to be done once at the end of NAPI RX. Fixed stmmac_xdp_xmit_back() to return STMMAC_XDP_CONSUMED if the XDP buffer to frame conversion fails. Thanks to Jakub's input.
v3: Added 'nq->trans_start = jiffies' to avoid TX time-outs, as we are sharing the TX queue between the slow path and XDP. Thanks to Jakub Kicinski for pointing this out.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-03  net: stmmac: Add initial XDP support  (Ong Boon Leong)

This patch adds initial XDP support to the stmmac driver. It supports the XDP_PASS, XDP_DROP and XDP_ABORTED actions; upcoming patches will add support for XDP_TX and XDP_REDIRECT.

To support XDP headroom, this patch adds a page_offset to the RX buffer and changes the dma_sync_single_for_device|cpu() calls; the DMA addresses used for RX operations now take page_offset into account too. As page_pool can handle dma_sync_single_for_device() on behalf of the driver with the PP_FLAG_DMA_SYNC_DEV flag, we skip doing that in the stmmac driver.

The stmmac driver currently supports split header (SPH) on RX, but the flexibility of splitting header and payload at different positions makes it very complex to support under XDP processing. In addition, jumbo frames are not supported with XDP, to keep the initial code simple.

This patch has been tested with the sample app "xdp1" located in the samples/bpf directory, for both SKB and Native (XDP) mode. The burst traffic was generated using pktgen_sample03_burst_single_flow.sh in the samples/pktgen directory.

Changes in v3:
- factor in the xdp header and tail adjustments done by the XDP program. Thanks to Jakub Kicinski for pointing out the gap in v2.
Changes in v2:
- fix for "warning: variable 'len' set but not used" reported by lkp.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
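A hedged sketch of the usual RX verdict dispatch a driver implements for these actions ('ndev', 'prog' and the recycling comments are illustrative, not stmmac's exact helpers):

    u32 act = bpf_prog_run_xdp(prog, &xdp);

    switch (act) {
    case XDP_PASS:
            /* continue down the normal skb build path */
            break;
    default:
            bpf_warn_invalid_xdp_action(act);
            fallthrough;
    case XDP_ABORTED:
            trace_xdp_exception(ndev, prog, act);
            fallthrough;
    case XDP_DROP:
            /* recycle the page back into the pool */
            break;
    }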
2021-04-03  net: stmmac: arrange Tx tail pointer update to stmmac_flush_tx_descriptors  (Ong Boon Leong)

This patch moves the TX tail pointer update into a new function called stmmac_flush_tx_descriptors(), so that we can reuse it in stmmac_xmit(), stmmac_tso_xmit() and the upcoming XDP implementation.

Changes in v2:
- Fix for warning: unused variable 'desc_size'
  https://patchwork.hopto.org/static/nipa/457321/12170149/build_32bit/stderr

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-03  net: stmmac: make SPH enable/disable configurable  (Ong Boon Leong)

The SPH functionality splits header and payload according to the split mode and offset fields (SPLM and SPLOFST). It is beneficial for Linux network stack RX processing; however, it adds a lot of complexity to XDP processing.

So, this patch stores the split header (SPH) capability of the controller in "priv->sph_cap" and lets "priv->sph" decide whether SPH is enabled or disabled. This prepares the initial XDP enabling for stmmac, which disables the use of SPH whenever XDP is enabled.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-03  net: stmmac: set IRQ affinity hint for multi MSI vectors  (Ong Boon Leong)

Certain platforms, such as Intel mGBE, have independent hardware IRQ resources for TX and RX DMA operation. In preparation for supporting XDP TX, we add an IRQ affinity hint to group both the RX and TX queues of the same queue ID onto the same CPU.

Changes in v2:
- The IRQ affinity hint needs to be set to NULL before the IRQ is released. Thanks to the issue reported by Song, Yoong Siang.

Reported-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
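A minimal sketch of the pairing and the teardown rule (the per-queue struct 'q' is illustrative):

    /* The kernel stores the pointer passed as the hint, so the cpumask
     * must outlive it -- keep it in per-queue state, not on the stack.
     */
    cpumask_set_cpu(cpu, &q->affinity_mask);
    irq_set_affinity_hint(q->rx_irq, &q->affinity_mask);
    irq_set_affinity_hint(q->tx_irq, &q->affinity_mask);

    /* Teardown: clear the hint before releasing the IRQ. */
    irq_set_affinity_hint(q->rx_irq, NULL);
    irq_set_affinity_hint(q->tx_irq, NULL);
    free_irq(q->rx_irq, q);
    free_irq(q->tx_irq, q);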
2021-04-02  net/mlx5e: Dynamic alloc vlan table for netdev when needed  (Vu Pham)

Dynamically allocate the vlan table in mlx5e_priv for the EN netdev when needed. Don't allocate it for the representor netdev.

Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

2021-04-02  net/mlx5e: Dynamic alloc arfs table for netdev when needed  (Vu Pham)

Dynamically allocate the arfs table in mlx5e_priv for the EN netdev when needed. Don't allocate it for the representor netdev.

Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

2021-04-02  net/mlx5e: Reject tc rules which redirect from a VF to itself  (Ariel Levkovich)

Since there are self-loopback prevention mechanisms at the VF level, offloading rules which redirect from a VF to itself in the eswitch will break the datapath: the packets will be dropped once they go back to the vport they came from. Therefore, such rules are rejected for offload and left to be handled by SW.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-02  net/mlx5: Use ida_alloc_range() instead of ida_simple_alloc()  (Roi Dayan)

The ida_simple_*() allocate and remove functions are deprecated.

Related change: commit 3264ceec8f17 ("lib/idr.c: document that ida_simple_{get,remove}() are deprecated")

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
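A hedged sketch of the conversion; note the off-by-one: ida_simple_get() takes an exclusive 'end' while ida_alloc_range() takes an inclusive 'max':

    /* Before (deprecated wrappers) */
    id = ida_simple_get(&ida, min, max + 1, GFP_KERNEL);
    ...
    ida_simple_remove(&ida, id);

    /* After */
    id = ida_alloc_range(&ida, min, max, GFP_KERNEL);  /* max is inclusive */
    ...
    ida_free(&ida, id);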
2021-04-02  net/mlx5: E-Switch, move QoS specific fields to existing qos struct  (Parav Pandit)

Function QoS related fields are already defined in the qos struct; only the min and max rate were left in the mlx5_vport_info struct. Move them into the existing qos struct.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

2021-04-02  net/mlx5: E-Switch, cut down mlx5_vport_info structure size by 8 bytes  (Parav Pandit)

Structure mlx5_vport_info consumes 40 bytes of space due to a hole in it. After packing, it reduces to 32 bytes.

Currently:

$ pahole -C mlx5_vport_info drivers/net/ethernet/mellanox/mlx5/core/eswitch.o
struct mlx5_vport_info {
        u8         mac[6];        /*  0  6 */
        u16        vlan;          /*  6  2 */
        u8         qos;           /*  8  1 */

        /* XXX 7 bytes hole, try to pack */

        u64        node_guid;     /* 16  8 */
        int        link_state;    /* 24  4 */
        u32        min_rate;      /* 28  4 */
        u32        max_rate;      /* 32  4 */
        bool       spoofchk;      /* 36  1 */
        bool       trusted;       /* 37  1 */

        /* size: 40, cachelines: 1, members: 9 */
        /* sum members: 31, holes: 1, sum holes: 7 */
        /* padding: 2 */
        /* last cacheline: 40 bytes */
};

After packing:

$ pahole -C mlx5_vport_info drivers/net/ethernet/mellanox/mlx5/core/eswitch.o
struct mlx5_vport_info {
        u8         mac[6];        /*  0  6 */
        u16        vlan;          /*  6  2 */
        u64        node_guid;     /*  8  8 */
        int        link_state;    /* 16  4 */
        u32        min_rate;      /* 20  4 */
        u32        max_rate;      /* 24  4 */
        u8         qos;           /* 28  1 */
        u8         spoofchk:1;    /* 29: 0  1 */
        u8         trusted:1;     /* 29: 1  1 */

        /* size: 32, cachelines: 1, members: 9 */
        /* padding: 2 */
        /* bit_padding: 6 bits */
        /* last cacheline: 32 bytes */
};

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-02  net/mlx5: Pair mutex_destroy with mutex_init for rate limit table  (Parav Pandit)

Add the missing mutex_destroy() to pair with mutex_init(). Since destroy should happen only for an initialized table, perform mutex_init() only when the table is initialized.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-02  net/mlx5: Allocate rate limit table when rate is configured  (Parav Pandit)

A device supports 128 rate limiters. A static table allocation consumes 8KB of memory even when rate is not configured. Instead, allocate the table when at least one rate is configured.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
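A minimal sketch of the lazy-allocation pattern, assuming an illustrative table struct (not the exact mlx5 fields):

    /* Called with the table mutex held, before the first entry is used. */
    static int rl_table_get(struct rl_table *table)
    {
            if (table->entries)     /* already allocated */
                    return 0;

            table->entries = kcalloc(table->max_size,
                                     sizeof(*table->entries), GFP_KERNEL);
            return table->entries ? 0 : -ENOMEM;
    }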
2021-04-02  net/mlx5: Use helper to increment, decrement rate entry refcount  (Parav Pandit)

The rate limit entry refcount can be incremented uniformly whether the entry is newly allocated or reused, so simplify the code to increment the refcount in one place. Use the decrement refcount helper in two routines.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

2021-04-02  net/mlx5: Use helpers to allocate and free rl table entries  (Parav Pandit)

Use helper routines to allocate and free rate limit table entries. A subsequent patch extends the use of these helpers to do allocation during the rate entry allocation callback.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

2021-04-02  net/mlx5: Do not hold mutex while reading table constants  (Parav Pandit)

Table max_size, min and max rate are constants initialized when the table is created; reading them doesn't need to hold the table mutex. Hence, read them without holding it.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-02  net/mlx5: Pack mlx5_rl_entry structure  (Parav Pandit)

The mlx5_rl_entry structure is not properly packed, as shown below. Because of this, an array of 9144 bytes is allocated, which gets aligned to 16 Kbytes. Hence, pack the structure and avoid the waste. This offers 8 Kbytes of saving per mlx5_core_dev struct.

pahole -C mlx5_rl_entry drivers/net/ethernet/mellanox/mlx5/core/en_main.o

Existing layout:

struct mlx5_rl_entry {
        u8         rl_raw[48];     /*  0 48 */
        u16        index;          /* 48  2 */

        /* XXX 6 bytes hole, try to pack */

        u64        refcount;       /* 56  8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u16        uid;            /* 64  2 */
        u8         dedicated:1;    /* 66: 0  1 */

        /* size: 72, cachelines: 2, members: 5 */
        /* sum members: 60, holes: 1, sum holes: 6 */
        /* sum bitfield members: 1 bits (0 bytes) */
        /* padding: 5 */
        /* bit_padding: 7 bits */
        /* last cacheline: 8 bytes */
};

After alignment:

struct mlx5_rl_entry {
        u8         rl_raw[48];     /*  0 48 */
        u64        refcount;       /* 48  8 */
        u16        index;          /* 56  2 */
        u16        uid;            /* 58  2 */
        u8         dedicated:1;    /* 60: 0  1 */

        /* size: 64, cachelines: 1, members: 5 */
        /* padding: 3 */
        /* bit_padding: 7 bits */
};

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

2021-04-02  net/mlx5: Use unsigned int for free_count  (Parav Pandit)

Fix the warning due to the missing 'int':

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+	unsigned free_count;

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-02  net/mlx5: CT: Add support for matching on ct_state inv and rel flags  (Ariel Levkovich)

Add support for matching on ct_state inv and rel flags. Currently the support is only for match on -inv and -rel. Matching on +inv and +rel will be rejected.

Example:

$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
    ct_state -est-rel+trk \
    action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
    ct_state +trk+est-inv \
    action mirred egress redirect dev ens1f0_0

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-02  tcp: reorder tcp_congestion_ops for better cache locality  (Eric Dumazet)

Group all the often used fields in the first cache line to reduce cache line misses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  net: reorganize fields in netns_mib  (Eric Dumazet)

Order fields to increase locality for the most used protocols; udplite and icmp are moved to the end. Same for proc_net_devsnmp6, which is not used in the fast path. This potentially saves one cache line miss for typical TCP/UDP over IPv4/IPv6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  nfc: pn533: prevent potential memory corruption  (Dan Carpenter)

If the "type_a->nfcid_len" is too large then it would lead to memory corruption in pn533_target_found_type_a() when we do:

	memcpy(nfc_tgt->nfcid1, tgt_type_a->nfcid_data, nfc_tgt->nfcid1_len);

Fixes: c3b1e1e8a76f ("NFC: Export NFCID1 from pn533")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  Merge branch 'dpaa2-rx-copybreak'  (David S. Miller)

Ioana Ciornei says:

====================
dpaa2-eth: add rx copybreak support

DMA unmapping, allocating a new buffer and DMA mapping it back on the refill path is really not that efficient. Proper buffer recycling (page pool, flipping the page and using the other half) cannot be done for DPAA2 since it's not a ring based controller; rather, it deals with multiple queues which all get their buffers from the same buffer pool on Rx.

To circumvent these limitations, add support for Rx copybreak in dpaa2-eth.

Below is a summary of the tests that were run to end up with the default rx copybreak value of 512. A bit about the setup: an LS2088A SoC, 8 x Cortex A72 @ 1.8GHz, IPfwd zero loss test @ 20Gbit/s throughput. Multiple frame sizes were tested to get an idea where the break-even point is.

Here are 2 sets of results: (1) is the baseline and (2) is just allocating a new skb for all frame sizes received (as if the copybreak were equal to the MTU). All numbers are in Mpps.

        64    128   256   512   640   768   896
(1)     3.23  3.23  3.24  3.21  3.1   2.76  2.71
(2)     3.95  3.88  3.79  3.62  3.3   3.02  2.65

It seems that even for 512-byte frame sizes it's comfortably better when allocating a new skb. After that, we see diminishing returns or even worse.

Changes in v2:
- properly marked dpaa2_eth_copybreak as static
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  dpaa2-eth: export the rx copybreak value as an ethtool tunable  (Ioana Ciornei)

It's useful, especially for debugging purposes, to have the Rx copybreak value changeable at runtime. Export it as an ethtool tunable.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  dpaa2-eth: add rx copybreak support  (Ioana Ciornei)

DMA unmapping, allocating a new buffer and DMA mapping it back on the refill path is really not that efficient. Proper buffer recycling (page pool, flipping the page and using the other half) cannot be done for DPAA2 since it's not a ring based controller; rather, it deals with multiple queues which all get their buffers from the same buffer pool on Rx.

To circumvent these limitations, add support for Rx copybreak. For small sized packets, instead of creating an skb around the buffer in which the frame was received, allocate a new skb altogether, copy the contents of the frame and release the initial page back into the buffer pool.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
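A minimal sketch of the decision added on the Rx path (the threshold variable and the recycle helper stand in for the driver's own; 'vaddr' is the mapped frame data):

    /* For small frames, copy into a fresh skb and give the original
     * buffer straight back to the pool, skipping unmap/alloc/map on
     * the refill path.
     */
    if (len <= rx_copybreak) {
            skb = napi_alloc_skb(napi, len);
            if (skb) {
                    skb_put_data(skb, vaddr, len);
                    recycle_buf(priv, ch, addr);  /* driver's recycle helper */
                    return skb;
            }
    }
    /* Otherwise, fall back to building the skb around the buffer. */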
2021-04-02  dpaa2-eth: rename dpaa2_eth_xdp_release_buf into dpaa2_eth_recycle_buf  (Ioana Ciornei)

Rename the dpaa2_eth_xdp_release_buf function into dpaa2_eth_recycle_buf since in the next patches we'll be using the same recycle mechanism for the normal stack path besides XDP_DROP. Also, rename the array which holds the buffers to be recycled so that it does not have any reference to XDP.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  Merge branch 'mptcp-misc'  (David S. Miller)

Mat Martineau says:

====================
MPTCP: Miscellaneous changes

Here is a collection of patches from the MPTCP tree:

Patches 1 and 2 add some helpful MIB counters for connection information.

Patch 3 cleans up some unnecessary checks.

Patch 4 is a new feature, support for the MP_TCPRST option. This option is used when resetting one subflow within a MPTCP connection, and provides a reason code that the recipient can use when deciding how to adapt to the lost subflow.

Patches 5-7 update the existing MPTCP selftests to improve timeout handling and to share better information when tests fail.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  selftests: mptcp: dump more info on mpjoin errors  (Matthieu Baerts)

Very occasionally, MPTCP selftests fail. Yeah, I saw that at least once! Here we provide more details in case of errors with the mptcp_join.sh script, like it was done with mptcp_connect.sh; see commit 767389c8dd55 ("selftests: mptcp: dump more info on errors").

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  selftests: mptcp: init nstat history  (Matthieu Baerts)

Initialize the nstat history so we are not impacted by packets sent between sub-tests.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  selftests: mptcp: launch mptcp_connect with timeout  (Matthieu Baerts)

'mptcp_connect' already has a timeout for poll(), but in some cases it is not enough. With the "timeout" tool, we force the command to fail if it doesn't finish on time. Thanks to that, the script will continue and display details about the current state before marking the test as failed. Displaying this state is very important for understanding the issue. It is better to have our CI report the issue than just "the test hanged".

Note that in mptcp_connect.sh, we were using a long timeout to validate the fact that we cannot create a socket if a sysctl is set. We don't need this timeout.

In diag.sh, we want to send signals to the mptcp_connect instances that have been started in the netns. But we cannot send these signals to 'timeout', otherwise that would stop the timeout, and messages telling us SIGUSR1 has been received would be printed. Instead of trying to find the right PIDs and storing them in an array, we can simply use the output of 'ip netns pids', which is all the PIDs we want to send signals to.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/160
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  mptcp: add mptcp reset option support  (Florian Westphal)

The MPTCP reset option carries an mptcp-specific error code that provides more information on the nature of a connection reset. Reset option data received gets stored in the subflow context so it can be sent to userspace via the 'subflow closed' netlink event.

When a subflow is closed, the desired error code that should be sent to the peer is also placed in the subflow context structure. If a reset is sent before subflow establishment could complete, e.g. on HMAC failure during an MP_JOIN operation, the mptcp skb extension is used to store the reset information.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  mptcp: remove unneeded check on first subflow  (Paolo Abeni)

Currently we explicitly check for the first subflow being NULL in a couple of places, even though we don't need any special action in such a scenario. Just drop the unneeded checks, to avoid confusion.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  mptcp: add active MPC mibs  (Paolo Abeni)

We are not currently tracking active MPTCP connection attempts. Let's add the related counters.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  mptcp: add mib for token creation fallback  (Paolo Abeni)

If the MPTCP protocol is unable to create a new token, the socket falls back to plain TCP; let's keep track of such events via a specific MIB.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02  Merge branch 'ionic-ptp'  (David S. Miller)

Shannon Nelson says:

====================
ionic: add PTP and hw clock support

This patchset adds support for accessing the DSC hardware clock and for offloading PTP timestamping.

Tx packet timestamping happens through a separate Tx queue set up with expanded completion descriptors that can report the timestamp. Rx timestamping can happen either on all queues, or on a separate timestamping queue when specific filtering is requested. Again, the timestamps are reported with the expanded completion descriptors.

The timestamping offload ability is advertised but not enabled until an OS service asks for it. At that time the driver's queues are reconfigured to use the different completion descriptors and the private processing queues as needed.

Reading the raw clock value comes through a new pair of values in the device info registers in BAR0. These high and low values are interpreted with help from new clock mask, mult, and shift values in the device identity information.

First we add the ability to detect new queue features, then the handling of the new descriptor sizes. After adding the new interface structures, we start adding the support code, saving the advertising to the stack for last.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  ionic: advertise support for hardware timestamps  (Shannon Nelson)

Let the network stack know we've got support for timestamping the packets.

Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

2021-04-02  ionic: ethtool ptp stats  (Shannon Nelson)

Add the new hwstamp stats to our ethtool stats output.

Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>