Age | Commit message (Collapse) | Author |
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style. This fixes various indentation mixups (seven spaces,
tab+one space, etc).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If rvu_get_blkaddr() fails, then this rvu_cgx_nix_cuml_stats() returns
zero and we write some uninitialized data into the debugfs output.
On the error paths, the use of the uninitialized "*stat" is harmless,
but it will lead to a Smatch warning (static analysis) and a UBSan
warning (runtime analysis) so we should prevent that as well.
Fixes: f967488d095e ("octeontx2-af: Add per CGX port level NIX Rx/Tx counters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for the soft status and control register, which allows
TX_FAULT and RX_LOS to be monitored and TX_DISABLE to be set. We
make use of this when the board does not support GPIOs for these
signals.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marc Micalizzi reports that Huawei MA5671A and Alcatel/Lucent G-010S-P
modules are capable of 2500base-X, but incorrectly report their
capabilities in the EEPROM. It seems rather common that GPON modules
mis-report.
Let's fix these modules by adding some quirks.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for applying module quirks to the list of supported
ethtool link modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-11-20
The following pull-request contains BPF updates for your *net-next* tree.
We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).
There are 3 trivial conflicts, resolve it by always taking the chunk from
196e8ca74886c433:
<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
<<<<<<< HEAD
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
/* kmalloc()'ed memory can't be mmap()'ed */
if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
The main changes are:
1) Addition of BPF trampoline which works as a bridge between kernel functions,
BPF programs and other BPF programs along with two new use cases: i) fentry/fexit
BPF programs for tracing with practically zero overhead to call into BPF (as
opposed to k[ret]probes) and ii) attachment of the former to networking related
programs to see input/output of networking programs (covering xdpdump use case),
from Alexei Starovoitov.
2) BPF array map mmap support and use in libbpf for global data maps; also a big
batch of libbpf improvements, among others, support for reading bitfields in a
relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.
3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.
4) Add BPF audit support and emit messages upon successful prog load and unload in
order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.
5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
(XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.
6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
call named bpf_get_link_xdp_info() for retrieving the full set of prog
IDs attached to XDP, from Toke Høiland-Jørgensen.
7) Add BTF support for array of int, array of struct and multidimensional arrays
and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.
8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.
9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid
xdping to be run as standalone, from Jiri Benc.
10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.
11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.
12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Only values 0 and 1 are currently defined as parameters for
PHY_MDIO_CHG. Instead of silently ignoring unknown values and
misinterpreting the firmware code let's explicitly check.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using macro FIELD_SIZEOF makes this define easier understandable.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We're not in atomic context here, therefore switch to msleep.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Get rid of costly dma_sync_single_for_device in mvneta_rx_refill
since now the driver can let page_pool API to manage needed DMA
sync with a proper size.
- XDP_DROP DMA sync managed by mvneta driver: ~420Kpps
- XDP_DROP DMA sync managed by page_pool API: ~585Kpps
Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rely on page_pool_recycle_direct and not on xdp_return_buff in
mvneta_run_xdp. This is a preliminary patch to limit the dma sync len
to the one strictly necessary
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add TC-MATCHALL classifier ingress offload support. The same actions
supported by existing TC-FLOWER offload can be applied to all incoming
traffic on the underlying interface.
Ensure the rule priority doesn't conflict with existing rules in the
TCAM. Only 1 ingress matchall rule can be active at a time on the
underlying interface.
v5:
- No change.
v4:
- Added check to ensure the matchall rule's prio doesn't conflict with
other rules in TCAM.
- Added logic to fill default mask for VIID, if none has been
provided, to prevent conflict with duplicate VIID rules.
- Used existing variables in private structure to fill VIID info,
instead of extracting the info manually.
v3:
- No change.
v2:
- Removed logic to fetch free index from end of TCAM. Must maintain
same ordering as in kernel.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Only offload rule if it satisfies both of the following conditions:
1. The immediate previous rule has priority <= current rule's priority.
2. The immediate next rule has priority >= current rule's priority.
Also rework free entry fetch logic to search from end of TCAM, instead
of beginning, because higher indices have lower priority than lower
indices. This is similar to how TC auto generates priority values.
v5:
- Fixed commit message and comment to include comparison for equal
priority.
v4:
- Patch added in this version.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add TC-MATCHALL classifier offload with TC-POLICE action applied for
all outgoing traffic on the underlying interface. Split flow block
offload to support both egress and ingress classification.
For example, to rate limit all outgoing traffic to 1 Gbps:
$ tc qdisc add dev enp2s0f4 clsact
$ tc filter add dev enp2s0f4 egress matchall skip_sw \
action police rate 1Gbit burst 8Kbit
Note that skip_sw is important. Otherwise, both stack and hardware
will end up doing policing. Policing can't be shared across flow
blocks. Only 1 egress matchall rule can be active at a time on the
underlying interface.
v5:
- No change.
v4:
- Removed check to reject police offload if prio is not 1.
- Moved TC_SETUP_BLOCK code to separate function.
v3:
- Added check to reject police offload if prio is not 1.
- Assign block_shared variable only for TC_SETUP_BLOCK.
v2:
- Added check to reject flow block sharing for policers.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Once every napi poll cycle, check if numa node is different than
the page pool's numa id, and update it using page_pool_update_nid().
Alternatively, we could have registered an irq affinity change handler,
but page_pool_update_nid() must be called from napi context anyways, so
the handler won't actually help.
Performance testing:
XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
while migrating rx ring irq from close to far numa:
mlx5 internal page cache was locally disabled to get pure page pool
results.
CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
XDP Drop/TX single core:
NUMA | XDP | Before | After
---------------------------------------
Close | Drop | 11 Mpps | 10.9 Mpps
Far | Drop | 4.4 Mpps | 5.8 Mpps
Close | TX | 6.5 Mpps | 6.5 Mpps
Far | TX | 3.5 Mpps | 4 Mpps
Improvement is about 30% drop packet rate, 15% tx packet rate for numa
far test.
No degradation for numa close tests.
TCP single/multi cpu/stream:
NUMA | #cpu | Before | After
--------------------------------------
Close | 1 | 18 Gbps | 18 Gbps
Far | 1 | 15 Gbps | 18 Gbps
Close | 12 | 80 Gbps | 80 Gbps
Far | 12 | 68 Gbps | 80 Gbps
In all test cases we see improvement for the far numa case, and no
impact on the close numa case.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CPSW switchdev based driver which is operating in dual-emac mode by
default, thus working as 2 individual network interfaces. The Switch mode
can be enabled by configuring devlink driver parameter "switch_mode" to 1:
devlink dev param set platform/48484000.switch \
name switch_mode value 1 cmode runtime
This can be done regardless of the state of Port's netdevs - UP/DOWN, but
Port's netdev devices have to be UP before joining the bridge to avoid
overwriting of bridge configuration as CPSW switch driver completely
reloads its configuration when first Port changes its state to UP.
When the both interfaces joined the bridge - CPSW switch driver will start
marking packets with offload_fwd_mark flag unless "ale_bypass=0".
All configuration is implemented via switchdev API and notifiers.
Supported:
- SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS
- SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS: BR_MCAST_FLOOD
- SWITCHDEV_ATTR_ID_PORT_STP_STATE
- SWITCHDEV_OBJ_ID_PORT_VLAN
- SWITCHDEV_OBJ_ID_PORT_MDB
- SWITCHDEV_OBJ_ID_HOST_MDB
Hence CPSW switchdev driver supports:
- FDB offloading
- MDB offloading
- VLAN filtering and offloading
- STP
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Part 1:
Introduce basic CPSW dual_mac driver (cpsw_new.c) which is operating in
dual-emac mode by default, thus working as 2 individual network interfaces.
Main differences from legacy CPSW driver are:
- optimized promiscuous mode: The P0_UNI_FLOOD (both ports) is enabled in
addition to ALLMULTI (current port) instead of ALE_BYPASS. So, Ports in
promiscuous mode will keep possibility of mcast and vlan filtering, which
is provides significant benefits when ports are joined to the same bridge,
but without enabling "switch" mode, or to different bridges.
- learning disabled on ports as it make not too much sense for
segregated ports - no forwarding in HW.
- enabled basic support for devlink.
devlink dev show
platform/48484000.switch
devlink dev param show
platform/48484000.switch:
name ale_bypass type driver-specific
values:
cmode runtime value false
- "ale_bypass" devlink driver parameter allows to enable
ALE_CONTROL(4).BYPASS mode for debug purposes.
- updated DT bindings.
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As a preparatory patch to add support for a switchdev based cpsw driver,
move common functions to cpsw-priv.c so that they can be used across both
drivers.
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A following patches introduce new CPSW switchdev driver which uses common
code with legacy CPSW driver. This will introduce build dependency between
CPSW switchdev and CPSW legacy drivers related to for_each_slave() and
cpsw_slave_index() - they can be compiled both, but only one of them will
be not functional depending in Kconfig settings due to duffrences in Slave
Ports indexes calculation.
To fix this make for_each_slave() local (it's used now only by legacy CPSW
driver) and convert cpsw_slave_index() to be a function pointer which is
assigned in probe. Driver to probe is defined by DT.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A following patch introduces switchdev functionality, so modify
ALE engine VLANs/MDBs API:
- cpsw_ale_del_mcast(): update so it will remove only selected ports from
mcast port_mask or delete whole mcast record if !port_mask
- cpsw_ale_del_vlan(): update so it will remove only selected ports from
all VLAN record's masks or delete whole VLAN record if !port_mask
- add cpsw_ale_vlan_add_modify() to add or modify existing VLAN record's
masks
- add cpsw_ale_set_unreg_mcast() for enabling unreg mcast on port VLANs
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now untagged vlan traffic is not support on Host P0 port. This patch adds
in ALE context bitmap of VLANs for which Host P0 port bit set in Force
Untagged Packet Egress bitmask in VLANs ALE entries, and adds corresponding
check in VLAN incapsulation header parsing function cpsw_rx_vlan_encap().
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clean CPSW ALE on init and intf restart (up/down) to avoid reading obsolete
or garbage entries from ALE table.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Semicolon is not required at the end of switch block. So, remove it.
Addresses coccinelle warning:
drivers/net/ethernet/chelsio/cxgb4/sge.c:2260:2-3: Unneeded semicolon
Fixes: 4846d5330daf ("cxgb4: add Tx and Rx path for ETHOFLD traffic")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On the NXP LS1028A, there are 2 Ethernet links between the Felix switch
and the ENETC:
- eno2 <-> swp4, at 2.5G
- eno3 <-> swp5, at 1G
Only one of the above Ethernet port pairs can act as a DSA link for
tagging.
When adding initial support for the driver, it was tested only on the 1G
eno3 <-> swp5 interface, due to the necessity of using PHYLIB initially
(which treats fixed-link interfaces as emulated C22 PHYs, so it doesn't
support fixed-link speeds higher than 1G).
After making PHYLINK work, it appears that swp4 still can't act as CPU
port. So it looks like ocelot_set_cpu_port was being called for swp4,
but then it was called again for swp5, overwriting the CPU port assigned
in the DT.
It appears that when you call dsa_upstream_port for a port that is not
defined in the device tree (such as swp5 when using swp4 as CPU port),
its dp->cpu_dp pointer is not initialized by dsa_tree_setup_default_cpu,
and this trips up the following condition in dsa_upstream_port:
if (!cpu_dp)
return port;
So the moral of the story is: don't call dsa_upstream_port for a port
that is not defined in the device tree, and therefore its dsa_port
structure is not completely initialized (ds->num_ports is still 6).
Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the case where the call to phy_interface_is_rgmii returns zero
the variable ret is left uninitialized and this is returned at
the end of the function dp83869_configure_rgmii. Fix this by
returning 0 instead of the uninitialized value in ret.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 01db923e8377 ("net: phy: dp83869: Add TI dp83869 phy")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is especially beneficial during the NVRAM related firmware
commands that have longer timeouts. If the BNXT_STATE_FW_FATAL_COND
flag gets set while waiting for firmware response, abort and return
error.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During loss of heartbeat, log this warning message.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For NVM params that are not supported in the current NVM
configuration, return the error as -EOPNOTSUPP.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Report health status update to devlink health reporter, once
reset is completed.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Linux driver is capable of being the master function to handle
resets, so we set the flag to let firmware know. Some other
drivers, such as DPDK, is not capable and will not set the flag.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If firmware supports hot reset, extend ETHTOOL_RESET to support
hot reset driver which does not require a driver reload after
ETHTOOL_RESET. The driver will go through the same coordinated
reset sequence as a firmware initiated fatal/non-fatal reset.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the larger HWRM_COREDUMP_TIMEOUT value for coredump related
data response from the firmware. These commands take longer than
normal commands.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When hardware reports RX buffer errors, the latest 57500 chips do not
require reset. The packet is discarded by the hardware and the
ring will continue to operate.
Also, add an rx_buf_errors counter for this type of error. It can help
the user to identify if the aggregation ring is too small.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The aRFS ring table interface has changed for the 57500 chips. Updating
it accordingly so it will work with the latest production firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently match clause 45 PHYs using any ID read from a MMD marked
as present in the "Devices in package" registers 5 and 6. However,
this is incorrect. 45.2 says:
"The definition of the term package is vendor specific and could be
a chip, module, or other similar entity."
so a package could be more or less than the whole PHY - a PHY could be
made up of several modules instantiated onto a single chip such as the
Marvell 88x3310, or some of the MMDs could be disabled according to
chip configuration, such as the Broadcom 84881.
In the case of Broadcom 84881, the "Devices in package" registers
contain 0xc000009b, meaning that there is a PHYXS present in the
package, but all registers in MMD 4 return 0xffff. This leads to our
matching code incorrectly binding this PHY to one of our generic PHY
drivers.
This patch changes the way we determine whether to attempt to match a
MMD identifier, or use it to request a module - if the identifier is
all-ones, then we skip over it. When reading the identifiers, we
initialise phydev->c45_ids.device_ids to all-ones, only reading the
device ID if the "Devices in package" registers indicates we should.
This avoids the generic drivers incorrectly matching on a PHY ID of
0xffffffff.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for SFP+ cages to the Marvell 10G PHY driver. This is
slightly complicated by the way phylib works in that we need to use
a multi-step process to attach the SFP bus, and we also need to track
the phylink state machine to know when the module's transmit disable
signal should change state.
With appropriate DT changes, this allows the SFP+ canges on the
Macchiatobin platform to be functional.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add core phylib help for supporting SFP sockets on PHYs. This provides
a mechanism to inform the SFP layer about PHY up/down events, and also
unregister the SFP bus when the PHY is going away.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similarly to bpf_map's refcnt/usercnt, convert bpf_prog's refcnt to atomic64
and remove artificial 32k limit. This allows to make bpf_prog's refcounting
non-failing, simplifying logic of users of bpf_prog_add/bpf_prog_inc.
Validated compilation by running allyesconfig kernel build.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-3-andriin@fb.com
|
|
92117d8443bc ("bpf: fix refcnt overflow") turned refcounting of bpf_map into
potentially failing operation, when refcount reaches BPF_MAX_REFCNT limit
(32k). Due to using 32-bit counter, it's possible in practice to overflow
refcounter and make it wrap around to 0, causing erroneous map free, while
there are still references to it, causing use-after-free problems.
But having a failing refcounting operations are problematic in some cases. One
example is mmap() interface. After establishing initial memory-mapping, user
is allowed to arbitrarily map/remap/unmap parts of mapped memory, arbitrarily
splitting it into multiple non-contiguous regions. All this happening without
any control from the users of mmap subsystem. Rather mmap subsystem sends
notifications to original creator of memory mapping through open/close
callbacks, which are optionally specified during initial memory mapping
creation. These callbacks are used to maintain accurate refcount for bpf_map
(see next patch in this series). The problem is that open() callback is not
supposed to fail, because memory-mapped resource is set up and properly
referenced. This is posing a problem for using memory-mapping with BPF maps.
One solution to this is to maintain separate refcount for just memory-mappings
and do single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively.
There are similar use cases in current work on tcp-bpf, necessitating extra
counter as well. This seems like a rather unfortunate and ugly solution that
doesn't scale well to various new use cases.
Another approach to solve this is to use non-failing refcount_t type, which
uses 32-bit counter internally, but, once reaching overflow state at UINT_MAX,
stays there. This utlimately causes memory leak, but prevents use after free.
But given refcounting is not the most performance-critical operation with BPF
maps (it's not used from running BPF program code), we can also just switch to
64-bit counter that can't overflow in practice, potentially disadvantaging
32-bit platforms a tiny bit. This simplifies semantics and allows above
described scenarios to not worry about failing refcount increment operation.
In terms of struct bpf_map size, we are still good and use the same amount of
space:
BEFORE (3 cache lines, 8 bytes of padding at the end):
struct bpf_map {
const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /* 0 8 */
struct bpf_map * inner_map_meta; /* 8 8 */
void * security; /* 16 8 */
enum bpf_map_type map_type; /* 24 4 */
u32 key_size; /* 28 4 */
u32 value_size; /* 32 4 */
u32 max_entries; /* 36 4 */
u32 map_flags; /* 40 4 */
int spin_lock_off; /* 44 4 */
u32 id; /* 48 4 */
int numa_node; /* 52 4 */
u32 btf_key_type_id; /* 56 4 */
u32 btf_value_type_id; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct btf * btf; /* 64 8 */
struct bpf_map_memory memory; /* 72 16 */
bool unpriv_array; /* 88 1 */
bool frozen; /* 89 1 */
/* XXX 38 bytes hole, try to pack */
/* --- cacheline 2 boundary (128 bytes) --- */
atomic_t refcnt __attribute__((__aligned__(64))); /* 128 4 */
atomic_t usercnt; /* 132 4 */
struct work_struct work; /* 136 32 */
char name[16]; /* 168 16 */
/* size: 192, cachelines: 3, members: 21 */
/* sum members: 146, holes: 1, sum holes: 38 */
/* padding: 8 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));
AFTER (same 3 cache lines, no extra padding now):
struct bpf_map {
const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /* 0 8 */
struct bpf_map * inner_map_meta; /* 8 8 */
void * security; /* 16 8 */
enum bpf_map_type map_type; /* 24 4 */
u32 key_size; /* 28 4 */
u32 value_size; /* 32 4 */
u32 max_entries; /* 36 4 */
u32 map_flags; /* 40 4 */
int spin_lock_off; /* 44 4 */
u32 id; /* 48 4 */
int numa_node; /* 52 4 */
u32 btf_key_type_id; /* 56 4 */
u32 btf_value_type_id; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct btf * btf; /* 64 8 */
struct bpf_map_memory memory; /* 72 16 */
bool unpriv_array; /* 88 1 */
bool frozen; /* 89 1 */
/* XXX 38 bytes hole, try to pack */
/* --- cacheline 2 boundary (128 bytes) --- */
atomic64_t refcnt __attribute__((__aligned__(64))); /* 128 8 */
atomic64_t usercnt; /* 136 8 */
struct work_struct work; /* 144 32 */
char name[16]; /* 176 16 */
/* size: 192, cachelines: 3, members: 21 */
/* sum members: 154, holes: 1, sum holes: 38 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));
This patch, while modifying all users of bpf_map_inc, also cleans up its
interface to match bpf_map_put with separate operations for bpf_map_inc and
bpf_map_inc_with_uref (to match bpf_map_put and bpf_map_put_with_uref,
respectively). Also, given there are no users of bpf_map_inc_not_zero
specifying uref=true, remove uref flag and default to uref=false internally.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-2-andriin@fb.com
|
|
Lots of overlapping changes and parallel additions, stuff
like that.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
drivers/net/phy/mscc.c:1683:3-4: Unneeded semicolon
Remove unneeded semicolon.
Generated by: scripts/coccinelle/misc/semicolon.cocci
Fixes: 75a1ccfe6c72 ("mscc.c: Add support for additional VSC PHYs")
CC: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Load Realtek-provided firmware for RTL8168fp/RTL8117. Unlike the
firmware for other chip versions which is for the PHY, firmware for
RTL8168fp/RTL8117 is for the MAC.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using constant MII_EXPANSION is misleading here because register 0x06
has a different meaning on page 0x0005. Here a proprietary PHY
parameter is read by writing the parameter id to register 0x05 on page
0x0005, followed by reading the parameter value from register 0x06.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use phy_support_asym_pause() rather than open-coding it.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for v5.5
Second set of patches for v5.5. Nothing special this time, smaller
features to various drivers and of course fixes all over.
Major changes:
iwlwifi
* update scan FW API
* bump the supported FW API version
* add debug dump collection on assert in WoWLAN
* enable adaptive dwell on P2P interfaces
ath10k
* request for PM_QOS_CPU_DMA_LATENCY to improve firmware initialisation time
qtnfmac
* add support for getting/setting transmit power
* handle MIC failure event from firmware
rtl8xxxu
* add support for Edimax EW-7611ULB
wil6210
* add SPDX license identifiers
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch cleans-up the stray left over code. It has no
functionality impact.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports
to balance packets between slaves. With some network errors, we receive an ICMP
error packet by the remote host or a router. If sent by a router, the source IP
can differ from the remote host one. Additionally the ICMP protocol has no port
numbers, so a layer3+4 bonding will get a different hash than the previous one.
These two conditions could let the packet go through a different interface than
the other packets of the same flow:
# tcpdump -qltnni veth0 |sed 's/^/0: /' &
# tcpdump -qltnni veth1 |sed 's/^/1: /' &
# hping3 -2 192.168.0.2 -p 9
0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
An ICMP error packet contains the header of the packet which caused the network
error, so inspect it and match the flow against it, so we can send the ICMP via
the same interface of the previous packet in the flow.
Move the IP and port dissect code into a generic function bond_flow_ip() and if
we are dissecting an ICMP error packet, call it again with the adjusted offset.
# hping3 -2 192.168.0.2 -p 9
1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0
0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0
0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit 0c65b2b90d13c ("net: of_get_phy_mode: Change API to solve
int/unit warnings") updated the function of_get_phy_mode declaration.
Now it returns an error code and in case the node doesn't contain the
property 'phy-mode' or 'phy-connection-type' it returns -EINVAL and would
set the phy_interface_t to PHY_INTERFACE_MODE_NA.
Ocelot VSC7514 has 4 internal phys which have the phy interface
PHY_INTERFACE_MODE_NA. So because of_get_phy_mode would assign
PHY_INTERFACE_MODE_NA to phy_mode when there is an error, there is no need
to add the error check.
Updates for v2:
- drop error check because of_get_phy_mode already assigns phy_interface
to PHY_INTERFACE_MODE in case of error.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver forgets to free allocated netdev in remove like
what is done in probe failure.
Add the free to fix it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All .rw_reset callbacks except bnx2x_84833_hw_reset_phy() use a
void return type. No callers of .hw_reset check a return value and
bnx2x_84833_hw_reset_phy() unconditionally returns 0. Remove all
hw_reset_t casts and fix the return type to void.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|