lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2020-09-17	net: marvell: prestera: Add Switchdev driver implementation	Vadym Kochan
	The following features are supported: - VLAN-aware bridge offloading - VLAN-unaware bridge offloading - FDB offloading (learning, ageing) - Switchport configuration Currently there are some limitations like: - Only 1 VLAN-aware bridge instance supported - FDB ageing timeout parameter is set globally per device Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: marvell: prestera: Add ethtool interface support	Vadym Kochan
	The ethtool API provides support for the configuration of the following features: speed and duplex, auto-negotiation, MDI-x, forward error correction, port media type. The API also provides information about the port status, hardware and software statistic. The following limitation exists: - port media type should be configured before speed setting - ethtool -m option is not supported - ethtool -p option is not supported - ethtool -r option is supported for RJ45 port only - the following combination of parameters is not supported: ethtool -s sw1pX port XX autoneg on - forward error correction feature is supported only on SFP ports, 10G speed - auto-negotiation and MDI-x features are not supported on Copper-to-Fiber SFP module Co-developed-by: Andrii Savka <andrii.savka@plvision.eu> Signed-off-by: Andrii Savka <andrii.savka@plvision.eu> Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: marvell: prestera: Add basic devlink support	Vadym Kochan
	Add very basic support for devlink interface: - driver name - fw version - devlink ports Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: marvell: prestera: Add PCI interface support	Vadym Kochan
	Add PCI interface driver for Prestera Switch ASICs family devices, which provides: - Firmware loading mechanism - Requests & events handling to/from the firmware - Access to the firmware on the bus level The firmware has to be loaded each time the device is reset. The driver is loading it from: /lib/firmware/mrvl/prestera/mvsw_prestera_fw-v{MAJOR}.{MINOR}.img The full firmware image version is located within the internal header and consists of 3 numbers - MAJOR.MINOR.PATCH. Additionally, driver has hard-coded minimum supported firmware version which it can work with: MAJOR - reflects the support on ABI level between driver and loaded firmware, this number should be the same for driver and loaded firmware. MINOR - this is the minimum supported version between driver and the firmware. PATCH - indicates only fixes, firmware ABI is not changed. Firmware image file name contains only MAJOR and MINOR numbers to make driver be compatible with any PATCH version. Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: marvell: prestera: Add driver for Prestera family ASIC devices	Vadym Kochan
	Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8 ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely wireless SMB deployment. The current implementation supports only boards designed for the Marvell Switchdev solution and requires special firmware. The core Prestera switching logic is implemented in prestera_main.c, there is an intermediate hw layer between core logic and firmware. It is implemented in prestera_hw.c, the purpose of it is to encapsulate hw related logic, in future there is a plan to support more devices with different HW related configurations. This patch contains only basic switch initialization and RX/TX support over SDMA mechanism. Currently supported devices have DMA access range <= 32bit and require ZONE_DMA to be enabled, for such cases SDMA driver checks if the skb allocated in proper range supported by the Prestera device. Also meanwhile there is no TX interrupt support in current firmware version so recycling work is scheduled on each xmit. Port's mac address is generated from the switch base mac which may be provided via device-tree (static one or as nvme cell), or randomly generated. This is required by the firmware. Co-developed-by: Andrii Savka <andrii.savka@plvision.eu> Signed-off-by: Andrii Savka <andrii.savka@plvision.eu> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu> Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu> Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: hns3: use napi_consume_skb() when cleaning tx desc	Yunsheng Lin
	Use napi_consume_skb() to batch consuming skb when cleaning tx desc in NAPI polling. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: hns3: use writel() to optimize the barrier operation	Yunsheng Lin
	writel() can be used to order I/O vs memory by default when writing portable drivers. Use writel() to replace wmb() + writel_relaxed(), and writel() is dma_wmb() + writel_relaxed() for ARM64, so there is an optimization here because dma_wmb() is a lighter barrier than wmb(). Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: hns3: optimize the rx clean process	Yunsheng Lin
	Currently HNS3_RING_RX_RING_FBDNUM_REG register is read to determine how many rx desc can be cleaned. To avoid the register read operation in the critical data path, use the valid bit in the rx desc to determine if a specific rx desc can be cleaned. The hns3 driver clear valid bit in the rx desc before notifying the rx desc to the hw, and hw will only set the valid bit of the rx desc after corresponding buffer is filled with packet data and other field in the rx desc is set accordingly. Add hns3_rx_ring_move_fw() function to clear the valid bit in the rx desc before moving rx ring's next_to_clean forward to avoid double cleaning a rx desc, also add a dma_rmb() barrier in hns3_handle_rx_bd() to make sure valid bit is set before reading other field in the rx desc. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: hns3: optimize the tx clean process	Yunsheng Lin
	Currently HNS3_RING_TX_RING_HEAD_REG register is read to determine how many tx desc can be cleaned. To avoid the register read operation in the critical data path, use the valid bit in the tx desc to determine if a specific tx desc can be cleaned. The hns3 driver sets valid bit in the tx desc before ringing a doorbell to the hw, and hw will only clear the valid bit of the tx desc after corresponding packet is sent out to the wire. And because next_to_use for tx ring is a changing variable when the driver is filling the tx desc, so reuse the pull_len for rx ring to record the tx desc that has notified to the hw, so that hns3_nic_reclaim_desc() can decide how many tx desc's valid bit need checking when reclaiming tx desc. And io_err_cnt stat is also removed for it is not used anymore. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: hns3: batch tx doorbell operation	Yunsheng Lin
	Use netdev_xmit_more() to defer the tx doorbell operation when the skb is passed to the driver continuously. By doing this we can improve the overall xmit performance by avoid some doorbell operations. Also, the tx_err_cnt stat is not used, so rename it to tx_more stat. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: hns3: batch the page reference count updates	Yunsheng Lin
	Batch the page reference count updates instead of doing them one at a time. By doing this we can improve the overall receive performance by avoid some atomic increment operations when the rx page is reused. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	cxgb4vf: convert to use DEFINE_SEQ_ATTRIBUTE macro	Liu Shixin
	Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	ionic: dynamic interrupt moderation	Shannon Nelson
	Use the dim library to manage dynamic interrupt moderation in ionic. v3: rebase v2: untangled declarations in ionic_dim_work() Signed-off-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	net: stmmac: Add support to Ethtool get/set ring parameters	Song, Yoong Siang
	This patch add support to --show-ring & --set-ring Ethtool functions: - Adding min, max, power of two check to new ring parameter's value. - Bring down the network interface before changing the value of ring parameters. - Bring up the network interface after changing the value of ring parameters. Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_buffers: Manage internal buffer in the hdroom code	Petr Machata
	Traffic mirroring modes that are in-chip implemented on egress need an internal buffer to work. As the only client, the SPAN module was managing the buffer so far. However logically it belongs to the buffers module. E.g. buffer size validation needs to take the size of the internal buffer into account. Therefore move the related code from SPAN to spectrum_buffers. Move over the callbacks that determine the minimum buffer size as a function of maximum speed and MTU. Add a field describing the internal buffer to struct mlxsw_sp_hdroom. Extend mlxsw_sp_hdroom_bufs_reset_sizes() to take care of sizing the internal buffer as well. Change the SPAN module to invoke that function and mlxsw_sp_hdroom_configure() like all the other hdroom clients. Drop the now-unnecessary mlxsw_sp_span_port_buffer_disable(). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_buffers: Introduce shared buffer ops	Petr Machata
	The size of the internal buffer is currently calculated in the SPAN module. Logically it belongs to the spectrum_buffers module, where it should be moved. However, that being a chip-specific operation, it needs dynamic dispatch. There currently is a chip-specific structure for description of shared buffer values, struct mlxsw_sp_sb_vals. However placing ops into this structure would be confusing. Therefore introduce a new per-chip structure, currently empty, and initialize the ops pointer as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_buffers: Convert mlxsw_sp_port_headroom_init()	Petr Machata
	Currently mlxsw_sp_port_headroom_init() configures both priomap and buffers by hand. Additionally, for port buffers, it configures buffer 0 with a size that it will never again have if PFC configuration is touched. Rewrite the init code to become a client of the new hdroom code. The only difference in invocation is that the configuration is forced, so that it is issued even if the desired configuration happens to match what is contained in (hitherto not initialized with meaningful values) mlxsw_sp_port->hdroom. Since now mlxsw_sp_port_headroom_init() initializes all the PG buffers to meaningful values, mlxsw_sp_hdroom_configure_buffers() can avoid querying the current configuration, and can fill the whole PBMC itself. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_buffers: Inline mlxsw_sp_sb_max_headroom_cells()	Petr Machata
	This function is now only used from the buffers module, and is a trivial field reference. Just inline it and drop the related artifacts. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_buffers: Move here the new headroom code	Petr Machata
	Move all the headroom code to the spectrum_buffers module, where it belongs. Rename mlxsw_sp_pg_buf_threshold_get() and mlxsw_sp_pg_buf_pack() to ..._hdroom_... to match the naming convention of the new headroom code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum: Move here the three-step headroom configuration from DCB	Petr Machata
	The ETS handler performs the headroom configuration in three steps: first it resizes the buffers and adds any new ones. Then it redirects priorities to the new buffers. And finally it sets the size of the now-unused buffers to zero. This way no packet drops are introduced. This sort of careful approach will also be useful for configuring port buffer sizes and priority map by hand, through dcbnl_setbuffer. Therefore move the code from the DCB handler to the generic headroom function. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_dcb: Convert mlxsw_sp_port_pg_prio_map() to hdroom code	Petr Machata
	The new hdroom code has certain conventions: iteration over priorities is done through a variable named `prio', configuration is not pushed unless it is dirty, but a `force' flag can be used to override this, updated configuration is written to port. Convert the function mlxsw_sp_port_pg_prio_map() to use these conventions and rename appropriately to fit in. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_dcb: Convert ETS handler fully to mlxsw_sp_hdroom_configure()	Petr Machata
	The ETS handler performs the headroom configuration in three steps: first it resizes the buffers and adds any new ones. Then it redirects priorities to the new buffers. And finally it sets the size of the now-unused buffers to zero. This way no packet drops are introduced. Both of the buffer size configuration operations are simply buffer size configurations, there is no material difference between setting buffers to zero and any other value. Therefore simply invoke the same mlxsw_sp_hdroom_configure(), and drop mlxsw_sp_port_pg_destroy() and mlxsw_sp_ets_has_pg() which are now unused. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum: Split headroom autoresize out of buffer configuration	Petr Machata
	Split mlxsw_sp_port_headroom_set() to three functions. mlxsw_sp_hdroom_bufs_reset_sizes() changes the sizes of the individual PG buffers, and mlxsw_sp_hdroom_configure_buffers() will actually apply the configuration. A third function, mlxsw_sp_hdroom_bufs_fit(), verifies that the requested buffer configuration matches total headroom size requirements. Add wrappers, mlxsw_sp_hdroom_configure() and __..., that will eventually perform full headroom configuration, but for now, only have them verify the configured headroom size, and invoke mlxsw_sp_hdroom_configure_buffers(). Have them take the `force` argument to prepare for a later patch, even though it is currently unused. Note that the loop in mlxsw_sp_hdroom_configure_buffers() only goes through DCBX_MAX_BUFFERS. Since there is no logic to configure the control buffer, it needs to keep the values queried from the FW. Eventually this function should configure all the PGs. Note that conversion of __mlxsw_sp_dcbnl_ieee_setets() is not trivial. That function performs the headroom configuration in three steps: first it resizes the buffers and adds any new ones. Then it redirects priorities to the new buffers. And finally it sets the size of the now-unused buffers to zero. This way no packet drops are introduced. So after invoking mlxsw_sp_hdroom_bufs_reset_sizes(), tweak the configuration to keep the old sizes of PG buffers for those buffers whose size was set to zero. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum: Track buffer sizes in struct mlxsw_sp_hdroom	Petr Machata
	So far, port buffers were always autoconfigured. When dcbnl_setbuffer callback is implemented, it will allow the user to change the buffer size configuration by hand. The sizes therefore need to be a configuration parameter, not always deduced, and therefore belong to struct mlxsw_sp_hdroom, where the configuration routine should take them from. Update mlxsw_sp_port_headroom_set() to update these sizes. Have the function update the sizes even for the case that a given buffer is not used. Additionally, change the loop iteration end to DCBX_MAX_BUFFERS instead of IEEE_8021QAZ_MAX_TCS. The value is the same, but the semantics differ. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum: Track lossiness in struct mlxsw_sp_hdroom	Petr Machata
	Client-side configuration has lossiness as an attribute of a priority. Therefore add a "lossy" attribute to struct mlxsw_sp_hdroom_prio. To a Spectrum ASIC, lossiness is a feature of a port buffer. Therefore add struct mlxsw_sp_hdroom_buf, which in the following patches will get more attributes, but right now only use it to track port buffer lossiness. Instead of passing around the primary indicators of PFC and pause_en, add a function mlxsw_sp_hdroom_bufs_reset_lossiness() to compute the buffer lossiness from the priority map and priority lossiness. Change mlxsw_sp_port_headroom_set() to take the buffer lossy flag from the headroom configuration. Have the PFC and pause handlers configure priority lossiness in mlxsw_sp_hdroom, from where it will propagate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum: Track priorities in struct mlxsw_sp_hdroom	Petr Machata
	The mapping from priorities to buffers determines which buffers should be configured. Lossiness of these priorities combined with the mapping determines whether a given buffer should be lossy. Currently this configuration is stored implicitly in DCB ETS, PFC and ethtool PAUSE configuration. Keeping it together with the rest of the headroom configuration and deriving it as needed from PFC / ETS / PAUSE will make things clearer. To that end, add a field "prios" to struct mlxsw_sp_hdroom. Previously, __mlxsw_sp_port_headroom_set() took prio_tc as an argument, and assumed that the same mapping as we use on the egress should be used on ingress as well. Instead, track this configuration at each priority, so that it can be adjusted flexibly. In the following patches, as dcbnl_setbuffer is implemented, it will need to store its own mapping, and it will also be sometimes necessary to revert back to the original ETS mapping. Therefore track two buffer indices: the one for chip configuration (buf_idx), and the source one (ets_buf_idx). Introduce a function to configure the chip-level buffer index, and for now have it simply copy the ETS mapping over to the chip mapping. Update the ETS handler to project prio_tc to the ets_buf_idx and invoke the buf_idx recomputation. Now that there is a canonical place to look for this configuration, mlxsw_sp_port_headroom_set() does not need to invent def_prio_tc to use if DCB is compiled out. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum: Track MTU in struct mlxsw_sp_hdroom	Petr Machata
	MTU influences sizes of auto-allocated buffers. Make it a part of port buffer configuration and have __mlxsw_sp_port_headroom_set() take it from there, instead of as an argument. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum: Unify delay handling between PFC and pause	Petr Machata
	When a priority is marked as lossless using DCB PFC, or when pause frames are enabled on a port, mlxsw adds to port buffers an extra space to cover the traffic that will arrive between the time that a pause or PFC frame is emitted, and the time traffic actually stops. This is called the delay. The concept is the same in PFC and pause, however the way the extra buffer space is calculated differs. In this patch, unify this handling. Delay is to be measured in bytes of extra space, and will not include MTU. PFC handler sets the delay directly from the parameter it gets through the DCB interface. To convert pause handler, move MLXSW_SP_PAUSE_DELAY to ethtool module, convert to bytes, and reduce it by maximum MTU, and divide by two. Then it has the same meaning as the delay_bytes set by the PFC handler. Keep the delay_bytes value in struct mlxsw_sp_hdroom introduced in the previous patch. Change PFC and pause handlers to store the new delay value there and have __mlxsw_sp_port_headroom_set() take it from there. Instead of mlxsw_sp_pfc_delay_get() and mlxsw_sp_pg_buf_delay_get(), introduce mlxsw_sp_hdroom_buf_delay_get() to calculate the delay provision. Drop the unnecessary MLXSW_SP_CELL_FACTOR, and instead add an explanatory comment describing the formula used. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	mlxsw: spectrum_buffers: Add struct mlxsw_sp_hdroom	Petr Machata
	The port headroom handling is currently strewn across several modules and tricky to follow: MTU, DCB PFC, DCB ETS and ethtool pause all influence the settings, and then there is the completely separate initial configuraion in spectrum_buffers. A following patch will implement the dcbnl_setbuffer callback, which is going to further complicate the landscape. In order to simplify work with port buffers, the following patches are going to centralize all port-buffer handling in spectrum_buffers. As a first step, introduce a (currently empty) struct mlxsw_sp_hdroom that will keep the configuration parameters, and allocate and free it in appropriate places. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-16	Merge tag 'mlx5-updates-2020-09-15' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-09-15 Various updates to mlx5 driver, 1) Eli adds support for TC trap action. 2) Eran, minor improvements to clock.c code structure 3) Better handling of error reporting in LAG from Jianbo 4) IPv6 traffic class (DSCP) header rewrite support from Maor 5) Ofer Levi adds support for CQE compression of multi-strides packets 6) Vu, Enables use of vport meta data by default. 7) Some minor code cleanup ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	chelsio/chtls: Re-add dependencies on CHELSIO_T4 to fix modular CHELSIO_T4	Geert Uytterhoeven
	As CHELSIO_INLINE_CRYPTO is bool, and CHELSIO_T4 is tristate, the dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is not sufficient to protect CRYPTO_DEV_CHELSIO_TLS and CHELSIO_IPSEC_INLINE. The latter two are also tristate, hence if CHELSIO_T4=n, they cannot be builtin, as that would lead to link failures like: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c:259: undefined reference to `cxgb4_port_viid' and drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c:752: undefined reference to `cxgb4_reclaim_completed_tx' Fix this by re-adding dependencies on CHELSIO_T4 to tristate symbols. The dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is kept to avoid asking the user. Fixes: 6bd860ac1c2a0ec2 ("chelsio/chtls: CHELSIO_INLINE_CRYPTO should depend on CHELSIO_T4") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlxsw: core: Introduce fw_fatal health reporter	Jiri Pirko
	Introduce devlink health reporter to report FW fatal events. Implement the event listener using MFDE trap and enable the events to be propagated using MFGD register configuration. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlxsw: reg: Add Monitoring FW General Debug Register	Jiri Pirko
	Introduce MFGD register that is used to configure firmware debugging. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlxsw: reg: Add Monitoring FW Debug Register	Jiri Pirko
	Introduce MFDE register that is passed through MFDE trap in case of fatal FW event. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlxsw: Move fw_load_policy devlink param into core.c	Jiri Pirko
	As the fw flashing code was moved to core.c, move the param which is related to it there as well. Remove unnecessary parentheses on the way. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlxsw: core: Push code doing params register/unregister into separate helpers	Jiri Pirko
	Extract the code calling params register/unregister driver ops into separate functions. Call publish/unpublish unconditionally. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlxsw: Move fw flashing code into core.c	Jiri Pirko
	As the firmware flashing is not specific to Spectrum, move the code to core.c and avoid one op call and 2 exported symbols. Also, this allows to do flash before call of driver->init function and possibly do other core calls in between. Do some small renaming here and there on the way to be consistent with the rest of core.c code. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlxsw: Bump firmware version to XX.2008.1310	Jiri Pirko
	Among other changes, this version supports FW monitoring. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	net: stmmac: use netif_tx_start\|stop_all_queues() function	Ong Boon Leong
	The current implementation of stmmac_stop_all_queues() and stmmac_start_all_queues() will not work correctly when the value of tx_queues_to_use is changed through ethtool -L DEVNAME rx N tx M command. Also, netif_tx_start\|stop_all_queues() are only needed in driver open() and close() only. Fixes: c22a3f48 net: stmmac: adding multiple napi mechanism Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	net: stmmac: Fix incorrect location to set real_num_rx\|tx_queues	Aashish Verma
	netif_set_real_num_tx_queues() & netif_set_real_num_rx_queues() should be used to inform network stack about the real Tx & Rx queue (active) number in both stmmac_open() and stmmac_resume(), therefore, we move the code from stmmac_dvr_probe() to stmmac_hw_setup(). Fixes: c02b7a914551 net: stmmac: use netif_set_real_num_{rx,tx}_queues Signed-off-by: Aashish Verma <aashishx.verma@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	net: stmmac: add ethtool support for get/set channels	Ong Boon Leong
	Restructure NAPI add and delete process so that we can call them accordingly in open() and ethtool_set_channels() accordingly. Introduced stmmac_reinit_queues() to handle the transition needed for changing Rx & Tx channels accordingly. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlx4: add pause frame stats	Jakub Kicinski
	Check if the pause stats are reported by HW by checking the bitmap. Calculation is based on the order of strings in main_strings from ethtool -S. Hopefully the semantics of these stats match the standard.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	mlx5: add pause frame stats	Jakub Kicinski
	Plumb through all the indirection and copy some code from ethtool -S. The names of the group indicate that these are the stats we are after (and Saeed confirms it). v3: - fix build in mlx5_rep v2: - drop the ethool helper and call stats directly - don't pass 0 as initialized to in buffer - use local buffer Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	ixgbe: add pause frame stats	Jakub Kicinski
	Report standard pause frame stats. They are already aggregated in struct ixgbe_hw_stats. The combination of the registers is suggested as equivalent to PAUSEMACCtrlFramesTransmitted / PAUSEMACCtrlFramesReceived by the Intel 82576EB datasheet, I could not find any information in the HW actually supported by ixgbe. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	bnxt: add pause frame stats	Jakub Kicinski
	These stats are already reported in ethtool -S. Michael confirms they are equivalent to standard stats. v2: - fix sparse warning about endian by using the macro - use u64 for pointer type Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	netdevsim: add pause frame stats	Jakub Kicinski
	Add minimal ethtool interface for testing ethtool pause stats. v2: add missing static on nsim_ethtool_ops Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	s390/qeth: implement ndo_bridge_setlink for learning_sync	Alexandra Winter
	Documentation/networking/switchdev.txt and 'man bridge' indicate that the learning_sync bridge attribute is used to control whether a given device will sync MAC addresses learned on its device port to a master bridge FDB, where they will show up as 'extern_learn offload'. So we map qeth_l2_dev2br_an_set() to the learning_sync bridge link attribute. Turning off learning_sync will flush all extern_learn entries from the bridge fdb and all pending events from the card's work queue. When the hardware interface goes offline with learning_sync on (e.g. for HW recovery), all extern_learn entries will be flushed from the bridge fdb and all pending events from the card's work queue. When the interface goes online again, it will send new notifications for all then valid MACs. learning_sync attribute can not be modified while interface is offline. See 'commit e6e771b3d897 ("s390/qeth: detach netdevice while card is offline")' An alternative implementation would be to always offload the 'learning' attribute of a software bridge to the hardware interface attached to it and thus implicitly enable fdb notification. This was not chosen for 2 reasons: 1) In our case the software bridge is NOT a representation of a hardware switch. It is just connected to a smart NIC that is able to inform about the addresses attached to it. It is not necessarily using source MAC learning for this and other bridgeports can be attached to other NICs with different properties. 2) We want a means to enable this notification explicitly. There may be cases where a bridgeport is set to 'learning', but we do not want to enable the notification. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	s390/qeth: implement ndo_bridge_getlink for learning_sync	Alexandra Winter
	Documentation/networking/switchdev.txt and 'man bridge' indicate that the learning_sync bridge attribute is used to indicate whether a given device will sync MAC addresses learned on its device port to a master bridge FDB. learning_sync attribute can not be read while interface is offline (down). See 'commit e6e771b3d897 ("s390/qeth: detach netdevice while card is offline")' We return EOPNOTSUPP and not EONODEV in this case, because EONOTSUPP is the only rc that is tolerated by 'bridge -d link show'. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	s390/qeth: Reset address notification in case of buffer overflow	Alexandra Winter
	In case hardware sends more device-to-bridge-address-change notfications than the qeth-l2 driver can handle, the hardware will send an overflow event and then stop sending any events. It expects software to flush its FDB and start over again. Re-enabling address-change-notification will report all current addresses. In order to re-enable address-change-notification this patch defines the functions qeth_l2_dev2br_an_set() and qeth_l2_dev2br_an_set_cb to enable or disable dev-to-bridge-address-notification. A following patch will use the learning_sync bridgeport flag to trigger enabling or disabling of address-change-notification, so we define priv->brport_features to store the current setting. BRIDGE_INFO and ADDR_INFO functionality are mutually exclusive, whereas ADDR_INFO and qeth_l2_vnicc* can be used together. Alternative implementations to handle buffer overflow: Just re-enabling notification and adding all newly reported addresses would cover any lost 'add' events, but not the lost 'delete' events. Then these invalid addresses would stay in the bridge FDB as long as the device exists. Setting the net device down and up, would be an alternative, but is a bit drastic. If the net device has many secondary addresses this will create many delete/add events at its peers which could de-stabilize the network segment. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15	s390/qeth: Translate address events into switchdev notifiers	Alexandra Winter
	A qeth-l2 HiperSockets card can show switch-ish behaviour in the sense, that it can report all MACs that are reachable via this interface. Just like a switch device, it can notify the software bridge about changes to its fdb. This patch exploits this device-to-bridge-notification and extracts the relevant information from the hardware events to generate notifications to an attached software bridge. There are 2 sources for this information: 1) The reply message of Perform-Network-Subchannel-Operations (PNSO) (operation code ADDR_INFO) reports all addresses that are currently reachable (implemented in a later patch). 2) As long as device-to-bridge-notification is enabled, hardware will generate address change notification events, whenever the content of the hardware fdb changes (this patch). The bridge_hostnotify feature (PNSO operation code BRIDGE_INFO) uses the same address change notification events. We need to distinguish between qeth_pnso_mode QETH_PNSO_BRIDGEPORT and QETH_PNSO_ADDR_INFO and call a different handler. In both cases deadlocks must be prevented, if the workqueue is drained under lock and QETH_PNSO_NONE, when notification is disabled. bridge_hostnotify generates udev events, there is no intend to do the same for dev2br. Instead this patch will generate SWITCHDEV_FDB_ADD_TO_BRIDGE and SWITCHDEV_FDB_DEL_TO_BRIDGE notifications, that will cause the software bridge to add (or delete) entries to its fdb as 'extern_learn offload'. Documentation/networking/switchdev.txt proposes to add "depends NET_SWITCHDEV" to driver's Kconfig. This is not done here, so even in absence of the NET_SWITCHDEV module, the QETH_L2 module will still be built, but then the switchdev notifiers will have no effect. No VLAN filtering is done on the entries and VLAN information is not passed on to the bridge fdb entries. This could be added later. For now VLAN interfaces can be defined on the upper bridge interface. Multicast entries are not passed on to the bridge fdb. This could be added later. For now mcast flooding can be used in the bridge. The card reports all MACs that are in its FDB, but we must not pass on MACs that are registered for this interface. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>