summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-23ipv6: rename and move ip6_dst_lookup_tunnel()Beniamino Galvani
At the moment ip6_dst_lookup_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports. Prepare for these changes by renaming the function to udp_tunnel6_dst_lookup() and move it to file net/ipv6/ip6_udp_tunnel.c. This is similar to what already done for IPv4 in commit bf3fcbf7e7a0 ("ipv4: rename and move ip_route_output_tunnel()"). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-23net: atm: Remove redundant check.Gavrilov Ilia
Checking the 'adev' variable is unnecessary, because 'cdev' has already been checked earlier. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 656d98b09d57 ("[ATM]: basic sysfs support for ATM devices") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22Merge branch 'bnxt_en-next'David S. Miller
Michael Chan says: ==================== bnxt_en: Update for net-next The first 2 patches are fixes for the recently added hwmon changes. The next 6 patches are enhancements to support ethtool lanes and all the proper supported and advertised link modes. Before these patches, the driver was only supporting the link modes for copper media. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: extend media types to supported and autoneg modesEdwin Peer
The current driver code does not accurately report the supported and advertised link modes. It basically always assumes the media type is copper for any particular speed. Utilize the recently added link mode mappings to accurately report fully qualified ethtool link modes for advertised and supported speeds. If the media type is known, we will report the supported link modes for that media only. If the media is not known, we will report all possible supported link modes. The user can now specify any supported link modes (including NRZ and PAM4) to advertise for autoneg. It used to only accept copper NRZ modes. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: convert to linkmode_set_bit() APIEdwin Peer
Barring the BNXT_FW_TO_ETHTOOL speed macros, which will be removed in the next patch, update code to use the newer API. No functional change. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: Refactor NRZ/PAM4 link speed related logicMichael Chan
Refactor some NRZ/PAM4 link speed related logic into helper functions. The NRZ and PAM4 link parameters are stored in separate structure fields. The driver logic has to check whether it is in NRZ or PAM4 mode and then use the appropriate field. Refactor this logic into helper functions for better readability. Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: refactor speed independent ethtool modesEdwin Peer
A future patch in this series will change the algorithm used to determine ethtool speed and media modes. Extract the handling of the unrelated pause, autoneg modes into an independent function. Also separate FEC handling out of bnxt_fw_to_ethtool_*_spds(). No functional change. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: support lane configuration via ethtoolEdwin Peer
Recent kernels support changing the number of link lanes via ethtool. This is useful for determining the appropriate signal mode to use when a given link speed can be achieved using different lane configurations. Accept the ethtool lanes parameter when configuring forced speed. If there is no lanes parameter, select a default. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: add infrastructure to lookup ethtool link modeEdwin Peer
Add infrastructure to look up the enum ethtool_link_mode_bit_indices from link information provided by the firmware. The link speed, signal mode, and media type returned by firmware will be used to look up the ethtool link mode. The immediate benefit is that once the link mode is determined, we can now use ethtool_params_from_link_mode() to fill the basic ethtool parameters including the number of lanes. Lanes will be fully supported in the next patch. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: Fix invoking hwmon_notify_eventKalesh AP
FW sends the async event to the driver when the device temperature goes above or below the threshold values. Only notify hwmon if the temperature is increasing to the next alert level, not when it is decreasing. Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22bnxt_en: Do not call sleeping hwmon_notify_event() from NAPIKalesh AP
Defer hwmon_notify_event() to bnxt_sp_task() workqueue because hwmon_notify_event() can try to acquire a mutex shown in the stack trace below. Modify bnxt_event_error_report() to return true if we need to schedule bnxt_sp_task() to notify hwmon. __schedule+0x68/0x520 hwmon_notify_event+0xe8/0x114 schedule+0x60/0xe0 schedule_preempt_disabled+0x28/0x40 __mutex_lock.constprop.0+0x534/0x550 __mutex_lock_slowpath+0x18/0x20 mutex_lock+0x5c/0x70 kobject_uevent_env+0x2f4/0x3d0 kobject_uevent+0x10/0x20 hwmon_notify_event+0x94/0x114 bnxt_hwmon_notify_event+0x40/0x70 [bnxt_en] bnxt_event_error_report+0x260/0x290 [bnxt_en] bnxt_async_event_process.isra.0+0x250/0x850 [bnxt_en] bnxt_hwrm_handler.isra.0+0xc8/0x120 [bnxt_en] bnxt_poll_p5+0x150/0x350 [bnxt_en] __napi_poll+0x3c/0x210 net_rx_action+0x308/0x3b0 __do_softirq+0x120/0x3e0 Cc: Guenter Roeck <linux@roeck-us.net> Fixes: a19b4801457b ("bnxt_en: Event handler for Thermal event") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-21octeon_ep: assert hardware structure sizesShinas Rasheed
Clean up structure defines related to hardware data to be asserted to fixed sizes, as padding is not allowed by hardware. Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-21net: dsa: mv88e6xxx: add an error code check in mv88e6352_tai_event_workSu Hui
mv88e6xxx_tai_write() can return error code (-EOPNOTSUPP ...) if failed. So check the value of 'ret' after calling mv88e6xxx_tai_write(). Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-21selftests: tc-testing: add test for 'rt' upgrade on hfscPedro Tammela
Add a test to check if inner rt curves are upgraded to sc curves. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20net: wwan: replace deprecated strncpy with strscpyJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect chinfo.name to be NUL-terminated based on its use with format strings and sprintf: rpmsg/rpmsg_char.c 165: dev_err(dev, "failed to open %s\n", eptdev->chinfo.name); 368: return sprintf(buf, "%s\n", eptdev->chinfo.name); ... and with strcmp(): | static struct rpmsg_endpoint *qcom_glink_create_ept(struct rpmsg_device *rpdev, | rpmsg_rx_cb_t cb, | void *priv, | struct rpmsg_channel_info | chinfo) | ... | const char *name = chinfo.name; | ... | if (!strcmp(channel->name, name)) Since chinfo is initialized as such (just above the strscpy()): | struct rpmsg_channel_info chinfo = { | .src = rpwwan->rpdev->src, | .dst = RPMSG_ADDR_ANY, | }; ... we know other members are zero-initialized. This means no NUL-padding is required (as any NUL-byte assignments are redundant). Considering the above, a suitable replacement is `strscpy` due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231019-strncpy-drivers-net-wwan-rpmsg_wwan_ctrl-c-v2-1-ecf9b5a39430@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20pds_core: add an error code check in pdsc_dl_info_getSu Hui
check the value of 'ret' after call 'devlink_info_version_stored_put'. Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20231019083351.1526484-1-suhui@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20Merge branch 'ice-vf-resource-tracking'David S. Miller
Jacob Keller says: ==================== Intel Wired LAN Driver Updates 2023-10-19 (ice, igb, ixgbe) This series contains improvements to the ice driver related to VF MSI-X resource tracking, as well as other minor cleanups. Dan fixes code in igb and ixgbe where the conversion to list_for_each_entry failed to account for logic which assumed a NULL pointer after iteration. Jacob makes ice_get_pf_c827_idx static, and refactors ice_find_netlist_node based on feedback that got missed before the function merged. Michal adds a switch rule to drop all traffic received by an inactive LAG port. He also implements ops to allow individual control of MSI-X vectors for SR-IOV VFs. Przemek removes some unused fields in struct ice_flow_entry, and modifies the ice driver to cache the VF PCI device inside struct ice_vf rather than performing lookup at run time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ixgbe: fix end of loop test in ixgbe_set_vf_macvlan()Dan Carpenter
The list iterator in a list_for_each_entry() loop can never be NULL. If the loop exits without hitting a break then the iterator points to an offset off the list head and dereferencing it is an out of bounds access. Before we transitioned to using list_for_each_entry() loops, then it was possible for "entry" to be NULL and the comments mention this. I have updated the comments to match the new code. Fixes: c1fec890458a ("ethernet/intel: Use list_for_each_entry() helper") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20igb: Fix an end of loop testDan Carpenter
When we exit a list_for_each_entry() without hitting a break statement, the list iterator isn't NULL, it just point to an offset off the list_head. In that situation, it wouldn't be too surprising for entry->free to be true and we end up corrupting memory. The way to test for these is to just set a flag. Fixes: c1fec890458a ("ethernet/intel: Use list_for_each_entry() helper") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: cleanup ice_find_netlist_nodeJacob Keller
The ice_find_netlist_node function was introduced in commit 8a3a565ff210 ("ice: add admin commands to access cgu configuration"). Variations of this function were reviewed concurrently on both intel-wired-lan[1][2], and netdev [3][4] [1]: https://lore.kernel.org/intel-wired-lan/20230913204943.1051233-7-vadim.fedorenko@linux.dev/ [2]: https://lore.kernel.org/intel-wired-lan/20230817000058.2433236-5-jacob.e.keller@intel.com/ [3]: https://lore.kernel.org/netdev/20230918212814.435688-1-anthony.l.nguyen@intel.com/ [4]: https://lore.kernel.org/netdev/20230913204943.1051233-7-vadim.fedorenko@linux.dev/ The variant I posted had a few changes due to review feedback which were never incorporated into the DPLL series: * Replace the references to ancient and long removed ICE_SUCCESS and ICE_ERR_DOES_NOT_EXIST status codes in the function comment. * Return -ENOENT instead of -ENOTBLK, as a more common way to indicate that an entry doesn't exist. * Avoid the use of memset() and use simple static initialization for the cmd variable. * Use FIELD_PREP to assign the node_type_ctx. * Remove an unnecessary local variable to keep track of rec_node_handle, just pass the node_handle pointer directly into ice_aq_get_netlist_node. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: make ice_get_pf_c827_idx staticJacob Keller
The ice_get_pf_c827_idx function is only called inside of ice_ptp_hw.c, so there is no reason to export it. Mark it static and remove the declaration from ice_ptp_hw.h Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: manage VFs MSI-X using resource trackingMichal Swiatkowski
Track MSI-X for VFs using bitmap, by setting and clearing bitmap during allocation and freeing. Try to linearize irqs usage for VFs, by freeing them and allocating once again. Do it only for VFs that aren't currently running. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: set MSI-X vector count on VFMichal Swiatkowski
Implement ops needed to set MSI-X vector count on VF. sriov_get_vf_total_msix() should return total number of MSI-X that can be used by the VFs. Return the value set by devlink resources API (pf->req_msix.vf). sriov_set_msix_vec_count() will set number of MSI-X on particular VF. Disable VF register mapping, rebuild VSI with new MSI-X and queues values and enable new VF register mapping. For best performance set number of queues equal to number of MSI-X. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: add bitmap to track VF MSI-X usageMichal Swiatkowski
Create a bitamp to track MSI-X usage for VFs. The bitmap has the size of total MSI-X amount on device, because at init time the amount of MSI-X used by VFs isn't known. The bitmap is used in follow up patchset to provide a block of continuous block of MSI-X indexes for each created VF. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: implement num_msix field per VFMichal Swiatkowski
Store the amount of MSI-X per VF instead of storing it in pf struct. It is used to calculate number of q_vectors (and queues) for VF VSI. This is necessary because with follow up changes the number of MSI-X can be different between VFs. Use it instead of using pf->vf_msix value in all cases. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: store VF's pci_dev ptr in ice_vfPrzemek Kitszel
Extend struct ice_vf by vfdev. Calculation of vfdev falls more nicely into ice_create_vf_entries(). Caching of vfdev enables simplification of ice_restore_all_vfs_msi_state(). Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: add drop rule matching on not active lportMichal Swiatkowski
Inactive LAG port should not receive any packets, as it can cause adding invalid FDBs (bridge offload). Add a drop rule matching on inactive lport in LAG. Reviewed-by: Simon Horman <horms@kernel.org> Co-developed-by: Marcin Szycik <marcin.szycik@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ice: remove unused ice_flow_entry fieldsPrzemek Kitszel
Remove ::entry and ::entry_sz fields of &ice_flow_entry, as they were never set. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20ethtool: untangle the linkmode and ethtool headersJakub Kicinski
Commit 26c5334d344d ("ethtool: Add forced speed to supported link modes maps") added a dependency between ethtool.h and linkmode.h. The dependency in the opposite direction already exists so the new code was inserted in an awkward place. The reason for ethtool.h to include linkmode.h, is that ethtool_forced_speed_maps_init() is a static inline helper. That's not really necessary. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.Heng Guo
Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.168.122.207 MTU 1500 VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500 VM3: eth0 ip: 192.168.123.240 MTU 1500 Reproduce: VM1 send 1400 bytes UDP data to VM3 using tools scapy with flags=0. scapy command: send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1, inter=1.000000) Result: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- "ForwDatagrams" increase from 4 to 5 and "OutRequests" also increase from 7 to 8. Issue description and patch: IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS ("OutOctets") in ip_finish_output2(). According to RFC 4293, it is "OutOctets" counted with "OutTransmits" but not "OutRequests". "OutRequests" does not include any datagrams counted in "ForwDatagrams". ipSystemStatsOutOctets OBJECT-TYPE DESCRIPTION "The total number of octets in IP datagrams delivered to the lower layers for transmission. Octets from datagrams counted in ipIfStatsOutTransmits MUST be counted here. ipSystemStatsOutRequests OBJECT-TYPE DESCRIPTION "The total number of IP datagrams that local IP user- protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipSystemStatsOutForwDatagrams. So do patch to define IPSTATS_MIB_OUTPKTS to "OutTransmits" and add IPSTATS_MIB_OUTREQUESTS for "OutRequests". Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6. Test result with patch: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0 ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0 ---------------------------------------------------------------------- "ForwDatagrams" increase from 1 to 2 and "OutRequests" is keeping 3. "OutTransmits" increase from 4 to 5 and "OutOctets" increase 1428. Signed-off-by: Heng Guo <heng.guo@windriver.com> Reviewed-by: Kun Song <Kun.Song@windriver.com> Reviewed-by: Filip Pudak <filip.pudak@windriver.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20Merge branch 'ksz886x-forced-link-modes'David S. Miller
Oleksij Rempel says: ==================== fix forced link mode for KSZ886X switches changes v3: - squash patch 1 and 2 - use genphy_config_aneg() instead of genphy_setup_forced() changes v2: - address kernel test robot warning - change comment explaining clearing of KSZ886X_CTRL_FORCE_LINK bit - s/PHY we create/PHY will create/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20net: phy: micrel: Fix forced link mode for KSZ886X switchesOleksij Rempel
Address a link speed detection issue in KSZ886X PHY driver when in forced link mode. Previously, link partners like "ASIX AX88772B" with KSZ8873 could fall back to 10Mbit instead of configured 100Mbit. The issue arises as KSZ886X PHY continues sending Fast Link Pulses (FLPs) even with autonegotiation off, misleading link partners in autoneg mode, leading to incorrect link speed detection. Now, when autonegotiation is disabled, the driver sets the link state forcefully using KSZ886X_CTRL_FORCE_LINK bit. This action, beyond just disabling autonegotiation, makes the PHY state more reliably detected by link partners using parallel detection, thus fixing the link speed misconfiguration. With autonegotiation enabled, link state is not forced, allowing proper autonegotiation process participation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20net: dsa: microchip: ksz8: Enable MIIM PHY Control reg accessOleksij Rempel
Provide access to MIIM PHY Control register (Reg. 31) through ksz8_r_phy_ctrl() and ksz8_w_phy_ctrl() functions. Necessary for upcoming micrel.c patch to address forced link mode configuration. Closes: https://lore.kernel.org/oe-kbuild-all/202310112224.iYgvjBUy-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20Merge branch 'mlxsw-lag-table-allocation'David S. Miller
Petr Machata says: ==================== mlxsw: Move allocation of LAG table to the driver PGT is an in-HW table that maps addresses to sets of ports. Then when some HW process needs a set of ports as an argument, instead of embedding the actual set in the dynamic configuration, what gets configured is the address referencing the set. The HW then works with the appropriate PGT entry. Within the PGT is placed a LAG table. That is a contiguous block of PGT memory where each entry describes which ports are members of the corresponding LAG port. The PGT is split to two parts: one managed by the FW, and one managed by the driver. Historically, the FW part included also the LAG table, referred to as FW LAG mode. Giving the responsibility for placement of the LAG table to the driver, referred to as SW LAG mode, makes the whole system more flexible. The FW currently supports both FW and SW LAG modes. To shed complexity, the FW should in the future only support SW LAG mode. Hence this patchset, where support for placement of LAG is added to mlxsw. There are FW versions out there that do not support SW LAG mode, and on Spectrum-1 in particular, there is no plan to support it at all. mlxsw will therefore have to support both modes of operation. Another aspect is that at least on Spectrum-1, there are FW versions out there that claim to support driver-placed LAG table, but then reject or ignore configurations enabling the same. The driver thus has to have a say in whether an attempt to configure SW LAG mode should even be done. The feature is therefore expressed in terms of "does the driver prefer SW LAG mode?", and "what LAG mode the PCI module managed to configure the FW with". This is unlike current flood mode configuration, where the driver can give a strict value, and that's what gets configured. But it gives a chance to the driver to determine whether LAG mode should be enabled at all. The "does the driver prefer SW LAG mode?" bit is expressed as a boolean lag_mode_prefer_sw. The reason for this is largely another feature that will be introduced in a follow-up patchset: support for CFF flood mode. The driver currently requires that the FW be configured with what is called controlled flood mode. But on capable systems, CFF would be preferred. So there are two values in flight: the preferred flood mode, and the fallback. This could be expressed with an array of flood modes ordered by preference, but that looks like an overkill in comparison. This flag/value model is then reused for LAG mode as well, except the fallback value is absent and implied to be FW, because there are no other values to choose from. The patchset progresses as follows: - Patches #1 to #5 adjust reg.h and cmd.h with new register fields, constants and remarks. - Patches #6 and #7 add the ability to request SW LAG mode and to query the LAG mode that was actually negotiated. This is where the abovementioned lag_mode_prefer_sw flag is added. - Patches #7 to #9 generalize PGT allocations to make it possible to allocate the LAG table, which is done in patch #10. - In patch #11, toggle lag_mode_prefer_sw on Spectrum-2 and above, which makes the newly-added code live. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: spectrum: Set SW LAG mode on Spectrum>1Petr Machata
On Spectrum-2, Spectrum-3 and Spectrum-4 machines, request SW responsibility for placement of the LAG table. On Spectrum-1, some FW versions claim to support lag_mode field despite quietly ignoring any settings made to that field. Thus refrain from attempting to configure lag_mode on those systems at all. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: spectrum: Allocate LAG table when in SW LAG modePetr Machata
In this patch, if the LAG mode is SW, allocate the LAG table and configure SGCR to indicate where it was allocated. We use the default "DDD" (for dynamic data duplication) layout of the LAG table. In the DDD mode, the membership information for each LAG is copied in 8 PGT entries. This is done for performance reasons. The LAG table then needs to be allocated on an address aligned to 8. Deal with this by moving the LAG init ahead so that the LAG table is allocated at address 0. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: spectrum_pgt: Generalize PGT allocationPetr Machata
PGT blocks are allocated through the function mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows which piece of PGT exactly they want to get. That was fine while the FID code was the only client allocating blocks of PGT. However for SW-allocated LAG table, there will be an additional client: mlxsw_sp_lag_init(). The interface should therefore be changed to not require particular coordinates, but to take just the requested size, allocate the block wherever, and give back the PGT address. In this patch, change the interface accordingly. Initialize FID family's pgt_base from the result of the PGT allocation (note that mlxsw makes a copy of the family structure, so what gets initialized is not actually the global structure). Drop the now-unnecessary pgt_base initializations and the corresponding defines. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: spectrum_fid: Allocate PGT for the whole FID family in one goPetr Machata
PGT blocks are allocated through the function mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows which piece of PGT exactly they want to get. That was fine while the FID code was the only client allocating blocks of PGT. However for SW-allocated LAG table, there will be an additional client: mlxsw_sp_lag_init(). The interface should therefore be changed to not require particular coordinates, but to take just the requested size, allocate the block wherever, and give back the PGT address. The current FID mode has one place where PGT address can be stored: the FID family's pgt_base. The allocation scheme should therefore be changed from allocating a block per FID flood table, to allocating a block per FID family. Do just that in this patch. The per-family allocation is going to be useful for another related feature as well: the CFF mode. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: pci: Permit toggling LAG modePetr Machata
Add to struct mlxsw_config_profile a field lag_mode_prefer_sw for the driver to indicate that SW LAG mode should be configured if possible. Add to the PCI module code to set lag_mode as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: core, pci: Add plumbing related to LAG modePetr Machata
lag_mode describes where the responsibility for LAG table placement lies: SW or FW. The bus module determines whether LAG is supported, can configure it if it is, and knows what (if any) configuration has been applied. Therefore add a bus callback to determine the configured LAG mode. Also add to core an API to query it. The LAG mode is for now kept at the default value of 0 for FW-managed. The code to actually toggle it will be added later. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: cmd: Add QUERY_FW.lag_mode_supportPetr Machata
Add QUERY_FW.lag_mode_support, which determines whether CONFIG_PROFILE.lag_mode is available. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: cmd: Add CONFIG_PROFILE.{set_, }lag_modePetr Machata
Add CONFIG_PROFILE.lag_mode, which serves for moving responsibility for placement of the LAG table from FW to SW. Whether lag_mode should be configured is determined by CONFIG_PROFILE.set_lag_mode, which also add. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: cmd: Fix omissions in CONFIG_PROFILE field names in commentsPetr Machata
A number of CONFIG_PROFILE fields' comments refer to a field named like cmd_mbox_config_* instead of cmd_mbox_config_profile_*. Correct these omissions. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: reg: Add SGCR.lag_lookup_pgt_basePetr Machata
Add SGCR.lag_lookup_pgt_base, which is used for configuring the base address of the LAG table within the PGT table for cases when the driver is responsible for the table placement. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20mlxsw: reg: Drop SGCR.llbPetr Machata
SGCR, Switch General Configuration Register, has not been used since commit b0d80c013b04 ("mlxsw: Remove Mellanox SwitchX-2 ASIC support"). We will need the register again shortly, so instead of dropping it and reintroducing again, just drop the sole unused field. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20Merge branch 'netlink-auto-integers'David S. Miller
Jakub Kicinski says: ==================== netlink: add variable-length / auto integers Add netlink support for "common" / variable-length / auto integers which are carried at the message level as either 4B or 8B depending on the exact value. This saves space and will hopefully decrease the number of instances where we realize that we needed more bits after uAPI is set is stone. It also loosens the alignment requirements, avoiding the need for padding. This mini-series is a fuller version of the previous RFC: https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/ No user included here. I have tested (and will use) it in the upcoming page pool API but the assumption is that it will be widely applicable. So sending without a user. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20netlink: specs: add support for auto-sized scalarsJakub Kicinski
Support uint / sint types in specs and YNL. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20netlink: add variable-length / auto integersJakub Kicinski
We currently push everyone to use padding to align 64b values in netlink. Un-padded nla_put_u64() doesn't even exist any more. The story behind this possibly start with this thread: https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/ where DaveM was concerned about the alignment of a structure containing 64b stats. If user space tries to access such struct directly: struct some_stats *stats = nla_data(attr); printf("A: %llu", stats->a); lack of alignment may become problematic for some architectures. These days we most often put every single member in a separate attribute, meaning that the code above would use a helper like nla_get_u64(), which can deal with alignment internally. Even for arches which don't have good unaligned access - access aligned to 4B should be pretty efficient. Kernel and well known libraries deal with unaligned input already. Padded 64b is quite space-inefficient (64b + pad means at worst 16B per attr vs 32b which takes 8B). It is also more typing: if (nla_put_u64_pad(rsp, NETDEV_A_SOMETHING_SOMETHING, value, NETDEV_A_SOMETHING_PAD)) Create a new attribute type which will use 32 bits at netlink level if value is small enough (probably most of the time?), and (4B-aligned) 64 bits otherwise. Kernel API is just: if (nla_put_uint(rsp, NETDEV_A_SOMETHING_SOMETHING, value)) Calling this new type "just" sint / uint with no specific size will hopefully also make people more comfortable with using it. Currently telling people "don't use u8, you may need the bits, and netlink will round up to 4B, anyway" is the #1 comment we give to newcomers. In terms of netlink layout it looks like this: 0 4 8 12 16 32b: [nlattr][ u32 ] 64b: [ pad ][nlattr][ u64 ] uint(32) [nlattr][ u32 ] uint(64) [nlattr][ u64 ] Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20tools: ynl-gen: make the mnl_type() method publicJakub Kicinski
uint/sint support will add more logic to mnl_type(), deduplicate it and make it more accessible. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20Merge branch 'devlink-errors-fmsg'David S. Miller
Przemek Kitszel says: ==================== devlink: retain error in struct devlink_fmsg Extend devlink fmsg to retain error (patch 1), so drivers could omit error checks after devlink_fmsg_*() (patches 2-10), and finally enforce future uses to follow this practice by change to return void (patch 11) Note that it was compile tested only. bloat-o-meter for whole series: add/remove: 8/18 grow/shrink: 23/40 up/down: 2017/-5833 (-3816) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>