Age | Commit message (Collapse) | Author |
|
Allow tracking of packets sent with send_subcrq direct vs
indirect. `ethtool -S <dev>` will now provide a counter
of the number of uses of each xmit method. This metric will
be useful in performance debugging.
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241001163531.1803152-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.
This definition is counter-intuitive and lead to number of bugs all over
the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.
[kirill@shutemov.name: fix min() warning]
Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When CPU's are added and removed, ibmvnic devices will reassign
hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD
to signal to ibmvnic devices that the CPU has been removed and it
is time to reset affinity hint assignments. On the other hand,
when CPU's are being added, add a state instance to
CPUHP_AP_ONLINE_DYN which will trigger a reassignment of affinity
hints once the new CPU's are online. This implementation is based
on the virtio_net driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Assign affinity hints to ibmvnic device queue interrupts.
Affinity hints are assigned and removed during sub-crq init and
teardown, respectively. This update should improve latency if
utilized as interrupt lines and processing are more equally
distributed among CPU's. This implementation is based on the
virtio_net driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
include/linux/netdevice.h
net/core/dev.c
6510ea973d8d ("net: Use this_cpu_inc() to increment net->core_stats")
794c24e9921f ("net-core: rx_otherhost_dropped to core_stats")
https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/
drivers/net/wan/cosa.c
d48fea8401cf ("net: cosa: fix error check return value of register_chrdev()")
89fbca3307d4 ("net: wan: remove support for COSA and SRP synchronous serial boards")
https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 723ad916134784b317b72f3f6cf0f7ba774e5dae
When client requests channel or ring size larger than what the server
can support the server will cap the request to the supported max. So,
the client would not be able to successfully request resources that
exceed the server limit.
Fixes: 723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow multiple LTBs in the txpool's ltb_set. i.e rather than using
a single large LTB, use several smaller LTBs.
The first n-1 LTBs will all be of the same size. The size of the last
LTB in the set depends on the number of buffers and buffer (mtu) size.
This strategy hopefully allows more reuse of the initial LTBs and also
reduces the chances of an allocation failure (of the large LTB) when
system is low in memory.
Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all
be of the same size. The size of the last LTB in the set depends on the
number of buffers and buffer (mtu) size.
Having a set of LTBs per pool provides a couple of benefits.
First, with the current value of IBMVNIC_MAX_LTB_SIZE of 16MB, with an
MTU of 9000, we need a LTB (DMA buffer) of that size but the allocation
can fail in low memory conditions. With a set of LTBs per pool, we can
use several smaller (8MB) LTBs and hopefully have fewer allocation
failures. (See also comments in ibmvnic.h on the trade-off with smaller
LTBs)
Second since the kernel limits the size of the DMA buffer to 16MB (based
on MAX_ORDER), with a single DMA buffer per pool, the pool is also limited
to 16MB. This in turn limits the number of buffers per pool to 1763 when
MTU is 9000. With a set of LTBs per pool, we can have upto the max of 4096
buffers per pool even when MTU is 9000.
Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define and use interfaces that treat the long term buffer (LTB) of an
rxpool as a set of LTBs rather than a single LTB. The set only has one
LTB for now.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a race between reset and the transmit paths that can lead to
ibmvnic_xmit() accessing an scrq after it has been freed in the reset
path. It can result in a crash like:
Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0080000016189f8
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic]
LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
Call Trace:
[c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable)
[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
[c000000000c9cfcc] sch_direct_xmit+0xec/0x330
[c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0
[c000000000c00ad4] __dev_queue_xmit+0x394/0x730
[c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding]
[c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding]
[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
[c000000000c00ca4] __dev_queue_xmit+0x564/0x730
[c000000000cf97e0] neigh_hh_output+0xd0/0x180
[c000000000cfa69c] ip_finish_output2+0x31c/0x5c0
[c000000000cfd244] __ip_queue_xmit+0x194/0x4f0
[c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0
[c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0
[c000000000d2d984] tcp_retransmit_skb+0x34/0x130
[c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0
[c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330
[c000000000d317bc] tcp_write_timer+0x5c/0x200
[c000000000243270] call_timer_fn+0x50/0x1c0
[c000000000243704] __run_timers.part.0+0x324/0x460
[c000000000243894] run_timer_softirq+0x54/0xa0
[c000000000ea713c] __do_softirq+0x15c/0x3e0
[c000000000166258] __irq_exit_rcu+0x158/0x190
[c000000000166420] irq_exit+0x20/0x40
[c00000000002853c] timer_interrupt+0x14c/0x2b0
[c000000000009a00] decrementer_common_virt+0x210/0x220
--- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
The immediate cause of the crash is the access of tx_scrq in the following
snippet during a reset, where the tx_scrq can be either NULL or an address
that will soon be invalid:
ibmvnic_xmit()
{
...
tx_scrq = adapter->tx_scrq[queue_num];
txq = netdev_get_tx_queue(netdev, queue_num);
ind_bufp = &tx_scrq->ind_buf;
if (test_bit(0, &adapter->resetting)) {
...
}
But beyond that, the call to ibmvnic_xmit() itself is not safe during a
reset and the reset path attempts to avoid this by stopping the queue in
ibmvnic_cleanup(). However just after the queue was stopped, an in-flight
ibmvnic_complete_tx() could have restarted the queue even as the reset is
progressing.
Since the queue was restarted we could get a call to ibmvnic_xmit() which
can then access the bad tx_scrq (or other fields).
We cannot however simply have ibmvnic_complete_tx() check the ->resetting
bit and skip starting the queue. This can race at the "back-end" of a good
reset which just restarted the queue but has not cleared the ->resetting
bit yet. If we skip restarting the queue due to ->resetting being true,
the queue would remain stopped indefinitely potentially leading to transmit
timeouts.
IOW ->resetting is too broad for this purpose. Instead use a new flag
that indicates whether or not the queues are active. Only the open/
reset paths control when the queues are active. ibmvnic_complete_tx()
and others wake up the queue only if the queue is marked active.
So we will have:
A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open()
->resetting = true
->tx_queues_active = false
disable tx queues
...
->tx_queues_active = true
start tx queues
B. Tx interrupt in ibmvnic_complete_tx():
if (->tx_queues_active)
netif_wake_subqueue();
To ensure that ->tx_queues_active and state of the queues are consistent,
we need a lock which:
- must also be taken in the interrupt path (ibmvnic_complete_tx())
- shared across the multiple queues in the adapter (so they don't
become serialized)
Use rcu_read_lock() and have the reset thread synchronize_rcu() after
updating the ->tx_queues_active state.
While here, consolidate a few boolean fields in ibmvnic_adapter for
better alignment.
Based on discussions with Brian King and Dany Madden.
Fixes: 7ed5b31f4a66 ("net/ibmvnic: prevent more than one thread from running in reset")
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently don't allow queuing resets when adapter is in VNIC_PROBING
state - instead we throw away the reset and return EBUSY. The reasoning
is probably that during ibmvnic_probe() the ibmvnic_adapter itself is
being initialized so performing a reset during this time can lead us to
accessing fields in the ibmvnic_adapter that are not fully initialized.
A review of the code shows that all the adapter state neede to process a
reset is initialized before registering the CRQ so that should no longer
be a concern.
Further the expectation is that if we do get a reset (transport event)
during probe, the do..while() loop in ibmvnic_probe() will handle this
by reinitializing the CRQ.
While that is true to some extent, it is possible that the reset might
occur _after_ the CRQ is registered and CRQ_INIT message was exchanged
but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
such a reset will be thrown away. While the client assumes that the
adapter is functional, the vnic server will wait for the client to reinit
the adapter. This disconnect between the two leaves the adapter down
needing manual intervention.
Because ibmvnic_probe() has other work to do after initializing the CRQ
(such as registering the netdev at a minimum) and because the reset event
can occur at any instant after the CRQ is initialized, there will always
be a window between initializing the CRQ and considering the adapter
ready for resets (ie state == PROBED).
So rather than discarding resets during this window, allow queueing them
- but only process them after the adapter is fully initialized.
To do this, introduce a new completion state ->probe_done and have the
reset worker thread wait on this before processing resets.
This change brings up two new situations in or just after ibmvnic_probe().
First after one or more resets were queued, we encounter an error and
decide to retry the initialization. At that point the queued resets are
no longer relevant since we could be talking to a new vnic server. So we
must purge/flush the queued resets before restarting the initialization.
As a side note, since we are still in the probing stage and we have not
registered the netdev, it will not be CHANGE_PARAM reset.
Second this change opens up a potential race between the worker thread
in __ibmvnic_reset(), the tasklet and the ibmvnic_open() due to the
following sequence of events:
1. Register CRQ
2. Get transport event before CRQ_INIT completes.
3. Tasklet schedules reset:
a) add rwi to list
b) schedule_work() to start worker thread which runs
and waits for ->probe_done.
4. ibmvnic_probe() decides to retry, purges rwi_list
5. Re-register crq and this time rest of probe succeeds - register
netdev and complete(->probe_done).
6. Worker thread resumes in __ibmvnic_reset() from 3b.
7. Worker thread sets ->resetting bit
8. ibmvnic_open() comes in, notices ->resetting bit, sets state
to IBMVNIC_OPEN and returns early expecting worker thread to
finish the open.
9. Worker thread finds rwi_list empty and returns without
opening the interface.
If this happens, the ->ndo_open() call is effectively lost and the
interface remains down. To address this, ensure that ->rwi_list is
not empty before setting the ->resetting bit. See also comments in
__ibmvnic_reset().
Fixes: 6a2fb0e99f9c ("ibmvnic: driver initialization for kdump/kexec")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With previous bug fix, ->wait_capability flag is no longer needed and can
be removed.
Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IBMVNIC_STATS_TIMEOUT and IBMVNIC_INIT_FAILED are not used in the driver.
Remove them.
Suggested-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than releasing the tx pools on every close and reallocating
them on open, reuse the tx pools unless the pool parameters (number
of pools, size of each pool or size of each buffer in a pool) have
changed.
If the pool parameters changed, then release the old pools (if
any) and allocate new ones.
Specifically release tx pools, if:
- adapter is removed,
- pool parameters change during reset,
- we encounter an error when opening the adapter in response
to a user request (in ibmvnic_open()).
and don't release them:
- in __ibmvnic_close() or
- on errors in __ibmvnic_open()
in the hope that we can reuse them during this or next reset.
With these changes reset_tx_pools() can be dropped because its
optimization is now included in init_tx_pools() itself.
cleanup_tx_pools() releases all the skbs associated with the pool and
is called from ibmvnic_cleanup(), which is called on every reset. Since
we want to reuse skbs across resets, move cleanup_tx_pools() out of
ibmvnic_cleanup() and call it only when user closes the adapter.
Add two new adapter fields, ->prev_mtu, ->prev_tx_pool_size to track the
previous values and use them to decide whether to reuse or realloc the
pools.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than releasing the rx pools on and reallocating them on every
reset, reuse the rx pools unless the pool parameters (number of pools,
size of each pool or size of each buffer in a pool) have changed.
If the pool parameters changed, then release the old pools (if any)
and allocate new ones.
Specifically release rx pools, if:
- adapter is removed,
- pool parameters change during reset,
- we encounter an error when opening the adapter in response
to a user request (in ibmvnic_open()).
and don't release them:
- in __ibmvnic_close() or
- on errors in __ibmvnic_open()
in the hope that we can reuse them on the next reset.
With these, reset_rx_pools() can be dropped because its optimzation is
now included in init_rx_pools() itself.
cleanup_rx_pools() releases all the skbs associated with the pool and
is called from ibmvnic_cleanup(), which is called on every reset. Since
we want to reuse skbs across resets, move cleanup_rx_pools() out of
ibmvnic_cleanup() and call it only when user closes the adapter.
Add two new adapter fields, ->prev_rx_buf_sz, ->prev_rx_pool_size to
keep track of the previous values and use them to decide whether to
reuse or realloc the pools.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a follow-on patch, we will reuse long term buffers when possible.
When doing so we have to be careful to properly assign map ids. We
can no longer assign them sequentially because a lower map id may be
available and we could wrap at 255 and collide with an in-use map id.
Instead, use a bitmap to track active map ids and to find a free map id.
Don't need to take locks here since the map_id only changes during reset
and at that time only the reset worker thread should be using the adapter.
Noticed this when analyzing an error Dany Madden ran into with the
patch set.
Reported-by: Dany Madden <drt@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To make the code more readable, use/rename some local variables.
Basically we have a set of pools, num_pools. Each pool has a set of
buffers, pool_size and each buffer is of size buff_size.
pool_size is a bit ambiguous (whether size in bytes or buffers). Add
a comment in the header file to make it explicit.
Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the device to be initialized at a later time if
it is not available at boot. The device will be allowed to probe but
will be given a "down" state. After completing device probe and
registering the net device, the driver will await an interrupt signal
from its partner device, indicating that it is ready for boot. The
driver will schedule a work event to perform the necessary procedure
and begin operation.
Co-developed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Cristobal Forno <cforno12@linux.ibm.com>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit e704f0434ea6 ("ibmvnic: Remove debugfs support") did not
clean up everything. Remove the remaining code.
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
The work queue is used to queue reset requests like CHANGE-PARAM or
FAILOVER resets for the worker thread. When the adapter is being removed
the adapter state is set to VNIC_REMOVING and the work queue is flushed
so no new work is added. However the check for adapter being removed is
racy in that the adapter can go into REMOVING state just after we check
and we might end up adding work just as it is being flushed (or after).
The ->rwi_lock is already being used to serialize queue/dequeue work.
Extend its usage ensure there is no race when scheduling/flushing work.
Fixes: 6954a9e4192b ("ibmvnic: Flush existing work items before device removal")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc:Uwe Kleine-König <uwe@kleine-koenig.org>
Cc:Saeed Mahameed <saeed@kernel.org>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The supported indirect subcrq entries on Power8 is 16. Power9
supports 128. Redefined this value to 16 to minimize the driver from
having to reset when migrating between Power9 and Power8. In our rx/tx
performance testing, we found no performance difference between 16 and
128 at this time.
Fixes: f019fb6392e5 ("ibmvnic: Introduce indirect subordinate Command Response Queue buffer")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
stats_lock is no longer used. So remove it.
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are several spinlock_t definitions without comments.
Add them.
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
drivers/net/ethernet/ibm/ibmvnic.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reset timeout is going off right after adapter reset. This patch ensures
that timeout is scheduled if it has been 5 seconds since the last reset.
5 seconds is the default watchdog timeout.
Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that
the worker thread will start reset process and free the login response
buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will
then try to access the login response buffer and crash.
Have ibmvnic track pending logins and discard any stale login responses.
Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.
Conflicts:
drivers/net/can/usb/gs_usb.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Based on the discussion with Sukadev Bhattiprolu and Dany Madden,
we believe that checking adapter->resetting bit is preferred
since RESETTING state flag is not as strict as resetting bit.
RESETTING state flag is removed since it is verbose now.
Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
PCI bus slowdowns were observed on IBM VNIC devices as a result
of partial cache line writes and non-cache aligned full cache line writes.
Ensure that packet data buffers are cache-line aligned to avoid these
slowdowns.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove unused and superfluous code and members in
existing TX implementation and data structures.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch introduces the infrastructure to send batched subordinate
Command Response Queue descriptors, which are used by the ibmvnic
driver to send TX frame and RX buffer descriptors.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set up the speed according to crq->query_phys_parms.rsp.speed.
Fix IBMVNIC_10GBPS typo.
Fixes: f8d6ae0d27ec ("ibmvnic: Report actual backing device speed and duplex values")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The login response buffer is freed after it is received
and parsed, but other functions in the driver still attempt
to read it, such as when the device is opened, causing the
Oops below. Store relevant information in the driver's
private data structures and use those instead.
BUG: Kernel NULL pointer dereference on read at 0x00000010
Faulting instruction address: 0xc00800000050a900
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables ibmvnic ibmveth crc32c_vpmsum autofs4
CPU: 7 PID: 759 Comm: NetworkManager Not tainted 5.9.0-rc1-00124-gd0a84e1f38d9 #14
NIP: c00800000050a900 LR: c00800000050a8f0 CTR: 00000000005b1904
REGS: c0000001ed746d20 TRAP: 0300 Not tainted (5.9.0-rc1-00124-gd0a84e1f38d9)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24428484 XER: 00000001
CFAR: c0000000000101b0 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
GPR00: c00800000050a8f0 c0000001ed746fb0 c008000000518e00 0000000000000000
GPR04: 00000000000000c0 0000000000000080 0003c366c60c4501 0000000000000352
GPR08: 000000000001f400 0000000000000010 0000000000000000 0000000000000000
GPR12: 0001cf0000000019 c00000001ec97680 00000001003dfd40 0000010008dbb22c
GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000000edb6c8
GPR20: c000000004e73e00 c000000004fd2448 c000000004e6d700 c000000004fd2448
GPR24: c000000004fd2400 c000000004a0cd20 c0000001ed961860 c0080000005029d8
GPR28: 0000000000000000 0000000000000003 c000000004a0c000 0000000000000000
NIP [c00800000050a900] init_resources+0x338/0xa00 [ibmvnic]
LR [c00800000050a8f0] init_resources+0x328/0xa00 [ibmvnic]
Call Trace:
[c0000001ed746fb0] [c00800000050a8f0] init_resources+0x328/0xa00 [ibmvnic] (unreliable)
[c0000001ed747090] [c00800000050b024] ibmvnic_open+0x5c/0x100 [ibmvnic]
[c0000001ed747110] [c000000000bdcc0c] __dev_open+0x17c/0x250
[c0000001ed7471b0] [c000000000bdd1ec] __dev_change_flags+0x1dc/0x270
[c0000001ed747260] [c000000000bdd2bc] dev_change_flags+0x3c/0x90
[c0000001ed7472a0] [c000000000bf24b8] do_setlink+0x3b8/0x1280
[c0000001ed747450] [c000000000bf8cc8] __rtnl_newlink+0x5a8/0x980
[c0000001ed7478b0] [c000000000bf9110] rtnl_newlink+0x70/0xb0
[c0000001ed7478f0] [c000000000bf07c4] rtnetlink_rcv_msg+0x364/0x460
[c0000001ed747990] [c000000000c68b94] netlink_rcv_skb+0x84/0x1a0
[c0000001ed747a00] [c000000000bef758] rtnetlink_rcv+0x28/0x40
[c0000001ed747a20] [c000000000c68188] netlink_unicast+0x218/0x310
[c0000001ed747a80] [c000000000c6848c] netlink_sendmsg+0x20c/0x4e0
[c0000001ed747b20] [c000000000b9dc88] ____sys_sendmsg+0x158/0x360
[c0000001ed747bb0] [c000000000ba1c88] ___sys_sendmsg+0x98/0xf0
[c0000001ed747d10] [c000000000ba1db8] __sys_sendmsg+0x78/0x100
[c0000001ed747dc0] [c000000000033820] system_call_exception+0x160/0x280
[c0000001ed747e20] [c00000000000d740] system_call_common+0xf0/0x27c
Instruction dump:
3be00000 38810068 b1410076 3941006a 93e10072 fbea0000 b1210068 4bff9915
eb9e0ca0 eabe0900 393c0010 3ab50048 <7fa04c2c> 7fba07b4 7b431764 7b4917a0
---[ end trace fbc5949a28e103bd ]---
Fixes: f3ae59c0c015 ("ibmvnic: store RX and TX subCRQ handle array in ibmvnic_adapter struct")
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the driver reads RX and TX subCRQ handle array directly from
a DMA-mapped buffer address when it needs to make a H_SEND_SUBCRQ
hcall. This patch stores that information in the ibmvnic_sub_crq_queue
structure instead of reading from the buffer received at login. The
overall goal of this patch is to parse relevant information from the
login response buffer and store it in the driver's private data
structures so that we don't need to read directly from the buffer and
can then free up that memory.
Signed-off-by: Cristobal Forno <cforno12@linux.ibm.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ibmvnic driver does not check the device state when the device
is removed. If the device is removed while a device reset is being
processed, the remove may free structures needed by the reset,
causing an oops.
Fix this by checking the device state before processing device remove.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Provide some serialization for device CRQ commands
and queries to ensure that the shared variable used for
storing return codes is properly synchronized.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current code allows more than one thread to run in reset. This can
corrupt struct adapter data. Check adapter->resetting before performing
a reset, if there is another reset running delay (100 msec) before trying
again.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset")
made the change to hold the RTNL lock during a reset to avoid deadlock
but linkwatch_event is fired during the reset and needs the RTNL lock.
That keeps linkwatch_event process from proceeding until the reset
is complete. The reset process cannot tolerate the linkwatch_event
processing after reset completes, so release the RTNL lock during the
process to allow a chance for linkwatch_event to run during reset.
This does not guarantee that the linkwatch_event will be processed as
soon as link state changes, but is an improvement over the current code
where linkwatch_event processing is always delayed, which prevents
transmissions on the device from being deactivated leading transmit
watchdog timer to time-out.
Release the RTNL lock before link state change and re-acquire after
the link state change to allow linkwatch_event to grab the RTNL lock
and run during the reset.
Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset")
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 8 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190523091650.663497195@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
It was discovered in testing that the underlying hardware MAC
address will revert to initial settings following a device reset,
but the driver fails to resend the current OS MAC settings. This
oversight can result in dropped packets should the scenario occur.
Fix this by informing hardware of current MAC address settings
following any adapter initialization or resets.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ibmvnic driver currently uses the same fixed name when using
request_irq, this makes it hard to parse when multiple VNIC devices are
available at the same time. This patch adds the unit_address as the device
identification along with an id for each queue.
The original idea was to use the interface name as an identifier, but it
is not feasible given these requests happen at adapter probe, and at this
point netdev is not yet registered so it doesn't have the proper name
assigned to it.
Signed-off-by: Murilo Fossa Vicentini <muvic@linux.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ibmvnic driver currently reports a fixed value for both speed and
duplex settings regardless of the actual backing device that is being
used. By adding support to the QUERY_PHYS_PARMS command defined by the
PAPR+ we can query the current physical port state and report the proper
values for these feilds.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Murilo Fossa Vicentini <muvic@linux.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ibmvnic_reset can create and schedule a reset work item from
an IRQ context, so do not use a mutex, which can sleep. Convert
the reset work item mutex to a spin lock. Locking debugger generated
the trace output below.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
in_atomic(): 1, irqs_disabled(): 1, pid: 120, name: kworker/8:1
4 locks held by kworker/8:1/120:
#0: 0000000017c05720 ((wq_completion)"events"){+.+.}, at: process_one_work+0x188/0x710
#1: 00000000ace90706 ((linkwatch_work).work){+.+.}, at: process_one_work+0x188/0x710
#2: 000000007632871f (rtnl_mutex){+.+.}, at: rtnl_lock+0x30/0x50
#3: 00000000fc36813a (&(&crq->lock)->rlock){..-.}, at: ibmvnic_tasklet+0x88/0x2010 [ibmvnic]
irq event stamp: 26293
hardirqs last enabled at (26292): [<c000000000122468>] tasklet_action_common.isra.12+0x78/0x1c0
hardirqs last disabled at (26293): [<c000000000befce8>] _raw_spin_lock_irqsave+0x48/0xf0
softirqs last enabled at (26288): [<c000000000a8ac78>] dev_deactivate_queue.constprop.28+0xc8/0x160
softirqs last disabled at (26289): [<c0000000000306e0>] call_do_softirq+0x14/0x24
CPU: 8 PID: 120 Comm: kworker/8:1 Kdump: loaded Not tainted 4.20.0-rc6 #6
Workqueue: events linkwatch_event
Call Trace:
[c0000003fffa7a50] [c000000000bc83e4] dump_stack+0xe8/0x164 (unreliable)
[c0000003fffa7aa0] [c00000000015ba0c] ___might_sleep+0x2dc/0x320
[c0000003fffa7b20] [c000000000be960c] __mutex_lock+0x8c/0xb40
[c0000003fffa7c30] [d000000006202ac8] ibmvnic_reset+0x78/0x330 [ibmvnic]
[c0000003fffa7cc0] [d0000000062097f4] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic]
[c0000003fffa7e00] [c0000000001224c8] tasklet_action_common.isra.12+0xd8/0x1c0
[c0000003fffa7e60] [c000000000bf1238] __do_softirq+0x1a8/0x64c
[c0000003fffa7f90] [c0000000000306e0] call_do_softirq+0x14/0x24
[c0000003f3f87980] [c00000000001ba50] do_softirq_own_stack+0x60/0xb0
[c0000003f3f879c0] [c0000000001218a8] do_softirq+0xa8/0x100
[c0000003f3f879f0] [c000000000121a74] __local_bh_enable_ip+0x174/0x180
[c0000003f3f87a60] [c000000000bf003c] _raw_spin_unlock_bh+0x5c/0x80
[c0000003f3f87a90] [c000000000a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160
[c0000003f3f87ad0] [c000000000a8c8b0] dev_deactivate_many+0xd0/0x520
[c0000003f3f87b70] [c000000000a8cd40] dev_deactivate+0x40/0x60
[c0000003f3f87ba0] [c000000000a5e0c4] linkwatch_do_dev+0x74/0xd0
[c0000003f3f87bd0] [c000000000a5e694] __linkwatch_run_queue+0x1a4/0x1f0
[c0000003f3f87c30] [c000000000a5e728] linkwatch_event+0x48/0x60
[c0000003f3f87c50] [c0000000001444e8] process_one_work+0x238/0x710
[c0000003f3f87d20] [c000000000144a48] worker_thread+0x88/0x4e0
[c0000003f3f87db0] [c00000000014e3a8] kthread+0x178/0x1c0
[c0000003f3f87e20] [c00000000000bfd0] ret_from_kernel_thread+0x5c/0x6c
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch changes to use rtnl_lock only during a reset to avoid
deadlock that could occur when a thread operating close is holding
rtnl_lock and waiting for reset_lock acquired by another thread,
which is waiting for rtnl_lock in order to set the number of tx/rx
queues during a reset.
Also, we now setting the number of tx/rx queues during a soft reset
for failover or LPM events.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When choosing channel amounts and ring sizes, the maximums in the
ibmvnic driver are defined by the virtual i/o server management
partition. Even though they are defined as maximums, the client
driver may in fact successfully request resources that exceed
these limits, which are mostly dependent on a user's hardware
With this in mind, provide an ethtool flag that when enabled will
allow the user to request resources limited by driver-defined
maximums instead of limits defined by the management partition.
The driver will try to honor the user's request but may not allowed
by the management partition. In this case, the driver requests
as close as it can get to the desired amount until it succeeds.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce driver-defined maximums for queue ring sizes. Devices
available for IBM vNIC today will likely not allow this amount,
but this should give us some leeway for future devices that may
support larger ring sizes.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Increase queue size limit to 16. Devices available for IBM vNIC today
will not allow this amount, but this should give us some leeway for
future devices that may support more RX or TX queues.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|