Age | Commit message (Collapse) | Author |
|
Now that we have a subsystem for compute accelerators, move the
habanalabs driver to it.
This patch only moves the files and fixes the Makefiles. Future
patches will change the existing code to register to the accel
subsystem and expose the accel device char files instead of the
habanalabs device char files.
Update the MAINTAINERS file to reflect this change.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As ASICs are evolving, we will need to update the DRAM properties at
various points because we may get different information from the f/w
at different points of the initialization.
This ASIC function is a foundation for this capability.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add missing le32_to_cpu() conversions, and use %d for the value
returned from atomic_read().
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As part of the RAS that is done by the f/w, we should send a message
to the f/w when a user either acquires or releases the device.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Except Goya, none of our ASICs require context switch flow, hence we
enable this flow only where it is needed.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Since hwmon fini code is common for all asics, unified it to common
function.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We don't use KDMA concurrently in the driver. The only use is through
debugfs and we don't protect concurrent access through it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to be more explicit we should use the term compute_reset
for describing the reset in which only the compute engines gets
reset.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Change is_idle functions so it would be more usable outside debugfs.
Do this by replacing seq_file parameter with regular string.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
H/W being dirty during initialization is completely expected in case
f/w tools are used before loading the driver. As it is not an error,
and as it doesn't give any meaningful information to the user,
no point of printing it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Doing compute reset can be the traditional inference soft reset
that is supported only in Goya.
Or it can be the new reset upon device release, which is supported
in Gaudi2 and above.
Therefore, wherever suitable, use the terminology of compute reset
instead of soft reset.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For gaudi2 we need to send a value to F/W as part of the
PCI_ACCESS packet.
As a preparation, modify hl_fw_send_pci_access_msg() to have a 'value'
field.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Currently we are not waiting for preboot ready after hard reset.
This leads to a race in which COMMs protocol begins but will get no
response from the f/w.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
New asic properties were added for Gaudi2. We want to initialize
and use them, when relevant, also for Goya and Gaudi.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There are a number of new ASIC-specific functions that were added
for Gaudi2. To make the common code work, we need to define empty
implementations of those functions for Goya and Gaudi.
Some functions will return error if called with Goya/Gaudi.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code need to initialize,
finalize and submit workloads to the Gaudi2 ASIC.
It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.
It contains new debugfs entry to dump razwi events. razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
PCI bar size is resource_size_t so we should use %pa to make it work
correctly on all architectures.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Because in future ASICs the driver will allow the user to set the
page size we need to make sure this data is propagated in all APIs.
In addition, since this is already an ASIC property we no longer need
ASIC function for it.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We dropped support for page sizes that are not power of 2.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This is a pre-requisite patch for adding tracepoints to the DMA memory
operations (allocation/free) in the driver.
The main purpose is to be able to cross data with the map operations and
determine whether memory violation occurred, for example free DMA
allocation before unmapping it from device memory.
To achieve this the DMA alloc/free code flows were refactored so that a
single DMA tracepoint will catch many flows.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The values in this enum are not used by h/w but are a contract
between userspace and the kernel driver so they must be defined
in the uapi file.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We use scrub_device_mem only to scrub the entire SRAM and entire
DRAM. Therefore there is no need to send addr and size
args to the callback.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is a rare race condition in CB completion mechanism, that can
occur under a very high pressure of command submissions.
The preconditions for this to happen are:
1. There should be enough command submissions for the pre-allocated
patched CB pool to run out of commands. At this stage we start
allocating new patched CBs as they arrive.
2. CB size has to be exactly (128*n + 104)B for some n, i.e. 24B below
a cache line end.
The flow:
1. Two command buffers being completed on different streams, at the
same time. Denote those CB1 and CB2.
2. Each command buffer is injected with two messages, 16B each - one
for a HBW update of the completion queue, another to raise
interrupt.
3. Assume CB1 updated the completion queue and raise the interrupt.
4. Assume CB2 updated the completion queue but did not raise the
interrupt yet.
5. The host receives the interrupt. It goes over the completion queue
and sees two completions - CB1 and CB2. Release them both.
6. CB2 performs the last command. The problem is that the last command
is split between 2 cache lines. So to read the last 8B of the last
command, it has to access the host again. Problem is - CB2 is
already released. This causes a DMAR error.
The solution to this problem is simply to make sure the last two
commands in the CB are always in the same cache line, using NOP padding.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This asic callback function is not called anymore from the common code.
The asic-specific function itself is called but from within the
asic-specific code.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
When user requests to prefetch the MMU translations, the driver will
not block the user until prefetch is done.
Instead, the prefetch work will be delegated to a WQ which will do it
in the background.
This way, the prefetch may progress without blocking the user at all.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add the ability to scrub the device memory with a given value.
Add file 'dram_mem_scrub_val' to set the value
and a file 'dram_mem_scrub' to scrub the dram.
This is very important to help during automated tests, when you want
the CI system to randomize the memory before training certain
DL topologies.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
With the new code required for the flow added, we can now switch
to using the new memory manager infrastructure, removing the old code.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of using for_each_sg when iterating sgt that contains dma
entries, use the more proper for_each_sgtable_dma_sg macro.
In addition, both Goya and Gaudi have the exact same implementation
of the asic function that encapsulate the usage of this macro, so
it is better to move that implementation to the common code.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently we have two reset prints per reset. One is in the common
code and one in each asic-specific file.
We can change the asic-specific message to be debug only as we can
know the type of reset being done according to the print in the
common code, which is also easier to maintain.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Halting compute engines is a print that doesn't add us any information
because it is always done in the reset process and not used elsewhere.
Even if it was, we don't use prints to mark functions we passed
through.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The debugfs memory access now uses the callback 'access_dev_mem'
so there is no use of the callbacks
'debugfs_{read32,read64,write32,write6}'. Remove them.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is a preparation for unifying the code of accessing device memory
through debugfs. Add struct fields and callbacks that will later
be used in debugfs code and will reduce code duplication
among the different read{32,64}/write{32,64} callbacks of
every asic.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When Gaudi device is secured the monitors data in the configuration
space is blocked from PCI access.
As we need to enable user to get sync-manager monitors registers when
debugging, this patch adds a debugfs that dumps the information to a
binary file (blob).
When a root user will trigger the dump, the driver will send request to
the f/w to fill a data structure containing dump of all monitors
registers.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is necessary pre-requisite for future ASIC support, where MMU
TLB prefetch is supported.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The required DMA mask is no longer based on input from the F/W, but it
is fixed per ASIC according to its address space.
As such, the per-ASIC function to get this value can be replaced with a
property variable.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Future devices will support multiple device memory page sizes.
In addition, an API for the user was added for it to be able to control
the device memory allocation page size.
This patch is a complementary patch to inform the user of the available
page size supported by the device.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There is no need to hold each MMU mask/shift as a denoted structure
member (e.g. hop0_mask).
Instead converting it to array will result in smaller and more readable
code.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch breaks the cumbersome implementation of "get real page size"
along with it's multiple inner conditions and implement each case
(according to the real complexity) inside an ASIC function.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Looking forward we will need to report to the user what is the default
page size used.
This will be done more conveniently by explicitly updating the property
rather than to rely on a "0 meaning default" value.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In future ASICs the MMU will be able to work with multiple page sizes,
thus a new flag is added to allow the user to set the requested page
size.
This flag is added since the whole DRAM is allocated for the user and
the user also should be familiar with the memory usage use case.
As such, the user may choose to "over allocate" memory in favor of
performance (for instance- large page allocations covers more memory
in less TLB entries).
For example: say available page sizes are of 1MB and 32MB. If user
wants to allocate 40MB the user can either set page size to 1MB and
allocate the exact amount of memory (but will result in 40 TLB entries)
or the user can use 32MB pages, "waste" 8MB of physical memory but
occupy only 2 TLB entries.
Note that this feature will be available only to ASIC that supports
multiple DRAM page sizes.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For current devices there is a need to send the max power value to F/W
during device init, for example because there might be several card
types.
In future devices, this info will be programmed in the device's EEPROM
and will be read by F/W, and hence the driver should not send it.
Modify the sending of the relevant message to be done only for ASIC
types that need it.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
On Goya and Gaudi, the stop-on-error configuration can be set via
debugfs. However, in future devices, this configuration will always be
enabled.
Modify the debugfs node to be allowed only for ASICs that support this
dynamic configuration.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to support several device MMU blocks with different
architectures (e.g. different HOP table size) we need to move to
per-MMU properties rather than keeping those properties as ASIC
properties.
Refactoring the code to use "per-MMU proprties" is a major effort.
To start making the transition towards this goal but still support
taking the properties from ASIC properties (for code that currently
uses them) this patch copies some of the properties to the "per-MMU"
properties and later, when implementing the per-MMU properties, we
would be able to delete the MMU props from the ASIC props.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We have a common function that wraps the call to the MMU cache
invalidation function, which is ASIC-specific. The wrapper checks
the return value and prints error if necessary. For consistency, try
to use the wrapper when possible.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We don't need this workaround anymore.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Setting PLL profile is the same for all ASICs, except for GOYA.
However, because this function is never called from common code, there
is no need to have an asic-specific callback function.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Retrieving the clock from the f/w is done exactly the same in ALL our
ASICs. Therefore, no real justification for doing it as an
ASIC-specific function.
The only thing is we need to check if we are running on simulator,
which doesn't require ASIC-specific callback.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Now that clock gating is permanently disabled in GAUDI, no need for
the ASIC functions of setting and disabling clock gating, as this
was a unique scenario in GAUDI.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Unify variables related to device reset, which will help us to
add some new reset functionality in future patches.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
During the MMU development the MMU header files were left with unclean
definitions:
- MMU "version specific" definitions that were left in the mmu_general
file
- unused definitions
This patch attempts, where possible, to keep definitions that can serve
multiple MMU versions (but that are not tightly bound with specific MMU
arch) in the mmu_general header file (e.g. different definitions for
number of HOPs).
Otherwise, move MMU version specific definitions (e.g. HOPs masks and
shifts) to the specific MMU version file.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|