Age | Commit message (Collapse) | Author |
|
With the increase in the size of the recoverable page fault
queue, we want to ensure the initial messages from GuC in
the G2H buffer have space while we transfer those out to the
actual pf_queue. Bump the G2H queue size to account for this
increase in the pf_queue size.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4c2b6974801bcffd8a010d838c8733fa4092573d.1723862633.git.stuart.summers@intel.com
|
|
Ensure we are managing G2H credits correctly. Extra important now that
this is tied to PM.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240725231801.1958038-1-matthew.brost@intel.com
|
|
Take PM ref when any G2H are outstanding, drop when none are
outstanding.
To safely ensure we have PM ref when in the GuC CT layer, a PM ref needs
to be held when scheduler messages are pending too.
v2:
- Add outer PM protections to xe_file_close (CI)
v3:
- Only take PM ref 0->1 and drop on 1->0 (Matthew Auld)
v4:
- Add assert to G2H increment function
v5:
- Rebase
v6:
- Declare xe as local variable in xe_file_close (CI)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-5-matthew.brost@intel.com
|
|
GuC TLB invalidation depends on GuC to process the request from the CT
queue and then the real time to invalidate TLB. Add a function to return
overestimated possible time a TLB inval H2G might take which can be used
as timeout value for TLB invalidation wait time.
v4: Make sure CTB is in 4K blocks(Michal) and other doc fixes
v3: Pass CT to xe_guc_ct_queue_proc_time_jiffies() (Michal)
Add tlb_timeout_jiffies() that replaces TLB_TIMEOUT(Michal)
v2: Address reviews from Michal.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1622
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240628085845.2369-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
|
|
We maintain GuC error code values in hex format. Also print them
in that format for easier matching.
While at it, slightly reformat the log and add missing \n.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240625141258.1257-4-michal.wajdeczko@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The G2H RETRY message sent by the GuC does not necessary indicate
any serious problem and can be a part of the normal communication
flow. Switch the log level from warning to more appropriate debug.
This will also let the CI ignore these logs which were seen in few
SR-IOV scenarios.
While at it, use hex to print the reason and add missing \n.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240625141258.1257-2-michal.wajdeczko@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
In multi-gpu environments it is important to know the device
guc txn belongs to. The tracing information includes the device_id
to indicate the device the event is associated with.
v2: Use variable sized variant to display dev name(Gustavo)
v3: Pass single argument to __assign_str to fix kunit error
v4: Minor formatting tweaks
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240607182943.3572524-5-radhakrishna.sripada@intel.com
|
|
xe_trace.h is starting to get over crowded. Move the traces
related to guc to its own file.
v2: Update year in License(Gustavo)
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Suggested-by: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240607182943.3572524-3-radhakrishna.sripada@intel.com
|
|
During early initialization, in the xe_guc_min_load_for_hwconfig()
function, we are successfully enabling CTB communication, but it
will only allow us to send non-blocking H2G messages, as due to
not yet enabled IRQs, including G2H IRQs, we will not notice any
new G2H message sent by the GuC, including replies to our blocking
H2G request messages. And those successful replies are mandatory
for the VF drivers to continue normal operations.
As attempt to workaround this driver initialization ordering issue,
introduce special safe-mode CTB worker, that will periodically
trigger G2H processing, like original IRQ handler, in case no
MSI/MSIX IRQs were enabled on the driver yet. Once we detect that
IRQ were enabled, we will stop this worker.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240606130639.1504-3-michal.wajdeczko@intel.com
|
|
In the next patch we will want to perform the same steps that
g2h worker function is doing but from the different worker.
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240606130639.1504-2-michal.wajdeczko@intel.com
|
|
When thresholds used to monitor VFs activities are configured,
then GuC may send GUC2PF_ADVERSE_EVENT messages informing the
PF driver about exceeded thresholds. Start handling such messages.
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-8-michal.wajdeczko@intel.com
|
|
System work queues are shared, use a dedicated work queue for G2H
processing to avoid G2H processing getting block behind system tasks.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506034758.3697397-1-matthew.brost@intel.com
|
|
By default CT code was passing just payload of the G2H event
message, while Relay code expects full G2H message including
HXG header which contains DATA0 field. Fix that.
Fixes: 26d4481ac23f ("drm/xe/guc: Start handling GuC Relay event messages")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419150351.358-1-michal.wajdeczko@intel.com
|
|
Now that assert_mem_access is relying directly on the pm_runtime state
instead of the counters, there's no reason why we cannot use
the pm_runtime functions directly.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-8-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
GuC CTB is related to the GT, so best to use xe_gt_assert().
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404193647.759-2-michal.wajdeczko@intel.com
|
|
A platform can have more than one GuC, so we should use GT-oriented
logs to refer to specific GuC.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404193647.759-1-michal.wajdeczko@intel.com
|
|
It is useful capture the GuC CT snapshot if the GuC CT has been
forcefully put into the stopped state. Enable snapshot capture when in
this state.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405211632.223568-3-matthew.brost@intel.com
|
|
The flags stored in the BO grew over time without following
much a naming pattern. First of all, get rid of the _BIT suffix that was
banned from everywhere else due to the guideline in
drivers/gpu/drm/i915/i915_reg.h that xe kind of follows:
Define bits using ``REG_BIT(N)``. Do **not** add ``_BIT`` suffix to the name.
Here the flags aren't for a register, but it's good practice to keep it
consistent.
Second divergence on names is the use or not of "CREATE". This is
because most of the flags are passed to xe_bo_create*() family of
functions, changing its behavior. However, since the flags are also
stored in the bo itself and checked elsewhere in the code, it seems
better to just omit the CREATE part.
With those 2 guidelines, all the flags are given the form
XE_BO_FLAG_<FLAG_NAME> with the following commands:
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i \
-e "s/XE_BO_\([_A-Z0-9]*\)_BIT/XE_BO_\1/g" \
-e 's/XE_BO_CREATE_/XE_BO_FLAG_/g'
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i -r \
-e 's/XE_BO_(DEFER_BACKING|SCANOUT|FIXED_PLACEMENT|PAGETABLE|NEEDS_CPU_ACCESS|NEEDS_UC|INTERNAL_TEST|INTERNAL_64K|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g'
And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to
follow the coding style.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
GuC will use VF_STATE_NOTIFY events to notify the PF about changes
of the VF state, in particular when a VF FLR was requested. Add
very minimal support for such events to avoid reporting errors due
to unexpected G2H. We will improve handling of these messages later.
While around also add few basic functions to control the VF state
(pause, resume, stop) as we will also exercise them soon.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240326191518.363-3-michal.wajdeczko@intel.com
|
|
The initialization via drmm_mutex_init can fail, so we need to check the
return code and escalate the failure.
The mutex initialization has been moved after all the other init steps
that can't fail, so we're always guaranteed to have those done and don't
have to check in the cleanup code.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240321195512.274210-1-daniele.ceraolospurio@intel.com
|
|
Add XE_BO_GGTT_INVALIDATE flag which indicates the GGTT should be
invalidated when a BO is added / removed from the GGTT. This is
typically set when a BO is used by the GuC as the GuC has GGTT TLBs.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
[mlankhorst: Small fix to only inherit GGTT_INVALIDATE from src bo]
[mlankhorst: Remove _BIT from name]
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306052002.311196-4-matthew.brost@intel.com
|
|
GuC load will need to happen at an earlier point in probe, where local
memory is not yet available. Use system memory for GuC data structures
used for initial "hwconfig" load, and realloc at a later,
"post-hwconfig" load if needed, when local memory is available.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com
|
|
Make sure G2H handler is not running when changing the CT state to drop
messages or disabled. This will help prevent races in the code ensuring
that G2H are not being processed after changing the state.
v2:
- s/flush_g2h_handler/stop_g2h_handler (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo remove the extra line while pushing]
Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-4-matthew.brost@intel.com
|
|
The Guc CT has more than enabled / disables states rather it has 4. The
4 states are not initialized, disabled, stopped, and enabled. Change the
code to reflect this. These states will enable proper return codes from
functions and therefore enable proper error messages.
v2:
- s/XE_GUC_CT_STATE_DROP_MESSAGES/XE_GUC_CT_STATE_STOPPED (Michal)
- Add assert for CT being initialized (Michal)
- Fix kernel for CT state enum (Michal)
v3:
- Kernel doc (Michal)
- s/reiecved/received (Michal)
- assert CT state not initialized in xe_guc_ct_init (Michal)
- add argument xe_guc_ct_set_state to clear g2h (Michal)
v4:
- Drop clear_outstanding_g2h argument (Michal)
v5:
- Move xa_destroy outside of fast lock (CI)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-2-matthew.brost@intel.com
|
|
Right now devcoredump has a new line between '**** GuC CT ****' and
'H2G CTB (all sizes in DW):' while other sections don't have.
v2: remove double new line after IPEHR
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-1-jose.souza@intel.com
|
|
Add initial documentation for recently updated xe_guc_ct_send_recv().
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240112102554.761-2-michal.wajdeczko@intel.com
|
|
Most of the synchronous GuC HXG action responses are defined in
such a way that only mandatory DATA0 from the HXG header is used
and only in few cases it is more than MBZ (must be zero).
For those cases where HXG action returns just DATA0, return that
value if caller didn't provide buffer for the full response.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240112102554.761-1-michal.wajdeczko@intel.com
|
|
While parsing and processing CTB G2H messages we should extract
underlying HXG message and use HXG definitions on such message.
Using outer CTB layer message in HXG definitions require use of
shifted dword index, which might be confusing:
FIELD_GET(GUC_HXG_MSG_0_xxx, msg[1])
instead of:
FIELD_GET(GUC_HXG_MSG_0_xxx, hxg[0])
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20240111210632.717-1-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
|
|
Not all CTB responses from the GuC are fixed size and we need to
pass response length to the caller, if there was a response_buffer.
Easiest solution is to return it as positive value from all
xe_guc_ct_send_recv() functions. The CTB response length is always
between 1 and 254 (ie. GUC_HXG_MSG_MIN_LEN and GUC_CTB_MAX_DWORDS
- GUC_HXG_MSG_MIN_LEN).
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20240111152724.497-1-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
|
|
GuC Relay infrastructure is ready, start handling relay messages
from the GuC to unblock testing on the live system.
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20240104222031.277-11-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
|
|
We want to use replacement functions in upcoming kunit tests.
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20240104222031.277-9-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
|
|
We're currently sending non-blocking H2G messages using the EVENT type,
which suppresses all CTB protocol replies from the GuC, including the
failure cases. This might cause errors to slip through and manifest as
unexpected behavior (e.g. a context state might not be what the driver
thinks it is because the state change command was silently rejected by
the GuC). To avoid this kind of problems, we can use the FAST_REQUEST
type instead, which suppresses the reply only on success; this way we
still get the advantage of not having to wait for an ack from the GuC
(i.e. the H2G is still non-blocking) while still detecting errors.
Since we can't escalate to the caller when a non-blocking message
fails, we need to escalate to GT reset instead.
Note that FAST_REQUEST failures are NOT expected and are usually a sign
that the H2G was either malformed or requested an illegal operation.
v2: assign fence values to FAST_REQUEST messages, fix abi doc, use xe_gt
printers (Michal).
v3: fix doc alignment, fix and improve prints (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v2
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
|
|
A helper for managed BO allocations makes it possible to remove specific
"fini" actions and will simplify the following patches adding ability to
execute a release action for specific BO directly.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
On i915 we were adding new GuC ABI headers directly to guc_fwif.h
file since we were replacing old definitions from that file.
On xe driver we could do more and better by including ABI headers
only in files that need those definitions.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/741
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20231128203203.1147-3-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This variable holds full length of the message, including header
length so it should be checked against GUC_CTB_MSG_MAX_LEN.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Event tracing enabled for CTB submissions.
Additional minor refactor - Removed a unnecessary ct_to_xe() call.
v2: Remove a unwanted comment (Hari)
Add missing change to commit message
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Link: https://lore.kernel.org/intel-xe/20231019093140.1901665-2-balasubramani.vivekanandan@intel.com/
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add missing mutex_destroy calls to fini functions or convert to
drmm_mutex_init where fini function is not available.
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Bommithi Sakeena <bommithi.sakeena@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The XE_WARN_ON macro maps to WARN_ON which is not justified
in many cases where only a simple debug check is needed.
Replace the use of the XE_WARN_ON macro with the new xe_assert
macros which relies on drm_*. This takes a struct drm_device
argument, which is one of the main changes in this commit. The
other main change is that the condition is reversed, as with
XE_WARN_ON a message is displayed if the condition is true,
whereas with xe_assert it is if the condition is false.
v2:
- Rebase
- Keep WARN splats in xe_wopcm.c (Matt Roper)
v3:
- Rebase
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Use the generic drm_warn instead of the driver-specific XE_WARN_ON
in cases where XE_WARN_ON is used to unconditionally print a debug
message.
v2: Rebase
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Actually print the info.resv_space.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Engine was inappropriately used to refer to execution queues and it
also created some confusion with hardware engines. Where it applies
the exec_queue variable name is changed to q and comments are also
updated.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The header variable is unused, remove it.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Suggested-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls
WARN() instead of BUG(). BUG() crashes the kernel and should only be
used when it is absolutely unavoidable in case of catastrophic and
unrecoverable failures, which is not the case here.
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This is unused, remove it.
Suggested-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
XE_GUC_CT_SELFTEST enabled a debugfs entry to which ran a very simple
selftest ensuring the GuC CT code worked. This was added before the
kunit framework was available and before submissions were working too.
This test isn't worth porting over to the kunit frame as if the GuC CT
didn't work, literally almost nothing would work so just remove this.
Suggested-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The callers should already be holding the mem_access reference, before
calling into this.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
It looks like there is at least one race here, given that the
pm_runtime_suspended() check looks to return false if we are in the
process of suspending the device (RPM_SUSPENDING vs RPM_SUSPENDED). We
later also do xe_pm_runtime_get_if_active(), but since the device is
suspending or has now suspended, this doesn't do anything either.
Following from this we can potentially return from
xe_device_mem_access_get() with the device suspended or about to be,
leading to broken behaviour.
Attempt to fix this by always grabbing the runtime ref when our internal
ref transitions from 0 -> 1. The hard part is then dealing with the
runtime_pm callbacks also calling xe_device_mem_access_get() and
deadlocking, which the pm_runtime_suspended() check prevented.
v2:
- ct->lock looks to be primed with fs_reclaim, so holding that and then
allocating memory will cause lockdep to complain. Now that we
unconditionally grab the mem_access.lock around mem_access_{get,put}, we
need to change the ordering wrt to grabbing the ct->lock, since some of
the runtime_pm routines can allocate memory (or at least that's what
lockdep seems to suggest). Hopefully not a big deal. It might be that
there were already issues with this, just that the atomics where
"hiding" the potential issues.
v3:
- Use Thomas Hellström' idea with tracking the active task that is
executing in the resume or suspend callback, in order to avoid
recursive resume/suspend calls deadlocking on itself.
- Split the ct->lock change.
v4:
- Add smb_mb() around accessing the pm_callback_task for extra safety.
(Thomas Hellström)
v5:
- Clarify the kernel-doc for the mem_access.lock, given that it is quite
strange in what it protects (data vs code). The real motivation is to
aid lockdep. (Rodrigo Vivi)
v6:
- Split out the lock change. We still want this as a lockdep aid but
only for the xe_device_mem_access_get() path. Sticking a lock on the
put() looks be a no-go, also the runtime_put() there is always async.
- Now that the lock is gone move to atomics and rely on the pm code
serialising multiple callers on the 0 -> 1 transition.
- g2h_worker_func() looks to be the next issue, given that
suspend-resume callbacks are using CT, so try to handle that.
v7:
- Add xe_device_mem_access_get_if_ongoing(), and use it in
g2h_worker_func().
v8 (Anshuman):
- Just always grab the rpm, instead of just on the 0 -> 1 transition,
which is a lot clearer and simplifies the code quite a bit.
v9:
- Make sure we also adjust the CT fast-path with if-active.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/258
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Acked-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Reduce the number of warnings reported by checkpatch.pl from 118 to 48 by
addressing those warnings types:
LEADING_SPACE
LINE_SPACING
BRACES
TRAILING_SEMICOLON
CONSTANT_COMPARISON
BLOCK_COMMENT_STYLE
RETURN_VOID
ONE_SEMICOLON
SUSPECT_CODE_INDENT
LINE_CONTINUATIONS
UNNECESSARY_ELSE
UNSPECIFIED_INT
UNNECESSARY_INT
MISORDERED_TYPE
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
In various test cases that put the system under a heavy load, we can
sometimes see errors with missed TLB invalidations. In such cases we see
the interrupt arrive for the invalidation from the GuC, however the
actual processing of the completion is pushed onto a workqueue and
handled with all the other CT stuff, which might take longer than
expected. Since we expect TLB invalidations to complete within a
reasonable amount of time (at most ~250ms), and they do seem pretty
critical, allow handling directly from the CT fast-path.
v2 (José):
- Actually use the correct spinlock/unlock_irq, since pending_lock is
grabbed from IRQ.
v3:
- Don't publish the TLB fence on the list until after we fully
initialize it and successfully do the CT send. The list is now only
protected by the spin_lock pending_lock and we can't hold that
across the entire TLB send operation.
v4 (Matt Brost):
- Be careful with racing against fast CT path writing the seqno,
before we have actually published the fence.
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/297
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/320
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/449
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Looks to always to be zero when inspecting the CTB dump.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|