diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-12 11:21:29 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-12 11:21:29 -0800 |
commit | 9d33edb20f7e6943250d6bb96ceaf2368f674d51 (patch) | |
tree | 4f31a59b262b5cca1905b3f74a43cfab09bed416 /kernel | |
parent | f10bc40168032962ebee26894bdbdc972cde35bf (diff) | |
parent | 6132a490f9c81d621fdb4e8c12f617dc062130a2 (diff) | |
download | lwn-9d33edb20f7e6943250d6bb96ceaf2368f674d51.tar.gz lwn-9d33edb20f7e6943250d6bb96ceaf2368f674d51.zip |
Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt core and driver subsystem:
The bulk is the rework of the MSI subsystem to support per device MSI
interrupt domains. This solves conceptual problems of the current
PCI/MSI design which are in the way of providing support for
PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows
device manufactures to provide implementation defined storage for MSI
messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified
message store which is uniform accross all devices). The PCI/MSI[-X]
uniformity allowed us to get away with "global" PCI/MSI domains.
IMS not only allows to overcome the size limitations of the MSI-X
table, but also gives the device manufacturer the freedom to store the
message in arbitrary places, even in host memory which is shared with
the device.
There have been several attempts to glue this into the current MSI
code, but after lengthy discussions it turned out that there is a
fundamental design problem in the current PCI/MSI-X implementation.
This needs some historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management
was completely different from what we have today in the actively
developed architectures. Interrupt management was completely
architecture specific and while there were attempts to create common
infrastructure the commonalities were rudimentary and just providing
shared data structures and interfaces so that drivers could be written
in an architecture agnostic way.
The initial PCI/MSI[-X] support obviously plugged into this model
which resulted in some basic shared infrastructure in the PCI core
code for setting up MSI descriptors, which are a pure software
construct for holding data relevant for a particular MSI interrupt,
but the actual association to Linux interrupts was completely
architecture specific. This model is still supported today to keep
museum architectures and notorious stragglers alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the
kernel, which was creating yet another architecture specific mechanism
and resulted in an unholy mess on top of the existing horrors of x86
interrupt handling. The x86 interrupt management code was already an
incomprehensible maze of indirections between the CPU vector
management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X]
implementation.
At roughly the same time ARM struggled with the ever growing SoC
specific extensions which were glued on top of the architected GIC
interrupt controller.
This resulted in a fundamental redesign of interrupt management and
provided the today prevailing concept of hierarchical interrupt
domains. This allowed to disentangle the interactions between x86
vector domain and interrupt remapping and also allowed ARM to handle
the zoo of SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The X86
encapsulation looks like this:
|--- device 1
[Vector]---[Remapping]---[PCI/MSI]--|...
|--- device N
where the remapping domain is an optional component and in case that
it is not available the PCI/MSI[-X] domains have the vector domain as
their parent. This reduced the required interaction between the
domains pretty much to the initialization phase where it is obviously
required to establish the proper parent relation ship in the
components of the hierarchy.
While in most cases the model is strictly representing the chain of IP
blocks and abstracting them so they can be plugged together to form a
hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the
hardware it's clear that the actual PCI/MSI[-X] interrupt controller
is not a global entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the
easy solution of providing "global" PCI/MSI domains which was possible
because the PCI/MSI[-X] handling is uniform across the devices. This
also allowed to keep the existing PCI/MSI[-X] infrastructure mostly
unchanged which in turn made it simple to keep the existing
architecture specific management alive.
A similar problem was created in the ARM world with support for IP
block specific message storage. Instead of going all the way to stack
a IP block specific domain on top of the generic MSI domain this ended
in a construct which provides a "global" platform MSI domain which
allows overriding the irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the
MSI infrastructure in wireless drivers, NTB etc. where support for
implementation specific message storage was just mindlessly glued into
the existing infrastructure. Some of this just works by chance on
particular platforms but will fail in hard to diagnose ways when the
driver is used on platforms where the underlying MSI interrupt
management code does not expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to
allocate or free individual vectors after the initial enablement of
MSI-X. This results in an works by chance implementation of VFIO (PCI
pass-through) where interrupts on the host side are not set up upfront
to avoid resource exhaustion. They are expanded at run-time when the
guest actually tries to use them. The way how this is implemented is
that the host disables MSI-X and then re-enables it with a larger
number of vectors again. That works by chance because most device
drivers set up all interrupts before the device actually will utilize
them. But that's not universally true because some drivers allocate a
large enough number of vectors but do not utilize them until it's
actually required, e.g. for acceleration support. But at that point
other interrupts of the device might be in active use and the MSI-X
disable/enable dance can just result in losing interrupts and
therefore hard to diagnose subtle problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to
utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact
that IMS is not longer providing a uniform storage and configuration
model.
The solution to this is to implement the missing step and switch from
global PCI/MSI domains to per device PCI/MSI domains. The resulting
hierarchy then looks like this:
|--- [PCI/MSI] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per
device:
|--- [PCI/MSI] device 1
|--- [PCI/IMS] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
|--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt
domains to the new model, provides new interfaces for post-enable
allocation/free of MSI-X interrupts and the base framework for
PCI/IMS. PCI/IMS has been verified with the work in progress IDXD
driver.
There is work in progress to convert ARM over which will replace the
platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
"solutions" are in the works as well.
Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place"
* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits)
irqchip/ti-sci-inta: Fix kernel doc
irqchip/gic-v2m: Mark a few functions __init
irqchip/gic-v2m: Include arm-gic-common.h
irqchip/irq-mvebu-icu: Fix works by chance pointer assignment
iommu/amd: Enable PCI/IMS
iommu/vt-d: Enable PCI/IMS
x86/apic/msi: Enable PCI/IMS
PCI/MSI: Provide pci_ims_alloc/free_irq()
PCI/MSI: Provide IMS (Interrupt Message Store) support
genirq/msi: Provide constants for PCI/IMS support
x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
PCI/MSI: Provide prepare_desc() MSI domain op
PCI/MSI: Split MSI-X descriptor setup
genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
genirq/msi: Provide msi_domain_alloc_irq_at()
genirq/msi: Provide msi_domain_ops:: Prepare_desc()
genirq/msi: Provide msi_desc:: Msi_data
genirq/msi: Provide struct msi_map
x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
...
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/irq/Kconfig | 7 | ||||
-rw-r--r-- | kernel/irq/chip.c | 8 | ||||
-rw-r--r-- | kernel/irq/internals.h | 2 | ||||
-rw-r--r-- | kernel/irq/irqdesc.c | 15 | ||||
-rw-r--r-- | kernel/irq/manage.c | 4 | ||||
-rw-r--r-- | kernel/irq/msi.c | 914 |
6 files changed, 761 insertions, 189 deletions
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index db3d174c53d4..b64c44ae4c25 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -86,15 +86,10 @@ config GENERIC_IRQ_IPI depends on SMP select IRQ_DOMAIN_HIERARCHY -# Generic MSI interrupt support -config GENERIC_MSI_IRQ - bool - # Generic MSI hierarchical interrupt domain support -config GENERIC_MSI_IRQ_DOMAIN +config GENERIC_MSI_IRQ bool select IRQ_DOMAIN_HIERARCHY - select GENERIC_MSI_IRQ config IRQ_MSI_IOMMU bool diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index 8ac37e8e738a..49e7bc871fec 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -1561,10 +1561,10 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) return 0; } -static struct device *irq_get_parent_device(struct irq_data *data) +static struct device *irq_get_pm_device(struct irq_data *data) { if (data->domain) - return data->domain->dev; + return data->domain->pm_dev; return NULL; } @@ -1578,7 +1578,7 @@ static struct device *irq_get_parent_device(struct irq_data *data) */ int irq_chip_pm_get(struct irq_data *data) { - struct device *dev = irq_get_parent_device(data); + struct device *dev = irq_get_pm_device(data); int retval = 0; if (IS_ENABLED(CONFIG_PM) && dev) @@ -1597,7 +1597,7 @@ int irq_chip_pm_get(struct irq_data *data) */ int irq_chip_pm_put(struct irq_data *data) { - struct device *dev = irq_get_parent_device(data); + struct device *dev = irq_get_pm_device(data); int retval = 0; if (IS_ENABLED(CONFIG_PM) && dev) diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h index f09c60393e55..5fdc0b557579 100644 --- a/kernel/irq/internals.h +++ b/kernel/irq/internals.h @@ -52,6 +52,7 @@ enum { * IRQS_PENDING - irq is pending and replayed later * IRQS_SUSPENDED - irq is suspended * IRQS_NMI - irq line is used to deliver NMIs + * IRQS_SYSFS - descriptor has been added to sysfs */ enum { IRQS_AUTODETECT = 0x00000001, @@ -64,6 +65,7 @@ enum { IRQS_SUSPENDED = 0x00000800, IRQS_TIMINGS = 0x00001000, IRQS_NMI = 0x00002000, + IRQS_SYSFS = 0x00004000, }; #include "debug.h" diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index a91f9001103c..fd0996274401 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -288,22 +288,25 @@ static void irq_sysfs_add(int irq, struct irq_desc *desc) if (irq_kobj_base) { /* * Continue even in case of failure as this is nothing - * crucial. + * crucial and failures in the late irq_sysfs_init() + * cannot be rolled back. */ if (kobject_add(&desc->kobj, irq_kobj_base, "%d", irq)) pr_warn("Failed to add kobject for irq %d\n", irq); + else + desc->istate |= IRQS_SYSFS; } } static void irq_sysfs_del(struct irq_desc *desc) { /* - * If irq_sysfs_init() has not yet been invoked (early boot), then - * irq_kobj_base is NULL and the descriptor was never added. - * kobject_del() complains about a object with no parent, so make - * it conditional. + * Only invoke kobject_del() when kobject_add() was successfully + * invoked for the descriptor. This covers both early boot, where + * sysfs is not initialized yet, and the case of a failed + * kobject_add() invocation. */ - if (irq_kobj_base) + if (desc->istate & IRQS_SYSFS) kobject_del(&desc->kobj); } diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 40fe7806cc8c..5b7cf28df290 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -321,7 +321,7 @@ static int irq_try_set_affinity(struct irq_data *data, } static bool irq_set_affinity_deactivated(struct irq_data *data, - const struct cpumask *mask, bool force) + const struct cpumask *mask) { struct irq_desc *desc = irq_data_to_desc(data); @@ -354,7 +354,7 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask, if (!chip || !chip->irq_set_affinity) return -EINVAL; - if (irq_set_affinity_deactivated(data, mask, force)) + if (irq_set_affinity_deactivated(data, mask)) return 0; if (irq_can_move_pcntxt(data) && !irqd_is_setaffinity_pending(data)) { diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index a9ee535293eb..bd4d4dd626b4 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -19,8 +19,31 @@ #include "internals.h" +/** + * struct msi_ctrl - MSI internal management control structure + * @domid: ID of the domain on which management operations should be done + * @first: First (hardware) slot index to operate on + * @last: Last (hardware) slot index to operate on + * @nirqs: The number of Linux interrupts to allocate. Can be larger + * than the range due to PCI/multi-MSI. + */ +struct msi_ctrl { + unsigned int domid; + unsigned int first; + unsigned int last; + unsigned int nirqs; +}; + +/* Invalid Xarray index which is outside of any searchable range */ +#define MSI_XA_MAX_INDEX (ULONG_MAX - 1) +/* The maximum domain size */ +#define MSI_XA_DOMAIN_SIZE (MSI_MAX_INDEX + 1) + +static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl); +static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid); static inline int msi_sysfs_create_group(struct device *dev); + /** * msi_alloc_desc - Allocate an initialized msi_desc * @dev: Pointer to the device for which this is allocated @@ -33,7 +56,7 @@ static inline int msi_sysfs_create_group(struct device *dev); * Return: pointer to allocated &msi_desc on success or %NULL on failure */ static struct msi_desc *msi_alloc_desc(struct device *dev, int nvec, - const struct irq_affinity_desc *affinity) + const struct irq_affinity_desc *affinity) { struct msi_desc *desc = kzalloc(sizeof(*desc), GFP_KERNEL); @@ -58,25 +81,56 @@ static void msi_free_desc(struct msi_desc *desc) kfree(desc); } -static int msi_insert_desc(struct msi_device_data *md, struct msi_desc *desc, unsigned int index) +static int msi_insert_desc(struct device *dev, struct msi_desc *desc, + unsigned int domid, unsigned int index) { + struct msi_device_data *md = dev->msi.data; + struct xarray *xa = &md->__domains[domid].store; + unsigned int hwsize; int ret; - desc->msi_index = index; - ret = xa_insert(&md->__store, index, desc, GFP_KERNEL); - if (ret) - msi_free_desc(desc); + hwsize = msi_domain_get_hwsize(dev, domid); + + if (index == MSI_ANY_INDEX) { + struct xa_limit limit = { .min = 0, .max = hwsize - 1 }; + unsigned int index; + + /* Let the xarray allocate a free index within the limit */ + ret = xa_alloc(xa, &index, desc, limit, GFP_KERNEL); + if (ret) + goto fail; + + desc->msi_index = index; + return 0; + } else { + if (index >= hwsize) { + ret = -ERANGE; + goto fail; + } + + desc->msi_index = index; + ret = xa_insert(xa, index, desc, GFP_KERNEL); + if (ret) + goto fail; + return 0; + } +fail: + msi_free_desc(desc); return ret; } /** - * msi_add_msi_desc - Allocate and initialize a MSI descriptor + * msi_domain_insert_msi_desc - Allocate and initialize a MSI descriptor and + * insert it at @init_desc->msi_index + * * @dev: Pointer to the device for which the descriptor is allocated + * @domid: The id of the interrupt domain to which the desriptor is added * @init_desc: Pointer to an MSI descriptor to initialize the new descriptor * * Return: 0 on success or an appropriate failure code. */ -int msi_add_msi_desc(struct device *dev, struct msi_desc *init_desc) +int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid, + struct msi_desc *init_desc) { struct msi_desc *desc; @@ -88,40 +142,8 @@ int msi_add_msi_desc(struct device *dev, struct msi_desc *init_desc) /* Copy type specific data to the new descriptor. */ desc->pci = init_desc->pci; - return msi_insert_desc(dev->msi.data, desc, init_desc->msi_index); -} -/** - * msi_add_simple_msi_descs - Allocate and initialize MSI descriptors - * @dev: Pointer to the device for which the descriptors are allocated - * @index: Index for the first MSI descriptor - * @ndesc: Number of descriptors to allocate - * - * Return: 0 on success or an appropriate failure code. - */ -static int msi_add_simple_msi_descs(struct device *dev, unsigned int index, unsigned int ndesc) -{ - unsigned int idx, last = index + ndesc - 1; - struct msi_desc *desc; - int ret; - - lockdep_assert_held(&dev->msi.data->mutex); - - for (idx = index; idx <= last; idx++) { - desc = msi_alloc_desc(dev, 1, NULL); - if (!desc) - goto fail_mem; - ret = msi_insert_desc(dev->msi.data, desc, idx); - if (ret) - goto fail; - } - return 0; - -fail_mem: - ret = -ENOMEM; -fail: - msi_free_msi_descs_range(dev, MSI_DESC_NOTASSOCIATED, index, last); - return ret; + return msi_insert_desc(dev, desc, domid, init_desc->msi_index); } static bool msi_desc_match(struct msi_desc *desc, enum msi_desc_filter filter) @@ -138,28 +160,96 @@ static bool msi_desc_match(struct msi_desc *desc, enum msi_desc_filter filter) return false; } +static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl) +{ + unsigned int hwsize; + + if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS || + !dev->msi.data->__domains[ctrl->domid].domain)) + return false; + + hwsize = msi_domain_get_hwsize(dev, ctrl->domid); + if (WARN_ON_ONCE(ctrl->first > ctrl->last || + ctrl->first >= hwsize || + ctrl->last >= hwsize)) + return false; + return true; +} + +static void msi_domain_free_descs(struct device *dev, struct msi_ctrl *ctrl) +{ + struct msi_desc *desc; + struct xarray *xa; + unsigned long idx; + + lockdep_assert_held(&dev->msi.data->mutex); + + if (!msi_ctrl_valid(dev, ctrl)) + return; + + xa = &dev->msi.data->__domains[ctrl->domid].store; + xa_for_each_range(xa, idx, desc, ctrl->first, ctrl->last) { + xa_erase(xa, idx); + + /* Leak the descriptor when it is still referenced */ + if (WARN_ON_ONCE(msi_desc_match(desc, MSI_DESC_ASSOCIATED))) + continue; + msi_free_desc(desc); + } +} + /** - * msi_free_msi_descs_range - Free MSI descriptors of a device - * @dev: Device to free the descriptors - * @filter: Descriptor state filter - * @first_index: Index to start freeing from - * @last_index: Last index to be freed + * msi_domain_free_msi_descs_range - Free a range of MSI descriptors of a device in an irqdomain + * @dev: Device for which to free the descriptors + * @domid: Id of the domain to operate on + * @first: Index to start freeing from (inclusive) + * @last: Last index to be freed (inclusive) */ -void msi_free_msi_descs_range(struct device *dev, enum msi_desc_filter filter, - unsigned int first_index, unsigned int last_index) +void msi_domain_free_msi_descs_range(struct device *dev, unsigned int domid, + unsigned int first, unsigned int last) +{ + struct msi_ctrl ctrl = { + .domid = domid, + .first = first, + .last = last, + }; + + msi_domain_free_descs(dev, &ctrl); +} + +/** + * msi_domain_add_simple_msi_descs - Allocate and initialize MSI descriptors + * @dev: Pointer to the device for which the descriptors are allocated + * @ctrl: Allocation control struct + * + * Return: 0 on success or an appropriate failure code. + */ +static int msi_domain_add_simple_msi_descs(struct device *dev, struct msi_ctrl *ctrl) { - struct xarray *xa = &dev->msi.data->__store; struct msi_desc *desc; - unsigned long idx; + unsigned int idx; + int ret; lockdep_assert_held(&dev->msi.data->mutex); - xa_for_each_range(xa, idx, desc, first_index, last_index) { - if (msi_desc_match(desc, filter)) { - xa_erase(xa, idx); - msi_free_desc(desc); - } + if (!msi_ctrl_valid(dev, ctrl)) + return -EINVAL; + + for (idx = ctrl->first; idx <= ctrl->last; idx++) { + desc = msi_alloc_desc(dev, 1, NULL); + if (!desc) + goto fail_mem; + ret = msi_insert_desc(dev, desc, ctrl->domid, idx); + if (ret) + goto fail; } + return 0; + +fail_mem: + ret = -ENOMEM; +fail: + msi_domain_free_descs(dev, ctrl); + return ret; } void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg) @@ -178,9 +268,13 @@ EXPORT_SYMBOL_GPL(get_cached_msi_msg); static void msi_device_data_release(struct device *dev, void *res) { struct msi_device_data *md = res; + int i; - WARN_ON_ONCE(!xa_empty(&md->__store)); - xa_destroy(&md->__store); + for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++) { + msi_remove_device_irq_domain(dev, i); + WARN_ON_ONCE(!xa_empty(&md->__domains[i].store)); + xa_destroy(&md->__domains[i].store); + } dev->msi.data = NULL; } @@ -197,7 +291,7 @@ static void msi_device_data_release(struct device *dev, void *res) int msi_setup_device_data(struct device *dev) { struct msi_device_data *md; - int ret; + int ret, i; if (dev->msi.data) return 0; @@ -212,7 +306,18 @@ int msi_setup_device_data(struct device *dev) return ret; } - xa_init(&md->__store); + for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++) + xa_init_flags(&md->__domains[i].store, XA_FLAGS_ALLOC); + + /* + * If @dev::msi::domain is set and is a global MSI domain, copy the + * pointer into the domain array so all code can operate on domain + * ids. The NULL pointer check is required to keep the legacy + * architecture specific PCI/MSI support working. + */ + if (dev->msi.domain && !irq_domain_is_msi_parent(dev->msi.domain)) + md->__domains[MSI_DEFAULT_DOMAIN].domain = dev->msi.domain; + mutex_init(&md->mutex); dev->msi.data = md; devres_add(dev, md); @@ -235,27 +340,30 @@ EXPORT_SYMBOL_GPL(msi_lock_descs); */ void msi_unlock_descs(struct device *dev) { - /* Invalidate the index wich was cached by the iterator */ - dev->msi.data->__iter_idx = MSI_MAX_INDEX; + /* Invalidate the index which was cached by the iterator */ + dev->msi.data->__iter_idx = MSI_XA_MAX_INDEX; mutex_unlock(&dev->msi.data->mutex); } EXPORT_SYMBOL_GPL(msi_unlock_descs); -static struct msi_desc *msi_find_desc(struct msi_device_data *md, enum msi_desc_filter filter) +static struct msi_desc *msi_find_desc(struct msi_device_data *md, unsigned int domid, + enum msi_desc_filter filter) { + struct xarray *xa = &md->__domains[domid].store; struct msi_desc *desc; - xa_for_each_start(&md->__store, md->__iter_idx, desc, md->__iter_idx) { + xa_for_each_start(xa, md->__iter_idx, desc, md->__iter_idx) { if (msi_desc_match(desc, filter)) return desc; } - md->__iter_idx = MSI_MAX_INDEX; + md->__iter_idx = MSI_XA_MAX_INDEX; return NULL; } /** - * msi_first_desc - Get the first MSI descriptor of a device + * msi_domain_first_desc - Get the first MSI descriptor of an irqdomain associated to a device * @dev: Device to operate on + * @domid: The id of the interrupt domain which should be walked. * @filter: Descriptor state filter * * Must be called with the MSI descriptor mutex held, i.e. msi_lock_descs() @@ -264,23 +372,26 @@ static struct msi_desc *msi_find_desc(struct msi_device_data *md, enum msi_desc_ * Return: Pointer to the first MSI descriptor matching the search * criteria, NULL if none found. */ -struct msi_desc *msi_first_desc(struct device *dev, enum msi_desc_filter filter) +struct msi_desc *msi_domain_first_desc(struct device *dev, unsigned int domid, + enum msi_desc_filter filter) { struct msi_device_data *md = dev->msi.data; - if (WARN_ON_ONCE(!md)) + if (WARN_ON_ONCE(!md || domid >= MSI_MAX_DEVICE_IRQDOMAINS)) return NULL; lockdep_assert_held(&md->mutex); md->__iter_idx = 0; - return msi_find_desc(md, filter); + return msi_find_desc(md, domid, filter); } -EXPORT_SYMBOL_GPL(msi_first_desc); +EXPORT_SYMBOL_GPL(msi_domain_first_desc); /** * msi_next_desc - Get the next MSI descriptor of a device * @dev: Device to operate on + * @domid: The id of the interrupt domain which should be walked. + * @filter: Descriptor state filter * * The first invocation of msi_next_desc() has to be preceeded by a * successful invocation of __msi_first_desc(). Consecutive invocations are @@ -290,11 +401,12 @@ EXPORT_SYMBOL_GPL(msi_first_desc); * Return: Pointer to the next MSI descriptor matching the search * criteria, NULL if none found. */ -struct msi_desc *msi_next_desc(struct device *dev, enum msi_desc_filter filter) +struct msi_desc *msi_next_desc(struct device *dev, unsigned int domid, + enum msi_desc_filter filter) { struct msi_device_data *md = dev->msi.data; - if (WARN_ON_ONCE(!md)) + if (WARN_ON_ONCE(!md || domid >= MSI_MAX_DEVICE_IRQDOMAINS)) return NULL; lockdep_assert_held(&md->mutex); @@ -303,30 +415,38 @@ struct msi_desc *msi_next_desc(struct device *dev, enum msi_desc_filter filter) return NULL; md->__iter_idx++; - return msi_find_desc(md, filter); + return msi_find_desc(md, domid, filter); } EXPORT_SYMBOL_GPL(msi_next_desc); /** - * msi_get_virq - Return Linux interrupt number of a MSI interrupt + * msi_domain_get_virq - Lookup the Linux interrupt number for a MSI index on a interrupt domain * @dev: Device to operate on + * @domid: Domain ID of the interrupt domain associated to the device * @index: MSI interrupt index to look for (0-based) * * Return: The Linux interrupt number on success (> 0), 0 if not found */ -unsigned int msi_get_virq(struct device *dev, unsigned int index) +unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigned int index) { struct msi_desc *desc; unsigned int ret = 0; - bool pcimsi; + bool pcimsi = false; + struct xarray *xa; if (!dev->msi.data) return 0; - pcimsi = dev_is_pci(dev) ? to_pci_dev(dev)->msi_enabled : false; + if (WARN_ON_ONCE(index > MSI_MAX_INDEX || domid >= MSI_MAX_DEVICE_IRQDOMAINS)) + return 0; + + /* This check is only valid for the PCI default MSI domain */ + if (dev_is_pci(dev) && domid == MSI_DEFAULT_DOMAIN) + pcimsi = to_pci_dev(dev)->msi_enabled; msi_lock_descs(dev); - desc = xa_load(&dev->msi.data->__store, pcimsi ? 0 : index); + xa = &dev->msi.data->__domains[domid].store; + desc = xa_load(xa, pcimsi ? 0 : index); if (desc && desc->irq) { /* * PCI-MSI has only one descriptor for multiple interrupts. @@ -340,10 +460,11 @@ unsigned int msi_get_virq(struct device *dev, unsigned int index) ret = desc->irq; } } + msi_unlock_descs(dev); return ret; } -EXPORT_SYMBOL_GPL(msi_get_virq); +EXPORT_SYMBOL_GPL(msi_domain_get_virq); #ifdef CONFIG_SYSFS static struct attribute *msi_dev_attrs[] = { @@ -459,7 +580,39 @@ static inline int msi_sysfs_populate_desc(struct device *dev, struct msi_desc *d static inline void msi_sysfs_remove_desc(struct device *dev, struct msi_desc *desc) { } #endif /* !CONFIG_SYSFS */ -#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN +static struct irq_domain *msi_get_device_domain(struct device *dev, unsigned int domid) +{ + struct irq_domain *domain; + + lockdep_assert_held(&dev->msi.data->mutex); + + if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS)) + return NULL; + + domain = dev->msi.data->__domains[domid].domain; + if (!domain) + return NULL; + + if (WARN_ON_ONCE(irq_domain_is_msi_parent(domain))) + return NULL; + + return domain; +} + +static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid) +{ + struct msi_domain_info *info; + struct irq_domain *domain; + + domain = msi_get_device_domain(dev, domid); + if (domain) { + info = domain->host_data; + return info->hwsize; + } + /* No domain, no size... */ + return 0; +} + static inline void irq_chip_write_msi_msg(struct irq_data *data, struct msi_msg *msg) { @@ -613,21 +766,11 @@ static int msi_domain_ops_init(struct irq_domain *domain, return 0; } -static int msi_domain_ops_check(struct irq_domain *domain, - struct msi_domain_info *info, - struct device *dev) -{ - return 0; -} - static struct msi_domain_ops msi_domain_ops_default = { .get_hwirq = msi_domain_ops_get_hwirq, .msi_init = msi_domain_ops_init, - .msi_check = msi_domain_ops_check, .msi_prepare = msi_domain_ops_prepare, .set_desc = msi_domain_ops_set_desc, - .domain_alloc_irqs = __msi_domain_alloc_irqs, - .domain_free_irqs = __msi_domain_free_irqs, }; static void msi_domain_update_dom_ops(struct msi_domain_info *info) @@ -639,11 +782,6 @@ static void msi_domain_update_dom_ops(struct msi_domain_info *info) return; } - if (ops->domain_alloc_irqs == NULL) - ops->domain_alloc_irqs = msi_domain_ops_default.domain_alloc_irqs; - if (ops->domain_free_irqs == NULL) - ops->domain_free_irqs = msi_domain_ops_default.domain_free_irqs; - if (!(info->flags & MSI_FLAG_USE_DEF_DOM_OPS)) return; @@ -651,8 +789,6 @@ static void msi_domain_update_dom_ops(struct msi_domain_info *info) ops->get_hwirq = msi_domain_ops_default.get_hwirq; if (ops->msi_init == NULL) ops->msi_init = msi_domain_ops_default.msi_init; - if (ops->msi_check == NULL) - ops->msi_check = msi_domain_ops_default.msi_check; if (ops->msi_prepare == NULL) ops->msi_prepare = msi_domain_ops_default.msi_prepare; if (ops->set_desc == NULL) @@ -668,6 +804,40 @@ static void msi_domain_update_chip_ops(struct msi_domain_info *info) chip->irq_set_affinity = msi_domain_set_affinity; } +static struct irq_domain *__msi_create_irq_domain(struct fwnode_handle *fwnode, + struct msi_domain_info *info, + unsigned int flags, + struct irq_domain *parent) +{ + struct irq_domain *domain; + + if (info->hwsize > MSI_XA_DOMAIN_SIZE) + return NULL; + + /* + * Hardware size 0 is valid for backwards compatibility and for + * domains which are not backed by a hardware table. Grant the + * maximum index space. + */ + if (!info->hwsize) + info->hwsize = MSI_XA_DOMAIN_SIZE; + + msi_domain_update_dom_ops(info); + if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS) + msi_domain_update_chip_ops(info); + + domain = irq_domain_create_hierarchy(parent, flags | IRQ_DOMAIN_FLAG_MSI, 0, + fwnode, &msi_domain_ops, info); + + if (domain) { + if (!domain->name && info->chip) + domain->name = info->chip->name; + irq_domain_update_bus_token(domain, info->bus_token); + } + + return domain; +} + /** * msi_create_irq_domain - Create an MSI interrupt domain * @fwnode: Optional fwnode of the interrupt controller @@ -680,19 +850,210 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode, struct msi_domain_info *info, struct irq_domain *parent) { + return __msi_create_irq_domain(fwnode, info, 0, parent); +} + +/** + * msi_parent_init_dev_msi_info - Delegate initialization of device MSI info down + * in the domain hierarchy + * @dev: The device for which the domain should be created + * @domain: The domain in the hierarchy this op is being called on + * @msi_parent_domain: The IRQ_DOMAIN_FLAG_MSI_PARENT domain for the child to + * be created + * @msi_child_info: The MSI domain info of the IRQ_DOMAIN_FLAG_MSI_DEVICE + * domain to be created + * + * Return: true on success, false otherwise + * + * This is the most complex problem of per device MSI domains and the + * underlying interrupt domain hierarchy: + * + * The device domain to be initialized requests the broadest feature set + * possible and the underlying domain hierarchy puts restrictions on it. + * + * That's trivial for a simple parent->child relationship, but it gets + * interesting with an intermediate domain: root->parent->child. The + * intermediate 'parent' can expand the capabilities which the 'root' + * domain is providing. So that creates a classic hen and egg problem: + * Which entity is doing the restrictions/expansions? + * + * One solution is to let the root domain handle the initialization that's + * why there is the @domain and the @msi_parent_domain pointer. + */ +bool msi_parent_init_dev_msi_info(struct device *dev, struct irq_domain *domain, + struct irq_domain *msi_parent_domain, + struct msi_domain_info *msi_child_info) +{ + struct irq_domain *parent = domain->parent; + + if (WARN_ON_ONCE(!parent || !parent->msi_parent_ops || + !parent->msi_parent_ops->init_dev_msi_info)) + return false; + + return parent->msi_parent_ops->init_dev_msi_info(dev, parent, msi_parent_domain, + msi_child_info); +} + +/** + * msi_create_device_irq_domain - Create a device MSI interrupt domain + * @dev: Pointer to the device + * @domid: Domain id + * @template: MSI domain info bundle used as template + * @hwsize: Maximum number of MSI table entries (0 if unknown or unlimited) + * @domain_data: Optional pointer to domain specific data which is set in + * msi_domain_info::data + * @chip_data: Optional pointer to chip specific data which is set in + * msi_domain_info::chip_data + * + * Return: True on success, false otherwise + * + * There is no firmware node required for this interface because the per + * device domains are software constructs which are actually closer to the + * hardware reality than any firmware can describe them. + * + * The domain name and the irq chip name for a MSI device domain are + * composed by: "$(PREFIX)$(CHIPNAME)-$(DEVNAME)" + * + * $PREFIX: Optional prefix provided by the underlying MSI parent domain + * via msi_parent_ops::prefix. If that pointer is NULL the prefix + * is empty. + * $CHIPNAME: The name of the irq_chip in @template + * $DEVNAME: The name of the device + * + * This results in understandable chip names and hardware interrupt numbers + * in e.g. /proc/interrupts + * + * PCI-MSI-0000:00:1c.0 0-edge Parent domain has no prefix + * IR-PCI-MSI-0000:00:1c.4 0-edge Same with interrupt remapping prefix 'IR-' + * + * IR-PCI-MSIX-0000:3d:00.0 0-edge Hardware interrupt numbers reflect + * IR-PCI-MSIX-0000:3d:00.0 1-edge the real MSI-X index on that device + * IR-PCI-MSIX-0000:3d:00.0 2-edge + * + * On IMS domains the hardware interrupt number is either a table entry + * index or a purely software managed index but it is guaranteed to be + * unique. + * + * The domain pointer is stored in @dev::msi::data::__irqdomains[]. All + * subsequent operations on the domain depend on the domain id. + * + * The domain is automatically freed when the device is removed via devres + * in the context of @dev::msi::data freeing, but it can also be + * independently removed via @msi_remove_device_irq_domain(). + */ +bool msi_create_device_irq_domain(struct device *dev, unsigned int domid, + const struct msi_domain_template *template, + unsigned int hwsize, void *domain_data, + void *chip_data) +{ + struct irq_domain *domain, *parent = dev->msi.domain; + const struct msi_parent_ops *pops; + struct msi_domain_template *bundle; + struct fwnode_handle *fwnode; + + if (!irq_domain_is_msi_parent(parent)) + return false; + + if (domid >= MSI_MAX_DEVICE_IRQDOMAINS) + return false; + + bundle = kmemdup(template, sizeof(*bundle), GFP_KERNEL); + if (!bundle) + return false; + + bundle->info.hwsize = hwsize; + bundle->info.chip = &bundle->chip; + bundle->info.ops = &bundle->ops; + bundle->info.data = domain_data; + bundle->info.chip_data = chip_data; + + pops = parent->msi_parent_ops; + snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s", + pops->prefix ? : "", bundle->chip.name, dev_name(dev)); + bundle->chip.name = bundle->name; + + fwnode = irq_domain_alloc_named_fwnode(bundle->name); + if (!fwnode) + goto free_bundle; + + if (msi_setup_device_data(dev)) + goto free_fwnode; + + msi_lock_descs(dev); + + if (WARN_ON_ONCE(msi_get_device_domain(dev, domid))) + goto fail; + + if (!pops->init_dev_msi_info(dev, parent, parent, &bundle->info)) + goto fail; + + domain = __msi_create_irq_domain(fwnode, &bundle->info, IRQ_DOMAIN_FLAG_MSI_DEVICE, parent); + if (!domain) + goto fail; + + domain->dev = dev; + dev->msi.data->__domains[domid].domain = domain; + msi_unlock_descs(dev); + return true; + +fail: + msi_unlock_descs(dev); +free_fwnode: + kfree(fwnode); +free_bundle: + kfree(bundle); + return false; +} + +/** + * msi_remove_device_irq_domain - Free a device MSI interrupt domain + * @dev: Pointer to the device + * @domid: Domain id + */ +void msi_remove_device_irq_domain(struct device *dev, unsigned int domid) +{ + struct msi_domain_info *info; struct irq_domain *domain; - msi_domain_update_dom_ops(info); - if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS) - msi_domain_update_chip_ops(info); + msi_lock_descs(dev); - domain = irq_domain_create_hierarchy(parent, IRQ_DOMAIN_FLAG_MSI, 0, - fwnode, &msi_domain_ops, info); + domain = msi_get_device_domain(dev, domid); - if (domain && !domain->name && info->chip) - domain->name = info->chip->name; + if (!domain || !irq_domain_is_msi_device(domain)) + goto unlock; - return domain; + dev->msi.data->__domains[domid].domain = NULL; + info = domain->host_data; + irq_domain_remove(domain); + kfree(container_of(info, struct msi_domain_template, info)); + +unlock: + msi_unlock_descs(dev); +} + +/** + * msi_match_device_irq_domain - Match a device irq domain against a bus token + * @dev: Pointer to the device + * @domid: Domain id + * @bus_token: Bus token to match against the domain bus token + * + * Return: True if device domain exists and bus tokens match. + */ +bool msi_match_device_irq_domain(struct device *dev, unsigned int domid, + enum irq_domain_bus_token bus_token) +{ + struct msi_domain_info *info; + struct irq_domain *domain; + bool ret = false; + + msi_lock_descs(dev); + domain = msi_get_device_domain(dev, domid); + if (domain && irq_domain_is_msi_device(domain)) { + info = domain->host_data; + ret = info->bus_token == bus_token; + } + msi_unlock_descs(dev); + return ret; } int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev, @@ -700,13 +1061,8 @@ int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev, { struct msi_domain_info *info = domain->host_data; struct msi_domain_ops *ops = info->ops; - int ret; - ret = ops->msi_check(domain, info, dev); - if (ret == 0) - ret = ops->msi_prepare(domain, dev, nvec, arg); - - return ret; + return ops->msi_prepare(domain, dev, nvec, arg); } int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev, @@ -714,16 +1070,27 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev, { struct msi_domain_info *info = domain->host_data; struct msi_domain_ops *ops = info->ops; + struct msi_ctrl ctrl = { + .domid = MSI_DEFAULT_DOMAIN, + .first = virq_base, + .last = virq_base + nvec - 1, + }; struct msi_desc *desc; + struct xarray *xa; int ret, virq; + if (!msi_ctrl_valid(dev, &ctrl)) + return -EINVAL; + msi_lock_descs(dev); - ret = msi_add_simple_msi_descs(dev, virq_base, nvec); + ret = msi_domain_add_simple_msi_descs(dev, &ctrl); if (ret) goto unlock; + xa = &dev->msi.data->__domains[ctrl.domid].store; + for (virq = virq_base; virq < virq_base + nvec; virq++) { - desc = xa_load(&dev->msi.data->__store, virq); + desc = xa_load(xa, virq); desc->irq = virq; ops->set_desc(arg, desc); @@ -739,7 +1106,7 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev, fail: for (--virq; virq >= virq_base; virq--) irq_domain_free_irqs_common(domain, virq, 1); - msi_free_msi_descs_range(dev, MSI_DESC_ALL, virq_base, virq_base + nvec - 1); + msi_domain_free_descs(dev, &ctrl); unlock: msi_unlock_descs(dev); return ret; @@ -764,6 +1131,8 @@ static bool msi_check_reservation_mode(struct irq_domain *domain, switch(domain->bus_token) { case DOMAIN_BUS_PCI_MSI: + case DOMAIN_BUS_PCI_DEVICE_MSI: + case DOMAIN_BUS_PCI_DEVICE_MSIX: case DOMAIN_BUS_VMD_MSI: break; default: @@ -789,6 +1158,8 @@ static int msi_handle_pci_fail(struct irq_domain *domain, struct msi_desc *desc, { switch(domain->bus_token) { case DOMAIN_BUS_PCI_MSI: + case DOMAIN_BUS_PCI_DEVICE_MSI: + case DOMAIN_BUS_PCI_DEVICE_MSIX: case DOMAIN_BUS_VMD_MSI: if (IS_ENABLED(CONFIG_PCI_MSI)) break; @@ -850,18 +1221,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag return 0; } -int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, - int nvec) +static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain, + struct msi_ctrl *ctrl) { + struct xarray *xa = &dev->msi.data->__domains[ctrl->domid].store; struct msi_domain_info *info = domain->host_data; struct msi_domain_ops *ops = info->ops; + unsigned int vflags = 0, allocated = 0; msi_alloc_info_t arg = { }; - unsigned int vflags = 0; struct msi_desc *desc; - int allocated = 0; + unsigned long idx; int i, ret, virq; - ret = msi_domain_prepare_irqs(domain, dev, nvec, &arg); + ret = msi_domain_prepare_irqs(domain, dev, ctrl->nirqs, &arg); if (ret) return ret; @@ -883,11 +1255,21 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, * MSI affinity setting requires a special quirk (X86) when * reservation mode is active. */ - if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK) + if (info->flags & MSI_FLAG_NOMASK_QUIRK) vflags |= VIRQ_NOMASK_QUIRK; } - msi_for_each_desc(desc, dev, MSI_DESC_NOTASSOCIATED) { + xa_for_each_range(xa, idx, desc, ctrl->first, ctrl->last) { + if (!msi_desc_match(desc, MSI_DESC_NOTASSOCIATED)) + continue; + + /* This should return -ECONFUSED... */ + if (WARN_ON_ONCE(allocated >= ctrl->nirqs)) + return -EINVAL; + + if (ops->prepare_desc) + ops->prepare_desc(domain, &arg, desc); + ops->set_desc(&arg, desc); virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used, @@ -913,76 +1295,213 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, return 0; } -static int msi_domain_add_simple_msi_descs(struct msi_domain_info *info, - struct device *dev, - unsigned int num_descs) +static int msi_domain_alloc_simple_msi_descs(struct device *dev, + struct msi_domain_info *info, + struct msi_ctrl *ctrl) { if (!(info->flags & MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS)) return 0; - return msi_add_simple_msi_descs(dev, 0, num_descs); + return msi_domain_add_simple_msi_descs(dev, ctrl); +} + +static int __msi_domain_alloc_locked(struct device *dev, struct msi_ctrl *ctrl) +{ + struct msi_domain_info *info; + struct msi_domain_ops *ops; + struct irq_domain *domain; + int ret; + + if (!msi_ctrl_valid(dev, ctrl)) + return -EINVAL; + + domain = msi_get_device_domain(dev, ctrl->domid); + if (!domain) + return -ENODEV; + + info = domain->host_data; + + ret = msi_domain_alloc_simple_msi_descs(dev, info, ctrl); + if (ret) + return ret; + + ops = info->ops; + if (ops->domain_alloc_irqs) + return ops->domain_alloc_irqs(domain, dev, ctrl->nirqs); + + return __msi_domain_alloc_irqs(dev, domain, ctrl); +} + +static int msi_domain_alloc_locked(struct device *dev, struct msi_ctrl *ctrl) +{ + int ret = __msi_domain_alloc_locked(dev, ctrl); + + if (ret) + msi_domain_free_locked(dev, ctrl); + return ret; } /** - * msi_domain_alloc_irqs_descs_locked - Allocate interrupts from a MSI interrupt domain - * @domain: The domain to allocate from + * msi_domain_alloc_irqs_range_locked - Allocate interrupts from a MSI interrupt domain * @dev: Pointer to device struct of the device for which the interrupts * are allocated - * @nvec: The number of interrupts to allocate + * @domid: Id of the interrupt domain to operate on + * @first: First index to allocate (inclusive) + * @last: Last index to allocate (inclusive) * * Must be invoked from within a msi_lock_descs() / msi_unlock_descs() - * pair. Use this for MSI irqdomains which implement their own vector + * pair. Use this for MSI irqdomains which implement their own descriptor * allocation/free. * * Return: %0 on success or an error code. */ -int msi_domain_alloc_irqs_descs_locked(struct irq_domain *domain, struct device *dev, - int nvec) +int msi_domain_alloc_irqs_range_locked(struct device *dev, unsigned int domid, + unsigned int first, unsigned int last) { - struct msi_domain_info *info = domain->host_data; - struct msi_domain_ops *ops = info->ops; - int ret; - - lockdep_assert_held(&dev->msi.data->mutex); + struct msi_ctrl ctrl = { + .domid = domid, + .first = first, + .last = last, + .nirqs = last + 1 - first, + }; + + return msi_domain_alloc_locked(dev, &ctrl); +} - ret = msi_domain_add_simple_msi_descs(info, dev, nvec); - if (ret) - return ret; +/** + * msi_domain_alloc_irqs_range - Allocate interrupts from a MSI interrupt domain + * @dev: Pointer to device struct of the device for which the interrupts + * are allocated + * @domid: Id of the interrupt domain to operate on + * @first: First index to allocate (inclusive) + * @last: Last index to allocate (inclusive) + * + * Return: %0 on success or an error code. + */ +int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid, + unsigned int first, unsigned int last) +{ + int ret; - ret = ops->domain_alloc_irqs(domain, dev, nvec); - if (ret) - msi_domain_free_irqs_descs_locked(domain, dev); + msi_lock_descs(dev); + ret = msi_domain_alloc_irqs_range_locked(dev, domid, first, last); + msi_unlock_descs(dev); return ret; } /** - * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain - * @domain: The domain to allocate from + * msi_domain_alloc_irqs_all_locked - Allocate all interrupts from a MSI interrupt domain + * * @dev: Pointer to device struct of the device for which the interrupts * are allocated - * @nvec: The number of interrupts to allocate + * @domid: Id of the interrupt domain to operate on + * @nirqs: The number of interrupts to allocate + * + * This function scans all MSI descriptors of the MSI domain and allocates interrupts + * for all unassigned ones. That function is to be used for MSI domain usage where + * the descriptor allocation is handled at the call site, e.g. PCI/MSI[X]. * * Return: %0 on success or an error code. */ -int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec) +int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int nirqs) { + struct msi_ctrl ctrl = { + .domid = domid, + .first = 0, + .last = msi_domain_get_hwsize(dev, domid) - 1, + .nirqs = nirqs, + }; + + return msi_domain_alloc_locked(dev, &ctrl); +} + +/** + * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at + * a given index - or at the next free index + * + * @dev: Pointer to device struct of the device for which the interrupts + * are allocated + * @domid: Id of the interrupt domain to operate on + * @index: Index for allocation. If @index == %MSI_ANY_INDEX the allocation + * uses the next free index. + * @affdesc: Optional pointer to an interrupt affinity descriptor structure + * @icookie: Optional pointer to a domain specific per instance cookie. If + * non-NULL the content of the cookie is stored in msi_desc::data. + * Must be NULL for MSI-X allocations + * + * This requires a MSI interrupt domain which lets the core code manage the + * MSI descriptors. + * + * Return: struct msi_map + * + * On success msi_map::index contains the allocated index number and + * msi_map::virq the corresponding Linux interrupt number + * + * On failure msi_map::index contains the error code and msi_map::virq + * is %0. + */ +struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index, + const struct irq_affinity_desc *affdesc, + union msi_instance_cookie *icookie) +{ + struct msi_ctrl ctrl = { .domid = domid, .nirqs = 1, }; + struct irq_domain *domain; + struct msi_map map = { }; + struct msi_desc *desc; int ret; msi_lock_descs(dev); - ret = msi_domain_alloc_irqs_descs_locked(domain, dev, nvec); + domain = msi_get_device_domain(dev, domid); + if (!domain) { + map.index = -ENODEV; + goto unlock; + } + + desc = msi_alloc_desc(dev, 1, affdesc); + if (!desc) { + map.index = -ENOMEM; + goto unlock; + } + + if (icookie) + desc->data.icookie = *icookie; + + ret = msi_insert_desc(dev, desc, domid, index); + if (ret) { + map.index = ret; + goto unlock; + } + + ctrl.first = ctrl.last = desc->msi_index; + + ret = __msi_domain_alloc_irqs(dev, domain, &ctrl); + if (ret) { + map.index = ret; + msi_domain_free_locked(dev, &ctrl); + } else { + map.index = desc->msi_index; + map.virq = desc->irq; + } +unlock: msi_unlock_descs(dev); - return ret; + return map; } -void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) +static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain, + struct msi_ctrl *ctrl) { + struct xarray *xa = &dev->msi.data->__domains[ctrl->domid].store; struct msi_domain_info *info = domain->host_data; struct irq_data *irqd; struct msi_desc *desc; + unsigned long idx; int i; - /* Only handle MSI entries which have an interrupt associated */ - msi_for_each_desc(desc, dev, MSI_DESC_ASSOCIATED) { + xa_for_each_range(xa, idx, desc, ctrl->first, ctrl->last) { + /* Only handle MSI entries which have an interrupt associated */ + if (!msi_desc_match(desc, MSI_DESC_ASSOCIATED)) + continue; + /* Make sure all interrupts are deactivated */ for (i = 0; i < desc->nvec_used; i++) { irqd = irq_domain_get_irq_data(domain, desc->irq + i); @@ -997,44 +1516,99 @@ void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) } } -static void msi_domain_free_msi_descs(struct msi_domain_info *info, - struct device *dev) +static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl) { + struct msi_domain_info *info; + struct msi_domain_ops *ops; + struct irq_domain *domain; + + if (!msi_ctrl_valid(dev, ctrl)) + return; + + domain = msi_get_device_domain(dev, ctrl->domid); + if (!domain) + return; + + info = domain->host_data; + ops = info->ops; + + if (ops->domain_free_irqs) + ops->domain_free_irqs(domain, dev); + else + __msi_domain_free_irqs(dev, domain, ctrl); + + if (ops->msi_post_free) + ops->msi_post_free(domain, dev); + if (info->flags & MSI_FLAG_FREE_MSI_DESCS) - msi_free_msi_descs(dev); + msi_domain_free_descs(dev, ctrl); } /** - * msi_domain_free_irqs_descs_locked - Free interrupts from a MSI interrupt @domain associated to @dev - * @domain: The domain to managing the interrupts + * msi_domain_free_irqs_range_locked - Free a range of interrupts from a MSI interrupt domain + * associated to @dev with msi_lock held * @dev: Pointer to device struct of the device for which the interrupts - * are free + * are freed + * @domid: Id of the interrupt domain to operate on + * @first: First index to free (inclusive) + * @last: Last index to free (inclusive) + */ +void msi_domain_free_irqs_range_locked(struct device *dev, unsigned int domid, + unsigned int first, unsigned int last) +{ + struct msi_ctrl ctrl = { + .domid = domid, + .first = first, + .last = last, + }; + msi_domain_free_locked(dev, &ctrl); +} + +/** + * msi_domain_free_irqs_range - Free a range of interrupts from a MSI interrupt domain + * associated to @dev + * @dev: Pointer to device struct of the device for which the interrupts + * are freed + * @domid: Id of the interrupt domain to operate on + * @first: First index to free (inclusive) + * @last: Last index to free (inclusive) + */ +void msi_domain_free_irqs_range(struct device *dev, unsigned int domid, + unsigned int first, unsigned int last) +{ + msi_lock_descs(dev); + msi_domain_free_irqs_range_locked(dev, domid, first, last); + msi_unlock_descs(dev); +} + +/** + * msi_domain_free_irqs_all_locked - Free all interrupts from a MSI interrupt domain + * associated to a device + * @dev: Pointer to device struct of the device for which the interrupts + * are freed + * @domid: The id of the domain to operate on * * Must be invoked from within a msi_lock_descs() / msi_unlock_descs() * pair. Use this for MSI irqdomains which implement their own vector * allocation. */ -void msi_domain_free_irqs_descs_locked(struct irq_domain *domain, struct device *dev) +void msi_domain_free_irqs_all_locked(struct device *dev, unsigned int domid) { - struct msi_domain_info *info = domain->host_data; - struct msi_domain_ops *ops = info->ops; - - lockdep_assert_held(&dev->msi.data->mutex); - - ops->domain_free_irqs(domain, dev); - msi_domain_free_msi_descs(info, dev); + msi_domain_free_irqs_range_locked(dev, domid, 0, + msi_domain_get_hwsize(dev, domid) - 1); } /** - * msi_domain_free_irqs - Free interrupts from a MSI interrupt @domain associated to @dev - * @domain: The domain to managing the interrupts + * msi_domain_free_irqs_all - Free all interrupts from a MSI interrupt domain + * associated to a device * @dev: Pointer to device struct of the device for which the interrupts - * are free + * are freed + * @domid: The id of the domain to operate on */ -void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) +void msi_domain_free_irqs_all(struct device *dev, unsigned int domid) { msi_lock_descs(dev); - msi_domain_free_irqs_descs_locked(domain, dev); + msi_domain_free_irqs_all_locked(dev, domid); msi_unlock_descs(dev); } @@ -1048,5 +1622,3 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain) { return (struct msi_domain_info *)domain->host_data; } - -#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */ |