From 36a1c2ee50f573972aea3c3019555f47ee0094c0 Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Fri, 17 Nov 2023 13:18:48 -0700 Subject: cxl/hdm: Fix a benign lockdep splat The new helper "cxl_num_decoders_committed()" added a lockdep assertion to validate that port->commit_end is protected against modification. That assertion fires in init_hdm_decoder() where it is initializing port->commit_end. Given that it is both accessing and writing that property it obstensibly needs the lock. In practice, CXL decoder commit rules (must commit in order) and the in-order discovery of device decoders makes the manipulation of ->commit_end in init_hdm_decoder() safe. However, rather than rely on the subtle rules of CXL hardware, just make the implementation obviously correct from a software perspective. The Fixes: tag is only for cleaning up a lockdep splat, there is no functional issue addressed by this fix. Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Signed-off-by: Dave Jiang Link: https://lore.kernel.org/r/170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3 Acked-by: Davidlohr Bueso Signed-off-by: Dan Williams --- drivers/cxl/core/hdm.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers') diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 1cc9be85ba4c..529baa8a1759 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -839,6 +839,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, cxld->target_type = CXL_DECODER_HOSTONLYMEM; else cxld->target_type = CXL_DECODER_DEVMEM; + + guard(rwsem_write)(&cxl_region_rwsem); if (cxld->id != cxl_num_decoders_committed(port)) { dev_warn(&port->dev, "decoder%d.%d: Committed out of order\n", -- cgit v1.2.3 From 5558b92e8d39e18aa19619be2ee37274e9592528 Mon Sep 17 00:00:00 2001 From: Alison Schofield Date: Sun, 26 Nov 2023 16:09:29 -0800 Subject: cxl/core: Always hold region_rwsem while reading poison lists A read of a device poison list is triggered via a sysfs attribute and the results are logged as kernel trace events of type cxl_poison. The work is managed by either: a) the region driver when one of more regions map the device, or by b) the memdev driver when no regions map the device. In the case of a) the region driver holds the region_rwsem while reading the poison by committed endpoint decoder mappings and for any unmapped resources. This makes sure that the cxl_poison trace event trace reports valid region info. (Region name, HPA, and UUID). In the case of b) the memdev driver holds the dpa_rwsem preventing new DPA resources from being attached to a region. However, it leaves a gap between region attach and decoder commit actions. If a DPA in the gap is in the poison list, the cxl_poison trace event will omit the region info. Close the gap by holding the region_rwsem and the dpa_rwsem when reading poison per memdev. Since both methods now hold both locks, down_read both from the caller. Doing so also addresses the lockdep assert that found this issue: Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event") Signed-off-by: Alison Schofield Reviewed-by: Davidlohr Bueso Reviewed-by: Dave Jiang Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/memdev.c | 9 ++++++++- drivers/cxl/core/region.c | 5 ----- 2 files changed, 8 insertions(+), 6 deletions(-) (limited to 'drivers') diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index fc5c2b414793..5ad1b13e780a 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -227,10 +227,16 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd) if (!port || !is_cxl_endpoint(port)) return -EINVAL; - rc = down_read_interruptible(&cxl_dpa_rwsem); + rc = down_read_interruptible(&cxl_region_rwsem); if (rc) return rc; + rc = down_read_interruptible(&cxl_dpa_rwsem); + if (rc) { + up_read(&cxl_region_rwsem); + return rc; + } + if (cxl_num_decoders_committed(port) == 0) { /* No regions mapped to this memdev */ rc = cxl_get_poison_by_memdev(cxlmd); @@ -239,6 +245,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd) rc = cxl_get_poison_by_endpoint(port); } up_read(&cxl_dpa_rwsem); + up_read(&cxl_region_rwsem); return rc; } diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 56e575c79bb4..3e817a6f94c6 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2467,10 +2467,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port) struct cxl_poison_context ctx; int rc = 0; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) - return rc; - ctx = (struct cxl_poison_context) { .port = port }; @@ -2480,7 +2476,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port) rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev), &ctx); - up_read(&cxl_region_rwsem); return rc; } -- cgit v1.2.3 From 0e33ac9c3ffe5e4f55c68345f44cea7fec2fe750 Mon Sep 17 00:00:00 2001 From: Alison Schofield Date: Sun, 26 Nov 2023 16:09:30 -0800 Subject: cxl/memdev: Hold region_rwsem during inject and clear poison ops Poison inject and clear are supported via debugfs where a privileged user can inject and clear poison to a device physical address. Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") added a lockdep assert that highlighted a gap in poison inject and clear functions where holding the dpa_rwsem does not assure that a a DPA is not added to a region. The impact for inject and clear is that if the DPA address being injected or cleared has been attached to a region, but not yet committed, the dev_dbg() message intended to alert the debug user that they are acting on a mapped address is not emitted. Also, the cxl_poison trace event that serves as a log of the inject and clear activity will not include region info. Close this gap by snapshotting an unchangeable region state during poison inject and clear operations. That means holding both the region_rwsem and the dpa_rwsem during the inject and clear ops. Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command") Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command") Signed-off-by: Alison Schofield Reviewed-by: Davidlohr Bueso Reviewed-by: Dave Jiang Link: https://lore.kernel.org/r/08721dc1df0a51e4e38fecd02425c3475912dfd5.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/memdev.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) (limited to 'drivers') diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index 5ad1b13e780a..2f43d368ba07 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -331,10 +331,16 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa) if (!IS_ENABLED(CONFIG_DEBUG_FS)) return 0; - rc = down_read_interruptible(&cxl_dpa_rwsem); + rc = down_read_interruptible(&cxl_region_rwsem); if (rc) return rc; + rc = down_read_interruptible(&cxl_dpa_rwsem); + if (rc) { + up_read(&cxl_region_rwsem); + return rc; + } + rc = cxl_validate_poison_dpa(cxlmd, dpa); if (rc) goto out; @@ -362,6 +368,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa) trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT); out: up_read(&cxl_dpa_rwsem); + up_read(&cxl_region_rwsem); return rc; } @@ -379,10 +386,16 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa) if (!IS_ENABLED(CONFIG_DEBUG_FS)) return 0; - rc = down_read_interruptible(&cxl_dpa_rwsem); + rc = down_read_interruptible(&cxl_region_rwsem); if (rc) return rc; + rc = down_read_interruptible(&cxl_dpa_rwsem); + if (rc) { + up_read(&cxl_region_rwsem); + return rc; + } + rc = cxl_validate_poison_dpa(cxlmd, dpa); if (rc) goto out; @@ -419,6 +432,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa) trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR); out: up_read(&cxl_dpa_rwsem); + up_read(&cxl_region_rwsem); return rc; } -- cgit v1.2.3 From 6f5c4eca48ffe18307b4e1d375817691c9005c87 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Dec 2023 19:11:14 -0800 Subject: cxl/hdm: Fix dpa translation locking The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is sufficient to assert that cxl_dpa_rwsem is held rather than acquire it in the helper. Otherwise, it triggers multiple lockdep reports: 1/ Tracing callbacks are in an atomic context that can not acquire sleeping locks: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 [..] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Call Trace: dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] 2/ The rwsem is already held in the inject poison path: WARNING: possible recursive locking detected 6.7.0-rc2+ #12 Tainted: G W OE N -------------------------------------------- bash/1288 is trying to acquire lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core] but task is already holding lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core] [..] Call Trace: dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] __traceiter_cxl_poison+0x5c/0x80 [cxl_core] cxl_inject_poison+0x1bc/0x1e0 [cxl_core] This appears to have been an issue since the initial implementation and uncovered by the new cxl-poison.sh test [1]. That test is now passing with these changes. Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1] Cc: Cc: Alison Schofield Cc: Jonathan Cameron Cc: Dave Jiang Cc: Ira Weiny Signed-off-by: Dan Williams --- drivers/cxl/core/hdm.c | 3 +-- drivers/cxl/core/port.c | 4 ++-- 2 files changed, 3 insertions(+), 4 deletions(-) (limited to 'drivers') diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 529baa8a1759..7d97790b893d 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -363,10 +363,9 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled) { resource_size_t base = -1; - down_read(&cxl_dpa_rwsem); + lockdep_assert_held(&cxl_dpa_rwsem); if (cxled->dpa_res) base = cxled->dpa_res->start; - up_read(&cxl_dpa_rwsem); return base; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 38441634e4c6..b7c93bb18f6e 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -226,9 +226,9 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at char *buf) { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - u64 base = cxl_dpa_resource_start(cxled); - return sysfs_emit(buf, "%#llx\n", base); + guard(rwsem_read)(&cxl_dpa_rwsem); + return sysfs_emit(buf, "%#llx\n", (u64)cxl_dpa_resource_start(cxled)); } static DEVICE_ATTR_RO(dpa_resource); -- cgit v1.2.3 From c65efe3685f5d150eeca5599afeabdc85da899d1 Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Thu, 16 Nov 2023 16:03:29 -0800 Subject: cxl/cdat: Free correct buffer on checksum error The new 6.7-rc1 kernel now checks the checksum on CDAT data. While using a branch of Fan's DCD qemu work (and specifying DCD devices), the following splat was observed. WARNING: CPU: 1 PID: 1384 at drivers/base/devres.c:1064 devm_kfree+0x4f/0x60 ... RIP: 0010:devm_kfree+0x4f/0x60 ... ? devm_kfree+0x4f/0x60 read_cdat_data+0x1a0/0x2a0 [cxl_core] cxl_port_probe+0xdf/0x200 [cxl_port] ... The issue in qemu is still unknown but the spat is a straight forward bug in the CDAT checksum processing code. Use a CDAT buffer variable to ensure the devm_free() works correctly on error. Fixes: 670e4e88f3b1 ("cxl: Add checksum verification to CDAT from CXL") Signed-off-by: Ira Weiny Reviewed-by: Dave Jiang Reviewed-by: Fan Ni Reviewed-by: Robert Richter Link: http://lore.kernel.org/r/20231116-fix-cdat-devm-free-v1-1-b148b40707d7@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/pci.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'drivers') diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index eff20e83d0a6..37e1652afbc7 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -620,7 +620,7 @@ void read_cdat_data(struct cxl_port *port) struct pci_dev *pdev = NULL; struct cxl_memdev *cxlmd; size_t cdat_length; - void *cdat_table; + void *cdat_table, *cdat_buf; int rc; if (is_cxl_memdev(uport)) { @@ -651,16 +651,15 @@ void read_cdat_data(struct cxl_port *port) return; } - cdat_table = devm_kzalloc(dev, cdat_length + sizeof(__le32), - GFP_KERNEL); - if (!cdat_table) + cdat_buf = devm_kzalloc(dev, cdat_length + sizeof(__le32), GFP_KERNEL); + if (!cdat_buf) return; - rc = cxl_cdat_read_table(dev, cdat_doe, cdat_table, &cdat_length); + rc = cxl_cdat_read_table(dev, cdat_doe, cdat_buf, &cdat_length); if (rc) goto err; - cdat_table = cdat_table + sizeof(__le32); + cdat_table = cdat_buf + sizeof(__le32); if (cdat_checksum(cdat_table, cdat_length)) goto err; @@ -670,7 +669,7 @@ void read_cdat_data(struct cxl_port *port) err: /* Don't leave table data allocated on error */ - devm_kfree(dev, cdat_table); + devm_kfree(dev, cdat_buf); dev_err(dev, "Failed to read/validate CDAT.\n"); } EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL); -- cgit v1.2.3 From ef3d5cf9c59cccb012aa6b93d99f4c6eb5d6648e Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Mon, 16 Oct 2023 16:25:05 -0700 Subject: cxl/pmu: Ensure put_device on pmu devices The following kmemleaks were detected when removing the cxl module stack: unreferenced object 0xffff88822616b800 (size 1024): ... backtrace: [<00000000bedc6f83>] kmalloc_trace+0x26/0x90 [<00000000448d1afc>] devm_cxl_pmu_add+0x3a/0x110 [cxl_core] [<00000000ca3bfe16>] 0xffffffffa105213b [<00000000ba7f78dc>] local_pci_probe+0x41/0x90 [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0 ... unreferenced object 0xffff8882260abcc0 (size 16): ... hex dump (first 16 bytes): 70 6d 75 5f 6d 65 6d 30 2e 30 00 26 82 88 ff ff pmu_mem0.0.&.... backtrace: ... [<00000000152b5e98>] dev_set_name+0x43/0x50 [<00000000c228798b>] devm_cxl_pmu_add+0x102/0x110 [cxl_core] [<00000000ca3bfe16>] 0xffffffffa105213b [<00000000ba7f78dc>] local_pci_probe+0x41/0x90 [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0 ... unreferenced object 0xffff8882272af200 (size 256): ... backtrace: [<00000000bedc6f83>] kmalloc_trace+0x26/0x90 [<00000000a14d1813>] device_add+0x4ea/0x890 [<00000000a3f07b47>] devm_cxl_pmu_add+0xbe/0x110 [cxl_core] [<00000000ca3bfe16>] 0xffffffffa105213b [<00000000ba7f78dc>] local_pci_probe+0x41/0x90 [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0 ... devm_cxl_pmu_add() correctly registers a device remove function but it only calls device_del() which is only part of device unregistration. Properly call device_unregister() to free up the memory associated with the device. Fixes: 1ad3f701c399 ("cxl/pci: Find and register CXL PMU devices") Cc: Jonathan Cameron Signed-off-by: Ira Weiny Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Link: https://lore.kernel.org/r/20231016-pmu-unregister-fix-v1-1-1e2eb2fa3c69@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers') diff --git a/drivers/cxl/core/pmu.c b/drivers/cxl/core/pmu.c index 7684c843e5a5..5d8e06b0ba6e 100644 --- a/drivers/cxl/core/pmu.c +++ b/drivers/cxl/core/pmu.c @@ -23,7 +23,7 @@ const struct device_type cxl_pmu_type = { static void remove_dev(void *dev) { - device_del(dev); + device_unregister(dev); } int devm_cxl_pmu_add(struct device *parent, struct cxl_pmu_regs *regs, -- cgit v1.2.3