summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_log_priv.h
diff options
context:
space:
mode:
authorDarrick J. Wong <djwong@kernel.org>2023-09-11 08:39:02 -0700
committerDarrick J. Wong <djwong@kernel.org>2023-09-11 08:39:02 -0700
commitecd49f7a36fbccc884471f86fc43de6ca8d1f786 (patch)
treeb21ca277df0c11f03faddefd3a25f11d8b04dd4b /fs/xfs/xfs_log_priv.h
parentcfa2df68b7ceb49ac9eb2d295ab0c5974dbf17e7 (diff)
downloadlwn-ecd49f7a36fbccc884471f86fc43de6ca8d1f786.tar.gz
lwn-ecd49f7a36fbccc884471f86fc43de6ca8d1f786.zip
xfs: fix per-cpu CIL structure aggregation racing with dying cpus
In commit 7c8ade2121200 ("xfs: implement percpu cil space used calculation"), the XFS committed (log) item list code was converted to use per-cpu lists and space tracking to reduce cpu contention when multiple threads are modifying different parts of the filesystem and hence end up contending on the log structures during transaction commit. Each CPU tracks its own commit items and space usage, and these do not have to be merged into the main CIL until either someone wants to push the CIL items, or we run over a soft threshold and switch to slower (but more accurate) accounting with atomics. Unfortunately, the for_each_cpu iteration suffers from the same race with cpu dying problem that was identified in commit 8b57b11cca88f ("pcpcntrs: fix dying cpu summation race") -- CPUs are removed from cpu_online_mask before the CPUHP_XFS_DEAD callback gets called. As a result, both CIL percpu structure aggregation functions fail to collect the items and accounted space usage at the correct point in time. If we're lucky, the items that are collected from the online cpus exceed the space given to those cpus, and the log immediately shuts down in xlog_cil_insert_items due to the (apparent) log reservation overrun. This happens periodically with generic/650, which exercises cpu hotplug vs. the filesystem code: smpboot: CPU 3 is now offline XFS (sda3): ctx ticket reservation ran out. Need to up reservation XFS (sda3): ticket reservation summary: XFS (sda3): unit res = 9268 bytes XFS (sda3): current res = -40 bytes XFS (sda3): original count = 1 XFS (sda3): remaining count = 1 XFS (sda3): Filesystem has been shut down due to log error (0x2). Applying the same sort of fix from 8b57b11cca88f to the CIL code seems to make the generic/650 problem go away, but I've been told that tglx was not happy when he saw: "...the only thing we actually need to care about is that percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when there are no CPUs dying, it has no addition overhead except for a cpumask_or() operation." The CPU hotplug code is rather complex and difficult to understand and I don't want to try to understand the cpu hotplug locking well enough to use cpu_dying mask. Furthermore, there's a performance improvement that could be had here. Attach a private cpu mask to the CIL structure so that we can track exactly which cpus have accessed the percpu data at all. It doesn't matter if the cpu has since gone offline; log item aggregation will still find the items. Better yet, we skip cpus that have not recently logged anything. Worse yet, Ritesh Harjani and Eric Sandeen both reported today that CPU hot remove racing with an xfs mount can crash if the cpu_dead notifier tries to access the log but the mount hasn't yet set up the log. Link: https://lore.kernel.org/linux-xfs/ZOLzgBOuyWHapOyZ@dread.disaster.area/T/ Link: https://lore.kernel.org/lkml/877cuj1mt1.ffs@tglx/ Link: https://lore.kernel.org/lkml/20230414162755.281993820@linutronix.de/ Link: https://lore.kernel.org/linux-xfs/ZOVkjxWZq0YmjrJu@dread.disaster.area/T/ Cc: tglx@linutronix.de Cc: peterz@infradead.org Reported-by: ritesh.list@gmail.com Reported-by: sandeen@sandeen.net Fixes: af1c2146a50b ("xfs: introduce per-cpu CIL tracking structure") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Diffstat (limited to 'fs/xfs/xfs_log_priv.h')
-rw-r--r--fs/xfs/xfs_log_priv.h14
1 files changed, 6 insertions, 8 deletions
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 1bd2963e8fbd..af87648331d5 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -231,6 +231,12 @@ struct xfs_cil_ctx {
struct work_struct discard_endio_work;
struct work_struct push_work;
atomic_t order_id;
+
+ /*
+ * CPUs that could have added items to the percpu CIL data. Access is
+ * coordinated with xc_ctx_lock.
+ */
+ struct cpumask cil_pcpmask;
};
/*
@@ -278,9 +284,6 @@ struct xfs_cil {
wait_queue_head_t xc_push_wait; /* background push throttle */
void __percpu *xc_pcp; /* percpu CIL structures */
-#ifdef CONFIG_HOTPLUG_CPU
- struct list_head xc_pcp_list;
-#endif
} ____cacheline_aligned_in_smp;
/* xc_flags bit values */
@@ -705,9 +708,4 @@ xlog_kvmalloc(
return p;
}
-/*
- * CIL CPU dead notifier
- */
-void xlog_cil_pcp_dead(struct xlog *log, unsigned int cpu);
-
#endif /* __XFS_LOG_PRIV_H__ */