lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2023-10-22	bcachefs: __bch2_btree_insert uses BTREE_INSERT_CACHED	Kent Overstreet
	Cached btrees should be doing cached updates by default: this fixes a bug in the migrate tool. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Snapshot whiteout fix	Kent Overstreet
	When fully overwriting an existing extent, we may need to generate a whiteout - not just if the extent being overwritten was in an older snapshot, but also if it was overwriting an extent in an older snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_btree_insert_nonextent()	Kent Overstreet
	This adds a new helper to delete some redundant code in bch2_trans_update_extent(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix verify_update_old_key()	Kent Overstreet
	This fixes a very-rare race in our assertion, with needs_whiteout being modified in the btree key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: let __bch2_btree_insert() pass in flags	Daniel Hill
	This patch is prep work for the following patch. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Btree write buffer	Kent Overstreet
	This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill trans->flags	Kent Overstreet
	Recursive transaction commits are occasionally necessary - in particular, for the upcoming btree write buffer's flush path. This avoids bugs due to trans->flags being accidentally mutated mid-commit, which can cause c->writes refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Debug mode for c->writes references	Kent Overstreet
	This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Use for_each_btree_key_upto() more consistently	Kent Overstreet
	It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use peek_upto() and provide an end for the interval we're searching for - otherwise, when we hit the end of the inode the next inode be in a different subvolume and not have any keys in the current snapshot, and we'd iterate over arbitrarily many keys before returning one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_trans_in_restart_error()	Kent Overstreet
	This replaces various BUG_ON() assertions with panics that tell us where the restart was done and the restart type. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix bch2_trans_reset_updates()	Kent Overstreet
	This should have been resetting trans->fs_usage_deltas as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert EAGAIN errors to private error codes	Kent Overstreet
	More error code cleanup, for better error messages and debugability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert EROFS errors to private error codes	Kent Overstreet
	More error code improvements - this gets us more useful error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: btree_iter->ip_allocated	Kent Overstreet
	In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill bch2_extent_trim_atomic() usage	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Log more messages in the journal	Kent Overstreet
	This patch - Adds a mechanism for queuing up journal entries prior to the journal being started, which will be used for early journal log messages - Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now take format strings. bch2_fs_log_msg() can be used before or after the journal has been started, and will use the appropriate mechanism. - Deletes the now obsolete bch2_journal_log_msg() - And adds more log messages to the recovery path - messages for journal/filesystem started, journal entries being blacklisted, and journal replay starting/finishing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: New btree helpers	Kent Overstreet
	This introduces some new conveniences, to help cut down on boilerplate: - bch2_trans_kmalloc_nomemzero() - performance optimiation - bch2_bkey_make_mut() - bch2_bkey_get_mut() - bch2_bkey_get_mut_typed() - bch2_bkey_alloc() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix error path in bch2_trans_commit_write_locked()	Kent Overstreet
	Previously, we were journalling extra_journal_entries (which is used for new btree roots, and irreversably mutates system state) before calling bch2_trans_fs_usage_apply(), which can fail - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_trans_revalidate_updates_in_node()	Kent Overstreet
	When we started stashing the key being overwritten in btree_insert_entry, this introduced a typical iterator invalidation problem, triggered by btree node splits or resorts. Previously, dealt with this by unconditionally re-validating those stashed pointers in the transaction commit path. This patch gets rid of that by doing it only when needed, in bch2_trans_node_add() or bch2_trans_node_reinit_iter(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill btree_insert_ret enum	Kent Overstreet
	Replace with standard bcachefs-private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_btree_path_peek_slot_exact()	Kent Overstreet
	When we start using the key cache for inodes again, it'll be possible for bch2_btree_path_peek_slot() to return a key in a different snapshot with a key cache path. This isn't what we want when triggers are checking what they're overwriting, so introduce a new helper for the commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: New bpos_cmp(), bkey_cmp() replacements	Kent Overstreet
	This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill some unneeded references to c->flags	Kent Overstreet
	This drops some unneeded references to JOURNAL_REPLAY_DONE in c->flags: we're already mirroring it in btree_trans, we just weren't using it consistently. We may want to do this with more flags: btree_iter.c: unsigned nr = test_bit(BCH_FS_STARTED, &c->flags) btree_update_leaf.c: if (unlikely(!test_bit(BCH_FS_MAY_GO_RW, &c->flags))) { Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Tiny bch2_trans_update_by_path_trace() optimization	Kent Overstreet
	This just removes a redundant comparison - there's more work we could do here to remove some redundant copying. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Move some asserts behind CONFIG_BCACHEFS_DEBUG	Kent Overstreet
	Convert some non-critical asserts in long-stable code to debug asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix a race with b->write_type	Kent Overstreet
	b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_trans_commit_bkey_invalid()	Kent Overstreet
	- factor out more slowpath code into non-inline function - use bch2_print_string_as_lines(), so our error message doesn't get truncated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Inlining improvements	Kent Overstreet
	- Don't call into bch2_encrypt_bio() when we're not encrypting - Pull slowpath out of trans_lock_write() - Make sure bc2h_trans_journal_res_get() gets inlined. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Improved btree write statistics	Kent Overstreet
	This replaces sysfs btree_avg_write_size with btree_write_stats, which now breaks out statistics by the source of the btree write. Btree writes that are too small are a source of inefficiency, and excessive btree resort overhead - this will let us see what's causing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Don't issue transaction restart on key cache realloc	Kent Overstreet
	This shouldn't be needed anymore, since we don't rely on the pointer validity that this was guarding against anymore - we get a new good reference and save it right after this function. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Separate out flush_new_cached_update()	Kent Overstreet
	This separates out the slowpath of bch2_trans_update_by_path_trace() into a new non-inlined helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_btree_insert_node() no longer uses lock_write_nofail	Kent Overstreet
	Now that we have an error path plumbed through, there's no need to be using bch2_btree_node_lock_write_nofail(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix a trans path overflow in bch2_btree_delete_range_trans()	Kent Overstreet
	bch2_btree_delete_range_trans() was using btree_trans_too_many_iters() to avoid path overflow, but this was buggy here (and also btree_trans_too_many_iters() is suspect in general). btree_trans_too_many_iters() only returns true when we're close to the maximum number of paths - within 8 - but extent insert/delete assumes that it can use more paths than that. Instead, we need to call bch2_trans_begin() on every loop iteration. Since we don't want to call bch2_trans_begin() (restarting the outer transaction) if the call was a no-op - if we had no work to do - we have to structure things a bit oddly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Inline fast path of check_pos_snapshot_overwritten()	Kent Overstreet
	This moves the slowpath of check_pos_snapshot_overwritten() to a separate function, and inlines the fast path - helping performance on btrees that don't use snapshot and for users that aren't using snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill normalize_read_intent_locks()	Kent Overstreet
	Before we had the deadlock cycle detector, we didn't want to be holding read locks when taking intent locks, because blocking on an intent lock while holding a read lock was a lock ordering violation that could cause a deadlock. With the cycle detector this is no longer an issue, so this code can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Delete old deadlock avoidance code	Kent Overstreet
	This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Add private error codes for ENOSPC	Kent Overstreet
	Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_btree_path_upgrade() now emits transaction restart	Kent Overstreet
	Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Convert more locking code to btree_bkey_cached_common	Kent Overstreet
	Ideally, all the code in btree_locking.c should be converted, but then we'd want to convert btree_path to point to btree_key_cached_common too, and then we'd be in for a much bigger cleanup - but a bit of incremental cleanup will still be helpful for the next patches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: bch2_btree_node_lock_write_nofail()	Kent Overstreet
	Taking a write lock will be able to fail, with the new cycle detector - unless we pass it nofail, which is possible but not preferred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: New locking functions	Kent Overstreet
	In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions - btree_node_lock_nopath, which may fail returning a transaction restart, and - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Mark write locks before taking lock	Kent Overstreet
	six locks are unfair: while a thread is blocked trying to take a write lock, new read locks will fail. The new deadlock cycle detector makes use of our existing lock tracing, so we need to tell it we're holding a write lock before we take the lock for it to work correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Add persistent counters for all tracepoints	Kent Overstreet
	Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Improve trans_restart_journal_preres_get tracepoint	Kent Overstreet
	It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Track held write locks	Kent Overstreet
	The upcoming lock cycle detection code will need to know precisely which locks every btree_trans is holding, including write locks - this patch updates btree_node_locked_type to include write locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Switch btree locking code to struct btree_bkey_cached_common	Kent Overstreet
	This is just some type safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Better use of locking helpers	Kent Overstreet
	Held btree locks are tracked in btree_path->nodes_locked and btree_path->nodes_intent_locked. Upcoming patches are going to change the representation in struct btree_path, so this patch switches to proper helpers instead of direct access to these fields. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: bch2_btree_delete_range_trans() now returns ↵	Kent Overstreet
	-BCH_ERR_transaction_restart_nested The new convention is that functions that handle transaction restarts within an existing transaction context should return -BCH_ERR_transaction_restart_nested when they did so, since they invalidated the outer transaction context. This also means bch2_btree_delete_range_trans() is changed to only call bch2_trans_begin() after a transaction restart, not on every loop iteration. This is to fix a bug in fsck, in check_inode() when we truncate an inode with BCH_INODE_I_SIZE_DIRTY set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Switch bch2_btree_delete_range() to bch2_trans_run()	Kent Overstreet
	This fixes an assertion about unexpected transaction restarts - bch2_delete_range_trans() handles transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix btree_path->uptodate inconsistency	Kent Overstreet
	This fixes an assertion in bch2_btree_path_peek_slot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>