Age | Commit message (Collapse) | Author |
|
Cached btrees should be doing cached updates by default: this fixes a
bug in the migrate tool.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When fully overwriting an existing extent, we may need to generate a
whiteout - not just if the extent being overwritten was in an older
snapshot, but also if it was overwriting an extent in an older snapshot.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new helper to delete some redundant code in
bch2_trans_update_extent().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a very-rare race in our assertion, with needs_whiteout being
modified in the btree key.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch is prep work for the following patch.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Recursive transaction commits are occasionally necessary - in
particular, for the upcoming btree write buffer's flush path.
This avoids bugs due to trans->flags being accidentally mutated
mid-commit, which can cause c->writes refcount leaks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.
This will make it easier to debug shutdown hangs due to refcount leaks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use
peek_upto() and provide an end for the interval we're searching for -
otherwise, when we hit the end of the inode the next inode be in a
different subvolume and not have any keys in the current snapshot, and
we'd iterate over arbitrarily many keys before returning one.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This replaces various BUG_ON() assertions with panics that tell us where
the restart was done and the restart type.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This should have been resetting trans->fs_usage_deltas as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More error code cleanup, for better error messages and debugability.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More error code improvements - this gets us more useful error messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In debug mode, we now track where btree iterators and paths are
initialized/allocated - helpful in tracking down btree path overflows.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch
- Adds a mechanism for queuing up journal entries prior to the journal
being started, which will be used for early journal log messages
- Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now
take format strings. bch2_fs_log_msg() can be used before or after
the journal has been started, and will use the appropriate mechanism.
- Deletes the now obsolete bch2_journal_log_msg()
- And adds more log messages to the recovery path - messages for
journal/filesystem started, journal entries being blacklisted, and
journal replay starting/finishing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This introduces some new conveniences, to help cut down on boilerplate:
- bch2_trans_kmalloc_nomemzero() - performance optimiation
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we were journalling extra_journal_entries (which is used for
new btree roots, and irreversably mutates system state) before calling
bch2_trans_fs_usage_apply(), which can fail - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we started stashing the key being overwritten in
btree_insert_entry, this introduced a typical iterator invalidation
problem, triggered by btree node splits or resorts.
Previously, dealt with this by unconditionally re-validating those
stashed pointers in the transaction commit path. This patch gets rid of
that by doing it only when needed, in bch2_trans_node_add() or
bch2_trans_node_reinit_iter().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Replace with standard bcachefs-private error codes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we start using the key cache for inodes again, it'll be possible
for bch2_btree_path_peek_slot() to return a key in a different snapshot
with a key cache path.
This isn't what we want when triggers are checking what they're
overwriting, so introduce a new helper for the commit path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp().
Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This drops some unneeded references to JOURNAL_REPLAY_DONE in c->flags:
we're already mirroring it in btree_trans, we just weren't using it
consistently.
We may want to do this with more flags:
btree_iter.c: unsigned nr = test_bit(BCH_FS_STARTED, &c->flags)
btree_update_leaf.c: if (unlikely(!test_bit(BCH_FS_MAY_GO_RW, &c->flags))) {
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This just removes a redundant comparison - there's more work we could do
here to remove some redundant copying.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert some non-critical asserts in long-stable code to debug asserts.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
b->write_type needs to be set atomically with setting the
btree_node_need_write flag, so move it into b->flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- factor out more slowpath code into non-inline function
- use bch2_print_string_as_lines(), so our error message doesn't get
truncated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Don't call into bch2_encrypt_bio() when we're not encrypting
- Pull slowpath out of trans_lock_write()
- Make sure bc2h_trans_journal_res_get() gets inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.
Btree writes that are too small are a source of inefficiency, and
excessive btree resort overhead - this will let us see what's causing
them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This shouldn't be needed anymore, since we don't rely on the pointer
validity that this was guarding against anymore - we get a new good
reference and save it right after this function.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This separates out the slowpath of bch2_trans_update_by_path_trace()
into a new non-inlined helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now that we have an error path plumbed through, there's no need to be
using bch2_btree_node_lock_write_nofail().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_delete_range_trans() was using btree_trans_too_many_iters()
to avoid path overflow, but this was buggy here (and also
btree_trans_too_many_iters() is suspect in general).
btree_trans_too_many_iters() only returns true when we're close to the
maximum number of paths - within 8 - but extent insert/delete assumes
that it can use more paths than that.
Instead, we need to call bch2_trans_begin() on every loop iteration.
Since we don't want to call bch2_trans_begin() (restarting the outer
transaction) if the call was a no-op - if we had no work to do - we have
to structure things a bit oddly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This moves the slowpath of check_pos_snapshot_overwritten() to a
separate function, and inlines the fast path - helping performance on
btrees that don't use snapshot and for users that aren't using
snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Before we had the deadlock cycle detector, we didn't want to be holding
read locks when taking intent locks, because blocking on an intent lock
while holding a read lock was a lock ordering violation that could
cause a deadlock.
With the cycle detector this is no longer an issue, so this code can be
deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This deletes our old lock ordering based deadlock avoidance code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Ideally, all the code in btree_locking.c should be converted, but then
we'd want to convert btree_path to point to btree_key_cached_common too,
and then we'd be in for a much bigger cleanup - but a bit of incremental
cleanup will still be helpful for the next patches.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Taking a write lock will be able to fail, with the new cycle detector -
unless we pass it nofail, which is possible but not preferred.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.
This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
six locks are unfair: while a thread is blocked trying to take a write
lock, new read locks will fail. The new deadlock cycle detector makes
use of our existing lock tracing, so we need to tell it we're holding a
write lock before we take the lock for it to work correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It now includes journal_flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The upcoming lock cycle detection code will need to know precisely which
locks every btree_trans is holding, including write locks - this patch
updates btree_node_locked_type to include write locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is just some type safety cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Held btree locks are tracked in btree_path->nodes_locked and
btree_path->nodes_intent_locked. Upcoming patches are going to change
the representation in struct btree_path, so this patch switches to
proper helpers instead of direct access to these fields.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
-BCH_ERR_transaction_restart_nested
The new convention is that functions that handle transaction restarts
within an existing transaction context should return
-BCH_ERR_transaction_restart_nested when they did so, since they
invalidated the outer transaction context.
This also means bch2_btree_delete_range_trans() is changed to only call
bch2_trans_begin() after a transaction restart, not on every loop
iteration.
This is to fix a bug in fsck, in check_inode() when we truncate an inode
with BCH_INODE_I_SIZE_DIRTY set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This fixes an assertion about unexpected transaction restarts -
bch2_delete_range_trans() handles transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
This fixes an assertion in bch2_btree_path_peek_slot().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|