lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2023-10-22	bcachefs: More dio inlining	Kent Overstreet
	Eliminate another function call in the O_DIRECT write path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Error message improvement	Kent Overstreet
	- Centralize format strings in bcachefs.h - Add bch2_fmt_inum_offset() and related helpers - Switch error messages for inodes to also print out the offset, in bytes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Improve a few warnings	Kent Overstreet
	Warnings ought to always have a format string/log message - makes them considerably more useful. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Minor dio write path improvements	Kent Overstreet
	This switches where we take quota reservations to be per bch_wirte_op instead of per dio_write, so we can drop the quota reservation in the same place as we call i_sectors_acct(), and only take/release ei_quota_lock once. In the future we'd like ei_quota_lock to not be a mutex, so that we can avoid punting to process context before deliving write completions in nocow mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Factor out two_state_shared_lock	Kent Overstreet
	We have a unique lock used for controlling adding to the pagecache: the lock has two states, where both states are shared - the lock may be held multiple times for either state - but not both states at the same time. This is exactly what we need for nocow mode locking, so this patch pulls it out of fs.c into its own file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill BCH_WRITE_FLUSH	Kent Overstreet
	BCH_WRITE_FLUSH is a write flag that causes a journal flush. It's only used in the direct IO path, and this will allow for some consolidation with the regular fsync path, which will help with the upcoming nocow mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: DIO write path optimization	Kent Overstreet
	- With BCH_WRITE_SYNC, we no longer need the completion in struct dio_write - Pull out bch2_dio_write_copy_iov() into a separate non-inline function, it's code that doesn't run in the common case - Copy mapping and inode pointers into dio_write, avoiding pointer chasing at the start of bch2_dio_write_loop() - kthread_use_mm() is not needed in the common case; move it into bch2_dio_write_loop_async() - factor out various helpers from bch2_dio_write_loop() and rework control flow for better icache utilization Other small optimizations: - bch2_keylist_free() is only used in one place, at the end of the bch2_write() path - drop the reinit - in bch2_disk_reservation_put(), check if res->sectors is nonzero before touching c->online_reserved, since that will likely be a cache miss Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> bcachefs: More DIO write path optimization Better code prefetching (?) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: BCH_WRITE_SYNC	Kent Overstreet
	This adds a new flag for the write path, BCH_WRITE_SYNC, and switches the O_DIRECT write path to use it when we're not running asynchronously. It runs the btree update after the write in the original thread's context instead of a kworker, cutting context switches in half. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix a spurious warning	Kent Overstreet
	Fixes fstests generic/648 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix buffered write path for generic/275	Kent Overstreet
	Per fstests generic/275, on -ENOSPC we're supposed write until the filesystem is full - i.e. do a partial write instead of failing the full write. This is a partial fix for the buffered write path: we'll still fail on a page boundary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Assorted checkpatch fixes	Kent Overstreet
	checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Quota fixes	Kent Overstreet
	- We now correctly allow soft limits to be exceeded, instead of always returning -EDQUOT - Disk quota grate times/warnings can now be set, not just the systemwide defaults Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix for not dropping privs in fallocate	Kent Overstreet
	When modifying a file, we may be required to drop the suid/sgid bits - we were missing a file_modified() call to do this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Fix bch2_write_begin()	Kent Overstreet
	An error case was jumping to the wrong label, creating an infinite loop - oops. This fixes fstests generic/648. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Reflink now respects quotas	Kent Overstreet
	This adds a new helper, quota_reserve_range(), which takes a quota reservation for unallocated blocks in a given file range, and uses it in bch2_remap_file_range(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Kill io_in_flight semaphore	Kent Overstreet
	This used to be needed more for buffered IO, but now the block layer has writeback throttling - we can delete this now. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Add private error codes for ENOSPC	Kent Overstreet
	Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Errcodes can now subtype standard error codes	Kent Overstreet
	The next patch is going to be adding private error codes for all the places we return -ENOSPC. Additionally, this patch updates return paths at all module boundaries to call bch2_err_class(), to return the standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: EINTR -> BCH_ERR_transaction_restart	Kent Overstreet
	Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Always use percpu_ref_tryget_live() on c->writes	Kent Overstreet
	If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Delete bch_writepage	Kent Overstreet
	Per Dave Chinner and the xfs folks, .writepage is no longer needed, and it's better not to define it if .writepages is the intended path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Go emergency RO when i_blocks underflows	Kent Overstreet
	This improves some of our warnings and assertions - they imply possible filesystem inconsistencies, so they should be calling bch2_fs_inconsistent(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Don't skip triggers in fcollapse()	Kent Overstreet
	With backpointers this doesn't work anymore - backpointers always need to be updated to point to the new extent position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Convert some WARN_ONs to WARN_ON_ONCE	Kent Overstreet
	These warnings are symptomatic of something else going wrong, we don't want them spamming up the logs as that'll make it harder to find the real issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix dio write path with loopback dio mode	Kent Overstreet
	When the iov_iter is a bvec iter, it's possible the IO was submitted from a kthread that didn't have an mm to switch to. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Use bio_iov_vecs_to_alloc()	Kent Overstreet
	This fixes a bug in the DIO read path where, when using a loopback device in DIO mode, we'd allocate a biovec that would get overwritten and leaked in bio_iov_iter_get_pages() -> bio_iov_bvec_set(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Check for stale dirty pointer before reads	Kent Overstreet
	Since we retry reads when we discover we read from a pointer that went stale, if a dirty pointer is erroniously stale it would cause us to loop retrying that read forever - unless we check before issuing the read, while the btree is still locked, when we know that a dirty pointer should never be stale. This patch adds that check, along with printing some helpful debug info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: BTREE_ITER_FILTER_SNAPSHOTS is selected automatically	Kent Overstreet
	It doesn't have to be specified - this patch deletes the two instances where it was. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix an assertion in bch2_truncate()	Kent Overstreet
	We recently added an assertion that when we truncate a file to 0, i_blocks should also go to 0 - but that's not necessarily true if we're doing an emergency shutdown, lots of invariants no longer hold true in that case. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Convert a BUG_ON() to a warning	Kent Overstreet
	A user reported hitting this assertion, and we can't reproduce it yet, but it shouldn't be fatal - so convert it to a warning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix page state after fallocate	Kent Overstreet
	This tweaks the fallocate code to also update the page cache to reflect the new on disk reservations, giving us better i_sectors consistency. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix page state when reading into !PageUptodate pages	Kent Overstreet
	This patch adds code to read page state before writing to pages that aren't uptodate, which corrects i_sectors being tempororarily too large and means we may not need to get a disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> # Conflicts: # fs/bcachefs/fs-io.c
2023-10-22	bcachefs: Kill PAGE_SECTOR_SHIFT	Kent Overstreet
	Replace it with the new, standard PAGE_SECTORS_SHIFT Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Apply workaround for too many btree iters to read path	Kent Overstreet
	Reading from cached data, which calls bch2_bucket_io_time_reset(), is leading to transaction iterator overflows - this standardizes the workaround. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: SECTOR_DIRTY_RESERVED	Kent Overstreet
	This fixes another i_sectors accounting bug - we need to differentiate between dirty writes that overwrite a reservation and dirty writes to unallocated space - dirty writes to unallocated space increase i_sectors, dirty writes over a reservation do not. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix i_sectors_leak in bch2_truncate_page	Kent Overstreet
	When bch2_truncate_page() discards dirty sectors in the page cache, we need to account for that - we don't need to account for allocated sectors because that'll be done by the bch2_fpunch() call when it updates the btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix an i_sectors accounting bug	Kent Overstreet
	We weren't checking for errors before calling i_sectors_acct() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Don't check for -ENOSPC in page writeback	Kent Overstreet
	If at all possible we'd prefer to not fail page writeback unless the filesystem has been shutdown; allowing errors in page writeback means things we'd like to assert about i_size consistency between the VFS and the btree go out the window. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fallocate fixes	Kent Overstreet
	- fpunch wasn't always correctly updating i_size - when we drop buffered writes that were extending a file, we become responsible for writing i_size. - fzero was sometimes zeroing out more data that it should have - block_start and block_end were being rounded in the wrong directions Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Switch fsync to use bi_journal_seq	Kent Overstreet
	Now that we're recording in each inode the journal sequence number of the most recent update, fsync becomes a lot simpler and we can delete all the plumbing for ei_journal_seq. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix restart handling in for_each_btree_key()	Kent Overstreet
	Code that uses for_each_btree_key often wants transaction restarts to be handled locally and not returned. Originally, we wouldn't return transaction restarts if there was a single iterator in the transaction - the reasoning being if there weren't other iterators being invalidated, and the current iterator was being advanced/retraversed, there weren't any locks or iterators we were required to preserve. But with the btree_path conversion that approach doesn't work anymore - even when we're using for_each_btree_key() with a single iterator there will still be two paths in the transaction, since we now always preserve the path at the pos the iterator was initialized at - the reason being that on restart we often restart from the same place. And it turns out there's now a lot of for_each_btree_key() uses that _do not_ want transaction restarts handled locally, and should be returning them. This patch splits out for_each_btree_key_norestart() and for_each_btree_key_continue_norestart(), and converts existing users as appropriate. for_each_btree_key(), for_each_btree_key_continue(), and for_each_btree_node() now handle transaction restarts themselves by calling bch2_trans_begin() when necessary - and the old hack to not return transaction restarts when there's a single path in the transaction has been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: bch2_trans_exit() no longer returns errors	Kent Overstreet
	Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Convert io paths for snapshots	Kent Overstreet
	This plumbs around the subvolume ID as was done previously for other filesystem code, but now for the IO paths - the control flow in the IO paths is trickier so the changes in this patch are more involved. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Plumb through subvolume id	Kent Overstreet
	To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: btree_path	Kent Overstreet
	This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22	bcachefs: Reduce iter->trans usage	Kent Overstreet
	Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Fix an unhandled transaction restart	Kent Overstreet
	__bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may cause a transaction restart, which we don't return an error for because it doesn't prevent us from making forward progress on the read we're submitting. Instead, change __bch2_read() and bchfs_read() to check for transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Use bch2_trans_begin() more consistently	Kent Overstreet
	Upcoming patch will require that a transaction restart is always immediately followed by bch2_trans_begin(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Always check for transaction restarts	Kent Overstreet
	On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22	bcachefs: Use bch2_inode_find_by_inum() in truncate	Kent Overstreet
	This is needed for snapshots because we need to start handling lock restarts even when just calling bch2_inode_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>