summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-07-14bcachefs: darray: Don't pass NULL to memcpy()Tavian Barnes
memcpy's second parameter must not be NULL, even if size is zero. Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Kill bch2_assert_btree_nodes_not_locked()Kent Overstreet
We no longer track individual btree node locks with lockdep, so this will never be enabled. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Rename BCH_WRITE_DONE -> BCH_WRITE_SUBMITTEDKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: __bch2_read(): call trans_begin() on every loop iterKent Overstreet
perusal of /sys/kernel/debug/bcachefs/*/btree_transaction_stats shows that the read path has been acculumalating unneeded paths on the reflink btree, which we don't want. The solution is to call bch2_trans_begin(), which drops paths not used on previous loop iteration. bch2_readahead: Max mem used: 0 Transaction duration: count: 194235 since mount recent duration of events min: 150 ns max: 9 ms total: 838 ms mean: 4 us 6 us stddev: 34 us 7 us time between events min: 10 ns max: 15 h mean: 2 s 12 s stddev: 2 s 3 ms Maximum allocated btree paths (193): path: idx 2 ref 0:0 P btree=extents l=0 pos 270943112:392:U32_MAX locks 0 path: idx 3 ref 1:0 S btree=extents l=0 pos 270943112:24578:U32_MAX locks 1 path: idx 4 ref 0:0 P btree=reflink l=0 pos 0:24773509:0 locks 0 path: idx 5 ref 0:0 P S btree=reflink l=0 pos 0:24773631:0 locks 1 path: idx 6 ref 0:0 P S btree=reflink l=0 pos 0:24773759:0 locks 1 path: idx 7 ref 0:0 P S btree=reflink l=0 pos 0:24773887:0 locks 1 path: idx 8 ref 0:0 P S btree=reflink l=0 pos 0:24774015:0 locks 1 path: idx 9 ref 0:0 P S btree=reflink l=0 pos 0:24774143:0 locks 1 path: idx 10 ref 0:0 P S btree=reflink l=0 pos 0:24774271:0 locks 1 <many more reflink paths> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: show none if label is not setHongbo Li
If label is not set, the Label tag in superblock info show '(none)'. ``` [Before] Device index: 0 Label: Version: 1.4: member_seq [After] Device index: 0 Label: (none) Version: 1.4: member_seq ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: drop packed, aligned from bkey_inode_bufKent Overstreet
Unnecessary here, and this broke the rust bindings: error[E0588]: packed type cannot transitively contain a `#[repr(align)]` type --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:29025:1 | 29025 | pub struct bkey_i_inode_v3 { | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | note: `bch_inode_v3` has a `#[repr(align)]` attribute --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:8949:1 | 8949 | pub struct bch_inode_v3 { | ^^^^^^^^^^^^^^^^^^^^^^^ error[E0588]: packed type cannot transitively contain a `#[repr(align)]` type --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:32826:1 | 32826 | pub struct bkey_inode_buf { | ^^^^^^^^^^^^^^^^^^^^^^^^^ | note: `bch_inode_v3` has a `#[repr(align)]` attribute --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:8949:1 | 8949 | pub struct bch_inode_v3 { | ^^^^^^^^^^^^^^^^^^^^^^^ note: `bkey_inode_buf` contains a field of type `bkey_i_inode_v3` --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:32827:9 | 32827 | pub inode: bkey_i_inode_v3, | ^^^^^ note: ...which contains a field of type `bch_inode_v3` --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:29027:9 | 29027 | pub v: bch_inode_v3, | ^ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: btree node scan: fall back to comparing by journal seqKent Overstreet
highly damaged filesystems, or filesystems that have been damaged and repair and damaged again, may have sequence numbers we can't fully trust - which in itself is something we need to debug. Add a journal_seq fallback so that repair doesn't get stuck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Add lockdep support for btree node locksKent Overstreet
This adds lockdep tracking for held btree locks with a single dep_map in btree_trans, i.e. tracking all held btree locks as one object. This is more practical and more useful than having lockdep track held btree locks individually, because - we can take more locks than lockdep can track (unbounded, now that we have dynamically resizable btree paths) - there's no lock ordering between btree locks for lockdep to track (we do cycle detection) - and this makes it easy to teach lockdep that btree locks are not safe to hold while invoking memory reclaim. The last rule is one that lockdep would never learn, because we only do trylock() from within shrinkers - but we very much do not want to be invoking memory reclaim while holding btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14lockdep: lockdep_set_notrack_class()Kent Overstreet
Add a new helper to disable lockdep tracking entirely for a given class. This is needed for bcachefs, which takes too many btree node locks for lockdep to track. Instead, we have a single lockdep_map for "btree_trans has any btree nodes locked", which makes more since given that we have centralized lock management and a cycle detector. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Improve copygc_wait_to_text()Kent Overstreet
printing the raw values can occasionally be very useful Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Convert clock code to u64sKent Overstreet
Eliminate possible integer truncation bugs on 32 bit Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Improve startup messageKent Overstreet
We're not always mounting when we start the filesystem Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Self healing on read IO errorKent Overstreet
This repurposes the promote path, which already knows how to call data_update() after a read: we now automatically rewrite bad data when we get a read error and then successfully retry from a different replica. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Make read_only a mount option again, but hiddenKent Overstreet
fsck passes read_only as a mount option, and it's required for nochanges, which it also uses. Usually read_only is handled by the VFS, but we need to be able to handle it too; we just don't want to print it out twice, so mark it as a hidden option. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_extent_crc_unpacked_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Ratelimit checksum error messagesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: spelling fixKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Simplify btree key cache fill pathKent Overstreet
Don't allocate the new bkey_cached until after we've done the btree lookup; this means we can kill bkey_cached.valid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Improve "unable to allocate journal write" messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Fix missing BTREE_TRIGGER_bucket_invalidate flagKent Overstreet
This fixes an accounting mismatch for cached data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Ensure buffered writes write as much as they canKent Overstreet
This adds a new helper, bch2_folio_reservation_get_partial(), which reserves as many blocks as possible and may return partial success. __bch2_buffered_write() is switched to the new helper - this fixes fstests generic/275, the write until -ENOSPC test. generic/230 now fails: this appears to be a test bug, where xfs_io isn't looping after a partial write to get the error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: support STATX_DIOALIGN for statx fileHongbo Li
Add support for STATX_DIOALIGN to bcachefs, so that direct I/O alignment restrictions are exposed to userspace in a generic way. [Before] ``` ./statx_test /mnt/bcachefs/test statx(/mnt/bcachefs/test) = 0 dio mem align:0 dio offset align:0 ``` [After] ``` ./statx_test /mnt/bcachefs/test statx(/mnt/bcachefs/test) = 0 dio mem align:1 dio offset align:512 ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: split out lru_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_btree_key_cache_drop() now evictsKent Overstreet
As part of improving btree key cache coherency, the bkey_cached.valid flag is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: set fgf order hint before starting a buffered writePankaj Raghav
Set the preferred folio order in the fgp_flags by calling fgf_set_order(). Page cache will try to allocate large folio of the preferred order whenever possible instead of allocating multiple 0 order folios. This improves the buffered write performance up to 1.25x with default mount options and up to 1.57x when mounted with no_data_io option with the following fio workload: fio --name=bcachefs --filename=/mnt/test --size=100G \ --ioengine=io_uring --iodepth=16 --rw=write --bs=128k Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: use FGP_WRITEBEGIN instead of combining individual flagsPankaj Raghav
Use FGP_WRITEBEGIN to avoid repeating the individual FGP flags before starting a buffered write. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Reduce the scope of gc_lockKent Overstreet
gc_lock is now only for synchronization between check_alloc_info and interior btree updates - nothing else Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: per_cpu_sum()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: kill key cache arg to bch2_assert_pos_locked()Kent Overstreet
this is an internal implementation detail - and we're improving key cache coherency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: btree_path_cached_set()Kent Overstreet
new helper - small refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: btree_node_unlock() assertKent Overstreet
we have a separate helper for releasing write locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_gc_pos_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_btree_id_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Kill gc_pos_btree_node()Kent Overstreet
gc_pos is now based on keys, not nodes, for invariantness w.r.t. splits and merges Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Fix bch2_gc_accounting_done() lockingKent Overstreet
The transaction commit path takes mark_lock, so we shouldn't be holding it; use a bpos as an iterator so that we can drop and retake. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_accounting_mem_gc()Kent Overstreet
Add a new helper to free zeroed out accounting entries, and use it in bch2_replicas_gc2(); bch2_replicas_gc2() was killing superblock replicas entries if their corresponding accounting counters were nonzero, but that's incorrect - the superblock replicas entry needs to exist if the accounting entry exists, not if it's nonzero, because we check and create the replicas entry when creating the new accounting entry - we don't know when it's becoming nonzero. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Refactor disk accounting data structuresKent Overstreet
Break up the percpu counter allocations into individual allocations for each disk accounting counter; this fixes an issue on large systems where we have too many replica entries to for the percpu allocator's max practical size. Also, use just one eytzinger tree for the normal set of counters and the gc counters; this simplifies accounting_gc_done() where we need the same set of counters to be present in both tables. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: fix smatch data leak warning in fs usage ioctlBrian Foster
smatch warns that the copy of arg to userspace is a potential data leak by virtue of arg.pad not being checked or zeroed. This was introduced by the commit referenced below that switched arg from being a zeroed runtime allocation to living on the stack. Fix by simply zero initializing the structure. Fixes: cde738a61e65 ("bcachefs: Convert bch2_ioctl_fs_usage() to new accounting") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Fix race in bch2_accounting_mem_insert()Kent Overstreet
bch2_accounting_mem_insert() drops and retakes mark_lock; thus, we need to check if the entry in question has already been inserted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_btree_insert() - add btree iter flagsAriel Miculas
The commit 65bd44239727 ("bcachefs: bch2_btree_insert_trans() no longer specifies BTREE_ITER_cached") removes BTREE_ITER_cached from bch2_btree_insert_trans, which causes the update_inode function from bcachefs-tools to take a long time (~20s). Add an iter_flags parameter to bch2_btree_insert, so the users can specify iter update trigger flags, such as BTREE_ITER_cached. Signed-off-by: Ariel Miculas <ariel.miculas@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: BCH_IOCTL_QUERY_ACCOUNTINGKent Overstreet
Add a new ioctl that can return the new accounting counter types; it takes as input a bitmask of accounting types to return. This will be used for returning e.g. compression accounting and rebalance_work accounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: support REMAP_FILE_DEDUP in bch2_remap_file_rangeReed Riley
By removing the early-exit when REMAP_FILE_DEDUP is set, we should be able to support the fideduperange ioctl, albeit less efficiently than if we handled some of the extent locking and comparison logic inside bcachefs. Extent comparison logic already exists inside of `__generic_remap_file_range_prep`. Signed-off-by: Reed Riley <reed@riley.engineer> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: support FS_IOC_SETFSLABELHongbo Li
Implement support for FS_IOC_SETFSLABEL ioctl to set filesystem label. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: support get fs labelHongbo Li
Implement support for FS_IOC_GETFSLABEL ioctl to read filesystem label. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: implement FS_IOC_GETVERSION to support lsattrHongbo Li
In this patch we add the FS_IOC_GETVERSION ioctl for getting i_generation from inode, after that, users can list file's generation number by using "lsattr". Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Unlock trans when waiting for user input in fsckKent Overstreet
We can't hold locks while waiting for user input, that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Add tracepoints for bch2_sync_fs() and bch2_fsync()Youling Tang
Add trace_bch2_sync_fs() and trace_bch2_fsync() implementations. The output in trace is as follows: sync-29779 [000] ..... 193.700935: bch2_sync_fs: dev 254,16 wait 1 <...>-40027 [002] ..... 342.535227: bch2_fsync: dev 254,32 ino 4099 parent 4096 datasync 1 Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: track writeback errors using the generic tracking infrastructureYouling Tang
We already using mapping_set_error() in bch2_writepage_io_done(), so all we need to do is to use file_check_and_advance_wb_err() when handling fsync() requests in bch2_fsync(). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_dir_emit() - fix directory reads in the fuse driverAriel Miculas
Commit 0c0cbfdb84725e9933a24ecf47c61bdeeda06ba2 dropped the ctx->pos update before the call to dir_emit. This breaks the userspace implementation, causing the directory reads to be stuck in an infinite loop. This doesn't happen in the kernel because the vfs handles the updates to ctx->pos, but in the fuse implementation nobody updates it. Signed-off-by: Ariel Miculas <ariel.miculas@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: twf: delete dead struct fieldsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>