summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)Author
2025-01-27Merge tag 'f2fs-for-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this series, there are several major improvements such as folio conversion by Matthew, speed-up of block truncation, and caching more dentry pages. In addition, we implemented a linear dentry search to address recent unicode regression, and figured out some false alarms that we could get rid of. Enhancements: - foilio conversion in various IO paths - optimize f2fs_truncate_data_blocks_range() - cache more dentry pages - remove unnecessary blk_finish_plug - procfs: show mtime in segment_bits Bug fixes: - introduce linear search for dentries - don't call block truncation for aliased file - fix using wrong 'submitted' value in f2fs_write_cache_pages - fix to do sanity check correctly on i_inline_xattr_size - avoid trying to get invalid block address - fix inconsistent dirty state of atomic file" * tag 'f2fs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits) f2fs: fix inconsistent dirty state of atomic file f2fs: fix to avoid changing 'check only' behaior of recovery f2fs: Clean up the loop outside of f2fs_invalidate_blocks() f2fs: procfs: show mtime in segment_bits f2fs: fix to avoid return invalid mtime from f2fs_get_section_mtime() f2fs: Fix format specifier in sanity_check_inode() f2fs: avoid trying to get invalid block address f2fs: fix to do sanity check correctly on i_inline_xattr_size f2fs: remove blk_finish_plug f2fs: Optimize f2fs_truncate_data_blocks_range() f2fs: fix using wrong 'submitted' value in f2fs_write_cache_pages f2fs: add parameter @len to f2fs_invalidate_blocks() f2fs: update_sit_entry_for_release() supports consecutive blocks. f2fs: introduce update_sit_entry_for_release/alloc() f2fs: don't call block truncation for aliased file f2fs: Introduce linear search for dentries f2fs: add parameter @len to f2fs_invalidate_internal_cache() f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range() f2fs: ensure that node info flags are always initialized f2fs: The GC triggered by ioctl also needs to mark the segno as victim ...
2025-01-26Merge tag 'mm-stable-2025-01-26-14-59' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...
2025-01-25mm, swap: clean up device availability checkKairui Song
Remove highest_bit and lowest_bit. After the HDD allocation path has been removed, the only purpose of these two fields is to determine whether the device is full or not, which can instead be determined by checking the inuse_pages. Link: https://lkml.kernel.org/r/20250113175732.48099-6-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chis Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickens <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25f2fs: fix inconsistent dirty state of atomic fileJianan Huang
When testing the atomic write fix patches, the f2fs_bug_on was triggered as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:935! Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 3 UID: 0 PID: 257 Comm: bash Not tainted 6.13.0-rc1-00033-gc283a70d3497 #5 RIP: 0010:f2fs_evict_inode+0x50f/0x520 Call Trace: <TASK> ? __die_body+0x65/0xb0 ? die+0x9f/0xc0 ? do_trap+0xa1/0x170 ? f2fs_evict_inode+0x50f/0x520 ? f2fs_evict_inode+0x50f/0x520 ? handle_invalid_op+0x65/0x80 ? f2fs_evict_inode+0x50f/0x520 ? exc_invalid_op+0x39/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? __pfx_f2fs_get_dquots+0x10/0x10 ? f2fs_evict_inode+0x50f/0x520 ? f2fs_evict_inode+0x2e5/0x520 evict+0x186/0x2f0 prune_icache_sb+0x75/0xb0 super_cache_scan+0x1a8/0x200 do_shrink_slab+0x163/0x320 shrink_slab+0x2fc/0x470 drop_slab+0x82/0xf0 drop_caches_sysctl_handler+0x4e/0xb0 proc_sys_call_handler+0x183/0x280 vfs_write+0x36d/0x450 ksys_write+0x68/0xd0 do_syscall_64+0xc8/0x1a0 ? arch_exit_to_user_mode_prepare+0x11/0x60 ? irqentry_exit_to_user_mode+0x7e/0xa0 The root cause is: f2fs uses FI_ATOMIC_DIRTIED to indicate dirty atomic files during commit. If the inode is dirtied during commit, such as by f2fs_i_pino_write, the vfs inode keeps clean and the f2fs inode is set to FI_DIRTY_INODE. The FI_DIRTY_INODE flag cann't be cleared by write_inode later due to the clean vfs inode. Finally, f2fs_bug_on is triggered due to this inconsistent state when evict. To reproduce this situation: - fd = open("/mnt/test.db", O_WRONLY) - ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) - mv /mnt/test.db /mnt/test1.db - ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) - echo 3 > /proc/sys/vm/drop_caches To fix this problem, clear FI_DIRTY_INODE after commit, then f2fs_mark_inode_dirty_sync will ensure a consistent dirty state. Fixes: fccaa81de87e ("f2fs: prevent atomic file from being dirtied before commit") Signed-off-by: Yunlei He <heyunlei@xiaomi.com> Signed-off-by: Jianan Huang <huangjianan@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22f2fs: fix to avoid changing 'check only' behaior of recoveryZhiguo Niu
The following two 'check only recovery' processes are very dependent on the return value of f2fs_recover_fsync_data, especially when the return value is greater than 0. 1. when device has readonly mode, shown as commit 23738e74472f ("f2fs: fix to restrict mount condition on readonly block device") 2. mount optiont NORECOVERY or DISABLE_ROLL_FORWARD is set, shown as commit 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount") However, commit c426d99127b1 ("f2fs: Check write pointer consistency of open zones") will change the return value unexpectedly, thereby changing the caller's behavior This patch let the f2fs_recover_fsync_data return correct value,and not do f2fs_check_and_fix_write_pointer when the device is read-only. Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22f2fs: Clean up the loop outside of f2fs_invalidate_blocks()Yi Sun
Now f2fs_invalidate_blocks() supports a continuous range of addresses, so the for loop can be omitted. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22f2fs: procfs: show mtime in segment_bitsChao Yu
Show mtime in segment_bits for debug. cat /proc/fs//f2fs/loop0/segment_bits format: segment_type|valid_blocks|bitmaps|mtime segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN) 0 3|1 | 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff 1 4|3 | 00 d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff 2 5|0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff 3 0|1 | 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00| ffffffffffffffff Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22f2fs: fix to avoid return invalid mtime from f2fs_get_section_mtime()Chao Yu
syzbot reported a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/gc.c:373! CPU: 0 UID: 0 PID: 5316 Comm: syz.0.0 Not tainted 6.13.0-rc3-syzkaller-00044-gaef25be35d23 #0 RIP: 0010:get_cb_cost fs/f2fs/gc.c:373 [inline] RIP: 0010:get_gc_cost fs/f2fs/gc.c:406 [inline] RIP: 0010:f2fs_get_victim+0x68b1/0x6aa0 fs/f2fs/gc.c:912 Call Trace: <TASK> __get_victim fs/f2fs/gc.c:1707 [inline] f2fs_gc+0xc89/0x2f60 fs/f2fs/gc.c:1915 f2fs_ioc_gc fs/f2fs/file.c:2624 [inline] __f2fs_ioctl+0x4cc9/0xb8b0 fs/f2fs/file.c:4482 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f w/ below testcase, it can reproduce directly: - dd if=/dev/zero of=/tmp/file bs=1M count=64 - mkfs.f2fs /tmp/file - mount -t f2fs -o loop,mode=fragment:block /tmp/file /mnt/f2fs - echo 0 > /sys/fs/f2fs/loop0/min_ssr_sections - dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=5 - umount /mnt/f2fs - for((i=4096;i<16384;i+=512)) do inject.f2fs --sit 0 --blk $i --mb mtime --val -1 /tmp/file; done - mount -o loop /tmp/file /mnt/f2fs - f2fs_io gc 0 /mnt/f2fs/file static unsigned int get_cb_cost() { ... mtime = f2fs_get_section_mtime(sbi, segno); f2fs_bug_on(sbi, mtime == INVALID_MTIME); ... } The root cause is: mtime in f2fs_sit_entry can be fuzzed to INVALID_MTIME, then it will trigger BUG_ON in get_cb_cost() during GC. Let's change behavior of f2fs_get_section_mtime() as below for fix: - return INVALID_MTIME only if total valid blocks is zero. - return INVALID_MTIME - 1 if average mtime calculated is INVALID_MTIME. Fixes: b19ee7272208 ("f2fs: introduce f2fs_get_section_mtime") Reported-by: syzbot+b9972806adbe20a910eb@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/6768c82e.050a0220.226966.0035.GAE@google.com Cc: liuderong <liuderong@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22f2fs: Fix format specifier in sanity_check_inode()Nathan Chancellor
When building for 32-bit platforms, for which 'size_t' is 'unsigned int', there is a warning due to an incorrect format specifier: fs/f2fs/inode.c:320:6: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat] 318 | f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_inline_xattr_size: %d, min: %lu, max: %lu", | ~~~ | %u 319 | __func__, inode->i_ino, fi->i_inline_xattr_size, 320 | MIN_INLINE_XATTR_SIZE, MAX_INLINE_XATTR_SIZE); | ^~~~~~~~~~~~~~~~~~~~~ fs/f2fs/f2fs.h:1855:46: note: expanded from macro 'f2fs_warn' 1855 | f2fs_printk(sbi, false, KERN_WARNING fmt, ##__VA_ARGS__) | ~~~ ^~~~~~~~~~~ fs/f2fs/xattr.h:86:31: note: expanded from macro 'MIN_INLINE_XATTR_SIZE' 86 | #define MIN_INLINE_XATTR_SIZE (sizeof(struct f2fs_xattr_header) / sizeof(__le32)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use the format specifier for 'size_t', '%zu', to resolve the warning. Fixes: 5c1768b67250 ("f2fs: fix to do sanity check correctly on i_inline_xattr_size") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-21f2fs: avoid trying to get invalid block addressJaegeuk Kim
In f2fs_new_inode(), if we fail to get a new inode, we go iput(), followed by f2fs_evict_inode(). If the inode is not marked as bad, it'll try to call f2fs_remove_inode_page() which tries to read the inode block given node id. But, there's no block address allocated yet, which gives a chance to access a wrong block address, if the block device has some garbage data in NAT table. We need to make sure NAT table should have zero data for all the unallocated node ids, but also would be better to take this unnecessary path as well. Let's mark the faild inode as bad. Fixes: 0abd675e97e6 ("f2fs: support plain user/group quota") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16f2fs: fix to do sanity check correctly on i_inline_xattr_sizeChao Yu
syzbot reported an out-of-range access issue as below: UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3292:19 index 18446744073709550491 is out of range for type '__le32[923]' (aka 'unsigned int[923]') CPU: 0 UID: 0 PID: 5338 Comm: syz.0.0 Not tainted 6.12.0-syzkaller-10689-g7af08b57bcb9 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429 read_inline_xattr+0x273/0x280 lookup_all_xattrs fs/f2fs/xattr.c:341 [inline] f2fs_getxattr+0x57b/0x13b0 fs/f2fs/xattr.c:533 vfs_getxattr_alloc+0x472/0x5c0 fs/xattr.c:393 ima_read_xattr+0x38/0x60 security/integrity/ima/ima_appraise.c:229 process_measurement+0x117a/0x1fb0 security/integrity/ima/ima_main.c:353 ima_file_check+0xd9/0x120 security/integrity/ima/ima_main.c:572 security_file_post_open+0xb9/0x280 security/security.c:3121 do_open fs/namei.c:3830 [inline] path_openat+0x2ccd/0x3590 fs/namei.c:3987 do_file_open_root+0x3a7/0x720 fs/namei.c:4039 file_open_root+0x247/0x2a0 fs/open.c:1382 do_handle_open+0x85b/0x9d0 fs/fhandle.c:414 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f index: 18446744073709550491 (decimal, unsigned long long) = 0xfffffffffffffb9b (hexadecimal) = -1125 (decimal, long long) UBSAN detects that inline_xattr_addr() tries to access .i_addr[-1125]. w/ below testcase, it can reproduce this bug easily: - mkfs.f2fs -f -O extra_attr,flexible_inline_xattr /dev/sdb - mount -o inline_xattr_size=512 /dev/sdb /mnt/f2fs - touch /mnt/f2fs/file - umount /mnt/f2fs - inject.f2fs --node --mb i_inline --nid 4 --val 0x1 /dev/sdb - inject.f2fs --node --mb i_inline_xattr_size --nid 4 --val 2048 /dev/sdb - mount /dev/sdb /mnt/f2fs - getfattr /mnt/f2fs/file The root cause is if metadata of filesystem and inode were fuzzed as below: - extra_attr feature is enabled - flexible_inline_xattr feature is enabled - ri.i_inline_xattr_size = 2048 - F2FS_EXTRA_ATTR bit in ri.i_inline was not set sanity_check_inode() will skip doing sanity check on fi->i_inline_xattr_size, result in using invalid inline_xattr_size later incorrectly, fix it. Meanwhile, let's fix to check lower boundary for .i_inline_xattr_size w/ MIN_INLINE_XATTR_SIZE like we did in parse_options(). There is a related issue reported by syzbot, Qasim Ijaz has anlyzed and fixed it w/ very similar way [1], as discussed, we all agree that it will be better to do sanity check in sanity_check_inode() for fix, so finally, let's fix these two related bugs w/ current patch. Including commit message from Qasim's patch as below, thanks a lot for his contribution. "In f2fs_getxattr(), the function lookup_all_xattrs() allocates a 12-byte (base_size) buffer for an inline extended attribute. However, when __find_inline_xattr() calls __find_xattr(), it uses the macro "list_for_each_xattr(entry, addr)", which starts by calling XATTR_FIRST_ENTRY(addr). This skips a 24-byte struct f2fs_xattr_header at the beginning of the buffer, causing an immediate out-of-bounds read in a 12-byte allocation. The subsequent !IS_XATTR_LAST_ENTRY(entry) check then dereferences memory outside the allocated region, triggering the slab-out-of bounds read. This patch prevents the out-of-bounds read by adding a check to bail out early if inline_size is too small and does not account for the header plus the 4-byte value that IS_XATTR_LAST_ENTRY reads." [1]: https://lore.kernel.org/linux-f2fs-devel/Z32y1rfBY9Qb5ZjM@qasdev.system/ Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size") Reported-by: syzbot+69f5379a1717a0b982a1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/674f4e7d.050a0220.17bd51.004f.GAE@google.com Reported-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=f5e74075e096e757bdbf Tested-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com> Tested-by: Qasim Ijaz <qasdev00@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16f2fs: remove blk_finish_plugJaegeuk Kim
Let's remove unclear blk_finish_plug. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16f2fs: Optimize f2fs_truncate_data_blocks_range()Yi Sun
Function f2fs_invalidate_blocks() can process consecutive blocks at a time, so f2fs_truncate_data_blocks_range() is optimized to use the new functionality of f2fs_invalidate_blocks(). Add two variables @blkstart and @blklen, @blkstart records the first address of the consecutive blocks, and @blkstart records the number of consecutive blocks. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-13f2fs: fix using wrong 'submitted' value in f2fs_write_cache_pageszangyangyang1
When f2fs_write_single_data_page fails, f2fs_write_cache_pages will use the last 'submitted' value incorrectly, which will cause 'nwritten' and 'wbc->nr_to_write' calculation errors Signed-off-by: zangyangyang1 <zangyangyang1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-13f2fs: add parameter @len to f2fs_invalidate_blocks()Yi Sun
New function can process some consecutive blocks at a time. Function f2fs_invalidate_blocks()->down_write() and up_write() are very time-consuming, so if f2fs_invalidate_blocks() can process consecutive blocks at one time, it will save a lot of time. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08f2fs: update_sit_entry_for_release() supports consecutive blocks.Yi Sun
This function can process some consecutive blocks at a time. When using update_sit_entry() to release consecutive blocks, ensure that the consecutive blocks belong to the same segment. Because after update_sit_entry_for_realese(), @segno is still in use in update_sit_entry(). Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08f2fs: introduce update_sit_entry_for_release/alloc()Yi Sun
No logical changes, just for cleanliness. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08f2fs: don't call block truncation for aliased fileJaegeuk Kim
This patch should avoid the below warning which does not corrupt the metadata tho. [ 51.508120][ T253] F2FS-fs (dm-59): access invalid blkaddr:36 [ 51.508156][ T253] __f2fs_is_valid_blkaddr+0x330/0x384 [ 51.508162][ T253] f2fs_is_valid_blkaddr_raw+0x10/0x24 [ 51.508163][ T253] f2fs_truncate_data_blocks_range+0x1ec/0x438 [ 51.508177][ T253] f2fs_remove_inode_page+0x8c/0x148 [ 51.508194][ T253] f2fs_evict_inode+0x230/0x76c Fixes: 128d333f0dff ("f2fs: introduce device aliasing file") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08f2fs: Introduce linear search for dentriesDaniel Lee
This patch addresses an issue where some files in case-insensitive directories become inaccessible due to changes in how the kernel function, utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points"). F2FS uses these case-folded names to calculate hash values for locating dentries and stores them on disk. Since utf8_casefold() can produce different output across kernel versions, stored hash values and newly calculated hash values may differ. This results in affected files no longer being found via the hash-based lookup. To resolve this, the patch introduces a linear search fallback. If the initial hash-based search fails, F2FS will sequentially scan the directory entries. Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586 Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08f2fs: add parameter @len to f2fs_invalidate_internal_cache()Yi Sun
New function can process some consecutive blocks at a time. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08f2fs: expand f2fs_invalidate_compress_page() to ↵Yi Sun
f2fs_invalidate_compress_pages_range() New function f2fs_invalidate_compress_pages_range() adds the @len parameter. So it can process some consecutive blocks at a time. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: ensure that node info flags are always initializedDmitry Antipov
Syzbot has reported the following KMSAN splat: BUG: KMSAN: uninit-value in f2fs_new_node_page+0x1494/0x1630 f2fs_new_node_page+0x1494/0x1630 f2fs_new_inode_page+0xb9/0x100 f2fs_init_inode_metadata+0x176/0x1e90 f2fs_add_inline_entry+0x723/0xc90 f2fs_do_add_link+0x48f/0xa70 f2fs_symlink+0x6af/0xfc0 vfs_symlink+0x1f1/0x470 do_symlinkat+0x471/0xbc0 __x64_sys_symlink+0xcf/0x140 x64_sys_call+0x2fcc/0x3d90 do_syscall_64+0xd9/0x1b0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable new_ni created at: f2fs_new_node_page+0x9d/0x1630 f2fs_new_inode_page+0xb9/0x100 So adjust 'f2fs_get_node_info()' to ensure that 'flag' field of 'struct node_info' is always initialized. Reported-by: syzbot+5141f6db57a2f7614352@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5141f6db57a2f7614352 Fixes: e05df3b115e7 ("f2fs: add node operations") Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: The GC triggered by ioctl also needs to mark the segno as victimYongpeng Yang
In SSR mode, the segment selected for allocation might be the same as the target segment of the GC triggered by ioctl, resulting in the GC moving the CURSEG_I(sbi, type)->segno. Thread A Thread B or Thread A - f2fs_ioc_gc_range - __f2fs_ioc_gc_range(.victim_segno=segno#N) - f2fs_gc - __get_victim - f2fs_get_victim : segno#N is valid, return segno#N as source segment of GC - f2fs_allocate_data_block - need_new_seg - get_ssr_segment - f2fs_get_victim : get segno #N as destination segment - change_curseg Fixes: e066b83c9b40 ("f2fs: add ioctl to flush data from faster device to cold area") Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: cache more dentry pageszangyangyang1
While traversing dir entries in dentry page, it's better to refresh current accessed page in lru list by using FGP_ACCESSED flag, otherwise, such page may has less chance to survive during memory reclaim, result in causing additional IO when revisiting dentry page. Signed-off-by: zangyangyang1 <zangyangyang1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Remove calls to folio_file_mapping()Matthew Wilcox (Oracle)
All folios that f2fs sees belong to f2fs and not to the swapcache so it can dereference folio->mapping directly like all other filesystems do. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Convert __read_io_type() to take a folioMatthew Wilcox (Oracle)
Remove the last call to page_file_mapping() as both callers can now pass in a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Use a data folio in f2fs_submit_page_bio()Matthew Wilcox (Oracle)
Remove a call to compound_head(). We can call bio_add_folio_nofail() here because we just allocated the bio, so we know it can't fail and thus the error path can never be taken. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Use a folio more in f2fs_submit_page_bio()Matthew Wilcox (Oracle)
Cache the result of page_folio(fio->page) in a local variable so we don't have to keep calling it. Saves a couple of calls to compound_head() and removes an access to page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Convert f2fs_finish_read_bio() to use foliosMatthew Wilcox (Oracle)
Use bio_for_each_folio_all() to iterate over each folio in the bio. This lets us use folio_end_read() which saves an atomic operation and memory barrier compared to marking the folio uptodate and unlocking it as two separate operations. This also removes a few hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Add F2FS_F_SB()Matthew Wilcox (Oracle)
This is the folio equivalent of F2FS_P_SB(). Removes a call to page_file_mapping() as we know folios seen by f2fs are never part of the swap cache. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Convert submit tracepoints to take a folioMatthew Wilcox (Oracle)
Remove accesses to page->index and page->mapping as well as unnecessary calls to page_file_mapping(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Use a folio in f2fs_write_compressed_pages()Matthew Wilcox (Oracle)
Remove accesses to page->index and an unnecessary reference to page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Use a folio in f2fs_truncate_partial_cluster()Matthew Wilcox (Oracle)
Convert the incoming page to a folio and use it throughout. Removes an access to page->index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Use a folio in f2fs_compress_write_end()Matthew Wilcox (Oracle)
This removes an access of page->index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16f2fs: Use a folio in f2fs_all_cluster_page_ready()Matthew Wilcox (Oracle)
Remove references to page->index and use folio_test_uptodate() instead of PageUptodate(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-01f2fs: switch to using the crc32 libraryEric Biggers
Now that the crc32() library function takes advantage of architecture-specific optimizations, it is unnecessary to go through the crypto API. Just use crc32(). This is much simpler, and it improves performance due to eliminating the crypto API overhead. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20241202010844.144356-19-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-11-26Merge tag 'f2fs-for-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancements: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Fixes: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks" * tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to drop all discards after creating snapshot on lvm device f2fs: add a sysfs node to limit max read extent count per-inode f2fs: fix to shrink read extent node in batches f2fs: print message if fscorrupted was found in f2fs_new_node_page() f2fs: clear SBI_POR_DOING before initing inmem curseg f2fs: fix changing cursegs if recovery fails on zoned device f2fs: adjust unusable cap before checkpoint=disable mode f2fs: fix to requery extent which cross boundary of inquiry f2fs: fix to adjust appropriate length for fiemap f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK} f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow f2fs: replace deprecated strcpy with strscpy Revert "f2fs: remove unreachable lazytime mount option parsing" f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode f2fs: fix to map blocks correctly for direct write f2fs: fix race in concurrent f2fs_stop_gc_thread f2fs: fix fiemap failure issue when page size is 16KB f2fs: remove redundant atomic file check in defragment f2fs: fix to convert log type to segment data type correctly f2fs: clean up the unused variable additional_reserved_segments ...
2024-11-23f2fs: fix to drop all discards after creating snapshot on lvm deviceChao Yu
Piergiorgio reported a bug in bugzilla as below: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330 RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs] Call Trace: __issue_discard_cmd+0x1ca/0x350 [f2fs] issue_discard_thread+0x191/0x480 [f2fs] kthread+0xcf/0x100 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1a/0x30 w/ below testcase, it can reproduce this bug quickly: - pvcreate /dev/vdb - vgcreate myvg1 /dev/vdb - lvcreate -L 1024m -n mylv1 myvg1 - mount /dev/myvg1/mylv1 /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20 - sync - rm /mnt/f2fs/file - sync - lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1 - umount /mnt/f2fs The root cause is: it will update discard_max_bytes of mounted lvm device to zero after creating snapshot on this lvm device, then, __submit_discard_cmd() will pass parameter @nr_sects w/ zero value to __blkdev_issue_discard(), it returns a NULL bio pointer, result in panic. This patch changes as below for fixing: 1. Let's drop all remained discards in f2fs_unfreeze() if snapshot of lvm device is created. 2. Checking discard_max_bytes before submitting discard during __submit_discard_cmd(). Cc: stable@vger.kernel.org Fixes: 35ec7d574884 ("f2fs: split discard command in prior to block layer") Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23f2fs: add a sysfs node to limit max read extent count per-inodeChao Yu
Quoted: "at this time, there are still 1086911 extent nodes in this zombie extent tree that need to be cleaned up. crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500 node_cnt = { counter = 1086911 }, " As reported by Xiuhong, there will be a huge number of extent nodes in extent tree, it may potentially cause: - slab memory fragments - extreme long time shrink on extent tree - low mapping efficiency Let's add a sysfs node to limit max read extent count for each inode, by default, value of this threshold is 10240, it can be updated according to user's requirement. Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/ Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: fix to shrink read extent node in batchesChao Yu
We use rwlock to protect core structure data of extent tree during its shrink, however, if there is a huge number of extent nodes in extent tree, during shrink of extent tree, it may hold rwlock for a very long time, which may trigger kernel hang issue. This patch fixes to shrink read extent node in batches, so that, critical region of the rwlock can be shrunk to avoid its extreme long time hold. Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/ Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: print message if fscorrupted was found in f2fs_new_node_page()Chao Yu
If fs corruption occurs in f2fs_new_node_page(), let's print more information about corrupted metadata into kernel log. Meanwhile, it updates to record ERROR_INCONSISTENT_NAT instead of ERROR_INVALID_BLKADDR if blkaddr in nat entry is not NULL_ADDR which means nat bitmap and nat entry is inconsistent. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: clear SBI_POR_DOING before initing inmem cursegSheng Yong
SBI_POR_DOING can be cleared after recovery is completed, so that changes made before recovery can be persistent, and subsequent errors can be recorded into cp/sb. Signed-off-by: Song Feng <songfeng@oppo.com> Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: fix changing cursegs if recovery fails on zoned deviceSheng Yong
Fsync data recovery attempts to check and fix write pointer consistency of cursegs and all other zones. If the write pointers of cursegs are unaligned, cursegs are changed to new sections. If recovery fails, zone write pointers are still checked and fixed, but the latest checkpoint cannot be written back. Additionally, retry- mount skips recovery and rolls back to reuse the old cursegs whose zones are already finished. This can lead to unaligned write later. This patch addresses the issue by leaving writer pointers untouched if recovery fails. When retry-mount is performed, cursegs and other zones are checked and fixed after skipping recovery. Signed-off-by: Song Feng <songfeng@oppo.com> Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: adjust unusable cap before checkpoint=disable modeDaeho Jeong
The unusable cap value must be adjusted before checking whether checkpoint=disable is feasible. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: fix to requery extent which cross boundary of inquiryChao Yu
dd if=/dev/zero of=file bs=4k count=5 xfs_io file -c "fiemap -v 2 16384" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..31]: 139272..139303 32 0x1000 1: [32..39]: 139304..139311 8 0x1001 xfs_io file -c "fiemap -v 0 16384" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..31]: 139272..139303 32 0x1000 xfs_io file -c "fiemap -v 0 16385" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..39]: 139272..139311 40 0x1001 There are two problems: - continuous extent is split to two - FIEMAP_EXTENT_LAST is missing in last extent The root cause is: if upper boundary of inquiry crosses extent, f2fs_map_blocks() will truncate length of returned extent to F2FS_BYTES_TO_BLK(len), and also, it will stop to query latter extent or hole to make sure current extent is last or not. In order to fix this issue, once we found an extent locates in the end of inquiry range by f2fs_map_blocks(), we need to expand inquiry range to requiry. Cc: stable@vger.kernel.org Fixes: 7f63eb77af7b ("f2fs: report unwritten area in f2fs_fiemap") Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: fix to adjust appropriate length for fiemapZhiguo Niu
If user give a file size as "length" parameter for fiemap operations, but if this size is non-block size aligned, it will show 2 segments fiemap results even this whole file is contiguous on disk, such as the following results: ./f2fs_io fiemap 0 19034 ylog/analyzer.py Fiemap: offset = 0 len = 19034 logical addr. physical addr. length flags 0 0000000000000000 0000000020baa000 0000000000004000 00001000 1 0000000000004000 0000000020bae000 0000000000001000 00001001 after this patch: ./f2fs_io fiemap 0 19034 ylog/analyzer.py Fiemap: offset = 0 len = 19034 logical addr. physical addr. length flags 0 0000000000000000 00000000315f3000 0000000000005000 00001001 Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}Chao Yu
f2fs doesn't support different blksize in one instance, so bytes_to_blks() and blks_to_bytes() are equal to F2FS_BYTES_TO_BLK and F2FS_BLK_TO_BYTES, let's use F2FS_BYTES_TO_BLK/F2FS_BLK_TO_BYTES instead for cleanup. Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflowChao Yu
It missed to cast variable to unsigned long long type before bit shift, which will cause overflow, fix it. Fixes: f7ef9b83b583 ("f2fs: introduce macros to convert bytes and blocks in f2fs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21f2fs: replace deprecated strcpy with strscpyDaniel Yang
strcpy is deprecated. Kernel docs recommend replacing strcpy with strscpy. The function strcpy() return value isn't used so there shouldn't be an issue replacing with the safer alternative strscpy. Signed-off-by: Daniel Yang <danielyangkang@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21Revert "f2fs: remove unreachable lazytime mount option parsing"Jaegeuk Kim
This reverts commit 54f43a10fa257ad4af02a1d157fefef6ebcfa7dc. The above commit broke the lazytime mount, given mount("/dev/vdb", "/mnt/test", "f2fs", 0, "lazytime"); CC: stable@vger.kernel.org # 6.11+ Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>