summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-11-01Merge tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Light on new features - basically just the hybrid mode support. Outside of that it's just fixes, cleanups, and performance improvements. In detail: - Add ring related information to the fdinfo output (Hao) - Hybrid async mode (Hao) - Support for batched issue on block (me) - sqe error trace improvement (me) - IOPOLL efficiency improvements (Pavel) - submit state cleanups and improvements (Pavel) - Completion side improvements (Pavel) - Drain improvements (Pavel) - Buffer selection cleanups (Pavel) - Fixed file node improvements (Pavel) - io-wq setup cancelation fix (Pavel) - Various other performance improvements and cleanups (Pavel) - Misc fixes (Arnd, Bixuan, Changcheng, Hao, me, Noah)" * tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-block: (97 commits) io-wq: remove worker to owner tw dependency io_uring: harder fdinfo sq/cq ring iterating io_uring: don't assign write hint in the read path io_uring: clusterise ki_flags access in rw_prep io_uring: kill unused param from io_file_supports_nowait io_uring: clean up timeout async_data allocation io_uring: don't try io-wq polling if not supported io_uring: check if opcode needs poll first on arming io_uring: clean iowq submit work cancellation io_uring: clean io_wq_submit_work()'s main loop io-wq: use helper for worker refcounting io_uring: implement async hybrid mode for pollable requests io_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR()) io_uring: split logic of force_nonblock io_uring: warning about unused-but-set parameter io_uring: inform block layer of how many requests we are submitting io_uring: simplify io_file_supports_nowait() io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags io_uring: arm poll for non-nowait files fs/io_uring: Prioritise checking faster conditions first in io_write ...
2021-11-01Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...
2021-11-01Merge tag 'locks-v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Most of this is just follow-on cleanup work of documentation and comments from the mandatory locking removal in v5.15. The only real functional change is that LOCK_MAND flock() support is also being removed, as it has basically been non-functional since the v2.5 days" * tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: remove leftover comments from mandatory locking removal locks: remove changelog comments docs: fs: locks.rst: update comment about mandatory file locking Documentation: remove reference to now removed mandatory-locking doc locks: remove LOCK_MAND flock lock support
2021-11-01Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull memory folios from Matthew Wilcox: "Add memory folios, a new type to represent either order-0 pages or the head page of a compound page. This should be enough infrastructure to support filesystems converting from pages to folios. The point of all this churn is to allow filesystems and the page cache to manage memory in larger chunks than PAGE_SIZE. The original plan was to use compound pages like THP does, but I ran into problems with some functions expecting only a head page while others expect the precise page containing a particular byte. The folio type allows a function to declare that it's expecting only a head page. Almost incidentally, this allows us to remove various calls to VM_BUG_ON(PageTail(page)) and compound_head(). This converts just parts of the core MM and the page cache. For 5.17, we intend to convert various filesystems (XFS and AFS are ready; other filesystems may make it) and also convert more of the MM and page cache to folios. For 5.18, multi-page folios should be ready. The multi-page folios offer some improvement to some workloads. The 80% win is real, but appears to be an artificial benchmark (postgres startup, which isn't a serious workload). Real workloads (eg building the kernel, running postgres in a steady state, etc) seem to benefit between 0-10%. I haven't heard of any performance losses as a result of this series. Nobody has done any serious performance tuning; I imagine that tweaking the readahead algorithm could provide some more interesting wins. There are also other places where we could choose to create large folios and currently do not, such as writes that are larger than PAGE_SIZE. I'd like to thank all my reviewers who've offered review/ack tags: Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil Babka, William Kucharski, Yu Zhao and Zi Yan. I'd also like to thank those who gave feedback I incorporated but haven't offered up review tags for this part of the series: Nick Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard, Hugh Dickins, and probably a few others who I forget" * tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits) mm/writeback: Add folio_write_one mm/filemap: Add FGP_STABLE mm/filemap: Add filemap_get_folio mm/filemap: Convert mapping_get_entry to return a folio mm/filemap: Add filemap_add_folio() mm/filemap: Add filemap_alloc_folio mm/page_alloc: Add folio allocation functions mm/lru: Add folio_add_lru() mm/lru: Convert __pagevec_lru_add_fn to take a folio mm: Add folio_evictable() mm/workingset: Convert workingset_refault() to take a folio mm/filemap: Add readahead_folio() mm/filemap: Add folio_mkwrite_check_truncate() mm/filemap: Add i_blocks_per_folio() mm/writeback: Add folio_redirty_for_writepage() mm/writeback: Add folio_account_redirty() mm/writeback: Add folio_clear_dirty_for_io() mm/writeback: Add folio_cancel_dirty() mm/writeback: Add folio_account_cleaned() mm/writeback: Add filemap_dirty_folio() ...
2021-10-29Merge tag 'for-5.15-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Last minute fixes for crash on 32bit architectures when compression is in use. It's a regression introduced in 5.15-rc and I'd really like not let this into the final release, fixes via stable trees would add unnecessary delay. The problem is on 32bit architectures with highmem enabled, the pages for compression may need to be kmapped, while the patches removed that as we don't use GFP_HIGHMEM allocations anymore. The pages that don't come from local allocation still may be from highmem. Despite being on 32bit there's enough such ARM machines in use so it's not a marginal issue. I did full reverts of the patches one by one instead of a huge one. There's one exception for the "lzo" revert as there was an intermediate patch touching the same code to make it compatible with subpage. I can't revert that one too, so the revert in lzo.c is manual. Qu Wenruo has worked on that with me and verified the changes" * tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Revert "btrfs: compression: drop kmap/kunmap from lzo" Revert "btrfs: compression: drop kmap/kunmap from zlib" Revert "btrfs: compression: drop kmap/kunmap from zstd" Revert "btrfs: compression: drop kmap/kunmap from generic helpers"
2021-10-29io-wq: remove worker to owner tw dependencyPavel Begunkov
INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds. Not tainted 5.15.0-rc5-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:iou-wrk-6609 state:D stack:27944 pid: 6612 ppid: 6526 flags:0x00004006 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0xb44/0x5960 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x176/0x280 kernel/sched/completion.c:138 io_worker_exit fs/io-wq.c:183 [inline] io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 io-wq worker may submit a task_work to the master task and upon io_worker_exit() wait for the tw to get executed. The problem appears when the master task is waiting in coredump.c: 468 freezer_do_not_count(); 469 wait_for_completion(&core_state->startup); 470 freezer_count(); Apparently having some dependency on children threads getting everything stuck. Workaround it by cancelling the taks_work callback that causes it before going into io_worker_exit() waiting. p.s. probably a better option is to not submit tw elevating the refcount in the first place, but let's leave this excercise for the future. Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/142a716f4ed936feae868959059154362bfa8c19.1635509451.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29io_uring: harder fdinfo sq/cq ring iteratingJens Axboe
The ring iteration is racy, which isn't necessarily a problem except it can cause us to iterate the whole thing. That isn't desired or ideal, and it can lead to excessive runtimes of reading fdinfo. Cap the iteration at tail - head OR the ring size. While in there, clean up the ring masking and just dump the raw values along with the masks. That provides more useful debug info. Fixes: 83f84356bc8f ("io_uring: add more uring info to fdinfo for debug") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29Revert "btrfs: compression: drop kmap/kunmap from lzo"David Sterba
This reverts commit 8c945d32e60427cbc0859cf7045bbe6196bb03d8. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. The revert does not apply cleanly due to changes in a6e66e6f8c1b ("btrfs: rework lzo_decompress_bio() to make it subpage compatible") that reworked the page iteration so the revert is done to be equivalent to the original code. Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Tested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29Revert "btrfs: compression: drop kmap/kunmap from zlib"David Sterba
This reverts commit 696ab562e6df9fbafd6052d8ce4aafcb2ed16069. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29Revert "btrfs: compression: drop kmap/kunmap from zstd"David Sterba
This reverts commit bbaf9715f3f5b5ff0de71da91fcc34ee9c198ed8. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. Example stacktrace with ZSTD on a 32bit ARM machine: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c4159ed3 [00000000] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 210 Comm: kworker/u2:3 Not tainted 5.14.0-rc79+ #12 Hardware name: Allwinner sun4i/sun5i Families Workqueue: btrfs-delalloc btrfs_work_helper PC is at mmiocpy+0x48/0x330 LR is at ZSTD_compressStream_generic+0x15c/0x28c (mmiocpy) from [<c0629648>] (ZSTD_compressStream_generic+0x15c/0x28c) (ZSTD_compressStream_generic) from [<c06297dc>] (ZSTD_compressStream+0x64/0xa0) (ZSTD_compressStream) from [<c049444c>] (zstd_compress_pages+0x170/0x488) (zstd_compress_pages) from [<c0496798>] (btrfs_compress_pages+0x124/0x12c) (btrfs_compress_pages) from [<c043c068>] (compress_file_range+0x3c0/0x834) (compress_file_range) from [<c043c4ec>] (async_cow_start+0x10/0x28) (async_cow_start) from [<c0475c3c>] (btrfs_work_helper+0x100/0x230) (btrfs_work_helper) from [<c014ef68>] (process_one_work+0x1b4/0x418) (process_one_work) from [<c014f210>] (worker_thread+0x44/0x524) (worker_thread) from [<c0156aa4>] (kthread+0x180/0x1b0) (kthread) from [<c0100150>] Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-28ocfs2: fix race between searching chunks and release journal_head from ↵Gautham Ananthakrishna
buffer_head Encountered a race between ocfs2_test_bg_bit_allocatable() and jbd2_journal_put_journal_head() resulting in the below vmcore. PID: 106879 TASK: ffff880244ba9c00 CPU: 2 COMMAND: "loop3" Call trace: panic oops_end no_context __bad_area_nosemaphore bad_area_nosemaphore __do_page_fault do_page_fault page_fault [exception RIP: ocfs2_block_group_find_clear_bits+316] ocfs2_block_group_find_clear_bits [ocfs2] ocfs2_cluster_group_search [ocfs2] ocfs2_search_chain [ocfs2] ocfs2_claim_suballoc_bits [ocfs2] __ocfs2_claim_clusters [ocfs2] ocfs2_claim_clusters [ocfs2] ocfs2_local_alloc_slide_window [ocfs2] ocfs2_reserve_local_alloc_bits [ocfs2] ocfs2_reserve_clusters_with_limit [ocfs2] ocfs2_reserve_clusters [ocfs2] ocfs2_lock_refcount_allocators [ocfs2] ocfs2_make_clusters_writable [ocfs2] ocfs2_replace_cow [ocfs2] ocfs2_refcount_cow [ocfs2] ocfs2_file_write_iter [ocfs2] lo_rw_aio loop_queue_work kthread_worker_fn kthread ret_from_fork When ocfs2_test_bg_bit_allocatable() called bh2jh(bg_bh), the bg_bh->b_private NULL as jbd2_journal_put_journal_head() raced and released the jounal head from the buffer head. Needed to take bit lock for the bit 'BH_JournalHead' to fix this race. Link: https://lkml.kernel.org/r/1634820718-6043-1-git-send-email-gautham.ananthakrishna@oracle.com Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <rajesh.sivaramasubramaniom@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-27Revert "btrfs: compression: drop kmap/kunmap from generic helpers"David Sterba
This reverts commit 4c2bf276b56d8d27ddbafcdf056ef3fc60ae50b0. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26io_uring: don't assign write hint in the read pathJens Axboe
Move this out of the generic read/write prep path, and place it in the write specific kiocb setup instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26fs: remove leftover comments from mandatory locking removalJeff Layton
Stragglers from commit f7e33bdbd6d1 ("fs: remove mandatory file locking support"). Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-10-25io_uring: clusterise ki_flags access in rw_prepPavel Begunkov
ioprio setup doesn't depend on other fields that are modified in io_prep_rw() and we can move it down in the function without worrying about performance. It's useful as it makes iocb->ki_flags accesses/modifications closer together, so it's more likely the compiler will cache it in a register and avoid extra reloads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8ee98779c06f1b59f6039b1e292db4332efd664b.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: kill unused param from io_file_supports_nowaitPavel Begunkov
io_file_supports_nowait() doesn't use rw argument anymore, remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4bd6709fc573d70c866ea656cb7a7dbe94be8026.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean up timeout async_data allocationPavel Begunkov
opcode prep functions are one of the first things that are called, we can't have ->async_data allocated at this point and it's certainly a bug. Reflect this assumption in io_timeout_prep() and add a WARN_ONCE just in case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/75a28ca7dbcc5af8b6cd9092819e8384c24dedd4.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: don't try io-wq polling if not supportedPavel Begunkov
If an opcode doesn't support polling, just let it be executed synchronously in iowq, otherwise it will do a nonblock attempt just to fail in io_arm_poll_handler() and return back to blocking execution. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6401256db01b88f448f15fcd241439cb76f5b940.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: check if opcode needs poll first on armingPavel Begunkov
->pollout or ->pollin are set only for opcodes that need a file, so if io_arm_poll_handler() tests them first we can be sure that the request has file set and the ->file check can be removed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9adfe4f543d984875e516fce6da35348aab48668.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean iowq submit work cancellationPavel Begunkov
If we've got IO_WQ_WORK_CANCEL in io_wq_submit_work(), handle the error on the same lines as the check instead of having a weird code flow. The main loop doesn't change but goes one indention left. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff4a09cf41f7a22bbb294b6f1faea721e21fe615.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean io_wq_submit_work()'s main loopPavel Begunkov
Do a bit of cleaning for the main loop of io_wq_submit_work(). Get rid of switch, just replace it with a single if as we're retrying in both other cases. Kill issue_sqe label, Get rid of needs_poll nesting and disambiguate a bit the comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ed12ce0c64e051f9a6b8a37a24f8ea554d299c29.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-24Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull autofs fix from Al Viro: "Fix for a braino of mine (in getting rid of open-coded dentry_path_raw() in autofs a couple of cycles ago). Mea culpa... Obvious -stable fodder" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: autofs: fix wait name hash calculation in autofs_wait()
2021-10-24Merge tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd fixes from Steve French: "Ten fixes for the ksmbd kernel server, for improved security and additional buffer overflow checks: - a security improvement to session establishment to reduce the possibility of dictionary attacks - fix to ensure that maximum i/o size negotiated in the protocol is not less than 64K and not more than 8MB to better match expected behavior - fix for crediting (flow control) important to properly verify that sufficient credits are available for the requested operation - seven additional buffer overflow, buffer validation checks" * tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: add buffer validation in session setup ksmbd: throttle session setup failures to avoid dictionary attacks ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests ksmbd: validate credit charge after validating SMB2 PDU body size ksmbd: add buffer validation for smb direct ksmbd: limit read/write/trans buffer size not to exceed 8MB ksmbd: validate compound response buffer ksmbd: fix potencial 32bit overflow from data area check in smb2_write ksmbd: improve credits management ksmbd: add validation in smb2_ioctl
2021-10-23io-wq: use helper for worker refcountingPavel Begunkov
Use io_worker_release() instead of hand coding it in io_worker_exit(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6f95f09d2cdbafcbb2e22ad0d1a2bc4d3962bf65.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22Merge tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Two fixes for the max workers limit API that was introduced this series: one fix for an issue with that code, and one fixing a linked timeout regression in this series" * tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block: io_uring: apply worker limits to previous users io_uring: fix ltimeout unprep io_uring: apply max_workers limit to all future users io-wq: max_worker fixes
2021-10-22io_uring: implement async hybrid mode for pollable requestsHao Xu
The current logic of requests with IOSQE_ASYNC is first queueing it to io-worker, then execute it in a synchronous way. For unbound works like pollable requests(e.g. read/write a socketfd), the io-worker may stuck there waiting for events for a long time. And thus other works wait in the list for a long time too. Let's introduce a new way for unbound works (currently pollable requests), with this a request will first be queued to io-worker, then executed in a nonblock try rather than a synchronous way. Failure of that leads it to arm poll stuff and then the worker can begin to handle other works. The detail process of this kind of requests is: step1: original context: queue it to io-worker step2: io-worker context: nonblock try(the old logic is a synchronous try here) | |--fail--> arm poll | |--(fail/ready)-->synchronous issue | |--(succeed)-->worker finish it's job, tw take over the req This works much better than the old IOSQE_ASYNC logic in cases where unbound max_worker is relatively small. In this case, number of io-worker eazily increments to max_worker, new worker cannot be created and running workers stuck there handling old works in IOSQE_ASYNC mode. In my 64-core machine, set unbound max_worker to 20, run echo-server, turns out: (arguments: register_file, connetion number is 1000, message size is 12 Byte) original IOSQE_ASYNC: 76664.151 tps after this patch: 166934.985 tps Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20211018133445.103438-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22Merge tag 'fuse-fixes-5.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Syzbot discovered a race in case of reusing the fuse sb (introduced in this cycle). Fix it by doing the s_fs_info initialization at the proper place" * tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: clean up error exits in fuse_fill_super() fuse: always initialize sb->s_fs_info fuse: clean up fuse_mount destruction fuse: get rid of fuse_put_super() fuse: check s_root when destroying sb
2021-10-21io_uring: apply worker limits to previous usersPavel Begunkov
Another change to the API io-wq worker limitation API added in 5.15, apply the limit to all prior users that already registered a tctx. It may be confusing as it's now, in particular the change covers the following 2 cases: TASK1 | TASK2 _________________________________________________ ring = create() | | limit_iowq_workers() *not limited* | TASK1 | TASK2 _________________________________________________ ring = create() | | issue_requests() limit_iowq_workers() | | *not limited* A note on locking, it's safe to traverse ->tctx_list as we hold ->uring_lock, but do that after dropping sqd->lock to avoid possible problems. It's also safe to access tctx->io_wq there because tasks kill it only after removing themselves from tctx_list, see io_uring_cancel_generic() -> io_uring_clean_tctx() Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d6e09ecc3545e4dc56e43c906ee3d71b7ae21bed.1634818641.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21fuse: clean up error exits in fuse_fill_super()Miklos Szeredi
Instead of "goto err", return error directly, since there's no error cleanup to do now. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: always initialize sb->s_fs_infoMiklos Szeredi
Syzkaller reports a null pointer dereference in fuse_test_super() that is caused by sb->s_fs_info being NULL. This is due to the fact that fuse_fill_super() is initializing s_fs_info, which is too late, it's already on the fs_supers list. The initialization needs to be done in sget_fc() with the sb_lock held. Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into fuse_get_tree(). After this ->kill_sb() will always be called with non-NULL ->s_fs_info, hence fuse_mount_destroy() can drop the test for non-NULL "fm". Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com Fixes: 5d5b74aa9c76 ("fuse: allow sharing existing sb") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: clean up fuse_mount destructionMiklos Szeredi
1. call fuse_mount_destroy() for open coded variants 2. before deactivate_locked_super() don't need fuse_mount destruction since that will now be done (if ->s_fs_info is not cleared) 3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the regular pattern can be used Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: get rid of fuse_put_super()Miklos Szeredi
The ->put_super callback is called from generic_shutdown_super() in case of a fully initialized sb. This is called from kill_***_super(), which is called from ->kill_sb instances. Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the reference to the fuse_conn, while it does the same on each error case during sb setup. This patch moves the destruction from fuse_put_super() to fuse_mount_destroy(), called at the end of all ->kill_sb instances. A follup patch will clean up the error paths. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: check s_root when destroying sbMiklos Szeredi
Checking "fm" works because currently sb->s_fs_info is cleared on error paths; however, sb->s_root is what generic_shutdown_super() checks to determine whether the sb was fully initialized or not. This change will allow cleanup of sb setup error paths. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-20autofs: fix wait name hash calculation in autofs_wait()Ian Kent
There's a mistake in commit 2be7828c9fefc ("get rid of autofs_getpath()") that affects kernels from v5.13.0, basically missed because of me not fully testing the change for Al. The problem is that the hash calculation for the wait name qstr hasn't been updated to account for the change to use dentry_path_raw(). This prevents the correct matching an existing wait resulting in multiple notifications being sent to the daemon for the same mount which must not occur. The problem wasn't discovered earlier because it only occurs when multiple processes trigger a request for the same mount concurrently so it only shows up in more aggressive testing. Fixes: 2be7828c9fefc ("get rid of autofs_getpath()") Cc: stable@vger.kernel.org Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-10-20Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two important filesystem fixes, marked for stable. The blocklisted superblocks issue was particularly annoying because for unexperienced users it essentially exacted a reboot to establish a new functional mount in that scenario" * tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client: ceph: fix handling of "meta" errors ceph: skip existing superblocks that are blocklisted or shut down when mounting
2021-10-20block: cleanup the flush plug helpersChristoph Hellwig
Consolidate the various helpers into a single blk_flush_plug helper that takes a plk_plug and the from_scheduler bool and switch all callsites to call it directly. Checks that the plug is non-NULL must be performed by the caller, something that most already do anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20io_uring: fix ltimeout unprepPavel Begunkov
io_unprep_linked_timeout() is broken, first it needs to return back REQ_F_ARM_LTIMEOUT, so the linked timeout is enqueued and disarmed. But now we refcounted it, and linked timeouts may get not executed at all, leaking a request. Just kill the unprep optimisation. Fixes: 906c6caaf586 ("io_uring: optimise io_prep_linked_timeout()") Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/51b8e2bfc4bea8ee625cf2ba62b2a350cc9be031.1634719585.git.asml.silence@gmail.com Link: https://github.com/axboe/liburing/issues/460 Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20io_uring: apply max_workers limit to all future usersPavel Begunkov
Currently, IORING_REGISTER_IOWQ_MAX_WORKERS applies only to the task that issued it, it's unexpected for users. If one task creates a ring, limits workers and then passes it to another task the limit won't be applied to the other task. Another pitfall is that a task should either create a ring or submit at least one request for IORING_REGISTER_IOWQ_MAX_WORKERS to work at all, furher complicating the picture. Change the API, save the limits and apply to all future users. Note, it should be done first before giving away the ring or submitting new requests otherwise the result is not guaranteed. Fixes: 2e480058ddc2 ("io-wq: provide a way to limit max number of workers") Link: https://github.com/axboe/liburing/issues/460 Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/51d0bae97180e08ab722c0d5c93e7439cfb6f697.1634683237.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20io_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR())Changcheng Deng
Use ERR_CAST() instead of ERR_PTR(PTR_ERR()). This makes it more readable and also fix this warning detected by err_cast.cocci: ./fs/io_uring.c: WARNING: 3208: 11-18: ERR_CAST can be used with buf Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Link: https://lore.kernel.org/r/20211020084948.1038420-1-deng.changcheng@zte.com.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20ksmbd: add buffer validation in session setupMarios Makassikis
Make sure the security buffer's length/offset are valid with regards to the packet length. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-20ksmbd: throttle session setup failures to avoid dictionary attacksNamjae Jeon
To avoid dictionary attacks (repeated session setups rapidly sent) to connect to server, ksmbd make a delay of a 5 seconds on session setup failure to make it harder to send enough random connection requests to break into a server if a user insert the wrong password 10 times in a row. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-20ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requestsHyunchul Lee
Validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests and check the free size of response buffer for these requests. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-19io_uring: split logic of force_nonblockHao Xu
Currently force_nonblock stands for three meanings: - nowait or not - in an io-worker or not(hold uring_lock or not) Let's split the logic to two flags, IO_URING_F_NONBLOCK and IO_URING_F_UNLOCKED for convenience of the next patch. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20211018133431.103298-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io-wq: max_worker fixesPavel Begunkov
First, fix nr_workers checks against max_workers, with max_worker registration, it may pretty easily happen that nr_workers > max_workers. Also, synchronise writing to acct->max_worker with wqe->lock. It's not an actual problem, but as we don't care about io_wqe_create_worker(), it's better than WRITE_ONCE()/READ_ONCE(). Fixes: 2e480058ddc2 ("io-wq: provide a way to limit max number of workers") Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/11f90e6b49410b7d1a88f5d04fb8d95bb86b8cf3.1634671835.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19locks: remove changelog commentsJ. Bruce Fields
This is only of historical interest, and anyone interested in the history can dig out an old version of locks.c from from git. Triggered by the observation that it references the now-removed Documentation/filesystems/mandatory-locking.rst. Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-10-19io_uring: warning about unused-but-set parameterArnd Bergmann
When enabling -Wunused warnings by building with W=1, I get an instance of the -Wunused-but-set-parameter warning in the io_uring code: fs/io_uring.c: In function 'io_queue_async_work': fs/io_uring.c:1445:61: error: parameter 'locked' set but not used [-Werror=unused-but-set-parameter] 1445 | static void io_queue_async_work(struct io_kiocb *req, bool *locked) | ~~~~~~^~~~~~ There are very few warnings of this type, so it would be nice to enable this by default and fix all the existing instances. As the assignment serves no purpose by itself other than to prevent developers from using the variable, an easy workaround is to remove the assignment and just rename the argument to "dont_use". Fixes: f237c30a5610 ("io_uring: batch task work locking") Link: https://lore.kernel.org/lkml/20210920121352.93063-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20211019153507.348480-1-arnd@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: inform block layer of how many requests we are submittingJens Axboe
The block layer can use this knowledge to make smarter decisions on how to handle the request, if it knows that N more may be coming. Switch to using blk_start_plug_nr_ios() to pass in that information. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: simplify io_file_supports_nowait()Pavel Begunkov
Make sure that REQ_F_SUPPORT_NOWAIT is always set io_prep_rw(), and so we can stop caring about setting it down the line simplifying io_file_supports_nowait(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60c8f1f5e2cb45e00f4897b2cec10c5b3669da91.1634425438.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flagsPavel Begunkov
Merge REQ_F_NOWAIT_READ and REQ_F_NOWAIT_WRITE into one flag, i.e. REQ_F_SUPPORT_NOWAIT. First it gets rid of dependence on CONFIG_64BIT but also simplifies the code. One thing to consider is when we don't have ->{read,write}_iter and go through loop_rw_iter(). Just fail it with -EAGAIN if we expect nowait behaviour but not sure whether it supports it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f832a20e5186c2e79c6519280c238f559a1d2bbc.1634425438.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: arm poll for non-nowait filesPavel Begunkov
Don't check if we can do nowait before arming apoll, there are several reasons for that. First, we don't care much about files that don't support nowait. Second, it may be useful -- we don't want to be taking away extra workers from io-wq when it can go in some async. Even if it will go through io-wq eventually, it make difference in the numbers of workers actually used. And the last one, it's needed to clean nowait in future commits. [kernel test robot: fix unused-var] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9d06f3cb2c8b686d970269a87986f154edb83043.1634425438.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>