summaryrefslogtreecommitdiff
path: root/include/linux/fsnotify_backend.h
AgeCommit message (Collapse)Author
2025-02-04fsnotify: add mount notification infrastructureMiklos Szeredi
This is just the plumbing between the event source (fs/namespace.c) and the event consumer (fanotify). In itself it does nothing. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129165803.72138-2-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-10fsnotify: pass optional file access range in pre-content eventAmir Goldstein
We would like to add file range information to pre-content events. Pass a struct file_range with offset and length to event handler along with pre-content permission event. The offset and length are aligned to page size, but we may need to align them to minimum folio size for filesystems with large block size. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/88eddee301231d814aede27fb4d5b41ae37c9702.1731684329.git.josef@toxicpanda.com
2024-12-10fsnotify: introduce pre-content permission eventsAmir Goldstein
The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM, but it meant for a different use case of filling file content before access to a file range, so it has slightly different semantics. Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, so content scanners could inspect the content filled by pre-content event handler. Unlike FS_ACCESS_PERM, FS_PRE_ACCESS is also called before a file is modified by syscalls as write() and fallocate(). FS_ACCESS_PERM is reported also on blockdev and pipes, but the new pre-content events are only reported for regular files and dirs. The pre-content events are meant to be used by hierarchical storage managers that want to fill the content of files on first access. There are some specific requirements from filesystems that could be used with pre-content events, so add a flag for fs to opt-in for pre-content events explicitly before they can be used. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/b934c5e3af205abc4e0e4709f6486815937ddfdf.1731684329.git.josef@toxicpanda.com
2024-12-10fanotify: reserve event bit of deprecated FAN_DIR_MODIFYAmir Goldstein
Avoid reusing it, because we would like to reserve it for future FAN_PATH_MODIFY pre-content event. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/632d9f80428e2e7a6b6a8ccc2925d87c92bbb518.1731684329.git.josef@toxicpanda.com
2024-12-10fsnotify: check if file is actually being watched for pre-content events on openAmir Goldstein
So far, we set FMODE_NONOTIFY_ flags at open time if we know that there are no permission event watchers at all on the filesystem, but lack of FMODE_NONOTIFY_ flags does not mean that the file is actually watched. For pre-content events, it is possible to optimize things so that we don't bother trying to send pre-content events if file was not watched (through sb, mnt, parent or inode itself) on open. Set FMODE_NONOTIFY_ flags according to that. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/2ddcc9f8d1fde48d085318a6b5a889289d8871d8.1731684329.git.josef@toxicpanda.com
2024-10-02inotify: Fix possible deadlock in fsnotify_destroy_markLizhi Xu
[Syzbot reported] WARNING: possible circular locking dependency detected 6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted ------------------------------------------------------ kswapd0/78 is trying to acquire lock: ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 but task is already holding lock: ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline] ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: ... kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044 inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline] inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline] __do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline] __se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&group->mark_mutex){+.+.}-{3:3}: ... __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934 fsnotify_inoderemove include/linux/fsnotify.h:264 [inline] dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403 __dentry_kill+0x20d/0x630 fs/dcache.c:610 shrink_kill+0xa9/0x2c0 fs/dcache.c:1055 shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082 prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163 super_cache_scan+0x34f/0x4b0 fs/super.c:221 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4815 shrink_many mm/vmscan.c:4876 [inline] lru_gen_shrink_node mm/vmscan.c:4954 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5934 kswapd_shrink_node mm/vmscan.c:6762 [inline] balance_pgdat mm/vmscan.c:6954 [inline] kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223 [Analysis] The problem is that inotify_new_watch() is using GFP_KERNEL to allocate new watches under group->mark_mutex, however if dentry reclaim races with unlinking of an inode, it can end up dropping the last dentry reference for an unlinked inode resulting in removal of fsnotify mark from reclaim context which wants to acquire group->mark_mutex as well. This scenario shows that all notification groups are in principle prone to this kind of a deadlock (previously, we considered only fanotify and dnotify to be problematic for other reasons) so make sure all allocations under group->mark_mutex happen with GFP_NOFS. Reported-and-tested-by: syzbot+c679f13773f295d2da53@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53 Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240927143642.2369508-1-lizhi.xu@windriver.com
2024-06-05fsnotify: clear PARENT_WATCHED flags lazilyAmir Goldstein
In some setups directories can have many (usually negative) dentries. Hence __fsnotify_update_child_dentry_flags() function can take a significant amount of time. Since the bulk of this function happens under inode->i_lock this causes a significant contention on the lock when we remove the watch from the directory as the __fsnotify_update_child_dentry_flags() call from fsnotify_recalc_mask() races with __fsnotify_update_child_dentry_flags() calls from __fsnotify_parent() happening on children. This can lead upto softlockup reports reported by users. Fix the problem by calling fsnotify_update_children_dentry_flags() to set PARENT_WATCHED flags only when parent starts watching children. When parent stops watching children, clear false positive PARENT_WATCHED flags lazily in __fsnotify_parent() for each accessed child. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2024-04-17fsnotify: fix UAF from FS_ERROR event on a shutting down filesystemAmir Goldstein
Protect against use after free when filesystem calls fsnotify_sb_error() during fs shutdown. Move freeing of sb->s_fsnotify_info to destroy_super_work(), because it may be accessed from fs shutdown context. Reported-by: syzbot+5e3f9b2a67b45f16d4e6@syzkaller.appspotmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/linux-fsdevel/20240416173211.4lnmgctyo4jn5fha@quack3/ Fixes: 07a3b8d0bf72 ("fsnotify: lazy attach fsnotify_sb_info state to sb") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240416181452.567070-1-amir73il@gmail.com>
2024-04-04fsnotify: optimize the case of no permission event watchersAmir Goldstein
Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type") optimized the case where there are no fsnotify watchers on any of the filesystem's objects. It is quite common for a system to have a single local filesystem and it is quite common for the system to have some inotify watches on some config files or directories, so the optimization of no marks at all is often not in effect. Permission event watchers, which require high priority group are more rare, so optimizing the case of no marks og high priority groups can improve performance for more systems, especially for performance sensitive io workloads. Count per-sb watched objects by high priority groups and use that the optimize out the call to __fsnotify_parent() and fsnotify() in fsnotify permission hooks. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-11-amir73il@gmail.com>
2024-04-04fsnotify: use an enum for group priority constantsAmir Goldstein
And use meaningfull names for the constants. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-10-amir73il@gmail.com>
2024-04-04fsnotify: move s_fsnotify_connectors into fsnotify_sb_infoAmir Goldstein
Move the s_fsnotify_connectors counter into the per-sb fsnotify state. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-9-amir73il@gmail.com>
2024-04-04fsnotify: lazy attach fsnotify_sb_info state to sbAmir Goldstein
Define a container struct fsnotify_sb_info to hold per-sb state, including the reference to sb marks connector. Allocate the fsnotify_sb_info state before attaching connector to any object on the sb and free it only when killing sb. This state is going to be used for storing per priority watched objects counters. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-8-amir73il@gmail.com>
2024-04-04fsnotify: create helper fsnotify_update_sb_watchers()Amir Goldstein
We would like to count watched objects by priority group, so we will need to update the watched object counter after adding/removing marks. Create a helper fsnotify_update_sb_watchers() and call it after attaching/detaching a mark, instead of fsnotify_{get,put}_sb_watchers() only after attaching/detaching a connector. Soon, we will use this helper to count watched objects by the highest watching priority group. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-7-amir73il@gmail.com>
2024-04-04fsnotify: pass object pointer and type to fsnotify mark helpersAmir Goldstein
Instead of passing fsnotify_connp_t, pass the pointer to the marked object. Store the object pointer in the connector and move the definition of fsnotify_connp_t to internal fsnotify subsystem API, so it is no longer used by fsnotify backends. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-6-amir73il@gmail.com>
2024-04-04fsnotify: create a wrapper fsnotify_find_inode_mark()Amir Goldstein
In preparation to passing an object pointer to fsnotify_find_mark(), add a wrapper fsnotify_find_inode_mark() and use it where possible. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-4-amir73il@gmail.com>
2024-04-04fsnotify: rename fsnotify_{get,put}_sb_connectors()Amir Goldstein
Instead of counting the number of connectors in an sb, we would like to count the number of watched objects per priority group. As a start, create an accessor fsnotify_sb_watched_objects() to s_fsnotify_connectors and rename the fsnotify_{get,put}_sb_connectors() helpers to fsnotify_{get,put}_sb_watchers() to better describes the counter. Increment the counter at the end of fsnotify_attach_connector_to_object() if connector was attached instead of decrementing it on race to connect. This is fine, because fsnotify_delete_sb() cannot be running in parallel to fsnotify_attach_connector_to_object() which requires a reference to a filesystem object. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240317184154.1200192-2-amir73il@gmail.com>
2024-03-06fsnotify: Fix misspelling of "writable"Vicki Pfau
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the fsnotify header. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240306020831.1404033-2-vi@endrift.com>
2023-12-01fanotify: allow "weak" fsid when watching a single filesystemAmir Goldstein
So far, fanotify returns -ENODEV or -EXDEV when trying to set a mark on a filesystem with a "weak" fsid, namely, zero fsid (e.g. fuse), or non-uniform fsid (e.g. btrfs non-root subvol). When group is watching inodes all from the same filesystem (or subvol), allow adding inode marks with "weak" fsid, because there is no ambiguity regarding which filesystem reports the event. The first mark added to a group determines if this group is single or multi filesystem, depending on the fsid at the path of the added mark. If the first mark added has a "strong" fsid, marks with "weak" fsid cannot be added and vice versa. If the first mark added has a "weak" fsid, following marks must have the same "weak" fsid and the same sb as the first mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20231130165619.3386452-3-amir73il@gmail.com>
2023-12-01fanotify: store fsid in mark instead of in connectorAmir Goldstein
Some filesystems like fuse and nfs have zero or non-unique fsid. We would like to avoid reporting ambiguous fsid in events, so we need to avoid marking objects with same fsid and different sb. To make this easier to enforce, store the fsid in the marks of the group instead of in the shared conenctor. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20231130165619.3386452-2-amir73il@gmail.com>
2023-08-01fanotify: Remove unused extern declaration fsnotify_get_conn_fsid()YueHaibing
This is never used, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230725135528.25996-1-yuehaibing@huawei.com>
2022-07-01fanotify: prepare for setting event flags in ignore maskAmir Goldstein
Setting flags FAN_ONDIR FAN_EVENT_ON_CHILD in ignore mask has no effect. The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and ignore mask is always implicitly applied to events on directories. Define a mark flag that replaces this legacy behavior with logic of applying the ignore mask according to event flags in ignore mask. Implement the new logic to prepare for supporting an ignore mask that ignores events on children and ignore mask that does not ignore events on directories. To emphasize the change in terminology, also rename ignored_mask mark member to ignore_mask and use accessors to get only the effective ignored events or the ignored events and flags. This change in terminology finally aligns with the "ignore mask" language in man pages and in most of the comments. Link: https://lore.kernel.org/r/20220629144210.2983229-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-05-18fsnotify: introduce mark type iteratorAmir Goldstein
fsnotify_foreach_iter_mark_type() is used to reduce boilerplate code of iterating all marks of a specific group interested in an event by consulting the iterator report_mask. Use an open coded version of that iterator in fsnotify_iter_next() that collects all marks of the current iteration group without consulting the iterator report_mask. At the moment, the two iterator variants are the same, but this decoupling will allow us to exclude some of the group's marks from reporting the event, for example for event on child and inode marks on parent did not request to watch events on children. Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara <jack@suse.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220511190213.831646-2-amir73il@gmail.com
2022-04-25fsnotify: allow adding an inode mark without pinning inodeAmir Goldstein
fsnotify_add_mark() and variants implicitly take a reference on inode when attaching a mark to an inode. Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF. Instead of taking the inode reference when attaching connector to inode and dropping the inode reference when detaching connector from inode, take the inode reference on attach of the first mark that wants to hold an inode reference and drop the inode reference on detach of the last mark that wants to hold an inode reference. Backends can "upgrade" an existing mark to take an inode reference, but cannot "downgrade" a mark with inode reference to release the refernce. This leaves the choice to the backend whether or not to pin the inode when adding an inode mark. This is intended to be used when adding a mark with ignored mask that is used for optimization in cases where group can afford getting unneeded events and reinstate the mark with ignored mask when inode is accessed again after being evicted. Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25fsnotify: create helpers for group mark_mutex lockAmir Goldstein
Create helpers to take and release the group mark_mutex lock. Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines if the mark_mutex lock is fs reclaim safe or not. If not safe, the lock helpers take the lock and disable direct fs reclaim. In that case we annotate the mutex with a different lockdep class to express to lockdep that an allocation of mark of an fs reclaim safe group may take the group lock of another "NOFS" group to evict inodes. For now, converted only the callers in common code and no backend defines the NOFS flag. It is intended to be set by fanotify for evictable marks support. Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25fsnotify: make allow_dups a property of the groupAmir Goldstein
Instead of passing the allow_dups argument to fsnotify_add_mark() as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express the allow_dups behavior and set this behavior at group creation time for all calls of fsnotify_add_mark(). Rename the allow_dups argument to generic add_flags argument for future use. Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25fsnotify: pass flags argument to fsnotify_alloc_group()Amir Goldstein
Add flags argument to fsnotify_alloc_group(), define and use the flag FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper fsnotify_alloc_user_group() to indicate user allocation. Although the flag FSNOTIFY_GROUP_USER is currently not used after group allocation, we store the flags argument in the group struct for future use of other group flags. Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25inotify: move control flags from mask to mark flagsAmir Goldstein
The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not relevant to object interest mask, so move them to the mark flags. This frees up some bits in the object interest mask. Link: https://lore.kernel.org/r/20220422120327.3459282-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-02-24fsnotify: optimize FS_MODIFY events with no ignored masksAmir Goldstein
fsnotify() treats FS_MODIFY events specially - it does not skip them even if the FS_MODIFY event does not apear in the object's fsnotify mask. This is because send_to_group() checks if FS_MODIFY needs to clear ignored mask of marks. The common case is that an object does not have any mark with ignored mask and in particular, that it does not have a mark with ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag. Set FS_MODIFY in object's fsnotify mask during fsnotify_recalc_mask() if object has a mark with an ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag and remove the special treatment of FS_MODIFY in fsnotify(), so that FS_MODIFY events could be optimized in the common case. Call fsnotify_recalc_mask() from fanotify after adding or removing an ignored mask from a mark without FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY or when adding the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag to a mark with ignored mask (the flag cannot be removed by fanotify uapi). Performance results for doing 10000000 write(2)s to tmpfs: vanilla patched without notification mark 25.486+-1.054 24.965+-0.244 with notification mark 30.111+-0.139 26.891+-1.355 So we can see the overhead of notification subsystem has been drastically reduced. Link: https://lore.kernel.org/r/20220223151438.790268-3-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-02-24fsnotify: fix merge with parent's ignored maskAmir Goldstein
fsnotify_parent() does not consider the parent's mark at all unless the parent inode shows interest in events on children and in the specific event. So unless parent added an event to both its mark mask and ignored mask, the event will not be ignored. Fix this by declaring the interest of an object in an event when the event is in either a mark mask or ignored mask. Link: https://lore.kernel.org/r/20220223151438.790268-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-12-15fsnotify: generate FS_RENAME event with rich informationAmir Goldstein
The dnotify FS_DN_RENAME event is used to request notification about a move within the same parent directory and was always coupled with the FS_MOVED_FROM event. Rename the FS_DN_RENAME event flag to FS_RENAME, decouple it from FS_MOVED_FROM and report it with the moved dentry instead of the moved inode, so it has the information about both old and new parent and name. Generate the FS_RENAME event regardless of same parent dir and apply the "same parent" rule in the generic fsnotify_handle_event() helper that is used to call backends with ->handle_inode_event() method (i.e. dnotify). The ->handle_inode_event() method is not rich enough to report both old and new parent and name anyway. The enriched event is reported to fanotify over the ->handle_event() method with the old and new dir inode marks in marks array slots for ITER_TYPE_INODE and a new iter type slot ITER_TYPE_INODE2. The enriched event will be used for reporting old and new parent+name to fanotify groups with FAN_RENAME events. Link: https://lore.kernel.org/r/20211129201537.1932819-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-12-15fsnotify: separate mark iterator type from object type enumAmir Goldstein
They are two different types that use the same enum, so this confusing. Use the object type to indicate the type of object mark is attached to and the iter type to indicate the type of watch. A group can have two different watches of the same object type (parent and child watches) that match the same event. Link: https://lore.kernel.org/r/20211129201537.1932819-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-12-15fsnotify: clarify object type argumentAmir Goldstein
In preparation for separating object type from iterator type, rename some 'type' arguments in functions to 'obj_type' and remove the unused interface to clear marks by object type mask. Link: https://lore.kernel.org/r/20211129201537.1932819-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fanotify: Pre-allocate pool of error eventsGabriel Krisman Bertazi
Pre-allocate slots for file system errors to have greater chances of succeeding, since error events can happen in GFP_NOFS context. This patch introduces a group-wide mempool of error events, shared by all FAN_FS_ERROR marks in this group. Link: https://lore.kernel.org/r/20211025192746.66445-20-krisman@collabora.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fsnotify: Support FS_ERROR event typeGabriel Krisman Bertazi
Expose a new type of fsnotify event for filesystems to report errors for userspace monitoring tools. fanotify will send this type of notification for FAN_FS_ERROR events. This also introduce a helper for generating the new event. Link: https://lore.kernel.org/r/20211025192746.66445-18-krisman@collabora.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fsnotify: Pass group argument to free_eventGabriel Krisman Bertazi
For group-wide mempool backed events, like FS_ERROR, the free_event callback will need to reference the group's mempool to free the memory. Wire that argument into the current callers. Link: https://lore.kernel.org/r/20211025192746.66445-13-krisman@collabora.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fsnotify: Protect fsnotify_handle_inode_event from no-inode eventsGabriel Krisman Bertazi
FAN_FS_ERROR allows events without inodes - i.e. for file system-wide errors. Even though fsnotify_handle_inode_event is not currently used by fanotify, this patch protects other backends from cases where neither inode or dir are provided. Also document the constraints of the interface (inode and dir cannot be both NULL). Link: https://lore.kernel.org/r/20211025192746.66445-12-krisman@collabora.com Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fsnotify: Retrieve super block from the data fieldGabriel Krisman Bertazi
Some file system events (i.e. FS_ERROR) might not be associated with an inode or directory. For these, we can retrieve the super block from the data field. But, since the super_block is available in the data field on every event type, simplify the code to always retrieve it from there, through a new helper. Link: https://lore.kernel.org/r/20211025192746.66445-11-krisman@collabora.com Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fsnotify: Add wrapper around fsnotify_add_eventGabriel Krisman Bertazi
fsnotify_add_event is growing in number of parameters, which in most case are just passed a NULL pointer. So, split out a new fsnotify_insert_event function to clean things up for users who don't need an insert hook. Link: https://lore.kernel.org/r/20211025192746.66445-10-krisman@collabora.com Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fsnotify: Add helper to detect overflow_eventGabriel Krisman Bertazi
Similarly to fanotify_is_perm_event and friends, provide a helper predicate to say whether a mask is of an overflow event. Link: https://lore.kernel.org/r/20211025192746.66445-9-krisman@collabora.com Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fsnotify: pass dentry instead of inode dataAmir Goldstein
Define a new data type to pass for event - FSNOTIFY_EVENT_DENTRY. Use it to pass the dentry instead of it's ->d_inode where available. This is needed in preparation to the refactor to retrieve the super block from the data field. In some cases (i.e. mkdir in kernfs), the data inode comes from a negative dentry, such that no super block information would be available. By receiving the dentry itself, instead of the inode, fsnotify can derive the super block even on these cases. Link: https://lore.kernel.org/r/20211025192746.66445-3-krisman@collabora.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> [Expand explanation in commit message] Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-16fanotify: configurable limits via sysfsAmir Goldstein
fanotify has some hardcoded limits. The only APIs to escape those limits are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS. Allow finer grained tuning of the system limits via sysfs tunables under /proc/sys/fs/fanotify, similar to tunables under /proc/sys/fs/inotify, with some minor differences. - max_queued_events - global system tunable for group queue size limit. Like the inotify tunable with the same name, it defaults to 16384 and applies on initialization of a new group. - max_user_marks - user ns tunable for marks limit per user. Like the inotify tunable named max_user_watches, on a machine with sufficient RAM and it defaults to 1048576 in init userns and can be further limited per containing user ns. - max_user_groups - user ns tunable for number of groups per user. Like the inotify tunable named max_user_instances, it defaults to 128 in init userns and can be further limited per containing user ns. The slightly different tunable names used for fanotify are derived from the "group" and "mark" terminology used in the fanotify man pages and throughout the code. Considering the fact that the default value for max_user_instances was increased in kernel v5.10 from 8192 to 1048576, leaving the legacy fanotify limit of 8192 marks per group in addition to the max_user_marks limit makes little sense, so the per group marks limit has been removed. Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own marks are not accounted in the per user marks account, so in effect the limit of max_user_marks is only for the collection of groups that are not initialized with FAN_UNLIMITED_MARKS. Link: https://lore.kernel.org/r/20210304112921.3996419-2-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-16fsnotify: use hash table for faster events mergeAmir Goldstein
In order to improve event merge performance, hash events in a 128 size hash table by the event merge key. The fanotify_event size grows by two pointers, but we just reduced its size by removing the objectid member, so overall its size is increased by one pointer. Permission events and overflow event are not merged so they are also not hashed. Link: https://lore.kernel.org/r/20210304104826.3993892-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-16fanotify: reduce event objectid to 29-bit hashAmir Goldstein
objectid is only used by fanotify backend and it is just an optimization for event merge before comparing all fields in event. Move the objectid member from common struct fsnotify_event into struct fanotify_event and reduce it to 29-bit hash to cram it together with the 3-bit event type. Events of different types are never merged, so the combination of event type and hash form a 32-bit key for fast compare of events. This reduces the size of events by one pointer and paves the way for adding hashed queue support for fanotify. Link: https://lore.kernel.org/r/20210304104826.3993892-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-16fsnotify: allow fsnotify_{peek,remove}_first_event with empty queueAmir Goldstein
Current code has an assumtion that fsnotify_notify_queue_is_empty() is called to verify that queue is not empty before trying to peek or remove an event from queue. Remove this assumption by moving the fsnotify_notify_queue_is_empty() into the functions, allow them to return NULL value and check return value by all callers. This is a prep patch for multi event queues. Link: https://lore.kernel.org/r/20210304104826.3993892-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-01-05inotify, memcg: account inotify instances to kmemcgShakeel Butt
Currently the fs sysctl inotify/max_user_instances is used to limit the number of inotify instances on the system. For systems running multiple workloads, the per-user namespace sysctl max_inotify_instances can be used to further partition inotify instances. However there is no easy way to set a sensible system level max limit on inotify instances and further partition it between the workloads. It is much easier to charge the underlying resource (i.e. memory) behind the inotify instances to the memcg of the workload and let their memory limits limit the number of inotify instances they can create. With inotify instances charged to memcg, the admin can simply set max_user_instances to INT_MAX and let the memcg limits of the jobs limit their inotify instances. Link: https://lore.kernel.org/r/20201220044608.1258123-1-shakeelb@google.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-12-11fsnotify: fix events reported to watching parent and childAmir Goldstein
fsnotify_parent() used to send two separate events to backends when a parent inode is watching children and the child inode is also watching. In an attempt to avoid duplicate events in fanotify, we unified the two backend callbacks to a single callback and handled the reporting of the two separate events for the relevant backends (inotify and dnotify). However the handling is buggy and can result in inotify and dnotify listeners receiving events of the type they never asked for or spurious events. The problem is the unified event callback with two inode marks (parent and child) is called when any of the parent and child inodes are watched and interested in the event, but the parent inode's mark that is interested in the event on the child is not necessarily the one we are currently reporting to (it could belong to a different group). So before reporting the parent or child event flavor to backend we need to check that the mark is really interested in that event flavor. The semantics of INODE and CHILD marks were hard to follow and made the logic more complicated than it should have been. Replace it with INODE and PARENT marks semantics to hopefully make the logic more clear. Thanks to Hugh Dickins for spotting a bug in the earlier version of this patch. Fixes: 497b0c5a7c06 ("fsnotify: send event to parent and child with single callback") CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201202120713.702387-4-amir73il@gmail.com Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-12-03fsnotify: generalize handle_inode_event()Amir Goldstein
The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backends conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback which also happen to be buggy code. Generalize the handle_inode_event() arguments and add the check for FS_EXCL_UNLINK flag to the generic helper, so inotify backend could be converted to use the simple interface. Link: https://lore.kernel.org/r/20201202120713.702387-2-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: create method handle_inode_event() in fsnotify_operationsAmir Goldstein
The method handle_event() grew a lot of complexity due to the design of fanotify and merging of ignore masks. Most backends do not care about this complex functionality, so we can hide this complexity from them. Introduce a method handle_inode_event() that serves those backends and passes a single inode mark and less arguments. This change converts all backends except fanotify and inotify to use the simplified handle_inode_event() method. In pricipal, inotify could have also used the new method, but that would require passing more arguments on the simple helper (data, data_type, cookie), so we leave it with the handle_event() method. Link: https://lore.kernel.org/r/20200722125849.17418-9-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: send event with parent/name info to sb/mount/non-dir marksAmir Goldstein
Similar to events "on child" to watching directory, send event with parent/name info if sb/mount/non-dir marks are interested in parent/name info. The FS_EVENT_ON_CHILD flag can be set on sb/mount/non-dir marks to specify interest in parent/name info for events on non-directory inodes. Events on "orphan" children (disconnected dentries) are sent without parent/name info. Events on directories are sent with parent/name info only if the parent directory is watching. After this change, even groups that do not subscribe to events on children could get an event with mark iterator type TYPE_CHILD and without mark iterator type TYPE_INODE if fanotify has marks on the same objects. dnotify and inotify event handlers can already cope with that situation. audit does not subscribe to events that are possible on child, so won't get to this situation. nfsd does not access the marks iterator from its event handler at the moment, so it is not affected. This is a bit too fragile, so we should prepare all groups to cope with mark type TYPE_CHILD preferably using a generic helper. Link: https://lore.kernel.org/r/20200716084230.30611-16-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: pass dir and inode arguments to fsnotify()Amir Goldstein
The arguments of fsnotify() are overloaded and mean different things for different event types. Replace the to_tell argument with separate arguments @dir and @inode, because we may be sending to both dir and child. Using the @data argument to pass the child is not enough, because dirent events pass this argument (for audit), but we do not report to child. Document the new fsnotify() function argumenets. Link: https://lore.kernel.org/r/20200722125849.17418-7-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>