diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/9p.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/bcachefs/SubmittingPatches.rst | 4 | ||||
-rw-r--r-- | Documentation/filesystems/coda.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/debugfs.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/fscrypt.rst | 8 | ||||
-rw-r--r-- | Documentation/filesystems/fsverity.rst | 16 | ||||
-rw-r--r-- | Documentation/filesystems/index.rst | 1 | ||||
-rw-r--r-- | Documentation/filesystems/iomap/design.rst | 9 | ||||
-rw-r--r-- | Documentation/filesystems/iomap/operations.rst | 42 | ||||
-rw-r--r-- | Documentation/filesystems/locking.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/netfs_library.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/overlayfs.rst | 24 | ||||
-rw-r--r-- | Documentation/filesystems/porting.rst | 48 | ||||
-rw-r--r-- | Documentation/filesystems/sysv-fs.rst | 264 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.rst | 23 | ||||
-rw-r--r-- | Documentation/filesystems/xfs/xfs-delayed-logging-design.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/xfs/xfs-maintainer-entry-profile.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/xfs/xfs-online-fsck-design.rst | 4 |
18 files changed, 149 insertions, 308 deletions
diff --git a/Documentation/filesystems/9p.rst b/Documentation/filesystems/9p.rst index 2bbf68b56b0d..3078f3c9256a 100644 --- a/Documentation/filesystems/9p.rst +++ b/Documentation/filesystems/9p.rst @@ -90,7 +90,7 @@ Just start the 9pfs capable network server like diod/nfs-ganesha e.g.:: $ diod -f -n -d 0 -S -l 0.0.0.0:9999 -e $PWD -Optionaly scan your bus if there are more then one usbg gadgets to find their path:: +Optionally scan your bus if there are more then one usbg gadgets to find their path:: $ python $kernel_dir/tools/usb/p9_fwd.py list diff --git a/Documentation/filesystems/bcachefs/SubmittingPatches.rst b/Documentation/filesystems/bcachefs/SubmittingPatches.rst index 026b12ae0d6a..4b79ca58faf2 100644 --- a/Documentation/filesystems/bcachefs/SubmittingPatches.rst +++ b/Documentation/filesystems/bcachefs/SubmittingPatches.rst @@ -30,7 +30,7 @@ CI: === Instead of running your tests locally, when running the full test suite it's -prefereable to let a server farm do it in parallel, and then have the results +preferable to let a server farm do it in parallel, and then have the results in a nice test dashboard (which can tell you which failures are new, and presents results in a git log view, avoiding the need for most bisecting). @@ -68,7 +68,7 @@ Other things to think about: land - use them. Use them judiciously, and not as a replacement for proper error handling, but use them. -- Does it need to be performance tested? Should we add new peformance counters? +- Does it need to be performance tested? Should we add new performance counters? bcachefs has a set of persistent runtime counters which can be viewed with the 'bcachefs fs top' command; this should give users a basic idea of what diff --git a/Documentation/filesystems/coda.rst b/Documentation/filesystems/coda.rst index bdde7e4e010b..0db3c83a50e5 100644 --- a/Documentation/filesystems/coda.rst +++ b/Documentation/filesystems/coda.rst @@ -141,7 +141,7 @@ kernel support. a process P which accessing a Coda file. It makes a system call which traps to the OS kernel. Examples of such calls trapping to the kernel are ``read``, ``write``, ``open``, ``close``, ``create``, ``mkdir``, - ``rmdir``, ``chmod`` in a Unix ontext. Similar calls exist in the Win32 + ``rmdir``, ``chmod`` in a Unix context. Similar calls exist in the Win32 environment, and are named ``CreateFile``. Generally the operating system handles the request in a virtual diff --git a/Documentation/filesystems/debugfs.rst b/Documentation/filesystems/debugfs.rst index f7f977ffbf8d..610f718ef8b5 100644 --- a/Documentation/filesystems/debugfs.rst +++ b/Documentation/filesystems/debugfs.rst @@ -220,7 +220,7 @@ There are a couple of other directory-oriented helper functions:: A call to debugfs_change_name() will give a new name to an existing debugfs file, always in the same directory. The new_name must not exist prior -to the call; the return value is 0 on success and -E... on failuer. +to the call; the return value is 0 on success and -E... on failure. Symbolic links can be created with debugfs_create_symlink(). There is one important thing that all debugfs users must take into account: diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index 04eaab01314b..e80329908549 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -137,9 +137,8 @@ However, these ioctls have some limitations: - In general, decrypted contents and filenames in the kernel VFS caches are freed but not wiped. Therefore, portions thereof may be recoverable from freed memory, even after the corresponding key(s) - were wiped. To partially solve this, you can set - CONFIG_PAGE_POISONING=y in your kernel config and add page_poison=1 - to your kernel command line. However, this has a performance cost. + were wiped. To partially solve this, you can add init_on_free=1 to + your kernel command line. However, this has a performance cost. - Secret keys might still exist in CPU registers, in crypto accelerator hardware (if used by the crypto API to implement any of @@ -428,11 +427,8 @@ API, but the filenames mode still does. - Mandatory: - CONFIG_CRYPTO_ADIANTUM - Recommended: - - arm32: CONFIG_CRYPTO_CHACHA20_NEON - arm32: CONFIG_CRYPTO_NHPOLY1305_NEON - - arm64: CONFIG_CRYPTO_CHACHA20_NEON - arm64: CONFIG_CRYPTO_NHPOLY1305_NEON - - x86: CONFIG_CRYPTO_CHACHA20_X86_64 - x86: CONFIG_CRYPTO_NHPOLY1305_SSE2 - x86: CONFIG_CRYPTO_NHPOLY1305_AVX2 diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst index 76e538217868..dacdbc1149e6 100644 --- a/Documentation/filesystems/fsverity.rst +++ b/Documentation/filesystems/fsverity.rst @@ -248,11 +248,17 @@ FS_IOC_READ_VERITY_METADATA The FS_IOC_READ_VERITY_METADATA ioctl reads verity metadata from a verity file. This ioctl is available since Linux v5.12. -This ioctl allows writing a server program that takes a verity file -and serves it to a client program, such that the client can do its own -fs-verity compatible verification of the file. This only makes sense -if the client doesn't trust the server and if the server needs to -provide the storage for the client. +This ioctl is useful for cases where the verity verification should be +performed somewhere other than the currently running kernel. + +One example is a server program that takes a verity file and serves it +to a client program, such that the client can do its own fs-verity +compatible verification of the file. This only makes sense if the +client doesn't trust the server and if the server needs to provide the +storage for the client. + +Another example is copying verity metadata when creating filesystem +images in userspace (such as with ``mkfs.ext4 -d``). This is a fairly specialized use case, and most fs-verity users won't need this ioctl. diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 2636f2a41bd3..a9cf8e950b15 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -118,7 +118,6 @@ Documentation for filesystem implementations. spufs/index squashfs sysfs - sysv-fs tmpfs ubifs ubifs-authentication diff --git a/Documentation/filesystems/iomap/design.rst b/Documentation/filesystems/iomap/design.rst index b0d0188a095e..e29651a42eec 100644 --- a/Documentation/filesystems/iomap/design.rst +++ b/Documentation/filesystems/iomap/design.rst @@ -246,6 +246,10 @@ The fields are as follows: * **IOMAP_F_PRIVATE**: Starting with this value, the upper bits can be set by the filesystem for its own purposes. + * **IOMAP_F_ANON_WRITE**: Indicates that (write) I/O does not have a target + block assigned to it yet and the file system will do that in the bio + submission handler, splitting the I/O as needed. + These flags can be set by iomap itself during file operations. The filesystem should supply an ``->iomap_end`` function if it needs to observe these flags: @@ -352,6 +356,11 @@ operations: ``IOMAP_NOWAIT`` is often set on behalf of ``IOCB_NOWAIT`` or ``RWF_NOWAIT``. + * ``IOMAP_DONTCACHE`` is set when the caller wishes to perform a + buffered file I/O and would like the kernel to drop the pagecache + after the I/O completes, if it isn't already being used by another + thread. + If it is necessary to read existing file contents from a `different <https://lore.kernel.org/all/20191008071527.29304-9-hch@lst.de/>`_ device or address range on a device, the filesystem should return that diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst index 2c7f5df9d8b0..3b628e370d88 100644 --- a/Documentation/filesystems/iomap/operations.rst +++ b/Documentation/filesystems/iomap/operations.rst @@ -131,6 +131,8 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap: * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``. + * ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``. + Internal per-Folio State ------------------------ @@ -283,7 +285,7 @@ The ``ops`` structure must be specified and is as follows: struct iomap_writeback_ops { int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, loff_t offset, unsigned len); - int (*prepare_ioend)(struct iomap_ioend *ioend, int status); + int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status); void (*discard_folio)(struct folio *folio, loff_t pos); }; @@ -306,13 +308,12 @@ The fields are as follows: purpose. This function must be supplied by the filesystem. - - ``prepare_ioend``: Enables filesystems to transform the writeback - ioend or perform any other preparatory work before the writeback I/O - is submitted. + - ``submit_ioend``: Allows the file systems to hook into writeback bio + submission. This might include pre-write space accounting updates, or installing a custom ``->bi_end_io`` function for internal purposes, such as deferring the ioend completion to a workqueue to run metadata update - transactions from process context. + transactions from process context before submitting the bio. This function is optional. - ``discard_folio``: iomap calls this function after ``->map_blocks`` @@ -341,7 +342,7 @@ This can happen in interrupt or process context, depending on the storage device. Filesystems that need to update internal bookkeeping (e.g. unwritten -extent conversions) should provide a ``->prepare_ioend`` function to +extent conversions) should provide a ``->submit_ioend`` function to set ``struct iomap_end::bio::bi_end_io`` to its own function. This function should call ``iomap_finish_ioends`` after finishing its own work (e.g. unwritten extent conversion). @@ -515,18 +516,33 @@ IOMAP_WRITE`` with any combination of the following enhancements: * ``IOMAP_ATOMIC``: This write is being issued with torn-write protection. - Only a single bio can be created for the write, and the write must - not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be - set. + Torn-write protection may be provided based on HW-offload or by a + software mechanism provided by the filesystem. + + For HW-offload based support, only a single bio can be created for the + write, and the write must not be split into multiple I/O requests, i.e. + flag REQ_ATOMIC must be set. The file range to write must be aligned to satisfy the requirements of both the filesystem and the underlying block device's atomic commit capabilities. If filesystem metadata updates are required (e.g. unwritten extent - conversion or copy on write), all updates for the entire file range + conversion or copy-on-write), all updates for the entire file range must be committed atomically as well. - Only one space mapping is allowed per untorn write. - Untorn writes must be aligned to, and must not be longer than, a - single file block. + Untorn-writes may be longer than a single file block. In all cases, + the mapping start disk block must have at least the same alignment as + the write offset. + The filesystems must set IOMAP_F_ATOMIC_BIO to inform iomap core of an + untorn-write based on HW-offload. + + For untorn-writes based on a software mechanism provided by the + filesystem, all the disk block alignment and single bio restrictions + which apply for HW-offload based untorn-writes do not apply. + The mechanism would typically be used as a fallback for when + HW-offload based untorn-writes may not be issued, e.g. the range of the + write covers multiple extents, meaning that it is not possible to issue + a single bio. + All filesystem metadata updates for the entire file range must be + committed atomically as well. Callers commonly hold ``i_rwsem`` in shared or exclusive mode before calling this function. diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index d20a32b77b60..0ec0bb6eb0fb 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -66,7 +66,7 @@ prototypes:: int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *); - int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t); + struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *, diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst index 73f0bfd7e903..3886c14f89f4 100644 --- a/Documentation/filesystems/netfs_library.rst +++ b/Documentation/filesystems/netfs_library.rst @@ -515,7 +515,7 @@ The methods defined in the table are: the cache to expand a request in either direction. This allows the cache to size the request appropriately for the cache granularity. - The function is passed poiners to the start and length in its parameters, + The function is passed pointers to the start and length in its parameters, plus the size of the file for reference, and adjusts the start and length appropriately. It should return one of: diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index 6245b67ae9e0..2db379b4b31e 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -292,13 +292,27 @@ rename or unlink will of course be noticed and handled). Permission model ---------------- +An overlay filesystem stashes credentials that will be used when +accessing lower or upper filesystems. + +In the old mount api the credentials of the task calling mount(2) are +stashed. In the new mount api the credentials of the task creating the +superblock through FSCONFIG_CMD_CREATE command of fsconfig(2) are +stashed. + +Starting with kernel v6.15 it is possible to use the "override_creds" +mount option which will cause the credentials of the calling task to be +recorded. Note that "override_creds" is only meaningful when used with +the new mount api as the old mount api combines setting options and +superblock creation in a single mount(2) syscall. + Permission checking in the overlay filesystem follows these principles: 1) permission check SHOULD return the same result before and after copy up 2) task creating the overlay mount MUST NOT gain additional privileges - 3) non-mounting task MAY gain additional privileges through the overlay, + 3) task[*] MAY gain additional privileges through the overlay, compared to direct access on underlying lower or upper filesystems This is achieved by performing two permission checks on each access: @@ -306,7 +320,7 @@ This is achieved by performing two permission checks on each access: a) check if current task is allowed access based on local DAC (owner, group, mode and posix acl), as well as MAC checks - b) check if mounting task would be allowed real operation on lower or + b) check if stashed credentials would be allowed real operation on lower or upper layer based on underlying filesystem permissions, again including MAC checks @@ -315,10 +329,10 @@ are copied up. On the other hand it can result in server enforced permissions (used by NFS, for example) being ignored (3). Check (b) ensures that no task gains permissions to underlying layers that -the mounting task does not have (2). This also means that it is possible +the stashed credentials do not have (2). This also means that it is possible to create setups where the consistency rule (1) does not hold; normally, -however, the mounting task will have sufficient privileges to perform all -operations. +however, the stashed credentials will have sufficient privileges to +perform all operations. Another way to demonstrate this model is drawing parallels between:: diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index 1639e78e3146..767b2927c762 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -1144,7 +1144,7 @@ and it *must* be opened exclusive. --- -** mandatory** +**mandatory** ->d_revalidate() gets two extra arguments - inode of parent directory and name our dentry is expected to have. Both are stable (dir is pinned in @@ -1157,3 +1157,49 @@ in normal case it points into the pathname being looked up. NOTE: if you need something like full path from the root of filesystem, you are still on your own - this assists with simple cases, but it's not magic. + +--- + +**recommended** + +kern_path_locked() and user_path_locked() no longer return a negative +dentry so this doesn't need to be checked. If the name cannot be found, +ERR_PTR(-ENOENT) is returned. + +--- + +**recommended** + +lookup_one_qstr_excl() is changed to return errors in more cases, so +these conditions don't require explicit checks: + + - if LOOKUP_CREATE is NOT given, then the dentry won't be negative, + ERR_PTR(-ENOENT) is returned instead + - if LOOKUP_EXCL IS given, then the dentry won't be positive, + ERR_PTR(-EEXIST) is rreturned instread + +LOOKUP_EXCL now means "target must not exist". It can be combined with +LOOK_CREATE or LOOKUP_RENAME_TARGET. + +--- + +**mandatory** +invalidate_inodes() is gone use evict_inodes() instead. + +--- + +**mandatory** + +->mkdir() now returns a dentry. If the created inode is found to +already be in cache and have a dentry (often IS_ROOT()), it will need to +be spliced into the given name in place of the given dentry. That dentry +now needs to be returned. If the original dentry is used, NULL should +be returned. Any error should be returned with ERR_PTR(). + +In general, filesystems which use d_instantiate_new() to install the new +inode can safely return NULL. Filesystems which may not have an I_NEW inode +should use d_drop();d_splice_alias() and return the result of the latter. + +If a positive dentry cannot be returned for some reason, in-kernel +clients such as cachefiles, nfsd, smb/server may not perform ideally but +will fail-safe. diff --git a/Documentation/filesystems/sysv-fs.rst b/Documentation/filesystems/sysv-fs.rst deleted file mode 100644 index 89e40911ad7c..000000000000 --- a/Documentation/filesystems/sysv-fs.rst +++ /dev/null @@ -1,264 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -================== -SystemV Filesystem -================== - -It implements all of - - Xenix FS, - - SystemV/386 FS, - - Coherent FS. - -To install: - -* Answer the 'System V and Coherent filesystem support' question with 'y' - when configuring the kernel. -* To mount a disk or a partition, use:: - - mount [-r] -t sysv device mountpoint - - The file system type names:: - - -t sysv - -t xenix - -t coherent - - may be used interchangeably, but the last two will eventually disappear. - -Bugs in the present implementation: - -- Coherent FS: - - - The "free list interleave" n:m is currently ignored. - - Only file systems with no filesystem name and no pack name are recognized. - (See Coherent "man mkfs" for a description of these features.) - -- SystemV Release 2 FS: - - The superblock is only searched in the blocks 9, 15, 18, which - corresponds to the beginning of track 1 on floppy disks. No support - for this FS on hard disk yet. - - -These filesystems are rather similar. Here is a comparison with Minix FS: - -* Linux fdisk reports on partitions - - - Minix FS 0x81 Linux/Minix - - Xenix FS ?? - - SystemV FS ?? - - Coherent FS 0x08 AIX bootable - -* Size of a block or zone (data allocation unit on disk) - - - Minix FS 1024 - - Xenix FS 1024 (also 512 ??) - - SystemV FS 1024 (also 512 and 2048) - - Coherent FS 512 - -* General layout: all have one boot block, one super block and - separate areas for inodes and for directories/data. - On SystemV Release 2 FS (e.g. Microport) the first track is reserved and - all the block numbers (including the super block) are offset by one track. - -* Byte ordering of "short" (16 bit entities) on disk: - - - Minix FS little endian 0 1 - - Xenix FS little endian 0 1 - - SystemV FS little endian 0 1 - - Coherent FS little endian 0 1 - - Of course, this affects only the file system, not the data of files on it! - -* Byte ordering of "long" (32 bit entities) on disk: - - - Minix FS little endian 0 1 2 3 - - Xenix FS little endian 0 1 2 3 - - SystemV FS little endian 0 1 2 3 - - Coherent FS PDP-11 2 3 0 1 - - Of course, this affects only the file system, not the data of files on it! - -* Inode on disk: "short", 0 means non-existent, the root dir ino is: - - ================================= == - Minix FS 1 - Xenix FS, SystemV FS, Coherent FS 2 - ================================= == - -* Maximum number of hard links to a file: - - =========== ========= - Minix FS 250 - Xenix FS ?? - SystemV FS ?? - Coherent FS >=10000 - =========== ========= - -* Free inode management: - - - Minix FS - a bitmap - - Xenix FS, SystemV FS, Coherent FS - There is a cache of a certain number of free inodes in the super-block. - When it is exhausted, new free inodes are found using a linear search. - -* Free block management: - - - Minix FS - a bitmap - - Xenix FS, SystemV FS, Coherent FS - Free blocks are organized in a "free list". Maybe a misleading term, - since it is not true that every free block contains a pointer to - the next free block. Rather, the free blocks are organized in chunks - of limited size, and every now and then a free block contains pointers - to the free blocks pertaining to the next chunk; the first of these - contains pointers and so on. The list terminates with a "block number" - 0 on Xenix FS and SystemV FS, with a block zeroed out on Coherent FS. - -* Super-block location: - - =========== ========================== - Minix FS block 1 = bytes 1024..2047 - Xenix FS block 1 = bytes 1024..2047 - SystemV FS bytes 512..1023 - Coherent FS block 1 = bytes 512..1023 - =========== ========================== - -* Super-block layout: - - - Minix FS:: - - unsigned short s_ninodes; - unsigned short s_nzones; - unsigned short s_imap_blocks; - unsigned short s_zmap_blocks; - unsigned short s_firstdatazone; - unsigned short s_log_zone_size; - unsigned long s_max_size; - unsigned short s_magic; - - - Xenix FS, SystemV FS, Coherent FS:: - - unsigned short s_firstdatazone; - unsigned long s_nzones; - unsigned short s_fzone_count; - unsigned long s_fzones[NICFREE]; - unsigned short s_finode_count; - unsigned short s_finodes[NICINOD]; - char s_flock; - char s_ilock; - char s_modified; - char s_rdonly; - unsigned long s_time; - short s_dinfo[4]; -- SystemV FS only - unsigned long s_free_zones; - unsigned short s_free_inodes; - short s_dinfo[4]; -- Xenix FS only - unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only - char s_fname[6]; - char s_fpack[6]; - - then they differ considerably: - - Xenix FS:: - - char s_clean; - char s_fill[371]; - long s_magic; - long s_type; - - SystemV FS:: - - long s_fill[12 or 14]; - long s_state; - long s_magic; - long s_type; - - Coherent FS:: - - unsigned long s_unique; - - Note that Coherent FS has no magic. - -* Inode layout: - - - Minix FS:: - - unsigned short i_mode; - unsigned short i_uid; - unsigned long i_size; - unsigned long i_time; - unsigned char i_gid; - unsigned char i_nlinks; - unsigned short i_zone[7+1+1]; - - - Xenix FS, SystemV FS, Coherent FS:: - - unsigned short i_mode; - unsigned short i_nlink; - unsigned short i_uid; - unsigned short i_gid; - unsigned long i_size; - unsigned char i_zone[3*(10+1+1+1)]; - unsigned long i_atime; - unsigned long i_mtime; - unsigned long i_ctime; - - -* Regular file data blocks are organized as - - - Minix FS: - - - 7 direct blocks - - 1 indirect block (pointers to blocks) - - 1 double-indirect block (pointer to pointers to blocks) - - - Xenix FS, SystemV FS, Coherent FS: - - - 10 direct blocks - - 1 indirect block (pointers to blocks) - - 1 double-indirect block (pointer to pointers to blocks) - - 1 triple-indirect block (pointer to pointers to pointers to blocks) - - - =========== ========== ================ - Inode size inodes per block - =========== ========== ================ - Minix FS 32 32 - Xenix FS 64 16 - SystemV FS 64 16 - Coherent FS 64 8 - =========== ========== ================ - -* Directory entry on disk - - - Minix FS:: - - unsigned short inode; - char name[14/30]; - - - Xenix FS, SystemV FS, Coherent FS:: - - unsigned short inode; - char name[14]; - - =========== ============== ===================== - Dir entry size dir entries per block - =========== ============== ===================== - Minix FS 16/32 64/32 - Xenix FS 16 64 - SystemV FS 16 64 - Coherent FS 16 32 - =========== ============== ===================== - -* How to implement symbolic links such that the host fsck doesn't scream: - - - Minix FS normal - - Xenix FS kludge: as regular files with chmod 1000 - - SystemV FS ?? - - Coherent FS kludge: as regular files with chmod 1000 - - -Notation: We often speak of a "block" but mean a zone (the allocation unit) -and not the disk driver's notion of "block". diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 31eea688609a..ae79c30b6c0c 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -495,7 +495,7 @@ As of kernel 2.6.22, the following members are defined: int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *); - int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t); + struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *, @@ -562,7 +562,26 @@ otherwise noted. ``mkdir`` called by the mkdir(2) system call. Only required if you want to support creating subdirectories. You will probably need to - call d_instantiate() just as you would in the create() method + call d_instantiate_new() just as you would in the create() method. + + If d_instantiate_new() is not used and if the fh_to_dentry() + export operation is provided, or if the storage might be + accessible by another path (e.g. with a network filesystem) + then more care may be needed. Importantly d_instantate() + should not be used with an inode that is no longer I_NEW if there + any chance that the inode could already be attached to a dentry. + This is because of a hard rule in the VFS that a directory must + only ever have one dentry. + + For example, if an NFS filesystem is mounted twice the new directory + could be visible on the other mount before it is on the original + mount, and a pair of name_to_handle_at(), open_by_handle_at() + calls could instantiate the directory inode with an IS_ROOT() + dentry before the first mkdir returns. + + If there is any chance this could happen, then the new inode + should be d_drop()ed and attached with d_splice_alias(). The + returned dentry (if any) should be returned by ->mkdir(). ``rmdir`` called by the rmdir(2) system call. Only required if you want diff --git a/Documentation/filesystems/xfs/xfs-delayed-logging-design.rst b/Documentation/filesystems/xfs/xfs-delayed-logging-design.rst index 6402ab8e370c..2a2705e975e8 100644 --- a/Documentation/filesystems/xfs/xfs-delayed-logging-design.rst +++ b/Documentation/filesystems/xfs/xfs-delayed-logging-design.rst @@ -219,7 +219,7 @@ The log is circular, so the positions in the log are defined by the combination of a cycle number - the number of times the log has been overwritten - and the offset into the log. A LSN carries the cycle in the upper 32 bits and the offset in the lower 32 bits. The offset is in units of "basic blocks" (512 -bytes). Hence we can do realtively simple LSN based math to keep track of +bytes). Hence we can do relatively simple LSN based math to keep track of available space in the log. Log space accounting is done via a pair of constructs called "grant heads". The diff --git a/Documentation/filesystems/xfs/xfs-maintainer-entry-profile.rst b/Documentation/filesystems/xfs/xfs-maintainer-entry-profile.rst index 32b6ac4ca9d6..ce4584fb3103 100644 --- a/Documentation/filesystems/xfs/xfs-maintainer-entry-profile.rst +++ b/Documentation/filesystems/xfs/xfs-maintainer-entry-profile.rst @@ -93,7 +93,7 @@ others on a regular basis about burnout. sponsoring work on any part of XFS. - **LTS Maintainer**: Someone who backports and tests bug fixes from - uptream to the LTS kernels. + upstream to the LTS kernels. There tend to be six separate LTS trees at any given time. The maintainer for a given LTS release should identify themselves with an diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst index 12aa63840830..e231d127cd40 100644 --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst @@ -4521,8 +4521,8 @@ Both online and offline repair can use this strategy. | For this second effort, the ondisk parent pointer format as originally | | proposed was ``(parent_inum, parent_gen, dirent_pos) → (dirent_name)``. | | The format was changed during development to eliminate the requirement | -| of repair tools needing to to ensure that the ``dirent_pos`` field | -| always matched when reconstructing a directory. | +| of repair tools needing to ensure that the ``dirent_pos`` field always | +| matched when reconstructing a directory. | | | | There were a few other ways to have solved that problem: | | | |