diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/00-INDEX | 2 | ||||
-rw-r--r-- | Documentation/filesystems/knfsd-stats.txt | 159 | ||||
-rw-r--r-- | Documentation/filesystems/nfs41-server.txt | 161 | ||||
-rw-r--r-- | Documentation/filesystems/nilfs2.txt | 200 |
4 files changed, 522 insertions, 0 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 52cd611277a3..8dd6db76171d 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -68,6 +68,8 @@ ncpfs.txt - info on Novell Netware(tm) filesystem using NCP protocol. nfsroot.txt - short guide on setting up a diskless box with NFS root filesystem. +nilfs2.txt + - info and mount options for the NILFS2 filesystem. ntfs.txt - info and mount options for the NTFS filesystem (Windows NT). ocfs2.txt diff --git a/Documentation/filesystems/knfsd-stats.txt b/Documentation/filesystems/knfsd-stats.txt new file mode 100644 index 000000000000..64ced5149d37 --- /dev/null +++ b/Documentation/filesystems/knfsd-stats.txt @@ -0,0 +1,159 @@ + +Kernel NFS Server Statistics +============================ + +This document describes the format and semantics of the statistics +which the kernel NFS server makes available to userspace. These +statistics are available in several text form pseudo files, each of +which is described separately below. + +In most cases you don't need to know these formats, as the nfsstat(8) +program from the nfs-utils distribution provides a helpful command-line +interface for extracting and printing them. + +All the files described here are formatted as a sequence of text lines, +separated by newline '\n' characters. Lines beginning with a hash +'#' character are comments intended for humans and should be ignored +by parsing routines. All other lines contain a sequence of fields +separated by whitespace. + +/proc/fs/nfsd/pool_stats +------------------------ + +This file is available in kernels from 2.6.30 onwards, if the +/proc/fs/nfsd filesystem is mounted (it almost always should be). + +The first line is a comment which describes the fields present in +all the other lines. The other lines present the following data as +a sequence of unsigned decimal numeric fields. One line is shown +for each NFS thread pool. + +All counters are 64 bits wide and wrap naturally. There is no way +to zero these counters, instead applications should do their own +rate conversion. + +pool + The id number of the NFS thread pool to which this line applies. + This number does not change. + + Thread pool ids are a contiguous set of small integers starting + at zero. The maximum value depends on the thread pool mode, but + currently cannot be larger than the number of CPUs in the system. + Note that in the default case there will be a single thread pool + which contains all the nfsd threads and all the CPUs in the system, + and thus this file will have a single line with a pool id of "0". + +packets-arrived + Counts how many NFS packets have arrived. More precisely, this + is the number of times that the network stack has notified the + sunrpc server layer that new data may be available on a transport + (e.g. an NFS or UDP socket or an NFS/RDMA endpoint). + + Depending on the NFS workload patterns and various network stack + effects (such as Large Receive Offload) which can combine packets + on the wire, this may be either more or less than the number + of NFS calls received (which statistic is available elsewhere). + However this is a more accurate and less workload-dependent measure + of how much CPU load is being placed on the sunrpc server layer + due to NFS network traffic. + +sockets-enqueued + Counts how many times an NFS transport is enqueued to wait for + an nfsd thread to service it, i.e. no nfsd thread was considered + available. + + The circumstance this statistic tracks indicates that there was NFS + network-facing work to be done but it couldn't be done immediately, + thus introducing a small delay in servicing NFS calls. The ideal + rate of change for this counter is zero; significantly non-zero + values may indicate a performance limitation. + + This can happen either because there are too few nfsd threads in the + thread pool for the NFS workload (the workload is thread-limited), + or because the NFS workload needs more CPU time than is available in + the thread pool (the workload is CPU-limited). In the former case, + configuring more nfsd threads will probably improve the performance + of the NFS workload. In the latter case, the sunrpc server layer is + already choosing not to wake idle nfsd threads because there are too + many nfsd threads which want to run but cannot, so configuring more + nfsd threads will make no difference whatsoever. The overloads-avoided + statistic (see below) can be used to distinguish these cases. + +threads-woken + Counts how many times an idle nfsd thread is woken to try to + receive some data from an NFS transport. + + This statistic tracks the circumstance where incoming + network-facing NFS work is being handled quickly, which is a good + thing. The ideal rate of change for this counter will be close + to but less than the rate of change of the packets-arrived counter. + +overloads-avoided + Counts how many times the sunrpc server layer chose not to wake an + nfsd thread, despite the presence of idle nfsd threads, because + too many nfsd threads had been recently woken but could not get + enough CPU time to actually run. + + This statistic counts a circumstance where the sunrpc layer + heuristically avoids overloading the CPU scheduler with too many + runnable nfsd threads. The ideal rate of change for this counter + is zero. Significant non-zero values indicate that the workload + is CPU limited. Usually this is associated with heavy CPU usage + on all the CPUs in the nfsd thread pool. + + If a sustained large overloads-avoided rate is detected on a pool, + the top(1) utility should be used to check for the following + pattern of CPU usage on all the CPUs associated with the given + nfsd thread pool. + + - %us ~= 0 (as you're *NOT* running applications on your NFS server) + + - %wa ~= 0 + + - %id ~= 0 + + - %sy + %hi + %si ~= 100 + + If this pattern is seen, configuring more nfsd threads will *not* + improve the performance of the workload. If this patten is not + seen, then something more subtle is wrong. + +threads-timedout + Counts how many times an nfsd thread triggered an idle timeout, + i.e. was not woken to handle any incoming network packets for + some time. + + This statistic counts a circumstance where there are more nfsd + threads configured than can be used by the NFS workload. This is + a clue that the number of nfsd threads can be reduced without + affecting performance. Unfortunately, it's only a clue and not + a strong indication, for a couple of reasons: + + - Currently the rate at which the counter is incremented is quite + slow; the idle timeout is 60 minutes. Unless the NFS workload + remains constant for hours at a time, this counter is unlikely + to be providing information that is still useful. + + - It is usually a wise policy to provide some slack, + i.e. configure a few more nfsds than are currently needed, + to allow for future spikes in load. + + +Note that incoming packets on NFS transports will be dealt with in +one of three ways. An nfsd thread can be woken (threads-woken counts +this case), or the transport can be enqueued for later attention +(sockets-enqueued counts this case), or the packet can be temporarily +deferred because the transport is currently being used by an nfsd +thread. This last case is not very interesting and is not explicitly +counted, but can be inferred from the other counters thus: + +packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken ) + + +More +---- +Descriptions of the other statistics file should go here. + + +Greg Banks <gnb@sgi.com> +26 Mar 2009 diff --git a/Documentation/filesystems/nfs41-server.txt b/Documentation/filesystems/nfs41-server.txt new file mode 100644 index 000000000000..05d81cbcb2e1 --- /dev/null +++ b/Documentation/filesystems/nfs41-server.txt @@ -0,0 +1,161 @@ +NFSv4.1 Server Implementation + +Server support for minorversion 1 can be controlled using the +/proc/fs/nfsd/versions control file. The string output returned +by reading this file will contain either "+4.1" or "-4.1" +correspondingly. + +Currently, server support for minorversion 1 is disabled by default. +It can be enabled at run time by writing the string "+4.1" to +the /proc/fs/nfsd/versions control file. Note that to write this +control file, the nfsd service must be taken down. Use your user-mode +nfs-utils to set this up; see rpc.nfsd(8) + +The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based +on the latest NFSv4.1 Internet Draft: +http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29 + +From the many new features in NFSv4.1 the current implementation +focuses on the mandatory-to-implement NFSv4.1 Sessions, providing +"exactly once" semantics and better control and throttling of the +resources allocated for each client. + +Other NFSv4.1 features, Parallel NFS operations in particular, +are still under development out of tree. +See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design +for more information. + +The table below, taken from the NFSv4.1 document, lists +the operations that are mandatory to implement (REQ), optional +(OPT), and NFSv4.0 operations that are required not to implement (MNI) +in minor version 1. The first column indicates the operations that +are not supported yet by the linux server implementation. + +The OPTIONAL features identified and their abbreviations are as follows: + pNFS Parallel NFS + FDELG File Delegations + DDELG Directory Delegations + +The following abbreviations indicate the linux server implementation status. + I Implemented NFSv4.1 operations. + NS Not Supported. + NS* unimplemented optional feature. + P pNFS features implemented out of tree. + PNS pNFS features that are not supported yet (out of tree). + +Operations + + +----------------------+------------+--------------+----------------+ + | Operation | REQ, REC, | Feature | Definition | + | | OPT, or | (REQ, REC, | | + | | MNI | or OPT) | | + +----------------------+------------+--------------+----------------+ + | ACCESS | REQ | | Section 18.1 | +NS | BACKCHANNEL_CTL | REQ | | Section 18.33 | +NS | BIND_CONN_TO_SESSION | REQ | | Section 18.34 | + | CLOSE | REQ | | Section 18.2 | + | COMMIT | REQ | | Section 18.3 | + | CREATE | REQ | | Section 18.4 | +I | CREATE_SESSION | REQ | | Section 18.36 | +NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 | + | DELEGRETURN | OPT | FDELG, | Section 18.6 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS | DESTROY_CLIENTID | REQ | | Section 18.50 | +I | DESTROY_SESSION | REQ | | Section 18.37 | +I | EXCHANGE_ID | REQ | | Section 18.35 | +NS | FREE_STATEID | REQ | | Section 18.38 | + | GETATTR | REQ | | Section 18.7 | +P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | +P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | + | GETFH | REQ | | Section 18.8 | +NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | +P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | +P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | +P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | + | LINK | OPT | | Section 18.9 | + | LOCK | REQ | | Section 18.10 | + | LOCKT | REQ | | Section 18.11 | + | LOCKU | REQ | | Section 18.12 | + | LOOKUP | REQ | | Section 18.13 | + | LOOKUPP | REQ | | Section 18.14 | + | NVERIFY | REQ | | Section 18.15 | + | OPEN | REQ | | Section 18.16 | +NS*| OPENATTR | OPT | | Section 18.17 | + | OPEN_CONFIRM | MNI | | N/A | + | OPEN_DOWNGRADE | REQ | | Section 18.18 | + | PUTFH | REQ | | Section 18.19 | + | PUTPUBFH | REQ | | Section 18.20 | + | PUTROOTFH | REQ | | Section 18.21 | + | READ | REQ | | Section 18.22 | + | READDIR | REQ | | Section 18.23 | + | READLINK | OPT | | Section 18.24 | +NS | RECLAIM_COMPLETE | REQ | | Section 18.51 | + | RELEASE_LOCKOWNER | MNI | | N/A | + | REMOVE | REQ | | Section 18.25 | + | RENAME | REQ | | Section 18.26 | + | RENEW | MNI | | N/A | + | RESTOREFH | REQ | | Section 18.27 | + | SAVEFH | REQ | | Section 18.28 | + | SECINFO | REQ | | Section 18.29 | +NS | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | + | | | layout (REQ) | Section 13.12 | +I | SEQUENCE | REQ | | Section 18.46 | + | SETATTR | REQ | | Section 18.30 | + | SETCLIENTID | MNI | | N/A | + | SETCLIENTID_CONFIRM | MNI | | N/A | +NS | SET_SSV | REQ | | Section 18.47 | +NS | TEST_STATEID | REQ | | Section 18.48 | + | VERIFY | REQ | | Section 18.31 | +NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | + | WRITE | REQ | | Section 18.32 | + +Callback Operations + + +-------------------------+-----------+-------------+---------------+ + | Operation | REQ, REC, | Feature | Definition | + | | OPT, or | (REQ, REC, | | + | | MNI | or OPT) | | + +-------------------------+-----------+-------------+---------------+ + | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | +P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | +NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | +P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | +NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 | +NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | + | CB_RECALL | OPT | FDELG, | Section 20.2 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS | CB_RECALL_SLOT | REQ | | Section 20.8 | +NS*| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 | + | | | (REQ) | | +I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | + | | | DDELG, pNFS | | + | | | (REQ) | | + +-------------------------+-----------+-------------+---------------+ + +Implementation notes: + +EXCHANGE_ID: +* only SP4_NONE state protection supported +* implementation ids are ignored + +CREATE_SESSION: +* backchannel attributes are ignored +* backchannel security parameters are ignored + +SEQUENCE: +* no support for dynamic slot table renegotiation (optional) + +nfsv4.1 COMPOUND rules: +The following cases aren't supported yet: +* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION, + DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID. +* DESTROY_SESSION MUST be the final operation in the COMPOUND request. + diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt new file mode 100644 index 000000000000..55c4300abfcb --- /dev/null +++ b/Documentation/filesystems/nilfs2.txt @@ -0,0 +1,200 @@ +NILFS2 +------ + +NILFS2 is a log-structured file system (LFS) supporting continuous +snapshotting. In addition to versioning capability of the entire file +system, users can even restore files mistakenly overwritten or +destroyed just a few seconds ago. Since NILFS2 can keep consistency +like conventional LFS, it achieves quick recovery after system +crashes. + +NILFS2 creates a number of checkpoints every few seconds or per +synchronous write basis (unless there is no change). Users can select +significant versions among continuously created checkpoints, and can +change them into snapshots which will be preserved until they are +changed back to checkpoints. + +There is no limit on the number of snapshots until the volume gets +full. Each snapshot is mountable as a read-only file system +concurrently with its writable mount, and this feature is convenient +for online backup. + +The userland tools are included in nilfs-utils package, which is +available from the following download page. At least "mkfs.nilfs2", +"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (so called +cleaner or garbage collector) are required. Details on the tools are +described in the man pages included in the package. + +Project web page: http://www.nilfs.org/en/ +Download page: http://www.nilfs.org/en/download.html +Git tree web page: http://www.nilfs.org/git/ +NILFS mailing lists: http://www.nilfs.org/mailman/listinfo/users + +Caveats +======= + +Features which NILFS2 does not support yet: + + - atime + - extended attributes + - POSIX ACLs + - quotas + - writable snapshots + - remote backup (CDP) + - data integrity + - defragmentation + +Mount options +============= + +NILFS2 supports the following mount options: +(*) == default + +barrier=on(*) This enables/disables barriers. barrier=off disables + it, barrier=on enables it. +errors=continue(*) Keep going on a filesystem error. +errors=remount-ro Remount the filesystem read-only on an error. +errors=panic Panic and halt the machine if an error occurs. +cp=n Specify the checkpoint-number of the snapshot to be + mounted. Checkpoints and snapshots are listed by lscp + user command. Only the checkpoints marked as snapshot + are mountable with this option. Snapshot is read-only, + so a read-only mount option must be specified together. +order=relaxed(*) Apply relaxed order semantics that allows modified data + blocks to be written to disk without making a + checkpoint if no metadata update is going. This mode + is equivalent to the ordered data mode of the ext3 + filesystem except for the updates on data blocks still + conserve atomicity. This will improve synchronous + write performance for overwriting. +order=strict Apply strict in-order semantics that preserves sequence + of all file operations including overwriting of data + blocks. That means, it is guaranteed that no + overtaking of events occurs in the recovered file + system after a crash. + +NILFS2 usage +============ + +To use nilfs2 as a local file system, simply: + + # mkfs -t nilfs2 /dev/block_device + # mount -t nilfs2 /dev/block_device /dir + +This will also invoke the cleaner through the mount helper program +(mount.nilfs2). + +Checkpoints and snapshots are managed by the following commands. +Their manpages are included in the nilfs-utils package above. + + lscp list checkpoints or snapshots. + mkcp make a checkpoint or a snapshot. + chcp change an existing checkpoint to a snapshot or vice versa. + rmcp invalidate specified checkpoint(s). + +To mount a snapshot, + + # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir + +where <cno> is the checkpoint number of the snapshot. + +To unmount the NILFS2 mount point or snapshot, simply: + + # umount /dir + +Then, the cleaner daemon is automatically shut down by the umount +helper program (umount.nilfs2). + +Disk format +=========== + +A nilfs2 volume is equally divided into a number of segments except +for the super block (SB) and segment #0. A segment is the container +of logs. Each log is composed of summary information blocks, payload +blocks, and an optional super root block (SR): + + ______________________________________________________ + | |SB| | Segment | Segment | Segment | ... | Segment | | + |_|__|_|____0____|____1____|____2____|_____|____N____|_| + 0 +1K +4K +8M +16M +24M +(8MB x N) + . . (Typical offsets for 4KB-block) + . . + .______________________. + | log | log |... | log | + |__1__|__2__|____|__m__| + . . + . . + . . + .______________________________. + | Summary | Payload blocks |SR| + |_blocks__|_________________|__| + +The payload blocks are organized per file, and each file consists of +data blocks and B-tree node blocks: + + |<--- File-A --->|<--- File-B --->| + _______________________________________________________________ + | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ... + _|_____________|_______________|_____________|_______________|_ + + +Since only the modified blocks are written in the log, it may have +files without data blocks or B-tree node blocks. + +The organization of the blocks is recorded in the summary information +blocks, which contains a header structure (nilfs_segment_summary), per +file structures (nilfs_finfo), and per block structures (nilfs_binfo): + + _________________________________________________________________________ + | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |... + |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___ + + +The logs include regular files, directory files, symbolic link files +and several meta data files. The mata data files are the files used +to maintain file system meta data. The current version of NILFS2 uses +the following meta data files: + + 1) Inode file (ifile) -- Stores on-disk inodes + 2) Checkpoint file (cpfile) -- Stores checkpoints + 3) Segment usage file (sufile) -- Stores allocation state of segments + 4) Data address translation file -- Maps virtual block numbers to usual + (DAT) block numbers. This file serves to + make on-disk blocks relocatable. + +The following figure shows a typical organization of the logs: + + _________________________________________________________________________ + | Summary | regular file | file | ... | ifile | cpfile | sufile | DAT |SR| + |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__| + + +To stride over segment boundaries, this sequence of files may be split +into multiple logs. The sequence of logs that should be treated as +logically one log, is delimited with flags marked in the segment +summary. The recovery code of nilfs2 looks this boundary information +to ensure atomicity of updates. + +The super root block is inserted for every checkpoints. It includes +three special inodes, inodes for the DAT, cpfile, and sufile. Inodes +of regular files, directories, symlinks and other special files, are +included in the ifile. The inode of ifile itself is included in the +corresponding checkpoint entry in the cpfile. Thus, the hierarchy +among NILFS2 files can be depicted as follows: + + Super block (SB) + | + v + Super root block (the latest cno=xx) + |-- DAT + |-- sufile + `-- cpfile + |-- ifile (cno=c1) + |-- ifile (cno=c2) ---- file (ino=i1) + : : |-- file (ino=i2) + `-- ifile (cno=xx) |-- file (ino=i3) + : : + `-- file (ino=yy) + ( regular file, directory, or symlink ) + +For detail on the format of each file, please see include/linux/nilfs2_fs.h. |