Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
  jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
  ext4: Remove "extents" mount option
  block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
  ext4: Make printk's consistently prefixed with "EXT4-fs: "
  ext4: Add sanity checks for the superblock before mounting the filesystem
  ext4: Add mount option to set kjournald's I/O priority
  jbd2: Submit writes to the journal using WRITE_SYNC
  jbd2: Add pid and journal device name to the "kjournald2 starting" message
  ext4: Add markers for better debuggability
  ext4: Remove code to create the journal inode
  ext4: provide function to release metadata pages under memory pressure
  ext3: provide function to release metadata pages under memory pressure
  add releasepage hooks to block devices which can be used by file systems
  ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
  ext4: Init the complete page while building buddy cache
  ext4: Don't allow new groups to be added during block allocation
  ext4: mark the blocks/inode bitmap beyond end of group as used
  ext4: Use new buffer_head flag to check uninit group bitmaps initialization
  ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
  ext4: code cleanup
  ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2009-01-08 17:14:59 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2009-01-08 17:14:59 -0800
commit: 2150edc6c5cf00f7adb54538b9ea2a3e9cedca3f (patch)
tree: f72a0d85e66f500b4cead348a231e3d3b9f357bc /Documentation/filesystems
parent: cd764695b67386a81964f68e9c66efd9f13f4d29 (diff)
parent: 4b905671d2ea09fd48fed72c581df17e40823f39 (diff)
1 files changed, 67 insertions, 18 deletions
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 174eaff7ded9..cec829bc7291 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -58,13 +58,22 @@ Note: More extensive information for getting started with ext4 can be
 
 	# mount -t ext4 /dev/hda1 /wherever
 
-  - When comparing performance with other filesystems, remember that
-    ext3/4 by default offers higher data integrity guarantees than most.
-    So when comparing with a metadata-only journalling filesystem, such
-    as ext3, use `mount -o data=writeback'.  And you might as well use
-    `mount -o nobh' too along with it.  Making the journal larger than
-    the mke2fs default often helps performance with metadata-intensive
-    workloads.
+  - When comparing performance with other filesystems, it's always
+    important to try multiple workloads; very often a subtle change in a
+    workload parameter can completely change the ranking of which
+    filesystems do well compared to others.  When comparing versus ext3,
+    note that ext4 enables write barriers by default, while ext3 does
+    not enable write barriers by default.  So it is useful to
+    explicitly specify whether barriers are enabled or not via the
+    '-o barrier=[0|1]' mount option for both ext3 and ext4 filesystems
+    for a fair comparison.  When tuning ext3 for best benchmark numbers,
+    it is often worthwhile to try changing the data journaling mode; '-o
+    data=writeback,nobh' can be faster for some workloads.  (Note
+    however that running mounted with data=writeback can potentially
+    leave stale data exposed in recently written files in case of an
+    unclean shutdown, which could be a security exposure in some
+    situations.)  Configuring the filesystem with a large journal can
+    also be helpful for metadata-intensive workloads.
 
 2. Features
 ===========
@@ -74,7 +83,7 @@ Note: More extensive information for getting started with ext4 can be
 * ability to use filesystems > 16TB (e2fsprogs support not available yet)
 * extent format reduces metadata overhead (RAM, IO for access, transactions)
 * extent format more robust in face of on-disk corruption due to magics,
-* internal redunancy in tree
+* internal redundancy in tree
 * improved file allocation (multi-block alloc)
 * fix 32000 subdirectory limit
 * nsec timestamps for mtime, atime, ctime, create time
@@ -116,10 +125,11 @@ grouping of bitmaps and inode tables.  Some test results available here:
 When mounting an ext4 filesystem, the following option are accepted:
 (*) == default
 
-extents		(*)	ext4 will use extents to address file data.  The
-			file system will no longer be mountable by ext3.
-
-noextents		ext4 will not use extents for newly created files
+ro                   	Mount filesystem read only. Note that ext4 will
+                     	replay the journal (and thus write to the
+                     	partition) even when mounted "read only". The
+                     	mount options "ro,noload" can be used to prevent
+		     	writes to the filesystem.
 
 journal_checksum	Enable checksumming of the journal transactions.
 			This will allow the recovery code in e2fsck and the
@@ -134,17 +144,17 @@ journal_async_commit	Commit block can be written to disk without waiting
 journal=update		Update the ext4 file system's journal to the current
 			format.
 
-journal=inum		When a journal already exists, this option is ignored.
-			Otherwise, it specifies the number of the inode which
-			will represent the ext4 file system's journal file.
-
 journal_dev=devnum	When the external journal device's major/minor numbers
 			have changed, this option allows the user to specify
 			the new journal location.  The journal device is
 			identified through its new major/minor numbers encoded
 			in devnum.
 
-noload			Don't load the journal on mounting.
+noload			Don't load the journal on mounting.  Note that
+                     	if the filesystem was not unmounted cleanly,
+                     	skipping the journal replay will lead to the
+                     	filesystem containing inconsistencies that can
+                     	lead to any number of problems.
 
 data=journal		All data are committed into the journal prior to being
 			written into the main file system.
@@ -219,9 +229,12 @@ minixdf			Make 'df' act like Minix.
 
 debug			Extra debugging information is sent to syslog.
 
-errors=remount-ro(*)	Remount the filesystem read-only on an error.
+errors=remount-ro	Remount the filesystem read-only on an error.
 errors=continue		Keep going on a filesystem error.
 errors=panic		Panic and halt the machine if an error occurs.
+                        (These mount options override the errors behavior
+                        specified in the superblock, which can be configured
+                        using tune2fs)
 
 data_err=ignore(*)	Just print an error message if an error occurs
 			in a file data buffer in ordered mode.
@@ -261,6 +274,42 @@ delalloc	(*)	Deferring block allocation until write-out time.
 nodelalloc		Disable delayed allocation. Blocks are allocation
 			when data is copied from user to page cache.
 
+max_batch_time=usec	Maximum amount of time ext4 should wait for
+			additional filesystem operations to be batched
+			together with a synchronous write operation.
+			Since a synchronous write operation is going to
+			force a commit and then wait for the I/O to
+			complete, a short extra delay costs little and
+			can be a huge throughput win, so we wait for a
+			small amount of time to see if any other transactions
+			can piggyback on the synchronous write.   The
+			algorithm used is designed to automatically tune
+			for the speed of the disk, by measuring the
+			amount of time (on average) that it takes to
+			finish committing a transaction.  Call this time
+			the "commit time".  If the time that the
+			transaction has been running is less than the
+			commit time, ext4 will try sleeping for the
+			commit time to see if other operations will join
+			the transaction.   The commit time is capped by
+			the max_batch_time, which defaults to 15000us
+			(15ms).   This optimization can be turned off
+			entirely by setting max_batch_time to 0.
+
+min_batch_time=usec	This parameter sets the commit time (as
+			described above) to be at least min_batch_time.
+			It defaults to zero microseconds.  Increasing
+			this parameter may improve the throughput of
+			multi-threaded, synchronous workloads on very
+			fast disks, at the cost of increasing latency.
+
+journal_ioprio=prio	The I/O priority (from 0 to 7, where 0 is the
+			highest priority) which should be used for I/O
+			operations submitted by kjournald2 during a
+			commit operation.  This defaults to 3, which is
+			a slightly higher priority than the default I/O
+			priority.
+
 Data Mode
 =========
 There are 3 different data modes:
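
Note on the max_batch_time and min_batch_time text added by this patch: the
wait decision it describes can be sketched in a few lines of C.  This is only
an illustration of the documented behaviour, not the jbd2 implementation.  The
function name batch_wait_usec() and its parameters are hypothetical, all times
are in microseconds, and applying the minimum before the cap is an assumption
(the text does not say what happens if min_batch_time exceeds max_batch_time).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): decide how long a
 * synchronous writer should sleep so other operations can join the
 * running transaction.  Returns 0 when no wait should happen.
 */
uint64_t batch_wait_usec(uint64_t avg_commit_usec, /* measured average commit time */
                         uint64_t trans_age_usec,  /* how long the transaction has run */
                         uint64_t min_batch_usec,  /* min_batch_time mount option */
                         uint64_t max_batch_usec)  /* max_batch_time mount option */
{
	uint64_t commit_time = avg_commit_usec;

	/* min_batch_time: the commit time is at least this long. */
	if (commit_time < min_batch_usec)
		commit_time = min_batch_usec;

	/* max_batch_time: the commit time is capped (assumed to win over the minimum). */
	if (commit_time > max_batch_usec)
		commit_time = max_batch_usec;

	/* Sleep only if the transaction is younger than the commit time. */
	if (trans_age_usec < commit_time)
		return commit_time;

	return 0;
}
```

With the documented defaults (min_batch_time=0, max_batch_time=15000), a disk
whose average commit takes 4000us would cause a 4000us wait for a transaction
that has been running for 1000us, and no wait for one that has been running
for 5000us.  Setting max_batch_time to 0 clamps the commit time to zero, so
the function never asks for a wait, which matches the "turned off entirely"
case in the text.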