summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorRob Landley <rob@landley.net>2005-11-07 01:01:09 -0800
committerLinus Torvalds <torvalds@g5.osdl.org>2005-11-07 07:53:56 -0800
commit7f46a240b0a1797eb641c046d445f026563463d4 (patch)
treef45cc4bccb355b147b2bbf8b0c329b71466aeed5 /Documentation
parentcbf8f0f36a2339f87b9dabbbd301ffd86744620c (diff)
downloadlwn-7f46a240b0a1797eb641c046d445f026563463d4.tar.gz
lwn-7f46a240b0a1797eb641c046d445f026563463d4.zip
[PATCH] ramfs, rootfs, and initramfs docs
Docs for ramfs, rootfs, and initramfs. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/filesystems/ramfs-rootfs-initramfs.txt195
1 files changed, 195 insertions, 0 deletions
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
new file mode 100644
index 000000000000..b3404a032596
--- /dev/null
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -0,0 +1,195 @@
+ramfs, rootfs and initramfs
+October 17, 2005
+Rob Landley <rob@landley.net>
+=============================
+
+What is ramfs?
+--------------
+
+Ramfs is a very simple filesystem that exports Linux's disk caching
+mechanisms (the page cache and dentry cache) as a dynamically resizable
+ram-based filesystem.
+
+Normally all files are cached in memory by Linux. Pages of data read from
+backing store (usually the block device the filesystem is mounted on) are kept
+around in case it's needed again, but marked as clean (freeable) in case the
+Virtual Memory system needs the memory for something else. Similarly, data
+written to files is marked clean as soon as it has been written to backing
+store, but kept around for caching purposes until the VM reallocates the
+memory. A similar mechanism (the dentry cache) greatly speeds up access to
+directories.
+
+With ramfs, there is no backing store. Files written into ramfs allocate
+dentries and page cache as usual, but there's nowhere to write them to.
+This means the pages are never marked clean, so they can't be freed by the
+VM when it's looking to recycle memory.
+
+The amount of code required to implement ramfs is tiny, because all the
+work is done by the existing Linux caching infrastructure. Basically,
+you're mounting the disk cache as a filesystem. Because of this, ramfs is not
+an optional component removable via menuconfig, since there would be negligible
+space savings.
+
+ramfs and ramdisk:
+------------------
+
+The older "ram disk" mechanism created a synthetic block device out of
+an area of ram and used it as backing store for a filesystem. This block
+device was of fixed size, so the filesystem mounted on it was of fixed
+size. Using a ram disk also required unnecessarily copying memory from the
+fake block device into the page cache (and copying changes back out), as well
+as creating and destroying dentries. Plus it needed a filesystem driver
+(such as ext2) to format and interpret this data.
+
+Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
+unnecessary work for the CPU, and pollutes the CPU caches. (There are tricks
+to avoid this copying by playing with the page tables, but they're unpleasantly
+complicated and turn out to be about as expensive as the copying anyway.)
+More to the point, all the work ramfs is doing has to happen _anyway_,
+since all file access goes through the page and dentry caches. The ram
+disk is simply unnecessary, ramfs is internally much simpler.
+
+Another reason ramdisks are semi-obsolete is that the introduction of
+loopback devices offered a more flexible and convenient way to create
+synthetic block devices, now from files instead of from chunks of memory.
+See losetup (8) for details.
+
+ramfs and tmpfs:
+----------------
+
+One downside of ramfs is you can keep writing data into it until you fill
+up all memory, and the VM can't free it because the VM thinks that files
+should get written to backing store (rather than swap space), but ramfs hasn't
+got any backing store. Because of this, only root (or a trusted user) should
+be allowed write access to a ramfs mount.
+
+A ramfs derivative called tmpfs was created to add size limits, and the ability
+to write the data to swap space. Normal users can be allowed write access to
+tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
+
+What is rootfs?
+---------------
+
+Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
+(It's used internally as the starting and stopping point for searches of the
+kernel's doubly-linked list of mount points.)
+
+Most systems just mount another filesystem over it and ignore it. The
+amount of space an empty instance of ramfs takes up is tiny.
+
+What is initramfs?
+------------------
+
+All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
+extracted into rootfs when the kernel boots up. After extracting, the kernel
+checks to see if rootfs contains a file "init", and if so it executes it as PID
+1. If found, this init process is responsible for bringing the system the
+rest of the way up, including locating and mounting the real root device (if
+any). If rootfs does not contain an init program after the embedded cpio
+archive is extracted into it, the kernel will fall through to the older code
+to locate and mount a root partition, then exec some variant of /sbin/init
+out of that.
+
+All this differs from the old initrd in several ways:
+
+ - The old initrd was a separate file, while the initramfs archive is linked
+ into the linux kernel image. (The directory linux-*/usr is devoted to
+ generating this archive during the build.)
+
+ - The old initrd file was a gzipped filesystem image (in some file format,
+ such as ext2, that had to be built into the kernel), while the new
+ initramfs archive is a gzipped cpio archive (like tar only simpler,
+ see cpio(1) and Documentation/early-userspace/buffer-format.txt).
+
+ - The program run by the old initrd (which was called /initrd, not /init) did
+ some setup and then returned to the kernel, while the init program from
+ initramfs is not expected to return to the kernel. (If /init needs to hand
+ off control it can overmount / with a new root device and exec another init
+ program. See the switch_root utility, below.)
+
+ - When switching another root device, initrd would pivot_root and then
+ umount the ramdisk. But initramfs is rootfs: you can neither pivot_root
+ rootfs, nor unmount it. Instead delete everything out of rootfs to
+ free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
+ with the new root (cd /newmount; mount --move . /; chroot .), attach
+ stdin/stdout/stderr to the new /dev/console, and exec the new init.
+
+ Since this is a remarkably persnickity process (and involves deleting
+ commands before you can run them), the klibc package introduced a helper
+ program (utils/run_init.c) to do all this for you. Most other packages
+ (such as busybox) have named this command "switch_root".
+
+Populating initramfs:
+---------------------
+
+The 2.6 kernel build process always creates a gzipped cpio format initramfs
+archive and links it into the resulting kernel binary. By default, this
+archive is empty (consuming 134 bytes on x86). The config option
+CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
+in menuconfig, and living in usr/Kconfig) can be used to specify a source for
+the initramfs archive, which will automatically be incorporated into the
+resulting binary. This option can point to an existing gzipped cpio archive, a
+directory containing files to be archived, or a text file specification such
+as the following example:
+
+ dir /dev 755 0 0
+ nod /dev/console 644 0 0 c 5 1
+ nod /dev/loop0 644 0 0 b 7 0
+ dir /bin 755 1000 1000
+ slink /bin/sh busybox 777 0 0
+ file /bin/busybox initramfs/busybox 755 0 0
+ dir /proc 755 0 0
+ dir /sys 755 0 0
+ dir /mnt 755 0 0
+ file /init initramfs/init.sh 755 0 0
+
+One advantage of the text file is that root access is not required to
+set permissions or create device nodes in the new archive. (Note that those
+two example "file" entries expect to find files named "init.sh" and "busybox" in
+a directory called "initramfs", under the linux-2.6.* directory. See
+Documentation/early-userspace/README for more details.)
+
+If you don't already understand what shared libraries, devices, and paths
+you need to get a minimal root filesystem up and running, here are some
+references:
+http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
+http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
+http://www.linuxfromscratch.org/lfs/view/stable/
+
+The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
+designed to be a tiny C library to statically link early userspace
+code against, along with some related utilities. It is BSD licensed.
+
+I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
+myself. These are LGPL and GPL, respectively.
+
+In theory you could use glibc, but that's not well suited for small embedded
+uses like this. (A "hello world" program statically linked against glibc is
+over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
+name lookups, even when otherwise statically linked.)
+
+Future directions:
+------------------
+
+Today (2.6.14), initramfs is always compiled in, but not always used. The
+kernel falls back to legacy boot code that is reached only if initramfs does
+not contain an /init program. The fallback is legacy code, there to ensure a
+smooth transition and allowing early boot functionality to gradually move to
+"early userspace" (I.E. initramfs).
+
+The move to early userspace is necessary because finding and mounting the real
+root device is complex. Root partitions can span multiple devices (raid or
+separate journal). They can be out on the network (requiring dhcp, setting a
+specific mac address, logging into a server, etc). They can live on removable
+media, with dynamically allocated major/minor numbers and persistent naming
+issues requiring a full udev implementation to sort out. They can be
+compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
+and so on.
+
+This kind of complexity (which inevitably includes policy) is rightly handled
+in userspace. Both klibc and busybox/uClibc are working on simple initramfs
+packages to drop into a kernel build, and when standard solutions are ready
+and widely deployed, the kernel's legacy early boot code will become obsolete
+and a candidate for the feature removal schedule.
+
+But that's a while off yet.