diff options
author | Manish Ahuja <ahuja@austin.ibm.com> | 2008-03-22 10:33:10 +1100 |
---|---|---|
committer | Paul Mackerras <paulus@samba.org> | 2008-03-26 08:44:06 +1100 |
commit | d28a79326a4028dbb1755b8efe6daa915d8bfeea (patch) | |
tree | 4de3c19e604333cf20c209330c2c868ab2114439 /Documentation/powerpc | |
parent | 163dab39b5432761437ae59164ab351a8680ca4f (diff) | |
download | lwn-d28a79326a4028dbb1755b8efe6daa915d8bfeea.tar.gz lwn-d28a79326a4028dbb1755b8efe6daa915d8bfeea.zip |
[POWERPC] pseries: phyp dump: Documentation
Basic documentation for hypervisor-assisted dump.
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Diffstat (limited to 'Documentation/powerpc')
-rw-r--r-- | Documentation/powerpc/phyp-assisted-dump.txt | 127 |
1 files changed, 127 insertions, 0 deletions
diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt new file mode 100644 index 000000000000..c4682b982a2e --- /dev/null +++ b/Documentation/powerpc/phyp-assisted-dump.txt @@ -0,0 +1,127 @@ + + Hypervisor-Assisted Dump + ------------------------ + November 2007 + +The goal of hypervisor-assisted dump is to enable the dump of +a crashed system, and to do so from a fully-reset system, and +to minimize the total elapsed time until the system is back +in production use. + +As compared to kdump or other strategies, hypervisor-assisted +dump offers several strong, practical advantages: + +-- Unlike kdump, the system has been reset, and loaded + with a fresh copy of the kernel. In particular, + PCI and I/O devices have been reinitialized and are + in a clean, consistent state. +-- As the dump is performed, the dumped memory becomes + immediately available to the system for normal use. +-- After the dump is completed, no further reboots are + required; the system will be fully usable, and running + in it's normal, production mode on it normal kernel. + +The above can only be accomplished by coordination with, +and assistance from the hypervisor. The procedure is +as follows: + +-- When a system crashes, the hypervisor will save + the low 256MB of RAM to a previously registered + save region. It will also save system state, system + registers, and hardware PTE's. + +-- After the low 256MB area has been saved, the + hypervisor will reset PCI and other hardware state. + It will *not* clear RAM. It will then launch the + bootloader, as normal. + +-- The freshly booted kernel will notice that there + is a new node (ibm,dump-kernel) in the device tree, + indicating that there is crash data available from + a previous boot. It will boot into only 256MB of RAM, + reserving the rest of system memory. + +-- Userspace tools will parse /sys/kernel/release_region + and read /proc/vmcore to obtain the contents of memory, + which holds the previous crashed kernel. The userspace + tools may copy this info to disk, or network, nas, san, + iscsi, etc. as desired. + + For Example: the values in /sys/kernel/release-region + would look something like this (address-range pairs). + CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / + DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A + +-- As the userspace tools complete saving a portion of + dump, they echo an offset and size to + /sys/kernel/release_region to release the reserved + memory back to general use. + + An example of this is: + "echo 0x40000000 0x10000000 > /sys/kernel/release_region" + which will release 256MB at the 1GB boundary. + +Please note that the hypervisor-assisted dump feature +is only available on Power6-based systems with recent +firmware versions. + +Implementation details: +---------------------- + +During boot, a check is made to see if firmware supports +this feature on this particular machine. If it does, then +we check to see if a active dump is waiting for us. If yes +then everything but 256 MB of RAM is reserved during early +boot. This area is released once we collect a dump from user +land scripts that are run. If there is dump data, then +the /sys/kernel/release_region file is created, and +the reserved memory is held. + +If there is no waiting dump data, then only the highest +256MB of the ram is reserved as a scratch area. This area +is *not* released: this region will be kept permanently +reserved, so that it can act as a receptacle for a copy +of the low 256MB in the case a crash does occur. See, +however, "open issues" below, as to whether +such a reserved region is really needed. + +Currently the dump will be copied from /proc/vmcore to a +a new file upon user intervention. The starting address +to be read and the range for each data point in provided +in /sys/kernel/release_region. + +The tools to examine the dump will be same as the ones +used for kdump. + +General notes: +-------------- +Security: please note that there are potential security issues +with any sort of dump mechanism. In particular, plaintext +(unencrypted) data, and possibly passwords, may be present in +the dump data. Userspace tools must take adequate precautions to +preserve security. + +Open issues/ToDo: +------------ + o The various code paths that tell the hypervisor that a crash + occurred, vs. it simply being a normal reboot, should be + reviewed, and possibly clarified/fixed. + + o Instead of using /sys/kernel, should there be a /sys/dump + instead? There is a dump_subsys being created by the s390 code, + perhaps the pseries code should use a similar layout as well. + + o Is reserving a 256MB region really required? The goal of + reserving a 256MB scratch area is to make sure that no + important crash data is clobbered when the hypervisor + save low mem to the scratch area. But, if one could assure + that nothing important is located in some 256MB area, then + it would not need to be reserved. Something that can be + improved in subsequent versions. + + o Still working the kdump team to integrate this with kdump, + some work remains but this would not affect the current + patches. + + o Still need to write a shell script, to copy the dump away. + Currently I am parsing it manually. |