<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/arch/i386/kernel/head.S, branch docs-next</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-next</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-next'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2007-10-11T09:13:10+00:00</updated>
<entry>
<title>i386: prepare shared kernel/head.S</title>
<updated>2007-10-11T09:13:10+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2007-10-11T09:13:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=deb8677148fef3aea11edd4e980f6eb9c564f3cd'/>
<id>urn:sha1:deb8677148fef3aea11edd4e980f6eb9c564f3cd</id>
<content type='text'>
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>i386: Fix start_kernel warning</title>
<updated>2007-08-11T22:58:14+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>ak@suse.de</email>
</author>
<published>2007-08-10T20:31:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5fe4486c79cdc8dbbb2a9c3f884a5ad0830a5a23'/>
<id>urn:sha1:5fe4486c79cdc8dbbb2a9c3f884a5ad0830a5a23</id>
<content type='text'>
Fix

WARNING: vmlinux.o(.text+0x183): Section mismatch: reference to .init.text:start_kernel (between 'is386' and 'check_x87')

Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>xen: Core Xen implementation</title>
<updated>2007-07-18T15:47:42+00:00</updated>
<author>
<name>Jeremy Fitzhardinge</name>
<email>jeremy@xensource.com</email>
</author>
<published>2007-07-18T01:37:04+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5ead97c84fa7d63a6a7a2f4e9f18f452bd109045'/>
<id>urn:sha1:5ead97c84fa7d63a6a7a2f4e9f18f452bd109045</id>
<content type='text'>
This patch is a rollup of all the core pieces of the Xen
implementation, including:
 - booting and setup
 - pagetable setup
 - privileged instructions
 - segmentation
 - interrupt flags
 - upcalls
 - multicall batching

BOOTING AND SETUP

The vmlinux image is decorated with ELF notes which tell the Xen
domain builder what the kernel's requirements are; the domain builder
then constructs the address space accordingly and starts the kernel.

Xen has its own entrypoint for the kernel (contained in an ELF note).
The ELF notes are set up by xen-head.S, which is included into head.S.
In principle it could be linked separately, but it seems to provoke
lots of binutils bugs.

Because the domain builder starts the kernel in a fairly sane state
(32-bit protected mode, paging enabled, flat segments set up), there's
not a lot of setup needed before starting the kernel proper.  The main
steps are:
  1. Install the Xen paravirt_ops, which is simply a matter of a
     structure assignment.
  2. Set init_mm to use the Xen-supplied pagetables (analogous to the
     head.S generated pagetables in a native boot).
  3. Reserve address space for Xen, since it takes a chunk at the top
     of the address space for its own use.
  4. Call start_kernel()

PAGETABLE SETUP

Once we hit the main kernel boot sequence, it will end up calling back
via paravirt_ops to set up various pieces of Xen specific state.  One
of the critical things which requires a bit of extra care is the
construction of the initial init_mm pagetable.  Because Xen places
tight constraints on pagetables (an active pagetable must always be
valid, and must always be mapped read-only to the guest domain), we
need to be careful when constructing the new pagetable to keep these
constraints in mind.  It turns out that the easiest way to do this is
use the initial Xen-provided pagetable as a template, and then just
insert new mappings for memory where a mapping doesn't already exist.

This means that during pagetable setup, it uses a special version of
xen_set_pte which ignores any attempt to remap a read-only page as
read-write (since Xen will map its own initial pagetable as RO), but
lets other changes to the ptes happen, so that things like NX are set
properly.

PRIVILEGED INSTRUCTIONS AND SEGMENTATION

When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
This means that it is more privileged than user-mode in ring 3, but it
still can't run privileged instructions directly.  Non-performance
critical instructions are dealt with by taking a privilege exception
and trapping into the hypervisor and emulating the instruction, but
more performance-critical instructions have their own specific
paravirt_ops.  In many cases we can avoid having to do any hypercalls
for these instructions, or the Xen implementation is quite different
from the normal native version.

The privileged instructions fall into the broad classes of:
  Segmentation: setting up the GDT and the GDT entries, LDT,
     TLS and so on.  Xen doesn't allow the GDT to be directly
     modified; all GDT updates are done via hypercalls where the new
     entries can be validated.  This is important because Xen uses
     segment limits to prevent the guest kernel from damaging the
     hypervisor itself.
  Traps and exceptions: Xen uses a special format for trap entrypoints,
     so when the kernel wants to set an IDT entry, it needs to be
     converted to the form Xen expects.  Xen sets int 0x80 up specially
     so that the trap goes straight from userspace into the guest kernel
     without going via the hypervisor.  sysenter isn't supported.
  Kernel stack: The esp0 entry is extracted from the tss and provided to
     Xen.
  TLB operations: the various TLB calls are mapped into corresponding
     Xen hypercalls.
  Control registers: all the control registers are privileged.  The most
     important is cr3, which points to the base of the current pagetable,
     and we handle it specially.

Another instruction we treat specially is CPUID, even though its not
privileged.  We want to control what CPU features are visible to the
rest of the kernel, and so CPUID ends up going into a paravirt_op.
Xen implements this mainly to disable the ACPI and APIC subsystems.

INTERRUPT FLAGS

Xen maintains its own separate flag for masking events, which is
contained within the per-cpu vcpu_info structure.  Because the guest
kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
ignored (and must be, because even if a guest domain disables
interrupts for itself, it can't disable them overall).

(A note on terminology: "events" and interrupts are effectively
synonymous.  However, rather than using an "enable flag", Xen uses a
"mask flag", which blocks event delivery when it is non-zero.)

There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
are implemented to manage the Xen event mask state.  The only thing
worth noting is that when events are unmasked, we need to explicitly
see if there's a pending event and call into the hypervisor to make
sure it gets delivered.

UPCALLS

Xen needs a couple of upcall (or callback) functions to be implemented
by each guest.  One is the event upcalls, which is how events
(interrupts, effectively) are delivered to the guests.  The other is
the failsafe callback, which is used to report errors in either
reloading a segment register, or caused by iret.  These are
implemented in i386/kernel/entry.S so they can jump into the normal
iret_exc path when necessary.

MULTICALL BATCHING

Xen provides a multicall mechanism, which allows multiple hypercalls
to be issued at once in order to mitigate the cost of trapping into
the hypervisor.  This is particularly useful for context switches,
since the 4-5 hypercalls they would normally need (reload cr3, update
TLS, maybe update LDT) can be reduced to one.  This patch implements a
generic batching mechanism for hypercalls, which gets used in many
places in the Xen code.

Signed-off-by: Jeremy Fitzhardinge &lt;jeremy@xensource.com&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Cc: Ian Pratt &lt;ian.pratt@xensource.com&gt;
Cc: Christian Limpach &lt;Christian.Limpach@cl.cam.ac.uk&gt;
Cc: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
</entry>
<entry>
<title>x86: initial fixmap support</title>
<updated>2007-07-16T16:05:35+00:00</updated>
<author>
<name>Eric W. Biderman</name>
<email>ebiderman@xmission.com</email>
</author>
<published>2007-07-16T06:37:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b1c931e39327ef121797927d4b3198d370e75b9b'/>
<id>urn:sha1:b1c931e39327ef121797927d4b3198d370e75b9b</id>
<content type='text'>
Needed to get fixed virtual address for USB debug and earlycon with mmio.

Signed-off-by: Eric W. Biderman &lt;ebiderman@xmisson.com&gt;
Signed-off-by: Yinghai Lu &lt;yinghai.lu@sun.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: Bjorn Helgaas &lt;bjorn.helgaas@hp.com&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Cc: Gerd Hoffmann &lt;kraxel@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization"</title>
<updated>2007-05-10T16:26:53+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-05-10T10:15:36+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5a18c92aab13aac7917bc87ceefa23da68698be4'/>
<id>urn:sha1:5a18c92aab13aac7917bc87ceefa23da68698be4</id>
<content type='text'>
This reverts commit c9ccf30d77f04064fe5436027ab9d2230c7cdd94.

Entering the kernel at startup_32 without passing our real mode data in
%esi, and without guaranteeing that physical and virtual addresses are
identity mapped makes head.S impossible to maintain.

The only user of this infrastructure is lguest which is not merged so
nothing we currently support will break by removing this over designed
nightmare, and only the pending lguest patches will be affected.  The
pending Xen patches have a different entry point that they use.

We are currently discussing what Xen and lguest need to do to boot the
kernel in a more normal fashion so using startup_32 in this weird manner is
clearly not their long term direction.

So let's remove this code in head.S before it causes brain damage to people
trying to maintain head.S

Cc: Chris Wright &lt;chrisw@sous-sol.org&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: Jeremy Fitzhardinge &lt;jeremy@goop.org&gt;
Cc: Zachary Amsden &lt;zach@vmware.com&gt;
CC: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] i386: map enough initial memory to create lowmem mappings</title>
<updated>2007-05-02T17:27:16+00:00</updated>
<author>
<name>Jeremy Fitzhardinge</name>
<email>jeremy@goop.org</email>
</author>
<published>2007-05-02T17:27:16+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9ce8c2ed12550f90fd6e902990652b13df647793'/>
<id>urn:sha1:9ce8c2ed12550f90fd6e902990652b13df647793</id>
<content type='text'>
head.S creates the very initial pagetable for the kernel.  This just
maps enough space for the kernel itself, and an allocation bitmap.
The amount of mapped memory is rounded up to 4Mbytes, and so this
typically ends up mapping 8Mbytes of memory.

When booting, pagetable_init() needs to create mappings for all
lowmem, and the pagetables for these mappings are allocated from the
free pages around the kernel in low memory.  If the number of
pagetable pages + kernel size exceeds head.S's initial mapping, it
will end up faulting on an unmapped page.  This will only happen with
specific combinations of kernel size and memory size.

This patch makes sure that head.S also maps enough space to fit the
kernel pagetables as well as the kernel itself.  It ends up using an
additional two pages of unreclaimable memory.

Signed-off-by: Jeremy Fitzhardinge &lt;jeremy@xensource.com&gt;
Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Acked-by: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Cc: Zachary Amsden &lt;zach@vmware.com&gt;
Cc: Chris Wright &lt;chrisw@sous-sol.org&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;,
</content>
</entry>
<entry>
<title>[PATCH] i386: Convert PDA into the percpu section</title>
<updated>2007-05-02T17:27:16+00:00</updated>
<author>
<name>Jeremy Fitzhardinge</name>
<email>jeremy@goop.org</email>
</author>
<published>2007-05-02T17:27:16+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7c3576d261ce046789a7db14f43303f8120910c7'/>
<id>urn:sha1:7c3576d261ce046789a7db14f43303f8120910c7</id>
<content type='text'>
Currently x86 (similar to x84-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register.  An ELF section is more flexible than a structure,
allowing any piece of code to use this area.  Indeed, such a section
already exists: the per-cpu area.

So this patch:
(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
    can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a
    special segment at early boot; it can be deferred until the first
    percpu area is allocated (or never for UP).

The result is less code and one less x86-specific concept.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Jeremy Fitzhardinge &lt;jeremy@xensource.com&gt;
Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
</content>
</entry>
<entry>
<title>[PATCH] i386: Page-align the GDT</title>
<updated>2007-05-02T17:27:15+00:00</updated>
<author>
<name>Jeremy Fitzhardinge</name>
<email>jeremy@goop.org</email>
</author>
<published>2007-05-02T17:27:15+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7a61d35d4b4056e7711031202da7605e052f4137'/>
<id>urn:sha1:7a61d35d4b4056e7711031202da7605e052f4137</id>
<content type='text'>
Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
lguest, KVM and native don't care.

Simple transformation to page-aligned "struct gdt_page".

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Acked-by: Jeremy Fitzhardinge &lt;jeremy@xensource.com&gt;
</content>
</entry>
<entry>
<title>[PATCH] i386: Rename boot_gdt_table to boot_gdt</title>
<updated>2007-05-02T17:27:10+00:00</updated>
<author>
<name>Sebastien Dugue</name>
<email>sebastien.dugue@bull.net</email>
</author>
<published>2007-05-02T17:27:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=52de74dd3994e165ef1b35c33d54655a6400e30c'/>
<id>urn:sha1:52de74dd3994e165ef1b35c33d54655a6400e30c</id>
<content type='text'>
Rename boot_gdt_table to boot_gdt to avoid the duplicate T(able).

Signed-off-by: Sebastien Dugue &lt;sebastien.dugue@bull.net&gt;
Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Acked-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] i386: Use per-cpu GDT immediately upon boot</title>
<updated>2007-05-02T17:27:10+00:00</updated>
<author>
<name>Rusty Russell</name>
<email>rusty@rustcorp.com.au</email>
</author>
<published>2007-05-02T17:27:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=bf50467204b435421d8de33ad080fa46c6f3d50b'/>
<id>urn:sha1:bf50467204b435421d8de33ad080fa46c6f3d50b</id>
<content type='text'>
Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
per-cpu GDT.  This means initializing the cpu_gdt array in C.

The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
switches to the per-cpu copy just allocated.  For secondary CPUs, the
early_gdt_descr is set to point directly to their per-cpu copy.

For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
but we never have to move.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Cc: Andi Kleen &lt;ak@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
