summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2008-06-27x86, AMD IOMMU: initialize dma_ops from IOMMU initialization and enable IOMMUsJoerg Roedel
This patch adds a call to the driver specific dma_ops initialization routine from the AMD IOMMU hardware initialization. Further it adds the necessary code to finally enable AMD IOMMU hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add amd_iommu.h to export functions to the generic x86 dma codeJoerg Roedel
This patch adds the amd_iommu.h file which will be included in the generic code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add dma_ops initialization functionJoerg Roedel
This patch adds the function to initialize the dma_ops for the AMD IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add pre-allocation of protection domainsJoerg Roedel
This patch adds a function to pre-allocate protection domains. So we don't have to allocate it on the first request for a device (which can happen in atomic mode). Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add mapping functions for coherent mappingsJoerg Roedel
This patch adds the dma_ops functions for coherent mappings. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add mapping functions for scatter gather listsJoerg Roedel
This patch adds the dma_ops functions for mapping and unmapping scatter gather lists. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add dma_ops mapping functions for single mappingsJoerg Roedel
This patch adds the dma_ops specific mapping functions for single mappings. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add generic dma_ops mapping functionsJoerg Roedel
This patch adds the generic functions to map and unmap pages to a protection domain for dma_ops usage. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions to find IOMMU device resourcesJoerg Roedel
This patch adds functions necessary to find the IOMMU resources for a specific device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add domain allocation and deallocation functionsJoerg Roedel
This patch adds the functions to allocate and free protection domains for the IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add address allocation and deallocation functionsJoerg Roedel
This patch adds the address allocator to the AMD IOMMU code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions to initialize unity mappingsJoerg Roedel
This patch adds the functions which will initialize the unity mappings in the device page tables. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions to send IOMMU commandsJoerg Roedel
This patch adds generic handling function as well as all functions to send specific commands to the IOMMU hardware as required by this driver. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add amd_iommu.c to MakefileJoerg Roedel
This patch adds the new amd_iommu.c file to the architecture kernel Makefile for x86. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add generic defines and structures for mapping codeJoerg Roedel
This patch adds generic stuff used by the mapping and unmapping code for the AMD IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add kernel command line parameters for AMD IOMMUJoerg Roedel
This patch adds two parameters to the kernel command line to control behavior of the AMD IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add early detection codeJoerg Roedel
This patch adds code to detect an AMD IOMMU early during boot before the early detection code of other IOMMUs run. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: clue initialization code togetherJoerg Roedel
This patch puts the AMD IOMMU ACPI table parsing and hardware initialization functions together to the main intialization routine. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions to parse IOMMU memory mapping requirements for ↵Joerg Roedel
devices This patch adds the functions to parse the information about IOMMU exclusion ranges and required unity mappings for the devices handled by the IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add detect code for AMD IOMMU hardwareJoerg Roedel
This patch adds the detection of AMD IOMMU hardware provided on information from ACPI provided by the BIOS. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions for IOMMU hardware initialization from ACPIJoerg Roedel
This patch adds functions to initialize the IOMMU hardware with information from ACPI and PCI. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add device table initialization functionsJoerg Roedel
This patch adds functions necessary to initialize the device table from the ACPI definitions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add command buffer (de-)allocationJoerg Roedel
This patch adds the functions to allocate and deallocate the command buffer for one IOMMU in the system. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions for programming IOMMU MMIO spaceJoerg Roedel
This patch adds the functions required to programm the IOMMU with the MMIO space. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions for mapping/unmapping the MMIO spaceJoerg Roedel
This patch contains two functions to map and unmap the MMIO region of an IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add amd_iommu_init.c to MakefileJoerg Roedel
This patch adds the source file amd_iommu_init.c to the kernel Makefile for the x86 architecture. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add functions to find last possible PCI device for IOMMUJoerg Roedel
This patch adds functions to find the last PCI bus/device/function the IOMMU driver has to handle. This information is used later to allocate the memory for the data structures. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add data structures to manage the IOMMUs in the systemJoerg Roedel
This patch adds the data structures which will contain the information read from the ACPI table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add defines and structures for ACPI scanning codeJoerg Roedel
This patch adds the required data structures and constants required to parse the ACPI table for the AMD IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27x86, AMD IOMMU: add Kconfig entryJoerg Roedel
This patch adds the Kconfig entry for the AMD IOMMU driver. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: Sebastian.Biemueller@amd.com Cc: robert.richter@amd.com Cc: joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24Merge branch 'kvm-updates-2.6.26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Remove now unused structs from kvm_para.h x86: KVM guest: Use the paravirt clocksource structs and functions KVM: Make kvm host use the paravirt clocksource structs x86: Make xen use the paravirt clocksource structs and functions x86: Add structs and functions for paravirt clocksource KVM: VMX: Fix host msr corruption with preemption enabled KVM: ioapic: fix lost interrupt when changing a device's irq KVM: MMU: Fix oops on guest userspace access to guest pagetable KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) KVM: MMU: Fix rmap_write_protect() hugepage iteration bug KVM: close timer injection race window in __vcpu_run KVM: Fix race between timer migration and vcpu migration
2008-06-24x86: KVM guest: Use the paravirt clocksource structs and functionsGerd Hoffmann
This patch updates the kvm host code to use the pvclock structs and functions, thereby making it compatible with Xen. The patch also fixes an initialization bug: on SMP systems the per-cpu has two different locations early at boot and after CPU bringup. kvmclock must take that in account when registering the physical address within the host. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Make kvm host use the paravirt clocksource structsGerd Hoffmann
This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Make xen use the paravirt clocksource structs and functionsGerd Hoffmann
This patch updates the xen guest to use the pvclock structs and helper functions. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Add structs and functions for paravirt clocksourceGerd Hoffmann
This patch adds structs for the paravirt clocksource ABI used by both xen and kvm (pvclock-abi.h). It also adds some helper functions to read system time and wall clock time from a paravirtual clocksource (pvclock.[ch]). They are based on the xen code. They are enabled using CONFIG_PARAVIRT_CLOCK. Subsequent patches of this series will put the code in use. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24xen: remove support for non-PAE 32-bitJeremy Fitzhardinge
Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24KVM: VMX: Fix host msr corruption with preemption enabledAvi Kivity
Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity
KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti
rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: close timer injection race window in __vcpu_runMarcelo Tosatti
If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti
A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-20xen: don't drop NX bitJeremy Fitzhardinge
Because NX is now enforced properly, we must put the hypercall page into the .text segment so that it is executable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20xen: mask unwanted pte bits in __supported_pte_maskJeremy Fitzhardinge
[ Stable: this isn't a bugfix in itself, but it's a pre-requiste for "xen: don't drop NX bit" ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19x86, geode: add a VSA2 ID for General SoftwareJordan Crouse
General Software writes their own VSA2 module for their version of the Geode BIOS, which returns a different ID then the standard VSA2. This was causing the framebuffer driver to break for most GSW boards. Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Cc: tglx@linutronix.de Cc: linux-geode@lists.infradead.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19x86: use BOOTMEM_EXCLUSIVE on 32-bitBernhard Walle
This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for i386 and prints a error message on failure. The patch is still for 2.6.26 since it is only bug fixing. The unification of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27. Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
2008-06-19x86, 32-bit: fix boot failure on TSC-less processorsMikael Pettersson
Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6" (invalid opcode) and a kernel halt immediately after the kernel has been uncompressed. The BUG shows EIP pointing to an rdtsc instruction in native_read_tsc(), invoked from native_sched_clock(). (This error occurs so early that not even the serial console can capture it.) A bisection showed that this bug first occurs in 2.6.26-rc3-git7, via commit 9ccc906c97e34fd91dc6aaf5b69b52d824386910: >x86: distangle user disabled TSC from unstable > >tsc_enabled is set to 0 from the command line switch "notsc" and from >the mark_tsc_unstable code. Seperate those functionalities and replace >tsc_enable with tsc_disable. This makes also the native_sched_clock() >decision when to use TSC understandable. > >Preparatory patch to solve the sched_clock() issue on 32 bit. > >Signed-off-by: Thomas Gleixner <tglx@linutronix.de> The core reason for this bug is that native_sched_clock() gets called before tsc_init(). Before the commit above, tsc_32.c used a "tsc_enabled" variable which defaulted to 0 == disabled, and which only got enabled late in tsc_init(). Thus early calls to native_sched_clock() would skip the TSC and use jiffies instead. After the commit above, tsc_32.c uses a "tsc_disabled" variable which defaults to 0, meaning that the TSC is Ok to use. Early calls to native_sched_clock() now erroneously try to use the TSC on !cpu_has_tsc processors, leading to invalid opcode exceptions. My proposed fix is to initialise tsc_disabled to a "soft disabled" state distinct from the hard disabled state set up by the "notsc" kernel option. This fixes the native_sched_clock() problem. It also allows tsc_init() to be simplified: instead of setting tsc_disabled = 1 on every error return, we just set tsc_disabled = 0 once when all checks have succeeded. I've verified that this lets my 486 boot again. I've also verified that a Core2 machine still uses the TSC as clocksource after the patch. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19x86: fix NULL pointer deref in __switch_toSuresh Siddha
Patrick McHardy reported a crash: > > I get this oops once a day, its apparently triggered by something > > run by cron, but the process is a different one each time. > > > > Kernel is -git from yesterday shortly before the -rc6 release > > (last commit is the usb-2.6 merge, the x86 patches are missing), > > .config is attached. > > > > I'll retry with current -git, but the patches that have gone in > > since I last updated don't look related. > > > > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at > > 000001ff > > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118 > > [62060.043009] *pde = 00000000 > > [62060.043009] Oops: 0002 [#1] PREEMPT Vegard Nossum analyzed it: > This decodes to > > 0: 0f ae 00 fxsave (%eax) > > so it's related to the floating-point context. This is the exact > location of the crash: > > $ addr2line -e arch/x86/kernel/process_32.o -i ab0 > include/asm/i387.h:232 > include/asm/i387.h:262 > arch/x86/kernel/process_32.c:595 > > ...so it looks like prev_task->thread.xstate->fxsave has become NULL. > Or maybe it never had any other value. Somehow (as described below) TS_USEDFPU is set but the fpu is not allocated or freed. Another possible FPU pre-emption issue with the sleazy FPU optimization which was benign before but not so anymore, with the dynamic FPU allocation patch. New task is getting exec'd and it is prempted at the below point. flush_thread() { ... /* * Forget coprocessor state.. */ clear_fpu(tsk); <----- Preemption point clear_used_math(); ... } Now when it context switches in again, as the used_math() is still set and fpu_counter can be > 5, we will do a math_state_restore() which sets the task's TS_USEDFPU. After it continues from the above preemption point it does clear_used_math() and much later free_thread_xstate(). Now, at the next context switch, it is quite possible that xstate is null, used_math() is not set and TS_USEDFPU is still set. This will trigger unlazy_fpu() causing kernel oops. Fix this by clearing tsk's fpu_counter before clearing task's fpu. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-17x86-64: Fix "bytes left to copy" return value for copy_from_user()Linus Torvalds
Most users by far do not care about the exact return value (they only really care about whether the copy succeeded in its entirety or not), but a few special core routines actually care deeply about exactly how many bytes were copied from user space. And the unrolled versions of the x86-64 user copy routines would sometimes report that it had copied more bytes than it actually had. Very few uses actually have partial copies to begin with, but to make this bug even harder to trigger, most x86 CPU's use the "rep string" instructions for normal user copies, and that version didn't have this issue. To make it even harder to hit, the one user of this that really cared about the return value (and used the uncached version of the copy that doesn't use the "rep string" instructions) was the generic write routine, which pre-populated its source, once more hiding the problem by avoiding the exception case that triggers the bug. In other words, very special thanks to Bron Gondwana who not only triggered this, but created a test-program to show it, and bisected the behavior down to commit 08291429cfa6258c4cd95d8833beb40f828b194e ("mm: fix pagecache write deadlocks") which changed the access pattern just enough that you can now trigger it with 'writev()' with multiple iovec's. That commit itself was not the cause of the bug, it just allowed all the stars to align just right that you could trigger the problem. [ Side note: this is just the minimal fix to make the copy routines (with __copy_from_user_inatomic_nocache as the particular version that was involved in showing this) have the right return values. We really should improve on the exceptional case further - to make the copy do a byte-accurate copy up to the exact page limit that causes it to fail. As it is, the callers have to do extra work to handle the limit case gracefully. ] Reported-by: Bron Gondwana <brong@fastmail.fm> Cc: Nick Piggin <npiggin@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (which didn't have this problem), and since most users that do the carethis was very hard to trigger, but
2008-06-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: fixup write combine comment in pci_mmap_resource x86: PAT export resource_wc in pci sysfs x86, pci-dma.c: don't always add __GFP_NORETRY to gfp suspend-vs-iommu: prevent suspend if we could not resume x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005) PCI: use dev_to_node in pci_call_probe PCI: Correct last two HP entries in the bfsort whitelist