summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2011-08-08x86: HPET: Chose a paranoid safe value for the ETIME checkThomas Gleixner
(imported from commit v2.6.37-rc5-64-gf1c1807) commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty) chose 8 HPET cycles as a safe value for the ETIME check, as we had the confirmation that the posted write to the comparator register is delayed by two HPET clock cycles on Intel chipsets which showed readback problems. After that patch hit mainline we got reports from machines with newer AMD chipsets which seem to have an even longer delay. See http://thread.gmane.org/gmane.linux.kernel/1054283 and http://thread.gmane.org/gmane.linux.kernel/1069458 for further information. Boris tried to come up with an ACPI based selection of the minimum HPET cycles, but this failed on a couple of test machines. And of course we did not get any useful information from the hardware folks. For now our only option is to chose a paranoid high and safe value for the minimum HPET cycles used by the ETIME check. Adjust the minimum ns value for the HPET clockevent accordingly. Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6> Cc: Simon Kirby <sim@hostway.ca> Cc: Borislav Petkov <bp@alien8.de> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08x86: Hpet: Avoid the comparator readback penaltyThomas Gleixner
(imported from commit v2.6.36-rc4-167-g995bd3b) Due to the overly intelligent design of HPETs, we need to workaround the problem that the compare value which we write is already behind the actual counter value at the point where the value hits the real compare register. This happens for two reasons: 1) We read out the counter, add the delta and write the result to the compare register. When a NMI or SMI hits between the read out and the write then the counter can be ahead of the event already 2) The write to the compare register is delayed by up to two HPET cycles in certain chipsets. We worked around this by reading back the compare register to make sure that the written value has hit the hardware. For certain ICH9+ chipsets this can require two readouts, as the first one can return the previous compare register value. That's bad performance wise for the normal case where the event is far enough in the future. As we already know that the write can be delayed by up to two cycles we can avoid the read back of the compare register completely if we make the decision whether the delta has elapsed already or not based on the following calculation: cmp = event - actual_count; If cmp is less than 8 HPET clock cycles, then we decide that the event has happened already and return -ETIME. That covers the above #1 and #2 problems which would cause a wait for HPET wraparound (~306 seconds). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nix <nix@esperi.org.uk> Tested-by: Artur Skawina <art.08.09@gmail.com> Cc: Damien Wyart <damien.wyart@free.fr> Tested-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <alpine.LFD.2.00.1009151500060.2416@localhost6.localdomain6> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08kexec, x86: Fix incorrect jump back address if not preserving contextHuang Ying
commit 050438ed5a05b25cdf287f5691e56a58c2606997 upstream. In kexec jump support, jump back address passed to the kexeced kernel via function calling ABI, that is, the function call return address is the jump back entry. Furthermore, jump back entry == 0 should be used to signal that the jump back or preserve context is not enabled in the original kernel. But in the current implementation the stack position used for function call return address is not cleared context preservation is disabled. The patch fixes this bug. Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08x86: Make Dell Latitude E5420 use reboot=pciDaniel J Blueman
commit b7798d28ec15d20fd34b70fa57eb13f0cf6d1ecd upstream. Rebooting on the Dell E5420 often hangs with the keyboard or ACPI methods, but is reliable via the PCI method. [ hpa: this was deferred because we believed for a long time that the recent reshuffling of the boot priorities in commit 660e34cebf0a11d54f2d5dd8838607452355f321 fixed this platform. Unfortunately that turned out to be incorrect. ] Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Link: http://lkml.kernel.org/r/1305248699-2347-1-git-send-email-daniel.blueman@gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23x86/amd-iommu: Fix boot crash with hidden PCI devicesJoerg Roedel
commit 26018874e3584f1658570d41d57d4c34f6a53aa0 upstream. Some PCIe cards ship with a PCI-PCIe bridge which is not visible as a PCI device in Linux. But the device-id of the bridge is present in the IOMMU tables which causes a boot crash in the IOMMU driver. This patch fixes by removing these cards from the IOMMU handling. This is a pure -stable fix, a real fix to handle this situation appriatly will follow for the next merge window. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23x86/amd-iommu: Fix 3 possible endless loopsJoerg Roedel
commit 0de66d5b35ee148455e268b2782873204ffdef4b upstream. The driver contains several loops counting on an u16 value where the exit-condition is checked against variables that can have values up to 0xffff. In this case the loops will never exit. This patch fixed 3 such loops. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23x86/amd-iommu: Use only per-device dma_opsJoerg Roedel
commit 27c2127a15d340706c0aa84e311188a14468d841 upstream. Unfortunatly there are systems where the AMD IOMMU does not cover all devices. This breaks with the current driver as it initializes the global dma_ops variable. This patch limits the AMD IOMMU to the devices listed in the IVRS table fixing DMA for devices not covered by the IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23x86, amd: Use _safe() msr access for GartTlbWlk disable codeRoedel, Joerg
commit d47cc0db8fd6011de2248df505fc34990b7451bf upstream. The workaround for Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=33012 introduced a read and a write to the MC4 mask msr. Unfortunatly this MSR is not emulated by the KVM hypervisor so that the kernel will get a #GP and crashes when applying this workaround when running inside KVM. This issue was reported as: https://bugzilla.kernel.org/show_bug.cgi?id=35132 and is fixed with this patch. The change just let the kernel ignore any #GP it gets while accessing this MSR by using the _safe msr access methods. Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23x86, amd: Do not enable ARAT feature on AMD processors below family 0x12Boris Ostrovsky
commit e9cdd343a5e42c43bcda01e609fa23089e026470 upstream. Commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 added support for ARAT (Always Running APIC timer) on AMD processors that are not affected by erratum 400. This erratum is present on certain processor families and prevents APIC timer from waking up the CPU when it is in a deep C state, including C1E state. Determining whether a processor is affected by this erratum may have some corner cases and handling these cases is somewhat complicated. In the interest of simplicity we won't claim ARAT support on processor families below 0x12 and will go back to broadcasting timer when going idle. Signed-off-by: Boris Ostrovsky <ostr@amd64.org> Link: http://lkml.kernel.org/r/1306423192-19774-1-git-send-email-ostr@amd64.org Tested-by: Boris Petkov <borislav.petkov@amd.com> Cc: Hans Rosenfeld <Hans.Rosenfeld@amd.com> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23x86, mce, AMD: Fix leaving freed data in a listJulia Lawall
commit d9a5ac9ef306eb5cc874f285185a15c303c50009 upstream. b may be added to a list, but is not removed before being freed in the case of an error. This is done in the corresponding deallocation function, so the code here has been changed to follow that. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E,E1,E2; identifier l; @@ *list_add(&E->l,E1); ... when != E1 when != list_del(&E->l) when != list_del_init(&E->l) when != E = E2 *kfree(E);// </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23x86, apic: Fix spurious error interrupts triggering on all non-boot APsYouquan Song
commit e503f9e4b092e2349a9477a333543de8f3c7f5d9 upstream. This patch fixes a bug reported by a customer, who found that many unreasonable error interrupts reported on all non-boot CPUs (APs) during the system boot stage. According to Chapter 10 of Intel Software Developer Manual Volume 3A, Local APIC may signal an illegal vector error when an LVT entry is set as an illegal vector value (0~15) under FIXED delivery mode (bits 8-11 is 0), regardless of whether the mask bit is set or an interrupt actually happen. These errors are seen as error interrupts. The initial value of thermal LVT entries on all APs always reads 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI sequence to them and LVT registers are reset to 0s except for the mask bits which are set to 1s when APs receive INIT IPI. When the BIOS takes over the thermal throttling interrupt, the LVT thermal deliver mode should be SMI and it is required from the kernel to keep AP's LVT thermal monitoring register programmed as such as well. This issue happens when BIOS does not take over thermal throttling interrupt, AP's LVT thermal monitor register will be restored to 0x10000 which means vector 0 and fixed deliver mode, so all APs will signal illegal vector error interrupts. This patch check if interrupt delivery mode is not fixed mode before restoring AP's LVT thermal monitor register. Signed-off-by: Youquan Song <youquan.song@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yong Wang <yong.y.wang@intel.com> Cc: hpa@linux.intel.com Cc: joe@perches.com Cc: jbaron@redhat.com Cc: trenn@suse.de Cc: kent.liu@intel.com Cc: chaohong.guo@intel.com Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23x86, AMD: Fix ARAT feature setting againBorislav Petkov
commit 14fb57dccb6e1defe9f89a66f548fcb24c374c1d upstream. Trying to enable the local APIC timer on early K8 revisions uncovers a number of other issues with it, in conjunction with the C1E enter path on AMD. Fixing those causes much more churn and troubles than the benefit of using that timer brings so don't enable it on K8 at all, falling back to the original functionality the kernel had wrt to that. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Nick Bowler <nbowler@elliptictech.com> Cc: Joerg-Volker-Peetz <jvpeetz@web.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"Borislav Petkov
commit 328935e6348c6a7cb34798a68c326f4b8372e68a upstream. This reverts commit e20a2d205c05cef6b5783df339a7d54adeb50962, as it crashes certain boxes with specific AMD CPU models. Moving the lower endpoint of the Erratum 400 check to accomodate earlier K8 revisions (A-E) opens a can of worms which is simply not worth to fix properly by tweaking the errata checking framework: * missing IntPenging MSR on revisions < CG cause #GP: http://marc.info/?l=linux-kernel&m=130541471818831 * makes earlier revisions use the LAPIC timer instead of the C1E idle routine which switches to HPET, thus not waking up in deeper C-states: http://lkml.org/lkml/2011/4/24/20 Therefore, leave the original boundary starting with K8-revF. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23x86, hw_breakpoints: Fix racy access to ptrace breakpointsFrederic Weisbecker
commit 87dc669ba25777b67796d7262c569429e58b1ed4 upstream. While the tracer accesses ptrace breakpoints, the child task may concurrently exit due to a SIGKILL and thus release its breakpoints at the same time. We can then dereference some freed pointers. To fix this, hold a reference on the child breakpoints before manipulating them. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Link: http://lkml.kernel.org/r/1302284067-7860-3-git-send-email-fweisbec@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processorsBoris Ostrovsky
commit e20a2d205c05cef6b5783df339a7d54adeb50962 upstream. Older AMD K8 processors (Revisions A-E) are affected by erratum 400 (APIC timer interrupts don't occur in C states greater than C1). This, for example, means that X86_FEATURE_ARAT flag should not be set for these parts. This addresses regression introduced by commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 ("x86, AMD: Set ARAT feature on AMD processors") where the system may become unresponsive until external interrupt (such as keyboard input) occurs. This results, for example, in time not being reported correctly, lack of progress on the system and other lockups. Reported-by: Joerg-Volker Peetz <jvpeetz@web.de> Tested-by: Joerg-Volker Peetz <jvpeetz@web.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09x86, gart: Make sure GART does not map physmem above 1TBJoerg Roedel
commit 665d3e2af83c8fbd149534db8f57d82fa6fa6753 upstream. The GART can only map physical memory below 1TB. Make sure the gart driver in the kernel does not try to map memory above 1TB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-5-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, cpu: Fix regression in AMD errata checking codeHans Rosenfeld
commit 07a7795ca2e6e66d00b184efb46bd0e23d90d3fe upstream. A bug in the family-model-stepping matching code caused the presence of errata to go undetected when OSVW was not used. This causes hangs on some K8 systems because the E400 workaround is not enabled. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, amd: Disable GartTlbWlkErr when BIOS forgets itJoerg Roedel
commit 5bbc097d890409d8eff4e3f1d26f11a9d6b7c07e upstream. This patch disables GartTlbWlk errors on AMD Fam10h CPUs if the BIOS forgets to do is (or is just too old). Letting these errors enabled can cause a sync-flood on the CPU causing a reboot. The AMD BKDG recommends disabling GART TLB Wlk Error completely. This patch is the fix for https://bugzilla.kernel.org/show_bug.cgi?id=33012 on my machine. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, cpu: Clean up AMD erratum 400 workaroundHans Rosenfeld
commit 9d8888c2a214aece2494a49e699a097c2ba9498b upstream. Remove check_c1e_idle() and use the new AMD errata checking framework instead. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1280336972-865982-2-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, cpu: AMD errata checking frameworkHans Rosenfeld
commit d78d671db478eb8b14c78501c0cee1cc7baf6967 upstream. Errata are defined using the AMD_LEGACY_ERRATUM() or AMD_OSVW_ERRATUM() macros. The latter is intended for newer errata that have an OSVW id assigned, which it takes as first argument. Both take a variable number of family-specific model-stepping ranges created by AMD_MODEL_RANGE(). Iff an erratum has an OSVW id, OSVW is available on the CPU, and the OSVW id is known to the hardware, it is used to determine whether an erratum is present. Otherwise, the model-stepping ranges are matched against the current CPU to find out whether the erratum applies. For certain special errata, the code using this framework might have to conduct further checks to make sure an erratum is really (not) present. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1280336972-865982-1-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, AMD: Set ARAT feature on AMD processorsBoris Ostrovsky
commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 upstream. Support for Always Running APIC timer (ARAT) was introduced in commit db954b5898dd3ef3ef93f4144158ea8f97deb058. This feature allows us to avoid switching timers from LAPIC to something else (e.g. HPET) and go into timer broadcasts when entering deep C-states. AMD processors don't provide a CPUID bit for that feature but they also keep APIC timers running in deep C-states (except for cases when the processor is affected by erratum 400). Therefore we should set ARAT feature bit on AMD CPUs. Tested-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> LKML-Reference: <1300205624-4813-1-git-send-email-ostr@amd64.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14Revert "x86: Cleanup highmap after brk is concluded"Greg Kroah-Hartman
This reverts upstream commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e It caused problems in the stable tree and should not have been there. Cc: Yinghai Lu <yinghai@kernel.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14x86, mtrr, pat: Fix one cpu getting out of sync during resumeSuresh Siddha
commit 84ac7cdbdd0f04df6b96153f7a79127fd6e45467 upstream. On laptops with core i5/i7, there were reports that after resume graphics workloads were performing poorly on a specific AP, while the other cpu's were ok. This was observed on a 32bit kernel specifically. Debug showed that the PAT init was not happening on that AP during resume and hence it contributing to the poor workload performance on that cpu. On this system, resume flow looked like this: 1. BP starts the resume sequence and we reinit BP's MTRR's/PAT early on using mtrr_bp_restore() 2. Resume sequence brings all AP's online 3. Resume sequence now kicks off the MTRR reinit on all the AP's. 4. For some reason, between point 2 and 3, we moved from BP to one of the AP's. My guess is that printk() during resume sequence is contributing to this. We don't see similar behavior with the 64bit kernel but there is no guarantee that at this point the remaining resume sequence (after AP's bringup) has to happen on BP. 5. set_mtrr() was assuming that we are still on BP and skipped the MTRR/PAT init on that cpu (because of 1 above) 6. But we were on an AP and this led to not reprogramming PAT on this cpu leading to bad performance. Fix this by doing unconditional mtrr_if->set_all() in set_mtrr() during MTRR/PAT init. This might be unnecessary if we are still running on BP. But it is of no harm and will guarantee that after resume, all the cpu's will be in sync with respect to the MTRR/PAT registers. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Keith Packard <keithp@keithp.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-28x86: Cleanup highmap after brk is concludedYinghai Lu
commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e upstream. Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-28x86, binutils, xen: Fix another wrong size directiveAlexander van Heukelum
commit 371c394af27ab7d1e58a66bc19d9f1f3ac1f67b4 upstream. The latest binutils (2.21.0.20110302/Ubuntu) breaks the build yet another time, under CONFIG_XEN=y due to a .size directive that refers to a slightly differently named (hence, to the now very strict and unforgiving assembler, non-existent) symbol. [ mingo: This unnecessary build breakage caused by new binutils version 2.21 gets escallated back several kernel releases spanning several years of Linux history, affecting over 130,000 upstream kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially affecting all major Linux distro kernel configs). Git annotate tells us that this slight debug symbol code mismatch bug has been introduced in 2008 in commit 3d75e1b8: 3d75e1b8 (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) The 'bug' is just a slight assymetry in ENTRY()/END() debug-symbols sequences, with lots of assembly code between the ENTRY() and the END(): ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) ... END(do_hypervisor_callback) Human reviewers almost never catch such small mismatches, and binutils never even warned about it either. This new binutils version thus breaks the Xen build on all upstream kernels since v2.6.27, out of the blue. This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels impossible on such binutils, for a bisection window of over hundred thousand historic commits. (!) This is a major fail on the side of binutils and binutils needs to turn this show-stopper build failure into a warning ASAP. ] Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jan Beulich <jbeulich@novell.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, quirk: Fix SB600 revision checkAndreas Herrmann
commit 1d3e09a304e6c4e004ca06356578b171e8735d3c upstream. Commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12 (x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems) introduced a regression. It removed some SB600 specific code to determine the revision ID without adapting a corresponding revision ID check for SB600. See this mail thread: http://marc.info/?l=linux-kernel&m=129980296006380&w=2 This patch adapts the corresponding check to cover all SB600 revisions. Tested-by: Wang Lei <f3d27b@gmail.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20110315143137.GD29499@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86: Emit "mem=nopentium ignored" warning when not supportedKamal Mostafa
commit 9a6d44b9adb777ca9549e88cd55bd8f2673c52a2 upstream. Emit warning when "mem=nopentium" is specified on any arch other than x86_32 (the only that arch supports it). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86: Fix panic when handling "mem={invalid}" paramKamal Mostafa
commit 77eed821accf5dd962b1f13bed0680e217e49112 upstream. Avoid removing all of memory and panicing when "mem={invalid}" is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on platforms other than x86_32). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systemsAndreas Herrmann
commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12 upstream. On some SB800 systems polarity for IOAPIC pin2 is wrongly specified as low active by BIOS. This caused system hangs after resume from S3 when HPET was used in one-shot mode on such systems because a timer interrupt was missed (HPET signal is high active). For more details see: http://marc.info/?l=linux-kernel&m=129623757413868 Tested-by: Manoj Iyer <manoj.iyer@canonical.com> Tested-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20110224145346.GD3658@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86/pvclock: Zero last_value on resumeJeremy Fitzhardinge
commit e7a3481c0246c8e45e79c629efd63b168e91fcda upstream. If the guest domain has been suspend/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (ie, can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platformsSuresh Siddha
commit f7448548a9f32db38f243ccd4271617758ddfe2c upstream. Markus Kohn ran into a hard hang regression on an acer aspire 1310, when acpi is enabled. git bisect showed the following commit as the bad one that introduced the boot regression. commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 Author: Suresh Siddha <suresh.b.siddha@intel.com> Date: Wed Aug 19 18:05:36 2009 -0700 x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init Because of the UP configuration of that platform, native_smp_prepare_cpus() bailed out (in smp_sanity_check()) before doing the set_mtrr_aps_delayed_init() Further down the boot path, native_smp_cpus_done() will call the delayed MTRR initialization for the AP's (mtrr_aps_init()) with mtrr_aps_delayed_init not set. This resulted in the boot processor reprogramming its MTRR's to the values seen during the start of the OS boot. While this is not needed ideally, this shouldn't have caused any side-effects. This is because the reprogramming of MTRR's (set_mtrr_state() that gets called via set_mtrr()) will check if the live register contents are different from what is being asked to write and will do the actual write only if they are different. BP's mtrr state is read during the start of the OS boot and typically nothing would have changed when we ask to reprogram it on BP again because of the above scenario on an UP platform. So on a normal UP platform no reprogramming of BP MTRR MSR's happens and all is well. However, on this platform, bios seems to be modifying the fixed mtrr range registers between the start of OS boot and when we double check the live registers for reprogramming BP MTRR registers. And as the live registers are modified, we end up reprogramming the MTRR's to the state seen during the start of the OS boot. During ACPI initialization, something in the bios (probably smi handler?) don't like this fact and results in a hard lockup. We didn't see this boot hang issue on this platform before the commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3, because only the AP's (if any) will program its MTRR's to the value that BP had at the start of the OS boot. Fix this issue by checking mtrr_aps_delayed_init before continuing further in the mtrr_aps_init(). Now, only AP's (if any) will program its MTRR's to the BP values during boot. Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393 [ By the way, this behavior of the bios modifying MTRR's after the start of the OS boot is not common and the kernel is not prepared to handle this situation well. Irrespective of this issue, during suspend/resume, linux kernel will try to reprogram the BP's MTRR values to the values seen during the start of the OS boot. So suspend/resume might be already broken on this platform for all linux kernel versions. ] Reported-and-bisected-by: Markus Kohn <jabber@gmx.org> Tested-by: Markus Kohn <jabber@gmx.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Renninger <trenn@novell.com> Cc: Rafael Wysocki <rjw@novell.com> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic modeKenji Kaneshige
commit 086e8ced65d9bcc4a8e8f1cd39b09640f2883f90 upstream. In x2apic mode, we need to set the upper address register of the fault handling interrupt register of the vt-d hardware. Without this irq migration of the vt-d fault handling interrupt is broken. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86: Enable the intr-remap fault handling after local APIC setupKenji Kaneshige
commit 7f7fbf45c6b748074546f7f16b9488ca71de99c1 upstream. Interrupt-remapping gets enabled very early in the boot, as it determines the apic mode that the processor can use. And the current code enables the vt-d fault handling before the setup_local_APIC(). And hence the APIC LDR registers and data structure in the memory may not be initialized. So the vt-d fault handling in logical xapic/x2apic modes were broken. Fix this by enabling the vt-d fault handling in the end_local_APIC_setup() A cleaner fix of enabling fault handling while enabling intr-remapping will be addressed for v2.6.38. [ Enabling intr-remapping determines the usage of x2apic mode and the apic mode determines the fault-handling configuration. ] Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <20101201062244.541996375@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, amd: Fix panic on AMD CPU family 0x15Andreas Herrmann
[The mainline kernel doesn't have this problem. Commit "(23588c3) x86, amd: Add support for CPUID topology extension of AMD CPUs" removed the family check. But 2.6.32.y needs to be fixed.] This CPU family check is not required -- existence of the NodeId MSR is indicated by a CPUID feature flag which is already checked in amd_fixup_dcm() -- and it needlessly prevents amd_fixup_dcm() to be called for newer AMD CPUs. In worst case this can lead to a panic in the scheduler code for AMD family 0x15 multi-node AMD CPUs. I just have a picture of VGA console output so I can't copy-and-paste it herein, but the call stack of such a panic looked like: do_divide_error ... find_busiest_group run_rebalance_domains ... apic_timer_interrupt ... cpu_idle The mainline kernel doesn't have this problem. Commit "(23588c3) x86, amd: Add support for CPUID topology extension of AMD CPUs" removed the family check. But 2.6.32.y needs to be fixed. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, hotplug: Use mwait to offline a processor, fix the legacy caseH. Peter Anvin
upstream ea53069231f9317062910d6e772cca4ce93de8c8 x86, hotplug: Use mwait to offline a processor, fix the legacy case Here included also some small follow-on patches to the same code: upstream a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a x86, hotplug: Move WBINVD back outside the play_dead loop upstream ce5f68246bf2385d6174856708d0b746dc378f20 x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line https://bugzilla.kernel.org/show_bug.cgi?id=5471 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86: Ignore trap bits on single step exceptionsFrederic Weisbecker
commit 6c0aca288e726405b01dacb12cac556454d34b2a upstream. When a single step exception fires, the trap bits, used to signal hardware breakpoints, are in a random state. These trap bits might be set if another exception will follow, like a breakpoint in the next instruction, or a watchpoint in the previous one. Or there can be any junk there. So if we handle these trap bits during the single step exception, we are going to handle an exception twice, or we are going to handle junk. Just ignore them in this case. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332 Reported-by: Michael Stefaniuc <mstefani@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Alexandre Julliard <julliard@winehq.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21acpi-cpufreq: fix a memleak when unloading driverZhang Rui
commit dab5fff14df2cd16eb1ad4c02e83915e1063fece upstream. We didn't free per_cpu(acfreq_data, cpu)->freq_table when acpi_freq driver is unloaded. Resulting in the following messages in /sys/kernel/debug/kmemleak: unreferenced object 0xf6450e80 (size 64): comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s) hex dump (first 32 bytes): 00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00 ......$.......$. 02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00 .....j.......5.. backtrace: [<c123ba97>] kmemleak_alloc+0x27/0x50 [<c109f96f>] __kmalloc+0xcf/0x110 [<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq] [<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0 [<c11920b7>] sysdev_driver_register+0x97/0x110 [<c11cce56>] cpufreq_register_driver+0x86/0x140 [<f9dad080>] 0xf9dad080 [<c1001130>] do_one_initcall+0x30/0x160 [<c10626e9>] sys_init_module+0x99/0x1e0 [<c1002d97>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21 Tested-by: Toralf Forster <toralf.foerster@gmx.de> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, kdump: Change copy_oldmem_page() to use cached addressingCliff Wickman
commit 37a2f9f30a360fb03522d15c85c78265ccd80287 upstream. The copy of /proc/vmcore to a user buffer proceeds much faster if the kernel addresses memory as cached. With this patch we have seen an increase in transfer rate from less than 15MB/s to 80-460MB/s, depending on size of the transfer. This makes a big difference in time needed to save a system dump. Signed-off-by: Cliff Wickman <cpw@sgi.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: kexec@lists.infradead.org LKML-Reference: <E1OtMLz-0001yp-Ia@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, intr-remap: Set redirection hint in the IRTESuresh Siddha
commit 75e3cfbed6f71a8f151dc6e413b6ce3c390030cb upstream. Currently the redirection hint in the interrupt-remapping table entry is set to 0, which means the remapped interrupt is directed to the processors listed in the destination. So in logical flat mode in the presence of intr-remapping, this results in a single interrupt multi-casted to multiple cpu's as specified by the destination bit mask. But what we really want is to send that interrupt to one of the cpus based on the lowest priority delivery mode. Set the redirection hint in the IRTE to '1' to indicate that we want the remapped interrupt to be directed to only one of the processors listed in the destination. This fixes the issue of same interrupt getting delivered to multiple cpu's in the logical flat mode in the presence of interrupt-remapping. While there is no functional issue observed with this behavior, this will impact performance of such configurations (<=8 cpu's using logical flat mode in the presence of interrupt-remapping) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100827181049.013051492@sbsiddha-MOBL3.sc.intel.com> Cc: Weidong Han <weidong.han@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUsAndreas Herrmann
commit 3fdbf004c1706480a7c7fac3c9d836fa6df20d7d upstream. Instead of adapting the CPU family check in amd_special_default_mtrr() for each new CPU family assume that all new AMD CPUs support the necessary bits in SYS_CFG MSR. Tom2Enabled is architectural (defined in APM Vol.2). Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT. In pre K8-NPT BKDG this bit is reserved (read as zero). W/o this adaption Linux would unnecessarily complain about bad MTRR settings on every new AMD CPU family, e.g. [ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123235.GB20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, olpc: Don't retry EC commands foreverPaul Fox
commit 286e5b97eb22baab9d9a41ca76c6b933a484252c upstream. Avoids a potential infinite loop. It was observed once, during an EC hacking/debugging session - not in regular operation. Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: dilinger@queued.net Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, kexec: Make sure to stop all CPUs before exiting the kernelAlok Kataria
commit 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce upstream. x86 smp_ops now has a new op, stop_other_cpus which takes a parameter "wait" this allows the caller to specify if it wants to stop until all the cpus have processed the stop IPI. This is required specifically for the kexec case where we should wait for all the cpus to be stopped before starting the new kernel. We now wait for the cpus to stop in all cases except for panic/kdump where we expect things to be broken and we are doing our best to make things work anyway. This patch fixes a legitimate regression, which was introduced during 2.6.30, by commit id 4ef702c10b5df18ab04921fc252c26421d4d6c75. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21mm, x86: Saving vmcore with non-lazy freeing of vmasCliff Wickman
commit 3ee48b6af49cf534ca2f481ecc484b156a41451d upstream. During the reading of /proc/vmcore the kernel is doing ioremap()/iounmap() repeatedly. And the buildup of un-flushed vm_area_struct's is causing a great deal of overhead. (rb_next() is chewing up most of that time). This solution is to provide function set_iounmap_nonlazy(). It causes a subsequent call to iounmap() to immediately purge the vma area (with try_purge_vmap_area_lazy()). With this patch we have seen the time for writing a 250MB compressed dump drop from 71 seconds to 44 seconds. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: kexec@lists.infradead.org LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampolineHugh Dickins
commit b7d460897739e02f186425b7276e3fdb1595cea7 upstream. rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT laptop: whereas cloning from kernel to low mappings pgd range does need to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY. Signed-off-by: Hugh Dickins <hughd@google.com> LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86-32: Separate 1:1 pagetables from swapper_pg_dirJoerg Roedel
commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream. This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. -v2: switch to swapper_pg_dir right after entering start_secondary() so that we are able to access percpu data which might not be mapped in the trampoline page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <20100816123833.GB28147@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86: detect scattered cpuid features earlierJacob Pan
commit 1dedefd1a066a795a87afca9c0236e1a94de9bf6 upstream. Some extra CPU features such as ARAT is needed in early boot so that x86_init function pointers can be set up properly. http://lkml.org/lkml/2010/5/18/519 At start_kernel() level, this patch moves init_scattered_cpuid_features() from check_bugs() to setup_arch() -> early_cpu_init() which is earlier than platform specific x86_init layer setup. Suggested by HPA. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1274295685-6774-2-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration orderBorislav Petkov
commit 6dcbfe4f0b4e17e289d56fa534b7ce5a6b7f63a3 upstream. This fixes possible cases of not collecting valid error info in the MCE error thresholding groups on F10h hardware. The current code contains a subtle problem of checking only the Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM thresholding group) in its first iteration and breaking out if the bit is cleared. But (!), this MSR contains an offset value, BlkPtr[31:24], which points to the remaining MSRs in this thresholding group which might contain valid information too. But if we bail out only after we checked the valid bit in the first MSR and not the block pointer too, we miss that other information. The thing is, MC4_MISC0[BlkPtr] is not predicated on MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked prior to iterating over the MCI_MISCj thresholding group, irrespective of the MC4_MISC0[Valid] setting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, irq: Plug memory leak in sparse irqThomas Gleixner
commit 1cf180c94e9166cda083ff65333883ab3648e852 upstream. free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this triggers a use after free caused by the fact that copying struct irq_cfg is done with memcpy, which copies the pointer not the cpumask. Fix both places. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21x86, hpet: Fix bogus error check in hpet_assign_irq()Thomas Gleixner
commit 021989622810b02aab4b24f91e1f5ada2b654579 upstream. create_irq() returns -1 if the interrupt allocation failed, but the code checks for irq == 0. Use create_irq_nr() instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-21tracing/x86: Don't use mcount in kvmclock.cSteven Rostedt
commit 258af47479980d8238a04568b94a4e55aa1cb537 upstream. The guest can use the paravirt clock in kvmclock.c which is used by sched_clock(), which in turn is used by the tracing mechanism for timestamps, which leads to infinite recursion. Disable mcount/tracing for kvmclock.o. Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>