summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2010-02-12sparsemem: Put mem map for one node together.Yinghai Lu
Add vmemmap_alloc_block_buf for mem map only. It will fallback to the old way if it cannot get a block that big. Before this patch, when a node have 128g ram installed, memmap are split into two parts or more. [ 0.000000] [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1 [ 0.000000] [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1 [ 0.000000] [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0 [ 0.000000] [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0 [ 0.000000] [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0 [ 0.000000] [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2 [ 0.000000] [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2 [ 0.000000] [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2 [ 0.000000] [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3 [ 0.000000] [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3 [ 0.000000] [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4 [ 0.000000] [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4 [ 0.000000] [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5 [ 0.000000] [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5 [ 0.000000] [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5 [ 0.000000] [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6 [ 0.000000] [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6 [ 0.000000] [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6 [ 0.000000] [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7 [ 0.000000] [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7 after patch will get [ 0.000000] [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0 [ 0.000000] [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1 [ 0.000000] [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2 [ 0.000000] [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3 [ 0.000000] [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4 [ 0.000000] [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5 [ 0.000000] [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6 [ 0.000000] [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7 -v2: change buf to vmemmap_buf instead according to Ingo also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo -v3: according to Andrew, use sizeof(name) instead of hard coded 15 Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-19-git-send-email-yinghai@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-12x86: Make 64 bit use early_res instead of bootmem before slabYinghai Lu
Finally we can use early_res to replace bootmem for x86_64 now. Still can use CONFIG_NO_BOOTMEM to enable it or not. -v2: fix 32bit compiling about MAX_DMA32_PFN -v3: folded bug fix from LKML message below Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B747239.4070907@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Only call dma32_reserve_bootmem 64bit !CONFIG_NUMAYinghai Lu
64bit NUMA already make enough space under 4G with new early_node_mem. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-16-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Make early_node_mem get mem > 4 GB if possibleYinghai Lu
So we could put pgdata for the node high, and later sparse vmmap will get the section nr that need. With this patch will make <4 GB ram not use a sparse vmmap. before this patch, will get, before swiotlb try get bootmem [ 0.000000] nid=1 start=0 end=2080000 aligned=1 [ 0.000000] free [10 - 96] [ 0.000000] free [b12 - 1000] [ 0.000000] free [359f - 38a3] [ 0.000000] free [38b5 - 3a00] [ 0.000000] free [41e01 - 42000] [ 0.000000] free [73dde - 73e00] [ 0.000000] free [73fdd - 74000] [ 0.000000] free [741dd - 74200] [ 0.000000] free [743dd - 74400] [ 0.000000] free [745dd - 74600] [ 0.000000] free [747dd - 74800] [ 0.000000] free [749dd - 74a00] [ 0.000000] free [74bdd - 74c00] [ 0.000000] free [74ddd - 74e00] [ 0.000000] free [74fdd - 75000] [ 0.000000] free [751dd - 75200] [ 0.000000] free [753dd - 75400] [ 0.000000] free [755dd - 75600] [ 0.000000] free [757dd - 75800] [ 0.000000] free [759dd - 75a00] [ 0.000000] free [75bdd - 7bf5f] [ 0.000000] free [7f730 - 7f750] [ 0.000000] free [100000 - 2080000] [ 0.000000] total free 1f87170 [ 93.301474] Placing 64MB software IO TLB between ffff880075bdd000 - ffff880079bdd000 [ 93.311814] software IO TLB at phys 0x75bdd000 - 0x79bdd000 with this patch will get: before swiotlb try get bootmem [ 0.000000] nid=1 start=0 end=2080000 aligned=1 [ 0.000000] free [a - 96] [ 0.000000] free [702 - 1000] [ 0.000000] free [359f - 3600] [ 0.000000] free [37de - 3800] [ 0.000000] free [39dd - 3a00] [ 0.000000] free [3bdd - 3c00] [ 0.000000] free [3ddd - 3e00] [ 0.000000] free [3fdd - 4000] [ 0.000000] free [41dd - 4200] [ 0.000000] free [43dd - 4400] [ 0.000000] free [45dd - 4600] [ 0.000000] free [47dd - 4800] [ 0.000000] free [49dd - 4a00] [ 0.000000] free [4bdd - 4c00] [ 0.000000] free [4ddd - 4e00] [ 0.000000] free [4fdd - 5000] [ 0.000000] free [51dd - 5200] [ 0.000000] free [53dd - 5400] [ 0.000000] free [55dd - 7bf5f] [ 0.000000] free [7f730 - 7f750] [ 0.000000] free [100428 - 100600] [ 0.000000] free [13ea01 - 13ec00] [ 0.000000] free [170800 - 2080000] [ 0.000000] total free 1f87170 [ 92.689485] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [ 92.699799] Placing 64MB software IO TLB between ffff8800055dd000 - ffff8800095dd000 [ 92.710916] software IO TLB at phys 0x55dd000 - 0x95dd000 so will get enough space below 4G, aka pfn 0x100000 Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-15-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Dynamically increase early_res array sizeYinghai Lu
Use early_res_count to track the num, and use find_e820 to get a new buffer, then copy from the old to the new one. Also, clear early_res to prevent later invalid usage. -v2 _check_and_double_early_res should take new start Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-14-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Introduce max_early_res and early_res_countYinghai Lu
To prepare allocate early res array from fine_e820_area. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-13-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Call early_res_to_bootmem one timeYinghai Lu
Simplify setup_node_mem: don't use bootmem from other node, instead just find_e820_area in early_node_mem. This keeps the boundary between early_res and boot mem more clear, and lets us only call early_res_to_bootmem() one time instead of for all nodes. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-12-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Print out RAM buffer informationYinghai Lu
So we can check that early in the bootlog. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-11-git-send-email-yinghai@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Change range end to start+sizeYinghai Lu
So make interface more consistent with early_res. Later we can share some code with early_res. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-10-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86/pci: Enable pci root res read out for 32bit tooYinghai Lu
Should be good for 32bit too. -v3: cast res->start -v4: according to Linus, to use %pR instead of cast Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-9-git-send-email-yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86/pci: Add cap_resource()Yinghai Lu
Prepare for 32bit pci root bus -v2: hpa said we should compare with (resource_size_t)~0 -v3: according to Linus to use MAX_RESOURCE instead. also need need to put related patches together -v4: according to Andrew, use min in cap_resource() Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-8-git-send-email-yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86/pci: Use u64 instead of size_t in amd_bus.cYinghai Lu
Prepare to enable it for 32bit. -v2: remove not needed cast Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-7-git-send-email-yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86/pci: AMD one chain system to use pci read out resYinghai Lu
Found MSI amd k8 based laptops is hiding [0x70000000, 0x80000000) RAM from e820. enable amd one chain even for all. -v2: use bool for found, according to Andrew Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-6-git-send-email-yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86/pci: Use resource_size_t in update_resYinghai Lu
Prepare to enable 32bit intel and amd bus. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-5-git-send-email-yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Move range related operation to one fileYinghai Lu
We have almost the same code for mtrr cleanup and amd_bus checkup, and this code will also be used in replacing bootmem with early_res, so try to move them together and reuse it from different parts. Also rename update_range to subtract_range as that is what the function is actually doing. -v2: update comments as Christoph requested Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-4-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10Merge remote branch 'linus/master' into x86/bootmemH. Peter Anvin
2010-02-10Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PIT: control word is write-only kvmclock: count total_sleep_time when updating guest clock Export the symbol of getboottime and mmonotonic_to_bootbased
2010-02-09KVM: PIT: control word is write-onlyMarcelo Tosatti
PIT control word (address 0x43) is write-only, reads are undefined. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09kvmclock: count total_sleep_time when updating guest clockJason Wang
Current kvm wallclock does not consider the total_sleep_time which could cause wrong wallclock in guest after host suspend/resume. This patch solve this issue by counting total_sleep_time to get the correct host boot time. Cc: stable@kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-08Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Fix ondemand to not request targets outside policy limits [CPUFREQ] Fix use after free of struct powernow_k8_data [CPUFREQ] fix default value for ondemand governor
2010-02-02memory hotplug: fix a bug on /dev/mem for 64-bit kernelsShaohui Zheng
Newly added memory can not be accessed via /dev/mem, because we do not update the variables high_memory, max_pfn and max_low_pfn. Add a function update_end_of_memory_vars() to update these variables for 64-bit kernels. [akpm@linux-foundation.org: simplify comment] Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Li Haicheng <haicheng.li@intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-01x86: Use the generic page_is_ram()Wu Fengguang
The generic resource based page_is_ram() works better with memory hotplug/hotremove. So switch the x86 e820map based code to it. CC: Andi Kleen <andi@firstfloor.org> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100122033004.470767217@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-01x86: Remove BIOS data range from e820Yinghai Lu
In preparation for moving to the generic page_is_ram(), make explicit what we expect to be reserved and not reserved. Tested-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20100122033004.335813103@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-01Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, hw_breakpoint, kgdb: Do not take mutex for kernel debugger x86, hw_breakpoints, kgdb: Fix kgdb to use hw_breakpoint API hw_breakpoints: Release the bp slot if arch_validate_hwbkpt_settings() fails. perf: Ignore perf.data.old perf report: Fix segmentation fault when running with '-g none'
2010-02-01Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/agp: Fix agp_amd64_init regression x86: Add quirk for Intel DG45FC board to avoid low memory corruption x86: Add Dell OptiPlex 760 reboot quirk x86, UV: Fix RTC latency bug by reading replicated cachelines oprofile/x86: add Xeon 7500 series support oprofile/x86: fix crash when profiling more than 28 events lib/dma-debug.c: mark file-local struct symbol static. x86/amd-iommu: Fix deassignment of a device from the pt_domain x86/amd-iommu: Fix IOMMU-API initialization for iommu=pt x86/amd-iommu: Fix NULL pointer dereference in __detach_device() x86/amd-iommu: Fix possible integer overflow
2010-01-30perf, hw_breakpoint, kgdb: Do not take mutex for kernel debuggerJason Wessel
This patch fixes the regression in functionality where the kernel debugger and the perf API do not nicely share hw breakpoint reservations. The kernel debugger cannot use any mutex_lock() calls because it can start the kernel running from an invalid context. A mutex free version of the reservation API needed to get created for the kernel debugger to safely update hw breakpoint reservations. The possibility for a breakpoint reservation to be concurrently processed at the time that kgdb interrupts the system is improbable. Should this corner case occur the end user is warned, and the kernel debugger will prohibit updating the hardware breakpoint reservations. Any time the kernel debugger reserves a hardware breakpoint it will be a system wide reservation. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: kgdb-bugreport@lists.sourceforge.net Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: torvalds@linux-foundation.org LKML-Reference: <1264719883-7285-3-git-send-email-jason.wessel@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-30x86, hw_breakpoints, kgdb: Fix kgdb to use hw_breakpoint APIJason Wessel
In the 2.6.33 kernel, the hw_breakpoint API is now used for the performance event counters. The hw_breakpoint_handler() now consumes the hw breakpoints that were previously set by kgdb arch specific code. In order for kgdb to work in conjunction with this core API change, kgdb must use some of the low level functions of the hw_breakpoint API to install, uninstall, and deal with hw breakpoint reservations. The kgdb core required a change to call kgdb_disable_hw_debug anytime a slave cpu enters kgdb_wait() in order to keep all the hw breakpoints in sync as well as to prevent hitting a hw breakpoint while kgdb is active. During the architecture specific initialization of kgdb, it will pre-allocate 4 disabled (struct perf event **) structures. Kgdb will use these to manage the capabilities for the 4 hw breakpoint registers, per cpu. Right now the hw_breakpoint API does not have a way to ask how many breakpoints are available, on each CPU so it is possible that the install of a breakpoint might fail when kgdb restores the system to the run state. The intent of this patch is to first get the basic functionality of hw breakpoints working and leave it to the person debugging the kernel to understand what hw breakpoints are in use and what restrictions have been imposed as a result. Breakpoint constraints will be dealt with in a future patch. While atomic, the x86 specific kgdb code will call arch_uninstall_hw_breakpoint() and arch_install_hw_breakpoint() to manage the cpu specific hw breakpoints. The net result of these changes allow kgdb to use the same pool of hw_breakpoints that are used by the perf event API, but neither knows about future reservations for the available hw breakpoint slots. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: kgdb-bugreport@lists.sourceforge.net Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: torvalds@linux-foundation.org LKML-Reference: <1264719883-7285-2-git-send-email-jason.wessel@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29x86: Add quirk for Intel DG45FC board to avoid low memory corruptionDavid Härdeman
Commit 6aa542a694dc9ea4344a8a590d2628c33d1b9431 added a quirk for the Intel DG45ID board due to low memory corruption. The Intel DG45FC shares the same BIOS (and the same bug) as noted in: http://bugzilla.kernel.org/show_bug.cgi?id=13736 Signed-off-by: David Härdeman <david@hardeman.nu> LKML-Reference: <20100128200254.GA9134@hardeman.nu> Cc: <stable@kernel.org> Cc: Alexey Fisher <bug-track@fisher-privat.net> Cc: ykzhao <yakui.zhao@intel.com> Cc: Tony Bones <aabonesml@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-29x86: get rid of the insane TIF_ABI_PENDING bitH. Peter Anvin
Now that the previous commit made it possible to do the personality setting at the point of no return, we do just that for ELF binaries. And suddenly all the reasons for that insane TIF_ABI_PENDING bit go away, and we can just make SET_PERSONALITY() just do the obvious thing for a 32-bit compat process. Everything becomes much more straightforward this way. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29Split 'flush_old_exec' into two functionsLinus Torvalds
'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86/PCI: remove IOH range fetching PCI: fix nested spinlock hang in aer_inject
2010-01-28x86/PCI: remove IOH range fetchingJeff Garrett
Turned out to cause trouble on single IOH machines, and is superceded by _CRS on multi-IOH machines with production BIOSes. Signed-off-by: Jeff Garrett <jeff@jgarrett.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-01-27x86: Add Dell OptiPlex 760 reboot quirkLeann Ogasawara
Dell OptiPlex 760 hangs on reboot unless reboot=bios is used. Add quirk to reboot through the BIOS. BugLink: https://bugs.launchpad.net/bugs/488319 Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com> LKML-Reference: <1264634958.27335.1091.camel@emiko> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-27Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, msr/cpuid: Pass the number of minors when unregistering MSR and CPUID drivers. x86: Remove "x86 CPU features in debugfs" (CONFIG_X86_CPU_DEBUG) Revert "x86: ucode-amd: Load ucode-patches once ..." x86: Disable HPET MSI on ATI SB700/SB800 x86: Set hotpluggable nodes in nodes_possible_map
2010-01-27x86, UV: Fix RTC latency bug by reading replicated cachelinesDimitri Sivanich
For SGI UV node controllers (HUB) rev 2.0 or greater, use replicated cachelines to read the RTC timer. This optimization allows faster simulataneous reads from a given socket. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Jack Steiner <steiner@sgi.com> LKML-Reference: <20100122154140.GB4975@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-27Merge branch 'iommu/fixes' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent
2010-01-27Merge branch 'urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into x86/urgent
2010-01-26x86, msr/cpuid: Pass the number of minors when unregistering MSR and CPUID ↵Russ Anderson
drivers. Pass the number of minors when unregistering MSR and CPUID drivers. Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Dean Nelson <dnelson@redhat.com> LKML-Reference: <20100127023722.GA22305@sgi.com> Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-25oprofile/x86: add Xeon 7500 series supportAndi Kleen
Add Xeon 7500 series support to oprofile. Straight forward: it's the same as Core i7, so just detect the model number. No user space changes needed. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-01-25oprofile/x86: fix crash when profiling more than 28 eventsSuravee Suthikulpanit
With multiplexing enabled oprofile crashs when profiling more than 28 events. This patch fixes this. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-01-25KVM: x86: Fix leak of free lapic date in kvm_arch_vcpu_init()Wei Yongjun
In function kvm_arch_vcpu_init(), if the memory malloc for vcpu->arch.mce_banks is fail, it does not free the memory of lapic date. This patch fixed it. Cc: stable@kernel.org Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: x86: Fix probable memory leak of vcpu->arch.mce_banksWei Yongjun
vcpu->arch.mce_banks is malloc in kvm_arch_vcpu_init(), but never free in any place, this may cause memory leak. So this patch fixed to free it in kvm_arch_vcpu_uninit(). Cc: stable@kernel.org Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: MMU: bail out pagewalk on kvm_read_guest errorMarcelo Tosatti
Exit the guest pagetable walk loop if reading gpte failed. Otherwise its possible to enter an endless loop processing the previous present pte. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: x86: Fix host_mapping_level()Sheng Yang
When found a error hva, should not return PAGE_SIZE but the level... Also clean up the coding style of the following loop. Cc: stable@kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25KVM: Fix race between APIC TMR and IRRAvi Kivity
When we queue an interrupt to the local apic, we set the IRR before the TMR. The vcpu can pick up the IRR and inject the interrupt before setting the TMR, and perhaps even EOI it, causing incorrect behaviour. The race is really insignificant since it can only occur on the first interrupt (usually following interrupts will not change TMR), but it's better closed than open. Fixed by reordering setting the TMR vs IRR. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-23x86: Remove "x86 CPU features in debugfs" (CONFIG_X86_CPU_DEBUG)H. Peter Anvin
CONFIG_X86_CPU_DEBUG, which provides some parsed versions of the x86 CPU configuration via debugfs, has caused boot failures on real hardware. The value of this feature has been marginal at best, as all this information is already available to userspace via generic interfaces. Causes crashes that have not been fixed + minimal utility -> remove. See the referenced LKML thread for more information. Reported-by: Ozan Çağlayan <ozan@pardus.org.tr> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <alpine.LFD.2.00.1001221755320.13231@localhost.localdomain> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Yinghai Lu <yinghai@kernel.org> Cc: <stable@kernel.org>
2010-01-23Revert "x86: ucode-amd: Load ucode-patches once ..."Andreas Herrmann
Commit d1c84f79a6ba992dc01e312c44a21496303874d6 leads to a regression when microcode_amd.c is compiled into the kernel. It causes a big boot delay because the firmware is not available. See http://marc.info/?l=linux-kernel&m=126267290920060 It also renders the reload sysfs attribute useless. Fixing this is too intrusive for an -rc5 kernel. Thus I'd like to restore the microcode loading behaviour of kernel 2.6.32. CC: Gene Heskett <gene.heskett@verizon.net> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100122203456.GB13792@alberich.amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-23x86: Disable HPET MSI on ATI SB700/SB800Pallipadi, Venkatesh
HPET MSI on platforms with ATI SB700/SB800 as they seem to have some side-effects on floppy DMA. Do not use HPET MSI on such platforms. Original problem report from Mark Hounschell http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html [ This patch needs to go to stable as well. But, there are some conflicts that prevents the patch from going as is. I can rebase/resubmit to stable once the patch goes upstream. hpa: still Cc:'ing stable@ as an FYI. ] Tested-by: Mark Hounschell <markh@compro.net> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: <stable@kernel.org> LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-23x86: Set hotpluggable nodes in nodes_possible_mapDavid Rientjes
nodes_possible_map does not currently include nodes that have SRAT entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is cleared in nodes_parsed if it does not have an online address range. Unequivocally setting the bit in nodes_parsed is insufficient since existing code, such as acpi_get_nodes(), assumes all nodes in the map have online address ranges. In fact, all code using nodes_parsed assumes such nodes represent an address range of online memory. nodes_possible_map is created by unioning nodes_parsed and cpu_nodes_parsed; the former represents nodes with online memory and the latter represents memoryless nodes. We now set the bit for hotpluggable nodes in cpu_nodes_parsed so that it also gets set in nodes_possible_map. [ hpa: Haicheng Li points out that this makes the naming of the variable cpu_nodes_parsed somewhat counterintuitive. However, leave it as is in the interest of keeping the pure bug fix patch small. ] Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <alpine.DEB.2.00.1001201152040.30528@chino.kir.corp.google.com> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-22x86/amd-iommu: Fix deassignment of a device from the pt_domainJoerg Roedel
Deassigning a device from the passthrough domain does not work and breaks device assignment to kvm guests. This patch fixes the issue. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>