lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2011-10-10	x86, nmi: Add in logic to handle multiple events and unknown NMIs	Don Zickus
	Previous patches allow the NMI subsystem to process multipe NMI events in one NMI. As previously discussed this can cause issues when an event triggered another NMI but is processed in the current NMI. This causes the next NMI to go unprocessed and become an 'unknown' NMI. To handle this, we first have to flag whether or not the NMI handler handled more than one event or not. If it did, then there exists a chance that the next NMI might be already processed. Once the NMI is flagged as a candidate to be swallowed, we next look for a back-to-back NMI condition. This is determined by looking at the %rip from pt_regs. If it is the same as the previous NMI, it is assumed the cpu did not have a chance to jump back into a non-NMI context and execute code and instead handled another NMI. If both of those conditions are true then we will swallow any unknown NMI. There still exists a chance that we accidentally swallow a real unknown NMI, but for now things seem better. An optimization has also been added to the nmi notifier rountine. Because x86 can latch up to one NMI while currently processing an NMI, we don't have to worry about executing _all_ the handlers in a standalone NMI. The idea is if multiple NMIs come in, the second NMI will represent them. For those back-to-back NMI cases, we have the potentail to drop NMIs. Therefore only execute all the handlers in the second half of a detected back-to-back NMI. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10	x86, nmi: Wire up NMI handlers to new routines	Don Zickus
	Just convert all the files that have an nmi handler to the new routines. Most of it is straight forward conversion. A couple of places needed some tweaking like kgdb which separates the debug notifier from the nmi handler and mce removes a call to notify_die. [Thanks to Ying for finding out the history behind that mce call https://lkml.org/lkml/2010/5/27/114 And Boris responding that he would like to remove that call because of it https://lkml.org/lkml/2011/9/21/163] The things that get converted are the registeration/unregistration routines and the nmi handler itself has its args changed along with code removal to check which list it is on (most are on one NMI list except for kgdb which has both an NMI routine and an NMI Unknown routine). Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Corey Minyard <minyard@acm.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Corey Minyard <minyard@acm.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10	x86, nmi: Create new NMI handler routines	Don Zickus
	The NMI handlers used to rely on the notifier infrastructure. This worked great until we wanted to support handling multiple events better. One of the key ideas to the nmi handling is to process _all_ the handlers for each NMI. The reason behind this switch is because NMIs are edge triggered. If enough NMIs are triggered, then they could be lost because the cpu can only latch at most one NMI (besides the one currently being processed). In order to deal with this we have decided to process all the NMI handlers for each NMI. This allows the handlers to determine if they recieved an event or not (the ones that can not determine this will be left to fend for themselves on the unknown NMI list). As a result of this change it is now possible to have an extra NMI that was destined to be received for an already processed event. Because the event was processed in the previous NMI, this NMI gets dropped and becomes an 'unknown' NMI. This of course will cause printks that scare people. However, we prefer to have extra NMIs as opposed to losing NMIs and as such are have developed a basic mechanism to catch most of them. That will be a later patch. To accomplish this idea, I unhooked the nmi handlers from the notifier routines and created a new mechanism loosely based on doIRQ. The reason for this is the notifier routines have a couple of shortcomings. One we could't guarantee all future NMI handlers used NOTIFY_OK instead of NOTIFY_STOP. Second, we couldn't keep track of the number of events being handled in each routine (most only handle one, perf can handle more than one). Third, I wanted to eventually display which nmi handlers are registered in the system in /proc/interrupts to help see who is generating NMIs. The patch below just implements the new infrastructure but doesn't wire it up yet (that is the next patch). Its design is based on doIRQ structs and the atomic notifier routines. So the rcu stuff in the patch isn't entirely untested (as the notifier routines have soaked it) but it should be double checked in case I copied the code wrong. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10	x86, nmi: Split out nmi from traps.c	Don Zickus
	The nmi stuff is changing a lot and adding more functionality. Split it out from the traps.c file so it doesn't continue to pollute that file. This makes it easier to find and expand all the future nmi related work. No real functional changes here. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10	perf, intel: Use GO/HO bits in perf-ctr	Gleb Natapov
	Intel does not have guest/host-only bit in perf counters like AMD does. To support GO/HO bits KVM needs to switch EVENTSELn values (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is configured to count only in a guest mode it stays disabled in a host, but VMX is configured to switch it to enabled value during guest entry. This patch adds GO/HO tracking to Intel perf code and provides interface for KVM to get a list of MSRs that need to be switched on a guest entry. Only cpus with architectural PMU (v1 or later) are supported with this patch. To my knowledge there is not p6 models with VMX but without architectural PMU and p4 with VMX are rare and the interface is general enough to support them if need arise. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06	perf, amd: Use GO/HO bits in perf-ctr	Joerg Roedel
	The AMD perf-counters support counting in guest or host-mode only. Make use of that feature when user-space specified guest/host-mode only counting. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317816084-18026-3-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06	Merge branch 'ras' of git://amd64.org/linux/bp into perf/core	Ingo Molnar

2011-10-06	Merge commit 'v3.1-rc9' into perf/core	Ingo Molnar
	Merge reason: pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-28	x86-64: Fix CFI data for interrupt frames	Jan Beulich
	The patch titled "x86: Don't use frame pointer to save old stack on irq entry" did not properly adjust CFI directives, so this patch is a follow-up to that one. With the old stack pointer no longer stored in a callee-saved register (plus some offset), we now have to use a CFA expression to describe the memory location where it is being found. This requires the use of .cfi_escape (allowing arbitrary byte streams to be emitted into .eh_frame), as there is no .cfi_def_cfa_expression (which also cannot reasonably be expected, as it would require a full expression parser). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/4E8360200200007800058467@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-28	apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches	Jan Beulich
	These warnings (generally one per CPU) are a result of initializing x86_cpu_to_logical_apicid while apic_default is still in use, but the check in setup_local_APIC() being done when apic_bigsmp was already used as an override in default_setup_apic_routing(): Overriding APIC driver with bigsmp Enabling APIC mode: Physflat. Using 5 I/O APICs ------------[ cut here ]------------ WARNING: at .../arch/x86/kernel/apic/apic.c:1239 ... CPU 1 irqstacks, hard=f1c9a000 soft=f1c9c000 Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 9e000 Initializing CPU#1 ------------[ cut here ]------------ WARNING: at .../arch/x86/kernel/apic/apic.c:1239 setup_local_APIC+0x137/0x46b() Hardware name: ... CPU1 logical APIC ID: 2 != 8 ... Fix this (for the time being, i.e. until x86_32_early_logical_apicid() will get removed again, as Tejun says ought to be possible) by overriding the previously stored values at the point where the APIC driver gets overridden. v2: Move this and the pre-existing override logic into arch/x86/kernel/apic/bigsmp_32.c. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> (2.6.39 and onwards) Link: http://lkml.kernel.org/r/4E835D16020000780005844C@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-28	x86, amd: Include linux/elf.h since we use stuff from asm/elf.h	Stephen Rothwell
	After merging the moduleh tree, today's linux-next build (x86_64 allmodconfig) failed like this: arch/x86/kernel/sys_x86_64.c:28:10: warning: 'enum align_flags' declared inside parameter list arch/x86/kernel/sys_x86_64.c:28:10: warning: its scope is only this definition or declaration, which is probably not what you want arch/x86/kernel/sys_x86_64.c:28:22: error: parameter 3 ('flags') has incomplete type [...] Presumably caused by the module.h split interacting with a new commit dfb09f9b7ab0 ("x86, amd: Avoid cache aliasing penalties on AMD family 15h") from the x8 tree. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/20110928174214.17a58be15d84d67c185930e1@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-27	x86: Perf_event_amd.c needs <asm/apicdef.h>	Randy Dunlap
	Fix (rare) build error by adding <asm/apicdef.h> header file: arch/x86/kernel/cpu/perf_event_amd.c:350:2: error: 'BAD_APICID' undeclared (first use in this function) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Robert Richter <robert.richter@amd.com> Cc: Andre Przywara <andre.przywara@amd.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/4E820138.90301@xenotime.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-27	doc: fix broken references	Paul Bolle
	There are numerous broken references to Documentation files (in other Documentation files, in comments, etc.). These broken references are caused by typo's in the references, and by renames or removals of the Documentation files. Some broken references are simply odd. Fix these broken references, sometimes by dropping the irrelevant text they were part of. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-26	x86, perf: Clean up perf_event cpu code	Kevin Winchester
	The CPU support for perf events on x86 was implemented via included C files with #ifdefs. Clean this up by creating a new header file and compiling the vendor-specific files as needed. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314747665-2090-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26	Merge commit 'v3.1-rc7' into perf/core	Ingo Molnar
	Merge reason: Pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	x86/rtc: Don't recursively acquire rtc_lock	Matt Fleming
	A deadlock was introduced on x86 in commit ef68c8f87ed1 ("x86: Serialize EFI time accesses on rtc_lock") because efi_get_time() and friends can be called with rtc_lock already held by read_persistent_time(), e.g.: timekeeping_init() read_persistent_clock() <-- acquire rtc_lock efi_get_time() phys_efi_get_time() <-- acquire rtc_lock <DEADLOCK> To fix this let's push the locking down into the get_wallclock() and set_wallclock() implementations. Only the clock implementations that access the x86 RTC directly need to acquire rtc_lock, so it makes sense to push the locking down into the rtc, vrtc and efi code. The virtualization implementations don't require rtc_lock to be held because they provide their own serialization. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Acked-by: Jan Beulich <jbeulich@novell.com> Acked-by: Avi Kivity <avi@redhat.com> [for the virtualization aspect] Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	x86: uv2: Workaround for UV2 Hub bug (system global address format)	Jack Steiner
	This is a workaround for a UV2 hub bug that affects the format of system global addresses. The GRU API for UV2 was inadvertently broken by a hardware change. The format of the physical address used for TLB dropins and for addresses used with instructions running in unmapped mode has changed. This change was not documented and became apparent only when diags failed running on system simulators. For UV1, TLB and GRU instruction physical addresses are identical to socket physical addresses (although high NASID bits must be OR'ed into the address). For UV2, socket physical addresses need to be converted. The NODE portion of the physical address needs to be shifted so that the low bit is in bit 39 or bit 40, depending on an MMR value. It is not yet clear if this bug will be fixed in a silicon respin. If it is fixed, the hub revision will be incremented & the workaround disabled. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-21	x86, ioapic: Consolidate the explicit EOI code	Suresh Siddha
	Consolidate the io-apic EOI code in clear_IO_APIC_pin() and eoi_ioapic_irq(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Rafael Wysocki <rjw@novell.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: lchiquitto@novell.com Cc: jbeulich@novell.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20110825190657.259696697@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	x86, ioapic: Restore the mask bit correctly in eoi_ioapic_irq()	Suresh Siddha
	For older IO-APIC's, we were clearing the remote-IRR by changing the RTE trigger mode to edge and then back to level. We wanted to mask the RTE during this process, so we were essentially doing mask+edge and then to unmask+level. As part of the commit ca64c47cecd0321b2e0dcbd7aaff44b68ce20654, we moved this EOI process earlier where the IO-APIC RTE is masked. So we were wrongly unmasking it in the eoi_ioapic_irq(). So change the remote-IRR clear sequence in eoi_ioapic_irq() to mask + edge and then restore the previous RTE entry which will restore the mask status as well as the level trigger. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Thomas Renninger <trenn@suse.de> Cc: Rafael Wysocki <rjw@novell.com> Cc: lchiquitto@novell.com Cc: jbeulich@novell.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20110825190657.210286410@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC	Suresh Siddha
	In the kdump scenario mentioned below, we can have a case where the device using level triggered interrupt will not generate any interrupts in the kdump kernel. 1. IO-APIC sends a level triggered interrupt to the CPU's local APIC. 2. Kernel crashed before the CPU services this interrupt, leaving the remote-IRR in the IO-APIC set. 3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC initialization. But this fails to reset remote-IRR bit of the IO-APIC RTE as the remote-IRR bit is read-only. 4. Device using that level triggered entry can't generate any more interrupts because of the remote-IRR bit. In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if so do an explicit attempt to clear it (by doing EOI write on modern io-apic's and changing trigger mode to edge/level on older io-apic's). Also before doing the explicit EOI to the io-apic, ensure that the trigger mode is indeed set to level. This will enable the explicit EOI to the io-apic to reset the remote-IRR bit. Tested-by: Leonardo Chiquitto <lchiquitto@novell.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686 Cc: Rafael Wysocki <rjw@novell.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Thomas Renninger <trenn@suse.de> Cc: jbeulich@novell.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	iommu: Rename the DMAR and INTR_REMAP config options	Suresh Siddha
	Change the CONFIG_DMAR to CONFIG_INTEL_IOMMU to be consistent with the other IOMMU options. Rename the CONFIG_INTR_REMAP to CONFIG_IRQ_REMAP to match the irq subsystem name. And define the CONFIG_DMAR_TABLE for the common ACPI DMAR routines shared by both CONFIG_INTEL_IOMMU and CONFIG_IRQ_REMAP. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: youquan.song@intel.com Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.558630224@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	x86, ioapic: Define irq_remap_modify_chip_defaults()	Suresh Siddha
	Define irq_remap_modify_chip_defaults() and remove the duplicate code, cleanup the unnecessary ifdefs. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: youquan.song@intel.com Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.499225692@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	x86, msi, intr-remap: Use the ioapic set affinity routine	Suresh Siddha
	IRQ set affinity routine is same for the IO-APIC IRQ's aswell as the MSI IRQ's in the presence of interrupt-remapping. This is because we modify the interrupt-remapping table entry and doesn't touch the IO-APIC RTE or the MSI entry. So remove the ir_msi_set_affinity() and re-use the ir_ioapic_set_affinity() Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: youquan.song@intel.com Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.452760446@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21	x86, x2apic: Enable the bios request for x2apic optout	Suresh Siddha
	On the platforms which are x2apic and interrupt-remapping capable, Linux kernel is enabling x2apic even if the BIOS doesn't. This is to take advantage of the features that x2apic brings in. Some of the OEM platforms are running into issues because of this, as their bios is not x2apic aware. For example, this was resulting in interrupt migration issues on one of the platforms. Also if the BIOS SMI handling uses APIC interface to send SMI's, then the BIOS need to be aware of x2apic mode that OS has enabled. On some of these platforms, BIOS doesn't have a HW mechanism to turnoff the x2apic feature to prevent OS from enabling it. To resolve this mess, recent changes to the VT-d2 specification: http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf includes a mechanism that provides BIOS a way to request system software to opt out of enabling x2apic mode. Look at the x2apic optout flag in the DMAR tables before enabling the x2apic mode in the platform. Also print a warning that we have disabled x2apic based on the BIOS request. Kernel boot parameter "intremap=no_x2apic_optout" can be used to override the BIOS x2apic optout request. Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: joerg.roedel@amd.com Cc: tony.luck@intel.com Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/20110824001456.171766616@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-15	Merge branch 'master' into for-next	Jiri Kosina
	Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.
2011-09-15	sched: x86_32 Fix typo in switch_to() description	Kamalesh Babulal
	This patch fixes the typo in parameters passed to x86_32 switch_to() description. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-14	x86, mce: Do not call del_timer_sync() in IRQ context	Hidetoshi Seto
	del_timer_sync() can cause a deadlock when called in interrupt context. It is used with on_each_cpu() in some parts for sysfs files like bank*, check_interval, cmci_disabled and ignore_ce. However, use of on_each_cpu() results in calling the function passed as the argument in interrupt context. This causes a flood of nested warnings from del_timer_sync() (it runs on each CPU) caused even by a simple file access like: $ echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval Fortunately, these MCE-specific files are rarely used and AFAIK only few MCE geeks experience this warning. To remove the warning, move timer deletion outside of the interrupt context. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-09-13	locking, x86: mce: Annotate cmci_discover_lock as raw	Thomas Gleixner
	The cmci_discover_lock can be taken in atomic context (cpu bring up sequence) and therefore cannot be preempted on -rt. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-12	x86: cache_info: Update calculation of AMD L3 cache indices	Frank Arnold
	L3 subcaches 0 and 1 of AMD Family 15h CPUs can have a size of 2MB. Update the calculation routine for the number of L3 indices to reflect that. Signed-off-by: Frank Arnold <frank.arnold@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Rosenfeld Hans <Hans.Rosenfeld@amd.com> Cc: Herrmann3 Andreas <Andreas.Herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Cc: Frank Arnold <Frank.Arnold@amd.com> Link: http://lkml.kernel.org/r/20110726170449.GB32536@aftab Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-12	x86: cache_info: Kill the atomic allocation in amd_init_l3_cache()	Thomas Gleixner
	It's not a good reason to allocate memory in the smp function call just because someone thought it's the most conveniant place. The AMD L3 data is coupled to the northbridge info by a pointer to the corresponding north bridge data. So allocating it with the northbridge data and referencing the northbridge in the cache_info code instead uses less memory and gets rid of that atomic allocation hack in the smp function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20110723212626.688229918@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-12	x86: cache_info: Kill the moronic shadow struct	Thomas Gleixner
	Commit f9b90566c ("x86: reduce stack usage in init_intel_cacheinfo") introduced a shadow structure to reduce the stack usage on large machines instead of making the smaller structure embedded into the large one. That's definitely a candidate for the bad taste award. Move the small struct into the large one and get rid of the ugly type casts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20110723212626.625651773@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-12	x86: cache_info: Remove bogus free of amd_l3_cache data	Thomas Gleixner
	free_cache_attributes() kfree's: per_cpu(ici_cpuid4_info, cpu)->l3 which is a pointer to memory which was allocated as a block in amd_init_l3_cache(). l3 of a particular cpu points to a part of this memory blob. The part and the rest of the blob are still referenced by other cpus. As far as I can tell from the git history this is a leftover from the conversion from per cpu to node data with commit ba06edb63(x86, cacheinfo: Make L3 cache info per node) and the following commit f658bcfb2(x86, cacheinfo: Cleanup L3 cache index disable support) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20110723212626.550539989@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-08	x86: Hyper-V: Integrate the clocksource with Hyper-V detection code	K. Y. Srinivasan
	The Hyper-V clocksource driver is best integrated with Hyper-V detection code since: (a) Linux guests running on Hyper-V require it (b) Integration into that code significanly reduces code size Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Cc: gregkh@suse.de Cc: devel@linuxdriverproject.org Cc: virtualization@lists.osdl.org Link: http://lkml.kernel.org/r/1315434310-4827-1-git-send-email-kys@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-08-31	x86, perf: Check that current->mm is alive before getting user callchain	Andrey Vagin
	An event may occur when an mm is already released. I added an event in dequeue_entity() and caught a panic with the following backtrace: [ 434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120 ... [ 434.421258] Call Trace: [ 434.421258] [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0 [ 434.421258] [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90 [ 434.421258] [<ffffffff8101b048>] perf_callchain_user+0x128/0x170 [ 434.421258] [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100 [ 434.421258] [<ffffffff81116690>] perf_prepare_sample+0x200/0x280 [ 434.421258] [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290 [ 434.421258] [<ffffffff81065240>] ? tg_shares_up+0x0/0x670 [ 434.421258] [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0 [ 434.421258] [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0 [ 434.421258] [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250 [ 434.421258] [<ffffffff81119204>] perf_tp_event+0x44/0x70 [ 434.421258] [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110 [ 434.421258] [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0 [ 434.421258] [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60 [ 434.421258] [<ffffffff8105818a>] dequeue_task+0x9a/0xb0 [ 434.421258] [<ffffffff810581e2>] deactivate_task+0x42/0xe0 [ 434.421258] [<ffffffff814bc019>] thread_return+0x191/0x808 [ 434.421258] [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60 [ 434.421258] [<ffffffff8106f4c4>] do_exit+0x464/0x910 [ 434.421258] [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0 [ 434.421258] [<ffffffff8106fa57>] sys_exit_group+0x17/0x20 [ 434.421258] [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-26	All Arch: remove linkage for sys_nfsservctl system call	NeilBrown
	The nfsservctl system call is now gone, so we should remove all linkage for it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25	x86-32: Fix boot with CONFIG_X86_INVD_BUG	Andy Lutomirski
	entry_32.S contained a hardcoded alternative instruction entry, and the format changed in commit 59e97e4d6fbc ("x86: Make alternative instruction pointers relative"). Replace the hardcoded entry with the altinstruction_entry macro. This fixes the 32-bit boot with CONFIG_X86_INVD_BUG=y. Reported-and-tested-by: Arnaud Lacombe <lacombar@gmail.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25	mtrr: fix UP breakage caused during switch to stop_machine	Tejun Heo
	While removing custom rendezvous code and switching to stop_machine, commit 192d8857427d ("x86, mtrr: use stop_machine APIs for doing MTRR rendezvous") completely dropped mtrr setting code on !CONFIG_SMP breaking MTRR settting on UP. Fix it by removing the incorrect CONFIG_SMP. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Anders Eriksson <aeriksson@fastmail.fm> Tested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-23	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, vdso: On system call restart after SYSENTER, use int $0x80 x86, UV: Remove UV delay in starting slave cpus x86, olpc: Wait for last byte of EC command to be accepted
2011-08-23	x86, mm, trivial: Remove unnecessary get_order() in free_thread_info()	Zhao Jin
	Because THREAD_SIZE is defined as PAGE_SIZE << THREAD_ORDER on x86, the call of get_order(THREAD_SIZE) can be replaced with THREAD_ORDER. Signed-off-by: Zhao Jin <cronozhj@gmail.com> Link: http://lkml.kernel.org/r/4E4FB5A9.700@gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-19	x86: jump_label: arch_jump_label_text_poke_early: add missing __init	Kevin Winchester
	arch_jump_label_text_poke_early calls text_poke_early, which is an __init function. Thus arch_jump_label_text_poke_early should be the same. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: jbaron@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1313539478-30303-1-git-send-email-kjwinchester@gmail.com [ Use __init_or_module instead of __init ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-18	Merge branch 'perf/urgent' into perf/core	Ingo Molnar
	Merge reason: add the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-14	perf, x86: Avoid kfree() in CPU_STARTING	Peter Zijlstra
	On -rt kfree() can schedule, but CPU_STARTING is before the CPU is fully up and running. These are contradictory, so avoid it. Instead push the kfree() to CPU_ONLINE where we're free to schedule. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kwd4j6ayld5thrscvaxgjquv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-12	Merge branch 'x86-vdso-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip: x86-64: Rework vsyscall emulation and add vsyscall= parameter x86-64: Wire up getcpu syscall x86: Remove unnecessary compile flag tweaks for vsyscall code x86-64: Add vsyscall:emulate_vsyscall trace event x86-64: Add user_64bit_mode paravirt op x86-64, xen: Enable the vvar mapping x86-64: Work around gold bug 13023 x86-64: Move the "user" vsyscall segment out of the data segment. x86-64: Pad vDSO to a page boundary
2011-08-10	x86-64: Rework vsyscall emulation and add vsyscall= parameter	Andy Lutomirski
	There are three choices: vsyscall=native: Vsyscalls are native code that issues the corresponding syscalls. vsyscall=emulate (default): Vsyscalls are emulated by instruction fault traps, tested in the bad_area path. The actual contents of the vsyscall page is the same as the vsyscall=native case except that it's marked NX. This way programs that make assumptions about what the code in the page does will not be confused when they read that code. vsyscall=none: Trying to execute a vsyscall will segfault. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-10	x86: Remove unnecessary compile flag tweaks for vsyscall code	Andy Lutomirski
	As of commit 98d0ac38ca7b1b7a552c9a2359174ff84decb600 Author: Andy Lutomirski <luto@mit.edu> Date: Thu Jul 14 06:47:22 2011 -0400 x86-64: Move vread_tsc and vread_hpet into the vDSO user code no longer directly calls into code in arch/x86/kernel/, so we don't need compile flag hacks to make it safe. All vdso code is in the vdso directory now. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/835cd05a4c7740544d09723d6ba48f4406f9826c.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-10	x86, amd: Include elf.h explicitly, prepare the code for the module.h split	Stephen Rothwell
	When the moduleu.h splitting tree is merged to the latest tip:x86/cpu tree, the x86_64 allmodconfig build fails like this: arch/x86/kernel/cpu/amd.c: In function 'bsp_init_amd': arch/x86/kernel/cpu/amd.c:437:3: error: 'va_align' undeclared (first use in this function) arch/x86/kernel/cpu/amd.c:438:23: error: 'ALIGN_VA_32' undeclared (first use in this function) arch/x86/kernel/cpu/amd.c:438:37: error: 'ALIGN_VA_64' undeclared (first use in this function) This is caused by the module.h split up intreacting with commit dfb09f9b7ab0 ("x86, amd: Avoid cache aliasing penalties on AMD family 15h") from the tip:x86/cpu tree. I have added the following patch for today (this, or something similar, could be applied to the tip tree directly - the export.h include below was added by the module.h splitup). So include elf.h to use va_align and remove this implicit dependency on module.h doing it for us. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/20110810114956.238d66772883636e3040d29f@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-09	perf, x86: Add model 45 SandyBridge support	Youquan Song
	Add support to Romely-EP SandyBridge. Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Anhua Xu <anhua.xu@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1312264895-2010-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-06	x86-32, amd: Move va_align definition to unbreak 32-bit build	Borislav Petkov
	hpa reported that dfb09f9b7ab03fd367740e541a5caf830ed56726 breaks 32-bit builds with the following error message: /home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined reference to `va_align' /home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined reference to `va_align' This is due to the fact that va_align is a global in a 64-bit only compilation unit. Move it to mmap.c where it is visible to both subarches. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1312633899-1131-1-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-08-05	x86, UV: Remove UV delay in starting slave cpus	Jack Steiner
	Delete the 10 msec delay between the INIT and SIPI when starting slave cpus. I can find no requirement for this delay. BIOS also has similar code sequences without the delay. Removing the delay reduces boot time by 40 sec. Every bit helps. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/20110805140900.GA6774@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-05	x86, amd: Move BSP code to cpu_dev helper	Borislav Petkov
	Move code which is run once on the BSP during boot into the cpu_dev helper. [ hpa: removed bogus cpu_has -> static_cpu_has conversion ] Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/20110805180409.GC26217@aftab Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>