summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2008-03-05parisc: fix IOMMU's device boundary overflow bug on 32bits archFUJITA Tomonori
On 32bits boxes, boundary_size becomes zero due to a overflow and we hit BUG_ON in iommu_is_span_boundary. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Matthew Wilcox <matthew@wil.cx> Acked-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-05cpusets: fix obsolete commentDavid Rientjes
mm migration is no longer done in cpuset_update_task_memory_state() so it can no longer take current->mm->mmap_sem, so fix the obsolete comment. [ This changed in commit 04c19fa6f16047abff2288ddbc1f0798ede5a849 ("cpuset: migrate all tasks in cpuset at once") when the mm migration was moved from cpuset_update_task_memory_state() to update_nodemask() ] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (27 commits) [SCSI] mpt fusion: don't oops if NumPhys==0 [SCSI] iscsi class: regression - fix races with state manipulation and blocking/unblocking [SCSI] qla4xxx: regression - add start scan callout [SCSI] qla4xxx: fix host reset dpc race [SCSI] tgt: fix build errors when dprintk is defined [SCSI] tgt: set the data length properly [SCSI] tgt: stop zero'ing scsi_cmnd [SCSI] ibmvstgt: set up scsi_host properly before __scsi_alloc_queue [SCSI] docbook: fix fusion source files [SCSI] docbook: fix scsi source file [SCSI] qla2xxx: Update version number to 8.02.00-k9. [SCSI] qla2xxx: Correct usage of inconsistent timeout values while issuing ELS commands. [SCSI] qla2xxx: Correct discrepancies during OVERRUN handling on FWI2-capable cards. [SCSI] qla2xxx: Correct needless clean-up resets during shutdown. [SCSI] arcmsr: update version and changelog [SCSI] ps3rom: disable clustering [SCSI] ps3rom: fix wrong resid calculation bug [SCSI] mvsas: fix phy sas address [SCSI] gdth: fix to internal commands execution [SCSI] gdth: bugfix for the at-exit problems ...
2008-03-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: NFS: use new LSM interfaces to explicitly set mount options LSM/SELinux: Interfaces to allow FS to control mount options
2008-03-05Merge branch 'fixes-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes-25' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] fix section mismatch warnings [CPUFREQ] Remove debugging message from e_powersaver [CPUFREQ] Fix missing cpufreq_cpu_put() call in ->store [CPUFREQ] Fix missing cpufreq_cpu_put() call in ->show
2008-03-05Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] incorrect reipl nss name. [S390] Load disabled wait psw if reipl fails. [S390] Fix IPL from NSS. [S390] zcrypt: fix ap_device_list handling [S390] sclp_vt220: speed up console output for interactive work [S390] dasd: fix reference counting in display method for proc/dasd/devices [S390] dasd: let dasd erp matching recognize alias recovery [S390] Get rid of memcpy gcc warning workaround. [S390] idle: Fix machine check handling in idle loop. [S390] Update default configuration.
2008-03-06NFS: use new LSM interfaces to explicitly set mount optionsEric Paris
NFS and SELinux worked together previously because SELinux had NFS specific knowledge built in. This design was approved by both groups back in 2004 but the recent NFS changes to use nfs_parsed_mount_data and the usage of nfs_clone_mount_data showed this to be a poor fragile solution. This patch fixes the NFS functionality regression by making use of the new LSM interfaces to allow an FS to explicitly set its own mount options. The explicit setting of mount options is done in the nfs get_sb functions which are called before the generic vfs hooks try to set mount options for filesystems which use text mount data. This does not currently support NFSv4 as that functionality did not exist in previous kernels and thus there is no regression. I will be adding the needed code, which I believe to be the exact same as the v3 code, in nfs4_get_sb for 2.6.26. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-03-06LSM/SELinux: Interfaces to allow FS to control mount optionsEric Paris
Introduce new LSM interfaces to allow an FS to deal with their own mount options. This includes a new string parsing function exported from the LSM that an FS can use to get a security data blob and a new security data blob. This is particularly useful for an FS which uses binary mount data, like NFS, which does not pass strings into the vfs to be handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK when dealing with binary mount data. If the binary mount data is less than one page the copy_page() in security_sb_copy_data() can cause an illegal page fault and boom. Remove all NFSisms from the SELinux code since they were broken by past NFS changes. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-03-05[SCSI] mpt fusion: don't oops if NumPhys==0Krzysztof Oledzki
Don't oops if NumPhys==0, instead return -ENODEV. This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=9909 Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Acked-by: Eric Moore <Eric.Moore@lsi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-03-05[CPUFREQ] fix section mismatch warningsSam Ravnborg
Fix the following warnings: WARNING: vmlinux.o(.text+0xfe6711): Section mismatch in reference from the function cpufreq_unregister_driver() to the variable .cpuinit.data:cpufreq_cpu_notifier WARNING: vmlinux.o(.text+0xfe68af): Section mismatch in reference from the function cpufreq_register_driver() to the variable .cpuinit.data:cpufreq_cpu_notifier WARNING: vmlinux.o(.exit.text+0xc4fa): Section mismatch in reference from the function cpufreq_stats_exit() to the variable .cpuinit.data:cpufreq_stat_cpu_notifier The warnings were casued by references to unregister_hotcpu_notifier() from normal functions or exit functions. This is flagged by modpost as a potential error because it does not know that for the non HOTPLUG_CPU scenario the unregister_hotcpu_notifier() is a nop. Silence the warning by replacing the __initdata annotation with a __refdata annotation. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
2008-03-05[CPUFREQ] Remove debugging message from e_powersaverDave Jones
We don't need to printk a message every time we transition. Leave the code there, but ifdef'd out, as it's useful when adding support for new processors. Reported-by: Petr Titěra <P.Titera@century.cz> Signed-off-by: Dave Jones <davej@redhat.com>
2008-03-05[CPUFREQ] Fix missing cpufreq_cpu_put() call in ->storeDave Jones
refactor to use gotos instead of explicit exit paths Signed-off-by: Dave Jones <davej@redhat.com>
2008-03-05[CPUFREQ] Fix missing cpufreq_cpu_put() call in ->showDave Jones
refactor to use gotos instead of explicit exit paths Signed-off-by: Dave Jones <davej@redhat.com>
2008-03-05[SCSI] iscsi class: regression - fix races with state manipulation and ↵Mike Christie
blocking/unblocking For qla4xxx, we could be starting a session, but some error (network, target, IO from a device that got started, etc) could cause the session to fail and curring the block/unblock and state manipulation could race with each other. This patch just has those operations done in the single threaded iscsi eh work queue, so that way they are serialized. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-03-05[SCSI] qla4xxx: regression - add start scan calloutMike Christie
We are seeing EXIST errors from sysfs during device addition. We need a start scan callout so we do not start scanning sessions found during hba setup, before the async scsi scan code is ready. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: David C Somayajulu <david.somayajulu@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-03-05[SCSI] qla4xxx: fix host reset dpc raceMike Christie
The host reset callout could be starting to reset the hba at the same time the dpc thread is. This creates lots of problems because they both want to do wierd things with the firmware and interrupts, etc. This patch just has the host reset function fully shutdown the dpc thread before resetting the hba. This patch also moves the setting of the session online bit to fix a potential race with the dpc thread and iscsi recovery thread. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: David C Somayajulu <david.somayajulu@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-03-05ahci: work around ATI SB600 h/w quirkJeff Garzik
This addresses the recent ATI SB600 errata, where the hardware does not like 256-length PRD entries during FPDMA (aka NCQ). It hurts performance on SB600, but it is more important to get a correct patch eliminating the data corruption/lockups, and then later on tune for performance. We simply limit each command to a maximum of 255 sectors, on SB600. Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-03-05pata_hpt*, pata_serverworks: fix UDMA maskingAlan Cox
When masking, mask out the modes that are unsupported not the ones that are supported. This makes life happier. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-03-05[S390] incorrect reipl nss name.Hongjie Yang
/sys/firmware/reipl/nss/name contains the nss name when defsys or savesys command has been executed. If the defsys or savesys command fails the kernel_nss_name has to be cleared since a reipl on that nss name won't be possible. Signed-off-by: Hongjie Yang <hongjie@us.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] Load disabled wait psw if reipl fails.Michael Holzheu
Normally this should not happen, but it's cleaner to do it that way. Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] Fix IPL from NSS.Heiko Carstens
IPL from NSS didn't work because the memory detection routine omits any memory sections with a size lower than what MAX_ORDER defines. This causes the detection routine to skip the first memory segment which has a size of 1MB. Which later on will let the kernel think that there is no memory available at all. Since in addition the z/VM memory increment size is 1MB force MAX_ORDER to be 9, so we can support 1MB segments. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] zcrypt: fix ap_device_list handlingRalph Wuerthner
In ap_device_probe() we can add the new ap device to the internal device list only if the device probe function successfully returns. Otherwise we might end up with an invalid device in the internal ap device list. Signed-off-by: Ralph Wuerthner <rwuerthn@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] sclp_vt220: speed up console output for interactive workChristian Borntraeger
Currently an output buffer can wait up to HZ/2 until the buffer is flushed. The wait time is noticeable in interactive tools like mc. Change the value to HZ/20, which seems enough for interactive work. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] dasd: fix reference counting in display method for proc/dasd/devicesStefan Weinhuber
Using the /proc/dasd/devices interface leaves the reference counter of alias devices in an inconsistent state. A process that tries to set such a device offline afterwards will hang. The dasd_devices_show function returns immediately for alias devices and this code path was missing a dasd_put_device call. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] dasd: let dasd erp matching recognize alias recoveryStefan Weinhuber
When a request fails that was started on an alias device then the first recovery step is to retry it on the base device. If the recovery request fails again with the same symptoms, the next step should not be a simple retry, but should be a proper recovery based on sense data, etc. To do so, the dasd recovery functions need to recognize the alias recovery step in the erp chain by comparing the start devices. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] Get rid of memcpy gcc warning workaround.Heiko Carstens
Compile smp.o with -Wno-nonnull so gcc stops warning about memcpy being used with a null parameter. Also remove the workaround code and use a char * cast instead of a void * cast to do computations. Cc: Bastian Blank <bastian@waldi.eu.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] idle: Fix machine check handling in idle loop.Heiko Carstens
If a machine check handling is pending when the idle loop is entered default_idle will be left with timer ticks and virtual timer disabled. Fix this by "calling" the idle_chain. Also a BUG_ON(!in_interrupt) in start_hz_timer must be removed since the function now gets called from non interrupt context as well. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-05[S390] Update default configuration.Martin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-03-04Linux 2.6.25-rc4v2.6.25-rc4Linus Torvalds
2008-03-04module: allow ndiswrapper to use GPL-only symbolsPavel Roskin
A change after 2.6.24 broke ndiswrapper by accidentally removing its access to GPL-only symbols. Revert that change and add comments about the reasons why ndiswrapper and driverloader are treated in a special way. Signed-off-by: Pavel Roskin <proski@gnu.org> Acked-by: Greg KH <gregkh@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jon Masters <jonathan@jonmasters.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits) [IPCONFIG]: The kernel gets no IP from some DHCP servers b43legacy: Fix module init message rndis_wlan: fix broken data copy libertas: compare the current command with response libertas: fix sanity check on sequence number in command response p54: fix eeprom parser length sanity checks p54: fix EEPROM structure endianness ssb: Add pcibios_enable_device() return value check rc80211-pid: fix rate adjustment [ESP]: Add select on AUTHENC [TCP]: Improve ipv4 established hash function. [NETPOLL]: Revert two bogus cleanups that broke netconsole. [PPPOL2TP]: Add missing sock_put() in pppol2tp_tunnel_closeall() Subject: [PPPOL2TP] add missing sock_put() in pppol2tp_recv_dequeue() [BLUETOOTH]: l2cap info_timer delete fix in hci_conn_del [NET]: Fix race in generic address resolution. iucv: fix build error on !SMP [TCP]: Must count fack_count also when skipping [TUN]: Fix RTNL-locking in tun/tap driver [SCTP]: Use proc_create to setup de->proc_fops. ...
2008-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC]: Fix link errors with gcc-4.3 sparc64: replace remaining __FUNCTION__ occurances sparc: replace remaining __FUNCTION__ occurances [SPARC]: Add reboot_command[] extern decl to asm/system.h [SPARC]: Mark linux_sparc_{fpu,chips} static.
2008-03-04[IPCONFIG]: The kernel gets no IP from some DHCP serversStephen Hemminger
From: Stephen Hemminger <shemminger@linux-foundation.org> Based upon a patch by Marcel Wappler: This patch fixes a DHCP issue of the kernel: some DHCP servers (i.e. in the Linksys WRT54Gv5) are very strict about the contents of the DHCPDISCOVER packet they receive from clients. Table 5 in RFC2131 page 36 requests the fields 'ciaddr' and 'siaddr' MUST be set to '0'. These DHCP servers ignore Linux kernel's DHCP discovery packets with these two fields set to '255.255.255.255' (in contrast to popular DHCP clients, such as 'dhclient' or 'udhcpc'). This leads to a not booting system. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6
2008-03-04Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] fix ia64 kprobes compilation [IA64] move gcc_intrin.h from header-y to unifdef-y [IA64] workaround tiger ia64_sal_get_physical_id_info hang [IA64] move defconfig to arch/ia64/configs/ [IA64] Fix irq migration in multiple vector domain [IA64] signal(ia64_ia32): add a signal stack overflow check [IA64] signal(ia64): add a signal stack overflow check [IA64] CONFIG_SGI_SN2 - auto select NUMA and ACPI_NUMA
2008-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: debugfs: fix sparse warnings Driver core: Fix cleanup when failing device_add(). driver core: Remove dpm_sysfs_remove() from error path of device_add() PM: fix new mutex-locking bug in the PM core PM: Do not acquire device semaphores upfront during suspend kobject: properly initialize ksets sysfs: CONFIG_SYSFS_DEPRECATED fix driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
2008-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6: pci: hotplug: pciehp: fix error code path in hpc_power_off_slot PCI: Add DECLARE_PCI_DEVICE_TABLE macro PCI: fix up error messages for pci_bus registering PCI: fix section mismatch warning in pci_scan_child_bus PCI: consolidate duplicated MSI enable functions PCI: use dev_printk in quirk messages
2008-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: USB: ftdi_sio - really enable EM1010PC USB: remove incorrect struct class_device from the printer gadget USB: pxa2xx_udc: fix misuse of clock enable/disable calls USB: ftdi_sio: Workaround for broken Matrix Orbital serial port USB: Add support for AXESSTEL MV110H CDMA modem usb-storage: update earlier scatter-gather bug fix USB: isp116x: fix enumeration on boot USB: ehci: handle large bulk URBs correctly (again) USB: spruce up the device blacklist USB: fix comment of struct usb_interface USB: update Kconfig entry for USB_SUSPEND usb: Add support for the mos7820/7840-based B&B USB/RS485 converter to mos7840.c
2008-03-04kprobes: fix a null pointer bug in register_kretprobe()Masami Hiramatsu
Fix a bug in regiseter_kretprobe() which does not check rp->kp.symbol_name == NULL before calling kprobe_lookup_name. For maintainability, this introduces kprobe_addr helper function which resolves addr field. It is used by register_kprobe and register_kretprobe. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04input: add I2C to config since the driver makes several i2c*() callsRandy Dunlap
Add to help text that the Intel I2C ICH (i801) driver is also needed for this kernel. Add LEDS_CLASS to config since the driver makes les_classdev_*() calls: ERROR: "led_classdev_register" [drivers/input/misc/apanel.ko] undefined! ERROR: "__led_classdev_unregister" [drivers/input/misc/apanel.ko] undefined! Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04ext3: fix mount option parsingJosef Bacik
The "resize" option won't be noticed as it comes after the NULL option, so if you try to mount (or in this case remount) with that option it won't be recognized. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04hugetlb: fix pool shrinking while in restricted cpusetNishanth Aravamudan
Adam Litke noticed that currently we grow the hugepage pool independent of any cpuset the running process may be in, but when shrinking the pool, the cpuset is checked. This leads to inconsistency when shrinking the pool in a restricted cpuset -- an administrator may have been able to grow the pool on a node restricted by a containing cpuset, but they cannot shrink it there. There are two options: either prevent growing of the pool outside of the cpuset or allow shrinking outside of the cpuset. >From previous discussions on linux-mm, /proc/sys/vm/nr_hugepages is an administrative interface that should not be restricted by cpusets. So allow shrinking the pool by removing pages from nodes outside of current's cpuset. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: William Irwin <wli@holomorphy.com> Cc: Lee Schermerhorn <Lee.Schermerhonr@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Paul Jackson <pj@sgi.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04hugetlb: close a difficult to trigger reservation raceAdam Litke
A hugetlb reservation may be inadequately backed in the event of racing allocations and frees when utilizing surplus huge pages. Consider the following series of events in processes A and B: A) Allocates some surplus pages to satisfy a reservation B) Frees some huge pages A) A notices the extra free pages and drops hugetlb_lock to free some of its surplus pages back to the buddy allocator. B) Allocates some huge pages A) Reacquires hugetlb_lock and returns from gather_surplus_huge_pages() Avoid this by commiting the reservation after pages have been allocated but before dropping the lock to free excess pages. For parity, release the reservation in return_unused_surplus_pages(). This patch also corrects the cpuset_mems_nr() error path in hugetlb_acct_memory(). If the cpuset check fails, uncommit the reservation, but also be sure to return any surplus huge pages that may have been allocated to back the failed reservation. Thanks to Andy Whitcroft for discovering this. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04md: the md RAID10 resync thread could cause a md RAID10 array deadlockK.Tanaka
This message describes another issue about md RAID10 found by testing the 2.6.24 md RAID10 using new scsi fault injection framework. Abstract: When a scsi error results in disabling a disk during RAID10 recovery, the resync threads of md RAID10 could stall. This case, the raid array has already been broken and it may not matter. But I think stall is not preferable. If it occurs, even shutdown or reboot will fail because of resource busy. The deadlock mechanism: The r10bio_s structure has a "remaining" member to keep track of BIOs yet to be handled when recovering. The "remaining" counter is incremented when building a BIO in sync_request() and is decremented when finish a BIO in end_sync_write(). If building a BIO fails for some reasons in sync_request(), the "remaining" should be decremented if it has already been incremented. I found a case where this decrement is forgotten. This causes a md_do_sync() deadlock because md_do_sync() waits for md_done_sync() called by end_sync_write(), but end_sync_write() never calls md_done_sync() because of the "remaining" counter mismatch. For example, this problem would be reproduced in the following case: Personalities : [raid10] md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F) 3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_] [>....................] recovery = 2.2% (45376/1959808) finish=0.7min speed=45376K/sec This case, sdf1 is recovering, sdb1 and sde1 are disabled. An additional error with detaching sdd will cause a deadlock. md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F) 3919616 blocks 64K chunks 2 near-copies [4/1] [_U__] [=>...................] recovery = 5.0% (99520/1959808) finish=5.9min speed=5237K/sec 2739 ? S< 0:17 [md0_raid10] 28608 ? D< 0:00 [md0_resync] 28629 pts/1 Ss 0:00 bash 28830 pts/1 R+ 0:00 ps ax 31819 ? D< 0:00 [kjournald] The resync thread keeps working, but actually it is deadlocked. Patch: By this patch, the remaining counter will be decremented if needed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04md: fix possible raid1/raid10 deadlock on read error during resyncNeilBrown
Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for another possible deadlock in raid1/raid10 error handing. If a read request returns an error while a resync is happening and a resync request is pending, the attempt to fix the error will block until the resync progresses, and the resync will block until the read request completes. Thus a deadlock. This patch fixes the problem. Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04md: don't attempt read-balancing for raid10 'far' layoutsKeld Simonsen
This patch changes the disk to be read for layout "far > 1" to always be the disk with the lowest block address. Thus the chunks to be read will always be (for a fully functioning array) from the first band of stripes, and the raid will then work as a raid0 consisting of the first band of stripes. Some advantages: The fastest part which is the outer sectors of the disks involved will be used. The outer blocks of a disk may be as much as 100 % faster than the inner blocks. Average seek time will be smaller, as seeks will always be confined to the first part of the disks. Mixed disks with different performance characteristics will work better, as they will work as raid0, the sequential read rate will be number of disks involved times the IO rate of the slowest disk. If a disk is malfunctioning, the first disk which is working, and has the lowest block address for the logical block will be used. Signed-off-by: Keld Simonsen <keld@dkuug.dk> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04md: lock access to rdev attributes properlyNeilBrown
When we access attributes of an rdev (component device on an md array) through sysfs, we really need to lock the array against concurrent changes. We currently do that when we change an attribute, but not when we read an attribute. We need to lock when reading as well else rdev->mddev could become NULL while we are accessing it. So add appropriate locking (mddev_lock) to rdev_attr_show. rdev_size_store requires some extra care as well as it needs to unlock the mddev while scanning other mddevs for overlapping regions. We currently assume that rdev->mddev will still be unchanged after the scan, but that cannot be certain. So take a copy of rdev->mddev for use at the end of the function. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04md: make sure a reshape is started when device switches to read-writeNeilBrown
A resync/reshape/recovery thread will refuse to progress when the array is marked read-only. So whenever it mark it not read-only, it is important to wake up thread resync thread. There is one place we didn't do this. The problem manifests if the start_ro module parameters is set, and a raid5 array that is in the middle of a reshape (restripe) is started. The array will initially be semi-read-only (meaning it acts like it is readonly until the first write). So the reshape will not proceed. On the first write, the array will become read-write, but the reshape will not be started, and there is no event which will ever restart that thread. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04md: clean up irregularity with raid autodetectNeilBrown
When a raid1 array is stopped, all components currently get added to the list for auto-detection. However we should really only add components that were found by autodetection in the first place. So add a flag to record that information, and use it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04md: guard against possible bad array geometry in v1 metadataNeilBrown
Make sure the data doesn't start before the end of the superblock when the superblock is at the start of the device. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>