lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2007-04-09	fix MTIME_SEC_MAX on 32-bit	Thomas Gleixner
	The maximum seconds value we can handle on 32bit is LONG_MAX. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-04-09	prevent timespec/timeval to ktime_t overflow	Thomas Gleixner
	Frank v. Waveren pointed out that on 64bit machines the timespec to ktime_t conversion might overflow. This is also true for timeval to time_t conversions. This breaks a "sleep inf" on 64bit machines. While a timespec/timeval with tx.sec = MAX_LONG is valid by specification the internal representation of ktime_t is based on nanoseconds. The conversion of seconds to nanoseconds overflows for seconds values >= (MAX_LONG / NSEC_PER_SEC). Check the seconds argument to the conversion and limit it to the maximum time which can be represented by ktime_t. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-04-03	[IFB]: Fix crash on input device removal	Patrick McHardy
	The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-28	[SERIAL] Fix oops when removing suspended serial port	Russell King
	A serial card might have been removed when the system is resumed. This results in a suspended port being shut down, which results in the ports shutdown method being called twice in a row. This causes BUGs. Avoid this by tracking the suspended state separately from the initialised state. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-25	USB Storage: US_FL_MAX_SECTORS_64 flag	Phil Dibowitz
	This patch adds a US_FL_MAX_SECTORS_64 and removes the Genesys special-cases for this that were in scsiglue.c. It also adds the flag to other devices reported to need it. Signed-off-by: Phil Dibowitz <phil@ipom.com> Signed-off-by: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-24	NETFILTER: tcp conntrack: fix IP_CT_TCP_FLAG_CLOSE_INIT value	Patrick McHardy
	IP_CT_TCP_FLAG_CLOSE_INIT is a flag and should have a value of 0x4 instead of 0x3, which is IP_CT_TCP_FLAG_WINDOW_SCALE \| IP_CT_TCP_FLAG_SACK_PERM. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-24	NETFILTER: Fix iptables ABI breakage on (at least) CRIS	Patrick McHardy
	With the introduction of x_tables we accidentally broke compatibility by defining IPT_TABLE_MAXNAMELEN to XT_FUNCTION_MAXNAMELEN instead of XT_TABLE_MAXNAMELEN, which is two bytes larger. On most architectures it doesn't really matter since we don't have any tables with names that long in the kernel and the structure layout didn't change because of alignment requirements of following members. On CRIS however (and other architectures that don't align data) this changed the structure layout and thus broke compatibility with old iptables binaries. Changing it back will break compatibility with binaries compiled against recent kernels again, but since the breakage has only been there for three releases this seems like the better choice. Spotted by Jonas Berlin <xkr47@outerspace.dyndns.org>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-24	NETFILTER: arp_tables: fix userspace compilation	Bart De Schuymer
	The included patch translates arpt_counters to xt_counters, making userspace arptables compile against recent kernels. Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-26	hwmon: Refactor SENSOR_DEVICE_ATTR_2	Jim Cromie
	This patch refactors SENSOR_DEVICE_ATTR_2 macro, following pattern set by SENSOR_ATTR. First it creates a new macro SENSOR_ATTR_2() which expands to an initialization expression, then it uses that in SENSOR_DEVICE_ATTR_2, which declares and initializes a struct sensor_device_attribute_2. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-26	hwmon: Allow sensor attributes arrays	Jim Cromie
	This patch refactors SENSOR_DEVICE_ATTR macro. First it creates a new macro SENSOR_ATTR() which expands to an initialization expression, then it uses that in SENSOR_DEVICE_ATTR, which declares and initializes a struct sensor_device_attribute. IOW, SENSOR_ATTR() imitates __ATTR() in include/linux/device.h. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-26	i2c-piix4: Add ATI IXP200/300/400 support	Rudolf Marek
	This patch adds the ATI IXP southbridges support to i2c-piix4, as it turned out those chips are compatible with it. Signed-off-by: Rudolf Marek <r.marek@sh.cvut.cz> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-26	I2C: i2c-piix4: Add Broadcom HT-1000 support	Martin Devera
	Add Broadcom HT-1000 south bridge's PCI ID to i2c-piix driver. Note that at least on Supermicro H8SSL it uses non-standard SMBHSTCFG = 3 and standard values like 0 or 9 causes hangup. Signed-off-by: Martin Devera <devik@cdi.cz> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-14	[TCP]: struct tcp_sack_block annotations	Al Viro
	Some of the instances of tcp_sack_block are host-endian, some - net-endian. Define struct tcp_sack_block_wire identical to struct tcp_sack_block with u32 replaced with __be32; annotate uses of tcp_sack_block replacing net-endian ones with tcp_sack_block_wire. Change is obviously safe since for cc(1) __be32 is typedefed to u32. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-03	reiserfs: avoid tail packing if an inode was ever mmapped	Vladimir Saveliev
	This patch fixes a confusion reiserfs has for a long time. On release file operation reiserfs used to try to pack file data stored in last incomplete page of some files into metadata blocks. After packing the page got cleared with clear_page_dirty. It did not take into account that the page may be mmaped into other process's address space. Recent replacement for clear_page_dirty cancel_dirty_page found the confusion with sanity check that page has to be not mapped. The patch fixes the confusion by making reiserfs avoid tail packing if an inode was ever mmapped. reiserfs_mmap and reiserfs_file_release are serialized with mutex in reiserfs specific inode. reiserfs_mmap locks the mutex and sets a bit in reiserfs specific inode flags. reiserfs_file_release checks the bit having the mutex locked. If bit is set - tail packing is avoided. This eliminates a possibility that mmapped page gets cancel_page_dirty-ed. Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-30	hwmon: New driver k8temp	Rudolf Marek
	Add support for the temperature sensor(s) found in AMD K8 CPUs. Signed-off-by: Rudolf Marek <r.marek@sh.cvut.cz> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-30	[SCSI] arcmsr: initial driver, version 1.20.00.13	Erich Chen
	arcmsr is a driver for the Areca Raid controller, a host based RAID subsystem that speaks SCSI at the firmware level. This patch is quite a clean up over the initial submission with contributions from: Randy Dunlap <rdunlap@xenotime.net> Christoph Hellwig <hch@lst.de> Matthew Wilcox <matthew@wil.cx> Adrian Bunk <bunk@stusta.de> Signed-off-by: Erich Chen <erich@areca.com.tw> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09	PCI: irq: irq and pci_ids patch for Intel ICH9	Jason Gaston
	This updated patch adds the Intel ICH9 LPC and SMBus Controller DID's. Signed-off-by: Jason Gaston <jason.d.gaston@intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-18	bridge-netfilter: don't overwrite memory outside of skb	Stephen Hemminger
	The bridge netfilter code needs to check for space at the front of the skb before overwriting; otherwise if skb from device doesn't have headroom, then it will cause random memory corruption. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-14	pci_ids.h: Add NVIDIA PCI ID	Peer Chen
	Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-14	amd74xx.c: add some NVIDIA chipset IDs	Randy Dunlap
	Add some nVidia chipset ID's support. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-14	sata_nv/amd74xx: Add MCP61 support	Andrew Chew
	Added MCP61 support to sata_nv and amd74xx. Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-08	PCI: Unhide the SMBus on Asus PU-DLS	Jean Delvare
	Unhide the SMBus controller on the Asus PU-DLS board. This fixes bug #6763. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-08	PCI: nVidia quirk to make AER PCI-E extended capability visible	Brice Goglin
	The nVidia CK804 PCI-E chipset supports the AER extended capability but sometimes fails to link it (with some BIOS or after a warm reboot). It makes the AER cap invisible to pci_find_ext_capability(). The patch adds a quirk to set the missing bit that controls the linking of the capability. By the way, it removes the corresponding code in the myri10ge driver. Signed-off-by: Brice Goglin <brice@myri.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-08	pci_ids.h: correct naming of 1022:7450 (AMD 8131 Bridge)	John W. Linville
	The naming of the constant defined for PCI ID 1022:7450 does not seem to match the information at http://pciids.sourceforge.net/: http://pci-ids.ucw.cz/iii/?i=1022 There 1022:7450 is listed as "AMD-8131 PCI-X Bridge" while 1022:7451 is listed as "AMD-8131 PCI-X IOAPIC". Yet, the current definition for 0x7450 is PCI_DEVICE_ID_AMD_8131_APIC. It seems to me like that name should map to 0x7451, while a name like PCI_DEVICE_ID_AMD_8131_BRIDGE should map to 0x7450. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-06	Fix mempolicy.h build error	Ralf Baechle
	<linux/mempolicy.h> uses struct mm_struct and relies on a definition or declaration somehow magically being dragged in which may result in a build: CC mm/mempolicy.o In file included from mm/mempolicy.c:69: include/linux/mempolicy.h:150: warning: 'struct mm_struct' declared inside parameter list include/linux/mempolicy.h:150: warning: its scope is only this definition or declaration, which is probably not what you want include/linux/mempolicy.h:174: warning: 'struct mm_struct' declared inside parameter list mm/mempolicy.c:673: error: conflicting types for 'do_migrate_pages' include/linux/mempolicy.h:174: error: previous declaration of 'do_migrate_pages' was here mm/mempolicy.c:1696: error: conflicting types for 'mpol_rebind_mm' include/linux/mempolicy.h:150: error: previous declaration of 'mpol_rebind_mm' was here make[1]: * [mm/mempolicy.o] Error 1 make: * [mm] Error 2 $ Including <linux/sched.h> is a step into direction of include hell so fixed by adding a forward declaration of struct mm_struct instead. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04	remove garbage the sneaked into the ext3 fix	Adrian Bunk
	Spotted by Thomas Voegtle. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29	add forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)	Al Viro
	sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of buffer size from address of buffer_head is a bad idea - it will generate junk in any case, may oops if buffer_head is close to the end of slab page and next page is not mapped and isn't what was intended there. IOW, ->b_data is missing in that call. Fortunately, result doesn't go into the primary on-disk data structures, so only backup ones get crap written to them; that had allowed this bug to remain unnoticed until now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-25	[IGMP]: Fix IGMPV3_EXP() normalization bit shift value.	David L Stevens
	The IGMPV3_EXP() macro doesn't correctly shift the normalization bit, so time-out values are longer than they should be. Thanks to Dirk Ooms for finding the problem in IGMPv3 - MLDv2 had a similar problem that was already fixed a year ago. :-( Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-11	[NET]: Update frag_list in pskb_trim	Herbert Xu
	When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of the packet, the frag_list is not updated to reflect the trimming. This will usually work fine until you hit something that uses the packet length or tail from the frag_list. Examples include esp_output and ip_fragment. Another problem caused by this is that you can end up with a linear packet with a frag_list attached. It is possible to get away with this if we audit everything to make sure that they always consult skb->len before going down onto frag_list. In fact we can do the samething for the paged part as well to avoid copying the data area of the skb. For now though, let's do the conservative fix and update frag_list. Many thanks to Marco Berizzi for helping me to track down this bug. This 4-year old bug took 3 months to track down. Marco was very patient indeed :) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-09	drivers/video/nvidia/nvidia.c: Add ID for Quadro NVS280	Pavel Roskin
	Quadro NVS280 is a dual-head PCIe card with PCI ID 10de:00fd and subsystem I 10de:0215. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-09-18	[AGPGART] VIA PT880 Ultra support.	Magnus Kessler
	This patch enables agpgart on a Via "PT880 Ultra" based motherboard (Asus P4V800D-X). The PCI ID of the PT880 Ultra is 0x0308 instead of 0x0258 of the PT880. The patched via-agp passes testgart. Signed-off-by: Magnus Kessler <Magnus.Kessler@gmx.net> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-09-18	PFKEYV2: Fix inconsistent typing in struct sadb_x_kmprivate.	Tushar Gohad
	Fixes inconsistent use of "uint32_t" vs. "u_int32_t". Fix pfkeyv2 userspace builds. Signed-off-by: Tushar Gohad <tgohad@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-09-06	pci_ids.h: add some VIA IDE identifiers	Alan Cox
	Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-08-30	ext3: avoid triggering ext3_error on bad NFS file handle	Neil Brown
	The inode number out of an NFS file handle gets passed eventually to ext3_get_inode_block() without any checking. If ext3_get_inode_block() allows it to trigger an error, then bad filehandles can have unpleasant effect - ext3_error() will usually cause a forced read-only remount, or a panic if `errors=panic' was used. So remove the call to ext3_error there and put a matching check in ext3/namei.c where inode numbers are read off storage. Andrew Morton fixed an off-by-one error. Dann Frazier ported the patch to 2.6.16. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-22	[PATCH] I2O: Bugfixes to get I2O working again	Markus Lidel
	- Fixed locking of struct i2o_exec_wait in Executive-OSM - Removed LCT Notify in i2o_exec_probe() which caused freeing memory and accessing freed memory during first enumeration of I2O devices - Added missing locking in i2o_exec_lct_notify() - removed put_device() of I2O controller in i2o_iop_remove() which caused the controller structure get freed to early - Fixed size of mempool in i2o_iop_alloc() - Fixed access to freed memory in i2o_msg_get() See http://bugzilla.kernel.org/show_bug.cgi?id=6561 Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-01	[PATCH] Simplify proc/devices and fix early termination regression	Andrew Morton
	Repair /proc/devices early-termination regression. 2.6.16 broke /proc/devices. An application often gets an EOF before the end of data is reached, if that application uses a series of short read(2)s to access the data. I have used read buffers of varying sizes with varying degrees of unsuccess (larger sizes get further into the data than smaller sizes, following a simple pattern). It appears that the only safe way to get the data is to use a single read buffer larger than all the data in /proc/devices. The following example demonstates the problem: # dd if=/proc/devices bs=1 Character devices: 1 mem 27+0 records in 27+0 records out This patch is a backport of the fix recently accepted to Linus's tree: commit 68eef3b4791572ecb70249c7fb145bb3742dd899 [PATCH] Simplify proc/devices and fix early termination regression It replaces the complex, state-machine algorithm introduced in 2.6.16 with a simple algorithm, modeled on the implementation of /proc/interrupts. [akpm@osdl.org: cleanups, simplifications] Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-01	[PATCH] for_each_possible_cpu	Andrew Morton
	Backport for_each_possible_cpu() into 2.6.16. Fixes the alpha build, and any future occurrences. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17	[PATCH] Fix buddy list race that could lead to page lru list corruptions	Nick Piggin
	Rohit found an obscure bug causing buddy list corruption. page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0) to determine whether or not a free page's buddy is itself free and in the buddy lists. Each of the conjuncts may be true at different times due to unrelated conditions, so the non-atomic page_is_buddy test may find each conjunct to be true even if they were not both true at the same time (ie. the page was not on the buddy lists). Signed-off-by: Martin Bligh <mbligh@google.com> Signed-off-by: Rohit Seth <rohitseth@google.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-07	[PATCH] kdump proc vmcore size oveflow fix	Vivek Goyal
	A couple of /proc/vmcore data structures overflow with 32bit systems having memory more than 4G. This patch fixes those. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-07	[PATCH] fbcon: Fix big-endian bogosity in slow_imageblit()	Antonino A. Daplas
	The monochrome->color expansion routine that handles bitmaps which have (widths % 8) != 0 (slow_imageblit) produces corrupt characters in big-endian. This is caused by a bogus bit test in slow_imageblit(). Fix. Signed-off-by: Antonino Daplas <adaplas@pol.net> Acked-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-27	[PATCH] DM: Fix bug: BIO_RW_BARRIER requests to md/raid1 hang.	Neil Brown
	Both R1BIO_Barrier and R1BIO_Returned are 4 !!!! This means that barrier requests don't get returned (i.e. b_endio called) because it looks like they already have been. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-27	[PATCH] rtc.h broke strace(1) builds	Joe Korty
	Git patch 52dfa9a64cfb3dd01fa1ee1150d589481e54e28e [PATCH] move rtc_interrupt() prototype to rtc.h broke strace(1) builds. The below moves the kernel-only additions lower, under the already provided #ifdef __KERNEL__ statement. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-27	[PATCH] get_cpu_sysdev() signedness fix	Andrew Morton
	Doing (int < NR_CPUS) doesn't dtrt if it's negative.. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-19	[TG3]: 40-bit DMA workaround part 2	Michael Chan
	The 40-bit DMA workaround recently implemented for 5714, 5715, and 5780 needs to be expanded because there may be other tg3 devices behind the EPB Express to PCIX bridge in the 5780 class device. For example, some 4-port card or mother board designs have 5704 behind the 5714. All devices behind the EPB require the 40-bit DMA workaround. Thanks to Chris Elmquist again for reporting the problem and testing the patch. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-11	[PATCH] remove __put_task_struct_cb export again	Christoph Hellwig
	The patch '[PATCH] RCU signal handling' [1] added an export for __put_task_struct_cb, a put_task_struct helper newly introduced in that patch. But the put_task_struct couldn't be used modular previously as __put_task_struct wasn't exported. There are not callers of it in modular code, and it shouldn't be exported because we don't want drivers to hold references to task_structs. This patch removes the export and folds __put_task_struct into __put_task_struct_cb as there's no other caller. [1] http://www2.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e56d090310d7625ecb43a1eeebd479f04affb48b Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-11	[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside	Kirill Korotaev
	This patch fixes illegal __GFP_FS allocation inside ext3 transaction in ext3_symlink(). Such allocation may re-enter ext3 code from try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal handle in task_struct and, hence, is not reentrable. This bug led to "Assertion failure in journal_dirty_metadata()" messages. http://bugzilla.openvz.org/show_bug.cgi?id=115 Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09	[PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.	Christoph Lameter
	The cache reaper currently tries to free all alien caches and all remote per cpu pages in each pass of cache_reap. For a machines with large number of nodes (such as Altix) this may lead to sporadic delays of around ~10ms. Interrupts are disabled while reclaiming creating unacceptable delays. This patch changes that behavior by adding a per cpu reap_node variable. Instead of attempting to free all caches, we free only one alien cache and the per cpu pages from one remote node. That reduces the time spend in cache_reap. However, doing so will lengthen the time it takes to completely drain all remote per cpu pagesets and all alien caches. The time needed will grow with the number of nodes in the system. All caches are drained when they overflow their respective capacity. So the drawback here is only that a bit of memory may be wasted for awhile longer. Details: 1. Rename drain_remote_pages to drain_node_pages to allow the specification of the node to drain of pcp pages. 2. Add additional functions init_reap_node, next_reap_node for NUMA that manage a per cpu reap_node counter. 3. Add a reap_alien function that reaps only from the current reap_node. For us this seems to be a critical issue. Holdoffs of an average of ~7ms cause some HPC benchmarks to slow down significantly. F.e. NAS parallel slows down dramatically. NAS parallel has a 12-16 seconds runtime w/o rotor compared to 5.8 secs with the rotor patches. It gets down to 5.05 secs with the additional interrupt holdoff reductions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09	[PATCH] mtd: 64 bit fixes	Atsushi Nemoto
	Fix some bugs in mtd/jffs2 on 64bit platform. The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h. And some variables in jffs2 are declared as uint32_t but used to hold size_t values. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08	[PATCH] fix file counting	Dipankar Sarma
	I have benchmarked this on an x86_64 NUMA system and see no significant performance difference on kernbench. Tested on both x86_64 and powerpc. The way we do file struct accounting is not very suitable for batched freeing. For scalability reasons, file accounting was constructor/destructor based. This meant that nr_files was decremented only when the object was removed from the slab cache. This is susceptible to slab fragmentation. With RCU based file structure, consequent batched freeing and a test program like Serge's, we just speed this up and end up with a very fragmented slab - llm22:~ # cat /proc/sys/fs/file-nr 587730 0 758844 At the same time, I see only a 2000+ objects in filp cache. The following patch I fixes this problem. This patch changes the file counting by removing the filp_count_lock. Instead we use a separate percpu counter, nr_files, for now and all accesses to it are through get_nr_files() api. In the sysctl handler for nr_files, we populate files_stat.nr_files before returning to user. Counting files as an when they are created and destroyed (as opposed to inside slab) allows us to correctly count open files with RCU. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08	[PATCH] rcu batch tuning	Dipankar Sarma
	This patch adds new tunables for RCU queue and finished batches. There are two types of controls - number of completed RCU updates invoked in a batch (blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark, qlowmark). By default, the per-cpu batch limit is set to a small value. If the input RCU rate exceeds the high watermark, we do two things - force quiescent state on all cpus and set the batch limit of the CPU to INTMAX. Setting batch limit to INTMAX forces all finished RCUs to be processed in one shot. If we have more than INTMAX RCUs queued up, then we have bigger problems anyway. Once the incoming queued RCUs fall below the low watermark, the batch limit is set to the default. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>