summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-05-26ipv4: udp: fix short packet and bad checksum loggingBjørn Mork
commit ccc2d97cb7c798e785c9f198de243e2b59f7073b upstream. commit 2783ef23 moved the initialisation of saddr and daddr after pskb_may_pull() to avoid a potential data corruption. Unfortunately also placing it after the short packet and bad checksum error paths, where these variables are used for logging. The result is bogus output like [92238.389505] UDP: short packet: From 2.0.0.0:65535 23715/178 to 0.0.0.0:65535 Moving the saddr and daddr initialisation above the error paths, while still keeping it after the pskb_may_pull() to keep the fix from commit 2783ef23. Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26printk: Fix missing klogd wakeupThomas Gleixner
The RT check for !in_atomic() && !irqs_disabled()) to prevent the klogd wakeup is actually bogus as wake_up_klogd() is just setting the needs print flag which is then evaluated from printk_tick() which does the actual wakeup. Reported-by: Nikita V. Youshchenko <yoush@cs.msu.su> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-26fix undefined references to kernel_semOlaf Hering
protect kernel_sem access with CONFIG_LOCK_KERNEL lib/kernel_lock.c is compiled conditionally. Signed-off-by: Olaf Hering <olaf@aepfle.de> LKML-Reference: <20100524220428.GA17771@aepfle.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-20fs: namespace: Fix fuse mount falloutjohn stultz
> WARNING: at fs/namespace.c:648 commit_tree+0xf1/0x10b() > Pid: 14002, comm: fusermount Not tainted 2.6.33.3-rt19 #32 > Call Trace: > [<ffffffff81040e67>] warn_slowpath_common+0x7c/0x94 > [<ffffffff81040e93>] warn_slowpath_null+0x14/0x16 > [<ffffffff81115811>] commit_tree+0xf1/0x10b > [<ffffffff8111661d>] attach_recursive_mnt+0xf2/0x188 > [<ffffffff811167b3>] graft_tree+0x100/0x102 > [<ffffffff8111765b>] do_mount+0x386/0x7ae > [<ffffffff810d55f2>] ? strndup_user+0x5d/0x85 > [<ffffffff81117b0b>] sys_mount+0x88/0xc2 > [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b Looks like the fuse mounts are more interesting here and are tripping up the MNT_MOUNTED logic. Trivial cases: MNT_MOUNTED gets set by: attach_mnt commit_tree MNT_MOUNTED gets unset by: detach_mnt unmount_tree So there's a nice symmetry there. We also clear MNT_MOUNTED in clone_mnt(), since we're creating a unmounted copy that we will latter call attach_mnt() upon. Now, here's where things get messy: copy_tree(): Earlier, we didn't set MNT_MOUNTED on the first clone on the root of the mnt to be copied. This caused problems with new namespaces since after it is copied, we don't call attach_mnt or commit_tree. So when the namespace is removed, and we call unmount_tree, and hit a WARN_ON. Similarly, if we bombed out in copy_tree due to a ENOMEM, we call umount_tree on the mnt and will hit the WARN_ON as well. The same issue hits us with collect_mounts and drop_collected_mounts, where we copy_tree() and then unmount_tree() and hit the WARN_ON. This seemed broken, so I set MNT_MOUNTED on the root cloned mnt in copy_tree and it resolved the above asymmetries. However, do_loopback is more complicated, since it calls either copy_tree or clone_mnt (depending on the recursive flag) and then grafts that mnt which calls commit_tree()/attach_mnt(). Leaving clone_mnt(), the mnt is not set as MNT_MOUNTED, but now with my change to copy_tree(), it sets the root as MNT_MOUNTED. This then causes a WARN_ON in the commit_tree() called by graft_tree(). The fix below simply clears the recently set MNT_MOUNTED flag after copy_tree() returns, before calling graft_tree(). Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Carsten Emde <C.Emde@osadl.org> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-19net: ehea: make rx irq handler non-threaded (IRQF_NODELAY)Darren Hart
The underlying hardware is edge triggered but presented by XICS as level triggered. The edge triggered interrupts are not reissued after masking. This is not a problem in mainline which does not mask the interrupt (relying on the EOI mechanism instead). The threaded interrupts in PREEMPT_RT do mask the interrupt, and can lose interrupts that occurred while masked, resulting in a hung ethernet interface. The receive handler simply calls napi_schedule(), as such, there is no significant additional overhead in making this non-threaded, since we either wakeup the threaded irq handler to call napi_schedule(), or just call napi_schedule() directly to wakeup the softirqs. As the receive handler is lockless, there is no need to convert any of the ehea spinlock_t's to raw_spinlock_t's. Without this patch, a simple scp file copy loop would fail quickly (usually seconds). We have over two hours of sustained scp activity with the patch applied. Credit goes to Will Schmidt for lots of instrumentation and tracing which clarified the scenario and to Thomas Gleixner for the incredibly simple solution. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Jan-Bernd Themann <themann@de.ibm.com> Cc: Nivedita Singhvi <niv@us.ibm.com> Cc: Brian King <bjking1@us.ibm.com> Cc: Michael Ellerman <ellerman@au1.ibm.com> Cc: Doug Maxey <doug.maxey@us.ibm.com> LKML-Reference: <4BF30793.5070300@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-17v2.6.33.4-rt20v2.6.33.4-rt20Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-17net: Make [dis/en]able_irq_*_lockdep() RT safeThomas Gleixner
The lockdep irqoff protection which is used to prevent lockdep false positives leads to "scheduling while atomic" and "might sleep" bug floods. Make the irq disabling depend on !RT. Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-13fs: Resolve mntput_no_expire issues.john stultz
In testing the mnt_count typo fix, I hit a few BUG_ON/WARN_ON messages in the mntput_no_expire code. The first issue was a race against the MNT_MOUNTED flag, where if after the optimistic lock free check is done, someone changes the value, we might BUG_ON after getting the lock. The fix is after getting the lock, re-check the MNT_MOUNTED bit and drop the lock and try again if its changed. The second issue was a call to smp_processor_id() in add_mnt_count() that was done while preemptable. This was missed in my earlier commit 070976b5b038218900648ea4cc88786d5dfcd58d. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Clark Williams <williams@redhat.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Nick Piggin <npiggin@suse.de> LKML-Reference: <1273711934.2856.22.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-13fs: Fix mnt_count typojohn stultz
Clark noticed the following snippit in commit 070976b5b038218900648ea4cc88786d5dfcd58d : if (mnt->mnt_pinned) { - inc_mnt_count(mnt); + preempt_disable(); + dec_mnt_count(mnt); + preempt_enable(); mnt->mnt_pinned--; } vfsmount_write_unlock(); I accidentally replaced an inc_mnt_count() with a dec_mnt_count(). The issue went unnoticed, as the only user of mnt_unpin in the acct syscall. This patch corrects the mistake. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Clark Williams <williams@redhat.com> Cc: Darren Hart <dvhltc@us.ibm.com> LKML-Reference: <1273711544.2856.15.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-13Merge branch '2.6.33.4' into rt/2.6.33Thomas Gleixner
Conflicts: Makefile Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-12Linux 2.6.33.4v2.6.33.4Greg Kroah-Hartman
2010-05-12MIPS: Sibyte: Apply M3 workaround only on affected chip types and versions.Ralf Baechle
(cherry picked from commit e65c7f33d75e977350ca350573d93c517ec02776) Previously it was unconditionally used on all Sibyte family SOCs. The M3 bug has to be handled in the TLB exception handler which is extremly performance sensitive, so this modification is expected to deliver around 2-3% performance improvment. This is important as required changes to the M3 workaround will make it more costly. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12SCSI: Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O errorJames Bottomley
commit 77a4229719e511a0d38d9c355317ae1469adeb54 upstream. There's nastyness in the way we currently handle barriers (and discards): They're effectively filesystem commands, but they get processed as BLOCK_PC commands. Unfortunately BLOCK_PC commands are taken by SCSI to be SG_IO commands and the issuer expects to see and handle any returned errors, however trivial. This leads to a huge problem, because the block layer doesn't expect this to happen and any trivially retryable error on a barrier causes an immediate I/O error to the filesystem. The only real way to hack around this is to take the usual class of offending errors (unit attentions) and make them all retryable in the case of a REQ_HARDBARRIER. A correct fix would involve a rework of the entire block and SCSI submit system, and so is out of scope for a quick fix. Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12Enable retries for SYNCRONIZE_CACHE commands to fix I/O errorHannes Reinecke
commit c213e1407be6b04b144794399a91472e0ef92aec upstream. Some arrays are giving I/O errors with ext3 filesystems when SYNCHRONIZE_CACHE gets a UNIT_ATTENTION. What is happening is that these commands have no retries, so the UNIT_ATTENTION causes the barrier to fail. We should be enable retries here to clear any transient error and allow the barrier to succeed. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12scsi_debug: virtual_gb ignores sector_sizeDouglas Gilbert
commit 5447ed6c968e7270b656afa273c2b79d15d82edd upstream. In the scsi_debug driver, the virtual_gb option ignores the sector_size, implicitly assuming that is 512 bytes. So if 'virtual_gb=1 sector_size=4096' the result is an 8 GB (virtual) disk. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12SCSI: libiscsi: regression: fix header digest errorsMike Christie
commit 96b1f96dcab87756c0a1e7ba76bc5dc2add82b88 upstream. This fixes a regression introduced with this commit: commit d3305f3407fa3e9452079ec6cc8379067456e4aa Author: Mike Christie <michaelc@cs.wisc.edu> Date: Thu Aug 20 15:10:58 2009 -0500 [SCSI] libiscsi: don't increment cmdsn if cmd is not sent in 2.6.32. When I moved the hdr->cmdsn after init_task, I added a bug when header digests are used. The problem is that the LLD may calculate the header digest in init_task, so if we then set the cmdsn after the init_task call we change what the digest will be calculated by the target. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12SCSI: fix locking around blk_abort_request()Tejun Heo
commit 70b25f890ce9f0520c64075ce9225a5b020a513e upstream. blk_abort_request() expects queue lock to be held by the caller. Grab it before calling the function. Lack of this synchronization led to infinite loop on corrupt q->timeout_list. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12pxa/colibri: fix missing #include <mach/mfp.h> in colibri.hJakob Viketoft
commit ccb8d8d070b8f25f0163da5c9ceacf63a5169540 upstream. The use of mfp_cfg_t causes build errors without including <mach/mfp.h>. CC: Daniel Mack <daniel@caiaq.de> Signed-off-by: Jakob Viketoft <jakob.viketoft@bitsim.com> Signed-off-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12cpuidle: Fix incorrect optimizationArjan van de Ven
commit 1c6fe0364fa7bf28248488753ee0afb6b759cd04 upstream. commit 672917dcc78 ("cpuidle: menu governor: reduce latency on exit") added an optimization, where the analysis on the past idle period moved from the end of idle, to the beginning of the new idle. Unfortunately, this optimization had a bug where it zeroed one key variable for new use, that is needed for the analysis. The fix is simple, zero the variable after doing the work from the previous idle. During the audit of the code that found this issue, another issue was also found; the ->measured_us data structure member is never set, a local variable is always used instead. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12ACPI: sleep: init_set_sci_en_on_resume for Dell Studio 155xKamal Mostafa
commit ea5bc73f4f56449b2d450068d492bcd17a675d7a upstream. Add Dell Studio models (1558, 1557, 1555) to the 'set_sci_en_on_resume' list to fix hang on resume. BugLink: http://bugs.launchpad.net/bugs/553498 Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Alex Chiang <achiang@canonical.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12power_meter: acpi_device_class "power_meter_resource" too longDan Carpenter
commit 18262714ca0fb65c290b8ea1807b2b02bb52d0e3 upstream. acpi_device_class can only be 19 characters and a NULL terminator. The current code has a buffer overflow in acpi_power_meter_add(): strcpy(acpi_device_class(device), ACPI_POWER_METER_CLASS); Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Len Brown <lenb@kernel.org> Cc: "Darrick J. Wong" <djwong@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12ACPI: DMI init_set_sci_en_on_resume for multiple Lenovo ThinkPadsAlex Chiang
commit 07bedca29b0973f36a6b6db36936deed367164ed upstream. Multiple Lenovo ThinkPad models with Intel Core i5/i7 CPUs can successfully suspend/resume once, and then hang on the second s/r cycle. We got confirmation that this was due to a BIOS defect. The BIOS did not properly set SCI_EN coming out of S3. The BIOS guys hinted that The Other Leading OS ignores the fact that hardware owns the bit and sets it manually. In any case, an existing DMI table exists for machines where this defect is a known problem. Lenovo promise to fix their BIOS, but for folks who either won't or can't upgrade their BIOS, allow Linux to workaround the issue. https://bugzilla.kernel.org/show_bug.cgi?id=15407 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/532374 Confirmed by numerous testers in the launchpad bug that using acpi_sleep=sci_force_enable fixes the issue. We add the machines to acpisleep_dmi_table[] to automatically enable this workaround. Cc: Colin King <colin.king@canonical.com> Signed-off-by: Alex Chiang <achiang@canonical.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12V4L/DVB: budget: Oops: "BUG: unable to handle kernel NULL pointer dereference"Bjørn Mork
commit 6f550dc08369ee0bc6402963c377e65f0f2e3b71 upstream. Never call dvb_frontend_detach if we failed to attach a frontend. This fixes the following oops, which will be triggered by a missing stv090x module: [ 8.172997] DVB: registering new adapter (TT-Budget S2-1600 PCI) [ 8.209018] adapter has MAC addr = 00:d0:5c:cc:a7:29 [ 8.328665] Intel ICH 0000:00:1f.5: PCI INT B -> GSI 17 (level, low) -> IRQ 17 [ 8.328753] Intel ICH 0000:00:1f.5: setting latency timer to 64 [ 8.562047] DVB: Unable to find symbol stv090x_attach() [ 8.562117] BUG: unable to handle kernel NULL pointer dereference at 000000ac [ 8.562239] IP: [<e08b04a3>] dvb_frontend_detach+0x4/0x67 [dvb_core] Ref http://bugs.debian.org/575207 Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12md/raid6: Fix raid-6 read-error correction in degraded stateGabriele A. Trombetti
commit 87aa63000c484bfb9909989316f615240dfee018 upstream. Fix: Raid-6 was not trying to correct a read-error when in singly-degraded state and was instead dropping one more device, going to doubly-degraded state. This patch fixes this behaviour. Tested-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: Gabriele A. Trombetti <g.trombetti.lkrnl1213@logicschema.com> Reported-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12virtio: initialize earlierStijn Tintel
commit e2dbe06c271f3bb2a495627980aad3d1d8ccef2a upstream. Move initialization of the virtio framework before the initialization of mtd, so that block2mtd can be used on virtio-based block devices. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15644 Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12md: restore ability of spare drives to spin down.NeilBrown
commit 1176568de7e066c0be9e46c37503b9fd4730edcf upstream. Some time ago we stopped the clean/active metadata updates from being written to a 'spare' device in most cases so that it could spin down and say spun down. Device failure/removal etc are still recorded on spares. However commit 51d5668cb2e3fd1827a55 broke this 50% of the time, depending on whether the event count is even or odd. The change log entry said: This means that the alignment between 'odd/even' and 'clean/dirty' might take a little longer to attain, how ever the code makes no attempt to create that alignment, so it could take arbitrarily long. So when we find that clean/dirty is not aligned with odd/even, force a second metadata-update immediately. There are already cases where a second metadata-update is needed immediately (e.g. when a device fails during the metadata update). We just piggy-back on that. Reported-by: Joe Bryant <tenminjoe@yahoo.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12security: testing the wrong variable in create_by_name()Dan Carpenter
commit b338cc8207eae46640a8d534738fda7b5e48511d upstream. There is a typo here. We should be testing "*dentry" instead of "dentry". If "*dentry" is an ERR_PTR, it gets dereferenced in either mkdir() or create() which would cause an OOPs. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sparc64: Fix memory leak in pci_register_iommu_region().David S. Miller
[ Upstream commit e182c77cc291456eed127b1472952ddb59a81a9d ] Found by kmemleak. If request_resource() fails, we leak the struct resource we allocated to represent the IOMMU mapping area. This actually happens on sun4v machines because the IOMEM area is only reported sans the IOMMU region, unlike all previous systems. I'll need to fix that at some point, but for now fix the leak. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sparc64: Adjust __raw_local_irq_save() to cooperate in NMIs.David S. Miller
[ Upstream commits 0c25e9e6cbe7b233bb91d14d0e2c258bf8e6ec83 and c011f80ba0912486fe51dd2b3f71d9b33a151188 ] If we are in an NMI then doing a plain raw_local_irq_disable() will write PIL_NORMAL_MAX into %pil, which is lower than PIL_NMI, and thus we'll re-enable NMIs and recurse. Doing a simple: %pil = %pil | PIL_NORMAL_MAX does what we want, if we're already at PIL_NMI (15) we leave it at that setting, else we set it to PIL_NORMAL_MAX (14). This should get the function tracer working on sparc64. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sparc64: Use kstack_valid() in die_if_kernel().David S. Miller
[ Upstream commit cb256aa60409efd803806cfb0528a4b3f8397dba ] This gets rid of a local function (is_kernel_stack()) which tries to do the same thing, yet poorly in that it doesn't handle IRQ stacks properly. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sparc64: Fix hardirq tracing in trap return path.David S. Miller
[ Upstream commit 28a1f533ae8606020238b840b82ae70a3f87609e ] We can overflow the hardirq stack if we set the %pil here so early, just let the normal control flow do it. This is fine as we are allowed to do the actual IRQ enable at any point after we call trace_hardirqs_on. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sparc64: Fix PREEMPT_ACTIVE value.David S. Miller
[ Upstream commit 6c94b1ee0ca2bfb526d779c088ec20da6a3761db ] It currently overlaps the NMI bit. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sparc64: Use correct pt_regs in decode_access_size() error paths.David S. Miller
[ Upstream commit baa06775e224e9f74e5c2de894c95cd49678beff ] Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.Krzysztof Halasa
[ Upstream commit 31f634a63de7068c6a5dcb0d7b09b24b61a5cf88 ] tx_queue is used as a temporary queue when not allowed to queue skb directly to the hw device driver (which may sleep). Most paths flush it before returning, but ppp_start() currently cannot. Make sure we don't leave skbs pointing to a non-existent device. Thanks to Michael Barkowski for reporting this problem. Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12udp: fix for unicast RX path optimizationJorge Boncompte [DTI2]
[ Upstream commit 1223c67c0938d2df309fde618bd82c87c8c1af04 ] Commits 5051ebd275de672b807c28d93002c2fb0514a3c9 and 5051ebd275de672b807c28d93002c2fb0514a3c9 ("ipv[46]: udp: optimize unicast RX path") broke some programs. After upgrading a L2TP server to 2.6.33 it started to fail, tunnels going up an down, after the 10th tunnel came up. My modified rp-l2tp uses a global unconnected socket bound to (INADDR_ANY, 1701) and one connected socket per tunnel after parameter negotiation. After ten sockets were open and due to mixed parameters to udp[46]_lib_lookup2() kernel started to drop packets. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12tun: orphan an skb on txMichael S. Tsirkin
[ Upstream commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5 ] The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. This happens because tun/tap devices can hang on to skbs undefinitely. As noted by Herbert, possible solutions include a timeout followed by a copy/change of ownership of the skb, or always copying/changing ownership if we're going into a hostile device. This patch implements the second approach. Note: one issue still remaining is that since skbs keep reference to tun socket and tun socket has a reference to tun device, we won't flush backlog, instead simply waiting for all skbs to get transmitted. At least this is not user-triggerable, and this was not reported in practice, my assumption is other devices besides tap complete an skb within finite time after it has been queued. A possible solution for the second issue would not to have socket reference the device, instead, implement dev->destructor for tun, and wait for all skbs to complete there, but this needs some thought, probably too risky for 2.6.34. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Yan Vugenfirer <yvugenfi@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12tipc: Fix oops on send prior to entering networked mode (v3)Neil Horman
[ Upstream commit d0021b252eaf65ca07ed14f0d66425dd9ccab9a6 ] Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE user programs can oops the kernel by sending datagrams via AF_TIPC prior to entering networked mode. The following backtrace has been observed: ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" [exception RIP: tipc_node_select_next_hop+90] RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 ORIG_RAX: 000000000000002c CS: 0033 SS: 002b What happens is that, when the tipc module in inserted it enters a standalone node mode in which communication to its own address is allowed <0.0.0> but not to other addresses, since the appropriate data structures have not been allocated yet (specifically the tipc_net pointer). There is nothing stopping a client from trying to send such a message however, and if that happens, we attempt to dereference tipc_net.zones while the pointer is still NULL, and explode. The fix is pretty straightforward. Since these oopses all arise from the dereference of global pointers prior to their assignment to allocated values, and since these allocations are small (about 2k total), lets convert these pointers to static arrays of the appropriate size. All the accesses to these bits consider 0/NULL to be a non match when searching, so all the lookups still work properly, and there is no longer a chance of a bad dererence anywhere. As a bonus, this lets us eliminate the setup/teardown routines for those pointers, and elimnates the need to preform any locking around them to prevent access while their being allocated/freed. I've updated the tipc_net structure to behave this way to fix the exact reported problem, and also fixed up the tipc_bearers and media_list arrays to fix an obvious simmilar problem that arises from issuing tipc-config commands to manipulate bearers/links prior to entering networked mode I've tested this for a few hours by running the sanity tests and stress test with the tipcutils suite, and nothing has fallen over. There have been a few lockdep warnings, but those were there before, and can be addressed later, as they didn't actually result in any deadlock. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Allan Stephens <allan.stephens@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: tipc-discussion@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12net: Fix oops from tcp_collapse() when using splice()Steven J. Magnani
[ Upstream commit baff42ab1494528907bf4d5870359e31711746ae ] tcp_read_sock() can have a eat skbs without immediately advancing copied_seq. This can cause a panic in tcp_collapse() if it is called as a result of the recv_actor dropping the socket lock. A userspace program that splices data from a socket to either another socket or to a file can trigger this bug. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sctp: Fix oops when sending queued ASCONF chunksVlad Yasevich
[ Upstream commit c0786693404cffd80ca3cb6e75ee7b35186b2825 ] When we finish processing ASCONF_ACK chunk, we try to send the next queued ASCONF. This action runs the sctp state machine recursively and it's not prepared to do so. kernel BUG at kernel/timer.c:790! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/module/ipv6/initstate Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0 EIP is at add_timer+0xd/0x1b EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4 ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000) Stack: c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004 <0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004 <0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0 Call Trace: [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp] [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp] [<d1863386>] ? sctp_pname+0x0/0x1d [sctp] [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp] [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp] [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp] [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp] [<d1863334>] ? sctp_cname+0x0/0x52 [sctp] [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp] [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp] [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp] Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie> Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sctp: fix to calc the INIT/INIT-ACK chunk length correctly is setWei Yongjun
[ Upstream commita8170c35e738d62e9919ce5b109cf4ed66e95bde ] When calculating the INIT/INIT-ACK chunk length, we should not only account the length of parameters, but also the parameters zero padding length, such as AUTH HMACS parameter and CHUNKS parameter. Without the parameters zero padding length we may get following oops. skb_over_panic: text:ce2068d2 len:130 put:6 head:cac3fe00 data:cac3fe00 tail:0xcac3fe82 end:0xcac3fe80 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:127! invalid opcode: 0000 [#2] SMP last sysfs file: /sys/module/aes_generic/initstate Modules linked in: authenc ...... Pid: 4102, comm: sctp_darn Tainted: G D 2.6.34-rc2 #6 EIP: 0060:[<c0607630>] EFLAGS: 00010282 CPU: 0 EIP is at skb_over_panic+0x37/0x3e EAX: 00000078 EBX: c07c024b ECX: c07c02b9 EDX: cb607b78 ESI: 00000000 EDI: cac3fe7a EBP: 00000002 ESP: cb607b74 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process sctp_darn (pid: 4102, ti=cb607000 task=cabdc990 task.ti=cb607000) Stack: c07c02b9 ce2068d2 00000082 00000006 cac3fe00 cac3fe00 cac3fe82 cac3fe80 <0> c07c024b cac3fe7c cac3fe7a c0608dec ca986e80 ce2068d2 00000006 0000007a <0> cb8120ca ca986e80 cb812000 00000003 cb8120c4 ce208a25 cb8120ca cadd9400 Call Trace: [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp] [<c0608dec>] ? skb_put+0x2e/0x32 [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp] [<ce208a25>] ? sctp_make_init+0x279/0x28c [sctp] [<c0686a92>] ? apic_timer_interrupt+0x2a/0x30 [<ce1fdc0b>] ? sctp_sf_do_prm_asoc+0x2b/0x7b [sctp] [<ce202823>] ? sctp_do_sm+0xa0/0x14a [sctp] [<ce2133b9>] ? sctp_pname+0x0/0x14 [sctp] [<ce211d72>] ? sctp_primitive_ASSOCIATE+0x2b/0x31 [sctp] [<ce20f3cf>] ? sctp_sendmsg+0x7a0/0x9eb [sctp] [<c064eb1e>] ? inet_sendmsg+0x3b/0x43 [<c04244b7>] ? task_tick_fair+0x2d/0xd9 [<c06031e1>] ? sock_sendmsg+0xa7/0xc1 [<c0416afe>] ? smp_apic_timer_interrupt+0x6b/0x75 [<c0425123>] ? dequeue_task_fair+0x34/0x19b [<c0446abb>] ? sched_clock_local+0x17/0x11e [<c052ea87>] ? _copy_from_user+0x2b/0x10c [<c060ab3a>] ? verify_iovec+0x3c/0x6a [<c06035ca>] ? sys_sendmsg+0x186/0x1e2 [<c042176b>] ? __wake_up_common+0x34/0x5b [<c04240c2>] ? __wake_up+0x2c/0x3b [<c057e35c>] ? tty_wakeup+0x43/0x47 [<c04430f2>] ? remove_wait_queue+0x16/0x24 [<c0580c94>] ? n_tty_read+0x5b8/0x65e [<c042be02>] ? default_wake_function+0x0/0x8 [<c0604e0e>] ? sys_socketcall+0x17f/0x1cd [<c040264c>] ? sysenter_do_call+0x12/0x22 Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 ...... EIP: [<c0607630>] skb_over_panic+0x37/0x3e SS:ESP 0068:cb607b74 To reproduce: # modprobe sctp # echo 1 > /proc/sys/net/sctp/addip_enable # echo 1 > /proc/sys/net/sctp/auth_enable # sctp_test -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 800 -l # sctp_darn -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 900 -h 192.168.0.21 -p 800 -I -s -t sctp_darn ready to send... 3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.0.21 3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.1.21 3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> snd=10 ------------------------------------------------------------------ eth0 has addresses: 3ffe:501:ffff:100:20c:29ff:fe4d:f37e and 192.168.0.21 eth1 has addresses: 192.168.1.21 ------------------------------------------------------------------ Reported-by: George Cheimonidis <gchimon@gmail.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sctp: per_cpu variables should be in bh_disabled sectionVlad Yasevich
[ Upstream commit 81419d862db743fe4450a021893f24bab4698c1d ] Since the change of the atomics to percpu variables, we now have to disable BH in process context when touching percpu variables. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sctp: fix potential reference of a freed pointerVlad Yasevich
[ Upstream commit 0c42749cffbb4a06be86c5e5db6c7ebad548781f ] When sctp attempts to update an assocition, it removes any addresses that were not in the updated INITs. However, the loop may attempt to refrence a transport with address after removing it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12sctp: avoid irq lock inversion while call sk->sk_data_ready()Wei Yongjun
[ Upstream commit 561b1733a465cf9677356b40c27653dd45f1ac56 ] sk->sk_data_ready() of sctp socket can be called from both BH and non-BH contexts, but the default sk->sk_data_ready(), sock_def_readable(), can not be used in this case. Therefore, we have to make a new function sctp_data_ready() to grab sk->sk_data_ready() with BH disabling. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.33-rc6 #129 --------------------------------------------------------- sctp_darn/1517 just changed the state of lock: (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80 but this lock took another, SOFTIRQ-unsafe lock in the past: (slock-AF_INET){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 1 lock held by sctp_darn/1517: #0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12ipv6: Fix tcp_v6_send_response transport header setting.Herbert Xu
[ Upstream commit 6651ffc8e8bdd5fb4b7d1867c6cfebb4f309512c ] My recent patch to remove the open-coded checksum sequence in tcp_v6_send_response broke it as we did not set the transport header pointer on the new packet. Actually, there is code there trying to set the transport header properly, but it sets it for the wrong skb ('skb' instead of 'buff'). This bug was introduced by commit a8fdf2b331b38d61fb5f11f3aec4a4f9fb2dedcb ("ipv6: Fix tcp_v6_send_response(): it didn't set skb transport header") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12ieee802154: Fix oops during ieee802154_sock_ioctlStefan Schmidt
[ Upstream commit 93c0c8b4a5a174645550d444bd5c3ff0cccf74cb ] Trying to run izlisten (from lowpan-tools tests) on a device that does not exists I got the oops below. The problem is that we are using get_dev_by_name without checking if we really get a device back. We don't in this case and writing to dev->type generates this oops. [Oops code removed by Dmitry Eremin-Solenikov] If possible this patch should be applied to the current -rc fixes branch. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12cdc_ether: fix autosuspend for mbm devicesTorgny Johansson
[ Upstream commit 55964d72d63b15df49a5df11ef91dc8601270815 ] Autosuspend works until you bring the wwan interface up, then the device does not enter autosuspend anymore. The following patch fixes the problem by setting the .manage_power field in the mbm_info struct to the same as in the cdc_info struct (cdc_manager_power). Signed-off-by: Torgny Johansson <torgny.johansson@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12bnx2: Fix lost MSI-X problem on 5709 NICs.Michael Chan
commit c441b8d2cb2194b05550a558d6d95d8944e56a84 upstream. It has been reported that under certain heavy traffic conditions in MSI-X mode, the driver can lose an MSI-X vector causing all packets in the associated rx/tx ring pair to be dropped. The problem is caused by the chip dropping the write to unmask the MSI-X vector by the kernel (when migrating the IRQ for example). This can be prevented by increasing the GRC timeout value for these register read and write operations. Thanks to Dell for helping us debug this problem. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12powerpc: Reduce printk from pseries_mach_cpu_die()Vaidyanathan Srinivasan
commit a8e6da093ea8642b1320fb5d64134366f2a8d0ac upstream. Remove debug printks in pseries_mach_cpu_die(). These are noisy at runtime. Traceevents can be added to instrument this section of code. The following KERN_INFO printks are removed: cpu 62 (hwid 62) returned from cede. Decrementer value = b2802fff Timebase value = 2fa8f95035f4a cpu 62 (hwid 62) got prodded to go online cpu 58 (hwid 58) ceding for offline with hint 2 Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12powerpc: Move checks in pseries_mach_cpu_die()Vaidyanathan Srinivasan
commit 0212f2602a38e740d5a96aba4cebfc2ebc993ecf upstream. Rearrange condition checks for better code readability and prevention of possible race conditions when preferred_offline_state can potentially change during the execution of pseries_mach_cpu_die(). The patch will make pseries_mach_cpu_die() put cpu in one of the consistent states and not hit the run over BUG() Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12powerpc: Reset kernel stack on cpu online from cede stateVaidyanathan Srinivasan
commit 8dbce53cc249a76e9450708d291fce5a7e29c6a1 upstream. Cpu hotplug (offline) without dlpar operation will place cpu in cede state and the extended_cede_processor() function will return when resumed. Kernel stack pointer needs to be reset before start_secondary() is called to continue the online operation. Added new function start_secondary_resume() to do the above steps. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>