|
In testing the mnt_count typo fix, I hit a few BUG_ON/WARN_ON messages
in the mntput_no_expire code.
The first issue was a race against the MNT_MOUNTED flag: if someone
changes the value after the optimistic lock-free check is done, we
might BUG_ON after getting the lock. The fix is, after getting the
lock, to re-check the MNT_MOUNTED bit, and to drop the lock and try
again if it has changed.
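Roughly, the retry now has this shape (a sketch against the -rt
vfsmount locking names; the surrounding details are assumed):

again:
        /* optimistic lock-free check */
        mounted = mnt->mnt_flags & MNT_MOUNTED;
        vfsmount_write_lock();
        if ((mnt->mnt_flags & MNT_MOUNTED) != mounted) {
                /* flag changed under us: drop the lock and retry */
                vfsmount_write_unlock();
                goto again;
        }
        /* the BUG_ON/WARN_ON assumptions now hold under the lock */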
The second issue was a call to smp_processor_id() in add_mnt_count()
that was done while preemptible. This was missed in my earlier commit
070976b5b038218900648ea4cc88786d5dfcd58d.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1273711934.2856.22.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Clark noticed the following snippet in commit
070976b5b038218900648ea4cc88786d5dfcd58d :
if (mnt->mnt_pinned) {
- inc_mnt_count(mnt);
+ preempt_disable();
+ dec_mnt_count(mnt);
+ preempt_enable();
mnt->mnt_pinned--;
}
vfsmount_write_unlock();
I accidentally replaced an inc_mnt_count() with a dec_mnt_count().
The issue went unnoticed, as the only user of mnt_unpin() is the acct
syscall.
This patch corrects the mistake.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
LKML-Reference: <1273711544.2856.15.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Conflicts:
Makefile
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
commit 9bf729c0af67897ea8498ce17c29b0683f7f2028 upstream
On low memory boxes or those with highmem, the kernel can OOM before
the background xfssyncd reclaims inodes. Add a shrinker to run inode
reclaim so that inode reclaim is expedited when memory is low.
This is more complex than it needs to be because the VM folk don't
want a context added to the shrinker infrastructure. Hence we need
to add a global list of XFS mount structures so the shrinker can
traverse them.
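A sketch of the shape this takes (the list field, the reclaim helper
and the era's two-argument shrinker callback are assumptions, not the
verbatim patch):

static LIST_HEAD(xfs_mount_list);               /* every live XFS mount */
static DEFINE_MUTEX(xfs_mount_list_lock);

/* the shrinker core passes no context, so walk the global list */
static int xfs_reclaim_inode_shrink(int nr_to_scan, gfp_t gfp_mask)
{
        struct xfs_mount *mp;

        if (!nr_to_scan)
                return 0;       /* a real shrinker reports its reclaimable count here */

        mutex_lock(&xfs_mount_list_lock);
        list_for_each_entry(mp, &xfs_mount_list, m_mplist)
                xfs_reclaim_inodes(mp, 0);      /* assumed reclaim helper */
        mutex_unlock(&xfs_mount_list_lock);
        return 0;
}

static struct shrinker xfs_inode_shrinker = {
        .shrink = xfs_reclaim_inode_shrink,
        .seeks  = DEFAULT_SEEKS,
};

Each mount is added to xfs_mount_list at mount time and removed at
unmount, and the shrinker is registered once with register_shrinker().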
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2b0b39517d1af5294128dbc2fd7ed39c8effa540 upstream.
Resizing the filesystem would result in a diAllocExt error in some
instances, because changes in bmp->db_agsize would not get noticed if
goto extendBmap was called.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1 upstream.
ext4_fiemap() rounds the length of the requested range down to
blocksize, which is not the true number of blocks that cover the
requested region. This problem is especially striking if the user
requests only the first byte of a file: not a single extent will be
reported.
We fix this by calculating the last block of the region and then
subtracting to find the number of blocks in the extents.
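The arithmetic amounts to rounding the end of the range up instead of
rounding the length down, roughly:

        start_blk = start >> blkbits;
        last_blk  = (start + len - 1) >> blkbits;
        len_blks  = last_blk - start_blk + 1;   /* >= 1, even for 1 byte */

so a request for the first byte of a file now covers one block instead
of zero.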
Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 356e76b855bdbfd8d1c5e75bcf0c6bf0dfe83496 upstream.
NFSv4 mounts ignore the rsize and wsize mount options, and always use
the default transfer size for both. This seems to be because all
NFSv4 mounts are now cloned, and the cloning logic doesn't copy the
rsize and wsize settings from the parent nfs_server.
I tested Fedora's 2.6.32.11-99 and it seems to have this problem as
well, so I'm guessing that .33, .32, and perhaps older kernels are
affected too.
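The shape of the fix is simply to inherit the parent's sizes in the
cloning path instead of falling back to the defaults; schematically
(variable names assumed):

        /* in the nfs_server cloning logic */
        target->rsize = source->rsize;
        target->wsize = source->wsize;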
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f upstream.
If a dentry found to be stale happens to be the root of a disconnected
tree, we can't d_drop() it; its d_hash is actually part of s_anon and
d_drop() would simply hide it from shrink_dcache_for_umount(), leading
to all sorts of fun, including busy inodes on umount and oopsen after
that.
The bug has been there since at least 2006 (commit c636eb already has
it), so it's definitely -stable fodder.
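The defensive check boils down to something like this sketch (the
exact call site is assumed):

        /* a disconnected root is hashed on sb->s_anon, not the dentry
         * hash table, so d_drop() must not unhash it */
        if (!(IS_ROOT(dentry) && (dentry->d_flags & DCACHE_DISCONNECTED)))
                d_drop(dentry);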
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a36d515c7a2dfacebcf41729f6812dbc424ebcf0 upstream.
When asked for a partial read of the LVB in a dlmfs file, we can
accidentally calculate a negative count.
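The fix amounts to clamping the count against the LVB size before
doing the copy; a minimal sketch (variable names assumed):

        if (*ppos >= i_size)
                return 0;       /* nothing left to read */
        if (count > i_size - *ppos)
                count = i_size - *ppos; /* keeps the remainder from going negative */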
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a42ab8e1a37257da37e0f018e707bf365ac24531 upstream.
Online resize writes out the new superblock and its backups directly.
The metaecc data wasn't being recomputed. Let's do that directly.
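The shape of the fix, assuming the standard ocfs2 blockcheck helper
and the surrounding names:

        /* refresh the check info for the block we are about to write */
        ocfs2_compute_meta_ecc(sb, bh->b_data, &di->i_check);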
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0350cb078f5035716ebdad4ad4709d02fe466a8a upstream.
If "handle" is non null at the end of the function then we assume it's a
valid pointer and pass it to ocfs2_commit_trans();
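The error-path cleanup looks roughly like this (names assumed):

        handle = ocfs2_start_trans(osb, credits);
        if (IS_ERR(handle)) {
                status = PTR_ERR(handle);
                handle = NULL;  /* keep the unwind path from committing it */
                goto bail;
        }
        /* ... more work that can fail and jump to bail ... */
bail:
        if (handle)
                ocfs2_commit_trans(osb, handle);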
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c21a534e2f24968cf74976a4e721ac194db30ded upstream.
In reflink we update the id info on the disk but forgot to update
the corresponding information in the VFS inode. Update them
accordingly when we want to preserve the attributes.
Reported-by: Jeff Liu <jeff.liu@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2bc3c1179c781b359d4f2f3439cb3df72afc17fc upstream.
When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to. So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.
We never encountered this in testing because typically the only
operations which use more than two pages are write-like operations,
which have their own decoding logic. Something like a getattr after a
write may cross a page boundary, but it would be very unusual for it to
cross another boundary after that.
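The fix is to bound argp->end within the page argp->p was just
switched to; schematically (argp->p is a __be32 pointer, hence the
divide-by-4 on the page size):

        argp->p = page_address(argp->pagelist[0]);
        argp->pagelist++;
        /* point end at the end of this page, not a stale address */
        argp->end = argp->p + (PAGE_SIZE >> 2);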
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 3835541dd481091c4dbf5ef83c08aed12e50fd61 upstream.
Correct the file_operations struct in fdinfo entry of tid_base_stuff[].
Presently /proc/*/task/*/fdinfo contains symlinks to opened files like
/proc/*/fd/.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit fb2162df74bb19552db3d988fd11c787cf5fad56 upstream.
Commit 48b32a3553a54740d236b79a90f20147a25875e3 ("reiserfs: use generic
xattr handlers") introduced a problem that causes corruption when extended
attributes are replaced with a smaller value.
The issue is that the reiserfs_setattr() call to shrink the xattr file
was moved from before the write to after the write.
The root issue has always been in the reiserfs xattr code, but was papered
over by the fact that in the shrink case, the file would just be expanded
again while the xattr was written.
The end result is that the last 8 bytes of xattr data are lost.
This patch fixes it to use new_size.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Cc: Jethro Beekman <kernel@jbeekman.nl>
Cc: Greg Surbey <gregsurbey@hotmail.com>
Cc: Marco Gatti <marco.gatti@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cac36f707119b792b2396aed371d6b5cdc194890 upstream.
Commit 677c9b2e393a0cd203bd54e9c18b012b2c73305a ("reiserfs: remove
privroot hiding in lookup") removed the magic from the lookup code to hide
the .reiserfs_priv directory since it was getting loaded at mount-time
instead. The intent was that the entry would be hidden from the user via
a poisoned d_compare, but this was faulty.
This introduced a security issue where unprivileged users could access and
modify extended attributes or ACLs belonging to other users, including
root.
This patch resolves the issue by properly hiding .reiserfs_priv. This was
the intent of the xattr poisoning code, but it appears to have never
worked as expected. This is fixed by using d_revalidate instead of
d_compare.
This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way.
The effort involved in working out the corner cases wrt permissions and
caching outweigh the benefit of the feature.
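The mechanism is a d_revalidate that always fails for the privroot, so
ordinary lookups can never resolve it; a minimal sketch:

static int xattr_hide_revalidate(struct dentry *dentry,
                                 struct nameidata *nd)
{
        return -EPERM;  /* the privroot is never valid for user lookups */
}

const struct dentry_operations xattr_lookup_poison_ops = {
        .d_revalidate = xattr_hide_revalidate,
};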
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Reported-by: Matt McCutchen <matt@mattmccutchen.net>
Tested-by: Matt McCutchen <matt@mattmccutchen.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Apparently the conversion from the dcache_lock to the autofs4_lock
forgot that this function already grabs the autofs4_lock for a small
moment, so we end up grabbing the lock, then a moment later grabbing it
again. Splat.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
LKML-Reference: <1273279153.2776.7.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Fix for this compile warning:
fs/namespace.c:757: warning: suggest parentheses around operand \
of '!' or change '&' to '&&' or '!' to '~'
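The warning is about operator precedence: ! binds tighter than &.
Assuming the usual flag test, the fix has this form:

        if (!mnt->mnt_flags & MNT_MOUNTED)      /* parses as (!mnt->mnt_flags) & MNT_MOUNTED */
        if (!(mnt->mnt_flags & MNT_MOUNTED))    /* the intended test */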
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
dput() drops dentry->d_lock when it fails to lock inode->i_lock or
parent->d_lock. dentry->d_count is 0 at this point, so the dentry can
be killed and freed by someone else. This leaves dput() with a stale
pointer in the retry code, which results in interesting kernel crashes.
Prevent this by incrementing dentry->d_count before dropping the
lock. Go back to start after dropping the lock so d_count is
decremented again.
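The resulting retry pattern looks roughly like this sketch (with
d_count an int protected by d_lock, as in this tree):

repeat:
        spin_lock(&dentry->d_lock);
        if (!spin_trylock(&inode->i_lock)) {
                dentry->d_count++;      /* pin the dentry so it can't be freed */
                spin_unlock(&dentry->d_lock);
                cpu_relax();
                spin_lock(&dentry->d_lock);
                dentry->d_count--;      /* drop the pin and start over */
                spin_unlock(&dentry->d_lock);
                goto repeat;
        }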
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The VFS scalability rework broke UP due to a stupid typo which
enqueued inodes on the file list.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Nick converted the dentry->d_mounted counter to a flag, however with
namespaces, dentries can be mounted multiple times (and more
importantly unmounted multiple times).
If a namespace was created and then released, the unmount_tree would
remove the DCACHE_MOUNTED flag and that would make d_mountpoint fail,
causing the mounts to be lost.
This patch converts it back to a counter, and adds some extra WARN_ONs
to make sure things are accounted properly.
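The accounting then looks like this sketch (surrounding context
assumed):

        /* mount: each namespace mount takes a count */
        spin_lock(&dentry->d_lock);
        dentry->d_mounted++;
        spin_unlock(&dentry->d_lock);

        /* unmount: warn if we unmount more often than we mounted */
        spin_lock(&dentry->d_lock);
        WARN_ON(dentry->d_mounted <= 0);
        dentry->d_mounted--;
        spin_unlock(&dentry->d_lock);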
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1272522942.1967.12.camel@work-vm>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
i_count is no longer atomic. Fix up the leftover.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
d_count got converted to int and back to atomic_t. Two instances were
missed in the backward conversion. Fix them up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
We don't call attach_mnt on a cloned rootfs so set the MNT_MOUNTED
flag in copy_tree().
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On RT, the lock() inside the preempt-disabled region of get_cpu_var()
results in a might-sleep warning.
Restructure the code and open-code the check for the atomic transition
to 0, so that vfsmount_write_lock() is avoided when ns->count > 1.
If ns->count == 1, do the atomic decrement under full locking of
namespace_sem and vfsmount_write_lock(). In most cases the
atomic_dec_and_test() will have dropped ns->count to 0, so we need the
full locking anyway.
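A sketch of the restructured put path:

void put_mnt_ns(struct mnt_namespace *ns)
{
        /* fast path: count was > 1, no vfsmount_write_lock() needed */
        if (atomic_add_unless(&ns->count, -1, 1))
                return;
        /* slow path: we are most likely the last reference */
        down_write(&namespace_sem);
        vfsmount_write_lock();
        if (atomic_dec_and_test(&ns->count)) {
                /* really the last reference: umount_tree() and release */
        }
        vfsmount_write_unlock();
        up_write(&namespace_sem);
}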
Based on a patch from John Stultz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
do_unmount() does a lock() instead of an unlock() in a return path,
which leads to a deadlock when this code path is taken. Fix the typo.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Amit Arora noticed some compile issues with coda, and an fs.h include
issue, so this patch fixes those along with btrfs warnings.
Thanks to Amit for the testing!
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On Wed, Apr 07, 2010 at 04:21:18PM -0700, john stultz wrote:
> Further using lockstat I was able to isolate it the contention down to
> the journal j_state_lock, and then adding some lock owner tracking, I
> was able to see that the lock owners were almost always in
> start_this_handle, and jbd2_journal_stop when we saw contention (with
> the freq breakdown being about 55% in jbd2_journal_stop and 45% in
> start_this_handle).
Hmm.... I've taken a very close look at jbd2_journal_stop(), and I
don't think we need to take j_state_lock() at all except if we need to
call jbd2_log_start_commit(). t_outstanding_credits,
h_buffer_credits, and t_updates are all documented (and verified by
me) to be protected by the t_handle_lock spinlock.
So I ***think*** the following might be safe. WARNING! WARNING!! No
real testing done on this patch, other than "it compiles! ship it!!".
I'll let other people review it, and maybe you could give this a run
and see what happens with this patch?
- Ted
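The gist of what Ted describes, heavily abridged (the commit-decision
flag is a placeholder for the existing heuristics):

        spin_lock(&transaction->t_handle_lock);
        transaction->t_outstanding_credits -= handle->h_buffer_credits;
        if (--transaction->t_updates == 0)
                wake_up(&journal->j_wait_updates);
        spin_unlock(&transaction->t_handle_lock);

        /* j_state_lock is now only taken if we must kick a commit */
        if (want_commit)
                jbd2_log_start_commit(journal, transaction->t_tid);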
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
After adding an xfs partition to my system, I started seeing
boot time NULL pointer oopses, and bisected it down to the
fs-scale-pseudo change.
Not sure what the right fix is, but this change avoids the issue.
Here's the bug I was seeing on boot:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffff81103d42>] link_path_walk+0xd12/0xda0
PGD 42b12e067 PUD 42cb2a067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/block/md0/dev
CPU 7
Pid: 2993, comm: vgs Not tainted 2.6.33-rc8john #272 Server Blade/IBM eServer BladeCenter HS21 -[7995AC1]-
RIP: 0010:[<ffffffff81103d42>] [<ffffffff81103d42>] link_path_walk+0xd12/0xda0
RSP: 0018:ffff88042a929b78 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88042ab41000 RCX: ffff88042ab41028
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88042aa0fcc0
RBP: ffff88042a929c28 R08: ffff88042aa0fcc0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88042c6a40b0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88042a929dc8
FS: 00007f6f8c481710(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000030 CR3: 000000042b310000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process vgs (pid: 2993, threadinfo ffff88042a928000, task ffff88042ab41000)
Stack:
ffff88042ab41000 ffff88042ab41000 ffff88042ab41000 ffff88042ab41000
<0> 0000000100000000 ffff88042a929de8 ffff880400000000 0000000000000000
<0> ffff88042f6b5610 0000000000000000 0000000000000000 ffff88042f418920
Call Trace:
[<ffffffff811006c2>] ? path_get+0x32/0x50
[<ffffffff81103c50>] link_path_walk+0xc20/0xda0
[<ffffffff811006c2>] ? path_get+0x32/0x50
[<ffffffff81103f7c>] path_walk+0x5c/0xd0
[<ffffffff811041de>] do_path_lookup+0x1ee/0x250
[<ffffffff81103ff0>] ? do_path_lookup+0x0/0x250
[<ffffffff81104ebb>] user_path_at+0x7b/0xb0
[<ffffffff81112bb1>] ? vfsmount_read_unlock+0x31/0x60
[<ffffffff81114788>] ? mntput_no_expire+0x48/0x190
[<ffffffff810fb293>] ? cp_new_stat+0xe3/0xf0
[<ffffffff810fb4ac>] vfs_fstatat+0x3c/0x80
[<ffffffff810fb616>] vfs_stat+0x16/0x20
[<ffffffff810fb63f>] sys_newstat+0x1f/0x50
[<ffffffff81994a33>] ? lockdep_sys_exit_thunk+0x35/0x67
[<ffffffff810025eb>] system_call_fastpath+0x16/0x1b
Code: ec e8 93 c8 ff ff 0f 1f 00 e9 46 ff ff ff 41 83 7f 34 04 66 0f 1f 44 00 00 0f 85 38 ff ff ff 4d 8b 67 08 49 8b 84 24 b8 00 00 00 <48> 8b 40 30 f6 40 09 40 0f 84 1e ff ff ff 49 8b 44 24 70 4c 89
RIP [<ffffffff81103d42>] link_path_walk+0xd12/0xda0
RSP <ffff88042a929b78>
CR2: 0000000000000030
---[ end trace 0dd94d94b1b27094 ]---
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Quoting Nick:
"BTW there are a few issues Al pointed out. We have to synchronize RCU
after unregistering a filesystem so d_ops/i_ops doesn't go away, and
mntput can sleep so we can't do it under RCU read lock."
This patch simply calls synchronize_rcu() in unregister_filesystem()
to avoid this issue.
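The tail of unregister_filesystem() then looks roughly like:

        write_lock(&file_systems_lock);
        /* unlink the fs from the file_systems list */
        write_unlock(&file_systems_lock);
        /* wait out RCU walkers that may still see this fs's d_op/i_op
         * before the module owning them can go away */
        synchronize_rcu();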
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Originally found by Anton Blanchard, this patch makes sure
we keep the MNT_MOUNTED flag set in do_remount(). Without this,
scalability suffers pretty badly.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This patch reverts the portion of Nick's vfs scalability patch that
converts the dentry d_count from an atomic_t to an int protected by
the d_lock.
This greatly improves vfs scalability with the -rt kernel, as
the extra lock contention on the d_lock hurts very badly when
CONFIG_PREEMPT_RT is enabled and the spinlocks become rtmutexes.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In Nick's patches, there are a few spots that use get_cpu_var() to
access a per-cpu spinlock. However, put_cpu_var() isn't called until
after the lock is acquired and released. This causes might-sleep
warnings with -rt.
Move the put_cpu_var() above the spin_lock/unlock calls to avoid this.
Not sure if this is 100% right, but it seems to work. Not sure what
holding the get does for the lock, since once we have the lock, the
reference shouldn't change. Other users of the same lock don't bother
with the get_cpu_var() method and just use per_cpu().
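The before/after shape of the change (lock variable name assumed):

        /* before: preemption stays disabled across the whole section */
        lock = &get_cpu_var(vfsmount_lock);
        spin_lock(lock);                /* may sleep on -rt: warning */
        spin_unlock(lock);
        put_cpu_var(vfsmount_lock);

        /* after: re-enable preemption first; holding the lock is what
         * keeps the critical section stable, not the get_cpu_var() */
        lock = &get_cpu_var(vfsmount_lock);
        put_cpu_var(vfsmount_lock);
        spin_lock(lock);
        spin_unlock(lock);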
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
With Nick's vfs patches, inc/dec_mnt_count use per-cpu counters, so
this patch makes sure we disable preemption before calling them.
It's not a great fix, but it works because count_mnt_count() sums all
the per-cpu values, so each one individually doesn't need to be zeroed
out.
I suspect the better fix for -rt is to revert mnt_count back to an
atomic counter.
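For reference, the call sites become:

        preempt_disable();      /* pin the cpu across the per-cpu update */
        inc_mnt_count(mnt);
        preempt_enable();

Which cpu's counter absorbs the increment doesn't matter, since only
the sum is ever read.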
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Because vfsmount_read_lock() acquires the vfsmount spinlock for the
current cpu, it causes problems with -rt, as you might migrate between
cpus between a lock and an unlock.
This patch fixes the issue by having the caller pick a cpu, then
consistently use that cpu between the lock and unlock. We may migrate
in between, but that's ok because we're not doing anything cpu
specific, other than avoiding contention on the read side across the
cpus.
It's not pretty, but it works and statistically shouldn't hurt
performance.
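So the callers end up of this form:

        int cpu = raw_smp_processor_id();       /* a pick, not a pin */

        vfsmount_read_lock(cpu);
        /* read-side work; migrating away is fine, we only have to
         * unlock the same per-cpu lock we locked */
        vfsmount_read_unlock(cpu);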
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The rt hack in mnt_want_write needs to be changed to work with
Nick's VFS patches.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
I was seeing "MNT_MOUNTED already set" WARN_ON messages in
commit_tree(). This seems to be caused by clone_mnt() copying the flag
of an already mounted mnt to the clone before it is used by
commit_tree().
My fix (which may not be correct) is to clear MNT_MOUNTED on the cloned
mnt.
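i.e. in clone_mnt(), after the flags are copied over:

        mnt->mnt_flags = old->mnt_flags;
        mnt->mnt_flags &= ~MNT_MOUNTED; /* the clone isn't attached yet;
                                           commit_tree() will set it */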
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This patch is just the delta from Nick's 06102009 and his 09102009 megapatches.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
fs: inode per-cpu nr_inodes counter
Avoids cache line ping-pong between cpus and prepares the next patch,
because updates of nr_inodes don't need inode_lock anymore.
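A minimal sketch of the counter (names assumed):

static DEFINE_PER_CPU(int, nr_inodes);

/* writers touch only their own cpu's slot: no shared cache line */
static inline void nr_inodes_inc(void)
{
        get_cpu_var(nr_inodes)++;
        put_cpu_var(nr_inodes);
}

/* readers pay instead, summing all cpus */
int get_nr_inodes(void)
{
        int cpu, sum = 0;

        for_each_possible_cpu(cpu)
                sum += per_cpu(nr_inodes, cpu);
        return sum < 0 ? 0 : sum;       /* per-cpu sums can transiently dip negative */
}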
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
fs: inode per-cpu last_ino allocator
new_inode() dirties a contended cache line to get increasing
inode numbers.
Solve this problem by giving each cpu a per_cpu variable, fed from
the shared last_ino, but only once every 1024 allocations.
This reduces contention on the shared last_ino and gives the same
spread of inode numbers as before
(same wraparound after 2^32 allocations).
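The batching allocator in sketch form:

#define LAST_INO_BATCH 1024

static DEFINE_PER_CPU(unsigned int, last_ino);
static atomic_t shared_last_ino;

static unsigned int last_ino_get(void)
{
        unsigned int *p = &get_cpu_var(last_ino);
        unsigned int res = *p;

        /* refill from the shared counter once per 1024 allocations */
        if (unlikely((res & (LAST_INO_BATCH - 1)) == 0))
                res = (unsigned int)atomic_add_return(LAST_INO_BATCH,
                                &shared_last_ino) - LAST_INO_BATCH;

        *p = ++res;
        put_cpu_var(last_ino);
        return res;
}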
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
XXX: this should be folded back into the individual locking patches
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Regardless of how much we try to scale the dcache, there is likely
always going to be some fundamental contention when adding or removing
children under the same parent. Pseudo filesystems do not seem to need
connected dentries, because by definition they are disconnected.
XXX: is this right? I can't see any reason why they need to have a real
parent.
TODO: add a d_instantiate_something() and avoid adding the extra checks
for !d_parent
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This enables locking to be reduced and simplified.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
RCU free the struct inode. This will allow:
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- eventually, completely write-free RCU path walking. The inode must be
  consulted for permissions when walking, so a write-free reference
  (ie. RCU) is helpful.
- can potentially simplify things a bit in VM land. May not need to take the
page lock to get back to the page->mapping.
- can remove some nested trylock loops in dcache code
todo: convert all filesystems
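The core of the change is deferring the final free through an RCU
callback; a sketch, assuming an i_rcu head is added to struct inode:

static void inode_free_rcu(struct rcu_head *head)
{
        struct inode *inode = container_of(head, struct inode, i_rcu);

        kmem_cache_free(inode_cachep, inode);
}

static void destroy_inode(struct inode *inode)
{
        /* RCU path walkers may still be looking at this inode */
        call_rcu(&inode->i_rcu, inode_free_rcu);
}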
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implement a lazy inode LRU similarly to the dcache. This should reduce
inode list lock acquisition (todo: measure).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Remove the global inode_hash_lock and replace it with per-hash-bucket locks.
Todo: should use bit spinlock in hlist_head pointer to save space.
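The replacement pairs each bucket with its own lock; a sketch (the
bit-spinlock variant from the todo would fold the lock into the
hlist_head pointer instead):

struct inode_hash_bucket {
        spinlock_t lock;
        struct hlist_head head;
};
static struct inode_hash_bucket *inode_hashtable;

static void insert_inode_hash_bucket(struct inode *inode,
                                     struct inode_hash_bucket *b)
{
        /* contention is now per bucket instead of global */
        spin_lock(&b->lock);
        hlist_add_head(&inode->i_hash, &b->head);
        spin_unlock(&b->lock);
}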
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|