<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/fs/ext4/inode.c, branch v3.0.22</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.0.22</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.0.22'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2011-12-21T20:57:44+00:00</updated>
<entry>
<title>ext4: avoid potential hang in mpage_submit_io() when blocksize &lt; pagesize</title>
<updated>2011-12-21T20:57:44+00:00</updated>
<author>
<name>Yongqiang Yang</name>
<email>xiaoqiangnk@gmail.com</email>
</author>
<published>2011-12-14T02:51:55+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8fe5e8ff93473129957c6d4a7df80046186b9c7b'/>
<id>urn:sha1:8fe5e8ff93473129957c6d4a7df80046186b9c7b</id>
<content type='text'>
commit 13a79a4741d37fda2fbafb953f0f301dc007928f upstream.

If there is an unwritten but clean buffer in a page and there is a
dirty buffer after the buffer, then mpage_submit_io does not write the
dirty buffer out.  As a result, da_writepages loops forever.

This patch fixes the problem by checking dirty flag.

Signed-off-by: Yongqiang Yang &lt;xiaoqiangnk@gmail.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: avoid hangs in ext4_da_should_update_i_disksize()</title>
<updated>2011-12-21T20:57:44+00:00</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2011-12-14T02:41:15+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=dda54df863232231ce851045834f475f02a7e565'/>
<id>urn:sha1:dda54df863232231ce851045834f475f02a7e565</id>
<content type='text'>
commit ea51d132dbf9b00063169c1159bee253d9649224 upstream.

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to -&gt;end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb&gt; bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=&lt;value optimized out&gt;, iov=&lt;value optimized out&gt;, nr_segs=&lt;value o\
 ptimized out&gt;, pos=0x108000, ppos=0xffff88001e26be40, count=&lt;value optimized out&gt;, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=&lt;value optimized out&gt;, iov=&lt;value optimized out&gt;, nr_segs=&lt;value optimized out&gt;, p\
 os=0x108000, ppos=0xffff88001e26be40, count=&lt;value optimized out&gt;, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=&lt;value optimi\
 zed out&gt;, pos=&lt;value optimized out&gt;) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=&lt;value optimized out&gt;, len=&lt;value optimized out&gt;, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 &lt;Address 0x1ec2960 out of bounds&gt;, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=&lt;value optimized out&gt;, buf=0x1ec2960 &lt;Address 0x1ec2960 out of bounds&gt;, count=0x\
 4000) at fs/read_write.c:487
 #10 &lt;signal handler called&gt;
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb&gt; print offset
 $22 = 0xffffffffffffffff
 gdb&gt; print idx
 $23 = 0xffffffff
 gdb&gt; print inode-&gt;i_blkbits
 $24 = 0xc
 gdb&gt; up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb&gt; print start
 $25 = 0x0
 gdb&gt; print end
 $26 = 0xffffffffffffffff
 gdb&gt; print pos
 $27 = 0x108000
 gdb&gt; print new_i_size
 $28 = 0x108000
 gdb&gt; print ((struct ext4_inode_info *)((char *)inode-((int)(&amp;((struct ext4_inode_info *)0)-&gt;vfs_inode))))-&gt;i_disksize
 $29 = 0xd9000
 gdb&gt; down
 2467            for (i = 0; i &lt; idx; i++)
 gdb&gt; print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining</title>
<updated>2011-11-11T17:37:16+00:00</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-08-31T15:50:51+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ef52f3936f9f5d770ea177e5c769e68af1701a90'/>
<id>urn:sha1:ef52f3936f9f5d770ea177e5c769e68af1701a90</id>
<content type='text'>
commit 8c0bec2151a47906bf779c6715a10ce04453ab77 upstream.

The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
in ext4_evict_inode() causes lockdep complaining about potential
deadlock in several places.  In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode().  Instead, we take a different approach to resolve
the software lockup that commit 2581fdc810 intends to fix.  Rather
than having ext4-dio-unwritten thread wait for grabing the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevents the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero.  However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue.  As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock.  Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>writeback: introduce .tagged_writepages for the WB_SYNC_NONE sync stage</title>
<updated>2011-10-03T18:40:43+00:00</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2010-06-06T16:38:15+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ac693061b11c33d5a5c5ec1925de7abd3fcb0971'/>
<id>urn:sha1:ac693061b11c33d5a5c5ec1925de7abd3fcb0971</id>
<content type='text'>
commit 6e6938b6d3130305a5960c86b1a9b21e58cf6144 upstream.

sync(2) is performed in two stages: the WB_SYNC_NONE sync and the
WB_SYNC_ALL sync. Identify the first stage with .tagged_writepages and
do livelock prevention for it, too.

Jan's commit f446daaea9 ("mm: implement writeback livelock avoidance
using page tagging") is a partial fix in that it only fixed the
WB_SYNC_ALL phase livelock.

Although ext4 is tested to no longer livelock with commit f446daaea9,
it may due to some "redirty_tail() after pages_skipped" effect which
is by no means a guarantee for _all_ the file systems.

Note that writeback_inodes_sb() is called by not only sync(), they are
treated the same because the other callers also need livelock prevention.

Impact:  It changes the order in which pages/inodes are synced to disk.
Now in the WB_SYNC_NONE stage, it won't proceed to write the next inode
until finished with the current inode.

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
CC: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: fix nomblk_io_submit option so it correctly converts uninit blocks</title>
<updated>2011-08-29T20:29:11+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2011-08-13T16:58:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=45df4b8977852ea12d6ed19f6c87e6765f6c31e5'/>
<id>urn:sha1:45df4b8977852ea12d6ed19f6c87e6765f6c31e5</id>
<content type='text'>
commit 9dd75f1f1a02d656a11a7b9b9e6c2759b9c1e946 upstream.

Bug discovered by Jan Kara:

Finally, commit 1449032be17abb69116dbc393f67ceb8bd034f92 returned back
the old IO submission code but apparently it forgot to return the old
handling of uninitialized buffers so we unconditionnaly call
block_write_full_page() without specifying end_io function. So AFAICS
we never convert unwritten extents to written in some cases. For
example when I mount the fs as: mount -t ext4 -o
nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do
        int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
        char buf[1024];
        memset(buf, 'a', sizeof(buf));
        fallocate(fd, 0, 0, 16384);
        write(fd, buf, sizeof(buf));

I get a file full of zeros (after remounting the filesystem so that
pagecache is dropped) instead of seeing the first KB contain 'a's.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.</title>
<updated>2011-08-29T20:29:11+00:00</updated>
<author>
<name>Tao Ma</name>
<email>boyu.mt@taobao.com</email>
</author>
<published>2011-08-13T16:30:59+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4eddd2a50f2548c6c83081fde8fdbc3de07626f6'/>
<id>urn:sha1:4eddd2a50f2548c6c83081fde8fdbc3de07626f6</id>
<content type='text'>
commit 32c80b32c053dc52712dedac5e4d0aa7c93fc353 upstream.

EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We don't increase i_aiodio_unwritten when setting
EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process
to wait forever.

Part of the patch came from Eric in his e-mail, but it doesn't fix the
problem met by Michael actually.

http://marc.info/?l=linux-ext4&amp;m=131316851417460&amp;w=2

Reported-and-Tested-by: Michael Tokarev&lt;mjt@tls.msk.ru&gt;
Signed-off-by: Eric Sandeen &lt;sandeen@redhat.com&gt;
Signed-off-by: Tao Ma &lt;boyu.mt@taobao.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode</title>
<updated>2011-08-29T20:29:10+00:00</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-08-13T16:17:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2526f368949bccda6e8ed1bf74a4e955e3af42af'/>
<id>urn:sha1:2526f368949bccda6e8ed1bf74a4e955e3af42af</id>
<content type='text'>
commit 2581fdc810889fdea97689cb62481201d579c796 upstream.

Flush inode's i_completed_io_list before calling ext4_io_wait to
prevent the following deadlock scenario: A page fault happens while
some process is writing inode A. During page fault,
shrink_icache_memory is called that in turn evicts another inode
B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Also moves ext4_flush_completed_IO and ext4_ioend_wait from
ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion,
ext4_evict_inode() is called before ext4_destroy_inode() and in
ext4_evict_inode(), we may call ext4_truncate() without holding
i_mutex lock. As a result, there is a race between flush_completed_IO
that is called from ext4_ext_truncate() and ext4_end_io_work, which
may cause corruption on an io_end structure. This change moves
ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
to ext4_evict_inode() to resolve the race between ext4_truncate() and
ext4_end_io_work during inode deletion.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: Fix ext4_should_writeback_data() for no-journal mode</title>
<updated>2011-08-29T20:29:10+00:00</updated>
<author>
<name>Curt Wohlgemuth</name>
<email>curtw@google.com</email>
</author>
<published>2011-08-13T15:25:18+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2fb522e963f57a6c0206f67a355b8131b16ef607'/>
<id>urn:sha1:2fb522e963f57a6c0206f67a355b8131b16ef607</id>
<content type='text'>
commit 441c850857148935babe000fc2ba1455fe54a6a9 upstream.

ext4_should_writeback_data() had an incorrect sequence of
tests to determine if it should return 0 or 1: in
particular, even in no-journal mode, 0 was being returned
for a non-regular-file inode.

This meant that, in non-journal mode, we would use
ext4_journalled_aops for directories, symlinks, and other
non-regular files.  However, calling journalled aop
callbacks when there is no valid handle, can cause problems.

This would cause a kernel crash with Jan Kara's commit
2d859db3e4 ("ext4: fix data corruption in inodes with
journalled data"), because we now dereference 'handle' in
ext4_journalled_write_end().

I also added BUG_ONs to check for a valid handle in the
obviously journal-only aops callbacks.

I tested this running xfstests with a scratch device in
these modes:

   - no-journal
   - data=ordered
   - data=writeback
   - data=journal

All work fine; the data=journal run has many failures and a
crash in xfstests 074, but this is no different from a
vanilla kernel.

Signed-off-by: Curt Wohlgemuth &lt;curtw@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: fixed tracepoints cleanup</title>
<updated>2011-06-06T13:51:52+00:00</updated>
<author>
<name>Lukas Czerner</name>
<email>lczerner@redhat.com</email>
</author>
<published>2011-06-06T13:51:52+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a9c667f8f0656631ee5438baaf21bf30d5f67375'/>
<id>urn:sha1:a9c667f8f0656631ee5438baaf21bf30d5f67375</id>
<content type='text'>
While creating fixed tracepoints for ext3, basically by porting them
from ext4, I found a lot of useless retyping, wrong type usage, useless
variable passing and other inconsistencies in the ext4 fixed tracepoint
code.

This patch cleans the fixed tracepoint code for ext4 and also simplify
some of them.

Signed-off-by: Lukas Czerner &lt;lczerner@redhat.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>fs: pass exact type of data dirties to -&gt;dirty_inode</title>
<updated>2011-05-27T11:04:40+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-05-27T10:53:02+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=aa38572954ade525817fe88c54faebf85e5a61c0'/>
<id>urn:sha1:aa38572954ade525817fe88c54faebf85e5a61c0</id>
<content type='text'>
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet.  I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.

Also remove incorrect comments that -&gt;dirty_inode can't block.  That
has been changed a long time ago, and many implementations rely on it.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
