<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/mm/truncate.c, branch v2.6.27.8</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v2.6.27.8</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v2.6.27.8'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2008-09-03T02:21:37+00:00</updated>
<entry>
<title>VFS: fix dio write returning EIO when try_to_release_page fails</title>
<updated>2008-09-03T02:21:37+00:00</updated>
<author>
<name>Hisashi Hifumi</name>
<email>hifumi.hisashi@oss.ntt.co.jp</email>
</author>
<published>2008-09-02T21:35:40+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6ccfa806a9cfbbf1cd43d5b6aa47ef2c0eb518fd'/>
<id>urn:sha1:6ccfa806a9cfbbf1cd43d5b6aa47ef2c0eb518fd</id>
<content type='text'>
Dio write returns EIO when try_to_release_page fails because bh is
still referenced.

The patch

    commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
    Author: Mingming Cao &lt;cmm@us.ibm.com&gt;
    Date:   Fri Jul 25 01:46:22 2008 -0700

        jbd: fix race between free buffer and commit transaction

was merged into 2.6.27-rc1, but I noticed that this patch is not enough
to fix the race.

I did fsstress test heavily to 2.6.27-rc1, and found that dio write still
sometimes got EIO through this test.

The patch above fixed race between freeing buffer(dio) and committing
transaction(jbd) but I discovered that there is another race, freeing
buffer(dio) and ext3/4_ordered_writepage.

: background_writeout()
     -&gt;write_cache_pages()
       -&gt;ext3_ordered_writepage()
     	   walk_page_buffers() -&gt; take a bh ref
 	   block_write_full_page() -&gt; unlock_page
		: &lt;- end_page_writeback
                : &lt;- race! (dio write-&gt;try_to_release_page fails)
      	   walk_page_buffers() -&gt;release a bh ref

ext3_ordered_writepage holds bh ref and does unlock_page remaining
taking a bh ref, so this causes the race and failure of
try_to_release_page.

To fix this race, I used the approach of falling back to buffered
writes if try_to_release_page() fails on a page.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Hisashi Hifumi &lt;hifumi.hisashi@oss.ntt.co.jp&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Mingming Cao &lt;cmm@us.ibm.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: rename page trylock</title>
<updated>2008-08-05T04:31:34+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-08-02T10:01:03+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=529ae9aaa08378cfe2a4350bded76f32cc8ff0ce'/>
<id>urn:sha1:529ae9aaa08378cfe2a4350bded76f32cc8ff0ce</id>
<content type='text'>
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock =&gt; trylock_page, SetPageLocked =&gt; set_page_locked).

This also facilitates lockdeping of page lock.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: dont clear PG_uptodate on truncate/invalidate</title>
<updated>2008-08-02T16:12:34+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@suse.cz</email>
</author>
<published>2008-08-01T18:28:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=84209e02de48d72289650cc5a7ae8dd18223620f'/>
<id>urn:sha1:84209e02de48d72289650cc5a7ae8dd18223620f</id>
<content type='text'>
Brian Wang reported that a FUSE filesystem exported through NFS could
return I/O errors on read.  This was traced to splice_direct_to_actor()
returning a short or zero count when racing with page invalidation.

However this is not FUSE or NFSD specific, other filesystems (notably
NFS) also call invalidate_inode_pages2() to purge stale data from the
cache.

If this happens while such pages are sitting in a pipe buffer, then
splice(2) from the pipe can return zero, and read(2) from the pipe can
return ENODATA.

The zero return is especially bad, since it implies end-of-file or
disconnected pipe/socket, and is documented as such for splice.  But
returning an error for read() is also nasty, when in fact there was no
error (data becoming stale is not an error).

The same problems can be triggered by "hole punching" with
madvise(MADV_REMOVE).

Fix this by not clearing the PG_uptodate flag on truncation and
invalidation.

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Acked-by: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: spinlock tree_lock</title>
<updated>2008-07-26T19:00:06+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-07-26T02:45:32+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=19fd6231279be3c3bdd02ed99f9b0eb195978064'/>
<id>urn:sha1:19fd6231279be3c3bdd02ed99f9b0eb195978064</id>
<content type='text'>
mapping-&gt;tree_lock has no read lockers.  convert the lock from an rwlock
to a spinlock.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Reviewed-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fix invalidate_inode_pages2_range() to not clear ret</title>
<updated>2008-04-28T15:58:18+00:00</updated>
<author>
<name>Hisashi Hifumi</name>
<email>hifumi.hisashi@oss.ntt.co.jp</email>
</author>
<published>2008-04-28T09:12:08+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0dd1334faf7e075bfdb6f5284eed65210b296fc1'/>
<id>urn:sha1:0dd1334faf7e075bfdb6f5284eed65210b296fc1</id>
<content type='text'>
DIO invalidates page cache through invalidate_inode_pages2_range().
invalidate_inode_pages2_range() sets ret=-EIO when
invalidate_complete_page2() fails, but this ret is cleared if
do_launder_page() succeed on a page of next index.

In this case, dio is carried out even if invalidate_complete_page2() fails
on some pages.

This can cause inconsistency between memory and blocks on HDD because the
page cache still exists.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hisashi Hifumi &lt;hifumi.hisashi@oss.ntt.co.jp&gt;
Cc: Badari Pulavarty &lt;pbadari@us.ibm.com&gt;
Cc: Ken Chen &lt;kenchen@google.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: "J. Bruce Fields" &lt;bfields@fieldses.org&gt;
Cc: Chuck Lever &lt;cel@citi.umich.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>docbook: fix kernel-api source files</title>
<updated>2008-03-03T18:47:14+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>randy.dunlap@oracle.com</email>
</author>
<published>2008-03-01T06:03:15+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0643245f595dc175c14245fa1e1e9efda3e12f2a'/>
<id>urn:sha1:0643245f595dc175c14245fa1e1e9efda3e12f2a</id>
<content type='text'>
Fix docbook problems in kernel-api.tmpl.
These cause the generated docbook to be incorrect.

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>page migraton: handle orphaned pages</title>
<updated>2008-02-05T17:44:19+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2008-02-05T06:29:33+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=62e1c55300f306e06478f460a7eefba085206e0b'/>
<id>urn:sha1:62e1c55300f306e06478f460a7eefba085206e0b</id>
<content type='text'>
Orphaned page might have fs-private metadata, the page is truncated.  As
the page hasn't mapping, page migration refuse to migrate the page.  It
appears the page is only freed in page reclaim and if zone watermark is
low, the page is never freed, as a result migration always fail.  I thought
we could free the metadata so such page can be freed in migration and make
migration more reliable.

[akpm@linux-foundation.org: go direct to try_to_free_buffers()]
Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Acked-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Fix dirty page accounting leak with ext3 data=journal</title>
<updated>2008-02-05T17:44:19+00:00</updated>
<author>
<name>Bjorn Steinbrink</name>
<email>B.Steinbrink@gmx.de</email>
</author>
<published>2008-02-05T06:29:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a2b345642f530054a92b8d2b5108436225a8093e'/>
<id>urn:sha1:a2b345642f530054a92b8d2b5108436225a8093e</id>
<content type='text'>
In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 ("Clean up and make
try_to_free_buffers() not race with dirty pages"), try_to_free_buffers
was changed to bail out if the page was dirty.

That in turn caused truncate_complete_page to leak massive amounts of
memory, because the dirty bit was only cleared after the call to
try_to_free_buffers.

So the call to cancel_dirty_page was moved up to have the dirty bit
cleared early in 3e67c0987d7567ad666641164a153dca9a43b11d ("truncate:
clear page dirtiness before running try_to_free_buffers()").

The problem with that fix is, that the page can be redirtied after
cancel_dirty_page was called, eg. like this:

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
    ext3_invalidatepage()
      journal_invalidatepage()
        journal_unmap_buffer()
          __dispose_buffer()
            __journal_unfile_buffer()
              __journal_temp_unlink_buffer()
                mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

And then we end up with dirty pages being wrongly accounted.

As a result, in ecdfc9787fe527491baefc22dce8b2dbd5b2908d ("Resurrect
'try_to_free_buffers()' VM hackery") the changes to try_to_free_buffers
were reverted, so the original reason for the massive memory leak is
gone, and we can also revert the move of the call to cancel_dirty_page
from truncate_complete_page and get the accounting right again.

I'm not sure if it matters, but opposed to the final check in
__remove_from_page_cache, this one also cares about the task io
accounting, so maybe we want to use this instead, although it's not
quite the clean fix either.

Signed-off-by: Björn Steinbrink &lt;B.Steinbrink@gmx.de&gt;
Tested-by: Krzysztof Piotr Oledzki &lt;ole@ans.pl&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Thomas Osterried &lt;osterried@jesse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user</title>
<updated>2008-02-05T17:44:13+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2008-02-05T06:28:29+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=eebd2aa355692afaf9906f62118620f1a1c19dbb'/>
<id>urn:sha1:eebd2aa355692afaf9906f62118620f1a1c19dbb</id>
<content type='text'>
Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

        Zeros two segments of the page. It takes the position where to
        start and end the zeroing which avoids length calculations and
	makes code clearer.

zero_user_segment(page, start, end)

        Same for a single segment.

zero_user(page, start, length)

        Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Steven French &lt;sfrench@us.ibm.com&gt;
Cc: Michael Halcrow &lt;mhalcrow@us.ibm.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: "J. Bruce Fields" &lt;bfields@fieldses.org&gt;
Cc: Anton Altaparmakov &lt;aia21@cantab.net&gt;
Cc: Mark Fasheh &lt;mark.fasheh@oracle.com&gt;
Cc: David Chinner &lt;dgc@sgi.com&gt;
Cc: Michael Halcrow &lt;mhalcrow@us.ibm.com&gt;
Cc: Steven French &lt;sfrench@us.ibm.com&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>do_invalidatepage() comment typo fix</title>
<updated>2008-02-03T16:04:10+00:00</updated>
<author>
<name>Fengguang Wu</name>
<email>wfg@mail.ustc.edu.cn</email>
</author>
<published>2008-02-03T16:04:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=28bc44d7d1d967b8251214dd7a130d523b5ba5ee'/>
<id>urn:sha1:28bc44d7d1d967b8251214dd7a130d523b5ba5ee</id>
<content type='text'>
Fix a typo in the comment for do_invalidatepage().

Signed-off-by: Fengguang Wu &lt;wfg@mail.ustc.edu.cn&gt;
Signed-off-by: Adrian Bunk &lt;bunk@kernel.org&gt;
</content>
</entry>
</feed>
