<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/fs/btrfs/file.c, branch v3.15.9</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.15.9</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.15.9'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2014-07-01T03:14:02+00:00</updated>
<entry>
<title>Btrfs: read inode size after acquiring the mutex when punching a hole</title>
<updated>2014-07-01T03:14:02+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@gmail.com</email>
</author>
<published>2014-04-26T00:35:31+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2501c4a066e633524791e8ce8dbfe615aca071cf'/>
<id>urn:sha1:2501c4a066e633524791e8ce8dbfe615aca071cf</id>
<content type='text'>
commit a1a50f60a6bf4f861eb94793420274bc1ccd409a upstream.

In a previous change, commit 12870f1c9b2de7d475d22e73fd7db1b418599725,
I accidentally moved the roundup of inode-&gt;i_size to outside of the
critical section delimited by the inode mutex, which is not atomic and
not correct since the size can be changed by other task before we acquire
the mutex. Therefore fix it.

Signed-off-by: Filipe David Borba Manana &lt;fdmanana@gmail.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Btrfs: fix leaf corruption caused by ENOSPC while hole punching</title>
<updated>2014-07-01T03:14:02+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@gmail.com</email>
</author>
<published>2014-04-29T12:18:40+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=938bdab6783797ff1f70cbb9cefa406889ef5c52'/>
<id>urn:sha1:938bdab6783797ff1f70cbb9cefa406889ef5c52</id>
<content type='text'>
commit fc19c5e73645f95d3eca12b4e91e7b56faf1e4a4 upstream.

While running a stress test with multiple threads writing to the same btrfs
file system, I ended up with a situation where a leaf was corrupted in that
it had 2 file extent item keys that had the same exact key. I was able to
detect this quickly thanks to the following patch which triggers an assertion
as soon as a leaf is marked dirty if there are duplicated keys or out of order
keys:

    Btrfs: check if items are ordered when a leaf is marked dirty
    (https://patchwork.kernel.org/patch/3955431/)

Basically while running the test, I got the following in dmesg:

    [28877.415877] WARNING: CPU: 2 PID: 10706 at fs/btrfs/file.c:553 btrfs_drop_extent_cache+0x435/0x440 [btrfs]()
    (...)
    [28877.415917] Call Trace:
    [28877.415922]  [&lt;ffffffff816f1189&gt;] dump_stack+0x4e/0x68
    [28877.415926]  [&lt;ffffffff8104a32c&gt;] warn_slowpath_common+0x8c/0xc0
    [28877.415929]  [&lt;ffffffff8104a37a&gt;] warn_slowpath_null+0x1a/0x20
    [28877.415944]  [&lt;ffffffffa03775a5&gt;] btrfs_drop_extent_cache+0x435/0x440 [btrfs]
    [28877.415949]  [&lt;ffffffff8118e7be&gt;] ? kmem_cache_alloc+0xfe/0x1c0
    [28877.415962]  [&lt;ffffffffa03777d9&gt;] fill_holes+0x229/0x3e0 [btrfs]
    [28877.415972]  [&lt;ffffffffa0345865&gt;] ? block_rsv_add_bytes+0x55/0x80 [btrfs]
    [28877.415984]  [&lt;ffffffffa03792cb&gt;] btrfs_fallocate+0xb6b/0xc20 [btrfs]
    (...)
    [29854.132560] BTRFS critical (device sdc): corrupt leaf, bad key order: block=955232256,root=1, slot=24
    [29854.132565] BTRFS info (device sdc): leaf 955232256 total ptrs 40 free space 778
    (...)
    [29854.132637] 	item 23 key (3486 108 667648) itemoff 2694 itemsize 53
    [29854.132638] 		extent data disk bytenr 14574411776 nr 286720
    [29854.132639] 		extent data offset 0 nr 286720 ram 286720
    [29854.132640] 	item 24 key (3486 108 954368) itemoff 2641 itemsize 53
    [29854.132641] 		extent data disk bytenr 0 nr 0
    [29854.132643] 		extent data offset 0 nr 0 ram 0
    [29854.132644] 	item 25 key (3486 108 954368) itemoff 2588 itemsize 53
    [29854.132645] 		extent data disk bytenr 8699670528 nr 77824
    [29854.132646] 		extent data offset 0 nr 77824 ram 77824
    [29854.132647] 	item 26 key (3486 108 1146880) itemoff 2535 itemsize 53
    [29854.132648] 		extent data disk bytenr 8699670528 nr 77824
    [29854.132649] 		extent data offset 0 nr 77824 ram 77824
    (...)
    [29854.132707] kernel BUG at fs/btrfs/ctree.h:3901!
    (...)
    [29854.132771] Call Trace:
    [29854.132779]  [&lt;ffffffffa0342b5c&gt;] setup_items_for_insert+0x2dc/0x400 [btrfs]
    [29854.132791]  [&lt;ffffffffa0378537&gt;] __btrfs_drop_extents+0xba7/0xdd0 [btrfs]
    [29854.132794]  [&lt;ffffffff8109c0d6&gt;] ? trace_hardirqs_on_caller+0x16/0x1d0
    [29854.132797]  [&lt;ffffffff8109c29d&gt;] ? trace_hardirqs_on+0xd/0x10
    [29854.132800]  [&lt;ffffffff8118e7be&gt;] ? kmem_cache_alloc+0xfe/0x1c0
    [29854.132810]  [&lt;ffffffffa036783b&gt;] insert_reserved_file_extent.constprop.66+0xab/0x310 [btrfs]
    [29854.132820]  [&lt;ffffffffa036a6c6&gt;] __btrfs_prealloc_file_range+0x116/0x340 [btrfs]
    [29854.132830]  [&lt;ffffffffa0374d53&gt;] btrfs_prealloc_file_range+0x23/0x30 [btrfs]
    (...)

So this is caused by getting an -ENOSPC error while punching a file hole, more
specifically, we get -ENOSPC error from __btrfs_drop_extents in the while loop
of file.c:btrfs_punch_hole() when it's unable to modify the btree to delete one
or more file extent items due to lack of enough free space. When this happens,
in btrfs_punch_hole(), we attempt to reclaim free space by switching our transaction
block reservation object to root-&gt;fs_info-&gt;trans_block_rsv, end our transaction and
start a new transaction basically - and, we keep increasing our current offset
(cur_offset) as long as it's smaller than the end of the target range (lockend) -
this makes use leave the loop with cur_offset == drop_end which in turn makes us
call fill_holes() for inserting a file extent item that represents a 0 bytes range
hole (and this insertion succeeds, as in the meanwhile more space became available).

This 0 bytes file hole extent item is a problem because any subsequent caller of
__btrfs_drop_extents (regular file writes, or fallocate calls for e.g.), with a
start file offset that is equal to the offset of the hole, will not remove this
extent item due to the following conditional in the while loop of
__btrfs_drop_extents:

    if (extent_end &lt;= search_start) {
            path-&gt;slots[0]++;
            goto next_slot;
    }

This later makes the call to setup_items_for_insert() (at the very end of
__btrfs_drop_extents), insert a new file extent item with the same offset as
the 0 bytes file hole extent item that follows it. Needless is to say that this
causes chaos, either when reading the leaf from disk (btree_readpage_end_io_hook),
where we perform leaf sanity checks or in subsequent operations that manipulate
file extent items, as in the fallocate call as shown by the dmesg trace above.

Without my other patch to perform the leaf sanity checks once a leaf is marked
as dirty (if the integrity checker is enabled), it would have been much harder
to debug this issue.

This change might fix a few similar issues reported by users in the mailing
list regarding assertion failures in btrfs_set_item_key_safe calls performed
by __btrfs_drop_extents, such as the following report:

    http://comments.gmane.org/gmane.comp.file-systems.btrfs/32938

Asking fill_holes() to create a 0 bytes wide file hole item also produced the
first warning in the trace above, as we passed a range to btrfs_drop_extent_cache
that has an end smaller (by -1) than its start.

On 3.14 kernels this issue manifests itself through leaf corruption, as we get
duplicated file extent item keys in a leaf when calling setup_items_for_insert(),
but on older kernels, setup_items_for_insert() isn't called by __btrfs_drop_extents(),
instead we have callers of __btrfs_drop_extents(), namely the functions
inode.c:insert_inline_extent() and inode.c:insert_reserved_file_extent(), calling
btrfs_insert_empty_item() to insert the new file extent item, which would fail with
error -EEXIST, instead of inserting a duplicated key - which is still a serious
issue as it would make all similar file extent item replace operations keep
failing if they target the same file range.

Signed-off-by: Filipe David Borba Manana &lt;fdmanana@gmail.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2014-04-27T20:26:28+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-27T20:26:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=33c0022f0e687b0161a9bb84a5671df932551e3a'/>
<id>urn:sha1:33c0022f0e687b0161a9bb84a5671df932551e3a</id>
<content type='text'>
Pull btrfs fixes from Chris Mason.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: limit the path size in send to PATH_MAX
  Btrfs: correctly set profile flags on seqlock retry
  Btrfs: use correct key when repeating search for extent item
  Btrfs: fix inode caching vs tree log
  Btrfs: fix possible memory leaks in open_ctree()
  Btrfs: avoid triggering bug_on() when we fail to start inode caching task
  Btrfs: move btrfs_{set,clear}_and_info() to ctree.h
  btrfs: replace error code from btrfs_drop_extents
  btrfs: Change the hole range to a more accurate value.
  btrfs: fix use-after-free in mount_subvol()
</content>
</entry>
<entry>
<title>btrfs: replace error code from btrfs_drop_extents</title>
<updated>2014-04-24T23:43:32+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.cz</email>
</author>
<published>2014-04-15T16:50:17+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3f9e3df8da3c51649c15db249978a10f7374236a'/>
<id>urn:sha1:3f9e3df8da3c51649c15db249978a10f7374236a</id>
<content type='text'>
There's a case which clone does not handle and used to BUG_ON instead,
(testcase xfstests/btrfs/035), now returns EINVAL. This error code is
confusing to the ioctl caller, as it normally signifies errorneous
arguments.

Change it to ENOPNOTSUPP which allows a fall back to copy instead of
clone. This does not affect the common reflink operation.

Signed-off-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
<entry>
<title>btrfs: Change the hole range to a more accurate value.</title>
<updated>2014-04-24T23:43:32+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-04-15T02:41:00+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c5f7d0bb29df2e1848a236e58e201daf5b4e0f21'/>
<id>urn:sha1:c5f7d0bb29df2e1848a236e58e201daf5b4e0f21</id>
<content type='text'>
Commit 3ac0d7b96a268a98bd474cab8bce3a9f125aaccf fixed the btrfs expanding
write problem but the hole punched is sometimes too large for some
iovec, which has unmapped data ranges.
This patch will change to hole range to a more accurate value using the
counts checked by the write check routines.

Reported-by: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2014-04-12T21:49:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-12T21:49:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5166701b368caea89d57b14bf41cf39e819dad51'/>
<id>urn:sha1:5166701b368caea89d57b14bf41cf39e819dad51</id>
<content type='text'>
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencie between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe-&gt;buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2014-04-11T21:16:53+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-11T21:16:53+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3123bca71993c2346a458875488863772c1d5dc4'/>
<id>urn:sha1:3123bca71993c2346a458875488863772c1d5dc4</id>
<content type='text'>
Pull second set of btrfs updates from Chris Mason:
 "The most important changes here are from Josef, fixing a btrfs
  regression in 3.14 that can cause corruptions in the extent allocation
  tree when snapshots are in use.

  Josef also fixed some deadlocks in send/recv and other assorted races
  when balance is running"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (23 commits)
  Btrfs: fix compile warnings on on avr32 platform
  btrfs: allow mounting btrfs subvolumes with different ro/rw options
  btrfs: export global block reserve size as space_info
  btrfs: fix crash in remount(thread_pool=) case
  Btrfs: abort the transaction when we don't find our extent ref
  Btrfs: fix EINVAL checks in btrfs_clone
  Btrfs: fix unlock in __start_delalloc_inodes()
  Btrfs: scrub raid56 stripes in the right way
  Btrfs: don't compress for a small write
  Btrfs: more efficient io tree navigation on wait_extent_bit
  Btrfs: send, build path string only once in send_hole
  btrfs: filter invalid arg for btrfs resize
  Btrfs: send, fix data corruption due to incorrect hole detection
  Btrfs: kmalloc() doesn't return an ERR_PTR
  Btrfs: fix snapshot vs nocow writting
  btrfs: Change the expanding write sequence to fix snapshot related bug.
  btrfs: make device scan less noisy
  btrfs: fix lockdep warning with reclaim lock inversion
  Btrfs: hold the commit_root_sem when getting the commit root during send
  Btrfs: remove transaction from send
  ...
</content>
</entry>
<entry>
<title>mm: implement -&gt;map_pages for page cache</title>
<updated>2014-04-07T23:35:53+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2014-04-07T22:37:19+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f1820361f83d556a7f0a9f629100f3825e594328'/>
<id>urn:sha1:f1820361f83d556a7f0a9f629100f3825e594328</id>
<content type='text'>
filemap_map_pages() is generic implementation of -&gt;map_pages() for
filesystems who uses page cache.

It should be safe to use filemap_map_pages() for -&gt;map_pages() if
filesystem use filemap_fault() for -&gt;fault().

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Ning Qu &lt;quning@gmail.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>btrfs: Change the expanding write sequence to fix snapshot related bug.</title>
<updated>2014-04-07T16:08:42+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-03-27T02:51:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3ac0d7b96a268a98bd474cab8bce3a9f125aaccf'/>
<id>urn:sha1:3ac0d7b96a268a98bd474cab8bce3a9f125aaccf</id>
<content type='text'>
When testing fsstress with snapshot making background, some snapshot
following problem.

Snapshot 270:
inode 323: size 0

Snapshot 271:
inode 323: size 349145
|-------Hole---|---------Empty gap-------|-------Hole-----|
0	    122880			172032	      349145

Snapshot 272:
inode 323: size 349145
|-------Hole---|------------Data---------|-------Hole-----|
0	    122880			172032	      349145

The fsstress operation on inode 323 is the following:
write: 		offset 	126832 	len 43124
truncate: 	size 	349145

Since the write with offset is consist of 2 operations:
1. punch hole
2. write data
Hole punching is faster than data write, so hole punching in write
and truncate is done first and then buffered write, so the snapshot 271 got
empty gap, which will not pass btrfsck.

To fix the bug, this patch will change the write sequence which will
first punch a hole covering the write end if a hole is needed.

Reported-by: Gui Hecheng &lt;guihc.fnst@cn.fujitsu.com&gt;
Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2014-04-04T22:31:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-04T22:31:36+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=53c566625fb872e7826a237f0f5c21458028e94a'/>
<id>urn:sha1:53c566625fb872e7826a237f0f5c21458028e94a</id>
<content type='text'>
Pull btrfs changes from Chris Mason:
 "This is a pretty long stream of bug fixes and performance fixes.

  Qu Wenruo has replaced the btrfs async threads with regular kernel
  workqueues.  We'll keep an eye out for performance differences, but
  it's nice to be using more generic code for this.

  We still have some corruption fixes and other patches coming in for
  the merge window, but this batch is tested and ready to go"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (108 commits)
  Btrfs: fix a crash of clone with inline extents's split
  btrfs: fix uninit variable warning
  Btrfs: take into account total references when doing backref lookup
  Btrfs: part 2, fix incremental send's decision to delay a dir move/rename
  Btrfs: fix incremental send's decision to delay a dir move/rename
  Btrfs: remove unnecessary inode generation lookup in send
  Btrfs: fix race when updating existing ref head
  btrfs: Add trace for btrfs_workqueue alloc/destroy
  Btrfs: less fs tree lock contention when using autodefrag
  Btrfs: return EPERM when deleting a default subvolume
  Btrfs: add missing kfree in btrfs_destroy_workqueue
  Btrfs: cache extent states in defrag code path
  Btrfs: fix deadlock with nested trans handles
  Btrfs: fix possible empty list access when flushing the delalloc inodes
  Btrfs: split the global ordered extents mutex
  Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock
  Btrfs: reclaim delalloc metadata more aggressively
  Btrfs: remove unnecessary lock in may_commit_transaction()
  Btrfs: remove the unnecessary flush when preparing the pages
  Btrfs: just do dirty page flush for the inode with compression before direct IO
  ...
</content>
</entry>
</feed>
