lwn.git/fs/btrfs/transaction.h, branch v3.18.26

Btrfs: make sure logged extents complete in the current transaction V3

2015-01-08T18:30:30+00:00

commit 50d9aa99bd35c77200e0e3dd7a72274f8304701f upstream. Liu Bo pointed out that my previous fix would lose the generation update in the scenario I described. It is actually much worse than that, we could lose the entire extent if we lose power right after the transaction commits. Consider the following write extent 0-4k log extent in log tree commit transaction < power fail happens here ordered extent completes We would lose the 0-4k extent because it hasn't updated the actual fs tree, and the transaction commit will reset the log so it isn't replayed. If we lose power before the transaction commit we are save, otherwise we are not. Fix this by keeping track of all extents we logged in this transaction. Then when we go to commit the transaction make sure we wait for all of those ordered extents to complete before proceeding. This will make sure that if we lose power after the transaction commit we still have our data. This also fixes the problem of the improperly updated extent generation. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman

btrfs: hide typecast to definition of BTRFS_SEND_TRANS_STUB

2014-10-02T15:30:31+00:00

Signed-off-by: David Sterba

btrfs: disable strict file flushes for renames and truncates

2014-08-15T14:43:42+00:00

Truncates and renames are often used to replace old versions of a file with new versions. Applications often expect this to be an atomic replacement, even if they haven't done anything to make sure the new version is fully on disk. Btrfs has strict flushing in place to make sure that renaming over an old file with a new file will fully flush out the new file before allowing the transaction commit with the rename to complete. This ordering means the commit code needs to be able to lock file pages, and there are a few paths in the filesystem where we will try to end a transaction with the page lock held. It's rare, but these things can deadlock. This patch removes the ordered flushes and switches to a best effort filemap_flush like ext4 uses. It's not perfect, but it should fix the deadlocks. Signed-off-by: Chris Mason

Btrfs: add sanity tests for new qgroup accounting code

2014-06-10T00:20:49+00:00

This exercises the various parts of the new qgroup accounting code. We do some basic stuff and do some things with the shared refs to make sure all that code works. I had to add a bunch of infrastructure because I needed to be able to insert items into a fake tree without having to do all the hard work myself, hopefully this will be usefull in the future. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Chris Mason

Btrfs: remove transaction from send

2014-04-07T00:39:30+00:00

Lets try this again. We can deadlock the box if we send on a box and try to write onto the same fs with the app that is trying to listen to the send pipe. This is because the writer could get stuck waiting for a transaction commit which is being blocked by the send. So fix this by making sure looking at the commit roots is always going to be consistent. We do this by keeping track of which roots need to have their commit roots swapped during commit, and then taking the commit_root_sem and swapping them all at once. Then make sure we take a read lock on the commit_root_sem in cases where we search the commit root to make sure we're always looking at a consistent view of the commit roots. Previously we had problems with this because we would swap a fs tree commit root and then swap the extent tree commit root independently which would cause the backref walking code to screw up sometimes. With this patch we no longer deadlock and pass all the weird send/receive corner cases. Thanks, Reportedy-by: Hugo Mills Signed-off-by: Josef Bacik Signed-off-by: Chris Mason

Btrfs: don't clear uptodate if the eb is under IO

2014-04-07T00:34:37+00:00

So I have an awful exercise script that will run snapshot, balance and send/receive in parallel. This sometimes would crash spectacularly and when it came back up the fs would be completely hosed. Turns out this is because of a bad interaction of balance and send/receive. Send will hold onto its entire path for the whole send, but its blocks could get relocated out from underneath it, and because it doesn't old tree locks theres nothing to keep this from happening. So it will go to read in a slot with an old transid, and we could have re-allocated this block for something else and it could have a completely different transid. But because we think it is invalid we clear uptodate and re-read in the block. If we do this before we actually write out the new block we could write back stale data to the fs, and boom we're screwed. Now we definitely need to fix this disconnect between send and balance, but we really really need to not allow ourselves to accidently read in stale data over new data. So make sure we check if the extent buffer is not under io before clearing uptodate, this will kick back EIO to the caller instead of reading in stale data and keep us from corrupting the fs. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Chris Mason

Btrfs: make fsync latency less sucky

2014-01-28T21:20:25+00:00

Looking into some performance related issues with large amounts of metadata revealed that we can have some pretty huge swings in fsync() performance. If we have a lot of delayed refs backed up (as you will tend to do with lots of metadata) fsync() will wander off and try to run some of those delayed refs which can result in reading from disk and such. Since the actual act of fsync() doesn't create any delayed refs there is no need to make it throttle on delayed ref stuff, that will be handled by other people. With this patch we get much smoother fsync performance with large amounts of metadata. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Chris Mason

Btrfs: remove btrfs_end_transaction_dmeta()

2014-01-28T21:20:08+00:00

Two reasons: - btrfs_end_transaction_dmeta() is the same as btrfs_end_transaction_throttle() so it is unnecessary. - All the delayed items should be dealt in the current transaction, so the workers should not commit the transaction, instead, deal with the delayed items as many as possible. So we can remove btrfs_end_transaction_dmeta() Signed-off-by: Miao Xie Signed-off-by: Chris Mason

Btrfs: fix BUG_ON() casued by the reserved space migration

2013-11-12T02:54:28+00:00

When we did space balance and snapshot creation at the same time, we might meet the following oops: kernel BUG at fs/btrfs/inode.c:3038! [SNIP] Call Trace: [] btrfs_orphan_cleanup+0x293/0x407 [btrfs] [] btrfs_mksubvol.isra.28+0x259/0x373 [btrfs] [] btrfs_ioctl_snap_create_transid+0x126/0x156 [btrfs] [] btrfs_ioctl_snap_create_v2+0xd0/0x121 [btrfs] [] btrfs_ioctl+0x414/0x1854 [btrfs] [] ? __do_page_fault+0x305/0x379 [] vfs_ioctl+0x1d/0x39 [] do_vfs_ioctl+0x32d/0x3e2 [] ? finish_task_switch+0x80/0xb8 [] SyS_ioctl+0x57/0x83 [] ? do_device_not_available+0x12/0x14 [] system_call_fastpath+0x16/0x1b [SNIP] RIP [] btrfs_orphan_add+0xc3/0x126 [btrfs] The reason of the problem is that the relocation root creation stole the reserved space, which was reserved for orphan item deletion. There are several ways to fix this problem, one is to increasing the reserved space size of the space balace, and then we can use that space to create the relocation tree for each fs/file trees. But it is hard to calculate the suitable size because we doesn't know how many fs/file trees we need relocate. We fixed this problem by reserving the space for relocation root creation actively since the space it need is very small (one tree block, used for root node copy), then we use that reserved space to create the relocation tree. If we don't reserve space for relocation tree creation, we will use the reserved space of the balance. Signed-off-by: Miao Xie Signed-off-by: Josef Bacik Signed-off-by: Chris Mason

Btrfs: fix two use-after-free bugs with transaction cleanup

2013-11-12T02:54:03+00:00

I was noticing the slab redzone stuff going off every once and a while during transaction aborts. This was caused by two things 1) We would walk the pending snapshots and set their error to -ECANCELED. We don't need to do this, the snapshot stuff waits for a transaction commit and if there is a problem we just free our pending snapshot object and exit. Doing this was causing us to touch the pending snapshot object after the thing had already been freed. 2) We were freeing the transaction manually with wanton disregard for it's use_count reference counter. To fix this I cleaned up the transaction freeing loop to either wait for the transaction commit to finish if it was in the middle of that (since it will be cleaned and freed up there) or to do the cleanup oursevles. I also moved the global "kill all things dirty everywhere" stuff outside of the transaction cleanup loop since that only needs to be done once. With this patch I'm no longer seeing slab corruption because of use after frees. Thanks, Signed-off-by: Josef Bacik Signed-off-by: Chris Mason