xfs: add missing ilock around dio write last extent alignment

The iomap codepath (via get_blocks()) acquires and releases the inode lock in the case of a direct write that requires block allocation. This is because xfs_iomap_write_direct() allocates a transaction, which means the ilock must be dropped and reacquired after the transaction is allocated and reserved.

xfs_iomap_write_direct() invokes xfs_iomap_eof_align_last_fsb() before the transaction is created and thus before the ilock is reacquired. This can lead to calls to xfs_iread_extents() and reads of the in-core extent list without any synchronization (via xfs_bmap_eof() and xfs_bmap_last_extent()). xfs_iread_extents() assert fails if the ilock is not held, but this is not currently seen in practice as the current callers had already invoked xfs_bmapi_read().

What has been seen in practice are reports of crashes down in the xfs_bmap_eof() codepath on direct writes due to seemingly bogus pointer references from xfs_iext_get_ext(). While an explicit reproducer is not currently available to confirm the cause of the problem, crash analysis and code inspection from David Jeffery identified the insufficient locking.

xfs_iomap_eof_align_last_fsb() is called from other contexts with the inode lock already held, so we cannot acquire it therein. __xfs_get_blocks() acquires and drops the ilock with variable flags to cover the event that the extent list must be read in. The common case is that __xfs_get_blocks() acquires the shared ilock. To provide locking around the last extent alignment call without adding more lock cycles to the dio path, update xfs_iomap_write_direct() to expect the shared ilock held on entry and do the extent alignment under its protection. Demote the lock, if necessary, from __xfs_get_blocks() and push the xfs_qm_dqattach() call outside of the shared lock critical section. Also, add an assert to document that the extent list is always expected to be present in this path. Otherwise, we risk a call to xfs_iread_extents() while under the shared ilock. This is safe as all current callers have executed an xfs_bmapi_read() call under the current iolock context.

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
author: Brian Foster <bfoster@redhat.com> 2015-10-12 15:34:20 +1100
committer: Dave Chinner <david@fromorbit.com> 2015-10-12 15:34:20 +1100
commit: 009c6e871e98aa23bc2e58474c3d9feb05dd1ae6 (patch)
tree: 5af6b1c62320e3643bfbc895f9ee9fbe93103ba4 /fs/xfs/xfs_aops.c
parent: 5cb13dcd0fac071b45c4bebe1801a08ff0d89cad (diff)
1 files changed, 5 insertions, 5 deletions
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e485e31813fa..e4fff5898c1c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1408,12 +1408,12 @@ __xfs_get_blocks(
 	      imap.br_startblock == DELAYSTARTBLOCK))) {
 		if (direct || xfs_get_extsz_hint(ip)) {
 			/*
-			 * Drop the ilock in preparation for starting the block
-			 * allocation transaction.  It will be retaken
-			 * exclusively inside xfs_iomap_write_direct for the
-			 * actual allocation.
+			 * xfs_iomap_write_direct() expects the shared lock. It
+			 * is unlocked on return.
 			 */
-			xfs_iunlock(ip, lockmode);
+			if (lockmode == XFS_ILOCK_EXCL)
+				xfs_ilock_demote(ip, lockmode);
+
 			error = xfs_iomap_write_direct(ip, offset, size,
 						       &imap, nimaps);
 			if (error)
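
The lock handoff described in the commit message can be modelled in userspace. The following is a minimal sketch, not the kernel code: every `toy_*` name is hypothetical, the "lock" only tracks its mode and never blocks, and the quota attach and transaction steps are reduced to comments. It only illustrates the ordering the patch aims for: the caller enters with the ilock held, demotes it to shared if it was exclusive, the EOF alignment runs under the shared lock, and the lock is dropped before the transaction is set up and retaken exclusively for the allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the XFS ilock: records the mode only, never blocks. */
enum toy_mode { TOY_UNLOCKED, TOY_SHARED, TOY_EXCL };

struct toy_inode {
	enum toy_mode ilock;     /* current lock mode */
	int lock_cycles;         /* acquisitions from the unlocked state */
	bool aligned_under_lock; /* was EOF alignment done with the lock held? */
};

static void toy_ilock(struct toy_inode *ip, enum toy_mode mode)
{
	assert(ip->ilock == TOY_UNLOCKED && mode != TOY_UNLOCKED);
	ip->ilock = mode;
	ip->lock_cycles++;
}

static void toy_iunlock(struct toy_inode *ip)
{
	assert(ip->ilock != TOY_UNLOCKED);
	ip->ilock = TOY_UNLOCKED;
}

/* Exclusive -> shared without ever releasing the lock. */
static void toy_ilock_demote(struct toy_inode *ip)
{
	assert(ip->ilock == TOY_EXCL);
	ip->ilock = TOY_SHARED;
}

/* Models the last-extent alignment, which reads the extent list. */
static void toy_eof_align(struct toy_inode *ip)
{
	ip->aligned_under_lock = (ip->ilock != TOY_UNLOCKED);
}

/* Patched behaviour: shared lock held on entry, unlocked on return. */
static void toy_write_direct(struct toy_inode *ip)
{
	assert(ip->ilock == TOY_SHARED);
	toy_eof_align(ip);        /* now protected by the shared lock */
	toy_iunlock(ip);
	/* quota attach and transaction reservation would run here, unlocked */
	toy_ilock(ip, TOY_EXCL);  /* retaken for the actual allocation */
	toy_iunlock(ip);
}

/* Caller: takes the lock in either mode, demotes if needed, hands off. */
static void toy_get_blocks(struct toy_inode *ip, enum toy_mode lockmode)
{
	toy_ilock(ip, lockmode);
	if (lockmode == TOY_EXCL)
		toy_ilock_demote(ip);
	toy_write_direct(ip);
}
```

Calling `toy_get_blocks()` on a zero-initialised `struct toy_inode` with either `TOY_SHARED` or `TOY_EXCL` leaves the inode unlocked, with the alignment recorded as having run under the lock and the same two acquisitions in both cases, which mirrors the "no extra lock cycles" goal stated above.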