author | Trond Myklebust <Trond.Myklebust@netapp.com> | 2007-01-10 23:15:39 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@woody.osdl.org> | 2007-01-11 18:18:21 -0800 |
commit | e3db7691e9f3dff3289f64e3d98583e28afe03db (patch) | |
tree | e05542d8d8bb545545c5b535381a8c1fcb369a03 /Documentation/filesystems/Locking | |
parent | 07031e14c1127fc7e1a5b98dfcc59f434e025104 (diff) | |
[PATCH] NFS: Fix race in nfs_release_page()
invalidate_inode_pages2() may find the dirty bit has been set on a page
owing to the fact that the page may still be mapped after it was locked.
Only after the call to unmap_mapping_range() are we sure that the page
can no longer be dirtied.
In order to fix this, NFS has hooked the releasepage() method and tries
to write the page out between the call to unmap_mapping_range() and the
call to remove_mapping(). This, however, leads to deadlocks in the page
reclaim code, where the page may be locked without holding a reference
to the inode or dentry.
The fix is to add a new address_space_operation, launder_page(), which will
attempt to write out a dirty page without releasing the page lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Also, the bare SetPageDirty() can skew all sorts of accounting, leading to
other nasties.
[akpm@osdl.org: cleanup]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'Documentation/filesystems/Locking')
-rw-r--r-- | Documentation/filesystems/Locking | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 790ef6fbe495..28bfea75bcf2 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -171,6 +171,7 @@ prototypes:
 	int (*releasepage) (struct page *, int);
 	int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
 			loff_t offset, unsigned long nr_segs);
+	int (*launder_page) (struct page *);
 
 locking rules:
 	All except set_page_dirty may block
@@ -188,6 +189,7 @@ bmap:			yes
 invalidatepage:		no	yes
 releasepage:		no	yes
 direct_IO:		no
+launder_page:		no	yes
 
 	->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
 may be called from the request handler (/dev/loop).
@@ -281,6 +283,12 @@ buffers from the page in preparation for freeing it. It returns zero to
 indicate that the buffers are (or may be) freeable. If ->releasepage is zero,
 the kernel assumes that the fs has no private interest in the buffers.
 
+	->launder_page() may be called prior to releasing a page if
+it is still found to be dirty. It returns zero if the page was successfully
+cleaned, or an error value if not. Note that in order to prevent the page
+getting mapped back in and redirtied, it needs to be kept locked
+across the entire operation.
+
 	Note: currently almost all instances of address_space methods are
 using BKL for internal serialization and that's one of the worst sources
 of contention. Normally they are calling library functions (in fs/buffer.c)
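
The ordering described in the commit message (lock the page, unmap it, launder it if it is still dirty, then remove it) can be illustrated with a small user-space model. This is only a sketch, not kernel code and not the patch itself: `struct model_page`, `struct model_aops`, and every `model_*` function are hypothetical names invented for illustration, the boolean fields merely stand in for the real page state, and the `-EBUSY` result for a page that is still dirty is an assumption of the model rather than documented kernel behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page; not the kernel type. */
struct model_page {
	bool locked;	/* models the page lock */
	bool dirty;	/* models the dirty bit */
	bool mapped;	/* models "still mapped, so it can be redirtied" */
	bool in_cache;	/* models membership in the page cache */
};

/* Only the method discussed here; the real operations table has more. */
struct model_aops {
	int (*launder_page)(struct model_page *);
};

/* Hypothetical method: clean the page while the caller keeps it locked. */
static int model_launder_ok(struct model_page *page)
{
	assert(page->locked);	/* the lock is held across the whole call */
	page->dirty = false;	/* pretend the write-out succeeded */
	return 0;		/* zero: page was successfully cleaned */
}

/* Hypothetical method: the write-out fails, so the page stays dirty. */
static int model_launder_fail(struct model_page *page)
{
	assert(page->locked);
	return -EIO;		/* error value: page was not cleaned */
}

/* Models the order: lock, unmap, launder if dirty, remove. */
static int model_invalidate_page(const struct model_aops *aops,
				 struct model_page *page)
{
	int ret = 0;

	page->locked = true;
	page->mapped = false;	/* models unmap_mapping_range() */

	/* Once unmapped and locked, the page cannot be dirtied again. */
	if (page->dirty && aops->launder_page)
		ret = aops->launder_page(page);
	if (ret == 0 && page->dirty)
		ret = -EBUSY;	/* error code chosen for this model only */
	if (ret == 0)
		page->in_cache = false;	/* models remove_mapping() */

	page->locked = false;
	return ret;
}
```

In this model, calling `model_invalidate_page()` with `model_launder_ok` on a dirty page leaves it clean and out of the cache, while `model_launder_fail` propagates the error and leaves the page dirty and still cached. In both cases the lock is dropped only after the launder step has finished, which is the property the documentation hunk above asks for.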