summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorHuang Ying <ying.huang@intel.com>2017-11-02 15:59:50 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2017-11-03 07:39:19 -0700
commit2628bd6fc052bd85e9864dae4de494d8a6313391 (patch)
tree109b74ba01ac402232f89d5650bacf1f761dc5a0 /include
parentdd8a67f9a37c74b61e5e050924ceec9ffb4f8c3c (diff)
downloadlwn-2628bd6fc052bd85e9864dae4de494d8a6313391.tar.gz
lwn-2628bd6fc052bd85e9864dae4de494d8a6313391.zip
mm, swap: fix race between swap count continuation operations
One page may store a set of entries of the sis->swap_map (swap_info_struct->swap_map) in multiple swap clusters. If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX, multiple pages will be used to store the set of entries of the sis->swap_map. And the pages are linked with page->lru. This is called swap count continuation. To access the pages which store the set of entries of the sis->swap_map simultaneously, previously, sis->lock is used. But to improve the scalability of __swap_duplicate(), swap cluster lock may be used in swap_count_continued() now. This may race with add_swap_count_continuation() which operates on a nearby swap cluster, in which the sis->swap_map entries are stored in the same page. The race can cause wrong swap count in practice, thus cause unfreeable swap entries or software lockup, etc. To fix the race, a new spin lock called cont_lock is added to struct swap_info_struct to protect the swap count continuation page list. This is a lock at the swap device level, so the scalability isn't very well. But it is still much better than the original sis->lock, because it is only acquired/released when swap count continuation is used. Which is considered rare in practice. If it turns out that the scalability becomes an issue for some workloads, we can split the lock into some more fine grained locks. Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com Fixes: 235b62176712 ("mm/swap: add cluster lock") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shaohua Li <shli@kernel.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include')
-rw-r--r--include/linux/swap.h4
1 files changed, 4 insertions, 0 deletions
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b489bd77bbdc..f02fb5db8914 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -266,6 +266,10 @@ struct swap_info_struct {
* both locks need hold, hold swap_lock
* first.
*/
+ spinlock_t cont_lock; /*
+ * protect swap count continuation page
+ * list.
+ */
struct work_struct discard_work; /* discard worker */
struct swap_cluster_list discard_clusters; /* discard clusters list */
};