summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorWaiman Long <longman@redhat.com>2019-11-30 17:56:49 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2019-12-01 12:59:08 -0800
commit930668c34408ba983049322e04f13f03b6f1fafa (patch)
tree70b658341e9392125e05254ec17f8c956f2d17bd
parent1ab5b82f540b31852fbf4a3c975f3c16e0e76b9f (diff)
downloadlwn-930668c34408ba983049322e04f13f03b6f1fafa.tar.gz
lwn-930668c34408ba983049322e04f13f03b6f1fafa.zip
hugetlbfs: take read_lock on i_mmap for PMD sharing
A customer with large SMP systems (up to 16 sockets) with application that uses large amount of static hugepages (~500-1500GB) are experiencing random multisecond delays. These delays were caused by the long time it took to scan the VMA interval tree with mmap_sem held. The sharing of huge PMD does not require changes to the i_mmap at all. Therefore, we can just take the read lock and let other threads searching for the right VMA share it in parallel. Once the right VMA is found, either the PMD lock (2M huge page for x86-64) or the mm->page_table_lock will be acquired to perform the actual PMD sharing. Lock contention, if present, will happen in the spinlock. That is much better than contention in the rwsem where the time needed to scan the the interval tree is indeterminate. With this patch applied, the customer is seeing significant performance improvement over the unpatched kernel. Link: http://lkml.kernel.org/r/20191107211809.9539-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r--mm/hugetlb.c4
1 files changed, 2 insertions, 2 deletions
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39579f98d6f3..18c92cb9bf43 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4769,7 +4769,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
if (!vma_shareable(vma, addr))
return (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_lock_write(mapping);
+ i_mmap_lock_read(mapping);
vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
if (svma == vma)
continue;
@@ -4799,7 +4799,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
spin_unlock(ptl);
out:
pte = (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_unlock_write(mapping);
+ i_mmap_unlock_read(mapping);
return pte;
}