mm: unclutter THP migration

THP migration is hacked into the generic migration with rather
surprising semantics. The migration allocation callback is supposed to
check whether the THP can be migrated at once and, if that is not the
case, it allocates a simple page to migrate. unmap_and_move then fixes
that up by splitting the THP into small pages while moving the head page
to the newly allocated order-0 page. The remaining pages are moved to the
LRU list by split_huge_page. The same happens if the THP allocation
fails. This is really ugly and error prone [1].

I also believe that having split_huge_page put pages on the LRU lists is
inherently wrong because the tail pages are not migrated. Some callers
will just work around that by retrying (e.g. memory hotplug). There are
other pfn walkers which are simply broken, though. For example,
madvise_inject_error will migrate the head and then advance the next pfn
by the huge page size. do_move_page_to_node_array and queue_pages_range
(migrate_pages, mbind) will simply split the THP before migration if THP
migration is not supported and then fall back to single page migration.
They do not handle tail pages if the THP migration path is not able to
allocate a fresh THP, so we end up with ENOMEM and fail the whole
migration, which is questionable behavior. Page compaction doesn't try
to migrate large pages so it should be immune.

This patch tries to unclutter the situation by moving the special THP
handling up to the migrate_pages layer where it actually belongs. We
simply split the THP page into the existing list if unmap_and_move fails
with ENOMEM and retry. So we will _always_ migrate all THP subpages and
specific migrate_pages users do not have to deal with this case in a
special way.

[1] http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com

Link: http://lkml.kernel.org/r/20180103082555.14592-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
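
The "split into the existing list on ENOMEM and retry" behaviour described in the message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: every toy_* name, SUBPAGES and MAX_PAGES are hypothetical, the list is a flat array rather than a struct list_head, and the real logic lives in the migrate_pages layer, which is not part of the hunk shown on this page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PAGES 64
#define SUBPAGES  4	/* hypothetical; kept tiny for the example */

/* Toy page: order 0 is a base page, anything larger stands in for a THP. */
struct toy_page {
	int order;
	int migrated;
};

/* Toy migration list: a flat array that only ever grows at the tail. */
struct toy_list {
	struct toy_page pages[MAX_PAGES];
	size_t len;
};

/*
 * Hypothetical stand-in for unmap_and_move(): a huge page fails with
 * -ENOMEM when no huge target page can be allocated.
 */
static int toy_unmap_and_move(struct toy_page *page, int can_alloc_huge)
{
	if (page->order > 0 && !can_alloc_huge)
		return -ENOMEM;
	page->migrated = 1;
	return 0;
}

/*
 * Split the huge page at idx: the head becomes an order-0 page in place
 * and the SUBPAGES - 1 tail pages are appended to the end of the list.
 */
static int toy_split_to_list(struct toy_list *list, size_t idx)
{
	assert(idx < list->len);
	if (list->len + SUBPAGES - 1 > MAX_PAGES)
		return -ENOMEM;
	list->pages[idx].order = 0;
	for (int i = 1; i < SUBPAGES; i++) {
		list->pages[list->len].order = 0;
		list->pages[list->len].migrated = 0;
		list->len++;
	}
	return 0;
}

/* Migrate every page on the list; returns the number of failed pages. */
static int toy_migrate_pages(struct toy_list *list, int can_alloc_huge)
{
	int nr_failed = 0;

	for (size_t i = 0; i < list->len; i++) {
		int rc;
retry:
		rc = toy_unmap_and_move(&list->pages[i], can_alloc_huge);
		/* On ENOMEM, split the huge page into the list and retry. */
		if (rc == -ENOMEM && list->pages[i].order > 0 &&
		    toy_split_to_list(list, i) == 0)
			goto retry;
		if (rc)
			nr_failed++;
	}
	return nr_failed;
}

/* Count the pages that were actually migrated. */
static size_t toy_count_migrated(const struct toy_list *list)
{
	size_t n = 0;

	for (size_t i = 0; i < list->len; i++)
		n += list->pages[i].migrated ? 1 : 0;
	return n;
}
```

In this model the caller never sees the split: it hands over a list containing a huge page and gets back a list on which the head and every tail page have been migrated, which is the property the message asks for.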
author: Michal Hocko <mhocko@suse.com> 2018-04-10 16:30:07 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2018-04-11 10:28:32 -0700
commit: 94723aafb9e76414fada7c1c198733a86f01ea8f (patch)
tree: c6e12fbe38b5e22ec37e6e4b9bbe1faaae3a354b /mm/huge_memory.c
parent: 666feb21a0083e5b29ddd96588553ffa0cc357b6 (diff)
1 files changed, 6 insertions, 0 deletions
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 229ab8c75a6b..3f3267af4e3b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2401,6 +2401,12 @@ static void __split_huge_page_tail(struct page *head, int tail,
 
 	page_tail->index = head->index + tail;
 	page_cpupid_xchg_last(page_tail, page_cpupid_last(head));
+
+	/*
+	 * always add to the tail because some iterators expect new
+	 * pages to show after the currently processed elements - e.g.
+	 * migrate_pages
+	 */
 	lru_add_page_tail(head, page_tail, lruvec, list);
 }
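
The comment added in this hunk says iterators expect new pages to show up after the elements currently being processed. A minimal userspace sketch of why that matters follows; it is an illustration only. The demo_* names are hypothetical, the list is a hand-rolled circular doubly linked list in the style of struct list_head, and the walk is a plain forward loop rather than the kernel's own list iteration macros.

```c
#include <stddef.h>

/* Minimal circular doubly linked list node, in the style of list_head. */
struct demo_node {
	struct demo_node *prev, *next;
};

static void demo_list_init(struct demo_node *head)
{
	head->prev = head;
	head->next = head;
}

static void demo_insert(struct demo_node *n, struct demo_node *prev,
			struct demo_node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head, i.e. at the front of the list. */
static void demo_add_front(struct demo_node *n, struct demo_node *head)
{
	demo_insert(n, head, head->next);
}

/* Insert right before the head, i.e. at the tail of the list. */
static void demo_add_tail(struct demo_node *n, struct demo_node *head)
{
	demo_insert(n, head->prev, head);
}

/*
 * Walk the list forward. When the walk reaches 'trigger', add the
 * nr_extra nodes in extra[] either at the tail or at the front, as a
 * split in the middle of processing would. Returns how many nodes the
 * walk visited in total.
 */
static int demo_walk_and_grow(struct demo_node *head,
			      struct demo_node *trigger,
			      struct demo_node *extra, int nr_extra,
			      int at_tail)
{
	int visited = 0;

	for (struct demo_node *pos = head->next; pos != head; pos = pos->next) {
		if (pos == trigger) {
			for (int i = 0; i < nr_extra; i++) {
				if (at_tail)
					demo_add_tail(&extra[i], head);
				else
					demo_add_front(&extra[i], head);
			}
		}
		visited++;
	}
	return visited;
}
```

With two nodes on the list and three nodes added while the first one is being processed, the walk reaches all five nodes when they are added at the tail, but only the original two when they are added at the front, because the front insertions land behind the current position.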