lwn.git/mm/hugetlb_cgroup.c, branch docs-4.16

mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

2016-05-21T00:58:30+00:00

The page_counter rounds limits down to page size values. This makes sense, except in the case of hugetlb_cgroup where it's not possible to charge partial hugepages. If the hugetlb_cgroup margin is less than the hugepage size being charged, it will fail as expected. Round the hugetlb_cgroup limit down to hugepage size, since it is the effective limit of the cgroup. For consistency, round down PAGE_COUNTER_MAX as well when a hugetlb_cgroup is created: this prevents error reports when a user cannot restore the value to the kernel default. Signed-off-by: David Rientjes Cc: Michal Hocko Cc: Nikolay Borisov Cc: Johannes Weiner Cc: "Kirill A. Shutemov" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: make compound_head() robust

2015-11-07T01:50:42+00:00

Hugh has pointed that compound_head() call can be unsafe in some context. There's one example: CPU0 CPU1 isolate_migratepages_block() page_count() compound_head() !!PageTail() == true put_page() tail->first_page = NULL head = tail->first_page alloc_pages(__GFP_COMP) prep_compound_page() tail->first_page = head __SetPageTail(p); !!PageTail() == true The race is pure theoretical. I don't it's possible to trigger it in practice. But who knows. We can fix the race by changing how encode PageTail() and compound_head() within struct page to be able to update them in one shot. The patch introduces page->compound_head into third double word block in front of compound_dtor and compound_order. Bit 0 encodes PageTail() and the rest bits are pointer to head page if bit zero is set. The patch moves page->pmd_huge_pte out of word, just in case if an architecture defines pgtable_t into something what can have the bit 0 set. hugetlb_cgroup uses page->lru.next in the second tail page to store pointer struct hugetlb_cgroup. The patch switch it to use page->private in the second tail page instead. The space is free since ->first_page is removed from the union. The patch also opens possibility to remove HUGETLB_CGROUP_MIN_ORDER limitation, since there's now space in first tail page to store struct hugetlb_cgroup pointer. But that's out of scope of the patch. That means page->compound_head shares storage space with: - page->lru.next; - page->next; - page->rcu_head.next; That's too long list to be absolutely sure, but looks like nobody uses bit 0 of the word. page->rcu_head.next guaranteed[1] to have bit 0 clean as long as we use call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But future call_rcu_lazy() is not allowed as it makes use of the bit and we can get false positive PageTail(). [1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com Signed-off-by: Kirill A. Shutemov Acked-by: Michal Hocko Reviewed-by: Andrea Arcangeli Cc: Hugh Dickins Cc: David Rientjes Cc: Vlastimil Babka Acked-by: Paul E. McKenney Cc: Aneesh Kumar K.V Cc: Andi Kleen Cc: Christoph Lameter Cc: Joonsoo Kim Cc: Sergey Senozhatsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: page_counter: let page_counter_try_charge() return bool

2015-11-06T03:34:48+00:00

page_counter_try_charge() currently returns 0 on success and -ENOMEM on failure, which is surprising behavior given the function name. Make it follow the expected pattern of try_stuff() functions that return a boolean true to indicate success, or false for failure. Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Cc: Vladimir Davydov Signed-off-by: Linus Torvalds

mm: page_counter: pull "-1" handling out of page_counter_memparse()

2015-02-12T01:06:02+00:00

The unified hierarchy interface for memory cgroups will no longer use "-1" to mean maximum possible resource value. In preparation for this, make the string an argument and let the caller supply it. Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Cc: Vladimir Davydov Cc: Greg Thelen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: hugetlb_cgroup: convert to lockless page counters

2014-12-11T01:41:04+00:00

Abandon the spinlock-protected byte counters in favor of the unlocked page counters in the hugetlb controller as well. Signed-off-by: Johannes Weiner Reviewed-by: Vladimir Davydov Acked-by: Michal Hocko Cc: Tejun Heo Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked

2014-08-29T23:28:16+00:00

spin_lock may be an empty struct for !SMP configurations and so arch_spin_is_locked may return unconditional 0 and trigger the VM_BUG_ON even when the lock is held. Replace spin_is_locked by lockdep_assert_held. We will not BUG anymore but it is questionable whether crashing makes a lot of sense in the uncharge path. Uncharge happens after the last page reference was released so nobody should touch the page and the function doesn't update any shared state except for res counter which uses synchronization of its own. Signed-off-by: Michal Hocko Reviewed-by: Aneesh Kumar K.V Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm, hugetlb_cgroup: align hugetlb cgroup limit to hugepage size

2014-08-14T16:56:15+00:00

Memcg aligns memory.limit_in_bytes to PAGE_SIZE as part of the resource counter since it makes no sense to allow a partial page to be charged. As a result of the hugetlb cgroup using the resource counter, it is also aligned to PAGE_SIZE but makes no sense unless aligned to the size of the hugepage being limited. Align hugetlb cgroup limit to hugepage size. Signed-off-by: David Rientjes Acked-by: Michal Hocko Cc: "Aneesh Kumar K.V" Cc: Tejun Heo Cc: Li Zefan Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()

2014-07-15T15:05:09+00:00

Currently, cftypes added by cgroup_add_cftypes() are used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype addition functions and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch adds cgroup_add_legacy_cftypes() which currently is a simple wrapper around cgroup_add_cftypes() and replaces all cgroup_add_cftypes() usages with it. While at it, this patch drops a completely spurious return from __hugetlb_cgroup_file_init(). This patch doesn't introduce any functional differences. Signed-off-by: Tejun Heo Acked-by: Neil Horman Acked-by: Li Zefan Cc: Johannes Weiner Cc: Michal Hocko Cc: Aneesh Kumar K.V

cgroup: remove css_parent()

2014-05-16T17:22:48+00:00

cgroup in general is moving towards using cgroup_subsys_state as the fundamental structural component and css_parent() was introduced to convert from using cgroup->parent to css->parent. It was quite some time ago and we're moving forward with making css more prominent. This patch drops the trivial wrapper css_parent() and let the users dereference css->parent. While at it, explicitly mark fields of css which are public and immutable. v2: New usage from device_cgroup.c converted. Signed-off-by: Tejun Heo Acked-by: Michal Hocko Acked-by: Neil Horman Acked-by: "David S. Miller" Acked-by: Li Zefan Cc: Vivek Goyal Cc: Jens Axboe Cc: Peter Zijlstra Cc: Johannes Weiner

cgroup: replace cftype->trigger() with cftype->write()

2014-05-13T16:16:21+00:00

cftype->trigger() is pointless. It's trivial to ignore the input buffer from a regular ->write() operation. Convert all ->trigger() users to ->write() and remove ->trigger(). This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo Acked-by: Li Zefan Cc: Johannes Weiner Cc: Michal Hocko