<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/mm/hugetlb_cgroup.c, branch docs-4.16</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-4.16</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-4.16'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2016-05-21T00:58:30+00:00</updated>
<entry>
<title>mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2016-05-20T23:57:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=297880f4af4e492ed5084be9397d65a18ade56ee'/>
<id>urn:sha1:297880f4af4e492ed5084be9397d65a18ade56ee</id>
<content type='text'>
The page_counter rounds limits down to page size values.  This makes
sense, except in the case of hugetlb_cgroup where it's not possible to
charge partial hugepages.  If the hugetlb_cgroup margin is less than the
hugepage size being charged, it will fail as expected.

Round the hugetlb_cgroup limit down to hugepage size, since it is the
effective limit of the cgroup.
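
As a rough illustration of the rounding described above, here is a minimal
Python sketch. It is a model only, not the kernel code: the constants and
the helper names round_down() and effective_limit() are assumptions made
for this example.

```python
# Illustrative model only; the names below are hypothetical, not kernel API.
PAGE_SIZE = 4096                  # assumed base page size in bytes
HUGEPAGE_SIZE = 2 * 1024 * 1024   # assumed hugepage size (2 MiB)


def round_down(value, multiple):
    """Round value down to the nearest multiple of `multiple`."""
    return value - (value % multiple)


def effective_limit(limit_in_bytes, hugepage_size=HUGEPAGE_SIZE):
    """Limit a hugetlb cgroup can actually use: whole hugepages only."""
    return round_down(limit_in_bytes, hugepage_size)
```

Under these assumptions, a requested limit of 5 MiB with 2 MiB hugepages
yields an effective limit of 4 MiB, since the remaining 1 MiB can never
hold a whole hugepage.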

For consistency, round down PAGE_COUNTER_MAX as well when a
hugetlb_cgroup is created: this prevents error reports when a user
cannot restore the value to the kernel default.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Nikolay Borisov &lt;kernel@kyup.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make compound_head() robust</title>
<updated>2015-11-07T01:50:42+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-11-07T00:29:54+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1d798ca3f16437c71ff63e36597ff07f9c12e4d6'/>
<id>urn:sha1:1d798ca3f16437c71ff63e36597ff07f9c12e4d6</id>
<content type='text'>
Hugh has pointed out that a compound_head() call can be unsafe in some
contexts. Here's one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail-&gt;first_page = NULL
      head = tail-&gt;first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail-&gt;first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    &lt;head == NULL dereferencing&gt;

The race is purely theoretical. I don't think it's possible to trigger it
in practice. But who knows.

We can fix the race by changing how we encode PageTail() and
compound_head() within struct page, so that both can be updated in one shot.

The patch introduces page-&gt;compound_head into the third double word block,
in front of compound_dtor and compound_order. Bit 0 encodes PageTail(), and
if bit 0 is set, the remaining bits are a pointer to the head page.
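
The encoding can be modelled with a small Python sketch. This is an
illustration only, not the kernel implementation: addresses are plain
integers, and the helper names encode_tail(), page_tail() and
compound_head() below are hypothetical stand-ins chosen for this example.

```python
# Illustrative model only: addresses are plain integers and the helper
# names are hypothetical; this is not the kernel implementation.

def encode_tail(head_addr):
    """Build the word stored in a tail page: head address with bit 0 set."""
    assert head_addr % 2 == 0, "head address must leave bit 0 free"
    return head_addr | 1


def page_tail(word):
    """Model of PageTail(): true when bit 0 of the word is set."""
    return word % 2 == 1


def compound_head(page_addr, word):
    """Return the head address for a tail page, else the page itself."""
    if page_tail(word):
        return word - 1   # clear bit 0 to recover the head address
    return page_addr
```

In this model the tail flag and the head pointer live in one word, so a
reader that takes a single snapshot of the word always sees a consistent
pair: either a tail with a valid head address, or not a tail at all.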

The patch moves page-&gt;pmd_huge_pte out of the word, just in case an
architecture defines pgtable_t as something that can have bit 0 set.

hugetlb_cgroup uses page-&gt;lru.next in the second tail page to store a
pointer to struct hugetlb_cgroup. The patch switches it to use
page-&gt;private in the second tail page instead. The space is free since
-&gt;first_page is removed from the union.

The patch also opens the possibility of removing the
HUGETLB_CGROUP_MIN_ORDER limitation, since there's now space in the first
tail page to store the struct hugetlb_cgroup pointer. But that's out of
scope for this patch.

That means page-&gt;compound_head shares storage space with:

 - page-&gt;lru.next;
 - page-&gt;next;
 - page-&gt;rcu_head.next;

That's too long a list to be absolutely sure, but it looks like nobody
uses bit 0 of the word.

page-&gt;rcu_head.next is guaranteed[1] to have bit 0 clear as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But a future
call_rcu_lazy() is not allowed, as it makes use of the bit and we could
get a false positive PageTail().

[1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_counter: let page_counter_try_charge() return bool</title>
<updated>2015-11-06T03:34:48+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2015-11-06T02:50:26+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6071ca5201066f4b2a61cfb693dd186d6bc6e9f3'/>
<id>urn:sha1:6071ca5201066f4b2a61cfb693dd186d6bc6e9f3</id>
<content type='text'>
page_counter_try_charge() currently returns 0 on success and -ENOMEM on
failure, which is surprising behavior given the function name.

Make it follow the expected pattern of try_stuff() functions that return a
boolean true to indicate success, or false for failure.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Vladimir Davydov &lt;vdavydov@virtuozzo.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_counter: pull "-1" handling out of page_counter_memparse()</title>
<updated>2015-02-12T01:06:02+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2015-02-11T23:26:03+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=650c5e565492f9092552bfe4d65935196c7d9567'/>
<id>urn:sha1:650c5e565492f9092552bfe4d65935196c7d9567</id>
<content type='text'>
The unified hierarchy interface for memory cgroups will no longer use "-1"
to mean maximum possible resource value.  In preparation for this, make
the string an argument and let the caller supply it.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: hugetlb_cgroup: convert to lockless page counters</title>
<updated>2014-12-11T01:41:04+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-12-10T23:42:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=71f87bee38edddb21d97895fa938744cf3f477bb'/>
<id>urn:sha1:71f87bee38edddb21d97895fa938744cf3f477bb</id>
<content type='text'>
Abandon the spinlock-protected byte counters in favor of the unlocked
page counters in the hugetlb controller as well.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked</title>
<updated>2014-08-29T23:28:16+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2014-08-29T22:18:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7ea8574e5fa31f43d8098a028f12ba6a9c9f3530'/>
<id>urn:sha1:7ea8574e5fa31f43d8098a028f12ba6a9c9f3530</id>
<content type='text'>
spin_lock may be an empty struct for !SMP configurations, so
arch_spin_is_locked may unconditionally return 0 and trigger the VM_BUG_ON
even when the lock is held.

Replace spin_is_locked by lockdep_assert_held.  We will not BUG anymore
but it is questionable whether crashing makes a lot of sense in the
uncharge path.  Uncharge happens after the last page reference was
released so nobody should touch the page and the function doesn't update
any shared state except for res counter which uses synchronization of
its own.

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Reviewed-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, hugetlb_cgroup: align hugetlb cgroup limit to hugepage size</title>
<updated>2014-08-14T16:56:15+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2014-08-12T20:46:07+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=24d7cd207f8e0d6864a314d5f03b5434403ff940'/>
<id>urn:sha1:24d7cd207f8e0d6864a314d5f03b5434403ff940</id>
<content type='text'>
Memcg aligns memory.limit_in_bytes to PAGE_SIZE as part of the resource
counter since it makes no sense to allow a partial page to be charged.

Because the hugetlb cgroup uses the resource counter, its limit is also
aligned to PAGE_SIZE, but the limit makes no sense unless it is aligned to
the size of the hugepage being limited.

Align hugetlb cgroup limit to hugepage size.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()</title>
<updated>2014-07-15T15:05:09+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2014-07-15T15:05:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2cf669a58dc08fa065a8bd0dca866c0e6cb358cc'/>
<id>urn:sha1:2cf669a58dc08fa065a8bd0dca866c0e6cb358cc</id>
<content type='text'>
Currently, cftypes added by cgroup_add_cftypes() are used for both the
unified default hierarchy and legacy ones and subsystems can mark each
file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
appear only on one of them.  This is quite hairy and error-prone.
Also, we may end up exposing interface files to the default hierarchy
without thinking it through.

cgroup_subsys will grow two separate cftype addition functions and
apply each only on the hierarchies of the matching type.  This will
allow organizing cftypes in a much clearer way and encourage subsystems
to scrutinize the interface which is being exposed in the new default
hierarchy.

In preparation, this patch adds cgroup_add_legacy_cftypes() which
currently is a simple wrapper around cgroup_add_cftypes() and replaces
all cgroup_add_cftypes() usages with it.

While at it, this patch drops a completely spurious return from
__hugetlb_cgroup_file_init().

This patch doesn't introduce any functional differences.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>cgroup: remove css_parent()</title>
<updated>2014-05-16T17:22:48+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2014-05-16T17:22:48+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5c9d535b893f30266ea29fe377cb9b002fcd76aa'/>
<id>urn:sha1:5c9d535b893f30266ea29fe377cb9b002fcd76aa</id>
<content type='text'>
cgroup in general is moving towards using cgroup_subsys_state as the
fundamental structural component and css_parent() was introduced to
convert from using cgroup-&gt;parent to css-&gt;parent.  It was quite some
time ago and we're moving forward with making css more prominent.

This patch drops the trivial wrapper css_parent() and lets its users
dereference css-&gt;parent directly.  While at it, explicitly mark the
fields of css which are public and immutable.

v2: New usage from device_cgroup.c converted.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Acked-by: "David S. Miller" &lt;davem@davemloft.net&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace cftype-&gt;trigger() with cftype-&gt;write()</title>
<updated>2014-05-13T16:16:21+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2014-05-13T16:16:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6770c64e5c8da4705d1f0973bdeb5c2bf4f3a404'/>
<id>urn:sha1:6770c64e5c8da4705d1f0973bdeb5c2bf4f3a404</id>
<content type='text'>
cftype-&gt;trigger() is pointless.  It's trivial to ignore the input
buffer from a regular -&gt;write() operation.  Convert all -&gt;trigger()
users to -&gt;write() and remove -&gt;trigger().

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
</content>
</entry>
</feed>
