<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/kernel/cgroup/cgroup.c, branch docs-mw</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-mw</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-mw'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2026-04-19T15:01:17+00:00</updated>
<entry>
<title>Merge tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2026-04-19T15:01:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-19T15:01:17+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=40735a683bf844a453d7a0f91e5e3daa0abc659b'/>
<id>urn:sha1:40735a683bf844a453d7a0f91e5e3daa0abc659b</id>
<content type='text'>
Pull more MM updates from Andrew Morton:

 - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song)

   Address the longstanding "dying memcg problem". A situation wherein a
   no-longer-used memory control group will hang around for an extended
   period pointlessly consuming memory

 - "fix unexpected type conversions and potential overflows" (Qi Zheng)

   Fix a couple of potential 32-bit/64-bit issues which were identified
   during review of the "Eliminate Dying Memory Cgroup" series

 - "kho: history: track previous kernel version and kexec boot count"
   (Breno Leitao)

   Use Kexec Handover (KHO) to pass the previous kernel's version string
   and the number of kexec reboots since the last cold boot to the next
   kernel, and print it at boot time

 - "liveupdate: prevent double preservation" (Pasha Tatashin)

   Teach LUO to avoid managing the same file across different active
   sessions

 - "liveupdate: Fix module unloading and unregister API" (Pasha
   Tatashin)

   Address an issue with how LUO handles module reference counting and
   unregistration during module unloading

 - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar)

   Simplify and clean up the zswap crypto compression handling and
   improve the lifecycle management of zswap pool's per-CPU acomp_ctx
   resources

 - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race"
   (SeongJae Park)

   Address unlikely but possible leaks and deadlocks in damon_call() and
   damon_walk()

 - "mm/damon/core: validate damos_quota_goal-&gt;nid" (SeongJae Park)

   Fix a couple of root-only wild pointer dereferences

 - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race"
   (SeongJae Park)

   Update the DAMON documentation to warn operators about potential
   races which can occur if the commit_inputs parameter is altered at
   the wrong time

 - "Minor hmm_test fixes and cleanups" (Alistair Popple)

   Bugfixes and a cleanup for the HMM kernel selftests

 - "Modify memfd_luo code" (Chenghao Duan)

   Cleanups, simplifications and speedups to the memfd_lou code

 - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport)

   Support for userfaultfd in guest_memfd

 - "selftests/mm: skip several tests when thp is not available" (Chunyu
   Hu)

   Fix several issues in the selftests code which were causing breakage
   when the tests were run on CONFIG_THP=n kernels

 - "mm/mprotect: micro-optimization work" (Pedro Falcato)

   A couple of nice speedups for mprotect()

 - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav)

   Document upcoming changes in the maintenance of KHO, LUO, memfd_luo,
   kexec, crash, kdump and probably other kexec-based things - they are
   being moved out of mm.git and into a new git tree

* tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits)
  MAINTAINERS: add page cache reviewer
  mm/vmscan: avoid false-positive -Wuninitialized warning
  MAINTAINERS: update Dave's kdump reviewer email address
  MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE
  MAINTAINERS: drop include/linux/kho/abi/ from KHO
  MAINTAINERS: update KHO and LIVE UPDATE maintainers
  MAINTAINERS: update kexec/kdump maintainers entries
  mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd()
  selftests: mm: skip charge_reserved_hugetlb without killall
  userfaultfd: allow registration of ranges below mmap_min_addr
  mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update
  mm/hugetlb: fix early boot crash on parameters without '=' separator
  zram: reject unrecognized type= values in recompress_store()
  docs: proc: document ProtectionKey in smaps
  mm/mprotect: special-case small folios when applying permissions
  mm/mprotect: move softleaf code out of the main function
  mm: remove '!root_reclaim' checking in should_abort_scan()
  mm/sparse: fix comment for section map alignment
  mm/page_io: use sio-&gt;len for PSWPIN accounting in sio_read_complete()
  selftests/mm: transhuge_stress: skip the test when thp not available
  ...
</content>
</entry>
<entry>
<title>mm: memcontrol: prepare for reparenting non-hierarchical stats</title>
<updated>2026-04-18T07:10:47+00:00</updated>
<author>
<name>Qi Zheng</name>
<email>zhengqi.arch@bytedance.com</email>
</author>
<published>2026-03-05T11:52:48+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8285917d6f383aef274fb442eb0e6f948d76abe3'/>
<id>urn:sha1:8285917d6f383aef274fb442eb0e6f948d76abe3</id>
<content type='text'>
To resolve the dying memcg issue, we need to reparent LRU folios of child
memcg to its parent memcg.  This could cause problems for non-hierarchical
stats.

As Yosry Ahmed pointed out:

In short, if memory is charged to a dying cgroup at the time of
reparenting, when the memory gets uncharged the stats updates will occur
at the parent. This will update both hierarchical and non-hierarchical
stats of the parent, which would corrupt the parent's non-hierarchical
stats (because those counters were never incremented when the memory was
charged).

Now we have the following two types of non-hierarchical stats, and they
are only used in CONFIG_MEMCG_V1:

a. memcg-&gt;vmstats-&gt;state_local[i]
b. pn-&gt;lruvec_stats-&gt;state_local[i]

To ensure that these non-hierarchical stats work properly, we need to
reparent these non-hierarchical stats after reparenting LRU folios. To
this end, this commit makes the following preparations:

1. implement reparent_state_local() to reparent non-hierarchical stats
2. make css_killed_work_fn() to be called in rcu work, and implement
   get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid race
   between mod_memcg_state()/mod_memcg_lruvec_state()
   and reparent_state_local()

Link: https://lore.kernel.org/e862995c45a7101a541284b6ebee5e5c32c89066.1772711148.git.zhengqi.arch@bytedance.com
Co-developed-by: Yosry Ahmed &lt;yosry@kernel.org&gt;
Signed-off-by: Yosry Ahmed &lt;yosry@kernel.org&gt;
Signed-off-by: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Allen Pais &lt;apais@linux.microsoft.com&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Chengming Zhou &lt;chengming.zhou@linux.dev&gt;
Cc: Chen Ridong &lt;chenridong@huawei.com&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Hamza Mahfooz &lt;hamzamahfooz@linux.microsoft.com&gt;
Cc: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Imran Khan &lt;imran.f.khan@oracle.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kamalesh Babulal &lt;kamalesh.babulal@oracle.com&gt;
Cc: Lance Yang &lt;lance.yang@linux.dev&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Lorenzo Stoakes (Oracle) &lt;ljs@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vlastimil Babka &lt;vbabka@kernel.org&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Yuanchu Xie &lt;yuanchu@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2026-04-15T17:18:49+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-15T17:18:49+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b71f0be2d23d876648758d57bc6761500e3b9c70'/>
<id>urn:sha1:b71f0be2d23d876648758d57bc6761500e3b9c70</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cgroup_file_notify() locking converted from a global lock to
   per-cgroup_file spinlock with a lockless fast-path when no
   notification is needed

 - Misc changes including exposing cgroup helpers for sched_ext and
   minor fixes

* tag 'cgroup-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/rdma: fix swapped arguments in pr_warn() format string
  cgroup/dmem: remove region parameter from dmemcg_parse_limit
  cgroup: replace global cgroup_file_kn_lock with per-cgroup_file lock
  cgroup: add lockless fast-path checks to cgroup_file_notify()
  cgroup: reduce cgroup_file_kn_lock hold time in cgroup_file_notify()
  cgroup: Expose some cgroup helpers
</content>
</entry>
<entry>
<title>cgroup: Fix cgroup_drain_dying() testing the wrong condition</title>
<updated>2026-03-26T00:08:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-25T17:23:48+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4c56a8ac6869855866de0bb368a4189739e1d24f'/>
<id>urn:sha1:4c56a8ac6869855866de0bb368a4189739e1d24f</id>
<content type='text'>
cgroup_drain_dying() was using cgroup_is_populated() to test whether there are
dying tasks to wait for. cgroup_is_populated() tests nr_populated_csets,
nr_populated_domain_children and nr_populated_threaded_children, but
cgroup_drain_dying() only needs to care about this cgroup's own tasks - whether
there are children is cgroup_destroy_locked()'s concern.

This caused hangs during shutdown. When systemd tried to rmdir a cgroup that had
no direct tasks but had a populated child, cgroup_drain_dying() would enter its
wait loop because cgroup_is_populated() was true from
nr_populated_domain_children. The task iterator found nothing to wait for, yet
the populated state never cleared because it was driven by live tasks in the
child cgroup.

Fix it by using cgroup_has_tasks() which only tests nr_populated_csets.

v3: Fix cgroup_is_populated() -&gt; cgroup_has_tasks() (Sebastian).

v2: https://lore.kernel.org/r/20260323200205.1063629-1-tj@kernel.org

Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
</content>
</entry>
<entry>
<title>cgroup: Wait for dying tasks to leave on rmdir</title>
<updated>2026-03-24T20:21:40+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-24T20:21:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1b164b876c36c3eb5561dd9b37702b04401b0166'/>
<id>urn:sha1:1b164b876c36c3eb5561dd9b37702b04401b0166</id>
<content type='text'>
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") hid PF_EXITING
tasks from cgroup.procs so that systemd doesn't see tasks that have already
been reaped via waitpid(). However, the populated counter (nr_populated_csets)
is only decremented when the task later passes through cgroup_task_dead() in
finish_task_switch(). This means cgroup.procs can appear empty while the
cgroup is still populated, causing rmdir to fail with -EBUSY.

Fix this by making cgroup_rmdir() wait for dying tasks to fully leave. If the
cgroup is populated but all remaining tasks have PF_EXITING set (the task
iterator returns none due to the existing filter), wait for a kick from
cgroup_task_dead() and retry. The wait is brief as tasks are removed from the
cgroup's css_set between PF_EXITING assertion in do_exit() and
cgroup_task_dead() in finish_task_switch().

v2: cgroup_is_populated() true to false transition happens under css_set_lock
    not cgroup_mutex, so retest under css_set_lock before sleeping to avoid
    missed wakeups (Sebastian).

Fixes: a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202603222104.2c81684e-lkp@intel.com
Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Bert Karwatzki &lt;spasswolf@web.de&gt;
Cc: Michal Koutny &lt;mkoutny@suse.com&gt;
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>cgroup: replace global cgroup_file_kn_lock with per-cgroup_file lock</title>
<updated>2026-03-11T22:16:21+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2026-03-11T01:01:01+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4ef420b3450026b56807e5d53001f80eb495403c'/>
<id>urn:sha1:4ef420b3450026b56807e5d53001f80eb495403c</id>
<content type='text'>
Replace the global cgroup_file_kn_lock with a per-cgroup_file spinlock
to eliminate cross-cgroup contention as it is not really protecting
data shared between different cgroups.

The lock is initialized in cgroup_add_file() alongside timer_setup().
No lock acquisition is needed during initialization since the cgroup
directory is being populated under cgroup_mutex and no concurrent
accessors exist at that point.

Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: add lockless fast-path checks to cgroup_file_notify()</title>
<updated>2026-03-11T22:16:21+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2026-03-11T01:01:00+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4616120fca7f6d48b4c640e3975352e451e9c2ce'/>
<id>urn:sha1:4616120fca7f6d48b4c640e3975352e451e9c2ce</id>
<content type='text'>
Add lockless checks before acquiring cgroup_file_kn_lock:

1. READ_ONCE(cfile-&gt;kn) NULL check to skip torn-down files.
2. READ_ONCE(cfile-&gt;notified_at) rate-limit check to skip when
   within the notification interval.  If within the interval, arm
   the deferred timer via timer_reduce() and confirm it is pending
   before returning -- if the timer fired in between, fall through
   to the lock path so the notification is not lost.

Both checks have safe error directions -- a stale read can only
cause unnecessary lock acquisition, never a missed notification.

The critical section is simplified to just taking a kernfs_get()
reference and updating notified_at.

Annotate cfile-&gt;kn and cfile-&gt;notified_at write sites with
WRITE_ONCE() to pair with the lockless readers.

Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: reduce cgroup_file_kn_lock hold time in cgroup_file_notify()</title>
<updated>2026-03-11T22:16:21+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2026-03-11T01:00:59+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=05070cd654f38346d051b8c411faff196fa58880'/>
<id>urn:sha1:05070cd654f38346d051b8c411faff196fa58880</id>
<content type='text'>
cgroup_file_notify() calls kernfs_notify() while holding the global
cgroup_file_kn_lock.  kernfs_notify() does non-trivial work including
wake_up_interruptible() and acquisition of a second global spinlock
(kernfs_notify_lock), inflating the hold time.

Take a kernfs_get() reference under the lock and call kernfs_notify()
after dropping it, following the pattern from cgroup_file_show().

Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Don't expose dead tasks in cgroup</title>
<updated>2026-03-06T22:43:25+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2026-03-06T19:22:35+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a72f73c4dd9b209c53cf8b03b6e97fcefad4262c'/>
<id>urn:sha1:a72f73c4dd9b209c53cf8b03b6e97fcefad4262c</id>
<content type='text'>
Once a task exits it has its state set to TASK_DEAD and then it is
removed from the cgroup it belonged to. The last step happens on the task
gets out of its last schedule() invocation and is delayed on PREEMPT_RT
due to locking constraints.

As a result it is possible to receive a pid via waitpid() of a task
which is still listed in cgroup.procs for the cgroup it belonged
to. This is something that systemd does not expect and as a result it
waits for its exit until a time out occurs.
This can also be reproduced on !PREEMPT_RT kernel with a significant
delay in do_exit() after exit_notify().

Hide the task from the output which have PF_EXITING set which is done
before the parent is notified. Keeping zombies with live threads
shouldn't break anything (suggested by Tejun).

Reported-by: Bert Karwatzki &lt;spasswolf@web.de&gt;
Closes: https://lore.kernel.org/all/20260219164648.3014-1-spasswolf@web.de/
Tested-by: Bert Karwatzki &lt;spasswolf@web.de&gt;
Fixes: 9311e6c29b34 ("cgroup: Fix sleeping from invalid context warning on PREEMPT_RT")
Cc: stable@vger.kernel.org # v6.19+
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Expose some cgroup helpers</title>
<updated>2026-03-06T04:15:58+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-04T21:26:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5b30afc20b3fea29b9beb83c6415c4ff06f774aa'/>
<id>urn:sha1:5b30afc20b3fea29b9beb83c6415c4ff06f774aa</id>
<content type='text'>
Expose the following through cgroup.h:

- cgroup_on_dfl()
- cgroup_is_dead()
- cgroup_for_each_live_child()
- cgroup_for_each_live_descendant_pre()
- cgroup_for_each_live_descendant_post()

Until now, these didn't need to be exposed because controllers only cared
about the css hierarchy. The planned sched_ext hierarchical scheduler
support will be based on the default cgroup hierarchy, which is in line
with the existing BPF cgroup support, and thus needs these exposed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
