<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/kernel/cgroup, branch master</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=master</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2026-04-15T17:18:49+00:00</updated>
<entry>
<title>Merge tag 'cgroup-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2026-04-15T17:18:49+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-15T17:18:49+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b71f0be2d23d876648758d57bc6761500e3b9c70'/>
<id>urn:sha1:b71f0be2d23d876648758d57bc6761500e3b9c70</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cgroup_file_notify() locking converted from a global lock to
   per-cgroup_file spinlock with a lockless fast-path when no
   notification is needed

 - Misc changes including exposing cgroup helpers for sched_ext and
   minor fixes

* tag 'cgroup-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/rdma: fix swapped arguments in pr_warn() format string
  cgroup/dmem: remove region parameter from dmemcg_parse_limit
  cgroup: replace global cgroup_file_kn_lock with per-cgroup_file lock
  cgroup: add lockless fast-path checks to cgroup_file_notify()
  cgroup: reduce cgroup_file_kn_lock hold time in cgroup_file_notify()
  cgroup: Expose some cgroup helpers
</content>
</entry>
<entry>
<title>cgroup/rdma: fix swapped arguments in pr_warn() format string</title>
<updated>2026-04-10T08:30:08+00:00</updated>
<author>
<name>cuitao</name>
<email>cuitao@kylinos.cn</email>
</author>
<published>2026-04-09T05:21:35+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3348e1e83a0f8a5ca1095843bc3316aaef7aae34'/>
<id>urn:sha1:3348e1e83a0f8a5ca1095843bc3316aaef7aae34</id>
<content type='text'>
The format string says "device %p ... rdma cgroup %p" but the arguments
were passed as (cg, device), printing them in the wrong order.
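
As an illustration of this class of bug (a hypothetical Python sketch, not the kernel code itself), positional format arguments must appear in the same order as their conversion specifiers:

```python
# Hypothetical illustration of the swapped-argument bug: the values
# must be passed in the same order as the specifiers in the string.
device = "mlx5_0"
cg = "0xdeadbeef"

buggy = "device %s ... rdma cgroup %s" % (cg, device)   # wrong order
fixed = "device %s ... rdma cgroup %s" % (device, cg)   # matches the text

print(buggy)
print(fixed)
```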

Signed-off-by: cuitao &lt;cuitao@kylinos.cn&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Skip security check for hotplug induced v1 task migration</title>
<updated>2026-03-31T19:14:13+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-03-31T15:11:08+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=089f3fcd690c71cb3d8ca09f34027764e28920a0'/>
<id>urn:sha1:089f3fcd690c71cb3d8ca09f34027764e28920a0</id>
<content type='text'>
When a CPU hot removal causes a v1 cpuset to lose all its CPUs, the
cpuset hotplug handler schedules a work function that migrates the
tasks of that now CPU-less cpuset to its ancestor so that they can
continue running.

If a strict security policy is in place, however, the task migration
may fail when the security_task_setscheduler() call in
cpuset_can_attach() returns -EACCES. Those tasks are then left with
no CPU to run on, and system administrators have to intervene
explicitly, adding CPUs to the cpuset or moving the tasks elsewhere,
if they are aware of the problem at all.

This problem was found via a reported test failure in LTP's
cpuset_hotplug_test.sh. Fix it by treating this special case as an
exception: skip the setsched security check in cpuset_can_attach()
when a v1 cpuset with tasks has no CPU left.

With this patch applied, the cpuset_hotplug_test.sh test runs without
failure.
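
A minimal sketch of the exception described above, with invented names and structure (the kernel logic lives inside cpuset_can_attach(), not in a helper like this):

```python
# Hedged sketch of the decision described above; names are invented,
# not the kernel's. The setscheduler security check is skipped only
# for the forced migration out of a v1 cpuset that lost all CPUs.
def needs_setsched_check(is_v1, hotplug_induced, cpus_left):
    # Hotplug emptied a v1 cpuset: the migration must succeed so its
    # tasks keep running, so the security check is bypassed.
    if is_v1 and hotplug_induced and cpus_left == 0:
        return False
    return True

print(needs_setsched_check(True, True, 0))    # hotplug-emptied v1 cpuset
print(needs_setsched_check(True, False, 4))   # ordinary v1 attach
```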

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()</title>
<updated>2026-03-31T19:14:13+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-03-31T15:11:07+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=bbe5ab8191a33572c11be8628c55b79246307125'/>
<id>urn:sha1:bbe5ab8191a33572c11be8628c55b79246307125</id>
<content type='text'>
Hoist the check that decides whether security_task_setscheduler()
needs to run out of the task iteration loop of cpuset_can_attach(),
as it does not depend on the characteristics of the individual tasks.

There is no functional change.
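
The hoisting can be sketched generically (invented names, not the cpuset code):

```python
# Generic sketch of the change described above: the predicate does
# not depend on the individual task, so it is computed once before
# the loop instead of being re-evaluated on every iteration.
def can_attach(tasks, policy_is_strict):
    # Hoisted: loop-invariant decision, computed a single time.
    check_setsched = policy_is_strict

    for task in tasks:
        if check_setsched and not task.get("sched_ok", True):
            return False
    return True

tasks = [{"sched_ok": True}, {"sched_ok": False}]
ok_lenient = can_attach(tasks, policy_is_strict=False)
ok_strict = can_attach(tasks, policy_is_strict=True)
print(ok_lenient, ok_strict)
```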

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Fix cgroup_drain_dying() testing the wrong condition</title>
<updated>2026-03-26T00:08:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-25T17:23:48+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4c56a8ac6869855866de0bb368a4189739e1d24f'/>
<id>urn:sha1:4c56a8ac6869855866de0bb368a4189739e1d24f</id>
<content type='text'>
cgroup_drain_dying() was using cgroup_is_populated() to test whether there are
dying tasks to wait for. cgroup_is_populated() tests nr_populated_csets,
nr_populated_domain_children and nr_populated_threaded_children, but
cgroup_drain_dying() only needs to care about this cgroup's own tasks - whether
there are children is cgroup_destroy_locked()'s concern.

This caused hangs during shutdown. When systemd tried to rmdir a cgroup that had
no direct tasks but had a populated child, cgroup_drain_dying() would enter its
wait loop because cgroup_is_populated() was true from
nr_populated_domain_children. The task iterator found nothing to wait for, yet
the populated state never cleared because it was driven by live tasks in the
child cgroup.

Fix it by using cgroup_has_tasks() which only tests nr_populated_csets.

v3: Fix cgroup_is_populated() -&gt; cgroup_has_tasks() (Sebastian).

v2: https://lore.kernel.org/r/20260323200205.1063629-1-tj@kernel.org
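
The difference between the two predicates can be sketched with invented counters mirroring the fields named above:

```python
# Sketch of the two predicates (invented structure, not kernel code):
# cgroup_is_populated() counts the children's state too, while
# cgroup_has_tasks() only tests this cgroup's own csets -- the latter
# is what the drain loop should be waiting on.
def is_populated(cgrp):
    return (cgrp["nr_populated_csets"]
            + cgrp["nr_populated_domain_children"]
            + cgrp["nr_populated_threaded_children"]) != 0

def has_tasks(cgrp):
    return cgrp["nr_populated_csets"] != 0

# A cgroup with no tasks of its own but one populated child: the old
# check keeps the drain loop waiting forever, the new one returns.
cgrp = {"nr_populated_csets": 0,
        "nr_populated_domain_children": 1,
        "nr_populated_threaded_children": 0}
old_check = is_populated(cgrp)
new_check = has_tasks(cgrp)
print(old_check, new_check)
```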

Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
</content>
</entry>
<entry>
<title>cgroup: Wait for dying tasks to leave on rmdir</title>
<updated>2026-03-24T20:21:40+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-24T20:21:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1b164b876c36c3eb5561dd9b37702b04401b0166'/>
<id>urn:sha1:1b164b876c36c3eb5561dd9b37702b04401b0166</id>
<content type='text'>
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") hid PF_EXITING
tasks from cgroup.procs so that systemd doesn't see tasks that have already
been reaped via waitpid(). However, the populated counter (nr_populated_csets)
is only decremented when the task later passes through cgroup_task_dead() in
finish_task_switch(). This means cgroup.procs can appear empty while the
cgroup is still populated, causing rmdir to fail with -EBUSY.

Fix this by making cgroup_rmdir() wait for dying tasks to fully leave. If the
cgroup is populated but all remaining tasks have PF_EXITING set (the task
iterator returns none due to the existing filter), wait for a kick from
cgroup_task_dead() and retry. The wait is brief as tasks are removed from the
cgroup's css_set between PF_EXITING assertion in do_exit() and
cgroup_task_dead() in finish_task_switch().

v2: cgroup_is_populated() true to false transition happens under css_set_lock
    not cgroup_mutex, so retest under css_set_lock before sleeping to avoid
    missed wakeups (Sebastian).
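
The retest-under-lock pattern from the v2 note can be sketched in userspace terms (Python threading stands in for css_set_lock and for the kick from cgroup_task_dead(); all names are invented):

```python
import threading

# Sketch of "retest under the lock before sleeping": the waiter
# re-checks the condition while holding the lock that guards its
# transition, so a wakeup between an unlocked test and the sleep
# cannot be missed.
lock = threading.Lock()
cond = threading.Condition(lock)
state = {"populated": True}

def drain():
    with cond:
        while state["populated"]:
            cond.wait()

def task_dead():
    with cond:
        state["populated"] = False
        cond.notify_all()

t = threading.Thread(target=drain)
t.start()
task_dead()
t.join(timeout=5)
print(t.is_alive())
```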

Fixes: a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202603222104.2c81684e-lkp@intel.com
Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Bert Karwatzki &lt;spasswolf@web.de&gt;
Cc: Michal Koutny &lt;mkoutny@suse.com&gt;
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>cgroup/dmem: remove region parameter from dmemcg_parse_limit</title>
<updated>2026-03-21T19:24:02+00:00</updated>
<author>
<name>Thadeu Lima de Souza Cascardo</name>
<email>cascardo@igalia.com</email>
</author>
<published>2026-03-19T21:22:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6675af9c1a3251ff95ef1290f9317ba0e83ce99d'/>
<id>urn:sha1:6675af9c1a3251ff95ef1290f9317ba0e83ce99d</id>
<content type='text'>
dmemcg_parse_limit does not use the region parameter. Remove it.

Signed-off-by: Thadeu Lima de Souza Cascardo &lt;cascardo@igalia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace global cgroup_file_kn_lock with per-cgroup_file lock</title>
<updated>2026-03-11T22:16:21+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2026-03-11T01:01:01+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4ef420b3450026b56807e5d53001f80eb495403c'/>
<id>urn:sha1:4ef420b3450026b56807e5d53001f80eb495403c</id>
<content type='text'>
Replace the global cgroup_file_kn_lock with a per-cgroup_file spinlock
to eliminate cross-cgroup contention; the global lock was not actually
protecting any data shared between different cgroups.

The lock is initialized in cgroup_add_file() alongside timer_setup().
No lock acquisition is needed during initialization since the cgroup
directory is being populated under cgroup_mutex and no concurrent
accessors exist at that point.
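
A userspace sketch of the conversion, with invented names (one lock embedded in each file object instead of a single lock shared by all of them):

```python
import threading

# Before: every notification on every file contends on one lock.
GLOBAL_LOCK = threading.Lock()

class CgroupFile:
    def __init__(self):
        # After: a per-file lock, initialized when the file is
        # created, so notifications on unrelated files no longer
        # contend with each other.
        self.lock = threading.Lock()
        self.notified = 0

    def notify(self):
        with self.lock:
            self.notified += 1

a, b = CgroupFile(), CgroupFile()
a.notify()
b.notify()
print(a.notified, b.notified)
```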

Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: add lockless fast-path checks to cgroup_file_notify()</title>
<updated>2026-03-11T22:16:21+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2026-03-11T01:01:00+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4616120fca7f6d48b4c640e3975352e451e9c2ce'/>
<id>urn:sha1:4616120fca7f6d48b4c640e3975352e451e9c2ce</id>
<content type='text'>
Add lockless checks before acquiring cgroup_file_kn_lock:

1. READ_ONCE(cfile-&gt;kn) NULL check to skip torn-down files.
2. READ_ONCE(cfile-&gt;notified_at) rate-limit check to skip when
   within the notification interval.  If within the interval, arm
   the deferred timer via timer_reduce() and confirm it is pending
   before returning -- if the timer fired in between, fall through
   to the lock path so the notification is not lost.

Both checks have safe error directions -- a stale read can only
cause unnecessary lock acquisition, never a missed notification.

The critical section is simplified to just taking a kernfs_get()
reference and updating notified_at.

Annotate cfile-&gt;kn and cfile-&gt;notified_at write sites with
WRITE_ONCE() to pair with the lockless readers.
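
A hedged userspace sketch of the two fast-path checks (invented names; plain Python reads stand in for READ_ONCE(), and the deferred-timer arming via timer_reduce() is omitted for brevity):

```python
import operator
import threading
import time

class CgroupFile:
    INTERVAL = 0.1  # stand-in for the kernel's notification interval

    def __init__(self):
        self.lock = threading.Lock()
        self.kn = object()       # becomes None once the file is torn down
        self.notified_at = -1.0  # far enough in the past

    def notify(self):
        # Fast path 1: skip torn-down files without taking the lock.
        if self.kn is None:
            return False
        # Fast path 2: skip while inside the rate-limit window. A
        # stale read errs toward taking the lock, never toward losing
        # a notification.
        if operator.lt(time.monotonic() - self.notified_at, self.INTERVAL):
            return False
        # Slow path: the critical section only re-checks and updates
        # the timestamp.
        with self.lock:
            if self.kn is None:
                return False
            self.notified_at = time.monotonic()
        return True

f = CgroupFile()
first = f.notify()
second = f.notify()
print(first, second)
```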

Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: reduce cgroup_file_kn_lock hold time in cgroup_file_notify()</title>
<updated>2026-03-11T22:16:21+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2026-03-11T01:00:59+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=05070cd654f38346d051b8c411faff196fa58880'/>
<id>urn:sha1:05070cd654f38346d051b8c411faff196fa58880</id>
<content type='text'>
cgroup_file_notify() calls kernfs_notify() while holding the global
cgroup_file_kn_lock.  kernfs_notify() does non-trivial work including
wake_up_interruptible() and acquisition of a second global spinlock
(kernfs_notify_lock), inflating the hold time.

Take a kernfs_get() reference under the lock and call kernfs_notify()
after dropping it, following the pattern from cgroup_file_show().
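
The pattern can be sketched in userspace terms (invented names; a dict stands in for the cgroup_file, a counter for the kernfs refcount, and a list for the wakeup work):

```python
import threading

lock = threading.Lock()
events = []

def kernfs_notify(kn):
    # Stands in for the heavy wakeup work now done outside the lock.
    events.append(kn)

def file_notify(cfile):
    with lock:
        kn = cfile.get("kn")
        if kn is None:
            return
        # kernfs_get(): pin the object so it stays valid after the
        # lock is dropped.
        cfile["refs"] += 1
    # The lock is no longer held here, so other notifiers can proceed
    # while the expensive call runs.
    kernfs_notify(kn)
    cfile["refs"] -= 1    # kernfs_put()

cfile = {"kn": "memory.events", "refs": 0}
file_notify(cfile)
print(events, cfile["refs"])
```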

Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
