summaryrefslogtreecommitdiff
path: root/include/linux/cgroup.h
diff options
context:
space:
mode:
authorTejun Heo <tj@kernel.org>2014-02-25 10:04:01 -0500
committerTejun Heo <tj@kernel.org>2014-02-25 10:04:01 -0500
commitb3dc094e93905ae9c1bc0815402ad8e5b203d068 (patch)
tree6d99ba4737ccbf7ce94f06937db571a7fcb902a4 /include/linux/cgroup.h
parentc75611282cf1bf717c1866e7a7eb4d0743815187 (diff)
downloadlwn-b3dc094e93905ae9c1bc0815402ad8e5b203d068.tar.gz
lwn-b3dc094e93905ae9c1bc0815402ad8e5b203d068.zip
cgroup: use css_set->mg_tasks to track target tasks during migration
Currently, while migrating tasks from one cgroup to another, cgroup_attach_task() builds a flex array of all target tasks; unfortunately, this has a couple issues. * Flex array has size limit. On 64bit, struct task_and_cgroup is 24bytes making the flex element limit around 87k. It is a high number but not impossible to hit. This means that the current cgroup implementation can't migrate a process with more than 87k threads. * Process migration involves memory allocation whose size is dependent on the number of threads the process has. This means that cgroup core can't guarantee success or failure of multi-process migrations as memory allocation failure can happen in the middle. This is in part because cgroup can't grab threadgroup locks of multiple processes at the same time, so when there are multiple processes to migrate, it is imposible to tell how many tasks are to be migrated beforehand. Note that this already affects cgroup_transfer_tasks(). cgroup currently cannot guarantee atomic success or failure of the operation. It may fail in the middle and after such failure cgroup doesn't have enough information to roll back properly. It just aborts with some tasks migrated and others not. To resolve the situation, this patch updates the migration path to use task->cg_list to track target tasks. The previous patch already added css_set->mg_tasks and updated iterations in non-migration paths to include them during task migration. This patch updates migration path to actually make use of it. Instead of putting onto a flex_array, each target task is moved from its css_set->tasks list to css_set->mg_tasks and the migration path keeps trace of all the source css_sets and the associated cgroups. Once all source css_sets are determined, the destination css_set for each is determined, linked to the matching source css_set and put on a separate list. To iterate the target tasks, migration path just needs to iterat through either the source or target css_sets, depending on whether migration has been committed or not, and the tasks on their ->mg_tasks lists. cgroup_taskset is updated to contain the list_heads for source and target css_sets and the iteration cursor. cgroup_taskset_*() are accordingly updated to walk through css_sets and their ->mg_tasks. This resolves the above listed issues with moderate additional complexity. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
Diffstat (limited to 'include/linux/cgroup.h')
-rw-r--r--include/linux/cgroup.h16
1 files changed, 16 insertions, 0 deletions
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528e2aed36c3..3a1cb265afd6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -346,6 +346,22 @@ struct css_set {
*/
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
+ /*
+ * List of csets participating in the on-going migration either as
+ * source or destination. Protected by cgroup_mutex.
+ */
+ struct list_head mg_node;
+
+ /*
+ * If this cset is acting as the source of migration the following
+ * two fields are set. mg_src_cgrp is the source cgroup of the
+ * on-going migration and mg_dst_cset is the destination cset the
+ * target tasks on this cset should be migrated to. Protected by
+ * cgroup_mutex.
+ */
+ struct cgroup *mg_src_cgrp;
+ struct css_set *mg_dst_cset;
+
/* For RCU-protected deletion */
struct rcu_head rcu_head;
};