<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/include/linux/sched.h, branch doc/4.4</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=doc%2F4.4</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=doc%2F4.4'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2015-09-04T23:54:41+00:00</updated>
<entry>
<title>mm: defer flush of writable TLB entries</title>
<updated>2015-09-04T23:54:41+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2015-09-04T22:47:35+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=d950c9477d51f0cefc2ed3cf76e695d46af0d9c1'/>
<id>urn:sha1:d950c9477d51f0cefc2ed3cf76e695d46af0d9c1</id>
<content type='text'>
If a PTE is unmapped and it's dirty then it was writable recently.  Due to
deferred TLB flushing, it's best to assume a writable TLB cache entry
exists.  With that assumption, the TLB must be flushed before any IO can
start or the page is freed to avoid lost writes or data corruption.  This
patch defers flushing of potentially writable TLBs as long as possible.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: send one IPI per CPU to TLB flush all entries after unmapping pages</title>
<updated>2015-09-04T23:54:41+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2015-09-04T22:47:32+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=72b252aed506b8f1a03f7abd29caef4cdf6a043b'/>
<id>urn:sha1:72b252aed506b8f1a03f7abd29caef4cdf6a043b</id>
<content type='text'>
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs.  There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.

On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high.  This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped.  When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.

Architectures wishing to do this must give the following guarantee.

        If a clean page is unmapped and not immediately flushed, the
        architecture must guarantee that a write to that linear address
        from a CPU with a cached TLB entry will trap a page fault.

This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting.  The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry.  An additional
architecture helper called flush_tlb_local is required.  It's a trivial
wrapper with some accounting in the x86 case.

The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure.  The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite.  The test case uses NR_CPU
readers of mapped files that consume 10*RAM.

Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User          581.00       611.43
System       5804.93      4111.76
Elapsed       161.03       122.12

This is showing that the readers completed 24.40% faster with 29% less
system CPU time.  From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.

The impact is lower on a single socket machine.

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User           58.09        57.64
System        111.82        76.56
Elapsed        27.29        22.55

It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.

The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages.  It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes.  Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.

[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2015-09-01T03:26:22+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-01T03:26:22+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a1d8561172f369ba56d636df49a6b4d6d77e2123'/>
<id>urn:sha1:a1d8561172f369ba56d636df49a6b4d6d77e2123</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:
 "The biggest change in this cycle is the rewrite of the main SMP load
  balancing metric: the CPU load/utilization.  The main goal was to make
  the metric more precise and more representative - see the changelog of
  this commit for the gory details:

    9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

  It is done in a way that significantly reduces complexity of the code:

    5 files changed, 249 insertions(+), 494 deletions(-)

  and the performance testing results are encouraging.  Nevertheless we
  need to keep an eye on potential regressions, since this potentially
  affects every SMP workload in existence.

  This work comes from Yuyang Du.

  Other changes:

   - SCHED_DL updates.  (Andrea Parri)

   - Simplify architecture callbacks by removing finish_arch_switch().
     (Peter Zijlstra et al)

   - cputime accounting: guarantee stime + utime == rtime.  (Peter
     Zijlstra)

   - optimize idle CPU wakeups some more - inspired by Facebook server
     loads.  (Mike Galbraith)

   - stop_machine fixes and updates.  (Oleg Nesterov)

   - Introduce the 'trace_sched_waking' tracepoint.  (Peter Zijlstra)

   - sched/numa tweaks.  (Srikar Dronamraju)

   - misc fixes and small cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  sched/deadline: Fix comment in enqueue_task_dl()
  sched/deadline: Fix comment in push_dl_tasks()
  sched: Change the sched_class::set_cpus_allowed() calling context
  sched: Make sched_class::set_cpus_allowed() unconditional
  sched: Fix a race between __kthread_bind() and sched_setaffinity()
  sched: Ensure a task has a non-normalized vruntime when returning back to CFS
  sched/numa: Fix NUMA_DIRECT topology identification
  tile: Reorganize _switch_to()
  sched, sparc32: Update scheduler comments in copy_thread()
  sched: Remove finish_arch_switch()
  sched, tile: Remove finish_arch_switch
  sched, sh: Fold finish_arch_switch() into switch_to()
  sched, score: Remove finish_arch_switch()
  sched, avr32: Remove finish_arch_switch()
  sched, MIPS: Get rid of finish_arch_switch()
  sched, arm: Remove finish_arch_switch()
  sched/fair: Clean up load average references
  sched/fair: Provide runnable_load_avg back to cfs_rq
  sched/fair: Remove task and group entity load when they are dead
  sched/fair: Init cfs_rq's sched_entity load average
  ...
</content>
</entry>
<entry>
<title>sched: Fix a race between __kthread_bind() and sched_setaffinity()</title>
<updated>2015-08-12T10:06:09+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-05-15T15:43:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=25834c73f93af7f0712c98ca4593691592e6b360'/>
<id>urn:sha1:25834c73f93af7f0712c98ca4593691592e6b360</id>
<content type='text'>
Because sched_setscheduler() checks p-&gt;flags &amp; PF_NO_SETAFFINITY
without locks, a caller might observe an old value and race with the
set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo
it:

	__kthread_bind()
	  do_set_cpus_allowed()
						&lt;SYSCALL&gt;
						  sched_setaffinity()
						    if (p-&gt;flags &amp; PF_NO_SETAFFINITIY)
						    set_cpus_allowed_ptr()
	  p-&gt;flags |= PF_NO_SETAFFINITY

Fix the bug by putting everything under the regular scheduler locks.

This also closes a hole in the serialization of task_struct::{nr_,}cpus_allowed.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: dedekind1@gmail.com
Cc: juri.lelli@arm.com
Cc: mgorman@suse.de
Cc: riel@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Rewrite runnable load and utilization average tracking</title>
<updated>2015-08-03T10:21:29+00:00</updated>
<author>
<name>Yuyang Du</name>
<email>yuyang.du@intel.com</email>
</author>
<published>2015-07-15T00:04:37+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9d89c257dfb9c51a532d69397f6eed75e5168c35'/>
<id>urn:sha1:9d89c257dfb9c51a532d69397f6eed75e5168c35</id>
<content type='text'>
The idea of runnable load average (let runnable time contribute to weight)
was proposed by Paul Turner and Ben Segall, and it is still followed by
this rewrite. This rewrite aims to solve the following issues:

1. cfs_rq's load average (namely runnable_load_avg and blocked_load_avg) is
   updated at the granularity of an entity at a time, which results in the
   cfs_rq's load average is stale or partially updated: at any time, only
   one entity is up to date, all other entities are effectively lagging
   behind. This is undesirable.

   To illustrate, if we have n runnable entities in the cfs_rq, as time
   elapses, they certainly become outdated:

     t0: cfs_rq { e1_old, e2_old, ..., en_old }

   and when we update:

     t1: update e1, then we have cfs_rq { e1_new, e2_old, ..., en_old }

     t2: update e2, then we have cfs_rq { e1_old, e2_new, ..., en_old }

     ...

   We solve this by combining all runnable entities' load averages together
   in cfs_rq's avg, and update the cfs_rq's avg as a whole. This is based
   on the fact that if we regard the update as a function, then:

   w * update(e) = update(w * e) and

   update(e1) + update(e2) = update(e1 + e2), then

   w1 * update(e1) + w2 * update(e2) = update(w1 * e1 + w2 * e2)

   therefore, by this rewrite, we have an entirely updated cfs_rq at the
   time we update it:

     t1: update cfs_rq { e1_new, e2_new, ..., en_new }

     t2: update cfs_rq { e1_new, e2_new, ..., en_new }

     ...

2. cfs_rq's load average is different between top rq-&gt;cfs_rq and other
   task_group's per CPU cfs_rqs in whether or not blocked_load_average
   contributes to the load.

   The basic idea behind runnable load average (the same for utilization)
   is that the blocked state is taken into account as opposed to only
   accounting for the currently runnable state. Therefore, the average
   should include both the runnable/running and blocked load averages.
   This rewrite does that.

   In addition, we also combine runnable/running and blocked averages
   of all entities into the cfs_rq's average, and update it together at
   once. This is based on the fact that:

     update(runnable) + update(blocked) = update(runnable + blocked)

   This significantly reduces the code as we don't need to separately
   maintain/update runnable/running load and blocked load.

3. How task_group entities' share is calculated is complex and imprecise.

   We reduce the complexity in this rewrite to allow a very simple rule:
   the task_group's load_avg is aggregated from its per CPU cfs_rqs's
   load_avgs. Then group entity's weight is simply proportional to its
   own cfs_rq's load_avg / task_group's load_avg. To illustrate,

   if a task_group has { cfs_rq1, cfs_rq2, ..., cfs_rqn }, then,

   task_group_avg = cfs_rq1_avg + cfs_rq2_avg + ... + cfs_rqn_avg, then

   cfs_rqx's entity's share = cfs_rqx_avg / task_group_avg * task_group's share

To sum up, this rewrite in principle is equivalent to the current one, but
fixes the issues described above. Turns out, it significantly reduces the
code complexity and hence increases clarity and efficiency. In addition,
the new averages are more smooth/continuous (no spurious spikes and valleys)
and updated more consistently and quickly to reflect the load dynamics.

As a result, we have less load tracking overhead, better performance,
and especially better power efficiency due to more balanced load.

Signed-off-by: Yuyang Du &lt;yuyang.du@intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: arjan@linux.intel.com
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: fengguang.wu@intel.com
Cc: len.brown@intel.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: rafael.j.wysocki@intel.com
Cc: umgwanakikbuti@gmail.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1436918682-4971-3-git-send-email-yuyang.du@intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/preempt: Fix cond_resched_lock() and cond_resched_softirq()</title>
<updated>2015-08-03T10:21:24+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-07-15T09:52:04+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=fe32d3cd5e8eb0f82e459763374aa80797023403'/>
<id>urn:sha1:fe32d3cd5e8eb0f82e459763374aa80797023403</id>
<content type='text'>
These functions check should_resched() before unlocking spinlock/bh-enable:
preempt_count always non-zero =&gt; should_resched() always returns false.
cond_resched_lock() worked iff spin_needbreak is set.

This patch adds argument "preempt_offset" to should_resched().

preempt_count offset constants for that:

  PREEMPT_DISABLE_OFFSET  - offset after preempt_disable()
  PREEMPT_LOCK_OFFSET     - offset after spin_lock()
  SOFTIRQ_DISABLE_OFFSET  - offset after local_bh_distable()
  SOFTIRQ_LOCK_OFFSET     - offset after spin_lock_bh()

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alexander Graf &lt;agraf@suse.de&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers")
Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Beef up wake_wide()</title>
<updated>2015-08-03T10:21:23+00:00</updated>
<author>
<name>Mike Galbraith</name>
<email>umgwanakikbuti@gmail.com</email>
</author>
<published>2015-07-14T15:39:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=63b0e9edceec10fa41ec33393a1515a5ff444277'/>
<id>urn:sha1:63b0e9edceec10fa41ec33393a1515a5ff444277</id>
<content type='text'>
Josef Bacik reported that Facebook sees better performance with their
1:N load (1 dispatch/node, N workers/node) when carrying an old patch
to try very hard to wake to an idle CPU.  While looking at wake_wide(),
I noticed that it doesn't pay attention to the wakeup of a many partner
waker, returning 1 only when waking one of its many partners.

Correct that, letting explicit domain flags override the heuristic.

While at it, adjust task_struct bits, we don't need a 64-bit counter.

Tested-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
[ Tidy things up. ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: kernel-team&lt;Kernel-team@fb.com&gt;
Cc: morten.rasmussen@arm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1436888390.7983.49.camel@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/cputime: Guarantee stime + utime == rtime</title>
<updated>2015-08-03T10:21:21+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-06-30T09:30:54+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9d7fb04276481c59610983362d8e023d262b58ca'/>
<id>urn:sha1:9d7fb04276481c59610983362d8e023d262b58ca</id>
<content type='text'>
While the current code guarantees monotonicity for stime and utime
independently of one another, it does not guarantee that the sum of
both is equal to the total time we started out with.

This confuses things (and peoples) who look at this sum, like top, and
will report &gt;100% usage followed by a matching period of 0%.

Rework the code to provide both individual monotonicity and a coherent
sum.

Suggested-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Reported-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Tested-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: jason.low2@hp.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86</title>
<updated>2015-07-18T01:42:51+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2015-07-17T10:28:12+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5aaeb5c01c5b6c0be7b7aadbf3ace9f3a4458c3d'/>
<id>urn:sha1:5aaeb5c01c5b6c0be7b7aadbf3ace9f3a4458c3d</id>
<content type='text'>
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/fpu, sched: Dynamically allocate 'struct fpu'</title>
<updated>2015-07-18T01:42:35+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave@sr71.net</email>
</author>
<published>2015-07-17T10:28:11+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0c8c0f03e3a292e031596484275c14cf39c0ab7a'/>
<id>urn:sha1:0c8c0f03e3a292e031596484275c14cf39c0ab7a</id>
<content type='text'>
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already.  This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct.  But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
