<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/kernel/sched, branch v4.16-rc4</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v4.16-rc4</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v4.16-rc4'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2018-02-13T12:04:03+00:00</updated>
<entry>
<title>sched/cpufreq: Remove unused SUGOV_KTHREAD_PRIORITY macro</title>
<updated>2018-02-13T12:04:03+00:00</updated>
<author>
<name>Leo Yan</name>
<email>leo.yan@linaro.org</email>
</author>
<published>2018-02-08T13:48:22+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=43d1b29b27c76e7454cd6c85bec4d0e9cbb039f3'/>
<id>urn:sha1:43d1b29b27c76e7454cd6c85bec4d0e9cbb039f3</id>
<content type='text'>
Since schedutil kernel thread directly set priority to 0, the macro
SUGOV_KTHREAD_PRIORITY is not used.  So remove it.

Signed-off-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Acked-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rafael J . Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vikram Mulukutla &lt;markivx@codeaurora.org&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: http://lkml.kernel.org/r/1518097702-9665-1-git-send-email-leo.yan@linaro.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Fix DEBUG_SPINLOCK annotation for rq-&gt;lock</title>
<updated>2018-02-13T10:44:41+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2018-02-06T16:52:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=269d599271fa604f09d5cb0093c5dd5d59964dd5'/>
<id>urn:sha1:269d599271fa604f09d5cb0093c5dd5d59964dd5</id>
<content type='text'>
Mark noticed that he had sporadic "spinlock recursion" warnings from
the DEBUG_SPINLOCK code. Now rq-&gt;lock is special in that the owner
changes in the middle of a context switch.

It so happens that we fix up the lock.owner too late, @prev can run
(remotely) the moment prev-&gt;on_cpu is cleared, this then allows @prev
to again try and acquire this rq-&gt;lock and trigger this warning.

So we have to switch lock.owner before clearing prev-&gt;on_cpu.

Do this by moving the DEBUG_SPINLOCK annotation from after switch_to()
to before switch_to() and collect all lockdep annotations there into
prepare_lock_switch() to mirror the existing finish_lock_switch().

Debugged-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/rt: Make update_curr_rt() more accurate</title>
<updated>2018-02-13T10:44:41+00:00</updated>
<author>
<name>Wen Yang</name>
<email>wen.yang99@zte.com.cn</email>
</author>
<published>2018-02-06T01:53:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a7711602c7b79950ea437178f601b52ab08ef659'/>
<id>urn:sha1:a7711602c7b79950ea437178f601b52ab08ef659</id>
<content type='text'>
rq-&gt;clock_task may be updated between the two calls of
rq_clock_task() in update_curr_rt(). Calling rq_clock_task() only
once makes it more accurate and efficient, taking update_curr() as
reference.

Signed-off-by: Wen Yang &lt;wen.yang99@zte.com.cn&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Jiang Biao &lt;jiang.biao2@zte.com.cn&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1517882008-44552-1-git-send-email-wen.yang99@zte.com.cn
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Make update_curr_dl() more accurate</title>
<updated>2018-02-13T10:44:40+00:00</updated>
<author>
<name>Wen Yang</name>
<email>wen.yang99@zte.com.cn</email>
</author>
<published>2018-02-06T01:55:48+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6fe0ce1eb04f99a1eb1eb6e7f775666966cf6c80'/>
<id>urn:sha1:6fe0ce1eb04f99a1eb1eb6e7f775666966cf6c80</id>
<content type='text'>
rq-&gt;clock_task may be updated between the two calls of
rq_clock_task() in update_curr_dl(). Calling rq_clock_task() only
once makes it more accurate and efficient, taking update_curr() as
reference.

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Wen Yang &lt;wen.yang99@zte.com.cn&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Jiang Biao &lt;jiang.biao2@zte.com.cn&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1517882148-44599-1-git-send-email-wen.yang99@zte.com.cn
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2018-02-07T06:15:42+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-02-07T06:15:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a2e5790d841658485d642196dbb0927303d6c22f'/>
<id>urn:sha1:a2e5790d841658485d642196dbb0927303d6c22f</id>
<content type='text'>
Merge misc updates from Andrew Morton:

 - kasan updates

 - procfs

 - lib/bitmap updates

 - other lib/ updates

 - checkpatch tweaks

 - rapidio

 - ubsan

 - pipe fixes and cleanups

 - lots of other misc bits

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (114 commits)
  Documentation/sysctl/user.txt: fix typo
  MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns
  MAINTAINERS: update various PALM patterns
  MAINTAINERS: update "ARM/OXNAS platform support" patterns
  MAINTAINERS: update Cortina/Gemini patterns
  MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern
  MAINTAINERS: remove ANDROID ION pattern
  mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors
  mm: docs: fix parameter names mismatch
  mm: docs: fixup punctuation
  pipe: read buffer limits atomically
  pipe: simplify round_pipe_size()
  pipe: reject F_SETPIPE_SZ with size over UINT_MAX
  pipe: fix off-by-one error when checking buffer limits
  pipe: actually allow root to exceed the pipe buffer limits
  pipe, sysctl: remove pipe_proc_fn()
  pipe, sysctl: drop 'min' parameter from pipe-max-size converter
  kasan: rework Kconfig settings
  crash_dump: is_kdump_kernel can be boolean
  kernel/mutex: mutex_is_locked can be boolean
  ...
</content>
</entry>
<entry>
<title>cpumask: make cpumask_size() return "unsigned int"</title>
<updated>2018-02-07T02:32:45+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-02-06T23:39:37+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4de373a12f3c551f9263f37d609f264b440adfec'/>
<id>urn:sha1:4de373a12f3c551f9263f37d609f264b440adfec</id>
<content type='text'>
CPUmasks are never big enough to warrant 64-bit code.

Space savings:

	add/remove: 0/0 grow/shrink: 1/4 up/down: 3/-17 (-14)
	Function                                     old     new   delta
	sched_init_numa                             1530    1533      +3
	compat_sys_sched_setaffinity                 160     159      -1
	sys_sched_getaffinity                        197     195      -2
	sys_sched_setaffinity                        183     176      -7
	compat_sys_sched_getaffinity                 179     172      -7

Link: http://lkml.kernel.org/r/20171204165531.GA8221@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'linus' into sched/urgent, to resolve conflicts</title>
<updated>2018-02-06T20:12:31+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2018-02-06T20:12:31+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=82845079160817cc6ac64e5321bbd935e0a47b3a'/>
<id>urn:sha1:82845079160817cc6ac64e5321bbd935e0a47b3a</id>
<content type='text'>
 Conflicts:
	arch/arm64/kernel/entry.S
	arch/x86/Kconfig
	include/linux/sched/mm.h
	kernel/fork.c

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS</title>
<updated>2018-02-06T09:20:37+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2018-01-30T10:45:55+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=32e839dda3ba576943365f0f5817ce5c843137dc'/>
<id>urn:sha1:32e839dda3ba576943365f0f5817ce5c843137dc</id>
<content type='text'>
The select_idle_sibling() (SIS) rewrite in commit:

  10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()")

... replaced a domain iteration with a search that broadly speaking
does a wrapped walk of the scheduler domain sharing a last-level-cache.

While this had a number of improvements, one consequence is that two tasks
that share a waker/wakee relationship push each other around a socket. Even
though two tasks may be active, all cores are evenly used. This is great from
a search perspective and spreads a load across individual cores, but it has
adverse consequences for cpufreq. As each CPU has relatively low utilisation,
cpufreq may decide the utilisation is too low to used a higher P-state and
overall computation throughput suffers.

While individual cpufreq and cpuidle drivers may compensate by artifically
boosting P-state (at c0) or avoiding lower C-states (during idle), it does
not help if hardware-based cpufreq (e.g. HWP) is used.

This patch tracks a recently used CPU based on what CPU a task was running
on when it last was a waker a CPU it was recently using when a task is a
wakee. During SIS, the recently used CPU is used as a target if it's still
allowed by the task and is idle.

The benefit may be non-obvious so consider an example of two tasks
communicating back and forth. Task A may be an application doing IO where
task B is a kworker or kthread like journald. Task A may issue IO, wake
B and B wakes up A on completion.  With the existing scheme this may look
like the following (potentially different IDs if SMT is in use but similar
principal applies).

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 2)
 A (cpu 2)	wake	B (wakes on cpu 3)
 etc.

A careful reader may wonder why CPU 0 was not idle when B wakes A the
first time and it's simply due to the fact that A can be rescheduled to
another CPU and the pattern is that prev == target when B tries to wakeup A
and the information about CPU 0 has been lost.

With this patch, the pattern is more likely to be:

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 0)
 A (cpu 0)	wake	B (wakes on cpu 1)
 etc

i.e. two communicating casts are more likely to use just two cores instead
of all available cores sharing a LLC.

The most dramatic speedup was noticed on dbench using the XFS filesystem on
UMA as clients interact heavily with workqueues in that configuration. Note
that a similar speedup is not observed on ext4 as the wakeup pattern
is different:

                          4.15.0-rc9             4.15.0-rc9
                           waprev-v1        biasancestor-v1
 Hmean      1      287.54 (   0.00%)      817.01 ( 184.14%)
 Hmean      2     1268.12 (   0.00%)     1781.24 (  40.46%)
 Hmean      4     1739.68 (   0.00%)     1594.47 (  -8.35%)
 Hmean      8     2464.12 (   0.00%)     2479.56 (   0.63%)
 Hmean     64     1455.57 (   0.00%)     1434.68 (  -1.44%)

The results can be less dramatic on NUMA where automatic balancing interferes
with the test. It's also known that network benchmarks running on localhost
also benefit quite a bit from this patch (roughly 10% on netperf RR for UDP
and TCP depending on the machine). Hackbench also seens small improvements
(6-11% depending on machine and thread count). The facebook schbench was also
tested but in most cases showed little or no different to wakeup latencies.

Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20180130104555.4125-5-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Do not migrate if the prev_cpu is idle</title>
<updated>2018-02-06T09:20:36+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2018-01-30T10:45:54+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=806486c377e33ab662de6d47902e9e2a32b79368'/>
<id>urn:sha1:806486c377e33ab662de6d47902e9e2a32b79368</id>
<content type='text'>
wake_affine_idle() prefers to move a task to the current CPU if the
wakeup is due to an interrupt. The expectation is that the interrupt
data is cache hot and relevant to the waking task as well as avoiding
a search. However, there is no way to determine if there was cache hot
data on the previous CPU that may exceed the interrupt data. Furthermore,
round-robin delivery of interrupts can migrate tasks around a socket where
each CPU is under-utilised.  This can interact badly with cpufreq which
makes decisions based on per-cpu data. It has been observed on machines
with HWP that p-states are not boosted to their maximum levels even though
the workload is latency and throughput sensitive.

This patch uses the previous CPU for the task if it's idle and cache-affine
with the current CPU even if the current CPU is idle due to the wakup
being related to the interrupt. This reduces migrations at the cost of
the interrupt data not being cache hot when the task wakes.

A variety of workloads were tested on various machines and no adverse
impact was noticed that was outside noise. dbench on ext4 on UMA showed
roughly 10% reduction in the number of CPU migrations and it is a case
where interrupts are frequent for IO competions. In most cases, the
difference in performance is quite small but variability is often
reduced. For example, this is the result for pgbench running on a UMA
machine with different numbers of clients.

                          4.15.0-rc9             4.15.0-rc9
                            baseline              waprev-v1
 Hmean     1     22096.28 (   0.00%)    22734.86 (   2.89%)
 Hmean     4     74633.42 (   0.00%)    75496.77 (   1.16%)
 Hmean     7    115017.50 (   0.00%)   113030.81 (  -1.73%)
 Hmean     12   126209.63 (   0.00%)   126613.40 (   0.32%)
 Hmean     16   131886.91 (   0.00%)   130844.35 (  -0.79%)
 Stddev    1       636.38 (   0.00%)      417.11 (  34.46%)
 Stddev    4       614.64 (   0.00%)      583.24 (   5.11%)
 Stddev    7       542.46 (   0.00%)      435.45 (  19.73%)
 Stddev    12      173.93 (   0.00%)      171.50 (   1.40%)
 Stddev    16      671.42 (   0.00%)      680.30 (  -1.32%)
 CoeffVar  1         2.88 (   0.00%)        1.83 (  36.26%)

Note that the different in performance is marginal but for low utilisation,
there is less variability.

Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20180130104555.4125-4-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Restructure wake_affine*() to return a CPU id</title>
<updated>2018-02-06T09:20:35+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2018-01-30T10:45:53+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3b76c4a33959ca98a573cd9c94c8690d123912ca'/>
<id>urn:sha1:3b76c4a33959ca98a573cd9c94c8690d123912ca</id>
<content type='text'>
This is a preparation patch that has wake_affine*() return a CPU ID instead of
a boolean. The intent is to allow the wake_affine() helpers to be avoided
if a decision is already made. This patch has no functional change.

Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20180130104555.4125-3-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
