<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/kernel/time/timer.c, branch doc/4.9</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=doc%2F4.9</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=doc%2F4.9'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2016-07-15T08:41:42+00:00</updated>
<entry>
<title>timers/core: Convert to hotplug state machine</title>
<updated>2016-07-15T08:41:42+00:00</updated>
<author>
<name>Richard Cochran</name>
<email>rcochran@linutronix.de</email>
</author>
<published>2016-07-13T17:16:59+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=24f73b99716a9cd8cbb345c41ced6b3b5ed94006'/>
<id>urn:sha1:24f73b99716a9cd8cbb345c41ced6b3b5ed94006</id>
<content type='text'>
When tearing down, call timers_dead_cpu() before notify_dead().
There is a hidden dependency between:

 - timers
 - block multiqueue
 - rcutree

If timers_dead_cpu() comes later than blk_mq_queue_reinit_notify()
that latter function causes a RCU stall.

Signed-off-by: Richard Cochran &lt;rcochran@linutronix.de&gt;
Signed-off-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.566790058@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'timers/fast-wheel' into timers/core</title>
<updated>2016-07-07T08:35:28+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2016-07-07T08:35:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4b4b20852d1009c5e8bc357b22353b62e3a241c7'/>
<id>urn:sha1:4b4b20852d1009c5e8bc357b22353b62e3a241c7</id>
<content type='text'>
</content>
</entry>
<entry>
<title>timers: Implement optimization for same expiry time in mod_timer()</title>
<updated>2016-07-07T08:35:12+00:00</updated>
<author>
<name>Anna-Maria Gleixner</name>
<email>anna-maria@linutronix.de</email>
</author>
<published>2016-07-04T09:50:40+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f00c0afdfa625165a609513bc74164d56752ec3e'/>
<id>urn:sha1:f00c0afdfa625165a609513bc74164d56752ec3e</id>
<content type='text'>
The existing optimization for same expiry time in mod_timer() checks whether
the timer expiry time is the same as the new requested expiry time. In the old
timer wheel implementation this does not take the slack batching into account,
neither does the new implementation evaluate whether the new expiry time will
requeue the timer to the same bucket.

To optimize that, we can calculate the resulting bucket and check if the new
expiry time is different from the current expiry time. This calculation
happens outside the base lock held region. If the resulting bucket is the same
we can avoid taking the base lock and requeueing the timer.

If the timer needs to be requeued then we have to check under the base lock
whether the base time has changed between the lockless calculation and taking
the lock. If it has changed we need to recalculate under the lock.

This optimization takes effect for timers which are enqueued into the less
granular wheel levels (1 and above). With a simple test case the functionality
has been verified:

            Before        After
 Match:       5.5%        86.6%
 Requeue:    94.5%        13.4%
 Recalc:                  &lt;0.01%

In the non optimized case the timer is requeued in 94.5% of the cases. With
the index optimization in place the requeue rate drops to 13.4%. The case
where the lockless index calculation has to be redone is less than 0.01%.

With a real world test case (networking) we observed the following changes:

            Before        After
 Match:      97.8%        99.7%
 Requeue:     2.2%         0.3%
 Recalc:                  &lt;0.001%

That means two percent fewer lock/requeue/unlock operations done in one of
the hot path use cases of timers.

Signed-off-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.778527749@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>timers: Split out index calculation</title>
<updated>2016-07-07T08:35:12+00:00</updated>
<author>
<name>Anna-Maria Gleixner</name>
<email>anna-maria@linutronix.de</email>
</author>
<published>2016-07-04T09:50:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ffdf047728f8f93df896b58049c7513856027141'/>
<id>urn:sha1:ffdf047728f8f93df896b58049c7513856027141</id>
<content type='text'>
For further optimizations we need to seperate index calculation
from queueing. No functional change.

Signed-off-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.691159619@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>timers: Only wake softirq if necessary</title>
<updated>2016-07-07T08:35:11+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-07-04T09:50:37+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4e85876a9d2a977b4a07389da8c07edf76d10825'/>
<id>urn:sha1:4e85876a9d2a977b4a07389da8c07edf76d10825</id>
<content type='text'>
With the wheel forwading in place and with the HZ=1000 4ms folding we can
avoid running the softirq at all.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.607650550@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>timers: Forward the wheel clock whenever possible</title>
<updated>2016-07-07T08:35:11+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-07-04T09:50:36+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a683f390b93f4d1292f849fc48d28e322046120f'/>
<id>urn:sha1:a683f390b93f4d1292f849fc48d28e322046120f</id>
<content type='text'>
The wheel clock is stale when a CPU goes into a long idle sleep. This has the
side effect that timers which are queued end up in the outer wheel levels.
That results in coarser granularity.

To solve this, we keep track of the idle state and forward the wheel clock
whenever possible.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.512039360@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>timers: Optimize collect_expired_timers() for NOHZ</title>
<updated>2016-07-07T08:35:10+00:00</updated>
<author>
<name>Anna-Maria Gleixner</name>
<email>anna-maria@linutronix.de</email>
</author>
<published>2016-07-04T09:50:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=236968383cf5cd48835ff0d8a265e299e220d140'/>
<id>urn:sha1:236968383cf5cd48835ff0d8a265e299e220d140</id>
<content type='text'>
After a NOHZ idle sleep the timer wheel must be forwarded to current jiffies.
There might be expired timers so the current code loops and checks the expired
buckets for timers. This can take quite some time for long NOHZ idle periods.

The pending bitmask in the timer base allows us to do a quick search for the
next expiring timer and therefore a fast forward of the base time which
prevents pointless long lasting loops.

For a 3 seconds idle sleep this reduces the catchup time from ~1ms to 5us.

Signed-off-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.351296290@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>timers: Move __run_timers() function</title>
<updated>2016-07-07T08:35:09+00:00</updated>
<author>
<name>Anna-Maria Gleixner</name>
<email>anna-maria@linutronix.de</email>
</author>
<published>2016-07-04T09:50:33+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=73420fea80c6c376d91a69defe64013baa0d7e95'/>
<id>urn:sha1:73420fea80c6c376d91a69defe64013baa0d7e95</id>
<content type='text'>
Move __run_timers() below __next_timer_interrupt() and next_pending_bucket()
in preparation for __run_timers() NOHZ optimization.

No functional change.

Signed-off-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.271872665@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>timers: Remove set_timer_slack() leftovers</title>
<updated>2016-07-07T08:35:09+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-07-04T09:50:31+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=53bf837b78d155b8e1110b3c25b4d0d6391b8ff3'/>
<id>urn:sha1:53bf837b78d155b8e1110b3c25b4d0d6391b8ff3</id>
<content type='text'>
We now have implicit batching in the timer wheel. The slack API is no longer
used, so remove it.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Alan Stern &lt;stern@rowland.harvard.edu&gt;
Cc: Andrew F. Davis &lt;afd@ti.com&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Dmitry Eremin-Solenikov &lt;dbaryshkov@gmail.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Jaehoon Chung &lt;jh80.chung@samsung.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mathias Nyman &lt;mathias.nyman@intel.com&gt;
Cc: Pali Rohár &lt;pali.rohar@gmail.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Sebastian Reichel &lt;sre@kernel.org&gt;
Cc: Ulf Hansson &lt;ulf.hansson@linaro.org&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>timers: Switch to a non-cascading wheel</title>
<updated>2016-07-07T08:35:09+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-07-04T09:50:30+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=500462a9de657f86edaa102f8ab6bff7f7e43fc2'/>
<id>urn:sha1:500462a9de657f86edaa102f8ab6bff7f7e43fc2</id>
<content type='text'>
The current timer wheel has some drawbacks:

1) Cascading:

   Cascading can be an unbound operation and is completely pointless in most
   cases because the vast majority of the timer wheel timers are canceled or
   rearmed before expiration. (They are used as timeout safeguards, not as
   real timers to measure time.)

2) No fast lookup of the next expiring timer:

   In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
   must fast forward the base time to the current value of jiffies. As we
   have no way to find the next expiring timer fast, the code loops linearly
   and increments the base time one by one and checks for expired timers
   in each step. This causes unbound overhead spikes exactly in the moment
   when we should wake up as fast as possible.

After a thorough analysis of real world data gathered on laptops,
workstations, webservers and other machines (thanks Chris!) I came to the
conclusion that the current 'classic' timer wheel implementation can be
modified to address the above issues.

The vast majority of timer wheel timers is canceled or rearmed before
expiry. Most of them are timeouts for networking and other I/O tasks. The
nature of timeouts is to catch the exception from normal operation (TCP ack
timed out, disk does not respond, etc.). For these kinds of timeouts the
accuracy of the timeout is not really a concern. Timeouts are very often
approximate worst-case values and in case the timeout fires, we already
waited for a long time and performance is down the drain already.

The few timers which actually expire can be split into two categories:

 1) Short expiry times which expect halfways accurate expiry

 2) Long term expiry times are inaccurate today already due to the
    batching which is done for NOHZ automatically and also via the
    set_timer_slack() API.

So for long term expiry timers we can avoid the cascading property and just
leave them in the less granular outer wheels until expiry or
cancelation. Timers which are armed with a timeout larger than the wheel
capacity are no longer cascaded. We expire them with the longest possible
timeout (6+ days). We have not observed such timeouts in our data collection,
but at least we handle them, applying the rule of the least surprise.

To avoid extending the wheel levels for HZ=1000 so we can accomodate the
longest observed timeouts (5 days in the network conntrack code) we reduce the
first level granularity on HZ=1000 to 4ms, which effectively is the same as
the HZ=250 behaviour. From our data analysis there is nothing which relies on
that 1ms granularity and as a side effect we get better batching and timer
locality for the networking code as well.

Contrary to the classic wheel the granularity of the next wheel is not the
capacity of the first wheel. The granularities of the wheels are in the
currently chosen setting 8 times the granularity of the previous wheel.

So for HZ=250 we end up with the following granularity levels:

 Level Offset   Granularity                  Range
     0      0          4 ms                 0 ms -        252 ms
     1     64         32 ms               256 ms -       2044 ms (256ms - ~2s)
     2    128        256 ms              2048 ms -      16380 ms (~2s   - ~16s)
     3    192       2048 ms (~2s)       16384 ms -     131068 ms (~16s  - ~2m)
     4    256      16384 ms (~16s)     131072 ms -    1048572 ms (~2m   - ~17m)
     5    320     131072 ms (~2m)     1048576 ms -    8388604 ms (~17m  - ~2h)
     6    384    1048576 ms (~17m)    8388608 ms -   67108863 ms (~2h   - ~18h)
     7    448    8388608 ms (~2h)    67108864 ms -  536870911 ms (~18h  - ~6d)

That's a worst case inaccuracy of 12.5% for the timers which are queued at the
beginning of a level.

So the new wheel concept addresses the old issues:

1) Cascading is avoided completely

2) By keeping the timers in the bucket until expiry/cancelation we can track
   the buckets which have timers enqueued in a bucket bitmap and therefore can
   look up the next expiring timer very fast and O(1).

A further benefit of the concept is that the slack calculation which is done
on every timer start is no longer necessary because the granularity levels
provide natural batching already.

Our extensive testing with various loads did not show any performance
degradation vs. the current wheel implementation.

This patch does not address the 'fast lookup' issue as we wanted to make sure
that there is no regression introduced by the wheel redesign. The
optimizations are in follow up patches.

This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
