doc, block, bfq: add information on bfq execution time

The execution time of BFQ has been slightly lowered. Report the new
execution time in BFQ documentation.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
author: Paolo Valente <paolo.valente@linaro.org> 2019-03-12 09:59:35 +0100
committer: Jens Axboe <axboe@kernel.dk> 2019-04-01 08:15:40 -0600
commit: 4438cf50e7b315ff4bc4cfff8520b906428c3024 (patch)
tree: 5a107d61e4c380e0deee7e9acb1e63b2e8f624f7 /Documentation/block
parent: fffca087d587b03d0d0dca2e86bf8e688fbf2c18 (diff)
1 files changed, 22 insertions, 7 deletions
diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 98a8dd5ee385..1a0f2ac02eb6 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -20,13 +20,26 @@ for that device, by setting low_latency to 0. See Section 3 for
 details on how to configure BFQ for the desired tradeoff between
 latency and throughput, or on how to maximize throughput.
 
-BFQ has a non-null overhead, which limits the maximum IOPS that a CPU
-can process for a device scheduled with BFQ. To give an idea of the
-limits on slow or average CPUs, here are, first, the limits of BFQ for
-three different CPUs, on, respectively, an average laptop, an old
-desktop, and a cheap embedded system, in case full hierarchical
-support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but
-CONFIG_DEBUG_BLK_CGROUP is not set (Section 4-2):
+As every I/O scheduler, BFQ adds some overhead to per-I/O-request
+processing. To give an idea of this overhead, the total,
+single-lock-protected, per-request processing time of BFQ---i.e., the
+sum of the execution times of the request insertion, dispatch and
+completion hooks---is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
+(dated CPU for notebooks; time measured with simple code
+instrumentation, and using the throughput-sync.sh script of the S
+suite [1], in performance-profiling mode). To put this result into
+context, the total, single-lock-protected, per-request execution time
+of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7
+us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).
+
+Scheduling overhead further limits the maximum IOPS that a CPU can
+process (already limited by the execution of the rest of the I/O
+stack). To give an idea of the limits with BFQ, on slow or average
+CPUs, here are, first, the limits of BFQ for three different CPUs, on,
+respectively, an average laptop, an old desktop, and a cheap embedded
+system, in case full hierarchical support is enabled (i.e.,
+CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_DEBUG_BLK_CGROUP is not
+set (Section 4-2):
 - Intel i7-4850HQ: 400 KIOPS
 - AMD A8-3850: 250 KIOPS
 - ARM CortexTM-A53 Octa-core: 80 KIOPS
@@ -566,3 +579,5 @@ applications. Unset this tunable if you need/want to control weights.
     Slightly extended version:
     http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-
 							results.pdf
+
+[3] https://github.com/Algodev-github/S
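
Note on the figures in the patch: the per-request times it reports (1.9 us for BFQ, 0.7 us for mq-deadline, both on the i7-2760QM) can be turned into a rough IOPS ceiling. The sketch below is only a back-of-the-envelope reading, not part of the patch. It assumes the whole per-request time is spent under a single lock, so requests are fully serialized, and it ignores the rest of the I/O stack. The function and constant names are hypothetical.

```python
def serialized_iops_ceiling(per_request_us: float) -> float:
    """Upper bound on requests per second if every request is processed
    one at a time and takes per_request_us microseconds (assumption:
    the scheduler lock fully serializes request processing)."""
    if per_request_us <= 0:
        raise ValueError("per-request time must be positive")
    return 1_000_000.0 / per_request_us


# Per-request times quoted in the patch (Intel Core i7-2760QM).
BFQ_US = 1.9
MQ_DEADLINE_US = 0.7

bfq_kiops = serialized_iops_ceiling(BFQ_US) / 1000
mqd_kiops = serialized_iops_ceiling(MQ_DEADLINE_US) / 1000

print(f"BFQ:         ~{bfq_kiops:.0f} KIOPS ceiling")
print(f"mq-deadline: ~{mqd_kiops:.0f} KIOPS ceiling")
print(f"BFQ / mq-deadline time ratio: {BFQ_US / MQ_DEADLINE_US:.1f}x")
```

Under these assumptions the scheduler alone would cap a device at roughly 526 KIOPS with BFQ and 1429 KIOPS with mq-deadline. The KIOPS limits listed in the patch were measured on different CPUs, so they are not directly comparable with these ceilings.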