author:    Tejun Heo <tj@kernel.org>  2023-08-07 15:57:24 -1000
committer: Tejun Heo <tj@kernel.org>  2023-08-07 15:57:24 -1000
commit:    63c5484e74952f60f5810256bd69814d167b8d22 (patch)
tree:      e045e2d922b5f462856ffb2058621d8e63e8a59e /Documentation/core-api
parent:    025e16845877e80cb169274b330c236056ba553c (diff)
workqueue: Add multiple affinity scopes and interface to select them
Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make CACHE
the default. The code changes to actually add the additional scopes are
trivial.
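For reference, the scope enumeration this adds to include/linux/workqueue.h ends up looking roughly like the sketch below (the comments here are illustrative, not quoted from the header):

  enum wq_affn_scope {
          WQ_AFFN_CPU,            /* one pod per CPU */
          WQ_AFFN_SMT,            /* one pod per SMT sibling group */
          WQ_AFFN_CACHE,          /* one pod per last-level cache */
          WQ_AFFN_NUMA,           /* one pod per NUMA node */
          WQ_AFFN_SYSTEM,         /* one pod spanning the whole system */

          WQ_AFFN_NR_TYPES,
  };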
Also add a "workqueue.default_affinity_scope" module parameter to override the
default scope and an "affinity_scope" sysfs file to configure it per workqueue.
wq_dump.py and the documentation are updated accordingly.
This enables significant flexibility in configuring how unbound workqueues
behave. If the affinity scope is set to "cpu", an unbound workqueue behaves
close to a per-cpu workqueue; at the other extreme, "system" removes all
locality boundaries.
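In kernel code, per-workqueue scope selection goes through apply_workqueue_attrs(). A minimal sketch, assuming the attrs field name (affn_scope) from this series and a built-in caller (apply_workqueue_attrs() is not exported to modules); the helper name is made up here:

  #include <linux/cpu.h>
  #include <linux/workqueue.h>

  /* Hypothetical helper: pin an unbound workqueue to the "cpu" scope. */
  static int example_set_cpu_scope(struct workqueue_struct *wq)
  {
          struct workqueue_attrs *attrs;
          int ret;

          attrs = alloc_workqueue_attrs();
          if (!attrs)
                  return -ENOMEM;

          /* behave close to a per-cpu workqueue, minus concurrency management */
          attrs->affn_scope = WQ_AFFN_CPU;

          cpus_read_lock();
          ret = apply_workqueue_attrs(wq, attrs); /* wq must be WQ_UNBOUND */
          cpus_read_unlock();

          free_workqueue_attrs(attrs);
          return ret;
  }

Creating the workqueue with WQ_SYSFS additionally exposes the same knob to userspace via the affinity_scope file documented in the diff below.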
Many modern machines often have multiple L3 caches while being mostly
uniform in terms of memory access. Thus, workqueue's previous behavior of
spreading work items within each NUMA node had negative performance
implications from unnecessarily crossing L3 boundaries between issue and
execution. However, picking a finer-grained affinity scope also has a
downside in that an issuer in one group can't utilize CPUs in other groups.
While dependent on the specifics of the workload, there's usually a noticeable
penalty in crossing L3 boundaries, so let's default to CACHE. This issue
will be further addressed and documented with examples in future patches.
Signed-off-by: Tejun Heo <tj@kernel.org>
Diffstat (limited to 'Documentation/core-api')
-rw-r--r--   Documentation/core-api/workqueue.rst | 63
1 file changed, 63 insertions(+), 0 deletions(-)
diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index c9e46acd339b..56af317508c9 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -347,6 +347,51 @@ Guidelines
 level of locality in wq operations and work item execution.
 
 
+Affinity Scopes
+===============
+
+An unbound workqueue groups CPUs according to its affinity scope to improve
+cache locality. For example, if a workqueue is using the default affinity
+scope of "cache", it will group CPUs according to last level cache
+boundaries. A work item queued on the workqueue will be processed by a
+worker running on one of the CPUs which share the last level cache with the
+issuing CPU.
+
+Workqueue currently supports the following five affinity scopes.
+
+``cpu``
+  CPUs are not grouped. A work item issued on one CPU is processed by a
+  worker on the same CPU. This makes unbound workqueues behave as per-cpu
+  workqueues without concurrency management.
+
+``smt``
+  CPUs are grouped according to SMT boundaries. This usually means that the
+  logical threads of each physical CPU core are grouped together.
+
+``cache``
+  CPUs are grouped according to cache boundaries. Which specific cache
+  boundary is used is determined by the arch code. L3 is used in a lot of
+  cases. This is the default affinity scope.
+
+``numa``
+  CPUs are grouped according to NUMA boundaries.
+
+``system``
+  All CPUs are put in the same group. Workqueue makes no effort to process a
+  work item on a CPU close to the issuing CPU.
+
+The default affinity scope can be changed with the module parameter
+``workqueue.default_affinity_scope`` and a specific workqueue's affinity
+scope can be changed using ``apply_workqueue_attrs()``.
+
+If ``WQ_SYSFS`` is set, the workqueue will have the following affinity scope
+related interface files under its ``/sys/devices/virtual/workqueue/WQ_NAME/``
+directory.
+
+``affinity_scope``
+  Read to see the current affinity scope. Write to change.
+
+
 Examining Configuration
 =======================
@@ -358,6 +403,24 @@ configuration, worker pools and how workqueues map to the pools: ::
   ===============
   wq_unbound_cpumask=0000000f
 
+  CPU
+    nr_pods  4
+    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
+    pod_node [0]=0 [1]=0 [2]=1 [3]=1
+    cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3
+
+  SMT
+    nr_pods  4
+    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
+    pod_node [0]=0 [1]=0 [2]=1 [3]=1
+    cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3
+
+  CACHE (default)
+    nr_pods  2
+    pod_cpus [0]=00000003 [1]=0000000c
+    pod_node [0]=0 [1]=1
+    cpu_pod  [0]=0 [1]=0 [2]=1 [3]=1
+
   NUMA
     nr_pods  2
     pod_cpus [0]=00000003 [1]=0000000c
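To illustrate the new knobs end to end: the default scope can be overridden at boot with workqueue.default_affinity_scope=<scope> on the kernel command line, and a WQ_SYSFS workqueue can be retargeted at runtime through its affinity_scope file. A minimal userspace sketch (the workqueue name "my_wq" is hypothetical, and the path assumes the usual /sys/devices/virtual/workqueue/ location):

  #include <stdio.h>

  int main(void)
  {
          const char *path =
                  "/sys/devices/virtual/workqueue/my_wq/affinity_scope";
          char scope[32];
          FILE *f;

          /* read back the current scope, e.g. "cache" */
          f = fopen(path, "r");
          if (!f || !fgets(scope, sizeof(scope), f)) {
                  perror(path);
                  return 1;
          }
          fclose(f);
          printf("current scope: %s", scope);

          /* switch this workqueue to NUMA-wide spreading */
          f = fopen(path, "w");
          if (!f || fputs("numa", f) == EOF) {
                  perror(path);
                  return 1;
          }
          fclose(f);
          return 0;
  }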