summaryrefslogtreecommitdiff
path: root/Documentation/scheduler
AgeCommit message (Collapse)Author
2024-11-11sched_ext: Rename scx_bpf_consume() to scx_bpf_dsq_move_to_local()Tejun Heo
In sched_ext API, a repeatedly reported pain point is the overuse of the verb "dispatch" and confusion around "consume": - ops.dispatch() - scx_bpf_dispatch[_vtime]() - scx_bpf_consume() - scx_bpf_dispatch[_vtime]_from_dsq*() This overloading of the term is historical. Originally, there were only built-in DSQs and moving a task into a DSQ always dispatched it for execution. Using the verb "dispatch" for the kfuncs to move tasks into these DSQs made sense. Later, user DSQs were added and scx_bpf_dispatch[_vtime]() updated to be able to insert tasks into any DSQ. The only allowed DSQ to DSQ transfer was from a non-local DSQ to a local DSQ and this operation was named "consume". This was already confusing as a task could be dispatched to a user DSQ from ops.enqueue() and then the DSQ would have to be consumed in ops.dispatch(). Later addition of scx_bpf_dispatch_from_dsq*() made the confusion even worse as "dispatch" in this context meant moving a task to an arbitrary DSQ from a user DSQ. Clean up the API with the following renames: 1. scx_bpf_dispatch[_vtime]() -> scx_bpf_dsq_insert[_vtime]() 2. scx_bpf_consume() -> scx_bpf_dsq_move_to_local() 3. scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*() This patch performs the second rename. Compatibility is maintained by: - The previous kfunc names are still provided by the kernel so that old binaries can run. Kernel generates a warning when the old names are used. - compat.bpf.h provides wrappers for the new names which automatically fall back to the old names when running on older kernels. They also trigger build error if old names are used for new builds. The compat features will be dropped after v6.15. v2: Comment and documentation updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Johannes Bechberger <me@mostlynerdless.de> Acked-by: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Dan Schatzberg <dschatzberg@meta.com> Cc: Ming Yang <yougmark94@gmail.com>
2024-11-11sched_ext: Rename scx_bpf_dispatch[_vtime]() to scx_bpf_dsq_insert[_vtime]()Tejun Heo
In sched_ext API, a repeatedly reported pain point is the overuse of the verb "dispatch" and confusion around "consume": - ops.dispatch() - scx_bpf_dispatch[_vtime]() - scx_bpf_consume() - scx_bpf_dispatch[_vtime]_from_dsq*() This overloading of the term is historical. Originally, there were only built-in DSQs and moving a task into a DSQ always dispatched it for execution. Using the verb "dispatch" for the kfuncs to move tasks into these DSQs made sense. Later, user DSQs were added and scx_bpf_dispatch[_vtime]() updated to be able to insert tasks into any DSQ. The only allowed DSQ to DSQ transfer was from a non-local DSQ to a local DSQ and this operation was named "consume". This was already confusing as a task could be dispatched to a user DSQ from ops.enqueue() and then the DSQ would have to be consumed in ops.dispatch(). Later addition of scx_bpf_dispatch_from_dsq*() made the confusion even worse as "dispatch" in this context meant moving a task to an arbitrary DSQ from a user DSQ. Clean up the API with the following renames: 1. scx_bpf_dispatch[_vtime]() -> scx_bpf_dsq_insert[_vtime]() 2. scx_bpf_consume() -> scx_bpf_dsq_move_to_local() 3. scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*() This patch performs the first set of renames. Compatibility is maintained by: - The previous kfunc names are still provided by the kernel so that old binaries can run. Kernel generates a warning when the old names are used. - compat.bpf.h provides wrappers for the new names which automatically fall back to the old names when running on older kernels. They also trigger build error if old names are used for new builds. The compat features will be dropped after v6.15. v2: Documentation updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Johannes Bechberger <me@mostlynerdless.de> Acked-by: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Dan Schatzberg <dschatzberg@meta.com> Cc: Ming Yang <yougmark94@gmail.com>
2024-10-08sched_ext: Documentation: Update instructions for running example schedulersDevaansh-Kumar
Since the artifact paths for tools changed, we need to update the documentation to reflect that path. Signed-off-by: Devaansh-Kumar <devaanshk840@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-23sched_ext: Provide a sysfs enable_seq counterAndrea Righi
As discussed during the distro-centric session within the sched_ext Microconference at LPC 2024, introduce a sequence counter that is incremented every time a BPF scheduler is loaded. This feature can help distributions in diagnosing potential performance regressions by identifying systems where users are running (or have ran) custom BPF schedulers. Example: arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq 0 arighi@virtme-ng~> sudo scx_simple local=1 global=0 ^CEXIT: unregistered from user space arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq 1 In this way user-space tools (such as Ubuntu's apport and similar) are able to gather and include this information in bug reports. Cc: Giovanni Gherdovich <giovanni.gherdovich@suse.com> Cc: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Cc: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Cc: Phil Auld <pauld@redhat.com> Signed-off-by: Andrea Righi <andrea.righi@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-21Merge tag 'sched_ext-for-6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext support from Tejun Heo: "This implements a new scheduler class called ‘ext_sched_class’, or sched_ext, which allows scheduling policies to be implemented as BPF programs. The goals of this are: - Ease of experimentation and exploration: Enabling rapid iteration of new scheduling policies. - Customization: Building application-specific schedulers which implement policies that are not applicable to general-purpose schedulers. - Rapid scheduler deployments: Non-disruptive swap outs of scheduling policies in production environments" See individual commits for more documentation, but also the cover letter for the latest series: Link: https://lore.kernel.org/all/20240618212056.2833381-1-tj@kernel.org/ * tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (110 commits) sched: Move update_other_load_avgs() to kernel/sched/pelt.c sched_ext: Don't trigger ops.quiescent/runnable() on migrations sched_ext: Synchronize bypass state changes with rq lock scx_qmap: Implement highpri boosting sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq() sched_ext: Compact struct bpf_iter_scx_dsq_kern sched_ext: Replace consume_local_task() with move_local_task_to_local_dsq() sched_ext: Move consume_local_task() upward sched_ext: Move sanity check and dsq_mod_nr() into task_unlink_from_dsq() sched_ext: Reorder args for consume_local/remote_task() sched_ext: Restructure dispatch_to_local_dsq() sched_ext: Fix processs_ddsp_deferred_locals() by unifying DTL_INVALID handling sched_ext: Make find_dsq_for_dispatch() handle SCX_DSQ_LOCAL_ON sched_ext: Refactor consume_remote_task() sched_ext: Rename scx_kfunc_set_sleepable to unlocked and relocate sched_ext: Add missing static to scx_dump_data sched_ext: Add missing static to scx_has_op[] sched_ext: Temporarily work around pick_task_scx() being called without balance_scx() sched_ext: Add a cgroup scheduler which uses flattened hierarchy sched_ext: Add cgroup support ...
2024-09-19Merge tag 'sched-core-2024-09-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot de Oliveira's last major contribution to the kernel: "SCHED_DEADLINE servers can help fixing starvation issues of low priority tasks (e.g., SCHED_OTHER) when higher priority tasks monopolize CPU cycles. Today we have RT Throttling; DEADLINE servers should be able to replace and improve that." (Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef Esmat, Huang Shijie) - Preparatory changes for sched_ext integration: - Use set_next_task(.first) where required - Fix up set_next_task() implementations - Clean up DL server vs. core sched - Split up put_prev_task_balance() - Rework pick_next_task() - Combine the last put_prev_task() and the first set_next_task() - Rework dl_server - Add put_prev_task(.next) (Peter Zijlstra, with a fix by Tejun Heo) - Complete the EEVDF transition and refine EEVDF scheduling: - Implement delayed dequeue - Allow shorter slices to wakeup-preempt - Use sched_attr::sched_runtime to set request/slice suggestion - Document the new feature flags - Remove unused and duplicate-functionality fields - Simplify & unify pick_next_task_fair() - Misc debuggability enhancements (Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin Schneider and Chuyi Zhou) - Initialize the vruntime of a new task when it is first enqueued, resulting in significant decrease in latency of newly woken tasks (Zhang Qiao) - Introduce SM_IDLE and an idle re-entry fast-path in __schedule() (K Prateek Nayak, Peter Zijlstra) - Clean up and clarify the usage of Clean up usage of rt_task() (Qais Yousef) - Preempt SCHED_IDLE entities in strict cgroup hierarchies (Tianchen Ding) - Clarify the documentation of time units for deadline scheduler parameters (Christian Loehle) - Remove the HZ_BW chicken-bit feature flag introduced a year ago, the original change seems to be working fine (Phil Auld) - Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie, Peilin He, Qais Yousefm and Vincent Guittot) * tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) sched/cpufreq: Use NSEC_PER_MSEC for deadline task cpufreq/cppc: Use NSEC_PER_MSEC for deadline task sched/deadline: Clarify nanoseconds in uapi sched/deadline: Convert schedtool example to chrt sched/debug: Fix the runnable tasks output sched: Fix sched_delayed vs sched_core kernel/sched: Fix util_est accounting for DELAY_DEQUEUE kthread: Fix task state in kthread worker if being frozen sched/pelt: Use rq_clock_task() for hw_pressure sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule() sched: Add put_prev_task(.next) sched: Rework dl_server sched: Combine the last put_prev_task() and the first set_next_task() sched: Rework pick_next_task() sched: Split up put_prev_task_balance() sched: Clean up DL server vs core sched sched: Fixup set_next_task() implementations sched: Use set_next_task(.first) where required sched/fair: Properly deactivate sched_delayed task upon class change ...
2024-09-11Merge branch 'tip/sched/core' into sched_ext/for-6.12Tejun Heo
Pull in tip/sched/core to resolve two merge conflicts: - 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") 5d871a63997f ("sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c") A simple context conflict. The former added __update_blocked_others() in the same #ifdef CONFIG_SMP block that effective_cpu_util() and sched_cpu_util() are in and the latter moved those functions to fair.c. This makes __update_blocked_others() more out of place. Will follow up with a patch to relocate. - 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") 84d265281d6c ("sched/pelt: Use rq_clock_task() for hw_pressure") The former factored out the body of __update_blocked_others() into update_other_load_avgs(). The latter changed how update_hw_load_avg() is called in the body. Resolved by applying the change to update_other_load_avgs() instead. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-11sched/deadline: Convert schedtool example to chrtChristian Loehle
chrt has SCHED_DEADLINE support so convert the example instead of relying on a schedtool fork. While at it fix the wrong mentioning of microseconds, it was nanoseconds for both schedtool and chrt. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20240813144348.1180344-2-christian.loehle@arm.com
2024-09-05docs: scheduler: completion: Update member of struct completionI Hsin Cheng
The member "wait" in struct completion isn't of type wait_queue_head_t anymore, as it is now "struct swait_queue_head", fix it to match with the current implementation. Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240828165036.178011-1-richard120310@gmail.com
2024-08-07docs: scheduler: Start documenting the EEVDF schedulerCarlos Bilbao
Add some documentation regarding the newly introduced scheduler EEVDF. Signed-off-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240720002207.444286-2-carlos.bilbao.osdev@gmail.com
2024-07-30Merge tag 'v6.11-rc1' into for-6.12Tejun Heo
Linux 6.11-rc1
2024-07-09docs/sp_SP: Add translation for scheduler/sched-design-CFS.rstSergio González Collado
Translate Documentation/scheduler/sched-design-CFS.rst into Spanish Signed-off-by: Sergio González Collado <sergio.collado@gmail.com> Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240707195047.14359-1-sergio.collado@gmail.com
2024-07-02sched_ext: Documentation: Remove mentions of scx_bpf_switch_allAboorva Devarajan
Updated sched_ext doc to eliminate references to scx_bpf_switch_all, which has been removed in recent sched_ext versions. Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-18sched_ext: Documentation: scheduler: Document extensible scheduler classTejun Heo
Add Documentation/scheduler/sched-ext.rst which gives a high-level overview and pointers to the examples. v6: - Add paragraph explaining debug dump. v5: - Updated to reflect /sys/kernel interface change. Kconfig options added. v4: - README improved, reformatted in markdown and renamed to README.md. v3: - Added tools/sched_ext/README. - Dropped _example prefix from scheduler names. v2: - Apply minor edits suggested by Bagas. Caveats section dropped as all of them are addressed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com>
2024-03-25Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-03-24Merge tag 'sched-urgent-2024-03-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler doc clarification from Thomas Gleixner: "A single update for the documentation of the base_slice_ns tunable to clarify that any value which is less than the tick slice has no effect because the scheduler tick is not guaranteed to happen within the set time slice" * tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
2024-03-21sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relationMukesh Kumar Chaurasiya
The tunable base_slice_ns is dependent on CONFIG_HZ (i.e. TICK_NSEC) for any significant performance improvement. The reason being the scheduler tick is not frequent enough to force preemption when base_slice expires in case of: base_slice_ns < TICK_NSEC The below data is of stress-ng: Number of CPU: 1 Stressor threads: 4 Time: 30sec On CONFIG_HZ=1000 | base_slice | avg-run (msec) | context-switches | | ---------- | -------------- | ---------------- | | 3ms | 2.914 | 10342 | | 6ms | 4.857 | 6196 | | 9ms | 6.754 | 4482 | | 12ms | 7.872 | 3802 | | 22ms | 11.294 | 2710 | | 32ms | 13.425 | 2284 | On CONFIG_HZ=100 | base_slice | avg-run (msec) | context-switches | | ---------- | -------------- | ---------------- | | 3ms | 9.144 | 3337 | | 6ms | 9.113 | 3301 | | 9ms | 8.991 | 3315 | | 12ms | 12.935 | 2328 | | 22ms | 16.031 | 1915 | | 32ms | 18.608 | 1622 | base_slice: the value of base_slice in ms avg-run (msec): average time of the stressor threads got on cpu before it got preempted context-switches: number of context switches for the stress-ng process Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240320173815.927637-2-mchauras@linux.ibm.com
2024-03-12sched/balancing: Rename load_balance() => sched_balance_rq()Ingo Molnar
Standardize scheduler load-balancing function names on the sched_balance_() prefix. Also load_balance() has become somewhat of a misnomer: historically it was the first and primary load-balancing function that was called, but with the introduction of sched domains, it's become a lower layer function that balances runqueues. Rename it to sched_balance_rq() accordingly. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-6-mingo@kernel.org
2024-03-12sched/balancing: Rename rebalance_domains() => sched_balance_domains()Ingo Molnar
Standardize scheduler load-balancing function names on the sched_balance_() prefix. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-5-mingo@kernel.org
2024-03-12sched/balancing: Rename trigger_load_balance() => sched_balance_trigger()Ingo Molnar
Standardize scheduler load-balancing function names on the sched_balance_() prefix. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-4-mingo@kernel.org
2024-03-12sched/balancing: Rename scheduler_tick() => sched_tick()Ingo Molnar
- Standardize on prefixing scheduler-internal functions defined in <linux/sched.h> with sched_*() prefix. scheduler_tick() was the only function using the scheduler_ prefix. Harmonize it. - The other reason to rename it is the NOHZ scheduler tick handling functions are already named sched_tick_*(). Make the 'git grep sched_tick' more meaningful. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-3-mingo@kernel.org
2024-03-12sched/balancing: Rename run_rebalance_domains() => sched_balance_softirq()Ingo Molnar
run_rebalance_domains() is a misnomer, as it doesn't only run rebalance_domains(), but since the introduction of the NOHZ code it also runs nohz_idle_balance(). Rename it to sched_balance_softirq(), reflecting its more generic purpose and that it's a softirq handler. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-2-mingo@kernel.org
2024-03-12sched/debug: Increase SCHEDSTAT_VERSION to 16Ingo Molnar
We changed the order of definitions within 'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES] columns in show_schedstat(). Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "Gautham R. Shenoy" <gautham.shenoy@amd.com> Link: https://lore.kernel.org/r/20240308105901.1096078-5-mingo@kernel.org
2024-02-15membarrier: Create Documentation/scheduler/membarrier.rstAndrea Parri
To gather the architecture requirements of the "private/global expedited" membarrier commands. The file will be expanded to integrate further information about the membarrier syscall (as needed/desired in the future). While at it, amend some related inline comments in the membarrier codebase. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-3-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-23sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true)Vincent Guittot
sched_feat(UTIL_EST_FASTUP) has been added to easily disable the feature in order to check for possibly related regressions. After 3 years, it has never been used and no regression has been reported. Let's remove it and make fast increase a permanent behavior. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com> Reviewed-by: Yanteng Si <siyanteng@loongson.cn> [for the Chinese translation] Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/20231201161652.1241695-2-vincent.guittot@linaro.org
2023-11-26sched/doc: Update documentation after renames and synchronize Chinese versionWenyu Huang
Update the documentation after these changes, which didn't entirely propagate the changes: e23edc86b09d ("sched/fair: Rename check_preempt_curr() to wakeup_preempt()") 03b7fad167ef ("sched: Add task_struct pointer to sched_class::set_curr_task") 2f88c8e802c8 ("sched/eevdf/doc: Modify the documented knob to base_slice_ns as well") [ mingo: Reworked the changelog. ] Signed-off-by: Wenyu Huang <huangwenyu5@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2023-11-01Merge tag 'asm-generic-6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull ia64 removal and asm-generic updates from Arnd Bergmann: - The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. - The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. * tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: hexagon: Remove unusable symbols from the ptrace.h uapi asm-generic: Fix spelling of architecture arch: Reserve map_shadow_stack() syscall number for all architectures syscalls: Cleanup references to sys_lookup_dcookie() Documentation: Drop or replace remaining mentions of IA64 lib/raid6: Drop IA64 support Documentation: Drop IA64 from feature descriptions kernel: Drop IA64 support from sig_fault handlers arch: Remove Itanium (IA-64) architecture
2023-10-09sched/topology: Remove the EM_MAX_COMPLEXITY limitPierre Gondois
The Energy Aware Scheduler (EAS) estimates the energy consumption of placing a task on different CPUs. The goal is to minimize this energy consumption. Estimating the energy of different task placements is increasingly complex with the size of the platform. To avoid having a slow wake-up path, EAS is only enabled if this complexity is low enough. The current complexity limit was set in: b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate platforms") ... based on the first implementation of EAS, which was re-computing the power of the whole platform for each task placement scenario, see: 390031e4c309 ("sched/fair: Introduce an energy estimation helper function") ... but the complexity of EAS was reduced in: eb92692b2544d ("sched/fair: Speed-up energy-aware wake-ups") ... and find_energy_efficient_cpu() (feec) algorithm was updated in: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()") find_energy_efficient_cpu() (feec) is now doing: feec() \_ for_each_pd(pd) [0] // get max_spare_cap_cpu and compute_prev_delta \_ for_each_cpu(pd) [1] \_ eenv_pd_busy_time(pd) [2] \_ for_each_cpu(pd) // compute_energy(pd) without the task \_ eenv_pd_max_util(pd, -1) [3.0] \_ for_each_cpu(pd) \_ em_cpu_energy(pd, -1) \_ for_each_ps(pd) // compute_energy(pd) with the task on prev_cpu \_ eenv_pd_max_util(pd, prev_cpu) [3.1] \_ for_each_cpu(pd) \_ em_cpu_energy(pd, prev_cpu) \_ for_each_ps(pd) // compute_energy(pd) with the task on max_spare_cap_cpu \_ eenv_pd_max_util(pd, max_spare_cap_cpu) [3.2] \_ for_each_cpu(pd) \_ em_cpu_energy(pd, max_spare_cap_cpu) \_ for_each_ps(pd) [3.1] happens only once since prev_cpu is unique. With the same definitions for nr_pd, nr_cpus and nr_ps, the complexity is of: nr_pd * (2 * [nr_cpus in pd] + 2 * ([nr_cpus in pd] + [nr_ps in pd])) + ([nr_cpus in pd] + [nr_ps in pd]) [0] * ( [1] + [2] + [3.0] + [3.2] ) + [3.1] = nr_pd * (4 * [nr_cpus in pd] + 2 * [nr_ps in pd]) + [nr_cpus in prev pd] + nr_ps The complexity limit was set to 2048 in: b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate platforms") ... to make "EAS usable up to 16 CPUs with per-CPU DVFS and less than 8 performance states each". For the same platform, the complexity would actually be of: 16 * (4 + 2 * 7) + 1 + 7 = 296 Since the EAS complexity was greatly reduced since the limit was introduced, bigger platforms can handle EAS. For instance, a platform with 112 CPUs with 7 performance states each would not reach it: 112 * (4 + 2 * 7) + 1 + 7 = 2024 To reflect this improvement in the underlying EAS code, remove the EAS complexity check. Note that a limit on the number of CPUs still holds against EM_MAX_NUM_CPUS to avoid overflows during the energy estimation. [ mingo: Updates to the changelog. ] Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20231009060037.170765-2-sshegde@linux.vnet.ibm.com
2023-10-09sched/topology: Consolidate and clean up access to a CPU's max compute capacityVincent Guittot
Remove the rq::cpu_capacity_orig field and use arch_scale_cpu_capacity() instead. The scheduler uses 3 methods to get access to a CPU's max compute capacity: - arch_scale_cpu_capacity(cpu) which is the default way to get a CPU's capacity. - cpu_capacity_orig field which is periodically updated with arch_scale_cpu_capacity(). - capacity_orig_of(cpu) which encapsulates rq->cpu_capacity_orig. There is no real need to save the value returned by arch_scale_cpu_capacity() in struct rq. arch_scale_cpu_capacity() returns: - either a per_cpu variable. - or a const value for systems which have only one capacity. Remove rq::cpu_capacity_orig and use arch_scale_cpu_capacity() everywhere. No functional changes. Some performance tests on Arm64: - small SMP device (hikey): no noticeable changes - HMP device (RB5): hackbench shows minor improvement (1-2%) - large smp (thx2): hackbench and tbench shows minor improvement (1%) Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20231009103621.374412-2-vincent.guittot@linaro.org
2023-10-02sched/rt/docs: Use 'real-time' instead of 'realtime'Cyril Hrubis
Standardize on a single variant. Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231002115553.3007-4-chrubis@suse.cz
2023-10-02sched/rt/docs: Clarify & fix sched_rt_* sysctl docsCyril Hrubis
- Describe explicitly that sched_rt_runtime_us is allocated from sched_rt_period_us and hence always less or equal to that value. - The limit for sched_rt_runtime_us is not INT_MAX-1, but rather it's limited by the value of sched_rt_period_us. If sched_rt_period_us is INT_MAX then sched_rt_runtime_us can be set to INT_MAX as well. Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231002115553.3007-3-chrubis@suse.cz
2023-09-11Documentation: Drop or replace remaining mentions of IA64Ard Biesheuvel
Drop or update mentions of IA64, as appropriate. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-08-30Merge tag 'docs-6.6' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Documentation work keeps chugging along; this includes: - Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there - Move the loongarch and mips architecture documentation under Documentation/arch/ - Some more maintainer documentation from Jakub ... plus the usual assortment of updates, translations, and fixes" * tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits) Docu: genericirq.rst: fix irq-example input: docs: pxrc: remove reference to phoenix-sim Documentation: serial-console: Fix literal block marker docs/mm: remove references to hmm_mirror ops and clean typos docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add() Documentation: Fix typos Documentation/ABI: Fix typos scripts: kernel-doc: fix macro handling in enums scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN] Documentation: riscv: Update boot image header since EFI stub is supported Documentation: riscv: Add early boot document Documentation: arm: Add bootargs to the table of added DT parameters docs: kernel-parameters: Refer to the correct bitmap function doc: update params of memhp_default_state= docs: Add book to process/kernel-docs.rst docs: sparse: fix invalid link addresses docs: vfs: clean up after the iterate() removal docs: Add a section on surveys to the researcher guidelines docs: move mips under arch docs: move loongarch under arch ...
2023-08-24sched/eevdf/doc: Modify the documented knob to base_slice_ns as wellShrikanth Hegde
After committing the scheduler to EEVDF, we renamed the 'min_granularity_ns' sysctl to 'base_slice_ns': e4ec3318a17f ("sched/debug: Rename sysctl_sched_min_granularity to sysctl_sched_base_slice") ... but we forgot to rename it in the documentation. Do that now. Fixes: e4ec3318a17f ("sched/debug: Rename sysctl_sched_min_granularity to sysctl_sched_base_slice") Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230824080342.543396-1-sshegde@linux.vnet.ibm.com
2023-08-18Documentation: Fix typosBjorn Helgaas
Fix typos in Documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-07-14docs: scheduler: completion: Fix minor error in pseudo-codeRick Wertenbroek
Add missing address-of (&) operator in pseudo-code. Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230706114057.1120335-1-rick.wertenbroek@gmail.com
2023-06-16sched/deadline: Update GRUB description in the documentationVineeth Pillai
Update the details of GRUB to reflect the updated logic. Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230530135526.2385378-2-vineeth@bitbyteword.org
2023-04-27Merge tag 'sh-for-v6.4-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: "This is a bit larger than my previous one and mainly consists of clean-up work in the arch/sh directory by Geert Uytterhoeven and Randy Dunlap. Additionally, this fixes a bug in the Storage Queue code that was discovered while I was reviewing a patch to switch the code to the bitmap API by Christophe Jaillet. So this contains both a fix for the original bug in the Storage Queue code that can be backported later as well as the Christophe's patch to swich the code to the bitmap API. Summary: - Use generic GCC library routines - sq: Use the bitmap API when applicable - sq: Fix incorrect element size for allocating bitmap buffer - pci: Remove unused variable in SH-7786 PCI Express code - mcount.S: fix build error when PRINTK is not enabled - remove sh5/sh64 last fragments - math-emu: fix macro redefined warning - init: use OF_EARLY_FLATTREE for early init - nmi_debug: fix return value of __setup handler - SH2007: drop the bad URL info" * tag 'sh-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: Replace <uapi/asm/types.h> by <asm-generic/int-ll64.h> sh: Use generic GCC library routines sh: sq: Use the bitmap API when applicable sh: sq: Fix incorrect element size for allocating bitmap buffer sh: pci: Remove unused variable in SH-7786 PCI Express code sh: mcount.S: fix build error when PRINTK is not enabled sh: remove sh5/sh64 last fragments sh: math-emu: fix macro redefined warning sh: init: use OF_EARLY_FLATTREE for early init sh: nmi_debug: fix return value of __setup handler sh: SH2007: drop the bad URL info
2023-03-23sh: remove sh5/sh64 last fragmentsRandy Dunlap
A previous patch removed most of the sh5 (sh64) support from the kernel tree. Now remove the last stragglers. Fixes: 37744feebc08 ("sh: remove sh5 support") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: linux-sh@vger.kernel.org Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230306040037.20350-6-rdunlap@infradead.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-03-07sched/doc: supplement CPU capacity with RISC-VSong Shuai
This commit 7d2078310cbf ("dt-bindings: arm: move cpu-capacity to a shared loation") updates some references about capacity-dmips-mhz property in this document. The list of architectures using capacity-dmips-mhz omits RISC-V, so supplements it here. Signed-off-by: Song Shuai <suagrfillet@gmail.com> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> # English Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Alex Shi <alexs@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230227105941.2749193-1-suagrfillet@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-02-25Merge tag 'riscv-for-linus-6.3-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "There's a bunch of fixes/cleanups throughout the tree as usual, but we also have a handful of new features: - Various improvements to the extension detection and alternative patching infrastructure - Zbb-optimized string routines - Support for cpu-capacity in the RISC-V DT bindings - Zicbom no longer depends on toolchain support - Some performance and code size improvements to ftrace - Support for ARCH_WANT_LD_ORPHAN_WARN - Oops now contain the faulting instruction" * tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (67 commits) RISC-V: add a spin_shadow_stack declaration riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP riscv: Add header include guards to insn.h riscv: alternative: proceed one more instruction for auipc/jalr pair riscv: Avoid enabling interrupts in die() riscv, mm: Perform BPF exhandler fixup on page fault RISC-V: take text_mutex during alternative patching riscv: hwcap: Don't alphabetize ISA extension IDs RISC-V: fix ordering of Zbb extension riscv: jump_label: Fixup unaligned arch_static_branch function RISC-V: Only provide the single-letter extensions in HWCAP riscv: mm: fix regression due to update_mmu_cache change scripts/decodecode: Add support for RISC-V riscv: Add instruction dump to RISC-V splats riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols riscv: lds: define RUNTIME_DISCARD_EXIT RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes ...
2023-02-22Merge tag 'docs-6.3' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It has been a moderately calm cycle for documentation; the significant changes include: - Some significant additions to the memory-management documentation - Some improvements to navigation in the HTML-rendered docs - More Spanish and Chinese translations ... and the usual set of typo fixes and such" * tag 'docs-6.3' of git://git.lwn.net/linux: (68 commits) Documentation/watchdog/hpwdt: Fix Format Documentation/watchdog/hpwdt: Fix Reference Documentation: core-api: padata: correct spelling docs/mm: Physical Memory: correct spelling in reference to CONFIG_PAGE_EXTENSION docs: Use HTML comments for the kernel-toc SPDX line docs: Add more information to the HTML sidebar Documentation: KVM: Update AMD memory encryption link printk: Document that CONFIG_BOOT_PRINTK_DELAY required for boot_delay= Documentation: userspace-api: correct spelling Documentation: sparc: correct spelling Documentation: driver-api: correct spelling Documentation: admin-guide: correct spelling docs: add workload-tracing document to admin-guide docs/admin-guide/mm: remove useless markup docs/mm: remove useless markup docs/mm: Physical Memory: remove useless markup docs/sp_SP: Add process magic-number translation docs: ftrace: always use canonical ftrace path Doc/damon: fix the data path error dma-buf: Add "dma-buf" to title of documentation ...
2023-02-14dt-bindings: arm: move cpu-capacity to a shared loationConor Dooley
RISC-V uses the same generic topology code as arm64 & while there currently exists no binding for cpu-capacity on RISC-V, the code paths can be hit if the property is present. Move the documentation of cpu-capacity to a shared location, ahead of defining a binding for capacity-dmips-mhz on RISC-V. Update some references to this document in the process. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20230104180513.1379453-2-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-24Docs/subsystem-apis: Remove '[The ]Linux' prefixes from titles of listed ↵SeongJae Park
documents Some documents that listed on subsystem-apis have 'Linux' or 'The Linux' title prefixes. It's duplicated information, and makes finding the document of interest with human eyes not easy. Remove the prefixes from the titles. Signed-off-by: SeongJae Park <sj@kernel.org> Acked-by: Iwona Winiarska <iwona.winiarska@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230122184834.181977-1-sj@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-05sched/documentation: Document the util clamp featureQais Yousef
Add a document explaining the util clamp feature: what it is and how to use it. The new document hopefully covers everything one needs to know about uclamp. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20221216235716.201923-1-qyousef@layalina.io Cc: Jonathan Corbet <corbet@lwn.net>
2022-09-27docs: scheduler: Update new path for the sysctl knobsLukasz Luba
Add missing update for the documentation bit of some scheduler knob. The knobs have been moved to /debug/sched/ location (with adjusted names). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Alex Shi <alexs@kernel.org> Tested-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20220816121907.841-1-lukasz.luba@arm.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-16docs/scheduler: fix unit errorJui-Tse Huang
The unit mentioned in the documentation of scheduler statistics is outdated which may mislead the readers. The unit of statistics that is reported by /proc/schedstat is modified to nanosecond, and the unit of statistics that is reported by /proc/PID/schedstat is provided as well to make the context consistent. The rq_cpu_time and the rq_sched_info.run_delay of a run queue, and the sched_info.run_delay of a task are all updated based on the clock of the run queue, while the se.sum_exec_runtime of a task is updated based on the clock_task of the run queue of the task. Both the clock and clock_task are relied on the return value of the function sched_clock() which is in the unit of nanosecond. Signed-off-by: Jui-Tse Huang <juitse.huang@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-16docs/scheduler: Change unit of cpu_time and rq_time to nanosecondsChunguang Xu
In the current implementation, cpu_time and rq_time should be in nanoseconds, so this document needs to be modified. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-22Merge tag 'sched-core-2022-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Cleanups for SCHED_DEADLINE - Tracing updates/fixes - CPU Accounting fixes - First wave of changes to optimize the overhead of the scheduler build, from the fast-headers tree - including placeholder *_api.h headers for later header split-ups. - Preempt-dynamic using static_branch() for ARM64 - Isolation housekeeping mask rework; preperatory for further changes - NUMA-balancing: deal with CPU-less nodes - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD) - Updates to RSEQ UAPI in preparation for glibc usage - Lots of RSEQ/selftests, for same - Add Suren as PSI co-maintainer * tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits) sched/headers: ARM needs asm/paravirt_api_clock.h too sched/numa: Fix boot crash on arm64 systems headers/prep: Fix header to build standalone: <linux/psi.h> sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y cgroup: Fix suspicious rcu_dereference_check() usage warning sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity() sched/deadline,rt: Remove unused functions for !CONFIG_SMP sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy() sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file sched/deadline: Remove unused def_dl_bandwidth sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE sched/tracing: Don't re-read p->state when emitting sched_switch event sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race sched/cpuacct: Remove redundant RCU read lock sched/cpuacct: Optimize away RCU read lock sched/cpuacct: Fix charge percpu cpuusage sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies ...
2022-03-16docs: scheduler: Convert schedutil.txt to ReSTTang Yizhou
All other scheduler documents have been converted to *.rst. Let's do the same for schedutil.txt. Also fixed some typos. Signed-off-by: Tang Yizhou <tangyizhou@huawei.com> Link: https://lore.kernel.org/r/20220312070751.16844-1-tangyizhou@huawei.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>