summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-08-01sched/doc: Factorize bits between sched-energy.rst & sched-capacity.rstValentin Schneider
Documentation/scheduler/sched-capacity.rst ought to be the canonical place to blabber about SD_ASYM_CPUCAPACITY, so remove its explanation from sched-energy.rst and point to sched-capacity.rst instead. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200731192016.7484-4-valentin.schneider@arm.com
2020-08-01sched/doc: Document capacity aware schedulingValentin Schneider
Add some documentation detailing the concepts, requirements and implementation of capacity aware scheduling across the different scheduler classes. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200731192016.7484-3-valentin.schneider@arm.com
2020-08-01sched: Document arch_scale_*_capacity()Valentin Schneider
Rather that hide their purpose in some dark, damp corner of Documentation/, add some documentation to the default implementations. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200731192016.7484-2-valentin.schneider@arm.com
2020-07-29arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSUREValentin Schneider
Qian reported that the current setup forgoes the Kconfig dependencies and results in warnings such as: WARNING: unmet direct dependencies detected for SCHED_THERMAL_PRESSURE Depends on [n]: SMP [=y] && CPU_FREQ_THERMAL [=n] Selected by [y]: - ARM64 [=y] Revert commit e17ae7fea871 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE") and re-implement it by making the option default to 'y' for arm64 and arm, which respects Kconfig dependencies (i.e. will remain 'n' if CPU_FREQ_THERMAL=n). Fixes: e17ae7fea871 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200729135718.1871-1-valentin.schneider@arm.com
2020-07-29Documentation/sysctl: Document uclamp sysctl knobsQais Yousef
Uclamp exposes 3 sysctl knobs: * sched_util_clamp_min * sched_util_clamp_max * sched_util_clamp_min_rt_default Document them in sysctl/kernel.rst. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716110347.19553-3-qais.yousef@arm.com
2020-07-29sched/uclamp: Add a new sysctl to control RT default boost valueQais Yousef
RT tasks by default run at the highest capacity/performance level. When uclamp is selected this default behavior is retained by enforcing the requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum value. This is also referred to as 'the default boost value of RT tasks'. See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks"). On battery powered devices, it is desired to control this default (currently hardcoded) behavior at runtime to reduce energy consumed by RT tasks. For example, a mobile device manufacturer where big.LITTLE architecture is dominant, the performance of the little cores varies across SoCs, and on high end ones the big cores could be too power hungry. Given the diversity of SoCs, the new knob allows manufactures to tune the best performance/power for RT tasks for the particular hardware they run on. They could opt to further tune the value when the user selects a different power saving mode or when the device is actively charging. The runtime aspect of it further helps in creating a single kernel image that can be run on multiple devices that require different tuning. Keep in mind that a lot of RT tasks in the system are created by the kernel. On Android for instance I can see over 50 RT tasks, only a handful of which created by the Android framework. To control the default behavior globally by system admins and device integrator, introduce the new sysctl_sched_uclamp_util_min_rt_default to change the default boost value of the RT tasks. I anticipate this to be mostly in the form of modifying the init script of a particular device. To avoid polluting the fast path with unnecessary code, the approach taken is to synchronously do the update by traversing all the existing tasks in the system. This could race with a concurrent fork(), which is dealt with by introducing sched_post_fork() function which will ensure the racy fork will get the right update applied. Tested on Juno-r2 in combination with the RT capacity awareness [1]. By default an RT task will go to the highest capacity CPU and run at the maximum frequency, which is particularly energy inefficient on high end mobile devices because the biggest core[s] are 'huge' and power hungry. With this patch the RT task can be controlled to run anywhere by default, and doesn't cause the frequency to be maximum all the time. Yet any task that really needs to be boosted can easily escape this default behavior by modifying its requested uclamp.min value (p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall. [1] 804d402fb6f6: ("sched/rt: Make RT capacity-aware") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
2020-07-29sched/uclamp: Fix a deadlock when enabling uclamp static keyQais Yousef
The following splat was caught when setting uclamp value of a task: BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:49 cpus_read_lock+0x68/0x130 static_key_enable+0x1c/0x38 __sched_setscheduler+0x900/0xad8 Fix by ensuring we enable the key outside of the critical section in __sched_setscheduler() Fixes: 46609ce22703 ("sched/uclamp: Protect uclamp fast path code with static key") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716110347.19553-4-qais.yousef@arm.com
2020-07-28sched: Remove duplicated tick_nohz_full_enabled() checkMiaohe Lin
In sched_update_tick_dependency() there's two calls that check whether nohz_full is enabled: tick_nohz_full_cpu() does it implicitly, while there's also an explicit call to tick_nohz_full_enabled(). Remove the duplicated, open coded check. [ mingo: Amended the changelog. ] Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/1595935075-14223-1-git-send-email-linmiaohe@huawei.com
2020-07-27sched: Fix a typo in a comment王文虎
Change the comment typo: "direcly" -> "directly". Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/AAcAXwBTDSpsKN-5iyIOtaqk.1.1595857191899.Hmail.wenhu.wang@vivo.com
2020-07-25sched/uclamp: Remove unnecessary mutex_init()Qinglang Miao
The uclamp_mutex lock is initialized statically via DEFINE_MUTEX(), it is unnecessary to initialize it runtime via mutex_init(). Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20200725085629.98292-1-miaoqinglang@huawei.com
2020-07-22arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSUREValentin Schneider
This option now correctly depends on CPU_FREQ_THERMAL, so select it on the architectures that implement the required functions, arch_set_thermal_pressure() and arch_get_thermal_pressure(). Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lkml.kernel.org/r/20200712165917.9168-4-valentin.schneider@arm.com
2020-07-22sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entryValentin Schneider
As Russell pointed out [1], this option is severely lacking in the documentation department, and figuring out if one has the required dependencies to benefit from turning it on is not straightforward. Make it non user-visible, and add a bit of help to it. While at it, make it depend on CPU_FREQ_THERMAL. [1]: https://lkml.kernel.org/r/20200603173150.GB1551@shell.armlinux.org.uk Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200712165917.9168-3-valentin.schneider@arm.com
2020-07-22arch_topology, sched/core: Cleanup thermal pressure definitionValentin Schneider
The following commit: 14533a16c46d ("thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code") moved the definition of arch_set_thermal_pressure() to sched/core.c, but kept its declaration in linux/arch_topology.h. When building e.g. an x86 kernel with CONFIG_SCHED_THERMAL_PRESSURE=y, cpufreq_cooling.c ends up getting the declaration of arch_set_thermal_pressure() from include/linux/arch_topology.h, which is somewhat awkward. On top of this, sched/core.c unconditionally defines o The thermal_pressure percpu variable o arch_set_thermal_pressure() while arch_scale_thermal_pressure() does nothing unless redefined by the architecture. arch_*() functions are meant to be defined by architectures, so revert the aforementioned commit and re-implement it in a way that keeps arch_set_thermal_pressure() architecture-definable, and doesn't define the thermal pressure percpu variable for kernels that don't need it (CONFIG_SCHED_THERMAL_PRESSURE=n). Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200712165917.9168-2-valentin.schneider@arm.com
2020-07-22trace/events/sched.h: fix duplicated wordRandy Dunlap
Change "It it" to "It is". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/25305c1d-4ee8-e091-d20f-e700ddad49fd@infradead.org
2020-07-22linux/sched/mm.h: drop duplicated words in commentsRandy Dunlap
Drop doubled words "to" and "that". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/927ea8d8-3f6c-9b65-4c2b-63ab4bd59ef1@infradead.org
2020-07-22smp: Fix a potential usage of stale nr_cpusMuchun Song
The get_option() maybe return 0, it means that the nr_cpus is not initialized. Then we will use the stale nr_cpus to initialize the nr_cpu_ids. So fix it. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716070457.53255-1-songmuchun@bytedance.com
2020-07-22sched/fair: update_pick_idlest() Select group with lowest group_util when ↵Peter Puhov
idle_cpus are equal In slow path, when selecting idlest group, if both groups have type group_has_spare, only idle_cpus count gets compared. As a result, if multiple tasks are created in a tight loop, and go back to sleep immediately (while waiting for all tasks to be created), they may be scheduled on the same core, because CPU is back to idle when the new fork happen. For example: sudo perf record -e sched:sched_wakeup_new -- \ sysbench threads --threads=4 run ... total number of events: 61582 ... sudo perf script sysbench 129378 [006] 74586.633466: sched:sched_wakeup_new: sysbench:129380 [120] success=1 CPU:007 sysbench 129378 [006] 74586.634718: sched:sched_wakeup_new: sysbench:129381 [120] success=1 CPU:007 sysbench 129378 [006] 74586.635957: sched:sched_wakeup_new: sysbench:129382 [120] success=1 CPU:007 sysbench 129378 [006] 74586.637183: sched:sched_wakeup_new: sysbench:129383 [120] success=1 CPU:007 This may have negative impact on performance for workloads with frequent creation of multiple threads. In this patch we are using group_util to select idlest group if both groups have equal number of idle_cpus. Comparing the number of idle cpu is not enough in this case, because the newly forked thread sleeps immediately and before we select the cpu for the next one. This is shown in the trace where the same CPU7 is selected for all wakeup_new events. That's why, looking at utilization when there is the same number of CPU is a good way to see where the previous task was placed. Using nr_running doesn't solve the problem because the newly forked task is not running and the cpu would not have been idle in this case and an idle CPU would have been selected instead. With this patch newly created tasks would be better distributed. With this patch: sudo perf record -e sched:sched_wakeup_new -- \ sysbench threads --threads=4 run ... total number of events: 74401 ... sudo perf script sysbench 129455 [006] 75232.853257: sched:sched_wakeup_new: sysbench:129457 [120] success=1 CPU:008 sysbench 129455 [006] 75232.854489: sched:sched_wakeup_new: sysbench:129458 [120] success=1 CPU:009 sysbench 129455 [006] 75232.855732: sched:sched_wakeup_new: sysbench:129459 [120] success=1 CPU:010 sysbench 129455 [006] 75232.856980: sched:sched_wakeup_new: sysbench:129460 [120] success=1 CPU:011 We tested this patch with following benchmarks: master: 'commit b3a9e3b9622a ("Linux 5.8-rc1")' 100 iterations of: perf bench -f simple futex wake -s -t 128 -w 1 Lower result is better | | BASELINE | +PATCH | DELTA (%) | |---------|------------|----------|-------------| | mean | 0.33 | 0.313 | +5.152 | | std (%) | 10.433 | 7.563 | | 100 iterations of: sysbench threads --threads=8 run Higher result is better | | BASELINE | +PATCH | DELTA (%) | |---------|------------|----------|-------------| | mean | 5235.02 | 5863.73 | +12.01 | | std (%) | 8.166 | 10.265 | | 100 iterations of: sysbench mutex --mutex-num=1 --threads=8 run Lower result is better | | BASELINE | +PATCH | DELTA (%) | |---------|------------|----------|-------------| | mean | 0.413 | 0.404 | +2.179 | | std (%) | 3.791 | 1.816 | | Signed-off-by: Peter Puhov <peter.puhov@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200714125941.4174-1-peter.puhov@linaro.org
2020-07-22sched: nohz: stop passing around unused "ticks" parameter.Paul Gortmaker
The "ticks" parameter was added in commit 0f004f5a696a ("sched: Cure more NO_HZ load average woes") since calc_global_nohz() was called and needed the "ticks" argument. But in commit c308b56b5398 ("sched: Fix nohz load accounting -- again!") it became unused as the function calc_global_nohz() dropped using "ticks". Fixes: c308b56b5398 ("sched: Fix nohz load accounting -- again!") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593628458-32290-1-git-send-email-paul.gortmaker@windriver.com
2020-07-22sched: Better document ttwu()Peter Zijlstra
Dave hit the problem fixed by commit: b6e13e85829f ("sched/core: Fix ttwu() race") and failed to understand much of the code involved. Per his request a few comments to (hopefully) clarify things. Requested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200702125211.GQ4800@hirez.programming.kicks-ass.net
2020-07-22Merge branch 'sched/urgent'Peter Zijlstra
2020-07-22sched: Fix race against ptrace_freeze_trace()Peter Zijlstra
There is apparently one site that violates the rule that only current and ttwu() will modify task->state, namely ptrace_{,un}freeze_traced() will change task->state for a remote task. Oleg explains: "TASK_TRACED/TASK_STOPPED was always protected by siglock. In particular, ttwu(__TASK_TRACED) must be always called with siglock held. That is why ptrace_freeze_traced() assumes it can safely do s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock)." This breaks the ordering scheme introduced by commit: dbfb089d360b ("sched: Fix loadavg accounting race") Specifically, the reload not matching no longer implies we don't have to block. Simply things by noting that what we need is a LOAD->STORE ordering and this can be provided by a control dependency. So replace: prev_state = prev->state; raw_spin_lock(&rq->lock); smp_mb__after_spinlock(); /* SMP-MB */ if (... && prev_state && prev_state == prev->state) deactivate_task(); with: prev_state = prev->state; if (... && prev_state) /* CTRL-DEP */ deactivate_task(); Since that already implies the 'prev->state' load must be complete before allowing the 'prev->on_rq = 0' store to become visible. Fixes: dbfb089d360b ("sched: Fix loadavg accounting race") Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-19Linux 5.8-rc6v5.8-rc6Linus Torvalds
2020-07-19Merge tag 'perf-tools-fixes-2020-07-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into master Pull perf tooling fixes from Arnaldo Carvalho de Melo: - Update hashmap.h from libbpf and kvm.h from x86's kernel UAPI. - Set opt->set in libsubcmd's OPT_CALLBACK_SET(). This fixes 'perf record --switch-output-event event-name' usage" * tag 'perf-tools-fixes-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: tools arch kvm: Sync kvm headers with the kernel sources perf tools: Sync hashmap.h with libbpf's libsubcmd: Fix OPT_CALLBACK_SET()
2020-07-19Merge tag 'x86-urgent-2020-07-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull x86 fixes from Thomas Gleixner: "A pile of fixes for x86: - Fix the I/O bitmap invalidation on XEN PV, which was overlooked in the recent ioperm/iopl rework. This caused the TSS and XEN's I/O bitmap to get out of sync. - Use the proper vectors for HYPERV. - Make disabling of stack protector for the entry code work with GCC builds which enable stack protector by default. Removing the option is not sufficient, it needs an explicit -fno-stack-protector to shut it off. - Mark check_user_regs() noinstr as it is called from noinstr code. The missing annotation causes it to be placed in the text section which makes it instrumentable. - Add the missing interrupt disable in exc_alignment_check() - Fixup a XEN_PV build dependency in the 32bit entry code - A few fixes to make the Clang integrated assembler happy - Move EFI stub build to the right place for out of tree builds - Make prepare_exit_to_usermode() static. It's not longer called from ASM code" * tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Don't add the EFI stub to targets x86/entry: Actually disable stack protector x86/ioperm: Fix io bitmap invalidation on Xen PV x86: math-emu: Fix up 'cmp' insn for clang ias x86/entry: Fix vectors to IDTENTRY_SYSVEC for CONFIG_HYPERV x86/entry: Add compatibility with IAS x86/entry/common: Make prepare_exit_to_usermode() static x86/entry: Mark check_user_regs() noinstr x86/traps: Disable interrupts in exc_aligment_check() x86/entry/32: Fix XEN_PV build dependency
2020-07-19Merge tag 'timers-urgent-2020-07-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull timer fixes from Thomas Gleixner: "Two fixes for the timer wheel: - A timer which is already expired at enqueue time can set the base->next_expiry value backwards. As a consequence base->clk can be set back as well. This can lead to timers expiring early. Add a sanity check to prevent this. - When a timer is queued with an expiry time beyond the wheel capacity then it should be queued in the bucket of the last wheel level which is expiring last. The code adjusted the expiry time to the maximum wheel capacity, which is only correct when the wheel clock is 0. Aside of that the check whether the delta is larger than wheel capacity does not check the delta, it checks the expiry value itself. As a result timers can expire at random. Fix this by checking the right variable and adjust expiry time so it becomes base->clock plus capacity which places it into the outmost bucket in the last wheel level" * tag 'timers-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer: Fix wheel index calculation on last level timer: Prevent base->clk from moving backward
2020-07-19Merge tag 'sched-urgent-2020-07-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull scheduler fixes from Thomas Gleixner: "A set of scheduler fixes: - Plug a load average accounting race which was introduced with a recent optimization casing load average to show bogus numbers. - Fix the rseq CPU id initialization for new tasks. sched_fork() does not update the rseq CPU id so the id is the stale id of the parent task, which can cause user space data corruption. - Handle a 0 return value of task_h_load() correctly in the load balancer, which does not decrease imbalance and therefore pulls until the maximum number of loops is reached, which might be all tasks just created by a fork bomb" * tag 'sched-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: handle case of task_h_load() returning 0 sched: Fix unreliable rseq cpu_id for new tasks sched: Fix loadavg accounting race
2020-07-19Merge tag 'irq-urgent-2020-07-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull irq fixes from Thomas Gleixner: "Two fixes for the interrupt subsystem: - Make the handling of the firmware node consistent and do not free the node after the domain has been created successfully. The core code stores a pointer to it which can lead to a use after free or double free. This used to "work" because the pointer was not stored when the initial code was written, but at some point later it was required to store it. Of course nobody noticed that the existing users break that way. - Handle affinity setting on inactive interrupts correctly when hierarchical irq domains are enabled. When interrupts are inactive with the modern hierarchical irqdomain design, the interrupt chips are not necessarily in a state where affinity changes can be handled. The legacy irq chip design allowed this because interrupts are immediately fully initialized at allocation time. X86 has a hacky workaround for this, but other implementations do not. This cased malfunction on GIC-V3. Instead of playing whack a mole to find all affected drivers, change the core code to store the requested affinity setting and then establish it when the interrupt is allocated, which makes the X86 hack go away" * tag 'irq-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Handle affinity setting on inactive interrupts correctly irqdomain/treewide: Keep firmware node unconditionally allocated
2020-07-19Merge tag 'usb-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb into master Pull USB fixes from Greg KH: "Here are a few small USB fixes, and one thunderbolt fix, for 5.8-rc6. Nothing huge in here, just the normal collection of gadget, dwc2/3, serial, and other minor USB driver fixes and id additions. Full details are in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: iuu_phoenix: fix memory corruption USB: c67x00: fix use after free in c67x00_giveback_urb usb: gadget: function: fix missing spinlock in f_uac1_legacy usb: gadget: udc: atmel: fix uninitialized read in debug printk usb: gadget: udc: atmel: remove outdated comment in usba_ep_disable() usb: dwc2: Fix shutdown callback in platform usb: cdns3: trace: fix some endian issues usb: cdns3: ep0: fix some endian issues usb: gadget: udc: gr_udc: fix memleak on error handling path in gr_ep_init() usb: gadget: fix langid kernel-doc warning in usbstring.c usb: dwc3: pci: add support for the Intel Jasper Lake usb: dwc3: pci: add support for the Intel Tiger Lake PCH -H variant usb: chipidea: core: add wakeup support for extcon USB: serial: option: add Quectel EG95 LTE modem thunderbolt: Fix path indices used in USB3 tunnel discovery USB: serial: ch341: add new Product ID for CH340 USB: serial: option: add GosunCn GM500 series USB: serial: cypress_m8: enable Simply Automated UPB PIM
2020-07-19Merge tag 'dma-mapping-5.8-6' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping into master Pull dma-mapping fixes from Christoph Hellwig: "Ensure we always have fully addressable memory in the dma coherent pool (Nicolas Saenz Julienne)" * tag 'dma-mapping-5.8-6' of git://git.infradead.org/users/hch/dma-mapping: dma-pool: do not allocate pool memory from CMA dma-pool: make sure atomic pool suits device dma-pool: introduce dma_guess_pool() dma-pool: get rid of dma_in_atomic_pool() dma-direct: provide function to check physical memory area validity
2020-07-19x86/boot: Don't add the EFI stub to targetsArvind Sankar
vmlinux-objs-y is added to targets, which currently means that the EFI stub gets added to the targets as well. It shouldn't be added since it is built elsewhere. This confuses Makefile.build which interprets the EFI stub as a target $(obj)/$(objtree)/drivers/firmware/efi/libstub/lib.a and will create drivers/firmware/efi/libstub/ underneath arch/x86/boot/compressed, to hold this supposed target, if building out-of-tree. [0] Fix this by pulling the stub out of vmlinux-objs-y into efi-obj-y. [0] See scripts/Makefile.build near the end: # Create directories for object files if they do not exist Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200715032631.1562882-1-nivedita@alum.mit.edu
2020-07-19x86/entry: Actually disable stack protectorKees Cook
Some builds of GCC enable stack protector by default. Simply removing the arguments is not sufficient to disable stack protector, as the stack protector for those GCC builds must be explicitly disabled. Remove the argument removals and add -fno-stack-protector. Additionally include missed x32 argument updates, and adjust whitespace for readability. Fixes: 20355e5f73a7 ("x86/entry: Exclude low level entry code from sanitizing") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/202006261333.585319CA6B@keescook
2020-07-18Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi into master Pull SCSI fix from James Bottomley: "One small driver fix. Although the one liner makes it sound like a cosmetic change, it's a regression fix for the megaraid_sas driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: megaraid_sas: Remove undefined ENABLE_IRQ_POLL macro
2020-07-18Merge tag 'hwmon-for-v5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging into master Pull hwmon fixes from Guenter Roeck: - Using SCT on some Tohsiba drives causes firmware hangs. Disable its use in the drivetemp driver. - Handle potential buffer overflows in scmi and aspeed-pwm-tacho driver. - Energy reporting does not work well on all AMD CPUs. Restrict amd_energy to known working models. - Enable reading the CPU temperature on NCT6798D using undocumented registers. - Fix read errors seen if PEC is enabled in adm1275 driver. - Fix setting the pwm1_enable in emc2103 driver. * tag 'hwmon-for-v5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drives hwmon: (scmi) Fix potential buffer overflow in scmi_hwmon_probe() hwmon: (nct6775) Accept PECI Calibration as temperature source for NCT6798D hwmon: (adm1275) Make sure we are reading enough data for different chips hwmon: (emc2103) fix unable to change fan pwm1_enable attribute hwmon: (amd_energy) match for supported models hwmon: (aspeed-pwm-tacho) Avoid possible buffer overflow
2020-07-18Merge tag 'riscv-for-linus-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux into master Pull RISC-V fixes from Palmer Dabbelt: "Two fixes: - 16KiB kernel stacks on rv64, which fixes a lot of crashes. - Rolling an mmiowb() into the scheduler, which when combined with Will's fix to the mmiowb()-on-spinlock should fix the PREEMPT issues we've been seeing" * tag 'riscv-for-linus-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Upgrade smp_mb__after_spinlock() to iorw,iorw riscv: use 16KB kernel stack on 64-bit
2020-07-18Merge tag 'powerpc-5.8-7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into master Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.8: - A fix to the VAS code we merged this cycle, to report the proper error code to userspace for address translation failures. And a selftest update to match. - Another fix for our pkey handling of PROT_EXEC mappings. - A fix for a crash when booting a "secure VM" under an ultravisor with certain numbers of CPUs. Thanks to: Aneesh Kumar K.V, Haren Myneni, Laurent Dufour, Sandipan Das, Satheesh Rajendran, Thiago Jung Bauermann" * tag 'powerpc-5.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Use proper error code to check fault address powerpc/vas: Report proper error code for address translation failure powerpc/pseries/svm: Fix incorrect check for shared_lppaca_size powerpc/book3s64/pkeys: Fix pkey_access_permitted() for execute disable pkey
2020-07-18hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drivesMaciej S. Szmigiero
It has been observed that Toshiba DT01ACA family drives have WRITE FPDMA QUEUED command timeouts and sometimes just freeze until power-cycled under heavy write loads when their temperature is getting polled in SCT mode. The SMART mode seems to be fine, though. Let's make sure we don't use SCT mode for these drives then. While only the 3 TB model was actually caught exhibiting the problem let's play safe here to avoid data corruption and extend the ban to the whole family. Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors") Cc: stable@vger.kernel.org Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Link: https://lore.kernel.org/r/0cb2e7022b66c6d21d3f189a12a97878d0e7511b.1595075458.git.mail@maciej.szmigiero.name Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-07-18x86/ioperm: Fix io bitmap invalidation on Xen PVAndy Lutomirski
tss_invalidate_io_bitmap() wasn't wired up properly through the pvop machinery, so the TSS and Xen's io bitmap would get out of sync whenever disabling a valid io bitmap. Add a new pvop for tss_invalidate_io_bitmap() to fix it. This is XSA-329. Fixes: 22fe5b0439dd ("x86/ioperm: Move TSS bitmap update to exit to user work") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/d53075590e1f91c19f8af705059d3ff99424c020.1595030016.git.luto@kernel.org
2020-07-17Merge tag 'nfs-for-5.8-3' of git://git.linux-nfs.org/projects/anna/linux-nfs ↵Linus Torvalds
into master Pull NFS client fixes from Anna Schumaker: "A few more NFS client bugfixes for Linux 5.8: NFS: - Fix interrupted slots by using the SEQUENCE operation SUNRPC: - revert d03727b248d0 to fix unkillable IOs xprtrdma: - Fix double-free in rpcrdma_ep_create() - Fix recursion into rpcrdma_xprt_disconnect() - Fix return code from rpcrdma_xprt_connect() - Fix handling of connect errors - Fix incorrect header size calculations" * tag 'nfs-for-5.8-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO compeletion") xprtrdma: fix incorrect header size calculations NFS: Fix interrupted slots by sending a solo SEQUENCE operation xprtrdma: Fix handling of connect errors xprtrdma: Fix return code from rpcrdma_xprt_connect() xprtrdma: Fix recursion into rpcrdma_xprt_disconnect() xprtrdma: Fix double-free in rpcrdma_ep_create()
2020-07-17Merge tag 'arm-fixes-5.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc into master Pull ARM SoC fixes from Arnd Bergmann: "This time there are a number of actual code fixes, plus a small set of device tree issues getting addressed: Renesas: - one defconfig cleanup to allow a later Kconfig change Intel socfpga: - enable QSPI devices on some machines - fix DTC validation warnings TI OMAP: - Two DEBUG_ATOMIC_SLEEP fixes for ti-sysc interconnect target module driver - A regression fix for ti-sysc no-idle handling that caused issues compared to earlier platform data based booting - A fix for memory leak for omap_hwmod_allocate_module - Fix d_can driver probe for am437x NXP i.MX: - A couple of fixes on i.MX platform device registration code to stop the use of invalid IRQ 0. - Fix a regression seen on ls1021a platform, caused by commit 52102a3ba6a61 ("soc: imx: move cpu code to drivers/soc/imx"). - Fix a misconfiguration of audio SSI on imx6qdl-gw551x board. Amlogic Meson: - misc DT fixes - SoC ID fixes to detect all chips correctly" * tag 'arm-fixes-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: spcfpga: Align GIC, NAND and UART nodenames with dtschema ARM: dts: socfpga: Align L2 cache-controller nodename with dtschema arm64: dts: stratix10: increase QSPI reg address in nand dts file arm64: dts: stratix10: add status to qspi dts node arm64: dts: agilex: add status to qspi dts node ARM: dts: Fix dcan driver probe failed on am437x platform ARM: OMAP2+: Fix possible memory leak in omap_hwmod_allocate_module arm64: defconfig: Enable CONFIG_PCIE_RCAR_HOST soc: imx: check ls1021a ARM: imx: Remove imx_add_imx_dma() unused irq_err argument ARM: imx: Provide correct number of resources when registering gpio devices ARM: dts: imx6qdl-gw551x: fix audio SSI bus: ti-sysc: Do not disable on suspend for no-idle bus: ti-sysc: Fix sleeping function called from invalid context for RTC quirk bus: ti-sysc: Fix wakeirq sleeping function called from invalid context ARM: dts: meson: Align L2 cache-controller nodename with dtschema arm64: dts: meson-gxl-s805x: reduce initial Mali450 core frequency arm64: dts: meson: add missing gxl rng clock soc: amlogic: meson-gx-socinfo: Fix S905X3 and S905D3 ID's
2020-07-17Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into master Pull arm64 fixes from Will Deacon: "A batch of arm64 fixes. Although the diffstat is a bit larger than we'd usually have at this stage, a decent amount of it is the addition of comments describing our syscall tracing behaviour, and also a sweep across all the modular arm64 PMU drivers to make them rebust against unloading and unbinding. There are a couple of minor things kicking around at the moment (CPU errata and module PLTs for very large modules), but I'm not expecting any significant changes now for us in 5.8. - Fix kernel text addresses for relocatable images booting using EFI and with KASLR disabled so that they match the vmlinux ELF binary. - Fix unloading and unbinding of PMU driver modules. - Fix generic mmiowb() when writeX() is called from preemptible context (reported by the riscv folks). - Fix ptrace hardware single-step interactions with signal handlers, system calls and reverse debugging. - Fix reporting of 64-bit x0 register for 32-bit tasks via 'perf_regs'. - Add comments describing syscall entry/exit tracing ABI" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: drivers/perf: Prevent forced unbinding of PMU drivers asm-generic/mmiowb: Allow mmiowb_set_pending() when preemptible() arm64: Use test_tsk_thread_flag() for checking TIF_SINGLESTEP arm64: ptrace: Use NO_SYSCALL instead of -1 in syscall_trace_enter() arm64: syscall: Expand the comment about ptrace and syscall(-1) arm64: ptrace: Add a comment describing our syscall entry/exit trap ABI arm64: compat: Ensure upper 32 bits of x0 are zero on syscall return arm64: ptrace: Override SPSR.SS when single-stepping is enabled arm64: ptrace: Consistently use pseudo-singlestep exceptions drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling efi/libstub/arm64: Retain 2MB kernel Image alignment if !KASLR
2020-07-17genirq/affinity: Handle affinity setting on inactive interrupts correctlyThomas Gleixner
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests. X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS. Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then: - Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on. Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations. For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design. Remove the X86 workaround as it is not longer required. Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
2020-07-17timer: Fix wheel index calculation on last levelFrederic Weisbecker
When an expiration delta falls into the last level of the wheel, that delta has be compared against the maximum possible delay and reduced to fit in if necessary. However instead of comparing the delta against the maximum, the code compares the actual expiry against the maximum. Then instead of fixing the delta to fit in, it sets the maximum delta as the expiry value. This can result in various undesired outcomes, the worst possible one being a timer expiring 15 days ahead to fire immediately. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200717140551.29076-2-frederic@kernel.org
2020-07-17SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO ↵Olga Kornievskaia
compeletion") Reverting commit d03727b248d0 "NFSv4 fix CLOSE not waiting for direct IO compeletion". This patch made it so that fput() by calling inode_dio_done() in nfs_file_release() would wait uninterruptably for any outstanding directIO to the file (but that wait on IO should be killable). The problem the patch was also trying to address was REMOVE returning ERR_ACCESS because the file is still opened, is supposed to be resolved by server returning ERR_FILE_OPEN and not ERR_ACCESS. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-17Merge tag 'io_uring-5.8-2020-07-17' of git://git.kernel.dk/linux-block into ↵Linus Torvalds
master Pull io_uring fix from Jens Axboe: "Fix for a case where, with automatic buffer selection, we can leak the buffer descriptor for recvmsg" * tag 'io_uring-5.8-2020-07-17' of git://git.kernel.dk/linux-block: io_uring: fix recvmsg memory leak with buffer selection
2020-07-17Merge tag 'block-5.8-2020-07-17' of git://git.kernel.dk/linux-block into masterLinus Torvalds
Pull block fix from Jens Axboe: "Single NVMe multipath capacity fix" * tag 'block-5.8-2020-07-17' of git://git.kernel.dk/linux-block: nvme: explicitly update mpath disk capacity on revalidation
2020-07-17Merge tag 'fuse-fixes-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into master Pull fuse fixes from Miklos Szeredi: - two regressions in this cycle caused by the conversion of writepage list to an rb_tree - two regressions in v5.4 cause by the conversion to the new mount API - saner behavior of fsconfig(2) for the reconfigure case - an ancient issue with FS_IOC_{GET,SET}FLAGS ioctls * tag 'fuse-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS fuse: don't ignore errors from fuse_writepages_fill() fuse: clean up condition for writepage sending fuse: reject options on reconfigure via fsconfig(2) fuse: ignore 'data' argument of mount(..., MS_REMOUNT) fuse: use ->reconfigure() instead of ->remount_fs() fuse: fix warning in tree_insert() and clean up writepage insertion fuse: move rb_erase() before tree_insert()
2020-07-17Merge tag 'ovl-fixes-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into master Pull overlayfs fixes from Miklos Szeredi: - fix a regression introduced in v4.20 in handling a regenerated squashfs lower layer - two regression fixes for this cycle, one of which is Oops inducing - miscellaneous issues * tag 'ovl-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix lookup of indexed hardlinks with metacopy ovl: fix unneeded call to ovl_change_flags() ovl: fix mount option checks for nfs_export with no upperdir ovl: force read-only sb on failure to create index dir ovl: fix regression with re-formatted lower squashfs ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=on ovl: relax WARN_ON() when decoding lower directory file handle ovl: remove not used argument in ovl_check_origin ovl: change ovl_copy_up_flags static ovl: inode reference leak in ovl_is_inuse true case.
2020-07-17Merge tag 'spi-fix-v5.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into master Pull spi fixes from Mark Brown: "A couple of small driver specific fixes for fairly minor issues" * tag 'spi-fix-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-sun6i: sun6i_spi_transfer_one(): fix setting of clock rate spi: mediatek: use correct SPI_CFG2_REG MACRO
2020-07-17Merge tag 'regulator-fix-v5.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into master Pull regulator fixes from Mark Brown: "The more substantial fix here is the rename of the da903x driver which fixes a collision with the parent MFD driver name which caused issues when things were built as modules. There's also a fix for a mislableled regulator on the pmi8994 which is quite important for systems with that device" * tag 'regulator-fix-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: MAINTAINERS: remove obsolete entry after file renaming regulator: rename da903x to da903x-regulator regulator: qcom_smd: Fix pmi8994 label
2020-07-17Merge tag 'regmap-fix-v5.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap into master Pull regmap fixes from Mark Brown: "A couple of substantial fixes here, one from Doug which fixes the debugfs code for MMIO regmaps (fortunately not the common case) and one from Marc fixing lookups of multiple regmaps for the same device (a very unusual case). There's also a fix for Kconfig to ensure we enable SoundWire properly" * tag 'regmap-fix-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: debugfs: Don't sleep while atomic for fast_io regmaps regmap: add missing dependency on SoundWire regmap: dev_get_regmap_match(): fix string comparison