diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-19 16:35:06 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-19 16:35:06 -0800 |
commit | bf9aa14fc523d2763fc9a10672a709224e8fcaf4 (patch) | |
tree | 7d9c0cad473dc27a0c9bb09c561511df9481b066 /arch | |
parent | 035238752319a58244d86facd442c5f40b0e97e2 (diff) | |
parent | cdc905d16b07981363e53a21853ba1cf6cd8e92a (diff) | |
download | lwn-bf9aa14fc523d2763fc9a10672a709224e8fcaf4.tar.gz lwn-bf9aa14fc523d2763fc9a10672a709224e8fcaf4.zip |
Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
Diffstat (limited to 'arch')
-rw-r--r-- | arch/arm/kernel/smp_twd.c | 1 | ||||
-rw-r--r-- | arch/mips/ralink/Kconfig | 7 | ||||
-rw-r--r-- | arch/mips/ralink/Makefile | 2 | ||||
-rw-r--r-- | arch/mips/ralink/cevt-rt3352.c | 153 | ||||
-rw-r--r-- | arch/powerpc/kernel/rtas.c | 21 | ||||
-rw-r--r-- | arch/riscv/configs/defconfig | 1 | ||||
-rw-r--r-- | arch/x86/Kconfig | 1 | ||||
-rw-r--r-- | arch/x86/include/asm/timer.h | 2 | ||||
-rw-r--r-- | arch/x86/kvm/xen.c | 12 |
9 files changed, 9 insertions, 191 deletions
diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c index 9a14f721a2b0..42a3706e16a6 100644 --- a/arch/arm/kernel/smp_twd.c +++ b/arch/arm/kernel/smp_twd.c @@ -93,7 +93,6 @@ static void twd_timer_stop(void) { struct clock_event_device *clk = raw_cpu_ptr(twd_evt); - twd_shutdown(clk); disable_percpu_irq(clk->irq); } diff --git a/arch/mips/ralink/Kconfig b/arch/mips/ralink/Kconfig index 08c012a2591f..910d059ec70b 100644 --- a/arch/mips/ralink/Kconfig +++ b/arch/mips/ralink/Kconfig @@ -1,13 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 if RALINK -config CLKEVT_RT3352 - bool - depends on SOC_RT305X || SOC_MT7620 - default y - select TIMER_OF - select CLKSRC_MMIO - config RALINK_ILL_ACC bool depends on SOC_RT305X diff --git a/arch/mips/ralink/Makefile b/arch/mips/ralink/Makefile index 26fabbdea1f1..0c109eae1953 100644 --- a/arch/mips/ralink/Makefile +++ b/arch/mips/ralink/Makefile @@ -10,8 +10,6 @@ ifndef CONFIG_MIPS_GIC obj-y += clk.o timer.o endif -obj-$(CONFIG_CLKEVT_RT3352) += cevt-rt3352.o - obj-$(CONFIG_RALINK_ILL_ACC) += ill_acc.o obj-$(CONFIG_IRQ_INTC) += irq.o diff --git a/arch/mips/ralink/cevt-rt3352.c b/arch/mips/ralink/cevt-rt3352.c deleted file mode 100644 index 269d4877d120..000000000000 --- a/arch/mips/ralink/cevt-rt3352.c +++ /dev/null @@ -1,153 +0,0 @@ -/* - * This file is subject to the terms and conditions of the GNU General Public - * License. See the file "COPYING" in the main directory of this archive - * for more details. - * - * Copyright (C) 2013 by John Crispin <john@phrozen.org> - */ - -#include <linux/clockchips.h> -#include <linux/clocksource.h> -#include <linux/interrupt.h> -#include <linux/reset.h> -#include <linux/init.h> -#include <linux/time.h> -#include <linux/of.h> -#include <linux/of_irq.h> -#include <linux/of_address.h> - -#include <asm/mach-ralink/ralink_regs.h> - -#define SYSTICK_FREQ (50 * 1000) - -#define SYSTICK_CONFIG 0x00 -#define SYSTICK_COMPARE 0x04 -#define SYSTICK_COUNT 0x08 - -/* route systick irq to mips irq 7 instead of the r4k-timer */ -#define CFG_EXT_STK_EN 0x2 -/* enable the counter */ -#define CFG_CNT_EN 0x1 - -struct systick_device { - void __iomem *membase; - struct clock_event_device dev; - int irq_requested; - int freq_scale; -}; - -static int systick_set_oneshot(struct clock_event_device *evt); -static int systick_shutdown(struct clock_event_device *evt); - -static int systick_next_event(unsigned long delta, - struct clock_event_device *evt) -{ - struct systick_device *sdev; - u32 count; - - sdev = container_of(evt, struct systick_device, dev); - count = ioread32(sdev->membase + SYSTICK_COUNT); - count = (count + delta) % SYSTICK_FREQ; - iowrite32(count, sdev->membase + SYSTICK_COMPARE); - - return 0; -} - -static void systick_event_handler(struct clock_event_device *dev) -{ - /* noting to do here */ -} - -static irqreturn_t systick_interrupt(int irq, void *dev_id) -{ - struct clock_event_device *dev = (struct clock_event_device *) dev_id; - - dev->event_handler(dev); - - return IRQ_HANDLED; -} - -static struct systick_device systick = { - .dev = { - /* - * cevt-r4k uses 300, make sure systick - * gets used if available - */ - .rating = 310, - .features = CLOCK_EVT_FEAT_ONESHOT, - .set_next_event = systick_next_event, - .set_state_shutdown = systick_shutdown, - .set_state_oneshot = systick_set_oneshot, - .event_handler = systick_event_handler, - }, -}; - -static int systick_shutdown(struct clock_event_device *evt) -{ - struct systick_device *sdev; - - sdev = container_of(evt, struct systick_device, dev); - - if (sdev->irq_requested) - free_irq(systick.dev.irq, &systick.dev); - sdev->irq_requested = 0; - iowrite32(0, systick.membase + SYSTICK_CONFIG); - - return 0; -} - -static int systick_set_oneshot(struct clock_event_device *evt) -{ - const char *name = systick.dev.name; - struct systick_device *sdev; - int irq = systick.dev.irq; - - sdev = container_of(evt, struct systick_device, dev); - - if (!sdev->irq_requested) { - if (request_irq(irq, systick_interrupt, - IRQF_PERCPU | IRQF_TIMER, name, &systick.dev)) - pr_err("Failed to request irq %d (%s)\n", irq, name); - } - sdev->irq_requested = 1; - iowrite32(CFG_EXT_STK_EN | CFG_CNT_EN, - systick.membase + SYSTICK_CONFIG); - - return 0; -} - -static int __init ralink_systick_init(struct device_node *np) -{ - int ret; - - systick.membase = of_iomap(np, 0); - if (!systick.membase) - return -ENXIO; - - systick.dev.name = np->name; - clockevents_calc_mult_shift(&systick.dev, SYSTICK_FREQ, 60); - systick.dev.max_delta_ns = clockevent_delta2ns(0x7fff, &systick.dev); - systick.dev.max_delta_ticks = 0x7fff; - systick.dev.min_delta_ns = clockevent_delta2ns(0x3, &systick.dev); - systick.dev.min_delta_ticks = 0x3; - systick.dev.irq = irq_of_parse_and_map(np, 0); - if (!systick.dev.irq) { - pr_err("%pOFn: request_irq failed", np); - return -EINVAL; - } - - ret = clocksource_mmio_init(systick.membase + SYSTICK_COUNT, np->name, - SYSTICK_FREQ, 301, 16, - clocksource_mmio_readl_up); - if (ret) - return ret; - - clockevents_register_device(&systick.dev); - - pr_info("%pOFn: running - mult: %d, shift: %d\n", - np, systick.dev.mult, systick.dev.shift); - - return 0; -} - -TIMER_OF_DECLARE(systick, "ralink,cevt-systick", ralink_systick_init); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index f7e86e09c49f..d31c9799cab2 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1390,21 +1390,14 @@ bool __ref rtas_busy_delay(int status) */ ms = clamp(ms, 1U, 1000U); /* - * The delay hint is an order-of-magnitude suggestion, not - * a minimum. It is fine, possibly even advantageous, for - * us to pause for less time than hinted. For small values, - * use usleep_range() to ensure we don't sleep much longer - * than actually needed. - * - * See Documentation/timers/timers-howto.rst for - * explanation of the threshold used here. In effect we use - * usleep_range() for 9900 and 9901, msleep() for - * 9902-9905. + * The delay hint is an order-of-magnitude suggestion, not a + * minimum. It is fine, possibly even advantageous, for us to + * pause for less time than hinted. To make sure pause time will + * not be way longer than requested independent of HZ + * configuration, use fsleep(). See fsleep() for details of + * used sleeping functions. */ - if (ms <= 20) - usleep_range(ms * 100, ms * 1000); - else - msleep(ms); + fsleep(ms * 1000); break; case RTAS_BUSY: ret = true; diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig index 5b1d6325df85..1d5e13b148f2 100644 --- a/arch/riscv/configs/defconfig +++ b/arch/riscv/configs/defconfig @@ -302,7 +302,6 @@ CONFIG_DEBUG_MEMORY_INIT=y CONFIG_DEBUG_PER_CPU_MAPS=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_WQ_WATCHDOG=y -CONFIG_DEBUG_TIMEKEEPING=y CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8f5362a2498f..a3c31b784edc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -146,7 +146,6 @@ config X86 select ARCH_HAS_PARANOID_L1D_FLUSH select BUILDTIME_TABLE_SORT select CLKEVT_I8253 - select CLOCKSOURCE_VALIDATE_LAST_CYCLE select CLOCKSOURCE_WATCHDOG # Word-size accesses may read uninitialized data past the trailing \0 # in strings and cause false KMSAN reports. diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h index 7365dd4acffb..23baf8c9b34c 100644 --- a/arch/x86/include/asm/timer.h +++ b/arch/x86/include/asm/timer.h @@ -6,8 +6,6 @@ #include <linux/interrupt.h> #include <linux/math64.h> -#define TICK_SIZE (tick_nsec / 1000) - unsigned long long native_sched_clock(void); extern void recalibrate_cpu_khz(void); diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 622fe24da910..a909b817b9c0 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -263,13 +263,6 @@ static void kvm_xen_stop_timer(struct kvm_vcpu *vcpu) atomic_set(&vcpu->arch.xen.timer_pending, 0); } -static void kvm_xen_init_timer(struct kvm_vcpu *vcpu) -{ - hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, - HRTIMER_MODE_ABS_HARD); - vcpu->arch.xen.timer.function = xen_timer_callback; -} - static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) { struct kvm_vcpu_xen *vx = &v->arch.xen; @@ -1070,9 +1063,6 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) break; } - if (!vcpu->arch.xen.timer.function) - kvm_xen_init_timer(vcpu); - /* Stop the timer (if it's running) before changing the vector */ kvm_xen_stop_timer(vcpu); vcpu->arch.xen.timer_virq = data->u.timer.port; @@ -2235,6 +2225,8 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) vcpu->arch.xen.poll_evtchn = 0; timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0); + hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD); + vcpu->arch.xen.timer.function = xen_timer_callback; kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm); kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm); |