lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2020-12-12	md: change mddev 'chunk_sectors' from int to unsigned	Mike Snitzer
	Commit e2782f560c29 ("Revert "dm raid: remove unnecessary discard limits for raid10"") exposed compiler warnings introduced by commit e0910c8e4f87 ("dm raid: fix discard limits for raid1 and raid10"): In file included from ./include/linux/kernel.h:14, from ./include/asm-generic/bug.h:20, from ./arch/x86/include/asm/bug.h:93, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/gfp.h:5, from ./include/linux/slab.h:15, from drivers/md/dm-raid.c:8: drivers/md/dm-raid.c: In function ‘raid_io_hints’: ./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) )1 == (typeof(y) )1))) ^~ ./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’ (__typecheck(x, y) && __no_side_effects(x, y)) ^~~~~~~~~~~ ./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’ __builtin_choose_expr(__safe_cmp(x, y), \ ^~~~~~~~~~ ./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’ #define min(x, y) __careful_cmp(x, y, <) ^~~~~~~~~~~~~ ./include/linux/minmax.h:84:39: note: in expansion of macro ‘min’ __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); }) ^~~ drivers/md/dm-raid.c:3739:33: note: in expansion of macro ‘min_not_zero’ limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors, ^~~~~~~~~~~~ Fix this by changing the chunk_sectors member of 'struct mddev' from int to 'unsigned int' to match the type used for the 'chunk_sectors' member of 'struct queue_limits'. Various MD code still uses 'int' but none of it appears to ever make use of signed int; and storing positive signed int in unsigned is perfectly safe. Reported-by: Song Liu <songliubraving@fb.com> Fixes: e2782f560c29 ("Revert "dm raid: remove unnecessary discard limits for raid10"") Fixes: e0910c8e4f87 ("dm raid: fix discard limits for raid1 and raid10") Cc: stable@vger,kernel.org # e0910c8e4f87 was marked for stable@ Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-12	x86/kprobes: Fix optprobe to detect INT3 padding correctly	Masami Hiramatsu
	Commit 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes") changed the padding bytes between functions from NOP to INT3. However, when optprobe decodes a target function it finds INT3 and gives up the jump optimization. Instead of giving up any INT3 detection, check whether the rest of the bytes to the end of the function are INT3. If all of them are INT3, those come from the linker. In that case, continue the optprobe jump optimization. [ bp: Massage commit message. ] Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes") Reported-by: Adam Zabrocki <pi3@pi3.com.pl> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2
2020-12-12	Merge tag 'timers-v5.11-2' of ↵	Thomas Gleixner
	https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clocksource/events updates from Daniel Lezcano: - Fix error handling if no clock is available on dw_apb_timer_of (Dinh Nguyen) - Fix overhead for erratum handling when the timer has no erratum and fix fault programing for the event stream on the arm arch timer (Keqian Zhu) - Fix potential deadlock when calling runtime PM on sh_cmt (Niklas Söderlund)
2020-12-11	Input: goodix - add upside-down quirk for Teclast X98 Pro tablet	Simon Beginn
	The touchscreen on the Teclast x98 Pro is also mounted upside-down in relation to the display orientation. Signed-off-by: Simon Beginn <linux@simonmicro.de> Signed-off-by: Bastien Nocera <hadess@hadess.net> Link: https://lore.kernel.org/r/20201117004253.27A5A27EFD@localhost Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-12-11	tools/kvm_stat: Exempt time-based counters	Stefan Raspl
	The new counters halt_poll_success_ns and halt_poll_fail_ns do not count events. Instead they provide a time, and mess up our statistics. Therefore, we should exclude them. Removal is currently implemented with an exempt list. If more counters like these appear, we can think about a more general rule like excluding all fields name "*_ns", in case that's a standing convention. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Tested-and-reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20201208210829.101324-1-raspl@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-11	KVM: mmu: Fix SPTE encoding of MMIO generation upper half	Maciej S. Szmigiero
	Commit cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling") cleaned up the computation of MMIO generation SPTE masks, however it introduced a bug how the upper part was encoded: SPTE bits 52-61 were supposed to contain bits 10-19 of the current generation number, however a missing shift encoded bits 1-10 there instead (mostly duplicating the lower part of the encoded generation number that then consisted of bits 1-9). In the meantime, the upper part was shrunk by one bit and moved by subsequent commits to become an upper half of the encoded generation number (bits 9-17 of bits 0-17 encoded in a SPTE). In addition to the above, commit 56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation") has changed the SPTE bit range assigned to encode the generation number and the total number of bits encoded but did not update them in the comment attached to their defines, nor in the KVM MMU doc. Let's do it here, too, since it is too trivial thing to warrant a separate commit. Fixes: cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096.git.maciej.szmigiero@oracle.com> Cc: stable@vger.kernel.org [Reorganize macros so that everything is computed from the bit ranges. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-11	Merge tag 'mtd/fixes-for-5.10-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal: "Second series of fixes for raw NAND drivers initiated because of a rework of the ECC engine subsystem. The location of the DT parsing logic got moved, breaking several drivers which in fact were not doing the ECC engine initialization at the right place. These drivers have been fixed by enforcing a particular ECC engine type and algorithm, software Hamming, while the algorithm may be overwritten by a DT property. This merge request fixes this in the xway, socrates, plat_nand, pasemi, orion, mpc5121, gpio, au1550 and ams-delta controller drivers" * tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: xway: Do not force a particular software ECC engine mtd: rawnand: socrates: Do not force a particular software ECC engine mtd: rawnand: plat_nand: Do not force a particular software ECC engine mtd: rawnand: pasemi: Do not force a particular software ECC engine mtd: rawnand: orion: Do not force a particular software ECC engine mtd: rawnand: mpc5121: Do not force a particular software ECC engine mtd: rawnand: gpio: Do not force a particular software ECC engine mtd: rawnand: au1550: Do not force a particular software ECC engine mtd: rawnand: ams-delta: Do not force a particular software ECC engine
2020-12-11	Merge tag 'mmc-v5.10-rc4-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "A couple of MMC fixes: MMC core: - Fixup condition for CMD13 polling for RPMB requests MMC host: - mtk-sd: Fix system suspend/resume support for CQHCI - mtd-sd: Extend SDIO IRQ fix to more variants - sdhci-of-arasan: Fix clock registration error for Keem Bay SOC - tmio: Bring HW to a sane state after a power off" * tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: mediatek: mark PM functions as __maybe_unused mmc: block: Fixup condition for CMD13 polling for RPMB requests mmc: tmio: improve bringing HW to a sane state with MMC_POWER_OFF mmc: sdhci-of-arasan: Fix clock registration error for Keem Bay SOC mmc: mediatek: Extend recheck_sdio_irq fix to more variants mmc: mediatek: Fix system suspend/resume support for CQHCI
2020-12-11	Merge tag 'at24-fixes-for-v5.10' of ↵	Wolfram Sang
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current at24 fixes for v5.10 - fix NVMEM name with custom AT24 device name
2020-12-11	Merge tag 'zonefs-5.10-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: "A single patch in this pull request to fix a BIO and page reference leak when writing sequential zone files" * tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: fix page reference and BIO leak
2020-12-11	tick/sched: Make jiffies update quick check more robust	Thomas Gleixner
	The quick check in tick_do_update_jiffies64() whether jiffies need to be updated is not really correct under all circumstances and on all architectures, especially not on 32bit systems. The quick check does: if (now < READ_ONCE(tick_next_period)) return; and the counterpart in the update is: WRITE_ONCE(tick_next_period, next_update_time); This has two problems: 1) On weakly ordered architectures there is no guarantee that the stores before the WRITE_ONCE() are visible which means that other CPUs can operate on a stale jiffies value. 2) On 32bit the store of tick_next_period which is an u64 is split into two 32bit stores. If the first 32bit store advances tick_next_period far out and the second 32bit store is delayed (virt, NMI ...) then jiffies will become stale until the second 32bit store happens. Address this by seperating the handling for 32bit and 64bit. On 64bit problem #1 is addressed by replacing READ_ONCE() / WRITE_ONCE() with smp_load_acquire() / smp_store_release(). On 32bit problem #2 is addressed by protecting the quick check with the jiffies sequence counter. The load and stores can be plain because the sequence count mechanics provides the required barriers already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/87czzpc02w.fsf@nanos.tec.linutronix.de
2020-12-11	bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers	Andrii Nakryiko
	Remove bpf_ prefix, which causes these helpers to be reported in verifier dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets fix it as long as it is still possible before UAPI freezes on these helpers. Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge misc fixes from Andrew Morton: "8 patches. Subsystems affected by this patch series: proc, selftests, kbuild, and mm (pagecache, kasan, hugetlb)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/hugetlb: clear compound_nr before freeing gigantic pages kasan: fix object remaining in offline per-cpu quarantine elfcore: fix building with clang initramfs: fix clang build failure kbuild: avoid static_assert for genksyms selftest/fpu: avoid clang warning proc: use untagged_addr() for pagemap_read addresses revert "mm/filemap: add static for function __add_to_page_cache_locked"
2020-12-11	mm/hugetlb: clear compound_nr before freeing gigantic pages	Gerald Schaefer
	Commit 1378a5ee451a ("mm: store compound_nr as well as compound_order") added compound_nr counter to first tail struct page, overlaying with page->mapping. The overlay itself is fine, but while freeing gigantic hugepages via free_contig_range(), a "bad page" check will trigger for non-NULL page->mapping on the first tail page: BUG: Bad page state in process bash pfn:380001 page:00000000c35f0856 refcount:0 mapcount:0 mapping:00000000126b68aa index:0x0 pfn:0x380001 aops:0x0 flags: 0x3ffff00000000000() raw: 3ffff00000000000 0000000000000100 0000000000000122 0000000100000000 raw: 0000000000000000 0000000000000000 ffffffff00000000 0000000000000000 page dumped because: non-NULL mapping Modules linked in: CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-20201208 #1 Hardware name: IBM 3906 M03 703 (LPAR) Call Trace: show_stack+0x6e/0xe8 dump_stack+0x90/0xc8 bad_page+0xd6/0x130 free_pcppages_bulk+0x26a/0x800 free_unref_page+0x6e/0x90 free_contig_range+0x94/0xe8 update_and_free_page+0x1c4/0x2c8 free_pool_huge_page+0x11e/0x138 set_max_huge_pages+0x228/0x300 nr_hugepages_store_common+0xb8/0x130 kernfs_fop_write+0xd2/0x218 vfs_write+0xb0/0x2b8 ksys_write+0xac/0xe0 system_call+0xe6/0x288 Disabling lock debugging due to kernel taint This is because only the compound_order is cleared in destroy_compound_gigantic_page(), and compound_nr is set to 1U << order == 1 for order 0 in set_compound_order(page, 0). Fix this by explicitly clearing compound_nr for first tail page after calling set_compound_order(page, 0). Link: https://lkml.kernel.org/r/20201208182813.66391-2-gerald.schaefer@linux.ibm.com Fixes: 1378a5ee451a ("mm: store compound_nr as well as compound_order") Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	kasan: fix object remaining in offline per-cpu quarantine	Kuan-Ying Lee
	We hit this issue in our internal test. When enabling generic kasan, a kfree()'d object is put into per-cpu quarantine first. If the cpu goes offline, object still remains in the per-cpu quarantine. If we call kmem_cache_destroy() now, slub will report "Objects remaining" error. ============================================================================= BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200 CPU: 3 PID: 176 Comm: cat Tainted: G B 5.10.0-rc1-00007-g4525c8781ec0-dirty #10 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2b0 show_stack+0x18/0x68 dump_stack+0xfc/0x168 slab_err+0xac/0xd4 __kmem_cache_shutdown+0x1e4/0x3c8 kmem_cache_destroy+0x68/0x130 test_version_show+0x84/0xf0 module_attr_show+0x40/0x60 sysfs_kf_seq_show+0x128/0x1c0 kernfs_seq_show+0xa0/0xb8 seq_read+0x1f0/0x7e8 kernfs_fop_read+0x70/0x338 vfs_read+0xe4/0x250 ksys_read+0xc8/0x180 __arm64_sys_read+0x44/0x58 el0_svc_common.constprop.0+0xac/0x228 do_el0_svc+0x38/0xa0 el0_sync_handler+0x170/0x178 el0_sync+0x174/0x180 INFO: Object 0x(____ptrval____) @offset=15848 INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172 stack_trace_save+0x9c/0xd0 set_track+0x64/0xf0 alloc_debug_processing+0x104/0x1a0 ___slab_alloc+0x628/0x648 __slab_alloc.isra.0+0x2c/0x58 kmem_cache_alloc+0x560/0x588 test_version_show+0x98/0xf0 module_attr_show+0x40/0x60 sysfs_kf_seq_show+0x128/0x1c0 kernfs_seq_show+0xa0/0xb8 seq_read+0x1f0/0x7e8 kernfs_fop_read+0x70/0x338 vfs_read+0xe4/0x250 ksys_read+0xc8/0x180 __arm64_sys_read+0x44/0x58 el0_svc_common.constprop.0+0xac/0x228 kmem_cache_destroy test_module_slab: Slab cache still has objects Register a cpu hotplug function to remove all objects in the offline per-cpu quarantine when cpu is going offline. Set a per-cpu variable to indicate this cpu is offline. [qiang.zhang@windriver.com: fix slab double free when cpu-hotplug] Link: https://lkml.kernel.org/r/20201204102206.20237-1-qiang.zhang@windriver.com Link: https://lkml.kernel.org/r/1606895585-17382-2-git-send-email-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Guangye Yang <guangye.yang@mediatek.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Qian Cai <qcai@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	elfcore: fix building with clang	Arnd Bergmann
	kernel/elfcore.c only contains weak symbols, which triggers a bug with clang in combination with recordmcount: Cannot find symbol for section 2: .text. kernel/elfcore.o: failed Move the empty stubs into linux/elfcore.h as inline functions. As only two architectures use these, just use the architecture specific Kconfig symbols to key off the declaration. Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	initramfs: fix clang build failure	Arnd Bergmann
	There is only one function in init/initramfs.c that is in the .text section, and it is marked __weak. When building with clang-12 and the integrated assembler, this leads to a bug with recordmcount: ./scripts/recordmcount "init/initramfs.o" Cannot find symbol for section 2: .text. init/initramfs.o: failed I'm not quite sure what exactly goes wrong, but I notice that this function is only ever called from an __init function, and normally inlined. Marking it __init as well is clearly correct and it leads to recordmcount no longer complaining. Link: https://lkml.kernel.org/r/20201204165742.3815221-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	kbuild: avoid static_assert for genksyms	Arnd Bergmann
	genksyms does not know or care about the _Static_assert() built-in, and sometimes falls back to ignoring the later symbols, which causes undefined behavior such as WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned. ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation Redefine static_assert for genksyms to avoid that. Link: https://lkml.kernel.org/r/20201203230955.1482058-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Kees Cook <keescook@chromium.org> Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	selftest/fpu: avoid clang warning	Arnd Bergmann
	With extra warnings enabled, clang complains about the redundant -mhard-float argument: clang: error: argument unused during compilation: '-mhard-float' [-Werror,-Wunused-command-line-argument] Move this into the gcc-only part of the Makefile. Link: https://lkml.kernel.org/r/20201203223652.1320700-1-arnd@kernel.org Fixes: 4185b3b92792 ("selftests/fpu: Add an FPU selftest") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Petteri Aimonen <jpa@git.mail.kapsi.fi> Cc: Borislav Petkov <bp@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	proc: use untagged_addr() for pagemap_read addresses	Miles Chen
	When we try to visit the pagemap of a tagged userspace pointer, we find that the start_vaddr is not correct because of the tag. To fix it, we should untag the userspace pointers in pagemap_read(). I tested with 5.10-rc4 and the issue remains. Explanation from Catalin in [1]: "Arguably, that's a user-space bug since tagged file offsets were never supported. In this case it's not even a tag at bit 56 as per the arm64 tagged address ABI but rather down to bit 47. You could say that the problem is caused by the C library (malloc()) or whoever created the tagged vaddr and passed it to this function. It's not a kernel regression as we've never supported it. Now, pagemap is a special case where the offset is usually not generated as a classic file offset but rather derived by shifting a user virtual address. I guess we can make a concession for pagemap (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)" My test code is based on [2]: A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8 userspace program: uint64 OsLayer::VirtualToPhysical(void vaddr) { uint64 frame, paddr, pfnmask, pagemask; int pagesize = sysconf(_SC_PAGESIZE); off64_t off = ((uintptr_t)vaddr) / pagesize 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0 int fd = open(kPagemapPath, O_RDONLY); ... if (lseek64(fd, off, SEEK_SET) != off \|\| read(fd, &frame, 8) != 8) { int err = errno; string errtxt = ErrorString(err); if (fd >= 0) close(fd); return 0; } ... } kernel fs/proc/task_mmu.c: static ssize_t pagemap_read(struct file file, char __user buf, size_t count, loff_t ppos) { ... src = ppos; svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54 start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000 end_vaddr = mm->task_size; /* watch out for wraparound */ // svpfn == 0xb400007662f54 // (mm->task_size >> PAGE) == 0x8000000 if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4 start_vaddr = end_vaddr; ret = 0; while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr int len; unsigned long end; ... } ... } [1] https://lore.kernel.org/patchwork/patch/1343258/ [2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158 Link: https://lkml.kernel.org/r/20201204024347.8295-1-miles.chen@mediatek.com Signed-off-by: Miles Chen <miles.chen@mediatek.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Will Deacon <will@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com> Cc: <stable@vger.kernel.org> [5.4-] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	revert "mm/filemap: add static for function __add_to_page_cache_locked"	Andrew Morton
	Revert commit 3351b16af494 ("mm/filemap: add static for function __add_to_page_cache_locked") due to incompatibility with ALLOW_ERROR_INJECTION which result in build errors. Link: https://lkml.kernel.org/r/CAADnVQJ6tmzBXvtroBuEH6QA0H+q7yaSKxrVvVxhqr3KBZdEXg@mail.gmail.com Tested-by: Justin Forbes <jmforbes@linuxtx.org> Tested-by: Greg Thelen <gthelen@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Michal Kubecek <mkubecek@suse.cz> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Tony Luck <tony.luck@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11	Input: cm109 - do not stomp on control URB	Dmitry Torokhov
	We need to make sure we are not stomping on the control URB that was issued when opening the device when attempting to toggle buzzer. To do that we need to mark it as pending in cm109_open(). Reported-and-tested-by: syzbot+150f793ac5bc18eee150@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-12-11	docs: Note that sphinx 1.7 will be required soon	Jonathan Corbet
	The time has come to drop support for some truly ancient versions of sphinx; put in a warning now. Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-12-11	mtd: rawnand: xway: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: d525914b5bd8 ("mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-10-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: socrates: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: b36bf0a0fe5d ("mtd: rawnand: socrates: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-9-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: plat_nand: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: 612e048e6aab ("mtd: rawnand: plat_nand: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-8-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: pasemi: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: 8fc6f1f042b2 ("mtd: rawnand: pasemi: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-7-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: orion: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Fixes: 553508cec2e8 ("mtd: rawnand: orion: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-6-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: mpc5121: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: 6dd09f775b72 ("mtd: rawnand: mpc5121: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-5-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: gpio: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: f6341f6448e0 ("mtd: rawnand: gpio: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-4-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: au1550: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: dbffc8ccdf3a ("mtd: rawnand: au1550: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-3-miquel.raynal@bootlin.com
2020-12-11	mtd: rawnand: ams-delta: Do not force a particular software ECC engine	Miquel Raynal
	Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input parsing bits") kind of broke the logic around the initialization of several ECC engines. Unfortunately, the fix (which indeed moved the ECC initialization to the right place) did not take into account the fact that a different ECC algorithm could have been used thanks to a DT property, considering the "Hamming" algorithm entry a configuration while it was only a default. Add the necessary logic to be sure Hamming keeps being only a default. Fixes: 59d93473323a ("mtd: rawnand: ams-delta: Move the ECC initialization to ->attach_chip()") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-2-miquel.raynal@bootlin.com
2020-12-11	x86/ia32_signal: Propagate __user annotation properly	Lukas Bulwahn
	Commit 57d563c82925 ("x86: ia32_setup_rt_frame(): consolidate uaccess areas") dropped a __user annotation in a cast when refactoring __put_user() to unsafe_put_user(). Hence, since then, sparse warns in arch/x86/ia32/ia32_signal.c:350:9: warning: cast removes address space '__user' of expression warning: incorrect type in argument 1 (different address spaces) expected void const volatile [noderef] __user ptr got unsigned long long [usertype] Add the __user annotation to restore the propagation of address spaces. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201207124141.21859-1-lukas.bulwahn@gmail.com
2020-12-11	Merge tag 'pinctrl-v5.10-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Here is a late set of pin control fixes for v5.10, most concern some minor and major issues found in the Intel drivers. Some are so hairy that I have no idea what is going on there, but luckily the maintainer knows what's up. We also have an interesting fix for AMD, which makes AMD-based laptops more stable IIUC. Summary: - Fix up some SPI group and a register offset on Intel Jasperlake - Set default bias on Intel Merrifield - Preserve debouncing on Intel Baytrail - Stop .set_type() irqchip callback in the AMD driver from fiddling with the debounce filter - Fix access to GPIO banks that are pass-thru on the Aspeed - Fix a fix for the Intel pin control driver to disable Rx/Tx when requesting a UART line as GPIO" * tag 'pinctrl-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: intel: Actually disable Tx and Rx buffers on GPIO request pinctrl: aspeed: Fix GPIO requests on pass-through banks pinctrl: amd: remove debounce filter setting in IRQ type setting pinctrl: baytrail: Avoid clearing debounce value when turning it off pinctrl: merrifield: Set default bias in case no particular value given pinctrl: jasperlake: Fix HOSTSW_OWN offset pinctrl: jasperlake: Unhide SPI group of pins
2020-12-11	Merge tag 'v5.10-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "These are hopefully the last GPIO fixes for this cycle. All are driver fixes except a small resource leak for pin ranges in the gpiolib. Two are PM related, which is nice because when developers start to find PM bugs it is usually because they have smoked out the bugs of more severe nature. Summary: - Fix runtime PM balancing on the errorpath of the Arizona driver - Fix a suspend NULL pointer reference in the dwapb driver - Balance free:ing in gpiochip_generic_free() - Fix runtime PM balancing on the errorpath of the zynq driver - Fix irqdomain use-after-free in the mvebu driver - Break an eternal loop in the spreadtrum EIC driver" * tag 'v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: eic-sprd: break loop when getting NULL device resource gpio: mvebu: fix potential user-after-free on probe gpio: zynq: fix reference leak in zynq_gpio functions gpiolib: Don't free if pin ranges are not defined gpio: dwapb: fix NULL pointer dereference at dwapb_gpio_suspend() gpio: arizona: disable pm_runtime in case of failure
2020-12-11	Merge tag 'clk-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Two small clk driver build fixes - Remove __packed from a Renesas struct to improve portability - Fix a linking problem with i.MX when config options don't agree" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: renesas: r9a06g032: Drop __packed for portability clk: imx: scu: fix MXC_CLK_SCU module build break
2020-12-11	Revert "scsi: storvsc: Validate length of incoming packet in ↵	Andrea Parri (Microsoft)
	storvsc_on_channel_callback()" This reverts commit 3b8c72d076c42bf27284cda7b2b2b522810686f8. Dexuan reported a regression where StorVSC fails to probe a device (and where, consequently, the VM may fail to boot). The root-cause analysis led to a long-standing race condition that is exposed by the validation /commit in question. Let's put the new validation aside until a proper solution for that race condition is in place. Link: https://lore.kernel.org/r/20201211131404.21359-1-parri.andrea@gmail.com Fixes: 3b8c72d076c4 ("scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()") Cc: Dexuan Cui <decui@microsoft.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-11	crypto: qat - add capability detection logic in qat_4xxx	Marco Chiappero
	Add logic to detect device capabilities in qat_4xxx driver. Read fuses and build the device capabilities mask. This will enable services and handling specific to QAT 4xxx devices. Co-developed-by: Tomaszx Kowalik <tomaszx.kowalik@intel.com> Signed-off-by: Tomaszx Kowalik <tomaszx.kowalik@intel.com> Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-11	crypto: qat - add AES-XTS support for QAT GEN4 devices	Marco Chiappero
	Add handling of AES-XTS specific to QAT GEN4 devices. Co-developed-by: Tomaszx Kowalik <tomaszx.kowalik@intel.com> Signed-off-by: Tomaszx Kowalik <tomaszx.kowalik@intel.com> Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-11	crypto: qat - add AES-CTR support for QAT GEN4 devices	Marco Chiappero
	Add support for AES-CTR for QAT GEN4 devices. Also, introduce the capability ICP_ACCEL_CAPABILITIES_AES_V2 and the helper macro HW_CAP_AES_V2, which allow to distinguish between different HW generations. Co-developed-by: Tomasz Kowalik <tomaszx.kowalik@intel.com> Signed-off-by: Tomasz Kowalik <tomaszx.kowalik@intel.com> Co-developed-by: Mateusz Polrola <mateuszx.potrola@intel.com> Signed-off-by: Mateusz Polrola <mateuszx.potrola@intel.com> Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-11	crypto: atmel-i2c - select CONFIG_BITREVERSE	Arnd Bergmann
	The bitreverse helper is almost always built into the kernel, but in a rare randconfig build it is possible to hit a case in which it is a loadable module while the atmel-i2c driver is built-in: arm-linux-gnueabi-ld: drivers/crypto/atmel-i2c.o: in function `atmel_i2c_checksum': atmel-i2c.c:(.text+0xa0): undefined reference to `byte_rev_table' Add one more 'select' statement to prevent this. Fixes: 11105693fa05 ("crypto: atmel-ecc - introduce Microchip / Atmel ECC driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-11	crypto: hisilicon/trng - replace atomic_add_return()	Yejune Deng
	a set of atomic_inc_return() looks more neater Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-11	crypto: keembay - Add support for Keem Bay OCS AES/SM4	Mike Healy
	Add support for the AES/SM4 crypto engine included in the Offload and Crypto Subsystem (OCS) of the Intel Keem Bay SoC, thus enabling hardware-acceleration for the following transformations: - ecb(aes), cbc(aes), ctr(aes), cts(cbc(aes)), gcm(aes) and cbc(aes); supported for 128-bit and 256-bit keys. - ecb(sm4), cbc(sm4), ctr(sm4), cts(cbc(sm4)), gcm(sm4) and cbc(sm4); supported for 128-bit keys. The driver passes crypto manager self-tests, including the extra tests (CRYPTO_MANAGER_EXTRA_TESTS=y). Signed-off-by: Mike Healy <mikex.healy@intel.com> Co-developed-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com> Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com> Acked-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-11	dt-bindings: Add Keem Bay OCS AES bindings	Daniele Alessandrelli
	Add device-tree bindings for Intel Keem Bay Offload and Crypto Subsystem (OCS) AES crypto driver. Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com> Acked-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-11	ntp: Consolidate the RTC update implementation	Thomas Gleixner
	The code for the legacy RTC and the RTC class based update are pretty much the same. Consolidate the common parts into one function and just invoke the actual setter functions. For RTC class based devices the update code checks whether the offset is valid for the device, which is usually not the case for the first invocation. If it's not the same it stores the correct offset and lets the caller try again. That's not much different from the previous approach where the first invocation had a pretty low probability to actually hit the allowed window. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.355743355@linutronix.de
2020-12-11	ntp: Make the RTC sync offset less obscure	Thomas Gleixner
	The current RTC set_offset_nsec value is not really intuitive to understand. tsched twrite(t2.tv_sec - 1) t2 (seconds increment) The offset is calculated from twrite based on the assumption that t2 - twrite == 1s. That means for the MC146818 RTC the offset needs to be negative so that the write happens 500ms before t2. It's easier to understand when the whole calculation is based on t2. That avoids negative offsets and the meaning is obvious: t2 - twrite: The time defined by the chip when seconds increment after the write. twrite - tsched: The time for the transport to the point where the chip is updated. ==> set_offset_nsec = t2 - tsched ttransport = twrite - tsched tRTCinc = t2 - twrite ==> set_offset_nsec = ttransport + tRTCinc tRTCinc is a chip property and can be obtained from the data sheet. ttransport depends on how the RTC is connected. It is close to 0 for directly accessible RTCs. For RTCs behind a slow bus, e.g. i2c, it's the time required to send the update over the bus. This can be estimated or even calibrated, but that's a different problem. Adjust the implementation and update comments accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.263204937@linutronix.de
2020-12-11	ntp, rtc: Move rtc_set_ntp_time() to ntp code	Thomas Gleixner
	rtc_set_ntp_time() is not really RTC functionality as the code is just a user of RTC. Move it into the NTP code which allows further cleanups. Requested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.166871172@linutronix.de
2020-12-11	ntp: Make the RTC synchronization more reliable	Thomas Gleixner
	Miroslav reported that the periodic RTC synchronization in the NTP code fails more often than not to hit the specified update window. The reason is that the code uses delayed_work to schedule the update which needs to be in thread context as the underlying RTC might be connected via a slow bus, e.g. I2C. In the update function it verifies whether the current time is correct vs. the requirements of the underlying RTC. But delayed_work is using the timer wheel for scheduling which is inaccurate by design. Depending on the distance to the expiry the wheel gets less granular to allow batching and to avoid the cascading of the original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading wheel") and the code for further details. The code already deals with this by splitting the 660 seconds period into a long 659 seconds timer and then retrying with a smaller delta. But looking at the actual granularities of the timer wheel (which depend on the HZ configuration) the 659 seconds timer ends up in an outer wheel level and is affected by a worst case granularity of: HZ Granularity 1000 32s 250 16s 100 40s So the initial timer can be already off by max 12.5% which is not a big issue as the period of the sync is defined as ~11 minutes. The fine grained second attempt schedules to the desired update point with a timer expiring less than a second from now. Depending on the actual delta and the HZ setting even the second attempt can end up in outer wheel levels which have a large enough granularity to make the correctness check fail. As this is a fundamental property of the timer wheel there is no way to make this more accurate short of iterating in one jiffies steps towards the update point. Switch it to an hrtimer instead which schedules the actual update work. The hrtimer will expire precisely (max 1 jiffie delay when high resolution timers are not available). The actual scheduling delay of the work is the same as before. The update is triggered from do_adjtimex() which is a bit racy but not much more racy than it was before: if (ntp_synced()) queue_delayed_work(system_power_efficient_wq, &sync_work, 0); which is racy when the work is currently executed and has not managed to reschedule itself. This becomes now: if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer)) queue_work(system_power_efficient_wq, &sync_work, 0); which is racy when the hrtimer has expired and the work is currently executed and has not yet managed to rearm the hrtimer. Not a big problem as it just schedules work for nothing. The new implementation has a safe guard in place to catch the case where the hrtimer is queued on entry to the work function and avoids an extra update attempt of the RTC that way. Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Miroslav Lichvar <mlichvar@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.062910520@linutronix.de
2020-12-11	rtc: core: Make the sync offset default more realistic	Thomas Gleixner
	The offset which is used to steer the start of an RTC synchronization update via rtc_set_ntp_time() is huge. The math behind this is: tsched twrite(t2.tv_sec - 1) t2 (seconds increment) twrite - tsched is the transport time for the write to hit the device. t2 - twrite depends on the chip and is for most chips one second. The rtc_set_ntp_time() calculation of tsched is: tsched = t2 - 1sec - (t2 - twrite) The default for the sync offset is 500ms which means that twrite - tsched is 500ms assumed that t2 - twrite is one second. This is 0.5 seconds off for RTCs which are directly accessible by IO writes and probably for the majority of i2C/SPI based RTC off by an order of magnitude. Set it to 5ms which should bring it closer to reality. The default can be adjusted by drivers (rtc_cmos does so) and could be adjusted further by a calibration method which is an orthogonal problem. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220541.960333166@linutronix.de
2020-12-11	rtc: cmos: Make rtc_cmos sync offset correct	Thomas Gleixner
	The offset for rtc_cmos must be -500ms to work correctly with the current implementation of rtc_set_ntp_time() due to the following: tsched twrite(t2.tv_sec - 1) t2 (seconds increment) twrite - tsched is the transport time for the write to hit the device, which is negligible for this chip because it's accessed directly. t2 - twrite = 500ms according to the datasheet. But rtc_set_ntp_time() calculation of tsched is: tsched = t2 - 1sec - (t2 - twrite) The default for the sync offset is 500ms which means that the write happens at t2 - 1.5 seconds which is obviously off by a second for this device. Make the offset -500ms so it works correct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220541.830517160@linutronix.de