summaryrefslogtreecommitdiff
path: root/kernel/debug
AgeCommit message (Collapse)Author
2020-06-17maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofaultChristoph Hellwig
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09kernel: rename show_stack_loglvl() => show_stack()Dmitry Safonov
Now the last users of show_stack() got converted to use an explicit log level, show_stack_loglvl() can drop it's redundant suffix and become once again well known show_stack(). Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200418201944.482088-51-dima@arista.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09kdb: don't play with console_loglevelDmitry Safonov
Print the stack trace with KERN_EMERG - it should be always visible. Playing with console_loglevel is a bad idea as there may be more messages printed than wanted. Also the stack trace might be not printed at all if printk() was deferred and console_loglevel was raised back before the trace got flushed. Unfortunately, after rebasing on commit 2277b492582d ("kdb: Fix stack crawling on 'running' CPUs that aren't the master"), kdb_show_stack() uses now kdb_dump_stack_on_cpu(), which for now won't be converted as it uses dump_stack() instead of show_stack(). Convert for now the branch that uses show_stack() and remove console_loglevel exercise from that case. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/r/20200418201944.482088-48-dima@arista.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-07Merge tag 'tty-5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the tty and serial driver updates for 5.8-rc1 Nothing huge at all, just a lot of little serial driver fixes, updates for new devices and features, and other small things. Full details are in the shortlog. All of these have been in linux-next with no issues for a while" * tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (67 commits) tty: serial: qcom_geni_serial: Add 51.2MHz frequency support tty: serial: imx: clear Ageing Timer Interrupt in handler serial: 8250_fintek: Add F81966 Support sc16is7xx: Add flag to activate IrDA mode dt-bindings: sc16is7xx: Add flag to activate IrDA mode serial: 8250: Support rs485 bus termination GPIO serial: 8520_port: Fix function param documentation dt-bindings: serial: Add binding for rs485 bus termination GPIO vt: keyboard: avoid signed integer overflow in k_ascii serial: 8250: Enable 16550A variants by default on non-x86 tty: hvc_console, fix crashes on parallel open/close serial: imx: Initialize lock for non-registered console sc16is7xx: Read the LSR register for basic device presence check sc16is7xx: Allow sharing the IRQ line sc16is7xx: Use threaded IRQ sc16is7xx: Always use falling edge IRQ tty: n_gsm: Fix bogus i++ in gsm_data_kick tty: n_gsm: Remove unnecessary test in gsm_print_packet() serial: stm32: add no_console_suspend support tty: serial: fsl_lpuart: Use __maybe_unused instead of #if CONFIG_PM_SLEEP ...
2020-06-04Merge tag 'riscv-for-linus-5.8-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - The remainder of the code necessary to support the Kendryte K210: * Support for building device trees into the kernel, as the K210 doesn't have a bootloader that provides one * A K210 device tree and the associated defconfig update * Support for skipping PMP initialization on systems that trap on PMP accesses rather than treating them as WARL - Support for KGDB - Improvements to text patching - Some cleanups to the SiFive L2 cache driver * tag 'riscv-for-linus-5.8-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: soc: sifive: l2 cache: Mark l2_get_priv_group as static soc: sifive: l2 cache: Eliminate an unsigned zero compare warning riscv: Add support to determine no. of L2 cache way enabled riscv: cacheinfo: Implement cache_get_priv_group with a generic ops structure riscv: Use text_mutex instead of patch_lock riscv: Use NOKPROBE_SYMBOL() instead of __krpobes annotation riscv: Remove the 'riscv_' prefix of function name riscv: Add SW single-step support for KDB riscv: Use the XML target descriptions to report 3 system registers riscv: Add KGDB support kgdb: Add kgdb_has_hit_break function RISC-V: Skip setting up PMPs on traps riscv: K210: Update defconfig riscv: K210: Add a built-in device tree riscv: Allow device trees to be built into the kernel
2020-06-02kdb: Remove the misfeature 'KDBFLAGS'Wei Li
Currently, 'KDBFLAGS' is an internal variable of kdb, it is combined by 'KDBDEBUG' and state flags. It will be shown only when 'KDBDEBUG' is set, and the user can define an environment variable named 'KDBFLAGS' too. These are puzzling indeed. After communication with Daniel, it seems that 'KDBFLAGS' is a misfeature. So let's replace 'KDBFLAGS' with 'KDBDEBUG' to just show the value we wrote into. After this modification, we can use `md4c1 kdb_flags` instead, to observe the state flags. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20200521072125.21103-1-liwei391@huawei.com [daniel.thompson@linaro.org: Make kdb_flags unsigned to avoid arithmetic right shift] Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02kdb: Cleanup math with KDB_CMD_HISTORY_COUNTDouglas Anderson
From code inspection the math in handle_ctrl_cmd() looks super sketchy because it subjects -1 from cmdptr and then does a "% KDB_CMD_HISTORY_COUNT". It turns out that this code works because "cmdptr" is unsigned and KDB_CMD_HISTORY_COUNT is a nice power of 2. Let's make this a little less sketchy. This patch should be a no-op. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200507161125.1.I2cce9ac66e141230c3644b8174b6c15d4e769232@changeid Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02kgdb: Don't call the deinit under spinlockDouglas Anderson
When I combined kgdboc_earlycon with an inflight patch titled ("soc: qcom-geni-se: Add interconnect support to fix earlycon crash") [1] things went boom. Specifically I got a crash during the transition between kgdboc_earlycon and the main kgdboc that looked like this: Call trace: __schedule_bug+0x68/0x6c __schedule+0x75c/0x924 schedule+0x8c/0xbc schedule_timeout+0x9c/0xfc do_wait_for_common+0xd0/0x160 wait_for_completion_timeout+0x54/0x74 rpmh_write_batch+0x1fc/0x23c qcom_icc_bcm_voter_commit+0x1b4/0x388 qcom_icc_set+0x2c/0x3c apply_constraints+0x5c/0x98 icc_set_bw+0x204/0x3bc icc_put+0x30/0xf8 geni_remove_earlycon_icc_vote+0x6c/0x9c qcom_geni_serial_earlycon_exit+0x10/0x1c kgdboc_earlycon_deinit+0x38/0x58 kgdb_register_io_module+0x11c/0x194 configure_kgdboc+0x108/0x174 kgdboc_probe+0x38/0x60 platform_drv_probe+0x90/0xb0 really_probe+0x130/0x2fc ... The problem was that we were holding the "kgdb_registration_lock" while calling into code that didn't expect to be called in spinlock context. Let's slightly defer when we call the deinit code so that it's not done under spinlock. NOTE: this does mean that the "deinit" call of the old kgdb IO module is now made _after_ the init of the new IO module, but presumably that's OK. [1] https://lkml.kernel.org/r/1588919619-21355-3-git-send-email-akashast@codeaurora.org Fixes: 220995622da5 ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles") Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200526142001.1.I523dc33f96589cb9956f5679976d402c8cda36fa@changeid [daniel.thompson@linaro.org: Resolved merge issues by hand] Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-05-18kgdb: Add kgdb_has_hit_break functionVincent Chen
The break instruction in RISC-V does not have an immediate value field, so the kernel cannot identify the purpose of each trap exception through the opcode. This makes the existing identification schemes in other architecture unsuitable for the RISC-V kernel. To solve this problem, this patch adds kgdb_has_hit_break(), which can help RISC-V kernel identify the KGDB trap exception. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-18kgdboc: Add kgdboc_earlycon to support early kgdb using boot consolesDouglas Anderson
We want to enable kgdb to debug the early parts of the kernel. Unfortunately kgdb normally is a client of the tty API in the kernel and serial drivers don't register to the tty layer until fairly late in the boot process. Serial drivers do, however, commonly register a boot console. Let's enable the kgdboc driver to work with boot consoles to provide early debugging. This change co-opts the existing read() function pointer that's part of "struct console". It's assumed that if a boot console (with the flag CON_BOOT) has implemented read() that both the read() and write() function are polling functions. That means they work without interrupts and read() will return immediately (with 0 bytes read) if there's nothing to read. This should be a safe assumption since it appears that no current boot consoles implement read() right now and there seems no reason to do so unless they wanted to support "kgdboc_earlycon". The normal/expected way to make all this work is to use "kgdboc_earlycon" and "kgdboc" together. You should point them both to the same physical serial connection. At boot time, as the system transitions from the boot console to the normal console (and registers a tty), kgdb will switch over. One awkward part of all this, though, is that there can be a window where the boot console goes away and we can't quite transtion over to the main kgdboc that uses the tty layer. There are two main problems: 1. The act of registering the tty doesn't cause any call into kgdboc so there is a window of time when the tty is there but kgdboc's init code hasn't been called so we can't transition to it. 2. On some serial drivers the normal console inits (and replaces the boot console) quite early in the system. Presumably these drivers were coded up before earlycon worked as well as it does today and probably they don't need to do this anymore, but it causes us problems nontheless. Problem #1 is not too big of a deal somewhat due to the luck of probe ordering. kgdboc is last in the tty/serial/Makefile so its probe gets right after all other tty devices. It's not fun to rely on this, but it does work for the most part. Problem #2 is a big deal, but only for some serial drivers. Other serial drivers end up registering the console (which gets rid of the boot console) and tty at nearly the same time. The way we'll deal with the window when the system has stopped using the boot console and the time when we're setup using the tty is to keep using the boot console. This may sound surprising, but it has been found to work well in practice. If it doesn't work, it shouldn't be too hard for a given serial driver to make it keep working. Specifically, it's expected that the read()/write() function provided in the boot console should be the same (or nearly the same) as the normal kgdb polling functions. That means continuing to use them should work just fine. To make things even more likely to work work we'll also trap the recently added exit() function in the boot console we're using and delay any calls to it until we're all done with the boot console. NOTE: there could be ways to use all this in weird / unexpected ways. If you do something like this, it's a bit of a buyer beware situation. Specifically: - If you specify only "kgdboc_earlycon" but not "kgdboc" then (depending on your serial driver) things will probably work OK, but you'll get a warning printed the first time you use kgdb after the boot console is gone. You'd only be able to do this, of course, if the serial driver you're running atop provided an early boot console. - If your "kgdboc_earlycon" and "kgdboc" devices are not the same device things should work OK, but it'll be your job to switch over which device you're monitoring (including figuring out how to switch over gdb in-flight if you're using it). When trying to enable "kgdboc_earlycon" it should be noted that the names that are registered through the boot console layer and the tty layer are not the same for the same port. For example when debugging on one board I'd need to pass "kgdboc_earlycon=qcom_geni kgdboc=ttyMSM0" to enable things properly. Since digging up the boot console name is a pain and there will rarely be more than one boot console enabled, you can provide the "kgdboc_earlycon" parameter without specifying the name of the boot console. In this case we'll just pick the first boot that implements read() that we find. This new "kgdboc_earlycon" parameter should be contrasted to the existing "ekgdboc" parameter. While both provide a way to debug very early, the usage and mechanisms are quite different. Specifically "kgdboc_earlycon" is meant to be used in tandem with "kgdboc" and there is a transition from one to the other. The "ekgdboc" parameter, on the other hand, replaces the "kgdboc" parameter. It runs the same logic as the "kgdboc" parameter but just relies on your TTY driver being present super early. The only known usage of the old "ekgdboc" parameter is documented as "ekgdboc=kbd earlyprintk=vga". It should be noted that "kbd" has special treatment allowing it to init early as a tty device. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.8.I8fba5961bf452ab92350654aa61957f23ecf0100@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-05-18kgdb: Prevent infinite recursive entries to the debuggerDouglas Anderson
If we detect that we recursively entered the debugger we should hack our I/O ops to NULL so that the panic() in the next line won't actually cause another recursion into the debugger. The first line of kgdb_panic() will check this and return. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.6.I89de39f68736c9de610e6f241e68d8dbc44bc266@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-05-18kgdb: Delay "kgdbwait" to dbg_late_init() by defaultDouglas Anderson
Using kgdb requires at least some level of architecture-level initialization. If nothing else, it relies on the architecture to pass breakpoints / crashes onto kgdb. On some architectures this all works super early, specifically it starts working at some point in time before Linux parses early_params's. On other architectures it doesn't. A survey of a few platforms: a) x86: Presumably it all works early since "ekgdboc" is documented to work here. b) arm64: Catching crashes works; with a simple patch breakpoints can also be made to work. c) arm: Nothing in kgdb works until paging_init() -> devicemaps_init() -> early_trap_init() Let's be conservative and, by default, process "kgdbwait" (which tells the kernel to drop into the debugger ASAP at boot) a bit later at dbg_late_init() time. If an architecture has tested it and wants to re-enable super early debugging, they can select the ARCH_HAS_EARLY_DEBUG KConfig option. We'll do this for x86 to start. It should be noted that dbg_late_init() is still called quite early in the system. Note that this patch doesn't affect when kgdb runs its init. If kgdb is set to initialize early it will still initialize when parsing early_param's. This patch _only_ inhibits the initial breakpoint from "kgdbwait". This means: * Without any extra patches arm64 platforms will at least catch crashes after kgdb inits. * arm platforms will catch crashes (and could handle a hardcoded kgdb_breakpoint()) any time after early_trap_init() runs, even before dbg_late_init(). Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200507130644.v4.4.I3113aea1b08d8ce36dc3720209392ae8b815201b@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-05-18kgdb: Disable WARN_CONSOLE_UNLOCKED for all kgdbDouglas Anderson
In commit 81eaadcae81b ("kgdboc: disable the console lock when in kgdb") we avoided the WARN_CONSOLE_UNLOCKED() yell when we were in kgdboc. That still works fine, but it turns out that we get a similar yell when using other I/O drivers. One example is the "I/O driver" for the kgdb test suite (kgdbts). When I enabled that I again got the same yells. Even though "kgdbts" doesn't actually interact with the user over the console, using it still causes kgdb to print to the consoles. That trips the same warning: con_is_visible+0x60/0x68 con_scroll+0x110/0x1b8 lf+0x4c/0xc8 vt_console_print+0x1b8/0x348 vkdb_printf+0x320/0x89c kdb_printf+0x68/0x90 kdb_main_loop+0x190/0x860 kdb_stub+0x2cc/0x3ec kgdb_cpu_enter+0x268/0x744 kgdb_handle_exception+0x1a4/0x200 kgdb_compiled_brk_fn+0x34/0x44 brk_handler+0x7c/0xb8 do_debug_exception+0x1b4/0x228 Let's increment/decrement the "ignore_console_lock_warning" variable all the time when we enter the debugger. This will allow us to later revert commit 81eaadcae81b ("kgdboc: disable the console lock when in kgdb"). Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.1.Ied2b058357152ebcc8bf68edd6f20a11d98d7d4e@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-05-15kdb: constify sysrq_key_opEmil Velikov
With earlier commits, the API no longer discards the const-ness of the sysrq_key_op. As such we can add the notation. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: linux-kernel@vger.kernel.org Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: kgdb-bugreport@lists.sourceforge.net Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://lore.kernel.org/r/20200513214351.2138580-9-emil.l.velikov@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-07kgdb: Drop malformed kernel doc commentAndy Shevchenko
Kernel doc does not understand POD variables to be referred to. .../debug_core.c:73: warning: cannot understand function prototype: 'int kgdb_connected; ' Convert kernel doc to pure comment. Fixes: dc7d55270521 ("kgdb: core") Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-04-03Merge tag 'spdx-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments
2020-04-01kdb: Censor attempts to set PROMPT without ENABLE_MEM_READDaniel Thompson
Currently the PROMPT variable could be abused to provoke the printf() machinery to read outside the current stack frame. Normally this doesn't matter becaues md is already a much better tool for reading from memory. However the md command can be disabled by not setting KDB_ENABLE_MEM_READ. Let's also prevent PROMPT from being modified in these circumstances. Whilst adding a comment to help future code reviewers we also remove the #ifdef where PROMPT in consumed. There is no problem passing an unused (0) to snprintf when !CONFIG_SMP. argument Reported-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org>
2020-04-01kdb: Eliminate strncpy() warnings by replacing with strscpy()Daniel Thompson
Currently the code to manage the kdb history buffer uses strncpy() to copy strings to/and from the history and exhibits the classic "but nobody ever told me that strncpy() doesn't always terminate strings" bug. Modern gcc compilers recognise this bug and issue a warning. In reality these calls will only abridge the copied string if kdb_read() has *already* overflowed the command buffer. Thus the use of counted copies here is only used to reduce the secondary effects of a bug elsewhere in the code. Therefore transitioning these calls into strscpy() (without checking the return code) is appropriate. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org>
2020-03-25.gitignore: add SPDX License IdentifierMasahiro Yamada
Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-06Revert "kdb: Get rid of confusing diag msg from "rd" if current task has no ↵Daniel Thompson
regs" This reverts commit bbfceba15f8d1260c328a254efc2b3f2deae4904. When DBG_MAX_REG_NUM is zero then a number of symbols are conditionally defined. It is therefore not possible to check it using C expressions. Reported-by: Anatoly Pugachev <matorola@gmail.com> Acked-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: Use for_each_console() helperAndy Shevchenko
Replace open coded single-linked list iteration loop with for_each_console() helper in use. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: remove redundant assignment to pointer bpColin Ian King
The point bp is assigned a value that is never read, it is being re-assigned later to bp = &kdb_breakpoints[lowbp] in a for-loop. Remove the redundant assignment. Addresses-Coverity ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20191128130753.181246-1-colin.king@canonical.com Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: Get rid of confusing diag msg from "rd" if current task has no regsDouglas Anderson
If you switch to a sleeping task with the "pid" command and then type "rd", kdb tells you this: No current kdb registers. You may need to select another task diag: -17: Invalid register name The first message makes sense, but not the second. Fix it by just returning 0 after commands accessing the current registers finish if we've already printed the "No current kdb registers" error. While fixing kdb_rd(), change the function to use "if" rather than "ifdef". It cleans the function up a bit and any modern compiler will have no trouble handling still producing good code. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191109111624.5.I121f4c6f0c19266200bf6ef003de78841e5bfc3d@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: Gid rid of implicit setting of the current task / regsDouglas Anderson
Some (but not all?) of the kdb backtrace paths would cause the kdb_current_task and kdb_current_regs to remain changed. As discussed in a review of a previous patch [1], this doesn't seem intuitive, so let's fix that. ...but, it turns out that there's actually no longer any reason to set the current task / current regs while backtracing anymore anyway. As of commit 2277b492582d ("kdb: Fix stack crawling on 'running' CPUs that aren't the master") if we're backtracing on a task running on a CPU we ask that CPU to do the backtrace itself. Linux can do that without anything fancy. If we're doing backtrace on a sleeping task we can also do that fine without updating globals. So this patch mostly just turns into deleting a bunch of code. [1] https://lore.kernel.org/r/20191010150735.dhrj3pbjgmjrdpwr@holly.lan Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191109111624.4.Ibc3d982bbeb9e46872d43973ba808cd4c79537c7@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: kdb_current_task shouldn't be exportedDouglas Anderson
The kdb_current_task variable has been declared in "kernel/debug/kdb/kdb_private.h" since 2010 when kdb was added to the mainline kernel. This is not a public header. There should be no reason that kdb_current_task should be exported and there are no in-kernel users that need it. Remove the export. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191109111623.3.I14b22b5eb15ca8f3812ab33e96621231304dc1f7@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: kdb_current_regs should be privateDouglas Anderson
As of the patch ("MIPS: kdb: Remove old workaround for backtracing on other CPUs") there is no reason for kdb_current_regs to be in the public "kdb.h". Let's move it next to kdb_current_task. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191109111623.2.Iadbfb484e90b557cc4b5ac9890bfca732cd99d77@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-10-28kdb: Tweak escape handling for vi usersDaniel Thompson
Currently if sequences such as "\ehelp\r" are delivered to the console then the h gets eaten by the escape handling code. Since pressing escape becomes something of a nervous twitch for vi users (and that escape doesn't have much effect at a shell prompt) it is more helpful to emit the 'h' than the '\e'. We don't simply choose to emit the final character for all escape sequences since that will do odd things for unsupported escape sequences (in other words we retain the existing behaviour once we see '\e['). Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191025073328.643-6-daniel.thompson@linaro.org
2019-10-28kdb: Improve handling of characters from different input sourcesDaniel Thompson
Currently if an escape timer is interrupted by a character from a different input source then the new character is discarded and the function returns '\e' (which will be discarded by the level above). It is hard to see why this would ever be the desired behaviour. Fix this to return the new character rather than the '\e'. This is a bigger refactor than might be expected because the new character needs to go through escape sequence detection. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191025073328.643-5-daniel.thompson@linaro.org
2019-10-28kdb: Remove special case logic from kdb_read()Daniel Thompson
kdb_read() contains special case logic to force it exit after reading a single character. We can remove all the special case logic by directly calling the function to read a single character instead. This also allows us to tidy up the function prototype which, because it now matches getchar(), we can also rename in order to make its role clearer. This does involve some extra code to handle btaprompt properly but we don't mind the new lines of code here because the old code had some interesting problems (bad newline handling, treating unexpected characters like <cr>). Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191025073328.643-4-daniel.thompson@linaro.org
2019-10-28kdb: Simplify code to fetch characters from consoleDaniel Thompson
Currently kdb_read_get_key() contains complex control flow that, on close inspection, turns out to be unnecessary. In particular: 1. It is impossible to enter the branch conditioned on (escape_delay == 1) except when the loop enters with (escape_delay == 2) allowing us to combine the branches. 2. Most of the code conditioned on (escape_delay == 2) simply modifies local data and then breaks out of the loop causing the function to return escape_data[0]. 3. Based on #2 there is not actually any need to ever explicitly set escape_delay to 2 because we it is much simpler to directly return escape_data[0] instead. 4. escape_data[0] is, for all but one exit path, known to be '\e'. Simplify the code based on these observations. There is a subtle (and harmless) change of behaviour resulting from this simplification: instead of letting the escape timeout after ~1998 milliseconds we now timeout after ~2000 milliseconds Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191025073328.643-3-daniel.thompson@linaro.org
2019-10-28kdb: Tidy up code to handle escape sequencesDaniel Thompson
kdb_read_get_key() has extremely complex break/continue control flow managed by state variables and is very hard to review or modify. In particular the way the escape sequence handling interacts with the general control flow is hard to follow. Separate out the escape key handling, without changing the control flow. This makes the main body of the code easier to review. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191025073328.643-2-daniel.thompson@linaro.org
2019-10-24kdb: Avoid array subscript warnings on non-SMP buildsDaniel Thompson
Recent versions of gcc (reported on gcc-7.4) issue array subscript warnings for builds where SMP is not enabled. kernel/debug/debug_core.c: In function 'kdb_dump_stack_on_cpu': kernel/debug/debug_core.c:452:17: warning: array subscript is outside array +bounds [-Warray-bounds] if (!(kgdb_info[cpu].exception_state & DCPU_IS_SLAVE)) { ~~~~~~~~~^~~~~ kernel/debug/debug_core.c:469:33: warning: array subscript is outside array +bounds [-Warray-bounds] kgdb_info[cpu].exception_state |= DCPU_WANT_BT; kernel/debug/debug_core.c:470:18: warning: array subscript is outside array +bounds [-Warray-bounds] while (kgdb_info[cpu].exception_state & DCPU_WANT_BT) There is no bug here but there is scope to improve the code generation for non-SMP systems (whilst also silencing the warning). Reported-by: kbuild test robot <lkp@intel.com> Fixes: 2277b492582d ("kdb: Fix stack crawling on 'running' CPUs that aren't the master") Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20191021101057.23861-1-daniel.thompson@linaro.org Reviewed-by: Douglas Anderson <dianders@chromium.org>
2019-10-10kdb: Fix stack crawling on 'running' CPUs that aren't the masterDouglas Anderson
In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily give you the right info. Specifically on many architectures (including arm64, where I tested) you can't dump the stack of a "running" process that isn't the process running on the current CPU. This can be seen by this: echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT # wait 2 seconds <sysrq>g Here's what I see now on rk3399-gru-kevin. I see the stack crawl for the CPU that handled the sysrq but everything else just shows me stuck in __switch_to() which is bogus: ====== [0]kdb> btc btc: cpu status: Currently on cpu 0 Available cpus: 0, 1-3(I), 4, 5(I) Stack traceback for pid 0 0xffffff801101a9c0 0 0 1 0 R 0xffffff801101b3b0 *swapper/0 Call trace: dump_backtrace+0x0/0x138 ... kgdb_compiled_brk_fn+0x34/0x44 ... sysrq_handle_dbg+0x34/0x5c Stack traceback for pid 0 0xffffffc0f175a040 0 0 1 1 I 0xffffffc0f175aa30 swapper/1 Call trace: __switch_to+0x1e4/0x240 0xffffffc0f65616c0 Stack traceback for pid 0 0xffffffc0f175d040 0 0 1 2 I 0xffffffc0f175da30 swapper/2 Call trace: __switch_to+0x1e4/0x240 0xffffffc0f65806c0 Stack traceback for pid 0 0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3 Call trace: __switch_to+0x1e4/0x240 0xffffffc0f659f6c0 Stack traceback for pid 1474 0xffffffc0dde8b040 1474 727 1 4 R 0xffffffc0dde8ba30 bash Call trace: __switch_to+0x1e4/0x240 __schedule+0x464/0x618 0xffffffc0dde8b040 Stack traceback for pid 0 0xffffffc0f17b0040 0 0 1 5 I 0xffffffc0f17b0a30 swapper/5 Call trace: __switch_to+0x1e4/0x240 0xffffffc0f65dd6c0 === The problem is that 'btc' eventually boils down to show_stack(task_struct, NULL); ...and show_stack() doesn't work for "running" CPUs because their registers haven't been stashed. On x86 things might work better (I haven't tested) because kdb has a special case for x86 in kdb_show_stack() where it passes the stack pointer to show_stack(). This wouldn't work on arm64 where the stack crawling function seems needs the "fp" and "pc", not the "sp" which is presumably why arm64's show_stack() function totally ignores the "sp" parameter. NOTE: we _can_ get a good stack dump for all the cpus if we manually switch each one to the kdb master and do a back trace. AKA: cpu 4 bt ...will give the expected trace. That's because now arm64's dump_backtrace will now see that "tsk == current" and go through a different path. In this patch I fix the problems by catching a request to stack crawl a task that's running on a CPU and then I ask that CPU to do the stack crawl. NOTE: this will (presumably) change what stack crawls are printed for x86 machines. Now kdb functions will show up in the stack crawl. Presumably this is OK but if it's not we can go back and add a special case for x86 again. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-10-10kdb: Fix "btc <cpu>" crash if the CPU didn't round upDouglas Anderson
I noticed that when I did "btc <cpu>" and the CPU I passed in hadn't rounded up that I'd crash. I was going to copy the same fix from commit 162bc7f5afd7 ("kdb: Don't back trace on a cpu that didn't round up") into the "not all the CPUs" case, but decided it'd be better to clean things up a little bit. This consolidates the two code paths. It is _slightly_ wasteful in in that the checks for "cpu" being too small or being offline isn't really needed when we're iterating over all online CPUs, but that really shouldn't hurt. Better to have the same code path. While at it, eliminate at least one slightly ugly (and totally needless) recursive use of kdb_parse(). Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-10-10kdb: Remove unused "argcount" param from kdb_bt1(); make btaprompt boolDouglas Anderson
The kdb_bt1() had a mysterious "argcount" parameter passed in (always the number 5, by the way) and never used. Presumably this is just old cruft. Remove it. While at it, upgrade the btaprompt parameter to a full fledged bool instead of an int. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-10-10kgdb: Remove unused DCPU_SSTEP definitionDouglas Anderson
From doing a 'git log --patch kernel/debug', it looks as if DCPU_SSTEP has never been used. Presumably it used to be used back when kgdb was out of tree and nobody thought to delete the definition when the usage went away. Delete. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-09-25kgdb: don't use a notifier to enter kgdb at panic; call directlyDouglas Anderson
Right now kgdb/kdb hooks up to debug panics by registering for the panic notifier. This works OK except that it means that kgdb/kdb gets called _after_ the CPUs in the system are taken offline. That means that if anything important was happening on those CPUs (like something that might have contributed to the panic) you can't debug them. Specifically I ran into a case where I got a panic because a task was "blocked for more than 120 seconds" which was detected on CPU 2. I nicely got shown stack traces in the kernel log for all CPUs including CPU 0, which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of __mmc_switch(). I then ended up at the kdb prompt where switched over to kgdb to try to look at local variables of the process on CPU 0. I found that I couldn't. Digging more, I found that I had no info on any tasks running on CPUs other than CPU 2 and that asking kdb for help showed me "Error: no saved data for this cpu". This was because all the CPUs were offline. Let's move the entry of kdb/kgdb to a direct call from panic() and stop using the generic notifier. Putting a direct call in allows us to order things more properly and it also doesn't seem like we're breaking any abstractions by calling into the debugger from the panic function. Daniel said: : This patch changes the way kdump and kgdb interact with each other. : However it would seem rather odd to have both tools simultaneously armed : and, even if they were, the user still has the option to use panic_timeout : to force a kdump to happen. Thus I think the change of order is : acceptable. Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Feng Tang <feng.tang@intel.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-03kgdb: fix comment regarding static functionNadav Amit
The comment that says that module_event() is not static is clearly wrong. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-09-03kdb: Replace strncmp with str_has_prefixChuhong Yuan
strncmp(str, const, len) is error-prone. We had better use newly introduced str_has_prefix() instead of it. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14kdb: Fix bound check compiler warningWenlin Kang
The strncpy() function may leave the destination string buffer unterminated, better use strscpy() instead. This fixes the following warning with gcc 8.2: kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr': kernel/debug/kdb/kdb_io.c:449:3: warning: 'strncpy' specified bound 256 equals destination size [-Wstringop-truncation] strncpy(kdb_prompt_str, prompt, CMD_BUFLEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-05-12kdb: do a sanity check on the cpu in kdb_per_cpu()Dan Carpenter
The "whichcpu" comes from argv[3]. The cpu_online() macro looks up the cpu in a bitmap of online cpus, but if the value is too high then it could read beyond the end of the bitmap and possibly Oops. Fixes: 5d5314d6795f ("kdb: core for kgdb back end (1 of 2)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-05-12kdb: Get rid of broken attempt to print CCVERSION in kdb summaryDouglas Anderson
If you drop into kdb and type "summary", it prints out a line that says this: ccversion CCVERSION ...and I don't mean that it actually prints out the version of the C compiler. It literally prints out the string "CCVERSION". The version of the C Compiler is already printed at boot up and it doesn't seem useful to replicate this in kdb. Let's just delete it. We can also delete the bit of the Makefile that called the C compiler in an attempt to pass this into kdb. This will remove one extra call to the C compiler at Makefile parse time and (very slightly) speed up builds. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-05-02kdb: kdb_support: replace strcpy() by strscpy()Gustavo A. R. Silva
The strcpy() function is being deprecated. Replace it by the safer strscpy() and fix the following Coverity warning: "You might overrun the 129-character fixed-size string ks_namebuf by copying name without checking the length." Addresses-Coverity-ID: 138995 ("Copy into fixed size buffer") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-05-02gdbstub: Replace strcpy() by strscpy()Gustavo A. R. Silva
The strcpy() function is being deprecated. Replace it by the safer strscpy() and fix the following Coverity warning: "You might overrun the 1024-character fixed-size string remcom_in_buffer by copying cmd without checking the length." Addresses-Coverity-ID: 138999 ("Copy into fixed size buffer") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2019-05-02gdbstub: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: kernel/debug/gdbstub.c: In function ‘gdb_serial_stub’: kernel/debug/gdbstub.c:1031:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (remcom_in_buffer[1] == '\0') { ^ kernel/debug/gdbstub.c:1036:3: note: here case 'C': /* Exception passing */ ^~~~ kernel/debug/gdbstub.c:1040:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (tmp == 0) ^ kernel/debug/gdbstub.c:1043:3: note: here case 'c': /* Continue packet */ ^~~~ kernel/debug/gdbstub.c:1050:4: warning: this statement may fall through [-Wimplicit-fallthrough=] dbg_activate_sw_breakpoints(); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/debug/gdbstub.c:1052:3: note: here default: ^~~~~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kdb: use bool for binary state indicatorsNicholas Mc Guire
defcmd_in_progress is the state trace for command group processing - within a command group or not - usable is an indicator if a command set is valid (allocated/non-empty) - so use a bool for those binary indication here. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kdb: Don't back trace on a cpu that didn't round upDouglas Anderson
If you have a CPU that fails to round up and then run 'btc' you'll end up crashing in kdb becaue we dereferenced NULL. Let's add a check. It's wise to also set the task to NULL when leaving the debugger so that if we fail to round up on a later entry into the debugger we won't backtrace a stale task. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kgdb: Don't round up a CPU that failed rounding up beforeDouglas Anderson
If we're using the default implementation of kgdb_roundup_cpus() that uses smp_call_function_single_async() we can end up hanging kgdb_roundup_cpus() if we try to round up a CPU that failed to round up before. Specifically smp_call_function_single_async() will try to wait on the csd lock for the CPU that we're trying to round up. If the previous round up never finished then that lock could still be held and we'll just sit there hanging. There's not a lot of use trying to round up a CPU that failed to round up before. Let's keep a flag that indicates whether the CPU started but didn't finish to round up before. If we see that flag set then we'll skip the next round up. In general we have a few goals here: - We never want to end up calling smp_call_function_single_async() when the csd is still locked. This is accomplished because flush_smp_call_function_queue() unlocks the csd _before_ invoking the callback. That means that when kgdb_nmicallback() runs we know for sure the the csd is no longer locked. Thus when we set "rounding_up = false" we know for sure that the csd is unlocked. - If there are no timeouts rounding up we should never skip a round up. NOTE #1: In general trying to continue running after failing to round up CPUs doesn't appear to be supported in the debugger. When I simulate this I find that kdb reports "Catastrophic error detected" when I try to continue. I can overrule and continue anyway, but it should be noted that we may be entering the land of dragons here. Possibly the "Catastrophic error detected" was added _because_ of the future failure to round up, but even so this is an area of the code that hasn't been strongly tested. NOTE #2: I did a bit of testing before and after this change. I introduced a 10 second hang in the kernel while holding a spinlock that I could invoke on a certain CPU with 'taskset -c 3 cat /sys/...". Before this change if I did: - Invoke hang - Enter debugger - g (which warns about Catastrophic error, g again to go anyway) - g - Enter debugger ...I'd hang the rest of the 10 seconds without getting a debugger prompt. After this change I end up in the debugger the 2nd time after only 1 second with the standard warning about 'Timed out waiting for secondary CPUs.' I'll also note that once the CPU finished waiting I could actually debug it (aka "btc" worked) I won't promise that everything works perfectly if the errant CPU comes back at just the wrong time (like as we're entering or exiting the debugger) but it certainly seems like an improvement. NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some implementations override kgdb_call_nmi_hook(). It shouldn't hurt to have it in kgdb_nmicallback() in any case. NOTE #4: this logic is really only needed because there is no API call like "smp_try_call_function_single_async()" or "smp_csd_is_locked()". If such an API existed then we'd use it instead, but it seemed a bit much to add an API like this just for kgdb. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()Douglas Anderson
When I had lockdep turned on and dropped into kgdb I got a nice splat on my system. Specifically it hit: DEBUG_LOCKS_WARN_ON(current->hardirq_context) Specifically it looked like this: sysrq: SysRq : DEBUG ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(current->hardirq_context) WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27 pstate: 604003c9 (nZCv DAIF +PAN -UAO) pc : lockdep_hardirqs_on+0xf0/0x160 ... Call trace: lockdep_hardirqs_on+0xf0/0x160 trace_hardirqs_on+0x188/0x1ac kgdb_roundup_cpus+0x14/0x3c kgdb_cpu_enter+0x53c/0x5cc kgdb_handle_exception+0x180/0x1d4 kgdb_compiled_brk_fn+0x30/0x3c brk_handler+0x134/0x178 do_debug_exception+0xfc/0x178 el1_dbg+0x18/0x78 kgdb_breakpoint+0x34/0x58 sysrq_handle_dbg+0x54/0x5c __handle_sysrq+0x114/0x21c handle_sysrq+0x30/0x3c qcom_geni_serial_isr+0x2dc/0x30c ... ... irq event stamp: ...45 hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4 hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130 softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34 softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100 ---[ end trace adf21f830c46e638 ]--- Looking closely at it, it seems like a really bad idea to be calling local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems like it could violate spinlock semantics and cause a deadlock. Instead, let's use a private csd alongside smp_call_function_single_async() to round up the other CPUs. Using smp_call_function_single_async() doesn't require interrupts to be enabled so we can remove the offending bit of code. In order to avoid duplicating this across all the architectures that use the default kgdb_roundup_cpus(), we'll add a "weak" implementation to debug_core.c. Looking at all the people who previously had copies of this code, there were a few variants. I've attempted to keep the variants working like they used to. Specifically: * For arch/arc we passed NULL to kgdb_nmicallback() instead of get_irq_regs(). * For arch/mips there was a bit of extra code around kgdb_nmicallback() NOTE: In this patch we will still get into trouble if we try to round up a CPU that failed to round up before. We'll try to round it up again and potentially hang when we try to grab the csd lock. That's not new behavior but we'll still try to do better in a future patch. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>