<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/arch/arm64/kernel/entry-ftrace.S, branch docs-mw</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-mw</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-mw'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2025-11-12T17:07:59+00:00</updated>
<entry>
<title>arm64: Fix double word in comments</title>
<updated>2025-11-12T17:07:59+00:00</updated>
<author>
<name>Bo Liu</name>
<email>liubo03@inspur.com</email>
</author>
<published>2025-10-29T07:17:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=337f7e3a4b4d60b41d3e9a389675bb9353925183'/>
<id>urn:sha1:337f7e3a4b4d60b41d3e9a389675bb9353925183</id>
<content type='text'>
Remove the repeated word "the" in comments.

Signed-off-by: Bo Liu &lt;liubo03@inspur.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>fgraph: Replace fgraph_ret_regs with ftrace_regs</title>
<updated>2024-12-26T15:50:02+00:00</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2024-12-26T05:11:55+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a3ed4157b7d89800a0008de0c9e46a438a5c3745'/>
<id>urn:sha1:a3ed4157b7d89800a0008de0c9e46a438a5c3745</id>
<content type='text'>
Use ftrace_regs instead of fgraph_ret_regs for tracing return value
on function_graph tracer because of simplifying the callback interface.

The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Acked-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Cc: Florent Revest &lt;revest@chromium.org&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: bpf &lt;bpf@vger.kernel.org&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alan Maguire &lt;alan.maguire@oracle.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: WANG Xuerui &lt;kernel@xen0n.name&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: x86@kernel.org
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL</title>
<updated>2023-06-20T22:38:37+00:00</updated>
<author>
<name>Donglin Peng</name>
<email>pengdonglin@sangfor.com.cn</email>
</author>
<published>2023-04-08T12:42:18+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3646970322464c21e69dcb9a2e37d461c5834bf5'/>
<id>urn:sha1:3646970322464c21e69dcb9a2e37d461c5834bf5</id>
<content type='text'>
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for the for
the funcgraph-retval, and this modification makes it available on
the ARM64 platform.

We introduce a new structure called fgraph_ret_regs for the ARM64
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address to
the function ftrace_return_to_handler to record the return value.

Link: https://lkml.kernel.org/r/c78366416ce93f704ae7000c4ee60eb4258c38f7.1680954589.git.pengdonglin@sangfor.com.cn

Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Donglin Peng &lt;pengdonglin@sangfor.com.cn&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>arm64: ftrace: Add direct call support</title>
<updated>2023-04-11T17:06:39+00:00</updated>
<author>
<name>Florent Revest</name>
<email>revest@chromium.org</email>
</author>
<published>2023-04-05T18:02:46+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2aa6ac03516d078cf0c35aaa273b5cd11ea9734c'/>
<id>urn:sha1:2aa6ac03516d078cf0c35aaa273b5cd11ea9734c</id>
<content type='text'>
This builds up on the CALL_OPS work which extends the ftrace patchsite
on arm64 with an ops pointer usable by the ftrace trampoline.

This ops pointer is valid at all time. Indeed, it is either pointing to
ftrace_list_ops or to the single ops which should be called from that
patchsite.

There are a few cases to distinguish:
- If a direct call ops is the only one tracing a function:
  - If the direct called trampoline is within the reach of a BL
    instruction
     -&gt; the ftrace patchsite jumps to the trampoline
  - Else
     -&gt; the ftrace patchsite jumps to the ftrace_caller trampoline which
        reads the ops pointer in the patchsite and jumps to the direct
        call address stored in the ops
- Else
  -&gt; the ftrace patchsite jumps to the ftrace_caller trampoline and its
     ops literal points to ftrace_list_ops so it iterates over all
     registered ftrace ops, including the direct call ops and calls its
     call_direct_funcs handler which stores the direct called
     trampoline's address in the ftrace_regs and the ftrace_caller
     trampoline will return to that address instead of returning to the
     traced function

Signed-off-by: Florent Revest &lt;revest@chromium.org&gt;
Co-developed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20230405180250.2046566-2-revest@chromium.org
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS</title>
<updated>2023-01-24T11:49:43+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-01-23T13:46:03+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=baaf553d3bc330697c68a00f96cf11f4edfeac7e'/>
<id>urn:sha1:baaf553d3bc330697c68a00f96cf11f4edfeac7e</id>
<content type='text'>
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.

The main idea is to place a pointer to the ftrace_ops as a literal at a
fixed offset from the function entry point, which can be recovered by
the common ftrace trampoline. Using a 64-bit literal avoids branch range
limitations, and permits the ops to be swapped atomically without
special considerations that apply to code-patching. In future this will
also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS
without branch range limitations by using additional fields in struct
ftrace_ops.

As noted in the core patch adding support for
DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking
ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or
part of a module), without going via ftrace_ops_list_func.

Currently, this approach is not compatible with CLANG_CFI, as the
presence/absence of pre-function NOPs changes the offset of the
pre-function type hash, and there's no existing mechanism to ensure a
consistent offset for instrumented and uninstrumented functions. When
CLANG_CFI is enabled, the existing scheme with a global ops-&gt;func
pointer is used, and there should be no functional change. I am
currently working with others to allow the two to work together in
future (though this will liekly require updated compiler support).

I've benchamrked this with the ftrace_ops sample module [1], which is
not currently upstream, but available at:

  https://lore.kernel.org/lkml/20230103124912.2948963-1-mark.rutland@arm.com
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109

Using that module I measured the total time taken for 100,000 calls to a
trivial instrumented function, with a number of tracers enabled with
relevant filters (which would apply to the instrumented function) and a
number of tracers enabled with irrelevant filters (which would not apply
to the instrumented function). I tested on an M1 MacBook Pro, running
under a HVF-accelerated QEMU VM (i.e. on real hardware).

Before this patch:

  Number of tracers     || Total time  | Per-call average time (ns)
  Relevant | Irrelevant || (ns)        | Total        | Overhead
  =========+============++=============+==============+============
         0 |          0 ||      94,583 |         0.95 |           -
         0 |          1 ||      93,709 |         0.94 |           -
         0 |          2 ||      93,666 |         0.94 |           -
         0 |         10 ||      93,709 |         0.94 |           -
         0 |        100 ||      93,792 |         0.94 |           -
  ---------+------------++-------------+--------------+------------
         1 |          1 ||   6,467,833 |        64.68 |       63.73
         1 |          2 ||   7,509,708 |        75.10 |       74.15
         1 |         10 ||  23,786,792 |       237.87 |      236.92
         1 |        100 || 106,432,500 |     1,064.43 |     1063.38
  ---------+------------++-------------+--------------+------------
         1 |          0 ||   1,431,875 |        14.32 |       13.37
         2 |          0 ||   6,456,334 |        64.56 |       63.62
        10 |          0 ||  22,717,000 |       227.17 |      226.22
       100 |          0 || 103,293,667 |      1032.94 |     1031.99
  ---------+------------++-------------+--------------+--------------

  Note: per-call overhead is estimated relative to the baseline case
  with 0 relevant tracers and 0 irrelevant tracers.

After this patch

  Number of tracers     || Total time  | Per-call average time (ns)
  Relevant | Irrelevant || (ns)        | Total        | Overhead
  =========+============++=============+==============+============
         0 |          0 ||      94,541 |         0.95 |           -
         0 |          1 ||      93,666 |         0.94 |           -
         0 |          2 ||      93,709 |         0.94 |           -
         0 |         10 ||      93,667 |         0.94 |           -
         0 |        100 ||      93,792 |         0.94 |           -
  ---------+------------++-------------+--------------+------------
         1 |          1 ||     281,000 |         2.81 |        1.86
         1 |          2 ||     281,042 |         2.81 |        1.87
         1 |         10 ||     280,958 |         2.81 |        1.86
         1 |        100 ||     281,250 |         2.81 |        1.87
  ---------+------------++-------------+--------------+------------
         1 |          0 ||     280,959 |         2.81 |        1.86
         2 |          0 ||   6,502,708 |        65.03 |       64.08
        10 |          0 ||  18,681,209 |       186.81 |      185.87
       100 |          0 || 103,550,458 |     1,035.50 |     1034.56
  ---------+------------++-------------+--------------+------------

  Note: per-call overhead is estimated relative to the baseline case
  with 0 relevant tracers and 0 irrelevant tracers.

As can be seen from the above:

a) Whenever there is a single relevant tracer function associated with a
   tracee, the overhead of invoking the tracer is constant, and does not
   scale with the number of tracers which are *not* associated with that
   tracee.

b) The overhead for a single relevant tracer has dropped to ~1/7 of the
   overhead prior to this series (from 13.37ns to 1.86ns). This is
   largely due to permitting calls to dynamically-allocated ftrace_ops
   without going through ftrace_ops_list_func.

I've run the ftrace selftests from v6.2-rc3, which reports:

| # of passed:  110
| # of failed:  0
| # of unresolved:  3
| # of untested:  0
| # of unsupported:  0
| # of xfailed:  1
| # of undefined(test bug):  0

... where the unresolved entries were the tests for DIRECT functions
(which are not supported), and the checkbashisms selftest (which is
irrelevant here):

| [8] Test ftrace direct functions against tracers        [UNRESOLVED]
| [9] Test ftrace direct functions against kprobes        [UNRESOLVED]
| [62] Meta-selftest: Checkbashisms       [UNRESOLVED]

... with all other tests passing (or failing as expected).

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Florent Revest &lt;revest@chromium.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lore.kernel.org/r/20230123134603.1064407-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux</title>
<updated>2022-12-12T17:50:05+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-12T17:50:05+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=06cff4a58e7dfa018c5f8a6ebdc3ff12745e0bae'/>
<id>urn:sha1:06cff4a58e7dfa018c5f8a6ebdc3ff12745e0bae</id>
<content type='text'>
Pull arm64 updates from Will Deacon:
 "The highlights this time are support for dynamically enabling and
  disabling Clang's Shadow Call Stack at boot and a long-awaited
  optimisation to the way in which we handle the SVE register state on
  system call entry to avoid taking unnecessary traps from userspace.

  Summary:

  ACPI:
   - Enable FPDT support for boot-time profiling
   - Fix CPU PMU probing to work better with PREEMPT_RT
   - Update SMMUv3 MSI DeviceID parsing to latest IORT spec
   - APMT support for probing Arm CoreSight PMU devices

  CPU features:
   - Advertise new SVE instructions (v2.1)
   - Advertise range prefetch instruction
   - Advertise CSSC ("Common Short Sequence Compression") scalar
     instructions, adding things like min, max, abs, popcount
   - Enable DIT (Data Independent Timing) when running in the kernel
   - More conversion of system register fields over to the generated
     header

  CPU misfeatures:
   - Workaround for Cortex-A715 erratum #2645198

  Dynamic SCS:
   - Support for dynamic shadow call stacks to allow switching at
     runtime between Clang's SCS implementation and the CPU's pointer
     authentication feature when it is supported (complete with scary
     DWARF parser!)

  Tracing and debug:
   - Remove static ftrace in favour of, err, dynamic ftrace!
   - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
     and existing arch code
   - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
     old FTRACE_WITH_REGS
   - Extend 'crashkernel=' parameter with default value and fallback to
     placement above 4G physical if initial (low) allocation fails

  SVE:
   - Optimisation to avoid disabling SVE unconditionally on syscall
     entry and just zeroing the non-shared state on return instead

  Exceptions:
   - Rework of undefined instruction handling to avoid serialisation on
     global lock (this includes emulation of user accesses to the ID
     registers)

  Perf and PMU:
   - Support for TLP filters in Hisilicon's PCIe PMU device
   - Support for the DDR PMU present in Amlogic Meson G12 SoCs
   - Support for the terribly-named "CoreSight PMU" architecture from
     Arm (and Nvidia's implementation of said architecture)

  Misc:
   - Tighten up our boot protocol for systems with memory above 52 bits
     physical
   - Const-ify static keys to satisty jump label asm constraints
   - Trivial FFA driver cleanups in preparation for v1.1 support
   - Export the kernel_neon_* APIs as GPL symbols
   - Harden our instruction generation routines against instrumentation
   - A bunch of robustness improvements to our arch-specific selftests
   - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
  arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
  arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
  arm64: Prohibit instrumentation on arch_stack_walk()
  arm64:uprobe fix the uprobe SWBP_INSN in big-endian
  arm64: alternatives: add __init/__initconst to some functions/variables
  arm_pmu: Drop redundant armpmu-&gt;map_event() in armpmu_event_init()
  kselftest/arm64: Allow epoll_wait() to return more than one result
  kselftest/arm64: Don't drain output while spawning children
  kselftest/arm64: Hold fp-stress children until they're all spawned
  arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
  arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
  arm64/sysreg: Convert MVFR2_EL1 to automatic generation
  arm64/sysreg: Convert MVFR1_EL1 to automatic generation
  arm64/sysreg: Convert MVFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
  ...
</content>
</entry>
<entry>
<title>ftrace: arm64: remove static ftrace</title>
<updated>2022-11-25T12:11:50+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2022-11-22T16:36:24+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=cfce092dae95ed81391e49f273353a96cc6dec64'/>
<id>urn:sha1:cfce092dae95ed81391e49f273353a96cc6dec64</id>
<content type='text'>
The build test robot pointer out that there's a build failure when:

  CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
  CONFIG_DYNAMIC_FTRACE_WITH_ARGS=n

... due to some mismatched ifdeffery, some of which checks
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and some of which checks
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, leading to some missing definitions expected
by the core code when CONFIG_DYNAMIC_FTRACE=n and consequently
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=n.

There's really not much point in supporting CONFIG_DYNAMIC_FTRACE=n (AKA
static ftrace). All supported toolchains allow us to implement
DYNAMIC_FTRACE, distributions all prefer DYNAMIC_FTRACE, and both
powerpc and s390 removed support for static ftrace in commits:

  0c0c52306f4792a4 ("powerpc: Only support DYNAMIC_FTRACE not static")
  5d6a0163494c78ad ("s390/ftrace: enforce DYNAMIC_FTRACE if FUNCTION_TRACER is selected")

... and according to Steven, static ftrace is only supported on x86 to
allow testing that the core code still functions in this configuration.

Given that, let's simplify matters by removing arm64's support for
static ftrace. This avoids the problem originally reported, and leaves
us with less code to maintain.

Fixes: 26299b3f6ba2 ("ftrace: arm64: move from REGS to ARGS")
Link: https://lore.kernel.org/r/202211212249.livTPi3Y-lkp@intel.com
Suggested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Link: https://lore.kernel.org/r/20221122163624.1225912-1-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>ftrace: arm64: move from REGS to ARGS</title>
<updated>2022-11-18T13:56:41+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2022-11-03T17:05:20+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=26299b3f6ba26bfc234b73126d14bdf4dec5275a'/>
<id>urn:sha1:26299b3f6ba26bfc234b73126d14bdf4dec5275a</id>
<content type='text'>
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).

FTRACE_WITH_REGS has been supported on arm64 since commit:

  3b23e4991fb66f6d ("arm64: implement ftrace with regs")

As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:

(1) To make it possible to use the ftrace graph tracer with pointer
    authentication, where it's necessary to snapshot/manipulate the LR
    before it is signed by the instrumented function.

(2) To make it possible to implement LIVEPATCH in future, where we need
    to hook function entry before an instrumented function manipulates
    the stack or argument registers. Practically speaking, we need to
    preserve the argument/return registers, PC, LR, and SP.

Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:

  https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst

Per AAPCS64, all function call argument and return values are held in
the following GPRs:

* X0 - X7 : parameter / result registers
* X8      : indirect result location register
* SP      : stack pointer (AKA SP)

Additionally, ad function call boundaries, the following GPRs hold
context/return information:

* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)

... and for ftrace we need to capture the instrumented address:

 * PC  : program counter

No other GPRs are relevant, as none of the other arguments hold
parameters or return values:

* X9  - X17 : temporaries, may be clobbered
* X18       : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved

This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.

This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.

As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.

I've tested this with the ftrace selftests, where there are no
unexpected failures.

Co-developed-by: Florent Revest &lt;revest@chromium.org&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Florent Revest &lt;revest@chromium.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: ftrace: Define ftrace_stub_graph only with FUNCTION_GRAPH_TRACER</title>
<updated>2022-11-14T12:28:52+00:00</updated>
<author>
<name>Sami Tolvanen</name>
<email>samitolvanen@google.com</email>
</author>
<published>2022-11-09T19:28:31+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2598ac6ec493566105f8a1029aa9de36ba6946ca'/>
<id>urn:sha1:2598ac6ec493566105f8a1029aa9de36ba6946ca</id>
<content type='text'>
The 0-day bot reports that arm64 builds with CONFIG_CFI_CLANG +
CONFIG_FTRACE are broken when CONFIG_FUNCTION_GRAPH_TRACER is not
enabled:

 ld.lld: error: undefined symbol: __kcfi_typeid_ftrace_stub_graph
 &gt;&gt;&gt; referenced by entry-ftrace.S:299 (arch/arm64/kernel/entry-ftrace.S:299)
 &gt;&gt;&gt;               arch/arm64/kernel/entry-ftrace.o:(.text+0x48) in archive vmlinux.a

This is caused by ftrace_stub_graph using SYM_TYPE_FUNC_START when
the address of the function is not taken in any C translation unit.

Fix the build by only defining ftrace_stub_graph when it's actually
needed, i.e. with CONFIG_FUNCTION_GRAPH_TRACER.

Link: https://lore.kernel.org/lkml/202210251659.tRMs78RH-lkp@intel.com/
Fixes: 883bbbffa5a4 ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Link: https://lore.kernel.org/r/20221109192831.3057131-1-samitolvanen@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()</title>
<updated>2022-10-20T15:10:27+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-18T11:49:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=883bbbffa5a4ffd1915f8b42934dab81b7f87226'/>
<id>urn:sha1:883bbbffa5a4ffd1915f8b42934dab81b7f87226</id>
<content type='text'>
Different function signatures means they needs to be different
functions; otherwise CFI gets upset.

As triggered by the ftrace boot tests:

  [] CFI failure at ftrace_return_to_handler+0xac/0x16c (target: ftrace_stub+0x0/0x14; expected type: 0x0a5d5347)

Fixes: 3c516f89e17e ("x86: Add support for CONFIG_CFI_CLANG")
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lkml.kernel.org/r/Y06dg4e1xF6JTdQq@hirez.programming.kicks-ass.net
</content>
</entry>
</feed>
