author: Tejun Heo <tj@kernel.org> 2026-06-01 08:37:28 -1000
committer: Alexei Starovoitov <ast@kernel.org> 2026-06-05 08:22:36 -0700
commit: f64c723741c911544cca4c838d7a291b06b3ad1d (patch)
tree: a9c1224ebd7b471b2b8368216f6067ba56abc483 /include/linux
parent: aa496720618f1a6054f1c870bf10b4f6c99bf656 (diff)
bpf: Replace scratch PTE atomically when allocating arena pages

apply_range_set_cb() maps the pages for a new arena allocation and used to return -EBUSY when the target PTE was already populated. Kernel-fault recovery leaves the per-arena scratch page in unallocated arena PTEs, so a later bpf_arena_alloc_pages() over such a page hits that -EBUSY, and every subsequent allocation of that page fails the same way. Allocation must install the real page over scratch instead.

Overwriting the scratch PTE in place is a valid->valid change, which arm64 forbids without break-before-make. Route through an invalid entry instead: ptep_try_set() fills only a none slot, so the PTE goes scratch->none->page. On finding scratch, clear it and call flush_tlb_before_set() before retrying. The new flush_tlb_before_set() is a no-op except on arches like arm64 that need the break-before-make TLB invalidate. The loop also copes with a concurrent fault re-scratching the slot.

Arches without ptep_try_set() never install the scratch page, so keep the must-be-empty check and set_pte_at() for them.

Fixes: dc11a4dba246 ("bpf: Recover arena kernel faults with scratch page")
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260601183728.1800490-1-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

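
The scratch->none->page sequence described in the commit message can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions, not the kernel code: pte_val, PTE_NONE, PTE_SCRATCH, try_set(), flush_before_set(), flush_count and install_page() are hypothetical stand-ins, C11 atomics take the place of the real PTE accessors, -1 stands in for -EBUSY, and clearing scratch with a compare-and-swap is an assumption about how the clear step might be done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's pte_t encoding. */
typedef uint64_t pte_val;
#define PTE_NONE    ((pte_val)0)   /* invalid (empty) entry */
#define PTE_SCRATCH ((pte_val)1)   /* per-arena scratch page */

/* Counts flushes so the model's behavior can be observed. */
static int flush_count;

/* Models ptep_try_set(): installs new_pte only if the slot is none. */
static bool try_set(_Atomic pte_val *slot, pte_val new_pte)
{
	pte_val expected = PTE_NONE;

	return atomic_compare_exchange_strong(slot, &expected, new_pte);
}

/* Models flush_tlb_before_set(): a no-op in the generic kernel version. */
static void flush_before_set(void)
{
	flush_count++;
}

/*
 * Install a real page, replacing scratch via an invalid entry.
 * Returns 0 on success, or -1 (standing in for -EBUSY) when the
 * slot already holds a real page.
 */
static int install_page(_Atomic pte_val *slot, pte_val page)
{
	assert(page != PTE_NONE && page != PTE_SCRATCH);

	for (;;) {
		pte_val expected = PTE_SCRATCH;

		/* Fast path: the slot is none, so the page goes in directly. */
		if (try_set(slot, page))
			return 0;

		/* Break: scratch -> none, then flush before the retry. */
		if (atomic_compare_exchange_strong(slot, &expected, PTE_NONE)) {
			flush_before_set();
			continue;
		}

		/* Neither none nor scratch: a real page is already mapped. */
		if (expected != PTE_NONE)
			return -1;

		/* The slot became none concurrently; retry the install. */
	}
}
```

In this model the entry never changes from one valid value to another in a single step, and a concurrent writer that re-installs scratch between the clear and the retry simply causes another pass through the loop.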
Diffstat (limited to 'include/linux')
-rw-r--r-- include/linux/pgtable.h 18
1 files changed, 18 insertions, 0 deletions
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b5739bb99fc1..4c6c4081ef71 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1061,6 +1061,24 @@ static inline bool ptep_try_set(pte_t *ptep, pte_t new_pte)
}
#endif
+#ifndef flush_tlb_before_set
+/**
+ * flush_tlb_before_set - invalidate a kernel PTE's TLB before re-setting it
+ * @addr: kernel virtual address whose PTE was just cleared
+ *
+ * Some architectures (e.g. arm64) do not allow a live page-table entry to be
+ * repointed at a different page in one step. The old entry must first be made
+ * invalid and its translation flushed from every TLB, and only then may the new
+ * entry be written.
+ *
+ * This is only for the lockless atomic kernel-PTE installers (ptep_try_set()).
+ * It must be callable with interrupts disabled.
+ */
+static inline void flush_tlb_before_set(unsigned long addr)
+{
+}
+#endif
+
#ifndef wrprotect_ptes
/**
* wrprotect_ptes - Write-protect PTEs that map consecutive pages of the same