summaryrefslogtreecommitdiff
path: root/kernel/bpf/syscall.c
AgeCommit message (Collapse)Author
5 daysbpf: Add support to specify uprobe_multi target via file descriptorJiri Olsa
Allow uprobe_multi link to identify the target binary by an already opened file descriptor. Adding new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field for the attr.link_create.uprobe_multi struct. When the flag is set, we resolve the target from path_fd, without the flag, we keep the existing string path behavior. I don't see a use case for supporting O_PATH file descriptors, because we need to read the binary first to get probes offsets, so I'm using the CLASS(fd, f), which fails for O_PATH fds. Assisted-by: Codex:GPT-5.4 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: Cancel special fields on map value recycleJustin Suess
Map update and delete paths currently call bpf_obj_free_fields() when a value is being replaced or recycled. That makes field destruction depend on the context of the update/delete operation. For tracing programs this can include NMI context, where referenced kptr destructors, uptr unpinning, and graph root destruction are not generally safe. Introduce bpf_obj_cancel_fields() for the reusable-value path. It only performs NMI-safe cleanup for timer, workqueue, and task_work fields. Fields that need full destruction are left attached to the recycled value and are destroyed by the final cleanup path instead. Switch array and hashtab update/delete/recycle paths to this cancel helper. Keep bpf_obj_free_fields() for final map destruction and for bpf_mem_alloc destructors. Preallocated hashtabs do not have allocator destructors, so teardown continues to walk the normal and extra elements and fully destroy their fields. This deliberately relaxes the eager-free semantics of map update/delete for special fields. Programs that relied on a recycled map slot becoming empty immediately after update/delete were relying on behavior that cannot be implemented safely from every BPF execution context without offloading arbitrary destructors. There is a chance this change breaks programs making assumptions regarding the eager freeing of fields. If so, we can relax semantics to cancellation only when irqs_disabled() is true in the future. However, theoretically, map values that get reused eagerly already have weaker guarantees as parallel users can recreate freed fields before the new element becomes visible again. Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr") Signed-off-by: Justin Suess <utilityemal77@gmail.com> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260609202548.3571690-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Add support for tracing_multi link sessionJiri Olsa
Adding support to use session attachment with tracing_multi link. Adding new BPF_TRACE_FSESSION_MULTI program attach type, that follows the BPF_TRACE_FSESSION behaviour but on the tracing_multi link. Such program is called on entry and exit of the attached function and allows to pass cookie value from entry to exit execution. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-16-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Add support for tracing multi linkJiri Olsa
Adding new link to allow to attach program to multiple function BTF IDs. The link is represented by struct bpf_tracing_multi_link. To configure the link, new fields are added to bpf_attr::link_create to pass array of BTF IDs; struct { __aligned_u64 ids; __u32 cnt; } tracing_multi; Each BTF ID represents function (BTF_KIND_FUNC) that the link will attach bpf program to. We use previously added bpf_trampoline_multi_attach/detach functions to attach/detach the link. The linkinfo/fdinfo callbacks will be implemented in following changes. Note this is supported only for archs (x86_64) with ftrace direct and have single ops support. CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS && CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS Note using sort_r (instead of plain sort) in check_dup_ids, because we will use the swap callback in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-14-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Add multi tracing attach typesJiri Olsa
Adding new program attach types multi tracing attachment: BPF_TRACE_FENTRY_MULTI BPF_TRACE_FEXIT_MULTI and their base support in verifier code. Programs with such attach type will use specific link attachment interface coming in following changes. This was suggested by Andrii some (long) time ago and turned out to be easier than having special program flag for that. Bpf programs with such types have 'bpf_multi_func' function set as their attach_btf_id and keep module reference when it's specified by attach_prog_fd. They are also accepted as sleepable programs during verification, and the real validation for specific BTF_IDs/functions will happen during the multi link attachment in following changes. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-11-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Factor fsession link to use struct bpf_tramp_nodeJiri Olsa
Now that we split trampoline attachment object (bpf_tramp_node) from the link object (bpf_tramp_link) we can use bpf_tramp_node as fsession's fexit attachment object and get rid of the bpf_fsession_link object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-10-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Add struct bpf_tramp_node objectJiri Olsa
Adding struct bpf_tramp_node to decouple the link out of the trampoline attachment info. At the moment the object for attaching bpf program to the trampoline is 'struct bpf_tramp_link': struct bpf_tramp_link { struct bpf_link link; struct hlist_node tramp_hlist; u64 cookie; } The link holds the bpf_prog pointer and forces one link - one program binding logic. In following changes we want to attach program to multiple trampolines but we want to keep just one bpf_link object. Splitting struct bpf_tramp_link into: struct bpf_tramp_link { struct bpf_link link; struct bpf_tramp_node node; }; struct bpf_tramp_node { struct bpf_link *link; struct hlist_node tramp_hlist; u64 cookie; }; The 'struct bpf_tramp_link' defines standard single trampoline link and 'struct bpf_tramp_node' is the attachment trampoline object with pointer to the bpf_link object. This will allow us to define link for multiple trampolines, like: struct bpf_tracing_multi_link { struct bpf_link link; ... int nodes_cnt; struct bpf_tracing_multi_node nodes[] __counted_by(nodes_cnt); }; Cc: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-9-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-05bpf: Expose signature verdict via bpf_prog_auxKP Singh
BPF_PROG_LOAD verifies the loader signature but does not record the outcome on the BPF program. [BPF] LSMs and audit can read attr->signature and attr->keyring_id to infer "was this signed, and if so, against which keyring". Add prog->aux->sig (verdict + keyring_{type,serial}), populated by bpf_prog_load before the LSM hook. keyring_type classifies the keyring the load referenced (builtin, secondary, platform or user), while keyring_serial records the serial of the keyring the signature was actually validated against. System keyrings carry a pseudo key pointer with no user-visible serial and are reported as 0, as are unsigned loads. Failed verifications reject the load before the hook runs, so it observes only either UNSIGNED or VERIFIED. Signed-off-by: KP Singh <kpsingh@kernel.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260605213518.544262-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-05bpf: Check tail zero of bpf_prog_infoLeon Hwang
Since there're 4 bytes padding at the end of struct bpf_prog_info, they won't be checked by bpf_check_uarg_tail_zero(). pahole -C bpf_prog_info ./vmlinux struct bpf_prog_info { ... __u32 attach_btf_obj_id; /* 220 4 */ __u32 attach_btf_id; /* 224 4 */ /* size: 232, cachelines: 4, members: 38 */ /* sum members: 224 */ /* sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 31 bits */ /* padding: 4 */ /* forced alignments: 9 */ /* last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); If a future kernel extension adds a new 4-byte field, older userspace programs allocating this structure on the stack might inadvertently pass uninitialized stack garbage into the new field, permanently breaking backward compatibility. -- sashiko [1] Fix it by changing sizeof(info) to offsetofend(struct bpf_prog_info, attach_btf_id). And, add "__u32 :32" to the tail of struct bpf_prog_info. [1] https://lore.kernel.org/bpf/20260513224823.6494FC19425@smtp.kernel.org/ Fixes: aba64c7da983 ("bpf: Add verified_insns to bpf_prog_info and fdinfo") Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260605155249.20772-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-05bpf: Check tail zero of bpf_map_infoLeon Hwang
Since there're 4 bytes padding at the end of struct bpf_map_info, they won't be checked by bpf_check_uarg_tail_zero(). pahole -C bpf_map_info ./vmlinux struct bpf_map_info { ... __u64 hash __attribute__((__aligned__(8))); /* 88 8 */ __u32 hash_size; /* 96 4 */ /* size: 104, cachelines: 2, members: 18 */ /* padding: 4 */ /* forced alignments: 1 */ /* last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); If a future kernel extension adds a new 4-byte field, older userspace programs allocating this structure on the stack might inadvertently pass uninitialized stack garbage into the new field, permanently breaking backward compatibility. -- sashiko [1] Fix it by changing sizeof(info) to offsetofend(struct bpf_map_info, hash_size). And, add "__u32 :32" to the tail of struct bpf_map_info. [1] https://lore.kernel.org/bpf/20260513224823.6494FC19425@smtp.kernel.org/ Fixes: ea2e6467ac36 ("bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD") Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260605155249.20772-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-05bpf: Allow special fields in resizable hashtabMykyta Yatsenko
Add support for timers, workqueues, task work, spin locks and kptrs. Without this, users needing deferred callbacks, BPF_F_LOCK, or refcounted kernel pointers in a dynamically-sized map have no option - fixed-size htab is the only map supporting these field types. Resizable hashtab should offer the same capability. kptr semantics under in-place updates are identical to array map. Properly clean up BTF record fields on element delete and map teardown by wiring up bpf_obj_free_fields through a memory allocator destructor, matching the pattern used by htab for non-prealloc maps. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260605-rhash-v7-6-5b8e05f8630d@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-05bpf: Implement resizable hashmap basic functionsMykyta Yatsenko
Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast() for deletes, and rhashtable_lookup_get_insert_fast() for inserts. Updates modify values in place under RCU rather than allocating a new element and swapping the pointer (as regular htab does). This trades read consistency for performance: concurrent readers may see partial updates. BPF_F_LOCK support and special-field handling (timers, kptrs, etc.) follow in a later commit. Initialize rhashtable with bpf_mem_alloc element cache. Require BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via rhashtable_free_and_destroy(). Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260605-rhash-v7-4-5b8e05f8630d@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01libbpf: Reject non-exclusive metadata maps in the signed loaderKP Singh
The loader verifies map->sha against the metadata hash in its instructions. map->sha is calculated when BPF_OBJ_GET_INFO_BY_FD is called on the frozen map. While the map is frozen, the /signed loader/ must also ensure the map is exclusive, as, without exclusivity (which a hostile host could just omit when loading the loader), another BPF program with map access can mutate the contents afterwards, so the check passes on stale data. With the extra check as part of the signed loader, it now refuses to move on with map->sha validation if the host set it up wrongly. Fixes: fb2b0e290147 ("libbpf: Update light skeleton for signing") Signed-off-by: KP Singh <kpsingh@kernel.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Drop redundant hash_buf from map_get_hash operationDaniel Borkmann
bpf_map_get_info_by_fd() is the only caller of the ->map_get_hash and always invokes it with hash_buf == map->sha and hash_buf_size of SHA256_DIGEST_SIZE. array_map_get_hash() in turn lets sha256() write the digest directly into that buffer (map->sha) and then performs a trailing memcpy(), which evaluates to memcpy(map->sha, map->sha, 32): a redundant self-copy. The hash_buf_size argument was never used at all. Simplify this a bit, no functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-31bpf: Fix security_bpf_prog_load() error handlingPaul Moore
If security_bpf_prog_load() fails there is no need to call into security_bpf_prog_free() as the LSM will handle the cleanup of any partial LSM state before returning to the caller with an error. Thankfully this isn't an issue with any of the existing code as the LSMs which currently provide BPF hook callback implementations don't allocate any internal state, but this is something we want to fix for potential future users. Signed-off-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20260523160025.16363-2-paul@paul-moore.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-31bpf: fix BPF_PROG_QUERY OOB write and cgroup backward compatYuyang Huang
BPF_PROG_QUERY writes back the 'query.revision' field unconditionally to userspace. If userspace passes a smaller 'bpf_attr' structure (e.g. 40 bytes, which was the layout before the addition of 'query.revision'), the kernel performs an out-of-bounds write. Fix this by propagating the user-provided attribute size 'uattr_size' down to the cgroup query handlers, and conditionally skipping writing the revision field to userspace when the provided buffer size is insufficient. query.revision in bpf_mprog_query is structurally identical to the cgroup case: a late tail field, written unconditionally. But the backward-compat hazard is not the same. The min-historical-size test is per command, and bpf_mprog_query only serves attach types that were born with revision in the struct: - tcx_prog_query -> BPF_TCX_INGRESS/EGRESS - netkit_prog_query -> BPF_NETKIT_PRIMARY/PEER tcx, netkit, the revision field, and bpf_mprog_query itself all landed in the same v6.6 merge window (053c8e1f235d added the mprog query API + revision; tcx in e420bed02507, netkit in 35dfaad7188c). There has never been a tcx/netkit BPF_PROG_QUERY userspace that doesn't know about revision. So for these commands the minimum legitimate struct already covers offset 56-64 — no old binary can be broken here. Contrast with cgroup: BPF_PROG_QUERY on cgroup attach types shipped in 2017; revision write-back was bolted on years later (120933984460). That path has a real population of pre-revision callers. Fixes: 120933984460 ("bpf: Implement mprog API on top of existing cgroup progs") Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Link: https://lore.kernel.org/r/20260531075600.4058207-2-yuyanghuang@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-28bpf: Fix race between bpf_map_new_fd() and close_fd()Leon Hwang
Because there is time gap between bpf_map_new_fd() and close_fd(), a concurrent thread is able to close the new fd and opens a new, unrelated file with the exact same fd number. Thereafter, this close_fd() might inadvertently close the unrelated file. To avoid such regression, do finalize log before security_bpf_map_create(). However, in order to achieve it, move bpf_get_file_flag(), security_bpf_map_create(), bpf_map_alloc_id(), and bpf_map_new_fd() from __map_create() to map_create(). And, rename __map_create() to map_create_alloc() meanwhile. Then, in order to reuse the map and token when all checks pass in map_create_alloc(), pass "struct bpf_map **" and "struct bpf_token **" to map_create_alloc(). Fixes: 49f9b2b2a18c ("bpf: Add syscall common attributes support for map_create") Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260521142909.95818-1-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc5Alexei Starovoitov
Cross-merge BPF and other fixes after downstream PR. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-18bpf: Check tail zero of bpf_common_attr using offsetofendLeon Hwang
Because of the 8-byte alignment, the compiler will pad struct bpf_common_attr to 24 bytes. That said, sizeof(attr_common) is 24 instead of 20. When check tail zero using sizeof(attr_common) in bpf_check_uarg_tail_zero(), there will be 4 bytes that won't be checked. To also check the padding 4 bytes, replace sizeof(attr_common) with offsetofend(struct bpf_common_attr, log_true_size). Fixes: f28771c0691b ("bpf: Extend BPF syscall with common attributes support") Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260518145446.6794-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-12bpf: Add syscall common attributes support for map_createLeon Hwang
Many BPF_MAP_CREATE validation failures currently return -EINVAL without any explanation to userspace. Plumb common syscall log attributes into map_create(), create a verifier log from bpf_common_attr::log_buf/log_size/log_level, and report map-creation failure reasons through that buffer. This improves debuggability by allowing userspace to inspect why map creation failed and read back log_true_size from common attributes. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260512153157.28382-7-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-12bpf: Add syscall common attributes support for btf_loadLeon Hwang
BPF_BTF_LOAD can now take log parameters from both union bpf_attr and struct bpf_common_attr, with the same merge rules as BPF_PROG_LOAD: - if both sides provide a complete log tuple (buf/size/level) and they match, use it; - if only one side provides log parameters, use that one; - if both sides provide complete tuples but they differ, return -EINVAL. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260512153157.28382-6-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-12bpf: Add syscall common attributes support for prog_loadLeon Hwang
BPF_PROG_LOAD can now take log parameters from both union bpf_attr and struct bpf_common_attr. The merge rules are: - if both sides provide a complete log tuple (buf/size/level) and they match, use it; - if only one side provides log parameters, use that one; - if both sides provide complete tuples but they differ, return -EINVAL. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260512153157.28382-5-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-12bpf: Refactor reporting log_true_size for prog_loadLeon Hwang
The next commit will add support for reporting logs via extended common attributes, including 'log_true_size'. To prepare for that, refactor the 'log_true_size' reporting logic by introducing a new struct bpf_log_attr to encapsulate log-related behavior: * bpf_log_attr_init(): initialize log fields, which will support extended common attributes in the next commit. * bpf_log_attr_finalize(): handle log finalization and write back 'log_true_size' to userspace. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260512153157.28382-4-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-12bpf: Extend BPF syscall with common attributes supportLeon Hwang
Add generic BPF syscall support for passing common attributes. The initial set of common attributes includes: 1. 'log_buf': User-provided buffer for storing logs. 2. 'log_size': Size of the log buffer. 3. 'log_level': Log verbosity level. 4. 'log_true_size': Actual log size reported by kernel. The common-attribute pointer and its size are passed as the 4th and 5th syscall arguments. A new command bit, 'BPF_COMMON_ATTRS' ('1 << 16'), indicates that common attributes are supplied. This commit adds syscall and uapi plumbing. Command-specific handling is added in follow-up patches. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260512153157.28382-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-11bpf: Fix s16 truncation for large bpf-to-bpf call offsetsYazhou Tang
Currently, the BPF instruction set allows bpf-to-bpf calls (or internal calls, pseudo calls) to use a 32-bit imm field to represent the relative jump offset. However, when JIT is disabled or falls back to the interpreter, the verifier invokes bpf_patch_call_args() to rewrite the call instruction. In this function, the 32-bit imm is downcast to s16 and stored in the off field. void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth) { stack_depth = max_t(u32, stack_depth, 1); insn->off = (s16) insn->imm; insn->imm = interpreters_args[(round_up(stack_depth, 32) / 32) - 1] - __bpf_call_base_args; insn->code = BPF_JMP | BPF_CALL_ARGS; } If the original imm exceeds the s16 range (i.e., a jump offset greater than 32767 instructions), this downcast silently truncates the offset, resulting in an incorrect call target. Fix this by: 1. In bpf_patch_call_args(), keeping the imm field unchanged and using the off field to store the index of the interpreter function. 2. In ___bpf_prog_run() for the JMP_CALL_ARGS case, retrieving the interpreter function pointer from the interpreters_args array using the off field as the index, and passing the original imm to calculate the last argument of the interpreter function. After these changes, the truncation issue is resolved, and __bpf_call_base_args is also no longer needed and can be removed, which makes the code cleaner. Performance: In ___bpf_prog_run() for the JMP_CALL_ARGS case, changing the retrieval of the interpreter function pointer from pointer addition to direct array indexing improves performance. The possible reason is that the latter has better instruction-level parallelism. See the v5 discussion [1] for more details. [1] https://lore.kernel.org/bpf/f120c3c4-6999-414a-b514-518bb64b4758@zju.edu.cn/ To avoid requiring bpftool changes, keep the new imm/off encoding internal and restore the legacy xlated dump layout in bpf_insn_prepare_dump(). For bpf-to-bpf call offsets that do not fit in s16, export off as 0 instead of a truncated and misleading value. Fixes: 1ea47e01ad6e ("bpf: add support for bpf_call to interpreter") Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump") Suggested-by: Xu Kuohai <xukuohai@huaweicloud.com> Suggested-by: Puranjay Mohan <puranjay@kernel.org> Co-developed-by: Tianci Cao <ziye@zju.edu.cn> Signed-off-by: Tianci Cao <ziye@zju.edu.cn> Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com> Link: https://lore.kernel.org/r/20260506094714.419842-3-tangyazhou@zju.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-22bpf: Verifier support for sleepable tracepoint programsMykyta Yatsenko
Allow BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_TRACEPOINT, and BPF_TRACE_RAW_TP (tp_btf) programs to be sleepable by adding them to can_be_sleepable(). For BTF-based raw tracepoints (tp_btf), add a load-time check in bpf_check_attach_target() that rejects sleepable programs attaching to non-faultable tracepoints with a descriptive error message. For raw tracepoints (raw_tp), add an attach-time check in bpf_raw_tp_link_attach() that rejects sleepable programs on non-faultable tracepoints. The attach-time check is needed because the tracepoint name is not known at load time for raw_tp. The attach-time check for classic tracepoints (tp) in __perf_event_set_bpf_prog() was added in the previous patch. Replace the verbose error message that enumerates allowed program types with a generic "Program of this type cannot be sleepable" message, since the list of sleepable-capable types keeps growing. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20260422-sleepable_tracepoints-v13-4-99005dff21ef@meta.com Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2026-04-16bpf: Pass bpf_verifier_env to JITXu Kuohai
Pass bpf_verifier_env to bpf_int_jit_compile(). The follow-up patch will use env->insn_aux_data in the JIT stage to detect indirect jump targets. Since bpf_prog_select_runtime() can be called by cbpf and lib/test_bpf.c code without verifier, introduce helper __bpf_prog_select_runtime() to accept the env parameter. Remove the call to bpf_prog_select_runtime() in bpf_prog_load(), and switch to call __bpf_prog_select_runtime() in the verifier, with env variable passed. The original bpf_prog_select_runtime() is preserved for cbpf and lib/test_bpf.c, where env is NULL. Now all constants blinding calls are moved into the verifier, except the cbpf and lib/test_bpf.c cases. The instructions arrays are adjusted by bpf_patch_insn_data() function for normal cases, so there is no need to call adjust_insn_arrays() in bpf_jit_blind_constants(). Remove it. Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> # v8 Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> # v12 Acked-by: Hengqi Chen <hengqi.chen@gmail.com> # v14 Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260416064341.151802-3-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07bpf: Retire rcu_trace_implies_rcu_gp()Kumar Kartikeya Dwivedi
RCU Tasks Trace grace period implies RCU grace period, and this guarantee is expected to remain in the future. Only BPF is the user of this predicate, hence retire the API and clean up all in-tree users. RCU Tasks Trace is now implemented on SRCU-fast and its grace period mechanism always has at least one call to synchronize_rcu() as it is required for SRCU-fast's correctness (it replaces the smp_mb() that SRCU-fast readers skip). So, RCU-tt GP will always imply RCU GP. Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260407162234.785270-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06bpf: Enable unaligned accesses for syscall ctxKumar Kartikeya Dwivedi
Don't reject usage of fixed unaligned offsets for syscall ctx. Tests will be added in later commits. Unaligned offsets already work for variable offsets. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260406194403.1649608-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.0-rc6+Alexei Starovoitov
Cross-merge BPF and other fixes after downstream PR. Minor conflict in kernel/bpf/verifier.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02bpf: Fix abuse of kprobe_write_ctx via freplaceLeon Hwang
uprobe programs are allowed to modify struct pt_regs. Since the actual program type of uprobe is KPROBE, it can be abused to modify struct pt_regs via kprobe+freplace when the kprobe attaches to kernel functions. For example, SEC("?kprobe") int kprobe(struct pt_regs *regs) { return 0; } SEC("?freplace") int freplace_kprobe(struct pt_regs *regs) { regs->di = 0; return 0; } freplace_kprobe prog will attach to kprobe prog. kprobe prog will attach to a kernel function. Without this patch, when the kernel function runs, its first arg will always be set as 0 via the freplace_kprobe prog. To fix the abuse of kprobe_write_ctx=true via kprobe+freplace, disallow attaching freplace programs on kprobe programs with different kprobe_write_ctx values. Fixes: 7384893d970e ("bpf: Allow uprobe program to change context registers") Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260331145353.87606-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-31bpf: Fix grace period wait for tracepoint bpf_linkKumar Kartikeya Dwivedi
Recently, tracepoints were switched from using disabled preemption (which acts as RCU read section) to SRCU-fast when they are not faultable. This means that to do a proper grace period wait for programs running in such tracepoints, we must use SRCU's grace period wait. This is only for non-faultable tracepoints, faultable ones continue using RCU Tasks Trace. However, bpf_link_free() currently does call_rcu() for all cases when the link is non-sleepable (hence, for tracepoints, non-faultable). Fix this by doing a call_srcu() grace period wait. As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed unnecessary for tracepoint programs. The link and program are either accessed under RCU Tasks Trace protection, or SRCU-fast protection now. The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was to generalize the logic, even if it conceded an extra RCU gp wait, however that is unnecessary for tracepoints even before this change. In practice no cost was paid since rcu_trace_implies_rcu_gp() was always true. Hence we need not chaining any RCU gp after the SRCU gp. For instance, in the non-faultable raw tracepoint, the RCU read section of the program in __bpf_trace_run() is enclosed in the SRCU gp, likewise for faultable raw tracepoint, the program is under the RCU Tasks Trace protection. Hence, the outermost scope can be waited upon to ensure correctness. Also, sleepable programs cannot be attached to non-faultable tracepoints, so whenever program or link is sleepable, only RCU Tasks Trace protection is being used for the link and prog. Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20260331211021.1632902-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-24bpf: propagate kvmemdup_bpfptr errors from bpf_prog_verify_signatureWeixie Cui
kvmemdup_bpfptr() returns -EFAULT when the user pointer cannot be copied, and -ENOMEM on allocation failure. The error path always returned -ENOMEM, misreporting bad addresses as out-of-memory. Return PTR_ERR(sig) so user space gets the correct errno. Signed-off-by: Weixie Cui <cuiweixie@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/tencent_C9C5B2B28413D6303D505CD02BFEA4708C07@qq.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-27bpf: Lose const-ness of map in map_check_btf()Kumar Kartikeya Dwivedi
BPF hash map may now use the map_check_btf() callback to decide whether to set a dtor on its bpf_mem_alloc or not. Unlike C++ where members can opt out of const-ness using mutable, we must lose the const qualifier on the callback such that we can avoid the ugly cast. Make the change and adjust all existing users, and lose the comment in hashtab.c. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
2026-02-05bpf: Require frozen map for calculating map hashKP Singh
Currently, bpf_map_get_info_by_fd calculates and caches the hash of the map regardless of the map's frozen state. This leads to a TOCTOU bug where userspace can call BPF_OBJ_GET_INFO_BY_FD to cache the hash and then modify the map contents before freezing. Therefore, a trusted loader can be tricked into verifying the stale hash while loading the modified contents. Fix this by returning -EPERM if the map is not frozen when the hash is requested. This ensures the hash is only generated for the final, immutable state of the map. Fixes: ea2e6467ac36 ("bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD") Reported-by: Toshi Piazza <toshi.piazza@microsoft.com> Signed-off-by: KP Singh <kpsingh@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260205070755.695776-1-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-05bpf: Limit bpf program signature sizeKP Singh
Practical BPF signatures are significantly smaller than KMALLOC_MAX_CACHE_SIZE Allowing larger sizes opens the door for abuse by passing excessive size values and forcing the kernel into expensive allocation paths (via kmalloc_large or vmalloc). Fixes: 349271568303 ("bpf: Implement signature verification for BPF programs") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: KP Singh <kpsingh@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260205063807.690823-1-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27bpf: Fix tcx/netkit detach permissions when prog fd isn't givenGuillaume Gonnet
This commit fixes a security issue where BPF_PROG_DETACH on tcx or netkit devices could be executed by any user when no program fd was provided, bypassing permission checks. The fix adds a capability check for CAP_NET_ADMIN or CAP_SYS_ADMIN in this case. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Signed-off-by: Guillaume Gonnet <ggonnet.linux@gmail.com> Link: https://lore.kernel.org/r/20260127160200.10395-1-ggonnet.linux@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24bpf: add fsession supportMenglong Dong
The fsession is something that similar to kprobe session. It allow to attach a single BPF program to both the entry and the exit of the target functions. Introduce the struct bpf_fsession_link, which allows to add the link to both the fentry and fexit progs_hlist of the trampoline. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Co-developed-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260124062008.8657-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20kernel.h: drop hex.h and update all hex.h usersRandy Dunlap
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20bpf: Fix memory access flags in helper prototypesZesen Liu
After commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking"), the verifier started relying on the access type flags in helper function prototypes to perform memory access optimizations. Currently, several helper functions utilizing ARG_PTR_TO_MEM lack the corresponding MEM_RDONLY or MEM_WRITE flags. This omission causes the verifier to incorrectly assume that the buffer contents are unchanged across the helper call. Consequently, the verifier may optimize away subsequent reads based on this wrong assumption, leading to correctness issues. For bpf_get_stack_proto_raw_tp, the original MEM_RDONLY was incorrect since the helper writes to the buffer. Change it to ARG_PTR_TO_UNINIT_MEM which correctly indicates write access to potentially uninitialized memory. Similar issues were recently addressed for specific helpers in commit ac44dcc788b9 ("bpf: Fix verifier assumptions of bpf_d_path's output buffer") and commit 2eb7648558a7 ("bpf: Specify access type of bpf_sysctl_get_name args"). Fix these prototypes by adding the correct memory access flags. Fixes: 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") Co-developed-by: Shuran Liu <electronlsr@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Zesen Liu <ftyghome@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260120-helper_proto-v3-1-27b0180b4e77@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for ↵Leon Hwang
percpu_cgroup_storage maps Introduce BPF_F_ALL_CPUS flag support for percpu_cgroup_storage maps to allow updating values for all CPUs with a single value for update_elem API. Introduce BPF_F_CPU flag support for percpu_cgroup_storage maps to allow: * update value for specified CPU for update_elem API. * lookup value for specified CPU for lookup_elem API. The BPF_F_CPU flag is passed via map_flags along with embedded cpu info. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260107022022.12843-6-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash and ↵Leon Hwang
lru_percpu_hash maps Introduce BPF_F_ALL_CPUS flag support for percpu_hash and lru_percpu_hash maps to allow updating values for all CPUs with a single value for both update_elem and update_batch APIs. Introduce BPF_F_CPU flag support for percpu_hash and lru_percpu_hash maps to allow: * update value for specified CPU for both update_elem and update_batch APIs. * lookup value for specified CPU for both lookup_elem and lookup_batch APIs. The BPF_F_CPU flag is passed via: * map_flags along with embedded cpu info. * elem_flags along with embedded cpu info. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260107022022.12843-4-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array mapsLeon Hwang
Introduce support for the BPF_F_ALL_CPUS flag in percpu_array maps to allow updating values for all CPUs with a single value for both update_elem and update_batch APIs. Introduce support for the BPF_F_CPU flag in percpu_array maps to allow: * update value for specified CPU for both update_elem and update_batch APIs. * lookup value for specified CPU for both lookup_elem and lookup_batch APIs. The BPF_F_CPU flag is passed via: * map_flags of lookup_elem and update_elem APIs along with embedded cpu info. * elem_flags of lookup_batch and update_batch APIs along with embedded cpu info. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260107022022.12843-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flagsLeon Hwang
Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags and check them for following APIs: * 'map_lookup_elem()' * 'map_update_elem()' * 'generic_map_lookup_batch()' * 'generic_map_update_batch()' And, get the correct value size for these APIs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260107022022.12843-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02bpf: arena: Reintroduce memcg accountingPuranjay Mohan
When arena allocations were converted from bpf_map_alloc_pages() to kmalloc_nolock() to support non-sleepable contexts, memcg accounting was inadvertently lost. This commit restores proper memory accounting for all arena-related allocations. All arena related allocations are accounted into memcg of the process that created bpf_arena. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260102200230.25168-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02bpf: syscall: Introduce memcg enter/exit helpersPuranjay Mohan
Introduce bpf_map_memcg_enter() and bpf_map_memcg_exit() helpers to reduce code duplication in memcg context management. bpf_map_memcg_enter() gets the memcg from the map, sets it as active, and returns both the previous and the now active memcg. bpf_map_memcg_exit() restores the previous active memcg and releases the reference obtained during enter. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260102200230.25168-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.19-rc1Alexei Starovoitov
Cross-merge BPF and other fixes after downstream PR. Signed-off-by: Alexei Starovoitov <ast@kernel.org>