<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/kernel/bpf/syscall.c, branch v6.4-rc1</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v6.4-rc1</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v6.4-rc1'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2023-04-21T18:34:14+00:00</updated>
<entry>
<title>bpf: add bpf_link support for BPF_NETFILTER programs</title>
<updated>2023-04-21T18:34:14+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2023-04-21T17:02:54+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=84601d6ee68ae820dec97450934797046d62db4b'/>
<id>urn:sha1:84601d6ee68ae820dec97450934797046d62db4b</id>
<content type='text'>
Add bpf_link support skeleton.  To keep this reviewable, no bpf program
can be invoked yet, if a program is attached only a c-stub is called and
not the actual bpf program.

Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.

Uapi example usage:
	union bpf_attr attr = { };

	attr.link_create.prog_fd = progfd;
	attr.link_create.attach_type = 0; /* unused */
	attr.link_create.netfilter.pf = PF_INET;
	attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN;
	attr.link_create.netfilter.priority = -128;

	err = bpf(BPF_LINK_CREATE, &amp;attr, sizeof(attr));

... this would attach progfd to ipv4:input hook.

Such hook gets removed automatically if the calling program exits.

BPF_NETFILTER program invocation is added in followup change.

NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it
allows to tell userspace which program is attached at the given hook
when user runs 'nft hook list' command rather than just the priority
and not-very-helpful 'this hook runs a bpf prog but I can't tell which
one'.

Will also be used to disallow registration of two bpf programs with
same priority in a followup patch.

v4: arm32 cmpxchg only supports 32bit operand
    s/prio/priority/
v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if
    more use cases pop up (arptables, ebtables, netdev ingress/egress etc).

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: lirc program type should not require SYS_CAP_ADMIN</title>
<updated>2023-04-17T20:21:14+00:00</updated>
<author>
<name>Sean Young</name>
<email>sean@mess.org</email>
</author>
<published>2023-04-17T08:17:48+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=69a8c792cd9518071dc801bb110e0f2210d9f958'/>
<id>urn:sha1:69a8c792cd9518071dc801bb110e0f2210d9f958</id>
<content type='text'>
Make it possible to load lirc program type with just CAP_BPF. There is
nothing exceptional about lirc programs that means they require
SYS_CAP_ADMIN.

In order to attach or detach a lirc program type you need permission to
open /dev/lirc0; if you have permission to do that, you can alter all
sorts of lirc receiving options. Changing the IR protocol decoder is no
different.

Right now on a typical distribution /dev/lirc devices are only
read/write by root. Ideally we would make them group read/write like
other devices so that local users can use them without becoming root.

Signed-off-by: Sean Young &lt;sean@mess.org&gt;
Link: https://lore.kernel.org/r/ZD0ArKpwnDBJZsrE@gofer.mess.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing</title>
<updated>2023-04-16T00:36:49+00:00</updated>
<author>
<name>Dave Marchevsky</name>
<email>davemarchevsky@fb.com</email>
</author>
<published>2023-04-15T20:18:04+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=d54730b50bae1f3119bd686d551d66f0fcc387ca'/>
<id>urn:sha1:d54730b50bae1f3119bd686d551d66f0fcc387ca</id>
<content type='text'>
A 'struct bpf_refcount' is added to the set of opaque uapi/bpf.h types
meant for use in BPF programs. Similarly to other opaque types like
bpf_spin_lock and bpf_rbtree_node, the verifier needs to know where in
user-defined struct types a bpf_refcount can be located, so necessary
btf_record plumbing is added to enable this. bpf_refcount is sized to
hold a refcount_t.

Similarly to bpf_spin_lock, the offset of a bpf_refcount is cached in
btf_record as refcount_off in addition to being in the field array.
Caching refcount_off makes sense for this field because further patches
in the series will modify functions that take local kptrs (e.g.
bpf_obj_drop) to change their behavior if the type they're operating on
is refcounted. So enabling fast "is this type refcounted?" checks is
desirable.

No such verifier behavior changes are introduced in this patch, just
logic to recognize 'struct bpf_refcount' in btf_record.

Signed-off-by: Dave Marchevsky &lt;davemarchevsky@fb.com&gt;
Link: https://lore.kernel.org/r/20230415201811.343116-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Remove btf_field_offs, use btf_record's fields instead</title>
<updated>2023-04-16T00:36:49+00:00</updated>
<author>
<name>Dave Marchevsky</name>
<email>davemarchevsky@fb.com</email>
</author>
<published>2023-04-15T20:18:03+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=cd2a8079014aced27da9b2e669784f31680f1351'/>
<id>urn:sha1:cd2a8079014aced27da9b2e669784f31680f1351</id>
<content type='text'>
The btf_field_offs struct contains (offset, size) for btf_record fields,
sorted by offset. btf_field_offs is always used in conjunction with
btf_record, which has btf_field 'fields' array with (offset, type), the
latter of which btf_field_offs' size is derived from via
btf_field_type_size.

This patch adds a size field to struct btf_field and sorts btf_record's
fields by offset, making it possible to get rid of btf_field_offs. Less
data duplication and less code complexity results.

Since btf_field_offs' lifetime closely followed the btf_record used to
populate it, most complexity wins are from removal of initialization
code like:

  if (btf_record_successfully_initialized) {
    foffs = btf_parse_field_offs(rec);
    if (IS_ERR_OR_NULL(foffs))
      // free the btf_record and return err
  }

Other changes in this patch are pretty mechanical:

  * foffs-&gt;field_off[i] -&gt; rec-&gt;fields[i].offset
  * foffs-&gt;field_sz[i] -&gt; rec-&gt;fields[i].size
  * Sort rec-&gt;fields in btf_parse_fields before returning
    * It's possible that this is necessary independently of other
      changes in this patch. btf_record_find in syscall.c expects
      btf_record's fields to be sorted by offset, yet there's no
      explicit sorting of them before this patch, record's fields are
      populated in the order they're read from BTF struct definition.
      BTF docs don't say anything about the sortedness of struct fields.
  * All functions taking struct btf_field_offs * input now instead take
    struct btf_record *. All callsites of these functions already have
    access to the correct btf_record.

Signed-off-by: Dave Marchevsky &lt;davemarchevsky@fb.com&gt;
Link: https://lore.kernel.org/r/20230415201811.343116-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add log_true_size output field to return necessary log buffer size</title>
<updated>2023-04-11T16:05:43+00:00</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2023-04-06T23:41:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=47a71c1f9af0a334c9dfa97633c41de4feda4287'/>
<id>urn:sha1:47a71c1f9af0a334c9dfa97633c41de4feda4287</id>
<content type='text'>
Add output-only log_true_size and btf_log_true_size field to
BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. It will return
the size of log buffer necessary to fit in all the log contents at
specified log_level. This is very useful for BPF loader libraries like
libbpf to be able to size log buffer correctly, but could be used by
users directly, if necessary, as well.

This patch plumbs all this through the code, taking into account actual
bpf_attr size provided by user to determine if these new fields are
expected by users. And if they are, set them from kernel on return.

We refactory btf_parse() function to accommodate this, moving attr and
uattr handling inside it. The rest is very straightforward code, which
is split from the logging accounting changes in the previous patch to
make it simpler to review logic vs UAPI changes.

Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Lorenz Bauer &lt;lmb@isovalent.com&gt;
Link: https://lore.kernel.org/bpf/20230406234205.323208-13-andrii@kernel.org
</content>
</entry>
<entry>
<title>bpf: Only invoke kptr dtor following non-NULL xchg</title>
<updated>2023-03-25T23:56:22+00:00</updated>
<author>
<name>David Vernet</name>
<email>void@manifault.com</email>
</author>
<published>2023-03-25T21:31:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1431d0b584a673ea690c88a5f7e1aedd9caf0e84'/>
<id>urn:sha1:1431d0b584a673ea690c88a5f7e1aedd9caf0e84</id>
<content type='text'>
When a map value is being freed, we loop over all of the fields of the
corresponding BPF object and issue the appropriate cleanup calls
corresponding to the field's type. If the field is a referenced kptr, we
atomically xchg the value out of the map, and invoke the kptr's
destructor on whatever was there before (or bpf_obj_drop() it if it was
a local kptr).

Currently, we always invoke the destructor (either bpf_obj_drop() or the
kptr's registered destructor) on any KPTR_REF-type field in a map, even
if there wasn't a value in the map. This means that any function serving
as the kptr's KF_RELEASE destructor must always treat the argument as
possibly NULL, as the following can and regularly does happen:

void *xchgd_field;

/* No value was in the map, so xchgd_field is NULL */
xchgd_field = (void *)xchg(unsigned long *field_ptr, 0);
field-&gt;kptr.dtor(xchgd_field);

These are odd semantics to impose on KF_RELEASE kfuncs -- BPF programs
are prohibited by the verifier from passing NULL pointers to KF_RELEASE
kfuncs, so it doesn't make sense to require this of BPF programs, but
not the main kernel destructor path. It's also unnecessary to invoke any
cleanup logic for local kptrs. If there is no object there, there's
nothing to drop.

So as to allow KF_RELEASE kfuncs to fully assume that an argument is
non-NULL, this patch updates a KPTR_REF's destructor to only be invoked
when a non-NULL value is xchg'd out of the kptr map field.

Signed-off-by: David Vernet &lt;void@manifault.com&gt;
Link: https://lore.kernel.org/r/20230325213144.486885-2-void@manifault.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Check IS_ERR for the bpf_map_get() return value</title>
<updated>2023-03-24T19:40:47+00:00</updated>
<author>
<name>Martin KaFai Lau</name>
<email>martin.lau@kernel.org</email>
</author>
<published>2023-03-24T18:42:41+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=55fbae05476df65e5eee8be54f61d0257af0240b'/>
<id>urn:sha1:55fbae05476df65e5eee8be54f61d0257af0240b</id>
<content type='text'>
This patch fixes a mistake in checking NULL instead of
checking IS_ERR for the bpf_map_get() return value.

It also fixes the return value in link_update_map() from -EINVAL
to PTR_ERR(*_map).

Reported-by: syzbot+71ccc0fe37abb458406b@syzkaller.appspotmail.com
Fixes: 68b04864ca42 ("bpf: Create links for BPF struct_ops maps.")
Fixes: aef56f2e918b ("bpf: Update the struct_ops of a bpf_link.")
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Acked-by: Kui-Feng Lee &lt;kuifeng@meta.com&gt;
Acked-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/r/20230324184241.1387437-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Update the struct_ops of a bpf_link.</title>
<updated>2023-03-23T05:53:02+00:00</updated>
<author>
<name>Kui-Feng Lee</name>
<email>kuifeng@meta.com</email>
</author>
<published>2023-03-23T03:24:02+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=aef56f2e918bf8fc8de25f0b36e8c2aba44116ec'/>
<id>urn:sha1:aef56f2e918bf8fc8de25f0b36e8c2aba44116ec</id>
<content type='text'>
By improving the BPF_LINK_UPDATE command of bpf(), it should allow you
to conveniently switch between different struct_ops on a single
bpf_link. This would enable smoother transitions from one struct_ops
to another.

The struct_ops maps passing along with BPF_LINK_UPDATE should have the
BPF_F_LINK flag.

Signed-off-by: Kui-Feng Lee &lt;kuifeng@meta.com&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Create links for BPF struct_ops maps.</title>
<updated>2023-03-23T05:53:02+00:00</updated>
<author>
<name>Kui-Feng Lee</name>
<email>kuifeng@meta.com</email>
</author>
<published>2023-03-23T03:24:00+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=68b04864ca425d1894c96b8141d4fba1181f11cb'/>
<id>urn:sha1:68b04864ca425d1894c96b8141d4fba1181f11cb</id>
<content type='text'>
Make bpf_link support struct_ops.  Previously, struct_ops were always
used alone without any associated links. Upon updating its value, a
struct_ops would be activated automatically. Yet other BPF program
types required to make a bpf_link with their instances before they
could become active. Now, however, you can create an inactive
struct_ops, and create a link to activate it later.

With bpf_links, struct_ops has a behavior similar to other BPF program
types. You can pin/unpin them from their links and the struct_ops will
be deactivated when its link is removed while previously need someone
to delete the value for it to be deactivated.

bpf_links are responsible for registering their associated
struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag
set to create a bpf_link, while a structs without this flag behaves in
the same manner as before and is registered upon updating its value.

The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it
used to craft the links for BPF struct_ops programs, but also to
create links for BPF struct_ops them-self.  Since the links of BPF
struct_ops programs are only used to create trampolines internally,
they are never seen in other contexts. Thus, they can be reused for
struct_ops themself.

To maintain a reference to the map supporting this link, we add
bpf_struct_ops_link as an additional type. The pointer of the map is
RCU and won't be necessary until later in the patchset.

Signed-off-by: Kui-Feng Lee &lt;kuifeng@meta.com&gt;
Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Retire the struct_ops map kvalue-&gt;refcnt.</title>
<updated>2023-03-23T05:51:47+00:00</updated>
<author>
<name>Kui-Feng Lee</name>
<email>kuifeng@meta.com</email>
</author>
<published>2023-03-23T03:23:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b671c2067a04c0668df174ff5dfdb573d1f9b074'/>
<id>urn:sha1:b671c2067a04c0668df174ff5dfdb573d1f9b074</id>
<content type='text'>
We have replaced kvalue-refcnt with synchronize_rcu() to wait for an
RCU grace period.

Maintenance of kvalue-&gt;refcnt was a complicated task, as we had to
simultaneously keep track of two reference counts: one for the
reference count of bpf_map. When the kvalue-&gt;refcnt reaches zero, we
also have to reduce the reference count on bpf_map - yet these steps
are not performed in an atomic manner and require us to be vigilant
when managing them. By eliminating kvalue-&gt;refcnt, we can make our
maintenance more straightforward as the refcount of bpf_map is now
solely managed!

To prevent the trampoline image of a struct_ops from being released
while it is still in use, we wait for an RCU grace period. The
setsockopt(TCP_CONGESTION, "...") command allows you to change your
socket's congestion control algorithm and can result in releasing the
old struct_ops implementation. It is fine. However, this function is
exposed through bpf_setsockopt(), it may be accessed by BPF programs
as well. To ensure that the trampoline image belonging to struct_op
can be safely called while its method is in use, the trampoline
safeguarde the BPF program with rcu_read_lock(). Doing so prevents any
destruction of the associated images before returning from a
trampoline and requires us to wait for an RCU grace period.

Signed-off-by: Kui-Feng Lee &lt;kuifeng@meta.com&gt;
Link: https://lore.kernel.org/r/20230323032405.3735486-2-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
</feed>
