<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/sched, branch docs-4.16</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-4.16</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-4.16'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2017-12-06T20:34:10+00:00</updated>
<entry>
<title>net_sched: use macvlan real dev trans_start in dev_trans_start()</title>
<updated>2017-12-06T20:34:10+00:00</updated>
<author>
<name>Chris Dion</name>
<email>christopher.dion@dell.com</email>
</author>
<published>2017-12-06T15:50:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=32d3e51a82d453762ef148b2c4fbc8a7ec374a88'/>
<id>urn:sha1:32d3e51a82d453762ef148b2c4fbc8a7ec374a88</id>
<content type='text'>
Macvlan devices are similar to vlans and do not update their
own trans_start. In order for arp monitoring to work for a bond device
when the slaves are macvlans, obtain its real device.

Signed-off-by: Chris Dion &lt;christopher.dion@dell.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: red: Avoid illegal values</title>
<updated>2017-12-05T19:37:13+00:00</updated>
<author>
<name>Nogah Frankel</name>
<email>nogahf@mellanox.com</email>
</author>
<published>2017-12-04T11:31:11+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8afa10cbe281b10371fee5a87ab266e48d71a7f9'/>
<id>urn:sha1:8afa10cbe281b10371fee5a87ab266e48d71a7f9</id>
<content type='text'>
Check the qmin &amp; qmax values doesn't overflow for the given Wlog value.
Check that qmin &lt;= qmax.

Fixes: a783474591f2 ("[PKT_SCHED]: Generic RED layer")
Signed-off-by: Nogah Frankel &lt;nogahf@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>act_sample: get rid of tcf_sample_cleanup_rcu()</title>
<updated>2017-11-30T15:19:17+00:00</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-11-30T00:07:51+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=90a6ec85351b31449c2c6b5406b5396ac96f191d'/>
<id>urn:sha1:90a6ec85351b31449c2c6b5406b5396ac96f191d</id>
<content type='text'>
Similar to commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu"),
TC actions don't need to respect RCU grace period, because it
is either just detached from tc filter (standalone case) or
it is removed together with tc filter (bound case) in which case
RCU grace period is already respected at filter layer.

Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Cc: Yotam Gigi &lt;yotamg@mellanox.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: cbq: create block for q-&gt;link.block</title>
<updated>2017-11-28T21:04:26+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@mellanox.com</email>
</author>
<published>2017-11-27T17:37:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=d51aae68b142f48232257e96ce317db25445418d'/>
<id>urn:sha1:d51aae68b142f48232257e96ce317db25445418d</id>
<content type='text'>
q-&gt;link.block is not initialized, that leads to EINVAL when one tries to
add filter there. So initialize it properly.

This can be reproduced by:
$ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit
$ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1

Reported-by: Jaroslav Aster &lt;jaster@redhat.com&gt;
Reported-by: Ivan Vecera &lt;ivecera@redhat.com&gt;
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Acked-by: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Reviewed-by: Ivan Vecera &lt;ivecera@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sch_sfq: fix null pointer dereference at timer expiration</title>
<updated>2017-11-28T20:54:05+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2017-11-28T13:28:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f85729d07cd649bf69820f14bdad36326a24a699'/>
<id>urn:sha1:f85729d07cd649bf69820f14bdad36326a24a699</id>
<content type='text'>
While converting sch_sfq to use timer_setup(), the commit cdeabbb88134
("net: sched: Convert timers to use timer_setup()") forgot to
initialize the 'sch' field. As a result, the timer callback tries to
dereference a NULL pointer, and the kernel does oops.

Fix it initializing such field at qdisc creation time.

Fixes: cdeabbb88134 ("net: sched: Convert timers to use timer_setup()")
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>cls_bpf: don't decrement net's refcount when offload fails</title>
<updated>2017-11-28T20:49:44+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2017-11-27T19:11:41+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=25415cec502a1232b19fffc85465882b19a90415'/>
<id>urn:sha1:25415cec502a1232b19fffc85465882b19a90415</id>
<content type='text'>
When cls_bpf offload was added it seemed like a good idea to
call cls_bpf_delete_prog() instead of extending the error
handling path, since the software state is fully initialized
at that point.  This handling of errors without jumping to
the end of the function is error prone, as proven by later
commit missing that extra call to __cls_bpf_delete_prog().

__cls_bpf_delete_prog() is now expected to be invoked with
a reference on exts-&gt;net or the field zeroed out.  The call
on the offload's error patch does not fullfil this requirement,
leading to each error stealing a reference on net namespace.

Create a function undoing what cls_bpf_set_parms() did and
use it from __cls_bpf_delete_prog() and the error path.

Fixes: aae2c35ec892 ("cls_bpf: use tcf_exts_get_net() before call_rcu()")
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Reviewed-by: Simon Horman &lt;simon.horman@netronome.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: crash on blocks with goto chain action</title>
<updated>2017-11-25T14:57:20+00:00</updated>
<author>
<name>Roman Kapl</name>
<email>code@rkapl.cz</email>
</author>
<published>2017-11-24T11:27:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a60b3f515d30d0fe8537c64671926879a3548103'/>
<id>urn:sha1:a60b3f515d30d0fe8537c64671926879a3548103</id>
<content type='text'>
tcf_block_put_ext has assumed that all filters (and thus their goto
actions) are destroyed in RCU callback and thus can not race with our
list iteration. However, that is not true during netns cleanup (see
tcf_exts_get_net comment).

Prevent the user after free by holding all chains (except 0, that one is
already held). foreach_safe is not enough in this case.

To reproduce, run the following in a netns and then delete the ns:
    ip link add dtest type dummy
    tc qdisc add dev dtest ingress
    tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2

Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()")
Signed-off-by: Roman Kapl &lt;code@rkapl.cz&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2017-11-23T17:33:01+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-11-23T17:33:01+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e4be7baba81a816bdf778804508b43fa92c6446d'/>
<id>urn:sha1:e4be7baba81a816bdf778804508b43fa92c6446d</id>
<content type='text'>
Daniel Borkmann says:

====================
pull-request: bpf 2017-11-23

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Several BPF offloading fixes, from Jakub. Among others:

    - Limit offload to cls_bpf and XDP program types only.
    - Move device validation into the driver and don't make
      any assumptions about the device in the classifier due
      to shared blocks semantics.
    - Don't pass offloaded XDP program into the driver when
      it should be run in native XDP instead. Offloaded ones
      are not JITed for the host in such cases.
    - Don't destroy device offload state when moved to
      another namespace.
    - Revert dumping offload info into user space for now,
      since ifindex alone is not sufficient. This will be
      redone properly for bpf-next tree.

2) Fix test_verifier to avoid using bpf_probe_write_user()
   helper in test cases, since it's dumping a warning into
   kernel log which may confuse users when only running tests.
   Switch to use bpf_trace_printk() instead, from Yonghong.

3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
   before it becomes uabi, from Gianluca. More specifically:

    - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
      by bpf_csum_diff(), where the argument is either a
      valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
      then enforces a valid pointer in case of non-0 size
      or a valid pointer or NULL in case of size 0. Given
      that, the semantics for ARG_PTR_TO_MEM in combination
      with ARG_CONST_SIZE_OR_ZERO are now such that in case
      of size 0, the pointer must always be valid and cannot
      be NULL. This fix in semantics allows for bpf_probe_read()
      to drop the recently added size == 0 check in the helper
      that would become part of uabi otherwise once released.
      At the same time we can then fix bpf_probe_read_str() and
      bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
      instead of ARG_CONST_SIZE in order to fix recently
      reported issues by Arnaldo et al, where LLVM optimizes
      two boundary checks into a single one for unknown
      variables where the verifier looses track of the variable
      bounds and thus rejects valid programs otherwise.

4) A fix for the verifier for the case when it detects
   comparison of two constants where the branch is guaranteed
   to not be taken at runtime. Verifier will rightfully prune
   the exploration of such paths, but we still pass the program
   to JITs, where they would complain about using reserved
   fields, etc. Track such dead instructions and sanitize
   them with mov r0,r0. Rejection is not possible since LLVM
   may generate them for valid C code and doesn't do as much
   data flow analysis as verifier. For bpf-next we might
   implement removal of such dead code and adjust branches
   instead. Fix from Alexei.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: accept UFO datagrams from tuntap and packet</title>
<updated>2017-11-23T16:37:35+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-11-21T15:22:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0c19f846d582af919db66a5914a0189f9f92c936'/>
<id>urn:sha1:0c19f846d582af919db66a5914a0189f9f92c936</id>
<content type='text'>
Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)-&gt;ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
  Booted a v4.13 guest kernel with QEMU. On a host kernel before this
  patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
  enabled, same as on a v4.13 host kernel.

  A UFO packet sent from the guest appears on the tap device:
    host:
      nc -l -p -u 8000 &amp;
      tcpdump -n -i tap0

    guest:
      dd if=/dev/zero of=payload.txt bs=1 count=2000
      nc -u 192.16.1.1 8000 &lt; payload.txt

  Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
  packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
  v1 -&gt; v2
    - simplified set_offload change (review comment)
    - documented test procedure

Link: http://lkml.kernel.org/r/&lt;CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com&gt;
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: fix crash when deleting secondary chains</title>
<updated>2017-11-23T16:25:37+00:00</updated>
<author>
<name>Roman Kapl</name>
<email>code@rkapl.cz</email>
</author>
<published>2017-11-20T21:21:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=d7aa04a5e82b4f254d306926c81eae8df69e5200'/>
<id>urn:sha1:d7aa04a5e82b4f254d306926c81eae8df69e5200</id>
<content type='text'>
If you flush (delete) a filter chain other than chain 0 (such as when
deleting the device), the kernel may run into a use-after-free. The
chain refcount must not be decremented unless we are sure we are done
with the chain.

To reproduce the bug, run:
    ip link add dtest type dummy
    tc qdisc add dev dtest ingress
    tc filter add dev dtest chain 1  parent ffff: flower
    ip link del dtest

Introduced in: commit f93e1cdcf42c ("net/sched: fix filter flushing"),
but unless you have KAsan or luck, you won't notice it until
commit 0dadc117ac8b ("cls_flower: use tcf_exts_get_net() before call_rcu()")

Fixes: f93e1cdcf42c ("net/sched: fix filter flushing")
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: Roman Kapl &lt;code@rkapl.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
