<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/include/uapi/linux/pkt_sched.h, branch docs-5.3</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-5.3</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-5.3'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2019-05-01T15:58:51+00:00</updated>
<entry>
<title>taprio: Add support for cycle-time-extension</title>
<updated>2019-05-01T15:58:51+00:00</updated>
<author>
<name>Vinicius Costa Gomes</name>
<email>vinicius.gomes@intel.com</email>
</author>
<published>2019-04-29T22:48:33+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c25031e993440debdd530278ce2171ce477df029'/>
<id>urn:sha1:c25031e993440debdd530278ce2171ce477df029</id>
<content type='text'>
IEEE 802.1Q-2018 defines the concept of a cycle-time-extension, so the
last entry of a schedule before the start of a new schedule can be
extended, so "too-short" entries can be avoided.

Signed-off-by: Vinicius Costa Gomes &lt;vinicius.gomes@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>taprio: Add support for setting the cycle-time manually</title>
<updated>2019-05-01T15:58:51+00:00</updated>
<author>
<name>Vinicius Costa Gomes</name>
<email>vinicius.gomes@intel.com</email>
</author>
<published>2019-04-29T22:48:32+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6ca6a6654225f3cd001304d33429c817e0c0b85f'/>
<id>urn:sha1:6ca6a6654225f3cd001304d33429c817e0c0b85f</id>
<content type='text'>
IEEE 802.1Q-2018 defines that a the cycle-time of a schedule may be
overridden, so the schedule is truncated to a determined "width".

Signed-off-by: Vinicius Costa Gomes &lt;vinicius.gomes@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>taprio: Add support adding an admin schedule</title>
<updated>2019-05-01T15:58:51+00:00</updated>
<author>
<name>Vinicius Costa Gomes</name>
<email>vinicius.gomes@intel.com</email>
</author>
<published>2019-04-29T22:48:31+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a3d43c0d56f1b94e74963a2fbadfb70126d92213'/>
<id>urn:sha1:a3d43c0d56f1b94e74963a2fbadfb70126d92213</id>
<content type='text'>
The IEEE 802.1Q-2018 defines two "types" of schedules, the "Oper" (from
operational?) and "Admin" ones. Up until now, 'taprio' only had
support for the "Oper" one, added when the qdisc is created. This adds
support for the "Admin" one, which allows the .change() operation to
be supported.

Just for clarification, some quick (and dirty) definitions, the "Oper"
schedule is the currently (as in this instant) running one, and it's
read-only. The "Admin" one is the one that the system configurator has
installed, it can be changed, and it will be "promoted" to "Oper" when
it's 'base-time' is reached.

The idea behing this patch is that calling something like the below,
(after taprio is already configured with an initial schedule):

$ tc qdisc change taprio dev IFACE parent root 	     \
     	   base-time X 	     	   	       	     \
     	   sched-entry &lt;CMD&gt; &lt;GATES&gt; &lt;INTERVAL&gt;	     \
	   ...

Will cause a new admin schedule to be created and programmed to be
"promoted" to "Oper" at instant X. If an "Admin" schedule already
exists, it will be overwritten with the new parameters.

Up until now, there was some code that was added to ease the support
of changing a single entry of a schedule, but was ultimately unused.
Now, that we have support for "change" with more well thought
semantics, updating a single entry seems to be less useful.

So we remove what is in practice dead code, and return a "not
supported" error if the user tries to use it. If changing a single
entry would make the user's life easier we may ressurrect this idea,
but at this point, removing it simplifies the code.

For now, only the schedule specific bits are allowed to be added for a
new schedule, that means that 'clockid', 'num_tc', 'map' and 'queues'
cannot be modified.

Example:

$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time $BASE_TIME \
      sched-entry S 00 500000 \
      sched-entry S 0f 500000 \
      clockid CLOCK_TAI

The only change in the netlink API introduced by this change is the
introduction of an "admin" type in the response to a dump request,
that type allows userspace to separate the "oper" schedule from the
"admin" schedule. If userspace doesn't support the "admin" type, it
will only display the "oper" schedule.

Signed-off-by: Vinicius Costa Gomes &lt;vinicius.gomes@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sch_cake: Permit use of connmarks as tin classifiers</title>
<updated>2019-03-04T04:14:28+00:00</updated>
<author>
<name>Kevin Darbyshire-Bryant</name>
<email>ldir@darbyshire-bryant.me.uk</email>
</author>
<published>2019-03-01T15:04:05+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc'/>
<id>urn:sha1:0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc</id>
<content type='text'>
Add flag 'FWMARK' to enable use of firewall connmarks as tin selector.
The connmark (skbuff-&gt;mark) needs to be in the range 1-&gt;tin_cnt ie.
for diffserv3 the mark needs to be 1-&gt;3.

Background

Typically CAKE uses DSCP as the basis for tin selection.  DSCP values
are relatively easily changed as part of the egress path, usually with
iptables &amp; the mangle table, ingress is more challenging.  CAKE is often
used on the WAN interface of a residential gateway where passthrough of
DSCP from the ISP is either missing or set to unhelpful values thus use
of ingress DSCP values for tin selection isn't helpful in that
environment.

An approach to solving the ingress tin selection problem is to use
CAKE's understanding of tc filters.  Naive tc filters could match on
source/destination port numbers and force tin selection that way, but
multiple filters don't scale particularly well as each filter must be
traversed whether it matches or not. e.g. a simple example to map 3
firewall marks to tins:

MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
tc filter add dev $DEV parent $MAJOR protocol all handle 0x01 fw action skbedit priority ${MAJOR}1
tc filter add dev $DEV parent $MAJOR protocol all handle 0x02 fw action skbedit priority ${MAJOR}2
tc filter add dev $DEV parent $MAJOR protocol all handle 0x03 fw action skbedit priority ${MAJOR}3

Another option is to use eBPF cls_act with tc filters e.g.

MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
tc filter add dev $DEV parent $MAJOR bpf da obj my-bpf-fwmark-to-class.o

This has the disadvantages of a) needing someone to write &amp; maintain
the bpf program, b) a bpf toolchain to compile it and c) needing to
hardcode the major number in the bpf program so it matches the cake
instance (or forcing the cake instance to a particular major number)
since the major number cannot be passed to the bpf program via tc
command line.

As already hinted at by the previous examples, it would be helpful
to associate tins with something that survives the Internet path and
ideally allows tin selection on both egress and ingress.  Netfilter's
conntrack permits setting an identifying mark on a connection which
can also be restored to an ingress packet with tc action connmark e.g.

tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev ifb1

Since tc's connmark action has restored any connmark into skb-&gt;mark,
any of the previous solutions are based upon it and in one form or
another copy that mark to the skb-&gt;priority field where again CAKE
picks this up.

This change cuts out at least one of the (less intuitive &amp;
non-scalable) middlemen and permit direct access to skb-&gt;mark.

Signed-off-by: Kevin Darbyshire-Bryant &lt;ldir@darbyshire-bryant.me.uk&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: pie: add more cases to auto-tune alpha and beta</title>
<updated>2019-02-25T22:21:03+00:00</updated>
<author>
<name>Mohit P. Tahiliani</name>
<email>tahiliani@nitk.edu.in</email>
</author>
<published>2019-02-25T19:09:59+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3f7ae5f3dc5295ac17d6521130ed8a8f8a723fbf'/>
<id>urn:sha1:3f7ae5f3dc5295ac17d6521130ed8a8f8a723fbf</id>
<content type='text'>
The current implementation scales the local alpha and beta
variables in the calculate_probability function by the same
amount for all values of drop probability below 1%.

RFC 8033 suggests using additional cases for auto-tuning
alpha and beta when the drop probability is less than 1%.

In order to add more auto-tuning cases, MAX_PROB must be
scaled by u64 instead of u32 to prevent underflow when
scaling the local alpha and beta variables in the
calculate_probability function.

Signed-off-by: Mohit P. Tahiliani &lt;tahiliani@nitk.edu.in&gt;
Signed-off-by: Dhaval Khandla &lt;dhavaljkhandla26@gmail.com&gt;
Signed-off-by: Hrishikesh Hiraskar &lt;hrishihiraskar@gmail.com&gt;
Signed-off-by: Manish Kumar B &lt;bmanish15597@gmail.com&gt;
Signed-off-by: Sachin D. Patil &lt;sdp.sachin@gmail.com&gt;
Signed-off-by: Leslie Monis &lt;lesliemonis@gmail.com&gt;
Acked-by: Dave Taht &lt;dave.taht@gmail.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: gred: allow manipulating per-DP RED flags</title>
<updated>2018-11-17T07:08:51+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2018-11-15T06:23:51+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=72111015024f4eddb5aac400ddbe38a4f8f0279a'/>
<id>urn:sha1:72111015024f4eddb5aac400ddbe38a4f8f0279a</id>
<content type='text'>
Allow users to set and dump RED flags (ECN enabled and harddrop)
on per-virtual queue basis.  Validation of attributes is split
from changes to make sure we won't have to undo previous operations
when we find out configuration is invalid.

The objective is to allow changing per-Qdisc parameters without
overwriting the per-vq configured flags.

Old user space will not pass the TCA_GRED_VQ_FLAGS attribute and
per-Qdisc flags will always get propagated to the virtual queues.

New user space which wants to make use of per-vq flags should set
per-Qdisc flags to 0 and then configure per-vq flags as it
sees fit.  Once per-vq flags are set per-Qdisc flags can't be
changed to non-zero.  Vice versa - if the per-Qdisc flags are
non-zero the TCA_GRED_VQ_FLAGS attribute has to either be omitted
or set to the same value as per-Qdisc flags.

Update per-Qdisc parameters:
per-Qdisc | per-VQ | result
        0 |      0 | all vq flags updated
	0 |  non-0 | error (vq flags in use)
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | all vq flags updated

Update per-VQ state (flags parameter not specified):
   no change to flags

Update per-VQ state (flags parameter set):
per-Qdisc | per-VQ | result
        0 |   any  | per-vq flags updated
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | error (per-Qdisc flags in use)

Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Reviewed-by: John Hurley &lt;john.hurley@netronome.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: gred: provide a better structured dump and expose stats</title>
<updated>2018-11-17T07:08:51+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2018-11-15T06:23:49+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=80e22e961dfd15530215f6f6dcd94cd8f65ba1ea'/>
<id>urn:sha1:80e22e961dfd15530215f6f6dcd94cd8f65ba1ea</id>
<content type='text'>
Currently all GRED's virtual queue data is dumped in a single
array in a single attribute.  This makes it pretty much impossible
to add new fields.  In order to expose more detailed stats add a
new set of attributes.  We can now expose the 64 bit value of bytesin
and all the mark stats which were not part of the original design.

Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Reviewed-by: John Hurley &lt;john.hurley@netronome.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: sch_fq: add dctcp-like marking</title>
<updated>2018-11-11T21:59:21+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-11-11T17:11:31+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=48872c11b77271ef9b070bdc50afe6655c4eb9aa'/>
<id>urn:sha1:48872c11b77271ef9b070bdc50afe6655c4eb9aa</id>
<content type='text'>
Similar to 80ba92fa1a92 ("codel: add ce_threshold attribute")

After EDT adoption, it became easier to implement DCTCP-like CE marking.

In many cases, queues are not building in the network fabric but on
the hosts themselves.

If packets leaving fq missed their Earliest Departure Time by XXX usec,
we mark them with ECN CE. This gives a feedback (after one RTT) to
the sender to slow down and find better operating mode.

Example :

tc qd replace dev eth0 root fq ce_threshold 2.5ms

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tc: Add support for configuring the taprio scheduler</title>
<updated>2018-10-04T20:52:23+00:00</updated>
<author>
<name>Vinicius Costa Gomes</name>
<email>vinicius.gomes@intel.com</email>
</author>
<published>2018-09-29T00:59:43+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5a781ccbd19e4664babcbe4b4ead7aa2b9283d22'/>
<id>urn:sha1:5a781ccbd19e4664babcbe4b4ead7aa2b9283d22</id>
<content type='text'>
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
          num_tc 3 \
	  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	  queues 1@0 1@1 2@2 \
	  base-time 1528743495910289987 \
	  sched-entry S 01 300000 \
	  sched-entry S 02 300000 \
	  sched-entry S 04 300000 \
	  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

     sched-entry &lt;CMD&gt; &lt;GATE MASK&gt; &lt;INTERVAL&gt;

The only supported &lt;CMD&gt; is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). &lt;GATE MASK&gt;
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. &lt;INTERVAL&gt; is a time duration
in nanoseconds that specifies for how long that state defined by &lt;CMD&gt;
and &lt;GATE MASK&gt; should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

 	      base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes &lt;vinicius.gomes@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/sched: fix type of htb statistics</title>
<updated>2018-09-02T20:57:12+00:00</updated>
<author>
<name>Florent Fourcot</name>
<email>florent.fourcot@wifirst.fr</email>
</author>
<published>2018-08-30T14:39:23+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b9de3963cc2b373a655636335cb8c4ed12fc9d3b'/>
<id>urn:sha1:b9de3963cc2b373a655636335cb8c4ed12fc9d3b</id>
<content type='text'>
tokens and ctokens are defined as s64 in htb_class structure,
and clamped to 32bits value during netlink dumps:

cl-&gt;xstats.tokens = clamp_t(s64, PSCHED_NS2TICKS(cl-&gt;tokens),
                            INT_MIN, INT_MAX);

Defining it as u32 is working since userspace (tc) is printing it as
signed int, but a correct definition from the beginning is probably
better.

In the same time, 'giants' structure member is unused since years, so
update the comment to mark it unused.

Signed-off-by: Florent Fourcot &lt;florent.fourcot@wifirst.fr&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
