<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/net/sched, branch v3.11-rc6</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.11-rc6</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.11-rc6'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2013-08-15T08:43:08+00:00</updated>
<entry>
<title>net_sched: restore "linklayer atm" handling</title>
<updated>2013-08-15T08:43:08+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2013-08-14T21:47:11+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8a8e3d84b1719a56f9151909e80ea6ebc5b8e318'/>
<id>urn:sha1:8a8e3d84b1719a56f9151909e80ea6ebc5b8e318</id>
<content type='text'>
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "linklayer atm" handling.

 tc class add ... htb rate X ceil Y linklayer atm

The linklayer setting is implemented by modifying the rate table
which is send to the kernel.  No direct parameter were
transferred to the kernel indicating the linklayer setting.

The commit 56b765b79 ("htb: improved accuracy at high rates")
removed the use of the rate table system.

To keep compatible with older iproute2 utils, this patch detects
the linklayer by parsing the rate table.  It also supports future
versions of iproute2 to send this linklayer parameter to the
kernel directly. This is done by using the __reserved field in
struct tc_ratespec, to convey the choosen linklayer option, but
only using the lower 4 bits of this field.

Linklayer detection is limited to speeds below 100Mbit/s, because
at high rates the rtab is gets too inaccurate, so bad that
several fields contain the same values, this resembling the ATM
detect.  Fields even start to contain "0" time to send, e.g. at
1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
been more broken than we first realized.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: make dev_trans_start return vlan's real dev trans_start</title>
<updated>2013-08-05T19:17:42+00:00</updated>
<author>
<name>nikolay@redhat.com</name>
<email>nikolay@redhat.com</email>
</author>
<published>2013-08-03T20:07:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=07ce76aa9bcf8bc106a53c67548c5602f1598595'/>
<id>urn:sha1:07ce76aa9bcf8bc106a53c67548c5602f1598595</id>
<content type='text'>
Vlan devices are LLTX and don't update their own trans_start, so if
dev_trans_start has to be called with a vlan device then 0 or a stale
value will be returned. Currently the bonding is the only such user, and
it's needed for proper arp monitoring when the slaves are vlans.
Fix this by extracting the vlan's real device trans_start.

Suggested-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@redhat.com&gt;
Acked-by: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>htb: fix sign extension bug</title>
<updated>2013-08-02T21:52:20+00:00</updated>
<author>
<name>stephen hemminger</name>
<email>stephen@networkplumber.org</email>
</author>
<published>2013-08-02T05:32:07+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=cbd375567f7e4811b1c721f75ec519828ac6583f'/>
<id>urn:sha1:cbd375567f7e4811b1c721f75ec519828ac6583f</id>
<content type='text'>
When userspace passes a large priority value
the assignment of the unsigned value hopt-&gt;prio
to  signed int cl-&gt;prio causes cl-&gt;prio to become negative and the
comparison is with TC_HTB_NUMPRIO is always false.

The result is that HTB crashes by referencing outside
the array when processing packets. With this patch the large value
wraps around like other values outside the normal range.

See: https://bugzilla.kernel.org/show_bug.cgi?id=60669

Signed-off-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: info leak in atm_tc_dump_class()</title>
<updated>2013-07-31T22:04:19+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2013-07-30T10:23:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8cb3b9c3642c0263d48f31d525bcee7170eedc20'/>
<id>urn:sha1:8cb3b9c3642c0263d48f31d525bcee7170eedc20</id>
<content type='text'>
The "pvc" struct has a hole after pvc.sap_family which is not cleared.

Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: Fix stack info leak in cbq_dump_wrr().</title>
<updated>2013-07-30T07:16:21+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2013-07-30T07:16:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a0db856a95a29efb1c23db55c02d9f0ff4f0db48'/>
<id>urn:sha1:a0db856a95a29efb1c23db55c02d9f0ff4f0db48</id>
<content type='text'>
Make sure the reserved fields, and padding (if any), are
fully initialized.

Based upon a patch by Dan Carpenter and feedback from
Joe Perches.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pkt_sched: sch_qfq: remove a source of high packet delay/jitter</title>
<updated>2013-07-18T20:02:00+00:00</updated>
<author>
<name>Paolo Valente</name>
<email>paolo.valente@unimore.it</email>
</author>
<published>2013-07-16T06:52:30+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=87f40dd6ce7042caca0b3b557e8923127f51f902'/>
<id>urn:sha1:87f40dd6ce7042caca0b3b557e8923127f51f902</id>
<content type='text'>
QFQ+ inherits from QFQ a design choice that may cause a high packet
delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a
special quantity, the system virtual time, to track the service
provided by the ideal system it approximates. When a packet is
dequeued, this quantity must be incremented by the size of the packet,
divided by the sum of the weights of the aggregates waiting to be
served. Tracking this sum correctly is a non-trivial task, because, to
preserve tight service guarantees, the decrement of this sum must be
delayed in a special way [1]: this sum can be decremented only after
that its value would decrease also in the ideal system approximated by
QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
weight sum, increased and decreased immediately as the weight of an
aggregate changes, and as an aggregate is created or destroyed (which,
in its turn, happens as a consequence of some class being
created/destroyed/changed). However, to avoid the problems caused to
service guarantees by these immediate decreases, QFQ+ increments the
system virtual time using the maximum value allowed for the weight
sum, 2^10, in place of the dynamic, instantaneous value. The
instantaneous value of the weight sum is used only to check whether a
request of weight increase or a class creation can be satisfied.

Unfortunately, the problems caused by this choice are worse than the
temporary degradation of the service guarantees that may occur, when a
class is changed or destroyed, if the instantaneous value of the
weight sum was used to update the system virtual time. In fact, the
fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
equal to the ratio between the weight of the aggregate and the sum of
the weights of the competing aggregates. The packet delay guaranteed
to the aggregate is instead inversely proportional to the guaranteed
bandwidth. By using the maximum possible value, and not the actual
value of the weight sum, QFQ+ provides each aggregate with the worst
possible service guarantees, and not with service guarantees related
to the actual set of competing aggregates. To see the consequences of
this fact, consider the following simple example.

Suppose that only the following aggregates are backlogged, i.e., that
only the classes in the following aggregates have packets to transmit:
one aggregate with weight 10, say A, and ten aggregates with weight 1,
say B1, B2, ..., B10. In particular, suppose that these aggregates are
always backlogged. Given the weight distribution, the smoothest and
fairest service order would be:
A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...

QFQ+ would provide exactly this optimal service if it used the actual
value for the weight sum instead of the maximum possible value, i.e.,
11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
serves aggregates as follows (easy to prove and to reproduce
experimentally):
A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...

By replacing 10 with N in the above example, and by increasing N, one
can increase at will the maximum packet delay and the jitter
experienced by the classes in aggregate A.

This patch addresses this issue by just using the above
'instantaneous' value of the weight sum, instead of the maximum
possible value, when updating the system virtual time.  After the
instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
service for a time interval in the order of the time to serve one
maximum-size packet for each backlogged class. The worst-case extent
of the deviation exhibited by QFQ+ during this time interval [1] is
basically the same as of the deviation described above (but, without
this patch, QFQ+ suffers from such a deviation all the time). Finally,
this patch modifies the comment to the function qfq_slot_insert, to
make it coherent with the fact that the weight sum used by QFQ+ can
now be lower than the maximum possible value.

[1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
Proceedings of AAA-IDEA'05, June 2005.

Signed-off-by: Paolo Valente &lt;paolo.valente@unimore.it&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts</title>
<updated>2013-07-11T20:01:07+00:00</updated>
<author>
<name>Paolo Valente</name>
<email>paolo.valente@unimore.it</email>
</author>
<published>2013-07-10T13:46:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=88d4f419a43b474a4524f41f55c36bee13416bdd'/>
<id>urn:sha1:88d4f419a43b474a4524f41f55c36bee13416bdd</id>
<content type='text'>
This patch removes the forward declaration of qfq_update_agg_ts, by moving
the definition of the function above its first call. This patch also
removes a useless forward declaration of qfq_schedule_agg.

Reported-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Paolo Valente &lt;paolo.valente@unimore.it&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pkt_sched: sch_qfq: improve efficiency of make_eligible</title>
<updated>2013-07-11T20:01:07+00:00</updated>
<author>
<name>Paolo Valente</name>
<email>paolo.valente@unimore.it</email>
</author>
<published>2013-07-10T13:46:08+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=87f1369d6e2e820c77cf9eac542eed4dcf036f64'/>
<id>urn:sha1:87f1369d6e2e820c77cf9eac542eed4dcf036f64</id>
<content type='text'>
In make_eligible, a mask is used to decide which groups must become eligible:
the i-th group becomes eligible only if the i-th bit of the mask (from the
right) is set. The mask is computed by left-shifting a 1 by a given number of
places, and decrementing the result.  The shift is performed on a ULL to avoid
problems in case the number of places to shift is higher than 31.  On a 32-bit
machine, this is more costly than working on an UL. This patch replaces such a
costly operation with two cheaper branches.

The trick is based on the following fact: in case of a shift of at least 32
places, the resulting mask has at least the 32 less significant bits set,
whereas the total number of groups is lower than 32.  As a consequence, in this
case it is enough to just set the 32 less significant bits of the mask with a
cheaper ~0UL. In the other case, the shift can be safely performed on a UL.

Reported-by: David S. Miller &lt;davem@davemloft.net&gt;
Reported-by: David Laight &lt;David.Laight@ACULAB.COM&gt;
Signed-off-by: Paolo Valente &lt;paolo.valente@unimore.it&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netem: fix possible NULL deref in netem_dequeue()</title>
<updated>2013-07-03T23:52:10+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-07-03T21:04:14+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=36b7bfe09b6deb71bf387852465245783c9a6208'/>
<id>urn:sha1:36b7bfe09b6deb71bf387852465245783c9a6208</id>
<content type='text'>
commit aec0a40a6f7884 ("netem: use rb tree to implement the time queue")
added a regression if a child qdisc is attached to netem, as we perform
a NULL dereference.

Fix this by adding a temporary variable to cache
netem_skb_cb(skb)-&gt;time_to_send.

Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netem: use rb tree to implement the time queue</title>
<updated>2013-07-02T01:07:15+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-06-28T14:40:57+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=aec0a40a6f78843c0ce73f7398230ee5184f896d'/>
<id>urn:sha1:aec0a40a6f78843c0ce73f7398230ee5184f896d</id>
<content type='text'>
Following typical setup to implement a ~100 ms RTT and big
amount of reorders has very poor performance because netem
implements the time queue using a linked list.
-----------------------------------------------------------
ETH=eth0
IFB=ifb0
modprobe ifb
ip link set dev $IFB up
tc qdisc add dev $ETH ingress 2&gt;/dev/null
tc filter add dev $ETH parent ffff: \
   protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress \
   redirect dev $IFB
ethtool -K $ETH gro off tso off gso off
tc qdisc add dev $IFB root netem delay 50ms 10ms limit 100000
tc qd add dev $ETH root netem delay 50ms limit 100000
---------------------------------------------------------

Switch netem time queue to a rb tree, so this kind of setup can work at
high speed.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
