diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-01-03 12:53:47 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-01-03 12:53:47 -0800 |
commit | 43d86ee8c639df750529b4d8f062b328b61c423e (patch) | |
tree | 076161dd7ce3f843b9c965a780ecfbf020f75e8e /Documentation/networking | |
parent | 645ff1e8e704c4f33ab1fcd3c87f95cb9b6d7144 (diff) | |
parent | c5ee066333ebc322a24a00a743ed941a0c68617e (diff) | |
download | lwn-43d86ee8c639df750529b4d8f062b328b61c423e.tar.gz lwn-43d86ee8c639df750529b4d8f062b328b61c423e.zip |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Several fixes here. Basically split down the line between newly
introduced regressions and long existing problems:
1) Double free in tipc_enable_bearer(), from Cong Wang.
2) Many fixes to nf_conncount, from Florian Westphal.
3) op->get_regs_len() can throw an error, check it, from Yunsheng
Lin.
4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
driver, from Scott Wood.
5) Inifnite loop in fib_empty_table(), from Yue Haibing.
6) Use after free in ax25_fillin_cb(), from Cong Wang.
7) Fix socket locking in nr_find_socket(), also from Cong Wang.
8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.
9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.
10) Fix ptr_ring wrap during queue swap, from Cong Wang.
11) Missing shutdown callback in hinic driver, from Xue Chaojing.
12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
Brivio.
13) BPF out of bounds speculation fixes from Daniel Borkmann"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
ipv6: Consider sk_bound_dev_if when binding a socket to an address
ipv6: Fix dump of specific table with strict checking
bpf: add various test cases to selftests
bpf: prevent out of bounds speculation on pointer arithmetic
bpf: fix check_map_access smin_value test when pointer contains offset
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
bpf: restrict stack pointer arithmetic for unprivileged
bpf: restrict map value pointer arithmetic for unprivileged
bpf: enable access to ax register also from verifier rewrite
bpf: move tmp variable into ax register in interpreter
bpf: move {prev_,}insn_idx into verifier env
isdn: fix kernel-infoleak in capi_unlocked_ioctl
ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
net/hamradio/6pack: use mod_timer() to rearm timers
net-next/hinic:add shutdown callback
net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
ip: validate header length on virtual device xmit
tap: call skb_probe_transport_header after setting skb->dev
ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
net: rds: remove unnecessary NULL check
...
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/snmp_counter.rst | 240 |
1 files changed, 239 insertions, 1 deletions
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst index f8eb77ddbd44..b0dfdaaca512 100644 --- a/Documentation/networking/snmp_counter.rst +++ b/Documentation/networking/snmp_counter.rst @@ -571,7 +571,97 @@ duplicate packet is received. * TcpExtTCPDSACKOfoRecv The TCP stack receives a DSACK, which indicate an out of order -duplciate packet is received. +duplicate packet is received. + +TCP out of order +=============== +* TcpExtTCPOFOQueue +The TCP layer receives an out of order packet and has enough memory +to queue it. + +* TcpExtTCPOFODrop +The TCP layer receives an out of order packet but doesn't have enough +memory, so drops it. Such packets won't be counted into +TcpExtTCPOFOQueue. + +* TcpExtTCPOFOMerge +The received out of order packet has an overlay with the previous +packet. the overlay part will be dropped. All of TcpExtTCPOFOMerge +packets will also be counted into TcpExtTCPOFOQueue. + +TCP PAWS +======= +PAWS (Protection Against Wrapped Sequence numbers) is an algorithm +which is used to drop old packets. It depends on the TCP +timestamps. For detail information, please refer the `timestamp wiki`_ +and the `RFC of PAWS`_. + +.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17 +.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps + +* TcpExtPAWSActive +Packets are dropped by PAWS in Syn-Sent status. + +* TcpExtPAWSEstab +Packets are dropped by PAWS in any status other than Syn-Sent. + +TCP ACK skip +=========== +In some scenarios, kernel would avoid sending duplicate ACKs too +frequently. Please find more details in the tcp_invalid_ratelimit +section of the `sysctl document`_. When kernel decides to skip an ACK +due to tcp_invalid_ratelimit, kernel would update one of below +counters to indicate the ACK is skipped in which scenario. The ACK +would only be skipped if the received packet is either a SYN packet or +it has no data. + +.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt + +* TcpExtTCPACKSkippedSynRecv +The ACK is skipped in Syn-Recv status. The Syn-Recv status means the +TCP stack receives a SYN and replies SYN+ACK. Now the TCP stack is +waiting for an ACK. Generally, the TCP stack doesn't need to send ACK +in the Syn-Recv status. But in several scenarios, the TCP stack need +to send an ACK. E.g., the TCP stack receives the same SYN packet +repeately, the received packet does not pass the PAWS check, or the +received packet sequence number is out of window. In these scenarios, +the TCP stack needs to send ACK. If the ACk sending frequency is higher than +tcp_invalid_ratelimit allows, the TCP stack will skip sending ACK and +increase TcpExtTCPACKSkippedSynRecv. + + +* TcpExtTCPACKSkippedPAWS +The ACK is skipped due to PAWS (Protect Against Wrapped Sequence +numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2 +or Time-Wait statuses, the skipped ACK would be counted to +TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or +TcpExtTCPACKSkippedTimeWait. In all other statuses, the skipped ACK +would be counted to TcpExtTCPACKSkippedPAWS. + +* TcpExtTCPACKSkippedSeq +The sequence number is out of window and the timestamp passes the PAWS +check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait. + +* TcpExtTCPACKSkippedFinWait2 +The ACK is skipped in Fin-Wait-2 status, the reason would be either +PAWS check fails or the received sequence number is out of window. + +* TcpExtTCPACKSkippedTimeWait +Tha ACK is skipped in Time-Wait status, the reason would be either +PAWS check failed or the received sequence number is out of window. + +* TcpExtTCPACKSkippedChallenge +The ACK is skipped if the ACK is a challenge ACK. The RFC 5961 defines +3 kind of challenge ACK, please refer `RFC 5961 section 3.2`_, +`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these +three scenarios, In some TCP status, the linux TCP stack would also +send challenge ACKs if the ACK number is before the first +unacknowledged number (more strict than `RFC 5961 section 5.2`_). + +.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7 +.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9 +.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11 + examples ======= @@ -1188,3 +1278,151 @@ Run nstat on server B:: We have deleted the default route on server B. Server B couldn't find a route for the 8.8.8.8 IP address, so server B increased IpOutNoRoutes. + +TcpExtTCPACKSkippedSynRecv +------------------------ +In this test, we send 3 same SYN packets from client to server. The +first SYN will let server create a socket, set it to Syn-Recv status, +and reply a SYN/ACK. The second SYN will let server reply the SYN/ACK +again, and record the reply time (the duplicate ACK reply time). The +third SYN will let server check the previous duplicate ACK reply time, +and decide to skip the duplicate ACK, then increase the +TcpExtTCPACKSkippedSynRecv counter. + +Run tcpdump to capture a SYN packet:: + + nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000 + tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes + +Open another terminal, run nc command:: + + nstatuser@nstat-a:~$ nc nstat-b 9000 + +As the nstat-b didn't listen on port 9000, it should reply a RST, and +the nc command exited immediately. It was enough for the tcpdump +command to capture a SYN packet. A linux server might use hardware +offload for the TCP checksum, so the checksum in the /tmp/syn.pcap +might be not correct. We call tcprewrite to fix it:: + + nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum + +On nstat-b, we run nc to listen on port 9000:: + + nstatuser@nstat-b:~$ nc -lkv 9000 + Listening on [0.0.0.0] (family 0, port 9000) + +On nstat-a, we blocked the packet from port 9000, or nstat-a would send +RST to nstat-b:: + + nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP + +Send 3 SYN repeatly to nstat-b:: + + nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done + +Check snmp cunter on nstat-b:: + + nstatuser@nstat-b:~$ nstat | grep -i skip + TcpExtTCPACKSkippedSynRecv 1 0.0 + +As we expected, TcpExtTCPACKSkippedSynRecv is 1. + +TcpExtTCPACKSkippedPAWS +---------------------- +To trigger PAWS, we could send an old SYN. + +On nstat-b, let nc listen on port 9000:: + + nstatuser@nstat-b:~$ nc -lkv 9000 + Listening on [0.0.0.0] (family 0, port 9000) + +On nstat-a, run tcpdump to capture a SYN:: + + nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000 + tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes + +On nstat-a, run nc as a client to connect nstat-b:: + + nstatuser@nstat-a:~$ nc -v nstat-b 9000 + Connection to nstat-b 9000 port [tcp/*] succeeded! + +Now the tcpdump has captured the SYN and exit. We should fix the +checksum:: + + nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum + +Send the SYN packet twice:: + + nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done + +On nstat-b, check the snmp counter:: + + nstatuser@nstat-b:~$ nstat | grep -i skip + TcpExtTCPACKSkippedPAWS 1 0.0 + +We sent two SYN via tcpreplay, both of them would let PAWS check +failed, the nstat-b replied an ACK for the first SYN, skipped the ACK +for the second SYN, and updated TcpExtTCPACKSkippedPAWS. + +TcpExtTCPACKSkippedSeq +-------------------- +To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid +timestamp (to pass PAWS check) but the sequence number is out of +window. The linux TCP stack would avoid to skip if the packet has +data, so we need a pure ACK packet. To generate such a packet, we +could create two sockets: one on port 9000, another on port 9001. Then +we capture an ACK on port 9001, change the source/destination port +numbers to match the port 9000 socket. Then we could trigger +TcpExtTCPACKSkippedSeq via this packet. + +On nstat-b, open two terminals, run two nc commands to listen on both +port 9000 and port 9001:: + + nstatuser@nstat-b:~$ nc -lkv 9000 + Listening on [0.0.0.0] (family 0, port 9000) + + nstatuser@nstat-b:~$ nc -lkv 9001 + Listening on [0.0.0.0] (family 0, port 9001) + +On nstat-a, run two nc clients:: + + nstatuser@nstat-a:~$ nc -v nstat-b 9000 + Connection to nstat-b 9000 port [tcp/*] succeeded! + + nstatuser@nstat-a:~$ nc -v nstat-b 9001 + Connection to nstat-b 9001 port [tcp/*] succeeded! + +On nstat-a, run tcpdump to capture an ACK:: + + nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001 + tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes + +On nstat-b, send a packet via the port 9001 socket. E.g. we sent a +string 'foo' in our example:: + + nstatuser@nstat-b:~$ nc -lkv 9001 + Listening on [0.0.0.0] (family 0, port 9001) + Connection from nstat-a 42132 received! + foo + +On nstat-a, the tcpdump should have caputred the ACK. We should check +the source port numbers of the two nc clients:: + + nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee + State Recv-Q Send-Q Local Address:Port Peer Address:Port + ESTAB 0 0 192.168.122.250:50208 192.168.122.251:9000 + ESTAB 0 0 192.168.122.250:42132 192.168.122.251:9001 + +Run tcprewrite, change port 9001 to port 9000, chagne port 42132 to +port 50208:: + + nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum + +Now the /tmp/seq.pcap is the packet we need. Send it to nstat-b:: + + nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done + +Check TcpExtTCPACKSkippedSeq on nstat-b:: + + nstatuser@nstat-b:~$ nstat | grep -i skip + TcpExtTCPACKSkippedSeq 1 0.0 |