summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-09-28fib: introduce FIB notification infrastructureJiri Pirko
This allows to pass information about added/deleted FIB entries/rules to whoever is interested. This is done in a very similar way as devinet notifies address additions/removals. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28net/sched: cls_flower: Use a proper mask value for enc key id parameterHadar Hen Zion
The current code use the encapsulation key id value as the mask of that parameter which is wrong. Fix that by using a full mask. Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-09-25 Here are a few more Bluetooth & 802.15.4 patches for the 4.9 kernel that have popped up during the past week: - New USB ID for QCA_ROME Bluetooth device - NULL pointer dereference fix for Bluetooth mgmt sockets - Fixes for BCSP driver - Fix for updating LE scan response Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25Merge branch 'master' of ↵Pablo Neira Ayuso
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Conflicts: net/netfilter/core.c net/netfilter/nf_tables_netdev.c Resolve two conflicts before pull request for David's net-next tree: 1) Between c73c24849011 ("netfilter: nf_tables_netdev: remove redundant ip_hdr assignment") from the net tree and commit ddc8b6027ad0 ("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()"). 2) Between e8bffe0cf964 ("net: Add _nf_(un)register_hooks symbols") and Aaron Conole's patches to replace list_head with single linked list. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nf_log: get rid of XT_LOG_* macrosLiping Zhang
nf_log is used by both nftables and iptables, so use XT_LOG_XXX macros here is not appropriate. Replace them with NF_LOG_XXX. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_log: complete NFTA_LOG_FLAGS attr supportLiping Zhang
NFTA_LOG_FLAGS attribute is already supported, but the related NF_LOG_XXX flags are not exposed to the userspace. So we cannot explicitly enable log flags to log uid, tcp sequence, ip options and so on, i.e. such rule "nft add rule filter output log uid" is not supported yet. So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In order to keep consistent with other modules, change NF_LOG_MASK to refer to all supported log flags. On the other hand, add a new NF_LOG_DEFAULT_MASK to refer to the original default log flags. Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the userspace. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nf_tables: add range expressionPablo Neira Ayuso
Inverse ranges != [a,b] are not currently possible because rules are composites of && operations, and we need to express this: data < a || data > b This patch adds a new range expression. Positive ranges can be already through two cmp expressions: cmp(sreg, data, >=) cmp(sreg, data, <=) This new range expression provides an alternative way to express this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: xt_socket: fix transparent match for IPv6 request socketsKOVACS Krisztian
The introduction of TCP_NEW_SYN_RECV state, and the addition of request sockets to the ehash table seems to have broken the --transparent option of the socket match for IPv6 (around commit a9407000). Now that the socket lookup finds the TCP_NEW_SYN_RECV socket instead of the listener, the --transparent option tries to match on the no_srccheck flag of the request socket. Unfortunately, that flag was only set for IPv4 sockets in tcp_v4_init_req() by copying the transparent flag of the listener socket. This effectively causes '-m socket --transparent' not match on the ACK packet sent by the client in a TCP handshake. Based on the suggestion from Eric Dumazet, this change moves the code initializing no_srccheck to tcp_conn_request(), rendering the above scenario working again. Fixes: a940700003 ("netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support") Signed-off-by: Alex Badics <alex.badics@balabit.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: evict stale entries when user reads /proc/net/nf_conntrackFlorian Westphal
Fabian reports a possible conntrack memory leak (could not reproduce so far), however, one minor issue can be easily resolved: > cat /proc/net/nf_conntrack | wc -l = 5 > 4 minutes required to clean up the table. We should not report those timed-out entries to the user in first place. And instead of just skipping those timed-out entries while iterating over the table we can also zap them (we already do this during ctnetlink walks, but I forgot about the /proc interface). Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer") Reported-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: xt_hashlimit: Create revision 2 to support higher pps ratesVishwanath Pai
Create a new revision for the hashlimit iptables extension module. Rev 2 will support higher pps of upto 1 million, Version 1 supports only 10k. To support this we have to increase the size of the variables avg and burst in hashlimit_cfg to 64-bit. Create two new structs hashlimit_cfg2 and xt_hashlimit_mtinfo2 and also create newer versions of all the functions for match, checkentry and destroy. Some of the functions like hashlimit_mt, hashlimit_mt_check etc are very similar in both rev1 and rev2 with only minor changes, so I have split those functions and moved all the common code to a *_common function. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: xt_hashlimit: Prepare for revision 2Vishwanath Pai
I am planning to add a revision 2 for the hashlimit xtables module to support higher packets per second rates. This patch renames all the functions and variables related to revision 1 by adding _v1 at the end of the names. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_ct: report error if mark and dir specified simultaneouslyLiping Zhang
NFT_CT_MARK is unrelated to direction, so if NFTA_CT_DIRECTION attr is specified, report EINVAL to the userspace. This validation check was already done at nft_ct_get_init, but we missed it in nft_ct_set_init. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_ct: unnecessary to require dir when use ct l3proto/protocolLiping Zhang
Currently, if the user want to match ct l3proto, we must specify the direction, for example: # nft add rule filter input ct original l3proto ipv4 ^^^^^^^^ Otherwise, error message will be reported: # nft add rule filter input ct l3proto ipv4 nft add rule filter input ct l3proto ipv4 <cmdline>:1:1-38: Error: Could not process rule: Invalid argument add rule filter input ct l3proto ipv4 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Actually, there's no need to require NFTA_CT_DIRECTION attr, because ct l3proto and protocol are unrelated to direction. And for compatibility, even if the user specify the NFTA_CT_DIRECTION attr, do not report error, just skip it. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ackGao Feng
It is valid that the TCP RST packet which does not set ack flag, and bytes of ack number are zero. But current seqadj codes would adjust the "0" ack to invalid ack number. Actually seqadj need to check the ack flag before adjust it for these RST packets. The following is my test case client is 10.26.98.245, and add one iptable rule: iptables -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2: --connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with tcp-reset This iptables rule could generate on TCP RST without ack flag. server:10.172.135.55 Enable the synproxy with seqadjust by the following iptables rules iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m tcp --syn -j CT --notrack iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack --ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460 iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack --ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT The following is my test result. 1. packet trace on client root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829, win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7], length 0 IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266, ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884, nop,wscale 7], length 0 IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229, options [nop,nop,TS val 452367885 ecr 15643479], length 0 IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226, options [nop,nop,TS val 15643479 ecr 452367885], length 0 IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830, win 0, length 0 2. seqadj log on server [62873.867319] Adjusting sequence number from 602341895->546723267, ack from 3695959830->3695959830 [62873.867644] Adjusting sequence number from 602341895->546723267, ack from 3695959830->3695959830 [62873.869040] Adjusting sequence number from 3695959830->3695959830, ack from 0->55618628 To summarize, it is clear that the seqadj codes adjust the 0 ack when receive one TCP RST packet without ack. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: replace list_head with single linked listAaron Conole
The netfilter hook list never uses the prev pointer, and so can be trimmed to be a simple singly-linked list. In addition to having a more light weight structure for hook traversal, struct net becomes 5568 bytes (down from 6400) and struct net_device becomes 2176 bytes (down from 2240). Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25Merge tag 'rxrpc-rewrite-20160924' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Implement slow-start and other bits This set of patches implements the RxRPC slow-start feature for AF_RXRPC to improve performance and handling of occasional packet loss. This is more or less the same as TCP slow start [RFC 5681]. Firstly, there are some ACK generation improvements: (1) Send ACKs regularly to apprise the peer of our state so that they can do congestion management of their own. (2) Send an ACK when we fill in a hole in the buffer so that the peer can find out that we did this thus forestalling retransmission. (3) Note the final DATA packet's serial number in the final ACK for correlation purposes. and a couple of bug fixes: (4) Reinitialise the ACK state and clear the ACK and resend timers upon entering the client reply reception phase to kill off any pending probe ACKs. (5) Delay the resend timer to allow for nsec->jiffies conversion errors. and then there's the slow-start pieces: (6) Summarise an ACK. (7) Schedule a PING or IDLE ACK if the reply to a client call is overdue to try and find out what happened to it. (8) Implement the slow start feature. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24gre: use nla_get_be32() to extract flowinfoLance Richardson
Eliminate a sparse endianness mismatch warning, use nla_get_be32() to extract a __be32 value instead of nla_get_u32(). Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24rxrpc: Implement slow-startDavid Howells
Implement RxRPC slow-start, which is similar to RFC 5681 for TCP. A tracepoint is added to log the state of the congestion management algorithm and the decisions it makes. Notes: (1) Since we send fixed-size DATA packets (apart from the final packet in each phase), counters and calculations are in terms of packets rather than bytes. (2) The ACK packet carries the equivalent of TCP SACK. (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly suited to SACK of a small number of packets. It seems that, almost inevitably, by the time three 'duplicate' ACKs have been seen, we have narrowed the loss down to one or two missing packets, and the FLIGHT_SIZE calculation ends up as 2. (4) In rxrpc_resend(), if there was no data that apparently needed retransmission, we transmit a PING ACK to ask the peer to tell us what its Rx window state is. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Schedule an ACK if the reply to a client call appears overdueDavid Howells
If we've sent all the request data in a client call but haven't seen any sign of the reply data yet, schedule an ACK to be sent to the server to find out if the reply data got lost. If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to demand a response to find out whether we need to retransmit. If the server says it has received all of the data, we send an IDLE ACK to tell the server that we haven't received anything in the receive phase as yet. To make this work, a non-immediate PING ACK must carry a delay. I've chosen the same as the IDLE ACK for the moment. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Generate a summary of the ACK state for later useDavid Howells
Generate a summary of the Tx buffer packet state when an ACK is received for use in a later patch that does congestion management. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Delay the resend timer to allow for nsec->jiffies conv errorDavid Howells
When determining the resend timer value, we have a value in nsec but the timer is in jiffies which may be a million or more times more coarse. nsecs_to_jiffies() rounds down - which means that the resend timeout expressed as jiffies is very likely earlier than the one expressed as nanoseconds from which it was derived. The problem is that rxrpc_resend() gets triggered by the timer, but can't then find anything to resend yet. It sets the timer again - but gets kicked off immediately again and again until the nanosecond-based expiry time is reached and we actually retransmit. Fix this by adding 1 to the jiffies-based resend_at value to counteract the rounding and make sure that the timer happens after the nanosecond-based expiry is passed. Alternatives would be to adjust the timestamp on the packets to align with the jiffie scale or to switch back to using jiffie-timestamps. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Reinitialise the call ACK and timer state for client reply phaseDavid Howells
Clear the ACK reason, ACK timer and resend timer when entering the client reply phase when the first DATA packet is received. New ACKs will be proposed once the data is queued. The resend timer is no longer relevant and we need to cancel ACKs scheduled to probe for a lost reply. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Include the last reply DATA serial number in the final ACKDavid Howells
In a client call, include the serial number of the last DATA packet of the reply in the final ACK. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Send an immediate ACK if we fill in a holeDavid Howells
Send an immediate ACK if we fill in a hole in the buffer left by an out-of-sequence packet. This may allow the congestion management in the peer to avoid a retransmission if packets got reordered on the wire. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24netfilter: Only allow sane values in nf_register_net_hookAaron Conole
This commit adds an upfront check for sane values to be passed when registering a netfilter hook. This will be used in a future patch for a simplified hook list traversal. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24netfilter: Remove explicit rcu_read_lock in nf_hook_slowAaron Conole
All of the callers of nf_hook_slow already hold the rcu_read_lock, so this cleanup removes the recursive call. This is just a cleanup, as the locking code gracefully handles this situation. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24netfilter: call nf_hook_ingress with rcu_read_lockAaron Conole
This commit ensures that the rcu read-side lock is held while the ingress hook is called. This ensures that a call to nf_hook_slow (and ultimately nf_ingress) will be read protected. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24netfilter: bridge: add and use br_nf_hook_threshFlorian Westphal
This replaces the last uses of NF_HOOK_THRESH(). Followup patch will remove it and rename nf_hook_thresh. The reason is that inet (non-bridge) netfilter no longer invokes the hooks from hooks, so we do no longer need the thresh value to skip hooks with a lower priority. The bridge netfilter however may need to do this. br_nf_hook_thresh is a wrapper that is supposed to do this, i.e. only call hooks with a priority that exceeds NF_BR_PRI_BRNF. It's used only in the recursion cases of br_netfilter. It invokes nf_hook_slow while holding an rcu read-side critical section to make a future cleanup simpler. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24netfilter: xt_TCPMSS: Refactor the codes to decrease one condition check and ↵Gao Feng
more readable The origin codes perform two condition checks with dst_mtu(skb_dst(skb)) and in_mtu. And the last statement is "min(dst_mtu(skb_dst(skb)), in_mtu) - minlen". It may let reader think about how about the result. Would it be negative. Now assign the result of min(dst_mtu(skb_dst(skb)), in_mtu) to a new variable, then only perform one condition check, and it is more readable. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24rxrpc: Send an ACK after every few DATA packets we receiveDavid Howells
Send an ACK if we haven't sent one for the last two packets we've received. This keeps the other end apprised of where we've got to - which is important if they're doing slow-start. We do this in recvmsg so that we can dispatch a packet directly without the need to wake up the background thread. This should possibly be made configurable in future. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24Merge tag 'rxrpc-rewrite-20160923' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Bug fixes and tracepoints Here are a bunch of bug fixes: (1) Need to set the timestamp on a Tx packet before queueing it to avoid trouble with the retransmission function. (2) Don't send an ACK at the end of the service reply transmission; it's the responsibility of the client to send an ACK to close the call. The service can resend the last DATA packet or send a PING ACK. (3) Wake sendmsg() on abnormal call termination. (4) Use ktime_add_ms() not ktime_add_ns() to add millisecond offsets. (5) Use before_eq() & co. to compare serial numbers (which may wrap). (6) Start the resend timer on DATA packet transmission. (7) Don't accidentally cancel a retransmission upon receiving a NACK. (8) Fix the call timer setting function to deal with timeouts that are now or past. (9) Don't use a flag to communicate the presence of the last packet in the Tx buffer from sendmsg to the input routines where ACK and DATA reception is handled. The problem is that there's a window between queueing the last packet for transmission and setting the flag in which ACKs or reply DATA packets can arrive, causing apparent state machine violation issues. Instead use the annotation buffer to mark the last packet and pick up and set the flag in the input routines. (10) Don't call the tx_ack tracepoint and don't allocate a serial number if someone else nicked the ACK we were about to transmit. There are also new tracepoints and one altered tracepoint used to track down the above bugs: (11) Call timer tracepoint. (12) Data Tx tracepoint (and adjustments to ACK tracepoint). (13) Injected Rx packet loss tracepoint. (14) Ack proposal tracepoint. (15) Retransmission selection tracepoint. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2016-09-23 Only two patches this time: 1) Fix a comment reference to struct xfrm_replay_state_esn. From Richard Guy Briggs. 2) Convert xfrm_state_lookup to rcu, we don't need the xfrm_state_lock anymore in the input path. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24net: Update API for VF vlan protocol 802.1ad supportMoshe Shemesh
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving the ability for user-space application to specify it for the VF, as an option to support 802.1ad. We adjusted IP Link tool to support this option. For future use cases, the new UAPI supports multiple vlans. For now we limit the list size to a single vlan in kernel. Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward compatibility with older versions of IP Link tool. Add a vlan protocol parameter to the ndo_set_vf_vlan callback. We kept 802.1Q as the drivers' default vlan protocol. Suitable ip link tool command examples: Set vf vlan protocol 802.1ad: ip link set eth0 vf 1 vlan 100 proto 802.1ad Set vf to VST (802.1Q) mode: ip link set eth0 vf 1 vlan 100 proto 802.1Q Or by omitting the new parameter ip link set eth0 vf 1 vlan 100 Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23rxrpc: Add a tracepoint to log which packets will be retransmittedDavid Howells
Add a tracepoint to log in rxrpc_resend() which packets will be retransmitted. Note that if a positive ACK comes in whilst we have dropped the lock to retransmit another packet, the actual retransmission may not happen, though some of the effects will (such as altering the congestion management). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add tracepoint for ACK proposalDavid Howells
Add a tracepoint to log proposed ACKs, including whether the proposal is used to update a pending ACK or is discarded in favour of an easlier, higher priority ACK. Whilst we're at it, get rid of the rxrpc_acks() function and access the name array directly. We do, however, need to validate the ACK reason number given to trace_rxrpc_rx_ack() to make sure we don't overrun the array. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint to log injected Rx packet lossDavid Howells
Add a tracepoint to log received packets that get discarded due to Rx packet loss. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add data Tx tracepoint and adjust Tx ACK tracepointDavid Howells
Add a tracepoint to log transmission of DATA packets (including loss injection). Adjust the ACK transmission tracepoint to include the packet serial number and to line this up with the DATA transmission display. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint for the call timerDavid Howells
Add a tracepoint to log call timer initiation, setting and expiry. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Don't call the tx_ack tracepoint if don't generate an ACKDavid Howells
rxrpc_send_call_packet() is invoking the tx_ack tracepoint before it checks whether there's an ACK to transmit (another thread may jump in and transmit it). Fix this by only invoking the tracepoint if we get a valid ACK to transmit. Further, only allocate a serial number if we're going to actually transmit something. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Pass the last Tx packet marker in the annotation bufferDavid Howells
When the last packet of data to be transmitted on a call is queued, tx_top is set and then the RXRPC_CALL_TX_LAST flag is set. Unfortunately, this leaves a race in the ACK processing side of things because the flag affects the interpretation of tx_top and also allows us to start receiving reply data before we've finished transmitting. To fix this, make the following changes: (1) rxrpc_queue_packet() now sets a marker in the annotation buffer instead of setting the RXRPC_CALL_TX_LAST flag. (2) rxrpc_rotate_tx_window() detects the marker and sets the flag in the same context as the routines that use it. (3) rxrpc_end_tx_phase() is simplified to just shift the call state. The Tx window must have been rotated before calling to discard the last packet. (4) rxrpc_receiving_reply() is added to handle the arrival of the first DATA packet of a reply to a client call (which is an implicit ACK of the Tx phase). (5) The last part of rxrpc_input_ack() is reordered to perform Tx rotation, then soft-ACK application and then to end the phase if we've rotated the last packet. In the event of a terminal ACK, the soft-ACK application will be skipped as nAcks should be 0. (6) rxrpc_input_ackall() now has to rotate as well as ending the phase. In addition: (7) Alter the transmit tracepoint to log the rotation of the last packet. (8) Remove the no-longer relevant queue_reqack tracepoint note. The ACK-REQUESTED packet header flag is now set as needed when we actually transmit the packet and may vary by retransmission. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Fix call timerDavid Howells
Fix the call timer in the following ways: (1) If call->resend_at or call->ack_at are before or equal to the current time, then ignore that timeout. (2) If call->expire_at is before or equal to the current time, then don't set the timer at all (possibly we should queue the call). (3) Don't skip modifying the timer if timer_pending() is true. This indicates that the timer is working, not that it has expired and is running/waiting to run its expiry handler. Also call rxrpc_set_timer() to start the call timer going rather than calling add_timer(). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Fix accidental cancellation of scheduled resend by ACK parserDavid Howells
When rxrpc_input_soft_acks() is parsing the soft-ACKs from an ACK packet, it updates the Tx packet annotations in the annotation buffer. If a soft-ACK is an ACK, then we overwrite unack'd, nak'd or to-be-retransmitted states and that is fine; but if the soft-ACK is an NACK, we overwrite the to-be-retransmitted with a nak - which isn't. Instead, we need to let any scheduled retransmission stand if the packet was NAK'd. Note that we don't reissue a resend if the annotation is in the to-be-retransmitted state because someone else must've scheduled the resend already. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Need to start the resend timer on initial transmissionDavid Howells
When a DATA packet has its initial transmission, we may need to start or adjust the resend timer. Without this we end up relying on being sent a NACK to initiate the resend. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Use before_eq() and friends to compare serial numbersDavid Howells
before_eq() and friends should be used to compare serial numbers (when not checking for (non)equality) rather than casting to int, subtracting and checking the result. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23bpf: add helper to invalidate hashDaniel Borkmann
Add a small helper that complements 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs") for invalidating the current skb->hash after mangling on headers via direct packet write. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23bpf: use bpf_get_smp_processor_id_proto instead of raw oneDaniel Borkmann
Same motivation as in commit 80b48c445797 ("bpf: don't use raw processor id in generic helper"), but this time for XDP typed programs. Thus, allow for preemption checks when we have DEBUG_PREEMPT enabled, and otherwise use the raw variant. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23bpf: use skb_to_full_sk helper in bpf_skb_under_cgroupDaniel Borkmann
We need to use skb_to_full_sk() helper introduced in commit bd5eb35f16a9 ("xfrm: take care of request sockets") as otherwise we miss tcp synack messages, since ownership is on request socket and therefore it would miss the sk_fullsock() check. Use skb_to_full_sk() as also done similarly in the bpf_get_cgroup_classid() helper via 2309236c13fe ("cls_cgroup: get sk_classid only from full sockets") fix to not let this fall through. Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net: dsa: add port fast ageingVivien Didelot
Today the DSA drivers are in charge of flushing the MAC addresses associated to a port when its STP state changes from Learning or Forwarding, to Disabled or Blocking or Listening. This makes the drivers more complex and hides the generic switch logic. Introduce a new optional port_fast_age operation to dsa_switch_ops, to move this logic to the DSA layer and keep drivers simple. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net: dsa: add port STP state helperVivien Didelot
Add a void helper to set the STP state of a port, checking first if the required routine is provided by the driver. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23rxrpc: Should be using ktime_add_ms() not ktime_add_ns()David Howells
ktime_add_ms() should be used to add the resend time (in ms) rather than ktime_add_ns(). Signed-off-by: David Howells <dhowells@redhat.com>