lwn.git/net/ipv4/tcp_input.c, branch v3.9.9

tcp: bug fix in proportional rate reduction.

2013-06-27T17:39:00+00:00

[ Upstream commit 35f079ebbc860dcd1cca70890c9c8d59c1145525 ]

This patch fixes a bug that triggers newly_acked_sacked < 0 in tcp_ack().

The bug is triggered by sacked_out decreasing relative to prior_sacked while packets_out remains the same as prior_packets. This happens because the snapshot of prior_packets is taken after tcp_sacktag_write_queue(), while prior_sacked is captured before tcp_sacktag_write_queue(). The problem is that tcp_sacktag_write_queue() (tcp_match_skb_to_sack() -> tcp_fragment()) adjusts the pcount for packets_out and sacked_out (MSS change or other reason). As a result, this delta in pcount is reflected in (prior_sacked - sacked_out) but not in (prior_packets - packets_out).

This patch does the following:
1) Initializes prior_packets at the start of tcp_ack() so as to capture the delta in packets_out created by tcp_fragment().
2) Introduces a new "previous_packets_out" variable that snapshots packets_out right before tcp_clean_rtx_queue(), so pkts_acked can be correctly computed as before.
3) Computes pkts_acked using previous_packets_out, and computes newly_acked_sacked using prior_packets.

Signed-off-by: Nandita Dukkipati
Acked-by: Yuchung Cheng
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
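The snapshot ordering described in the proportional rate reduction fix can be illustrated with a small standalone model. This is a sketch, not the kernel code: `struct counters` and the three helper functions are hypothetical names, and the formula for newly_acked_sacked (the drop in packets_out - sacked_out across the ACK) is an assumption inferred from the commit message.

```c
/* Hypothetical, simplified counters; not the kernel's struct tcp_sock. */
struct counters {
	int packets_out;	/* segments sent and not yet cumulatively ACKed */
	int sacked_out;		/* segments currently marked as SACKed */
};

/*
 * Buggy ordering: prior_sacked is taken at the start of ACK processing,
 * but prior_packets is taken only after SACK tagging, so a pcount change
 * made while tagging shows up in one delta and not in the other.
 */
static int newly_acked_sacked_buggy(struct counters at_start,
				    struct counters after_sacktag,
				    struct counters after_clean)
{
	int prior_sacked = at_start.sacked_out;
	int prior_packets = after_sacktag.packets_out;

	return (prior_packets - prior_sacked) -
	       (after_clean.packets_out - after_clean.sacked_out);
}

/* Step 1 of the fix: both snapshots are taken at the start. */
static int newly_acked_sacked_fixed(struct counters at_start,
				    struct counters after_clean)
{
	int prior_sacked = at_start.sacked_out;
	int prior_packets = at_start.packets_out;

	return (prior_packets - prior_sacked) -
	       (after_clean.packets_out - after_clean.sacked_out);
}

/* Steps 2 and 3: pkts_acked uses a snapshot taken right before cleaning. */
static int pkts_acked(struct counters after_sacktag,
		      struct counters after_clean)
{
	int previous_packets_out = after_sacktag.packets_out;

	return previous_packets_out - after_clean.packets_out;
}
```

With illustrative numbers, suppose the connection starts at packets_out = 10 and sacked_out = 4, and refragmentation during SACK tagging moves both down by one (9 and 3) with nothing newly acknowledged. The buggy ordering yields (9 - 4) - (9 - 3) = -1, while the fixed ordering yields (10 - 4) - (9 - 3) = 0.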

tcp: call tcp_replace_ts_recent() from tcp_ack()

2013-04-19T18:21:53+00:00

commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()) introduced a TS ecr bug in slow path processing.

1 A > B P. 1:10001(10000) ack 1
2 B < A . 1:1(0) ack 1 win 257
3 A > B . 1:1001(1000) ack 1 win 227
4 A > B . 1001:2001(1000) ack 1 win 227

(ecr 200 should be ecr 300 in packets 3 & 4)

The problem is that tcp_ack() can trigger the sending of new packets (retransmits) that reflect the prior TSval instead of the TSval contained in the incoming packet currently being processed.

Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the checks, but before the actions.

Reported-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

tcp: undo spurious timeout after SACK reneging

2013-03-24T21:27:28+00:00

On SACK reneging, the sender immediately retransmits and forces a timeout but disables Eifel (undo). If the (buggy) receiver does not drop any packet, this can trigger a false slow-start retransmit storm driven by the ACKs of the original packets. This can be detected with undo and TCP timestamps.

Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

tcp: fix double-counted receiver RTT when leaving receiver fast path

2013-03-04T19:12:07+00:00

We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both before and after going to step5. That wastes CPU and double-counts the receiver-side RTT sample.

Signed-off-by: Neal Cardwell
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

net: Fix possible wrong checksum generation.

2013-02-13T18:30:10+00:00

Patch cef401de7be8c4e (net: fix possible wrong checksum generation) fixed the wrong checksum calculation, but it broke TSO by defining a new GSO type without a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature.

The following patch fixes both TSO and the wrong checksum. It uses the same logic that Eric Dumazet used: it introduces a new flag, SKBTX_SHARED_FRAG, which is set if at least one frag can be modified by the user. However, the SKBTX_SHARED_FRAG flag is kept in the skb shared info tx_flags rather than in gso_type. tx_flags is a better fit than gso_type, since an skb can have a shared frag without being a GSO packet. Because this does not link SHARED_FRAG to GSO, there is no need to define a netdev feature for it.

Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

tcp: send packets with a socket timestamp

2013-02-13T18:22:16+00:00

A socket timestamp is the sum of the global tcp_time_stamp and a per-socket offset.

The socket offset is added in places where the externally visible TCP timestamp option is parsed or initialized.

Connections in the SYN_RECV state are not supported; the global tcp_time_stamp is used for them, because repair mode doesn't support this state. In the future this can be implemented in a similar way as for TIME_WAIT sockets.

Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Cc: Eric Dumazet
Cc: Pavel Emelyanov
Signed-off-by: Andrey Vagin
Signed-off-by: David S. Miller
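The "global clock plus per-socket offset" arithmetic in the socket timestamp commit can be sketched as follows. The function and parameter names are hypothetical stand-ins (global_ts models tcp_time_stamp, ts_offset models the per-socket offset), and subtracting the offset from an echoed value on the parse side is an assumption about how the two directions would be kept consistent, not a quote of the patch.

```c
#include <stdint.h>

/*
 * Value placed in an outgoing TSval: the global clock plus the per-socket
 * offset. 32-bit unsigned arithmetic wraps modulo 2^32, as TCP timestamps do.
 */
static uint32_t socket_timestamp(uint32_t global_ts, uint32_t ts_offset)
{
	return global_ts + ts_offset;
}

/*
 * Assumed inverse for the parse side: map an echoed timestamp (TSecr) back
 * onto the global clock so it can be compared with global_ts directly.
 */
static uint32_t echoed_to_global(uint32_t tsecr, uint32_t ts_offset)
{
	return tsecr - ts_offset;
}
```

Because both helpers use wrapping unsigned arithmetic, echoed_to_global(socket_timestamp(t, off), off) returns t for any t and off, including values near the 32-bit wrap point.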

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2013-02-08T23:02:14+00:00

Synchronize with 'net' in order to sort out some l2tp, wireless, and ipv6 GRE fixes that will be built on top of in 'net-next'.

Signed-off-by: David S. Miller

tcp: fix for zero packets_in_flight was too broad

2013-02-06T20:53:03+00:00

There are transients during the normal FRTO procedure during which packets_in_flight can go to zero between write_queue state updates and firing the resulting segments out. As FRTO processing occurs during that window, the check must be more precise so that it does not match "spuriously" :-). More specifically, e.g., when packets_in_flight is zero but FLAG_DATA_ACKED is true, the problematic branch that set cwnd to zero would not be taken and new segments might be sent out later.

Signed-off-by: Ilpo Järvinen
Tested-by: Eric Dumazet
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller
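The difference between the broad and the narrowed check in the FRTO fix can be sketched as two predicates. This is one reading of the commit message, not the patched kernel condition: the function names are hypothetical, the numeric value of FLAG_DATA_ACKED here is illustrative only, and the real check may include further terms.

```c
#include <stdbool.h>

#define FLAG_DATA_ACKED 0x04	/* illustrative value for this sketch */

/* Broad check: any moment with zero packets in flight matches. */
static bool zero_inflight_check_broad(unsigned int packets_in_flight, int flag)
{
	(void)flag;	/* the broad check ignores the ACK flags */
	return packets_in_flight == 0;
}

/*
 * Narrowed check (assumed form): zero packets in flight matches only when
 * the ACK did not acknowledge new data, so the transient window described
 * in the commit message no longer triggers it.
 */
static bool zero_inflight_check_narrow(unsigned int packets_in_flight, int flag)
{
	return packets_in_flight == 0 && !(flag & FLAG_DATA_ACKED);
}
```

In the transient case from the message (packets_in_flight == 0 with FLAG_DATA_ACKED set), the broad predicate matches while the narrowed one does not.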

tcp: remove Appropriate Byte Count support

2013-02-05T19:51:16+00:00

TCP Appropriate Byte Count was added by me, but later disabled. There is no point in maintaining it, since it is a potential source of bugs and Linux already implements other, better window protection heuristics.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2013-02-05T19:12:20+00:00

Conflicts:
	drivers/net/ethernet/intel/e1000e/ethtool.c
	drivers/net/vmxnet3/vmxnet3_drv.c
	drivers/net/wireless/iwlwifi/dvm/tx.c
	net/ipv6/route.c

The ipv6 route.c conflict is simple, just ignore the 'net' side change as we fixed the same problem in 'net-next' by eliminating cached neighbours from ipv6 routes.

The e1000e conflict is an addition of a new statistic in the ethtool code, trivial.

The vmxnet3 conflict is about one change in 'net' removing a guarding conditional, whilst in 'net-next' we had a netdev_info() conversion.

The iwlwifi conflict is dealing with a WARN_ON() conversion in 'net-next' vs. a revert happening in 'net'.

Signed-off-by: David S. Miller