diff options
author | Eric Dumazet <eric.dumazet@gmail.com> | 2012-04-24 22:12:06 -0400 |
---|---|---|
committer | Ben Hutchings <ben@decadent.org.uk> | 2012-05-11 13:14:17 +0100 |
commit | 0d63a9816f225228d72461444c6b1e5f58c6f90b (patch) | |
tree | e2df0b1b12e65581d9efb94dec6d09041af8e598 /fs/splice.c | |
parent | 410322fe638452e1e444093033937965bc68a0db (diff) | |
download | lwn-0d63a9816f225228d72461444c6b1e5f58c6f90b.tar.gz lwn-0d63a9816f225228d72461444c6b1e5f58c6f90b.zip |
tcp: allow splice() to build full TSO packets
[ This combines upstream commit
2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix
commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ]
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.
4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)
This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)
In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.
Call tcp_push() only if MSG_MORE is not set in the flags parameter.
This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.
In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Diffstat (limited to 'fs/splice.c')
-rw-r--r-- | fs/splice.c | 5 |
1 files changed, 4 insertions, 1 deletions
diff --git a/fs/splice.c b/fs/splice.c index fa2defa8afcf..6d0dfb89c75a 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -31,6 +31,7 @@ #include <linux/uio.h> #include <linux/security.h> #include <linux/gfp.h> +#include <linux/socket.h> /* * Attempt to steal a page from a pipe buffer. This should perhaps go into @@ -691,7 +692,9 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, if (!likely(file->f_op && file->f_op->sendpage)) return -EINVAL; - more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len; + more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; + if (sd->len < sd->total_len) + more |= MSG_SENDPAGE_NOTLAST; return file->f_op->sendpage(file, buf->page, buf->offset, sd->len, &pos, more); } |