summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-08-03sctp: remove the typedef sctp_addip_param_tXin Long
This patch is to remove the typedef sctp_addip_param_t, and replace with struct sctp_addip_param in the places where it's using this typedef. It is to use sizeof(variable) instead of sizeof(type), and also fix some indent problems. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: remove the typedef sctp_cwrhdr_tXin Long
This patch is to remove the typedef sctp_cwrhdr_t, and replace with struct sctp_cwrhdr in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: remove the typedef sctp_ecne_chunk_tXin Long
This patch is to remove the typedef sctp_ecne_chunk_t, and replace with struct sctp_ecne_chunk in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: remove the typedef sctp_ecnehdr_tXin Long
This patch is to remove the typedef sctp_ecnehdr_t, and replace with struct sctp_ecnehdr in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: remove the typedef sctp_error_tXin Long
This patch is to remove the typedef sctp_error_t, and replace with enum sctp_error in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: remove the typedef sctp_operr_chunk_tXin Long
This patch is to remove the typedef sctp_operr_chunk_t, and replace with struct sctp_operr_chunk in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: remove the typedef sctp_errhdr_tXin Long
This patch is to remove the typedef sctp_errhdr_t, and replace with struct sctp_errhdr in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: fix the name of struct sctp_shutdown_chunk_tXin Long
This patch is to fix the name of struct sctp_shutdown_chunk_t , replace with struct sctp_initack_chunk in the places where it's using it. It is also to fix some indent problem. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03sctp: remove the typedef sctp_shutdownhdr_tXin Long
This patch is to remove the typedef sctp_shutdownhdr_t, and replace with struct sctp_shutdownhdr in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03tcp: remove extra POLL_OUT added for finished active connect()Neal Cardwell
Commit 45f119bf936b ("tcp: remove header prediction") introduced a minor bug: the sk_state_change() and sk_wake_async() notifications for a completed active connection happen twice: once in this new spot inside tcp_finish_connect() and once in the existing code in tcp_rcv_synsent_state_process() immediately after it calls tcp_finish_connect(). This commit remoes the duplicate POLL_OUT notifications. Fixes: 45f119bf936b ("tcp: remove header prediction") Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03rds: reduce memory footprint for RDS when transport is RDMASowmini Varadhan
RDS over IB does not use multipath RDS, so the array of additional rds_conn_path structures is always superfluous in this case. Reduce the memory footprint of the rds module by making this a dynamic allocation predicated on whether the transport is mp_capable. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Efrain Galaviz <efrain.galaviz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03ipv4: Introduce ipip_offload_init helper function.Tonghao Zhang
It's convenient to init ipip offload. We will check the return value, and print KERN_CRIT info on failure. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03bpf: fix the printing of ifindexWilliam Tu
Save the ifindex before it gets zeroed so the invalid ifindex can be printed out. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03Merge tag 'batadv-next-for-davem-20170802' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Remove unnecessary length qualifier, by Joe Perches - Remove too short %pM field width, by Sven Eckelmann - Remove return value handling from skb_put_data, by Sven Eckelmann - Spelling fixes, by Colin Ian King - Convert batman-adv.txt to reStructuredText, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03X25: constify null_x25_addressJulia Lawall
null_x25_address is only used to access the string it contains, so it can be const. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02ipv4: fib: Set offload indication according to nexthop flagsIdo Schimmel
We're going to have capable drivers indicate route offload using the nexthop flags, but for non-multipath routes these flags aren't dumped to user space. Instead, set the offload indication in the route message flags. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02net: dsa: Add support for 64-bit statisticsFlorian Fainelli
DSA slave network devices maintain a pair of bytes and packets counters for each directions, but these are not 64-bit capable. Re-use pcpu_sw_netstats which contains exactly what we need for that purpose and update the code path to report 64-bit capable statistics. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02flow_dissector: remove unused functionsWANG Cong
They are introduced by commit f70ea018da06 ("net: Add functions to get skb->hash based on flow structures") but never gets used in tree. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02tcp: tcp_data_queue() cleanupEric Dumazet
Commit c13ee2a4f03f ("tcp: reindent two spots after prequeue removal") removed code in tcp_data_queue(). We can go a little farther, removing an always true test, and removing initializers for fragstolen and eaten variables. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01net: dsa: rename switch EEE opsVivien Didelot
To avoid confusion with the PHY EEE settings, rename the .set_eee and .get_eee ops to respectively .set_mac_eee and .get_mac_eee. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01net: dsa: remove PHY device argument from .set_eeeVivien Didelot
The DSA switch operations for EEE are only meant to configure a port's MAC EEE settings. The port's PHY EEE settings are accessed by the DSA layer and must be made available via a proper PHY driver. In order to reduce this confusion, remove the phy_device argument from the .set_eee operation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01net: dsa: call phy_init_eee in DSA layerVivien Didelot
All DSA drivers are calling phy_init_eee if eee_enabled is true. Move up this statement in the DSA layer to simplify the DSA drivers. qca8k does not require to cache the ethtool_eee structures from now on. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01net: dsa: PHY device is mandatory for EEEVivien Didelot
The port's PHY and MAC are both implied in EEE. The current code does not call the PHY operations if the related device is NULL. Change that by returning -ENODEV if there's no PHY device attached to the interface. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01net: add skb_frag_foreach_page and use with kmap_atomicWillem de Bruijn
Skb frags may contain compound pages. Various operations map frags temporarily using kmap_atomic, but this function works on single pages, not whole compound pages. The distinction is only relevant for high mem pages that require temporary mappings. Introduce a looping mechanism that for compound highmem pages maps one page at a time, does not change behavior on other pages. Use the loop in the kmap_atomic callers in net/core/skbuff.c. Verified by triggering skb_copy_bits with tcpdump -n -c 100 -i ${DEV} -w /dev/null & netperf -t TCP_STREAM -H ${HOST} and by triggering __skb_checksum with ethtool -K ${DEV} tx off repeated the tests with looping on a non-highmem platform (x86_64) by making skb_frag_must_loop always return true. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01strparser: Generalize strparserTom Herbert
Generalize strparser from more than just being used in conjunction with read_sock. strparser will also be used in the send path with zero proxy. The primary change is to create strp_process function that performs the critical processing on skbs. The documentation is also updated to reflect the new uses. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01skbuff: Function to send an skbuf on a socketTom Herbert
Add skb_send_sock to send an skbuff on a socket within the kernel. Arguments include an offset so that an skbuf might be sent in mulitple calls (e.g. send buffer limit is hit). Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01proto_ops: Add locked held versions of sendmsg and sendpageTom Herbert
Add new proto_ops sendmsg_locked and sendpage_locked that can be called when the socket lock is already held. Correspondingly, add kernel_sendmsg_locked and kernel_sendpage_locked as front end functions. These functions will be used in zero proxy so that we can take the socket lock in a ULP sendmsg/sendpage and then directly call the backend transport proto_ops functions. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two minor conflicts in virtio_net driver (bug fix overlapping addition of a helper) and MAINTAINERS (new driver edit overlapping revamp of PHY entry). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01Revert "l2tp: constify inet6_protocol structures"Julia Lawall
This reverts commit d04916a48ad4a3db892b664fa9c3a2a693c378ad. inet6_add_protocol and inet6_del_protocol include casts that remove the effect of the const annotation on their parameter, leading to possible runtime crashes. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01Revert "ipv6: constify inet6_protocol structures"Julia Lawall
This reverts commit 3a3a4e3054137c5ff5d4d306ec834f6d25d7f95b. inet6_add_protocol and inet6_del_protocol include casts that remove the effect of the const annotation on their parameter, leading to possible runtime crashes. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle notifier registry failures properly in tun/tap driver, from Tonghao Zhang. 2) Fix bpf verifier handling of subtraction bounds and add a testcase for this, from Edward Cree. 3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt. 4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET, from Cong Wang. 5) Fix SElinux regression due to recent UDP optimizations, from Paolo Abeni. 6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code paths, fix from Stefano Brivio. 7) Fix some mem leaks in dccp, from Xin Long. 8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd Bergmann. 9) Mac address length check and buffer size fixes from Cong Wang. 10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni. 11) Fix return value when copy_from_user() fails in bpf_prog_get_info_by_fd(), from Daniel Borkmann. 12) Handle PHY_HALTED properly in phy library state machine, from Florian Fainelli. 13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel. 14) Fix truesize calculation in virtio_net which led to performance regressions, from Michael S Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) samples/bpf: fix bpf tunnel cleanup udp6: fix jumbogram reception ppp: Fix a scheduling-while-atomic bug in del_chan Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config" virtio_net: fix truesize for mergeable buffers mv643xx_eth: fix of_irq_to_resource() error check MAINTAINERS: Add more files to the PHY LIBRARY section ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev() net: phy: Correctly process PHY_HALTED in phy_stop_machine() sunhme: fix up GREG_STAT and GREG_IMASK register offsets bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len tcp: avoid bogus gcc-7 array-bounds warning net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt" bpf: don't indicate success when copy_from_user fails udp6: fix socket leak on early demux net: thunderx: Fix BGX transmit stall due to underflow Revert "vhost: cache used event for better performance" team: use a larger struct for mac address net: check dev->addr_len for dev_set_mac_address() phy: bcm-ns-usb3: fix MDIO_BUS dependency ...
2017-07-31udp6: fix jumbogram receptionPaolo Abeni
Since commit 67a51780aebb ("ipv6: udp: leverage scratch area helpers") udp6_recvmsg() read the skb len from the scratch area, to avoid a cache miss. But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their length exceeds the 16 bits available in the scratch area. As a side effect the length returned by recvmsg() is: <ingress datagram len> % (1<<16) This commit addresses the issue allocating one more bit in the IP6CB flags field and setting it for incoming jumbograms. Such field is still in the first cacheline, so at recvmsg() time we can check it and fallback to access skb->len if required, without a measurable overhead. Fixes: 67a51780aebb ("ipv6: udp: leverage scratch area helpers") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31ipv6: Avoid going through ->sk_net to access the netnsJakub Sitnicki
There is no need to go through sk->sk_net to access the net namespace and its sysctl variables because we allocate the sock and initialize sk_net just a few lines earlier in the same routine. Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()Ido Schimmel
Michał reported a NULL pointer deref during fib_sync_down_dev() when unregistering a netdevice. The problem is that we don't check for 'in_dev' being NULL, which can happen in very specific cases. Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or the inetaddr notification chains. However, if an interface isn't configured with any IP address, then it's possible for host routes to be flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in inetdev_destroy(). To reproduce: $ ip link add type dummy $ ip route add local 1.1.1.0/24 dev dummy0 $ ip link del dev dummy0 Fix this by checking for the presence of 'in_dev' before referencing it. Fixes: 982acb97560c ("ipv4: fib: Notify about nexthop status changes") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: add related fields into SCM_TIMESTAMPING_OPT_STATSWei Wang
Add the following stats into SCM_TIMESTAMPING_OPT_STATS control msg: TCP_NLA_PACING_RATE TCP_NLA_DELIVERY_RATE TCP_NLA_SND_CWND TCP_NLA_REORDERING TCP_NLA_MIN_RTT TCP_NLA_RECUR_RETRANS TCP_NLA_DELIVERY_RATE_APP_LMT Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: extract the function to compute delivery rateWei Wang
Refactor the code to extract the function to compute delivery rate. This function will be used in later commit. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: remove unused mib countersFlorian Westphal
was used by tcp prequeue and header prediction. TCPFORWARDRETRANS use was removed in january. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: remove CA_ACK_SLOWPATHFlorian Westphal
re-indent tcp_ack, and remove CA_ACK_SLOWPATH; it is always set now. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: remove header predictionFlorian Westphal
Like prequeue, I am not sure this is overly useful nowadays. If we receive a train of packets, GRO will aggregate them if the headers are the same (HP predates GRO by several years) so we don't get a per-packet benefit, only a per-aggregated-packet one. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: remove low_latency sysctlFlorian Westphal
Was only checked by the removed prequeue code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: reindent two spots after prequeue removalFlorian Westphal
These two branches are now always true, remove the conditional. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31tcp: remove prequeue supportFlorian Westphal
prequeue is a tcp receive optimization that moves part of rx processing from bh to process context. This only works if the socket being processed belongs to a process that is blocked in recv on that socket. In practice, this doesn't happen anymore that often because nowadays servers tend to use an event driven (epoll) model. Even normal client applications (web browsers) commonly use many tcp connections in parallel. This has measureable impact only in netperf (which uses plain recv and thus allows prequeue use) from host to locally running vm (~4%), however, there were no changes when using netperf between two physical hosts with ixgbe interfaces. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30net sched actions: add time filter for action dumpingJamal Hadi Salim
This patch adds support for filtering based on time since last used. When we are dumping a large number of actions it is useful to have the option of filtering based on when the action was last used to reduce the amount of data crossing to user space. With this patch the user space app sets the TCA_ROOT_TIME_DELTA attribute with the value in milliseconds with "time of interest since now". The kernel converts this to jiffies and does the filtering comparison matching entries that have seen activity since then and returns them to user space. Old kernels and old tc continue to work in legacy mode since they dont specify this attribute. Some example (we have 400 actions bound to 400 filters); at installation time. Using updated when tc setting the time of interest to 120 seconds earlier (we see 400 actions): prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l 400 go get some coffee and wait for > 120 seconds and try again: prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l 0 Lets see a filter bound to one of these actions: .... filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2 success 1) match 7f000002/ffffffff at 12 (success 1 ) action order 1: gact action pass random type none pass val 0 index 23 ref 2 bind 1 installed 1145 sec used 802 sec Action statistics: Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 .... that coffee took long, no? It was good. Now lets ping -c 1 127.0.0.2, then run the actions again: prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l 1 More details please: prompt$ hackedtc -s actions ls action gact since 120000 action order 0: gact action pass random type none pass val 0 index 23 ref 2 bind 1 installed 1270 sec used 30 sec Action statistics: Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 And the filter? filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 4 success 2) match 7f000002/ffffffff at 12 (success 2 ) action order 1: gact action pass random type none pass val 0 index 23 ref 2 bind 1 installed 1324 sec used 84 sec Action statistics: Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batchJamal Hadi Salim
When you dump hundreds of thousands of actions, getting only 32 per dump batch even when the socket buffer and memory allocations allow is inefficient. With this change, the user will get as many as possibly fitting within the given constraints available to the kernel. The top level action TLV space is extended. An attribute TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON is set by the user indicating the user is capable of processing these large dumps. Older user space which doesnt set this flag doesnt get the large (than 32) batches. The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many actions are put in a single batch. As such user space app knows how long to iterate (independent of the type of action being dumped) instead of hardcoded maximum of 32 thus maintaining backward compat. Some results dumping 1.5M actions below: first an unpatched tc which doesnt understand these features... prompt$ time -p tc actions ls action gact | grep index | wc -l 1500000 real 1388.43 user 2.07 sys 1386.79 Now lets see a patched tc which sets the correct flags when requesting a dump: prompt$ time -p updatedtc actions ls action gact | grep index | wc -l 1500000 real 178.13 user 2.02 sys 176.96 That is about 8x performance improvement for tc app which sets its receive buffer to about 32K. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30net sched actions: Use proper root attribute table for actionsJamal Hadi Salim
Bug fix for an issue which has been around for about a decade. We got away with it because the enumeration was larger than needed. Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API") Suggested-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29tcp: avoid bogus gcc-7 array-bounds warningArnd Bergmann
When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a false-positive warning: net/ipv4/tcp_output.c: In function 'tcp_connect': net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds] tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start; ^~ net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds] tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ I have opened a gcc bug for this, but distros have already shipped compilers with this problem, and it's not clear yet whether there is a way for gcc to avoid the warning. As the problem is related to the bitfield access, this introduces a temporary variable to store the old enum value. I did not notice this warning earlier, since UBSAN is disabled when building with COMPILE_TEST, and that was always turned on in both allmodconfig and randconfig tests. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29net: ethtool: add support for forward error correction modesVidya Sagar Ravipati
Forward Error Correction (FEC) modes i.e Base-R and Reed-Solomon modes are introduced in 25G/40G/100G standards for providing good BER at high speeds. Various networking devices which support 25G/40G/100G provides ability to manage supported FEC modes and the lack of FEC encoding control and reporting today is a source for interoperability issues for many vendors. FEC capability as well as specific FEC mode i.e. Base-R or RS modes can be requested or advertised through bits D44:47 of base link codeword. This patch set intends to provide option under ethtool to manage and report FEC encoding settings for networking devices as per IEEE 802.3 bj, bm and by specs. set-fec/show-fec option(s) are designed to provide control and report the FEC encoding on the link. SET FEC option: root@tor: ethtool --set-fec swp1 encoding [off | RS | BaseR | auto] Encoding: Types of encoding Off : Turning off any encoding RS : enforcing RS-FEC encoding on supported speeds BaseR : enforcing Base R encoding on supported speeds Auto : IEEE defaults for the speed/medium combination Here are a few examples of what we would expect if encoding=auto: - if autoneg is on, we are expecting FEC to be negotiated as on or off as long as protocol supports it - if the hardware is capable of detecting the FEC encoding on it's receiver it will reconfigure its encoder to match - in absence of the above, the configuration would be set to IEEE defaults. >From our understanding , this is essentially what most hardware/driver combinations are doing today in the absence of a way for users to control the behavior. SHOW FEC option: root@tor: ethtool --show-fec swp1 FEC parameters for swp1: Active FEC encodings: RS Configured FEC encodings: RS | BaseR ETHTOOL DEVNAME output modification: ethtool devname output: root@tor:~# ethtool swp1 Settings for swp1: root@hpe-7712-03:~# ethtool swp18 Settings for swp18: Supported ports: [ FIBRE ] Supported link modes: 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full Supported pause frame use: No Supports auto-negotiation: Yes Supported FEC modes: [RS | BaseR | None | Not reported] Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: [RS | BaseR | None | Not reported] <<<< One or more FEC modes Speed: 100000Mb/s Duplex: Full Port: FIBRE PHYAD: 106 Transceiver: internal Auto-negotiation: off Link detected: yes This patch includes following changes a) New ETHTOOL_SFECPARAM/SFECPARAM API, handled by the new get_fecparam/set_fecparam callbacks, provides support for configuration of forward error correction modes. b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC are defined so that users can configure these fec modes for supported and advertising fields as part of link autonegotiation. Signed-off-by: Vidya Sagar Ravipati <vidya.chowdary@gmail.com> Signed-off-by: Dustin Byford <dustin@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29udp6: fix socket leak on early demuxPaolo Abeni
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the sk reference is retrieved and used, but the relevant reference count is leaked and the socket destructor is never called. Beyond leaking the sk memory, if there are pending UDP packets in the receive queue, even the related accounted memory is leaked. In the long run, this will cause persistent forward allocation errors and no UDP skbs (both ipv4 and ipv6) will be able to reach the user-space. Fix this by explicitly accessing the early demux reference before the lookup, and properly decreasing the socket reference count after usage. Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and the now obsoleted comment about "socket cache". The newly added code is derived from the current ipv4 code for the similar path. v1 -> v2: fixed the __udp6_lib_rcv() return code for resubmission, as suggested by Eric Reported-by: Sam Edwards <CFSworks@gmail.com> Reported-by: Marc Haber <mh+netdev@zugschlus.de> Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29net: check dev->addr_len for dev_set_mac_address()WANG Cong
Historically, dev_ifsioc() uses struct sockaddr as mac address definition, this is why dev_set_mac_address() accepts a struct sockaddr pointer as input but now we have various types of mac addresse whose lengths are up to MAX_ADDR_LEN, longer than struct sockaddr, and saved in dev->addr_len. It is too late to fix dev_ifsioc() due to API compatibility, so just reject those larger than sizeof(struct sockaddr), otherwise we would read and use some random bytes from kernel stack. Fortunately, only a few IPv6 tunnel devices have addr_len larger than sizeof(struct sockaddr) and they don't support ndo_set_mac_addr(). But with team driver, in lb mode, they can still be enslaved to a team master and make its mac addr length as the same. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29net/smc: synchronize buffer usage with deviceUrsula Braun
Usage of send buffer "sndbuf" is synced (a) before filling sndbuf for cpu access (b) after filling sndbuf for device access Usage of receive buffer "RMB" is synced (a) before reading RMB content for cpu access (b) after reading RMB content for device access Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>