From d6e640f9766e2fb9aa3853b4ff19e4d7d5d7e373 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Tue, 8 May 2012 22:20:33 +0200 Subject: can: update documentation wording error frames -> error messages As Heinz-Juergen Oertel pointed out 'CAN error frames' are a already defined term for the CAN protocol violation indication on the wire. To avoid confusion with the error messages created by CAN drivers available via CAN RAW sockets update the documentation and change the naming from 'error frames' to 'error messages' or 'error message frames'. Signed-off-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde --- Documentation/networking/can.txt | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt index 56ca3b75376e..28d9b14c34ec 100644 --- a/Documentation/networking/can.txt +++ b/Documentation/networking/can.txt @@ -232,16 +232,16 @@ solution for a couple of reasons: arbitration problems and error frames caused by the different ECUs. The occurrence of detected errors are important for diagnosis and have to be logged together with the exact timestamp. For this - reason the CAN interface driver can generate so called Error Frames - that can optionally be passed to the user application in the same - way as other CAN frames. Whenever an error on the physical layer + reason the CAN interface driver can generate so called Error Message + Frames that can optionally be passed to the user application in the + same way as other CAN frames. Whenever an error on the physical layer or the MAC layer is detected (e.g. by the CAN controller) the driver - creates an appropriate error frame. Error frames can be requested by - the user application using the common CAN filter mechanisms. Inside - this filter definition the (interested) type of errors may be - selected. The reception of error frames is disabled by default. - The format of the CAN error frame is briefly described in the Linux - header file "include/linux/can/error.h". + creates an appropriate error message frame. Error messages frames can + be requested by the user application using the common CAN filter + mechanisms. Inside this filter definition the (interested) type of + errors may be selected. The reception of error messages is disabled + by default. The format of the CAN error message frame is briefly + described in the Linux header file "include/linux/can/error.h". 4. How to use Socket CAN ------------------------ @@ -383,7 +383,7 @@ solution for a couple of reasons: defaults are set at RAW socket binding time: - The filters are set to exactly one filter receiving everything - - The socket only receives valid data frames (=> no error frames) + - The socket only receives valid data frames (=> no error message frames) - The loopback of sent CAN frames is enabled (see chapter 3.2) - The socket does not receive its own sent frames (in loopback mode) @@ -434,7 +434,7 @@ solution for a couple of reasons: 4.1.2 RAW socket option CAN_RAW_ERR_FILTER As described in chapter 3.4 the CAN interface driver can generate so - called Error Frames that can optionally be passed to the user + called Error Message Frames that can optionally be passed to the user application in the same way as other CAN frames. The possible errors are divided into different error classes that may be filtered using the appropriate error mask. To register for every possible @@ -527,7 +527,7 @@ solution for a couple of reasons: rcvlist_all - list for unfiltered entries (no filter operations) rcvlist_eff - list for single extended frame (EFF) entries - rcvlist_err - list for error frames masks + rcvlist_err - list for error message frames masks rcvlist_fil - list for mask/value filters rcvlist_inv - list for mask/value filters (inverse semantic) rcvlist_sff - list for single standard frame (SFF) entries @@ -784,13 +784,13 @@ solution for a couple of reasons: $ ip link set canX type can restart-ms 100 Alternatively, the application may realize the "bus-off" condition - by monitoring CAN error frames and do a restart when appropriate with - the command: + by monitoring CAN error message frames and do a restart when + appropriate with the command: $ ip link set canX type can restart - Note that a restart will also create a CAN error frame (see also - chapter 3.4). + Note that a restart will also create a CAN error message frame (see + also chapter 3.4). 6.6 Supported CAN hardware -- cgit v1.2.3 From 35b2a113cb0298d4f9a1263338b456094a414057 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 16 May 2012 23:40:18 +0200 Subject: wireless: remove wext sysfs The only user of this was hal prior to its 0.5.12 release which happened over two years ago, so I'm sure this can be removed without issues. Signed-off-by: Johannes Berg Signed-off-by: John W. Linville --- Documentation/feature-removal-schedule.txt | 9 ---- net/core/net-sysfs.c | 74 ------------------------------ net/wireless/Kconfig | 13 ------ 3 files changed, 96 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 56000b33340b..0258e3d06280 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -249,15 +249,6 @@ Who: Ravikiran Thirumalai --------------------------- -What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS - (in net/core/net-sysfs.c) -When: 3.5 -Why: Over 1K .text/.data size reduction, data is available in other - ways (ioctls) -Who: Johannes Berg - ---------------------------- - What: sysfs ui for changing p4-clockmod parameters When: September 2009 Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index fdf9e61d0651..72607174ea5a 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -417,72 +417,6 @@ static struct attribute_group netstat_group = { .name = "statistics", .attrs = netstat_attrs, }; - -#ifdef CONFIG_WIRELESS_EXT_SYSFS -/* helper function that does all the locking etc for wireless stats */ -static ssize_t wireless_show(struct device *d, char *buf, - ssize_t (*format)(const struct iw_statistics *, - char *)) -{ - struct net_device *dev = to_net_dev(d); - const struct iw_statistics *iw; - ssize_t ret = -EINVAL; - - if (!rtnl_trylock()) - return restart_syscall(); - if (dev_isalive(dev)) { - iw = get_wireless_stats(dev); - if (iw) - ret = (*format)(iw, buf); - } - rtnl_unlock(); - - return ret; -} - -/* show function template for wireless fields */ -#define WIRELESS_SHOW(name, field, format_string) \ -static ssize_t format_iw_##name(const struct iw_statistics *iw, char *buf) \ -{ \ - return sprintf(buf, format_string, iw->field); \ -} \ -static ssize_t show_iw_##name(struct device *d, \ - struct device_attribute *attr, char *buf) \ -{ \ - return wireless_show(d, buf, format_iw_##name); \ -} \ -static DEVICE_ATTR(name, S_IRUGO, show_iw_##name, NULL) - -WIRELESS_SHOW(status, status, fmt_hex); -WIRELESS_SHOW(link, qual.qual, fmt_dec); -WIRELESS_SHOW(level, qual.level, fmt_dec); -WIRELESS_SHOW(noise, qual.noise, fmt_dec); -WIRELESS_SHOW(nwid, discard.nwid, fmt_dec); -WIRELESS_SHOW(crypt, discard.code, fmt_dec); -WIRELESS_SHOW(fragment, discard.fragment, fmt_dec); -WIRELESS_SHOW(misc, discard.misc, fmt_dec); -WIRELESS_SHOW(retries, discard.retries, fmt_dec); -WIRELESS_SHOW(beacon, miss.beacon, fmt_dec); - -static struct attribute *wireless_attrs[] = { - &dev_attr_status.attr, - &dev_attr_link.attr, - &dev_attr_level.attr, - &dev_attr_noise.attr, - &dev_attr_nwid.attr, - &dev_attr_crypt.attr, - &dev_attr_fragment.attr, - &dev_attr_retries.attr, - &dev_attr_misc.attr, - &dev_attr_beacon.attr, - NULL -}; - -static struct attribute_group wireless_group = { - .name = "wireless", - .attrs = wireless_attrs, -}; -#endif #endif /* CONFIG_SYSFS */ #ifdef CONFIG_RPS @@ -1463,14 +1397,6 @@ int netdev_register_kobject(struct net_device *net) groups++; *groups++ = &netstat_group; -#ifdef CONFIG_WIRELESS_EXT_SYSFS - if (net->ieee80211_ptr) - *groups++ = &wireless_group; -#ifdef CONFIG_WIRELESS_EXT - else if (net->wireless_handlers) - *groups++ = &wireless_group; -#endif -#endif #endif /* CONFIG_SYSFS */ error = device_add(dev); diff --git a/net/wireless/Kconfig b/net/wireless/Kconfig index 2e4444fedbe0..8dba3c60794a 100644 --- a/net/wireless/Kconfig +++ b/net/wireless/Kconfig @@ -119,19 +119,6 @@ config CFG80211_WEXT Enable this option if you need old userspace for wireless extensions with cfg80211-based drivers. -config WIRELESS_EXT_SYSFS - bool "Wireless extensions sysfs files" - depends on WEXT_CORE && SYSFS - help - This option enables the deprecated wireless statistics - files in /sys/class/net/*/wireless/. The same information - is available via the ioctls as well. - - Say N. If you know you have ancient tools requiring it, - like very old versions of hal (prior to 0.5.12 release), - say Y and update the tools as soon as possible as this - option will be removed soon. - config LIB80211 tristate "Common routines for IEEE802.11 drivers" default n -- cgit v1.2.3 From 10bab00afed042c1a38ed5ffb135e2aea5ce1277 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 16 May 2012 23:40:19 +0200 Subject: cfg80211: deprecate CFG80211_WEXT Almost all wireless tools have transitioned to or at least added compatibility with nl80211 so there's no real need for CONFIG_CFG80211_WEXT any more. Mark it for removal, and also change the default to not be enabled. Signed-off-by: Johannes Berg Signed-off-by: John W. Linville --- Documentation/feature-removal-schedule.txt | 13 +++++++++++++ net/wireless/Kconfig | 1 - 2 files changed, 13 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 0258e3d06280..dec901554ef7 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -440,6 +440,19 @@ Who: Hans Verkuil ---------------------------- +What: CONFIG_CFG80211_WEXT +When: as soon as distributions ship new wireless tools, ie. wpa_supplicant 1.0 + and NetworkManager/connman/etc. that are able to use nl80211 +Why: Wireless extensions are deprecated, and userland tools are moving to + using nl80211. New drivers are no longer using wireless extensions, + and while there might still be old drivers, both new drivers and new + userland no longer needs them and they can't be used for an feature + developed in the past couple of years. As such, compatibility with + wireless extensions in new drivers will be removed. +Who: Johannes Berg + +---------------------------- + What: g_file_storage driver When: 3.8 Why: This driver has been superseded by g_mass_storage. diff --git a/net/wireless/Kconfig b/net/wireless/Kconfig index 8dba3c60794a..4d2b1ec6516f 100644 --- a/net/wireless/Kconfig +++ b/net/wireless/Kconfig @@ -114,7 +114,6 @@ config CFG80211_WEXT bool "cfg80211 wireless extensions compatibility" depends on CFG80211 select WEXT_CORE - default y help Enable this option if you need old userspace for wireless extensions with cfg80211-based drivers. -- cgit v1.2.3 From 7a74c1a18d1f03462f9f766ee8213535ce131b0f Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Sat, 19 May 2012 04:39:00 +0000 Subject: netfilter: remove include/linux/netfilter_ipv4/ipt_addrtype.h It was scheduled to be removed. Acked-by: Florian Westphal Signed-off-by: Cong Wang Signed-off-by: Pablo Neira Ayuso --- Documentation/feature-removal-schedule.txt | 8 -------- include/linux/netfilter_ipv4/Kbuild | 1 - include/linux/netfilter_ipv4/ipt_addrtype.h | 27 --------------------------- 3 files changed, 36 deletions(-) delete mode 100644 include/linux/netfilter_ipv4/ipt_addrtype.h (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 56000b33340b..08e7e71b266c 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -421,14 +421,6 @@ Files: net/netfilter/xt_connlimit.c ---------------------------- -What: ipt_addrtype match include file -When: 2012 -Why: superseded by xt_addrtype -Who: Florian Westphal -Files: include/linux/netfilter_ipv4/ipt_addrtype.h - ----------------------------- - What: i2c_driver.attach_adapter i2c_driver.detach_adapter When: September 2011 diff --git a/include/linux/netfilter_ipv4/Kbuild b/include/linux/netfilter_ipv4/Kbuild index c61b8fb1a9ef..8ba0c5b72ea9 100644 --- a/include/linux/netfilter_ipv4/Kbuild +++ b/include/linux/netfilter_ipv4/Kbuild @@ -5,7 +5,6 @@ header-y += ipt_LOG.h header-y += ipt_REJECT.h header-y += ipt_TTL.h header-y += ipt_ULOG.h -header-y += ipt_addrtype.h header-y += ipt_ah.h header-y += ipt_ecn.h header-y += ipt_ttl.h diff --git a/include/linux/netfilter_ipv4/ipt_addrtype.h b/include/linux/netfilter_ipv4/ipt_addrtype.h deleted file mode 100644 index 0da42237c8da..000000000000 --- a/include/linux/netfilter_ipv4/ipt_addrtype.h +++ /dev/null @@ -1,27 +0,0 @@ -#ifndef _IPT_ADDRTYPE_H -#define _IPT_ADDRTYPE_H - -#include - -enum { - IPT_ADDRTYPE_INVERT_SOURCE = 0x0001, - IPT_ADDRTYPE_INVERT_DEST = 0x0002, - IPT_ADDRTYPE_LIMIT_IFACE_IN = 0x0004, - IPT_ADDRTYPE_LIMIT_IFACE_OUT = 0x0008, -}; - -struct ipt_addrtype_info_v1 { - __u16 source; /* source-type mask */ - __u16 dest; /* dest-type mask */ - __u32 flags; -}; - -/* revision 0 */ -struct ipt_addrtype_info { - __u16 source; /* source-type mask */ - __u16 dest; /* dest-type mask */ - __u32 invert_source; - __u32 invert_dest; -}; - -#endif -- cgit v1.2.3 From 68c07cb6d8aa05daf38ab47d5bb674d81a2066fb Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Sat, 19 May 2012 04:39:01 +0000 Subject: netfilter: xt_connlimit: remove revision 0 It was scheduled to be removed. Cc: Jan Engelhardt Signed-off-by: Cong Wang Signed-off-by: Pablo Neira Ayuso --- Documentation/feature-removal-schedule.txt | 7 ------ include/linux/netfilter/xt_connlimit.h | 9 ++------ net/netfilter/xt_connlimit.c | 35 ++++++++++-------------------- 3 files changed, 13 insertions(+), 38 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 08e7e71b266c..24ac00f4ae4b 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -414,13 +414,6 @@ Who: Jean Delvare ---------------------------- -What: xt_connlimit rev 0 -When: 2012 -Who: Jan Engelhardt -Files: net/netfilter/xt_connlimit.c - ----------------------------- - What: i2c_driver.attach_adapter i2c_driver.detach_adapter When: September 2011 diff --git a/include/linux/netfilter/xt_connlimit.h b/include/linux/netfilter/xt_connlimit.h index d1366f05d1b2..f1656096121e 100644 --- a/include/linux/netfilter/xt_connlimit.h +++ b/include/linux/netfilter/xt_connlimit.h @@ -22,13 +22,8 @@ struct xt_connlimit_info { #endif }; unsigned int limit; - union { - /* revision 0 */ - unsigned int inverse; - - /* revision 1 */ - __u32 flags; - }; + /* revision 1 */ + __u32 flags; /* Used internally by the kernel */ struct xt_connlimit_data *data __attribute__((aligned(8))); diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c index c6d5a83450c9..70b5591a2586 100644 --- a/net/netfilter/xt_connlimit.c +++ b/net/netfilter/xt_connlimit.c @@ -274,38 +274,25 @@ static void connlimit_mt_destroy(const struct xt_mtdtor_param *par) kfree(info->data); } -static struct xt_match connlimit_mt_reg[] __read_mostly = { - { - .name = "connlimit", - .revision = 0, - .family = NFPROTO_UNSPEC, - .checkentry = connlimit_mt_check, - .match = connlimit_mt, - .matchsize = sizeof(struct xt_connlimit_info), - .destroy = connlimit_mt_destroy, - .me = THIS_MODULE, - }, - { - .name = "connlimit", - .revision = 1, - .family = NFPROTO_UNSPEC, - .checkentry = connlimit_mt_check, - .match = connlimit_mt, - .matchsize = sizeof(struct xt_connlimit_info), - .destroy = connlimit_mt_destroy, - .me = THIS_MODULE, - }, +static struct xt_match connlimit_mt_reg __read_mostly = { + .name = "connlimit", + .revision = 1, + .family = NFPROTO_UNSPEC, + .checkentry = connlimit_mt_check, + .match = connlimit_mt, + .matchsize = sizeof(struct xt_connlimit_info), + .destroy = connlimit_mt_destroy, + .me = THIS_MODULE, }; static int __init connlimit_mt_init(void) { - return xt_register_matches(connlimit_mt_reg, - ARRAY_SIZE(connlimit_mt_reg)); + return xt_register_match(&connlimit_mt_reg); } static void __exit connlimit_mt_exit(void) { - xt_unregister_matches(connlimit_mt_reg, ARRAY_SIZE(connlimit_mt_reg)); + xt_unregister_match(&connlimit_mt_reg); } module_init(connlimit_mt_init); -- cgit v1.2.3 From efdedd5426a94b00d23483a1bcb4af3a91c894db Mon Sep 17 00:00:00 2001 From: Denys Fedoryshchenko Date: Thu, 17 May 2012 23:08:57 +0300 Subject: netfilter: xt_recent: add address masking option The mask option allows you put all address belonging that mask into the same recent slot. This can be useful in case that recent is used to detect attacks from the same network segment. Tested for backward compatibility. Signed-off-by: Denys Fedoryshchenko Signed-off-by: Pablo Neira Ayuso --- Documentation/feature-removal-schedule.txt | 7 ++++ include/linux/netfilter.h | 10 +++++ include/linux/netfilter/xt_recent.h | 10 +++++ net/netfilter/xt_recent.c | 62 +++++++++++++++++++++++++----- 4 files changed, 80 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 24ac00f4ae4b..bc4b9c6eb80e 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -574,6 +574,13 @@ Why: Remount currently allows changing bound subsystems and ---------------------------- +What: xt_recent rev 0 +When: 2013 +Who: Pablo Neira Ayuso +Files: net/netfilter/xt_recent.c + +---------------------------- + What: KVM debugfs statistics When: 2013 Why: KVM tracepoints provide mostly equivalent information in a much more diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index ff9c84c29b28..4541f33dbfc3 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -94,6 +94,16 @@ static inline int nf_inet_addr_cmp(const union nf_inet_addr *a1, a1->all[3] == a2->all[3]; } +static inline void nf_inet_addr_mask(const union nf_inet_addr *a1, + union nf_inet_addr *result, + const union nf_inet_addr *mask) +{ + result->all[0] = a1->all[0] & mask->all[0]; + result->all[1] = a1->all[1] & mask->all[1]; + result->all[2] = a1->all[2] & mask->all[2]; + result->all[3] = a1->all[3] & mask->all[3]; +} + extern void netfilter_init(void); /* Largest hook number + 1 */ diff --git a/include/linux/netfilter/xt_recent.h b/include/linux/netfilter/xt_recent.h index 83318e01425e..6ef36c113e89 100644 --- a/include/linux/netfilter/xt_recent.h +++ b/include/linux/netfilter/xt_recent.h @@ -32,4 +32,14 @@ struct xt_recent_mtinfo { __u8 side; }; +struct xt_recent_mtinfo_v1 { + __u32 seconds; + __u32 hit_count; + __u8 check_set; + __u8 invert; + char name[XT_RECENT_NAME_LEN]; + __u8 side; + union nf_inet_addr mask; +}; + #endif /* _LINUX_NETFILTER_XT_RECENT_H */ diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c index fc0d6dbe5d17..ae2ad1eec8d0 100644 --- a/net/netfilter/xt_recent.c +++ b/net/netfilter/xt_recent.c @@ -75,6 +75,7 @@ struct recent_entry { struct recent_table { struct list_head list; char name[XT_RECENT_NAME_LEN]; + union nf_inet_addr mask; unsigned int refcnt; unsigned int entries; struct list_head lru_list; @@ -228,10 +229,10 @@ recent_mt(const struct sk_buff *skb, struct xt_action_param *par) { struct net *net = dev_net(par->in ? par->in : par->out); struct recent_net *recent_net = recent_pernet(net); - const struct xt_recent_mtinfo *info = par->matchinfo; + const struct xt_recent_mtinfo_v1 *info = par->matchinfo; struct recent_table *t; struct recent_entry *e; - union nf_inet_addr addr = {}; + union nf_inet_addr addr = {}, addr_mask; u_int8_t ttl; bool ret = info->invert; @@ -261,12 +262,15 @@ recent_mt(const struct sk_buff *skb, struct xt_action_param *par) spin_lock_bh(&recent_lock); t = recent_table_lookup(recent_net, info->name); - e = recent_entry_lookup(t, &addr, par->family, + + nf_inet_addr_mask(&addr, &addr_mask, &t->mask); + + e = recent_entry_lookup(t, &addr_mask, par->family, (info->check_set & XT_RECENT_TTL) ? ttl : 0); if (e == NULL) { if (!(info->check_set & XT_RECENT_SET)) goto out; - e = recent_entry_init(t, &addr, par->family, ttl); + e = recent_entry_init(t, &addr_mask, par->family, ttl); if (e == NULL) par->hotdrop = true; ret = !ret; @@ -306,10 +310,10 @@ out: return ret; } -static int recent_mt_check(const struct xt_mtchk_param *par) +static int recent_mt_check(const struct xt_mtchk_param *par, + const struct xt_recent_mtinfo_v1 *info) { struct recent_net *recent_net = recent_pernet(par->net); - const struct xt_recent_mtinfo *info = par->matchinfo; struct recent_table *t; #ifdef CONFIG_PROC_FS struct proc_dir_entry *pde; @@ -361,6 +365,8 @@ static int recent_mt_check(const struct xt_mtchk_param *par) goto out; } t->refcnt = 1; + + memcpy(&t->mask, &info->mask, sizeof(t->mask)); strcpy(t->name, info->name); INIT_LIST_HEAD(&t->lru_list); for (i = 0; i < ip_list_hash_size; i++) @@ -385,10 +391,28 @@ out: return ret; } +static int recent_mt_check_v0(const struct xt_mtchk_param *par) +{ + const struct xt_recent_mtinfo_v0 *info_v0 = par->matchinfo; + struct xt_recent_mtinfo_v1 info_v1; + + /* Copy revision 0 structure to revision 1 */ + memcpy(&info_v1, info_v0, sizeof(struct xt_recent_mtinfo)); + /* Set default mask to ensure backward compatible behaviour */ + memset(info_v1.mask.all, 0xFF, sizeof(info_v1.mask.all)); + + return recent_mt_check(par, &info_v1); +} + +static int recent_mt_check_v1(const struct xt_mtchk_param *par) +{ + return recent_mt_check(par, par->matchinfo); +} + static void recent_mt_destroy(const struct xt_mtdtor_param *par) { struct recent_net *recent_net = recent_pernet(par->net); - const struct xt_recent_mtinfo *info = par->matchinfo; + const struct xt_recent_mtinfo_v1 *info = par->matchinfo; struct recent_table *t; mutex_lock(&recent_mutex); @@ -625,7 +649,7 @@ static struct xt_match recent_mt_reg[] __read_mostly = { .family = NFPROTO_IPV4, .match = recent_mt, .matchsize = sizeof(struct xt_recent_mtinfo), - .checkentry = recent_mt_check, + .checkentry = recent_mt_check_v0, .destroy = recent_mt_destroy, .me = THIS_MODULE, }, @@ -635,10 +659,30 @@ static struct xt_match recent_mt_reg[] __read_mostly = { .family = NFPROTO_IPV6, .match = recent_mt, .matchsize = sizeof(struct xt_recent_mtinfo), - .checkentry = recent_mt_check, + .checkentry = recent_mt_check_v0, + .destroy = recent_mt_destroy, + .me = THIS_MODULE, + }, + { + .name = "recent", + .revision = 1, + .family = NFPROTO_IPV4, + .match = recent_mt, + .matchsize = sizeof(struct xt_recent_mtinfo_v1), + .checkentry = recent_mt_check_v1, .destroy = recent_mt_destroy, .me = THIS_MODULE, }, + { + .name = "recent", + .revision = 1, + .family = NFPROTO_IPV6, + .match = recent_mt, + .matchsize = sizeof(struct xt_recent_mtinfo_v1), + .checkentry = recent_mt_check_v1, + .destroy = recent_mt_destroy, + .me = THIS_MODULE, + } }; static int __init recent_mt_init(void) -- cgit v1.2.3 From d0daebc3d622f95db181601cb0c4a0781f74f758 Mon Sep 17 00:00:00 2001 From: Thomas Graf Date: Tue, 12 Jun 2012 00:44:01 +0000 Subject: ipv4: Add interface option to enable routing of 127.0.0.0/8 Routing of 127/8 is tradtionally forbidden, we consider packets from that address block martian when routing and do not process corresponding ARP requests. This is a sane default but renders a huge address space practically unuseable. The RFC states that no address within the 127/8 block should ever appear on any network anywhere but it does not forbid the use of such addresses outside of the loopback device in particular. For example to address a pool of virtual guests behind a load balancer. This patch adds a new interface option 'route_localnet' enabling routing of the 127/8 address block and processing of ARP requests on a specific interface. Note that for the feature to work, the default local route covering 127/8 dev lo needs to be removed. Example: $ sysctl -w net.ipv4.conf.eth0.route_localnet=1 $ ip route del 127.0.0.0/8 dev lo table local $ ip addr add 127.1.0.1/16 dev eth0 $ ip route flush cache V2: Fix invalid check to auto flush cache (thanks davem) Signed-off-by: Thomas Graf Acked-by: Neil Horman Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 5 +++++ include/linux/inetdevice.h | 2 ++ net/ipv4/arp.c | 3 ++- net/ipv4/devinet.c | 5 ++++- net/ipv4/route.c | 30 +++++++++++++++++++++--------- 5 files changed, 34 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 6f896b94abdc..99d0e0504d6e 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -862,6 +862,11 @@ accept_local - BOOLEAN local interfaces over the wire and have them accepted properly. default FALSE +route_localnet - BOOLEAN + Do not consider loopback addresses as martian source or destination + while routing. This enables the use of 127/8 for local routing purposes. + default FALSE + rp_filter - INTEGER 0 - No source validation. 1 - Strict mode as defined in RFC3704 Strict Reverse Path diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index 597f4a9f3240..67f9ddacb70c 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -38,6 +38,7 @@ enum IPV4_DEVCONF_ACCEPT_LOCAL, IPV4_DEVCONF_SRC_VMARK, IPV4_DEVCONF_PROXY_ARP_PVLAN, + IPV4_DEVCONF_ROUTE_LOCALNET, __IPV4_DEVCONF_MAX }; @@ -131,6 +132,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev) #define IN_DEV_PROMOTE_SECONDARIES(in_dev) \ IN_DEV_ORCONF((in_dev), \ PROMOTE_SECONDARIES) +#define IN_DEV_ROUTE_LOCALNET(in_dev) IN_DEV_ORCONF(in_dev, ROUTE_LOCALNET) #define IN_DEV_RX_REDIRECTS(in_dev) \ ((IN_DEV_FORWARD(in_dev) && \ diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index cda37be02f8d..2e560f0c757d 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -790,7 +790,8 @@ static int arp_process(struct sk_buff *skb) * Check for bad requests for 127.x.x.x and requests for multicast * addresses. If this is one such, delete it. */ - if (ipv4_is_loopback(tip) || ipv4_is_multicast(tip)) + if (ipv4_is_multicast(tip) || + (!IN_DEV_ROUTE_LOCALNET(in_dev) && ipv4_is_loopback(tip))) goto out; /* diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 10e15a144e95..44bf82e3aef7 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1500,7 +1500,8 @@ static int devinet_conf_proc(ctl_table *ctl, int write, if (cnf == net->ipv4.devconf_dflt) devinet_copy_dflt_conf(net, i); - if (i == IPV4_DEVCONF_ACCEPT_LOCAL - 1) + if (i == IPV4_DEVCONF_ACCEPT_LOCAL - 1 || + i == IPV4_DEVCONF_ROUTE_LOCALNET - 1) if ((new_value == 0) && (old_value != 0)) rt_cache_flush(net, 0); } @@ -1617,6 +1618,8 @@ static struct devinet_sysctl_table { "force_igmp_version"), DEVINET_SYSCTL_FLUSHING_ENTRY(PROMOTE_SECONDARIES, "promote_secondaries"), + DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET, + "route_localnet"), }, }; diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 842510d50453..655506af47ca 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1960,9 +1960,13 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr, return -EINVAL; if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) || - ipv4_is_loopback(saddr) || skb->protocol != htons(ETH_P_IP)) + skb->protocol != htons(ETH_P_IP)) goto e_inval; + if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev))) + if (ipv4_is_loopback(saddr)) + goto e_inval; + if (ipv4_is_zeronet(saddr)) { if (!ipv4_is_local_multicast(daddr)) goto e_inval; @@ -2203,8 +2207,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr, by fib_lookup. */ - if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) || - ipv4_is_loopback(saddr)) + if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr)) goto martian_source; if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0)) @@ -2216,9 +2219,17 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr, if (ipv4_is_zeronet(saddr)) goto martian_source; - if (ipv4_is_zeronet(daddr) || ipv4_is_loopback(daddr)) + if (ipv4_is_zeronet(daddr)) goto martian_destination; + if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev))) { + if (ipv4_is_loopback(daddr)) + goto martian_destination; + + if (ipv4_is_loopback(saddr)) + goto martian_source; + } + /* * Now we are ready to route packet. */ @@ -2457,9 +2468,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res, u16 type = res->type; struct rtable *rth; - if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK)) + in_dev = __in_dev_get_rcu(dev_out); + if (!in_dev) return ERR_PTR(-EINVAL); + if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev))) + if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK)) + return ERR_PTR(-EINVAL); + if (ipv4_is_lbcast(fl4->daddr)) type = RTN_BROADCAST; else if (ipv4_is_multicast(fl4->daddr)) @@ -2470,10 +2486,6 @@ static struct rtable *__mkroute_output(const struct fib_result *res, if (dev_out->flags & IFF_LOOPBACK) flags |= RTCF_LOCAL; - in_dev = __in_dev_get_rcu(dev_out); - if (!in_dev) - return ERR_PTR(-EINVAL); - if (type == RTN_BROADCAST) { flags |= RTCF_BROADCAST | RTCF_LOCAL; fi = NULL; -- cgit v1.2.3 From f8214865a55f805e65c33350bc0f1eb46dd8433d Mon Sep 17 00:00:00 2001 From: Martin Hundebøll Date: Fri, 20 Apr 2012 17:02:45 +0200 Subject: batman-adv: Add get_ethtool_stats() support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added additional counters in a bat_stats structure, which are exported through the ethtool api. The counters are specific to batman-adv and includes: forwarded packets and bytes management packets and bytes (aggregated OGMs at this point) translation table packets New counters are added by extending "enum bat_counters" in types.h and adding corresponding descriptive string(s) to bat_counters_strings in soft-iface.c. Counters are increased by calling batadv_add_counter() and incremented by one by calling batadv_inc_counter(). Signed-off-by: Martin Hundebøll Signed-off-by: Sven Eckelmann --- Documentation/networking/batman-adv.txt | 5 +++ net/batman-adv/bat_iv_ogm.c | 10 ++++- net/batman-adv/main.c | 2 + net/batman-adv/main.h | 27 ++++++++++++++ net/batman-adv/routing.c | 11 ++++++ net/batman-adv/soft-interface.c | 66 ++++++++++++++++++++++++++++++++- net/batman-adv/translation-table.c | 8 ++++ net/batman-adv/types.h | 17 +++++++++ 8 files changed, 143 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index 75a592365af9..8f3ae4a6147e 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -211,6 +211,11 @@ The debug output can be changed at runtime using the file will enable debug messages for when routes change. +Counters for different types of packets entering and leaving the +batman-adv module are available through ethtool: + +# ethtool --statistics bat0 + BATCTL ------ diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index ec351199c652..99ec218df7f2 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -196,8 +196,12 @@ static void bat_iv_ogm_send_to_if(struct forw_packet *forw_packet, /* create clone because function is called more than once */ skb = skb_clone(forw_packet->skb, GFP_ATOMIC); - if (skb) + if (skb) { + batadv_inc_counter(bat_priv, BAT_CNT_MGMT_TX); + batadv_add_counter(bat_priv, BAT_CNT_MGMT_TX_BYTES, + skb->len + ETH_HLEN); send_skb_packet(skb, hard_iface, broadcast_addr); + } } /* send a batman ogm packet */ @@ -1203,6 +1207,10 @@ static int bat_iv_ogm_receive(struct sk_buff *skb, if (bat_priv->bat_algo_ops->bat_ogm_emit != bat_iv_ogm_emit) return NET_RX_DROP; + batadv_inc_counter(bat_priv, BAT_CNT_MGMT_RX); + batadv_add_counter(bat_priv, BAT_CNT_MGMT_RX_BYTES, + skb->len + ETH_HLEN); + packet_len = skb_headlen(skb); ethhdr = (struct ethhdr *)skb_mac_header(skb); packet_buff = skb->data; diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c index 083a2993efe4..bd83aa4e7c87 100644 --- a/net/batman-adv/main.c +++ b/net/batman-adv/main.c @@ -153,6 +153,8 @@ void mesh_free(struct net_device *soft_iface) bla_free(bat_priv); + free_percpu(bat_priv->bat_counters); + atomic_set(&bat_priv->mesh_state, MESH_INACTIVE); } diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h index 630bbe8968ca..6e0cbdc48321 100644 --- a/net/batman-adv/main.h +++ b/net/batman-adv/main.h @@ -138,6 +138,7 @@ enum dbg_level { #include /* kernel threads */ #include /* schedule types */ #include /* workqueue */ +#include #include #include /* struct sock */ #include @@ -242,4 +243,30 @@ static inline bool has_timed_out(unsigned long timestamp, unsigned int timeout) _dummy > smallest_signed_int(_dummy); }) #define seq_after(x, y) seq_before(y, x) +/* Stop preemption on local cpu while incrementing the counter */ +static inline void batadv_add_counter(struct bat_priv *bat_priv, size_t idx, + size_t count) +{ + int cpu = get_cpu(); + per_cpu_ptr(bat_priv->bat_counters, cpu)[idx] += count; + put_cpu(); +} + +#define batadv_inc_counter(b, i) batadv_add_counter(b, i, 1) + +/* Sum and return the cpu-local counters for index 'idx' */ +static inline uint64_t batadv_sum_counter(struct bat_priv *bat_priv, size_t idx) +{ + uint64_t *counters; + int cpu; + int sum = 0; + + for_each_possible_cpu(cpu) { + counters = per_cpu_ptr(bat_priv->bat_counters, cpu); + sum += counters[idx]; + } + + return sum; +} + #endif /* _NET_BATMAN_ADV_MAIN_H_ */ diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index 015471d801b4..369604c99a46 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -600,6 +600,8 @@ int recv_tt_query(struct sk_buff *skb, struct hard_iface *recv_if) switch (tt_query->flags & TT_QUERY_TYPE_MASK) { case TT_REQUEST: + batadv_inc_counter(bat_priv, BAT_CNT_TT_REQUEST_RX); + /* If we cannot provide an answer the tt_request is * forwarded */ if (!send_tt_response(bat_priv, tt_query)) { @@ -612,6 +614,8 @@ int recv_tt_query(struct sk_buff *skb, struct hard_iface *recv_if) } break; case TT_RESPONSE: + batadv_inc_counter(bat_priv, BAT_CNT_TT_RESPONSE_RX); + if (is_my_mac(tt_query->dst)) { /* packet needs to be linearized to access the TT * changes */ @@ -665,6 +669,8 @@ int recv_roam_adv(struct sk_buff *skb, struct hard_iface *recv_if) if (is_broadcast_ether_addr(ethhdr->h_source)) goto out; + batadv_inc_counter(bat_priv, BAT_CNT_TT_ROAM_ADV_RX); + roam_adv_packet = (struct roam_adv_packet *)skb->data; if (!is_my_mac(roam_adv_packet->dst)) @@ -872,6 +878,11 @@ static int route_unicast_packet(struct sk_buff *skb, struct hard_iface *recv_if) /* decrement ttl */ unicast_packet->header.ttl--; + /* Update stats counter */ + batadv_inc_counter(bat_priv, BAT_CNT_FORWARD); + batadv_add_counter(bat_priv, BAT_CNT_FORWARD_BYTES, + skb->len + ETH_HLEN); + /* route it */ send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); ret = NET_RX_SUCCESS; diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 6e2530b02043..304a7ba09e03 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -45,6 +45,10 @@ static void bat_get_drvinfo(struct net_device *dev, static u32 bat_get_msglevel(struct net_device *dev); static void bat_set_msglevel(struct net_device *dev, u32 value); static u32 bat_get_link(struct net_device *dev); +static void batadv_get_strings(struct net_device *dev, u32 stringset, u8 *data); +static void batadv_get_ethtool_stats(struct net_device *dev, + struct ethtool_stats *stats, u64 *data); +static int batadv_get_sset_count(struct net_device *dev, int stringset); static const struct ethtool_ops bat_ethtool_ops = { .get_settings = bat_get_settings, @@ -52,6 +56,9 @@ static const struct ethtool_ops bat_ethtool_ops = { .get_msglevel = bat_get_msglevel, .set_msglevel = bat_set_msglevel, .get_link = bat_get_link, + .get_strings = batadv_get_strings, + .get_ethtool_stats = batadv_get_ethtool_stats, + .get_sset_count = batadv_get_sset_count, }; int my_skb_head_push(struct sk_buff *skb, unsigned int len) @@ -399,13 +406,18 @@ struct net_device *softif_create(const char *name) bat_priv->primary_if = NULL; bat_priv->num_ifaces = 0; + bat_priv->bat_counters = __alloc_percpu(sizeof(uint64_t) * BAT_CNT_NUM, + __alignof__(uint64_t)); + if (!bat_priv->bat_counters) + goto unreg_soft_iface; + ret = bat_algo_select(bat_priv, bat_routing_algo); if (ret < 0) - goto unreg_soft_iface; + goto free_bat_counters; ret = sysfs_add_meshif(soft_iface); if (ret < 0) - goto unreg_soft_iface; + goto free_bat_counters; ret = debugfs_add_meshif(soft_iface); if (ret < 0) @@ -421,6 +433,8 @@ unreg_debugfs: debugfs_del_meshif(soft_iface); unreg_sysfs: sysfs_del_meshif(soft_iface); +free_bat_counters: + free_percpu(bat_priv->bat_counters); unreg_soft_iface: unregister_netdevice(soft_iface); return NULL; @@ -486,3 +500,51 @@ static u32 bat_get_link(struct net_device *dev) { return 1; } + +/* Inspired by drivers/net/ethernet/dlink/sundance.c:1702 + * Declare each description string in struct.name[] to get fixed sized buffer + * and compile time checking for strings longer than ETH_GSTRING_LEN. + */ +static const struct { + const char name[ETH_GSTRING_LEN]; +} bat_counters_strings[] = { + { "forward" }, + { "forward_bytes" }, + { "mgmt_tx" }, + { "mgmt_tx_bytes" }, + { "mgmt_rx" }, + { "mgmt_rx_bytes" }, + { "tt_request_tx" }, + { "tt_request_rx" }, + { "tt_response_tx" }, + { "tt_response_rx" }, + { "tt_roam_adv_tx" }, + { "tt_roam_adv_rx" }, +}; + +static void batadv_get_strings(struct net_device *dev, uint32_t stringset, + uint8_t *data) +{ + if (stringset == ETH_SS_STATS) + memcpy(data, bat_counters_strings, + sizeof(bat_counters_strings)); +} + +static void batadv_get_ethtool_stats(struct net_device *dev, + struct ethtool_stats *stats, + uint64_t *data) +{ + struct bat_priv *bat_priv = netdev_priv(dev); + int i; + + for (i = 0; i < BAT_CNT_NUM; i++) + data[i] = batadv_sum_counter(bat_priv, i); +} + +static int batadv_get_sset_count(struct net_device *dev, int stringset) +{ + if (stringset == ETH_SS_STATS) + return BAT_CNT_NUM; + + return -EOPNOTSUPP; +} diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index a66c2dcd1088..ca53542e1e8e 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -1356,6 +1356,8 @@ static int send_tt_request(struct bat_priv *bat_priv, dst_orig_node->orig, neigh_node->addr, (full_table ? 'F' : '.')); + batadv_inc_counter(bat_priv, BAT_CNT_TT_REQUEST_TX); + send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); ret = 0; @@ -1480,6 +1482,8 @@ static bool send_other_tt_response(struct bat_priv *bat_priv, res_dst_orig_node->orig, neigh_node->addr, req_dst_orig_node->orig, req_ttvn); + batadv_inc_counter(bat_priv, BAT_CNT_TT_RESPONSE_TX); + send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); ret = true; goto out; @@ -1596,6 +1600,8 @@ static bool send_my_tt_response(struct bat_priv *bat_priv, orig_node->orig, neigh_node->addr, (tt_response->flags & TT_FULL_TABLE ? 'F' : '.')); + batadv_inc_counter(bat_priv, BAT_CNT_TT_RESPONSE_TX); + send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); ret = true; goto out; @@ -1895,6 +1901,8 @@ static void send_roam_adv(struct bat_priv *bat_priv, uint8_t *client, "Sending ROAMING_ADV to %pM (client %pM) via %pM\n", orig_node->orig, client, neigh_node->addr); + batadv_inc_counter(bat_priv, BAT_CNT_TT_ROAM_ADV_TX); + send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); ret = 0; diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h index 547dc339f33d..6b569debc1a6 100644 --- a/net/batman-adv/types.h +++ b/net/batman-adv/types.h @@ -148,9 +148,26 @@ struct bcast_duplist_entry { }; #endif +enum bat_counters { + BAT_CNT_FORWARD, + BAT_CNT_FORWARD_BYTES, + BAT_CNT_MGMT_TX, + BAT_CNT_MGMT_TX_BYTES, + BAT_CNT_MGMT_RX, + BAT_CNT_MGMT_RX_BYTES, + BAT_CNT_TT_REQUEST_TX, + BAT_CNT_TT_REQUEST_RX, + BAT_CNT_TT_RESPONSE_TX, + BAT_CNT_TT_RESPONSE_RX, + BAT_CNT_TT_ROAM_ADV_TX, + BAT_CNT_TT_ROAM_ADV_RX, + BAT_CNT_NUM, +}; + struct bat_priv { atomic_t mesh_state; struct net_device_stats stats; + uint64_t __percpu *bat_counters; /* Per cpu counters */ atomic_t aggregated_ogms; /* boolean */ atomic_t bonding; /* boolean */ atomic_t fragmentation; /* boolean */ -- cgit v1.2.3 From ea53fe0c667ad3cae61d4d71d2be41908ac5c0a4 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Sat, 16 Jun 2012 12:01:58 +0200 Subject: canfd: update documentation according to CAN FD extensions Signed-off-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde --- Documentation/networking/can.txt | 154 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 146 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt index a06741898f29..820f55344edc 100644 --- a/Documentation/networking/can.txt +++ b/Documentation/networking/can.txt @@ -22,7 +22,8 @@ This file contains 4.1.2 RAW socket option CAN_RAW_ERR_FILTER 4.1.3 RAW socket option CAN_RAW_LOOPBACK 4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS - 4.1.5 RAW socket returned message flags + 4.1.5 RAW socket option CAN_RAW_FD_FRAMES + 4.1.6 RAW socket returned message flags 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) 4.3 connected transport protocols (SOCK_SEQPACKET) 4.4 unconnected transport protocols (SOCK_DGRAM) @@ -41,7 +42,8 @@ This file contains 6.5.1 Netlink interface to set/get devices properties 6.5.2 Setting the CAN bit-timing 6.5.3 Starting and stopping the CAN network device - 6.6 supported CAN hardware + 6.6 CAN FD (flexible data rate) driver support + 6.7 supported CAN hardware 7 Socket CAN resources @@ -273,7 +275,7 @@ solution for a couple of reasons: struct can_frame { canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */ - __u8 can_dlc; /* data length code: 0 .. 8 */ + __u8 can_dlc; /* frame payload length in byte (0 .. 8) */ __u8 data[8] __attribute__((aligned(8))); }; @@ -375,6 +377,51 @@ solution for a couple of reasons: nbytes = sendto(s, &frame, sizeof(struct can_frame), 0, (struct sockaddr*)&addr, sizeof(addr)); + Remark about CAN FD (flexible data rate) support: + + Generally the handling of CAN FD is very similar to the formerly described + examples. The new CAN FD capable CAN controllers support two different + bitrates for the arbitration phase and the payload phase of the CAN FD frame + and up to 64 bytes of payload. This extended payload length breaks all the + kernel interfaces (ABI) which heavily rely on the CAN frame with fixed eight + bytes of payload (struct can_frame) like the CAN_RAW socket. Therefore e.g. + the CAN_RAW socket supports a new socket option CAN_RAW_FD_FRAMES that + switches the socket into a mode that allows the handling of CAN FD frames + and (legacy) CAN frames simultaneously (see section 4.1.5). + + The struct canfd_frame is defined in include/linux/can.h: + + struct canfd_frame { + canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */ + __u8 len; /* frame payload length in byte (0 .. 64) */ + __u8 flags; /* additional flags for CAN FD */ + __u8 __res0; /* reserved / padding */ + __u8 __res1; /* reserved / padding */ + __u8 data[64] __attribute__((aligned(8))); + }; + + The struct canfd_frame and the existing struct can_frame have the can_id, + the payload length and the payload data at the same offset inside their + structures. This allows to handle the different structures very similar. + When the content of a struct can_frame is copied into a struct canfd_frame + all structure elements can be used as-is - only the data[] becomes extended. + + When introducing the struct canfd_frame it turned out that the data length + code (DLC) of the struct can_frame was used as a length information as the + length and the DLC has a 1:1 mapping in the range of 0 .. 8. To preserve + the easy handling of the length information the canfd_frame.len element + contains a plain length value from 0 .. 64. So both canfd_frame.len and + can_frame.can_dlc are equal and contain a length information and no DLC. + For details about the distinction of CAN and CAN FD capable devices and + the mapping to the bus-relevant data length code (DLC), see chapter 6.6. + + The length of the two CAN(FD) frame structures define the maximum transfer + unit (MTU) of the CAN(FD) network interface and skbuff data length. Two + definitions are specified for CAN specific MTUs in include/linux/can.h : + + #define CAN_MTU (sizeof(struct can_frame)) == 16 => 'legacy' CAN frame + #define CANFD_MTU (sizeof(struct canfd_frame)) == 72 => CAN FD frame + 4.1 RAW protocol sockets with can_filters (SOCK_RAW) Using CAN_RAW sockets is extensively comparable to the commonly @@ -472,7 +519,69 @@ solution for a couple of reasons: setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, &recv_own_msgs, sizeof(recv_own_msgs)); - 4.1.5 RAW socket returned message flags + 4.1.5 RAW socket option CAN_RAW_FD_FRAMES + + CAN FD support in CAN_RAW sockets can be enabled with a new socket option + CAN_RAW_FD_FRAMES which is off by default. When the new socket option is + not supported by the CAN_RAW socket (e.g. on older kernels), switching the + CAN_RAW_FD_FRAMES option returns the error -ENOPROTOOPT. + + Once CAN_RAW_FD_FRAMES is enabled the application can send both CAN frames + and CAN FD frames. OTOH the application has to handle CAN and CAN FD frames + when reading from the socket. + + CAN_RAW_FD_FRAMES enabled: CAN_MTU and CANFD_MTU are allowed + CAN_RAW_FD_FRAMES disabled: only CAN_MTU is allowed (default) + + Example: + [ remember: CANFD_MTU == sizeof(struct canfd_frame) ] + + struct canfd_frame cfd; + + nbytes = read(s, &cfd, CANFD_MTU); + + if (nbytes == CANFD_MTU) { + printf("got CAN FD frame with length %d\n", cfd.len); + /* cfd.flags contains valid data */ + } else if (nbytes == CAN_MTU) { + printf("got legacy CAN frame with length %d\n", cfd.len); + /* cfd.flags is undefined */ + } else { + fprintf(stderr, "read: invalid CAN(FD) frame\n"); + return 1; + } + + /* the content can be handled independently from the received MTU size */ + + printf("can_id: %X data length: %d data: ", cfd.can_id, cfd.len); + for (i = 0; i < cfd.len; i++) + printf("%02X ", cfd.data[i]); + + When reading with size CANFD_MTU only returns CAN_MTU bytes that have + been received from the socket a legacy CAN frame has been read into the + provided CAN FD structure. Note that the canfd_frame.flags data field is + not specified in the struct can_frame and therefore it is only valid in + CANFD_MTU sized CAN FD frames. + + As long as the payload length is <=8 the received CAN frames from CAN FD + capable CAN devices can be received and read by legacy sockets too. When + user-generated CAN FD frames have a payload length <=8 these can be send + by legacy CAN network interfaces too. Sending CAN FD frames with payload + length > 8 to a legacy CAN network interface returns an -EMSGSIZE error. + + Implementation hint for new CAN applications: + + To build a CAN FD aware application use struct canfd_frame as basic CAN + data structure for CAN_RAW based applications. When the application is + executed on an older Linux kernel and switching the CAN_RAW_FD_FRAMES + socket option returns an error: No problem. You'll get legacy CAN frames + or CAN FD frames and can process them the same way. + + When sending to CAN devices make sure that the device is capable to handle + CAN FD frames by checking if the device maximum transfer unit is CANFD_MTU. + The CAN device MTU can be retrieved e.g. with a SIOCGIFMTU ioctl() syscall. + + 4.1.6 RAW socket returned message flags When using recvmsg() call, the msg->msg_flags may contain following flags: @@ -573,10 +682,13 @@ solution for a couple of reasons: dev->type = ARPHRD_CAN; /* the netdevice hardware type */ dev->flags = IFF_NOARP; /* CAN has no arp */ - dev->mtu = sizeof(struct can_frame); + dev->mtu = CAN_MTU; /* sizeof(struct can_frame) -> legacy CAN interface */ - The struct can_frame is the payload of each socket buffer in the - protocol family PF_CAN. + or alternative, when the controller supports CAN with flexible data rate: + dev->mtu = CANFD_MTU; /* sizeof(struct canfd_frame) -> CAN FD interface */ + + The struct can_frame or struct canfd_frame is the payload of each socket + buffer (skbuff) in the protocol family PF_CAN. 6.2 local loopback of sent frames @@ -792,7 +904,33 @@ solution for a couple of reasons: Note that a restart will also create a CAN error message frame (see also chapter 3.4). - 6.6 Supported CAN hardware + 6.6 CAN FD (flexible data rate) driver support + + CAN FD capable CAN controllers support two different bitrates for the + arbitration phase and the payload phase of the CAN FD frame. Therefore a + second bittiming has to be specified in order to enable the CAN FD bitrate. + + Additionally CAN FD capable CAN controllers support up to 64 bytes of + payload. The representation of this length in can_frame.can_dlc and + canfd_frame.len for userspace applications and inside the Linux network + layer is a plain value from 0 .. 64 instead of the CAN 'data length code'. + The data length code was a 1:1 mapping to the payload length in the legacy + CAN frames anyway. The payload length to the bus-relevant DLC mapping is + only performed inside the CAN drivers, preferably with the helper + functions can_dlc2len() and can_len2dlc(). + + The CAN netdevice driver capabilities can be distinguished by the network + devices maximum transfer unit (MTU): + + MTU = 16 (CAN_MTU) => sizeof(struct can_frame) => 'legacy' CAN device + MTU = 72 (CANFD_MTU) => sizeof(struct canfd_frame) => CAN FD capable device + + The CAN device MTU can be retrieved e.g. with a SIOCGIFMTU ioctl() syscall. + N.B. CAN FD capable devices can also handle and send legacy CAN frames. + + FIXME: Add details about the CAN FD controller configuration when available. + + 6.7 Supported CAN hardware Please check the "Kconfig" file in "drivers/net/can" to get an actual list of the support CAN hardware. On the Socket CAN project website -- cgit v1.2.3 From 5051c94bb3998ff24bf07ae3b72dca30f85962f8 Mon Sep 17 00:00:00 2001 From: Sjur Brændeland Date: Mon, 25 Jun 2012 07:49:40 +0000 Subject: Documentation/networking/caif: Update documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Update drawing and remove description of old features. Add HSI and USB link layers to the drawing. Reported-by: Joerg Reisenweber Signed-off-by: Sjur Brændeland Signed-off-by: David S. Miller --- Documentation/networking/caif/Linux-CAIF.txt | 91 +++++++++------------------- 1 file changed, 27 insertions(+), 64 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/caif/Linux-CAIF.txt b/Documentation/networking/caif/Linux-CAIF.txt index e52fd62bef3a..0aa4bd381bec 100644 --- a/Documentation/networking/caif/Linux-CAIF.txt +++ b/Documentation/networking/caif/Linux-CAIF.txt @@ -19,60 +19,36 @@ and host. Currently, UART and Loopback are available for Linux. Architecture: ------------ The implementation of CAIF is divided into: -* CAIF Socket Layer, Kernel API, and Net Device. +* CAIF Socket Layer and GPRS IP Interface. * CAIF Core Protocol Implementation * CAIF Link Layer, implemented as NET devices. RTNL ! - ! +------+ +------+ +------+ - ! +------+! +------+! +------+! - ! ! Sock !! !Kernel!! ! Net !! - ! ! API !+ ! API !+ ! Dev !+ <- CAIF Client APIs - ! +------+ +------! +------+ - ! ! ! ! - ! +----------!----------+ - ! +------+ <- CAIF Protocol Implementation - +-------> ! CAIF ! - ! Core ! - +------+ - +--------!--------+ - ! ! - +------+ +-----+ - ! ! ! TTY ! <- Link Layer (Net Devices) - +------+ +-----+ - - -Using the Kernel API ----------------------- -The Kernel API is used for accessing CAIF channels from the -kernel. -The user of the API has to implement two callbacks for receive -and control. -The receive callback gives a CAIF packet as a SKB. The control -callback will -notify of channel initialization complete, and flow-on/flow- -off. - - - struct caif_device caif_dev = { - .caif_config = { - .name = "MYDEV" - .type = CAIF_CHTY_AT - } - .receive_cb = my_receive, - .control_cb = my_control, - }; - caif_add_device(&caif_dev); - caif_transmit(&caif_dev, skb); - -See the caif_kernel.h for details about the CAIF kernel API. + ! +------+ +------+ + ! +------+! +------+! + ! ! IP !! !Socket!! + +-------> !interf!+ ! API !+ <- CAIF Client APIs + ! +------+ +------! + ! ! ! + ! +-----------+ + ! ! + ! +------+ <- CAIF Core Protocol + ! ! CAIF ! + ! ! Core ! + ! +------+ + ! +----------!---------+ + ! ! ! ! + ! +------+ +-----+ +------+ + +--> ! HSI ! ! TTY ! ! USB ! <- Link Layer (Net Devices) + +------+ +-----+ +------+ + I M P L E M E N T A T I O N =========================== -=========================== + CAIF Core Protocol Layer ========================================= @@ -88,17 +64,13 @@ The Core CAIF implementation contains: - Simple implementation of CAIF. - Layered architecture (a la Streams), each layer in the CAIF specification is implemented in a separate c-file. - - Clients must implement PHY layer to access physical HW - with receive and transmit functions. - Clients must call configuration function to add PHY layer. - Clients must implement CAIF layer to consume/produce CAIF payload with receive and transmit functions. - Clients must call configuration function to add and connect the Client layer. - When receiving / transmitting CAIF Packets (cfpkt), ownership is passed - to the called function (except for framing layers' receive functions - or if a transmit function returns an error, in which case the caller - must free the packet). + to the called function (except for framing layers' receive function) Layered Architecture -------------------- @@ -109,11 +81,6 @@ Implementation. The support functions include: CAIF Packet has functions for creating, destroying and adding content and for adding/extracting header and trailers to protocol packets. - - CFLST CAIF list implementation. - - - CFGLUE CAIF Glue. Contains OS Specifics, such as memory - allocation, endianness, etc. - The CAIF Protocol implementation contains: - CFCNFG CAIF Configuration layer. Configures the CAIF Protocol @@ -128,7 +95,7 @@ The CAIF Protocol implementation contains: control and remote shutdown requests. - CFVEI CAIF VEI layer. Handles CAIF AT Channels on VEI (Virtual - External Interface). This layer encodes/decodes VEI frames. + External Interface). This layer encodes/decodes VEI frames. - CFDGML CAIF Datagram layer. Handles CAIF Datagram layer (IP traffic), encodes/decodes Datagram frames. @@ -170,7 +137,7 @@ The CAIF Protocol implementation contains: +---------+ +---------+ ! ! +---------+ +---------+ - | | | Serial | + | | | Serial | | | | CFSERL | +---------+ +---------+ @@ -186,24 +153,20 @@ In this layered approach the following "rules" apply. layer->dn->transmit(layer->dn, packet); -Linux Driver Implementation +CAIF Socket and IP interface =========================== -Linux GPRS Net Device and CAIF socket are implemented on top of the -CAIF Core protocol. The Net device and CAIF socket have an instance of +The IP interface and CAIF socket API are implemented on top of the +CAIF Core protocol. The IP Interface and CAIF socket have an instance of 'struct cflayer', just like the CAIF Core protocol stack. Net device and Socket implement the 'receive()' function defined by 'struct cflayer', just like the rest of the CAIF stack. In this way, transmit and receive of packets is handled as by the rest of the layers: the 'dn->transmit()' function is called in order to transmit data. -The layer on top of the CAIF Core implementation is -sometimes referred to as the "Client layer". - - Configuration of Link Layer --------------------------- -The Link Layer is implemented as Linux net devices (struct net_device). +The Link Layer is implemented as Linux network devices (struct net_device). Payload handling and registration is done using standard Linux mechanisms. The CAIF Protocol relies on a loss-less link layer without implementing -- cgit v1.2.3 From 8786395c6956ae16cd04cc8c55e0f5fcd45fa939 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Tue, 26 Jun 2012 21:19:02 -0700 Subject: connector: Move cn_test.c away from NLMSG_PUT(). And use nlmsg_data() while we're here too. Signed-off-by: David S. Miller --- Documentation/connector/cn_test.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c index 7764594778d4..adcca0368d60 100644 --- a/Documentation/connector/cn_test.c +++ b/Documentation/connector/cn_test.c @@ -69,9 +69,13 @@ static int cn_test_want_notify(void) return -ENOMEM; } - nlh = NLMSG_PUT(skb, 0, 0x123, NLMSG_DONE, size - sizeof(*nlh)); + nlh = nlmsg_put(skb, 0, 0x123, NLMSG_DONE, size - sizeof(*nlh), 0); + if (!nlh) { + kfree_skb(skb); + return -EMSGSIZE; + } - msg = (struct cn_msg *)NLMSG_DATA(nlh); + msg = nlmsg_data(nlh); memset(msg, 0, size0); @@ -117,11 +121,6 @@ static int cn_test_want_notify(void) pr_info("request was sent: group=0x%x\n", ctl->group); return 0; - -nlmsg_failure: - pr_err("failed to send %u.%u\n", msg->seq, msg->ack); - kfree_skb(skb); - return -EINVAL; } #endif -- cgit v1.2.3 From c9040af264de8a425b0efabfcc18a6ce47d0f21f Mon Sep 17 00:00:00 2001 From: Shawn Guo Date: Wed, 27 Jun 2012 03:45:23 +0000 Subject: net: fec: phy-reset-gpios is optional The phy-reset-gpios is an optional property for fec device tree boot. Change the binding document to match the driver code. Signed-off-by: Shawn Guo Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/fsl-fec.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt index 7ab9e1a2d8be..0428920aacc7 100644 --- a/Documentation/devicetree/bindings/net/fsl-fec.txt +++ b/Documentation/devicetree/bindings/net/fsl-fec.txt @@ -7,10 +7,10 @@ Required properties: - phy-mode : String, operation mode of the PHY interface. Supported values are: "mii", "gmii", "sgmii", "tbi", "rmii", "rgmii", "rgmii-id", "rgmii-rxid", "rgmii-txid", "rtbi", "smii". -- phy-reset-gpios : Should specify the gpio for phy reset Optional properties: - local-mac-address : 6 bytes, mac address +- phy-reset-gpios : Should specify the gpio for phy reset Example: -- cgit v1.2.3 From a3caad0a160c03b7238a2518fa89abda78adef1e Mon Sep 17 00:00:00 2001 From: Shawn Guo Date: Wed, 27 Jun 2012 03:45:24 +0000 Subject: net: fec: add phy-reset-duration for device tree probe Different boards may require different phy reset duration. Add property phy-reset-duration for device tree probe, so that the boards that need a longer reset duration can specify it in their device tree. Signed-off-by: Shawn Guo Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/fsl-fec.txt | 4 ++++ drivers/net/ethernet/freescale/fec.c | 8 +++++++- 2 files changed, 11 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt index 0428920aacc7..f7a2fefc8ef1 100644 --- a/Documentation/devicetree/bindings/net/fsl-fec.txt +++ b/Documentation/devicetree/bindings/net/fsl-fec.txt @@ -11,6 +11,10 @@ Required properties: Optional properties: - local-mac-address : 6 bytes, mac address - phy-reset-gpios : Should specify the gpio for phy reset +- phy-reset-duration : Reset duration in milliseconds. Should present + only if property "phy-reset-gpios" is available. Missing the property + will have the duration be 1 millisecond. Numbers greater than 1000 are + invalid and 1 millisecond will be used instead. Example: diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c index f174070646f5..dafd797a6069 100644 --- a/drivers/net/ethernet/freescale/fec.c +++ b/drivers/net/ethernet/freescale/fec.c @@ -1507,11 +1507,17 @@ static int __devinit fec_get_phy_mode_dt(struct platform_device *pdev) static void __devinit fec_reset_phy(struct platform_device *pdev) { int err, phy_reset; + int msec = 1; struct device_node *np = pdev->dev.of_node; if (!np) return; + of_property_read_u32(np, "phy-reset-duration", &msec); + /* A sane reset duration should not be longer than 1s */ + if (msec > 1000) + msec = 1; + phy_reset = of_get_named_gpio(np, "phy-reset-gpios", 0); err = devm_gpio_request_one(&pdev->dev, phy_reset, GPIOF_OUT_INIT_LOW, "phy-reset"); @@ -1519,7 +1525,7 @@ static void __devinit fec_reset_phy(struct platform_device *pdev) pr_debug("FEC: failed to get gpio phy-reset: %d\n", err); return; } - msleep(1); + msleep(msec); gpio_set_value(phy_reset, 1); } #else /* CONFIG_OF */ -- cgit v1.2.3 From 6bd47ac2e434611e52027155438d7b4ad3c76bdb Mon Sep 17 00:00:00 2001 From: David Daney Date: Wed, 27 Jun 2012 07:33:36 +0000 Subject: netdev/phy/of: Handle IEEE802.3 clause 45 Ethernet PHYs in of_mdiobus_register() Define two new "compatible" values for Ethernet PHYs. "ethernet-phy-ieee802.3-c22" and "ethernet-phy-ieee802.3-c45" are used to indicate a PHY uses the corresponding protocol. If a PHY is "compatible" with "ethernet-phy-ieee802.3-c45", we indicate this so that get_phy_device() can properly probe the device. If get_phy_device() fails, it was probably due to failing the probe of the PHY identifier registers. Since we have the device tree telling us the PHY exists, go ahead and add it anyhow with a phy_id of zero. There may be a driver match based on the "compatible" property. Signed-off-by: David Daney Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/phy.txt | 12 +++++++++++- drivers/of/of_mdio.c | 16 ++++++++++++---- 2 files changed, 23 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt index bb8c742eb8c5..7cd18fbfcf71 100644 --- a/Documentation/devicetree/bindings/net/phy.txt +++ b/Documentation/devicetree/bindings/net/phy.txt @@ -14,10 +14,20 @@ Required properties: - linux,phandle : phandle for this node; likely referenced by an ethernet controller node. +Optional Properties: + +- compatible: Compatible list, may contain + "ethernet-phy-ieee802.3-c22" or "ethernet-phy-ieee802.3-c45" for + PHYs that implement IEEE802.3 clause 22 or IEEE802.3 clause 45 + specifications. If neither of these are specified, the default is to + assume clause 22. The compatible list may also contain other + elements. + Example: ethernet-phy@0 { - linux,phandle = <2452000> + compatible = "ethernet-phy-ieee802.3-c22"; + linux,phandle = <2452000>; interrupt-parent = <40000>; interrupts = <35 1>; reg = <0>; diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c index 6c24cad322df..8e6c25f35040 100644 --- a/drivers/of/of_mdio.c +++ b/drivers/of/of_mdio.c @@ -57,6 +57,7 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np) const __be32 *paddr; u32 addr; int len; + bool is_c45; /* A PHY must have a reg property in the range [0-31] */ paddr = of_get_property(child, "reg", &len); @@ -79,11 +80,18 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np) mdio->irq[addr] = PHY_POLL; } - phy = get_phy_device(mdio, addr, false); + is_c45 = of_device_is_compatible(child, + "ethernet-phy-ieee802.3-c45"); + phy = get_phy_device(mdio, addr, is_c45); + if (!phy || IS_ERR(phy)) { - dev_err(&mdio->dev, "error probing PHY at address %i\n", - addr); - continue; + phy = phy_device_create(mdio, addr, 0, false, NULL); + if (!phy || IS_ERR(phy)) { + dev_err(&mdio->dev, + "error creating PHY at address %i\n", + addr); + continue; + } } /* Associate the OF node with the device structure so it -- cgit v1.2.3 From e9976d7c96423ac1991396aa82335206ded55bcf Mon Sep 17 00:00:00 2001 From: David Daney Date: Wed, 27 Jun 2012 07:33:38 +0000 Subject: netdev/phy: Add driver for Broadcom BCM87XX 10G Ethernet PHYs Add a driver for BCM8706 and BCM8727 devices. These are a 10Gig PHYs which use MII_ADDR_C45 addressing. They are always 10G full duplex, so there is no autonegotiation. All we do is report link state and send interrupts when it changes. If the PHY has a device tree of_node associated with it, the "broadcom,c45-reg-init" property is used to supply register initialization values when config_init() is called. Signed-off-by: David Daney Signed-off-by: David S. Miller --- .../devicetree/bindings/net/broadcom-bcm87xx.txt | 29 +++ drivers/net/phy/Kconfig | 5 + drivers/net/phy/Makefile | 1 + drivers/net/phy/bcm87xx.c | 238 +++++++++++++++++++++ 4 files changed, 273 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt create mode 100644 drivers/net/phy/bcm87xx.c (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt b/Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt new file mode 100644 index 000000000000..7c86d5e28a0e --- /dev/null +++ b/Documentation/devicetree/bindings/net/broadcom-bcm87xx.txt @@ -0,0 +1,29 @@ +The Broadcom BCM87XX devices are a family of 10G Ethernet PHYs. They +have these bindings in addition to the standard PHY bindings. + +Compatible: Should contain "broadcom,bcm8706" or "broadcom,bcm8727" and + "ethernet-phy-ieee802.3-c45" + +Optional Properties: + +- broadcom,c45-reg-init : one of more sets of 4 cells. The first cell + is the MDIO Manageable Device (MMD) address, the second a register + address within the MMD, the third cell contains a mask to be ANDed + with the existing register value, and the fourth cell is ORed with + he result to yield the new register value. If the third cell has a + value of zero, no read of the existing value is performed. + +Example: + + ethernet-phy@5 { + reg = <5>; + compatible = "broadcom,bcm8706", "ethernet-phy-ieee802.3-c45"; + interrupt-parent = <&gpio>; + interrupts = <12 8>; /* Pin 12, active low */ + /* + * Set PMD Digital Control Register for + * GPIO[1] Tx/Rx + * GPIO[0] R64 Sync Acquired + */ + broadcom,c45-reg-init = <1 0xc808 0xff8f 0x70>; + }; diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index 944cdfb80fe4..3090dc65a6f1 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -67,6 +67,11 @@ config BCM63XX_PHY ---help--- Currently supports the 6348 and 6358 PHYs. +config BCM87XX_PHY + tristate "Driver for Broadcom BCM8706 and BCM8727 PHYs" + help + Currently supports the BCM8706 and BCM8727 10G Ethernet PHYs. + config ICPLUS_PHY tristate "Drivers for ICPlus PHYs" ---help--- diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index f51af688ef8b..6d2dc6c94f2e 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -12,6 +12,7 @@ obj-$(CONFIG_SMSC_PHY) += smsc.o obj-$(CONFIG_VITESSE_PHY) += vitesse.o obj-$(CONFIG_BROADCOM_PHY) += broadcom.o obj-$(CONFIG_BCM63XX_PHY) += bcm63xx.o +obj-$(CONFIG_BCM87XX_PHY) += bcm87xx.o obj-$(CONFIG_ICPLUS_PHY) += icplus.o obj-$(CONFIG_REALTEK_PHY) += realtek.o obj-$(CONFIG_LSI_ET1011C_PHY) += et1011c.o diff --git a/drivers/net/phy/bcm87xx.c b/drivers/net/phy/bcm87xx.c new file mode 100644 index 000000000000..f5f0562934db --- /dev/null +++ b/drivers/net/phy/bcm87xx.c @@ -0,0 +1,238 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Copyright (C) 2011 - 2012 Cavium, Inc. + */ + +#include +#include +#include + +#define PHY_ID_BCM8706 0x0143bdc1 +#define PHY_ID_BCM8727 0x0143bff0 + +#define BCM87XX_PMD_RX_SIGNAL_DETECT (MII_ADDR_C45 | 0x1000a) +#define BCM87XX_10GBASER_PCS_STATUS (MII_ADDR_C45 | 0x30020) +#define BCM87XX_XGXS_LANE_STATUS (MII_ADDR_C45 | 0x40018) + +#define BCM87XX_LASI_CONTROL (MII_ADDR_C45 | 0x39002) +#define BCM87XX_LASI_STATUS (MII_ADDR_C45 | 0x39005) + +#if IS_ENABLED(CONFIG_OF_MDIO) +/* Set and/or override some configuration registers based on the + * marvell,reg-init property stored in the of_node for the phydev. + * + * broadcom,c45-reg-init = ,...; + * + * There may be one or more sets of : + * + * devid: which sub-device to use. + * reg: the register. + * mask: if non-zero, ANDed with existing register value. + * value: ORed with the masked value and written to the regiser. + * + */ +static int bcm87xx_of_reg_init(struct phy_device *phydev) +{ + const __be32 *paddr; + const __be32 *paddr_end; + int len, ret; + + if (!phydev->dev.of_node) + return 0; + + paddr = of_get_property(phydev->dev.of_node, + "broadcom,c45-reg-init", &len); + if (!paddr) + return 0; + + paddr_end = paddr + (len /= sizeof(*paddr)); + + ret = 0; + + while (paddr + 3 < paddr_end) { + u16 devid = be32_to_cpup(paddr++); + u16 reg = be32_to_cpup(paddr++); + u16 mask = be32_to_cpup(paddr++); + u16 val_bits = be32_to_cpup(paddr++); + int val; + u32 regnum = MII_ADDR_C45 | (devid << 16) | reg; + val = 0; + if (mask) { + val = phy_read(phydev, regnum); + if (val < 0) { + ret = val; + goto err; + } + val &= mask; + } + val |= val_bits; + + ret = phy_write(phydev, regnum, val); + if (ret < 0) + goto err; + } +err: + return ret; +} +#else +static int bcm87xx_of_reg_init(struct phy_device *phydev) +{ + return 0; +} +#endif /* CONFIG_OF_MDIO */ + +static int bcm87xx_config_init(struct phy_device *phydev) +{ + phydev->supported = SUPPORTED_10000baseR_FEC; + phydev->advertising = ADVERTISED_10000baseR_FEC; + phydev->state = PHY_NOLINK; + + bcm87xx_of_reg_init(phydev); + + return 0; +} + +static int bcm87xx_config_aneg(struct phy_device *phydev) +{ + return -EINVAL; +} + +static int bcm87xx_read_status(struct phy_device *phydev) +{ + int rx_signal_detect; + int pcs_status; + int xgxs_lane_status; + + rx_signal_detect = phy_read(phydev, BCM87XX_PMD_RX_SIGNAL_DETECT); + if (rx_signal_detect < 0) + return rx_signal_detect; + + if ((rx_signal_detect & 1) == 0) + goto no_link; + + pcs_status = phy_read(phydev, BCM87XX_10GBASER_PCS_STATUS); + if (pcs_status < 0) + return pcs_status; + + if ((pcs_status & 1) == 0) + goto no_link; + + xgxs_lane_status = phy_read(phydev, BCM87XX_XGXS_LANE_STATUS); + if (xgxs_lane_status < 0) + return xgxs_lane_status; + + if ((xgxs_lane_status & 0x1000) == 0) + goto no_link; + + phydev->speed = 10000; + phydev->link = 1; + phydev->duplex = 1; + return 0; + +no_link: + phydev->link = 0; + return 0; +} + +static int bcm87xx_config_intr(struct phy_device *phydev) +{ + int reg, err; + + reg = phy_read(phydev, BCM87XX_LASI_CONTROL); + + if (reg < 0) + return reg; + + if (phydev->interrupts == PHY_INTERRUPT_ENABLED) + reg |= 1; + else + reg &= ~1; + + err = phy_write(phydev, BCM87XX_LASI_CONTROL, reg); + return err; +} + +static int bcm87xx_did_interrupt(struct phy_device *phydev) +{ + int reg; + + reg = phy_read(phydev, BCM87XX_LASI_STATUS); + + if (reg < 0) { + dev_err(&phydev->dev, + "Error: Read of BCM87XX_LASI_STATUS failed: %d\n", reg); + return 0; + } + return (reg & 1) != 0; +} + +static int bcm87xx_ack_interrupt(struct phy_device *phydev) +{ + /* Reading the LASI status clears it. */ + bcm87xx_did_interrupt(phydev); + return 0; +} + +static int bcm8706_match_phy_device(struct phy_device *phydev) +{ + return phydev->c45_ids.device_ids[4] == PHY_ID_BCM8706; +} + +static int bcm8727_match_phy_device(struct phy_device *phydev) +{ + return phydev->c45_ids.device_ids[4] == PHY_ID_BCM8727; +} + +static struct phy_driver bcm8706_driver = { + .phy_id = PHY_ID_BCM8706, + .phy_id_mask = 0xffffffff, + .name = "Broadcom BCM8706", + .flags = PHY_HAS_INTERRUPT, + .config_init = bcm87xx_config_init, + .config_aneg = bcm87xx_config_aneg, + .read_status = bcm87xx_read_status, + .ack_interrupt = bcm87xx_ack_interrupt, + .config_intr = bcm87xx_config_intr, + .did_interrupt = bcm87xx_did_interrupt, + .match_phy_device = bcm8706_match_phy_device, + .driver = { .owner = THIS_MODULE }, +}; + +static struct phy_driver bcm8727_driver = { + .phy_id = PHY_ID_BCM8727, + .phy_id_mask = 0xffffffff, + .name = "Broadcom BCM8727", + .flags = PHY_HAS_INTERRUPT, + .config_init = bcm87xx_config_init, + .config_aneg = bcm87xx_config_aneg, + .read_status = bcm87xx_read_status, + .ack_interrupt = bcm87xx_ack_interrupt, + .config_intr = bcm87xx_config_intr, + .did_interrupt = bcm87xx_did_interrupt, + .match_phy_device = bcm8727_match_phy_device, + .driver = { .owner = THIS_MODULE }, +}; + +static int __init bcm87xx_init(void) +{ + int ret; + + ret = phy_driver_register(&bcm8706_driver); + if (ret) + goto err; + + ret = phy_driver_register(&bcm8727_driver); +err: + return ret; +} +module_init(bcm87xx_init); + +static void __exit bcm87xx_exit(void) +{ + phy_driver_unregister(&bcm8706_driver); + phy_driver_unregister(&bcm8727_driver); +} +module_exit(bcm87xx_exit); -- cgit v1.2.3 From c801e3cc1925e02fa7213889306d4d77e6ad1550 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 30 Jun 2012 22:39:27 -0700 Subject: ipv4: Clarify in docs that accept_local requires rp_filter. Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 99d0e0504d6e..47b6c79e9b05 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -857,9 +857,14 @@ accept_source_route - BOOLEAN FALSE (host) accept_local - BOOLEAN - Accept packets with local source addresses. In combination with - suitable routing, this can be used to direct packets between two - local interfaces over the wire and have them accepted properly. + Accept packets with local source addresses. In combination + with suitable routing, this can be used to direct packets + between two local interfaces over the wire and have them + accepted properly. + + rp_filter must be set to a non-zero value in order for + accept_local to have an effect. + default FALSE route_localnet - BOOLEAN -- cgit v1.2.3 From 0ec2ccd0804ebb57a860c59d056a3f420c4f8028 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Wed, 27 Jun 2012 21:14:36 +0000 Subject: stmmac: update the driver Documentation and add EEE This patch updates the stmmac's documentation adding some missing files in the section used to describe the internal driver's structure. Also the patch adds a new section to describe the EEE support. Signed-off-by: Giuseppe Cavallaro Signed-off-by: David S. Miller --- Documentation/networking/stmmac.txt | 36 ++++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 5cb9a1972460..c676b9cedbd0 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -257,9 +257,11 @@ reset procedure etc). o Makefile o stmmac_main.c: main network device driver; o stmmac_mdio.c: mdio functions; + o stmmac_pci: PCI driver; + o stmmac_platform.c: platform driver o stmmac_ethtool.c: ethtool support; o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts - Only tested on ST40 platforms based. + (only tested on ST40 platforms based); o stmmac.h: private driver structure; o common.h: common definitions and VFTs; o descs.h: descriptor structure definitions; @@ -269,9 +271,11 @@ reset procedure etc). o dwmac100_core: MAC 100 core and dma code; o dwmac100_dma.c: dma funtions for the MAC chip; o dwmac1000.h: specific header file for the MAC; - o dwmac_lib.c: generic DMA functions shared among chips - o enh_desc.c: functions for handling enhanced descriptors - o norm_desc.c: functions for handling normal descriptors + o dwmac_lib.c: generic DMA functions shared among chips; + o enh_desc.c: functions for handling enhanced descriptors; + o norm_desc.c: functions for handling normal descriptors; + o chain_mode.c/ring_mode.c:: functions to manage RING/CHAINED modes; + o mmc_core.c/mmc.h: Management MAC Counters; 5) Debug Information @@ -304,7 +308,27 @@ All these are only useful during the developing stage and should never enabled inside the code for general usage. In fact, these can generate an huge amount of debug messages. -6) TODO: +6) Energy Efficient Ethernet + +Energy Efficient Ethernet(EEE) enables IEEE 802.3 MAC sublayer along +with a family of Physical layer to operate in the Low power Idle(LPI) +mode. The EEE mode supports the IEEE 802.3 MAC operation at 100Mbps, +1000Mbps & 10Gbps. + +The LPI mode allows power saving by switching off parts of the +communication device functionality when there is no data to be +transmitted & received. The system on both the side of the link can +disable some functionalities & save power during the period of low-link +utilization. The MAC controls whether the system should enter or exit +the LPI mode & communicate this to PHY. + +As soon as the interface is opened, the driver verifies if the EEE can +be supported. This is done by looking at both the DMA HW capability +register and the PHY devices MCD registers. +To enter in Tx LPI mode the driver needs to have a software timer +that enable and disable the LPI mode when there is nothing to be +transmitted. + +7) TODO: o XGMAC is not supported. - o Add the EEE - Energy Efficient Ethernet o Add the PTP - precision time protocol -- cgit v1.2.3 From 8b7b736fbfa1307fd294ca1fca69316f1b07a806 Mon Sep 17 00:00:00 2001 From: Shawn Guo Date: Tue, 26 Jun 2012 16:49:22 +0800 Subject: net: flexcan: clock-frequency is optional for device tree probe The property clock-frequency is optional for device tree probe. When it's absent, the flexcan driver will try to get the frequency from clk system by calling clk_get_rate. Signed-off-by: Shawn Guo Acked-by: Dong Aisheng Signed-off-by: Marc Kleine-Budde --- Documentation/devicetree/bindings/net/can/fsl-flexcan.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt index f31b686d4556..8ff324eaa889 100644 --- a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt +++ b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt @@ -11,6 +11,9 @@ Required properties: - reg : Offset and length of the register set for this device - interrupts : Interrupt tuple for this device + +Optional properties: + - clock-frequency : The oscillator frequency driving the flexcan device Example: -- cgit v1.2.3 From f72b85b8eb6657fae95ac8f5cb20954b4d87a520 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 6 Jul 2012 19:49:54 +0200 Subject: mac80211: remove ieee80211_key_removed This API call was intended to be used by drivers if they want to optimize key handling by removing one key when another is added. Remove it since no driver is using it. If needed, it can always be added back. Signed-off-by: Johannes Berg --- Documentation/DocBook/80211.tmpl | 1 - include/net/mac80211.h | 16 ---------------- net/mac80211/key.c | 20 -------------------- 3 files changed, 37 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl index f3e214f9e256..42e7f030cb16 100644 --- a/Documentation/DocBook/80211.tmpl +++ b/Documentation/DocBook/80211.tmpl @@ -404,7 +404,6 @@ !Finclude/net/mac80211.h ieee80211_get_tkip_p1k !Finclude/net/mac80211.h ieee80211_get_tkip_p1k_iv !Finclude/net/mac80211.h ieee80211_get_tkip_p2k -!Finclude/net/mac80211.h ieee80211_key_removed diff --git a/include/net/mac80211.h b/include/net/mac80211.h index e3fa90ce9ecb..c5dbb46debb0 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -3591,22 +3591,6 @@ void ieee80211_chswitch_done(struct ieee80211_vif *vif, bool success); void ieee80211_request_smps(struct ieee80211_vif *vif, enum ieee80211_smps_mode smps_mode); -/** - * ieee80211_key_removed - disable hw acceleration for key - * @key_conf: The key hw acceleration should be disabled for - * - * This allows drivers to indicate that the given key has been - * removed from hardware acceleration, due to a new key that - * was added. Don't use this if the key can continue to be used - * for TX, if the key restriction is on RX only it is permitted - * to keep the key for TX only and not call this function. - * - * Due to locking constraints, it may only be called during - * @set_key. This function must be allowed to sleep, and the - * key it tries to disable may still be used until it returns. - */ -void ieee80211_key_removed(struct ieee80211_key_conf *key_conf); - /** * ieee80211_ready_on_channel - notification of remain-on-channel start * @hw: pointer as obtained from ieee80211_alloc_hw() diff --git a/net/mac80211/key.c b/net/mac80211/key.c index b3b7e526e245..7ae678ba5d67 100644 --- a/net/mac80211/key.c +++ b/net/mac80211/key.c @@ -194,26 +194,6 @@ static void ieee80211_key_disable_hw_accel(struct ieee80211_key *key) key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; } -void ieee80211_key_removed(struct ieee80211_key_conf *key_conf) -{ - struct ieee80211_key *key; - - key = container_of(key_conf, struct ieee80211_key, conf); - - might_sleep(); - assert_key_lock(key->local); - - key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; - - /* - * Flush TX path to avoid attempts to use this key - * after this function returns. Until then, drivers - * must be prepared to handle the key. - */ - synchronize_rcu(); -} -EXPORT_SYMBOL_GPL(ieee80211_key_removed); - static void __ieee80211_set_default_key(struct ieee80211_sub_if_data *sdata, int idx, bool uni, bool multi) { -- cgit v1.2.3 From 36516268fd372294f5f4f26e0538ee70a1b5b9e7 Mon Sep 17 00:00:00 2001 From: Eric Lapuyade Date: Thu, 3 May 2012 11:49:30 +0200 Subject: NFC: Error management documentation Signed-off-by: Eric Lapuyade Signed-off-by: Samuel Ortiz --- Documentation/nfc/nfc-hci.txt | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) (limited to 'Documentation') diff --git a/Documentation/nfc/nfc-hci.txt b/Documentation/nfc/nfc-hci.txt index 320f9336c781..89a339c9b079 100644 --- a/Documentation/nfc/nfc-hci.txt +++ b/Documentation/nfc/nfc-hci.txt @@ -178,3 +178,36 @@ ANY_GET_PARAMETER to the reader A gate to get information on the target that was discovered). Typically, such an event will be propagated to NFC Core from MSGRXWQ context. + +Error management +---------------- + +Errors that occur synchronously with the execution of an NFC Core request are +simply returned as the execution result of the request. These are easy. + +Errors that occur asynchronously (e.g. in a background protocol handling thread) +must be reported such that upper layers don't stay ignorant that something +went wrong below and know that expected events will probably never happen. +Handling of these errors is done as follows: + +- driver (pn544) fails to deliver an incoming frame: it stores the error such +that any subsequent call to the driver will result in this error. Then it calls +the standard nfc_shdlc_recv_frame() with a NULL argument to report the problem +above. shdlc stores a EREMOTEIO sticky status, which will trigger SMW to +report above in turn. + +- SMW is basically a background thread to handle incoming and outgoing shdlc +frames. This thread will also check the shdlc sticky status and report to HCI +when it discovers it is not able to run anymore because of an unrecoverable +error that happened within shdlc or below. If the problem occurs during shdlc +connection, the error is reported through the connect completion. + +- HCI: if an internal HCI error happens (frame is lost), or HCI is reported an +error from a lower layer, HCI will either complete the currently executing +command with that error, or notify NFC Core directly if no command is executing. + +- NFC Core: when NFC Core is notified of an error from below and polling is +active, it will send a tag discovered event with an empty tag list to the user +space to let it know that the poll operation will never be able to detect a tag. +If polling is not active and the error was sticky, lower levels will return it +at next invocation. -- cgit v1.2.3 From c0589fa78ae534acb741370872c4e13578d2f164 Mon Sep 17 00:00:00 2001 From: Jon Mason Date: Mon, 9 Jul 2012 14:07:57 +0000 Subject: vxge/s2io: remove dead URLs URLs to neterion.com and s2io.com no longer resolve. Remove all references to these URLs in the driver source and documentation. Signed-off-by: Jon Mason Signed-off-by: David S. Miller --- Documentation/networking/s2io.txt | 14 ++------------ Documentation/networking/vxge.txt | 7 ------- MAINTAINERS | 2 -- drivers/net/ethernet/neterion/vxge/vxge-main.c | 4 +--- 4 files changed, 3 insertions(+), 24 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/s2io.txt index 4be0c039edbc..d2a9f43b5546 100644 --- a/Documentation/networking/s2io.txt +++ b/Documentation/networking/s2io.txt @@ -136,16 +136,6 @@ For more information, please review the AMD8131 errata at http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/ 26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf -6. Available Downloads -Neterion "s2io" driver in Red Hat and Suse 2.6-based distributions is kept up -to date, also the latest "s2io" code (including support for 2.4 kernels) is -available via "Support" link on the Neterion site: http://www.neterion.com. - -For Xframe User Guide (Programming manual), visit ftp site ns1.s2io.com, -user: linuxdocs password: HALdocs - -7. Support +6. Support For further support please contact either your 10GbE Xframe NIC vendor (IBM, -HP, SGI etc.) or click on the "Support" link on the Neterion site: -http://www.neterion.com. - +HP, SGI etc.) diff --git a/Documentation/networking/vxge.txt b/Documentation/networking/vxge.txt index d2e2997e6fa0..bb76c667a476 100644 --- a/Documentation/networking/vxge.txt +++ b/Documentation/networking/vxge.txt @@ -91,10 +91,3 @@ v) addr_learn_en virtualization environment. Valid range: 0,1 (disabled, enabled respectively) Default: 0 - -4) Troubleshooting: -------------------- - -To resolve an issue with the source code or X3100 series adapter, please collect -the statistics, register dumps using ethool, relevant logs and email them to -support@neterion.com. diff --git a/MAINTAINERS b/MAINTAINERS index 8da1373bd6d1..ce7398e1e1ec 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4638,8 +4638,6 @@ F: net/sched/sch_netem.c NETERION 10GbE DRIVERS (s2io/vxge) M: Jon Mason L: netdev@vger.kernel.org -W: http://trac.neterion.com/cgi-bin/trac.cgi/wiki/Linux?Anonymous -W: http://trac.neterion.com/cgi-bin/trac.cgi/wiki/X3100Linux?Anonymous S: Supported F: Documentation/networking/s2io.txt F: Documentation/networking/vxge.txt diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c index 2fd1edbc5e0e..4e20c5f02712 100644 --- a/drivers/net/ethernet/neterion/vxge/vxge-main.c +++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c @@ -4261,9 +4261,7 @@ static int vxge_probe_fw_update(struct vxgedev *vdev) if (VXGE_FW_VER(VXGE_CERT_FW_VER_MAJOR, VXGE_CERT_FW_VER_MINOR, 0) > VXGE_FW_VER(maj, min, 0)) { vxge_debug_init(VXGE_ERR, "%s: Firmware %d.%d.%d is too old to" - " be used with this driver.\n" - "Please get the latest version from " - "ftp://ftp.s2io.com/pub/X3100-Drivers/FIRMWARE", + " be used with this driver.", VXGE_DRIVER_NAME, maj, min, bld); return -EINVAL; } -- cgit v1.2.3 From 46d3ceabd8d98ed0ad10f20c595ca784e34786c5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 11 Jul 2012 05:50:31 +0000 Subject: tcp: TCP Small Queues This introduce TSQ (TCP Small Queues) TSQ goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. sk->sk_wmem_alloc not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time. TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2 : It can help to reduce latencies of high prio packets, having smaller TSO packets. This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs. Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) < 8ms on 100Mbit (instead of 132 ms) I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes. As skb destructor cannot restart xmit itself ( as qdisc lock might be taken at this point ), we delegate the work to a tasklet. We use one tasklest per cpu for performance reasons. If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments. [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable [2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by: Eric Dumazet Cc: Dave Taht Cc: Tom Herbert Cc: Matt Mathis Cc: Yuchung Cheng Cc: Nandita Dukkipati Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 14 +++ include/linux/tcp.h | 9 ++ include/net/sock.h | 2 + include/net/tcp.h | 4 + net/core/sock.c | 4 + net/ipv4/sysctl_net_ipv4.c | 7 ++ net/ipv4/tcp.c | 6 ++ net/ipv4/tcp_ipv4.c | 1 + net/ipv4/tcp_minisocks.c | 1 + net/ipv4/tcp_output.c | 154 ++++++++++++++++++++++++++++++++- net/ipv6/tcp_ipv6.c | 1 + 11 files changed, 202 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 47b6c79e9b05..e20c17a7d34e 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN Documentation/networking/tcp-thin.txt Default: 0 +tcp_limit_output_bytes - INTEGER + Controls TCP Small Queue limit per tcp socket. + TCP bulk sender tends to increase packets in flight until it + gets losses notifications. With SNDBUF autotuning, this can + result in a large amount of packets queued in qdisc/device + on the local machine, hurting latency of other flows, for + typical pfifo_fast qdiscs. + tcp_limit_output_bytes limits the number of bytes on qdisc + or device to reduce artificial RTT/cwnd and reduce bufferbloat. + Note: For GSO/TSO enabled flows, we try to have at least two + packets in flight. Reducing tcp_limit_output_bytes might also + reduce the size of individual GSO packet (64KB being the max) + Default: 131072 + UDP variables: udp_mem - vector of 3 INTEGERs: min, pressure, max diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 2de9cf46f9fc..1888169e07c7 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -339,6 +339,9 @@ struct tcp_sock { u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */ u32 lsndtime; /* timestamp of last sent data packet (for restart window) */ + struct list_head tsq_node; /* anchor in tsq_tasklet.head list */ + unsigned long tsq_flags; + /* Data for direct copy to user */ struct { struct sk_buff_head prequeue; @@ -494,6 +497,12 @@ struct tcp_sock { struct tcp_cookie_values *cookie_values; }; +enum tsq_flags { + TSQ_THROTTLED, + TSQ_QUEUED, + TSQ_OWNED, /* tcp_tasklet_func() found socket was locked */ +}; + static inline struct tcp_sock *tcp_sk(const struct sock *sk) { return (struct tcp_sock *)sk; diff --git a/include/net/sock.h b/include/net/sock.h index dcb54a0793ec..88de092df50f 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -858,6 +858,8 @@ struct proto { int (*backlog_rcv) (struct sock *sk, struct sk_buff *skb); + void (*release_cb)(struct sock *sk); + /* Keeping track of sk's, looking them up, and port selection methods. */ void (*hash)(struct sock *sk); void (*unhash)(struct sock *sk); diff --git a/include/net/tcp.h b/include/net/tcp.h index 3618fefae049..439984b9af49 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -253,6 +253,7 @@ extern int sysctl_tcp_cookie_size; extern int sysctl_tcp_thin_linear_timeouts; extern int sysctl_tcp_thin_dupack; extern int sysctl_tcp_early_retrans; +extern int sysctl_tcp_limit_output_bytes; extern atomic_long_t tcp_memory_allocated; extern struct percpu_counter tcp_sockets_allocated; @@ -321,6 +322,8 @@ extern struct proto tcp_prot; extern void tcp_init_mem(struct net *net); +extern void tcp_tasklet_init(void); + extern void tcp_v4_err(struct sk_buff *skb, u32); extern void tcp_shutdown (struct sock *sk, int how); @@ -334,6 +337,7 @@ extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size); extern int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags); +extern void tcp_release_cb(struct sock *sk); extern int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb, const struct tcphdr *th, unsigned int len); diff --git a/net/core/sock.c b/net/core/sock.c index 929bdcc2383b..24039ac12426 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2159,6 +2159,10 @@ void release_sock(struct sock *sk) spin_lock_bh(&sk->sk_lock.slock); if (sk->sk_backlog.tail) __release_sock(sk); + + if (sk->sk_prot->release_cb) + sk->sk_prot->release_cb(sk); + sk->sk_lock.owned = 0; if (waitqueue_active(&sk->sk_lock.wq)) wake_up(&sk->sk_lock.wq); diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 12aa0c5867c4..70730f7aeafe 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -598,6 +598,13 @@ static struct ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = proc_dointvec }, + { + .procname = "tcp_limit_output_bytes", + .data = &sysctl_tcp_limit_output_bytes, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec + }, #ifdef CONFIG_NET_DMA { .procname = "tcp_dma_copybreak", diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index d902da96d154..4252cd8f39fd 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -376,6 +376,7 @@ void tcp_init_sock(struct sock *sk) skb_queue_head_init(&tp->out_of_order_queue); tcp_init_xmit_timers(sk); tcp_prequeue_init(tp); + INIT_LIST_HEAD(&tp->tsq_node); icsk->icsk_rto = TCP_TIMEOUT_INIT; tp->mdev = TCP_TIMEOUT_INIT; @@ -796,6 +797,10 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now, inet_csk(sk)->icsk_ext_hdr_len - tp->tcp_header_len); + /* TSQ : try to have two TSO segments in flight */ + xmit_size_goal = min_t(u32, xmit_size_goal, + sysctl_tcp_limit_output_bytes >> 1); + xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal); /* We try hard to avoid divides here */ @@ -3574,4 +3579,5 @@ void __init tcp_init(void) tcp_secret_primary = &tcp_secret_one; tcp_secret_retiring = &tcp_secret_two; tcp_secret_secondary = &tcp_secret_two; + tcp_tasklet_init(); } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index ddefd39ac0cf..01545a3fc0f2 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2588,6 +2588,7 @@ struct proto tcp_prot = { .sendmsg = tcp_sendmsg, .sendpage = tcp_sendpage, .backlog_rcv = tcp_v4_do_rcv, + .release_cb = tcp_release_cb, .hash = inet_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 65608863fdee..c66f2ede160e 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -424,6 +424,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, treq->snt_isn + 1 + tcp_s_data_size(oldtp); tcp_prequeue_init(newtp); + INIT_LIST_HEAD(&newtp->tsq_node); tcp_init_wl(newtp, treq->rcv_isn); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index c465d3e51e28..03854abfd9d8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -50,6 +50,9 @@ int sysctl_tcp_retrans_collapse __read_mostly = 1; */ int sysctl_tcp_workaround_signed_windows __read_mostly = 0; +/* Default TSQ limit of two TSO segments */ +int sysctl_tcp_limit_output_bytes __read_mostly = 131072; + /* This limits the percentage of the congestion window which we * will allow a single TSO frame to consume. Building TSO frames * which are too large can cause TCP streams to be bursty. @@ -65,6 +68,8 @@ int sysctl_tcp_slow_start_after_idle __read_mostly = 1; int sysctl_tcp_cookie_size __read_mostly = 0; /* TCP_COOKIE_MAX */ EXPORT_SYMBOL_GPL(sysctl_tcp_cookie_size); +static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, + int push_one, gfp_t gfp); /* Account for new data that has been sent to the network. */ static void tcp_event_new_data_sent(struct sock *sk, const struct sk_buff *skb) @@ -783,6 +788,140 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb return size; } + +/* TCP SMALL QUEUES (TSQ) + * + * TSQ goal is to keep small amount of skbs per tcp flow in tx queues (qdisc+dev) + * to reduce RTT and bufferbloat. + * We do this using a special skb destructor (tcp_wfree). + * + * Its important tcp_wfree() can be replaced by sock_wfree() in the event skb + * needs to be reallocated in a driver. + * The invariant being skb->truesize substracted from sk->sk_wmem_alloc + * + * Since transmit from skb destructor is forbidden, we use a tasklet + * to process all sockets that eventually need to send more skbs. + * We use one tasklet per cpu, with its own queue of sockets. + */ +struct tsq_tasklet { + struct tasklet_struct tasklet; + struct list_head head; /* queue of tcp sockets */ +}; +static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet); + +/* + * One tasklest per cpu tries to send more skbs. + * We run in tasklet context but need to disable irqs when + * transfering tsq->head because tcp_wfree() might + * interrupt us (non NAPI drivers) + */ +static void tcp_tasklet_func(unsigned long data) +{ + struct tsq_tasklet *tsq = (struct tsq_tasklet *)data; + LIST_HEAD(list); + unsigned long flags; + struct list_head *q, *n; + struct tcp_sock *tp; + struct sock *sk; + + local_irq_save(flags); + list_splice_init(&tsq->head, &list); + local_irq_restore(flags); + + list_for_each_safe(q, n, &list) { + tp = list_entry(q, struct tcp_sock, tsq_node); + list_del(&tp->tsq_node); + + sk = (struct sock *)tp; + bh_lock_sock(sk); + + if (!sock_owned_by_user(sk)) { + if ((1 << sk->sk_state) & + (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | + TCPF_CLOSING | TCPF_CLOSE_WAIT)) + tcp_write_xmit(sk, + tcp_current_mss(sk), + 0, 0, + GFP_ATOMIC); + } else { + /* defer the work to tcp_release_cb() */ + set_bit(TSQ_OWNED, &tp->tsq_flags); + } + bh_unlock_sock(sk); + + clear_bit(TSQ_QUEUED, &tp->tsq_flags); + sk_free(sk); + } +} + +/** + * tcp_release_cb - tcp release_sock() callback + * @sk: socket + * + * called from release_sock() to perform protocol dependent + * actions before socket release. + */ +void tcp_release_cb(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + + if (test_and_clear_bit(TSQ_OWNED, &tp->tsq_flags)) { + if ((1 << sk->sk_state) & + (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | + TCPF_CLOSING | TCPF_CLOSE_WAIT)) + tcp_write_xmit(sk, + tcp_current_mss(sk), + 0, 0, + GFP_ATOMIC); + } +} +EXPORT_SYMBOL(tcp_release_cb); + +void __init tcp_tasklet_init(void) +{ + int i; + + for_each_possible_cpu(i) { + struct tsq_tasklet *tsq = &per_cpu(tsq_tasklet, i); + + INIT_LIST_HEAD(&tsq->head); + tasklet_init(&tsq->tasklet, + tcp_tasklet_func, + (unsigned long)tsq); + } +} + +/* + * Write buffer destructor automatically called from kfree_skb. + * We cant xmit new skbs from this context, as we might already + * hold qdisc lock. + */ +void tcp_wfree(struct sk_buff *skb) +{ + struct sock *sk = skb->sk; + struct tcp_sock *tp = tcp_sk(sk); + + if (test_and_clear_bit(TSQ_THROTTLED, &tp->tsq_flags) && + !test_and_set_bit(TSQ_QUEUED, &tp->tsq_flags)) { + unsigned long flags; + struct tsq_tasklet *tsq; + + /* Keep a ref on socket. + * This last ref will be released in tcp_tasklet_func() + */ + atomic_sub(skb->truesize - 1, &sk->sk_wmem_alloc); + + /* queue this socket to tasklet queue */ + local_irq_save(flags); + tsq = &__get_cpu_var(tsq_tasklet); + list_add(&tp->tsq_node, &tsq->head); + tasklet_schedule(&tsq->tasklet); + local_irq_restore(flags); + } else { + sock_wfree(skb); + } +} + /* This routine actually transmits TCP packets queued in by * tcp_do_sendmsg(). This is used by both the initial * transmission and possible later retransmissions. @@ -844,7 +983,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, skb_push(skb, tcp_header_size); skb_reset_transport_header(skb); - skb_set_owner_w(skb, sk); + + skb_orphan(skb); + skb->sk = sk; + skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ? + tcp_wfree : sock_wfree; + atomic_add(skb->truesize, &sk->sk_wmem_alloc); /* Build TCP header and checksum it. */ th = tcp_hdr(skb); @@ -1780,6 +1924,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, while ((skb = tcp_send_head(sk))) { unsigned int limit; + tso_segs = tcp_init_tso_segs(sk, skb, mss_now); BUG_ON(!tso_segs); @@ -1800,6 +1945,13 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, break; } + /* TSQ : sk_wmem_alloc accounts skb truesize, + * including skb overhead. But thats OK. + */ + if (atomic_read(&sk->sk_wmem_alloc) >= sysctl_tcp_limit_output_bytes) { + set_bit(TSQ_THROTTLED, &tp->tsq_flags); + break; + } limit = mss_now; if (tso_segs > 1 && !tcp_urg_mode(tp)) limit = tcp_mss_split_point(sk, skb, mss_now, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 61175cb2478f..70458a9cd837 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1970,6 +1970,7 @@ struct proto tcpv6_prot = { .sendmsg = tcp_sendmsg, .sendpage = tcp_sendpage, .backlog_rcv = tcp_v6_do_rcv, + .release_cb = tcp_release_cb, .hash = tcp_v6_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, -- cgit v1.2.3 From 282f23c6ee343126156dd41218b22ece96d747e3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 17 Jul 2012 10:13:05 +0200 Subject: tcp: implement RFC 5961 3.2 Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s | grep TCPChallengeACK) Signed-off-by: Eric Dumazet Cc: Kiran Kumar Kella Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 5 +++++ include/linux/snmp.h | 1 + include/net/tcp.h | 1 + net/ipv4/proc.c | 1 + net/ipv4/sysctl_net_ipv4.c | 7 +++++++ net/ipv4/tcp_input.c | 31 ++++++++++++++++++++++++++++++- 6 files changed, 45 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index e20c17a7d34e..e1e021594cff 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -565,6 +565,11 @@ tcp_limit_output_bytes - INTEGER reduce the size of individual GSO packet (64KB being the max) Default: 131072 +tcp_challenge_ack_limit - INTEGER + Limits number of Challenge ACK sent per second, as recommended + in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks) + Default: 100 + UDP variables: udp_mem - vector of 3 INTEGERs: min, pressure, max diff --git a/include/linux/snmp.h b/include/linux/snmp.h index 6e4c51123828..673e0e928b2b 100644 --- a/include/linux/snmp.h +++ b/include/linux/snmp.h @@ -237,6 +237,7 @@ enum LINUX_MIB_TCPOFOQUEUE, /* TCPOFOQueue */ LINUX_MIB_TCPOFODROP, /* TCPOFODrop */ LINUX_MIB_TCPOFOMERGE, /* TCPOFOMerge */ + LINUX_MIB_TCPCHALLENGEACK, /* TCPChallengeACK */ __LINUX_MIB_MAX }; diff --git a/include/net/tcp.h b/include/net/tcp.h index 439984b9af49..85c5090bfe25 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -254,6 +254,7 @@ extern int sysctl_tcp_thin_linear_timeouts; extern int sysctl_tcp_thin_dupack; extern int sysctl_tcp_early_retrans; extern int sysctl_tcp_limit_output_bytes; +extern int sysctl_tcp_challenge_ack_limit; extern atomic_long_t tcp_memory_allocated; extern struct percpu_counter tcp_sockets_allocated; diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index dae25e7622cf..3e8e78f12a38 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -261,6 +261,7 @@ static const struct snmp_mib snmp4_net_list[] = { SNMP_MIB_ITEM("TCPOFOQueue", LINUX_MIB_TCPOFOQUEUE), SNMP_MIB_ITEM("TCPOFODrop", LINUX_MIB_TCPOFODROP), SNMP_MIB_ITEM("TCPOFOMerge", LINUX_MIB_TCPOFOMERGE), + SNMP_MIB_ITEM("TCPChallengeACK", LINUX_MIB_TCPCHALLENGEACK), SNMP_MIB_SENTINEL }; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 70730f7aeafe..3f6a1e762e9c 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -605,6 +605,13 @@ static struct ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = proc_dointvec }, + { + .procname = "tcp_challenge_ack_limit", + .data = &sysctl_tcp_challenge_ack_limit, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec + }, #ifdef CONFIG_NET_DMA { .procname = "tcp_dma_copybreak", diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index cc4e12f1f2f7..c841a8990377 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -88,6 +88,9 @@ int sysctl_tcp_app_win __read_mostly = 31; int sysctl_tcp_adv_win_scale __read_mostly = 1; EXPORT_SYMBOL(sysctl_tcp_adv_win_scale); +/* rfc5961 challenge ack rate limiting */ +int sysctl_tcp_challenge_ack_limit = 100; + int sysctl_tcp_stdurg __read_mostly; int sysctl_tcp_rfc1337 __read_mostly; int sysctl_tcp_max_orphans __read_mostly = NR_FILE; @@ -5247,6 +5250,23 @@ out: } #endif /* CONFIG_NET_DMA */ +static void tcp_send_challenge_ack(struct sock *sk) +{ + /* unprotected vars, we dont care of overwrites */ + static u32 challenge_timestamp; + static unsigned int challenge_count; + u32 now = jiffies / HZ; + + if (now != challenge_timestamp) { + challenge_timestamp = now; + challenge_count = 0; + } + if (++challenge_count <= sysctl_tcp_challenge_ack_limit) { + NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPCHALLENGEACK); + tcp_send_ack(sk); + } +} + /* Does PAWS and seqno based validation of an incoming segment, flags will * play significant role here. */ @@ -5283,7 +5303,16 @@ static int tcp_validate_incoming(struct sock *sk, struct sk_buff *skb, /* Step 2: check RST bit */ if (th->rst) { - tcp_reset(sk); + /* RFC 5961 3.2 : + * If sequence number exactly matches RCV.NXT, then + * RESET the connection + * else + * Send a challenge ACK + */ + if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) + tcp_reset(sk); + else + tcp_send_challenge_ack(sk); goto discard; } -- cgit v1.2.3 From 42f59967a091d012b358a532766fe4d53c6d3ba3 Mon Sep 17 00:00:00 2001 From: Heiko Schocher Date: Tue, 17 Jul 2012 00:34:24 +0000 Subject: net: ethernet: davinci_emac: add OF support add OF support for the davinci_emac driver. Signed-off-by: Heiko Schocher Acked-by: Sekhar Nori Signed-off-by: Anatolij Gustschin Cc: netdev@vger.kernel.org Cc: davinci-linux-open-source@linux.davincidsp.com Cc: linux-arm-kernel@lists.infradead.org Cc: devicetree-discuss@lists.ozlabs.org Cc: Grant Likely Cc: Sekhar Nori Cc: Wolfgang Denk Cc: Anatoly Sivov Cc: David Miller Signed-off-by: David S. Miller --- .../devicetree/bindings/net/davinci_emac.txt | 41 ++++++++++ drivers/net/ethernet/ti/davinci_emac.c | 89 +++++++++++++++++++++- 2 files changed, 129 insertions(+), 1 deletion(-) create mode 100644 Documentation/devicetree/bindings/net/davinci_emac.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/davinci_emac.txt b/Documentation/devicetree/bindings/net/davinci_emac.txt new file mode 100644 index 000000000000..48b259e29e87 --- /dev/null +++ b/Documentation/devicetree/bindings/net/davinci_emac.txt @@ -0,0 +1,41 @@ +* Texas Instruments Davinci EMAC + +This file provides information, what the device node +for the davinci_emac interface contains. + +Required properties: +- compatible: "ti,davinci-dm6467-emac"; +- reg: Offset and length of the register set for the device +- ti,davinci-ctrl-reg-offset: offset to control register +- ti,davinci-ctrl-mod-reg-offset: offset to control module register +- ti,davinci-ctrl-ram-offset: offset to control module ram +- ti,davinci-ctrl-ram-size: size of control module ram +- ti,davinci-rmii-en: use RMII +- ti,davinci-no-bd-ram: has the emac controller BD RAM +- phy-handle: Contains a phandle to an Ethernet PHY. + if not, davinci_emac driver defaults to 100/FULL +- interrupts: interrupt mapping for the davinci emac interrupts sources: + 4 sources: + +Optional properties: +- local-mac-address : 6 bytes, mac address + +Example (enbw_cmc board): + eth0: emac@1e20000 { + compatible = "ti,davinci-dm6467-emac"; + reg = <0x220000 0x4000>; + ti,davinci-ctrl-reg-offset = <0x3000>; + ti,davinci-ctrl-mod-reg-offset = <0x2000>; + ti,davinci-ctrl-ram-offset = <0>; + ti,davinci-ctrl-ram-size = <0x2000>; + local-mac-address = [ 00 00 00 00 00 00 ]; + interrupts = <33 + 34 + 35 + 36 + >; + interrupt-parent = <&intc>; + }; diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c index ab0bbb78699a..b298ab071e3d 100644 --- a/drivers/net/ethernet/ti/davinci_emac.c +++ b/drivers/net/ethernet/ti/davinci_emac.c @@ -58,6 +58,12 @@ #include #include #include +#include +#include +#include +#include + +#include #include #include @@ -339,6 +345,9 @@ struct emac_priv { u32 rx_addr_type; atomic_t cur_tx; const char *phy_id; +#ifdef CONFIG_OF + struct device_node *phy_node; +#endif struct phy_device *phydev; spinlock_t lock; /*platform specific members*/ @@ -1760,6 +1769,77 @@ static const struct net_device_ops emac_netdev_ops = { #endif }; +#ifdef CONFIG_OF +static struct emac_platform_data + *davinci_emac_of_get_pdata(struct platform_device *pdev, + struct emac_priv *priv) +{ + struct device_node *np; + struct emac_platform_data *pdata = NULL; + const u8 *mac_addr; + u32 data; + int ret; + + pdata = pdev->dev.platform_data; + if (!pdata) { + pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL); + if (!pdata) + goto nodata; + } + + np = pdev->dev.of_node; + if (!np) + goto nodata; + else + pdata->version = EMAC_VERSION_2; + + if (!is_valid_ether_addr(pdata->mac_addr)) { + mac_addr = of_get_mac_address(np); + if (mac_addr) + memcpy(pdata->mac_addr, mac_addr, ETH_ALEN); + } + + ret = of_property_read_u32(np, "ti,davinci-ctrl-reg-offset", &data); + if (!ret) + pdata->ctrl_reg_offset = data; + + ret = of_property_read_u32(np, "ti,davinci-ctrl-mod-reg-offset", + &data); + if (!ret) + pdata->ctrl_mod_reg_offset = data; + + ret = of_property_read_u32(np, "ti,davinci-ctrl-ram-offset", &data); + if (!ret) + pdata->ctrl_ram_offset = data; + + ret = of_property_read_u32(np, "ti,davinci-ctrl-ram-size", &data); + if (!ret) + pdata->ctrl_ram_size = data; + + ret = of_property_read_u32(np, "ti,davinci-rmii-en", &data); + if (!ret) + pdata->rmii_en = data; + + ret = of_property_read_u32(np, "ti,davinci-no-bd-ram", &data); + if (!ret) + pdata->no_bd_ram = data; + + priv->phy_node = of_parse_phandle(np, "phy-handle", 0); + if (!priv->phy_node) + pdata->phy_id = ""; + + pdev->dev.platform_data = pdata; +nodata: + return pdata; +} +#else +static struct emac_platform_data + *davinci_emac_of_get_pdata(struct platform_device *pdev, + struct emac_priv *priv) +{ + return pdev->dev.platform_data; +} +#endif /** * davinci_emac_probe - EMAC device probe * @pdev: The DaVinci EMAC device that we are removing @@ -1802,7 +1882,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev) spin_lock_init(&priv->lock); - pdata = pdev->dev.platform_data; + pdata = davinci_emac_of_get_pdata(pdev, priv); if (!pdata) { dev_err(&pdev->dev, "no platform data\n"); rc = -ENODEV; @@ -2013,12 +2093,19 @@ static const struct dev_pm_ops davinci_emac_pm_ops = { .resume = davinci_emac_resume, }; +static const struct of_device_id davinci_emac_of_match[] = { + {.compatible = "ti,davinci-dm6467-emac", }, + {}, +}; +MODULE_DEVICE_TABLE(of, davinci_emac_of_match); + /* davinci_emac_driver: EMAC platform driver structure */ static struct platform_driver davinci_emac_driver = { .driver = { .name = "davinci_emac", .owner = THIS_MODULE, .pm = &davinci_emac_pm_ops, + .of_match_table = of_match_ptr(davinci_emac_of_match), }, .probe = davinci_emac_probe, .remove = __devexit_p(davinci_emac_remove), -- cgit v1.2.3 From 84c9f8c41df9f62a34eb680009b59cc817a76d6e Mon Sep 17 00:00:00 2001 From: Dinh Nguyen Date: Wed, 18 Jul 2012 13:28:26 +0000 Subject: net: stmmac: Add ip version to dts bindings Because there are multiple variants to the stmmac/dwmac driver, the dts bindings should be updated to include version of the IP used. Signed-off-by: Dinh Nguyen Acked-by: Stefan Roese Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/stmmac.txt | 3 ++- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 ++++++-- 2 files changed, 8 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt index 1f62623f8c3f..060bbf098ef3 100644 --- a/Documentation/devicetree/bindings/net/stmmac.txt +++ b/Documentation/devicetree/bindings/net/stmmac.txt @@ -1,7 +1,8 @@ * STMicroelectronics 10/100/1000 Ethernet driver (GMAC) Required properties: -- compatible: Should be "st,spear600-gmac" +- compatible: Should be "snps,dwmac-" "snps,dwmac" + For backwards compatibility: "st,spear600-gmac" is also supported. - reg: Address and length of the register set for the device - interrupt-parent: Should be the phandle for the interrupt controller that services interrupts for this device diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 7d36163d0d23..cd01ee7ecef1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -49,7 +49,9 @@ static int __devinit stmmac_probe_config_dt(struct platform_device *pdev, * are provided. All other properties should be added * once needed on other platforms. */ - if (of_device_is_compatible(np, "st,spear600-gmac")) { + if (of_device_is_compatible(np, "st,spear600-gmac") || + of_device_is_compatible(np, "snps,dwmac-3.70a") || + of_device_is_compatible(np, "snps,dwmac")) { plat->has_gmac = 1; plat->pmt = 1; } @@ -252,7 +254,9 @@ static const struct dev_pm_ops stmmac_pltfr_pm_ops; #endif /* CONFIG_PM */ static const struct of_device_id stmmac_dt_ids[] = { - { .compatible = "st,spear600-gmac", }, + { .compatible = "st,spear600-gmac"}, + { .compatible = "snps,dwmac-3.70a"}, + { .compatible = "snps,dwmac"}, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, stmmac_dt_ids); -- cgit v1.2.3 From 8427b2acfdd5e6c554fb7ad1fbccf53a24a08454 Mon Sep 17 00:00:00 2001 From: stephen hemminger Date: Thu, 19 Jul 2012 07:01:07 +0000 Subject: bridge: update documentation references Update the references to bridge utilities and web pages to current locations Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller --- Documentation/networking/bridge.txt | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bridge.txt b/Documentation/networking/bridge.txt index a7ba5e4e2c91..a27cb6214ed7 100644 --- a/Documentation/networking/bridge.txt +++ b/Documentation/networking/bridge.txt @@ -1,7 +1,14 @@ In order to use the Ethernet bridging functionality, you'll need the -userspace tools. These programs and documentation are available -at http://www.linuxfoundation.org/en/Net:Bridge. The download page is -http://prdownloads.sourceforge.net/bridge. +userspace tools. + +Documentation for Linux bridging is on: + http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge + +The bridge-utilities are maintained at: + git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/bridge-utils.git + +Additionally, the iproute2 utilities can be used to configure +bridge devices. If you still have questions, don't hesitate to post to the mailing list (more info https://lists.linux-foundation.org/mailman/listinfo/bridge). -- cgit v1.2.3 From cf60af03ca4e71134206809ea892e49b92a88896 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Thu, 19 Jul 2012 06:43:09 +0000 Subject: net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed like connect() call. It returns similar errno like connect() if the TCP handshake fails. For non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if cookie is available. If cookie is not available, it transmits a data-less SYN packet with Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on connecting or connected socket will result in simlar errno like repeating connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng Acked-by: Eric Dumazet Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 11 ++++++ include/linux/socket.h | 1 + include/net/inet_common.h | 6 ++-- include/net/tcp.h | 3 ++ net/ipv4/af_inet.c | 19 ++++++++--- net/ipv4/tcp.c | 61 +++++++++++++++++++++++++++++++--- net/ipv4/tcp_ipv4.c | 3 ++ 7 files changed, 92 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index e1e021594cff..03964e088180 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -468,6 +468,17 @@ tcp_syncookies - BOOLEAN SYN flood warnings in logs not being really flooded, your server is seriously misconfigured. +tcp_fastopen - INTEGER + Enable TCP Fast Open feature (draft-ietf-tcpm-fastopen) to send data + in the opening SYN packet. To use this feature, the client application + must not use connect(). Instead, it should use sendmsg() or sendto() + with MSG_FASTOPEN flag which performs a TCP handshake automatically. + + The values (bitmap) are: + 1: Enables sending data in the opening SYN on the client + + Default: 0 + tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 255. Default value diff --git a/include/linux/socket.h b/include/linux/socket.h index 25d6322fb635..ba7b2e817cfa 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -268,6 +268,7 @@ struct ucred { #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */ #define MSG_EOF MSG_FIN +#define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exit for file descriptor received through SCM_RIGHTS */ diff --git a/include/net/inet_common.h b/include/net/inet_common.h index 22fac9892b16..234008782c8c 100644 --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -14,9 +14,11 @@ struct sockaddr; struct socket; extern int inet_release(struct socket *sock); -extern int inet_stream_connect(struct socket *sock, struct sockaddr * uaddr, +extern int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, int addr_len, int flags); -extern int inet_dgram_connect(struct socket *sock, struct sockaddr * uaddr, +extern int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, + int addr_len, int flags); +extern int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, int addr_len, int flags); extern int inet_accept(struct socket *sock, struct socket *newsock, int flags); extern int inet_sendmsg(struct kiocb *iocb, struct socket *sock, diff --git a/include/net/tcp.h b/include/net/tcp.h index 867557b4244a..c0258100d70c 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -212,6 +212,9 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo); /* TCP initial congestion window as per draft-hkchu-tcpm-initcwnd-01 */ #define TCP_INIT_CWND 10 +/* Bit Flags for sysctl_tcp_fastopen */ +#define TFO_CLIENT_ENABLE 1 + extern struct inet_timewait_death_row tcp_death_row; /* sysctl variables for tcp */ diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index edc414625be2..fe4582ca969a 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -585,8 +585,8 @@ static long inet_wait_for_connect(struct sock *sk, long timeo, int writebias) * Connect to a remote host. There is regrettably still a little * TCP 'magic' in here. */ -int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, - int addr_len, int flags) +int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, + int addr_len, int flags) { struct sock *sk = sock->sk; int err; @@ -595,8 +595,6 @@ int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, if (addr_len < sizeof(uaddr->sa_family)) return -EINVAL; - lock_sock(sk); - if (uaddr->sa_family == AF_UNSPEC) { err = sk->sk_prot->disconnect(sk, flags); sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED; @@ -663,7 +661,6 @@ int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, sock->state = SS_CONNECTED; err = 0; out: - release_sock(sk); return err; sock_error: @@ -673,6 +670,18 @@ sock_error: sock->state = SS_DISCONNECTING; goto out; } +EXPORT_SYMBOL(__inet_stream_connect); + +int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, + int addr_len, int flags) +{ + int err; + + lock_sock(sock->sk); + err = __inet_stream_connect(sock, uaddr, addr_len, flags); + release_sock(sock->sk); + return err; +} EXPORT_SYMBOL(inet_stream_connect); /* diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4252cd8f39fd..581ecf02c6b5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -270,6 +270,7 @@ #include #include +#include #include #include #include @@ -982,26 +983,67 @@ static inline int select_size(const struct sock *sk, bool sg) return tmp; } +void tcp_free_fastopen_req(struct tcp_sock *tp) +{ + if (tp->fastopen_req != NULL) { + kfree(tp->fastopen_req); + tp->fastopen_req = NULL; + } +} + +static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *size) +{ + struct tcp_sock *tp = tcp_sk(sk); + int err, flags; + + if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE)) + return -EOPNOTSUPP; + if (tp->fastopen_req != NULL) + return -EALREADY; /* Another Fast Open is in progress */ + + tp->fastopen_req = kzalloc(sizeof(struct tcp_fastopen_request), + sk->sk_allocation); + if (unlikely(tp->fastopen_req == NULL)) + return -ENOBUFS; + tp->fastopen_req->data = msg; + + flags = (msg->msg_flags & MSG_DONTWAIT) ? O_NONBLOCK : 0; + err = __inet_stream_connect(sk->sk_socket, msg->msg_name, + msg->msg_namelen, flags); + *size = tp->fastopen_req->copied; + tcp_free_fastopen_req(tp); + return err; +} + int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size) { struct iovec *iov; struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; - int iovlen, flags, err, copied; - int mss_now = 0, size_goal; + int iovlen, flags, err, copied = 0; + int mss_now = 0, size_goal, copied_syn = 0, offset = 0; bool sg; long timeo; lock_sock(sk); flags = msg->msg_flags; + if (flags & MSG_FASTOPEN) { + err = tcp_sendmsg_fastopen(sk, msg, &copied_syn); + if (err == -EINPROGRESS && copied_syn > 0) + goto out; + else if (err) + goto out_err; + offset = copied_syn; + } + timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); /* Wait for a connection to finish. */ if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) - goto out_err; + goto do_error; if (unlikely(tp->repair)) { if (tp->repair_queue == TCP_RECV_QUEUE) { @@ -1037,6 +1079,15 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, unsigned char __user *from = iov->iov_base; iov++; + if (unlikely(offset > 0)) { /* Skip bytes copied in SYN */ + if (offset >= seglen) { + offset -= seglen; + continue; + } + seglen -= offset; + from += offset; + offset = 0; + } while (seglen > 0) { int copy = 0; @@ -1199,7 +1250,7 @@ out: if (copied && likely(!tp->repair)) tcp_push(sk, flags, mss_now, tp->nonagle); release_sock(sk); - return copied; + return copied + copied_syn; do_fault: if (!skb->len) { @@ -1212,7 +1263,7 @@ do_fault: } do_error: - if (copied) + if (copied + copied_syn) goto out; out_err: err = sk_stream_error(sk, flags, err); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 01aa77a97020..1d8b75a58981 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1952,6 +1952,9 @@ void tcp_v4_destroy_sock(struct sock *sk) tp->cookie_values = NULL; } + /* If socket is aborted during connect operation */ + tcp_free_fastopen_req(tp); + sk_sockets_allocated_dec(sk); sock_release_memcg(sk); } -- cgit v1.2.3 From 67da22d23fa6f3324e03bcd0580b914b2e4afbf3 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Thu, 19 Jul 2012 06:43:11 +0000 Subject: net-tcp: Fast Open client - cookie-less mode In trusted networks, e.g., intranet, data-center, the client does not need to use Fast Open cookie to mitigate DoS attacks. In cookie-less mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless of cookie availability. Signed-off-by: Yuchung Cheng Acked-by: Eric Dumazet Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 2 ++ include/linux/tcp.h | 1 + include/net/tcp.h | 1 + net/ipv4/tcp_input.c | 8 ++++++-- net/ipv4/tcp_output.c | 6 +++++- 5 files changed, 15 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 03964e088180..5f3ef7f7fcec 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -476,6 +476,8 @@ tcp_fastopen - INTEGER The values (bitmap) are: 1: Enables sending data in the opening SYN on the client + 5: Enables sending data in the opening SYN on the client regardless + of cookie availability. Default: 0 diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 1edf96afab44..9febfb685c33 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -387,6 +387,7 @@ struct tcp_sock { u8 repair_queue; u8 do_early_retrans:1,/* Enable RFC5827 early-retransmit */ early_retrans_delayed:1, /* Delayed ER timer installed */ + syn_data:1, /* SYN includes data */ syn_fastopen:1; /* SYN includes Fast Open option */ /* RTT measurement */ diff --git a/include/net/tcp.h b/include/net/tcp.h index e07878d246aa..bc7c134ec054 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -214,6 +214,7 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo); /* Bit Flags for sysctl_tcp_fastopen */ #define TFO_CLIENT_ENABLE 1 +#define TFO_CLIENT_NO_COOKIE 4 /* Data in SYN w/o cookie option */ extern struct inet_timewait_death_row tcp_death_row; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c49a4fc175bd..e67d685a6c0e 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5650,7 +5650,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack, struct tcp_fastopen_cookie *cookie) { struct tcp_sock *tp = tcp_sk(sk); - struct sk_buff *data = tcp_write_queue_head(sk); + struct sk_buff *data = tp->syn_data ? tcp_write_queue_head(sk) : NULL; u16 mss = tp->rx_opt.mss_clamp; bool syn_drop; @@ -5665,6 +5665,9 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack, mss = opt.mss_clamp; } + if (!tp->syn_fastopen) /* Ignore an unsolicited cookie */ + cookie->len = -1; + /* The SYN-ACK neither has cookie nor acknowledges the data. Presumably * the remote receives only the retransmitted (regular) SYNs: either * the original SYN-data or the corresponding SYN-ACK is lost. @@ -5816,7 +5819,8 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, tcp_finish_connect(sk, skb); - if (tp->syn_fastopen && tcp_rcv_fastopen_synack(sk, skb, &foc)) + if ((tp->syn_fastopen || tp->syn_data) && + tcp_rcv_fastopen_synack(sk, skb, &foc)) return -1; if (sk->sk_write_pending || diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index c5cfd5ec3184..27a32acfdb62 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2864,6 +2864,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn) struct sk_buff *syn_data = NULL, *data; unsigned long last_syn_loss = 0; + tp->rx_opt.mss_clamp = tp->advmss; /* If MSS is not cached */ tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie, &syn_loss, &last_syn_loss); /* Recurring FO SYN losses: revert to regular handshake temporarily */ @@ -2873,7 +2874,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn) goto fallback; } - if (fo->cookie.len <= 0) + if (sysctl_tcp_fastopen & TFO_CLIENT_NO_COOKIE) + fo->cookie.len = -1; + else if (fo->cookie.len <= 0) goto fallback; /* MSS for SYN-data is based on cached MSS and bounded by PMTU and @@ -2916,6 +2919,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn) fo->copied = data->len; if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) { + tp->syn_data = (fo->copied > 0); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENACTIVE); goto done; } -- cgit v1.2.3 From efaac3bf087b1a6cec28f2a041e01c874d65390c Mon Sep 17 00:00:00 2001 From: Leo Alterman Date: Fri, 20 Jul 2012 14:51:07 -0700 Subject: openvswitch: Fix typo in documentation. Signed-off-by: Leo Alterman Signed-off-by: Jesse Gross --- Documentation/networking/openvswitch.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt index b8a048b8df3a..8fa2dd1e792e 100644 --- a/Documentation/networking/openvswitch.txt +++ b/Documentation/networking/openvswitch.txt @@ -118,7 +118,7 @@ essentially like this, ignoring metadata: Naively, to add VLAN support, it makes sense to add a new "vlan" flow key attribute to contain the VLAN tag, then continue to decode the encapsulated headers beyond the VLAN tag using the existing field -definitions. With this change, an TCP packet in VLAN 10 would have a +definitions. With this change, a TCP packet in VLAN 10 would have a flow key much like this: eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...) -- cgit v1.2.3 From 5aa93bcf66f4af094d6f11096e81d5501a0b4ba5 Mon Sep 17 00:00:00 2001 From: Neil Horman Date: Sat, 21 Jul 2012 07:56:07 +0000 Subject: sctp: Implement quick failover draft from tsvwg I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: Neil Horman CC: Vlad Yasevich CC: Sridhar Samudrala CC: "David S. Miller" CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 14 +++++ include/net/sctp/constants.h | 1 + include/net/sctp/structs.h | 20 ++++++- include/net/sctp/user.h | 11 ++++ net/sctp/associola.c | 37 +++++++++--- net/sctp/outqueue.c | 6 +- net/sctp/sm_sideeffect.c | 33 +++++++++-- net/sctp/socket.c | 101 +++++++++++++++++++++++++++++++++ net/sctp/sysctl.c | 9 +++ net/sctp/transport.c | 4 +- 10 files changed, 221 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 5f3ef7f7fcec..406a5226220d 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1440,6 +1440,20 @@ path_max_retrans - INTEGER Default: 5 +pf_retrans - INTEGER + The number of retransmissions that will be attempted on a given path + before traffic is redirected to an alternate transport (should one + exist). Note this is distinct from path_max_retrans, as a path that + passes the pf_retrans threshold can still be used. Its only + deprioritized when a transmission path is selected by the stack. This + setting is primarily used to enable fast failover mechanisms without + having to reduce path_max_retrans to a very low value. See: + http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt + for details. Note also that a value of pf_retrans > path_max_retrans + disables this feature + + Default: 0 + rto_initial - INTEGER The initial round trip timeout value in milliseconds that will be used in calculating round trip times. This is the initial time interval diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 942b864f6135..d053d2e99876 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -334,6 +334,7 @@ typedef enum { typedef enum { SCTP_TRANSPORT_UP, SCTP_TRANSPORT_DOWN, + SCTP_TRANSPORT_PF, } sctp_transport_cmd_t; /* These are the address scopes defined mainly for IPv4 addresses diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 536e439ddf1d..fc5e60016e37 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -161,6 +161,12 @@ extern struct sctp_globals { int max_retrans_path; int max_retrans_init; + /* Potentially-Failed.Max.Retrans sysctl value + * taken from: + * http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 + */ + int pf_retrans; + /* * Policy for preforming sctp/socket accounting * 0 - do socket level accounting, all assocs share sk_sndbuf @@ -258,6 +264,7 @@ extern struct sctp_globals { #define sctp_sndbuf_policy (sctp_globals.sndbuf_policy) #define sctp_rcvbuf_policy (sctp_globals.rcvbuf_policy) #define sctp_max_retrans_path (sctp_globals.max_retrans_path) +#define sctp_pf_retrans (sctp_globals.pf_retrans) #define sctp_max_retrans_init (sctp_globals.max_retrans_init) #define sctp_sack_timeout (sctp_globals.sack_timeout) #define sctp_hb_interval (sctp_globals.hb_interval) @@ -990,10 +997,15 @@ struct sctp_transport { /* This is the max_retrans value for the transport and will * be initialized from the assocs value. This can be changed - * using SCTP_SET_PEER_ADDR_PARAMS socket option. + * using the SCTP_SET_PEER_ADDR_PARAMS socket option. */ __u16 pathmaxrxt; + /* This is the partially failed retrans value for the transport + * and will be initialized from the assocs value. This can be changed + * using the SCTP_PEER_ADDR_THLDS socket option + */ + int pf_retrans; /* PMTU : The current known path MTU. */ __u32 pathmtu; @@ -1664,6 +1676,12 @@ struct sctp_association { */ int max_retrans; + /* This is the partially failed retrans value for the transport + * and will be initialized from the assocs value. This can be + * changed using the SCTP_PEER_ADDR_THLDS socket option + */ + int pf_retrans; + /* Maximum number of times the endpoint will retransmit INIT */ __u16 max_init_attempts; diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h index 0842ef00b2fe..1b02d7ad453b 100644 --- a/include/net/sctp/user.h +++ b/include/net/sctp/user.h @@ -93,6 +93,7 @@ typedef __s32 sctp_assoc_t; #define SCTP_GET_ASSOC_NUMBER 28 /* Read only */ #define SCTP_GET_ASSOC_ID_LIST 29 /* Read only */ #define SCTP_AUTO_ASCONF 30 +#define SCTP_PEER_ADDR_THLDS 31 /* Internal Socket Options. Some of the sctp library functions are * implemented using these socket options. @@ -649,6 +650,7 @@ struct sctp_paddrinfo { */ enum sctp_spinfo_state { SCTP_INACTIVE, + SCTP_PF, SCTP_ACTIVE, SCTP_UNCONFIRMED, SCTP_UNKNOWN = 0xffff /* Value used for transport state unknown */ @@ -741,4 +743,13 @@ typedef struct { int sd; } sctp_peeloff_arg_t; +/* + * Peer Address Thresholds socket option + */ +struct sctp_paddrthlds { + sctp_assoc_t spt_assoc_id; + struct sockaddr_storage spt_address; + __u16 spt_pathmaxrxt; + __u16 spt_pathpfthld; +}; #endif /* __net_sctp_user_h__ */ diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 8cf348e62e74..ebaef3ed6065 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -124,6 +124,8 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a * socket values. */ asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt; + asoc->pf_retrans = sctp_pf_retrans; + asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial); asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max); asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min); @@ -686,6 +688,9 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, /* Set the path max_retrans. */ peer->pathmaxrxt = asoc->pathmaxrxt; + /* And the partial failure retrnas threshold */ + peer->pf_retrans = asoc->pf_retrans; + /* Initialize the peer's SACK delay timeout based on the * association configured value. */ @@ -841,6 +846,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc, struct sctp_ulpevent *event; struct sockaddr_storage addr; int spc_state = 0; + bool ulp_notify = true; /* Record the transition on the transport. */ switch (command) { @@ -854,6 +860,14 @@ void sctp_assoc_control_transport(struct sctp_association *asoc, spc_state = SCTP_ADDR_CONFIRMED; else spc_state = SCTP_ADDR_AVAILABLE; + /* Don't inform ULP about transition from PF to + * active state and set cwnd to 1, see SCTP + * Quick failover draft section 5.1, point 5 + */ + if (transport->state == SCTP_PF) { + ulp_notify = false; + transport->cwnd = 1; + } transport->state = SCTP_ACTIVE; break; @@ -872,6 +886,11 @@ void sctp_assoc_control_transport(struct sctp_association *asoc, spc_state = SCTP_ADDR_UNREACHABLE; break; + case SCTP_TRANSPORT_PF: + transport->state = SCTP_PF; + ulp_notify = false; + break; + default: return; } @@ -879,12 +898,15 @@ void sctp_assoc_control_transport(struct sctp_association *asoc, /* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the * user. */ - memset(&addr, 0, sizeof(struct sockaddr_storage)); - memcpy(&addr, &transport->ipaddr, transport->af_specific->sockaddr_len); - event = sctp_ulpevent_make_peer_addr_change(asoc, &addr, - 0, spc_state, error, GFP_ATOMIC); - if (event) - sctp_ulpq_tail_event(&asoc->ulpq, event); + if (ulp_notify) { + memset(&addr, 0, sizeof(struct sockaddr_storage)); + memcpy(&addr, &transport->ipaddr, + transport->af_specific->sockaddr_len); + event = sctp_ulpevent_make_peer_addr_change(asoc, &addr, + 0, spc_state, error, GFP_ATOMIC); + if (event) + sctp_ulpq_tail_event(&asoc->ulpq, event); + } /* Select new active and retran paths. */ @@ -900,7 +922,8 @@ void sctp_assoc_control_transport(struct sctp_association *asoc, transports) { if ((t->state == SCTP_INACTIVE) || - (t->state == SCTP_UNCONFIRMED)) + (t->state == SCTP_UNCONFIRMED) || + (t->state == SCTP_PF)) continue; if (!first || t->last_time_heard > first->last_time_heard) { second = first; diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index a0fa19f5650c..e7aa177c9522 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout) if (!new_transport) new_transport = asoc->peer.active_path; } else if ((new_transport->state == SCTP_INACTIVE) || - (new_transport->state == SCTP_UNCONFIRMED)) { + (new_transport->state == SCTP_UNCONFIRMED) || + (new_transport->state == SCTP_PF)) { /* If the chunk is Heartbeat or Heartbeat Ack, * send it to chunk->transport, even if it's * inactive. @@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout) new_transport = chunk->transport; if (!new_transport || ((new_transport->state == SCTP_INACTIVE) || - (new_transport->state == SCTP_UNCONFIRMED))) + (new_transport->state == SCTP_UNCONFIRMED) || + (new_transport->state == SCTP_PF))) new_transport = asoc->peer.active_path; if (new_transport->state == SCTP_UNCONFIRMED) continue; diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 8716da1a8592..fe99628e1257 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type, sctp_subtype_t subtype, sctp_cmd_seq_t *commands, gfp_t gfp); +static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds, + struct sctp_transport *t); /******************************************************************** * Helper functions ********************************************************************/ @@ -470,7 +472,8 @@ sctp_timer_event_t *sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = { * notification SHOULD be sent to the upper layer. * */ -static void sctp_do_8_2_transport_strike(struct sctp_association *asoc, +static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands, + struct sctp_association *asoc, struct sctp_transport *transport, int is_hb) { @@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc, transport->error_count++; } + /* If the transport error count is greater than the pf_retrans + * threshold, and less than pathmaxrtx, then mark this transport + * as Partially Failed, ee SCTP Quick Failover Draft, secon 5.1, + * point 1 + */ + if ((transport->state != SCTP_PF) && + (asoc->pf_retrans < transport->pathmaxrxt) && + (transport->error_count > asoc->pf_retrans)) { + + sctp_assoc_control_transport(asoc, transport, + SCTP_TRANSPORT_PF, + 0); + + /* Update the hb timer to resend a heartbeat every rto */ + sctp_cmd_hb_timer_update(commands, transport); + } + if (transport->state != SCTP_INACTIVE && (transport->error_count > transport->pathmaxrxt)) { SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p", @@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds, SCTP_HEARTBEAT_SUCCESS); } + if (t->state == SCTP_PF) + sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP, + SCTP_HEARTBEAT_SUCCESS); + /* The receiver of the HEARTBEAT ACK should also perform an * RTT measurement for that destination transport address * using the time value carried in the HEARTBEAT ACK chunk. @@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type, case SCTP_CMD_STRIKE: /* Mark one strike against a transport. */ - sctp_do_8_2_transport_strike(asoc, cmd->obj.transport, - 0); + sctp_do_8_2_transport_strike(commands, asoc, + cmd->obj.transport, 0); break; case SCTP_CMD_TRANSPORT_IDLE: @@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type, case SCTP_CMD_TRANSPORT_HB_SENT: t = cmd->obj.transport; - sctp_do_8_2_transport_strike(asoc, t, 1); + sctp_do_8_2_transport_strike(commands, asoc, + t, 1); t->hb_sent = 1; break; diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 5d488cdcf679..5e259817a7f3 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3478,6 +3478,56 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval, } +/* + * SCTP_PEER_ADDR_THLDS + * + * This option allows us to alter the partially failed threshold for one or all + * transports in an association. See Section 6.1 of: + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt + */ +static int sctp_setsockopt_paddr_thresholds(struct sock *sk, + char __user *optval, + unsigned int optlen) +{ + struct sctp_paddrthlds val; + struct sctp_transport *trans; + struct sctp_association *asoc; + + if (optlen < sizeof(struct sctp_paddrthlds)) + return -EINVAL; + if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, + sizeof(struct sctp_paddrthlds))) + return -EFAULT; + + + if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) { + asoc = sctp_id2assoc(sk, val.spt_assoc_id); + if (!asoc) + return -ENOENT; + list_for_each_entry(trans, &asoc->peer.transport_addr_list, + transports) { + if (val.spt_pathmaxrxt) + trans->pathmaxrxt = val.spt_pathmaxrxt; + trans->pf_retrans = val.spt_pathpfthld; + } + + if (val.spt_pathmaxrxt) + asoc->pathmaxrxt = val.spt_pathmaxrxt; + asoc->pf_retrans = val.spt_pathpfthld; + } else { + trans = sctp_addr_id2transport(sk, &val.spt_address, + val.spt_assoc_id); + if (!trans) + return -ENOENT; + + if (val.spt_pathmaxrxt) + trans->pathmaxrxt = val.spt_pathmaxrxt; + trans->pf_retrans = val.spt_pathpfthld; + } + + return 0; +} + /* API 6.2 setsockopt(), getsockopt() * * Applications use setsockopt() and getsockopt() to set or retrieve @@ -3627,6 +3677,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname, case SCTP_AUTO_ASCONF: retval = sctp_setsockopt_auto_asconf(sk, optval, optlen); break; + case SCTP_PEER_ADDR_THLDS: + retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen); + break; default: retval = -ENOPROTOOPT; break; @@ -5498,6 +5551,51 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len, return 0; } +/* + * SCTP_PEER_ADDR_THLDS + * + * This option allows us to fetch the partially failed threshold for one or all + * transports in an association. See Section 6.1 of: + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt + */ +static int sctp_getsockopt_paddr_thresholds(struct sock *sk, + char __user *optval, + int len, + int __user *optlen) +{ + struct sctp_paddrthlds val; + struct sctp_transport *trans; + struct sctp_association *asoc; + + if (len < sizeof(struct sctp_paddrthlds)) + return -EINVAL; + len = sizeof(struct sctp_paddrthlds); + if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, len)) + return -EFAULT; + + if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) { + asoc = sctp_id2assoc(sk, val.spt_assoc_id); + if (!asoc) + return -ENOENT; + + val.spt_pathpfthld = asoc->pf_retrans; + val.spt_pathmaxrxt = asoc->pathmaxrxt; + } else { + trans = sctp_addr_id2transport(sk, &val.spt_address, + val.spt_assoc_id); + if (!trans) + return -ENOENT; + + val.spt_pathmaxrxt = trans->pathmaxrxt; + val.spt_pathpfthld = trans->pf_retrans; + } + + if (put_user(len, optlen) || copy_to_user(optval, &val, len)) + return -EFAULT; + + return 0; +} + SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { @@ -5636,6 +5734,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname, case SCTP_AUTO_ASCONF: retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen); break; + case SCTP_PEER_ADDR_THLDS: + retval = sctp_getsockopt_paddr_thresholds(sk, optval, len, optlen); + break; default: retval = -ENOPROTOOPT; break; diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index e5fe639c89e7..2b2bfe933ff1 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -140,6 +140,15 @@ static ctl_table sctp_table[] = { .extra1 = &one, .extra2 = &int_max }, + { + .procname = "pf_retrans", + .data = &sctp_pf_retrans, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &int_max + }, { .procname = "max_init_retransmits", .data = &sctp_max_retrans_init, diff --git a/net/sctp/transport.c b/net/sctp/transport.c index a6b7ee9ce28a..d1c652ed2f3d 100644 --- a/net/sctp/transport.c +++ b/net/sctp/transport.c @@ -87,6 +87,7 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer, /* Initialize the default path max_retrans. */ peer->pathmaxrxt = sctp_max_retrans_path; + peer->pf_retrans = sctp_pf_retrans; INIT_LIST_HEAD(&peer->transmitted); INIT_LIST_HEAD(&peer->send_ready); @@ -595,7 +596,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t) { unsigned long timeout; timeout = t->rto + sctp_jitter(t->rto); - if (t->state != SCTP_UNCONFIRMED) + if ((t->state != SCTP_UNCONFIRMED) && + (t->state != SCTP_PF)) timeout += t->hbinterval; timeout += jiffies; return timeout; -- cgit v1.2.3 From f8b72d36d2eb94824d8445efdd706bf037570f88 Mon Sep 17 00:00:00 2001 From: Rick Jones Date: Fri, 20 Jul 2012 10:51:37 +0000 Subject: net-next: minor cleanups for bonding documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The section titled "Configuring Bonding for Maximum Throughput" is actually section twelve not thirteen, and there are a couple of words spelled incorrectly. Signed-off-by: Rick Jones Reviewed-by: Nicolas de Pesloüan Signed-off-by: Jay Vosburgh Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index bfea8a338901..6b1c7110534e 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1210,7 +1210,7 @@ options, you may wish to use the "max_bonds" module parameter, documented above. To create multiple bonding devices with differing options, it is -preferrable to use bonding parameters exported by sysfs, documented in the +preferable to use bonding parameters exported by sysfs, documented in the section below. For versions of bonding without sysfs support, the only means to @@ -1950,7 +1950,7 @@ access to fail over to. Additionally, the bonding load balance modes support link monitoring of their members, so if individual links fail, the load will be rebalanced across the remaining devices. - See Section 13, "Configuring Bonding for Maximum Throughput" + See Section 12, "Configuring Bonding for Maximum Throughput" for information on configuring bonding with one peer device. 11.2 High Availability in a Multiple Switch Topology @@ -2620,7 +2620,7 @@ be found at: https://lists.sourceforge.net/lists/listinfo/bonding-devel - Discussions regarding the developpement of the bonding driver take place + Discussions regarding the development of the bonding driver take place on the main Linux network mailing list, hosted at vger.kernel.org. The list address is: -- cgit v1.2.3