lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2018-04-01	Bluetooth: btusb: Add USB ID 7392:a611 for Edimax EW-7611ULB	Vicente Bergas
	This WiFi/Bluetooth USB dongle uses a Realtek chipset, so, use btrtl for it. Product information: https://wikidevi.com/wiki/Edimax_EW-7611ULB From /sys/kernel/debug/usb/devices T: Bus=02 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=7392 ProdID=a611 Rev= 2.00 S: Manufacturer=Realtek S: Product=Edimax Wi-Fi N150 Bluetooth4.0 USB Adapter S: SerialNumber=00e04c000001 C:* #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=e0(wlcon) Sub=01 Prot=01 I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms I:* If#= 2 Alt= 0 #EPs= 6 Cls=ff(vend.) Sub=ff Prot=ff Driver=rtl8723bu E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=500us E: Ad=08(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=09(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Tested-by: Vicente Bergas <vicencb@gmail.com> Signed-off-by: Vicente Bergas <vicencb@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-04-01	Bluetooth: hci_ll: Use skb_put_u8 instead of struct hcill_cmd	Marcel Holtmann
	The struct hcill_cmd to create an skb with a single u8 is pointless. So just use skb_put_u8 instead. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2018-04-01	Bluetooth: btmrvl: Delete an unnecessary variable initialisation in ↵	Markus Elfring
	btmrvl_sdio_card_to_host() The variable "payload" will eventually be set to an appropriate pointer a bit later. Thus omit the explicit initialisation at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-04-01	Bluetooth: btmrvl: Delete an unnecessary variable initialisation in ↵	Markus Elfring
	btmrvl_sdio_register_dev() The local variable "ret" will be set to an appropriate value a bit later. Thus omit the explicit initialisation at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-04-01	Bluetooth: hci_bcm: Use default baud rate if missing shutdown GPIO	Marcel Holtmann
	In case the shutdown GPIO is not wired up, it is impossible to reset the Bluetooth controller to its original state. This include the initial default baud rate which leads to issues when reloading the module or when something unexpected happens. To avoid any kind of runtime deadlocks, stick with the initial default baud rate. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2018-04-01	Bluetooth: hci_bcm: use gpiod cansleep version	Loic Poulain
	Some GPIO controller drivers request sleepable context and so can't be accessed from IRQ context. Using gpiod_set/get_value accessors with such controller leads to a kernel warning since they are reserved for atomic context (according to the documentation). Use the postfixed _cansleep version instead, indicating that context is safe for sleeping if necessary. Note that this is the case here since we never toggle the gpio neither from IRQ nor from a spinlocked section. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-04-01	Bluetooth: Fix data type of appearence	Jaganath Kanakkassery
	It should be __le16 instead of __u16 since its part of mgmt API. Signed-off-by: Jaganath Kanakkassery <jaganathx.kanakkassery@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-03-31	Merge branch 'chelsio-inline-tls'	David S. Miller
	Atul Gupta says: ==================== Chelsio Inline TLS Series for Chelsio Inline TLS driver (chtls) Use tls ULP infrastructure to register chtls as Inline TLS driver. Chtls use TCP Sockets to Tx/Rx TLS records. TCP sk_proto APIs are enhanced to offload TLS record. T6 adapter provides the following features: -TLS record offload, TLS header, encrypt, digest and transmit -TLS record receive and decrypt -TLS keys store -TCP/IP engine -TLS engine -GCM crypto engine [support CBC also] TLS provides security at the transport layer. It uses TCP to provide reliable end-to-end transport of application data. It relies on TCP for any retransmission. TLS session comprises of three parts: a. TCP/IP connection b. TLS handshake c. Record layer processing TLS handshake state machine is executed in host (refer standard implementation eg. OpenSSL). Setsockopt [SOL_TCP, TCP_ULP] initialize TCP proto-ops for Chelsio inline tls support. setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); Tx and Rx Keys are decided during handshake and programmed on the chip after CCS is exchanged. struct tls12_crypto_info_aes_gcm_128 crypto_info setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)) Finish is the first encrypted/decrypted message tx/rx inline. On the Tx path TLS engine receive plain text from openssl, insert IV, fetches the tx key, create cipher text records and generate MAC. TLS header is added to cipher text and forward to TCP/IP engine for transport layer processing and transmission on wire. TX PATH: Apps--openssl--chtls---TLS engine---encrypt/auth---TCP/IP engine---wire On the Rx side, data received is PDU aligned at record boundaries. TLS processes only the complete record. If rx key is programmed on CCS receive, data is decrypted and plain text is posted to host. RX PATH: Wire--cipher-text--TCP/IP engine [PDU align]---TLS engine--- decrypt/auth---plain-text--chtls--openssl--application v15: indent fix in mark_urg -removed unwanted checks in sendmsg, sendpage, recvmsg, close, disconnect,shutdown, destroy sock [Sabrina] - removed unused chtls_free_kmap [chtls.h] - rebase to top of net-next v14: -Reverse christmas tree style for variable declarations for various functions in chtls_hw.c, chtls_io.c [Stefano Brivio] - replaced break with return in tcp_state_to_flowc_state [Stefano Brivio] - renamed tlstx_seq_number to tlstx_incr_seqnum [Stefano Brivio] - use bool for corked, should_push and send_should_push [Stefano Brivio] - removed "Reviewed-by" tag for Stefano, Sabrina, Dave Watson v13: handle clean ctx free for HW_RECORD in tls_sk_proto_close -removed SOCK_INLINE [chtls.h], using csk_conn_inline instead in send_abort_rpl,chtls_send_abort_rpl,chtls_sendmsg,chtls_sendpage -removed sk_no_receive [chtls_io.c] replaced with sk_shutdown & RCV_SHUTDOWN in chtls_pt_recvmsg, peekmsg and chtls_recvmsg -cleaned chtls_expansion_size [Stefano Brivio] - u8 conf:3 in tls_sw_context to add TLS_HW_RECORD -removed is_tls_skb, using tls_skb_inline [Stefano Brivio] -reverse christmas tree formatting in chtls_io.c, chtls_cm.c [Stefano Brivio] -fixed build warning reported by kbuild robot -retained ctx conf enum in chtls_main vs earlier versions, tls_prots not used in chtls. -cleanup [removed syn_sent, base_prot, added synq] [Michael Werner] - passing struct fw_wr_hdr * to ofldtxq_stop [Casey] - rebased on top of the current net-next v12: patch against net-next -fixed build error [reported by Julia] -replace set_queue with skb_set_queue_mapping [Sabrina] -copyright year correction [chtls] v11: formatting and cleanup, few function rename and error handling [Stefano Brivio] - ctx freed later for TLS_HW_RECORD - split tx and rx in different patch v10: fixed following based on the review comments of Sabrina Dubroca -docs header added for struct tls_device [tls.h] -changed TLS_FULL_HW to TLS_HW_RECORD -similary using tls-hw-record instead of tls-inline for ethtool feature config -added more description to patch sets -replaced kmalloc/vmalloc/kfree with kvzalloc/kvfree -reordered the patch sequence -formatted entire patch for func return values v9: corrected __u8 and similar usage -create_ctx to alloc tls_context -tls_hw_prot before sk !establish check v8: tls_main.c cleanup comment [Dave Watson] v7: func name change, use sk->sk_prot where required v6: modify prot only for FULL_HW -corrected commit message for patch 11 v5: set TLS_FULL_HW for registered inline tls drivers -set TLS_FULL_HW prot for offload connection else move to TLS_SW_TX -Case handled for interface with same IP [Dave Miller] -Removed Specific IP and INADDR_ANY handling [v4] v4: removed chtls ULP type, retained tls ULP -registered chtls with net tls -defined struct tls_device to register the Inline drivers -ethtool interface tls-inline to enable Inline TLS for interface -prot update to support inline TLS v3: fixed the kbuild test issues -made few funtions static -initialized few variables v2: fixed the following based on the review comments of Stephan Mueller, Stefano Brivio and Hannes Frederic -Added more details in cover letter -Fixed indentation and formating issues -Using aes instead of aes-generic -memset key info after programing the key on chip -reordered the patch sequence ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto: chtls - Makefile Kconfig	Atul Gupta
	Entry for Inline TLS as another driver dependent on cxgb4 and chcr Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto: chtls - Program the TLS session Key	Atul Gupta
	Initialize the space reserved for storing the TLS keys, get and free the location where key is stored for the TLS connection. Program the Tx and Rx key as received from user in struct tls12_crypto_info_aes_gcm_128 and understood by hardware. added socket option TLS_RX Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto: chtls - Inline TLS record Rx	Atul Gupta
	handler for record receive. plain text copied to user buffer Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: Michael Werner <werner@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto: chtls - Inline TLS record Tx	Atul Gupta
	TLS handler for record transmit. Create Inline TLS work request and post to FW. Create Inline TLS record CPLs for hardware Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: Michael Werner <werner@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto : chtls - CPL handler definition	Atul Gupta
	Exchange messages with hardware to program the TLS session CPL handlers for messages received from chip. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: Michael Werner <werner@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto: chtls - Register chtls with net tls	Atul Gupta
	Register chtls as Inline TLS driver, chtls is ULD to cxgb4. Setsockopt to program (tx/rx) keys on chip. Support AES GCM of key size 128. Support both Inline Rx and Tx. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Casey Leedom <leedom@chelsio.com> Reviewed-by: Michael Werner <werner@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto: chtls - structure and macro for Inline TLS	Atul Gupta
	Define Inline TLS state, connection management info. Supporting macros definition. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Michael Werner <werner@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	crypto: chcr - Inline TLS Key Macros	Atul Gupta
	Define macro for programming the TLS Key context Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	cxgb4: LLD driver changes to support TLS	Atul Gupta
	Read the Inline TLS capability from firmware. Determine the area reserved for storing the keys Dump the Inline TLS tx and rx records count. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Michael Werner <werner@chelsio.com> Reviewed-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	cxgb4: Inline TLS FW Interface	Atul Gupta
	Key area size in hw-config file. CPL struct for TLS request and response. Work request for Inline TLS. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	ethtool: enable Inline TLS in HW	Atul Gupta
	Ethtool option enables TLS record offload on HW, user configures the feature for netdev capable of Inline TLS. This allows user to define custom sk_prot for Inline TLS sock Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	tls: support for Inline tls record	Atul Gupta
	Facility to register Inline TLS drivers to net/tls. Setup TLS_HW_RECORD prot to listen on offload device. Cases handled - Inline TLS device exists, setup prot for TLS_HW_RECORD - Atleast one Inline TLS exists, sets TLS_HW_RECORD. - If non-inline device establish connection, move to TLS_SW_TX Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller
	Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-31 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add raw BPF tracepoint API in order to have a BPF program type that can access kernel internal arguments of the tracepoints in their raw form similar to kprobes based BPF programs. This infrastructure also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which returns an anon-inode backed fd for the tracepoint object that allows for automatic detach of the BPF program resp. unregistering of the tracepoint probe on fd release, from Alexei. 2) Add new BPF cgroup hooks at bind() and connect() entry in order to allow BPF programs to reject, inspect or modify user space passed struct sockaddr, and as well a hook at post bind time once the port has been allocated. They are used in FB's container management engine for implementing policy, replacing fragile LD_PRELOAD wrapper intercepting bind() and connect() calls that only works in limited scenarios like glibc based apps but not for other runtimes in containerized applications, from Andrey. 3) BPF_F_INGRESS flag support has been added to sockmap programs for their redirect helper call bringing it in line with cls_bpf based programs. Support is added for both variants of sockmap programs, meaning for tx ULP hooks as well as recv skb hooks, from John. 4) Various improvements on BPF side for the nfp driver, besides others this work adds BPF map update and delete helper call support from the datapath, JITing of 32 and 64 bit XADD instructions as well as offload support of bpf_get_prandom_u32() call. Initial implementation of nfp packet cache has been tackled that optimizes memory access (see merge commit for further details), from Jakub and Jiong. 5) Removal of struct bpf_verifier_env argument from the print_bpf_insn() API has been done in order to prepare to use print_bpf_insn() soon out of perf tool directly. This makes the print_bpf_insn() API more generic and pushes the env into private data. bpftool is adjusted as well with the print_bpf_insn() argument removal, from Jiri. 6) Couple of cleanups and prep work for the upcoming BTF (BPF Type Format). The latter will reuse the current BPF verifier log as well, thus bpf_verifier_log() is further generalized, from Martin. 7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read and write support has been added in similar fashion to existing IPv6 IPV6_TCLASS socket option we already have, from Nikita. 8) Fixes in recent sockmap scatterlist API usage, which did not use sg_init_table() for initialization thus triggering a BUG_ON() in scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and uses a small helper sg_init_marker() to properly handle the affected cases, from Prashant. 9) Let the BPF core follow IDR code convention and therefore use the idr_preload() and idr_preload_end() helpers, which would also help idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory pressure, from Shaohua. 10) Last but not least, a spelling fix in an error message for the BPF cookie UID helper under BPF sample code, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	Merge branch 'inet-frags-bring-rhashtables-to-IP-defrag'	David S. Miller
	Eric Dumazet says: ==================== inet: frags: bring rhashtables to IP defrag IP defrag processing is one of the remaining problematic layer in linux. It uses static hash tables of 1024 buckets, and up to 128 items per bucket. A work queue is supposed to garbage collect items when host is under memory pressure, and doing a hash rebuild, changing seed used in hash computations. This work queue blocks softirqs for up to 25 ms when doing a hash rebuild, occurring every 5 seconds if host is under fire. Then there is the problem of sharing this hash table for all netns. It is time to switch to rhashtables, and allocate one of them per netns to speedup netns dismantle, since this is a critical metric these days. Lookup is now using RCU, and 64bit hosts can now provision whatever amount of memory needed to handle the expected workloads. v2: Addressed Herbert and Kirill feedbacks (Use rhashtable_free_and_destroy(), and split the big patch into small units) v3: Removed the extra add_frag_mem_limit(...) from inet_frag_create() Removed the refcount_inc_not_zero() call from inet_frags_free_cb(), as we can exploit del_timer() return value. v4: kbuild robot feedback about one missing static (squashed) Additional patches : inet: frags: do not clone skb in ip_expire() ipv6: frags: rewrite ip6_expire_frag_queue() rhashtable: reorganize struct rhashtable layout inet: frags: reorganize struct netns_frags inet: frags: get rid of ipfrag_skb_cb/FRAG_CB ipv6: frags: get rid of ip6frag_skb_cb/FRAG6_CB inet: frags: get rid of nf_ct_frag6_skb_cb/NFCT_FRAG6_CB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: get rid of nf_ct_frag6_skb_cb/NFCT_FRAG6_CB	Eric Dumazet
	nf_ct_frag6_queue() uses skb->cb[] to store the fragment offset, meaning that we could use two cache lines per skb when finding the insertion point, if for some reason inet6_skb_parm size is increased in the future. By using skb->ip_defrag_offset instead of skb->cb[] we pack all the fields in a single cache line, matching what we did for IPv4. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	ipv6: frags: get rid of ip6frag_skb_cb/FRAG6_CB	Eric Dumazet
	ip6_frag_queue uses skb->cb[] to store the fragment offset, meaning that we could use two cache lines per skb when finding the insertion point, if for some reason inet6_skb_parm size is increased in the future. By using skb->ip_defrag_offset instead of skb->cb[], we pack all the fields in a single cache line, matching what we did for IPv4. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: get rid of ipfrag_skb_cb/FRAG_CB	Eric Dumazet
	ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately this integer is currently in a different cache line than skb->next, meaning that we use two cache lines per skb when finding the insertion point. By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields in a single cache line and save precious memory bandwidth. Note that after the fast path added by Changli Gao in commit d6bebca92c66 ("fragment: add fast path for in-order fragments") this change wont help the fast path, since we still need to access prev->len (2nd cache line), but will show great benefits when slow path is entered, since we perform a linear scan of a potentially long list. Also, note that this potential long list is an attack vector, we might consider also using an rb-tree there eventually. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: reorganize struct netns_frags	Eric Dumazet
	Put the read-mostly fields in a separate cache line at the beginning of struct netns_frags, to reduce false sharing noticed in inet_frag_kill() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	rhashtable: reorganize struct rhashtable layout	Eric Dumazet
	While under frags DDOS I noticed unfortunate false sharing between @nelems and @params.automatic_shrinking Move @nelems at the end of struct rhashtable so that first cache line is shared between all cpus, because almost never dirtied. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	ipv6: frags: rewrite ip6_expire_frag_queue()	Eric Dumazet
	Make it similar to IPv4 ip_expire(), and release the lock before calling icmp functions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: do not clone skb in ip_expire()	Eric Dumazet
	An skb_clone() was added in commit ec4fbd64751d ("inet: frag: release spinlock before calling icmp_send()") While fixing the bug at that time, it also added a very high cost for DDOS frags, as the ICMP rate limit is applied after this expensive operation (skb_clone() + consume_skb(), implying memory allocations, copy, and freeing) We can use skb_get(head) here, all we want is to make sure skb wont be freed by another cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: break the 2GB limit for frags storage	Eric Dumazet
	Some users are willing to provision huge amounts of memory to be able to perform reassembly reasonnably well under pressure. Current memory tracking is using one atomic_t and integers. Switch to atomic_long_t so that 64bit arches can use more than 2GB, without any cost for 32bit arches. Note that this patch avoids an overflow error, if high_thresh was set to ~2GB, since this test in inet_frag_alloc() was never true : if (... \|\| frag_mem_limit(nf) > nf->high_thresh) Tested: $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh <frag DDOS> $ grep FRAG /proc/net/sockstat FRAG: inuse 14705885 memory 16000002880 $ nstat -n ; sleep 1 ; nstat \| grep Reas IpReasmReqds 3317150 0.0 IpReasmFails 3317112 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: remove inet_frag_maybe_warn_overflow()	Eric Dumazet
	This function is obsolete, after rhashtable addition to inet defrag. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: get rif of inet_frag_evicting()	Eric Dumazet
	This refactors ip_expire() since one indentation level is removed. Note: in the future, we should try hard to avoid the skb_clone() since this is a serious performance cost. Under DDOS, the ICMP message wont be sent because of rate limits. Fact that ip6_expire_frag_queue() does not use skb_clone() is disturbing too. Presumably IPv6 should have the same issue than the one we fixed in commit ec4fbd64751d ("inet: frag: release spinlock before calling icmp_send()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: remove some helpers	Eric Dumazet
	Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem() Also since we use rhashtable we can bring back the number of fragments in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was removed in commit 434d305405ab ("inet: frag: don't account number of fragment queues") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: use rhashtables for reassembly units	Eric Dumazet
	Some applications still rely on IP fragmentation, and to be fair linux reassembly unit is not working under any serious load. It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!) A work queue is supposed to garbage collect items when host is under memory pressure, and doing a hash rebuild, changing seed used in hash computations. This work queue blocks softirqs for up to 25 ms when doing a hash rebuild, occurring every 5 seconds if host is under fire. Then there is the problem of sharing this hash table for all netns. It is time to switch to rhashtables, and allocate one of them per netns to speedup netns dismantle, since this is a critical metric these days. Lookup is now using RCU. A followup patch will even remove the refcount hold/release left from prior implementation and save a couple of atomic operations. Before this patch, 16 cpus (16 RX queue NIC) could not handle more than 1 Mpps frags DDOS. After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB of storage for the fragments (exact number depends on frags being evicted after timeout) $ grep FRAG /proc/net/sockstat FRAG: inuse 1966916 memory 2140004608 A followup patch will change the limits for 64bit arches. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Florian Westphal <fw@strlen.de> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	rhashtable: add schedule points	Eric Dumazet
	Rehashing and destroying large hash table takes a lot of time, and happens in process context. It is safe to add cond_resched() in rhashtable_rehash_table() and rhashtable_free_and_destroy() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: refactor ipfrag_init()	Eric Dumazet
	We need to call inet_frags_init() before register_pernet_subsys(), as a prereq for following patch ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: refactor lowpan_net_frag_init()	Eric Dumazet
	We want to call lowpan_net_frag_init() earlier. Similar to commit "inet: frags: refactor ipv6_frag_init()" This is a prereq to "inet: frags: use rhashtables for reassembly units" Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: refactor ipv6_frag_init()	Eric Dumazet
	We want to call inet_frags_init() earlier. This is a prereq to "inet: frags: use rhashtables for reassembly units" Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: add a pointer to struct netns_frags	Eric Dumazet
	In order to simplify the API, add a pointer to struct inet_frags. This will allow us to make things less complex. These functions no longer have a struct inet_frags parameter : inet_frag_destroy(struct inet_frag_queue q /, struct inet_frags f /) inet_frag_put(struct inet_frag_queue q /, struct inet_frags f /) inet_frag_kill(struct inet_frag_queue q /, struct inet_frags f /) inet_frags_exit_net(struct netns_frags nf /, struct inet_frags f /) ip6_expire_frag_queue(struct net net, struct frag_queue fq) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	inet: frags: change inet_frags_init_net() return value	Eric Dumazet
	We will soon initialize one rhashtable per struct netns_frags in inet_frags_init_net(). This patch changes the return value to eventually propagate an error. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	ipv6: frag: remove unused field	Eric Dumazet
	csum field in struct frag_queue is not used, remove it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	Merge branch 'bnxt_en-next'	David S. Miller
	Michael Chan says: ==================== bnxt_en: Update for net-next. Misc. updates including updated firmware interface, some additional port statistics, a new IRQ assignment scheme for the RDMA driver, support for VF trust, and other changes and improvements for SRIOV. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Add ULP calls to stop and restart IRQs.	Michael Chan
	When the driver needs to re-initailize the IRQ vectors, we make the new ulp_irq_stop() call to tell the RDMA driver to disable and free the IRQ vectors. After IRQ vectors have been re-initailized, we make the ulp_irq_restart() call to tell the RDMA driver that IRQs can be restarted. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Reserve completion rings and MSIX for bnxt_re RDMA driver.	Michael Chan
	Add additional logic to reserve completion rings for the bnxt_re driver when it requests MSIX vectors. The function bnxt_cp_rings_in_use() will return the total number of completion rings used by both drivers that need to be reserved. If the network interface in up, we will close and open the NIC to reserve the new set of completion rings and re-initialize the vectors. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Refactor bnxt_need_reserve_rings().	Michael Chan
	Refactor bnxt_need_reserve_rings() slightly so that __bnxt_reserve_rings() can call it and remove some duplicated code. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Add IRQ remapping logic.	Michael Chan
	Add remapping logic so that bnxt_en can use any arbitrary MSIX vectors. This will allow the driver to reserve one range of MSIX vectors to be used by both bnxt_en and bnxt_re. bnxt_en can now skip over the MSIX vectors used by bnxt_re. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Change IRQ assignment for RDMA driver.	Michael Chan
	In the current code, the range of MSIX vectors allocated for the RDMA driver is disjoint from the network driver. This creates a problem for the new firmware ring reservation scheme. The new scheme requires the reserved completion rings/MSIX vectors to be in a contiguous range. Change the logic to allocate RDMA MSIX vectors to be contiguous with the vectors used by bnxt_en on new firmware using the new scheme. The new function bnxt_get_num_msix() calculates the exact number of vectors needed by both drivers. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Improve ring allocation logic.	Michael Chan
	Currently, the driver code makes some assumptions about the group index and the map index of rings. This makes the code more difficult to understand and less flexible. Improve it by adding the grp_idx and map_idx fields explicitly to the bnxt_ring_struct as a union. The grp_idx is initialized for each tx ring and rx agg ring during init. time. We do the same for the map_idx for each cmpl ring. The grp_idx ties the tx ring to the ring group. The map_idx is the doorbell index of the ring. With this new infrastructure, we can change the ring index mapping scheme easily in the future. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Improve valid bit checking in firmware response message.	Michael Chan
	When firmware sends a DMA response to the driver, the last byte of the message will be set to 1 to indicate that the whole response is valid. The driver waits for the message to be valid before reading the message. The firmware spec allows these response messages to increase in length by adding new fields to the end of these messages. The older spec's valid location may become a new field in a newer spec. To guarantee compatibility, the driver should zero the valid byte before interpreting the entire message so that any new fields not implemented by the older spec will be read as zero. For messages that are forwarded to VFs, we need to set the length and re-instate the valid bit so the VF will see the valid response. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31	bnxt_en: Improve resource accounting for SRIOV.	Michael Chan
	When VFs are created, the current code subtracts the maximum VF resources from the PF's pool. This under-estimates the resources remaining in the PF pool. Instead, we should subtract the minimum VF resources. The VF minimum resources are guaranteed to the VFs and only these should be subtracted from the PF's pool. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>