When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the following command:
# tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
$tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop
doesn't drop all IPv4 packets that match the configured TTL / destination
address. In particular, if "fragment offset" or "more fragments" have a
non-zero value in the IPv4 header, the setting of FLOW_DISSECTOR_KEY_IP is
simply ignored. Fix this by dissecting IPv4 TTL and TOS before the fragment
info; while at it, add a selftest for tc flower's match on 'ip_ttl' that
verifies the correct behavior.
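A hedged sketch of the reordering (helper and flag names are from
include/net/flow_dissector.h; the surrounding logic is simplified, not
the literal diff):
    /* dissect L3 fields first, so FLOW_DISSECTOR_KEY_IP always gets
     * filled in ... */
    if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_IP)) {
            key_ip = skb_flow_dissector_target(flow_dissector,
                                               FLOW_DISSECTOR_KEY_IP,
                                               target_container);
            key_ip->tos = iph->tos;
            key_ip->ttl = iph->ttl;
    }

    /* ... and only then look at fragment info, which may end the
     * dissection early for non-first fragments */
    if (ip_is_fragment(iph))
            key_control->flags |= FLOW_DIS_IS_FRAGMENT;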
Fixes: 518d8a2e9bad ("net/flow_dissector: add support for dissection of misc ip header fields")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Last set of updates:
* more minstrel work from Felix to reduce the
probing overhead
* QoS for nl80211 control port frames
* STBC injection support
* and a couple of small fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the udp_port_cfg struct's members change depending on the IPv6
configuration, the code in rxrpc should also check for IPv6.
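A hedged sketch of the shape of the fix (which members sit behind the
guard may differ from the actual struct layout):
    struct udp_port_cfg udp_conf;

    memset(&udp_conf, 0, sizeof(udp_conf));
    udp_conf.family = AF_INET6;
#if IS_ENABLED(CONFIG_IPV6)
    /* IPv6-only members exist only when CONFIG_IPV6 is enabled */
    udp_conf.ipv6_v6only = true;
#endif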
Fixes: 1a9b86c9fd95 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a DDP broadcast packet is sent out to a non-gateway target, it is
also looped back. There is a potential for the loopback device to have a
longer hardware header length than the original target route's device,
which can result in the skb not being created with enough room for the
loopback device's hardware header. This patch fixes the issue by
determining that a loopback will be necessary prior to allocating the
skb, and if so, ensuring the skb has enough room.
This was discovered while testing a new driver that creates a LocalTalk
network interface (LTALK_HLEN = 1). It caused an skb_under_panic.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow userspace (mptcpd) to subscribe to mptcp genl multicast events.
This implementation reuses the same event API as the mptcp kernel fork
to ease integration of existing tools, e.g. mptcpd.
Supported events include:
1. start and close of an mptcp connection
2. start and close of subflows (joins)
3. announce and withdrawals of addresses
4. subflow priority (backup/non-backup) change.
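A hedged sketch of the event set as an enum (names follow the fork's API
as stated above, values omitted; include/uapi/linux/mptcp.h in the
resulting tree is authoritative):
    enum mptcp_event_type {
            MPTCP_EVENT_CREATED,            /* connection start    */
            MPTCP_EVENT_ESTABLISHED,
            MPTCP_EVENT_CLOSED,             /* connection close    */
            MPTCP_EVENT_ANNOUNCED,          /* address announce    */
            MPTCP_EVENT_REMOVED,            /* address withdrawal  */
            MPTCP_EVENT_SUB_ESTABLISHED,    /* subflow join        */
            MPTCP_EVENT_SUB_CLOSED,
            MPTCP_EVENT_SUB_PRIORITY        /* backup flag change  */
    };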
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Once event support is added, this may need to allocate memory while the
msk lock is held with softirqs disabled.
Not using lock_fast also allows doing the allocation with GFP_KERNEL.
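A hedged illustration of why: lock_sock_fast() may return with bottom
halves still disabled on its fast path, while the plain lock_sock()
always leaves us in sleepable context ('payload_len' is an illustrative
name):
    lock_sock(sk);                              /* sleepable context    */
    skb = nlmsg_new(payload_len, GFP_KERNEL);   /* may sleep: fine here */
    release_sock(sk);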
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the first/initial subflow to the existing functions so they can
pass this on to the notification handler that is added later in the
series.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case the mptcp socket is already dead, the entire mptcp socket
will be freed. We can avoid the close check in this case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the remote side closes a subflow, we should schedule the worker to
dispose of the subflow in a timely manner.
Otherwise, the SF_CLOSED event won't be generated until the mptcp
socket itself is closing or the local side is closing another subflow.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prepare for subflow close events:
When the mptcp connection is torn down, it's enough to send the mptcp
socket close notification rather than a subflow close event for all of
the subflows followed by the mptcp close event.
This splits the helper: mptcp_close_ssk() will emit the close
notification, __mptcp_close_ssk() will not.
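A hedged sketch of the resulting split (argument list simplified, and
the mptcp_event() call shape is an assumption; the event itself lands
later in the series):
    void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
                         struct mptcp_subflow_context *subflow)
    {
            /* close notification for this one subflow ... */
            mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk,
                        GFP_KERNEL);
            /* ... while the __ variant does teardown only */
            __mptcp_close_ssk(sk, ssk, subflow);
    }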
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allows making some functions static and avoids acquiring the pm
spinlock in protocol.c.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The use-case for dropping the MTU check when TC-BPF does redirect to
ingress is described by Eyal Birger in email[0]. The summary is the
ability to increase the packet size (e.g. with IPv6 headers for NAT64),
redirect the packet to ingress, and let the normal netstack fragment the
packet as needed.
[0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/
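A hedged tc-BPF sketch of that use-case ('IPV6_GROWTH' and
'TARGET_IFINDEX' are hypothetical placeholders):
    SEC("tc")
    int grow_and_redirect(struct __sk_buff *skb)
    {
            /* grow the packet, e.g. make room for NAT64's IPv6 header */
            if (bpf_skb_adjust_room(skb, IPV6_GROWTH, BPF_ADJ_ROOM_MAC, 0))
                    return TC_ACT_SHOT;
            /* no MTU check on ingress redirect; the netstack will
             * fragment later if needed */
            return bpf_redirect(TARGET_IFINDEX, BPF_F_INGRESS);
    }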
V15:
- missing static for function declaration
V9:
- Make net_device "up" (IFF_UP) check explicit in skb_do_redirect
V4:
- Keep net_device "up" (IFF_UP) check.
- Adjustment to handle bpf_redirect_peer() helper
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287790971.790810.11785274340154740591.stgit@firesoul
|
|
This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs.
The SKB object is complex and the skb->len value (accessible from a
BPF-prog) also includes the length of any extra GRO/GSO segments, but
without taking into account that these GRO/GSO segments get transport
(L4) and network (L3) headers added before being transmitted. Thus,
this BPF-helper is created such that the BPF-programmer doesn't need to
handle these details in the BPF-prog.
The API is designed to help the BPF-programmer who wants to do packet
context size changes, which involve other helpers. These other helpers
usually do a delta size adjustment. This helper also supports a delta
size (len_diff), which allows the BPF-programmer to reuse arguments
needed by these other helpers, and to perform the MTU check prior to
doing any actual size adjustment of the packet context.
It is on purpose that we allow the len adjustment to become negative and
still pass the MTU check. This might seem weird, but it's not this
helper's responsibility to "catch" wrong len_diff adjustments. Other
helpers will take care of these checks, if the BPF-programmer chooses to
do an actual size adjustment.
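A hedged usage sketch (per the description above: ifindex 0 means the
current netdev, a return of 0 means the resulting size fits;
'encap_len' is an illustrative name):
    __u32 mtu_len = 0;

    /* check against the MTU *before* any actual size adjustment */
    if (bpf_check_mtu(skb, 0, &mtu_len, encap_len, 0)) {
            /* would exceed the MTU; mtu_len now holds the MTU value */
            return TC_ACT_OK;
    }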
V14:
- Improve man-page desc of len_diff.
V13:
- Enforce flag BPF_MTU_CHK_SEGS cannot use len_diff.
V12:
- Simplify segment check that calls skb_gso_validate_network_len.
- Helpers should return long
V9:
- Use dev->hard_header_len (instead of ETH_HLEN)
- Annotate with unlikely req from Daniel
- Fix logic error using skb_gso_validate_network_len from Daniel
V6:
- Took John's advice and dropped BPF_MTU_CHK_RELAX
- Returned MTU is kept at L3-level (like fib_lookup)
V4: Lot of changes
- ifindex 0 now use current netdev for MTU lookup
- rename helper from bpf_mtu_check to bpf_check_mtu
- fix bug for GSO pkt length (as skb->len is total len)
- remove __bpf_len_adj_positive, simply allow negative len adj
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287790461.790810.3429728639563297353.stgit@firesoul
|
|
The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
can perform an MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The
BPF-prog doesn't know the MTU value that caused this rejection.
If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191),
it needs to know this MTU value for the ICMP packet.
This patch changes the lookup and result struct bpf_fib_lookup to contain
this MTU value as output, via a union with 'tot_len', as this is the
value used for the MTU lookup.
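A hedged usage sketch of that union ('pkt_len', 'encap_len' and
'icmp_mtu' are illustrative names):
    struct bpf_fib_lookup params = {};
    int rc;

    params.tot_len = pkt_len + encap_len;   /* input to the MTU check   */
    /* fill family/ifindex/addresses as usual, then: */
    rc = bpf_fib_lookup(ctx, &params, sizeof(params), 0);
    if (rc == BPF_FIB_LKUP_RET_FRAG_NEEDED)
            icmp_mtu = params.mtu_result;   /* same bytes, now an output */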
V5:
- Fixed uninit value spotted by Dan Carpenter.
- Name struct output member mtu_result
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287789952.790810.13134700381067698781.stgit@firesoul
|
|
A BPF end-user on the Cilium Slack channel (Carlo Carraro) wants to use
bpf_fib_lookup for doing an MTU check, but *prior* to extending the
packet size, by adjusting fib_params 'tot_len' with the packet length
plus the expected encap size (just like the bpf_check_mtu helper
supports). He discovered that for the SKB ctx the param->tot_len was not
used; instead skb->len was used (via the MTU check in
is_skb_forwardable() that checks against the netdev MTU).
Fix this by using fib_params 'tot_len' for the MTU check. If not
provided (e.g. zero) then keep the existing TC behaviour intact. Notice
that the 'tot_len' MTU check is done like the XDP code-path, which
checks against the FIB-dst MTU.
V16:
- Revert V13 optimization, 2nd lookup is against egress/resulting netdev
V13:
- Only do ifindex lookup one time, calling dev_get_by_index_rcu().
V10:
- Use same method as XDP for 'tot_len' MTU check
Fixes: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status")
Reported-by: Carlo Carraro <colrack@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/161287789444.790810.15247494756551413508.stgit@firesoul
|
|
Multiple BPF-helpers that can manipulate/increase the size of the SKB use
__bpf_skb_max_len() as the max-length. This function limits the size
against the current net_device MTU (skb->dev->mtu).
When a BPF-prog grows the packet size, it should not be limited to the
MTU. The MTU is a transmit limitation, and software receiving this packet
should be allowed to increase the size. Furthermore, the current MTU
check in __bpf_skb_max_len uses the MTU from the ingress/current
net_device, which in case of redirects is the wrong net_device.
This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real
limit is elsewhere in the system. Jesper's testing[1] showed it was not
possible to exceed 8KiB when expanding the SKB size via a BPF-helper. The
limiting factor is the define KMALLOC_MAX_CACHE_SIZE, which is 8192 for
the SLUB-allocator (CONFIG_SLUB) in case PAGE_SIZE is 4096. This define
is in effect due to this being called from softirq context; see
__gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed
that frames above 16KiB can cause NICs to reset (but not crash). Keep the
sanity limit at this level, as the memory layer can differ based on
kernel config.
[1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests
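A hedged sketch of the resulting bound (the function shape is
simplified; SKB_MAX_ALLOC is the constant named above):
    static u32 __bpf_skb_max_len(const struct sk_buff *skb)
    {
            /* fixed sanity cap instead of skb->dev->mtu: growing happens
             * on the receive side, while MTU is a transmit limitation */
            return SKB_MAX_ALLOC;
    }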
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul
|
|
Allow user to disable HE mode, similar to how VHT and HT
can be disabled. Useful for testing.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20210204144610.25971-1-greearb@candelatech.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This patch adds support for STBC encoding to the radiotap tx parse
function. Prior to this change adding the STBC flag to the radiotap
header did not encode frames with STBC.
Signed-off-by: Philipp Borgers <borgers@mi.fu-berlin.de>
Link: https://lore.kernel.org/r/20210125150744.83065-1-borgers@mi.fu-berlin.de
[use u8_get_bits/u32_encode_bits instead of manually shifting]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This was added to mitigate the effects of too much sampling on devices that
use a static global fallback table instead of configurable multi-rate retry.
Now that the sampling algorithm is improved, this code path no longer performs
any better than the standard probing on affected devices.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-6-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This makes it easier to see which rates are going to be tested next.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-5-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The biggest flaw in current minstrel_ht is the fact that it needs way too
many probing packets to be able to quickly find the best rate.
Depending on the wifi hardware and operating mode, this can significantly
reduce throughput when not operating at the highest available data rate.
In order to be able to significantly reduce the amount of rate sampling,
we need a much smarter selection of probing rates.
The new approach introduced by this patch maintains a limited set of
available rates to be tested during a statistics window.
They are split into distinct categories:
- MINSTREL_SAMPLE_TYPE_INC - incremental rate upgrade:
Pick the next rate group and find the first rate that is faster than
the current max. throughput rate
- MINSTREL_SAMPLE_TYPE_JUMP - random testing of higher rates:
Pick a random rate from the next group that is faster than the current
max throughput rate. This allows faster adaptation when the link changes
significantly
- MINSTREL_SAMPLE_TYPE_SLOW - test a rate between max_prob, max_tp2 and
max_tp in order to reduce the gap between them
In order to prioritize sampling, every 6 attempts are split into 3x INC,
2x JUMP, 1x SLOW.
Available rates are checked and refilled on every stats window update.
With this approach, we finally get a very small delta in throughput when
comparing setting the optimal data rate as a fixed rate vs normal rate
control operation.
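A hedged sketch of the three categories above as an enum (values
omitted; net/mac80211/rc80211_minstrel_ht.h in the resulting tree is
authoritative):
    enum minstrel_sample_type {
            MINSTREL_SAMPLE_TYPE_INC,   /* next faster rate, in order      */
            MINSTREL_SAMPLE_TYPE_JUMP,  /* random faster rate              */
            MINSTREL_SAMPLE_TYPE_SLOW,  /* between max_prob/max_tp2/max_tp */
            __MINSTREL_SAMPLE_TYPE_MAX
    };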
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-4-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In order to be able to fall back to lower rates more gracefully, without
too many throughput fluctuations, initialize all untested rates below
tested ones to the maximum probability of higher rates.
Usually this leads to untested lower rates getting initialized with a
probability value of 100%, making them better candidates for fallback
without having to rely on random probing.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Keep the update in one place and prepare for further rework.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Get rid of a lot of divisions and modulo operations.
This reduces code size and improves performance.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Sparse started warning on this function because we can potentially
return an uninitialized value. The reason is that if the caller
passes a min_bw value that is higher than the last value in bws[], we
will not go into the loop and reg_rule will remain uninitialized. This
cannot happen because the only caller of this function uses either 1
or 20 for min_bw, but the function will be more robust if we
pre-initialize the value.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210204154439.6c884ea7281c.I257278d03b0c1ae0aa6631672cfa48f1a95d5996@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The multiplication of the u32 variables tx_time and estimated_retx is
performed using a 32 bit multiplication and the result is stored in
a u64 result. This has a potential u32 overflow issue, so avoid this
by casting tx_time to a u64 to force a 64 bit multiply.
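A hedged before/after illustration of the cast:
    u32 tx_time, estimated_retx;
    u64 result;

    result = tx_time * estimated_retx;       /* 32-bit multiply: can wrap */
    result = (u64)tx_time * estimated_retx;  /* 64-bit multiply: no wrap  */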
Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 050ac52cbe1f ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210205175352.208841-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This patch unifies sending control port frames
over nl80211 and AF_PACKET sockets a little more.
Before this patch, EAPOL frames got QoS prioritization
only when using AF_PACKET sockets.
__ieee80211_select_queue only selects a QoS-enabled queue
for control port frames when the control port protocol
is set correctly on the skb. For the AF_PACKET path this
works, but the nl80211 path used ETH_P_802_3.
Another check for injected frames in wme.c then prevented
the QoS TID from being copied into the frame.
In order to fix this, get rid of the frame injection marking
for nl80211 ctrl port and set the correct ethernet protocol.
Please note:
An earlier version of this patch tried to prevent
frame aggregation for control port frames in order to speed up
the initial connection setup a little. This seemed to cause
issues on my older Intel dvm-based hardware, and was therefore
removed again. Future commits which try to reintroduce this
have to check carefully how hw behaves with aggregated and
non-aggregated traffic for the same TID.
My NIC: Intel(R) Centrino(R) Ultimate-N 6300 AGN, REV=0x74
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20210206115112.567881-1-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The ieee80211 class registers a callback which actually does nothing.
Given that the callback is optional, and all its accesses are protected
by a NULL check, remove it entirely.
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Link: https://lore.kernel.org/r/20210208113356.4105-1-mcroce@linux.microsoft.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Compile-in the socket RX queue mapping field and logic when TLS_DEVICE
is enabled. This allows device drivers to pick the recorded socket's
RX queue and use it for stream distribution.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use a new config, SOCK_RX_QUEUE_MAPPING, to compile-in the socket
RX queue field and logic, instead of the XPS config.
This breaks the dependency on XPS, and allows selecting it from non-XPS
use cases, as we do in the next patch.
In addition, use the new flag to wrap the logic in sk_rx_queue_get()
and protect access to the sk_rx_queue_mapping field, while keeping
the function exposed unconditionally, just like sk_rx_queue_set()
and sk_rx_queue_clear().
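A hedged sketch of the wrapped accessor (simplified from
include/net/sock.h; NO_QUEUE_MAPPING is the sentinel used there):
    static inline int sk_rx_queue_get(const struct sock *sk)
    {
    #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING
            if (sk && sk->sk_rx_queue_mapping != NO_QUEUE_MAPPING)
                    return sk->sk_rx_queue_mapping;
    #endif
            return -1;
    }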
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we do not schedule the MPTCP retransmission
timer after pushing the data when such action happens
in the subflow context.
This may cause hang-ups in active-backup scenarios, or
even when only single-subflow msks are involved, if we lose
some peer's ack.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the mptcp_cleanup_rbuf() related checks inside the mentioned
helper and extend them to mirror the TCP checks more closely.
Additionally, drop the 'rmem_pending' hack; since commit 879526030c8b
("mptcp: protect the rx path with the msk socket spinlock") we
can use 'rmem_released' instead.
Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mptcp subflow route_req() callback performs the subflow
req initialization after the route_req() check. If the latter
fails, the mptcp-specific bits of the current request socket
are left uninitialized.
The above causes bad things at req socket disposal time, when
the mptcp resources are cleared.
This change addresses the issue by splitting subflow_init_req()
into the actual initialization and the mptcp-specific checks.
The initialization is moved before any possibly failing check.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 7ea851d19b23 ("tcp: merge 'init_req' and 'route_req' functions")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzkaller was able to trigger the following splat again:
WARNING: CPU: 1 PID: 12512 at net/mptcp/protocol.c:761 mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761
Modules linked in:
CPU: 1 PID: 12512 Comm: kworker/1:6 Not tainted 5.10.0-rc6 #52
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761
Code: e8 4b 0c ad ff e8 56 21 88 fe 48 b8 00 00 00 00 00 fc ff df 48 c7 04 03 00 00 00 00 48 83 c4 40 5b 5d 41 5c c3 e8 36 21 88 fe <0f> 0b 41 bc c8 00 00 00 eb 98 e8 e7 b1 af fe e9 30 ff ff ff 48 c7
RSP: 0018:ffffc900018c7c68 EFLAGS: 00010293
RAX: ffff888108cb1c80 RBX: 1ffff92000318f8d RCX: ffffffff82ad0307
RDX: 0000000000000000 RSI: ffffffff82ad036a RDI: 0000000000000007
RBP: ffff888113e2d000 R08: ffff888108cb1c80 R09: ffffed10227c5ab7
R10: ffff888113e2d5b7 R11: ffffed10227c5ab6 R12: 0000000000000000
R13: ffff88801f100000 R14: ffff888113e2d5b0 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88811b500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd76a874ef8 CR3: 000000001689c005 CR4: 0000000000170ee0
Call Trace:
mptcp_worker+0xaa4/0x1560 net/mptcp/protocol.c:2334
process_one_work+0x8d3/0x1200 kernel/workqueue.c:2272
worker_thread+0x9c/0x1090 kernel/workqueue.c:2418
kthread+0x303/0x410 kernel/kthread.c:292
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296
The mptcp_worker tries to update the MPTCP retransmission timer
even if such timer is not currently scheduled.
The mptcp_rtx_head() return value is bogus: we can have enqueued
data not yet transmitted. The above may additionally cause spurious,
unneeded MPTCP-level retransmissions.
Fix the issue by adding an explicit clearing of the rtx queue before
trying to retransmit, and by checking for unacked data.
Additionally drop an unneeded timer stop call and the unused
mptcp_rtx_tail() helper.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current mptcp_poll() implementation gives unexpected
results after shutdown(SEND_SHUTDOWN) and when the msk
status is TCP_CLOSE.
Set the correct mask.
Fixes: 8edf08649eed ("mptcp: rework poll+nospace handling")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently all errors received on msk subflows are ignored.
We need to catch at least the errors on connect() and
on fallback sockets.
Use a custom sk_error_report callback at subflow level,
and do the real action under the msk socket lock - via
the usual sock_owned_by_user()/release_callback() schema.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Explicitly define reserved field and require it and any subsequent
fields to be zero-valued for now. Additionally, limit the valid CMSG
flags that tcp_zerocopy_receive accepts.
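A hedged sketch of the validation pattern described above (the
flag-mask name is an assumption):
    if (zc->reserved)
            return -EINVAL;                 /* must be zero for now      */
    if (zc->msg_flags & ~TCP_VALID_ZC_MSG_FLAGS)
            return -EINVAL;                 /* limit accepted CMSG flags */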
Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:
Thread 1                                   Thread 2
// eth_commit_mac_addr_change()
memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
                                           // dev_ifsioc_locked()
                                           memcpy(ifr->ifr_hwaddr.sa_data,
                                                  dev->dev_addr, ...);
Close this race condition by guarding them with an RW semaphore,
like netdev_get_name(). We cannot use a seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on the ioctl and rtnetlink paths.
Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.
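A hedged sketch of the guarded reader side (simplified; 'dev_addr_sem'
follows the RW-semaphore approach described above):
    static DECLARE_RWSEM(dev_addr_sem);

    /* dev_ifsioc_locked(), SIOCGIFHWADDR path: */
    down_read(&dev_addr_sem);
    memcpy(ifr->ifr_hwaddr.sa_data, dev->dev_addr, dev->addr_len);
    up_read(&dev_addr_sem);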
Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL
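A minimal sketch of a helper meeting those two requirements (the helper
name is an assumption, not necessarily the patch's final one):
    BPF_CALL_1(bpf_get_socket_ptr_cookie, struct sock *, sk)
    {
            /* sock_gen_cookie() works from sleepable context; tolerate a
             * NULL sock coming from the tracing context */
            return sk ? sock_gen_cookie(sk) : 0;
    }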
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2021-02-11
Here's the main bluetooth-next pull request for 5.12:
- Add support for advertising monitor offloading using Microsoft
vendor extensions
- Add firmware download support for MediaTek MT7921U USB devices
- Suspend-related fixes for Qualcomm devices
- Add support for Intel GarfieldPeak controller
- Various other smaller fixes & cleanups
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function fib6_walk_continue() cannot return a positive value when
called from register_fib_notifier(), but ignoring this possibility causes
static analyzers to generate warnings in users of register_fib_notifier()
that try to convert the returned error code to a pointer with ERR_PTR().
Handle such a case by explicitly checking for positive error values and
converting them to -EINVAL in fib6_tables_dump().
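A hedged sketch of that normalization:
    /* at the end of fib6_tables_dump(): */
    if (err > 0)
            err = -EINVAL;  /* cannot happen from this call path, but
                             * keeps ERR_PTR() users analyzer-clean */
    return err;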
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Due to the fact that ic_dev->dev is kept open in ic_close_devs, I had
thought that ic_dev would not be freed either. But that is not the case:
instead, "everybody dies" when ipconfig cleans up, and just the
net_device behind ic_dev->dev remains allocated, but not ic_dev itself.
This is a problem because in ic_close_devs, for every net device that
we're about to close, we compare it against the list of lower interfaces
of ic_dev, to figure out whether we should close it or not. But since
ic_dev itself is subject to freeing, this means that at some point in
the middle of the list of ipconfig interfaces, ic_dev will have been
freed, and we would be still attempting to iterate through its list of
lower interfaces while checking whether to bring down the remaining
ipconfig interfaces.
There are multiple ways to avoid the use-after-free: we could delay
freeing ic_dev until the very end (outside the while loop). Or an even
simpler one: we can observe that we don't need ic_dev when iterating
through its lowers, only ic_dev->dev, a structure which isn't ever freed.
So, by keeping ic_dev->dev in a variable assigned prior to freeing
ic_dev, we can avoid all use-after-free issues.
Fixes: 46acf7bdbc72 ("Revert "net: ipv4: handle DSA enabled master network devices"")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is simpler to make net->net_cookie a plain u64
written once in setup_net() instead of looping
and using atomic64 helpers.
Lorenz Bauer wants to add a SO_NETNS_COOKIE socket option,
and this patch makes his patch series simpler.
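A hedged sketch of the simpler scheme (the generator name is an
assumption; the point is that the field becomes a plain u64 written
exactly once, so readers no longer need an atomic64 loop):
    static atomic64_t cookie_gen;

    /* in setup_net(), before the namespace becomes visible: */
    net->net_cookie = atomic64_inc_return(&cookie_gen);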
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add offloading for HSR/PRP (IEC 62439-3) tag insertion, tag removal,
forwarding and duplication supported by the xrs7000 series switches.
Only HSR v1 and PRP v1 are supported by the xrs7000 series switches (HSR
v0 is not).
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion,
tag removal, duplicate generation and forwarding on DSA switches.
Add DSA_NOTIFIER_HSR_JOIN and DSA_NOTIFIER_HSR_LEAVE, which trigger calls
to .port_hsr_join and .port_hsr_leave in the DSA driver for the switch;
a sketch follows the flag list below.
The DSA switch driver should then set netdev feature flags for the
HSR/PRP operation that it offloads:
NETIF_F_HW_HSR_TAG_INS
NETIF_F_HW_HSR_TAG_RM
NETIF_F_HW_HSR_FWD
NETIF_F_HW_HSR_DUP
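A hedged sketch of the driver-side hook ('foo' is a placeholder driver;
the signature follows the .port_hsr_join addition described above):
    static int foo_port_hsr_join(struct dsa_switch *ds, int port,
                                 struct net_device *hsr)
    {
            /* program tag insertion/removal, duplication and forwarding
             * for this port; the driver then sets the NETIF_F_HW_HSR_*
             * flags it actually offloads on the slave netdev */
            return 0;
    }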
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion,
tag removal, duplicate generation and forwarding.
For HSR, insertion involves the switch adding a 6 byte HSR header after
the 14 byte Ethernet header. For PRP it adds a 6 byte trailer.
Tag removal involves automatically stripping the HSR/PRP header/trailer
in the switch. This is possible when the switch also performs auto
deduplication using the HSR/PRP header/trailer (making it no longer
required).
Forwarding involves automatically forwarding between redundant ports in
an HSR. This is crucial because delay is accumulated as a frame passes
through each node in the ring.
Duplication involves the switch automatically sending a single frame
from the CPU port to both redundant ports. This is required because the
inserted HSR/PRP header/trailer must contain the same sequence number
on the frames sent out both redundant ports.
Export is_hsr_master so DSA can tell HSR masters apart from other
devices in dsa_slave_changeupper.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For a switch to offload insertion of HSR/PRP tags, frames must not be
sent to the CPU-facing switch port with a tag. Generate supervision frames
(eth type ETH_P_PRP) without the HSR v1 (ETH_P_HSR)/PRP tag and rely on
create_tagged_frame, which inserts it later. This will allow skipping the
tag insertion for all outgoing frames in the future, which is required for
HSR v1/PRP tag insertion to be offloaded.
HSR v0 supervision frames always contain tag information so insertion of
the tag can't be offloaded. IEC 62439-3 Ed.2.0 (HSR v1) specifically
notes that this was changed since v0 to allow offloading.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash
Port Selection Algorithm), a patient attacker could still be able
to collect enough state from an otherwise idle host.
The idea of this patch is to inject some noise in the
cases where __inet_hash_connect() found a candidate on the first
attempt.
This noise should not significantly reduce the collision
avoidance, and should be zero if the connection table
is already well used.
Note that this is not implementing RFC 6056 3.3.5
because we think Algorithm 5 could hurt typical
workloads.
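A hedged sketch of the shape of that noise ('attempt' and 'port_offset'
are illustrative names, and the constants are an assumption; the step of
2 preserves the odd/even split between connect() and bind() ports):
    if (!attempt)   /* very first candidate was free: table lightly used */
            port_offset += 2 * (prandom_u32() & 7);   /* 0..14 extra */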
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC 6056 (Recommendations for Transport-Protocol Port Randomization)
provides a good summary of why source port selection needs extra care.
David Dworken reminded us that Linux implements Algorithm 3
as described in RFC 6056 3.3.3.
Quoting David:
In the context of the web, this creates an interesting info leak where
websites can count how many TCP connections a user's computer is
establishing over time. For example, this allows a website to count
exactly how many subresources a third party website loaded.
This also allows:
- Distinguishing between different users behind a VPN based on
distinct source port ranges.
- Tracking users over time across multiple networks.
- Covert communication channels between different browsers/browser
profiles running on the same computer
- Tracking what applications are running on a computer based on
the pattern of how fast source ports are getting incremented.
Section 3.3.4 describes an enhancement that reduces
attackers' ability to use the basic information currently
stored in the shared 'u32 hint'.
This change also decreases the collision rate when
multiple applications need to connect() to
different destinations.
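A hedged sketch of Algorithm 4's shape, following RFC 6056's pseudocode
rather than the kernel source ('F', 'G' and 'table' are the RFC's
names):
    u32 offset = F(saddr, daddr, dport, secret1);   /* flow-keyed start */
    u32 index  = G(saddr, daddr, dport, secret2) % TABLE_LENGTH;
    u16 port   = min_port + (offset + table[index]++) % num_ports;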
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|