Age | Commit message (Collapse) | Author |
|
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"Lots of bug fixes here:
1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer.
2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John
Crispin.
3) Use after free in psock backlog workqueue, from John Fastabend.
4) Fix source port matching in fdb peer flow rule of mlx5, from Raed
Salem.
5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet.
6) Network header needs to be set for packet redirect in nfp, from
John Hurley.
7) Fix udp zerocopy refcnt, from Willem de Bruijn.
8) Don't assume linear buffers in vxlan and geneve error handlers,
from Stefano Brivio.
9) Fix TOS matching in mlxsw, from Jiri Pirko.
10) More SCTP cookie memory leak fixes, from Neil Horman.
11) Fix VLAN filtering in rtl8366, from Linus Walluij.
12) Various TCP SACK payload size and fragmentation memory limit fixes
from Eric Dumazet.
13) Use after free in pneigh_get_next(), also from Eric Dumazet.
14) LAPB control block leak fix from Jeremy Sowden"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
lapb: fixed leak of control-blocks.
tipc: purge deferredq list for each grp member in tipc_group_delete
ax25: fix inconsistent lock state in ax25_destroy_timer
neigh: fix use-after-free read in pneigh_get_next
tcp: fix compile error if !CONFIG_SYSCTL
hv_sock: Suppress bogus "may be used uninitialized" warnings
be2net: Fix number of Rx queues used for flow hashing
net: handle 802.1P vlan 0 packets properly
tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
tcp: add tcp_min_snd_mss sysctl
tcp: tcp_fragment() should apply sane memory limits
tcp: limit payload size of sacked skbs
Revert "net: phylink: set the autoneg state in phylink_phy_change"
bpf: fix nested bpf tracepoints with per-cpu data
bpf: Fix out of bounds memory access in bpf_sk_storage
vsock/virtio: set SOCK_DONE on peer shutdown
net: dsa: rtl8366: Fix up VLAN filtering
net: phylink: set the autoneg state in phylink_phy_change
net: add high_order_alloc_disable sysctl/static key
tcp: add tcp_tx_skb_cache sysctl
...
|
|
Using a bare block cipher in non-crypto code is almost always a bad idea,
not only for security reasons (and we've seen some examples of this in
the kernel in the past), but also for performance reasons.
In the TCP fastopen case, we call into the bare AES block cipher one or
two times (depending on whether the connection is IPv4 or IPv6). On most
systems, this results in a call chain such as
crypto_cipher_encrypt_one(ctx, dst, src)
crypto_cipher_crt(tfm)->cit_encrypt_one(crypto_cipher_tfm(tfm), ...);
aesni_encrypt
kernel_fpu_begin();
aesni_enc(ctx, dst, src); // asm routine
kernel_fpu_end();
It is highly unlikely that the use of special AES instructions has a
benefit in this case, especially since we are doing the above twice
for IPv6 connections, instead of using a transform which can process
the entire input in one go.
We could switch to the cbcmac(aes) shash, which would at least get
rid of the duplicated overhead in *some* cases (i.e., today, only
arm64 has an accelerated implementation of cbcmac(aes), while x86 will
end up using the generic cbcmac template wrapping the AES-NI cipher,
which basically ends up doing exactly the above). However, in the given
context, it makes more sense to use a light-weight MAC algorithm that
is more suitable for the purpose at hand, such as SipHash.
Since the output size of SipHash already matches our chosen value for
TCP_FASTOPEN_COOKIE_SIZE, and given that it accepts arbitrary input
sizes, this greatly simplifies the code as well.
NOTE: Server farms backing a single server IP for load balancing purposes
and sharing a single fastopen key will be adversely affected by
this change unless all systems in the pool receive their kernel
upgrades at the same time.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The phy_reset hook is not set anywhere. Drop it to make
stmmac_mdio_reset() smaller.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Only OF platforms use the reset delays and these delays are only read in
stmmac_mdio_reset(). Move them from struct stmmac_mdio_bus_data to a
stack variable inside stmmac_mdio_reset() because that's the only usage
of these delays.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No platform uses the "reset_gpio" field from stmmac_mdio_bus_data
anymore. Drop it so we don't get any new consumers either.
Plain GPIO numbers are being deprecated in favor of GPIO descriptors. If
needed any new non-OF platform can add a GPIO descriptor lookup table.
devm_gpiod_get_optional() will find the GPIO in that case.
Suggested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"The accumulated fixes from this and last week:
- Fix vmalloc TLB flush and map range calculations which lead to
stale TLBs, spurious faults and other hard to diagnose issues.
- Use fault_in_pages_writable() for prefaulting the user stack in the
FPU code as it's less fragile than the current solution
- Use the PF_KTHREAD flag when checking for a kernel thread instead
of current->mm as the latter can give the wrong answer due to
use_mm()
- Compute the vmemmap size correctly for KASLR and 5-Level paging.
Otherwise this can end up with a way too small vmemmap area.
- Make KASAN and 5-level paging work again by making sure that all
invalid bits are masked out when computing the P4D offset. This
worked before but got broken recently when the LDT remap area was
moved.
- Prevent a NULL pointer dereference in the resource control code
which can be triggered with certain mount options when the
requested resource is not available.
- Enforce ordering of microcode loading vs. perf initialization on
secondary CPUs. Otherwise perf tries to access a non-existing MSR
as the boot CPU marked it as available.
- Don't stop the resource control group walk early otherwise the
control bitmaps are not updated correctly and become inconsistent.
- Unbreak kgdb by returning 0 on success from
kgdb_arch_set_breakpoint() instead of an error code.
- Add more Icelake CPU model defines so depending changes can be
queued in other trees"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback
x86/kasan: Fix boot with 5-level paging and KASAN
x86/fpu: Don't use current->mm to check for a kthread
x86/kgdb: Return 0 from kgdb_arch_set_breakpoint()
x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled
x86/resctrl: Don't stop walking closids when a locksetup group is found
x86/fpu: Update kernel's FPU state before using for the fsave header
x86/mm/KASLR: Compute the size of the vmemmap section properly
x86/fpu: Use fault_in_pages_writeable() for pre-faulting
x86/CPU: Add more Icelake model numbers
mm/vmalloc: Avoid rare case of flushing TLB with weird arguments
mm/vmalloc: Fix calculation of direct map addr range
|
|
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.
This forces the stack to send packets with a very high network/cpu
overhead.
Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.
In some cases, it can be useful to increase the minimal value
to a saner value.
We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.
Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.
We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflow 32bit counters.
TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.
A new SNMP counter is added to monitor how many times TCP
did not allow to split an skb if the allowance was exceeded.
Note that this counter might increase in the case applications
use SO_SNDBUF socket option to lower sk_sndbuf.
CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
socket is already using more than half the allowed space
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :
BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48
An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.
This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.
Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.
CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-06-15
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix stack layout of JITed x64 bpf code, from Alexei.
2) fix out of bounds memory access in bpf_sk_storage, from Arthur.
3) fix lpm trie walk, from Jonathan.
4) fix nested bpf_perf_event_output, from Matt.
5) and several other fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This config option makes only couple of lines optional.
Two small helpers and an int in couple of cls structs.
Remove the config option and always compile this in.
This saves the user from unexpected surprises when he adds
a filter with ingress device match which is silently ignored
in case the config option is not set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because of PHYLINK conversion we stopped parsing the phy-handle property
from DT. Unfortunatelly, some wrapper drivers still rely on this phy
node to configure the PHY.
Let's restore the parsing of PHY handle while these wrapper drivers are
not fully converted to PHYLINK.
Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adric Blake reported the following warning during suspend-resume:
Enabling non-boot CPUs ...
x86: Booting SMP configuration:
smpboot: Booting Node 0 Processor 1 APIC 0x2
unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
Call Trace:
intel_set_tfa
intel_pmu_cpu_starting
? x86_pmu_dead_cpu
x86_pmu_starting_cpu
cpuhp_invoke_callback
? _raw_spin_lock_irqsave
notify_cpu_starting
start_secondary
secondary_startup_64
microcode: sig=0x806ea, pf=0x80, revision=0x96
microcode: updated to revision 0xb4, date = 2019-04-01
CPU1 is up
The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
by microcode. The log above shows that the microcode loader callback
happens after the PMU restoration, leading to the conjecture that
because the microcode hasn't been updated yet, that MSR is not present
yet, leading to the #GP.
Add a microcode loader-specific hotplug vector which comes before
the PERF vectors and thus executes earlier and makes sure the MSR is
present.
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Adric Blake <promarbler14@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: x86@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This has an unusually high density of tricky fixes:
- task_get_css() could deadlock when it races against a dying cgroup.
- cgroup.procs didn't list thread group leaders with live threads.
This could mislead readers to think that a cgroup is empty when
it's not. Fixed by making PROCS iterator include dead tasks. I made
a couple mistakes making this change and this pull request contains
a couple follow-up patches.
- When cpusets run out of online cpus, it updates cpusmasks of member
tasks in bizarre ways. Joel improved the behavior significantly"
* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: restore sanity to cpuset_cpus_allowed_fallback()
cgroup: Fix css_task_iter_advance_css_set() cset skip condition
cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
cgroup: Include dying leaders with live threads in PROCS iterations
cgroup: Implement css_task_iter_skip()
cgroup: Call cgroup_release() before __exit_signal()
docs cgroups: add another example size for hugetlb
cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
|
|
Pull drm fixes from Daniel Vetter:
"Nothing unsettling here, also not aware of anything serious still
pending.
The edid override regression fix took a bit longer since this seems to
be an area with an overabundance of bad options. But the fix we have
now seems like a good path forward.
Next week it should be back to Dave.
Summary:
- fix regression on amdgpu on SI
- fix edid override regression
- driver fixes: amdgpu, i915, mediatek, meson, panfrost
- fix writecombine for vmap in gem-shmem helper (used by panfrost)
- add more panel quirks"
* tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drm: (25 commits)
drm/amdgpu: return 0 by default in amdgpu_pm_load_smu_firmware
drm/amdgpu: Fix bounds checking in amdgpu_ras_is_supported()
drm: add fallback override/firmware EDID modes workaround
drm/edid: abstract override/firmware EDID retrieval
drm/i915/perf: fix whitelist on Gen10+
drm/i915/sdvo: Implement proper HDMI audio support for SDVO
drm/i915: Fix per-pixel alpha with CCS
drm/i915/dmc: protect against reading random memory
drm/i915/dsi: Use a fuzzy check for burst mode clock check
drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
drm/panfrost: Require the simple_ondemand governor
drm/panfrost: make devfreq optional again
drm/gem_shmem: Use a writecombine mapping for ->vaddr
drm: panel-orientation-quirks: Add quirk for GPD MicroPC
drm: panel-orientation-quirks: Add quirk for GPD pocket2
drm/meson: fix G12A primary plane disabling
drm/meson: fix primary plane disabling
drm/meson: fix G12A HDMI PLL settings for 4K60 1000/1001 variations
drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable()
drm/mediatek: clear num_pipes when unbind driver
...
|
|
A port may trigger operations on its dedicated CPU port, so using
cpu_dp as const will raise warnings. Make cpu_dp non const.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
>From linux-3.7, (commit 5640f7685831 "net: use a per task frag
allocator") TCP sendmsg() has preferred using order-3 allocations.
While it gives good results for most cases, we had reports
that heavy uses of TCP over loopback were hitting a spinlock
contention in page allocations/freeing.
This commits adds a sysctl so that admins can opt-in
for order-0 allocations. Hopefully mm layer might optimize
order-3 allocations in the future since it could give us
a nice boost (see 8 lines of following benchmark)
The following benchmark shows a win when more than 8 TCP_STREAM
threads are running (56 x86 cores server in my tests)
for thr in {1..30}
do
sysctl -wq net.core.high_order_alloc_disable=0
T0=`./super_netperf $thr -H 127.0.0.1 -l 15`
sysctl -wq net.core.high_order_alloc_disable=1
T1=`./super_netperf $thr -H 127.0.0.1 -l 15`
echo $thr:$T0:$T1
done
1: 49979: 37267
2: 98745: 76286
3: 141088: 110051
4: 177414: 144772
5: 197587: 173563
6: 215377: 208448
7: 241061: 234087
8: 267155: 263373
9: 295069: 297402
10: 312393: 335213
11: 340462: 368778
12: 371366: 403954
13: 412344: 443713
14: 426617: 473580
15: 474418: 507861
16: 503261: 538539
17: 522331: 563096
18: 532409: 567084
19: 550824: 605240
20: 525493: 641988
21: 564574: 665843
22: 567349: 690868
23: 583846: 710917
24: 588715: 736306
25: 603212: 763494
26: 604083: 792654
27: 602241: 796450
28: 604291: 797993
29: 611610: 833249
30: 577356: 841062
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Feng Tang reported a performance regression after introduction
of per TCP socket tx/rx caches, for TCP over loopback (netperf)
There is high chance the regression is caused by a change on
how well the 32 KB per-thread page (current->task_frag) can
be recycled, and lack of pcp caches for order-3 pages.
I could not reproduce the regression myself, cpus all being
spinning on the mm spinlocks for page allocs/freeing, regardless
of enabling or disabling the per tcp socket caches.
It seems best to disable the feature by default, and let
admins enabling it.
MM layer either needs to provide scalable order-3 pages
allocations, or could attempt a trylock on zone->lock if
the caller only attempts to get a high-order page and is
able to fallback to order-0 ones in case of pressure.
Tests run on a 56 cores host (112 hyper threads)
- 35.49% netperf [kernel.vmlinux] [k] queued_spin_lock_slowpath
- 35.49% queued_spin_lock_slowpath
- 18.18% get_page_from_freelist
- __alloc_pages_nodemask
- 18.18% alloc_pages_current
skb_page_frag_refill
sk_page_frag_refill
tcp_sendmsg_locked
tcp_sendmsg
inet_sendmsg
sock_sendmsg
__sys_sendto
__x64_sys_sendto
do_syscall_64
entry_SYSCALL_64_after_hwframe
__libc_send
+ 17.31% __free_pages_ok
+ 31.43% swapper [kernel.vmlinux] [k] intel_idle
+ 9.12% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
+ 6.53% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string
+ 0.69% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 0.68% netperf [kernel.vmlinux] [k] skb_release_data
+ 0.52% netperf [kernel.vmlinux] [k] tcp_sendmsg_locked
0.46% netperf [kernel.vmlinux] [k] _raw_spin_lock_irqsave
Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of relying on rps_needed, it is safer to use a separate
static key, since we do not want to enable TCP rx_skb_cache
by default. This feature can cause huge increase of memory
usage on hosts with millions of sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert proc_dointvec_minmax_bpf_stats() into a more generic
helper, since we are going to use jump labels more often.
Note that sysctl_bpf_stats_enabled is removed, since
it is no longer needed/used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we want to set a EDT time for the skb we want to send
via ip_send_unicast_reply(), we have to pass a new parameter
and initialize ipc.sockc.transmit_time with it.
This fixes the EDT time for ACK/RST packets sent on behalf of
a TIME_WAIT socket.
Fixes: a842fe1425cb ("tcp: add optional per socket transmit delay")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-06-13
Mlx5 devlink health fw reporters and sw reset support
This series provides mlx5 firmware reset support and firmware devlink health
reporters.
1) Add initial mlx5 kernel documentation and include devlink health reporters
2) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot
3) Issue software reset upon FW asserts
4) Add fw and fw_fatal devlink heath reporters to follow fw errors indication by
dump and recover procedures and enable trigger these functionality by user.
4.1) fw reporter:
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it and any other fw trace into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.
4.2) fw_fatal repoter:
The fw_fatal reporter implements dump and recover callbacks.
It follows fatal errors indications by CR-space dump and recover flow.
The CR-space dump uses vsc interface which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors. The
CR-space dump is stored as a memory region snapshot to ease read by address.
The recover function runs recover flow which reloads the driver and triggers fw
reset if needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This interface is currently only defined if CONFIG_JUMP_LABEL. Make it
available also when jump labels are off.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Switch stmmac_mdio_reset to use GPIO descriptors. GPIO core handles the
"snps,reset-gpio" for GPIO descriptors so we don't need to take care of
it inside the driver anymore.
The advantage of this is that we now preserve the GPIO flags which are
passed via devicetree. This is required on some newer Amlogic boards
which use an Open Drain pin for the reset GPIO. This pin can only output
a LOW signal or switch to input mode but it cannot output a HIGH signal.
There are already devicetree bindings for these special cases and GPIO
core already takes care of them but only if we use GPIO descriptors
instead of GPIO numbers.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Many changes all over:
* HE (802.11ax) work continues
* WPA3 offloads
* work on extended key ID handling continues
* fixes to honour AP supported rates with auth/assoc frames
* nl80211 netlink policy improvements to fix some issues
with strict validation on new commands with old attrs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Various fixes, all over:
* a few memory leaks
* fixes for management frame protection security
and A2/A3 confusion (affecting TDLS as well)
* build fix for certificates
* etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While reviewing the DPAA2 work, it has become apparent that we need
better documentation about which members of the phylink link state
structure are valid in the mac_config call. Improve this
documentation.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/devm_memremap_pages: fix final page put race
PCI/P2PDMA: track pgmap references per resource, not globally
lib/genalloc: introduce chunk owners
PCI/P2PDMA: fix the gen_pool_add_virt() failure path
mm/devm_memremap_pages: introduce devm_memunmap_pages
drivers/base/devres: introduce devm_release_action()
mm/vmscan.c: fix trying to reclaim unevictable LRU page
coredump: fix race condition between collapse_huge_page() and core dumping
mm/mlock.c: change count_mm_mlocked_page_nr return type
mm: mmu_gather: remove __tlb_reset_range() for force flush
fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
mm/vmscan.c: fix recent_rotated history
mm/mlock.c: mlockall error for flag MCL_ONFAULT
scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE
mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
mm: memcontrol: don't batch updates of local VM stats and events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"It might feel like deja vu to receive a bulk of changes at rc5, and it
happens again; we've got a collection of fixes for ASoC. Most of fixes
are targeted for the newly merged SOF (Sound Open Firmware) stuff and
the relevant fixes for Intel platforms.
Other than that, there are a few regression fixes for the recent ASoC
core changes and HD-audio quirk, as well as a couple of FireWire fixes
and for other ASoC codecs"
* tag 'sound-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (54 commits)
Revert "ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops"
ALSA: ice1712: Check correct return value to snd_i2c_sendbytes (EWS/DMX 6Fire)
ALSA: oxfw: allow PCM capture for Stanton SCS.1m
ALSA: firewire-motu: fix destruction of data for isochronous resources
ASoC: Intel: sst: fix kmalloc call with wrong flags
ASoC: core: Fix deadlock in snd_soc_instantiate_card()
SoC: rt274: Fix internal jack assignment in set_jack callback
ALSA: hdac: fix memory release for SST and SOF drivers
ASoC: SOF: Intel: hda: use the defined ppcap functions
ASoC: core: move DAI pre-links initiation to snd_soc_instantiate_card
ASoC: Intel: cht_bsw_rt5672: fix kernel oops with platform_name override
ASoC: Intel: cht_bsw_nau8824: fix kernel oops with platform_name override
ASoC: Intel: bytcht_es8316: fix kernel oops with platform_name override
ASoC: Intel: cht_bsw_max98090: fix kernel oops with platform_name override
ASoC: sun4i-i2s: Add offset to RX channel select
ASoC: sun4i-i2s: Fix sun8i tx channel offset mask
ASoC: max98090: remove 24-bit format support if RJ is 0
ASoC: da7219: Fix build error without CONFIG_I2C
ASoC: SOF: Intel: hda: Fix COMPILE_TEST build error
ASoC: SOF: fix DSP oops definitions in FW ABI
...
|
|
cfg80211_remain_on_channel_expired is used to notify userspace when
the remain on channel duration expired by sending an event. There is
no such equivalent to CMD_FRAME, where if offchannel and a duration
is provided, the card will go offchannel for that duration. Currently
there is no way for userspace to tell when that duration expired
apart from setting an independent timeout. This timeout is quite
erroneous as the kernel may not immediately send out the frame
because of scheduling or work queue delays. In testing, it was found
this timeout had to be quite large to accomidate any potential delays.
A better solution is to have the kernel send an event when this
duration has expired. There is already NL80211_CMD_FRAME_WAIT_CANCEL
which can be used to cancel a NL80211_CMD_FRAME offchannel. Using this
command matches perfectly to how NL80211_CMD_CANCEL_REMAIN_ON_CHANNEL
works, where its both used to cancel and notify if the duration has
expired.
Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There's no rate control algorithm that *doesn't* want to call
it internally, and calling it internally will let us modify
its behaviour in the future.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add the "OBSS Narrow Bandwidth RU In OFDMA Tolerance Support" flag
definition to the definitions of the flags covered by the Extended
Capability IE.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add a function that iterates over the BSS entries associated with a
given wiphy and calls a callback for each iterated BSS. This can be
used by drivers in various ways, e.g., to evaluate some property for
all the BSSs in the medium.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Allow the userland daemon to en/disable TWT support for an AP.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
[simplify parsing code]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Turn TWT for STA interfaces when they associate and/or receive a
beacon where the twt_responder bit has changed.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Require that each vendor command give a policy of its sub-attributes
in NL80211_ATTR_VENDOR_DATA, and then (stricly) check the contents,
including the NLA_F_NESTED flag that we couldn't check on the outer
layer because there we don't know yet.
It is possible to use VENDOR_CMD_RAW_DATA for raw data, but then no
nested data can be given (NLA_F_NESTED flag must be clear) and the
data is just passed as is to the command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This function is similar to ieee80211_get_he_sta_cap() but allows passing
the iftype. Also make ieee80211_get_he_sta_cap() use the new helper
rather than duplicating the code.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Let drivers advertise support for station-mode SAE authentication
offload with a new NL80211_EXT_FEATURE_SAE_OFFLOAD flag.
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add definition of WPA version 3 for SAE authentication.
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
IEEE 802.11 - 2016 forbids mixing MPDUs with different keyIDs in one
A-MPDU. Drivers supporting A-MPDUs and Extended Key ID must actively
enforce that requirement due to the available two unicast keyIDs.
Allow driver to signal mac80211 that they will not check the keyID in
MPDUs when aggregating them and that they expect mac80211 to stop Tx
aggregation when rekeying a connection using Extended Key ID.
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Sean writes:
meson: A few G12A fixes across the driver (Neil)
quirks: A couple quirks for GPD devices (Hans)
gem_shmem: Use writecombine when vmapping non-dmabuf BOs (Boris)
panfrost: A couple tweaks to requiring devfreq (Neil & Ezequiel)
edid: Ensure we return the override mode when ddc probe fails (Jani)
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613143946.GA24233@art_vandelay
|
|
Publish scaled_ppm_to_ppb to allow drivers to use it.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Logan noticed that devm_memremap_pages_release() kills the percpu_ref
drops all the page references that were acquired at init and then
immediately proceeds to unplug, arch_remove_memory(), the backing pages
for the pagemap. If for some reason device shutdown actually collides
with a busy / elevated-ref-count page then arch_remove_memory() should
be deferred until after that reference is dropped.
As it stands the "wait for last page ref drop" happens *after*
devm_memremap_pages_release() returns, which is obviously too late and
can lead to crashes.
Fix this situation by assigning the responsibility to wait for the
percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
callback. Implement the new cleanup callback for all
devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.
Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 41e94a851304 ("add devm_memremap_pages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The p2pdma facility enables a provider to publish a pool of dma
addresses for a consumer to allocate. A genpool is used internally by
p2pdma to collect dma resources, 'chunks', to be handed out to
consumers. Whenever a consumer allocates a resource it needs to pin the
'struct dev_pagemap' instance that backs the chunk selected by
pci_alloc_p2pmem().
Currently that reference is taken globally on the entire provider
device. That sets up a lifetime mismatch whereby the p2pdma core needs
to maintain hacks to make sure the percpu_ref is not released twice.
This lifetime mismatch also stands in the way of a fix to
devm_memremap_pages() whereby devm_memremap_pages_release() must wait for
the percpu_ref ->release() callback to complete before it can proceed to
teardown pages.
So, towards fixing this situation, introduce the ability to store a 'chunk
owner' at gen_pool_add() time, and a facility to retrieve the owner at
gen_pool_{alloc,free}() time. For p2pdma this will be used to store and
recall individual dev_pagemap reference counter instances per-chunk.
Link: http://lkml.kernel.org/r/155727338118.292046.13407378933221579644.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use the new devm_release_action() facility to allow
devm_memremap_pages_release() to be manually triggered.
Link: http://lkml.kernel.org/r/155727337088.292046.5774214552136776763.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm/devm_memremap_pages: Fix page release race", v2.
Logan audited the devm_memremap_pages() shutdown path and noticed that
it was possible to proceed to arch_remove_memory() before all potential
page references have been reaped.
Introduce a new ->cleanup() callback to do the work of waiting for any
straggling page references and then perform the percpu_ref_exit() in
devm_memremap_pages_release() context.
For p2pdma this involves some deeper reworks to reference count
resources on a per-instance basis rather than a per pci-device basis. A
modified genalloc api is introduced to convey a driver-private pointer
through gen_pool_{alloc,free}() interfaces. Also, a
devm_memunmap_pages() api is introduced since p2pdma does not
auto-release resources on a setup failure.
The dax and pmem changes pass the nvdimm unit tests, and the p2pdma
changes should now pass testing with the pci_p2pdma_release() fix.
Jrme, how does this look for HMM?
This patch (of 6):
The devm_add_action() facility allows a resource allocation routine to
add custom devm semantics. One such user is devm_memremap_pages().
There is now a need to manually trigger
devm_memremap_pages_release(). Introduce devm_release_action() so the
release action can be triggered via a new devm_memunmap_pages() api in a
follow-on change.
Link: http://lkml.kernel.org/r/155727336530.292046.2926860263201336366.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41fb70 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.
If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.
khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.
collapse_huge_page() after taking the mmap_sem for writing doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().
Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.
The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().
So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump... as long as the coredump
can invoke page faults without holding the mmap_sem for reading.
This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().
Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47d8 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The kernel test robot noticed a 26% will-it-scale pagefault regression
from commit 42a300353577 ("mm: memcontrol: fix recursive statistics
correctness & scalabilty"). This appears to be caused by bouncing the
additional cachelines from the new hierarchical statistics counters.
We can fix this by getting rid of the batched local counters instead.
Originally, there were *only* group-local counters, and they were fully
maintained per cpu. A reader of a stats file high up in the cgroup tree
would have to walk the entire subtree and collect each level's per-cpu
counters to get the recursive view. This was prohibitively expensive,
and so we switched to per-cpu batched updates of the local counters
during a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
memory.stat reporting"), reducing the complexity from nr_subgroups *
nr_cpus to nr_subgroups.
With growing machines and cgroup trees, the tree walk itself became too
expensive for monitoring top-level groups, and this is when the culprit
patch added hierarchy counters on each cgroup level. When the per-cpu
batch size would be reached, both the local and the hierarchy counters
would get batch-updated from the per-cpu delta simultaneously.
This makes local and hierarchical counter reads blazingly fast, but it
unfortunately makes the write-side too cache line intense.
Since local counter reads were never a problem - we only centralized
them to accelerate the hierarchy walk - and use of the local counters
are becoming rarer due to replacement with hierarchical views (ongoing
rework in the page reclaim and workingset code), we can make those local
counters unbatched per-cpu counters again.
The scheme will then be as such:
when a memcg statistic changes, the writer will:
- update the local counter (per-cpu)
- update the batch counter (per-cpu). If the batch is full:
- spill the batch into the group's atomic_t
- spill the batch into all ancestors' atomic_ts
- empty out the batch counter (per-cpu)
when a local memcg counter is read, the reader will:
- collect the local counter from all cpus
when a hiearchy memcg counter is read, the reader will:
- read the atomic_t
We might be able to simplify this further and make the recursive
counters unbatched per-cpu counters as well (batch upward propagation,
but leave per-cpu collection to the readers), but that will require a
more in-depth analysis and testing of all the callsites. Deal with the
immediate regression for now.
Link: http://lkml.kernel.org/r/20190521151647.GB2870@cmpxchg.org
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Tested-by: kernel test robot <rong.a.chen@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Previously, the BPF_FIB_LOOKUP_{DIRECT,OUTPUT} flags in the BPF UAPI
were defined with the help of BIT macro. This had the following issues:
- In order to use any of the flags, a user was required to depend
on <linux/bits.h>.
- No other flag in bpf.h uses the macro, so it seems that an unwritten
convention is to use (1 << (nr)) to define BPF-related flags.
Fixes: 87f5fc7e48dd ("bpf: Provide helper to do forwarding lookups in kernel FIB table")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|