<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/kernel, branch doc/4.4</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=doc%2F4.4</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=doc%2F4.4'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2015-09-12T02:34:09+00:00</updated>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2015-09-12T02:34:09+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-12T02:34:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=01b0c014eeb0bb857a5dc572cd108be7becddfe7'/>
<id>urn:sha1:01b0c014eeb0bb857a5dc572cd108be7becddfe7</id>
<content type='text'>
Merge fourth patch-bomb from Andrew Morton:

 - sys_membarrier syscall

 - seq_file interface changes

 - a few misc fixups

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;:
  revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"
  mm/early_ioremap: add explicit #include of asm/early_ioremap.h
  fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void
  selftests: enhance membarrier syscall test
  selftests: add membarrier syscall test
  sys_membarrier(): system-wide memory barrier (generic, x86)
  MODSIGN: fix a compilation warning in extract-cert
</content>
</entry>
<entry>
<title>Merge tag 'pm+acpi-4.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2015-09-12T02:11:06+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-12T02:11:06+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=fa9a67ef9de48de5474ea1e5a358340369e78b74'/>
<id>urn:sha1:fa9a67ef9de48de5474ea1e5a358340369e78b74</id>
<content type='text'>
Pull more power management and ACPI updates from Rafael Wysocki:
 "These are mostly fixes and cleanups on top of the previous PM+ACPI
  pull request (cpufreq core and drivers, cpuidle, generic power domains
  framework).  Some of them didn't make it into that pull request and
  some fix issues introduced by it.

  The only really new thing is the support for suspend frequency in the
  cpufreq-dt driver, but it is needed to fix an issue with Exynos
  platforms.

  Specifics:

   - build fix for the new Mediatek MT8173 cpufreq driver (Guenter
     Roeck).

   - generic power domains framework fixes (power on error code path,
     subdomain removal) and cleanup of a deprecated API user (Geert
     Uytterhoeven, Jon Hunter, Ulf Hansson).

   - cpufreq-dt driver fixes including two fixes for bugs related to the
     new Operating Performance Points Device Tree bindings introduced
     recently (Viresh Kumar).

   - suspend frequency support for the cpufreq-dt driver (Bartlomiej
     Zolnierkiewicz, Viresh Kumar).

   - cpufreq core cleanups (Viresh Kumar).

   - intel_pstate driver fixes (Chen Yu, Kristen Carlson Accardi).

   - additional sanity check in the cpuidle core (Xunlei Pang).

   - fix for a comment related to CPU power management (Lina Iyer)"

* tag 'pm+acpi-4.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_pstate: fix PCT_TO_HWP macro
  intel_pstate: Fix user input of min/max to legal policy region
  PM / OPP: Return suspend_opp only if it is enabled
  cpufreq-dt: add suspend frequency support
  cpufreq: allow cpufreq_generic_suspend() to work without suspend frequency
  PM / OPP: add dev_pm_opp_get_suspend_opp() helper
  staging: board: Migrate away from __pm_genpd_name_add_device()
  cpufreq: Use __func__ to print function's name
  cpufreq: staticize cpufreq_cpu_get_raw()
  PM / Domains: Ensure subdomain is not in use before removing
  cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL
  cpuidle/coupled: Add sanity check for safe_state_index
  PM / Domains: Try power off masters in error path of __pm_genpd_poweron()
  cpufreq: dt: Tolerance applies on both sides of target voltage
  cpufreq: dt: Print error on failing to mark OPPs as shared
  cpufreq: dt: Check OPP count before marking them shared
  kernel/cpu_pm: fix cpu_cluster_pm_exit comment
</content>
</entry>
<entry>
<title>sys_membarrier(): system-wide memory barrier (generic, x86)</title>
<updated>2015-09-11T22:21:34+00:00</updated>
<author>
<name>Mathieu Desnoyers</name>
<email>mathieu.desnoyers@efficios.com</email>
</author>
<published>2015-09-11T20:07:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5b25b13ab08f616efd566347d809b4ece54570d1'/>
<id>urn:sha1:5b25b13ab08f616efd566347d809b4ece54570d1</id>
<content type='text'>
Here is an implementation of a new system call, sys_membarrier(), which
executes a memory barrier on all threads running on the system.  It is
implemented by calling synchronize_sched().  It can be used to
distribute the cost of user-space memory barriers asymmetrically by
transforming pairs of memory barriers into pairs consisting of
sys_membarrier() and a compiler barrier.  For synchronization primitives
that distinguish between read-side and write-side (e.g.  userspace RCU
[1], rwlocks), the read-side can be accelerated significantly by moving
the bulk of the memory barrier overhead to the write-side.

The existing applications of which I am aware that would be improved by
this system call are as follows:

* Through Userspace RCU library (http://urcu.so)
  - DNS server (Knot DNS) https://www.knot-dns.cz/
  - Network sniffer (http://netsniff-ng.org/)
  - Distributed object storage (https://sheepdog.github.io/sheepdog/)
  - User-space tracing (http://lttng.org)
  - Network storage system (https://www.gluster.org/)
  - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
  - Financial software (https://lkml.org/lkml/2015/3/23/189)

Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking.  Especially in the case of RCU used by
libraries, sys_membarrier can speed up the read-side by moving the bulk of
the memory barrier cost to synchronize_rcu().

* Direct users of sys_membarrier
  - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)

Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux.  They are referring to
sys_membarrier in their github thread, specifically stating that
sys_membarrier() is what they are looking for.

To explain the benefit of this scheme, let's introduce two example threads:

Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
Thread B (frequent, e.g. executing liburcu
rcu_read_lock()/rcu_read_unlock())

In a scheme where all smp_mb() in thread A are ordering memory accesses
with respect to smp_mb() present in Thread B, we can change each
smp_mb() within Thread A into calls to sys_membarrier() and each
smp_mb() within Thread B into compiler barriers "barrier()".

Before the change, we had, for each smp_mb() pair:

Thread A                    Thread B
previous mem accesses       previous mem accesses
smp_mb()                    smp_mb()
following mem accesses      following mem accesses

After the change, these pairs become:

Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses

As we can see, there are two possible scenarios: either Thread B memory
accesses do not happen concurrently with Thread A accesses (1), or they
do (2).

1) Non-concurrent Thread A vs Thread B accesses:

Thread A                    Thread B
prev mem accesses
sys_membarrier()
follow mem accesses
                            prev mem accesses
                            barrier()
                            follow mem accesses

In this case, thread B accesses will be weakly ordered. This is OK,
because at that point, thread A is not particularly interested in
ordering them with respect to its own accesses.

2) Concurrent Thread A vs Thread B accesses

Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses

In this case, thread B accesses, which are ensured to be in program
order thanks to the compiler barrier, will be "upgraded" to full
smp_mb() by synchronize_sched().

* Benchmarks

On Intel Xeon E5405 (8 cores)
(one thread is calling sys_membarrier, the other 7 threads are busy
looping)

1000 non-expedited sys_membarrier calls in 33s = 33 milliseconds/call.

* User-space user of this system call: Userspace RCU library

Both the signal-based and the sys_membarrier userspace RCU schemes
permit us to remove the memory barrier from the userspace RCU
rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
accelerating them. These memory barriers are replaced by compiler
barriers on the read-side, and all matching memory barriers on the
write-side are turned into an invocation of a memory barrier on all
active threads in the process. By letting the kernel perform this
synchronization rather than dumbly sending a signal to every thread in
the process (as we currently do), we diminish the number of unnecessary
wake-ups and only issue the memory barriers on active threads.
Non-running threads do not need to execute such a barrier anyway, because
it is implied by the scheduler context switches.

Results in liburcu:

Operations in 10s, 6 readers, 2 writers:

memory barriers in reader:    1701557485 reads, 2202847 writes
signal-based scheme:          9830061167 reads,    6700 writes
sys_membarrier:               9952759104 reads,     425 writes
sys_membarrier (dyn. check):  7970328887 reads,     425 writes

The dynamic sys_membarrier availability check adds some overhead to
the read-side compared to the signal-based scheme, but besides that,
sys_membarrier slightly outperforms the signal-based scheme. However,
this non-expedited sys_membarrier implementation has a much slower grace
period than signal and memory barrier schemes.

Besides diminishing the number of wake-ups, one major advantage of the
membarrier system call over the signal-based scheme is that it does not
need to reserve a signal. This plays much more nicely with libraries,
and with processes injected into for tracing purposes, for which we
cannot expect that signals will be unused by the application.

An expedited version of this system call can be added later on to speed
up the grace period. Its implementation will likely depend on reading
the cpu_curr()-&gt;mm without holding each CPU's rq lock.

This patch adds the system call to x86 and to asm-generic.

[1] http://urcu.so

membarrier(2) man page:

MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)

NAME
       membarrier - issue memory barriers on a set of threads

SYNOPSIS
       #include &lt;linux/membarrier.h&gt;

       int membarrier(int cmd, int flags);

DESCRIPTION
       The cmd argument is one of the following:

       MEMBARRIER_CMD_QUERY
              Query  the  set  of  supported commands. It returns a bitmask of
              supported commands.

       MEMBARRIER_CMD_SHARED
              Execute a memory barrier on all threads running on  the  system.
              Upon  return from system call, the caller thread is ensured that
              all running threads have passed through a state where all memory
              accesses  to  user-space  addresses  match program order between
              entry to and return from the system  call  (non-running  threads
              are de facto in such a state). This covers threads from all
              processes running on the system.  This command returns 0.

       The flags argument must be 0; it is reserved for future extensions.

       All memory accesses performed  in  program  order  from  each  targeted
       thread are guaranteed to be ordered with respect to sys_membarrier(). If
       we use the semantic "barrier()" to represent a compiler barrier forcing
       memory  accesses  to  be performed in program order across the barrier,
       and smp_mb() to represent explicit memory barriers forcing full  memory
       ordering  across  the barrier, we have the following ordering table for
       each pair of barrier(), sys_membarrier() and smp_mb():

       The pair ordering is detailed as (O: ordered, X: not ordered):

                              barrier()   smp_mb() sys_membarrier()
              barrier()          X           X            O
              smp_mb()           X           O            O
              sys_membarrier()   O           O            O

RETURN VALUE
       On success, these system calls return zero.  On error, -1 is  returned,
       and errno is set appropriately. For a given command, with flags
       argument set to 0, this system call is guaranteed to always return the
       same value until reboot.

ERRORS
       ENOSYS System call is not implemented.

       EINVAL Invalid arguments.

Linux                             2015-04-15                     MEMBARRIER(2)
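
The pair-ordering table in the man page above can be encoded as a small
lookup.  This is only an illustrative Python sketch: the names BARRIERS,
ORDERED and is_ordered() are hypothetical and are not part of this patch
or of any kernel interface.

```python
# Illustrative model of the man page's pair-ordering table.
# BARRIERS, ORDERED and is_ordered() are hypothetical names, not a kernel API.
BARRIERS = ("barrier()", "smp_mb()", "sys_membarrier()")

# True means "O" (ordered), False means "X" (not ordered).
ORDERED = {
    ("barrier()", "barrier()"): False,
    ("barrier()", "smp_mb()"): False,
    ("barrier()", "sys_membarrier()"): True,
    ("smp_mb()", "smp_mb()"): True,
    ("smp_mb()", "sys_membarrier()"): True,
    ("sys_membarrier()", "sys_membarrier()"): True,
}


def is_ordered(a, b):
    """Return True if the barrier pair (a, b) is ordered per the table."""
    # The table is symmetric, so look the pair up in canonical order.
    key = tuple(sorted((a, b), key=BARRIERS.index))
    return ORDERED[key]
```

For instance, is_ordered("barrier()", "sys_membarrier()") is true, which
is the read-side/write-side pairing described earlier in this message,
while is_ordered("barrier()", "smp_mb()") is false.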

Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Nicholas Miell &lt;nmiell@comcast.net&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Alan Cox &lt;gnomes@lxorguk.ukuu.org.uk&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Pranith Kumar &lt;bobby.prani@gmail.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Shuah Khan &lt;shuahkh@osg.samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branches 'pm-cpu', 'pm-cpuidle' and 'pm-domains'</title>
<updated>2015-09-11T13:37:36+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2015-09-11T13:37:36+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4614e0cc66a8ea1d163efc364ba743424dee5c0a'/>
<id>urn:sha1:4614e0cc66a8ea1d163efc364ba743424dee5c0a</id>
<content type='text'>
* pm-cpu:
  kernel/cpu_pm: fix cpu_cluster_pm_exit comment

* pm-cpuidle:
  cpuidle/coupled: Add sanity check for safe_state_index

* pm-domains:
  staging: board: Migrate away from __pm_genpd_name_add_device()
  PM / Domains: Ensure subdomain is not in use before removing
  PM / Domains: Try power off masters in error path of __pm_genpd_poweron()
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2015-09-11T01:19:42+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-11T01:19:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=33e247c7e58d335d70ecb84fd869091e2e4b8dcb'/>
<id>urn:sha1:33e247c7e58d335d70ecb84fd869091e2e4b8dcb</id>
<content type='text'>
Merge third patch-bomb from Andrew Morton:

 - even more of the rest of MM

 - lib/ updates

 - checkpatch updates

 - small changes to a few scruffy filesystems

 - kmod fixes/cleanups

 - kexec updates

 - a dma-mapping cleanup series from hch

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (81 commits)
  dma-mapping: consolidate dma_set_mask
  dma-mapping: consolidate dma_supported
  dma-mapping: consolidate dma_mapping_error
  dma-mapping: consolidate dma_{alloc,free}_noncoherent
  dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
  mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
  mm: make sure all file VMAs have -&gt;vm_ops set
  mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
  mm: mark most vm_operations_struct const
  namei: fix warning while make xmldocs caused by namei.c
  ipc: convert invalid scenarios to use WARN_ON
  zlib_deflate/deftree: remove bi_reverse()
  lib/decompress_unlzma: Do a NULL check for pointer
  lib/decompressors: use real out buf size for gunzip with kernel
  fs/affs: make root lookup from blkdev logical size
  sysctl: fix int -&gt; unsigned long assignments in INT_MIN case
  kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
  kexec: align crash_notes allocation to make it be inside one physical page
  kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
  kexec: split kexec_load syscall from kexec core code
  ...
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-09-10T20:53:15+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-10T20:53:15+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=65c61bc5dbbcfa1ff38e58aa834cb9a88e84a886'/>
<id>urn:sha1:65c61bc5dbbcfa1ff38e58aa834cb9a88e84a886</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) Fix out-of-bounds array access in netfilter ipset, from Jozsef
    Kadlecsik.

 2) Use correct free operation on netfilter conntrack templates, from
    Daniel Borkmann.

 3) Fix route leak in SCTP, from Marcelo Ricardo Leitner.

 4) Fix sizeof(pointer) in mac80211, from Thierry Reding.

 5) Fix cache pointer comparison in ip6mr leading to missed unlock of
    mrt_lock.  From Richard Laing.

 6) rds_conn_lookup() needs to consider network namespace in key
    comparison, from Sowmini Varadhan.

 7) Fix deadlock in TIPC code wrt broadcast link wakeups, from Kolmakov
    Dmitriy.

 8) Fix fd leaks in bpf syscall, from Daniel Borkmann.

 9) Fix error recovery when installing ipv6 multipath routes, we would
    delete the old route before we would know if we could fully commit
    to the new set of nexthops.  Fix from Roopa Prabhu.

10) Fix run-time suspend problems in r8152, from Hayes Wang.

11) In fec, don't program the MAC address into the chip when the clocks
    are gated off.  From Fugang Duan.

12) Fix poll behavior for netlink sockets when using rx ring mmap, from
    Daniel Borkmann.

13) Don't allocate memory with GFP_KERNEL from get_stats64 in r8169
    driver, from Corinna Vinschen.

14) In TCP Cubic congestion control, handle idle periods better where we
    are application limited, in order to keep cwnd from growing out of
    control.  From Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  tcp_cubic: better follow cubic curve after idle period
  tcp: generate CA_EVENT_TX_START on data frames
  xen-netfront: respect user provided max_queues
  xen-netback: respect user provided max_queues
  r8169: Fix sleeping function called during get_stats64, v2
  ether: add IEEE 1722 ethertype - TSN
  netlink, mmap: fix edge-case leakages in nf queue zero-copy
  netlink, mmap: don't walk rx ring on poll if receive queue non-empty
  cxgb4: changes for new firmware 1.14.4.0
  net: fec: add netif status check before set mac address
  r8152: fix the runtime suspend issues
  r8152: split DRIVER_VERSION
  ipv6: fix ifnullfree.cocci warnings
  add microchip LAN88xx phy driver
  stmmac: fix check for phydev being open
  net: qlcnic: delete redundant memsets
  net: mv643xx_eth: use kzalloc
  net: jme: use kzalloc() instead of kmalloc+memset
  net: cavium: liquidio: use kzalloc in setup_glist()
  net: ipv6: use common fib_default_rule_pref
  ...
</content>
</entry>
<entry>
<title>sysctl: fix int -&gt; unsigned long assignments in INT_MIN case</title>
<updated>2015-09-10T20:29:01+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-09-09T22:39:06+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9a5bc726d559221a3394bb8ef97d0abc1ee94d00'/>
<id>urn:sha1:9a5bc726d559221a3394bb8ef97d0abc1ee94d00</id>
<content type='text'>
The following

    if (val &lt; 0)
        *lvalp = (unsigned long)-val;

is incorrect because the compiler is free to assume -val to be positive
and use a sign-extend instruction for extending the bit pattern.  This is
a problem if val == INT_MIN:

    # echo -2147483648 &gt;/proc/sys/dev/scsi/logging_level
    # cat /proc/sys/dev/scsi/logging_level
    -18446744071562067968

Cast to unsigned long before negation - that way we first sign-extend and
then negate an unsigned, which is well defined.  With this:

    # cat /proc/sys/dev/scsi/logging_level
    -2147483648
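
The two orders of operation can be modelled with plain integer
arithmetic.  The following Python sketch is an illustration, not the
kernel code: it assumes a 32-bit int and a 64-bit unsigned long, and it
assumes that the overflowing negation wraps around, which C does not
guarantee.  All helper names are hypothetical.

```python
# Model of C integer conversions using Python integers.
# Assumes a 32-bit int and a 64-bit unsigned long (as on x86_64), and
# assumes that the overflowing negation wraps around, which C does not
# guarantee (signed overflow is undefined behaviour).
INT_BITS = 32
ULONG_BITS = 64
INT_MIN = -(2 ** (INT_BITS - 1))


def wrap_int(x):
    """Wrap x into the two's-complement range of a 32-bit int."""
    half = 2 ** (INT_BITS - 1)
    return (x + half) % (2 ** INT_BITS) - half


def to_ulong(x):
    """Convert x to unsigned long (sign-extends negative values)."""
    return x % (2 ** ULONG_BITS)


def negate_then_cast(val):
    """Old code: (unsigned long)-val."""
    return to_ulong(wrap_int(-val))


def cast_then_negate(val):
    """Fixed code: cast to unsigned long first, then negate."""
    return to_ulong(-to_ulong(val))


print(negate_then_cast(INT_MIN))  # 18446744071562067968
print(cast_then_negate(INT_MIN))  # 2147483648
```

For every other negative int the two functions agree; only INT_MIN
differs, because its negation does not fit in an int.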

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Cc: Mikulas Patocka &lt;mikulas@twibright.com&gt;
Cc: Robert Xiao &lt;nneonneo@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo</title>
<updated>2015-09-10T20:29:01+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2015-09-09T22:39:03+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1303a27c9c32020a3b6ac89be270d2ab1f28be24'/>
<id>urn:sha1:1303a27c9c32020a3b6ac89be270d2ab1f28be24</id>
<content type='text'>
On x86_64, KERNEL_IMAGE_SIZE has been 512M since v2.6.26, and
MODULES_VADDR was changed accordingly to 0xffffffffa0000000.  However,
in v3.12 Kees Cook introduced kaslr to randomise the location of the
kernel, and the kernel text mapping address space was enlarged from 512M
to 1G.  That means KERNEL_IMAGE_SIZE is now variable: its value is 512M
when kaslr support is not compiled in and 1G when it is compiled in.
Accordingly, MODULES_VADDR was changed too:

    #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)

So when kaslr is compiled in and enabled, the kernel text mapping address
space and the modules vaddr space need to be adjusted.  Otherwise
makedumpfile will fail, since the addresses of some symbols are not
correct.

Hence KERNEL_IMAGE_SIZE needs to be exported to vmcoreinfo and read by
makedumpfile to help calculate MODULES_VADDR.
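
As a rough illustration of the calculation makedumpfile has to perform,
the Python sketch below evaluates the MODULES_VADDR definition for both
image sizes.  The value used for __START_KERNEL_map is an assumption
(the x86_64 kernel text mapping base); it is not stated in this message,
and the function name is hypothetical.

```python
# __START_KERNEL_map is the x86_64 kernel text mapping base; its value is
# stated here as an assumption, it does not appear in the commit message.
START_KERNEL_MAP = 0xFFFFFFFF80000000
MIB = 1024 * 1024


def modules_vaddr(kernel_image_size):
    """MODULES_VADDR = __START_KERNEL_map + KERNEL_IMAGE_SIZE."""
    return START_KERNEL_MAP + kernel_image_size


print(hex(modules_vaddr(512 * MIB)))   # 0xffffffffa0000000 (no kaslr)
print(hex(modules_vaddr(1024 * MIB)))  # 0xffffffffc0000000 (kaslr)
```

The first result matches the MODULES_VADDR value quoted above for the
512M case.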

Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kexec: align crash_notes allocation to make it be inside one physical page</title>
<updated>2015-09-10T20:29:01+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2015-09-09T22:39:00+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=bbb78b8f3f4ea8eca14937b693bfe244838e1d4d'/>
<id>urn:sha1:bbb78b8f3f4ea8eca14937b693bfe244838e1d4d</id>
<content type='text'>
People reported that crash_notes in /proc/vmcore were corrupted and that
this caused kdump to fail.  Debugging and logs revealed the root cause:
the percpu variable crash_notes was allocated across 2 vmalloc pages.
Currently percpu is based on vmalloc by default, and vmalloc can't
guarantee that 2 contiguous vmalloc pages are also backed by 2 contiguous
physical pages.  So when the 1st kernel exports the starting address and
size of crash_notes through sysfs, as below:

/sys/devices/system/cpu/cpux/crash_notes
/sys/devices/system/cpu/cpux/crash_notes_size

the kdump kernel uses them to get the content of crash_notes.  However,
the 2nd part may not be in the next neighbouring physical page as we
expected if crash_notes is allocated across 2 vmalloc pages.  That's why
nhdr_ptr-&gt;n_namesz or nhdr_ptr-&gt;n_descsz could be very large in
update_note_header_size_elf64() and cause note header merging failure or
some warnings.

This patch changes the allocation to call __alloc_percpu() and pass in an
align value obtained by rounding crash_notes_size up to the nearest power
of two.  This makes sure that crash_notes is allocated inside one
physical page, since sizeof(note_buf_t) on all architectures is smaller
than PAGE_SIZE.  It also adds a BUILD_BUG_ON to break the build if the
size is bigger than PAGE_SIZE, since crash_notes would then definitely
span 2 pages.  That needs to be avoided, and needs to be reported if it's
unavoidable.
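
The alignment argument can be checked with a small Python sketch.  It is
not the kernel code: PAGE_SIZE, the example object size and the helper
names are assumptions chosen for illustration.

```python
# PAGE_SIZE and the example size are assumptions for illustration only.
PAGE_SIZE = 4096


def roundup_pow_of_two(n):
    """Smallest power of two that is not below n (n must be positive)."""
    return 2 ** (n - 1).bit_length()


def crosses_page(addr, size):
    """True if [addr, addr + size) touches more than one page."""
    return addr // PAGE_SIZE != (addr + size - 1) // PAGE_SIZE


size = 340  # hypothetical sizeof(note_buf_t), smaller than PAGE_SIZE
align = roundup_pow_of_two(size)  # 512

# Every address aligned to `align` keeps the object inside one page,
aligned_ok = not any(
    crosses_page(addr, size) for addr in range(0, 4 * PAGE_SIZE, align)
)
# whereas a merely 8-byte-aligned address can straddle a page boundary.
unaligned_crosses = crosses_page(PAGE_SIZE - 8, size)

print(align, aligned_ok, unaligned_crosses)
```

The check holds because a power-of-two alignment no larger than PAGE_SIZE
divides PAGE_SIZE, so an object no larger than its alignment never
reaches past the end of its aligned slot.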

[akpm@linux-foundation.org: use correct comment layout]
Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Lisa Mitchell &lt;lisa.mitchell@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kexec: remove unnecessary test in kimage_alloc_crash_control_pages()</title>
<updated>2015-09-10T20:29:01+00:00</updated>
<author>
<name>Minfei Huang</name>
<email>mnfhuang@gmail.com</email>
</author>
<published>2015-09-09T22:38:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=04e9949b2d26ae1f0acd1181876a2a8ece92112d'/>
<id>urn:sha1:04e9949b2d26ae1f0acd1181876a2a8ece92112d</id>
<content type='text'>
Converting a PFN (Page Frame Number) to a struct page never fails, so we
can simplify the code logic by doing the image-&gt;control_page assignment
directly in the loop and removing the unnecessary conditional check.

Signed-off-by: Minfei Huang &lt;mnfhuang@gmail.com&gt;
Acked-by: Dave Young &lt;dyoung@redhat.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
