<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/include/linux/dax.h, branch standardize-docs</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=standardize-docs</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=standardize-docs'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2017-07-10T23:32:31+00:00</updated>
<entry>
<title>mm: always enable thp for dax mappings</title>
<updated>2017-07-10T23:32:31+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-07-10T22:48:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=baabda261424517110ea98c6651f632ebf2561e3'/>
<id>urn:sha1:baabda261424517110ea98c6651f632ebf2561e3</id>
<content type='text'>
The madvise policy for transparent huge pages is meant to avoid unwanted
allocations of transparent huge pages.  It allows a policy of disabling
the extra memory pressure and effort to arrange for a huge page when it
is not needed.

DAX by definition never incurs this overhead since it is statically
allocated.  The policy choice makes even less sense for device-dax which
tries to guarantee a given tlb-fault size.  Specifically, the following
setting:

	echo never &gt; /sys/kernel/mm/transparent_hugepage/enabled

...violates that guarantee and silently disables all device-dax
instances with a 2M or 1G alignment.  So, let's avoid that non-obvious
side effect by force enabling thp for dax mappings in all cases.
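
As a rough illustration of the resulting policy, here is a toy Python model
(not kernel code). The function name, the is_dax argument standing in for
vma_is_dax(), and the string values for the sysfs setting are assumptions
made for this sketch; the real check has more conditions than are modeled
here.

```python
def thp_enabled_for_mapping(is_dax, sysfs_setting, madvised=False):
    """Toy model of the policy: dax mappings always get thp.

    is_dax stands in for the kernel's vma_is_dax() check; sysfs_setting
    models /sys/kernel/mm/transparent_hugepage/enabled.
    """
    if is_dax:
        # Statically allocated, so the global policy is ignored.
        return True
    if sysfs_setting == "always":
        return True
    if sysfs_setting == "madvise":
        return madvised
    # "never" (or anything unrecognized) disables thp for non-dax mappings.
    return False
```

With this model, thp_enabled_for_mapping(True, "never") stays enabled, which
is the behavior device-dax relies on.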

It is worth noting that the reason this uses vma_is_dax(), and the
resulting header include changes, is that previous attempts to add a
VM_DAX flag were NAKd.

Link: http://lkml.kernel.org/r/149739531127.20686.15813586620597484283.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>libnvdimm, pmem, dax: export a cache control attribute</title>
<updated>2017-06-29T16:29:50+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-06-27T04:28:41+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6e0c90d691cd5d90569f5918ab03eb76c81f9c6e'/>
<id>urn:sha1:6e0c90d691cd5d90569f5918ab03eb76c81f9c6e</id>
<content type='text'>
The dax_flush() operation can be turned into a nop on platforms where
firmware arranges for cpu caches to be flushed on a power-fail event.
The ACPI 6.2 specification defines a mechanism for the platform to
indicate this capability so the kernel can select the proper default.
However, for other platforms, the administrator must toggle this setting
manually.

Given this flush setting is a dax-specific mechanism, we advertise it
through a 'dax' attribute group hanging off a host device. For example,
a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
'write_cache' attribute to control response to dax cache flush requests.
This is similar to the 'queue/write_cache' attribute that appears under
block devices.

Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: remove default copy_from_iter fallback</title>
<updated>2017-06-27T23:44:27+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-06-27T20:06:22+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5d61e43b3975c0582003329d9de9d5e85abf5d33'/>
<id>urn:sha1:5d61e43b3975c0582003329d9de9d5e85abf5d33</id>
<content type='text'>
Require all dax-drivers to register a -&gt;copy_from_iter() operation so
that it is clear which dax_operations are optional and which must be
implemented for filesystem-dax to operate.

Cc: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dm: add -&gt;flush() dax operation support</title>
<updated>2017-06-15T21:34:59+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-05-29T20:02:52+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=abebfbe2f7315dd3ec9a0c69596a76e32beb5749'/>
<id>urn:sha1:abebfbe2f7315dd3ec9a0c69596a76e32beb5749</id>
<content type='text'>
Allow device-mapper to route flush operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying flush implementations.
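
The stacking idea can be sketched in Python as a toy model (not kernel
code). The class names LeafDevice and LinearTarget and the start_pgoff
field are hypothetical; they only show how each layer translates the pgoff
and hands the request to the next device's own flush operation.

```python
class LeafDevice:
    """Hypothetical bottom-level dax device with its own flush routine."""

    def __init__(self, name):
        self.name = name
        self.flushed = []

    def flush(self, pgoff, size):
        # Record the request; a real driver would write back cpu caches.
        self.flushed.append((pgoff, size))


class LinearTarget:
    """Hypothetical stacking layer that maps onto part of another device."""

    def __init__(self, lower, start_pgoff):
        self.lower = lower
        self.start_pgoff = start_pgoff

    def flush(self, pgoff, size):
        # Translate pgoff so it is relative to the next device down,
        # then call that device's own flush operation.
        self.lower.flush(self.start_pgoff + pgoff, size)
```

Stacking two LinearTarget layers over one LeafDevice and flushing pgoff 2
at the top reaches the leaf with both offsets added, and each layer only
needs to know about the device directly beneath it.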

Reviewed-by: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Reviewed-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax, pmem: introduce an optional 'flush' dax_operation</title>
<updated>2017-06-15T21:34:59+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-05-29T19:58:19+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3c1cebff23cdca01c421411e953a9e239f2b9ef9'/>
<id>urn:sha1:3c1cebff23cdca01c421411e953a9e239f2b9ef9</id>
<content type='text'>
Filesystem-DAX flushes caches whenever it writes to the address returned
through dax_direct_access() and when writing back dirty radix entries.
That flushing is only required in the pmem case, so add a dax operation
to allow pmem to take this extra action, but skip it for other dax
capable devices that do not provide a flush routine.

An example for this differentiation might be a volatile ram disk where
there is no expectation of persistence. In fact the pmem driver itself might
front such an address range specified by the NFIT. So, this "no flush"
property might be something passed down by the bus / libnvdimm.
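
A minimal Python sketch of the "optional operation" pattern described
above (a model, not the kernel implementation). The dict standing in for a
driver's operations table and the boolean return value are assumptions
made for illustration.

```python
def dax_flush(ops, addr, size):
    """Call the driver's flush routine if it registered one.

    ops is a plain dict standing in for a driver's operations table.
    Returns True when a flush ran and False when it was skipped.
    """
    flush = ops.get("flush")
    if flush is None:
        # No flush routine: nothing to do for this device.
        return False
    flush(addr, size)
    return True
```

A pmem-like table that supplies a "flush" entry gets called, while an empty
table, as a volatile ram disk might register, is skipped without error.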

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dm: add -&gt;copy_from_iter() dax operation support</title>
<updated>2017-06-09T16:22:21+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-05-29T19:57:56+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7e026c8c0a4200da86bc51edeaad79dcdccf78ca'/>
<id>urn:sha1:7e026c8c0a4200da86bc51edeaad79dcdccf78ca</id>
<content type='text'>
Allow device-mapper to route copy_from_iter operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying copy_from_iter implementations.

Reviewed-by: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Reviewed-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations</title>
<updated>2017-06-09T16:09:56+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-05-29T19:22:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0aed55af88345b5d673240f90e671d79662fb01e'/>
<id>urn:sha1:0aed55af88345b5d673240f90e671d79662fb01e</id>
<content type='text'>
The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol; without it they fall back to copy_from_iter_nocache() and plain
memcpy().

This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: &lt;x86@kernel.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n case</title>
<updated>2017-05-14T00:52:16+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-05-13T23:31:05+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f5705aa8cfed142d980ecac12bee0d81b756479e'/>
<id>urn:sha1:f5705aa8cfed142d980ecac12bee0d81b756479e</id>
<content type='text'>
Tetsuo reports:

  fs/built-in.o: In function `xfs_file_iomap_end':
  xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax'
  fs/built-in.o: In function `xfs_file_iomap_begin':
  xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host'
  make: *** [vmlinux] Error 1
  $ grep DAX .config
  CONFIG_DAX=m
  # CONFIG_DEV_DAX is not set
  # CONFIG_FS_DAX is not set

When FS_DAX=n we can/must throw away the dax code in filesystems.
Implement 'fs_' versions of dax_get_by_host() and put_dax() that are
nops in the FS_DAX=n case.

Cc: &lt;linux-xfs@vger.kernel.org&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: "Darrick J. Wong" &lt;darrick.wong@oracle.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
Reported-by: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2017-05-13T16:49:35+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-05-13T16:49:35+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=1251704a631b62591ad1d1b6ead252e9e597d5f5'/>
<id>urn:sha1:1251704a631b62591ad1d1b6ead252e9e597d5f5</id>
<content type='text'>
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;:
  mm, docs: update memory.stat description with workingset* entries
  mm: vmscan: scan until it finds eligible pages
  mm, thp: copying user pages must schedule on collapse
  dax: fix PMD data corruption when fault races with write
  dax: fix data corruption when fault races with write
  ext4: return to starting transaction in ext4_dax_huge_fault()
  mm: fix data corruption due to stale mmap reads
  dax: prevent invalidation of mapped DAX entries
  Tigran has moved
  mm, vmalloc: fix vmalloc users tracking properly
  mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
  gcov: support GCC 7.1
  mm, vmstat: Remove spurious WARN() during zoneinfo print
  time: delete current_fs_time()
  hwpoison, memcg: forcibly uncharge LRU pages
</content>
</entry>
<entry>
<title>dax: prevent invalidation of mapped DAX entries</title>
<updated>2017-05-12T22:57:15+00:00</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2017-05-12T22:46:47+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4636e70bb0a8b871998b6841a2e4b205cf2bc863'/>
<id>urn:sha1:4636e70bb0a8b871998b6841a2e4b205cf2bc863</id>
<content type='text'>
Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.

This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).

The series passes testing with the t_mmap_stale test program from Ross and
also other mmap-related tests on a DAX filesystem.

This patch (of 4):

dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked.  This is done via:

  invalidate_mapping_pages()
    invalidate_exceptional_entry()
      dax_invalidate_mapping_entry()

However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criterion, which is that the page must not be
mapped.  This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().

For DAX entries this means that we can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry.  This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.

We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping-&gt;i_mmap_rwsem.

Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().

Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;    [4.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
