summaryrefslogtreecommitdiff
path: root/drivers/nvdimm
AgeCommit message (Collapse)Author
2024-07-20Merge tag 'libnvdimm-for-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ira Weiny: - One small cleanup to use sizeof(*pointer) - Add MODULE_DESCRIPTIONS() to eliminate make W=1 warnings * tag 'libnvdimm-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: testing: nvdimm: Add MODULE_DESCRIPTION() macros testing: nvdimm: iomap: add MODULE_DESCRIPTION() dax: add missing MODULE_DESCRIPTION() macros nvdimm: add missing MODULE_DESCRIPTION() macros ACPI: NFIT: add missing MODULE_DESCRIPTION() macro nvdimm/btt: use sizeof(*pointer) instead of sizeof(type)
2024-06-19block: move the dax flag to queue_limitsChristoph Hellwig
Move the dax flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19block: move the synchronous flag to queue_limitsChristoph Hellwig
Move the synchronous flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19block: move the nonrot flag to queue_limitsChristoph Hellwig
Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19block: move cache control settings out of queue->flagsChristoph Hellwig
Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-17nvdimm: add missing MODULE_DESCRIPTION() macrosJeff Johnson
Fix the 'make W=1' warnings: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/libnvdimm.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_pmem.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_btt.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_e820.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/of_pmem.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_virtio.o [iweiny: edit core module description] Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/r/20240526-md-drivers-nvdimm-v1-1-9e583677e80f@quicinc.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-06-17nvdimm/btt: use sizeof(*pointer) instead of sizeof(type)Erick Archer
It is preferred to use sizeof(*pointer) instead of sizeof(type) due to the type of the variable can change and one needs not change the former (unlike the latter). This patch has no effect on runtime behavior. Signed-off-by: Erick Archer <erick.archer@outlook.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/r/AS8PR02MB72372490C53FB2E35DA1ADD08BFE2@AS8PR02MB7237.eurprd02.prod.outlook.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-06-14block: move integrity information into queue_limitsChristoph Hellwig
Move the integrity information into the queue limits so that it can be set atomically with other queue limits, and that the sysfs changes to the read_verify and write_generate flags are properly synchronized. This also allows to provide a more useful helper to stack the integrity fields, although it still is separate from the main stacking function as not all stackable devices want to inherit the integrity settings. Even with that it greatly simplifies the code in md and dm. Note that the integrity field is moved as-is into the queue limits. While there are good arguments for removing the separate blk_integrity structure, this would cause a lot of churn and might better be done at a later time if desired. However the integrity field in the queue_limits structure is now unconditional so that various ifdefs can be avoided or replaced with IS_ENABLED(). Given that tiny size of it that seems like a worthwhile trade off. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240613084839.1044015-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-23Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...
2024-05-22nvdimm: virtio_pmem: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-21-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
2024-04-25nvdimm/btt: always set max_integrity_segmentsChristoph Hellwig
max_integrity_segments is just a hardware/driver limit and can be safely set even when integrity data is not supported. Set it in the initial queue_limits passed to blk_alloc_disk to simplify the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240306142739.237234-3-hch@lst.de Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-04-25nvdimm: remove nd_integrity_initChristoph Hellwig
nd_integrity_init is only called from a single place. Open code it there, and use IS_ENABLED to remove the need for an extra stub. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240306142739.237234-2-hch@lst.de Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-03-15Merge tag 'libnvdimm-for-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dave Jiang: - ACPI_NFIT Kconfig documetation fix - Make nvdimm_bus_type const - Make dax_bus_type const - remove SLAB_MEM_SPREAD flag usage in DAX * tag 'libnvdimm-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: remove SLAB_MEM_SPREAD flag usage device-dax: make dax_bus_type const nvdimm: make nvdimm_bus_type const libnvdimm: Fix ACPI_NFIT in BLK_DEV_PMEM help
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-02-22nvdimm/pmem: Treat alloc_dax() -EOPNOTSUPP failure as non-fatalMathieu Desnoyers
In preparation for checking whether the architecture has data cache aliasing within alloc_dax(), modify the error handling of nvdimm/pmem pmem_attach_disk() to treat alloc_dax() -EOPNOTSUPP failure as non-fatal. [ Based on commit "nvdimm/pmem: Fix leak on dax_add_host() failure". ] Link: https://lkml.kernel.org/r/20240215144633.96437-4-mathieu.desnoyers@efficios.com Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing caches") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Sclafani <dm-devel@lists.linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22nvdimm/pmem: fix leak on dax_add_host() failureMathieu Desnoyers
Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done before setting pmem->dax_dev, which therefore issues the two following calls on NULL pointers: out_cleanup_dax: kill_dax(pmem->dax_dev); put_dax(pmem->dax_dev); Link: https://lkml.kernel.org/r/20240208184913.484340-1-mathieu.desnoyers@efficios.com Link: https://lkml.kernel.org/r/20240208184913.484340-2-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-19pmem: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig
Pass the queue limits directly to blk_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-19btt: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig
Pass the queue limits directly to blk_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-19block: pass a queue_limits argument to blk_alloc_diskChristoph Hellwig
Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Also change blk_alloc_disk to return an ERR_PTR instead of just NULL which can't distinguish errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12nvdimm: make nvdimm_bus_type constRicardo B. Marliere
Now that the driver core can properly handle constant struct bus_type, move the nvdimm_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240204-bus_cleanup-nvdimm-v1-1-77ae19fa3e3b@marliere.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-02-12libnvdimm: Fix ACPI_NFIT in BLK_DEV_PMEM helpPeter Robinson
The ACPI_NFIT config option is described incorrectly as the inverse NFIT_ACPI, which doesn't exist, so update the help to the actual config option. Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Link: https://lore.kernel.org/r/20240212123716.795996-1-pbrobinson@gmail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-01-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - vdpa/mlx5: support for resumable vqs - virtio_scsi: mq_poll support - 3virtio_pmem: support SHMEM_REGION - virtio_balloon: stay awake while adjusting balloon - virtio: support for no-reset virtio PCI PM - Fixes, cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Add mkey leak detection vdpa/mlx5: Introduce reference counting to mrs vdpa/mlx5: Use vq suspend/resume during .set_map vdpa/mlx5: Mark vq state for modification in hw vq vdpa/mlx5: Mark vq addrs for modification in hw vq vdpa/mlx5: Introduce per vq and device resume vdpa/mlx5: Allow modifying multiple vq fields in one modify command vdpa/mlx5: Expose resumable vq capability vdpa: Block vq property changes in DRIVER_OK vdpa: Track device suspended state scsi: virtio_scsi: Add mq_poll support virtio_pmem: support feature SHMEM_REGION virtio_balloon: stay awake while adjusting balloon vdpa: Remove usage of the deprecated ida_simple_xx() API virtio: Add support for no-reset virtio PCI PM virtio_net: fix missing dma unmap for resize vhost-vdpa: account iommu allocations vdpa: Fix an error handling path in eni_vdpa_probe()
2024-01-12Merge tag 'libnvdimm-for-6.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ira Weiny: "A mix of bug fixes and updates to interfaces used by nvdimm: - Updates to interfaces include: Use the new scope based management Remove deprecated ida interfaces Update to sysfs_emit() - Fixup kdoc comments" * tag 'libnvdimm-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: acpi/nfit: Use sysfs_emit() for all attributes nvdimm/namespace: fix kernel-doc for function params nvdimm/dimm_devs: fix kernel-doc for function params nvdimm/btt: fix btt_blk_cleanup() kernel-doc nvdimm-btt: simplify code with the scope based resource management nvdimm: Remove usage of the deprecated ida_simple_xx() API ACPI: NFIT: Use cleanup.h helpers instead of devm_*()
2024-01-10virtio_pmem: support feature SHMEM_REGIONChangyuan Lyu
This patch adds the support for feature VIRTIO_PMEM_F_SHMEM_REGION (virtio spec v1.2 section 5.19.5.2 [1]). During feature negotiation, if VIRTIO_PMEM_F_SHMEM_REGION is offered by the device, the driver looks for a shared memory region of id 0. If it is found, this feature is understood. Otherwise, this feature bit is cleared. During probe, if VIRTIO_PMEM_F_SHMEM_REGION has been negotiated, virtio pmem ignores the `start` and `size` fields in device config and uses the physical address range of shared memory region 0. [1] https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-6480002 Signed-off-by: Changyuan Lyu <changyuanl@google.com> Message-Id: <20231220204906.566922-1-changyuanl@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2024-01-03nvdimm/namespace: fix kernel-doc for function paramsRandy Dunlap
Adjust kernel-doc notation to prevent warnings when using -Wall. namespace_devs.c:76: warning: No description found for return value of 'nd_is_uuid_unique' namespace_devs.c:343: warning: No description found for return value of 'shrink_dpa_allocation' namespace_devs.c:668: warning: No description found for return value of 'grow_dpa_allocation' namespace_devs.c:958: warning: No description found for return value of 'namespace_update_uuid' namespace_devs.c:1665: warning: Function parameter or member 'nd_mapping' not described in 'create_namespace_pmem' namespace_devs.c:1665: warning: Excess function parameter 'nspm' description in 'create_namespace_pmem' namespace_devs.c:1665: warning: No description found for return value of 'create_namespace_pmem' [iweiny: s/-errno/ERR_PTR(-errno)/] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: <nvdimm@lists.linux.dev> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231207210545.24056-3-rdunlap@infradead.org Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-01-03nvdimm/dimm_devs: fix kernel-doc for function paramsRandy Dunlap
Adjust kernel-doc notation to prevent warnings when using -Wall. dimm_devs.c:59: warning: Function parameter or member 'ndd' not described in 'nvdimm_init_nsarea' dimm_devs.c:59: warning: Excess function parameter 'nvdimm' description in 'nvdimm_init_nsarea' dimm_devs.c:59: warning: No description found for return value of 'nvdimm_init_nsarea' dimm_devs.c:728: warning: No description found for return value of 'nd_pmem_max_contiguous_dpa' dimm_devs.c:773: warning: No description found for return value of 'nd_pmem_available_dpa' dimm_devs.c:844: warning: Function parameter or member 'ndd' not described in 'nvdimm_allocated_dpa' dimm_devs.c:844: warning: Excess function parameter 'nvdimm' description in 'nvdimm_allocated_dpa' dimm_devs.c:844: warning: No description found for return value of 'nvdimm_allocated_dpa' [iweiny: drop ND_CMD_* status code] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: <nvdimm@lists.linux.dev> Link: https://lore.kernel.org/r/20231207210545.24056-2-rdunlap@infradead.org Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-01-03nvdimm/btt: fix btt_blk_cleanup() kernel-docRandy Dunlap
Correct the function parameters to prevent kernel-doc warnings: btt.c:1567: warning: Function parameter or member 'nd_region' not described in 'btt_init' btt.c:1567: warning: Excess function parameter 'maxlane' description in 'btt_init' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: <nvdimm@lists.linux.dev> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231207210545.24056-1-rdunlap@infradead.org Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-01-03nvdimm-btt: simplify code with the scope based resource managementDinghao Liu
Use the scope based resource management (defined in linux/cleanup.h) to automate resource lifetime control on struct btt_sb *super in discover_arenas(). Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231214083919.22218-1-dinghao.liu@zju.edu.cn Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-01-03nvdimm: Remove usage of the deprecated ida_simple_xx() APIChristophe JAILLET
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/50719568e4108f65f3b989ba05c1563e17afba3f.1702228319.git.christophe.jaillet@wanadoo.fr Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2023-12-01nvdimm/btt: replace deprecated strncpy with strscpyJustin Stitt
Found with grep. strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect super->signature to be NUL-terminated based on its usage with memcmp against a NUL-term'd buffer: btt_devs.c: 253 | if (memcmp(super->signature, BTT_SIG, BTT_SIG_LEN) != 0) btt.h: 13 | #define BTT_SIG "BTT_ARENA_INFO\0" NUL-padding is not required as `super` is already zero-allocated: btt.c: 985 | super = kzalloc(sizeof(struct btt_sb), GFP_NOIO); ... rendering any additional NUL-padding superfluous. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Let's also use the more idiomatic strscpy usage of (dest, src, sizeof(dest)) instead of (dest, src, XYZ_LEN) for buffers that the compiler can determine the size of. This more tightly correlates the destination buffer to the amount of bytes copied. Side note, this pattern of memcmp() on two NUL-terminated strings should really be changed to just a strncmp(), if i'm not mistaken? I see multiple instances of this pattern in this system: | if (memcmp(super->signature, BTT_SIG, BTT_SIG_LEN) != 0) | return false; where BIT_SIG is defined (weirdly) as a double NUL-terminated string: | #define BTT_SIG "BTT_ARENA_INFO\0" Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231019-strncpy-drivers-nvdimm-btt-c-v2-1-366993878cf0@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-10-18libnvdimm: remove kernel-doc warnings:Zhu Wang
Remove kernel-doc warnings: drivers/nvdimm/badrange.c:271: warning: Function parameter or member 'nd_region' not described in 'nvdimm_badblocks_populate' drivers/nvdimm/badrange.c:271: warning: Function parameter or member 'range' not described in 'nvdimm_badblocks_populate' drivers/nvdimm/badrange.c:271: warning: Excess function parameter 'region' description in 'nvdimm_badblocks_populate' drivers/nvdimm/badrange.c:271: warning: Excess function parameter 'res' description in 'nvdimm_badblocks_populate' Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Link: https://lore.kernel.org/r/20230731112942.215135-1-wangzhu9@huawei.com Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2023-09-27libnvdimm: Annotate struct nd_region with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct nd_region. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: nvdimm@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2023-09-27nd_btt: Make BTT lanes preemptibleTomas Glozar
nd_region_acquire_lane uses get_cpu, which disables preemption. This is an issue on PREEMPT_RT kernels, since btt_write_pg and also nd_region_acquire_lane itself take a spin lock, resulting in BUG: sleeping function called from invalid context. Fix the issue by replacing get_cpu with smp_process_id and migrate_disable when needed. This makes BTT operations preemptible, thus permitting the use of spin_lock. BUG example occurring when running ndctl tests on PREEMPT_RT kernel: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4903, name: libndctl preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 Preemption disabled at: [<ffffffffc1313db5>] nd_region_acquire_lane+0x15/0x90 [libnvdimm] Call Trace: <TASK> dump_stack_lvl+0x8e/0xb0 __might_resched+0x19b/0x250 rt_spin_lock+0x4c/0x100 ? btt_write_pg+0x2d7/0x500 [nd_btt] btt_write_pg+0x2d7/0x500 [nd_btt] ? local_clock_noinstr+0x9/0xc0 btt_submit_bio+0x16d/0x270 [nd_btt] __submit_bio+0x48/0x80 __submit_bio_noacct+0x7e/0x1e0 submit_bio_wait+0x58/0xb0 __blkdev_direct_IO_simple+0x107/0x240 ? inode_set_ctime_current+0x51/0x110 ? __pfx_submit_bio_wait_endio+0x10/0x10 blkdev_write_iter+0x1d8/0x290 vfs_write+0x237/0x330 ... </TASK> Fixes: 5212e11fde4d ("nd_btt: atomic sector updates") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2023-09-27libnvdimm/of_pmem: Use devm_kstrdup instead of kstrdup and check its return ↵Chen Ni
value Use devm_kstrdup() instead of kstrdup() and check its return value to avoid memory leak. Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus provider") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2023-08-30Merge tag 'libnvdimm-for-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull nvdimm updates from Dave Jiang: "This is mostly small cleanups, fixes, and with a change to prevent zero-sized namespace exposed to user for nvdimm. Summary: - kstrtobool() conversion for nvdimm - Add REQ_OP_WRITE for virtio_pmem - Header files update for of_pmem - Restrict zero-sized namespace from being exposed to user - Avoid unnecessary endian conversion - Fix mem leak in nvdimm pmu - Fix dereference after free in nvdimm pmu" * tag 'libnvdimm-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm: Fix dereference after free in register_nvdimm_pmu() nvdimm: Fix memleak of pmu attr_groups in unregister_nvdimm_pmu() nvdimm/pfn_dev: Avoid unnecessary endian conversion nvdimm/pfn_dev: Prevent the creation of zero-sized namespaces nvdimm: Explicitly include correct DT includes virtio_pmem: add the missing REQ_OP_WRITE for flush bio nvdimm: Use kstrtobool() instead of strtobool()
2023-08-18mm/hugepage pud: allow arch-specific helper function to check huge page pud ↵Aneesh Kumar K.V
support Patch series "Add support for DAX vmemmap optimization for ppc64", v6. This patch series implements changes required to support DAX vmemmap optimization for ppc64. The vmemmap optimization is only enabled with radix MMU translation and 1GB PUD mapping with 64K page size. The patch series also splits the hugetlb vmemmap optimization as a separate Kconfig variable so that architectures can enable DAX vmemmap optimization without enabling hugetlb vmemmap optimization. This should enable architectures like arm64 to enable DAX vmemmap optimization while they can't enable hugetlb vmemmap optimization. More details of the same are in patch "mm/vmemmap optimization: Split hugetlb and devdax vmemmap optimization". With 64K page size for 16384 pages added (1G) we save 14 pages With 4K page size for 262144 pages added (1G) we save 4094 pages With 4K page size for 512 pages added (2M) we save 6 pages This patch (of 13): Architectures like powerpc would like to enable transparent huge page pud support only with radix translation. To support that add has_transparent_pud_hugepage() helper that architectures can override. [aneesh.kumar@linux.ibm.com: use the new has_transparent_pud_hugepage()] Link: https://lkml.kernel.org/r/87tttrvtaj.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20230724190759.483013-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20230724190759.483013-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-17nvdimm: Fix dereference after free in register_nvdimm_pmu()Konstantin Meskhidze
'nd_pmu->pmu.attr_groups' is dereferenced in function 'nvdimm_pmu_free_hotplug_memory' call after it has been freed. Because in function 'nvdimm_pmu_free_hotplug_memory' memory pointed by the fields of 'nd_pmu->pmu.attr_groups' is deallocated it is necessary to call 'kfree' after 'nvdimm_pmu_free_hotplug_memory'. Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats") Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/20230817114103.754977-1-konstantin.meskhidze@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2023-08-17nvdimm: Fix memleak of pmu attr_groups in unregister_nvdimm_pmu()Konstantin Meskhidze
Memory pointed by 'nd_pmu->pmu.attr_groups' is allocated in function 'register_nvdimm_pmu' and is lost after 'kfree(nd_pmu)' call in function 'unregister_nvdimm_pmu'. Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats") Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/20230817115945.771826-1-konstantin.meskhidze@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2023-08-11nvdimm/pfn_dev: Avoid unnecessary endian conversionAneesh Kumar K.V
use the local variable that already have the converted values. No functional change in this patch. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20230809053512.350660-2-aneesh.kumar@linux.ibm.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2023-08-11nvdimm/pfn_dev: Prevent the creation of zero-sized namespacesAneesh Kumar K.V
On architectures that have different page size values used for kernel direct mapping and userspace mappings, the user can end up creating zero-sized namespaces as shown below :/sys/bus/nd/devices/region1# cat align 0x1000000 /sys/bus/nd/devices/region1# echo 0x200000 > align /sys/bus/nd/devices/region1/dax1.0# cat supported_alignments 65536 16777216 $ ndctl create-namespace -r region1 -m devdax -s 18M --align 64K { "dev":"namespace1.0", "mode":"devdax", "map":"dev", "size":0, "uuid":"3094329a-0c66-4905-847e-357223e56ab0", "daxregion":{ "id":1, "size":0, "align":65536 }, "align":65536 } similarily for fsdax $ ndctl create-namespace -r region1 -m fsdax -s 18M --align 64K { "dev":"namespace1.0", "mode":"fsdax", "map":"dev", "size":0, "uuid":"45538a6f-dec7-405d-b1da-2a4075e06232", "sector_size":512, "align":65536, "blockdev":"pmem1" } In commit 9ffc1d19fc4a ("mm/memremap_pages: Introduce memremap_compat_align()") memremap_compat_align was added to make sure the kernel always creates namespaces with 16MB alignment. But the user can still override the region alignment and no input validation is done in the region alignment values to retain the flexibility user had before. However, the kernel ensures that only part of the namespace that can be mapped via kernel direct mapping page size is enabled. This is achieved by tracking the unmapped part of the namespace in pfn_sb->end_trunc. The kernel also ensures that the start address of the namespace is also aligned to the kernel direct mapping page size. Depending on the user request, the kernel implements userspace mapping alignment by updating pfn device alignment attribute and this value is used to adjust the start address for userspace mappings. This is tracked in pfn_sb->dataoff. Hence the available size for userspace mapping is: usermapping_size = size of the namespace - pfn_sb->end_trun - pfn_sb->dataoff If the kernel finds the user mapping size zero then don't allow the creation of namespace. After fix: $ ndctl create-namespace -f -r region1 -m devdax -s 18M --align 64K libndctl: ndctl_dax_enable: dax1.1: failed to enable Error: namespace1.2: failed to enable failed to create namespace: No such device or address And existing zero sized namespace will be marked disabled. root@ltczz75-lp2:/home/kvaneesh# ndctl list -N -r region1 -i [ { "dev":"namespace1.0", "mode":"raw", "size":18874368, "uuid":"94a90fb0-8e78-4fb6-a759-ffc62f9fa181", "sector_size":512, "state":"disabled" }, Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/20230809053512.350660-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2023-07-19nvdimm: Explicitly include correct DT includesRob Herring
The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it as merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2023-07-19virtio_pmem: add the missing REQ_OP_WRITE for flush bioHou Tao
When doing mkfs.xfs on a pmem device, the following warning was reported: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct Modules linked in: CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:submit_bio_noacct+0x340/0x520 ...... Call Trace: <TASK> ? submit_bio_noacct+0xd5/0x520 submit_bio+0x37/0x60 async_pmem_flush+0x79/0xa0 nvdimm_flush+0x17/0x40 pmem_submit_bio+0x370/0x390 __submit_bio+0xbc/0x190 submit_bio_noacct_nocheck+0x14d/0x370 submit_bio_noacct+0x1ef/0x520 submit_bio+0x55/0x60 submit_bio_wait+0x5a/0xc0 blkdev_issue_flush+0x44/0x60 The root cause is that submit_bio_noacct() needs bio_op() is either WRITE or ZONE_APPEND for flush bio and async_pmem_flush() doesn't assign REQ_OP_WRITE when allocating flush bio, so submit_bio_noacct just fail the flush bio. Simply fix it by adding the missing REQ_OP_WRITE for flush bio. And we could fix the flush order issue and do flush optimization later. Cc: stable@vger.kernel.org # 6.3+ Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Tested-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2023-07-19nvdimm: Use kstrtobool() instead of strtobool()Christophe JAILLET
strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2023-06-26dax: enable dax fault handler to report VM_FAULT_HWPOISONJane Chu
When multiple processes mmap() a dax file, then at some point, a process issues a 'load' and consumes a hwpoison, the process receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb set for the poison scope. Soon after, any other process issues a 'load' to the poisoned page (that is unmapped from the kernel side by memory_failure), it receives a SIGBUS with si_code = BUS_ADRERR and without valid si_lsb. This is confusing to user, and is different from page fault due to poison in RAM memory, also some helpful information is lost. Channel dax backend driver's poison detection to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON. If user level block IO syscalls fail due to poison, the errno will be converted to EIO to maintain block API consistency. Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-06-23nvdimm: make security_show staticBen Dooks
The security_show function is not used outside of drivers/nvdimm/dimm_devs.c and the attribute it is for is also already static. Silence the sparse warning for this not being declared by making it static. Fixes: drivers/nvdimm/dimm_devs.c:352:9: warning: symbol 'security_show' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://lore.kernel.org/r/20230616160925.17687-1-ben.dooks@codethink.co.uk Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-06-23nvdimm: make nd_class variable staticBen Dooks
The nd_class is not used outside of drivers/nvdimm/bus.c and thus sparse is generating the following warning. Remove this by making it static: drivers/nvdimm/bus.c:28:14: warning: symbol 'nd_class' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://lore.kernel.org/r/20230616160628.11801-1-ben.dooks@codethink.co.uk Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-06-07libnvdimm: mark 'security_show' static againArnd Bergmann
The security_show() function was made global and __weak at some point to allow overriding it. The override was removed later, but it remains global, which causes a warning about the missing declaration: drivers/nvdimm/dimm_devs.c:352:9: error: no previous prototype for 'security_show' This is also not an appropriate name for a global symbol in the kernel, so just make it static again. Fixes: 15a8348707ff ("libnvdimm: Introduce CONFIG_NVDIMM_SECURITY_TEST flag") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230516201415.556858-3-arnd@kernel.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-03-17driver core: class: remove module * from class_create()Greg Kroah-Hartman
The module pointer in class_create() never actually did anything, and it shouldn't have been requred to be set as a parameter even if it did something. So just remove it and fix up all callers of the function in the kernel tree at the same time. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...
2023-02-25Merge tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds
Pull Compute Express Link (CXL) updates from Dan Williams: "To date Linux has been dependent on platform-firmware to map CXL RAM regions and handle events / errors from devices. With this update we can now parse / update the CXL memory layout, and report events / errors from devices. This is a precursor for the CXL subsystem to handle the end-to-end "RAS" flow for CXL memory. i.e. the flow that for DDR-attached-DRAM is handled by the EDAC driver where it maps system physical address events to a field-replaceable-unit (FRU / endpoint device). In general, CXL has the potential to standardize what has historically been a pile of memory-controller-specific error handling logic. Another change of note is the default policy for handling RAM-backed device-dax instances. Previously the default access mode was "device", mmap(2) a device special file to access memory. The new default is "kmem" where the address range is assigned to the core-mm via add_memory_driver_managed(). This saves typical users from wondering why their platform memory is not visible via free(1) and stuck behind a device-file. At the same time it allows expert users to deploy policy to, for example, get dedicated access to high performance memory, or hide low performance memory from general purpose kernel allocations. This affects not only CXL, but also systems with high-bandwidth-memory that platform-firmware tags with the EFI_MEMORY_SP (special purpose) designation. Summary: - CXL RAM region enumeration: instantiate 'struct cxl_region' objects for platform firmware created memory regions - CXL RAM region provisioning: complement the existing PMEM region creation support with RAM region support - "Soft Reservation" policy change: Online (memory hot-add) soft-reserved memory (EFI_MEMORY_SP) by default, but still allow for setting aside such memory for dedicated access via device-dax. - CXL Events and Interrupts: Takeover CXL event handling from platform-firmware (ACPI calls this CXL Memory Error Reporting) and export CXL Events via Linux Trace Events. - Convey CXL _OSC results to drivers: Similar to PCI, let the CXL subsystem interrogate the result of CXL _OSC negotiation. - Emulate CXL DVSEC Range Registers as "decoders": Allow for first-generation devices that pre-date the definition of the CXL HDM Decoder Capability to translate the CXL DVSEC Range Registers into 'struct cxl_decoder' objects. - Set timestamp: Per spec, set the device timestamp in case of hotplug, or if platform-firwmare failed to set it. - General fixups: linux-next build issues, non-urgent fixes for pre-production hardware, unit test fixes, spelling and debug message improvements" * tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (66 commits) dax/kmem: Fix leak of memory-hotplug resources cxl/mem: Add kdoc param for event log driver state cxl/trace: Add serial number to trace points cxl/trace: Add host output to trace points cxl/trace: Standardize device information output cxl/pci: Remove locked check for dvsec_range_allowed() cxl/hdm: Add emulation when HDM decoders are not committed cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decoders cxl/hdm: Emulate HDM decoder from DVSEC range registers cxl/pci: Refactor cxl_hdm_decode_init() cxl/port: Export cxl_dvsec_rr_decode() to cxl_port cxl/pci: Break out range register decoding from cxl_hdm_decode_init() cxl: add RAS status unmasking for CXL cxl: remove unnecessary calling of pci_enable_pcie_error_reporting() dax/hmem: build hmem device support as module if possible dax: cxl: add CXL_REGION dependency cxl: avoid returning uninitialized error code cxl/pmem: Fix nvdimm registration races cxl/mem: Fix UAPI command comment cxl/uapi: Tag commands from cxl_query_cmd() ...