summaryrefslogtreecommitdiff
path: root/drivers/base/arch_numa.c
AgeCommit message (Collapse)Author
2024-12-01arch_numa: Restore nid checks before registering a memblock with a nodeMarc Zyngier
Commit 767507654c22 ("arch_numa: switch over to numa_memblks") significantly cleaned up the NUMA registration code, but also dropped a significant check that was refusing to accept to configure a memblock with an invalid nid. On "quality hardware" such as my ThunderX machine, this results in a kernel that dies immediately: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0a10] [ 0.000000] Linux version 6.12.0-00013-g8920d74cf8db (maz@valley-girl) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #3872 SMP PREEMPT Wed Nov 27 15:25:49 GMT 2024 [ 0.000000] KASLR disabled due to lack of seed [ 0.000000] Machine model: Cavium ThunderX CN88XX board [ 0.000000] efi: EFI v2.4 by American Megatrends [ 0.000000] efi: ESRT=0xffce0ff18 SMBIOS 3.0=0xfffb0000 ACPI 2.0=0xffec60000 MEMRESERVE=0xffc905d98 [ 0.000000] esrt: Reserving ESRT space from 0x0000000ffce0ff18 to 0x0000000ffce0ff50. [ 0.000000] earlycon: pl11 at MMIO 0x000087e024000000 (options '115200n8') [ 0.000000] printk: legacy bootconsole [pl11] enabled [ 0.000000] NODE_DATA(0) allocated [mem 0xff6754580-0xff67566bf] [ 0.000000] Unable to handle kernel paging request at virtual address 0000000000001d40 [ 0.000000] Mem abort info: [ 0.000000] ESR = 0x0000000096000004 [ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits [ 0.000000] SET = 0, FnV = 0 [ 0.000000] EA = 0, S1PTW = 0 [ 0.000000] FSC = 0x04: level 0 translation fault [ 0.000000] Data abort info: [ 0.000000] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 0.000000] [0000000000001d40] user address but active_mm is swapper [ 0.000000] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.12.0-00013-g8920d74cf8db #3872 [ 0.000000] Hardware name: Cavium ThunderX CN88XX board (DT) [ 0.000000] pstate: a00000c5 (NzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.000000] pc : sparse_init_nid+0x54/0x428 [ 0.000000] lr : sparse_init+0x118/0x240 [ 0.000000] sp : ffff800081da3cb0 [ 0.000000] x29: ffff800081da3cb0 x28: 0000000fedbab10c x27: 0000000000000001 [ 0.000000] x26: 0000000ffee250f8 x25: 0000000000000001 x24: ffff800082102cd0 [ 0.000000] x23: 0000000000000001 x22: 0000000000000000 x21: 00000000001fffff [ 0.000000] x20: 0000000000000001 x19: 0000000000000000 x18: ffffffffffffffff [ 0.000000] x17: 0000000001b00000 x16: 0000000ffd130000 x15: 0000000000000000 [ 0.000000] x14: 00000000003e0000 x13: 00000000000001c8 x12: 0000000000000014 [ 0.000000] x11: ffff800081e82860 x10: ffff8000820fb2c8 x9 : ffff8000820fb490 [ 0.000000] x8 : 0000000000ffed20 x7 : 0000000000000014 x6 : 00000000001fffff [ 0.000000] x5 : 00000000ffffffff x4 : 0000000000000000 x3 : 0000000000000000 [ 0.000000] x2 : 0000000000000000 x1 : 0000000000000040 x0 : 0000000000000007 [ 0.000000] Call trace: [ 0.000000] sparse_init_nid+0x54/0x428 [ 0.000000] sparse_init+0x118/0x240 [ 0.000000] bootmem_init+0x70/0x1c8 [ 0.000000] setup_arch+0x184/0x270 [ 0.000000] start_kernel+0x74/0x670 [ 0.000000] __primary_switched+0x80/0x90 [ 0.000000] Code: f865d804 d37df060 cb030000 d2800003 (b95d4084) [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- while previous kernel versions were able to recognise how brain-damaged the machine is, and only build a fake node. Use the memblock_validate_numa_coverage() helper to restore some sanity and a "working" system. Fixes: 767507654c22 ("arch_numa: switch over to numa_memblks") Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241201092702.3792845-1-maz@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2024-09-03arch_numa: switch over to numa_memblksMike Rapoport (Microsoft)
Until now arch_numa was directly translating firmware NUMA information to memblock. Using numa_memblks as an intermediate step has a few advantages: * alignment with more battle tested x86 implementation * availability of NUMA emulation * maintaining node information for not yet populated memory Adjust a few places in numa_memblks to compile with 32-bit phys_addr_t and replace current functionality related to numa_add_memblk() and __node_distance() in arch_numa with the implementation based on numa_memblks and add functions required by numa_emulation. [rppt@kernel.org: fix section mismatch] Link: https://lkml.kernel.org/r/ZrO6cExVz1He_yPn@kernel.org [rppt@kernel.org: PFN_PHYS() translation is unnecessary here] Link: https://lkml.kernel.org/r/Zs2T5wkSYO9MGcab@kernel.org Link: https://lkml.kernel.org/r/20240807064110.1003856-25-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03arch, mm: pull out allocation of NODE_DATA to generic codeMike Rapoport (Microsoft)
Architectures that support NUMA duplicate the code that allocates NODE_DATA on the node-local memory with slight variations in reporting of the addresses where the memory was allocated. Use x86 version as the basis for the generic alloc_node_data() function and call this function in architecture specific numa initialization. Round up node data size to SMP_CACHE_BYTES rather than to PAGE_SIZE like x86 used to do since the bootmem era when allocation granularity was PAGE_SIZE anyway. Link: https://lkml.kernel.org/r/20240807064110.1003856-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03arch, mm: move definition of node_data to generic codeMike Rapoport (Microsoft)
Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-22ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_initHaibo Xu
There are lots of ACPI enabled systems that aren't NUMA and If the firmware didn't provide the SRAT/SLIT, then there will be a message "Failed to initialise from firmware" from arch_acpi_numa_init() which adding noise to the boot on all of those kind of systems. Replace the pr_info with pr_debug in arch_acpi_numa_init() to avoid it. Suggested-by: Sunil V L <sunilvl@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/109354315a02cd22145d2effa4a8c571b69d3e56.1718268003.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-05arm64: irq: set the correct node for VMAP stackHuang Shijie
In current code, init_irq_stacks() will call cpu_to_node(). The cpu_to_node() depends on percpu "numa_node" which is initialized in: arch_call_rest_init() --> rest_init() -- kernel_init() --> kernel_init_freeable() --> smp_prepare_cpus() But init_irq_stacks() is called in init_IRQ() which is before arch_call_rest_init(). So in init_irq_stacks(), the cpu_to_node() does not work, it always return 0. In NUMA, it makes the node 1 cpu accesses the IRQ stack which is in the node 0. This patch fixes it by: 1.) export the early_cpu_to_node(), and use it in the init_irq_stacks(). 2.) change init_irq_stacks() to __init function. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com> Link: https://lore.kernel.org/r/20231124031513.81548-1-shijie@os.amperecomputing.com Signed-off-by: Will Deacon <will@kernel.org>
2022-01-20mm: percpu: add generic pcpu_populate_pte() functionKefeng Wang
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to populate pte, this patch adds a generic pcpu populate pte function, pcpu_populate_pte(), which is marked __weak and used on most architectures, but it is overridden on x86, which has its own implementation. Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: add generic pcpu_fc_alloc/free funcitonKefeng Wang
With the previous patch, we could add a generic pcpu first chunk allocate and free function to cleanup the duplicated definations on each architecture. Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedefKefeng Wang
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu first chunk allocation will call it to alloc memblock on the corresponding node by it, this is prepare for the next patch. Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memblock: use memblock_free for freeing virtual pointersMike Rapoport
Rename memblock_free_ptr() to memblock_free() and use memblock_free() when freeing a virtual pointer so that memblock_free() will be a counterpart of memblock_alloc() The callers are updated with the below semantic patch and manual addition of (void *) casting to pointers that are represented by unsigned long variables. @@ identifier vaddr; expression size; @@ ( - memblock_phys_free(__pa(vaddr), size); + memblock_free(vaddr, size); | - memblock_free_ptr(vaddr, size); + memblock_free(vaddr, size); ) [sfr@canb.auug.org.au: fixup] Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memblock: rename memblock_free to memblock_phys_freeMike Rapoport
Since memblock_free() operates on a physical range, make its name reflect it and rename it to memblock_phys_free(), so it will be a logical counterpart to memblock_phys_alloc(). The callers are updated with the below semantic patch: @@ expression addr; expression size; @@ - memblock_free(addr, size); + memblock_phys_free(addr, size); Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memblock: drop memblock_free_early_nid() and memblock_free_early()Mike Rapoport
memblock_free_early_nid() is unused and memblock_free_early() is an alias for memblock_free(). Replace calls to memblock_free_early() with calls to memblock_free() and remove memblock_free_early() and memblock_free_early_nid(). Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06arch_numa: simplify numa_distance allocationMike Rapoport
Patch series "memblock: cleanup memblock_free interface", v2. This is the fix for memblock freeing APIs mismatch [1]. The first patch is a cleanup of numa_distance allocation in arch_numa I've spotted during the conversion. The second patch is a fix for Xen memory freeing on some of the error paths. [1] https://lore.kernel.org/all/CAHk-=wj9k4LZTz+svCxLYs5Y1=+yKrbAUArH1+ghyG3OLd8VVg@mail.gmail.com This patch (of 6): Memory allocation of numa_distance uses memblock_phys_alloc_range() without actual range limits, converts the returned physical address to virtual and then only uses the virtual address for further initialization. Simplify this by replacing memblock_phys_alloc_range() with memblock_alloc(). Link: https://lkml.kernel.org/r/20210930185031.18648-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20210930185031.18648-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06arm64: support page mapping percpu first chunk allocatorKefeng Wang
Percpu embedded first chunk allocator is the firstly option, but it could fails on ARM64, eg, percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000 percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000 percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000 then we could get WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838 and the system could not boot successfully. Let's implement page mapping percpu first chunk allocator as a fallback to the embedding allocator to increase the robustness of the system. Link: https://lkml.kernel.org/r/20210910053354.26721-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Marco Elver <elver@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14memblock: introduce saner 'memblock_free_ptr()' interfaceLinus Torvalds
The boot-time allocation interface for memblock is a mess, with 'memblock_alloc()' returning a virtual pointer, but then you are supposed to free it with 'memblock_free()' that takes a _physical_ address. Not only is that all kinds of strange and illogical, but it actually causes bugs, when people then use it like a normal allocation function, and it fails spectacularly on a NULL pointer: https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/ or just random memory corruption if the debug checks don't catch it: https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/ I really don't want to apply patches that treat the symptoms, when the fundamental cause is this horribly confusing interface. I started out looking at just automating a sane replacement sequence, but because of this mix or virtual and physical addresses, and because people have used the "__pa()" macro that can take either a regular kernel pointer, or just the raw "unsigned long" address, it's all quite messy. So this just introduces a new saner interface for freeing a virtual address that was allocated using 'memblock_alloc()', and that was kept as a regular kernel pointer. And then it converts a couple of users that are obvious and easy to test, including the 'xbc_nodes' case in lib/bootconfig.c that caused problems. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03memblock: make memblock_find_in_range method privateMike Rapoport
There are a lot of uses of memblock_find_in_range() along with memblock_reserve() from the times memblock allocation APIs did not exist. memblock_find_in_range() is the very core of memblock allocations, so any future changes to its internal behaviour would mandate updates of all the users outside memblock. Replace the calls to memblock_find_in_range() with an equivalent calls to memblock_phys_alloc() and memblock_phys_alloc_range() and make memblock_find_in_range() private method of memblock. This simplifies the callers, ensures that (unlikely) errors in memblock_reserve() are handled and improves maintainability of memblock_find_in_range(). Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv] Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-18arch_numa: fix common code printing of phys_addr_tRandy Dunlap
Fix build warnings in the arch_numa common code: ../include/linux/kern_levels.h:5:18: warning: format '%Lx' expects argument of type 'long long unsigned int', but argument 3 has type 'phys_addr_t' {aka 'unsigned int'} [-Wformat=] ../drivers/base/arch_numa.c:360:56: note: format string is defined here 360 | pr_warn("Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n", ../drivers/base/arch_numa.c:435:39: note: format string is defined here 435 | pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n", start, end - 1); Fixes: ae3c107cd8be ("numa: Move numa implementation to common code") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-14numa: Move numa implementation to common codeAtish Patra
ARM64 numa implementation is generic enough that RISC-V can reuse that implementation with very minor cosmetic changes. This will help both ARM64 and RISC-V in terms of maintanace and feature improvement Move the numa implementation code to common directory so that both ISAs can reuse this. This doesn't introduce any function changes for ARM64. Signed-off-by: Atish Patra <atish.patra@wdc.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>