summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorNick Piggin <npiggin@suse.de>2008-05-14 06:37:36 +0200
committerLinus Torvalds <torvalds@linux-foundation.org>2008-05-14 10:05:18 -0700
commit362a61ad61199e19a61b8e432015e2586b288f5b (patch)
treeb766e454928eea0db1ec6e301340c27ef5f5244f /include
parent73f10281ea96d7e8b4fc1c5d755a7c8eb484155b (diff)
downloadlwn-362a61ad61199e19a61b8e432015e2586b288f5b.tar.gz
lwn-362a61ad61199e19a61b8e432015e2586b288f5b.zip
fix SMP data race in pagetable setup vs walking
There is a possible data race in the page table walking code. After the split ptlock patches, it actually seems to have been introduced to the core code, but even before that I think it would have impacted some architectures (powerpc and sparc64, at least, walk the page tables without taking locks eg. see find_linux_pte()). The race is as follows: The pte page is allocated, zeroed, and its struct page gets its spinlock initialized. The mm-wide ptl is then taken, and then the pte page is inserted into the pagetables. At this point, the spinlock is not guaranteed to have ordered the previous stores to initialize the pte page with the subsequent store to put it in the page tables. So another Linux page table walker might be walking down (without any locks, because we have split-leaf-ptls), and find that new pte we've inserted. It might try to take the spinlock before the store from the other CPU initializes it. And subsequently it might read a pte_t out before stores from the other CPU have cleared the memory. There are also similar races in higher levels of the page tables. They obviously don't involve the spinlock, but could see uninitialized memory. Arch code and hardware pagetable walkers that walk the pagetables without locks could see similar uninitialized memory problems, regardless of whether split ptes are enabled or not. I prefer to put the barriers in core code, because that's where the higher level logic happens, but the page table accessors are per-arch, and open-coding them everywhere I don't think is an option. I'll put the read-side barriers in alpha arch code for now (other architectures perform data-dependent loads in order). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include')
-rw-r--r--include/asm-alpha/pgtable.h21
1 files changed, 19 insertions, 2 deletions
diff --git a/include/asm-alpha/pgtable.h b/include/asm-alpha/pgtable.h
index 05ce5fba43e3..3f0c59f6d8aa 100644
--- a/include/asm-alpha/pgtable.h
+++ b/include/asm-alpha/pgtable.h
@@ -287,17 +287,34 @@ extern inline pte_t pte_mkspecial(pte_t pte) { return pte; }
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
#define pgd_offset(mm, address) ((mm)->pgd+pgd_index(address))
+/*
+ * The smp_read_barrier_depends() in the following functions are required to
+ * order the load of *dir (the pointer in the top level page table) with any
+ * subsequent load of the returned pmd_t *ret (ret is data dependent on *dir).
+ *
+ * If this ordering is not enforced, the CPU might load an older value of
+ * *ret, which may be uninitialized data. See mm/memory.c:__pte_alloc for
+ * more details.
+ *
+ * Note that we never change the mm->pgd pointer after the task is running, so
+ * pgd_offset does not require such a barrier.
+ */
+
/* Find an entry in the second-level page table.. */
extern inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
{
- return (pmd_t *) pgd_page_vaddr(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PAGE - 1));
+ pmd_t *ret = (pmd_t *) pgd_page_vaddr(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PAGE - 1));
+ smp_read_barrier_depends(); /* see above */
+ return ret;
}
/* Find an entry in the third-level page table.. */
extern inline pte_t * pte_offset_kernel(pmd_t * dir, unsigned long address)
{
- return (pte_t *) pmd_page_vaddr(*dir)
+ pte_t *ret = (pte_t *) pmd_page_vaddr(*dir)
+ ((address >> PAGE_SHIFT) & (PTRS_PER_PAGE - 1));
+ smp_read_barrier_depends(); /* see above */
+ return ret;
}
#define pte_offset_map(dir,addr) pte_offset_kernel((dir),(addr))