summaryrefslogtreecommitdiff
path: root/Documentation/mm
diff options
context:
space:
mode:
authorMark Brown <broonie@kernel.org>2023-07-17 06:12:31 +0100
committerMark Brown <broonie@kernel.org>2023-07-17 06:12:31 +0100
commit0791faebfe750292a8a842b64795a390ca4a3b51 (patch)
tree0e6095a5a0130398b0693bddfdc421c41eebda7c /Documentation/mm
parente8bf1741c14eb8e4a4e1364d45aeeab66660ab9b (diff)
parentfdf0eaf11452d72945af31804e2a1048ee1b574c (diff)
downloadlwn-0791faebfe750292a8a842b64795a390ca4a3b51.tar.gz
lwn-0791faebfe750292a8a842b64795a390ca4a3b51.zip
ASoC: Merge v6.5-rc2
Get a similar baseline to my other branches, and fixes for people using the branch.
Diffstat (limited to 'Documentation/mm')
-rw-r--r--Documentation/mm/damon/design.rst337
-rw-r--r--Documentation/mm/damon/faq.rst23
-rw-r--r--Documentation/mm/damon/maintainer-profile.rst4
-rw-r--r--Documentation/mm/page_migration.rst7
-rw-r--r--Documentation/mm/page_table_check.rst19
-rw-r--r--Documentation/mm/page_tables.rst149
-rw-r--r--Documentation/mm/split_page_table_lock.rst17
7 files changed, 495 insertions, 61 deletions
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index 0cff6fac6b7e..4bfdf1d30c4a 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -4,31 +4,55 @@
Design
======
-Configurable Layers
-===================
-
-DAMON provides data access monitoring functionality while making the accuracy
-and the overhead controllable. The fundamental access monitorings require
-primitives that dependent on and optimized for the target address space. On
-the other hand, the accuracy and overhead tradeoff mechanism, which is the core
-of DAMON, is in the pure logic space. DAMON separates the two parts in
-different layers and defines its interface to allow various low level
-primitives implementations configurable with the core logic. We call the low
-level primitives implementations monitoring operations.
-
-Due to this separated design and the configurable interface, users can extend
-DAMON for any address space by configuring the core logics with appropriate
-monitoring operations. If appropriate one is not provided, users can implement
-the operations on their own.
+
+Overall Architecture
+====================
+
+DAMON subsystem is configured with three layers including
+
+- Operations Set: Implements fundamental operations for DAMON that depends on
+ the given monitoring target address-space and available set of
+ software/hardware primitives,
+- Core: Implements core logics including monitoring overhead/accurach control
+ and access-aware system operations on top of the operations set layer, and
+- Modules: Implements kernel modules for various purposes that provides
+ interfaces for the user space, on top of the core layer.
+
+
+Configurable Operations Set
+---------------------------
+
+For data access monitoring and additional low level work, DAMON needs a set of
+implementations for specific operations that are dependent on and optimized for
+the given target address space. On the other hand, the accuracy and overhead
+tradeoff mechanism, which is the core logic of DAMON, is in the pure logic
+space. DAMON separates the two parts in different layers, namely DAMON
+Operations Set and DAMON Core Logics Layers, respectively. It further defines
+the interface between the layers to allow various operations sets to be
+configured with the core logic.
+
+Due to this design, users can extend DAMON for any address space by configuring
+the core logic to use the appropriate operations set. If any appropriate set
+is unavailable, users can implement one on their own.
For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
-Also, if some architectures or devices support special optimized access check
-primitives, those will be easily configurable.
+Also, if some architectures or devices supporting special optimized access
+check primitives, those will be easily configurable.
-Reference Implementations of Address Space Specific Monitoring Operations
-=========================================================================
+Programmable Modules
+--------------------
+
+Core layer of DAMON is implemented as a framework, and exposes its application
+programming interface to all kernel space components such as subsystems and
+modules. For common use cases of DAMON, DAMON subsystem provides kernel
+modules that built on top of the core layer using the API, which can be easily
+used by the user space end users.
+
+
+Operations Set Layer
+====================
The monitoring operations are defined in two parts:
@@ -90,8 +114,12 @@ conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags,
as Idle page tracking does.
-Address Space Independent Core Mechanisms
-=========================================
+Core Logics
+===========
+
+
+Monitoring
+----------
Below four sections describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
@@ -100,7 +128,7 @@ regions``.
Access Frequency Monitoring
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
The output of DAMON says what pages are how frequently accessed for a given
duration. The resolution of the access frequency is controlled by setting
@@ -127,7 +155,7 @@ size of the target workload grows.
Region Based Sampling
----------------------
+~~~~~~~~~~~~~~~~~~~~~
To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that assumed to have the same access frequencies into a region. As long as the
@@ -144,7 +172,7 @@ assumption is not guaranteed.
Adaptive Regions Adjustment
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
Even somehow the initial monitoring target regions are well constructed to
fulfill the assumption (pages in same region have similar access frequencies),
@@ -162,8 +190,22 @@ In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.
+Age Tracking
+~~~~~~~~~~~~
+
+By analyzing the monitoring results, users can also find how long the current
+access pattern of a region has maintained. That could be used for good
+understanding of the access pattern. For example, page placement algorithm
+utilizing both the frequency and the recency could be implemented using that.
+To make such access pattern maintained period analysis easier, DAMON maintains
+yet another counter called ``age`` in each region. For each ``aggregation
+interval``, DAMON checks if the region's size and access frequency
+(``nr_accesses``) has significantly changed. If so, the counter is reset to
+zero. Otherwise, the counter is increased.
+
+
Dynamic Target Space Updates Handling
--------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The monitoring target address range could dynamically changed. For example,
virtual memory could be dynamically mapped and unmapped. Physical memory could
@@ -174,3 +216,246 @@ monitoring operations to check dynamic changes including memory mapping changes
and applies it to monitoring operations-related data structures such as the
abstracted monitoring target memory area only for each of a user-specified time
interval (``update interval``).
+
+
+.. _damon_design_damos:
+
+Operation Schemes
+-----------------
+
+One common purpose of data access monitoring is access-aware system efficiency
+optimizations. For example,
+
+ paging out memory regions that are not accessed for more than two minutes
+
+or
+
+ using THP for memory regions that are larger than 2 MiB and showing a high
+ access frequency for more than one minute.
+
+One straightforward approach for such schemes would be profile-guided
+optimizations. That is, getting data access monitoring results of the
+workloads or the system using DAMON, finding memory regions of special
+characteristics by profiling the monitoring results, and making system
+operation changes for the regions. The changes could be made by modifying or
+providing advice to the software (the application and/or the kernel), or
+reconfiguring the hardware. Both offline and online approaches could be
+available.
+
+Among those, providing advice to the kernel at runtime would be flexible and
+effective, and therefore widely be used. However, implementing such schemes
+could impose unnecessary redundancy and inefficiency. The profiling could be
+redundant if the type of interest is common. Exchanging the information
+including monitoring results and operation advice between kernel and user
+spaces could be inefficient.
+
+To allow users to reduce such redundancy and inefficiencies by offloading the
+works, DAMON provides a feature called Data Access Monitoring-based Operation
+Schemes (DAMOS). It lets users specify their desired schemes at a high
+level. For such specifications, DAMON starts monitoring, finds regions having
+the access pattern of interest, and applies the user-desired operation actions
+to the regions as soon as found.
+
+
+.. _damon_design_damos_action:
+
+Operation Action
+~~~~~~~~~~~~~~~~
+
+The management action that the users desire to apply to the regions of their
+interest. For example, paging out, prioritizing for next reclamation victim
+selection, advising ``khugepaged`` to collapse or split, or doing nothing but
+collecting statistics of the regions.
+
+The list of supported actions is defined in DAMOS, but the implementation of
+each action is in the DAMON operations set layer because the implementation
+normally depends on the monitoring target address space. For example, the code
+for paging specific virtual address ranges out would be different from that for
+physical address ranges. And the monitoring operations implementation sets are
+not mandated to support all actions of the list. Hence, the availability of
+specific DAMOS action depends on what operations set is selected to be used
+together.
+
+Applying an action to a region is considered as changing the region's
+characteristics. Hence, DAMOS resets the age of regions when an action is
+applied to those.
+
+
+.. _damon_design_damos_access_pattern:
+
+Target Access Pattern
+~~~~~~~~~~~~~~~~~~~~~
+
+The access pattern of the schemes' interest. The patterns are constructed with
+the properties that DAMON's monitoring results provide, specifically the size,
+the access frequency, and the age. Users can describe their access pattern of
+interest by setting minimum and maximum values of the three properties. If a
+region's three properties are in the ranges, DAMOS classifies it as one of the
+regions that the scheme is having an interest in.
+
+
+.. _damon_design_damos_quotas:
+
+Quotas
+~~~~~~
+
+DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if
+the target access pattern is not properly tuned. For example, if a huge memory
+region having the access pattern of interest is found, applying the scheme's
+action to all pages of the huge region could consume unacceptably large system
+resources. Preventing such issues by tuning the access pattern could be
+challenging, especially if the access patterns of the workloads are highly
+dynamic.
+
+To mitigate that situation, DAMOS provides an upper-bound overhead control
+feature called quotas. It lets users specify an upper limit of time that DAMOS
+can use for applying the action, and/or a maximum bytes of memory regions that
+the action can be applied within a user-specified time duration.
+
+
+.. _damon_design_damos_quotas_prioritization:
+
+Prioritization
+^^^^^^^^^^^^^^
+
+A mechanism for making a good decision under the quotas. When the action
+cannot be applied to all regions of interest due to the quotas, DAMOS
+prioritizes regions and applies the action to only regions having high enough
+priorities so that it will not exceed the quotas.
+
+The prioritization mechanism should be different for each action. For example,
+rarely accessed (colder) memory regions would be prioritized for page-out
+scheme action. In contrast, the colder regions would be deprioritized for huge
+page collapse scheme action. Hence, the prioritization mechanisms for each
+action are implemented in each DAMON operations set, together with the actions.
+
+Though the implementation is up to the DAMON operations set, it would be common
+to calculate the priority using the access pattern properties of the regions.
+Some users would want the mechanisms to be personalized for their specific
+case. For example, some users would want the mechanism to weigh the recency
+(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users
+to specify the weight of each access pattern property and passes the
+information to the underlying mechanism. Nevertheless, how and even whether
+the weight will be respected are up to the underlying prioritization mechanism
+implementation.
+
+
+.. _damon_design_damos_watermarks:
+
+Watermarks
+~~~~~~~~~~
+
+Conditional DAMOS (de)activation automation. Users might want DAMOS to run
+only under certain situations. For example, when a sufficient amount of free
+memory is guaranteed, running a scheme for proactive reclamation would only
+consume unnecessary system resources. To avoid such consumption, the user would
+need to manually monitor some metrics such as free memory ratio, and turn
+DAMON/DAMOS on or off.
+
+DAMOS allows users to offload such works using three watermarks. It allows the
+users to configure the metric of their interest, and three watermark values,
+namely high, middle, and low. If the value of the metric becomes above the
+high watermark or below the low watermark, the scheme is deactivated. If the
+metric becomes below the mid watermark but above the low watermark, the scheme
+is activated. If all schemes are deactivated by the watermarks, the monitoring
+is also deactivated. In this case, the DAMON worker thread only periodically
+checks the watermarks and therefore incurs nearly zero overhead.
+
+
+.. _damon_design_damos_filters:
+
+Filters
+~~~~~~~
+
+Non-access pattern-based target memory regions filtering. If users run
+self-written programs or have good profiling tools, they could know something
+more than the kernel, such as future access patterns or some special
+requirements for specific types of memory. For example, some users may know
+only anonymous pages can impact their program's performance. They can also
+have a list of latency-critical processes.
+
+To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
+a feature called DAMOS filters. The feature allows users to set an arbitrary
+number of filters for each scheme. Each filter specifies the type of target
+memory, and whether it should exclude the memory of the type (filter-out), or
+all except the memory of the type (filter-in).
+
+As of this writing, anonymous page type and memory cgroup type are supported by
+the feature. Some filter target types can require additional arguments. For
+example, the memory cgroup filter type asks users to specify the file path of
+the memory cgroup for the filter. Hence, users can apply specific schemes to
+only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages
+excluding those of specific cgroups, and any combination of those.
+
+
+Application Programming Interface
+---------------------------------
+
+The programming interface for kernel space data access-aware applications.
+DAMON is a framework, so it does nothing by itself. Instead, it only helps
+other kernel components such as subsystems and modules building their data
+access-aware applications using DAMON's core features. For this, DAMON exposes
+its all features to other kernel components via its application programming
+interface, namely ``include/linux/damon.h``. Please refer to the API
+:doc:`document </mm/damon/api>` for details of the interface.
+
+
+Modules
+=======
+
+Because the core of DAMON is a framework for kernel components, it doesn't
+provide any direct interface for the user space. Such interfaces should be
+implemented by each DAMON API user kernel components, instead. DAMON subsystem
+itself implements such DAMON API user modules, which are supposed to be used
+for general purpose DAMON control and special purpose data access-aware system
+operations, and provides stable application binary interfaces (ABI) for the
+user space. The user space can build their efficient data access-aware
+applications using the interfaces.
+
+
+General Purpose User Interface Modules
+--------------------------------------
+
+DAMON modules that provide user space ABIs for general purpose DAMON usage in
+runtime.
+
+DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs
+interface' are DAMON API user kernel modules that provide ABIs to the
+user-space. Please note that DAMON debugfs interface is currently deprecated.
+
+Like many other ABIs, the modules create files on sysfs and debugfs, allow
+users to specify their requests to and get the answers from DAMON by writing to
+and reading from the files. As a response to such I/O, DAMON user interface
+modules control DAMON and retrieve the results as user requested via the DAMON
+API, and return the results to the user-space.
+
+The ABIs are designed to be used for user space applications development,
+rather than human beings' fingers. Human users are recommended to use such
+user space tools. One such Python-written user space tool is available at
+Github (https://github.com/awslabs/damo), Pypi
+(https://pypistats.org/packages/damo), and Fedora
+(https://packages.fedoraproject.org/pkgs/python-damo/damo/).
+
+Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for
+details of the interfaces.
+
+
+Special-Purpose Access-aware Kernel Modules
+-------------------------------------------
+
+DAMON modules that provide user space ABI for specific purpose DAMON usage.
+
+DAMON sysfs/debugfs user interfaces are for full control of all DAMON features
+in runtime. For each special-purpose system-wide data access-aware system
+operations such as proactive reclamation or LRU lists balancing, the interfaces
+could be simplified by removing unnecessary knobs for the specific purpose, and
+extended for boot-time and even compile time control. Default values of DAMON
+control parameters for the usage would also need to be optimized for the
+purpose.
+
+To support such cases, yet more DAMON API user kernel modules that provide more
+simple and optimized user space interfaces are available. Currently, two
+modules for proactive reclamation and LRU lists manipulation are provided. For
+more detail, please read the usage documents for those
+(:doc:`/admin-guide/mm/damon/reclaim` and
+:doc:`/admin-guide/mm/damon/lru_sort`).
diff --git a/Documentation/mm/damon/faq.rst b/Documentation/mm/damon/faq.rst
index dde7e2414ee6..3279dc7a8211 100644
--- a/Documentation/mm/damon/faq.rst
+++ b/Documentation/mm/damon/faq.rst
@@ -4,29 +4,6 @@
Frequently Asked Questions
==========================
-Why a new subsystem, instead of extending perf or other user space tools?
-=========================================================================
-
-First, because it needs to be lightweight as much as possible so that it can be
-used online, any unnecessary overhead such as kernel - user space context
-switching cost should be avoided. Second, DAMON aims to be used by other
-programs including the kernel. Therefore, having a dependency on specific
-tools like perf is not desirable. These are the two biggest reasons why DAMON
-is implemented in the kernel space.
-
-
-Can 'idle pages tracking' or 'perf mem' substitute DAMON?
-=========================================================
-
-Idle page tracking is a low level primitive for access check of the physical
-address space. 'perf mem' is similar, though it can use sampling to minimize
-the overhead. On the other hand, DAMON is a higher-level framework for the
-monitoring of various address spaces. It is focused on memory management
-optimization and provides sophisticated accuracy/overhead handling mechanisms.
-Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of
-DAMON's output, but cannot substitute DAMON.
-
-
Does DAMON support virtual memory only?
=======================================
diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst
index 24a202f03de8..a84c14e59053 100644
--- a/Documentation/mm/damon/maintainer-profile.rst
+++ b/Documentation/mm/damon/maintainer-profile.rst
@@ -3,7 +3,7 @@
DAMON Maintainer Entry Profile
==============================
-The DAMON subsystem covers the files that listed in 'DATA ACCESS MONITOR'
+The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR'
section of 'MAINTAINERS' file.
The mailing lists for the subsystem are damon@lists.linux.dev and
@@ -15,7 +15,7 @@ SCM Trees
There are multiple Linux trees for DAMON development. Patches under
development or testing are queued in damon/next [2]_ by the DAMON maintainer.
-Suffieicntly reviewed patches will be queued in mm-unstable [1]_ by the memory
+Sufficiently reviewed patches will be queued in mm-unstable [1]_ by the memory
management subsystem maintainer. After more sufficient tests, the patches will
be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the
memory management subsystem maintainer.
diff --git a/Documentation/mm/page_migration.rst b/Documentation/mm/page_migration.rst
index 313dce18893e..e35af7805be5 100644
--- a/Documentation/mm/page_migration.rst
+++ b/Documentation/mm/page_migration.rst
@@ -73,14 +73,13 @@ In kernel use of migrate_pages()
It also prevents the swapper or other scans from encountering
the page.
-2. We need to have a function of type new_page_t that can be
+2. We need to have a function of type new_folio_t that can be
passed to migrate_pages(). This function should figure out
- how to allocate the correct new page given the old page.
+ how to allocate the correct new folio given the old folio.
3. The migrate_pages() function is called which attempts
to do the migration. It will call the function to allocate
- the new page for each page that is considered for
- moving.
+ the new folio for each folio that is considered for moving.
How migrate_pages() works
=========================
diff --git a/Documentation/mm/page_table_check.rst b/Documentation/mm/page_table_check.rst
index cfd8f4117cf3..c12838ce6b8d 100644
--- a/Documentation/mm/page_table_check.rst
+++ b/Documentation/mm/page_table_check.rst
@@ -52,3 +52,22 @@ Build kernel with:
Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
table support without extra kernel parameter.
+
+Implementation notes
+====================
+
+We specifically decided not to use VMA information in order to avoid relying on
+MM states (except for limited "struct page" info). The page table check is a
+separate from Linux-MM state machine that verifies that the user accessible
+pages are not falsely shared.
+
+PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
+EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
+regions into the userspace via /dev/mem. At the same time, pages may change
+their properties (e.g., from anonymous pages to named pages) while they are
+still being mapped in the userspace, leading to "corruption" detected by the
+page table check.
+
+Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be mapped via
+/dev/mem. However, these pages are always considered as named pages, so they
+won't break the logic used in the page table check.
diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
index 96939571d7bc..7840c1891751 100644
--- a/Documentation/mm/page_tables.rst
+++ b/Documentation/mm/page_tables.rst
@@ -3,3 +3,152 @@
===========
Page Tables
===========
+
+Paged virtual memory was invented along with virtual memory as a concept in
+1962 on the Ferranti Atlas Computer which was the first computer with paged
+virtual memory. The feature migrated to newer computers and became a de facto
+feature of all Unix-like systems as time went by. In 1985 the feature was
+included in the Intel 80386, which was the CPU Linux 1.0 was developed on.
+
+Page tables map virtual addresses as seen by the CPU into physical addresses
+as seen on the external memory bus.
+
+Linux defines page tables as a hierarchy which is currently five levels in
+height. The architecture code for each supported architecture will then
+map this to the restrictions of the hardware.
+
+The physical address corresponding to the virtual address is often referenced
+by the underlying physical page frame. The **page frame number** or **pfn**
+is the physical address of the page (as seen on the external memory bus)
+divided by `PAGE_SIZE`.
+
+Physical memory address 0 will be *pfn 0* and the highest pfn will be
+the last page of physical memory the external address bus of the CPU can
+address.
+
+With a page granularity of 4KB and a address range of 32 bits, pfn 0 is at
+address 0x00000000, pfn 1 is at address 0x00001000, pfn 2 is at 0x00002000
+and so on until we reach pfn 0xfffff at 0xfffff000. With 16KB pages pfs are
+at 0x00004000, 0x00008000 ... 0xffffc000 and pfn goes from 0 to 0x3fffff.
+
+As you can see, with 4KB pages the page base address uses bits 12-31 of the
+address, and this is why `PAGE_SHIFT` in this case is defined as 12 and
+`PAGE_SIZE` is usually defined in terms of the page shift as `(1 << PAGE_SHIFT)`
+
+Over time a deeper hierarchy has been developed in response to increasing memory
+sizes. When Linux was created, 4KB pages and a single page table called
+`swapper_pg_dir` with 1024 entries was used, covering 4MB which coincided with
+the fact that Torvald's first computer had 4MB of physical memory. Entries in
+this single table were referred to as *PTE*:s - page table entries.
+
+The software page table hierarchy reflects the fact that page table hardware has
+become hierarchical and that in turn is done to save page table memory and
+speed up mapping.
+
+One could of course imagine a single, linear page table with enormous amounts
+of entries, breaking down the whole memory into single pages. Such a page table
+would be very sparse, because large portions of the virtual memory usually
+remains unused. By using hierarchical page tables large holes in the virtual
+address space does not waste valuable page table memory, because it will suffice
+to mark large areas as unmapped at a higher level in the page table hierarchy.
+
+Additionally, on modern CPUs, a higher level page table entry can point directly
+to a physical memory range, which allows mapping a contiguous range of several
+megabytes or even gigabytes in a single high-level page table entry, taking
+shortcuts in mapping virtual memory to physical memory: there is no need to
+traverse deeper in the hierarchy when you find a large mapped range like this.
+
+The page table hierarchy has now developed into this::
+
+ +-----+
+ | PGD |
+ +-----+
+ |
+ | +-----+
+ +-->| P4D |
+ +-----+
+ |
+ | +-----+
+ +-->| PUD |
+ +-----+
+ |
+ | +-----+
+ +-->| PMD |
+ +-----+
+ |
+ | +-----+
+ +-->| PTE |
+ +-----+
+
+
+Symbols on the different levels of the page table hierarchy have the following
+meaning beginning from the bottom:
+
+- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned earlier.
+ The *pte* is an array of `PTRS_PER_PTE` elements of the `pteval_t` type, each
+ mapping a single page of virtual memory to a single page of physical memory.
+ The architecture defines the size and contents of `pteval_t`.
+
+ A typical example is that the `pteval_t` is a 32- or 64-bit value with the
+ upper bits being a **pfn** (page frame number), and the lower bits being some
+ architecture-specific bits such as memory protection.
+
+ The **entry** part of the name is a bit confusing because while in Linux 1.0
+ this did refer to a single page table entry in the single top level page
+ table, it was retrofitted to be an array of mapping elements when two-level
+ page tables were first introduced, so the *pte* is the lowermost page
+ *table*, not a page table *entry*.
+
+- **pmd**, `pmd_t`, `pmdval_t` = **Page Middle Directory**, the hierarchy right
+ above the *pte*, with `PTRS_PER_PMD` references to the *pte*:s.
+
+- **pud**, `pud_t`, `pudval_t` = **Page Upper Directory** was introduced after
+ the other levels to handle 4-level page tables. It is potentially unused,
+ or *folded* as we will discuss later.
+
+- **p4d**, `p4d_t`, `p4dval_t` = **Page Level 4 Directory** was introduced to
+ handle 5-level page tables after the *pud* was introduced. Now it was clear
+ that we needed to replace *pgd*, *pmd*, *pud* etc with a figure indicating the
+ directory level and that we cannot go on with ad hoc names any more. This
+ is only used on systems which actually have 5 levels of page tables, otherwise
+ it is folded.
+
+- **pgd**, `pgd_t`, `pgdval_t` = **Page Global Directory** - the Linux kernel
+ main page table handling the PGD for the kernel memory is still found in
+ `swapper_pg_dir`, but each userspace process in the system also has its own
+ memory context and thus its own *pgd*, found in `struct mm_struct` which
+ in turn is referenced to in each `struct task_struct`. So tasks have memory
+ context in the form of a `struct mm_struct` and this in turn has a
+ `struct pgt_t *pgd` pointer to the corresponding page global directory.
+
+To repeat: each level in the page table hierarchy is a *array of pointers*, so
+the **pgd** contains `PTRS_PER_PGD` pointers to the next level below, **p4d**
+contains `PTRS_PER_P4D` pointers to **pud** items and so on. The number of
+pointers on each level is architecture-defined.::
+
+ PMD
+ --> +-----+ PTE
+ | ptr |-------> +-----+
+ | ptr |- | ptr |-------> PAGE
+ | ptr | \ | ptr |
+ | ptr | \ ...
+ | ... | \
+ | ptr | \ PTE
+ +-----+ +----> +-----+
+ | ptr |-------> PAGE
+ | ptr |
+ ...
+
+
+Page Table Folding
+==================
+
+If the architecture does not use all the page table levels, they can be *folded*
+which means skipped, and all operations performed on page tables will be
+compile-time augmented to just skip a level when accessing the next lower
+level.
+
+Page table handling code that wishes to be architecture-neutral, such as the
+virtual memory manager, will need to be written so that it traverses all of the
+currently five levels. This style should also be preferred for
+architecture-specific code, so as to be robust to future changes.
diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index 50ee0dfc95be..a834fad9de12 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock.
There are helpers to lock/unlock a table and other accessor functions:
- pte_offset_map_lock()
- maps pte and takes PTE table lock, returns pointer to the taken
- lock;
+ maps PTE and takes PTE table lock, returns pointer to PTE with
+ pointer to its PTE table lock, or returns NULL if no PTE table;
+ - pte_offset_map_nolock()
+ maps PTE, returns pointer to PTE with pointer to its PTE table
+ lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map()
+ maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
+ - pte_unmap()
+ unmaps PTE table;
- pte_unmap_unlock()
unlocks and unmaps PTE table;
- pte_alloc_map_lock()
- allocates PTE table if needed and take the lock, returns pointer
- to taken lock or NULL if allocation failed;
- - pte_lockptr()
- returns pointer to PTE table lock;
+ allocates PTE table if needed and takes its lock, returns pointer to
+ PTE with pointer to its lock, or returns NULL if allocation failed;
- pmd_lock()
takes PMD table lock, returns pointer to taken lock;
- pmd_lockptr()