author | Mark Brown <broonie@kernel.org> | 2023-07-17 06:12:31 +0100 |
---|---|---|
committer | Mark Brown <broonie@kernel.org> | 2023-07-17 06:12:31 +0100 |
commit | 0791faebfe750292a8a842b64795a390ca4a3b51 (patch) | |
tree | 0e6095a5a0130398b0693bddfdc421c41eebda7c /Documentation/mm | |
parent | e8bf1741c14eb8e4a4e1364d45aeeab66660ab9b (diff) | |
parent | fdf0eaf11452d72945af31804e2a1048ee1b574c (diff) | |
ASoC: Merge v6.5-rc2
Get a similar baseline to my other branches, and fixes for people using
the branch.
Diffstat (limited to 'Documentation/mm')
-rw-r--r-- | Documentation/mm/damon/design.rst | 337 |
-rw-r--r-- | Documentation/mm/damon/faq.rst | 23 |
-rw-r--r-- | Documentation/mm/damon/maintainer-profile.rst | 4 |
-rw-r--r-- | Documentation/mm/page_migration.rst | 7 |
-rw-r--r-- | Documentation/mm/page_table_check.rst | 19 |
-rw-r--r-- | Documentation/mm/page_tables.rst | 149 |
-rw-r--r-- | Documentation/mm/split_page_table_lock.rst | 17 |
7 files changed, 495 insertions, 61 deletions
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 0cff6fac6b7e..4bfdf1d30c4a 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -4,31 +4,55 @@ Design ====== -Configurable Layers -=================== - -DAMON provides data access monitoring functionality while making the accuracy -and the overhead controllable. The fundamental access monitorings require -primitives that dependent on and optimized for the target address space. On -the other hand, the accuracy and overhead tradeoff mechanism, which is the core -of DAMON, is in the pure logic space. DAMON separates the two parts in -different layers and defines its interface to allow various low level -primitives implementations configurable with the core logic. We call the low -level primitives implementations monitoring operations. - -Due to this separated design and the configurable interface, users can extend -DAMON for any address space by configuring the core logics with appropriate -monitoring operations. If appropriate one is not provided, users can implement -the operations on their own. + +Overall Architecture +==================== + +DAMON subsystem is configured with three layers including + +- Operations Set: Implements fundamental operations for DAMON that depends on + the given monitoring target address-space and available set of + software/hardware primitives, +- Core: Implements core logics including monitoring overhead/accurach control + and access-aware system operations on top of the operations set layer, and +- Modules: Implements kernel modules for various purposes that provides + interfaces for the user space, on top of the core layer. + + +Configurable Operations Set +--------------------------- + +For data access monitoring and additional low level work, DAMON needs a set of +implementations for specific operations that are dependent on and optimized for +the given target address space. 
On the other hand, the accuracy and overhead +tradeoff mechanism, which is the core logic of DAMON, is in the pure logic +space. DAMON separates the two parts in different layers, namely DAMON +Operations Set and DAMON Core Logics Layers, respectively. It further defines +the interface between the layers to allow various operations sets to be +configured with the core logic. + +Due to this design, users can extend DAMON for any address space by configuring +the core logic to use the appropriate operations set. If any appropriate set +is unavailable, users can implement one on their own. For example, physical memory, virtual memory, swap space, those for specific processes, NUMA nodes, files, and backing memory devices would be supportable. -Also, if some architectures or devices support special optimized access check -primitives, those will be easily configurable. +Also, if some architectures or devices supporting special optimized access +check primitives, those will be easily configurable. -Reference Implementations of Address Space Specific Monitoring Operations -========================================================================= +Programmable Modules +-------------------- + +Core layer of DAMON is implemented as a framework, and exposes its application +programming interface to all kernel space components such as subsystems and +modules. For common use cases of DAMON, DAMON subsystem provides kernel +modules that built on top of the core layer using the API, which can be easily +used by the user space end users. + + +Operations Set Layer +==================== The monitoring operations are defined in two parts: @@ -90,8 +114,12 @@ conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags, as Idle page tracking does. 
-Address Space Independent Core Mechanisms -========================================= +Core Logics +=========== + + +Monitoring +---------- Below four sections describe each of the DAMON core mechanisms and the five monitoring attributes, ``sampling interval``, ``aggregation interval``, @@ -100,7 +128,7 @@ regions``. Access Frequency Monitoring ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ The output of DAMON says what pages are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting @@ -127,7 +155,7 @@ size of the target workload grows. Region Based Sampling ---------------------- +~~~~~~~~~~~~~~~~~~~~~ To avoid the unbounded increase of the overhead, DAMON groups adjacent pages that assumed to have the same access frequencies into a region. As long as the @@ -144,7 +172,7 @@ assumption is not guaranteed. Adaptive Regions Adjustment ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), @@ -162,8 +190,22 @@ In this way, DAMON provides its best-effort quality and minimal overhead while keeping the bounds users set for their trade-off. +Age Tracking +~~~~~~~~~~~~ + +By analyzing the monitoring results, users can also find how long the current +access pattern of a region has maintained. That could be used for good +understanding of the access pattern. For example, page placement algorithm +utilizing both the frequency and the recency could be implemented using that. +To make such access pattern maintained period analysis easier, DAMON maintains +yet another counter called ``age`` in each region. For each ``aggregation +interval``, DAMON checks if the region's size and access frequency +(``nr_accesses``) has significantly changed. If so, the counter is reset to +zero. Otherwise, the counter is increased. 
+ + Dynamic Target Space Updates Handling -------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The monitoring target address range could dynamically changed. For example, virtual memory could be dynamically mapped and unmapped. Physical memory could @@ -174,3 +216,246 @@ monitoring operations to check dynamic changes including memory mapping changes and applies it to monitoring operations-related data structures such as the abstracted monitoring target memory area only for each of a user-specified time interval (``update interval``). + + +.. _damon_design_damos: + +Operation Schemes +----------------- + +One common purpose of data access monitoring is access-aware system efficiency +optimizations. For example, + + paging out memory regions that are not accessed for more than two minutes + +or + + using THP for memory regions that are larger than 2 MiB and showing a high + access frequency for more than one minute. + +One straightforward approach for such schemes would be profile-guided +optimizations. That is, getting data access monitoring results of the +workloads or the system using DAMON, finding memory regions of special +characteristics by profiling the monitoring results, and making system +operation changes for the regions. The changes could be made by modifying or +providing advice to the software (the application and/or the kernel), or +reconfiguring the hardware. Both offline and online approaches could be +available. + +Among those, providing advice to the kernel at runtime would be flexible and +effective, and therefore widely be used. However, implementing such schemes +could impose unnecessary redundancy and inefficiency. The profiling could be +redundant if the type of interest is common. Exchanging the information +including monitoring results and operation advice between kernel and user +spaces could be inefficient. 
+ +To allow users to reduce such redundancy and inefficiencies by offloading the +works, DAMON provides a feature called Data Access Monitoring-based Operation +Schemes (DAMOS). It lets users specify their desired schemes at a high +level. For such specifications, DAMON starts monitoring, finds regions having +the access pattern of interest, and applies the user-desired operation actions +to the regions as soon as found. + + +.. _damon_design_damos_action: + +Operation Action +~~~~~~~~~~~~~~~~ + +The management action that the users desire to apply to the regions of their +interest. For example, paging out, prioritizing for next reclamation victim +selection, advising ``khugepaged`` to collapse or split, or doing nothing but +collecting statistics of the regions. + +The list of supported actions is defined in DAMOS, but the implementation of +each action is in the DAMON operations set layer because the implementation +normally depends on the monitoring target address space. For example, the code +for paging specific virtual address ranges out would be different from that for +physical address ranges. And the monitoring operations implementation sets are +not mandated to support all actions of the list. Hence, the availability of +specific DAMOS action depends on what operations set is selected to be used +together. + +Applying an action to a region is considered as changing the region's +characteristics. Hence, DAMOS resets the age of regions when an action is +applied to those. + + +.. _damon_design_damos_access_pattern: + +Target Access Pattern +~~~~~~~~~~~~~~~~~~~~~ + +The access pattern of the schemes' interest. The patterns are constructed with +the properties that DAMON's monitoring results provide, specifically the size, +the access frequency, and the age. Users can describe their access pattern of +interest by setting minimum and maximum values of the three properties. 
If a +region's three properties are in the ranges, DAMOS classifies it as one of the +regions that the scheme is having an interest in. + + +.. _damon_design_damos_quotas: + +Quotas +~~~~~~ + +DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if +the target access pattern is not properly tuned. For example, if a huge memory +region having the access pattern of interest is found, applying the scheme's +action to all pages of the huge region could consume unacceptably large system +resources. Preventing such issues by tuning the access pattern could be +challenging, especially if the access patterns of the workloads are highly +dynamic. + +To mitigate that situation, DAMOS provides an upper-bound overhead control +feature called quotas. It lets users specify an upper limit of time that DAMOS +can use for applying the action, and/or a maximum bytes of memory regions that +the action can be applied within a user-specified time duration. + + +.. _damon_design_damos_quotas_prioritization: + +Prioritization +^^^^^^^^^^^^^^ + +A mechanism for making a good decision under the quotas. When the action +cannot be applied to all regions of interest due to the quotas, DAMOS +prioritizes regions and applies the action to only regions having high enough +priorities so that it will not exceed the quotas. + +The prioritization mechanism should be different for each action. For example, +rarely accessed (colder) memory regions would be prioritized for page-out +scheme action. In contrast, the colder regions would be deprioritized for huge +page collapse scheme action. Hence, the prioritization mechanisms for each +action are implemented in each DAMON operations set, together with the actions. + +Though the implementation is up to the DAMON operations set, it would be common +to calculate the priority using the access pattern properties of the regions. +Some users would want the mechanisms to be personalized for their specific +case. 
For example, some users would want the mechanism to weigh the recency +(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users +to specify the weight of each access pattern property and passes the +information to the underlying mechanism. Nevertheless, how and even whether +the weight will be respected are up to the underlying prioritization mechanism +implementation. + + +.. _damon_design_damos_watermarks: + +Watermarks +~~~~~~~~~~ + +Conditional DAMOS (de)activation automation. Users might want DAMOS to run +only under certain situations. For example, when a sufficient amount of free +memory is guaranteed, running a scheme for proactive reclamation would only +consume unnecessary system resources. To avoid such consumption, the user would +need to manually monitor some metrics such as free memory ratio, and turn +DAMON/DAMOS on or off. + +DAMOS allows users to offload such works using three watermarks. It allows the +users to configure the metric of their interest, and three watermark values, +namely high, middle, and low. If the value of the metric becomes above the +high watermark or below the low watermark, the scheme is deactivated. If the +metric becomes below the mid watermark but above the low watermark, the scheme +is activated. If all schemes are deactivated by the watermarks, the monitoring +is also deactivated. In this case, the DAMON worker thread only periodically +checks the watermarks and therefore incurs nearly zero overhead. + + +.. _damon_design_damos_filters: + +Filters +~~~~~~~ + +Non-access pattern-based target memory regions filtering. If users run +self-written programs or have good profiling tools, they could know something +more than the kernel, such as future access patterns or some special +requirements for specific types of memory. For example, some users may know +only anonymous pages can impact their program's performance. They can also +have a list of latency-critical processes. 
+ +To let users optimize DAMOS schemes with such special knowledge, DAMOS provides +a feature called DAMOS filters. The feature allows users to set an arbitrary +number of filters for each scheme. Each filter specifies the type of target +memory, and whether it should exclude the memory of the type (filter-out), or +all except the memory of the type (filter-in). + +As of this writing, anonymous page type and memory cgroup type are supported by +the feature. Some filter target types can require additional arguments. For +example, the memory cgroup filter type asks users to specify the file path of +the memory cgroup for the filter. Hence, users can apply specific schemes to +only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages +excluding those of specific cgroups, and any combination of those. + + +Application Programming Interface +--------------------------------- + +The programming interface for kernel space data access-aware applications. +DAMON is a framework, so it does nothing by itself. Instead, it only helps +other kernel components such as subsystems and modules building their data +access-aware applications using DAMON's core features. For this, DAMON exposes +its all features to other kernel components via its application programming +interface, namely ``include/linux/damon.h``. Please refer to the API +:doc:`document </mm/damon/api>` for details of the interface. + + +Modules +======= + +Because the core of DAMON is a framework for kernel components, it doesn't +provide any direct interface for the user space. Such interfaces should be +implemented by each DAMON API user kernel components, instead. DAMON subsystem +itself implements such DAMON API user modules, which are supposed to be used +for general purpose DAMON control and special purpose data access-aware system +operations, and provides stable application binary interfaces (ABI) for the +user space. 
The user space can build their efficient data access-aware +applications using the interfaces. + + +General Purpose User Interface Modules +-------------------------------------- + +DAMON modules that provide user space ABIs for general purpose DAMON usage in +runtime. + +DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs +interface' are DAMON API user kernel modules that provide ABIs to the +user-space. Please note that DAMON debugfs interface is currently deprecated. + +Like many other ABIs, the modules create files on sysfs and debugfs, allow +users to specify their requests to and get the answers from DAMON by writing to +and reading from the files. As a response to such I/O, DAMON user interface +modules control DAMON and retrieve the results as user requested via the DAMON +API, and return the results to the user-space. + +The ABIs are designed to be used for user space applications development, +rather than human beings' fingers. Human users are recommended to use such +user space tools. One such Python-written user space tool is available at +Github (https://github.com/awslabs/damo), Pypi +(https://pypistats.org/packages/damo), and Fedora +(https://packages.fedoraproject.org/pkgs/python-damo/damo/). + +Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for +details of the interfaces. + + +Special-Purpose Access-aware Kernel Modules +------------------------------------------- + +DAMON modules that provide user space ABI for specific purpose DAMON usage. + +DAMON sysfs/debugfs user interfaces are for full control of all DAMON features +in runtime. For each special-purpose system-wide data access-aware system +operations such as proactive reclamation or LRU lists balancing, the interfaces +could be simplified by removing unnecessary knobs for the specific purpose, and +extended for boot-time and even compile time control. 
Default values of DAMON +control parameters for the usage would also need to be optimized for the +purpose. + +To support such cases, yet more DAMON API user kernel modules that provide more +simple and optimized user space interfaces are available. Currently, two +modules for proactive reclamation and LRU lists manipulation are provided. For +more detail, please read the usage documents for those +(:doc:`/admin-guide/mm/damon/reclaim` and +:doc:`/admin-guide/mm/damon/lru_sort`). diff --git a/Documentation/mm/damon/faq.rst b/Documentation/mm/damon/faq.rst index dde7e2414ee6..3279dc7a8211 100644 --- a/Documentation/mm/damon/faq.rst +++ b/Documentation/mm/damon/faq.rst @@ -4,29 +4,6 @@ Frequently Asked Questions ========================== -Why a new subsystem, instead of extending perf or other user space tools? -========================================================================= - -First, because it needs to be lightweight as much as possible so that it can be -used online, any unnecessary overhead such as kernel - user space context -switching cost should be avoided. Second, DAMON aims to be used by other -programs including the kernel. Therefore, having a dependency on specific -tools like perf is not desirable. These are the two biggest reasons why DAMON -is implemented in the kernel space. - - -Can 'idle pages tracking' or 'perf mem' substitute DAMON? -========================================================= - -Idle page tracking is a low level primitive for access check of the physical -address space. 'perf mem' is similar, though it can use sampling to minimize -the overhead. On the other hand, DAMON is a higher-level framework for the -monitoring of various address spaces. It is focused on memory management -optimization and provides sophisticated accuracy/overhead handling mechanisms. -Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of -DAMON's output, but cannot substitute DAMON. - - Does DAMON support virtual memory only? 
 =======================================
 
diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst
index 24a202f03de8..a84c14e59053 100644
--- a/Documentation/mm/damon/maintainer-profile.rst
+++ b/Documentation/mm/damon/maintainer-profile.rst
@@ -3,7 +3,7 @@
 DAMON Maintainer Entry Profile
 ==============================
 
-The DAMON subsystem covers the files that listed in 'DATA ACCESS MONITOR'
+The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR'
 section of 'MAINTAINERS' file.
 
 The mailing lists for the subsystem are damon@lists.linux.dev and
@@ -15,7 +15,7 @@ SCM Trees
 
 There are multiple Linux trees for DAMON development. Patches under
 development or testing are queued in damon/next [2]_ by the DAMON maintainer.
-Suffieicntly reviewed patches will be queued in mm-unstable [1]_ by the memory
+Sufficiently reviewed patches will be queued in mm-unstable [1]_ by the memory
 management subsystem maintainer. After more sufficient tests, the patches will
 be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the
 memory management subsystem maintainer.
diff --git a/Documentation/mm/page_migration.rst b/Documentation/mm/page_migration.rst
index 313dce18893e..e35af7805be5 100644
--- a/Documentation/mm/page_migration.rst
+++ b/Documentation/mm/page_migration.rst
@@ -73,14 +73,13 @@ In kernel use of migrate_pages()
    It also prevents the swapper or other scans from encountering
    the page.
 
-2. We need to have a function of type new_page_t that can be
+2. We need to have a function of type new_folio_t that can be
    passed to migrate_pages(). This function should figure out
-   how to allocate the correct new page given the old page.
+   how to allocate the correct new folio given the old folio.
 
 3. The migrate_pages() function is called which attempts
    to do the migration. It will call the function to allocate
-   the new page for each page that is considered for
-   moving.
+   the new folio for each folio that is considered for moving.
 
 How migrate_pages() works
 =========================
diff --git a/Documentation/mm/page_table_check.rst b/Documentation/mm/page_table_check.rst
index cfd8f4117cf3..c12838ce6b8d 100644
--- a/Documentation/mm/page_table_check.rst
+++ b/Documentation/mm/page_table_check.rst
@@ -52,3 +52,22 @@ Build kernel with:
 
 Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
 table support without extra kernel parameter.
+
+Implementation notes
+====================
+
+We specifically decided not to use VMA information in order to avoid relying on
+MM states (except for limited "struct page" info). The page table check is a
+separate from Linux-MM state machine that verifies that the user accessible
+pages are not falsely shared.
+
+PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
+EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
+regions into the userspace via /dev/mem. At the same time, pages may change
+their properties (e.g., from anonymous pages to named pages) while they are
+still being mapped in the userspace, leading to "corruption" detected by the
+page table check.
+
+Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be mapped via
+/dev/mem. However, these pages are always considered as named pages, so they
+won't break the logic used in the page table check.
diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
index 96939571d7bc..7840c1891751 100644
--- a/Documentation/mm/page_tables.rst
+++ b/Documentation/mm/page_tables.rst
@@ -3,3 +3,152 @@
 ===========
 Page Tables
 ===========
+
+Paged virtual memory was invented along with virtual memory as a concept in
+1962 on the Ferranti Atlas Computer which was the first computer with paged
+virtual memory. The feature migrated to newer computers and became a de facto
+feature of all Unix-like systems as time went by.
In 1985 the feature was +included in the Intel 80386, which was the CPU Linux 1.0 was developed on. + +Page tables map virtual addresses as seen by the CPU into physical addresses +as seen on the external memory bus. + +Linux defines page tables as a hierarchy which is currently five levels in +height. The architecture code for each supported architecture will then +map this to the restrictions of the hardware. + +The physical address corresponding to the virtual address is often referenced +by the underlying physical page frame. The **page frame number** or **pfn** +is the physical address of the page (as seen on the external memory bus) +divided by `PAGE_SIZE`. + +Physical memory address 0 will be *pfn 0* and the highest pfn will be +the last page of physical memory the external address bus of the CPU can +address. + +With a page granularity of 4KB and a address range of 32 bits, pfn 0 is at +address 0x00000000, pfn 1 is at address 0x00001000, pfn 2 is at 0x00002000 +and so on until we reach pfn 0xfffff at 0xfffff000. With 16KB pages pfs are +at 0x00004000, 0x00008000 ... 0xffffc000 and pfn goes from 0 to 0x3fffff. + +As you can see, with 4KB pages the page base address uses bits 12-31 of the +address, and this is why `PAGE_SHIFT` in this case is defined as 12 and +`PAGE_SIZE` is usually defined in terms of the page shift as `(1 << PAGE_SHIFT)` + +Over time a deeper hierarchy has been developed in response to increasing memory +sizes. When Linux was created, 4KB pages and a single page table called +`swapper_pg_dir` with 1024 entries was used, covering 4MB which coincided with +the fact that Torvald's first computer had 4MB of physical memory. Entries in +this single table were referred to as *PTE*:s - page table entries. + +The software page table hierarchy reflects the fact that page table hardware has +become hierarchical and that in turn is done to save page table memory and +speed up mapping. 
+ +One could of course imagine a single, linear page table with enormous amounts +of entries, breaking down the whole memory into single pages. Such a page table +would be very sparse, because large portions of the virtual memory usually +remains unused. By using hierarchical page tables large holes in the virtual +address space does not waste valuable page table memory, because it will suffice +to mark large areas as unmapped at a higher level in the page table hierarchy. + +Additionally, on modern CPUs, a higher level page table entry can point directly +to a physical memory range, which allows mapping a contiguous range of several +megabytes or even gigabytes in a single high-level page table entry, taking +shortcuts in mapping virtual memory to physical memory: there is no need to +traverse deeper in the hierarchy when you find a large mapped range like this. + +The page table hierarchy has now developed into this:: + + +-----+ + | PGD | + +-----+ + | + | +-----+ + +-->| P4D | + +-----+ + | + | +-----+ + +-->| PUD | + +-----+ + | + | +-----+ + +-->| PMD | + +-----+ + | + | +-----+ + +-->| PTE | + +-----+ + + +Symbols on the different levels of the page table hierarchy have the following +meaning beginning from the bottom: + +- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned earlier. + The *pte* is an array of `PTRS_PER_PTE` elements of the `pteval_t` type, each + mapping a single page of virtual memory to a single page of physical memory. + The architecture defines the size and contents of `pteval_t`. + + A typical example is that the `pteval_t` is a 32- or 64-bit value with the + upper bits being a **pfn** (page frame number), and the lower bits being some + architecture-specific bits such as memory protection. 
+ + The **entry** part of the name is a bit confusing because while in Linux 1.0 + this did refer to a single page table entry in the single top level page + table, it was retrofitted to be an array of mapping elements when two-level + page tables were first introduced, so the *pte* is the lowermost page + *table*, not a page table *entry*. + +- **pmd**, `pmd_t`, `pmdval_t` = **Page Middle Directory**, the hierarchy right + above the *pte*, with `PTRS_PER_PMD` references to the *pte*:s. + +- **pud**, `pud_t`, `pudval_t` = **Page Upper Directory** was introduced after + the other levels to handle 4-level page tables. It is potentially unused, + or *folded* as we will discuss later. + +- **p4d**, `p4d_t`, `p4dval_t` = **Page Level 4 Directory** was introduced to + handle 5-level page tables after the *pud* was introduced. Now it was clear + that we needed to replace *pgd*, *pmd*, *pud* etc with a figure indicating the + directory level and that we cannot go on with ad hoc names any more. This + is only used on systems which actually have 5 levels of page tables, otherwise + it is folded. + +- **pgd**, `pgd_t`, `pgdval_t` = **Page Global Directory** - the Linux kernel + main page table handling the PGD for the kernel memory is still found in + `swapper_pg_dir`, but each userspace process in the system also has its own + memory context and thus its own *pgd*, found in `struct mm_struct` which + in turn is referenced to in each `struct task_struct`. So tasks have memory + context in the form of a `struct mm_struct` and this in turn has a + `struct pgt_t *pgd` pointer to the corresponding page global directory. + +To repeat: each level in the page table hierarchy is a *array of pointers*, so +the **pgd** contains `PTRS_PER_PGD` pointers to the next level below, **p4d** +contains `PTRS_PER_P4D` pointers to **pud** items and so on. 
The number of +pointers on each level is architecture-defined.:: + + PMD + --> +-----+ PTE + | ptr |-------> +-----+ + | ptr |- | ptr |-------> PAGE + | ptr | \ | ptr | + | ptr | \ ... + | ... | \ + | ptr | \ PTE + +-----+ +----> +-----+ + | ptr |-------> PAGE + | ptr | + ... + + +Page Table Folding +================== + +If the architecture does not use all the page table levels, they can be *folded* +which means skipped, and all operations performed on page tables will be +compile-time augmented to just skip a level when accessing the next lower +level. + +Page table handling code that wishes to be architecture-neutral, such as the +virtual memory manager, will need to be written so that it traverses all of the +currently five levels. This style should also be preferred for +architecture-specific code, so as to be robust to future changes. diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst index 50ee0dfc95be..a834fad9de12 100644 --- a/Documentation/mm/split_page_table_lock.rst +++ b/Documentation/mm/split_page_table_lock.rst @@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock. 
 There are helpers to lock/unlock a table and other accessor functions:
 
  - pte_offset_map_lock()
-	maps pte and takes PTE table lock, returns pointer to the taken
-	lock;
+	maps PTE and takes PTE table lock, returns pointer to PTE with
+	pointer to its PTE table lock, or returns NULL if no PTE table;
+ - pte_offset_map_nolock()
+	maps PTE, returns pointer to PTE with pointer to its PTE table
+	lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map()
+	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
+ - pte_unmap()
+	unmaps PTE table;
  - pte_unmap_unlock()
 	unlocks and unmaps PTE table;
  - pte_alloc_map_lock()
-	allocates PTE table if needed and take the lock, returns pointer
-	to taken lock or NULL if allocation failed;
- - pte_lockptr()
-	returns pointer to PTE table lock;
+	allocates PTE table if needed and takes its lock, returns pointer to
+	PTE with pointer to its lock, or returns NULL if allocation failed;
  - pmd_lock()
 	takes PMD table lock, returns pointer to taken lock;
  - pmd_lockptr()