diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-06-28 10:28:11 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-06-28 10:28:11 -0700 |
commit | 6e17c6de3ddf3073741d9c91a796ee696914d8a0 (patch) | |
tree | 2c425707f78642625dbe2c824c7fded2021e3dc7 /Documentation/mm | |
parent | 6aeadf7896bff4ca230702daba8788455e6b866e (diff) | |
parent | acc72d59c7509540c27c49625cb4b5a8db1f1a84 (diff) | |
download | lwn-6e17c6de3ddf3073741d9c91a796ee696914d8a0.tar.gz lwn-6e17c6de3ddf3073741d9c91a796ee696914d8a0.zip |
Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning
- Lorenzo Stoakes has done some maintanance work on the
get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the
maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for
get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization
work for the vmalloc code
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the
provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch
* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
mm/hugetlb: remove hugetlb_set_page_subpool()
mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
hugetlb: revert use of page_cache_next_miss()
Revert "page cache: fix page_cache_next/prev_miss off by one"
mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
mm: memcg: rename and document global_reclaim()
mm: kill [add|del]_page_to_lru_list()
mm: compaction: convert to use a folio in isolate_migratepages_block()
mm: zswap: fix double invalidate with exclusive loads
mm: remove unnecessary pagevec includes
mm: remove references to pagevec
mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
mm: remove struct pagevec
net: convert sunrpc from pagevec to folio_batch
i915: convert i915_gpu_error to use a folio_batch
pagevec: rename fbatch_count()
mm: remove check_move_unevictable_pages()
drm: convert drm_gem_put_pages() to use a folio_batch
i915: convert shmem_sg_free_table() to use a folio_batch
scatterlist: add sg_set_folio()
...
Diffstat (limited to 'Documentation/mm')
-rw-r--r-- | Documentation/mm/damon/design.rst | 337 | ||||
-rw-r--r-- | Documentation/mm/damon/faq.rst | 23 | ||||
-rw-r--r-- | Documentation/mm/damon/maintainer-profile.rst | 4 | ||||
-rw-r--r-- | Documentation/mm/page_migration.rst | 7 | ||||
-rw-r--r-- | Documentation/mm/split_page_table_lock.rst | 17 |
5 files changed, 327 insertions, 61 deletions
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 0cff6fac6b7e..4bfdf1d30c4a 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -4,31 +4,55 @@ Design ====== -Configurable Layers -=================== - -DAMON provides data access monitoring functionality while making the accuracy -and the overhead controllable. The fundamental access monitorings require -primitives that dependent on and optimized for the target address space. On -the other hand, the accuracy and overhead tradeoff mechanism, which is the core -of DAMON, is in the pure logic space. DAMON separates the two parts in -different layers and defines its interface to allow various low level -primitives implementations configurable with the core logic. We call the low -level primitives implementations monitoring operations. - -Due to this separated design and the configurable interface, users can extend -DAMON for any address space by configuring the core logics with appropriate -monitoring operations. If appropriate one is not provided, users can implement -the operations on their own. + +Overall Architecture +==================== + +DAMON subsystem is configured with three layers including + +- Operations Set: Implements fundamental operations for DAMON that depends on + the given monitoring target address-space and available set of + software/hardware primitives, +- Core: Implements core logics including monitoring overhead/accurach control + and access-aware system operations on top of the operations set layer, and +- Modules: Implements kernel modules for various purposes that provides + interfaces for the user space, on top of the core layer. + + +Configurable Operations Set +--------------------------- + +For data access monitoring and additional low level work, DAMON needs a set of +implementations for specific operations that are dependent on and optimized for +the given target address space. On the other hand, the accuracy and overhead +tradeoff mechanism, which is the core logic of DAMON, is in the pure logic +space. DAMON separates the two parts in different layers, namely DAMON +Operations Set and DAMON Core Logics Layers, respectively. It further defines +the interface between the layers to allow various operations sets to be +configured with the core logic. + +Due to this design, users can extend DAMON for any address space by configuring +the core logic to use the appropriate operations set. If any appropriate set +is unavailable, users can implement one on their own. For example, physical memory, virtual memory, swap space, those for specific processes, NUMA nodes, files, and backing memory devices would be supportable. -Also, if some architectures or devices support special optimized access check -primitives, those will be easily configurable. +Also, if some architectures or devices supporting special optimized access +check primitives, those will be easily configurable. -Reference Implementations of Address Space Specific Monitoring Operations -========================================================================= +Programmable Modules +-------------------- + +Core layer of DAMON is implemented as a framework, and exposes its application +programming interface to all kernel space components such as subsystems and +modules. For common use cases of DAMON, DAMON subsystem provides kernel +modules that built on top of the core layer using the API, which can be easily +used by the user space end users. + + +Operations Set Layer +==================== The monitoring operations are defined in two parts: @@ -90,8 +114,12 @@ conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags, as Idle page tracking does. -Address Space Independent Core Mechanisms -========================================= +Core Logics +=========== + + +Monitoring +---------- Below four sections describe each of the DAMON core mechanisms and the five monitoring attributes, ``sampling interval``, ``aggregation interval``, @@ -100,7 +128,7 @@ regions``. Access Frequency Monitoring ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ The output of DAMON says what pages are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting @@ -127,7 +155,7 @@ size of the target workload grows. Region Based Sampling ---------------------- +~~~~~~~~~~~~~~~~~~~~~ To avoid the unbounded increase of the overhead, DAMON groups adjacent pages that assumed to have the same access frequencies into a region. As long as the @@ -144,7 +172,7 @@ assumption is not guaranteed. Adaptive Regions Adjustment ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), @@ -162,8 +190,22 @@ In this way, DAMON provides its best-effort quality and minimal overhead while keeping the bounds users set for their trade-off. +Age Tracking +~~~~~~~~~~~~ + +By analyzing the monitoring results, users can also find how long the current +access pattern of a region has maintained. That could be used for good +understanding of the access pattern. For example, page placement algorithm +utilizing both the frequency and the recency could be implemented using that. +To make such access pattern maintained period analysis easier, DAMON maintains +yet another counter called ``age`` in each region. For each ``aggregation +interval``, DAMON checks if the region's size and access frequency +(``nr_accesses``) has significantly changed. If so, the counter is reset to +zero. Otherwise, the counter is increased. + + Dynamic Target Space Updates Handling -------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The monitoring target address range could dynamically changed. For example, virtual memory could be dynamically mapped and unmapped. Physical memory could @@ -174,3 +216,246 @@ monitoring operations to check dynamic changes including memory mapping changes and applies it to monitoring operations-related data structures such as the abstracted monitoring target memory area only for each of a user-specified time interval (``update interval``). + + +.. _damon_design_damos: + +Operation Schemes +----------------- + +One common purpose of data access monitoring is access-aware system efficiency +optimizations. For example, + + paging out memory regions that are not accessed for more than two minutes + +or + + using THP for memory regions that are larger than 2 MiB and showing a high + access frequency for more than one minute. + +One straightforward approach for such schemes would be profile-guided +optimizations. That is, getting data access monitoring results of the +workloads or the system using DAMON, finding memory regions of special +characteristics by profiling the monitoring results, and making system +operation changes for the regions. The changes could be made by modifying or +providing advice to the software (the application and/or the kernel), or +reconfiguring the hardware. Both offline and online approaches could be +available. + +Among those, providing advice to the kernel at runtime would be flexible and +effective, and therefore widely be used. However, implementing such schemes +could impose unnecessary redundancy and inefficiency. The profiling could be +redundant if the type of interest is common. Exchanging the information +including monitoring results and operation advice between kernel and user +spaces could be inefficient. + +To allow users to reduce such redundancy and inefficiencies by offloading the +works, DAMON provides a feature called Data Access Monitoring-based Operation +Schemes (DAMOS). It lets users specify their desired schemes at a high +level. For such specifications, DAMON starts monitoring, finds regions having +the access pattern of interest, and applies the user-desired operation actions +to the regions as soon as found. + + +.. _damon_design_damos_action: + +Operation Action +~~~~~~~~~~~~~~~~ + +The management action that the users desire to apply to the regions of their +interest. For example, paging out, prioritizing for next reclamation victim +selection, advising ``khugepaged`` to collapse or split, or doing nothing but +collecting statistics of the regions. + +The list of supported actions is defined in DAMOS, but the implementation of +each action is in the DAMON operations set layer because the implementation +normally depends on the monitoring target address space. For example, the code +for paging specific virtual address ranges out would be different from that for +physical address ranges. And the monitoring operations implementation sets are +not mandated to support all actions of the list. Hence, the availability of +specific DAMOS action depends on what operations set is selected to be used +together. + +Applying an action to a region is considered as changing the region's +characteristics. Hence, DAMOS resets the age of regions when an action is +applied to those. + + +.. _damon_design_damos_access_pattern: + +Target Access Pattern +~~~~~~~~~~~~~~~~~~~~~ + +The access pattern of the schemes' interest. The patterns are constructed with +the properties that DAMON's monitoring results provide, specifically the size, +the access frequency, and the age. Users can describe their access pattern of +interest by setting minimum and maximum values of the three properties. If a +region's three properties are in the ranges, DAMOS classifies it as one of the +regions that the scheme is having an interest in. + + +.. _damon_design_damos_quotas: + +Quotas +~~~~~~ + +DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if +the target access pattern is not properly tuned. For example, if a huge memory +region having the access pattern of interest is found, applying the scheme's +action to all pages of the huge region could consume unacceptably large system +resources. Preventing such issues by tuning the access pattern could be +challenging, especially if the access patterns of the workloads are highly +dynamic. + +To mitigate that situation, DAMOS provides an upper-bound overhead control +feature called quotas. It lets users specify an upper limit of time that DAMOS +can use for applying the action, and/or a maximum bytes of memory regions that +the action can be applied within a user-specified time duration. + + +.. _damon_design_damos_quotas_prioritization: + +Prioritization +^^^^^^^^^^^^^^ + +A mechanism for making a good decision under the quotas. When the action +cannot be applied to all regions of interest due to the quotas, DAMOS +prioritizes regions and applies the action to only regions having high enough +priorities so that it will not exceed the quotas. + +The prioritization mechanism should be different for each action. For example, +rarely accessed (colder) memory regions would be prioritized for page-out +scheme action. In contrast, the colder regions would be deprioritized for huge +page collapse scheme action. Hence, the prioritization mechanisms for each +action are implemented in each DAMON operations set, together with the actions. + +Though the implementation is up to the DAMON operations set, it would be common +to calculate the priority using the access pattern properties of the regions. +Some users would want the mechanisms to be personalized for their specific +case. For example, some users would want the mechanism to weigh the recency +(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users +to specify the weight of each access pattern property and passes the +information to the underlying mechanism. Nevertheless, how and even whether +the weight will be respected are up to the underlying prioritization mechanism +implementation. + + +.. _damon_design_damos_watermarks: + +Watermarks +~~~~~~~~~~ + +Conditional DAMOS (de)activation automation. Users might want DAMOS to run +only under certain situations. For example, when a sufficient amount of free +memory is guaranteed, running a scheme for proactive reclamation would only +consume unnecessary system resources. To avoid such consumption, the user would +need to manually monitor some metrics such as free memory ratio, and turn +DAMON/DAMOS on or off. + +DAMOS allows users to offload such works using three watermarks. It allows the +users to configure the metric of their interest, and three watermark values, +namely high, middle, and low. If the value of the metric becomes above the +high watermark or below the low watermark, the scheme is deactivated. If the +metric becomes below the mid watermark but above the low watermark, the scheme +is activated. If all schemes are deactivated by the watermarks, the monitoring +is also deactivated. In this case, the DAMON worker thread only periodically +checks the watermarks and therefore incurs nearly zero overhead. + + +.. _damon_design_damos_filters: + +Filters +~~~~~~~ + +Non-access pattern-based target memory regions filtering. If users run +self-written programs or have good profiling tools, they could know something +more than the kernel, such as future access patterns or some special +requirements for specific types of memory. For example, some users may know +only anonymous pages can impact their program's performance. They can also +have a list of latency-critical processes. + +To let users optimize DAMOS schemes with such special knowledge, DAMOS provides +a feature called DAMOS filters. The feature allows users to set an arbitrary +number of filters for each scheme. Each filter specifies the type of target +memory, and whether it should exclude the memory of the type (filter-out), or +all except the memory of the type (filter-in). + +As of this writing, anonymous page type and memory cgroup type are supported by +the feature. Some filter target types can require additional arguments. For +example, the memory cgroup filter type asks users to specify the file path of +the memory cgroup for the filter. Hence, users can apply specific schemes to +only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages +excluding those of specific cgroups, and any combination of those. + + +Application Programming Interface +--------------------------------- + +The programming interface for kernel space data access-aware applications. +DAMON is a framework, so it does nothing by itself. Instead, it only helps +other kernel components such as subsystems and modules building their data +access-aware applications using DAMON's core features. For this, DAMON exposes +its all features to other kernel components via its application programming +interface, namely ``include/linux/damon.h``. Please refer to the API +:doc:`document </mm/damon/api>` for details of the interface. + + +Modules +======= + +Because the core of DAMON is a framework for kernel components, it doesn't +provide any direct interface for the user space. Such interfaces should be +implemented by each DAMON API user kernel components, instead. DAMON subsystem +itself implements such DAMON API user modules, which are supposed to be used +for general purpose DAMON control and special purpose data access-aware system +operations, and provides stable application binary interfaces (ABI) for the +user space. The user space can build their efficient data access-aware +applications using the interfaces. + + +General Purpose User Interface Modules +-------------------------------------- + +DAMON modules that provide user space ABIs for general purpose DAMON usage in +runtime. + +DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs +interface' are DAMON API user kernel modules that provide ABIs to the +user-space. Please note that DAMON debugfs interface is currently deprecated. + +Like many other ABIs, the modules create files on sysfs and debugfs, allow +users to specify their requests to and get the answers from DAMON by writing to +and reading from the files. As a response to such I/O, DAMON user interface +modules control DAMON and retrieve the results as user requested via the DAMON +API, and return the results to the user-space. + +The ABIs are designed to be used for user space applications development, +rather than human beings' fingers. Human users are recommended to use such +user space tools. One such Python-written user space tool is available at +Github (https://github.com/awslabs/damo), Pypi +(https://pypistats.org/packages/damo), and Fedora +(https://packages.fedoraproject.org/pkgs/python-damo/damo/). + +Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for +details of the interfaces. + + +Special-Purpose Access-aware Kernel Modules +------------------------------------------- + +DAMON modules that provide user space ABI for specific purpose DAMON usage. + +DAMON sysfs/debugfs user interfaces are for full control of all DAMON features +in runtime. For each special-purpose system-wide data access-aware system +operations such as proactive reclamation or LRU lists balancing, the interfaces +could be simplified by removing unnecessary knobs for the specific purpose, and +extended for boot-time and even compile time control. Default values of DAMON +control parameters for the usage would also need to be optimized for the +purpose. + +To support such cases, yet more DAMON API user kernel modules that provide more +simple and optimized user space interfaces are available. Currently, two +modules for proactive reclamation and LRU lists manipulation are provided. For +more detail, please read the usage documents for those +(:doc:`/admin-guide/mm/damon/reclaim` and +:doc:`/admin-guide/mm/damon/lru_sort`). diff --git a/Documentation/mm/damon/faq.rst b/Documentation/mm/damon/faq.rst index dde7e2414ee6..3279dc7a8211 100644 --- a/Documentation/mm/damon/faq.rst +++ b/Documentation/mm/damon/faq.rst @@ -4,29 +4,6 @@ Frequently Asked Questions ========================== -Why a new subsystem, instead of extending perf or other user space tools? -========================================================================= - -First, because it needs to be lightweight as much as possible so that it can be -used online, any unnecessary overhead such as kernel - user space context -switching cost should be avoided. Second, DAMON aims to be used by other -programs including the kernel. Therefore, having a dependency on specific -tools like perf is not desirable. These are the two biggest reasons why DAMON -is implemented in the kernel space. - - -Can 'idle pages tracking' or 'perf mem' substitute DAMON? -========================================================= - -Idle page tracking is a low level primitive for access check of the physical -address space. 'perf mem' is similar, though it can use sampling to minimize -the overhead. On the other hand, DAMON is a higher-level framework for the -monitoring of various address spaces. It is focused on memory management -optimization and provides sophisticated accuracy/overhead handling mechanisms. -Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of -DAMON's output, but cannot substitute DAMON. - - Does DAMON support virtual memory only? ======================================= diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst index 24a202f03de8..a84c14e59053 100644 --- a/Documentation/mm/damon/maintainer-profile.rst +++ b/Documentation/mm/damon/maintainer-profile.rst @@ -3,7 +3,7 @@ DAMON Maintainer Entry Profile ============================== -The DAMON subsystem covers the files that listed in 'DATA ACCESS MONITOR' +The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR' section of 'MAINTAINERS' file. The mailing lists for the subsystem are damon@lists.linux.dev and @@ -15,7 +15,7 @@ SCM Trees There are multiple Linux trees for DAMON development. Patches under development or testing are queued in damon/next [2]_ by the DAMON maintainer. -Suffieicntly reviewed patches will be queued in mm-unstable [1]_ by the memory +Sufficiently reviewed patches will be queued in mm-unstable [1]_ by the memory management subsystem maintainer. After more sufficient tests, the patches will be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the memory management subsystem maintainer. diff --git a/Documentation/mm/page_migration.rst b/Documentation/mm/page_migration.rst index 313dce18893e..e35af7805be5 100644 --- a/Documentation/mm/page_migration.rst +++ b/Documentation/mm/page_migration.rst @@ -73,14 +73,13 @@ In kernel use of migrate_pages() It also prevents the swapper or other scans from encountering the page. -2. We need to have a function of type new_page_t that can be +2. We need to have a function of type new_folio_t that can be passed to migrate_pages(). This function should figure out - how to allocate the correct new page given the old page. + how to allocate the correct new folio given the old folio. 3. The migrate_pages() function is called which attempts to do the migration. It will call the function to allocate - the new page for each page that is considered for - moving. + the new folio for each folio that is considered for moving. How migrate_pages() works ========================= diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst index 50ee0dfc95be..a834fad9de12 100644 --- a/Documentation/mm/split_page_table_lock.rst +++ b/Documentation/mm/split_page_table_lock.rst @@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock. There are helpers to lock/unlock a table and other accessor functions: - pte_offset_map_lock() - maps pte and takes PTE table lock, returns pointer to the taken - lock; + maps PTE and takes PTE table lock, returns pointer to PTE with + pointer to its PTE table lock, or returns NULL if no PTE table; + - pte_offset_map_nolock() + maps PTE, returns pointer to PTE with pointer to its PTE table + lock (not taken), or returns NULL if no PTE table; + - pte_offset_map() + maps PTE, returns pointer to PTE, or returns NULL if no PTE table; + - pte_unmap() + unmaps PTE table; - pte_unmap_unlock() unlocks and unmaps PTE table; - pte_alloc_map_lock() - allocates PTE table if needed and take the lock, returns pointer - to taken lock or NULL if allocation failed; - - pte_lockptr() - returns pointer to PTE table lock; + allocates PTE table if needed and takes its lock, returns pointer to + PTE with pointer to its lock, or returns NULL if allocation failed; - pmd_lock() takes PMD table lock, returns pointer to taken lock; - pmd_lockptr() |