diff options
author | Jesper Dangaard Brouer <brouer@redhat.com> | 2019-06-18 15:05:47 +0200 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2019-06-19 11:23:13 -0400 |
commit | 99c07c43c4ea0bc101331401a0fabfc51933c6a3 (patch) | |
tree | a21190f1a54dc072c9e3f6e9a94a5a0970d5f3bf /include/net/page_pool.h | |
parent | 29b006a67634e389ebc1b0c45e72a84d60118d6f (diff) | |
download | lwn-99c07c43c4ea0bc101331401a0fabfc51933c6a3.tar.gz lwn-99c07c43c4ea0bc101331401a0fabfc51933c6a3.zip |
xdp: tracking page_pool resources and safe removal
This patch is needed before we can allow drivers to use page_pool for
DMA-mappings. Today with page_pool and XDP return API, it is possible to
remove the page_pool object (from rhashtable), while there are still
in-flight packet-pages. This is safely handled via RCU and failed lookups in
__xdp_return() fallback to call put_page(), when page_pool object is gone.
In-case page is still DMA mapped, this will result in page note getting
correctly DMA unmapped.
To solve this, the page_pool is extended with tracking in-flight pages. And
XDP disconnect system queries page_pool and waits, via workqueue, for all
in-flight pages to be returned.
To avoid killing performance when tracking in-flight pages, the implement
use two (unsigned) counters, that in placed on different cache-lines, and
can be used to deduct in-flight packets. This is done by mapping the
unsigned "sequence" counters onto signed Two's complement arithmetic
operations. This is e.g. used by kernel's time_after macros, described in
kernel commit 1ba3aab3033b and 5a581b367b5, and also explained in RFC1982.
The trick is these two incrementing counters only need to be read and
compared, when checking if it's safe to free the page_pool structure. Which
will only happen when driver have disconnected RX/alloc side. Thus, on a
non-fast-path.
It is chosen that page_pool tracking is also enabled for the non-DMA
use-case, as this can be used for statistics later.
After this patch, using page_pool requires more strict resource "release",
e.g. via page_pool_release_page() that was introduced in this patchset, and
previous patches implement/fix this more strict requirement.
Drivers no-longer call page_pool_destroy(). Drivers already call
xdp_rxq_info_unreg() which call xdp_rxq_info_unreg_mem_model(), which will
attempt to disconnect the mem id, and if attempt fails schedule the
disconnect for later via delayed workqueue.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include/net/page_pool.h')
-rw-r--r-- | include/net/page_pool.h | 41 |
1 files changed, 31 insertions, 10 deletions
diff --git a/include/net/page_pool.h b/include/net/page_pool.h index 754d980700df..f09b3f1994e6 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -16,14 +16,16 @@ * page_pool_alloc_pages() call. Drivers should likely use * page_pool_dev_alloc_pages() replacing dev_alloc_pages(). * - * If page_pool handles DMA mapping (use page->private), then API user - * is responsible for invoking page_pool_put_page() once. In-case of - * elevated refcnt, the DMA state is released, assuming other users of - * the page will eventually call put_page(). + * API keeps track of in-flight pages, in-order to let API user know + * when it is safe to dealloactor page_pool object. Thus, API users + * must make sure to call page_pool_release_page() when a page is + * "leaving" the page_pool. Or call page_pool_put_page() where + * appropiate. For maintaining correct accounting. * - * If no DMA mapping is done, then it can act as shim-layer that - * fall-through to alloc_page. As no state is kept on the page, the - * regular put_page() call is sufficient. + * API user must only call page_pool_put_page() once on a page, as it + * will either recycle the page, or in case of elevated refcnt, it + * will release the DMA mapping and in-flight state accounting. We + * hope to lift this requirement in the future. */ #ifndef _NET_PAGE_POOL_H #define _NET_PAGE_POOL_H @@ -66,9 +68,10 @@ struct page_pool_params { }; struct page_pool { - struct rcu_head rcu; struct page_pool_params p; + u32 pages_state_hold_cnt; + /* * Data structure for allocation side * @@ -96,6 +99,8 @@ struct page_pool { * TODO: Implement bulk return pages into this structure. */ struct ptr_ring ring; + + atomic_t pages_state_release_cnt; }; struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); @@ -109,8 +114,6 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) struct page_pool *page_pool_create(const struct page_pool_params *params); -void page_pool_destroy(struct page_pool *pool); - void __page_pool_free(struct page_pool *pool); static inline void page_pool_free(struct page_pool *pool) { @@ -143,6 +146,24 @@ static inline void page_pool_recycle_direct(struct page_pool *pool, __page_pool_put_page(pool, page, true); } +/* API user MUST have disconnected alloc-side (not allowed to call + * page_pool_alloc_pages()) before calling this. The free-side can + * still run concurrently, to handle in-flight packet-pages. + * + * A request to shutdown can fail (with false) if there are still + * in-flight packet-pages. + */ +bool __page_pool_request_shutdown(struct page_pool *pool); +static inline bool page_pool_request_shutdown(struct page_pool *pool) +{ + /* When page_pool isn't compiled-in, net/core/xdp.c doesn't + * allow registering MEM_TYPE_PAGE_POOL, but shield linker. + */ +#ifdef CONFIG_PAGE_POOL + return __page_pool_request_shutdown(pool); +#endif +} + /* Disconnects a page (from a page_pool). API users can have a need * to disconnect a page (from a page_pool), to allow it to be used as * a regular page (that will eventually be returned to the normal |