<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/lib/radix-tree.c, branch 765</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=765</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=765'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2008-02-05T17:44:17+00:00</updated>
<entry>
<title>radix-tree: avoid atomic allocations for preloaded insertions</title>
<updated>2008-02-05T17:44:17+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-02-05T06:29:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e2848a0efedef4dad52d1334d37f8719cd6268fd'/>
<id>urn:sha1:e2848a0efedef4dad52d1334d37f8719cd6268fd</id>
<content type='text'>
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags.  But the
preallocation is squandered when it comes time to allocate a node, we
default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we don't
need to be using.

Another upshot of this is that it removes the sometimes highly contended
zone-&gt;lock from underneath tree_lock.  Pagecache insertions are always
performed with a radix tree preload, and after this change, such a
situation will never fall back to kmem_cache_alloc within
radix_tree_node_alloc.

David Miller reports seeing this allocation fail on a highly threaded
sparc64 system:

[527319.459981] dd: page allocation failure. order:0, mode:0x20
[527319.460403] Call Trace:
[527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
[527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
[527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
[527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
[527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
[527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
[527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
[527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
[527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
[527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
[527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
[527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
[527319.461361]  [00000000004bb920] sys_read+0x34/0x60
[527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40

The calltrace is significant: __do_page_cache_readahead allocates a number
of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
goes to mpage_readpages, there can be significant intervals (including disk
IO) before all the pages are inserted into the radix-tree.  So the reserves
can easily be depleted at that point.  The patch is confirmed to fix the
problem.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>avoid negative (and full-width) shifts in radix-tree.c</title>
<updated>2007-10-17T15:42:56+00:00</updated>
<author>
<name>Peter Lund</name>
<email>firefly@vax64.dk</email>
</author>
<published>2007-10-17T06:29:35+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=430d275a399175c7c0673459738979287ec1fd22'/>
<id>urn:sha1:430d275a399175c7c0673459738979287ec1fd22</id>
<content type='text'>
Negative shifts are not allowed in C (the result is undefined).  Same thing
with full-width shifts.

It works on most platforms but not on the VAX with gcc 4.0.1 (it results in an
"operand reserved" fault).

Shifting by more than the width of the value on the left is also not
allowed.  I think the extra '&gt;&gt; 1' tacked on at the end in the original
code was an attempt to work around that.  Getting rid of that is an extra
feature of this patch.

Here's the chapter and verse, taken from the final draft of the C99
standard ("6.5.7 Bitwise shift operators", paragraph 3):

  "The integer promotions are performed on each of the operands. The
  type of the result is that of the promoted left operand. If the
  value of the right operand is negative or is greater than or equal
  to the width of the promoted left operand, the behavior is
  undefined."

Thank you to Jan-Benedict Glaw, Christoph Hellwig, Maciej Rozycki, Pekka
Enberg, Andreas Schwab, and Christoph Lameter for review.  Special thanks
to Andreas for spotting that my fix only removed half the undefined
behaviour.

Signed-off-by: Peter Lund &lt;firefly@vax64.dk&gt;
Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: "Maciej W. Rozycki" &lt;macro@linux-mips.org&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Andreas Schwab &lt;schwab@suse.de&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: WU Fengguang &lt;wfg@mail.ustc.edu.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Slab API: remove useless ctor parameter and reorder parameters</title>
<updated>2007-10-17T15:42:45+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2007-10-17T06:25:51+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4ba9b9d0ba0a49d91fa6417c7510ee36f48cf957'/>
<id>urn:sha1:4ba9b9d0ba0a49d91fa6417c7510ee36f48cf957</id>
<content type='text'>
Slab constructors currently have a flags parameter that is never used.  And
the order of the arguments is opposite to other slab functions.  The object
pointer is placed before the kmem_cache pointer.

Convert

        ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

        ctor(struct kmem_cache *s, void *object)

throughout the kernel

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Group short-lived and reclaimable kernel allocations</title>
<updated>2007-10-16T16:43:00+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mel@csn.ul.ie</email>
</author>
<published>2007-10-16T08:25:52+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e12ba74d8ff3e2f73a583500d7095e406df4d093'/>
<id>urn:sha1:e12ba74d8ff3e2f73a583500d7095e406df4d093</id>
<content type='text'>
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations.  When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fix the max path calculation in radix-tree.c</title>
<updated>2007-10-16T16:42:54+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2007-10-16T08:24:49+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=26fb1589cb0aaec3a0b4418c54f30c1a2b1781f6'/>
<id>urn:sha1:26fb1589cb0aaec3a0b4418c54f30c1a2b1781f6</id>
<content type='text'>
A while back, Nick Piggin introduced a patch to reduce the node memory
usage for small files (commit cfd9b7df4abd3257c9e381b0e445817b26a51c0c):

-#define RADIX_TREE_MAP_SHIFT	6
+#define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)

Unfortunately, he didn't take into account the fact that the
calculation of the maximum path was based on an assumption of having
to round up:

#define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)

So, if CONFIG_BASE_SMALL is set, you will end up with a
RADIX_TREE_MAX_PATH that is one greater than necessary.  The practical
upshot of this is just a bit of wasted memory (one long in the
height_to_maxindex array, an extra pre-allocated radix tree node per
cpu, and extra stack usage in a couple of functions), but it seems
worth getting right.

It's also worth noting that I never build with CONFIG_BASE_SMALL.
What I did to test this was duplicate the code in a small user-space
program and check the results of the calculations for max path and the
contents of the height_to_maxindex array.

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix-tree: use indirect bit</title>
<updated>2007-10-16T16:42:53+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2007-10-16T08:24:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c0bc9875b701c588e448302d41181995c21e8040'/>
<id>urn:sha1:c0bc9875b701c588e448302d41181995c21e8040</id>
<content type='text'>
Rather than sign direct radix-tree pointers with a special bit, sign the
indirect one that hangs off the root.  This means that, given a lookup_slot
operation, the invalid result will be differentiated from the valid
(previously, valid results could have the bit either set or clear).

This does not affect slot lookups which occur under lock -- they can never
return an invalid result.  Is needed in future for lockless pagecache.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radixtree: introduce radix_tree_next_hole()</title>
<updated>2007-10-16T16:42:52+00:00</updated>
<author>
<name>Fengguang Wu</name>
<email>wfg@mail.ustc.edu.cn</email>
</author>
<published>2007-10-16T08:24:33+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6df8ba4f8a4c4abca9ccad10441d0dddbdff301c'/>
<id>urn:sha1:6df8ba4f8a4c4abca9ccad10441d0dddbdff301c</id>
<content type='text'>
Introduce radix_tree_next_hole(root, index, max_scan) to scan radix tree for
the first hole.  It will be used in interleaved readahead.

The implementation is dumb and obviously correct.  It can help debug(and
document) the possible smart one in future.

Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Fengguang Wu &lt;wfg@mail.ustc.edu.cn&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: Remove slab destructors from kmem_cache_create().</title>
<updated>2007-07-20T01:11:58+00:00</updated>
<author>
<name>Paul Mundt</name>
<email>lethal@linux-sh.org</email>
</author>
<published>2007-07-20T01:11:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=20c2df83d25c6a95affe6157a4c9cac4cf5ffaac'/>
<id>urn:sha1:20c2df83d25c6a95affe6157a4c9cac4cf5ffaac</id>
<content type='text'>
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt &lt;lethal@linux-sh.org&gt;
</content>
</entry>
<entry>
<title>[LIB]: export radix_tree_preload()</title>
<updated>2007-07-14T06:05:04+00:00</updated>
<author>
<name>David Chinner</name>
<email>dgc@sgi.com</email>
</author>
<published>2007-07-14T06:05:04+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=d7f0923d83dcabfc063257a281529cdbcd5eedb5'/>
<id>urn:sha1:d7f0923d83dcabfc063257a281529cdbcd5eedb5</id>
<content type='text'>
XFS filestreams functionality uses radix trees and the preload
functions. XFS can be built as a module and hence we need
radix_tree_preload() exported. radix_tree_preload_end() is a
static inline, so it doesn't need exporting.

Signed-Off-By: Dave Chinner &lt;dgc@sgi.com&gt;
Signed-Off-By: Tim Shimmin &lt;tes@sgi.com&gt;
</content>
</entry>
<entry>
<title>Add suspend-related notifications for CPU hotplug</title>
<updated>2007-05-09T19:30:56+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rjw@sisk.pl</email>
</author>
<published>2007-05-09T09:35:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d'/>
<id>urn:sha1:8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d</id>
<content type='text'>
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Cc: Gautham R Shenoy &lt;ego@in.ibm.com&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
