Maple Tree
- Author: Liam R. Howlett
Overview
The Maple Tree is a B-Tree data type which is optimized for storing non-overlapping ranges, including ranges of size 1. The tree was designed to be simple to use and does not require a user written search method. It supports iterating over a range of entries and going to the previous or next entry in a cache-efficient manner. The tree can also be put into an RCU-safe mode of operation which allows reading and writing concurrently. Writers must synchronize on a lock, which can be the default spinlock, or the user can set the lock to an external lock of a different type.
The Maple Tree maintains a small memory footprint and was designed to use modern processor cache efficiently. The majority of the users will be able to use the normal API. An Advanced API exists for more complex scenarios. The most important usage of the Maple Tree is the tracking of the virtual memory areas.
The Maple Tree can store values between 0 and ULONG_MAX. The Maple Tree reserves values with the bottom two bits set to '10' which are below 4096 (i.e. 2, 6, 10 .. 4094) for internal use. If the entries may use reserved values, then the users can convert the entries using xa_mk_value() and convert them back by calling xa_to_value(). If the user needs to use a reserved value, then the user can convert the value when using the Advanced API; reserved values are blocked by the normal API.
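The reserved set and the value conversion can be modelled in a few lines of userspace C. This is a sketch, not kernel code: demo_is_reserved(), demo_mk_value() and demo_to_value() are hypothetical names, and the encoding assumes that xa_mk_value() shifts the value left by one bit and sets bit 0, with xa_to_value() shifting it back. Under that assumption a converted value always has bit 0 set, so it can never match the reserved '10' pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v is in the reserved set described above,
 * i.e. its bottom two bits are binary 10 and it is below 4096. */
static bool demo_is_reserved(uintptr_t v)
{
    return (v & 3) == 2 && v < 4096;
}

/* Userspace model of xa_mk_value() (assumed encoding: shift left, set bit 0). */
static void *demo_mk_value(unsigned long v)
{
    assert(v <= (~0UL >> 1)); /* the top bit would be lost by the shift */
    return (void *)(uintptr_t)((v << 1) | 1);
}

/* Userspace model of xa_to_value(): undo the shift. */
static unsigned long demo_to_value(const void *entry)
{
    return (unsigned long)(uintptr_t)entry >> 1;
}
```

For example, the raw value 2 is reserved, while demo_mk_value(2) yields 5, which is not reserved and converts back to 2.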
The Maple Tree can also be configured to support searching for a gap of a given size (or larger).
Pre-allocation of nodes is also supported using the Advanced API. This is useful for users who must guarantee a successful store operation within a given code segment when allocating cannot be done. Allocations of nodes are relatively small at around 256 bytes.
Normal API
Start by initialising a maple tree, either with DEFINE_MTREE() for statically allocated maple trees or mt_init() for dynamically allocated ones. A freshly-initialised maple tree contains a NULL pointer for the range 0 - ULONG_MAX. There are currently two types of maple trees supported: the allocation tree and the regular tree. The regular tree has a higher branching factor for internal nodes. The allocation tree has a lower branching factor but allows the user to search for a gap of a given size or larger from either 0 upwards or ULONG_MAX down. An allocation tree can be used by passing in the MT_FLAGS_ALLOC_RANGE flag when initialising the tree.
You can then set entries using mtree_store() or mtree_store_range(). mtree_store() will overwrite any entry with the new entry and return 0 on success or an error code otherwise. mtree_store_range() works in the same way but takes a range. mtree_load() is used to retrieve the entry stored at a given index. You can use mtree_erase() to erase an entire range by only knowing one value within that range, or you can call mtree_store() with an entry of NULL to partially erase a range or many ranges at once.
If you want to only store a new entry to a range (or index) if that range is currently NULL, you can use mtree_insert_range() or mtree_insert(), which return -EEXIST if the range is not empty.
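The difference between storing, inserting and erasing can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: the demo_* names are hypothetical, the model covers indices 0..15 with one slot per index, and it treats a "range" as a maximal run of identical entries, whereas the real tree tracks range boundaries explicitly and spans 0 to ULONG_MAX.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_SLOTS 16 /* the toy tree covers indices 0..15 only */

struct demo_tree {
    void *slot[DEMO_SLOTS];
};

/* Like a range store: unconditionally overwrite [first, last]. */
static int demo_store_range(struct demo_tree *t, unsigned long first,
                            unsigned long last, void *entry)
{
    unsigned long i;

    if (first > last || last >= DEMO_SLOTS)
        return -EINVAL;
    for (i = first; i <= last; i++)
        t->slot[i] = entry;
    return 0;
}

/* Like a range insert: refuse if any index in [first, last] is occupied. */
static int demo_insert_range(struct demo_tree *t, unsigned long first,
                             unsigned long last, void *entry)
{
    unsigned long i;

    if (first > last || last >= DEMO_SLOTS)
        return -EINVAL;
    for (i = first; i <= last; i++)
        if (t->slot[i])
            return -EEXIST;
    return demo_store_range(t, first, last, entry);
}

static void *demo_load(const struct demo_tree *t, unsigned long index)
{
    return index < DEMO_SLOTS ? t->slot[index] : NULL;
}

/* Like an erase: clear the whole range that contains index. */
static void *demo_erase(struct demo_tree *t, unsigned long index)
{
    void *old = demo_load(t, index);
    unsigned long first = index, last = index;

    if (!old)
        return NULL;
    while (first > 0 && t->slot[first - 1] == old)
        first--;
    while (last + 1 < DEMO_SLOTS && t->slot[last + 1] == old)
        last++;
    demo_store_range(t, first, last, NULL);
    return old;
}
```

In this model, storing NULL over index 3 of a range 2..5 leaves 2 and 4..5 populated (a partial erase), while erasing by any single index clears the whole run that contains it.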
You can search for an entry from an index upwards by using mt_find().
You can walk each entry within a range by calling mt_for_each(). You must provide a temporary variable to store a cursor. If you want to walk each element of the tree then 0 and ULONG_MAX may be used as the range. If the caller is going to hold the lock for the duration of the walk then it is worth looking at the mas_for_each() API in the Advanced API section.
Sometimes it is necessary to ensure the next call to store to a maple tree does not allocate memory; please see Advanced API for this use case.
Finally, you can remove all entries from a maple tree by calling mtree_destroy(). If the maple tree entries are pointers, you may wish to free the entries first.
Allocating Nodes
The allocations are handled by the internal tree code. See Advanced Allocating Nodes for other options.
Locking
You do not have to worry about locking. See Advanced Locking for other options.
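The Maple Tree uses RCU and an internal spinlock to synchronise access:
- Takes RCU read lock: mtree_load(), mt_find(), mt_for_each(), mt_next(), mt_prev()
- Takes ma_lock internally: mtree_store(), mtree_store_range(), mtree_insert(), mtree_insert_range(), mtree_erase(), mtree_destroy(), mt_set_in_rcu(), mt_clear_in_rcu()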
If you want to take advantage of the internal lock to protect the data structures that you are storing in the Maple Tree, you can call mtree_lock() before calling mtree_load(), then take a reference count on the object you have found before calling mtree_unlock(). This will prevent stores from removing the object from the tree between looking up the object and incrementing the refcount. You can also use RCU to avoid dereferencing freed memory, but an explanation of that is beyond the scope of this document.
Advanced API
The advanced API offers more flexibility and better performance at the cost of an interface which can be harder to use and has fewer safeguards. You must take care of your own locking while using the advanced API. You can use the ma_lock, RCU or an external lock for protection. You can mix advanced and normal operations on the same tree, as long as the locking is compatible. The Normal API is implemented in terms of the advanced API.
The advanced API is based around the ma_state; this is where the 'mas' prefix originates. The ma_state struct keeps track of tree operations to make life easier for both internal and external tree users.
Initialising the maple tree is the same as in the Normal API. Please see above.
The maple state keeps track of the range start and end in mas->index and mas->last, respectively.
mas_walk() will walk the tree to the location of mas->index and set the mas->index and mas->last according to the range for the entry.
You can set entries using mas_store(). mas_store() will overwrite any entry with the new entry and return the first existing entry that is overwritten. The range is passed in as members of the maple state: index and last.
You can use mas_erase() to erase an entire range by setting index and last of the maple state to the desired range to erase. This will erase the first range that is found in that range, set the maple state index and last as the range that was erased and return the entry that existed at that location.
You can walk each entry within a range by using mas_for_each(). If you want to walk each element of the tree then 0 and ULONG_MAX may be used as the range. If the lock needs to be periodically dropped, see mas_pause() in the locking section.
Using a maple state allows mas_next() and mas_prev() to function as if the tree was a linked list. With such a high branching factor the amortized performance penalty is outweighed by cache optimization. mas_next() will return the next entry which occurs after the entry at index. mas_prev() will return the previous entry which occurs before the entry at index.
mas_find() will find the first entry which exists at or above index on the first call, and the next entry on every subsequent call.
mas_find_rev() will find the first entry which exists at or below the last on the first call, and the previous entry on every subsequent call.
If the user needs to yield the lock during an operation, then the maple state must be paused using mas_pause().
There are a few extra interfaces provided when using an allocation tree. If you wish to search for a gap within a range, then mas_empty_area() or mas_empty_area_rev() can be used. mas_empty_area() searches for a gap starting at the lowest index given up to the maximum of the range. mas_empty_area_rev() searches for a gap starting at the highest index given and continues downward to the lower bound of the range.
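The two search directions can be sketched with a userspace model that scans a sorted list of occupied ranges. This is an illustration of the search semantics described above, not the kernel's algorithm (which uses per-node gap tracking): struct demo_range, demo_empty_area() and demo_empty_area_rev() are hypothetical names, and the boolean return value and start output are assumptions chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_range {
    unsigned long first, last; /* inclusive, occupied */
};

/* Lowest gap of at least size indices within [min, max].
 * occ must be sorted by index and non-overlapping. */
static bool demo_empty_area(const struct demo_range *occ, size_t n,
                            unsigned long min, unsigned long max,
                            unsigned long size, unsigned long *start)
{
    unsigned long lo = min; /* lowest index not known to be occupied */
    size_t i;

    if (size == 0 || min > max || max - min < size - 1)
        return false;
    for (i = 0; i < n; i++) {
        if (occ[i].last < lo)
            continue; /* entirely below the window */
        if (occ[i].first > max)
            break; /* entirely above the window */
        if (occ[i].first > lo && occ[i].first - lo >= size) {
            *start = lo;
            return true;
        }
        if (occ[i].last >= max)
            return false; /* nothing free above this range */
        lo = occ[i].last + 1;
    }
    if (max - lo >= size - 1) {
        *start = lo;
        return true;
    }
    return false;
}

/* Highest gap of at least size indices within [min, max]. */
static bool demo_empty_area_rev(const struct demo_range *occ, size_t n,
                                unsigned long min, unsigned long max,
                                unsigned long size, unsigned long *start)
{
    unsigned long hi = max; /* highest index not known to be occupied */
    size_t i = n;

    if (size == 0 || min > max || max - min < size - 1)
        return false;
    while (i-- > 0) {
        if (occ[i].first > hi)
            continue; /* entirely above the window */
        if (occ[i].last < min)
            break; /* entirely below the window */
        if (occ[i].last < hi && hi - occ[i].last >= size) {
            *start = hi - size + 1;
            return true;
        }
        if (occ[i].first <= min)
            return false; /* nothing free below this range */
        hi = occ[i].first - 1;
    }
    if (hi - min >= size - 1) {
        *start = hi - size + 1;
        return true;
    }
    return false;
}
```

With occupied ranges 0..9, 20..29 and 35..99 and a window of 0..200, a forward search for 5 free indices in this model yields a gap starting at 10, while the reverse search yields one starting at 196.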
Advanced Allocating Nodes
Allocations are usually handled internally to the tree; however, if allocations need to occur before a write occurs, then calling mas_expected_entries() will allocate the worst-case number of needed nodes to insert the provided number of ranges. This also causes the tree to enter mass insertion mode. Once insertions are complete, calling mas_destroy() on the maple state will free the unused allocations.
Advanced Locking
The maple tree uses a spinlock by default, but external locks can be used for tree updates as well. To use an external lock, the tree must be initialized with the MT_FLAGS_LOCK_EXTERN flag; this is usually done with the MTREE_INIT_EXT() #define, which takes an external lock as an argument.
Functions and structures
Maple tree flags
MT_FLAGS_ALLOC_RANGE - Track gaps in this tree
MT_FLAGS_USE_RCU - Operate in RCU mode
MT_FLAGS_HEIGHT_OFFSET - The position of the tree height in the flags
MT_FLAGS_HEIGHT_MASK - The mask for the maple tree height value
MT_FLAGS_LOCK_MASK - How the mt_lock is used
MT_FLAGS_LOCK_IRQ - Acquired irq-safe
MT_FLAGS_LOCK_BH - Acquired bh-safe
MT_FLAGS_LOCK_EXTERN - mt_lock is not used
MAPLE_HEIGHT_MAX - The largest height that can be stored
-
MTREE_INIT
MTREE_INIT (name, __flags)
Initialize a maple tree
Parameters
name
The maple tree name
__flags
The maple tree flags
-
MTREE_INIT_EXT
MTREE_INIT_EXT (name, __flags, __lock)
Initialize a maple tree with an external lock.
Parameters
name
The tree name
__flags
The maple tree flags
__lock
The external lock
-
bool mtree_empty(const struct maple_tree *mt)
Determine if a tree has any present entries.
Parameters
const struct maple_tree *mt
Maple Tree.
Context
Any context.
Return
true if the tree contains only NULL pointers.
-
void mas_reset(struct ma_state *mas)
Reset a Maple Tree operation state.
Parameters
struct ma_state *mas
Maple Tree operation state.
Description
Resets the error or walk state of the mas so future walks of the tree will start from the root. Use this if you have dropped the lock and want to reuse the ma_state.
Context
Any context.
-
mas_for_each
mas_for_each (__mas, __entry, __max)
Iterate over a range of the maple tree.
Parameters
__mas
Maple Tree operation state (ma_state)
__entry
Entry retrieved from the tree
__max
maximum index to retrieve from the tree
Description
When returned, mas->index and mas->last will hold the entire range for the entry.
Note
may return the zero entry.
-
void __mas_set_range(struct ma_state *mas, unsigned long start, unsigned long last)
Set up Maple Tree operation state to a sub-range of the current location.
Parameters
struct ma_state *mas
Maple Tree operation state.
unsigned long start
New start of range in the Maple Tree.
unsigned long last
New end of range in the Maple Tree.
Description
Set the internal maple state values to a sub-range.
Please use mas_set_range() if you do not know where you are in the tree.
-
void mas_set_range(struct ma_state *mas, unsigned long start, unsigned long last)
Set up Maple Tree operation state for a different range.
Parameters
struct ma_state *mas
Maple Tree operation state.
unsigned long start
New start of range in the Maple Tree.
unsigned long last
New end of range in the Maple Tree.
Description
Move the operation state to refer to a different range. This will
have the effect of starting a walk from the top; see mas_next()
to move to an adjacent index.
-
void mas_set(struct ma_state *mas, unsigned long index)
Set up Maple Tree operation state for a different index.
Parameters
struct ma_state *mas
Maple Tree operation state.
unsigned long index
New index into the Maple Tree.
Description
Move the operation state to refer to a different index. This will
have the effect of starting a walk from the top; see mas_next()
to move to an adjacent index.
-
void mt_init_flags(struct maple_tree *mt, unsigned int flags)
Initialise an empty maple tree with flags.
Parameters
struct maple_tree *mt
Maple Tree
unsigned int flags
maple tree flags.
Description
If you need to initialise a Maple Tree with special flags (eg, an allocation tree), use this function.
Context
Any context.
-
void mt_init(struct maple_tree *mt)
Initialise an empty maple tree.
Parameters
struct maple_tree *mt
Maple Tree
Description
An empty Maple Tree.
Context
Any context.
-
void mt_clear_in_rcu(struct maple_tree *mt)
Switch the tree to non-RCU mode.
Parameters
struct maple_tree *mt
The Maple Tree
-
void mt_set_in_rcu(struct maple_tree *mt)
Switch the tree to RCU safe mode.
Parameters
struct maple_tree *mt
The Maple Tree
-
mt_for_each
mt_for_each (__tree, __entry, __index, __max)
Iterate over each entry starting at index until max.
Parameters
__tree
The Maple Tree
__entry
The current entry
__index
The index to start the search from. Subsequently used as iterator.
__max
The maximum limit for index
Description
This iterator skips all entries that resolve to a NULL pointer, e.g. entries which have been reserved with XA_ZERO_ENTRY.
-
void *mas_insert(struct ma_state *mas, void *entry)
Internal call to insert a value
Parameters
struct ma_state *mas
The maple state
void *entry
The entry to store
Return
NULL, or the contents that already exist at the requested index otherwise. The maple state needs to be checked for error conditions.
-
void *mas_walk(struct ma_state *mas)
Search for mas->index in the tree.
Parameters
struct ma_state *mas
The maple state.
Description
mas->index and mas->last will be set to the range if there is a value. If mas->node is MAS_NONE, reset to MAS_START.
Return
The entry at the location or NULL.
-
void __rcu **mte_dead_walk(struct maple_enode **enode, unsigned char offset)
Walk down a dead tree to just before the leaves
Parameters
struct maple_enode **enode
The maple encoded node
unsigned char offset
The starting offset
Note
This can only be used from the RCU callback context.
-
void mt_free_walk(struct rcu_head *head)
Walk & free a tree in the RCU callback context
Parameters
struct rcu_head *head
The RCU head that's within the node.
Note
This can only be used from the RCU callback context.
-
void *mas_store(struct ma_state *mas, void *entry)
Store an entry.
Parameters
struct ma_state *mas
The maple state.
void *entry
The entry to store.
Description
The mas->index and mas->last are used to set the range for the entry.
Note
The mas should have pre-allocated entries to ensure there is memory to store the entry. Please see mas_expected_entries()/mas_destroy() for more details.
Return
The first entry between mas->index and mas->last or NULL.
-
int mas_store_gfp(struct ma_state *mas, void *entry, gfp_t gfp)
Store a value into the tree.
Parameters
struct ma_state *mas
The maple state
void *entry
The entry to store
gfp_t gfp
The GFP_FLAGS to use for allocations if necessary.
Return
0 on success, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
-
void mas_store_prealloc(struct ma_state *mas, void *entry)
Store a value into the tree using memory preallocated in the maple state.
Parameters
struct ma_state *mas
The maple state
void *entry
The entry to store.
-
int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
Preallocate enough nodes for a store operation
Parameters
struct ma_state *mas
The maple state
void *entry
The entry that will be stored
gfp_t gfp
The GFP_FLAGS to use for allocations.
Return
0 on success, -ENOMEM if memory could not be allocated.
-
void *mas_next(struct ma_state *mas, unsigned long max)
Get the next entry.
Parameters
struct ma_state *mas
The maple state
unsigned long max
The maximum index to check.
Description
Returns the next entry after mas->index. Must hold rcu_read_lock or the write lock. Can return the zero entry.
Return
The next entry or NULL
-
void *mas_next_range(struct ma_state *mas, unsigned long max)
Advance the maple state to the next range
Parameters
struct ma_state *mas
The maple state
unsigned long max
The maximum index to check.
Description
Sets mas->index and mas->last to the range. Must hold rcu_read_lock or the write lock. Can return the zero entry.
Return
The next entry or NULL
-
void *mt_next(struct maple_tree *mt, unsigned long index, unsigned long max)
Get the next value in the maple tree
Parameters
struct maple_tree *mt
The maple tree
unsigned long index
The start index
unsigned long max
The maximum index to check
Description
Takes RCU read lock internally to protect the search, which does not protect the returned pointer after dropping RCU read lock. See also: Maple Tree
Return
The entry higher than index or NULL if nothing is found.
-
void *mas_prev(struct ma_state *mas, unsigned long min)
Get the previous entry
Parameters
struct ma_state *mas
The maple state
unsigned long min
The minimum value to check.
Description
Must hold rcu_read_lock or the write lock. Will reset mas to MAS_START if the node is MAS_NONE. Will stop on not searchable nodes.
Return
The previous value or NULL.
-
void *mas_prev_range(struct ma_state *mas, unsigned long min)
Advance to the previous range
Parameters
struct ma_state *mas
The maple state
unsigned long min
The minimum value to check.
Description
Sets mas->index and mas->last to the range. Must hold rcu_read_lock or the write lock. Will reset mas to MAS_START if the node is MAS_NONE. Will stop on not searchable nodes.
Return
The previous value or NULL.
-
void *mt_prev(struct maple_tree *mt, unsigned long index, unsigned long min)
Get the previous value in the maple tree
Parameters
struct maple_tree *mt
The maple tree
unsigned long index
The start index
unsigned long min
The minimum index to check
Description
Takes RCU read lock internally to protect the search, which does not protect the returned pointer after dropping RCU read lock. See also: Maple Tree
Return
The entry before index or NULL if nothing is found.
-
void mas_pause(struct ma_state *mas)
Pause a mas_find/mas_for_each to drop the lock.
Parameters
struct ma_state *mas
The maple state to pause
Description
Some users need to pause a walk and drop the lock they're holding in
order to yield to a higher priority thread or carry out an operation
on an entry. Those users should call this function before they drop
the lock. It resets the mas to be suitable for the next iteration
of the loop after the user has reacquired the lock. If most entries
found during a walk require you to call mas_pause(), the mt_for_each()
iterator may be more appropriate.
-
bool mas_find_setup(struct ma_state *mas, unsigned long max, void **entry)
Internal function to set up mas_find*().
Parameters
struct ma_state *mas
The maple state
unsigned long max
The maximum index
void **entry
Pointer to the entry
Return
True if entry is the answer, false otherwise.
-
void *mas_find(struct ma_state *mas, unsigned long max)
On the first call, find the entry at or after mas->index up to max. Otherwise, find the entry after mas->index.
Parameters
struct ma_state *mas
The maple state
unsigned long max
The maximum value to check.
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
-
void *mas_find_range(struct ma_state *mas, unsigned long max)
On the first call, find the entry at or after mas->index up to max. Otherwise, advance to the next slot mas->index.
Parameters
struct ma_state *mas
The maple state
unsigned long max
The maximum value to check.
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
-
bool mas_find_rev_setup(struct ma_state *mas, unsigned long min, void **entry)
Internal function to set up mas_find_*_rev()
Parameters
struct ma_state *mas
The maple state
unsigned long min
The minimum index
void **entry
Pointer to the entry
Return
True if entry is the answer, false otherwise.
-
void *mas_find_rev(struct ma_state *mas, unsigned long min)
On the first call, find the first non-null entry at or below mas->index down to min. Otherwise find the first non-null entry below mas->index down to min.
Parameters
struct ma_state *mas
The maple state
unsigned long min
The minimum value to check.
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
-
void *mas_find_range_rev(struct ma_state *mas, unsigned long min)
On the first call, find the first non-null entry at or below mas->index down to min. Otherwise advance to the previous slot after mas->index down to min.
Parameters
struct ma_state *mas
The maple state
unsigned long min
The minimum value to check.
Description
Must hold rcu_read_lock or the write lock. If an entry exists, last and index are updated accordingly. May set mas->node to MAS_NONE.
Return
The entry or NULL.
-
void *mas_erase(struct ma_state *mas)
Find the range in which index resides and erase the entire range.
Parameters
struct ma_state *mas
The maple state
Description
Must hold the write lock. Searches for mas->index, sets mas->index and mas->last to the range and erases that range.
Return
The entry that was erased or NULL; mas->index and mas->last are updated.
-
bool mas_nomem(struct ma_state *mas, gfp_t gfp)
Check if there was an error allocating and do the allocation if necessary. If there are allocations, then free them.
Parameters
struct ma_state *mas
The maple state
gfp_t gfp
The GFP_FLAGS to use for allocations
Return
true on allocation, false otherwise.
-
void *mtree_load(struct maple_tree *mt, unsigned long index)
Load a value stored in a maple tree
Parameters
struct maple_tree *mt
The maple tree
unsigned long index
The index to load
Return
the entry or NULL
-
int mtree_store_range(struct maple_tree *mt, unsigned long index, unsigned long last, void *entry, gfp_t gfp)
Store an entry at a given range.
Parameters
struct maple_tree *mt
The maple tree
unsigned long index
The start of the range
unsigned long last
The end of the range
void *entry
The entry to store
gfp_t gfp
The GFP_FLAGS to use for allocations
Return
0 on success, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
-
int mtree_store(struct maple_tree *mt, unsigned long index, void *entry, gfp_t gfp)
Store an entry at a given index.
Parameters
struct maple_tree *mt
The maple tree
unsigned long index
The index to store the value
void *entry
The entry to store
gfp_t gfp
The GFP_FLAGS to use for allocations
Return
0 on success, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
-
int mtree_insert_range(struct maple_tree *mt, unsigned long first, unsigned long last, void *entry, gfp_t gfp)
Insert an entry at a given range if there is no value.
Parameters
struct maple_tree *mt
The maple tree
unsigned long first
The start of the range
unsigned long last
The end of the range
void *entry
The entry to store
gfp_t gfp
The GFP_FLAGS to use for allocations.
Return
0 on success, -EEXIST if the range is occupied, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
-
int mtree_insert(struct maple_tree *mt, unsigned long index, void *entry, gfp_t gfp)
Insert an entry at a given index if there is no value.
Parameters
struct maple_tree *mt
The maple tree
unsigned long index
The index to store the value
void *entry
The entry to store
gfp_t gfp
The GFP_FLAGS to use for allocations.
Return
0 on success, -EEXIST if the range is occupied, -EINVAL on invalid request, -ENOMEM if memory could not be allocated.
-
void *mtree_erase(struct maple_tree *mt, unsigned long index)
Find an index and erase the entire range.
Parameters
struct maple_tree *mt
The maple tree
unsigned long index
The index to erase
Description
Erasing is the same as a walk to an entry then a store of a NULL to that ENTIRE range. In fact, it is implemented as such using the advanced API.
Return
The entry stored at the index or NULL
-
void __mt_destroy(struct maple_tree *mt)
Walk and free all nodes of a locked maple tree.
Parameters
struct maple_tree *mt
The maple tree
Note
Does not handle locking.
-
void mtree_destroy(struct maple_tree *mt)
Destroy a maple tree
Parameters
struct maple_tree *mt
The maple tree
Description
Frees all resources used by the tree. Handles locking.
-
void *mt_find(struct maple_tree *mt, unsigned long *index, unsigned long max)
Search from the start up until an entry is found.
Parameters
struct maple_tree *mt
The maple tree
unsigned long *index
Pointer which contains the start location of the search
unsigned long max
The maximum value of the search range
Description
Takes RCU read lock internally to protect the search, which does not protect the returned pointer after dropping RCU read lock. See also: Maple Tree
If an entry is found, index is updated to point to the next possible entry, regardless of whether the found entry occupies a single index or a range of indices.
Return
The entry at or after the index or NULL
-
void *mt_find_after(struct maple_tree *mt, unsigned long *index, unsigned long max)
Search from the start up until an entry is found.
Parameters
struct maple_tree *mt
The maple tree
unsigned long *index
Pointer which contains the start location of the search
unsigned long max
The maximum value to check
Description
Same as mt_find()
except that it checks index for 0 before
searching. If index == 0, the search is aborted. This covers a wrap
around of index to 0 in an iterator loop.
Return
The entry at or after the index or NULL
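The reason for the index == 0 check can be seen in a small userspace model of an iterator loop. This is a sketch under stated assumptions, not the kernel implementation: struct demo_entry and the demo_* functions are hypothetical, entries live in a sorted array, and the loop shape (a first find followed by repeated find-after calls) is assumed to mirror how mt_for_each() is built. When an entry ends at ULONG_MAX, advancing the index past it wraps to 0, and without the check the loop would start again from the beginning.

```c
#include <limits.h>
#include <stddef.h>

struct demo_entry {
    unsigned long first, last; /* inclusive range */
    void *entry;
};

/* Model of a find: return the entry at or after *index, up to max, and
 * move *index just past the range that was found.
 * e must be sorted by index and non-overlapping. */
static void *demo_find(const struct demo_entry *e, size_t n,
                       unsigned long *index, unsigned long max)
{
    size_t i;

    if (*index > max)
        return NULL;
    for (i = 0; i < n; i++) {
        if (e[i].last < *index)
            continue; /* range ends before the search start */
        if (e[i].first > max)
            break;
        *index = e[i].last + 1; /* wraps to 0 if last == ULONG_MAX */
        return e[i].entry;
    }
    return NULL;
}

/* Model of a find-after: abort if the index wrapped around to 0. */
static void *demo_find_after(const struct demo_entry *e, size_t n,
                             unsigned long *index, unsigned long max)
{
    if (*index == 0)
        return NULL;
    return demo_find(e, n, index, max);
}

/* Count the entries an iterator loop would visit in [start, max]. */
static size_t demo_count(const struct demo_entry *e, size_t n,
                         unsigned long start, unsigned long max)
{
    unsigned long index = start;
    size_t count = 0;
    void *entry;

    for (entry = demo_find(e, n, &index, max); entry;
         entry = demo_find_after(e, n, &index, max))
        count++;
    return count;
}
```

In this model, a walk over 0 to ULONG_MAX that includes a range ending at ULONG_MAX visits each entry once and then stops, because the wrapped index is caught before the next search.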