IntrusiveHashMap

IntrusiveHashMap provides a “hashed” or “unordered” set, using intrusive links. It provides a container for elements, each of which has a key. A hash function is applied to a key to generate a hash id which is used to group the elements in to buckets for fast lookup. This container is a mix of std::unordered_set and std::unordered_map. There is no separation between elements and keys, but each element can contain non-key data.

Iteration over elements is provided and is constant time.

In order to optimize lookup, the container can increase the number of buckets used. This is called “expansion” and when it occurs is control by the “expansion policy” of the container. The policy can be to be automatic or only done explicitly.

Usage

To use an IntrusiveHashMap the element must provide support for the container. This is done through an associated descriptor class which provides the operations needed to manipulate the elements in the container.

Details

template<typename H>
class IntrusiveHashMap
Template Parameters:

H – Element operations.

An unordered map using a hash function. The properties of the map are determined by types and operations provided by the descriptor type H. The following types are derived from H and defined in the container type.

type value_type

The type of elements in the container, deduced from the return types of the link accessor methods in H.

type key_type

The type of the key used for hash computations. Deduced from the return type of the key accessor. An instance of this type is never default constructed nor modified, therefore it can be a reference if the key type is expensive to copy.

type hash_id

The type of the hash of a key_type. Deduced from the return type of the hash function. This must be a numeric type.

H

This describes the hash map, primarily via the operations required for the map. The related types are deduced from the function return types. This is designed to be compatible with IntrusiveDList.

static key_type key_of(value_type *v)

Key accessor - return the key of the element v.

static hash_id hash_of(key_type key)

Hash function - compute the hash value of the key.

static bool equal(key_type lhs, key_type rhs)

Key comparison - two keys are equal if this function returns true.

static IntrusiveHashMap::value_type *&next_ptr(IntrusiveHashMap::value_type *v)

Return a reference to the next element pointer embedded in the element v.

static IntrusiveHashMap::value_type *&prev_ptr(IntrusiveHashMap::value_type *v)

Return a reference to the previous element pointer embedded in the element v.

type iterator

An STL compliant iterator over elements in the container.

type range

An STL compliant half open range of elements, represented by a pair of iterators.

IntrusiveHashMap &insert(value_type *v)

Insert the element v into the container. If there are already elements with the same key, v is inserted after these elements.

There is no emplace because v is put in the container, not a copy of v. For the same reason v must be constructed before calling this method, the container will never create an element instance.

iterator begin()

Return an iterator to the first element in the container.

iterator end()

Return an iterator to past the last element in the container.

iterator find(value_type *v)

Search for v in the container. If found, return an iterator referring to v. If not return the end iterator. This validates v is in the container.

range equal_range(key_type key)

Find all elements with a key that is equal to key. The returned value is a half open range starting with the first matching element to one past the last matching element. All element in the range will have a key that is equal to key. If no element has a matching key the range will be empty.

IntrusiveHashMap &erase(iterator spot)

Remove the element referred to by spot from the container.

iterator iterator_for(value_type *v)

Return an iterator for v. This is very fast, faster than IntrusiveHashMap::find() but less safe because no validation done on v. If it not in the container (either in no container or a different one) further iteration on the returned iterator will go badly. It is useful inside range for loops when it is guaranteed the element is in the container.

template<typename F>
IntrusiveHashMap &apply(F &&f)
Template Parameters:

F – A functional type with the signature void (value_type*).

This applies the function f to every element in the container in such a way that modification of the element does not interfere with the iteration. The most common use is to delete the elements during cleanup. The common idiom

for ( auto & elt : container) delete &elt;

is problematic because the iteration links are in the deleted element causing the computation of the next element to be a use after free. Using IntrusiveHashMap::apply() enables safe cleanup.

container.apply([](value_type & v) { delete & v; });

Because the links are intrusive it is possible for other classes or the element class to modify them. In such cases this method provides a safe way to invoke such mechanisms.

Design Notes

This is a refresh of an previously existing class, TSHahTable. The switch to C++ 11 and then C++ 17 made it possible to do much better in terms of the internal implementation and API. The overall functionality is the roughly the same but with an easier API, compatibility with IntrusiveDList, improved compliance with STL container standards, and better internal implementation.

The biggest change is that elements are stored in a single global list rather than per hash bucket. The buckets serve only as entry points in to the global list and to count the number of elements per bucket. This simplifies the implementation of iteration, so that the old Location nested class can be removed. Elements with equal keys can be handled in the same way as with STL containers, via iterator ranges, instead of a custom pseudo-iterator class.

Notes on IntrusiveHashMap::apply()

This was added after some experience with use of the container. Initially it was added to make cleaning up the container easier. Without it, cleanup looks like

for ( auto spot = map.begin(), limit = map.end() ; spot != limit ; delete &( * spot++)) {
   ; // empty
}

Instead one can do

map.apply([](value_type& v) { delete &v; });

The post increment operator guarantees that spot has been updated before the current element is destroyed. However, it turns out to be handy in other map modifying operations. In the unit tests there is this code

  // Erase all the non-"dup" and see if the range is still correct.
  map.apply([&map](Thing *thing) {
    if (thing->_payload != "dup"sv) {

This removes all elements that do not have the payload “dup”. As another design note, IntrusiveHashMap::iterator_for() here serves to bypass validation checking on the target for IntrusiveHashMap::erase(), which is proper because IntrusiveHashMap::apply() guarantees thing is in the map.

Without apply this is needed

auto idx = map.begin();
while (idx != map.end()) {
  auto x{idx++};
  if ("dup"sv != x->_payload) {
    map.erase(x);
  }
}

The latter is more verbose and more importantly less obvious, depending on a subtle interaction with post increment.