Header Heap

Memory for HTTP header data is kept in header heaps.

Classes

class HdrHeapObjImpl

This is the abstract base class for objects allocated in a HdrHeap. This allows updating objects in a heap in a generic way, without having to locate all of the pointers to the objects.

The type of an instance stored in a heap must be one of the following values.

enumerator HDR_HEAP_OBJ_EMPTY = 0

Used to mark invalid objects, ones not yet constructed or ones that have been destroyed.

enumerator HDR_HEAP_OBJ_RAW = 1

Some sort of raw object, I have no idea.

enumerator HDR_HEAP_OBJ_URL = 2

A URL object.

enumerator HDR_HEAP_OBJ_HTTP_HEADER = 3

The header for an HTTP request or response.

enumerator HDR_HEAP_OBJ_MIME_HEADER = 4

A MIME header, containing MIME style fields with names and values.

enumerator HDR_HEAP_OBJ_FIELD_BLOCK = 5

A block of MIME fields, stored as a MIMEFieldBlockImpl.
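The following minimal C++ sketch shows how a heap can be walked generically. It assumes that every object begins with a small header holding the type tag, the object's total length in bytes, and flags. The bit widths are an assumption, not taken from this text. With that header, the heap can be traversed object by object without any external list of pointers. ObjHeader and count_objects are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Type tags mirroring the enumerators listed above.
enum HdrHeapObjType : uint32_t {
  HDR_HEAP_OBJ_EMPTY       = 0,
  HDR_HEAP_OBJ_RAW         = 1,
  HDR_HEAP_OBJ_URL         = 2,
  HDR_HEAP_OBJ_HTTP_HEADER = 3,
  HDR_HEAP_OBJ_MIME_HEADER = 4,
  HDR_HEAP_OBJ_FIELD_BLOCK = 5,
};

// Assumed layout of the common object header: type tag, total object
// length in bytes (header included), and flags. Illustration only.
struct ObjHeader {
  uint32_t m_type : 8;
  uint32_t m_length : 20;
  uint32_t m_obj_flags : 4;
};

// Walk a contiguous run of objects, using only each header's length to find
// the next one, and count the objects with the requested type. No external
// list of object pointers is needed.
inline int count_objects(const char *start, const char *end, uint32_t type) {
  int count = 0;
  const char *p = start;
  while (static_cast<size_t>(end - p) >= sizeof(ObjHeader)) {
    ObjHeader h;
    std::memcpy(&h, p, sizeof(h));
    // An unused slot or a length running past the end stops the walk.
    if (h.m_length == 0 || h.m_length > static_cast<size_t>(end - p)) {
      break;
    }
    if (h.m_type == type) {
      ++count;
    }
    p += h.m_length;
  }
  return count;
}
```

A real heap would dispatch on m_type to a per-type routine at each step (for example, to update string pointers), rather than just counting.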

class HdrStrHeap

This is a variable sized class, therefore new instances must be created by new_HdrStrHeap() and deallocated by the destroy method.

HdrStrHeap *new_HdrStrHeap(int n)

Create and return a new instance of HdrStrHeap. If n is less than HDR_STR_HEAP_DEFAULT_SIZE it is increased to that value.

If the allocated size is HDR_STR_HEAP_DEFAULT_SIZE (or smaller and upsized to that value) then the instance is allocated from a thread local pool via strHeapAllocator. If larger it is allocated from global memory via ats_malloc.
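The sizing rule can be sketched as follows. kAssumedStrHeapDefaultSize is a placeholder value that stands in for HDR_STR_HEAP_DEFAULT_SIZE; its actual value is not given here. plan_str_heap_alloc is a hypothetical helper. It only decides the size and the allocation source and does not allocate anything.

```cpp
#include <cassert>

// Placeholder for HDR_STR_HEAP_DEFAULT_SIZE; the real value is defined in
// the ATS sources and may differ.
constexpr int kAssumedStrHeapDefaultSize = 2048;

enum class AllocSource { ThreadLocalPool, GlobalMalloc };

struct AllocPlan {
  int size;
  AllocSource source;
};

// Requests at or below the default size are rounded up to it and served
// from the per-thread pool; larger requests go to the global allocator.
inline AllocPlan plan_str_heap_alloc(int n) {
  if (n <= kAssumedStrHeapDefaultSize) {
    return {kAssumedStrHeapDefaultSize, AllocSource::ThreadLocalPool};
  }
  return {n, AllocSource::GlobalMalloc};
}
```

new_HdrHeap(), described below, follows the same rule with HdrHeap::DEFAULT_SIZE and hdrHeapAllocator.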

class HdrHeap

This is a variable sized class and therefore new instances must be created by new_HdrHeap() and deallocated by the destroy method.

HdrHeap manages memory for heap objects directly and memory for strings via ancillary heaps (which are instances of HdrStrHeap). For the string heaps there is at most one writeable heap, and up to HDR_BUF_RONLY_HEAPS read only heaps.

All objects in the internal heap must be subclasses of HdrHeapObjImpl.

size_t required_space_for_evacuation()

Calculate and return the total live string space for this.

void evacuate_from_str_heaps(HdrStrHeap *new_heap)

Copy all live strings from the heap objects in this to new_heap.

void coalesce_str_heaps(int incoming_size)

This garbage collects the string heaps in a half space style, by creating a new string space (string heap), copying all of the strings there, and then discarding the existing string heaps.

The total amount of live string space is calculated by HdrHeap::required_space_for_evacuation() and a new string heap is created of a size at least as large as the live string space plus incoming_size bytes.

All of the live strings are moved to the new string heap by HdrHeap::evacuate_from_str_heaps(), the existing string heaps are deallocated, and the new string heap becomes the writeable string heap for the header heap. The end result is a single writeable string heap and no read only string heaps, with all live strings resident in that writeable string heap.
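The following toy model illustrates half-space coalescence. It is not the ATS code. ToyStrHeap, ToyStrRef and ToyHdrHeap are hypothetical. Heap objects are reduced to the string references they hold. Summing their lengths plays the role of required_space_for_evacuation(), and copying each reference's bytes plays the role of evacuate_from_str_heaps().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy string heap: a fixed-capacity byte buffer with a bump pointer.
struct ToyStrHeap {
  explicit ToyStrHeap(size_t cap) : buf(cap), used(0) {}
  char *allocate(size_t n) {
    if (used + n > buf.size()) {
      return nullptr;
    }
    char *p = buf.data() + used;
    used += n;
    return p;
  }
  std::vector<char> buf;
  size_t used;
};

// A string reference held by some heap object: pointer plus length.
struct ToyStrRef {
  const char *ptr;
  size_t len;
};

struct ToyHdrHeap {
  std::vector<std::unique_ptr<ToyStrHeap>> str_heaps; // all current string heaps
  std::vector<ToyStrRef *> live;                      // strings still referenced

  // Total bytes of live strings; dead space is not counted.
  size_t live_string_space() const {
    size_t total = 0;
    for (const ToyStrRef *r : live) {
      total += r->len;
    }
    return total;
  }

  // Copy live strings into one new heap, repoint each reference, then
  // discard every old heap.
  void coalesce(size_t incoming_size) {
    auto fresh = std::make_unique<ToyStrHeap>(live_string_space() + incoming_size);
    for (ToyStrRef *r : live) {
      char *dst = fresh->allocate(r->len);
      std::memcpy(dst, r->ptr, r->len);
      r->ptr = dst;
    }
    str_heaps.clear();
    str_heaps.push_back(std::move(fresh));
  }
};
```

Only references registered in live are rewritten. Any other pointer into the old heaps is left dangling, which is the pointer invalidation hazard this design implies.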

char *allocate_str(int bytes)

Allocate space for a string of size bytes in the writeable string heap. A pointer to the first byte is returned, or nullptr if the space could not be allocated.

HdrHeapObjImpl *allocate_obj(int nbytes, int type)

Allocate an object of type type that is nbytes in size in the heap and return a pointer to it, or nullptr if the object could not be allocated.

nbytes must be at most HDR_MAX_ALLOC_SIZE.

The members of HdrHeapObjImpl are initialized. Further initialization is the responsibility of the caller.

type must be one of the values specified in HdrHeapObjImpl.

int marshal_length()

Compute and return the size of the buffer needed to serialize this.

int marshal(char *buffer, int length)

Serialize this to buffer of size length. It is required that length be at least the value returned by HdrHeap::marshal_length().

HdrHeap *new_HdrHeap(int n)

Create and return a new instance of HdrHeap. If n is less than HdrHeap::DEFAULT_SIZE it is increased to that value.

If the allocated size is HdrHeap::DEFAULT_SIZE (or smaller and upsized to that value) then the instance is allocated from a thread local pool via hdrHeapAllocator. If larger it is allocated from global memory via ats_malloc.

Figure: Header Heap Class Structure

Implementation

String Coalescence

String heaps do not maintain lists of internal free space. Strings that are released are left in place, creating dead space in the heap. For this reason it can become necessary to garbage collect the writeable string heap in the header heap by calling HdrHeap::coalesce_str_heaps(). This is done when:

  • The amount of dead space in the writable string heap exceeds MAX_LOST_STR_SPACE.

  • An external string heap is being added and all current read only string heap slots are used.

The mechanism is simple in design: the size of the live string data in the current string heaps is calculated, and a new heap is allocated that is large enough to contain all existing strings, with additional space for new string data. Each heap object is required to provide a strings_length method which returns the size of the live string data for that object (recursively as needed). The strings are copied to the new string heap, all of the previous string heaps are discarded, and the new heap becomes the writeable string heap for the header heap.

Each heap object is responsible for providing a move_strings method which copies its strings to a new string heap, passed as an argument. Because strings move, this is a source of pointer invalidation for other parts of the core and for the plugin API. For the plugin API, the TSMLoc type exists to insulate plugins from such string movement.

String Allocation

Storage for a string is allocated by HdrHeap::allocate_str(). If the current amount of dead space is too large, this is treated as an initial allocation failure. If there is no current writeable string heap, one is created that is at least as large as both the space requested and the previous writeable string heap. Space for the string is then allocated out of the writeable string heap. If this fails due to lack of space, the current writeable string heap is “demoted” to a read only string heap and the allocation is retried (which causes a new writeable string heap to be created). If the writeable string heap cannot be demoted due to lack of read only slots, the string heaps are coalesced with an additional size request equal to the requested string size. This results in a single writeable string heap and no read only heaps, with the writeable heap containing all of the existing strings plus sufficient space to allocate the new string.
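The decision steps can be modeled using byte counts only. This is a sketch under stated assumptions, not the ATS implementation. The constants are assumed values standing in for HDR_BUF_RONLY_HEAPS and MAX_LOST_STR_SPACE. New heaps are sized as the larger of the request and the previous heap. Because no string bytes are stored, coalescence is reduced to resizing. All type and member names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Assumed limits for illustration only; the real constants may differ.
constexpr size_t kAssumedRonlySlots = 3;      // stands in for HDR_BUF_RONLY_HEAPS
constexpr size_t kAssumedMaxLostSpace = 1024; // stands in for MAX_LOST_STR_SPACE
constexpr size_t kAssumedFirstHeapSize = 256;

// Only sizes are tracked; no string bytes are stored.
struct SketchStrHeap {
  explicit SketchStrHeap(size_t cap) : capacity(cap) {}
  bool try_allocate(size_t n) {
    if (used + n > capacity) {
      return false;
    }
    used += n;
    return true;
  }
  size_t capacity;
  size_t used = 0;
};

struct SketchHdrHeap {
  std::unique_ptr<SketchStrHeap> writeable;
  std::vector<std::unique_ptr<SketchStrHeap>> ronly;
  size_t live_bytes = 0; // bytes still referenced by heap objects
  size_t lost_bytes = 0; // dead space left by released strings
  size_t last_size = kAssumedFirstHeapSize;
  int coalesce_count = 0;

  // One writeable heap holding the live bytes plus `incoming` free bytes.
  void coalesce(size_t incoming) {
    writeable = std::make_unique<SketchStrHeap>(live_bytes + incoming);
    writeable->used = live_bytes;
    ronly.clear();
    lost_bytes = 0;
    last_size = writeable->capacity;
    ++coalesce_count;
  }

  // Reserve `n` bytes following the decision steps in the text.
  void allocate_str(size_t n) {
    // Too much dead space is handled like a failed first attempt.
    bool failed = lost_bytes > kAssumedMaxLostSpace;
    for (;;) {
      if (failed) {
        if (writeable && ronly.size() < kAssumedRonlySlots) {
          ronly.push_back(std::move(writeable)); // demote; replaced below
        } else {
          coalesce(n); // no free slot (or nothing to demote)
        }
      }
      if (!writeable) {
        writeable = std::make_unique<SketchStrHeap>(std::max(n, last_size));
        last_size = writeable->capacity;
      }
      if (writeable->try_allocate(n)) {
        live_bytes += n;
        return;
      }
      failed = true;
    }
  }
};
```

The loop always terminates: a freshly created heap is at least n bytes, and a coalesced heap reserves exactly n free bytes.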

Figure: Decision Diagram

Object Allocation

Objects are allocated on the header heap by HdrHeap::allocate_obj(). Such objects must be one of a compile time determined set of types [1]. This method first tries to allocate the object in existing free space. If that doesn’t work, the allocator walks a list of HdrHeap instances looking for space. If no space is found anywhere, a new HdrHeap instance is created with twice the space of the last HdrHeap in the list and added to the list to try.

Once space is found for the object, the base members of HdrHeapObjImpl are initialized with the object type and size, and m_obj_flags is set to 0.
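The following minimal sketch shows the chained allocation. It assumes a hypothetical object header with type, length and flag bit-fields, and 8-byte object alignment. Both are illustrative assumptions. SketchObjHeap and SketchObjHeapChain are hypothetical names. Each heap is a bump-pointer buffer. When no heap in the chain has room, a new heap twice the size of the last one is appended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <vector>

// Assumed object header layout, for illustration only.
struct SketchObjHeader {
  uint32_t m_type : 8;
  uint32_t m_length : 20;
  uint32_t m_obj_flags : 4;
};

// One object heap in the chain: a byte buffer with a bump pointer.
struct SketchObjHeap {
  explicit SketchObjHeap(size_t cap) : storage(cap) {}
  std::vector<unsigned char> storage;
  size_t used = 0;
  std::unique_ptr<SketchObjHeap> next;
};

// Keep every object on an 8-byte boundary (an assumption of this sketch).
inline size_t round_up8(size_t n) { return (n + 7) & ~size_t{7}; }

struct SketchObjHeapChain {
  explicit SketchObjHeapChain(size_t first_cap)
      : head(std::make_unique<SketchObjHeap>(first_cap)) {}

  SketchObjHeader *allocate_obj(size_t nbytes, uint32_t type) {
    const size_t need = round_up8(nbytes);
    // Walk the chain looking for a heap with enough free space.
    SketchObjHeap *h = head.get();
    SketchObjHeap *last = h;
    for (; h != nullptr; last = h, h = h->next.get()) {
      if (h->used + need <= h->storage.size()) {
        break;
      }
    }
    if (h == nullptr) {
      // Nothing fits: append a heap twice the size of the last one. The
      // while loop is a guard for this toy; the text bounds nbytes instead.
      size_t cap = last->storage.size() * 2;
      while (cap < need) {
        cap *= 2;
      }
      last->next = std::make_unique<SketchObjHeap>(cap);
      h = last->next.get();
    }
    void *p = h->storage.data() + h->used;
    h->used += need;
    // Initialize only the base members; the caller does the rest.
    auto *obj = new (p) SketchObjHeader{};
    obj->m_type = type;
    obj->m_length = static_cast<uint32_t>(need);
    obj->m_obj_flags = 0;
    return obj;
  }

  std::unique_ptr<SketchObjHeap> head;
};
```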

Serialization

Because heaps store the HTTP request / response data, a header heap needs to be serialized to be put into the cache. For performance reasons, it is desirable to be able to unserialize the serialized data in place, rather than copying it again. That is, the data is read from disk into a block of memory and then that memory is converted to a live data structure. In this case the memory used by the heap is owned by some other object and the header heap must not do any clean up. This is signaled by the m_writeable flag. In an unserialized header heap this is set to false and such a header heap is not allowed to allocate any additional objects or strings - it is immutable.

The primary mechanism to do this is to use swizzling on the pointers in the structure. During serialization pointers are converted to offsets and during unserialization these offsets are converted back to pointers. To make this simpler, unserialized header heaps are marked read only so that updating does not have to be supported. Additionally, HdrHeap is a POD and therefore has no virtual function table pointer to be stored or restored [2].

To serialize, first HdrHeap::marshal_length() is called to get a buffer size. The serialization buffer is created with sufficient space for the header heap and that space is passed to HdrHeap::marshal() to perform the actual serialization. The object heaps are serialized followed by the string heaps. No coalescence is done, on the presumption that the amount of dead space is already bounded because coalescence is done (as needed) on every string creation.
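The following toy heap illustrates the two-step protocol: first compute the size, then fill a caller-provided buffer. The object region and each string heap are plain byte strings. The sketch deliberately omits the heap's own header and pointer swizzling. ToyMarshalHeap is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Toy model: one object region and several string heaps, each a byte blob.
struct ToyMarshalHeap {
  std::string objects;
  std::vector<std::string> str_heaps;

  // Analogue of marshal_length(): bytes needed for the serialized image.
  int marshal_length() const {
    size_t n = objects.size();
    for (const auto &s : str_heaps) {
      n += s.size();
    }
    return static_cast<int>(n);
  }

  // Analogue of marshal(): object region first, then the string heaps as
  // they are (no coalescence). Returns bytes written, or -1 if too small.
  int marshal(char *buffer, int length) const {
    if (length < marshal_length()) {
      return -1;
    }
    char *p = buffer;
    std::memcpy(p, objects.data(), objects.size());
    p += objects.size();
    for (const auto &s : str_heaps) {
      std::memcpy(p, s.data(), s.size());
      p += s.size();
    }
    return static_cast<int>(p - buffer);
  }
};
```

Dead space inside the string heaps is copied along with live strings, which is why the size bound on dead space matters here.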

When serializing strings, each object is responsible for swizzling its own pointers. Because the object heaps have already been serialized and all of the header heap object types are also PODs, these serialized objects can have the pointer swizzling method, marshal, called directly on them. This method is provided with a set of “translations” which indicate the base offset for each range of object and string heap memory. The object marshalling can then compute the correct offset to store for each live string pointer.
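A small table of ranges can illustrate the translation idea. Translation, swizzle and unswizzle are hypothetical names. The sketch assumes that each live memory range was copied contiguously into the buffer at a known offset. A pointer's stored offset is then that base plus the pointer's distance from the start of its range. std::less is used for the range test because it guarantees a total order even for pointers into unrelated arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// One translation: a live memory range and the offset at which its first
// byte was placed in the serialized buffer.
struct Translation {
  const char *start;
  const char *end; // one past the last byte of the range
  intptr_t offset; // position of `start` within the serialized buffer
};

// Swizzle: turn a live pointer into a buffer offset. Returns -1 if the
// pointer lies in none of the translated ranges.
inline intptr_t swizzle(const char *p, const std::vector<Translation> &table) {
  std::less<const char *> before; // total order across unrelated arrays
  for (const Translation &t : table) {
    if (!before(p, t.start) && before(p, t.end)) {
      return t.offset + (p - t.start);
    }
  }
  return -1;
}

// Unswizzle: after the buffer is read back into memory, turn a stored
// offset into a pointer relative to the buffer's base address.
inline const char *unswizzle(intptr_t offset, const char *buffer_base) {
  return buffer_base + offset;
}
```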

Inheriting Strings

The string heaps are designed to be reference counted so that they can be shared as read only objects between heaps. This makes copying heap objects between heaps less expensive, as the string pointers in them can be preserved in the new heap by sharing the string heaps in which those strings reside.

This can still be complex because the combined number of string heaps may exceed the limit. In this case, the target header heap does string coalescence so that it is reduced to having a single writeable string heap with enough free space to hold all of the strings in the source header heap. As a result, all heap objects must already be present in the target header heap before the strings are inherited. This ensures that string coalescence copies the strings of the copied heap objects and updates their string pointers.
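The choice between sharing and coalescing can be sketched with std::shared_ptr standing in for the string heap reference count. ToyTargetHeap and kAssumedRonlySlots are hypothetical. The fallback concatenates string bytes directly rather than walking heap objects. It therefore shows only the shape of the decision, not how pointers are updated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Assumed read only slot limit, for illustration only.
constexpr size_t kAssumedRonlySlots = 3;

// A shared, immutable string heap; shared_ptr provides the reference count.
using SharedStrHeap = std::shared_ptr<const std::string>;

struct ToyTargetHeap {
  std::string writeable;             // the single writeable string heap
  std::vector<SharedStrHeap> ronly;  // shared read only string heaps
  int coalesced = 0;

  // Share the source's string heaps if the read only slots allow it;
  // otherwise coalesce everything into the writeable heap.
  void inherit_strings(const std::vector<SharedStrHeap> &src) {
    if (ronly.size() + src.size() <= kAssumedRonlySlots) {
      ronly.insert(ronly.end(), src.begin(), src.end());
      return;
    }
    std::string merged = writeable;
    for (const auto &h : ronly) {
      merged += *h;
    }
    for (const auto &h : src) {
      merged += *h;
    }
    ronly.clear(); // drops this heap's references to the shared heaps
    writeable = std::move(merged);
    ++coalesced;
  }
};
```

In the first case no bytes are copied, and only reference counts change. This is the saving that reference-counted string heaps are meant to provide.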

Footnotes

[1] Not that I can see any good reason for that, if virtual methods instead of switch statements were used.

[2] Which makes the initialization logic to “fixup” the virtual function pointer rather silly.