tcmalloc
tcmalloc is a thread-caching malloc developed by Google.
It is good for handling multi-threading situations.
The main benefit is that there is no need to use locks in case of small object allocations, since each thread has its own pre-allocated thread-local cache.
Objects that are less than 32K are considered small.
Main Components
-
Thread-local cache
Each thread has its own pre-allocated cache. This cache is a list of singly linked lists. Each linked list consists of same sized object classes. The size-classes are separated by 8 bytes, 16 bytes, 32 bytes, and up to 256 bytes, totalling 170 classes. The smallest size class is 8 bytes, the largest is 32K.
When the thread asks for a small object, OS will identify this thread’s cache and retrieve the free object from its cache. If the linked list for a size-class is empty, it is refilled from the central free list.
-
Page heap
Page heap is the “main” memory. This is initialized first when the program starts. It is shared by all threads, which means that accessing page heap requires a lock. When central free list needs more memory, it accesses page heap.
Memory in page heap is organized into spans. Spans are continguous pages. Its size ranges from 1 to 256 pages.
For efficient management of spans, tcmalloc uses 2 or 3-level radix tree, which maps page number to span.
-
Central free list
Central free list is shared by all threads. It manages memory in spans, like the page heap. However, each span is segmented into size-classes, so that when the thread caches need more memory, it can be easily taken from the span.
Functions
void * tc_central_init ()
-
Initialize page heap
-
Initialize central free list
-
Initialize radix tree
void * tc_thread_init()
- Find current thread id
- Find if there are enough space for building a thread cache
- How much memory to allocate per thread cache?
- Initialize thread cache
void * tc_malloc (size_t size)
- Check if size is small or large
- If it is small, find current thread and perform small object allocation with its thread cache
- If it is large, perform large object allocation with page heap
- Return the beginning of memory location
pthread_mutex_t lock;
void * tc_malloc (size_t size) {
void * memloc;
// Check if size is small or large
if (size <= MAX_SMALL_SIZE) {
memloc = small_object_allocation(size);
}
else {
// initiate lock
pthread_mutex_lock(&lock);
memloc = large_object_allocation(size);
// free lock
pthread_mutex_unlock(&lock);
}
return memloc;
}
Object * small_object_allocation (size_t size)
- Map size -> size_class
- Get current thread id and its thread cache
- Get the freelist of the current size_class
- If the free list is not empty, return the first object
- If the free list is empty,
get_object_from_central_freelist(size_t size_class)
- put it into the current free list
- Return the first object
- If the central free list is empty,
get_pages_from_central_heap (size_t size_class)
- divide it into objects of size_class
- put it inside the central free list.
get_object_from_central_freelist(size_t size_class)
- put it into the current free list
- Return the first object
Span * large_object_allocation(size_t size)
-
Round size -> page_size
-
From page heap, get the free list (span list) for this page_size
-
If the free list is not empty
- Return the first span
-
If the free list is empty,
- Go to the next page size’s free list until you find a non-empty free list.
- If the last free list is empty,
- Get memory from the system
- If there are left over pages, insert the left over pages into the corresponding free list.
void tc_free(void * ptr)
-
Compute the page number from radix tree
-
Check if the size is small or not
-
If the object is small
-
Get size_class
-
Find out current thread id and its thread cache
-
Insert into the free list of size_class
-
-
If the object is large
- Check the pages that are used by this object.
- Merge with adjacent pages if they are free
- Return the merged page into the page heap