tcmalloc | Kyung Yun Lee

tcmalloc is a thread-caching malloc developed by Google.

It is good for handling multi-threading situations.

The main benefit is that there is no need to use locks in case of small object allocations, since each thread has its own pre-allocated thread-local cache.

Objects that are less than 32K are considered small.

Screen Shot 2020-11-30 at 9.59.23 AM

Main Components

Thread-local cache

Each thread has its own pre-allocated cache. This cache is a list of singly linked lists. Each linked list consists of same sized object classes. The size-classes are separated by 8 bytes, 16 bytes, 32 bytes, and up to 256 bytes, totalling 170 classes. The smallest size class is 8 bytes, the largest is 32K.

When the thread asks for a small object, OS will identify this thread’s cache and retrieve the free object from its cache. If the linked list for a size-class is empty, it is refilled from the central free list.
Page heap

Page heap is the “main” memory. This is initialized first when the program starts. It is shared by all threads, which means that accessing page heap requires a lock. When central free list needs more memory, it accesses page heap.

Memory in page heap is organized into spans. Spans are continguous pages. Its size ranges from 1 to 256 pages.

For efficient management of spans, tcmalloc uses 2 or 3-level radix tree, which maps page number to span.
Central free list

Central free list is shared by all threads. It manages memory in spans, like the page heap. However, each span is segmented into size-classes, so that when the thread caches need more memory, it can be easily taken from the span.

Functions

void * tc_central_init ()

Initialize page heap
Initialize central free list
Initialize radix tree

void * tc_thread_init()

Find current thread id
Find if there are enough space for building a thread cache
- How much memory to allocate per thread cache?
Initialize thread cache

void * tc_malloc (size_t size)

Check if size is small or large
If it is small, find current thread and perform small object allocation with its thread cache
If it is large, perform large object allocation with page heap
Return the beginning of memory location

pthread_mutex_t lock; 

void * tc_malloc (size_t size) {
  void * memloc; 
  // Check if size is small or large 
  if (size <= MAX_SMALL_SIZE) { 
  	memloc = small_object_allocation(size); 
  } 
  else {
    // initiate lock 
    pthread_mutex_lock(&lock);
    
  	memloc = large_object_allocation(size);
		// free lock 
    pthread_mutex_unlock(&lock); 
  } 
  return memloc; 
}

Object * small_object_allocation (size_t size)

Map size -> size_class
Get current thread id and its thread cache
Get the freelist of the current size_class
If the free list is not empty, return the first object
If the free list is empty,
1. get_object_from_central_freelist(size_t size_class)
2. put it into the current free list
3. Return the first object
If the central free list is empty,
1. get_pages_from_central_heap (size_t size_class)
2. divide it into objects of size_class
3. put it inside the central free list.
4. get_object_from_central_freelist(size_t size_class)
5. put it into the current free list
6. Return the first object

Span * large_object_allocation(size_t size)

Round size -> page_size
From page heap, get the free list (span list) for this page_size
If the free list is not empty
1. Return the first span
If the free list is empty,
1. Go to the next page size’s free list until you find a non-empty free list.
2. If the last free list is empty,
  1. Get memory from the system
3. If there are left over pages, insert the left over pages into the corresponding free list.

void tc_free(void * ptr)

Compute the page number from radix tree
Check if the size is small or not
If the object is small
1. Get size_class
2. Find out current thread id and its thread cache
3. Insert into the free list of size_class
If the object is large
1. Check the pages that are used by this object.
2. Merge with adjacent pages if they are free
3. Return the merged page into the page heap