tcmalloc is a thread-caching malloc developed by Google.

It is good for handling multi-threading situations.

The main benefit is that there is no need to use locks in case of small object allocations, since each thread has its own pre-allocated thread-local cache.

Objects that are less than 32K are considered small.

Screen Shot 2020-11-30 at 9.59.23 AM

Main Components

  • Thread-local cache

    Each thread has its own pre-allocated cache. This cache is a list of singly linked lists. Each linked list consists of same sized object classes. The size-classes are separated by 8 bytes, 16 bytes, 32 bytes, and up to 256 bytes, totalling 170 classes. The smallest size class is 8 bytes, the largest is 32K.

    When the thread asks for a small object, OS will identify this thread’s cache and retrieve the free object from its cache. If the linked list for a size-class is empty, it is refilled from the central free list.

    Screen Shot 2020-11-30 at 10.11.55 AM

  • Page heap

    Page heap is the “main” memory. This is initialized first when the program starts. It is shared by all threads, which means that accessing page heap requires a lock. When central free list needs more memory, it accesses page heap.

    Memory in page heap is organized into spans. Spans are continguous pages. Its size ranges from 1 to 256 pages.

    For efficient management of spans, tcmalloc uses 2 or 3-level radix tree, which maps page number to span.

    Screen Shot 2020-11-30 at 10.27.23 AM

  • Central free list

    Central free list is shared by all threads. It manages memory in spans, like the page heap. However, each span is segmented into size-classes, so that when the thread caches need more memory, it can be easily taken from the span.

Functions

void * tc_central_init ()

  • Initialize page heap

  • Initialize central free list

  • Initialize radix tree

void * tc_thread_init()

  • Find current thread id
  • Find if there are enough space for building a thread cache
    • How much memory to allocate per thread cache?
  • Initialize thread cache

void * tc_malloc (size_t size)

  1. Check if size is small or large
  2. If it is small, find current thread and perform small object allocation with its thread cache
  3. If it is large, perform large object allocation with page heap
  4. Return the beginning of memory location
pthread_mutex_t lock; 

void * tc_malloc (size_t size) {
  void * memloc; 
  // Check if size is small or large 
  if (size <= MAX_SMALL_SIZE) { 
  	memloc = small_object_allocation(size); 
  } 
  else {
    // initiate lock 
    pthread_mutex_lock(&lock);
    
  	memloc = large_object_allocation(size);
		// free lock 
    pthread_mutex_unlock(&lock); 
  } 
  return memloc; 
}

Object * small_object_allocation (size_t size)

  1. Map size -> size_class
  2. Get current thread id and its thread cache
  3. Get the freelist of the current size_class
  4. If the free list is not empty, return the first object
  5. If the free list is empty,
    1. get_object_from_central_freelist(size_t size_class)
    2. put it into the current free list
    3. Return the first object
  6. If the central free list is empty,
    1. get_pages_from_central_heap (size_t size_class)
    2. divide it into objects of size_class
    3. put it inside the central free list.
    4. get_object_from_central_freelist(size_t size_class)
    5. put it into the current free list
    6. Return the first object

Span * large_object_allocation(size_t size)

  1. Round size -> page_size

  2. From page heap, get the free list (span list) for this page_size

  3. If the free list is not empty

    1. Return the first span
  4. If the free list is empty,

    1. Go to the next page size’s free list until you find a non-empty free list.
    2. If the last free list is empty,
      1. Get memory from the system
    3. If there are left over pages, insert the left over pages into the corresponding free list.

void tc_free(void * ptr)

  1. Compute the page number from radix tree

  2. Check if the size is small or not

  3. If the object is small

    1. Get size_class

    2. Find out current thread id and its thread cache

    3. Insert into the free list of size_class

  4. If the object is large

    1. Check the pages that are used by this object.
    2. Merge with adjacent pages if they are free
    3. Return the merged page into the page heap