How Rust solves memory management without a garbage collector

Every computer program needs memory and a way to manage it. Traditional memory management paradigms are either error-prone or poorly performant. Rust’s memory management system is unique because it provides memory safety and predictably high performance without the use of a garbage collector.

The Stack and The Heap

First, we need peek underneath the programming language constructs and understand what it means to manage computer memory. A computer’s physical memory is called RAM and it’s a place which stores code and data for a running program, called a process. Each process has an associated memory known as its virtual address space, which is divided into different segments:

Virtual address space Virtual address space

Code: holds CPU instructions. An Instruction Pointer (IP) points to the next instruction to be executed. Does not shrink or grow once the process starts running

Data: holds global data such as variable and arrays. Does not shrink or grow once the process starts running

Heap: the dynamic part of data segment that grows or shrinks as the process runs. A proccess allocates itself a chunk of memory from the OS using a low-level library call such as malloc. This chunk must be explicitly freed by letting the OS know you’re done with it. Otherwise, it will remain allocated until the process crashes; this is known as a memory leak

Stack: stores local variables and function arguments. Similar to heap, the stack expands and contracts dynamically. However, the stack memory is managed automatically because of its last-in first out (LIFO) structure. When a function is called, a new stack frame is pushed onto the stack (automatic allocation). And when the function returns, the stack frame is destroyed (automatic deallocation)

So, stack memory allocation and deallocation is fast and handled automatically by the underlying system architecure. However, heap memory must be used when the lifetime of an object extends beyond one local function. Therefore, when we talk about memory management in programming, we’re talking about managing heap memory.

Pick your poison

Every programming language has some sort of mechanism to manage memory. Most programming languages use either manual memory management or garbage collection to manage computer memory. Choosing which form of memory management mechanism to use from these two is generally a trade-off between two characteristics of a running program: memory safety and performance.

Managing memory yourself

Old-school programming languages like C put the responsibility of managing dynamic memory, i.e. heap memory, on the programmer. A block of heap memory can be allocated and freed by making calls to low-level library functions malloc and free. It is the responsibility of programmer to make sure every heap-allocated memory block that is no longer needed is freed explicitly. Otherwise, the process will keep on eating up unnecessary memory; this is known as a memory leak.

Here’s how to leak around 400MiB of memory in C++:

#include <cstdio>
#include <cstdlib>

void leak(int n) {
    // Allocates memory that is never freed
    int *a = (int*) malloc(n * sizeof(int));
    return;
}

int main() {
    // Program will keep around useless 400MiB of 
    // memory allocated until it is terminated
    for (int i = 0; i < 100000; ++i) {
        leak(1000);
    }

    int input;
    scanf("%d", &input);
}

This is just one example of many different classes of memory-related bugs that arise in manual memory management. To see some of the most commonly-occuring memory errors along with their descriptions and code examples, check out alzheimers. Manual memory management, as in C/C++, is easy to get wrong, and the consequences of memory-related bugs can be devastating. That is why every major programming language outside of C, C++ and Rust use garbage collection.

Mark and sweep

Chances are that the programming language you use every day is a garbage-collected language. Java, Python, JavaScript and Go among others are all garbage-collected languages. But what is garbage collection?

Garbage collection (GC) is a runtime mechanism that automatically frees memory that is no longer being used by the program. It works by periodically running a separate program called the garbage collector in the background. Garbage collector pauses the execution of main program and runs an algorithm that finds and frees unused memory. Mark and sweep is one example of such algorithm. At a high level, it iterates all the references and marks the memory they are utilizing, then it sweeps or frees the memory that was not marked. Garbage collection is a complex topic that requires several blog posts to explain, so we’ll leave it here.

The interesting thing to note, however, is that the garbage collection itself is an overhead. It pauses the execution of main thread in order to sweep garbage memory, which results in unpredictable performance degradations in applications. GC performance drawbacks are more visible in real-time, performance-critical apps that can no longer perform their real-time tasks when the garbage collector is scheduled to run. In fact, Discord switched from Go to Rust for one of their performance-critical service exactly due to this reason.

Rust and RAII

Enter Rust. In a world where you must pick either memory safety or high performance, Rust offers you both. Rust has an automatic memory management mechanism to discard memory-related bugs at compile-time, so there’s no need for a garbage collector.

Ownership is a novel concept introduced in Rust that makes heap memory usage feel like it’s stack memory, i.e. automatic memory allocation and deallocation. Ownership is a set of rules that the Rust compiler enforces in order to make automatic memory management possible. The most important ownership rule is that every valid heap-allocated value in Rust has exactly one owner variable.

In Rust, we can use Box construct to allocate data on the heap. In the following snippet, Box::new creates an array of integers on the heap and stores the pointer to it in the variable a. We say that a is the owner of the heap-allocated array of integers. In the second line, the ownership is moved from a to b. Note that the heap data is not copied. Since there can be only one owner at a time, the same array of integers on the heap that was owned by a is now owned by variable b. Variable a can no longer be used after the ownership move as it is deemed invalid by Rust. Uncommenting the third line will produce a compile-time error:

let a = Box::new([0; 1000]);
let b = a;
// println!("{}", a[0]);

Since there is only one owner to any heap-allocated data, Rust can make the assumption that if the owner is no longer valid, the heap-allocated value can be dropped. Therefore, whenever the owner variable goes out of scope, the underlying heap memory is deallocated automatically. Binding the allocation lifetime of a resource, such as memory, to the lifetime of an object in this way is known as Resource Acquisition Is Initialization (RAII). RAII means that memory leaks are highly unlikely since memory is dropped automatically when the owner goes out of scope.

Following is a Rust program that has the same source code as the C++ example program we looked earlier. However, the behavior is different. Rust compiler won’t allow programmers to leak memory easily:

fn make_and_drop() {
    // Allocates heap-memory whose owner is variable '_a'
    let _a = Box::new([0; 1000]);
    // Memory is deallocated after function ends because
    // the owner variable's lifetime ends
}

fn main() {
    // No memory leak
    for _ in 0..100000 {
        make_and_drop();
    }

    let mut input = String::new();
    std::io::stdin().read_line(&mut input).unwrap();    
}

Here’s the summary from valgrind confirming that all heap blocks were freed. Hence, no memory leak:

➜ valgrind target/debug/noleak
==28617== HEAP SUMMARY:
==28617==     in use at exit: 0 bytes in 0 blocks
==28617==   total heap usage: 100,010 allocs, 100,010 frees, 400,002,157 bytes allocated
==28617== 
==28617== All heap blocks were freed -- no leaks are possible
==28617== 
==28617== For lists of detected and suppressed errors, rerun with: -s
==28617== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

References

https://github.com/binjamil/alzheimers/

https://blog.regehr.org/archives/1393

https://discord.com/blog/why-discord-is-switching-from-go-to-rust

Further Reading

https://stanford-cs242.github.io/f18/lectures/05-1-rust-memory-safety.html

https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html