Address Sanitizer Internals

Prerequisite

For this article, you’ll need the following knowledge:

Basic C understanding (Memory, Stack, Heap, Syscall).
(Optional) x86_64 Assembly

Preamble

Xavier Login is a newly hired intern at a big company. During his first days, he was asked to write a small program that validates an input by verifying that a magic byte is set.

Here is Xavier’s first attempt:

magic_checker.c

int is_magic_byte_valid(char *buffer) {
    return buffer[10] == 'A';
}

int main() {
    char buffer[] = "Hello";

    return is_magic_byte_valid(buffer);
}

Xavier went to a good school and never forgets to enable the ASan flag (-fsanitize=address) when compiling C code.

MagicCheckerOverflowDetected

Xavier screams internally when he sees this big red line reading “stack-buffer-overflow”, knowing he made a big mistake.

Xavier looks at his code and finally finds the error: it was obviously the index in the function is_magic_byte_valid, because the team asked him to check index 42, not index 10!

This is the new code he came up with:

magic_checker.c

int is_magic_byte_valid(char *buffer) {
    return buffer[42] == 'A';
}

int main() {
    char buffer[] = "Hello";

    return is_magic_byte_valid(buffer);
}

After fixing it, Xavier compiles it, and executes it again.

MagicCheckerOverflowUndetected

This time, he executes the program and… nothing happens!?

Here is what Xavier drew to represent the situation.

XavierBuffer

And this doesn’t make sense to him. Why is the first error reported, and not the second?

But don’t worry, Xavier! This document will help you learn and understand how ASan works under the hood to report (or miss) invalid memory accesses and other memory mistakes.

It will also help you understand exactly what ASan is telling you when an error occurs.

Introduction To Address Sanitizer

AddressSanitizer (ASan) is a memory error detector for C and C++.

It ships as part of the compiler toolchain and relies on dynamic analysis. Most modern C/C++ compilers include ASan support.

Compilers that support ASan:

GNU GCC (since ver. 4.8)
Clang LLVM (since ver. 3.1)
MSVC (since Visual Studio 2019 ver. 16.4)

ASan is split into 2 modules:

Instrumentation module
- It consists of a compiler pass that adds instructions at specific points in our code.
Run-time library
- The library implements replacements for memory functions (like malloc(3)).
- The library implements functions that report errors clearly to the user.

In this post, we will first look at all the types of errors that ASan can detect.

Then, we will see how ASan works by explaining its core concepts.

Finally, we will see how ASan uses both modules to provide efficient and fast memory error checking.

Error types

ASan can detect several classes of memory errors in C/C++.

Memory Errors

Global buffer underflow/overflow
Stack buffer underflow/overflow
Heap buffer underflow/overflow
Initialization order bugs
Use after return
Use after scope
Use after free
Memory leaks

Core Concepts

Let’s see what methods ASan can use to detect memory errors.

Memory Mapping

ASan first changes the layout of the program’s virtual memory.

The virtual address space used by a program is now divided into 3 parts:

Application memory: The application still uses this memory normally and can store all of its data there.
- It makes up ~7/8 of the virtual memory space.
Shadow memory: This part of the memory is managed by ASan.
- It takes ~1/8 of the virtual memory space.
Protected memory: This part of the memory is used by ASan to detect unwanted accesses to Shadow Memory.

MemoryMapping

Memory Mapping for each architecture is defined here: https://github.com/gcc-mirror/gcc/blob/releases/gcc-12.2.0/libsanitizer/asan/asan_mapping.h
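To make the three parts concrete, here is a minimal sketch that classifies an address according to the default x86_64 Linux layout (shadow offset 0x7fff8000, one shadow byte per 8 application bytes). The boundary values are assumed from the layout comment in asan_mapping.h for that platform; the enum, the constant names and classify_address are illustrative and not part of ASan.

```c
#include <stdint.h>

/* Upper bounds of each region for the default x86_64 Linux mapping
 * (assumption: shadow offset 0x7fff8000, 8-to-1 scale). Other
 * platforms use different boundaries. */
#define LOW_MEM_END      0x00007fff7fffULL
#define LOW_SHADOW_END   0x00008fff6fffULL
#define SHADOW_GAP_END   0x02008fff6fffULL
#define HIGH_SHADOW_END  0x10007fff7fffULL
#define HIGH_MEM_END     0x7fffffffffffULL

typedef enum {
    REGION_LOW_MEM,      /* application memory */
    REGION_LOW_SHADOW,   /* shadow of the low application memory */
    REGION_SHADOW_GAP,   /* protected memory: any access faults */
    REGION_HIGH_SHADOW,  /* shadow of the high application memory */
    REGION_HIGH_MEM,     /* application memory (stack, mmap, ...) */
    REGION_OUTSIDE       /* beyond the user address space */
} region_t;

/* Hypothetical helper: tells which part of the layout an address is in. */
region_t classify_address(uint64_t addr) {
    if (addr <= LOW_MEM_END)     return REGION_LOW_MEM;
    if (addr <= LOW_SHADOW_END)  return REGION_LOW_SHADOW;
    if (addr <= SHADOW_GAP_END)  return REGION_SHADOW_GAP;
    if (addr <= HIGH_SHADOW_END) return REGION_HIGH_SHADOW;
    if (addr <= HIGH_MEM_END)    return REGION_HIGH_MEM;
    return REGION_OUTSIDE;
}
```

With this layout, a typical stack address falls in the high application memory, while the addresses printed on the left of an ASan memory view fall in the high shadow region.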

Shadow Memory

The Shadow Memory is the part of the virtual memory used to store metadata about the data stored in the Application Memory.

Each byte in the Shadow Memory corresponds to exactly 8 bytes in the Application Memory.

ASan accesses the Shadow Memory via a function, MemToShadow, which maps an Application Memory address to its Shadow Memory address.

This function has to be fast, and it lets ASan map an address like this: shadow_address = MemToShadow(application_address);
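As an illustration, the mapping boils down to a shift and an addition. The sketch below assumes the default x86_64 Linux parameters (scale 3, offset 0x7fff8000); the names mem_to_shadow, SHADOW_SCALE and SHADOW_OFFSET are chosen for this example, and other platforms use other constants.

```c
#include <stdint.h>

/* Assumed parameters of the default x86_64 Linux mapping. */
#define SHADOW_SCALE  3               /* 2^3 = 8 application bytes per shadow byte */
#define SHADOW_OFFSET 0x7fff8000ULL   /* start of the shadow region */

/* Maps an application address to the address of its shadow byte. */
uint64_t mem_to_shadow(uint64_t app_addr) {
    return (app_addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}
```

Because of the shift, the 8 application addresses of an aligned 8-byte group all map to the same shadow byte, and the next group maps to the next shadow byte.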

ASan can then read the value of a shadow byte to decide whether an access to the corresponding bytes of Application Memory is valid, and to identify the wrongly accessed byte.

ASan is able to detect memory errors thanks to the bytes contained in the Shadow Memory, in conjunction with a method called Infection.

Infection

The Infection marks some bytes of Application Memory as poisoned (off-limits to the program) and records which bytes have been poisoned in the Shadow Memory.

The Infection takes place during a malloc() call (we will see later how ASan modifies the behavior of malloc()).

void *malloc(size_t size) {
    // Allocate data
    char *ptr = ...; // Pointer to our allocated memory (8-byte aligned).

    // Infection: poison the redzones around [ptr, ptr + size)
    AsanPoisonMemory(ptr, size);

    // Return address just like the normal malloc
    return ptr;
}

ASan arranges for every allocation (even stack variables, as we’ll see later) to go through an allocation routine it controls, similar to this malloc().

This simplifies the infection: because everything our program stores in memory goes through such a call, all of our program’s memory ends up infected.

For example, let’s allocate a simple integer on the heap.

int *my_int = malloc(sizeof(int));

The memory around the my_int pointer will look like:

MyIntMemoryPoisoning

Poisoned

A poisoned byte is a byte of Application Memory that the program must not access. ASan does not modify the byte itself: poisoning means writing a special value into the shadow byte that describes it. This value is then used to identify the type of invalid memory access we are doing.

The goal of the Shadow Memory is only to store information on where the poisoned bytes are in the Application Memory.

We said that 1 byte in the Shadow Memory corresponds to 8 bytes in the Application Memory. A shadow byte of 0 means that all 8 bytes are addressable, a value k between 1 and 7 means that only the first k bytes are addressable, and a negative value means that all 8 bytes are poisoned (the exact value encodes the reason).

This compact encoding works because all memory allocated by our program goes through ASan’s malloc(), which always returns chunks aligned on at least 8 bytes. In an 8-byte group, the addressable bytes therefore always come first and the poisoned bytes, if any, come last.

We will see later in this document the list of all the possible values of a byte in the Shadow Memory.

Conclusion on ASan core concepts

As we can see, ASan changes how the underlying virtual memory is used, because it needs to ensure that we only access the right bytes.

This allows it to check efficiently for errors, at the cost of reserving ~1/8 of the virtual address space for the Shadow Memory.

With what we have seen so far, you can understand why Xavier Login’s first error was caught. But why was the second one missed? ASan should have detected it if it poisons the memory around the variable.

Unfortunately, one limitation of ASan is that it can only poison bytes near an allocation. Memory further away is not poisoned, so ASan can miss an out-of-bounds access that lands too far from the allocation it overflows.
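The following toy model illustrates this limitation on Xavier’s program. It is only a sketch: the frame layout (a 32-byte left redzone, the 6-byte buffer holding "Hello", then a right redzone up to byte 63) is an assumption made for the example, since the real layout depends on the compiler, and the names toy_shadow and toy_read_is_reported are invented here.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy shadow for one hypothetical stack frame: one shadow byte per
 * 8 bytes of frame. Assumed layout:
 *   frame bytes  0..31 : left redzone            (f1 f1 f1 f1)
 *   frame bytes 32..37 : char buffer[6]          (06: first 6 bytes valid)
 *   frame bytes 38..63 : right redzone           (rest of 06, then f3 f3 f3)
 *   frame bytes 64..95 : other valid stack data  (00 00 00 00)
 */
static const uint8_t toy_shadow[] = {
    0xf1, 0xf1, 0xf1, 0xf1,
    0x06,
    0xf3, 0xf3, 0xf3,
    0x00, 0x00, 0x00, 0x00
};

#define BUFFER_FRAME_OFFSET 32

/* Returns 1 if a 1-byte read of buffer[index] would be reported. */
int toy_read_is_reported(size_t index) {
    size_t frame_offset = BUFFER_FRAME_OFFSET + index;
    size_t granule = frame_offset / 8;

    if (granule >= sizeof(toy_shadow))
        return 0;                       /* outside the model: nothing known */

    int8_t shadow = (int8_t)toy_shadow[granule];
    if (shadow == 0)
        return 0;                       /* all 8 bytes are addressable */
    if (shadow < 0)
        return 1;                       /* whole granule is poisoned */
    return (frame_offset % 8) >= (size_t)shadow;  /* partially addressable */
}
```

In this model, buffer[10] lands in the right redzone and is reported, while buffer[42] jumps over the redzone and lands in memory that is valid for some other purpose, so the check sees nothing wrong.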

Now that you know the core concepts of ASan, we can dive into its source code and see how it uses instrumentation and a run-time library to do all the necessary checks and infection.

Instrumentation Module

The goal of the Instrumentation Module is to add run-time checks before every memory instruction.

Initialization

The first instrumentation will be located at the module initialization and will add a call to __asan_init_vN() where N is the desired API version (we will not look at API differences).

This function will be called at module initialization time in order for ASan to be initialized.

Instructions

Then, it needs to instrument certain types of instructions.

In fact, it only modifies 2 types of instructions:

load: When performing a load instruction (read).
store: When performing a store instruction (write).

The code:

// Perform a load instruction on Addr
read_value(Addr);

will be augmented with memory checks, as follows:

// Compute Shadow Address
ShadowAddr = MemToShadow(Addr);

// Get the Shadow Value
ShadowValue = *(char*)ShadowAddr; // *(short*) for 16-byte access

// If the ShadowValue is different from 0, at least one of the 8 bytes is poisoned
if (ShadowValue)
{
    // If the load is for N=(1, 2 or 4) bytes from Addr,
    // we need to check whether the accessed bytes are among the poisoned ones
    // For N=(8 or 16), this check is not needed
    if (IsByteSet(ShadowValue, Addr, N))
    {
        // ASan will report an invalid memory access
        __asan_report_loadN(Addr);
    }
}

// Read the value
read_value(Addr);

File: gcc/asan.cc

It’s exactly the same for a store instruction (ASan just replaces __asan_report_loadN with __asan_report_storeN).
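The inner check can be written in plain C. The sketch below follows the shadow encoding described in the ASan documentation (0: all 8 bytes addressable, 1 to 7: only the first k bytes addressable, negative: the whole group is poisoned). The function name access_is_poisoned is invented for this example, and the code assumes that the access does not cross an 8-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes (1, 2, 4 or 8) at `addr`
 * touches poisoned memory, given the shadow byte of its 8-byte group.
 * Assumption: the access stays within a single 8-byte group. */
int access_is_poisoned(int8_t shadow_value, uintptr_t addr, size_t size) {
    if (shadow_value == 0)
        return 0;                 /* all 8 bytes are addressable */
    if (shadow_value < 0)
        return 1;                 /* the whole group is poisoned */

    /* 1..7: only the first shadow_value bytes are addressable, so the
     * last byte touched by the access must come before that limit. */
    size_t last_byte = (addr & 7) + size - 1;
    return last_byte >= (size_t)shadow_value;
}
```

For example, with a shadow value of 6, a 4-byte read at offset 0 of the group is fine, but the same read at offset 4 touches bytes 6 and 7 and is reported.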

Stack

The module will also instrument variables on the stack.

It does so by emitting a call to __asan_stack_malloc(), which takes care of setting up the Shadow Bytes for the stack variable and the memory around it.

Run-time Library

The Run-time library defines all the functions that we have seen in the instrumentation part.

It defines all the functions ASan needs to manage memory (poisoning bytes and checking memory accesses) and to report errors.

It consists of:

Mapping between Shadow Memory and Application Memory addresses.
Interception of memory related functions.
Handling the Stack and the Fake Stack implementation.
Functions to check whether the memory accessed is valid.
Functions to report errors when the check failed.

Shadow and Application Address Mapping

The most frequently used functions throughout the program are the ones that convert addresses between Application Memory and Shadow Memory.

These functions are defined in asan_mapping.h, where the scale and offset change depending on the architecture.

// Memory to Shadow
#    define MEM_TO_SHADOW(mem) \
      (((mem) >> ASAN_SHADOW_SCALE) + (ASAN_SHADOW_OFFSET))

static inline uptr MemToShadow(uptr p) {
  PROFILE_ASAN_MAPPING();
  // Checking if `p` is in the Application Memory
  CHECK(AddrIsInMem(p));
  return MEM_TO_SHADOW(p);
}

// Shadow to Memory
#    define SHADOW_TO_MEM(mem) \
      (((mem) - (ASAN_SHADOW_OFFSET)) << (ASAN_SHADOW_SCALE))

static inline uptr ShadowToMem(uptr p) {
  PROFILE_ASAN_MAPPING();
  // Checking if `p` is in the Shadow Memory
  CHECK(AddrIsInShadow(p));
  return SHADOW_TO_MEM(p);
}

File: libsanitizer/asan/asan_mapping.h

Both functions first check that the address really is in the expected region, then convert it with one of the 2 macros MEM_TO_SHADOW and SHADOW_TO_MEM.

These functions are inlined to avoid function call overhead, because they are called very often. The faster they are, the lower the performance impact of ASan instrumentation on the target code.

Interceptor

One of the main challenges for ASan is to intercept all the functions that alter memory. For example, the famous malloc/calloc/free functions need to be intercepted in order to poison memory before handing it to the user.

To do this, ASan has a macro called INTERCEPTOR. Its parameters are the return type, the function name and the arguments of the function we want to replace.

Let’s look, for example, at the Interceptor for malloc().

INTERCEPTOR(void*, malloc, uptr size) {
//          -----|-------|----------
//          rtype|fctname|argument

  if (DlsymAlloc::Use())
    return DlsymAlloc::Allocate(size);
  ENSURE_ASAN_INITED();
  GET_STACK_TRACE_MALLOC;

  // We can see here the call to the asan_malloc function which will allocate needed memory and poison it.
  return asan_malloc(size, &stack);
}

File: libsanitizer/asan/asan_malloc_linux.cpp

Now, every time we call malloc(), we will go through the function defined above, and not the libc malloc().

Stack

ASan can work with 2 “modes” for the Stack. It can stay with the basic Stack, or create a Fake Stack.

It can only do one stack mode at runtime, not both.

Normal Stack

For the normal Stack, the code has already been instrumented and it will call the __asan_stack_malloc_N() function which will allocate our local variable, as well as poison the memory around it.

The function __asan_stack_malloc_N() will call a function in the ASan Allocator currently selected, which depends on the type of Stack.

For the Normal Stack, the default Allocator is selected, and only local variables will be poisoned.

Fake Stack

The Fake Stack is used for use-after-return error detection.

ASan uses a special Allocator that creates a Fake Stack made of Fake Frames, which are poisoned once the function has returned.

Poisoning

To poison memory, ASan provides a simple function: void PoisonShadow(uptr addr, uptr size, u8 value).

It sets the shadow bytes that describe the size bytes starting at addr to value.

On Linux, it’s simply a call to memset().
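To show why a memset() is enough, here is a toy version that works on a small local shadow array instead of the real Shadow Memory. It is a sketch under simplifying assumptions: toy_poison_shadow and TOY_GRANULE are names made up for this example, and both offset and size are expected to be multiples of 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_GRANULE 8   /* application bytes described by one shadow byte */

/* Sets the shadow bytes covering application offsets
 * [offset, offset + size) to `value`.
 * Assumption: offset and size are multiples of TOY_GRANULE. */
void toy_poison_shadow(uint8_t *shadow, size_t offset, size_t size,
                       uint8_t value) {
    memset(shadow + offset / TOY_GRANULE, value, size / TOY_GRANULE);
}
```

Poisoning 32 bytes therefore writes only 4 shadow bytes, which is what keeps the operation cheap.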

Error Reporting

In this part, we will analyse the ASan output when an error occurs, and see what we can now understand from each part.

For this, we will take an example of a stack-buffer-overflow. And we will need a master of the art of C programming.

Xavier is once again asked to write a program, this time one that creates an array of integers and returns its last element.

// My beautiful program that returns the last element of an array of integers.

int main() {
    int number[4] = { 0 };

    return number[4];
}

Xavier now executes the program and…

AsanErrorReporting

Ouch!!! That’s a lot of information for Xavier to take in at once.

In order to teach Xavier how this works, we will take a closer look at each part of this output.

In fact, this output is divided into 6 distinct parts described below.

AsanErrorReportingSummary

Error Type

The first part of the output is the error type.

AsanErrorReportingErrorType

It can be broken down into 3 pieces of information.

AsanErrorReportingErrorTypeSummary

Error Type

The error type corresponds to the errors we defined in the introduction. It’s the name of the error we produced.

Here, we can see that we produced a stack-buffer-overflow as expected.

Address

The Address shows us the virtual address we tried to access, which triggered the stack-buffer-overflow.

Registers

Finally, ASan will output the state of 3 registers.

Program Counter: Indicating the current instruction.
Base Pointer: The pointer indicating the start of our stack frame (also, the previous SP).
Stack Pointer: The pointer indicating the current position of the top of our stack.

This can help us to know exactly when and where the program failed.

Operation

The second part is the type of operation which triggered the error.

AsanErrorReportingOperation

It can be broken down into 4 pieces of information.

AsanErrorReportingOperationSummary

Operation

The operation tells us what kind of access caused the memory error.

Here, it’s READ, because we are trying to read memory out of bounds. If we were trying to set a value in an array at an out-of-bounds index, the operation would be WRITE.

Size

It corresponds to the size in bytes of the data we are trying to access. Here, it’s 4 (bytes) because we are trying to access an int, which on x86_64 under the System V ABI occupies 4 bytes in memory.

Address

The address is the same as before: it is exactly where we tried to access invalid memory.

ThreadID

The ThreadID is a unique identifier (ID) of the thread that performed the access. It is mostly useful in a multi-threaded application, where it tells you on which thread the invalid memory access happened.

Backtrace

The third part of the output is a backtrace of your program. This tells you the state of your stack when the invalid memory access was done.

AsanErrorReportingOperation

If you compiled with debug symbols, ASan can tell you exactly on which line of your program the invalid memory access is located. Here, it tells us that the invalid access is on line 6, which is the return number[4];.

Address

The fourth part of the output is the address where the memory error occurred.

AsanErrorReportingAddress

It can be broken down into 4 pieces of information.

AsanErrorReportingAddressSummary

Address

Same as in the last 2 parts: it is exactly where the invalid memory access happened.

Place

This tells us in which pool of memory the invalid access is located. It can be the stack, the heap, or global memory.

ThreadID

This is the ThreadID we have seen before. It’s also here because each thread has its own Stack.

So we need to know the Thread associated with each stack.

Offset

This tells us exactly where the access is located in the stack frame.

Memory View

This is the most visual part: it represents a snapshot of the memory around the invalid access when the error was raised.

AsanErrorReportingMemoryView

The memory ASan is showing us here represents the Shadow Bytes around the invalid memory access.

On the left, we can see the addresses of the memory, and on the right, each pair of hexadecimal digits is one shadow byte, describing 8 bytes of the Application memory.

We can see that we accessed the third number on the line 0x1000_00f9_6e50, and the value stored is f3. But what does it mean? Let’s look at the last section to find the answer.
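To relate such a shadow byte back to the program, we can invert the mapping. The sketch below assumes the default x86_64 Linux parameters (scale 3, offset 0x7fff8000) and that the address printed at the start of a row is the Shadow Memory address of its first byte; shadow_to_mem and the two constants are names chosen for this example.

```c
#include <stdint.h>

/* Assumed parameters of the default x86_64 Linux mapping. */
#define SHADOW_SCALE  3
#define SHADOW_OFFSET 0x7fff8000ULL

/* Returns the first application address of the 8-byte group
 * described by the shadow byte located at `shadow_addr`. */
uint64_t shadow_to_mem(uint64_t shadow_addr) {
    return (shadow_addr - SHADOW_OFFSET) << SHADOW_SCALE;
}
```

Under these assumptions, the third shadow byte of the row sits at 0x100000f96e50 + 2, and shadow_to_mem() maps it to 0x7ffc07cf7290: that shadow byte describes the 8 application bytes from 0x7ffc07cf7290 to 0x7ffc07cf7297, which is a stack address.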

Shadow Byte Legend

And you guessed right, the last part corresponds to the Shadow Bytes Legend. It tells you exactly the meaning of the magic values you see in the memory view.

AsanErrorReportingShadowByteLegend

We can then look up f3, which means: Stack right redzone.

That tells us that the memory whose shadow value is f3 is poisoned memory on the right side of a variable on the stack, which corresponds exactly to a stack-buffer-overflow.

We can also see f1 before our variable in the Shadow Memory. It corresponds to Stack left redzone, which enables ASan to detect stack-buffer-underflow errors.
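The legend can also be expressed as a small lookup function. This sketch only covers a subset of the values, taken from the legend printed by recent ASan versions; the function name shadow_byte_meaning is made up for this example, and the legend printed by your own ASan version remains the reference.

```c
#include <stdint.h>

/* Returns a human-readable meaning for a shadow byte value.
 * Only a subset of the legend is covered here. */
const char *shadow_byte_meaning(uint8_t shadow) {
    if (shadow == 0x00) return "Addressable";
    if (shadow <= 0x07) return "Partially addressable";

    switch (shadow) {
        case 0xfa: return "Heap left redzone";
        case 0xfd: return "Freed heap region";
        case 0xf1: return "Stack left redzone";
        case 0xf2: return "Stack mid redzone";
        case 0xf3: return "Stack right redzone";
        case 0xf5: return "Stack after return";
        case 0xf8: return "Stack use after scope";
        case 0xf9: return "Global redzone";
        default:   return "Other (see the legend printed by ASan)";
    }
}
```

Calling it with 0xf3 gives back "Stack right redzone", the value we found in the memory view.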

Summary

Here is the summary of the parts we have seen.

AsanErrorReportingParts

Performance

In terms of performance, there are 2 aspects to take into account.

The first is the memory footprint. ASan reserves ~1/8 of the virtual address space of your program.

This part is used to store the Shadow Bytes and to prevent access to the Shadow Memory.

The second is execution speed. ASan slows down your code by an average factor of 1.93, which means that your program will run approximately 2 times slower than usual.

This is a good trade-off compared to other memory error detection tools.

For example, Valgrind slows down your code to about 20 times its initial execution time.

ASan adds a run-time overhead (~1.93x) and takes ~1/8 of the address space, so keep this in mind when choosing ASan.

Conclusion

Address Sanitizer is a powerful tool to detect memory misuse.

It doesn’t hurt your performance too much, and it is very easy to enable in your build environment with a compiler/linker flag.

It is a very useful tool that any programmer can understand, and it gives a lot of information to help the programmer find the bug more easily.

I hope that this document will help you and Xavier in your future C projects, by helping you avoid common memory misuse thanks to Address Sanitizer.