Prerequisites
For this article, you’ll need the following knowledge:
- Basic C understanding (Memory, Stack, Heap, Syscall).
- (Optional) x86_64 Assembly
Preamble
Xavier Login is a fresh employee who just landed an internship at a big company. In his first days, he was asked to write a small program that validates an input by checking that a magic byte is set.
This was Xavier’s first attempt:
magic_checker.c
Xavier went to a good school and never forgets to enable the ASan flags when compiling C code.
Xavier screams internally when he sees the big red line reading “stack-buffer-overflow”: he knows he made a big mistake.
Xavier looks at his code and finally finds the error: it was obviously the index in the function is_magic_byte_valid, because the team asked him to check the 42nd index, not the 10th one!
This is the new code he came up with:
magic_checker.c
After fixing it, Xavier compiles the program and runs it again.
This time… nothing happens!?
Here is what Xavier drew to represent the situation.
This makes no sense to him. Why was the first error reported, but not the second?
But don’t worry, Xavier! This document will help you understand how ASan works under the hood to report (or miss) invalid memory accesses and other memory mistakes.
It will also help you understand exactly what ASan is telling you when an error occurs.
Introduction To Address Sanitizer
AddressSanitizer (ASan) is a memory error detector for C and C++.
It is part of the compiler toolchain and performs dynamic (run-time) analysis. Most modern C/C++ compilers include ASan support.
Compilers that support ASan:
- GNU GCC (since ver. 4.8)
- Clang LLVM (since ver. 3.1)
- MSVC (since ver. 16.4)
ASan is divided into 2 modules:
- Instrumentation module
  - It consists of a compiler pass that adds instructions to our code at specific points.
- Run-time library
  - The library implements replacements for memory functions (like malloc(3)).
  - The library implements functions to report errors nicely to the user.
In this post, we will first look at the types of errors that ASan can detect.
Then, we will see how ASan works by explaining its core concepts.
Finally, we will see how ASan uses both modules to provide fast and efficient memory error checking.
Error types
ASan can detect several classes of memory errors in C/C++.
Memory Errors
- Global buffer underflow/overflow
- Stack buffer underflow/overflow
- Heap buffer underflow/overflow
- Initialization order bugs
- Use after return
- Use after scope
- Use after free
- Memory leaks
Core Concepts
Let’s see what methods ASan can use to detect memory errors.
Memory Mapping
First, ASan changes the layout of the program’s virtual memory.
The virtual address space used by a program is now divided into 3 parts:
- Application memory: the application uses this memory normally and stores all of its data here.
  - It takes ~7/8 of the virtual memory space.
- Shadow memory: this part of memory is managed by ASan.
  - It takes ~1/8 of the virtual memory space.
- Protected memory: this part of memory is made inaccessible by ASan, so that unwanted accesses to the Shadow Memory are detected.
Memory Mapping for each architecture is defined here: https://github.com/gcc-mirror/gcc/blob/releases/gcc-12.2.0/libsanitizer/asan/asan_mapping.h
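As a rough illustration, the sketch below classifies an address into these parts using the default Linux/x86_64 layout documented in asan_mapping.h (shadow offset 0x7fff8000). The bounds are hard-coded for that one platform and are an assumption of this sketch, since other targets use different values; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Default ASan layout on Linux/x86_64 (hard-coded here; other
 * architectures use different bounds). Names are illustrative only. */
typedef enum {
    LOW_MEM,      /* application memory                */
    LOW_SHADOW,   /* shadow of LOW_MEM                 */
    SHADOW_GAP,   /* protected: any access faults      */
    HIGH_SHADOW,  /* shadow of HIGH_MEM                */
    HIGH_MEM,     /* application memory                */
    OUT_OF_RANGE  /* above the 47-bit user space       */
} asan_region;

asan_region classify_address(uint64_t addr) {
    if (addr <= 0x00007fff7fffULL) return LOW_MEM;
    if (addr <= 0x00008fff6fffULL) return LOW_SHADOW;
    if (addr <= 0x02008fff6fffULL) return SHADOW_GAP;
    if (addr <= 0x10007fff7fffULL) return HIGH_SHADOW;
    if (addr <= 0x7fffffffffffULL) return HIGH_MEM;
    return OUT_OF_RANGE;
}
```

With this layout, the shadow of the first Shadow Memory byte, (0x7fff8000 >> 3) + 0x7fff8000 = 0x8fff7000, falls inside the protected gap, so an instrumented access to the Shadow Memory itself faults instead of silently succeeding.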
Shadow Memory
The Shadow Memory is the part of virtual memory used to store metadata about the data stored in the Application Memory.
Each byte in the Shadow Memory corresponds to exactly 8 bytes in the Application Memory.
ASan accesses the Shadow Memory through a function, MemToShadow, which maps an application memory address to its shadow memory address.
This function has to be fast, and it lets ASan map an address like this:
shadow_address = MemToShadow(application_address);
ASan can then read the Shadow Byte at that address, find out which of the 8 corresponding application bytes are accessible, and identify a wrongly accessed byte in the application memory.
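On Linux/x86_64, this mapping boils down to a shift and an add. The sketch below assumes that platform’s defaults (scale 3, so 8 application bytes per Shadow Byte, and offset 0x7fff8000); mem_to_shadow and shadow_to_mem are hypothetical names, and the range checks performed by the real functions are left out.

```c
#include <stdint.h>

/* Assumed Linux/x86_64 defaults: 1 Shadow Byte per 2^3 = 8 bytes. */
#define SHADOW_SCALE  3
#define SHADOW_OFFSET 0x7fff8000ULL

/* Divide the address by 8, then add the fixed offset. */
static inline uint64_t mem_to_shadow(uint64_t addr) {
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Inverse mapping: returns the first of the 8 application bytes
 * described by the given Shadow Byte. */
static inline uint64_t shadow_to_mem(uint64_t shadow) {
    return (shadow - SHADOW_OFFSET) << SHADOW_SCALE;
}
```

Because of the shift, the 8 addresses 0x1000 to 0x1007 all share the same Shadow Byte, and 0x1008 uses the next one.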
ASan is able to detect memory errors thanks to the bytes contained in the Shadow Memory, in conjunction with a method called Infection.
Infection
Infection poisons some bytes around each allocation in Application Memory and records which bytes have been poisoned in the Shadow Memory.
Infection takes place during a call to malloc() (we will see later how ASan modifies the behavior of malloc()).
ASan controls where objects are placed in memory: heap allocations go through its own malloc(), and the compiler instrumentation places redzones around stack variables (which we’ll see later) and global variables.
This makes infection simple: every object can be surrounded by poisoned bytes at the moment it is created.
For example, let’s allocate a simple integer on the heap.
The memory around the my_int pointer will look like:
Poisoned
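A byte of Application Memory is Poisoned when ASan marks it as forbidden. ASan does not modify the application byte itself: it writes a special value into the corresponding byte of the Shadow Memory. This value is then used to identify the type of invalid memory access we are doing.
The goal of the Shadow Memory is only to store information on where the poisoned bytes are in the Application Memory (and why they are poisoned).
We said that 1 byte in the Shadow Memory corresponds to 8 bytes in the Application Memory. A single byte is enough to describe those 8 bytes, because a group of 8 bytes can only be in a few states: all 8 bytes addressable (00), only the first k bytes addressable (01 to 07), or not addressable at all (a special negative value, such as f3).
This works because ASan controls where objects are placed: heap memory goes through its malloc(), and stack and global variables are laid out by the instrumented code.
Moreover, malloc() always returns memory aligned to at least 8 bytes, so an object always starts at the beginning of a group of 8 bytes, and only its last group can be partially addressable.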
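In ASan, a Shadow Byte of 00 means all 8 bytes of its group are addressable, and a value k from 01 to 07 means only the first k bytes are. A minimal sketch of how an 8-byte aligned object is encoded this way follows; the function names are hypothetical, and the redzone magic values around the object are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE 8  /* application bytes described by one Shadow Byte */

/* Shadow Byte for a group of 8 bytes whose first `valid` bytes may be
 * accessed (valid >= 1): 8 or more -> 00, 1..7 -> 01..07. */
static uint8_t group_shadow(size_t valid) {
    return valid >= GRANULE ? 0x00 : (uint8_t)valid;
}

/* Writes the Shadow Bytes of an 8-byte aligned object of `size` bytes
 * into `shadow` and returns how many were written. */
size_t encode_object(uint8_t *shadow, size_t size) {
    size_t n = 0;
    for (size_t off = 0; off < size; off += GRANULE)
        shadow[n++] = group_shadow(size - off);
    return n;
}
```

For a 4-byte int such as my_int, this produces the single Shadow Byte 04: bytes 0 to 3 of that group are valid, and an access to bytes 4 to 7 is reported.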
We will see later in this document the list of all the possible values of a byte in the Shadow Memory.
Conclusion on ASan core concepts
As we can see, ASan changes how the underlying virtual memory is used, so that it can check that every memory access touches valid bytes.
This lets it check for errors efficiently, at the cost of only ~1/8 of the virtual memory available to the application.
With what we have seen so far, you can understand why Xavier Login’s first error was caught, but not why the second one was missed. If ASan poisons the memory around the variable, it should have detected it.
Unfortunately, one limitation of ASan is that it only poisons the bytes right next to an allocation (its redzones). Many places in memory are never poisoned, so ASan can miss an out-of-bounds access that jumps over the redzone and lands in other valid memory.
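The sketch below illustrates this with a hypothetical stack frame: a 16-byte buffer, a redzone, then another variable. The layout, the offsets and the names are assumptions chosen for illustration, not Xavier’s actual frame; the check is the single-byte version of ASan’s shadow test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame, one Shadow Byte per 8 frame bytes:
 * left redzone, char buf[16] at offset 32, mid redzone,
 * long other at offset 80, right redzone. */
static const uint8_t frame_shadow[] = {
    0xf1, 0xf1, 0xf1, 0xf1, /* [0, 32)   stack left redzone  */
    0x00, 0x00,             /* [32, 48)  buf                 */
    0xf2, 0xf2, 0xf2, 0xf2, /* [48, 80)  stack mid redzone   */
    0x00,                   /* [80, 88)  other               */
    0xf3, 0xf3, 0xf3,       /* [88, 112) stack right redzone */
};

#define BUF_OFFSET 32

/* Would ASan flag a 1-byte read of buf[i]? Offsets outside this model
 * are treated as not poisoned, like most of memory. */
bool buf_read_is_flagged(size_t i) {
    size_t offset = BUF_OFFSET + i;
    if (offset / 8 >= sizeof frame_shadow)
        return false;
    int8_t k = (int8_t)frame_shadow[offset / 8];
    /* 00: all 8 bytes valid; 1..7: first k valid; negative: poisoned */
    return k != 0 && (int8_t)(offset & 7) >= k;
}
```

Here buf[16] lands in the redzone and is reported, but buf[48] jumps over it and lands in other, whose Shadow Byte is 00, so the bug goes unnoticed.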
Now that you know the core concepts of ASan, we are able to deep dive into its source code and see how it uses instrumentation and a run-time library to do all the necessary checks and infection.
Instrumentation Module
The goal of the Instrumentation Module is to add run-time checks before every memory instruction.
Initialization
The first piece of instrumentation is placed in the module initializer: a call to __asan_init_vN(), where N is the desired API version (we will not look at API differences).
This function is called at module initialization time so that ASan is initialized.
Instructions
Then, ASan needs to instrument certain types of instructions.
In fact, it only modifies 2 types of instructions:
- load: When performing a load instruction (read).
- store: When performing a store instruction (write).
The code:
Will be enhanced with memory checks, as follows:
File: gcc/asan.cc
It’s exactly the same for a store instruction: ASan just replaces __asan_report_loadN with __asan_report_storeN.
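The check itself only takes a few integer operations. Below it is modelled as a plain C function, assuming the Shadow Byte has already been read from MemToShadow(addr) and that the access (1, 2, 4 or 8 bytes) does not cross an 8-byte boundary. access_is_invalid is a hypothetical name: in instrumented code, the logic is emitted inline and calls __asan_report_loadN (or __asan_report_storeN) when it fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an access of `size` bytes at `addr` touches poisoned
 * memory, given shadow_value = the Shadow Byte for addr. */
static inline bool access_is_invalid(int8_t shadow_value, uintptr_t addr,
                                     unsigned size) {
    if (shadow_value == 0)   /* fast path: all 8 bytes are addressable */
        return false;
    /* Slow path: 1..7 means only the first k bytes are addressable, and a
     * negative magic value (redzone, freed memory...) means none are.
     * Compare with the offset of the last byte touched by the access. */
    int8_t last = (int8_t)((addr & 7) + size - 1);
    return last >= shadow_value;
}
```

Since most Shadow Bytes are 00, the fast path is taken almost every time, which keeps the instrumentation cheap.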
Stack
The module will also instrument variables on the stack.
It lays out each stack frame with redzones around the local variables and emits code that writes the corresponding Shadow Bytes. When the Fake Stack is enabled (see below), it also emits a call to __asan_stack_malloc_N() to obtain the frame.
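The resulting frame layout, with redzones around the local variables, can be modelled roughly as follows. The 32-byte alignment, the redzone sizes and layout_frame itself are simplifying assumptions for illustration; real compilers choose redzone sizes per variable.

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE    8   /* application bytes per Shadow Byte       */
#define FRAME_STEP 32  /* assumed alignment of variable + redzone */

/* Hypothetical frame layout: a 32-byte left redzone (f1), then each
 * variable followed by a redzone that pads it to the next multiple of
 * 32 bytes (f2 between variables, f3 after the last one).
 * Writes one Shadow Byte per 8 frame bytes into `shadow`, stores each
 * variable's frame offset in `offsets`, and returns the shadow length. */
size_t layout_frame(const size_t *sizes, size_t count,
                    uint8_t *shadow, size_t *offsets) {
    const size_t step = FRAME_STEP / GRANULE;
    size_t n = 0;
    for (size_t i = 0; i < step; i++)
        shadow[n++] = 0xf1;                          /* left redzone */
    for (size_t v = 0; v < count; v++) {
        size_t used = 0;
        offsets[v] = n * GRANULE;
        for (size_t off = 0; off < sizes[v]; off += GRANULE, used++) {
            size_t left = sizes[v] - off;            /* bytes still to cover */
            shadow[n++] = left >= GRANULE ? 0x00 : (uint8_t)left;
        }
        uint8_t rz = (v + 1 < count) ? 0xf2 : 0xf3;  /* mid or right */
        do {                                         /* at least one group */
            shadow[n++] = rz;
            used++;
        } while (used % step != 0);
    }
    return n;
}
```

With a single int number[4] (16 bytes), as in the Error Reporting example later, this model produces f1 f1 f1 f1 00 00 f3 f3 and places number at offset 32, so number[4] (offset 48) falls on an f3 byte.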
Run-time Library
The Run-time library defines all the functions that we have seen in the instrumentation part.
It defines everything ASan needs to manage memory (poisoning bytes and checking memory accesses) and to report errors.
It consists of:
- Mapping from/to Shadow Memory to/from Application Memory.
- Interception of memory related functions.
- Handling the Stack and the Fake Stack implementation.
- Functions to check whether the accessed memory is valid.
- Functions to report errors when a check fails.
Shadow and Application Address Mapping
The most frequently used functions in the program are the ones that convert addresses from Shadow Memory to Application Memory and back.
These functions are defined in asan_mapping.h, so that the scale and the offset can change depending on the architecture.
Both functions first check that the address really is in the expected region.
Then they convert the address using the 2 macros MEM_TO_SHADOW and SHADOW_TO_MEM.
These functions are inlined to avoid function call overhead, because they are called very often. The faster they are, the lower the performance impact of the ASan instrumentation on the target code.
Interceptor
One of the main challenges for ASan is to intercept all the functions that alter memory.
For example, the famous malloc/calloc/free functions need to be intercepted in order to poison memory before handing it back to the user.
To do this, ASan has a macro called INTERCEPTOR.
It takes as parameters the return type, the function name, and the arguments of the function we want to replace.
Let’s look, for example, at the interceptor for malloc().
Now, every time we call malloc(), we will go through the function defined above, and not the libc malloc().
Stack
ASan can work with 2 “modes” for the Stack: it can keep the normal Stack, or create a Fake Stack.
Only one of these modes is used at run time, not both.
Normal Stack
For the normal Stack, the code has already been instrumented: the function prologue places redzones around each local variable on the real stack and poisons the matching Shadow Bytes, and the epilogue unpoisons them before returning.
The __asan_stack_malloc_N() functions, which call into the ASan allocator, are only used in Fake Stack mode.
For the Normal Stack, only the redzones around local variables are poisoned.
Fake Stack
The Fake Stack is used for use-after-return error detection.
ASan uses a special allocator that creates a Fake Stack outside the real stack, made of Fake Frames that are poisoned once the function that owns them returns.
Poisoning
To poison memory, ASan provides a simple function, void PoisonShadow(uptr addr, uptr size, u8 value).
It sets the Shadow Bytes describing the size bytes starting at addr to value.
On Linux, this essentially boils down to a call to memset() on the corresponding range of Shadow Memory.
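A simplified poisoning function working on a simulated shadow array can be sketched as below. Its signature differs from ASan’s (it takes the shadow array and the application base address explicitly) and the name poison_shadow is hypothetical; the alignment assertion reflects the fact that whole Shadow Bytes are written, so addr and size must be multiples of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_SCALE 3
#define GRANULE      (1u << SHADOW_SCALE)  /* 8 bytes per Shadow Byte */

/* Simulated shadow: shadow[0] describes application bytes
 * [app_base, app_base + 8). Sets the Shadow Bytes covering
 * [addr, addr + size) to `value` with a single memset(). */
void poison_shadow(uint8_t *shadow, uintptr_t app_base,
                   uintptr_t addr, size_t size, uint8_t value) {
    assert(addr >= app_base);
    assert(addr % GRANULE == 0 && size % GRANULE == 0);
    memset(shadow + ((addr - app_base) >> SHADOW_SCALE), value,
           size >> SHADOW_SCALE);
}
```

For example, poisoning 24 bytes at app_base + 16 with f3 rewrites exactly 3 Shadow Bytes and leaves the neighbours untouched.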
Error Reporting
In this part, we will analyse the ASan output when an error occurs, and see what we can now understand from each part.
For this, we will use a stack-buffer-overflow as an example. And we will need a master of the art of C programming.
Xavier is once again asked to create a program that creates an array of integers and returns the last element.
// My beautiful program that returns the last element of an array of integers.
int main() {
    int number[4] = { 0 };
    return number[4];
}
Xavier now executes the program and…
Ouch!!! That’s a lot of information for Xavier to take at once.
In order to teach Xavier how this works, we will take a closer look at each part of this output.
In fact, this output is divided into 6 distinct parts described below.
Error Type
The first part of the output is the error type.
It contains 3 pieces of information.
Error Type
The error type corresponds to the errors listed in the introduction: it’s the name of the error we triggered.
Here, we can see that we triggered a stack-buffer-overflow, as expected.
Address
The Address shows us the virtual address we tried to access, which triggered the stack-buffer-overflow.
Registers
Finally, ASan will output the state of 3 registers.
- Program Counter: Indicating the current instruction.
- Base Pointer: The pointer indicating the start of our stack frame (also, the previous SP).
- Stack Pointer: The pointer indicating the current position of the top of our stack.
This can help us to know exactly when and where the program failed.
Operation
The second part is the type of operation that triggered the error.
It contains 4 pieces of information.
Operation
The operation tells us what we were doing when the memory error happened.
Here, it’s READ, because we are trying to read memory out of bounds.
If we were trying to write a value into an array at an out-of-bounds index, the operation would be WRITE.
Size
It is the size, in bytes, of the data we are trying to access.
Here, it’s 4 (bytes) because we are trying to access an int, which on x86_64 under the System V ABI is stored in 4 bytes of memory.
Address
The address is the same as before: exactly where we tried to access invalid memory.
ThreadID
The ThreadID is a unique identifier (ID) of the thread that performed the access. It is only useful in a multi-threaded application, where it tells you on which thread the invalid memory access happened.
Backtrace
The third part of the output is a backtrace of your program. It shows the state of your call stack when the invalid memory access was done.
If you compiled with debug symbols (for example with -g), ASan can tell you exactly on which line of your program the invalid memory access is located.
Here, it tells us that the invalid access is on line 6, which is the return number[4]; statement.
Address
The fourth part of the output describes the address where the memory error occurred.
It contains 4 pieces of information.
Address
Same as the previous two: exactly where the invalid memory access is.
Place
This tells us in which memory region the invalid access is: it can be the stack, the heap, or global memory.
ThreadID
This is the ThreadID we have seen before. It’s also here because each thread has its own Stack.
So we need to know the Thread associated with each stack.
Offset
This tells us exactly where the access is located within the stack frame.
Memory View
This is the most visual part: it represents a snapshot of the memory around the invalid access when the error was raised.
The memory ASan shows us here is the Shadow Memory around the invalid memory access.
On the left, we can see the addresses of the Shadow Memory, and on the right, each pair of hex digits is one Shadow Byte, describing 8 bytes of the Application Memory.
We can see that we accessed the third number on the line 0x1000_00f9_6e50, and that the value stored there is f3. But what does it mean? Maybe we can look at the last section to find the answer.
Shadow Byte Legend
And you guessed right, the last part corresponds to the Shadow Bytes Legend. It tells you exactly the meaning of the magic values you see in the memory view.
We can then look up f3, which means: Stack right redzone.
That tells us that the memory whose Shadow Byte is f3 is poisoned memory on the right side of a variable on the stack, which corresponds exactly to a stack-buffer-overflow.
We can also see f1 before our variable in the Shadow Memory, which corresponds to Stack left redzone and enables ASan to report stack-buffer-underflow errors.
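The legend can also be read as a lookup table. The sketch below maps the values ASan prints in its “Shadow byte legend” to their names; describe_shadow_byte is a hypothetical helper, and the list covers the common values rather than every value a given ASan version may print.

```c
#include <stdint.h>

/* Name of a Shadow Byte value, as printed in ASan's legend. */
const char *describe_shadow_byte(uint8_t v) {
    if (v == 0x00) return "Addressable";
    if (v >= 0x01 && v <= 0x07) return "Partially addressable";
    switch (v) {
    case 0xfa: return "Heap left redzone";
    case 0xfd: return "Freed heap region";
    case 0xf1: return "Stack left redzone";
    case 0xf2: return "Stack mid redzone";
    case 0xf3: return "Stack right redzone";
    case 0xf5: return "Stack after return";
    case 0xf8: return "Stack use after scope";
    case 0xf9: return "Global redzone";
    case 0xf6: return "Global init order";
    case 0xf7: return "Poisoned by user";
    case 0xfc: return "Container overflow";
    case 0xac: return "Array cookie";
    case 0xbb: return "Intra object redzone";
    case 0xfe: return "ASan internal";
    case 0xca: return "Left alloca redzone";
    case 0xcb: return "Right alloca redzone";
    default:   return "Unknown";
    }
}
```

Applied to Xavier’s memory view, f1 decodes to “Stack left redzone” and f3 to “Stack right redzone”.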
Summary
Here is the summary of the parts we have seen.
Performance
In terms of performance, there are 2 aspects to take into account.
First, the memory footprint: ASan takes ~1/8 of the available virtual memory of your program.
This part is used to store the Shadow Bytes and to prevent access to the Shadow Memory.
Second, ASan slows down your code by an average factor of 1.93, which means that your program will run approximately 2 times slower than usual.
This is a good trade-off compared to other memory misuse detectors.
For example, Valgrind can make your code run about 20 times slower.
Still, ASan’s overhead (~1.93) and its ~1/8 share of your application’s memory need to be kept in mind when choosing ASan.
Conclusion
Address Sanitizer is a powerful tool to detect memory misuse cases.
It doesn’t hurt your performance too much, and it is very easy to enable in your build environment thanks to a compiler/linker flag (-fsanitize=address with GCC and Clang).
It is a very useful tool that any programmer can understand, and it gives a lot of information to help find bugs more easily.
I hope that this document will help you and Xavier in your future C projects, by helping you avoid common memory misuse thanks to AddressSanitizer.