Navigating the Complexity of Emulating Diverse Microcontroller Architectures - Pre-Emulation, Part I

Introduction

Microcontrollers are the backbone of countless embedded systems, ranging from consumer electronics to industrial automation and medical devices. Each architecture brings its unique capabilities tailored to specific application needs. However, developing, testing, and analyzing firmware across diverse architectures presents significant challenges: Physical hardware is often not accessible, and testing directly on target devices can be risky and inefficient, especially when handling sensitive or high-risk systems. This highlights the need for emulation.

Emulation offers a powerful alternative for developers to replicate microcontroller functionalities in a virtual environment, providing a controlled space to load, test, and debug firmware. This reduces reliance on physical boards and devices, accelerates the development process and enhances the ability to reverse-engineer or explore firmware behavior, which is critical for security research and failure analysis.

Several specialized tools, such as Renode, Proteus, and MPLAB, tackle the complexities of emulating various microcontroller architectures, but this article focuses on QEMU.

QEMU (Quick EMUlator) is an open-source machine emulator that uses Dynamic Binary Translation (DBT) to emulate various architectures, including x86, PowerPC, and ARM, across different host systems. Its Tiny Code Generator (TCG) translates guest instructions, one block at a time, into an architecture-neutral intermediate representation (IR) and then into host machine code. Translated blocks are stored in a translation cache, so code that runs repeatedly is translated only once. This allows QEMU to re-host and execute firmware in an emulated environment that aims to reproduce the behavior of the original hardware.

Despite its versatility, challenges arise in emulating microcontrollers due to their unique performance and compatibility requirements. This leads to the question: what are the challenges of emulating diverse architectures, especially architectures like ARM, AVR, and PIC?

This article focuses on the pre-emulation of ARM, AVR, and PIC architectures using QEMU. For effective emulation, developers must first navigate several challenges, such as obtaining and unpacking firmware and configuring the emulator. This Pre-Emulation stage involves some of the essential steps before emulation, including firmware structure analysis, understanding memory layouts, and determining runtime base addresses accurately.

While QEMU supports ARM and AVR, it lacks support for PIC, necessitating alternative solutions. We will explore how QEMU handles ARM and AVR emulation and discuss the possible reasons for the absence of PIC support.

Before diving in, readers should have a foundational understanding of emulators, microcontroller architectures, and firmware. This knowledge will make the methodologies and challenges discussed in this article easier to follow.

Note: Code segments provided in this article are adapted for clarity and are not 100% accurate. Always check out the actual source code!

Firmware formats

To explain how firmware is loaded and executed, we must first break down the firmware formats used by the MCUs we are discussing:

ARM MCUs commonly use Executable and Linkable Format (.elf) binaries, with the classic .text, .data, and .bss sections among others, as well as more specific sections such as .isr_vector on Cortex-M processors, which contains the addresses of every Interrupt Service Routine (ISR). Other commonly used formats are Intel HEX and raw .bin images.

AVR MCUs share similarities with ARM in terms of format usage but often emphasize smaller firmware sizes. They commonly use ELF files, with a similar section structure tailored to the AVR architecture, alongside Intel HEX and raw binary formats.

PIC MCUs, however, diverge from the ELF format, predominantly relying on .hex, .bin, as well as .s19 and .sec (Motorola S-Record) formats for programming. Some tools, such as MPLAB X, can output ELF binaries.

QEMU includes a file called loader.c located in hw/core/, which is dedicated to handling the loading of firmware images into emulated memory. This file supports multiple firmware formats, allowing QEMU to emulate various microcontroller architectures seamlessly. Among the supported formats are ELF, U-Boot, and Intel HEX, as well as specialized formats like Ramdisk images, compressed binaries, and files mapped to specific memory regions.
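To make one of these formats concrete, here is a minimal sketch of parsing a single Intel HEX record in C. It is not QEMU's loader code; the `ihex_record` type and the `ihex_parse_record` function are hypothetical names introduced for illustration. It relies only on the record layout of the format itself: a colon, a byte count, a 16-bit address, a record type, the data bytes, and a checksum chosen so that all bytes of the record sum to zero modulo 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded Intel HEX record (hypothetical helper type). */
typedef struct {
    uint8_t  length;     /* number of data bytes */
    uint16_t address;    /* 16-bit load offset */
    uint8_t  type;       /* 0x00 data, 0x01 end of file, 0x04 extended linear address */
    uint8_t  data[255];  /* payload, only the first `length` bytes are valid */
} ihex_record;

/* Convert one hexadecimal digit, or return -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert two hexadecimal digits into a byte value, or return -1. */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    if (hi < 0) return -1;
    int lo = hex_nibble(s[1]);
    if (lo < 0) return -1;
    return (hi << 4) | lo;
}

/* Parse ":LLAAAATT<data>CC". Returns 0 on success, -1 on a malformed
 * record or a checksum mismatch. Trailing characters are ignored. */
int ihex_parse_record(const char *line, ihex_record *rec)
{
    assert(line != NULL && rec != NULL);

    size_t n = strlen(line);
    if (n < 11 || line[0] != ':')          /* shortest record is ":00000001FF" */
        return -1;

    int len = hex_byte(line + 1);
    if (len < 0 || n < 11 + 2 * (size_t)len)
        return -1;

    uint8_t header[4];
    uint8_t sum = 0;
    /* The record holds len + 5 bytes: count, address (2), type, data, checksum. */
    for (int i = 0; i < len + 5; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0)
            return -1;
        sum = (uint8_t)(sum + b);
        if (i < 4)
            header[i] = (uint8_t)b;
        else if (i < 4 + len)
            rec->data[i - 4] = (uint8_t)b;
    }
    if (sum != 0)                          /* valid records sum to zero */
        return -1;

    rec->length  = header[0];
    rec->address = (uint16_t)((header[1] << 8) | header[2]);
    rec->type    = header[3];
    return 0;
}
```

A full loader would additionally track extended address records (type 0x04) to build 32-bit load addresses, which is how a HEX file can target a flash region such as 0x08000000.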

Memory Layout Configuration

Understanding the memory layout of a microcontroller means knowing how its different types of memory, such as RAM, Flash, and Memory-Mapped I/O (MMIO), are arranged in the address space. A well-defined memory layout sets the starting address of each memory type and ensures proper access during firmware execution, which is why developers go straight to the datasheet when one is available.

ARM

The ARM architecture employs a flexible memory layout characterized by a linear address space, often segmented to enhance memory management and efficiency. This organization includes multiple memory regions, such as code, data, and peripheral spaces, along with options for implementing memory protection and caching depending on the model.

Let’s take the STM32F42xxx series as an example. It features embedded flash memory with a capacity of up to 2 Mbytes, organized in a dual bank architecture. Each bank consists of a main memory block of 1 Mbyte, divided into 4 sectors of 16 Kbytes, 1 sector of 64 Kbytes, and 7 sectors of 128 Kbytes. Additionally, there are 512 OTP (one-time programmable) bytes for user data, alongside option bytes for configuring read/write protection and other functionalities. On 1 Mbyte devices, the dual bank feature is enabled by setting the DB1M option bit, which restructures the last 512 Kbytes of memory and changes the sector organization from 12 sectors in single bank mode to 16 sectors in dual bank mode.

[Figure: STM32F42xxx flash memory organization]

[Figure: STM32F429 memory map]
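The sector arithmetic above can be sketched in C as follows. This is an illustrative helper rather than vendor or QEMU code: the function name `stm32f42x_flash_sector` is hypothetical, and the sketch assumes a 2 Mbyte device whose flash starts at 0x08000000, with sectors numbered 0 to 11 in the first bank and 12 to 23 in the second.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_BASE_ADDR   0x08000000u  /* assumed start of main flash */
#define FLASH_BANK_SIZE   0x00100000u  /* 1 Mbyte per bank */
#define FLASH_NUM_BANKS   2u
#define SECTORS_PER_BANK  12

/* 4 x 16 Kbytes + 1 x 64 Kbytes + 7 x 128 Kbytes must fill exactly one bank. */
static_assert(4u * 0x4000u + 0x10000u + 7u * 0x20000u == FLASH_BANK_SIZE,
              "sector sizes must add up to one bank");

/* Return the flash sector (0..23) containing addr, or -1 if addr is
 * outside the main flash of a 2 Mbyte dual bank device. */
int stm32f42x_flash_sector(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR)
        return -1;

    uint32_t offset = addr - FLASH_BASE_ADDR;
    if (offset >= FLASH_NUM_BANKS * FLASH_BANK_SIZE)
        return -1;

    uint32_t bank    = offset / FLASH_BANK_SIZE;
    uint32_t in_bank = offset % FLASH_BANK_SIZE;
    int sector;

    if (in_bank < 0x10000u)            /* four 16 Kbyte sectors */
        sector = (int)(in_bank / 0x4000u);
    else if (in_bank < 0x20000u)       /* one 64 Kbyte sector */
        sector = 4;
    else                               /* seven 128 Kbyte sectors */
        sector = 5 + (int)((in_bank - 0x20000u) / 0x20000u);

    return (int)bank * SECTORS_PER_BANK + sector;
}
```

Under these assumptions, 0x08010000 falls in sector 4 (the 64 Kbyte sector) and 0x08100000 is the first byte of sector 12, the start of the second bank.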

AVR

The AVR architecture features a linear and regular memory map, with Program Flash memory divided into two sections: the Boot Program section and the Application Program section, each equipped with dedicated lock bits for write and read/write protection. The Store Program Memory (SPM) instruction, necessary for writing to the Application Flash memory, must reside in the Boot Program section, while return addresses during interrupts and subroutine calls are stored on the stack allocated in the general data SRAM, which limits stack size only by total SRAM capacity.

Specifically, the ATmega328P microcontroller includes 32 Kbytes of in-system reprogrammable flash memory, organized as 16K×16. The I/O memory space contains 64 addresses for CPU peripheral functions, accessible directly or within the data space from 0x20 to 0x5F, with an extended I/O space from 0x60 to 0xFF in SRAM. User programs must initialize the Stack Pointer (SP) in the Reset routine, and the data SRAM is accessible through five different addressing modes.

[Figure: ATmega328P flash memory organization]

[Figure: ATmega328P data memory map]
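As a rough illustration of the data space described above, the following C sketch classifies an ATmega328P data address and converts an I/O address into its data-space equivalent. The enum and function names are hypothetical. The I/O ranges come from the text; the register file at 0x00 to 0x1F and the 2 Kbytes of internal SRAM at 0x0100 to 0x08FF are taken from the ATmega328P datasheet and should be checked against it for other parts.

```c
#include <assert.h>
#include <stdint.h>

/* Regions of the ATmega328P data address space (hypothetical names). */
typedef enum {
    AVR_REGION_REGISTERS,  /* 0x0000..0x001F: 32 general purpose registers */
    AVR_REGION_IO,         /* 0x0020..0x005F: 64 I/O registers */
    AVR_REGION_EXT_IO,     /* 0x0060..0x00FF: extended I/O registers */
    AVR_REGION_SRAM,       /* 0x0100..0x08FF: internal SRAM (2 Kbytes) */
    AVR_REGION_INVALID     /* beyond the end of internal SRAM */
} avr_region;

/* Classify an address in the data space. */
avr_region atmega328p_data_region(uint16_t addr)
{
    if (addr <= 0x001F) return AVR_REGION_REGISTERS;
    if (addr <= 0x005F) return AVR_REGION_IO;
    if (addr <= 0x00FF) return AVR_REGION_EXT_IO;
    if (addr <= 0x08FF) return AVR_REGION_SRAM;
    return AVR_REGION_INVALID;
}

/* Convert an I/O address (0x00..0x3F, as used by IN and OUT) into the
 * data-space address used by load and store instructions. */
uint16_t avr_io_to_data_addr(uint8_t io_addr)
{
    assert(io_addr <= 0x3F);
    return (uint16_t)(io_addr + 0x20);
}
```

For example, the low byte of the Stack Pointer sits at I/O address 0x3D, so a load or store reaches it at data address 0x5D.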

PIC32

Most PIC32 microcontrollers are based on MIPS architecture, which complicates the memory layout due to its unique segmentation. MIPS features several memory segments: KUSEG, KSEG0, KSEG1, and KSEG2, each serving distinct purposes. There are logical reasons for the existence of the memory segments: Caches in MIPS need to be initialized by boot code. The memory management unit (MMU) being optional, it is useful to have explicit physical memory regions reserved for the kernel, and not accessible by user mode code.

The PIC32MX family of microcontrollers features a unified virtual address space of 4 GB, in which instructions and data share the same address space. The MIPS core issues virtual addresses, which a fixed-mapping translation unit (rather than a TLB-based MMU) maps to physical addresses, while a multi-layer system bus allows concurrent instruction and data access. Additional features include a 32-bit native data width, separate user and kernel mode address spaces, flexible partitioning for program and data Flash memory, robust exception handling, and cacheable and non-cacheable address regions.

In the context of the PIC32 architecture, the memory segments KSEG0 and KSEG1 are particularly significant. Both segments translate to the same physical address of 0x0, encompassing all program Flash and data memory. However, they differ in their caching behavior:

KSEG0 segment is cacheable, meaning that the CPU can store frequently accessed data in the cache, enhancing the speed of read and write operations.
KSEG1, in contrast, is non-cacheable, which makes it crucial for certain operations that require guaranteed access and consistency. It also provides virtual address space translation to the Special Function Registers (SFRs) for PIC32MX family devices.

[Figure: PIC32 memory map]

The following diagram shows where the cacheable KSEG0 and non-cacheable KSEG1 virtual memory segments are mapped into physical memory on the PIC32MX795F512L. Remember that KSEG0 and KSEG1 addresses are virtual addresses: they are not the physical addresses they translate to!

[Figure: PIC32MX795F512L virtual-to-physical memory mapping]
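The KSEG0/KSEG1 relationship can be sketched in a few lines of C. This is an illustrative model, not emulator code: `pic32_kseg_to_phys` and `kseg_translation` are hypothetical names, and the sketch assumes the standard MIPS32 segment boundaries (KSEG0 at 0x80000000, KSEG1 at 0xA0000000, KSEG2 at 0xC0000000), where translation simply drops the top three address bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u  /* cacheable kernel segment */
#define KSEG1_BASE 0xA0000000u  /* non-cacheable kernel segment */
#define KSEG2_BASE 0xC0000000u  /* first address past KSEG1 */

/* Result of translating a KSEG0/KSEG1 virtual address (hypothetical type). */
typedef struct {
    uint32_t phys;       /* physical address */
    bool     cacheable;  /* true for KSEG0, false for KSEG1 */
} kseg_translation;

/* Translate a virtual address in KSEG0 or KSEG1 to a physical address.
 * Returns false if the address lies outside both segments. */
bool pic32_kseg_to_phys(uint32_t vaddr, kseg_translation *out)
{
    assert(out != NULL);

    if (vaddr < KSEG0_BASE || vaddr >= KSEG2_BASE)
        return false;

    out->phys      = vaddr & 0x1FFFFFFFu;  /* both segments alias the same 512 MB */
    out->cacheable = vaddr < KSEG1_BASE;
    return true;
}
```

With this model, 0x9D000000 (KSEG0) and 0xBD000000 (KSEG1) both resolve to physical address 0x1D000000, the start of program Flash on PIC32MX parts, and differ only in whether the access goes through the cache.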

To emulate a microcontroller, the emulator must replicate its memory layout. QEMU does this with C structures, such as the MemoryRegion type declared in qemu/include/exec/memory.h, that describe each memory area according to the microcontroller’s datasheet whenever possible. Coverage is not complete, though: QEMU does not fully emulate AVR’s EEPROM, for example. Since EEPROM is essential for non-volatile storage, developers working with EEPROM-dependent code will need workarounds.

As we can see in hw/avr/atmega.c, the EEPROM is mapped with create_unimplemented_device, declared in include/hw/misc/unimp.h, which does pretty much what its name says:

/**
 * create_unimplemented_device: create and map a dummy device
 * @name: name of the device for debug logging
 * @base: base address of the device's MMIO region
 * @size: size of the device's MMIO region
 *
 * This utility function creates and maps an instance of unimplemented-device,
 * which is a dummy device which simply logs all guest accesses to
 * it via the qemu_log LOG_UNIMP debug log.
 * The device is mapped at priority -1000, which means that you can
 * use it to cover a large region and then map other devices on top of it
 * if necessary.
 */
/* In hw/avr/atmega.c: the EEPROM registers are only backed by this dummy device. */
create_unimplemented_device("avr-eeprom", OFFSET_DATA + 0x03f, 3);
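The priority mechanism mentioned in that comment can be illustrated with a small standalone model. The sketch below is not QEMU's implementation and does not use its API: the `region` type and `resolve_region` function are hypothetical, and the model only captures the idea that when several regions overlap, the one with the highest priority handles the access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A mapped region in a simplified address space (hypothetical type). */
typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    size;
    int         priority;  /* higher values win when regions overlap */
} region;

/* Return the highest priority region containing addr, or NULL if the
 * address is not mapped at all. */
const region *resolve_region(const region *map, size_t count, uint32_t addr)
{
    assert(map != NULL || count == 0);

    const region *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const region *r = &map[i];
        if (addr < r->base || addr - r->base >= r->size)
            continue;                       /* addr is outside this region */
        if (best == NULL || r->priority > best->priority)
            best = r;
    }
    return best;
}
```

In this model, a catch-all region at priority -1000 spanning the whole data space still lets an SRAM region at priority 0 win for the addresses it covers, while every other access falls through to the catch-all, where a real emulator would log it.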

STMicroelectronics’ application note on STM32 EEPROM emulation (listed in the sources) describes a related technique.

Basically, STM32 microcontrollers emulate EEPROM in on-chip flash memory, which behaves differently from a traditional EEPROM. Dedicated software routines, such as ST’s X-CUBE-EEPROM driver, handle wear leveling and robustness against power interruption. QEMU’s AVR support offers nothing comparable: the memory model is fixed, the EEPROM registers are backed only by the dummy device shown above, and AVR’s program flash is not meant for the frequent rewrites typical of EEPROM. EEPROM-dependent firmware therefore needs additional handling.

Determining Base Address

One of the first hurdles when reverse engineering a raw firmware binary is pinning down the runtime base address of the image. Anyone who has tackled the delightful challenge of reverse engineering a bootloader or a raw embedded Linux kernel knows (all too well) that this journey is rather frustrating.

Understanding the distinction between position-dependent and position-independent programs is crucial for grasping how QEMU functions.

Position-independent programs can run correctly regardless of their memory location, dynamically computing pointers relative to the program counter or through runtime relocations, whereas position-dependent programs must execute from a specific memory address, relying on absolute addresses for strings and functions, which requires the correct base address during loading.

Most user-space applications on modern desktop systems are position-independent: many toolchains configure clang and gcc to build position-independent executables by default (-fPIE, with -fPIC used for shared libraries), and formats like ELF and PE carry the metadata the loader needs to place them. Conversely, bare-metal programs in resource-constrained environments, such as bootloaders and embedded Linux kernels, are often position-dependent to conserve space and avoid relocation overhead, and raw images frequently place their entry point (or a vector table) at the file’s first byte. For these binaries, reverse engineers need to manually determine the base address for effective analysis.

Without loading a position-dependent binary at the correct base address, pointers meant to reference strings, functions, and data variables will point to incorrect locations. This severely limits analysis.

For example, let’s take this function, which sends a simple "Goodbye, GISTRE" via UART on an STM32F429:

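#include <string.h>
#include "stm32f4xx_hal.h"

extern UART_HandleTypeDef huart1; /* UART handle, initialized elsewhere */

static void gistre_goodbye_interrupt(void)
{
  /* The literal lives in flash; msg holds its absolute address. */
  const char *msg = "Goodbye, GISTRE\r\n";
  HAL_StatusTypeDef status;

  status = HAL_UART_Transmit_IT(&huart1, (const uint8_t *)msg, (uint16_t)strlen(msg));
  (void)status; /* return status intentionally ignored */
}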

If the binary is loaded at the wrong base address, 0x00000000, pointers meant to reference the string at 0x08002d44 instead point to memory regions that aren’t backed by the file.

[Figure: base address comparison]
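The arithmetic behind this failure is simple enough to sketch in C. The helper below is hypothetical (`pointer_to_file_offset` is not taken from any tool): it converts an absolute pointer found in the firmware into an offset within the image file for a given base address, and reports whether the pointer is backed by the file at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Translate an absolute pointer into an offset inside the firmware image,
 * assuming the image is loaded at `base`. Returns false if the pointer
 * falls outside the bytes backed by the file. */
bool pointer_to_file_offset(uint32_t ptr, uint32_t base,
                            size_t image_size, size_t *offset)
{
    assert(offset != NULL);

    if (ptr < base)
        return false;                 /* pointer lies below the image */

    uint32_t delta = ptr - base;
    if (delta >= image_size)
        return false;                 /* pointer lies past the end of the image */

    *offset = delta;
    return true;
}
```

Assuming a 16 Kbyte image for the sake of the example, the string pointer 0x08002d44 resolves to file offset 0x2d44 with base 0x08000000, but with base 0x00000000 the same pointer is about 128 Mbytes past the end of the file.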

Determining the base address in microcontroller emulation is crucial, as it establishes the starting point in memory for loading and executing firmware, directing the processor to its initial instructions and interrupt vectors. Tools like angr’s GirlScout attempt to identify this address by mapping function locations and control flow, while Firmalice leverages jump tables to align absolute jumps. These methods, along with statistical analyses, enable accurate base address detection when binaries lack metadata or symbols.
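To give a feel for what a statistical approach can look like, here is a deliberately simple heuristic in C. It is neither GirlScout's nor Firmalice's algorithm: it is a sketch of the common "pointers should land on strings" idea, with hypothetical names throughout (`guess_base_address`, `score_base`, `MIN_STRING_LEN`). For each candidate base, it counts how many aligned little-endian 32-bit words in the image, interpreted as pointers, land exactly on the start of a NUL-terminated printable string, and it keeps the candidate with the most hits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_STRING_LEN 4  /* assumption: shorter runs are treated as noise */

static int is_text(uint8_t c)
{
    return (c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\n' || c == '\r';
}

/* True if a NUL-terminated text string of at least MIN_STRING_LEN
 * characters starts exactly at offset off. */
static int string_starts_at(const uint8_t *img, size_t size, size_t off)
{
    if (off > 0 && img[off - 1] != 0)
        return 0;                      /* must follow a terminator */
    size_t len = 0;
    while (off + len < size && is_text(img[off + len]))
        len++;
    return len >= MIN_STRING_LEN && off + len < size && img[off + len] == 0;
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Count the aligned words that point at a string start for this base. */
static unsigned score_base(const uint8_t *img, size_t size, uint32_t base)
{
    unsigned hits = 0;
    for (size_t i = 0; i + 4 <= size; i += 4) {
        uint32_t word = read_le32(img + i);
        if (word < base)
            continue;
        uint32_t off = word - base;
        if (off < size && string_starts_at(img, size, off))
            hits++;
    }
    return hits;
}

/* Try every base in [first, last] in increments of step and return the
 * one with the highest score (the first such base on ties). */
uint32_t guess_base_address(const uint8_t *img, size_t size,
                            uint32_t first, uint32_t last, uint32_t step)
{
    assert(img != NULL && step != 0 && first <= last);

    uint32_t best = first;
    unsigned best_score = 0;
    for (uint32_t base = first; ; base += step) {
        unsigned s = score_base(img, size, base);
        if (s > best_score) {
            best_score = s;
            best = base;
        }
        if (last - base < step)        /* stop before base would pass last */
            break;
    }
    return best;
}
```

On real firmware this kind of scoring is noisy, so in practice it would be combined with other evidence, such as the datasheet's memory map or the addresses found in the interrupt vector table.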

Conclusion

Emulating microcontroller architectures is no small feat, and presents a lot of challenges that go beyond the hardware itself. While the first thing that often comes to mind is the hardware emulation, firmware formats are just as critical, even if they don’t always get the attention they deserve. Parsing firmware correctly is a foundational step, as the emulator needs to comprehend the format to load and execute it properly. Proprietary or obscure formats can complicate this process, often leading developers down the path of reverse-engineering unknown structures.

Equally challenging are the memory layout and the types of memory used. Each architecture organizes its memory in unique ways, and this variability forces emulators to adapt in order to correctly map and emulate those configurations. In some cases, as with PIC, the complexity multiplies with specialized or diverse memory segments. These challenges are closely linked to the issue of base addresses, which is often overlooked until it becomes a roadblock. Like most steps, determining the base address requires extensive reading of datasheets and documentation because, without it, the emulator won’t be able to load or execute the firmware properly, rendering the entire emulation process ineffective.

While the topics we’ve covered already highlight some of the challenges that make the creation of a generic emulator far from straightforward, they’re just the beginning (otherwise, we wouldn’t still be here). In the next part, we’ll examine the process of determining the entry point after establishing the base address, and explore how differences in Instruction Set Architecture impact the analysis of program control flow. To better illustrate one of the solutions for managing these complexities in emulation, we will also review relevant sections of the QEMU source code.

Sources

Tools

Binwalk GitHub Repository: Repository for Binwalk, a tool to analyze, extract, and reverse-engineer firmware images.
Binary Ninja: Official website of Binary Ninja, a binary analysis platform used for reverse engineering.
angr GirlScout Analysis: Part of the angr GitHub repository, specifically pointing to the GirlScout analysis component for advanced binary analysis.
Firmadyne GitHub Repository: Repository and documentation for Firmadyne, a framework for emulating and analyzing firmware of embedded Linux-based devices.
QEMU GitHub Repository: The official repository for QEMU, an open-source machine emulator and virtualizer.

ARM

ARM Developer Documentation: ARM’s official documentation covering architecture and programming for ARM-based systems.
Position-Independent Code Bootloader Article: Blog post discussing position-independent code and bootloader development for ARM Cortex-M series.
ARM Image Structure and Entry Points: Guide by ARM on image structure, entry points, and generation for ARM-based firmware.
ARM ELF for the ARM Architecture: Detailed ELF specification for the ARM architecture.
STM32 EEPROM Emulation: Application note by STMicroelectronics detailing methods for emulating EEPROM on STM32 microcontrollers.

AVR

Microchip Developer Site: Microchip’s official site with developer resources for AVR and other microcontrollers.
ATmega328P Datasheet: Datasheet for the ATmega328P, one of the most popular AVR microcontrollers.

PIC

PIC64 Overview: Article discussing Microchip’s PIC64 family, a RISC-V multicore processor.
Microchip PIC64 MPU Documentation: Official Microchip page for PIC64.
Serge Vakulenko’s QEMU: Wiki for QEMU with information on emulation, including specific considerations for PIC microcontrollers.
PIC32 Memory Organization Overview: Documentation covering the memory organization of the PIC32 microcontroller family.
PIC32 Exception Mechanism and Entry Points: Overview of exception handling and entry points for the PIC32 series.
PIC32 Family Reference Manual: Reference manual for the PIC32 family of microcontrollers.

Other

Mastering the GNU linker script: Blog post about writing and optimizing GNU linker scripts (self-explanatory, really).
Challenges in Firmware Re-Hosting, Emulation, and Analysis: Overview of system emulation and firmware re-hosting, outlining common challenges, classification methods, and tools to aid practitioners and researchers in selecting and applying suitable emulation techniques.
Oracle’s ELF Documentation: Oracle’s documentation on ELF files, providing insights into ELF file format specifics.