Introduction

On a Mac with Apple Silicon, which is based on the ARM64 architecture, compatibility is a major challenge for applications built for Windows on x86 hardware. Many Windows applications were never compiled for other platforms. Moving away from the x86 hardware of typical Windows computers therefore makes it harder to use the software you want, because it cannot run natively.

Running Windows x86 applications on an ARM64 macOS system requires several translation layers, covering CPU instructions, system calls and the API the application expects. Fortunately, Apple and CodeWeavers have already tackled this problem, and we can analyze their software to understand the solutions they provide.

This article offers a technical analysis of the different tools and techniques used to enable this “cross-execution”.

Topics

What will be covered

In this article, we will explore:

  • An overview of ARM chips’ history and what motivated Apple to make their transition
  • The architectural differences between x86 and ARM
  • The challenge of running Windows x86 applications on ARM-based macOS systems
  • The inner workings of API reimplementation and system call translation
  • How the translation layers work together in existing applications
  • The limitations of these tools

What won’t be covered

This article is not about:

  • Emulation
  • Virtual machines
  • Development or compilation of ARM-based software
  • Software cracking

Target audience

This article is aimed at computer science students, software engineers, system programmers and anyone interested in low-level software and computer architecture.

A working knowledge of CPUs and operating systems will help a lot in following the discussion. Familiarity with x86 and ARM instructions, ABIs, the Windows API and system programming will help you fully grasp the content of the article.

A short history of ARM chips

The first ARM processors were designed in the 1980s, and Arm Ltd. was founded in 1990. Built for efficiency in low-power systems, ARM chips are widely used in small embedded devices. They have become powerful enough over the past decade that Apple announced in June 2020 that it would move its Macs from the Intel x86-64 architecture to its own ARM64 Apple Silicon, starting with the M1 chip released later that year.

This change brought better performance and energy efficiency to Macs, but also compatibility issues with Apple’s own older software and, even more so, with third-party software. Apple anticipated the backward compatibility problem and created Rosetta 2 to smooth the transition.

Since then, Apple has pushed developers, with limited success, to make their software ARM compatible. Five years later, a large portion of applications is still not built for ARM macOS, and many others rely on Rosetta 2 to be usable.

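In June 2025, Apple announced the phase-out of Rosetta 2: it is to remain available as a general-purpose tool through macOS 27, after which only a subset is planned to be kept for older games that rely on Intel-based frameworks.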

Why would one want to run x86 builds on ARM

As the most widely used architecture for desktop and laptop computers, x86 comes with a whole software ecosystem in which many applications complement each other. Many users are accustomed to this ecosystem and know what behavior to expect when using several applications together. Because ARM computers cannot run all of this software natively, the ecosystem cannot be reproduced, and someone used to having any software available loses productivity while adapting and finding alternatives. Some critical software may have no equivalent built for ARM at all, which can prevent the user from completing a given task.

If an unexpected need arises for an application that was only built for x86, switching to a different system just to use one piece of software is a hassle. Virtual machines that emulate x86 hardware are an alternative, but a very resource-intensive one, and communication between the emulated machine and the host is not seamless.

Running a non-native application as if it were native, without changing anything on your system, may seem almost too good to be true, but it is clearly the better alternative.

Why we need tools

Differences between x86 and ARM architectures

The x86 architecture is a complex instruction set computer (CISC) architecture, while ARM is a reduced instruction set computer (RISC) architecture. As a result, they work differently from each other:

  • x86 instructions have variable lengths, while ARM64 instructions have a fixed length of 4 bytes
  • ARM64 has a larger set of general-purpose registers than x86-64
  • The calling conventions differ in how parameters are passed
  • Memory alignment rules differ between x86 and ARM, and violating them can cause crashes
  • Modern x86 and ARM cores both execute several instructions at once, but x86 guarantees a stronger memory ordering between threads than ARM does by default
  • ARM designs have historically prioritized power efficiency, while x86 designs have prioritized raw performance
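To make the calling convention difference concrete, here is a minimal sketch in Python that shows where the first integer arguments of a call end up under three 64-bit conventions. The register lists reflect the published ABIs. The helper `place_arguments` is hypothetical and ignores floating-point arguments, structures and the stack shadow space of the Windows convention.

```python
# Integer-argument registers for three 64-bit calling conventions.
# The helper below is an illustrative simplification, not a full ABI model.
INT_ARG_REGISTERS = {
    "windows_x64": ["rcx", "rdx", "r8", "r9"],
    "sysv_x86_64": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    "aapcs64": ["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"],
}


def place_arguments(convention, args):
    """Map each integer argument to a register or to a stack slot."""
    registers = INT_ARG_REGISTERS[convention]
    placement = {}
    for index, value in enumerate(args):
        if index < len(registers):
            placement[registers[index]] = value
        else:
            # Arguments that do not fit in registers are passed on the stack.
            placement[f"stack[{index - len(registers)}]"] = value
    return placement


if __name__ == "__main__":
    for name in INT_ARG_REGISTERS:
        print(name, place_arguments(name, [10, 20, 30, 40, 50]))
```

With five arguments, the Windows x64 convention already spills the fifth one to the stack, while the ARM64 convention still has free registers. A translation layer has to move values accordingly whenever translated code calls into native code.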

Problems encountered with cross-execution

Running code on a different architecture without recompiling it raises several issues:

  • ARM chips cannot decode x86 machine code
  • ABI incompatibility can corrupt the stack
  • The system call conventions, parameters and numbers differ because of both the processor and the operating system
  • Some x86 instructions simply don’t exist in ARM
  • Windows binaries rely on DLLs and Windows API functions that have no direct counterpart on macOS

The Windows NT architecture

The following drawing represents the Windows NT architecture. Later versions of Windows, such as Windows Server, Windows XP and Windows 7 and above, were built upon it. We will use this architecture as a reference, because it is representative of how Windows systems work while remaining simple.

We will focus on the core system DLLs and NTDLL, which are explained in later parts. The two top layers are provided with the application, while the two bottom layers belong to the domain of emulation and are not relevant here.

+---------------------+                      \
|     Windows EXE     |                       } application
+---------------------+                      /

+---------+ +---------+                      \
| Windows | | Windows |                       \ application & system DLLs
|   DLL   | |   DLL   |                       /
+---------+ +---------+                      /

+---------+ +---------+     +-----------+   \
|  GDI32  | |  USER32 |     |           |    \
|   DLL   | |   DLL   |     |           |     \
+---------+ +---------+     |           |      \ core system DLLs
+---------------------+     |           |      / (on the left side)
|    Kernel32 DLL     |     | Subsystem |     /
|  (Win32 subsystem)  |     |Posix, OS/2|    /
+---------------------+     +-----------+   /

+---------------------------------------+
|               NTDLL.DLL               |
+---------------------------------------+

+---------------------------------------+     \
|               NT kernel               |      } NT kernel (kernel space)
+---------------------------------------+     /
+---------------------------------------+     \
|       Windows low-level drivers       |      } drivers (kernel space)
+---------------------------------------+     /

How can we make it work

To solve this problem, we use multiple layers of translation:

  • Convert x86-64 machine code into ARM64 machine code
  • Adjust the memory layout and calling conventions
  • Provide the same operating system functions as Windows
  • Map the system calls made by x86 code to their equivalents on the ARM host

System call translation

System calls in Windows

System calls are made through a DLL (dynamic-link library) called ntdll.dll, named after Windows NT (New Technology). It contains the functions that other DLLs call whenever they need a system call. Like the other DLLs of an x86 Windows installation, it is an x86 binary. ntdll.dll is in direct contact with the kernel and is the layer that actually makes the system calls. Its functions can also be called directly by user code, or in our case, by the application. Inside ntdll.dll, each function loads the system call’s number into a register, places its parameters where the kernel expects them, and then executes the syscall instruction.
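As an illustration, the following Python sketch recognizes the classic shape of a 64-bit ntdll.dll system call stub (mov r10, rcx; mov eax, number; syscall; ret) and extracts the system call number from it. The byte pattern is the commonly observed one, but real stubs vary between Windows builds and may contain extra instructions. The function name and the number 0x0F in the example are assumptions made for the demonstration.

```python
import struct

# Encoding of "mov r10, rcx" followed by the opcode of "mov eax, imm32".
STUB_PREFIX = bytes([0x4C, 0x8B, 0xD1, 0xB8])
SYSCALL_OPCODE = bytes([0x0F, 0x05])


def syscall_number_from_stub(stub):
    """Return the syscall number of a classic stub, or None if it does not match."""
    if len(stub) < 8 or stub[:4] != STUB_PREFIX:
        return None
    if SYSCALL_OPCODE not in stub[8:]:
        return None
    # The 32-bit immediate of "mov eax, imm32" is stored in little-endian order.
    (number,) = struct.unpack_from("<I", stub, 4)
    return number


# Hypothetical stub using system call number 0x0F, ending with "syscall; ret".
example_stub = STUB_PREFIX + struct.pack("<I", 0x0F) + SYSCALL_OPCODE + bytes([0xC3])

if __name__ == "__main__":
    print(hex(syscall_number_from_stub(example_stub)))
```

A layer that catches system calls at runtime needs this number to decide which host behavior to run in place of the Windows one.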

How to translate system calls

To translate system calls, you either catch them at runtime when the original ntdll.dll is used, or reimplement ntdll.dll so that it behaves the way the other libraries expect.

Runtime system call catch

The system call numbers and parameters used by x86 Windows code are not those of the ARM64 macOS host. When the original ntdll.dll issues a system call, the host does not necessarily detect an error. Instead, the number may select a different system call than the one intended. There are two ways to deal with this problem:

  • If the system call has a direct equivalent on the host, simply mapping the x86 syscall number to the right host number is an efficient and cheap solution. For example, getpid takes no parameters and behaves the same on both sides, so mapping the system call number is enough.
  • If it has no direct equivalent, the behavior of the original syscall must be reimplemented. For example, mmap is much stricter on ARM macOS than on x86, where flags like PROT_EXEC, MAP_JIT and MAP_ANON determine what the kernel allows. The reimplementation of mmap needs to inspect the flags and parameters passed to the syscall and set or clear other flags where needed to match the behavior of the x86 syscall. Other system calls may require several native syscalls, or a choice between several candidates. A hook then needs to be installed so that the reimplementation runs whenever the mmap system call, number 197 on macOS (0xC5 in hexadecimal), is issued by the translated code.

This approach needs few updates, since the host system calls themselves rarely change. They are also well documented, which makes the translation easier to implement.
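The two cases above can be sketched as a small dispatcher in Python. This is a simplified model under stated assumptions, not how Rosetta 2 is actually implemented: the numbers 20 (getpid) and 197 (mmap) and the flag values are the macOS ones, the rule "writable and executable memory needs MAP_JIT" is an assumption about the host policy, and the function names are hypothetical. The mmap handler only returns the adjusted flags instead of calling the kernel.

```python
import os

# macOS protection and mapping flag values.
PROT_WRITE = 0x02
PROT_EXEC = 0x04
MAP_JIT = 0x0800

# macOS BSD system call numbers, used here as the guest numbers to catch.
GUEST_GETPID = 20
GUEST_MMAP = 197

# Case 1: system calls with a direct host equivalent are simply forwarded.
DIRECT_MAP = {GUEST_GETPID: os.getpid}


def fix_mmap_flags(prot, flags):
    """Case 2: add MAP_JIT when the guest asks for writable and executable memory."""
    if (prot & PROT_EXEC) and (prot & PROT_WRITE):
        flags |= MAP_JIT
    return prot, flags


def handle_syscall(number, *args):
    """Dispatch a caught system call to a direct mapping or a reimplementation."""
    if number in DIRECT_MAP:
        return DIRECT_MAP[number](*args)
    if number == GUEST_MMAP:
        prot, flags = args
        # A real layer would now call the host mmap with the adjusted flags.
        return fix_mmap_flags(prot, flags)
    raise NotImplementedError(f"no translation for system call {number}")


if __name__ == "__main__":
    print(handle_syscall(GUEST_GETPID))
    print(handle_syscall(GUEST_MMAP, PROT_WRITE | PROT_EXEC, 0x1002))
```

The final `NotImplementedError` branch is where an untranslatable system call would surface as an exception in the running application.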

Reimplement NT DLL

Reimplementing the NT DLL is similar to reimplementing the Windows API, except that it is only concerned with which host system calls to use when its functions are called. The goal is to reproduce the exact behavior of the original ntdll.dll with the host’s system calls.

NT DLL reimplementation allows very wide and precise coverage of the original Windows NT DLL. A well-made reimplementation can therefore be very stable and produce few errors.
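As a hedged illustration of the idea, the Python sketch below models a tiny part of a reimplemented NT layer: a handle table that maps Windows-style handles to host file descriptors, with a close operation that returns NTSTATUS codes. STATUS_SUCCESS and STATUS_INVALID_HANDLE are real NTSTATUS values. The class and method names are hypothetical, and a real reimplementation tracks many more object types than files.

```python
import os

STATUS_SUCCESS = 0x00000000
STATUS_INVALID_HANDLE = 0xC0000008


class HandleTable:
    """Maps Windows-style handles to host file descriptors."""

    def __init__(self):
        self._descriptors = {}
        self._next_handle = 4  # Windows handles are typically multiples of 4

    def add(self, descriptor):
        """Register a host descriptor and return the handle given to the guest."""
        handle = self._next_handle
        self._next_handle += 4
        self._descriptors[handle] = descriptor
        return handle

    def nt_close(self, handle):
        """Close a handle with the host system call and report an NTSTATUS."""
        descriptor = self._descriptors.pop(handle, None)
        if descriptor is None:
            return STATUS_INVALID_HANDLE
        os.close(descriptor)
        return STATUS_SUCCESS


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    table = HandleTable()
    handle = table.add(read_end)
    print(hex(table.nt_close(handle)))  # first close succeeds
    print(hex(table.nt_close(handle)))  # second close reports an invalid handle
    os.close(write_end)
```

The application only ever sees handles and NTSTATUS values, while the host kernel only ever sees its own descriptors and system calls.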

Limitations

When system calls are caught and redirected, or replaced by a behavior meant to match the x86 one, the information needed to choose the right host syscall might not be available. The parameters passed to the NT functions are not accessible at that point, and there may be no way of knowing exactly what the application was trying to do. As a result, some system calls cannot be redirected perfectly when caught, which can cause exceptions and unwanted behavior. With this solution, there are applications you simply cannot run.

Reimplementing the NT DLL is a very difficult task. Its interface is largely undocumented, because applications are not meant to call it directly and Microsoft wants to protect its operating system from being copied. It is also subject to many updates and fixes, so knowledge of how a function works is only valid for a time. Overall, this solution requires continuous hard work to function properly.

Windows API reimplementation

Purpose of Windows API

The Windows API is exposed through .dll files that are linked at runtime. The principal DLLs are kernel32.dll, user32.dll and gdi32.dll, none of which is in direct contact with the kernel. Most applications made for Windows communicate with the operating system through these libraries. They serve as an abstraction layer between the application and the NT DLL. Their many uses include window management, reading and writing files, registry access and memory allocation. Without them, applications would try to call functions that don’t exist.

The documentation on how to use these functions, as well as their behavior, is specified in the Windows API list.

How the reimplementation is done

The objective of Windows API reimplementation is to replicate the exact behavior of the functions provided by Windows. The .dll files must be rewritten so that they still work as dynamic libraries and provide everything used by the program you want to run.

By doing this, the application can run as if it were on its intended operating system. Graphics APIs follow the same principle: for example, Direct3D calls can be translated to Vulkan using libraries like DXVK.
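The requirement to provide everything the program uses can be pictured as an import resolution step. In the Python sketch below, GetCurrentProcessId and GetTickCount are real kernel32.dll exports, but their bodies are deliberately simplified stand-ins, and the `EXPORTS` table and `resolve_imports` function are hypothetical names chosen for the illustration.

```python
import os
import time


def GetCurrentProcessId():
    """Stand-in for the kernel32 function, backed by the host process id."""
    return os.getpid()


def GetTickCount():
    """Simplified stand-in: milliseconds from a monotonic clock, as a 32-bit value."""
    return int(time.monotonic() * 1000) & 0xFFFFFFFF


# Functions offered by the reimplemented libraries, keyed by DLL name.
EXPORTS = {
    "kernel32.dll": {
        "GetCurrentProcessId": GetCurrentProcessId,
        "GetTickCount": GetTickCount,
    },
}


def resolve_imports(imports):
    """Split the (dll, function) pairs an application imports into found and missing."""
    resolved, missing = {}, []
    for dll, name in imports:
        key = (dll.lower(), name)
        function = EXPORTS.get(key[0], {}).get(name)
        if function is None:
            missing.append(key)
        else:
            resolved[key] = function
    return resolved, missing


if __name__ == "__main__":
    wanted = [("KERNEL32.dll", "GetCurrentProcessId"), ("kernel32.dll", "CreateFileW")]
    found, not_found = resolve_imports(wanted)
    print(sorted(found), not_found)
```

Every entry left in the missing list is a function the application may call at some point and not find, which is why coverage of the API matters so much.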

Limitations

Reimplementing the API requires understanding the inner workings of each function using only the official documentation and tests of its observable behavior. The documentation is sometimes incomplete, however, and copying Microsoft’s implementation would be illegal.

Applications often rely on undocumented or undefined behavior. By definition, it is absent from the official documentation, yet it is known and widely used by Windows software developers, so the reimplementation must reproduce the same quirks as the official API. Without them, many applications relying on such workarounds would most likely misbehave.

The Windows API is also updated regularly, so the reimplementation must keep up with a long and growing list of functions, each with version-specific quirks to reproduce when an application targets a different version of the API.

Binary Translation

Binaries for x86 and ARM are not the same: they do not share instruction names, registers or addressing modes. They also use different condition flags, and their stacks do not behave the same way.

To translate an x86 binary to ARM, the translator parses it to isolate the code that needs to change and finds the right replacement for it. Here is an example of how x86-64 code can be translated into ARM64 code:

x86-64

    mov eax, [rbp-0x4]      ; Copy a value from the stack in eax
    add eax, 1              ; Add 1 to the value
    mov [rbp-0x4], eax      ; Put the new value into the stack
    ret                     ; return

ARM64

    ldr w0, [x29, #-4]      ; Copy a value from the stack in w0 (x29 is the frame pointer in arm)
    add w0, w0, #1          ; Add 1 to the value
    str w0, [x29, #-4]      ; Put the new value into the stack
    ret                     ; return
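The mapping shown above can be mimicked by a toy translator written in Python. It works on assembly text rather than on machine code, knows only the two registers and three instruction forms of the example, and ignores the differences a real translator must handle, such as condition flags or the fact that the x86 ret reads its return address from the stack while the ARM64 ret uses a link register. All names in it are hypothetical.

```python
import re

# Register correspondence used by the example above.
REGISTER_MAP = {"eax": "w0", "rbp": "x29"}

# Matches memory operands such as [rbp-0x4] or [rbp+8].
MEMORY_OPERAND = re.compile(r"\[(\w+)([+-])(0x[0-9a-fA-F]+|\d+)\]")


def translate_memory(operand):
    """Turn an x86 memory operand into an ARM64 base-plus-offset operand."""
    match = MEMORY_OPERAND.fullmatch(operand)
    if match is None:
        raise ValueError(f"unsupported memory operand: {operand}")
    base, sign, offset = match.groups()
    value = int(offset, 0)
    if sign == "-":
        value = -value
    return f"[{REGISTER_MAP[base]}, #{value}]"


def translate(line):
    """Translate one x86-64 instruction into a list of ARM64 instructions."""
    line = line.strip()
    if line == "ret":
        return ["ret"]
    mnemonic, operands = line.split(None, 1)
    destination, source = [part.strip() for part in operands.split(",")]
    if mnemonic == "mov" and source.startswith("["):
        return [f"ldr {REGISTER_MAP[destination]}, {translate_memory(source)}"]
    if mnemonic == "mov" and destination.startswith("["):
        return [f"str {REGISTER_MAP[source]}, {translate_memory(destination)}"]
    if mnemonic == "add" and source.isdigit():
        register = REGISTER_MAP[destination]
        return [f"add {register}, {register}, #{source}"]
    raise ValueError(f"unsupported instruction: {line}")


if __name__ == "__main__":
    program = ["mov eax, [rbp-0x4]", "add eax, 1", "mov [rbp-0x4], eax", "ret"]
    for instruction in program:
        for translated in translate(instruction):
            print(translated)
```

Any instruction outside this tiny subset raises an error, which is the toy equivalent of an instruction the translator does not support.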

Binary translation lets x86 binaries run without emulating x86 hardware, which would consume much more power. Some programs generate their own code while running. For these, translating binary code in real time is essential. This is called JIT (Just-In-Time) translation.

Limitations

x86 instructions don’t always have an ARM equivalent, and some need several ARM instructions to reproduce the original behavior. This growth in instruction count can make the application run slower.

Some instructions cannot be reproduced because of hardware differences. Since they are not translated, they can crash the program. Such instructions are said to be binary incompatible.

JIT translation slows execution down the first time new, untranslated code is encountered. This can cause stutters or freezes when the new code is long or complex to translate.
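The first-encounter cost can be illustrated with a translation cache, sketched here in Python. The class is a hypothetical model: a real JIT caches machine code and has to detect by itself when a program overwrites code that was already translated.

```python
class TranslationCache:
    """Caches translated blocks by guest address so each block is translated once."""

    def __init__(self, translate_block):
        self._translate_block = translate_block
        self._cache = {}
        self.misses = 0  # number of times a translation had to be performed

    def lookup(self, address, code):
        """Return the translated block, translating it on the first encounter only."""
        if address not in self._cache:
            self.misses += 1  # slow path: this is where the stutter comes from
            self._cache[address] = self._translate_block(code)
        return self._cache[address]

    def invalidate(self, address):
        """Forget a block, for example after the program rewrote its own code."""
        self._cache.pop(address, None)


if __name__ == "__main__":
    # A stand-in translator that only upper-cases the text of the block.
    cache = TranslationCache(lambda code: code.upper())
    cache.lookup(0x1000, "ret")
    cache.lookup(0x1000, "ret")
    print(cache.misses)
```

Only the first lookup of an address pays the translation cost. Later lookups are served from the cache until the block is invalidated.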

Translation layers structure

       +--------------------------------------+
       |       x86 Windows Application        |
       +--------------------------------------+
                          |
                   Binary Translation
                          |
   +----------------------------------------------+
   |     Translates x86 instructions to ARM64     |
   +----------------------------------------------+
                          |
             Now executing the ARM64 code
                          |
            Windows API Reimplementation
                          |
     +-------------------------------------------+
     |       Handles Windows API functions       |
     +-------------------------------------------+
                          |
      The reimplemented API can invoke syscalls
                          |
               System Call Translation
                          |
  +------------------------------------------------+
  | Translates Windows x86 syscalls to macOS ARM64 |
  +------------------------------------------------+
                          |
                          v
                     macOS kernel

Existing applications

In practice, existing tools already perform these translations:

  • Rosetta 2: Made by Apple for its transition to the ARM architecture. It handles binary translation from x86-64 to ARM64, as well as catching and remapping system calls at runtime. It detects when an application contains x86-64 code and translates it without any action from the user.
  • Wine and CrossOver: Wine is an open source project, and CrossOver is a commercial product built on it by CodeWeavers, one of Wine’s main contributors. Both run Windows applications on other systems by providing reimplementations of the Windows API and of the NT DLL, so that an application run through them finds all the libraries it needs. They are also very flexible and let the user adjust their behavior to the application’s needs. For example, choosing between the reimplemented NT DLL and the official Windows one, with Rosetta handling the syscall translation, can change the application’s behavior considerably.

Conclusion

Apple’s shift from x86 to ARM hardware has improved the power efficiency and performance of its computers, but it has also created a compatibility problem with software made for x86 architectures.

This technical problem was met with a layered solution: binary translation, system call translation and Windows API reimplementation. These layers work together to give applications the environment they expect, while letting the macOS kernel understand what the application needs.

This approach still has limits. An application can be too complex, rely on specific hardware features, or depend on incomplete parts of the API reimplementation.

The best way to run software would be native ARM support. Nevertheless, translation layers ease the wait for native ARM applications and let us run older software that will never get an ARM build.

Running x86 Windows binaries on ARM64 macOS is a challenge, but also an opportunity to be more flexible about the systems we use in the future.

Bibliography

About the Rosetta translation environment
Why is Rosetta 2 fast?
Wine Architecture Overview
Wine Kernel modules
Windows API list