Initial patches for ARM64EC (Windows 11) now posted

ARM64EC is a new Windows ABI designed to allow running native ARM64 code in the same process as x64 emulated code. Basic documentation is available in “Understanding Arm64EC ABI and assembly code” on Microsoft Docs.

I’ve posted an initial series of patches, starting with D125411 ([ARM64EC 1/?] Add parsing support to llvm-objdump/llvm-readobj), which lays the groundwork for supporting this ABI in LLVM and clang. The major piece missing from those patches is support for what the ABI calls “entry thunks” and “exit thunks”; I’m planning to work on that next.

I expect Microsoft will make more compiler-oriented documentation available at some point, but I’m not sure when. (In particular, the hybmp$x section, which describes the thunks emitted by the compiler, currently isn’t documented at all.)

Please ask here if you have any general questions about this effort, or if you want to coordinate work.

Thanks for looking into this! I have a general interest in the area, but I haven’t really had time to look into it properly, so I don’t have much concrete to add. Thanks for the link to the documentation; when ARM64EC was initially released, those docs weren’t available yet.

Hi efriedma, I am also interested in this. I see you have already sent 10 patches.
How could I help with this work? Is there anything that hasn’t been done yet?

D126811 currently has a few missing pieces:

  • We don’t currently try to share thunks between different calls. I’m currently looking into this. (The Microsoft compiler implements a name mangling scheme, but I’m not sure that’s actually necessary in practice; they probably get folded together across modules by ICF anyway.)
  • Thunks for varargs calls are not implemented. I haven’t really looked closely at this. The code sequence required is a little complicated; there’s a pointer to the varargs arguments in x4, and the thunk has to allocate a variable amount of space for them, and copy them. Not planning to look at this soon; this might be something to look at if you want to jump in. Maybe a little complicated if you’re not familiar with ISel. (The code to emit the copy probably ends up in AArch64TargetLowering::LowerCall.)
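
As a rough illustration of the thunk sharing mentioned in the first bullet: if a thunk depends only on the signature of the call, a compiler could keep one thunk per signature and reuse it. The sketch below is not taken from the patches; the class, the type strings, and the `exit_thunk_N` names are all hypothetical placeholders (in particular, this is not MSVC’s mangling scheme).

```python
from typing import Dict, Sequence, Tuple

# (return type, parameter types); the type spellings are illustrative only.
Signature = Tuple[str, Tuple[str, ...]]


class ExitThunkCache:
    """Hands out one thunk name per distinct call signature (hypothetical)."""

    def __init__(self) -> None:
        self._thunks: Dict[Signature, str] = {}

    def get_or_create(self, ret: str, params: Sequence[str]) -> str:
        key: Signature = (ret, tuple(params))
        name = self._thunks.get(key)
        if name is None:
            # Placeholder naming; a real scheme would encode the signature.
            name = f"exit_thunk_{len(self._thunks)}"
            self._thunks[key] = name
        return name

    def __len__(self) -> int:
        return len(self._thunks)
```

Two calls with the same signature get the same thunk name, so only one thunk body would need to be emitted per module; folding across modules would still be left to something like ICF.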

The big missing piece is thunks for direct calls and function definitions. The hard part isn’t really the thunks themselves; the issue is how they’re represented in object files. The information is supposed to be encoded in the hybmp$x section. The basic idea is pretty straightforward: the section is a table. Each row of the table contains the symbol index of a function, the symbol index of its corresponding thunk, and a flag indicating what kind of thunk it is.

I was hoping for some documentation on the hybmp$x section before I start diving too deeply into that; the basic idea is simple, but there are a lot of complicated details, and I was having trouble figuring out why MSVC does exactly what it does. But not sure when that’s going to happen. (I have some contacts at Microsoft who are looking into it, but not sure when I’m going to hear back.) I might start on this without documentation; I can probably come up with something mostly working without the docs.

Entry thunks require support for treating additional registers as callee-save. This depends on the save_any_reg unwind code; this should be easy to implement, but the exact encoding of save_any_reg isn’t documented yet.

Beyond that, I also need to look into adjustor thunks at some point, but that’s probably not critical for most uses of arm64ec.

I think that covers all the significant features I know about that still need to be implemented?

Thanks for the explanation. I’m not sure whether I can handle it, but I will try to start with varargs call support.

Microsoft has posted some additional documentation. (Actually, it looks like they posted it a few weeks back, but I didn’t notice.)

(Also available as arm64ec-windows-abi-conventions.md in the MicrosoftDocs/cpp-docs repository on GitHub; the tables are a bit easier to read there.)

Most of this is already represented in the patches I posted, but it summarizes a bunch of information in a convenient way. Notes on new stuff:

  • The bits about special behavior related to “blr x16” are new to me.
  • I don’t think I’ve ever seen MSVC use __os_arm64x_check_call.
  • It looks like they decided not to document __os_arm64x_dispatch_icall/__os_arm64x_dispatch_icall_cfg. Not sure why.

Some more details about arm64ec, with assembly code examples for the thunks.

Thanks for your work! I’m interested in helping with those efforts as well. We would like to use it for the Wine project (www.winehq.org). I noticed that you concentrated on compiler support, so I started looking at llvm-lib support in the meantime, and I was thinking about looking at the linker next. Please let me know if there is a better way to contribute. Admittedly I’m new to LLVM, so it may take me a while to get up to speed.

On the compiler side, dealing with all the edge cases for thunks and symbol mangling and aliases has turned out to be more complicated than I initially expected. I’ve been continuing work on it on an internal branch. I think I’ve got it mostly complete at this point… I’ll try to post an update in the next couple weeks or so.

I haven’t looked at the linker side of it at all, and I don’t have any immediate plans in that direction; I’ve just been treating MSVC link as a black box (and cursing at uninformative “internal error” messages). A couple of notes about undocumented stuff you’re likely to run into:

  • ARM64EC uses IMAGE_WEAK_EXTERN_ANTI_DEPENDENCY heavily; implementing that is probably a good first step for anything linker-related, since it’s mostly orthogonal to everything else. (I’m not sure what the exact semantics are.)
  • The association between functions and their corresponding thunks is established using a table in the section hybmp$x. Each row in the table is three 32-bit values; the first is the symbol index of the function, the second is the symbol index of the associated thunk, the third is a flag indicating what kind of thunk it is. 1 is an entry thunk, 4 is an exit thunk, 0 is a runtime code patching thunk.
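
To make the table layout in the second bullet concrete, here is a minimal sketch of a reader for the raw contents of a hybmp$x section. It assumes the three 32-bit values are unsigned and little-endian, which the description above does not state; the function and field names are made up for illustration.

```python
import struct
from typing import List, NamedTuple

# Flag values as described above; the labels are informal.
THUNK_KINDS = {
    0: "runtime code patching thunk",
    1: "entry thunk",
    4: "exit thunk",
}

# Assumption: each row is three unsigned little-endian 32-bit values.
ROW = struct.Struct("<III")


class HybmpRow(NamedTuple):
    function_symbol: int  # symbol table index of the function
    thunk_symbol: int     # symbol table index of the associated thunk
    kind: int             # flag indicating what kind of thunk it is


def parse_hybmp(section: bytes) -> List[HybmpRow]:
    """Split raw section bytes into 12-byte rows."""
    if len(section) % ROW.size != 0:
        raise ValueError("section size is not a multiple of the row size")
    return [HybmpRow(*fields) for fields in ROW.iter_unpack(section)]


def describe(row: HybmpRow) -> str:
    """Return a readable label for the row's thunk kind."""
    return THUNK_KINDS.get(row.kind, f"unknown ({row.kind})")
```

For example, `parse_hybmp(struct.pack("<III", 7, 12, 1))` yields a single row linking symbol 7 to thunk symbol 12 as an entry thunk. Resolving the indices to names would require the object file’s symbol table, which is out of scope here.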

Let me know if you have any other questions I might be able to answer.