LLVM + MASM (llvm-ml)

Hi all,

Just a quick update on the llvm-ml project. As a refresher, this was a proposal to add native MASM support to LLVM’s assembly capabilities, as part of supporting cross-platform Windows compilation.

A large set of directives are now supported, with a growing suite of tests. After the next outstanding chain of commits (tip) adds macro procedures and macro functions (including local symbols), this will include most of the headline features for the language. (STRUCTs have been in place for some time now, including type inference for size-checking.) Features not yet in place include:

  • RECORDs,
  • ASSUME,
  • the ALIGN/ORG/EVEN directives,
  • the GOTO directive (within macros),
  • anonymous labels, and
  • built-in macros (and the string-manipulation directives).

Anyone else interested in reviewing some of the commits around this project?

Thanks,

  • Eric

Hi,

Just a quick update on the llvm-ml project. As a refresher, this was a
proposal to add native [MASM][1] support to LLVM's assembly capabilities, as
part of supporting cross-platform Windows compilation.

[1]: https://docs.microsoft.com/en-us/cpp/assembler/masm/microsoft-macro-assembler-reference?view=vs-2019

A large set of directives are now supported, with a growing suite of tests.
After the next outstanding chain of commits ([tip][2]) adds macro procedures
and macro functions (including local symbols), this will include most of the
headline features for the language. (STRUCTs have been in place for some
time now, including type inference for size-checking.) Features not yet in
place include:
* RECORDs,
* ASSUME,
* the ALIGN/ORG/EVEN directives,
* the GOTO directive (within macros),
* anonymous labels, and
* built-in macros (and the string-manipulation directives).
[2]: https://reviews.llvm.org/D89741

Anyone else interested in reviewing some of the commits around this project?

I recently became interested in this project when I tried building the OpenMP runtimes for Windows, which contain one source file in MASM format.

From very brief attempts at assembling the source file [1] with llvm-ml, I noticed that it lacked some sort of preprocessing that the source used, among a few other issues.

CMake also ends up adding a few parameters using forward slashes, while it seems like llvm-ml currently only accepts parameters with dashes. Handling both (like clang-cl and lld/COFF do, among others) probably would require rewriting the option handling using the llvm/Option framework, like those tools do.
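To make the "handle both prefixes" idea concrete, here is a minimal Python sketch of a parser that treats `-` and `/` as interchangeable option prefixes. It is not llvm-ml's code and not the llvm/Option API; the option table and the function names (`KNOWN_OPTIONS`, `match_option`, `parse_args`) are hypothetical, and only a handful of the options from the command line in this thread are modelled.

```python
# Hypothetical option table: name -> True if the option takes a value.
KNOWN_OPTIONS = {"c": False, "Fo": True, "D": True, "I": True}


def match_option(body):
    """Return the known option name that `body` starts with, or None."""
    # Try longer names first so "Fo" is not shadowed by a one-letter option.
    for name in sorted(KNOWN_OPTIONS, key=len, reverse=True):
        if KNOWN_OPTIONS[name]:
            if body.startswith(name):
                return name
        elif body == name:
            return name
    return None


def parse_args(argv):
    """Split argv into (options, inputs), accepting '-' or '/' as prefix."""
    options, inputs = [], []
    i = 0
    while i < len(argv):
        arg = argv[i]
        name = match_option(arg[1:]) if arg[:1] in ("-", "/") else None
        if name is None:
            # Not a recognised option: treat it as an input file.
            # Caveat: a path such as "/Dir/x.asm" would match "D" here.
            inputs.append(arg)
        elif KNOWN_OPTIONS[name]:
            value = arg[1 + len(name):]
            if not value:
                # Value is given as the next argument ("-D FOO", "/Fo out").
                i += 1
                if i >= len(argv):
                    raise ValueError("missing value for option " + arg)
                value = argv[i]
            options.append((name, value))
        else:
            options.append((name, None))
        i += 1
    return options, inputs
```

With this sketch, `parse_args(["-c", "x.asm"])` and `parse_args(["/c", "x.asm"])` give the same result. The comment in the code points at the usual cost of accepting `/`: absolute Unix-style paths can be mistaken for options.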

An example command for building it looks like this (in llvm-project/openmp/build):

<tool> -Domp_EXPORTS -Iruntime/src -I../runtime/src -I../runtime/src/i18n -I../runtime/src/include -I../runtime/src/thirdparty/ittnotify -D _CRT_SECURE_NO_WARNINGS -D _CRT_SECURE_NO_DEPRECATE -D _WINDOWS -D _WINNT -D _WIN32_WINNT=0x0501 -D _USRDLL -win64 -D_M_AMD64 -DOMPT_SUPPORT=0 /c /Fo runtime/src/CMakeFiles/omp.dir/z_Windows_NT-586_asm.asm.obj ../runtime/src/z_Windows_NT-586_asm.asm

When building for i386, you'd also see the parameters "/coff" and "/safeseh" added on the command line [2].

[1] https://github.com/llvm/llvm-project/blob/master/openmp/runtime/src/z_Windows_NT-586_asm.asm

[2] https://github.com/llvm/llvm-project/blob/master/openmp/runtime/cmake/LibompHandleFlags.cmake#L74-L82

// Martin

CMake also ends up adding a few parameters using forward slashes, while it
seems like llvm-ml currently only accepts parameters with dashes. Handling
both (like clang-cl and lld/COFF do, among others) probably would require
rewriting the option handling using the llvm/Option framework, like those
tools do.

Yes, this is a known TODO :slight_smile:

In fact… I’ve just uploaded the follow-up commits to Phabricator that switch llvm-ml to use Option.h and put basic command-line compatibility in place. A lot of options aren’t supported just yet - including that llvm-ml doesn’t include dispatch to a linker, so it will only work for /c builds.

The top of the current chain is here: https://reviews.llvm.org/D90061

Other contributions are welcome!

Martin: would you be able to review some of the current stack? (Starting with https://reviews.llvm.org/D89729)

My usual reviewers on this project (Nico & Reid) are a bit flooded with other work. I’d be glad to prioritize getting the OpenMP runtimes building with llvm-ml once the patches in the current stack land!

Thanks,

  • Eric

I'm also rather swamped, and might not be familiar with all the bits you're touching, but I can see if I can carve out some time to comment on what I can at least!

// Martin

Quick update: I’ve just gotten LLVM-ML to assemble z_Windows_NT-586_asm.asm from the OpenMP runtimes in 64-bit mode.

Patches required:
https://reviews.llvm.org/D104194
https://reviews.llvm.org/D104195
https://reviews.llvm.org/D104196

32-bit mode will require another patch or two - in particular, it looks like this is the first file that requires support for the @Version built-in.

Best,

  • Eric

Awesome! Thanks, that's great news!

If llvm-ml can handle that file in both 32- and 64-bit mode by the 13 release, I'd be able to get rid of a lot of ugliness in my builds :slight_smile:

// Martin

Hey Eric,

It's nice to see this finally drawing closer!

I tried applying the remaining patches for this and building OpenMP (using CMake), and the 32 bit version does seem to build fine, but for 64 bit, it still fails.

First off, when building for 64 bit, it doesn't seem to realize it's meant to be building for that target unless -m64 is specified manually. In this case, CMake and the OpenMP CMake build files end up calling llvm-ml like this:

llvm-ml -Domp_EXPORTS -Iruntime/src -I../runtime/src -I../runtime/src/i18n -I../runtime/src/include -I../runtime/src/thirdparty/ittnotify -D _CRT_SECURE_NO_WARNINGS -D _CRT_SECURE_NO_DEPRECATE -D _WINDOWS -D _WINNT -D _WIN32_WINNT=0x0501 -D _USRDLL -win64 -D_M_AMD64 -DOMPT_SUPPORT=0 -DOMPD_SUPPORT=0 /c /Fo runtime/src/CMakeFiles/omp.dir/z_Windows_NT-586_asm.asm.obj ../runtime/src/z_Windows_NT-586_asm.asm

I haven't studied the corresponding ml.exe/ml64.exe(?) options, but I would expect that -win64 could be made to imply the same as -m64?

Then if I add -m64 manually, building fails like this:

llvm-ml: ../lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:396: void {anonymous}::X86MCCodeEmitter::emitMemModRMByte(const llvm::MCInst&, unsigned int, unsigned int, uint64_t, bool, uint64_t, llvm::raw_ostream&, llvm::SmallVectorImpl<llvm::MCFixup>&, const llvm::MCSubtargetInfo&, bool) const: Assertion `IndexReg.getReg() == 0 && !ForceSIB && "Invalid rip-relative address"' failed.

// Martin

Martin,

I haven’t actually seen the -win64 option for ML(64).EXE documented anywhere - have you? If we can find evidence of what it does for those, we’ll of course match it.

If you want to build with llvm-ml for 64-bit without passing the -m64 option… all you need to do is run it through a symlink/copy named llvm-ml64, the same way you’d need to run Microsoft’s “ml64.exe” rather than “ml.exe”. I’ve mimicked the way clang-cl and lld-link trigger their dispatch off of the program name, since Microsoft seems to ship the two tools as separate binaries.
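As a rough illustration of name-based dispatch, the Python sketch below picks a default mode from the name the tool was invoked under, with an explicit -m64 taking priority. This is not llvm-ml's actual implementation; the function name `default_bitness` and the exact matching rule (case-insensitive, optional `.exe` suffix, name ending in `ml64`) are assumptions made for the example.

```python
import re


def default_bitness(argv0, args):
    """Return 64 or 32 based on an explicit flag, else the program name."""
    if "-m64" in args:
        return 64
    # Take the last path component, accepting both '/' and '\' separators.
    stem = re.split(r"[\\/]", argv0)[-1].lower()
    if stem.endswith(".exe"):
        stem = stem[: -len(".exe")]
    # Assumed rule: a name ending in "ml64" selects 64-bit mode.
    return 64 if stem.endswith("ml64") else 32
```

Under these assumptions, a symlink or copy named llvm-ml64 gets 64-bit mode with no extra flags, while plain llvm-ml stays in 32-bit mode unless -m64 is passed.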

Back to the 64-bit build issues, though - I’ll get back to checking on that soon, and see if I can track down the line in the .asm file that triggers that bug. Hopefully it won’t be too tricky a fix!

Best,

  • Eric

Martin,
I haven't actually seen the -win64 option for ML(64).EXE documented anywhere
- have you? If we can find evidence of what it does for those, we'll of
course match it.

Oh, sorry about that, it turns out that this was a flag I added myself in my CMake configuration. ml(64).exe doesn't support it, but it's a uasm flag (uasm is the tool I use for MASM today), and I had forgotten about setting it. It's trivial for me to change that flag to -m64, or to use tool-name-based switching like you suggested.

Sorry for the noise about this aspect.

Back to the 64-bit build issues, though - I'll get back to checking on that
soon, and see if I can track down the line in the .asm file that triggers
that bug. Hopefully it won't be too tricky a fix!

Thanks!

// Martin

I’ve tracked down the trigger; it’s the line
lea rdx, QWORD PTR [rax*8+16]

https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/openmp/runtime/src/z_Windows_NT-586_asm.asm#L1243

The issue is that LLVM-ML64 generally assumes that (if unspecified) the base register of a memory reference is RIP. Our attempt to mimic this ends up with us trying to assemble the instruction
lea rdx, [rip + 8*rax + 16]

… and the X86 code emitter rightly complains that that’s not a valid memory reference.

This was introduced in an attempt to mimic ML64.EXE in https://reviews.llvm.org/D73227 - and it looks like I got it wrong! (And didn’t test against ML64.EXE nearly enough.) If I’ve got it right now, this was only supposed to apply to references to symbols… and irritatingly enough, only to symbols defined using MASM’s syntax for named variables in memory.

v0 of https://reviews.llvm.org/D105372 restricts it to symbols, at least, and avoids this misassembly. I’ll see about getting another commit up (or patching that one) to restrict further to named variables specifically, and not apply to labels.
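The intended rule, as described above, can be sketched in Python as follows. This is only a model of the decision and not the code in D105372; `MemRef`, `implicit_base`, and the `named_variables` set are hypothetical names, and the rule encodes the stated (and still tentative) understanding of ML64.EXE's behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemRef:
    """Simplified memory operand: [symbol + base + index*scale + disp]."""
    symbol: Optional[str] = None
    base: Optional[str] = None
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0


def implicit_base(ref, named_variables):
    """Return the base register to use, adding RIP only where it is valid."""
    # An explicit base is kept as-is, and RIP-relative addressing cannot be
    # combined with an index register, so never add RIP in these cases.
    if ref.base is not None or ref.index is not None:
        return ref.base
    # Assumed rule: only references to named variables become RIP-relative;
    # labels and plain displacements do not.
    if ref.symbol is not None and ref.symbol in named_variables:
        return "rip"
    return None
```

For the failing operand `[rax*8+16]`, `implicit_base(MemRef(index="rax", scale=8, disp=16), set())` returns `None`, so no `rip` base is introduced and the invalid `[rip + 8*rax + 16]` form never reaches the code emitter.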