[RFC] Target CPU and features for module-level inline assembly

Background

LLVM LTO constructs a symbol table for an LLVM IR file by scanning its contents. That scan includes module-level inline assembly, which must be parsed in order to find the symbols and symbol versions it references.

The target CPU and feature set are both inputs to the asm parse, and they can arbitrarily influence its outcome. Unfortunately, the IR symbol table is produced from an IR module in a wide variety of contexts, only some of which have access to the actual CPU and target feature set. For this reason, ModuleSymbolTable simply constructs an assembly parser with the empty string as both CPU and feature set.
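To make the constraint concrete: in LLVM IR, the target triple is a module-level field, but the CPU and feature set only appear as per-function string attributes (the "target-cpu" / "target-features" attributes Clang emits), so a module-level symbol-table scan has no authoritative CPU or feature set to hand to the asm parser. A minimal illustration:

```llvm
target triple = "x86_64-unknown-linux-gnu"

module asm ".globl my_helper"

; The CPU and features below are attached to an individual function, not to
; the module, so they are invisible to a module-level symbol-table scan.
define void @f() #0 {
  ret void
}

attributes #0 = { "target-cpu"="skylake" "target-features"="+avx2,+fma" }
```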

This results in "LTO scan of module-level inline assembly does not respect CPU" (llvm/llvm-project issue #67698), which produces spurious error messages whenever features not in the base target are used in module-level inline assembly during LTO. Since the parse fails, any symbols referenced by the asm are also left out of the module's symbol table.
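For a concrete (hypothetical) example, assuming a plain x86-64 base target: the AVX instruction below is accepted during normal compilation, where the real CPU and features are known, but the parser used by the LTO symbol-table scan, running with empty CPU and feature strings, rejects it, and my_helper never makes it into the symbol table:

```llvm
module asm ".globl my_helper"
module asm "my_helper:"
; vaddps on %ymm registers requires AVX, which is not in the base x86-64
; feature set, so parsing fails when the feature string is empty.
module asm "  vaddps %ymm1, %ymm2, %ymm0"
module asm "  ret"
```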

Proposal

I looked into fixing this today by trying to extract CPU and feature information from the environment, but it seems to be a broadly expected property of the architecture that a module has a symbol table that can be extracted with little supporting information. That seems like a very desirable property to maintain.

Accordingly, it seems like the most straightforward fix for this would be to bundle the information necessary to produce a symbol table into the LLVM IR somehow. The smallest version of this would be just a CPU name and a feature string, so I’d propose those as something to accompany module-level inline asm at the top level.
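As a sketch only (the metadata names below are invented, not an existing convention), one lightweight encoding would be a pair of named module-level metadata nodes recorded next to the module asm:

```llvm
module asm ".globl my_helper"
module asm "my_helper: vaddps %ymm1, %ymm2, %ymm0; ret"

; Hypothetical: the CPU and feature string in effect when the module asm
; was generated, for the symbol-table scan to feed to its asm parser.
!llvm.module.asm.cpu = !{!0}
!llvm.module.asm.features = !{!1}
!0 = !{!"skylake"}
!1 = !{!"+avx2,+fma"}
```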

This, of course, opens up a whole can of worms, since the CPU and feature strings could conflict, either between linked modules or between a module and the flags of the link step. It also doesn’t seem desirable to build something much more heavyweight than the value gained; but the status quo here does seem quite surprising.

Another, hackier, idea would be to have Clang etc. encode this information into the inline assembly itself using something like a .llvm-cpu or .llvm-features directive. This would be more limited in scope and much easier to merge (merging is just string concatenation of the inline asm), but I haven’t thought much about this one yet; it just occurred to me as I was typing this up.
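Roughly, that might look like the following (the directive names are made up for illustration; nothing recognizes them today). Since linking concatenates module asm, the strings would survive a merge unchanged:

```llvm
; Hypothetical directives prepended by the frontend, recording the CPU and
; features the asm was written against.
module asm ".llvm-cpu skylake"
module asm ".llvm-features +avx2,+fma"
module asm ".globl my_helper"
module asm "my_helper: vaddps %ymm1, %ymm2, %ymm0; ret"
```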

Anyway, I just wanted to put this out there to see 1) whether this issue was known and/or discussed somewhere I couldn’t find, and 2) whether this semantic wrinkle seems worth ironing out, given the options available.

IMO, we need to stop scanning inline asm from LLVM IR to generate a symbol table. It causes really weird layering artifacts, and has long seemed unfortunate and in need of change, but this issue just adds more reason to do so.

That does mean that we need another solution to achieve LTO’s requirements.

My suggestion: have the IR-producer encode the list of symbols into the IR. In Clang’s case, that will mean doing similar asm scanning, but it will be done at IR generation, not IR parsing, which IMO makes a lot more sense, and, also, avoids the problem presented here.
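To make the suggestion concrete (the metadata name and layout here are hypothetical, purely to illustrate the shape of the data): the frontend would record a per-symbol list alongside the asm, and the LTO symbol-table code would read that list instead of parsing the asm:

```llvm
module asm ".globl my_helper"
module asm "my_helper: call external_fn; ret"

; Hypothetical symbol list populated at IR generation time, so LTO can
; build the symbol table without parsing the asm at all.
!llvm.module.asm.symbols = !{!0, !1}
!0 = !{!"my_helper", !"defined"}
!1 = !{!"external_fn", !"undefined"}
```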


My suggestion: have the IR-producer encode the list of symbols into the IR. In Clang’s case, that will mean doing similar asm scanning, but it will be done at IR generation, not IR parsing, which IMO makes a lot more sense, and, also, avoids the problem presented here.

I like this a lot. It does mean that there would ultimately be two sources of truth: the asm itself and the derived symbol data structure, to be kept in sync as IR is processed. It also seems that, with an approach like this, function-level inline asm could eventually be in scope; I’d have to think more about that.

The rollout would also be a bit tricky, since frontends currently depend on general LLVM IR manipulation routines doing this scan. I suppose that just means we’d need to keep the asm scan code around indefinitely to use whenever the symbol information hasn’t been populated.

kept in sync as IR is processed

IR processing doesn’t touch the content of inline-asm strings, so there should be no need to keep in sync here.

The rollout would also be a bit tricky, since frontends currently depend on general LLVM IR manipulation routines doing this scan.

I think we should keep the asm-scan code around to deal with old bitcode versions for 2-3 releases, and then delete it. (And, yes, that breaks things like archive search for old bitcode with inline-asm. I think that’s an acceptable tradeoff after a migration period).

But frontends generating new bitcode with inline asm should be required to populate the symbol list sooner rather than later, not silently fall back to the current scanning. I bet this doesn’t actually affect most frontends other than Clang.

Considering that Clang already has a dependency on assembly parsing to support Intel / Microsoft style asm blocks, asking the frontend to do this parse and add a list of referenced symbols makes sense.

I agree that, ultimately, Clang is probably the only frontend that will care.

BTW, from a quick search, I see two other frontends that would need to be updated:

  • Rust emits module-level inline asm from user-specified global asm, just like Clang, so it would need to invoke the asm scanner to get a symbol list just as Clang does.
  • The LDC D compiler emits module-level inline asm but defines the function name itself, so it can simply add that name to a list.