Some questions about writing compiler passes.

Hello,

I have a couple of basic questions about implementing compiler passes for LLVM. I am fairly new to this and would like to write an instrumentation pass of some sort, so I was hoping for a bit of help.

  1. How do compiler passes cope with inline assembly? Can this sometimes violate the optimization assumptions of the pass?

  2. What are good ways to verify and validate passes? Can one prove them correct with Coq? Or is it a matter of running and dumping the AST pre and post transform and using tests to verify their correctness?

Thanks in advance for your help,

Carter.

  1. How do compiler passes cope with inline assembly? Can this sometimes violate the optimization assumptions of the pass?

Inline asm nodes in the intermediate representation (LLVM IR) carry various properties that prevent such violations. Optimizations can query the inline asm node in the IR and it will say things like “I modify memory”, so they won’t reorder it in ways that would break the memory it observes.

  2. What are good ways to verify and validate passes?

Check out the contents of LLVM’s test directory - that’s how we check LLVM optimizations do what’s intended (though it doesn’t validate that that intent is correct, or that it doesn’t have weird corner cases, etc)

Can one prove them correct with Coq? Or is it a matter of running and dumping the AST pre and post transform and using tests to verify their correctness?

The AST isn’t used for optimization in Clang/LLVM. Clang generates LLVM IR from Clang’s AST (the AST itself is basically immutable: its requirements/invariants are a bit too complicated to readily modify it while preserving them). LLVM’s optimization pipeline then operates on that IR before lowering it again to machine code (with a few short stops along the way).

Test cases for optimizations are generally written by having some input IR, running it through one specific optimization, then validating that the IR coming out the other end has certain properties that are desired (has one constant instead of an add of two constants if it’s a constant fold, for example).
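To make the shape of such a test concrete, here is a minimal sketch in C. It does not use LLVM at all: the Node type, fold_constants and test_fold_add_of_constants are hypothetical names for a toy expression IR, invented only to show the pattern of "build some input, run exactly one transform, check a property of the output". Real LLVM tests are written against textual IR in the test directory rather than like this.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression IR (an assumption for illustration, not LLVM's IR):
   a node is either an integer constant or an add of two nodes. */
typedef enum { NODE_CONST, NODE_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;               /* meaningful when kind == NODE_CONST */
    struct Node *lhs, *rhs;  /* meaningful when kind == NODE_ADD */
} Node;

static Node *make_const(int value) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = NODE_CONST;
    n->value = value;
    n->lhs = n->rhs = NULL;
    return n;
}

static Node *make_add(Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = NODE_ADD;
    n->value = 0;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* The "pass" under test: fold add(const, const) into a single constant,
   working bottom-up so nested adds of constants collapse as well. */
static Node *fold_constants(Node *n) {
    if (n->kind != NODE_ADD)
        return n;
    n->lhs = fold_constants(n->lhs);
    n->rhs = fold_constants(n->rhs);
    if (n->lhs->kind == NODE_CONST && n->rhs->kind == NODE_CONST) {
        int sum = n->lhs->value + n->rhs->value;
        free(n->lhs);
        free(n->rhs);
        n->kind = NODE_CONST;
        n->value = sum;
        n->lhs = n->rhs = NULL;
    }
    return n;
}

/* The test: fixed input, one transform, then a check on the output's
   shape (a single constant) and its value. Returns 1 on success. */
int test_fold_add_of_constants(void) {
    Node *input = make_add(make_const(2),
                           make_add(make_const(3), make_const(4)));
    Node *output = fold_constants(input);
    int ok = (output->kind == NODE_CONST) && (output->value == 9);
    free(output);
    return ok;
}
```

Note that the check is on a property of the output (one constant where there used to be adds), not on an exact dump of it, which is what keeps this style of test robust to unrelated changes.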

Sometimes there are efforts to more totally prove something correct - either by brute force (I think we brute forced some of the machine instruction encoding at one point, to make sure the assembler/disassembler could roundtrip everything) or algorithmic proofs.

- Dave

Thanks for the help. I have a couple follow up questions-

Where can I find the code which calculates the properties of inline asm nodes and examples of passes that make use of this information?

Thanks again,

Carter.

Not sure exactly, probably… ah, easier than I expected. I wrote a trivial bit of C code that used inline asm (int main() { asm(""); }) and compiled that to LLVM IR with clang (clang -emit-llvm -S code.c), and I see that Clang lowers inline assembly to an LLVM IR ‘call’ instruction with the ‘asm’ modifier:

    call void asm sideeffect "", "~{dirflag},~{fpsr},~{flags}"() #1, !srcloc !2

So I go and look up the API documentation for the LLVM IR call instruction & search for anything asm related and find this: http://llvm.org/doxygen/classllvm_1_1CallInst.html#a731dfe719a1112a16f7b231183cae70c

So you could look around in the LLVM codebase for anything that calls that function. Chances are it isn’t too many places - because for most optimizations the fact that it’s a call (& maybe the CallInst lets you query for particular properties like whether it interacts with memory, etc) probably gives enough info.

- Dave

Generally the compiler does not calculate properties of inline asm. A good mental model is that inline assembly is just a string to the compiler, in which it only performs some replacements for register names and similar. Properties and constraints are expected to be specified by the user: you have to mark it volatile if it has side effects, you have to specify lists of registers to be clobbered, you have to specify register allocation constraints via letters, etc. See the GCC documentation for what is possible.
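As a rough illustration of "the user specifies the properties", here is a small sketch using GCC-style extended asm, which Clang also accepts. The asm strings are empty, so no instructions are emitted; everything the compiler knows comes from the volatile qualifier, the "memory" clobber and the "+r" constraint. The function names (compiler_barrier, opaque_int, publish_then_read, demo) are hypothetical and exist only for this example.

```c
#include <assert.h>

/* The "memory" clobber declares that this statement may read or write
   any memory, so the compiler must not cache memory values in registers
   across it or move loads and stores past it. */
static inline void compiler_barrier(void) {
    __asm__ volatile("" ::: "memory");
}

/* The "+r" constraint declares that the asm both reads and writes
   `value` in a general-purpose register. The compiler therefore has to
   treat the result as unknown, even though the asm body is empty. */
static inline int opaque_int(int value) {
    __asm__ volatile("" : "+r"(value));
    return value;
}

static int shared_flag;

/* Store, barrier, reload: because of the clobber, the compiler has to
   perform the store before the asm and re-read the variable after it. */
int publish_then_read(int v) {
    shared_flag = v;
    compiler_barrier();
    return shared_flag;
}

/* Usage: behaviour is unchanged; only what the optimizer may assume is. */
void demo(void) {
    assert(opaque_int(41) == 41);
    assert(publish_then_read(7) == 7);
}
```

If the clobber or constraint were left off, the compiler would be entitled to assume the asm touches nothing, which is exactly the situation where inline assembly can break an optimization's assumptions.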

- Matthias

Thanks. I will take a look.