Semantics of LLVM IR intermediate variables

Hi all,

This question may sound stupid, but every time I look at the IR, I take some time to convince myself of the following:

The following C source code:

    int x;
    int *p;
    p = &x;

when compiled to LLVM IR using clang generates the following instructions:

    %x = alloca i32, align 4
    %p = alloca i32*, align 8
    store i32* %x, i32** %p, align 8

All the local variables in the C source code, i.e. ‘x’ and ‘p’, are now pointers; in fact, each has one more level of indirection than its C counterpart. What I mean is, ‘x’ is an ‘int’ in the C source, but ‘%x’ is ‘i32*’; ‘p’ is ‘int*’ in the C source, but ‘%p’ is ‘i32**’. Doesn’t that make the IR names a misnomer compared to their C counterparts? Wouldn’t ‘%x.addr’ or ‘%p.addr’ be a better naming convention? Is there anything that I am missing?
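
To make the extra level of indirection concrete, here is a minimal C sketch that mirrors the IR by hand. The names x_slot and p_slot are hypothetical and only stand in for the IR values %x and %p; this is an illustration of the types involved, not what clang itself emits.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the three IR instructions in plain C.
   x_slot and p_slot are illustrative names, not anything clang produces. */
int store_matches_c_assignment(void) {
    int x = 0;
    int *p = NULL;

    int  *x_slot = &x;  /* plays the role of %x: an i32*, the address of x's stack slot */
    int **p_slot = &p;  /* plays the role of %p: an i32**, the address of p's stack slot */

    *p_slot = x_slot;   /* store i32* %x, i32** %p */

    assert(p == &x);    /* same effect as the C statement p = &x; */
    return p == &x;
}
```

Read this way, the IR name %x denotes what C would write as &x, which is where the one-level mismatch comes from.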

You’re right that the name isn’t the most accurate from a clang point of view. Redirecting to cfe-dev@ in case anyone has an opinion from the clang side.

Note, though, that in LLVM the SSA value names are just for debugging; a Release build of clang even strips them entirely by default on its “normal” compilation path.

This is, however, consistent with global variables, where @foo is &foo
in the C source, which also matches what the linker's view is (where the
value of a symbol is its address).

James
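
The global-variable analogy can be sketched in C as follows. The function names are hypothetical, and the comments describe the expected lowering only informally rather than quoting actual compiler output.

```c
#include <assert.h>

int foo = 42;

/* Taking the address needs no memory access: in IR terms the result
   is simply the symbol @foo itself. */
int *address_of_foo(void) { return &foo; }

/* Reading the value requires a load through that address. */
int value_of_foo(void) { return foo; }

/* Sanity check that both views refer to the same object. */
void check_foo(void) {
    assert(address_of_foo() == &foo);
    assert(*address_of_foo() == value_of_foo());
}
```

So a name that denotes the address of a C variable is the same convention for @foo and for an alloca such as %x.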

Once promoted, which can be expected to be the default for stack allocations, the name %x makes more sense again.
(Afaik, the alloca name is used for new instructions, e.g., PHIs. [0])
The alternative would be to rename during promotion (SROA/mem2reg) but it's unclear if that is worth it.

Cheers,
  Johannes

[0] https://godbolt.org/z/loZOfX