What is ConstantExpr?

Hi, All.

Does anybody know about ConstantExpr in LLVM? What is it?
Since it always appears after LLVM optimization, such as at the -O2 level, what is it supposed to mean to codegen? I assume it represents a constant value that can be determined or computed at compile time (actually at link time) to improve performance, even though we do not know the actual constant value until the object file is linked.

Here is my example. The generated code still computes the value at run time.

cat a.C
int n=5;
int main(){
long a = (long)&n+7;
int b = a;
return b;
}

clang++ a.C -c -O2 -emit-llvm -S;cat a.ll
; ModuleID = 'a.C'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

@n = global i32 5, align 4

; Function Attrs: norecurse nounwind readnone ssp uwtable
define i32 @main() #0 {
ret i32 trunc (i64 add (i64 ptrtoint (i32* @n to i64), i64 7) to i32)
}

clang++ a.C -c -O2;objdump -d a.O

a.O: file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
_main:
0: 55 pushq %rbp
1: 48 89 e5 movq %rsp, %rbp
4: 48 8d 05 00 00 00 00 leaq (%rip), %rax
b: 83 c0 07 addl $7, %eax
e: 5d popq %rbp
f: c3 retq

I am confused about what its purpose is in LLVM.

Thanks.

You’ve pretty much got it. A constant expression (ConstantExpr) is simply a constant value. Since some constant values depend on architecture-dependent features (e.g., structure layout, pointer size, etc.), LLVM provides the ConstantExpr to represent them in a (more or less) architecture-independent way. For example, a GEP with constant indices on an internal global variable will always compute the same value; it is a constant. However, we use a GEP ConstantExpr to represent it; the backend code generator converts it to the appropriate numerical constant when generating native code. For more information on the ConstantExpr, please see the LLVM Language Reference Manual.

Regards,
John Criswell
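To make the "architecture-dependent constant" point concrete, here is a minimal sketch. The struct `Pair` and the names `g_pair`, `g_value_ptr`, and `field_offset_in_bytes` are hypothetical and not from this thread. The initializer of `g_value_ptr` is a constant, but its numeric value depends on the target's struct layout and on where `g_pair` ends up in memory, which is the kind of value a GEP constant expression would typically stand for in the IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration; its layout (padding after `tag`)
// is decided by the target ABI, not by the source code.
struct Pair {
    char tag;
    int value;
};

// A global object: its address is fixed once the program is linked and loaded.
Pair g_pair = {'x', 42};

// Address of a field of a global. The compiler cannot write this as a plain
// number, so it has to keep it symbolic until the address of g_pair is known.
int *g_value_ptr = &g_pair.value;

// Byte distance from the start of g_pair to the field g_value_ptr points at.
std::size_t field_offset_in_bytes() {
    auto base = reinterpret_cast<std::uintptr_t>(&g_pair);
    auto field = reinterpret_cast<std::uintptr_t>(g_value_ptr);
    assert(field >= base);  // the field lives inside the object
    return static_cast<std::size_t>(field - base);
}
```

Calling `field_offset_in_bytes()` from a `main` should give the same number as `offsetof(Pair, value)` on whatever target the code is built for; the source never spells that number out.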

Thanks again, John.

Here is another example:

int a;

int main(){
return 5+(long)(&a);
}

At -O0, the IR looks like this:

@a = global i32 0, align 4

; Function Attrs: noinline norecurse nounwind
define signext i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, i32* %1, align 4
ret i32 trunc (i64 add (i64 ptrtoint (i32* @a to i64), i64 5) to i32)
}

At -O2, the IR is optimized as below:
@a = global i32 0, align 4

; Function Attrs: norecurse nounwind readnone
define signext i32 @main() local_unnamed_addr #0 {
ret i32 trunc (i64 add (i64 ptrtoint (i32* @a to i64), i64 5) to i32)
}

I mean, what’s the advantage of that pattern with ConstantExpr, given that it is introduced at -O2?
How does the back end handle this pattern (which is a bitcast operator in the case from my previous email)?

Thanks.

I see the constant expression in both the -O0 and -O2 LLVM assembly files above; I don’t think the optimizations are adding it to the code. The advantage of the constant expression in this case is that LLVM can express the constant in an architecture-independent way. First, you can’t add a pointer to an integer, so a constant cast (ptrtoint in the IR above) is needed. Second, even if the cast weren’t necessary, the value of “@a” hasn’t been determined yet (as code generation hasn’t occurred yet), so the constant has to be represented symbolically. This is why constant expressions exist: they allow constants to be represented symbolically, and in a way that type checks. After the location of the global variable “a” has been picked, the backend should be able to simplify the constant expression into a single integer value. Use clang -S to see the assembly code that LLVM generates; I suspect you’ll see the constant expression simplified to a single constant value.

Regards,
John Criswell
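As a rough illustration of what the expression trunc(add(ptrtoint(@a), 5)) means, the sketch below spells out the three steps in C++ and compares them with the one-line form from the question. The function names are made up for this example, and `std::intptr_t` is used in place of `long` so that the cast is also valid on targets where `long` is narrower than a pointer. The result depends on where `a` is placed, which is why no literal number can be written for it at compile time.

```cpp
#include <cassert>
#include <cstdint>

int a;  // plays the role of the global @a from the IR

// The constant expression, one step at a time.
std::int32_t folded_expression() {
    std::uint64_t address = reinterpret_cast<std::uintptr_t>(&a);  // ptrtoint
    std::uint64_t sum = address + 5;                               // add
    // trunc: keep only the low 32 bits of the sum
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(sum));
}

// Same computation written the way the question wrote it.
int original_expression() {
    return static_cast<int>(5 + reinterpret_cast<std::intptr_t>(&a));
}

// Sanity check that both spellings agree; call it from main() to try it out.
void check_equivalence() {
    assert(folded_expression() == original_expression());
}
```

The value returned may change from one run or one link to the next, but the two functions should always agree with each other.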

The short version is: some values can stand on their own in LLVM, independently of a basic block or a function. These include things like numbers, addresses of global variables or functions, etc. You can even do computations on them, which gives you a ConstantExpr. Take for example:

int array[10];
int *x = array + 5;

giving this LLVM IR:

@array = common global [10 x i32] zeroinitializer, align 16
@x = global i32* bitcast (i8* getelementptr (i8, i8* bitcast ([10 x i32]* @array to i8*), i64 20) to i32*), align 8

and this assembly:

  .globl _x ## @x
  .p2align 3
_x:
  .quad _array+20

They will be lowered at various places (some in the backend, some by the linker, some by the dynamic loader) but will be a constant value when the program is loaded.

- Matthias
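A small run-time check of the same idea, as a sketch: a global pointer initialized to five elements past the start of a global array. The helper name `offset_of_x_in_bytes` is invented for this example. By the time any code runs, `x` already holds its final value, and its distance from `array` is 5 * sizeof(int) bytes (20 where `int` is 4 bytes, which is what the `_array+20` in the assembly expresses).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

int array[10];
int *x = array + 5;  // constant initializer: address of array plus 5 elements

// Byte offset of x from the start of array, measured at run time.
std::size_t offset_of_x_in_bytes() {
    auto base = reinterpret_cast<std::uintptr_t>(array);
    auto target = reinterpret_cast<std::uintptr_t>(x);
    assert(target >= base);  // x points into the array
    return static_cast<std::size_t>(target - base);
}
```

No function has to execute to set `x`; the offset is applied while the constant is lowered, somewhere between the backend and the loader.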

Yes, but the example you gave is a global variable. What about a local variable that is also folded into a constant expression, as below?

@a = global i32 0, align 4
define signext i32 @main() {
ret i32 trunc (i64 add (i64 ptrtoint (i32* @a to i64), i64 5) to i32)
}

I found that the computation still exists in the assembly code. In theory, could a constant replace the result of the trunc operation? Would the assembler (or linker?) need to calculate the result from the variable a, for example with something like .quad trunc_macro(a+5) as a temporary symbol value that is then moved into the eax register to be returned?

I am wondering what the difference is between a constant expression and separate instructions.

For example,

define signext i32 @main() {
%1 = ptrtoint i32* @a to i64
%2 = add i64 %1, 5
%3 = trunc i64 %2 to i32
ret i32 %3
}

The code above would only be folded in an optimized mode such as -O1 or -O2:

define signext i32 @main() #0 {
ret i32 trunc (i64 add (i64 ptrtoint (i32* @a to i64), i64 5) to i32)
}