Transform: Eliminating boxing-unboxing in untyped languages

Hi,

I'm playing with a compiler for an untyped language that generates a
lot of LLVM IR even for simple expressions. We can build up to the
full complexity later, but let's look at a simple example first:

The + operator explicitly expects two integers, and here both integers
are provided as literals in the same function and the same basic
block. (+ 3 4) generates the IR below. What's actually happening is
that 3 and 4 are boxed into a value_t struct and then immediately
unboxed for the add operation. InstCombine should be able to fix this,
no?

; ModuleID = 'My JIT'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

%value_t = type { i32, i64, i1, i8*, %value_t**, i64, double, %value_t* (i32, %value_t**, ...)*, i8, i1, %value_t* }

declare i8* @gc_malloc(i64)

declare i64 @strlen(i8*)

; Function Attrs: nounwind
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1) #0

; Function Attrs: nounwind readnone
declare double @llvm.pow.f64(double, double) #1

; Function Attrs: nounwind
declare void @llvm.va_start(i8*) #0

; Function Attrs: nounwind
declare void @llvm.va_end(i8*) #0

declare %value_t* @println(i32, %value_t**, ...)

declare %value_t* @print(i32, %value_t**, ...)

declare %value_t* @cequ(i32, %value_t**, ...)

declare %value_t* @cstrjoin(i32, %value_t**, ...)

define %value_t* @anon0() gc "rgc" {
entry:
  %value = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value = bitcast i8* %value to %value_t*
  %boxptr = getelementptr inbounds %value_t* %malloc_value, i32 0, i32 0
  %boxptr1 = getelementptr inbounds %value_t* %malloc_value, i32 0, i32 1
  store i32 1, i32* %boxptr
  store i64 3, i64* %boxptr1
  %value2 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value3 = bitcast i8* %value2 to %value_t*
  %boxptr4 = getelementptr inbounds %value_t* %malloc_value3, i32 0, i32 0
  %boxptr5 = getelementptr inbounds %value_t* %malloc_value3, i32 0, i32 1
  store i32 1, i32* %boxptr4
  store i64 4, i64* %boxptr5
  %load = load i32* %boxptr
  %is_dbl = icmp eq i32 %load, 6
  %value7 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value8 = bitcast i8* %value7 to %value_t*
  %boxptr9 = getelementptr inbounds %value_t* %malloc_value8, i32 0, i32 0
  %boxptr10 = getelementptr inbounds %value_t* %malloc_value8, i32 0, i32 2
  store i32 2, i32* %boxptr9
  store i1 %is_dbl, i1* %boxptr10
  %load12 = load i32* %boxptr4
  %is_dbl13 = icmp eq i32 %load12, 6
  %value14 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value15 = bitcast i8* %value14 to %value_t*
  %boxptr16 = getelementptr inbounds %value_t* %malloc_value15, i32 0, i32 0
  %boxptr17 = getelementptr inbounds %value_t* %malloc_value15, i32 0, i32 2
  store i32 2, i32* %boxptr16
  store i1 %is_dbl13, i1* %boxptr17
  %load19 = load i64* %boxptr5
  %load21 = load i64* %boxptr1
  %add = add i64 %load21, %load19
  %value22 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value23 = bitcast i8* %value22 to %value_t*
  %boxptr24 = getelementptr inbounds %value_t* %malloc_value23, i32 0, i32 0
  %boxptr25 = getelementptr inbounds %value_t* %malloc_value23, i32 0, i32 1
  store i32 1, i32* %boxptr24
  store i64 %add, i64* %boxptr25
  ret %value_t* %malloc_value23
}

attributes #0 = { nounwind }
attributes #1 = { nounwind readnone }

This is a memory analysis problem and is probably best solved by GVN (or possibly EarlyCSE).

If you look at the debug output from GVN and memory dependence analysis, I suspect you'll find that the second gc_malloc call is blocking store-to-load forwarding for the first box. I'd suggest a few things:
- Try a *trivial* example with a single boxed integer. Does that get 'unboxed'? (The allocation itself most likely won't be removed.)
- If so, does adding a *single* call to gc_malloc between the store and load break it? (I suspect it will.)
- Look into using appropriate attributes (noalias!) to convey the aliasing properties you need. This is best done by tracing through where a simple example fails and looking at the surrounding code. (Also see the LangRef.)
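
As a starting point, the first two experiments might look like the sketch below. It is not taken from your compiler: %box_t is a hypothetical cut-down box (tag plus integer payload), the function names are made up, the 16-byte size assumes the datalayout above, and the noalias on the return value of @gc_malloc is the attribute under test, so try it both with and without. It uses the same typed-pointer IR syntax as the module above. Run it through opt with GVN enabled and check whether each load is replaced by the constant 3.

; Hypothetical reduced box: a type tag plus an integer payload.
%box_t = type { i32, i64 }

; Assumption under test: noalias on the return value tells alias analysis
; that each call returns memory not aliased by any other pointer visible
; to the caller. Drop the attribute to compare.
declare noalias i8* @gc_malloc(i64)

; Experiment 1: box one integer, then immediately unbox it.
define i64 @unbox_single() {
entry:
  %raw = call i8* @gc_malloc(i64 16)
  %box = bitcast i8* %raw to %box_t*
  %tagptr = getelementptr inbounds %box_t* %box, i32 0, i32 0
  %valptr = getelementptr inbounds %box_t* %box, i32 0, i32 1
  store i32 1, i32* %tagptr
  store i64 3, i64* %valptr
  ; Candidate for forwarding from the store just above.
  %v = load i64* %valptr
  ret i64 %v
}

; Experiment 2: the same, with one unrelated gc_malloc call between the
; store and the load.
define i64 @unbox_across_call() {
entry:
  %raw = call i8* @gc_malloc(i64 16)
  %box = bitcast i8* %raw to %box_t*
  %tagptr = getelementptr inbounds %box_t* %box, i32 0, i32 0
  %valptr = getelementptr inbounds %box_t* %box, i32 0, i32 1
  store i32 1, i32* %tagptr
  store i64 3, i64* %valptr
  ; Unrelated allocation; the question is whether it blocks forwarding.
  %other = call i8* @gc_malloc(i64 16)
  %v = load i64* %valptr
  ret i64 %v
}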

Philip