Why doesn't this LLVM IR code optimize away?

I’m loosely following the LLVM Kaleidoscope tutorial for beginners and writing my own IR using the IRBuilder. I also use the JIT and the optimization passes given on that page.
While experimenting with the optimizations, I’m wondering why this specific IR doesn’t get optimized the way I expected.

define void @testname(ptr %mem) {
entry:
  store i8 77, ptr %mem, align 1
  %memFARAWAY = getelementptr i8, ptr %mem, i64 256
  store i8 46, ptr %memFARAWAY, align 1
  %0 = load i8, ptr %mem, align 1
  %1 = add i8 %0, 1
  store i8 %1, ptr %mem, align 1
  ret void
}

In particular, this IR stores a constant at offset 0 of the pointer, stores another constant 256 bytes past it, then loads the first value, adds 1, and stores it back.
In my head, this should be optimized down to immediately storing 78 to the first location.

Am I misunderstanding basic concepts?

Assuming you’re strictly following that tutorial, the issue is that the tutorial code does not run all of the LLVM optimization passes required for this optimization. Looking at the code, it seems to run only instcombine, reassociate, gvn and simplifycfg.
This gets us close to what you’re describing: it deduces that the load/add/store sequence at the end can be simplified to just storing 78, but it does not yet eliminate the first store:

define void @testname(ptr %mem) {
  store i8 77, ptr %mem, align 1
  %memFARAWAY = getelementptr i8, ptr %mem, i64 256
  store i8 46, ptr %memFARAWAY, align 1
  store i8 78, ptr %mem, align 1
  ret void
}

https://godbolt.org/z/9MEKWW95d

What is missing to eliminate the first store is an optimization called “Dead Store Elimination” (DSE). This will delete the redundant store:

define void @testname(ptr %mem) {
  %memFARAWAY = getelementptr i8, ptr %mem, i64 256
  store i8 46, ptr %memFARAWAY, align 1
  store i8 78, ptr %mem, align 1
  ret void
}

https://godbolt.org/z/35csT6fP6
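
To make it concrete why dropping the first store is legal, here is a rough C++ rendering of the IR before and after DSE. This is only a sketch: the function names are made up for illustration, and it models %mem as a plain byte buffer of at least 257 bytes. Nothing reads %mem between the store of 77 and the store of 78, and the store to %mem + 256 cannot overlap a one-byte access at %mem, so both versions leave memory in the same state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical C++ equivalent of the original @testname.
void testname_original(std::uint8_t *mem) {
    mem[0] = 77;                                 // store i8 77, ptr %mem
    mem[256] = 46;                               // store i8 46, ptr %memFARAWAY
    std::uint8_t v = mem[0];                     // %0 = load i8, ptr %mem
    mem[0] = static_cast<std::uint8_t>(v + 1);   // add 1, store back
}

// Hypothetical C++ equivalent of the IR after GVN and DSE.
void testname_after_dse(std::uint8_t *mem) {
    mem[256] = 46;                               // unrelated store is kept
    mem[0] = 78;                                 // 77 + 1 folded to a constant
}

// Runs both versions on zeroed buffers and compares the final contents.
bool same_final_memory() {
    std::vector<std::uint8_t> a(257, 0), b(257, 0);
    testname_original(a.data());
    testname_after_dse(b.data());
    return a == b && a[0] == 78 && a[256] == 46;
}
```

Calling same_final_memory() returns true, which is the observable equivalence the optimizer relies on.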

In case you want to run the full -O3 optimization pipeline that Clang and opt use when passed -O3, see the “Using the New Pass Manager” page in the LLVM documentation.
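
For reference, a minimal sketch of what that looks like with the new pass manager, assuming a recent LLVM (17 or 18). The helper names runDefaultO2 and addDSE are made up for illustration; the PassBuilder calls follow the pattern shown on that documentation page. It needs the LLVM headers and libraries to build, so treat it as a starting point rather than a drop-in.

```cpp
#include "llvm/Analysis/CGSCCPassManager.h"
#include "llvm/Analysis/LoopAnalysisManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/OptimizationLevel.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Transforms/Scalar/DeadStoreElimination.h"

// Hypothetical helper: run the default O2 module pipeline over M.
void runDefaultO2(llvm::Module &M) {
  // Analysis managers, declared in this order so they are destroyed safely.
  llvm::LoopAnalysisManager LAM;
  llvm::FunctionAnalysisManager FAM;
  llvm::CGSCCAnalysisManager CGAM;
  llvm::ModuleAnalysisManager MAM;

  // Register the standard analyses and let the managers see each other.
  llvm::PassBuilder PB;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  // Build and run the same pipeline that opt uses for -O2.
  llvm::ModulePassManager MPM =
      PB.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
  MPM.run(M, MAM);
}

// Hypothetical helper: add only DSE to an existing function pass manager,
// for example next to the instcombine/reassociate/gvn/simplifycfg passes.
void addDSE(llvm::FunctionPassManager &FPM) {
  FPM.addPass(llvm::DSEPass());
}
```

Swapping in OptimizationLevel::O3 gives the -O3 pipeline instead. If you only want the one missing optimization, adding DSEPass to the existing function pass manager is the smaller change.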

Yes! Thank you!
So it really was just not using the proper optimizers. I added it and could reproduce the result.
Also, thanks for the link to the O3 defaults (or in this case even O2 is enough).
Knowing all the optimizers seems to be too much for me, I’ll take some good defaults any day!