New idea thoughts: Optimization passes have callbacks to identify changes made to IR

Hello,

I’m currently developing a tool based on LLVM to understand how the LLVM IR changes after optimization passes are run.

Today it’s a tedious but automatic process at the function level in my language: I first dump the IR before running any passes, then run the 10 or so passes I care about one at a time, dumping after each, to work out which pass made the change.

There are two problems with this approach:

(1) Running passes one at a time kills the usability of the tool for reasonably sized functions; it simply takes too long.
(2) I have to semi-manually identify how the IR changed, working from a diff that is not semantic but not a plain +/- line diff either.

While (1) is the deal breaker, I have ideas on how to improve it in my tool.

However, on (2) I’m a bit lost without the optimizer telling me what it’s doing. Have we ever envisioned creating passes that have callbacks into other LLVM or user code, telling the interested party what changes they are about to make or have just made?

I understand that for some optimizers it’s really an all-or-nothing approach and they can’t pinpoint the exact piece of IR they are modifying or removing, but I feel that in general a lot of optimizers could tell you which node in the IR they are about to modify.
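
To make that concrete, here is a rough sketch of the kind of hook I have in mind. As far as I know nothing like this exists in LLVM today; every name in it (IRChangeObserver and its methods) is invented purely for illustration:

    // Hypothetical observer interface -- not an existing LLVM API.
    // A pass that knows which instruction it is touching could notify any
    // registered observer just before or after making the change.
    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/Value.h"

    class IRChangeObserver {
    public:
      virtual ~IRChangeObserver() = default;

      // A pass is about to rewrite this instruction in place.
      virtual void willModify(const llvm::Instruction &I) {}

      // A pass has just erased this instruction.
      virtual void didErase(const llvm::Instruction &I) {}

      // A pass has replaced all uses of Old with New.
      virtual void didReplace(const llvm::Instruction &Old,
                              const llvm::Value &New) {}
    };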

So, on to my next thought. Is it feasible to somehow enforce or ask optimizers to do this, or would someone literally have to go through every optimizer and annotate it?

I’m new to this whole space, but I would appreciate some comments on whether there is already a half-decent way of achieving this.

I have a tool for that, diffdump. I use it to diff the output of opt with
various optimization passes enabled.

    https://github.com/garious/diffdump

For example, to see what constprop does to the IR:

    $ diffdump -u --cmd=opt --arg=-S --arg2=-constprop basictest.ll

That gives you this output:

diff -u a/dump.txt b/dump.txt
--- a/dump.txt 2012-10-31 11:28:31.000000000 -0700
+++ b/dump.txt 2012-10-31 11:28:31.000000000 -0700
@@ -6,34 +6,28 @@
   br i1 %B, label %BB1, label %BB2

BB1: ; preds = %0
- %Val = add i32 0, 0
   br label %BB3

BB2: ; preds = %0
   br label %BB3

BB3: ; preds = %BB2, %BB1
- %Ret = phi i32 [ %Val, %BB1 ], [ 1, %BB2 ]
+ %Ret = phi i32 [ 0, %BB1 ], [ 1, %BB2 ]
   ret i32 %Ret
}

-Greg

Here's one idea:

Between each optimization pass, iterate over every Instruction and build a
CallbackVH for it. These will get callbacks on certain operations such as
deletion and RAUW'ing. Newly created instructions won't be in your list, so
add them as well. If this is intended as a learning tool to observe LLVM's
inner operations, this will work quite well.
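
Roughly, something like the following (an untested sketch against recent LLVM
headers; InstructionWatcher and watchFunction are names invented for this
example, and header locations differ between LLVM versions):

    #include "llvm/IR/Function.h"
    #include "llvm/IR/InstIterator.h"
    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/ValueHandle.h"
    #include "llvm/Support/raw_ostream.h"

    #include <memory>
    #include <string>
    #include <vector>

    using namespace llvm;

    // One handle per instruction.  LLVM notifies the handle when the tracked
    // value is deleted or has all of its uses replaced (RAUW).
    class InstructionWatcher : public CallbackVH {
      // Textual snapshot taken at registration time, so the instruction can
      // still be described after it has been destroyed.
      std::string Snapshot;

    public:
      explicit InstructionWatcher(Instruction *I) : CallbackVH(I) {
        raw_string_ostream OS(Snapshot);
        I->print(OS);
      }

      void deleted() override {
        errs() << "deleted:" << Snapshot << "\n";
        CallbackVH::deleted(); // default behavior drops the handle
      }

      void allUsesReplacedWith(Value *New) override {
        errs() << "RAUW:" << Snapshot << "  ->  " << *New << "\n";
      }
    };

    // Register a watcher on every instruction currently in F.  Call this
    // again between passes to pick up instructions the previous pass created.
    static std::vector<std::unique_ptr<InstructionWatcher>>
    watchFunction(Function &F) {
      std::vector<std::unique_ptr<InstructionWatcher>> Watchers;
      for (Instruction &I : instructions(F))
        Watchers.push_back(std::make_unique<InstructionWatcher>(&I));
      return Watchers;
    }

Holding each watcher behind a unique_ptr keeps the handle at a stable address,
which sidesteps what happens when a container of value handles reallocates.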

If you need to track what's going on inside LLVM for some purpose that
requires exact tracking, you may want to try running LLVM inside a VM that
instruments every change to any Value or Use and collects a stack trace at
that point (similar to a watchpoint in gdb).

Nick

Thanks, it’s the first case you describe (a learning tool), so that seems like a pretty good idea!