Implementing traditional Available Expressions Analysis?


Currently I am in the process of understanding the LLVM code base. For this purpose I am trying to implement the traditional Available Expressions analysis (as in ALSU, based on lexical aspect of an expression).

First, we have to define the notion of an expressions. And, given an instruction, we have to define expressions generated by the Instruction and other expressions killed by this instruction. Say for example, in the following code segment

%1 = call i8* @malloc(i32 4) ; <i8*> [#uses=1]
%2 = bitcast i8* %1 to i32* ; <i32*> [#uses=4]
store i32 5, i32* %2
%3 = load i32* %2
; [#uses=1]
%4 = add nsw i32 %3, 10 ; [#uses=0]
store i32 20, i32* %2
%5 = load i32* %2 ; [#uses=1]
%6 = add nsw i32 %5, 10

the two load instructions %3 and %5, I should refer to the same expression. Also, when processing the second store instruction, in particular, it should kill the instruction (expression) labelled %3.

My questions are,

  1. How to represent an expression?
  2. when processing a statement how to identify the instructions (expressions) it can kill? (As I have already mentioned, I am concentrating on the lexical aspect of an Instruction and not on the value based aspect, as in SSA. Hence the need for doing ‘kill’.)