Missed optimisation opportunities?

I’m writing a front end for an existing interpreted language with slightly odd semantics for primitive values.

Similar to the values in a database table, any value could be null, even for non-pointer types.

For example a boolean variable could be true, false, or null.

To model this behaviour, I’m passing an {i1, [type]} around for every numeric type. And using insertvalue / extractvalue all over the place.

So when compiling this simple CS example;

function long fib (long al_x)
long ll_i=0, ll_ret=0, ll_last=0, ll_last2=1

if isnull(al_x) then
return al_x
end if

ll_i=1
do while ll_i<=al_x
ll_ret=ll_last+ll_last2
ll_last2=ll_last
ll_last=ll_ret
ll_i++
loop
return ll_ret
end function

With the following function passes;
FunctionPasses->add(new DataLayout(*JITEngine->getDataLayout()));
FunctionPasses->add(createBasicAliasAnalysisPass());
FunctionPasses->add(createPromoteMemoryToRegisterPass());
FunctionPasses->add(createInstructionCombiningPass());
FunctionPasses->add(createReassociatePass());
FunctionPasses->add(createGVNPass());
FunctionPasses->add(createCFGSimplificationPass());

Which does a pretty good job of cleaning up most of the messy & verbose code generated by my front end.

I end up with the following bitcode;

define x86_stdcallcc { i1, i32 } @fib({ i1, i32 } %arg_al_x) {
entry:
%“3_4” = extractvalue { i1, i32 } %arg_al_x, 0
br i1 %“3_4”, label %Line15_Offset120, label %Line8_Offset38

Line8_Offset38: ; preds = %Line9_Offset52, %entry
%ll_last2.0 = phi { i1, i32 } [ %ll_last.0, %Line9_Offset52 ], [ { i1 false, i32 1 }, %entry ]
%ll_last.0 = phi { i1, i32 } [ %“9_64_composed”, %Line9_Offset52 ], [ zeroinitializer, %entry ]
%ll_i.0 = phi { i1, i32 } [ %“12_98”, %Line9_Offset52 ], [ { i1 false, i32 1 }, %entry ]
%“8_38_isnull” = extractvalue { i1, i32 } %ll_i.0, 0
%“8_38_value” = extractvalue { i1, i32 } %ll_i.0, 1
%“8_42_value” = extractvalue { i1, i32 } %arg_al_x, 1
%“8_46_value” = icmp sle i32 %“8_38_value”, %“8_42_value”
%“8_48_not_null” = xor i1 %“8_38_isnull”, true
%“8_48_and_not_null” = and i1 %“8_46_value”, %“8_48_not_null”
br i1 %“8_48_and_not_null”, label %Line9_Offset52, label %Line15_Offset120

Line9_Offset52: ; preds = %Line8_Offset38
%“9_60_isnull” = extractvalue { i1, i32 } %ll_last2.0, 0
%“9_56_isnull” = extractvalue { i1, i32 } %ll_last.0, 0
%“9_64_isnull” = or i1 %“9_56_isnull”, %“9_60_isnull”
%“9_56_value” = extractvalue { i1, i32 } %ll_last.0, 1
%“9_60_value” = extractvalue { i1, i32 } %ll_last2.0, 1
%“9_64” = add i32 %“9_56_value”, %“9_60_value”
%0 = insertvalue { i1, i32 } zeroinitializer, i1 %“9_64_isnull”, 0
%“9_64_composed” = insertvalue { i1, i32 } %0, i32 %“9_64”, 1
%“12_98_value” = add i32 %“8_38_value”, 1
%1 = insertvalue { i1, i32 } zeroinitializer, i1 %“8_38_isnull”, 0
%“12_98” = insertvalue { i1, i32 } %1, i32 %“12_98_value”, 1
br label %Line8_Offset38

Line15_Offset120: ; preds = %Line8_Offset38, %entry
%_return_value.0 = phi { i1, i32 } [ %arg_al_x, %entry ], [ %ll_last.0, %Line8_Offset38 ]
ret { i1, i32 } %_return_value.0
}

None of the local variables in this example can ever be null. Each of these extractvalue expressions will always evaluate to i1 0;

%“8_38_isnull” = extractvalue { i1, i32 } %ll_i.0, 0
%“9_60_isnull” = extractvalue { i1, i32 } %ll_last2.0, 0
%“9_56_isnull” = extractvalue { i1, i32 } %ll_last.0, 0

Is there an existing function pass I could add that would clean this up? Or is this example just slightly too complex?

Jeremy Lakeman wrote:

I'm writing a front end for an existing interpreted language with
slightly odd semantics for primitive values.
Similar to the values in a database table, any value could be null, even
for non-pointer types.
For example a boolean variable could be true, false, or null.
To model this behaviour, I'm passing an {i1, [type]} around for every
numeric type. And using insertvalue / extractvalue all over the place.

So when compiling this simple CS example;

function long fib (long al_x)
long ll_i=0, ll_ret=0, ll_last=0, ll_last2=1

if isnull(al_x) then
     return al_x
end if

ll_i=1
do while ll_i<=al_x
     ll_ret=ll_last+ll_last2
     ll_last2=ll_last
     ll_last=ll_ret
     ll_i++
loop
return ll_ret
end function

With the following function passes;
         FunctionPasses->add(new DataLayout(*JITEngine->getDataLayout()));
         FunctionPasses->add(createBasicAliasAnalysisPass());
         FunctionPasses->add(createPromoteMemoryToRegisterPass());
         FunctionPasses->add(createInstructionCombiningPass());
         FunctionPasses->add(createReassociatePass());
         FunctionPasses->add(createGVNPass());
         FunctionPasses->add(createCFGSimplificationPass());

Which does a pretty good job of cleaning up most of the messy & verbose
code generated by my front end.

I end up with the following bitcode;

define x86_stdcallcc { i1, i32 } @fib({ i1, i32 } %arg_al_x) {
entry:
   %"3_4" = extractvalue { i1, i32 } %arg_al_x, 0
   br i1 %"3_4", label %Line15_Offset120, label %Line8_Offset38

Line8_Offset38: ; preds =
%Line9_Offset52, %entry
   %ll_last2.0 = phi { i1, i32 } [ %ll_last.0, %Line9_Offset52 ], [ { i1
false, i32 1 }, %entry ]
   %ll_last.0 = phi { i1, i32 } [ %"9_64_composed", %Line9_Offset52 ], [
zeroinitializer, %entry ]
   %ll_i.0 = phi { i1, i32 } [ %"12_98", %Line9_Offset52 ], [ { i1
false, i32 1 }, %entry ]
   %"8_38_isnull" = extractvalue { i1, i32 } %ll_i.0, 0
   %"8_38_value" = extractvalue { i1, i32 } %ll_i.0, 1
   %"8_42_value" = extractvalue { i1, i32 } %arg_al_x, 1
   %"8_46_value" = icmp sle i32 %"8_38_value", %"8_42_value"
   %"8_48_not_null" = xor i1 %"8_38_isnull", true
   %"8_48_and_not_null" = and i1 %"8_46_value", %"8_48_not_null"
   br i1 %"8_48_and_not_null", label %Line9_Offset52, label
%Line15_Offset120

Line9_Offset52: ; preds = %Line8_Offset38
   %"9_60_isnull" = extractvalue { i1, i32 } %ll_last2.0, 0
   %"9_56_isnull" = extractvalue { i1, i32 } %ll_last.0, 0
   %"9_64_isnull" = or i1 %"9_56_isnull", %"9_60_isnull"
   %"9_56_value" = extractvalue { i1, i32 } %ll_last.0, 1
   %"9_60_value" = extractvalue { i1, i32 } %ll_last2.0, 1
   %"9_64" = add i32 %"9_56_value", %"9_60_value"
   %0 = insertvalue { i1, i32 } zeroinitializer, i1 %"9_64_isnull", 0
   %"9_64_composed" = insertvalue { i1, i32 } %0, i32 %"9_64", 1
   %"12_98_value" = add i32 %"8_38_value", 1
   %1 = insertvalue { i1, i32 } zeroinitializer, i1 %"8_38_isnull", 0
   %"12_98" = insertvalue { i1, i32 } %1, i32 %"12_98_value", 1
   br label %Line8_Offset38

Line15_Offset120: ; preds =
%Line8_Offset38, %entry
   %_return_value.0 = phi { i1, i32 } [ %arg_al_x, %entry ], [
%ll_last.0, %Line8_Offset38 ]
   ret { i1, i32 } %_return_value.0
}

None of the local variables in this example can ever be null. Each of
these extractvalue expressions will always evaluate to i1 0;

   %"8_38_isnull" = extractvalue { i1, i32 } %ll_i.0, 0
   %"9_60_isnull" = extractvalue { i1, i32 } %ll_last2.0, 0
   %"9_56_isnull" = extractvalue { i1, i32 } %ll_last.0, 0

Is there an existing function pass I could add that would clean this up?
Or is this example just slightly too complex?

I don't think any pass in llvm will clean this up, but you can check that by running opt -O2. Off the top of my head, I think the place for this optimization would be jump-threading, by changing it to reason about individual members inside a struct.

Nick

Note that I am not an expert in this code and haven't actually tested any of what I'm about to suggest. I'm purely speculating after a quick read through code I'm otherwise unfamiliar with...

The key fact missed here seems to be that the input to the phi (ll_last2.0) are both insertvalue instructions or constant values (and thus implicitly one). With this information, the phi node could be pushed through the insertvector to do a phi of the non-constant part of the value and an insertvalue to add the constant piece. This actually seems fairly similar to the FoldPHIArgGEPIntoPHI transformation which is currently done for GEPs by InstrCombine. It might be worth looking into whether there's an analogous transformation for insertelement you could add. Once this was done, the extractvalue(insertvalue) patterns should remove the redundant constants.

While this would be profitable in your case, I'm unclear whether this would be a generally profitable transformation to make.

Another option would be to manually destruct your records into their component pieces. This would introduce separate phi nodes for each field and should enable the null checks to be mostly removed in this example.

Philip