Break nested instructions?

Is there any pass that breaks an expression out of an instruction’s operand into its own instruction, so that such nested instructions become explicit and are thus easier to work with?

E.g., the following call instruction contains a GEP instruction as its first operand. Is there any pass that would let me break up this:

  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str4, i32 0, i32 0), i32 %tmp6)

into these:

  %tmp = getelementptr inbounds [4 x i8]* @.str4, i32 0, i32 0
  %call = call i32 (i8*, ...)* @printf(i8* %tmp, i32 %tmp6)

?

Thank you in advance,

–istavrak

No, it doesn’t. It contains a GEP constant expression. This is a bit confusing at first, especially when working with IRBuilder, which can sometimes give you constant expressions when you think that you’re asking for instructions. The constant expression, unlike an instruction, has no variable operands and no side effects, and so is guaranteed to be constant.

There are passes that will do the opposite of what you’re requesting (turn side-effect-free instructions with constant operands into constant expressions), but nothing to work the other way around. This makes some things easier (you can easily see that the operand to the call is a constant, without having to look at the sequence of operations that generates it), but other things more difficult (you need to handle GEP instructions and GEP constant expressions).

It would be quite nice to have a set of adaptor classes for the operations that can be either constant expressions or instructions, for use in the places where you don’t care which (just as CallSite wraps either an invoke or a call, in places that don’t need to handle them differently). There are a few things that do make this easier:

- Both Instruction and ConstantExpr are subclasses of Value

- (I think) the OpCode for both will be the same, so you can switch on that and then only cast further if you care.
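
A minimal sketch of what such an adaptor could look like. It is a toy model written only for illustration: the types below merely imitate LLVM's Value hierarchy and do not use the LLVM headers, and the names OperationRef and isGEP are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for LLVM's classes; none of this uses the real LLVM headers.
enum class Opcode { GetElementPtr, Add, Call };

struct Value {
    virtual ~Value() {}
};

struct GlobalVariable : Value {
    std::string name;
    explicit GlobalVariable(std::string n) : name(std::move(n)) {}
};

// An instruction: may have variable operands.
struct Instruction : Value {
    Opcode opcode;
    std::vector<const Value*> operands;
    Instruction(Opcode op, std::vector<const Value*> ops)
        : opcode(op), operands(std::move(ops)) {}
};

// A constant expression: same opcodes, but only constant operands.
struct ConstantExpr : Value {
    Opcode opcode;
    std::vector<const Value*> operands;
    ConstantExpr(Opcode op, std::vector<const Value*> ops)
        : opcode(op), operands(std::move(ops)) {}
};

// Hypothetical adaptor: wraps either kind so that callers can look at the
// opcode and operands without caring which one they were given.
class OperationRef {
public:
    explicit OperationRef(const Value* v)
        : inst_(dynamic_cast<const Instruction*>(v)),
          expr_(dynamic_cast<const ConstantExpr*>(v)) {}

    bool valid() const { return inst_ != nullptr || expr_ != nullptr; }
    bool isConstant() const { return expr_ != nullptr; }

    Opcode opcode() const {
        assert(valid() && "not an instruction or constant expression");
        return inst_ ? inst_->opcode : expr_->opcode;
    }

    const std::vector<const Value*>& operands() const {
        assert(valid() && "not an instruction or constant expression");
        return inst_ ? inst_->operands : expr_->operands;
    }

private:
    const Instruction* inst_;
    const ConstantExpr* expr_;
};

// True for GEP instructions and GEP constant expressions alike.
bool isGEP(const Value* v) {
    OperationRef op(v);
    return op.valid() && op.opcode() == Opcode::GetElementPtr;
}
```

An analysis would call isGEP(operand) and only ask isConstant() in the places where the difference matters. In LLVM itself, the Operator class and subclasses such as GEPOperator fill a similar role.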

David

Thanks, David. I had misunderstood and thought that the arguments were GEP instructions.
Their being GEP constant expressions solves my issue!
Thanks a lot again,

--istavrak

I think it's important to understand that this is not ALWAYS the case. The
operands can be either a constant expression or an instruction.

For example:

     char *a;
     int x;

     if(cond) a = "%d";
     else a = "%x";
     printf(a, x);

In this case, the compiler may still be able to fold `a` to a constant
expression, depending on whether it can deduce `cond`. But if `cond` is
not "constant", the operand passed to printf will be the result of an
instruction (for example a phi, select or load), not a constant expression.
(Of course, the compiler will probably also warn that passing a variable to
printf is a bad idea, but that's a different matter)

Does this mean that we can have an instruction nested inside another instruction?! Wouldn't that lead to an LLVM IR language without terminals?

--istavrak

Interesting point, I'm not sure - but the operand of an "instruction" is a
Value, so I expect it can be any type that is within the Value class
hierarchy?

I expected the same. That's why I was searching for a way to "break" the inlined instructions, as nested instructions are not convenient for me.
In case the operands of instructions are always meant to be constant exprs, then it's simple to handle them differently by having different adaptor classes
as David proposed before.

--istavrak

The operands of instructions are always values. There’s some confusion in the terminology here, because the language reference (along with the human-readable serialisation, and most SSA textbooks) refers to registers.

Every instruction that has a value (i.e. basically everything except call instructions that return void) implicitly defines a new register. In the C++ form of the IR, the register value and the instruction are not distinct: the Instruction is a subclass of Value and is used directly. This works because of the SSA form: registers are only defined once, by one instruction, so using a pointer to the Instruction works fine (the only bookkeeping required is to have a name associated with the instruction).

LLVM does not allow nested instructions. Functions contain basic blocks, basic blocks contain instructions. Instructions *refer to* other values as operands. These values are either local registers (other instructions), global values, or constant expressions. Constant expressions can only refer to globals or other constant expressions (and then, only to the address of the global, which is a constant, not to its value).

David

The LLVM language allows an operand (as it is a Value) to refer to an Argument, BasicBlock, or User (-> Constant, Instruction, Operator). So, an operand of an instruction can be an instruction. But doesn’t this mean that we CAN have an “inlined” instruction inside another one?! How can we say that LLVM doesn’t allow nested instructions?! Sorry if I am missing something here, but this confuses me.

--istavrak

The operand is not the instruction, it’s the *result* of the instruction. In LLVM IR’s C++ representation, the result of the instruction is represented as a pointer to the instruction, but there is no nesting. In particular, the same instruction result can be used as an operand of multiple instructions (or multiple times as operands of the same instruction), but there is no nesting - the instruction only exists in one place (with a basic block as the parent) and is simply *referenced* from other instructions.

In contrast, constant expressions (because they are atemporal - they do not have to be evaluated at any particular point, as long as the results are available at the relevant instruction) are referenced directly from within an instruction; they are not contained within a basic block.
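
The difference between referencing and nesting can be sketched with a toy model. This is an illustration only: the types imitate LLVM's but do not use its headers, and countUses is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only; these are not the real LLVM classes.
struct Value {
    std::string name;
    explicit Value(std::string n) : name(std::move(n)) {}
    virtual ~Value() {}
};

struct BasicBlock;

struct Instruction : Value {
    BasicBlock* parent;            // the single place where this instruction lives
    std::vector<Value*> operands;  // pointers to other values, never copies
    Instruction(std::string n, std::vector<Value*> ops)
        : Value(std::move(n)), parent(nullptr), operands(std::move(ops)) {}
};

struct BasicBlock {
    std::vector<Instruction*> instructions;
    void append(Instruction* inst) {
        inst->parent = this;
        instructions.push_back(inst);
    }
};

// Count how many operand slots in the block refer to `v`.
std::size_t countUses(const BasicBlock& bb, const Value* v) {
    std::size_t uses = 0;
    for (const Instruction* inst : bb.instructions)
        for (const Value* op : inst->operands)
            if (op == v)
                ++uses;
    return uses;
}
```

If %prod uses %sum twice and %diff uses it once, countUses reports three uses of %sum, yet the block still contains %sum exactly once: the users hold pointers to it, and none of them contains it.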

In code generation, constant expressions will be turned into either a single numeric value or into a combination of relocations.

David

Sorry for causing confusion; what David says is entirely correct: if the
operand of an instruction is another instruction, the instruction itself
is never directly the operand; rather, the result of that instruction is
passed as the operand value.

Dear Irini,

I wrote a pass for SAFECode that converts constant expressions into instructions. It is the BreakConstantGEP pass in SAFECode (http://llvm.org/viewvc/llvm-project/safecode/branches/release_32/lib/ArrayBoundChecks/BreakConstantGEPs.cpp?view=log). SAFECode has to change these to instructions because it needs to modify their results at run-time; you can update the code to a newer version of LLVM and use it if you wish.

That said, converting constant expressions into instructions is almost always a bad idea. The compiler takes advantage of the fact that constant expressions are, well, constant to generate more efficient code. You should only convert constant expressions into instructions if you're going to make the constant non-constant (which is what SAFECode does). If you're trying to analyze LLVM IR, you should enhance your pass to understand constant expressions.
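
For readers who only want the general shape of such a transformation, here is a rough sketch on a toy model. It is not the SAFECode pass and does not use the LLVM API; the types and the names lower and breakConstantExprs are hypothetical, and real IR has extra cases (phi nodes, for instance) that this toy ignores.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these types imitate LLVM's but share no code with it.
struct Value {
    virtual ~Value() {}
};

struct Global : Value {
    std::string name;
    explicit Global(std::string n) : name(std::move(n)) {}
};

struct ConstantExpr : Value {
    std::string opcode;
    std::vector<Value*> operands;  // globals or other constant expressions
    ConstantExpr(std::string op, std::vector<Value*> ops)
        : opcode(std::move(op)), operands(std::move(ops)) {}
};

struct Instruction : Value {
    std::string opcode;
    std::vector<Value*> operands;
    Instruction(std::string op, std::vector<Value*> ops)
        : opcode(std::move(op)), operands(std::move(ops)) {}
};

typedef std::vector<std::unique_ptr<Instruction> > InstList;

// Append to `out` an instruction equivalent to `ce`, after first lowering
// any constant expressions among its operands. Returns the new instruction.
Instruction* lower(ConstantExpr* ce, InstList& out) {
    std::vector<Value*> ops;
    for (Value* op : ce->operands) {
        ConstantExpr* inner = dynamic_cast<ConstantExpr*>(op);
        ops.push_back(inner ? lower(inner, out) : op);
    }
    out.push_back(std::unique_ptr<Instruction>(new Instruction(ce->opcode, ops)));
    return out.back().get();
}

// Rewrite `block` so that no instruction has a constant-expression operand.
// Each new instruction is placed before its user. Returns how many
// instructions were created.
std::size_t breakConstantExprs(InstList& block) {
    InstList result;
    std::size_t created = 0;
    for (std::unique_ptr<Instruction>& inst : block) {
        for (Value*& op : inst->operands) {
            if (ConstantExpr* ce = dynamic_cast<ConstantExpr*>(op)) {
                std::size_t before = result.size();
                op = lower(ce, result);
                created += result.size() - before;
            }
        }
        result.push_back(std::move(inst));
    }
    block.swap(result);
    return created;
}
```

Applied to a block whose call takes a GEP constant expression as its first operand, this yields a new getelementptr instruction immediately before the call, and the call's first operand then points at that instruction.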

Regards,

John Criswell