Indirect Branch Representation

Hi,

I was thinking about the ways to represent indirect branch in LLVM. It seems that “Switch instruction” is the only way to logically represent indirect branch inside LLVM.
Is there any other easier way to represent indirect branch in LLVM?

Thanks

Kapil

Yeah, that's essentially it; the only forms of control flow LLVM
supports are branches/switches, calls, and exceptions. There's been
some discussion that this makes it difficult to represent some
constructs, like setjmp/longjmp and the gcc goto from nested function
extension, but nobody has really proposed anything yet.

LLVM supports turning functions with the fastcc calling convention
into tail calls; that can be useful in some cases. Is there something
in particular you're running into issues with?

-Eli

Specifically, I need a way to represent indirect branch instruction (in binary) as an equivalent LLVM instruction. With switch instruction , I would have to list all the possible targets and then initialize the corresponding instruction. I was just thinking whether it might be possible to have some kind of indirect branch where label is a “variable” and not an explicit label present in module.

-Kapil

Oh... I think you're stuck with either a massive switch statement or tail calls.

-Eli

Just to make sure, can the label in a branch instruction be a “variable”?

No; there isn't any way to take the address of a basic block.

-Eli

No, and there never will be a way. In the future, we will probably extend the CFG to better model the GCC "address of label + indirect goto" extension. However, even with that, all the possible destinations of an indirect goto must be explicitly known.

Without knowing the CFG, the compiler cannot do any useful optimization. Eli is right, you probably want lots of tail calls or something.

-Chris

So, that means that &&(Label) operator, which is defined in C++, is also not supported currently in LLVM. I thought I could obtain address of basic block indirectly through this small hack but it does not seem to work.

Actually, I tried to make folloing dummy C++ code which uses this operator:

int main(int argc,char** argv)
{
int x;
int y;
L1: printf(“Hello\n”);
L2: printf(“World\n”);
if(atoi(argv[1]) == 1)
x = &&L1;
else
x = &&L2;
y = x + 10;
return y;
}

( The above code is not correct as we cast from pointer to interger and also not an example of good programming but would anyways compile correctly in GCC)

If I use llvmgcc, I get:

@.str = internal constant [6 x i8] c"Hello\00"
@.str1 = internal constant [6 x i8] c"World\00"

define i32 @main(i32 %argc, i8** %argv) nounwind {

entry: %“alloca point” = bitcast i32 0 to i32
br label %L1
L1: ; preds = %indirectgoto, %indirectgoto, %entry
%tmp1 = call i32 @puts( i8* getelementptr ([6 x i8]* @.str, i32 0, i32
br label %L2
L2: ; preds = %indirectgoto, %L1
%tmp2 = call i32 @puts( i8* getelementptr ([6 x i8]* @.str1, i32 0, i32
%tmp4 = getelementptr i8** %argv, i32 1
%tmp5 = load i8** %tmp4, align 4
%tmp6 = call i32 @atoi( i8* %tmp5 ) nounwind
%tmp7 = icmp eq i32 %tmp6, 1
%tmp78 = zext i1 %tmp7 to i8
%toBool = icmp ne i8 %tmp78, 0
br i1 %toBool, label %bb, label %bb9

bb: ; preds = %L2
ptrtoint i8* inttoptr (i32 1 to i8*) to i32
br label %bb10
bb9: ; preds = %L2
ptrtoint i8* inttoptr (i32 2 to i8*) to i32
br label %bb10

bb10: ; preds = %bb9, %bb
%x.0 = phi i32 [ 2, %bb9 ], [ 1, %bb ]
%tmp12 = add i32 %x.0, 10
br label %return

return:
ret i32 %tmp12

indirectgoto: ; No predecessors!
switch i32 undef, label %L1 [
i32 1, label %L1
i32 2, label %L2
]
}
declare i32 @puts(i8*)
declare i32 @atoi(i8*)

The instruction marked in BOLD should correspond to each other but LLVM code does not seem to capture the logic represented in corresponding C Code. So, should we take care to avoid such kind of label operators in code.

Kapil

So, that means that &&(Label) operator, which is defined in C++, is also not
supported currently in LLVM.

The &&(label) operator works just fine as long as you don't get into
nested functions.

indirectgoto: ; No predecessors!
        switch i32 undef, label %L1 [
                 i32 1, label %L1
                 i32 2, label %L2
        ]

This is the magic that makes it work at the moment.

-Eli

Hi,

The instruction marked in BOLD should correspond to each other but LLVM code
does not seem to capture the logic represented in corresponding C Code. So,
should we take care to avoid such kind of label operators in code.

I'm not sure what you mean: the LLVM IR looks perfectly correct to me.
Presumably you weren't expecting the label pointer to be encoded using
the values 1 and 2? Since the pointer value is not valid outside of
the function in which the label lives, this encoding is perfectly valid,
though not as flexible as you might wish.

Ciao,

Duncan.

This would also help with Fortran assigned gotos.

                                              -Dave

Actually, I am trying to implement a frontend for conversion from binary to LLVM. In binary, we would encounter branch indirect instructions like, Br Reg, where Reg would be an equivalent virtual register in LLVM. So, we can’t directly implement Switch block method here to represent this instruction as we don’t know how the values of Reg are mapped to basic blocks.

I was going through the documentation and found that indirect branch are produced by virtual function calls. Are there any other cases which result into CodeGen procuding Indirect Branch?

Curiously, the previous thread of this list (titled "Indirect Branch
Representation") points towards one of the two possible
implementations of microthreads that I'm aware of:

1) A method inspired in Duff's Device:

2) The GCC label-as-value extension + a goto statement
http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-July/016071.html

Both methods could be used for creating local continuations, which is
the most important requirement for green threads. It could be nice to
have some way of achieving GCC's extension in LLVM. At this point, I
don't have a deep enough knowledge of llvm to figure it out.

Thanks for all your help!

.alvaro.castro.