Alloca Requirements

Are there any implicit assumptions about where alloca instructions
can appear? I've got a failing test where the only difference
between the passing and failing versions is one application of
this code in instcombine:

// Convert: malloc Ty, C - where C is a constant != 1 into: malloc [C x Ty], 1

Seems pretty harmless to me.

Later on the instcombine code does this:

// Scan to the end of the allocation instructions, to skip over a block of
// allocas if possible...

That comment makes me a bit suspicious regarding assumptions about alloca
placement.

The interesting thing about this testcase is that the extra instcombine makes
the test pass. If I omit it, the test fails. The only differences in the
asm are stack offsets, which leads me to believe that in the failing test
codegen is not accounting for all allocas properly.

Before this critical instcombine the input code looks like this:

        ; Fails
  %"t$1" = alloca %DV1, align 8 ; <%DV1*> [#uses=3]
  %"t$2" = alloca [1 x [1 x <2 x float>]]*, i32 12, align 8 ; <[1 x [1 x <2 x float>]]**> [#uses=2]
  %tmpcast5 = bitcast [1 x [1 x <2 x float>]]** %"t$2" to %DV2* ; <%DV2*> [#uses=2]
  %"t$34" = alloca [9 x [1 x <2 x float>]*], align 8 ; <[9 x [1 x <2 x float>]*]*> [#uses=3]

Afterward it looks like this:

        ; Passes
  %"t$1" = alloca %DV1, align 8 ; <%DV1*> [#uses=3]
  %"t$26" = alloca [12 x [1 x [1 x <2 x float>]]*], align 8 ; <[12 x [1 x [1 x <2 x float>]]*]*> [#uses=1]
  %"t$26.sub" = getelementptr [12 x [1 x [1 x <2 x float>]]*]* %"t$26", i32 0, i32 0 ; <[1 x [1 x <2 x float>]]**> [#uses=2]
  %tmpcast5 = bitcast [1 x [1 x <2 x float>]]** %"t$26.sub" to %DV2* ; <%DV2*> [#uses=2]
  %"t$34" = alloca [9 x [1 x <2 x float>]*], align 8 ; <[9 x [1 x <2 x float>]*]*> [#uses=3]

Any thoughts on why this might be a problem?

                                   -Dave

> Are there any implicit assumptions about where alloca instructions
> can appear?

Static allocas should appear as a continuous chunk in the entry block,
otherwise other passes might make bad assumptions.

> The interesting thing about this testcase is that the extra instcombine makes
> the test pass. If I omit it, the test fails. The only differences in the
> asm are stack offsets, which leads me to believe that in the failing test
> codegen is not accounting for all allocas properly.

If running a testcase through -instcombine -instcombine gives a result
that isn't identical to -instcombine, that's a bug. Please file it if
you have a reduced testcase.

-Eli

Hi,

> > Are there any implicit assumptions about where alloca instructions
> > can appear?
>
> Static allocas should appear as a continuous chunk in the entry block,
> otherwise other passes might make bad assumptions.

An alloca can appear anywhere, but when it is outside the entry block
some optimizations may not occur. The important distinction is
between allocas that appear in a loop and those that do not.
Rather than detect loops, optimizers tend to just check whether
an alloca is in the entry block or not (the entry block is never part
of a loop).

Ciao,

Duncan.
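
The two conditions mentioned above (allocas sitting in the entry block, and
grouped together ahead of other instructions) can be sketched with a toy
model. This is a minimal sketch and not LLVM code: Inst, Block, Function,
looksStatic and allocasLeadEntryBlock are hypothetical names made up for
illustration, and treating "constant size" as part of being static is an
assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for IR objects (hypothetical, not LLVM's classes).
struct Inst {
  bool isAlloca;
  bool hasConstantSize; // only meaningful when isAlloca is true
};
using Block = std::vector<Inst>;
using Function = std::vector<Block>; // element 0 plays the role of the entry block

// Assumption of this sketch: an alloca counts as "static" when it lives in
// the entry block and its size is a constant. No loop detection is done;
// the block index alone decides.
bool looksStatic(const Function &f, std::size_t blockIdx, std::size_t instIdx) {
  const Inst &i = f[blockIdx][instIdx];
  return i.isAlloca && blockIdx == 0 && i.hasConstantSize;
}

// Returns true when every alloca in the entry block comes before the first
// non-alloca instruction, i.e. the allocas form one leading chunk.
bool allocasLeadEntryBlock(const Function &f) {
  if (f.empty())
    return true;
  bool seenOther = false;
  for (const Inst &i : f[0]) {
    if (!i.isAlloca)
      seenOther = true;
    else if (seenOther)
      return false; // an alloca appears after some other instruction
  }
  return true;
}
```

With an entry block shaped like the listings earlier in the thread (alloca,
alloca, bitcast, alloca), allocasLeadEntryBlock returns false, because the
bitcast sits between two allocas.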

And you really want your allocas in the entry block so they are
implemented by just stack pointer manipulation rather than by calling
alloca(). The latter is slower, and there's also a bug that causes
calls to alloca() to get the alignment wrong (if it's > 8).

Is there a bug number for that? I wonder if that's what I'm hitting.

                                -Dave

> > Are there any implicit assumptions about where alloca instructions
> > can appear?
>
> Static allocas should appear as a continuous chunk in the entry block,
> otherwise other passes might make bad assumptions.

Ok, we should document this.

> > The interesting thing about this testcase is that the extra instcombine
> > makes the test pass. If I omit it, the test fails. The only differences
> > in the asm are stack offsets, which leads me to believe that in the
> > failing test codegen is not accounting for all allocas properly.
>
> If running a testcase through -instcombine -instcombine gives a result
> that isn't identical to -instcombine, that's a bug. Please file it if
> you have a reduced testcase.

No, that's not what I'm doing. I'm limiting the number of transformations
instcombine does in order to binary search and narrow down the specific
transformation that causes the problem (or in this case, masks it).

                               -Dave
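
The bisection described above can be sketched as follows. This is only an
illustration under stated assumptions: findFlippingLimit and the testPasses
callback are hypothetical (the callback stands for "run instcombine capped at
n transformations, then run the test"), and the sketch assumes the outcome
flips exactly once as the cap grows.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: returns the smallest cap in (lo, hi] whose test
// outcome differs from the outcome at `lo`.
// Assumptions: testPasses(lo) != testPasses(hi), and the outcome changes
// exactly once between lo and hi.
unsigned findFlippingLimit(unsigned lo, unsigned hi,
                           const std::function<bool(unsigned)> &testPasses) {
  const bool base = testPasses(lo);
  assert(testPasses(hi) != base && "endpoints must disagree");
  // Invariant: outcome at lo equals base, outcome at hi differs from base.
  while (hi - lo > 1) {
    const unsigned mid = lo + (hi - lo) / 2;
    if (testPasses(mid) == base)
      lo = mid; // still behaves like the low end
    else
      hi = mid; // outcome has already flipped
  }
  return hi; // the hi-th transformation is the one that changes the outcome
}
```

The helper does not care whether the flip goes from failing to passing or the
other way round, so it covers both a transformation that causes a failure and
one that masks it.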

It's #4422.