x86-64 large stack offsets

Hey guys,

I’m working on a bug for x86-64 in LLVM 2.9. Well, it’s actually two issues. The assembly generated for large stack offsets has an overflow, and once the overflow is fixed, the displacement is too large for GNU ld to handle.

#include <stdio.h>

void fool( long n )
{
    double w[268435600];   /* roughly 2 GiB each */
    double z[268435600];
    unsigned long i;
    for ( i = 0; i < n; i++ ) {
        w[i] = 1.0;
        z[i] = 2.0;
    }
    printf(" n: %ld, W %g Z %g\n", n, w[1], z[1] );
}

Here’s one of the offending instructions produced by 2.9:

movsd -2147482472(%rsp), %xmm0

Fixing the displacement overflow is pretty easy. It’s just a matter of changing a few variable types in LLVM from unsigned to uint64_t in the functions that calculate the stack offsets. The real trouble I’m having is finding a good place to break up the displacements during lowering. I would like the offset to be calculated the way gcc does it:

movabsq $-4294969640, %rdx
movsd 0(%rbp,%rdx), %xmm0

Any suggestions on the correct lowering pass to do a transformation like this? I’m an LLVM noob, so I’m not sure where it should go.

Tx,
Cameron

To be pedantic… use of the frame pointer isn’t necessary. The stack pointer would be fine. That’s just how GCC calculates the offset for this test case.

Hi Cameron,

As you have noticed, the x86 backend only supports stack frames up to 2GB.

Fixing that would require the x86 backend to use the register scavenger during prolog epilog insertion like the ARM backend does. That particular code was very difficult to get right, and no one has thought it was worth the trouble to get it working for x86.

Your life will be a whole lot easier if you just use malloc().

/jakob

Jakob Stoklund Olesen <stoklund@2pi.dk> writes:

Hi Jakob,

Thanks for the responses.

> As you have noticed, the x86 backend only supports stack frames up to 2GB.

That's unfortunate. :(

> Fixing that would require the x86 backend to use the register
> scavenger during prolog epilog insertion like the ARM backend does.

Makes sense.

> That particular code was very difficult to get right, and no one has
> thought it was worth the trouble to get it working for x86.

I wouldn't imagine so, since these kinds of large stack objects are
rather rare in the C world. They are somewhat more common in the
Fortran world. :)

> Your life will be a whole lot easier if you just use malloc().

Perhaps. This is customer-written code and they will (probably) not be
willing to change it. We could replace the allocas with malloc/free
under the hood but we haven't needed to do that on past platforms. It's
certainly a mildly large change in our compiler in the sense of how
resources get allocated. It is certainly doable but for various reasons
may be undesirable.

Do you have a feel for the complexity involved with the ARM code? What
were the troublesome parts and corner cases, etc.?

                             -Dave

The register scavenger depends on correct kill flags after register allocation. It will assert if it detects something hinky.

Adding scavenger support to the x86 PEI is fairly simple. Fixing the many scavenger assertions that follow is not.

The fundamental problem is that currently nothing in the x86 target depends on correct liveness after register allocation. Since many post-RA passes modify the code, there is a good chance they will update liveness incorrectly, or not at all in some cases.

Your patches to clean this up will be welcome, but please put x86's use of the register scavenger behind a command-line flag. The majority of x86 users don't want to pay the (small) compile-time cost and risk of assertions.

The machine code verifier checks for the same things as the scavenger, so it should be a help to you.

/jakob

> The fundamental problem is that currently nothing in the x86 target depends on correct liveness after register allocation. Since many post-RA passes modify the code, there is a good chance they will update liveness incorrectly, or not at all in some cases.

I must note that there are not so many post-RA passes on x86
(compared to ARM), so things might be a bit easier.

> Your patches to clean this up will be welcome, but please put x86's use of the register scavenger behind a command-line flag. The majority of x86 users don't want to pay the (small) compile-time cost and risk of assertions.

I did this once ~1.5 years ago - it was more or less fine, I believe I
saw 4-5 assertions during the whole llvm-gcc bootstrap.

Hi,

>> As you have noticed, the x86 backend only supports stack frames up to 2GB.

> That's unfortunate. :(

yes, see PR10488.

Ciao, Duncan.

Hi Jakob,

> The fundamental problem is that currently nothing in the x86 target depends on correct liveness after register allocation. Since many post-RA passes modify the code, there is a good chance they will update liveness incorrectly, or not at all in some cases.

> Your patches to clean this up will be welcome, but please put x86's use of the register scavenger behind a command-line flag. The majority of x86 users don't want to pay the (small) compile-time cost and risk of assertions.

wouldn't adding a command line flag just make this feature less tested and
more unreliable? It hardly seems worth adding it if it is so unreliable
that it typically has to be turned off.

Ciao, Duncan.

If Cameron and David can clean things up so it is stable, we can consider turning it on by default. If not, we can remove it again.

I don't know how big the problem is. The trickle of scavenger assertions from ARM has slowed down.

/jakob