Unnecessary spill/fill issue

Hi, I am using MCJIT in LLVM 3.6 to JIT kernels to x86 AVX2. I’ve noticed some inefficient use of the stack around constant vectors. In one example, I have code that computes a series of constant vectors at compile time. Each vector has a single use. In the final asm, I see every constant vector spilled to the stack at the top of the function, and then each use reads its operand back from the stack slot:

Lots of these at top of function:

movabsq $.LCPI0_212, %rbx
vmovaps (%rbx), %ymm0
vmovaps %ymm0, 2816(%rsp) # 32-byte Spill

Later on, each use references the stack pointer:

vpaddd 2816(%rsp), %ymm4, %ymm1 # 32-byte Folded Reload

The spill to the stack seems unnecessary. In one particularly bad kernel, I have 128 8-wide constant vectors, so 4KB of stack is used just for these constants. I think a better approach would be to load each constant’s address as needed and use the constant pool entry directly as the memory operand:

movabsq $.LCPI0_212, %rbx
vpaddd (%rbx), %ymm4, %ymm1
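
As a quick check of the 4KB figure, here is a small sketch of the arithmetic. It assumes 32-bit lanes, which is consistent with the 32-byte ymm spills shown above but is not stated explicitly; the function name is only illustrative.

```python
# Stack bytes needed if every constant vector gets its own spill slot.
LANES = 8            # 8-wide vectors, as described in the post
BYTES_PER_LANE = 4   # assumption: 32-bit lanes (8 * 4 = 32-byte ymm spill)
NUM_CONSTANTS = 128  # constant vectors in the kernel described above


def spill_bytes(num_vectors, lanes=LANES, bytes_per_lane=BYTES_PER_LANE):
    """Total stack bytes when each vector occupies one full spill slot."""
    return num_vectors * lanes * bytes_per_lane


total = spill_bytes(NUM_CONSTANTS)
print(total, "bytes =", total // 1024, "KiB")
```

With the proposed sequence, none of these slots would be needed, since each use would read the constant pool entry directly.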

Thanks,
Jason

Does anyone have any insight into this problem? Is there a way to minimize excessive spill/fill for this kind of scenario?
Thanks,
Jason

It sounds bad, but I can’t tell from the posted info how to diagnose it.

Can you post a (possibly reduced) example to demonstrate what you’re seeing? A bug report would be even better, so we can track whether there are multiple problems:
https://llvm.org/bugs/

Hi Jason,

I am guessing that the problem is that we do not recognize the sequence as rematerializable because we do not directly load LCPI0_212 into a ymm register.
One way to fix that is to use a pseudo instruction that does the load from the constant to ymm (while defining a dead GPR so that the pseudo can be expanded), then teach the folding code how to deal with that.

Another option is to make the rematerialization smarter, but that is more complicated :).

Cheers,
-Quentin
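
To illustrate the guess above, here is a toy model of the decision, not LLVM's actual implementation. It assumes a simplified rule: a value can be recomputed at its use only if the defining instruction reads no virtual registers. The `Instr` class, the `choose` helper, and the `LOAD_CONST_YMM_PSEUDO` opcode are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    defs: tuple = ()       # virtual registers written
    vreg_uses: tuple = ()  # virtual registers read
    dead_defs: tuple = ()  # defs that are never read afterwards


def is_trivially_rematerializable(instr):
    # Toy rule (assumption): the instruction can be re-executed anywhere
    # only if it does not depend on any virtual register.
    return not instr.vreg_uses


def choose(instr):
    """What the toy allocator does when the defined value must leave its register."""
    return "rematerialize" if is_trivially_rematerializable(instr) else "spill"


# The two-instruction sequence from the original post.
load_addr = Instr("movabsq $.LCPI0_212", defs=("%addr",))
load_vec = Instr("vmovaps (%addr)", defs=("%vec",), vreg_uses=("%addr",))

# Hypothetical pseudo: loads the constant straight into a ymm value and
# defines a dead scratch GPR that its later expansion can use for the address.
pseudo = Instr(
    "LOAD_CONST_YMM_PSEUDO $.LCPI0_212",
    defs=("%vec", "%scratch"),
    dead_defs=("%scratch",),
)

for instr in (load_addr, load_vec, pseudo):
    print(f"{instr.opcode:36s} -> {choose(instr)}")
```

In this model the vector load is spilled because it depends on the address register, while the single pseudo has no register inputs and can be recomputed at each use. The real check in LLVM considers more than this, so treat the rule here as a simplification.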

From: "Quentin Colombet via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Jason" <thesurprises@gmail.com>
Cc: llvm-dev@lists.llvm.org
Sent: Monday, May 9, 2016 5:09:35 PM
Subject: Re: [llvm-dev] Unnecessary spill/fill issue

Another option is to make the rematerialization smarter, but that is
more complicated :).

Making rematerialization smarter, however, is certainly work that would be broadly appreciated.

-Hal