A use of RDF to extend register Remat

Dear Community,

I would like to discuss few points to use RDF to extend register remat scope. Mr. Krzysztof and I have started discussion this on private mail. But I think now it would be better to include community.
Interested community member kindly previous discussion (at the end of mail) before starting here.

After analyzing if RDF can be used for solving Remat, we think that problem with RDF is that since it is post-RA, rematable register values also will be spilled and all its use will be surrounded with load-spill instructions. So identifying rematable sequence of instructions will be possible with RDF but its uses will not be associated with these instructions with any use-def chain because live-ranges have been split by spill-relaod.

One solution that I can think of is that during RA when a spill code is inserted , it can add a dummy instruction as a maker that can be used after post-RA to identify live-range (reg value) which was spilled and which instructions is using it after reloading (i.e where to use remated value). The post-RA pass can use RDF to construct remat sequence and use it and remove spill/restore as well as the marker instruction (dummy) inserted during the spill code generation. Is this possible with out changing much of Spill code framework? What all steps are required to add such a dummy /debug instruction at MIR ?

In case if RDF graph is not useable for this problem then simply use-def chain on MIR can be traversed in bottom up manner to identify remat instruction sequence.

Please share your thoughts.
Vivek

Hi Vivek,

What is the scope of your extended rematerialization support?

Indeed, rematerialization is usually a way to avoid spilling, but proposed solution sounds more like an optimization to get rid of some load/store pairs.
It seems to me clumsy to extend rematerialization after RA, because that means we have to check and recompute extra things to be able to do it. For instance, we have to rebuild the live-ranges, check for redefinition etc.

In my opinion, if you want to stay away of the register allocator, it would make more sense to explore a rematerialization algorithm working on SSA that uses register pressure to take rematerialization decisions.

Anyhow, whatever is the direction we want to pursue, I believe it is best to start with motivating examples where the current framework is not sufficient and see what would work best.

Cheers,
-Quentin

Hi Vivek,

Hello Quentin,

What is the scope of your extended rematerialization support?

Sorry for staling http://lists.llvm.org/pipermail/llvm-dev/2016-

September/104847.html

but I would like to quote a simple example from previous discussion:
void foo(long);
void bar() {
   for (int i = 0; i < 1600; ++i)
     foo(3494348345984503943);
}

$ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
...
# BB#0: # %entry
...
        lis 3, 12414
        ori 3, 3, 27470
        sldi 3, 3, 32
        oris 3, 3, 35809
        ori 30, 3, 20615
...

So my aim is to remat a value which is constructed with multiple
instructions (mostly on an architecture where complex addressing mode is
not supported)

Indeed, rematerialization is usually a way to avoid spilling, but proposed

solution sounds more like an optimization to get rid of some load/store
pairs.
It seems to me clumsy to extend rematerialization after RA, because that
means we have to check and recompute extra things to be able to do it. For
instance, we have to rebuild the live-ranges, check for redefinition etc.

I agree.

In my opinion, if you want to stay away of the register allocator, it
would make more sense to explore a rematerialization algorithm working on
SSA that uses register pressure to take rematerialization decisions.

Do you here mean working on MIR during RA phase itself ? In that case are
you referring to use size of VirtRegMap to get estimation of register
pressure or any other technique?

Also I have studied a patch for GCC for remat problem (how ever this is for
onny SH architecture) https://gcc.gnu.org/ml/gcc-
patches/2003-12/txt00058.txt
in this patch particularly df.c file . In this patch to find out complex
patterns it just travers use-def chains. I think similar thing can be
implemented at MIR level. I also thinks that it would be better to spend
more time to think about cost functions to decide if it will be good to
remat or not. Also try to address related bug https://llvm.org/bugs/show_
bug.cgi?id=25373#c9

Sincerely,
Vivek

Anyhow, whatever is the direction we want to pursue, I believe it is best