Predication before CodeGen

Hi,

I am planning to generate code for a peculiar architecture with no branch instructions (!), but with predicated loads and stores to memory. This means the architecture is not Turing complete, is going to waste a lot of computation, and any input program that can hope to get compiled for this architecture must have loops that can be fully unrolled, and all its functions must get fully inlined.

Since I have no branches, I think I will need to predicate code before DAG Selection. I was thinking of doing this:

  1. Run an analysis on the LLVM bitcode that calculates the condition under which an instruction will be “reached” – this is straightforward since my input program is not allowed to have cycles in the CFG.
  2. Run a pass that requires the above analysis and uses it to:
  • merge all basic blocks in topological sort order (which exists, because CFG is acyclic).
  • insert appropriate instructions to generate the predicates.
  • change all PHI-nodes into Select nodes.
  • predicate memory operations (well, at least the stores).

It is this final predication step that I am not sure how to handle. Since LLVM does not have predicated load/store instructions, will I have to upgrade the memory operations to a call or something that would then have to be custom handled by the SelectionDAG mechanism?

Or should I just leave the load/stores alone, and somehow, later after the machine instructions are created predicate them using this predicate information? If so, how do I ensure that the dependency on the predicate generating instructions is preserved (the predicate instructions are not dce’d away)?

Or should I be trying to do the predication right before Legalize? If so, I’ll need to think about it a little more.

Thank you for your time :slight_smile:
nikhil

Hi,

I am planning to generate code for a peculiar architecture with _no_ branch
instructions (!), but with predicated loads and stores to memory. This means
the architecture is not Turing complete, is going to waste a lot of
computation, and any input program that can hope to get compiled for this
architecture must have loops that can be fully unrolled, and all its
functions must get fully inlined.

Since I have no branches, I think I will need to predicate code before DAG
Selection. I was thinking of doing this:

oops, I meant to say before "Initial SelectionDAG construction"

Hi,

I am planning to generate code for a peculiar architecture with _no_ branch instructions (!), but with predicated loads and stores to memory. This means the architecture is not Turing complete, is going to waste a lot of computation, and any input program that can hope to get compiled for this architecture must have loops that can be fully unrolled, and all its functions must get fully inlined.

Since I have no branches, I think I will need to predicate code before DAG Selection. I was thinking of doing this:
1) Run an analysis on the LLVM bitcode that calculates the condition under which an instruction will be "reached" -- this is straightforward since my input program is not allowed to have cycles in the CFG.
2) Run a pass that requires the above analysis and uses it to:
   - merge all basic blocks in topological sort order (which exists, because CFG is acyclic).
   - insert appropriate instructions to generate the predicates.
   - change all PHI-nodes into Select nodes.
   - predicate memory operations (well, at least the stores).

I am not aware of a mechanism to completely remove branches for any general program. Don't you have to restrict it a loop with known iteration count?

It is this final predication step that I am not sure how to handle. Since LLVM does not have predicated load/store instructions, will I have to upgrade the memory operations to a call or something that would then have to be custom handled by the SelectionDAG mechanism?

Target specific instructions can have predicate operands. See ARM. I've implemented a if-converter but it runs after register allocation so it won't fit your needs.

Ideally (because that's what everyone does) you want a pass that operate on 3-address machine specific instructions. Then you can predicate code and merge blocks together before you do scheduling. The complication our codegen scheme is we do instruction scheduling on DAGs and only translate into 3-address code at the end of scheduling.

Sometime down the road we will be implementing whole function instruction scheduling. Once that's done, you can perhaps do a predication pass that operate on the DAG that represent the whole function. But that's not going to help you right now. So for now, your choice is either 1) Operate on DAGs that represent basic blocks, predicate nodes and merge multiple DAGs into a larger DAG. 2) Operate on the machine instructions and machine basic blocks after instruction scheduling and redo scheduling afterwards.

Evan

> Hi,
>
> I am planning to generate code for a peculiar architecture with
> _no_ branch instructions (!), but with predicated loads and stores
> to memory. This means the architecture is not Turing complete, is
> going to waste a lot of computation, and any input program that can
> hope to get compiled for this architecture must have loops that can
> be fully unrolled, and all its functions must get fully inlined.
>
> Since I have no branches, I think I will need to predicate code
> before DAG Selection. I was thinking of doing this:
> 1) Run an analysis on the LLVM bitcode that calculates the
> condition under which an instruction will be "reached" -- this is
> straightforward since my input program is not allowed to have
> cycles in the CFG.
> 2) Run a pass that requires the above analysis and uses it to:
> - merge all basic blocks in topological sort order (which
> exists, because CFG is acyclic).
> - insert appropriate instructions to generate the predicates.
> - change all PHI-nodes into Select nodes.
> - predicate memory operations (well, at least the stores).

I am not aware of a mechanism to completely remove branches for any
general program. Don't you have to restrict it a loop with known
iteration count?

Yes, I am going to have to live with that restriction. I also need to
assume no recursive functions, so all calls can be fully inlined.

>
> It is this final predication step that I am not sure how to handle.
> Since LLVM does not have predicated load/store instructions, will I
> have to upgrade the memory operations to a call or something that
> would then have to be custom handled by the SelectionDAG mechanism?

Target specific instructions can have predicate operands. See ARM.
I've implemented a if-converter but it runs after register allocation
so it won't fit your needs.

Ideally (because that's what everyone does) you want a pass that
operate on 3-address machine specific instructions. Then you can
predicate code and merge blocks together before you do scheduling.
The complication our codegen scheme is we do instruction scheduling
on DAGs and only translate into 3-address code at the end of scheduling.

Sometime down the road we will be implementing whole function
instruction scheduling. Once that's done, you can perhaps do a
predication pass that operate on the DAG that represent the whole
function. But that's not going to help you right now. So for now,
your choice is either 1) Operate on DAGs that represent basic blocks,
predicate nodes and merge multiple DAGs into a larger DAG. 2)
Operate on the machine instructions and machine basic blocks after
instruction scheduling and redo scheduling afterwards.

Thanks, that solves my confusion. Actually I did think of this earlier
but rejected the idea since this will mean that the Legalize phase is
going to generate "illegal" output -- it will still have branches in
it. I guess that can't be avoided anyway, because all input programs
aren't even legal :slight_smile:

Also, is there a straightforward way to directly go from LLVM bitcode
to some generic form of machine instructions -- something like
lib/Target/Generic? The docs mention an "old-style 'simple'
instruction selector, which effectively peephole selects each LLVM
instruction into a series of machine instructions." Is this still
available?

Thanks,
nikhil

Also, is there a straightforward way to directly go from LLVM bitcode
to some generic form of machine instructions -- something like
lib/Target/Generic? The docs mention an "old-style 'simple'
instruction selector, which effectively peephole selects each LLVM
instruction into a series of machine instructions." Is this still
available?

We think SelectionDAG based instruction selector is pretty straight forward. :slight_smile: The old style simple instruction selector isn't being used anymore. I've fixed the documentation.

Evan