question on instruction selection

Hi,

I was wondering how exactly instruction selection works in LLVM. As I'm not aware of any document describing it, I'll ask here :slight_smile:

So, the tablegen files that each backend implements describe the mapping between selection DAG nodes and assembly instructions (and optionally? their binary encoding). The selection DAG (one per basic block) is produced "by hand" directly from the LLVM IR. This pass is independent of the target architecture. Therefore, implementing a new backend is "just" about specifying patterns that convert selection DAG nodes to the target's assembly.

Is my understanding correct? Also, is there any document describing the selection DAG nodes that one needs to match against? And what's the algorithm behind this selection DAG?

I'm also wondering what would be the possible improvement of the approach presented last Friday at POPL (http://portal.acm.org/citation.cfm?doid=1706299.1706346 or http://www.eecs.tufts.edu/~dias/gentileset.pdf) for LLVM. Any insight? The approach seems similar to what gcc does or did at least (IR->RTL->ASM), so I'm not entirely sure there would be something to gain here for LLVM. It would be nice to generate the selection dag from LLVM IR automatically, though.

Thanks,
Nuno

Hi,

I was wondering how exactly instruction selection works in LLVM. As I'm not
aware of any document describing it, I'll ask here :slight_smile:

So, the tablegen files that each backend implements describe the mapping
between selection DAG nodes and assembly instructions (and optionally? their
binary encoding).

Yes. And yes, the binary encoding is optional. Obviously, features like JIT
compilation require it though.

The selection DAG (one per basic block) is produced "by
hand" directly from the LLVM IR.

Yes.

This pass is independent of the target
architecture.

Not entirely. There are several virtual functions which target-specific
code overrides for specifying things like how arguments, return values,
and calls are lowered.

Therefore, implementing a new backend is "just" about
specifying patterns that convert selection DAG nodes to the target's
assembly.

That's just a part of it. There's also the description of the target's
register sets, ABI information, assembler information, target-specific
lowering, and a variety of other things. Take a look at an actual
target.

Is my understanding correct? Also, is there any document describing the
selection DAG nodes that one needs to match against?

The "how to write a backend" documents are relevant. As are existing
targets. The comments on the opcodes in
include/CodeGen/SelectionDAGNodes.h are fairly descriptive as well.

And what's the
algorithm behind this selection DAG?

The underlying algorithm is a fairly simple bottom-up tiling.

I'm also wondering what would be the possible improvement of the approach
presented last Friday at POPL
(http://portal.acm.org/citation.cfm?doid=1706299.1706346 or
http://www.eecs.tufts.edu/~dias/gentileset.pdf) for LLVM. Any insight? The
approach seems similar to what gcc does or did at least (IR->RTL->ASM), so
I'm not entirely sure there would be something to gain here for LLVM. It
would be nice to generate the selection dag from LLVM IR automatically,
though.

What are you looking to do?

Dan

Many thanks for your reply, Dan!

The "how to write a backend" documents are relevant. As are existing
targets. The comments on the opcodes in
include/CodeGen/SelectionDAGNodes.h are fairly descriptive as well.

Ah, I see. There's actually much more documentation than I though :slight_smile: Thanks for the pointers.

I'm also wondering what would be the possible improvement of the approach
presented last Friday at POPL
(http://portal.acm.org/citation.cfm?doid=1706299.1706346 or
http://www.eecs.tufts.edu/~dias/gentileset.pdf) for LLVM. Any insight? The
approach seems similar to what gcc does or did at least (IR->RTL->ASM), so
I'm not entirely sure there would be something to gain here for LLVM. It
would be nice to generate the selection dag from LLVM IR automatically,
though.

What are you looking to do?

For now, I'm just trying to understand what's the main contribution of this paper towards simplifying the retargeting of a compiler. Don't get me wrong; I do not want to bash the paper; I just feel that something is escaping me. The approach proposed seems to be fairly similar to what gcc and LLVM do. What makes me wondering is why their algorithm needs heavy reasoning to do semantic equivalence checking, while gcc & LLVM only need simple pattern matching. That's why I've been scratching my head the whole day :slight_smile:
Do you have any insight that can enlighten me, please?

Thank you,
Nuno

I took a very quick look at the paper. I think the idea is to declaratively specify the semantics of each IR operation and each machine instruction, and then automatically figure out how to select the IR operations to machine instructions. GCC and LLVM require that mapping to be specified manually.

I believe it is not that automated. The IR->RTL pass is not automated at all.
I think I'll just talk with the author once again (now more knowledgeable about llvm) :slight_smile:

Thanks,
Nuno