Question about Instruction Selection

Hi,
I’m new to LLVM and I’m doing research on the factors that affect compilation time, especially instruction selection and scheduling. One of the academic papers I read, https://llvm.org/svn/llvm-project/www-pubs/trunk/2008-CGO-DagISel.pdf (Koes, David Ryan, and Seth Copen Goldstein. “Near-optimal instruction selection on dags.”), which is also said to be the algorithm LLVM currently uses (correct me if I’m wrong), introduces a selection algorithm that works on DAGs rather than on trees.

My question is: according to the paper, each tile in the DAG has a cost value that is used in the dynamic programming process and in CSE decisions. But in the TableGen files I didn’t find any field recording cost values or similar properties. I know that the MatcherTables in the generated TableGen include files act like bytecode for an interpreter that performs bottom-up selection, which resembles bottom-up dynamic programming. But if the generated MatcherTables really are used to perform bottom-up dynamic-programming selection, they still need a cost model to guide their decisions. So where is it?

I’ve read some great articles about parts of the platform-independent instruction selection components, but I’m curious about the instruction selection algorithm that is currently used.

Cheers,


There is a cost of sorts used to generate the matchers: it's usually
the size (complexity) of the pattern to match. Patterns can then
override it using the "AddedComplexity" field.
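For example, something along these lines (the instruction and register
class names GPR and ADD_LSL2 are made up, not from a real backend):

  // The default complexity is derived from the size of the source
  // pattern; AddedComplexity raises it so this pattern is ordered
  // ahead of smaller patterns that would also match the add.
  let AddedComplexity = 10 in
  def : Pat<(add GPR:$a, (shl GPR:$b, (i32 2))),
            (ADD_LSL2 GPR:$a, GPR:$b)>;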

There is also a cost model for the output patterns, based on the
number of selected instructions and estimates of their encoded sizes.
It's used as a tie-breaker when the input complexity is equal.

TableGen then uses that cost to order the matching tables. I'm not
aware of additional cost models used in SelectionDAGISel itself; it's
a "simple" interpreter for the generated matcher, so all the cost
decisions are made statically, in tblgen, at compile-compile time.

If you haven't already, you should look at the generated tables, which
show the computed complexity of individual patterns.
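For instance, each pattern entry in the generated <Target>GenDAGISel.inc
ends with comments giving the source pattern and its computed complexity.
Roughly like this (the opcodes, offsets, names, and numbers here are
schematic, not copied from a real target):

  OPC_CheckOpcode, TARGET_VAL(ISD::FADD),
  OPC_MoveChild, 0,
  OPC_CheckOpcode, TARGET_VAL(ISD::FMUL),
  ...
  // Src: (fadd (fmul FPR:$a, FPR:$b), FPR:$c) - Complexity = 9
  // Dst: (FMADD FPR:$a, FPR:$b, FPR:$c)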

Does that help?
-Ahmed

Thanks for the swift reply.


Sorry that I didn’t state the question clearly: the “cost” I’m asking about is the hardware instruction (or operation) cost.
Or is it possible that backend developers express the cost model by means of TableGen patterns?
E.g. if there is an fmadd instruction that takes fewer cycles, developers add a pattern that maps fmul + fadd into fmadd, so one doesn’t need to provide numeric cost values for TableGen to select the better instruction.

Cheers

McClane


Yes, the general thinking is that the number of instructions (and also the
size of each instruction, for targets where that varies) is a good
enough estimate, and backend developers fix it up using
AddedComplexity when it isn't: for your fmul+fadd example, we'd prefer
the fmadd pattern over a separate fmul (and later fadd) simply
because its source pattern is larger.
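The pattern itself is just an ordinary TableGen Pat. As a sketch, with
made-up instruction and register class names (FPR, FMADD):

  // Covers both the fmul and the fadd, so its computed complexity is
  // higher than that of a standalone fmul or fadd pattern, and the
  // matcher tries it first; no explicit cycle cost is needed.
  def : Pat<(fadd (fmul FPR:$a, FPR:$b), FPR:$c),
            (FMADD FPR:$a, FPR:$b, FPR:$c)>;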

So far we haven't really needed to model the instruction cost in a
finer way, and I don't think it's obvious how (you mention latency; is
it really the best heuristic on OoO cores?). IIRC there's a FIXME
somewhere in tablegen about modeling this, but I can't find it right
now.

HTH,
-Ahmed

As for whether latency is really the best heuristic on OoO cores: I think that’s another big open area, over on the instruction scheduling side.

Thanks for your clear explanation Ahmed : )