Proposal for new Legalization framework

In the spirit of the (long-term) intent to migrate away from the SelectionDAG framework, it is desirable to implement legalization passes as discrete passes. Attached is a patch which implements the beginning of a new type legalization pass, to help motivate discussion.

Is LLVM IR the right level for this? The main alternative approach that’s been discussed is to do FastISel to a target-independent opcode set on MachineInstrs, and then do legalization and ultimately the last phase of instruction selection proper after that. The most obvious advantage of using LLVM IR for legalization is that it’s (currently) more developer-friendly. The most obvious advantage of using MachineInstrs is that they would make it easier to do low-level manipulations. Also, doing legalization on MachineInstrs would mean avoiding having LLVM-IR-level optimization passes which lower the IR, which has historically been a design goal of LLVM.

The attached pass operates on LLVM IR, and it’s been educational to develop it this way, but I’m ok with rewriting it in MachineInstrs if that’s the consensus.

Given that the code I wrote operates on LLVM IR, it raises the following interesting issues:

The code I wrote in the attached patch operates on LLVM IR, so for example it expands adds into llvm.uadd_with_overflow intrinsics. The intrinsics available in LLVM IR today aren’t as expressive as the ISD operator set in SelectionDAG, so the generated code is quite a bit more verbose in some cases. Should we instead add new intrinsics, for add and for a bunch of other things? People I’ve talked to so far were hesitant to add new intrinsics unless they’re really prohibitive to do in other ways.
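To make the verbosity trade-off concrete, here is a sketch (in Python, not taken from the attached patch) of the arithmetic such an expansion performs for an i64 add on a target whose widest legal integer type is i32, with llvm.uadd.with.overflow.i32 modeled as a function returning a {sum, carry} pair:

```python
# Model of legalizing an i64 add on an i32-only target.
# llvm.uadd.with.overflow.i32 is modeled with Python ints masked to 32 bits.

MASK32 = (1 << 32) - 1

def uadd_with_overflow_i32(a, b):
    """Model of llvm.uadd.with.overflow.i32: returns (truncated sum, carry)."""
    s = a + b
    return s & MASK32, s > MASK32

def expand_add_i64(a, b):
    """Legalize an i64 add into two i32 adds plus carry propagation."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo, carry = uadd_with_overflow_i32(a_lo, b_lo)
    hi = (a_hi + b_hi + int(carry)) & MASK32
    return (hi << 32) | lo

assert expand_add_i64(0xFFFFFFFF, 1) == 0x1_0000_0000  # carry crosses halves
assert expand_add_i64(2**64 - 1, 1) == 0               # i64 wraparound
```

One IR add becomes an intrinsic call, two extractvalues, an extension of the carry, and two more adds, which is where the extra verbosity comes from.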

How should we legalize function argument and return types? Because of LLVM IR rules, one can’t just change the signature of a function without changing all its callsites, so as a consequence the code I wrote is a ModulePass. This is unfortunate, since it’s a goal that most of the codegen passes be FunctionPasses. Modifying the Function types may also be incompatible with the ABI coordination dance between front-ends and backends on some targets. One alternative, which is also implemented, is to leave function signatures alone and simply insert conversions to and from legal types. In this case, instruction selection would need to know how to handle illegal types in these special circumstances, but presumably it would be easy enough to special-case them. However, if this pass is followed by an optimization pass analogous to DAGCombine, it may be tricky to keep the optimization pass from creating patterns which codegen isn’t prepared to handle. Another alternative, which is not implemented yet, is to have the legalization pass create new Functions, make the original Functions simply call the legalized functions, and then have a late pass clean everything up.

We may already need some amount of special-casing for things like bitfield loads and stores. To implement the C++ memory model, some bitfield loads and stores actually need to load and store a precise number of bits, even if that number of bits doesn’t correspond to a legal integer (register) size on the target machine. This isn’t implemented yet, but I expect this will be handled by leaving those loads and stores alone, and simply putting the burden on subsequent passes to lower them properly. An alternative to this is to add new truncating-store and sign/zero-extending load intrinsics.
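To illustrate why widening isn't an option here, this is a sketch (hypothetical lowering, not in the patch) of a precise-width i24 store and zero-extending load: the C++ memory model forbids widening the store to a legal 32-bit access, since that would introduce a race with an adjacent field, so the lowering must touch exactly three bytes:

```python
# Model of precise-width bitfield memory accesses (little-endian layout).
# The store must not read or write any byte outside the 24-bit field.

def store_i24(mem, offset, value):
    """Store the low 24 bits of value at mem[offset:offset+3], nothing else."""
    for i in range(3):
        mem[offset + i] = (value >> (8 * i)) & 0xFF

def load_i24_zext(mem, offset):
    """Zero-extending 24-bit load from mem (little-endian)."""
    return mem[offset] | (mem[offset + 1] << 8) | (mem[offset + 2] << 16)

mem = bytearray(b"\xAA" * 8)
store_i24(mem, 2, 0x123456)
assert load_i24_zext(mem, 2) == 0x123456
assert mem[1] == 0xAA and mem[5] == 0xAA  # neighboring bytes untouched
```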

Another complication due to using LLVM IR is the interaction with DebugInfo. If AllocaInsts for illegal types are expanded, or if values for llvm.dbg.value intrinsics are expanded, there’s currently no way to describe this (DWARF can describe it, but LLVM IR can’t currently). I assume this could be fixed by extending LLVM IR’s DebugInfo intrinsics, but I haven’t investigated it yet.

Dan

legalize-integers.patch (48.8 KB)

In the spirit of the (long-term) intent to migrate away from the SelectionDAG framework, it is desirable to implement legalization passes as discrete passes. Attached is a patch which implements the beginning of a new type legalization pass, to help motivate discussion.

This is a great discussion to have.

Is LLVM IR the right level for this?

IMO, no, definitely not.

The main alternative approach that's been discussed is to do FastISel to a target-independent opcode set on MachineInstrs, and then do legalization and ultimately the last phase of instruction selection proper after that. The most obvious advantage of using LLVM IR for legalization is that it's (currently) more developer-friendly. The most obvious advantage of using MachineInstrs is that they would make it easier to do low-level manipulations. Also, doing legalization on MachineInstrs would mean avoiding having LLVM-IR-level optimization passes which lower the IR, which has historically been a design goal of LLVM.

I think that you (in the rest of your email) identify a number of specific problems with using LLVM IR for legalization. These are a lot of specific issues caused by the fact that LLVM IR is intentionally not trying to model machine issues. I'm sure you *could* try to make this work by introducing a bunch of new intrinsics into LLVM IR which would model the union of the selection dag ISD nodes along with the target specific X86ISD nodes. However, at this point, you have only modeled the operations and haven't modeled the proper type system.

LLVM IR is just not the right level for this. You seem to think it is better than MachineInstrs because of developer friendliness, but it isn't clear to me that LLVM IR with the additions you're talking about would actually be friendly anymore :-)

Personally, I think that the right representation for legalization is MachineInstrs supplemented with a type system that allows MVTs as well as register classes. If you are seriously interested in pushing forward on this, we should probably discuss it in person, or over beer at the next social or something.

-Chris

Is LLVM IR the right level for this? The main alternative approach that’s been discussed is to do FastISel to a target-independent opcode set on MachineInstrs, and then do legalization and ultimately the last phase of instruction selection proper after that. The most obvious advantage of using LLVM IR for legalization is that it’s (currently) more developer-friendly. The most obvious advantage of using MachineInstrs is that they would make it easier to do low-level manipulations. Also, doing legalization on MachineInstrs would mean avoiding having LLVM-IR-level optimization passes which lower the IR, which has historically been a design goal of LLVM.

The attached pass operates on LLVM IR, and it’s been educational to develop it this way, but I’m ok with rewriting it in MachineInstrs if that’s the consensus.

Given that the code I wrote operates on LLVM IR, it raises the following interesting issues:

The code I wrote in the attached patch operates on LLVM IR, so for example it expands adds into llvm.uadd_with_overflow intrinsics. The intrinsics available in LLVM IR today aren’t as expressive as the ISD operator set in SelectionDAG, so the generated code is quite a bit more verbose in some cases. Should we instead add new intrinsics, for add and for a bunch of other things? People I’ve talked to so far were hesitant to add new intrinsics unless they’re really prohibitive to do in other ways.

How should we legalize function argument and return types? Because of LLVM IR rules, one can’t just change the signature of a function without changing all its callsites, so as a consequence the code I wrote is a ModulePass. This is unfortunate, since it’s a goal that most of the codegen passes be FunctionPasses. Modifying the Function types may also be incompatible with the ABI coordination dance between front-ends and backends on some targets. One alternative, which is also implemented, is to leave function signatures alone and simply insert conversions to and from legal types. In this case, instruction selection would need to know how to handle illegal types in these special circumstances, but presumably it would be easy enough to special-case them. However, if this pass is followed by an optimization pass analogous to DAGCombine, it may be tricky to keep the optimization pass from creating patterns which codegen isn’t prepared to handle. Another alternative, which is not implemented yet, is to have the legalization pass create new Functions, make the original Functions simply call the legalized functions, and then have a late pass clean everything up.

We may already need some amount of special-casing for things like bitfield loads and stores. To implement the C++ memory model, some bitfield loads and stores actually need to load and store a precise number of bits, even if that number of bits doesn’t correspond to a legal integer (register) size on the target machine. This isn’t implemented yet, but I expect this will be handled by leaving those loads and stores alone, and simply putting the burden on subsequent passes to lower them properly. An alternative to this is to add new truncating-store and sign/zero-extending load intrinsics.

Another complication due to using LLVM IR is the interaction with DebugInfo. If AllocaInsts for illegal types are expanded, or if values for llvm.dbg.value intrinsics are expanded, there’s currently no way to describe this (DWARF can describe it, but LLVM IR can’t currently). I assume this could be fixed by extending LLVM IR’s DebugInfo intrinsics, but I haven’t investigated it yet.

Hi Dan,

Thank you for working on this. You mentioned that the two alternatives for replacing SelectionDAG are to munch on it from the top (by legalizing the IR) or from the bottom (by removing the scheduler, ISel, and finally the legalization and lowering). You also mentioned some of the disadvantages of the approach that you are proposing. And I agree with you, this approach has many disadvantages. I think that the end goal should be legalization at the MI level.
The LLVM IR is nice and compact but it is not verbose enough to allow lowering. You mentioned ext/load and trunc/store, but this problem is much worse for vectors. For example, we lower shuffle vector to the following ISD nodes: broadcast, insert_subvector, extract_subvector, concat_vectors, permute, blend, extract/insert_element and a few others. Representing all of these nodes in IR would be inefficient and inconvenient. Every optimization that handles these intrinsics would need to set up std::vectors, etc., and I think that the compile time for this will not be great either. But I don’t think that this is the biggest problem. How do you plan to handle constants? Do you lower them to global variables and loads? How would you implement FastISel? Do you plan on having two instruction selectors (like we have today) or do you plan to lower IR to intrinsics and select that? I think that one of the goals of getting rid of SelectionDAG is to have one instruction selector.
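For context, the semantics of the shufflevector operation being lowered here, and the fully scalarized extract/insert_element fallback a target uses when no broadcast/blend/permute pattern applies, can be sketched like this (a Python model; lane indices select from the concatenation of the two input vectors, as in the LangRef):

```python
# Model of LLVM's shufflevector and its naive per-lane expansion.

def shufflevector(v1, v2, mask):
    """Mask index i selects from v1 if i < len(v1), otherwise from v2."""
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]

def expand_via_extract_insert(v1, v2, mask):
    """Scalarized lowering: one extractelement + insertelement per lane."""
    concat = list(v1) + list(v2)
    result = [0] * len(mask)
    for lane, idx in enumerate(mask):
        result[lane] = concat[idx]  # extractelement, then insertelement
    return result

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert shufflevector(a, b, [0, 4, 1, 5]) == [1, 5, 2, 6]  # an "unpack low"
assert expand_via_extract_insert(a, b, [0, 4, 1, 5]) == [1, 5, 2, 6]
```

Each of the specialized ISD nodes (broadcast, blend, permute, ...) is a recognizable special case of this general operation; the point above is that encoding all of them as distinct IR intrinsics multiplies the surface area every optimization has to understand.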

Thanks,
Nadav

One question :

"In the spirit of the (long-term) intent to migrate away from the
  SelectionDAG framework"

.. is this meant in general or just in respect to legalization?

Everything. This includes all of the custom lowering code for all of the targets, all of dagcombine, and maybe all of the patterns in the TD files.

Amen.

I would really push towards doing this in LLVM IR as the next step.

It's possible that what you are proposing is the right "long term" solution but I think it's not a good evolutionary approach; it's more revolutionary.

I've already thought of many things that could be very clearly and easily done in IR that are done in very convoluted ways in Selection DAG. This kind of migration could take place right now and as we thin out the selection DAG portion of things to where it is almost non existent, making a jump to just eliminate it and replacing it would be more practical.

Something like soft float for example is nearly trivial to do in IR.

At the risk of appearing stupid, I can say that I've really struggled to understand selection DAG and all its facets and its interaction with tablegen patterns, and this after having done a whole port from scratch already by myself.

Part of it is the lack of documentation but also there are too many illogical things (to me) and special cases and hacks surrounding selection DAG and tablegen.

On the other hand, I recently started to write some IR level passes and it was nearly trivial for me to understand how to use it and transform it. All the classes are more or less very clean, logical and regular. I was writing transformation passes on the first day with no issues.

I think that LLVM IR could be extended to allow for all the things in legalization to take place and many other parts of lowering, i.e. lowering to use some IR which has additional lower level operations.

Reed

I would really push towards doing this in LLVM IR as the next step.

What makes you say that?

It's possible that what you are proposing is the right "long term" solution but I think it's not a good evolutionary approach; it's more revolutionary.

Doing this in LLVM IR seems like a major step backwards. It gets us no closer to the ultimate goal, would add a ton of code, and would make the compiler more complex.

-Chris

Hi Dan,

Others have weighed in on the merits of IR vs MI legalization, I thought I'd chip in on a different area:

+ /// Legal roughly means there's a physical register class on the target
+ /// machine for a type, and there's a reasonable set of instructions
+ /// which operate on registers of this class and interpret their contents
+ /// as instances of the type. For convenience, Legal is also used for
+ /// types which are not legalized by this pass (vectors, floats, etc.)
+ Legal,

I don't think this is the right definition of a legal type. I know that that's how SelectionDAG currently defines it, and I think that definition is behind a lot of the difficulty in retargeting LLVM to something that doesn't look like the intersection of X86 and ARM.

I think the correct answer (credit to Chris for this description) is that a legal type is one that (more or less) corresponds to a set of physical registers, and which the target is capable of loading, storing, and copying (possibly also inserting/extracting elements, for vector types).

--Owen

Hi Dan,

I would really push towards doing this in LLVM IR as the next step.

What makes you say that?

Partly for the reasons Dan stated. For me, the IR is definitely way more friendly too and not tangled
up in lots of undocumented obscurity as selection DAG is with tablegen and many other idiosyncrasies of the backend design. Solving problems with selection DAG, to me, is like playing dungeons and dragons. I feel like I need to ask the wizard for a magic spell to capture a gnome. I don't feel like I'm doing science. It's too much like a game with thousands of rules to know.

I should be able to blow through many complex problems using the C++ helper classes but they are not obvious and I spend more time learning all those obscure classes and digging through code to understand them than in doing real work. With IR, my experience is exactly the opposite.

Someone that understands compilers and has experience with them should not be getting tangled up in a web of complex C++ classes and undocumented tools.

The C++ classes should be making things clearer and this is, for me, definitely the case with IR and not with selection DAG.

I don't immediately see the need for yet another IR. It's already pretty low level. Sure you could design another lower level one that is further removed from C but I don't see that point.

You yourself like people to change things with small patches.

A good example of that recently was Bill Wendling's work on attributes. He added a nice forward step in a reasonable time frame. Sure, one could design from scratch a much more sophisticated way to handle attributes but how to find the time to do that and fit it in with all of what is already there is the rub. Many things in LLVM could be better if it were possible to reset the clock. We have a lot of "technical debt" already from people being in a hurry and not thinking things through nor taking the time to clean up things that were not the best choice and to document what is already there. Well, people also wanted to move forward so it's not a criticism but a statement of what is and what became for various reasons.

So I think the first step is to start to shrink the work done with selection DAG as Dan has done with his
work on legalization. I think that all could go very quickly and incrementally.

Dan already showed that with little work, many things can be done in IR. I myself have been thinking along exactly the same lines for some time now.

As a practical matter; eliminating selection DAG can be done in a simple evolutionary way by doing that same work with IR and judiciously adding new features to IR as is demanded.

We would need to extend IR but it can be done in steps.

It's possible that what you are proposing is the right "long term" solution but I think it's not a good evolutionary approach; it's more revolutionary.

Doing this in LLVM IR seems like a major step backwards. It gets us no closer to the ultimate goal, would add a ton of code, and would make the compiler more complex.

I don't see why it's a step backwards or why it would add tons of code and make the compiler more complex. I see exactly the opposite.

Why do you think that?

-Chris

Reed

I may have missed the discussion, but why are we trying to move away from the SelectionDAG? Are there specific problems that we don't want to fix or live with? If so, what are they?

-Krzysztof

Hi Dan,

> The main alternative approach that's been discussed is to do FastISel to
a target-independent opcode set on MachineInstrs, and then do legalization
and ultimately the last phase of instruction selection proper after that.
The most obvious advantage of using LLVM IR for legalization is that it's
(currently) more developer-friendly. The most obvious advantage of using
MachineInstrs is that they would make it easier to do low-level
manipulations. Also, doing legalization on MachineInstrs would mean
avoiding having LLVM-IR-level optimization passes which lower the IR, which
has historically been a design goal of LLVM.

The approach taken in WHIRL, which has a lot of advantages, is exactly to
lower the IR. It seems strange that in the back end we have Machine*
classes that correspond very closely to IR equivalents, but which don't
share any code and often have subtly different interfaces. The approach
taken in WHIRL is to progressively replace machine-independent bits of the
IR with machine-dependent ones, with abstract instructions being replaced
with machine instructions, abstract registers with machine registers, and
so on.

Couldn't we first lower LLVM IR to a (mostly) target-independent sequence
of MachineInstrs, and then progressively lower those? That seems to be
very close to what you describe, and makes a great deal of sense to me.
The MachineInstr could model operations and types that are not legal for
the current target, but passes could lower those until everything is legal
and all opcodes are target-specific.

Targets can still lower function arguments/returns as they please. And
most of the infrastructure for this is already present (no new IR).

The non-sequential representation is interesting. It's what first got me
interested in working on SelectionDAG. I find it useful to think about
programs as dependence graphs, and having that be the actual representation
is really cool. However, the only thing the SelectionDAG infrastructure
actually does that really takes advantage of this in a meaningful way is
scheduling (this is a somewhat simplified explanation). And, it turns out
that one of the premises of the non-sequential approach was the idea that
the scheduler would eventually be so good that the user's original schedule
would be completely irrelevant. In reality, keeping the user's original
schedule around has actually proved to be quite practical. SelectionDAG has
since grown the ability to annotate nodes with an ordering, but it's
awkward because it has to live all the way through legalization and
instruction selection.

Also, the non-sequential representation has other downsides. Sequential
representations have builtin topological orderings which are usually easy
to maintain, so they're always available. SelectionDAG has to recompute
topological orderings numerous times and use worklists. In comparison, the
LLVM IR legalize patch I attached at the beginning of this thread doesn't
need a worklist or a sort; it just walks down the instructions in each
block, which is handy.

Another reason why people don't like SelectionDAG framework is that it's
slow. It's effectively one big pass (with several subpasses), and it
consistently scores among the top of the compile time profiles. At one
time, I did a bunch of work to try to make it faster, though I don't think
I did all the right things. I think there are still several opportunities
to speed it up significantly. However, some of the problems are pretty
fundamental. For example, the continuous-CSE, which automatically
eliminates equivalent nodes at all times (except in special cases), is
something that costs a lot, and it creates a fair amount of complexity
because it has a habit of zapping nodes out from underneath people, and it
isn't obviously better than just having simple discrete CSE passes.

Another reason is that SelectionDAG is limited to one basic block at a
time. The idea of whole-function SelectionDAGs, or even just SEME
SelectionDAGs, has been tossed around a lot over the years. However one of
the big things stopping this idea is the slowness. One big SelectionDAG
wouldn't have any big reason to be significantly faster than lots of little
ones, and it could easily be a lot slower.

Another is SelectionDAG's relationship with the notoriously
developer-hostile tablegen-based instruction-selection process. People may
not realize that getting rid of SelectionDAG won't necessarily get rid of
tablegen-generated instruction selectors, however that's another story.

Another is that many parts of SelectionDAG were never really designed, but
rather just emerged over a long period of evolution. Things like Glue,
which was a hack to work around limitations which mostly no longer exist,
but which then grew to become used for a variety of things, simply because
it was there. Or the calling-convention lowering code, which has remnants
of about three different attempts at frameworks, each of which had good
ideas but was ultimately insufficient in its own way and never got fully
replaced.

Ultimately, part of why the SelectionDAG is unloved today is that it's
unloved. It really needs some stiff refactoring. And optimization. And
better documentation. And better developer tools. But with all the other
problems, the zeitgeist has turned against it.

Dan

Hi Dan,

Hi Owen,

Others have weighed in on the merits of IR vs MI legalization, I thought
I'd chip in on a different area:

+ /// Legal roughly means there's a physical register class on the target
+ /// machine for a type, and there's a reasonable set of instructions
+ /// which operate on registers of this class and interpret their contents
+ /// as instances of the type. For convenience, Legal is also used for
+ /// types which are not legalized by this pass (vectors, floats, etc.)
+ Legal,

I don't think this is the right definition of a legal type. I know that
that's how SelectionDAG currently defines it, and I think that definition
is behind a lot of the difficulty in retargeting LLVM to something that
doesn't look like the intersection of X86 and ARM.

Do you have a particular target in mind that we could discuss? Not all
variances from the intersection of x86 and ARM are of the same nature; it's
hard to talk in full generality here.

I think the correct answer (credit to Chris for this description) is that
a legal type is one that (more or less) corresponds to a set of physical
registers, and which the target is capable of loading, storing, and copying
(possibly also inserting/extracting elements, for vector types).

If the target doesn't actually have a copy for a register class which will
be register-allocated, it will just need to pretend it has one at this
level and lower it later somehow, otherwise a lot of other stuff won't work.

I don't see why load and store are special at this level though. Or
insert/extract element?

Dan

I appreciate the detailed explanation. Thanks!

Speaking of the .td files and the table-driven instruction selection---yes, I think that there is a tendency for the .td files to become unreadable, and the selection process (divided into the custom and automatic lowering) to be somewhat harder to track and debug than the other passes. In the long term, are there ideas to replace the tablegen with another tool? The reason I ask is that I plan to develop a pattern-driven instruction combiner/simplifier for our Hexagon backend, that would work on the MI level, and I was planning to use tablegen and .td files to represent the patterns. The only reason for that is that tablegen is already here, but I would be interested in any suggestions on how to represent MI patterns in ways that are (1) easy to edit/add/understand, and (2) not prohibitively complex/obscure for the code to work with (i.e. parse/match/substitute/etc.).

-Krzysztof

I have an example in an architecture that we have here. We have fat pointers, which we represent in the IR as pointers in a different address space. It is very easy to model them in this way, and both LLVM IR and Clang are happy with the notion that you have different sized pointers in different address spaces (instcombine needed a little bit of hacking to tell it that it shouldn't turn inttoptr and ptrtoint sequences into bitcasts if the target and destination are different, which would be fixed by adding the proposed addrspacecast instruction), but as soon as you get to the target you have two assumptions that are firmly entrenched:

1) That all pointers for a given target are the same size (not true for us, or for some GPUs, or even for some microcontrollers)
2) That pointers are integers (not true for us, or any other architecture with fat pointers)

I dealt with the former by adding a new ValueType and custom lowering of pointers in the selected address space to it, but that's a bit ugly when the information about the pointer sizes is already present in the IR. I haven't found a good solution to the latter, because lots of things use the is-a-pointer constraint in selection DAG patterns (e.g. loads, stores, atomics, and so on) and there is no simple way of modifying the pattern matching in TableGen to allow for multiple kinds of pointer.

I'm not sure how GPU back ends that want to select different load / store instructions for different address spaces are supposed to do it.

David

P.S. I'll be at EuroLLVM next week if anyone has good solutions or also encounters these problems and wants to chat about them.

> In the spirit of the (long-term) intent to migrate away from the
SelectionDAG framework, it is desirable to implement legalization passes as
discrete passes. Attached is a patch which implements the beginning of a
new type legalization pass, to help motivate discussion.

This is a great discussion to have.

> Is LLVM IR the right level for this?

IMO, no, definitely not.

> The main alternative approach that's been discussed is to do FastISel to
a target-independent opcode set on MachineInstrs, and then do legalization
and ultimately the last phase of instruction selection proper after that.
The most obvious advantage of using LLVM IR for legalization is that it's
(currently) more developer-friendly. The most obvious advantage of using
MachineInstrs is that they would make it easier to do low-level
manipulations. Also, doing legalization on MachineInstrs would mean
avoiding having LLVM-IR-level optimization passes which lower the IR, which
has historically been a design goal of LLVM.

I think that you (in the rest of your email) identify a number of specific
problems with using LLVM IR for legalization. These are a lot of specific
issues caused by the fact that LLVM IR is intentionally not trying to model
machine issues. I'm sure you *could* try to make this work by introducing
a bunch of new intrinsics into LLVM IR which would model the union of the
selection dag ISD nodes along with the target specific X86ISD nodes.
However, at this point, you have only modeled the operations and haven't
modeled the proper type system.

I don't wish to argue about this, and am fine following your suggestion.
However, I would like to understand your reasons better.

I don't think the type system is really the issue. The only thing
SelectionDAG's type system has which LLVM IR's lacks which is useful here
is "untyped", and that's a special-purpose thing that we can probably
handle in other ways.

You and others are right that there could be a fair number of new
intrinsics, especially considering all the X86ISD ones and all the rest. Is
this a significant concern for you? Targets already have large numbers of
target-specific intrinsics; would adding a relatively moderate number of
new intrinsics really be a problem?

There's also the problem of keeping callers and callees consistent, and
it's indeed quite a dickens, but it need not be a show-stopper.

LLVM IR is just not the right level for this. You seem to think it is
better than MachineInstrs because of developer friendliness, but it isn't
clear to me that LLVM IR with the additions you're talking about would
clear to me that LLVM IR with the additions you're talking about would
actually be friendly anymore :-)

As I see it, people working in codegen are going to have to deal with lots
of codegeny instructions regardless of whether we call them instructions or
intrinsics. Is it really better one way or the other?

Personally, I think that the right representation for legalization is
MachineInstrs supplemented with a type system that allows MVTs as well as
register classes. If you are seriously interested in pushing forward on
this, we should probably discuss it in person, or over beer at the next
social or something.

Ok.

Dan

I have a target for which v4i8 is legal. I can load it, store it, copy it, insert/extract elements, shuffle it. But I can’t add it, subtract it, multiply it, etc. It can also load, store, and copy v16f32, but again can’t operate on it.

I’m open to suggestions on this, but the point was to preserve some functionality for type legalization, rather than going back to the old style of having only a combined type+op legalizer. The concept was to prune back the functionality of type legalization to dispensing only with types for which there really is not even minimal hardware support.
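To spell out the distinction being drawn: on the hypothetical target above, a v4i8 add must still be legalized even though the v4i8 *type* is legal, because only the *operation* is missing. One way a later op-legalization step could handle it is by scalarizing (a Python model, not actual backend code):

```python
# Model: type legalization keeps v4i8 alive (it can be loaded/stored/copied),
# and op legalization separately expands the unsupported vector add into
# four per-lane i8 adds (wrapping modulo 256).

def add_v4i8_scalarized(a, b):
    """Expand a v4i8 add into four scalar i8 adds."""
    assert len(a) == len(b) == 4
    return [(x + y) & 0xFF for x, y in zip(a, b)]

assert add_v4i8_scalarized([1, 2, 3, 250], [1, 1, 1, 10]) == [2, 3, 4, 4]
```

Under the definition Owen is advocating, the type legalizer never sees v4i8 at all; only the add is rewritten.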

–Owen

.....

I don't wish to argue about this, and am fine following your suggestion.
However, I would like to understand your reasons better.

What would be the plan as far as incrementally achieving this alternate implementation?

Why has " avoiding having LLVM-IR-level optimization passes which lower the IR, which has historically been a design goal of LLVM"?