Proposal for new Legalization framework

I did not really answer your question very succinctly, because I ranted about selection DAG and that's not the issue here.

So you want to have a machine IR, and Dan was proposing to do it in regular IR.

My main reason for using regular IR is that I think it's more likely to be doable in an incremental way and I don't see how creating another IR will help things.

You can add to the existing IR things that are needed to make it powerful enough to do what machine IR would do.

In the end, machine IR will look just like a subset of the current IR plus some additional things, so I don't see how it helps to make a new one; we would then have many things that are similar to the IR but with slightly different rules, a different C++ class definition, etc.

I think the path to a machine IR will be very lengthy unless Apple and Google are willing to throw a lot of high-quality resources that way.

Reed

I don't know if it helps at all, but another option might be some sort
of target CPU modelling.
I know tablegen does a lot of this, but a decompiler called boomerang
(http://boomerang.sourceforge.net/) does have some interesting
infrastructure in the area of CPU modelling.
It uses a CPU modelling language that is used to automatically
generate source code that assembles/disassembles instructions.
It might not be suitable, but I think it would be good to have a look
at boomerang to see if its CPU modelling can be used in LLVM.

Hi Reed,

I would really push towards doing this in LLVM IR as the next step.

What makes you say that?

It's possible that what you are proposing is the right "long term" solution but I think it's not a good evolutionary approach; it's more revolutionary.

Doing this in LLVM IR seems like a major step backwards. It gets us no closer to the ultimate goal, would add a ton of code, and would make the compiler more complex.

-Chris

I did not really answer your question very succinctly, because I ranted about selection DAG and that's not the issue here.

So you want to have a machine IR, and Dan was proposing to do it in regular IR.

My main reason for using regular IR is that I think it's more likely to be doable in an incremental way and I don't see how creating another IR will help things.

I believe we, i.e. the entire LLVM community, agree that replacing the instruction selector is a big undertaking. I believe it will be the most complex and lengthy one we have ever done. It will take some time to create an alternative to SelectionDAG. Then it will still take a couple of years to migrate the existing targets over before we can kill off SelectionDAG.

Given that, we really want one true alternative that will live for a long time. So doing instruction selection in LLVM IR doesn't make sense unless we believe it will satisfy all of our goals. It would just be technical debt which will be hard to eliminate if it finds a client.

You can add to the existing IR things that are needed to make it powerful enough to do what machine IR would do.

In the end, machine IR will look just like a subset of the current IR plus some additional things, so I don't see how it helps to make a new one; we would then have many things that are similar to the IR but with slightly different rules, a different C++ class definition, etc.

I think the path to a machine IR will be very lengthy unless Apple and Google are willing to throw a lot of high-quality resources that way.

Many people are thinking hard about it. I'm optimistic that some concrete proposals will be presented to the community in the next 2-3 months. In the meantime, I think it's best if the interested parties meet to discuss this in person (sorry, I understand this is not possible for some). To me, it's almost impossible to discuss such a broad topic in email threads, since it will inevitably splinter into multiple threads.

Evan

Hi Evan,

To me, expanding the regular IR will achieve nearly the same result as building a lower-level IR. I think it will be as good as the alternate proposals without any negatives, but of course
I can't prove that; it's just an opinion.

The big advantage is that this work can proceed incrementally, even today.
A new low-level IR and redesign can easily miss the mark, suffering from what I like to call premature abstraction. It's really hard to know which abstraction will work until you do a lot of the work and then start to see how well it really works. There is no way to really test
the alternate proposal, and we will only find out that it's not working after a huge amount of effort and time has been expended.

With a small amount of work, Dan already achieved some good success using the IR approach. Just think how long it will take for the alternate proposal to get that far.

If we move the necessary pieces into the current IR and begin work there, there is no impact on people. Pieces they already see as upstream are just being moved further upstream.

At some point we should be able to just skip over the current DAG and hook up downstream with machine basic blocks and such.

Building an experimental new instruction selector that works from the IR (or this extended IR) can proceed in parallel.

Open64, from what I understand, has done just fine by having one IR.

So nobody ever has to rewrite their port if they are happy with the current SelectionDAG, though after some time there would be little support for it.

I think that anything that is revolutionary should require a much much higher burden of proof and vetting than something that has a clear evolutionary path.

If the revolutionary path cannot be clearly justified then I think it should not be chosen.

Reed

There definitely are strong advantages to using one data structure to represent multiple levels of IR: you have less code in the compiler, more shared concepts, etc. I have seen and worked with several compilers that tried to do this. Even GCC does this (in the opposite direction) with "tree-ssa", which repurposes some front-end data structures for its mid-level IR.

While there are advantages, it also means that you get fewer invariants, and that the data structures are a worse fit for each level. To give you one simple example: LLVM IR is simplified greatly based on the assumption that it is always in SSA and that each instruction produces one result value, and exceptions to that rule (like some intrinsics) can easily be modeled with extract value operations.

This doesn't work for MachineInstrs which have the following additional complexity:
- Not everything is in SSA, you have to model physical registers, even very early.
- Lots of things return N values, and extract-value doesn't work.
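
To make the contrast concrete, here is a minimal standalone sketch in C++ (hypothetical types, not LLVM's actual classes) of the two models: a single-result SSA value whose operands are just other values, versus a MachineInstr-style operand list in which several operands can be defs and a def can be pinned to a physical register.

// Hypothetical sketch only; these are not LLVM's real classes. It just
// illustrates why a single-result SSA model does not map directly onto
// MachineInstrs with physical registers and multiple defs.
#include <vector>

// LLVM-IR-style: the instruction *is* its single SSA result value, and its
// operands are simply references to other values.
struct IRValueSketch {
  unsigned Opcode;
  std::vector<const IRValueSketch *> Operands;
};

// MachineInstr-style: a flat operand list where any operand may be a def,
// and a def may name a fixed physical register (e.g. an x86 divide defines
// both EAX and EDX).
struct MachineOperandSketch {
  unsigned Reg;      // virtual or physical register number
  bool IsDef;        // written by the instruction rather than read
  bool IsPhysical;   // fixed physical register, so not SSA-renameable
};
struct MachineInstrSketch {
  unsigned Opcode;
  std::vector<MachineOperandSketch> Operands;
};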

I consider it unacceptable to project complexity from MachineInstrs into LLVM IR. There are wins, but there are also unacceptably high costs. Some of those include:
- LLVM IR is our stable IR format; MachineInstr is not. The latter *needs* to evolve rapidly, whereas the former has settled down (mostly).
- The reason people like to work with LLVM IR is often directly because of the simplifications we get from having a simple model.

Jeopardizing the stability of the IR and making LLVM IR worse to work with is not acceptable to me.

-Chris

I would really push towards doing this in LLVM IR as the next step.

What makes you say that?

Partly for the reasons Dan stated. For me, the IR is definitely way more friendly, too, and not tangled
up in lots of undocumented obscurity the way selection DAG is with tablegen and many other idiosyncrasies of the backend design.

See my response to David. Making LLVM IR less friendly to solve backend problems isn't acceptable to me.

Solving problems with selection DAG, to me, is like playing dungeons and dragons. I feel like I need to ask the wizard for a magic spell to capture a gnome. I don't feel like I'm doing science. It's too much like a game with thousands of rules to know.

Also, don't conflate dislike of SelectionDAG (which many people share) with a foregone conclusion that the only way to fix it is in LLVM IR. If there is a specific problem with MachineInstrs that makes them difficult to work with, we should fix that. Few people will argue that SelectionDAG is worth saving. :)

-Chris

We obviously need an incremental migration plan.

That said, personally, I would prefer to figure out what the right destination is, before we start trying to discuss how to get there. I don't think that any other approach makes much sense.

-Chris

To me, expanding the regular IR will achieve nearly the same result as
building a lower-level IR.

Remember that we basically already have a lower level IR consisting of
basic blocks of MachineInstrs at the moment. To an extent this has
already proven itself capable of modelling targets, and making it a
first-class IR might be a reasonable amount of work (certainly easier
than SelectionDAG).

Cheers.

Tim.

Note that this was a transitional step only. Nowadays GIMPLE is its own data structure. It still keeps pointers to the tree data structure for symbols and types. But eventually, if the existing plans materialize, these will be converted into proper indexes into symbol and type tables.

Over time, we have found that there is a distinct advantage in having different internal types represent different IRs. The division of responsibilities is easier to express. Making this distinction in GCC's codebase is less than trivial, however.

There are some data structures that are advantageous to share across IRs. Mostly container types like CFG, call graph, loop structures, etc.

Diego.

Dan, and anyone else interested…

I am not sure if this has been discussed before, but I do have a case where the following logic fails to work:

lib/Analysis/ConstantFolding.cpp

static Constant *ConstantFoldBinaryFP(double (*NativeFP)(double, double),
                                      double V, double W, Type *Ty) {
  sys::llvm_fenv_clearexcept();
  V = NativeFP(V, W);
  if (sys::llvm_fenv_testexcept()) {
    sys::llvm_fenv_clearexcept();
    return 0;
  }
  ...

This fragment seems to assume that the host and target behave in exactly the same way with regard to FP exception handling. In some ways I understand it, but… on some cross-compilation platforms this might not always be true. In the case of Hexagon, for example, our FP math handling is apparently more precise than the "stock" one on an x86 host. A specific (but not the best) example would be computing sqrtf(1.000001): the result is 1, with FE_INEXACT set. My current Linux x86 host fails the inexact part… resulting in wrong code being emitted.
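
For what it's worth, the mismatch can be checked on the host with a few lines of standalone C++ (this is only an illustration, not LLVM code; it mimics what llvm_fenv_clearexcept/llvm_fenv_testexcept do around the native call). Compile without optimizations so the sqrt call is actually performed at run time:

// Minimal host-side reproduction of the FE_INEXACT check described above.
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::feclearexcept(FE_ALL_EXCEPT);
  // volatile keeps the compiler from constant-folding the call itself.
  volatile float x = 1.000001f;
  volatile float r = std::sqrt(x);
  int inexact = std::fetestexcept(FE_INEXACT) != 0;
  std::printf("sqrtf(%.7g) = %.9g, FE_INEXACT = %d\n",
              (double)x, (double)r, inexact);
  // If the host reports a different FE_INEXACT state than the target's libm
  // would, ConstantFoldBinaryFP makes a host-dependent folding decision.
  return 0;
}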

Once again, my question is not about this specific example, but rather about the assumption of identical behavior on completely different systems. What if my target’s "objective" is to exceed IEEE precision? …and I happen to have a set of tests to verify that I do :)

Thank you for any comment.

Sergei

Sure, I’m happy to explain. I apologize if I came across as overly strong about this. This is something that has come up many times before.

That’s definitely a fair criticism. In my (often crazy) mind, I’d like to solve a few problems in SelectionDAG that are not just an aspect of the DAG representation. One specific problem area with SelectionDAG (ignoring the DAG) is that various steps (legalization, isel, etc.) want to introduce target-specific operations that require a specific register class. The only way to model that in SelectionDAG is by picking an MVT that happens to align with it, and hoping that the right thing happens downstream.

It would be much better if SelectionDAG (and its replacement) could represent register classes directly in its type system. However, this is a really really bad idea for LLVM IR for hopefully obvious reasons.

No, I’m not specifically concerned with number of intrinsics.

No.

I consider this to be one (really important!) example of an invariant that would have to be violated to make this plan happen. I think that (in order to make this really work) we’d have to add a non-SSA LLVM IR, potentially multiple return results, subregs, etc. I think it is a really bad idea to make LLVM IR more complicated and worse to work with for the benefit of codegen.

Number of intrinsics is not a strong concern for me.

-Chris

Hi Sergei,

The degree to which LLVM actually makes any guarantees about IEEE
arithmetic precision is ambiguous. LangRef, for one, doesn't even mention
it (it mentions formats, but nothing else). The de-facto way of
interpreting holes in LangRef is to consider how the IR is used by clang
and follow the path up into the C and/or C++ standards and then work from
there. C describes a binding to IEC 60559, but it is optional, and clang
doesn't opt in. C++ doesn't even have the option. So from an official
perspective, it's not clear that you have any basis to complain ;-).

I mention all this not to dismiss your concern, but to put it in context.
Right or wrong, much of the C/C++ software world is not that keenly
concerned in these matters. This includes LLVM in some respects. The
folding of floating-point library routines which you point out in LLVM is
one example of this.

One idea for addressing this would be to teach LLVM's TargetLibraryInfo to
carry information about how precise the target's library functions are.
Then, you could either implement soft-float functions within LLVM itself
for the affected library functions, or you could disable folding for those
functions which are not precise enough on the host (in non-fast-math mode).
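
A rough standalone sketch of that first idea might look like the following; the names are hypothetical, since today's TargetLibraryInfo carries no notion of per-function folding precision:

// Hypothetical sketch, not the real TargetLibraryInfo API: record, per
// library function, whether host-side folding is known to match the
// target's libm, and have the folder bail out when it is not.
#include <map>
#include <string>

struct TargetLibmPrecisionInfo {
  // true  => host constant folding matches the target library exactly
  // false => host and target may disagree (results or FP exception flags)
  std::map<std::string, bool> PreciselyFoldable;

  bool canConstantFold(const std::string &Fn) const {
    auto It = PreciselyFoldable.find(Fn);
    return It != PreciselyFoldable.end() && It->second;
  }
};

// A ConstantFoldBinaryFP-style caller would consult this before calling the
// native function, except under fast-math where precision is relaxed anyway.
bool shouldFoldLibCall(const TargetLibmPrecisionInfo &Info,
                       const std::string &Fn, bool FastMath) {
  return FastMath || Info.canConstantFold(Fn);
}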

Another idea for addressing this would be to convince the LLVM community
that LLVM shouldn't constant-fold floating-point library functions at all
(in non-fast-math mode). I think you could make a reasonable argument for
this. There are ways to do this without losing much optimization -- such
expressions are still constant after all, so they can be hoisted out of any
loop at all. They could even be hoisted out to main if you want. It's also
worth noting that this problem predates the implementation of fast-math
mode in LLVM's optimizer. Now that fast-math mode is available, it may be
easier to convince people to make the non-fast-math mode more conservative.
I don't know that everyone will accept this, but it's worth considering.

Dan

This is the point I was going to make and I think Tim hit the core of it.

My (weak) opinion is that:

Adding lowering information to the IR is NOT the same as building another,
lower-level IR. It'll open doors to places we don't want to go, like
intermixing different levels, allowing for physical registers to be named
in IR, changing many optimizations to worry about lower level IR, etc. I
see it with the same disgust as I see inline assembly in C code.

MachineInstrs are a lower-level description that is clearly separated from
the LLVM IR and has been generated from it for a long time. As Chris said,
a stable high-level IR is very important for front-end and optimization
developers, but back-end developers need to tweak it to make it work on
their architectures. My conclusion is that we might need to formalize a
low-level IR, based on MIs, and keep it on a very loose leash.

Why formalize if we already have it working, you ask? I think that even a
feeble formalization will improve how code is shared among different
back-ends. It'll also be an easy route for a new legalization framework
without having to deprecate much code, and without having to leave too much
old code dangling on less active back-ends. Each step of stronger
formalization can be taken in its own time, implemented on most back-ends,
iteratively.

As Evan said, whatever we do, this move will take years to complete, much
like MC. So, we better plan on something that will be stable throughout the
years, rather than try for something quick and drastic, and have hundreds
of new bugs dangling for years with no good current solution.

My tuppence.

cheers,
--renato

Dan,

Thank you for the quick and thorough reply. The first paragraph pretty much sums it up. Unless there is more will to guarantee (or provide under a flag) a stricter version of IEEE adherence, I doubt much can be done.

So, all of you with picky customers out there :) Is there anyone else that would be concerned about this problem in any of its potential forms?

Sergei

To all, I'm moving on and accepting what appears to be the consensus of the
list, for now.

That said, I believe it would be easy to have levels and prohibit mixing.
Just have the Verifier pass reject the new intrinsics. A new
CodeGenVerifier pass could be added which accepts them, and tools would run
the verifier for the kind of input they expect. There'd be no need to
change any existing optimizers. No need to even add any new text to
LangRef. The new intrinsics would be documented elsewhere.
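
As a standalone illustration of that layering (hypothetical names throughout; there is no such intrinsic namespace or CodeGenVerifier today), the rule could be as simple as:

// Sketch only: the regular verifier rejects codegen-level intrinsics, while
// a codegen-aware verifier accepts them, so the two levels cannot mix.
#include <string>
#include <vector>

// Assume the lowering intrinsics live under a dedicated (hypothetical) prefix.
static bool isCodeGenIntrinsic(const std::string &Name) {
  return Name.rfind("llvm.codegen.", 0) == 0;
}

// Mid-level verifier: fails if any codegen intrinsic appears, so existing
// optimizers never have to know about them.
bool verifyMidLevelIR(const std::vector<std::string> &CalledIntrinsics) {
  for (const std::string &Name : CalledIntrinsics)
    if (isCodeGenIntrinsic(Name))
      return false;
  return true;
}

// CodeGen verifier: the same checks, minus that restriction.
bool verifyCodeGenIR(const std::vector<std::string> &) { return true; }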

Also, there's no proposal here for physical registers, non-SSA registers, or
anything else like that. I think people are making slippery-slope arguments
here, but I also think that a change which requires modifying the
optimizers would be a point where the slippery-slope could be practically
bounded.

Dan

I have the opposite problem. I have customers who call libm functions with constants (or their LLVM intrinsic equivalents) and get very angry if they don't get constant folded, and they're not picky at all about the precision.

--Owen

If, peradventure, you're using fast-math mode, then you're in a different
boat.

Dan

My point is that we have clients who don’t want to lose existing functionality. Obviously it’s fine to want to support other models as well. Also, we don’t currently support fast-math flags on call instructions, so the boat’s not saved by that.

–Owen

I want to point out something about this direction that hasn't really come
up, but I think deserves some better discussion. I don't think it should be
the basis of a decision one way or the other; it's more a consequence of the
decision.

At the IR level, we have some great infrastructure that doesn't exist at
the MI level:

- The pass management tools.
- A verifier that can be run before and after any pass to check the basic
invariants.
- The ability to serialize and deserialize to/from a human understandable
(and authorable) form.

I think before we invest in *significantly* more complexity and logic in
the MI layer of the optimizer, we will need it to have these three things.
Without them, the work will be considerably harder, and we will continue to
be unable to do fine grained testing during the development of new
features. We might not need all of the capabilities we have in the IR, but
I think we'll need at least those used to orchestrate fine grained testing
and validation.

Of course, adding these to MI would be of great benefit to any number of
other aspects of LLVM's development. I am *not* arguing we should eschew MI
because it lacks these things. I just want people to understand that part
of the cost of deciding that MI is the right layer for this is needing to
invest in these pieces of the MI layer.

I just want LLVM to behave the same on whatever platform it's run on. People already accept that depending on iteration order is a bug, but it's been harder to get people to accept that LLVM needs bit-exact floating-point constant folding, especially given the implementation difficulty.

Nick