[RFC] CGContext skeleton implementation

Hello llvm-dev,

Following up on the “CodeGen Context” discussion that was started, attached is patch which implements a pretty minimal skeleton of a CGContext implementation. The goal is to allow the newly added subtarget attributes on functions to be made available to codegen so that codegen can actually switch subtargets within a Module.

Comments and questions are invited herewith.

There has been some disagreement as to what to name this class. I have stuck with “CGContext” for now, though I’m absolutely not attached to that name.

It demonstrates where the CGContext state lives, how code can access it (in X86ISelLowering.cpp), how it reads the attributes from the IR, and how it gets cached.

One thing that it doesn’t yet do is hook up to the target-specific subtarget feature interpretation. For example, it would need to hook up to the x86 subtarget code which knows that, for example, sse4.2 implies sse4.1, and so on. However, I’m sending this out now so that the other parts of the patch can get some feedback.

Thanks,

Dan

cgcontext.patch (6.87 KB)

Hi Dan,

Thanks for working on this! I like the direction this is going. One thing though, the function attributes no longer have “target-cpu” and “target-features” in them. I opted to separate out all of the different flags into separate attributes, instead of relying upon a single feature string. So the caching key would need to be modified here. Or do you think we should go back to adding back those attributes?

One thing to be addressed is when one function has attributes but the next function lacks them. Some type of default value needs to be assumed for the function lacking an attribute, and the context cached accordingly.

-bw

> Hello llvm-dev,
>
> Following up on the "CodeGen Context" discussion that was started,
attached is patch which implements a pretty minimal skeleton of a CGContext
implementation. The goal is to allow the newly added subtarget attributes
on functions to be made available to codegen so that codegen can actually
switch subtargets within a Module.
>
> Comments and questions are invited herewith.
>
> There has been some disagreement as to what to name this class. I have
stuck with "CGContext" for now, though I'm absolutely not attached to that
name.
>
> It demonstrates where the CGContext state lives, how code can access it
(in X86ISelLowering.cpp), how it reads the attributes from the IR, and how
it gets cached.
>
> One thing that it doesn't yet do is hook up to the target-specific
subtarget feature interpretation. For example, it would need to hook up to
the x86 subtarget code which knows that, for example, sse4.2 implies
sse4.1, and so on. However, I'm sending this out now so that the other
parts of the patch can get some feedback.
>
Hi Dan,

Hi Bill,

Thanks for working on this! I like the direction this is going. One thing
though, the function attributes no longer have “target-cpu” and
“target-features” in them. I opted to separate out all of the different
flags into separate attributes, instead of relying upon a single feature
string. So the caching key would need to be modified here. Or do you think
we should go back to adding back those attributes?

One of the advantages os using a single target-features string is that it's
easy to extend; there's no need for the CGContext code to know which
attributes it should care about and which it should ignore, unless we
introduce a naming convention. Alternatively, we could just have it read
*all* the attributes all the time. What do you think?

One thing to be addressed is when one function has attributes but the next
function lacks them. Some type of default value needs to be assumed for the
function lacking an attribute, and the context cached accordingly.

Yes, that's a good point. I'll update my patch to include code to handle
that case.

Dan

Hello llvm-dev,

Following up on the "CodeGen Context" discussion that was started, attached is patch which implements a pretty minimal skeleton of a CGContext implementation. The goal is to allow the newly added subtarget attributes on functions to be made available to codegen so that codegen can actually switch subtargets within a Module.

Comments and questions are invited herewith.

There has been some disagreement as to what to name this class. I have stuck with "CGContext" for now, though I'm absolutely not attached to that name.

It demonstrates where the CGContext state lives, how code can access it (in X86ISelLowering.cpp), how it reads the attributes from the IR, and how it gets cached.

One thing that it doesn't yet do is hook up to the target-specific subtarget feature interpretation. For example, it would need to hook up to the x86 subtarget code which knows that, for example, sse4.2 implies sse4.1, and so on. However, I'm sending this out now so that the other parts of the patch can get some feedback.

Hi Dan,

This basic idea makes sense to me. It is consistent with my understanding of Bill's proposal. What I don't understand is why CGContext is tied to to CodeGen objects and initialized at ISelLowering. We should be able to get a CGContext from a TargetMachine by providing a Function.

IR passes like the LoopVectorizer and LSR also require subtarget information. This was formalized by TargetTransformInfo. Now any pass using TargetTransformInfo needs to respect the current Function attributes, which means reinitializing subtarget info. I also think that IR passes should use the same interface as SD/MI passes to query codegen modes, rather than directly querying Function attributes. For that reason, they should also use CGContext. It should be possible to override those modes with command line options, but I don't think we should have ad-hoc overrides buried within passes.

I think it would be cleanest if the IR pass pipeline had an explicit point at which the subtarget gets initialized--say a SubtargetInfo pass. IR passes that run early can still get architecture defaults from TargetTransformInfo, but can't make any subtarget specific queries. IR passes that run after this point should be able to make CGContext queries, either directly or via TargetTransformInfo (I'm not sure there is any real advantage in treating TargetTransformInfo as an analysis).

-Andy

> Hello llvm-dev,
>
> Following up on the "CodeGen Context" discussion that was started,
attached is patch which implements a pretty minimal skeleton of a CGContext
implementation. The goal is to allow the newly added subtarget attributes
on functions to be made available to codegen so that codegen can actually
switch subtargets within a Module.
>
> Comments and questions are invited herewith.
>
> There has been some disagreement as to what to name this class. I have
stuck with "CGContext" for now, though I'm absolutely not attached to that
name.
>
> It demonstrates where the CGContext state lives, how code can access it
(in X86ISelLowering.cpp), how it reads the attributes from the IR, and how
it gets cached.
>
> One thing that it doesn't yet do is hook up to the target-specific
subtarget feature interpretation. For example, it would need to hook up to
the x86 subtarget code which knows that, for example, sse4.2 implies
sse4.1, and so on. However, I'm sending this out now so that the other
parts of the patch can get some feedback.

Hi Dan,

This basic idea makes sense to me. It is consistent with my understanding
of Bill's proposal. What I don't understand is why CGContext is tied to to
CodeGen objects and initialized at ISelLowering. We should be able to get a
CGContext from a TargetMachine by providing a Function.

IR passes like the LoopVectorizer and LSR also require subtarget
information. This was formalized by TargetTransformInfo. Now any pass using
TargetTransformInfo needs to respect the current Function attributes, which
means reinitializing subtarget info. I also think that IR passes should use
the same interface as SD/MI passes to query codegen modes, rather than
directly querying Function attributes. For that reason, they should also
use CGContext. It should be possible to override those modes with command
line options, but I don't think we should have ad-hoc overrides buried
within passes.

In my current patch, the CGContext instances are kept within
MachineModuleInfo instance. You're right that this is a Codegen-oriented
class. What would you think about having the CGContext instances kept in
the LLVMContext? This would also allow them to be cached in the same way,
and it would make them available to any part of the middle or backend that
wanted them.

I think it would be cleanest if the IR pass pipeline had an explicit point
at which the subtarget gets initialized--say a SubtargetInfo pass. IR
passes that run early can still get architecture defaults from
TargetTransformInfo, but can't make any subtarget specific queries. IR
passes that run after this point should be able to make CGContext queries,
either directly or via TargetTransformInfo (I'm not sure there is any real
advantage in treating TargetTransformInfo as an analysis).

Dividing the optimization pipeline into "before" and "after" subtarget
information is a bit beyond the scope of what I'm looking to do here. If
CGContext is moved to LLVMContext (and perhaps renamed), then it can be
available to any pass, and the decision of when to introduce subtarget
information can be made at a higher level.

Dan

We currently have something that looks like:

IR Transform
(links with) → TargetTransformInfo
(dynamic call) → X86TTI
(links with) → X86Subtarget

I was thinking of directly replacing X86Subtarget as such:

IR Transform
(links with) → TargetTransformInfo
(dynamic call) → X86TTI
(links with) → TargetMachine
(provides) → CGContext

So CGContext could still live inside libTarget. CGContext would be
initialized for a Function based on the attributes and the information already in its container, TargetMachine.

It sounds like you would be making CGContext part of libCore instead, like this:

IR Transform
(links with) → LLVMContext
(provides) → CGContext

CGContext would still need to be initialized with a TargetMachine, which could happen whenever the TargetMachine and Function are both available.

I’m fine with that as long as

  • no one objects to putting CGContext in libCore
  • our goal is to eventually replace TargetTransformInfo with CGContext (maybe renamed)

I think the issue here is whether we want to continue use an AnalysisGroup to vend target info, or rather to have passes get what they need from LLVMContext. I find the AnalysisGroup thing too subtle and prone to driver bugs, but haven’t fully worked out an alternate design.

Maybe something as simple as this would work:

LLVMContext
→ TargetContext (initialized when TargetMachine is available)
→ SubtargetContext (initialized when Function attributes are available)

I wouldn’t expect you to address this. As long we’re not making it difficult to do. Ideally we would have before and after target information passes, but that’s stricter than some would like. I think it would be really horrible though to have canonicalization passes depend on cpu features. So we should make it explicit which passes do and don’t depend on subtarget. Since cpu features change from function to function, this is naturally tied to initialization of CGContext.

-Andy

We currently have something that looks like:

IR Transform
(links with) -> TargetTransformInfo
(dynamic call) -> X86TTI
(links with) -> X86Subtarget

I was thinking of directly replacing X86Subtarget as such:

IR Transform
(links with) -> TargetTransformInfo
(dynamic call) -> X86TTI
(links with) -> TargetMachine
(provides) -> CGContext

So CGContext could still live inside libTarget. CGContext would be
initialized for a Function based on the attributes and the information
already in its container, TargetMachine.

It sounds like you would be making CGContext part of libCore instead, like
this:

IR Transform
(links with) -> LLVMContext
(provides) -> CGContext

CGContext would still need to be initialized with a TargetMachine, which
could happen whenever the TargetMachine and Function are both available.

I'm fine with that as long as
- no one objects to putting CGContext in libCore
- our goal is to eventually replace TargetTransformInfo with CGContext
(maybe renamed)

I think the issue here is whether we want to continue use an AnalysisGroup
to vend target info, or rather to have passes get what they need from
LLVMContext. I find the AnalysisGroup thing too subtle and prone to driver
bugs, but haven't fully worked out an alternate design.

I'm not actually a fan of the analysis group hack, and there are better
ways to slice this, but I don't really like the direction of putting the
CGContext into the LLVMContext.

There are still very viable uses of LLVM which have *no* concept of a
target outside of the LLVM IR itself. I think we should preserve this
abstraction boundary. I think that means there needs to be an *abstract*
interface between the IR layer and the target/codegen layer with a dynamic
call across that boundary.

So this makes sense to me:

LLVMContext

  -> TargetContext (initialized when TargetMachine is available)
    -> SubtargetContext (initialized when Function attributes are
available)

Only if TargetContext and SubtargetContext are abstract interfaces with
trivial implementations available for cases where there is no target (in
the libTarget sense).

I don't really think that's what you're aiming for, so I would advocate for
the MachineModuleInfo or the TargetMachine being the provider of the
CGContext or the TargetContext and SubtargetContext. I like the
TargetMachine providing it personally, but I don't feel strongly here. I
also think that we could start with the MachineModuleInfo providing it and
refactor that in a subsequent patch.

FWIW, I have a plan to remove the crazy analysis group stuff in the new
pass manager. My hope is that we have a simple analysis which can be
configured with dynamic calls into a TargetMachine (or some
TargetMachine-wrapping interface). It will be very explicit with none of
the analysis group magic. It will just preserve the dynamic interface
boundary between the the IR layer and the target layer.

-Chandler

Right. On the flip side, some passes should be able to make hard queries against target/subtarget that fail if the driver doesn’t initialize them. I don’t want any way to accidentally run the vectorizer or LSR without initializing my subtarget info. Also, I want to prevent early passes from getting at subtarget info.

The only thing I don’t like about introducing CGContext in MachineModuleInfo is that TargetTransformInfo is still clearly doing the wrong thing. We would be falsely advertising that we fixed something that’s still broken.

In that case it sounds like we could have:

TargetTransformInfo analysis for passes that can benefit from target heuristics when available but happen to do something reasonable when the target is missing (inlining). Hopefully (please!) avoid dependence on subtarget here.

Passes that depend on TargetMachine can use CGContext for hard target and subtarget info queries. Since they must depend on TargetMachine anyway, they can do something like TM.getCGContext(Function).

-Andy

Right. On the flip side, some passes should be able to make hard queries
against target/subtarget that fail if the driver doesn’t initialize them. I
don’t want any way to accidentally run the vectorizer or LSR without
initializing my subtarget info. Also, I want to prevent early passes from
getting at subtarget info.

<snip>

Passes that depend on TargetMachine can use CGContext for hard target and
subtarget info queries. Since they must depend on TargetMachine anyway,
they can do something like TM.getCGContext(Function).

This sounds like you want IR-level passes which have direct, hard
dependencies on the target and codegen. That would be a radical departure
from the core design of the IR / MI separation. We have discussed this
directly in the past and Chris and others have argued *very strongly*
against this. If you want to reverse this design direction, it really
should be a totally separate discussion from the current discussion. I'm
not actually in the "strongly against" camp on this subject, but I *am*
strongly against trying to design CGContext to address these needs when we
don't yet have either a concrete design or agreement on that design. I
think we should restrain our design considerations to what we need to solve
the limited problem we have today, and not *breaking* the existing design
around TTI, and the IR/Target layering we currently use.

It might nice if scalaropts did not need to link against libTarget—we could easily avoid introducing more dependence on libTarget if that’s a goal.

Either way, we need an IR-level API that requires subtarget initialization. It should be explicit which passes do and don’t require the subtarget. This would add a lot of clarity to the role of CodeGenPrepare-style IR passes which is starting to become very murky. It would separate the problem of providing TargetMachine to IR passes from the problem of initializing the PassManager. The distinction between opt and llc passes has already become meaningless.

Back to this…

It might nice if scalaropts did not need to link against libTarget—we
could easily avoid introducing more dependence on libTarget if that’s a
goal.

We already have that state today in the overwhelming majority of the cases.
The pass manager built by the PassManagerBuilder doesn't need a target at
all. Yes, there is CodeGenPrepare which violates all manner of rules, but
thus far it has been viewed as a hack, and something that should go away
because, at a very fundamental level, it *is* a lowering pass.

Either way, we need an IR-level API that requires subtarget initialization.

Why? The only concrete use case I can come up with is lowering, but you
claim that you aren't trying to do lowering in the IR -- If you can't make
reasonable decisions without a concrete subtarget, then I claim you *are*
lowering. This is why I suggested that this would be a very significant
change in design from the direction that LLVM is currently moving.

I can see the need for target *heuristics* in an optimization pass (not
necessarily canonicalization) that is not doing lowering. But by definition
of being heuristics, there is an obvious way to implement and test them in
a target independent manner. When the target information is no longer a
heuristic and cannot be stubbed out in a target independent manner, I claim
that the transformation it is guiding has ceased to be target independent
as well and doesn't belong in the IR.

But all of this should absolutely be discussed in a separate thread (or in
person) as it is largely orthogonal to allowing the TargetMachine to react
appropriately to function attributes.

Really?
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/63921

What term would you like to use for "irreversible machine specific IR transformations”? We can call it destructive anti-canonicalization if you like :slight_smile: The LLVM community is pushing in the direction of more machine specific IR transformations (both reversible and irreversible). I want to allow those to run later in the pipeline for better testing and maintenance. I want to see the framework formalized a bit so it’s easier for contributors to get things done the right way.

This has nothing to do with lowering to machine instructions. We do not want to model individual machine instructions and their side-effects with LLVM IR. We just want to produce the IR patterns that make sense for the microarchitecture.

That’s not a useful or practical criteria for what should be implemented in IR. If target support is missing, IR optimization will be skipped and codegen will do terrible things. Not only is that uninteresting for testing, but I want to make sure it doesn’t accidentally happen as a result of driver hacking or untested combination of driver options.

The criteria for IR vs MI seems obvious. Can your transform be expressed in terms of IR operations, does it depend on ISEL choices, precise register pressure, instruction latency, or does it have a phase ordering dependence on an MI pass? IR is sometimes the better way to go for development time, simplicity, maintainability, and sharing across targets.

CodeGenPrepare doesn’t need to go away, it needs to be a collection of passes, some of which are completely target specific.

I will say that one potential issue with target specific IR passes is how CodeGen currently uses AliasAnalysis, which somewhat assumes a canonical IR form.

I tried to present possibilities for allowing TargetMachine to react to function attributes in the bulk of my message (the part that you didn’t quote). Hopefully Dan will choose an implementation that works for IR passes. We could continue discussing it in the original thread.

-Andy

I just want to second this. There are a number of folks out there interested in target or sub-target independent optimization uses for at least part of their compilation pipeline. I'm one of them. :slight_smile:

p.s. Sorry for chiming in so late.

Philip

Trying to bubble way back to the top, Andy, do you think there is anything else that needs to be done here before it can go in? I feel like most of our discussion centered around future work, and I’d like to unblock the immediate work to support subtarget-specific code generation.

Trying to bubble way back to the top, Andy, do you think there is anything else that needs to be done here before it can go in? I feel like most of our discussion centered around future work, and I’d like to unblock the immediate work to support subtarget-specific code generation.

Do we have a solution that can work with TTI? I don’t think there was any objection to putting CGContext in TargetMachine, except that we have too many layers of abstraction, which you said will be cleaned up eventually:

IR Transform
(links with) → TargetTransformInfo
(dynamic call) → X86TTI
(links with) → TargetMachine
(provides) → CGContext

-Andy

This seems fine to me?

Ok. Future work: do something simpler for TTI, as opposed to analysis groups.
-Andy

Hi Andy et al,

Attached is an updated version of the CGContext patch. This fixes some bugs in the previous version, and moves the CGContext registry to TargetMachine. Is this in line with what you expected?

Dan

cgcontext.patch (10.7 KB)

Hi Andy et al,

Attached is an updated version of the CGContext patch. This fixes some bugs in the previous version, and moves the CGContext registry to TargetMachine. Is this in line with what you expected?

Sorry, I put this aside for too long when it wasn’t hard to review at all. This is right in line with what I expected. CGContext is not in the patch, but I get the idea.

-Andy

To resurface this thread a bit, I'm looking into this at the moment. :slight_smile:

-eric