Attaching range metadata to IntrinsicInst

Hi,

Range metadata can currently only be attached to LoadInst. I am considering extending it to IntrinsicInst so that the frontend can annotate the range of the return value of an intrinsic call, e.g.:
%a = call i32 @llvm.xxx(), !range !0
!0 = metadata !{ i32 0, i32 1024 }

The motivation behind this extension is some optimizations we are working on for CUDA programs. Some special registers in CUDA (e.g., threadIdx.x) are bounded per CUDA programming guide, and knowing their ranges can improve the precision of ValueTracking and benefit optimizations such as InstCombine.

To implement this idea, we need ValueTracking to be aware of the ranges of these special variables. These special registers are so far read-only and are accessed through intrinsics, e.g.,
%threadIdx.x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()

One possible approach is to have ValueTracking compute the known bits of these intrinsics as special cases. This approach is already taken for the x86_sse42_crc32_64_64 intrinsic. However, this approach may not be elegant because the ranges of these CUDA special registers depend on the GPU compute capability specified by -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is probably discouraged.

Therefore, the approach I am considering is to have clang annotate the ranges of these CUDA special registers according to the -target-cpu flag, and have ValueTracking pick up the range metadata for optimization. By doing so, we hide the target-specific info from ValueTracking.
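
For concreteness, here is a rough sketch of what Clang might emit for -target-cpu sm_20 once this extension is in place (the function name and metadata IDs are only illustrative; the bounds are the sm_20 limits from the CUDA programming guide):

declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()

define i32 @flat_thread_id_x() {
  ; flat id along x = blockIdx.x * blockDim.x + threadIdx.x
  %tid  = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(),   !range !0
  %ntid = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x(),  !range !1
  %bid  = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x(), !range !2
  %base = mul i32 %bid, %ntid
  %flat = add i32 %base, %tid
  ret i32 %flat
}

!0 = metadata !{ i32 0, i32 1024 }   ; threadIdx.x in [0, 1024) on sm_20
!1 = metadata !{ i32 1, i32 1025 }   ; blockDim.x in [1, 1025) on sm_20
!2 = metadata !{ i32 0, i32 65536 }  ; blockIdx.x in [0, 65536) on sm_20

With those ranges, ValueTracking can conclude, for instance, that a number of the high bits of %flat are zero, which is exactly the kind of fact InstCombine can exploit.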

The code change on the LLVM side (excluding Clang) won't be large. The core change is only a few lines: http://reviews.llvm.org/differential/diff/10464/. If this extension sounds good to you, I'll definitely add more tests and revise the documentation on range metadata.

Best,
Jingyue

This seems fine to me, but I’d like to make sure it looks OK to Nick as well.

From: "Chandler Carruth" <chandlerc@google.com>
To: "Jingyue Wu" <jingyue@google.com>, "Nick Lewycky" <nlewycky@google.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, June 17, 2014 1:44:52 AM
Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst

This seems fine to me, but I'd like to make sure it looks OK to Nick
as well.

Is there any reason not to allow these on calls generally (not just intrinsic calls)?

-Hal

Chandler Carruth wrote:

This seems fine to me, but I'd like to make sure it looks OK to Nick as
well.

I strongly prefer baking in knowledge about the intrinsics themselves into the passes if possible. Metadata will always be secondary.

Separately, should value tracking use range metadata when it's available? Absolutely.

I think it should apply to all CallInst not just IntrinsicInst (which is derived from CallInst).

Nick

Chandler Carruth wrote:

This seems fine to me, but I'd like to make sure it looks OK to Nick as
well.

I strongly prefer baking in knowledge about the intrinsics themselves into
the passes if possible. Metadata will always be secondary.

So you're saying that in this particular case you'd prefer LLVM passes to
know about the range of these PTX intrinsics, rather than Clang adding them
as metadata?

ValueTracking.cpp already has some iffy target knowledge (someone sneaked a
direct Intrinsic::x86_sse42_crc32_64_64 check in there), but extending it
to other intrinsics in other targets seems like too much... So should
target info be passed into it in some way? Any suggestions where to put it?
TargetLibraryInfo? TargetTransformInfo? In any case this seems like the
target interface will have to be augmented, and we'll have to carry an
object around into ValueTracking's compute* functions. If this is the right
way, then this is the way it will be done - design ideas are appreciated.

Eli

From: "Eli Bendersky" <eliben@google.com>
To: "Nick Lewycky" <nicholas@mxc.ca>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, June 17, 2014 8:41:58 AM
Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst

Chandler Carruth wrote:

This seems fine to me, but I'd like to make sure it looks OK to Nick
as
well.

I strongly prefer baking in knowledge about the intrinsics themselves
into the passes if possible. Metadata will always be secondary.

So you're saying that in this particular case you'd prefer LLVM
passes to know about the range of these PTX intrinsics, rather than
Clang adding them as metadata?

ValueTracking.cpp already has some iffy target knowledge (someone
sneaked a direct Intrinsic::x86_sse42_crc32_64_64 check in there),
but extending it to other intrinsics in other targets seems like too
much... So should target info be passed into it in some way? Any
suggestions where to put it? TargetLibraryInfo? TargetTransformInfo?
In any case this seems like the target interface will have to be
augmented, and we'll have to carry an object around into
ValueTracking's compute* functions. If this is the right way, then
this is the way it will be done - design ideas are appreciated.

Personally, I'd love to see all of the target-specific intrinsics, and all associated optimization information, localized to each target backend. As it stands, however, the target intrinsics are part of the IR, and the IR optimizers contain the logic necessary to canonicalize that IR. Because ValueTracking is used during canonicalization (by InstCombine, etc.), I think that having it directly understand the intrinsics is fine -- just try not to make the code too messy ;-)

-Hal

Thanks Chandler, Nick, Eli, and Hal for your comments!

TargetTransformInfo and TargetLibraryInfo may not be the best places, because their interfaces are designed to be generic. If we want to “bake in knowledge about the intrinsics themselves into the passes”, what about embedding “target cpu” into the IR, just as we do with “target triple”? Then we could call Module::getTargetCPU() to retrieve the target CPU.

Jingyue

Chandler Carruth wrote:

This seems fine to me, but I'd like to make sure it looks OK to Nick as
well.

I strongly prefer baking in knowledge about the intrinsics themselves into the passes if possible. Metadata will always be secondary.

This dual approach seems non-ideal. I'll agree it's workable, but are there other options? I can see a couple of possible ones:
- Adding an attribute for range on return or parameter values. This removes the dropped metadata problem.
- Extending the function attribute inference pass to add range metadata for known functions/intrinsics. This would ensure that the range metadata could get reintroduced if some pass dropped it. This seems messy though.
- Introduce a mechanism to canonicalize metadata and attributes on target intrinsics. The target provides a function which updates the intrinsic declaration (and call site?) as desired. The optimization passes call this on every target intrinsic before anything else, then proceed to use the metadata for optimization.

I don't really like any of these; I'm mostly throwing them out in case it sparks an idea for someone.

Separately, should value tracking use range metadata when it's available? Absolutely.

Agreed.

I think it should apply to all CallInst not just IntrinsicInst (which is derived from CallInst).

Yes please. It should apply to InvokeInst as well.
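
For concreteness, a rough sketch of the invoke form once the verifier accepts this (the callee and the range bound here are just placeholders):

declare i32 @may_throw()
declare i32 @__gxx_personality_v0(...)

define i32 @g() {
entry:
  %r = invoke i32 @may_throw()
          to label %cont unwind label %lpad, !range !0

cont:
  ret i32 %r

lpad:
  %lp = landingpad { i8*, i32 }
          personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
          cleanup
  resume { i8*, i32 } %lp
}

!0 = metadata !{ i32 0, i32 1024 }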

Chandler Carruth wrote:

This seems fine to me, but I'd like to make sure it looks OK to Nick as
well.

I strongly prefer baking in knowledge about the intrinsics themselves
into the passes if possible. Metadata will always be secondary.

So you're saying that in this particular case you'd prefer LLVM passes to
know about the range of these PTX intrinsics, rather than Clang adding them
as metadata?

Yep.

ValueTracking.cpp already has some iffy target knowledge (someone sneaked a

direct Intrinsic::x86_sse42_crc32_64_64 check in there), but extending it
to other intrinsics in other targets seems like too much...

That's not iffy. That's exactly how it should work, and we should have more
of that. There is a major gotcha and that's dealing with the case where the
intrinsics don't exist because the backend wasn't compiled in. If
x86_sse42_crc32_64_64 is in there (and also in instcombine btw), presumably
that problem is solved somehow? Or does llvm actually not build if you
don't enable the x86 target? I feel like we would've heard about that.

Nick

Hi Nick,

That makes sense. I think the main issue here is that the ranges of these PTX special registers (e.g., threadIdx.x) depend on -target-cpu, which is only visible to clang and llc. Would you mind if we specified “target cpu” in the IR, similar to what we did for “target triple”?

Thanks,
Jingyue

Eh? How do you envision this?

-eric

Hi Eric,

In the IR, besides “target datalayout” and “target triple”, we would have a special “target cpu” string set by the Clang front-end according to its -target-cpu flag. We would also add a Module::getTargetCPU() method to retrieve this string from the IR.
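
Roughly, something like this (the “target cpu” line below is hypothetical syntax, not something the IR accepts today):

target triple = "nvptx64-nvidia-cuda"
target cpu = "sm_20"    ; hypothetical module-level string, read back via Module::getTargetCPU()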

Jingyue

Hi Eric,

In the IR, besides "target datalayout" and "target triple", we have a
special "target cpu" string which is set by the Clang front-end according to
its -target-cpu flag. We also write a Module::getTargetCPU() method to
retrieve this string from the IR.

Not sure that I like this. Each function can have a target cpu though.
That each subtarget cares about the value of the target cpu for how
the intrinsic works sounds a lot like TargetTransformInfo to me.

-eric

Hi Nick,

That makes sense. I think the main issue here is that the ranges of these
PTX special registers (e.g., threadIdx.x) depend on -target-cpu, which is
only visible to clang and llc. Would you mind if we specified "target cpu"
in the IR, similar to what we did for "target triple"?

Aha, that's the salient point. I'd like to see llvm using what it knows
about intrinsics statically. Something like "popcount" is a great example.
Needing to know more than the data in the intrinsic, needing to know about
what subarch is being targeted is different. I didn't realize we had such
intrinsics.

If the ranges really can't be deduced from the intrinsics as written -- and
that's enough to make me wonder whether these intrinsics are properly
designed, but I won't dart down that rabbit hole now -- if the ranges really
can't be deduced from the intrinsics alone, then you should fall back to
using range metadata as you initially suggested. Sorry for running in a
circle on the design.

As an alternative, I asked Eric in person and his suggestion was to query
TargetTransformInfo for information about the intrinsic. That's also
plausible; it depends on whether you feel the authoritative information
should come from the frontend or from the backend. I could see this going
either way.

Nick

From: "Nick Lewycky" <nlewycky@google.com>
To: "Eli Bendersky" <eliben@google.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, June 17, 2014 2:19:57 PM
Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst

Chandler Carruth wrote:

This seems fine to me, but I'd like to make sure it looks OK to Nick
as
well.

I strongly prefer baking in knowledge about the intrinsics themselves
into the passes if possible. Metadata will always be secondary.

So you're saying that in this particular case you'd prefer LLVM
passes to know about the range of these PTX intrinsics, rather than
Clang adding them as metadata?

Yep.

ValueTracking.cpp already has some iffy target knowledge (someone
sneaked a direct Intrinsic::x86_sse42_crc32_64_64 check in there),
but extending it to other intrinsics in other targets seems like too
much...

That's not iffy. That's exactly how it should work, and we should
have more of that. There is a major gotcha and that's dealing with
the case where the intrinsics don't exist because the backend wasn't
compiled in. If x86_sse42_crc32_64_64 is in there (and also in
instcombine btw), presumably that problem is solved somehow? Or does
llvm actually not build if you don't enable the x86 target? I feel
like we would've heard about that.

I think that the IR-level intrinsics are still defined even if the corresponding backend is not enabled (they're in include/llvm/IR/Intrinsics<TARGET>.td, which are all included by include/llvm/IR/Intrinsics.td).

-Hal

From: "Eric Christopher" <echristo@gmail.com>
To: "Jingyue Wu" <jingyue@google.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, June 17, 2014 4:43:28 PM
Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst

> Hi Eric,
>
> In the IR, besides "target datalayout" and "target triple", we have
> a
> special "target cpu" string which is set by the Clang front-end
> according to
> its -target-cpu flag. We also write a Module::getTargetCPU() method
> to
> retrieve this string from the IR.
>

Not sure that I like this. Each function can have a target cpu
though.
That each subtarget cares about the value of the target cpu for how
the intrinsic works sounds a lot like TargetTransformInfo to me.

-eric

I also think we should avoid this; as I said earlier, ValueTracking is used during canonicalization, and the community consensus seems to be to try, to the extent possible, to make this canonical form backend independent (even for intrinsics). Having Clang add the range metadata seems preferable in this case (and, as a side effect, gives us a new generally-useful capability).

-Hal

From: "Eric Christopher" <echristo@gmail.com>
To: "Jingyue Wu" <jingyue@google.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, June 17, 2014 4:43:28 PM
Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst

> Hi Eric,
>
> In the IR, besides "target datalayout" and "target triple", we have
> a
> special "target cpu" string which is set by the Clang front-end
> according to
> its -target-cpu flag. We also write a Module::getTargetCPU() method
> to
> retrieve this string from the IR.
>

Not sure that I like this. Each function can have a target cpu
though.
That each subtarget cares about the value of the target cpu for how
the intrinsic works sounds a lot like TargetTransformInfo to me.

-eric

I also think we should avoid this; as I said earlier, ValueTracking is used during canonicalization, and the community consensus seems to be to try, to the extent possible, to make this canonical form backend independent (even for intrinsics). Having Clang add the range metadata seems preferable in this case (and, as a side effect, gives us a new generally-useful capability).

Sure. No objections to that solution either.

-eric

Thanks all for your comments and suggestions!

I sent out a diff (http://reviews.llvm.org/D4187) to extend range metadata to call/invoke.

Jingyue

Hal Finkel wrote:

From: "Eric Christopher"<echristo@gmail.com>
To: "Jingyue Wu"<jingyue@google.com>
Cc: "LLVM Developers Mailing List"<llvmdev@cs.uiuc.edu>
Sent: Tuesday, June 17, 2014 4:43:28 PM
Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst

Hi Eric,

In the IR, besides "target datalayout" and "target triple", we have
a
special "target cpu" string which is set by the Clang front-end
according to
its -target-cpu flag. We also write a Module::getTargetCPU() method
to
retrieve this string from the IR.

Not sure that I like this. Each function can have a target cpu
though.
That each subtarget cares about the value of the target cpu for how
the intrinsic works sounds a lot like TargetTransformInfo to me.

-eric

I also think we should avoid this; as I said earlier, ValueTracking is used during canonicalization, and the community consensus seems to be to try, to the extent possible, to make this canonical form backend independent (even for intrinsics).

The general optimizer parts recognize many functions for the behaviour they're guaranteed to have, from malloc to strlen to sqrt. Before we had a class that would answer the question "is the function named 'malloc' really malloc", we relied on intrinsics to do this sort of thing. Grabbing target intrinsics and making the most of them is safe in the optimizer.

It may be a little weird that you can build llvm with only one target enabled and still have IR that uses intrinsics for all targets, or even x86, MIPS, and ARM intrinsics in a single function, but the mid-level optimizer doesn't need to care.

The distinction I'm drawing is that it's correct that the canonicalizers shouldn't need to care what backend is being targeted, which is different from saying that they only work on target-independent IR. They can optimize target-specific intrinsics without needing to care which backend will be used later, so long as they don't produce such intrinsics.

Nick

  Having Clang add the range metadata seems preferable in this case (and, as a side effect, gives us a new generally-useful capability).