LLVM Loop Vectorizer

Hi,

We are starting to work on an LLVM loop vectorizer. There are a number of different projects that already vectorize LLVM IR, for example Hal's BB-Vectorizer, Intel's OpenCL Vectorizer, Polly, ISPC, and AnySL, just to name a few. I think that it would be great if we could collaborate on the areas that are shared between the different projects. I think that refactoring LLVM in a way that would expose target information to IR-level transformations would be a good way to start. Vectorizers, as well as other IR-level transformations, require target-specific information, such as the cost of different instructions or the availability of certain features. Currently, LLVM-based vectorizers do not make use of this information, or they hard-code target information. A loop vectorizer would need this target information as well. After we have some basic target-information infrastructure in place we can start discussing the vectorizer itself.
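As a rough sketch of the kind of interface such a refactoring might expose (all names below are hypothetical illustrations, not an existing LLVM API), the idea is that an IR-level pass would query an abstract target-info object about costs and capabilities instead of hard-coding per-target tables:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch -- these names are illustrative, not real LLVM API.
// An IR-level pass asks an abstract target-info object about costs and
// capabilities instead of hard-coding per-target tables in the pass.
struct TargetCostInfo {
  // Relative cost of one IR-level operation, in abstract units.
  virtual unsigned getInstrCost(const std::string &Opcode,
                                unsigned Width) const = 0;
  // Does the target natively support this vector width (in elements)?
  virtual bool isVectorWidthLegal(unsigned Width) const = 0;
  virtual ~TargetCostInfo() {}
};

// A toy target where 4-wide vectors are native and wider operations get
// scalarized (cost proportional to the number of elements).
struct ToyTarget : TargetCostInfo {
  bool isVectorWidthLegal(unsigned Width) const { return Width <= 4; }
  unsigned getInstrCost(const std::string &Opcode, unsigned Width) const {
    if (Opcode == "fdiv")
      return 10 * Width;                          // expensive at any width
    return isVectorWidthLegal(Width) ? 1 : Width; // scalarized otherwise
  }
};

// A vectorizer would compare vector cost against the equivalent scalar
// cost before committing to a width.
bool profitableToVectorize(const TargetCostInfo &TCI, unsigned Width) {
  return TCI.isVectorWidthLegal(Width) &&
         TCI.getInstrCost("fadd", Width) < Width * TCI.getInstrCost("fadd", 1);
}
```

The point is only the shape of the query: the pass never names a specific target, so each backend can supply its own implementation.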

I think that the first step would be to expose Target Lowering Interface (TLI) to OPT's IR-level passes. Currently TLI is only available in LLC. I suggest that we merge LLC and OPT into a single tool that will drive both IR-level passes and the codegen. LLC and OPT can remain as wrappers around the new tool. Please let me know if you can think of a good name for the new tool. I was thinking that "llvm-cli" may be a good name (cli = command line interface). OPT and LLC are only used by LLVM developers, so the impact of this change on the user community would be small.

Thanks,
Nadav

From: "Nadav Rotem" <nrotem@apple.com>
To: "llvmdev@cs.uiuc.edu Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, October 5, 2012 1:14:47 AM
Subject: [LLVMdev] LLVM Loop Vectorizer

Hi,

We are starting to work on an LLVM loop vectorizer. There are a number of different projects that already vectorize LLVM IR, for example Hal's BB-Vectorizer, Intel's OpenCL Vectorizer, Polly, ISPC, and AnySL, just to name a few. I think that it would be great if we could collaborate on the areas that are shared between the different projects. I think that refactoring LLVM in a way that would expose target information to IR-level transformations would be a good way to start. Vectorizers, as well as other IR-level transformations, require target-specific information, such as the cost of different instructions or the availability of certain features. Currently, LLVM-based vectorizers do not make use of this information, or they hard-code target information. A loop vectorizer would need this target information as well. After we have some basic target-information infrastructure in place we can start discussing the vectorizer itself.

Great!

I think that the first step would be to expose Target Lowering
Interface (TLI) to OPT's IR-level passes. Currently TLI is only
available in LLC. I suggest that we merge LLC and OPT into a single
tool that will drive both IR-level passes and the codegen.

Having made this suggestion in the past, I am, of course, fully supportive! This is something that we definitely need to do.

LLC and
OPT can remain as wrappers around the new tool. Please let me know
if you can think of a good name for the new tool. I was thinking
that "llvm-cli" may be a good name (cli = command line interface).
OPT and LLC are only used by LLVM developers, so the impact of this
change on the user community would be small.

We could just call it llvmc (for "LLVM compiler"). Alternatively, we could add OPT's options to LLC, and add an option to OPT to produce IR output after all of the IR-level passes have completed, to emulate the current OPT functionality.

Thanks again,
Hal

Nadav Rotem wrote:

Hi,

We are starting to work on an LLVM loop vectorizer. There are a number of different projects that already vectorize LLVM IR, for example Hal's BB-Vectorizer, Intel's OpenCL Vectorizer, Polly, ISPC, and AnySL, just to name a few. I think that it would be great if we could collaborate on the areas that are shared between the different projects. I think that refactoring LLVM in a way that would expose target information to IR-level transformations would be a good way to start. Vectorizers, as well as other IR-level transformations, require target-specific information, such as the cost of different instructions or the availability of certain features. Currently, LLVM-based vectorizers do not make use of this information, or they hard-code target information. A loop vectorizer would need this target information as well. After we have some basic target-information infrastructure in place we can start discussing the vectorizer itself.

I think that the first step would be to expose Target Lowering Interface (TLI) to OPT's IR-level passes.

I absolutely think that we should have something like TargetData (now DataLayout) but for the vector types and operations. However, I'm not familiar with "Target Lowering Interface". Could you explain?

  Currently TLI is only available in LLC. I suggest that we merge LLC and OPT into a single tool that will drive both IR-level passes and the codegen.

This really shouldn't be necessary. Notably, it is still possible today to build a Module and optimize it without having decided what target you're targeting.

Nick

  LLC and OPT can remain as wrappers around the new tool. Please let me know if you can think of a good name for the new tool. I was thinking that "llvm-cli" may be a good name (cli = command line interface). OPT and LLC are only used by LLVM developers, so the impact of this change on the user community would be small.

I think we should try to abstract the costs of instructions on various targets instead of trying to replicate them exactly. The coarser the costing infrastructure, the more robust the vectorization pass will be. This also eliminates, or at least reduces, the need to update the costing infrastructure whenever new hardware reduces the cost(s) of existing instructions.
- Dibyendu
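To make the coarse-bucket idea concrete (a sketch with invented names, not a proposal for the actual interface), a target would classify operations into a few tiers rather than report exact latencies:

```cpp
#include <cassert>

// Sketch of coarse costing: each operation falls into one of a few
// buckets. A new chip that merely tweaks a latency rarely moves an
// operation into a different bucket, so the table needs little upkeep.
enum CostTier { TierFree = 0, TierCheap = 1, TierExpensive = 4 };

// Toy per-target classification, keyed by an opcode character.
CostTier costTier(char Op) {
  switch (Op) {
  case '+':
  case '-':
    return TierCheap;
  case '/':
  case '%':
    return TierExpensive; // e.g. hardware division
  default:
    return TierCheap;
  }
}

// A vectorizer only needs the aggregate tier sum of a loop body to make
// a go/no-go decision, not cycle-accurate numbers.
unsigned bodyCost(const char *Ops) {
  unsigned Cost = 0;
  for (; *Ops; ++Ops)
    Cost += costTier(*Ops);
  return Cost;
}
```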

From: "Dibyendu Das" <Dibyendu.Das@amd.com>
To: "Nadav Rotem" <nrotem@apple.com>, "llvmdev@cs.uiuc.edu Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, October 5, 2012 3:59:56 AM
Subject: Re: [LLVMdev] LLVM Loop Vectorizer

I think we should try to abstract the costs of instructions on various targets instead of trying to replicate them exactly. The coarser the costing infrastructure, the more robust the vectorization pass will be. This also eliminates, or at least reduces, the need to update the costing infrastructure whenever new hardware reduces the cost(s) of existing instructions.

I think that one of the big questions is where this information, abstract or not, resides. The cost information needs to be abstract in some sense: IR instructions don't always map directly onto machine instructions, we don't yet have real register-pressure information, etc. Other information is more direct: does the target support vectors of given types and sizes, and are certain operations provided natively? As much as possible, I believe this information should be derived automatically from the target TableGen files and the pre-existing logic in *ISelLowering.cpp. This requires linking those files with the mid-level optimizers.

-Hal

Why not just have a hook into the TargetInstrInfo to query for the cost of an instruction? This is already used in many places throughout the optimizers.

Perhaps we can parameterize the size of the vector while vectorizing at the LLVM level and fix up the loop iterators in a target-specific pass.

From: "Micah Villmow" <Micah.Villmow@amd.com>
To: "Dibyendu Das" <Dibyendu.Das@amd.com>, "Nadav Rotem" <nrotem@apple.com>, "llvmdev@cs.uiuc.edu Mailing List"
<llvmdev@cs.uiuc.edu>
Sent: Friday, October 5, 2012 10:32:17 AM
Subject: Re: [LLVMdev] LLVM Loop Vectorizer

Why not just have a hook into the TargetInstrInfo to query for the
cost of an instruction? This is already used in many places
throughout the optimizers.

Makes sense to me.

-Hal

From: "Ramshankar Ramanarayanan" <Ramshankar.Ramanarayanan@amd.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "Dibyendu Das" <Dibyendu.Das@amd.com>
Cc: "llvmdev@cs.uiuc.edu Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, October 5, 2012 11:00:39 AM
Subject: RE: [LLVMdev] LLVM Loop Vectorizer

Perhaps we can parameterize the size of the vector while vectorizing at the LLVM level and fix up the loop iterators in a target-specific pass.

I don't understand the motivation for your suggestion. Can you please explain?

Thanks again,
Hal

I will simply echo Hal here - yes, we definitely want to do this. How exactly - let's talk :)

Sergei.

I think that the first step would be to expose Target Lowering
Interface (TLI) to OPT's IR-level passes. Currently TLI is only
available in LLC. I suggest that we merge LLC and OPT into a single
tool that will drive both IR-level passes and the codegen.

Having made this suggestion in the past, I am, of course, fully supportive!

This is something that we definitely need to do.

If a -simd option is specified, opt could do validity checks, dependency analysis, and the like to recognize that a loop can be executed in parallel; it would then convert the data types to vector types and add the scaling factor to the loop's iterators. Following this, there could be an early machine-function pass that sets processor-specific values in all instructions of a loop vectorized by opt. This pass could look at the options to see what is expected by the user and what is set by default, etc. For example, for the newest x86-64, gcc has an option -mprefer-avx128, which helps over 256-bit AVX in several cases. The generic vector types in LLVM could be put to use in opt.
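As an illustration of "add the scaling factor to the loop's iterators" (plain C, with the vector operation modeled by a small inner loop rather than real intrinsics or generated code):

```c
#include <assert.h>

/* Sketch of the transformation being described: the optimizer picks a
 * scaling (vectorization) factor VF and strides the loop iterator by
 * VF; a later, target-aware stage could then retarget VF (e.g. for
 * 128-bit vs 256-bit AVX). The "vector op" here is modeled with a
 * plain inner loop, since this is an illustration only. */
enum { VF = 4 };

void add_arrays(float *a, const float *b, const float *c, int n) {
  int i = 0;
  /* Vectorized body: iterator scaled by VF. */
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < VF; ++lane) /* stands in for one vector add */
      a[i + lane] = b[i + lane] + c[i + lane];
  /* Scalar remainder loop fixes up the iterations VF does not divide. */
  for (; i < n; ++i)
    a[i] = b[i] + c[i];
}
```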

I absolutely think that we should have something like TargetData (now DataLayout) but for the vector types and operations. However, I’m not familiar with “Target Lowering Interface”. Could you explain?

I agree. Once we make the codegen accessible to the IR-level passes we need to start talking about the right abstraction. I have some ideas, but I wanted to start the discussion after we are done with the first phase.

Regarding TLI. So, DAGCombine, CodeGenPrepare, and LoopReduce all use the TLI interface, which can answer questions such as "is this operation supported?" or "is this type legal?". This is a subset of what we need in a vectorizer. We can discuss other requirements that the vectorizer may have after we finish with the first phase. I suspect that we may have to refactor some functionality out of TLI.

Currently TLI is only available in LLC. I suggest that we merge LLC and OPT into a single tool that will drive both IR-level passes and the codegen.

This really shouldn’t be necessary. Notably, it is still possible today to build a Module and optimize it without having decided what target you’re targeting.

I agree that it is not necessary for many optimizations. However, it is absolutely needed for lower-level transformations such as strength reduction. So, I plan to keep the current behavior of OPT, where if target information is not provided through the command line then TargetData is left uninitialized (a null pointer). As far as IR-level passes go, nothing is going to change.

I think that the first step would be to expose Target Lowering

Interface (TLI) to OPT's IR-level passes.

By "lowering", we assume the bitcode is more abstract than the machine code. However, in some situations it is just the opposite. For instance, some architectures support vectorization of min/max/saturated-{add,sub}/conditional-assignment/etc. We need to detect such machine-dependent patterns and *PROMOTE* the bitcode into the right forms before we are able to vectorize it. How do we deal with this situation?
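For instance (standard LLVM IR, shown only to illustrate the pattern), a source-level max reaches the IR as a compare plus a select, and the vectorizer has to treat the pair as one unit so the backend can match a vector-max instruction:

```llvm
; A C-level "max" appears in the IR as a compare plus a select:
%cmp = icmp sgt i32 %a, %b
%max = select i1 %cmp, i32 %a, i32 %b

; To use a target's vector-max instruction, the vectorizer must
; recognize the icmp+select pair as a single unit and carry it through
; as a vector compare/select that the backend can pattern-match:
%vcmp = icmp sgt <4 x i32> %va, %vb
%vmax = select <4 x i1> %vcmp, <4 x i32> %va, <4 x i32> %vb
```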

Shuxin

Regarding TLI. So, DAGCombine, CodeGenPrepare, and LoopReduce all use the TLI interface, which can answer questions such as "is this operation supported?" or "is this type legal?". This is a subset of what we need in a vectorizer. We can discuss other requirements that the vectorizer may have after we finish with the first phase. I suspect that we may have to refactor some functionality out of TLI.

Possibly, though I think TargetData should still be able to get you
what you want.

Currently TLI is only available in LLC. I suggest that we merge LLC and OPT
into a single tool that will drive both IR-level passes and the codegen.

*shrug* I really don't think this is necessary either. There's really no need to merge the two tools (and IMO doing so would weaken the separation here). Why not just have TargetData/TargetLoweringInfo in opt?

I agree that it is not necessary for many optimizations. However, it is absolutely needed for lower-level transformations such as strength reduction. So, I plan to keep the current behavior of OPT, where if target information is not provided through the command line then TargetData is left uninitialized (a null pointer). As far as IR-level passes go, nothing is going to change.

TargetData is pretty useful during opt if it's available; probably no need to merge the tools, though.

-eric

From: "Shuxin Yang" <shuxin.llvm@gmail.com>
To: "Nadav Rotem" <nrotem@apple.com>
Cc: "llvmdev@cs.uiuc.edu Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, October 5, 2012 12:33:17 PM
Subject: Re: [LLVMdev] LLVM Loop Vectorizer

>I think that the first step would be to expose Target Lowering
>Interface (TLI) to OPT's IR-level passes.

By "lowering", we assume the bitcode is more abstract than the machine code. However, in some situations it is just the opposite. For instance, some architectures support vectorization of min/max/saturated-{add,sub}/conditional-assignment/etc. We need to detect such machine-dependent patterns and *PROMOTE* the bitcode into the right forms before we are able to vectorize it. How do we deal with this situation?

This is a matter of naming. TLI contains information about target capabilities, and could be used to affect the kinds of normalization decisions that you mention.

-Hal

From: "Ramshankar Ramanarayanan" <Ramshankar.Ramanarayanan@amd.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "llvmdev@cs.uiuc.edu Mailing List" <llvmdev@cs.uiuc.edu>, "Dibyendu Das" <Dibyendu.Das@amd.com>
Sent: Friday, October 5, 2012 11:42:48 AM
Subject: RE: [LLVMdev] LLVM Loop Vectorizer

If a -simd option is specified, opt could do validity checks, dependency analysis, and the like to recognize that a loop can be executed in parallel; it would then convert the data types to vector types and add the scaling factor to the loop's iterators. Following this, there could be an early machine-function pass that sets processor-specific values in all instructions of a loop vectorized by opt. This pass could look at the options to see what is expected by the user and what is set by default, etc.

Do you mean having this later pass choose the blocking factors, etc?

for example, for the newest x86-64, gcc has an option -mprefer-avx128, which helps over 256-bit AVX in several cases. The generic vector types in LLVM could be put to use in opt.

I think that, where possible, the idea is to retain the use of the generic LLVM vector types. Target-specific knowledge, however, might be used to decide when to form those types and with what operations to use them.

-Hal

From: "Eric Christopher" <echristo@gmail.com>
To: "Nadav Rotem" <nrotem@apple.com>
Cc: "llvmdev@cs.uiuc.edu Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, October 5, 2012 1:29:50 PM
Subject: Re: [LLVMdev] LLVM Loop Vectorizer

> Regarding TLI. So, DAGCombine, CodeGenPrepare, and LoopReduce all
> use the TLI interface, which can answer questions such as "is this
> operation supported?" or "is this type legal?". This is a subset of
> what we need in a vectorizer. We can discuss other requirements that
> the vectorizer may have after we finish with the first phase. I
> suspect that we may have to refactor some functionality out of TLI.
>

Possibly, though I think TargetData should still be able to get you
what you want.

I think this is the wrong way to look at the problem. The real question is: why should we keep OPT and LLC separate? Keeping them separate and using some extension of TargetData will just mean manually duplicating information in this extended TargetData that we otherwise have in the backends. This is error-prone [from personal experience] and otherwise unproductive.

In addition, merging the tools will allow the consolidation of target-specific code in OPT. There is code in InstCombine, for example, that specifically deals with x86 intrinsics. This code should be moved into a callback provided by the x86 target. Currently, however, this is not possible because of this separation.

-Hal

I think this is the wrong way to look at the problem. The real question is: why should we keep OPT and LLC separate? Keeping them separate and using some extension of TargetData will just mean manually duplicating information in this extended TargetData that we otherwise have in the backends. This is error-prone [from personal experience] and otherwise unproductive.

You quite obviously misunderstood me.

In addition, merging the tools will allow the consolidation of target-specific code in OPT. There is code in InstCombine, for example, that specifically deals with x86 intrinsics. This code should be moved into a callback provided by the x86 target. Currently, however, this is not possible because of this separation.

Making the data available to the passes is just fine, I don't see a
need to merge the two tools.

-eric

Hal, Nadav: I think we're piling too many issues into this one thread.

Nadav Rotem wrote:

I absolutely think that we should have something like TargetData (now
DataLayout) but for the vector types and operations. However, I'm not
familiar with "Target Lowering Interface". Could you explain?

I agree. Once we make the codegen accessible to the IR-level passes we
need to start talking about the right abstraction. I have some ideas,
but I wanted to start the discussion after we are done with the first
phase.

Regarding TLI. So, DAGCombine, CodeGenPrepare, and LoopReduce all use
the TLI interface, which can answer questions such as "is this operation
supported?" or "is this type legal?". This is a subset of what we need
in a vectorizer. We can discuss other requirements that the vectorizer
may have after we finish with the first phase. I suspect that we may
have to refactor some functionality out of TLI.

Okay, if you're referring to llvm::TargetLowering, then yes that should have a whole slew of methods copied out to a new object (I'm imagining TargetVectorData with a getter in TargetData) that would answer those questions.

Exposing TargetLowering itself is a bad idea since its interface refers to MCExpr* and SDValue and other things that genuinely don't make sense at the IR level.

Currently TLI is only available in LLC. I suggest that we merge LLC
and OPT into a single tool that will drive both IR-level passes and
the codegen.

This really shouldn't be necessary. Notably, it is still possible
today to build a Module and optimize it without having decided what
target you're targeting.

I agree that it is not necessary for many optimizations. However, it is
absolutely needed for lower-level transformations such as strength
reduction. So, I plan to keep the current behavior of OPT, where if
target information is not provided through the command line then
TargetData is left uninitialized (a null pointer). As far as IR-level
passes go, nothing is going to change.

How much do you like the way TargetData works? With the strings in the module? It has the benefit of having a working design which means you won't get much noise about it in review.

The downside is that all the information that goes into the TargetVectorData would have to be encoded as a string and put into the Module. You'll have to invent an encoding and write a parser for it. Yuck.
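For comparison, here is the existing mechanism next to what such a string might look like -- the "vectorunit" line below is a purely invented encoding, exactly the kind of thing that would need to be designed and parsed:

```llvm
; Today's target description string, carried in the module:
target datalayout = "e-p:64:64:64-i32:32:32-f32:32:32-v128:128:128"

; Hypothetical analogue for vector capabilities (invented syntax):
; native widths of 128 and 256 bits, with these integer and FP
; element types supported.
target vectorunit = "w128:w256-i8:i16:i32-f32:f64"
```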

The upside is that it preserves the IR / codegen distinction that we've all grown to love, and does it using a mechanism that is, in LLVM terms, as old as the hills. No reviewer could argue with that.

I was imagining a new "target vectorunit = ..." string in the modules. If you want a different way of doing it for the vector information, I might ask that you change how TargetData works too. :)

Nick