Loop Metadata?

Is there currently a good way of attaching metadata to loops?

The use case that I have in mind is implementing a feature whereby the
user can put
#pragma unroll(N)
above a loop, and that serves as an instruction to the optimizer to
unroll the loop N times.
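
To make the request concrete, here is a minimal hand-written sketch of what unrolling by N = 4 could mean for a simple reduction loop. The pragma appears only in a comment, since it is the feature being proposed; the function names `sum_rolled` and `sum_unrolled4` are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Original loop. The user would like to write the hint directly above it:
//   #pragma unroll(4)
int sum_rolled(const std::vector<int>& v) {
    int sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum;
}

// What an unroller could produce for N = 4: four copies of the body per
// iteration, plus a remainder loop for the leftover elements.
int sum_unrolled4(const std::vector<int>& v) {
    int sum = 0;
    std::size_t i = 0;
    const std::size_t n = v.size();
    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; ++i)  // remainder iterations
        sum += v[i];
    return sum;
}
```

Both functions compute the same result; the point of the pragma is that the user, not a cost model, picks the factor.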

I understand that LLVM does not have a first-class loop construct, but
would attaching the metadata to the instructions that branch to the loop
header be a good idea?

I also wonder whether whatever mechanism would work for this purpose
would also work for implementing OpenMP support.

Thanks again,
Hal

That's the starting point. The key is how to choose the loop-identifying instruction (or instructions?) for the annotation and how to preserve the annotation through transformations. At some point during optimization, it may be a good idea to hand it over to LoopPassManager.

I had something similar in mind for #pragma vectorize

I don't think annotations are a good way to implement pragmas.
Annotations are going to disappear if you dump the code to a file.
Pragmas are a hard contract between the programmer and the
compiler: the compiler cannot afford to lose pragma information.

I was considering something like using a builtin call to transmit
the pragma info from the front-end throughout the compiler.
That would expose the pragma "call" in the IR, and LLVM would
have to deal with it as a first-class construct.
This would avoid losing the pragma annotation along the way.
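
One way to picture this proposal is a toy model in which the front end lowers the pragma to a marker call placed in the loop header, and a later pass looks the marker up. Everything here is hypothetical: the `Instr` and `Block` types and the marker name `pragma.unroll` are invented for illustration and are not LLVM APIs.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for IR instructions and basic blocks (not LLVM classes).
struct Instr {
    std::string callee;  // empty for ordinary, non-call instructions
    int arg;             // call argument; unused for non-call instructions
};

struct Block {
    std::string name;
    std::vector<Instr> instrs;
};

// Front end: record "#pragma unroll(N)" as a marker call at the top of the
// loop header, so the request is an ordinary, printable part of the IR.
void tagLoopHeader(Block& header, int unrollCount) {
    header.instrs.insert(header.instrs.begin(),
                         Instr{"pragma.unroll", unrollCount});
}

// Optimizer: look for the marker in a header; 0 means no pragma was found.
int requestedUnrollCount(const Block& header) {
    for (const Instr& in : header.instrs)
        if (in.callee == "pragma.unroll")
            return in.arg;
    return 0;
}
```

The sketch only covers placing and finding the marker. It says nothing about how the marker stays attached to its loop once other passes start moving code.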

Speaking of OpenMP, I think that a big part of it should be
taken care of in the clang front-end. I do not think that we should
expose this kind of syntactic information to the middle-end optimizers.

Sebastian

How would you tie a built-in call to a loop?

The same way as Hal was saying: tag the loop header.

Sebastian

So how do we tag the loop header? ;-)

If we just put the instruction in the header, in the worst case LICM will move it to the header basic block of another loop. :-(

We could try to keep it attached to the loop somehow, e.g. by making it modify a scalar that changes in each loop iteration. But then again, some sophisticated loop fission pass may move these instructions apart.
We definitely need some conventions for such an approach. Another issue is that the more semantics we put into such a loop-marking instruction, the more likely it is to block optimizations such as dead code elimination of the entire loop. If we do not give it any semantics, we risk losing the annotation.

Another option is somewhat in the direction of what Devang pointed out. We could make our intention to maintain information about the loops more explicit. I could, for example, imagine a permanent pass/analysis, maybe called LoopInformationManager, that is in charge of the loop metadata. When reading LLVM-IR, it initializes itself from the loop metadata. Other passes can ask the LoopInformationManager for the information they are interested in. Passes that do not touch the loop structure need to state explicitly that they preserved the loop semantics. If a pass does not, the LoopInformationManager removes the metadata from the loop; if it is informed about a change, it adapts the metadata accordingly. That way only valid metadata is written out at the end.
This approach has its own drawbacks, but it may be a direction we could consider.
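
A rough sketch of how such a manager might behave, assuming loops are identified by the name of their header block and the only metadata is an unroll count. The class and its methods are hypothetical, and dropping all metadata after a non-preserving pass is a simplification of the per-loop removal described above.

```cpp
#include <map>
#include <string>

// Hypothetical side table keyed by loop-header name; not an LLVM class.
class LoopInformationManager {
public:
    // Called while reading the IR: remember the unroll count for a loop.
    void setUnrollCount(const std::string& header, int count) {
        unrollCounts_[header] = count;
    }

    // Called by passes that want the information; 0 means "no metadata".
    int unrollCount(const std::string& header) const {
        std::map<std::string, int>::const_iterator it =
            unrollCounts_.find(header);
        return it == unrollCounts_.end() ? 0 : it->second;
    }

    // Called after each pass. A pass that does not declare that it
    // preserved the loops causes all loop metadata to be dropped, so only
    // metadata still known to be valid would be written out.
    void passFinished(bool preservedLoops) {
        if (!preservedLoops)
            unrollCounts_.clear();
    }

private:
    std::map<std::string, int> unrollCounts_;
};
```

The default is deliberately conservative: a pass that says nothing is treated as having invalidated the metadata, which is exactly the case where a user's pragma would be silently lost.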

Cheers
Tobi

I don't think that optimizers are going to touch a builtin call for which
they don't understand the side effects. If they do, then they will get
what they deserve: bug reports.

I think that storing information on the side is good for the results of a
program analysis that can be recomputed if the information is lost.
On the other hand, the information that the programmer has put in a
pragma cannot be recomputed, and so in my opinion that should be
part of the IR.

Sebastian