Define new pragma in Clang

Hi all,

I would like to ask whether you can provide some information or a link to any
documentation on defining a custom pragma in Clang. I understand that the steps
are: modify the parser in lib/Parse/ParsePragma.cpp, add an action in
include/clang/Parse/Action.h, and add a handler in Sema. But it is not as easy
as it sounds.
What I want is to have a pragma in the *.cpp file:

#pragma optimizeLoop
for(int i=0 ....)

and to obtain the LLVM IR in the *.ll file:

bb1:
  %0 = load i32* %i, align 4, !pragma !0
...

!0 = metadata !{metadata !"optimizeLoop"}

What exactly are you having trouble with? Generally the pragma parser
just handles the parsing logic of the pragma, and then calls a handler
in Sema which will mark some information in the AST (for example,
adding an attribute or setting a bit in the decl).

- Daniel

My question is how can I attach the pragma to the following 'for' loop? What
should I set in the AST such that I can translate it in the metadata in the LLVM
IR?

Thank you.

Generally you would need to do something like:
1. Add a pragma handler, which has a callback on the actions interface.
2. Add a sema implementation of the callback, which sets some
internal bit in the Sema object.
3. Add a new bit to the 'for' statement, to specify whether it
had #pragma optimize set.
4. Modify codegen to emit the metadata based on that bit.

- Daniel
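
As a rough illustration of the four steps above, here is a toy model in plain
C++. It is only a sketch of the data flow, not Clang code: ToySema, ForStmt,
parseLine and emit are hypothetical names invented for this example, and the
emitted string merely imitates the "!pragma !0" annotation from the original
post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for the real classes; none of these names are Clang APIs.
struct ForStmt {
  std::string text;
  bool hasOptimizeLoop; // step 3: the new bit on the 'for' statement
};

struct ToySema {
  bool pendingOptimizeLoop = false; // step 2: internal bit in the Sema object
  std::vector<ForStmt> ast;

  // Step 2: the callback that the pragma handler invokes.
  void actOnPragmaOptimizeLoop() { pendingOptimizeLoop = true; }

  // Building a 'for' statement consumes the pending bit, so the pragma
  // applies only to the loop that directly follows it.
  void actOnForStmt(const std::string &text) {
    ForStmt stmt;
    stmt.text = text;
    stmt.hasOptimizeLoop = pendingOptimizeLoop;
    ast.push_back(stmt);
    pendingOptimizeLoop = false;
  }
};

// Step 1: the "pragma handler" recognises the pragma and calls into Sema.
void parseLine(ToySema &sema, const std::string &line) {
  if (line == "#pragma optimizeLoop")
    sema.actOnPragmaOptimizeLoop();
  else if (line.rfind("for", 0) == 0) // line starts with "for"
    sema.actOnForStmt(line);
}

// Step 4: "codegen" emits the metadata annotation based on the bit.
std::string emit(const ForStmt &stmt) {
  std::string ir = "; loop: " + stmt.text;
  if (stmt.hasOptimizeLoop)
    ir += ", !pragma !0";
  return ir;
}
```

Feeding "#pragma optimizeLoop" and then two 'for' lines through parseLine marks
only the first loop, so only its emit() output carries the annotation. The AST
node is what carries the information from Sema to codegen.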

Thank you, your hints were very helpful. However, I have run into one more
problem with pragmas. I have added two pragmas, and each works well when it is
the only one used in my test source. The compiler crashes, though, when the test
source contains both pragmas.
What I have done is:
1 install a fresh copy of llvm + clang
2 modify clang to understand pragma 1 and pragma 2
3 write a test.cpp without any pragma => clang works fine
4 write a test1.cpp containing pragma 1 => clang works fine
5 write a test2.cpp containing pragma 2 => clang works fine
6 write a test12.cpp containing pragma 1 and pragma 2 => clang crashes

It parses the pragmas correctly and creates the AST, but crashes with:

clang-cc: /llvm/ADT/SmallVector.h:124: T&
llvm::SmallVectorImpl<T>::operator[](unsigned int) [with T = std::pair<unsigned
int, llvm::TrackingVH<llvm::MDNode> >]: Assertion `Begin + idx < End' failed.

0. Program arguments: clang-cc inputCode.cpp -emit-llvm
1. <eof> parser at end of file
2. Per-module optimization passes
3. Running pass 'Print module to stderr' on module 'inputCode.cpp'.
Aborted (core dumped)

However, I do not modify the passes at all and the AST seems to be correctly
created, since when test1.cpp and test2.cpp are compiled, it works.

Any idea would be appreciated. My guess is that there is a problem with the AST,
since printing the module causes a crash. But still, the other passes run before
were successful and additionally, it doesn't crash when only one pragma is added.

Thank you,
Alexandra

Hi Alexandra,

It's hard for me to know the problem without seeing the patch/crash. It
looks like clang is crashing when it tries to index into a SmallVector
out-of-bounds. I suggest you run clang in a debugger and see what code
is doing this, which will probably give a clue as to the problem.

- Daniel

It seems that the problem is in the file Metadata.cpp in
void MetadataContextImpl::
getMDs(const Instruction *Inst, SmallVectorImpl<MDPairTy> &MDs) const {

...

// MD kinds are numbered from 1.
MDs[MI->first - 1] = std::make_pair(MI->first, MI->second);

}

The instruction has metadata attached, but the value of its MD kind is 2; the
metadata with MD kind equal to 1 was attached to a previous instruction. The
index MI->first - 1 therefore appears to fall outside MDs. Taking care of the
index in MDs[MI->first - 1] solves the problem.

I used an int idx = 0 and MDs[idx++] = std::make_pair(MI->first, MI->second);

I hope this is correct.

Alexandra
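
The indexing problem described above can be reproduced in a small standalone
model. This is a sketch under assumptions, not the LLVM source: KindMap, MDPair,
kindIndexInBounds and collectDense are hypothetical names, and it assumes the
output vector is sized to the number of entries attached to the instruction
(the elided part of getMDs is not shown in the thread).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: metadata attached to one instruction, keyed by MD kind.
// MD kinds are numbered from 1, as in the quoted comment.
using KindMap = std::map<unsigned, std::string>;
using MDPair = std::pair<unsigned, std::string>;

// Returns true if indexing by (kind - 1) stays inside a vector that has one
// slot per attached entry. This holds only when the kinds are 1..N, no gaps.
bool kindIndexInBounds(const KindMap &attached) {
  for (const auto &entry : attached)
    if (entry.first - 1 >= attached.size())
      return false;
  return true;
}

// The workaround from the message above: fill the slots with a running index
// instead of deriving the slot from the kind.
std::vector<MDPair> collectDense(const KindMap &attached) {
  std::vector<MDPair> out(attached.size());
  std::size_t idx = 0;
  for (const auto &entry : attached)
    out[idx++] = std::make_pair(entry.first, entry.second);
  return out;
}
```

For an instruction that carries only kind 2, kindIndexInBounds returns false
(slot 1 in a one-element vector), while collectDense places the pair in slot 0.
This matches the observation that each pragma works alone, since a single
registered kind is always kind 1.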

The function "getMDs" doesn't exist on mainline. If you're using LLVM 2.6 with metadata, I strongly recommend you upgrade to mainline svn.

-Chris

Hi,

I have a task similar to the one you posted here (adding a new pragma in Clang,
attaching it to a loop, and generating metadata in the LLVM IR). If you have
solved it, could you please tell me:

1. How can I pass information from the pragma (say, a number) to the CodeGen
class? I have added the handler class and functions, but I can't find a way
to pass the information from the Sema class to CodeGen.

2. How can I attach the pragma to a specific loop in Clang so that I can later
recover that information in the LLVM IR?

Thanks a lot.
Tigran

Hi Tigran,

> 1. How can I pass information from the pragma (say, a number) to the CodeGen
> class? I have added the handler class and functions, but I can't find a way
> to pass the information from the Sema class to CodeGen.

No idea for this one.

> 2. How can I attach the pragma to a specific loop in Clang so that I can later
> recover that information in the LLVM IR?

This is an interesting question. LLVM-IR does not have explicit loops, so attaching information to loops is a little tricky. You could add metadata to the loop header or the loop induction variable, but you need to be careful not to lose that information. Another option is to attach the information not to the loop, but to the individual statements in the loop. The best representation probably depends on what kind of information you would like to pass on.

Cheers
Tobi

Hi Tobi,

Thank you for your answer.
The information that I want to pass is just an unsigned integer number. The
idea of attaching the pragma to the statement sounds good to me; if it is
possible, I can attach it to the 'for' loop statement. But it is not
clear to me how to find that 'for' loop statement in clang CodeGen.

PS. Thank you also for your Polly project, it was very helpful for us.

Thanks again,
Tigran

> The information that I want to pass is just an unsigned integer number.

Sure, but what is the meaning of the number?

I ask because, depending on the meaning of the number, there may be different ways to represent it at the LLVM-IR level.

> The idea to attach the pragma to the statement sounds good for me, if it is
> possible then I can attach that to the 'for' loop statement. But it is not
> clear for me how to find that 'for' loop statement in clang CodeGen?

That's exactly the problem. There is no 'for' loop statement. LLVM-IR just knows about sequential basic blocks and goto statements which together form a control flow graph. The LoopInfo analysis can detect loops in this control flow graph, but especially after a couple of transformations, it is hard to understand if and how the detected loops relate to the loops in the original C program. Specifically, there is not a specific statement that is representative of a loop and that can be used to attach information about source level loop pragmas.
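
To make the point concrete, here is a small standalone sketch of how a loop has to be rediscovered from a control flow graph. It is a simplification and not how LoopInfo is implemented (LoopInfo works on the dominator tree); the CFG type and findBackEdges are hypothetical names for this example. It reports edges that jump back to a block still on the depth-first search path, which for a simple 'for' loop is the edge from the end of the body back to the header.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy control flow graph: block i may branch to every block in cfg[i].
using CFG = std::vector<std::vector<int>>;

// Depth-first search that records every edge pointing back to a block that is
// still on the current search path. state: 0 = unvisited, 1 = on path, 2 = done.
static void visit(const CFG &cfg, int block, std::vector<int> &state,
                  std::vector<std::pair<int, int>> &backEdges) {
  state[block] = 1;
  for (int next : cfg[block]) {
    if (state[next] == 1)
      backEdges.push_back(std::make_pair(block, next)); // latch -> header
    else if (state[next] == 0)
      visit(cfg, next, state, backEdges);
  }
  state[block] = 2;
}

// Block 0 is assumed to be the entry block.
std::vector<std::pair<int, int>> findBackEdges(const CFG &cfg) {
  std::vector<int> state(cfg.size(), 0);
  std::vector<std::pair<int, int>> backEdges;
  if (!cfg.empty())
    visit(cfg, 0, state, backEdges);
  return backEdges;
}
```

For a graph shaped like a 'for' loop (entry 0 -> header 1, header 1 -> body 2 or exit 3, body 2 -> header 1), the only back edge found is 2 -> 1. Nothing in the graph itself says which source-level loop this edge came from, which is why the pragma information has to be attached to some instruction that survives.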

Hence, depending on what kind of information is provided at the source level, there may be different ways to keep this information at IR-level.

Some examples:

#pragma safe-to-speculate
for (int i = 0; i < foo(i); i++)

The semantics of the safe-to-speculate pragma could be that if foo(i) is false, then foo(x) is false for all x > i. This means we can easily execute more iterations of the loop, as long as they are predicated by foo(x).
To represent such information, we could attach it to the instruction that checks the exit condition of the loop. As the pragma does not depend on the content of the loop, there is no need to relate it to the individual instructions in the loop. Even if loop invariant code motion or other transformations happen, they do not change the behavior of the exit condition.

#pragma minimal-num-iterations(100)
for (int i = b; i < foo(i); i++)

To store information about the minimal number of loop iterations, we could again attach information to the instruction that exits the loop, stating, for example, that the first 100 evaluations of it will yield true.

#pragma omp parallel
for (...

Correctly describing that a loop can be executed in parallel is a lot more difficult. Seemingly unrelated changes to the loop body can introduce new dependences that make parallel execution invalid. Loop invariant code motion or reg2mem can, for example, introduce such dependences. Hence, just attaching the pragma to the loop iv or the exiting condition is insufficient. At some point Hal posted a proposal for how to represent this. In general it is not easy, as we need to find a way that does not block valuable compiler transformations, but that at the same time ensures we detect when the semantics of the loop body have changed in a way that makes parallel execution invalid.

Hence, I believe it is important to decide on a case by case basis what information we want to forward to LLVM-IR and how to represent it.

Cheers
Tobi

Hi Tobi,

Thank you very much, your answer is very helpful.

Best Regards,
Tigran