LLVM- AST contents display and dependency analysis

Hi All,

I tried to view the AST contents using the following command:

clang -Xclang -ast-dump -fsyntax-only loop.c

This gives me some AST output (I believe), but I am having two issues:

  1. I am not able to put this output in a file, as it shows the following error:

yaduveer@yaduveer-Inspiron-3542:~/RP$ clang -Xclang -ast-dump -fsyntax-only loop1d.c | llvm-dis -o ast.txt
llvm-dis: Invalid bitcode signature
clang: error: unable to execute command: Broken pipe
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 3.6.0 (trunk 225627) (llvm/trunk 225626)
Target: x86_64-unknown-linux-gnu
Thread model: posix
clang: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg:

LLVM’s IR is not the same thing as Clang’s AST. Do you want the AST, which has all the tree-structure and scopes of the original source program, or do you want LLVM IR, which is basic block soup? Alternatively, if you want to keep source level information you can try to use the static analysis CFG, which is layered on top of the Clang AST.

If you want the AST, you want to use libTooling or libclang to programmatically examine the AST. The -ast-dump flag will print the AST for debugging purposes, but it cannot be deserialized.

If you want a low-level view of the program, use ‘clang -emit-llvm -S’ to get LLVM IR instead of -ast-dump. This will be usable with other LLVM tools like opt.
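
As a minimal sketch of the two routes described above: the file below is a hypothetical stand-in for loop.c (the original file is not shown in the thread), and `sum_array` is an invented name. The commands in the header comment assume a stock clang on the PATH. The AST dump is plain text written to standard output, so ordinary shell redirection should be enough to save it; llvm-dis only reads LLVM bitcode.

```c
/* loop.c: a hypothetical stand-in for the file being inspected.
 *
 * Save the textual AST dump with plain shell redirection (no llvm-dis):
 *   clang -Xclang -ast-dump -fsyntax-only loop.c > ast.txt
 *
 * Emit textual LLVM IR that opt and other LLVM tools can read:
 *   clang -emit-llvm -S loop.c -o loop.ll
 */
#include <stddef.h>

/* Sums the first n elements of a. The for loop appears as a ForStmt
 * node in the AST dump and as a few basic blocks in the IR. */
long sum_array(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

Comparing ast.txt with loop.ll for a small file like this is a quick way to see the difference between the tree-shaped AST and the flattened IR.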

Hi Reid,

Thanks for the advice.
I am looking for a way to find where I can change the AST metadata so that the code runs in parallel (without using OpenMP). Which files should I touch to do that, and how should I proceed?

Hi Reid,

Apologies for the lack of context. I sent him here. :)

Basically, he wants something similar to OpenMP (parallel loops,
separate threads, multi-core CPUs) but without having to specify
pragmas and command line options.

My assumption is that, doing it at the IR level, it's already too
late, because there's a lot of OpenMP that is Clang-based. So, if he
could add a pass (possibly out-of-tree), that could identify loops and
mark them with OpenMP pragmas (as they are, when the pragmas exist in
the source), then he could benefit from OpenMP without requiring the
users to add the pragmas to their sources or add command-line options
to their Makefiles.

In theory, if the toolchain he distributes has lib*omp in the right
place, it should be completely transparent.

Are my assumptions on the right track?

cheers,
--renato

> My assumption is that, doing it at the IR level, it's already too
> late, because there's a lot of OpenMP that is Clang-based. So, if he
> could add a pass (possibly out-of-tree),

There isn't really a notion of "pass" on the clang AST; the result of
parsing is considered immutable.

-- Sean Silva

Right. What about the pre-processor?

I'm not an expert on OpenMP, but would OpenMP force illegal behaviour
with that pragma, or just bail if it could not prove legality?

If the latter, then just adding "#pragma omp parallel loop" to all
loops would work as he wants, no?

cheers,
--renato

> Apologies for the lack of context. I sent him here. :)

No problem. :)

> Are my assumptions on the right track?

I think you are on the right track, but this is going to be really hard due
to the current design of OpenMP in Clang. A lot of OpenMP processing
happens in Sema right now, like formation of CapturedStmts. There also
isn't any alias analysis at the AST level, so separating dependent
computations will be hard.
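
To illustrate the aliasing point, here is a minimal sketch with a hypothetical helper, `add_one`, which is not from Clang or the thread. Whether its loop iterations are independent depends on the pointers the caller passes in, which is not visible from the loop's own AST.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] = src[i] + 1 for i in [0, n).
 * The iterations are independent only if dst and src do not overlap. */
void add_one(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] + 1;
}
```

Called as `add_one(out, in, n)` with two separate arrays, every iteration is independent. Called as `add_one(buf + 1, buf, n)`, each iteration reads the value the previous one wrote, so running the iterations concurrently could change the result. Telling these two cases apart is the job of an alias analysis.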

Is this tool intended to run automatically, or as a programmer aid to
insert pragmas based on a potentially optimistic analysis? If it's a
programmer aid, I wonder if you could use the Clang CFG to do the
dependency analysis.

If you want it to be just as semantics preserving as a regular optimizer
pass, then you probably want to analyze the loop in IR, figure out where
the loop header came from in the source code, grovel around in the AST for
that source location (maybe metadata can help?), and do some kind of
rewrite and recompile.

Another thing you might be able to do is have Clang parse all loop bodies
as CapturedStmts (this will outline all loop bodies for easy
parallelization), and then parallelize loops later. You'll need a way of
re-inlining the loops you decide not to parallelize. OpenMP currently has
some logic for this in Clang IRGen, but this is probably too early for your
purposes.
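
For readers unfamiliar with outlining, the following is a conceptual sketch done by hand in C. It is not what Clang's CapturedStmt machinery literally generates; `scale_ctx`, `scale_body`, and `run_loop` are hypothetical names used only to show the shape of the transformation.

```c
#include <stddef.h>

/* Original form: the body lives inline in the loop. */
void scale_inline(float *a, size_t n, float k) {
    for (size_t i = 0; i < n; ++i)
        a[i] *= k;
}

/* Outlined form: the variables the body uses travel in a context
 * struct, and the body becomes a function of (context, index). */
struct scale_ctx {
    float *a;
    float k;
};

static void scale_body(void *ctx, size_t i) {
    struct scale_ctx *c = ctx;
    c->a[i] *= c->k;
}

/* Hypothetical serial driver. A parallel runtime could instead hand
 * different ranges of i to different threads. */
static void run_loop(void (*body)(void *, size_t), void *ctx, size_t n) {
    for (size_t i = 0; i < n; ++i)
        body(ctx, i);
}

void scale_outlined(float *a, size_t n, float k) {
    struct scale_ctx c = { a, k };
    run_loop(scale_body, &c, n);
}
```

Re-inlining, in these terms, means turning `scale_outlined` back into `scale_inline` for the loops that end up not being parallelized, so that they do not pay for the indirect call.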

You might *also* be able to do the outlining completely in LLVM IR time
with LoopExtractor, but I assume there's a reason that OpenMP isn't using
that code.

Hope that helps.

> I think you are on the right track, but this is going to be really hard due
> to the current design of OpenMP in Clang. A lot of OpenMP processing happens
> in Sema right now, like formation of CapturedStmts. There also isn't any
> alias analysis at the AST level, so separating dependent computations will
> be hard.

I was assuming if the user puts a pragma on a loop and the loop isn't
safe, OpenMP wouldn't try to parallelise it.

If that's the case, just putting pragmas on *every* loop would make
Clang slower, as OpenMP would be working a lot harder on all loops for
very little extra gain, but still safe.
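
A note on that assumption, with a small sketch: as far as the OpenMP specification goes, a conforming implementation is not required to check for data dependencies or races, so the pragma acts as a promise from the programmer rather than a request the compiler may decline. The two functions below are hypothetical examples. The code uses the long-standing `parallel for` spelling, and without -fopenmp the pragma is simply ignored and both loops run serially.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i], so the pragma
 * is a safe promise here. */
void double_all(int *a, const int *b, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] = 2 * b[i];
}

/* Loop-carried dependence: a[i] needs the a[i - 1] produced by the
 * previous iteration. Adding "#pragma omp parallel for" here would
 * introduce a data race; nothing guarantees a diagnostic for it. */
void prefix_sum(int *a, size_t n) {
    for (size_t i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```

So annotating every loop blindly would not just cost compile time: loops like `prefix_sum` could silently produce wrong results.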

> Is this tool intended to run automatically, or as a programmer aid to
> insert pragmas based on a potentially optimistic analysis? If it's a
> programmer aid, I wonder if you could use the Clang CFG to do the
> dependency analysis.

I have to say, I don't know exactly what Yaduveer is trying to do. But
I think he now has a lot of options to try out.

I just don't think that any automated use of OpenMP without
command-line flags (one of his requirements) will make it upstream
any time soon, so the change has to be as minimally invasive as
possible for him to keep his sanity while merging with future versions
of clang.

cheers,
--renato

> There isn't really a notion of "pass" on the clang AST; the result of
> parsing is considered immutable.

> Right. What about the pre-processor?
>
> I'm not an expert on OpenMP, but would OpenMP force illegal behaviour
> with that pragma, or just bail if it could not prove legality?
>
> If the latter, then just adding "#pragma omp parallel loop" to all
> loops would work as he wants, no?

I'm not sure what you mean. To put it another way, the only officially
supported way to programmatically add "#pragma omp parallel loop" to a loop
body is to textually rewrite the source code and re-parse.

Of course, all of clang's source code is there for you to look at, so for
any given construct you can theoretically just imitate what is happening
inside clang, but it is extremely error-prone because you need to ensure
that all of the AST's invariants are maintained, with anything ranging from
a crash to silent miscompilation if you don't. These invariants are not
documented and probably not all consciously known.

-- Sean Silva

This brings back memories of EDG...

I'm beginning to think that it'd be easier to create a Perl script to
annotate the source code before compilation and always add the OpenMP
command line options, instead of doing anything in Clang/LLVM itself.
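
A rough sketch of what such a textual annotator might look like, written in C rather than Perl to match the loop.c example in this thread. The function names are hypothetical, the pragma uses the `parallel for` spelling, and input lines are assumed to be shorter than the 4096-byte buffer. It is deliberately naive: it knows nothing about comments, string literals, macros, or whether a given loop is actually safe to parallelize.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line, after leading blanks, starts a for loop. */
static int starts_for_loop(const char *line) {
    line += strspn(line, " \t");
    if (strncmp(line, "for", 3) != 0)
        return 0;
    line += 3;
    line += strspn(line, " \t");
    return *line == '(';
}

/* Copies in to out, emitting the pragma (with the loop's own
 * indentation) before every line that starts a for loop. */
void annotate_loops(FILE *in, FILE *out) {
    char line[4096];
    while (fgets(line, sizeof line, in)) {
        if (starts_for_loop(line)) {
            size_t indent = strspn(line, " \t");
            fprintf(out, "%.*s#pragma omp parallel for\n",
                    (int)indent, line);
        }
        fputs(line, out);
    }
}
```

Calling `annotate_loops(stdin, stdout)` from a small `main` would turn it into a filter whose output can then be compiled with `clang -fopenmp`.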

cheers,
--renato

Hi All,

Thank you very much for your advice and suggestions. I think my questions have been answered, so I am closing the thread now.

Once again, thank you very much for your time.