Conditional Predicate Extraction

Hi all,
I am working on extracting conditional predicates from basic blocks using LLVM.

I have written a pass that extracts the conditional and unconditional branch instructions of each basic block, and a separate tool that extracts the control flow graph at basic-block granularity. Now I need to extract, from the IR, the conditions that cause the branching.
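A minimal sketch of that step, assuming the pass prints each block's terminator as a line of textual IR (e.g. `br i1 %cmp, label %then, label %else`). It works on that text rather than on the LLVM C++ API, and `Edge` and `labelEdges` are hypothetical names. Each CFG edge is annotated with the branch condition (or its negation), which is the predicate that later has to be rewritten in source-level terms. Only `br` is handled; `switch`, `invoke` and other terminators yield no edges here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One CFG edge, annotated with the predicate under which it is taken.
struct Edge {
    std::string from;
    std::string to;
    std::string predicate;  // "true" for an unconditional branch
};

// Remove a trailing ',' (operand separator in IR text).
static std::string dropComma(std::string tok) {
    if (!tok.empty() && tok.back() == ',') tok.pop_back();
    return tok;
}

// Label operands are written "%name"; keep just the name.
static std::string labelName(const std::string& tok) {
    return (!tok.empty() && tok[0] == '%') ? tok.substr(1) : tok;
}

// Hypothetical helper: turn a textual terminator of block `block` into edges.
//   "br i1 %cmp, label %then, label %else" -> then [%cmp], else [!%cmp]
//   "br label %exit"                       -> exit [true]
// Anything else (ret, switch, ...) produces no edges in this sketch.
std::vector<Edge> labelEdges(const std::string& block, const std::string& terminator) {
    std::istringstream in(terminator);
    std::vector<std::string> toks;
    for (std::string t; in >> t;) toks.push_back(dropComma(t));

    std::vector<Edge> edges;
    if (toks.size() == 3 && toks[0] == "br" && toks[1] == "label") {
        edges.push_back({block, labelName(toks[2]), "true"});
    } else if (toks.size() == 7 && toks[0] == "br" && toks[1] == "i1" &&
               toks[3] == "label" && toks[5] == "label") {
        const std::string cond = toks[2];  // still an LLVM temporary, e.g. "%cmp"
        edges.push_back({block, labelName(toks[4]), cond});
        edges.push_back({block, labelName(toks[6]), "!" + cond});
    }
    return edges;
}
```

For example, `labelEdges("entry", "br i1 %cmp, label %then, label %else")` gives the edges `entry -> then` under `%cmp` and `entry -> else` under `!%cmp`.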

These predicates are expressed in terms of LLVM-specific temporary variables. To keep the behaviour consistent when this information is used with another simulator, it would be better if the predicates had a syntax very similar (or identical) to that of the high-level language.

The questions are:

  1. Is it possible to somehow intersperse the IR with the high-level source code of the program (C or C++)?
  2. Is it possible to apply some optimizations in llvm-gcc so that the generated IR is very close to the high-level language, without affecting the branch information?

Suggestions and answers are welcome.



You can probably use the debug information (emitted by llvm-gcc in intrinsic
form; see MachineModuleInfo.h) to identify the source line of a condition
and the relation between the temporary LLVM variables and the high-level
language variables.

Perhaps it isn't very reliable, but I don't see another solution with LLVM.
(But I am just a novice.)

If you are only doing flow graph analysis, you could consider doing it at the
C/C++ level. Clang (which currently supports only most of C) has a small
framework for this kind of analysis, and it will probably be extended.

Just my 2 cents.


Thanks, Cedric,

I will look into how the debug information could be used.

However, it seems I would have to write code that successively reduces the conditional predicate to an expression in terms of the program's local and global variables. I plan to do it as follows.

Since I am using a runOnBasicBlock pass, which runs on one basic block at a time, context information cannot be shared between two basic blocks. The idea is therefore to print the output of the pass into a database (a text file). This database is then parsed to populate a prefix tree whose nodes are the temporary variables and LLVM opcodes. The tree is then traversed to eliminate the temporary variables and replace the opcodes with a C-like expression (infix arithmetic notation).
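The plan above can be sketched as follows, under several assumptions. The record format (`%t3 = add %t1 1`, `%t4 = icmp slt %t3 %t2`, `%t1 = load i`) is a hypothetical simplification of what the pass might print (no types, no commas), and `PredicateBuilder` is an illustrative name. It builds an ordinary expression tree (the post calls it a prefix tree), substitutes each temporary by the subtree that defines it, and prints the result in infix form. Records must arrive in definition order, which SSA guarantees within a block. Only binary opcodes, `load` and signed `icmp` predicates are mapped to C operators; unmapped opcodes are printed as-is.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Expression tree node: a leaf (source variable or constant) or a C operator.
struct Node {
    std::string op;                  // C operator; empty for a leaf
    std::string leaf;                // variable name or constant (leaves only)
    std::shared_ptr<Node> lhs, rhs;  // operands (operators only)
};
using NodePtr = std::shared_ptr<Node>;

// Map an opcode (with its icmp predicate) to a C operator; unknown ones are kept.
static std::string toC(const std::string& op) {
    static const std::map<std::string, std::string> table = {
        {"add", "+"}, {"sub", "-"}, {"mul", "*"}, {"sdiv", "/"},
        {"icmp eq", "=="}, {"icmp ne", "!="}, {"icmp slt", "<"},
        {"icmp sle", "<="}, {"icmp sgt", ">"}, {"icmp sge", ">="}};
    auto it = table.find(op);
    return it == table.end() ? op : it->second;
}

static NodePtr makeLeaf(const std::string& s) {
    auto n = std::make_shared<Node>();
    n->leaf = s;
    return n;
}

class PredicateBuilder {
public:
    // Consume one hypothetical dump record, e.g. "%t = load x",
    // "%t = add %a 1" or "%t = icmp slt %a %b".
    void addRecord(const std::string& line) {
        std::istringstream in(line);
        std::vector<std::string> tok;
        for (std::string t; in >> t;) tok.push_back(t);
        assert(tok.size() >= 4 && tok[1] == "=");
        if (tok[2] == "load") {           // temporary stands for a source variable
            defs_[tok[0]] = makeLeaf(tok[3]);
            return;
        }
        std::string op = tok[2];
        std::size_t first = 3;            // index of the first operand
        if (op == "icmp") {               // predicate is a separate token
            op += " " + tok[3];
            first = 4;
        }
        assert(tok.size() == first + 2);
        auto n = std::make_shared<Node>();
        n->op = toC(op);
        n->lhs = operand(tok[first]);
        n->rhs = operand(tok[first + 1]);
        defs_[tok[0]] = n;
    }

    // Render the expression defining temporary `name` in C-like infix form.
    std::string infix(const std::string& name) const { return render(operand(name)); }

private:
    std::map<std::string, NodePtr> defs_;

    // Known temporaries expand to their subtree; anything else becomes a leaf.
    NodePtr operand(const std::string& s) const {
        auto it = defs_.find(s);
        return it != defs_.end() ? it->second : makeLeaf(s);
    }

    static std::string render(const NodePtr& n) {
        if (n->op.empty()) return n->leaf;
        return "(" + render(n->lhs) + " " + n->op + " " + render(n->rhs) + ")";
    }
};
```

Feeding the records `%t1 = load i`, `%t2 = load n`, `%t3 = add %t1 1` and `%t4 = icmp slt %t3 %t2`, then calling `infix("%t4")`, yields `((i + 1) < n)`. Full parenthesization avoids having to reason about C operator precedence.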

Since the local and global variables are loaded into temporary variables at the beginning of each basic block, the resulting expression should be in terms of these local and global variables, because the temporary variables will already have been consumed by the process described above.
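That assumption can be checked mechanically. A value that is not defined by a load or arithmetic in the records seen so far (for instance one coming from a `phi` node, or computed in another block) stays an LLVM temporary after the rewrite. The hypothetical helper `unresolvedTemporaries` below, a sketch that assumes the infix output puts spaces around operators, scans a rendered predicate for leftover `%name` tokens so such cases can be flagged instead of silently exported. A `%` followed by a space is treated as the C modulo operator, not as a temporary.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect LLVM temporaries ("%name" tokens) that survive
// in a rendered predicate, i.e. values the rewrite could not resolve.
std::vector<std::string> unresolvedTemporaries(const std::string& expr) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < expr.size(); ++i) {
        if (expr[i] != '%') continue;
        std::size_t j = i + 1;
        // LLVM local names may contain letters, digits, '_' and '.'.
        while (j < expr.size() &&
               (std::isalnum(static_cast<unsigned char>(expr[j])) ||
                expr[j] == '_' || expr[j] == '.'))
            ++j;
        if (j > i + 1) out.push_back(expr.substr(i, j - i));  // bare '%' is modulo
        i = j - 1;
    }
    return out;
}
```

An empty result means the predicate is expressed purely in source variables and constants; otherwise the listed temporaries need another source of information, such as the debug info suggested earlier in the thread.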

How does this sound? Is it possible for llvm-gcc to emit bitcode in a form that is close to the source language (at least the variable names)? Are there any optimizations that would help?

Thanks a lot