Semantic Analysis in Clang

Hi,
   I am looking to perform some semantic analysis in clang. More
specifically, I want to know whether a function exists for a certain type.
The function is global. e.g.

String toString(A a);

    Whenever I encounter a type, I want to determine whether this function
exists for it (here it exists for type A and not for type B). I have built
an AST and am parsing it. How can I achieve this?

Regards,
Adil

I'd loop over all declarations of toString and see which type they take...
Perhaps I'm misunderstanding your problem though :)

Cheers,
/Manuel

Thanks for the reply. Well, here’s the detailed problem. Once all the syntax checking has been done, the next step before generating IR is to do semantic analysis and type checking. Let us say that I encounter code like this:

string b;
vector<T> a;
cout<<b;
cout<<a;

How does clang figure out that the 3rd statement is valid because an operator for string exists, while the 4th statement is not valid? More specifically, I want to know how clang searches through all the operators (or functions). I have to use this functionality. Does the clang API allow me to do this easily, or will I have to replicate this functionality?

Regards,
Adil

Search for 'overload resolution'.

Dmitri

As far as I know the clang API does not allow you to do that easily - you
need the full semantic analysis state at that point during parsing, and as
far as I'm aware this only exists implicitly in the Sema* classes. Overload
resolution is one of those really complex and messy parts of C++ :)

If you let us know what actual problem you're trying to solve, there might
be solutions to that which are simpler than using overload resolution :)

Cheers,
/Manuel

I need to insert some code in the file being parsed. I need to make sure that the resulting file compiles fine. The code that I am inserting will mostly be "ostream << type_x;". Now before I do that, I need to ensure that "type_x" has a stream operator defined. Is that possible any other way? I am very grateful for your help.

Regards,
Adil
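
One possible way to sidestep the question, sketched here as an assumption rather than anything proposed in this thread, is to let the compiler's own overload resolution decide at the insertion site. The inserted code calls a small helper instead of writing "ostream << type_x;" directly, and the helper falls back to a placeholder when no stream operator is found. The names is_streamable, emit and describe are hypothetical and are not part of Clang or the standard library.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: false unless `std::ostream& << const T&` is well-formed.
template <typename T, typename = void>
struct is_streamable : std::false_type {};

// This specialization is only viable when the expression inside decltype
// compiles; otherwise substitution fails and the primary template is used.
template <typename T>
struct is_streamable<
    T, decltype(void(std::declval<std::ostream&>() << std::declval<const T&>()))>
    : std::true_type {};

// Chosen when a stream operator exists for T.
template <typename T>
void emit_impl(std::ostream& os, const T& value, std::true_type) {
  os << value;
}

// Chosen when no stream operator exists; writes a placeholder instead.
template <typename T>
void emit_impl(std::ostream& os, const T&, std::false_type) {
  os << "<unprintable>";
}

// Hypothetical helper that a rewriting tool could insert calls to.
template <typename T>
void emit(std::ostream& os, const T& value) {
  emit_impl(os, value, is_streamable<T>{});
}

// Convenience wrapper that returns the emitted text as a string.
template <typename T>
std::string describe(const T& value) {
  std::ostringstream os;
  emit(os, value);
  return os.str();
}
```

With this in place, emit(std::cout, b) prints a std::string b as usual, while emit(std::cout, a) for a std::vector<int> a still compiles and prints the placeholder. The trade-off is that the rewritten file has to include the helper, and the tool itself still does not know ahead of time which branch will be taken.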

What we do in the c++11 transition tools in clang-extra is:
- parse
- detect place to insert
- insert
- reparse, look for errors

Whether that's viable depends on how big the chance is that you're making
an error, and what fallbacks you have.

Cheers,
/Manuel

Let us say that I write some code which has some errors. Now I am reparsing it. Can I halt the parsing when I encounter an error so that I can remedy it? If so, what happens to the AST? I mean, does it break at that point or does it contain the rest of the code too with some kind of substitution for the error part?

No, if you reparse it and it breaks, all you know is that what you wanted
to do was incorrect... As I mentioned, whether this is a practical approach
depends on what your fallback strategy is - for example, if you don't want
to do anything in case of an error, simply not saving after getting an
error in the reparsing would solve the problem.

That said, it would be really cool to have full access to the lookup after
the semantic analysis - so if you're interested in tackling this, I'd
expect that you'd make a lot of people very happy :D

Cheers,
/Manuel

Hi,
I am interested in helping any way I can. I am a research assistant and always looking to help the community. Unfortunately, I am new to compiler theory. If you can give me a few guidelines on how to achieve that, I would be glad to help. Also, can you please name a few good resources on compiler theory (my university isn’t focusing on compilers, so there are not many people here to help)?

Regarding clang, what exactly happens when clang encounters an error? (I mean, I want to pause the recursive visitor at the point where it encounters an error, not end the program.) Also, is it possible to parse the code in the rewriter buffer without saving it (parse on the fly)?

Regards,
Adil

Hi,
  I am interested in helping any way I can. I am a research assistant and
always looking to help the community. Unfortunately, I am new to compiler
theory. If you can give me a few guidelines on how to achieve that, I would
be glad to help. Also, can you please name a few good resources on compiler
theory (my university isn't focusing on compilers so not many people here
to help).

+dgregor, who would be the right person to guide you here (or delegate said
guidance to the right person).

   Regarding clang, what exactly happens when clang encounters an error (I
mean I want to pause the recursive visitor at the point when it encounters
an error and not end the program). Also, is it possible to

The RecursiveASTVisitor runs *after* clang finished parsing the program.
Thus, I would think what you want to do is pretty much impossible.

parse the code in rewriter buffer without saving it (parse on the fly).

Yes, that's definitely possible. You can look at the examples in the c++11
migration tool in the clang-extra-tools repository (
http://llvm.org/viewvc/llvm-project/clang-tools-extra/trunk/cpp11-migrate/)

Cheers,
/Manuel

Thanks a lot. I will report back any results. In case you are interested, I am using clang to write a provenance collection tool. Basically, clang is used to rewrite a program such that it emits provenance information at runtime. The literal meaning of provenance is “origin”. The resulting code after passing through my tool will report all function calls, their arguments and return values. It will be able to generate a complete runtime control flow graph. This project is in collaboration with the Stanford Research Institute, and I would be very happy if you would consider making it part of the clang extras (once it is done).
Any guidance on compiler theory would be much appreciated. Both my supervising professors are experts in information theory and security, so it’s up to me to figure out all the things related to compilers.

Regards,
Adil