Source rewrite

In an application that uses the clang libraries we need to transform some
specially written comments into C constructs.

e.g.:
int p() {
  int a = 3;
  /* #instrument(a)# */
}

should be parsed as
int p() {
  int a = 3;
  instrument(a);
}

I've added a CommentHandler to the Preprocessor to catch comments, and I
thought of using the Rewriter class to manipulate the input buffer by
inserting the text to be parsed after the comment. But then I realized
that the Rewriter class is not designed to rewrite lexer input buffers,
but to build a separate object where the original text and the changes
live together.

Now I'm a bit confused: what's the proper way to cope with the need
described above?

Hi Abramo,

The ASTContext class has a 'Comments' member that contains the SourceRanges for all comments in the source file.

This assumes you've told the preprocessor to keep comments.

This might not be the only/proper way, however I thought you should be aware of this...

snaroff

You actually don’t have to tell the preprocessor to keep comments; it keeps the source ranges for the comments regardless, and you can go back to the source to get the content of the comments.

  - Doug

I think I explained badly what we need: I have no problem getting the
comment content. My problem is translating, during parsing, the
comment content into other text to be lexed/parsed instead of (or just
after) the comment.

As I wrote above, what I need is that the AST built from:

int p() {
  int a = 3;
  /* #instrument(a)# */
}

is the same as if the source read were:

int p() {
  int a = 3;
  instrument(a);
}
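
To make the intended mapping concrete, here is a minimal sketch of the same transformation done as a plain text pre-pass over the source, before it is handed to any parser. This is not the in-lexer approach asked about in this thread, and the function name expandInstrumentComments is hypothetical, not part of clang. It also assumes the markers never occur inside string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a clang API): replaces every comment of the form
// "/* #TEXT# */" with "TEXT;" and copies all other text unchanged.
// Simplifying assumption: the markers never appear inside string literals.
std::string expandInstrumentComments(const std::string &src) {
  const std::string open = "/* #";
  const std::string close = "# */";
  std::string out;
  std::size_t pos = 0;
  for (;;) {
    std::size_t b = src.find(open, pos);
    if (b == std::string::npos)
      break;
    std::size_t bodyBegin = b + open.size();
    std::size_t e = src.find(close, bodyBegin);
    if (e == std::string::npos)
      break;  // unterminated marker: leave the rest of the text as it is
    out.append(src, pos, b - pos);              // text before the comment
    out.append(src, bodyBegin, e - bodyBegin);  // the comment body
    out += ';';
    pos = e + close.size();
  }
  assert(pos <= src.size());
  out.append(src, pos, std::string::npos);      // text after the last comment
  return out;
}
```

Applied to the first snippet above, this yields the second one. The drawback of a text pre-pass is that source locations in the resulting AST refer to the rewritten text rather than to the original file, which is one reason to prefer doing the substitution inside the preprocessor.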

Oh, interesting. You'll probably need to teach the preprocessor how to parse inside these comments. One option might be to treat such comments similarly to macro expansion, so that processing the comment

   /* #instrument(a)# */

consumes the comment and then pushes a new lexer that will point into a buffer containing

   instrument(a)

just like processing

   FOO

where there is a macro definition

   #define FOO instrument(a)

will create a new lexer pointing into a buffer containing

   instrument(a)

  - Doug
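
The idea of pushing a new lexer can be sketched with a toy model. None of the classes below are clang's; ToyLexer, ToyPreprocessor and lexAll are hypothetical names, tokens are plain strings, and the expansion text is hard-coded to the example from this thread. The only point is that a macro name and a special comment can be handled by the same mechanism: consume the token, push a lexer over the replacement text, and resume the outer lexer once the pushed one is exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model, not clang code: a "lexer" is a token list plus a cursor.
struct ToyLexer {
  std::vector<std::string> tokens;
  std::size_t next;
};

// Toy preprocessor that keeps a stack of active lexers.
class ToyPreprocessor {
public:
  explicit ToyPreprocessor(std::vector<std::string> mainFile) {
    stack_.push_back(ToyLexer{std::move(mainFile), 0});
  }

  // Returns the next token, or "" at end of input.
  std::string lex() {
    // Hard-coded replacement text, taken from the example in the thread.
    static const std::vector<std::string> expansion = {"instrument", "(", "a", ")"};
    while (!stack_.empty()) {
      ToyLexer &top = stack_.back();
      assert(top.next <= top.tokens.size());
      if (top.next == top.tokens.size()) {
        stack_.pop_back();  // exhausted: resume the lexer underneath
        continue;
      }
      std::string tok = top.tokens[top.next++];
      // The macro name and the special comment take the same path:
      // the token is consumed and a lexer over the replacement is pushed.
      if (tok == "FOO" || tok == "/* #instrument(a)# */") {
        stack_.push_back(ToyLexer{expansion, 0});
        continue;
      }
      return tok;
    }
    return "";
  }

private:
  std::vector<ToyLexer> stack_;
};

// Drains a preprocessor into a token list.
std::vector<std::string> lexAll(ToyPreprocessor pp) {
  std::vector<std::string> out;
  for (std::string t = pp.lex(); !t.empty(); t = pp.lex())
    out.push_back(t);
  return out;
}
```

In this model an input containing the comment and an input containing FOO in the same position produce identical token sequences, which is the property the suggestion relies on.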

Somehow, all semantic checks need to be performed and an 'invisible' AST generated and passed on so that the rewriter can do
the rewrite.

- Fariborz

Yes, this is what jumped in my mind after sending the original message.

Reading your message I now guess that nobody has attempted that
before... I think that I'll try this approach.

Thanks indeed for your help.

Right. What I've described will do the first part---allow parsing of the code within the comments to produce an AST---and the rewrite can handle the second part, rewriting the comment to something else (e.g., the text parsed within the comment).

  - Doug

I don't think I understand what you mean... I think that the rewriter
is not involved in any way.

Don't you want to replace the comment with instrument(a) in the rewritten source? If you just want to build ASTs then
you do not need the rewriter.

- Fariborz

I've done it and it "almost" works...

The problem I see is that the pushed TokenLexer is used only *after* the
first token following the comment has been lexed, and not immediately
after the skipped comment (because the comment is usually not returned
as a token).

This means that:

int x /* = 0 */;
int z;

is lexed as:

int x; = 0 int z;

instead of the intended way.

Now I'm stuck...

Is there a way to cope with that?

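My CommentHandler is written this way:

void Comment_Converter::HandleComment(clang::Preprocessor &PP,
                                      clang::SourceRange Comment) {
  const clang::SourceManager &sm = PP.getSourceManager();
  const clang::LangOptions &lo = PP.getLangOptions();
  clang::SourceLocation begin = Comment.getBegin();
  clang::FileID fid = sm.getFileID(begin);
  const char* start = sm.getCharacterData(begin);
  const char* end = sm.getCharacterData(Comment.getEnd());
  // Strip the delimiters: the trailing "*/" of a block comment and the
  // leading "/*" or "//".
  if (start[1] == '*')
    end -= 2;
  start += 2;
  // Temporarily NUL-terminate the comment body in place, because the raw
  // lexer expects a NUL at the end of its range. Restored below.
  char saved = *end;
  *const_cast<char*>(end) = 0;
  clang::Lexer lexer(sm.getLocForStartOfFile(fid), lo,
                     sm.getBufferData(fid).first,
                     start, end);
  // Static so that the tokens outlive this call: EnterTokenStream is told
  // below that it does not own them.
  static std::vector<clang::Token> tokens;
  tokens.clear();
  clang::Token tok;
  while (1) {
    lexer.LexFromRawLexer(tok);
    if (tok.is(clang::tok::eof))
      break;
    // Raw lexing does not resolve identifiers, so look them up by hand to
    // turn keywords into their proper token kinds.
    if (tok.is(clang::tok::identifier))
      tok.setKind(PP.LookUpIdentifierInfo(tok)->getTokenID());
    tokens.push_back(tok);
  }
  *const_cast<char*>(end) = saved;
  // Push the collected tokens so the preprocessor returns them next.
  PP.EnterTokenStream(tokens.data(), tokens.size(), false, false);
}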
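
The ordering problem described above can be reproduced in a toy model. This is not clang code: CommentPP and lexJoined are hypothetical names, tokens are plain strings, and comments are split on whitespace. With eager set to false the lexer behaves as reported, returning the next token of the main file before it looks at the pushed tokens. With eager set to true it re-checks the pushed tokens as soon as the handler returns, which is one possible way (an assumption, not something confirmed in this thread) to get the intended order.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy model, not clang code. Any entry starting with "/*" is a comment.
class CommentPP {
public:
  CommentPP(std::vector<std::string> file, bool eager)
      : file_(std::move(file)), pos_(0), eager_(eager) {}

  // Returns the next token, or "" at end of input.
  std::string lex() {
    for (;;) {
      if (!pushed_.empty()) {  // a pushed token stream has priority
        std::string t = pushed_.front();
        pushed_.pop_front();
        return t;
      }
      if (pos_ == file_.size())
        return "";
      std::string tok = file_[pos_++];
      if (!isComment(tok))
        return tok;
      handleComment(tok);  // may push tokens
      if (eager_)
        continue;  // re-check the pushed tokens before lexing any further
      // Deferred behaviour: the main lexer first returns its own next
      // token, so the pushed tokens are only seen on the following call.
      if (pos_ < file_.size() && !isComment(file_[pos_]))
        return file_[pos_++];
    }
  }

private:
  static bool isComment(const std::string &s) {
    return s.size() >= 4 && s.compare(0, 2, "/*") == 0;
  }

  // Strips "/*" and "*/" and pushes the whitespace-separated body.
  void handleComment(const std::string &c) {
    assert(isComment(c));
    std::istringstream body(c.substr(2, c.size() - 4));
    for (std::string t; body >> t;)
      pushed_.push_back(t);
  }

  std::vector<std::string> file_;
  std::size_t pos_;
  bool eager_;
  std::deque<std::string> pushed_;
};

// Lexes the whole input and joins the tokens with single spaces.
std::string lexJoined(const std::vector<std::string> &file, bool eager) {
  CommentPP pp(file, eager);
  std::string out;
  for (std::string t = pp.lex(); !t.empty(); t = pp.lex()) {
    if (!out.empty())
      out += ' ';
    out += t;
  }
  return out;
}
```

For the input int x /* = 0 */; int z; the deferred mode gives "int x ; = 0 int z ;" and the eager mode gives "int x = 0 ; int z ;".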

hi,

is there any simple example that shows how to rewrite the source
code that clang has just parsed?

best regards

--ether

The simplest rewriter that works on parsed ASTs is probably the blocks rewriter, in lib/Frontend/RewriteBlocks.cpp

  - Doug

The above source is obsolete. All rewriting code lives in lib/Frontend/RewriteObjC.cpp nowadays.

- Fariborz

Should we just remove it, then? Or does it still have value?

  - Doug

Steve introduced it. He told me it is no longer in use. Let's get his confirmation before we act on it.

- Fariborz

It is vestigial (used back when we were prototyping blocks/rewriting).

Please feel free to remove it...

snaroff