code conversion challenge

Hi there.

Here's the problem:
Given the source file with this content:

   int main(int argc, char * argv[])
   {
     /* all ok */
     return 0;
   }

I want to convert it into something like this:

   module
   ( function
     ( Name("main")
     , Returns("int")
     , Parameters
       ( Parameter(Type("int"), Name("argc")
       , Parameter(Type(Array(Pointer("char"), Size()), Name("argv")
       )
     , Body
       ( Comment("all ok")
       , Return(Int(0i32))
       )
     )
   )

This is a description format that has a binary representation that allows for
easy depth-first and breadth-first traversal.

With it one can describe C/C++, make files, pre-processor macros etc. - the
reader supplies the meaning to the "calls" like "module".
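
To make the idea of "calls" and the two traversal orders concrete, here is a
minimal C++ sketch. The Node type and the leaf, call, depthFirst and
breadthFirst helpers are hypothetical names invented for illustration. They are
not the actual library API, and this in-memory layout says nothing about the
real binary representation.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: a call such as Name("main") is a label plus child
// nodes; a symbol or literal is simply a node without children.
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Convenience constructors (assumed helpers, not part of any real API).
Node leaf(const std::string& text) { return Node{text, {}}; }
Node call(const std::string& label, std::vector<Node> kids) {
    return Node{label, std::move(kids)};
}

// Depth-first (pre-order): visit a node, then each subtree in turn.
void depthFirst(const Node& n, std::vector<std::string>& out) {
    out.push_back(n.label);
    for (const Node& c : n.children) depthFirst(c, out);
}

// Breadth-first: visit every node of one level before descending.
std::vector<std::string> breadthFirst(const Node& root) {
    std::vector<std::string> out;
    std::queue<const Node*> pending;
    pending.push(&root);
    while (!pending.empty()) {
        const Node* n = pending.front();
        pending.pop();
        out.push_back(n->label);
        for (const Node& c : n->children) pending.push(&c);
    }
    return out;
}

// A cut-down version of the module/function tree from the example above.
Node sampleModule() {
    return call("module", {
        call("function", {
            call("Name", {leaf("main")}),
            call("Returns", {leaf("int")}),
        }),
    });
}
```

Calling depthFirst on sampleModule() reaches "main" before "Returns", while
breadthFirst lists both "Name" and "Returns" before either of their arguments.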

With it I hope to be able to describe things like interfaces and be able to
automate the glue that allows it to be called from scripting languages,
and much more.

I haven't even given this format a name, but I can convert the text above to
and from the binary representation.

So that's the challenge - any takers?

Regards,
Philip Ashmore

OK, maybe not this exact example - the parameters are missing ')', but you get the idea.

Philip

It looks like what you want to do is to run a RecursiveASTVisitor over the AST and essentially cherry-pick certain information off of it. It may be a lot of work, but I think you could do it.

Oh, except for the comment. That would be much more difficult. In your example you seem to be including the comment as though it were a statement in the body. How would your representation (which reminds me of Prolog btw) represent:

   int main(int argc, char * argv[])
   {
     return /* all ok */ 0;
   }

or

   int main(int argc, char * argv[])
   {
     doSomethingWithALotOfArgs(argv[0], argv, argv+argc, /*verbose=*/false);
     return 0;
   }

Code like that last example is extremely common.

For a more pathological example, consider

   #define X(a,b) a##b

   int X(ma,/*pure evil*/in)(int argc, char * argv[])
   {
     doSomethingWithALotOfArgs(argv[0], argv, argv+argc, /*verbose=*/false);
     return 0;
   }

The conclusion is that to actually be useful, your representation would want to make certain things “off limits” or purposefully not representable. It’s up to you to draw the line. Once you have that line, you can then get what you want in a pretty straightforward way from clang.
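
The pathological case can be reproduced in a few lines. This sketch relies only on standard preprocessor behaviour: comments are replaced by whitespace before macros are expanded, so the comment never reaches X, and ## then pastes the two remaining tokens into one identifier. The names answer and callAnswer are made up for the example.

```cpp
// Token pasting: X(a, b) glues its two arguments into a single identifier.
#define X(a, b) a##b

// The comment is replaced by whitespace before macro expansion, so this
// line declares a function named `answer`. Neither the macro call nor the
// comment exists any more by the time a parser builds the AST.
int X(answ, /*pure evil*/ er)() { return 42; }

// Calls the pasted name directly to show that it really was declared.
int callAnswer() { return answer(); }
```

A representation that wants to keep both the comment and the unexpanded macro call has to record them before this stage, which is why it matters where the line is drawn.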

–Sean Silva

> It looks like what you want to do is to run a RecursiveASTVisitor over the AST and essentially cherry-pick certain information off of it. It may be a lot of work, but I think you could do it.
>
> Oh, except for the comment. That would be *much* more difficult. In your example you seem to be including the comment as though it were a statement in the body. How would your representation (which reminds me of Prolog btw) represent:
>
>   int main(int argc, char * argv[])
>   {
>     return /* all ok */ 0;
>   }

   Return(Comment("all ok"), Int(0i32))

You could also add File("myfile.sbt"), Line(22) and Column(32) anywhere to track the source file.
It all comes down to what you want to process and how.

> or
>
>   int main(int argc, char * argv[])
>   {
>     doSomethingWithALotOfArgs(argv[0], argv, argv+argc, /*verbose=*/false);
>     return 0;
>   }
>
> Code like that last example is extremely common.
>
> For a more pathological example, consider
>
>   #define X(a,b) a##b
>   int X(ma,/*pure evil*/in)(int argc, char * argv[])
>   {
>     doSomethingWithALotOfArgs(argv[0], argv, argv+argc, /*verbose=*/false);
>     return 0;
>   }

   , Macro
      ( Name(X)
      , Parameters(a, b)
      , Body
        ( Return(Concat(a, b))
        )
      )
   , Function
      ( Name(X(ma, Comment("pure evil"), in))
      , Body
        ( Call(doSomethingWithALotOfArgs, Index(argv, 0), argv, Add(argv, argc), Comment("verbose"), Bool(false))

My parser doesn't distinguish between "built-in" symbols and those used in the code.

> The conclusion is that to actually be useful, your representation would want to make certain things "off limits" or purposefully not representable. It's up to you to draw the line. Once you have that line, you can then get what you want in a pretty straightforward way from clang.

I think the binary representation would be really useful as a pre-compiled
header format where even macro expansion is deferred.

I forgot to mention that the format is in-place-editable, and with a
snapshotting filesystem (e.g. FUSE) you could efficiently modify it in place
for one source file, make another snapshot and edit that, and then throw the
snapshots away.

It's going to be part of my v3c-storyboard SourceForge project, and being able
to process C/C++ into this format would be a big plus.

Things like extracting function prototypes, automatically determining the
required include files, and source translation all become a lot easier this
way, as the library has a ridiculously simple C/C++ API - it's all about calls,
symbols and literals.

Having done a few real world C++ code transformations recently, I
don't buy that a stripped down format will help a lot. Most of the
things you propose would need very C++ specific implementations - why
not just write tools against the clang AST for them?

Cheers,
/Manuel

> Having done a few real world C++ code transformations recently, I
> don't buy that a stripped down format will help a lot. Most of the

The format is minimal, but there are no limits on what it can represent.
I had in mind using it from a (graphical) user interface and being able to
drill down into the structure representation to essentially "draw" the
required operation.

Given the questions I see regularly on cfe-dev from users, such an intuitive
tool could prove to be very popular.

> things you propose would need very C++ specific implementations - why
> not just write tools against the clang AST for them?

The problem with the AST is that the macros are already expanded.
I'd like to let the user try out different macro definitions and see how they
affect the expanded macro and the generated AST, interactively.

_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Regards,
Philip Ashmore

> The problem with AST is that the macros are already expanded.
> I'd like to let the user try out different macro definitions and see how
> it affects the expanded macro and the generated AST, interactively.

Wouldn't you need to reparse for that? In C++ things can get
significantly different meanings from a few changed tokens in a macro.
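
A small illustration of that point: the two macros below differ only by a few
parentheses, yet the same call site parses into different expression trees.
The macro and function names are invented for this sketch.

```cpp
// Two definitions that differ only in a few tokens.
#define SQUARE_BARE(x) x * x
#define SQUARE_PAREN(x) ((x) * (x))

// Expands to 1 + 2 * 1 + 2: the multiplication binds tighter than the
// additions, so the tree is (1 + (2 * 1)) + 2.
constexpr int bare() { return SQUARE_BARE(1 + 2); }

// Expands to ((1 + 2) * (1 + 2)): one multiplication of two sums.
constexpr int paren() { return SQUARE_PAREN(1 + 2); }

static_assert(bare() == 5, "parsed as 1 + 2 * 1 + 2");
static_assert(paren() == 9, "parsed as (1 + 2) * (1 + 2)");
```

The argument tokens are identical in both cases; only a reparse after
expansion reveals which tree the compiler actually sees.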

Cheers,
/Manuel