Source code documentation

Hello,

I have been watching the Clang project for a while, as I'm interested in using it for my own project. I'm developing Synopsis (http://synopsis.fresco.org), which started as a multi-language code documentation tool but has since become quite a bit more powerful.

Synopsis is a very modular tool, written in Python, which loads different parsers (Python, IDL, C, C++) to generate an internal representation (parse tree, semantic graph, etc.). That representation is then transformed or processed further, for example into API documentation or even new source code.

I'd be very interested in using Clang as a parser backend for C and C++, and possibly even more for AST transformations (such as code generation).

I read on http://clang.llvm.org/OpenProjects.html that there are plans to write a code documentation tool based on Clang, so I'd like to know whether any such work has already started, so as to avoid duplicating effort.

Also, would anybody be interested in bindings between Synopsis and Clang, even to the point of - gasp - helping? :slight_smile:

Thanks,
         Stefan

I'd be very interested in using CLang as parser backend for C and C++,
and possibly even more for AST transformations (such as code generation).

That would be *great*.

I read on http://clang.llvm.org/OpenProjects.html that there are plans
to write a code documentation tool based on CLang, so I'd like to know
whether any such work has already started, so as to avoid duplicating
effort.

No, there hasn't been any work in this area. It's a long-standing wish.

Also, would anybody be interested in bindings between Synopsis and
CLang, even to the point of - gasp - helping ? :slight_smile:

You should check out the Python bindings we have for the "CIndex" library. They'll obviously need extensions to capture enough of the AST for C++, but that's in the grand plan anyway: to provide a stable interface to explore (but not transform or modify) Clang's AST. If a documentation tool like Synopsis can't use the CIndex library for some reason, CIndex should be extended.

  - Doug

I'd be very interested in using CLang as parser backend for C and C++,
and possibly even more for AST transformations (such as code generation).
     

That would be *great*.
   
Glad to hear that you agree. :slight_smile:

Is LLVM participating in GSoC this year? If so, could we formulate a project that helps with this (quite substantial) work?

I read on http://clang.llvm.org/OpenProjects.html that there are plans
to write a code documentation tool based on CLang, so I'd like to know
whether any such work has already started, so as to avoid duplicating
effort.
     

No, there hasn't been any work in this area. It's a long-standing wish.
   
OK.

Also, would anybody be interested in bindings between Synopsis and
CLang, even to the point of - gasp - helping ? :slight_smile:
     
You should check out the Python bindings we have for the "CIndex" library. They'll obviously need extensions to capture enough of the AST for C++, but that's in the grand plan anyway: to provide a stable interface to explore (but not transform or modify) Clang's AST. If a documentation tool like Synopsis can't use the CIndex library for some reason, CIndex should be extended.
   
OK, I will have a look. Given that Synopsis has its own representation (an ASG), I think a first step would be to translate the CIndex-based representation produced by Clang into an ASG, so as not to disrupt too much at once.
Then we can look into the two representations to see whether a copy / translation can be avoided without breaking other features (such as Synopsis' support for other languages).
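
To make that first step a bit more concrete, here is a rough sketch of what such a translation pass might look like. It does not use the real CIndex bindings or the real Synopsis ASG classes: `CursorNode`, `AsgDeclaration`, `KIND_TO_ASG` and `translate` are all hypothetical stand-ins, and the kind names are only illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CursorNode:
    """Hypothetical stand-in for a node handed back by the parser backend."""
    kind: str   # illustrative kind names, e.g. "NAMESPACE", "CLASS_DECL"
    name: str
    children: List["CursorNode"] = field(default_factory=list)


@dataclass
class AsgDeclaration:
    """Hypothetical, much simplified ASG declaration (not Synopsis' real class)."""
    type: str
    qualified_name: Tuple[str, ...]
    declarations: List["AsgDeclaration"] = field(default_factory=list)


# Assumed mapping from backend node kinds to ASG declaration types.
KIND_TO_ASG = {
    "NAMESPACE": "namespace",
    "CLASS_DECL": "class",
    "FUNCTION_DECL": "function",
}


def translate(node: CursorNode,
              scope: Tuple[str, ...] = ()) -> Optional[AsgDeclaration]:
    """Copy a backend tree into ASG declarations, skipping unmapped kinds."""
    asg_type = KIND_TO_ASG.get(node.kind)
    if asg_type is None:
        return None  # node kinds the ASG does not model are dropped
    qualified_name = scope + (node.name,)
    declaration = AsgDeclaration(asg_type, qualified_name)
    for child in node.children:
        translated = translate(child, qualified_name)
        if translated is not None:
            declaration.declarations.append(translated)
    return declaration
```

Keeping the translation in one place like this would leave the rest of Synopsis (and the other language parsers) untouched while the two representations are compared.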

Thanks,
         Stefan

Hello, Stefan

Glad to hear that you agree. :slight_smile:
Does LLVM participate in GSoC this year ? If so, could we formulate a
project that helps with this (quite substantial) work ?

Yes and yes :slight_smile:

I'd be very interested in using CLang as parser backend for C and C++,
and possibly even more for AST transformations (such as code generation).
    

That would be *great*.
  
Glad to hear that you agree. :slight_smile:

Does LLVM participate in GSoC this year ? If so, could we formulate a project that helps with this (quite substantial) work ?

Yes and yes!

Also, would anybody be interested in bindings between Synopsis and
CLang, even to the point of - gasp - helping ? :slight_smile:
    
You should check out the Python bindings we have for the "CIndex" library. They'll obviously need extensions to capture enough of the AST for C++, but that's in the grand plan anyway: to provide a stable interface to explore (but not transform or modify) Clang's AST. If a documentation tool like Synopsis can't use the CIndex library for some reason, CIndex should be extended.
  
OK, I will have a look. Given that Synopsis has its own representation (an ASG), I think a first step would be to translate the CIndex-based representation produced by CLang into ASG, so as not to disrupt too much at once.
Then we can look into the two representations to see whether a copy / translation can be avoided without breaking other features (such as Synopsis' support for other languages).

That makes sense. Comment parsing will all be done within Synopsis, I assume?

  - Doug

Does LLVM participate in GSoC this year ? If so, could we formulate a project that helps with this (quite substantial) work ?
     

Yes and yes!
   
OK, great. Let me play with the code a bit, then we may talk about how this project could shape up.

That makes sense. Comment parsing will all be done within Synopsis, I assume?
   
Yes. At present the parser attaches comments to the next declaration it finds, and Synopsis then picks them up from there to process them further (extract processing instructions, documentation, whatever).
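
As a rough illustration of that attachment rule (not the actual Synopsis parser code), the sketch below walks a flat, source-ordered list of items and hands every pending comment to the next declaration. The `Declaration` class, the `("comment", text)` / `("decl", name)` item format and the `attach_comments` function are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Declaration:
    """Hypothetical declaration record carrying its leading comments."""
    name: str
    comments: List[str] = field(default_factory=list)


def attach_comments(
    items: Iterable[Tuple[str, str]]
) -> Tuple[List[Declaration], List[str]]:
    """Attach each run of comments to the next declaration in source order.

    `items` yields ("comment", text) or ("decl", name) pairs. Returns the
    declarations plus any trailing comments that no declaration followed.
    """
    pending: List[str] = []
    declarations: List[Declaration] = []
    for kind, text in items:
        if kind == "comment":
            pending.append(text)
        elif kind == "decl":
            declarations.append(Declaration(text, pending))
            pending = []  # start collecting for the following declaration
        else:
            raise ValueError("unknown item kind: %r" % (kind,))
    return declarations, pending
```

The later processing stage can then look only at `Declaration.comments` and never needs to know where in the file a comment came from.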

Also, in one mode of operation Synopsis wants to get a position-correct picture of the entire preprocessed source file, so it can generate a hyperlinked and otherwise styled version of it. Does Clang provide this level of detail?

Thanks,
         Stefan

Does LLVM participate in GSoC this year ? If so, could we formulate a project that helps with this (quite substantial) work ?
    

Yes and yes!
  
OK, great. Let me play with the code a bit, then we may talk about how this project could shape up.

Sounds good.

That makes sense. Comment parsing will all be done within Synopsis, I assume?
  
Yes. At present the parser attaches comments to the next declaration it finds, from where Synopsis then picks it up to process it further (extract processing instructions, documentation, whatever).

Okay. We don't really have this functionality in Clang yet. Comments are passed through to the AST consumer, and we have a hack that tries to find the comment associated with a declaration after the fact, but this will need work.

Also, in one mode of operation Synopsis wants to get a position-correct picture of the entire preprocessed source file, so it can generate a hyperlinked and otherwise styled version of it. Does CLang provide this level of detail ?

Internally, yes. There isn't enough information exposed via the CIndex interface to do this (but I'd support extending CIndex in this direction).

  - Doug

Yes. At present the parser attaches comments to the next declaration it finds, from where Synopsis then picks it up to process it further (extract processing instructions, documentation, whatever).
     

Okay. We don't really have this functionality in Clang yet. Comments are passed through to the AST consumer, and we have a hack that tries to find the comment associated with a declaration after the fact, but this will need work.
   
OK.

Also, in one mode of operation Synopsis wants to get a position-correct picture of the entire preprocessed source file, so it can generate a hyperlinked and otherwise styled version of it. Does CLang provide this level of detail ?
     
Internally, yes. There isn't enough information exposed via the CIndex interface to do this (but I'd support extending CIndex in this direction).
   
OK. Without knowing CIndex, I'm not sure how useful it is to support such different levels of detail through the same representation. For example, to generate a hyperlinked source tree, I'd operate on something close to the parse tree, i.e. individual tokens.

But for the documentation, a much more high-level view is useful, such as a syntax tree or even a semantic graph.

Do you think all of those will be represented by CIndex, eventually ?

Thanks,
         Stefan

Yes. At present the parser attaches comments to the next declaration it finds, from where Synopsis then picks it up to process it further (extract processing instructions, documentation, whatever).
    

Okay. We don't really have this functionality in Clang yet. Comments are passed through to the AST consumer, and we have a hack that tries to find the comment associated with a declaration after the fact, but this will need work.
  
OK.

Also, in one mode of operation Synopsis wants to get a position-correct picture of the entire preprocessed source file, so it can generate a hyperlinked and otherwise styled version of it. Does CLang provide this level of detail ?
    
Internally, yes. There isn't enough information exposed via the CIndex interface to do this (but I'd support extending CIndex in this direction).
  
OK. Without knowing CIndex, I'm not sure how useful it is to support such different levels of details through the same representation. For example, to generate a hyperlinked source tree, I'd operate on something close to the parse tree, i.e. individual tokens.

Check out clang_annotateTokens() at http://clang.llvm.org/doxygen/group__CINDEX__LEX.html . It maps from tokens (which you can get from clang_tokenize()) to the AST entities those tokens refer to. A "cursor" in CIndex parlance represents an AST element.
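
To sketch the idea only (this does not call clang_tokenize() or clang_annotateTokens(), and the names below are not CIndex API): given tokens with source offsets and entities with source extents, a hyperlinked listing can be produced by looking up the entity covering each token. `Token`, `Entity`, `annotate` and `render` are hypothetical names, and the `#name` anchor format is just an assumption for the example.

```python
import html
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """Hypothetical token: byte offset into the source plus its spelling."""
    offset: int
    spelling: str


@dataclass
class Entity:
    """Hypothetical AST entity covering source[start:end] (end exclusive)."""
    name: str
    start: int
    end: int


def annotate(tokens: List[Token],
             entities: List[Entity]) -> List[Optional[Entity]]:
    """For each token, return the first entity whose extent covers it."""
    result: List[Optional[Entity]] = []
    for tok in tokens:
        tok_end = tok.offset + len(tok.spelling)
        match = None
        for ent in entities:
            if ent.start <= tok.offset and tok_end <= ent.end:
                match = ent
                break
        result.append(match)
    return result


def render(source: str, tokens: List[Token], entities: List[Entity]) -> str:
    """Emit HTML that keeps the original layout and links annotated tokens."""
    out: List[str] = []
    pos = 0
    for tok, ent in zip(tokens, annotate(tokens, entities)):
        out.append(html.escape(source[pos:tok.offset]))  # text between tokens
        text = html.escape(tok.spelling)
        if ent is not None:
            text = '<a href="#%s">%s</a>' % (html.escape(ent.name), text)
        out.append(text)
        pos = tok.offset + len(tok.spelling)
    out.append(html.escape(source[pos:]))  # whatever follows the last token
    return "".join(out)
```

Because the text between tokens is copied through untouched, the output stays position-correct with respect to the input, which is what a styled source view needs.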

But for the documentation, a much more high-level view is useful, such as a syntax tree or even a semantic graph.

Sure. Cursors point into the AST, which contains much semantic information.

Do you think all of those will be represented by CIndex, eventually ?

I think so. The goal of CIndex is to support various tools (documentation generators, IDEs, syntax highlighters, whatever) without forcing those tools to deal with the ever-changing Clang ASTs directly. So if Synopsis needs something CIndex doesn't provide, it's probably a CIndex bug. We're not there yet, and it will probably be *more* work right now to use CIndex than it would be to grok Clang ASTs directly, but the end result will be better if Synopsis can go through CIndex, because many other tools will benefit from the CIndex improvements that Synopsis will need.

  - Doug