-emit-html example

One of the spiffy things that Ted is doing with his static analysis stuff is having it emit reports in HTML. A required part of this is just being able to turn code itself into HTML. I think that the stuff clang is doing is pretty cool, so I thought I'd show an example.

Here's "gcc.c" from llvm-gcc converted to html with:

$ clang -I <tons of stuff> gcc.c -emit-html -o gcc.html

gcc.html.gz (151 KB)

Hi,

Thanks for sending a preview! I have been reading the commits in the past weeks and I was really curious to see what the output would look like (unfortunately I did not have time to play with clang for the last couple of weeks).
Anyway, I think the output is great! However, I found a few small problems:
- the output jumps from line 1659 to 2085
- macro expansion looks a bit weird in IE7, at least: the box is very small and wraps the text after each word (I've attached a snapshot). In Firefox it looks much prettier, though.

Keep up the great work!

Nuno

clang-html-IE7.png

> Anyway, I think the output is great! However, I found a few small problems:
> - the output jumps from line 1659 to 2085

I don't see this in Safari.

> - macro expansion looks a bit weird in IE7, at least: the box is very small and wraps the text after each word (I've attached a snapshot). In Firefox it looks much prettier, though.

I'm not really sure what causes this; I'm no web guru ;-).

Also, the white borders between table cells in your screenshot look wrong. Does anyone know how to stop that from happening?

-Chris

>> Anyway, I think the output is great! However, I found a few small problems:
>> - the output jumps from line 1659 to 2085
>
> I don't see this in Safari.

Take a look at line 1659:
<tr><td class="num" id="LN1659">1659</td><td class="line"><span class='keyword'>static</span> <span class='keyword'>const</span> <span class='keyword'>struct</span> spec_list_1 extra_specs_1[] = { <span class='macro'>EXTRA_SPECS<span class='expansion'>{ "cc1_cpu" , "%{!mtune*: %{m386:mtune=i386 %n`-m386' is deprecated. Use `-march=i386' or `-mtune=i386' instead.\n} %{m486:-mtune=i486 %n`-m486' is deprecated. Use `-march=i486' or `-mtune=i486' instead.\n} %{mpentium:-mtune=pentium %n`-mpentium' is deprecated. Use `-march=pentium' or `-mtune=pentium' instead.\n} %{mpentiumpro:-mtune=pentiumpro %n`-mpentiumpro' is deprecated. Use `-march=pentiumpro' or `-mtune=pentiumpro' instead.\n} %{mcpu=*:-mtune=%* %n`-mcpu=' is deprecated. Use `-mtune=' or '-march=' instead.\n}} %<mcpu=* %{mintel-syntax:-masm=intel %n`-mintel-syntax' is deprecated. Use `-masm=intel' instead.\n} %{mno-intel-syntax:-masm=att %n`-mno-intel-syntax' is deprecated. Use `-masm=att' instead.\n}"<br> "%{march=native:%<march=native %:local_cpu_detect(arch) %{!mtune=*:%<mtune=native %:local_cpu_detect(tune)}} %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"<br> } , { "darwin_crt1" , "%:version-compare(!> 10.5 mmacosx-version-min= -lcrt1.o) %:version-compare(>= 10.5 mmacosx-version-min= -lcrt1.10.5.o)"<br> } , { "darwin_dylib1" , "%:version-compare(!> 10.5 mmacosx-version-min= -ldylib1.o) %:version-compare(>= 10.5 mmacosx-version-min= -ldylib1.10.5.o)"<br> } , { "darwin_minversion" , "%{!m64|fgnu-runtime:10.4; ,objective-c|,objc-cpp-output:10.5; ,objective-c-header:10.5; ,objective-c++|,objective-c++-cpp-output:10.5; ,objective-c++-header|,objc++-cpp-output:10.5; :10.4}"<br> } , { "darwin_dsymutil" , "%{g*:%{!gstabs*:%{!g0: dsymutil %{o*:%*}%{!o:a.out}}}}"<br> } , { "darwin_arch" , "%{m64:x86_64;:i386}" } , { "darwin_crt2"<br> , "" } , { "darwin_subarch" , "%{m64:x86_64;:i386}" } ,</span></span> };</td></tr>

The problem is in the "'-march=' instead.\n}} %<mcpu=*" part. Notice the unescaped '<'.
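A minimal sketch of the escaping that was missing: every '<', '>' and '&' in token text, including the text shown inside a macro expansion, has to become an entity before it is written out. The name EscapeHTML is hypothetical and this is not the actual clang patch; it only illustrates the technique.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not clang's API): replace characters that are
// significant in HTML with entities so arbitrary source text, such as
// "%<mcpu=*" inside a string literal, cannot be parsed as markup.
// Quotes are escaped too, so the same helper is safe in attribute values.
std::string EscapeHTML(std::string_view Text) {
  std::string Out;
  Out.reserve(Text.size());
  for (char C : Text) {
    switch (C) {
    case '<':  Out += "&lt;";   break;
    case '>':  Out += "&gt;";   break;
    case '&':  Out += "&amp;";  break;
    case '"':  Out += "&quot;"; break;
    case '\'': Out += "&#39;";  break;
    default:   Out += C;        break;
    }
  }
  return Out;
}
```

For the fragment above, EscapeHTML("%<mcpu=*") yields "%&lt;mcpu=*", which a browser renders as the original characters instead of starting a bogus tag.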

>> - macro expansion looks a bit weird in IE7, at least: the box is very small and wraps the text after each word (I've attached a snapshot). In Firefox it looks much prettier, though.
>
> I'm not really sure what causes this; I'm no web guru ;-).
>
> Also, the white borders between table cells in your screenshot look wrong. Does anyone know how to stop that from happening?

I dunno, but later I can try to fix that.

Nuno

This should help, nice catch!
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080414/005304.html

-Chris

What clang revision are you using? Chris checked in a patch earlier today that handled the case where some text wasn't being "escaped." Are you still having this problem with top of tree (TOT)?

Chris Lattner wrote:

> One of the spiffy things that Ted is doing with his static analysis
> stuff is having it emit reports in HTML. A required part of this is
> just being able to turn code itself into HTML. I think that the stuff
> clang is doing is pretty cool, so I thought I'd show an example.
>
> Here's "gcc.c" from llvm-gcc converted to html with:
>
> $ clang -I <tons of stuff> gcc.c -emit-html -o gcc.html
>
> the code has line numbers and is syntax highlighted. In addition, if
> you float your mouse over a macro (which are displayed in red bold
> letters), a pop-up shows the tokens that that instance of the macro
> expanded out into. I picked gcc.c because it has a ton of macros.
> Take a look at the examples around line 1410 and line 750 for some fun
> examples.
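The line-1659 excerpt earlier in the thread shows the markup behind these pop-ups: the macro name sits in a span with class 'macro', and the expanded tokens sit in a nested span with class 'expansion'. A hedged sketch of emitting that structure follows; the function names are hypothetical, and the hover behavior itself is left to the page's stylesheet, which isn't reproduced here.

```cpp
#include <string>
#include <string_view>

// Minimal escaping so expansion text cannot break the surrounding markup.
static std::string Escape(std::string_view S) {
  std::string Out;
  for (char C : S) {
    if (C == '<') Out += "&lt;";
    else if (C == '>') Out += "&gt;";
    else if (C == '&') Out += "&amp;";
    else Out += C;
  }
  return Out;
}

// Hypothetical helper: emit a macro name with its expansion nested inside,
// mirroring the <span class='macro'>NAME<span class='expansion'>...</span></span>
// structure visible in the generated gcc.html.
std::string MacroWithExpansion(std::string_view Name, std::string_view Expansion) {
  std::string Out = "<span class='macro'>";
  Out += Escape(Name);
  Out += "<span class='expansion'>";
  Out += Escape(Expansion);
  Out += "</span></span>";
  return Out;
}
```

Nesting the expansion inside the macro span keeps the pop-up text attached to the exact macro instance, so a pure-CSS hover rule on the outer span is enough to show it.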

Line 1446 has a ^L (a form feed) that shows up ugly in Firefox, but the
original source has it too.
Maybe control characters should be escaped in the output and given a
different class?
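One way to read this suggestion, as a sketch only: render each ASCII control character (other than tab, newline and carriage return) in caret notation, wrapped in a span with its own class so it can be styled. The function name and the class name "ctrl" are assumptions, and ordinary HTML escaping of '<', '>' and '&' would still be needed separately.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch: show control characters as caret notation inside a
// span with a dedicated (assumed) class "ctrl". Tab, newline and carriage
// return pass through unchanged so normal line structure is preserved.
std::string EscapeControlChars(std::string_view Text) {
  std::string Out;
  for (char C : Text) {
    unsigned char U = static_cast<unsigned char>(C);
    bool IsControl = U < 0x20 || U == 0x7F;
    if (IsControl && C != '\t' && C != '\n' && C != '\r') {
      // Caret notation flips bit 0x40: 0x0C (form feed) -> 'L', 0x7F -> '?'.
      Out += "<span class='ctrl'>^";
      Out += static_cast<char>(U ^ 0x40);
      Out += "</span>";
    } else {
      Out += C;
    }
  }
  return Out;
}
```

With this, the form feed on line 1446 would appear as a styled "^L" rather than whatever glyph the browser picks for a raw 0x0C.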

Best regards,
--Edwin

Hi Ted,

The problem was that my e-mail only reached the mailing list after Chris' fix. I assume the problem is already fixed (although I haven't tried it yet).

Nuno

>> Also, the white borders between table cells in your screenshot look
>> wrong. Does anyone know how to stop that from happening?
>
> I dunno, but later I can try to fix that.

This was because:

IE doesn't understand border-spacing:0px; a portable (?) alternative is
border-collapse:collapse;

This has been patched in trunk.

This should be fixed. Here's an example:

t.c (109 Bytes)

t.html (2.92 KB)

Chris Lattner wrote:

> One of the spiffy things that Ted is doing with his static analysis stuff is having it emit reports in HTML. A required part of this is just being able to turn code itself into HTML. I think that the stuff clang is doing is pretty cool, so I thought I'd show an example.
>
> [.....]
>
> Anyway, give -emit-html a try. If you have ideas for making it better, it's really easy to improve: for example, the code to do the macro expansions is ~70 lines of commented code at the end of HTMLRewrite.

Hey, this is awesome!

I had an idea to generalize it a bit to allow for other uses of source annotation. The attached patch adds an 'Annotate' library which provides interfaces for source file annotation.
There is an 'Annotator' class (derived from ASTConsumer) that traverses the AST and dispatches annotations to an 'AnnotationClient' object.
The AnnotationClient is like this:

libannotate.patch (28.3 KB)

gcc.zip (148 KB)

Hi Argiris,

Sorry for the late reply to this email. It's been a busy week, and I shouldn't have neglected getting back to you.

I took a look at this patch. I like the low-level refactorings to the HTML rewriter API (e.g., adding HighlightKeyword). These provide some nice cleanups that reduce the conceptual complexity of the code (particularly in SyntaxHighlight). They aren't strictly necessary, but they do put some structure into how we want to name HTML classes for span tags, etc.

While I appreciate its clean design, I have to be honest that I'm not really sold (yet) on the Annotator class. Although I can envision that we will have multiple clients of the HTML rewriter (e.g., the HTML pretty-printer, the HTMLDiagnostics used by the static analysis engine, a doxygen-like documentation generator, and so on), these different clients will not necessarily fall into an ASTConsumer model, nor will this interface necessarily be the one they want.

Basically, I'm not certain it really solves a problem at this point, and right now it adds an extra abstraction layer to implement the HTMLPrinter (something that at its heart is very, very simple). Right now we have two clients of the HTML Rewriter: one is an ASTConsumer, and the other is not. I don't believe that an IDE would be an ASTConsumer (in the clang driver sense) either, but would rather interact with the clang libraries interactively to regenerate ASTs on-the-fly.

The nice thing about the "low-level" APIs in HTMLRewrite.h is that they make few assumptions about the target application, yet do the lion's share of the work when pretty-printing code to HTML without introducing an abstraction layer. As a result, the current clients of the HTML Rewrite API (HTMLPrinter and HTMLDiagnostics) need only a small amount of code to perform HTML "tweaking". The HTMLPrinter has about 20-30 lines of code (including opening files and comments), and HTMLDiagnostics contains a little code for doing HTML work, but this is proportional to the extra stuff that it outputs.
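To make the "low-level API" point concrete, here is a sketch of the kind of helper meant, under stated assumptions: it works on a plain std::string with byte offsets rather than clang's Rewriter, and the names HighlightRange and MarkKeyword are hypothetical (MarkKeyword only plays the role HighlightKeyword has in the patch; the real signature isn't shown in this thread).

```cpp
#include <cstddef>
#include <string>

// Hypothetical low-level helper (illustrative name and signature, not
// clang's API): wrap bytes [Begin, End) of Buf in a span with the given
// class. Assumes Begin <= End; std::string::insert throws
// std::out_of_range if End > Buf.size().
void HighlightRange(std::string &Buf, std::size_t Begin, std::size_t End,
                    const std::string &ClassName) {
  // Insert the closing tag first so that Begin still points at the
  // original character when the opening tag is inserted.
  Buf.insert(End, "</span>");
  Buf.insert(Begin, "<span class='" + ClassName + "'>");
}

// Hypothetical convenience wrapper: callers name the token kind, and the
// HTML class name lives in one place.
void MarkKeyword(std::string &Buf, std::size_t Begin, std::size_t End) {
  HighlightRange(Buf, Begin, End, "keyword");
}
```

Because this sketch works on raw offsets, a caller wrapping several ranges should apply them from the end of the buffer toward the start so earlier offsets stay valid.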

Don't get me wrong; I'm a big believer in refactoring and modular design. I don't think the Annotator has a bad design; I just don't think it's necessary at this point, and I'd rather not add more abstraction unless it's a clear benefit.

Ted

Hi Ted,

My motivation to propose the Annotator lib wasn't specifically to apply it for HTMLPrinter, that was more like an example.
The Annotator's purpose would be to verify clang's suitability for an IDE, at least from the aspect of syntax/semantic colorizing. For example, it would answer questions like:
- Can I colorize all variable names (with a dedicated color)?
- Can I colorize all type names?
- Can I associate opening/closing braces for all kinds of blocks (namespaces, functions, etc.)?
- Does the AST carry enough information for doing [insert task]?
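The brace question separates what a lexer can answer from what needs the AST. A purely lexical sketch (the name MatchBraces is hypothetical, and it ignores comments, string and character literals, and macros) can pair braces with a stack, but it cannot say whether a pair delimits a namespace, a function body or a struct, which is the information an Annotator would have to pull from the AST.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, purely lexical sketch: pair each '{' with its matching '}'
// using a stack of open positions. Returns the (open, close) offsets in the
// order the closing braces appear, or std::nullopt if the braces are
// unbalanced. Comments, literals and macros are deliberately not handled.
std::optional<std::vector<std::pair<std::size_t, std::size_t>>>
MatchBraces(std::string_view Src) {
  std::vector<std::size_t> Open;
  std::vector<std::pair<std::size_t, std::size_t>> Pairs;
  for (std::size_t I = 0; I < Src.size(); ++I) {
    if (Src[I] == '{') {
      Open.push_back(I);
    } else if (Src[I] == '}') {
      if (Open.empty())
        return std::nullopt; // unmatched '}'
      Pairs.emplace_back(Open.back(), I);
      Open.pop_back();
    }
  }
  if (!Open.empty())
    return std::nullopt; // unmatched '{'
  return Pairs;
}
```

For "namespace n { void f() { } }" this pairs offsets 23/25 (the function body) before 12/27 (the namespace), but nothing at this level tells the two kinds of block apart.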

Now, assuming that you have a working Annotator lib, the best way to put it to use (without messing with some IDE) would be to make an HTMLAnnotator.
HTMLAnnotator would be a client of Annotator and HTML Rewrite API.

What do you think about the above?

> I don't believe that an IDE would be an ASTConsumer (in the clang driver sense) either, but would rather interact with the clang libraries interactively to regenerate ASTs on-the-fly.

I was thinking that for the specific task of semantic colorizing, you would have to use Preprocessor+Parser+Sema for a particular source file, so having the Annotator be an ASTConsumer that handles the declarations the parser gives it seemed reasonable. Do you have something else in mind?

-Argiris

> My motivation to propose the Annotator lib wasn't specifically to apply it for HTMLPrinter, that was more like an example.
> The Annotator's purpose would be to verify clang's suitability for an IDE, at least from the aspect of syntax/semantic colorizing. For example, it would answer questions like:
> - Can I colorize all variable names (with a dedicated color)?
> - Can I colorize all type names?
> - Can I associate opening/closing braces for all kinds of blocks (namespaces, functions, etc.)?
> - Does the AST carry enough information for doing [insert task]?
>
> Now, assuming that you have a working Annotator lib, the best way to put it to use (without messing with some IDE) would be to make an HTMLAnnotator.
> HTMLAnnotator would be a client of Annotator and HTML Rewrite API.

I think having a playground for such things is useful, but I don't know if we need a separate library at this point. Probably just adding the Annotator class to the Driver would be sufficient for now; we can then easily move it out. I also don't know if the extra layer of indirection is needed until we have another Annotator in mind besides HTMLAnnotator (i.e., can we just use the HTMLPrinter directly to explore your questions above?). I'm not strongly objecting to adding Annotator; it's just not clear to me that there are other clients that would use it.

> What do you think about the above?
>
>> I don't believe that an IDE would be an ASTConsumer (in the clang driver sense) either, but would rather interact with the clang libraries interactively to regenerate ASTs on-the-fly.
>
> I was thinking that for the specific task of semantic colorizing, you would have to use Preprocessor+Parser+Sema for a particular source file, so having the Annotator be an ASTConsumer that handles the declarations the parser gives it seemed reasonable. Do you have something else in mind?

I think this pipeline works fine for playing around with things; an IDE would interactively parse different parts of a file, incrementally rebuilding ASTs, etc., and thus the ASTConsumer/Annotator interface would probably not be ideal. This pipeline also basically assumes the workflow in the Driver, which is why I think putting the Annotator class in the Driver makes more sense than creating a separate library.

Hi Ted,

Ted Kremenek wrote:

> I think having a playground for such things is useful, but I don't know if we need a separate library at this point. Probably just adding the Annotator class to the Driver would be sufficient for now; we can then easily move it out. I also don't know if the extra layer of indirection is needed until we have another Annotator in mind besides HTMLAnnotator (i.e., can we just use the HTMLPrinter directly to explore your questions above?). I'm not strongly objecting to adding Annotator; it's just not clear to me that there are other clients that would use it.

Yes, I see your point, not much of need for a separate library at the moment.

-Argiris