Driver support for AST features (UI Proposal)

Hi all,

This is a UI design proposal for the clang driver for providing user /
tool access to clang AST based features.

1. Support '-emit-ast'; this will be an option like '-S' in that it
stops compilation at the AST production phase, and will generate files
with a '.ast' suffix.

2. Recognize '.ast' files as source inputs which can be compiled.
Obviously this will start at the compilation phase, in the same way
that '.i' etc inputs bypass the preprocessing phase.
    a. clang will call clang with an explicit argument telling it to
compile from .ast (-compile-ast) instead of -S.

    b. The compile-ast action will skip passing preprocessing specific
options to clang (but still pass options used by the code generator).
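
To make point 2 concrete, here is a minimal sketch of the phase selection being described. It is an illustration only, not the actual driver code: the table, the function names, and the list of "preprocessing specific" prefixes are assumptions chosen for the example, and -compile-ast is the option proposed above rather than an existing one.

```python
import os

# Hypothetical model of the driver's phases; not the real clang driver code.
PHASES = ["preprocess", "compile", "assemble", "link"]

# First phase each input type needs. '.ast' starts at "compile", like '.i'.
START_PHASE = {
    ".c": "preprocess",
    ".i": "compile",
    ".ast": "compile",
    ".s": "assemble",
    ".o": "link",
}

# Illustrative stand-ins for preprocessing-specific options (joined forms only).
PREPROCESSOR_PREFIXES = ("-D", "-U", "-I")


def phases_for(filename):
    """Return the phases that still have to run for this input."""
    suffix = os.path.splitext(filename)[1]
    start = START_PHASE[suffix]
    return PHASES[PHASES.index(start):]


def compile_args(filename, user_args):
    """Build the argument list for the compile step of one input."""
    if filename.endswith(".ast"):
        # 2a: explicit -compile-ast instead of -S.
        # 2b: drop preprocessing-specific options, keep the rest.
        kept = [a for a in user_args if not a.startswith(PREPROCESSOR_PREFIXES)]
        return ["-compile-ast"] + kept + [filename]
    return ["-S"] + list(user_args) + [filename]
```

With this model, compile_args("foo.ast", ["-DNDEBUG", "-O2"]) keeps -O2 and drops -DNDEBUG, while the same call for foo.c passes both options through.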

Eventually we will probably also want a feature '-write-ast' (or so)
which will output an AST while doing other things (like compiling),
but that is not part of this proposal.

One salient design point is that the default suffix for AST files is
'.ast', which does not indicate the source language. This means the
driver won't be able to make decisions about things like code
generator options based on its language. I'm not sure yet if this
matters. If it did we would either (a) have to poke the .ast to figure
out its language, or (b) use -x with new options like c++-ast etc
(like for preprocessed inputs) and optionally use suffixes like
'.c-ast', '.cpp-ast', etc.
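
A rough sketch of what option (b) could look like, assuming the type names and suffixes floated above ('c-ast', 'c++-ast', '.c-ast', '.cpp-ast'). None of these exist today; the function and table names are made up for the example. A plain '.ast' input yields no language, which is exactly the case where the driver would have to fall back to (a).

```python
# Hypothetical suffix table for option (b); these suffixes are only proposed.
AST_SUFFIX_LANGUAGE = {
    ".c-ast": "c-ast",
    ".cpp-ast": "c++-ast",
}


def ast_language(filename, x_option=None):
    """Return the AST input type, or None if it cannot be known
    without opening the file (the plain '.ast' case)."""
    if x_option is not None:
        # An explicit -x value wins over the suffix, as for other inputs.
        return x_option
    for suffix, language in AST_SUFFIX_LANGUAGE.items():
        if filename.endswith(suffix):
            return language
    return None
```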

Comments?

A patch following this design is attached. The compile-ast option is
not implemented in clang, and I don't know exactly what model we want
for compile options passed to an ast -> .s compile step (most of them
are baked in the .ast, but not all of them).

- Daniel

0001-Initial-emit-ast-support.patch (10.8 KB)

Daniel Dunbar wrote:

Hi all,

This is a UI design proposal for the clang driver for providing user /
tool access to clang AST based features.

1. Support '-emit-ast'; this will be an option like '-S' in that it
stops compilation at the AST production phase, and will generate files
with a '.ast' suffix.

2. Recognize '.ast' files as source inputs which can be compiled.
Obviously this will start at the compilation phase, in the same way
that '.i' etc inputs bypass the preprocessing phase.
Comments?
  

Isn't this essentially the same as first emitting a PCH made from the
entire source file, and then compiling an empty source file with a PCH
include directive? In other words, don't we pretty much already support
this, albeit with a weird command syntax?

Sebastian

Yes and yes.

  - Doug

Hi all,

This is a UI design proposal for the clang driver for providing user /
tool access to clang AST based features.

1. Support '-emit-ast'; this will be an option like '-S' in that it
stops compilation at the AST production phase, and will generate files
with a '.ast' suffix.

2. Recognize '.ast' files as source inputs which can be compiled.
Obviously this will start at the compilation phase, in the same way
that '.i' etc inputs bypass the preprocessing phase.
   a. clang will call clang with an explicit argument telling it to
compile from .ast (-compile-ast) instead of -S.

   b. The compile-ast action will skip passing preprocessing specific
options to clang (but still pass options used by the code generator).

Eventually we will probably also want a feature '-write-ast' (or so)
which will output an AST while doing other things (like compiling),
but that is not part of this proposal.

One salient design point is that the default suffix for AST files is
'.ast', which does not indicate the source language. This means the
driver won't be able to make decisions about things like code
generator options based on its language. I'm not sure yet if this
matters. If it did we would either (a) have to poke the .ast to figure
out its language, or (b) use -x with new options like c++-ast etc
(like for preprocessed inputs) and optionally use suffixes like
'.c-ast', '.cpp-ast', etc.

Comments?

This looks great.

I much prefer (a), poking the .ast. From my perspective, it's simpler and less error-prone.

snaroff

Hi all,

This is a UI design proposal for the clang driver for providing user /
tool access to clang AST based features.

1. Support '-emit-ast'; this will be an option like '-S' in that it
stops compilation at the AST production phase, and will generate files
with a '.ast' suffix.

2. Recognize '.ast' files as source inputs which can be compiled.
Obviously this will start at the compilation phase, in the same way
that '.i' etc inputs bypass the preprocessing phase.
   a. clang will call clang with an explicit argument telling it to

clang will call clang-cc ?

compile from .ast (-compile-ast) instead of -S.

   b. The compile-ast action will skip passing preprocessing specific
options to clang (but still pass options used by the code generator).

Shouldn't the options for the back-end be stored in the AST node during the
-emit-ast phase? This helps, among other things, with language-specific issues,
and maybe with debug info (if we choose to put out additional ASTs for such a
purpose). Also, the compiler revision number would be a plus to prevent nasty bugs.

- fariborz

Hi Sebastian,

Another point/motivation for this change...

As I'm sure you've noticed, we are starting to use the ASTs as a general repository (to support indexing, refactoring, etc.). That said, we'd like the freedom to extend the .ast format without affecting the PCH format (and vice versa).

snaroff

Hi all,

This is a UI design proposal for the clang driver for providing user /
tool access to clang AST based features.

1. Support '-emit-ast'; this will be an option like '-S' in that it
stops compilation at the AST production phase, and will generate files
with a '.ast' suffix.

Makes sense.

2. Recognize '.ast' files as source inputs which can be compiled.
Obviously this will start at the compilation phase, in the same way
that '.i' etc inputs bypass the preprocessing phase.
   a. clang will call clang with an explicit argument telling it to
compile from .ast (-compile-ast) instead of -S.

   b. The compile-ast action will skip passing preprocessing specific
options to clang (but still pass options used by the code generator).

Seems reasonable.

Eventually we will probably also want a feature '-write-ast' (or so)
which will output an AST while doing other things (like compiling),
but that is not part of this proposal.

One salient design point is that the default suffix for AST files is
'.ast', which does not indicate the source language. This means the
driver won't be able to make decisions about things like code
generator options based on its language.

If nothing else, we need to know when the AST file is a C++ file, so that we can link in the C++ standard library.

I'm not sure yet if this
matters. If it did we would either (a) have to poke the .ast to figure
out its language, or (b) use -x with new options like c++-ast etc
(like for preprocessed inputs) and optionally use suffixes like
'.c-ast', '.cpp-ast', etc.

We should poke the AST file for this information.

Comments?

Looks great! No red flags in the patch; I say, "go for it".

A patch following this design is attached. The compile-ast option is
not implemented in clang, and I don't know exactly what model we want
for compile options passed to an ast -> .s compile step (most of them
are baked in the .ast, but not all of them).

If we do nothing, we'll end up checking the AST-backed options (complaining when there are differences) and just passing the other compilation options through to CodeGen for generation of the .s file. That seems fine to me.

   - Doug

2. Recognize '.ast' files as source inputs which can be compiled.
Obviously this will start at the compilation phase, in the same way
that '.i' etc inputs bypass the preprocessing phase.
a. clang will call clang with an explicit argument telling it to

clang will call clang-cc ?

Yes, typo.

compile from .ast (-compile-ast) instead of -S.

b. The compile-ast action will skip passing preprocessing specific
options to clang (but still pass options used by the code generator).

Shouldn't the options for the back-end be stored in the AST node during the
-emit-ast phase? This helps, among other things, with language-specific issues,
and maybe with debug info (if we choose to put out additional ASTs for such a
purpose). Also, the compiler revision number would be a plus to prevent nasty bugs.

Most of these options already are; the only ones that aren't are
things which only affect the backend (even options like -O2 change the
language, since they set preprocessor defines).

And yes, I agree that we should make PCH's version detection a little
more reliable.

- Daniel

Eventually we will probably also want a feature '-write-ast' (or so)
which will output an AST while doing other things (like compiling),
but that is not part of this proposal.

One salient design point is that the default suffix for AST files is
'.ast', which does not indicate the source language. This means the
driver won't be able to make decisions about things like code
generator options based on its language.

If nothing else, we need to know when the AST file is a C++ file, so that we
can link in the C++ standard library.

Actually, that's just based on the driver entry point (clang vs
clang++), not the language! Although one could argue that should
change...

I'm not sure yet if this
matters. If it did we would either (a) have to poke the .ast to figure
out its language, or (b) use -x with new options like c++-ast etc
(like for preprocessed inputs) and optionally use suffixes like
'.c-ast', '.cpp-ast', etc.

We should poke the AST file for this information.

I agree, but it's worth noting the downside of this is that the driver
would have to link in substantially more code (or use a custom method
to get the language, for example we could arrange the language to be
part of the file header).
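
As a sketch of the "language in the file header" idea: the layout below (a 4-byte magic followed by one language byte) is invented purely for illustration and is not the real AST/PCH on-disk format. The point is only that the driver could read a few bytes without linking in the full reader.

```python
import struct

# Hypothetical header layout, NOT the real AST/PCH format:
# 4-byte magic followed by a 1-byte language code.
MAGIC = b"XAST"
HEADER = struct.Struct("<4sB")
LANGUAGES = {0: "c", 1: "c++", 2: "objective-c"}


def write_header(language_code):
    """Produce the header bytes for a given language code."""
    return HEADER.pack(MAGIC, language_code)


def read_language(data):
    """Decode the language from the first bytes of a file's contents."""
    if len(data) < HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, code = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not an AST file")
    if code not in LANGUAGES:
        raise ValueError("unknown language code %d" % code)
    return LANGUAGES[code]
```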

A patch following this design is attached. The compile-ast option is
not implemented in clang, and I don't know exactly what model we want
for compile options passed to an ast -> .s compile step (most of them
are baked in the .ast, but not all of them).

If we do nothing, we'll end up checking the AST-backed options (complaining
when there are differences) and just passing the other compilation options
through to CodeGen for generation of the .s file. That seems fine to me.

It's "fine", but it's not ultimately what we want, I think. Consider
trying to build a Makefile system which would cache AST files, for
example. Or a 'ccache' mode which caches .ast files. It's much more
likely to want to pass 0 options to the compile-ast step, than have to
duplicate the options. That would require us to tuck some extra things
in the PCH.

Similarly, having the driver figure out just which options it should
pass for a compile-ast step and which have been subsumed is ugly and
error prone. My inclination is that we should just suck all the
options into the PCH, so that the driver can pass no options to a
compile-ast step. Then if there is a user request for some particular
option to be overridable at the compile-ast stage, we can add that as
needed.

- Daniel

Hi Daniel,

I've read what you and others said, and I think this is a great proposal. Just to be clear, because of the nature of how -O1 and -O2 can produce different ASTs (due to predefines), should I conclude that we would not (in general) be able to share AST files between different compilation modes (e.g., debug build versus optimized build)? If so, I really see your motivation for shoving all the options into the AST file itself.

Ted

Eventually we will probably also want a feature ‘-write-ast’ (or so)
which will output an AST while doing other things (like compiling),
but that is not part of this proposal.

One salient design point is that the default suffix for AST files is
‘.ast’, which does not indicate the source language. This means the
driver won’t be able to make decisions about things like code
generator options based on its language.

If nothing else, we need to know when the AST file is a C++ file, so that we
can link in the C++ standard library.

Actually, that's just based on the driver entry point (clang vs
clang++), not the language! Although one could argue that should
change…

Oh, right! So… what kinds of decisions does the driver actually have to make that affect options to the code generator?

I’m not sure yet if this
matters. If it did we would either (a) have to poke the .ast to figure
out its language, or (b) use -x with new options like c++-ast etc
(like for preprocessed inputs) and optionally use suffixes like
‘.c-ast’, ‘.cpp-ast’, etc.

We should poke the AST file for this information.

I agree, but it's worth noting the downside of this is that the driver
would have to link in substantially more code (or use a custom method
to get the language, for example we could arrange the language to be
part of the file header).

It’s just the bitstream reader; we can factor out code that identifies a PCH file and decodes LangOptions without dragging in the rest of the PCH reader.

A patch following this design is attached. The compile-ast option is
not implemented in clang, and I don’t know exactly what model we want
for compile options passed to an ast → .s compile step (most of them
are baked in the .ast, but not all of them).

If we do nothing, we’ll end up checking the AST-backed options (complaining
when there are differences) and just passing the other compilation options
through to CodeGen for generation of the .s file. That seems fine to me.

It’s “fine”, but it’s not ultimately what we want, I think. Consider
trying to build a Makefile system which would cache AST files, for
example. Or a ‘ccache’ mode which caches .ast files. It’s much more
likely to want to pass 0 options to the compile-ast step, than have to
duplicate the options. That would require us to tuck some extra things
in the PCH.

Similarly, having the driver figure out just which options it should
pass for a compile-ast step and which have been subsumed is ugly and
error prone. My inclination is that we should just suck all the
options into the PCH, so that the driver can pass no options to a
compile-ast step. Then if there is a user request for some particular
option to be overridable at the compile-ast stage, we can add that as
needed.

Sure, it’s easy to suck more options into the AST. At present, we take care of everything in LangOptions, classifying each option as “important” (the same setting as the AST file must be used) or “benign” (the option can be overridden). That includes some code-generation-related flags (Optimize, OptimizeSize, PICLevel) that have an impact on predefined macros, but target-specific flags aren’t stored anywhere.
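
To illustrate the “important” versus “benign” split, here is a minimal sketch. Only the option names Optimize, OptimizeSize and PICLevel come from the description above; the classification table itself, the HypotheticalBenignFlag and TargetCPU names, and the function are assumptions made up for the example and do not reflect the real LangOptions handling.

```python
# Hypothetical classification; the real per-option choices may differ.
OPTION_KIND = {
    "CPlusPlus": "important",
    "Optimize": "important",
    "OptimizeSize": "important",
    "PICLevel": "important",
    "HypotheticalBenignFlag": "benign",
}


def check_options(stored, requested):
    """Compare command-line options against those stored in the AST file.

    Returns (effective_options, names_of_conflicting_important_options).
    """
    effective = dict(stored)
    conflicts = []
    for name, value in requested.items():
        if name not in stored:
            # Not recorded in the AST file (e.g. target-specific flags):
            # simply passed through.
            effective[name] = value
        elif value != stored[name]:
            if OPTION_KIND.get(name) == "important":
                # Must match the AST file; report instead of overriding.
                conflicts.append(name)
            else:
                # Benign: the command line may override the stored value.
                effective[name] = value
    return effective, conflicts
```

If all options were sucked into the AST file, `requested` would normally be empty and the stored settings would be used unchanged.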

  - Doug

Yes, that is correct. PCH files already have the same limitation, that you can't use a debug-built PCH file with an optimized build of a translation unit. For -O1/-O2/-Os, we actually have LangOptions bits that are encoded in the PCH (and checked at PCH load time).
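
A small sketch of why the optimization level leaks into the AST. It assumes the GCC-style predefined macros __OPTIMIZE__ and __OPTIMIZE_SIZE__ and a simplified mapping from flags to macros; the function name is made up, and the real set of predefines is larger.

```python
def optimization_predefines(opt_flag):
    """Return the optimization-related predefined macros for a flag.

    Simplified assumption: __OPTIMIZE__ for any optimizing level,
    plus __OPTIMIZE_SIZE__ for -Os.
    """
    macros = set()
    if opt_flag in ("-O1", "-O2", "-O3", "-Os"):
        macros.add("__OPTIMIZE__")
    if opt_flag == "-Os":
        macros.add("__OPTIMIZE_SIZE__")
    return macros


def can_share_ast(flag_a, flag_b):
    """Two builds can only share an AST if they see the same predefines."""
    return optimization_predefines(flag_a) == optimization_predefines(flag_b)
```

Under this model a debug build (-O0) and an optimized build (-O2) see different predefines, so source guarded by #ifdef __OPTIMIZE__ can produce different ASTs.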

  - Doug

FYI, I like the proposal, one comment:

Right. Eventually I remembered that and then discarded that concern,
just wanted to mention it (mostly because it influences the choice of
suffix).

- Daniel

Currently? None, so it may be a non-issue. We unconditionally pass
things like -finline-limit- through, for example.

I just brought it up for completeness, and to explicitly call out that
it is a design difference from how preprocessed input files work, for
example.

- Daniel