extending ClangTool libASTUnit saving and loading

Hi,

I wonder if it makes sense to extend ClangTool to facilitate saving and loading of ASTUnit for refactoring-style applications.

ASTUnit::Save seems straightforward, taking one string (the filename); LoadFromASTFile, however, seems rather complex:

std::unique_ptr<ASTUnit> ASTUnit::LoadFromASTFile(
    const std::string &Filename, IntrusiveRefCntPtr<DiagnosticsEngine> Diags,
    const FileSystemOptions &FileSystemOpts, bool OnlyLocalDecls,
    ArrayRef<RemappedFile> RemappedFiles, bool CaptureDiagnostics,
    bool AllowPCHWithCompilerErrors, bool UserFilesAreVolatile)

For most AST manipulation, all those details are probably not important and should either be deduced or take default values, shouldn't they?

In all this reasoning I assumed that once an AST is created, it can be saved to a file and reloaded in another execution of the tool for further processing. Hopefully I am not wrong.
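
A sketch of the kind of convenience asked for here, assuming nothing about the real headers: the four boolean parameters of the signature above are bundled into a struct whose members default to false, so a caller who only has a file name never spells them out. ASTLoadOptions, ASTLoadRequest and makeLoadRequest are hypothetical names for illustration, not Clang API, and the declaration in ASTUnit.h may already provide default arguments for some of these parameters, so it is worth checking there first.

```cpp
#include <string>

// Hypothetical bundle of the boolean knobs from the signature above.
// Not a Clang type; every member defaults to false.
struct ASTLoadOptions {
  bool OnlyLocalDecls = false;
  bool CaptureDiagnostics = false;
  bool AllowPCHWithCompilerErrors = false;
  bool UserFilesAreVolatile = false;
};

// Hypothetical description of one load: the file plus its options.
struct ASTLoadRequest {
  std::string Filename;
  ASTLoadOptions Options;
};

// Builds a request; callers that only know the file name omit Options
// and get the all-default settings.
inline ASTLoadRequest
makeLoadRequest(const std::string &Filename,
                const ASTLoadOptions &Options = ASTLoadOptions()) {
  return ASTLoadRequest{Filename, Options};
}
```

With this shape, makeLoadRequest("Sema.ast") carries all-default options, and a caller who needs a single flag sets just that member on an ASTLoadOptions before passing it in.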

Multiple questions:

  1. You should already be able to use ASTUnit to save and load ASTs - why do you need integration with ClangTool?
  2. What are you trying to do? Why do you want to store the AST?

I need a small detail clarified. I did not check, but I assume that
"clang -emit-ast" uses ASTUnit::Save. Now the main questions:
-> is this AST fit to be reloaded to generate code or to do any other
action that one would do with source code?
-> is it possible for clang itself to load the AST to generate code?

I tried this:
clang -c Sema.cpp <bunch of includes, defines>
#above one compiles and generates Sema.o from clang Sema.cpp without errors
clang -emit-ast Sema.cpp <bunch of includes, defines>
#this generates Sema.ast without any errors/warnings
clang -c Sema.ast

With the last one, whatever parameters I feed it, there are compilation
errors. For instance:

.../include/llvm/Support/SourceMgr.h:48:10: error: call to deleted
constructor of 'std::unique_ptr<MemoryBuffer>'
  struct SrcBuffer {

I tried to launch an older version of clang with the same AST and got an
error:

error: PCH file uses an older PCH format that is no longer supported
error: unable to load PCH file

This makes me think that, whatever similarities or differences there
are, clang tries to load my AST as a PCH and not as an AST.
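
A likely reason for that behaviour, as far as I know: -emit-ast and PCH generation write the same serialized AST format, and the raw form of that format begins with the four bytes 'C', 'P', 'C', 'H', so from the file contents alone the two look alike. The sketch below only checks for that signature. hasClangASTMagic and fileHasClangASTMagic are hypothetical helper names, not Clang API, and a file wrapped in an object-file container (which some configurations produce) would not start with these bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Returns true if the buffer starts with the bytes 'C','P','C','H'.
// Assumption: this is the signature of a raw serialized AST/PCH file.
constexpr bool hasClangASTMagic(const char *Data, std::size_t Size) {
  return Size >= 4 && Data[0] == 'C' && Data[1] == 'P' &&
         Data[2] == 'C' && Data[3] == 'H';
}

// Reads the first four bytes of a file and applies the check above.
// Returns false if the file cannot be opened or is shorter than 4 bytes.
inline bool fileHasClangASTMagic(const std::string &Path) {
  std::ifstream In(Path, std::ios::binary);
  char Header[4] = {};
  In.read(Header, 4);
  return In.gcount() == 4 && hasClangASTMagic(Header, 4);
}
```

Running fileHasClangASTMagic on both Sema.ast and a precompiled header would be one way to see whether the two files share the same signature on a given setup.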

This is the context that generated my original question. Now I see that
it does not make sense to extend ClangTool.

All this makes me think that it would be nice to have a way to tell the
clang infrastructure to load an AST instead of a source file to generate
code, do static analysis or any other operation.

Or... is there a way to do it?

thanks a lot for

regards
Mobiphil

So, why do you want to do that? What’s your use case? :)

Does your answer mean that it is impossible to tell clang to load the AST
instead of re-parsing and re-analyzing the source?
Well, I could throw the ball back and ask: why be able to save the AST at
all?

At the moment I use the clang infrastructure mainly to understand clang
itself. I gave Sema.cpp as an example as it is probably the largest file
(absolute size, not counting included headers). With headers included,
many compilation units would probably be almost equally large. Anyway,
the point is that compilation takes a lot of time. I am trying to
understand how much potential there is in a workflow where one would
generate the AST only once, and subsequent processes would reuse it as
mentioned earlier (code generation, static analysis, refactoring
experiments, etc.). For applying different optimisation strategies, the
LLVM intermediate representation could play a similar role.

As you can see, I do not claim to have a specific use case, but I am also
of the opinion that even simple features would never be asked for
explicitly, as the threshold is a bit high, while these features would
probably be used once present.

Can you also confirm that the simple command line
clang Sema.ast
treats Sema.ast as a PCH and not as an AST?

thanks and best regards
mobi phil

Well, there are currently multiple use cases in clang: pch files (precompiled headers), pcm files (modules), and I think arcmigrate uses ASTUnit (but I don’t know the details).

The problem is that when you store the AST for a full TU, it’s huge (all the transitive headers are in it) and I’m not sure why you would want to do that.
If you think tools are too slow, then our proposed solution for that is using C++ modules (which will also dump the AST, but in a much more nuanced and smart fashion).

That’s why the use case matters; if you’re interested in processing speed, the solution is modules (and the way it serializes the AST).
If your use case is something else, the answer might be something different :)

Well, there are currently multiple use cases in clang: pch files
(precompiled headers), pcm files (modules), and I think arcmigrate uses
ASTUnit (but I don't know the details).

Though it does not make sense to tell the compiler to generate a PCH from
a C++ file, it does make sense to generate an AST, which is possible,
but.. still... it is impossible to load the AST, regardless of its size.
Any use case I was thinking about would hold the AST only temporarily, so
size would not be an issue. I did some tests, and on average AST files
are a bit more than twice as large as an object file with debug
information.

The problem is that when you store the AST for a full TU, it's *huge* (all
the transitive headers are in it) and I'm not sure why you would want to do
that.

Well, from a TU point of view it makes sense to have the header
information. When you analyze (or do code completion, I assume), you need
the full baggage. Though having symbols that are not required by the cpp
file is overhead. Is there any feature to say: discard unreferenced
symbols (types, globals etc.)? I am still not sure how code completion
works and how symbols newly added to the code are updated (if at all) in
the AST.

If you think tools are too slow, then our proposed solution for that is
using C++ modules (which will also dump the AST, but in a much more nuanced
and smart fashion).

Well, I did some research into whether it would be possible to build llvm
and clang with C++ modules. No luck. Any hints?

That's why the use case matters; if you're interested in processing speed,
the solution is modules (and the way it serializes the AST).

happy with that... would be happy to see llvm/clang built with modules. I
am sure one could find all info in module based AST's that can be found in
non-module AST's.

If your use case is something else, the answer might be something
different :)

So far I can do my job: I found enough examples in libclang to load from
an AST. I was just wondering what the point is of dumping an AST with
clang without being able to explicitly load it in most of the tools.

C++ modules are still pretty experimental.

so.. this sounds like it would be almost impossible to build llvm/clang
with modules...

Now back to "clang -c Sema.ast" ... isn't it fair to intuitively assume
that it would treat the AST as a precompiled unit with all the ingredients
(headers) instead of as a PCH? In the best case, clang should recognize
what kind of AST it is given and behave accordingly?

C++ modules are still pretty experimental.

so.. this sounds like it would be almost impossible to build llvm/clang
with modules...

Clang and LLVM build fine with modules enabled; we have a buildbot that
tests this on every commit. You need a suitably-configured system (with
module maps for your C and C++ standard libraries), though.

now back to "clang -c Sema.ast" ... isn't it fair to intuitively assume
that it would treat the ast as a precompiled unit with all the ingredients
(headers) instead of PCH? In best case clang should recognize what kind of
ast is behind and behave accordingly?

That's a reasonable assumption, but the question is, what code should we
generate when you compile from an AST file like this? Right now, we
generate code for all the externally-visible definitions in the AST file
(and we do the same thing whether the AST file is a preamble, a PCH, or a
module). That's liable to change in the future (in particular, we may
introduce a mechanism to say "do not generate definitions for an inline
function in a module in every user of the module; instead, generate the
code once from the module itself").

by mistake removed the list …

don’t know if it is interesting for anybody, but fwd. it

put back CC the list, maybe Richard's answer is interesting for somebody
else...

First, the C++ standard library: If you're using recent libc++, you already
have module maps for it. If you're using libstdc++, I can't help -- I don't
have such a module map.

Then the C standard library: I have a module map for glibc that I've been
using for a year or so, and it's needed very little maintenance. One
additional tweak needed is to move around some of the contents of
<assert.h>, because it does things which are hostile to modules. I can
provide you with a patch if you like.

hm.. it seems that the threshold is a bit high. I thought producing modules
would work straightforwardly with any library or library headers. So do I
need to patch the full glibc source or just the headers? Well, I did read
the article at "http://clang.llvm.org/docs/Modules.html" but my
understanding of modules is still poor, so I will need some time to catch
up, but the patch may be useful at some point, thanks.

If you're using a recent Mac OS system, you already have module maps for
your C standard library, but ... they're not compatible with C++. With a
little manual effort, you could probably get them working.

I develop on Ubuntu.

Then the C standard library: I have a module map for glibc that I've been
using for a year or so, and it's needed very little maintenance. One
additional tweak needed is to move around some of the contents of
<assert.h>, because it does things which are hostile to modules. I can
provide you with a patch if you like.

Hi, can you please send me a copy of your glibc module map, with any
additional patches needed?

rgrds,
mobi phil