Remove unused types from ASTContext

Hi,

Before I begin, I want to say that I don’t want to implement the ‘feature’ below in production clang code. I want to create my own custom build for my own needs.

Abstract:
I’m developing a hobby project which creates a lot of types at compile time, in one .cpp file. Really a lot. I tried to compile the next version, but it compiled all night, took ~30GB of memory (16GB RAM + swap), and wasn’t able to finish. I know that almost all of these types are ‘used’ only once; they’re a kind of helper.
Basically, what I want to achieve is to decrease memory usage enough to avoid using swap, because swapping increases compilation time a lot.

Question:
Is there a way to implement removal of no-longer-used types? E.g.

template <typename FooParam>
struct Foo
{
using FooResult = typename Bar<FooParam>::BarResult;
};

And I’d want to remove the ‘Bar’ type from the ‘ASTContext’ because I know that I won’t need it anymore. (Then, if an instantiation of the very same ‘Bar’ occurs again, I’m OK with having to create it from scratch.)
So basically, is there any way to detect whether a type is still referenced somewhere, so I can remove it if it isn’t?

I hope I’ve described my problem well. I know that in fact I’m asking for an explanation of how types in clang are related to each other - that’s a lot, but I’d really like to learn and understand that (:

Thanks in advance for help!

Stryku

Do other compilers behave similarly in terms of memory consumption for your example code?

You will need ALL types (that are used at all) to be present during semantic
analysis, since types are used to determine, for example, implicit casts
(float to int, int to float, short to int, int to short - and even if your
type can't be converted, it still needs to be present during that phase),
and semantic analysis is done at the end of the compilation of a
translation unit. And of course, types are used during translation from AST
to IR, so they need to be (at least somewhat) present during the IR
generation phase too. [And the compiler WILL need to know what
`Bar<FooParam>` is to find `Bar<FooParam>::BarResult`, even if that is then
translated to `size_t` or `int`.]
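A minimal illustration of that dependent-name point (the definition of `Bar` here is assumed, since the original snippet doesn't show one): to resolve `Bar<FooParam>::BarResult`, the compiler has to instantiate `Bar<FooParam>` first, even when the result is just a built-in type.

```cpp
#include <cstddef>
#include <type_traits>

// Assumed definition of Bar for illustration: BarResult is just size_t.
template <typename T>
struct Bar {
    using BarResult = std::size_t;
};

// To resolve the dependent name Bar<FooParam>::BarResult, the compiler
// must instantiate Bar<FooParam> - that type has to exist in the AST.
template <typename FooParam>
struct Foo {
    using FooResult = typename Bar<FooParam>::BarResult;
};

static_assert(std::is_same<Foo<int>::FooResult, std::size_t>::value,
              "FooResult resolves to size_t only via the Bar<int> instantiation");
```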

Also, are you sure that it's the number of types that is the real cause of
the problem?

Can you provide a more complete example? Preferably one that compiles, but
does not take "overnight" - a few minutes would be good - and that
illustrates that it uses a lot more memory than a comparable piece of code
with fewer types introduced in it.

It would be good if you did “reply all” so that the other people who have read and are interested in this discussion can see the answers. I’ve added Brian and the mailing list back into this reply.

That is an impressive amount of template code. I’ve never really done much TMP, but I will see if I can at least somewhat analyse where the memory/time goes in this.

It would probably have been better to have a SMALL example, that compiles in a few seconds to a few minutes, and a “plain” program that achieves roughly the same thing, but I’ll see what I can do.

When I get home (it’s about 9pm, and I’m still at work), and I make no promises of ANY kind…

My bad, I was sure that I was “replying all” - thanks for that.

Of course I don’t want you to feel any kind of pressure from my side. I’m just curious whether I can somehow adjust llvm/clang to help me with such a problem.

I’ll try to provide an MCVE, but unfortunately probably not until tomorrow.

I can’t (after about 10 minutes of trying) get this to compile at all - most likely because I have too old a version of the C++ library installed…

Did you use CMake? If not, ctai requires clang 4.0 and needs to be compiled with the flags: -std=c++1z -ftemplate-backtrace-limit=0 -ftemplate-depth=8096

stryku

Yes, using CMake, and clang 4.0 (pre-release). But I don’t have a particularly new C++ library installed on my machine (what comes with gcc 4.9.2, to be precise), and that’s complaining about “has_facet” (from memory, I’m not at home right now) somewhere inside the iostream (again, from memory) header.

Sorry for no response, I was quite busy.

Thanks, Brian, for the help with the build (: Also, I’ve tried with g++-6 and it has the same problem with memory usage.

The problem here is caused by creating a lot of template class specializations. As Mats partially said, and as I found in the llvm sources here: https://github.com/stryku/llvm/blob/master/include/llvm/IR/Type.h#L42, types are stored until the end of the program’s life. AFAIU, every class specialization is a separate type.

Simpler program that shows my problem is e.g. https://gist.github.com/stryku/986bcd6a5d5773ef0e0136a5eca97e20
There are a lot of specializations of values_container, create_container, create_impl, etc. Additionally, every values_container has a lot of template parameters, which in fact are also types (built-in types, but still).
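To make the specialization blow-up concrete, here is a tiny sketch of the general pattern (the name is invented, not taken from the gist): each recursion step is a distinct class template specialization, and each one is a distinct type that stays in the ASTContext until the end of the translation unit.

```cpp
// Hypothetical recursive metaprogram: instantiating count_down<100>
// creates 101 distinct specializations (count_down<100> ... count_down<0>),
// i.e. 101 distinct types the compiler keeps around until end of TU.
template <int N>
struct count_down {
    static constexpr int value = 1 + count_down<N - 1>::value;
};

// Base case stops the recursion.
template <>
struct count_down<0> {
    static constexpr int value = 0;
};

static_assert(count_down<100>::value == 100, "each step adds one");
```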

For now, I’m playing with the llvm and clang sources to understand what’s going on there, and I’ll probably try to implement my own mechanism for storing template class specializations.

Thanks for your interest,
Stryku

Like I said, you need those types for most of the compilation process. It’s REALLY hard to keep track of when these things no longer matter - you certainly need the types that actually generate code all the way to LLVM-IR generation. That’s for the AST types; there are also IR types, which aren’t quite the same thing.

You can measure the former by doing a “syntax-only” run of the compiler - it won’t generate any LLVM-IR, so memory usage there is “only” the AST. Also, you can produce LLVM-IR output from the compiler (-S -emit-llvm) and then use llc to compile the IR to machine code, to determine the memory usage of the IR types (obviously along with other things).
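Concretely, the two measurements could look something like this (`main.cpp` is a placeholder for the actual file, and `/usr/bin/time -v` is the GNU time tool, which reports peak resident memory):

```shell
# AST-only memory: parse + semantic analysis, no LLVM-IR generation.
/usr/bin/time -v clang++ -std=c++1z -fsyntax-only main.cpp

# Emit LLVM-IR, then compile the IR separately with llc to see
# roughly how much memory the IR side of the pipeline needs.
clang++ -std=c++1z -S -emit-llvm main.cpp -o main.ll
/usr/bin/time -v llc main.ll -o main.s
```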

[In my own compiler project, I decided that it’s so hard to track what is needed again and what isn’t that I chose NOT to free any memory for that reason - of course, not a particularly great strategy if you want to run it as a library inside something else, but if I really had to, I could probably stick every object in one or more container(s) and free it all at the end.]

Thanks again. Unfortunately, I think that even “only AST” will be too much for my RAM.

I have one more idea - please tell me if this makes any sense. I’m thinking about storing class specializations in terms of scopes.
(I don’t want this to work with every C++ project; I’m OK with it working only with my project.)
(And I’d stick to a couple of assumptions during the ctai implementation, e.g. only one-level scope resolution, A::B.)
(I’m aware that I’d probably need to write my own implementation of ClassTemplateSpecializationDecl and a couple more classes.)

Example code and the flow:

template <int N>
struct Holder
{
static constexpr int N_Value = N;
};

template <int Val1, int Val2>
struct Add_impl
{
using Val1_Holder = Holder<Val1>;
using Val2_Holder = Holder<Val2>;

static constexpr int Result_impl = Val1_Holder::N_Value + Val2_Holder::N_Value;
};

template <int Val1, int Val2>
struct Add
{
static constexpr int Result = Add_impl<Val1, Val2>::Result_impl;
};

struct A_Struct
{
static constexpr int Value = Add<10, 20>::Result;
};

Flow:

(…)
-see class A_Struct
-go into A_Struct scope
-see a member Value =
-create class Add<10, 20> specialization
-go into Add<10, 20> scope
-see parameters {non-type 10, non-type 20}
-see member Result =
-create Add_impl<parameters[0], parameters[1]> specialization
-go into Add_impl<10, 20> scope
-see parameters {non-type 10, non-type 20}
-see alias Val1_Holder =
-create Holder<parameters[0]> specialization
-go into Holder<10> scope
-see parameters {non-type 10}
-see member N_Value = parameters[0]
-go out of Holder<10> scope (nothing to delete)
-assign Val1_Holder = Holder<10> specialization ‘by value’:
{
aliases {}
members { N_Value = 10 }
}
-delete Holder<10> specialization
-see alias Val2_Holder =
-create Holder<parameters[1]> specialization
-go into Holder<20> scope
-see parameters {non-type 20}
-see member N_Value = parameters[0]
-go out of Holder<20> scope (nothing to delete)
-assign Val2_Holder = Holder<20> specialization ‘by value’:
{
aliases {}
members { N_Value = 20 }
}
-delete Holder<20> specialization
-see member Result_impl = Val1_Holder::N_Value + Val2_Holder::N_Value;
-go out of Add_impl<10, 20> scope:
-assign Result = Add_impl<10, 20>::Result_impl
-delete Add_impl<10, 20> specialization
-go out of Add<10, 20> scope
-assign Value = Add<10, 20>::Result
-delete Add<10, 20> specialization
-go out of A_Struct scope
(…)
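The flow above could be sketched as an ordinary runtime evaluator, where each ‘specialization’ is a short-lived record that is materialized, copied ‘by value’ into the enclosing scope, and then destroyed (the names mirror the example, but this is only an illustration of the idea, not a clang implementation):

```cpp
#include <cassert>
#include <map>
#include <string>

// A "specialization" reduced to its members, as in the 'by value' copies above.
struct Spec {
    std::map<std::string, int> members;
};

// Evaluate Add<v1, v2>::Result following the proposed flow: each Holder<N>
// lives only inside its own scope and is destroyed once its member is copied out.
int evaluate_add(int v1, int v2) {
    int result_impl;
    {
        // -create Holder<v1> specialization, copy N_Value out, delete it
        int n1;
        {
            Spec holder{{{"N_Value", v1}}};
            n1 = holder.members["N_Value"];
        } // -delete Holder<v1> specialization
        // -create Holder<v2> specialization, copy N_Value out, delete it
        int n2;
        {
            Spec holder{{{"N_Value", v2}}};
            n2 = holder.members["N_Value"];
        } // -delete Holder<v2> specialization
        result_impl = n1 + n2; // Result_impl
    } // -delete Add_impl<v1, v2> specialization
    return result_impl;       // Result = Add_impl<v1, v2>::Result_impl
}
```

For instance, `evaluate_add(10, 20)` yields 30, matching `A_Struct::Value` in the example.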

Thanks,
Stryku

Thanks again. Unfortunately I think that even "only AST" will be too much
for my RAM.

When I did the compile using ToT clang and libc++, it completed in seconds
and occupied a peak of 4GB of memory. This strikes me as pretty sane
behavior and very different from your compile-takes-overnight experience.
How much physical memory do you have on the build machine, and what is
your target time budget for this job? How much memory is consumed on your
existing clang 3.9/4 and g++ 6 baselines? If those builds use more than 4GB,
maybe you can just upgrade to a more current release.

If by some chance your build host doesn't have 4GB of physical memory,
maybe that's an easier problem to solve.

How much physical memory do you have on the build machine
16GB

what is your target time budget for this job
Let’s say 8h. I don’t have a strict limit.

How much memory is consumed on your existing clang
Around 3/4GB

BTW, which code did you build? The one from my repo’s master? Because there is no problem with building that version of ctai. The problem occurs with the next version of the project. I’ve pushed it, and here it is with an example to compile in main.cpp: https://github.com/stryku/ctai/tree/compile-v2-example

About my idea above: after a while, I think it doesn’t make any sense, so I won’t implement it.
I think I’ll try to somehow reduce the number of template specializations in ctai.

Thanks,
Stryku

The compiler doesn't work like that currently, and I'm pretty sure you
can't (with less than many months of effort, if it's possible at all) make
it work like that.

Part of the reason for the compiler "not working like that" is that it does
things in phases: It forms AST of the entire program, then runs semantic
analysis on the entire AST, and then runs the CodeGen on the entire AST.
Oh, and debug info is generated alongside CodeGen, and knowing the types in
debug symbols is probably useful [although I don't know how useful
debuggers are in debugging Template Meta Programming code - not something I
spend much time working on in general].

This is mostly correct, but clang's semantic analysis and IR generation
actually happen incrementally after every top-level decl. Some things get
deferred to the end of the TU, and that then triggers further analysis and
IR gen.

I'm pretty sure we can't do the "delete" steps Mateusz describes because we
can't instantiate a template once and then throw away the instantiated
specialization later. We have to keep it around in case something would
reinstantiate it. The point of instantiation is pretty important in C++, so
I don't think we would want to implement a mode that throws away implicit
instantiations.

Finally, it's not practical to implement such a mode because we use a
BumpPtrAllocator memory management strategy in ASTContext. That means AST
nodes live until end of TU, and there is no infrastructure to free and
re-allocate their memory.
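To illustrate why that allocation strategy rules out per-node freeing, here is a heavily simplified bump-pointer allocator sketch (this is not LLVM's actual BumpPtrAllocator, just the idea): allocation only advances an offset, so there is no way to hand back an individual object's memory.

```cpp
#include <cassert>
#include <cstddef>

// Simplified single-slab bump allocator (LLVM's real one chains slabs).
class BumpAllocator {
    static constexpr std::size_t kSlabSize = 4096;
    alignas(std::max_align_t) char slab_[kSlabSize];
    std::size_t offset_ = 0;

public:
    // Allocation is just "round up and advance" - O(1), no per-object bookkeeping.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + size > kSlabSize)
            return nullptr; // out of space in this single-slab sketch
        offset_ = aligned + size;
        return slab_ + aligned;
    }

    // There is deliberately no deallocate(void*): individual objects
    // cannot be freed, only everything at once.
    void reset() { offset_ = 0; }

    std::size_t bytesUsed() const { return offset_; }
};
```

Because no free list or header exists per allocation, the only way to reclaim memory is to drop the whole slab, which is exactly the "AST nodes live until end of TU" behavior described above.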

My understanding is that constexpr functions were designed as a more
convenient, readable, and efficient way to implement these kinds of
metaprograms, so I suggest using them instead.
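For comparison, the Add<10, 20> template example from earlier in the thread collapses to a single constexpr function, with no extra class template specializations created along the way:

```cpp
// constexpr equivalent of Add<Val1, Val2>::Result from the earlier example:
// one function, evaluated at compile time, instead of three class
// template specializations per addition.
constexpr int add(int val1, int val2) {
    return val1 + val2;
}

struct A_Struct {
    static constexpr int Value = add(10, 20);
};

static_assert(A_Struct::Value == 30, "computed at compile time");
```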

Thanks, everyone, for your interest. I really appreciate your help and I see your points.

To be able to release next version of ctai, I implemented a simplified example.

One more clarification Reid,

My understanding is that constexpr functions were designed as a more convenient, readable, and efficient way to implement these kinds of metaprograms, so I suggest using them instead.
One of the assumptions during the ctai implementation was to use template classes/structs everywhere, without constexpr functions and objects. There is a kind of stereotype that templates are difficult, and I wanted to challenge myself with a program like that. In fact, with constexpr functions/objects it would be too easy (:

Thanks again,
Stryku