Forward: Discussion about custom memory allocators for STL

Hi Chris,

Thanks a lot for a detailed opinion and explanation!
It really answers the original question, without going to far
into political discussions about boost and generic allocators
pros/cons aspects.

----- Ursprüngliche Mail ----

Von: Chris Lattner <sabre@nondot.org>
An: LLVM Developers Mailing List <llvmdev@cs.uiuc.edu>
Gesendet: Sonntag, den 18. Mai 2008, 08:38:34 Uhr
Betreff: Re: [LLVMdev] Forward: Discussion about custom memory allocators for STL

> There is a discussion thread on llvm-commits list about a
> possibility of using custom memory allocators for STL to improve the
> performance and reduce the memory pressure of STL containers, e.g.
> std::set.
> I thought that this discussion may be interesting for a wider
> audience and therefore I re-post the messages on llvm-dev as well.
>
> It would be interesting to hear what others think about
> - custom allocators,

I think custom allocators can be a great thing, as long as they are
specific to instances of containers, not a general replacement for
malloc/free. That way they can be used on demand when profiling shows
they are important.

Absolutely!

Ideally, I would be able to do something like:

   MyRegion R;
   std:set(R);
   std::map(R);

This is exactly the way, how I intended to use custom STL allocators.
Doing it selectively on the container intance basis is very important.

etc, and have the nodes for both containers share the same pool.
We already have some things like this that we use in clang, check out
llvm/support/allocators.h. It is not the standard STL allocator
interface though, we need a simple adaptor.

When it comes to non-STL memory allocators, I'd say that many points
mentioned by Barry Kelly are very true. Having allocators that easily allow
for fast allocation and easy bulk de-allocation e.g. of all objects created
during the compilation of certain scopes (like block of statements,
function, etc). But in those cases, allocators are usually bound not to the
containers, but to object types. Usually it is done by defining their
class-specific new/delete operators that would use dedicated type specific
pools. I've used this approach for compilers implementation very
successfully, as I mentioned in the original mail on llvm-commits.

> - reasons for current inefficiency of many STL containers on Evan's/
> Chris configs and

The reason is pretty obvious when you know what to look for. On the
mac, std::allocatorjust calls malloc/free. On linux and
apparently everywhere else, std::allocatoris implemented in terms
of a strange pool allocator. [IMO, the mac way of doing things is the
right way to go, but we can discuss philosophy elsewhere. Here we
will focus on the ramifications of this reality]

The end result of this is that std::set/map on the mac ends up calling
malloc/free for each node in the red/black tree, and it allocates a
metric butt load of nodes (one per each item inserted). Further, each
query of the set incurs traversals of the tree. In the case of the
mac, because malloc/free is used, the allocations are spread all
throughout the heap and traversing a tree touches all sorts of random
pages of memory in very cache-unfriendly ways. I think the performance
delta of mac vs linux is primarily due to this locality issue, not the
raw performance of malloc on the two systems.

On linux (for example) the custom implementation of std::set happens
to get lucky, and generally gives better locality than mapping
directly to malloc/free, just because only a very small subset of the
objects on the heap go through std::allocator (f.e. none of the
SDNodes or LLVM IR objects go through std::allocator). This implicit
partitioning of the heap leads to accidental improved performance, but
the performance is still pretty bad if you do have multiple node-based
containers live at once (though that is rare in llvm). I did a whole
phd thesis about partitioning memory objects, so I know that
partitioning data structures into their own regions can be a big
win :). It just happens that std::allocator on linux gets lucky on
this for LLVM, since we generally have few STL containers live at once
(and when we beat on them, we beat on them heavily).

In principle, I buy your explanation. I just want again to refer to the figures from
my original mail:

So, we can see, that performance-wise the difference is not that huge.
But if we look at the number of new/delete calls, then it is quite different:
1) without STL standard allocator - Total of only 847(!!!) mallocs for
all of the allocators together, while adding 1000000 nodes for each
of them.
2) with STL standard allocator - Total of 2000628 mallocs for all of
the allocators together, while adding 1000000 nodes for each of them.So, it looks like on Linux the allocator used by STL is still using malloc/free rather extensively.

Even if pool allocator is used by STL by default, it is not saving too much of malloc/free calls, as
you can see from above figures. Or do you think that this amount of malloc/free calls is produced
by a pool allocator of STL?

> - possibility of using 3rd party libs like Boost (or their parts) in
> LLVM

We have already imported pieces of boost.

BTW, which ones?

The general rule is that it is ok to use specific pieces,

Totally agree with you.

but they have to be converted to the LLVM style and way of doing things.
Also, things with huge chains of dependencies sometimes have to be simplified.

One comment here:
While I agree with this approach for most libraries, I think that boost is (almost) becoming
a second-important libray for C++, almost as important as the STL. Therefore,
modifying/re-formating/converting it it is not such a great idea, eventually,
since everyone using it assumes that its implementation and semantics is exactly the
same on each platform.

- Roman

I just imported some boost code at work. It was a relatively minor library, but using it pulled in 340 headers—out of boost's 5400! This was after computing the minimum closure of includes; importing each top-level boost/* folder yielded a 1600 headers.

I find this more than slightly excessive. It was a complete non-starter to bring in the 1600 or 5400 headers.

As for the 340, for an highly replaceable library comprising 1% of our LOC to bloat the source file count in our tree by 20%—after using a script to extract a perfect transitive dependency closure—well, I was on the verge of tossing it overboard. I can't blame Chris for his reticence to add such nonsense to LLVM, as it only serves to slow builds and source control operations for all.

That said, there are some parts of boost which are well-structured with few or minimal dependencies—like shared_pointer. I feel there's not a lot of need to rewrite these headers. Then again, there's not a lot of need to pull them in either, being as how such templates are trivial.

— Gordon

shared_ptr's idea is trivial, but getting all the corner cases right
so it never leaks is not easy at all.

Totally agree that we should not be pulling boost into the source tree.

But I will reassert my point that using Boost as a library can be a good
thing. Yes, it's an additional dependence for llvm, but all of the parts
of Boost I use are header-only and thus don't create a linking problem.
Honestly, Boost is soo phenomenally good in places that's it's pretty
much a second standard C++ library.

In my experience, maintaining a private fork leads to trouble down the
road.

                                                 -Dave

But I will reassert my point that using Boost as a library can be a good
thing. Yes, it's an additional dependence for llvm, but all of the
parts of Boost I use are header-only and thus don't create a linking
problem. Honestly, Boost is soo phenomenally good in places that's it's
pretty much a second standard C++ library.

In my experience, maintaining a private fork leads to trouble down the
road.

hi all,

i was following the discussion about boost in llvm, since i am interested
in the llvm project, and personally using boost extensively for my own
projects ...

- header bloating: boost _does_ include lots of headers, but doing this,
it works around many compiler-specific issues (broken compilers,
different architectures ...) ... still, other libraries pull in lots of
headers as well ...

- linking issues: the boost sources can be statically linked into a
program, so there is no real need to manually link to the boost-generated
libraries ...

- bundling boost: from my experience it is easier to provide the
necessary subset of boost in the project's source repository ... myself i
am bundling a patched boost source tree with my application, that
supports gcc-4.3, submitting my changes upstream ...

- using boost: several boost libraries are going to be included in c++0x
and std::tr1/tr2 ... so they somehow become the new standard library ...

best, tim

Ideally, I would be able to do something like:

  MyRegion R;
  std:set(R);
  std::map(R);

This is exactly the way, how I intended to use custom STL allocators.
Doing it selectively on the container intance basis is very important.

Cool. The other thing that would be nice is the ability to be able to put multiple containers into one region.

- possibility of using 3rd party libs like Boost (or their parts) in
LLVM

We have already imported pieces of boost.

BTW, which ones?

include/llvm/ADT/OwningPtr.h was heavily influenced from some boost smart pointer.

but they have to be converted to the LLVM style and way of doing things.
Also, things with huge chains of dependencies sometimes have to be simplified.

One comment here:
While I agree with this approach for most libraries, I think that boost is (almost) becoming
a second-important libray for C++, almost as important as the STL. Therefore,
modifying/re-formating/converting it it is not such a great idea, eventually,
since everyone using it assumes that its implementation and semantics is exactly the
same on each platform.

I understand with your sentiment, but pulling in hundreds of header files just isn't worth it if the functionality is conceptually small and simple. If it were something complicated and necessary (e.g. we decided to use boost regex or something) then we certainly wouldn't want to make changes to it. For something trivial like a smart pointer, there is little harm.

-Chris