Thanks a lot for the detailed opinion and explanation!
It really answers the original question, without going too far
into political discussions about Boost and generic allocators.
----- Original Message ----
From: Chris Lattner <firstname.lastname@example.org>
To: LLVM Developers Mailing List <email@example.com>
Sent: Sunday, May 18, 2008, 08:38:34
Subject: Re: [LLVMdev] Forward: Discussion about custom memory allocators for STL
> There is a discussion thread on llvm-commits list about a
> possibility of using custom memory allocators for STL to improve the
> performance and reduce the memory pressure of STL containers, e.g.
> I thought that this discussion may be interesting for a wider
> audience and therefore I re-post the messages on llvm-dev as well.
> It would be interesting to hear what others think about
> - custom allocators,
I think custom allocators can be a great thing, as long as they are
specific to instances of containers, not a general replacement for
malloc/free. That way they can be used on demand when profiling shows
they are important.
Ideally, I would be able to do something like:
This is exactly how I intended to use custom STL allocators.
Doing it selectively, on a per-container-instance basis, is very important.
etc, and have the nodes for both containers share the same pool.
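The code example originally quoted here did not survive the forwarding. A minimal sketch of the idea, with two node-based containers drawing their nodes from one shared pool through a custom STL allocator (the `Pool` and `PoolAllocator` names are hypothetical, not LLVM's), might look like:

```cpp
#include <cstdint>
#include <cstdlib>
#include <map>
#include <set>
#include <vector>

// Hypothetical bump-pointer pool: carves objects out of malloc'd slabs and
// frees every slab at once in its destructor (no per-node free).
class Pool {
  std::vector<void *> Slabs;
  char *Cur = nullptr;
  char *End = nullptr;

public:
  void *allocate(std::size_t Size, std::size_t Align) {
    if (!Cur || Cur + Size + Align > End) {
      std::size_t SlabSize = Size + Align > 4096 ? Size + Align : 4096;
      Cur = static_cast<char *>(std::malloc(SlabSize));
      End = Cur + SlabSize;
      Slabs.push_back(Cur);
    }
    std::size_t Pad =
        (Align - reinterpret_cast<std::uintptr_t>(Cur) % Align) % Align;
    void *P = Cur + Pad;
    Cur += Pad + Size;
    return P;
  }
  ~Pool() {
    for (void *S : Slabs)
      std::free(S);
  }
};

// Minimal C++11 STL allocator adaptor over the pool.
template <typename T> struct PoolAllocator {
  using value_type = T;
  Pool *P;
  explicit PoolAllocator(Pool *P) : P(P) {}
  template <typename U> PoolAllocator(const PoolAllocator<U> &O) : P(O.P) {}
  T *allocate(std::size_t N) {
    return static_cast<T *>(P->allocate(N * sizeof(T), alignof(T)));
  }
  void deallocate(T *, std::size_t) {} // nodes die in bulk with the Pool
};
template <typename T, typename U>
bool operator==(const PoolAllocator<T> &A, const PoolAllocator<U> &B) {
  return A.P == B.P;
}
template <typename T, typename U>
bool operator!=(const PoolAllocator<T> &A, const PoolAllocator<U> &B) {
  return A.P != B.P;
}

bool sharedPoolDemo() {
  Pool P;
  // The red/black-tree nodes of both containers come from the same pool,
  // so they end up packed together instead of scattered across the heap.
  std::set<int, std::less<int>, PoolAllocator<int>> S{
      std::less<int>(), PoolAllocator<int>(&P)};
  std::map<int, int, std::less<int>,
           PoolAllocator<std::pair<const int, int>>>
      M{std::less<int>(), PoolAllocator<std::pair<const int, int>>(&P)};
  for (int i = 0; i != 1000; ++i) {
    S.insert(i);
    M[i] = 2 * i;
  }
  return S.size() == 1000 && M.at(7) == 14;
}
```

Since `deallocate` is a no-op, tearing down both containers costs nothing per node; all memory goes back when the pool is destroyed.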
We already have some things like this that we use in clang; check out
llvm/support/allocators.h. It doesn't use the standard STL allocator
interface, though; we need a simple adaptor.
When it comes to non-STL memory allocators, I'd say that many of the points
mentioned by Barry Kelly are very true, e.g. having allocators that allow
fast allocation and easy bulk de-allocation of all objects created during
the compilation of certain scopes (a block of statements, a function, etc.).
But in those cases, allocators are usually bound not to containers but to
object types. Usually this is done by defining class-specific new/delete
operators that use dedicated type-specific pools. I've used this approach
very successfully for compiler implementation, as I mentioned in the
original mail on llvm-commits.
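The type-bound approach described above can be sketched as follows; `StmtPool` and `Stmt` are illustrative names (not LLVM's or clang's), and the point is that `new` on the node type goes to a dedicated pool while `reset()` releases everything from one compiled scope at once:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-type pool: objects are carved out of fixed-size slabs
// and all released at once when the pool is reset, e.g. after finishing
// the compilation of one scope or function.
class StmtPool {
  std::vector<char *> Slabs;
  std::size_t Used = 0;
  static constexpr std::size_t SlabSize = 4096;

public:
  void *allocate(std::size_t Size) {
    Size = (Size + 15) & ~std::size_t(15); // keep 16-byte alignment
    if (Slabs.empty() || Used + Size > SlabSize) {
      Slabs.push_back(static_cast<char *>(::operator new(SlabSize)));
      Used = 0;
    }
    void *P = Slabs.back() + Used;
    Used += Size;
    return P;
  }
  void reset() { // bulk de-allocation: every node goes away at once
    for (char *S : Slabs)
      ::operator delete(S);
    Slabs.clear();
    Used = 0;
  }
  std::size_t slabCount() const { return Slabs.size(); }
};

// Illustrative AST node type: class-specific new/delete route all Stmt
// allocations into the dedicated pool.
struct Stmt {
  int Kind;
  static StmtPool Pool;
  void *operator new(std::size_t Size) { return Pool.allocate(Size); }
  void operator delete(void *) {} // real release happens in Pool.reset()
};
StmtPool Stmt::Pool;

bool scopeDemo() {
  for (int i = 0; i != 1000; ++i)
    (void)new Stmt{i}; // each node lands in the Stmt pool, not the heap
  bool Grew = Stmt::Pool.slabCount() >= 1;
  Stmt::Pool.reset(); // end of scope: free all 1000 nodes in one sweep
  return Grew;
}
```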
> - reasons for current inefficiency of many STL containers on Evan's/
> Chris configs and
The reason is pretty obvious when you know what to look for. On the
mac, std::allocator just calls malloc/free. On linux and
apparently everywhere else, std::allocator is implemented in terms
of a strange pool allocator. [IMO, the mac way of doing things is the
right way to go, but we can discuss philosophy elsewhere. Here we
will focus on the ramifications of this reality.]
The end result of this is that std::set/map on the mac ends up calling
malloc/free for each node in the red/black tree, and it allocates a
metric butt load of nodes (one per each item inserted). Further, each
query of the set incurs traversals of the tree. In the case of the
mac, because malloc/free is used, the allocations are spread all
throughout the heap and traversing a tree touches all sorts of random
pages of memory in very cache-unfriendly ways. I think the performance
delta of mac vs linux is primarily due to this locality issue, not the
raw performance of malloc on the two systems.
On linux (for example) the custom implementation of std::set happens
to get lucky, and generally gives better locality than mapping
directly to malloc/free, just because only a very small subset of the
objects on the heap go through std::allocator (f.e. none of the
SDNodes or LLVM IR objects go through std::allocator). This implicit
partitioning of the heap leads to accidental improved performance, but
the performance is still pretty bad if you do have multiple node-based
containers live at once (though that is rare in llvm). I did a whole
phd thesis about partitioning memory objects, so I know that
partitioning data structures into their own regions can be a big
win :). It just happens that std::allocator on linux gets lucky on
this for LLVM, since we generally have few STL containers live at once
(and when we beat on them, we beat on them heavily).
In principle, I buy your explanation. I just want to refer again to the figures from
my original mail:
So we can see that, performance-wise, the difference is not that huge.
But if we look at the number of new/delete calls, the picture is quite different:
1) without the STL standard allocator - a total of only 847(!!!) mallocs for
all of the allocators together, while adding 1000000 nodes for each of them;
2) with the STL standard allocator - a total of 2000628 mallocs for all of
the allocators together, while adding 1000000 nodes for each of them.
So, it looks like on Linux the allocator used by the STL still uses malloc/free rather extensively.
Even if a pool allocator is used by the STL by default, it does not save many malloc/free calls, as
you can see from the figures above. Or do you think that this number of malloc/free calls is produced
by the STL's pool allocator?
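Measurements like the ones quoted above can be reproduced by replacing the replaceable global operator new/delete with counting versions; this is a sketch of the technique, not the original benchmark (`NewCalls` and `countSetInserts` are our names):

```cpp
#include <cstdlib>
#include <new>
#include <set>

// Count every hit on the global operator new.
static unsigned long NewCalls = 0;

void *operator new(std::size_t Size) {
  ++NewCalls;
  if (void *P = std::malloc(Size))
    return P;
  throw std::bad_alloc();
}
void operator delete(void *P) noexcept { std::free(P); }

// On configurations where std::allocator maps straight to new/delete,
// a std::set performs roughly one allocation per inserted node; a pooling
// std::allocator would show far fewer.
unsigned long countSetInserts(int N) {
  unsigned long Before = NewCalls;
  std::set<int> S;
  for (int i = 0; i != N; ++i)
    S.insert(i);
  return NewCalls - Before;
}
```

Whether the count is ~N or much smaller directly reveals which std::allocator strategy a given standard library uses.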
> - possibility of using 3rd party libs like Boost (or their parts) in
We have already imported pieces of boost.
BTW, which ones?
The general rule is that it is ok to use specific pieces,
Totally agree with you.
but they have to be converted to the LLVM style and way of doing things.
Also, things with huge chains of dependencies sometimes have to be simplified.
One comment here:
While I agree with this approach for most libraries, I think that Boost is (almost) becoming
the second most important library for C++, almost as important as the STL. Therefore,
modifying/re-formatting/converting it is eventually not such a great idea,
since everyone using it assumes that its implementation and semantics are exactly the
same on each platform.