std::string

Are there any rules against using std::string or other parts of stl in llvm?

No. You can use any part of the stdlib.

There isn't much use of std::string in LLVM because it's simply not
needed. There's very little string manipulation, so StringRef is often
a much better choice. When there is a need for string manipulation,
the strings are generally very short, so SmallString is better.

- Michael Spencer

Although SmallString is actually pretty inefficient, since it keeps
the string data separate from the "vector" header. I believe libc++'s
std::string actually reuses the pointers in the "vector header" as the
storage for the "small" size, and so in that case std::string is
effectively a very memory efficient SmallString with storage of
roughly 3 pointers' worth of chars.

-- Sean Silva

Why do we actually have all the local versions of containers that already exist as a part of the C++ standard (such as SmallVector, DenseMap, et al.)?

-Krzysztof

See:
http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

-Chris

Were the "small n" characteristics the main motivation? Memory-wise, STL classes allow user-defined allocators, so use of things like memory pools should be relatively straightforward. Just wondering... :slight_smile:

-Krzysztof

See:
http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

Were the "small n" characteristics the main motivation?

It is one of the motivations.

Memory-wise, STL classes allow user-defined allocators, so use of things like memory pools should be relatively straightforward. Just wondering... :slight_smile:

Yes, you get some of the benefits that way, but not all of it.

-Chris

Embarrassed to admit I have not read that manual yet have done a full port. :slight_smile:

Just did things from seeing how other people did them.

Never too late to learn though. Time to peruse through the main docs page again and of course to read the Programmers Manual!

Reed

Short answer: ours are better.

Longer answer: ours are better for particular situations.

(you'll notice, for example, that DenseMap has different iterator
invalidation semantics & requires a tombstone value - by paying that
cost the container can be a bit faster and/or more memory efficient)

For details: http://llvm.org/docs/ProgrammersManual.html

What were the others?

The reason I ask is that STL comes all ready, with containers and algorithms. They may not be optimal for every task, but they do their job and they are part of the standard. There may be some price to pay in terms of performance/memory usage/etc. for a specific application, but overall it may be worth it. Evidently, in case of LLVM, someone (you?) decided that having local set of containers is a better idea. I simply want to understand the reasons behind this decision.

I quickly looked over the library section on containers in the C++03 standard and I didn't see any paragraphs regarding the allocation strategy for classes like "set" or "map". The LLVM page seems to contain information that was based on some specific implementation (or several implementations), but was not mandated by the standard itself.

-Krzysztof

The containers in the STL are general purpose classes that are fast for a wide variety of different elements but don't excel on any special case. The LLVM classes are much more specialized.

For example DenseMap is a hash table that embeds both keys and values into a big table of memory. This gives nice cache behavior for simple pointer-to-pointer mappings which are common throughout LLVM while std::map wastes memory and has worse cache behavior. For big keys DenseMap wastes a ton of memory and becomes slow, std::map is better for those cases.

Another example: clang's lexing speed is directly bound by StringMap, a specialized hash table for string keys (std::map<std::string, …> is extremely slow in comparison). A fast lexer is a big deal when you want to embed clang into an IDE and still get responsive autocompletion.

Most of the containers in LLVM were motivated by cases where the standard containers were just too slow and needed a replacement. Some of them were closely related to a bad implementation decision in a particular version of the standard library, e.g. libstdc++'s std::string doesn't implement the small string optimization, making incremental building of strings slow, to make that use case faster SmallString was invented. Now that small std::strings are more commonly implemented we may want to retire it eventually.

- Ben

It is one of the motivations.

The reason I ask is that STL comes all ready, with containers and algorithms. They may not be optimal for every task, but they do their job and they are part of the standard. There may be some price to pay in terms of performance/memory usage/etc. for a specific application, but overall it may be worth it. Evidently, in case of LLVM, someone (you?) decided that having local set of containers is a better idea. I simply want to understand the reasons behind this decision.

I'm confused here. You're acting as though we don't use the STL. In fact, we do use std::string, std::vector, std::map etc when they are the right solution for the job. We are not avoiding the STL, we're just not using it when it isn't the best tool for a job.

LLVM's purpose isn't to use inappropriate C++ containers just because they are "standard". Our goal is to build the best compiler possible, and if that means we have to implement custom containers, so be it. I would really like for the C++ standard library to provide containers better than what LLVM does, but until it does, we are stuck with ours.

I quickly looked over the library section on containers in the C++03 standard and I didn't see any paragraphs regarding the allocation strategy for classes like "set" or "map". The LLVM page seems to contain information that was based on some specific implementation (or several implementations), but was not mandated by the standard itself.

We live in a pragmatic world, not a theoretical one.

-Chris

+1 to Chris's answers, and this is somewhat off topic for this list,
but to answer this paragraph: C++03 and C++11 don't explicitly specify
the allocation strategy for set or map (or unordered_map), but they do
specify when iterators and pointers can be invalidated, which only
allows one or a few allocation strategies.

Jeffrey

I'm trying to understand the reasoning behind the decisions made at the beginning of LLVM. My working assumption is that ADT didn't exist when LLVM started (whereas STL did). In such case, I'm assuming that creation of ADT was motivated by needs of LLVM that STL didn't meet. I'm trying to understand what the needs were and where STL was considered inadequate. Creating a new set of containers is an investment, so, again, I'm assuming that there were specific motives that caused that investment to be made. Benjamin's answer was actually very informative, that was that kind of information I was looking for.

-Krzysztof

It also provides a level of consistency. Different systems/compilers can use different STL implementations with different characteristics. The LLVM containers are a known quantity when trying to assess performance.

Also, there was no hashtable in STL before C++11 (DenseMap), and there
is still no StringRef and ArrayRef equivalent.

Dmitri

You can actually see what happened if you look back far enough in the
mailing list:

One common idiom was:

Someone saw that X and Y optimization passes were slow
They determined the slowness was in the containers
The containers turned out to be slow on all platforms
New containers were implemented
They made real world significant differences in speed on all platforms
The new containers were used.

The alternative would have been to try to get patches into multiple
STL's, and then tell everyone in the world to upgrade. IE "Oh, LLVM
is slow if you are on GCC 4.3.6, you need 4.3.7, or MSVC 2012, or XLC
11.9 or ...".

Every so often someone determines some optimization pass is still
using XX percent of time in a container, they test various containers,
discover that either it's faster or slower than other alternatives,
and LLVM adjusts accordingly.

Were the "small n" characteristics the main motivation? Memory-wise, STL classes allow user-defined allocators, so use of things like memory pools should be relatively straightforward. Just wondering... :slight_smile:

Just fyi:

http://home.roadrunner.com/~hinnant/stack_alloc.html

This may or may not be appropriate for llvm. But it is an example of a fully C++11 conforming way to make a SmallContainer using a custom allocator and any C++11 std::container, or any container that mimics the std container requirements.

One thing to keep in mind when designing SmallContainer (either from scratch or using an allocator) is that there is a tension between internal buffer size and C++11 move semantics. If the buffer is internal to the container (as is typical for std::string these days), the bigger the buffer the more expensive the container move members are. If the container never needs to be moved, this isn't a concern of course. But otherwise watch out for it. Typically the cost of a container move operation is proportional to sizeof(container). It was this very issue that drove the design of libc++'s std::string: make the internal buffer as large as possible without increasing sizeof(string).

I'm confused here. You're acting as though we don't use the STL. In fact, we do use std::string, std::vector, std::map etc when they are the right solution for the job.

I'm trying to understand the reasoning behind the decisions made at the beginning of LLVM. My working assumption is that ADT didn't exist when LLVM started (whereas STL did).

Right. LLVM started using the stl much more pervasively. When I (and others) started using an actual profiler on it (2006 or later?) ADT started coming into existence.

In such case, I'm assuming that creation of ADT was motivated by needs of LLVM that STL didn't meet. I'm trying to understand what the needs were and where STL was considered inadequate. Creating a new set of containers is an investment, so, again, I'm assuming that there were specific motives that caused that investment to be made. Benjamin's answer was actually very informative, that was that kind of information I was looking for.

As DannyB says, the best way to find this is svn history. Look at and near the commits that added SmallVector.h or whatever.

-Chris

We should not blinding worship STL. It might be general purpose but it can be a big performance drain in memory and/or runtime. I really like the StringRef class for example.