High Performance Containers

Hi,

Let me introduce myself: I work in High Performance Computing, mainly on number crunching, where the dominant languages are Fortran and C++. The field is moving more and more from pure number crunching, where Fortran shines, to a mix of numbers, text, and other data that Fortran does not handle easily. Unfortunately, the C++ standard library suffers from its age and from the fact that it was never designed with performance as its primary goal.

As a consequence, I have been developing an open-source library of containers and utilities that could be useful for HPC people. But the more I look at LLVM, the more I find that our problems are very close. Moreover, I find it disappointing that when Chandler Carruth gives a talk about LLVM containers, the audience cannot simply pick up a standalone library to use in their own code. It turns out that the world I live in is very close to the LLVM world:

  • No exceptions (painful with highly multithreaded applications, and when mixing languages)
  • Need an efficient array container with small size optimization
  • Need an efficient hash map and hash set (with open addressing)
  • Need an efficient string that plays smoothly with UTF-8 and its folklore (filenames being byte arrays on Linux, UCS-2/UTF-16 on Windows, etc.)
  • Need an easy-to-use formatting library (the way Python does with format)
  • Need an easy way to instrument the containers (such as collecting statistics on malloc sizes, on the number of copies vs moves, etc.)
  • Need an efficient way to “return errors” that must be checked and that carry rich type information (see the sketch after this list)
  • Using the same open-source license
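To make the error-returning point concrete, here is a minimal sketch of the kind of type I have in mind; the names (Error, StatusOr) are only illustrative, not the actual InsideLoop or LLVM API:

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Possible errors, carried as a plain enum rather than an exception.
enum class Error { Ok, BadFormat, FileNotFound };

// Holds either a value or an error; the caller has to check ok() before
// reading the value (the assert catches unchecked misuse in debug builds).
template <typename T>
class StatusOr {
 public:
  explicit StatusOr(T value) : error_{Error::Ok}, value_{std::move(value)} {}
  explicit StatusOr(Error error) : error_{error}, value_{} {}

  bool ok() const { return error_ == Error::Ok; }
  Error error() const { return error_; }
  const T& value() const {
    assert(ok() && "value() called on an error result");
    return value_;
  }

 private:
  Error error_;
  T value_;  // requires T to be default-constructible in this simple sketch
};

// Usage: no exceptions, and the error information is part of the type.
StatusOr<int> parse_int(const char* s) {
  if (s == nullptr) return StatusOr<int>{Error::BadFormat};
  return StatusOr<int>{std::atoi(s)};
}
```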

In the end, I believe that LLVM's performance problems are common to many people. My background is mainly in x86-64, ILP, vectorization, multithreading, and memory layout optimizations. I am sure that mixing different backgrounds can produce a great library that would be useful to many C++ developers. What I am looking for is developers with LLVM experience in core container design, to share ideas with in order to build such a library. Because an API can sometimes kill performance, it is very important to share experience when designing it.

The project is still young and available here : https://github.com/insideloop/InsideLoop

Let me know if you think that such a library could be useful for you and if you would like to contribute. And, if it is a success, in a few years, why not use it in some parts of LLVM…

François Fayard

Hi Francois,

Have you looked at the ADT library in LLVM, and have you considered contributing to LLVM directly (and improving the available data structures / algorithms in the codebase)?

I understand that might not meet the goal of something that is released and supported by the LLVM project (i.e. a standalone containers/adapters library), but I suspect it could become something that developers working on LLVM passes and/or the compilers can use.

Good luck with the project, BTW. :-)

Cheers

Hi Dean. Thanks for your reply.

The ADT library is exactly what I have ended up replicating. My library started 2 years ago, and at the beginning what I needed was very different from LLVM. My first containers were:

  • A custom std::vector that does not initialize its elements to 0 for int, double, etc. This is very important in HPC for the first-touch policy used on NUMA architectures. It also allows alignment to the vector width, which can be important for vectorization performance (see the sketch after this list).
  • A small-size-optimized vector
  • Multidimensional arrays
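As a sketch of the first item (my own naming, not the actual InsideLoop API, and using C++17 aligned operator new for brevity): the idea is simply aligned storage with no zero-fill, so that the first touch of each memory page happens in the user's own, possibly parallel, initialization loop:

```cpp
#include <cstddef>
#include <new>

// Array of trivial elements: aligned allocation, no zero-fill, so pages are
// first touched by whichever thread fills them (NUMA first-touch placement).
// The 32-byte default is an example suitable for AVX loads/stores.
template <typename T, std::size_t alignment = 32>
class UninitializedArray {
 public:
  explicit UninitializedArray(std::size_t n)
      : data_{static_cast<T*>(::operator new(n * sizeof(T),
                                             std::align_val_t{alignment}))},
        size_{n} {}
  ~UninitializedArray() {
    ::operator delete(data_, std::align_val_t{alignment});
  }
  UninitializedArray(const UninitializedArray&) = delete;
  UninitializedArray& operator=(const UninitializedArray&) = delete;

  T& operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};

// The first touch in the parallel loop decides on which NUMA node each page lives:
//   UninitializedArray<double> a(n);
//   #pragma omp parallel for
//   for (std::size_t i = 0; i < a.size(); ++i) a[i] = 0.0;
```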

It was only later, when I came across Chandler Carruth’s talks, that I discovered I was not the only one having issues with the STL. My hash map is almost the same as the LLVM one. I also have a hash set, a view on a string, and a Unicode-friendly string which keeps UTF-8 as an invariant and which is implemented the same way std::string is implemented in libc++.

Where I might bring some help is with the probing method in the map and with the default hashing functions. There are strategies such as Robin Hood hashing that might be worth trying. I also know that the hashing strategy for integers in LLVM is suboptimal, but I am not sure fixing it would help much, as I don’t think LLVM hashes integers very often.
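For context, here is the general shape of an open-addressing probe loop (a sketch of the technique itself, not the exact LLVM or InsideLoop code); Robin Hood hashing changes what happens on insertion, but the lookup loop looks much the same:

```cpp
#include <cstdint>
#include <vector>

struct Slot {
  std::uint64_t key;
  bool occupied;
};

// Returns the index of `key`, or -1 if absent. The capacity is a power of
// two, so the modulo is a cheap bit mask, and the triangular offsets
// 1, 3, 6, 10, ... visit every slot exactly once ("quadratic" probing).
std::int64_t find_slot(const std::vector<Slot>& table, std::uint64_t key,
                       std::uint64_t hash) {
  const std::uint64_t mask = table.size() - 1;
  std::uint64_t i = hash & mask;
  std::uint64_t step = 0;
  while (true) {
    if (!table[i].occupied) return -1;  // hit an empty slot: key is absent
    if (table[i].key == key) return static_cast<std::int64_t>(i);
    ++step;
    i = (i + step) & mask;
  }
}
```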

François Fayard

Just a friendly comment: opening remarks like that have a high chance of annoying people into ignoring the rest of the post. Claiming that the STL's design doesn't care about performance is not only ignorant, it is blatantly false. One of the major contributions of the STL from the very beginning is that it requires algorithmic complexity guarantees for many important operations. The specific implementation choices might not suit your specific environment, but that doesn't mean they haven't been made carefully. That's exactly where many LLVM ADT entries came from: a specific problem with a big enough impact on the overall design and/or runtime that it was worth optimizing for those constraints.

Joerg

This is a very common occurrence. The problems are similar, the implementations slightly different, and adaptation becomes really hard, so everyone develops their own.

I don't think many people will consider replacing the ADT (et al.) with some external library, no matter how good it is. Not based on the merits of your library, but on how little there is to gain from replacing a well-known and fine-tuned library with an unknown one that solves pretty much the same problems and can't really be that much faster. Plus, we have very little use for BLAS-like functionality, which is the bulk of most HPC libraries.

You can imagine a community the size of LLVM's having to rely on
external libraries that may have the same goals today, but different
paths tomorrow. The end result is the same: a fork for the things that
matter to us, just like we've done with the standard library.

All in all, your library looks really nice, and it does solve an
intersection of the problems we also solve, probably in very similar
ways, but that doesn't mean source code can be shared at that level.
There are other factors at play which have nothing to do with solving
problems.

cheers,
--renato

Thanks for your friendly comment. I agree that I was a bit harsh with the STL. To be more specific, all of the following can be performance issues in the STL, and I thought this was a given in the LLVM community:

  • There is no small-array optimization in the STL.
  • The use of unsigned integers (hello, std::size_t) prevents many optimizations; vectorization is one of them.
  • std::unordered_map cannot be implemented with open addressing, and the same goes for std::unordered_set.
  • Some APIs seem designed to shoot yourself in the foot performance-wise, such as + for concatenating strings, whereas an API such as concat(s0, s1, …, sn) never creates temporaries.
  • Default initialization of elements in containers such as std::vector makes it impossible to tune for NUMA architectures, unless you really want to use custom allocators, which will give you more pain than solutions.
  • Mathematical functions such as std::pow are a joke. I am sure it does not affect STL people, but do you realize that the C++11 standard obliges implementers to treat std::pow(x, 3), with x a float, with the following conversions: first convert x to double, then compute its cube in double precision, then convert the result back to float if that is what you wanted? With C++03, the problem was not there (see the snippet below). ( http://en.cppreference.com/w/cpp/numeric/math/pow )
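As a concrete illustration of the std::pow point (my own snippet, not from the original post):

```cpp
#include <cmath>
#include <type_traits>

// In C++11, std::pow with an integer exponent goes through the "additional
// overloads": the int is converted to double and the result type is double,
// so the float argument is promoted, the cube is computed in double
// precision, and the result has to be converted back if a float was wanted.
static_assert(std::is_same<decltype(std::pow(2.0f, 3)), double>::value,
              "std::pow(float, int) yields double in C++11");

// Keeping the computation in single precision has to be asked for explicitly:
inline float cube(float x) { return x * x * x; }  // or std::pow(x, 3.0f)
```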

The LLVM team seems to have done a great job reimplementing all those containers in a more efficient way, with optimizations that are not at all specific to LLVM. I am sure all the great ideas you had could be useful to many people in other fields.

François Fayard

You can imagine a community the size of LLVM’s having to rely on
external libraries that may have the same goals today, but different
paths tomorrow. The end result is the same: a fork for the things that
matter to us, just like we’ve done with the standard library.

All in all, your library looks really nice, and it does solve an
intersection of the problems we also solve, probably in very similar
ways, but that doesn’t mean source code can be shared at that level.
There are other factors at play which have nothing to do with solving
problems.

I agree with you. And I never thought that sharing the same source code was something that could be done.

But sharing ideas, implementation tricks, and algorithms (I am thinking about all the probing methods available for open addressing) might be helpful. That’s where we can share ideas. Scientific computing is more and more about computing on strings (think DNA) and on data structures such as graphs, which are not as simple as in the good old days when BLAS workloads were the only ones.

To be honest, I’d like to get the best of ADT and understand the rationale behind some of its choices. In return, I can spend some time testing new ideas and try to suggest ideas that could also be implemented in LLVM. For me, what is time consuming is not coding, but designing APIs that are easy to use, difficult to misuse, and don’t lock out performance. Getting all 3 of them right (or as close as you can) is really hard work.
Just to give you an idea, here is how my hash map can be used for memoization.
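(The snippet itself is not reproduced here, so below is a self-contained toy version of the pattern, with illustrative names rather than the actual InsideLoop API: search() exposes the slot it probed, so a miss can be turned into an insertion without hashing and probing the key a second time.)

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kEmpty = ~std::uint64_t{0};  // sentinel: this key cannot be stored

// Toy open-addressing map: fixed capacity, linear probing, no growth.
class MemoMap {
 public:
  explicit MemoMap(std::size_t log2_capacity)
      : keys_(std::size_t{1} << log2_capacity, kEmpty),
        values_(std::size_t{1} << log2_capacity, 0) {}

  // Returns the slot of `key` if present, or -(candidate_slot) - 1 if absent,
  // so the caller can insert at that slot without probing again.
  std::int64_t search(std::uint64_t key) const {
    const std::uint64_t mask = keys_.size() - 1;
    std::uint64_t i = hash(key) & mask;
    while (true) {  // assumes the table never fills up (no growth in this sketch)
      if (keys_[i] == kEmpty) return -static_cast<std::int64_t>(i) - 1;
      if (keys_[i] == key) return static_cast<std::int64_t>(i);
      i = (i + 1) & mask;  // linear probing
    }
  }
  static bool found(std::int64_t slot) { return slot >= 0; }
  std::uint64_t value(std::int64_t slot) const {
    return values_[static_cast<std::size_t>(slot)];
  }
  void insert(std::uint64_t key, std::uint64_t value, std::int64_t slot) {
    const std::size_t i = static_cast<std::size_t>(-(slot + 1));
    keys_[i] = key;
    values_[i] = value;
  }

 private:
  // Simple multiplicative hash, good enough for the sketch.
  static std::uint64_t hash(std::uint64_t k) { return k * 0x9E3779B97F4A7C15ull; }
  std::vector<std::uint64_t> keys_;
  std::vector<std::uint64_t> values_;
};

std::uint64_t expensive(std::uint64_t x) { return x * x; }  // stand-in for a costly function

// Memoization: on a miss, the slot returned by search() is reused by insert(),
// so the key is hashed and probed only once per call.
std::uint64_t memoized(std::uint64_t x, MemoMap& cache) {
  const std::int64_t slot = cache.search(x);
  if (MemoMap::found(slot)) return cache.value(slot);
  const std::uint64_t result = expensive(x);
  cache.insert(x, result, slot);
  return result;
}
```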

Speaking of benchmarks, we might be able to use the library, or some parts of it, in our test suite for correctness and performance testing. I see some stand-alone benchmarks that seem useful (e.g., ) but also some of the numeric things (e.g. which makes use of SIMD pragmas and vector intrinsics). Something with the hash tables and multi-dimensional arrays could be useful too. If you’re interested (or anyone else is), I’m happy to chat more about the details. -Hal

That is certainly an interesting proposition. I think we can have it
on both testing and benchmarking sets, but with different workloads,
maybe?

--renato

By the way, do you have any solution for an automatic performance regression checker?

I don’t have one yet.

François Fayard

We are working on one, but it’s far too green to announce it to a
wider audience:
https://git.linaro.org/people/masaki.arai/hcqc.git/

Will check it out.

Also, I have just seen that some people want to develop a kind of “performance sanitizer”. Is there anything usable yet?

François Fayard

LLVM's answer to this is setting up CI jobs that submit results to an LNT server (http://llvm.org/docs/lnt/). It should also work for non-LLVM projects.
Though today you'd better think of it as “performance tracking” rather than “automatic regression checking”.

- Matthias