Should we split LLVM Support and ADT?

Rewriting StringRef / ArrayRef etc. to use the exact same API is a good idea long term, but there are a lot of messy details that need to be dealt with. There are thousands of uses of take_front / drop_front, etc. that have to be converted. Then there are some methods that aren’t in string_view at all, like consume_integer(), consume_front(), etc., that would have to be raised up to global functions in StringExtras. All of this can certainly be done, but it’s going to take a ton of churn and hours to get it all STL-ified.
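To make the "raise them up to global functions" idea concrete, here is a rough sketch of what free-function versions might look like on top of std::string_view. The `sketch` namespace, the function names, and the true-on-success return convention are my own assumptions for illustration, not LLVM's actual StringExtras API, and the code assumes a C++17 toolchain (string_view and std::from_chars).

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

namespace sketch {  // hypothetical namespace, not part of LLVM

// First n characters of s; substr clamps the count to s.size().
inline std::string_view take_front(std::string_view s, std::size_t n) {
  return s.substr(0, n);
}

// Everything after the first n characters; empty if n >= s.size().
inline std::string_view drop_front(std::string_view s, std::size_t n) {
  return s.substr(n < s.size() ? n : s.size());
}

// If s starts with prefix, strip it and return true; otherwise leave s alone.
inline bool consume_front(std::string_view &s, std::string_view prefix) {
  if (s.substr(0, prefix.size()) != prefix)
    return false;
  s.remove_prefix(prefix.size());
  return true;
}

// Parse a leading integer from s. On success, store it in out, strip the
// consumed characters from s, and return true (a choice made for this sketch).
template <typename Int>
bool consume_integer(std::string_view &s, Int &out, int base = 10) {
  Int value{};
  const char *first = s.data();
  const char *last = first + s.size();
  auto [ptr, ec] = std::from_chars(first, last, value, base);
  if (ec != std::errc{})
    return false;
  out = value;
  s.remove_prefix(static_cast<std::size_t>(ptr - first));
  return true;
}

}  // namespace sketch
```

A call site that today would use a member function on StringRef would become something like `sketch::consume_front(S, "0x")` followed by `sketch::consume_integer(S, Value)`, which is the kind of mechanical rewrite that would have to happen at thousands of places.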

Do you consider this a blocker for doing such a split? Would it make sense to do it in stages, where we first just move StringRef et al. wholesale, and then incrementally work to STL-ify the interface?

Sure, I guess that splitting the arrayref/stringref headers out is a fine first step.

-Chris

You mentioned that a good line to draw is one where we’re adding things that are known to be coming in a future C++ standard. How strictly do we want to enforce this? There are lots of things that have equally broad utility, but aren’t necessarily known to be added to C++ in the future.

For example, all of MathExtras and StringExtras, many member functions of StringRef that are not in string_view, etc. Can we still have these in the top-level compatibility library?

We could still aim for interfaces that match the STL 1-to-1, but it would be nice if we could have some equally low-level extras to enhance these classes.

Having watched a similar library go through this exact evolution, I really doubt we want to make any split around “things known to be in C++ in the future”… It turns out that this is nearly impossible to predict and precludes a tremendous amount of useful utilities.

For example, there is no indication that the range helpers LLVM provides will ever end up in C++'s standard library, but they certainly seem useful for the demangler (the concrete use case cited).

What is the concrete problem with just linking the support library, in all its glory, into the demangler? Why shouldn’t we do that? I feel like that has gotten lost (for me)…

I think it's for reuse in low-level libraries (e.g. libcxxabi).

-- Sean Silva

Ok, but the challenges there seem substantially larger:

  1. We have to fix the licensing thing (we’re working on that, but it’s not going to be instantaneous).

  2. We would have to somehow “sandbox” every symbol linked into this library so that when libcxxabi itself is linked with some slightly different version of LLVM than it was built with, things don’t explode.

If we solve #1 and #2 at all, running the appropriate build steps to strip any unused part of the big Support library out to minimize the runtime library cost seems pretty easy. And then we can use the entire Support library, no need to split.

So I guess I’m wondering what problem the splitting is solving?

Is there actually a valid use case for using the entire Support library though?

One thing that splitting solves is that I can have StringRef and ArrayRef split up and committed by tomorrow. The same can’t be said for the entire Support library :)

Huh?

I’m asking what is the (remaining) use case for any split. Unsplit is the status quo, so I don’t see how the ease of doing a split is a use case… But maybe I’m misunderstanding something…

What I mean is, why would a non-LLVM project want or need everything in Support? There is a ton of compiler-specific stuff in there. There’s also file I/O stuff, string formatting, threading support, floating point support, even target-specific stuff. Who needs this other than LLVM? And if the answer is “nobody” (which I suspect it is), then why should non-LLVM projects import it? “Because it’s the status quo” doesn’t seem like a good reason.

What “non-LLVM” project should be using this library at all though? I don’t know about others, but this is a non-goal for me. If we want to use a generic and re-usable library, I would start with one of the many existing ones rather than inventing our own, and that has its own host of problems.

But also, “file i/o stuff, string formatting, threading support, floating point support” are all part of the standard library too, and folks seem OK with that. So I’m trying to understand the specific split you want to make and what motivates it. I don’t think the fact that we can make a split is actually sufficient justification for making a split.

I continue to think the most effective thing to do is to not think about this in terms of “splitting” and instead look for specific, well-defined components that could be extracted and layered above support.

I was pretty happy with extracting a BinaryFormat library for example. Here, because we gained a clear role and scope for the library, improved organization was a compelling motivation. But I don’t see any arbitrary split as particularly better or less well organized, and based on the previous arbitrary split (ADT vs. Support) I suspect we will continually put new things into the wrong location or want to move things across the boundary.

So I kind of come back again to my original question: what problem are you trying to solve? I can guess at a few things, but probably will be wrong… my guesses:

  1. Better organization of code & libraries
  2. A technical incompatibility like removing global constructors
  3. Reducing the size of tools using the library

For #1: I continue to think finding clearly defined components to extract is the best approach here. I suspect we’ll be left with essentially a replacement for facilities that one might expect to find in a standard library, and I suspect that’s about the best we can do.

For #2: This would be awesome. I would probably approach it by trying to extract libraries that actually need global constructors into separate libraries that document this requirement. My suspicion is that these are very few and far between.

For #3: I don’t understand why this matters. Dynamic linking shouldn’t care, and static linking should drop the unused code. If this isn’t working for some reason, I’m totally down with solving it, but the first step is probably to understand the specific issue being hit.

Anyways, mostly guessing at this to make sure we make progress. Don’t want to just be obstructionist here, I’m just genuinely trying to find the best path forward.

-Chandler

The dependencies for StringRef don’t seem particularly clean. StringRef.cpp includes APInt.h and APFloat.h. APInt.cpp includes FoldingSet.h and SmallString.h.

I don’t think it’s a good idea to try to build the uber-portable, minimal, standard library. We already have a few of those. They are hard to modify and improve because they have too many customers with too many requirements. The sanitizer runtime libraries are complicated enough that they could also definitely benefit from a minimal LLVM support layer, but I would hate to have to maintain such a library because they have such restrictive rules against having library dependencies.

The important part of StringRef is the idea, not the code. Eventually, we’ll have std::string_view, and the idea will be everywhere.

In my opinion, this should be very strictly enforced. We need a black and white test for “what goes where”. The “already accepted into a future standard” metric satisfies that.

I’d even go so far as to say that stuff in StringRef that isn’t in string_view (e.g. the atoi stuff) should be split out (e.g. to StringExtras) before the move.

-Chris

I don’t think it’s a good idea to try to build the uber-portable, minimal, standard library. We already have a few of those. They are hard to modify and improve because they have too many customers with too many requirements. The sanitizer runtime libraries are complicated enough that they could also definitely benefit from a minimal LLVM support layer, but I would hate to have to maintain such a library because they have such restrictive rules against having library dependencies.

The important part of StringRef is the idea, not the code. Eventually, we’ll have std::string_view, and the idea will be everywhere.

+1

My understanding is that Zachary is trying to solve a practical problem, not a theoretical “generic library” one. He wants to have specific functionality available to the demangler. If LLVM required C++17, then we wouldn’t be having this particular discussion.

-Chris