Removing deprecated <ext/hash_set>, <ext/hash_map> and <ext/__hash>

Hi,

Libc++ has been shipping the <ext/hash_set>, <ext/hash_map> and <ext/__hash> headers for a while and they are deprecated. Those headers contain data structures like __gnu_cxx::hash_map that have replacements like std::unordered_map. I would like to remove those headers. I’ve put up a patch for review but I won’t commit it until we have a sort of plan because I know some people have expressed feelings about removing the headers in the past: https://reviews.llvm.org/D57688.

FWIW, I’ve compiled a large code base with that patch and I didn’t get any trouble. I suspect the amount of breakage this will cause is manageable especially if we give advance notice, but others might disagree.

Is anybody opposed to removing those headers? If you’re opposed to the removal, please explain why and what removal timeline/plan would work for you.

Thanks!
Louis

Well, it would be interesting to do a FreeBSD ports exp-run with these headers removed. You’d be surprised how much software is out there using obsolete, experimental and/or not even officially existing headers. :slight_smile:

-Dimitry

Well, it would be interesting to do a FreeBSD ports exp-run with these headers removed. You’d be surprised how much software is out there using obsolete, experimental and/or not even officially existing headers. :slight_smile:

Please do so and report on the results. One solution might be to ship those headers outside of libc++ for projects that still need them.

Louis

You haven't said anything about *why* they should be removed. Seriously,
removal just creates a maintainance burden for little to no reason. As
with many other such proposals: the burden is not on your shoulder, but
on maintainers of build systems like pkgsrc, ports or the like. It's
nice that things get deprecated. But changing existing software is a lot
of effort, esp. when it is no longer actively maintained.

Joerg

FWIW, I’m pretty sure we still have plenty of code using them. We’ve not done any analysis to see what timeframe that code could be be updated on – our focus has been on adopting libc++ (and easing that path), not removing the uses of weird things that it also supports. I’ll point out that I’d rather focus our energy on adopting libc++ than even doing this analysis. ;]

I somewhat agree with Joerg that it would be good to understand the motivation. Much like a bunch of our other compatibility things (GCC flags, language extensions, etc.), having these headers helps encourage / ease adoption which seems a generally good thing for libc++. I can imagine that there is some large cost to keeping these around that would motivate removing them, but I don’t see anything about that in the above?

-Chandler

FWIW, I’m pretty sure we still have plenty of code using them. We’ve not done any analysis to see what timeframe that code could be be updated on – our focus has been on adopting libc++ (and easing that path), not removing the uses of weird things that it also supports. I’ll point out that I’d rather focus our energy on adopting libc++ than even doing this analysis. ;]

I somewhat agree with Joerg that it would be good to understand the motivation. Much like a bunch of our other compatibility things (GCC flags, language extensions, etc.), having these headers helps encourage / ease adoption which seems a generally good thing for libc++. I can imagine that there is some large cost to keeping these around that would motivate removing them, but I don’t see anything about that in the above?

To be clear: there is not a large cost in keeping those headers around. I don’t think that’s the question, since the same could be said of almost any removal of deprecated API. The cost of keeping code around is usually not that large if you decide not to maintain it anymore. But that’s called code rot, and it’s generally not a good idea to accumulate too much of it. hash_map probably has bugs that we haven’t and won’t fix, etc.

Libc++ implements a Standard. __gnu_cxx::hash_map is not part of (any version of) that Standard, and so it does not belong in libc++. When I remove or rename some internal function inside libc++ that uses reserved identifiers, I don’t bother asking on this list. It doesn’t mean that I purposefully try to break users (quite the opposite), but I know where the line is drawn when/if users break because they use implementation details. This is also very similar to how we deprecate AND remove TSes one year after they are merged into the Standard. The amount of breakage caused by removing <experimental/optional> was non-trivial in our case, but we dealt with it and now our code is better.

What I’m trying to do here is understand whether and why __gnu_cxx::hash_map is special in that respect. Why was it put there in the first place? Why is it so hard to get rid of? If somebody can explain why it should be kept around, I’m happy to talk about it (this is why we have this thread). However, the argument of “we’re too busy/lazy to update our code” doesn’t sound like a very strong one to me, especially coming from someone that has excellent tools to deal with this kind of problem. So there must be something else underneath, and that’s what I would love to understand.

Louis

FWIW – My response (for several years) to any reported problems in __gnu_cxx::hash_map/set has been “use the ones in std::, then”

– Marshall

P.S. The implementation in __gnu_cxx is NOT a faithful implementation of what libstdc++ has; it is a wrapper over std::unordered_map/set.

FWIW, I’m pretty sure we still have plenty of code using them. We’ve not done any analysis to see what timeframe that code could be be updated on – our focus has been on adopting libc++ (and easing that path), not removing the uses of weird things that it also supports. I’ll point out that I’d rather focus our energy on adopting libc++ than even doing this analysis. ;]

I somewhat agree with Joerg that it would be good to understand the motivation. Much like a bunch of our other compatibility things (GCC flags, language extensions, etc.), having these headers helps encourage / ease adoption which seems a generally good thing for libc++. I can imagine that there is some large cost to keeping these around that would motivate removing them, but I don’t see anything about that in the above?

To be clear: there is not a large cost in keeping those headers around. I don’t think that’s the question, since the same could be said of almost any removal of deprecated API. The cost of keeping code around is usually not that large if you decide not to maintain it anymore. But that’s called code rot, and it’s generally not a good idea to accumulate too much of it. hash_map probably has bugs that we haven’t and won’t fix, etc.

Libc++ implements a Standard. __gnu_cxx::hash_map is not part of (any version of) that Standard, and so it does not belong in libc++. When I remove or rename some internal function inside libc++ that uses reserved identifiers, I don’t bother asking on this list. It doesn’t mean that I purposefully try to break users (quite the opposite), but I know where the line is drawn when/if users break because they use implementation details. This is also very similar to how we deprecate AND remove TSes one year after they are merged into the Standard. The amount of breakage caused by removing <experimental/optional> was non-trivial in our case, but we dealt with it and now our code is better.

FWIW, I think extensions for compatibility are interestingly different from TSes. Clang also implements standards but supports a wide range of extensions for compatibility with both GCC and MSVC. Unlike TSes and other things with a clear path forward, many of these extensions don’t have that clear path forward. I’m not trying to say __gnu_cxx::hash_map is necessarily the same, just that I think there is likely to be different needs for compatibility extensions than for “early versions” of standards-tracked things.

And of course, any such way of handling needs to have a reasonable benefit given the cost, and have a reasonable plan in place for maintaining it and supporting it without holding back the larger project.

What I’m trying to do here is understand whether and why __gnu_cxx::hash_map is special in that respect. Why was it put there in the first place? Why is it so hard to get rid of?

Sure, I’m happy to provide some of the historical context (I think I was one of the people arguing it should be added).

When we first tried even minor adoption of libc++ we couldn’t make it far in even small scale experiments because so much of our code relied on __gnu_cxx::hash_map. This was a lot of years ago, and we had no realistic way to migrate things, much less to do so when we hadn’t even conducted the experiment to see what moving to libc++ would look like.

At the time, I argued that this would likely be reasonably common for code building against libstdc++ for many years, much like reliance on GNU extensions to C and C++ is widespread. To make adoption effective, some of these need to be supported even if we would prefer people converge on the standard facilities. Things like __thread come to mind.

Some of this code isn’t even being updated to C++11 actively. While it builds with C++11, this is largely because C++11 didn’t break anything it relied on. I think it is important that Clang and libc++ both support such code to some extent in order to maximize their adoption in the larger Linux ecosystem (and likely BSD, but I know less there). That support doesn’t have to be perfect of course, and the engineering costs should be considered for anything like this. But even the imperfect _gnu_cxx::hash* stuff libc++ provides has empirically had value for at least a few code bases trying to adopt libc++. So I don’t think the tradeoff is trivial here.

If somebody can explain why it should be kept around, I’m happy to talk about it (this is why we have this thread).

I think the argument when it was added still holds: there is likely to be a great deal of code written against this because for many years it was the only viable option, and it has been stable and reliably available in libstdc++. I think making it easy for existing codebases to move to libc++ is good for the project, the same way that I think making it easy to move to Clang is good for that project.

I can make a somewhat stronger argument that Google specifically would need this to remain to continue working on adoption. We will likely be able to eventually move off of this, but our priority at the moment is getting on to libc++, and there are already enough barriers there that I would very much like to avoid adding another component to that.

However, the argument of “we’re too busy/lazy to update our code” doesn’t sound like a very strong one to me, especially coming from someone that has excellent tools to deal with this kind of problem.

Let’s avoid terms like “lazy”. =]

I don’t think what I said was that we were busy? Sorry if it came across that way. I’ve tried to expand on it a bit above.

But my second point was not about specific needs of existing users. Instead, it was about users we don’t yet have. I think there may be significant value to them of supporting some extensions like this, and hopefully above I’ve given more useful details about this value. Let me know if this is the kind of information you’re looking for.

See my longer response to Louis for details, but FWIW, even this level of “support” has been materially useful for our codebase. ::shrug::

It was put in because it used to be very popular. Many programs later
moved to TR1 and then to C++11, but for a time there was no alternative
in STL. It is hard to get rid of it, because quite a bit code is still
around without active maintainer. It still serves a purpose, it more or
less works as is, so artifically breaking it just creates work for
little reason. Keep in mind that code like that is not *my* code. Please
don't pull the OpenSSL approach here where the API is randomly changed
because it can be changed and the patching of hundreds of existing
projects is left for others to figure out.

Joerg

[…]

Thanks for the reply, Chandler. The idea that hash_map can ease adoption of libc++ for new users makes sense, and I do agree it provides value to libc++. I also think we must be careful not to try to mimic too many bad things that libstdc++ does just for the purpose of easing adoption, since that will be bad for libc++ in a different way.

What I’m trying to do here is understand whether and why __gnu_cxx::hash_map is special in that respect. Why was it put there in the first place? Why is it so hard to get rid of?

Sure, I’m happy to provide some of the historical context (I think I was one of the people arguing it should be added).
[…]

Thanks a lot for this, I wasn’t aware of the context.

I’d be very interested to see how much breakage this causes in something like FreeBSD just so we have a more concrete idea of what software this breaks. Perhaps the breakage will be smaller than the last time this was attempted? Otherwise, we’re operating on assumptions. Dmitry, do you think you could do the exp-run you were talking about? I’d be very interested to see the results.

Louis