on libc++ extensions and slist

In existing C++03 standard libraries there are a ton of extensions. Some good, some not so good. Some of the better ones have morphed into official C++11 libraries. However, I can't think of a single example where an extension has been standardized unchanged from its extension form. There is good reason for this.

I'm going to pick on slist. But my argument is meant to be broader than slist.

slist was first produced by SGI. I'm not sure how long ago, but I'm betting circa 1996. I'm also betting (not sure) that Matt Austern was the author (I have the utmost respect for Matt Austern). At that time, C++ templates were pretty new, and the industry was not very experienced in their use. For whatever reasons, the committee standardized a doubly linked list, but not the singly linked list (slist) in 1998.

At that time (1998) there was a debate over whether std::list::size() should be O(1) or O(N). There are good arguments to be made on both sides of the debate (it involves list::splice). And because the debate became so divisive, the committee drew a compromise:

   container::size() /should/ have constant complexity

(yes, this really does have something to do with slist, please be patient)

I should explain a little about committee-speak. "Should" doesn't mean "should". "Should" means "maybe, maybe not". I'm not kidding. :frowning:

In practice, the only container that ever did have an O(N) size() was list, and even then, only for some implementations (including SGI and gcc). And even though all well-read C++ programmers know not to use list::size() unless they are willing to pay O(N), I've seen *a lot* of code that uses list::size() carelessly. I've even seen it in places you wouldn't expect, such as boost.

    for (size_t i = 0; i < c.size(); ++i) // quadratic loop, not linear!
    {
        // ...
    }

Somewhere around 2001-2002 Matt and I were having a friendly debate about this issue. I said: list shouldn't even have size(). Having it compile to an O(N) operation is almost always a run time performance bug. And to Matt's credit, he immediately got it: A compile time error is infinitely better than a run time performance bug. And when Matt proposed slist (renamed to forward_list) to the C++ committee, it was missing a size() member.

Casual clients of std::forward_list, when they write:

    for (size_t i = 0; i < c.size(); ++i)
    {
        // ...
    }

will rightly get a compile time error (as both Matt and I agree they should).

Casual clients of ext::slist, when they write:

    for (size_t i = 0; i < c.size(); ++i)
    {
        // ...
    }

get a silent run time performance bug.

Hello,

[...]

Possible solutions to ease migration and not lead our customers into horrible mistakes we already know about:

1. We could create an ext::slist that lacks size(). That way the customer would not have to change their code unless they used size().

Disadvantage: We haven't delivered a seamless drop in replacement lib. They still have to change their code. Why not also change the name of the container to std::forward_list at the same time? Otherwise we have to support slist. And not just to ease migration. Once we introduce it, we have to support it for forever for backward compatibility with our own lib.

At least until a new version of libc++ comes along. As far as I see, then backward compatibility can be broken.
Of course, this just postpones the migration efforts discussed below ...

2. Status quo: We could do nothing.

Disadvantage: I'm seeing rising complaints about migration efforts.

3. Educate: In a private email about a week ago M.E. O'Neill noted that we should provide documentation recommending migration strategies for things like slist -> forward_list. She suggested actually providing an <slist> header which did nothing but point to the documentation and then error out. I'm not sure I'd go as far as providing such a header. But I am really enthusiastic about a "migration document". Having something to point to which explains why using slist is like playing with a loaded gun. When people are migrating, they are open to such arguments if those arguments are both convincing and readily available. And if the migration document provides a way to experiment without committing yourself (i.e. make it easy to backpedal the migration):

I'm in favour of such a header. *Providing* documentation is only one part of the equation; the documentation also
has to be found. An #error in the slist header containing the URL of the migration document would simplify this.

Disadvantage: Doesn't provide seamless migration. And it is a fair amount of work to document.

But is seamless migration actually a reasonable expectation for someone changing from libX to libc++? After all,
slist has always been an extension, so despite the fact that many standard libraries implement it, one has to expect
to have to change code when moving to libc++.

A second point: People will most likely not migrate to libc++ just for fun. They will migrate to it e.g. because
they want to have a C++0x conforming standard library, and because the standard library they are currently using
does not provide this (e.g. because development on it has ceased, or because newer versions come with an imcompatible
license). And since C++0x has a single-linked list implementation, there is no need anymore to use slist (the same
goes for hash_map et.al).

Field experience: I'm responsible for deprecating auto_ptr.

One of the best changes in the C++0x standard library :slight_smile:

[...]

Jonathan

In existing C++03 standard libraries there are a ton of extensions. Some good, some not so good. Some of the better ones have morphed into official C++11 libraries. However, I can’t think of a single example where an extension has been standardized unchanged from its extension form. There is good reason for this.

I’m going to pick on slist. But my argument is meant to be broader than slist.

I want to clarify, I don’t disagree with any of your diagnosis. I agree completely with the flaws and problems present in slist.

When our customers are migrating to libc++, it is at a time of their choosing. They have set aside time to do the migration. And shown interest in doing so. A year after they migrate, when subtle problems surface, they are not going to be in the mood to be educated. They are going to be in the mood to be educated about the right way to do things when they are migrating, and not later when run time problems arise.

Keep in mind that some customers at least may not be able to update all of their code in one fell swoop. Forcing changes to the client code in too many places may make it impossible to migrate at all rather than merely adding to the cost of migration.

Also, in some contexts, there is no user education to be done. The code in question is legacy code, old and slated for deletion. No one maintains it, updates it, or cares for it; but it isn’t (quite) ready to be removed once and for all. These situations make the arguments about education only apply to a subset of the migration (although there they certainly make sense).

  1. We could create an ext::slist that lacks size(). That way the customer would not have to change their code unless they used size().

I agree with your disadvantages here. Reducing (slightly) the magnitude of those changes doesn’t seem as attractive. For a significant chunk of the code I’ve seen in this position, the cost of changing the code is dominated by a very high constant factor, not by the nature of the change itself. (Consider, an open source library that has to have changes pushed back upstream, and written in a particular style and which no one is actively maintaining any longer.)

  1. Status quo: We could do nothing.

Disadvantage: I’m seeing rising complaints about migration efforts.

There is another disadvantage: it may preclude migration at all. If there exists code that simply can’t be updated, you’re now stuck.

More realistically, it may drive the cost of migration sufficiently far up that it becomes a difficult or impossible to tackle problem.

  1. Educate: In a private email about a week ago M.E. O’Neill noted that we should provide documentation recommending migration strategies for things like slist → forward_list. She suggested actually providing an header which did nothing but point to the documentation and then error out. I’m not sure I’d go as far as providing such a header. But I am really enthusiastic about a “migration document”. Having something to point to which explains why using slist is like playing with a loaded gun. When people are migrating, they are open to such arguments if those arguments are both convincing and readily available. And if the migration document provides a way to experiment without committing yourself (i.e. make it easy to backpedal the migration):

Again, see my points above about education. Sometimes there is no one to educate, and no one intending to maintain or improve the code long term. Certainly, documenting the migration hurdles is very helpful to those doing the migration, but it may not be enough for sufficiently popular extensions.

Disadvantage: Doesn’t provide seamless migration. And it is a fair amount of work to document.

Field experience: I’m responsible for deprecating auto_ptr. At the same time I introduced unique_ptr. This involved a lot of documentation. Not only with committee papers, but in participating in countless newsgroup and mailing posts to the general public, presentations, etc. And it is paying off. I now see lots of people I’ve never heard of telling others that they should replace auto_ptr with unique_ptr, even though it is not a drop-in replacement. And the listeners are receptive to the message.

I don’t really dispute any part of this other than the value of education on old code. For new code and actively maintained code, this is all spot on.

That said, I would like to migrate a very large codebase to a point where it could use libc++. Why? Because I want to migrate the codebase to C++0x, and one aspect of doing that is migrating to a viable standard library for C++0x. I’m very interested in investigating the trade-offs between libstdc++ and libc++ here, but we can’t do that if we have to update the entire code base in the process.

I actually will pick on hash_{map,set} here because I think they’re even better examples. I actually agree that ‘slist’ is border-line.

There is essentially zero possibility that we will use hash_map forever. We will switch to unordered_map because its better and because its standardized. However, it isn’t available (technically[1]) until we switch to C++0x. With a codebase of sufficiently large size, we can’t afford to do both transitions simultaneously, we have to first switch to C++0x, and then switch from old extension containers to new standard ones.

[1]: Yes it is available as std::tr1::unordered_map, but now we have to migrate the codebase 3 times, once to the std::tr1 implementation, once to 0x, and then once back to std::. That drives up the cost of migration by another notch.

Now, my goal is to have two different libraries to choose from when migrating to C++0x so that bugs in one don’t block progress. But in order to do that we need at least “enough” extension support in both libraries to make it tractable to use either one.

This isn’t an argument for “all”, merely for “enough”. For example libc++ already has hash_map and hash_set, and that is probably the single most important container to support. We’ve now gotten a trial run conversion of a non-trivial chunk of our codebase to libc++, and the only other container that was missing was ‘slist’. The only other missing extensions were a handful of very rarely used algorithms. Algorithms are much easier to work around than types, as two pieces of code can use different but compatible implementations without conflicting. I see no reason for libc++ to bother with these extensions.

The types are trickier. Had there been 10 extension types we needed, making ‘slist’ a slippery slope of extensions we might never get off of, I would be more sympathetic to drawing the line sooner and more forcefully. I think the best argument for adding a compatible ‘slist’ implementation is that it seems to be the last we’ll need.

Size is not the only issue with slist. Slist also features container operations without the _after suffix, meaning they operate on the elements pointed to by the iterators. These, also, must have linear complexity, and so don’t exist in forward_list. Given this, the migration may be far from trivial - especially given that clients might find that they really ought to be using a doubly-linked list (or some other container) in their situation.

One idea that I had for an unrelated problem but might be useful for this one as well would be to have specific deprecated tags like [[deprecated(forward_string)]]. We could put these on the functions that are removed from the standard library, and migration could be done one container at a time by turning on each deprecated tag in order (something akin to -Wdeprecated=forward_string). We could put this on the slist functions that are not present in forward_list, and thus someone can clean up their codebase until it should be a drop-in replacement, and then perform the drop-in replacement. (N.B. to Chandler: I’m aware that this is not applicable in our situation).

Thoughts?

Sean

Thanks Chandler. The migration experience details you have shared with us are very important.

I was wondering if you could expound a little on why the following migration technique isn't viable:

#include <ciso646> // identify library

#ifdef _LIBCPP_VERSION
# include <forward_list>
  template <class T> using MyList = std::forward_list<T>;
#else
# include <ext/slist>
  template <class T> using MyList = __gnu_cxx::slist<T>;
#endif

int main()
{
  MyList<int> c;
  c.push_front(1);
  c.push_front(2);
  c.push_front(3);
}

If you use member functions which aren't in std::foward_list (such as insert), I can see how that would be problematic. And in that case I find myself wondering if an adaptor approach around std::forward_list wouldn't be sufficient. Something like:

template <class T, class A>
class slist
{
   std::foward_list<T, A> data_;

public:
   // forward everything to data_...

   // except size
   size_type size() const {return std::distance(data_.begin(), data_.end());}

   // and previous
   iterator previous(iterator i)
   {
       iterator r = data_.before_begin();
       for (const_iterator j = data_.begin(); j != i; ++i)
           ++r;
       return r;
   }

   const_iterator previous(const_iterator i) const
   {
       const_iterator r = data_.before_begin();
       for (const_iterator j = data_.begin(); j != i; ++i)
           ++r;
       return r;
   }

    iterator insert(iterator pos, const T& x)
        {return data_.insert_after(previous(pos), x);}

   // etc...
};

Howard

I have converted some fairly substantial code bases at my work to run on clang, then libc++, then c++0x. I also did quite a lot of the work involved in getting boost to run on clang/libc++/c++0x.

On any sizeable code base, each of these changes, particularly mapping to clang and then libc++, will take some work. This work is mostly because other compilers have been far too lax in the past.

I want to see libc++ and clang++ keep their neatness, and not start introducing huge number of patches to get around old standards-incompatable code.

This certainly seems to be the route that clang++ itself is sticking to, my experience is almost any piece of sizeable c++ code will fail to compile when first given to clang++ (although the error messages are very helpful in showing how to fix it).

However, a written migration strategy, covering issues which are likely to come up, would be useful. I'm not sure what the best way to present such a document would be (in svn, on a wiki for easier editing?)

In short: Tell people how to fix their code, don't help them patch around it silently. They are going to have pain switching to clang/libc++/0x whatever you do.

Chris