Removing std::vector from APIs (was Re: Mutating the elements of a ConstantArray)

I've made a start on this for the constructors of StructType and
FunctionType. I've attached the patch so far, which is just enough to
build lib/VMCore/. Before I continue with it:

1. Is this still the way you want to go?
2. Do you care about breaking out-of-tree code that creates struct or
function types? I'm happy to convert cfe and llvm-gcc myself.
3. Are there any logistical problems with committing a wide-reaching
patch like this?
3. Any comments on the patch itself?

Thanks,
Jay.

patch.novector (10.9 KB)

2009/4/1 Chris Lattner <clattner@apple.com>:

As far API design goes, we’re in a mixed state. I’d strongly prefer

to get rid of std::vector from the various interfaces, f.e. creating a

constant array currently requires passing in an std::vector. For

these sorts of interfaces, we should migrate to passing in a "Constant

const / unsigned" pair. This allows use with a C array, a

SmallVector, std::vector, or any other container with sequential

storage.

I’ve made a start on this for the constructors of StructType and
FunctionType. I’ve attached the patch so far, which is just enough to
build lib/VMCore/. Before I continue with it:

Great!

  1. Is this still the way you want to go?

Yes

  1. Do you care about breaking out-of-tree code that creates struct or
    function types? I’m happy to convert cfe and llvm-gcc myself.

Nope, we expect people with out of tree code to update themselves.

  1. Are there any logistical problems with committing a wide-reaching
    patch like this?

I don’t think so.

  1. Any comments on the patch itself?

The one major thing to be aware of is that it isn’t safe to use &V[0] when V is an empty std::vector (though it is safe for smallvector). When converting code, just be careful so that this doesn’t break {} or void().

One minor thing is that StructType::get(NULL, 0) looks somewhat strange to me. How about adding a zero-argument version of “get” that returns the empty struct? That would allow code to use StructType::get().

Thanks for working on this Jay!

-Chris

3. Any comments on the patch itself?

The one major thing to be aware of is that it isn't safe to use &V[0] when V
is an empty std::vector

Oh dear. That's a bit of a flaw in the plan. I suppose the solution is
to switch to SmallVector whenever this might be a problem.

I'm a bit concerned that any new &empty[0] problems that are
introduced will go unnoticed. With GNU libstdc++ they aren't diagnosed
unless you build with -D_GLIBCXX_DEBUG (or ENABLE_EXPENSIVE_CHECKS=1).

For now I'm testing with ENABLE_EXPENSIVE_CHECKS=1, and it is indeed
catching lots of errors!

(though it is safe for smallvector).

DR 464 proposes a new data() method. I'd suggest implementing that in
SmallVector, instead of relying on the relaxed checking in
operator().

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#464

GNU libstdc++'s vector already has this data() method, but I suppose
we can't rely on in it being in std::vector in general.

One minor thing is that StructType::get(NULL, 0) looks somewhat strange to
me. How about adding a zero-argument version of "get" that returns the
empty struct? That would allow code to use StructType::get().

OK, I've done this for both StructType and FunctionType.

I'll post some patches on llvm-commits.

Thanks,
Jay.

> 3. Any comments on the patch itself?
>
> The one major thing to be aware of is that it isn't safe to use &V[0]
> when V is an empty std::vector

Oh dear. That's a bit of a flaw in the plan. I suppose the solution is
to switch to SmallVector whenever this might be a problem.

Or use iterators. That's why they're there.

For now I'm testing with ENABLE_EXPENSIVE_CHECKS=1, and it is indeed
catching lots of errors!

Mm hmm. :slight_smile:

                              -Dave

3. Any comments on the patch itself?

The one major thing to be aware of is that it isn't safe to use &V[0] when V
is an empty std::vector

Oh dear. That's a bit of a flaw in the plan. I suppose the solution is
to switch to SmallVector whenever this might be a problem.

As David points out, another solution is to make an inline templated iterator version like CallInst::Create has.

(though it is safe for smallvector).

DR 464 proposes a new data() method. I'd suggest implementing that in
SmallVector, instead of relying on the relaxed checking in
operator().

C++ Standard Library Defect Reports and Accepted Issues

GNU libstdc++'s vector already has this data() method, but I suppose
we can't rely on in it being in std::vector in general.

Sure, adding .data() to smallvector would make sense.

One minor thing is that StructType::get(NULL, 0) looks somewhat strange to
me. How about adding a zero-argument version of "get" that returns the
empty struct? That would allow code to use StructType::get().

OK, I've done this for both StructType and FunctionType.

I'll post some patches on llvm-commits.

Cool, thanks Jay! I'll be out on vacation for the next week, but other people are plenty capable of reviewing the patches. I'll take a look at anything remaining when I return.

-Chris

The one major thing to be aware of is that it isn't safe to use &V[0] when V is an empty std::vector

Oh dear. That's a bit of a flaw in the plan. I suppose the solution is to switch to SmallVector whenever this might be a problem.

Or use iterators. That's why they're there.

The reason to use the pointer-length pair in preference to iterators is that iterators make templates of everything, causing code bloat. In my code, rather than pass the separate parameters I use a range<Iter> class something like this:

template<typename Iter>
class range {
   Iter Begin, End;

public:
   range() : Begin(), End() { }

   template<typename ConvertibleToRange>
   range(T &seq) : Begin(seq.begin()), End(seq.end()) { }

   Iter begin() { return Begin; }
   Iter end() { return End; }
   // ... other Sequence methods ...
};

And encapsulate the logic to deal with the empty sequence problem in a function:

template<typename ConvertibleToRange>
range<typename ConvertibleToRange::value_type*>
ptr_range(ConvertibleToRange &seq) {
   typename ConvertibleToRange::iterator i = seq.begin(), e = seq.end();
   if (i == e)
     return range<typename ConvertibleToRange::value_type*>();
   return range<typename ConvertibleToRange::value_type*>(&*i, &*i + (e - i));
};

So that:

StructType::get(v.empty()? 0 : &v[0], v.empty()? 0 : &v[0] + v.size());
// becomes
StructType::get(ptr_range(v));

Two important refinements… First, the above can be made safer so that ptr_range will reject e.g. std::list with a bit of metaprogramming:

template<typename T> struct is_ptr { enum { value = false }; };
template<typename T> struct is_ptr<T*> { enum { value = true }; };

template<typename T>
struct is_contiguous_sequence { enum { value = is_ptr<T::iterator>::value }; };
template<typename T, typename Alloc>
struct is_contiguous_sequence<std::vector<T,Alloc> > { enum { value = true }; };
template<typename T, typename Ch, typename Alloc>
struct is_contiguous_sequence<std::basic_string<T,Ch,Alloc> > { enum { value = true }; };

template<typename T, bool True> struct type_if { typedef T type; }
template<typename T> struct type_if<T,false> { /* cause substitution failure */ }

// New prototype for ptr_range.
template<typename ConvertibleRange>
typename type_if<
   range<typename T::value_type*>,
   is_contiguous_sequence<T>::value
>::type ptr_range(ConvertibleRange &seq);

And secondly, range<T*> can be extended to convert from arrays in addition to objects with begin()/end() methods by (1) providing a partial specialization on T* which adds a template constructor:

template<typename T>
template<size_t N>
range<T*>::range<T*>(T (&arr)[N]) : Begin(arr), End(arr + N) { }

or by (2a) using a traits type to look up ConvertibleToRange::iterator & -::value_type

template<typename T>
struct range_traits {
   typedef typename T::value_type value_type;
   typedef typename T::iterator iterator;
   // .. etc ...
};

template<typename T, size_t N>
struct range_traits<T[N]> {
   typedef T value_type;
   typedef T *iterator;
   // .. etc ...
};

and by (2b) indirecting calls to seq.begin() and seq.end() through template functions:

template<typename ConvertibleToRange>
typename range_traits<T>::iterator beginof(T &seq) { seq.begin(); }
template<typename ConvertibleToRange, size_t N>
typename range_traits<T[N]>::iterator beginof(T (&seq)[N]) { return seq; }

template<typename ConvertibleToRange>
typename range_traits<T>::iterator endof(T &seq) { seq.begin(); }
template<typename ConvertibleToRange, size_t N>
typename range_traits<T[N]>::iterator endof(T (&seq)[N]) { return seq + N; }

This refines StructType::get(arr, arr + sizeof(arr) / sizeof(arr[0]));
to just StructType::get(arr);

— Gordon

I tend to prefer ranges as well, but iterators don't cause bloat. Just have something like this:

   template <typename iter>
   ... create(iter start, iter end) {
     if (start != end)
       return create(&*start, end-start);
     return create(0,0);
   }

which inlines away. This requires the iterators to be sequential in memory, which is perfectly reasonable IMO.

-Chris

Chris Lattner wrote:

  

The one major thing to be aware of is that it isn't safe to use
&V[0] when V is an empty std::vector
          

Oh dear. That's a bit of a flaw in the plan. I suppose the solution
is to switch to SmallVector whenever this might be a problem.
        

Or use iterators. That's why they're there.
      

The reason to use the pointer-length pair in preference to iterators
is that iterators make templates of everything, causing code bloat. In
my code, rather than pass the separate parameters I use a range<Iter>
class something like this:
    
I tend to prefer ranges as well, but iterators don't cause bloat. Just have something like this:
  

I tend to prefer begin/end iterators (or first/last for subranges, following the naming conventions of STL) for things like this - if the argument types are declared correctly, you should be able to use std::vector, SmallVector, or just regular POD pointers. I tend to prefer iterators for the (admittedly minor) reason that converting from a SmallVector to a start+size pair involves taking size(), which requires a scaling operation, whereas converting to first/last is just a bitcast at most.