std::function is inefficient

The nonstatic members of std::function are declared like this:

    typename aligned_storage<3*sizeof(void*)>::type _buf;
    __base* _f;

sizeof(std::function) is apparently supposed to be 16 bytes on 32-bit systems and 32 bytes on 64-bit. On my system, it’s 48 bytes: the default alignment of aligned_storage is the maximum alignment, 16 bytes, so _buf takes 32 bytes rather than 24, and the 8-byte pointer is followed by 8 bytes of padding.
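
To make the arithmetic concrete, here is a rough sketch of the two layouts. The struct names are made up for illustration, and the first struct hard-codes the 16-byte alignment and 32-byte buffer I see on my system rather than asking aligned_storage, so treat it as a model of the layout, not the libc++ declaration.

```cpp
#include <cstddef>

// Model of the current layout on a system where the buffer is 16-byte
// aligned and therefore rounded up from 24 to 32 bytes (hypothetical name).
struct padded_layout {
    alignas(16) unsigned char buf[32];
    void* f;  // stands in for the __base* member
};

// Hypothetical alternative: align the buffer only as strictly as a pointer,
// so no rounding or tail padding is needed.
struct pointer_aligned_layout {
    alignas(void*) unsigned char buf[3 * sizeof(void*)];
    void* f;
};

// Sizes a caller can inspect; the struct alignment forces padded_layout
// up to the next multiple of 16 after the pointer.
constexpr std::size_t padded_size = sizeof(padded_layout);
constexpr std::size_t compact_size = sizeof(pointer_aligned_layout);
```

With 8-byte pointers the first struct comes out at 32 + 8 + 8 padding bytes, and the second at exactly four pointers.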

As for run time, every call first loads the __base* and only then follows it to reach the vtable. That extra indirection is unnecessary in the first place.

Here’s an outline of an architecture that I think would be faster, slimmer, and better. It has better locality because calls can be dispatched without accessing the object on the heap at all: the vtable pointer lives inside the function object itself. There are fewer memory accesses in total. The number of indirect branches remains the same. One fewer pointer gets stored, FWIW.

It does presume that the abstract base pointer and the derived pointers are identical, but then so does the current implementation (see __func::__clone(__base*)). It does use C++ polymorphic classes, and I’m not sure whether that’s really kosher (see my previous message here).
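
The pointer-identity presumption can be checked with something like the following. This is only a sketch with made-up names (abstract_base, derived, base_at_offset_zero). The standard does not guarantee the result for polymorphic classes; it is simply what the common ABIs do for single, non-virtual inheritance.

```cpp
#include <cassert>

// Hypothetical stand-ins for __base and one of its derived holders.
struct abstract_base {
    virtual int call() = 0;
    virtual ~abstract_base() {}
};

struct derived : abstract_base {
    int value = 42;
    int call() override { return value; }
};

// Returns true when converting a derived* to its base* leaves the address
// unchanged, i.e. the base subobject sits at offset zero.
inline bool base_at_offset_zero() {
    derived d;
    abstract_base* b = &d;
    return static_cast<void*>(b) == static_cast<void*>(&d);
}
```

If this returned false on some platform, reinterpreting the buffer address as a base pointer would not be safe there.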

template< typename ret, typename ... args >
struct __base {
    virtual ret call( args && ... ) = 0;
    virtual void __clone( void * ) const = 0;
    // etc.
    virtual ~__base(); // virtual: targets are destroyed through a __base*
};

template< typename ret, typename ... args >
struct __empty_func
    : __base< ret, args ... > {
    virtual ret call( args && ... )
        { throw bad_function_call(); }
    virtual void __clone( void * p ) const
        { new( p ) __empty_func; }
    // etc.
};

template< typename fp, typename alloc, typename ret, typename ... args >
struct __small_func
    : __base< ret, args ... > {
    __compressed_pair< fp, alloc > _f;

    virtual ret call( args && ... a )
        { return _f.first()( forward< args >( a ) ... ); } // first() is the functor
    virtual void __clone( void * p ) const
        { new( p ) __small_func( _f ); } // allocator only used when assigning a new type
    // (Note: Current implementation does not support allocator erasure.)

    // etc.
};

template< typename fp, typename alloc, typename ret, typename ... args >
struct __big_func
    : __base< ret, args ... > {
    fp * _f; // the functor itself lives on the heap
    alloc _a;

    virtual ret call( args && ... a )
        { return (*_f)( forward< args >( a ) ... ); }
    virtual void __clone( void * p ) const
        { new( p ) __big_func( new fp( *_f ) ); } // rather, use the allocator.
    // etc.
};

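template< typename ret, typename ... args >
struct function< ret( args ... ) > {
    typename aligned_storage< 4 * sizeof(void*) >::type _buf;

    // _buf always holds some object derived from __base.
    __base< ret, args ... > * __target()
        { return reinterpret_cast< __base< ret, args ... > * >( &_buf ); }

    function() {
        new( &_buf ) __empty_func< ret, args ... >;
    }

    explicit operator bool () const {
        // true when the stored object is anything other than __empty_func
        return typeid( __empty_func< ret, args ... > )
            != typeid( * reinterpret_cast< __base< ret, args ... > const * >( &_buf ) );
    }

    void __clear() {
        __target()->~__base();
        new( &_buf ) __empty_func< ret, args ... >; // do this as a scope guard
    }

    // allocator< target > is assumed here as the default allocator
    template< typename target >
    enable_if_t< sizeof( __small_func< target, allocator< target >, ret, args ... > ) <= sizeof( _buf ),
        function & >
    operator = ( target const & o ) {
        __target()->~__base(); // destroy the old target first
        try {
            new( &_buf ) __small_func< target, allocator< target >, ret, args ... >( o );
        } catch ( ... ) {
            new( &_buf ) __empty_func< ret, args ... >; // nothing was constructed; fall back to empty
            throw;
        }
        return *this;
    }

    ~function()
        { __target()->~__base(); }
};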
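
For anyone who wants to experiment, here is a compilable cut-down version of the same idea. It is a sketch, not a proposed patch: the names (fn, base, small_func, big_func, self, emplace, fits) are made up, allocators are left out entirely, operator() is non-const for brevity, and it leans on the pointer-identity presumption mentioned earlier when it reinterprets the buffer address (a strictly conforming C++17 version would probably want std::launder).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::bad_function_call
#include <new>
#include <type_traits>
#include <typeinfo>
#include <utility>

template <typename Sig> class fn;  // hypothetical name, distinct from std::function

template <typename R, typename... Args>
class fn<R(Args...)> {
    struct base {
        virtual R call(Args&&...) = 0;
        virtual void clone(void* dst) const = 0;  // copy-construct *this into dst
        virtual ~base() {}
    };
    struct empty_func final : base {
        R call(Args&&...) override { throw std::bad_function_call(); }
        void clone(void* dst) const override { ::new (dst) empty_func; }
    };
    template <typename F>
    struct small_func final : base {  // target stored inside the buffer
        F f;
        explicit small_func(const F& x) : f(x) {}
        R call(Args&&... a) override { return f(std::forward<Args>(a)...); }
        void clone(void* dst) const override { ::new (dst) small_func(f); }
    };
    template <typename F>
    struct big_func final : base {  // target stored on the heap
        F* f;
        explicit big_func(const F& x) : f(new F(x)) {}
        big_func(const big_func&) = delete;
        ~big_func() override { delete f; }
        R call(Args&&... a) override { return (*f)(std::forward<Args>(a)...); }
        void clone(void* dst) const override { ::new (dst) big_func(*f); }
    };

    static constexpr std::size_t buf_size = 4 * sizeof(void*);
    alignas(void*) unsigned char buf_[buf_size];  // the only data member

    template <typename F>
    using fits = std::integral_constant<bool,
        sizeof(small_func<F>) <= buf_size && alignof(small_func<F>) <= alignof(void*)>;

    // Presumes the base subobject starts at the address of the stored object.
    base* self() { return reinterpret_cast<base*>(buf_); }
    const base* self() const { return reinterpret_cast<const base*>(buf_); }

    template <typename F>
    void emplace(const F& f, std::true_type) { ::new (buf_) small_func<F>(f); }
    template <typename F>
    void emplace(const F& f, std::false_type) { ::new (buf_) big_func<F>(f); }

public:
    fn() { ::new (buf_) empty_func; }
    template <typename F>
    fn(const F& f) {
        using T = typename std::decay<F>::type;
        emplace<T>(f, fits<T>());
    }
    fn(const fn& o) { o.self()->clone(buf_); }
    fn& operator=(const fn& o) {
        if (this == &o) return *this;
        self()->~base();  // destroy the old target
        try { o.self()->clone(buf_); }
        catch (...) { ::new (buf_) empty_func; throw; }  // fall back to empty
        return *this;
    }
    ~fn() { self()->~base(); }

    explicit operator bool() const {
        const base& b = *self();
        return typeid(b) != typeid(empty_func);
    }
    R operator()(Args... a) { return self()->call(std::forward<Args>(a)...); }
};
```

A call such as f(5) on an fn<int(int)> holding a small lambda reads the vtable pointer straight out of the object and never touches the heap; only big_func::call dereferences a heap pointer, and only to reach the target itself.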

Let me know what you think…

Thanks,
David