[RFC] Implement gcc Bound PMF in clang

This RFC accompanies the PR "[Clang] Add support for GCC bound member functions extension" by dtcxzyw (Pull Request #135649, llvm/llvm-project on GitHub).

Motivation

The following is quoted from the GCC documentation:

In C++, pointer to member functions (PMFs) are implemented using a wide pointer of sorts to handle all the possible call mechanisms; the PMF needs to store information about how to adjust the ‘this’ pointer, and if the function pointed to is virtual, where to find the vtable, and where in the vtable to look for the member function. If you are using PMFs in an inner loop, you should really reconsider that decision. If that is not an option, you can extract the pointer to the function that would be called for a given object/PMF pair and call it directly inside the inner loop, to save a bit of time.

Note that you still pay the penalty for the call through a function pointer; on most modern architectures, such a call defeats the branch prediction features of the CPU. This is also true of normal virtual function calls.

The syntax for this extension is

extern A a;
extern int (A::*fp)();
typedef int (*fptr)(A *);

fptr p = (fptr)(a.*fp);

For PMF constants (i.e. expressions of the form ‘&Klasse::Member’), no object is needed to obtain the address of the function. They can be converted to function pointers directly:

fptr p1 = (fptr)(&A::foo);

You must specify -Wno-pmf-conversions to use this extension.

Real-world applications

All of the real-world applications below are forms of early binding.

Case 1: binding on lifetime-constant conditions

struct A {
   bool c1, c2, c3;
   void foo() {
      if (c1)  foo1();
      else if (c2) foo2();
      else if (c3) foo3();
      else foo4();
   }
  void foo1(); // ... foo4()
};

Such code can be optimized to:

struct A {
   bool c1, c2, c3;
   void (A::*p_do_foo)();
   A(...) {
      // bind p_do_foo once, based on lifetime-constant conditions
      if (c1)  p_do_foo = &A::foo1;
      else if (c2) p_do_foo = &A::foo2;
      else if (c3) p_do_foo = &A::foo3;
      else p_do_foo = &A::foo4;   
   }
   inline void foo() { (this->*p_do_foo)();  }
};
struct B : A {
  B(...) { /*bind p_do_foo as B's member function*/ }
};

Calling a member function through p_do_foo is slower than calling it through a plain function pointer.
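For comparison, here is a minimal portable sketch of the same early binding using a plain function pointer and no compiler extension. It is not the bound PMF extension itself: the static trampoline template `call`, the `last` field, and the helper `run_case1` are hypothetical names introduced only for illustration. The trampoline adds one direct call per member function, which the bound PMF cast would avoid.

```cpp
#include <cassert>

// Hypothetical stand-in for struct A from Case 1. `last` exists only so
// the sketch can observe which variant ran.
struct A {
    // Static trampoline: the member function is a template argument, so
    // the call inside is a direct call, not a call through a PMF.
    template <void (A::*M)()>
    static void call(A* self) { (self->*M)(); }

    bool c1, c2, c3;
    int last;
    void (*do_foo)(A*);  // plain function pointer, bound once

    A(bool a, bool b, bool c)
        : c1(a), c2(b), c3(c), last(0), do_foo(nullptr) {
        // Bind once, based on conditions that never change afterwards.
        if (c1)      do_foo = &A::call<&A::foo1>;
        else if (c2) do_foo = &A::call<&A::foo2>;
        else if (c3) do_foo = &A::call<&A::foo3>;
        else         do_foo = &A::call<&A::foo4>;
    }

    void foo() { do_foo(this); }  // one indirect call, no branching

    void foo1() { last = 1; }
    void foo2() { last = 2; }
    void foo3() { last = 3; }
    void foo4() { last = 4; }
};

// Hypothetical helper: constructs an A, calls foo(), and reports which
// variant was bound.
inline int run_case1(bool a, bool b, bool c) {
    A obj(a, b, c);
    assert(obj.do_foo != nullptr);
    obj.foo();
    return obj.last;
}
```

With GCC's extension, the trampoline would be unnecessary, since the PMF constant could be cast to the function pointer type directly.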

Case 2: bind virtual functions

This is a real-world code snippet from ToplingZipTableReader (an SST component of ToplingDB). Its usage can be simplified to the demo below:

// more ...
// the TableIterator
void SetSubReader(const ToplingZipSubReader* sub) {
  subReader_ = sub;
  iter_ = sub->index_->NewIterator();
  store_ = sub->store_.get();
  // get_record_append_ is a Case 1 PMF
  get_record_append_ = store_->m_get_record_append_CacheOffsets;
 #if defined(_MSC_VER) || defined(__clang__)
  #define IF_BOUND_PMF(Then, Else) Else // has no PMF
 #else
  #define IF_BOUND_PMF(Then, Else) Then // use PMF
  // bind the virtual function by pmf
  iter_next_ = (IterScanFN)(iter_->*(&COIndex::Iterator::Next));
  iter_prev_ = (IterScanFN)(iter_->*(&COIndex::Iterator::Prev));
 #endif
  tag_rs_kind_ = sub->tag_rs_kind_;
}
// Calling through the bound PMF is much faster than a virtual call:
// iter_next_ and iter_ can be loaded in parallel by the CPU
// (instruction-level parallelism).
bool IndexIterInvokeNext() { return IF_BOUND_PMF(iter_next_(iter_), iter_->Next()); }
bool IndexIterInvokePrev() { return IF_BOUND_PMF(iter_prev_(iter_), iter_->Prev()); }
// more ...
//
// UintIndex Iterator code snippet, virtual function Next is very simple.
// Such index is used for MyTopling(MySQL) auto_increment primary key.
//
template<int SuffixLen>
struct UintIndexIterTmpl : public UintIndexIterator {
  bool Next() override {
    if (UNLIKELY(pos_ >= max_pos_)) {
      m_id = size_t(-1);
      return false;
    } else {
      ++m_id;
      // For AllOneRankSelect, zero_seq_len(size_t) simply returns 0
      pos_ = pos_ + index_.indexSeq_.zero_seq_len(pos_ + 1) + 1;
      UpdateBufferConstLen(); // bswap and store SuffixLen bytes
      return true;
    }
  }
};
// more ...

In the demo, the virtual function (Next()) is very small, and there is no neighboring code around the virtual call to it (in IndexIterInvokeNext), so the cost of the virtual function lookup cannot be hidden by CPU pipelining and multiple issue. Here the bound PMF gives a measurable speedup (~4ns to ~3ns).
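The binding pattern can be reduced to a self-contained sketch. Everything here is hypothetical and for illustration only: `Iter`, `CountingIter`, `bind_next`, and `count_steps` stand in for the ToplingDB types. The GCC branch uses the extension and suppresses its warning with diagnostic pragmas, on the assumption that this is equivalent to passing -Wno-pmf-conversions. Other compilers fall back to a trampoline that makes an ordinary virtual call.

```cpp
#include <cassert>

// Hypothetical iterator interface standing in for COIndex::Iterator.
struct Iter {
    virtual bool Next() = 0;
    virtual ~Iter() {}
};

// Hypothetical implementation with a very small Next().
struct CountingIter : Iter {
    int pos, max_pos;
    explicit CountingIter(int n) : pos(0), max_pos(n) {}
    bool Next() override {
        if (pos >= max_pos) return false;
        ++pos;
        return true;
    }
};

typedef bool (*IterScanFN)(Iter*);

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
#pragma GCC diagnostic ignored "-Wpmf-conversions"
// GCC only: resolve the virtual function for this object once and
// return it as a plain function pointer.
inline IterScanFN bind_next(Iter* it) {
    return (IterScanFN)(it->*(&Iter::Next));
}
#pragma GCC diagnostic pop
#else
// Fallback: a trampoline that still performs a virtual call each time.
static bool next_trampoline(Iter* it) { return it->Next(); }
inline IterScanFN bind_next(Iter*) { return &next_trampoline; }
#endif

// Counts successful Next() calls, binding the function outside the loop.
inline int count_steps(Iter* it) {
    IterScanFN next = bind_next(it);
    assert(next != nullptr);
    int n = 0;
    while (next(it)) ++n;
    return n;
}
```

The sketch only shows the mechanics. Whether it reproduces the speedup reported above depends on the surrounding code and the hardware.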

This looks related to the PR "[Clang] Add support for GCC bound member functions extension" by dtcxzyw (Pull Request #135649, llvm/llvm-project on GitHub).
@dtcxzyw

Yes, I filed that feature request. I am proposing this RFC to demonstrate its usefulness in real-world code.

Hi @rockeet! Please add this other use-case (unless, of course, someone can suggest another way to solve the same issue): I have a C API that allows registering C callbacks for various events, and I was planning to provide a C++ wrapper API for it. The simplest thing to do would be to offer a C++ class which would, in its constructor, invoke the C function to register the callbacks, which would be static methods (so that they are callable from C) which would then delegate the work to a virtual method:

class EventProcessorBase {
    EventProcessorBase() {
        // Invokes the C functions to register C-style callbacks
        c_api_register_cb1(&EventProcessorBase::cb1_wrapper);
        c_api_register_cb2(&EventProcessorBase::cb2_wrapper);
        ...
    }

    static void cb1_wrapper() { do_cb1(); }
    static void cb2_wrapper() { do_cb2(); }

    virtual void do_cb1() {};
    virtual void do_cb2() {};
};

So that the client would just need to derive from EventProcessorBase and reimplement the virtual methods that it needs.

However, it would be more efficient (especially if the C module performs more work if a callback is attached, for instance in preparing the function arguments) if the callbacks were registered only if the derived class provided some implementation for the virtual methods – otherwise there’s no point in registering the callbacks.

This extension would allow me to check in the EventProcessorBase constructor which virtual methods have been reimplemented, and register the callbacks only for them.

I don’t think your code snippet can be reimplemented using bound PMF without changing the overall usage of EventProcessorBase, not because the example is incomplete, but because early binding in EventProcessorBase constructor is impossible.

When I say the example is incomplete, I mean that, obviously, you cannot call a virtual member function from a static member function, since there is no this pointer. For the same reason, registering a virtual function as a bound method alone doesn’t do anything: the first parameter of that method must receive the this pointer.

I completed your snippet using GCC bound PMF, and let’s continue from there:

#include <cstdio>

typedef void cb1(void *);

struct __DIY_closure
{
    cb1 *f;
    void *user_data;

    void operator()()
    {
        f(user_data);
    }
};

static __DIY_closure saved_closure;

void c_api_register_cb1(cb1 f, void *user_data)
{
    saved_closure.f = f;
    saved_closure.user_data = user_data;
}

class EventProcessorBase
{
  public:
    EventProcessorBase()
    {
        register_cb1();
    }

    void register_cb1()
    {
        constexpr auto dyn = &EventProcessorBase::do_cb1;
        c_api_register_cb1((cb1 *)(this->*dyn), this);
    }

    virtual void do_cb1() = 0;
    virtual ~EventProcessorBase() = default;
};

class NewEvent : public EventProcessorBase
{
  public:
    void do_cb1() override
    {
        std::puts("NewEvent::do_cb1 is working");
    }
};

int main()
{
    auto obj = new NewEvent;
    obj->register_cb1();  // non-ctor
    saved_closure();
}

You can see that the cb1 signature is entirely suitable for a C-style API, and “NewEvent::do_cb1 is working.”

However, if you comment out the line marked // non-ctor and rely on the EventProcessorBase constructor to do the registration,

    EventProcessorBase()
    {
        register_cb1();
    }

the program will crash:

pure virtual method called
terminate called without an active exception
Aborted (core dumped)

You can play with the example here: Compiler Explorer

If you save a value of a C++ pointer-to-member-function (dyn variable in the example) in the same way in the context of EventProcessorBase constructor and call it later after the initialization of the NewEvent object completes, you won’t have such an issue because the polymorphism of C++ pointer-to-member-function is inherent. In contrast, the polymorphism of a bound method requires the polymorphism to be established at the time of the cast. This limitation equally applies to C++Builder’s __closure.
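The contrast can be illustrated with standard C++ alone. In this minimal sketch, `Base`, `Derived`, `saved`, `fire`, and `fire_on_derived` are hypothetical names, and the base method is given a body rather than being pure virtual. The base constructor stores an ordinary pointer to member function, and the virtual lookup happens only when it is invoked, after the derived object is fully constructed.

```cpp
#include <cassert>
#include <string>

struct Base {
    typedef std::string (Base::*Handler)();
    Handler saved;  // ordinary C++ pointer to member function

    // Saved during base construction, when the dynamic type is still Base.
    Base() : saved(&Base::do_cb1) {}

    virtual std::string do_cb1() { return "Base::do_cb1"; }
    virtual ~Base() {}

    // The vtable lookup happens here, at call time.
    std::string fire() { return (this->*saved)(); }
};

struct Derived : Base {
    std::string do_cb1() override { return "Derived::do_cb1"; }
};

// Hypothetical helper: constructs a Derived and invokes the saved PMF.
inline std::string fire_on_derived() {
    Derived d;
    assert(d.saved != nullptr);
    return d.fire();
}
```

Here `fire_on_derived()` reaches the derived override, because the PMF keeps the virtual dispatch deferred. A bound method cast performed inside the `Base` constructor would instead have fixed the target to whatever the vtable held at that moment.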

GCC’s bound method as a feature is undoubtedly useful, but it deserves a warning, and I’m not sure whether the demand is ubiquitous enough for a port in Clang (__closure is arguably more widely spread, for example, and does have an implementation in a Clang-fork). Ideas in this area need more exploration.

Oops, indeed I simplified my example too much. Yes, we need to pass this as the callback data.