Motivation
Quoted from the GCC manual (C++ Extensions, "Extracting the Function Pointer from a Bound Pointer to Member Function"):
In C++, pointer to member functions (PMFs) are implemented using a wide pointer of sorts to handle all the possible call mechanisms; the PMF needs to store information about how to adjust the ‘this’ pointer, and if the function pointed to is virtual, where to find the vtable, and where in the vtable to look for the member function. If you are using PMFs in an inner loop, you should really reconsider that decision. If that is not an option, you can extract the pointer to the function that would be called for a given object/PMF pair and call it directly inside the inner loop, to save a bit of time.
Note that you still pay the penalty for the call through a function pointer; on most modern architectures, such a call defeats the branch prediction features of the CPU. This is also true of normal virtual function calls.
The syntax for this extension is
extern A a;
extern int (A::*fp)();
typedef int (*fptr)(A *);
fptr p = (fptr)(a.*fp);
For PMF constants (i.e. expressions of the form ‘&Klasse::Member’), no object is needed to obtain the address of the function. They can be converted to function pointers directly:
fptr p1 = (fptr)(&A::foo);
You must specify -Wno-pmf-conversions to use this extension.
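Here is a minimal sketch of both forms, assuming g++. Clang and MSVC do not provide this extension, so on those compilers the code falls back to an ordinary PMF call. The classes A and B, the members get/twice and the helpers sum_calls/call_twice are hypothetical. The pragma has the same effect as passing -Wno-pmf-conversions for this file.

```cpp
#include <cassert>

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic ignored "-Wpmf-conversions"  // same as -Wno-pmf-conversions
#define HAS_BOUND_PMF 1
#else
#define HAS_BOUND_PMF 0
#endif

struct A {
  int base = 1;
  virtual ~A() = default;
  virtual int get() { return base; }  // virtual: target depends on the object
  int twice() { return 2 * base; }    // non-virtual: target is fixed
};

struct B : A {
  int get() override { return base + 10; }
};

typedef int (*fptr)(A*);  // "this" becomes an explicit first argument

// Calls obj's version of pmf n times and returns the sum of the results.
long sum_calls(A* obj, int (A::*pmf)(), int n) {
  long s = 0;
#if HAS_BOUND_PMF
  fptr f = (fptr)(obj->*pmf);  // PMF decoded (vtable lookup) once, here
  for (int i = 0; i < n; ++i) s += f(obj);
#else
  for (int i = 0; i < n; ++i) s += (obj->*pmf)();  // portable fallback
#endif
  return s;
}

// A PMF constant of a non-virtual function needs no object to convert.
int call_twice(A* obj) {
#if HAS_BOUND_PMF
  fptr p1 = (fptr)(&A::twice);
  return p1(obj);
#else
  return obj->twice();
#endif
}
```

With a B object, sum_calls(&b, &A::get, 3) resolves B::get once and then makes three plain indirect calls. As the manual notes, each of those is still a call through a function pointer.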
Real-world applications
All of the real-world applications below use early binding. The call target is chosen once, before the hot path, and is then reused.
Case 1: bind by lifetime-constant conditions
struct A {
bool c1, c2, c3;
void foo() {
if (c1) foo1();
else if (c2) foo2();
else if (c3) foo3();
else foo4();
}
void foo1(), foo2(), foo3(), foo4();
};
Because c1, c2 and c3 do not change during the object's lifetime, the dispatch can be decided once, in the constructor:
struct A {
bool c1, c2, c3;
void (A::*p_do_foo)();
A(...) {
// bind p_do_foo by lifetime-constant conditions
if (c1) p_do_foo = &A::foo1;
else if (c2) p_do_foo = &A::foo2;
else if (c3) p_do_foo = &A::foo3;
else p_do_foo = &A::foo4;
}
inline void foo() { (this->*p_do_foo)(); }
void foo1(), foo2(), foo3(), foo4();
};
struct B : A {
B(...) { /* rebind p_do_foo to B's member functions; a void (B::*)() needs static_cast<void (A::*)()> */ }
};
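Calling the member function through p_do_foo is still slower than calling through a plain function pointer, because every call must decode the PMF (is the target virtual, and how must this be adjusted?) before jumping to it. With GCC, the bound-PMF extension described in Motivation can convert p_do_foo into a plain function pointer once, in the constructor.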
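The following is a self-contained sketch of Case 1 with the PMF replaced by a plain function pointer. It assumes g++ for the fast path and keeps the PMF on other compilers. The constructor parameters and the int-returning bodies of foo1..foo4 are hypothetical. They are chosen so that the selected branch can be checked.

```cpp
#include <cassert>

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic ignored "-Wpmf-conversions"
#define HAS_BOUND_PMF 1
#else
#define HAS_BOUND_PMF 0
#endif

struct A {
  bool c1, c2, c3;
#if HAS_BOUND_PMF
  typedef int (*DoFoo)(A*);   // plain function pointer, this passed explicitly
#else
  typedef int (A::*DoFoo)();  // portable fallback: keep the PMF
#endif
  DoFoo p_do_foo;

  A(bool x1, bool x2, bool x3) : c1(x1), c2(x2), c3(x3) {
    int (A::*m)();  // choose the target by lifetime-constant conditions
    if (c1) m = &A::foo1;
    else if (c2) m = &A::foo2;
    else if (c3) m = &A::foo3;
    else m = &A::foo4;
#if HAS_BOUND_PMF
    p_do_foo = (DoFoo)(this->*m);  // decode the PMF once, at construction
#else
    p_do_foo = m;
#endif
  }

  int foo() {
#if HAS_BOUND_PMF
    return p_do_foo(this);  // plain indirect call, no PMF decoding
#else
    return (this->*p_do_foo)();
#endif
  }

  // Hypothetical bodies: each returns its own index.
  int foo1() { return 1; }
  int foo2() { return 2; }
  int foo3() { return 3; }
  int foo4() { return 4; }
};
```

The branch order matches the original foo(), so c1 takes priority over c2 and c3.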
Case 2: bind virtual functions
This real-world snippet comes from ToplingZipTableReader (an SST component of ToplingDB). Its usage is simplified in the demo below:
// more ...
// the TableIterator
void SetSubReader(const ToplingZipSubReader* sub) {
subReader_ = sub;
iter_ = sub->index_->NewIterator();
store_ = sub->store_.get(); // get_record_append is a case 1 PMF
get_record_append_ = store_->m_get_record_append_CacheOffsets;
#if defined(_MSC_VER) || defined(__clang__)
#define IF_BOUND_PMF(Then, Else) Else // has no PMF
#else
#define IF_BOUND_PMF(Then, Else) Then // use PMF
// bind the virtual function by pmf
iter_next_ = (IterScanFN)(iter_->*(&COIndex::Iterator::Next));
iter_prev_ = (IterScanFN)(iter_->*(&COIndex::Iterator::Prev));
#endif
tag_rs_kind_ = sub->tag_rs_kind_;
}
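// Calling through the extracted function pointer is faster than a virtual call:
// iter_next_ and iter_ can be loaded in parallel (instruction-level parallelism),
// while a virtual call must first load the vptr from *iter_ and then the vtable slot.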
bool IndexIterInvokeNext() { return IF_BOUND_PMF(iter_next_(iter_), iter_->Next()); }
bool IndexIterInvokePrev() { return IF_BOUND_PMF(iter_prev_(iter_), iter_->Prev()); }
// more ...
//
// UintIndex Iterator code snippet, virtual function Next is very simple.
// Such an index is used for MyTopling (MySQL) auto_increment primary keys.
//
template<int SuffixLen>
struct UintIndexIterTmpl : public UintIndexIterator {
bool Next() override {
if (UNLIKELY(pos_ >= max_pos_)) {
m_id = size_t(-1);
return false;
} else {
++m_id;
// For AllOneRankSelect, zero_seq_len(size_t) always returns 0
pos_ = pos_ + index_.indexSeq_.zero_seq_len(pos_ + 1) + 1;
UpdateBufferConstLen(); // bswap and store SuffixLen bytes
return true;
}
}
};
// more ...
In the demo, the virtual function (Next()) is very small, and there is no neighboring code around the virtual call (in IndexIterInvokeNext). As a result, the virtual function lookup cannot be hidden by CPU pipelining and multiple issue, and binding through the PMF gives a measurable speedup (from ~4 ns to ~3 ns).
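The following is a self-contained sketch of the Case 2 pattern under stated assumptions. Iterator, RangeIter and ScanDriver are hypothetical stand-ins for COIndex::Iterator, a small iterator such as UintIndexIterTmpl, and the TableIterator. The definition of IterScanFN is assumed, because the snippet does not show it. The IF_BOUND_PMF macro follows the snippet, but its guard here is a stricter GCC-only check. The code also assumes single inheritance, so the extracted function can take Iterator* directly.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic ignored "-Wpmf-conversions"
#define IF_BOUND_PMF(Then, Else) Then  // GCC: use the extracted pointer
#else
#define IF_BOUND_PMF(Then, Else) Else  // other compilers: plain virtual call
#endif

// Hypothetical stand-in for COIndex::Iterator.
struct Iterator {
  virtual ~Iterator() = default;
  virtual bool Next() = 0;
};

// Hypothetical tiny iterator whose Next() is cheap.
struct RangeIter : Iterator {
  std::size_t pos_ = 0;
  std::size_t max_pos_;
  explicit RangeIter(std::size_t n) : max_pos_(n) {}
  bool Next() override {
    if (pos_ >= max_pos_) return false;
    ++pos_;
    return true;
  }
};

// Assumed shape of IterScanFN: the iterator is passed as an explicit argument.
typedef bool (*IterScanFN)(Iterator*);

// Hypothetical stand-in for the TableIterator.
struct ScanDriver {
  Iterator* iter_ = nullptr;
  IterScanFN iter_next_ = nullptr;

  void SetIter(Iterator* it) {
    iter_ = it;
    // Resolve the virtual Next() once, when the iterator is attached.
    IF_BOUND_PMF(iter_next_ = (IterScanFN)(iter_->*(&Iterator::Next)), (void)0);
  }
  bool InvokeNext() { return IF_BOUND_PMF(iter_next_(iter_), iter_->Next()); }
};
```

A scan attaches the iterator once and then loops with `while (d.InvokeNext()) ...`. Any speedup depends on the compiler and CPU. Measure it before relying on it.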