Runtime optimization of C++ code with virtual functions

Is there any possible method using LLVM that would help in this case?

LLVM won't help in this case.

Is that really so, or does it mean that LLVM has no prebuilt solution?

It means that LLVM doesn't have any trivial builtin solution.

I'm asking because (without having ever looked seriously into LLVM) I
was thinking to experiment along these lines:

class Source {
  void send (T data) {
    invoke_jit_magic();
    transport (data);
  }
};

transport() would be a virtual method, as in the original posting. In my
case send() would be part of the framework, so it is not a problem to
add the invoke_jit_magic call. In other cases it might be trickier.

Ok.

On the first call, invoke_jit_magic gains control and traverses the binary,
converting (a subset of) what it finds to LLVM IR, until it gets to the
concrete target. It may have to do a bit of work to understand how
parameters are passed to the transport code (it is a virtual function
call and might be messy in the presence of multiple/virtual inheritance).
After that, the LLVM JIT can be used to replace the original binary fragment
with something faster.

Ok.
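
A rough sketch of the control flow described above, in portable C++ and
without any LLVM involvement. All names here (fast_path, resolve_and_send,
Loopback, send_loopback) are hypothetical, and the hand-written type check
stands in for what a real binary-to-IR translator would have to discover
on its own. The first call to send() lands in a resolver, which picks a
stub and patches the slot so later calls skip the resolution step.

```cpp
#include <cassert>
#include <typeinfo>

struct Source;
using SendFn = void (*)(Source&, int);

// First-call resolver; plays the role of invoke_jit_magic() in the thread.
void resolve_and_send(Source& self, int data);

struct Source {
    virtual ~Source() = default;
    virtual void transport(int data) = 0;   // the virtual method under discussion

    void send(int data) { fast_path(*this, data); }

    SendFn fast_path = &resolve_and_send;   // patched after the first call
    int resolutions = 0;                    // counts how often resolution ran
};

// Hypothetical concrete target, assumed to be the only "known" type.
struct Loopback final : Source {
    int last = 0;
    void transport(int data) override { last = data; }
};

// Stub with the call already resolved: the qualified name suppresses
// virtual dispatch, so the compiler is free to inline the body.
void send_loopback(Source& self, int data) {
    assert(typeid(self) == typeid(Loopback));
    static_cast<Loopback&>(self).Loopback::transport(data);
}

// Fallback for types the resolver does not recognise.
void send_virtual(Source& self, int data) { self.transport(data); }

void resolve_and_send(Source& self, int data) {
    ++self.resolutions;
    self.fast_path = (typeid(self) == typeid(Loopback)) ? &send_loopback
                                                        : &send_virtual;
    self.fast_path(self, data);             // complete the original call
}
```

Note that the call through fast_path is still an indirect call, so this
only models the "resolve once, then patch" structure. The actual gain would
have to come from the JIT rewriting the call site itself.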

I agree with the suggestion of using templates when possible.

But this works at compile time only, right?

In my case
it is not doable because transport would be proprietary and the code
containing it distributed only as binary.

Ok.

I understand that the disassembling portion needs to be rewritten. Is
there anything else that would prevent this approach from working?
Again, I haven't looked into LLVM yet, so I can imagine there might be
problems in describing physical registers in the IR, and at some point
stuff must be exactly where the pre-existing code expects it. I don't
want to take your time, but if you could elaborate a bit it might
prevent me from going down the wrong path.

This should work; I don't expect you to run into any significant problems.
When you're rewriting the LLVM IR for the indirect call, you can just
replace it with a direct call to the native code.

Compared to template-based specialization, this would have the advantage of
being dynamic.

Stephane Letz

But templates have the advantage of being able to be inlined. This is a much
more important transformation than simply converting an indirect call to a
direct one, especially on modern implementations like Core or Opteron.
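
For comparison, a minimal sketch of the template-based alternative, written
here with CRTP. The names Loopback, last, count and send_twice are made up
for illustration. Because the concrete type is a compile-time parameter,
the call to transport() is an ordinary direct call that the compiler can
inline into send(), and send() in turn into its callers.

```cpp
#include <cassert>

// The framework side: Derived is supplied at compile time, so there is no
// vtable lookup and nothing left to resolve at run time.
template <typename Derived>
class Source {
public:
    void send(int data) {
        static_cast<Derived*>(this)->transport(data);
    }
};

// Hypothetical concrete transport; note that transport() is not virtual.
class Loopback : public Source<Loopback> {
public:
    void transport(int data) {
        last = data;
        ++count;
    }
    int last = 0;
    int count = 0;
};

// Small usage example: both sends are candidates for full inlining.
int send_twice() {
    Loopback wire;
    wire.send(1);
    wire.send(2);
    assert(wire.count == 2);
    return wire.last;
}
```

The price is the one already mentioned in the thread: the source of the
concrete transport has to be visible when the framework code is compiled.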

Your approach is going to make inlining very difficult, I think. Not that
there's a whole lot that can be done about it, given the binary translation
going on. For example, how would you inline calls to send() where transport()
has been inlined (assuming send() wasn't already inlined)?

Is there some other set of transformations you have in mind to generate more
efficient code for transport() at run time? Partial evaluation might be
interesting, but that's applicable whether or not transport() is virtual. In
fact, virtual call resolution is a form of partial evaluation where the
run-time constants are the "this" pointer and its most-derived subclass type.

If you really want to generate fast code, it might be worth your while to
implement more general partial evaluation and specialization. If you make it
general enough, you'll get run-time virtual call resolution "for free."
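
To make the "virtual call resolution as partial evaluation" view concrete,
here is a small sketch. The names specialize, Doubler and Negator are
invented for the example. The function takes the run-time constants (the
object and its most-derived type) and returns a residual function of the
remaining argument only. A lambda merely imitates the shape of the residual
program; a real partial evaluator would emit new code, and std::function
adds an indirection of its own.

```cpp
#include <cassert>
#include <functional>
#include <typeinfo>

struct Source {
    virtual ~Source() = default;
    virtual int transport(int data) = 0;
};

// Two hypothetical concrete types; only Doubler is known to the specializer.
struct Doubler final : Source {
    int transport(int data) override { return 2 * data; }
};
struct Negator final : Source {
    int transport(int data) override { return -data; }
};

// Specialize "s.transport(data)" with respect to a fixed s.
std::function<int(int)> specialize(Source& s) {
    if (typeid(s) == typeid(Doubler)) {
        // Dynamic type is known: bind a non-virtual, inlinable call.
        Doubler* d = static_cast<Doubler*>(&s);
        return [d](int data) { return d->Doubler::transport(data); };
    }
    // Type not recognised: the residual function still dispatches virtually.
    Source* p = &s;
    return [p](int data) { return p->transport(data); };
}

// Usage: the residual function no longer needs the object passed in.
int run_specialized() {
    Doubler d;
    std::function<int(int)> f = specialize(d);
    assert(f(1) == 2);
    return f(21);
}
```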

You might also have a look at the Self papers. The Self team did a lot of
work on runtime optimization of dynamic dispatch. IIRC they also did some
partial evaluation work.

                                           -Dave