Runtime optimization of C++ code with virtual functions

Let's say we have the following scheme using C++ and virtual functions:

class DSP {

  public:

  DSP() {}
  virtual ~DSP() {}

  virtual int Compute(int count, float** in, float** out) = 0;
};

class CONCRETE_DSP : public DSP {

  public:

  CONCRETE_DSP() {}
  virtual ~CONCRETE_DSP() {}

  virtual int Compute(int count, float** in, float** out)
  {
    DoSomeProcess();  // placeholder for the actual signal processing
    return count;     // placeholder return value
  }
};
      
class SEQ_DDSP : public DSP {

   private:
  
  DSP* fArg1;
  DSP* fArg2;
  
   public:

  SEQ_DDSP(DSP* a1, DSP* a2):fArg1(a1), fArg2(a2) {}
  virtual ~SEQ_DDSP() {delete fArg1; delete fArg2;}
  
  virtual int Compute(int count, float** in, float** out)
  {
    // Some code that uses:
    fArg1->Compute(count, in, out);
    return fArg2->Compute(count, in, out);
  }
};

class PAR_DSP : public DSP {

   private:
  
  DSP* fArg1;
  DSP* fArg2;
  
   public:

  PAR_DSP(DSP* a1, DSP* a2):fArg1(a1), fArg2(a2) {}
  virtual ~PAR_DSP() {delete fArg1; delete fArg2;}
  
  virtual int Compute(int count, float** in, float** out)
  {
    // Some code that uses:
    fArg1->Compute(count, in, out);
    return fArg2->Compute(count, in, out);
  }

};

void ProcessGraph (float** in, float** out)
{
  DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new CONCRETE_DSP()), new CONCRETE_DSP());
  graph->Compute(512, in, out);
  delete graph;
}

At runtime, after a graph is created, one could imagine optimizing it by resolving the calls to the virtual Compute method and possibly obtaining a more efficient Compute method for the entire graph, so that we could write:

DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new CONCRETE_DSP()), new CONCRETE_DSP());

graph->Optimize();

graph->Compute(512, in, out);  // possibly called many times

Is there any possible method using LLVM that would help in this case?

Thanks

Stephane Letz

LLVM won't help in this case. However, I'd strongly recommend dropping the virtual functions and using template instantiation to get this. That way you'd do something like:

   PAR_DSP<SEQ_DDSP<CONCRETE_DSP, CONCRETE_DSP>, CONCRETE_DSP> X;
   X.Compute(512, in, out);

This will be efficient even when statically compiled.

-Chris
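
For illustration, the template suggestion above might look roughly like the following. This is only a sketch under stated assumptions: the body of CONCRETE_DSP::Compute (a gain of 0.5 on channel 0) is a hypothetical stand-in for DoSomeProcess(), the return value is assumed to be the frame count, and both combinators simply call their two arguments on the same buffers, mirroring the placeholder bodies in the original classes. Because every Compute call is resolved at compile time, the compiler is free to inline the whole graph.

```cpp
// Leaf node. The processing here (gain of 0.5 on channel 0) is a
// hypothetical stand-in for DoSomeProcess() in the original post.
class CONCRETE_DSP {
  public:
    int Compute(int count, float** in, float** out)
    {
        for (int i = 0; i < count; i++) {
            out[0][i] = in[0][i] * 0.5f;
        }
        return count;  // assumed meaning: number of frames processed
    }
};

// Sequential composition: the output of A is fed, in place, to B.
template <class A, class B>
class SEQ_DDSP {
  private:
    A fArg1;  // held by value, so there is no pointer and no vtable
    B fArg2;
  public:
    int Compute(int count, float** in, float** out)
    {
        fArg1.Compute(count, in, out);
        return fArg2.Compute(count, out, out);
    }
};

// Parallel composition. As in the original placeholder code, both
// arguments are simply called on the same buffers.
template <class A, class B>
class PAR_DSP {
  private:
    A fArg1;
    B fArg2;
  public:
    int Compute(int count, float** in, float** out)
    {
        fArg1.Compute(count, in, out);
        return fArg2.Compute(count, in, out);
    }
};
```

The graph is then an ordinary object, for example PAR_DSP<SEQ_DDSP<CONCRETE_DSP, CONCRETE_DSP>, CONCRETE_DSP> X; followed by X.Compute(512, in, out);. The trade-off is that the shape of the graph must be known at compile time, which is exactly what the virtual-function version avoids.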

Is that so, or does it mean that LLVM doesn't have a prebuilt solution?
I'm asking because (without having ever looked seriously into LLVM) I was
thinking of experimenting along these lines:

class Source {
  public:
  void send (T data) {
    invoke_jit_magic();
    transport (data);
  }
};

transport() would be a virtual method, as in the original posting. In my case send() would be
part of the framework, so it is not a problem to add the invoke_jit_magic call. In other cases it might be trickier.

On the first call, invoke_jit_magic gains control and traverses the binary, converting (a subset of) what it finds
to LLVM IR, until it gets to the concrete target. It may have to do a bit of work to understand how parameters
are passed to the transport code (it is a virtual function call and might be messy in the presence of multiple/virtual inheritance).
After that, the LLVM JIT can be used to replace the original binary fragment with something faster.

I agree with the suggestion of using templates when possible. In my case it is not doable because transport would be
proprietary and the code containing it distributed only as a binary.

I understand that the disassembling portion needs to be rewritten. Is there anything else that would prevent this
approach from working?
Again, I haven't looked into LLVM yet, so I can imagine there might be problems in describing physical registers in the
IR, and at some point things must be exactly where the pre-existing code expects them.
I don't want to take up your time, but if you could elaborate a bit it might prevent me from going down the wrong path.

Best regards,

    Maurizio

Is there any possible method using LLVM that would help in this case?

LLVM won't help in this case.

Is that so, or does it mean that LLVM doesn't have a prebuilt solution?

It means that LLVM doesn't have any trivial builtin solution.

I'm asking because (without having ever looked seriously into LLVM) I was thinking to experiment along these lines:

class Source {
  public:
  void send (T data) {
    invoke_jit_magic();
    transport (data);
  }
};

transport() would be a virtual method, as in the original posting. In my case send() would be part of the framework, so it is not a problem to add the invoke_jit_magic call. In other cases it might be trickier.

Ok.

On the first call, invoke_jit_magic gains control and traverses the binary, converting (a subset of) what it finds to LLVM IR, until it gets to the concrete target. It may have to do a bit of work to understand how parameters are passed to the transport code (it is a virtual function call and might be messy in the presence of multiple/virtual inheritance). After that, the LLVM JIT can be used to replace the original binary fragment with something faster.

Ok.

I agree with the suggestion of using templates when possible. In my case it is not doable because transport would be proprietary and the code containing it distributed only as a binary.

Ok.

I understand that the disassembling portion needs to be rewritten. Is there anything else that would prevent this approach from working? Again, I haven't looked into LLVM yet, so I can imagine there might be problems in describing physical registers in the IR, and at some point things must be exactly where the pre-existing code expects them. I don't want to take up your time, but if you could elaborate a bit it might prevent me from going down the wrong path.

This should work; I don't expect you to run into any significant problems. When you're rewriting the LLVM IR for the indirect call, you can just replace it with a direct call to the native code.

-Chris