[patch] CodeEmitter Memory Footprint Reduction

Chris,

The basic idea of using templates in conjunction with inlining is efficiency.

Out of roughly 10,000 calls made to emit 10K of code, about 6,500 output bytes and another 1,750 output words; paying a virtual call for each does not entice me to use virtual calls.

I understand that you say that, but I can’t bring myself to care at this point. Have you thought about how many cycles are already used to produce the instructions that lead to the emission of those 10K bytes? The total percentage of time spent doing these virtual calls will be tiny compared to the total time to generate the code.

If you switch to using virtual functions, get the code working, and we measure a performance problem, then we can fix it. There are much better ways to do this than templating the whole code emitter.

What's Daniel's approach? Does he have any online documentation or code, or an email address so I can talk to him?

Take a look at how asmprinters work in include/llvm/Target/TargetRegistry.h. If you have specific questions, llvmdev is a great place to ask them.

-Chris

2009/7/16 Chris Lattner <clattner@apple.com>


That means JIT code also has the virtual function overhead too, this will slow down existing JIT code. Templates are already there and they work and they do not take up too much memory.


Okay I will take a look.

Aaron


I don't have any documentation yet other than the doxygen comments
(some will be added at least before 2.6), but the basic idea is that
there is one global Target instance per target, and targets register
optional components via initialization functions (which can be called
via static constructors, or explicitly by the client).

Clients of the targets simply request a Target, which will always be
linked in, and look to see if the optional functionality is present
(i.e. was linked in).

- Daniel

As I said before, if you are compelled to, feel free to continue with your approach of premature optimization. I will fix it later.

-Chris

2009/7/16 Daniel Dunbar <daniel@zuster.org>



Okay, so features must be linked in rather than being available from dynamically linked libraries.

This does not really help.

Sorry,

Aaron


If you are not too bothered by the memory overhead in the short term, then it is probably best to leave the code as it is for the meantime.

I think the original MachineCodeEmitter design, with inline emit* functions plus virtual functions like extend(), would let us control memory management in subclasses.

class MachineCodeEmitter {
public:

  inline void emitByte(uint8_t b) {
    if (!freespace())
      extend();        // grow the buffer, then carry on emitting
    *BufferPtr++ = b;  // store the byte and advance
  }

protected:
  virtual void extend() = 0;
};

so extend() is overridden in JITCodeEmitter and ObjectCodeEmitter; when we run out of memory it is called to move the emitted code to a bigger buffer, and emission then continues.

This gives the lowest overhead and the most flexibility. If ObjectCodeEmitter and JITEmitter manage memory, then the design is transparent to the higher levels of DOE and JIT.

The old design was the best design; it just needed an extend() method.

Aaron


Oh, and JITEmitter::finishFunction will have to provide relocations for JIT too.

Aaron

Library features do not have to be linked in; however, that is the
mechanism we want to use for the core LLVM libraries. The registry
mechanism itself doesn't care whether it is called via an
initialization function invoked directly, or a static constructor (etc.)
run during the loading of a dynamic library.

- Daniel


Hi Daniel,

Sorry, I really don't follow what you're saying.

Aaron