MachO and ELF Writers/MachineCodeEmitters are hard-coded into LLVMTargetMachine

someguy · March 15, 2009, 11:34am

Currently, the MachO and ELF Writers and MachineCodeEmitters are
hard-coded into LLVMTargetMachine and llc.

In other words, the 'object file generation' capabilities of the
Common Code Generator are not generic.

LLVMTargetMachine::addPassesToEmitFile explicitly checks whether the
derived backend TargetMachine implements one of getMachOWriterInfo or
getELFWriterInfo, and returns a corresponding FileModel enum value.

llc's main function uses the resulting FileModel value to determine
which of the {AddMachOWriter,AddELFWriter} functions to call.

This is limiting for a number of reasons:
1. If a given platform (e.g. x86) may support both MachO and ELF,
MachO will be selected, as it is checked first. This is bad behaviour;
it should be up to the user to decide which object format he wants.
2. Extension of the object file generation capabilities to include new
object file formats is difficult, and requires modifications to LLVM
code (not just a plugin).
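
To make the first point concrete, here is a simplified model of the selection order described above. It is not the real LLVM code: the enum, the struct and both functions are illustrative stand-ins I made up for this sketch.

```cpp
#include <cassert>

// Simplified model only; these names are hypothetical, not LLVM's.
enum class FileModel { Error, MachOFile, ElfFile };

// Which writer-info hooks a (hypothetical) backend implements.
struct BackendCaps {
  bool HasMachOWriterInfo;
  bool HasELFWriterInfo;
};

// Hard-coded order: MachO is tested first, so it wins whenever both exist.
inline FileModel pickHardCoded(const BackendCaps &B) {
  if (B.HasMachOWriterInfo) return FileModel::MachOFile;
  if (B.HasELFWriterInfo)   return FileModel::ElfFile;
  return FileModel::Error;
}

// User-driven alternative: honour the requested format if the backend has it.
inline FileModel pickRequested(const BackendCaps &B, FileModel Wanted) {
  if (Wanted == FileModel::MachOFile && B.HasMachOWriterInfo) return Wanted;
  if (Wanted == FileModel::ElfFile && B.HasELFWriterInfo)     return Wanted;
  return FileModel::Error;
}
```

With a backend that implements both hooks, pickHardCoded can never return ElfFile, whereas pickRequested leaves the choice to the caller.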

I suggest transforming the {getMachOWriterInfo, getELFWriterInfo}
functions (on TargetMachine) into a single (templated?)
getObjectFileWriterInfo function. Additionally, an addObjectFileWriter
member should be added to TargetMachine, taking the place of the
static {AddMachOWriter, AddELFWriter} functions.

As I need this functionality (custom object file generation) for my
current target, I'd be happy to make the modifications to the LLVM
core. Before I do so, I'd like to get feedback on my proposed
solution.

I've added a bug for this issue: http://llvm.org/bugs/show_bug.cgi?id=3813

Aaron_Gray1 · March 15, 2009, 8:39pm

Currently, the MachO and ELF Writers and MachineCodeEmitters are
hard-coded into LLVMTargetMachine and llc.

I am also interested in working on this area, and in writing a COFF file backend.

In other words, the 'object file generation' capabilities of the
Common Code Generator are not generic.

I was looking at making a parallel class to MachineCodeEmitter, 'MachineCodeWriter', that can be used generically in place of MachineCodeEmitter to write to a supplied 'vector<byte>'. This would not introduce any overhead to the existing runtime code and would allow inlining of the writing functions in X86CodeEmitter and the other emitters. They would have to be templated and the MCE member parameterized.
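
A rough sketch of what parameterizing the MCE member could look like. Everything here is hypothetical (VectorCodeWriter, SizeOnlyWriter and ToyX86CodeEmitter are made-up names, not LLVM classes); it only shows that a templated emitter can write through any type offering the same emit calls, with no virtual dispatch in the way of inlining.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical writer that appends to a caller-supplied byte vector,
// e.g. the data of an object file section.
class VectorCodeWriter {
public:
  explicit VectorCodeWriter(std::vector<std::uint8_t> &Section) : Out(Section) {}
  void emitByte(std::uint8_t B) { Out.push_back(B); }
  void emitWordLE(std::uint32_t W) {
    for (int I = 0; I < 4; ++I)
      emitByte(static_cast<std::uint8_t>((W >> (8 * I)) & 0xFF));
  }
private:
  std::vector<std::uint8_t> &Out;
};

// A second hypothetical writer with the same calls that only counts bytes.
class SizeOnlyWriter {
public:
  void emitByte(std::uint8_t) { ++Size; }
  void emitWordLE(std::uint32_t) { Size += 4; }
  std::size_t size() const { return Size; }
private:
  std::size_t Size = 0;
};

// Toy stand-in for a target code emitter, templated on the writer type.
template <class CodeEmitter>
class ToyX86CodeEmitter {
public:
  explicit ToyX86CodeEmitter(CodeEmitter &E) : MCE(E) {}
  // x86 "mov eax, imm32": opcode 0xB8 followed by a little-endian immediate.
  void emitMovEAXImm32(std::uint32_t Imm) {
    MCE.emitByte(0xB8);
    MCE.emitWordLE(Imm);
  }
  // x86 near "ret": opcode 0xC3.
  void emitRet() { MCE.emitByte(0xC3); }
private:
  CodeEmitter &MCE; // the parameterized "MCE" member
};
```

The same ToyX86CodeEmitter body works with either writer, so the existing JIT path would not need to change in order to add a vector-backed one.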

LLVMTargetMachine::addPassesToEmitFile explicitly checks whether the
derived backend TargetMachine implements one of getMachOWriterInfo or
getELFWriterInfo, and returns a corresponding FileModel enum value.

llc's main function uses the resulting FileModel value to determine
which of the {AddMachOWriter,AddELFWriter} functions to call.

This is limiting for a number of reasons:
1. If a given platform (e.g. x86) may support both MachO and ELF,
MachO will be selected, as it is checked first. This is bad behaviour,
it should be up to the user to decide which object format he wants.
2. Extension of the object file generation capabilities to include new
object file formats is difficult, and requires modifications to LLVM
code (not just a plugin).

I suggest transforming the {getMachOWriterInfo, getELFWriterInfo}
functions (on TargetMachine) into a single (templated?)
getObjectFileWriterInfo function. Additionally a addObjectFileWriter
member should be added to TargetMachine, taking the place of the
static {AddMachOWriter, AddELFWriter} functions.

As I need this functionality (custom object file generation) for my
current target, I'd be happy to make the modifications to the LLVM
core. Before I do so, I'd like to get feedback on my proposed
solution.

I've added a bug for this issue: 3813 – MachO and ELF Writers/MachineCodeEmitters are hard-coded into LLVMTargetMachine

Aaron

some_guy · March 15, 2009, 9:09pm

I like the idea of a generic MachineCodeWriter, although I prefer the
name 'ObjectFileWriter'...

I think we need to take a hard look at which bits of the
Writer/Emitter infrastructure are needed for what tasks (Object File
Emission, JIT, etc.) and make sure that our abstractions are flexible
enough... As it stands at the moment, the Writer and Emitter classes
could definitely be merged (at least from the perspective of object
file generation).

At the moment, the Writer and Emitter are declared friend, and the
encapsulation is all broken anyhow... I'd like to rethink the whole
model a little...

In general, I think that a TargetMachine should expose a
'getObjectFileWriter' method, which could be used to obtain an object
file generator. An additional method should be available to allow
users of the TargetMachine to query which types of Object Files the
TargetMachine supports.

llc could then be simply re-written to use these generic functions
instead of the hard-coded MachO and ELF ones.
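
As a starting point for discussion, the two methods could look roughly like this. It is only a sketch under my own assumptions: ObjectFormat, ObjectFileWriterSketch, TargetMachineSketch and getSupportedObjectFormats are hypothetical names, and a real writer would be a pass rather than a bare struct.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical names throughout; none of these exist in LLVM.
enum class ObjectFormat { ELF, MachO, COFF };

// Stand-in for an object file writer.
struct ObjectFileWriterSketch {
  virtual ~ObjectFileWriterSketch() = default;
  virtual std::string formatName() const = 0;
};

struct ELFWriterSketch : ObjectFileWriterSketch {
  std::string formatName() const override { return "ELF"; }
};

struct MachOWriterSketch : ObjectFileWriterSketch {
  std::string formatName() const override { return "MachO"; }
};

// Stand-in for TargetMachine with the two proposed methods.
struct TargetMachineSketch {
  virtual ~TargetMachineSketch() = default;
  // Lets a driver such as llc ask which formats the target can emit.
  virtual std::vector<ObjectFormat> getSupportedObjectFormats() const = 0;
  // Returns nullptr when the requested format is not supported.
  virtual std::unique_ptr<ObjectFileWriterSketch>
  getObjectFileWriter(ObjectFormat F) const = 0;
};

// A toy target that supports two formats.
struct X86LikeTargetSketch : TargetMachineSketch {
  std::vector<ObjectFormat> getSupportedObjectFormats() const override {
    return {ObjectFormat::ELF, ObjectFormat::MachO};
  }
  std::unique_ptr<ObjectFileWriterSketch>
  getObjectFileWriter(ObjectFormat F) const override {
    switch (F) {
    case ObjectFormat::ELF:   return std::make_unique<ELFWriterSketch>();
    case ObjectFormat::MachO: return std::make_unique<MachOWriterSketch>();
    default:                  return nullptr;
    }
  }
};
```

A driver would then pick a format from the user's request, check it against getSupportedObjectFormats, and add whatever getObjectFileWriter returns, without naming MachO or ELF anywhere.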

Aaron_Gray1 · March 15, 2009, 10:52pm

I like the idea of a generic MachineCodeWriter, although I prefer the
name 'ObjectFileWriter'...

That's much more descriptive of the functionality.

I think we need to take a hard look at which bits of the
Writer/Emitter infrastructure are needed for what tasks (Object File
Emission, JIT, etc.) and make sure that our abstractions are flexible
enough...

I would suggest becoming very familiar with the current code (the JIT, MachineCodeEmitter, and the X86 and other CodeEmitter code) before jumping in.

As it stands at the moment, the Writer and Emitter classes
could definitely be merged (at least from the perspective of object
file generation).

I would not do this; their functionality is distinct.

The MachineCodeEmitter is specifically used for the JIT. It works fine for now, and I think we should leave it alone!

I did a patch, not yet accepted, that deals with the GVStub methods by moving them into the JITEmitter class. It moved several anonymous-namespace classes into the llvm namespace and moved the main JITEmitter class into a header file, which gave it visibility too. NOTE: the doxygen API documentation does not show such anonymous-namespace classes.

I looked into using two MachineCodeEmitter objects in the JITEmitter class, with the second dealing with stub generation instead, but this got messy.

Just parameterizing the X86CodeEmitter and the others gives us the base level of flexibility and means we do not have to disturb the existing JIT code too much.

As you probably know, object file emission is not working at all at present; the upper levels have been written out of SVN some time ago.

At the moment, the Writer and Emitter are declared friend, and the
encapsulation is all broken anyhow... I'd like to rethink the whole
model a little...

My inclination is to go down that route theoretically, then step back to where we are and look at incremental changes that do not disturb the status quo too much; otherwise we will not get our patches through.

In general, I think that a TargetMachine should expose a
'getObjectFileWriter' method, which could be used to obtain an object
file generator. An additional method should be available to allow
users of the TargetMachine to query which types of Object Files the
TargetMachine supports.

Okay with that.

llc could then be simply re-written to use these generic functions
instead of the hard-coded MachO and ELF ones.

Okay, this gives more flexibility and usability.

Aaron_Gray1 · March 16, 2009, 3:26am

Sorry, I disagree: the MachineCodeEmitter or the ‘MachineCodeWriter’ does not do any file handling at all. Do look at the code for the MachineCodeWriter and you will see it only writes to memory, and if it reaches the end of the allotted memory I believe higher-level logic reallocates a larger buffer and starts again from scratch. This could be avoided if they generated fixups for absolute memory references that refer to locations within the emitted code. Then an alloc function could be called before outputting, say, a 4-byte int; it could realloc and copy the code, and once everything is finally written the fixups could be applied.

I am also wondering about the efficiency of std::vector: could we use that for the MachineCodeWriter, or should we write our own code output stream/buffering?

I still think this is where the crux of the problem lies. The upper logic is relatively simple compared to this, but looking at what you say, it is important to get it right.

Cheers,

Aaron

Aaron_Gray1 · March 16, 2009, 4:27am

‘ObjectCodeEmitter’ looks like the right description to parallel the MachineCodeEmitter. It's emitting object code to a data stream (which is an object file section) and not directly to a file.

I will knock together an ObjectCodeEmitter that is call-compatible with the MachineCodeEmitter and writes to a std::vector, so it could replace the MachineCodeEmitter class generically in usage.

This needs a lot of thought to get things right, and the right incremental patches to get it accepted.

Cheers,

Aaron

some_guy · March 16, 2009, 6:39am

Sorry, I disagree: the MachineCodeEmitter or the
'MachineCodeWriter' does not do any file handling at all. Do look at the
code for the MachineCodeWriter and you will see it only writes to memory,
and if it reaches the end of the allotted memory I believe higher-level
logic reallocates a larger buffer and starts again from scratch. This could
be avoided if they generated fixups for absolute memory references that refer
to locations within the emitted code. Then an alloc function could be called
before outputting, say, a 4-byte int; it could realloc and copy the code, and
once everything is finally written the fixups could be applied.

IIRC the memory allocation is done in the MachineCodeEmitter, not the
higher level (see startFunction and finishFunction). The current
implementation has startFunction allocate some (arbitrary) reserve
size in the output vector, and if the emitter runs out of space,
finishFunction returns a failure, causing the whole process to occur
again. This is icky.

It would be far better if the underlying buffer would grow
automatically (with an allocation method in the base class, as you
suggested), as code is emitted to it.

'ObjectCodeEmitter' looks like the right description to parallel the
MachineCodeEmitter. Its emitting object code to a data stream (which
is an object file section) and not direct to a file.

I can live with that. Before you implement anything, can we try and
define the responsibilities of the various classes?

We have MachineCodeEmitter, which is responsible for actually emitting
bytes into a buffer for a function. Should it have methods for
emitting instructions/operands, or should it only work at the byte,
dword, etc. level?

ObjectCodeEmitter, is responsible for emission of object 'files' into
a memory buffer. This includes handling of all object headers,
management of sections/segments, symbol and string tables and
relocations. The ObjectCodeEmitter should delegate all actual 'data
emission' to the MachineCodeEmitter.

ObjectCodeEmitter is a MachineFunctionPass. It does 'object wide'
setup in doInitialization and finalizes the object in doFinalize(?).
Each MachineFunction is emitted through the runOnFunction method,
which passes the MachineFunction to the MachineCodeEmitter. The
MachineCodeEmitter calls back to the ObjectCodeEmitter in order to
look up sections/segments, add globals to an unresolved globals list
etc.

I'm not too happy about the broken encapsulation here. I'd prefer to
find a better way to model this.

someguy · March 16, 2009, 6:49am

Sorry, I disagree: the MachineCodeEmitter or the
'MachineCodeWriter' does not do any file handling at all. Do look at the
code for the MachineCodeWriter and you will see it only writes to memory,
and if it reaches the end of the allotted memory I believe higher-level
logic reallocates a larger buffer and starts again from scratch. This could
be avoided if they generated fixups for absolute memory references that refer
to locations within the emitted code. Then an alloc function could be called
before outputting, say, a 4-byte int; it could realloc and copy the code, and
once everything is finally written the fixups could be applied.

IIRC the memory allocation is done in the MachineCodeEmitter, not the
higher level (see startFunction and finishFunction). The current
implementation has startFunction allocate some (arbitrary) reserve
size in the output vector, and if the emitter runs out of space,
finishFunction returns a failure, causing the whole process to occur
again. This is icky.

It would be far better if the underlying buffer would grow
automatically (with an allocation method in the base class, as you
suggested), as code is emitted to it.

'ObjectCodeEmitter' looks like the right description to parallel the
MachineCodeEmitter. Its emitting object code to a data stream (which
is an object file section) and not direct to a file.

I can live with that. Before you implement anything, can we try and
define the responsibilities of the various classes?

We have MachineCodeEmitter, which is responsible for actually emitting
bytes into a buffer for a function. Should it have methods for
emitting instructions/operands, or should it only work at the byte,
dword, etc. level?

ObjectCodeEmitter, is responsible for emission of object 'files' into
a memory buffer. This includes handling of all object headers,
management of sections/segments, symbol and string tables and
relocations. The ObjectCodeEmitter should delegate all actual 'data
emission' to the MachineCodeEmitter.

ObjectCodeEmitter is a MachineFunctionPass. It does 'object wide'
setup in doInitialization and finalizes the object in doFinalize(?).
Each MachineFunction is emitted through the runOnFunction method,
which passes the MachineFunction to the MachineCodeEmitter. The
MachineCodeEmitter calls back to the ObjectCodeEmitter in order to
look up sections/segments, add globals to an unresolved globals list
etc.

I'm not too happy about the broken encapsulation here. I'd prefer to
find a better way to model this.

Aaron_Gray1 · March 16, 2009, 1:50pm

Sorry, I disagree: the MachineCodeEmitter or the
'MachineCodeWriter' does not do any file handling at all. Do look at the
code for the MachineCodeWriter and you will see it only writes to memory,
and if it reaches the end of the allotted memory I believe higher-level
logic reallocates a larger buffer and starts again from scratch. This could
be avoided if they generated fixups for absolute memory references that refer
to locations within the emitted code. Then an alloc function could be called
before outputting, say, a 4-byte int; it could realloc and copy the code, and
once everything is finally written the fixups could be applied.

IIRC the memory allocation is done in the MachineCodeEmitter, not the
higher level (see startFunction and finishFunction). The current
implementation has startFunction allocate some (arbitrary) reserve
size in the output vector, and if the emitter runs out of space,
finishFunction returns a failure, causing the whole process to occur
again. This is icky.

Going from the doxygen documentation, which I doubt has changed: MachineCodeEmitter::(start/finish)Function are both abstract functions; it is the hidden class JITEmitter that implements them. MachineCodeEmitter is an abstract class, but it does provide start, end and current pointers.

It would be far better if the underlying buffer would grow
automatically (with an allocation method in the base class, as you
suggested), as code is emitted to it.

Yes. An alloc(4), for example, would make sure there are another 4 bytes to be written to; if not, it would copy the whole buffer and allocate, say, 4/3 more memory. The only problem is non-PIC (Position Independent Code): this would require storing fixups, which could probably be done via the relocation mechanism.

I want to use a straight 'std::vector<byte>', or a reference to one, for the ObjectCodeEmitter.
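
Something along these lines is what I have in mind, as a minimal sketch only. GrowableCodeBuffer and its members are made-up names, not existing LLVM code. Instead of an explicit alloc(4) it leans on std::vector's own growth (the growth factor is up to the library), and it records fixups as offsets so that a reallocation never invalidates them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch; not LLVM's MachineCodeEmitter.
class GrowableCodeBuffer {
public:
  // A 4-byte slot at PatchOffset that must later hold the absolute
  // address of the byte at TargetOffset in this same buffer.
  struct Fixup {
    std::size_t PatchOffset;
    std::size_t TargetOffset;
  };

  // push_back grows the storage when needed, so emission never "runs out".
  void emitByte(std::uint8_t B) { Bytes.push_back(B); }

  void emitWordLE(std::uint32_t W) {
    for (int I = 0; I < 4; ++I)
      emitByte(static_cast<std::uint8_t>((W >> (8 * I)) & 0xFF));
  }

  // Emit a zero placeholder and remember where to patch it.
  void emitAbsoluteRef(std::size_t TargetOffset) {
    Fixups.push_back(Fixup{Bytes.size(), TargetOffset});
    emitWordLE(0);
  }

  // Once the final base address is known, fill in every placeholder.
  void applyFixups(std::uint32_t BaseAddress) {
    for (const Fixup &F : Fixups) {
      assert(F.PatchOffset + 4 <= Bytes.size() && "fixup out of range");
      std::uint32_t Addr =
          BaseAddress + static_cast<std::uint32_t>(F.TargetOffset);
      for (int I = 0; I < 4; ++I)
        Bytes[F.PatchOffset + I] =
            static_cast<std::uint8_t>((Addr >> (8 * I)) & 0xFF);
    }
  }

  const std::vector<std::uint8_t> &data() const { return Bytes; }

private:
  std::vector<std::uint8_t> Bytes;
  std::vector<Fixup> Fixups;
};
```

In an object file the fixup list would presumably become relocation entries rather than being applied in place, but the bookkeeping is the same.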

Anyway, I think we bear this in mind but should leave this code alone for now and come back to it once we have ObjectCodeWriters in place. (This is political.)

'ObjectCodeEmitter' looks like the right description to parallel the
MachineCodeEmitter. Its emitting object code to a data stream (which
is an object file section) and not direct to a file.

I can live with that. Before you implement anything, can we try and
define the responsibilities of the various classes?

This is pretty clear-cut: read and reread the code. But adding some more documentation would help.

We have MachineCodeEmitter, which is responsible for actually emitting
bytes into a buffer for a function.

Yep.

Should it have methods for
emitting instructions/operands, or should it only work at the byte,
dword, etc. level?

No, this is done by X86CodeEmitter and the other ***CodeEmitters. They are in anonymous namespaces, but look in the 'lib/Target/*' directories, specifically 'lib/Target/X*CodeEmitter', and look at X86CodeEmitter.

ObjectCodeEmitter, is responsible for emission of object 'files' into
a memory buffer. This includes handling of all object headers,
management of sections/segments, symbol and string tables and
relocations. The ObjectCodeEmitter should delegate all actual 'data
emission' to the MachineCodeEmitter.

No, look at ELFWriter and ELFCodeEmitter.

ObjectCodeEmitter is a MachineFunctionPass. It does 'object wide'

No, ELFWriter inherits from MachineFunctionPass,
and ELFCodeEmitter from MachineCodeEmitter.

setup in doInitialization and finalizes the object in doFinalize(?).
Each MachineFunction is emitted through the runOnFunction method,
which passes the MachineFunction to the MachineCodeEmitter. The

ELFWriter::runOnFunction does nothing.

MachineCodeEmitter calls back to the ObjectCodeEmitter in order to
look up sections/segments, add globals to an unresolved globals list
etc.

No.

I'm not too happy about the broken encapsulation here. I'd prefer to
find a better way to model this.

Please reread the code from SVN, make diagrams, preferably UML, and look at what is really happening!

Aaron

Aaron_Gray1 · March 16, 2009, 3:58pm

Sorry, I disagree: the MachineCodeEmitter or the
'MachineCodeWriter' does not do any file handling at all. Do look at the
code for the MachineCodeWriter and you will see it only writes to memory,
and if it reaches the end of the allotted memory I believe higher-level
logic reallocates a larger buffer and starts again from scratch. This could
be avoided if they generated fixups for absolute memory references that refer
to locations within the emitted code. Then an alloc function could be called
before outputting, say, a 4-byte int; it could realloc and copy the code, and
once everything is finally written the fixups could be applied.

IIRC the memory allocation is done in the MachineCodeEmitter, not the
higher level (see startFunction and finishFunction). The current
implementation has startFunction allocate some (arbitrary) reserve
size in the output vector, and if the emitter runs out of space,
finishFunction returns a failure, causing the whole process to occur
again. This is icky.

As I said, the MachineCodeEmitter works exclusively with the JITEmitter class, which inherits from the MachineCodeEmitter class, and that's where the allocation is going on.

It would be far better if the underlying buffer would grow
automatically (with an allocation method in the base class, as you
suggested), as code is emitted to it.

I think we leave the JITEmitter and MachineCodeEmitter alone, as they work and it's a pretty difficult thing to replace them without causing any regressions.

'ObjectCodeEmitter' looks like the right description to parallel the
MachineCodeEmitter. Its emitting object code to a data stream (which
is an object file section) and not direct to a file.

I can live with that. Before you implement anything, can we try and
define the responsibilities of the various classes?

Yes.

We have MachineCodeEmitter, which is responsible for actually emitting
bytes into a buffer for a function. Should it have methods for
emitting instructions/operands, or should it only work at the byte,
dword, etc. level?

It works at the primitive byte, word, dword level only. The *CodeEmitter classes, like the X86CodeEmitter class, are responsible for generating machine code and use the MachineCodeEmitter class to do this.

ObjectCodeEmitter, is responsible for emission of object 'files' into
a memory buffer. This includes handling of all object headers,
management of sections/segments, symbol and string tables and

It writes to sections in ELFWriter::ELFSection.SectionData which are std::vector<unsigned char>.

relocations. The ObjectCodeEmitter should delegate all actual 'data
emission' to the MachineCodeEmitter.

No, it will have to parallel the MachineCodeEmitter as a generic alternative.

ObjectCodeEmitter is a MachineFunctionPass. It does 'object wide'
setup in doInitialization and finalizes the object in doFinalize(?).
Each MachineFunction is emitted through the runOnFunction method,
which passes the MachineFunction to the MachineCodeEmitter. The
MachineCodeEmitter calls back to the ObjectCodeEmitter in order to
look up sections/segments, add globals to an unresolved globals list
etc.

No, look at the ELFWriter, which inherits from MachineFunctionPass, and ELFCodeEmitter, which inherits from the MachineCodeEmitter.

We need to override the behaviour of the MachineCodeEmitter, creating the ObjectCodeEmitter, which writes to the appropriate section in ELFWriter::ELFSection.SectionData.

I'm not too happy about the broken encapsulation here. I'd prefer to
find a better way to model this.

Where? How?

Aaron

someguy · March 16, 2009, 5:05pm

Aaron, I mailed in the same mail twice (by mistake), you answered both
copies. Differently!

In any case, I've re-read what exists. I'm dumping what I understand
here, so that we can discuss in detail. I'm using MachO as the example
object format, as the ELF code is totally broken and outdated. Let's
use the following as the basis for our discussion?

There are 3 classes which participate in object file emission:
1. MachOWriter
- a MachineFunctionPass, with a do-nothing runOnMachineFunction.

doInitialization and doFinalization are used to emit the object file
header and finalize the various object file segments, respectively.

The MachOWriter is responsible for creation of MachOCodeEmitter, via
its getMachineCodeEmitter function.

2. MachOCodeEmitter
- a MachineCodeEmitter. Friend class of MachOWriter (friend class
== broken encapsulation!?)

startFunction allocates storage in the text section for the current function.

finishFunction emits constant-pools, jumptables; transforms
relocations adding globals to the MachOWriter's PendingGlobals list,
and all relocations to the parent section's relocation list; adds a
symbol for the function to the MachOWriter's SymbolTable.

[In general, all the operations in finishFunction actually modify the
data of the MachOWriter. Shouldn't these be pushed into the
MachOWriter? ]

3. X86CodeEmitter
- a MachineFunctionPass, NOT a MachineCodeEmitter (Could the naming
change perhaps?)

This class receives (during construction) a reference to a
MachineCodeEmitter (e.g. MachOCodeEmitter, which in turn stores a
reference to a MachOWriter).

The runOnMachineFunction for the X86CodeEmitter does:
- call MachOCodeEmitter::startFunction
- for each basicblock in function:
   - call MachOCodeEmitter::StartMachineBasicBlock
   - for each instruction in basicblock:
       - emit instruction, using MachineCodeEmitter::emit* functions
- call MachOCodeEmitter::finishFunction

[This runOnMachineFunction could definitely be generalized, i.e.
implemented in a base class ('EmitterMachineFunctionPass' or a better
name). This base class would then have (abstract) emitInstruction,
emitOperand, etc... methods. It should also integrate with the
*GenCodeEmitter emitted by tblgen so that you get automatic code
emission. When implementing a new target, one would simply need to
inherit the baseclass, and override the functions necessary to tweak
output.]

Aaron_Gray1 · March 16, 2009, 5:56pm

Aaron, I mailed in the same mail twice (by mistake), you answered both
copies. Differently!

In any case, I've re-read what exists. I'm dumping what I understand
here, so that we can discuss in detail. I'm using MachO as the example
object format, as the ELF code is totally broken and outdated. Let's
use the following as the basis for our discussion?

I've never looked at the MachO code as I do not have such a platform nor do I know the file format.

Could we concentrate on the ELF backend, please.

3. X86CodeEmitter
- a MachineFunctionPass, NOT a MachineCodeEmitter (Could the naming
change perhaps?)

Yes, it uses a MachineCodeEmitter in its internals (MCE) and as a constructor argument.

This class receives (during construction) a reference to a
MachineCodeEmitter (e.g. MachOCodeEmitter, which in turn stores a
reference to a MachOWriter).

The runOnMachineFunction for the X86CodeEmitter does:
- call MachOCodeEmitter::startFunction
- for each basicblock in function:
  - call MachOCodeEmitter::StartMachineBasicBlock
  - for each instruction in basicblock:
      - emit instruction, using MachineCodeEmitter::emit* functions
- call MachOCodeEmitter::finishFunction

[This runOnMachineFunction could definitely be generalized, i.e.
implemented in a base class ('EmitterMachineFunctionPass' or a better
name). This base class would then have (abstract) emitInstruction,
emitOperand, etc... methods. It should also integrate with the
*GenCodeEmitter emitted by tblgen so that you get automatic code
emission. When implementing a new target, one would simply need to
inherit the baseclass, and override the functions necessary to tweak
output.]

runOnMachineFunction is a standard LLVM message we cannot play around with it.

Please read LLVM code more and get a general overview of the standard interfaces.

Aaron

someguy · March 16, 2009, 6:26pm

I've never looked at the MachO code as I do not have such a platform nor do
I know the file format.

Could we concentrate on the ELF backend, please.

I don't mind using the ELF backend as our test case, it just seems
that the ELFWriter/ELFCodeEmitter don't even use the
BufferBegin/BufferEnd/CurBufferPtr system exposed by the base
MachineCodeEmitter. There is a big "FIXME" and an abort at the
beginning of the ELFCodeEmitter::startFunction. This is why I used
MachO to 'grok' the concept.

3. X86CodeEmitter
- a MachineFunctionPass, NOT a MachineCodeEmitter (Could the naming
change perhaps?)

Yes, it uses a MachineCodeEmitter in its internals (MCE) and as a
constructor argument.

runOnMachineFunction is a standard LLVM message we cannot play around with
it.

I'm well aware that runOnMachineFunction is a standard LLVM meme. My
suggestion in no way conflicts with its standard meaning.

All I meant was that we could build a new target-nonspecific 'base
class', called e.g. ObjectEmitter, which would inherit
MachineFunctionPass. This base-class should implement
runOnMachineFunction using the pattern described above, as it seems
relatively target-nonspecific. In order to emit actual instructions,
the ObjectEmitter::runOnMachineFunction could call the (abstract)
ObjectEmitter::emitInstruction method, which _must_ be
target-specific.

This emitInstruction method could either be abstract, in which case
the developer of the target backend must implement it appropriately.
Alternatively it could be virtual, with the default implementation
utilizing tblgen generated emission code.

In either case, the ObjectEmitter would lighten the load on the target
backend developer, as he would only be required to implement
target-specific functions (emitInstruction and dependencies).
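
To illustrate, a stripped-down sketch of that base class follows. The types are toy stand-ins (Instr, BasicBlock and Function are not LLVM's MachineInstr and friends), and ObjectEmitterSketch and ToyTargetEmitter are hypothetical names; the point is only that the function/block/instruction walk lives in the base class while emitInstruction is left to the target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins for the machine-level IR; not LLVM types.
struct Instr { std::uint8_t Opcode; };
using BasicBlock = std::vector<Instr>;
using Function = std::vector<BasicBlock>;

// Hypothetical target-independent base class.
class ObjectEmitterSketch {
public:
  virtual ~ObjectEmitterSketch() = default;

  // The generic walk: start function, visit each block and instruction,
  // finish function. Returns false in the usual "IR not modified" sense.
  bool runOnMachineFunction(const Function &F) {
    startFunction();
    for (const BasicBlock &BB : F) {
      startBasicBlock();
      for (const Instr &I : BB)
        emitInstruction(I);
    }
    finishFunction();
    return false;
  }

  const std::vector<std::uint8_t> &bytes() const { return Out; }
  std::size_t blocksSeen() const { return Blocks; }

protected:
  // The only piece a target is required to supply.
  virtual void emitInstruction(const Instr &I) = 0;

  // Optional hooks with do-nothing (or bookkeeping-only) defaults.
  virtual void startFunction() {}
  virtual void startBasicBlock() { ++Blocks; }
  virtual void finishFunction() {}

  void emitByte(std::uint8_t B) { Out.push_back(B); }

private:
  std::vector<std::uint8_t> Out;
  std::size_t Blocks = 0;
};

// A toy target: each instruction is encoded as its single opcode byte.
class ToyTargetEmitter : public ObjectEmitterSketch {
protected:
  void emitInstruction(const Instr &I) override { emitByte(I.Opcode); }
};
```

Whether emitInstruction stays pure virtual or gets a tblgen-backed default is the open question from the paragraph above; the walk itself does not care.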

I don't see any reason that such a pattern would 'break' the LLVM
standard pattern.

Please read LLVM code more and get a general overview of the standard
interfaces.

Although I am quite new to llvm, I have spent 2 straight weeks reading
its code. There are many things I have yet to understand, and I
appreciate your patience. Please do try and point out my
misunderstandings rather than make generic 'read more code'
suggestions. I really want to learn, and am more than willing to make
the effort.

One more thing: the naming of the various classes is a little confusing:
- ELFWriter - a MachineFunctionPass
- ELFCodeEmitter - a MachineCodeEmitter
(Those I can deal with, although ELFWriterPass makes me much happier)
- X86CodeEmitter - a MachineFunctionPass
(This is just confusing! It's an emitter, but not a MachineCodeEmitter.
Perhaps X86CodeGenerator is more appropriate? Or even
X86CodeGeneratorPass?)

Also, am I right in saying that only the X86CodeEmitter is used for
JIT, and that a special JIT MachineCodeEmitter is passed to its
constructor?

Thanks.
someguy

BTW: if you want to hash these things out 'live', I'm usually in the
IRC channel during the day (GMT+1).

Aaron_Gray1 · March 16, 2009, 8:37pm

I've never looked at the MachO code as I do not have such a platform nor do
I know the file format.

Could we concentrate on the ELF backend, please.

I don't mind using the ELF backend as our test case, it just seems
that the ELFWriter/ELFCodeEmitter don't even use the
BufferBegin/BufferEnd/CurBufferPtr system exposed by the base
MachineCodeEmitter. There is a big "FIXME" and an abort at the
beginning of the ELFCodeEmitter::startFunction. This is why I used
MachO to 'grok' the concept.

Okay I will read the MachO backend, thanks.

3. X86CodeEmitter
- a MachineFunctionPass, NOT a MachineCodeEmitter (Could the naming
change perhaps?)

Yes, it uses a MachineCodeEmitter in its internals (MCE) and as a
constructor argument.

runOnMachineFunction is a standard LLVM message we cannot play around with
it.

I'm well aware that runOnMachineFunction is a standard LLVM meme. My
suggestion in no way conflicts with its standard meaning.

Okay

All I meant was that we could build a new target-nonspecific 'base
class', called e.g. ObjectEmitter, which would inherit
MachineFunctionPass. This base-class should implement
runOnMachineFunction using the pattern described above, as it seems
relatively target-nonspecific. In order to emit actual instructions,
the ObjectEmitter::runOnMachineFunction could call the (abstract)
ObjectEmitter::emitInstruction method, which _must_ be
target-specific.

Do we need to abstract targets back to an ObjectEmitter class? Can we not just implement ELFEmitter and ELFWriter?

This emitInstruction method could either be abstract, in which case
the developer of the target backend must implement it appropriately.
Alternatively it could be virtual, with the default implementation
utilizing tblgen generated emission code.

I would prefer not to use virtual methods at this level or at least keep them to a minimum. Ideally everything reasonable to be inlined should be inlined.

In either case, the ObjectEmitter would lighten the load on the target
backend developer, as he would only be required to implement
target-specific functions (emitInstruction and dependencies).

The

I don't see any reason that such a pattern would 'break' the LLVM
standard pattern.

Okay.

Please read LLVM code more and get a general overview of the standard
interfaces.

Although I am quite new to llvm, I have spent 2 straight weeks reading
its code. There are many things I have yet to understand, and I
appreciate your patience. Please do try and point out my
misunderstandings rather than make generic 'read more code'
suggestions. I really want to learn, and am more than willing to make
the effort.

Yeah, it's like that. I will have to read the MachO stuff.

One more thing: the naming of the various classes is a little confusing:
- ELFWriter - a MachineFunctionPass
- ELFCodeEmitter - a MachineCodeEmitter
(Those I can deal with, although ELFWriterPass makes me much happier)
- X86CodeEmitter - a MachineFunctionPass
(This is just confusing! Its an emitter, but not a MachineCodeEmitter.
Perhaps X86CodeGenerator is more appropriate? Or even
X86CodeGeneratorPass?)

Yeah, but it is how it is and people are familiar with it like that.

Also, am I right in saying that only the X86CodeEmitter is used for
JIT, and that a special JIT MachineCodeEmitter is passed to its
constructor?

Yep, that's the bit I think could be made generic: parameterize the MCE variable, and we can pass in our ObjectCodeEmitter class and object. That was my plan anyway for the lower level.

Okay I will read the MachO code and hopefully get your perspective.

BTW: if you want to hash these things out 'live', I'm usually in the
IRC channel during the day (GMT+1).

I do prefer e-mail

Cheers,

Aaron
