Being able to know the jitted code-size before emitting

Hi everyone,

vmkit needs to know the size of a jitted method before emitting the method, so that it can allocate a buffer of exactly the right size. The attached patch adds this functionality, enabled when the flag SizedMemoryCode is on.

In order to implement this functionality, I had to virtualize some MachineCodeEmitter functions.

Is it OK to commit the patch?

Thanks,
Nicolas

jit-size.patch (16.1 KB)

A couple style comments:

- The header of SizeEmitter.h is in the old style. Please remove the part about "was developed by"; you can see other files for the correct format. I think SizeEmitter.cpp is missing the header entirely.
- SizeEmitter.h and SizeEmitter.cpp are lacking comments. Please add Doxygen-style comments for all functions and member variables.

Thanks,
Tanya

Hi,

Two questions: 1) How are you computing the size of the method being jitted? 2) Why not simply add the ability to allocate an emission buffer of a specific size to MachineCodeEmitter instead?

Thanks,

Evan

Hi Evan,

Evan Cheng wrote:

1) How are you computing the size of the method being jitted?

I add a new pass with addSimpleCodeEmitter, with the emitter being a SizeEmitter. Since the target calls the emitter through functions such as writeByte, writeWord, etc., the SizeEmitter class implements these functions by incrementing a counter.

At the end of the pass, the code size of the function is known.
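
For illustration, the counting approach described above can be sketched in a few lines of self-contained C++ (the names here are hypothetical; the actual patch overrides the corresponding MachineCodeEmitter virtuals):

#include <cstddef>
#include <cstdint>

// A "null" emitter: it discards the bytes and just counts them. Run the
// target's emission logic against this once, read back size(), then
// allocate a real buffer of exactly that size and emit for real.
class CountingEmitter {
  std::size_t Count = 0;
public:
  void writeByte(std::uint8_t) { Count += 1; }
  void writeWord(std::uint32_t) { Count += 4; } // assuming a 4-byte word
  std::size_t size() const { return Count; }
};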

That's a hack. :) Some targets already have ways to compute the exact size of a function; see ARM::GetFunctionSize() and ARM::GetInstSize(). I'd like to see them standardized (all targets that have JIT support should be able to accurately calculate the function / instruction sizes), and then you can make use of that.

2) Why not simply add the ability to allocate an emission buffer of a specific size to MachineCodeEmitter instead?

I don't understand. The MachineCodeEmitter class writes directly to a buffer which is realloc'ed if necessary. I want this buffer to be pre-allocated and of the correct size, hence I need a pass that runs before emitting code.

Suppose you have the size: can't you pass the desired buffer size to MachineCodeEmitter and have it do the right thing? It seems like you could add the functionality to MachineCodeEmitter rather than adding a new class for it.

Evan

Evan Cheng wrote:

  
That's a hack. :)

It is if you think the code emitter should only be used for actually writing the data somewhere. It is not if you consider it another useful utility ;)

Some targets already have ways to compute the exact size of a function; see ARM::GetFunctionSize() and ARM::GetInstSize(). I'd like to see them standardized (all targets that have JIT support should be able to accurately calculate the function / instruction sizes), and then you can make use of that.

OK, I see. However, this requires doing it for every target. My solution works for all targets, is not very intrusive, and requires less code, hence introducing fewer bugs ;)

Suppose you have the size: can't you pass the desired buffer size to MachineCodeEmitter and have it do the right thing?

Sure, that's what I'm doing. The SizeEmitter pass computes the accurate size and gives it to the MachineCodeEmitter, and the MachineCodeEmitter then emits the instructions into the buffer.

My solution is surely using things in a way that wasn't intended, but it works fine, and for all targets. Is this really a stopper for integrating it? (You can see here my laziness about implementing GetFunctionSize for every target :)).

Nicolas

Evan Cheng wrote:

That's a hack. :)

It is if you think the code emitter should only be used for actually writing the data somewhere. It is not if you consider it another useful utility ;)

Except it's pretty slow at it. :)

Some targets already have ways to compute the exact size of a function; see ARM::GetFunctionSize() and ARM::GetInstSize(). I'd like to see them standardized (all targets that have JIT support should be able to accurately calculate the function / instruction sizes), and then you can make use of that.

OK, I see. However, this requires doing it for every target. My solution works for all targets, is not very intrusive, and requires less code, hence introducing fewer bugs ;)

Let's see. ARM has it already. PPC has getNumBytesForInstruction, so you only need to add one function to compute the function size. Also, you only need to implement it for the targets that support the JIT right now, which leaves Alpha and X86. I'm guessing Alpha uses a fixed encoding, so it should be pretty easy. Or you can just punt and let the target maintainers worry about it.

Really, you only need to implement GetFunctionSize for X86, and that's not as horrible as you think it is. For each instruction:
1. Compute how many bytes of prefix there are. The code is there in X86CodeEmitter, except all you need is the size, not the prefixes being emitted.
2. For each instruction format, e.g. AddRegFrm, MRMDestReg, it's either a single byte for the opcode or one additional ModR/M byte. Again, all the information is right there.
3. Some memory instructions need a SIB byte.
4. Process the operands! Some special handling is needed for the operands that make up the memory address, but the logic is all there in emitMemModRMByte.

What do you think?

Evan
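
To make this accounting concrete, here is a worked example (mine, not from the thread) for a single instruction, add eax, [rbx+16], in 64-bit mode:

- prefixes: 0 bytes (default operand size, and eax/rbx are not extended registers, so no REX prefix is needed)
- opcode: 1 byte
- ModR/M: 1 byte (register destination, memory source with a base register)
- SIB: 0 bytes (plain base register, and the base is not RSP)
- displacement: 1 byte (16 fits in a disp8)

Total: 3 bytes, computed without ever producing the actual encoding.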

In general, it is not possible to know jitted code size without emitting. You can suppress the actual write of the instructions themselves, but you have to do all of the work prior to that point.

The reason is that on many architectures there are span-dependent branches. The final instruction size depends on the branch span. The span depends on the code size, and the code size depends on the representation of the span. It really is a cyclical dependency, and it is usually solved by doing the best you can and then breaking cycles in the span dependency graph by picking one of the offending spans arbitrarily and fixing it to the max representation length (after which the rest of the dependency loop can be resolved).
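
As a self-contained C++ sketch of that fixpoint (my illustration, not LLVM code; the 2-byte short and 5-byte long forms are x86-flavored assumptions):

#include <cstddef>
#include <vector>

// Each instruction has a current size; branches start in their short
// form and are widened whenever their span no longer fits.
struct Inst {
  unsigned Size; // current encoding size in bytes
  int Target;    // index of the branch target instruction, or -1
};

static bool fitsShort(long Span) { return Span >= -128 && Span <= 127; }

// Widening one branch can push another out of range, which is the
// cyclic dependency described above; since sizes only ever grow, the
// iteration terminates.
unsigned layoutCode(std::vector<Inst> &Insts) {
  std::vector<long> Offset(Insts.size() + 1, 0);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    // Recompute every instruction's offset under the current sizes.
    for (std::size_t I = 0; I != Insts.size(); ++I)
      Offset[I + 1] = Offset[I] + Insts[I].Size;
    // Widen any short branch whose span no longer fits.
    for (std::size_t I = 0; I != Insts.size(); ++I) {
      if (Insts[I].Target < 0 || Insts[I].Size != 2)
        continue; // not a branch, or already widened
      long Span = Offset[Insts[I].Target] - Offset[I + 1];
      if (!fitsShort(Span)) {
        Insts[I].Size = 5;
        Changed = true;
      }
    }
  }
  return static_cast<unsigned>(Offset[Insts.size()]);
}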

But the point is that you have to actually generate the code (at least in memory) to do this analysis.

You can get a conservative upper bound on those architectures by assuming that all spans will use the maximal sized representation, but you cannot get a precise size without emitting the code.

shap

In general, it is not possible to know jitted code size without emitting. You can suppress the actual write of the instructions themselves, but you have to do all of the work prior to that point.

That's not true. llvm targets which support the JIT have all the information necessary to calculate the size of every instruction. It's possible to compute an instruction's length without having to compute which bits are going to be emitted.

The reason is that on many architectures there are span-dependent branches. The final instruction size depends on the branch span. The span depends on the code size, and the code size depends on the representation of the span. It really is a cyclical dependency, and it is usually solved by doing the best you can and then breaking cycles in the span dependency graph by picking one of the offending spans arbitrarily and fixing it to the max representation length (after which the rest of the dependency loop can be resolved).

But the point is that you have to actually generate the code (at least in memory) to do this analysis.

The JIT is responsible for relocation. It knows everything, including the distance between the source and the destination. Codegen needs to resolve all of this before jitting, so it's always possible to compute the size of every instruction before actual code emission.

Evan

> In general, it is not possible to know jitted code size without emitting. You can suppress the actual write of the instructions themselves, but you have to do all of the work prior to that point.

That's not true. llvm targets which support the JIT have all the information necessary to calculate the size of every instruction. It's possible to compute an instruction's length without having to compute which bits are going to be emitted.

Evan: please explain how span-dependent branches are resolved in your method. You don't need to compute the bits that will be emitted, but you do need to compute the length of those bits. In most real implementations, the two steps are therefore inseparable.

> But the point is that you have to actually generate the code (at least in memory) to do this analysis.

The JIT is responsible for relocation. It knows everything, including the distance between the source and the destination. Codegen needs to resolve all of this before jitting, so it's always possible to compute the size of every instruction before actual code emission.

This is basically repeating what I already said. You can suppress the actual write of the instructions, but you need to run all of the steps of code generation up to (but excluding) emission if you need to resolve span-dependent branches.

Meta-observation: if you cared how big the jitted code was, presumably there was some question about whether/where you had the space for the code. Codegen will require at least that much space in order to answer the question. Given which, the simplest approach is to go ahead and generate the code provisionally and either (a) keep it or (b) discard it.

shap

I think the important point here is that llvm explicitly represents short and long branches as two different instructions. We don't leave it up to the assembler to determine whether a short or long branch is to be used. The PPC and ARM backends have branch shortening passes to decide what sort of branch to use, for example.

-Chris

Intra-function branches are resolved by the branch shortening pass. If a call destination is too far away, the JIT will emit a function stub to facilitate it. This means codegen has already determined the instructions that will be emitted, so it's possible to determine the length of each instruction.

Evan

Evan Cheng wrote:

Let's see. ARM has it already. PPC has getNumBytesForInstruction, so you only need to add one function to compute the function size. Also, you only need to implement it for the targets that support the JIT right now, which leaves Alpha and X86. I'm guessing Alpha uses a fixed encoding, so it should be pretty easy. Or you can just punt and let the target maintainers worry about it.

Yeah, fixed encoding like in PPC is easy to handle.

Really, you only need to implement GetFunctionSize for X86, and that's not as horrible as you think it is. For each instruction:
1. Compute how many bytes of prefix there are. The code is there in X86CodeEmitter, except all you need is the size, not the prefixes being emitted.
2. For each instruction format, e.g. AddRegFrm, MRMDestReg, it's either a single byte for the opcode or one additional ModR/M byte. Again, all the information is right there.
3. Some memory instructions need a SIB byte.
4. Process the operands! Some special handling is needed for the operands that make up the memory address, but the logic is all there in emitMemModRMByte.

What do you think?

That's doable, but I'm afraid I'll have to duplicate a lot of code (e.g. emitMemModRMByte -> sizeMemModRMByte, and probably others). That's not a problem in itself, but thinking of bug fixes and improvements, we'll have to maintain two copies of the code.

I don't think the duplication is going to be too much of a problem. If it is, I'll bug you about refactoring. :)

Thanks,

Evan

Hi Evan,

Evan Cheng wrote:

I don't think the duplication is going to be too much of a problem. If it is, I'll bug you about refactoring. :)

I don't mean to show how lazy I can be, but I also need to know the size of the exception table emitted in memory (JITDwarfEmitter.cpp). Reviewing it a little, I cannot see how things won't be duplicated, knowing that I have already duplicated code to implement the dwarf facilities for the JIT...

Besides, my implementation is not that slow, you know. :) It's roughly 5 times faster than the actual emission of the code. And if someone's unhappy with it, they don't have to use it (or can code it your way). The only downside of my implementation is the virtualization of some function calls in MachineCodeEmitter, but that does not really cost much.

So I'm torn between two different approaches, and I don't see one of them being cleaner than the other: one duplicates code, the other uses things in an unusual way. Moreover, once this functionality is integrated into the tree, there is little left preventing vmkit from using TOT without patching. And I would love to see that as soon as possible.

So are you still strongly against it? Should I stop complaining and start writing the functionality the way you suggested? ;)

Yeah, sorry, I'm stubborn sometimes. :) And really, I think adding the code size functionality is not that complicated. I would be happy to help if you run into issues.

Thanks.

Evan

Hi Evan,

Evan Cheng wrote:

Yeah, sorry, I'm stubborn sometimes. :) And really, I think adding the code size functionality is not that complicated. I would be happy to help if you run into issues.

What do you think of adding a TargetMachine::getFunctionSize(MachineFunction*) and a TargetInstrInfo::getInstructionSize(MachineInstruction*)? Is this a good place to make them available to other passes (i.e. the JIT)?

I think both of these belong in TargetInstrInfo. And yes, it's a good idea; there are other passes which can make use of them, e.g. branch shortening.

Thanks,

Evan

OK, here's a new patch that adds the infrastructure, plus the X86, ARM and PPC implementations of GetInstSize and GetFunctionSize. Both are virtual functions defined in TargetInstrInfo.h.

For X86, I moved some utility functions from X86CodeEmitter to X86InstrInfo.

What do you think?

Nicolas

jit-size.patch (38.9 KB)

Comments below.

Index: include/llvm/Target/TargetInstrInfo.h

--- include/llvm/Target/TargetInstrInfo.h (revision 49716)
+++ include/llvm/Target/TargetInstrInfo.h (working copy)
@@ -388,6 +388,18 @@
    abort();
    return 0; // Must return a value in order to compile with VS 2005
  }
+
+ /// GetInstSize - Returns the size of the specified Instruction.
+ ///
+ virtual unsigned GetInstSize(const MachineInstr *MI) const {
+ assert(0 && "Target didn't implement TargetInstrInfo::GetInstSize!");
+ return 0;
+ }
+
+ /// GetFunctionSize - Returns the size of the specified MachineFunction.
+ ///
+ virtual unsigned GetFunctionSize(const MachineFunction &MF) const = 0;
+
};

Great! But perhaps change the name to GetInstSizeInBytes() to be more explicit? Same for GetFunctionSize(). Hrm. Are there machines where instructions may not be byte-sized? I guess we'll worry about it if someone adds such a target. :)

/// TargetInstrInfoImpl - This is the default implementation of
@@ -408,6 +420,7 @@
                             MachineBasicBlock::iterator MI,
                             unsigned DestReg,
                             const MachineInstr *Orig) const;
+ virtual unsigned GetFunctionSize(const MachineFunction &MF) const;
};

} // End llvm namespace
Index: lib/CodeGen/TargetInstrInfoImpl.cpp

--- lib/CodeGen/TargetInstrInfoImpl.cpp (revision 49716)
+++ lib/CodeGen/TargetInstrInfoImpl.cpp (working copy)
@@ -94,3 +94,13 @@
  MBB.insert(I, MI);
}

+unsigned TargetInstrInfoImpl::GetFunctionSize(const MachineFunction &MF) const {
+ unsigned FnSize = 0;
+ for (MachineFunction::const_iterator MBBI = MF.begin(), E = MF.end();
+ MBBI != E; ++MBBI) {
+ const MachineBasicBlock &MBB = *MBBI;
+ for (MachineBasicBlock::const_iterator I = MBB.begin(),E = MBB.end(); I != E; ++I)
+ FnSize += GetInstSize(I);
+ }
+ return FnSize;
+}

How about a default GetInstSize() as well? Return 1 for every instruction except for some special TargetInstrInfo instructions, e.g. PHI, IMPLICIT_DEF, LABEL. I don't know if it's useful or not, but perhaps we can default most targets to it?
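
A sketch of that default (my illustration of the suggestion above, not code from the patch; whether a unit of 1 is meaningful is entirely target-dependent):

// Hypothetical default implementation in TargetInstrInfoImpl: pseudo
// instructions emit no machine code, everything else counts as one unit.
virtual unsigned GetInstSize(const MachineInstr *MI) const {
  switch (MI->getOpcode()) {
  case TargetInstrInfo::PHI:
  case TargetInstrInfo::IMPLICIT_DEF:
  case TargetInstrInfo::LABEL:
    return 0;
  default:
    return 1;
  }
}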

Index: lib/Target/PowerPC/PPCBranchSelector.cpp

--- lib/Target/PowerPC/PPCBranchSelector.cpp (revision 49716)
+++ lib/Target/PowerPC/PPCBranchSelector.cpp (working copy)
@@ -22,7 +22,6 @@
#include "PPCPredicates.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/Target/TargetMachine.h"
-#include "llvm/Target/TargetAsmInfo.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/MathExtras.h"
@@ -54,25 +53,6 @@
  return new PPCBSel();
}

-/// getNumBytesForInstruction - Return the number of bytes of code the specified
-/// instruction may be. This returns the maximum number of bytes.
-///
-static unsigned getNumBytesForInstruction(MachineInstr *MI) {
- switch (MI->getOpcode()) {
- case PPC::INLINEASM: { // Inline Asm: Variable size.
- MachineFunction *MF = MI->getParent()->getParent();
- const char *AsmStr = MI->getOperand(0).getSymbolName();
- return MF->getTarget().getTargetAsmInfo()->getInlineAsmLength(AsmStr);
- }
- case PPC::LABEL: {
- return 0;
- }
- default:
- return 4; // PowerPC instructions are all 4 bytes
- }
-}
-
bool PPCBSel::runOnMachineFunction(MachineFunction &Fn) {
  const TargetInstrInfo *TII = Fn.getTarget().getInstrInfo();
  // Give the blocks of the function a dense, in-order, numbering.
@@ -88,7 +68,7 @@
    unsigned BlockSize = 0;
    for (MachineBasicBlock::iterator MBBI = MBB->begin(), EE = MBB->end();
         MBBI != EE; ++MBBI)
- BlockSize += getNumBytesForInstruction(MBBI);
+ BlockSize += TII->GetInstSize(MBBI);

    BlockSizes[MBB->getNumber()] = BlockSize;
    FuncSize += BlockSize;
@@ -124,7 +104,7 @@
      for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
           I != E; ++I) {
        if (I->getOpcode() != PPC::BCC || I->getOperand(2).isImmediate()) {
- MBBStartOffset += getNumBytesForInstruction(I);
+ MBBStartOffset += TII->GetInstSize(I);
          continue;
        }

Index: lib/Target/PowerPC/PPCInstrInfo.h

--- lib/Target/PowerPC/PPCInstrInfo.h (revision 49716)
+++ lib/Target/PowerPC/PPCInstrInfo.h (working copy)
@@ -155,6 +155,11 @@

  virtual bool BlockHasNoFallThrough(MachineBasicBlock &MBB) const;
  virtual bool ReverseBranchCondition(std::vector<MachineOperand> &Cond) const;
+
+ /// GetInstSize - Return the number of bytes of code the specified
+ /// instruction may be. This returns the maximum number of bytes.
+ ///
+ virtual unsigned GetInstSize(const MachineInstr *MI) const;
};

}
Index: lib/Target/PowerPC/PPCInstrInfo.cpp

--- lib/Target/PowerPC/PPCInstrInfo.cpp (revision 49716)
+++ lib/Target/PowerPC/PPCInstrInfo.cpp (working copy)
@@ -20,6 +20,7 @@
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Support/CommandLine.h"
+#include "llvm/Target/TargetAsmInfo.h"
using namespace llvm;

extern cl::opt<bool> EnablePPC32RS; // FIXME (64-bit): See PPCRegisterInfo.cpp.
@@ -724,3 +725,21 @@
  Cond[0].setImm(PPC::InvertPredicate((PPC::Predicate)Cond[0].getImm()));
  return false;
}
+
+/// GetInstSize - Return the number of bytes of code the specified
+/// instruction may be. This returns the maximum number of bytes.
+///
+unsigned PPCInstrInfo::GetInstSize(const MachineInstr *MI) const {
+ switch (MI->getOpcode()) {
+ case PPC::INLINEASM: { // Inline Asm: Variable size.
+ const MachineFunction *MF = MI->getParent()->getParent();
+ const char *AsmStr = MI->getOperand(0).getSymbolName();
+ return MF->getTarget().getTargetAsmInfo()->getInlineAsmLength(AsmStr);
+ }
+ case PPC::LABEL: {
+ return 0;
+ }
+ default:
+ return 4; // PowerPC instructions are all 4 bytes
+ }
+}
Index: lib/Target/ARM/ARMInstrInfo.cpp

--- lib/Target/ARM/ARMInstrInfo.cpp (revision 49716)
+++ lib/Target/ARM/ARMInstrInfo.cpp (working copy)
@@ -877,8 +877,8 @@

/// GetInstSize - Return the size of the specified MachineInstr.
///
-unsigned ARM::GetInstSize(MachineInstr *MI) {
- MachineBasicBlock &MBB = *MI->getParent();
+unsigned ARMInstrInfo::GetInstSize(const MachineInstr *MI) const {
+ const MachineBasicBlock &MBB = *MI->getParent();
  const MachineFunction *MF = MBB.getParent();
  const TargetAsmInfo *TAI = MF->getTarget().getTargetAsmInfo();

@@ -937,17 +937,3 @@
  }
  return 0; // Not reached
}
-
-/// GetFunctionSize - Returns the size of the specified MachineFunction.
-///
-unsigned ARM::GetFunctionSize(MachineFunction &MF) {
- unsigned FnSize = 0;
- for (MachineFunction::iterator MBBI = MF.begin(), E = MF.end();
- MBBI != E; ++MBBI) {
- MachineBasicBlock &MBB = *MBBI;
- for (MachineBasicBlock::iterator I = MBB.begin(),E = MBB.end(); I != E; ++I)
- FnSize += ARM::GetInstSize(I);
- }
- return FnSize;
-}
-
Index: lib/Target/ARM/ARMRegisterInfo.cpp

--- lib/Target/ARM/ARMRegisterInfo.cpp (revision 49716)
+++ lib/Target/ARM/ARMRegisterInfo.cpp (working copy)
@@ -948,7 +948,7 @@

  bool ForceLRSpill = false;
  if (!LRSpilled && AFI->isThumbFunction()) {
- unsigned FnSize = ARM::GetFunctionSize(MF);
+ unsigned FnSize = TII.GetFunctionSize(MF);
    // Force LR to be spilled if the Thumb function size is > 2048. This enables
    // use of BL to implement far jump. If it turns out that it's not needed
    // then the branch fix up path will undo it.
Index: lib/Target/ARM/ARMConstantIslandPass.cpp

--- lib/Target/ARM/ARMConstantIslandPass.cpp (revision 49716)
+++ lib/Target/ARM/ARMConstantIslandPass.cpp (working copy)
@@ -368,7 +368,7 @@
    for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
         I != E; ++I) {
      // Add instruction size to MBBSize.
- MBBSize += ARM::GetInstSize(I);
+ MBBSize += TII->GetInstSize(I);

      int Opc = I->getOpcode();
      if (I->getDesc().isBranch()) {
@@ -519,7 +519,7 @@
  for (MachineBasicBlock::iterator I = MBB->begin(); ; ++I) {
    assert(I != MBB->end() && "Didn't find MI in its own basic block?");
    if (&*I == MI) return Offset;
- Offset += ARM::GetInstSize(I);
+ Offset += TII->GetInstSize(I);
  }
}

@@ -617,7 +617,7 @@
  unsigned NewBBSize = 0;
  for (MachineBasicBlock::iterator I = NewBB->begin(), E = NewBB->end();
       I != E; ++I)
- NewBBSize += ARM::GetInstSize(I);
+ NewBBSize += TII->GetInstSize(I);

  unsigned OrigBBI = OrigBB->getNumber();
  unsigned NewBBI = NewBB->getNumber();
@@ -968,9 +968,9 @@
    MachineBasicBlock::iterator MI = UserMI;
    ++MI;
    unsigned CPUIndex = CPUserIndex+1;
- for (unsigned Offset = UserOffset+ARM::GetInstSize(UserMI);
+ for (unsigned Offset = UserOffset+TII->GetInstSize(UserMI);
         Offset < BaseInsertOffset;
- Offset += ARM::GetInstSize(MI),
+ Offset += TII->GetInstSize(MI),
            MI = next(MI)) {
      if (CPUIndex < CPUsers.size() && CPUsers[CPUIndex].MI == MI) {
        if (!OffsetIsInRange(Offset, EndInsertOffset,
@@ -1225,7 +1225,7 @@
    SplitBlockBeforeInstr(MI);
    // No need for the branch to the next block. We're adding a unconditional
    // branch to the destination.
- int delta = ARM::GetInstSize(&MBB->back());
+ int delta = TII->GetInstSize(&MBB->back());
    BBSizes[MBB->getNumber()] -= delta;
    MachineBasicBlock* SplitBB = next(MachineFunction::iterator(MBB));
    AdjustBBOffsetsAfter(SplitBB, -delta);
@@ -1243,18 +1243,18 @@
  BuildMI(MBB, TII->get(MI->getOpcode())).addMBB(NextBB)
    .addImm(CC).addReg(CCReg);
  Br.MI = &MBB->back();
- BBSizes[MBB->getNumber()] += ARM::GetInstSize(&MBB->back());
+ BBSizes[MBB->getNumber()] += TII->GetInstSize(&MBB->back());
  BuildMI(MBB, TII->get(Br.UncondBr)).addMBB(DestBB);
- BBSizes[MBB->getNumber()] += ARM::GetInstSize(&MBB->back());
+ BBSizes[MBB->getNumber()] += TII->GetInstSize(&MBB->back());
  unsigned MaxDisp = getUnconditionalBrDisp(Br.UncondBr);
  ImmBranches.push_back(ImmBranch(&MBB->back(), MaxDisp, false, Br.UncondBr));

  // Remove the old conditional branch. It may or may not still be in MBB.
- BBSizes[MI->getParent()->getNumber()] -= ARM::GetInstSize(MI);
+ BBSizes[MI->getParent()->getNumber()] -= TII->GetInstSize(MI);
  MI->eraseFromParent();

  // The net size change is an addition of one unconditional branch.
- int delta = ARM::GetInstSize(&MBB->back());
+ int delta = TII->GetInstSize(&MBB->back());
  AdjustBBOffsetsAfter(MBB, delta);
  return true;
}
Index: lib/Target/ARM/ARMInstrInfo.h

--- lib/Target/ARM/ARMInstrInfo.h (revision 49716)
+++ lib/Target/ARM/ARMInstrInfo.h (working copy)
@@ -225,18 +225,12 @@

  virtual bool DefinesPredicate(MachineInstr *MI,
                                std::vector<MachineOperand> &Pred) const;
+
+ /// GetInstSize - Returns the size of the specified MachineInstr.
+ ///
+ virtual unsigned GetInstSize(const MachineInstr* MI) const;
};

- // Utility routines
- namespace ARM {
- /// GetInstSize - Returns the size of the specified MachineInstr.
- ///
- unsigned GetInstSize(MachineInstr *MI);
-
- /// GetFunctionSize - Returns the size of the specified MachineFunction.
- ///
- unsigned GetFunctionSize(MachineFunction &MF);
- }
}

#endif
Index: lib/Target/X86/X86RegisterInfo.cpp

--- lib/Target/X86/X86RegisterInfo.cpp (revision 49716)
+++ lib/Target/X86/X86RegisterInfo.cpp (working copy)
@@ -84,7 +84,7 @@
// getX86RegNum - This function maps LLVM register identifiers to their X86
// specific numbering, which is used in various places encoding instructions.
//
-unsigned X86RegisterInfo::getX86RegNum(unsigned RegNo) const {
+unsigned X86RegisterInfo::getX86RegNum(unsigned RegNo) {

What happened to "const"?

  switch(RegNo) {
  case X86::RAX: case X86::EAX: case X86::AX: case X86::AL: return N86::EAX;
  case X86::RCX: case X86::ECX: case X86::CX: case X86::CL: return N86::ECX;
Index: lib/Target/X86/X86CodeEmitter.cpp

--- lib/Target/X86/X86CodeEmitter.cpp (revision 49716)
+++ lib/Target/X86/X86CodeEmitter.cpp (working copy)
@@ -92,8 +92,6 @@
                          intptr_t PCAdj = 0);

    unsigned getX86RegNum(unsigned RegNo) const;
- bool isX86_64ExtendedReg(const MachineOperand &MO);
- unsigned determineREX(const MachineInstr &MI);

    bool gvNeedsLazyPtr(const GlobalValue *GV);
  };
@@ -405,139 +403,6 @@
  }
}

-static unsigned sizeOfImm(const TargetInstrDesc *Desc) {
- switch (Desc->TSFlags & X86II::ImmMask) {
- case X86II::Imm8: return 1;
- case X86II::Imm16: return 2;
- case X86II::Imm32: return 4;
- case X86II::Imm64: return 8;
- default: assert(0 && "Immediate size not set!");
- return 0;
- }
-}
-
-/// isX86_64ExtendedReg - Is the MachineOperand a x86-64 extended register?
-/// e.g. r8, xmm8, etc.
-bool Emitter::isX86_64ExtendedReg(const MachineOperand &MO) {
- if (!MO.isRegister()) return false;
- switch (MO.getReg()) {
- default: break;
- case X86::R8: case X86::R9: case X86::R10: case X86::R11:
- case X86::R12: case X86::R13: case X86::R14: case X86::R15:
- case X86::R8D: case X86::R9D: case X86::R10D: case X86::R11D:
- case X86::R12D: case X86::R13D: case X86::R14D: case X86::R15D:
- case X86::R8W: case X86::R9W: case X86::R10W: case X86::R11W:
- case X86::R12W: case X86::R13W: case X86::R14W: case X86::R15W:
- case X86::R8B: case X86::R9B: case X86::R10B: case X86::R11B:
- case X86::R12B: case X86::R13B: case X86::R14B: case X86::R15B:
- case X86::XMM8: case X86::XMM9: case X86::XMM10: case X86::XMM11:
- case X86::XMM12: case X86::XMM13: case X86::XMM14: case X86::XMM15:
- return true;
- }
- return false;
-}
-
-inline static bool isX86_64NonExtLowByteReg(unsigned reg) {
- return (reg == X86::SPL || reg == X86::BPL ||
- reg == X86::SIL || reg == X86::DIL);
-}
-
-/// determineREX - Determine if the MachineInstr has to be encoded with a X86-64
-/// REX prefix which specifies 1) 64-bit instructions, 2) non-default operand
-/// size, and 3) use of X86-64 extended registers.
-unsigned Emitter::determineREX(const MachineInstr &MI) {
- unsigned REX = 0;
- const TargetInstrDesc &Desc = MI.getDesc();
-
- // Pseudo instructions do not need REX prefix byte.
- if ((Desc.TSFlags & X86II::FormMask) == X86II::Pseudo)
- return 0;
- if (Desc.TSFlags & X86II::REX_W)
- REX |= 1 << 3;
-
- unsigned NumOps = Desc.getNumOperands();
- if (NumOps) {
- bool isTwoAddr = NumOps > 1 &&
- Desc.getOperandConstraint(1, TOI::TIED_TO) != -1;
-
- // If it accesses SPL, BPL, SIL, or DIL, then it requires a 0x40 REX prefix.
- unsigned i = isTwoAddr ? 1 : 0;
- for (unsigned e = NumOps; i != e; ++i) {
- const MachineOperand& MO = MI.getOperand(i);
- if (MO.isRegister()) {
- unsigned Reg = MO.getReg();
- if (isX86_64NonExtLowByteReg(Reg))
- REX |= 0x40;
- }
- }
-
- switch (Desc.TSFlags & X86II::FormMask) {
- case X86II::MRMInitReg:
- if (isX86_64ExtendedReg(MI.getOperand(0)))
- REX |= (1 << 0) | (1 << 2);
- break;
- case X86II::MRMSrcReg: {
- if (isX86_64ExtendedReg(MI.getOperand(0)))
- REX |= 1 << 2;
- i = isTwoAddr ? 2 : 1;
- for (unsigned e = NumOps; i != e; ++i) {
- const MachineOperand& MO = MI.getOperand(i);
- if (isX86_64ExtendedReg(MO))
- REX |= 1 << 0;
- }
- break;
- }
- case X86II::MRMSrcMem: {
- if (isX86_64ExtendedReg(MI.getOperand(0)))
- REX |= 1 << 2;
- unsigned Bit = 0;
- i = isTwoAddr ? 2 : 1;
- for (; i != NumOps; ++i) {
- const MachineOperand& MO = MI.getOperand(i);
- if (MO.isRegister()) {
- if (isX86_64ExtendedReg(MO))
- REX |= 1 << Bit;
- Bit++;
- }
- }
- break;
- }
- case X86II::MRM0m: case X86II::MRM1m:
- case X86II::MRM2m: case X86II::MRM3m:
- case X86II::MRM4m: case X86II::MRM5m:
- case X86II::MRM6m: case X86II::MRM7m:
- case X86II::MRMDestMem: {
- unsigned e = isTwoAddr ? 5 : 4;
- i = isTwoAddr ? 1 : 0;
- if (NumOps > e && isX86_64ExtendedReg(MI.getOperand(e)))
- REX |= 1 << 2;
- unsigned Bit = 0;
- for (; i != e; ++i) {
- const MachineOperand& MO = MI.getOperand(i);
- if (MO.isRegister()) {
- if (isX86_64ExtendedReg(MO))
- REX |= 1 << Bit;
- Bit++;
- }
- }
- break;
- }
- default: {
- if (isX86_64ExtendedReg(MI.getOperand(0)))
- REX |= 1 << 0;
- i = isTwoAddr ? 2 : 1;
- for (unsigned e = NumOps; i != e; ++i) {
- const MachineOperand& MO = MI.getOperand(i);
- if (isX86_64ExtendedReg(MO))
- REX |= 1 << 2;
- }
- break;
- }
- }
- return REX;
-}
-
void Emitter::emitInstruction(const MachineInstr &MI,
                              const TargetInstrDesc *Desc) {
  DOUT << MI;
@@ -584,7 +449,7 @@

  if (Is64BitMode) {
    // REX prefix
- unsigned REX = determineREX(MI);
+ unsigned REX = X86InstrInfo::determineREX(MI);
    if (REX)
      MCE.emitByte(0x40 | REX);
  }
@@ -632,7 +497,7 @@
    case X86::MOVPC32r: {
      // This emits the "call" portion of this pseudo instruction.
      MCE.emitByte(BaseOpcode);
- emitConstant(0, sizeOfImm(Desc));
+ emitConstant(0, X86InstrInfo::sizeOfImm(Desc));
      // Remember PIC base.
      PICBaseOffset = MCE.getCurrentPCOffset();
      X86JITInfo *JTI = dynamic_cast<X86JITInfo*>(TM.getJITInfo());
@@ -657,7 +522,7 @@
      } else if (MO.isExternalSymbol()) {
        emitExternalSymbolAddress(MO.getSymbolName(), X86::reloc_pcrel_word);
      } else if (MO.isImmediate()) {
- emitConstant(MO.getImm(), sizeOfImm(Desc));
+ emitConstant(MO.getImm(), X86InstrInfo::sizeOfImm(Desc));
      } else {
        assert(0 && "Unknown RawFrm operand!");
      }
@@ -669,7 +534,7 @@

    if (CurOp != NumOps) {
      const MachineOperand &MO1 = MI.getOperand(CurOp++);
- unsigned Size = sizeOfImm(Desc);
+ unsigned Size = X86InstrInfo::sizeOfImm(Desc);
      if (MO1.isImmediate())
        emitConstant(MO1.getImm(), Size);
      else {
@@ -698,7 +563,7 @@
                     getX86RegNum(MI.getOperand(CurOp+1).getReg()));
    CurOp += 2;
    if (CurOp != NumOps)
- emitConstant(MI.getOperand(CurOp++).getImm(), sizeOfImm(Desc));
+ emitConstant(MI.getOperand(CurOp++).getImm(), X86InstrInfo::sizeOfImm(Desc));
    break;
  }
  case X86II::MRMDestMem: {
@@ -706,7 +571,7 @@
    emitMemModRMByte(MI, CurOp, getX86RegNum(MI.getOperand(CurOp+4).getReg()));
    CurOp += 5;
    if (CurOp != NumOps)
- emitConstant(MI.getOperand(CurOp++).getImm(), sizeOfImm(Desc));
+ emitConstant(MI.getOperand(CurOp++).getImm(), X86InstrInfo::sizeOfImm(Desc));
    break;
  }

@@ -716,18 +581,18 @@
                     getX86RegNum(MI.getOperand(CurOp).getReg()));
    CurOp += 2;
    if (CurOp != NumOps)
- emitConstant(MI.getOperand(CurOp++).getImm(), sizeOfImm(Desc));
+ emitConstant(MI.getOperand(CurOp++).getImm(), X86InstrInfo::sizeOfImm(Desc));
    break;

  case X86II::MRMSrcMem: {
- intptr_t PCAdj = (CurOp+5 != NumOps) ? sizeOfImm(Desc) : 0;
+ intptr_t PCAdj = (CurOp+5 != NumOps) ? X86InstrInfo::sizeOfImm(Desc) : 0;

    MCE.emitByte(BaseOpcode);
    emitMemModRMByte(MI, CurOp+1, getX86RegNum(MI.getOperand(CurOp).getReg()),
                     PCAdj);
    CurOp += 5;
    if (CurOp != NumOps)
- emitConstant(MI.getOperand(CurOp++).getImm(), sizeOfImm(Desc));
+ emitConstant(MI.getOperand(CurOp++).getImm(), X86InstrInfo::sizeOfImm(Desc));
    break;
  }

@@ -741,7 +606,7 @@

    if (CurOp != NumOps) {
      const MachineOperand &MO1 = MI.getOperand(CurOp++);
- unsigned Size = sizeOfImm(Desc);
+ unsigned Size = X86InstrInfo::sizeOfImm(Desc);
      if (MO1.isImmediate())
        emitConstant(MO1.getImm(), Size);
      else {
@@ -769,7 +634,7 @@
  case X86II::MRM4m: case X86II::MRM5m:
  case X86II::MRM6m: case X86II::MRM7m: {
    intptr_t PCAdj = (CurOp+4 != NumOps) ?
- (MI.getOperand(CurOp+4).isImmediate() ? sizeOfImm(Desc) : 4) : 0;
+ (MI.getOperand(CurOp+4).isImmediate() ? X86InstrInfo::sizeOfImm(Desc) : 4) : 0;

    MCE.emitByte(BaseOpcode);
    emitMemModRMByte(MI, CurOp, (Desc->TSFlags & X86II::FormMask)-X86II::MRM0m,
@@ -778,7 +643,7 @@

    if (CurOp != NumOps) {
      const MachineOperand &MO = MI.getOperand(CurOp++);
- unsigned Size = sizeOfImm(Desc);
+ unsigned Size = X86InstrInfo::sizeOfImm(Desc);
      if (MO.isImmediate())
        emitConstant(MO.getImm(), Size);
      else {
Index: lib/Target/X86/X86InstrInfo.h

--- lib/Target/X86/X86InstrInfo.h (revision 49716)
+++ lib/Target/X86/X86InstrInfo.h (working copy)
@@ -15,6 +15,7 @@
#define X86INSTRUCTIONINFO_H

#include "llvm/Target/TargetInstrInfo.h"
+#include "X86.h"
#include "X86RegisterInfo.h"
#include "llvm/ADT/IndexedMap.h"
#include "llvm/Target/TargetRegisterInfo.h"
@@ -380,7 +381,21 @@
  unsigned char getBaseOpcodeFor(unsigned Opcode) const {
    return getBaseOpcodeFor(&get(Opcode));
  }
+
+ static bool isX86_64NonExtLowByteReg(unsigned reg) {
+ return (reg == X86::SPL || reg == X86::BPL ||
+ reg == X86::SIL || reg == X86::DIL);
+ }
+
+ static unsigned sizeOfImm(const TargetInstrDesc *Desc);
+ static unsigned getX86RegNum(unsigned RegNo);
+ static bool isX86_64ExtendedReg(const MachineOperand &MO);
+ static unsigned determineREX(const MachineInstr &MI);

+ /// GetInstSize - Returns the size of the specified MachineInstr.
+ ///
+ virtual unsigned GetInstSize(const MachineInstr *MI) const;
+
private:
  MachineInstr* foldMemoryOperand(MachineInstr* MI,
                                    unsigned OpNum,
Index: lib/Target/X86/X86RegisterInfo.h

--- lib/Target/X86/X86RegisterInfo.h (revision 49716)
+++ lib/Target/X86/X86RegisterInfo.h (working copy)
@@ -84,7 +84,7 @@

  /// getX86RegNum - Returns the native X86 register number for the given LLVM
  /// register identifier.
- unsigned getX86RegNum(unsigned RegNo) const;
+ static unsigned getX86RegNum(unsigned RegNo);

  unsigned getStackAlignment() const { return StackAlign; }

Index: lib/Target/X86/X86InstrInfo.cpp

--- lib/Target/X86/X86InstrInfo.cpp (revision 49716)
+++ lib/Target/X86/X86InstrInfo.cpp (working copy)
@@ -24,7 +24,9 @@
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/LiveVariables.h"
#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetOptions.h"
+#include "llvm/Target/TargetAsmInfo.h"

using namespace llvm;

@@ -2263,3 +2265,556 @@
  else
    return &X86::GR32RegClass;
}
+
+unsigned X86InstrInfo::sizeOfImm(const TargetInstrDesc *Desc) {
+ switch (Desc->TSFlags & X86II::ImmMask) {
+ case X86II::Imm8: return 1;
+ case X86II::Imm16: return 2;
+ case X86II::Imm32: return 4;
+ case X86II::Imm64: return 8;
+ default: assert(0 && "Immediate size not set!");
+ return 0;
+ }
+}
+
+/// isX86_64ExtendedReg - Is the MachineOperand a x86-64 extended register?
+/// e.g. r8, xmm8, etc.
+bool X86InstrInfo::isX86_64ExtendedReg(const MachineOperand &MO) {
+ if (!MO.isRegister()) return false;
+ switch (MO.getReg()) {
+ default: break;
+ case X86::R8: case X86::R9: case X86::R10: case X86::R11:
+ case X86::R12: case X86::R13: case X86::R14: case X86::R15:
+ case X86::R8D: case X86::R9D: case X86::R10D: case X86::R11D:
+ case X86::R12D: case X86::R13D: case X86::R14D: case X86::R15D:
+ case X86::R8W: case X86::R9W: case X86::R10W: case X86::R11W:
+ case X86::R12W: case X86::R13W: case X86::R14W: case X86::R15W:
+ case X86::R8B: case X86::R9B: case X86::R10B: case X86::R11B:
+ case X86::R12B: case X86::R13B: case X86::R14B: case X86::R15B:
+ case X86::XMM8: case X86::XMM9: case X86::XMM10: case X86::XMM11:
+ case X86::XMM12: case X86::XMM13: case X86::XMM14: case X86::XMM15:
+ return true;
+ }
+ return false;
+}
+
+/// determineREX - Determine if the MachineInstr has to be encoded with a X86-64
+/// REX prefix which specifies 1) 64-bit instructions, 2) non-default operand
+/// size, and 3) use of X86-64 extended registers.
+unsigned X86InstrInfo::determineREX(const MachineInstr &MI) {
+ unsigned REX = 0;
+ const TargetInstrDesc &Desc = MI.getDesc();
+
+ // Pseudo instructions do not need REX prefix byte.
+ if ((Desc.TSFlags & X86II::FormMask) == X86II::Pseudo)
+ return 0;
+ if (Desc.TSFlags & X86II::REX_W)
+ REX |= 1 << 3;
+
+ unsigned NumOps = Desc.getNumOperands();
+ if (NumOps) {
+ bool isTwoAddr = NumOps > 1 &&
+ Desc.getOperandConstraint(1, TOI::TIED_TO) != -1;
+
+ // If it accesses SPL, BPL, SIL, or DIL, then it requires a 0x40 REX prefix.
+ unsigned i = isTwoAddr ? 1 : 0;
+ for (unsigned e = NumOps; i != e; ++i) {
+ const MachineOperand& MO = MI.getOperand(i);
+ if (MO.isRegister()) {
+ unsigned Reg = MO.getReg();
+ if (isX86_64NonExtLowByteReg(Reg))
+ REX |= 0x40;
+ }
+ }
+
+ switch (Desc.TSFlags & X86II::FormMask) {
+ case X86II::MRMInitReg:
+ if (isX86_64ExtendedReg(MI.getOperand(0)))
+ REX |= (1 << 0) | (1 << 2);
+ break;
+ case X86II::MRMSrcReg: {
+ if (isX86_64ExtendedReg(MI.getOperand(0)))
+ REX |= 1 << 2;
+ i = isTwoAddr ? 2 : 1;
+ for (unsigned e = NumOps; i != e; ++i) {
+ const MachineOperand& MO = MI.getOperand(i);
+ if (isX86_64ExtendedReg(MO))
+ REX |= 1 << 0;
+ }
+ break;
+ }
+ case X86II::MRMSrcMem: {
+ if (isX86_64ExtendedReg(MI.getOperand(0)))
+ REX |= 1 << 2;
+ unsigned Bit = 0;
+ i = isTwoAddr ? 2 : 1;
+ for (; i != NumOps; ++i) {
+ const MachineOperand& MO = MI.getOperand(i);
+ if (MO.isRegister()) {
+ if (isX86_64ExtendedReg(MO))
+ REX |= 1 << Bit;
+ Bit++;
+ }
+ }
+ break;
+ }
+ case X86II::MRM0m: case X86II::MRM1m:
+ case X86II::MRM2m: case X86II::MRM3m:
+ case X86II::MRM4m: case X86II::MRM5m:
+ case X86II::MRM6m: case X86II::MRM7m:
+ case X86II::MRMDestMem: {
+ unsigned e = isTwoAddr ? 5 : 4;
+ i = isTwoAddr ? 1 : 0;
+ if (NumOps > e && isX86_64ExtendedReg(MI.getOperand(e)))
+ REX |= 1 << 2;
+ unsigned Bit = 0;
+ for (; i != e; ++i) {
+ const MachineOperand& MO = MI.getOperand(i);
+ if (MO.isRegister()) {
+ if (isX86_64ExtendedReg(MO))
+ REX |= 1 << Bit;
+ Bit++;
+ }
+ }
+ break;
+ }
+ default: {
+ if (isX86_64ExtendedReg(MI.getOperand(0)))
+ REX |= 1 << 0;
+ i = isTwoAddr ? 2 : 1;
+ for (unsigned e = NumOps; i != e; ++i) {
+ const MachineOperand& MO = MI.getOperand(i);
+ if (isX86_64ExtendedReg(MO))
+ REX |= 1 << 2;
+ }
+ break;
+ }
+ }
+ return REX;
+}
+
+/// sizePCRelativeBlockAddress - This method returns the size of a PC
+/// relative block address instruction
+///
+static unsigned sizePCRelativeBlockAddress() {
+ return 4;
+}
+
+/// sizeGlobalAddress - Give the size of the emission of this global address
+///
+static unsigned sizeGlobalAddress(bool dword) {
+ if (dword)
+ return 8;
+ else
+ return 4;
+}

How about: return dword ? 8 : 4;

+
+/// sizeConstPoolAddress - Give the size of the emission of this constant
+/// pool address
+///
+static unsigned sizeConstPoolAddress(bool dword) {
+ if (dword)
+ return 8;
+ else
+ return 4;
+}
+
+/// sizeExternalSymbolAddress - Give the size of the emission of this external
+/// symbol
+///
+static unsigned sizeExternalSymbolAddress(bool dword) {
+ if (dword)
+ return 8;
+ else
+ return 4;
+}
+
+/// sizeJumpTableAddress - Give the size of the emission of this jump
+/// table address
+///
+static unsigned sizeJumpTableAddress(bool dword) {
+ if (dword)
+ return 8;
+ else
+ return 4;
+}
+
+static unsigned sizeConstant(unsigned Size) {
+ return Size;
+}
+
+static unsigned sizeRegModRMByte(){
+ return 1;
+}
+
+static unsigned sizeSIBByte(){
+ return 1;
+}
+
+static unsigned sizeDisplacementField(const MachineOperand *RelocOp) {

How about getDisplacementFieldSize()?

+ unsigned FinalSize = 0;
+ // If this is a simple integer displacement that doesn't require a relocation,
+ // emit it now.
+ if (!RelocOp) {
+ FinalSize += sizeConstant(4);
+ return FinalSize;
+ }
+
+ // Otherwise, this is something that requires a relocation. Emit it as such
+ // now.

Not "emitting" anything here. :slight_smile:

+ if (RelocOp->isGlobalAddress()) {
+ FinalSize += sizeGlobalAddress(false);
+ } else if (RelocOp->isConstantPoolIndex()) {
+ FinalSize += sizeConstPoolAddress(false);
+ } else if (RelocOp->isJumpTableIndex()) {
+ FinalSize += sizeJumpTableAddress(false);
+ } else {
+ assert(0 && "Unknown value to relocate!");
+ }
+ return FinalSize;
+}
+
+static unsigned sizeMemModRMByte(const MachineInstr &MI, unsigned Op,
+ bool IsPIC, bool Is64BitMode) {

getMemModRMByteSize()?

+ const MachineOperand &Op3 = MI.getOperand(Op+3);
+ int DispVal = 0;
+ const MachineOperand *DispForReloc = 0;
+ unsigned FinalSize = 0;
+
+ // Figure out what sort of displacement we have to handle here.
+ if (Op3.isGlobalAddress()) {
+ DispForReloc = &Op3;
+ } else if (Op3.isConstantPoolIndex()) {
+ if (Is64BitMode || IsPIC) {
+ DispForReloc = &Op3;
+ } else {
+ DispVal = 1;
+ }
+ } else if (Op3.isJumpTableIndex()) {
+ if (Is64BitMode || IsPIC) {
+ DispForReloc = &Op3;
+ } else {
+ DispVal = 1;
+ }
+ } else {
+ DispVal = 1;
+ }
+
+ const MachineOperand &Base = MI.getOperand(Op);
+ const MachineOperand &IndexReg = MI.getOperand(Op+2);
+
+ unsigned BaseReg = Base.getReg();
+
+ // Is a SIB byte needed?
+ if (IndexReg.getReg() == 0 &&
+ (BaseReg == 0 || X86RegisterInfo::getX86RegNum(BaseReg) != N86::ESP)) {
+ if (BaseReg == 0) { // Just a displacement?
+ // Emit special case [disp32] encoding
+ ++FinalSize;
+ FinalSize += sizeDisplacementField(DispForReloc);
+ } else {
+ unsigned BaseRegNo = X86RegisterInfo::getX86RegNum(BaseReg);
+ if (!DispForReloc && DispVal == 0 && BaseRegNo != N86::EBP) {
+ // Emit simple indirect register encoding... [EAX] f.e.
+ ++FinalSize;
+ // Be pessimistic and assume it's a disp32, not a disp8
+ } else {
+ // Emit the most general non-SIB encoding: [REG+disp32]
+ ++FinalSize;
+ FinalSize += sizeDisplacementField(DispForReloc);
+ }
+ }
+
+ } else { // We need a SIB byte, so start by outputting the ModR/M byte first
+ assert(IndexReg.getReg() != X86::ESP &&
+ IndexReg.getReg() != X86::RSP && "Cannot use ESP as index reg!");
+
+ bool ForceDisp32 = false;
+ if (BaseReg == 0 || DispForReloc) {
+ // Emit the normal disp32 encoding.
+ ++FinalSize;
+ ForceDisp32 = true;
+ } else {
+ ++FinalSize;
+ }
+
+ FinalSize += sizeSIBByte();
+
+ // Do we need to output a displacement?
+ if (DispVal != 0 || ForceDisp32) {
+ FinalSize += sizeDisplacementField(DispForReloc);
+ }
+ }
+ return FinalSize;
+}
+
+static unsigned GetInstSizeWithDesc(const MachineInstr &MI,
+ const TargetInstrDesc *Desc,
+ bool IsPIC, bool Is64BitMode) {

I probably would have overloaded the name GetInstSizeInBytes(). Not a big deal.

+ DOUT << MI;

Do you want the DOUT here?

+
+ unsigned Opcode = Desc->Opcode;
+ unsigned FinalSize = 0;
+
+ // Emit the lock opcode prefix as needed.
+ if (Desc->TSFlags & X86II::LOCK) ++FinalSize;
+
+ // Emit the repeat opcode prefix as needed.
+ if ((Desc->TSFlags & X86II::Op0Mask) == X86II::REP) ++FinalSize;
+
+ // Emit the operand size opcode prefix as needed.
+ if (Desc->TSFlags & X86II::OpSize) ++FinalSize;
+
+ // Emit the address size opcode prefix as needed.
+ if (Desc->TSFlags & X86II::AdSize) ++FinalSize;
+
+ bool Need0FPrefix = false;
+ switch (Desc->TSFlags & X86II::Op0Mask) {
+ case X86II::TB: // Two-byte opcode prefix
+ case X86II::T8: // 0F 38
+ case X86II::TA: // 0F 3A
+ Need0FPrefix = true;
+ break;
+ case X86II::REP: break; // already handled.
+ case X86II::XS: // F3 0F
+ ++FinalSize;
+ Need0FPrefix = true;
+ break;
+ case X86II::XD: // F2 0F
+ ++FinalSize;
+ Need0FPrefix = true;
+ break;
+ case X86II::D8: case X86II::D9: case X86II::DA: case X86II::DB:
+ case X86II::DC: case X86II::DD: case X86II::DE: case X86II::DF:
+ ++FinalSize;
+ break; // Two-byte opcode prefix
+ default: assert(0 && "Invalid prefix!");
+ case 0: break; // No prefix!
+ }
+
+ if (Is64BitMode) {
+ // REX prefix
+ unsigned REX = X86InstrInfo::determineREX(MI);
+ if (REX)
+ ++FinalSize;
+ }
+
+ // 0x0F escape code must be emitted just before the opcode.
+ if (Need0FPrefix)
+ ++FinalSize;
+
+ switch (Desc->TSFlags & X86II::Op0Mask) {
+ case X86II::T8: // 0F 38
+ ++FinalSize;
+ break;
+ case X86II::TA: // 0F 3A
+ ++FinalSize;
+ break;
+ }
+
+ // If this is a two-address instruction, skip one of the register operands.
+ unsigned NumOps = Desc->getNumOperands();
+ unsigned CurOp = 0;
+ if (NumOps > 1 && Desc->getOperandConstraint(1, TOI::TIED_TO) != -1)
+ CurOp++;
+
+ switch (Desc->TSFlags & X86II::FormMask) {
+ default: assert(0 && "Unknown FormMask value in X86 MachineCodeEmitter!");
+ case X86II::Pseudo:
+ // Remember the current PC offset, this is the PIC relocation
+ // base address.
+ switch (Opcode) {
+ default:
+ break;
+ case TargetInstrInfo::INLINEASM: {
+ const MachineFunction *MF = MI.getParent()->getParent();
+ const char *AsmStr = MI.getOperand(0).getSymbolName();
+ const TargetAsmInfo* AI = MF->getTarget().getTargetAsmInfo();
+ FinalSize += AI->getInlineAsmLength(AsmStr);
+ break;
+ }
+ case TargetInstrInfo::LABEL:
+ break;
+ case TargetInstrInfo::IMPLICIT_DEF:
+ case TargetInstrInfo::DECLARE:
+ case X86::DWARF_LOC:
+ case X86::FP_REG_KILL:
+ break;
+ case X86::MOVPC32r: {
+ // This emits the "call" portion of this pseudo instruction.
+ ++FinalSize;
+ FinalSize += sizeConstant(X86InstrInfo::sizeOfImm(Desc));
+ break;
+ }
+ }
+ CurOp = NumOps;
+ break;
+ case X86II::RawFrm:
+ ++FinalSize;
+
+ if (CurOp != NumOps) {
+ const MachineOperand &MO = MI.getOperand(CurOp++);
+ if (MO.isMachineBasicBlock()) {
+ FinalSize += sizePCRelativeBlockAddress();
+ } else if (MO.isGlobalAddress()) {
+ FinalSize += sizeGlobalAddress(false);
+ } else if (MO.isExternalSymbol()) {
+ FinalSize += sizeExternalSymbolAddress(false);
+ } else if (MO.isImmediate()) {
+ FinalSize += sizeConstant(X86InstrInfo::sizeOfImm(Desc));
+ } else {
+ assert(0 && "Unknown RawFrm operand!");
+ }
+ }
+ break;
+
+ case X86II::AddRegFrm:
+ ++FinalSize;
+
+ if (CurOp != NumOps) {
+ const MachineOperand &MO1 = MI.getOperand(CurOp++);
+ unsigned Size = X86InstrInfo::sizeOfImm(Desc);
+ if (MO1.isImmediate())
+ FinalSize += sizeConstant(Size);
+ else {
+ bool dword = false;
+ if (Opcode == X86::MOV64ri)
+ dword = true;
+ if (MO1.isGlobalAddress()) {
+ FinalSize += sizeGlobalAddress(dword);
+ } else if (MO1.isExternalSymbol())
+ FinalSize += sizeExternalSymbolAddress(dword);
+ else if (MO1.isConstantPoolIndex())
+ FinalSize += sizeConstPoolAddress(dword);
+ else if (MO1.isJumpTableIndex())
+ FinalSize += sizeJumpTableAddress(dword);
+ }
+ }
+ break;
+
+ case X86II::MRMDestReg: {
+ ++FinalSize;
+ FinalSize += sizeRegModRMByte();
+ CurOp += 2;
+ if (CurOp != NumOps)
+ FinalSize += sizeConstant(X86InstrInfo::sizeOfImm(Desc));
+ break;
+ }
+ case X86II::MRMDestMem: {
+ ++FinalSize;
+ FinalSize += sizeMemModRMByte(MI, CurOp, IsPIC, Is64BitMode);
+ CurOp += 5;
+ if (CurOp != NumOps)
+ FinalSize += sizeConstant(X86InstrInfo::sizeOfImm(Desc));
+ break;
+ }
+
+ case X86II::MRMSrcReg:
+ ++FinalSize;
+ FinalSize += sizeRegModRMByte();
+ CurOp += 2;
+ if (CurOp != NumOps)
+ FinalSize += sizeConstant(X86InstrInfo::sizeOfImm(Desc));
+ break;
+
+ case X86II::MRMSrcMem: {
+
+ ++FinalSize;
+ FinalSize += sizeMemModRMByte(MI, CurOp+1, IsPIC, Is64BitMode);
+ CurOp += 5;
+ if (CurOp != NumOps)
+ FinalSize += sizeConstant(X86InstrInfo::sizeOfImm(Desc));
+ break;
+ }
+
+ case X86II::MRM0r: case X86II::MRM1r:
+ case X86II::MRM2r: case X86II::MRM3r:
+ case X86II::MRM4r: case X86II::MRM5r:
+ case X86II::MRM6r: case X86II::MRM7r:
+ ++FinalSize;
+ FinalSize += sizeRegModRMByte();
+
+ if (CurOp != NumOps) {
+ const MachineOperand &MO1 = MI.getOperand(CurOp++);
+ unsigned Size = X86InstrInfo::sizeOfImm(Desc);
+ if (MO1.isImmediate())
+ FinalSize += sizeConstant(Size);
+ else {
+ bool dword = false;
+ if (Opcode == X86::MOV64ri32)
+ dword = true;
+ if (MO1.isGlobalAddress()) {
+ FinalSize += sizeGlobalAddress(dword);
+ } else if (MO1.isExternalSymbol())
+ FinalSize += sizeExternalSymbolAddress(dword);
+ else if (MO1.isConstantPoolIndex())
+ FinalSize += sizeConstPoolAddress(dword);
+ else if (MO1.isJumpTableIndex())
+ FinalSize += sizeJumpTableAddress(dword);
+ }
+ }
+ break;
+
+ case X86II::MRM0m: case X86II::MRM1m:
+ case X86II::MRM2m: case X86II::MRM3m:
+ case X86II::MRM4m: case X86II::MRM5m:
+ case X86II::MRM6m: case X86II::MRM7m: {
+
+ ++FinalSize;
+ FinalSize += sizeMemModRMByte(MI, CurOp, IsPIC, Is64BitMode);
+ CurOp += 4;
+
+ if (CurOp != NumOps) {
+ const MachineOperand &MO = MI.getOperand(CurOp++);
+ unsigned Size = X86InstrInfo::sizeOfImm(Desc);
+ if (MO.isImmediate())
+ FinalSize += sizeConstant(Size);
+ else {
+ bool dword = false;
+ if (Opcode == X86::MOV64mi32)
+ dword = true;
+ if (MO.isGlobalAddress()) {
+ FinalSize += sizeGlobalAddress(dword);
+ } else if (MO.isExternalSymbol())
+ FinalSize += sizeExternalSymbolAddress(dword);
+ else if (MO.isConstantPoolIndex())
+ FinalSize += sizeConstPoolAddress(dword);
+ else if (MO.isJumpTableIndex())
+ FinalSize += sizeJumpTableAddress(dword);
+ }
+ }
+ break;
+ }
+
+ case X86II::MRMInitReg:
+ ++FinalSize;
+ // Duplicate register, used by things like MOV8r0 (aka xor reg,reg).
+ FinalSize += sizeRegModRMByte();
+ ++CurOp;
+ break;
+ }
+
+ if (!Desc->isVariadic() && CurOp != NumOps) {
+ cerr << "Cannot encode: ";

"Cannot determine size"?

+ MI.dump();
+ cerr << '\n';
+ abort();
+ }
+
+ return FinalSize;
+}
+
+unsigned X86InstrInfo::GetInstSize(const MachineInstr *MI) const {
+ const TargetInstrDesc &Desc = MI->getDesc();
+ bool IsPIC = (TM.getRelocationModel() == Reloc::PIC_);
+ bool Is64BitMode = ((X86Subtarget*)TM.getSubtargetImpl())->is64Bit();
+ unsigned Size = GetInstSizeWithDesc(*MI, &Desc, IsPIC, Is64BitMode);
+ if (Desc.getOpcode() == X86::MOVPC32r) {
+ Size += GetInstSizeWithDesc(*MI, &get(X86::POP32r), IsPIC, Is64BitMode);
+ }

I would prefer this special case be handled in GetInstSizeWithDesc().

+ return Size;
+}

Looks great! Thanks.

Evan