keeping instructions in order and hidden dependencies

I have some pseudos that I am expanding in the Mips 16 port. Currently they are blasted in one chunk as a multi line instruction sequence but I am changing the code now to expand them
after register allocation.

They are essentially macros and I need to make sure, at this time at least, that the individual instructions are not reordered or moved around.

There are dependencies sometimes between the instructions that I'm not sure how to tell LLVM about.

For example, this first one is a two instruction macro where register T8 is implicitly set by the first instruction and used by the second instruction.

T8 is not a mips16 registers but some instructions use it implicitly and it can function as
a condition code register.

In this first case, I do a compare (CMP) and the result sets T8 and then the branch instruction following it uses this as if it were a condition code register.

Maybe I can just set first instruction as defining T8 and the second as using it for the last time.

Without this expansion, this is not an issue.

At some future time it might be possible to reuse this condition register later and move it around but for now I'm not needing that optimization.

Tia.

Reed

Some of my pseudos do conditional branch .+4 and such.

I don't want the instruction scheduler to get creative on me.

AFAIK, You have two choices: use a pseudo that is lowered into separate instructions later as part of asm emission, or use MI bundles. The former is generally what existing targets use for this sort of thing, but perhaps the second would work better for you.

Cameron

One of my reasons for lowering things early is that I need to get an accurate count of the size of things.

Some of the pseudos even have instructions like compare immediate, which in Mips 16 has two forms depending on the magnitude of the immediate field.

Though I suppose it's possible to leave things as a pseudo and calculate their size, though I'm not sure where I could store the result.

I need the size for long/short jump optimization, constant islands, delay slot filling (on mips16 you can't put a 32 bit instruction in a delay slot) and other things.

I have not used bundles.

Is the idea that I could create a bundle and insert the instructions in the bundle and then they are guaranteed to stay together?

At some future time I will need to teach LLVM more about what I'm doing so that it can schedule the individual instructions but now I'm mostly
concerned about not having bugs from subtle optimization issues.

You are trying to do a few different things here, and a uniform solution may not work for all of them. For a fixed instruction sequence, e.g. a special kind of move-and-branch sequence used for tail calls, you probably want a pseudo. If you are trying to combine arbitrary instructions together, e.g. Thumb IT blocks, you probably want to use bundles, even if the sequences are a fixed length. I think bundles might work well for your delay slots. Bundles will force instructions to stay together in sequence, and there is really no other way to do this with arbitrary instructions.

If you have a compare instruction that generates a flag and a branch instruction that uses it, you want to model this dependency with an unallocatable physical register, e.g. the CPSR on ARM:

// Condition code registers.
def CCR : RegisterClass<"ARM", [i32], 32, (add CPSR)> {
  let CopyCost = -1; // Don't allow copying of status registers.
  let isAllocatable = 0;
}

This will require you to mark defs and uses (and kills), but if you're using it purely inside of a single basic block you won't have to worry about it too much. If you have instructions that may only generate this dependency in certain circumstances, you can use implicit def/use operands, which can be added to any MachineInstr even if the register is not part of the normal def/use list. If you need the instructions to always be adjacent and not simply satisfy a physical register dependency, then you'll have to use pseudos or bundles.

For long/short jump optimizations you probably want to optimistically assume that all jumps are short and add a pass to relax them prior to asm emission, but I don't know the exact constraints of MIPS. On ARM this is done as part of the constant island pass.

I CC'd Jim because he's the expert on the ARM constant island formation and he can give you some advice from experience there. I have always avoided that code. :slight_smile:

Cameron

Sounds like bundles will be the simplest to start with though I suppose
I could just lower the pseudos after scheduling is done; for now.

Bundles will prevent things from being able to be scheduled in more creative ways but for that I need to think more about the problem.

So I can just create a bundle, insert instructions in it, and all will work more or less?

I'm trying to take the next step now and separate out the blobs that I'm emitted as part of pseudos.

I want to have individual instructions, even if they are part of bundles. That also makes it easier on the direct object emitter.

MOdelling this T8 register (the source of most of the pseudos) as an unallocatable condition code regiter it probably the right thing to do.
I will think about that.

This particular register can be copied to normal mips16 registers but
I try and avoid that and instead use it as is using special instructions
that implicitly use T8.

Looks like Akira is already using bundles in some other parts of the Mips port. He's using the helper class for that.

I got my expander code working for the pseudos yesterday and today I just need to make sure I'm doing that in a safe place and in a safe way
and then I can finish all of that along with my 16/32 bit optimization; hopefully before Tuesday which is my goal.

I have a lot of pseudos but there are some patterns to them so the C++
code that expands them should not be too much.

From: "Cameron Zwarich" <zwarich@apple.com>
To: "Reed Kotler" <rkotler@mips.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Sunday, February 17, 2013 1:49:46 AM
Subject: Re: [LLVMdev] keeping instructions in order and hidden dependencies

You are trying to do a few different things here, and a uniform
solution may not work for all of them. For a fixed instruction
sequence, e.g. a special kind of move-and-branch sequence used for
tail calls, you probably want a pseudo. If you are trying to combine
arbitrary instructions together, e.g. Thumb IT blocks, you probably
want to use bundles, even if the sequences are a fixed length. I
think bundles might work well for your delay slots. Bundles will
force instructions to stay together in sequence, and there is really
no other way to do this with arbitrary instructions.

If you have a compare instruction that generates a flag and a branch
instruction that uses it, you want to model this dependency with an
unallocatable physical register, e.g. the CPSR on ARM:

// Condition code registers.
def CCR : RegisterClass<"ARM", [i32], 32, (add CPSR)> {
  let CopyCost = -1; // Don't allow copying of status registers.
  let isAllocatable = 0;
}

This will require you to mark defs and uses (and kills), but if
you're using it purely inside of a single basic block you won't have
to worry about it too much. If you have instructions that may only
generate this dependency in certain circumstances, you can use
implicit def/use operands, which can be added to any MachineInstr
even if the register is not part of the normal def/use list. If you
need the instructions to always be adjacent and not simply satisfy a
physical register dependency, then you'll have to use pseudos or
bundles.

For long/short jump optimizations you probably want to optimistically
assume that all jumps are short and add a pass to relax them prior
to asm emission, but I don't know the exact constraints of MIPS. On
ARM this is done as part of the constant island pass.

If you'd like other examples, for PowerPC this is done in PPCBranchSelector.cpp, and Hexagon does something similar in HexagonFixupHwLoops.cpp.

-Hal

You are trying to do a few different things here, and a uniform solution may not work for all of them. For a fixed instruction sequence, e.g. a special kind of move-and-branch sequence used for tail calls, you probably want a pseudo. If you are trying to combine arbitrary instructions together, e.g. Thumb IT blocks, you probably want to use bundles, even if the sequences are a fixed length. I think bundles might work well for your delay slots. Bundles will force instructions to stay together in sequence, and there is really no other way to do this with arbitrary instructions.

If you have a compare instruction that generates a flag and a branch instruction that uses it, you want to model this dependency with an unallocatable physical register, e.g. the CPSR on ARM:

// Condition code registers.
def CCR : RegisterClass<"ARM", [i32], 32, (add CPSR)> {
let CopyCost = -1; // Don't allow copying of status registers.
let isAllocatable = 0;
}

This will require you to mark defs and uses (and kills), but if you're using it purely inside of a single basic block you won't have to worry about it too much. If you have instructions that may only generate this dependency in certain circumstances, you can use implicit def/use operands, which can be added to any MachineInstr even if the register is not part of the normal def/use list. If you need the instructions to always be adjacent and not simply satisfy a physical register dependency, then you'll have to use pseudos or bundles.

For long/short jump optimizations you probably want to optimistically assume that all jumps are short and add a pass to relax them prior to asm emission, but I don't know the exact constraints of MIPS. On ARM this is done as part of the constant island pass.

I CC'd Jim because he's the expert on the ARM constant island formation and he can give you some advice from experience there. I have always avoided that code. :slight_smile:

Good answer this time (though in a separate thread)! I would add that bundles should only be formed after coalescing. expandPostRAPseudos is clearly the safest for this sort of thing if you can get the registers you need.

-Andy