Symbol folding with MC

Hello, I have some questions regarding folding operations with symbols during the instruction print stage with MC. At the moment I’m working with global symbols, but I guess that other symbol types should be equivalent.

My first question is: how can I negate the address of a symbol?

Consider this piece of code:
char g_var[80];
char foo(int a) { return g_var[a]; }

This gets compiled into something like (in pseudo asm):
addi a, g_var
load retreg, a

But I don’t have an add-with-immediate instruction, so I have to do the following:
subi a, -g_var // negate g_var addr
load retreg, a

A solution I thought of is passing a target flag indicating that a negation is needed when lowering the MachineInstr into an MCInst, and adding an MCExpr to negate the symbol. But I want to know if there’s a better way to do this, instead of delaying it to the stage of MCInst lowering.

The other question is how to fold single and complex operations on symbols. Say we have something like:

unsigned int g_var[80];
unsigned int foo() { return (unsigned int)&g_var[0] & 0x1234; }

Currently this moves the g_var address into a register and then performs the AND operation, but I want this to be done at compile time, so that we get something like:

move retreg, (g_var & 0x1234)

Without touching anything else, only additions get folded, but this could be extended to other operations like or, xor, shifts, etc. A more complex case would be combining several operations in a single statement. So my question is how to achieve this. One idea I’ve had is to use a pseudo instruction that takes an operand depending on the instruction to fold, then expand this pseudo instruction into the real move instruction, setting a target flag that depends on the operation to fold, and in the MCInst lowering stage create an MCExpr based on these flags. The problem is that this can’t handle more than one operation per statement.

Thanks

Hello,

Hello, I have some questions regarding folding operations with symbols during the instruction print stage with MC. At the moment I'm working with global symbols, but I guess that other symbol types should be equivalent.

My first question is: how can I negate the address of a symbol?

Consider this piece of code:
char g_var[80];
char foo(int a) { return g_var[a]; }

This gets compiled into something like (in pseudo asm):
addi a, g_var
load retreg, a

But I don't have an add-with-immediate instruction, so I have to do the following:
subi a, -g_var // negate g_var addr
load retreg, a

A solution I thought of is passing a target flag indicating that a negation is needed when lowering the MachineInstr into an MCInst, and adding an MCExpr to negate the symbol. But I want to know if there's a better way to do this, instead of delaying it to the stage of MCInst lowering.

These sorts of constraints are normally enforced prior to lowering to MC. Handling them directly as part of instruction selection as much as possible is good (the ARM target has examples of this for using ADD/SUB immediate instructions). For example, don't express in the target .td file(s) that you have an add-immediate instruction if you actually don't, but do add patterns for the operation using the subtract-immediate instruction. For symbolic immediate references, you're correct that the expression on the operand will include the negation.

MC is designed such that it should always represent legal instructions, and only legal instructions. That includes things like register operands being legal for the instruction, immediates being in range, etc. There's (currently) no verification pass for those constraints, but that's the idea, so waiting until after MC lowering to check for and transform the instructions is not preferable and is likely to break if/when we add such a verification pass.

If your target has properties that make it impossible to do this at instruction selection time, I would suggest a late machine function pass that will scan for and transform the instructions as necessary. This would all be at the MachineInstr level before lowering to MC.

The other question is how to fold single and complex operations on symbols. Say we have something like:

unsigned int g_var[80];
unsigned int foo() { return (unsigned int)&g_var[0] & 0x1234; }

Currently this moves the g_var address into a register and then performs the AND operation, but I want this to be done at compile time, so that we get something like:

move retreg, (g_var & 0x1234)

For many targets this isn't legal, as the object file format used can't represent those sorts of expressions in a relocation. It sounds like your situation is different, though.

Without touching anything else, only additions get folded, but this could be extended to other operations like or, xor, shifts, etc. A more complex case would be combining several operations in a single statement. So my question is how to achieve this. One idea I've had is to use a pseudo instruction that takes an operand depending on the instruction to fold, then expand this pseudo instruction into the real move instruction, setting a target flag that depends on the operation to fold, and in the MCInst lowering stage create an MCExpr based on these flags. The problem is that this can't handle more than one operation per statement.

A custom lowering or a target DAG combine would likely be your best bet.
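To make the folding idea concrete, here is a minimal sketch of the check such a combine would need: a subtree can become a single operand expression only when every leaf is a global address or a constant. The `Node` type, its kinds and `tryFold` are hypothetical toy names, not SelectionDAG API. Unlike the one-flag-per-operation scheme, the recursion handles any number of nested operations.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Toy DAG node. The kinds are hypothetical and only loosely mirror SelectionDAG.
struct Node {
    enum Kind { GlobalAddr, Const, Reg, Add, And, Or, Xor, Shl } kind;
    std::string text;                // symbol name, constant literal or register name
    std::shared_ptr<Node> lhs, rhs;  // set for the binary kinds only
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(Node::Kind k, std::string text) {
    return std::make_shared<Node>(Node{k, std::move(text), nullptr, nullptr});
}

NodePtr op(Node::Kind k, NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{k, "", std::move(l), std::move(r)});
}

// Operator spelling used in the emitted assembler expression.
const char* opText(Node::Kind k) {
    switch (k) {
    case Node::Add: return " + ";
    case Node::And: return " & ";
    case Node::Or:  return " | ";
    case Node::Xor: return " ^ ";
    case Node::Shl: return " << ";
    default:        return "";  // leaves have no operator text
    }
}

// Returns the assembler expression for the subtree when every leaf is a
// global address or a constant. Returns nullopt as soon as a register is
// involved, because that value is only known at run time.
std::optional<std::string> tryFold(const Node& n) {
    if (n.kind == Node::GlobalAddr || n.kind == Node::Const) return n.text;
    if (n.kind == Node::Reg) return std::nullopt;
    std::optional<std::string> l = tryFold(*n.lhs);
    std::optional<std::string> r = tryFold(*n.rhs);
    if (!l || !r) return std::nullopt;
    return "(" + *l + opText(n.kind) + *r + ")";
}
```

Calling `tryFold` on the tree for `&g_var[0] & 0x1234` yields the text of a single operand, while any tree that contains a register leaf is rejected and has to be computed with real instructions.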

Regards,
  Jim

Hello Jim, thanks for the reply.

For normal additions with immediates I’ve done the same as ARM does, basically transforming add(x, imm) nodes to sub(x, -imm) with a pattern in the .td file like this:
def : Pat<(add DLDREGS:$src1, imm:$src2),
              (SUBIWRdK DLDREGS:$src1, (imm16_neg_XFORM imm:$src2))>;

Now, the typical pattern concerning additions with global addresses looks like this (taken from x86):
def : Pat<(add GR32:$src1, (X86Wrapper tglobaladdr:$src2)),
              (ADD32ri GR32:$src1, tglobaladdr:$src2)>;

But I can’t write that since I don’t have an add-with-immediate instruction, and doing:

def : Pat<(add DREGS:$src, (Wrapper tglobaladdr:$src2)),
              (SUBIWRdK DREGS:$src, tglobaladdr:$src2)>;
is wrong because the tglobaladdr has to be negated somehow. I don’t understand how I should negate the symbol reference using patterns, if it’s even possible. The obvious hack is adding a “-” character when lowering the symbol reference into text.

Regarding my second question: as you mentioned, all symbols have static addresses so no relocations are performed, so it should be safe to fold immediate operations with the symbol reference. My problem here is that I don’t know how to fold an arbitrary expression on a global (initially in the form of a DAG) into something that can later be translated into an expression with MC. It’s unusual because the operations are performed in the operand of an instruction, and since it has to support any arbitrary expression, you can’t cover all combinations of operations with custom instructions. So how should I proceed here using custom lowering or target DAG combines?

Thanks

Hello Jim, thanks for the reply.

For normal additions with immediates I've done the same as ARM does, basically transforming add(x, imm) nodes to sub(x, -imm) with a pattern in the .td file like this:
def : Pat<(add DLDREGS:$src1, imm:$src2),
              (SUBIWRdK DLDREGS:$src1, (imm16_neg_XFORM imm:$src2))>;

Cool. That's exactly the sort of thing I was referring to.
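The add-to-sub rewrite relies on two's-complement wraparound: adding `imm` and subtracting its negation give the same register value. A minimal sketch of that identity, assuming (from the name only) that `imm16_neg_XFORM` negates a 16-bit immediate; `neg_imm16`, `add16` and `subi16` are hypothetical stand-ins, not the target's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of what a transform like imm16_neg_XFORM computes:
// the two's-complement negation of a 16-bit immediate.
constexpr std::uint16_t neg_imm16(std::uint16_t imm) {
    return static_cast<std::uint16_t>(0u - imm);
}

// Toy model of a 16-bit add: the result wraps modulo 2^16.
constexpr std::uint16_t add16(std::uint16_t x, std::uint16_t imm) {
    return static_cast<std::uint16_t>(x + imm);
}

// Toy model of a 16-bit subtract-immediate, also wrapping modulo 2^16.
constexpr std::uint16_t subi16(std::uint16_t x, std::uint16_t imm) {
    return static_cast<std::uint16_t>(x - imm);
}

// Checks add(x, imm) == subi(x, -imm) for every 16-bit immediate.
inline bool rewrite_is_sound_for(std::uint16_t x) {
    for (std::uint32_t i = 0; i <= 0xFFFFu; ++i) {
        const std::uint16_t imm = static_cast<std::uint16_t>(i);
        if (add16(x, imm) != subi16(x, neg_imm16(imm))) return false;
    }
    return true;
}
```

The same identity is what makes `subi a, -g_var` equivalent to `addi a, g_var` once the symbol's address is known, which is why the only missing piece for globals is a way to express the negation on the operand.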

Now, the typical pattern concerning additions with global addresses looks like this (taken from x86):
def : Pat<(add GR32:$src1, (X86Wrapper tglobaladdr:$src2)),
              (ADD32ri GR32:$src1, tglobaladdr:$src2)>;

But I can't write that since I don't have an add-with-immediate instruction, and doing:

def : Pat<(add DREGS:$src, (Wrapper tglobaladdr:$src2)),
              (SUBIWRdK DREGS:$src, tglobaladdr:$src2)>;
is wrong because the tglobaladdr has to be negated somehow. I don't understand how I should negate the symbol reference using patterns, if it's even possible. The obvious hack is adding a "-" character when lowering the symbol reference into text.

You can probably do some of this with a complex pattern that has a transform function. Something like (completely untested, etc):

def neg_tglobaladdr_XFORM : SDNodeXForm<tglobaladdr, [{return makeNegatedGlobalAddr(CurDAG);}]>;
def neg_tglobaladdr : PatLeaf<(tglobaladdr), [{
    return <true if the curdag really is a tglobaladdr, false otherwise>;
  }], neg_tglobaladdr_XFORM>;

def : Pat<(add DREGS:$src, (Wrapper tglobaladdr:$src2)),
              (SUBIWRdK DREGS:$src, neg_tglobaladdr:$src2)>;

As you note below, however, that sort of thing only gets you partway there.

Regarding my second question: as you mentioned, all symbols have static addresses so no relocations are performed, so it should be safe to fold immediate operations with the symbol reference. My problem here is that I don't know how to fold an arbitrary expression on a global (initially in the form of a DAG) into something that can later be translated into an expression with MC. It's unusual because the operations are performed in the operand of an instruction, and since it has to support any arbitrary expression, you can't cover all combinations of operations with custom instructions. So how should I proceed here using custom lowering or target DAG combines?

Yeah, machine instruction operands aren't set up to handle that sort of thing. This is outside the scope of what LLVM ordinarily does.

I suspect that you'll need to modify the MachineOperand class to have a Kind that accepts MCExpr operands. The combiners and isel patterns would then have a place to hang the expressions they create. Your MC lowering pass would then have the information it needs.

I'm not completely thrilled with that idea, as it seems a bit heavyweight. Perhaps someone else has a better plan they can suggest.
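For illustration only, a toy model of an operand that can carry an expression tree alongside the usual register and immediate kinds, so that the combiner has somewhere to hang the expression and the lowering step can resolve it. `Expr`, `Operand`, `SymbolTable`, `evaluate` and `lowerOperand` are hypothetical names, not LLVM's `MachineOperand` or `MCExpr` API, and the 32-bit address width is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <variant>

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

// Toy stand-in for an MCExpr-like tree. None of this is LLVM API.
struct Expr {
    enum Kind { Symbol, Constant, Neg, And, Or } kind;
    std::string name;     // used by Symbol
    std::uint32_t value;  // used by Constant
    ExprPtr lhs, rhs;     // Neg uses lhs only; And/Or use both
};

ExprPtr sym(std::string n) {
    return std::make_shared<Expr>(Expr{Expr::Symbol, std::move(n), 0, nullptr, nullptr});
}
ExprPtr num(std::uint32_t v) {
    return std::make_shared<Expr>(Expr{Expr::Constant, "", v, nullptr, nullptr});
}
ExprPtr neg(ExprPtr e) {
    return std::make_shared<Expr>(Expr{Expr::Neg, "", 0, std::move(e), nullptr});
}
ExprPtr bin(Expr::Kind k, ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{k, "", 0, std::move(l), std::move(r)});
}

// An operand is a register, a plain immediate, or an expression tree.
struct Reg { std::string name; };
struct Imm { std::uint32_t value; };
using Operand = std::variant<Reg, Imm, ExprPtr>;

// Static symbol addresses, known once the final layout is fixed.
using SymbolTable = std::map<std::string, std::uint32_t>;

// Evaluates the tree with 32-bit wraparound arithmetic.
std::uint32_t evaluate(const Expr& e, const SymbolTable& addrs) {
    switch (e.kind) {
    case Expr::Symbol:   return addrs.at(e.name);
    case Expr::Constant: return e.value;
    case Expr::Neg:      return 0u - evaluate(*e.lhs, addrs);
    case Expr::And:      return evaluate(*e.lhs, addrs) & evaluate(*e.rhs, addrs);
    case Expr::Or:       return evaluate(*e.lhs, addrs) | evaluate(*e.rhs, addrs);
    }
    return 0;  // not reached for valid kinds
}

// Resolves an operand to the text that would be emitted for it.
std::string lowerOperand(const Operand& op, const SymbolTable& addrs) {
    if (const Reg* r = std::get_if<Reg>(&op)) return r->name;
    if (const Imm* i = std::get_if<Imm>(&op)) return std::to_string(i->value);
    return std::to_string(evaluate(*std::get<ExprPtr>(op), addrs));
}
```

Because the operand holds a whole tree, both the negated symbol for `subi` and a multi-operation fold such as `(g_var & 0x1234) | 1` fit in the same slot, without one target flag per operation.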

Regards,
  Jim

Thanks Jim, I’ve implemented the negation part successfully :) Maybe the second part could be a feature request, so others could use it as well?

2011/4/27 Jim Grosbach <grosbach@apple.com>

Thanks Jim, I've implemented the negation part successfully :) Maybe the second part could be a feature request, so others could use it as well?

Glad to hear things are moving forward for you. Feel free to file a PR for the enhancement.

Regards,
  Jim