tblgen multiclasses

For anyone interested, X86InstrSSE.td makes extensive use of multiclasses now if people are looking for examples other than the sparc backend.

-Chris

Hi Chris,

Thanks for this info. This provides even better and more advanced
examples of multiclass usage!

But your previous explanations were so good that I implemented almost
the same thing in my backend last week that you've now done in
X86InstrSSE.td. I even introduced an isCommutable parameter to indicate
this property, just as you did. So by now, integer arithmetic and
general-purpose instructions are implemented. I'm working on the FP
support now.

Some feedback about tblgen from my side, i.e. an LLVM newcomer with
quite some compiler construction experience:
When it comes to tblgen descriptions and the corresponding DAG
selection and lowering code to be written for a backend, I found the
use of InFlag, OutFlag and chains hard to understand, very
underspecified and not well documented. Even though they are used in
all backends, their semantics and correct use are far from obvious
(even though I'm not new to compiler writing). I spent most of my time
on getting these things right. And I learned that if they misbehave, it
has very fatal consequences for the overall code selection process. Now
it works, but I still don't quite understand their overall semantics
and don't feel very confident about them. And I guess I'm not the only
one. Therefore, I would kindly ask for a clear explanation of these
concepts, here on the developers list and maybe even in the docs,
giving guidelines on their usage and preferably a small but
understandable example making use of them. I think such a description
would make creation of new backends much easier and faster.

Best Regards,
Roman

Hi,

I'm now ready to implement the FP support for my embedded target.

My target supports only f64 at the moment.
Question: How can I tell LLVM that float is the same as double on my
target? Maybe by assigning the same register class to both MVT::f32
and MVT::f64?

But FP is supported only in emulated mode, because the target does
not have any hardware support for FP. Therefore each FP operation is
supposed to be converted into a call to an assembler function
implementing the corresponding operation. All these FP operations
implemented in assembler always expect parameters in concrete f64
registers, i.e. %d0, %d1, and return their results in reg %d0. The
value of %d1 is clobbered by such calls. (Actually the %dX are pseudo
regs, see below.)

1. Since these FP emulation functions take operands in registers and
produce results in registers without any further side effects, they
look pretty much like real instructions. Thus I have the idea to
represent them in the tblgen instruction descriptions as
pseudo-instructions, where constraints define which concrete physical
%dX registers are to be used. This would enforce correct register
allocation.

For example:
def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
           Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1

This seems to work, at least on simple test files.
            
But I would also need a way to convert such an FSUB64
pseudo-instruction into the assembler function call, e.g.
"call __fsub64". At the moment I don't quite understand at which stage
and how I should do it (lowering, selection, combining?). What would be
the easiest way to map it to such a call instruction?

One issue with the described approach is pretty inefficient code
resulting after register allocation. For example, there are a lot
of instructions of the form "mov %d0, %d0", copying a register into
itself. My guess is that the following happens:
before reg.alloc there are instructions of the form:
mov %virtual_reg0, %d0
mov %virtual_reg1, %d1
fsub64
which ensure that the operand constraints of the operation are
fulfilled and the operands are in the right registers. During
allocation, the register allocator assigns the same physical register
to the virtual register.
Therefore the code becomes:
mov %d0, %d0
mov %d1, %d1
fsub64

But then there is no call to "useless copies elimination" pass or
peephole pass that would basically remove such copies.

Question: Is there such a pass available in LLVM? Actually, it is also
interesting to know why the regalloc does not eliminate such coalesced
moves itself. Wouldn't that make sense?

Does this idea of representing the emulated FP operation calls as
instructions as described above make some sense? Or do you see easier
or more useful ways to do it?

2. In reality, the processor has only 32bit regs. Therefore, any f64
value should be mapped to two 32bit registers. What is the best way to
achieve it? I guess this is a well-known kind of problem.

So far I was thinking about introducing some pseudo f64 registers,
i.e. the %dX used above, and working with them in the instruction
descriptions. And then at later stages, probably after lowering and
selection, expand them into pairs of load or store operations.

But I'm not quite sure that this is the right way to go. I suspect that
something can be done using some form of EXPAND operation in the
lowering pass. For example, I see that assignments of f64 immediates to
globals are expanded by LLVM automatically into two 32bit stores, which
is very nice. Maybe it is also possible to do this for 64bit registers
as well?

OK, enough questions for today ;)

Thanks for any feedback,
Roman

I'm now ready to implement the FP support for my embedded target.

cool.

My target supports only f64 at the moment.
Question: How can I tell LLVM that float is the same as double on my
target? Maybe by assigning the same register class to both MVT::f32
and MVT::f64?

Just don't assign a register class for the f32 type. This is what the X86 backend does when it is in "floating point stack mode". This will implicitly promote everything to f64.
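For reference, a minimal sketch of what this looks like in the .td
file (the register names, the "MyTarget" namespace and the exact
RegisterClass signature here are hypothetical and depend on your LLVM
version):

  // Only f64 gets a register class. With no class registered for
  // MVT::f32, the legalizer implicitly promotes all f32 values to f64.
  def FR64 : RegisterClass<"MyTarget", [f64], 64, [D0, D1, D2, D3]>;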

But FP is supported only in emulated mode, because the target does
not have any hardware support for FP. Therefore each FP operation is
supposed to be converted into a call to an assembler function
implementing the corresponding operation.

Ok.

All these FP operations
implemented in assembler always expect parameters in concrete f64
registers, i.e. %d0, %d1, and return their results in reg %d0. The
value of %d1 is clobbered by such calls. (Actually the %dX are pseudo
regs, see below.)

Ok.

1. Since these FP emulation functions take operands in registers and
produce results in registers without any further side effects, they
look pretty much like real instructions. Thus I have the idea to
represent them in the tblgen instruction descriptions as
pseudo-instructions, where constraints define which concrete physical
%dX registers are to be used. This would enforce correct register
allocation.

For example:
def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
          Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1

This seems to work, at least on simple test files.

That should be a robust solution.

But I would also need a way to convert such an FSUB64
pseudo-instruction into the assembler function call, e.g.
"call __fsub64". At the moment I don't quite understand at which stage
and how I should do it (lowering, selection, combining?). What would be
the easiest way to map it to such a call instruction?

Why not just make the asm string be "call __fsub64"?
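That is, keeping the FSUB64 definition from above and changing only
the asm string, roughly (a sketch; the opcode and register names are
taken from the earlier example):

  // Same register constraints as before, but the printed instruction
  // is the library call itself.
  def FSUB64 : I<0x11, (ops), "call __fsub64", [(set d0, (fsub d0, d1))]>,
               Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1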

One issue with the described approach is pretty inefficient code
resulting after register allocation. For example, there are a lot
of instructions of the form "mov %d0, %d0", copying a register into
itself. My guess is that the following happens:

Make sure to implement TargetInstrInfo::isMoveInstr. This will allow the coalescer to eliminate these.

before reg.alloc there are instructions of the form:
mov %virtual_reg0, %d0
mov %virtual_reg1, %d1
fsub64
which ensure that the operand constraints of the operation are
fulfilled and the operands are in the right registers. During
allocation, the register allocator assigns the same physical register
to the virtual register.
Therefore the code becomes:
mov %d0, %d0
mov %d1, %d1
fsub64

But then there is no call to "useless copies elimination" pass or
peephole pass that would basically remove such copies.

Yep.

Question: Is there such a pass available in LLVM? Actually, it is also
interesting to know why the regalloc does not eliminate such coalesced
moves itself. Wouldn't that make sense?

The coalescer does, please implement isMoveInstr.

Does this idea of representing the emulated FP operation calls as
instructions as described above make some sense? Or do you see easier
or more useful ways to do it?

That is a reasonable way to do it. Another reasonable way would be to lower them in the instruction selector itself through the use of custom expanders. In practice, using instructions with "call foo" in them instead of lowering to calls may be simpler. Also, if you *know* that these calls don't clobber the normal set of callee clobbered registers, using the asm string is the right way to go.

2. In reality, the processor has only 32bit regs. Therefore, any f64
value should be mapped to two 32bit registers. What is the best way to
achieve it? I guess this is a well-known kind of problem.

Ah, this is trickier. :) We have a robust solution on the integer side, but don't allow the FP side to use it.

For the time being, I'd suggest defining an "fp register set" which just aliases the integer register set (i.e. say that d0 overlaps r0+r1).
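Roughly like this (a sketch with hypothetical register names, assuming
integer registers R0..R7 are already defined and that your LLVM
version supports the Aliases field on Register):

  // Each f64 pseudo register overlaps a fixed pair of 32-bit integer
  // registers, so the allocator knows they conflict.
  def D0 : Register<"d0"> { let Aliases = [R0, R1]; }
  def D1 : Register<"d1"> { let Aliases = [R2, R3]; }

  def FR64 : RegisterClass<"MyTarget", [f64], 64, [D0, D1]>;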

So far I was thinking about introducing some pseudo f64 registers,
i.e. the %dX used above, and working with them in the instruction
descriptions. And then at later stages, probably after lowering and
selection, expand them into pairs of load or store operations.

If you tell the register allocator about the "aliases", it should do the right thing for you. Take a look at how aliasing in the X86 register set is handled in X86RegisterInfo.td.

-Chris

But your previous explanations were so good that I implemented almost
the same thing in my backend last week that you've now done in
X86InstrSSE.td. I even introduced an isCommutable parameter to indicate
this property, just as you did. So by now, integer arithmetic and
general-purpose instructions are implemented. I'm working on the FP
support now.

Great :)

Some feedback about tblgen from my side, i.e. an LLVM newcomer with
quite some compiler construction experience:

When it comes to tblgen descriptions and the corresponding DAG
selection and lowering code to be written for a backend, I found the
use of InFlag, OutFlag and chains hard to understand, very
underspecified and not well documented. Even though they are used in
all backends, their semantics and correct use are far from obvious
(even though I'm not new to compiler writing).

Right, it is unfortunate that the code generator isn't better documented :(. Patches graciously accepted :)

I spent most of my time on getting these things right. And I learned that if they misbehave, it has very fatal consequences for the overall code selection process. Now it works, but I still don't quite understand their overall semantics and don't feel very confident about them. And I guess I'm not the only one. Therefore, I would kindly ask for a clear explanation of these concepts, here on the developers list and maybe even in the docs, giving guidelines on their usage and preferably a small but understandable example making use of them. I think such a description would make creation of new backends much easier and faster.

Basically, flag operands are a hack used to handle resources that are not accurately modeled in the scheduler (e.g. condition codes, explicit register assignments, etc). The basic idea of flag operands is that they require the scheduler to keep the "flagged" nodes stuck together in the output machine instructions.

If you have a specific question, I'm more than happy to answer it,

-Chris

Hi,

My target supports only f64 at the moment.
Question: How can I tell LLVM that float is the same as double on my
target? Maybe by assigning the same register class to both MVT::f32
and MVT::f64?

Just don't assign a register class for the f32 type. This is what the
X86 backend does when it is in "floating point stack mode". This will
implicitly promote everything to f64.

OK. This is even easier than I expected :)

> 1. Since these FP emulation functions take operands in registers and
> produce results in registers without any further side effects, they
> look pretty much like real instructions. Thus I have the idea to
> represent them in the tblgen instruction descriptions as
> pseudo-instructions, where constraints define which concrete physical
> %dX registers are to be used. This would enforce correct register
> allocation.
>
> For example:
> def FSUB64: I<0x11, (ops), "fsub64", [(set d0, (fsub d0, d1))]>,
>            Imp<[d0,d1],[d0,d1]>; // Uses d0, d1 and defines d0,d1
>
> This seems to work, at least on simple test files.

That should be a robust solution.

> But I would also need a way to convert such an FSUB64
> pseudo-instruction into the assembler function call, e.g.
> "call __fsub64". At the moment I don't quite understand at which
> stage and how I should do it (lowering, selection, combining?). What
> would be the easiest way to map it to such a call instruction?

Why not just make the asm string be "call __fsub64"?

Well, of course it would be the best solution. But the interesting part
is that I need to generate the machine code directly, because for
different reasons the use of a system assembler is not an option. As a
result, I need to do this conversion in the target backend and later
generate object code directly. But when and how should this conversion
"fsub64 insn -> call __fsub64 insn" be done? What is your advice?

> One issue with the described approach is pretty inefficient code
> resulting after register allocation. For example, there are a lot
> of instructions of the form "mov %d0, %d0", copying a register
> into itself. My guess is that the following happens:

Make sure to implement TargetInstrInfo::isMoveInstr. This will allow
the coalescer to eliminate these.

Very good point. Now I see how to improve this part.

> Does this idea of representing the emulated FP operation calls as
> instructions as described above make some sense? Or do you see
> easier or more useful ways to do it?

That is a reasonable way to do it. Another reasonable way would be
to lower them in the instruction selector itself through the use of
custom expanders. In practice, using instructions with "call foo"
in them instead of lowering to calls may be simpler.

Hmm, let me see. Just to check that I understand your proposal
correctly: you mean I don't need to define any FP operations as machine
instructions at all. Instead, I basically say that I will expand all
FP operations myself, and then I simply expand them into the following
sequence of instructions:
  mov arg1, %d0 // enforce register constraint
  mov arg2, %d1 // enforce register constraint
  call __fsub64

Is that a correct understanding? If yes, how do I explain that
arguments are to be passed in the concrete physical registers %d0 and
%d1 and the result comes back in %d0? Do I need to allocate virtual
regs for them and pre-assign physical regs somehow?

Or maybe I have to define a new calling convention that would enforce
it?
Actually, how can this be done with LLVM? I mean, if I want to
introduce a new calling convention, what do I need to do in the backend
to define and register it? Is it required to change the frontend to
make it visible at the source program level?

Also, if you *know*
that these calls don't clobber the normal set of callee clobbered
registers, using the asm string is the right way to go.

> 2. In reality, the processor has only 32bit regs. Therefore, any
> f64 value should be mapped to two 32bit registers. What is the best
> way to achieve it? I guess this is a well-known kind of problem.

Ah, this is trickier. :) We have a robust solution on the integer
side, but don't allow the FP side to use it.

For the time being, I'd suggest defining an "fp register set" which
just aliases the integer register set (i.e. say that d0 overlaps
r0+r1).

OK. I almost did it this way already. But I introduced two FP register
sets, one for fp32 (for the future) and one for fp64. fp32 aliases the
integer register set; fp64 aliases the fp32 register set, but not the
integer register set explicitly. I thought that aliases are transitive?
Or do I have to mention all aliases explicitly, e.g. for %d0 do I need
to say [%s0,%s1,%GR0,%GR1]?

But a more interesting question is this:
The scheme above assumes that there is a "hardwired" mapping between FP
registers and concrete pairs of integer registers. In many cases this
is enough, since the emulated operations indeed expect parameters in
predefined pairs of 32bit integer registers. But when it comes to other
uses of FP registers (mainly for storing some values) there is no such
limitation that a concrete pair of integer registers must be used.
Actually, any combination of two 32bit integer registers would do. How
can this be modelled and represented to the regalloc, if at all? One
guess is to define one FP reg for each possible combination of two
integer registers, which would lead to the definition of N*(N-1) FP
registers, where N is the number of integer registers (and I have only
8 integer regs). But that seems not very elegant for my taste, or?

> So far I was thinking about introducing some pseudo f64 registers,
> i.e. %dX used above, and working with them in the instruction
> descriptions. And then at later stages, probably after lowering and
> selection, expand them into pairs of load or store operations.

If you tell the register allocator about the "aliases", it should do
the right thing for you. Take a look at how aliasing in the X86
register set is handled in X86RegisterInfo.td.

Can you elaborate a bit? Does it mean that I don't need to define fp64
loads from memory or fp64 stores to memory and reg<-reg transfers for
64bit ops, because all that will be done automatically using pairs of
32bit instructions? So far, I had the impression I need to use fp64
regs in the instruction descriptions explicitly. But in that case the
selected instructions operate on these 64bit regs and there is the
problem of how to expand them into pairs of 32bit instructions.

-Roman

Basically, flag operands are a hack used to handle resources that are
not accurately modeled in the scheduler (e.g. condition codes, explicit
register assignments, etc). The basic idea of flag operands is that
they require the scheduler to keep the "flagged" nodes stuck together
in the output machine instructions.

From a user's point of view, flags have two different uses:

1) Forcing a value to be in a particular register.

2) As a hack when machine state is not made explicit. For example,
in the ARM backend I haven't declared the "FP status" and the "status"
registers. So the FMSTAT instruction needs a flag.

It should be possible to remove all uses of 2 by writing a more
complete description. Maybe the uses of 1 could be abstracted with a
higher level interface...

If you have a specific question, I'm more than happy to answer it,

-Chris

Rafael

> Basically, flag operands are a hack used to handle resources that
> are not accurately modeled in the scheduler (e.g. condition codes,
> explicit register assignments, etc). The basic idea of the flag
> operand is that they require the scheduler to keep the "flagged"
> nodes stuck together in the output machine instructions.

From a user's point of view, flags have two different uses:

1) Forcing a value to be in a particular register.

  How does it do that? I cannot remember that it does so explicitly. Or
do you mean that it forces the InFlag argument of one insn to be in the
same register as the OutFlag of another instruction?

2) As a hack when machine state is not made explicit. For example,
in the ARM backend I haven't declared the "FP status" and the "status"
registers. So the FMSTAT instruction needs a flag.

  OK. Their usage as a means to transfer a status directly is clearer
now. The insn with an incoming flag basically reuses the outflag of
another insn. If I understand correctly, any instruction that can
change status (usually a status register, or are there any other
examples?) should/can produce an OutFlag. And any instruction that
needs a flag should be declared with an InFlag. What about instructions
that use status, but do not change it (like conditional jumps) - should
they also have an OutFlag to indicate that they propagate a status?

  And what about instructions like jumps or calls, where chains are
used? What is the exact meaning of the chain? What does it refer to? Is
it a chain of operands? Is it a chain of instructions? How is it
propagated? What is the guideline for using it? If a new backend is to
be developed, what are the criteria to decide that a chain should be
used for a given instruction?

For example, if we have something like:
cmp %a, %b
jmp lt, label1
jmp gt, label2
call f1

What should be propagated as In/Out flags in this example, and what
should be propagated as chains, and why?

It should be possible to remove all uses of 2 by writing a more
complete description. Maybe the uses of 1 could be abstracted with a
higher level interface...

-Roman

> That is a reasonable way to do it. Another reasonable way would be
> to lower them in the instruction selector itself through the use of
> custom expanders. In practice, using instructions with "call foo"
> in them instead of lowering to calls may be simpler.

Hmm, let me see. Just to check that I understand your proposal
correctly: you mean I don't need to define any FP operations as machine
instructions at all. Instead, I basically say that I will expand all
FP operations myself, and then I simply expand them into the following
sequence of instructions:
  mov arg1, %d0 // enforce register constraint
  mov arg2, %d1 // enforce register constraint
  call __fsub64

Is that a correct understanding? If yes, how do I explain that
arguments are to be passed in the concrete physical registers %d0 and
%d1 and the result comes back in %d0? Do I need to allocate virtual
regs for them and pre-assign physical regs somehow?

Or maybe I have to define a new calling convention that would enforce
it?

The Alpha backend does this for division and remainder of integers.
See AlphaISelLowering.cpp:501 for the lowering to a custom call node,
then AlphaISelDAGToDAG.cpp:215 for the enforcing of the register
constraints (copy into/out of physical registers), then
AlphaInstrInfo.td:476 (JSRs) for the call instruction with special
register DEF/USE sets to match the calling convention of the library
function.

Hope that helps.

Andrew

> 1) Forcing a value to be in a particular register.

  How does it do that? I cannot remember that it does so explicitly. Or
do you mean that it forces the InFlag argument of one insn to be in the
same register as the OutFlag of another instruction?

Use a flag to link a CopyToReg and the node that expects the value in
a fixed register.

> 2) As a hack when machine state is not made explicit. For example,
> in the ARM backend I haven't declared the "FP status" and the
> "status" registers. So the FMSTAT instruction needs a flag.

  OK. Their usage as a means to transfer a status directly is clearer
now. The insn with an incoming flag basically reuses the outflag of
another insn. If I understand correctly, any instruction that can
change status (usually a status register, or are there any other
examples?)

Any status that is not explicit. On ARM we also have a floating point
status register.

should/can produce an OutFlag. And any instruction that needs a flag
should be declared with an InFlag. What about instructions that use
status, but do not change it (like conditional jumps) - should they
also have an OutFlag to indicate that they propagate a status?

It might be useful to avoid duplicating a cmp.

  And what about instructions like jumps or calls, where chains are
used? What is the exact meaning of the chain? What does it refer to? Is
it a chain of operands? Is it a chain of instructions? How is it
propagated? What is the guideline for using it? If a new backend is to
be developed, what are the criteria to decide that a chain should be
used for a given instruction?

A chain indicates control dependency
(http://llvm.org/docs/CodeGenerator.html). I.e. it orders side
effects.

For example, if we have something like:
cmp %a, %b
jmp lt, label1
jmp gt, label2
call f1

What should be propagated as In/Out flags in this example, and what
should be propagated as chains, and why?

I think that there must be a flag from "cmp" to "jmp lt" and a flag
from "jmp lt" to "jmp gt". A chain should connect "jmp gt" to "call".

-Roman

Rafael

such a call instruction?

Why not just make the asm string be "call __fsub64"?

Well, of course it would be the best solution. But the interesting part
is that I need to generate the machine code directly, because for
different reasons the use of a system assembler is not an option.

ok.

As a result, I need to do this conversion in the target backend and
later generate object code directly. But when and how should this
conversion "fsub64 insn -> call __fsub64 insn" be done? What is your
advice?

I don't understand. If you are writing out the .o file directly, you already know how to encode calls... can't you just encode it as the right sort of call? What facilities are you using to emit the machine code, are you using the llvm machine code emitter generator stuff (like PPC)?

Does this idea of representing the emulated FP operation calls as
instructions as described above make some sense? Or do you see
easier or more useful ways to do it?

That is a reasonable way to do it. Another reasonable way would be
to lower them in the instruction selector itself through the use of
custom expanders. In practice, using instructions with "call foo"
in them instead of lowering to calls may be simpler.

Hmm, let me see. Just to check that I understand your proposal
correctly: you mean I don't need to define any FP operations as machine
instructions at all. Instead, I basically say that I will expand all
FP operations myself, and then I simply expand them into the following
sequence of instructions:
mov arg1, %d0 // enforce register constraint
mov arg2, %d1 // enforce register constraint
call __fsub64

Is that a correct understanding?

Yes, if you tell the legalizer you want to custom expand everything, you can do this. In practice, there may be ones the legalizer doesn't know how to custom expand yet, but that is an easy addition.

If yes, how do I explain that arguments are to be passed in the concrete physical registers %d0 and %d1 and the result comes back in %d0? Do I need to allocate virtual regs for them and pre-assign physical regs somehow?

As others have pointed out, you flag copy{to/from}reg nodes to the call.

Or maybe I have to define a new calling convention that would enforce
it?
Actually, how can this be done with LLVM? I mean, if I want to
introduce a new calling convention, what do I need to do in the backend
to define and register it? Is it required to change the frontend to
make it visible at the source program level?

You should be able to handle this in the lowering stuff, you don't need anything complex here.

For the time being, I'd suggest defining an "fp register set" which
just aliases the integer register set (i.e. say that d0 overlaps
r0+r1).

OK. I almost did it this way already. But I introduced two FP register
sets, one for fp32 (for the future) and one for fp64. fp32 aliases the
integer register set; fp64 aliases the fp32 register set, but not the
integer register set explicitly. I thought that aliases are transitive?
Or do I have to mention all aliases explicitly, e.g. for %d0 do I need
to say [%s0,%s1,%GR0,%GR1]?

Depending on how you defined the aliases, they aren't necessarily transitive. I'd look at the <yourtarget>GenRegisterInfo.inc file, and see what it lists as the aliases for each reg.
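In other words (a sketch with hypothetical register names), each
register definition must list every overlapping register itself:

  // Aliases are not expanded transitively: even though D0 aliases
  // S0/S1 and S0/S1 alias GR0/GR1, D0 must still name GR0 and GR1
  // explicitly.
  def S0 : Register<"s0"> { let Aliases = [GR0]; }
  def S1 : Register<"s1"> { let Aliases = [GR1]; }
  def D0 : Register<"d0"> { let Aliases = [S0, S1, GR0, GR1]; }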

But a more interesting question is this: the scheme above assumes that there is a "hardwired" mapping between FP registers and concrete pairs of integer registers. In many cases this is enough, since the emulated operations indeed expect parameters in predefined pairs of 32bit integer registers. But when it comes to other uses of FP registers (mainly for storing some values) there is no such limitation that a concrete pair of integer registers must be used. Actually, any combination of two 32bit integer registers would do. How can this be modelled and represented to the regalloc, if at all? One guess is to define one FP reg for each possible combination of two integer registers, which would lead to the definition of N*(N-1) FP registers, where N is the number of integer registers (and I have only 8 integer regs). But that seems not very elegant for my taste, or?

The right way would be to expose the fact that these really are integer registers, and just use integer registers for it. This would be no problem except that the legalizer doesn't know how to convert f64 -> 2 x i32 registers. This could be added, but a simpler approach to get you running faster is to add the bogus register set.

So far I was thinking about introducing some pseudo f64 registers,
i.e. %dX used above, and working with them in the instruction
descriptions. And then at later stages, probably after lowering and
selection, expand them into pairs of load or store operations.

If you tell the register allocator about the "aliases", it should do
the right thing for you. Take a look at how aliasing in the X86
register set is handled in X86RegisterInfo.td.

Can you elaborate a bit? Does it mean that I don't need to define fp64
loads from memory or fp64 stores to memory and reg<-reg transfers for
64bit ops, because all that will be done automatically using pairs of
32bit instructions? So far, I had the impression I need to use fp64
regs in the instruction descriptions explicitly. But in that case the
selected instructions operate on these 64bit regs and there is the
problem of how to expand them into pairs of 32bit instructions.

Oh, I'm sorry, I misunderstood. Yes, you're right, you'll have to define "f64" loads and stores and copies.

-Chris

Hi Andrew,

>>> such a call instruction?
>>
>> Why not just make the asm string be "call __fsub64"?
>
> Well, of course it would be the best solution. But the interesting
> part is that I need to generate the machine code directly because for
> different reasons use of a system assembler is not an option.

ok.

> As a result, I need to do this conversion in the target backend and
> later generate object code directly. But when and how should this
> conversion "fsub64 insn -> call __fsub64 insn" be done? What is your
> advice?

I don't understand. If you are writing out the .o file directly, you
already know how to encode calls... can't you just encode it as the
right sort of call?

Yes, sure. I simply overlooked it, because it is too simple and obvious
;) I was thinking about doing it at a higher level, but this can be
done as well.

But I think that Andrew's approach as used in the Alpha backend will do
the job and it is rather easy to implement.

What facilities are you using to emit the machine
code, are you using the llvm machine code emitter generator stuff
(like PPC)?

At the moment, I do not emit real machine code. But I'm planning to do
it. If possible, I'll try to use the code emitter stuff of tblgen. But
I'm not sure if it can handle the instruction encodings of my target.
This target uses a variable-length instruction encoding, where 2 bytes
are used for opcodes, and encodings of memory references and some
registers are put between these two bytes. Therefore, the bit offsets
are not constant and depend on the type of instruction (e.g. rm, ri and
so on). Do you think it would be easy to express such an encoding for
tblgen?

>>> Does this idea of representing the emulated FP operation calls as
>>> instructions as described above make some sense? Or do you see
>>> easier or more useful ways to do it?
>>
>> That is a reasonable way to do it. Another reasonable way would be
>> to lower them in the instruction selector itself through the use of
>> custom expanders. In practice, using instructions with "call foo" in
>> them instead of lowering to calls may be simpler.
>
> Hmm, let me see. Just to check that I understand your proposal
> correctly:
> You mean I don't need to define any FP operations as machine
> instructions at all. Instead, I basically tell that I will expand all
> FP operations myself and then I simply expand them into the following
> sequence of instructions:
> mov arg1, %d0 // enforce register constraint
> mov arg2, %d1 // enforce register constraint
> call __fsub64
>
> Is it correct understanding?

Yes, if you tell the legalizer you want to custom expand everything, you
can do this. In practice, there may be ones the legalizer doesn't know
how to custom expand yet, but that is an easy addition.

> If yes, how do I explain that arguments are to be passed on the concrete
> physical registers like %d0 and %d1 and result comes on %d0? Do I need
> to allocate virtual regs for them and pre-assign physical regs somehow?

As others have pointed out, you flag copy{to/from}reg nodes to the
call.

OK. Andrew has explained how to do it. I'll give it a try.

> Or maybe I have to define a new calling convention that would enforce
> it?
> Actually, how can this be done with LLVM? I mean, if I want to
> introduce a new calling convention, what do I need to do in the backend
> to define and register it? Is it required to change the frontend to
> make it visible at the source program level?

You should be able to handle this in the lowering stuff, you don't
need anything complex here.

OK, I see. But I'd like to know how to introduce a new calling
convention so that it is visible at the source level. Basically, there
are some drivers existing for this system and I'd like to be able to
call some functions defined there. But these drivers are using very
custom calling convention. I thought that declaring functions as
follows could be the most appropriate solution:

extern __MySpecialDriverAttribute int read_from_device(int devid, int
channel);

But for doing this I need to define a custom attribute or a new calling
convention. Or do you see any other opportunity?

>> For the time being, I'd suggest defining an "fp register set" which
>> just aliases the integer register set (i.e. say that d0 overlaps
>> r0+r1).
>
> OK. I almost did this way already. But I introduced two FP register
> sets. One for fp32 (for the future) and one for fp64. fp32 aliases the
> integer register set. fp64 aliases the fp32 register set, but not the
> integer register set explicitly. I thought that aliases are transitive?
> Or do I have to mention all aliases explicitly, e.g. for %d0 I need to
> say [%s0,%s1,%GR0,%GR1]?

Depending on how you defined the aliases, they aren't necessarily
transitive. I'd look at the <yourtarget>GenRegisterInfo.inc file, and see
what it lists as the aliases for each reg.

Done. And I looked into the tblgen code. Transitivity is not ensured by
tblgen in any form, since it does not compute it. What it ensures is
the commutativity of aliases, i.e. if A aliases B, then B aliases A. I
think it would make sense if tblgen would compute a transitive closure
automatically for alias sets, because I can hardly imagine
non-transitive aliasing semantics. If you think that this makes sense,
I could probably write a patch for tblgen to do that.
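
As a side note, a blanket transitive closure would actually be unsound for some targets: as pointed out later in this thread for x86, AL aliases AX and AH aliases AX, but AL does not alias AH. A small standalone C++ sketch (not tblgen code; the symmetric-pair bookkeeping mirrors what tblgen guarantees) makes the distinction concrete:

```cpp
// Register "aliasing" here means bit-level overlap. The relation is
// symmetric but NOT transitive: AL and AH both overlap AX, yet AL and
// AH occupy disjoint bits of AX and do not overlap each other.
#include <cassert>
#include <map>
#include <set>
#include <string>

using AliasMap = std::map<std::string, std::set<std::string>>;

// Record a symmetric alias pair, as tblgen ensures.
void addAlias(AliasMap &M, const std::string &A, const std::string &B) {
  M[A].insert(B);
  M[B].insert(A);
}

bool aliases(const AliasMap &M, const std::string &A, const std::string &B) {
  auto It = M.find(A);
  return It != M.end() && It->second.count(B) != 0;
}
```

Running a transitive closure over this map would incorrectly conclude that AL aliases AH, so any such patch would have to be opt-in per target.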

> But a more interesting question is this: The scheme above assumes that
> there is a "hardwired" mapping between FP registers and concrete pairs
> of integer registers. In many cases this is enough, since the emulated
> operations indeed expect parameters on predefined pairs of 32bit integer
> registers. But when it comes to other uses of FP registers (mainly for
> storing some values) there is no such limitation that a concrete pair of
> integer registers should be used. Actually, any combination of two 32bit
> integer registers would do. How can this be modelled and represented to
> regalloc, if at all? One guess is to define one FP reg for each possible
> combination of two integer registers, which would lead to a definition of
> N*(N-1) FP registers, where N is the number of integer registers (and I
> have only 8 integer regs). But it seems to be not very elegant for my
> taste, or?

The right way would be to expose the fact that these really are
integer registers, and just use integer registers for it.

How and where can this fact be exposed? In register set descriptions?
Or maybe by telling it to use the i32 register class when we assign the
register class to f64 values?

This
would be no problem except that the legalizer doesn't know how to
convert f64 -> 2 x i32 registers. This could be added,

Can you elaborate a bit more about how this can be added? Do you mean
that legalizer would always create two new virtual i32 registers for
each such f64 value, copy the parts of f64 into it and let the register
allocator later allocate some physical registers for it?
Would it require adaptations only in the target-specific legalizer or
do you think that some changes in the common part (in Target directory)
of the legalizer are required?

but a
simpler approach to get you running faster is to add the bogus
register set.

True. To get something that works as soon as possible, this is simpler.
But to produce faster code, the more complex approach described above
could be a (big?) win.

- Roman

Hi,

Is it possible to dynamically define implicit defs for some
instructions?
Concretely, I'd like to define a register for a return value of a call
in a dynamic way, instead of using current static approach looking
like:
let Defs = [R0] in
def CALLimm : I<...>;

The reason for this wish is that some of the calling conventions on my
target use different sets of physical registers for their return
values. Therefore I cannot describe it by one static set of regs, as
shown above.

By looking into LLVM code, I've found how to force some arguments of
calls to be placed on required registers and how to copy the result of
the call from certain (return) registers. But if I try to copy from a
return register that is not in the static Defs set of the CALL machine
instruction, then register allocator complains that the interval for a
return value of a CALL does not exist. Which means that the register
allocator does not understand that this register is implicitly defined
by the CALL insn.

One obvious solution is to define several machine instructions for a
CALL, each defining its own set of implicitly defined registers. But it
is not very elegant in my opinion. Are there any other ways to achieve
the same result? Maybe it can be solved more simply?

Thanks,
-Roman

Hi,

Is it possible to dynamically define implicit defs for some
instructions?
And if possible, already in the target lowering pass, e.g. in
LowerCALL()?

Concretely, I'd like to define a register for a return value of a call
in a dynamic way, instead of using current static approach looking
like:
let Defs = [R0] in
def CALLimm : I<...>;

The reason for this wish is that some of the calling conventions on my
target use different sets of physical registers for their return
values. Therefore I cannot describe it by one static set of regs, as
shown above.

By looking into the LLVM code, in particular inside LowerCALL() and the
like, I've found how to force some arguments of calls to be placed on
required registers and how to copy the result of the call from certain
(return) registers. But if I try to copy from a return register that is
not in the static Defs set of the CALL machine instruction, then
register allocator complains that the interval for a return value of a
CALL does not exist. Which means that the register allocator does not
understand that this register is implicitly defined by the CALL insn.

One obvious solution is to define several machine instructions for a
CALL, each defining its own set of implicitly defined registers. But it
is not very elegant in my opinion. Are there any other ways to achieve
the same result? Maybe it can be solved more simply?

Thanks,
-Roman

Is it possible to dynamically define implicit defs for some
instructions?

Yes! This is what explicit operands are :). Specifically, if you want to vary on a per-opcode basis what registers are used/def'd by the instruction, you can just add those registers as explicit use/def operands in the machine instruction with the physical registers directly filled in.

The reason for this wish is that some of the calling conventions on my
target use different sets of physical registers for their return
values. Therefore I cannot describe it by one static set of regs, as
shown above.

Right.

One obvious solution is to define several machine instructions for a
CALL, each defining its own set of implicitly defined registers. But it
is not very elegant in my opinion. Are there any other ways to achieve
the same result? Maybe it can be solved more simply?

This is another solution which works great if there are only a few variants.

-Chris

Hi Chris,

Thanks for your response.

> Is it possible to dynamically define implicit defs for some
> instructions?

Yes! This is what explicit operands are :). Specifically, if you
want to
vary on a per-opcode basis what registers are used/def'd by the
instruction, you can just add those registers as explicit use/def
operands
in the machine instruction with the physical registers directly
filled in.

OK. I have not explained in my first email about DEFs everything I
wanted; the second one was a bit more specific ;-) Of course, I realize
that I can add explicit DEFs to the machine instruction. But is it
possible to do it at the higher level, e.g. in the ISelLowering passes,
where no machine insns are generated yet. Basically, it is currently
possible at this level to do the mapping of virtual registers for
parameters to the physical registers, and this can be done without any
problems. It is usually done in LowerCALL or something like that. I'd
like to do the same for return registers at the same place, to keep all
these related things together and because of certain symmetry of goals.
But if I would do it as you propose, I'll need to put it somewhere else,
where we can operate at the MI level, right? So the question is
actually about possibility of doing it in LowerCALL, where MIs are not
generated yet.
  
P.S. Chris, have you seen this mail from me on the mailing list?
http://lists.cs.uiuc.edu/pipermail/llvmdev/2006-October/006937.html
There was no reply, so I'm not sure if it is because there are no
comments or maybe because it was overlooked.

Is it possible to dynamically define implicit defs for some
instructions?

Yes! This is what explicit operands are :). Specifically, if you
want to
vary on a per-opcode basis what registers are used/def'd by the
instruction, you can just add those registers as explicit use/def
operands
in the machine instruction with the physical registers directly
filled in.

OK. I have not explained in my first email about DEFs everything I
wanted; the second one was a bit more specific ;-) Of course, I realize
that I can add explicit DEFs to the machine instruction. But is it
possible to do it at the higher level, e.g. in the ISelLowering passes,
where no machine insns are generated yet. Basically, it is currently
possible at this level to do the mapping of virtual registers for
parameters to the physical registers, and this can be done without any
problems. It is usually done in LowerCALL or something like that. I'd
like to do the same for return registers at the same place, to keep all
these related things together and because of certain symmetry of goals.
But if I would do it as you propose, I'll need to put it somewhere else,
where we can operate at the MI level, right? So the question is
actually about possibility of doing it in LowerCALL, where MIs are not
generated yet.

This is certainly possible, but requires some C++ code. I assume you have something like the PPC backend right now. If so, you have this in your lowering code:

   // Add argument registers to the end of the list so that they are known live
   // into the call.
   for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i)
     Ops.push_back(DAG.getRegister(RegsToPass[i].first,
                                   RegsToPass[i].second.getValueType()));
   if (InFlag.Val)
     Ops.push_back(InFlag);
   Chain = DAG.getNode(CallOpc, NodeTys, &Ops[0], Ops.size());

This creates a call node with a list of input registers, these are marked as uses. In the PPC backend, this is matched with this pattern:

...
   def BLA : IForm<18, 1, 1, (ops aaddr:$func, variable_ops),
                             "bla $func", BrB, [(PPCcall (i32 imm:$func))]>;
...

The "variable_ops" part of the operation list causes all those registers to be added to the node when the isel turns the isel graph into the sched graph. Unfortunately, this will not work with extra definitions (call clobbered regs in your case).

Perhaps the easiest way to handle this whole thing is to add an integer operand to the DAG call node that indicates the calling convention used. You can then use a 'usesCustomDAGSchedInserter' inserter to create the machine instr any way you like based on this information. See the PPC backend and how it inserts SELECT_CC_I4, for example. In your case, you would end up with one machine instr (instead of the many SELECT_CC_I4 produces).

P.S. Chris, have you seen this mail from me on the mailing list?
http://lists.cs.uiuc.edu/pipermail/llvmdev/2006-October/006937.html
There was no reply, so I'm not sure if it is because there are no
comments or maybe because it was overlooked.

I missed it, I'll respond now.

-Chris

I don't understand. If you are writing out the .o file directly, you
already know how to encode calls... can't you just encode it as the
right sort of call?

Yes, sure. I simply overlooked it, because it is too simple and obvious
;-) I was thinking about doing it at a higher level, but this can be
done as well.
But I think that Andrew's approach as used in the Alpha backend will do
the job and it is rather easy to implement.

ok

What facilities are you using to emit the machine
code, are you using the llvm machine code emitter generator stuff
(like PPC)?

At the moment, I do not emit real machine code. But I'm planning to do
it. If possible, I'll try to use the code emitter stuff of tblgen. But

ok.

I'm not sure if it can handle the instruction encodings of my target.
This target uses variable length instruction encoding, where 2 bytes
are used for opcodes and encodings of memory references and some
registers are put between these two bytes. Therefore, the bit offsets
are not constant and depend on the type of instruction (e.g. rm, ri and
so on). Do you think it would be easy to express such encoding for
tblgen?

No. Unfortunately, tblgen can only handle targets with 32-bit wide instruction words at the moment (alpha, ppc, sparc, etc). You'll have to write a custom code emitter like the X86 backend has.
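
To illustrate what such a custom emitter has to cope with, here is a small standalone C++ sketch of the encoding scheme described above: a 2-byte opcode whose operand bytes land *between* the two opcode bytes, so field offsets depend on the instruction form. All opcodes, forms, and field widths here are invented for the example; they are not the real target's encoding.

```cpp
// Hypothetical split-opcode encoder: first opcode byte, then the
// form-dependent operand bytes, then the second opcode byte.
#include <cassert>
#include <cstdint>
#include <vector>

// "rm" form: a register field plus a 16-bit memory offset sit between
// the opcode halves.
std::vector<uint8_t> encodeRM(uint8_t opHi, uint8_t opLo,
                              uint8_t reg, uint16_t memOffset) {
  std::vector<uint8_t> Bytes;
  Bytes.push_back(opHi);                     // first opcode byte
  Bytes.push_back(reg);                      // register field
  Bytes.push_back(memOffset & 0xFF);         // memory offset, low byte
  Bytes.push_back((memOffset >> 8) & 0xFF);  // memory offset, high byte
  Bytes.push_back(opLo);                     // second opcode byte
  return Bytes;
}

// "ri" form: only a register field and an 8-bit immediate in between,
// so the second opcode byte lands at a different offset.
std::vector<uint8_t> encodeRI(uint8_t opHi, uint8_t opLo,
                              uint8_t reg, uint8_t imm) {
  return {opHi, reg, imm, opLo};
}
```

This per-form branching is exactly what tblgen's fixed-width 32-bit instruction model cannot express, which is why an X86-style hand-written emitter is the suggested route.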

You should be able to handle this in the lowering stuff, you don't
need anything complex here.

OK, I see. But I'd like to know how to introduce a new calling
convention so that it is visible at the source level. Basically, there
are some drivers existing for this system and I'd like to be able to
call some functions defined there. But these drivers are using very
custom calling convention. I thought that declaring functions like
follows could be the most appropriate solution:

extern __MySpecialDriverAttribute int read_from_device(int devid, int
channel);

But for doing this I need to define a custom attribute or a new calling
convention. Or do you see any other opportunity?

I would suggest following the model of stdcall/fastcall in the windows x86 world. Specifically, this would require modifying llvm-gcc to recognize attributes on your target, then passing that on to llvm through its calling convention representation. This will let you define stuff like:

void foo() __attribute__((mycall)) {
}

You can then use a #define to hide the attribute syntax.
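
A sketch of the #define-hiding idea follows. Since `mycall` would only exist after the llvm-gcc modification described above, this sketch substitutes a standard GCC/Clang attribute (`noinline`) purely to show the macro pattern; the driver function body is a stub invented for the example.

```cpp
#include <cassert>

// Hide the attribute syntax behind a macro, so driver headers can
// simply write "MYCALL" in front of each declaration.
#define MYCALL __attribute__((noinline))

// Stub standing in for a real driver entry point.
MYCALL int read_from_device(int devid, int channel) {
  return devid * 100 + channel;
}
```

Once llvm-gcc knows the real attribute, only the macro definition changes; every annotated declaration stays the same.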

Depending on how you defined the aliases, they aren't necessarily
transitive. I'd look at the <yourtarget>GenRegisterInfo.inc file, and see
what it lists as the aliases for each reg.

Done. And I looked into the tblgen code. Transitivity is not ensured by
tblgen in any form, since it does not compute it. What it ensures is
the commutativity of aliases, i.e. if A aliases B, then B aliases A. I
think it would make sense if tblgen would compute a transitive closure
automatically for alias sets, because I can hardly imagine
non-transitive aliasing semantics. If you think that this makes sense,
I could probably write a patch for tblgen to do that.

Be careful. On X86, "AL aliases AX" and "AH aliases AX" but "AL does not alias AH".

The right way would be to expose the fact that these really are
integer registers, and just use integer registers for it.

How and where can this fact be exposed? In register set descriptions?
Or may be telling to use i32 register class when we assign the register
class to f64 values?

It currently cannot be done without changes to the legalize pass.

This
would be no problem except that the legalizer doesn't know how to
convert f64 -> 2 x i32 registers. This could be added,

Can you elaborate a bit more about how this can be added? Do you mean
that legalizer would always create two new virtual i32 registers for
each such f64 value, copy the parts of f64 into it and let the register
allocator later allocate some physical registers for it?

Yes.

Would it require adaptations only in the target-specific legalizer or
do you think that some changes in the common part (in Target directory)
of the legalizer are required?

The target independent parts would need to know how to do this. Specifically it would need to know how to "expand" f64 to 2x i32.
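
At the value level, "expanding" f64 into 2 x i32 just means splitting the 64-bit pattern of the double into two 32-bit halves that can live in two integer registers, and reassembling it on the way back. A standalone C++ sketch (not legalizer code) of that round trip:

```cpp
// Split a double's raw 64-bit pattern into (lo, hi) 32-bit halves and
// rebuild it. memcpy reinterprets the bits; it does not convert values.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

std::pair<uint32_t, uint32_t> expandF64(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return {static_cast<uint32_t>(Bits & 0xFFFFFFFFu),   // low half
          static_cast<uint32_t>(Bits >> 32)};          // high half
}

double buildF64(uint32_t Lo, uint32_t Hi) {
  uint64_t Bits = (static_cast<uint64_t>(Hi) << 32) | Lo;
  double D;
  std::memcpy(&D, &Bits, sizeof(D));
  return D;
}
```

The legalizer work described above is teaching the target-independent code to perform this split on SelectionDAG values, producing two i32 virtual registers per f64 for the allocator to color.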

but a simpler approach to get you running faster is to add the bogus register set.

True. To get something that works as soon as possible, this is simpler.
But to produce faster code, the more complex approach described above
could be a (big?) win.

That is possible. I would try implementing the simple approach and seeing what cases are being missed.

-Chris

Hi Chris,

> OK. I have not explained in my first email about DEFs everything I
> wanted; the second one was a bit more specific ;-) Of course, I realize
> that I can add explicit DEFs to the machine instruction. But is it
> possible to do it at the higher level, e.g. in the ISelLowering passes,
> where no machine insns are generated yet. Basically, it is currently
> possible at this level to do the mapping of virtual registers for
> parameters to the physical registers, and this can be done without any
> problems. It is usually done in LowerCALL or something like that. I'd
> like to do the same for return registers at the same place, to keep all
> these related things together and because of certain symmetry of goals.
> But if I would do it as you propose, I'll need to put it somewhere else,
> where we can operate at the MI level, right? So the question is
> actually about possibility of doing it in LowerCALL, where MIs are not
> generated yet.

This is certainly possible, but requires some C++ code. I assume you have
something like the PPC backend right now. If so, you have this in your
lowering code:

   // Add argument registers to the end of the list so that they are known live
   // into the call.
   for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i)
     Ops.push_back(DAG.getRegister(RegsToPass[i].first,
                                   RegsToPass[i].second.getValueType()));
   if (InFlag.Val)
     Ops.push_back(InFlag);
   Chain = DAG.getNode(CallOpc, NodeTys, &Ops[0], Ops.size());

This creates a call node with a list of input registers, these are marked
as uses. In the PPC backend, this is matched with this pattern:

...
   def BLA : IForm<18, 1, 1, (ops aaddr:$func, variable_ops),
                             "bla $func", BrB, [(PPCcall (i32 imm:$func))]>;
...

The "variable_ops" part of the operation list causes all those registers
to be added to the node when the isel turns the isel graph into the sched
graph. Unfortunately, this will not work with extra definitions (call
clobbered regs in your case).

Perhaps the easiest way to handle this whole thing is to add an integer
operand to the DAG call node that indicates the calling convention used.
You can then use a 'usesCustomDAGSchedInserter' inserter to create the
machine instr any way you like based on this information. See the PPC
backend and how it inserts SELECT_CC_I4, for example. In your case, you
would end up with one machine instr (instead of the many SELECT_CC_I4
produces).

Thanks for the hint. I did it as you describe. Well, almost. Instead of
using the calling convention as an additional argument, I directly put
the list of DEF regs into the list of operands.

Taking the code above as an example, I do something like:

// Put a list of registers implicitly DEFined by this instruction
Ops.push_back(DAG.getTargetConstant(isDefReg1, MVT::i32));
...
Ops.push_back(DAG.getTargetConstant(isDefRegN, MVT::i32));
// Put a delimiter indicating the end of the list
Ops.push_back(DAG.getTargetConstant(specialDelimiterValue, MVT::i32));
// Here come the usual Ops.push_back(DAG.getRegister(...)) calls for USEd
// registers

And then when processing the usesCustomDAGSchedInserter-marked CALL
insn, I basically create a CALL instruction by iterating over this list
and adding all of the regs from the DEF list as isDef operands of the
machine instruction. This works without any problems and enforces the
required register constraints.
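
Stripped of the SelectionDAG machinery, the encode/decode trick described above boils down to a sentinel-terminated prefix list. A generic standalone C++ sketch (register ids and the delimiter value are invented for the example):

```cpp
// Encode DEF register ids as a prefix of the operand list, terminated
// by a sentinel, followed by the USE registers; then recover the DEFs
// the way the custom inserter walks the operands.
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t Delimiter = 0xFFFFFFFFu;  // end-of-DEF-list marker

std::vector<uint32_t> encodeOps(const std::vector<uint32_t> &DefRegs,
                                const std::vector<uint32_t> &UseRegs) {
  std::vector<uint32_t> Ops(DefRegs);
  Ops.push_back(Delimiter);
  Ops.insert(Ops.end(), UseRegs.begin(), UseRegs.end());
  return Ops;
}

std::vector<uint32_t> decodeDefs(const std::vector<uint32_t> &Ops) {
  std::vector<uint32_t> Defs;
  for (uint32_t Op : Ops) {
    if (Op == Delimiter)
      break;                 // everything before the sentinel is a DEF
    Defs.push_back(Op);
  }
  return Defs;
}
```

The delimiter value must of course be one that can never appear as a legitimate register id, which is why a dedicated sentinel constant is reserved for it.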

Having done that, I'll concentrate now on the mapping of 64bit FP regs
to 2x 32bit integer regs and their register allocation.

Thanks again,
Roman