tblgen multiclasses

Hi,

I still have some questions about FP emulation for my embedded target.

To recap a bit:
My target only has integer registers and no hardware support for FP. FP
is supported only via emulation. Only f64 is supported. All FP
operations should be implemented to use i32 registers.

Based on the fruitful discussions on this list I was already able to
implement mapping of the FP operations to special library calls.

I also implemented a simple version of the register mapping, where I
introduced a bogus f64 register set and used it during the code
selection and register allocation. After register allocation a special
post-RA pass just converts instructions using f64 operands into
multiple instructions using i32 operands. This seems to work, but has
one disadvantage. Since it is a post-RA pass, it uses a fixed mapping
between physical f64 registers and a pair of physical i32 registers,
e.g. D0:f64 -> i1:i32 x i2:i32. This leads to a non-optimal register
allocation. But anyway, I have an almost working compiler with integer
and FP support for my rather specific embedded target! This shows the
very impressive quality of the LLVM compiler.

Another opportunity, as Chris indicated in his previous mails (see
below), would be to expose the fact that f64 regs really are integer
registers.

>> The right way would be to expose the fact that these really are
>> integer registers, and just use integer registers for it.
>
> How and where can this fact be exposed? In register set
> descriptions? Or maybe by telling it to use the i32 register class
> when we assign the register class to f64 values?

It currently cannot be done without changes to the legalize pass.

>> This
>> would be no problem except that the legalize doesn't know how to
>> convert f64 -> 2 x i32 registers. This could be added,
>
> Can you elaborate a bit more about how this can be added? Do you
> mean that legalize would always create two new virtual i32
> registers for each such f64 value, copy the parts of the f64 into
> them, and let the register allocator later allocate some physical
> registers for them?

Yes.

> Would it require adaptations only in the target-specific legalize,
> or do you think that some changes in the common part (in the Target
> directory) of the legalize are required?

The target independent parts would need to know how to do this.
Specifically it would need to know how to "expand" f64 to 2x i32.

I tried to implement it, but I still have some troubles with that.
In my understanding, the code in TargetLowering.cpp and also in
SelectionDAGISel.cpp should be altered. I tried, for example, to
modify computeRegisterProperties to tell it that f64 is actually
represented as 2 x i32. I also added some code to the function
FunctionLoweringInfo::CreateRegForValue for allocating this pair of
i32 regs for f64 values. But it does not seem to help.

From what I can see, the problem is that emitNode() still looks at the
machine instruction descriptions. And since I still have some insns
for loads and stores of f64 values (do I still need to have them, if I
do the mapping?), it basically allocates f64 registers without even
being affected in any form by the modifications described above,
because it does not use any information prepared there.

So, I'm a bit lost now. I don't quite understand what should be done
to explain to the CodeGen how to map virtual f64 regs to pairs of
virtual i32 regs. Maybe I'm doing something wrong? Maybe I need to
explain to the codegen that f64 is a packed type consisting of 2 x
i32, or a vector of i32? Chris, could you elaborate a bit more about
this? What needs to be explained to the codegen/legalizer, and where?

Another thing I have in mind is:
It looks like the easiest way of all would be to have a special pass
after the assignment of virtual registers, but before a real register
allocation pass. This pass could define the mapping for each virtual
f64 register and then rewrite the machine insns to use the
corresponding i32 regs. The problem with this approach is that I don't
quite understand how to insert such a pass before physical register
allocation pass and if it can be done at all. Also, it worries me a bit
that it would eventually require modifications of PHI-nodes and
introduction of new ones in those cases, where f64 regs were used in
the PHI nodes. Now a pair of PHI-nodes would be required for that.
Since I don't have experience with PHI-node handling in LLVM, I'd like
to avoid this complexity, unless you say it is actually pretty easy to
do. What do you think of this approach? Does it make sense? Is it
easier than the previous one, which requires changes in the code
selector/legalizer?

Thanks,
  Roman

I still have some questions about FP emulation for my embedded target.
To recap a bit:
My target only has integer registers and no hardware support for FP. FP
is supported only via emulation. Only f64 is supported. All FP
operations should be implemented to use i32 registers.

ok

allocation. But anyway, I have an almost working compiler with integer
and FP support for my rather specific embedded target! This shows a
very impressive quality of the LLVM compiler.

Great!

Another opportunity, as Chris indicated in his previous mails (see
below), would be to expose the fact that f64 regs really are integer
registers.

Right.

The target independent parts would need to know how to do this.
Specifically it would need to know how to "expand" f64 to 2x i32.

I tried to implement it, but I still have some troubles with that.
In my understanding, the code in TargetLowering.cpp and also in
SelectionDAGISel.cpp should be altered. I tried, for example, to
modify computeRegisterProperties to tell it that f64 is actually
represented as 2 x i32.

Good, this is the first step. Your goal is to get TLI.getTypeAction(MVT::f64) to return 'expand' and to get TLI.getTypeToTransformTo(f64) to return i32.

I also added some code into the function
FunctionLoweringInfo::CreateRegForValue for allocating this pair of i32
regs for f64 values. But it does not seem to help.

Ok.

From what I can see, the problem is that emitNode() still looks at the
machine instruction descriptions. And since I still have some insns for
loads and stores of f64 values (do I still need to have them, if I do
the mapping?), it basically allocates f64 registers without even being
affected in any form by the modifications described above, because it
does not use any information prepared there.

If you get here, something is wrong. The code generator basically works like this:

1. Convert LLVM to naive dag
2. Optimize dag
3. Legalize
4. Optimize
5. Select
6. Schedule and emit.

If you properly mark f64 as expand, f64 values should only exist in stages 1/2/3. After legalization, they should be gone: only legal types (i32) should exist in the dag.

So, I'm a bit lost now. I don't quite understand what should be done
to explain to the CodeGen how to map virtual f64 regs to pairs of
virtual i32 regs. Maybe I'm doing something wrong? Maybe I need to
explain to the codegen that f64 is a packed type consisting of 2 x
i32, or a vector of i32? Chris, could you elaborate a bit more about
this? What needs to be explained to the codegen/legalizer, and where?

The first step is to get something simple like this working:

void %foo(double* %P) {
   store double 0.0, double* %P
   ret void
}

This will require the legalizer to turn the double 0.0 into two integer zeros, and the store into two integer stores.

Another thing I have in mind is:
It looks like the easiest way at all would be to have a special pass
after the assignment of virtual registers, but before a real register
allocation pass. This pass could define the mapping for each virtual
f64 register and then rewrite the machine insns to use the
corresponding i32 regs. The problem with this approach is that I don't
quite understand how to insert such a pass before physical register
allocation pass and if it can be done at all. Also, it worries me a bit
that it would eventually require modifications of PHI-nodes and
introduction of new ones in those cases, where f64 regs were used in
the PHI nodes. Now a pair of PHI-nodes would be required for that.
Since I don't have experience with PHI-nodes handling in LLVM, I'd like
to avoid this complexity, unless you say it is actually pretty easy to
do. What do you think of this approach? Does it make sense? Is it
easier than the previous one, which requires changes in the code
selector/legalizer?

The best approach is to make the legalizer do this transformation.

-Chris

Hi Chris,

Thank you very much for your answer! It helps me to move in the right
direction. When you explain it, it sounds rather easy. But I still
have some tricky issues. This is either because I'm not so familiar
with LLVM, or because it is a bit underestimated how much the LLVM
legalizer/expander relies on expandable types being integers (see my
explanations below).

> Another opportunity, as Chris indicated in his previous mails (see
> below), would be to expose the fact that f64 regs really are
> integer registers.

Right.

>> The target independent parts would need to know how to do this.
>> Specifically it would need to know how to "expand" f64 to 2x i32.
>
> I tried to implement it, but I still have some troubles with that.
> In my understanding, the code in TargetLowering.cpp and also in
> SelectionDAGISel.cpp should be altered. I tried, for example, to
> modify computeRegisterProperties to tell it that f64 is actually
> represented as 2 x i32.

Good, this is the first step. Your goal is to get
TLI.getTypeAction(MVT::f64) to return 'expand' and to get
TLI.getTypeToTransformTo(f64) to return i32.

After I sent a mail to the mailing list, I figured out that I need to
do this, so I added exactly what you describe and it helped.

> I also added some code into the function
> FunctionLoweringInfo::CreateRegForValue for allocating this pair of
> i32 regs for f64 values. But it does not seem to help.

Ok.

> From what I can see, the problem is that emitNode() still looks at
> the machine instruction descriptions. And since I still have some
> insns for loads and stores of f64 values (do I still need to have
> them, if I do the mapping?), it basically allocates f64 registers
> without even being affected in any form by the modifications
> described above, because it does not use any information prepared
> there.

OK. After the changes mentioned above, pairs of virtual i32 regs are
now used, exactly as intended, in most situations.

If you get here, something is wrong. The code generator basically
works like this:

1. Convert LLVM to naive dag
2. Optimize dag
3. Legalize
4. Optimize
5. Select
6. Schedule and emit.

If you properly mark f64 as expand, f64 values should only exist in
stages 1/2/3. After legalization, they should be gone: only legal
types (i32) should exist in the dag.

> So, I'm a bit lost now. I don't quite understand what should be
> done to explain to the CodeGen how to map virtual f64 regs to pairs
> of virtual i32 regs. Maybe I'm doing something wrong? Maybe I need
> to explain to the codegen that f64 is a packed type consisting of
> 2 x i32, or a vector of i32? Chris, could you elaborate a bit more
> about this? What needs to be explained to the codegen/legalizer,
> and where?

The first step is to get something simple like this working:

void %foo(double* %P) {
   store double 0.0, double* %P
   ret void
}

This will require the legalizer to turn the double 0.0 into two
integer zeros, and the store into two integer stores.

Sample code like this, i.e. simple stores, loads, or even some
arithmetic operations, now works fine. No problems.

But there are big issues with correct legalization and expansion, i.e.
with ExpandOp() and LegalizeOp(). I don't know how to explain it
properly, but basically these functions assume in many places that, in
the case where an MVT requires more than one register, this MVT is
always an integer type. There are some assertions checking for it, and
there are quite some places where it is assumed. Moreover, since
getTypeAction(MVT::f64) now returns Expand, the legalizer tries to
expand too much, and BTW it does not check getOperationAction or
anything like that in this case. For example, it also tries to expand
operations like ADD, SUB, etc. into operations on the halves of f64
(probably because it thinks it is an integer ;-)), even though for
such operations I do not need any expansion, since they are
implemented as library functions.

For most of the places assuming that the type being expanded is an
integer, I inserted some code to explicitly check whether MVT::f64 is
being expanded. This worked for most of the cases, but not for all. In
particular, I cannot solve the SELECT_CC-on-f64 expansion. It
generates a target-specific SELECT_CC node that correctly contains
pairs of i32 for the TrueValue and FalseValue. But when the value of
this operation is used later, the expander tries to expand its result.
And it cannot do it, since it seems to have a problem with
EXTRACT_ELEMENT applied to the SELECT_CC mentioned above. The problem
is probably that it cannot extract the corresponding halves from the
target-specific SELECT_CC node (it can do it without problems for
usual integer-based ISD::SELECT_CC nodes). At this place I got stuck,
since I do not see how I can overcome it.

Overall, changing the legalizer to support the expansion of MVT::f64
proves to be more complicated than I initially expected. And it also
seems to be a bit of overkill. Therefore I was thinking about the
special pass after code selection, but before register allocation.
After all, I just want to do a transformation on all instructions that
read or write from/into virtual f64 regs:

  load/store vregf64, val
->
  load/store vregi32_1, val_low
  load/store vregi32_2, val_high
  
My subjective feeling is that it can be done more easily in a separate
pass rather than by changing the legalizer all over the place in a
rather non-elegant way.

> Another thing I have in mind is:
> It looks like the easiest way of all would be to have a special
> pass after the assignment of virtual registers, but before a real
> register allocation pass. This pass could define the mapping for
> each virtual f64 register and then rewrite the machine insns to use
> the corresponding i32 regs. The problem with this approach is that
> I don't quite understand how to insert such a pass before the
> physical register allocation pass, and if it can be done at all.
> Also, it worries me a bit that it would eventually require
> modifications of PHI-nodes and introduction of new ones in those
> cases where f64 regs were used in the PHI nodes. Now a pair of
> PHI-nodes would be required for that. Since I don't have experience
> with PHI-node handling in LLVM, I'd like to avoid this complexity,
> unless you say it is actually pretty easy to do. What do you think
> of this approach? Does it make sense? Is it easier than the
> previous one, which requires changes in the code selector/legalizer?

The best approach is to make the legalizer do this transformation.

I believe you, since you certainly know it better than me. But I
experienced quite some problems, as I described above. Now, if we
assume for a second that this approach with a separate pass makes some
sense, I'm just curious how I could insert a new pass after code
selection, but before any other passes, including register allocation?
I have not found any easy way to do it yet. For a post-RA pass it is
very easy and supported, but for a pre-RA or post-code-selection pass
it is non-obvious.
I was thinking about two possibilities:
1) Mark all f64 load/store/move target insns as
usesCustomDAGSchedInserter = 1 and then intercept their expansion in
InsertAtEndOfBasicBlock(). This should be fine, since
at this stage machine insns are still using the virtual registers and
it happens before register allocation. Then this function could expand
them into pairs of insns operating on i32 virtual regs. The problem
here is that InsertAtEndOfBasicBlock() is called not for all of the
emitted insns. Ironically enough, it is not called for ISD::CopyToReg
and ISD::CopyFromReg, which are the load and store insns. BTW, is this
intended, or was it simply overlooked? What would happen if
instructions produced for these nodes are marked
usesCustomDAGSchedInserter?
Shouldn't they be passed then to the custom target MI expander as it is
done for all other instructions? Would it make sense to always check,
during the insertion of an MI into a BB, whether it is an MI marked
usesCustomDAGSchedInserter and, if so, call a target-specific expander
for it?

2) Introduce a fake register allocation pass and make it require an
f64toi32 pass as a prerequisite, basically calling an existing
register allocator as in this code:

namespace {

  static RegisterRegAlloc
    TargetXRegAlloc("targetx", "targetx register allocator",
                       createTargetXRegisterAllocator);

  struct VISIBILITY_HIDDEN RA : public MachineFunctionPass {
  private:

    MachineFunctionPass *RealRegAlloc;

  public:

    RA()
    {
      // Instantiate a real allocator to do the job!
      RealRegAlloc =
(MachineFunctionPass*)(createLinearScanRegisterAllocator());
    }

    virtual const char* getPassName() const {
      return "TargetX Register Allocator";
    }

    virtual void getAnalysisUsage(AnalysisUsage &AU) const {

        // Add target specific pass as a requirement
        AU.addRequired<f64toi32pass>();

        // Reuse all requirements from the real allocator
        RealRegAlloc->getAnalysisUsage(AU);
    }

    /// runOnMachineFunction - register allocate the whole function
    bool runOnMachineFunction(MachineFunction&);
  };
}

bool RA::runOnMachineFunction(MachineFunction &fn) {
  return RealRegAlloc->runOnMachineFunction(fn);
}

FunctionPass* llvm::createTargetXRegisterAllocator() {
  return new RA();
}

Looks fine and pretty obvious, but it does not work. When
runOnMachineFunction is invoked, I get an error which I don't quite
understand. Why do I get it at all?

AnalysisType& llvm::Pass::getAnalysis() const [with AnalysisType =
llvm::LiveIntervals]: Assertion `Resolver && "Pass has not been
inserted into a PassManager object!"' failed.

OK. These are my current problems with the f64 to 2 x i32 conversion.
So far I cannot solve it using any of the mentioned methods :-(

Any further help and advice are very welcome!

Thanks,
Roman

P.S. A minor off-topic question: Is it possible to explain to the LLVM
backend that "float" is the same type as "double" on my target? I
managed to explain it for immediates and also told it to promote f32
to f64. But it does not work for float variables or parameters,
because LLVM considers them to be float in any case and to have a
32-bit representation in memory. Or do I need to handle this
equivalence in the front-end only?

The first step is to get something simple like this working:

void %foo(double* %P) {
   store double 0.0, double* %P
   ret void
}

This will require the legalizer to turn the double 0.0 into two
integer zeros, and the store into two integer stores.

Sample code like this, i.e. simple stores, loads, or even some
arithmetic operations, now works fine. No problems.

Great.

But there are big issues with correct legalization and expansion, i.e.
with ExpandOp() and LegalizeOp(). I don't know how to explain it
properly, but basically these functions assume in many places that, in
the case where an MVT requires more than one register, this MVT is
always an integer type. There are some assertions checking for it, and
there are quite some places where it is assumed. Moreover, since
getTypeAction(MVT::f64) now returns Expand, the legalizer tries to
expand too much, and BTW it does not check getOperationAction or
anything like that in this case. For example, it also tries to expand
operations like ADD, SUB, etc. into operations on the halves of f64
(probably because it thinks it is an integer ;-)), even though for
such operations I do not need any expansion, since they are
implemented as library functions.

Right. These places will have to be updated, and code to handle the new cases needs to be added.

For most of the places assuming the integer type to be expanded, I
inserted some code to explicitly check if MVT::f64 is being expanded.
This worked for most of the cases, but not for all. In particular I
cannot solve the SELECT_CC on f64 expansion. It generates a target
specific SELECT_CC node that correctly contains pairs of i32 for the
TrueValue and FalseValue. But when the value of this operation is used
later, the expander tries to expand its result. And it cannot do
it, since it seems to have a problem with EXTRACT_ELEMENT applied to
SELECT_CC mentioned above. The problem is probably that it cannot
extract the corresponding halves from the target specific SELECT_CC
node (and it can do it without problems for usual integer-based
ISD::SELECT_CC nodes). At this place I got stuck, since I do not see
how I can overcome it.

I don't follow, can you try explaining it and including the relevant code that isn't working for you?

Overall, changing the legalizer to support the expansion of MVT::f64
proves to be more complicated than I initially expected. And it
also seems to be a bit of overkill. Therefore I was thinking about the
special pass after code selection, but before register allocation.

Ok.

After all, I just want to do a transformation on all instructions that
read or write from/into virtual f64 regs.

load/store vregf64, val
->
load/store vregi32_1, val_low
load/store vregi32_2, val_high

My subjective feeling is that it can be done more easily in a separate
pass rather than by changing the legalizer all over the place in a
rather non-elegant way.

You could do this, but it's not the "right" way to go, for the same reason that expanding i64 -> 2x i32 after isel isn't the right thing to do. Doing this 'late' requires lots of bogus instructions/register files to be added to the target, it doesn't allow the dag combiner to optimize and eliminate redundant expressions (for example, 'store double 0.0' should only materialize 0 into one 32-bit register, not two zeros), and it generally isn't in the spirit of the current infrastructure.

I realize that you may not be interested in getting the best possible solution (time pressures may be more important), but realize that you will
lose performance if you do a pass after isel time to handle this.

The best approach is to make the legalizer do this transformation.

I believe you, since you certainly know it better than me. But I
experienced quite some problems, as I described above. Now, if we
assume for a second that this approach with a separate pass makes
some sense, I'm just curious how I could insert a new pass after code
selection, but before any other passes, including register allocation? I

This should be workable.

have not found any easy way to do it yet. For a post-RA pass it is
very easy and supported, but for a pre-RA or post-code-selection pass
it is non-obvious.

I suggest a third approach:

1. Add an f64 register class to the target.
2. Add FP pseudo instructions that are produced by the isel, these use the
    f64 register class.
3. Write a machine function pass that runs before the RA that translates
    these instructions into libcalls or other integer ops. This would
    lower the f64 pseudo regs into 2x i32 pseudo regs. The real RA should
    never see the bogus f64 regs.

P.S. A minor off-topic question: Is it possible to explain the LLVM
backend that "float" is the same type as "double" on my target? I
managed to explain it for immediates and also told to promote f32 to
f64. But it does not work for float variables or parameters, because
LLVM considers them to be float in any case and to have a 32bit
representation in memory. Or do I need to handle this equivalence in
the front-end only?

If you tell the code generator to promote f32 to f64, it will handle 90% of the work for you. The remaining pieces are in the lowercall/lowerarguments/lower return code, where you do need to specify how these are passed. Usually just saying they are in f64 registers should be enough.

-Chris

Chris Lattner wrote:

> P.S. A minor off-topic question: Is it possible to explain the LLVM
> backend that "float" is the same type as "double" on my target? I
> managed to explain it for immediates and also told to promote f32
> to f64. But it does not work for float variables or parameters,
> because LLVM considers them to be float in any case and to have a
> 32bit representation in memory. Or do I need to handle this
> equivalence in the front-end only?

If you tell the code generator to promote f32 to f64, it will handle
90% of the work for you. The remaining pieces are in the
lowercall/lowerarguments/lower return code, where you do need to
specify how these are passed. Usually just saying they are in f64
registers should be enough.

Yes. I have done almost all of that already a while ago and it solves
almost all problems, as you say. But the problem is code like:

float f;

float float_addition(float a, float b)
{
  return a+b;
}

which is translated by LLVM into:

target endian = little
target pointersize = 32
target triple = "i686-pc-linux-gnu"
deplibs = [ "c", "crtend" ]
%f = weak global float 0.000000e+00 ; <float*> [#uses=0]

implementation ; Functions:

float %float_addition(float %a, float %b) {
entry:
  %tmp.2 = add float %a, %b ; <float> [#uses=1]
  ret float %tmp.2
}

The global variable f is still considered to be "float" and therefore
occupies 4 bytes and is handled as a 32-bit float. Due to the
expansion from f32 to f64, codegen tries to load it from a 32-bit
memory cell and then expand it into f64. But I have no 32-bit FP type
on my target and no 32-bit FP representation in memory. So, I really
want "float" to be just an alias for "double"; "f" should be treated
in the same way as if it were a 64-bit double. This was the reason why
I asked if something should be done in the front-end to achieve this,
in addition to the actions that you describe. Or maybe there is a way
to have an LLVM pass that just traverses the module and changes the
types of all float variables to double?

- Roman

Yes. I have done almost all of that already a while ago and it solves
almost all problems, as you say. But the problem is code like:

ok

float f;
float float_addition(float a, float b) {
  return a+b;
}

which is translated by LLVM into:

target endian = little
target pointersize = 32
target triple = "i686-pc-linux-gnu"
deplibs = [ "c", "crtend" ]
%f = weak global float 0.000000e+00 ; <float*> [#uses=0]

implementation ; Functions:

float %float_addition(float %a, float %b) {
entry:
  %tmp.2 = add float %a, %b ; <float> [#uses=1]
  ret float %tmp.2
}

Right.

The global variable f is still considered to be "float" and therefore
occupies 4 bytes and handled as a 32bit float. Due to expansion from
f32 to f64, codegen tries to load it from a 32bit memory cell and then
expand into f64. But I have no 32bit FP type on my target and no 32bit
FP representation in memory. So, I really want "float" to be just an
alias for "double". "f" should be treated in the same way, as if it is a
64bit double. This was the reason, why I asked if something should be
done in the front-end to achieve this, additionally to the actions that
you describe. Or may be there is a way to have an LLVM pass that just
traverses the module and changes the types of all float variables to
double?

There are two ways to do it. First, you could modify llvm-gcc for your target to say that sizeof(float) == sizeof(double). I don't recommend this strategy though.

The second way to do this is to implement an "extending load" instruction. This instruction allows your target to load a 32-bit FP value from memory and do an implicit extension to f64. You can implement this with a libcall of course.

-Chris

Hi,

have not found any easy way to do it yet. For post-RA pass it is
very easy and supported, but for pre-RA or post-code-selection - it
is non obvious.

I suggest a third approach:

[snip]

3. Write a machine function pass that runs before the RA that
translates these instructions into libcalls or other integer ops.
This would lower the f64 pseudo regs into 2x i32 pseudo regs. The
real RA should never see the bogus f64 regs.

Thanks, this is a good idea.

But I cannot figure out how to make a machine function pass run
_BEFORE_ the RA. I guess I'm missing something very obvious.

How do I enforce that a certain machine function pass runs before RA
(LLVM linear scan RA in my case)? I tried to add the RA pass as a
requirement in the getAnalysisUsage() of my machine function pass, but
this does not work, since RA is not registered as a usual pass and uses
a special RA registry instead. Should my machine function pass be
explicitly added to the getAnalysisUsage() of the Linear Scan RA pass?
This would work probably, but it is not too nice, since it would change
the existing LLVM pass in a target-specific way.

  And BTW, it seems to me that currently new RA passes are not allowed
to derive from the existing ones. If that is correct, why so? Wouldn't
it be nice?

-Roman

Thanks, this is a good idea.

But I cannot figure out how to make a machine function pass run
_BEFORE_ the RA. I guess I'm missing something very obvious.

In your target's TargetMachine::addInstSelector method, add it to the pass mgr right after your isel.

And BTW, it seems to me that currently new RA passes are not allowed
to derive from the existing ones. If it is correct, why so? Wouldn't it
be nice?

I'm not sure what you mean. We don't expose linscan through a public header, but a pass in the same .cpp file could subclass it. We haven't had a need to do this yet, so we don't have the provisions to do it.

-Chris

Hi Chris,

> Thanks, this is a good idea.
>
> But I cannot figure out how to make a machine function pass run
> _BEFORE_ the RA. I guess I'm missing something very obvious.

In your target's TargetMachine::addInstSelector method, add it to the
pass mgr right after your isel.

Thanks a lot! This is exactly what I could not understand.

And BTW, it seems to me that currently new RA passes are not
allowed to derive from the existing ones. If it is correct, why so?
Wouldn't it be nice?

I'm not sure what you mean. We don't expose linscan through a public
header, but a pass in the same .cpp file could subclass it.

We haven't had a need to do this yet, so we don't have the provisions
to do it.

OK, I see. I just had the idea that it could be useful if someone
defines a target-specific RA that is a slight modification of an
existing one, like the linear scan RA. Let's say it just executes some
target-specific actions before and after the existing register
allocator. Then you probably don't want to put these target-specific
bits into the same file as the existing allocator. It would be much
cleaner to define in a separate source file a new RA that in some
sense "derives" from the existing RA (either using inheritance or by
having a class member that is of a known existing RA class). Such a new
RA would do some pre/post RA actions in its runOnMachineFunction()
method and delegate a real RA job to the "parent" register allocator.

Thanks again for your hint,
-Roman

Hi,

I'm trying to enable the pre-decrementing and post-incrementing
addressing modes for my target. So far without any success :-(

I enabled these modes in my TargetLowering class using
setIndexedLoadAction and setIndexedStoreAction calls. I also defined
getPreIndexedAddressParts and getPostIndexedAddressParts. And I can see
that DAGCombiner::CombineToPostIndexedLoadStore is invoked. But this
function never does any replacements, and it only very seldom invokes
getPostIndexedAddressParts and so on, even in those situations where I
would expect it.

For example, it does not use these modes for code like:
void test_postinc(int *array, int i)
{
  array[i] = 1;
  array[i+1] = 1;
  array[i+2] = 1;
  array[i+3] = 1;
}

So, I'm wondering how good the support for these modes in LLVM is. Am
I missing something that should be implemented for my target to enable
them? It looks like only the PowerPC backend tries to make use of them,
but even there it is not implemented completely.

And, BTW, are there any good test cases or sample code (either .ll or
.c files) that are (or should be) translated by LLVM using these modes?
I'd like to get a better understanding of the situations where it can
happen at all. For example, is code like the above a good candidate for
it or not?

-Roman

I enabled these modes in my TargetLowering class using
setIndexedLoadAction and setIndexedStoreAction calls. I also defined
getPreIndexedAddressParts and getPostIndexedAddressParts. And I can see
that DAGCombiner::CombineToPostIndexedLoadStore is invoked. But this
function never performs any replacements and only very seldom invokes
getPostIndexedAddressParts, even in those situations where I would
expect it to.

Ok,

For example, it does not use these modes for the code like:
void test_postinc(int *array, int i)
{
  array[i] = 1;
  array[i+1] = 1;
  array[i+2] = 1;
  array[i+3] = 1;
}

See below.

So, I'm wondering how good the support for these modes in LLVM is. Am
I missing something that should be implemented for my target to enable
them? It looks like only the PowerPC backend tries to make use of them,
but even there it is not implemented completely.

And, BTW, are there any good test cases or sample code (either .ll or
.c files) that are (or should be) translated by LLVM using these modes?
I'd like to get a better understanding of the situations where it can
happen at all. For example, is code like the above a good candidate for
it or not?

Indexed load / store support is not "done". Currently only the PPC target makes use of it. Lots of testing and further refinement is needed.

Basically, the only case where the transformation will happen now is when the load/store address has other non-address uses, e.g.

x1 = load [base]
x2 = base + 4
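In source terms, that pattern looks roughly like the following. This is a hypothetical illustration (the function name is made up): the base address is also used as a plain value, which is what lets the load and the add fold into one post-incremented load.

```cpp
#include <cassert>

// Hypothetical example of the one case currently handled: the base
// address has a non-address use (it is also advanced as a value).
int load_and_advance(int **pp) {
    int *base = *pp;   // base
    int x = *base;     // x1 = load [base]
    *pp = base + 1;    // x2 = base + 4  (non-address use of base)
    return x;
}
```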

As for your example, let's first examine why array[i+1] = 1; array[i+2] = 1; does not currently trigger a transformation:

array[i+1] = 1
array[i+2] = 1
=>
tmp = i * 4
base = array + tmp
store 1, [base + 4]
store 1, [base + 8]

Unfortunately base+8 is not written as (base+4)+4, so the transformation does not happen. We need to add a DAG combiner transformation to make that happen. If you are interested in working on it, we would be happy to help. :-)
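One way to see the missing reassociation at the source level is to rewrite the stores as an explicit pointer walk, so each address really is "previous address + 4" and a post-incremented store can match each step directly. This is a hedged sketch (the function name is made up), not a claim about what the combiner currently produces.

```cpp
#include <cassert>

// Each *p++ = 1 is literally "store 1, [p]; p = p + 4", i.e. the
// (base+4)+4 form that a post-inc store can match.
void test_postinc_ptr(int *array, int i) {
    int *p = array + i;
    *p++ = 1;
    *p++ = 1;
    *p++ = 1;
    *p++ = 1;
}
```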

The reason that array[i] = 1; array[i+1] = 1; doesn't trigger a post-inc transformation is different. In theory, this should be
array[i] = 1
array[i+1] = 1
=>
tmp = i * 4
base = array + tmp
store 1, [base]
store 1, [base + 4]

The first store should have been transformed into a post-inc store.

The issue is in SDNode CSE. If you take a look at the pre-legalizer DAG dump, you will see this:

tmp = i * 4
base1 = tmp + array
base2 = array + tmp
store 1, [base1]
store 1, [base2 + 4]

So the post-inc transformation does not recognize the opportunity. The right "fix" is to teach SDNode CSE about commutativity so that one of the two add operations is eliminated. Chris can say more about the issues involved.
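The suggested fix can be sketched as canonicalizing operand order before the CSE lookup, so that (a + b) and (b + a) hash to the same key. This is a hypothetical illustration: NodeId and addKey are made-up names, not LLVM's actual SelectionDAG API.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical sketch: before hashing a commutative node for CSE, put
// its operands into a canonical order so operand order doesn't matter.
using NodeId = unsigned;

std::pair<NodeId, NodeId> addKey(NodeId lhs, NodeId rhs) {
    if (rhs < lhs) std::swap(lhs, rhs);  // smaller node id first
    return {lhs, rhs};
}
```

With keys like this, base1 and base2 above would CSE to a single add node, and the post-inc matcher would then see the first store's address reused by the second.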

Hope this helps.

Evan

Hi,

I just updated my LLVM sources from CVS/HEAD and rebuilt them. And I
downloaded the GCC4 frontend from the 1.9 release.

Now I cannot compile anything, since GCC frontend seems to produce BC
files that cannot be read by llvm-dis, llc and other utils.

llvm-dis shows the following message:
Bytecode formats < 7 are not longer supported. Use llvm-upgrade.
(Vers=6, Pos=9)

But since the new llvm-dis cannot disassemble, I cannot use
llvm-upgrade, since I need a way to produce an *.ll file.

What's wrong? I thought that the LLVM 1.9 GCC4 frontend already
produces BC files in the new format? Or do I need to rebuild the
front-end from CVS? Maybe there are pre-built images available?

-Roman

The format's changed since 1.9. You should download llvm-gcc4 source
and compile it yourself.

-bw

Hi Roman,

Hi,

I just updated my LLVM sources from CVS/HEAD and rebuilt them. And I
downloaded the GCC4 frontend from the 1.9 release.

Now I cannot compile anything, since GCC frontend seems to produce BC
files that cannot be read by llvm-dis, llc and other utils.

llvm-dis shows the following message:
Bytecode formats < 7 are not longer supported. Use llvm-upgrade.
(Vers=6, Pos=9)

That is correct. Bytecode format 6 corresponds to version 1.9. LLVM is
currently using version 7 and the definition of version 7 is still in
flux as the instruction set continues to change.

But since the new llvm-dis cannot disassemble, I cannot use
llvm-upgrade, since I need a way to produce an *.ll file.

If you can't do as Bill suggested (get the latest llvm-gcc and compile
it), you can use this approach:

Compile with the 1.9 version of llvm-gcc4. Use the -emit-llvm -S
options. That will yield a 1.9 assembly (ll) file.

Run llvm-upgrade from the CVS head on the produced llvm assembly.
That will produce another .ll file. Then that assembly file can be run
through the latest llvm-as.

Something like this:

cfe1.9/bin/llvm-gcc -S -emit-llvm source.c -o - | \
   HEAD/bin/llvm-upgrade | \
   HEAD/bin/llvm-as -o source.bc

What's wrong? I thought that the LLVM 1.9 GCC4 frontend already
produces BC files in the new format?

It is, but the format continues to change rapidly in preparation for
release 2.0. Because of the rapid changes, we decided to break backwards
compatibility except through LLVM Assembly files. This has drastically
simplified our task and will get us to a faster/smaller/smarter bytecode
format for 2.0 more easily.

Sorry for any inconvenience this has caused, but it was necessary to
make progress.

Or do I need to rebuild the front-end from the
CVS?

That's definitely the simplest thing to do.

Maybe there are pre-built images available?

No, there aren't. We only distribute pre-built binaries with a release.
The 2.0 release is not scheduled until the May time frame.

-Roman

Reid.

Hi Reid,

> But since the new llvm-dis cannot disassemble, I cannot use
> llvm-upgrade, since I need a way to produce an *.ll file.

If you can't do as Bill suggested (get the latest llvm-gcc and
compile
it), you can use this approach:

Compile with the 1.9 version of llvm-gcc4. Use the -emit-llvm -S
options. That will yield a 1.9 assembly (ll) file.

Run llvm-upgrade from the CVS head on the produced llvm assembly.
That will produce another .ll file. Then that assembly file can be
run through the latest llvm-as.

Something like this:

cfe1.9/bin/llvm-gcc -S -emit-llvm source.c -o - | \
   HEAD/bin/llvm-upgrade | \
   HEAD/bin/llvm-as -o source.bc

OK. I tried this approach and it works without any problems.

>
> What's wrong? I thought that LLVM 1.9 GCC4 frontend produces BC
files
> in a new format already?

It is, but the format continues to change rapidly in preparation for
release 2.0. Because of the rapid changes, we decided to break
backwards compatibility except through LLVM Assembly files. This has
drastically simplified our task and will get us to a
faster/smaller/smarter bytecode format for 2.0 more easily.

Sorry for any inconvenience this has caused, but it was necessary to
make progress.

No problem. It was just not clear to me that the GCC4 frontend from
the LLVM 1.9 release does not support the new bytecode format.

> Or do I need to rebuild the front-end from the
> CVS?

That's definitely the simplest thing to do.

OK. The GCC4 build is in progress on my machine at the moment. I
guess it will take a while ;-)

> Maybe there are pre-built images available?

No, there aren't. We only distribute pre-built binaries with a
release. The 2.0 release is not scheduled until the May time frame.

OK. It is not a big deal to rebuild it, even though I still think that
providing daily pre-built GCC front-ends for the current CVS HEAD
branch could be useful for nightly builds/tests and could save some
time for LLVM developers/testers...

Thanks for clarifications,
  Roman

Are you volunteering to provide equipment, bandwidth and services to
build it on N different platforms each day?

:-)

Most of us just keep an "untouched" build around for nightly test
purposes. The platform differences are significant enough that building
a set of binaries each day can quickly become a full time job. We'd
rather be improving LLVM.

Reid.

Hi,

I'm thinking about introducing some sort of memory protection into the
LLVM backend for my embedded target. The overall goal is to make some
regions of memory non-readable and/or non-writable. The bounds of these
memory regions are sometimes fixed and known in advance and in some
other cases can be dynamic (e.g. for heap or stack).

Since my target does not have an MMU and any sort of hardware based
memory protection, these checks should be implemented in software.

I see the following possibilities to achieve that:
1) Simply check every pointer before any indirect memory access. This
is very easy to implement, but rather inefficient, because it has a
very big run-time overhead and bloats the generated code. This is how
it is done in our old compiler.

2) Use more advanced analysis and eliminate most of the checks if it
can be proved that a given pointer always has a "good" value at this
point.
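Approach (1) can be sketched as follows. This is a hedged illustration only: the region bounds are made-up constants, and a real implementation would take them from the target's memory map or from a runtime table of protected segments.

```cpp
#include <cstdint>

// Hypothetical fixed protected region (made-up addresses).
const std::uintptr_t PROT_START = 0x1000;
const std::uintptr_t PROT_END   = 0x2000;

// The backend would emit a call like this before every indirect store;
// a nonzero result means the access must be rejected (e.g. by trapping).
int storeWouldFault(std::uintptr_t addr) {
    return addr >= PROT_START && addr < PROT_END;
}
```

Approach (2) would then try to prove, per access, that this check can never fire and drop it.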

Since the idea of (2) is not so new, I guess that some work was already
done in LLVM in this direction. For example, I have found the following
papers:

@InProceedings{DKAL:LCTES03,
    Author = {Dinakar Dhurjati, Sumant Kowshik, Vikram Adve and
Chris Lattner},
    Title = "{Memory Safety Without Runtime Checks or Garbage
Collection}",
    Booktitle = "{Proc. Languages Compilers and Tools for Embedded
Systems 2003}",
    Address = {San Diego, CA},
    Month = {June},
    Year = {2003},
    URL =
{http://llvm.cs.uiuc.edu/pubs/2003-05-05-LCTES03-CodeSafety.html}
  }

and

"Segment Protection for Embedded Systems Using Run-time Checks"
By Matthew Simpson, Bhuvan Middha and Rajeev Barua.
Proceedings of the ACM International Conference on Compilers,
Architecture, and Synthesis for Embedded Systems (CASES),
San Francisco, CA, September 25-27, 2005

If I understand correctly, they are related to DSA analysis.

So, I'd like to ask about the status and availability of such kinds of
analysis in LLVM. Which analysis passes should I look at? What are the
issues with them? What is their current status? Are there any problems
(e.g. patent issues, as with DSA) that prevent using them?

Any feedback about these issues is very much appreciated.

-Roman