New to LLVM, Help needed

Hi,

I have started to write an LLVM backend for one of our microcontrollers (PICxx). I began by studying the PowerPC backend and decided to follow its framework. I now have most of the classes and TableGen files written for a very basic hypothetical microcontroller with very few instructions.

The project builds and llc recognizes the new processor. However, when it reaches the point where it lowers LLVM IR to the PICxx DAG, it asserts in the ExpandOp() function in LegalizeDAG.cpp after hitting the default case of switch(Node->getOpcode()).

Can someone please help me understand how I am ending up in the default case?

Thanks,

A.

I have started to write an LLVM backend for one of our microcontrollers (PICxx). I began by studying the PowerPC backend and decided to follow its framework. I now have most of the classes and TableGen files written for a very basic hypothetical microcontroller with very few instructions.

Cool!

The project builds and llc recognizes the new processor. However, when it reaches the point where it lowers LLVM IR to the PICxx DAG, it asserts in the ExpandOp() function in LegalizeDAG.cpp after hitting the default case of switch(Node->getOpcode()).

Can someone please help me understand how I am ending up in the default case?

It’s hard to say. You’d have to look at what the Node->getOpcode() is. It should be one of the ones that’s being handled. It’s not, so you need to figure out why it isn’t, where it’s coming from, and how to get it to be one of the opcodes handled. Check your PICxxISelLowering.cpp (?) file and see what it’s doing with that opcode. Are you really going to “expand” it, or should it do something else (legal, promote)?
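To make the legal / promote / expand distinction concrete, here is a minimal standalone sketch. It is a toy model and not LLVM's actual classes: ToyLowering, Opcode, ValueType, Action and expandOp are all made-up names, and the only point is to show how an opcode that nobody registered a special action for, and that the expander has no case for, falls through to a default branch.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; these are NOT LLVM's real types.
enum class Opcode { Add, Load, GlobalAddress };
enum class ValueType { i8, i16, i32, i64 };
enum class Action { Legal, Promote, Expand, Custom };

class ToyLowering {
  std::map<std::pair<Opcode, ValueType>, Action> Actions;

public:
  // Records what the target wants done with an (opcode, type) pair.
  void setAction(Opcode Op, ValueType VT, Action A) { Actions[{Op, VT}] = A; }

  // Anything not registered is treated as Legal in this toy model.
  Action getAction(Opcode Op, ValueType VT) const {
    auto It = Actions.find({Op, VT});
    return It == Actions.end() ? Action::Legal : It->second;
  }
};

// A toy "expand" step: only opcodes with an explicit case can be expanded.
std::string expandOp(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
    return "split add into low/high halves";
  case Opcode::Load:
    return "split load into two narrower loads";
  default:
    // Analogue of the default case described in the question.
    throw std::logic_error("do not know how to expand this opcode");
  }
}
```

In this sketch, calling expandOp(Opcode::GlobalAddress) throws, which is the toy equivalent of the assert: the useful question is why that node was sent to the expander in the first place.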

These are a few tips. Others will jump in with better ideas, I’m sure.

-bw

Is the documentation for the PICxx series you’re writing this for publicly available? Do you plan to contribute the back end to the public LLVM project? I suspect there are people out there who would help with the back end if the source were in the public LLVM repo.

Please do a debug build and run it under gdb. Let us know where it is asserting and what it is asserting on so we can help you.

Evan

More information on this… still not working

When I build the project for Debug and run the program, the following message is printed before the assert.

NODE: 0x937bac8: i64 = GlobalAddress <i32* @var> 0

I guess it expects GlobalAddress to be legalized before reaching ExpandOp(). I haven’t implemented anything for ISD::GlobalAddress, which may explain it; however, I couldn’t find much about it in the PowerPC implementation either. The only thing is that the PPCTargetLowering constructor contains

setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);
setOperationAction(ISD::GlobalAddress, MVT::i64, Custom);

so I also added these two calls to my PICxxTargetLowering constructor. Nothing changed. I also tried “Expand” and “Legal” instead of “Custom”, with still no progress. I’m sure I am missing something here.

Any ideas?

Thanks

A.

More information on this... still not working
When I build the project for Debug and run the program, the following
message is printed before the assert.

NODE: 0x937bac8: i64 = GlobalAddress <i32* @var> 0

This implies that your target uses 64-bit pointers but doesn't have a 64-bit register file. Is this right?

If you aren't using 64-bit pointers, you should investigate where this node came from.

I guess it expects GlobalAddress to be legalized before reaching
ExpandOp(). I haven't implemented anything for ISD::GlobalAddress, which
may explain it; however, I couldn't find much about it in the
PowerPC implementation either. The only thing is that the
PPCTargetLowering constructor contains

setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);
setOperationAction(ISD::GlobalAddress, MVT::i64, Custom);

so I also added these two calls to my PICxxTargetLowering
constructor. Nothing changed. I also tried "Expand" and "Legal"
instead of "Custom", with still no progress. I'm sure I am missing
something here.

You may be running into problems if the pointer type in the code generator isn't natively supported by your register file. We haven't hit a target like this yet, so you will likely have to expand LegalizeDAG to handle these cases.
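As a rough illustration of why a pointer wider than any register ends up in the expand path, here is a small standalone sketch. It is not how LegalizeDAG is written; classifyInteger and TypeAction are hypothetical names, and the register widths used with it are made up for the example.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

enum class TypeAction { Legal, Promote, Expand };

// Hypothetical helper: RegisterWidths lists the integer widths (in bits)
// for which the toy target has a register class.
TypeAction classifyInteger(unsigned Bits,
                           const std::vector<unsigned> &RegisterWidths) {
  if (RegisterWidths.empty())
    throw std::invalid_argument("target needs at least one register width");

  // A width with a matching register class needs no legalization.
  if (std::find(RegisterWidths.begin(), RegisterWidths.end(), Bits) !=
      RegisterWidths.end())
    return TypeAction::Legal;

  unsigned Widest =
      *std::max_element(RegisterWidths.begin(), RegisterWidths.end());

  // Narrower than the widest register: widen it. Wider than all: split it.
  return Bits < Widest ? TypeAction::Promote : TypeAction::Expand;
}
```

With assumed register widths of 8 and 16 bits, a 64-bit value is classified as Expand, so every node producing it (including a GlobalAddress) would need an expansion rule. Setting the pointer size to a width the target has registers for avoids the problem entirely.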

-Chris

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Evan Cheng
Sent: Monday, September 10, 2007 10:29 AM
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] New to LLVM, Help needed

Please do a debug build and run it under gdb. Let us know where it is
asserting and what it is asserting on so we can help you.

Evan

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


Thank you Chris,
I had the pointer size wrong. I fixed it and now it passes that point as
expected :) which takes me to a second question:
The processor that I am working on is 8-bit and has a Harvard
architecture; this implies different pointer types (sizes) for objects in
data memory and in program memory (functions or data in program memory).
At this moment, I am using only one pointer size (16-bit) just to
get things going; however, eventually I need to model the two pointer
types.
I was wondering if you have any suggestions as to how best I can model
this in LLVM.

Thanks
Ali

Thank you Chris,
I had the pointer size wrong. I fixed it and now it passes that point as
expected :) which takes me to a second question:

cool

The processor that I am working on is 8-bit and has a Harvard
architecture; this implies different pointer types (sizes) for objects in
data memory and in program memory (functions or data in program memory).
At this moment, I am using only one pointer size (16-bit) just to
get things going; however, eventually I need to model the two pointer
types.
I was wondering if you have any suggestions as to how best I can model
this in LLVM.

I don't think the code generator will need significant extension to support this: SDISel would just lower each to a different integer size.

The place that will need extension is the LLVM IR itself. Here I think that we should expand PointerType to take some sort of address space identifier. This will allow the front-end to produce pointers of the right type, and the codegen will be able to lower them to the right-sized integer. This is a fairly invasive change, but it should be relatively straightforward to do.
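The idea can be sketched in standalone C++ as follows. This is only a toy model of the proposal, not an existing LLVM interface: ToyPointerType and ToyTargetPointers are invented names, and treating address space 0 as the default data memory is an assumption made for the example.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Toy model: a pointer type is its pointee name plus an address space number.
struct ToyPointerType {
  std::string Pointee;
  unsigned AddressSpace; // 0 = default (data memory) in this sketch
};

// Hypothetical per-target table: address space number -> pointer width in bits.
class ToyTargetPointers {
  std::map<unsigned, unsigned> WidthByAddressSpace;

public:
  void setPointerWidth(unsigned AddressSpace, unsigned Bits) {
    WidthByAddressSpace[AddressSpace] = Bits;
  }

  // Codegen would lower the pointer to an integer of this width.
  unsigned loweredWidth(const ToyPointerType &PT) const {
    auto It = WidthByAddressSpace.find(PT.AddressSpace);
    if (It == WidthByAddressSpace.end())
      throw std::out_of_range("unknown address space");
    return It->second;
  }
};
```

A target could, for instance, register 16 bits for space 0 and 24 bits for space 1 (both widths are made up here); the front-end then only has to tag each pointer with the right space number, and the width follows from the table.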

-Chris

Chris,
Extending LLVM IR to support PointerTypes that take an address space is
what I was hoping to avoid. However, if we want to do things right, that
is probably the way to go. Now that we got here, let me write some of my
thoughts on this and solicit your input:

--- 1) Syntax extension:
In our existing compiler for 8-bit microcontrollers, we have introduced
rom and ram qualifiers (with ram being the default one) that can be
applied to any type for example:
rom int a; //integer in program memory
rom int *a; //ram pointer to integer in rom
int * rom a; //rom pointer to integer in ram
rom int * rom a; //rom pointer to integer in rom
Is something similar to the above what you also envision?

--- 2) Automatic pointers:
This is what we don't have in our existing compiler, but many people are
asking for it. Would it be possible in LLVM to treat pointers as general
all the way to code generation, and then decide its Address Space based
on the following criteria? (we should be able to do so in an LLVM pass
because at code generation time we have the full view of the program)
-- a) Address Space of the pointer is the Address Space of the variable
whose address it takes
eg: ptr = &var; //AddSp of ptr becomes AddSp of var
-- b) Address Space of the pointer is the Address Space of the pointer
assigned to it
eg: ptr1 = ptr2; //AddSp of ptr1 becomes AddSp of ptr2
-- c) Conflicts inside functions are not resolvable and should generate
diagnostic.
eg:
void f(void){
    generalPtr = romPtr;
    //some code
    generalPtr = ramPtr; // non resolvable conflict
}
-- d) Conflicts at the function interface will spawn a new function
eg:
void inc(int *a){
    (*a)++;
}
void g(void){
    inc(romPointer); // this will spawn an inc with rom pointer
    inc(ramPointer); // this will spawn an inc with ram pointer
}

In the case of (2) we still need rom and ram qualifiers to declare
variables in the intended Address Space, however the impact on the front
end will probably be reduced.
A combination of (1) and (2) would probably be ideal.
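Rules (a) through (c) above can be sketched as a very small inference table. This is a simplified illustration under stated assumptions, not a proposed LLVM pass: SpaceInference, assign and copy are made-up names, assignments are processed in program order, and a real implementation would have to iterate to a fixed point and handle control flow.

```cpp
#include <map>
#include <string>
#include <utility>

enum class Space { Ram, Rom };

class SpaceInference {
  std::map<std::string, Space> Known; // pointer name -> inferred space

public:
  // Rule (a): Ptr = &var, where var lives in space S.
  // Returns false on a rule (c) conflict, where a diagnostic would be issued.
  bool assign(const std::string &Ptr, Space S) {
    auto Result = Known.insert(std::make_pair(Ptr, S));
    // Either this is the first fact about Ptr, or it must agree with it.
    return Result.second || Result.first->second == S;
  }

  // Rule (b): Dst = Src, so Dst takes the space already inferred for Src.
  // If nothing is known about Src yet, this sketch records nothing.
  bool copy(const std::string &Dst, const std::string &Src) {
    auto It = Known.find(Src);
    if (It == Known.end())
      return true;
    return assign(Dst, It->second);
  }
};
```

Replaying the function f above, copy("generalPtr", "romPtr") succeeds and the later copy("generalPtr", "ramPtr") returns false, which is the non-resolvable conflict of rule (c).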

Regards,
Ali.

You may want to take a look at the Embedded C specification draft on named memory spaces. Having the IR be able to support the standard would probably be a “good thing”.

See section 5.1
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf

Chris,
Extending LLVM IR to support PointerTypes that take an address space is
what I was hoping to avoid. However, if we want to do things right, that
is probably the way to go. Now that we got here, let me write some of my
thoughts on this and solicit your input:

Okay, I agree that it's the right way to go. Also, being able to eventually support the Embedded C specification, as Christopher points out, seems very useful :).

--- 1) Syntax extension:
In our existing compiler for 8-bit microcontrollers, we have introduced
rom and ram qualifiers (with ram being the default one) that can be
applied to any type for example:
rom int a; //integer in program memory
rom int *a; //ram pointer to integer in rom
int * rom a; //rom pointer to integer in ram
rom int * rom a; //rom pointer to integer in rom
Is something similar to the above what you also envision?

As far as C syntax goes, I have no preference. I think that following Embedded C makes the most sense.

--- 2) Automatic pointers:
This is what we don't have in our existing compiler, but many people are
asking for it. Would it be possible in LLVM to treat pointers as general
all the way to code generation, and then decide its Address Space based
on the following criteria? (we should be able to do so in an LLVM pass
because at code generation time we have the full view of the program)
-- a) Address Space of the pointer is the Address Space of the variable
eg: ptr = &var; //AddSp of ptr becomes AddSp of var
-- b) Address Space of the pointer is the address Space of the pointer
eg: ptr1 = ptr2; //AddSp of ptr1 becomes AddSp of ptr2
-- c) Conflicts inside functions are not resolvable and should generate
diagnostic.
eg:
void f(void){
    generalPtr = romPtr;
    //some code
    generalPtr = ramPtr; // non resolvable conflict
}

This basically amounts to type inference. If you want this, it would have to be implemented in the front-end, not at the LLVM level (you lose too much information to give useful error reports etc).

Type inference is very nice, but it is not in the spirit of C at all. C is very explicit (to a fault perhaps).

-- d) Conflicts at the function interface will spawn a new function
eg:
void inc(int *a){
    (*a)++;
}
void g(void){
    inc(romPointer); // this will spawn an inc with rom pointer
    inc(ramPointer); // this will spawn an inc with ram pointer
}

In the case of (2) we still need rom and ram qualifiers to declare
variables in the intended Address Space, however the impact on the front
end will probably be reduced.
A combination of (1) and (2) would probably be ideal.

This again is a front-end issue. It sounds like you want generic functions à la C++ templates. If you go down this path, you are basically designing your own C-like language rather than doing a simple C extension (which is what Embedded C is).
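For comparison, this is roughly what "one function per address space" looks like when written with actual C++ templates. It is only an analogy for the behaviour being asked for: RamSpace, RomSpace, SpacePtr, load and spaceOf are invented names, and on a host machine both tags refer to ordinary memory.

```cpp
#include <string>

// Tag types standing in for the two memory spaces (names are made up).
struct RamSpace {};
struct RomSpace {};

// A pointer wrapper that carries its address space in the type.
template <typename T, typename Space> struct SpacePtr {
  T *Raw;
};

// Maps a tag type to a printable name, one specialization per space.
template <typename Space> std::string spaceName();
template <> inline std::string spaceName<RamSpace>() { return "ram"; }
template <> inline std::string spaceName<RomSpace>() { return "rom"; }

// One generic definition; the compiler instantiates a separate function
// for each address space it is called with.
template <typename T, typename Space> T load(SpacePtr<T, Space> P) {
  return *P.Raw;
}

// Reports which instantiation a given pointer would select.
template <typename T, typename Space>
std::string spaceOf(SpacePtr<T, Space>) {
  return spaceName<Space>();
}
```

Calling load with a SpacePtr<int, RomSpace> and with a SpacePtr<int, RamSpace> produces two distinct functions, which is the "spawning" behaviour described in (d), but here it is requested explicitly through the type rather than inferred.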

Regardless of whether you choose to make your own language or use Embedded C, the LLVM support should be the same though.

-Chris

Thank you Chris and Christopher,
I agree... the Embedded C Language Extensions report provides a good
foundation to build on, and what it proposes as far as Address Space is
probably a super set of what we have in our existing compiler (and
probably would like to keep) so no conflict there. I also agree that
regardless of how we would like to deal with pointers, the same
extensions must be applied to LLVM.
I think it all boils down to whether you think it is time to incorporate
these extensions into LLVM IR and how long do you think it will take to
do so?

Regards
Ali.

I think it all boils down to whether you think it is time to incorporate
these extensions into LLVM IR and how long do you think it will take to
do so?

Sure, any time is good. The reason we don't have this now is primarily that no one has stepped up to contribute it. If you'd like to start this, I'd be happy to help with the design issues.

-Chris

Cool,
Let me list the things that I can think of for now :
(Please feel free to add/modify/eliminate/prioritize/etc)

1) I am still reading the LLVM docs; could you give me pointers to the
material most relevant to this discussion so I can get a faster start?
2) As far as the Embedded C language extensions go, for now I am most
interested in address spaces, and after that in fixed point
and I/O registers.
3) As far as things related to the Address Space, the ISO report leaves
some stuff to the implementation; we have to decide what we want to do
about them:
Eg:
--how many Address Spaces do we want to support in the LLVM IR
--what address space names do we want to add to the front-end
--how best to model nested address spaces (if we want to support this in
IR)
--do we want to provide some kind of support (type inference stuff) for
things like Automatic pointers that I mentioned earlier (in case
someone wants to use them). I agree that this is against C language
fundamentals; however, people are asking for them more and more, and some
compilers are actually trying to support them in weird ways. I think we
can handle them in LLVM much better than others do. But again, this
is probably at the bottom of our list anyway!

A.

My $0.02

Eg:
--how many Address Spaces do we want to support in the LLVM IR
--what address space names do we want to add to the front-end

I think that the name of the address spaces will need to be target specific, based on the target's front end. I'd guess that these would get mapped into a set of numbered address spaces that LLVM supports in the IR (and the target specific code generator). The number of address spaces that LLVM supports in the IR is mainly an encoding efficiency issue. I'd start with only a few (4 perhaps), as I believe that it should not be difficult to expand the number supported if a target appeared that needed them.
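A minimal sketch of that mapping, assuming the front-end hands out numbers on first use and the IR caps the count at four as suggested above. AddressSpaceTable and numberFor are hypothetical names, and the first-come numbering policy is an assumption for illustration only.

```cpp
#include <map>
#include <stdexcept>
#include <string>

class AddressSpaceTable {
  static const unsigned MaxSpaces = 4; // the "start with a few" limit
  std::map<std::string, unsigned> NumberByName;

public:
  // Returns the number for a target-specific name, assigning the next free
  // number the first time a name is seen.
  unsigned numberFor(const std::string &Name) {
    auto It = NumberByName.find(Name);
    if (It != NumberByName.end())
      return It->second;
    if (NumberByName.size() >= MaxSpaces)
      throw std::length_error("too many address spaces for this encoding");
    unsigned Next = static_cast<unsigned>(NumberByName.size());
    NumberByName[Name] = Next;
    return Next;
  }
};
```

The names stay in the front-end; only the small integers would need to appear in the IR and the code generator, which is why raising the limit later should be mostly an encoding change.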

--how best to model nested address spaces (if we want to support this in
IR)

Supporting nested address spaces is simply a matter of constraining transformations that may be applied. I think that this is mostly a front end issue (correct me if I'm wrong Chris), and if there are parts that need to be dealt with in LLVM they can be added after basic address space support is available.

--do we want to provide some kind of support (type inference stuff) for
things like Automatic pointers that I mentioned earlier (in case
someone wants to use them). I agree that this is against C language
fundamentals; however, people are asking for them more and more, and some
compilers are actually trying to support them in weird ways. I think we
can handle them in LLVM much better than others do. But again, this
is probably at the bottom of our list anyway!

I think that this is something that can be tackled after the basic address space support is in place.

Here are some questions/suggestions that might help guide where to look through the code base and think about the design:
* Where does the address space information need to be stored in the IR?
  Globals, function parameters that are pointers, alloca's, malloc's, GEP's?
* What changes are required so that the address space info is preserved in the IR by existing passes?
* Where is the address space information consumed in the back end?
  My guess is instruction selection, which means that the DAG node form of LD/ST will need to carry address space information.
* What changes are required so that the address space info is preserved in the DAG nodes given existing transformations?

Perhaps take a look at how other pointer attributes (volatile/noalias) weave their way through the data flow to get an idea of how these further attributes might be handled.

Chris can most likely help answer those questions above and probably issues I haven't thought of as well =)

We'll have to introduce local load / store instructions. These look just like normal load / store except for the additional address space index bit. Alternatively, just add the address space information to load / store. I can see advantages to either approach.
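The second alternative (keeping one load node and adding an address space field to it) might look like the following toy sketch. Nothing here is an existing LLVM structure: ToyLoad and selectLoad are invented, the numbering of spaces is assumed, and the mnemonics are placeholders rather than real PIC instructions.

```cpp
#include <string>

// Toy load node: the usual attributes plus an address space number.
struct ToyLoad {
  unsigned AddressSpace; // 0 = data memory, 1 = program memory (assumed)
  bool IsVolatile;       // carried along like any other load attribute
};

// Toy instruction selection: the address space picks the machine opcode.
// The mnemonics below are placeholders, not real instructions.
std::string selectLoad(const ToyLoad &L) {
  switch (L.AddressSpace) {
  case 0:
    return "LOAD_DATA";
  case 1:
    return "LOAD_PROG";
  default:
    return "UNSUPPORTED";
  }
}
```

With separate opcodes instead, the switch would move from instruction selection to wherever the node is created; either way the address space has to survive every transformation between those two points.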

Evan