Does the current LLVM target-independent code generator support my strange chip?

I have a very strange and complicated H/W platform.
It has many registers in one format.
The register format is:

Why not model each channel as a separate physical register?

Evan

Because each channel is 24 bits wide. So... what is the
llvm::SimpleValueType I should use for each channel?
The current llvm::SimpleValueType includes i1, i8, i16, i32, i64, f32,
f64, and f80; none of them fits one channel (24 bits).

I think I can use i32 or f32 to represent each 24-bit channel; if the
runtime result of some machine instruction exceeds 23 bits (1 bit is
for the sign), then it is an overflow.
Is it correct to claim that the programmer needs to revise his
program to fix this problem?
Am I right or wrong about this thought?
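A minimal sketch of the overflow concern, assuming the 24-bit integer registers use two's complement: if LLVM computes a value as i32, the hardware register ends up holding only the low 24 bits, sign-extended from bit 23, and the result has overflowed exactly when that differs from the i32 value. The names `wrap24` and `overflows24` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* What a 24-bit two's-complement register would hold after a computation
   that LLVM performed as i32: keep bits 0..23 and sign-extend from bit 23. */
static int32_t wrap24(int32_t v) {
    int32_t low = (int32_t)((uint32_t)v & 0xFFFFFFu); /* 0 .. 2^24 - 1 */
    return (low & 0x800000) ? low - 0x1000000 : low;  /* bit 23 set: negative */
}

/* True if the i32 result does not survive the trip through a 24-bit
   register, i.e. it lies outside [-2^23, 2^23 - 1]. */
static bool overflows24(int32_t v) {
    return wrap24(v) != v;
}
```

For example, 4000000 + 5000000 fits comfortably in an i32 but wraps to -7777216 in a 24-bit register, so code that is correct on a 32-bit machine silently changes meaning.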

Suppose there is a chip whose registers are 24 bits long, and you have
to compile C/C++ programs for it.
How would you represent the following statement?

int a = 3;
(Programmers assume sizeof(int) == 4.)

Wei.

This is similar to ATI's R300/R420 pixel shaders. I'm familiar with this hardware, but not really an LLVM expert (working on a code generator myself, but learning as I go).

Do you have 24-bit integer operations, or just floating point?

What about load/store?

Are you looking to run large C programs with complex data structures, or just comparatively simple math functions (i.e. a compute "kernel")?

If you only want to support programs that can live entirely within registers, you can custom handle the conversion of the integer/float constants that LLVM spits out, and i32/f32 sounds like a good place to start - LLVM's mem2reg pass and inlining are very effective at getting rid of the majority of stack operations, and I'd assume you'd have intrinsics for I/O.

If you want to support memory operations, your integers need to support the addressing range correctly - you effectively have 17 bits of mantissa - so it may be a tight squeeze without 24 bit integer ops (shifts and ands and stuff will also be painful, but that's a more expansive topic).
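To make the "17 bits of mantissa" point concrete: assuming the 24-bit float stores 16 mantissa bits plus an implicit leading 1 (17 significant bits), every integer up to 2^17 = 131072 is exact, but 131073 is the first one that is not. The sketch below checks exactness for an arbitrary number of significant bits, ignoring exponent range; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if integer n can be stored exactly in a binary floating-point format
   with `sig_bits` significant bits (stored mantissa + implicit leading 1).
   Trailing zero bits can be absorbed by the exponent, so only the odd part
   of |n| has to fit in the significand. Exponent range is ignored. */
static bool exact_in_float(int64_t n, int sig_bits) {
    uint64_t m = n < 0 ? 0 - (uint64_t)n : (uint64_t)n;  /* |n| without UB */
    if (m == 0)
        return true;
    while ((m & 1u) == 0)
        m >>= 1;                     /* strip trailing zeros */
    return m < ((uint64_t)1 << sig_bits);
}
```

With `sig_bits = 24` (IEEE f32) the same check moves the limit to 2^24, which is why f32 looks roomy during development while the real hardware runs out of exact integers (and therefore exact addresses) much sooner.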

Dan

24 bit is not unusual in the DSP world. I suppose int == 24 bit integer for some of these chips?

There isn't an i24 simple type. However, you can create an extended integer type. See getExtendedIntegerVT. It's almost guaranteed you will have to change a chunk of target-independent codegen to support the use of an extended type, though.

Evan

Do you mean MVT::getIntegerVT? I cannot find
getExtendedIntegerVT in the LLVM source code.
I am excited to see this function; however, I have a few more
questions.

1) You mentioned that I will have to change a non-trivial amount of
target-independent codegen code to support this extended type.
Is there any document that describes how to make this kind of
modification? I see there is an "Extending LLVM" document on the
official website, but I don't know whether the information in its
"Adding a new SelectionDAG node" section (although it's quite brief)
is what I need. If not, where can I get more information about this
topic?

2) What will go wrong if I use MVT::i32 or MVT::f32 to represent such
a 24-bit register? Will the LLVM optimization passes produce wrong
code or other really bad things? Or will they just produce code that
overflows in situations where it should not?

I am pretty new to the LLVM world and would like to get more help
from you experts. Thanks.

Wei.

I have 24-bit integer operations as well as 24-bit floating point
(s7.16) operations.

The H/W supports load/store instructions; however, they do suggest
that we not use these load/store instructions except for debugging.
That is to say, you can imagine that we don't have load/store
instructions and don't have memory; we just have registers.

I will run OpenGL Shading Language programs on this chip.

About your comments, I (a new LLVM user) have some more questions:

1) You mentioned "custom handle the conversion of the integer/float
constants that LLVM spits out". Does that mean I have to register a
callback function that runs when LLVM wants to spit out a constant
value to memory? But what about a non-constant value? For example:
int a;
and LLVM wants to put a into memory.
Also, I don't really understand what "i32/f32 sounds a good place to
start" means...

2) I don't understand why you mentioned "I'd assume you'd have
intrinsics for I/O."

3) I don't think I follow the following statement:

If you want to support memory operations, your integers need to
support the addressing range correctly - you effectively have 17 bits
of mantissa - so it may be a tight squeeze without 24 bit integer ops
(shifts and ands and stuff will also be painful, but that's a more
expansive topic).

Can you give an example?

Thanks a lot for your comments.

Wei.

I have 24-bit integer operations as well as 24-bit floating point
(s7.16) operations.

The H/W supports load/store instructions; however, they do suggest
that we not use these load/store instructions except for debugging.
That is to say, you can imagine that we don’t have load/store
instructions and don’t have memory; we just have registers.

I will run OpenGL Shading Language programs on this chip.

GLSL doesn’t have pointers, so there are no “generic” loads and stores, which simplifies things.

Unextended GLSL only requires support for integers in the 16 bit range, and has no bitwise operations. It also doesn’t specify integer overflow behavior in any way.

The machines I worked with didn’t support any integer ops, but GLSL let us get by with “emulated” 16 bit integers (storing and operating on them as floating point; divides required truncation after the op - that sort of thing).

Since you have 24 bit integer operations, you’re in better shape.

About your comments, I (a new LLVM user) have some more questions:

1) You mentioned “custom handle the conversion of the integer/float
constants that LLVM spits out”. Does that mean I have to register a
callback function that runs when LLVM wants to spit out a constant
value to memory? But what about a non-constant value?

What I mean is that you can probably get away with LLVM working with float literals as f32, then converting them to your 24 bit format during code gen. The specifics depend on how you want to handle constants in your backend: literals in instructions or a constant pool are the options I know of. For now, I’m using special “load literal” instructions, but a constant pool may be more appropriate in the long run. I’m still learning.
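A minimal sketch of that code-gen-time literal conversion, f32 to a 24-bit float encoding. It assumes (not confirmed by the thread) that "s7.16" means 1 sign bit, a 7-bit exponent with bias 63, and a 16-bit mantissa with an implicit leading 1, with no denormal, Inf, or NaN encodings; check the hardware manual before relying on any of that. The function name is hypothetical, and the result could go into an instruction literal or a constant pool.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed fp24 layout: bit 23 = sign, bits 22..16 = exponent (bias 63),
   bits 15..0 = mantissa. Returns false if f has no fp24 encoding. */
static bool f32_to_fp24(float f, uint32_t *out) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* IEEE-754 bit pattern */

    uint32_t sign = bits >> 31;
    int32_t  exp8 = (int32_t)((bits >> 23) & 0xFFu);
    uint32_t mant = bits & 0x7FFFFFu;

    if (exp8 == 0xFF)                          /* Inf/NaN: not representable */
        return false;
    if (exp8 == 0) {                           /* zero or f32 denormal */
        *out = sign << 23;                     /* flush to signed zero */
        return true;
    }

    int32_t e = exp8 - 127 + 63;               /* rebias 8-bit -> 7-bit exponent */

    /* Drop the low 7 of 23 mantissa bits, rounding to nearest even. */
    uint32_t m16 = mant >> 7;
    uint32_t rem = mant & 0x7Fu;
    if (rem > 0x40u || (rem == 0x40u && (m16 & 1u)))
        m16++;
    if (m16 == 0x10000u) {                     /* rounding carried into exponent */
        m16 = 0;
        e++;
    }

    if (e > 127)                               /* too large for 7 exponent bits */
        return false;
    if (e <= 0) {                              /* too small: flush to zero */
        *out = sign << 23;
        return true;
    }
    *out = (sign << 23) | ((uint32_t)e << 16) | m16;
    return true;
}
```

Flushing tiny values to zero and rejecting huge ones is just one policy; a backend could equally report a diagnostic for any literal that loses precision.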

Integers too: let LLVM work with i32 internally, and convert literals during code gen.

Since GLSL doesn’t require load/store, and it sounds like your HW may not be 100% reliable for these ops, you want to make sure your code stays in registers.

I assume you’ll be starting with the reference GLSL parser (from 3DLabs, IIRC - I don’t even know if they still exist, actually) and having it generate LLVM IR (has anybody done this before?). This will give you much more control over the code - Clang is the front end for the project I’m working on, and it generates stack-based code; most of the stack operations get optimized out by inlining and the mem2reg pass, but not everything.

For example:
int a;
and LLVM wants to put a into memory.

Also, I don’t really understand what “i32/f32 sounds a good place to
start” means…

I mean that having your registers declared as i32 + f32 will probably work out well, especially since you don’t have pointers in your language.

The issue would be that LLVM would want to store register values as 32 bits - and do all the pointer math that way. Depending on how your HW works, this may or may not be okay. Even then, you might be able to patch it up if you really needed to store your registers 3-byte aligned.

Fortunately, this is not an issue with GLSL.

2) I don’t understand why you mentioned “I’d assume you’d have
intrinsics for I/O.”

For GLSL, you have to have some way of reading attributes and uniforms, exporting to/reading from varyings, etc.

Different GPUs do things differently of course: in some cases, it’s a matter of certain GPRs being initialized by “fixed function” HW with input values at the start of the shader and certain GPRs being left with output values at the end of the shader. Other GPUs require explicit “export” instructions, perhaps just reads/writes to dedicated I/O registers. Some have a mix (this is the case for HW I’ve worked with).

If you have export instructions, or even special I/O registers, I was thinking that they could be represented or accessed by target-specific ops (intrinsics). You’d have the GLSL front end generate these intrinsic operations.

I haven’t had to work with register constraints in LLVM, so I’m not sure what would be best approach if I/O is done through specific GPRs: you don’t want to reserve those registers for I/O only… it would take some exploration.

3) I don’t think I follow the following statement:

If you want to support memory operations, your integers need to
support the addressing range correctly - you effectively have 17 bits
of mantissa - so it may be a tight squeeze without 24 bit integer ops
(shifts and ands and stuff will also be painful, but that’s a more
expansive topic).

Can you give an example?

Sorry, I was “thinking out loud”.

I made the assumption here that you didn’t have 24 bit integer ops, and that you might try to represent pointers as integers in a single 24 bit float value (maybe with a 1D texture as your addressable memory). In that case, you’d have a very limited range.

But GLSL doesn’t have pointers, so this isn’t an issue (and 24 bit integers gives you a decent addressing range for debugging).

Dan

The machines I worked with didn't support any integer ops, but GLSL
let us get by with "emulated" 16 bit integers (storing and operating
on them as floating point; divides required truncation after the op -
that sort of thing).

My platform does support integer operations, but only integer +, -,
and *, not /. The documentation says that if I need to do integer
division, I have to convert the operands to floating point first.
Hence, I have similar problems.
So...
Does your method mean that you write code in your 'frontend' to emit
LLVM IR that converts the integer to floating point first, performs
the operation, and then converts the result back to an integer?
Or do you write such code in your 'backend'?

Whatever your answer is, I think the 'frontend' approach is cleaner
than the 'backend' approach (isn't the 'backend' approach more like a
hack?). Am I right? Or does writing such a mechanism in the backend
have other advantages?

What I mean is that you can probably get away with LLVM working with
float literals as f32, then converting them to your 24 bit format
during code gen.

I think I got you here.

Integers too: let LLVM work with i32 internally, and convert literals
during code gen.

Huh.. I think I got you here, too.
But I don't know how you would handle integer constants wider than
24 bits.
For example, if I see the following instruction during code gen:

int %a, add int %b, int 0x12345678

Do I have to emit machine instructions similar to the following?

int %a, add int %b, int 0x5678
int %c, add int %d, int 0x1234
int %e, add int %c, 1 <--- depends on the result of the first addition

However, this means the backend has to remember that register %a now
stores the low bytes of the result and register %c stores the high
bytes of the result. This tracking is not an easy job, I think.
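A minimal sketch of the split add described above, assuming each 32-bit value is kept as two 16-bit halves, one half per 24-bit register. The low-half sum is at most 17 bits, so it fits in a 24-bit register and its bit 16 is the carry into the high half; no separate flag is needed. For power-of-two register widths, LLVM's type legalizer already performs this kind of expansion (splitting one value into a low and a high part) automatically. The type and function names here are hypothetical.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves, each in its own 24-bit register. */
typedef struct { uint32_t lo, hi; } halves32;   /* each field < 2^16 */

static halves32 to_halves(uint32_t v) {
    halves32 s = { v & 0xFFFFu, v >> 16 };
    return s;
}

static uint32_t from_halves(halves32 s) {
    return (s.hi << 16) | s.lo;
}

/* 32-bit add using only sums that fit in 24-bit registers. */
static halves32 add_halves(halves32 a, halves32 b) {
    uint32_t lo = a.lo + b.lo;            /* <= 0x1FFFE: fits in 24 bits */
    uint32_t carry = lo >> 16;            /* bit 16 is the carry (0 or 1) */
    uint32_t hi = a.hi + b.hi + carry;    /* <= 0x1FFFF: fits in 24 bits */
    halves32 r = { lo & 0xFFFFu, hi & 0xFFFFu };  /* wrap like i32 */
    return r;
}
```

With `b = 0xFFFF` and the constant 0x12345678, the low sum is 0x15677, so the carry is 1 and the result is 0x12355677.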

I assume you'll be starting with the reference GLSL parser (from
3DLabs, IIRC - I don't even know if they still exist, actually)

You can find the 3Dlabs frontend here:
http://l4.me.uk/static/glsl/

And I don't think anyone has ported this frontend onto LLVM before.

The issue would be that LLVM would want to store register values as 32
bits - and do all the pointer math that way.

I don't really get you here.
Why would LLVM do all the pointer math in 32 bits just because I store
register values as 32 bits?

I haven't had to work with register constraints in LLVM, so I'm not
sure what would be best approach if I/O is done through specific GPRs:
you don't want to reserve those registers for I/O only.... it would
take some exploration.

Unfortunately, my platform does use GPRs for input/output.
My current thought is to count the attributes/varyings used in a
shader and reserve the same number of GPRs for those
attributes/varyings ONLY. Since I have NO memory to spill registers
to, I think there is not much room for register allocation anyway.
The method I might use is to INLINE all functions and then perform
register allocation. This strategy is bad, of course; can you think
of a better solution?

Wei.

Let me clarify: I haven't used LLVM for GLSL. I'm also relatively new to LLVM, which I'm using to target a modern GPU. My GLSL work was back in the timeframe of AMD's R300/R400 series, which was 4 years ago.

The machines I worked with didn't support any integer ops, but GLSL
let us get by with "emulated" 16 bit integers (storing and operating
on them as floating point; divides required truncation after the op -
that sort of thing).

My platform does support integer operations, but only integer +, -,
and *, not /. The documentation says that if I need to do integer
division, I have to convert the operands to floating point first.
Hence, I have similar problems.

So...
Does your method mean that you write code in your 'frontend' to emit
LLVM IR that converts the integer to floating point first, performs
the operation, and then converts the result back to an integer?
Or do you write such code in your 'backend'?

Whatever your answer is, I think the 'frontend' approach is cleaner
than the 'backend' approach (isn't the 'backend' approach more like a
hack?). Am I right? Or does writing such a mechanism in the backend
have other advantages?

IMHO I don't think of the backend approach as a hack:

Minimizing the dependencies of the frontend on the target is generally a good thing, assuming you'll possibly be targeting different HW in the future.

The backend approach means that integer division is a fairly long code sequence: that's just fine within LLVM.
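A minimal sketch of what such an expansion might compute, written in C for clarity rather than as an actual LLVM lowering. It assumes the hardware has integer +, -, * and a floating-point reciprocal, that operands stay within the 16-bit range GLSL requires, and that `b` is non-zero; `float` stands in for the hardware's floating-point type, and `sdiv_via_float` is a hypothetical name.

```c
#include <stdint.h>

/* Signed integer division (truncating toward zero, as in C) built from
   integer +, -, * and a floating-point reciprocal. Assumes |a|, |b| < 2^16
   and b != 0. */
static int32_t sdiv_via_float(int32_t a, int32_t b) {
    int32_t ua = a < 0 ? -a : a;
    int32_t ub = b < 0 ? -b : b;

    /* Estimate |a| / |b| in floating point, then truncate to an integer. */
    float rcp = 1.0f / (float)ub;           /* stands in for a RCP instruction */
    int32_t q = (int32_t)((float)ua * rcp);

    /* The estimate may be slightly off; correct it with exact integer
       multiply/subtract until 0 <= remainder < |b|. */
    int32_t r = ua - q * ub;
    while (r < 0)   { q--; r += ub; }
    while (r >= ub) { q++; r -= ub; }

    return ((a < 0) != (b < 0)) ? -q : q;   /* restore the sign */
}
```

The integer fix-up is what makes the lower-precision float estimate safe: the hardware's reciprocal only has to be close, not exact.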

What I mean is that you can probably get away with LLVM working with
float literals as f32, then converting them to your 24 bit format
during code gen.

I think I got you here.

Integers too: let LLVM work with i32 internally, and convert literals
during code gen.

Huh.. I think I got you here, too.
But I don't know how you would handle integer constants wider than
24 bits.
For example, if I see the following instruction during code gen:

int %a, add int %b, int 0x12345678

Do I have to emit machine instructions similar to the following?

int %a, add int %b, int 0x5678
int %c, add int %d, int 0x1234
int %e, add int %c, 1 <--- depends on the result of the first addition

However, this means the backend has to remember that register %a now
stores the low bytes of the result and register %c stores the high
bytes of the result. This tracking is not an easy job, I think.

Unextended GLSL doesn't require support for integers larger than 16 bits.

I assume you'll be starting with the reference GLSL parser (from
3DLabs, IIRC - I don't even know if they still exist, actually)

You can find the 3Dlabs frontend here:
http://l4.me.uk/static/glsl/

And I don't think anyone has ported this frontend onto LLVM before.

The issue would be that LLVM would want to store register values as 32
bits - and do all the pointer math that way.

I don't really get you here.
Why would LLVM do all the pointer math in 32 bits just because I store
register values as 32 bits?

What I mean is that LLVM would think of your registers as taking 4 bytes in memory, and do all the pointer math that way: multiplying array indexes by 4. This may be fine on your machine, but it seems plausible that you would want 3-byte alignment, and, in that case, you would have to patch things up.
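The difference in address arithmetic is easy to see side by side. A minimal sketch, assuming element-aligned offsets; `patch_offset` is a hypothetical stand-in for whatever rewrite a backend would apply to LLVM's 4-byte-stride address computations if the hardware wanted packed 3-byte elements.

```c
#include <stddef.h>

/* Byte offset of element i when LLVM treats each 24-bit value as i32/f32. */
static size_t offset_as_i32(size_t i) { return i * 4; }

/* Byte offset of element i in a packed 3-byte-per-element layout. */
static size_t offset_packed24(size_t i) { return i * 3; }

/* Convert an element-aligned offset computed with a 4-byte stride into the
   packed 3-byte layout (assumes off4 is a multiple of 4). */
static size_t patch_offset(size_t off4) { return off4 / 4 * 3; }
```

Element 5 lands at byte 20 under the i32 view but at byte 15 in a packed layout, so every index scaling LLVM emits would need this kind of fix-up.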

I haven't had to work with register constraints in LLVM, so I'm not
sure what would be best approach if I/O is done through specific GPRs:
you don't want to reserve those registers for I/O only.... it would
take some exploration.

Unfortunately, my platform does use GPRs for input/output.
My current thought is to count the attributes/varyings used in a
shader and reserve the same number of GPRs for those
attributes/varyings ONLY. Since I have NO memory to spill registers
to, I think there is not much room for register allocation anyway.
The method I might use is to INLINE all functions and then perform
register allocation. This strategy is bad, of course; can you think
of a better solution?

This sounds like a good bringup approach to get you started, both I/O and inlining all functions.
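A minimal sketch of the bookkeeping behind that bring-up scheme, assuming at most 32 GPRs and that the fixed-function hardware dictates which GPR each attribute/varying occupies. In an LLVM backend, registers the allocator must never touch are normally reported from the target's `TargetRegisterInfo::getReservedRegs()` override; the C helpers below (`reserve_io_regs`, `free_gprs`, `NUM_GPRS`) are hypothetical and only model the resulting set.

```c
#include <stdint.h>

#define NUM_GPRS 32   /* assumed register file size */

/* Build a bitmask of GPRs used for shader I/O; bit r set = GPR r reserved. */
static uint32_t reserve_io_regs(const int *io_regs, int n_io) {
    uint32_t reserved = 0;
    for (int i = 0; i < n_io; i++)
        reserved |= UINT32_C(1) << io_regs[i];
    return reserved;
}

/* Number of GPRs left over for the register allocator. */
static int free_gprs(uint32_t reserved) {
    int n = 0;
    for (int r = 0; r < NUM_GPRS; r++)
        if (!(reserved & (UINT32_C(1) << r)))
            n++;
    return n;
}
```

The weakness Dan points out is visible here: an input GPR stays reserved for the whole shader even after its value is dead, which a smarter scheme could reclaim.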

I've been learning LLVM as I go - my suspicion is that LLVM can do better on the I/O question with the right register information - as you learn more, some creative approach will present itself.

Similarly for inlining - calls and returns can be custom handled - maybe there's a way to tie this in to a customized register allocator... As long as your shaders aren't busting out of your instruction limits (or instruction cache size, depending on the HW), inlining is a good thing.

In addition to GLSL, Khronos recently announced OpenCL, which also disallows recursion, in part because stack operations are still very slow on GPUs (small dependent loads/stores aren't great for the huge pipeline). A random non-expert thought: maybe there's some general approach to non-stack-based function calling that could be implemented with a global register allocator and an analysis of the call tree?
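One way that thought could look, as a minimal sketch rather than anything LLVM provides: without recursion the call graph is acyclic, so each function can get a fixed window of registers that starts after the windows of all its transitive callers (the longest path to it in the call DAG). Sibling callees that are never active at the same time can then share registers. All names are hypothetical, and the scheme is conservative because it ignores which caller registers are actually live across each call.

```c
#include <stdbool.h>

#define MAX_FUNCS 16

/* calls[c][f] is true if function c calls function f; regs[f] is the number
   of registers f needs. On success base[f] is the first register of f's
   window and the total register count is returned. Returns -1 if windows
   never settle (recursion with non-empty windows). */
static int assign_windows(int n, bool calls[MAX_FUNCS][MAX_FUNCS],
                          const int regs[MAX_FUNCS], int base[MAX_FUNCS]) {
    for (int f = 0; f < n; f++)
        base[f] = 0;

    /* Longest-path relaxation; on a DAG it stops changing within n passes. */
    for (int pass = 0; pass <= n; pass++) {
        bool changed = false;
        for (int c = 0; c < n; c++)
            for (int f = 0; f < n; f++)
                if (calls[c][f] && base[f] < base[c] + regs[c]) {
                    base[f] = base[c] + regs[c];   /* start after caller */
                    changed = true;
                }
        if (!changed) {
            int total = 0;
            for (int f = 0; f < n; f++)
                if (base[f] + regs[f] > total)
                    total = base[f] + regs[f];
            return total;
        }
    }
    return -1;
}
```

For example, if main (4 registers) calls A (3) and B (5), and both call C (2), then A and B share registers 4 to 8, C starts at register 9, and 11 registers suffice in total.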

Dan

Do you mean MVT::getIntegerVT? I cannot find
getExtendedIntegerVT in the LLVM source code.
I am excited to see this function; however, I have a few more
questions.

See ValueTypes.h and ValueTypes.cpp. Also this example:

@str = internal constant [4 x i8] c"%d\0A\00"

define void @foo2(i24 %a, i24 %b) nounwind {
entry:
  %t1 = add i24 %a, %b
  %t2 = zext i24 %t1 to i32
  %t3 = tail call i32 (i8*, ...)* @printf( i8* getelementptr ([4 x i8]* @str, i32 0, i32 0), i32 %t2 ) nounwind
  ret void
}

declare i32 @printf(i8*, ...) nounwind

You can run llc on it to see how codegen deals with i24.

1) You mentioned that I will have to change a non-trivial amount of
target-independent codegen code to support this extended type.
Is there any document that describes how to make this kind of
modification? I see there is an "Extending LLVM" document on the
official website, but I don't know whether the information in its
"Adding a new SelectionDAG node" section (although it's quite brief)
is what I need. If not, where can I get more information about this
topic?

I am not sure how legalizer and friends deal with i24 / f24 as legal types.

These are potentially useful.
http://llvm.org/docs/WritingAnLLVMBackend.html
http://llvm.org/docs/CodeGenerator.html

But this is advanced stuff, so your best bet is to ask questions here (and on IRC?).

2) What will go wrong if I use MVT::i32 or MVT::f32 to represent such
a 24-bit register? Will the LLVM optimization passes produce wrong
code or other really bad things? Or will they just produce code that
overflows in situations where it should not?

Overflow is going to be a problem. There will probably be more issues to work through.

Evan

Hi,

I am not sure how legalizer and friends deal with i24 / f24 as legal
types.

The type legalizer currently assumes that all legal integer
have a power-of-two number of bits. I don't see any obstacles to
making it more general though. First off, i24 would need to be
added to the list of simple value types. Then the integer promotion
and expansion logic would need to be taught things like this:
i16 promotes to i24, i32 promotes to i48 which is then expanded to
2 x i24. Finally, all of the code would need to be audited to see
if it assumes that types promoted to / expanded to (or from) are
powers of two in length. Most of it probably doesn't assume any
such thing, fortunately.
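The promotion/expansion rule described above can be written down in a few lines. This is a hypothetical model of the generalized rule (i16 -> i24, i32 -> i48 -> 2 x i24), not LLVM's legalizer code: round the width up to a multiple of the legal width, then split it into that many legal-sized parts.

```c
/* Model of integer type legalization for a target whose only legal integer
   type is LEGAL_BITS wide (24 here). Not LLVM's actual implementation. */
#define LEGAL_BITS 24

typedef struct {
    int promoted_bits;   /* width after promotion (multiple of LEGAL_BITS) */
    int parts;           /* number of LEGAL_BITS-wide registers after expansion */
} legal_action;

static legal_action legalize_int(int bits) {
    legal_action a;
    a.parts = (bits + LEGAL_BITS - 1) / LEGAL_BITS;   /* round up */
    a.promoted_bits = a.parts * LEGAL_BITS;
    return a;
}
```

Under this rule i64 becomes i72, expanded to 3 x i24, which is exactly the kind of non-power-of-two result the legalizer would need to be audited for.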

Once types are legal, there's still the problem of making sure
everything else works fine with i24. One obvious problem is
that (a bit like x86 long double) it isn't naturally aligned.
Presumably if you store two i24's then the second is stored
4 bytes after the first? Dunno how many places in the code
generator make assumptions about this kind of thing.
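A minimal sketch of the size arithmetic behind that alignment question. LLVM's `DataLayout` distinguishes a type's store size (`getTypeStoreSize`: bytes actually written by a store) from its alloc size (`getTypeAllocSize`: store size rounded up to the ABI alignment, which is also the array stride). The helpers below only model that arithmetic; the 4-byte alignment chosen for i24 is an assumption for illustration.

```c
#include <stdint.h>

/* Round n up to a multiple of align (align > 0). */
static uint64_t align_to(uint64_t n, uint64_t align) {
    return (n + align - 1) / align * align;
}

/* Bytes written by a store of a `bits`-wide value. */
static uint64_t store_size(uint64_t bits) { return (bits + 7) / 8; }

/* Bytes between consecutive array elements, given the ABI alignment. */
static uint64_t alloc_size(uint64_t bits, uint64_t abi_align) {
    return align_to(store_size(bits), abi_align);
}
```

With 4-byte alignment an i24 stores 3 bytes but occupies 4, so a second i24 starts 4 bytes after the first; this is the same situation as x86 long double, which stores 10 bytes but occupies 16 on x86-64.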

Ciao,

Duncan.

Hi,

perhaps a little bit off topic, but I read 'OpenCL':

OpenCL is very often mentioned alongside LLVM and Clang. Is it possible to use OpenCL with LLVM/Clang (I mean the official repository) yet? Or is there a schedule showing when we will see OpenCL support in LLVM/Clang?

Thanks,
Nico

O...k... Let me try to draw some conclusions:

1) The conversion from f32 to f24 or i32 to i24 should be written in
the backend.

This is because we should not put any hardware-dependent behavior in
the frontend. If we later change our H/W platform to one that supports
f32 and i32 natively, we will only need to change the backend code.

For example:
The backend approach means that integer division is a fairly long code
sequence: that's just fine within LLVM.

2) If we use MVT::getIntegerVT() to get an i24 LLVM type, then the
problems will be:
  > The target-independent codegen's legalizer cannot handle it,
because the type legalizer currently assumes that all legal integer
types have a power-of-two number of bits.
  > The target-independent codegen needs to be taught about the i24
type. This might be a lot of code, and I don't know how to modify
LLVM codegen to support this new type. Is there any document
describing this?

Hence, the reasonable approach as far as I know is:

3) Use f32/i32 to represent f24/i24 registers. However, the problems
may be:
  > overflow - I don't know how to solve it in LLVM.
  > Does this approach suffer from any other drawbacks?

Thanks.
Wei.

O...k... Let me try to draw some conclusions:

1) The conversion from f32 to f24 or i32 to i24 should be written in
the backend.

I disagree. This should be handled by the type legalization
infrastructure. After all, that's what it is for! However
there is currently no support for anything like f32 -> f24.
On the other hand, as I mentioned in another email, I think
i32 -> i24 can be done generically.

2) If we use MVT::getIntegerVT() to get an i24 LLVM type, then the
problems will be:
  > The target-independent codegen's legalizer cannot handle it,
because the type legalizer currently assumes that all legal integer
types have a power-of-two number of bits.

If i24 is added as a simple value type, then the type legalizer
can be generalized to handle this without too much difficulty.

Ciao,

Duncan.

I disagree. This should be handled by the type legalization
infrastructure.

Huh...
As far as I know, type legalization happens in the SelectionDAG
phase, which is also part of the backend. Am I right, or am I missing
something?

there is currently no support for anything like f32 -> f24

You say "there is currently no support for anything like f32 -> f24".
Does that mean I cannot write code like the following?

addRegisterClass(MVT::i24, XXXRegisterClass);

If the target-independent codegen supported i24, so that I could write
code like the above, would that mean the LLVM backend codegen can
handle any i32->i24 and f32->f24 conversion for me automatically, so
that I don't need to worry about i32->i24 and f32->f24?

On the other hand, as I mentioned in another email, I think
i32 -> i24 can be done generically.

I don't think I follow you here...
Do you mean using the legalizer to handle this for me, or modifying
the target-independent codegen so that it knows about i24?

Thanks.
Wei.

I think Duncan and I disagree. Generally I would defer to anybody else on this list: my experience is limited to the target-specific part of a backend, with very little poking around in the internals. I'm usually asking, not answering, questions here: the 24 bit floats reminded me of the "good old days" at ATI.

That said, I think you could make f32/i32 work for your purposes, given the limited types and memory operations of unextended GLSL. At a minimum, I think that starting with f32/i32 would give you a chance to learn and understand more about LLVM.

If there are people who are willing to help you add i24/f24 to LLVM's core code base or you have time to learn about LLVM's internals on your own, then adding 24 bit support seems the safer path (if only because it is recommended by people more knowledgeable than myself).

Dan

Daniel,

Many thanks for your recommendations.
I think your method is the easiest way to handle this situation without
modifying LLVM itself.
Thanks for your recommendations again.

Wei.