Proposal: intp type

Kenneth_Uildriks · November 9, 2009, 3:58pm

Simply put, it's a pointer-sized integer. I'm blatantly stealing this
idea from .NET, where IntPtr (a pointer-sized integer) is a basic
type.

In my front end, I had considered just using a pointer for intp
behind-the-scenes and doing conversion to/from int64 when I wanted to
do arithmetic on them, but pointers and integers don't always align
the same way. So what I really want is a way to represent in IR that
a certain parameter/struct member is a pointer-sized integer.

Now that optimizations work without target data, I think we're
tantalizingly close to a situation where useful, target-independent IR
files are a real possibility. intp is one of the remaining pieces of
that puzzle; lots of native code expects an integer parameter/struct
member sized the same as a pointer. It makes sense for sizes and
offsets to be represented this way. In a lot of cases they already
are, and in IR many more would be if such a type existed.

Also, having this type present will tend over time to reduce the
number of casts to/from int64 appearing in bitcode that have to be
stripped out, slightly speeding up optimizations and codegen.

The ramifications that I see:

1. Conversions to/from other integer types: right now, integer type
conversions are always explicitly specified as either a trunc, a sext,
or a zext. Since the size of intp is not known at IR generation time,
you can't know whether a conversion to/from intp truncates or extends.

2. The usual ramifications of adding a new type: IR
generation/analysis, optimizations, and codegen all have to be updated
to deal with the existence of the new type. The changes should be
minor; it's an integer type with all of the supported operations of
integer types, with the only difference being that its size cannot be
determined without target data.
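A minimal sketch of the deferred decision in point 1, written in C only for illustration: once target data supplies the pointer width (assumed here to arrive as a plain bit count), every conversion to or from intp collapses to a no-op, a trunc, or an extension. The enum and function names are hypothetical and are not LLVM API. Signedness still has to be recorded when the IR is generated, because it cannot be recovered from the widths alone.

```c
#include <assert.h>

/* Hypothetical cast kinds. CAST_NOOP stands for a conversion that
   disappears because the two widths turn out to be equal. */
typedef enum { CAST_NOOP, CAST_TRUNC, CAST_SEXT, CAST_ZEXT } cast_kind;

/* Picks the concrete integer cast once both widths are known.
   src_bits/dst_bits: widths in bits (one of them is the target's
   pointer width); is_signed chooses sext over zext when widening. */
static cast_kind resolve_intp_cast(unsigned src_bits, unsigned dst_bits,
                                   int is_signed) {
    if (src_bits == dst_bits)
        return CAST_NOOP;
    if (src_bits > dst_bits)
        return CAST_TRUNC;
    return is_signed ? CAST_SEXT : CAST_ZEXT;
}
```

For example, an i64 to intp conversion resolves to a trunc on a 32-bit target and to a no-op on a 64-bit target.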

Talin1 · November 10, 2009, 6:35am

I've asked for this as well.

A workaround that I have considered, but haven't had time to explore yet, is to actually store such integers as pointers, and then convert them to int64 (with ptrtoint) to do actual math operations and GEPs. While this might sound inefficient on 32-bit platforms, I believe that LLVM's optimizers can notice that you aren't using the upper bits and therefore degrade to the less expensive 32-bit operations.

Clearly this is an ugly hack (especially in the obscurity of the IR code generated), but I haven't come up with anything better so far.
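The workaround above has a direct C analog, sketched here under the assumption that a `void *` field stands in for the pointer-typed IR slot and `uint64_t` for the int64 used for math. The struct and function names are hypothetical. On a 32-bit target, storing a value wider than 32 bits would truncate it, which mirrors what inttoptr would do in IR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record whose pointer-sized integer is kept in a
   pointer-typed field, as in the workaround. */
struct record {
    void *size_slot;
};

/* Store an integer into the pointer-typed slot (int -> pointer). */
static void set_size(struct record *r, uint64_t n) {
    r->size_slot = (void *)(uintptr_t)n;
}

/* Load it back as a 64-bit integer to do arithmetic (pointer -> int). */
static uint64_t get_size(const struct record *r) {
    return (uint64_t)(uintptr_t)r->size_slot;
}
```

Every arithmetic step needs this round trip, which is exactly the clutter in the generated code that the proposal wants to avoid.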

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- Talin

Kenneth_Uildriks · November 10, 2009, 1:59pm

The trouble with that is that the alignment of pointers is not
necessarily the same as the alignment of pointer-sized integers.

me22 · November 10, 2009, 2:10pm

Now that there are arbitrary-sized integers, couldn't you zext to i256
then trunc down again, and later let the folder simplify as
appropriate?
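A minimal sketch of why the widen-then-narrow trick gives the right value, modeled in C with `uint64_t` standing in for i256 (C has no 256-bit integer type). The function name is hypothetical. Zero-extending into a wide temporary and then truncating to the destination width yields the same result as a direct zext or trunc. For signed sources the first step would need to be a sign extension instead.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extends a 32-bit value into a wide temporary, then truncates it
   to dst_bits (1..64). The result equals a direct zext when
   dst_bits >= 32 and a direct trunc when dst_bits < 32. */
static uint64_t zext_then_trunc(uint32_t x, unsigned dst_bits) {
    uint64_t wide = (uint64_t)x;                    /* "zext to i256" */
    if (dst_bits >= 64)
        return wide;
    return wide & ((UINT64_C(1) << dst_bits) - 1);  /* "trunc" */
}
```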

Kenneth_Uildriks · November 10, 2009, 4:24pm

I suppose that would work, but I wouldn't like to see two cast
instructions for every conversion.

Perhaps every conversion to/from intp could be represented as a zext,
whether or not it actually performs an extension. Is there anything
in LLVM that depends on a zext actually increasing the size of the
integer?

Talin1 · November 11, 2009, 12:10am

I realize that most users of LLVM aren’t affected by this, because most frontends aren’t target-neutral, and thus know in advance how big a pointer is. At least, that’s my impression.

In my case, I’ve been attempting to build a target-neutral frontend. In my tool chain, the target is specified at link time, not at compile time. Among other things, that means that the same IR file can be used for multiple targets.

What strikes me is how tantalizingly close LLVM comes to being able to do this. I am surprised, for example, that I can generate all of the DWARF debugging structures without ever having to choose a target machine. Most things can be done quite easily without knowing the exact size of a pointer. When it comes to being able to “generate once, run anywhere”, LLVM is like 99.5% of the way there, which makes that last remaining 0.5% particularly vexing.

There’s only a tiny handful of fairly esoteric cases which require selecting a target before you generate IR. Unfortunately, the “pointer the same size as an int” is one of these rare cases - it is something that is very painful to try and work around. (A similar esoteric use case is: "which of the following two types is larger, 3 x int32 or 2 x {}*?" -- i.e. the union problem.)

Kenneth_Uildriks · November 11, 2009, 12:41am

> In my case, I've been attempting to build a target-neutral frontend. In my
> tool chain, the target is specified at link time, not at compile time. Among
> other things, that means that the same IR file can be used for multiple
> targets.

That's the direction I'm going in too.

I'm willing to spend some time on adding intp to LLVM... my
front-end's standard libraries would be cleaner and more portable that
way.

Kenneth_Uildriks · November 11, 2009, 1:00pm

> (A similar esoteric use case is: "which of the following two types is
> larger, 3 x int32 or 2 x {}*?" -- i.e. the union problem.)

The size of a union can be compiled into a ConstantExpr, i.e.,

(sizeof(T1) > sizeof(T2)) ? sizeof(T1) : sizeof(T2)

This works because sizeof(T1) and sizeof(T2) are themselves
ConstantExprs, and so are icmp(ConstantExpr, ConstantExpr) and
select(ConstantExpr, ConstantExpr, ConstantExpr).

You won't be able to tell which is bigger from your front-end, but
you'll have a ConstantExpr that you can feed to malloc, etc.
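A minimal sketch of the same computation in C, evaluated on the host rather than folded as a ConstantExpr. The max step mirrors the select/icmp expression above and is enough to size a single malloc'd block. The extra rounding to the stricter member alignment is an assumption about C-style union layout; it only matters if the union is then placed in an array or struct. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Size of a two-member union: the larger member size (the select/icmp
   step), padded to the stricter member alignment. */
static size_t union_size(size_t s1, size_t a1, size_t s2, size_t a2) {
    size_t size = s1 > s2 ? s1 : s2;
    size_t align = a1 > a2 ? a1 : a2;
    return round_up(size, align);
}

/* The example from the thread: 3 x int32 vs. 2 x pointer. */
union example {
    int32_t ints[3];
    void *ptrs[2];
};
```

On a typical 64-bit host this gives 16 bytes and on a typical 32-bit host 12 bytes, which is why the answer cannot be written as a literal in target-neutral IR.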

Kenneth_Uildriks · November 11, 2009, 2:22pm

Of course if you try to represent it as an aggregate, rather than a
block of memory, you're stuck again, and for the same reason. An
array type can't use a ConstantExpr for its size... it has to be
specified as a literal integer by the front-end. So passing your
union as a parameter or returning it by value won't work... unions can
*only* live in memory unless you've got target data.

Very interesting problem (but one I don't feel ready to even
high-level-design a solution for yet)...

Gabor_Greif4 · November 11, 2009, 6:20pm

> 1. Conversions to/from other integer types: right now, integer type
> conversions are always explicitly specified as either a trunc, a sext,
> or a zext. Since the size of intp is not known at IR generation time,
> you can't know whether a conversion to/from intp truncates or extends.

> Now that there are arbitrary-sized integers, couldn't you zext to i256
> then trunc down again, and later let the folder simplify as
> appropriate?

This is not correct, because i256 occupies too much space in
structures, etc.

This question came up in the past, and I half-jokingly suggested "i0"
as the integer type that can store a null pointer, and thus every
pointer.

i0 could be a type alias that gets resolved when sizeof(void*) is
first known. As a nice bonus, conversions between i0 and T* could be
omitted.

Just my 2 cents.

Gabor

Chris_Lattner · November 11, 2009, 6:56pm

If you're storing it in a structure, just cast it back to an i8* and store it as i8*.

-Chris

Chris_Lattner · November 11, 2009, 7:11pm

> I realize that most users of LLVM aren't affected by this, because most frontends aren't target-neutral, and thus know in advance how big a pointer is. At least, that's my impression.

I believe that.

> There's only a tiny handful of fairly esoteric cases which require selecting a target before you generate IR. Unfortunately, the "pointer the same size as an int" is one of these rare cases - it is something that is very painful to try and work around. (A similar esoteric use case is: "which of the following two types is larger, 3 x int32 or 2 x {}*?" -- i.e. the union problem.)

With this explanation, the idea of adding a union type seems a lot more compelling to me. For the record, I'm not opposed to an intptr_t type or a union type, but the semantics have to be clean and well specified.

-Chris

Nick_Lewycky · November 11, 2009, 8:10pm

Kenneth Uildriks wrote:

> I'm willing to spend some time on adding intp to LLVM... my
> front-end's standard libraries would be cleaner and more portable that
> way.

Sorry, but I'm still opposed. From your description of 'intp' it sounds like it's a strict subset of pointers. You can't sext it, zext it or trunc it, like you can with integers. You can bitcast it, but only to another pointer.

The use case you mentioned was that some native system APIs want integers that are the same size as pointers. So why not just declare those arguments or fields with a pointer type in LLVM? Then you've got a field with the right size.

Nick

Kenneth_Uildriks · November 11, 2009, 8:22pm

> Sorry, but I'm still opposed. From your description of 'intp' it sounds like
> it's a strict subset of pointers. You can't sext it, zext it or trunc it,
> like you can with integers. You can bitcast it, but only to another pointer.

You can do integer arithmetic & bitwise operations with it. You can
convert it to other types of integers, although you wouldn't be able
to tell whether you were truncating or zexting them at IR-generation
time (at least not without target data). You can create literal
values of intp type. You can, of course, safely convert it to/from a
pointer.

intp is an integer, not a pointer. It's sized the same as a pointer,
so you can use it as a pointer offset, a size parameter, or something
along those lines, without having to know how big a pointer is.
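C's `intptr_t`/`uintptr_t` play roughly the role intp would play in IR, so a short C sketch can show the operations listed above: converting to and from a pointer, arithmetic, bitwise masking, and an integer literal. The function names are hypothetical; `uintptr_t` is optional in the C standard but present on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds an address up to the next multiple of align (a power of two)
   using pointer-sized integer arithmetic and bitwise ops. */
static void *align_up(void *p, uintptr_t align) {
    uintptr_t addr = (uintptr_t)p;               /* pointer -> integer */
    addr = (addr + (align - 1)) & ~(align - 1);  /* add, then mask */
    return (void *)addr;                         /* integer -> pointer */
}

/* Byte distance between two pointers into the same object, as a
   signed pointer-sized integer. */
static intptr_t byte_offset(const char *from, const char *to) {
    return (intptr_t)(to - from);
}
```

None of this code needs to know whether a pointer is 32 or 64 bits wide, which is the property the proposal wants for IR.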

> The use case you mentioned was that some native system APIs want integers
> that are the same size as pointers. So why not just declare those arguments
> or fields with a pointer type in LLVM? Then you've got a field with the
> right size.

But not necessarily the right alignment. Some platforms align
pointers differently from ints.

Nick_Lewycky · November 11, 2009, 8:46pm

Kenneth Uildriks wrote:

> You can do integer arithmetic & bitwise operations with it. You can
> convert it to other types of integers, although you wouldn't be able
> to tell whether you were truncating or zexting them at IR-generation
> time (at least not without target data). You can create literal
> values of intp type. You can, of course, safely convert it to/from a
> pointer.

I'd be happy to permit arithmetic and bitwise operations on pointers. (I thought we already did. We don't.)

You still can't create literal values with it (besides null) because you don't know whether your constant will fit. Or rather, what you get is exactly what the inttoptr instruction already gives you.
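The "will your constant fit" question can only be answered once the pointer width is known. A minimal C sketch of that check, assuming the width arrives as a bit count at link time; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if an unsigned literal fits in an unsigned integer
   of ptr_bits bits (e.g. 32 or 64). */
static int literal_fits(uint64_t value, unsigned ptr_bits) {
    if (ptr_bits >= 64)
        return 1;
    return value <= ((UINT64_C(1) << ptr_bits) - 1);
}
```

A literal such as 0x100000000 fits on a 64-bit target but not on a 32-bit one, so a target-neutral front end could not validate it up front.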

> But not necessarily the right alignment. Some platforms align
> pointers differently from ints.

Within a structure or array you mean? Or do you mean that some platforms pass pointers and integers differently as function arguments?

Nick

Kenneth_Uildriks · November 11, 2009, 8:52pm

> I'd be happy to permit arithmetic and bitwise operations on pointers. (I
> thought we already did. We don't.)

That would be very helpful too.

> You still can't create literal values with it (besides null) because you
> don't know whether your constant will fit. Or rather, what you get is
> exactly what the inttoptr instruction already gives you.
>
> Within a structure or array you mean? Or do you mean that some platforms
> pass pointers and integers differently as function arguments?

I mean within a structure or array. I don't know whether any
platforms would pass them any differently as function arguments, but I
do know that the "default" data layout (which I believe is the Sparc
data layout) aligns int32's on 32-bit boundaries and 32-bit pointers
on 64-bit boundaries.
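Whether a pointer field and a pointer-sized integer field land at the same offset inside a struct can be checked for a given host with a few lines of C. This is only a host-side sketch (the struct and function names are hypothetical); it says nothing about the Sparc-style default layout mentioned above, which is exactly why a target-neutral front end cannot rely on such a check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A pointer-sized field placed after a char, declared once as a
   pointer and once as an integer. */
struct with_ptr { char tag; void *field; };
struct with_int { char tag; intptr_t field; };

/* Nonzero if, on this host, pointers and intptr_t agree in size,
   alignment, and resulting field offset. */
static int ptr_and_int_layout_match(void) {
    return sizeof(void *) == sizeof(intptr_t) &&
           _Alignof(void *) == _Alignof(intptr_t) &&
           offsetof(struct with_ptr, field) == offsetof(struct with_int, field);
}
```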

Duncan_Sands · November 12, 2009, 9:42am

> I mean within a structure or array. I don't know whether any
> platforms would pass them any differently as function arguments, but I
> do know that the "default" data layout (which I believe is the Sparc
> data layout) aligns int32's on 32-bit boundaries and 32-bit pointers
> on 64-bit boundaries.

Wrap the pointer in a packed struct maybe?

Ciao,

Duncan.
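The alignment effect of a packed wrapper, and of a plain one-element wrapper, can be illustrated in C using the GCC/Clang `packed` attribute (a compiler extension, standing in here for an LLVM packed struct). The type names are hypothetical. A packed wrapper has alignment 1, so it can sit at any byte offset, while a plain one-element struct keeps its member's alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Plain one-element wrapper: takes the alignment of its member. */
struct wrapped { void *p; };

/* Packed wrapper (GCC/Clang extension): alignment drops to 1. */
struct __attribute__((packed)) packed_wrapped { void *p; };

/* Each wrapper placed after a single char inside an enclosing struct. */
struct holder_plain  { char tag; struct wrapped w; };
struct holder_packed { char tag; struct packed_wrapped w; };
```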

Kenneth_Uildriks · November 12, 2009, 5:20pm

That sets the alignment to 1 unconditionally, which might also be wrong.

Now I assume that a one-element regular struct would have the same
alignment as the element, right? In that case:

1. A type such as intp will get the following definition:

struct {
{intAAA, intBBB, intCCC, intDDD, intEEE, intFFF, intGGG, intHHH}*[0];
int32 or int64
};

where AAA, BBB, CCC, DDD, EEE, FFF, GGG, HHH are 16-bit values that
combine to form a GUID. Since it's zero-sized, the struct as a whole
will align the same way as the int.

Then define a transform that looks for the GUID and changes the int32
to the platform-specific intp integer-size. This transform will be
run at link time.
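A toy model of the proposed link-time transform, sketched in C. It assumes a simplified, hypothetical representation of struct types (a GUID marker plus the width of the trailing integer member) rather than LLVM's actual type classes, and the GUID value is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified struct type: the GUID carried by the
   zero-length marker array, plus the width of the integer member. */
typedef struct {
    uint16_t guid[8];   /* AAA..HHH; all zero means "no marker" */
    unsigned int_bits;  /* width of the trailing integer member */
} struct_type;

/* Hypothetical GUID chosen by the front end to tag intp. */
static const uint16_t INTP_GUID[8] = {0x1a2b, 0x3c4d, 0x5e6f, 0x7081,
                                      0x92a3, 0xb4c5, 0xd6e7, 0xf809};

/* Rewrites every marked type so its placeholder integer takes the
   target pointer width. Returns how many types were rewritten. */
static size_t resolve_intp_types(struct_type *types, size_t n,
                                 unsigned ptr_bits) {
    size_t rewritten = 0;
    for (size_t i = 0; i < n; i++) {
        if (memcmp(types[i].guid, INTP_GUID, sizeof INTP_GUID) == 0) {
            types[i].int_bits = ptr_bits;
            rewritten++;
        }
    }
    return rewritten;
}
```

Types without the marker are left alone, so ordinary int32 members are unaffected by the rewrite.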

Duncan_Sands · November 12, 2009, 7:16pm

Hi Kenneth,

Talin1 · November 12, 2009, 7:29pm

Well, as far as intp goes (or iptr if you prefer - the naming convention in LLVM is iN), here’s what I would expect:

- General rule #1: If an instruction accepts both i32 and i64, then it should accept iptr as well. If it only accepts i32, then it can continue to only accept i32.
- General rule #2: It should support operations that are commonly used with size_t and ptrdiff_t.
- Operations that should work with iptr:
  - Basic math: add, subtract, multiply, divide, mod.
  - Bitwise binary operators: shl, ashr, lshr, and, or, xor, etc.
  - Comparison operations.
  - alloca - currently doesn’t work with i64, should it?
  - GEP - rules are the same as for using i64 indices.
  - memcpy intrinsics
  - bit manipulation intrinsics
  - overflow arithmetic intrinsics - would be nice
  - atomic intrinsics - would be very nice (I assume that atomic iptr works on all platforms that support atomics: that is, on 32-bit platforms where iptr == i32 I expect atomic i32 to work; on 64-bit platforms where iptr == i64 I expect atomic i64 to work).
- Operations that don’t need to work with iptr - i.e. I don’t mind having to convert to some other int type first:
  - switch
  - extractelement / insertelement / shufflevector
  - extractvalue / insertvalue - not sure about these.
  - code generator intrinsics (frameaddress, etc.)
- Converting to pointer types: inttoptr and ptrtoint should be no-ops, effectively.
- Converting to other integer types: The issue here is that with other integer conversions in LLVM, you are required to know whether or not you are converting to a larger or smaller size - whether to use an ext or a trunc instruction. When converting to pointers, however, the choice of trunc or ext is automatic. Ideally, conversion to iptr would work the same way as conversion to a pointer type. There’s also the issue of signed vs. unsigned extension.
  - Note that some constant-folding operations would need to be deferred until the target size is established.
- Converting to FP types: Either don’t support (i.e. require casting to known-width integer first), or map to i32->FP or i64->FP after the size is known.
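Why constant folding has to wait can be seen from a single addition. A minimal C sketch, assuming the folder learns the pointer width as a bit count; the function name is hypothetical. The folded constant wraps modulo 2^width, so the same IR constant expression folds to different values on 32-bit and 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Folds "add iptr a, b" once ptr_bits is known: the sum wraps modulo
   2^ptr_bits, so the result depends on the target. */
static uint64_t fold_iptr_add(uint64_t a, uint64_t b, unsigned ptr_bits) {
    uint64_t sum = a + b;                            /* wraps modulo 2^64 */
    if (ptr_bits >= 64)
        return sum;
    return sum & ((UINT64_C(1) << ptr_bits) - 1);    /* wrap to ptr width */
}
```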
