The definition of getTypeSize

Now that I'm working on codegen support for arbitrary precision
integers (think i36 or i129), I've hit the problem of what
getTypeSize and friends should return for such integers (the
current implementations seem to me to be wrong). However it's
not clear to me how getTypeSize and friends are defined.

There seem to be several possible meanings for the size of a type
(only talking about primitive types here):

(1) The minimum number of bits needed to hold all values of the type.
(2) The minimum number of bits read by a load (maybe a better way of
saying this: if you load a value and store it somewhere else, how many
bits are correctly copied?)
(3) The maximum number of bits that may be overwritten by a store.
(4) The amount of memory allocated by an alloca or malloc for a
variable of this type.
(5) The spacing between successive variables of this type in an
array or struct.

For example, take i36. For this type (1) is 36; (2) is also 36
(a load typically expands to two 32 bit loads, but any bits beyond 36
are discarded); (3) is 64 given my current implementation (a store writes
two 32 bit values; bits beyond bit 36 hold some rubbish which overwrites
whatever was originally at that memory location); (4) needs to be at
least 64; (5) will also be 64.

In general (1) and (2) will be the same. (4) needs to be at least
as big as (3). (5) needs to be at least as big as (4).

Another example is 80-bit floating point types. Here (1), (2)
and (3) are presumably 80 bits. On my machine (5) is 96 bits.
I'm not sure what (4) is, presumably 80 or 96.

Which (if any) of these should getTypeSize, getABITypeSize, getTypeSizeInBits
and getABITypeSizeInBits correspond to?

It seems clear that getTypeSizeInBits corresponds to (1) and (2), as
shown by it returning 36 for i36. This is like gcc's TYPE_PRECISION,
and is a useful concept - but I think the name should be changed, since
right now it implicitly suggests it returns 8*getTypeSize. If no one
objects, I will rename it to getBitsUsedByType.

Currently getTypeSize doesn't seem to correspond to any of these possibilities,
at least for APInts: the current implementation returns the APInt bitwidth rounded
up to a multiple of the alignment. That makes it sound like it's trying to be (5).
I think getTypeSize should be defined to be (3), the maximum number of bits that
may be overwritten by a store [except that it's in bytes]. This means changing the
implementation for APInts, but not for other types.

Clearly getABITypeSize corresponds to (5) [except that it's in bytes]. This
corresponds to gcc's TYPE_SIZE.

Currently getABITypeSizeInBits returns 36 for i36, and otherwise 8*getABITypeSize.
It seems to me that this is clearly wrong for APInts, and that it should always
return 8*getABITypeSize (in which case it can be eliminated). If no one objects,
I will delete it as redundant.

Finally, looking through various users of getTypeSize, it seems that they assume
that getTypeSize returns (4), the amount of memory allocated for a variable of
this type. That seems reasonable. If everyone agrees, I will document that
for LLVM (3) and (4) coincide, and is what getTypeSize returns [except that it
returns the number of bytes, rather than the number of bits].

Best wishes,

Duncan.

> Now that I'm working on codegen support for arbitrary precision
> integers (think i36 or i129), I've hit the problem of what
> getTypeSize and friends should return for such integers (the
> current implementations seem to me to be wrong). However it's
> not clear to me how getTypeSize and friends are defined.
>
> There seem to be several possible meanings for the size of a type
> (only talking about primitive types here):
>
> (1) The minimum number of bits needed to hold all values of the type.
> (2) The minimum number of bits read by a load (maybe a better way of
> saying this: if you load a value and store it somewhere else, how many
> bits are correctly copied?)
> (3) The maximum number of bits that may be overwritten by a store.
> (4) The amount of memory allocated by an alloca or malloc for a
> variable of this type.
> (5) The spacing between successive variables of this type in an
> array or struct.
>
> For example, take i36. For this type (1) is 36; (2) is also 36
> (a load typically expands to two 32 bit loads, but any bits beyond 36
> are discarded); (3) is 64 given my current implementation (a store writes
> two 32 bit values; bits beyond bit 36 hold some rubbish which overwrites
> whatever was originally at that memory location); (4) needs to be at
> least 64; (5) will also be 64.

Why is (5) 64? Can it be 40?

Should (4) be the same as (5) since alloca / malloc are allocating an array of the specific type?

> In general (1) and (2) will be the same. (4) needs to be at least
> as big as (3). (5) needs to be at least as big as (4).

Do you really need all these "sizes"? What about just "size in bits", "storage size in bits", and "abi size"? The first is the exact size of the type (i.e. 36); the second is the size rounded up to some natural boundary for load / store (i.e. 64); the last one is the size including alignment padding when it's part of a larger object (i.e. 40?).

> Another example is 80-bit floating point types. Here (1), (2)
> and (3) are presumably 80 bits. On my machine (5) is 96 bits.
> I'm not sure what (4) is, presumably 80 or 96.
>
> Which (if any) of these should getTypeSize, getABITypeSize, getTypeSizeInBits
> and getABITypeSizeInBits correspond to?

TypeSize == "real size", ABITypeSize == "abi size". You will need another pair for the storage size?

> It seems clear that getTypeSizeInBits corresponds to (1) and (2), as
> shown by it returning 36 for i36. This is like gcc's TYPE_PRECISION,
> and is a useful concept - but I think the name should be changed, since
> right now it implicitly suggests it returns 8*getTypeSize. If no one
> objects, I will rename it to getBitsUsedByType.

Isn't it the other way around? Type information should be specified in bits, not in bytes. So getTypeSizeInBits returns the exact size in bits. I don't see how the new name is any clearer. I actually prefer the current name.

> Currently getTypeSize doesn't seem to correspond to any of these possibilities,
> at least for APInt's: the current implementation returns the APInt bitwidth rounded
> up to a multiple of the alignment. That makes it sound like it's trying to be (5).
> I think getTypeSize should be defined to be (3), the maximum number of bits that
> may be overwritten by a store [except that it's in bytes]. This means changing the
> implementation for APInts, but not for other types.

To me getTypeSize is getTypeSizeInBits divided by 8 and rounded up. I think renaming it to getTypeSizeInBytes makes sense.

> Clearly getABITypeSize corresponds to (5) [except that it's in bytes]. This
> corresponds to gcc's TYPE_SIZE.

Right.

> Currently getABITypeSizeInBits returns 36 for i36, and otherwise 8*getABITypeSize.
> It seems to me that this is clearly wrong for APInts, and that it should always
> return 8*getABITypeSize (in which case it can be eliminated). If no one objects,
> I will delete it as redundant.

I'd suggest not deleting anything for now. Let these evolve and see what really makes sense after a while.

Evan

Hi Evan, thanks for replying.

> Now that I'm working on codegen support for arbitrary precision
> integers (think i36 or i129), I've hit the problem of what
> getTypeSize and friends should return for such integers (the
> current implementations seem to me to be wrong). However it's
> not clear to me how getTypeSize and friends are defined.
>
> There seem to be several possible meanings for the size of a type
> (only talking about primitive types here):
>
> (1) The minimum number of bits needed to hold all values of the type.
> (2) The minimum number of bits read by a load (maybe a better way of
> saying this: if you load a value and store it somewhere else, how many
> bits are correctly copied?)
> (3) The maximum number of bits that may be overwritten by a store.
> (4) The amount of memory allocated by an alloca or malloc for a
> variable of this type.
> (5) The spacing between successive variables of this type in an
> array or struct.
>
> For example, take i36. For this type (1) is 36; (2) is also 36
> (a load typically expands to two 32 bit loads, but any bits beyond 36
> are discarded); (3) is 64 given my current implementation (a store
> writes
> two 32 bit values; bits beyond bit 36 hold some rubbish which
> overwrites
> whatever was originally at that memory location); (4) needs to be at
> least 64; (5) will also be 64.

> Why is (5) 64? Can it be 40?

This is due to a technical difficulty on big-endian machines. In fact
only 40 bits will be written, but with the simplest implementation this
will be the first 5 bytes of 8 on a little-endian machine and the last
5 bytes of 8 on a big-endian machine. Thus with this implementation in
general you need to allocate at least 8 bytes for a variable of this type.
In order to always only write to the first 5 bytes I would need to generalize
TRUNCSTORE to write to a range of bits starting from a possibly non-zero
bit-offset. Even then I'm not sure it can always be done on all targets
(see below). This doesn't seem worth the trouble.

> Should (4) be the same as (5) since alloca / malloc are allocating an
> array of the specific type?

Yes, I think so. Currently alloca allocates a multiple of getTypeSize
(see visitAlloca in SelectionDAGISel). This seems to be a bug - it
needs to use getABITypeSize. This also means that all the (many) places
that use getTypeSize as the amount of memory allocated by an alloca need
to be changed...

> In general (1) and (2) will be the same. (4) needs to be at least
> as big as (3). (5) needs to be at least as big as (4).

> Do you really need all these "sizes"? What about just "size in bits",
> "storage size in bits", and "abi size"? The first is the exact size
> of the type (i.e. 36); the second is the size rounded up to some
> natural boundary for load / store (i.e. 64); the last one is the size
> including alignment padding when it's part of a larger object (i.e.
> 40?).

Given that (1) = (2) = "size in bits", (3) = "storage size in bits",
and (4) = (5) = "abi size", we are in agreement here. But I wanted
to be sure that everyone agrees about these notions. I disagree that
for i36 the "storage size in bits" should be 64 while the "abi size"
should be 40, though I understand where you are coming from. Since the
"abi size" you describe is clearly (5), and you just suggested that
(4) and (5) should be the same, you seem to be saying that an alloca
should allocate 40 bits. However at the same time you suggest that
stores ("storage size in bits") should be 64 bits, bigger than the size
that would be alloc'd! Well, probably I misunderstood you. This
ease of misunderstanding is one reason why I want to better document
the meaning of the existing size methods.

> Another example is 80-bit floating point types. Here (1), (2)
> and (3) are presumably 80 bits. On my machine (5) is 96 bits.
> I'm not sure what (4) is, presumably 80 or 96.
>
> Which (if any) of these should getTypeSize, getABITypeSize,
> getTypeSizeInBits
> and getABITypeSizeInBits correspond to?

> TypeSize == "real size", ABITypeSize == "abi size". You will need
> another pair for the storage size?

What is "real size"? Do you mean getTypeSizeInBits rounded up to a
multiple of 8? That was my first thought too, but I came to the conclusion
that it isn't useful. What seems to me more useful is (3), the number
of bits that may be overwritten by a store. You may well wonder why this
isn't the same thing. Why isn't it 40 for an i36? As I explained above, this is
because the simplest implementation causes 5 bytes to be overwritten, but
starting at a 3 byte offset from the pointer on a big-endian machine.
Maybe it is worth me working harder and ensuring that always only the first
5 bytes are overwritten. Then getTypeSize could be 5 which would be what
everyone expects. But how is this to work on a target that can't do byte
stores? On such a machine it seems to me that getTypeSize would have to be
strictly bigger than getTypeSizeInBits rounded up to a multiple of 8...
How does this work for i8 on such machines?

> It seems clear that getTypeSizeInBits corresponds to (1) and (2), as
> shown by it returning 36 for i36. This is like gcc's TYPE_PRECISION,
> and is a useful concept - but I think the name should be changed,
> since
> right now it implicitly suggests it returns 8*getTypeSize. If no one
> objects, I will rename it to getBitsUsedByType.

> Isn't it the other way around? Type information should be specified
> in bits, not in bytes. So getTypeSizeInBits returns the exact size in
> bits. I don't see how the new name is any clearer. I actually prefer
> the current name.

For me the problem is that "Size" is too overloaded. I'm fine with keeping
the name - in that case I'll just add some more documentation.

> Currently getTypeSize doesn't seem to correspond to any of these
> possibilities,
> at least for APInt's: the current implementation returns the APInt
> bitwidth rounded
> up to a multiple of the alignment. That makes it sound like it's
> trying to be (5).
> I think getTypeSize should be defined to be (3), the maximum number
> of bits that
> may be overwritten by a store [except that it's in bytes]. This
> means changing the
> implementation for APInts, but not for other types.

> To me getTypeSize is getTypeSizeInBits divided by 8 and rounded up. I
> think renaming it to getTypeSizeInBytes makes sense.

Many parts of LLVM are using getTypeSize as the size of an alloca. That
seems to be wrong - I guess they should all be using getABITypeSize. In
that case is getTypeSize useful? I don't see the point of having it be
getTypeSizeInBits rounded up to a multiple of 8. In fact that seems dangerous.
What would it be used for? The number of bits stored in a write? But that
is (3) and as I pointed out it may need to be a bigger number. The number
of bits read in a load? But that might be fewer bits (36 for an i36), so you
may draw wrong conclusions. I would rather define it as (3), since that has
a pretty clear meaning, and seems useful.

> Clearly getABITypeSize corresponds to (5) [except that it's in
> bytes]. This
> corresponds to gcc's TYPE_SIZE.

> Right.

>
> Currently getABITypeSizeInBits returns 36 for i36, and otherwise
> 8*getABITypeSize.
> It seems to me that this is clearly wrong for APInts, and that it
> should always
> return 8*getABITypeSize (in which case it can be eliminated). If
> no one objects,
> I will delete it as redundant.

> I'd suggest not deleting anything for now. Let these evolve and
> see what really makes sense after a while.

Sure.

Ciao,

Duncan.

What about packed vs non-packed structs? You'll have to look at the container of the type.

I do not have a strong opinion on the naming of the various size functions, or on how to deal with APInts of sizes that are not multiples of a byte.

I do think loading and storing more bytes than necessary is generally a bad idea, especially if you're getting uninitialized bits. (If you incorrectly cross a page boundary, you might fault, for one thing. Watch out for packed structs.)

I also think the current general approach to x86 long double is best: use 80 bits for the underlying size, and where extra bytes are required for padding, do that by looking at the alignment. There is no need to allocate more than 10 bytes for scalars of this type, and we currently don't for stack objects.

(The code you point to in visitAlloca does look like it might be a problem, but I can't make it fail. Known-size arrays appear here as a single object [3 x x86_f80], and the size is computed
correctly; unknown-size arrays have the size of the elements passed from the FE, and this includes padding. How does that size*count computation get executed?)

> Hi Evan, thanks for replying.

> > Now that I'm working on codegen support for arbitrary precision
> > integers (think i36 or i129), I've hit the problem of what
> > getTypeSize and friends should return for such integers (the
> > current implementations seem to me to be wrong). However it's
> > not clear to me how getTypeSize and friends are defined.
> >
> > There seem to be several possible meanings for the size of a type
> > (only talking about primitive types here):
> >
> > (1) The minimum number of bits needed to hold all values of the type.
> > (2) The minimum number of bits read by a load (maybe a better way of
> > saying this: if you load a value and store it somewhere else, how many
> > bits are correctly copied?)
> > (3) The maximum number of bits that may be overwritten by a store.
> > (4) The amount of memory allocated by an alloca or malloc for a
> > variable of this type.
> > (5) The spacing between successive variables of this type in an
> > array or struct.
> >
> > For example, take i36. For this type (1) is 36; (2) is also 36
> > (a load typically expands to two 32 bit loads, but any bits beyond 36
> > are discarded); (3) is 64 given my current implementation (a store
> > writes
> > two 32 bit values; bits beyond bit 36 hold some rubbish which
> > overwrites
> > whatever was originally at that memory location); (4) needs to be at
> > least 64; (5) will also be 64.

> > Why is (5) 64? Can it be 40?

> This is due to a technical difficulty on big-endian machines. In fact
> only 40 bits will be written, but with the simplest implementation this
> will be the first 5 bytes of 8 on a little-endian machine and the last
> 5 bytes of 8 on a big-endian machine. Thus with this implementation in
> general you need to allocate at least 8 bytes for a variable of this type.
> In order to always only write to the first 5 bytes I would need to generalize
> TRUNCSTORE to write to a range of bits starting from a possibly non-zero
> bit-offset. Even then I'm not sure it can always be done on all targets
> (see below). This doesn't seem worth the trouble.

Seems ok for the initial implementation. We can worry about size reduction later.

> > Should (4) be the same as (5) since alloca / malloc are allocating an
> > array of the specific type?

> Yes, I think so. Currently alloca allocates a multiple of getTypeSize
> (see visitAlloca in SelectionDAGISel). This seems to be a bug - it
> needs to use getABITypeSize. This also means that all the (many) places
> that use getTypeSize as the amount of memory allocated by an alloca need
> to be changed...

Hrm. It does seem like it could be a bug. I am not too familiar with the code so someone please verify this. Chris?

> > In general (1) and (2) will be the same. (4) needs to be at least
> > as big as (3). (5) needs to be at least as big as (4).

> > Do you really need all these "sizes"? What about just "size in bits",
> > "storage size in bits", and "abi size"? The first is the exact size
> > of the type (i.e. 36); the second is the size rounded up to some
> > natural boundary for load / store (i.e. 64); the last one is the size
> > including alignment padding when it's part of a larger object (i.e.
> > 40?).

> Given that (1) = (2) = "size in bits", (3) = "storage size in bits",
> and (4) = (5) = "abi size", we are in agreement here. But I wanted
> to be sure that everyone agrees about these notions. I disagree that
> for i36 the "storage size in bits" should be 64 while the "abi size"
> should be 40, though I understand where you are coming from. Since the
> "abi size" you describe is clearly (5), and you just suggested that
> (4) and (5) should be the same, you seem to be saying that an alloca
> should allocate 40 bits. However at the same time you suggest that
> stores ("storage size in bits") should be 64 bits, bigger than the size
> that would be alloc'd! Well, probably I misunderstood you. This
> ease of misunderstanding is one reason why I want to better document
> the meaning of the existing size methods.

You have explained why (5) should be 64 so that's clear now.

> > Another example is 80-bit floating point types. Here (1), (2)
> > and (3) are presumably 80 bits. On my machine (5) is 96 bits.
> > I'm not sure what (4) is, presumably 80 or 96.
> >
> > Which (if any) of these should getTypeSize, getABITypeSize,
> > getTypeSizeInBits
> > and getABITypeSizeInBits correspond to?

> > TypeSize == "real size", ABITypeSize == "abi size". You will need
> > another pair for the storage size?

> What is "real size"? Do you mean getTypeSizeInBits rounded up to a
> multiple of 8? That was my first thought too, but I came to the conclusion

To me real size is (1), e.g. 36 for i36.

> that it isn't useful. What seems to me more useful is (3), the number
> of bits that may be overwritten by a store. You may well wonder why this
> isn't the same thing. Why isn't it 40 for an i36? As I explained above, this is
> because the simplest implementation causes 5 bytes to be overwritten, but
> starting at a 3 byte offset from the pointer on a big-endian machine.
> Maybe it is worth me working harder and ensuring that always only the first
> 5 bytes are overwritten. Then getTypeSize could be 5 which would be what
> everyone expects. But how is this to work on a target that can't do byte
> stores? On such a machine it seems to me that getTypeSize would have to be
> strictly bigger than getTypeSizeInBits rounded up to a multiple of 8...
> How does this work for i8 on such machines?

I dunno. I don't think llvm can codegen for such a machine. Maybe it's not worth worrying about it at this point?

> > It seems clear that getTypeSizeInBits corresponds to (1) and (2), as
> > shown by it returning 36 for i36. This is like gcc's TYPE_PRECISION,
> > and is a useful concept - but I think the name should be changed,
> > since
> > right now it implicitly suggests it returns 8*getTypeSize. If no one
> > objects, I will rename it to getBitsUsedByType.

> > Isn't it the other way around? Type information should be specified
> > in bits, not in bytes. So getTypeSizeInBits returns the exact size in
> > bits. I don't see how the new name is any clearer. I actually prefer
> > the current name.

> For me the problem is that "Size" is too overloaded. I'm fine with keeping
> the name - in that case I'll just add some more documentation.

> > Currently getTypeSize doesn't seem to correspond to any of these
> > possibilities,
> > at least for APInt's: the current implementation returns the APInt
> > bitwidth rounded
> > up to a multiple of the alignment. That makes it sound like it's
> > trying to be (5).
> > I think getTypeSize should be defined to be (3), the maximum number
> > of bits that
> > may be overwritten by a store [except that it's in bytes]. This
> > means changing the
> > implementation for APInts, but not for other types.

> > To me getTypeSize is getTypeSizeInBits divided by 8 and rounded up. I
> > think renaming it to getTypeSizeInBytes makes sense.

> Many parts of LLVM are using getTypeSize as the size of an alloca. That
> seems to be wrong - I guess they should all be using getABITypeSize. In
> that case is getTypeSize useful? I don't see the point of having it be
> getTypeSizeInBits rounded up to a multiple of 8. In fact that seems dangerous.
> What would it be used for? The number of bits stored in a write? But that
> is (3) and as I pointed out it may need to be a bigger number. The number
> of bits read in a load? But that might be fewer bits (36 for an i36), so you
> may draw wrong conclusions. I would rather define it as (3), since that has
> a pretty clear meaning, and seems useful.

I think rather than worrying about how these functions are being used *now* you should define them in a way that's clear and fix the incorrect uses. To me that means:

getTypeSizeInBits - actual size
getABITypeSizeInBits - ABI size when used in a larger structure
getStorageTypeSizeInBits - size of load / store

> > Clearly getABITypeSize corresponds to (5) [except that it's in
> > bytes]. This
> > corresponds to gcc's TYPE_SIZE.

> > Right.

> > Currently getABITypeSizeInBits returns 36 for i36, and otherwise
> > 8*getABITypeSize.
> > It seems to me that this is clearly wrong for APInts, and that it
> > should always
> > return 8*getABITypeSize (in which case it can be eliminated). If
> > no one objects,
> > I will delete it as redundant.

> > I'd suggest not deleting anything for now. Let these evolve and
> > see what really makes sense after a while.

> Sure.

Thanks for doing this!

Evan

Hi Evan,

> I think rather than worrying about how these functions are being used
> *now* you should define them in a way that's clear and fix the
> incorrect uses. To me that means:
>
> getTypeSizeInBits - actual size
> getABITypeSizeInBits - ABI size when used in a larger structure
> getStorageTypeSizeInBits - size of load / store

I agree and will try to do something later this week.

Ciao,

Duncan.

PS: getStorageTypeSizeInBits -> getTypeStorageSizeInBits

Hi Dale,

> I do not have a strong opinion on the naming of the various size
> functions, or on how to deal with APInts of sizes that are not
> multiples of a byte.
>
> I do think loading and storing more bytes than necessary is generally
> a bad idea, especially if you're getting uninitialized bits.

I agree, and I agree that it is good that writing an x86 long double
stores 80 bits, not more. I don't much like the idea of i36 writing
64 bits rather than 40 bits, but as an initial implementation this
seems ok to me.

> (If you incorrectly cross a page boundary, you might fault, for one thing.

I don't see how this can be a problem unless you write more than the
getABITypeSize, which would already be wrong for other reasons. Well,
I guess if you do an unaligned store of an x86 long double 10 bytes
before the end of a page you could get a fault if 12 bytes are written.
But since llvm-gcc says that sizeof(long double)=12 (or more) the program
was incorrect anyway.
Another issue is what TYPE_SIZE gcc thinks an i36 has. It would be
dangerous if the LLVM getABITypeSize was different.

> Watch out for packed structs.)

I don't see that that's a problem if getTypeSize (size of a store)
and getABITypeSize are both 8 bytes for an i36.

> I also think the current general approach to x86 long double is
> best: use 80 bits for the underlying size, and where extra bytes are
> required for padding, do that by looking at the alignment. There is
> no need to allocate more than 10 bytes for scalars of this type, and
> we currently don't for stack objects.

Don't forget that alloca can take a NumElements argument. It sounds
like you would like alloca to allocate getTypeSize bytes when NumElements
is not specified, and NumElements*getABITypeSize when NumElements is specified
(you could imagine allocating getTypeSize bytes when NumElements is 1, but since
NumElements may be a variable this would IMHO complicate things nastily).
That sounds a bit dangerous to me (not uniform -> easier for optimizers to make
mistakes), but otherwise I've no real objection. Maybe Chris can comment on this?

> (The code you point to in visitAlloca does look like it might be a
> problem, but I can't make it fail. Known-size arrays appear here as
> a single object [3 x x86_f80], and the size is computed
> correctly; unknown-size arrays have the size of the elements passed
> from the FE, and this includes padding. How does that size*count
> computation get executed?)

Since llvm-gcc tends to generate either an alloca without NumElements, or
an alloca with NumElements but for an i8, I doubt you can hit this without
hand crafting a test-case in LLVM assembler.

Best wishes,

Duncan.

BTW, I agree with this sentiment. In particular, every use of "getTypeSize" is currently probably broken for "odd sized" types. Thank you for tackling this Duncan!

-Chris

> I think rather than worrying about how these functions are being used
> *now* you should define them in a way that's clear and fix the
> incorrect uses. To me that means:
>
> getTypeSizeInBits - actual size
> getABITypeSizeInBits - ABI size when used in a larger structure
> getStorageTypeSizeInBits - size of load / store

Done:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20071029/055062.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20071029/055063.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20071029/055064.html

Ciao,

Duncan.