PointerIntPair

The TableGen field value (RecordVal) class includes a PointerIntPair. The integer is a boolean that specifies whether the field was defined with the 'field' prefix.

I'd like to add another boolean to RecordVal to specify whether the field is actually a template argument. I could make the Int an enum instead, with three elements. That would require 2 bits. I see this has been done elsewhere.
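
The three-value enum idea can be sketched without LLVM at all. What follows is a minimal, self-contained illustration, not TableGen's actual code: the names `FieldKind`, `Type`, and `TaggedTypePtr` are hypothetical, and it assumes the pointee is at least 4-byte aligned so that two low bits of the pointer are always zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical three-state kind; three values need 2 bits.
enum class FieldKind : std::uintptr_t {
  Plain = 0,       // ordinary field
  FieldPrefix = 1, // defined with the 'field' prefix
  TemplateArg = 2  // actually a template argument
};

// Hypothetical stand-in for the pointee; 4-byte alignment gives 2 spare bits.
struct alignas(4) Type {
  int id;
};

// Stores a Type* and a FieldKind in a single pointer-sized word.
class TaggedTypePtr {
  static constexpr std::uintptr_t KindMask = 0x3;
  std::uintptr_t Bits = 0;

public:
  TaggedTypePtr(Type *P, FieldKind K) {
    auto Raw = reinterpret_cast<std::uintptr_t>(P);
    assert((Raw & KindMask) == 0 && "pointer is not 4-byte aligned");
    Bits = Raw | static_cast<std::uintptr_t>(K);
  }

  Type *getPointer() const {
    return reinterpret_cast<Type *>(Bits & ~KindMask);
  }
  FieldKind getKind() const { return static_cast<FieldKind>(Bits & KindMask); }
};

// The tag costs no extra storage compared with a bare pointer.
static_assert(sizeof(TaggedTypePtr) == sizeof(void *), "no size overhead");
```

Constructing `TaggedTypePtr(&T, FieldKind::TemplateArg)` and reading back `getPointer()` and `getKind()` returns the original pointer and kind, which is the property the proposal relies on.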

What do people think about PointerIntPair? Is it a good thing to use? I could simply add the enum as a new data item in the RecordVal, but I estimate this could add 10 to 12 MB to the memory requirement of TableGen. (The AMDGPU target's TableGen files result in about 3 million record fields.)

While I'm here, let me ask how the PointerIntPair works at all. Does every target guarantee at least 4-byte alignment of stack/heap objects?

A PointerIntPair can be placed in the pointer part of another PointerIntPair, so you can nest them to have 2 booleans without creating an enum. This works because PointerIntPair always puts its integer in the highest available bits, leaving the lower bits free to be used by another integer.
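
In LLVM the nesting described here is spelled along the lines of `PointerIntPair<PointerIntPair<T *, 1, bool>, 1, bool>`. The sketch below does not use the LLVM header; it only models the bit layout with plain integers, assuming a hypothetical 4-byte-aligned `Node` type (two spare low bits). The function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointee with two spare low bits in its address.
struct alignas(4) Node {
  int value;
};

// The inner pair claims the highest spare bit first...
constexpr std::uintptr_t InnerBit = std::uintptr_t(1) << 1;
// ...which leaves the lowest bit free for the outer pair.
constexpr std::uintptr_t OuterBit = std::uintptr_t(1) << 0;
constexpr std::uintptr_t TagMask = InnerBit | OuterBit;

// Pack a pointer and two independent booleans into one word.
inline std::uintptr_t pack(Node *P, bool Inner, bool Outer) {
  auto Raw = reinterpret_cast<std::uintptr_t>(P);
  assert((Raw & TagMask) == 0 && "pointer is not 4-byte aligned");
  return Raw | (Inner ? InnerBit : 0) | (Outer ? OuterBit : 0);
}

inline Node *pointerOf(std::uintptr_t Bits) {
  return reinterpret_cast<Node *>(Bits & ~TagMask);
}
inline bool innerOf(std::uintptr_t Bits) { return (Bits & InnerBit) != 0; }
inline bool outerOf(std::uintptr_t Bits) { return (Bits & OuterBit) != 0; }
```

Because each boolean has its own bit, all four combinations are representable, which is more than a three-value enum needs.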

The number of bits available in a pointer depends on the alignment requirement of the type the pointer points to. If it’s a class/struct it depends on the largest alignment required by its fields. The alignof operator is used to check the alignment.
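
To make the relationship between alignment and spare bits concrete, here is a small sketch. The helper names `log2Align` and `spareLowBits` and the struct `Mixed` are hypothetical; the only language feature relied on is `alignof`.

```cpp
#include <cassert>
#include <cstddef>

// For a power-of-two alignment A, log2(A) low address bits are always zero.
constexpr unsigned log2Align(std::size_t A) {
  return A <= 1 ? 0 : 1 + log2Align(A / 2);
}

// Number of low bits of a T* that can hold extra data.
template <typename T> constexpr unsigned spareLowBits() {
  return log2Align(alignof(T));
}

// A struct is as strictly aligned as its most strictly aligned field.
struct Mixed {
  char c;
  int i;
};

static_assert(spareLowBits<char>() == 0, "char pointers have no spare bits");
static_assert(spareLowBits<Mixed>() == spareLowBits<int>(),
              "struct alignment follows its strictest member");
```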

The bits are orthogonal, so I think an enum makes more sense. I take it from your response that you think a PointerIntPair is a fine thing to use.

There are no alignment requirements on an X86, for example. I presume that compilers impose alignments anyway, but are they consistent enough to rely on?

> Does every target guarantee at least 4-byte alignment of stack/heap objects?

The short answer is no. For example, see paragraph 6.2.8, point 1, here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
There are “informal” rules, though; e.g., malloc is assumed to return pointers that are at least 4-byte aligned.

> I presume that compilers impose alignments anyway, but are they consistent enough to rely on?

It depends on what you mean by “rely on”. If one compiler gives a type less alignment than another does, your code won’t compile there, because static_asserts are going to fire [1]. So you can rely on it in the sense that you won’t get a runtime error. However, as I understand it, you can’t rely on never getting a compile error across compilers.

In practice, however, you’ll almost never get an error because the alignments are for the most part consistent.
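
The compile-time failure mode can be illustrated with a self-contained sketch. This is not the LLVM implementation; `CheckedTaggedPtr`, `lowZeroBits`, and `Record` are hypothetical names, and the check only mirrors the general idea of comparing the requested bit count against what the pointee's alignment provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// log2 of a power-of-two alignment: how many low address bits are zero.
constexpr unsigned lowZeroBits(std::size_t Align) {
  return Align <= 1 ? 0 : 1 + lowZeroBits(Align / 2);
}

// Pointer plus IntBits-wide integer; refuses to compile if T's alignment
// does not leave enough spare low bits.
template <typename T, unsigned IntBits> class CheckedTaggedPtr {
  static_assert(IntBits <= lowZeroBits(alignof(T)),
                "pointee alignment leaves too few spare low bits");
  static constexpr std::uintptr_t Mask = (std::uintptr_t(1) << IntBits) - 1;
  std::uintptr_t Bits;

public:
  CheckedTaggedPtr(T *P, unsigned Tag)
      : Bits(reinterpret_cast<std::uintptr_t>(P) | (Tag & Mask)) {}

  T *getPointer() const { return reinterpret_cast<T *>(Bits & ~Mask); }
  unsigned getInt() const { return static_cast<unsigned>(Bits & Mask); }
};

// Hypothetical pointee with 8-byte alignment, so 3 spare bits.
struct alignas(8) Record {
  int id;
};

// CheckedTaggedPtr<char, 1> Bad(nullptr, 0); // rejected at compile time
```

With a check like this, a compiler that gives a type less alignment than expected produces a build error rather than silently corrupted pointers.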

[1] Here’s the actual place in the code: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/PointerIntPair.h#L148
If you follow what this is based on (i.e. the traits), you eventually end up here for the default case: https://llvm.org/doxygen/PointerLikeTypeTraits_8h_source.html#l00056

Best,
Stefanos

On Thu, Dec 31, 2020 at 9:13 PM, Paul C. Anagnostopoulos via llvm-dev <llvm-dev@lists.llvm.org> wrote:

> The bits are orthogonal, so I think an enum makes more sense.

I’m not quite following this. The enum would be used if these two properties are mutually exclusive, right (i.e., you can have one or the other, but not both)? That’s not what I’d describe as “orthogonal”: orthogonal things vary independently of one another (i.e., you can have one feature, or the other, or neither, or both). If these properties can’t both be enabled at the same time, then they’re not orthogonal; they’re mutually exclusive.

> I take it from your response that you think a PointerIntPair is a fine thing to use.
>
> There are no alignment requirements on an X86, for example. I presume that compilers impose alignments anyway, but are they consistent enough to rely on?

It depends on what you’re pointing to. If you were pointing to a char, it would have alignment 1. That is necessary: an array of characters can’t have padding between its elements, so pointers into it have no guaranteed alignment and no spare low bits.

If you’re only pointing to malloc/new’d blocks, you’ve got some wiggle room imposed by the allocator. In some places LLVM code explicitly overaligns certain types to provide more spare low bits in their pointers.
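
Both points can be checked in a few lines. The sketch below uses hypothetical types (`Overaligned`) and helper names; it assumes nothing about LLVM and only relies on `alignas`, `alignof`, and ordinary `new`.

```cpp
#include <cassert>
#include <cstdint>

// Low bits of an address, as an integer.
inline std::uintptr_t lowBits(const void *P, std::uintptr_t Mask) {
  return reinterpret_cast<std::uintptr_t>(P) & Mask;
}

// Adjacent chars are one byte apart, so at least one of two neighbouring
// elements has an odd address: char pointers have no spare low bit.
inline bool someCharAddressIsOdd() {
  static char Buf[2] = {'a', 'b'};
  return (lowBits(&Buf[0], 1) | lowBits(&Buf[1], 1)) == 1;
}

// Explicitly overaligning a type buys spare bits: 8-byte alignment means
// the low 3 bits of every valid pointer to it are zero.
struct alignas(8) Overaligned {
  int x;
};
static_assert(alignof(Overaligned) == 8, "alignas(8) takes effect");

inline bool heapObjectHasThreeSpareBits() {
  Overaligned *P = new Overaligned{1};
  bool Ok = lowBits(P, 7) == 0;
  delete P;
  return Ok;
}
```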

Yes, sorry, I meant they are not orthogonal. I'll get a better night's sleep tonight, I promise. It's not like I'm going to a party or anything.

I found three places in LLVM that rely on 3 bits being available, and plenty of places that rely on 2. So I'll be fine with a 2-bit enum. It's a simple change.