Is pointer tagging defined behavior?

Dynamic languages commonly use an implementation technique where you take a pointer to an object (aligned on an eight-byte boundary, so its low three bits are zero), cast it to intptr_t, set the low three bits to a tag value indicating the type of the object, and later test the tag, remove it, cast back to a pointer, and dereference the pointer.
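A minimal C sketch of the scheme just described, assuming 8-byte-aligned objects and a three-bit tag. It uses uintptr_t rather than intptr_t so the bit operations happen on an unsigned type. The names `value`, `tag_ptr`, `get_tag`, `untag_ptr` and `TAG_MASK` are hypothetical. Converting the modified integer back to a pointer is exactly the implementation-defined step the question is about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low three bits of an 8-byte-aligned
   pointer are always zero, so they can hold a tag in the range 0..7. */
#define TAG_MASK ((uintptr_t)7)

typedef uintptr_t value;  /* a tagged machine word */

/* Attach a tag to an 8-byte-aligned pointer. */
static value tag_ptr(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);  /* low bits must be free */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Read the tag back. */
static unsigned get_tag(value v) {
    return (unsigned)(v & TAG_MASK);
}

/* Clear the tag and convert back to a pointer before dereferencing. */
static void *untag_ptr(value v) {
    return (void *)(v & ~TAG_MASK);
}
```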

As I understand it, the standard says this is implementation defined. Does LLVM consider it to be defined behavior?

If so, is this still true if you write your own memory manager that allocates chunks of memory (rounded up to 8 bytes) from a big char array?
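For concreteness, here is a minimal sketch of the kind of memory manager the question describes: a bump allocator handing out chunks (sizes rounded up to 8 bytes) from one big char array. The names `arena`, `arena_alloc` and `ARENA_SIZE` are hypothetical. The `_Alignas(8)` on the array is an assumption worth stating: a plain char array only needs 1-byte alignment, so without it the low three bits of the chunk addresses would not be guaranteed to be free for a tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: one big char array, explicitly 8-byte aligned. */
#define ARENA_SIZE 4096
static _Alignas(8) char arena[ARENA_SIZE];
static size_t arena_used = 0;  /* always a multiple of 8 */

/* Round n up to the next multiple of 8. */
static size_t round_up8(size_t n) {
    return (n + 7) & ~(size_t)7;
}

/* Bump allocation; returns NULL when the arena is exhausted.
   Because ARENA_SIZE and arena_used are multiples of 8, checking the
   unrounded size is enough and avoids overflow in round_up8. */
static void *arena_alloc(size_t n) {
    if (n > ARENA_SIZE - arena_used)
        return NULL;
    void *p = &arena[arena_used];
    arena_used += round_up8(n);
    assert(((uintptr_t)p & 7) == 0);  /* low three bits free for a tag */
    return p;
}
```

Each chunk starts on an 8-byte boundary, so the tagging code works on these chunks the same way it does on malloc results.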

(Assume a mainstream platform such as x64; I'm not asking about a scenario with an unusual CPU architecture.)

That doesn't sound exactly right.

In the implementations I've seen, pointers always have a tag of all zero
bits. So you AND the value with 0x7; if the result is zero, the thing is
actually a pointer, and you just go ahead and use the original value as a
pointer.

If the tag bits are nonzero then you don't have a pointer at all, you have
an integer or character or single float.
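A minimal sketch of that convention, assuming tag 0 means "pointer" and a hypothetical tag 1 means "small integer stored in the upper bits" (the tag values and function names are illustrative, not taken from any particular runtime). A pointer needs no encoding or decoding at all; only the immediate values carry a nonzero tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t value;

#define TAG_MASK   ((uintptr_t)7)
#define TAG_FIXNUM ((uintptr_t)1)  /* hypothetical tag for small integers */

static bool is_pointer(value v) {
    return (v & TAG_MASK) == 0;
}

/* A pointer is stored as-is: its low bits are already zero. */
static value from_pointer(void *p) {
    value v = (value)p;
    assert(is_pointer(v));
    return v;
}

/* The original word is used as the pointer, with no masking. */
static void *as_pointer(value v) {
    assert(is_pointer(v));
    return (void *)v;
}

static value from_fixnum(intptr_t n) {
    return ((uintptr_t)n << 3) | TAG_FIXNUM;
}

static intptr_t as_fixnum(value v) {
    assert((v & TAG_MASK) == TAG_FIXNUM);
    /* The conversion to intptr_t and the right shift of a negative value
       are implementation-defined; mainstream compilers wrap and sign-extend. */
    return (intptr_t)v >> 3;
}
```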

However, it's not out of the question that you might use some tag values to
indicate pointers to special kinds of objects that the runtime knows about,
such as strings or arrays. Even so, the tagged pointer is still guaranteed
to look like a pointer to somewhere within the first 8 bytes of the same
object (objects are never smaller than 8 bytes), so that's perfectly well
defined.
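A sketch of reserving some nonzero tags for object kinds the runtime knows about. The tag assignment (0 = plain object, 2 = string, 4 = array, odd tags = immediates) and the `header` type are hypothetical. Because every object is at least 8 bytes and 8-byte aligned, `address + tag` for any tag below 8 still points into the object's first 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value;

/* Hypothetical tag assignment. */
#define TAG_MASK   ((uintptr_t)7)
#define TAG_OBJECT ((uintptr_t)0)
#define TAG_STRING ((uintptr_t)2)
#define TAG_ARRAY  ((uintptr_t)4)

/* Every heap object starts with a header: at least 8 bytes, 8-byte aligned. */
typedef struct {
    _Alignas(8) size_t length;
} header;

typedef enum { KIND_OBJECT, KIND_STRING, KIND_ARRAY, KIND_IMMEDIATE } kind;

static value tag_heap(header *h, uintptr_t tag) {
    assert(((uintptr_t)h & TAG_MASK) == 0 && tag <= TAG_MASK);
    return (uintptr_t)h | tag;
}

/* Dispatch on the tag without touching memory. */
static kind kind_of(value v) {
    switch (v & TAG_MASK) {
    case TAG_OBJECT: return KIND_OBJECT;
    case TAG_STRING: return KIND_STRING;
    case TAG_ARRAY:  return KIND_ARRAY;
    default:         return KIND_IMMEDIATE;  /* odd tags in this sketch */
    }
}

/* Any pointer kind shares the header, so masking works for all of them. */
static size_t length_of(value v) {
    header *h = (header *)(v & ~TAG_MASK);
    return h->length;
}
```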

The only possible objection is that the pointer will be misaligned. I
believe you can find a discussion here in the last several months in which
it was stated that misaligned pointers are always ok on any machine,
provided that they are not dereferenced. In the case of a tagged pointer,
the pointer is always aligned before being dereferenced: by masking off the
tag, by subtracting it, or by folding it into an immediate offset (possibly
combined with a field offset).
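The three ways of getting back to an aligned address can be sketched as follows, for a hypothetical two-field `pair` object with a hypothetical tag of 3. In the third version only the final address, `tagged + (8 - 3)`, is ever dereferenced; the intermediate is a `char *`, which has no alignment requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value;

#define TAG_MASK ((uintptr_t)7)
#define TAG_PAIR ((uintptr_t)3)  /* hypothetical tag for a pair object */

typedef struct {
    _Alignas(8) intptr_t car;  /* offset 0 */
    intptr_t cdr;              /* offset 8 */
} pair;

/* 1. Mask the tag off, then access the field normally. */
static intptr_t cdr_by_mask(value v) {
    pair *p = (pair *)(v & ~TAG_MASK);
    return p->cdr;
}

/* 2. Subtract the tag, which is known once it has been checked. */
static intptr_t cdr_by_subtract(value v) {
    assert((v & TAG_MASK) == TAG_PAIR);
    pair *p = (pair *)(v - TAG_PAIR);
    return p->cdr;
}

/* 3. Fold the tag into the field offset: base + (8 - 3) == object + 8. */
static intptr_t cdr_by_offset(value v) {
    char *base = (char *)v;  /* points at byte 3 of the object */
    return *(intptr_t *)(base + (offsetof(pair, cdr) - TAG_PAIR));
}
```

On x64 the third form typically compiles to a single load with a displacement of 5, which is why runtimes like it.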

V8 does (or used to do?) the opposite: pointers are tagged in their
low bits, and integers are not. Since most of the time you're
accessing a field at some offset from the pointer anyway, you don't
have to mask out the low bits; you can adjust the offset instead
(e.g. load from Ptr+15 instead of Ptr+16 if, say, pointers have
their lowest bit set).
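A sketch of the layout described above, assuming heap pointers carry a 1 in bit 0. This is a hypothetical illustration of the idea, not V8's actual code; `object`, `tag_object` and `load_field2` are made-up names. The field at byte offset 16 is loaded from `tagged + 15` without ever clearing the tag.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value;

/* Hypothetical layout: heap pointers have bit 0 set, integers have it clear. */
#define HEAP_TAG ((uintptr_t)1)

typedef struct {
    _Alignas(8) intptr_t fields[4];  /* field i at byte offset 8 * i */
} object;

static value tag_object(object *o) {
    assert(((uintptr_t)o & 7) == 0);
    return (uintptr_t)o + HEAP_TAG;
}

static int is_heap_pointer(value v) {
    return (v & 1) != 0;
}

/* Load the field at offset 16: tagged == object + 1, so use tagged + 15. */
static intptr_t load_field2(value v) {
    assert(is_heap_pointer(v));
    char *tagged = (char *)v;
    return *(intptr_t *)(tagged + (16 - HEAP_TAG));
}
```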

Not tagging integers then makes most integer math cheaper: e.g.
addition of two such integers can be a single add instruction, and
multiplication needs only one shift instead of two, etc.
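The arithmetic argument can be made concrete with a sketch that assumes small integers are stored as `n << 1` with bit 0 clear (the `smi` name and helpers are hypothetical, and overflow checks are omitted). Since `(a << 1) + (b << 1) == (a + b) << 1`, addition is one plain add that leaves bit 0 clear; since `(a << 1) * b == (a * b) << 1`, multiplication only needs to untag one operand. With a nonzero integer tag, both operations would need extra instructions to remove and reapply the tag.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t smi;  /* hypothetical: integer n stored as n << 1 */

static smi to_smi(intptr_t n) {
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);  /* must fit */
    return (intptr_t)((uintptr_t)n << 1);
}

/* Arithmetic right shift of negatives: sign-extends on mainstream compilers. */
static intptr_t from_smi(smi s) { return s >> 1; }

/* One add; the tag bit stays 0. Overflow checks omitted. */
static smi smi_add(smi x, smi y) { return x + y; }

/* One shift: untag only y. Overflow checks omitted. */
static smi smi_mul(smi x, smi y) { return x * (y >> 1); }
```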

-- Sanjoy