Why are GEPs type based?

Hi,

I’ve been wondering why LLVM’s GEP instructions are based on types, rather than encoding the raw address calculation as a base pointer plus some scaled offsets (still in the form of a GEP, to retain provenance).

The type information does not seem particularly useful (it shouldn’t be used as a basis for optimization, because struct layouts lie), but it enlarges the non-canonical IR space (there are many ways to encode the same GEP) and increases compile time (optimizations constantly need to decompose GEPs, e.g. to get constant offsets).

What am I missing here?

Regards,
Nikita

Hi,

Although I’m not an expert on the topic, there are at least two reasons:

  1. It reads more like C/C++ than raw offset computation does. This goes hand in hand with the fact that GEP abstracts away
    target-specific information. For example, a pointer is 4 bytes on a typical 32-bit system but 8 bytes on a 64-bit system.
    If you have a struct like:
    struct {
    int *p;
    int v;
    };

To get v with a GEP, you just say “give me the second member”. If you were to code this with offsets, you would need to
know the target, which front-ends generally should not have to depend on (Clang and other front-ends actually
do, and that’s another big discussion).

  2. It’s very important for alias analysis. Again, I’m not an expert on that, but see e.g. the first rule on when a pointer is based on
    another pointer here: https://llvm.org/docs/LangRef.html#pointeraliasing
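
To make the first point concrete, here is a minimal C sketch of the two ways to reach v in the struct above. The tag name S and the three function names are made up for illustration. The typed access never mentions a number, while the raw form needs a byte offset that the compiler only knows once the target is fixed (on a typical 32-bit target it would be 4, on a typical 64-bit target 8).

```c
#include <stddef.h>  /* offsetof, size_t */

/* Same shape as the struct in the example above; the tag name S is made up. */
struct S {
    int *p;
    int v;
};

/* Byte offset of v as computed by the compiler for the current target. */
size_t offset_of_v(void) {
    return offsetof(struct S, v);
}

/* Typed access: "give me the second member", no target knowledge needed. */
int *field_v_typed(struct S *s) {
    return &s->v;
}

/* Raw access: base pointer plus a target-dependent byte offset. */
int *field_v_raw(struct S *s) {
    return (int *)((char *)s + offsetof(struct S, v));
}
```

Both accessors return the same address. The difference is only who has to know the layout: in the raw form, whoever emits the offset must already know the pointer size and padding rules of the target.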

Best regards,
Stefanos

On Mon, 13 Jul 2020 at 11:08 PM, Nikita Popov via llvm-dev <llvm-dev@lists.llvm.org> wrote:

You are right that it’s mostly a convenience for the front-ends, so that they don’t have to deal with boring things like padding and type sizes.

Otherwise it adds no semantic value. Object aliasing is not field-sensitive in LLVM, so the type information doesn’t matter there. Though someone may want to add support for that in the future for languages where it’s OK to do so.

FWIW, Alive2’s GEP instruction works over bytes only (pairs of constant * %reg). Though I’m not sure I would advocate changing LLVM’s representation.
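
As a rough illustration of the "pairs of constant * %reg" idea, the C sketch below computes an address as a base pointer plus a sum of scale * index terms, all in bytes. This is not Alive2's actual data structure or LLVM API; struct Elem, struct Term, byte_gep and elem_v_bytes are hypothetical names used only to show the shape of the computation.

```c
#include <stddef.h>  /* offsetof, size_t, ptrdiff_t */

/* Hypothetical element type, same shape as the struct discussed earlier. */
struct Elem {
    int *p;
    int v;
};

/* One (constant, index) pair of a byte-based address computation. */
struct Term {
    ptrdiff_t scale;  /* constant multiplier, in bytes */
    ptrdiff_t index;  /* runtime value (the "%reg" part) */
};

/* base + sum(scale_i * index_i), with every term measured in bytes. */
char *byte_gep(char *base, const struct Term *terms, size_t n) {
    ptrdiff_t off = 0;
    for (size_t i = 0; i < n; i++) {
        off += terms[i].scale * terms[i].index;
    }
    return base + off;
}

/* Address of arr[i].v written in the byte-based form. */
int *elem_v_bytes(struct Elem *arr, ptrdiff_t i) {
    struct Term terms[2] = {
        { (ptrdiff_t)sizeof(struct Elem), i },        /* step to element i */
        { (ptrdiff_t)offsetof(struct Elem, v), 1 },   /* constant field offset */
    };
    return (int *)byte_gep((char *)arr, terms, 2);
}
```

For any in-bounds i, elem_v_bytes(arr, i) yields the same address as &arr[i].v. In this form the constant offset is already a separate term, which is the decomposition that optimizations otherwise have to recover from a type-based GEP.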

Nuno

Good to know, thanks for the info.

- Stefanos