Even though this is almost verbatim from the docs, it’s a misconception (and perhaps the docs should be updated).
The gep operation does not know if the object is allocated, freed or even valid. It does not need to know, as it only performs the address calculation (an offset from a base pointer), not the actual load.
See “The Often Misunderstood GEP Instruction” in the LLVM documentation (LLVM 19.0.0git).
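To make the "address calculation only" point concrete, here is a minimal sketch that models the arithmetic as plain integer math. The function name `gep_model` and its parameters are made up for illustration; this is not an LLVM API, and it deliberately ignores inbounds and the other flags.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the arithmetic behind a gep without flags:
//   result = base + index * elem_size + field_offset
// Nothing here reads or writes memory, so it cannot tell whether `base`
// refers to a live, freed, or never-allocated object.
std::uintptr_t gep_model(std::uintptr_t base, std::ptrdiff_t index,
                         std::size_t elem_size, std::size_t field_offset) {
    // Unsigned arithmetic wraps, so a negative index moves the address back.
    return base + static_cast<std::uintptr_t>(index) * elem_size + field_offset;
}
```

For example, `gep_model(0x1000, 2, 16, 8)` yields `0x1028` whether or not anything lives at `0x1000`; only a later load or store through that address would care.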
Not quite. There are no past or current object sizes, as gep operates on compile-time types. If you access two different objects from the same pointer location, you’ll need two different geps operating on two different struct types (or different offsets on the same opaque pointer).
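As a rough illustration of "two different geps on two different struct types", the sketch below computes field addresses from the same base using two unrelated layouts. The struct names `Header` and `Point` and both helper functions are assumptions made up for this example; the offsets come purely from the compile-time types, not from whatever object happens to be at the address.

```cpp
#include <cstddef>
#include <cstdint>

// Two hypothetical layouts that might be used to view the same location.
struct Header { std::uint32_t tag; std::uint32_t length; };
struct Point  { double x; double y; };

// Address of Header::length relative to `base`; the offset is a
// compile-time constant taken from the Header type.
std::uintptr_t header_length_addr(std::uintptr_t base) {
    return base + offsetof(Header, length);
}

// Address of Point::y relative to the same `base`; the offset differs
// only because the type differs.
std::uintptr_t point_y_addr(std::uintptr_t base) {
    return base + offsetof(Point, y);
}
```

Neither function knows which of the two types (if any) actually describes the memory at `base`, which is the same position gep is in.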
If you only use pointers, there is no way to calculate inbounds. If you use structs or arrays of structs, it’s only meaningful to calculate inbounds of a “potential” object of that type inside an array that is expected to be still valid.
To “know” the object size or whether it’s still allocated, you need runtime information or very clever compile-time semantics, neither of which is available to LLVM IR (it’s too low level).
All of that is controlled by the runtime. LLVM can only trust you’re doing “the right thing”.
This has been discussed in the context of MLIR, but only because we control semantics at that level.
To do this at the LLVM level you’d need to create operations / intrinsics / address spaces that are known to have certain properties (e.g. bounds, liveness, containership), which is a big departure from the current design and so is unlikely to work upstream.
This looks like something similar to pointer provenance and capability control. Take a look at CHERI papers.
As a proof of concept, I suggest you try some special intrinsics whose semantics a pass could infer using some global analysis, and rely on gep only for the address calculation (as it was intended). Long term, you should also take a look at MLIR as a way to keep those semantics in a dialect and then lower to LLVM at the end.