What if there are no objects at the address yet and the following store is “creating” an object?
Stores don’t create objects; those are created by allocations (e.g., alloca, call to malloc, global variable).
It looks like due to these semantic distinctions these two equivalent C functions compile to non-equivalent IR and are optimized differently: Compiler Explorer
I haven’t read closely the current C pointer provenance papers to be certain, but those two C functions are not in fact equivalent. My recollection of the current leading PVNI proposal is that the second function is UB if the four writes straddle an object boundary, whereas the first function is still defined in that case (because provenance is not tracked through integers).
If LLVM is unable to reason about inttoptrs due to the potential refinement issues (even though the source code doesn’t appear to have them, although I could be wrong), is there some forward way to make foo optimize the same as bar?
Probably not. inttoptr may look easy to optimize, but it has very nonobvious pointer provenance ramifications. However, the other issue with pointer provenance is that proper specification for it is still an area of active academic research. There’s no proper specification of pointer provenance in the C specification (though it is still generally seen as implied), and the specific wording for pointer provenance currently in the LLVM IR LangRef is itself somewhat inconsistent with the current understanding. This makes being very specific about what is and isn’t legal around pointer provenance, and especially inttoptr, challenging, and without that specificity, we need to be extremely wary about optimizing inttoptr. The standing guideline around inttoptr is to try to avoid generating it wherever possible, so that the hard questions around semantics just don’t kick in in the first place.
Stores don’t create objects; those are created by allocations (e.g., alloca , call to malloc , global variable).
What object does malloc create? What is its type and size? Same for alloca.
haven’t read closely the current C pointer provenance papers to be certain, but those two C functions are not in fact equivalent. My recollection of the current leading PVNI proposal is that the second function is UB if the four writes straddle an object boundary, whereas the first function is still defined in that case (because provenance is not tracked through integers).
Assume that p = (long)malloc(N);, where N > 4; What boundary could be straddled here?
Probably not.
Being unable to turn store i8, (inttoptr x+i) into store <i8 x N> (inttoptr x) is unfortunate as GCC doesn’t seem to have an issue optimizing such code and it doesn’t appear to be that uncommon pattern.
Alright, so it’s not in a higher level C++ sense. Object is just some “allocated slab of memory with a specific size”, correct?
So as long as we can guarantee that no stores/loads were done outside of original “memory slab” pointed to by inttoptr %p, or even if we are fine with UB if this happens (due to higher level semantics of the language the IR was generated from, although not sure how to express it in IR yet), this transformation is ok?
Assume that p = (long)malloc(N);, where N > 4; What boundary could be straddled here?
If we knew the object that p pointed to came from such a malloc, then sure, we could conceivably optimize it away [1]. But in the program fragment given, the compiler doesn’t know that p has to come from such a malloc.
[1] I would still be very hesitant to sanction such an optimization until pointer provenance is pinned down better; there’s too many edge cases of pointers having the same integer value yet different provenances to feel comfortable optimizing it without a formal semantics. This is a space with a long history of miscompiles due to underspecified provenance concerns.
Is there a way to express this notion in IR currently? Say, you want to generate IR for something similar to memset that takes intptr instead of regular pointer parameter but has the same “contract” that all accessed memory belongs to the same original allocation.
You mean the 2018 paper? Does it reflect the current state of things in LLVM (it’s been 5 years)? Haven’t finished it yet (especially the concrete operation semantics), but according to it does non-inbounds gep on wildcard-provenance ptr produce non-wildcard provenance? The examples look as if they only discuss inbounds gep, but the description (deferring UB to the use) reads as non-inbounds gep.
Maybe that’s why GCC has long-standing bugs like this or this where code using int2ptr casts gets compiled incorrectly.
If you are curious about the plethora of pitfalls around int-to-ptr casts, I will also shamelessly plug my blog post on the subject, and this follow-up.
I have another clarifying question. Is it correct to state that in the current model all objects/allocations are continuous? I.e. if LLVM sees load %p and load (gep %p, 42) it can (not sure if it currently does) assume that %p is dereferenceable(42), correct?
Isn’t that a wrong assumption though (assuming this is a non-inbounds GEP)? With mmap I can create allocations that have holes in them. I would assume that the following is legal:
have x point to an mmap’ed page
also map another page at x + 2*PAGE_SIZE, but no page mapped at x + PAGE_SIZE
load from x and from GEP x, 2*PAGE_SIZE
Both of these loads clearly access mapped dereferenceable memory. The GEP itself cannot be a problem since it has no inbounds. But if LLVM now assumes that x is dereferenceable(2*PAGE_SIZE) that is clearly wrong.
One might want to justify this by saying that each separate mmap is a separate allocation (like a separate malloc), but I don’t think this is a realistic assumption. Extending an existing mmap’ed region by mapping more pages at its end should be possible.
Surely several adjacent mmap, even if executed separately, should be a single allocation so that I can have a single large object on them.
If I implement an allocator, and use mmap to slowly page in more and more memory, I’ll absolutely want to have allocations that span pages that were brought in via separate mmap calls.
Wouldn’t it be a “single allocation” once it returns from your allocator “malloc”/new? I didn’t get an impression that any of the memory models attempted to support such tricks with mmap (it has lots of ways to customize which I don’t think make sense to deal with in detail).
You’ll just have to be careful to not deal with such allocations as a single object inside your allocator.
In some papers I’ve read, allocator is supposed to be some sort of an external “oracle”.
Each pointer can only reference 1 object (ignoring int-to-pointer casts for a second).
That implies that if one does load %p and load (gep %p, 42), then both loads access the same object and there’s no gap between them.
If you do multiple mmaps, each pointer return is either a different object if the call is marked with noalias, or is of some unknown object otherwise. In the latter case, LLVM doesn’t know the object, but it assumes that each object’s memory is layed out contiguously.
Ralf’s example can only be handled safely with integers and integer-to-pointer casts. Integers don’t have provenance, so they can hop between memory blocks.
TL;DR: you can’t access multiple memory blocks with the same pointer. That breaks the object provenance rule (that c++ also has).
64-bit address spaces are big, just reserve a giant slab of PROT_NONE memory and over-map as you need it, or something like that. That’s approximately what snmalloc does (and GHC has done for a long time IIRC).