Motivation
A large number of C library functions report errors by setting the errno variable. LLVM currently has no explicit way to model that a function can only write to errno in particular, so we have to make very conservative assumptions about which memory can be clobbered by such library calls.
In a previous discussion on the topic, I suggested solving this by emitting int TBAA metadata on such FP libcalls, and this did end up being implemented. While this does somewhat work, it is something of a hack and does not generalize well. For example, if the libcall can also legitimately access other memory (e.g. because it has pointer arguments), then this approach does not work.
Proposal
I propose to explicitly model errno by adding a new location kind to MemoryEffects. A function that only writes to errno would have memory effects memory(errno: write) or maybe memory(errnomem: write) to stick closer to the spelling of other locations. A function that can read arguments and write errno would be memory(argmem: read, errnomem: write).
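As a rough illustration, declarations under this proposal might look like the sketch below. It uses the errnomem spelling suggested above, which is proposed syntax rather than something LLVM is known to accept today, and @parse_value is a hypothetical function invented for the example.

```llvm
; Sketch only: "errnomem" is the location name proposed in this RFC.

; A math libcall without pointer arguments: the only memory it may
; touch is errno, and it only writes it.
declare double @sqrt(double) memory(errnomem: write)

; Hypothetical function that reads through its pointer argument and
; may set errno on failure.
declare i32 @parse_value(ptr) memory(argmem: read, errnomem: write)
```

The second declaration is the case the TBAA-based approach cannot express, since the call legitimately accesses memory other than errno.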
errno TBAA
Unfortunately, we cannot actually make a lot of assumptions about which pointers may alias errno. For all we know, a pointer passed as a function argument might actually be a pointer to errno.
For C-based languages with strict aliasing, one of the strongest guarantees we have is that errno needs to be accessed with int-compatible TBAA (which is also how the original TBAA-based hack came about). A complication here is that in the proposed representation using memory effects, we don't actually know what the relevant TBAA looks like, as it can be freely determined by the frontend, and we presumably wouldn't want to hardcode the specific format produced by Clang in LLVM.
As such, I'm also proposing to add named metadata !llvm.errno.tbaa, which specifies the TBAA metadata for an integer access:
!llvm.errno.tbaa = !{!7}
!7 = !{!"int", !8, i64 0}
!8 = !{!"omnipotent char", !9, i64 0}
!9 = !{!"Simple C++ TBAA"}
Then, if there is an access with, say, float TBAA for the same root, we would know that it does not alias with memory(errnomem: readwrite).
To support module merging, I think that !llvm.errno.tbaa should accept a list of TBAA nodes, and we can conclude no alias if we don't alias for any one of them.
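To make the intended reasoning concrete, here is a small sketch of a module where the errno TBAA would let alias analysis separate a float access from an errno-writing call. The errnomem attribute and the !llvm.errno.tbaa metadata are the proposed spellings from this post. The float nodes (!10, !11) and the function @f are assumptions added for the example.

```llvm
; Sketch only: relies on the proposed errnomem location and the
; proposed !llvm.errno.tbaa named metadata.
declare double @sqrt(double) memory(errnomem: write)

define float @f(ptr %p, double %x) {
  %a = load float, ptr %p, align 4, !tbaa !10
  ; May write errno, but nothing else.
  %r = call double @sqrt(double %x)
  ; Float TBAA under the same root does not alias the int node listed
  ; in !llvm.errno.tbaa, so this load could in principle reuse %a.
  %b = load float, ptr %p, align 4, !tbaa !10
  %s = fadd float %a, %b
  ret float %s
}

!llvm.errno.tbaa = !{!7}
!7 = !{!"int", !8, i64 0}
!8 = !{!"omnipotent char", !9, i64 0}
!9 = !{!"Simple C++ TBAA"}
; Assumed float access tag and type node, sharing the root above.
!10 = !{!11, !11, i64 0}
!11 = !{!"float", !8, i64 0}
```

Had %p been accessed with int TBAA instead, the call would have to be treated as possibly clobbering it, because %p might point to errno.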
Other ways to determine no alias
Beyond TBAA, there are probably also a few other ways to determine that accesses do not alias with memory(errnomem: ...).
For example, we should be able to say that alloca memory never aliases errno, even if it escapes.
We can probably also say that an access larger than sizeof(int) does not alias errno (but a language expert would have to confirm that).
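The alloca case could look like the following sketch. It again uses the proposed errnomem spelling. The functions @escape and @g are hypothetical names made up for the example. Note that the accesses are i32, so TBAA could not help here even if it were present.

```llvm
; Sketch only: "errnomem" is the location name proposed in this RFC.
declare double @sqrt(double) memory(errnomem: write)
declare void @escape(ptr)

define i32 @g(double %x) {
  %slot = alloca i32, align 4
  store i32 1, ptr %slot, align 4
  ; The stack slot escapes, so unknown code may hold its address.
  call void @escape(ptr %slot)
  store i32 2, ptr %slot, align 4
  ; Can only write errno. If alloca memory never aliases errno, this
  ; call cannot modify %slot even though %slot has escaped.
  %r = call double @sqrt(double %x)
  ; Under that rule, this load could be forwarded from the store of 2.
  %v = load i32, ptr %slot, align 4
  ret i32 %v
}
```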
Auto-upgrade
Because errno is currently part of the other location, the semantics of existing textual IR remain unchanged.
For bitcode, we will explicitly upgrade MemoryEffects to copy the modref value of the other location to the errno location.