I’ve gotten a fantastic bug report. Consider the LLVM IR:
target triple = “x86_64-unknown-linux-gnu”
define internal i8* @access({ i8* }* %arg, i64) {
ret i8* undef
}
define i8* @g({ i8* }* %arg) {
bb:
%tmp = alloca { i8* }, align 8
store { i8 }* %arg, { i8* }** %tmp, align 8
br i1 undef, label %bb4, label %bb1
bb1:
%tmp2 = load { i8* }, { i8 }** %tmp, align 8
%tmp3 = call i8* @access({ i8* }* %tmp2, i64 undef)
br label %bb4
bb4:
ret i8* undef
}
This IR, if compiled with opt -passes='cgscc(inline,argpromotion)' -disable-output
hits a bunch of asserts in the LazyCallGraph.
The problem here is that argpromotion
turns a normal looking function i8* @access({ i8* }* %arg, i64)
and turn it into a magical function i8* @access(i8* %arg, i64)
. This latter signature is the POSIX access
function that LLVM’s TargetLibraryInfo
knows magical things about.
Because some library functions known to TargetLibraryInfo
can have calls to them introduced at arbitrary points of optimization (consider vectorized variants of math functions), the new pass manager and its graph to provide ordering over the module get Very Unhappy when you introduce a definition of a library function in the middle of the compilation pipeline.
And really, we do not want argpromotion
to do this. We don’t want it to turn some random function by the name of @free
into the actual @free
function and suddenly change how LLVM handles it.
So what do we do?
One option is to make argpromotion
and every other pass that mutates a function’s signature rename the function (or add a nobuiltin
attribute to it). However, this seems brittle and somewhat complicated.
My proposal is that we admit that certain names of functions are reserved in LLVM’s IR. For these names, in some cases any function with that name will be treated specially by the optimizer. We can still check the signatures when transforming code based on LLVM’s semantic understanding of that function, but this avoids any need to check things when mutating the signature of the function.
This would require frontends to avoid emitting functions by these names unless they should have these special semantics. However, even if they do, everything should remain conservatively correct. But I’ll send an email to cfe-dev suggesting that Clang start “mangling” internal functions that collide with target names. I think this is important as I’ve found a quite surprising number of cases where this happens in real code.
There is no need to auto-upgrade here, because again, LLVM’s handling will remain conservatively correct.
Does this seem reasonable? If so, I’ll send patches to update the LangRef with these restrictions. I’ll also take a quick stab at generating some example tables of such names from the .td files used by TargetLibraryInfo
already. These can’t be authoritative because of the platform-specific nature of it, but should help people understand how this area works.
One alternative that seems appealing but doesn’t actually help would be to make TargetLibraryInfo
ignore internal functions. That is how the C++ spec seems to handle this for example (C library function names are reserved only when they have linkage). But this doesn’t work well for LLVM because we want to be able to LTO an internalized C library. So I think we need the rule for LLVM function names to not rely on linkage here.
Thanks,
-Chandler