I recently ran into issues using the byval attribute. I was adding it to my functions that take arguments by value, but as pointers. I noticed that the optimizer would totally butcher my code if I only added the byval attribute to the function declaration and not at the call site. Specifically, it would simply remove any code that initialized data that I was passing in my byval arguments, so it would just pass a pointer to uninitialized memory from a previous alloca. Adding byval to the relevant arguments at the call site seemed to fix the problem.
This is very confusing to me as none of the documentation I've read says anything about call site attributes (to the point where I didn't even think you could set attributes at the call site). Why do I need to set attributes at the call site if they're already on the function declaration? Am I using this attribute wrong? Is this covered somewhere in the docs?
For additional context, here is a post I made about this on r/llvm with specific code samples: Reddit - Dive into anything.
"function args": argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
The IR Verifier is supposed to catch these kinds of discrepancies, so it would make sense to run it on your IR output.
The background here is that for indirect calls like call void %foo(), where %foo is an argument/instruction rather than a global, the function declaration is not known, so the only place where ABI-affecting attributes can be specified is the call-site. Anything that affects the ABI (including the function type and the ABI attributes) needs to be the same at the call-site and the function definition, to ensure that the caller passes arguments the same way as the callee receives them. (To a first approximation; I'm glossing over some details here.)
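To make the indirect-call case concrete, here is a minimal sketch in LLVM IR. The names %struct.Pair, @callee and @caller are made up for illustration, and the syntax assumes a recent LLVM with opaque pointers. Since @caller only sees a function pointer, the call instruction is the only place where byval can be written:

```llvm
; Hypothetical struct type, for illustration only.
%struct.Pair = type { i32, i32 }

; The callee expects its argument to be passed byval.
define void @callee(ptr byval(%struct.Pair) align 4 %p) {
  ret void
}

define void @caller(ptr %fn) {
  %tmp = alloca %struct.Pair, align 4
  store i32 1, ptr %tmp, align 4
  %second = getelementptr inbounds %struct.Pair, ptr %tmp, i32 0, i32 1
  store i32 2, ptr %second, align 4
  ; Indirect call: no declaration is visible through %fn, so the
  ; ABI attributes have to be spelled out on the call itself.
  call void %fn(ptr byval(%struct.Pair) align 4 %tmp)
  ret void
}
```

If %fn ends up pointing at @callee at runtime, the caller and callee agree on how the argument is passed only because the call site repeats the same byval type and alignment.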
A mismatch is still valid IR, just undefined behavior at runtime, which is why the IR verifier does not report this. This is something the IR linter (-passes=lint) should report, but currently doesn't.
The reason why you mostly get away with only placing the attributes on the function declaration is that LLVM usually propagates attributes from the declaration to the call-site for optimization purposes, which also carries over the ABI attributes. This isn't guaranteed though, and you hit one of the cases where it does not happen.
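As a rough illustration of the direct-call case from the original question (the names %struct.S, @consume, @mismatched and @consistent are hypothetical, and the syntax assumes opaque pointers), both functions below are accepted by the verifier, but only the second one states the ABI consistently:

```llvm
; Hypothetical struct type, for illustration only.
%struct.S = type { i64, i64 }

declare void @consume(ptr byval(%struct.S) align 8)

define void @mismatched() {
  %arg = alloca %struct.S, align 8
  store i64 1, ptr %arg, align 8
  ; byval appears on the declaration only. This is still valid IR,
  ; but the call site and the callee now disagree about the ABI.
  call void @consume(ptr %arg)
  ret void
}

define void @consistent() {
  %arg = alloca %struct.S, align 8
  store i64 1, ptr %arg, align 8
  ; The call site repeats byval with the same type and alignment.
  call void @consume(ptr byval(%struct.S) align 8 %arg)
  ret void
}
```

A front-end that emits the second form for every call does not depend on the optimizer copying attributes over from the declaration.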
All this doesn't appear to be well-documented in LangRef. We should improve that.
Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
This is a common problem run into by authors of front-ends that are using custom calling conventions: you need to make sure to set the right calling convention on both the function and on each call to the function.
ABI argument attributes are sort of a calling convention extension mechanism, so you need to set them on both the call site and function declaration for the same reasons.
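A small sketch of the calling convention case, using made-up names (@fast_helper, @bad_caller, @good_caller) and the built-in fastcc convention as a stand-in for whatever convention a front-end actually uses:

```llvm
; The definition uses a non-default calling convention.
define fastcc void @fast_helper() {
  ret void
}

define void @bad_caller() {
  ; Mismatch: the call uses the default C convention while the callee
  ; is fastcc. The verifier accepts this, but executing it is
  ; undefined behavior.
  call void @fast_helper()
  ret void
}

define void @good_caller() {
  ; The convention is repeated on the call, matching the definition.
  call fastcc void @fast_helper()
  ret void
}
```

The same rule of thumb applies to ABI attributes such as byval: whatever is written on the function also has to be written on each call to it.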
As Nikita mentioned, LLVM will sometimes look through direct call sites to see some ABI attributes, but it really shouldn't. I believe Arthur attempted to stop looking through direct call sites, but it breaks a ton of instrumentation passes, which then need to start adding sext annotations and similar attributes.
As for documenting this, I think the paragraphs on parameter attributes could use some work and should communicate this.