Hi all-
I'm working on an attribute implementation (cpu_dispatch/cpu_specific from ICC; an RFC is coming once I have a better idea of the implementation details) that permits multiple definitions of a function. All definitions of the function are emitted, each with a different name mangling. One version of the function keeps the normal name mangling and linking, so all call expressions should call that one.
My question is: Is there a way to mark a FunctionDecl as 'don't find me in lookup' such that it never ends up being a "Callee"? Or should I be figuring out how to change the lookup-results here?
My first guess would be that you want to modify overload resolution. I think the CUDA host/device attributes are involved there. You can also look at the implementation of __attribute__((enable_if)).
However, do we really need a FunctionDecl for every subtarget specialization? How does a user trigger specialization? Can they do something like take a specialization and instantiate a class template with it?
Each ‘foo’ implementation is emitted with different name mangling. They are generally all considered the same function; function pointers, references, etc. will all refer to the ‘dispatch’ version of the function.
The dispatch function will be implemented in terms of an iFunc.
I’m using the ‘emit’ functionality from attribute-target to get the target-cpu emitted properly, however I hadn’t realized it allowed multiple function definitions. I’ll look into that one as well.
The difference as far as I can tell is that the dispatch is done at Load/Runtime. It checks on “CPUID” to determine which function to call. As far as I can tell, target simply sends optimization hints to the backend, right?
I think this is intended to be a superset of the ‘target’ functionality.
You can set up automagic dispatch using ifunc or the rest of that (they do in gcc), just a lot of that functionality isn’t wired up.
At a function definition level it’s exactly the same
At any rate, please do keep me on reviews for this. I don’t believe the mechanics in general should be any different.
[Keane, Erich] I’m currently implementing in terms of ifunc, though I’m not sure what you mean here? Are we missing some functionality of ‘target’ that would make it a lot more similar?
[Keane, Erich] The difference is that this allows multiple definitions of the same function, which does not seem to be the case in target. Am I missing something else here?
[Keane, Erich] I definitely will! I was hoping to do ‘in progress’ reviews once I’m confident with the direction.
Thanks for the link! I definitely see that this is very similar then! I DID try the first example on your link, and it fails because of ‘redefinition of foo’.
It actually seems that cpu_dispatch/cpu_specific is perhaps even a subset of GCC’s target (since it only handles the ‘arch=’ tests). The only real difference after that would be the linked names, but perhaps that is something acceptable to us as well (or something that could be kept internal).
Perhaps my best goal here would instead be to help you (or have you guide me) implement ‘target’ the rest of the way (ifunc, multiple definitions, changing the name mangling), and then create cpu_dispatch/cpu_specific as an alias for ‘target’.