[RFC] Attribute for creating function clones with callsite information

Hello everyone! We would like to propose a new attribute, callsite_wrapper, that replaces a function with a wrapper which is instantiated for every callsite and has access to the source location (file, line) of that callsite.

The main goal of this RFC is to discuss the possibility of having such an attribute in Clang, and the secondary goal is to discuss possible implementation. We have created a proof of concept for this attribute, so we know it is possible, but it would be nice to find an alternative, simpler implementation.

Problem

In the Linux kernel, we have a need to collect information about every memory allocation and attribute this information to the place where the allocation originated. We need to achieve this with almost zero performance penalty, so regular memory sanitizing instrumentation is not a solution for us.

To do this, we currently use the preprocessor to replace all memory-allocating functions with a block that creates a static variable with callsite information (file, line, etc.) and accumulated statistics (e.g., how much memory was allocated/freed). This allows us to prepare these structures for every callsite at compile time, and the only additional thing happening at runtime is accumulating statistics, so the performance of this solution is quite good. You can see the implementation here and usage here. I will put these examples here for convenience:

#define vmalloc(...)		alloc_hooks(vmalloc_noprof(__VA_ARGS__))

#define alloc_hooks(_do_alloc)						\
({									\
	DEFINE_ALLOC_TAG(_alloc_tag);					\
	alloc_hooks_tag(&_alloc_tag, _do_alloc);			\
})

#define DEFINE_ALLOC_TAG(_alloc_tag)						\
	static struct alloc_tag _alloc_tag __used __aligned(8)			\
	__section(ALLOC_TAG_SECTION_NAME) = {					\
		.ct = CODE_TAG_INIT,						\
		.counters = &_shared_alloc_tag };

#define CODE_TAG_INIT {					\
	.modname	= CT_MODULE_NAME,		\
	.function	= __func__,			\
	.filename	= __FILE__,			\
	.lineno		= __LINE__,			\
	.flags		= 0,				\
}

We only need macros because we need source location substitutions (__FILE__, __LINE__, etc.) to point to the callsite. Macros also help to “instantiate” these static variables for every callsite. But the problem with using the preprocessor is that it is not context-aware and will replace everything that is named like a memory-allocating function. For example, if you name a variable malloc to store a pointer to an allocating function, you would get a compile error because the macro would try to replace its name and fail. And yes, taking a pointer to an instrumented allocation function is also not possible with macros.
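To make the pattern concrete outside the kernel tree, here is a minimal user-space sketch of the same idea. All names in it (site_tag, tagged_malloc, tagged_malloc_impl, last_tag) are made up for illustration and are not the kernel's API; it also assumes GNU statement expressions, which GCC and Clang support as an extension.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-callsite record, loosely modelled on struct alloc_tag. */
struct site_tag {
	const char *file;
	int line;
	size_t calls;
	size_t bytes;
};

/* Demo-only helper: remembers the tag used by the most recent call. */
static struct site_tag *last_tag;

/* Runtime part: update the counters of one callsite, then allocate. */
static void *tagged_malloc_impl(struct site_tag *tag, size_t size)
{
	tag->calls++;
	tag->bytes += size;
	last_tag = tag;
	return malloc(size);
}

/*
 * Every expansion creates its own static tag inside its own block, so
 * __FILE__ and __LINE__ name the callsite, not the allocator.
 * ({ ... }) is a GNU statement expression, not standard C.
 */
#define tagged_malloc(size)						\
({									\
	static struct site_tag _tag = { __FILE__, __LINE__, 0, 0 };	\
	tagged_malloc_impl(&_tag, (size));				\
})
```

Calling tagged_malloc() three times in a loop updates one tag three times, while a second tagged_malloc() elsewhere in the source gets a separate tag with its own counters.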

Proposed solution

Instead of having a “preprocessor function,” we can have a regular function that has these properties:

  1. Every call to this function will instantiate a new function, so that each callsite could have its own static variables.
  2. Every source location expression like __builtin_FILE() in the function should point to the callsite. We can’t use preprocessor macros like __FILE__ because the preprocessor will replace them before the function is instantiated.

I have created a proof of concept for this attribute here.

It integrates this feature in Sema and uses a combination of SubstDecl and InstantiateFunctionDefinition to instantiate function clones, see BuildCallsiteWrapperDeclarationNameExpr. Then it customizes TreeTransform, so that SourceLocExpr in the instantiated function gets Context and Loc from the callsite. The rest of the code is needed for intercepting function dereferences (see BuildDeclarationNameExpr) and to make instantiation work properly outside of template context.

Alternative solutions

There are no existing solutions in the compiler to achieve the same behavior and performance as we have with the preprocessor implementation. However, it is worth mentioning a few relevant things:

Default arguments with source location expressions:

void foo(const char* file = __builtin_FILE()) {
   ...
}

Default arguments are only allowed in C++, not in C. Even if we had default arguments in C, this solution doesn’t clone functions and doesn’t allow per-callsite static storage.
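The storage limitation can be shown in plain C by passing the location explicitly, which is roughly what a default argument does on behalf of the caller. This is only a sketch; note_call, NOTE_CALL and call_stats are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical statistics record for this sketch. */
struct call_stats {
	const char *last_file;
	int last_line;
	size_t calls;
};

/* The callsite is passed in, but the function body exists only once. */
static struct call_stats *note_call(const char *file, int line)
{
	/* One static for the whole function, shared by every caller. */
	static struct call_stats stats;

	stats.last_file = file;
	stats.last_line = line;
	stats.calls++;
	return &stats;
}

/* C has no default arguments, so a macro supplies the location. */
#define NOTE_CALL() note_call(__FILE__, __LINE__)
```

Two different callsites of NOTE_CALL() get back the same object, so the location is known at each call but the statistics cannot be kept per callsite without a lookup at runtime.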

Templates

It would be nice to have a non-type template argument that defaults to a source location expression:

template <int line = __builtin_LINE()>
void foo() {
    ...
}

This would achieve our goal of instantiating different functions for different callsites. But templates currently don’t have support for non-type string variables, which we would need for __builtin_FILE(). And we don’t have templates in C. Any attempt to bring them there, even without template syntax, is doomed. See my talk at the LPC 2023 conference.

FWIW, Rust calls this #[track_caller]. See docs here: Code generation - The Rust Reference

Thank you! I’ll look in detail at how they implemented it. Do you know where in their codebase the implementation is located? A quick search in rust-lang/rust revealed a lot of uses of this attribute, but I couldn’t find the implementation there. I did find their RFC, though.

I will also check what design decisions they made. The most interesting cases for me are calling track_caller from track_caller (nesting) and track_caller calling itself (recursion).

And I’ll probably name my attribute the same :)

I still don’t quite get the motivation for this, but I’m hopeful others can comprehend that better than me.

As far as the implementation: implementing this in the AST leaves me very concerned how this interacts with existing instantiations and other locations of instantiation (like how does this handle being called in a constant evaluation, etc? As a template argument, etc). It seems to me that this is more akin to the 'function multi-versioning' where we generate these instead during codegen. But again, I’m perhaps misunderstanding the use of this.

But templates currently don’t have support for non-type string variables

C++23/26 has some additional NTTP support that might be able to do something like this with some finagling.

It seems to me that this is more akin to the 'function multi-versioning' where we generate these instead during codegen.

The main problem for us with this approach is that multiversioned functions don’t multiversion the declarations inside them. See this example: here we have only one @foo_wrapped()::counter, which is reused in every version of the function.

For our use case we need each clone to have its own independent static variables. When we put these statics into a separate section, we can iterate through them and collect statistics for each callsite. More specifically, it allows us to dump, for each callsite, how much memory it has allocated and not yet freed, so we can find the source of memory leaks, for example.
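Here is a rough user-space sketch of the "separate section" part. It assumes an ELF target linked with GNU ld or lld, which define __start_<name> and __stop_<name> symbols for sections whose name is a valid C identifier. The section name site_tags and all other names are invented for this example, and it only counts requested bytes (frees are not tracked).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-callsite record; its size is a multiple of 8 bytes. */
struct site_tag {
	const char *file;
	long line;
	size_t bytes;
};

/* Section bounds provided by the linker (assumption: ELF, GNU ld or lld). */
extern struct site_tag __start_site_tags[];
extern struct site_tag __stop_site_tags[];

/*
 * Each expansion places its own static tag into the "site_tags" section.
 * aligned(8) keeps the records densely packed so the section can be
 * walked like an array; "used" stops the compiler from dropping them.
 */
#define tracked_malloc(size)						\
({									\
	static struct site_tag _tag					\
	__attribute__((used, aligned(8), section("site_tags"))) =	\
		{ __FILE__, __LINE__, 0 };				\
	size_t _n = (size);						\
	_tag.bytes += _n;						\
	malloc(_n);							\
})

/* Number of callsites the linker collected into the section. */
static size_t count_sites(void)
{
	return (size_t)(__stop_site_tags - __start_site_tags);
}

/* Walk every callsite record and add up the bytes requested there. */
static size_t total_bytes(void)
{
	size_t total = 0;

	for (struct site_tag *t = __start_site_tags; t < __stop_site_tags; t++)
		total += t->bytes;
	return total;
}
```

The program must contain at least one tracked_malloc() callsite, otherwise the section does not exist and the __start/__stop symbols fail to link.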

I hope this helps explain the motivation. I will also look at how track_caller in Rust is used: I found a lot of uses of it there, so I might find more motivation.

implementing this in the AST leaves me very concerned how this interacts with existing instantiations and other locations of instantiation (like how does this handle being called in a constant evaluation, etc? As a template argument, etc).

My point of view is that reusing instantiation code is safe: it works for templates, so it should work for us, because we want these functions to behave like template instantiations. And it would make more sense to implement the whole feature like NTTP default arguments. But we need this feature in the Linux kernel, which is C only. Emitting FunctionTemplateDecl in C and trying to use the whole template machinery would require an enormous number of hacks in many parts of the Clang frontend. Believe me, I tried.

So what if we limit this attribute to C only, where we have far fewer things to worry about?

When you call InstantiateFunctionDefinition, that’s a request to instantiate a function as if it were a C++ template function. So you’re pulling in all the C++ semantics involved in templates: dependent typing, recursive instantiation, C++ name mangling, etc. That’s very hard to reason about. And it doesn’t interact well with regular C++ templates.

The Linux kernel already abuses always_inline to achieve results similar to templates; I’m not really eager to stretch the limits in another direction.

For example, if you name a variable malloc to store a pointer to an allocating function,

I don’t think there’s any way to write a variable named malloc that would trigger macro replacement for #define malloc(...).

Some sniffing around found the requires_caller_location method in compiler/rustc_middle/src/ty/instance.rs. This looks to be where codegen asks “does this function use #[track_caller]?” and seems to have much better “actual effects” results when searching for it.

AFAIK, the implementation is basically as a hidden parameter to the function that is accessible via std::panic::Location::caller(). This is then noticed by the callsite to inject either the location of the call or, if the caller itself uses #[track_caller], the parameter it received. This means that one can have a stack of #[track_caller] functions and the first entry into the stack will be the location for all of them.
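As a rough C analogy of that hidden-parameter model (not how rustc actually implements it), the location can be written as an explicit extra argument. The names location, HERE, inner_at and outer_at are made up for this sketch.

```c
/* Hypothetical stand-in for the caller location Rust passes implicitly. */
struct location {
	const char *file;
	int line;
};

/* Location of the place where this macro is expanded. */
#define HERE ((struct location){ __FILE__, __LINE__ })

/* A "tracked" function takes the caller location as an extra parameter. */
static struct location inner_at(struct location caller)
{
	return caller;	/* report whoever was recorded as the caller */
}

/* A tracked function calling another tracked one forwards what it got. */
static struct location outer_at(struct location caller)
{
	return inner_at(caller);
}

/* Ordinary callers inject their own location at the callsite. */
#define inner() inner_at(HERE)
#define outer() outer_at(HERE)
```

With this shape, a call to outer() reports the line of that call, not the line inside outer_at() where inner_at() is invoked, which matches the "first entry into the stack" behaviour described above.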

I mean, I candidly fail to see why this is an issue. Sure, if you make X a macro, you may have trouble using X w/o it being interpreted as a macro, but you can just give the underlying function a different name then and just use that instead if you don’t want to expand the macro. I don’t think adding a compiler feature to replace a particular macro constitutes a practical solution to the problem of macros ‘shadowing’ other names…

Moreover, this seems like a really narrow use case for an attribute: stamping out a new static variable that contains very specific information every time a function is called doesn’t seem like something that would see much (or any) use outside of this.

Also, this. Since the name of a function-like macro is only treated as a macro if it is followed by a ( token, declaring an entity (or e.g. taking the address of one) with the same name shouldn’t be an issue:

#define foo(x) expands to nonsense
int foo;        // Ok
int *x = &foo;  // Likewise
void (foo)() {} // Also ok. The extra parens around `foo` prevent expansion here.
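A compilable variant of the same rules, in case it helps. The names here are hypothetical, and the macro deliberately adds 1000 so that it is visible which path was taken.

```c
/* twice is only treated as a macro when the next token is '('. */
#define twice(x) (2 * (x) + 1000)

/* Parenthesized name: no expansion, so this defines a real function. */
static int (twice)(int x)
{
	return 2 * x;
}

/* No '(' after the name, so this takes the function's address. */
static int (*twice_ptr)(int) = twice;

/* Expands the macro. */
static int via_macro(void)
{
	return twice(1);
}

/* The extra parentheses suppress expansion and call the function. */
static int via_function(void)
{
	return (twice)(1);
}

/* Calls the function through the pointer; twice_ptr is not a macro. */
static int via_pointer(void)
{
	return twice_ptr(1);
}
```

Here via_macro() returns 1002, while via_function() and via_pointer() both return 2.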