Question about "structs" and performance

It has been brought to my attention that a front end should avoid generating aggregates to represent things like a C “struct” (see “Performance Tips for Frontend Authors” in the LLVM 20.0.0git documentation). My question, then, is what the optimal thing to do with them is (not necessarily what clang does, because of its ABI baggage). Is it optimal to “atomize” such structs in the generated code, meaning to treat each of their members as an independent expression? Would it be better to pass a struct “by value” to a function by instead passing its members individually by value? Or is that approach also a performance pessimization? Should I instead always create allocas for structs and pass even logically by-value structs to functions by their address (and if so, what annotations do I need to add to indicate that the memory is not modified and the address doesn’t escape)? When I need to return multiple values and/or a struct “by value” from a function, should I be creating an aggregate to return them all in, or should I instead be generating a function that also takes a pointer “out parameter” to write results to?


The tip is to avoid having values of types that are aggregate or array. If you can do something without creating such a value, then do it that way. Specifically, if you want to access members, use getelementptr with the aggregate type to get the address of the specific member.
If you need to pass a struct to a function, do what the ABI suggests.
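
As a rough analogy in C (not LLVM IR), the pattern described above looks like the sketch below: compute the address of one member and load or store only that scalar, rather than materializing the whole struct as a value. The names `Point`, `point_get_y`, and `point_set_y` are hypothetical and exist only for illustration.

```c
/* Hypothetical struct used only for this illustration. */
struct Point { int x, y, z; };

/* Read one member through its address. The address computation plays the
 * role of getelementptr, and the dereference is a scalar load. */
int point_get_y(const struct Point *p) {
    const int *y_addr = &p->y;  /* address of the member only */
    return *y_addr;             /* load a single int, not the whole struct */
}

/* Write one member through its address: a scalar store. */
void point_set_y(struct Point *p, int v) {
    int *y_addr = &p->y;
    *y_addr = v;
}
```

The struct itself only ever lives in memory here; no value of the aggregate type is created.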

That doesn’t really answer my questions. There are several ways I could avoid creating aggregate values, and I am asking which of those ways is generally best. Similarly, there is no existing ABI to constrain how I pass a “struct” to a function, and in the absence of such constraints I am asking what would generally produce the best codegen.

The answer here depends on the size of the struct. If the struct is large, you will want to pass it by pointer (and return via sret pointer). If it is small, you will want to pass it in scalarized form.
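
To make the two shapes concrete, here is a minimal sketch in C, standing in for whatever a front end would emit. The types `Vec2` and `Mat4` and both function names are hypothetical, and which struct counts as "small" or "large" is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical small struct: two scalar members. */
struct Vec2 { float x, y; };

/* Hypothetical large struct: sixteen scalar members. */
struct Mat4 { float m[16]; };

/* Small struct, scalarized: each member of the two logical Vec2 arguments
 * becomes its own by-value parameter. */
float vec2_dot_scalarized(float ax, float ay, float bx, float by) {
    return ax * bx + ay * by;
}

/* Large struct: the input is passed by pointer to memory the callee only
 * reads, and the result is written through an out pointer (sret style).
 * The caller must pass distinct, non-overlapping storage for out and in. */
void mat4_transpose_sret(struct Mat4 *out, const struct Mat4 *in) {
    for (size_t r = 0; r < 4; ++r)
        for (size_t c = 0; c < 4; ++c)
            out->m[c * 4 + r] = in->m[r * 4 + c];
}
```

In LLVM IR, the out pointer is where the `sret` attribute would go, and parameter attributes such as `readonly`, `noalias`, and `nocapture` are the usual way to state that the callee does not write through, alias, or retain a pointer. The exact spelling depends on the LLVM version, so check the LangRef for the release you target.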

There is no choice that will always be optimal; you can only have heuristics. For platform ABIs, these heuristics can be fairly non-trivial (e.g. with special cases for homogeneous aggregates). A very simple baseline heuristic would be to pass structs with up to two members by value, in scalarized form, and anything larger by pointer.
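
The baseline heuristic could be sketched as a small classifier that a front end consults when lowering a parameter. This is only an illustration: the names `PassMode`, `MAX_SCALARIZED_MEMBERS`, and `choose_pass_mode` are hypothetical, and counting members is a simplification, since a real rule would more likely look at sizes and types after flattening nested structs.

```c
#include <stddef.h>

/* How a logically by-value struct parameter gets lowered. */
enum PassMode { PASS_SCALARIZED, PASS_BY_POINTER };

/* Assumed threshold taken from the baseline heuristic: up to two members
 * are passed as individual scalar arguments. */
#define MAX_SCALARIZED_MEMBERS 2u

/* Decide the lowering from the number of scalar members in the struct. */
enum PassMode choose_pass_mode(size_t member_count) {
    return member_count <= MAX_SCALARIZED_MEMBERS ? PASS_SCALARIZED
                                                  : PASS_BY_POINTER;
}
```

Keeping the threshold in one place makes it easy to tune per target once you have measurements.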

For internal linkage functions, LLVM will convert aggregates passed by pointer into scalarized arguments when it thinks doing so would be profitable (“argument promotion”).

This tip is about avoiding unnecessary obstructions for the compiler. Outside of that, you’re free to decide what you want to do. Some of those decisions will depend on your target architecture. ABIs describe how to pass structures to functions: small structures will often be passed in registers, while larger structures need to be passed by reference. Make some initial decision on what your ABI (the calling convention in particular) will be, and work with it. If you want some guidance there, look at existing ABIs.