Why does clang do a memcpy? Is the cast not enough? (ABI function args)

I'm implementing function arguments and tested this code in C:

// clang -emit-llvm ll_struct_arg.c -S -o /dev/tty
typedef struct vpt_data {
    char a;
    int b;
    float c;
} vpt_data;

void vpt_test( vpt_data vd ) {
}

int main() {
    vpt_data v;
    vpt_test( v );
}

This emits odd LLVM IR that casts to the desired struct type, but also
memcpy's to a temporary structure. I'm unsure why the memcpy is done
instead of just casting directly.

define i32 @main() #0 {
  %v = alloca %struct.vpt_data, align 4
  %1 = alloca { i64, float }, align 4
  %2 = bitcast { i64, float }* %1 to i8*
  %3 = bitcast %struct.vpt_data* %v to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 12, i32 4, i1 false)
  %4 = getelementptr inbounds { i64, float }, { i64, float }* %1, i32 0, i32 0
  %5 = load i64, i64* %4, align 4
  %6 = getelementptr inbounds { i64, float }, { i64, float }* %1, i32 0, i32 1
  %7 = load float, float* %6, align 4
  call void @vpt_test(i64 %5, float %7)
  ret i32 0
}

Because you are passing the parameter by value? It *should* copy the
data. In this particular case the copy will probably be elided if you
turn on optimization, but it is usually better to pass structs via a
const reference or pointer.


I understand it's passing by value, that's what I'm testing here. The
question is why does it copy the data rather than just casting and
loading values from the original variable (%v) ? It seems like the
copying is unnecessary.

Not all structs result in the copy, only certain forms -- others are
just cast directly as I was expecting. I'm just not clear on what the
differences are, and whether I need to do the same thing.

It is a matter of the calling convention, which specifies which structs are passed in registers and which are passed on the stack.


Yes, I understand that as well (it's what I'm trying to recreate in my
language now).

I'm really wondering why it does the copy, since from what I can tell it
could just as easily cast the original value and do the load without the
memcpy operation.

That is, the question is about the memcpy and extra alloca -- I
understand what it's doing, just not why it's doing it this way.

This is the standard way of copying memory in the IR. Backends can expand the memcpy into loads/stores if they want.


Yes, but why is it even copying the memory? It already has a pointer
which it can cast and load from -- and does so in other scenarios.

I'm wondering whether this copying is somehow required and I'm missing
something, or it's just an artifact of the clang emitter. That is, could
it not omit the memcpy and cast the original variable?

It needs to LOAD the data. It is FASTER to do a memcpy (if the data is large enough) than to do a “load”. If you actually convince the compiler to do a load, it will produce enough 32- or 64-bit LOAD/STORE pairs to copy the data. Not only does this bloat the code, it is also likely slower than running memcpy as a loop.

For SMALL copies, memcpy gets replaced by simple load/store instructions anyway in the memcpy optimisation pass, so it is not an overhead.

I know this, because I had to implement a similar thing in my Pascal compiler to avoid it exploding when trying to use a "record" (Pascal's "struct") with an array of 16000 ints - it generated several thousand LOAD and STORE instructions for each function call. Which made the whole thing take almost forever, and the code generated was terrible. Calling memcpy instead solved the problem.

I believe the memcpy is there just as a consequence of Clang’s design - different parts of the compiler own different pieces of this, so in some sense one hand doesn’t see what the other is doing. Part of it is “create an argument” (memcpying the local variable into an unnamed value) and then the next part is “oh, but that argument gets passed in registers, so decompose it into registers again”.

Clang doesn’t need to produce perfectly optimal IR - because the optimization pipeline of LLVM will clean things up. So in many cases it’s just easier (& not a significant impediment to performance) to have some of these sort of redundancies/oddities in output, and just let the LLVM optimization pipeline clean them up later.

Thanks. That kind of makes sense.

I see that a lot in my code as well: poor IR structures that aren't
worth the effort to clean up since the LLVM passes do such a fine job of it.

Turns out I now have the same copying structure in my ABI support code,
though I use Store instead. :slight_smile: