How to prevent insertion of memcpy()

Hi,

I have the following program:

// test.c

#include <stdlib.h>
struct foo_t {
  int x[1024];
};
__thread struct foo_t g_foo;
void bar(struct foo_t* foo) {
  g_foo = *foo;
}
int main() {
  struct foo_t* f = (struct foo_t*)malloc(sizeof(struct foo_t));
  bar(f);
  return 0;
}

When I compile it with clang I see that it inserts memcpy() in function bar():

$ clang -v
clang version 3.2 (trunk 157390)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$ clang test.c -g && objdump -dCS a.out

void bar(struct foo_t* foo) {
4005b0: 55 push %rbp
4005b1: 48 89 e5 mov %rsp,%rbp
4005b4: 48 83 ec 10 sub $0x10,%rsp
4005b8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
g_foo = *foo;
4005bc: 48 8b 7d f8 mov -0x8(%rbp),%rdi
4005c0: 64 48 8b 04 25 00 00 mov %fs:0x0,%rax
4005c7: 00 00
4005c9: 48 8d 80 00 f0 ff ff lea -0x1000(%rax),%rax
4005d0: ba 00 10 00 00 mov $0x1000,%edx
4005d5: 48 89 7d f0 mov %rdi,-0x10(%rbp)
4005d9: 48 89 c7 mov %rax,%rdi
4005dc: 48 8b 75 f0 mov -0x10(%rbp),%rsi
4005e0: e8 c3 fe ff ff callq 4004a8 <memcpy@plt>
}
4005e5: 48 83 c4 10 add $0x10,%rsp
4005e9: 5d pop %rbp
4005ea: c3 retq
4005eb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

How do I disable that feature? I’ve tried -fno-builtin and/or -ffreestanding with no success.
TIA

Clang (as well as GCC) requires that a freestanding environment provide
memcpy, memmove, memset and memcmp.

PS: Consider emailing cfe-dev, not llvmdev.

Hi,

Thanks. I’ve emailed cfe-dev.
We absolutely need clang/llvm to not insert the calls into our code.

Hi Dmitry,

We absolutely need clang/llvm to not insert the calls into our code.

why is that?

Ciao, Duncan.

This really isn’t possible.

The C++ standard essentially requires the compiler to insert calls to memcpy for certain code patterns.

What do you really need here? Clearly you have some way of handling when the user writes memcpy; what is different about Clang or LLVM inserting memcpy?

I need it for ThreadSanitizer runtime. In particular
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/tsan/rtl/tsan_interceptors.cc?view=annotate
line 1238. But I had similar problems in other places.
Both memory access processing and signal handling are quite tricky, we can’t allow recursion.

The first thing to think about is that you do need to use -fno-builtin / -ffreestanding when compiling the runtime because it provides its own implementations of memcpy.

The second is that there is no way to write fully generic C++ code w/o inserting calls to memcpy. =/ If you are writing your memcpy implementation, you’ll have to go to great lengths to use C constructs that are guaranteed to not cause this behavior, or to manually call an un-instrumented memcpy implementation. I don’t know of any easy ways around this.

We used both at some points in time, but the problem is that they do not help to solve the problem. I think we use -fno-builtin now, I am not sure about -ffreestanding.

What are these magic constructs? I had problems with both struct copies and for loops.

Don’t copy things by value ever. =/ It is really, really hard to do this. If at all possible, I would build your runtime against an un-instrumented memcpy (perhaps defined within the runtime), and then use aliases or other techniques to wrap the instrumented functions in the exported names necessary for use when intercepting memcpy calls from the instrumented program.
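
To make the suggestion above concrete, here is a minimal sketch of that split. All names (internal_memcpy, intercepted_memcpy, g_intercepted_calls) are hypothetical and not taken from the ThreadSanitizer sources. The volatile-qualified pointers are an assumption about one way to keep the optimizer from turning the loop back into a memcpy call; they cost speed, so treat this as an illustration of the barrier rather than a tuned implementation.

```c
#include <stddef.h>

/* Stand-in for whatever bookkeeping the instrumented entry point does. */
static unsigned long g_intercepted_calls;

/* Hypothetical un-instrumented copy for the runtime's own use.
   Each byte is moved through a volatile access, so the compiler should
   not merge the loop into a library memcpy call. */
static void *internal_memcpy(void *dst, const void *src, size_t n) {
  volatile unsigned char *d = (volatile unsigned char *)dst;
  const volatile unsigned char *s = (const volatile unsigned char *)src;
  for (size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}

/* Hypothetical exported entry point that the instrumented program would
   reach in place of memcpy. It does its bookkeeping and then delegates
   to the internal copy, so the runtime never re-enters itself. */
void *intercepted_memcpy(void *dst, const void *src, size_t n) {
  ++g_intercepted_calls;
  return internal_memcpy(dst, src, n);
}
```

Runtime code would call internal_memcpy directly, and only the instrumented program would go through the exported wrapper. In a real runtime the wrapper would be exported under the name memcpy (directly or through an alias); it is renamed here so the sketch links next to libc.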

Do you mean ‘don’t do struct copies’? Are there other problems aside from implicit memcpy calls?

I am not sure I understand the suggestion to build the runtime against an un-instrumented memcpy.
We can’t afford function calls scattered at random places. It will cost 30% of performance or so.

Don’t do copies outside of a restricted set of primitive types (sizeof(T) <= sizeof(T*) would be my rule of thumb, but there is no hard-and-fast rule here to avoid these problems).
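
One way to enforce that rule of thumb mechanically is a compile-time size check around every by-value copy. The sketch below assumes a C11 compiler; the macro COPY_SMALL and the type small_t are hypothetical names made up for illustration. As noted above, this is a heuristic and not a guarantee that no memcpy call is ever emitted.

```c
#include <stddef.h>

/* Hypothetical helper: assign by value only when the destination is no
   larger than a pointer; larger types fail to compile. */
#define COPY_SMALL(dst, src)                                           \
  do {                                                                 \
    _Static_assert(sizeof(dst) <= sizeof(void *),                      \
                   "type too large to copy by value in runtime code"); \
    (dst) = (src);                                                     \
  } while (0)

/* Small enough to pass the check on common targets. */
struct small_t {
  int id;
};

/* Copies a small struct by value through the checked macro. */
int copy_small_id(int value) {
  struct small_t a = {value};
  struct small_t b = {0};
  COPY_SMALL(b, a);
  return b.id;
}
```

Using COPY_SMALL on something like struct foo_t from the original test.c would be rejected at compile time, which forces the author to pick an explicit copy routine instead.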

These won’t end up actually being function calls… Clang lowers them to ‘memcpy’, and LLVM will try to lower them to actual loads and stores where possible.

We should discuss these issues separately though:

  1. Get the runtime working w/o worrying about memcpy being inserted or not by having a clear barrier between instrumented functions and non-instrumented functions, and making the non-instrumented ones available when compiling and linking the runtime, but not when compiling / linking the instrumented program.

  2. Deal with any performance fallout of the thusly built runtime. We can fix the LLVM optimizers until they generate the optimal code. =]

There are some other platforms that absolutely can't tolerate function
calls. Do they have an attribute or pass to tell LLVM to inline any
functions it or clang inserts? Could Dmitry do the same thing?

Yes, there are attributes which can be attached to the non-instrumented memcpy function, provided by the runtime and selected due to -ffreestanding, which will force inlining: __attribute__((always_inline)) and __attribute__((flatten)). I suspect we don’t correctly support the latter in Clang/LLVM, but that’s clearly a missing feature we should fix.

But to harp on it a bit because this is on ‘llvmdev’ and is of general interest: please don’t use these to fix performance problems without first filing a bug against LLVM’s optimizers for why it was necessary. In an ideal world these should only be used where there is a platform/ABI/debugging/etc contract that no function calls occur.
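
For reference, a sketch of what the always_inline variant might look like when applied to the bar() example from the start of the thread. The helper names rt_copy_ints and g_foo_at are hypothetical. The volatile store is an assumption about how to keep the loop from being recognized as a memcpy idiom; whether the result contains no call at all still has to be checked in the disassembly for the compiler and flags in use.

```c
#include <stddef.h>

struct foo_t {
  int x[1024];
};

/* Hypothetical runtime-internal copy helper. always_inline asks GCC and
   Clang to inline it into every caller, even without optimization. */
static inline __attribute__((always_inline))
void rt_copy_ints(int *dst, const int *src, size_t count) {
  volatile int *d = dst; /* volatile stores are not merged into memcpy */
  for (size_t i = 0; i < count; ++i)
    d[i] = src[i];
}

static __thread struct foo_t g_foo;

/* Same job as bar() in test.c, without the struct assignment. */
void bar(const struct foo_t *foo) {
  rt_copy_ints(g_foo.x, foo->x, sizeof(foo->x) / sizeof(foo->x[0]));
}

/* Accessor used only to inspect the result. */
int g_foo_at(size_t i) {
  return g_foo.x[i];
}
```

Compiling this and running objdump as in the original post is the way to confirm whether the memcpy@plt call is gone.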

Hi Dmitry,

Short answer: you can’t. Freestanding implementations are required to provide memcpy.

-Chris

Hi Chris,

I understand that a freestanding implementation must provide memcpy(). But I think that is somewhat orthogonal to the actual insertion of the function calls. For example, it’s legal to compile with -O0 and with -O3, but I think a conforming compiler that always compiles with -O3 w/o any means to override that would not be that useful. So I thought maybe there is some flag that controls insertion of memcpy calls, perhaps a hacky flag that turns off the particular pass.

Meanwhile I will try Chandler’s solution.

Thanks, I will try this.