Clang question

Clang is inserting an llvm.memcpy function call into my program where none exists (the code never calls memcpy). Is there a particular reason for this? It also looks like it’s inserting two other artificial function calls, something to do with llvm.lifetime.start and llvm.lifetime.end. What are these functions and why are they being inserted artificially?


So, let me rephrase: I understand what these functions are; I just want to know why and when they are inserted so that I can attempt to remove them, as they are produced only by clang, not by llvm-gcc.

Memcpy in my experience has been inserted when a struct copy is generated.

Ok, thanks. Is this an automatic optimization, or is there some other way (possibly some other opt pass I am calling that does this) to avoid the memcpy, as llvm-gcc does? (It does not use it.)

So it appears that the external node calls these three functions along with my “real” function.

Hi Ryan,

the compiler is free to insert implicit calls to memcpy(), for instance
for assignments from one struct/class variable to another. The same goes
for memset(), which may be inserted implicitly for the initialization of
local structs or arrays.

The good news is that the backend normally optimizes these calls away
where possible, replacing them with simple moves - at least as long as
the number of bytes to copy does not exceed a certain threshold.

As for the llvm.lifetime intrinsics, take a look at the documentation:
If I'm not mistaken, these calls seem to be used to mark the lifespan of
a stack-allocated object.


llvm.lifetime.* are just markers that are used by the optimizer to reason about the code.
They disappear without a trace when lowering to machine code.

The memcpy is just the way Clang does POD copying. It's up to the optimizers to decide whether to lower this to custom code or actually emit a call to memcpy.



Yes, you are correct on the lifetime calls, they are just markers for liveness.

However, the backend is not optimizing these calls away. I could try to deal with them outside of LLVM, but I was hoping for a cleaner solution within LLVM.

Do you not have memcpy, or do you want it to always be lowered?


I would like it to always be lowered, I don’t want it.

You'll need to do the work then. I'd also question why: on most platforms a decent memcpy exists.



Ok, thanks, looks like I’ll need to figure something out. I was hoping scalarrepl would take care of this for me, but it’s not lowering the structure (I haven’t looked at the opt code to see why; I’m sure there’s some valid reason I’m unaware of at the moment).

Does -fno-builtin[-memcpy] handle this?


In a backend, set these in the TargetLowering:

  maxStoresPerMemcpy = 4096;
  maxStoresPerMemmove = 4096;
  maxStoresPerMemset = 4096;

We don't have memcpy in our backend so we have to expand it to a sequence of stores.


As a command line argument for clang? Or an opt pass? It says “argument unused during compilation”, but I think that is basically what I’m looking for, right?


Clang doesn’t accept this as an option; however, it did accept -fno-builtin (the more general form covering all builtins) and this seems to have worked. Thank you.

My other question would then be how to lower vector instructions, such as extractelement, insertelement and shufflevector. These should be solved by ld/st/address calculation, correct? This is somewhat of the same problem it seems to me, or not?


Never mind. I see that bb-vectorize causes this; I have disabled it.

I am still curious though: what is the syntactically correct way to disable just memcpy using -fno-builtin? I have tried both -fno-builtin[-memcpy] and the “gcc” version, -fno-builtin-memcpy.

It's a known issue that we don't support -fno-builtin-memcpy etc. As
far as I know, nobody really considers it a priority.


Thanks for the reply.