Memcpy / Memset for address spaces >= 256

Hi,

SelectionDAGBuilder doesn't know how to lower a Memcpy and Memset if one of the pointer operands have an address space >= 256. This is understandable since the libc's memcpy / memset don't work for these address spaces. However, both Clang (when copying a struct) and some optimization passes (LoopIdiomRecognize, MemCpyOpt) can emit memcpy / memset for these address spaces. This triggers an assert in SelectionDAGBuilder. The optimization passes could be modified to give up when they encounter an address space >= 256, but I think clang would need some new code that emits a struct copy member-by-member. I think it's better to extend the code generator to be able to emit code for that. What do you think?

The problem is also described here: http://llvm.org/bugs/show_bug.cgi?id=18549

-Manuel

I have some patches that automatically expand all memcpy and similar if the operands are not in AS 0. I think this is probably not quite the right approach though, and we should be asking the back end for the function that does a memcpy / memset / whatever in a non-0 address space, and expand automatically if it doesn't provide one.

In an ideal world, I'd rather have the memcpy / memset lowering moved entirely out of SelectionDAG and into a FunctionPass, where it would be much easier to debug. I'd also want to do the same for lowering of unaligned loads / stores, so by the time you get to the back end every load and store is something that can map trivially to a single instruction (assuming an adequate addressing mode exists).

David

So a set of target specific legalisation passes? Like from the recent “PNaCl’s IR simplification passes” discussion.

Hi David,

sorry for sending you the mail two times, I forgot to send to the list the first time.

I have some patches that automatically expand all memcpy and similar
if the operands are not in AS 0. I think this is probably not quite
the right approach though, and we should be asking the back end for
the function that does a memcpy / memset / whatever in a non-0 address
space, and expand automatically if it doesn't provide one.

Can you share these patches? This would be a tentative solution for the reporter of the bug I linked in the original post.

In an ideal world, I'd rather have the memcpy / memset lowering moved
entirely out of SelectionDAG and into a FunctionPass, where it would
be much easier to debug. I'd also want to do the same for lowering of
unaligned loads / stores, so by the time you get to the back end every
load and store is something that can map trivially to a single
instruction (assuming an adequate addressing mode exists).

While I agree that the memcpy lowering pass could be done as an IR pass because it involves loops, I don't think you should do that for lowering of unaligned loads / stores. But that's mostly unrelated to this thread and should be discussed separately.

There are still some advantages of lowering the memcpy / memset in SelectionDAGBuilder. The infrastructure (e.g. target hooks for determining the right register class for memory operations) is already there. I don't know how hard it is to generate loops in SelectionDAGBuilder, though.

-Manuel

They're quite ugly (hence not upstreaming yet - most of the stuff in this repo needs some tidying and may be the wrong approach). Looking at the specific change, it's actually a one liner (and a really, really ugly hack):

https://github.com/CTSRD-CHERI/llvm/commit/a3e044242ab109a7dad2589f4cc1461b08cb513d

This basically sets the limit for the size of a memcpy to expand to infinite. I'm not sure it works for variable-sized memcpys, but I don't think those are inserted by optimisations so we haven't hit them yet.

David

I have some patches that automatically expand all memcpy and similar
if the operands are not in AS 0. I think this is probably not quite
the right approach though, and we should be asking the back end for
the function that does a memcpy / memset / whatever in a non-0 address
space, and expand automatically if it doesn't provide one.

Can you share these patches? This would be a tentative solution for the reporter of the bug I linked in the original post.

They're quite ugly (hence not upstreaming yet - most of the stuff in
this repo needs some tidying and may be the wrong approach). Looking
at the specific change, it's actually a one liner (and a really,
really ugly hack):

https://github.com/CTSRD-CHERI/llvm/commit/a3e044242ab109a7dad2589f4cc1461b08cb513d

Would passing AlwaysInline=true as a parameter have the same effect as this change?

This basically sets the limit for the size of a memcpy to expand to
infinite. I'm not sure it works for variable-sized memcpys, but I
don't think those are inserted by optimisations so we haven't hit them
yet.

I previously thought LoopIdiomRecognize can insert variable-sized memcpys / memsets but it only looks into "countable" loops which are defined as ScalarEvolution's hasLoopInvariantBackedgeTakenCount(). This probably means no variable-sized memcpys / memsets are inserted. Or does it only mean the loop count has to be independent from the loop body? I'll try out passing "AlwaysInline" to getMemcpy() and see whether it works for the code presented in the bug report linked in the original post.

However, for a fix that can be integrated into the tree, variable-sized memcpys / memsets either have to be implemented or a better (and one when built without assertions) error message has to be presented.

-Manuel

Hi,

this thread is now almost a year old, but the problem still exists.

I have some patches that automatically expand all memcpy and similar
if the operands are not in AS 0. I think this is probably not quite
the right approach though, and we should be asking the back end for
the function that does a memcpy / memset / whatever in a non-0 address
space, and expand automatically if it doesn't provide one.

Is it the back end that should be asked for a memcpy or memset function? Since these functions are provided by the runtime, I think it should be the user that configures this.

My suggestion for how to fix the problem is:
1) Provide a way for the user to specify which functions to call for memcpy / memset with address spaces >= 256.
2) Modify / add target hooks to enable better code generation for small constant-sized memcpy / memset (like possible now with address spaces < 256).

-Manuel