Lowering "memcpy" intrinsic function on ARM using LDMIA/STMIA

Hi,

LLVM emits code for "memcpy" on ARM as consecutive ldr/str instructions, and a separate pass after register allocation then combines them into ldm/stm. But ldm/stm require the registers to be in ascending order, which is often not the case after regalloc, so some ldr/str instructions are left uncombined. For example, this code:

struct Foo {int a, b, c, d; };
void CopyStruct(struct Foo *a, struct Foo *b) { *a = *b; }

is compiled to:

ldmia r1, {r2, r3, r12}
ldr r1, [r1, #12]
stmia r0, {r2, r3, r12}
str r1, [r0, #12]
bx lr

I ran several different tests, and regalloc always allocates at least one register out of ascending order.

What are your ideas for overcoming this issue? Maybe LLVM should lower "memcpy" straight into ldm/stm, or exchange registers before combining ldr/str so that they are in ascending order, or should the register allocator be fixed somehow?

Best regards, Vasiliy.

Hi,

llvm emits code for "memcpy" on ARM as consecutive ldr/str commands, and

Hmm, this happens elsewhere as well (x86?). Perhaps what we need is a
switch to disable memset/memcpy lowering?

09.02.2011 18:57, Jason Kim wrote:

Hi,

llvm emits code for "memcpy" on ARM as consecutive ldr/str commands, and

Hmm, this happens elsewhere as well (x86?). Perhaps what we need is a
switch to disable memset/memcpy lowering?

Are you suggesting always calling the libc memset/memcpy functions instead of lowering the intrinsic? That does not seem like a good idea, because often (especially for small chunks of memory) consecutive ldm/stm instructions are more efficient than a memcpy call.

-fno-builtin is the flag you want.

deep

llc has no such flag and, as I mentioned, transforming memcpy into ldm/stm instructions is often more efficient than calling memcpy from libc.

10.02.2011 01:22, Sandeep Patel wrote:

It seems there is a small misunderstanding. I was writing about the bitcode memcpy intrinsic, not memcpy from libc. It is exactly this intrinsic that is used in the IR for copying structures, as in my example. And the lowering of the memcpy intrinsic has the issue mentioned above on ARM.

10.02.2011 01:22, Sandeep Patel wrote:

Hi Vasiliy,

We should handle this better. I’m not sure how to guarantee that we can generate ldm/stm without regalloc support. Our only idea is to teach the new register allocator to do a much better job satisfying register hints. If you’d like to track this, feel free to file a bug.

Thanks,
-Andy