Good day all,
I’ve spent a few days resurrecting the circa-2015 work on removing the explicit alignment argument (4th arg) from the @llvm.memcpy/memmove/memset intrinsics in favour of using the alignment attribute on the pointer args of calls to the intrinsic. This work was first proposed back in August 2015 by Lang Hames:
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html (item 2)
and an attempt at landing the work was made by Pete Cooper in November 2015, but then backed out due to unspecified bot failures:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
I’ve prepared changes for LLVM, Clang, and Polly that are now up for review:
Importantly for those maintaining downstream users of the LLVM API, this changes the prototypes for the @llvm.memcpy/memmove/memset intrinsics and changes the IRBuilder API for creating memcpy and memmove calls.
For example, IR which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
The LLVM change includes auto upgrade of the old IR. However, match expressions in IR tests and calls to IRBuilder’s CreateMemCpy & CreateMemMove will need to be updated.
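For anyone updating test expectations by hand, the textual rewrite can be sketched roughly as follows. This is only an illustration in Python, not the sed script or the auto-upgrade code itself. The names OLD_CALL and upgrade_line are made up for this example, and it assumes a literal alignment operand and plain memcpy/memmove calls (memset is left alone because its second operand is not a pointer). Dropping the attribute when the old alignment is 0 is likewise an assumption of the sketch.

```python
import re

# Old five-operand form of memcpy/memmove:
#   (<ptr ty> %dest, <ptr ty> %src, <len ty> <len>, i32 <align>, i1 <volatile>)
# Assumption: the alignment operand is an integer literal.
OLD_CALL = re.compile(
    r"(?P<head>call void @llvm\.mem(?:cpy|move)\.[\w.]+\()"
    r"(?P<dty>[^,()]+?) (?P<dest>[%@][\w.]+), "
    r"(?P<sty>[^,()]+?) (?P<src>[%@][\w.]+), "
    r"(?P<len>[^,()]+), "
    r"i32 (?P<align>\d+), "
    r"(?P<vol>i1 [^,()]+)\)"
)


def upgrade_line(line: str) -> str:
    """Rewrite an old-form memcpy/memmove call into the new form.

    Lines that do not match the old form are returned unchanged.
    """

    def repl(m: re.Match) -> str:
        n = int(m.group("align"))
        # Assumption of this sketch: alignment 0 means "unknown", so no attribute.
        attr = f"align {n} " if n > 0 else ""
        return (
            f"{m.group('head')}{m.group('dty')} {attr}{m.group('dest')}, "
            f"{m.group('sty')} {attr}{m.group('src')}, "
            f"{m.group('len')}, {m.group('vol')})"
        )

    return OLD_CALL.sub(repl, line)
```

Applied to the old-form call above, upgrade_line() produces the new-form call shown above; a line that is already in the new form passes through unchanged.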
My plan is to post another note to the list once the change has landed and is stable.
Comments? Concerns?
-Daniel
Thanks for working on this. This still seems like the right thing to do (and will let us represent separate source and destination alignments). -Hal
Hi all,
This change has been reviewed and appears to be ready to land (the review is available here if anyone still wants to chime in: https://reviews.llvm.org/D41675). The process that we’re going to use for landing this will take a few steps. To wit:
Step 1) Remove align argument, and add align attribute to pointer args. Require that src & dest have the same alignment via verifier rule. Also update Clang & Polly tests to pattern-match the new form.
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use MemIntrinsicInst::[get|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods.
Unless there are objections, I would like to land step 1 tomorrow (Friday Jan 19) morning. My hope is to be completing up to step 4 by the middle of next week, and then steps 5 & 6 in the coming weeks.
Landing step 1 will potentially have a large impact on downstream consumers’ tests, for example in a front-end that checks IR generation of constructs implemented using memcpy/memmove/memset; the Clang tests for OpenMP firstprivate/lastprivate clauses, for instance, required a lot of manual updating. These sorts of tests will need to be updated to match the new pattern printed for the memory intrinsics. I’ve included an extended sed script in the commit log that can update many patterns, but it definitely does not catch them all, so some manual updating will be required.
-Daniel
Step 1) Remove align argument, and add align attribute to pointer args.
What does this mean? Are you suggesting that alignment become part of PointerType? That would introduce lots of problems, because unaligned and aligned pointers would require casts, which then interfere with optimizations.
-Chris
Bad communication on my part; perhaps I used non-standard terminology to refer to a CallInst’s parameter attributes. I am not touching the definition of PointerType in any way; I don’t see the need, and I’m not that ambitious.
Currently, a call to @llvm.memcpy might look like this:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
where the ‘i32 4’ argument (the second-to-last arg) is the minimum alignment of both of the pointer args (%src & %dest). After this change, the same call will instead read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
where we use an align attribute on the first two function-call arguments (i.e., CallInst::addParamAttr(0, …) & CallInst::addParamAttr(1, …)).
That is all that this first step is doing, with the caveat that we will temporarily (basically for the purposes of testing the initial change) require that the alignment attribute on both of the pointer args be the same value. So, nothing like this for now:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 16 %dest, i8* align 4 %src, i32 100, i1 false)
The alignments-equal restriction will be lifted in step 2, which should happen next week.
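To make the temporary restriction concrete, here is a rough Python sketch of the check as it would apply to the printed form of a call. It is not the verifier code. The helper names pointer_alignments and passes_step1_rule are made up for this example, and treating a missing align attribute as simply "no alignment given" is an assumption of the sketch.

```python
import re
from typing import Optional, Tuple


def pointer_alignments(call: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (dest, src) align attributes of a new-form memcpy/memmove call.

    A pointer argument without an align attribute yields None.
    """
    # Take the text between the outermost parentheses and split it into operands.
    args = call[call.index("(") + 1 : call.rindex(")")].split(", ")

    def align_of(arg: str) -> Optional[int]:
        m = re.search(r"\balign (\d+)\b", arg)
        return int(m.group(1)) if m else None

    return align_of(args[0]), align_of(args[1])


def passes_step1_rule(call: str) -> bool:
    """Step 1 only: the dest and src alignment attributes must match."""
    dest, src = pointer_alignments(call)
    return dest == src
```

With this sketch, the align 4/align 4 call above passes, while the align 16/align 4 call is rejected until step 2 lifts the restriction.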
Is that clearer?
-Daniel
Awesome, this sounds like a great approach to me! Thanks Daniel,
-Chris
Hi all,
A quick note just to let people know that as of this past Friday my go at this work has been fully landed. It ended up being a back-burner item, so it took longer than I would have liked to get completed. Nonetheless, the changes made were:
- The IRBuilders in LLVM, Clang, and Polly were all updated to create only the new form of the memory intrinsics.
- All LLVM passes were updated to understand the new forms of the intrinsics, modulo the notes below.
- The IRBuilder APIs for creating the old-style memcpy/memmove/memset forms and the MemIntrinsicInst API for getting/setting the old generic/conservative alignment have all been eliminated.
There are a few places where the LLVM pass updates were conservative, and could potentially be made more aggressive by a motivated individual; see the list below.
All that remains is for interested folk to enhance lowering of small memcpy/memmove into load/store sequences. The lowering in SelectionDAG currently just conservatively uses the MinAlign() of the source & dest alignments as the alignment for the loads and stores. I suspect that we could do better by teaching the lowering that the source & dest can have different alignments.
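To illustrate what is being left on the table, here is a small Python sketch. The function names are made up for this example; min_align uses the usual lowest-set-bit trick, which I believe is what MinAlign() computes, and both alignments are assumed to be non-zero powers of two.

```python
from typing import Tuple


def min_align(a: int, b: int) -> int:
    """Largest power of two that divides both a and b (neither may be zero)."""
    combined = a | b
    # Isolate the lowest set bit of the combined value.
    return combined & -combined


def conservative_access_alignments(dest_align: int, src_align: int) -> Tuple[int, int]:
    """(store alignment, load alignment) when both sides use the minimum."""
    m = min_align(dest_align, src_align)
    return m, m


def per_pointer_access_alignments(dest_align: int, src_align: int) -> Tuple[int, int]:
    """(store alignment, load alignment) if each side kept its own alignment."""
    return dest_align, src_align
```

For a copy with dest align 16 and source align 4, the conservative scheme gives (4, 4), whereas keeping the alignments separate would give (16, 4), i.e. the stores could be treated as 16-byte aligned.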
Note also that the createMem[Cpy|Move]Loop-type functions used in the LowerMemIntrinsics pass are now invoked with different source & dest alignments, rather than the same alignment for both, so these helpers now receive more information than they have in the past; I’m guessing that they could do better with this information. For example, createMemMoveLoop() doesn’t use the alignments it’s given at all right now, and neither of the createMemCpyLoop*() functions tries to set the alignments on the loads & stores it creates.
Passes that have conservative alignments after updating:
- SelectionDAG
- AArch64FastISel
- ARMFastISel
- MemorySanitizer
- MemCpyOpt : Call slot optimization
- InlineFunction : HandleByValArgumentInit
- LowerMemIntrinsics (see note above)
Cheers,
Daniel
Hi Daniel,
a quick question (and kind-of a follow-up to
<https://lists.llvm.org/pipermail/llvm-dev/2017-July/115665.html>): Do the
pointers have to be aligned even if the size is 0? It would be nice to have
this stated explicitly in the LangRef.
Kind regards,
Ralf
Someone more knowledgeable can correct me if I’m wrong here…
As I understand it, the alignment attribute on any pointer argument in a call is telling the compiler something about the alignment of that pointer, and that is all. The attribute doesn’t know/understand/care about anything else about the call.
So, in the case of memcpy/memmove/memset I would expect that the alignment argument attribute, if you provide one, has to be correct for the value of the pointer argument it is being attached to. I would also expect that whether the memmove/memcpy/memset is doing a 0-length operation or not is entirely immaterial to the alignment of its pointer args.
As an aside: The alignment attributes on the source/dest of a memcpy/memmove/memset intrinsic are entirely optional.
-Daniel