Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

Hi All,

In the language reference manual, the access behavior of the memcpy,
memmove and memset intrinsics is not well defined with respect to the
volatile flag. The LRM even states that "it is unwise to depend on it".
This forces optimization passes to be conservatively correct and prevent
optimizations.

A very simple example of this is :

$ cat test.c

#include <stdint.h>

struct R {
  uint16_t a;
  uint16_t b;
};

volatile struct R * const addr = (volatile struct R *) 416;

void test(uint16_t a)
{
  struct R r = { a, 1 };
  *addr = r;
}

$ clang -O2 -o - -emit-llvm -S -c test.c
; ModuleID = 'test.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.R = type { i16, i16 }

@addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8

define void @test(i16 zeroext %a) nounwind uwtable {
  %r.sroa.0 = alloca i16, align 2
  %r.sroa.1 = alloca i16, align 2
  store i16 %a, i16* %r.sroa.0, align 2
  store i16 1, i16* %r.sroa.1, align 2
  %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
  store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32
  %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
  store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}

In the generated ir, the loads are marked as volatile. In this case,
this is due to the new SROA being conservatively correct. We should
however expect the test function to look like this much simpler one
(which was actually the case with llvm <= 3.1) :

define void @test(i16 zeroext %a) nounwind uwtable {
  store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2
  store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}

I propose to specify the volatile access behavior for those intrinsics :
instead of one flag, they should have 2 independant volatile flags, one
for destination accesses, the second for the source accesses.

If there is general agreement, I plan to proceed with the following steps :
1. Specify the access behavior (this patch).
2. Auto-upgrade the existing memcpy, memmove and memset intrinsics into
the more precise form by replicating the single volatile flag and rework
the MemIntrinsic hierarchy to provide (is|set)SrcVolatile(),
(is|set)DestVolatile() and implement (set|is)Volatile in terms of the
former 2 methods. This will conservatively preserve semantics. No
functional change so far. Commit 1 & 2.
3. Audit all uses of isVolatile() / setVolatile() and move them to the
more precise form. From this point, more aggressive / precise
optimizations can happen. Commit 3.
4. Teach clang to use the new form.
5. Optionally remove the old interface form,as there should be no
in-tree users left. This would however be an API change breaking
external code.

Cheers,

0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch (7.21 KB)

I can't think of a better way to do this, so I think it's ok.

I also submitted a complementary patch on llvm-commits clarifying volatile semantics.

-Andy

LGTM as well.

I’m particularly happy to specify that there is no guarantee about the width of loads and/or stores resulting, as I think that’s important for frontends to not depend on (or to request a change if they need to depend on it).

Thanks Andy far the additional clarification, that helps pin down things even more.

Thanks Andy and Chandler,

After specifying the volatile access behaviour, the second step was to
autoupgrade the memmove/memcpy intrinsics, and implement
(is|set)Volatile in terms of (is|set)(Src|Dest)Volatile, with no
functional change.

0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch is the
one you already reviewed, unaltered.

0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch
implements this second step. It requires clang tests to be patched with
clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch as
the codegen'd memcpy / memmove will use the new format.

Finally, 0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch
is purely mechanical : it upgrades all llvm tests (but
auto_upgrade_intrinsics :)) to use the new format for memcpy/memmove.

There are no functional changes, "make check-llvm check-clang" pass
after each patch is applied.

When those patches are accepted, I will commit them and start
propagating the new interface in the codebase.

Cheers,

0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch (7.21 KB)

0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch (22.1 KB)

0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch (178 KB)

clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch (21.7 KB)

Same patches as before, but 0002-memcpy has been updated to put the
(is|set)SrcVolatile methods to where they logically belong :
MemTransferInst. This makes (is|set)Volatile methods look a bit ugly to
keep compatibility with existing behaviour, but they will hopefully
disappear when all users have moved to the new interface --- in the next
series of patches.

I plan to give a try to phabricator for the next patches, but could not
for the actual ones as they touch llvm and clang.

Cheers,

0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch (7.21 KB)

0002-memcpy-memmove-set-volatile-on-the-source-or-destina.patch (22.4 KB)

0003-Upgrade-all-tests-but-auto_upgrade_intrinsics-to-use.patch.bz2 (20.8 KB)

clang-0001-Update-checks-to-take-into-account-the-changes-on-th.patch (21.7 KB)