From: "Krzysztof Parzyszek via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Tuesday, November 10, 2015 1:22:57 PM
Subject: Re: [llvm-dev] SROA and volatile memcpy/memset
>> I have a customer testcase where SROA splits a volatile memcpy and we
>> end up generating bad code[1]. While this looks like a bug, simply
>> preventing SROA from splitting volatile memory intrinsics causes
>> basictest.ll for SROA to fail. Not only that, but it also seems like
>> handling of volatile memory transfers was done with some intent.
>
> There is no such thing as a volatile memcpy or memset in standard
> ISO C, so what exactly are you doing and why do you expect it to work
> that way?
The motivating example has an aggregate copy where the aggregate is
volatile, followed by a store to one of its members. (This does not have
anything to do with devices.) SROA expanded this into a series of
volatile loads and stores, which cannot be coalesced back into fewer
instructions. This is clearly worse than doing the copy and then the
member overwrite.
--- test.c ---
typedef struct {
  volatile unsigned int value;
} atomic_word_t;

typedef union {
  struct {
    unsigned char state;
    unsigned char priority;
  };
  atomic_word_t atomic;
  unsigned int full;
} mystruct_t;

mystruct_t a;

unsigned int foo(void) {
  mystruct_t x;
  mystruct_t y;
  x.full = a.atomic.value;
  y = x;
  y.priority = 7;
  return y.full;
}
--------------
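As a rough source-level illustration of the two code shapes being contrasted (this is not part of the original testcase, and it is not what SROA literally emits; `word_t`, `copy_whole` and `copy_split` are hypothetical names), consider the sketch below. The first function does one word-sized copy followed by the member overwrite. The second copies through volatile byte pointers, so every byte access has to stay separate and cannot be merged back. Like test.c, it assumes a compiler that accepts anonymous structs (C11 or a GNU extension).

```c
/* Hypothetical union mirroring the layout of mystruct_t, minus the volatile member. */
typedef union {
    struct {
        unsigned char state;
        unsigned char priority;
    };
    unsigned int full;
} word_t;

/* One aggregate copy, then the member overwrite. */
unsigned int copy_whole(unsigned int src) {
    word_t x, y;
    x.full = src;
    y = x;
    y.priority = 7;
    return y.full;
}

/* Same result, but the copy goes through volatile byte accesses,
   each of which the compiler must emit individually. */
unsigned int copy_split(unsigned int src) {
    word_t x, y;
    volatile unsigned char *from = (volatile unsigned char *)&x;
    volatile unsigned char *to = (volatile unsigned char *)&y;
    unsigned int i;
    x.full = src;
    for (i = 0; i < sizeof(word_t); ++i)
        to[i] = from[i];
    y.priority = 7;
    return y.full;
}
```

Both functions return the same value for any input; the difference is only in how many memory accesses the compiler is obliged to keep.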
SROA seems to be doing a number of things here. What if we prevented
SROA from generating multiple slices that split a volatile access? That
might be significantly different from what happens in something like
this test (test/Transforms/SROA/basictest.ll):
define i32 @test6() {
; CHECK-LABEL: @test6(
; CHECK: alloca i32
; CHECK-NEXT: store volatile i32
; CHECK-NEXT: load i32, i32*
; CHECK-NEXT: ret i32
entry:
  %a = alloca [4 x i8]
  %ptr = getelementptr [4 x i8], [4 x i8]* %a, i32 0, i32 0
  call void @llvm.memset.p0i8.i32(i8* %ptr, i8 42, i32 4, i32 1, i1 true)
  %iptr = bitcast i8* %ptr to i32*
  %val = load i32, i32* %iptr
  ret i32 %val
}
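
For readers less used to the IR, a rough C analogue of what @test6 computes might look like the sketch below. It is only an approximation: C has no volatile memset, so this uses a plain `memset`, and `test6_analogue` is a hypothetical name. The point is that the whole 4-byte fill covers exactly the bytes that are then read as one 32-bit value, which is why the CHECK lines expect a single `store volatile i32` rather than several smaller pieces.

```c
#include <stdint.h>
#include <string.h>

/* Fill a 4-byte buffer with 42, then read it back as one 32-bit word.
   Every byte is 0x2A, so the result does not depend on endianness. */
uint32_t test6_analogue(void) {
    unsigned char a[4];
    uint32_t val;
    memset(a, 42, sizeof a);      /* stands in for the volatile llvm.memset */
    memcpy(&val, a, sizeof val);  /* stands in for the i32 load */
    return val;
}
```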
-Hal