Eliminate Store-Load pair even if the LoadInst is volatile

Hi all,

I put a case into llvm and got the following .ll code:


...
%r1419_0_0_0_i376 = alloca i32 ; <i32*> [#uses=2]
...
%tmp1476_i = lshr i32 %tmp1226_i, 24 ; <i32> [#uses=1]

store i32 %tmp1476_i, i32* %r1419_0_0_0_i376, align 4
%tmp1505_i = volatile load i32* %r1419_0_0_0_i376, align 4 ; <i32> [#uses=1]

%tmp1542_i = getelementptr [256 x i8]* @Te, i32 0, i32 %tmp1505_i
...

LLVM's opt can't remove the redundant store-load pair and use the value %tmp1476_i directly, because the load is volatile.
But I think that in this situation it is safe to remove the store-load pair, since the alloca %r1419_0_0_0_i376 has only two users (one store and one load). Is that correct?

Can I add some code to instcombine or dce for this?

Zhou Sheng wrote:

Hi all,

I put a case into llvm and got the following .ll code:

...
%r1419_0_0_0_i376 = alloca i32 ; <i32*> [#uses=2]
...
%tmp1476_i = lshr i32 %tmp1226_i, 24 ; <i32> [#uses=1]

store i32 %tmp1476_i, i32* %r1419_0_0_0_i376, align 4
%tmp1505_i = volatile load i32* %r1419_0_0_0_i376, align 4 ; <i32> [#uses=1]

%tmp1542_i = getelementptr [256 x i8]* @Te, i32 0, i32 %tmp1505_i
...

LLVM's opt can't remove the redundant store-load pair and use the value %tmp1476_i directly, because the load is volatile.
But I think that in this situation it is safe to remove the store-load pair, since the alloca %r1419_0_0_0_i376 has only two users (one store and one load). Is that correct?

I don't think you can do that. Loads are often marked volatile because the memory location is accessed in some way the compiler cannot see. For example, if the pointer to the alloca'ed memory escapes, it could be used for synchronization with another thread. Other threads could then store to that location concurrently and change the value read by the load. Your optimization would break such code, because this thread would only ever see the stores that it made itself.

Can I add some code to instcombine or dce for this?

In general, I think shortcutting the volatile load would be incorrect, and therefore, such a transform should not be a part of instcombine or dce.

However, if you wanted to, you could write your own custom pass that shortcuts the load *if* you know that such shortcuts are safe in your particular situation. For example, if I were writing a front-end for LLVM that naively generated volatile loads for all synchronization variables, I could write a pass that shortcuts the load when it can also prove that the alloca never escapes (and therefore that only the local thread can access the alloca'ed memory).
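As a rough illustration of the "never escapes" condition, here is a minimal sketch in C++. It uses a hypothetical toy model, not the real LLVM API: `UseKind`, `StackSlot`, `neverEscapes` and `canForwardStoreToLoad` are names invented for this example. It also assumes that all uses of the slot sit in a single basic block and are listed in program order, which a real pass would have to establish for itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of how a stack slot (an alloca) is used.
// This is NOT the LLVM API; it only mirrors the reasoning in the text.
enum class UseKind {
    LoadFrom,       // slot is the pointer operand of a load
    StoreTo,        // slot is the pointer operand of a store
    StoredAsValue,  // the slot's address itself is written to memory
    CallArgument,   // the slot's address is passed to a call
    Other           // cast, getelementptr, return, ...: treated as escaping
};

struct StackSlot {
    std::string name;
    // Assumption: every use is in one basic block, listed in program order.
    std::vector<UseKind> uses;
};

// Conservative check: the address stays local only if every use is a
// direct load from, or a direct store to, the slot.
bool neverEscapes(const StackSlot& slot) {
    for (UseKind kind : slot.uses) {
        if (kind != UseKind::LoadFrom && kind != UseKind::StoreTo) {
            return false;
        }
    }
    return true;
}

// The load at uses[loadIndex] may be replaced by the stored value only if
// the slot never escapes and the use immediately before it is a store.
bool canForwardStoreToLoad(const StackSlot& slot, std::size_t loadIndex) {
    if (loadIndex == 0 || loadIndex >= slot.uses.size()) {
        return false;
    }
    if (slot.uses[loadIndex] != UseKind::LoadFrom) {
        return false;
    }
    if (slot.uses[loadIndex - 1] != UseKind::StoreTo) {
        return false;
    }
    return neverEscapes(slot);
}
```

In this model, the alloca from the original post (one store followed by one load) passes the check, while a slot whose address is passed to a call anywhere among its uses is rejected, even if the load directly follows a store. Whether forwarding across a volatile load is acceptable at all still depends on what the front-end means by volatile, as discussed above.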

-- John T.