LLVM is doing something a bit weird in this example (which messes up DSA)

Hi all,

I have yet another DSA-related question :), and I would appreciate
your help. Actually, the following example generates some interesting
potential issues in the LLVM IR too.

Here is the example in C:
#define CAS(x,y,z) __atomic_compare_exchange_n(x,&(y),z,true,0,0)

int main() {
  int *x = 0;
  int y = 0;
  int *z = x;
  CAS(&z,x,&y); // if (z == x) z = &y;
  assert(*z == y);
  return 0;
}

Now, when compiled into LLVM IR 3.4 (with -mem2reg), an interesting
thing happens in this LLVM IR excerpt:
  %1 = bitcast i32** %z to i64*
  %2 = bitcast i32** %x to i64*
  %3 = bitcast i32** %.atomictmp to i64*
  %4 = load i64* %2, align 8
  %5 = load i64* %3, align 8
  %6 = cmpxchg i64* %1, i64 %4, i64 %5 monotonic

More specifically, there is this %2 bitcast and the subsequent %4 load
that effectively turned an i32* pointer value into an i64 integer
value without using ptrtoint instruction.
My first question is whether that is even allowed in LLVM IR?
It feels like ptrtoint gets bypassed somehow, which does not seem right.

Now, if this LLVM IR code is indeed fine, then we have a problem with
DSA when it gets to this cmpxchg instruction since DSA then does not
know that the second and third arguments to cmpxchg are in fact
pointers. Messy stuff...

Your help would be greatly appreciated...

Thanks!

Cheers,
-- Zvonimir

Hi all,

I have yet another DSA-related question :), and I would appreciate
your help. Actually, the following example generates some interesting
potential issues in the LLVM IR too.

Here is the example in C:
#define CAS(x,y,z) __atomic_compare_exchange_n(x,&(y),z,true,0,0)

int main() {
   int *x = 0;
   int y = 0;
   int *z = x;
   CAS(&z,x,&y); // if (z == x) z = &y;
   assert(*z == y);
   return 0;
}

Now, when compiled into LLVM IR 3.4 (with -mem2reg), an interesting
thing happens in this LLVM IR excerpt:
   %1 = bitcast i32** %z to i64*
   %2 = bitcast i32** %x to i64*
   %3 = bitcast i32** %.atomictmp to i64*
   %4 = load i64* %2, align 8
   %5 = load i64* %3, align 8
   %6 = cmpxchg i64* %1, i64 %4, i64 %5 monotonic

More specifically, there is this %2 bitcast and the subsequent %4 load
that effectively turned an i32* pointer value into an i64 integer
value without using ptrtoint instruction.
My first question is whether that is even allowed in LLVM IR?
It feels like ptrtoint gets bypassed somehow, which does not seem right.

I believe this optimization is correct (although it obscures source-level types information) on systems for which pointers are 64 bits in size (e.g., x86_64).

Now, if this LLVM IR code is indeed fine, then we have a problem with
DSA when it gets to this cmpxchg instruction since DSA then does not
know that the second and third arguments to cmpxchg are in fact
pointers. Messy stuff...

Have you looked at the local DSGraphs to verify that DSA is not inferring the pointer type for the fields accessed by the cmpxchg? I would think that DSA can handle this sort of thing, although I can think of two reasons that it may not:

a) It may not properly understand the semantics of the cmpxchg instruction. A quick check into the Local.cpp file could confirm this.

b) DSA may be "losing" the pointer through the bitcast. I think we've modified DSA to not do that, but again, it's possible.

Can you verify that DSA is not inferring a link for offset 0 in %1?

-- John T.

Thanks for your help John...

Yup, I looked at Local.cpp even before I fired off my question to the
mailing list.

Take a peek here at line 464:
https://github.com/llvm-mirror/poolalloc/blob/master/lib/DSA/Local.cpp

Based on my understanding of this line, if AtomicCmpXchgInst does not
return a pointer type, nothing gets merged. And in the example I
posted, a pointer value is indeed not returned (but rather an i64
value "masking" an i32* value). And so DSA does not do the right thing
in this case...

-- Zvonimir

Hi John,

Sorry for bugging you again, but I would just like to know if you are
maybe looking into this to give us a hand.

Coming back to our code:
  %1 = bitcast i32** %z to i64*
  %2 = bitcast i32** %x to i64*
  %3 = bitcast i32** %.atomictmp to i64*
  %4 = load i64* %2, align 8
  %5 = load i64* %3, align 8
  %6 = cmpxchg i64* %1, i64 %4, i64 %5 monotonic

And after studying DSA some more, I think that DSA "loses" a pointer
in the load instruction %4 (or maybe bitcast %2).

Anyhow, I am not a DSA expert, but rather just an avid user :), and so
your help would be greatly appreciated.

One potential solution that comes to my mind would be to mark any
pointers for which one level of indirection gets removed with bitcast
using DSA's PtrToIntMarker (since that is what can ultimately happen
with such pointers). Do you think that would help?

Thanks!

Best,
-- Zvonimir

Quick update on this: it appears that DSA is correctly marking the
nodes in question (in my example below) as IntToPtr nodes even though
inttoptr operation is in fact happening implicitly through bitcast. So
we are now leveraging (in our SMACK verifier) the IntToPtr flag to
compute conservative alias information and hence preserve soundness of
the verifier in these kinds of situations. Hopefully that will be good
enough...