LLVM Concurrency and Undef

Hi all,

I have been trying to understand the use of undef in both sequential
and concurrent programs.

From the LLVM Language Reference Manual, I see the following
definition of undef:
"Undef can be used anywhere a constant is expected, and indicates that
the user of the value may receive an unspecified bit-pattern."
The Language Reference also demonstrates how optimizers can use undef
values to optimize the program.

On the other hand, the LLVM Atomics and Concurrency Guide states:
"If code accesses a memory location from multiple threads at the same
time, the resulting loads return 'undef'."
This is different from the C++ memory model, which makes such accesses
undefined behavior. What is the rationale for returning undef on
racing reads?

The Atomics and Concurrency Guide also states:
"Note that speculative loads are allowed; a load which is part of a
race returns undef, but does not have undefined behavior."

If a speculative load returns undef and the returned value is used,
then the result is undefined behavior. Am I correct?
If so, what is the purpose of returning undef from a speculative load?
Is it to ensure that subsequent uses of the value of the speculatively
introduced load are caught/detected by the optimizer?

Is it possible to separate the "undef" of the sequential setting and
the "undef" of speculative loads in a concurrent setting into two
distinct undefs?

Thanks,
Santosh

If a speculative load returns undef and the returned value is used,
then the result is undefined behavior. Am I correct?

It behaves like any other undef value... which does often lead to
undefined behavior.

If so, what is the purpose of returning undef from a speculative load?
Is it to ensure that subsequent uses of the value of the speculatively
introduced load are caught/detected by the optimizer?

The point is primarily to allow optimizations like LICM to introduce
loads whose value is never used. It also keeps consistent semantics
through CodeGen, where some targets widen loads.
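A minimal sketch of the kind of transformation Eli describes, written in
C++ rather than LLVM IR. The function names and values are hypothetical;
the point is only the shape of the hoist: after LICM the load of `*p`
executes even on a trip count of zero, i.e. speculatively, and its value
is then simply never used.

```cpp
#include <cassert>
#include <cstddef>

// Before LICM: *p is loaded inside the loop, so it is only ever
// executed when n > 0.
long sum_before(const long* p, const long* xs, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += xs[i] * *p;  // loop-invariant load of *p
    return s;
}

// After LICM: the load is hoisted above the loop. When n == 0 the load
// is speculative. Under LLVM's model a racy speculative load returns
// undef but is not undefined behavior, so the hoist stays legal as long
// as the (possibly undef) value goes unused.
long sum_after(const long* p, const long* xs, std::size_t n) {
    long k = *p;  // hoisted, possibly speculative, load
    long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += xs[i] * k;
    return s;
}
```

Both functions agree on every input where the load is well-defined;
the model's only job is to keep the n == 0 case from becoming UB.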

Is it possible to separate the "undef" of the sequential setting and
the "undef" of speculative loads in a concurrent setting into two
distinct undefs?

The intention is that they should have the same semantics.

-Eli

Suppose there are three threads T1, T2, and T3:
T1 stores to a location l non-atomically (S1),
T2 then stores to l as an SC atomic (S2), and
later T3 loads from l as an SC atomic (L3).

I think the load at T3 should return undef, since it can see both the
write from T1 and the write from T2. The question then is whether the
SC store S2 at T2 and the SC load L3 at T3 introduce an
acquire/release (synchronizes-with) edge.

This affects whether later conflicting accesses are ordered, and
whether memory accesses are ordered in turn determines whether a load
returns undef.

If S2 and L3 still introduce an SW edge, the next question is: suppose
there is a later SC load L3' at T3; does it also return undef? But
this potentially makes the SC atomic sequence S2, L3, L3' inconsistent
--- later SC loads could read different writes than earlier loads even
though there are no SC stores in between.

So I think data-racing SC/acquire/release atomics cannot introduce SW
edges. Intuitively, if locations intended for locks are also touched
by non-atomic memory accesses, the locks cannot behave correctly. Is
that correct?

Yes, that is correct. I am now convinced that it is worthwhile to
include the definition of how SW edges form in LangRef.
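For contrast with the racy case above, here is a sketch of how a
synchronizes-with edge is supposed to form when there is no race, using
C++11 atomics (whose model the Atomics guide mirrors). All names here
are hypothetical; the pattern is the standard release/acquire handoff.

```cpp
#include <atomic>
#include <thread>

// A non-atomic location published through an atomic flag. The acquire
// load that reads the release store creates the synchronizes-with edge,
// so the non-atomic write happens-before the non-atomic read and there
// is no race on `data`.
int data = 0;
std::atomic<bool> ready{false};

int run() {
    std::thread writer([] {
        data = 42;                                     // non-atomic store
        ready.store(true, std::memory_order_release);  // release store
    });
    int seen = 0;
    std::thread reader([&] {
        while (!ready.load(std::memory_order_acquire)) {}  // acquire load
        // The acquire read the value written by the release, so the
        // pair synchronizes-with; data = 42 happens-before this read.
        seen = data;
    });
    writer.join();
    reader.join();
    return seen;
}
```

If the flag itself were accessed non-atomically, as in the lock
intuition above, the load of the flag would return undef and no SW edge
would form, which is exactly why the scheme would break.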

-Eli

Some of the descriptions in LangRef and
http://llvm.org/docs/Atomics.html read roughly like:
1) monotonic atomics to the same location are totally ordered;
2) all SC atomics to any location are totally ordered;
3) monotonic loads should see monotonic stores monotonically;
...

However, these conditions can be broken if any of the accesses are
interfered with by non-atomic memory accesses, since that can turn
some of the atomic loads into undef (with any value). So they are
unlikely to form a total order any more.

So, do these properties hold only for data race free atomics? For example,

T1:                             T2:
a: sc store 1 to x              b: non-atomic store 2 to x
                                c: sc store 3 to y
d: r1 = sc load from y
e: r2 = sc load from x

Here, r2 still returns undef, since e can see a non-atomic store, even
though that store is ordered by happens-before. Then a, c, d, e may
not be totally ordered.

It's already there:

"We define a happens-before partial order as the least partial order that

* Is a superset of single-thread program order, and
* When a synchronizes-with b, includes an edge from a to b.
Synchronizes-with pairs are introduced by platform-specific
techniques, like pthread locks, thread creation, thread joining, etc.,
and by atomic instructions. (See also Atomic Memory Ordering
Constraints)."

...

"acquire - In addition to the guarantees of monotonic, if this
operation reads a value written by a release atomic operation, it
synchronizes-with that operation."

(Strictly, the release operation synchronizes-with the acquire, not
the other way around.)

Since atomic/non-atomic races are defined to return undef from the
load, even if the load has seq_cst ordering, the load never reads a
value written, so none of the stores synchronize with the load.

The text does say that all seq_cst loads and stores participate in the
global seq_cst ordering that's compatible with the happens-before
ordering, but that doesn't imply that happens-before is a superset of
the seq_cst ordering. I'm not sure whether any paradoxes arise from
allowing racy seq_cst loads to return undef, but I haven't seen any
examples so far.

I don't object to clarifying the text, of course.

Jeffrey

Say r1==3, so we're guaranteed that 'e' happens after 'b'. Since
neither a nor b happens before the other, and at least one of them is
not atomic, r2==undef. The seq_cst ordering a, c, d, e is consistent
with the observable results. That's true whether r2 prints as 1, 2, or
47, since undef can be observed as any value.

Jeffrey
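Jeffrey's reading can be cross-checked against a race-free variant of
the lettered example: if the non-atomic store b is replaced by a
relaxed atomic store, the sketch contains no undef, and once d has
observed c both a and b happen-before e, so e may legitimately return
either value. This C++11 rendering is a hypothetical sketch, not code
from the thread.

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> x{0}, y{0};

// Race-free variant of the a/b/c/d/e example: the non-atomic store b
// is replaced with a relaxed atomic store so e reads a defined value.
std::pair<int, int> run_example() {
    int r1 = 0, r2 = 0;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);                    // a
        while ((r1 = y.load(std::memory_order_seq_cst)) != 3) {}  // d
        r2 = x.load(std::memory_order_seq_cst);                   // e
    });
    std::thread t2([&] {
        x.store(2, std::memory_order_relaxed);  // b (relaxed stand-in)
        y.store(3, std::memory_order_seq_cst);  // c
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```

With the relaxed stand-in, both stores to x happen-before e, so r2 must
be 1 or 2 (whichever is later in x's modification order). In the
original example, where b is genuinely non-atomic, LLVM's model instead
makes e return undef, which is why Jeffrey's "47" is a permissible
observation.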

My confusion is about the difference between SC and AcquireRelease. The text says:

...
SequentiallyConsistent

SequentiallyConsistent (seq_cst in IR) provides Acquire semantics for
loads and Release semantics for stores. Additionally, it guarantees
that a total ordering exists between all SequentiallyConsistent
operations.
...

Here, what does the total ordering mean? Does it mean that, given the
sequence of SC atomics in the total order, an SC load should return
the most recent SC store in that sequence? That would match the
meaning of the sequentially consistent memory model --- any SC store
should become visible to all SC loads immediately, and in the same
order.

In the above program, we have the SC atomics

sc store 1 to x
sc store 3 to y
sc load from y = 3
sc load from x = 47 or undef

Here, no matter how we order them, there is no way to place a store of
47 to x before the load of x that returns 47. Did I make a wrong
assumption?
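The total-ordering guarantee is easiest to see in the classic "store
buffering" shape: with seq_cst on all four accesses, any single total
order must place one of the stores before both loads, so the outcome
r1 == 0 && r2 == 0 is forbidden (with only acquire/release it would be
allowed). A hypothetical C++11 sketch:

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> x{0}, y{0};

// Store buffering: under the single total seq_cst order, at least one
// load is ordered after the other thread's store, so (0, 0) is never
// returned.
std::pair<int, int> store_buffering() {
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```

This is the race-free picture; the question in the thread is precisely
what remains of this guarantee once a non-atomic access forces one of
the loads to return undef.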

Since atomic/non-atomic races are defined to return undef from the
load, even if the load has seq_cst ordering, the load never reads a
value written, so none of the stores synchronize with the load.

An undef can be replaced by any concrete value. If the undef returned
from the racy SC load happens to be instantiated with the value of the
latest SC store, does that count as "... reads a value written by
..."? I think our answer is still no, right?

You lose all the atomic guarantees when there's a non-atomic store
involved. See http://llvm.org/docs/LangRef.html#memmodel ; in
particular, "if R is atomic, and all the writes Rbyte may see are
atomic, it chooses one of the values written [...] otherwise Rbyte
returns undef."

-Eli

Right; semantically, undef isn't any concrete value, even though
backends are forced to eventually turn it into one.

-Eli

Yep. Note also that 'undef' isn't even necessarily a _single_ arbitrary value.

http://llvm.org/docs/LangRef.html#undefvalues says "This example
points out that two 'undef' operands are not necessarily the same.
This can be surprising to people (and also matches C semantics) where
they assume that "X^X" is always zero, even if X is undefined. This
isn't true for a number of reasons, but the short answer is that an
'undef' "variable" can arbitrarily change its value over its "live
range"."

The conceptual undef's here will never show up in the IR because it's
extremely difficult to prove that a race exists.

Most of the effort towards tracking undefined behavior more generally
hasn't really centered around the presence of undef's in the IR. And
I'm not sure they could be usefully classified because they tend to
disappear from the IR quickly.

(Putting llvmdev back on the cc list.)

-Eli

As for separating the sequential ``undef'' from the concurrent
``undef'': there is an analogous question of why C defines different
kinds of undefined behavior in terms of the contexts that produce
them. When designing analysis tools for C that prevent undefined
behavior, we can reason about which kinds of undefined behavior can be
eliminated, so classifying undefined behaviors helps in that case.

In the LLVM setting, have we already defined different kinds of
``undef'' in a sequential setting? The question is whether LLVM can
define them without reference to a high-level language compiled to the
IR, since the semantics of the high-level language usually indicate
how to classify them. However, the undef values introduced by racy
loads seem to be new; for example, C++ and Java have no such concept.
Is it then worthwhile to separate them from the existing undefs? And
do we gain anything, from the point of view of compiler design, by not
distinguishing the undefs?

The conceptual undef's here will never show up in the IR because it's
extremely difficult to prove that a race exists.

Most of the effort towards tracking undefined behavior more generally
hasn't really centered around the presence of undef's in the IR. And
I'm not sure they could be usefully classified because they tend to
disappear from the IR quickly.

Did you mean that the undefined values tend to disappear from the IR in LLVM 3.0?

I just meant that there are many optimization passes that will remove
undef's from IR.

-Eli