Proposal for a new LLVM concurrency memory model

The compiler has to know the threading model. There's no way around
that. gcc has the -pthread switch, for example.

The compiler integrates into a comprehensive programming environment.
It has to know what the environment provides and uses. That _is_ very much
the compiler's responsibility.

That's why target-independent code is so hard to write. :-)

Again, my vote is to define vector atomics as respecting atomicity across
elements and make it the compiler's and user's job to know when it can use
them.

                          -Dave

In terms of x86, the equivalent would be if there were an instruction like

ADDPD mem128, xmm

and we could put a LOCK prefix on it.

                            -Dave

> The compiler integrates into a comprehensive programming environment.
> It has to know what the environment provides and uses. That _is_ very much
> the compiler's responsibility.

I agree that compilers should know about threads, but only enough to work
around the issues, not to start creating threads the user didn't ask
for.

I'm not against enhancing the compiler to that point; I just think
that you're digging too deep and a Balrog might show up unexpectedly.
Thread issues can be daunting by themselves; automatically creating
threaded code is a recipe for disaster, IMHO.

> Again, my vote is to define vector atomics as respecting atomicity across
> elements and make it the compiler's and user's job to know when it can use
> them.

By means of #pragmas?

cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

> I'm not against enhancing the compiler to that point; I just think
> that you're digging too deep and a Balrog might show up unexpectedly.
> Thread issues can be daunting by themselves; automatically creating
> threaded code is a recipe for disaster, IMHO.

Eventually all compilers are going to have to create threads. The
architecture roadmap demands it.

> Again, my vote is to define vector atomics as respecting atomicity across
> elements and make it the compiler's and user's job to know when it can
> use them.

> By means of #pragmas?

That's one way. OpenMP is the standard API to control this but
vendors have tons of extensions and their own directives to tell
the compiler what is or is not legal. IMHO, LLVM is the wrong
place to deal with these. It should be done at a higher level.
It _can_ be done at the LLVM IR level, but much less conveniently.

                         -Dave

And it's not the disaster you fear. It's much easier for the compiler
to reason about threads it creates. It gets to control the model.

Lots of compilers have been doing this for years.

                           -Dave

Now I agree with you, completely. ;-)

It is important, it is the future, but it is in the library/language
definition space that the solution will be more elegant.

With functional languages, HPF, Open MPI, Scala's intrinsic message
passing, or OpenMP constructs, the compiler has much more freedom to
guess and do the right thing (tm).

cheers,
--renato


What I actually fear is the compiler's well-defined thread implementation
interacting with chaotic threads from user code... ;-)

cheers,
--renato


I suspect not. :-) I still mean within the compiler, just at a
higher-level program abstraction; a higher-level IR.

Concurrent languages can help, but history shows that it's tough to
get them accepted. HPF is a four-letter word here. ;-) Now CAF,
on the other hand...

Maybe this trend will change with the new silicon constraints, but I'm
skeptical. We'll be programming in C, C++ and Fortran for a long time.

The compiler is going to have to find parallelism and the user is
going to have to help with directives.

                            -Dave

If the user uses OpenMP or some other concurrent language model understood by
the compiler (UPC, CAF, etc.), everything is fine. If the user is mixing in a
concurrency library like pthreads or MPI, it's up to the user to make sure the
code is correct. The compiler has to be conservative unless the user tells it
not to be via a directive.

Yes, mixing thread models puts a lot of burden on the user. There's not much
we can do about that. Compilers can be taught some things about concurrency
libraries but there's a point of diminishing returns.

                           -Dave

> Maybe this trend will change with the new silicon constraints, but I'm
> skeptical. We'll be programming in C, C++ and Fortran for a long time.

That I don't deny...

> The compiler is going to have to find parallelism and the user is
> going to have to help with directives.

But that is taking a bit too long... I reckon we're going to see
Haskell in production before that... ;-)

cheers,
--renato


I think we're diverging from the memory model now... David, I think
you're happy with the current proposal to define atomics as
non-tearing even for vector operands (acknowledging that the backend
may fail to codegen operands that are too big)? Did I miss any other
suggestions you made?

Thanks,
Jeffrey

I've changed racy loads from returning trap values to returning undef.
I think that's ok for Boehm's switch optimization
(http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/why_undef.html) given
that undef values can appear different on each use, but there could
easily be other optimizations we haven't thought of that it breaks. On
the other hand, returning trap values would break partial loads, as
you pointed out.

I suspect we'll be able to change this again after accepting the
memory model if it turns out to impede optimizations.

> I think we're diverging from the memory model now... David, I think

Yep, thanks for pulling us back. :-)

> you're happy with the current proposal to define atomics as
> non-tearing even for vector operands (acknowledging that the backend
> may fail to codegen operands that are too big)? Did I miss any other
> suggestions you made?

We want float atomics, both scalar and vector. I'm still reviewing the
proposal and will have more comments in a couple days.

                         -Dave