interest in support for Transactional Memory?

Hi,

I would like to know whether the community is interested in getting support
for Transactional Memory (TM) merged in upstream LLVM. TM basically gives you
transaction properties (e.g., virtually atomic and isolated execution) for
ordinary program code. Thus, to make incrementing a counter thread-safe, you
could write __transaction { counter++; }, and the compiler would transform this
code so that it uses a TM library, which in turn does concurrency control for
the memory accesses in a transaction. Recent studies support the assumption
that synchronizing shared memory with transactions is a lot easier than, for
example, using locks.
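As a rough illustration of that transformation, here is a C sketch of what the compiler's output for __transaction { counter++; } might look like, paired with the simplest possible TM library: one global lock. The names tm_begin, tm_commit, tm_load_long and tm_store_long are hypothetical and do not come from the TM ABI linked below. A real STM would let non-conflicting transactions run concurrently instead of serializing all of them.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical TM library entry points; the names are illustrative and are
   NOT the Intel/GCC TM ABI. This toy library protects every transaction
   with one global lock: trivially atomic and isolated, but transactions
   never run concurrently. Nested transactions are not supported. */
static atomic_flag tm_global_lock = ATOMIC_FLAG_INIT;
static _Thread_local int tm_active; /* 1 while this thread is in a txn */

static void tm_begin(void) {
    assert(!tm_active); /* nesting would deadlock on the global lock */
    while (atomic_flag_test_and_set_explicit(&tm_global_lock,
                                             memory_order_acquire))
        ; /* spin until no other transaction is running */
    tm_active = 1;
}

static void tm_commit(void) {
    assert(tm_active);
    tm_active = 0;
    atomic_flag_clear_explicit(&tm_global_lock, memory_order_release);
}

/* Load/store barriers. Under a global lock they are plain accesses; a real
   STM would record or validate each access here. */
static long tm_load_long(const long *addr) { return *addr; }
static void tm_store_long(long *addr, long value) { *addr = value; }

long counter = 0;

/* Roughly what the compiler might emit for: __transaction { counter++; } */
void increment_counter(void) {
    tm_begin();
    long tmp = tm_load_long(&counter);
    tm_store_long(&counter, tmp + 1);
    tm_commit();
}
```

Every read and write inside the transaction becomes a library call, so the library, not the programmer, decides how to do concurrency control.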

Several large companies seem to be interested in TM. Intel,
Oracle, IBM, Red Hat, and others have been working on a specification for
memory transactions in C++:
http://software.intel.com/file/21569
Intel, Red Hat, and my group have been working on specifying a TM library
ABI:
http://software.intel.com/file/8097/
Intel has published an ICC prototype version with TM support, and Red Hat is
working on TM support in the gcc "trans-memory" branch.

My group has been working on compiler support for TM in LLVM for the last
couple of years, and we have built an LLVM-based compiler, DTMC, that covers
both the frontend part (i.e., parsing the __transaction statements and
translating txn boundaries to IR markers) and the IR transformations
(transforming all transactional code). The former is a patch to llvm-gcc
(based, in turn, on gcc's TM support), but we are also looking at other
options for the frontend. The latter is a normal module pass and makes up
the larger part of the work.
We have also built a TM library, TinySTM++ (TM libraries themselves are not
compiler-specific). You can have a look at the latest stable releases of the
code (compiler and library) at http://tm.inf.tu-dresden.de
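Part of transforming all transactional code is handling functions that are called from inside a transaction. As a hedged sketch of the cloning approach mentioned later in this thread, the C code below shows an original function, an instrumented clone that a pass might generate, and a lookup table mapping one to the other. Everything here (tm_load_int, read_value_txn, clone_table, tm_lookup_clone) is hypothetical; the actual storage and lookup scheme is up to the TM ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical load barrier that instrumented code calls instead of a
   plain load; here it only counts calls. */
static int tm_load_count;
static int tm_load_int(const int *p) {
    tm_load_count++;
    return *p;
}

/* The function as written by the programmer. */
static int read_value(const int *p) { return *p; }

/* The clone a TM pass might generate for calls from inside transactions:
   same logic, but every load goes through the barrier. */
static int read_value_txn(const int *p) { return tm_load_int(p); }

typedef int (*read_fn)(const int *);

/* Hypothetical clone table mapping original functions to their clones
   (the real storage and lookup scheme is up to the TM ABI). */
static const struct { read_fn orig, clone; } clone_table[] = {
    { read_value, read_value_txn },
};

/* Look up the clone to call for f inside a transaction, e.g. at an
   indirect call site. Returns NULL if no clone was registered. */
static read_fn tm_lookup_clone(read_fn f) {
    assert(f != NULL);
    for (size_t i = 0; i < sizeof clone_table / sizeof clone_table[0]; i++)
        if (clone_table[i].orig == f)
            return clone_table[i].clone;
    return NULL;
}
```

For direct calls, a pass can presumably just call the clone; a runtime lookup like this is mainly needed for indirect calls.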

Torvald

Hi Torvald,

> I would like to know whether the community is interested in getting support
> for Transactional Memory (TM) merged in upstream LLVM.

I guess not :(

Why does this require special LLVM support rather than, say, having the front
end lower everything to library calls and so forth, like gcc does for OpenMP?

Ciao,

Duncan.

There are different ways to approach this. First, if a frontend with TM
support is available, you only need to do a few things in LLVM:

1) Txn begin is like a setjmp call. You need to ensure that stack slots are
restored to their original values when a txn is aborted and restarted (or,
alternatively, that slots that are live into a txn begin are not reused
until the matching commit). LLVM currently skips stack slot coloring if setjmp
is called in the function, so one could extend this to handle a returns-twice
attribute. However, this coarse approach is costly: in a microbenchmark that
accesses a tree with txns, it decreased performance by 30%.

2) Functions that are called from txns get cloned, and the clones get
instrumented. The ABI requirements for how to store the clones in native
code and how to look them up are not finalized yet, but they may require
LLVM support as well.
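Item 1 can be illustrated with a small C sketch of a txn begin built on setjmp, using the first option: an undo log that restores stack slots when the txn aborts. The descriptor tx and the functions tx_log_slot, tx_abort and add_in_txn are hypothetical; in a compiler-based TM, the compiler would decide which slots need logging (or would keep live-in slots from being reused until commit).

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical per-transaction descriptor: a restart point plus an undo
   log for stack slots that are live into the transaction. It is a global
   (it would be thread-local in a real TM) so that its contents stay
   well-defined across longjmp. */
#define TX_MAX_UNDO 16

static struct {
    jmp_buf restart;                 /* saved by txn begin, like setjmp */
    volatile int *slot[TX_MAX_UNDO]; /* logged stack slots */
    int old[TX_MAX_UNDO];            /* their values before the write */
    size_t n;
    int attempts;
} tx;

/* Remember a stack slot's current value before the txn overwrites it. */
static void tx_log_slot(volatile int *slot) {
    assert(tx.n < TX_MAX_UNDO);
    tx.slot[tx.n] = slot;
    tx.old[tx.n] = *slot;
    tx.n++;
}

/* Abort: restore logged slots in reverse order, then restart the txn. */
static void tx_abort(void) {
    while (tx.n > 0) {
        tx.n--;
        *tx.slot[tx.n] = tx.old[tx.n];
    }
    longjmp(tx.restart, 1);
}

/* Adds delta to a local that is live into the txn; the first attempt
   aborts, as if a conflict had been detected. */
static int add_in_txn(int delta) {
    volatile int sum = 10; /* live-in stack slot */
    tx.n = 0;
    tx.attempts = 0;
    setjmp(tx.restart);    /* txn begin: returns a second time on restart */
    tx.attempts++;
    tx_log_slot(&sum);
    sum += delta;
    if (tx.attempts == 1)
        tx_abort();
    tx.n = 0;              /* commit: discard the undo log */
    return sum;
}
```

add_in_txn(5) aborts once and returns 15 on the retry. Without restoring sum from the undo log, the retry would start from 15 and return 20, which is the kind of stale stack state that item 1 is about.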

If developing TM support from scratch, I would not put it in the frontend
because:

1) Performance. Doing TM instrumentation after running other standard
optimizations is worthwhile. Inlining, constant propagation, ... and LTO in
general can all leave you with fewer loads and stores in txns (which either
carry significant overhead with software TM libraries or count towards
hardware TM (HTM) capacity limits). You can also potentially do better alias
and dependency analysis on IR after other optimizations have run.

2) TM support is not necessarily language-specific; IR-level TM
instrumentation could be used with lightweight TM support in several
different frontends.

3) The instrumentation for the kind of HTM that we have worked with can be
expressed with inline asm in library code. The library can then be linked and
LTO'd, so there is no noticeable performance difference compared to directly
transforming loads/stores into HTM transactional loads/stores. However, this
might not be the case for every HTM. For example, transactionally accessed
variables on the stack might have to be separated from nontransactionally
accessed stack slots if they are on the same cache line, or the compiler has
to detect this and instruct the TM to use STM instead of HTM.
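A minimal C sketch of item 1 (performance), assuming a hypothetical counting read barrier tm_read_int. The example optimization is loop-invariant load hoisting, picked here purely for illustration as a stand-in for the optimizations listed above. Once a read has been turned into an opaque barrier call, the optimizer can no longer hoist it, so instrumenting first leaves more barriers in the loop.

```c
#include <assert.h>

/* Hypothetical STM read barrier; here it only counts calls. In a real STM
   each call costs far more than a plain load. */
static long tm_reads;
static int tm_read_int(const int *p) {
    tm_reads++;
    return *p;
}

/* Instrumented BEFORE optimization: the loop-invariant read of *scale
   became an opaque barrier call, so it can no longer be hoisted. */
static int sum_scaled_early(const int *a, int n, const int *scale) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += tm_read_int(&a[i]) * tm_read_int(scale);
    return s;
}

/* Instrumented AFTER optimization: the invariant load was hoisted out of
   the (guarded) loop first, which is valid because the loop writes no
   memory, so only one barrier remains for it. */
static int sum_scaled_late(const int *a, int n, const int *scale) {
    if (n <= 0)
        return 0;
    int factor = tm_read_int(scale);
    int s = 0;
    for (int i = 0; i < n; i++)
        s += tm_read_int(&a[i]) * factor;
    return s;
}
```

For a 4-element array, the early-instrumented loop makes 8 barrier calls and the late-instrumented one makes 5, while both compute the same sum.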

Torvald

Hi Torvald,

> 1) Txn begin is like a setjmp call. You need to ensure that stack slots are
> restored to their original values when a txn is aborted and restarted (or,
> alternatively, that slots that are live into a txn begin are not reused
> until the matching commit). LLVM currently skips stack slot coloring if setjmp
> is called in the function, so one could extend this to handle a returns-twice
> attribute. However, this coarse approach is costly: in a microbenchmark that
> accesses a tree with txns, it decreased performance by 30%.

Just curious: would LLVM's zero-cost exception handling (libunwind-like
behavior) have any effect on the above?

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Garrison

I'm not really familiar with this system, but I guess not, because unwinding
does not require restoring all stack contents.

Torvald