About clock and wait instructions

hi,

High-level languages usually have time and sync instructions
to handle asynchronous and synchronous operations.

I want to know how LLVM takes account of these.

thanks

yueqiang
2003/12/19

I'm not sure exactly what 'time' and 'sync' operations you're talking
about, or what languages support them. However, LLVM is designed to make
it trivial to interface to libraries and routines written in other
languages. This means that (even if LLVM should be extended to support
these operations eventually) you can always write them in assembly or
whatever, then call them from LLVM.

Note however that LLVM is _not_ a high-level language, and in fact we do
not explicitly support _many_ features of HLLs directly in LLVM. We use a
strategy of representing the key components of the high-level ideas using
low-level primitives that can be used for a variety of purposes. If you
describe what the time/sync operations are, maybe I can sketch out a
suggested mapping for you.

-Chris

Chris Lattner wrote:

> High-level languages usually have time and sync instructions
> to handle asynchronous and synchronous operations.
> I want to know how LLVM takes account of these.

> I'm not sure exactly what 'time' and 'sync' operations you're talking
> about, or what languages support them. However, LLVM is designed to make
> it trivial to interface to libraries and routines written in other
> languages. This means that (even if LLVM should be extended to support
> these operations eventually) you can always write them in assembly or
> whatever, then call them from LLVM.

Perhaps "clock" refers to things like reading the CPU cycle counter on most modern processors (e.g., asm("rpcc %v0", foo) or the __RPCC() compiler builtin on Alpha); in the long term, a candidate for builtins, I suspect.

> Note however that LLVM is _not_ a high-level language, and in fact we do
> not explicitly support _many_ features of HLLs directly in LLVM. We use a
> strategy of representing the key components of the high-level ideas using
> low-level primitives that can be used for a variety of purposes. If you
> describe what the time/sync operations are, maybe I can sketch out a
> suggested mapping for you.

While on the subject of builtins/asm etc., most modern CPUs also have instructions to do memory barriers/fences (i.e., stall the CPU until all in-flight memory loads and/or stores preceding the fence instruction have finished; maybe that's what the "wait" instruction in the subject line refers to?). These are typically implemented as compiler builtins or asm in C.

I do realize that anyone working on running existing code through LLVM can easily work around the current asm/builtin implementation for now by calling an assembly function. However, a perhaps not so obvious implication/intent of a memory-fence-like builtin is that the programmer also does not want the compiler to reorder load/store instructions across the barrier. I do not see any mechanism in the LLVM framework to express such a notion of barrier/fence, or a mechanism to indicate that loads/stores within what might otherwise look like a "normal" basic block must not be reordered.

[ (a) Does LLVM understand the 'volatile' attribute in C? (b) My apologies in advance if I end up (or already have?) "hijacking" this thread into another unrelated topic... ]

Maybe an example (grossly simplified, but otherwise "real life") will help:

     *old = *a->next_link;
     *x = 1; /* set a flag - indicates start of operation */
     *a->next_link = *b->next_link;

     asm("<store fence instruction>");

     *x = 0; /* reset the flag - done */

Here, assume that (1) x, a, b, and old are all (non-aliased) addresses that map to a shared memory segment and/or the execution environment for this code is multi-threaded, i.e., there's another thread of execution (a watchdog) that the compiler may not be aware of, to which these memory writes are "visible"; and (2) the watchdog thread is responsible for "recovering" the data structure manipulation, should the thread doing it fail for some reason while in this "critical section" code (assume that the "critical section" in the example above is a bit more "elaborate" than just one memory update of the linked-list pointer).

It is important in this case that, despite what a simple dataflow analysis might otherwise indicate, the compiler/optimizer must not zap *x = 1 as a case of redundant store operation.

Another item that falls in this general category is code that uses setjmp/longjmp:

void foo(void) {
    int x;

    x = 0;

    .....

    if (setjmp(...) == 0) {

        .....

        x = 1;

        .....

        /* assume somewhere deep down the call chain from here,
           there's a longjmp */

        .....

    } else {
        if (x == 1) {
            .....
        }
    }
}

In the example above, if the compiler doesn't understand the special semantics of setjmp, there's a potential for the if (x == 1) block to get optimized incorrectly (x being a local variable, and setjmp being just another "ordinary" function call that does not take the address of x as a parameter: if control reaches the else block of the outer if statement, dataflow analysis can easily "prove" that the value of x has to be 0, and the if block becomes dead code...). I must admit I'm not fully up to speed on LLVM yet, and perhaps setjmp does get special treatment in LLVM (ISTM C++ try/catch blocks do get special treatment; not sure about setjmp/longjmp)...

In "traditional" (one-C-file-at-a-time) compiler/optimizers, one can work around this by taking the address of x and passing it as a parameter to a null external function, to keep the compiler from doing unsafe optimizations even when setjmp/longjmp is not treated specially. My concern when one is dealing with a whole-program optimizer infrastructure like LLVM (or, for that matter, a post-link optimizer like Spike from DEC/Compaq, which works off a fully linked binary and naked machine instructions) has been that it can easily (at least in theory) see through this call-a-null-function trick... Yet one could argue that there are plenty of legitimate optimization opportunities where memory references can be reordered, squashed, or hoisted across basic blocks or even function calls (IOW, turning off certain aggressive optimizations altogether might be a sledgehammer approach). I'm getting this nagging feeling that there may need to be a mechanism where special annotations are placed in the LLVM instruction stream to ensure safe optimizations... Someone please tell me my concerns are totally unfounded, at least for LLVM :-)

- Vipin

Vipin,

The very short answers are:

(a) LLVM currently lacks any primitives for expressing synchronization
operations or memory barriers/fences explicitly, but we are working actively
on it.

(b) LLVM correctly exposes setjmp/longjmp as well as C++ exceptions through
a uniform mechanism, namely, the 'invoke' and 'unwind' instructions. See:
  http://llvm.cs.uiuc.edu/docs/LangRef.html#i_invoke
or Section 2.4 of:
  http://llvm.cs.uiuc.edu/pubs/2003-09-30-LifelongOptimizationTR.html

--Vikram

> Perhaps "clock" refers to things like reading the CPU cycle counter on
> most modern processors (e.g., asm("rpcc %v0", foo) or the __RPCC()
> compiler builtin on Alpha); in the long term, a candidate for builtins, I
> suspect.

Yes, that would make sense.

> While on the subject of builtins/asm etc., most modern CPUs also have
> instructions to do memory barriers/fences (i.e., stall the CPU until all
> in-flight memory loads and/or stores preceding the fence instruction
> have finished; maybe that's what the "wait" instruction in the
> subject line refers to?).

Sure, ok.

> These are typically implemented as compiler builtins or asm in C. I do
> realize that anyone working on running existing code through LLVM can
> easily work around the current asm/builtin implementation for now by
> calling an assembly function. However, a perhaps not so obvious
> implication/intent of a memory-fence-like builtin is that the programmer
> also does not want the compiler to reorder load/store instructions across
> the barrier. I do not see any mechanism in the LLVM framework to express
> such a notion of barrier/fence, or a mechanism to indicate that
> loads/stores within what might otherwise look like a "normal" basic
> block must not be reordered.

LLVM fully respects the notion that a call to an external function could
do just about anything, even when doing aggressive interprocedural
optimization. However, if you call a function which could not possibly
read or write a memory location (because it was allocated off the
heap/stack and its address is not passed, even indirectly, into the
call), LLVM will not guarantee that the store or load happens in the
proper order. For that, you need...

> [ (a) Does LLVM understand the 'volatile'
> attribute in C? (b) My apologies in advance if I end up (or already
> have?) "hijacking" this thread into another unrelated topic... ]

Yup, LLVM does fully support volatile. Note that in 1.1 there was a bug
(PR179) where an optimization incorrectly eliminated volatile
loads/stores, but that is fixed, and will be in 1.2. If you'd like the
fix, it's in CVS, or the patches are attached to the PR.

> Maybe an example (grossly simplified, but otherwise "real life") will
> help:
>
>      *old = *a->next_link;
>      *x = 1; /* set a flag - indicates start of operation */
>      *a->next_link = *b->next_link;
>
>      asm("<store fence instruction>");
>
>      *x = 0; /* reset the flag - done */
>
> Here, assume that (1) x, a, b, and old are all (non-aliased) addresses
> that map to a shared memory segment and/or the execution environment for
> this code is multi-threaded, i.e., there's another thread of execution
> (a watchdog) that the compiler may not be aware of, to which these memory
> writes are "visible".

There are two issues: the compiler and the processor. If the loads/stores
are marked volatile, LLVM will not reorder them, so you've taken care of
the compiler side of things. On the other hand, the processor (if it
doesn't have a strong consistency model) might reorder the accesses, so a
barrier/fence is still needed. For this reason, a builtin might be
appropriate. Using an abstract builtin would allow writing generic code
that works on processors with different consistency models; you would
just have to put fences in for the lowest common denominator (which is
still better than ifdefs! :).

> ...must not zap *x = 1 as a case of redundant store operation. Another
> item that falls in this general category is code that uses
> setjmp/longjmp:

<snip>

> In the example above, if the compiler doesn't understand the special
> semantics of setjmp, there's a potential for the if (x == 1) block to
> get optimized incorrectly.

According to ANSI C, any variable live across a setjmp must be marked
volatile. Of course this is silly and few people actually do that in
their code, but real compilers will break the code if you don't.
"Luckily," LLVM is _not_ one of these compilers. It will correctly update
the variable, as it explicitly represents setjmp/longjmp using the same
mechanisms it uses for C++ EH. In fact, in LLVM, longjmp and C++
destructors/cleanups even interact mostly correctly.

> My concern when one is dealing with a whole-program optimizer
> infrastructure like LLVM has been that it can easily (at least in theory)
> see through this call-a-null-function trick... Yet one could argue that
> there are plenty of legitimate optimization opportunities where memory
> references can be reordered, squashed, or hoisted across basic blocks or
> even function calls (IOW, turning off certain aggressive optimizations
> altogether might be a sledgehammer approach). I'm getting this nagging
> feeling that there may need to be a mechanism where special annotations
> are placed in the LLVM instruction stream to ensure safe
> optimizations... Someone please tell me my concerns are totally
> unfounded, at least for LLVM :-)

Your concerns are totally unfounded, at least for LLVM. :-) We fully
support volatile (even optimizing it away in some trivial cases where it
is obviously unneeded), and all of the IPO we do assumes an "open world".
That means that all of the optimizers are safe with partial programs or
libraries. In fact, we run several of the optimizers (such as the dead
argument elimination and IP constant prop passes) at compile time as well
as at link time. :-)

That said, there is still room for improvement. In particular, it would
make sense to add a small number of intrinsics for performing read/write
barriers and such. The GNU hack of 'asm("" ::: "memory")' is really
pretty nasty.

-Chris