Prevent instruction elimination

Hello,

Does there exist something like a "dummy" instruction that is not removed by
the optimizers, even if it seems to be dead code?
I want to use such instructions to carry metadata information, but they
should have a minimal impact on the code generation and optimization. I used
an add instruction:

%0 = add i8 1, 2, !pragma_instrument_mem_add !0 ; <i8> [#uses=0]

which should not carry any dependencies if inserted inside a loop, for
instance. But the problem is that it is removed when I use optimization
level -O2. Is there a way to prevent this?
I would like to attach metadata to instructions rather than using intrinsics
that reference metadata.

Thank you,
Alexandra

Hello,

Does there exist something like a "dummy" instruction that is not removed by
the optimizers, even if it seems to be dead code?
I want to use such instructions to carry metadata information, but they
should have a minimal impact on the code generation and optimization. I used
an add instruction:

You may want to use LLVM Metadata features. Search for MDNode in the doxygen docs for some information on how to create metadata.

Alternatively, you can use calls to external functions. These are seldom optimized, since optimizations must assume that external functions can have undetermined side effects.
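As a sketch of this external-call approach (the function name @pragma_marker and the metadata names are invented for illustration, and the syntax is that of recent LLVM IR rather than the 2.x releases discussed here):

```llvm
; declared but never defined in this module, so the optimizers must
; assume the call may have side effects and keep it at -O2
declare void @pragma_marker()

define void @f() {
entry:
  ; the call survives optimization and can carry the metadata
  call void @pragma_marker(), !pragma_instrument_mem_add !0
  ret void
}

!0 = !{!"pragma info"}
```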

-- John T.

Hi,

John Criswell wrote:

You may want to use LLVM Metadata features. Search for MDNode in the
doxygen docs for some information on how to create metadata.

I use metadata information in this way:
     unsigned mk = Context.getMDKindID(mdtype);
     Value *V = MDString::get(Context, StringRef(str));
     MDNode *n = MDNode::get(Context, &V, 1);
     inst->setMetadata(mk, n);

Maybe it is a deprecated way of handling metadata, as I started with this
code from LLVM 2.6.
But what I need is a dummy instruction ("inst" in the above example) that
has a minimal impact. I want the code to be optimized (almost) as if this
instruction did not exist.

John Criswell wrote:

Alternatively, you can use calls to external functions. These are
seldom optimized since optimizations assume that external functions
can have undetermined side effects.

-- John T.

I would rather avoid calls to functions, as they would be considered a
dependence if inserted in the body of a loop, and would definitely
affect the optimization step. I also thought about inserting inline
assembly that contains only a comment and carries the metadata info. But
that is also a call, so it influences the optimizers.

Alexandra

Hi,

John Criswell wrote:

You may want to use LLVM Metadata features. Search for MDNode in the
doxygen docs for some information on how to create metadata.

I use metadata information in this way:
      unsigned mk = Context.getMDKindID(mdtype);
      Value *V = MDString::get(Context, StringRef(str));
      MDNode *n = MDNode::get(Context, &V, 1);
      inst->setMetadata(mk, n);

Maybe it is a deprecated way of handling metadata, as I started with this
code from LLVM 2.6.

I think most of the metadata APIs remained unchanged from LLVM 2.6 to LLVM 2.7 (I don't recall making many changes to SAFECode for it). I don't know about LLVM 2.7 to LLVM 2.8. In any event, I doubt the API changes are major.

But what I need is a dummy instruction ("inst" in the above example) that
has a minimal impact. I want the code to be optimized (almost) as if this
instruction did not exist.

I'm not familiar with your particular needs, and I'm not an expert on annotating LLVM IR either, but it sounds like you just need to use the LLVM metadata features.

As for instructions, I don't know of an instruction that does nothing, won't be removed by optimization, and yet does not inhibit optimization. Perhaps a local alloca and a volatile load or store would do the trick? Being volatile, the compiler won't remove it (or if it does, it's a bug, and you should file a bug report), and since it loads from a memory object not used for anything else, alias analysis should be able to see that it doesn't interfere with any other load/store.

Other volatile loads/stores won't be moved across it, but I doubt that'll be a concern. Most loads/stores are non-volatile.
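In IR, the alloca-plus-volatile-load trick might look like the following sketch (recent IR syntax; the metadata kind is invented, and, as later replies in this thread note, a volatile access to a non-escaping local may still be deleted):

```llvm
define void @f() {
entry:
  ; a local slot used for nothing else, so alias analysis can see it
  ; does not interfere with other memory operations
  %slot = alloca i8
  ; volatile is intended to keep the load alive even though %marker is unused
  %marker = load volatile i8, ptr %slot, !pragma_instrument_mem_add !0
  ret void
}

!0 = !{!"pragma info"}
```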

-- John T.

Hi John,

As for instructions, I don't know of an instruction which does nothing,
won't be removed by optimization, and yet does not inhibit
optimization. Perhaps a local alloca and a volatile load or store would
do the trick? Being volatile, the compiler won't remove it (or if it
does, it's a bug, and you should file a bug report), and since it loads
from a memory object not used for anything else, alias analysis should
be able to see that it doesn't interfere with any other load/store.

LLVM certainly will remove volatile loads and stores to local variables
(at least in simple situations). I suggest using an empty asm statement.
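For instance (a sketch; the metadata name is invented), an empty asm marked sideeffect is kept by the optimizers yet emits no machine code:

```llvm
define void @f() {
entry:
  ; "sideeffect" tells the optimizers not to delete the empty asm
  call void asm sideeffect "", ""(), !pragma_instrument_mem_add !0
  ret void
}

!0 = !{!"pragma info"}
```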

Ciao,

Duncan.

Hi John,

As for instructions, I don't know of an instruction which does nothing,
won't be removed by optimization, and yet does not inhibit
optimization. Perhaps a local alloca and a volatile load or store would
do the trick? Being volatile, the compiler won't remove it (or if it
does, it's a bug, and you should file a bug report), and since it loads
from a memory object not used for anything else, alias analysis should
be able to see that it doesn't interfere with any other load/store.

LLVM certainly will remove volatile loads and stores to local variables
(at least in simple situations). I suggest using an empty asm statement.

Really? Isn't that illegal? The whole point of "volatile" is to tell the compiler that it should not remove a load/store. Optimizing them away seems counter-intuitive and directly contradicts the documented behavior in the LLVM Language Reference manual (which states that the number of volatile loads/stores will not be changed).

-- John T.

Hi John,

As for instructions, I don't know of an instruction which does nothing,
won't be removed by optimization, and yet does not inhibit
optimization. Perhaps a local alloca and a volatile load or store would
do the trick? Being volatile, the compiler won't remove it (or if it
does, it's a bug, and you should file a bug report), and since it loads
from a memory object not used for anything else, alias analysis should
be able to see that it doesn't interfere with any other load/store.

LLVM certainly will remove volatile loads and stores to local variables
(at least in simple situations). I suggest using an empty asm statement.

Really? Isn't that illegal? The whole point of "volatile" is to tell
the compiler that it should not remove a load/store. Optimizing them
away seems counter-intuitive and directly contradicts the documented
behavior in the LLVM Language Reference manual (which states that the
number of volatile loads/stores will not be changed).

If a local variable doesn't escape the function, no other thread can
touch it, and a volatile load from it is thus proven equivalent to a
regular load.

Hi Duncan,

The empty inline asm crossed my mind as well, but LLVM handles inline
assembly as a call. This would lead to a dependence if it is inside a
loop, right? And this means a considerable impact on the optimizers.
Is it possible to avoid it?

Thanks,
Alexandra

Use of metadata will not prevent the optimizer from removing an instruction. Actually, that is a cornerstone of the LLVM metadata design.

I am curious: what information do you want to carry, and until what point?

Hi John,

As for instructions, I don't know of an instruction which does nothing,
won't be removed by optimization, and yet does not inhibit
optimization. Perhaps a local alloca and a volatile load or store would
do the trick? Being volatile, the compiler won't remove it (or if it
does, it's a bug, and you should file a bug report), and since it loads
from a memory object not used for anything else, alias analysis should
be able to see that it doesn't interfere with any other load/store.

LLVM certainly will remove volatile loads and stores to local variables
(at least in simple situations). I suggest using an empty asm statement.

Really? Isn't that illegal? The whole point of "volatile" is to tell
the compiler that it should not remove a load/store. Optimizing them
away seems counter-intuitive and directly contradicts the documented
behavior in the LLVM Language Reference manual (which states that the
number of volatile loads/stores will not be changed).

If a local variable doesn't escape the function, no other thread can
touch it, and a volatile load from it is thus proven equivalent to a
regular load.

The above logic makes sense when you're talking about non-volatile loads and stores. To me, it doesn't make sense for volatile loads and stores.

The whole point of volatile is to tell the compiler that its assumptions about how memory works do not apply to this load or store and that it should, therefore, leave it alone and not do any optimization. Informally, it's the programmer's way of telling the compiler, "I know what I'm doing and you don't, so don't touch that memory operation."

What you and Duncan are saying is that volatile is volatile except when it isn't. I think that's poor design. At the very least, it is confusing, and at worst, it prevents LLVM from handling C's "volatile" keyword correctly.

If it's decided that the current behavior is what LLVM will do, it should at least be documented in the LLVM Language Reference Manual. Right now, the current behavior directly contradicts the reference manual, and that is definitely confusing.

-- John T.

Devang Patel wrote:

Use of metadata will not prevent the optimizer from removing an instruction.
Actually, that is a cornerstone of the LLVM metadata design.

I am curious: what information do you want to carry, and until what point?
-
Devang

I want to handle new pragmas inserted in the C/C++ source code and to adapt
clang to transform them into metadata information. I want to keep this
information until I run some passes on the *.bc files and manipulate the
code. Then I delete those "dummy" instructions.
I want to transform this:
#pragma my_pragma {

C/C++ code
C/C++ code
}

into
LLVM_dummy_inst1 , !metadata_info !0
optimized LLVM code
optimized LLVM code
LLVM_dummy_inst2, !metadata_info !1

but if I run this with clang -O2 or with opt -O2, my dummy instructions are
removed, so I cannot find the pragmas in the LLVM IR.

I know that metadata will not prevent the elimination, but I am asking
whether there is any way to prevent it, or what kind of instructions I
could use to have a minimal influence on the optimizers.

Alexandra

What are you going to do if the "optimized LLVM code" is hoisted above or sunk below LLVM_dummy_inst by the optimizer? It seems you are looking for a way to communicate some info for a block of instructions. If that is the case, then one solution is to extract the interesting block of instructions into a separate function and make sure that function is not inlined. If that is not feasible, then attaching your info to each instruction in the "optimized LLVM code" block is better than trying to find artificial barriers to communicate the info.

Hi Alexandra,

The empty inline asm crossed my mind as well, but LLVM handles inline
assemblies as calls. This would lead to a dependence if it is inside of a
loop, right? And this means a considerable impact on the optimizers.
Is it possible to avoid it?

if a statement has no side-effects then the optimizers will remove it.
Thus you are obliged to have a statement with side-effects. This same
problem occurred with the debug info intrinsics, and the chosen solution
was to teach all the optimizers that these intrinsics didn't actually do
anything (but shouldn't be removed).

I think you should seek a different design. What are you really trying
to do?

Ciao,

Duncan.

Devang Patel wrote:

What are you going to do if the "optimized LLVM code" is hoisted above or
sunk below LLVM_dummy_inst by the optimizer? It seems you are looking
for a way to communicate some info for a block of instructions. If that is
the case, then one solution is to extract the interesting block of
instructions into a separate function and make sure that function is not
inlined. If that is not feasible, then attaching your info to each
instruction in the "optimized LLVM code" block is better than trying to
find artificial barriers to communicate the info.
-
Devang

Indeed, I want to use the metadata as barriers. They are meant to mark the
beginning and the end of the compound statement of the pragma.

#pragma my_pragma{
  code
}

Therefore, I expect that the dummy instructions will be part of different
basic blocks, and I can determine the set of blocks in "code", inside the
pragma region, from the control flow graph. So, if the instructions are
reordered inside the same basic block, it will not pose any problems. Also,
the case where both the beginning and the end of the barrier are part of
the same block can be handled.

I modify clang to insert dummy instructions at the location of the pragma
(at the beginning and the end of the pragma scope). I use a map of
(source_location, pragma) and insert the dummy instruction when this
location is reached in the code generator. It seems difficult to attach the
metadata to the first and the last instruction emitted for the compound
statement, as the "code" inside the pragma region can be anything.

Do you think it would be a better idea to attach metadata to all
instructions emitted between the source_location marking the beginning and
the source_location marking the end of the region?

Thank you,
Alexandra

Yes, that's how we now keep track of lexical scopes for debug info's use.

Thank you, Duncan. I thought about the intrinsics for debugging, but as
mentioned on the LLVM blog, they might influence the optimizers: "Note that
intrinsics themselves are not considered metadata, so they can affect code
generation etc." That is why I was searching for an instruction that doesn't
do anything but carries metadata.

I want to let the optimizers finish their work, and then to analyze parts of
the code, delimited by my metadata. So, the next approach is, instead of
using metadata as barriers to mark the scope of my pragma, to actually mark
all the instructions inside its scope with the metadata, as suggested by
Devang.

Thank you for the ideas.
Alexandra

Duncan Sands wrote:

Hello,

Devang Patel wrote:

#pragma my_pragma{
code
}

I use a map (source_location, pragma) and I insert the dummy instruction
when this location is reached in the code generator. It seems difficult to
attach the metadata to the first and the last instruction emitted for the
compound statement, as the "code" inside the pragma region can be anything.

Do you think it would be a better idea to attach metadata to all
instructions emitted between the source_location marking the beginning and
the source_location marking the end of the region?

Yes, that's how we now keep track of lexical scopes for debug info's use.
-
Devang

It seems I need some more guidance in these matters. So the plan is to
attach metadata to all instructions contained in the scope of the pragma.
To this end, I create a boolean inside CodeGenFunction which is set to true
once we reach the pragma and reset to false when we exit the pragma scope.
The set and reset functions are called from CGStmt in EmitCompoundStmt, if
there is a pragma corresponding to the left brace and the right brace of
the compound statement.

Next, I tried to attach metadata to all instructions emitted while the
boolean is set to true. However, this is rather tedious work, as it implies
modifying all the files in clang::CodeGen. For each generated instruction I
would have to check whether it is in the pragma region and, if so, attach
the metadata info. I am not sure that tracking all the instructions and
verifying them is the best idea.

Another idea would be to make the boolean part of the IRBuilder and set it
from CodeGen, and then, for each instruction inserted by the IRBuilder,
check the boolean and attach metadata if necessary. But I think this is too
specialized to my particular needs and would not be a good solution.
Is there a cleaner method?

Thanks,
Alexandra

Setting a bit in IRBuilder is simpler. Why not use a special-purpose IRBuilder based on the standard IRBuilder?
Another alternative is to take a two-step approach:

step 1: insert special intrinsics to mark the begin and end of your special scopes.
step 2: immediately after clang finishes generating IR, run a special pass that removes the intrinsics and updates all instructions in the scope appropriately.

This way, you'll be able to localize your changes.
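Step 1 might produce IR along these lines (a sketch; the marker names @my.pragma.begin/@my.pragma.end and the metadata are invented, using plain external declarations in place of real intrinsics):

```llvm
declare void @my.pragma.begin()
declare void @my.pragma.end()

define void @f() {
entry:
  call void @my.pragma.begin(), !my_pragma !0
  ; ... IR generated for the pragma body ...
  call void @my.pragma.end(), !my_pragma !0
  ret void
}

!0 = !{!"my_pragma"}
```

The step-2 pass would then walk each function, record the region between the two marker calls, tag the instructions in between, and erase the markers before any optimization runs.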

Another idea would be to make the boolean part of the IRBuilder and set it
from CodeGen, and then, for each instruction inserted by the IRBuilder,
check the boolean and attach metadata if necessary. But I think this is too
specialized to my particular needs and would not be a good solution.
Is there a cleaner method?

Setting a bit in IRBuilder is simpler. Why not use a special-purpose IRBuilder based on the standard IRBuilder?
Another alternative is to take a two-step approach:

step 1: insert special intrinsics to mark the begin and end of your special scopes.
step 2: immediately after clang finishes generating IR, run a special pass that removes the intrinsics and updates all instructions in the scope appropriately.

This way, you'll be able to localize your changes.

There's yet another way to implement this.
Currently, IRBuilder keeps track of a DebugLoc to set debug
information on instructions generated. You could generalize this to
keeping a SmallVector<std::pair<unsigned /*Kind*/, MDNode* /*Data*/>, 2>
(or SmallSet/DenseMap/whatever) of metadata to set on generated
instructions. Debug information is then just a pair consisting of
LLVMContext::MD_dbg and debugloc.getAsMDNode(Context).

Once you have this, you'd just need to call something like
Builder.addMetaData(MyKind, MyData) when entering your pragma'd block
and Builder.removeMetaData(MyKind) when exiting it.

Some other points:
- This could be useful for other people too, so it might be accepted
as a patch.
- Kind can also be a const char* if you prefer, but the unsigned ID
is probably more efficient when replacing or deleting Data of a
pre-existing Kind. Helper methods that accept StringRefs can easily be
added by calling Context.getMDKindID(name) before delegating to the
unsigned version.
- This would probably make IRBuilder a bit more expensive to copy if
there's an additional allocation (i.e. if the SmallVector/SmallSet got
overfull, or if you're using a DenseMap).
- I'm not sure how efficient it is to convert DebugLoc back and forth
to MDNode* (adding "dbg" metadata as an MDNode* automatically converts
it back because Instruction stores a DebugLoc) so maybe it should
still be handled specially. Benchmarking to see if it makes any
difference should be easy enough, just time compilation of something
large with clang -g...