A Proposal for adding an experimental IR-level region-annotation infrastructure
FWIW, we needed to maintain single entry-multiple exit regions for WinEH and we accomplished it via a different mechanism.
We had an instruction which produces a value of type Token (http://llvm.org/docs/LangRef.html#token-type) which let us establish the region and another instruction to exit the region by consuming it. The dominance rules allowed us to avoid situations where the compiler might trash the regions in weird ways and made sure that regions would be left unharmed.
AFAIK, a similar approach using Token could work here. I think it would reduce the amount of stuff you’d need LLVM to maintain.
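Roughly, the WinEH shape looks like this (a rough sketch; the catchpad operand list is just an illustrative catch-all):
dispatch:
  %cs = catchswitch within none [label %handler] unwind to caller
handler:
  %pad = catchpad within %cs [i8* null, i32 64, i8* null] ; produces the token that establishes the region
  ; ... handler body; the token's dominance rules keep the region intact ...
  catchret from %pad to label %cont                       ; exits the region by consuming the token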
David, one quick question: is there a way to preserve and associate a set of properties and value info/attributes with a given region using Token?
Thanks,
Xinmin
We are experimenting with a similar thing on SESE regions. We introduce an intrinsic to produce a token and another to consume the token. These two intrinsics mark the region, and we annotate extra information as an OpBundle of the intrinsic that produces the token.
Thanks
Hongbin
Would you send us the LLVM IR for the example below using token and OpBundle, so we can understand better? Thanks.
#pragma omp target teams distribute parallel for simd shared(xp, yp) linear(i) firstprivate(m, n) map(m, n)
for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
#pragma prefetch x:1:20 y:0:10
for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
I am not an OpenMP expert, so some annotations may be wrong:
// CHECK: [[ENTRY:%[a-zA-Z0-9.]+]] = tail call token @llvm.directive.scope.entry() [ "target teams distribute"(), "parallel for"(), "simd"(), "shared"(i32 *xp, i32 *yp), "linear_iv"(), "firstprivate"(i32 m, i32 n), "map"(m, n) ] ; notice that I use "linear_iv" for the linear induction variable, you may want to fix this
#pragma omp target teams distribute parallel for simd shared(xp, yp) linear(i) firstprivate(m, n) map(m, n)
for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
// CHECK: tail call void @llvm.directive.scope.exit(token [[ENTRY]])
// CHECK: [[ENTRY:%[a-zA-Z0-9.]+]] = tail call token @llvm.directive.scope.entry() [ "prefetch"(i32 *xp, i64 1, i64 20, i32 *yp, i64 0, i64 10) ]
#pragma prefetch x:1:20 y:0:10
for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
// CHECK: tail call void @llvm.directive.scope.exit(token [[ENTRY]])
Interesting, this is similar to what we have.
One more question: the operands in the call quoted below, are they represented as LLVM Values? In other words, does the LLVM optimizer update them? E.g., if %m is renamed to %m.1 in the loop, is the "m" in the token call updated as well? In the RFC, "m" is an argument of the intrinsic call, so all the use-def info is visible to the optimizer, and the optimizer updates it during optimization like regular function arguments. I am trying to understand whether there is any difference between the token scheme and the intrinsic scheme in this regard.
tail call token @llvm.directive.scope.entry() [ "target teams distribute"(), "parallel for"(), "simd"(), "shared"(i32 *xp, i32 *yp), "linear_iv"(), "firstprivate"(i32 m, i32 n), "map"(m, n) ] ;
+1, tokens are the current True Way to create single-entry multi-exit regions. Your example for an annotated loop would look like:
%region = call token @llvm.openmp.regionstart(metadata …) ; whatever parameters you need here
loop
call void @llvm.openmp.regionend(token %region)
If you use tokens, I would recommend proposal (c), where you introduce new intrinsics for every new kind of region, instead of adding one overly generic set of region intrinsics.
We already have a way to form regions with real barriers, and it’s tokens.
Yes, those are LLVM SSA values. "map"(m, n) should be "map"(i32 m, i32 n).
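For example (a sketch reusing the names from the IR above; the phi and block names are made up for illustration), a bundle operand is an ordinary use on the call, so when a pass renames %m the bundle is rewritten along with every other use:
%m.1 = phi i32 [ %m, %entry ], [ %m.next, %latch ]  ; the optimizer introduced a new name for m
%t = call token @llvm.directive.scope.entry() [ "firstprivate"(i32 %m.1, i32 %n) ] ; the bundle use is updated too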
Thanks
Hongbin
And "map", "firstprivate", etc. are represented as MDString, right? Thanks.
I think they are not MDString, but "bundle tags" that are managed by LLVMContextImpl::getOrInsertBundleTag.
+1, tokens are the current True Way to create single-entry multi-exit regions. Your example for an annotated loop would look like:
%region = call token @llvm.openmp.regionstart(metadata ...) ; whatever parameters you need here
loop
call void @llvm.openmp.regionend(token %region)
If you use tokens, I would recommend proposal (c), where you introduce new intrinsics for every new kind of region, instead of adding one overly generic set of region intrinsics.
Maybe we can come up with several categories of regions, and create a new intrinsic for each category, instead of creating a new intrinsic for every *kind*.
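For example, something along these lines (the intrinsic names are made up for illustration): one entry/exit pair per broad category, with the directive-specific details still carried in operand bundles:
%r = call token @llvm.region.parallel.entry() [ "target teams distribute"(), "shared"(i32* %xp, i32* %yp) ]
loop
call void @llvm.region.parallel.exit(token %r)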
Thanks
Hongbin
I think they are not MDString, but "bundle tags" that are managed by LLVMContextImpl::getOrInsertBundleTag.
I just treat them as something like the string returned by Inst->getName().
Got it. Thanks.
def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
                                            [IntrArgMemOnly],
                                            "llvm.experimental.directive">;

def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
                                           [IntrArgMemOnly],
                                           "llvm.experimental.dir.qual">;

def int_experimental_dir_qual_opnd : Intrinsic<[],
                                           [llvm_metadata_ty, llvm_any_ty],
                                           [IntrArgMemOnly],
                                           "llvm.experimental.dir.qual.opnd">;

def int_experimental_dir_qual_opndlist : Intrinsic<[],
                                           [llvm_metadata_ty, llvm_vararg_ty],
                                           [IntrArgMemOnly],
                                           "llvm.experimental.dir.qual.opndlist">;
I'll bite.
What does argmemonly mean when the operands are metadata?
If the rest is an attempt to keep the intrinsic from being floated or removed, I'm strongly against extending a mechanism we already know to have a significant effect on optimization (fake memory dependence) to do this. Particularly for something so major.
Can you elaborate why? I’m curious.
Thanks,
The con of proposal (c) was that many passes would need to learn about many
region intrinsics. With tokens, you only need to teach all passes about
tokens, which they should already know about because WinEH and other things
use them.
With tokens, we can add as many region-introducing intrinsics as makes
sense without any additional cost to the middle end. We don't need to make
one omnibus region intrinsic set that describes every parallel loop
annotation scheme supported by LLVM. Instead we would factor things
according to other software design considerations.
I think that, unless we allow frontends to add their own intrinsics without recompiling LLVM, this severely restricts the usefulness of this feature. The motivation here is to support frontends inserting custom region annotations which are handled by custom passes the frontends inject into the (end of the) pipeline. -Hal
Thanks for explaining!
I'm not convinced that "building a frontend without recompiling LLVM while injecting custom passes" is a strong, compelling use case; i.e., can you explain why requiring such frontends to rebuild LLVM is so limiting?
Thanks,