Metadata for Argument, BasicBlock

Hi everybody,

Is there a clean way to attach metadata nodes to Arguments and/or BasicBlocks?
It looks to me like one can directly attach metadata only to instructions.
My current workaround is to insert a call to a dummy function that holds metadata for its parent block - pretty ugly, but manageable. The same problem arises when I want to store specific information about the arguments of a function.

Does anybody have a suggestion how I could do this more elegantly?

Thanks,
Ralf

Hi Ralf,

> Is there a clean way to attach metadata nodes to Arguments and/or
> BasicBlocks?

not at the moment. Feel free to work on adding this functionality!

> It looks to me like one can directly attach metadata only to instructions.
> My current workaround is to insert a call to a dummy function that holds
> metadata for its parent block - pretty ugly, but manageable. The same
> problem arises when I want to store specific information about the
> arguments of a function.
>
> Does anybody have a suggestion how I could do this more elegantly?

Maybe you could take the address of the basic block (using blockaddress), and
use that as an argument for a module level metadatum.

Ciao, Duncan.

Hi Duncan,

> Hi Ralf,
>
>> Is there a clean way to attach metadata nodes to Arguments and/or
>> BasicBlocks?
>
> not at the moment. Feel free to work on adding this functionality!

I am looking into that now.
I decided to temporarily go for the following syntax for BasicBlock metadata (subject to discussion):

entry:
     !property_1 !0, !property_2 !1
   %x = fadd float %a, %b

It seems that I have to touch lots of files for this:
BasicBlock.h/.cpp, Metadata.cpp, LLParser.cpp, AsmParser, AsmWriter, BitcodeReader.cpp, BitcodeWriter.cpp so far.
I basically went and duplicated code that handles metadata attached to instructions.

Concerning Argument metadata, I am unsure of how to best represent this in LLVM IR. The following seems to be better than putting the metadata somewhere before the first block or anywhere else, but the space in the parameter list looks a bit crowded...

declare void @test(i32 %param1 !property1 !0, !property2 !1,
                   float* %param2 readonly !property2 !1)

Cheers,
Ralf

What kind of things might basic block metadata be used for?

Dan

Hi Dan,

I am using it to store results of a vectorization analysis. A BasicBlock has certain properties in this context, e.g. we mark control flow that may never diverge in different instances ("threads" if you think in terms of CUDA) of the same function by marking the corresponding blocks. This information is later used when linearizing the function (control flow to data flow conversion). I'll be happy to give you more detail on this if you want to :).
I could imagine there are other things that could make use of this, or am I wrong with that?

Cheers,
Ralf

If we were to implement the #unroll pragma, we would want to add metadata to loop headers. But, it's not a big deal since we can simply add this metadata to block terminators.

Indeed, Nadav.
I also want to store information about loops as block-metadata of the loop header, but as you say this is easily doable by using block terminators.
However, for the divergence analysis, we cannot use the terminators, because the properties of a block are determined by multiple criteria.
If you do not want to introduce ugly dummy-calls to store that data (and write even more ugly code to find that call later etc.), you need block metadata.

FYI, I just sent a patch to llvm-commits that implements basic block metadata:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142379.html

> Hi Dan,
>
> I am using it to store results of a vectorization analysis. A BasicBlock has certain properties in this context, e.g. we mark control flow that may never diverge in different instances ("threads" if you think in terms of CUDA) of the same function by marking the corresponding blocks. This information is later used when linearizing the function (control flow to data flow conversion). I'll be happy to give you more detail on this if you want to :).

Why are you using metadata to store the results of an analysis?
LLVM has infrastructure for running analysis passes and making
their information available to other passes.

> I could imagine there are other things that could make use of this, or am I wrong with that?

There are surely many things it could be used for. Interesting
questions include whether or not there are other ways to
achieve those things, and whether making a basic block be
something which can carry special semantics is a concept that
makes sense within the rest of the system.

Dan

Hi Dan,

>> I am using it to store results of a vectorization analysis. A BasicBlock has certain properties in this context, e.g. we mark control flow that may never diverge in different instances ("threads" if you think in terms of CUDA) of the same function by marking the corresponding blocks. This information is later used when linearizing the function (control flow to data flow conversion). I'll be happy to give you more detail on this if you want to :).
>
> Why are you using metadata to store the results of an analysis?
> LLVM has infrastructure for running analysis passes and making
> their information available to other passes.

The analysis is only one way to supply the necessary information to the vectorizer - it could also be generated by a front-end directly. This is a very likely use-case for data-parallel languages that have specific constructs like "uniform"/"varying" (e.g. RenderMan in graphics). Metadata is the perfect thing to store this kind of information.

>> I could imagine there are other things that could make use of this, or am I wrong with that?
>
> There are surely many things it could be used for. Interesting
> questions include whether or not there are other ways to
> achieve those things, and whether making a basic block be
> something which can carry special semantics is a concept that
> makes sense within the rest of the system.

That is why I brought this up here for discussion with those of you who know more about the implications that such functionality would have.

Cheers,
Ralf

I'd be really keen for this to go in. In order to support worst-case execution time analysis on compiled binaries we (XMOS) need a way to mark paths that should be excluded when checking timing constraints. A typical query is "Check that the worst-case execution time from A to B, excluding paths which pass through location C, is no more than x nanoseconds". Here C would be a location marked in the source code. A natural way to implement this would be to have the frontend attach metadata to the basic block containing C, associating the block with the label "C". Trying to attach the information to an individual instruction is problematic since that specific instruction might be removed or hoisted. Other solutions I can think of might prevent optimizations from happening (which is something we want to avoid).

Regards,

Richard

Attaching metadata to a block isn't completely free of problems either, because blocks
can be merged, duplicated, and so on.
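
The merging concern can be illustrated with a small sketch. This is plain C++ with hypothetical names (BlockMetadata, mergeBlockMetadata); it is not code from the patch or from LLVM. It assumes one possible conservative policy: when two blocks are merged, keep only the attachments that both blocks carry with the same node, and drop everything else.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of block-level attachments:
// metadata kind name -> id of the attached metadata node.
using BlockMetadata = std::map<std::string, int>;

// Assumed policy (not taken from the patch): an attachment survives a merge
// only if both blocks carry the same kind pointing at the same node.
BlockMetadata mergeBlockMetadata(const BlockMetadata &a,
                                 const BlockMetadata &b) {
    BlockMetadata merged;
    for (const auto &entry : a) {
        auto it = b.find(entry.first);
        if (it != b.end() && it->second == entry.second)
            merged.insert(entry);
    }
    return merged;
}
```

Dropping an attachment is only safe when its absence means "nothing known" rather than the opposite of the property, so every kind of block metadata would need to be designed with that in mind.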

A possible alternative approach would be to use a special inline asm, like

  asm volatile ("# my special anchor");

(written here as C syntax because it's easier in email)

as an anchor to mark a position in the program. Today, LLVM treats this as reading
and writing arbitrary memory, which blocks memory optimizations, but this seems
to be over-conservative, since GCC does not (GCC requires the explicit "memory"
clobber to indicate memory use). If this were fixed, such an inline asm would be a
convenient way to mark a position in a program in a way that wouldn't be lightly
deleted or moved, and it would have a minimal impact on optimization.

As long as LLVM's target-independent optimizers don't start interpreting inline asm
strings, which seems a sane assumption.
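
For illustration, here is how such an anchor might sit on one path of a function. This is only a sketch: clampToLimit is a hypothetical function, the code relies on the GCC/Clang asm extension, and it assumes an assembler that treats '#' as the start of a comment, as the x86 GNU assembler does.

```cpp
#include <cassert>

// Hypothetical function with a marked location "C" on one of its paths.
int clampToLimit(int value, int limit) {
    if (value > limit) {
        // Anchor for location C. The asm string is only an assembler
        // comment, so it emits no instruction; "volatile" asks the
        // compiler not to delete the statement.
        asm volatile("# my special anchor");
        return limit;
    }
    return value;
}
```

The anchor does not change what the function computes; it only leaves a marker in the generated assembly on the path that passes through it.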

Dan

I'm not a graphics expert, but my understanding of "uniform/varying" is that it's really
a property of values, the values which determine which basic blocks are executed, rather
than of the basic blocks themselves. Is that true?

Dan

That is true, sorry for that confusing sentence.
When vectorizing an entire function to exploit data-level parallelism, normally the whole function is linearized during control-flow to data-flow conversion.
Our analysis [1] is able to determine cases in which parts of the CFG can be excluded from linearization ("non-divergent" blocks).
The basis for this is indeed the uniform/varying property of branch conditions, but the criterion is block-specific and thus naturally attaches to basic blocks.
Also, the block properties influence the uniform/varying property of values (e.g. a phi with uniform incoming values that is in a block where paths from a varying branch join).

Best,
Ralf

[1] http://dx.doi.org/10.1007/978-3-642-28652-0_1

A front-end isn't going to do this analysis though. A front-end is just going to know
which user variables are "varying" and which are "uniform". From your descriptions, it
sounds like your analysis pass is doing all the work thinking about how values, branches,
and phis relate to each other, and it can presumably publish its results to its clients through
the normal Pass communication mechanisms.
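
To make the side-table idea concrete, here is a minimal sketch in plain C++. It is a toy model and does not use LLVM's pass API: Block, DivergenceInfo, runDivergenceAnalysis and blocksToLinearize are hypothetical names, and the branchConditionUniform flag merely stands in for the real divergence criteria. It only shows the shape of the approach: the analysis keeps its per-block results outside the IR and clients query them.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a basic block; this is not llvm::BasicBlock.
struct Block {
    std::string name;
    bool branchConditionUniform; // toy input replacing the real criteria
};

// Hypothetical analysis result: a side table keyed by block address,
// kept outside the IR instead of being attached as metadata.
class DivergenceInfo {
public:
    void setNonDivergent(const Block *block, bool value) {
        table_[block] = value;
    }
    // Blocks the analysis never saw are conservatively treated as divergent.
    bool isNonDivergent(const Block *block) const {
        auto it = table_.find(block);
        return it != table_.end() && it->second;
    }

private:
    std::unordered_map<const Block *, bool> table_;
};

// Toy "analysis pass". The result stores pointers into `blocks`, so it is
// only valid while that vector is neither modified nor destroyed.
DivergenceInfo runDivergenceAnalysis(const std::vector<Block> &blocks) {
    DivergenceInfo info;
    for (const Block &block : blocks)
        info.setNonDivergent(&block, block.branchConditionUniform);
    return info;
}

// Toy client "pass": names of the blocks that still need linearization.
std::vector<std::string> blocksToLinearize(const std::vector<Block> &blocks,
                                           const DivergenceInfo &info) {
    std::vector<std::string> names;
    for (const Block &block : blocks)
        if (!info.isNonDivergent(&block))
            names.push_back(block.name);
    return names;
}
```

In this model nothing is written into the IR, so there is nothing to parse, print or keep valid across transformations; the trade-off is that the information exists only while the analysis result is alive.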

Dan

Or maybe a nop/anchor intrinsic?

So, is there any chance that metadata for basic blocks is considered a useful feature?

There is a patch ready and on the commits-list, it compiles, passes all tests, has a test case of its own, and (as far as I can tell) does not interfere with anything.

Cheers,
Ralf

communication channel between passes. This does not appear to be a
better approach than using LLVM's conventional pass communication channels.

Dan