metadata syntax

I'm looking for input into the syntax for future metadata work. The plan is to make MDNodes hold a list of WeakVHs which will allow us to track metadata associated with values even through calls to ReplaceAllUsesWith. It also means that you can refer to other Values in the program, including instructions in another Function.

That's where the problems begin. There's currently no way in llvm assembly to refer to instructions in another function, though that's not hard to solve; I propose "@func/%tmp" as the syntax. The harder problem is how to express references to void instructions (stores, branches, switches) which historically have not been allowed to have names since they could never be referred to.

The obvious solution is to let them have names. The trouble is that this breaks .ll syntax for any program that's using old-style anonymous instructions. For example:

define i32 @foo(i32* %ptr) {
   add i32 1, 2 ; %1
   store i32 0, i32* %ptr ; previously ignored, now %2
   add i32 3, 4 ; previously %2, now %3
   ret i32 %2 ;; illegal! refers to the store not the add
}
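
The renumbering in the example above can be reproduced with a small Python sketch. This is a toy model, not LLVM code: the function name `number_unnamed` and the `(opcode, is_void)` encoding are hypothetical, and numbering starts at %1 only to match the example.

```python
def number_unnamed(instrs, number_void):
    """Assign implicit %N slots to unnamed instructions.

    instrs: list of (opcode, is_void) pairs, a toy stand-in for a function body.
    number_void: False models the old rule (void instructions get no slot),
                 True models the proposed rule (every instruction gets one).
    Returns a dict mapping instruction index -> slot number.
    """
    slots = {}
    next_slot = 1  # assumption: start at %1 to match the example
    for index, (opcode, is_void) in enumerate(instrs):
        if is_void and not number_void:
            continue  # old rule: void instructions are skipped
        slots[index] = next_slot
        next_slot += 1
    return slots


# The body of @foo from the example: add, store, add, ret.
body = [("add", False), ("store", True), ("add", False), ("ret", True)]

old = number_unnamed(body, number_void=False)
new = number_unnamed(body, number_void=True)

for index, (opcode, _) in enumerate(body):
    print(opcode, "old:", old.get(index), "new:", new.get(index))
```

Under the old rule the second add is slot 2; under the proposed rule slot 2 belongs to the store, so an existing `ret i32 %2` would silently change meaning.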

Is this okay? Or do we need to come up with a different solution? Any suggestions?

Nick

Hi Nick,

> I'm looking for input into the syntax for future metadata work. The plan
> is to make MDNodes hold a list of WeakVHs which will allow us to track
> metadata associated with values even through calls to
> ReplaceAllUsesWith. It also means that you can refer to other Values in
> the program, including instructions in another Function.
>
> That's where the problems begin. There's currently no way in llvm
> assembly to refer to instructions in another function, though that's not
> hard to solve; I propose "@func/%tmp" as the syntax. The harder problem
> is how to express references to void instructions (stores, branches,
> switches) which historically have not been allowed to have names since
> they could never be referred to.

why do you need to refer to instructions in different functions,
and instructions with no names? I'm kind of lost as to what the
purpose of this is.

Ciao,

Duncan.

Duncan Sands wrote:

> Hi Nick,
>
>> I'm looking for input into the syntax for future metadata work. The plan is to make MDNodes hold a list of WeakVHs which will allow us to track metadata associated with values even through calls to ReplaceAllUsesWith. It also means that you can refer to other Values in the program, including instructions in another Function.
>>
>> That's where the problems begin. There's currently no way in llvm assembly to refer to instructions in another function, though that's not hard to solve; I propose "@func/%tmp" as the syntax. The harder problem is how to express references to void instructions (stores, branches, switches) which historically have not been allowed to have names since they could never be referred to.
>
> why do you need to refer to instructions in different functions,
> and instructions with no names? I'm kind of lost as to what the
> purpose of this is.

It'd be nice to say "this StoreInst doesn't overlap that StoreInst" in a metadata. Or perhaps "here's the likely values for this SwitchInst".

I have only weak ideas for how interprocedural metadata could be used, but it's clear that an MDNode made out of WeakVH's could hold one, which is why I'd like to get a syntax for them. It's certainly something that could wait until we have a client, but I was hoping to not make multiple different syntaxes/bitcode formats that we would have to support in the future.

The first client I have in mind for metadata is TBAA.

Nick

Hi Nick,

> It'd be nice to say "this StoreInst doesn't overlap that StoreInst" in a
> metadata. Or perhaps "here's the likely values for this SwitchInst".

well... ok, if you must :)

> I have only weak ideas for how interprocedural metadata could be used,
> but it's clear that an MDNode made out of WeakVH's could hold one, which
> is why I'd like to get a syntax for them. It's certainly something that
> could wait until we have a client, but I was hoping to not make multiple
> different syntaxes/bitcode formats that we would have to support in the
> future.

How can MDNodes made out of WeakVH's result in this?

> The first client I have in mind for metadata is TBAA.

To hold the graph of which type can alias which other?

I fear that metadata, being extremely flexible, is going to end up being
used for a gazillion different things simply because it's the easy solution
(it's already there), not because it's the right solution, resulting in a
big pile of ill-defined metadata mush floating around in the IR. Any
thoughts on how to avoid that?

Ciao,

Duncan.

Why do you think that TBAA will beat Debug Info on this front ;)

Back to the original topic: I still don't see a really good reason to
refer to instructions in another function. I'd rather wait until we have
a good reason to use it. And if we have to invent a new syntax, then
please avoid overloading "/".

gcc has an extension to allow the address of labels to be taken and exported. IMO it would not be a great loss if llvm refused to support this, but I suppose somebody somewhere is using it...

Duncan Sands wrote:

> Hi Nick,
>
>> It'd be nice to say "this StoreInst doesn't overlap that StoreInst" in a metadata. Or perhaps "here's the likely values for this SwitchInst".
>
> well... ok, if you must :)
>
>> I have only weak ideas for how interprocedural metadata could be used,
>> but it's clear that an MDNode made out of WeakVH's could hold one, which is why I'd like to get a syntax for them. It's certainly something that could wait until we have a client, but I was hoping to not make multiple different syntaxes/bitcode formats that we would have to support in the future.
>
> How can MDNodes made out of WeakVH's result in this?

A WeakVH is just a fancy Value*. Unlike the instruction stream, there are no restrictions on which Values it can hold at a given point, and I don't see any good reason to add them. Saying "we couldn't think of a syntax for the .ll file" is a terrible reason, but at least it's honest.
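
As a rough illustration of the "fancy Value*" idea, here is a toy Python model of a handle that follows ReplaceAllUsesWith. It is only a sketch: the real WeakVH is a C++ class, and every name here (`Value`, `WeakHandle`, `delete`) is hypothetical. Nulling the handle on deletion is modelled as an assumption about how such handles behave.

```python
class Value:
    """Toy stand-in for an IR value; it knows which handles watch it."""

    def __init__(self, name):
        self.name = name
        self.handles = []

    def replace_all_uses_with(self, new):
        # Retarget every handle that currently points at this value.
        for handle in list(self.handles):
            handle.set(new)


class WeakHandle:
    """Toy handle: follows replacement, becomes None on deletion."""

    def __init__(self, value=None):
        self.value = None
        self.set(value)

    def set(self, value):
        if self.value is not None:
            self.value.handles.remove(self)
        self.value = value
        if value is not None:
            value.handles.append(self)


def delete(value):
    """Model destroying a value: its handles are nulled out."""
    for handle in list(value.handles):
        handle.set(None)


store_a = Value("store_a")  # imagine a store in one function
store_b = Value("store_b")  # imagine a store in another function
merged = Value("merged")

# A toy "MDNode": just a list of handles, with no check on where values live.
node = [WeakHandle(store_a), WeakHandle(store_b)]

store_a.replace_all_uses_with(merged)
delete(store_b)

print([h.value.name if h.value else None for h in node])
```

Nothing in the handle checks which function a value belongs to, which is the point made above: the restriction would have to be added deliberately.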

>> The first client I have in mind for metadata is TBAA.
>
> To hold the graph of which type can alias which other?

Almost, since llvm::Type refers to multiple high-level types anyways. You could store a piece of metadata that associates the StoreInst with a number for the high-level type, then the graph relating those numbers would be 'elsewhere'. (No, I haven't designed how TBAA ought to work.)

> I fear that metadata, being extremely flexible, is going to end up being
> used for a gazillion different things simply because it's the easy solution
> (it's already there), not because it's the right solution, resulting in a
> big pile of ill-defined metadata mush floating around in the IR. Any
> thoughts on how to avoid that?

My fear was that nobody would use metadata. :)

The answer is that it's meant to be the easy solution in cases where you just want to pass data about the instruction stream through the IR, without actually having instructions. Some folks might try to use metadata instead of just creating an analysis with a map, in which case we'll have to catch that on review like we do with any other design problem. I just don't think metadata is that special.

Nick

Code review!

-Chris

Metadata can't be used for this, because metadata is a "best effort" thing, not a guaranteed way to track data.

-Chris

> I'm looking for input into the syntax for future metadata work. The plan
> is to make MDNodes hold a list of WeakVHs which will allow us to track
> metadata associated with values even through calls to
> ReplaceAllUsesWith. It also means that you can refer to other Values in
> the program, including instructions in another Function.
>
> That's where the problems begin. There's currently no way in llvm
> assembly to refer to instructions in another function, though that's not
> hard to solve; I propose "@func/%tmp" as the syntax. The harder problem
> is how to express references to void instructions (stores, branches,
> switches) which historically have not been allowed to have names since
> they could never be referred to.

I don't see a great need to support references to values in different functions.

> The obvious solution is to let them have names. The trouble is that this
> breaks .ll syntax for any program that's using old-style anonymous
> instructions. For example:
>
> define i32 @foo(i32* %ptr) {
>   add i32 1, 2 ; %1
>   store i32 0, i32* %ptr ; previously ignored, now %2
>   add i32 3, 4 ; previously %2, now %3
>   ret i32 %2 ;; illegal! refers to the store not the add
> }
>
> Is this okay? Or do we need to come up with a different solution? Any
> suggestions?

We can't break this. I'd suggest putting void-value'd stuff in a separate namespace, perhaps !123 ?

That way they'd be numbered independently of the %'s.
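
A minimal Python sketch of that idea, using two independent counters. It is a toy model rather than LLVM code: the function name is hypothetical, "!N" follows the tentative spelling suggested above, and starting both counters at 1 is an assumption.

```python
def number_with_void_namespace(instrs):
    """Number value-producing instructions as %N and void ones as !N.

    instrs: list of (opcode, is_void) pairs, a toy stand-in for a function body.
    Returns the implicit name chosen for each instruction, in order.
    """
    names = []
    next_value = 1  # assumption: % numbering starts at 1, as in the example
    next_void = 1   # assumption: the separate ! counter also starts at 1
    for opcode, is_void in instrs:
        if is_void:
            names.append(f"!{next_void}")
            next_void += 1
        else:
            names.append(f"%{next_value}")
            next_value += 1
    return names


# The body of @foo from the quoted example: add, store, add, ret.
body = [("add", False), ("store", True), ("add", False), ("ret", True)]
names = number_with_void_namespace(body)
print(names)
```

The two adds keep %1 and %2, so the existing `ret i32 %2` still refers to the second add, while the store and the ret become referable through their own counter.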

-Chris