New intrinsic property IntrOnlyWrite

Hi,

I'd like to draw your attention to http://reviews.llvm.org/D18291, in which I propose a new intrinsic property for intrinsics that are lowered to instructions that mayStore, but are neither mayLoad nor hasSideEffects.

This is relevant for AMDGPU, where we have store instructions that don't operate on pointers. The codegen backend understands these perfectly well as stores, and so we can enable better scheduling decisions than if we forced these instruction to hasSideEffects.

In a perfect world, we'd be able to model the behavior of these load and store intrinsics via ReadWriteArgMem, but that would require massive changes in how LLVM thinks about memory locations and how to describe them.

This comparatively minor addition allows us to move forward with decent scheduling in codegen for the time being.

Cheers,
Nicolai

I'm generally in support of this change. I haven't looked at the patch yet, but the direction seems worthwhile.

Note that we already have a writeonly predicate in a few places in the code (BasicAA being one). If we do introduce the new intrinsic property, we should refactor all of these places to use the new mechanism. This could be separate changes, but should be considered required as part of adding the new property.

Another way you might consider slicing the problem is to introduce a WriteOnlyArg property. When combined with ArgMemOnly, this would more precisely model the pseudo store (right?) and is a better fit for the memset/memcpy use case already handled in BasicAA.

Philip

p.s. Inline comments below specific to your use case.

Hi,

I'd like to draw your attention to http://reviews.llvm.org/D18291, in which I propose a new intrinsic property for intrinsics that are lowered to instructions that mayStore, but are neither mayLoad nor hasSideEffects.

This is relevant for AMDGPU, where we have store instructions that don't operate on pointers. The codegen backend understands these perfectly well as stores, and so we can enable better scheduling decisions than if we forced these instruction to hasSideEffects.

Can you give a bit more detail here? Example, etc..?

In a perfect world, we'd be able to model the behavior of these load and store intrinsics via ReadWriteArgMem, but that would require massive changes in how LLVM thinks about memory locations and how to describe them.

This comments makes me think you might have a much deeper problem. Let's see an actual example first though; I might just be misreading your intent.

Hi,

Can you elaborate what is the impact at the IR level?
If the point is just about how you lower for you target, why are you needing an IR level attribute? You backend if free to specialize the lowering for any intrinsic regardless the IR level attributes.

I'm generally in support of this change. I haven't looked at the patch
yet, but the direction seems worthwhile.

Note that we already have a writeonly predicate in a few places in the
code (BasicAA being one). If we do introduce the new intrinsic
property, we should refactor all of these places to use the new
mechanism. This could be separate changes, but should be considered
required as part of adding the new property.

That makes a lot of sense.

Another way you might consider slicing the problem is to introduce a
WriteOnlyArg property. When combined with ArgMemOnly, this would more
precisely model the pseudo store (right?) and is a better fit for the
memset/memcpy use case already handled in BasicAA.

This is unsufficient for our use case. Let me try to explain with an example (also for your later comments). We have an intrinsic

declare void @llvm.amdgcn.buffer.store.format.f32(float, <4 x i32>, i32, i32, i1, i1) #1

that stores a 32-bit value in a memory location that is described by a "buffer descriptor" (the <4 x i32>), an i32 index and an i32 offset. You can think of the buffer descriptor as a "heavy pointer" that contains a base address and can also contain additional information like a buffer size, a stride, and a data format. The given index is multiplied by the stride before being added to the base address; the offset is an additional byte offset. The resulting address is checked against the buffer size (and an out of bounds store is silently discarded, which fits the requirements of APIs like OpenGL). Furthermore, the data being stored may be subject to a format conversion (float to int, etc.). The additional i1 arguments enable provide some cache control.

If you have some advice on how to fit such a memory addressing model into the ArgMemOnly logic that would be appreciated, but from my reading of the code it appears that trying to make ArgMemOnly work with this would be a rather large project which we don't really have the resources to tackle right now.

Why am I pushing for IntrWriteOnly?

I suspect the merely IntrWriteOnly without a functional ArgMemOnly does not provide a lot of opportunity for optimization at the IR level.

However, the codegen backend does understand the resulting hardware instructions, which are marked mayLoad = 0, mayStore = 1, hasSideEffects = 0. Having mayLoad = 0 and hasSideEffects = 0 makes a difference for instruction scheduling.

If you try to map the intrinsic as-is, without IntrWriteOnly, onto such hardware instructions, TableGen will (correctly) complain about a mismatch of mayLoad and hasSideEffects.

So I'd like to do a minimal change that will make TableGen happy, but I do not want that change to be an ugly kludge.

I understand your concern about having this interact nicely with BasicAA. Perhaps both an IntrWriteOnlyMem and an IntrWriteOnlyArgMem should be added for orthogonality?

Cheers,
Nicolai

As I explained in my reply to Philip, what I really need is a way to get TableGen to shut up about what it reasonably believes to be a mismatch between the properties of the intrinsic (which it conservatively believes to be mayLoad = 1, mayStore = 1, hasSideEffects = 1) and the hardware instruction (which is correctly mayLoad = 0, mayStore = 1, hasSideEffects = 0).

Indeed, write-only without an ArgMemOnly property may well be useless at the IR level. If you think that there is a better way to explain the situation to TableGen, please let me know.

Please see my mail to Philip for a more detailed explanation for why we are in this situation.

Thanks,
Nicolai

I'm generally in support of this change. I haven't looked at the patch
yet, but the direction seems worthwhile.

Note that we already have a writeonly predicate in a few places in the
code (BasicAA being one). If we do introduce the new intrinsic
property, we should refactor all of these places to use the new
mechanism. This could be separate changes, but should be considered
required as part of adding the new property.

That makes a lot of sense.

Another way you might consider slicing the problem is to introduce a
WriteOnlyArg property. When combined with ArgMemOnly, this would more
precisely model the pseudo store (right?) and is a better fit for the
memset/memcpy use case already handled in BasicAA.

This is unsufficient for our use case. Let me try to explain with an example (also for your later comments). We have an intrinsic

declare void @llvm.amdgcn.buffer.store.format.f32(float, <4 x i32>, i32, i32, i1, i1) #1

that stores a 32-bit value in a memory location that is described by a "buffer descriptor" (the <4 x i32>), an i32 index and an i32 offset. You can think of the buffer descriptor as a "heavy pointer" that contains a base address and can also contain additional information like a buffer size, a stride, and a data format. The given index is multiplied by the stride before being added to the base address; the offset is an additional byte offset. The resulting address is checked against the buffer size (and an out of bounds store is silently discarded, which fits the requirements of APIs like OpenGL). Furthermore, the data being stored may be subject to a format conversion (float to int, etc.). The additional i1 arguments enable provide some cache control.

So, I think you have a fundamental problem here. By not using pointers, you are breaking one of the deepest assumptions in LLVM: that we can reason about memory aliasing without having to consider the numeric values of non-pointer values. If a buffer can never *ever* be addressed via a normal pointer, then *maybe* we can paper over this, but if you're mixing pointer and non-pointer accesses to the same address, you have a potentially serious problem.

As a concrete example, let's take the following:
%mem = malloc() <-- returns a noalias pointer
%buf = construct_buf_ref(%mem) <-- assume this is transparent and doesn't capture
load %mem
buffer_store(%buf)
load %mem

In this (psuedo code) example, basicaa will use capture tracking to *prove* that buffer_store can't have written to %mem. buffer_store is allowed to write to any memory it can address, but because %mem has not been captured, buffer_store can not obtain a pointer to it. This could lead to use value forwarding the load and producing an incorrect result.

Note: If you treat "construct_buf_ref" as a capturing operation, you can paper over this example (and many others). I'm using it mostly as an example of how deep the assumptions about memory, pointers, and capturing go.

If you have some advice on how to fit such a memory addressing model into the ArgMemOnly logic that would be appreciated, but from my reading of the code it appears that trying to make ArgMemOnly work with this would be a rather large project which we don't really have the resources to tackle right now.

Why am I pushing for IntrWriteOnly?

I suspect the merely IntrWriteOnly without a functional ArgMemOnly does not provide a lot of opportunity for optimization at the IR level.

I think this is not entirely true. In particular, it gives us a way to model things like memset and memset_16 by combining InstrWriteOnly and ArgMemOnly. It's not quite as powerful as having a writeonly argument attribute, but it's better than where we are today.

However, the codegen backend does understand the resulting hardware instructions, which are marked mayLoad = 0, mayStore = 1, hasSideEffects = 0. Having mayLoad = 0 and hasSideEffects = 0 makes a difference for instruction scheduling.

If you try to map the intrinsic as-is, without IntrWriteOnly, onto such hardware instructions, TableGen will (correctly) complain about a mismatch of mayLoad and hasSideEffects.

So I'd like to do a minimal change that will make TableGen happy, but I do not want that change to be an ugly kludge.

I'm not opposed to introducing an InstrWriteOnly property for intrinsics. My concern is that you're expecting it to be something it isn't (per capture discussion above.)

I understand your concern about having this interact nicely with BasicAA. Perhaps both an IntrWriteOnlyMem and an IntrWriteOnlyArgMem should be added for orthogonality?

A InstrWriteOnly + ArgMemOnly should equal an InstWriteOnlyArgMem. Thus, we shouldn't need separate attributes. My original comment was about introducing a writeonly attribute on the argument of a function. (i.e. a *particular* attribute) This would allow us to precisely describe memcpy for instance. (src argument is readonly, dest argument is writeonly)

Hi,

Can you elaborate what is the impact at the IR level?
If the point is just about how you lower for you target, why are you needing an IR level attribute? You backend if free to specialize the lowering for any intrinsic regardless the IR level attributes.

As I explained in my reply to Philip, what I really need is a way to get TableGen to shut up about what it reasonably believes to be a mismatch between the properties of the intrinsic (which it conservatively believes to be mayLoad = 1, mayStore = 1, hasSideEffects = 1) and the hardware instruction (which is correctly mayLoad = 0, mayStore = 1, hasSideEffects = 0).

I'm not sure what is the semantics of "hasSideEffects" at the MI level. I'm surprised we can consider correct that something that writes to memory is not "having side effects".

At the IR level, the definition of "mayHaveSideEffects" is more coherent with what I expect:

  bool mayHaveSideEffects() const {
    return mayWriteToMemory() || mayThrow() || !mayReturn();
  }

Indeed, write-only without an ArgMemOnly property may well be useless at the IR level. If you think that there is a better way to explain the situation to TableGen, please let me know.

I don't really understand why an MI which is "mayLoad" would have a side effect?
I don't get either why not an MI which is "mayStore" would not have a side effect. I'd expect that mayStore imply "hasSideEffect" (i.e. the latter being a strict superset).
(but hey, I'm not an MI expert and I'd need some clarification on the implication of "hasSideEffect" at the MI level)

The MI level talks about side effects not modeled by any other mechanism (MCInstrDesc::hasUnmodeledSideEffects()) which is more useful than a conservative as we can differentiate between effects of memory read/writes (to possible known addresses) and arbitrary things (like wait for input, write to device, ...).

- Matthias

Answer to myself, I finally found the MI API (thanks Matthias), and it is called hasUnmodeledSideEffects. I think the "unmodeled" part of the name is important here :slight_smile:
Here is the doxygen:

  /// \brief Return true if this instruction has side
  /// effects that are not modeled by other flags. This does not return true
  /// for instructions whose effects are captured by:
  ///
  /// 1. Their operand list and implicit definition/use list. Register use/def
  /// info is explicit for instructions.
  /// 2. Memory accesses. Use mayLoad/mayStore.
  /// 3. Calling, branching, returning: use isCall/isReturn/isBranch.

Now you initially reported that "TableGen will (correctly) complain about a mismatch of mayLoad and hasSideEffects", I believe this is incorrect considering the above description.

I'm generally in support of this change. I haven't looked at the patch
yet, but the direction seems worthwhile.

Note that we already have a writeonly predicate in a few places in the
code (BasicAA being one). If we do introduce the new intrinsic
property, we should refactor all of these places to use the new
mechanism. This could be separate changes, but should be considered
required as part of adding the new property.

That makes a lot of sense.

Another way you might consider slicing the problem is to introduce a
WriteOnlyArg property. When combined with ArgMemOnly, this would more
precisely model the pseudo store (right?) and is a better fit for the
memset/memcpy use case already handled in BasicAA.

This is unsufficient for our use case. Let me try to explain with an
example (also for your later comments). We have an intrinsic

declare void @llvm.amdgcn.buffer.store.format.f32(float, <4 x i32>,
i32, i32, i1, i1) #1

that stores a 32-bit value in a memory location that is described by a
"buffer descriptor" (the <4 x i32>), an i32 index and an i32 offset.
You can think of the buffer descriptor as a "heavy pointer" that
contains a base address and can also contain additional information
like a buffer size, a stride, and a data format. The given index is
multiplied by the stride before being added to the base address; the
offset is an additional byte offset. The resulting address is checked
against the buffer size (and an out of bounds store is silently
discarded, which fits the requirements of APIs like OpenGL).
Furthermore, the data being stored may be subject to a format
conversion (float to int, etc.). The additional i1 arguments enable
provide some cache control.

So, I think you have a fundamental problem here. By not using pointers,
you are breaking one of the deepest assumptions in LLVM: that we can
reason about memory aliasing without having to consider the numeric
values of non-pointer values. If a buffer can never *ever* be addressed
via a normal pointer, then *maybe* we can paper over this, but if you're
mixing pointer and non-pointer accesses to the same address, you have a
potentially serious problem.

As a concrete example, let's take the following:
%mem = malloc() <-- returns a noalias pointer
%buf = construct_buf_ref(%mem) <-- assume this is transparent and
doesn't capture
load %mem
buffer_store(%buf)
load %mem

In this (psuedo code) example, basicaa will use capture tracking to
*prove* that buffer_store can't have written to %mem. buffer_store is
allowed to write to any memory it can address, but because %mem has not
been captured, buffer_store can not obtain a pointer to it. This could
lead to use value forwarding the load and producing an incorrect result.

Note: If you treat "construct_buf_ref" as a capturing operation, you can
paper over this example (and many others). I'm using it mostly as an
example of how deep the assumptions about memory, pointers, and
capturing go.

Thank you for the explanation, I learned something new!

However, I don't think this is a problem for us. First, we don't even have malloc (GPUs are funny that way). We do use alloca, but we can rely on the front-ends not emitting code that generates buffer descriptors from pointers.

If you have some advice on how to fit such a memory addressing model
into the ArgMemOnly logic that would be appreciated, but from my
reading of the code it appears that trying to make ArgMemOnly work
with this would be a rather large project which we don't really have
the resources to tackle right now.

Why am I pushing for IntrWriteOnly?

I suspect the merely IntrWriteOnly without a functional ArgMemOnly
does not provide a lot of opportunity for optimization at the IR level.

I think this is not entirely true. In particular, it gives us a way to
model things like memset and memset_16 by combining InstrWriteOnly and
ArgMemOnly. It's not quite as powerful as having a writeonly argument
attribute, but it's better than where we are today.

However, the codegen backend does understand the resulting hardware
instructions, which are marked mayLoad = 0, mayStore = 1,
hasSideEffects = 0. Having mayLoad = 0 and hasSideEffects = 0 makes a
difference for instruction scheduling.

If you try to map the intrinsic as-is, without IntrWriteOnly, onto
such hardware instructions, TableGen will (correctly) complain about a
mismatch of mayLoad and hasSideEffects.

So I'd like to do a minimal change that will make TableGen happy, but
I do not want that change to be an ugly kludge.

I'm not opposed to introducing an InstrWriteOnly property for
intrinsics. My concern is that you're expecting it to be something it
isn't (per capture discussion above.)

I understand your concern about having this interact nicely with
BasicAA. Perhaps both an IntrWriteOnlyMem and an IntrWriteOnlyArgMem
should be added for orthogonality?

A InstrWriteOnly + ArgMemOnly should equal an InstWriteOnlyArgMem. Thus,
we shouldn't need separate attributes. My original comment was about
introducing a writeonly attribute on the argument of a function. (i.e.
a *particular* attribute) This would allow us to precisely describe
memcpy for instance. (src argument is readonly, dest argument is
writeonly)

Okay, I get it now.

Thanks,
Nicolai

The "side effects" are unmodeled at the IR level, but they *are* modeled at the MI level.

TableGen is conservative: it assumes that for intrinsics without a property that talks about memory use, there are unmodeled side-effects.

Cheers,
Nicolai

Hi,

following up on this discussion, I cleaned things up a bit and explored the direction of an IR attribute (in addition to the LLVM intrinsic property which is for TableGen only).

There are two proposed changes:

1) http://reviews.llvm.org/D18291 -- this only affects TableGen and the AMDGPU backend. It's a small change which I'd like to commit as soon as possible.

2) http://reviews.llvm.org/D18714 -- builds on top of the previous one, adding the LLVM IR attribute ``writeonly`` to go with the LLVM intrinsic property. This change touches more things, but should still be mostly straightforward. The change starts adding the ``writeonly`` attribute -- there are a few obvious next things one could do with it (basically, the TODOs in BasicAliasAnalysis), but I think it makes sense to already commit this as is.

Please review.

Thanks,
Nicolai