RFC: ConstantData should not have use-lists

r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.

Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.

(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)

Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.

Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.

What's left?

0001-WIP-HACK-SimplifyLibCalls-Disable-optimizeSinCosPi-o.patch (1.84 KB)

0002-WIP-HACK-LICM-Ignore-stores-to-UndefValue-and-Consta.patch (2.18 KB)

0003-WIP-IR-Remove-use-lists-from-ConstantData.patch (17.8 KB)

I wonder if the constant class hierarchy should not be revisited in light of this?
For instance you identified that some are local to a module while others are “context-wide”.

I haven’t given too much thoughts about this, but I'm curious if you did?

r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.

Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.

(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)

Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.

Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.

What's left?

I just filed PR30513 to track remaining work.

1. Avoid the remaining uses of ConstantData use-lists. There are only
  a couple of cases left, highlighted in the WIP HACK patches attached
  below (0001 and 0002).

2. Remove the use-lists! Replace them with ref-counts to keep most of
  the use-list API functional (and minimize the size of the change).
  See the WIP patch below (0003).

3. (Optional) Remove use-lists from other non-GlobalValue Constants
  that do not reference any GlobalValues. This would require some
  sort of magic in, e.g., ConstantVector to conditionally have a
  use-list.

I wonder if the constant class hierarchy should not be revisited in light of this?
For instance you identified that some are local to a module while others are “context-wide”.

I haven’t given too much thoughts about this, but I'm curious if you did?

You mean something like PureConstantVector (which cannot transitively reference GlobalValue) vs ConstantVectorWithGlobalRef (which can/must transitively reference GlobalValue), right? (And also for ConstantStruct, ConstantArray, and ConstantExpr, etc.)

I hadn't considered that, and it seems worth thinking about. I'm unsure whether using isa<>() would really be cleaner than using Value::hasUseList; and it would certainly be intrusive. Do you see any concrete benefits?

One possible long-term thing (after #4)... we could add ConstantDataUser (vs. User), which can only reference a Constant-with-no-GlobalValue, and has operands the size of a pointer. Obviously nice to save on operand-size, but I'm not convinced it would save sufficient memory to be worthwhile: IIRC, Instruction accounts for most instances of User.

r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.

Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.

(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)

Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.

Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.

What's left?

I just filed PR30513 to track remaining work.

1. Avoid the remaining uses of ConstantData use-lists. There are only
   a couple of cases left, highlighted in the WIP HACK patches attached
   below (0001 and 0002).

2. Remove the use-lists! Replace them with ref-counts to keep most of
   the use-list API functional (and minimize the size of the change).
   See the WIP patch below (0003).

3. (Optional) Remove use-lists from other non-GlobalValue Constants
   that do not reference any GlobalValues. This would require some
   sort of magic in, e.g., ConstantVector to conditionally have a
   use-list. Call sites of API like Value::use_begin would have to
   check for Value::hasUseList.

Could you explain why #3 requires checking for hasListList() or e.g
isa<ConstantVectorWithGlobalRef>() before calling use_begin()?

4. (Optional) Remove the ref-count from ConstantData (and, potentially,
   other use-list-free Constants). This would eliminate ref-count
   traffic, but would also require checking at call sites before using
   any use-list-related API.

Feedback

- Does anyone disagree with this general direction? Why?
- Any thoughts on #3?
- Any thoughts on #4?

<0001-WIP-HACK-SimplifyLibCalls-Disable-optimizeSinCosPi-o.patch>
<0002-WIP-HACK-LICM-Ignore-stores-to-UndefValue-and-Consta.patch>
<0003-WIP-IR-Remove-use-lists-from-ConstantData.patch>

We should probably improve the use-list-order2.ll test instead of deleting it.
I've attached a patch to PR24755 which might help with this ("Improve use-list
order testing for function operands"). I think the test should fail if its uses
of ConstantData are replaced by e.g personality functions.

vedant

r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.

Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.

(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)

Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.

Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.

What's left?

I just filed PR30513 to track remaining work.

1. Avoid the remaining uses of ConstantData use-lists. There are only
  a couple of cases left, highlighted in the WIP HACK patches attached
  below (0001 and 0002).

2. Remove the use-lists! Replace them with ref-counts to keep most of
  the use-list API functional (and minimize the size of the change).
  See the WIP patch below (0003).

3. (Optional) Remove use-lists from other non-GlobalValue Constants
  that do not reference any GlobalValues. This would require some
  sort of magic in, e.g., ConstantVector to conditionally have a
  use-list. Call sites of API like Value::use_begin would have to
  check for Value::hasUseList.

Could you explain why #3 requires checking for hasListList() or e.g
isa<ConstantVectorWithGlobalRef>() before calling use_begin()?

Same reason #2 (WIP patch 0003) requires isa<ConstantData> or Value::hasUseList before calling use_begin. Accessing Value::use_begin and Value::use_end means that the call site is trying to walk through the use-list. Since we don't track the use-list for ConstantData, any call site that's inferring something from an empty-looking use-list will have a bug. IIRC, 0003 changes Value::use_begin and Value::use_end to assert that Value::hasUseList; if it doesn't, it should.

4. (Optional) Remove the ref-count from ConstantData (and, potentially,
  other use-list-free Constants). This would eliminate ref-count
  traffic, but would also require checking at call sites before using
  any use-list-related API.

Feedback

- Does anyone disagree with this general direction? Why?
- Any thoughts on #3?
- Any thoughts on #4?

<0001-WIP-HACK-SimplifyLibCalls-Disable-optimizeSinCosPi-o.patch>
<0002-WIP-HACK-LICM-Ignore-stores-to-UndefValue-and-Consta.patch>
<0003-WIP-IR-Remove-use-lists-from-ConstantData.patch>

We should probably improve the use-list-order2.ll test instead of deleting it.
I've attached a patch to PR24755 which might help with this ("Improve use-list
order testing for function operands"). I think the test should fail if its uses
of ConstantData are replaced by e.g personality functions.

Agreed. Thanks for the patch.

FWIW, I still would find this interesting, at least for conceptual improvements in the IR.

I’d really like it if we could separate the ideas of true constants (that are inherently foldable and don’t need use lists) and things that transitively reference globals.

Recently I’ve been thinking that the split which might make sense would be to have Constants which have no use-lists and must be manifest (no references to globals, and GlobalExprs which have use lists and can reference Globals (as wall as Constants).

This might in turn allow us to consider better models for things like expanding constantexprs that don’t fit any relocations on the platform.

Personally, I’d probably stop after your #2 unless/until we do something to re-think the class hierarchy here.

-Chandler

r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.

Since then, I’ve removed most instances of iterating through the
use-lists of an instance of ConstantData. I’d like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.

(The same is not true of Constant, generally. A GlobalValue’s use-list
will local to the GlobalValue’s Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)

Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output does not depend on the use-list
order of something like i32 0.

Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.

What’s left?

I just filed PR30513 to track remaining work.

  1. Avoid the remaining uses of ConstantData use-lists. There are only
    a couple of cases left, highlighted in the WIP HACK patches attached
    below (0001 and 0002).

  2. Remove the use-lists! Replace them with ref-counts to keep most of
    the use-list API functional (and minimize the size of the change).
    See the WIP patch below (0003).

  3. (Optional) Remove use-lists from other non-GlobalValue Constants
    that do not reference any GlobalValues. This would require some
    sort of magic in, e.g., ConstantVector to conditionally have a
    use-list.

I wonder if the constant class hierarchy should not be revisited in light of this?
For instance you identified that some are local to a module while others are “context-wide”.

I haven’t given too much thoughts about this, but I’m curious if you did?

You mean something like PureConstantVector (which cannot transitively reference GlobalValue) vs ConstantVectorWithGlobalRef (which can/must transitively reference GlobalValue), right? (And also for ConstantStruct, ConstantArray, and ConstantExpr, etc.)

I hadn’t considered that, and it seems worth thinking about. I’m unsure whether using isa<>() would really be cleaner than using Value::hasUseList; and it would certainly be intrusive. Do you see any concrete benefits?

One possible long-term thing (after #4)… we could add ConstantDataUser (vs. User), which can only reference a Constant-with-no-GlobalValue, and has operands the size of a pointer. Obviously nice to save on operand-size, but I’m not convinced it would save sufficient memory to be worthwhile: IIRC, Instruction accounts for most instances of User.

FWIW, I still would find this interesting, at least for conceptual improvements in the IR.

I’d really like it if we could separate the ideas of true constants (that are inherently foldable and don’t need use lists) and things that transitively reference globals.

Recently I’ve been thinking that the split which might make sense would be to have Constants which have no use-lists and must be manifest (no references to globals, and GlobalExprs which have use lists and can reference Globals (as wall as Constants).

Note this naming somewhat implies GlobalVector/GlobalArray/GlobalStruct. Not sure those are bad/wrong names, but they sound a little like a GlobalObject to me.

Unless… would we actually need all three of GlobalArray + ConstantArray + ConstantDataArray? Or would we only have GlobalArray and ConstantDataArray, because of constant-folding?

This might in turn allow us to consider better models for things like expanding constantexprs that don’t fit any relocations on the platform.

Personally, I’d probably stop after your #2 unless/until we do something to re-think the class hierarchy here.

Yup, makes sense to me. Thanks for the input!