r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.
Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.
(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)
Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.
Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.
I wonder if the constant class hierarchy should not be revisited in light of this?
For instance you identified that some are local to a module while others are “context-wide”.
I haven’t given too much thoughts about this, but I'm curious if you did?
r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.
Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.
(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)
Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.
Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.
What's left?
I just filed PR30513 to track remaining work.
1. Avoid the remaining uses of ConstantData use-lists. There are only
a couple of cases left, highlighted in the WIP HACK patches attached
below (0001 and 0002).
2. Remove the use-lists! Replace them with ref-counts to keep most of
the use-list API functional (and minimize the size of the change).
See the WIP patch below (0003).
3. (Optional) Remove use-lists from other non-GlobalValue Constants
that do not reference any GlobalValues. This would require some
sort of magic in, e.g., ConstantVector to conditionally have a
use-list.
I wonder if the constant class hierarchy should not be revisited in light of this?
For instance you identified that some are local to a module while others are “context-wide”.
I haven’t given too much thoughts about this, but I'm curious if you did?
You mean something like PureConstantVector (which cannot transitively reference GlobalValue) vs ConstantVectorWithGlobalRef (which can/must transitively reference GlobalValue), right? (And also for ConstantStruct, ConstantArray, and ConstantExpr, etc.)
I hadn't considered that, and it seems worth thinking about. I'm unsure whether using isa<>() would really be cleaner than using Value::hasUseList; and it would certainly be intrusive. Do you see any concrete benefits?
One possible long-term thing (after #4)... we could add ConstantDataUser (vs. User), which can only reference a Constant-with-no-GlobalValue, and has operands the size of a pointer. Obviously nice to save on operand-size, but I'm not convinced it would save sufficient memory to be worthwhile: IIRC, Instruction accounts for most instances of User.
r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.
Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.
(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)
Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.
Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.
What's left?
I just filed PR30513 to track remaining work.
1. Avoid the remaining uses of ConstantData use-lists. There are only
a couple of cases left, highlighted in the WIP HACK patches attached
below (0001 and 0002).
2. Remove the use-lists! Replace them with ref-counts to keep most of
the use-list API functional (and minimize the size of the change).
See the WIP patch below (0003).
3. (Optional) Remove use-lists from other non-GlobalValue Constants
that do not reference any GlobalValues. This would require some
sort of magic in, e.g., ConstantVector to conditionally have a
use-list. Call sites of API like Value::use_begin would have to
check for Value::hasUseList.
Could you explain why #3 requires checking for hasListList() or e.g
isa<ConstantVectorWithGlobalRef>() before calling use_begin()?
4. (Optional) Remove the ref-count from ConstantData (and, potentially,
other use-list-free Constants). This would eliminate ref-count
traffic, but would also require checking at call sites before using
any use-list-related API.
Feedback
- Does anyone disagree with this general direction? Why?
- Any thoughts on #3?
- Any thoughts on #4?
We should probably improve the use-list-order2.ll test instead of deleting it.
I've attached a patch to PR24755 which might help with this ("Improve use-list
order testing for function operands"). I think the test should fail if its uses
of ConstantData are replaced by e.g personality functions.
r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.
Since then, I've removed most instances of iterating through the
use-lists of an instance of ConstantData. I'd like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.
(The same is not true of Constant, generally. A GlobalValue's use-list
will local to the GlobalValue's Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)
Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output *does not* depend on the use-list
order of something like i32 0.
Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.
What's left?
I just filed PR30513 to track remaining work.
1. Avoid the remaining uses of ConstantData use-lists. There are only
a couple of cases left, highlighted in the WIP HACK patches attached
below (0001 and 0002).
2. Remove the use-lists! Replace them with ref-counts to keep most of
the use-list API functional (and minimize the size of the change).
See the WIP patch below (0003).
3. (Optional) Remove use-lists from other non-GlobalValue Constants
that do not reference any GlobalValues. This would require some
sort of magic in, e.g., ConstantVector to conditionally have a
use-list. Call sites of API like Value::use_begin would have to
check for Value::hasUseList.
Could you explain why #3 requires checking for hasListList() or e.g
isa<ConstantVectorWithGlobalRef>() before calling use_begin()?
Same reason #2 (WIP patch 0003) requires isa<ConstantData> or Value::hasUseList before calling use_begin. Accessing Value::use_begin and Value::use_end means that the call site is trying to walk through the use-list. Since we don't track the use-list for ConstantData, any call site that's inferring something from an empty-looking use-list will have a bug. IIRC, 0003 changes Value::use_begin and Value::use_end to assert that Value::hasUseList; if it doesn't, it should.
4. (Optional) Remove the ref-count from ConstantData (and, potentially,
other use-list-free Constants). This would eliminate ref-count
traffic, but would also require checking at call sites before using
any use-list-related API.
Feedback
- Does anyone disagree with this general direction? Why?
- Any thoughts on #3?
- Any thoughts on #4?
We should probably improve the use-list-order2.ll test instead of deleting it.
I've attached a patch to PR24755 which might help with this ("Improve use-list
order testing for function operands"). I think the test should fail if its uses
of ConstantData are replaced by e.g personality functions.
FWIW, I still would find this interesting, at least for conceptual improvements in the IR.
I’d really like it if we could separate the ideas of true constants (that are inherently foldable and don’t need use lists) and things that transitively reference globals.
Recently I’ve been thinking that the split which might make sense would be to have Constants which have no use-lists and must be manifest (no references to globals, and GlobalExprs which have use lists and can reference Globals (as wall as Constants).
This might in turn allow us to consider better models for things like expanding constantexprs that don’t fit any relocations on the platform.
Personally, I’d probably stop after your #2 unless/until we do something to re-think the class hierarchy here.
r261464 added a type called ConstantData to the Value hierarchy. This
is a parent type for constants with no operands, such as i32 0 and null.
Since then, I’ve removed most instances of iterating through the
use-lists of an instance of ConstantData. I’d like to make this
illegal. Since the users of ConstantData are spread across an
LLVMContext, most code that looks at the users is wrong. Adding an
assertion should catch a lot of bugs (see r263853 and r263875) and
avoid some expensive walks through uninteresting code.
(The same is not true of Constant, generally. A GlobalValue’s use-list
will local to the GlobalValue’s Module. Any ConstantVector,
ConstantArray, or ConstantStruct that points at a GlobalValue will also
be local to the same Module. In these cases, we also need RAUW
support.)
Besides catching bugs, removing use-lists from ConstantData will
guarantee that the compiler output does not depend on the use-list
order of something like i32 0.
Finally, this should dramatically reduce the overhead of serializing
use-list order in bitcode. We will no longer track the arbitrary
order of references to things like i32 0 and null.
What’s left?
I just filed PR30513 to track remaining work.
Avoid the remaining uses of ConstantData use-lists. There are only
a couple of cases left, highlighted in the WIP HACK patches attached
below (0001 and 0002).
Remove the use-lists! Replace them with ref-counts to keep most of
the use-list API functional (and minimize the size of the change).
See the WIP patch below (0003).
(Optional) Remove use-lists from other non-GlobalValue Constants
that do not reference any GlobalValues. This would require some
sort of magic in, e.g., ConstantVector to conditionally have a
use-list.
I wonder if the constant class hierarchy should not be revisited in light of this?
For instance you identified that some are local to a module while others are “context-wide”.
I haven’t given too much thoughts about this, but I’m curious if you did?
You mean something like PureConstantVector (which cannot transitively reference GlobalValue) vs ConstantVectorWithGlobalRef (which can/must transitively reference GlobalValue), right? (And also for ConstantStruct, ConstantArray, and ConstantExpr, etc.)
I hadn’t considered that, and it seems worth thinking about. I’m unsure whether using isa<>() would really be cleaner than using Value::hasUseList; and it would certainly be intrusive. Do you see any concrete benefits?
One possible long-term thing (after #4)… we could add ConstantDataUser (vs. User), which can only reference a Constant-with-no-GlobalValue, and has operands the size of a pointer. Obviously nice to save on operand-size, but I’m not convinced it would save sufficient memory to be worthwhile: IIRC, Instruction accounts for most instances of User.
FWIW, I still would find this interesting, at least for conceptual improvements in the IR.
I’d really like it if we could separate the ideas of true constants (that are inherently foldable and don’t need use lists) and things that transitively reference globals.
Recently I’ve been thinking that the split which might make sense would be to have Constants which have no use-lists and must be manifest (no references to globals, and GlobalExprs which have use lists and can reference Globals (as wall as Constants).
Note this naming somewhat implies GlobalVector/GlobalArray/GlobalStruct. Not sure those are bad/wrong names, but they sound a little like a GlobalObject to me.
Unless… would we actually need all three of GlobalArray + ConstantArray + ConstantDataArray? Or would we only have GlobalArray and ConstantDataArray, because of constant-folding?
This might in turn allow us to consider better models for things like expanding constantexprs that don’t fit any relocations on the platform.
Personally, I’d probably stop after your #2 unless/until we do something to re-think the class hierarchy here.
If you remove the refcount then how can you free ConstantData that is no longer referenced? If you don’t do that, will a long-loved LLVMContext gradually accumulate all 232i32 constants (and all 264i64s and so on)?
We don’t currently have any code for garbage-collecting dead ConstantData besides throwing away the whole LLVMContext, as far as I know. I mean, theoretically you could write code to do it, I guess, but nothing actually does.
Glad to see this revived! I’m not spending much time on LLVM these days, but happy to help where I can. Feel free to continue including me on the reviews and I’ll try to find time to look at things that are stuck.
I’m not sure what your direct motivation is for picking this up—maybe just ruling out a pernicious class of bugs?—but I agree with the suggestion @chandlerc had 9 years ago…
I.e., it would be interesting to split up constants in the Value class hierarchy, distinguishing between “true constants” (compile-time constants, with no direct or indirect reference to a GlobalValue… most of these are ConstantData, but there are also complex aggregates that don’t “fit” ConstantData nicely) and “module constants” (link-time or load-time constants, which directly or indirectly reference a GlobalValue, and thus need to interact with RAUW). I’d be happy to join a discussion about such a change, if that’s where you’re planning to go next (I might even be able to dig up some notes on the last time I was thinking about it, which was a couple of years ago…). (Or, as I’m pretty out of the loop, maybe I’ve already missed any such discussion?)
(BTW, vaguely related, in that this RFC moves true constants closer to an idempotent ideal, I’ve also in the past thought through approaches for making instances of the Type hierarchy idempotent (main thrust is to move type names to modules). In a prior discussion, someone pointed out it would effectively (partially?) revert the LLVM 3 type system rewrite, which is perhaps suspicious, but I think the math has changed now that pointers are opaque and recursive types have disappeared… might be worth a discussion if someone is interested in doing the work…)
Also, if you’re not already planning to look at it, consider checking the overhead of serializing use-list-order after this patch is applied. Maybe we could do it all the time (or more often) in bitcode. And maybe it would be worth considering having it on-by-default when dumping textual IR (could potentially make the IR reader more accepting of use-list order syntax to make such dumps more useful for modification?). When use-list-order serialization was introduced, it was use-lists of constants that made it impractical to turn on everywhere.