Background
Most side-effect-free instructions are currently also supported as constant expressions. In this example, @test1()
uses an instruction, while @test2()
uses an equivalent constant expression:
@g = external global i32
define i64 @test1() {
%addr = ptrtoint @g to i64
ret i64 %addr
}
define i64 @test2() {
ret i64 ptrtoint @g to i64
}
The primary purpose of constant expressions are global variable initializers, which cannot contain instructions. For example, @g.end
here is initialized to a pointer to the end of @g
:
@g = global i32 0
@g.end = global ptr getelementptr (i32, ptr @g, i64 1)
However, while global initializers accept all kinds of expressions on the IR level, only a very narrow set is supported at the MC layer: Initializers must be “relocatable”, which essentially limits expressions to adding an offset to other globals (and possibly computing the difference between two globals).
This RFC proposes to reduce the set of supported constant expressions to those that can actually be used in global variable initializers, and drop support for everything else (such as floating-point operations, vector operations and devision operations).
Motivation for removal
Trapping constant expressions
Because the udiv, sdiv, urem and srem opcodes are supported as constant expressions, it is possible for constant expressions to trigger undefined behavior. This means that such constant expressions or instructions referencing them cannot be speculated.
Consider this example:
@g = extern_weak global i32
define i64 @test(i1 %c) {
entry:
br i1 %c, label %if, label %join
if:
br label %join
join:
%phi = phi i64 [ poison, %if ], [ srem (i64 1, i64 ptrtoint (i32* @g to i64)) , %entry ]
ret i64 %phi
}
Normally, opt -instsimplify
would fold the phi node into the only non-poison value:
@g = extern_weak global i32
define i64 @test(i1 %c) {
entry:
br i1 %c, label %if, label %join
if:
br label %join
join:
ret i64 srem (i64 1, i64 ptrtoint (i32* @g to i64))
}
However, this transformation is illegal, because the srem
(which might divide by zero if @g
is null) is now executed unconditionally, while it was previously executed only on the %entry -> %join
edge.
While this particular instance has been fixed, opt -instcombine
still miscompiles this as of this writing. See also issue #49839 for a long string of related bugs in various passes.
Because these kinds of trapping division expressions appear very rarely in practice, a lot of code is not aware that they require special handling. Fixing one case usually just shifts the problem to another transform.
Infinite combine loops
Constant expressions are the most common source of infinite loops inside InstCombine.
To give an example of how such a loop could occur, consider this function:
define i64 @test(i64 %x) {
%sub = sub i64 %x, ptrtoint (ptr @g to i64)
ret i64 %sub
}
This contains a X - C
pattern, which is usually canonicalized to X + (-C)
, on the assumption that -C
is going to fold. However, if C
is a constant expression, then -C
may not fold, and we get:
define i64 @test(i64 %x) {
%sub = add i64 %x, sub (i64 0, i64 ptrtoint (ptr @g to i64))
ret i64 %sub
}
At this point we have an X + (-Y)
pattern, which is canonicalized to X - Y
, and we end up with the original function. We will then toggle back and forth between these two representations.
This is why InstCombine needs to take special care when handling constants, for example by using m_ImmConstant()
in certain transforms.
Exponential IR representation
Constant expressions with reused operands have small in-memory and bitcode representation, but may have exponentially large textual IR representation.
For example, take the following function:
define i64 @test(i64 %x1) {
%x2 = mul i64 %x1, %x1
%x3 = mul i64 %x2, %x2
%x4 = mul i64 %x3, %x3
%x5 = mul i64 %x4, %x4
%x6 = mul i64 %x5, %x5
ret i64 %x6
}
If we replace %x1
with ptrtoint (ptr @g to i64)
, then the resulting function will look like this (mind the scroll bar!):
define i64 @test() {
ret i64 mul (i64 mul (i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)))), i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))))), i64 mul (i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)))), i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))))))
}
For each additional multiply, the size of the textual IR increases by a factor of two, because sub-expressions have to be repeated on every use.
This is usually not an issue for normal compilation (because it will never produce textual IR), but it is an issue when debugging or producing test cases.
Cost modelling
Both in ad-hoc and TTI-driven cost models, constants are usually considered to be zero-cost. This does not account for the fact that constant expressions might end up expanding to many instructions in the backend.
The example from the previous section would get cost-modelled like a simple return i64 123
, rather than the sequence of five multiplies it really is.
Missing / duplicate folds
Because instructions and constant expressions form distinct hierarchies, folds for one generally do not translate into folds for the other. LLVM’s primary folding pipeline consists of four parts:
- ConstFold: DataLayout-independent constant folding, invoked as part of constant expression construction.
- ConstantFolding: DataLayout-dependent constant folding.
- InstSimplify: Folds that only produce existing instructions or constants.
- InstCombine: General folds that may produce instructions and perform complex transforms.
Some folds in InstCombine/InstSimplify can apply to constant expressions as well, because PatternMatch matchers like m_Add()
handle both instructions and constant expressions. However, the fold still needs to be rooted at an instruction.
As such, folds implemented in InstSimplify/InstCombine will largely not apply to constant expressions. Duplicating them is not considered worthwhile, because complex constant expressions are rare, so they only implement a relatively small number of globals-related folds.
Missing DataLayout and other context
Constant expressions are created inside an LLVM context, but have no knowledge about the DataLayout of the module they will be used in. In fact, the same constant expression might be used in two different modules with different data layouts.
This is the reason behind the two different constant folders, where one is invoked during constant expression construction, and the other in select places, such as InstCombine.
Similarly, some folds depend on whether the null_pointer_is_valid
attribute is present on the function, but constant expressions have no knowledge of the function they are used in.
Another example is that constant folding of floating-point expressions may depend on the value of the "denormal-fp-math"
attribute (âš™ D116952 [ConstantFolding] Respect denormal handling mode attributes when folding instructions). Once again, constant expressions have no knowledge of this. Constant expressions also do not support fast-math-flags.
It’s worth noting that removing constant expressions will not entirely solve this problem by itself, because this is partially an issue of constant folding APIs. However, it would put us in a place where we can address the problem.
Proposal
The proposal is to remove support for all constant expressions that can not be lowered as a global variable initializer.
I am actually not completely sure what the final set of supported expressions will be (and this likely doesn’t have to be decided right away, because there are plenty of expressions that are definitely safe to remove).
Based on the implementations of lowerConstant()
and evaluateAsRelocatable()
, I expect that the final set will be close to the following:
getelementptr
bitcast
addrspacecast
ptrtoint
inttoptr
add
sub
trunc
It’s likely that this can be further reduced, but doing so might require introducing new specialized relocation constants.
For reference, the following additional expressions are supported by lowerConstant()
, but I believe are not supported by evaluateAsRelocatable()
for non-absolute operands:
mul
sdiv
srem
shl
and
or
xor
And finally, these expressions are supported by ConstantExpr
, but not by lowerConstant()
, so definitely cannot occur in initializers:
zext
sext
fptrunc
fpext
fptoui
fptosi
uitofp
sitofp
select
icmp
fcmp
fneg
fadd
fsub
fmul
fdiv
frem
udiv
urem
lshr
ashr
extractelement
insertelement
shufflevector
# These two expressions are not serializable in bitcode
extractvalue
insertvalue
Bitcode auto-upgrade
No longer supported constant expressions in old bitcode will be treated in one of two ways: If the expression is part of a global initializer, this will result in a deserialization error, because such expressions are no longer supported.
If the expression is part of an instruction operand, then it will be expanded into an instruction sequence. This may also require introducing new blocks on phi edges.
A prototype implementation for this is available at âš™ D127729 [Bitcode] Support expanding constant expressions into instructions.
API impact
As an example, let’s consider the removal of the extractvalue
expression. A prototype patch for this is available at ⚙ D125795 [IR] Remove support for extractvalue constant expression. I’m using this one as a sample, because it doesn’t have bitcode support, so is orthogonal to the previous point.
The main impact is that the ConstantExpr::getExtractValue()
method is removed. Usages should be replaced in one of two ways:
-
For usages that don’t care about the fact that the result is a constant,
IRBuilder::CreateExtractValue()
should be used. The IRBuilder accepts a folder (usingConstFolder
by default), which will automatically constant fold the expression if possible, and avoid creating an unnecessary instruction in that case. -
For usages where the result must be a constant,
ConstantFoldExtractValueInstruction()
can be used. It should be emphasized that this will only return something if the extractvalue folds, so a check fornullptr
may be necessary. This is unlikeConstantExpr::getExtractValue()
, which will return a constant expression if it does not fold.
Finally, the C API function LLVMConstExtractValue()
is removed. LLVMBuildExtractValue()
should be used instead.
While the above examples use extractvalue
, the basic general pattern should also hold for other expressions, though the number of usage-sites to update may vary. There is additional cleanup that can be performed once an expression is removed (e.g., PatternMatch no longer has to recognize the constant expression form), but those changes are not on the critical path.
Disadvantages
I believe the primary disadvantage of this change is a certain amount of API churn. Based on git grep
we have ~1400 uses of ConstantExpr::getXYZ()
, though I believe less than 900 of these would be affected, because the others use expression types that will be retained for now.
There’s also some risk of regressions in code that performs symbolic evaluation using constant folding APIs only (rather than InstSimplify for example), because unfolded expressions cannot be represented anymore. I doubt this will matter in practice though, and will be offset by increased folding elsewhere.