[RFC] Remove most constant expressions

Background

Most side-effect-free instructions are currently also supported as constant expressions. In this example, @test1() uses an instruction, while @test2() uses an equivalent constant expression:

@g = external global i32

define i64 @test1() {
  %addr = ptrtoint ptr @g to i64
  ret i64 %addr
}

define i64 @test2() {
  ret i64 ptrtoint (ptr @g to i64)
}

The primary purpose of constant expressions is to support global variable initializers, which cannot contain instructions. For example, @g.end here is initialized to a pointer to the end of @g:

@g = global i32 0
@g.end = global ptr getelementptr (i32, ptr @g, i64 1)

However, while global initializers accept all kinds of expressions at the IR level, only a very narrow set is supported at the MC layer: initializers must be “relocatable”, which essentially limits them to adding an offset to the address of another global (and possibly computing the difference between two global addresses).
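
As an illustrative sketch (my own example, using hypothetical globals @a, @a.plus8 and @a.scaled): the first initializer below is relocatable, because it is just a symbol address plus a constant offset, while the second is valid IR but, on common targets, has no corresponding relocation:

@a = external global i32

; Relocatable: the address of @a plus a constant offset of 8 bytes.
@a.plus8 = global i64 add (i64 ptrtoint (ptr @a to i64), i64 8)

; Valid IR, but multiplying a symbol address is not expressible as a
; relocation on common targets, so this cannot be lowered at the MC layer.
@a.scaled = global i64 mul (i64 ptrtoint (ptr @a to i64), i64 3)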

This RFC proposes to reduce the set of supported constant expressions to those that can actually be used in global variable initializers, and to drop support for everything else (such as floating-point operations, vector operations, and division operations).

Motivation for removal

Trapping constant expressions

Because the udiv, sdiv, urem and srem opcodes are supported as constant expressions, it is possible for constant expressions to trigger undefined behavior. This means that such constant expressions or instructions referencing them cannot be speculated.

Consider this example:

@g = extern_weak global i32

define i64 @test(i1 %c) {
entry:
  br i1 %c, label %if, label %join

if:
  br label %join

join:
  %phi = phi i64 [ poison, %if ], [ srem (i64 1, i64 ptrtoint (ptr @g to i64)), %entry ]
  ret i64 %phi
}

Normally, opt -instsimplify would fold the phi node into the only non-poison value:

@g = extern_weak global i32

define i64 @test(i1 %c) {
entry:
  br i1 %c, label %if, label %join

if:
  br label %join

join:
  ret i64 srem (i64 1, i64 ptrtoint (ptr @g to i64))
}

However, this transformation is illegal, because the srem (which might divide by zero if @g is null) is now executed unconditionally, while it was previously executed only on the %entry -> %join edge.

While this particular instance has been fixed, opt -instcombine still miscompiles this as of this writing. See also issue #49839 for a long string of related bugs in various passes.

Because these kinds of trapping division expressions appear very rarely in practice, a lot of code is not aware that they require special handling. Fixing one case usually just shifts the problem to another transform.

Infinite combine loops

Constant expressions are the most common source of infinite loops inside InstCombine.

To give an example of how such a loop could occur, consider this function:

define i64 @test(i64 %x) {
  %sub = sub i64 %x, ptrtoint (ptr @g to i64)
  ret i64 %sub
}

This contains an X - C pattern, which is usually canonicalized to X + (-C), on the assumption that -C is going to fold. However, if C is a constant expression, then -C may not fold, and we get:

define i64 @test(i64 %x) {
  %sub = add i64 %x, sub (i64 0, i64 ptrtoint (ptr @g to i64))
  ret i64 %sub
}

At this point we have an X + (-Y) pattern, which is canonicalized to X - Y, and we end up with the original function. We will then toggle back and forth between these two representations.

This is why InstCombine needs to take special care when handling constants, for example by using m_ImmConstant() in certain transforms.

Exponential IR representation

Constant expressions with reused operands have a small in-memory and bitcode representation, but may have an exponentially large textual IR representation.

For example, take the following function:

define i64 @test(i64 %x1) {
  %x2 = mul i64 %x1, %x1
  %x3 = mul i64 %x2, %x2
  %x4 = mul i64 %x3, %x3
  %x5 = mul i64 %x4, %x4
  %x6 = mul i64 %x5, %x5
  ret i64 %x6
}

If we replace %x1 with ptrtoint (ptr @g to i64), then the resulting function will look like this (mind the scroll bar!):

define i64 @test() {
  ret i64 mul (i64 mul (i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)))), i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))))), i64 mul (i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)))), i64 mul (i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))), i64 mul (i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64)), i64 mul (i64 ptrtoint (ptr @g to i64), i64 ptrtoint (ptr @g to i64))))))
}

For each additional multiply, the size of the textual IR increases by a factor of two, because sub-expressions have to be repeated on every use.

This is usually not an issue for normal compilation (because it will never produce textual IR), but it is an issue when debugging or producing test cases.

Cost modelling

In both ad-hoc and TTI-driven cost models, constants are usually considered zero-cost. This does not account for the fact that constant expressions might end up expanding to many instructions in the backend.

The example from the previous section would get cost-modelled like a simple ret i64 123, rather than the sequence of five multiplies it really is.

Missing / duplicate folds

Because instructions and constant expressions form distinct hierarchies, folds for one generally do not translate into folds for the other. LLVM’s primary folding pipeline consists of four parts:

  1. ConstFold: DataLayout-independent constant folding, invoked as part of constant expression construction.
  2. ConstantFolding: DataLayout-dependent constant folding.
  3. InstSimplify: Folds that only produce existing instructions or constants.
  4. InstCombine: General folds that may produce instructions and perform complex transforms.

Some folds in InstCombine/InstSimplify can apply to constant expressions as well, because PatternMatch matchers like m_Add() handle both instructions and constant expressions. However, the fold still needs to be rooted at an instruction.

As such, folds implemented in InstSimplify/InstCombine largely do not apply to constant expressions. Duplicating them in the constant folders is not considered worthwhile, because complex constant expressions are rare in practice, so the constant folders only implement a relatively small number of globals-related folds.

Missing DataLayout and other context

Constant expressions are created inside an LLVM context, but have no knowledge about the DataLayout of the module they will be used in. In fact, the same constant expression might be used in two different modules with different data layouts.

This is the reason for having two different constant folders: one is invoked during constant expression construction (and thus cannot make use of the DataLayout), while the other is invoked in select places, such as InstCombine.

Similarly, some folds depend on whether the null_pointer_is_valid attribute is present on the function, but constant expressions have no knowledge of the function they are used in.
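
As an illustrative sketch of why this matters (a hypothetical example, not taken from the patch discussion): whether the comparison below may be folded to false depends on whether the containing function carries the null_pointer_is_valid attribute, but the icmp constant expression itself has no way of knowing which function it will end up in:

@g = global i32 0

define i1 @test() null_pointer_is_valid {
  ; With null_pointer_is_valid, address zero may be a valid object address,
  ; so folding this comparison to false would not be safe here.
  ret i1 icmp eq (ptr @g, ptr null)
}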

Another example is that constant folding of floating-point expressions may depend on the value of the "denormal-fp-math" attribute (⚙ D116952 [ConstantFolding] Respect denormal handling mode attributes when folding instructions). Once again, constant expressions have no knowledge of this. Constant expressions also do not support fast-math-flags.

It’s worth noting that removing constant expressions will not entirely solve this problem by itself, because this is partially an issue of constant folding APIs. However, it would put us in a place where we can address the problem.

Proposal

The proposal is to remove support for all constant expressions that cannot be lowered as a global variable initializer.

I am actually not completely sure what the final set of supported expressions will be (and this likely doesn’t have to be decided right away, because there are plenty of expressions that are definitely safe to remove).

Based on the implementations of lowerConstant() and evaluateAsRelocatable(), I expect that the final set will be close to the following:

getelementptr
bitcast
addrspacecast
ptrtoint
inttoptr
add
sub
trunc
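
As an illustrative sketch (my own example of why trunc appears on this list): a common relocatable pattern combining several of these expressions is a 32-bit relative reference, i.e. the offset of one global relative to another:

@target = external global i32
@base = external global i32

; 32-bit offset of @target relative to @base: ptrtoint, sub and trunc combined.
@rel = global i32 trunc (i64 sub (i64 ptrtoint (ptr @target to i64), i64 ptrtoint (ptr @base to i64)) to i32)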

It’s likely that this can be further reduced, but doing so might require introducing new specialized relocation constants.

For reference, the following additional expressions are supported by lowerConstant(), but I believe are not supported by evaluateAsRelocatable() for non-absolute operands:

mul
sdiv
srem
shl
and
or
xor

And finally, these expressions are supported by ConstantExpr, but not by lowerConstant(), so definitely cannot occur in initializers:

zext
sext
fptrunc
fpext
fptoui
fptosi
uitofp
sitofp
select
icmp
fcmp
fneg
fadd
fsub
fmul
fdiv
frem
udiv
urem
lshr
ashr
extractelement
insertelement
shufflevector
# These two expressions are not serializable in bitcode
extractvalue
insertvalue

Bitcode auto-upgrade

Constant expressions in old bitcode that are no longer supported will be handled in one of two ways: if the expression is part of a global initializer, deserialization will fail with an error, because such initializers can no longer be represented.

If the expression is part of an instruction operand, then it will be expanded into an instruction sequence. This may also require introducing new blocks on phi edges.
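
As a sketch of what such an expansion might look like (an illustrative example, not output of the actual patch), a function from old bitcode such as

@g = external global i32

define i64 @old() {
  ret i64 udiv (i64 1, i64 ptrtoint (ptr @g to i64))
}

would be loaded as an equivalent function in which the expression has been materialized as instructions:

define i64 @old() {
  %addr = ptrtoint ptr @g to i64
  %div = udiv i64 1, %addr
  ret i64 %div
}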

A prototype implementation for this is available at ⚙ D127729 [Bitcode] Support expanding constant expressions into instructions.

API impact

As an example, let’s consider the removal of the extractvalue expression. A prototype patch for this is available at ⚙ D125795 [IR] Remove support for extractvalue constant expression. I’m using this one as a sample because extractvalue constant expressions cannot be serialized to bitcode anyway, so the removal is orthogonal to the previous point.

The main impact is that the ConstantExpr::getExtractValue() method is removed. Usages should be replaced in one of two ways:

  1. For usages that don’t care about the fact that the result is a constant, IRBuilder::CreateExtractValue() should be used. The IRBuilder accepts a folder (ConstantFolder by default), which will automatically constant fold the expression if possible, and avoid creating an unnecessary instruction in that case.

  2. For usages where the result must be a constant, ConstantFoldExtractValueInstruction() can be used. It should be emphasized that this will only return something if the extractvalue folds, so a check for nullptr may be necessary. This is unlike ConstantExpr::getExtractValue(), which will return a constant expression if it does not fold.

Finally, the C API function LLVMConstExtractValue() is removed. LLVMBuildExtractValue() should be used instead.

While the above examples use extractvalue, the same general pattern should hold for other expressions, though the number of use sites to update may vary. There is additional cleanup that can be performed once an expression is removed (e.g., PatternMatch no longer has to recognize the constant expression form), but those changes are not on the critical path.

Disadvantages

I believe the primary disadvantage of this change is a certain amount of API churn. Based on git grep, we have ~1400 uses of ConstantExpr::getXYZ(), though I believe fewer than 900 of these would be affected, because the others use expression types that will be retained for now.

There’s also some risk of regressions in code that performs symbolic evaluation using constant folding APIs only (rather than InstSimplify, for example), because unfolded expressions can no longer be represented. I doubt this will matter in practice though, and it will be offset by increased folding elsewhere.


I think this is going in the right direction, in terms of the infrastructure required to change our representation of constant expressions.

I think the proposal should try to explain where we eventually want to end up; continuing to use ConstantExpr with a limited set of opcodes doesn’t feel like an endpoint. It still has many of the same issues described in the “background”, just with less impact because constant expressions show up less often.


I think this is great! Thanks for the research effort and for putting forward this proposal!

It would be awesome if constant exprs could not trap, but from your list, you are still unsure about sdiv & srem. Nevertheless, it seems your proposal has enough value even if canTrap() can’t go away.

P.S.: Not sure you have seen this function: LLVM: llvm::ConstantExpr Class Reference
Your bitcode migration patch reimplements part of it. Not sure you can fully reuse it as you don’t want (?) to convert all of a constant expr to instructions, maybe only partially, but just wanted to make sure you knew about that function.

I’m not entirely sure what the endpoint here is going to be. I do think that supporting a limited set of constant expressions is a viable outcome – you are right that this won’t resolve all issues, but it will resolve some (e.g. trapping constants) and greatly mitigate others (e.g. because we drop all expressions operating on floats and aggregates).

For another possible outcome, I’ll refer to Chris Lattner’s suggestion in https://discourse.llvm.org/t/design-issues-in-llvm-ir/58436.

With this approach, we’d replace remaining constant expressions with different Constant kinds that more directly represent relocations supported for global initializers. These would effectively be a new type of “constant expression” that does not mirror any IR instruction.

A disadvantage of this is that we can no longer handle a getelementptr instruction and a RelocatableConstant the same way, even though they are basically the same thing.

Maybe the ideal outcome here would be to only retain the getelementptr constant expression together with what Chris calls the MachoRelocatableConstant (so we don’t need to represent this using ptrtoint and sub).

Ultimately, I think that a lot of the ground work is going to be the same regardless of what exactly the endpoint looks like, which is why my plan was to start removing the parts that we can remove without replacement.

Probably a good way to make sure these can really be removed is to adjust lowerConstant() to reject these already (rather than waiting for evaluateAsRelocatable()) and check whether that breaks anything. I’ll give that a try.

Yeah, I’m aware of this function. However, the bitcode reader doesn’t expand ConstantExpr, but rather an internal representation of the constant expression (called BitcodeConstant in the patch) – after all, once the constant expression is no longer supported, it will no longer be possible to construct such a ConstantExpr for expansion purposes either. So I believe we do need to repeat this code.

Strong +1 on the direction here. I think reducing our usage of constantexpr even if we don’t get all the way to removing it entirely is a very good idea.

One thing we could consider, in addition to trying to remove certain node types entirely, would be to disallow constant expressions rooted anywhere other than global initializers. Just because we can express something as a constant expression doesn’t mean we need to.

This seems like a great direction!

However, I’d like to mention another potential downside:

Currently, if new kinds of relocations are defined, we can support them with no higher level syntax changes. If we cut this down, we’ll need to add back some new form of constant expression to the IR every time something new is supported by a target.

For example, RISC-V theoretically supports any arbitrary combination of add/sub symbol operands as linker input (“theoretically”, because neither the LLVM nor GNU assembler actually handles it). You can perfectly well represent A + B + C + D in an object file – and if they are all link-time constants, create a binary from it too. Should we also preserve support for that kind of expression in the IR?

Of course, we don’t actually support it today, despite it being representable, so maybe we shouldn’t worry too much about that sort of thing…

A potential solution to that would be to introduce a call constant expression. It would call an arbitrary LLVM function (but not an external one), and it would then be up to the backend to support whatever subset it wished.
That way, ConstantExpr could be deleted altogether (or rather, reduced to just a call).

This solution sounds too simple; there’s probably a catch that I’m missing. At the very least, we would need to make sure no harmful optimizations are run on such functions, such as loop rerolling, function outlining, and probably others.

I’ve put up ⚙ D127972 [AsmPrinter] Further restrict expressions supported in global initializers to limit the expressions accepted by lowerConstant() – it looks like we can indeed drop all the binary expressions apart from add/sub without issue (and in particular, we can drop the div/rem expressions).

I think the main complication with doing this would be the need to expand constant expressions to instructions whenever an initializer ends up in normal function IR. For example, it would not be possible to simply RAUW a load with the result of ConstantFoldLoad(); one would potentially have to replace it with an expanded instruction sequence instead. Though the number of places where this is done is probably fairly limited.

We currently also use inttoptr -1 as a replacement for dead blockaddress references, where we do need the ability to do a simple RAUW, but we could also introduce a dedicated Constant for that.

Under the current proposal, this would still be supported (as both ptrtoint and add are retained). But yes, if we restrict things further, then we would be hardcoding currently supported relocation types and need to introduce new constants if a new relocation type shows up.
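
For instance (an illustrative sketch of my own), a sum of three symbol addresses remains representable with the retained expressions; whether a given target can actually lower it to relocations is a separate question, as discussed above:

@a = external global i8
@b = external global i8
@c = external global i8

; A + B + C, built only from ptrtoint and add.
@sum = global i64 add (i64 add (i64 ptrtoint (ptr @a to i64), i64 ptrtoint (ptr @b to i64)), i64 ptrtoint (ptr @c to i64))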

I’m not sure how a call constant expression would help – that seems to go in the opposite direction of this proposal, by removing all limits on constant expressions. For example, we’d be back to the problem where the called function can trap. It will also make it hard to include such expressions in optimizations.

Maybe worth noting that LLVM already has an llvm.global_ctors mechanism for init_array functions that can perform arbitrarily complex global initialization, but this is orthogonal to the constant expression case, where we are only interested in initializations that are directly supported by relocations, and do not require running a startup function.

I am very +1 on moving forward with improving this area. Nikita’s quote of my quote above still captures roughly my opinion here. I think that global variable initializers are a “different thing” than LLVM IR operations, and the work on LLVM IR for type-less pointers etc. is further diluting the ability of LLVM IR to support structural analysis, so I think it is fine if they continue to diverge.

Regarding LLVM IR support for structural analysis, Arthur & I tried to backfill some of the reasoning for why LLVM had pointee types in the first place, what they were for, and how they did or didn’t work out in https://reviews.llvm.org/D126309, which should be live at the main documentation. PTAL if you have additional commentary. Right now a lot of LLVM users are asking themselves the question, “why is this disruptive migration happening”, and I want them to find the true story and reasoning.

As for my opinion on the topic at hand, reducing the generality of ConstantExpr seems like a good step in the direction of creating a dedicated global initializer or relocatable expression representation. Let me direct your attention to getImageRelativeConstant in Clang. It creates a pretty gnarly pointer difference constant expression, and we need to handle that.

I’ll give you the answer, but it isn’t a good answer. I support the move to opaque pointers.

The background on this has to do with where LLVM came from: Vikram and I (and his research group) were working on heroic optimizations for data structure optimization of C/C++ programs, e.g. automatic pool allocation and pointer compression, and aggressive field sensitive pointer analysis.

As part of this, you need to do field sensitive analysis of the IR and need to decide when things are too unstructured to bother. As such, it made sense to build this info into the IR. This was all great research and I think there are still a bunch of good ideas there that could be explored further and have more impact. That said, while it was an impressive tour-de-force, LLVM is really the wrong level of abstraction to do this sort of thing. I’ve also learned from bitter experience that heroic compiler optimizations are not great for users of the compiler, so if someone were to build these sorts of tools, I think it makes more sense to be in the context of a static analysis helper tool than an automatic optimization.

There was also an idea that GEP would be useful for dependence analysis of Fortran-style loops etc., which never seriously came to pass because LLVM is too low level for interesting Fortran-style loop transformations (something that has long been a problem in the LLVM community). There are also some SPEC hacks in GlobalOpt that this sort of info made convenient; they are similar heroic optimizations.

In short, my current opinion is that LLVM should be true to what it is good at. It is an abstraction for CPUs with vectors, and it supports a capable suite of mid-level optimizations and codegen. The ConstantExpr representation does not need to reflect the SSA IR; that was a mistake - as @nikic mentions, the important things to model (beyond simple int/float/pointer/vector constants) are relocatable constants, constant arrays for global initializers, etc.

I think it would also be interesting to consider making the initializers of GlobalVariable not be a Value* at all; it could be its own separate class. This would mean Constant* would only need to cover the simple constants that occur in function bodies.

-Chris


Slowly moving away from constants is a good thing.

We could take a leaf out of MLIR’s book. In MLIR, global initializers can be single-block regions. Translated to LLVM, you’d have something like

@array = global [8 x i32] zeroinitializer
@var = global ptr initializer {
  %p = getelementptr [8 x i32], ptr @array, i32 0, i32 4
  ret ptr %p
}

The GEP in there would not be a ConstantExpr but a normal GEPInst. Presumably, the verifier would restrict the code patterns in these blocks tightly (i.e. short whitelist of instructions).

Of course, this depends on where we want to see LLVM going overall. Personally, I’d like to see the MLIR and LLVM class hierarchies move closer together in terms of capabilities, because I think the strict split between the two hurts our capabilities, and in particular what we can do in GPU compiler stacks. If we choose to step cautiously in the direction of bringing the class hierarchies closer together, then the above is a fairly natural representation of global variable initializers and avoids the problem of a special-purpose operator just for relocation initializers.

Can you elaborate on why reusing operations in a limited form (forming a mini dialect) is better than modeling the domain problem directly? LLVM IR already does this with the constant* grammar and it is a mess.

If MLIR tells us anything, it is that artificial reuse of superficially similar things is not needed - it is better to create abstractions that directly model the domain in question. This makes reasoning about invariants (e.g. verification) possible and makes transformations more obviously correct.

One of the major problems we have today with the constant grammar is that most of it isn’t valid in global variable initializers (e.g. you can’t divide the addresses of two globals cast to integers), because global initializers can only express what their target-dependent relocation system can handle. Lying about this leads to a bunch of problems and complexity. I think LLVM should be honest about what the targets can do, rather than mislead front end authors about it.

-Chris

I see two advantages, both of them around simplicity and uniformity of representation.

The first is that if we introduce a dedicated RelocationConstant, it is going to “leak” out of global variable initializers into code and we will eventually have to teach all pattern matching, alias analysis, etc. about the equivalence between RelocationConstant and GEPs. By contrast, if the GEP is part of the global variable initializer, then any “leakage” of initializers will just result in GEPs in the code, which we know how to deal with.

The second is that I see a less complex C++ class hierarchy as a good thing. Constants are part of the Value hierarchy, but they are special in that they are owned by the LLVMContext instead of the Module or Function. This is a major blocker for multi-threading because of def-use lists.

The Constant sub-hierarchy is also a constant source of confusion for newcomers. Judging by the history of mailing list (and Discourse) postings, it seems to happen with some regularity that folks get tripped up over not being able to RAUW on constants, for example.

To be honest, I’d prefer it if the entire Constant sub-hierarchy went away in favor of constant materialization instructions.

There are many useful lessons in MLIR indeed. I’d say that not having a Constant hierarchy at all is one of them 🙂

Another lesson is the power of separating the IR (set of instructions/operations and their semantics) from the C++ substrate (the Value, (Basic)Block, etc. classes) that is used to represent that IR. In MLIR, abstractions are freely created as you say, but importantly they are all created on top of a rather regular and highly re-used C++ substrate.

In summary, my main thrust is actually the longer-term view that we should eliminate the Constant hierarchy. If you start from there, you then get to the question of how to represent global variable initializers. And if you then ask yourself how to do that without introducing a new C++ class hierarchy from scratch, then what LLVM-IR-on-MLIR does falls out fairly naturally. Though I’d be happy to read other ideas 🙂

I think there’s a general consensus in favor of moving away from constant expressions as much as possible. The question of what to do with remaining expressions is still open, with some of the possibilities being:

  • Retain a small set of constant expressions in current form.
  • Introduce special constants like RelocatableConstant that can only represent actually supported expressions.
  • Possibly in conjunction with the previous point, limit the use of some Constants to initializers and don’t allow using them in function IR. This would require expansion to instructions when using an initializer in function IR.
  • Represent initializers as special functions containing normal instructions. These could be cloned when using an initializer in function IR.

Unfortunately, I don’t think any one of these is strictly superior; they each introduce complications in different places.

@nhaehnle In addition to what Chris said, one thing I would be concerned about with the “initialization function” approach is that it would make constant folding of loads from aggregates hard and/or very inefficient. If you currently have a constant struct/array with some constant expression elements, then it’s easy to fold a load of one of them. If you have an initialization function that builds up the struct elements one by one (using insertvalue), one wouldn’t be able to efficiently determine the contents of one element.
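
To make this concrete, here is a sketch that contrasts a structural initializer with the hypothetical function-style initializer syntax from earlier in the thread (the “initializer { ... }” form is not real LLVM IR):

@g = external global i32

; Structural initializer: the value of element 1 can be read off directly.
@s = global { i64, i64 } { i64 0, i64 ptrtoint (ptr @g to i64) }

; Function-style initializer: recovering element 1 requires tracing through
; the insertvalue chain.
@s2 = global { i64, i64 } initializer {
  %addr = ptrtoint ptr @g to i64
  %v0 = insertvalue { i64, i64 } poison, i64 0, 0
  %v1 = insertvalue { i64, i64 } %v0, i64 %addr, 1
  ret { i64, i64 } %v1
}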

As a side note, I would like if (long term) globals moved away from being represented structurally at all, towards being a “bag of bytes” basically, where you just have certain values or relocatable expressions stored at certain offsets, and nothing more. (With opaque pointers, we mostly already treat globals this way, but there are annoying holdovers. For example, global_ctor evaluation is limited in what ctors it can evaluate, because it tries to hold on to the structure of the global.)


In any case, I’m wondering who would be interested in reviewing patches for the constant expression removal? This is my current patch stack:

I was thinking we’d introduce make{struct,array,vector} instructions as the moral equivalent of Constant{Struct,Array,Vector}. That said, regarding the “bag of bytes” idea:

I do like this idea as well. It sounds compatible with getting rid of the Constant hierarchy entirely and the representation of initializers becomes simpler overall.

I took the liberty of linking those together on Phabricator; I hope you don’t mind.

I believe introducing a special dialect like RelocatableConstant is a great idea.