Here is my understanding and assumption about the placement of static allocas:
“All static allocas should appear in the entry basic block before any function call for better optimization opportunities. If static allocas are interleaved with function calls, such IR is considered broken, even though the IR is valid from a correctness perspective. And if any pass does not adhere to the requirement that all static allocas be placed in the entry block before any function call, then such a pass is considered broken, since it may lead to surprising results in general.”
Let me know if my above understanding is correct or not.
IIRC you can interleave debug intrinsics (e.g. llvm.dbg.declare) with alloca instructions (at least the verifier doesn’t complain). Not sure if there are other intrinsics that fall into this category as well.
As I understand it, the verifier does not complain about the (mis-)placement of alloca, probably because it is not something related to correctness (in theory), but it is related to optimization opportunities (in practice).
So I am not sure what the general rule on the placement of static allocas is, or whether there is any such rule in the first place. Or are front ends and optimization passes free to place static allocas wherever they like?
Frontends should place allocas in the entry block, and importantly, they should appear before any instruction that can later expand into control flow, such as an inlinable function call. Passes should not pessimize IR by inserting control flow before static allocas. The doc you linked to seems to cover that.
As far as rules go, this is not something that the verifier can enforce, because static allocas don’t carry a special “static” marker. The static property is determined simply by the placement of the instruction. There is the inalloca marker, but that’s not relevant here.
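To make the "no special marker" point concrete, here is a minimal sketch of how staticness falls out of placement. It is a toy model, not LLVM code: the `Alloca` record and `is_static` helper are hypothetical names. The extra conditions beyond placement (constant size, not used with inalloca) reflect my understanding of what `AllocaInst::isStaticAlloca` checks and should be read as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Alloca:
    """Toy stand-in for an alloca instruction (hypothetical, not the LLVM class)."""
    name: str
    block: str            # label of the basic block containing the alloca
    constant_size: bool   # True if the array-size operand is a constant
    inalloca: bool = False


def is_static(a: Alloca, entry_label: str = "entry") -> bool:
    # Nothing on the instruction says "static": the answer is derived from
    # where the alloca lives and what its size operand looks like.
    return a.block == entry_label and a.constant_size and not a.inalloca


in_entry = Alloca("a", block="entry", constant_size=True)
in_loop = Alloca("b", block="loop.body", constant_size=True)
dynamic = Alloca("c", block="entry", constant_size=False)
```

In this model, moving `b` out of the entry block is all it takes to turn a static alloca into a dynamic one, which is why the verifier has nothing to check against.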
The interesting part is not so much control flow as non-alloca
instructions, because control flow already breaks our current "canonical form".
So, do we want to say that allocas should not be interleaved with any
other instruction in the entry block (in our canonical form), or should
we say that canonical form "just" requires them to be in the entry block?
We only do the latter explicitly today. Some FEs and passes insert code
in between allocas, e.g. address-space casts or debug metadata. I'd also not
be surprised if we find more cases that insert a cast or similar before an alloca.
One way to determine how different those two are in practice would be to
stop scanning the entire entry block in SROA. Any non-clustered alloca would
then not be promoted easily and would show up as a blip in our monitoring.
I personally don't feel strongly here, though I imagine the currently written
down canonical form is simpler to maintain. Clustered static allocas can be
created for all static allocas with a simple scan+move over the entry block. I
think maintaining clustered allocas is unnecessarily hard because passes that
introduce casts (or similar) need to have a special check for the alloca/entry
block case. That said, it's not impossible.
Long story short, I feel SROA should define the canonical form here. It will
give people a strong incentive to follow that form.
~ Johannes
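The "scan+move" and the "stop scanning the entire entry block" experiment mentioned above can be sketched on a toy entry block. This is only an illustration under stated assumptions: instructions are modeled as `(opcode, name)` pairs, every alloca in the list is assumed to have a constant size (so hoisting it cannot break an operand dependence), and `cluster_allocas` and `leading_allocas` are hypothetical helper names, not SROA functions.

```python
def cluster_allocas(entry):
    """Scan+move: hoist every alloca to the top, keeping relative order."""
    allocas = [inst for inst in entry if inst[0] == "alloca"]
    others = [inst for inst in entry if inst[0] != "alloca"]
    return allocas + others


def leading_allocas(entry):
    """Allocas found by a scan that gives up at the first non-alloca."""
    found = []
    for inst in entry:
        if inst[0] != "alloca":
            break
        found.append(inst)
    return found


# A cast and a call are interleaved with the allocas.
entry = [
    ("alloca", "a"),
    ("addrspacecast", "a.cast"),
    ("alloca", "b"),
    ("call", "f"),
    ("alloca", "c"),
]

visible_before = leading_allocas(entry)                  # only %a
visible_after = leading_allocas(cluster_allocas(entry))  # %a, %b, %c
```

A scan limited to the leading cluster misses `b` and `c` until the block is re-clustered, which is the kind of "blip" the experiment would surface.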
Actually, it is a gray area. On one hand, it is not an explicitly mandated or enforced rule, since it cannot be; on the other hand, such a canonical form is required for better code optimization and transformation.
Given this situation, I personally think that it is always better to maintain the canonical form. At a minimum, all static allocas should be placed before any call; ideally, all static allocas should sit at the top of the entry block as one cluster.
How we maintain it is the follow-up question, if we all agree on the canonical form above, and it should all start from the front end.
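The two tiers described here (one cluster at the top, or at least before any call) can be expressed as a small check over the entry block. This is a rough sketch, not an existing LLVM utility: the entry block is modeled as a plain list of opcode strings, `placement_level` and its three labels are hypothetical, and debug intrinsics are modeled as a separate `"dbg"` opcode rather than as calls.

```python
def placement_level(entry_opcodes):
    """Classify alloca placement in a toy entry block.

    Returns "clustered" if all allocas lead the block, "before-calls" if
    other instructions are interleaved but no call precedes an alloca,
    and "after-call" if some alloca follows a call.
    """
    seen_call = False
    seen_other = False
    level = "clustered"
    for opcode in entry_opcodes:
        if opcode == "alloca":
            if seen_call:
                return "after-call"
            if seen_other:
                level = "before-calls"
        else:
            seen_other = True
            if opcode == "call":
                seen_call = True
    return level


best = placement_level(["alloca", "alloca", "store", "call"])
acceptable = placement_level(["alloca", "dbg", "alloca", "call"])
suboptimal = placement_level(["alloca", "call", "alloca"])
```

A front end or pass could run a check of this shape in a debug build to notice when it drops from the first tier to the second or third.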
If static allocas are interleaved with function calls, such IR is considered broken, even though the IR is valid from a correctness perspective.
This is a strange use of the word "broken". Broken generally means not
correct or not valid.
Maybe say something like "such IR is considered suboptimal"?