What a coincidence! I happen to be looking at MCExpr as well and jotted down some notes ([VE] Move MCSymbolRefExpr::VK_VE_ to VEMCExpr:: · Issue #130003 · llvm/llvm-project · GitHub).
I think MCSymbolRefExpr::VariantKind is not an appropriate way to encode relocations.
- other expressions, like MCConstantExpr (e.g., PPC
4@l) and MCBinaryExpr (e.g., PPC(a+1)@l), also need it - semantics become unclear (e.g., folding expressions with
@). - The generic interface
MCSymbolRefExproffers no target-specific extension point. Any target-specific logic will pollute the generic interface.
Consider what happens with addition or subtraction:
MCBinaryExpr
LHS(MCSymbolRefExpr): VariantKind, SymA
RHS(MCSymbolRefExpr): SymB
Here, the specifier attaches only to the LHS, leaving the full result uncovered. This awkward design demands workarounds.
- Parsing
a+4@gotexposes clumsiness. AfterAsmParser::parseExpressionprocessesa+4, it detects@gotand retrofits it ontoMCSymbolRefExpr(a), which feels hacked together. - PowerPC’s @l @ha optimization needs
PPCAsmParser::extractModifierFromExprandPPCAsmParser::applyModifierToExprto convert aMCSymbolRefExprto aPPCMCExpr. - Many targets (e.g., X86) use
MCValue::getAccessVariantto grab LHS’s specifier, thoughMCValue::RefKindwould be cleaner.
Worse, leaky abstractions that MCSymbolRefExpr is accessed widely in backend code introduces another problem:
while MCBinaryExpr with a constant RHS mimics MCSymbolRefExpr semantically, code often handles only the latter.
MCTargetExpr subclasses, as used by AArch64 and RISC-V, offer a cleaner approach to encode relocations.
(MIPS migrated from MCSymbolRefExpr::VariantKind to MCTargetExpr in 2016 (⚙ D19716 [mips] Use MipsMCExpr instead of MCSymbolRefExpr for all relocations.).)
Ideally, limit MCTargetExpr to top-level use to encode one single relocation and avoid its inclusion as a subexpression.
MCSymbolRefExpr::VariantKind as the legacy way to encode relocations should be completely removed (probably in a distant future as many cleanups are required).
- Is there another way to handle such a relocation?
- If not, do you have any idea how to fix that condition?
The generic llvm/lib/MC/MCParser/AsmParser.cpp might be where MCSymbolRefExpr::VariantKind gets created. One fix could be tweaking the target-specific XXXAsmParser::parseExpression to replace it with a top-level MCTargetExpr. As for handling cases like x@gotpcrel+1 vs. x+1@gotpcrel, we could simplify by treating them the same—ignoring where @ sits in the expression. This feels similar to how the GNU assembler works, storing relocations in a relocs[] list and keeping the expression structure itself relocation-agnostic.
The generic llvm/lib/MC/MCParser/AsmParser.cpp creates expressions with MCSymbolRefExpr::VariantKind.
We should consider it apply to the whole expression, not just the subexpression.
This feels similar to how the GNU assembler works, storing relocations in a relocs[] list associated with the instruction and keeping the expression structure itself relocation-agnostic.
The target-specific XXXAsmParser::parseExpression can be customized to remove MCSymbolRefExpr::VariantKind and add a top-level MCTargetExpr.
As for handling cases like x@gotpcrel+1 vs. x+1@gotpcrel (x86), we could simplify by treating them the same-ignoring where @ sits in the expression.
You might look at PPCAsmParser::parseExpression for inspiration.
I am working on a refactoring to eliminate MCSymbolRefExpr::VariantKind::VK_PPC_* (GitHub - MaskRay/llvm-project at ppc-mcexpr ; will still use the VariantKind member in MCSymbolRefExpr; should switch to MCTargetExpr in the future, but I might not have energy to do that)
MCBinaryExpr(SystemZMCExpr(MCSymbolRefExpr(sym)), MCConstantExpr(8))
We should limit MCTargetExpr to top-level use to encode one single relocation and avoid its inclusion as a subexpression.