[RFC] Syncing Asm Goto with Outputs with GCC

Clang’s extension of “asm goto” to support output constraints prompted GCC to
do the same. However, there are some differences between the two
implementations. In particular, GCC allows output constraints to be used on the
indirect branches, while Clang doesn’t.

GCC’s implementation of inline assembly is typically seen as the definitive
implementation. So I believe that we should sync our implementation to align
with GCC’s implementation.

The main difference between Clang’s and GCC’s implementations is how C labels
are handled. Clang allows them to be treated as data. GCC does to a certain
extent, but doesn’t guarantee that the value for a specific label will be the
same in all cases. This was the main reason why Clang chose not to support
output constraints on indirect branches.

Here’s a simple example.

$ cat asm-goto.c
int foo(int a)
{
	int b = 0;

	if (a)
		asm goto ("mov %l1, %0" : "=r" (b) : : : err);
	else
		asm goto ("mov %l1, %0" : "=r" (b) : : : err);

	return b;
err:
	return b;
}

GCC emits this:

$ gcc -o - -S asm-goto.c -O2
	.file	"asm-goto.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	testl	%edi, %edi
	je	.L2
#APP
# 6 "asm-goto.c" 1
	mov .L3, %eax
# 0 "" 2
#NO_APP
.L3:
	ret
	.p2align 4,,10
	.p2align 3
.L2:
#APP
# 8 "asm-goto.c" 1
	mov .L5, %eax
# 0 "" 2
#NO_APP
.L5:
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 12.2.0-1) 12.2.0"
	.section	.note.GNU-stack,"",@progbits

Note that even though the original “asm goto” statements refer to the same C
label, the values stored in “b” aren’t the same.

Implementation

Clang doesn’t have a restriction from using the output of an asm goto on the
indirect branch. I.e., there’s nothing that checks and warns about doing so.
The bulk of the implementation work will be in verifying that the code Clang
emits is correct for all situations. Given how asm goto is implemented in LLVM,
most changes will be in the back-end. (I know this is a bit hand-wavy, but it’s
difficult to estimate how much work it will take until implementation starts.)

-bw

We’ve basically already moved away from the whole “labels as data” thing, in anticipation of this, and because blockaddresses are hard to optimize. clang doesn’t emit blockaddresses for asm goto anymore. Given we’ve already done that, supporting the gcc semantics is basically just implementation work; there aren’t any interesting semantic questions.

Overall, the proposal seems fine.

Completely agree. My naïve wish is that it will “just work” with only a few kinks to work out.

I’ve been working on this for a while now; I think I’ve come to the same conclusion as @rnk did years ago that a new landingpad-like instruction would be helpful.

Via lots of iteration, I think I now have this all coded up and working. I did write up an RFC to accompany the patches. I’m going to spend a little more time polishing the series, then will publish the patches and RFC soon.

PDF of my RFC [RFC] handling outputs along indirect edges of callbr - callbrpad.pdf (128.2 KB)

Patch series:

  1. [IR] add new callbrpad instruction
  2. [Clang] refactor CodeGenFunction::EmitAsmStmt NFC
  3. [Clang] support for outputs along indirect edges of asm goto

cc @rnk @jyknight @efriedma-quic @topperc @nikic @arsenm

See also: ⚙ D139565 [IR] add new callbrpad instruction

Summarizing feedback from ⚙ D139565 [IR] add new callbrpad instruction as the new plan:

  1. allow the use of callbr values along ANY edge; direct or indirect. i.e. restore ⚙ D135997 [Dominators] check indirect branches of callbr.
  2. Create a new function pass (as part of TargetPassConfig::addISelPasses so it runs before isel on LLVM IR) that:
    A. for each basic block terminator if it’s a callbr that has outputs that have uses:
    a. splits indirect edges that happen to also be critical edges (if any) (akin to ⚙ D138078 [SelectionDAGISel] split critical indirect edges from callbr w/ outputs, but in a dedicated pass)
    b. inserts a new SSA value produced from either a intrinsic call. This will be used a marker during SelectionDAGBuilder to denote where physreg to virtreg copies need to go. It need not reference the corresponding callbr since we split critical edges in 2.A.a previously, so it’s unambiguous that our corresponding callbr is the terminator of my lone predecessor.
    c. Replaces the uses dominated by the indirect edge to use the newly created value. We might need to insert phis where previously there were none. llvm::SSAUpdater can help with this.
  3. Add a mapping from CallBrInst to either INLINEASM_BR MachineInstr or first defined virtreg (not sure yet), probably in SelectionDAGBuilder’s FunctionLoweringInfo instance member. Create an entry in this mapping when lowering a callbr or inlineasm. Consume this mapping info when visiting the “new value” created from 2.A.b above to insert COPYs as necessary. (May need to handle visiting the use before the def).

Ideas for names for the intrinsic: callbrdef, terminatordef, callbrpad, callbr_landingpad.

With `amdgpu-isel` appears to be the name for SelectionDAGISel for some targets? · Issue #59538 · llvm/llvm-project · GitHub out of the way, I think the following patches are ready for review:

  1. ⚙ D135997 [Dominators] check indirect branches of callbr
  2. ⚙ D140166 [IR] return nullptr in Instruction::getInsertionPointAfterDef for CallBrInst
  3. ⚙ D139861 [llvm] boilerplate for new callbrprepare codegen IR pass
  4. ⚙ D139872 [llvm][CallBrPrepare] split critical edges
  5. ⚙ D139883 [llvm][CallBrPrepare] add llvm.callbr.landingpad intrinsic
  6. ⚙ D139970 [llvm][CallBrPrepare] use SSAUpdater to use intrinsic value
  7. ⚙ D140160 [llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic
  8. ⚙ D137113 [Clang] refactor CodeGenFunction::EmitAsmStmt NFC
  9. ⚙ D136497 [Clang] support for outputs along indirect edges of asm goto

cc @efriedma-quic @gwelymernans @jyknight

  1. [llvm][SelectionDAGBuilder] change callbr.landingpad intrinsic to accept explicit param

Updated list:

  1. :gear: D135997 [Dominators] check indirect branches of callbr
  2. :gear: D140166 [IR] return nullptr in Instruction::getInsertionPointAfterDef for CallBrInst
  3. :gear: D139861 [llvm] boilerplate for new callbrprepare codegen IR pass
  4. :gear: D139872 [llvm][CallBrPrepare] split critical edges
  5. :gear: D139883 [llvm][CallBrPrepare] add llvm.callbr.landingpad intrinsic
  6. :gear: D139970 [llvm][CallBrPrepare] use SSAUpdater to use intrinsic value
  7. :gear: D140160 [llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic
  8. ⚙ D140180 [llvm] add CallBrPrepare pass to pipelines
  9. :gear: D137113 [Clang] refactor CodeGenFunction::EmitAsmStmt NFC
  10. :gear: D136497 [Clang] support for outputs along indirect edges of asm goto
  11. ⚙ D142940 [llvm][SelectionDAGBuilder] change callbr.landingpad intrinsic to accept explicit param
  12. ⚙ D140508 [clang] fix -Wuninitialized for asm goto outputs on indirect edges.
  13. ⚙ D143205 [clang] add __has_extension(gnu_asm_goto_with_outputs_full)
  14. ⚙ D143961 [llvm][SelectionDAGBuilder] use getRegistersForValue to generate legal copies

ok, everything has been reviewed and tested extensively on the Linux kernel. My next steps are:

  1. squash #2 into #1
  2. squash #11 into #5
  3. squash #14 into #7

Then publish one final rebased+reformatted set, then merge tonight.

Landed as:

  1. commit 45a291b5f609 (“[Dominators] check indirect branches of callbr”)
  2. commit fb471158aa0d (“[llvm] boilerplate for new callbrprepare codegen IR pass”)
  3. commit 0a39af0eb72d (“[llvm][CallBrPrepare] split critical edges”)
  4. commit 094190c2f529 (“[llvm][CallBrPrepare] add llvm.callbr.landingpad intrinsic”)
  5. commit 28d45c843cd0 (“[llvm][CallBrPrepare] use SSAUpdater to use intrinsic value”)
  6. commit 5cc1016a57b3 (“[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic”)
  7. commit a3a84c9e2511 (“[llvm] add CallBrPrepare pass to pipelines”)
  8. commit b1bc723dfe97 (“[Clang] refactor CodeGenFunction::EmitAsmStmt NFC”)
  9. commit 329ef60f3e21 (“[Clang] support for outputs along indirect edges of asm goto”)
  10. commit af6656338db3 (“[clang] fix -Wuninitialized for asm goto outputs on indirect edges.”)
  11. commit 54186d33c3a0 (“[clang] add __has_extension(gnu_asm_goto_with_outputs_full)”)

This closes out this feature (fingers crossed): [asm goto] support outputs along indirect branches · Issue #53562 · llvm/llvm-project · GitHub.