[RFC] Syncing Asm Goto with Outputs with GCC

gwelymernans · September 22, 2022, 8:08pm

Clang’s extension of “asm goto” to support output constraints prompted GCC to
do the same. However, there are some differences between the two
implementations. In particular, GCC allows output constraints to be used on the
indirect branches, while Clang doesn’t.

GCC’s implementation of inline assembly is typically seen as the definitive
implementation. So I believe that we should sync our implementation to align
with GCC’s implementation.

The main difference between Clang’s and GCC’s implementations is how C labels
are handled. Clang allows them to be treated as data. GCC does to a certain
extent, but doesn’t guarantee that the value for a specific label will be the
same in all cases. This was the main reason why Clang chose not to support
output constraints on indirect branches.

Here’s a simple example.

$ cat asm-goto.c
int foo(int a)
{
	int b = 0;

	if (a)
		asm goto ("mov %l1, %0" : "=r" (b) : : : err);
	else
		asm goto ("mov %l1, %0" : "=r" (b) : : : err);

	return b;
err:
	return b;
}

GCC emits this:

$ gcc -o - -S asm-goto.c -O2
	.file	"asm-goto.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	testl	%edi, %edi
	je	.L2
#APP
# 6 "asm-goto.c" 1
	mov .L3, %eax
# 0 "" 2
#NO_APP
.L3:
	ret
	.p2align 4,,10
	.p2align 3
.L2:
#APP
# 8 "asm-goto.c" 1
	mov .L5, %eax
# 0 "" 2
#NO_APP
.L5:
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 12.2.0-1) 12.2.0"
	.section	.note.GNU-stack,"",@progbits

Note that even though the original “asm goto” statements refer to the same C
label, the values stored in “b” aren’t the same.

Implementation

Clang doesn’t have a restriction from using the output of an asm goto on the
indirect branch. I.e., there’s nothing that checks and warns about doing so.
The bulk of the implementation work will be in verifying that the code Clang
emits is correct for all situations. Given how asm goto is implemented in LLVM,
most changes will be in the back-end. (I know this is a bit hand-wavy, but it’s
difficult to estimate how much work it will take until implementation starts.)

-bw

efriedma-quic · September 22, 2022, 8:22pm

We’ve basically already moved away from the whole “labels as data” thing, in anticipation of this, and because blockaddresses are hard to optimize. clang doesn’t emit blockaddresses for asm goto anymore. Given we’ve already done that, supporting the gcc semantics is basically just implementation work; there aren’t any interesting semantic questions.

Overall, the proposal seems fine.

gwelymernans · September 22, 2022, 8:27pm

Completely agree. My naïve wish is that it will “just work” with only a few kinks to work out.

nickdesaulniers · September 22, 2022, 8:38pm

github.com/llvm/llvm-project

[asm goto] support outputs along indirect branches

opened 01:07AM - 03 Feb 22 UTC

josephcsible

enhancement clang:codegen

Consider this C code: ```c int main(void) { int x = 123; asm goto("m…ov %1, %0\n\tjmp %l[label]" : "=r" (x) : "r" (45) : : label); x = 6; label: return x; } ``` It's supposed to return 45. But when you compile it in Clang, it returns 123 instead. I tested this on both trunk (01b52f7) and 13.0.0, and both with -O3 and without optimizations, and the bug happens in all of those cases. https://godbolt.org/z/vKeYrfYY5

nickdesaulniers · December 6, 2022, 11:06pm

I’ve been working on this for a while now; I think I’ve come to the same conclusion as @rnk did years ago that a new landingpad-like instruction would be helpful.

Via lots of iteration, I think I now have this all coded up and working. I did write up an RFC to accompany the patches. I’m going to spend a little more time polishing the series, then will publish the patches and RFC soon.

nickdesaulniers · December 7, 2022, 7:28pm

PDF of my RFC [RFC] handling outputs along indirect edges of callbr - callbrpad.pdf (128.2 KB)

Patch series:

cc @rnk @jyknight @efriedma-quic @topperc @nikic @arsenm

nickdesaulniers · December 8, 2022, 8:48am

See also: ⚙ D139565 [IR] add new callbrpad instruction

nickdesaulniers · December 9, 2022, 8:06pm

Summarizing feedback from ⚙ D139565 [IR] add new callbrpad instruction as the new plan:

allow the use of callbr values along ANY edge; direct or indirect. i.e. restore ⚙ D135997 [Dominators] check indirect branches of callbr.
Create a new function pass (as part of TargetPassConfig::addISelPasses so it runs before isel on LLVM IR) that:
A. for each basic block terminator if it’s a callbr that has outputs that have uses:
a. splits indirect edges that happen to also be critical edges (if any) (akin to ⚙ D138078 [SelectionDAGISel] split critical indirect edges from callbr w/ outputs, but in a dedicated pass)
b. inserts a new SSA value produced from either a intrinsic call. This will be used a marker during SelectionDAGBuilder to denote where physreg to virtreg copies need to go. It need not reference the corresponding callbr since we split critical edges in 2.A.a previously, so it’s unambiguous that our corresponding callbr is the terminator of my lone predecessor.
c. Replaces the uses dominated by the indirect edge to use the newly created value. We might need to insert phis where previously there were none. llvm::SSAUpdater can help with this.
Add a mapping from CallBrInst to either INLINEASM_BR MachineInstr or first defined virtreg (not sure yet), probably in SelectionDAGBuilder’s FunctionLoweringInfo instance member. Create an entry in this mapping when lowering a callbr or inlineasm. Consume this mapping info when visiting the “new value” created from 2.A.b above to insert COPYs as necessary. (May need to handle visiting the use before the def).

Ideas for names for the intrinsic: callbrdef, terminatordef, callbrpad, callbr_landingpad.

nickdesaulniers · December 21, 2022, 10:23pm

With `amdgpu-isel` appears to be the name for SelectionDAGISel for some targets? · Issue #59538 · llvm/llvm-project · GitHub out of the way, I think the following patches are ready for review:

cc @efriedma-quic @gwelymernans @jyknight

nickdesaulniers · January 30, 2023, 11:42pm

[llvm][SelectionDAGBuilder] change callbr.landingpad intrinsic to accept explicit param

nickdesaulniers · February 6, 2023, 5:43pm

Updated list:

nickdesaulniers · February 16, 2023, 7:13pm

ok, everything has been reviewed and tested extensively on the Linux kernel. My next steps are:

squash #2 into #1
squash #11 into #5
squash #14 into #7

Then publish one final rebased+reformatted set, then merge tonight.

nickdesaulniers · February 17, 2023, 2:09am

Landed as:

commit 45a291b5f609 (“[Dominators] check indirect branches of callbr”)
commit fb471158aa0d (“[llvm] boilerplate for new callbrprepare codegen IR pass”)
commit 0a39af0eb72d (“[llvm][CallBrPrepare] split critical edges”)
commit 094190c2f529 (“[llvm][CallBrPrepare] add llvm.callbr.landingpad intrinsic”)
commit 28d45c843cd0 (“[llvm][CallBrPrepare] use SSAUpdater to use intrinsic value”)
commit 5cc1016a57b3 (“[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic”)
commit a3a84c9e2511 (“[llvm] add CallBrPrepare pass to pipelines”)
commit b1bc723dfe97 (“[Clang] refactor CodeGenFunction::EmitAsmStmt NFC”)
commit 329ef60f3e21 (“[Clang] support for outputs along indirect edges of asm goto”)
commit af6656338db3 (“[clang] fix -Wuninitialized for asm goto outputs on indirect edges.”)
commit 54186d33c3a0 (“[clang] add __has_extension(gnu_asm_goto_with_outputs_full)”)

This closes out this feature (fingers crossed): [asm goto] support outputs along indirect branches · Issue #53562 · llvm/llvm-project · GitHub.

Topic		Replies	Views
[RFC] ASM Goto With Output Constraints LLVM Dev List Archives	1	144	June 27, 2019
goto in inline asm LLVM Dev List Archives	1	103	January 21, 2011
[RFC] MS-style inline assembly LLVM Dev List Archives	7	129	August 14, 2012
inline asm constraints examples/tests LLVM Dev List Archives	12	254	September 22, 2010
RFC: New mechanism for hard register operands to inline asm Clang Frontend	0	83	June 29, 2021

[RFC] Syncing Asm Goto with Outputs with GCC

Implementation

Related topics