Return i1 in EFLAGS

Hi! :smiley:

I would like to teach LLVM to return i1 values in the EFLAGS register
instead of using some GPR or stack.

My original plan was to extend fastcc, but it seems that fastcc doesn’t
support x86_64 and is meant to stay in sync with the GNU/MS/Intel
compilers (?).

Should I create a new calling convention? If so, I would probably also
have to teach LLVM when to use it. Or is there already an internal
LLVM-defined calling convention that is used to make calls fast when the
function is not externally visible?

The rest of my current plan, I imagine, is to teach X86ISelLowering.cpp to
notice an i1 being returned and emit something like this on the callee side:

mov ..., %rdi ; the i1 is in rdi
add %rdi, %rdi ; sets ZF to 1 if rdi is 0, 0 otherwise
ret

and on the caller side either:

call ...
jz ...
...

or

call ...
setnz %dil ; dil = 1 if ZF is clear, 0 otherwise
...

if the value is not immediately branched on (maybe rare?).
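To make the flag polarity in the sketch above concrete, here is a small C model of what ZF holds after the callee's `add` and what a caller-side `jz` or `setnz` would then observe. It only simulates the flag arithmetic in plain C. The helper names (`zf_after_add`, `caller_takes_jz`, `materialize_setnz`) are made up for illustration and are not LLVM APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ZF after `add %rdi, %rdi`.
   x86 sets ZF to 1 exactly when the 64-bit result is zero. */
static bool zf_after_add(uint64_t rdi) {
    uint64_t result = rdi + rdi; /* wraps modulo 2^64, like the hardware */
    return result == 0;
}

/* Caller side, first variant: `jz` is taken when ZF == 1. */
static bool caller_takes_jz(uint64_t returned_i1) {
    return zf_after_add(returned_i1);
}

/* Caller side, second variant: `setnz` writes 1 when ZF == 0. */
static uint8_t materialize_setnz(uint64_t returned_i1) {
    return zf_after_add(returned_i1) ? 0 : 1;
}
```

In this model `jz` is taken when the returned i1 is 0, and `setnz` reproduces the i1. One thing the model makes visible: `add` also produces a zero result for rdi = 2^63, so this only works if the i1 is zero-extended first. A `test %rdi, %rdi` would set ZF purely on rdi == 0 and would not modify rdi, so it might be the safer choice.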

The target use case I have in mind is Zig error code returns, which look
like returning a tagged union of an error code and a value. The tag is
always an i1 and is almost always immediately branched on to do error
handling, so I imagine this optimisation could have an impact here.
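As a rough illustration of that pattern, here is a C sketch of a function returning a tag plus payload, with a caller that branches on the tag right after the call. The names (`result_t`, `parse_digit`, `digit_or_default`) and the error code are hypothetical, and the struct is not meant to match Zig's actual layout or ABI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an error-union return: a 1-bit tag plus payload. */
typedef struct {
    bool is_error;           /* the i1 tag */
    union {
        uint16_t error_code; /* valid when is_error is true */
        int32_t value;       /* valid when is_error is false */
    } payload;
} result_t;

/* Callee: returns either a digit value or a made-up error code. */
static result_t parse_digit(char c) {
    result_t r;
    if (c < '0' || c > '9') {
        r.is_error = true;
        r.payload.error_code = 1; /* made-up error code */
    } else {
        r.is_error = false;
        r.payload.value = c - '0';
    }
    return r;
}

/* Caller: the tag is tested immediately after the call, which is the
   branch that would read ZF directly under the proposed convention. */
static int32_t digit_or_default(char c, int32_t fallback) {
    result_t r = parse_digit(c);
    if (r.is_error) {
        return fallback;
    }
    return r.payload.value;
}
```

With today's conventions the tag comes back in a register or in memory and the caller needs a `test`/`cmp` before the branch. The idea in this post is that the branch in `digit_or_default` could consume the flag directly.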

Am I on the right track with this change?

BTW, if this goes well I would like to extend this to also pass i1
arguments into functions in EFLAGS, maybe even several at once, but I
should probably start small :smiley:

Thanks for any help!

Prokop

I can’t help with your exact question, but note that adjacent cmp/test+jz/jnz (and many other pairs) are fused into one uop in nearly all contemporary Intel/AMD cores. Moving the flag-setting instruction into the callee would thus add an extra uop. Have you measured that the cmp/jmp pair is actually a performance problem? Assuming correct prediction, I would expect these to just execute in the shadow of other operations in an out-of-order core.

The only case where I could see this approach being beneficial is if the return value already uses all available registers.