Altering the return address for a function with multiple return paths

Playing around with calling conventions, naked functions, and epilogue/prologue code…
Is it possible/expressible/feasible to alter the return address a function will return to?

For example, when a function may return an Int8 or a Float64 depending on some external state
(user input, or a random variable), instead of checking the returned type in the calling function, is it possible
to pass two potential return addresses, one suitable for Int8 and one suitable for Float64, and let the function return to the right place?

If it is possible, what are the implications? Does this inhibit optimization opportunities somehow?

One (non-LLVM) problem you will run into is that almost all processors
are optimized to have functions return to the instruction right after
the instruction that called them.

The most common method is to predict where the return instruction will
jump to by using a processor-internal stack of return addresses, which
is separate from the in-memory call stack. This enables the processor
to fetch, decode, and execute instructions following (in program
order) the return instruction before the processor knows for sure what
address the return instruction will branch to. If the return address
turns out to be different than the processor predicted, it has to
throw out all the instructions it started executing that it thought
came after the return, causing massive slow-downs.

For an interesting application of changing the return address, look up

Hi Jay,

This trick can certainly be used by someone coding in assembly language directly, but I do not think it is possible for a compiler to do. High-level language functions are supposed to have a single entry point and a single return address, pointing at the instruction just after the call. Virtually all high-level languages and their compilers are designed around these semantics, and processors are optimized for them too.

Inside the callee, the compiler may optimise the actual placement of the return code, it may duplicate code to avoid branching, and it may perform tail-call optimisations that modify the standard return procedure, but the proper epilogue code will effectively be executed in all cases, with an identical return value and a transfer of execution to the same return address.

In order for a compiler to implement what you suggest, I think some explicit semantics would have to be incorporated into the high-level languages being compiled. Currently, in order to declare a function that returns a Float64 or an Int8 depending on external conditions, the user must use function overloads, function templates, or closures (in languages supporting them). In all these cases, the user either explicitly declares a function for every type, or the compiler generates a separate function for every type use case. So in reality the case where a single function may return multiple types does not happen. My point is that since high-level languages give you no way to specify multiple return types for the same function, there is no real use case where the compiler would want to do so. Unless I misunderstood your question.


In high-level languages, optionally returning different types of value is usually handled with
a tagged union and a switch statement in the caller.

My intention is to skip this by giving the callee two different addresses to return to depending on what it did with the input.

For high-level JITted languages, this can simplify the “type inference” pass.

Another question on the topic: if I manage the stack myself somehow and replace ret with an inline-assembly jmp, will
the processor be able to prefetch instructions beyond the jmp?

Hi Tsur,

In high-level languages, optionally returning different types of value is usually handled with
a tagged union and a switch statement in the caller.

This is not really returning different types, as you are returning a value of the union type. The function is still declared as returning a union, not a dynamic type. Union member variables are reinterpretations of the same memory location (or register contents). They tend to be very efficient when you want to look at the bare bottom of the bit representation, but they are generally discouraged in object-oriented languages unless you really need to go down to the actual memory representation for some reason.

My intention is to skip this by giving the callee two different addresses to return to depending on what it did with the input.

As I said, I think this would require a redefinition of the language in order to be able to specify these two return addresses. I can’t really imagine that, but it occurs to me that you should be able to achieve a similar goal, which is ultimately avoiding switch statements and tagged objects, through proper use of class inheritance. Think of a base class and several subclasses, where each subclass deals with a particular type. From the point of view of object memory usage it’s almost the same as a union, because you will only use one object instance at any given time, and object member variables start at the beginning of the object memory, so they take the same memory as if they were a union.

For high-level JITted languages, this can simplify the “type inference” pass.

This still requires the compiler to know the type of an object at runtime, which is a problem that class object instances solve. In the case of unions, the inferred type will always be the union; the compiler is unable to determine at runtime which member type you want to use by looking at the tag you may have provided. It just doesn’t work like that, if I understand what you are attempting to do.

Another question on the topic: if I manage the stack myself somehow and replace ret with an inline-assembly jmp, will
the processor be able to prefetch instructions beyond the jmp?

I am not fully qualified to answer this question, as I’m not that versed in processor internals. I think processors are able to prefetch instructions that will be executed after an unconditional jump, but I am unsure about that. In any case, if you replace ‘ret’ instructions with ‘jmp’ you must still generate the proper epilogue code to restore any modified registers and make sure the stack pointer or frame pointer points to the caller’s stack frame.

I think it would be useful if you gave some context about why you actually need this feature. It still looks to me like something that could be defined for a (possible) new language, not something that the LLVM compiler is able to take advantage of for existing languages like C or C++.


Yes, indeed!

The SBCL Lisp compiler (not LLVM-based) used to emit functions which would return either via ret to the usual instruction after the call, or else load the return address from the stack and jump 2 bytes past it (which would skip over either a nop or a short jmp at the original target location). Which one it used depended on whether the function was doing a multi-valued return (in which case it used ret) or a single-valued return (in which case it did the jmp retpc+2).

While this seems like a clever and efficient hack, it actually has an absolutely awful effect on performance, due to the unpaired call vs return, and the unexpected return address.

SBCL stopped doing this in 2006, a decade later than it should’ve – the Pentium MMX from 1997 already had a hardware return stack which made this a really bad idea!

What it does now is have the called function set or clear the carry flag (using STC and CLC) immediately before the return. If the caller cares, then the caller emits JNC as the first instruction after the call. (but callers typically do not care – most calls only consume a single value, and any extra return-values are silently ignored).

On Swift, we’ve occasionally considered whether it would be useful to be
able to return values in flags. For example, you could imagine returning
a trinary comparison result on x86_64 based on whether ZF and CF are set.
A function which compares two pairs of unsigned numbers could be compiled
to something like:

  cmpq %rdi, %rdx
  jnz end
  cmpq %rsi, %rcx

And the caller can switch over the values just by testing the flags.

The main problem is that this is really elegant if you have an
instruction that sets the flags exactly right and really terrible
if you don’t. For example, if we want this function to compare two
pairs of signed numbers, we need to move OF to CF without disturbing
ZF, which I don’t think is possible without some really ugly
instruction sequences. (Or we could add 0x8000_0000_0000_0000 to both
operands before the comparison, but that’s terrible in its own right.)

That problem isn’t as bad if it’s just a single boolean in ZF or CF, but
it’s still not great, at least on x86.

Now, specialized purposes like SBCL’s can definitely still benefit from
being able to return in a flag. If LLVM had had the ability to return
values in flags, we might’ve used it in Swift’s coroutines ABI, where
(similar to SBCL) any particular return site does know exactly which
value it wants to return. So it’d be nice if someone was interested in
adding it.

But we did ultimately decide that it wasn’t even worth prototyping it
for the generic Swift CC.


We've also got some cases where returning a value in a flag might be useful. Our typical use case is that we have a "rare, but not *that* rare"* slowpath which sometimes needs to run after a call to a runtime function. Our other compiler(s) - which use hand-rolled assembly for all of these bits - return the "take-rare" bit in ZF, and branch on that after the call. For our LLVM-based system, we just materialize the value into $rax and branch on that. That naive scheme has been surprisingly not bad performance-wise.

* The "not *that* rare" part is needed to avoid having exceptional unwinding be the right answer.

If we were to support something like this, you'd really want to be able to define individual flags in the callee's calling-convention clobber/preserve lists. It's really common to have a helper routine which sets, say, ZF, but leaves the others unchanged. Or to have a function which sets ZF, clobbers OF, and preserves all others. But if we were going to do that, we'd quickly realize that the x86 backend doesn't track individual flags at all, and thus conclude it probably wasn't worth it to begin with. :slight_smile:


I’m intrigued what functions would usefully preserve flags. That’s quite difficult on x86, since it means you have to spill flags if you do any basic arithmetic at all (unless you do it all with lea). Maybe on a target like PPC with really fleshed-out support for flags and conditionality?


My use case would be hand-written assembly stubs for particular fastpath operations. These aren't compiler generated, and the ratio of human time to lines of code is quite high. Having said that, while I have practical examples where returning flags has been used, I don't, off the top of my head, have a concrete example of needing to preserve flags (except DF). So take that part more as an "if we're there anyway, the ideal would be…". On a practical level, return via flags with full clobber (except DF) would probably be more than sufficient.

* On every x86 ABI I know of, the direction flag (DF) is assumed to be preserved through a call. Messing with it and forgetting to reset it results in "interesting and unpleasant" bugs.


p.s. Has anyone played with lowering a switch via sahf and a series of jccs to dispatch? It occurs to me that given all the combo variants of jccs, you could probably lower many switches on 4 bit values this way. Purely asking as a fun thought experiment, nothing more.

Ah, assembly stubs make perfect sense.