[Sparc] vararg double issue on 32 bit Sparc processors

Hi,

I’ve discovered a problem on Sparc processors (specifically LEON, though I suspect, but can’t verify, that it also happens on all Sparc processors).

The problem is, or appears to be, with the use of double values on 32-bit Sparc.

Specifically, double values are not being loaded into registers correctly within a function that uses va_arg. Only half of the value is loaded (i.e. 32 rather than 64 bits).

What I have also found is that the failure does not happen in all circumstances. The simplest situation I’ve been able to re-create that avoids the problem is to put a store and subsequent load instruction around the va_arg instruction that deals with the double value. e.g. (in an .ll file):

This code fragment does not work:

define void @foo(i32 %v, i8* %ap) local_unnamed_addr {
entry:
%ap.addr = alloca i8*, align 4
store i8* %ap, i8** %ap.addr, align 4
%0 = va_arg i8** %ap.addr, i64
%conv = trunc i64 %0 to i32
%1 = va_arg i8** %ap.addr, double

Whereas this nearly identical code fragment, which wraps a store and a load around the “double” va_arg instruction, does work:

define void @foo(i32 %v, i8* %ap) local_unnamed_addr {
entry:
%ap.addr = alloca i8*, align 4
store i8* %ap, i8** %ap.addr, align 4
%0 = va_arg i8** %ap.addr, i64
%conv = trunc i64 %0 to i32
store i32 %conv, i32* @foo_arg, align 4
%1 = va_arg i8** %ap.addr, double
%2 = load i32, i32* @foo_arg, align 4

I had been attempting to make various changes to SparcISelLowering.cpp to try to simulate something similar where the code is output, but I don’t feel as though I’m heading in the right direction. I’m still not sure quite where the source of the problem lies.

I can provide more details on specifics, but rather than head off into excessive details immediately, I’d appreciate if anyone can help me identify what direction I really should be taking to fix this problem. I’m not convinced I’ve been going about it the right way so far.

Chris Dewhurst, Lero, University of Limerick.

Are you using variadic functions on the LLVM or the Clang side? The
former is essentially best effort and many things simply are not
supported.

Joerg

Hi Chris,

Can I bring this in a bit closer to Leon - is this specifically to do with 64-bit FP transactions with memory? The reason I ask is that I had similar problems adapting the GCC compiler for Sparc Leon 3 and 4, which are V8 Sparc but with 64-bit memory bus transactions. It “feels” like the same problem, but in LLVM rather than GCC. In the GCC case, the 64-bit memory transaction had the register endian ordering the wrong way round; perhaps it is something similar in LLVM?

Are you doing this with “sparc” or “sparcel”?

Thanks, MartinO

With my GCC implementation for this, it was the 64-bit ‘ldd’ and ‘std’ handling that I had to change. It was working ok for big-endian, but since we use a little-endian Leon4, the usual implementation was loading the two FP registers the wrong way round. It wasn’t zeroing one of them, though. The change I made was just to ensure that the endian ordering was correct (ditto for 64-bit int).
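To illustrate what “the wrong way round” means here, the following is a rough host-side sketch in C, not the GCC change itself. It splits a double into the two 32-bit words it occupies in memory and rebuilds it with the words in the original or in swapped order. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the 8 bytes of a double into two 32-bit words, in memory order. */
void split_double(double d, uint32_t words[2])
{
    assert(sizeof(double) == 2 * sizeof(uint32_t));
    memcpy(words, &d, sizeof d);
}

/* Rebuild a double from two 32-bit words, in memory order. */
double join_double(const uint32_t words[2])
{
    double d;
    memcpy(&d, words, sizeof d);
    return d;
}

/* Rebuild a double with the two words exchanged, which models a
   register pair being filled in the wrong order. */
double join_swapped(const uint32_t words[2])
{
    uint32_t swapped[2] = { words[1], words[0] };
    return join_double(swapped);
}
```

For a value such as 1.5, whose two halves differ, join_swapped returns a different number from the one that was split, while join_double returns it unchanged. A value that comes back with one half missing, rather than with the halves exchanged, would point to a different cause than the ordering problem.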

If you try hand-written assembly using ‘ldd’, what happens? I’m wondering if there is a hardware issue that prevents the 64-bit load from being performed. I’m assuming that you are using a proprietary development board for this, and it is possible that the bus has not been configured for the 64-bit load/store extensions. Just a thought.

MartinO

A complete test-case which can be given to llc to produce incorrect assembly output would be a good start. My initial attempt to reproduce by making a complete function from your example did generate code to load both halves.
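As one possible starting point for such a test case, here is a C-level sketch of what the fragments above appear to do: read a 64-bit integer and then a double from a va_list. The original C source was not posted, so this is a guess. foo and foo_arg mirror the names in the IR, while call_foo, foo_dbl and check_foo are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdarg.h>

int foo_arg;     /* mirrors @foo_arg in the working fragment */
double foo_dbl;  /* hypothetical: keeps the double so it can be checked */

/* Mirrors @foo(i32 %v, i8* %ap): va_arg of an i64, then of a double. */
void foo(int v, va_list ap)
{
    (void)v;
    long long first = va_arg(ap, long long);
    foo_arg = (int)first;          /* the trunc i64 -> i32 in the IR */
    foo_dbl = va_arg(ap, double);  /* the load reported as half-complete */
}

/* Hypothetical variadic wrapper that builds the va_list for foo. */
void call_foo(int v, ...)
{
    va_list ap;
    va_start(ap, v);
    foo(v, ap);
    va_end(ap);
}

/* Returns 1 if both arguments arrived intact, 0 otherwise. */
int check_foo(void)
{
    assert(sizeof(double) == 8);
    call_foo(0, 42LL, 1.5);
    return foo_arg == 42 && foo_dbl == 1.5;
}
```

Compiling a file like this with clang for a sparc target (for example with --target=sparc -S -emit-llvm) should give a complete .ll file that can be passed to llc. Whether it reproduces the half-loaded double depends on details that have not been posted, such as the optimisation level and the exact CPU options.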