confused about float literals

I assumed that C floats are 32 bits and doubles 64 bits ... but

This code:

int main(void) {
  float f;
  double f1;
  f = 3.145;
  f1 = 3.145;
  return 0;
}

Compiles (via clang) to:

  ; ModuleID = 'test101.c'
  target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
  target triple = "i386-pc-linux-gnu"

define i32 @main() nounwind {
  %1 = alloca i32, align 4 ; <i32*> [#uses=3]
  %f = alloca float, align 4 ; <float*> [#uses=1]
  %f1 = alloca double, align 8 ; <double*> [#uses=1]
  store i32 0, i32* %1
  store float 0x400928F5C0000000, float* %f
  store double 3.145000e+00, double* %f1
  store i32 0, i32* %1
  %2 = load i32* %1 ; <i32> [#uses=1]
  ret i32 %2
}

This is very strange - what is 0x400928F5C0000000?
The 32-bit hex representation of 3.145 is 0x404947ae, and
the 64-bit hex representation of 3.145 is 0x400928f5c28f5c29.

[as calculated by http://babbage.cs.qc.edu/IEEE-754/Decimal.html
and my own independent program]
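
For reference, a minimal sketch of such a program in standard C
(memcpy avoids any pointer-punning issues):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  float f = 3.145f;   /* rounded to the nearest 32-bit value */
  double d = 3.145;   /* rounded to the nearest 64-bit value */
  uint32_t fbits;
  uint64_t dbits;

  memcpy(&fbits, &f, sizeof fbits);
  memcpy(&dbits, &d, sizeof dbits);

  printf("float : 0x%08"  PRIX32 "\n", fbits); /* 0x404947AE         */
  printf("double: 0x%016" PRIX64 "\n", dbits); /* 0x400928F5C28F5C29 */
  return 0;
}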

But the given literal is neither of these. Are floats 32 bits? And if
so, why is the literal value represented in 64 bits, and why are the
low-order bits all zero?

What's going on here?

Secondly, I was bitten by the behavior of llvm-as.

This code:

  %f = alloca float, align 4
  store float 1.25, float* %f

Assembles correctly.

But a minor edit to this:

  %f = alloca float, align 4
  store float 1.251, float* %f

Does not, but gives the error message:

     error: floating point constant invalid for type
     store float 1.251, float* %f

This violates the principle of least astonishment - and yes, I know the
manual says it *should* do this - but it's not exactly helpful.

If I see a floating-point literal in my C code (like 1.25), I'd rather
see the same floating-point literal in the assembler, and not a
(mysterious) 64-bit hex literal.
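
For what it's worth, the hex literal that llvm-as does accept can be
generated mechanically: round the constant to float precision, widen it
back to double, and print the bits. A C sketch (the reasoning behind
this recipe is in the replies below):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  /* round the decimal to float precision, then widen it (exactly)
     back to double */
  double widened = (double)(float)1.251;
  uint64_t bits;
  memcpy(&bits, &widened, sizeof bits);

  /* the low bits of this pattern come out zero, which is what
     the assembler insists on */
  printf("store float 0x%016" PRIX64 ", float* %%f\n", bits);
  return 0;
}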

Cheers

/Joe

For murky historical reasons, float literals are represented as
if they were doubles, but with the precision of a float. It does
not change their semantics.
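
Concretely, widening a float to double is exact, so the literal pins
down exactly the same value. A C sketch (an illustration, not LLVM's
own code):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  float f = 3.145f;
  double widened = f;   /* exact: every float fits in a double */
  uint64_t bits;
  memcpy(&bits, &widened, sizeof bits);

  printf("0x%016" PRIX64 "\n", bits);       /* 0x400928F5C0000000 */
  printf("%d\n", (float)widened == f);      /* 1: lossless round trip */
  return 0;
}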

As for your other question, I don't know that there's a good reason
that the parser isn't more accommodating.

John.

But what is 0x400928F5C0000000? It is NOT the 64-bit representation
with the low-order 32 bits zeroed - it is the 64-bit representation
with the low-order 28 bits zeroed.

Why 28 bits? 32 bits might be understandable, but not 28.

If I manually edit the hex constant to the exact 64-bit representation,
I get an error. For example:

store float 0x400928F5C28F5C29, float* %f <= best possible representation
    llvm-as: bug2.s:11:15: error: floating point constant invalid for type

so the "as if they were doubles" bit seems wrong -

try clearing 24 bits

  store float 0x400928F5C2000000, float* %f <= low order 24 bits cleared
    llvm-as: bug2.s:12:18: error: floating point constant invalid for type

Clear 28 bits:

  store float 0x400928F5C0000000, float* %f <= low order 28 bits cleared
    no error

Or 32 bits:

  store float 0x400928F500000000, float* %f <= low order 32 bits cleared
    no error

So the hex constant seems to be obtained by taking the 64-bit value
and clearing the low-order 28 bits. This seems a bit arbitrary to me -
something smells a bit fishy here.
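
A C sketch that reproduces the experiment. It tests whether each masked
pattern survives a double -> float -> double round trip, which (for
these in-range values) turns out to be the condition the assembler is
checking, as the replies below explain:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* does the pattern denote a value exactly representable as a float? */
static int representable_as_float(uint64_t bits) {
  double d;
  memcpy(&d, &bits, sizeof d);
  return (double)(float)d == d;
}

int main(void) {
  const uint64_t exact = 0x400928F5C28F5C29ULL; /* double(3.145) */
  const int widths[] = { 24, 28, 32 };
  for (int i = 0; i < 3; i++) {
    uint64_t masked = exact & ~((1ULL << widths[i]) - 1);
    printf("clear %2d bits: 0x%016" PRIX64 " -> %s\n", widths[i],
           masked, representable_as_float(masked) ? "ok" : "error");
  }
  return 0;
}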

/Joe

The exact bit representation of the float is laid out with the
corresponding bitwise representation of a double: the sign
bit is copied over, the exponent is encoded in the larger width,
and the 23 bits of significand fill in the top 23 bits of significand
in the double. A double has 52 bits of significand, so this means
that the last 29 bits of significand will always be ignored. As an
error-detection measure, the IR parser requires them to be zero.
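
A C sketch of that layout (normal numbers only; purely illustrative,
not LLVM's actual code):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  float f = 3.145f;
  uint32_t fb;
  memcpy(&fb, &f, sizeof fb);

  uint64_t sign = fb >> 31;           /* 1 sign bit            */
  uint64_t exp  = (fb >> 23) & 0xFF;  /* 8-bit biased exponent */
  uint64_t man  = fb & 0x7FFFFFu;     /* 23 significand bits   */

  /* re-bias the exponent (float bias 127, double bias 1023) and park
     the 23 significand bits at the top of the double's 52-bit field;
     the low 29 bits are left zero, exactly as described above */
  uint64_t db = (sign << 63) | ((exp - 127 + 1023) << 52) | (man << 29);

  printf("0x%016" PRIX64 "\n", db);   /* prints 0x400928F5C0000000 */
  return 0;
}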

I don't think this is a great representation, and I think we'd be
open to someone changing the IR to at least use 32-bit hex
float literals. There are a lot of tests that would have
to be updated, but that could be done with a Perl script.

John.

I bet the low 29 bits are zeroed. That is the difference between the 52-bit double significand and the 23-bit float significand.

This is what you get if 3.145 is first converted to a float and then to a double.

/jakob

This seems to be empirically true. So to convert a float32, I zero the
low-order 29 bits in the 64-bit equivalent representation and render
the result in hex.
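
A C sketch of that conversion, with one caveat: masking truncates the
discarded bits, while an actual double-to-float conversion rounds to
nearest, so the two recipes can differ by one unit in the last place;
widening the rounded float is the safer one. For 3.145 they agree:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  /* recipe 1: mask the low 29 bits of the 64-bit pattern */
  uint64_t exact  = 0x400928F5C28F5C29ULL;        /* double(3.145) */
  uint64_t masked = exact & ~((1ULL << 29) - 1);

  /* recipe 2: round to float, then widen exactly back to double */
  double widened = (double)(float)3.145;
  uint64_t wbits;
  memcpy(&wbits, &widened, sizeof wbits);

  printf("masked : 0x%016" PRIX64 "\n", masked); /* 0x400928F5C0000000 */
  printf("widened: 0x%016" PRIX64 "\n", wbits);  /* 0x400928F5C0000000 */
  return 0;
}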

So right now the documentation and the code are in disagreement :-)

/Joe