design questions

Hey, I'm curious about two of the design questions in LLVM after
reading the language reference. If there's preexisting material
explaining this, just point me at it (looked and couldn't find any).

  - Why unsigned types rather than signed-only like Java/JVM? If I
    understand correctly, their behavior is only distinguishable in
    overflow situations, and the availability of hardware-assisted
    overflow detection varies quite a bit across platforms.

    In other words, abandoning overflow detection makes the
    duplication of types redundant, while requiring it would be a
    great burden on CPUs that don't have overflow exception hardware.

  - Why identify functions by their type signatures? I have to assume
    that this is meant to allow overloading, but overloading is
    generally a higher-level concept (at the "language-aware" level).

    Chances are quite good that the sets of arguments to two different
    overloads of a function would wind up mapping down to the same
    LLVM-level types (since type information is partially lost in the
    transition). For example, the types for polar coordinates and
    cartesian coordinates are both {float,float}. So most languages
    will need to mangle symbols anyways.

    Actually, having an equal-by-name (in addition to the
    equal-by-structure array/struct type operators) type in LLVM would
    let compilers encode the distinction. But it might complicate
    linking conventions.

Thanks! Overall I was hugely impressed. I'd had my attention focused
on architecture-independent code generation libraries for a while, but
being able to specify an IR outside of the language used to manipulate
it opens up way more possibilities. Neat stuff.

  - a

Hey, I'm curious about two of the design questions in LLVM after
reading the language reference. If there's preexisting material
explaining this, just point me at it (looked and couldn't find any).

  - Why unsigned types rather than signed-only like Java/JVM? If I
    understand correctly, their behavior is only distinguishable in
    overflow situations, and the availability of hardware-assisted
    overflow detection varies quite a bit across platforms.

    In other words, abandoning overflow detection makes the
    duplication of types redundant, while requiring it would be a
    great burden on CPUs that don't have overflow exception hardware.

Yes, you're right. This has been a desired change for quite some time
now. Unfortunately, it's a change that affects nearly every part of
LLVM. We will probably do it around the 2.0 time frame, when we can
afford to break bytecode compatibility and generally clean up a lot of
other things as well.

  - Why identify functions by their type signatures? I have to assume
    that this is meant to allow overloading, but overloading is
    generally a higher-level concept (at the "language-aware" level).

As with all other things in LLVM, values are partitioned by their type,
not by their name. That is, identity is determined by type (structure)
equivalence. In order to distinguish functions we must type them. A
byproduct of this is that we get overloading for free, but it's not the
main concern.

    Chances are quite good that the sets of arguments to two different
    overloads of a function would wind up mapping down to the same
    LLVM-level types (since type information is partially lost in the
    transition). For example, the types for polar coordinates and
    cartesian coordinates are both {float,float}. So most languages
    will need to mangle symbols anyways.

Remember that LLVM is "low level". How a higher-level language decides
to deal with ambiguity in its runtime library is up to it. LLVM just
provides the capability to express what the higher-level language needs.
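
To make the polar/cartesian point concrete, here is a minimal C++ sketch (not LLVM code). It assumes a hypothetical front end that lowers both source types to a single structural shape, called FloatPair here, and that distinguishes the two overloads by mangled names. All names below are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Two source-level types with identical layout but different names.
struct Cartesian { float x; float y; };
struct Polar     { float r; float theta; };

// The source language tells them apart by name, not by shape.
static_assert(sizeof(Cartesian) == sizeof(Polar), "same layout");
static_assert(!std::is_same<Cartesian, Polar>::value, "distinct named types");

// Source-level overloads: resolved by the named parameter type.
float magnitude(Cartesian c) { return std::sqrt(c.x * c.x + c.y * c.y); }
float magnitude(Polar p)     { return std::fabs(p.r); }

// What a hypothetical front end might emit for a structurally typed IR:
// both parameters collapse to the same {float, float} shape, so the
// overloads can only be kept apart by mangling the symbol names.
struct FloatPair { float a; float b; };

float magnitude_Cartesian(FloatPair v) { return std::sqrt(v.a * v.a + v.b * v.b); }
float magnitude_Polar(FloatPair v)     { return std::fabs(v.a); }

// Small self-check: the lowered functions agree with the overloads.
void check_lowering_examples() {
    assert(magnitude(Cartesian{3.0f, 4.0f}) == magnitude_Cartesian(FloatPair{3.0f, 4.0f}));
    assert(magnitude(Polar{3.0f, 4.0f}) == magnitude_Polar(FloatPair{3.0f, 4.0f}));
}
```

Once both parameters have the same shape, the type signature alone no longer separates the two functions, which is why the front end ends up mangling names regardless.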

    Actually, having an equal-by-name (in addition to the
    equal-by-structure array/struct type operators) type in LLVM would
    let compilers encode the distinction. But it might complicate
    linking conventions.

We've thought about this and it gets debated from time to time. I think
I'll let Chris answer it, however.

Thanks! Overall I was hugely impressed. I'd had my attention focused
on architecture-independent code generation libraries for a while, but
being able to specify an IR outside of the language used to manipulate
it opens up way more possibilities. Neat stuff.

Yup! Glad you like it :)

Reid.

Hey, I'm curious about two of the design questions in LLVM after
reading the language reference. If there's preexisting material
explaining this, just point me at it (looked and couldn't find any).

No problem. :)

- Why unsigned types rather than signed-only like Java/JVM? If I
   understand correctly, their behavior is only distinguishable in
   overflow situations, and the availability of hardware-assisted
   overflow detection varies quite a bit across platforms.

In this case, it's not about overflow detection. Some operators behave differently on signed vs. unsigned data (e.g. division, remainder, <, >, etc). Over time, I would like LLVM to move the signedness distinction from the type system to the operators (e.g. we would only have i1/i8/i16/i32/i64, but would get SMOD vs UMOD).
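
As a rough illustration of that idea, the C++ sketch below models a single signless 32-bit type and puts the signedness in the operation instead. This is not LLVM code: the names i32, sdiv, udiv, smod, umod, slt and ult are hypothetical, loosely following the SMOD/UMOD naming above, and divisors are assumed to be nonzero.

```cpp
#include <cassert>
#include <cstdint>

// One signless 32-bit value; the operation picks the interpretation.
using i32 = std::uint32_t;

// Reinterpret the bits as a two's-complement signed value (portable form).
inline std::int32_t as_signed(i32 x) {
    return x <= 0x7FFFFFFFu
        ? static_cast<std::int32_t>(x)
        : -static_cast<std::int32_t>(~x) - 1;
}

// Signed-to-unsigned conversion is modular, so this keeps the bit pattern.
inline i32 from_signed(std::int32_t x) { return static_cast<i32>(x); }

// Operations that need a signedness: one variant per interpretation.
// Assumes b != 0 (and no INT32_MIN / -1 in the signed variants).
inline i32 sdiv(i32 a, i32 b) { return from_signed(as_signed(a) / as_signed(b)); }
inline i32 udiv(i32 a, i32 b) { return a / b; }
inline i32 smod(i32 a, i32 b) { return from_signed(as_signed(a) % as_signed(b)); }
inline i32 umod(i32 a, i32 b) { return a % b; }
inline bool slt(i32 a, i32 b) { return as_signed(a) < as_signed(b); }
inline bool ult(i32 a, i32 b) { return a < b; }

// Addition yields the same bits either way, so one variant is enough.
inline i32 add(i32 a, i32 b) { return a + b; }

// Small self-check using the bit pattern 0xFFFFFFF8 (-8 or 4294967288).
void check_signedness_examples() {
    const i32 bits = from_signed(-8);
    assert(as_signed(sdiv(bits, 3)) == -2);   // -8 / 3 truncates toward zero
    assert(udiv(bits, 3) == 1431655762u);     // 4294967288 / 3
    assert(slt(bits, 3) && !ult(bits, 3));    // -8 < 3, but 4294967288 > 3
    assert(add(bits, 3) == from_signed(-5));  // identical bits for both views
}
```

The same bit pattern gives different answers for division, remainder and comparison, yet addition needs only one form. That is the sense in which the signed/unsigned split can live in the operators rather than in the types.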

- Why identify functions by their type signatures? I have to assume
   that this is meant to allow overloading, but overloading is
   generally a higher-level concept (at the "language-aware" level).

You're right.

   Chances are quite good that the sets of arguments to two different
   overloads of a function would wind up mapping down to the same
   LLVM-level types (since type information is partially lost in the
   transition). For example, the types for polar coordinates and
   cartesian coordinates are both {float,float}. So most languages
   will need to mangle symbols anyways.

You're right.

   Actually, having an equal-by-name (in addition to the
   equal-by-structure array/struct type operators) type in LLVM would
   let compilers encode the distinction. But it might complicate
   linking conventions.

Yup, you're absolutely right :) This is something that exists due to historical reasons. It is another minor thing that we will be moving away from in time.

Thanks! Overall I was hugely impressed. I'd had my attention focused
on architecture-independent code generation libraries for a while, but
being able to specify an IR outside of the language used to manipulate
it opens up way more possibilities. Neat stuff.

Great! :)

-Chris