Moving towards a singular pointer type

Hi there,

Sorry, I don't have the thread history to reply to since I normally
read llvmdev through the archives, but wanted to give my .02 of
feedback anyway.

As far as I understand, this change is wanted because the LLVM
infrastructure derives no value from knowing the types, and there's a
cost in terms of code spent to support all of it. I've been creating a
frontend that mostly writes out plain-text IR.

Is the type-checking of pointer types still done when constructing IR
through builder APIs? Otherwise this makes debugging significantly
harder for me. I've also really liked how readable LLVM IR is, and it
would seem like this change would negatively affect the readability.

I have no qualms with not propagating any of the pointer type data to
lower-level LLVM APIs, and from the discussion it seems like I'll
still be allowed to have the types in there for backwards
compatibility, but I wonder if it wouldn't be worthwhile to also keep
some of the type-checking in the verification code, and only ditch all
the type information (or, just not consider it) after that part.

Cheers,

Dirkjan

Hi,

Dirkjan Ochtman <dirkjan <at> ochtman.nl> writes:

> As far as I understand, this change is wanted because the LLVM
> infrastructure derives no value from knowing the types, and there's a
> cost in terms of code spent to support all of it. I've been creating a
> frontend that mostly writes out plain-text IR.
>
> Is the type-checking of pointer types still done when constructing IR
> through builder APIs? Otherwise this makes debugging significantly
> harder for me. I've also really liked how readable LLVM IR is, and it
> would seem like this change would negatively affect the readability.

I kind of agree with Dirkjan. We use a similar approach (Numba calls into
llvmlite to generate textual LLVM IR), and the type checking can come in
handy to avoid later crashes (otherwise it's very easy to mess up a
getelementptr instruction).

However, llvmlite does track types on its own, so we could also add our
own error checking before generating the IR.
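
A minimal sketch of what such frontend-side checking could look like. This
is not llvmlite's API: IntType, PointerType and emit_load are hypothetical
names made up for illustration, and the only assumption is that the
frontend already knows the type of every value it names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Hypothetical frontend-side integer type."""
    bits: int

    def __str__(self):
        return f"i{self.bits}"


@dataclass(frozen=True)
class PointerType:
    """Hypothetical frontend-side pointer type that remembers its pointee."""
    pointee: object

    def __str__(self):
        return f"{self.pointee}*"


def emit_load(result, result_type, ptr, ptr_type):
    """Return one line of textual IR, checking types in the frontend first."""
    if not isinstance(ptr_type, PointerType):
        raise TypeError(f"cannot load from non-pointer type {ptr_type}")
    if ptr_type.pointee != result_type:
        raise TypeError(f"load of {result_type} through {ptr_type}")
    return f"%{result} = load {result_type}, {ptr_type} %{ptr}"


i32 = IntType(32)
print(emit_load("v", i32, "p", PointerType(i32)))  # %v = load i32, i32* %p
```

The mismatch is reported when the frontend builds the instruction, so it
does not depend on the IR parser or verifier knowing pointee types.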

Regards

Antoine.

> Hi,
>
> Dirkjan Ochtman <dirkjan <at> ochtman.nl> writes:
> >
> > As far as I understand, this change is wanted because the LLVM
> > infrastructure derives no value from knowing the types, and there's a
> > cost in terms of code spent to support all of it. I've been creating a
> > frontend that mostly writes out plain-text IR.
> >
> > Is the type-checking of pointer types still done when constructing IR
> > through builder APIs? Otherwise this makes debugging significantly
> > harder for me. I've also really liked how readable LLVM IR is, and it
> > would seem like this change would negatively affect the readability.
>
> I kind of agree with Dirkjan. We use a similar approach (Numba calls into
> llvmlite to generate textual LLVM IR), and the type checking can come in
> handy to avoid later crashes (otherwise it's very easy to mess up a
> getelementptr instruction).
>
> However, llvmlite does track types on its own, so we could also add our
> own error checking before generating the IR.

This change would result in decreased readability and a higher chance of
introducing bugs.

I don't think the builder will necessarily be able to do all the required
type-checking for free. Some of the information won't be available even at
builder time. The most obvious example is a parameter register, which would
have no type information to check against when performing a load. In other
cases (like a load from an alloca) we could probably have some special
cases that consult the type information in the source instruction and check
it against the explicit type in the load. Of course, then we have the
problem of how to whitelist the cases where you really intended to convert.
For those we could have a builder function that tracks that information in
a side table or a synthetic instruction, perhaps even a real no-op
instruction we can emit in debug builds just to make the IR more
readable/verifiable.

But that's not high on my priority list unless there's a pretty strong
desire for it - these are just ideas of how it could be addressed.
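
A rough sketch of the side-table idea, written as a toy Python builder
rather than against the real IRBuilder. CheckedBuilder and its methods
(alloca, allow, load) are hypothetical names, and types are plain strings
to keep it short. Pointers with no entry in the side table, such as
parameters, are simply not checked.

```python
class CheckedBuilder:
    """Toy builder over a single 'ptr' type with a pointee side table."""

    def __init__(self):
        self.lines = []    # emitted textual IR, one instruction per entry
        self.pointee = {}  # side table: pointer name -> set of allowed types
        self._count = 0

    def _fresh(self):
        self._count += 1
        return f"%t{self._count}"

    def alloca(self, ty):
        name = self._fresh()
        self.lines.append(f"{name} = alloca {ty}")
        self.pointee[name] = {ty}  # the source instruction tells us the type
        return name

    def allow(self, ptr, ty):
        """Whitelist an intended conversion; emits no instruction."""
        self.pointee.setdefault(ptr, set()).add(ty)

    def load(self, ty, ptr):
        allowed = self.pointee.get(ptr)  # None when nothing is known
        if allowed is not None and ty not in allowed:
            raise TypeError(
                f"load of {ty} through {ptr}, known pointee types: {sorted(allowed)}"
            )
        name = self._fresh()
        self.lines.append(f"{name} = load {ty}, ptr {ptr}")
        return name


b = CheckedBuilder()
p = b.alloca("i32")
b.load("i32", p)       # fine: matches the alloca
b.allow(p, "float")    # the conversion is intended, so record it
b.load("float", p)     # fine: whitelisted above
b.load("i64", "%arg")  # unchecked: nothing is known about a parameter
```

Without the allow() call the float load would raise TypeError, which is the
kind of check that would otherwise be lost.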

- David

I kind of agree with that sentiment. I have had various bugs in my projects caught by type checking on pointers in the past, so this change will come at a real cost.

As I understand it, the main advantage of that change is to get rid of various bitcasts. Would it be possible to have a ptr_bitcast flag on memory operations, indicating that the pointer type needs to be ignored?

> I kind of agree with that sentiment. I have had various bugs in my projects
> caught by type checking on pointers in the past, so this change will come at
> a real cost.
>
> As I understand it, the main advantage of that change is to get rid of
> various bitcasts. Would it be possible to have a ptr_bitcast flag on
> memory operations, indicating that the pointer type needs to be ignored?

You mean leave pointer types in the IR but flag operations (like geps and
loads) to indicate that they shouldn't care about the type of their
operands? That'd still not get us the canonicalization benefits (store and
gep would still produce a type that depends on its operand, etc).

I think if we want the extra safety we could, after introducing the ptr
type, add a no-op cast (like bitcast) that just documents the change in
type so that, say, store to load could be verified. This still wouldn't
provide the current convenience of pointee types in the LLVM IR APIs, but
it /could/ be retrieved with some effort (oh, the load instruction says
load an int, the ptr operand is a store instruction of... an int, etc).
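
To make the store-to-load check concrete, here is a small sketch over a
made-up straight-line instruction list. The tuple encoding, the "retype"
opcode (standing in for the no-op documenting cast) and the verify function
are all hypothetical; a real verifier would have to deal with control flow,
aliasing and pointers it knows nothing about.

```python
def verify(instructions):
    """Check each load against the type last documented for its pointer."""
    documented = {}  # pointer name -> type a store or retype recorded for it
    errors = []
    for index, inst in enumerate(instructions):
        op = inst[0]
        if op == "store":
            _, ty, ptr = inst
            documented[ptr] = ty
        elif op == "retype":
            # No-op cast: the new pointer name carries the new pointee type.
            _, new_ptr, _old_ptr, ty = inst
            documented[new_ptr] = ty
        elif op == "load":
            _, ty, ptr = inst
            expected = documented.get(ptr)
            if expected is not None and expected != ty:
                errors.append(
                    f"#{index}: load {ty} from {ptr}, last documented as {expected}"
                )
    return errors


program = [
    ("store", "i32", "%p"),
    ("load", "i32", "%p"),             # matches the store
    ("load", "float", "%p"),           # flagged: no cast documents the change
    ("retype", "%q", "%p", "float"),   # documents the intended change
    ("load", "float", "%q"),           # accepted via the documented cast
]
print(verify(program))
```

Only the undocumented float load through %p is reported; the same load
through %q passes because the cast recorded the intent.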

- David

The problem I have with all of this is that it is putting more semantic
information on the pointee type than is actually there.

At a very fundamental level, memory is *not typed* in LLVM. As such,
casting between pointer types isn't actually a bug in a large number of
real-world cases. By providing a pointee type we encourage frontend authors
to rely on this for their type management and I think that is not a sound
practice in general. It may be convenient but it doesn't bear out in
practice long term.

There are a host of other ways in which the LLVM IR is more suited to what
the optimizer wants than what a frontend might find convenient. This is but
one of them. I don't think that the advantages of convenience afforded to
frontends really justify keeping the substantial cost and complexity in the
LLVM IR and optimizer.

My 2 cents.
-Chandler