In the language I am working on, there are cases where the control flow graph can affect the type of a variable. A simple example is a "nullable" type (a pointer which is allowed to be NULL), which can be converted into a non-nullable type via an if-test. So for example if x is nullable, and I test x != NULL, then inside the conditional block the type of x is no longer nullable. Nullable types behave slightly differently (and produce less efficient code) than non-nullable types. For example, a downcast to a nullable type is a dynamic cast (because if the cast fails, the result can be NULL), whereas a downcast to a non-nullable type throws an exception if the cast fails.
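As a rough C++ analogy (this is just an illustration, not my language's actual syntax):

#include <cstddef>  // NULL

struct Base    { virtual ~Base() {} };
struct Derived : Base {};

void example(Base &b) {
    // Downcast to a "nullable" type: a dynamic cast, because a failed
    // cast can simply yield NULL, which a nullable result permits.
    Derived *d1 = dynamic_cast<Derived *>(&b);
    if (d1 != NULL) {
        // ... inside the test, d1 behaves like a non-nullable value ...
    }

    // Downcast to a "non-nullable" type: the result may never be NULL,
    // so a failed cast has to throw (std::bad_cast) instead.
    Derived &d2 = dynamic_cast<Derived &>(b);
    (void)d2;  // silence unused-variable warnings in this sketch
}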
This kind of analysis is trivial given LLVM's architecture. The problem I have, however, is that all of the high-level type analysis occurs before LLVM ever gets into the picture. LLVM doesn't know anything about the front-end type system.
What I am wondering is, will I have to re-invent the same sort of CFG analysis that LLVM does in my frontend, or is there some shortcut? I guess this is really a general compiler-design question rather than one specific to LLVM, but I thought I would ask anyway.
-- Talin
I'm not completely sure what you're trying to do here, but it seems
like you want some special statement that essentially introduces a new
variable with the non-nullable type for a child block. Something like
ifnonnull P = getPersonForName("Talin") { code that references P },
for a language with C-like syntax.
It doesn't make any sense to do this sort of thing at the LLVM level;
I doubt you really want optimizers or flow-sensitive analysis messing
with the semantics of programs in your language.
-Eli
I have to agree with the earlier poster; I'm not sure altering the type of values is tractable. This seems better as a diagnostic analysis.
A way to leverage the type system here might be with a 'match' statement like ML's. Consider:
type 'a maybe = None | Some of 'a

let maybe_add x y =
  match x, y with
  | Some x, Some y -> Some (x + y)
  | Some n, None | None, Some n -> Some n
  | None, None -> None
Here, expressions like 'Some x' and 'None' are patterns, which match the different variants of the 'int maybe' type. 'Some x' and 'Some y' additionally bind the matched subexpressions to names. In the first pattern, I chose to shadow the nullable parameters with the non-nullable matched subexpressions.
More to your point, however, the analysis you're proposing seems strongly like something that should be done in your language's IR prior to LLVM lowering. Several of LLVM's graph algorithms are templates, so you could specialize a traits type for your compiler's parse tree/AST/IR and potentially leverage some of LLVM's code.
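For example, a GraphTraits specialization for a hypothetical frontend block type might look something like this (MyBlock is made up, and the exact trait members vary between LLVM versions):

#include "llvm/ADT/GraphTraits.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include <vector>

// Hypothetical frontend CFG node -- a stand-in for your compiler's own class.
struct MyBlock {
  std::vector<MyBlock *> Succs;
};

namespace llvm {
template <> struct GraphTraits<MyBlock *> {
  using NodeRef = MyBlock *;
  using ChildIteratorType = std::vector<MyBlock *>::iterator;
  static NodeRef getEntryNode(MyBlock *B) { return B; }
  static ChildIteratorType child_begin(NodeRef N) { return N->Succs.begin(); }
  static ChildIteratorType child_end(NodeRef N) { return N->Succs.end(); }
};
} // namespace llvm

With that in place, generic algorithms such as llvm::depth_first(entryBlock) can walk the frontend graph directly.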
— Gordon
At the LLVM level, this is already what Talin wants to do.
The only difference is that Talin's code would need to rename the
variables inside the 'then' block in the frontend:
if x != NULL then
  x1 = cast-to-non-nullable-type (x)
  ... continue with x1 and know it can't be NULL
endif
OCaml's match is a shorthand for the comparison-plus-cast combination.
IIRC there's an OCaml compiler that uses LLVM; Talin should be able to
reuse the technique employed there.
HTH
Jo

This reminds me of people that want to use the CFG and the optimizer to make:

int i; int f() { if (i) return 1; else return 0; }

not warn/error that flow falls off the end. Extend that out and you have to do arbitrarily hard things and then document the limit of what was done, which is messy and wrong, as the next person will come along and say, "but I want this one obvious little case to work as well."

If you want to go down that path, you can, but we're gonna wave you off. In the end, I suspect you have to replicate all the code in the front end, as it is part of the spec, not an 'optimization'.
I'm not sure what your point is here, Mike. That is clearly an important thing to avoid warning for. You can argue about whether the optimizer or front-end should be generating the warning, but any compiler that emits a false positive for that would be pretty useless in practice.
-Chris
> In the language I am working on, there are cases where the control flow
> graph can affect the type of a variable.

Ok.

> A simple example is a
> "nullable" type (a pointer which is allowed to be NULL), which can be
> converted into a non-nullable type via an if-test. So for example if x
> is nullable, and I test x != NULL, then inside the conditional block the
> type of x is no longer nullable.

Makes sense.

> Nullable types behave slightly
> differently (and produce less efficient code) than non-nullable types.
> For example, a downcast to a nullable type is a dynamic cast (because if
> the cast fails, the result can be NULL), whereas a downcast to a
> non-nullable type throws an exception if the cast fails.

Sure.

> This kind of analysis is trivial given LLVM's architecture. The problem
> I have, however, is that all of the high-level type analysis occurs
> before LLVM ever gets into the picture. LLVM doesn't know anything about
> the front-end type system.
Why not codegen this logically as a pair of <bool, T> where each is an LLVM Value*? This would allow the optimizer to propagate around and eliminate the bool.
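E.g. something morally equivalent to this C++ (names invented for illustration; at the LLVM level the two fields would just be two separate SSA values):

#include <stdexcept>

// A nullable value lowered to an explicit (flag, value) pair.
template <typename T>
struct Nullable {
  bool Present;  // the i1 "is non-null" flag
  T Value;       // only meaningful when Present is true
};

int length(Nullable<const char *> S) {
  if (!S.Present)  // the explicit NULL test
    throw std::runtime_error("null pointer");
  // On this path the optimizer can see Present is true, so any later
  // checks of the same flag fold away.
  int N = 0;
  while (S.Value[N] != '\0')
    ++N;
  return N;
}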
> What I am wondering is, will I have to re-invent the same sort of CFG
> analysis that LLVM does in my frontend, or is there some shortcut? I
> guess this is really a general compiler-design question rather than one
> specific to LLVM, but I thought I would ask anyway.
This issue is fairly straightforward, I think. More general problems are things like auto-unboxing of types and doing type inference of source-level languages. This can also be done pretty easily in the LLVM IR; if you have questions about that I can explain.
-Chris
Thanks (everyone) for the suggestions, they were helpful.
What I decided to do was to essentially "wrap" LLVM's BasicBlock structure with my own block class that contains additional high-level information. Up to this point I was generating LLVM basic blocks on the fly as I walked the statement tree; now I build my own BB graph, and then use that to generate the BB graph for LLVM.

This allows me to do a couple of things. For one thing, it's a lot easier to traverse the BB graph than the statement tree, and it's more amenable to manipulation (especially when taking break/continue and other jumps into account). For another, implementing a Python-style "yield" statement (which is a transformation of the graph) will be easier with this data structure.

In the case of the nullable types, I don't need to store full use/def chains, since all I want to know is "given variable X, is there any chance that X could be null at this point?" If there are multiple definitions of X reaching a particular use of X, then the value could be null if any of the reaching definitions could be null. (In the face of incomplete knowledge, assume the worst.) So in other words, the "nullable" bits are merely or'd together. I don't need to dive deep into subroutines or expressions, since "nullable" is propagated upward by the type system and frozen at call boundaries. Yes, there will be some false positives, but a few extra tests for pointer validity aren't going to hurt all that much.

So essentially I just have to store, at the beginning of each BB, the set of variables which could be null, and do the typical intra-block analysis.
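In rough C++ terms, the propagation looks something like this (just a sketch, not my actual code; it assumes variables are numbered so that each block's "maybe null" set fits in a bitmask):

#include <cstdint>
#include <vector>

struct Block {
  std::vector<int> Preds;   // indices of predecessor blocks
  uint64_t GenNull = 0;     // vars assigned a possibly-null value here
  uint64_t KillNull = 0;    // vars proven non-null here (e.g. by an if-test)
  uint64_t NullableIn = 0;  // vars that may be null on entry
  uint64_t NullableOut = 0; // vars that may be null on exit
};

// Iterate to a fixed point. At a join, a variable may be null if it may
// be null along *any* incoming edge, so the incoming sets are just or'd.
void propagateNullable(std::vector<Block> &Blocks) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (Block &B : Blocks) {
      uint64_t In = 0;
      for (int P : B.Preds)
        In |= Blocks[P].NullableOut; // assume the worst at merges
      uint64_t Out = (In & ~B.KillNull) | B.GenNull;
      if (In != B.NullableIn || Out != B.NullableOut) {
        B.NullableIn = In;
        B.NullableOut = Out;
        Changed = true;
      }
    }
  }
}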
The overall motivation for this, BTW, is my attempt to prove a personal theory, which is that most static type systems are guarding against the wrong things. Type mismatch errors don't keep programmers up at night; null pointers, deadlocks and race conditions do. So my language tries to check for null pointer errors at compile time using static types.
-- Talin
Chris Lattner wrote:
>> In the language I am working on, there are cases where the control flow
>> graph can affect the type of a variable.
>
> This reminds me of people that want to use the CFG and the optimizer to
> make:
>
> int i; int f() { if (i) return 1; else return 0; }
>
> not warn/error that flow falls off the end.
Java does exactly this.
> Extend that out and you have to do arbitrarily hard things and then
> document the limit of what was done, which is messy and wrong
Correct in general, but not an issue if you draw a clear line.
In Java's case, the line is that the inference uses just structural
control flow information.
This is clearly spelled out in the language reference.
> [...] any
> compiler that emits a false positive for that would be pretty useless
> in practice.
I'd agree that such a language is more restrictive than necessary, but I
wouldn't say it is useless.
You can always rewrite examples like the above as
int result;
if (i) result = 1; else result = 0;
return result;
If the resulting ugliness is a factor, the computation can be moved to a
function and we get
return int_to_01 (i)
which even gives us an opportunity to name that piece of code.
(Returns that are buried in multiple levels of nesting aren't too
beautiful either...)
Regards,
Jo
Re inference for null pointers, Haskell offers very similar facilities
with its type inference and the Maybe type.
(I mention Haskell because its syntax and type system are particularly
simple; most other statically-typed functional programming languages
essentially do the same thing.)
I'm not sure about deadlocks and race conditions, but I heard the
Haskell community has been working on idioms for these as well.
Hope this helps,
Jo
Talin wrote:
> The overall motivation for this, BTW, is my attempt to prove a personal theory, which is that most static type systems are guarding against the wrong things. Type mismatch errors don't keep programmers up at night; null pointers, deadlocks and race conditions do. So my language tries to check for null pointer errors at compile time using static types.
In addition to Haskell that Joachim suggested, you might also want to
look at predicate subtypes in the PVS theorem prover for inspiration.
The reason type-mismatch errors don't keep programmers up at night is
that the compiler catches them statically. In the old days, when
compilers didn't catch all these errors (I think of using pointer
variables in PL/1), programmers did stay up all night over type mismatches.
So it isn't that the static type systems are guarding against the wrong
errors -- it's that they aren't guarding against enough of them.
-- hendrik