More questions on the best way to write a compiler using LLVM:
Let's say I have a struct definition that looks like this:
const int imageSize = 77;
struct A {
    char B[align(imageSize)];
};
...where 'align' is some small inline function that rounds up to a
power of two or something. (A common requirement for textures on 3D
graphics cards.)
Now, clearly the compiler needs to evaluate the array size expression
at compile time, since the size of the array needs to be known up
front. My question is, would it generally be better to generate LLVM
code for the expression and run it in the compiler, or would you
basically have to create an interpreter within your compiler that is
capable of executing your AST?
Generating LLVM code for a constant expression has several
disadvantages, in particular the handling of errors: if evaluating the
constant expression hits a fatal error, you'd prefer not to crash the
compiler. Also, in a cross-compilation environment you'd have to
generate the constant expression for the host platform rather than
for the target platform.
On the other hand, writing an interpreter means duplicating a lot of
the functionality that's already in LLVM. For example, consider just
the problem of float to int conversions:
char B[ (int)3.0 ];
Generating code for this is relatively simple; converting
arbitrary-sized APFloats to arbitrary-sized APInts isn't quite as
easy.
Similarly, the mathematical operators directly supported by APFloat
are only a subset of the math operators supported by the LLVM
instructions.
> More questions on the best way to write a compiler using LLVM:
ok
> Now, clearly the compiler needs to evaluate the array size expression
> at compile time, since the size of the array needs to be known up
> front. My question is, would it generally be better to generate LLVM
> code for the expression and run it in the compiler, or would you
> basically have to create an interpreter within your compiler that is
> capable of executing your AST?
It's really up to you, and it may be up to your language spec. One approach would be to generate a series of llvm::ConstantExpr::get* method calls, which implicitly constant fold expressions where possible. For example, if you ask for "add 3+7" you'll get 10. If you ask for "div 15, 0" you'll get an unfolded constant expr back.
However, if you treat type checking separately from code generation, your language may require that you diagnose these sorts of things, and that would mean that you have to implement the constant folding logic in your frontend. This is what clang does, for example.
If you go this route, you can still use the LLVM APInt and APFloat classes to do these operations, and maintain the correct precision etc.
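As a rough sketch of what that looks like in the C++ API (assuming an LLVM version that still has the arithmetic ConstantExpr::get* methods; recent releases have removed some of them, and the exact signatures vary by version):

  #include "llvm/ADT/APInt.h"
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Type.h"

  void foldExample() {
    llvm::LLVMContext ctx;
    llvm::Type *i32 = llvm::Type::getInt32Ty(ctx);
    llvm::Constant *three = llvm::ConstantInt::get(i32, 3);
    llvm::Constant *seven = llvm::ConstantInt::get(i32, 7);
    // Folds immediately: 'sum' comes back as a ConstantInt with value 10,
    // not as an 'add' constant-expression node.
    llvm::Constant *sum = llvm::ConstantExpr::getAdd(three, seven);

    // Folding in the frontend instead, with APInt, keeps diagnostics
    // under your control and works at whatever bit width you choose.
    llvm::APInt a(32, 3), b(32, 7);
    llvm::APInt folded = a + b; // 10
    (void)sum;
    (void)folded;
  }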
> On the other hand, writing an interpreter means duplicating a lot of
> the functionality that's already in LLVM. For example, consider just
> the problem of float to int conversions:
> char B[ (int)3.0 ];
> Generating code for this is relatively simple; converting
> arbitrary-sized APFloats to arbitrary-sized APInts isn't quite as
> easy.
APFloat::convertToInteger does just this. Why can't you use it?
> Similarly, the mathematical operators directly supported by APFloat
> are only a subset of the math operators supported by the LLVM
> instructions.
Yes, arbitrary math ops are hard if you want to get correctly
rounded results for any rounding mode, which was a goal for APFloat.
But all IEEE754 ops are represented.
One thing that jumped to mind was "Hey, I can write that as a
metafunction!" using C++, so you might be able to find hints in the
way that templates are handled.
On a similar note, there's a paper on Generalized Constant Expressions
( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2235.pdf )
proposed for the next C++ revision, so you might be able to find other
people doing the same thing. (Though IIRC that proposal doesn't allow
recursion or mutation of values, which drastically simplifies things.)
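For comparison, this is roughly how the original array-size example looks once that proposal became constexpr in C++11. The definition of 'align' below is just one possible implementation of the function assumed in the example above, written recursively since early constexpr bodies can't mutate locals:

  // One possible 'align': round n up to the next power of two, evaluated
  // by the compiler when used as an array bound.
  constexpr int align(int n, int p = 1) {
    return p >= n ? p : align(n, p * 2);
  }

  constexpr int imageSize = 77;

  struct A {
    char B[align(imageSize)]; // 128 elements, computed at compile time
  };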
> It's really up to you, and it may be up to your language spec. One approach would be to generate a series of llvm::ConstantExpr::get* method calls, which implicitly constant fold expressions where possible. For example, if you ask for "add 3+7" you'll get 10. If you ask for "div 15, 0" you'll get an unfolded constant expr back.
Thanks, that is exactly what I needed.
> However, if you treat type checking separately from code generation, your language may require that you diagnose these sorts of things, and that would mean that you have to implement the constant folding logic in your frontend. This is what clang does, for example.
Since my AST nodes wrap the LLVM Constant nodes, I have the choice of mapping from my language types to LLVM types or the reverse, when needed. So I can use whatever is appropriate at any given point.
>> On the other hand, writing an interpreter means duplicating a lot of
>> the functionality that's already in LLVM. For example, consider just
>> the problem of float to int conversions:
>> char B[ (int)3.0 ];
>> Generating code for this is relatively simple; converting
>> arbitrary-sized APFloats to arbitrary-sized APInts isn't quite as
>> easy.
> APFloat::convertToInteger does just this. Why can't you use it?
Well, I may be using it wrong. But looking at APFloat.h, I see four functions that purport to convert to integer:
The first three convert to an array of integer parts, which (as far as I can tell) is not easily convertible into an APInt via any public methods I have been able to discover so far.
The last function doesn't appear to convert the APFloat into the nearest integer equivalent, since my experiments with it returned completely unexpected values; I'm assuming that what is returned is an APInt containing the bitwise representation of the floating-point value?
> The first three convert to an array of integer parts, which (as far as I
> can tell) is not easily convertible into an APInt via any public methods
> I have been able to discover so far.
> The last function doesn't appear to convert the APFloat into the nearest
> integer equivalent, since my experiments with it returned completely
> unexpected values; I'm assuming that what is returned is an APInt
> containing the bitwise representation of the floating-point value?
Only two convert to integer. The convertToAPInt is unfortunately
named; I'm not sure what it does but suspect it captures bitpatterns
like you suggest.
convertToInteger is the function I'm responsible for and it does
float->int conversion according to IEEE754. If you want to place
it in an APInt, create your APInt with the appropriate size and
use its buffer as input. APInt has made unfortunate sign choices
though, IIRC it is not sign-extended, so you may need to fudge in
the case of a signed target. This is also why there are two "from"
functions above
// This function creates an APInt that is just a bit map of the floating
// point constant as it would appear in memory. It is not a conversion,
// and treating the result as a normal integer is unlikely to be useful.
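A minimal sketch of the usage described above, assuming a reasonably recent LLVM in which convertToInteger has an APSInt overload (the 2007-era interface took a raw integerPart buffer instead, as Neil describes):

  #include "llvm/ADT/APFloat.h"
  #include "llvm/ADT/APSInt.h"

  // Round 'f' toward zero (C-style cast semantics) into a signed integer
  // of 'bits' width.
  llvm::APSInt floatToInt(const llvm::APFloat &f, unsigned bits) {
    llvm::APSInt result(bits, /*isUnsigned=*/false);
    bool isExact = false;
    f.convertToInteger(result, llvm::APFloat::rmTowardZero, &isExact);
    return result;
  }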
>> The first three convert to an array of integer parts, which (as far as I can tell) is not easily convertible into an APInt via any public methods I have been able to discover so far.
>> The last function doesn't appear to convert the APFloat into the nearest integer equivalent, since my experiments with it returned completely unexpected values; I'm assuming that what is returned is an APInt containing the bitwise representation of the floating-point value?
> Only two convert to integer. The convertToAPInt is unfortunately
> named; I'm not sure what it does but suspect it captures bitpatterns
> like you suggest.
> convertToInteger is the function I'm responsible for and it does
> float->int conversion according to IEEE754. If you want to place
> it in an APInt, create your APInt with the appropriate size and
> use its buffer as input. APInt has made unfortunate sign choices
> though, IIRC it is not sign-extended, so you may need to fudge in
> the case of a signed target. This is also why there are two "from"
> functions above
OK here's a follow-up question: So far the ConstantExpr class has been doing what I want pretty well. However, I'd like to be able to detect overflow / loss of precision when casting constant values so that I can issue the appropriate warning. Since the values are constant, I ought to be able to tell whether or not they can "fit" in the destination type without loss. For ints, this is easy enough using getActiveBits(). For floats, I guess I would need to know whether the fractional part of the number is zero or not, and I'd need floating-point equivalents to the various integer min and max values, so that I could compare them with the APFloat.
What would be a good technique for accomplishing this?
>> Only two convert to integer. The convertToAPInt is unfortunately
>> named; I'm not sure what it does but suspect it captures bitpatterns
>> like you suggest.
>>
>> convertToInteger is the function I'm responsible for and it does
>> float->int conversion according to IEEE754. If you want to place
>> it in an APInt, create your APInt with the appropriate size and
>> use its buffer as input. APInt has made unfortunate sign choices
>> though, IIRC it is not sign-extended, so you may need to fudge in
>> the case of a signed target. This is also why there are two "from"
>> functions above
>>
> OK here's a follow-up question: So far the ConstantExpr class has been
> doing what I want pretty well. However, I'd like to be able to detect
> overflow / loss of precision when casting constant values so that I can
> issue the appropriate warning. Since the values are constant, I ought to
> be able to tell whether or not they can "fit" in the destination type
> without loss. For ints, this is easy enough using getActiveBits(). For
> floats, I guess I would need to know whether the fractional part of the
> number is zero or not, and I'd need floating-point equivalents to the
> various integer min and max values, so that I could compare them with
> the APFloat.
> What would be a good technique for accomplishing this?
The APFloat functionality I wrote is the only means I'm aware of
in LLVM, via the return value. APFloat is derived from C code I
wrote for my own C compiler front end, which catches and diagnoses
these constant-folding issues. That front end has a different
implementation of APInt, in C, which additionally provides this
information for integer operations, just as APFloat does for
floating-point operations, making this kind of analysis easy. The
static methods of APInt are essentially part of that.
If LLVM's APFloat wrapper isn't conveying the info it is given
by APFloat, it needs to be improved.
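A sketch of how that return-value check might look for the float-to-int case, under the same assumptions as the earlier snippet (the APSInt overload of convertToInteger and the APFloat::opStatus values, whose exact spelling depends on the LLVM version):

  #include "llvm/ADT/APFloat.h"
  #include "llvm/ADT/APSInt.h"

  // Would casting the constant 'f' to a signed integer of 'bits' width
  // lose information? Relies on the status that convertToInteger reports.
  bool castLosesInfo(const llvm::APFloat &f, unsigned bits) {
    llvm::APSInt ignored(bits, /*isUnsigned=*/false);
    bool isExact = false;
    llvm::APFloat::opStatus status =
        f.convertToInteger(ignored, llvm::APFloat::rmTowardZero, &isExact);
    // opInexact means a fractional part was discarded; opInvalidOp means
    // the value (or a NaN) doesn't fit in 'bits' at all.
    return status != llvm::APFloat::opOK || !isExact;
  }

For the integer side, getActiveBits() as you mention (or APInt's isIntN()/isSignedIntN()) answers the same question.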