libc dependencies, code generation questions

Hello,

I'm looking into creating an llvm backend for the Free Pascal Compiler (<http://www.freepascal.org>). After reading a bit through the documentation and looking at some code generated by llvm-gcc, I have a couple of questions:

1) is there a way to specify ranges in the switch statement? Pascal supports switch statements (called "case" statements there) which look like this:

case <expr> of
   1..1000000: dothis;
   1000001..1000000000: dothat;
end;

Generating a switch statement with 10^9 individual entries is not really feasible in practice. We can of course map all "large" ranges in case statements into equivalent if-statements, but that largely defeats the elegance and ease of use of the switch statement for us :-)

2) I assume llvm sometimes adds implicit calls to functions in the C library, e.g. for llvm.malloc, llvm.free, some floating point routines and some others. Is there a policy regarding which llvm opcodes may result in C library dependencies and which not? The reason I ask is that we try to only depend on stable system interfaces (in the sense of interfaces which are the most unlikely to break backwards binary compatibility), and on a number of OSes (such as Linux) this means using system calls rather than libc.

We have our own alternate implementations of all the functionality expressed by the "high level" llvm opcodes, but I don't know if there is a mechanism available to redirect these from their (presumed) standard libc dependencies to our own routines.

3) we support inline assembler in the same way that Turbo Pascal and Delphi did: you just type in code without telling the compiler what registers or memory locations this routine clobbers, and the compiler thus cannot make any assumptions about them (other than what the ABI/calling convention specifies). As far as llvm is concerned, they should be semantically equivalent to calling an external routine which was not compiled to llvm ir. Is there a generic way to tell this to llvm, or should one simply specify all volatile registers as read and clobbered, and the same for memory?

4) to what extent is the front end (i.e., our compiler) responsible for code selection and optimization? In other words, should we spend a lot of time on converting if-statements to select-based predicates and things like this, or will this be done by llvm afterwards anyway? What about vectorization? Are there particular kinds of optimizations which llvm will probably never be very good at (or which are not llvm's focus in the near to middle term), and which thus should definitely be done at a higher level?

Thanks,

Jonas

Hi Jonas,

> I'm looking into creating an llvm backend for the Free Pascal
> Compiler (<http://www.freepascal.org>). After reading a bit through
> the documentation and looking at some code generated by llvm-gcc, I
> have a couple of questions:

I've been working on getting the gcc Ada front-end to work in llvm-gcc.
Since Ada evolved from Pascal, it may be worth your while to see how
llvm-gcc handles these kinds of issues (most of the work is done in
llvm-convert.cpp).

> 1) is there a way to specify ranges in the switch statement? Pascal
> supports switch statements (called "case" statements there) which
> look like this:
>
> case <expr> of
>    1..1000000: dothis;
>    1000001..1000000000: dothat;
> end;
>
> Generating a switch statement with 10^9 individual entries is not
> really feasible in practice. We can of course map all "large" ranges
> in case statements into equivalent if-statements, but that largely
> defeats the elegance and ease of use of the switch statement for us :-)

Currently llvm-gcc maps large ranges into if-statements, see
TreeToLLVM::EmitSWITCH_EXPR. Adding support for ranges to LLVM
is bug 1255. It will doubtless happen because it doesn't seem
hard to do and in fact I understand that the switch lowering code
generates such ranges internally anyway.
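Until switch ranges exist in LLVM, the if-statement lowering can at least keep each range down to a single comparison. A minimal C sketch, assuming 32-bit signed case expressions; `lower_case`, `in_range` and the `enum action` values are hypothetical names standing in for the two arms of the Pascal example, not code from llvm-gcc:

```c
#include <stdint.h>

/* Hypothetical actions for the two arms of the Pascal case statement. */
enum action { ACT_NONE, ACT_DOTHIS, ACT_DOTHAT };

/* lo <= x && x <= hi as one unsigned compare (requires lo <= hi).
   If x < lo, x - lo wraps around to a value larger than hi - lo. */
static int in_range(int32_t x, int32_t lo, int32_t hi)
{
    return (uint32_t)x - (uint32_t)lo <= (uint32_t)hi - (uint32_t)lo;
}

/* Lowering of:
     case x of
        1..1000000:          dothis;
        1000001..1000000000: dothat;
     end;                                                   */
enum action lower_case(int32_t x)
{
    if (in_range(x, 1, 1000000))
        return ACT_DOTHIS;
    if (in_range(x, 1000001, 1000000000))
        return ACT_DOTHAT;
    return ACT_NONE;  /* no label matched */
}
```

Subtracting the lower bound in unsigned arithmetic makes values below the range wrap around to large numbers, so `lo <= x && x <= hi` collapses into one compare, no matter how many values the range contains.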

> 2) I assume llvm sometimes adds implicit calls to functions in the C
> library, e.g. for llvm.malloc, llvm.free, some floating point
> routines and some others. Is there a policy regarding which llvm
> opcodes may result in C library dependencies and which not? The
> reason I ask is that we try to only depend on stable system
> interfaces (in the sense of interfaces which are the most unlikely to
> break backwards binary compatibility), and on a number of OSes (such
> as Linux) this means using system calls rather than libc.
>
> We have our own alternate implementations of all the functionality
> expressed by the "high level" llvm opcodes, but I don't know if there
> is a mechanism available to redirect these from their (presumed)
> standard libc dependencies to our own routines.

I don't think LLVM spontaneously creates calls to these - it just knows how
to optimize them if the calls were already in the IR fed to it.

> 3) we support inline assembler in the same way that Turbo Pascal and
> Delphi did: you just type in code without telling the compiler what
> registers or memory locations this routine clobbers, and the compiler
> thus cannot make any assumptions about them (other than what the ABI/
> calling convention specifies). As far as llvm is concerned, they
> should be semantically equivalent to calling an external routine
> which was not compiled to llvm ir. Is there a generic way to tell
> this to llvm, or should one simply specify all volatile registers as
> read and clobbered, and the same for memory?

I think you should just specify that everything is clobbered in the
inline asm you generate in the LLVM IR.
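In GCC-style syntax, which is what LLVM's inline asm models, "everything is clobbered" might look like the C sketch below. The clobber list is an assumption for x86-64 System V (the caller-saved general-purpose and SSE registers, the flags, and memory) and is not exhaustive; the empty template stands in for the user's Turbo Pascal-style asm block, and `opaque_asm_block` and `sum_around_asm` are hypothetical names. When clang compiles such a statement, each clobber becomes a `~{...}` entry in the constraint string of the LLVM IR `asm` call.

```c
#include <stdint.h>

/* Hypothetical stand-in for a Turbo Pascal-style asm block. The empty
   template is a placeholder for the user's instructions. */
static inline void opaque_asm_block(void)
{
#if defined(__x86_64__)
    /* Assumed clobber set for x86-64 System V: all caller-saved
       general-purpose and SSE registers, the flags, and memory. */
    __asm__ __volatile__(""
                         : /* no outputs */
                         : /* no inputs */
                         : "rax", "rcx", "rdx", "rsi", "rdi",
                           "r8", "r9", "r10", "r11",
                           "xmm0", "xmm1", "xmm2", "xmm3",
                           "xmm4", "xmm5", "xmm6", "xmm7",
                           "xmm8", "xmm9", "xmm10", "xmm11",
                           "xmm12", "xmm13", "xmm14", "xmm15",
                           "cc", "memory");
#else
    /* Other targets: at minimum, say that memory may have changed. */
    __asm__ __volatile__("" : : : "memory");
#endif
}

/* Sums buf[] twice, with the opaque block in between. */
int32_t sum_around_asm(const int32_t *buf, int n)
{
    int32_t s = 0;
    for (int i = 0; i < n; i++)
        s += buf[i];
    opaque_asm_block();  /* buf[] must be reloaded after this point */
    for (int i = 0; i < n; i++)
        s += buf[i];
    return s;
}
```

Because of the "memory" clobber, the compiler cannot reuse values of `buf[i]` loaded before the asm statement and has to reload them afterwards, just as it would after a call to an unknown external routine.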

> 4) to what extent is the front end (i.e., our compiler) responsible
> for code selection and optimization? In other words, should we spend
> a lot of time on converting if-statements to select-based predicates
> and things like this, or will this be done by llvm afterwards anyway?
> What about vectorization? Are there particular kinds of optimizations
> which llvm will probably never be very good at (or which are not
> llvm's focus in the near to middle term), and which thus should
> definitely be done at a higher level?

In llvm-gcc, front-ends do very little optimization. Constant folding
occurs (gcc does this automagically as you create your gcc trees) and
some common idioms are recognized and output as something well adapted
to LLVM optimization. But basically all the optimization is left to
LLVM. LLVM can certainly turn if statements into switches.
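For instance, a front end can emit a plain chain of equality tests like `classify_if` below and rely on the optimizer to turn it into the equivalent of `classify_switch`, rather than building the switch itself. Both functions are hypothetical C illustrations; whether the rewrite actually happens depends on which optimization passes are run.

```c
/* What a simple front end might emit: one comparison per branch. */
int classify_if(int c)
{
    if (c == 1)
        return 10;
    else if (c == 2)
        return 20;
    else if (c == 3)
        return 30;
    return 0;
}

/* The equivalent switch form the optimizer can produce from it. */
int classify_switch(int c)
{
    switch (c) {
    case 1:  return 10;
    case 2:  return 20;
    case 3:  return 30;
    default: return 0;
    }
}
```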

Ciao,

Duncan.

Hi Jonas. I’m very interested in an llvm backend for freepascal. Could you give some more details?
Is there already something to try and test? Will the code be available on svn?

Thanks, Nicola

> Hi Jonas. I'm very interested in an llvm backend for freepascal. Could you
> give some more details?

I've only started some preliminary work; there isn't much to speak of yet.

> Is there already something to try and test?

No, not even by a long shot.

> Will the code be available on svn?

Yes.

Jonas

> I've been working on getting the gcc Ada front-end to work in llvm-gcc.
> Since Ada evolved from Pascal, it may be worth your while to see how
> llvm-gcc handles these kinds of issues (most of the work is done in
> llvm-convert.cpp).

Thanks, I will certainly do so.

>> [llvm helpers such as malloc, free, memcpy, memset, fp helpers, ... ->
>> taken from the standard C library?]

> I don't think LLVM spontaneously creates calls to these - it just knows how
> to optimize them if the calls were already in the IR fed to it.

I don't really understand how it could do that. Does it simply look for declarations of functions with particular signatures and insert calls to them?

>> 4) to what extent is the front end (i.e., our compiler) responsible
>> for code selection and optimization? In other words, should we spend
>> a lot of time on converting if-statements to select-based predicates
>> and things like this, or will this be done by llvm afterwards anyway?
>> What about vectorization? Are there particular kinds of optimizations
>> which llvm will probably never be very good at (or which are not
>> llvm's focus in the near to middle term), and which thus should
>> definitely be done at a higher level?

> In llvm-gcc, front-ends do very little optimization.

I actually did see gcc doing quite a bit of stuff when turning on optimizations and specifying -emit-llvm. At least it moved things from memory to (virtual) registers and factored common parts out of switch statements.

Thanks for your answer,

Jonas

Hi Jonas,

>> [llvm helpers such as malloc, free, memcpy, memset, fp helpers, ... ->
>> taken from the standard C library?]
>
> I don't think LLVM spontaneously creates calls to these - it just knows how
> to optimize them if the calls were already in the IR fed to it.

> I don't really understand how it could do that. Does it simply look
> for declarations of functions with particular signatures and insert
> calls to them?

It looks for functions with a certain name and signature, and supposes
that these are the standard functions with those names.
The simplify-libcalls pass does it.
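To make the idea concrete, here is a toy C model of "name plus signature" matching. It is not LLVM's code: the `struct decl` type, the `enum ty` type codes, `is_known_libcall` and the three-entry table are hypothetical simplifications. It also shows why a non-C front end may not want the pass: a Pascal routine that happens to be called `sqrt` and has the C signature would be treated as the C library function.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, much simplified type codes for IR-level types. */
enum ty { TY_VOID, TY_INT, TY_SIZE, TY_PTR, TY_DOUBLE };

/* Hypothetical view of a function declaration found in the IR. */
struct decl {
    const char *name;
    enum ty ret;
    int nparams;
    enum ty params[3];
};

/* Library functions the toy "pass" knows about, with their signatures. */
static const struct decl known[] = {
    { "strlen", TY_SIZE,   1, { TY_PTR } },
    { "memcpy", TY_PTR,    3, { TY_PTR, TY_PTR, TY_SIZE } },
    { "sqrt",   TY_DOUBLE, 1, { TY_DOUBLE } },
};

/* True if d would be treated as the standard C function of that name:
   the name must match and so must the full signature. */
bool is_known_libcall(const struct decl *d)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        const struct decl *k = &known[i];
        if (strcmp(d->name, k->name) != 0)
            continue;
        if (d->ret != k->ret || d->nparams != k->nparams)
            return false;
        for (int j = 0; j < k->nparams; j++)
            if (d->params[j] != k->params[j])
                return false;
        return true;
    }
    return false;
}
```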

>> 4) to what extent is the front end (i.e., our compiler) responsible
>> for code selection and optimization? In other words, should we spend
>> a lot of time on converting if-statements to select-based predicates
>> and things like this, or will this be done by llvm afterwards anyway?
>> What about vectorization? Are there particular kinds of optimizations
>> which llvm will probably never be very good at (or which are not
>> llvm's focus in the near to middle term), and which thus should
>> definitely be done at a higher level?
>
> In llvm-gcc, front-ends do very little optimization.

> I actually did see gcc doing quite a bit of stuff when turning on
> optimizations and specifying -emit-llvm. At least it moved things
> from memory to (virtual) registers and factored common parts out of
> switch statements.

It is LLVM that is doing these optimizations. The LLVM optimizers
are run before the LLVM IR is output.

Ciao,

Duncan.

Duncan Sands wrote:-

> It looks for functions with a certain name and signature, and supposes
> that these are the standard functions with those names.

The way you write it makes this sound dubious. The way the standard
is written, particularly w.r.t. external identifiers, makes this
quite legitimate.

Neil.

Neil Booth wrote:

> Duncan Sands wrote:-
>
>> It looks for functions with a certain name and signature, and supposes
>> that these are the standard functions with those names.
>
> The way you write it makes this sound dubious. The way the standard
> is written, particularly w.r.t. external identifiers, makes this
> quite legitimate.

Assuming you're talking about the C standard, perhaps. Other languages
won't necessarily have the same rules. For that matter, even C specifies
hosted vs. free-standing. In free-standing C the library chapters of the
standard don't apply as the user is left to define their own functions.
This is popular among embedded system developers.

The LLVM pass really exists solely to optimize C code, which is fine, as
only the C frontend needs to include it in its sequence of passes (and,
for that matter, exclude it when passed the '-ffreestanding' option).

Nick Lewycky

This is the simplify-libcalls pass, which is turned off when you pass -fno-builtin. If a Pascal compiler (or anything else) doesn't want those optimizations, it should not run that pass.

-Chris

> I'm looking into creating an llvm backend for the Free Pascal
> Compiler (<http://www.freepascal.org>). After reading a bit through
> the documentation and looking at some code generated by llvm-gcc, I
> have a couple of questions:

Cool. I'll add a couple of comments; others have covered most of your questions.

> 2) I assume llvm sometimes adds implicit calls to functions in the C

LLVM doesn't do this unless you ask for it. However, it does produce calls to functions in libgcc. For example, if your CPU doesn't support 64-bit division, the code generator emits a call to __udivdi3. At some point, LLVM may have its own codegen library, but none is planned and libgcc works fine.
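A runtime that wants to avoid libgcc would therefore have to supply such helpers itself. A minimal C sketch of what a 64-bit unsigned division helper computes, using plain shift-and-subtract (restoring) division, i.e. only operations a 32-bit target can do without a hardware 64-bit divide; `my_udivdi3` is a hypothetical name chosen so it does not clash with the real libgcc symbol, and this is not libgcc's actual implementation:

```c
#include <stdint.h>

/* Hypothetical stand-in for libgcc's __udivdi3: returns n / d.
   Restoring division, one quotient bit per iteration.
   Division by zero is left undefined, as in C. */
uint64_t my_udivdi3(uint64_t n, uint64_t d)
{
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  /* bring down the next bit of n */
        if (r >= d) {                   /* divisor fits: subtract it */
            r -= d;
            q |= (uint64_t)1 << i;      /* and set this quotient bit */
        }
    }
    return q;
}
```

To actually replace libgcc, the routine would have to be exported under the name the code generator calls, with the matching signature.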

> 3) we support inline assembler in the same way that Turbo Pascal and
> Delphi did: you just type in code without telling the compiler what
> registers or memory locations this routine clobbers, and the compiler
> thus cannot make any assumptions about them (other than what the ABI/
> calling convention specifies). As far as llvm is concerned, they
> should be semantically equivalent to calling an external routine
> which was not compiled to llvm ir. Is there a generic way to tell
> this to llvm, or should one simply specify all volatile registers as
> read and clobbered, and the same for memory?

No, this is purely a front-end issue. LLVM supports the semantic equivalent of GCC-style inline assembly, where you have to tell the code generator about register constraints etc.

If your front-end supports a different dialect of inline asm, it has to be smart enough to map it onto the LLVM style. In particular, llvm-gcc contains code that maps Microsoft-style inline asm (which also doesn't have explicit register constraints) to the LLVM style. This is enabled with -fasm-blocks.

> 4) to what extent is the front end (i.e., our compiler) responsible
> for code selection and optimization? In other words, should we spend
> a lot of time on converting if-statements to select-based predicates
> and things like this, or will this be done by llvm afterwards anyway?

You shouldn't have to do any of this stuff.

> What about vectorization?

LLVM doesn't currently support autovectorization, but it would make a lot more sense to implement this in LLVM than in your frontend.

> Are there particular kinds of optimizations which llvm will probably never be very good at (or which are not llvm's focus in the near to middle term), and which thus should definitely be done at a higher level?

There is a class of very high-level optimizations that LLVM isn't currently very good at. However, LLVM should be extended to handle these things. I'd suggest keeping your front-end simple and letting the llvm optimizers grind on it. If you see performance problems or optimizations missing, contact the list and we can find the best way to fix specific issues (be it in LLVM or in your front-end).

-Chris