This is a little off-topic. But I am writing a compiler to llvm ir
for a language that admits recursive nested functions and am stuck
as to how to translate them. Concretely, I'm trying to lift them all
to the topmost level and pass all their free variables explicitly as
arguments. To do this, I have to determine all their free variables
in their bodies. In particular when I come across a function call,
I have to add _its_ free variables to the list of free variables of
the enclosing function. You can see how this will not work if we
allow recursive functions.
Any pointers would be greatly appreciated.
Topologically sort the functions defined in a single scope. The free
variables of a cluster of mutually-recursive functions is the union
of their "direct" free variables, plus the free variables of any functions
In an earlier life, I built a commercial Pascal to C++ translator, called unimaginatively p2c, that was used by most of the top Mac application developers to port their Object Pascal programs to C++ when Apple forced them during the Motorola 68XXX to PowerPC transition. Since Object Pascal (like most Pascals) permitted nested function (or procedure) definitions and C++ does not, I had to address this very issue.
Since we were translating commercial code, and our own database engine code, we needed the resulting C++ to be as close to the original as possible so it could be maintained. We even preserved comment positions and conditional compilation switches when possible.
If I recall, I handled the nested procedure/function issue by keeping track of the particular scope for each of the free variables used within inner functions during identifier resolution. I used a linked set of symbol tables and pushed a new symbol table for each new nested procedure. This made it easy to determine where the free variable was defined. If it was defined in an enclosing function, I marked the entry for that symbol to indicate that the symbol was required in a nested function.
I then created structs which held the free variables accessed by the nested functions and passed a pointer to the struct into each of the nested functions. I changed all uses of the variables to reference the struct members at the outer level, and then passed in a pointer to the struct into inner scoped functions. A double-nested function would have two struct pointers if it accessed free variables from both of its enclosing functions. I then changed all the accesses to the free variables within the nested procedures/functions to reference the struct members through the pointer.
This ended up producing fairly efficient code that was pretty easy to understand when looking at the resulting C++ code.
Curtis described a fairly standard approach -- you put all variables
used in nested functions into a struct and pass pointer to it to these
functions. The only difference is that these structs are usually
linked (struct in a nested function has a pointer to struct in outer
function), so deep nested functions only need one extra argument. gcc
supports nested functions in C (including mutually recursive nested
functions, forward declarations need to use "auto" linkage) using this
approach, you can look at what it does with llvm-gcc.