Parameter names in IR and debug info

Have I correctly inferred, below, how to build IR and debug info for
a function type and a function (value), and in particular how to supply
the names of the formal parameters?

To create a function in llvm IR and give names to its formal parameters,
I must:

1. Build an LLVMTypeRef for the type of each formal and the function result.
2. Build a function type using LLVMFunctionType, from the results of 1.
3. Build a function (an LLVMValueRef), using LLVMAddFunction, from the result of 2.
4. Get the LLVMValueRef for each formal (apparently, these are constructed inside
    LLVMAddFunction), using LLVMGetParam, from the result of 3.
5. Set the formal name using LLVMSetValueName, on each result of 4.

This appears to imply that the formal names are part of the function,
not the function type, and thus the function type could be reused for another
function whose signature differs only in the names of the formals. Also the
function type could be used as the referent of a pointer type, which could
then be used as the type of a variable, without any actual function at all.

To build corresponding debug info, I must:

6. Build a llvm::DIArray, using llvm::getOrCreateArray, from the results of 4.
7. Build a llvm::DIComposite type for the function, using
    llvm::createSubroutineType, from the result of 6.
8. Build a llvm::DIFunction using llvm::createFunction, from the result of 7.

Here, I need the formal values, with names, first, before building the function
type. This appears to imply that, in debug info, the formal names are also part
of the function type, which thus cannot be reused for a different function with
different formal names.

Can I build a DI function type without having an actual function of that type?
This happens in my language.

    Have I correctly inferred below, how I build IR and debug info for
    a function type and a function (value), in particular, how to supply
    the names of the formal parameters?

Generally good advice: http://llvm.org/docs/tutorial/LangImpl8.html

    To create a function in llvm IR and give names to its formal parameters,
    I must:

    1. Build a LLVMTypeRef for the type of each formal and the function result.

Sounds like you're using the C API. I'm not especially familiar with that,
so answers may be vague.

    2. Build a function type using LLVMFunctionType, from the results of 1.
    3. Build a function (an LLVMValueRef), using LLVMAddFunction, from the result of 2.
    4. Get the LLVMValueRef for each formal (apparently, these are constructed inside
        LLVMAddFunction), using LLVMGetParam, from the result of 3.
    5. Set the formal name using LLVMSetValueName, from each result of 4.

The names of LLVM IR values are purely aids for LLVM developers (such as
yourself), they should never have any impact on the result of LLVM (in
terms of machine asm/code - the textual LLVM IR will include the names, but
again, this is just a debugging aid for you, the LLVM developer (it has no
impact on the DWARF debug info LLVM emits))

    Which appears to imply that the formal names are part of the function,
    not the function type, and thus the function type could be reused for another
    function whose signature differs only in the names of the formals. Also the
    function type could be used as the referent of a pointer type, which could
    then be used as the type of a variable, without any actual function at all.

Sure.

    To build corresponding debug info, I must:

    6. Build a llvm::DIArray, using llvm::getOrCreateArray, from the results of 4.
    7. Build a llvm::DIComposite type for the function, using
        llvm::createSubroutineType, from the result of 6.
    8. Build a llvm::DIFunction using llvm::createFunction, from the result of 7.

    Here, I need the formal values, with names, first, before building the function
    type.

I don't think you should need parameter names for createSubroutineType -
it's just a type (composed of other types, no variable names, just type
names).

    This appears to imply that, in debug info, the formal names are also part
    of the function type,

Shouldn't be. But the actual DWARF output doesn't necessarily have explicit
function types - it just has a function with some formal parameters, each
with a type and in a specified order.

    which thus cannot be reused for a different function with
    different formal names.

    Can I build a DI function type without having an actual function of that type?
    This happens in my language.

Not sure I understand. You mean your language has, say, a function pointer
even though you have no function of that type. Certainly clang does this
(try compiling something simple like "void (*x)();" in clang and look at
the LLVM IR it produces - you'd want to produce something similar).

- David

    Have I correctly inferred below, how I build IR and debug info for
    a function type and a function (value), in particular, how to supply
    the names of the formal parameters?

Generally good advice: http://llvm.org/docs/tutorial/LangImpl8.html

    To create a function in llvm IR and give names to its formal parameters,
    I must:

    1. Build a LLVMTypeRef for the type of each formal and the function result.

Sounds like you're using the C API. I'm not especially familiar with that, so answers may be vague.

    2. Build a function type using LLVMFunctionType, from the results of 1.
    3. Build a function (an LLVMValueRef), using LLVMAddFunction, from the result of 2.
    4. Get the LLVMValueRef for each formal (apparently, these are constructed inside
        LLVMAddFunction), using LLVMGetParam, from the result of 3.
    5. Set the formal name using LLVMSetValueName, from each result of 4.

The names of LLVM IR values are purely aids for LLVM developers (such as yourself), they should never have any impact on the result of LLVM (in terms of machine asm/code - the textual LLVM IR will include the names, but again, this is just a debugging aid for you, the LLVM developer (it has no impact on the DWARF debug info LLVM emits))

Thanks, that's useful info. I had wondered about that. In any case, I think I
want the names just to help looking at IR code.

    Which appears to imply that the formal names are part of the function,
    not the function type, and thus the function type could be reused for another
    function whose signature differs only in the names of the formals. Also the
    function type could be used as the referent of a pointer type, which could
    then be used as the type of a variable, without any actual function at all.

Sure.

    To build corresponding debug info, I must:

    6. Build a llvm::DIArray, using llvm::getOrCreateArray, from the results of 4.
    7. Build a llvm::DIComposite type for the function, using
        llvm::createSubroutineType, from the result of 6.
    8. Build a llvm::DIFunction using llvm::createFunction, from the result of 7.

    Here, I need the formal values, with names, first, before building the function
    type.

I don't think you should need parameter names for createSubroutineType - it's just a type (composed of other types, no variable names, just type names).

That's what I might have expected, but ... createSubroutineType wants a DIArray, and its
creator getOrCreateArray, takes a list of Value*, not Type*.
(The Kaleidoscope chapter 8 uses getOrCreateTypeArray, returning DITypeArray, neither
of which shows up in a systematic grep of the entire source tree.)

    This appears to imply that, in debug info, the formal names are also part
    of the function type,

Shouldn't be. But the actual DWARF output doesn't necessarily have explicit function types - it just has a function with some formal parameters, each with a type and in a specified order.

    which thus cannot be reused for a different function with
    different formal names.

    Can I build a DI function type without having an actual function of that type?
    This happens in my language.

Not sure I understand. You mean your language has, say, a function pointer even though you have no function of that type. Certainly clang does this (try compiling something simple like "void (*x)();" in clang and look at the LLVM IR it produces - you'd want to produce something similar).

Yes, actually, the HLL view is there are procedure types (not pointers to) that take values of
any procedure whose signature meets a certain structural similarity criterion (weaker than equality)
with the type. Of course, pointers are used in the implementation, but this is a lowering from
the source code, complicated by the fact that sometimes the value can be a nested procedure, needing
an environment.

In order to support reasonable debugger behavior, using source language concepts, I need debug info for
procedure types to have parameter names. And better debugger behavior is one major reason for
connecting to an llvm back end.

One thing that makes it harder to figure out what I need to do is that many examples, e.g., clang-produced
IR for sample C programs, give assembly. Translating this into the sequence of calls needed to
build the in-memory form of IR is not always obvious.

Actually, I am increasingly doubting the wisdom of my initial decision to generate llvm IR this
way. Maybe generating an assembly or bitcode file directly would be better.

(oops, dropped the list by accident)

From: David Blaikie <dblaikie@gmail.com>
Date: Fri, Feb 20, 2015 at 1:17 PM
Subject: Re: [LLVMdev] Parameter names in IR and debug info
To: rodney.m.bates@acm.org

             Have I correctly inferred below, how I build IR and debug info for
             a function type and a function (value), in particular, how to supply
             the names of the formal parameters?

        Generally good advice: http://llvm.org/docs/tutorial/LangImpl8.html

             To create a function in llvm IR and give names to its formal parameters,
             I must:

             1. Build a LLVMTypeRef for the type of each formal and the function result.

        Sounds like you're using the C API. I'm not especially familiar with that, so answers may be vague.

             2. Build a function type using LLVMFunctionType, from the results of 1.
             3. Build a function (an LLVMValueRef), using LLVMAddFunction, from the result of 2.
             4. Get the LLVMValueRef for each formal (apparently, these are constructed inside
                 LLVMAddFunction), using LLVMGetParam, from the result of 3.
             5. Set the formal name using LLVMSetValueName, from each result of 4.

        The names of LLVM IR values are purely aids for LLVM developers (such as yourself), they should never have any impact on the result of LLVM (in terms of machine asm/code - the textual LLVM IR will include the names, but again, this is just a debugging aid for you, the LLVM developer (it has no impact on the DWARF debug info LLVM emits))

    Thanks, that's useful info. I had wondered about that. In any case, I think I
    want the names just to help looking at IR code.

             Which appears to imply that the formal names are part of the function,
             not the function type, and thus the function type could be reused for another
             function whose signature differs only in the names of the formals. Also the
             function type could be used as the referent of a pointer type, which could
             then be used as the type of a variable, without any actual function at all.

        Sure.

             To build corresponding debug info, I must:

             6. Build a llvm::DIArray, using llvm::getOrCreateArray, from the results of 4.
             7. Build a llvm::DIComposite type for the function, using
                 llvm::createSubroutineType, from the result of 6.
             8. Build a llvm::DIFunction using llvm::createFunction, from the result of 7.

             Here, I need the formal values, with names, first, before building the function
             type.

        I don't think you should need parameter names for createSubroutineType - it's just a type (composed of other types, no variable names, just type names).

    That's what I might have expected, but ... createSubroutineType wants a DIArray, and its
    creator getOrCreateArray, takes a list of Value*, not Type*.

OK, so all the debug info is described by values, it doesn't depend on or use LLVM Type*s. You can have an i32 value in LLVM described by some complex user defined type (because, for example, your source language tells you to lower "struct foo { int x; };" to i32 in a calling convention, etc) in the debug info because that's how it was written in the source code.

    (The Kaleidoscope chapter 8 uses getOrCreateTypeArray, returning DITypeArray, neither
    of which shows up in a systematic grep of the entire source tree.)

Which source tree are you searching? DIBuilder::getOrCreateTypeArray is in include/llvm/IR/DIBuilder.h

3.4.2. This was the latest released version when I got seriously started. I am not bold enough
to try developing something this size against an evolving development branch.

             This appears to imply that, in debug info, the formal names are also part
             of the function type,

        Shouldn't be. But the actual DWARF output doesn't necessarily have explicit function types - it just has a function with some formal parameters, each with a type and in a specified order.

             which thus cannot be reused for a different function with
             different formal names.

             Can I build a DI function type without having an actual function of that type?
             This happens in my language.

        Not sure I understand. You mean your language has, say, a function pointer even though you have no function of that type. Certainly clang does this (try compiling something simple like "void (*x)();" in clang and look at the LLVM IR it produces - you'd want to produce something similar).

    Yes, actually, the HLL view is there are procedure types (not pointers to) that take values of
    any procedure whose signature meets a certain structural similarity criterion (weaker than equality)
    with the type. Of course, pointers are used in the implementation, but this is a lowering from
    the source code, complicated by the fact that sometimes the value can be a nested procedure, needing
    an environment.

    In order to support reasonable debugger behavior, using source language concepts, I need debug info for
    procedure types to have parameter names.

It's possible that LLVM's debug info IR doesn't support this scenario. I'm not sure DWARF itself does either (hmm, well, a DW_TAG_subroutine_type has DW_TAG_formal_parameters for its parameter types, so in theory those could have DW_AT_names - but I don't think we support that in the IR at least & not sure if any existing debuggers would do anything useful with it if we did)

Yes, I have been assuming there could well be some debug info cases that would not get through
llvm, though I really hope not to have to touch llvm. Been around that block before (tho' not
WRT llvm.)

    And better debugger behavior is one major reason for
    connecting to an llvm back end.

    One thing that makes it harder to figure out what I need to do is that many examples, e.g., clang-produced
    IR for sample C programs, give assembly. Translating this into the sequence of calls needed to
    build the in-memory form of IR is not always obvious.

The Kaleidoscope examples should give some examples of API use.

Yes, they have helped in a number of cases.

And/or you can jump into clang in a debugger & step through how it's building different things.

Hadn't thought of that, though I have tried just reading clang source, a little.