Functions with unnamed parameters in LLVM IR

Hi,

Recently I came across some IR produced by a frontend that had unnamed
function arguments. For example, something like this:

define i32 @foo(i32, i32, i32) #0 {
  %x = add i32 %1, %2
  ret i32 %x
}

I had never seen this before, so I took a look at the LLVM language
reference manual and the section on functions [1] doesn't say anything
about what "argument list" can be (other than the possibility of it
being empty).

The above LLVM IR was confusing to me because I usually see that
unnamed registers start counting from 1 (i.e. %1 = add ...). However
it seems (after calling the dump() method on the arguments of the
function) that actually unnamed parameters count from 0 so the three
arguments are in registers

%0
%1
%2

and not in registers

%1
%2
%3

(which is what I was expecting)

I'm slightly surprised that unnamed function arguments are allowed at
all. Maybe there is a use case but I can't think of a good one off the
top of my head.

I think it should be documented in the LLVM reference manual what the
register names are for unnamed arguments. This would also need to
consider the case where some arguments are named and some are not.
E.g.

; Function Attrs: nounwind uwtable
define i32 @foo(i32 %y, i32, i32) #0 {
  %x = add i32 %y, %1
  ret i32 %x
}

AFAICT this is equivalent to the IR shown earlier but it might not be
immediately obvious that it is.

[1] http://llvm.org/docs/LangRef.html#functions

Thanks,
Dan.

The above LLVM IR was confusing to me because I usually see that
unnamed registers start counting from 1 (i.e. %1 = add ...).

There's a (usually hidden) %0 representing the entry basic block
there. The general rule is "start from 0 and keep counting; skip named
values".
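
A minimal sketch of that rule, with the implicit numbers written out in
comments. The function names @one and @sum are made up for illustration.

```llvm
; No arguments: the unlabeled entry block takes %0,
; so the first unnamed instruction is %1.
define i32 @one() {
  %1 = add i32 0, 1
  ret i32 %1
}

; Two unnamed arguments take %0 and %1, the unlabeled entry block
; takes %2, so the first unnamed instruction is %3.
define i32 @sum(i32, i32) {
  %3 = add i32 %0, %1
  ret i32 %3
}
```

The IR parser expects unnamed values to appear in this order, so writing
%2 instead of %3 in @sum should be rejected.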

I'm slightly surprised that unnamed function arguments are allowed at
all. Maybe there is a use case but I can't think of a good one off the
top of my head.

They're useful for ensuring ABI conformance: padding out (hardware)
registers that you don't want to use for a particular call.

You might map void foo(int32_t a, int64_t b) to "declare void @foo(i32
%a, i32, i64 %b)" for example, if your 64-bit value had to start at an
even (32-bit) register number, as is the case on ARM. (Actually, it's
not necessary there, but is in more complicated cases).
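
A sketch of how such a padded declaration might be used at a call site.
The caller @caller is hypothetical, and passing undef in the padding
slot is an assumption about what a frontend could reasonably emit, not
something the ABI rule itself dictates.

```llvm
; The second parameter is unnamed padding. It only exists so that %b
; lands where the (hypothetical) calling convention wants it.
declare void @foo(i32 %a, i32, i64 %b)

define void @caller(i32 %a, i64 %b) {
  ; The padding value is never read by @foo, so undef is passed here.
  call void @foo(i32 %a, i32 undef, i64 %b)
  ret void
}
```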

I think it should be documented in the LLVM reference manual what the
register names are for unnamed arguments.

It does sound reasonable to add a note that function arguments get
included too, since they're not obviously value computations.

Cheers.

Tim.

The above LLVM IR was confusing to me because I usually see that
unnamed registers start counting from 1 (i.e. %1 = add ...).

There's a (usually hidden) %0 representing the entry basic block
there. The general rule is "start from 0 and keep counting; skip named
values".

Okay that's good to know. So unnamed arguments get assigned temporary
registers before the entry block? If so the entry block isn't always
%0.

I'm slightly surprised that unnamed function arguments are allowed at
all. Maybe there is a use case but I can't think of a good one off the
top of my head.

They're useful for ensuring ABI conformance: padding out (hardware)
registers that you don't want to use for a particular call.

You might map void foo(int32_t a, int64_t b) to "declare void @foo(i32
%a, i32, i64 %b)" for example, if your 64-bit value had to start at an
even (32-bit) register number, as is the case on ARM. (Actually, it's
not necessary there, but is in more complicated cases).

Thanks for the concrete use case.

I think it should be documented in the LLVM reference manual what the
register names are for unnamed arguments.

It does sound reasonable to add a note that function arguments get
included too, since they're not obviously value computations.

Included in what?

Anyway I've attached a patch that tries to clear this up. Is this good
enough to commit?

0001-Add-note-to-LangRef-about-how-function-arguments-can.patch (1.32 KB)

Okay that's good to know. So unnamed arguments get assigned temporary
registers before the entry block? If so the entry block isn't always
%0.

True.
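
To make that concrete, here is a small sketch with a made-up function
@mixed that has one named and two unnamed parameters. The numbers in the
comments follow the "skip named values" rule.

```llvm
; %y is named, so the counter skips it.
; The unnamed parameters take %0 and %1, and the unlabeled entry
; block takes %2 rather than %0.
define i32 @mixed(i32 %y, i32, i32) {
  %3 = add i32 %0, %1      ; the first unnamed instruction is therefore %3
  %sum = add i32 %3, %y
  ret i32 %sum
}
```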

Included in what?

Anyway I've attached a patch that tries to clear this up. Is this good
enough to commit?

I did mean the LangRef, but probably not there, and not to that degree.

It's a fairly minor point, perhaps warranting a sentence where unnamed
values are generally discussed and the basic-block case is mentioned
(under the "Identifiers" section).

Cheers.

Tim.

I did mean the LangRef, but probably not there, and not to that degree.

For documentation I think being explicit is much better than being implicit.

It's a fairly minor point, perhaps warranting a sentence where unnamed
values are generally discussed and the basic-block case is mentioned
(under the "Identifiers" section).

I took a look at the "Identifiers" section [1]. There isn't any
mention of the basic-block case there. In any case I don't think this
is the appropriate place. Perhaps I've given you the wrong impression
about what I want to document.

What I want to document is the fact that in a function definition,
arguments can be unnamed and that unnamed arguments are assigned to
temporary registers using the function's counter. The logical place to
discuss that is [2] (if I was a new user of LLVM and I wanted to know
about function arguments I wouldn't go looking at the identifiers
section). The entry block being assigned "%0" is discussed here too so
I could correct that as well.

On a separate note, whilst looking at the Language Reference I did
notice a few things that seemed odd:

- We seem to mix the terms "argument" and "parameter". Functions seem to
have arguments, but those arguments have "parameter attributes".
- We sort of use Backus-Naur form to show the IR syntax, but we don't
define things like "argument list" (what I'm trying to document), "ret
attrs", or "fn Attrs". For "ret attrs" and "fn Attrs" we define what the
attributes are, but we don't define the form of the list (i.e. space
separated).

[1] http://llvm.org/docs/LangRef.html#identifiers
[2] http://llvm.org/docs/LangRef.html#functions

I did mean the LangRef, but probably not there, and not to that degree.

For documentation I think being explicit is much better than being implicit.

Sure, but the entire section on functions is 61 lines. Adding 20 to
cover this bit of trivia is *way* out of balance, and I don't think it
actually improves the usability of the documentation for most readers.

I took a look at the "Identifiers" section [1]. There isn't any
mention of the basic-block case there.

"Unnamed temporaries are numbered sequentially (using a per-function
incrementing counter, starting with 0). Note that basic blocks are
included in this numbering. For example, if the entry basic block is
not given a label name, then it will get number 0."

Cheers.

Tim.

You're probably right there. I could cut it down a bit. I still think having

+The argument list is a comma separated sequence of arguments where
each argument is of the following form

Sorry for the delay on this. Is the attached patch any better?

I've modified the note in the Identifiers section and trimmed down my
addition to the Functions section.

I've been a little inconsistent with using argument/parameter but the
entire document seems to be like this so I figured it would be okay.

Thanks,
Dan.

[v2]0001-Add-note-to-LangRef-about-how-function-arguments-can.patch (1.67 KB)

Hi Dan,

Thanks for taking a look. I have commit access so I'll commit it later on today.

Thanks,
Dan.

Committed in r216070