Implicit basic block labels?

Hello,

I only recently started looking at LLVM assembly generated by Clang,
and one of the first things I saw was something like:

define i32 @foo(i32 %a, i32 %b) nounwind {
  %1 = tail call i32 @bar(i32 %a) nounwind
  %2 = icmp eq i32 %1, 0
  br i1 %2, label %5, label %3

; <label>:3 ; preds = %0
  %4 = add nsw i32 %b, %a
  br label %7

I wondered what "; <label>:3" means and how the absent label relates
to the language syntax. http://llvm.org/docs/LangRef.html says "Each
basic block may optionally start with a label", and that's all. I googled
around and scratched my head for half an hour but still found nothing
(searching for terms like "llvm implicit basic block labels"). After
peering at that dump long enough, and generalizing from how temporaries
are handled in the API (where unnamed values have no names at all; the
numbers exist only in the textual representation), I was finally able
to work out the logic:

1. For each function, an "unnamed entity" counter is initialized to 0.
2. Whenever an unnamed temporary is seen, it is given the counter's
value as its name, and the counter is incremented.
3. Whenever an unlabeled block is seen, it is given the counter's
value as its label, and the counter is incremented.
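The rules above can be modelled in a few lines of Python. This is a hypothetical sketch, not LLVM's actual implementation: a function is represented as an ordered list of entities (arguments, blocks, and value-producing instructions), each with an optional name. Instructions that produce no value, such as `br`, are left out because they never receive a number. The entity list for `@foo` is reconstructed from the dump above and is an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each entity is (kind, name), where kind is "arg",
# "block" or "value" and name is None when the entity is unnamed.
# Void instructions such as `br` are omitted: they never get a number.
Entity = Tuple[str, Optional[str]]


def number_entities(entities: List[Entity]) -> List[str]:
    """Return the printed name of each entity, in textual order."""
    counter = 0  # one counter per function, starting at 0
    printed = []
    for _kind, name in entities:  # the rule is the same for every kind
        if name is not None:
            printed.append(name)  # named entities keep their own names
        else:
            printed.append(str(counter))  # unnamed values and blocks share the counter
            counter += 1
    return printed


# The entities of @foo from the dump above, in textual order.
foo = [
    ("arg", "a"), ("arg", "b"),
    ("block", None),  # entry block -> 0 (hence "preds = %0")
    ("value", None),  # tail call -> 1
    ("value", None),  # icmp -> 2
    ("block", None),  # second block -> 3
    ("value", None),  # add -> 4
]
print(number_entities(foo))  # ['a', 'b', '0', '1', '2', '3', '4']
```

Note that the entry block consumes number 0 even though it is never printed with a label, which is why the first temporary is `%1`.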

Still, the questions are:

1. Where is this documented, and why doesn't http://llvm.org/docs/LangRef.html
mention it? (I didn't re-read it completely this time, but I grepped
for all occurrences of "label" and none was relevant.)

2. Why isn't the label rendered explicitly? Putting a comment like
"; <label>:3" in its place is about as helpful and non-confusing as
dumping temporary names as:

/* temporary 1 */ = tail call i32 @bar(i32 %a)

(assuming LLVM syntax had block comments in addition to ";" line
comments).

Thanks,
Paul mailto:pmiscml@gmail.com

> 1. Where is this documented, and why http://llvm.org/docs/LangRef.html
> doesn't have it?

The closest it gets to talking about it, I think, is where it says "Unnamed
temporaries are numbered sequentially", although it says so in a context
that implies this applies to the results of instructions, with no mention
of BB names.

> 2. Why label is not rendered explicitly?

Printing a comment there is pretty useless. It's probably historical.

It should be pretty easy to change AssemblyWriter::printBasicBlock in
lib/IR/AsmWriter.cpp to print out an "unnamed" name for the BB, but the hard
part will be ensuring that the new policy properly round-trips (i.e., that
the output can be parsed back, even in the presence of user-defined names).
Also, we like to keep the textual IR as compatible as possible, so the change
would have to be fully backward compatible.
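For illustration, the printer-side half of such a change might emit a parseable numeric label (for example `3:`) for an unnamed block instead of the `; <label>:3` comment. The sketch below is plain Python, not the real `AssemblyWriter::printBasicBlock`; the function name `block_header` and its inputs (an optional block name plus the number assigned by the per-function counter) are hypothetical.

```python
from typing import Optional


def block_header(name: Optional[str], slot: int, explicit: bool) -> str:
    """Render the first line of a basic block (hypothetical printer)."""
    if name is not None:
        return f"{name}:"  # named blocks already print a real label
    if explicit:
        return f"{slot}:"  # proposed: an explicit, parseable numeric label
    return f"; <label>:{slot}"  # current behavior: the number only appears in a comment


for explicit in (False, True):
    print(block_header(None, 3, explicit))
# ; <label>:3
# 3:
```

Printing the label is the easy half; the parser must then accept `3:` and keep its own counter in step, which is where the round-trip difficulty lies.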

Alternatively, you could use the experience from your wild goose chase to
choose a good location to document this strange behavior (in LangRef) so
that another person will be likely to find it.

-- Sean Silva

Hello,

> It should be pretty easy to change AssemblyWriter::printBasicBlock in
> lib/IR/AsmWriter.cpp to print out a "unnamed" name for the BB, but
> the hard part will be to ensure that the new policy will properly
> round-trip (i.e., can be parsed back; even in the presence of
> user-defined names). Also, we like to keep the textual IR as
> compatible as possible, so the change would have to be fully backward
> compatible.
>
> Alternatively, you could use the experience from your wild goose
> chase to choose a good location to document this strange behavior (in
> LangRef) so that another person will be likely to find it.

OK, so I assume this does warrant a bug; I submitted
http://llvm.org/bugs/show_bug.cgi?id=16043 . I'll try to submit patches
for the docs, and then look into the code. So far I've tested that just
adding explicit labels to the test code I posted leads to parse errors,
so there does need to be smarter logic: if an explicit label is numeric
and the number matches the current counter value, the counter should be
incremented.
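A minimal sketch of the parser rule proposed above, in the same kind of hypothetical Python model (the names `resolve_numbers` and `Token` and the error text are illustrative, not LLVM's): an omitted name takes the next number; an explicit numeric name is accepted only if it equals the current counter, which it then consumes; a user-defined name leaves the counter alone. Blocks and values share one counter, matching the numbering observed in the dump.

```python
from typing import List, Optional, Tuple

# Hypothetical model: (kind, text), where kind is "block" or "value" and
# text is the label/name written in the source, or None if it was omitted.
Token = Tuple[str, Optional[str]]


def resolve_numbers(tokens: List[Token]) -> List[str]:
    """Resolve names as a parser might under the proposed rule (sketch only)."""
    counter = 0  # shared by unnamed values and unlabeled blocks
    resolved = []
    for kind, text in tokens:
        if text is None:
            resolved.append(str(counter))  # implicit: take the next number
            counter += 1
        elif text.isdecimal():
            if int(text) != counter:  # an explicit number must match the counter
                raise ValueError(f"{kind} '{text}' expected to be numbered {counter}")
            resolved.append(str(counter))
            counter += 1  # ...and then consumes that number
        else:
            resolved.append(text)  # user-defined names don't touch the counter
    return resolved


# @foo with the second block's label written out explicitly as "3:".
foo = [
    ("block", None),  # entry block, implicitly 0
    ("value", "1"), ("value", "2"),
    ("block", "3"),
    ("value", "4"),
]
print(resolve_numbers(foo))  # ['0', '1', '2', '3', '4']
```

A mismatched label such as `5:` where the counter is at 1 raises an error instead of silently shifting every later number.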

Another question is about phi syntax. Looking at this:

%.0 = phi i32 [ %4, %3 ], [ %6, %5 ]

it's unlikely that someone new to LLVM (but who knows what a phi is) will
understand what it means. What about being consistent with other
instructions that accept label arguments, and specifying the type explicitly:

%.0 = phi i32 [ %4, label %3 ], [ %6, label %5 ]

Of course, I understand why the first form is used: a phi's argument
list can get rather long. But it's not the only LLVM instruction that
can get long; a function call is another, and I'd guess a typical call
easily beats a typical phi. So readability and self-descriptive syntax
might deserve a higher priority than saving a few columns.
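To make the comparison concrete, here is a hypothetical Python helper (`format_phi` is an illustrative name, not part of any LLVM API) that renders a phi in both spellings: the current one and the proposed one with an explicit `label` type on each incoming block.

```python
from typing import List, Tuple


def format_phi(result: str, ty: str, incoming: List[Tuple[str, str]],
               typed_labels: bool) -> str:
    """Render a phi instruction; incoming is a list of (value, block) pairs."""
    prefix = "label " if typed_labels else ""  # the proposed form spells out the type
    pairs = ", ".join(f"[ %{value}, {prefix}%{block} ]" for value, block in incoming)
    return f"%{result} = phi {ty} {pairs}"


incoming = [("4", "3"), ("6", "5")]
print(format_phi(".0", "i32", incoming, typed_labels=False))
# %.0 = phi i32 [ %4, %3 ], [ %6, %5 ]
print(format_phi(".0", "i32", incoming, typed_labels=True))
# %.0 = phi i32 [ %4, label %3 ], [ %6, label %5 ]
```

The typed form costs six characters (`label `) per incoming edge, which is the horizontal space being weighed against readability here.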