I've attached a preliminary patch for `MDLocation` as a follow-up to the
RFC [1] last week. It's not commit-ready -- in particular, it squashes
a bunch of commits together and doesn't pass `make check` -- but I think
it's close enough to indicate the direction and work toward consensus.
[1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html
IMO, the files to focus on are:
include/llvm/IR/DebugInfo.h
include/llvm/IR/DebugLoc.h
include/llvm/IR/Metadata.h
include/llvm/IR/Value.h
lib/AsmParser/LLLexer.cpp
lib/AsmParser/LLParser.cpp
lib/AsmParser/LLParser.h
lib/AsmParser/LLToken.h
lib/Bitcode/Reader/BitcodeReader.cpp
lib/Bitcode/Writer/BitcodeWriter.cpp
lib/Bitcode/Writer/ValueEnumerator.cpp
lib/Bitcode/Writer/ValueEnumerator.h
lib/IR/AsmWriter.cpp
lib/IR/AsmWriter.h
lib/IR/DebugInfo.cpp
lib/IR/DebugLoc.cpp
lib/IR/LLVMContextImpl.cpp
lib/IR/LLVMContextImpl.h
lib/IR/Metadata.cpp
Using `Value` instead of `MDNode`
A number of APIs expect `MDNode` -- previously, the only referenceable
type of metadata -- but this patch (and the ones that will follow) have
referenceable metadata that *do not* inherit from `MDNode`. Metadata
APIs such as `Instruction::getMetadata()` and
`NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
I plan to commit the API changes incrementally so we can fix any issues
there before pushing the functionality changes. Unfortunately, this
currently adds a lot of noise to the (squashed) patch.
Introducing `MDLocation`
Of course, this adds `MDLocation`, the first subclass of `MDUser`. This
is a first-class IR type that has two other representations:
`DILocation` (which now trivially wraps `MDLocation` instead of
`MDNode`) and `DebugLoc`.
I've genericised the code in `LLParser` (and elsewhere) to sketch out
how adding other `MDUser` subclasses will go. Perhaps I used the wrong
axis, but we can adjust it as we go.
Usage examples:
!6 = metadata MDLocation(line: 43, column: 7, scope: !4)
!7 = metadata MDLocation(scope: !5, line: 67, inlinedAt: !6)
The fields can be listed in any order. The `scope:` field is required,
but the others are optional (`line:` and `column:` default to `0`,
`inlinedAt:` defaults to `null`).
(Note that in the RFC I referred to this as an `MDLineTable`, but
`MDLocation` is a better name. If/when this work supersedes the
`DIDescriptor` hierarchy, it'll likely get renamed to `DILocation`, but
for now there's a name clash.)
Where this is heading
Let's look at a concrete example. Here's some simple C++ code:
$ cat t.h
struct T { short a; short b; };
$ cat foo.cpp
#include "t.h"
int foo(T t) { return t.a + t.b; }
$ cat bar.cpp
#include "t.h"
int foo(T t);
int bar(T t) { return foo(t) * 2; }
Looking forward, after refactoring ownership and uniquing and fixing up
a few schema issues, I'd expect the above to link into something like
the following:
!0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")
!1 = metadata DIFile(filename: "./t.h", directory: "/path/to")
!2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")
!3 = metadata DIBaseType(name: "short", size: 16, align: 16)
!5 = metadata DIBaseType(name: "int", size: 32, align: 32)
!6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",
file: !1, line: 1, size: 32, align: 16)
!7 = metadata DIMember(line: 1, file: !1, type: !3,
name: "a", size: 16, align: 16, context: !6)
!8 = metadata DIMember(line: 1, file: !1, type: !3,
name: "b", size: 16, align: 16, context: !6)
!9 = metadata DISubroutineType(args: [ !5, !6 ])
!10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
producer: "clang version 3.6.0 ",
retainedUniqueTypes: [ !6 ])
!11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",
handle: i32(i32)* @_Z3foo1T, file: !0,
type: !9, context: !10)
!12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6,
context: !11)
!13 = metadata DILocation(line: 2, column: 11, scope: !11)
!14 = metadata DILocation(line: 2, column: 16, scope: !11)
!15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
producer: "clang version 3.6.0 ",
retainedUniqueTypes: [ !6 ])
!16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",
handle: i32 (i32)* @_Z3bar1T, file: !2,
type: !9, context: !15)
!17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6,
context: !16)
!18 = metadata DILocation(line: 3, column: 11, scope: !16)
!19 = metadata DILocation(line: 3, column: 23, scope: !16)
Notice that only the links to parents (i.e., `context:`) are explicit
here -- backlinks are implied. For example, !7 and !8 point to !6, but
not the reverse.
This may be a problem - the difference between nodes in a
structure/class_type's member list, and those that are not in the member
list but refer to the class/structure as their parent is meaningful. Type
units use this distinction to avoid emitting instantiation-specific data
into the canonical type unit (nested classes, implicit special members,
member template instantiations, etc).
Not sure what the right answer will be there.
This has the interesting property of removing all cycles from
serialization (assembly and bitcode).
Making debug info assembly readable and writable
Moreover, we're now in a place where it's trivial to express the
"context" pointer structurally. Here's the same debug info as above,
using syntactic sugar to fill the "context" pointers:
!0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")
!1 = metadata DIFile(filename: "./t.h", directory: "/path/to")
!2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")
!3 = metadata DIBaseType(name: "short", size: 16, align: 16)
!5 = metadata DIBaseType(name: "int", size: 32, align: 32)
!6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",
file: !1, line: 1, size: 32, align: 16) {
!7 = metadata DIMember(line: 1, file: !1, type: !3,
name: "a", size: 16, align: 16)
!8 = metadata DIMember(line: 1, file: !1, type: !3,
name: "b", size: 16, align: 16)
} ; !6
!9 = metadata DISubroutineType(args: [ !5, !6 ])
!10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
producer: "clang version 3.6.0 ",
retainedUniqueTypes: [ !6 ]) {
!11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",
handle: i32(i32)* @_Z3foo1T, file: !0,
type: !9) {
!12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6)
!13 = metadata DILocation(line: 2, column: 11)
!14 = metadata DILocation(line: 2, column: 16)
} ; !11
} ; !10
!15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
producer: "clang version 3.6.0 ",
retainedUniqueTypes: [ !6 ]) {
!16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",
handle: i32 (i32)* @_Z3bar1T, file: !2,
type: !9) {
!17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6)
!18 = metadata DILocation(line: 3, column: 11)
!19 = metadata DILocation(line: 3, column: 23)
} ; !16
} ; !15
This assembly has the following advantages over the status quo:
- Fields are named. Aside from readability, this prevents
adding/reordering fields in the schema from requiring testcase
updates.
- Serialization graph becomes a DAG. Aside from readability, this
removes most RAUW from assembly (and all RAUW from bitcode).
- Structure is clear.
Bike sheds to paint
1. Should we trim some boilerplate? E.g., it would be trivial to
change:
!6 = metadata MDLocation(line: 43, column: 7, scope: !4)
to:
!6 = MDLocation(line: 43, column: 7, scope: !4)
This would not complicate `LLParser`. Thoughts?
2. Which of the two "end goal" syntaxes is better: flat, or
hierarchical? Better for what? Why?
The flat one might be better for FileCheck-ing (not sure), but IMO
the hierarchical one is much saner for us humans, and that's the
main point of assembly. It wouldn't be hard to default to one and
write the other based on a command-line flag -- is that a good idea?
I don't think the flat form will be particularly more compelling for
testing. We test the hierarchical DWARF dumping all the time - doing so
pedantically does require littering CHECK-NOT: DW_TAG|NULL in a fair few
places, but even without those the tests aren't /too/ bad.
If checking hierarchical data is a problem, chances are we just want to
improve FileCheck until it isn't, since we already have hierarchies we have
to check.
3. Assembly syntax is pretty easy to change, so this doesn't have to be
perfect now. Nevertheless, is there a magical syntax that would be
easier to read/write/FileCheck?
Presumably when dumping the fields will come in a specific, defined order?
(probably not preserving the original source order, etc) Variation there
would probably make checking harder than it needs to be.
Would it be possible to omit the names of unreferenced nested metadata? (if
you have a bunch of member functions in a class, but don't need to refer to
them elsewhere (eg: those member functions aren't defined in this
translation unit)) - that'd help readability/writeability, but probably
wouldn't impact FileCheck.
Also the whole "metadata " prefix on anything is a bit verbose, if we can
omit that in some/many places, that'll help reduce the visual noise/improve
readability/writeability.
But most/all of that I imagine is fairly easily incremental
change/improvement, nothing fundamental that needs to be chosen up-front.