Adding line table debug information to LLVM on Windows

Hi David, Eric, LLVM devs,

You've probably heard about AddressSanitizer (ASan) and other
sanitizers based on LLVM. One of the things that makes ASan
not as awesome on Windows as it is on Linux
is the symbolization of the stacks.

Currently, ASan runtime on Windows uses
CaptureStackBackTrace/SymFromAddr/SymGetLineFromAddr64
to unwind and symbolize stacks. This works like a charm
in-process for stack frames built with CL, but yields
"function+0x0ff537" for frames built with Clang.

I came up with a prototype which emits "old-style debug info" COFF
sections that are sufficient to get function name / filename /
linenumber information. That's pretty much everything that's required
for ASan to work beautifully in terms of the completeness of error
reports.

Attached is a prototype patch which I've tried on some simple tests,
including some more complex ones with weird #line constructions.
It also works just great on ASan/Win tests without any link/run-time
warnings (I had a bunch of those during development, so I can tell it
works rather than fails silently).

I didn't have time to work on threading the command-line flags into
the AsmPrinter yet, so currently it just replaces DWARF entirely.
Of course, this should be fixed before this lands into trunk.
Currently, one can try this patch by using "clang-cl -Xclang -g ...".
Eventually we should have some dedicated flag for clang-cl.

Can you please take a look at the patch and suggest a good path forward?

I'm very unfamiliar with LLVM CodeGen/MC, so I'm pretty sure I made a
few weird things in the code... I've also put a few TODOs with
questions and suggestions.

Some general questions:
1) Threading flags from the driver down to CodeGen.
  How do we do that? Should we support all 4 combinations
  of no-info/DWARF/CVLT/both?
  How about "-Zmlt" as the clang-cl flag name? ("minimal line tables")

2) Am I right that DWARF is pretty much the only debug info format
  supported by LLVM/AsmPrinter right now? Do we want to take
  an effort to come up with a generic debuginfogenerator interface
  to share between DwarfDebug and WinCodeViewLineTables?
  Then AsmPrinter should just hold a SmallVector<DebugInfoEmitter*>
  rather than a pair of DD/DE pointers.

3) How would you suggest to write WinCodeViewLineTables tests
  given that dumpbin is not available everywhere except Windows?
  // Yeah, I should have looked at the DwarfDebug LIT tests and
  // written some; but the prototype development went faster
  // than I expected...

Can you suggest ways to split this patch so it's easier
to review part-by-part before this hits trunk?

Thanks!

codeview_linetables.patch (35.5 KB)

Hi David, Eric, LLVM devs,

You've probably heard about AddressSanitizer (ASan) and other
sanitizers based on LLVM. One of the things that makes ASan
not as awesome on Windows as it is on Linux
is the symbolization of the stacks.

Currently, ASan runtime on Windows uses
CaptureStackBackTrace/SymFromAddr/SymGetLineFromAddr64
to unwind and symbolize stacks. This works like a charm
in-process for stack frames built with CL, but yields
"function+0x0ff537" for frames built with Clang.

I came up with a prototype which emits "old-style debug info" COFF
sections that are sufficient to get function name / filename /
linenumber information. That's pretty much everything that's required
for ASan to work beautifully in terms of the completeness of error
reports.

Attached is a prototype patch which I've tried on some simple tests,
including some more complex ones with weird #line constructions.
It also works just great on ASan/Win tests without any link/run-time
warnings (I had a bunch of those during development, so I can tell it
works rather than fails silently).

I didn't have time to work on threading the command-line flags into
the AsmPrinter yet, so currently it just replaces DWARF entirely.
Of course, this should be fixed before this lands into trunk.
Currently, one can try this patch by using "clang-cl -Xclang -g ...".
Eventually we should have some dedicated flag for clang-cl.

Can you please take a look at the patch and suggest a good path forward?

I'm very unfamiliar with LLVM CodeGen/MC, so I'm pretty sure I made a
few weird things in the code... I've also put a few TODOs with
questions and suggestions.

Some general questions:
1) Threading flags from the driver down to CodeGen.
  How do we do that? Should we support all 4 combinations
  of no-info/DWARF/CVLT/both?
  How about "-Zmlt" as the clang-cl flag name? ("minimal line tables")

2) Am I right that DWARF is pretty much the only debug info format
  supported by LLVM/AsmPrinter right now? Do we want to take
  an effort to come up with a generic debuginfogenerator interface
  to share between DwarfDebug and WinCodeViewLineTables?
  Then AsmPrinter should just hold a SmallVector<DebugInfoEmitter*>
  rather than a pair of DD/DE pointers.

3) How would you suggest to write WinCodeViewLineTables tests
  given that dumpbin is not available everywhere except Windows?
  // Yeah, I should have looked at the DwarfDebug LIT tests and
  // written some; but the prototype development went faster
  // than I expected...

I found the MCAsmStreamer being used by llc which gives a decent text format.
I wrote a simple x86+x86_64 .ll test and FileCheck expectations for
the llc asm output.
Is this the right approach to write tests?
If so, I'll convert my remaining C program test cases into such an
.ll+llc tests.

Can you suggest ways to split this patch so it's easier
to review part-by-part before this hits trunk?

Attached is an updated patch with a new test and a few minor things improved.
I also removed the "TODO: test on X64" as I did try it on x64 and no
changes were required.

Looking forward to your feedback!

codeview_linetables.patch (42.5 KB)

Hi Timur,

There’s also a pending patch adding CodeView support in Phab: http://llvm-reviews.chandlerc.com/D165

Does your patch provide just a subset of the CodeView debug info provided in the other patch?

Looking at the patch, I think the approach the other patch took of abstracting the emission of debug information is a bit cleaner and it will probably make life easier when adding more debug formats in the future.

I wrote some more lit tests for my patch and realized I was generating
some redundant info. This is fixed now. Attached is a new version
of the prototype patch with some more tests.

Hi Timur,

There's also a pending patch adding CodeView support in Phab:
http://llvm-reviews.chandlerc.com/D165

I haven't looked at your patch yet, but based on the very low phab
review number, I'm pretty sure there are good reasons this wasn't
committed.

Does your patch provide just a subset of the CodeView debug info provided in
the other patch?

Yes. I prefer small incremental changes.
Also, the file:line debug info is much much more important for me atm
than the other types of debug info.

Looking at the patch, I think the approach the other patch took of
abstracting the emission of debug information is a bit cleaner and it will
probably make life easier when adding more debug formats in the future.

Why wasn't the "abstract the emission of DI" part of that patch
reviewed/committed separately?

codeview_linetables.patch (85 KB)

In general I do think we're going to need to abstract it out as much
as possible. I'm not sure what the previous patch looks like, but
abstracting the interface out would be general goodness for this. We
can talk about designs for that as we move on. As far as how to
migrate the decision down we can have it both as an option to code gen
maybe or, for now, make it dependent upon triple. The former is, I
think, the best option there.

I'm not sure if you're going to want to support both debug info at the
same time, but it's a readonly format at debug emission so I don't see
it as being a problem. -Zmlt makes sense as the clang-cl name, or just
make it whatever the debug mode flag is for cl.exe - this is at least
a start down that path.

I think the best way for the tests will be to write a set of parsers,
etc that can dump the info. I realize this is likely a very large
undertaking as well, but it seems like the only way to ensure we're
getting some decent testing out.

Any other questions I missed?

-eric

In general I do think we're going to need to abstract it out as much
as possible. I'm not sure what the previous patch looks like, but
abstracting the interface out would be general goodness for this. We
can talk about designs for that as we move on. As far as how to
migrate the decision down we can have it both as an option to code gen
maybe or, for now, make it dependent upon triple. The former is, I
think, the best option there.

I'm not sure if you're going to want to support both debug info at the
same time, but it's a readonly format at debug emission so I don't see
it as being a problem. -Zmlt makes sense as the clang-cl name, or just
make it whatever the debug mode flag is for cl.exe - this is at least
a start down that path.

I'd just use /Z7, since that's the cl.exe flag for the compatible format.
This will need a long-form clang -cc1 flag name, though.

Attached is a slightly updated patch.
(it doesn't include D2222 yet).

In general I do think we're going to need to abstract it out as much
as possible. I'm not sure what the previous patch looks like, but
abstracting the interface out would be general goodness for this. We
can talk about designs for that as we move on.

How about http://llvm-reviews.chandlerc.com/D2222 ?
// Ha! A lucky number!

As far as how to
migrate the decision down we can have it both as an option to code gen
maybe or, for now, make it dependent upon triple. The former is, I
think, the best option there.

You mean LLVM CodeGen or Clang CodeGen?

Can you suggest a few places in the code where I can find the clues for that?
I'm not yet familiar with this part of the project...

I'm not sure if you're going to want to support both debug info at the
same time, but it's a readonly format at debug emission so I don't see
it as being a problem.

Well, if two debug formats share the same section name, they might
conflict with each other.
I don't think this is the case for Dwarf&CodeView [yet?].

-Zmlt makes sense as the clang-cl name, or just
make it whatever the debug mode flag is for cl.exe - this is at least
a start down that path.

I'd just use /Z7, since that's the cl.exe flag for the compatible format.
This will need a long-form clang -cc1 flag name, though.

Z7 implies a much more complete debug info than what we'll emit short-term.
Since Z7 is not widely used, I don't think threading Z7 is any simpler
than Zmlt.
I don't have a strong opinion though, WDYT?

I think the best way for the tests will be to write a set of parsers,
etc that can dump the info. I realize this is likely a very large
undertaking as well, but it seems like the only way to ensure we're
getting some decent testing out.

As you can see from my patch, I actually succeeded in writing *some*
tests without a dumper - just by using the MCAsmStreamer.
Do you think we should really write a dumper too?
That's kinda hard and we don't plan to fully support the CodeView format yet...

Any other questions I missed?

Please see the TODOs in the attached patch.
You are very likely to come up with a better design/ideas given I'm
new to this part of the codebase.

One particular question I'd like to emphasize is getting a full
filepath for a given MDNode.
As far as I can tell, the metadata for scopes holds pairs of
<filename, directory>, which reflects how DWARF stores them.
However, CodeView stores full paths as entire strings (I admit that
ain't efficient).
Currently, I concat the directory and filename together, but it
a) requires some extra memory management
b) requires special tricks to handle filenames starting from "./", "../", etc.
c) the slashes in the directory name and filename are not consistent on Windows
and is ugly in general.

Do you think it's appropriate to change the scope metadata format to
store <filename, directory, fullpath> instead?
That'd require changing Clang, right?

codeview_linetables.patch (58.6 KB)

My thinking is that it works well with /fallback. We'll make Z7-style
debug info, and for TUs compiled with cl.exe, we'll get the same style info.

Ah, seamlessly integrating with /fallback is a good point. I’m not sure we really want Z7 rather than Zi when we fall back, but let’s start with Z7 and see how it goes.

Eric, David,

Attached is a slightly updated patch.
(it doesn't include D2222 yet).

The new version of the patch stopped fitting into the llvmdev 100K limit,
so I've uploaded it to http://llvm-reviews.chandlerc.com/D2232
(you need to apply D2222 first if you'd like to give D2232 a try)

In general I do think we're going to need to abstract it out as much
as possible. I'm not sure what the previous patch looks like, but
abstracting the interface out would be general goodness for this. We
can talk about designs for that as we move on.

How about http://llvm-reviews.chandlerc.com/D2222 ?
// Ha! A lucky number!

As far as how to
migrate the decision down we can have it both as an option to code gen
maybe or, for now, make it dependent upon triple. The former is, I
think, the best option there.

You mean LLVM CodeGen or Clang CodeGen?

On the second thought, "llc" seems to choose ELF vs COFF by using a triple.
I think we should make the DWARF vs CodeView choice in sync with the
object file format choice, at least to begin with.
WDYT?

Can you suggest a few places in the code where I can find the clues for that?
I'm not yet familiar with this part of the project...

I'm not sure if you're going to want to support both debug info at the
same time, but it's a readonly format at debug emission so I don't see
it as being a problem.

Well, if two debug formats share the same section name, they might
conflict with each other.
I don't think this is the case for Dwarf&CodeView [yet?].

-Zmlt makes sense as the clang-cl name, or just
make it whatever the debug mode flag is for cl.exe - this is at least
a start down that path.

I'd just use /Z7, since that's the cl.exe flag for the compatible format.
This will need a long-form clang -cc1 flag name, though.

Z7 implies a much more complete debug info than what we'll emit short-term.
Since Z7 is not widely used, I don't think threading Z7 is any simpler
than Zmlt.
I don't have a strong opinion though, WDYT?

[I agreed to use Z7 on this thread]

I think the best way for the tests will be to write a set of parsers,
etc that can dump the info. I realize this is likely a very large
undertaking as well, but it seems like the only way to ensure we're
getting some decent testing out.

As you can see from my patch, I actually succeeded in writing *some*
tests without a dumper - just by using the MCAsmStreamer.
Do you think we should really write a dumper too?
That's kinda hard and we don't plan to fully support the CodeView format yet...

I tried my patch on Chromium and it hit the llvm_unreachable I wrote
in WinCOFFStreamer::EmitCOFFStaticOffset().
Now that it also supports using a fixup to calculate the offset (that
happens as the second pass, not supported by MCAsmStreamer, right?), I
think I do have to write at least a simple dumper...

Any hints on how to do that? Should it be a separate app or built into
some COFF reader readily available? Should it have its own tests, i.e.
binary COFF files added to the repo?

Good news - other than that, the emitter seems to work fine on some
medium-sized Chromium tests and generate symbolized ASan reports if I
manually introduce an error in the binary.

Any other questions I missed?

Please see the TODOs in the attached patch.
You are very likely to come up with a better design/ideas given I'm
new to this part of the codebase.

One particular question I'd like to emphasize is getting a full
filepath for a given MDNode.
As far as I can tell, the metadata for scopes holds pairs of
<filename, directory>, which reflects how DWARF stores them.
However, CodeView stores full paths as entire strings (I admit that
ain't efficient).
Currently, I concat the directory and filename together, but it
a) requires some extra memory management
b) requires special tricks to handle filenames starting from "./", "../", etc.
c) the slashes in the directory name and filename are not consistent on Windows
and is ugly in general.

Do you think it's appropriate to change the scope metadata format to
store <filename, directory, fullpath> instead?
That'd require changing Clang, right?

ping

Eric, David,

> Attached is a slightly updated patch.
> (it doesn't include D2222 yet).

The new version of the patch stopped fitting into the llvmdev 100K limit,
so I've uploaded it to http://llvm-reviews.chandlerc.com/D2232
(you need to apply D2222 first if you'd like to give D2232 a try)

>> In general I do think we're going to need to abstract it out as much
>> as possible. I'm not sure what the previous patch looks like, but
>> abstracting the interface out would be general goodness for this. We
>> can talk about designs for that as we move on.
>
> How about http://llvm-reviews.chandlerc.com/D2222 ?
> // Ha! A lucky number!
>
>> As far as how to
>> migrate the decision down we can have it both as an option to code gen
>> maybe or, for now, make it dependent upon triple. The former is, I
>> think, the best option there.
>
> You mean LLVM CodeGen or Clang CodeGen?

On the second thought, "llc" seems to choose ELF vs COFF by using a triple.
I think we should make the DWARF vs CodeView choice in sync with the
object file format choice, at least to begin with.
WDYT?

I think we can use the triple OS to choose a sensible default here (win32
-> CodeView, mingw -> DWARF), but eventually some users might want to force
DWARF on win32 with a flag because it is more complete.

> Can you suggest a few places in the code where I can find the clues for
that?
> I'm not yet familiar with this part of the project...
>
>> I'm not sure if you're going to want to support both debug info at the
>> same time, but it's a readonly format at debug emission so I don't see
>> it as being a problem.
>
> Well, if two debug formats share the same section name, they might
> conflict with each other.
> I don't think this is the case for Dwarf&CodeView [yet?].
>
>> -Zmlt makes sense as the clang-cl name, or just
>> make it whatever the debug mode flag is for cl.exe - this is at least
>> a start down that path.
>
>> I'd just use /Z7, since that's the cl.exe flag for the compatible
format.
>> This will need a long-form clang -cc1 flag name, though.
>
> Z7 implies a much more complete debug info than what we'll emit
short-term.
> Since Z7 is not widely used, I don't think threading Z7 is any simpler
> than Zmlt.
> I don't have a strong opinion though, WDYT?

[I agreed to use Z7 on this thread]

>> I think the best way for the tests will be to write a set of parsers,
>> etc that can dump the info. I realize this is likely a very large
>> undertaking as well, but it seems like the only way to ensure we're
>> getting some decent testing out.
>
> As you can see from my patch, I actually succeeded in writing *some*
> tests without a dumper - just by using the MCAsmStreamer.
> Do you think we should really write a dumper too?
> That's kinda hard and we don't plan to fully support the CodeView format
yet...

I tried my patch on Chromium and it hit the llvm_unreachable I wrote
in WinCOFFStreamer::EmitCOFFStaticOffset().
Now that it also supports using a fixup to calculate the offset (that
happens as the second pass, not supported by MCAsmStreamer, right?), I
think I do have to write at least a simple dumper...

Any hints on how to do that? Should it be a separate app or built into
some COFF reader readily available? Should it have its own tests, i.e.
binary COFF files added to the repo?

Good news - other than that, the emitter seems to work fine on some
medium-sized Chromium tests and generate symbolized ASan reports if I
manually introduce an error in the binary.

>> Any other questions I missed?
>
> Please see the TODOs in the attached patch.
> You are very likely to come up with a better design/ideas given I'm
> new to this part of the codebase.
>
> One particular question I'd like to emphasize is getting a full
> filepath for a given MDNode.
> As far as I can tell, the metadata for scopes holds pairs of
> <filename, directory>, which reflects how DWARF stores them.
> However, CodeView stores full paths as entire strings (I admit that
> ain't efficient).
> Currently, I concat the directory and filename together, but it
> a) requires some extra memory management
> b) requires special tricks to handle filenames starting from "./",
"../", etc.
> c) the slashes in the directory name and filename are not consistent on
Windows
> and is ugly in general.
>
> Do you think it's appropriate to change the scope metadata format to
> store <filename, directory, fullpath> instead?
> That'd require changing Clang, right?

ping

I think the problems you mention can be resolved without changing the DI
metadata format. Debug info metadata already consumes too much memory.

On the second thought, "llc" seems to choose ELF vs COFF by using a triple.
I think we should make the DWARF vs CodeView choice in sync with the
object file format choice, at least to begin with.
WDYT?

On Windows it's perfectly fine to have DWARF encoded into COFF
objects. E.g. in case of mingw32/-w64 triple.

Eric, David,

> Attached is a slightly updated patch.
> (it doesn't include D2222 yet).

The new version of the patch stopped fitting into the llvmdev 100K limit,
so I've uploaded it to http://llvm-reviews.chandlerc.com/D2232
(you need to apply D2222 first if you'd like to give D2232 a try)

>> In general I do think we're going to need to abstract it out as much
>> as possible. I'm not sure what the previous patch looks like, but
>> abstracting the interface out would be general goodness for this. We
>> can talk about designs for that as we move on.
>
> How about http://llvm-reviews.chandlerc.com/D2222 ?
> // Ha! A lucky number!
>
>> As far as how to
>> migrate the decision down we can have it both as an option to code gen
>> maybe or, for now, make it dependent upon triple. The former is, I
>> think, the best option there.
>
> You mean LLVM CodeGen or Clang CodeGen?

On the second thought, "llc" seems to choose ELF vs COFF by using a
triple.
I think we should make the DWARF vs CodeView choice in sync with the
object file format choice, at least to begin with.
WDYT?

On Windows it's perfectly fine to have DWARF encoded into COFF
objects. E.g. in case of mingw32/-w64 triple.

Yeah, I actually meant to say "in sync with the triple OS", as Reid suggests.

I think we can use the triple OS to choose a sensible default here (win32 ->
CodeView, mingw -> DWARF),

Yep

but eventually some users might want to force
DWARF on win32 with a flag because it is more complete.

Do you think this is worth doing immediately or this can wait?

Let's wait.

Eric, David,

Attached is a slightly updated patch.
(it doesn't include D2222 yet).

The new version of the patch stopped fitting into the llvmdev 100K limit,
so I've uploaded it to http://llvm-reviews.chandlerc.com/D2232
(you need to apply D2222 first if you'd like to give D2232 a try)

That's a big patch. :slight_smile:

In general I do think we're going to need to abstract it out as much
as possible. I'm not sure what the previous patch looks like, but
abstracting the interface out would be general goodness for this. We
can talk about designs for that as we move on.

How about http://llvm-reviews.chandlerc.com/D2222 ?
// Ha! A lucky number!

As far as how to
migrate the decision down we can have it both as an option to code gen
maybe or, for now, make it dependent upon triple. The former is, I
think, the best option there.

You mean LLVM CodeGen or Clang CodeGen?

LLVM... and we really need to change that directory in Clang. :slight_smile:

On the second thought, "llc" seems to choose ELF vs COFF by using a triple.
I think we should make the DWARF vs CodeView choice in sync with the
object file format choice, at least to begin with.
WDYT?

Yep. Sounds good to me. I'd suggest triple, that way you can base it
on "windows" versus "coff".

Can you suggest a few places in the code where I can find the clues for that?
I'm not yet familiar with this part of the project...

lib/CodeGen/AsmPrinter/DwarfDebug.cpp has some checks for triple/os.

I tried my patch on Chromium and it hit the llvm_unreachable I wrote
in WinCOFFStreamer::EmitCOFFStaticOffset().
Now that it also supports using a fixup to calculate the offset (that
happens as the second pass, not supported by MCAsmStreamer, right?), I
think I do have to write at least a simple dumper...

Any hints on how to do that? Should it be a separate app or built into
some COFF reader readily available? Should it have its own tests, i.e.
binary COFF files added to the repo?

Probably the same structure that we've been going down via
lib/DebugInfo/. A set of files than handle reading and parsing and
both some binary files and some files produced by the backend.

Good news - other than that, the emitter seems to work fine on some
medium-sized Chromium tests and generate symbolized ASan reports if I
manually introduce an error in the binary.

Awesome.

Any other questions I missed?

Please see the TODOs in the attached patch.
You are very likely to come up with a better design/ideas given I'm
new to this part of the codebase.

Will do.

One particular question I'd like to emphasize is getting a full
filepath for a given MDNode.
As far as I can tell, the metadata for scopes holds pairs of
<filename, directory>, which reflects how DWARF stores them.
However, CodeView stores full paths as entire strings (I admit that
ain't efficient).
Currently, I concat the directory and filename together, but it
a) requires some extra memory management
b) requires special tricks to handle filenames starting from "./", "../", etc.
c) the slashes in the directory name and filename are not consistent on Windows
and is ugly in general.

Do you think it's appropriate to change the scope metadata format to
store <filename, directory, fullpath> instead?
That'd require changing Clang, right?

It would require changing clang. Since this is a single string per
file I'm not against adding the full path to the source file in the
metadata along side the basename/compilation dir pair.

-eric

> Eric, David,
>
>> Attached is a slightly updated patch.
>> (it doesn't include D2222 yet).
>
> The new version of the patch stopped fitting into the llvmdev 100K limit,
> so I've uploaded it to http://llvm-reviews.chandlerc.com/D2232
> (you need to apply D2222 first if you'd like to give D2232 a try)
>

That's a big patch. :slight_smile:

Mostly because of the tests, I hope :slight_smile:

At least I already committed some obviously-independent parts of it...

>>> In general I do think we're going to need to abstract it out as much
>>> as possible. I'm not sure what the previous patch looks like, but
>>> abstracting the interface out would be general goodness for this. We
>>> can talk about designs for that as we move on.
>>
>> How about http://llvm-reviews.chandlerc.com/D2222 ?
>> // Ha! A lucky number!
>>
>>> As far as how to
>>> migrate the decision down we can have it both as an option to code gen
>>> maybe or, for now, make it dependent upon triple. The former is, I
>>> think, the best option there.
>>
>> You mean LLVM CodeGen or Clang CodeGen?
>

LLVM... and we really need to change that directory in Clang. :slight_smile:

> On the second thought, "llc" seems to choose ELF vs COFF by using a
triple.
> I think we should make the DWARF vs CodeView choice in sync with the
> object file format choice, at least to begin with.
> WDYT?
>

Yep. Sounds good to me. I'd suggest triple, that way you can base it
on "windows" versus "coff".

>> Can you suggest a few places in the code where I can find the clues for
that?
>> I'm not yet familiar with this part of the project...
>>

lib/CodeGen/AsmPrinter/DwarfDebug.cpp has some checks for triple/os.

OK, I use the Triple().isOSWindows() now.

>
> I tried my patch on Chromium and it hit the llvm_unreachable I wrote
> in WinCOFFStreamer::EmitCOFFStaticOffset().
> Now that it also supports using a fixup to calculate the offset (that
> happens as the second pass, not supported by MCAsmStreamer, right?), I
> think I do have to write at least a simple dumper...
>
> Any hints on how to do that? Should it be a separate app or built into
> some COFF reader readily available? Should it have its own tests, i.e.
> binary COFF files added to the repo?
>

Probably the same structure that we've been going down via
lib/DebugInfo/. A set of files than handle reading and parsing and
both some binary files and some files produced by the backend.

Ooph, that's a big one.
Any more tips to prioritize diving into this codebase?

> Good news - other than that, the emitter seems to work fine on some
> medium-sized Chromium tests and generate symbolized ASan reports if I
> manually introduce an error in the binary.
>

Awesome.

>>> Any other questions I missed?
>>
>> Please see the TODOs in the attached patch.
>> You are very likely to come up with a better design/ideas given I'm
>> new to this part of the codebase.
>>

Will do.

Ping?

>> One particular question I'd like to emphasize is getting a full
>> filepath for a given MDNode.
>> As far as I can tell, the metadata for scopes holds pairs of
>> <filename, directory>, which reflects how DWARF stores them.
>> However, CodeView stores full paths as entire strings (I admit that
>> ain't efficient).
>> Currently, I concat the directory and filename together, but it
>> a) requires some extra memory management
>> b) requires special tricks to handle filenames starting from "./",
"../", etc.
>> c) the slashes in the directory name and filename are not consistent on
Windows
>> and is ugly in general.
>>
>> Do you think it's appropriate to change the scope metadata format to
>> store <filename, directory, fullpath> instead?
>> That'd require changing Clang, right?
>

It would require changing clang. Since this is a single string per
file I'm not against adding the full path to the source file in the
metadata along side the basename/compilation dir pair.

OK, will go down this path.

>

Probably the same structure that we've been going down via
lib/DebugInfo/. A set of files than handle reading and parsing and
both some binary files and some files produced by the backend.

Ooph, that's a big one.
Any more tips to prioritize diving into this codebase?

It mostly uses libObject to do the initial read and parse of the
object file. Past that I'd suggest adding a parallel set of
context/structure like the dwarf one. Ultimately it's going to look
somewhat similar.

>> Please see the TODOs in the attached patch.
>> You are very likely to come up with a better design/ideas given I'm
>> new to this part of the codebase.
>>

Will do.

Ping?

Can you rebase after the other patches and I'll take a more careful look?

-eric

>> >
>>
>> Probably the same structure that we've been going down via
>> lib/DebugInfo/. A set of files than handle reading and parsing and
>> both some binary files and some files produced by the backend.
>
>
> Ooph, that's a big one.
> Any more tips to prioritize diving into this codebase?
>

It mostly uses libObject to do the initial read and parse of the
object file. Past that I'd suggest adding a parallel set of
context/structure like the dwarf one. Ultimately it's going to look
somewhat similar.

>> >> Please see the TODOs in the attached patch.
>> >> You are very likely to come up with a better design/ideas given I'm
>> >> new to this part of the codebase.
>> >>
>>
>> Will do.
>
>
> Ping?
>

Can you rebase after the other patches and I'll take a more careful look?

Which ones?
It's already rebased on ToT.

-eric

>> >
>>
>> Probably the same structure that we've been going down via
>> lib/DebugInfo/. A set of files than handle reading and parsing and
>> both some binary files and some files produced by the backend.
>
>
> Ooph, that's a big one.
> Any more tips to prioritize diving into this codebase?
>

It mostly uses libObject to do the initial read and parse of the
object file. Past that I'd suggest adding a parallel set of
context/structure like the dwarf one. Ultimately it's going to look
somewhat similar.

>> >> Please see the TODOs in the attached patch.
>> >> You are very likely to come up with a better design/ideas given I'm
>> >> new to this part of the codebase.
>> >>
>>
>> Will do.
>
>
> Ping?
>

Can you rebase after the other patches and I'll take a more careful look?

Which ones?
It's already rebased on ToT.

Thought you had it in phabricator somewhere... if not, where is the
current patch again? :slight_smile:

-eric

http://llvm-reviews.chandlerc.com/D2232