[DWARFv5] Assembler syntax for new line-table features

TL;DR: If I'm trying to define new assembler directive syntax to
support DWARF v5, it seems like a good idea for all the various
assemblers out there in the world to support the same syntax.
How would I go about negotiating that syntax with other assembler
providers? Is GNU as the only really relevant one?

Long version:

DWARF v5 introduces a couple of new features in the .debug_line section
that require assembler syntax, because the information relates to the
files read by the compiler and there's no other way to inform the
assembler.

The two bits of information are:
(1) the MD5 checksum of each source file; and
(2) the primary source filename.

The primary source filename is given in the .debug_info section. In
DWARF v5 this is repeated in the .debug_line section; prior to DWARF v5
it is not. In both cases, file number 0 refers to this file. Because
the compiler emits the .debug_info section directly, the assembler is
not aware of the name of the primary source file without some new syntax
to provide that information. It also needs the MD5 checksum, so
relying on the old-format '.file' directive is insufficient.

I've added support in LLVM for both of these features, but somewhat
arbitrarily defined assembler syntax to support them. Obviously if
implementers of other assemblers also want to support DWARF v5, the same
information will have to be represented with assembler syntax somehow,
and of course it would be best if all assemblers supporting DWARF v5
used the same syntax. But I don't know how to go about doing that.

Any advice would be welcome.
Thanks,
--paulr

+Eric Christopher +Adrian Prantl +Jonas Devlieghere (seems Jonas is doing a bunch of debug info work - guessing he’s working with you, Adrian?)

I’m guessing Eric’s the most likely to have contacts over in GCC land to maybe bridge the gap when talking about assembly syntax across the two. Eric - any ideas how best to negotiate this pseudo-standard? (there’s another feature or two I’d like to propose too - at least to standardize what the syntax /should/ be, even if gas doesn’t support it immediately)

Paul - perhaps a brief description of the proposed syntax would be helpful to get the ball rolling (even if it’s just discussing it amongst ourselves before it ends up in a cross-project discussion).

- Dave

To pass the MD5 checksum to the assembler, I added a new optional clause to the .file directive:

md5 “checksum”

where checksum is the 16-byte checksum in hex. It’s quoted because the assembler doesn’t have a way to parse a 16-byte integer. Also this is the same syntax Reid invented for the CodeView equivalent.
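As a rough illustration of where the checksum text comes from, here is a small Python sketch (not part of the LLVM patch) that computes the MD5 of a source file's contents and formats a .file line with the quoted md5 clause. The helper name file_directive and the exact placement of the clause after the filename are assumptions made for the example.

```python
import hashlib


def file_directive(file_number, filename, source_bytes):
    """Format a .file directive with a quoted MD5 clause (hypothetical layout)."""
    # An MD5 digest is 16 bytes, which prints as 32 hex digits.
    checksum = hashlib.md5(source_bytes).hexdigest()
    return f'.file {file_number} "{filename}" md5 "{checksum}"'


# Example: the compiler would hash the bytes of the source file it actually read.
print(file_directive(1, "hello.c", b"int main(void) { return 0; }\n"))
```

The point is that the checksum is computed by whoever read the source, and the assembler only has to copy the 16 bytes through.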

To convey the root source filename, I allow the .file directive to use file number 0. There is special handling in the AsmParser to accept “.file 0” even when we’re not actually emitting DWARF 5; the root source file is kept in a separate field and not in the normal file table. If MC does emit a v5 .debug_line section, then it emits that file entry first, before the rest of the file table.

I’ve addressed the latest can’t-build-Linux revert of my patch by suppressing both the md5 clause and ‘.file 0’ for pre-v5. That way the feature is there for people experimenting with v5, but should not interfere with anybody else. I’ll commit that later this morning.

Regarding the discussion, it might be that dwarf-discuss is a better venue, because GCC people will be on that list who care about DWARF. Let me know what you think.

--paulr

To pass the MD5 checksum to the assembler, I added a new optional clause to the .file directive:

md5 “checksum”

where checksum is the 16-byte checksum in hex. It’s quoted because the assembler doesn’t have a way to parse a 16-byte integer.

I’d guess, long-term, that’s probably not a great motivation for choosing pseudo-standardized syntax.

Also this is the same syntax Reid invented for the CodeView equivalent.

To convey the root source filename, I allow the file number on the .file directive to have file number 0. There is special handling in the AsmParser to allow accepting “.file 0” when we’re not actually emitting DWARF 5, the root source file is kept in a separate field and not in the normal file table. If MC does emit a v5 .debug_line section, then it dumps that file entry first before the rest of the file table.

So .file 0 is accepted and ignored pre-5? & that’s to support some weird/old assembly?

I’ve addressed the latest can’t-build-Linux revert of my patch by suppressing both the md5 clause and ‘.file 0’ for pre-v5. That way the feature is there for people experimenting with v5, but should not interfere with anybody else. I’ll commit that later this morning.

Regarding the discussion, it might be that dwarf-discuss is a better venue, because GCC people will be on that list who care about DWARF. Let me know what you think.

Yeah, I’m guessing that might be useful - could see how this conversation goes for a little bit.

- Dave

To pass the MD5 checksum to the assembler, I added a new optional
clause to the .file directive:
md5 "checksum"
where checksum is the 16-byte checksum in hex. It's quoted because
the assembler doesn't have a way to parse a 16-byte integer.

I'd guess, long-term, that's probably not a great motivation for
choosing pseudo-standardized syntax.

? anyone compiling source to asm needs to inform the assembler about the
checksum, because the assembler might not have access to the original
source when it runs. This will be a common problem across all assemblers
that speak DWARF v5.

To convey the root source filename, I allow the file number on the
.file directive to have file number 0. There is special handling in
the AsmParser to allow accepting ".file 0" when we're not actually
emitting DWARF 5, the root source file is kept in a separate field
and not in the normal file table. If MC does emit a v5 .debug_line
section, then it dumps that file entry first before the rest of the
file table.

So .file 0 is accepted and ignored pre-5? & that's to support some
weird/old assembly?

No, that was just expressed badly. File #0 is stashed in a separate
field, which used to be just the compilation dir. The '.file 0'
parsing stuffs the info there without bothering to check the DWARF
version. When the assembler finally emits the line table, if it's a
v5 line table then the root source file comes out first. If it's a
v4 line table, the root source file is not emitted. (Note the root
source file is still available from the .debug_info section.)

It would be feasible to reject '.file 0' unless the user requested
DWARF v5 specifically. Similarly we could reject the md5 clause.
On the other hand, accepting and ignoring means an assembler file
generated for DWARF v5 could be re-assembled for DWARF v4 without
having to hand-modify the assembler source, so it seemed better to
silently accept the syntax regardless of DWARF version.
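
The accept-and-stash behavior described above can be modelled roughly as follows. This is a Python sketch, not LLVM's MC code; LineTableModel, add_file, and emit_file_table are hypothetical names chosen for the illustration.

```python
class LineTableModel:
    """Toy model of the file table handling (hypothetical, not MC's API)."""

    def __init__(self):
        self.root_file = None  # file #0, kept apart from the normal file table
        self.files = {}        # file number (>= 1) -> file name

    def add_file(self, number, name):
        # '.file 0' is always accepted; no DWARF version check at parse time.
        if number == 0:
            self.root_file = name
        else:
            self.files[number] = name

    def emit_file_table(self, dwarf_version):
        names = [self.files[n] for n in sorted(self.files)]
        # Only a v5 line table lists the root source file, and it comes first.
        if dwarf_version >= 5 and self.root_file is not None:
            return [self.root_file] + names
        return names


table = LineTableModel()
table.add_file(0, "main.c")
table.add_file(1, "util.h")
print(table.emit_file_table(5))
print(table.emit_file_table(4))
```

The same input assembles under either version; the version only decides whether the stashed entry is written out.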

--paulr

To pass the MD5 checksum to the assembler, I added a new optional
clause to the .file directive:
md5 “checksum”
where checksum is the 16-byte checksum in hex. It’s quoted because
the assembler doesn’t have a way to parse a 16-byte integer.

I’d guess, long-term, that’s probably not a great motivation for
choosing pseudo-standardized syntax.

? anyone compiling source to asm needs to inform the assembler about the
checksum, because the assembler might not have access to the original
source when it runs. This will be a common problem across all assemblers
that speak DWARF v5.

Sorry, I meant specifically the choice for it to be quoted rather than a bare hex literal.

To convey the root source filename, I allow the file number on the
.file directive to have file number 0. There is special handling in
the AsmParser to allow accepting “.file 0” when we’re not actually
emitting DWARF 5, the root source file is kept in a separate field
and not in the normal file table. If MC does emit a v5 .debug_line
section, then it dumps that file entry first before the rest of the
file table.

So .file 0 is accepted and ignored pre-5? & that’s to support some
weird/old assembly?

No, that was just expressed badly. File #0 is stashed in a separate
field, which used to be just the compilation dir. The ‘.file 0’
parsing stuffs the info there without bothering to check the DWARF
version. When the assembler finally emits the line table, if it’s a
v5 line table then the root source file comes out first. If it’s a
v4 line table, the root source file is not emitted. (Note the root
source file is still available from the .debug_info section.)

It would be feasible to reject ‘.file 0’ unless the user requested
DWARF v5 specifically. Similarly we could reject the md5 clause.
On the other hand, accepting and ignoring means an assembler file
generated for DWARF v5 could be re-assembled for DWARF v4 without
having to hand-modify the assembler source, so it seemed better to
silently accept the syntax regardless of DWARF version.

Ah, so the point you were making was that these things were accepted and ignored, rather than rejected, when in earlier dwarf version modes.

Makes sense to me - generally you don’t even pass a dwarf flag to the assembler, right? If it contains line directives, the assembler produces a line table, if it doesn’t, it doesn’t, yeah? Flags passed are to tweak that behavior - to a non-default format (more recent (v5) in this case).

Hi,

Sorry for the late reply, I was OOO last week.

+Eric Christopher +Adrian Prantl +Jonas Devlieghere (seems Jonas is doing a bunch of debug info work - guessing he's working with you, Adrian?)

Yup, I joined Adrian’s team last August.

I'm guessing Eric's the most likely to have contacts over in GCC land to maybe bridge the gap when talking about assembly syntax across the two. Eric - any ideas how best to negotiate this pseudo-standard? (there's another feature or two I'd like to propose too - at least to standardize what the syntax /should/ be, even if gas doesn't support it immediately)

Can we discuss this on the GCC mailing list? I guess if we both use the same syntax it’s more likely others will follow.

> I'm guessing Eric's the most likely to have contacts over in GCC land to
> maybe bridge the gap when talking about assembly syntax across the two.
> Eric - any ideas how best to negotiate this pseudo-standard? (there's
> another feature or two I'd like to propose too - at least to standardize
> what the syntax /should/ be, even if gas doesn't support it immediately)
>
> Can we discuss this on the GCC mailing list? I guess if we both use the
> same syntax it’s more likely others will follow.

My inclination was the dwarf-discuss mailing list, which has members
from the gcc community and others. But if people really prefer a
gcc-specific list, after which we simply impose a decision on everyone
else, let me know the address.

> Sorry, I meant specifically the choice for it to be quoted rather than a
> bare hex literal.
>
> +1, I’m wondering the same.

I assumed that LLVM's AsmParser could not accept a 128-bit hex literal,
given that the CodeView equivalent used quotes. That's the only reason.
More digging turns up the .octa directive which demonstrates the quotes
are not necessary. I'll take an action item to make the 'md5' option
take a bare 128-bit number.
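
To make the two spellings concrete, here is a small Python sketch (an illustration only, not AsmParser code) that accepts either a quoted hex string or a bare 0x literal and produces the same 16 checksum bytes. The function name parse_md5_operand is hypothetical.

```python
def parse_md5_operand(text):
    """Return the 16 checksum bytes from a quoted-hex or bare 0x operand."""
    text = text.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        # Quoted form: 32 hex digits between the quotes.
        digest = bytes.fromhex(text[1:-1])
    elif text.lower().startswith("0x"):
        # Bare form: one 128-bit integer, stored big-endian so the bytes
        # come out in the same order as the hex digits are written.
        try:
            digest = int(text, 16).to_bytes(16, "big")
        except OverflowError:
            raise ValueError("md5 value does not fit in 128 bits")
    else:
        raise ValueError("expected a quoted hex string or a 0x literal")
    if len(digest) != 16:
        raise ValueError("md5 checksum must be exactly 16 bytes")
    return digest


quoted = parse_md5_operand('"d41d8cd98f00b204e9800998ecf8427e"')
bare = parse_md5_operand("0xd41d8cd98f00b204e9800998ecf8427e")
print(quoted == bare)
```

One difference worth noting: with a bare number, leading zero digits carry no information, so the parser has to pad to 16 bytes itself.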
--paulr

I’m guessing Eric’s the most likely to have contacts over in GCC land to
maybe bridge the gap when talking about assembly syntax across the two.
Eric - any ideas how best to negotiate this pseudo-standard? (there’s
another feature or two I’d like to propose too - at least to standardize
what the syntax /should/ be, even if gas doesn’t support it immediately)

We can do either here if we mostly care about gcc compatibility. And honestly if you’re talking about gas then you should loop in binutils rather than gcc :)

Can we discuss this on the GCC mailing list? I guess if we both use the
same syntax it’s more likely others will follow.

My inclination was the dwarf-discuss mailing list, which has members
from the gcc community and others. But if people really prefer a
gcc-specific list, after which we simply impose a decision on everyone
else, let me know the address.

dwarf-discuss works fine, or we could just loop in a discussion with Jakub. Either way. dwarf-discuss is going to be a little more unwieldy just because of the (in some ways) larger community.

Sorry, I meant specifically the choice for it to be quoted rather than a
bare hex literal.

+1, I’m wondering the same.

I assumed that LLVM’s AsmParser could not accept a 128-bit hex literal,
given that the CodeView equivalent used quotes. That’s the only reason.
More digging turns up the .octa directive which demonstrates the quotes
are not necessary. I’ll take an action item to make the ‘md5’ option
take a bare 128-bit number.

Thanks.

-eric