Query/Suggestions on upgrading macro infrastructure.

Hello Everyone,

I would like to have your thoughts on this.

Overview of the problem

Hello Everyone,

I would like to have your thoughts on this.

Overview of the problem

===================

While implementing support for the DWARFv5 debug_macro section support in LLVM. I came across these holes:

  • The macros infrastructure in CLANG/LLVM is inherently tied to a particular version(v4 macinfo). For instance, consider this snippet from CLANG:

Gen->getCGDebugInfo()->CreateMacro(getCurrentScope(), llvm::dwarf::DW_MACINFO_define, location, Name.str(), Value.str()).

  • So, let’s say you want to add support for a new form /DW_MACRO_define_strx/, with existing infrastructure, you can’t do this without polluting code with lot of if-else. This is just to highlight the problem, there are numerous areas where we will have to add if-else.
  • DWARF forms[DW_MACINFO*] are unnecessarily hard-coded in CLANG, resulting in a false dependence of CLANG/LLVM on entire generation phase of debug info for macros, even when they don’t have to.
  • To worsen the problem even more, we have some DWARF forms sharing same encoding. Currently, we have 4-forms that share same encoding with previous DWARFv4.

4 forms that share the same encoding in DWARFv4 - you mean the define, define_strp, define_strx, and define_sup?

I think that the solution here is probably not to change the IR representation - IR doesn’t have the semantics that necessitate strip/strx/sup encodings (there’s no indexed string section at the IR level - string sharing is a lower level implementation detail at the IR Level that’s not apparent at the IR Level). These are choices the backend would make about how to efficiently encode the strings into the final output.

All this is to say that I agree with you about the semantics of your proposal, but I think the /syntax/ of the IR can remain the same, just with the semantics you’re describing. (ie: that DW_MACINFO_define in the IR doesn’t necessarily mean you’ll get a DW_MACINFO_define in the output (you might get a DW_MACINFO_define, maybe a DW_MACINFO_define_strx, etc, etc… )

Hi David,

Thanks for sharing your thoughts/suggestions on this.

4 forms that share the same encoding in DWARFv4 - you mean the define, define_strp, define_strx, and define_sup?

These are:
DW_MACINFO_define = DW_MACRO_define
DW_MACINFO_undef = DW_MACRO_undef
DW_MACINFO_start_file = DW_MACRO_start_file
DW_MACINFO_end_file = DW_MACRO_end_file

I think that the solution here is probably not to change the IR representation - IR doesn’t have the semantics that necessitate strip/strx/sup encodings (there’s no indexed string section at the IR level - string sharing is a lower level implementation detail at the IR Level that’s not apparent at the IR Level). These are choices the backend would make about how to efficiently encode the strings into the final output.

Agreed. IR doesn’t have the semantics that necessitate strp/strx/sup encodings, and don’t have too. However considering the most productive macro-string representation form /DW_FORM_define_strx/ and taking the existing infrastructure, emission psuedo code will be :

if (Version == 4 && getMacinfoType() == DW_MACINFO_define)
emit DW_MACINFO_define
if (Version == 5 && getMacinfoType() == DW_MACINFO_define)
emit DW_MACRO_define_strx

Now here the presence of “DW_MACINFO_define” here seems unfair/odd, may confuse the user/developer.

With new types, we can have much cleaner code:

if (getGenericMacroType() == MACRO_definition) /* IR types doesn’t show/share any false dependence on how back-end DWARF or other will consume this*/
if (Version == 5) emit “DW_MACRO_define_strx”
else emit “DW_MACRO_define”

Even at the IR level, user will have confusion. When in .ll file we have “DW_MACINFO_define” and subsequent binary contains “DW_MACRO_define_strx”. This again, is due to the fact that “DW_MACINFO_define” is defined in the DWARF spec, resulting in a false dependence/representation. At least visible to user familiar with DWARF.

Yet another motivating example: Exactly same thing happens with “DW_MACINFO_start_file”. We’ll have “DW_MACINFO_start_file” in .ll file and subsequent binary will contain “DW_MACRO_start_file”. Consider this snippet from llvm:

auto *MF = DIMacroFile::get(VMContext, dwarf::DW_MACINFO_start_file, TMF->getLine(), TMF->getFile(), getOrCreateMacroArray(I.second.getArrayRef()));
replaceTemporary(llvm::TempDIMacroNode(TMF), MF);

Now based on this, emission psuedo code will be:

if (Version == 4 && getMacinfoType() == DW_MACINFO_start_file)
emit “DW_MACINFO_start_file”
if (Version == 5 && getMacinfoType() == DW_MACINFO_start_file)
emit “DW_MACRO_start_file”

All this is to say that I agree with you about the semantics of your proposal, but I think the /syntax/ of the IR can remain the same, just with the semantics you’re describing. (ie: that DW_MACINFO_define in the IR doesn’t necessarily mean you’ll get a DW_MACINFO_define in the output (you might get a DW_MACINFO_define, maybe a DW_MACINFO_define_strx, etc, etc… )

To summarize: Adding new Generic macro type to metadata IR, and removing old ones seems a clean and promising solution. This also enforces/depicts that IR metadata is not dependent on the target consumer DWARF or other.

Thanks,
Sourabh.

Hi David,

Thanks for sharing your thoughts/suggestions on this.

4 forms that share the same encoding in DWARFv4 - you mean the define, define_strp, define_strx, and define_sup?

These are:
DW_MACINFO_define = DW_MACRO_define
DW_MACINFO_undef = DW_MACRO_undef
DW_MACINFO_start_file = DW_MACRO_start_file
DW_MACINFO_end_file = DW_MACRO_end_file

I think that the solution here is probably not to change the IR representation - IR doesn’t have the semantics that necessitate strip/strx/sup encodings (there’s no indexed string section at the IR level - string sharing is a lower level implementation detail at the IR Level that’s not apparent at the IR Level). These are choices the backend would make about how to efficiently encode the strings into the final output.

Agreed. IR doesn’t have the semantics that necessitate strp/strx/sup encodings, and don’t have too. However considering the most productive macro-string representation form /DW_FORM_define_strx/ and taking the existing infrastructure, emission psuedo code will be :

if (Version == 4 && getMacinfoType() == DW_MACINFO_define)
emit DW_MACINFO_define
if (Version == 5 && getMacinfoType() == DW_MACINFO_define)
emit DW_MACRO_define_strx

Now here the presence of “DW_MACINFO_define” here seems unfair/odd, may confuse the user/developer.

With new types, we can have much cleaner code:

if (getGenericMacroType() == MACRO_definition) /* IR types doesn’t show/share any false dependence on how back-end DWARF or other will consume this*/
if (Version == 5) emit “DW_MACRO_define_strx”
else emit “DW_MACRO_define”

I don’t think that adding a non-DWARF name can make the code cleaner than reusing the DWARF name for both v4 and v5.

Either you have to translate from generic to v4 and generic to v5, or you have to translate from v4 to v5, and do nothing for v4 to v4. It seems strictly simpler (you could for symmetry always do a v4 to v4 translation too, but I doubt that’d be necessary).

I’d probably leave the existing code mostly as-is with just a couple of ‘if’ statements:

bool UseMacinfo = getDwarfVersion() >= 5;
unsigned MacInfoType = M.getMacinfoType();
assert(MacinfoType == DW_MACINFO_define || MacinfoType == DW_MACINFO_undef);
// alternatively this could use a map lookup or other mechanism for mapping from DWARFv4 to DWARFv5 encodings
Asm->EmitULEB128(MacinfoType + (UseMacInfo ? (DW_MACRO_define_strx - 1) : 0);
Asm->EmitULEB128(M.getLine());
StringRef Name = M.getName();
StringRef Value = M.getValue();
if (UseMacinfo) {
DwarfFile &Holder = useSplitDwarf() ? SkeletonHolder : InfoHolder;
Asm->EmitULEB128(DG.getStringPool().getEntry(*DG.getAsmPrinter(), (Name + ’ ’ + Value).str()));
return;
}
Asm->OutStreamer->EmitBytes(Name);
if (!Value.empty()) {
// There should be one space between macro name and macro value.
Asm->emitInt8(’ ‘);
Asm->OutStreamer->EmitBytes(Value);
}
Asm->emitInt8(’\0’);