[Proposal][Debuginfo] dsymutil-like tool for ELF.

Ah, yeah - that seems like a missed opportunity - duplicating the whole type DIE. LTO does this by making monolithic types - merging all the members from different definitions of the same type into one, but that’s maybe too expensive for dsymutil (might still be interesting to know how much more expensive, etc). But I think the other way to go would be to produce a declaration of the type, with the relevant members - and let the DWARF consumer identify this declaration as matching up with the earlier definition. That’s the sort of DWARF you get from the non-MachO default -fno-standalone-debug anyway, so it’s already pretty well tested/supported (support in lldb’s a bit younger/more work-in-progress, admittedly). I wonder how much dsym size there is that could be reduced by such an implementation.
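
To make that concrete, here is a small illustration (an invented example, not taken from the thread; the DIE layout in the comments is schematic rather than an actual dwarfdump):

    // point.h - a type included by several translation units (invented example).
    struct Point {
      int x;
      int y;
      int length_squared() const { return x * x + y * y; }
    };

    // With the "declaration with the relevant members" approach, only the CU
    // that homes Point would keep the full definition, roughly:
    //
    //   DW_TAG_structure_type "Point"              (definition)
    //     DW_TAG_member "x"
    //     DW_TAG_member "y"
    //     DW_TAG_subprogram "length_squared"
    //
    // while a CU that only touches Point::x would get just:
    //
    //   DW_TAG_structure_type "Point", DW_AT_declaration
    //     DW_TAG_member "x"
    //
    // and the DWARF consumer matches the declaration to the earlier definition
    // by name, as it already does for -fno-standalone-debug style output.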

I see. Yes, that could be done, and I think it would result in a noticeable size reduction (I do not know the exact numbers at the moment).

I am working on the multi-threaded DWARFLinker now, and its first version will do exactly the same type processing as the current dsymutil.

Yeah, best to keep the behavior the same through that

The above scheme could be implemented as a next step, and it would result in a better size reduction than the current state.

But I think an even better scheme could be done, and it would result in an even bigger size reduction and faster execution. This scheme is similar to what you've described above: "LTO does this by making monolithic types - merging all the members from different definitions of the same type into one".

I believe the reason that’s probably not been done is that it can’t be streamed - it’d lead to buffering more of the output

Yes. The fact that DWARF should be streamed into AsmPrinter complicates parallel DWARF generation. In my prototype, I generate several resulting files (one per source compilation unit) and then sequentially glue them into the final resulting file.
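
Roughly, that prototype shape could look like the following sketch (the helper emitOneCU and all other names are hypothetical, not the actual prototype code): each CU is emitted into its own buffer in parallel, and the buffers are then concatenated in order.

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <thread>
    #include <vector>

    // Stand-in for "clone one compile unit and stream it through its own
    // AsmPrinter"; here it just returns the bytes produced for that CU.
    std::string emitOneCU(std::size_t CUIndex) {
      return "<bytes of CU " + std::to_string(CUIndex) + ">";
    }

    // Each CU is generated independently in parallel, then the per-CU results
    // are glued together sequentially into the final output file.
    void emitAllCUs(std::size_t NumCUs, const std::string &OutPath) {
      std::vector<std::string> Buffers(NumCUs);
      std::vector<std::thread> Workers;
      for (std::size_t I = 0; I < NumCUs; ++I)
        Workers.emplace_back([&Buffers, I] { Buffers[I] = emitOneCU(I); });
      for (std::thread &W : Workers)
        W.join();

      std::ofstream Out(OutPath, std::ios::binary);
      for (const std::string &Buf : Buffers)
        Out.write(Buf.data(), static_cast<std::streamsize>(Buf.size()));
    }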

How does that help? Do you use relocations in those intermediate object files so the DWARF in them can refer across files?

It does not help with referring across files. It helps to parallelize the generation of CU bodies. It is not possible to write two CUs into one AsmPrinter in parallel, so to make parallel generation possible I stream them into different AsmPrinters. (This comment was about "I believe the reason that's probably not been done is that it can't be streamed", which was initially about referring across files, but it seems I took it in another direction.)

Oh, I see - thanks for explaining, essentially buffering on-disk.

(if two of these expandable types were in one CU - the start of the second type couldn’t be known until the end because it might keep getting pushed later due to expansion of the first type) and/or having to revisit all the type references (the offset to the second type wouldn’t be known until the end - so writing the offsets to refer to the type would have to be deferred until then).

That is the second problem: offsets are not known until the end of the file. dsymutil already has that situation for inter-CU references, so it has an extra pass to fix up offsets.

Oh, it does? I figured it was one-pass, and that it only ever refers back to types in previous CUs? So it doesn't have to go back and do a second pass. But I guess if it sees a declaration of T1 in CU1, then later on sees a definition of T1 in CU2, does it somehow go back to CU1 and remove the declaration/make references refer to the definition in CU2? I figured it'd just leave the declaration and references to it as-is, then add the definition and use that from CU2 onwards?

For the processing of types, it does not go back. Both of these - "I figured it was one-pass, and that it only ever refers back to types in previous CUs" and "I figured it'd just leave the declaration and references to it as-is, then add the definition and use that from CU2 onwards" - are correct.

Great - thanks for explaining/confirming!

With a multi-threaded implementation such a situation would arise more often for type references, so more offsets would have to be fixed during the additional pass.

DWARFLinker could create an additional artificial compile unit and put all the merged types there, then patch all type references to point into this additional compilation unit. No bits would be duplicated in that case. The performance improvement would come from the smaller amount of copied DWARF and from the fact that type references could be updated while the DWARF is cloned (no additional pass needed for that).

“later patch all type references to point into this additional compilation unit” - that’s the additional pass that people are probably talking/concerned about. Rewalking all the DWARF. The current dsymutil approach, as far as I know, is single pass - it knows the final, absolute offset to the type from the moment it emits that type/needs to refer to it.

Right, the current dsymutil approach is single-pass. And from that point of view, the solution you've described (producing a declaration of the type with the relevant members) allows keeping that single-pass implementation.

But there is a restriction in the current dsymutil approach: to process inter-CU references it needs to load all of the DWARF into memory (while it analyzes which part of the DWARF is live, it needs to have all CUs loaded into memory).

All DWARF for a single file (which for dsymutil is mostly a single CU, except with LTO I guess?), not all DWARF for all inputs in memory at once, yeah?

Right. In the dsymutil case, all DWARF for a single file (not all DWARF for all inputs in memory at once). But in the llvm-dwarfutil case, a single file contains the DWARF for all of the original input object files, and all of it gets loaded into memory.

Yeah, would be great to try to go CU-by-CU.

That leads to huge memory usage. It is less important when the source is a set of object files (as in the dsymutil case), but it becomes a real problem for the llvm-dwarfutil utility when the source is a single file (with the current implementation it needs 30G of memory to process the clang binary).

Yeah, that’s where I think you’d need a fixup pass one way or another - because cross-CU references can mean that when you figure out a new layout for CU5 (because it has a duplicate type definition of something in CU1) then you might have to touch CU4 that had an absolute/cross-CU forward reference to CU5. Once you’ve got such a fixup pass (if dsymutil already has one? Which, like I said, I’m confused why it would have one/that doesn’t match my very vague understanding) then I think you could make dsymutil work on a per-CU basis streaming things out, then fixing up a few offsets.

When dsymutil deduplicates types, it changes a local CU reference into an inter-CU reference (so that CU2 (next) can reference the type definition from CU1 (prev)). To make this change it does not currently need any fixups.

When dsymutil meets an already existing inter-CU reference (one located in the input object file) pointing into a CU which has not been processed yet (so its offset is unknown), it marks it as a "forward reference" and patches it later, during the additional "fixup forward references" pass, once the offsets are known.
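
Schematically, that mechanism can be pictured like this (the Output/ForwardRef types and function names are invented for illustration, not dsymutil's actual implementation): unknown offsets are written as placeholders and remembered, then patched once the final offsets are known.

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct ForwardRef {
      std::size_t PatchPos; // where in the output the 4-byte offset must go
      std::uint64_t Target; // a stable key identifying the referenced DIE
    };

    struct Output {
      std::vector<std::uint8_t> Bytes;
      std::vector<ForwardRef> ForwardRefs;
      std::unordered_map<std::uint64_t, std::uint32_t> FinalOffset; // DIE -> offset
    };

    // While cloning: if the target's final offset is already known, write it;
    // otherwise write a placeholder and remember it as a forward reference.
    void emitReference(Output &Out, std::uint64_t Target) {
      auto It = Out.FinalOffset.find(Target);
      std::uint32_t Value = (It != Out.FinalOffset.end()) ? It->second : 0;
      if (It == Out.FinalOffset.end())
        Out.ForwardRefs.push_back({Out.Bytes.size(), Target});
      for (int I = 0; I < 4; ++I) // little-endian 4-byte reference slot
        Out.Bytes.push_back(std::uint8_t(Value >> (8 * I)));
    }

    // The "fixup forward references" pass: once every CU of this input has
    // been emitted and all final offsets are known, patch the remembered slots.
    void fixupForwardReferences(Output &Out) {
      for (const ForwardRef &FR : Out.ForwardRefs) {
        std::uint32_t Value = Out.FinalOffset.at(FR.Target);
        for (int I = 0; I < 4; ++I)
          Out.Bytes[FR.PatchPos + I] = std::uint8_t(Value >> (8 * I));
      }
    }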

OK, so limited 2 pass system. (does it do that second pass once at the end of the whole dsymutil run, or at the end of each input file? (so if an input file has two CUs and the first CU references a type in the second CU - it could write the first CU with a “forward reference”, then write the second CU, then fixup the forward reference - and then go on to the next file and its CUs - this could improve performance by touching recently used memory/disk pages only, rather than going all the way back to the start later on when those pages have become cold)

Yes, it does it at the end of each input file.

If CUs were processed in parallel, their offsets would not be known at the moment a local type reference is changed into an inter-CU reference. So we would need to do the same fix-up processing for all references to types as we already do for other inter-CU references.

Yeah - though the existence of this second “fixup forward references” system - yeah, could just use it much more generally as you say. Not an extra pass, just the existing second pass but having way more fixups to fixup in that pass.

If we were able to change the algorithm in this way:

  1. analyse all CUs.
  2. clone all CUs.

then we could create a merged type table (an artificial CU containing the types) during step 1.
If that type table were written first, then all following CUs could use known offsets
to the types, and we would not need additional fix-up processing for type references.
It would still be necessary to fix up other inter-CU references, but not
type references (which constitute the vast majority).
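
A minimal sketch of that two-step idea (the TypeDef/CompileUnit stand-ins are invented, not real DWARFLinker code): step 1 lays out the merged type table so every type has a final offset, and step 2 can then clone CUs writing type references directly.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // Invented stand-ins: one "type definition" found in an input CU, and a
    // parsed input CU reduced to the list of types it defines.
    struct TypeDef { std::string Name; std::uint64_t EncodedSize; };
    struct CompileUnit { std::vector<TypeDef> Types; };

    // Step 1 (analyse all CUs): lay out the merged type table - the artificial
    // CU holding one copy of each type.  Because it will be written first,
    // every merged type already has its final offset here.
    std::map<std::string, std::uint64_t>
    layoutTypeTable(const std::vector<CompileUnit> &CUs) {
      std::map<std::string, std::uint64_t> TypeOffset;
      std::uint64_t NextOffset = 0; // offset inside the artificial type CU
      for (const CompileUnit &CU : CUs)
        for (const TypeDef &T : CU.Types)
          if (TypeOffset.emplace(T.Name, NextOffset).second)
            NextOffset += T.EncodedSize; // only the first copy gets laid out
      return TypeOffset;
    }

    // Step 2 (clone all CUs) can then write every type reference immediately
    // as "start of the artificial CU + TypeOffset[Name]", so no fix-up pass is
    // needed for type references; other inter-CU references still need one.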

To me, that sounds more expensive than the fixup forward references pass.

In a direct comparison, yes: loading the DWARF one more time looks more expensive than the "fixup forward references" pass. But looking at the general picture, it could probably be beneficial:

  1. Merging types would lead to a smaller resulting DWARF, which would speed up the process.
     E.g. if we switched "odr types deduplication" off in the current implementation, execution time would double, because more DWARF would have to be cloned and written into the result. An implementation of "merging types" would probably have a similar effect on the overall process: the additional step of loading the DWARF would decrease performance, but the smaller amount of resulting data would increase it.
  2. If the types were put into the first CU, we would have a simple strategy for the liveness analysis algorithm: just always keep the first CU in memory. That would speed up the liveness analysis step.

Anyway, all of the above is just an idea for future work. Currently, I am going to implement multi-threaded processing for CUs loaded into memory, keeping the same type processing as it currently is (which assumes that the "fixup forward references" pass starts doing more work, by also fixing type references).

Without loading all CUs into memory it would require a two-pass solution: a first pass to analyze which parts of the DWARF relate to live code, and then a second pass to generate the result.

Not sure it’d require any more second pass than a “fixup” pass, which it sounds like you’re saying it already has?

It looks like it would need an additional pass to process inter-CU references (ones already existing in the incoming file) if we do not want to load all CUs into memory.

Usually inter-CU references aren’t used, except in LTO - and in LTO all the DWARF deduplication and function discarding is already done by the IR linker anyway. (ThinLTO is a bit different, but really we’d be better off teaching it the extra tricks anyway (some can’t be fixed in ThinLTO - like emitting a “Home” definition of an inline function, only to find out other ThinLTO backend/shards managed to optimize away all uses of the function… so some cleanup may be useful there)). It might be possible to do a more dynamic/rolling cache - keep only the CUs with unresolved cross-CU references alive and only keep them alive until their cross-CU references are found/marked alive. This should make things no worse than the traditional dsymutil case - since cross-CU references are only effective/generally used within a single object file (it’s possible to create relocations for them into other files - but I know LLVM doesn’t currently do this and I don’t think GCC does it) with multiple CUs anyway - so at most you’d keep all the CUs from a single original input file alive together.
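
Such a rolling cache could be sketched roughly like this (purely illustrative; LoadedCU/CUCache and the member names are invented): a CU stays loaded only while some unresolved cross-CU reference still points into it.

    #include <map>

    // Invented stand-in for a CU that has been parsed into memory.
    struct LoadedCU { unsigned Index; /* parsed DIEs would live here */ };

    // A CU is kept only while some unresolved cross-CU reference still points
    // into it; afterwards it is dropped again.
    struct CUCache {
      std::map<unsigned, LoadedCU> Live;    // CU index -> CU kept in memory
      std::map<unsigned, unsigned> Waiters; // CU index -> #unresolved refs into it

      // Some other CU has an unresolved cross-CU reference into `Target`:
      // make sure `Target` is loaded and keep it alive.
      LoadedCU &require(unsigned Target) {
        ++Waiters[Target];
        return Live.try_emplace(Target, LoadedCU{Target}).first->second;
      }

      // That reference has now been resolved/marked live; once nobody is
      // waiting on `Target` any more it can be unloaded.
      void release(unsigned Target) {
        if (--Waiters[Target] == 0)
          Live.erase(Target);
      }
    };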

But since it is a documented DWARF case, the tool should be ready for such a case (when inter-CU references are heavily used).

Sure - but by implementing a CU liveness window like that (keeping CUs live only so long as they need to be rather than an all-or-nothing approach) only especially quirky inputs would hit the worst case while the more normal inputs could perform better.

It is not clear what should be put into such a CU liveness window. If CU100 references CU1, how could we know that we need to keep CU1 in the liveness window before we have processed CU100?

Fair point, not just forward references to worry about but backward references too. I wonder how much savings there is in the liveness analysis compared to “keep one copy of everything, no matter whether it’s live or not”, then it can be a pure forward progress situation. (with the quirk that you might emit a declaration for an entity once, then a definition for it later - alternatively if a declaration is seen it could be skipped under the assumption that a definition will follow (& use a forward ref fixup) - and if none is found, splat some stub declarations into a trailing CU at the end)

Moreover, llvm-dwarfutil would be the tool producing exactly such a situation. The resulting file (produced by llvm-dwarfutil) would contain a lot of inter-CU references. There is probably no practical reason to apply llvm-dwarfutil to the same file twice, but it would be a good test for the tool.

It’d be a good stress test, but not necessarily something that would need to perform the best because it wouldn’t be a common use case.

I agree that we should not slow down the DWARFLinker in common cases only because we need to support the worst cases. But we also need to implement a solution which works in an acceptable manner for the worst case.

I think that depends on “acceptable” - correct, yes. Practical to run in reasonable time/memory? Not necessarily, in my opinion.

The current solution - loading everything into memory - makes it hard to use in a non-dsymutil scenario (llvm-dwarfutil).

I agree it’s worth exploring the non-dsymutil scenario, as you are - I’m just saying we don’t necessarily need to support high usability (fast/low memory usage/etc) llvm-dwarfutil on an already dwarfutil’d binary (but as you’ve pointed out, the “window” is unknowable because of backward references, so this whole subthread is perhaps irrelevant).

There are several things which could be used to decide whether we need to take a light or a heavy path:

  1. If the input contains only a single CU, we do not need to unload it from memory. Thus we would not need an extra DWARF loading pass.
  2. If the abbreviations of the whole input file do not contain inter-CU reference forms, then while doing the liveness analysis we do not need to wait until other CUs are processed (a rough sketch of such a check follows below).
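
For (2), the check could look roughly like this sketch (the AbbrevDecl/AbbrevAttribute structs are invented; DW_FORM_ref_addr is the standard DWARF form for references that can cross CU boundaries):

    #include <cstdint>
    #include <vector>

    // DW_FORM_ref_addr (0x10) is the DWARF form whose references may point
    // outside their own CU; the surrounding structs are invented stand-ins.
    constexpr std::uint16_t DW_FORM_ref_addr = 0x10;

    struct AbbrevAttribute { std::uint16_t Attr; std::uint16_t Form; };
    struct AbbrevDecl { std::vector<AbbrevAttribute> Attributes; };

    // Cheap up-front test: if no abbreviation in the whole input declares a
    // cross-CU-capable reference form, each CU's liveness analysis can run
    // without waiting for (or keeping) any other CU.
    bool mayHaveInterCUReferences(const std::vector<AbbrevDecl> &Abbrevs) {
      for (const AbbrevDecl &Decl : Abbrevs)
        for (const AbbrevAttribute &A : Decl.Attributes)
          if (A.Form == DW_FORM_ref_addr)
            return true;
      return false;
    }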

(2) Yeah, that /may/ be a good idea, cheap to test, etc. Though I'd still wonder if a more general implementation strategy could be found that would make it easier to get a sliding scale of efficiency depending on how many inter-CU references there were, not a "if there are none it's good, if there are any it's bad or otherwise very different to implement".

That should probably be measured, but I think we would lose most of the size reduction (since we would start keeping unreferenced data which is currently removed), which would lead to slower performance and bigger disk space usage. At the moment, I do not see how that could be done. One possibility is to preliminarily mark a CU with an IsReferenced flag. Then we could delay cloning for such a CU (either by putting it into the CU liveness window or by unloading it), while a non-referenced CU could be cloned immediately. Such a solution would be more scalable and would work well in cases where only a few inter-CU references exist, though it requires changes in the DWARF format.

I think there is a scenario which would make it possible to process each non-referenced CU only once and to handle inter-CU references in a scalable way (even for a dwarfutil'd binary):

1. Implement a global type table and type merging. This allows us to keep all types in memory, and all inter-CU type references would then point into that in-memory type table.
(We do not know which CUs should be put into a CU liveness window, and we cannot keep all CUs in memory, but we can keep all types in memory.)

2. If there are no other inter-CU references, then all CUs are handled in one pass.

3. If there are other inter-CU references, then after all CUs have been processed by the first pass we have a list of referenced CUs. We can then delete the already-cloned data (for the referenced CUs) and run the process again - load CU, mark liveness, clone data - for the referenced CUs only.
For not-too-complex, not closely coupled cases this should work relatively fast.

4. Put the in-memory type table into an artificial CU and update all type references (a rough sketch of this flow follows below).
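
As a rough sketch of that control flow (all names - TypeTable, loadCU, processCU, etc. - are invented stand-ins, with stubs in place of the real work):

    #include <set>

    // Invented stand-ins for the pieces discussed above.
    struct TypeTable { /* merged types, kept in memory for the whole run */ };
    struct CU { unsigned Index; /* parsed DIEs would live here */ };

    CU loadCU(unsigned Index) { return CU{Index}; }            // stub
    void discardClonedOutput(unsigned Index) { (void)Index; }  // stub
    void emitTypeTableAsArtificialCU(const TypeTable &) {}     // stub
    // Marks liveness and clones `Unit`; type references are recorded against
    // the in-memory type table.  Returns the indices of other CUs referenced
    // by non-type inter-CU references.
    std::set<unsigned> processCU(CU &Unit, TypeTable &Types) {
      (void)Unit; (void)Types;
      return {};                                               // stub
    }

    void link(unsigned NumCUs) {
      TypeTable Types;
      std::set<unsigned> Referenced;

      // First pass: each CU is loaded, processed and dropped exactly once;
      // type references go through the in-memory type table (steps 1 and 2).
      for (unsigned I = 0; I < NumCUs; ++I) {
        CU Unit = loadCU(I);
        std::set<unsigned> Refs = processCU(Unit, Types);
        Referenced.insert(Refs.begin(), Refs.end());
      } // the CU is not kept in memory past this point

      // Second pass, only for CUs that turned out to be referenced by other
      // inter-CU references (step 3): drop their first-pass output and redo
      // load / mark liveness / clone for just those CUs.
      for (unsigned I : Referenced) {
        discardClonedOutput(I);
        CU Unit = loadCU(I);
        processCU(Unit, Types);
      }

      // Finally the merged types become one artificial CU, and the type
      // references recorded earlier are updated to its final offsets (step 4).
      emitTypeTableAsArtificialCU(Types);
    }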

I think if we’re in the realm of DWARF extensions a whole bunch of other considerations come into it (& indeed, your suggested proposal may be a good one - but I think it’s a very wide problem space once we’re considering DWARF extensions). Mostly I was making arguments/suggestions/thoughts on the basis of being compatible with all existing DWARF producers.

The described scenario does not assume DWARF extensions. The global type table is not a new DWARF construct - it is an artificial CU keeping all the types. That solution would be compatible with existing DWARF consumers/producers.

Sorry, guess I’m not following. Maybe this conversation’s getting a bit too abstract/theoretical/forward looking for me right now - no worries. Happy to chat more about it, but might be easier to focus on the immediate steps forward for now & tackle this when it’s the thing you’re planning to work on? (if I’m understanding correctly that this isn’t a direction you’re thinking to try right now)

Right, that is not what I am going to do immediately. My current plan is to preserve the current type processing but make it multi-threaded (more multi-threaded than it already is). The above ideas are for future plans; let's discuss them later. Thank you for the comments.

Alexey.