Debug info: type uniquing for C++ and the status of building clang with "-flto -g"

Hi All,

Type uniquing for C++ is in. Some data for Xalan with -flto -g:

9.9MB raw dwarf size, peak memory usage at 2.8GB
The raw dwarf size was 58MB, memory usage was 7GB back in May, 2013.
Other efforts at size reduction helped, and type uniquing improved on top of those.

Data on building clang with “-flto -g” after type uniquing:
3.4GB MDNodes after parsing all bc files, 7GB MDNodes after linking all bc files
4.6GB DIEs
4GB MCContext

→ The memory usage is still too big.

So how to reduce the memory footprint at MDNode level:

1> Combine integers into an MDString and further combine MDStrings (see PR17891; a rough sketch follows below)
A partial implementation covering the important debug info nodes can reduce the MDNodes from 7GB to 5.7GB
2> Release MDNodes that are only used by source modules (I will send out a proposal)
An estimate based on a partial implementation: this will reduce MDNodes from 5.7GB to 3.9GB
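
For 1>, the rough idea looks something like the sketch below, written against
today's C++ metadata API (the field layout and the helper name are made up for
illustration, not what PR17891 will actually land): instead of one ConstantInt
operand per small field, pack the fields into a single '\0'-separated MDString
"header" and keep only real node references as operands, so each node has
fewer operands to allocate and unique.

  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Metadata.h"
  #include "llvm/Support/raw_ostream.h"
  #include <string>

  using namespace llvm;

  // Hypothetical packed form of a debug info type node: the integer fields
  // (tag, line, size, alignment) live in one MDString operand instead of
  // four separate ConstantInt operands.
  static MDNode *makePackedDebugNode(LLVMContext &Ctx, unsigned Tag,
                                     unsigned Line, uint64_t SizeInBits,
                                     uint64_t AlignInBits, Metadata *Scope,
                                     Metadata *BaseType) {
    std::string Header;
    raw_string_ostream OS(Header);
    OS << Tag << '\0' << Line << '\0' << SizeInBits << '\0' << AlignInBits;
    Metadata *Ops[] = {MDString::get(Ctx, OS.str()), Scope, BaseType};
    return MDNode::get(Ctx, Ops);
  }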

Thanks,
Manman

Hi Manman,

Thanks for sending this summary and progress plans - it's great to see the
impact your changes have had and ideas for future direction.

Type uniquing for C++ is in. Some data for Xalan with -flto -g:

9.9MB raw dwarf size, peak memory usage at 2.8GB
The raw dwarf size was 58MB, memory usage was 7GB back in May, 2013.
Other efforts at size reduction helped, and type uniquing improved on top
of those.

Data on building clang with "-flto -g" after type uniquing:
  3.4GB MDNodes after parsing all bc files, 7GB MDNodes after linking all
bc files

What's the change between parsing and linking?

Parsing means reading all of the bc files into source modules. Linking means
linking the source modules into the destination module.
Extra MDNodes can be generated for the destination module.
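
(For reference, the flow being measured is roughly the helper sketched below,
written against today's C++ API - the helper itself is illustrative, exact
class and function names differ between LLVM versions, and error handling is
trimmed:)

  #include "llvm/Bitcode/BitcodeReader.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Linker/Linker.h"
  #include "llvm/Support/Error.h"
  #include "llvm/Support/MemoryBuffer.h"
  #include <memory>
  #include <string>
  #include <vector>

  using namespace llvm;

  std::unique_ptr<Module> parseAndLink(LLVMContext &Ctx,
                                       const std::vector<std::string> &Paths) {
    auto Dest = std::make_unique<Module>("ld-temp.o", Ctx);
    Linker L(*Dest);
    for (const std::string &Path : Paths) {
      // "Parsing": each bc file becomes its own source Module, but all the
      // metadata it pulls in is owned by the shared LLVMContext.
      auto Buf = MemoryBuffer::getFile(Path);
      if (!Buf)
        continue; // error handling elided
      Expected<std::unique_ptr<Module>> Src =
          parseBitcodeFile((*Buf)->getMemBufferRef(), Ctx);
      if (!Src) {
        consumeError(Src.takeError());
        continue;
      }
      // "Linking": IR is moved into Dest; any MDNode that cannot be reused
      // as-is is re-created in the same Context, and nothing that was needed
      // only by the source module is freed here.
      L.linkInModule(std::move(*Src));
    }
    return Dest;
  }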

   4.6GB DIEs

It seems like the DIEs are a substantial (more than the pre-linked, but
post-parsed BC files) part of the footprint. I think it might be important
to do the CU-at-a-time work sooner rather than later as I'm concerned about
the design impact it will have on existing and future work (it's already
going to substantially change the cross-CU-DIE references, potentially
changing the cost/benefit of that feature since we cannot inject DIEs from
later CUs into prior ones).

  4GB MCContext

What's the data in the MCContext that's relevant to debug info?

One data point on "Xalan":
without -g, MCContext allocates 45MB,
with -g, MCContext allocates 286MB.

  --> The memory usage is still too big.

Do we have an idea of what size is "small enough"? It would be useful to
have a goal.

So how to reduce the memory footprint at MDNode level:
  1> Combine integers into an MDString and further combine MDStrings (see
PR17891)
       A partial implementation on the important debug info nodes can
reduce the MDNodes from 7GB to 5.7GB

I think this'll be an interesting, and potentially valuable, change even
in non-LTO cases, but not necessarily where I would start just now.

  2> Release MDNodes that are only used by source modules (I will send
out a proposal)
         An estimate based on a partial implementation: this will reduce
MDNodes from 5.7GB to 3.9GB

I'll keep an eye out for your proposal, as I can't quite picture what
you've got in mind from this brief description.

Yes, I plan to send out the proposal today or tomorrow.

Manman

Hi Manman,

Thanks for sending this summary and progress plans - it's great to see
the impact your changes have had and ideas for future direction.

Type uniquing for C++ is in. Some data for Xalan with -flto -g:

9.9MB raw dwarf size, peak memory usage at 2.8GB
The raw dwarf size was 58MB, memory usage was 7GB back in May, 2013.
Other efforts at size reduction helped, and type uniquing improved on
top of those.

Data on building clang with "-flto -g" after type uniquing:
  3.4GB MDNodes after parsing all bc files, 7GB MDNodes after linking
all bc files

What's the change between parsing and linking?

Parsing means reading in all bc files to source modules. Linking means
linking in the source modules to the destination module.
Extra MDNodes can be generated for the destination module.

OK, that's perhaps strange - do you have any ideas about what MDNodes we
create when linking modules together? If anything, I would expect a
reduction in size as MDNodes are deduplicated across multiple modules. Are
you measuring this after the original modules have been unloaded? Are we
not unloading those modules once we've created the merged module?

   4.6GB DIEs

It seems like the DIEs are a substantial (more than the pre-linked, but
post-parsed BC files) part of the footprint. I think it might be important
to do the CU-at-a-time work sooner rather than later as I'm concerned about
the design impact it will have on existing and future work (it's already
going to substantially change the cross-CU-DIE references, potentially
changing the cost/benefit of that feature since we cannot inject DIEs from
later CUs into prior ones).

  4GB MCContext

What's the data in the MCContext that's relevant to debug info?

One data point on "Xalan":
without -g, MCContext allocates 45MB,
with -g, MCContext allocates 286MB.

OK, might be useful to understand which parts of that - maybe the Values
(ints, strings, etc) themselves are being attributed to the MCContext
rather than the MDNode sizes you were reporting above? Not really sure.

Can you give an example? I'm just curious.

So something big is in the MCContext... what is it?

Hi Manman,

Thanks for sending this summary and progress plans - it's great to see
the impact your changes have had and ideas for future direction.

Type uniquing for C++ is in. Some data for Xalan with -flto -g:

9.9MB raw dwarf size, peak memory usage at 2.8GB
The raw dwarf size was 58MB, memory usage was 7GB back in May, 2013.
Other efforts at size reduction helped, and type uniquing improved on
top of those.

Data on building clang with "-flto -g" after type uniquing:
  3.4GB MDNodes after parsing all bc files, 7GB MDNodes after linking
all bc files

What's the change between parsing and linking?

Parsing means reading in all bc files to source modules. Linking means
linking in the source modules to the destination module.
Extra MDNodes can be generated for the destination module.

OK, that's perhaps strange - do you have any ideas about what MDNodes we
create when linking modules together? If anything, I would expect a
reduction in size as MDNodes are deduplicated across multiple modules. Are
you measuring this after the original modules have been unloaded? Are we
not unloading those modules once we've created the merged module?

We don't unload the source modules. Even if we did unload them, the MDNodes
belong to the Context and are shared among the modules.
My proposal is going to suggest an interface to delete the source modules
and remove the MDNodes used only by the source modules from the Context.

There are a few cases where we generate MDNodes when linking modules:
1> when an MDNode points to a value, such as a Function*, that is different
in the destination module from the one in the source module.
2> when we have a cycle in the MDNode graph, all nodes in the cycle will be
re-created for the destination module.

When we load in the source modules, the types are already de-duplicated
(i.e. multiple source modules will share the same type if possible).
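
To make case 1> concrete, re-creating such a node looks roughly like the
sketch below (illustrative only: the helper and the operand index are
invented, and it uses today's API where a Function operand is wrapped in
ValueAsMetadata - at the time of this thread metadata operands were plain
Value*s):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Metadata.h"

  using namespace llvm;

  // The source module's node pointed at its own Function; after linking, the
  // destination module has its own copy of that Function, so the old node
  // cannot be reused and a new MDNode is created in the same shared Context.
  static MDNode *remapFunctionNode(LLVMContext &Ctx, const MDNode *SrcNode,
                                   Function *FnInDest, unsigned FnOpIdx) {
    SmallVector<Metadata *, 8> Ops(SrcNode->op_begin(), SrcNode->op_end());
    Ops[FnOpIdx] = ValueAsMetadata::get(FnInDest); // point at the linked copy
    return MDNode::get(Ctx, Ops);
  }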

   4.6GB DIEs

It seems like the DIEs are a substantial (more than the pre-linked, but
post-parsed BC files) part of the footprint. I think it might be important
to do the CU-at-a-time work sooner rather than later as I'm concerned about
the design impact it will have on existing and future work (it's already
going to substantially change the cross-CU-DIE references, potentially
changing the cost/benefit of that feature since we cannot inject DIEs from
later CUs into prior ones).

  4GB MCContext

What's the data in the MCContext that's relevant to debug info?

One data point on "Xalan":
without -g, MCContext allocates 45MB,
with -g, MCContext allocates 286MB.

OK, might be useful to understand which parts of that - maybe the Values
(ints, strings, etc) themselves are being attributed to the MCContext
rather than the MDNode sizes you were reporting above? Not really sure.

Same here. I will look into that when I have time - or does somebody else
already have the answer?

Manman

Hi All,

Type uniquing for C++ is in. Some data for Xalan with -flto -g:
9.9MB raw dwarf size, peak memory usage at 2.8GB
The raw dwarf size was 58MB, memory usage was 7GB back in May, 2013.
Other efforts at size reduction helped, and type uniquing improved on top
of those.

Data on building clang with "-flto -g" after type uniquing:
  3.4GB MDNodes after parsing all bc files, 7GB MDNodes after linking all
bc files
  4.6GB DIEs
  4GB MCContext
  --> The memory usage is still too big.

What fraction of the memory space occupied by MDNodes is just pointers?
(IIRC our whole scheme for metadata is the epitome of "sea of linked
nodes"). Do you have any statistics of how often the flexibility offered by
links is used? (e.g. how often the links are changed). If huge swaths of
these nodes are read-mostly, then it may be much more efficient to use a
representation where the links are implicit.
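
For example, a read-mostly store could replace per-node pointer operands with
small indices into flat arrays - a sketch of one possible flavor (all names
invented for illustration, not a concrete proposal):

  #include <cstdint>
  #include <string>
  #include <vector>

  // Compact, read-mostly node store: operands are 32-bit indices into shared
  // tables instead of 64-bit uniqued pointers, and nodes are allocated
  // contiguously.
  struct PackedNode {
    uint16_t Tag;          // e.g. a DW_TAG_* value
    uint16_t NumOperands;
    uint32_t FirstOperand; // index into the Operands table below
  };

  struct PackedMetadataStore {
    std::vector<PackedNode> Nodes;
    std::vector<uint32_t> Operands; // each entry: index of a node or string
    std::vector<std::string> Strings;
  };

At 8 bytes per node plus 4 bytes per operand this is far denser than a
pointer-per-operand graph, but it only pays off if the nodes are rarely
mutated after construction - hence the questions above.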

More generally, can you gather some statistics about the relative
distribution of different operations on MDNodes that we do? (what is the
most called method? are there a couple of methods that account for >90% of
calls? How often do we mutate this data? etc.)
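
For what it's worth, LLVM's Statistic machinery makes this kind of counting
cheap to gather in an assertions build (run the tool with -stats). A sketch of
instrumentation one could drop into lib/IR/Metadata.cpp - the counter names
are invented:

  #include "llvm/ADT/Statistic.h"

  #define DEBUG_TYPE "metadata"

  STATISTIC(NumMDNodesCreated,  "Number of MDNodes created");
  STATISTIC(NumMDOperandReads,  "Number of MDNode operand reads");
  STATISTIC(NumMDOperandWrites, "Number of MDNode operand replacements");

  // Then bump the counters inside the existing entry points, for example:
  //   MDNode construction             -> ++NumMDNodesCreated;
  //   MDNode::getOperand(I)           -> ++NumMDOperandReads;
  //   MDNode::replaceOperandWith(I,V) -> ++NumMDOperandWrites;
  // If reads dominate writes by a wide margin, that would argue for the more
  // compact, implicit-link representation suggested above.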

-- Sean Silva