RFC: Adding a code size analysis tool

Hello,

I worked on a code size analysis tool for a 'week of code' project and think
that it might be useful enough to upstream.

The tool is inspired by bloaty (https://github.com/google/bloaty), but tries to
do more to attribute code size in actionable ways.

For example, it can calculate how many bytes inlined instances of a function
added to a binary. In its diff mode, it can show how much more aggressively a
function was inlined compared to a baseline. This can be useful when you're,
say, trying to figure out why firmware compiled by a new compiler is just a few
bytes over the size limit imposed by your embedded device :). In this case,
extra information about inlining can help inform a decision to either tweak the
inliner's cost model or to judiciously add a few `noinline` attributes. (Note
that if you're willing to recompile & write a few SQL queries, optimization
remarks can give you similar information, albeit at the IR level.)
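To sketch that route (not part of this tool; foo.c stands in for any source
file): clang's -fsave-optimization-record flag writes a per-translation-unit
YAML file of remarks, including passed/missed inlining decisions, which can
then be loaded into a database and queried:

  clang -Oz -fsave-optimization-record -c foo.c
  # Emits foo.opt.yaml, recording which call sites were (or weren't) inlined.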

As another example, this code size tool can attribute code size to semantically
interesting groups of code, like C++/Swift classes, or files. In the diff mode,
you can see how the code size of a class/file grew compared to a baseline. The
tool understands inheritance, so you can also see interesting high-level trends.
E.g. `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.

Unlike bloaty, this tool focuses exclusively on the text segment. Also unlike
bloaty, it uses LLVM's DWARF parser instead of rolling its own. The tool is
currently implemented as a sub-tool of llvm-dwarfdump.

To get size information about a program, you do:

  llvm-dwarfdump size-info -baseline <object> -stats-dir <dir>

This emits four *.stats files into <dir>, each containing a distinct 'view' into
the code groups in <object>. There's a file view, a function view, a class view,
and an inlining view. Each view is sorted by code size, so you can see the
largest functions/classes/etc immediately.

The *.stats files are just human-readable text files. As it happens, they use
the flamegraph format (http://brendangregg.com/flamegraphs.html). This makes it
easy to visualize any view as a flamegraph. (If you haven't seen one before,
it's a hierarchical visualization where the width of each entry corresponds to
its frequency (or in this case size).)
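To illustrate (these entries are hypothetical): each line in the collapsed
flamegraph format is a semicolon-separated nesting followed by a count, which
here is a byte count. An inlining-view line such as

  main;llvm::StringRef::find 128

would mean that inlined copies of `llvm::StringRef::find` contributed 128
bytes to `main`. Any view can be rendered with Brendan Gregg's flamegraph.pl:

  flamegraph.pl <view>.stats > <view>.svg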

To look at code growth between two programs, you'd do:

  llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir>

Similarly, this emits four 'view' files into <dir>, but with a *.diffstats
suffix. The format is the same.

Pending Work
------------

I think the main piece of work the tool needs is better testing. Currently
there's just a single end-to-end test in clang. It might be better to check in
a few binaries so we can check that the tool reports sizes correctly.

Also, it may turn out that folks are interested in different ways of visualizing
size data. While the textual format of flamegraphs is really convenient for
humans to read, the graphs themselves do make more sense when the underlying
data have a frequentist interpretation. If there's enough interest I can explore
using an alternative format for visualization, e.g.:

http://neugierig.org/software/chromium/bloat/
https://github.com/evmar/webtreemap

(Thanks JF for pointing these out!)

Fantastic! I have been looking at creating a tool that a) only spits out actionable size reductions (preferably with a specific action specified) and b) only analyzes the size of allocated sections. The other deficiency I’ve seen with bloaty is speed and scaling. It’s very hard to get bloaty to analyze across a large system of interdependent shared libraries. You can add me as a reviewer to any changes, as I would very much like to see such a tool exist.

> Unlike bloaty, this tool focuses exclusively on the text segment.

I’d like to see support for everything within PT_LOAD segments, not just the executable parts. Everything else you’ve said is basically what I wanted.

Something that I’ve been looking to do for a while now is to do this at the .o level, and have something to combine the per .o results as well. I’ve wanted to do that to figure out where I can speed up builds that are overly slow because of redundant template instantiations.

You might also consider a view that goes across templates and across namespaces. It can be useful to see that X% of your code is in std::map instantiations (for example). This seems similar to how you have inheritance covered. I’ve also wanted to find ways to visualize the opposite… where I have class Foo and I want to see its total cost, including the size of std::vector<Foo>.

When can we start using this to diff LLVM releases between each other (both their codegen quality, and bloat that comes with code changes)? :)

> Hello,
>
> I worked on a code size analysis tool for a 'week of code' project and think
> that it might be useful enough to upstream.
>
> The tool is inspired by bloaty (https://github.com/google/bloaty), but tries to
> do more to attribute code size in actionable ways.
>
> For example, it can calculate how many bytes inlined instances of a function
> added to a binary. In its diff mode, it can show how much more aggressively a
> function was inlined compared to a baseline. This can be useful when you're,
> say, trying to figure out why firmware compiled by a new compiler is just a few
> bytes over the size limit imposed by your embedded device :). In this case,
> extra information about inlining can help inform a decision to either tweak the
> inliner's cost model or to judiciously add a few `noinline` attributes. (Note
> that if you're willing to recompile & write a few SQL queries, optimization
> remarks can give you similar information, albeit at the IR level.)

I really like the inlining info.

> As another example, this code size tool can attribute code size to semantically
> interesting groups of code, like C++/Swift classes, or files. In the diff mode,
> you can see how the code size of a class/file grew compared to a baseline. The
> tool understands inheritance, so you can also see interesting high-level trends.
> E.g. `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.

This is also really neat.

> Unlike bloaty, this tool focuses exclusively on the text segment.

It seems like support for more than the text segment could be added separately. What you've implemented already does a bunch of great stuff.

> Also unlike bloaty, it uses LLVM's DWARF parser instead of rolling its own.
> The tool is currently implemented as a sub-tool of llvm-dwarfdump.
>
> To get size information about a program, you do:
>
>   llvm-dwarfdump size-info -baseline <object> -stats-dir <dir>
>
> This emits four *.stats files into <dir>, each containing a distinct 'view' into
> the code groups in <object>. There's a file view, a function view, a class view,
> and an inlining view. Each view is sorted by code size, so you can see the
> largest functions/classes/etc immediately.
>
> The *.stats files are just human-readable text files. As it happens, they use
> the flamegraph format (http://brendangregg.com/flamegraphs.html). This makes it
> easy to visualize any view as a flamegraph. (If you haven't seen one before,
> it's a hierarchical visualization where the width of each entry corresponds to
> its frequency (or in this case size).)
>
> To look at code growth between two programs, you'd do:
>
>   llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir>
>
> Similarly, this emits four 'view' files into <dir>, but with a *.diffstats
> suffix. The format is the same.
>
> Pending Work
> ------------
>
> I think the main piece of work the tool needs is better testing. Currently
> there's just a single end-to-end test in clang. It might be better to check in
> a few binaries so we can check that the tool reports sizes correctly.
>
> Also, it may turn out that folks are interested in different ways of visualizing
> size data. While the textual format of flamegraphs is really convenient for
> humans to read, the graphs themselves do make more sense when the underlying
> data have a frequentist interpretation. If there's enough interest I can explore
> using an alternative format for visualization, e.g.:
>
> http://neugierig.org/software/chromium/bloat/
> https://github.com/evmar/webtreemap
>
> (Thanks JF for pointing these out!)

These were neat; we used them for PNaCl releases (when trying to shrink LLVM’s size). What you have might not have all the same features yet, but I think you’ve taken an approach which can be made more powerful over time.

Agreed that other visualizations can just come later.

Hi Jake,

> Fantastic! I have been looking at creating a tool that a) only spits out actionable size reductions (preferably with a specific action specified)

I’m glad you brought this up. I think issuing FixIts for code size problems could be really useful.

Would it make sense to write a code size optimization guide as a starting point (say, in llvm/docs)? It could cover the tradeoffs of using relevant compiler features (-Oz, -fvisibility=hidden, etc) and attributes. FixIts could then reference this doc.
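For concreteness, here is a sketch of the kind of recipe such a guide might
cover (the flags are real clang/ELF-linker options; the file names are made
up):

  clang++ -Oz -fvisibility=hidden -ffunction-sections -fdata-sections -c widget.cpp
  clang++ widget.o -Wl,--gc-sections -o widget

i.e., optimize for minimal size, default symbols to hidden visibility, and let
the linker garbage-collect unreferenced sections, along with the tradeoffs of
each.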

> and b) only analyzes the size of allocated sections. The other deficiency I’ve seen with bloaty is speed and scaling. It’s very hard to get bloaty to analyze across a large system of interdependent shared libraries.

I’m not sure what you mean by ‘analyze across’: do you mean that it’s hard to analyze/diff a collection of executables/libraries as a single unit?

> You can add me as a reviewer to any changes, as I would very much like to see such a tool exist.

Thanks! Barring any objections I’ll split the logic out of llvm-dwarfdump and start a review. This should facilitate adding options, PDB support, etc.

>> Unlike bloaty, this tool focuses exclusively on the text segment.

> I’d like to see support for everything within PT_LOAD segments, not just the executable parts. Everything else you’ve said is basically what I wanted.

I’d like to see this too.

vedant

Hi Ben,

> Something that I’ve been looking to do for a while now is to do this at the .o level, and have something to combine the per .o results as well. I’ve wanted to do that to figure out where I can speed up builds that are overly slow because of redundant template instantiations.

> You might also consider a view that goes across templates and across namespaces. It can be useful to see that X% of your code is in std::map instantiations (for example). This seems similar to how you have inheritance covered. I’ve also wanted to find ways to visualize the opposite… where I have class Foo and I want to see its total cost, including the size of std::vector<Foo>.

These are great ideas. DWARF might not provide enough information to generate these views (it doesn’t explicitly describe the types which parameterize classes, or the names of un-specialized templates). But it should be possible to piece some of this together by parsing type names.
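A rough sketch of the name-parsing approach (hypothetical helper code, not
part of the tool; it assumes LLVM's llvm::demangle helper from
llvm/Demangle/Demangle.h and a list of (mangled symbol, size) pairs gathered
elsewhere):

  #include "llvm/Demangle/Demangle.h"

  #include <cstdint>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Strip the template argument list from a demangled name, so that e.g.
  // "std::map<int, Foo>::operator[]" and "std::map<long, Bar>::find" both
  // collapse into the group "std::map". (Naive: a name containing
  // "operator<" would need more careful parsing.)
  static std::string templateGroup(const std::string &Demangled) {
    size_t Open = Demangled.find('<');
    return Open == std::string::npos ? Demangled : Demangled.substr(0, Open);
  }

  // Attribute symbol sizes to the templates/classes that produced them,
  // given (mangled name, size) pairs collected from the binary.
  std::map<std::string, uint64_t>
  sizeByTemplate(const std::vector<std::pair<std::string, uint64_t>> &Syms) {
    std::map<std::string, uint64_t> Groups;
    for (const auto &Sym : Syms)
      Groups[templateGroup(llvm::demangle(Sym.first))] += Sym.second;
    return Groups;
  }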

vedant

Will this only be strictly for binary size, or can we use it for memory size too?

One thing I implemented in llvm-pdbutil, kind of as a side-exercise to see if it found anything useful, was a padding detector. It turns out it’s really annoyingly difficult to reconstruct an exact class layout from debug info, but I think it’s about 85% correct now (although for now it only works on Windows, until the native high-level PDB access API is complete - currently only the native low-level API is complete). It will allow you to sort all classes by amount of padding or percentage of class size attributable to padding. We shaved a couple of percent off of V8’s memory usage with this tool.

Granted, it’s better to have the compiler detect this if possible, but we’re talking about a tool that can be run on an arbitrary executable not necessarily built with a compiler we control.

BTW, even though DWARF doesn’t describe types which parameterize templates, it does give you mangled names, so you should be able to reconstruct those types from the mangled names.

(my vote, somewhat biased, is that I’d love to see more investment in Bloaty (to keep all these sorts of size analysis tools and tricks in one place), but I sort of accept folks are probably going to keep building more infrastructure for this sort of thing in LLVM directly)

> (my vote, somewhat biased, is that I’d love to see more investment in Bloaty (to keep all these sorts of size analysis tools and tricks in one place), but I sort of accept folks are probably going to keep building more infrastructure for this sort of thing in LLVM directly)

I get where that comes from, but it seems a bit like a Valgrind versus sanitizer argument: integrating with the toolchain gives you things you can’t really get otherwise. Valgrind is still great as a self-standing thing.

>> (my vote, somewhat biased, is that I’d love to see more investment in Bloaty (to keep all these sorts of size analysis tools and tricks in one place), but I sort of accept folks are probably going to keep building more infrastructure for this sort of thing in LLVM directly)

> I get where that comes from, but it seems a bit like a Valgrind versus sanitizer argument: integrating with the toolchain gives you things you can’t really get otherwise. Valgrind is still great as a self-standing thing.

Not sure that’s quite the same, though - with the sanitizers, integrating with the optimizers is the key here.

With bloaty - it could, at worst, use LLVM’s libDebugInfo as a library to implement the more advanced debug-using features without being less functional than an in-LLVM implementation.

- Dave

I personally think LLVM being the central place for such compiler tooling is great for discovery.

Are there plans to implement similar features for PE/COFF? I’ve switched to Clang for all my Windows development, and I rarely do Linux/Mac development, so for me this tool would be great for size-profiling my Windows binaries.

I have three issues with bloaty:

1) It’s designed with a “give all information possible” philosophy, not a “give only actionable information (preferably with reasons)” one.
2) There is (currently) no way to restrict the information you get to allocated sections.
3) It hasn’t been scaling well on hundreds of binaries.

Point 2 is easily fixable. Point 3 is most likely fixable, and the features to drive bloaty this way exist, I suppose; currently we’re driving individual instances of bloaty ourselves, and this is both faster and gives us more information than we get using config files and CSV files. Point 1 seems to be a real killer and is a major point this proposal seeks to address. If we can work out a good proposal to add the functionality mentioned in point 1, I’d be happy to support that as well.

I think an external tool could do this just as well. My bias is for it to be in LLVM, however, because that makes it easier to distribute: we already have testing, CI, and distribution figured out there. That’s not a good technical reason, I grant you.

>>> (my vote, somewhat biased, is that I’d love to see more investment in Bloaty (to keep all these sorts of size analysis tools and tricks in one place), but I sort of accept folks are probably going to keep building more infrastructure for this sort of thing in LLVM directly)

>> I get where that comes from, but it seems a bit like a Valgrind versus sanitizer argument: integrating with the toolchain gives you things you can’t really get otherwise. Valgrind is still great as a self-standing thing.

> Not sure that’s quite the same, though - with the sanitizers, integrating with the optimizers is the key here.

> With bloaty - it could, at worst, use LLVM’s libDebugInfo as a library to implement the more advanced debug-using features without being less functional than an in-LLVM implementation.

I’m a bit biased too, but fwiw: my preference would be to add a new size analysis tool to llvm.

Such a tool might grow to depend on code for object file parsing, debug info parsing, demangling, and disassembling (all of which bloaty either reimplements or pulls in). Living in-tree should make it easier to pick up bug fixes in these dependencies and reduce maintenance overhead.

While I really like bloaty, my impression is that it’d be better to implement the functionality I’d like to use in a new tool.

vedant

Is it that it’d be better to have the functionality in LLVM, or in a new tool? (is it about it being a different tool, or about it being in the LLVM tree, or something else?)

What about possibly moving Bloaty into the LLVM project & improving it there?

Well, that sounds complicated and non-trivial, but I’d still be fine with it. I’d still want a story about propagating reasons for sizes. In addition, I’d want a story about how we plan on migrating all that non-LLVM code to be LLVM style. Sounds like a total rewrite if you ask me.

> Is it that it’d be better to have the functionality in LLVM, or in a new tool? (is it about it being a different tool, or about it being in the LLVM tree, or something else?)

I think it’d be better to have the functionality in LLVM. Short of porting bloaty, this would need to be a new tool.

> What about possibly moving Bloaty into the LLVM project & improving it there?

Assuming its authors would like this to happen, this sounds like at least an equivalent amount of work compared to implementing a new tool, if not more. I’d be open to this if contributors familiar with its codebase were willing to drive the effort.

vedant

> Will this only be strictly for binary size, or can we use it for memory size too?

This could be a good fit. I’m not sure what would be helpful beyond a padding detector (which, btw, DWARF’s DW_AT_data_member_location might help with?). DWARF does describe the size of each class/struct, but I’m skeptical that surfacing that would be very helpful.
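To make the DW_AT_data_member_location idea concrete, here’s a minimal sketch
(hypothetical code, assuming member offsets and sizes have already been read
out of the debug info):

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  struct Member {
    uint64_t Offset; // byte offset, as DW_AT_data_member_location reports it
    uint64_t Size;   // byte size of the member's type
  };

  // Sum the gaps between consecutive members, plus any tail padding before
  // the end of the aggregate. Assumes a well-formed, non-overlapping layout
  // (unions and bit-fields would need extra handling).
  uint64_t computePadding(std::vector<Member> Members, uint64_t AggregateSize) {
    std::sort(Members.begin(), Members.end(),
              [](const Member &A, const Member &B) { return A.Offset < B.Offset; });
    uint64_t Padding = 0, End = 0;
    for (const Member &M : Members) {
      Padding += M.Offset - End; // gap before this member
      End = M.Offset + M.Size;
    }
    return Padding + (AggregateSize - End);
  }

Sorting classes by that number would give a view much like the one you
describe in llvm-pdbutil.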

> One thing I implemented in llvm-pdbutil, kind of as a side-exercise to see if it found anything useful, was a padding detector. It turns out it’s really annoyingly difficult to reconstruct an exact class layout from debug info, but I think it’s about 85% correct now (although for now it only works on Windows, until the native high-level PDB access API is complete - currently only the native low-level API is complete). It will allow you to sort all classes by amount of padding or percentage of class size attributable to padding. We shaved a couple of percent off of V8’s memory usage with this tool.

This is really neat.

> Granted, it’s better to have the compiler detect this if possible, but we’re talking about a tool that can be run on an arbitrary executable not necessarily built with a compiler we control.
>
> BTW, even though DWARF doesn’t describe types which parameterize templates, it does give you mangled names, so you should be able to reconstruct those types from the mangled names.

Right.

vedant