I worked on a code size analysis tool for a 'week of code' project and think
that it might be useful enough to upstream.
The tool is inspired by bloaty (https://github.com/google/bloaty), but tries to
do more to attribute code size in actionable ways.
For example, it can calculate how many bytes inlined instances of a function
added to a binary. In its diff mode, it can show how much more aggressively a
function was inlined compared to a baseline. This can be useful when you're,
say, trying to figure out why firmware compiled by a new compiler is just a few
bytes over the size limit imposed by your embedded device :). In this case,
extra information about inlining can help inform a decision to either tweak the
inliner's cost model or to judiciously add a few `noinline` attributes. (Note
that if you're willing to recompile & write a few SQL queries, optimization
remarks can give you similar information, albeit at the IR level.)
As another example, this code size tool can attribute code size to semantically
interesting groups of code, like C++/Swift classes, or files. In the diff mode,
you can see how the code size of a class/file grew compared to a baseline. The
tool understands inheritance, so you can also see interesting high-level trends.
E.g., `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.
Unlike bloaty, this tool focuses exclusively on the text segment. Also unlike
bloaty, it uses LLVM's DWARF parser instead of rolling its own. The tool is
currently implemented as a sub-tool of llvm-dwarfdump.
To get size information about a program, you do:

  llvm-dwarfdump size-info -baseline <object> -stats-dir <dir>

This emits four *.stats files into <dir>, each containing a distinct 'view' into
the code groups in <object>. There's a file view, a function view, a class view,
and an inlining view. Each view is sorted by code size, so you can see the
largest functions/classes/etc immediately.
The *.stats files are just human-readable text files. As it happens, they use
the flamegraph format (http://brendangregg.com/flamegraphs.html). This makes it
easy to visualize any view as a flamegraph. (If you haven't seen one before,
it's a hierarchical visualization where the width of each entry corresponds to
its frequency or, in this case, its size.)
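
Because the views are plain text, they are also easy to post-process with a
few lines of script. Here is a minimal sketch in Python, assuming each line
follows the folded-stack layout that the flamegraph scripts take as input:
semicolon-separated frames, then a space, then a number (here, a size in
bytes). The sample entries and the helper names `parse_folded` and `rollup`
are made up for illustration; they are not output or code from the tool.

```python
from collections import defaultdict


def parse_folded(lines):
    """Parse 'frame1;frame2;... <size>' lines into (frames, size) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST space, since frame names may contain spaces
        # (e.g. C++ signatures such as 'foo(int, int)').
        stack, _, size = line.rpartition(" ")
        entries.append((stack.split(";"), int(size)))
    return entries


def rollup(entries, depth=1):
    """Sum sizes over the first `depth` frames, largest group first."""
    totals = defaultdict(int)
    for frames, size in entries:
        totals[";".join(frames[:depth])] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical file-view entries, invented for this example.
sample = [
    "lib/Sema;SemaExpr.cpp 1200",
    "lib/Sema;SemaDecl.cpp 800",
    "lib/IR;Pass.cpp 300",
]

for group, size in rollup(parse_folded(sample)):
    print(f"{size:>8} {group}")
```

Rolling up by the first frame gives a quick per-directory (or per-class)
total without rendering a flamegraph at all.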
To look at code growth between two programs, you'd do:

  llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir>

Similarly, this emits four 'view' files into <dir>, but with a *.diffstats
suffix. The format is the same.
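
To make it concrete what a diff view conveys, here is a rough sketch in
Python that computes per-entry growth from two ordinary views. It is not how
the tool computes its *.diffstats files. It assumes the folded-stack layout
used by the flamegraph scripts (semicolon-separated frames, a space, then a
size), and the entries and the helper names `load_sizes` and `growth` are
invented for the example.

```python
def load_sizes(lines):
    """Map each folded stack ('a;b;c') to its total size in bytes."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space; frame names may themselves contain spaces.
        stack, _, size = line.rpartition(" ")
        sizes[stack] = sizes.get(stack, 0) + int(size)
    return sizes


def growth(baseline, target):
    """Return (stack, delta) pairs for entries whose size changed,
    ordered from largest growth to largest shrinkage."""
    deltas = {
        stack: target.get(stack, 0) - baseline.get(stack, 0)
        for stack in set(baseline) | set(target)
    }
    changed = [(stack, d) for stack, d in deltas.items() if d != 0]
    return sorted(changed, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical class-view entries for a baseline and a target build.
baseline = load_sizes([
    "clang::Sema;ActOnFoo 100",
    "llvm::Pass;run 50",
])
target = load_sizes([
    "clang::Sema;ActOnFoo 160",
    "clang::Sema;ActOnBar 20",
    "llvm::Pass;run 55",
])

for stack, delta in growth(baseline, target):
    print(f"{delta:+6d} {stack}")
```

Entries that exist only in the target count as pure growth, and entries that
exist only in the baseline show up as negative deltas.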