[RFC] Remarks-based code size analysis tool

Hi everyone,

I frequently want a quick way to find out which functions have changed the most in size when I compile a program with two different versions of the compiler.

Optimization remarks provide deeper insight into what the compiler does to individual functions during compilation. A tool based on remarks would give us the means to say exactly what a pass did to a function. We also have remarks from the asm-printer which tell us the number of instructions in each function once it has been compiled.
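For reference, here is roughly what an asm-printer size remark looks like in the YAML serialization (the function name, path, and count are invented for illustration, and the exact fields may vary between LLVM versions):

```yaml
--- !Analysis
Pass:            asm-printer
Name:            InstructionCount
DebugLoc:        { File: example.c, Line: 1, Column: 0 }
Function:        example_function
Args:
  - NumInstructions: '12'
  - String:          ' instructions in function'
...
```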

It’s now possible to access remarks through object files. So, I thought it would be an interesting experiment to use the new C++ remarks API to write a simple size tool. A size tool using remarks would be able to tell us the following things:

  1. Which functions grew the most in size between one compilation and another compilation
  2. Which functions were/were not inlined between compilations (and similar insights into other passes)
  3. Which compiler passes caused the largest increases in each function

I have a prototype for a size tool which accomplishes (1) using asm-printer remarks. (2) and (3) can be accomplished by searching for remarks corresponding to interesting passes and by searching for size-info remarks respectively. For now, I just decided to implement the minimum set of features for a useful tool. The patch is available here:

https://reviews.llvm.org/D63306

The tool in the patch works like this:

$ ./sizediff old_version_of_program new_version_of_program

And it produces output like this:

function_name old_instruction_count new_instruction_count delta

This output can tell me the following things:

  • Which functions were not present in old_version_of_program (they have a size of 0)
  • Which functions were not present in new_version_of_program (same idea)
  • Which functions grew the most
  • Which functions shrunk the most
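The diffing logic behind that output can be sketched as follows (a minimal Python sketch, not the actual implementation in the patch; the function names and counts are made up):

```python
# Sketch of the size-diffing logic: merge per-function instruction
# counts from two builds and sort by delta. A function missing from one
# build gets a count of 0, matching the output described above.

def size_diff(old_counts, new_counts):
    """old_counts/new_counts: {function_name: instruction_count}."""
    names = set(old_counts) | set(new_counts)
    rows = []
    for name in names:
        old = old_counts.get(name, 0)
        new = new_counts.get(name, 0)
        rows.append((name, old, new, new - old))
    # Largest growth first, so the biggest regressions are at the top.
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows

# Invented example data: "baz" is new, "bar" disappeared.
old = {"foo": 12, "bar": 40}
new = {"foo": 30, "baz": 5}
for name, o, n, delta in size_diff(old, new):
    print(name, o, n, delta)
```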

For a concrete example of the output, here’s some output I got by compiling the stepanov_container benchmark for AArch64 with -Os and -O3 respectively:

https://hastebin.com/hucexocuxi.md

If anyone is interested, feel free to try it out/review/etc. I think that a tool like this would be quite useful. :slight_smile:

  • Jessica

Hi Jessica,

We often have exactly the same problem:

I frequently want a quick way to find out which functions have changed the most in size when I compile a program with two different versions of the compiler.

Your tool looks very interesting and useful. I think we are very interested. In fact, I am looking at code-size problems at the moment, and will give it a try. I think we can help with the review too.

FWIW, when I investigate code-size problems, I usually start by looking at the final linked image (for context, that’s mostly baremetal images). Unused sections and functions will be removed by the linker, which e.g. avoids unnecessarily analysing them, something that can happen when you only look at object files. I use a downstream tool for binaries that lists the code size per function. When I generate such a report for different images, I can easily diff them and see where the code-size differences are biggest. I then isolate a function and compile it with debug output and print-after-all, so that I can again do a diff to see where changes are introduced. And when image sizes are small, and library inclusion is thus relatively important, the linker in verbose mode telling me why and which objects and functions are included is extremely useful too.

Cheers,
Sjoerd.

Hi Sjoerd,

Your tool looks very interesting and useful. I think we are very interested. In fact, I am looking at code-size problems at the moment, and will give it a try. I think we can help with the review too.

That would be great!

FWIW, when I investigate code-size problems, I usually start by looking at the final linked image (for context, that’s mostly baremetal images). Unused sections and functions will be removed by the linker, which e.g. avoids unnecessarily analysing them, something that can happen when you only look at object files. I use a downstream tool for binaries that lists the code size per function. When I generate such a report for different images, I can easily diff them and see where the code-size differences are biggest. I then isolate a function and compile it with debug output and print-after-all, so that I can again do a diff to see where changes are introduced. And when image sizes are small, and library inclusion is thus relatively important, the linker in verbose mode telling me why and which objects and functions are included is extremely useful too.

I have a similar tool! :slight_smile:

I think that you’re right with respect to looking at functions that actually appear in the final linked image. In my experience, this is the common case. (My personal script just looks at the output of llvm-nm.)
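For what it’s worth, a minimal sketch of that kind of script might look like this (the symbol lines are invented, and I’m assuming the four-column address/size/type/name output you get from `llvm-nm --print-size`):

```python
# Sketch: build a {function: size_in_bytes} map from llvm-nm output.
# Assumes `llvm-nm --print-size` lines of the form:
#   <address> <size> <type> <name>
# The sample below is made-up data for illustration.

SAMPLE = """\
0000000000000000 000000000000001c T foo
0000000000000020 0000000000000040 T bar
0000000000000060 0000000000000000 t local_helper
"""

def sizes_from_nm(text):
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        # Only keep text-section symbols (functions).
        if len(parts) == 4 and parts[2] in ("T", "t"):
            _addr, size, _kind, name = parts
            sizes[name] = int(size, 16)  # sizes are printed in hex
    return sizes

print(sizes_from_nm(SAMPLE))
```

Generating one such map per image and feeding both into a diff gives the same kind of report, but restricted to what actually survived linking.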

I think that combining the two approaches could be interesting. By using remarks in conjunction with the symbols present in the final linked image, it should be possible to tell the user whether a function was eliminated by the compiler (and by which pass) or by the linker.

I think the default output should be something like this:

  • Only show functions that appear in the binary
  • If a function appears in one image, but not the other, check if there is a remark stating that its instruction count went to 0. If there is, state that it was eliminated by the compiler.
  • … If there isn’t, state that it was eliminated by the linker
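A rough sketch of that classification (hypothetical names throughout; `zero_size_remarks` stands in for whatever remark query we end up using to find functions whose instruction count went to 0):

```python
# Sketch: for each function present in the old image but missing from
# the new one, decide whether the compiler or the linker removed it,
# based on whether a remark recorded its instruction count going to 0.

def classify_missing(old_syms, new_syms, zero_size_remarks):
    """old_syms/new_syms: sets of function names in each linked image.
    zero_size_remarks: functions whose remarks show a size of 0."""
    result = {}
    for fn in old_syms - new_syms:
        if fn in zero_size_remarks:
            result[fn] = "eliminated by the compiler"
        else:
            result[fn] = "eliminated by the linker"
    return result

# Invented example: "b" was optimized away, "c" was dropped at link time.
print(classify_missing({"a", "b", "c"}, {"a"}, {"b"}))
```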

I think that for a first pass, just to get the functionality going, I’d like to just focus on the remarks-based diffing. After that, I think it would be a natural step to add information about the actual binary to the tool as well.

  • Jessica

I realized I made a mistake in the previous email, since it seems like remarks are only in object files. I thought they would be included in linked images for some reason. So, I guess that with the current scheme, we can’t really look at that.

  • Jessica