Prune unused metadata after function extraction

Are there any existing tools that will prune unused metadata from a module?

Using llvm-extract, I can pull out a single function of interest from a large module, but it seems to copy all of the metadata, even though most of it has no connection to the single extracted function.

I could write my own pass to prune the metadata, but does this kind of step already exist somewhere?

Define “unused”. Any metadata could end up being used by some pass even if it’s not directly attached to some instruction or function. For some known obvious cases perhaps llvm-extract could be improved.

Ah okay, mainly I am thinking of debug info metadata which describes variables, types, locations, etc. that have no connection to and are not used by the single extracted function.

(I don’t want to just strip all debug info, because I am working on testing the debug info itself…! I am just hoping to minimise it down to only what’s needed to cover the extracted function.)

llvm-extract already has some support for adjusting debug info after extraction, so if there’s nothing already around that would clean up this kind of unused debug info metadata, then perhaps an additional post-extraction pass is what’s needed here.
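To make "unused" a bit more concrete, here is a rough Python sketch that works on textual IR (the output of llvm-dis or opt -S). It treats every numbered metadata node referenced from a non-metadata line as a root, follows !N references between nodes, and reports the nodes it never reaches. The helper name unreachable_metadata and the sample IR are made up for illustration. The regex matching is approximate (it would also match something that looks like !5 inside a string) and it is not how llvm-extract or any existing pass decides what to keep.

```python
import re

# "!12 = ..." defines a numbered metadata node; "!12" elsewhere is a reference.
NODE_DEF = re.compile(r"^!(\d+) = (.*)$")
NODE_REF = re.compile(r"!(\d+)\b")


def unreachable_metadata(ir_text):
    """Return the IDs of numbered metadata nodes not reachable from IR lines."""
    nodes = {}      # node id -> text of its definition
    worklist = []   # node ids referenced directly by functions, instructions, globals
    for line in ir_text.splitlines():
        stripped = line.strip()
        match = NODE_DEF.match(stripped)
        if match:
            nodes[int(match.group(1))] = match.group(2)
        elif not stripped.startswith("!"):
            # Named metadata such as !llvm.dbg.cu is deliberately not a root here.
            worklist.extend(int(n) for n in NODE_REF.findall(stripped))

    reachable = set()
    while worklist:
        node_id = worklist.pop()
        if node_id in reachable or node_id not in nodes:
            continue
        reachable.add(node_id)
        # Follow every reference made by this node's definition.
        worklist.extend(int(n) for n in NODE_REF.findall(nodes[node_id]))
    return set(nodes) - reachable


# Hypothetical, heavily abbreviated IR used only to exercise the sketch.
SAMPLE_IR = """\
define void @f() !dbg !3 {
  ret void, !dbg !5
}
!llvm.dbg.cu = !{!0}
!0 = distinct !DICompileUnit(file: !1, retainedTypes: !2)
!1 = !DIFile(filename: "a.c", directory: "/")
!2 = !{!6}
!3 = distinct !DISubprogram(name: "f", unit: !0, file: !1)
!5 = !DILocation(line: 1, scope: !3)
!6 = !DIBasicType(name: "int")
!7 = !DIBasicType(name: "unused")
"""

if __name__ == "__main__":
    print(sorted(unreachable_metadata(SAMPLE_IR)))
```

In this sample, node !6 still counts as reachable even though @f never uses the type: the subprogram points at its compile unit, and the compile unit's retainedTypes list points at !6. So plain reachability from the function is not enough to find everything that could be dropped. Lists hanging off the compile unit would need separate treatment.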

Not sure if they help here (I don’t know exactly what llvm-extract leaves around), but there are some passes and opt options that can be used to clean things up:

opt -passes='strip-dead-debug-info,strip-dead-prototypes'
opt -strip-named-metadata
opt -strip-debug

But if for example strip-dead-debug-info is able to clean things up that llvm-extract isn’t cleaning up, then maybe llvm-extract could run that pass automatically?

(But if I remember correctly, the “retainedNodes” field of some metadata nodes may need to be edited manually to drop the entries for certain variables before everything gets cleaned up properly, depending on what should be removed.)

Thanks for these suggestions! It seems that none of those existing passes is able to clean up the metadata tangle I’m seeing (or else they just drop everything).

It likely is something to do with retainedNodes or similar holding onto more than is really needed.

With everyone’s feedback here, I am at least more confident that I haven’t missed some existing tool or pass, so I’ll likely dive into the details further and write my own pass to clean this up (that I can hopefully add to llvm-extract for others as well).