We are interested in adding machine-readable output to some of the tools that LLVM provides. The goal is to make these tools easier to automate and script against. We would like to gauge interest and discuss how the community would like this to be implemented.
Currently, the plan is to surface this as an optional flag that, when set, makes the tool emit machine-readable output. We are planning to use JSON as the output format; if anyone feels strongly about another format, please let us know. The tools we initially hope to tackle are llvm-nm and llvm-readelf, but we’d be interested to hear if there are other tools that the community feels would benefit from this.
The high-level implementation plan is to provide an abstract interface for output, with human-readable output as one concrete implementation of this interface and a new machine-readable output as another. For tools like llvm-readelf, this infrastructure already exists, so it would just be a matter of implementing a JSONELFDumper on top of the existing GNUELFDumper and LLVMELFDumper. For tools like llvm-nm, this infrastructure doesn’t exist, so we’d need to add this abstraction first.
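As a rough illustration of the abstraction described above, here is a minimal sketch of what an output interface for a tool like llvm-nm could look like. All names here (`SymbolInfo`, `SymbolPrinter`, `HumanReadablePrinter`, `JSONPrinter`) are hypothetical and are not existing LLVM classes; the JSON writer is deliberately naive and does no string escaping, which a real implementation would need.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical symbol record; not an LLVM type.
struct SymbolInfo {
  std::string Name;
  uint64_t Address;
  char Type;
};

// Abstract output interface: the tool collects symbols once and hands
// them to whichever printer was selected.
class SymbolPrinter {
public:
  virtual ~SymbolPrinter() = default;
  virtual std::string print(const std::vector<SymbolInfo> &Symbols) const = 0;
};

// Human-readable implementation: one "address type name" line per symbol.
class HumanReadablePrinter : public SymbolPrinter {
public:
  std::string print(const std::vector<SymbolInfo> &Symbols) const override {
    std::ostringstream OS;
    for (const SymbolInfo &S : Symbols)
      OS << std::hex << std::setw(16) << std::setfill('0') << S.Address
         << ' ' << S.Type << ' ' << S.Name << '\n';
    return OS.str();
  }
};

// Machine-readable implementation: a JSON array of objects.
// Assumption: names contain no characters that need JSON escaping.
class JSONPrinter : public SymbolPrinter {
public:
  std::string print(const std::vector<SymbolInfo> &Symbols) const override {
    std::ostringstream OS;
    OS << '[';
    for (std::size_t I = 0; I < Symbols.size(); ++I) {
      if (I != 0)
        OS << ',';
      OS << "{\"name\":\"" << Symbols[I].Name << "\",\"address\":"
         << Symbols[I].Address << ",\"type\":\"" << Symbols[I].Type << "\"}";
    }
    OS << ']';
    return OS.str();
  }
};
```

The point of the split is that the symbol-gathering code never needs to know which format was requested; adding a format means adding one more subclass.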
We are interested to hear any thoughts on:
- How would we like to surface this feature?
- Which tools would be most valuable to provide machine-readable output for?
- Does the implementation plan make sense?
This is a conversation that’s come up on odd occasions before (for example, it was a topic briefly discussed at the Brussels Euro LLVM meeting a couple of years ago), but to my knowledge, nobody has had a strong need for it until now, with the exception of llvm-symbolizer which already has a JSON output format.
Why would you need this for both llvm-nm and llvm-readelf? llvm-nm is basically just a way to dump the symbols in a file, but llvm-readelf already has that ability. If you implemented machine-readable output in llvm-readelf, would you need it for llvm-nm too?
In terms of which other tools might need it:
- llvm-objdump’s feature set broadly overlaps llvm-readelf. The only additional features are really to do with disassembly, but I doubt there are many people who are trying to parse disassembly for this.
- llvm-dwarfdump: this might be very vaguely useful, but I doubt there are many scripts that actually rely on its output.
- llvm-strings: this is just a raw dump already, so there’s no need for a “machine-readable” format (since it is already trivially parseable).
- llvm-cxxfilt: same as llvm-strings - the output is simple enough that there’s no need for JSON output here.
- llvm-ar: there are a limited number of output options in this tool, but again, I think the output is broadly trivial. There may be no need for it here either.
There may be other tools I am not so familiar with, but to summarise, if you do llvm-readelf, I doubt you’ll need to implement anything else.
I’d actually avoid doing it in any tool unless you have an actual concrete need: maintaining an additional output format is a non-trivial task, as every new feature added needs to have an additional implementation for the new output format, increasing development cost as a result.
I have previously given some thought to machine-readable output in llvm-readelf. It certainly would seem that a JSONELFDumper would be the way forward, with JSON just becoming another output option for --elf-output-style. I assume that’s what you mean by “surface this feature”?
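To make that idea concrete, here is a minimal sketch of mapping an output-style string to a dumper. The `OutputStyle` enum and both functions are hypothetical and do not reflect how llvm-readelf actually parses its options; `JSON` is the value proposed in this thread, and the dumper names are the ones mentioned above.

```cpp
#include <optional>
#include <string>

// Hypothetical enum of output styles; JSON is the proposed addition.
enum class OutputStyle { LLVM, GNU, JSON };

// Map the string given to --elf-output-style onto a style.
// Returns std::nullopt for unrecognised values.
std::optional<OutputStyle> parseOutputStyle(const std::string &Value) {
  if (Value == "LLVM")
    return OutputStyle::LLVM;
  if (Value == "GNU")
    return OutputStyle::GNU;
  if (Value == "JSON")
    return OutputStyle::JSON;
  return std::nullopt;
}

// Name of the dumper class that would handle each style.
const char *dumperNameFor(OutputStyle Style) {
  switch (Style) {
  case OutputStyle::LLVM:
    return "LLVMELFDumper";
  case OutputStyle::GNU:
    return "GNUELFDumper";
  case OutputStyle::JSON:
    return "JSONELFDumper";
  }
  return "";
}
```

With this shape, JSON output needs no new flag: it is simply one more accepted value of the existing option.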
Thank you for the insight! It makes a lot of sense to show restraint when adding output options. For now, I think the focus will be on llvm-readelf, for which we do have concrete needs; other tools can be considered afterwards if a need arises.