Hi all,
I’d like to get your thoughts on possibly adding a generic key-value store for ‘metadata’ to the profile data formats. Some potential use cases:
I. Profile Features
The most basic use could be as a central repository for internal bits of housekeeping information about the profile data. For example, to differentiate between FE and IR instrumentation:
llvm.instrumentation_source: “IR”
A key-value store would make it simple to add new bits of information and help keep everything human-readable for the text-based test formats. This could potentially also help with error checking at the llvm-profdata level if the Reader classes exposed it.
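To make the error-checking idea a bit more concrete, here is a rough Python sketch of how a reader might parse `key: value` lines from a text-format profile and reject a mismatched instrumentation source. Everything here is hypothetical: the function names, the line syntax, and the checking policy are illustrations only, not an existing llvm-profdata or Reader API.

```python
def parse_metadata(lines):
    """Parse 'key: value' lines into a dict (hypothetical text syntax).

    Only the first ':' separates key from value, so values such as
    timestamps may themselves contain colons.
    """
    meta = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed metadata line: {line!r}")
        key, value = key.strip(), value.strip()
        # Drop one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if key in meta:
            raise ValueError(f"duplicate metadata key: {key!r}")
        meta[key] = value
    return meta


def check_instrumentation_source(meta, expected):
    """Raise if the profile declares a different instrumentation source.

    A profile with no declaration is accepted (an assumed policy).
    """
    actual = meta.get("llvm.instrumentation_source")
    if actual is not None and actual != expected:
        raise ValueError(
            f"profile was produced by {actual} instrumentation, expected {expected}"
        )
```

With something like this, a consumer expecting FE instrumentation could fail early with a readable message when handed an IR profile, instead of mis-reading the counters.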
II. Profile Context
Basic (lightweight) information about the profile could be automatically gathered at profile time. The idea would be to automatically label profiles with contextual information so that the age/origin of a profile could be inspected using the llvm-profdata tool.
$ llvm-profdata show -metadata foo.profdata
llvm.profile_start_time: “2016-01-08T23:41:56.755Z”
llvm.profile_duration: 5.102s
llvm.exe_time: “2016-01-08T23:35:56.745Z”
Total functions: 4
Maximum function count: 866988873
Maximum internal block count: 267914296
Other possibilities: executable path, command line arguments, system info (uname)
III. Custom Content
The key-value store itself could be exposed to developers via the llvm-profdata tool. This would allow users to associate arbitrary custom data with a profile and to inspect it later:
$ llvm-profdata merge -metadata=customkey,value1 foo.profraw -o foo.profdata
$ llvm-profdata show -metadata foo.profdata
customkey: “value1”
Total functions: 4
Maximum function count: 866988873
Maximum internal block count: 267914296
Developers could add as much custom context as they find valuable:
$ llvm-profdata merge -metadata="mysoft.version,${SOFTWARE_VERSION} (${BUILD_NUMBER})" -metadata="mysoft.exe_md5,$(md5 -q foo.exe)" foo.profraw -o foo.profdata
$ llvm-profdata show -metadata foo.profdata
mysoft.version: “0.1.0”
mysoft.exe_md5: “337b5c5bc29cbdca090a1921a58465d6”
Total functions: 4
Maximum function count: 866988873
Maximum internal block count: 267914296
Other information that might be interesting: git/svn revision, workload description, system info (uname -a)
This would be a way to embed almost any platform-specific or heavy-weight data without requiring the addition of platform-specific code in compiler-rt and without impacting other developers.
When profiles are merged, it might be simplest to keep all input metadata, grouped by input file (machine-readable things such as feature bits might need to be handled differently):
$ llvm-profdata merge -weighted-input=3,foo.profdata bar.profdata -o foobar.profdata
$ llvm-profdata show -metadata foobar.profdata
foo.profdata
  llvm.profile_weight: 3
  llvm.profile_start_time: “2016-01-08T23:41:56.755Z”
  llvm.profile_duration: 5.102s
  llvm.exe_time: “2016-01-08T23:35:56.745Z”
  customkey: “value1”
bar.profdata
  llvm.profile_weight: 1
  llvm.profile_start_time: “2016-01-15T00:08:41.168Z”
  llvm.profile_duration: 1.001s
  llvm.exe_time: “2016-01-15T00:08:13.000Z”
  customkey: “value2”
Total functions: 4
Maximum function count: 866988873
Maximum internal block count: 267914296
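As a rough illustration of the "keep everything" merge policy, here is a small Python sketch. It assumes each input contributes its own metadata dictionary plus a merge weight; the function names and the `llvm.profile_weight` handling are hypothetical and only mirror the example output above.

```python
def merge_metadata(inputs):
    """Keep every input's metadata, grouped under its file name.

    `inputs` is a list of (filename, weight, metadata_dict) tuples.
    Input order is preserved; the weight is recorded next to the
    input's own keys.
    """
    merged = {}
    for name, weight, meta in inputs:
        if name in merged:
            raise ValueError(f"input listed twice: {name!r}")
        entry = {"llvm.profile_weight": weight}
        entry.update(meta)
        merged[name] = entry
    return merged


def render_metadata(merged):
    """Format merged metadata as indented 'key: value' lines per input."""
    lines = []
    for name, meta in merged.items():
        lines.append(name)
        for key, value in meta.items():
            # Strings are quoted, numbers are printed bare.
            shown = f'"{value}"' if isinstance(value, str) else str(value)
            lines.append(f"  {key}: {shown}")
    return lines
```

Nothing is deduplicated or reconciled in this sketch, so two inputs can carry the same key with different values without conflict. Feature bits would presumably need a stricter rule than this.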
In terms of implementation, the metadata could live as a separate contiguous section in the binary profile formats. It might make sense to encode it in something like YAML so that it could also be directly embedded in the various text formats.
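For what it's worth, a separate contiguous section could be as simple as a length-prefixed text blob. The Python sketch below is only an assumption about one possible layout (an 8-byte little-endian length followed by a UTF-8 payload); it is not the existing profile format. It uses JSON as a stand-in for the text encoding because Python's standard library has no YAML module.

```python
import json
import struct

# Hypothetical layout: 8-byte little-endian payload length, then the payload.
_LENGTH = struct.Struct("<Q")


def encode_metadata_section(meta):
    """Serialize a metadata dict into one contiguous byte section."""
    payload = json.dumps(meta, sort_keys=True).encode("utf-8")
    return _LENGTH.pack(len(payload)) + payload


def decode_metadata_section(buf, offset=0):
    """Read a metadata section starting at `offset`.

    Returns (metadata_dict, offset_just_past_the_section) so a reader
    can continue with whatever section follows.
    """
    if offset + _LENGTH.size > len(buf):
        raise ValueError("truncated metadata section header")
    (size,) = _LENGTH.unpack_from(buf, offset)
    start = offset + _LENGTH.size
    end = start + size
    if end > len(buf):
        raise ValueError("truncated metadata section payload")
    return json.loads(buf[start:end].decode("utf-8")), end
```

Because the payload is plain text, the same bytes could be dropped into the text formats unchanged, and a reader that doesn't care about metadata can skip the section using only the length prefix.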