Perf2bolt fails with PERF-ERROR: return code 1

Running perf2bolt fails at

PERF2BOLT: waiting for perf events collection to finish...
PERF-ERROR: return code 1

What might have failed here? How can I see the output of the nested perf call?

Do you run perf and perf2bolt on the same machine?

Yes, it’s on the same machine.

This is trying to apply BOLT to the clang binary, using a perf recording of our build. The script doing this worked before. What I changed now is adding -DLLVM_ENABLE_LTO=Thin to the clang/LLVM build.

It’s probably the perf invocation itself that is failing in this case.

The command is likely the following:

perf script -F pid,ip,brstack -f -i <>

Let us know if the command works on its own but still fails as part of perf2bolt.
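
To see what perf itself prints, one option is to run the command by hand and keep its stderr and exit status. Below is a minimal sketch, assuming a POSIX shell. run_logged is a hypothetical helper name, and the perf.data path in the usage comment is a placeholder, not the file from this thread.

```shell
# Hypothetical helper: run a command, save its stderr to a log file,
# discard stdout (perf script output is large), and report the exit status.
run_logged() {
    log=$1
    shift
    status=0
    "$@" > /dev/null 2> "$log" || status=$?
    echo "exit status: $status"
    return "$status"
}

# Example use (perf.data is a placeholder path):
# run_logged /tmp/perf-script.err perf script -F pid,ip,brstack -f -i perf.data
# cat /tmp/perf-script.err
```

If the manual run exits with 0 but perf2bolt still reports an error, the difference is more likely in the environment perf2bolt runs perf in than in the perf command line.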

That seems to run fine indeed:

$ perf script -F pid,ip,brstack -f -i >/tmp/perf-script-out 
Processed 62040822 events and lost 4050 chunks!

Check IO/CPU overload!
$ echo $?

Sorry about the delay. Please try a debug build of BOLT and pass -debug-only=aggregator to the perf2bolt invocation. You should then see the perf command line, which should help pinpoint the issue.

I have also run into this issue often. It mainly happened when the perf file was too big and I did not have enough RAM.
When perf2bolt runs, it uses tmpfs. If the perf file is quite big (5GB+), it runs out of RAM and fails. That's my experience so far.

When I lowered the size with --max-size=xGB, it worked well.
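
A quick way to check whether this is what is happening is to compare the profile size with the free space in the temp directory before running perf2bolt. This is a rough sketch, assuming a POSIX shell. fits_in_dir is a hypothetical helper name, and it is my assumption that the relevant location is ${TMPDIR:-/tmp}.

```shell
# Hypothetical helper: succeed only if FILE is smaller than the free space in DIR.
fits_in_dir() {
    file=$1
    dir=$2
    # File size in KiB, rounded up.
    file_kb=$(( ($(wc -c < "$file") + 1023) / 1024 ))
    # Free space in KiB; -P forces the portable one-line-per-filesystem format.
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$file_kb" -lt "$free_kb" ]
}

# Example use (perf.data is a placeholder path):
# fits_in_dir perf.data "${TMPDIR:-/tmp}" || echo "profile may not fit in temp space" >&2
```

This only compares raw sizes. The intermediate data may be larger or smaller than the perf file itself, so treat a passing check as a hint, not a guarantee.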

OK, the size and memory usage might well have been the issue here; the perf file was much larger than 5GB. I’ll try this again once I get back to it, but that will probably be next year.

To clarify: the profile does not need to be gigabytes in size. Start with a smaller profile and extend it until the executed-instructions count reported by dyno-stats reaches about 1 billion. It’s best to have a small but representative workload. The profiling duration depends on the workload, so we can’t give a rule of thumb for the profiling time.