[ubsan] Check recovery performance, optimizations

Hi all,

A short while back support to recover from checks was added as a -cc1
option "-fsanitize-recover".

Largest concern regarding recovery was the performance impact.

To explore this, I ran some experiments using SPEC CINT2006. These
aren't intended to represent much other than comparing the cost of
recovery vs non-recovery.

Flags used were "-fsanitize=integer
-fno-sanitize=unsigned-integer-overflow,shift", with unsigned and
shift checks disabled so the no-recover variant didn't abort on every
program :). Even so, perl, gcc, and h264 were removed from the runs
since they aborted on the no-recovery configuration.

Without recovery, overhead was 15% on average, and with recovery it
was 35%. So it does seem the performance concerns were rather
justified. See the attached graphs for per-benchmark breakdown
("NoRecover" and "Recover" respectively).

Exploring why, I made some tweaks and re-ran the same experiments,
these are shown in the "NoRecoverColdBr" and "RecoverColdBr" groups.
As can be seen, this reduces overhead of recovery to 22%, which as
shown on the second page brings the average overhead relative to
NoRecover down to 6% (second page).

The 'tweaks' made were:
* Branch metadata on the checks indicating the checks are very
unlikely to trigger. Performance counter experiments showed much
worse icache performance with recovery enabled, and branch metadata
helps block placement avoid these issues.

* Use of ColdCC on ubsan runtime calls. This alleviates the other
issue, severe register allocation damage incurred by inserting
function calls everywhere. The idea here is the same: optimize
assuming these checks should never be invoked. Unfortunately this
requires:
** Fixing ColdCC support in LLVM (see
http://llvm.org/bugs/show_bug.cgi?id=14481, which includes the patch
used for these experiments)
** Platform-specific code in ubsan to save/restore registers around
the handlers. Currently I'm doing this with hand-written asm, it'd be
much better if we could simply mark the functions themselves with
___attribute__(coldcc) or so.

These fixes seem worth exploring: adding branch metadata seems like
the right thing to do now regardless, and fixing ColdCC support (and
providing a reasonable way to make cold functions) seems of interest
to the community as well.

Anyway, thoughts/feedback welcome, especially on how to best proceed.
Also, is the reduced overhead low enough to make recovery a reasonable
default option?

Thank you for your time, and have a great Friday! :slight_smile:

~Will

recover.pdf (7.6 KB)

Hi all,

A short while back support to recover from checks was added as a -cc1
option "-fsanitize-recover".

Largest concern regarding recovery was the performance impact.

To explore this, I ran some experiments using SPEC CINT2006. These
aren't intended to represent much other than comparing the cost of
recovery vs non-recovery.

Flags used were "-fsanitize=integer
-fno-sanitize=unsigned-integer-overflow,shift", with unsigned and
shift checks disabled so the no-recover variant didn't abort on every
program :). Even so, perl, gcc, and h264 were removed from the runs
since they aborted on the no-recovery configuration.

Without recovery, overhead was 15% on average, and with recovery it
was 35%. So it does seem the performance concerns were rather
justified. See the attached graphs for per-benchmark breakdown
("NoRecover" and "Recover" respectively).

Exploring why, I made some tweaks and re-ran the same experiments,
these are shown in the "NoRecoverColdBr" and "RecoverColdBr" groups.
As can be seen, this reduces overhead of recovery to 22%, which as
shown on the second page brings the average overhead relative to
NoRecover down to 6% (second page).

Awesome - thanks for running the numbers. (ideally it'd be nice if
these sort of experiments could easily be run with LNT as that's what
we like to track perf with (not that it leaves nothing to be desired -
weighting the tests there appropriately & adding more relevant ones,
etc) - I know we've discussed LNT before, but were there particular
issues that meant LNT+the nightly test suite wasn't applicable? One of
the issues I had with it was noise. How was the noise on your SPEC
test runs?)

The 'tweaks' made were:
* Branch metadata on the checks indicating the checks are very
unlikely to trigger. Performance counter experiments showed much
worse icache performance with recovery enabled, and branch metadata
helps block placement avoid these issues.

* Use of ColdCC on ubsan runtime calls. This alleviates the other
issue, severe register allocation damage incurred by inserting
function calls everywhere. The idea here is the same: optimize
assuming these checks should never be invoked. Unfortunately this
requires:
** Fixing ColdCC support in LLVM (see
http://llvm.org/bugs/show_bug.cgi?id=14481, which includes the patch
used for these experiments)
** Platform-specific code in ubsan to save/restore registers around
the handlers. Currently I'm doing this with hand-written asm, it'd be
much better if we could simply mark the functions themselves with
___attribute__(coldcc) or so.

These fixes seem worth exploring: adding branch metadata seems like
the right thing to do now regardless, and fixing ColdCC support (and
providing a reasonable way to make cold functions) seems of interest
to the community as well.

I suppose the other relevant data point: how much does that (ColdCC on
ubsan runtime calls) speed up the non-recovery case? (kind of unfair
to compare recovery with ColdCC to non-recovery without it)

Anyway, thoughts/feedback welcome, especially on how to best proceed.
Also, is the reduced overhead low enough to make recovery a reasonable
default option?

I'm not sure about that value judgement but I expect (perhaps this is
already the assumption) that recovery will be necessary regardless of
whether it's the default.

Thank you for your time, and have a great Friday! :slight_smile:

Thanks again,
- David

Hi Will,

Hi all,

A short while back support to recover from checks was added as a -cc1
option "-fsanitize-recover".

Largest concern regarding recovery was the performance impact.

To explore this, I ran some experiments using SPEC CINT2006. These
aren't intended to represent much other than comparing the cost of
recovery vs non-recovery.

Flags used were "-fsanitize=integer
-fno-sanitize=unsigned-integer-overflow,shift", with unsigned and
shift checks disabled so the no-recover variant didn't abort on every
program :). Even so, perl, gcc, and h264 were removed from the runs
since they aborted on the no-recovery configuration.

Without recovery, overhead was 15% on average, and with recovery it
was 35%. So it does seem the performance concerns were rather
justified. See the attached graphs for per-benchmark breakdown
("NoRecover" and "Recover" respectively).

Exploring why, I made some tweaks and re-ran the same experiments,
these are shown in the "NoRecoverColdBr" and "RecoverColdBr" groups.
As can be seen, this reduces overhead of recovery to 22%, which as
shown on the second page brings the average overhead relative to
NoRecover down to 6% (second page).

The 'tweaks' made were:
* Branch metadata on the checks indicating the checks are very
unlikely to trigger. Performance counter experiments showed much
worse icache performance with recovery enabled, and branch metadata
helps block placement avoid these issues.

* Use of ColdCC on ubsan runtime calls. This alleviates the other
issue, severe register allocation damage incurred by inserting
function calls everywhere. The idea here is the same: optimize
assuming these checks should never be invoked. Unfortunately this
requires:
** Fixing ColdCC support in LLVM (see
http://llvm.org/bugs/show_bug.cgi?id=14481, which includes the patch
used for these experiments)
** Platform-specific code in ubsan to save/restore registers around
the handlers. Currently I'm doing this with hand-written asm, it'd be
much better if we could simply mark the functions themselves with
___attribute__(coldcc) or so.

These fixes seem worth exploring: adding branch metadata seems like
the right thing to do now regardless, and fixing ColdCC support (and
providing a reasonable way to make cold functions) seems of interest
to the community as well.

Anyway, thoughts/feedback welcome, especially on how to best proceed.
Also, is the reduced overhead low enough to make recovery a reasonable
default option?

Thanks a lot for doing this! That's really useful information.

I think the overhead of the recovery is low enough that we can provide
this as a driver-level option (patches welcome!). As you previously
noted, some users will want to be able to say 'do not try to recover'.
Your RecoverColdBr numbers look good enough that I'd be happy for
recovery to be the default.

Branch metadata seems like a great idea for the Recover case. We don't
have this yet mostly because it's not necessary if we're not
recovering (the unreachable tells the optimizer the same thing). Do
you know how much of the performance improvement can be gained from
that alone? Feel free to send out your patch for that!

Adding support for coldcc to Clang and using it in the runtime sounds
like a great idea, but it means we'd lose the ability to compile
compiler-rt with gcc. In the longer term, I think this is a good path
to take, and I would encourage you to post your patch for PR14481,
along with testcases, to llvm-commits, and also to pursue a coldcc
attribute for Clang. However, I'd like to get to a position where our
build system builds compiler-rt with the just-built Clang (and somehow
handle the $BUILD != $HOST case) before we migrate to this approach.