Adding CFI checks to clang vs llvm

Hi,

In http://reviews.llvm.org/D7424 we've been discussing whether to insert
control flow integrity checks in Clang or LLVM. The main challenge is that
the checks need something like a string associated with each call, and
there's currently no stable way to ensure that the string stays with the call.

The current version of the patch does the checks with an intrinsic, but
there's a concern that this may interfere with devirtualization.

Does anyone have any opinions besides what's been discussed on the review
thread?

Thanks,

Rather than adding a new intrinsic, you could use patchpoints or
statepoints to represent this. If you passed the string that needs to stay
tied to the call as an argument, it would end up in the stackmap section,
and it would be guaranteed to remain available throughout the optimizer.

Philip

It may be a good idea to use patchpoints (or something like them) to
reserve space in which the linker could assemble a check, possibly
optimized using global information, if we wanted to drop the dependency on
LTO. I'd need to think about this more though, and it is probably not
something we'd want to do in version 1.

In general, the idea of representing the calls as an intrinsic call taking
a function pointer/args seems interesting, but it may be simplest to avoid
trying to overload one of the existing intrinsics.

Peter

My primary concern is that I would very much like the CFI implementation to
be truly generic for indirect function calls rather than specific to type
hierarchies.

Is the issue that for virtual calls there is a dramatically cheaper way to
structure the CFI implementation than there is for fully general indirect
calls?

Is the issue that detecting and instrumenting the calls in the IR is
particularly complex?

The main problem with a design that is fully generic to indirect calls
is a lack of precision. If we design our checks independent of the type
hierarchy we could permit virtual calls to a function of the wrong type if
the parameter/return types would otherwise match. See also [1] which contains
some discussion of precision.
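
To make the precision point concrete, here is a minimal sketch in C++. It
is a toy model only: Sig, Cls, Target, signatureCheck and hierarchyCheck
are hypothetical names invented for illustration and do not correspond to
anything in the patch or in Clang/LLVM. It compares a check that looks only
at parameter/return types with one that also consults the type hierarchy.

```cpp
// Toy model: every name here is hypothetical, not part of Clang or LLVM.
enum class Sig { VoidOfInt, IntOfVoid };   // parameter/return types
enum class Cls { Shape, Circle, Logger };  // classes that own virtual functions

struct Target {
    Sig sig;    // signature of the function being called
    Cls owner;  // class whose vtable contains the function
};

// True when `c` is `base` or derives from it in this toy hierarchy:
// Circle derives from Shape, Logger is unrelated to both.
constexpr bool derivesFrom(Cls c, Cls base) {
    return c == base || (c == Cls::Circle && base == Cls::Shape);
}

// Check designed independently of the type hierarchy: signature only.
constexpr bool signatureCheck(Target t, Sig expected) {
    return t.sig == expected;
}

// Check that also requires the target to belong to the static type's hierarchy.
constexpr bool hierarchyCheck(Target t, Sig expected, Cls staticType) {
    return t.sig == expected && derivesFrom(t.owner, staticType);
}

constexpr Target circleDraw{Sig::VoidOfInt, Cls::Circle};
constexpr Target loggerWrite{Sig::VoidOfInt, Cls::Logger};

// A call through a Shape* expecting void(int):
static_assert(signatureCheck(loggerWrite, Sig::VoidOfInt),
              "signature-only check accepts a function of an unrelated class");
static_assert(!hierarchyCheck(loggerWrite, Sig::VoidOfInt, Cls::Shape),
              "hierarchy-aware check rejects it");
static_assert(hierarchyCheck(circleDraw, Sig::VoidOfInt, Cls::Shape),
              "hierarchy-aware check still accepts a real override");
```

In this model the signature-only check lets a corrupted Shape call land in
Logger's function because the parameter/return types happen to match, which
is the loss of precision described above.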

There are also ancillary performance benefits of this design. For example,
if we lay out the type information near the virtual tables we can in some
cases avoid an additional cache miss.

Thanks,

> My primary concern is that I would very much like the CFI implementation
> to be truly generic for indirect function calls rather than specific to
> type hierarchies.
>
> Is the issue that for virtual calls there is a dramatically cheaper way
> to structure the CFI implementation than there is for fully general
> indirect calls?

> The main problem with a design that is fully generic to indirect calls
> is a lack of precision. If we design our checks independent of the type
> hierarchy we could permit virtual calls to a function of the wrong type if
> the parameter/return types would otherwise match. See also [1] which
> contains some discussion of precision.

This is helpful information, and clarifies a miscommunication. =]

My hope would be that we could implement the checks generically, but have
the frontend supply constraints that further narrow the set of valid
targets. Does that make sense at all?

Maybe to put it another way, if I have some other language than C++ which
provides a different language-based constraint on the set of possible
targets for an indirect call, it would be nice for both Clang and that
other language frontend to encode generic target constraints and then a
common check implementation to kick in to implement them. And then the
check mechanism could be *completely* generic in the case of fully general
indirect calls.
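
As a rough sketch of what "generic target constraints plus a common check"
could look like, assume each possible call target is given a small index
and a constraint is just a bit set of allowed indices. TargetSet, single,
allowed and the target numbering below are all hypothetical and chosen only
for illustration; nothing here reflects an actual LLVM interface.

```cpp
#include <cstdint>

// Hypothetical encoding: bit i set means "target i is a valid callee".
using TargetSet = std::uint64_t;

// Set containing exactly one target; assumes i < 64.
constexpr TargetSet single(unsigned i) { return TargetSet{1} << i; }

// The one check shared by every frontend: is the target in the set?
constexpr bool allowed(TargetSet s, unsigned target) {
    return target < 64 && ((s >> target) & 1u) != 0;
}

// Hypothetical numbering of address-taken functions in a program.
constexpr unsigned kCircleDraw = 0, kSquareDraw = 1, kLoggerWrite = 2, kCallback = 3;

// A C++ frontend could derive this from the class hierarchy of Shape.
constexpr TargetSet shapeDrawTargets = single(kCircleDraw) | single(kSquareDraw);

// Another language frontend could derive its own set from its own rules.
constexpr TargetSet otherLangTargets = single(kLoggerWrite);

// With no frontend constraint, fall back to every address-taken function.
constexpr TargetSet anyTarget = single(kCircleDraw) | single(kSquareDraw) |
                                single(kLoggerWrite) | single(kCallback);

static_assert(allowed(shapeDrawTargets, kSquareDraw), "override is allowed");
static_assert(!allowed(shapeDrawTargets, kLoggerWrite), "unrelated function is rejected");
static_assert(allowed(otherLangTargets, kLoggerWrite), "other frontend, same check");
static_assert(allowed(anyTarget, kCallback), "fully general call accepts any known target");
```

The point of the sketch is only that the check itself does not know where
the set came from; how such sets would be represented in IR is left open.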

> There are also ancillary performance benefits of this design. For example,
> if we lay out the type information near the virtual tables we can in some
> cases avoid an additional cache miss.

That does seem quite nice to preserve if possible. Is it possible to use a
hint to get this?

I would be wary of trying to be too generic up front. The two fundamental
things we might want to check are properties of pointers into data (for
virtual calls) and of pointers into code (for indirect calls through function
pointers), and these may very well call for different types of checks. For
example, we may well find that the best way to check indirect calls through
function pointers is to encode and check for specific instructions in the
function body as done in [1], but we shouldn't need to waste space on encoding
the instructions in functions called via virtual tables.
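
To illustrate why the two kinds of check might differ, here is a small
sketch. It assumes, purely for illustration, that a data-pointer check
tests whether a vtable pointer falls on a slot of a known region, and that
a code-pointer check compares a marker stored with the function. The names
VTableRegion, vptrInRegion, FakeFunction and markerMatches, and all the
constants, are hypothetical; neither the patch nor [1] is being quoted.

```cpp
#include <cstdint>

// Data-pointer check (virtual calls): the vptr must point at one of the
// slots of a contiguous region holding the valid vtables.
struct VTableRegion {
    std::uintptr_t begin;   // address of the first valid vtable
    std::uintptr_t end;     // one past the last valid vtable
    std::uintptr_t stride;  // spacing between vtables; assumed nonzero
};

constexpr bool vptrInRegion(std::uintptr_t vptr, VTableRegion r) {
    return vptr >= r.begin && vptr < r.end && (vptr - r.begin) % r.stride == 0;
}

// Code-pointer check (function pointers): the callee carries a marker that
// the call site compares against. Modelled here as a plain struct field.
struct FakeFunction {
    std::uint32_t marker;  // stands in for bytes encoded in the function body
};

constexpr bool markerMatches(FakeFunction f, std::uint32_t expected) {
    return f.marker == expected;
}

constexpr VTableRegion region{0x1000, 0x1100, 0x40};
static_assert(vptrInRegion(0x1040, region), "aligned pointer inside the region");
static_assert(!vptrInRegion(0x1048, region), "misaligned pointer is rejected");
static_assert(!vptrInRegion(0x2000, region), "pointer outside the region is rejected");

constexpr FakeFunction callback{0xC0DEF00D};
static_assert(markerMatches(callback, 0xC0DEF00D), "marker present");
static_assert(!markerMatches(callback, 0xDEADBEEF), "wrong marker is rejected");
```

In this model only functions reachable through function pointers need to
carry a marker; functions reached solely via vtables are covered by the
data-pointer check and pay no per-function space cost.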

Also keep in mind that other languages have vtable-like things as well
(witness Go interfaces, Haskell type classes and Rust traits) and I do think
it would be feasible to implement checks for these features in terms of the
same primitives we're introducing here for C++ vtables.

Thanks,