RFC: Removing ptrtoint from asan instrumentation

[AMD Official Use Only - Internal Distribution Only]

Hi everyone,

Asan instrumentation introduces a ptrtoint instruction which is used as an argument to a number of runtime functions. Every instrumented load/store ends up having at least one ptrtoint attached to its address. However, the community has now raised a number of concerns about the use of LLVM generated ptrtoint & inttoptr [1] and have pointed out its effect on different optimizations. ptrtoint/inttoptr leads to conservative analysis results which does affect the performance.

We are thinking of extending the LLVM sanitizers, starting with asan, to heterogeneous situations such as those found in OpenCL and HIP. This will require asan to handle address spaces other than default 0. We want to handle address spaces similarly to how it is done in the present asan instrumentation, using inline shadow memory mapping. The required information to calculate shadow address is not known for a few address spaces at compile time, and for such use cases we plan to introduce runtime calls that can return the required shadow address.

The ptrtoint introduced by the instrumentation hinder optimizations that are important for OpenCL and other heterogeneous languages which directly or indirectly use address spaces. For example, InferAddressSpace which statically determines the address space of generic addresses, it does not work after seeing ptrtoint as one of the uses. The same applies to any optimization that deals with alias analysis or pointers in general.

The hurdle in getting rid of the introduced ptrtoint in instrumentation is the fact that all runtime function expects the address as a uptr (uptr is unsigned long). Changing the data type of addresses passed to sanitizer runtime functions from uptr to void* will eliminate the need for the conversion to integer using ptrtoint. Removing the ptrtoint from asan instrumentation will boost the performance and will reduce the differences with uninstrumented code.

  1. http://lists.llvm.org/pipermail/llvm-dev/2019-January/129095.html

Many thanks,

Reshabh and Brian

...

Hi everyone,

Asan instrumentationintroduces a ptrtointinstruction which is used as an argument to a number ofruntime functions. Every instrumented load/store ends up having at least one ptrtointattached to its address.However, the community has now raised anumber ofconcerns about the use of LLVM generated ptrtoint& inttoptr[1]and have pointed out its effecton different optimizations. ptrtoint/inttoptrleads to conservative analysis results which does affect the performance.

We are thinking of extending the LLVM sanitizers, starting with asan, to heterogeneoussituations such as those found in OpenCLandHIP.This will require asanto handle address spacesother than default 0.We want to handle address spacessimilarly to howit is done in the present asaninstrumentation, using inline shadow memory mapping. The required information to calculate shadow address is not known for a few address spaces at compile time, and for such use cases we plan to introduce runtime calls that can return the required shadow address.

The ptrtointintroduced by the instrumentation hinderoptimizations that are important for OpenCL and other heterogeneous languages which directly or indirectly use address spaces.For example, InferAddressSpacewhichstaticallydeterminesthe address space of generic addresses, it does not work after seeing ptrtointas one of the uses. The same applies to any optimization that deals with alias analysis or pointers in general.

The hurdle in getting rid of the introducedptrtointin instrumentation is the fact that all runtime function expectsthe address as a uptr(uptris unsigned long).Changing the data type of addresses passed to sanitizerruntime functionsfrom uptrto void* will eliminate the need forthe conversionto integer using ptrtoint. Removing the ptrtointfrom asaninstrumentation will boost the performance andwill reduce the differenceswith uninstrumentedcode.

1.

    http://lists.llvm.org/pipermail/llvm-dev/2019-January/129095.html
    <http://lists.llvm.org/pipermail/llvm-dev/2019-January/129095.html&gt;

Many thanks,
Reshabh and Brian

First, let me say that I think it's wonderful that you're interested in making the sanitizers work in heterogeneous environments. I definitely hope that you're able to make this work.

Second, for the reasons you've highlighted, our best practice is for optimizations not to introduce ptrtoint/inttoptr. i8* GEPs, along with casts as necessary, are preferred whenever possible. Given that one benefit of the Sanitizers is that the instrumentation can be optimized along with the program, I think it makes sense to consider these instrumentation passes in that category. It's true that, in the traditional use case, the fact that the ptrtoint obscures aliasing information isn't a big deal when the int is just being passed into a function with side effects (as it sounds like is the case here). If we have a use case that motivates changing this, however, I think that we should consider doing so.

 -Hal

Hi everyone,

Asan instrumentation introduces a ptrtoint instruction which is used as an argument to a number of runtime functions. Every instrumented load/store ends up having at least one ptrtoint attached to its address. However, the community has now raised a number of concerns about the use of LLVM generated ptrtoint & inttoptr [1] and have pointed out its effect on different optimizations. ptrtoint/inttoptr leads to conservative analysis results which does affect the performance.

We are thinking of extending the LLVM sanitizers, starting with asan, to heterogeneous situations such as those found in OpenCL and HIP. This will require asan to handle address spaces other than default 0. We want to handle address spaces similarly to how it is done in the present asan instrumentation, using inline shadow memory mapping. The required information to calculate shadow address is not known for a few address spaces at compile time, and for such use cases we plan to introduce runtime calls that can return the required shadow address.

The ptrtoint introduced by the instrumentation hinder optimizations that are important for OpenCL and other heterogeneous languages which directly or indirectly use address spaces. For example, InferAddressSpace which statically determines the address space of generic addresses, it does not work after seeing ptrtoint as one of the uses. The same applies to any optimization that deals with alias analysis or pointers in general.

The hurdle in getting rid of the introduced ptrtoint in instrumentation is the fact that all runtime function expects the address as a uptr (uptr is unsigned long). Changing the data type of addresses passed to sanitizer runtime functions from uptr to void* will eliminate the need for the conversion to integer using ptrtoint. Removing the ptrtoint from asan instrumentation will boost the performance and will reduce the differences with uninstrumented code.

  1. http://lists.llvm.org/pipermail/llvm-dev/2019-January/129095.html

Many thanks,

Reshabh and Brian

First, let me say that I think it’s wonderful that you’re interested in making the sanitizers work in heterogeneous environments. I definitely hope that you’re able to make this work.

Second, for the reasons you’ve highlighted, our best practice is for optimizations not to introduce ptrtoint/inttoptr. i8* GEPs, along with casts as necessary, are preferred whenever possible. Given that one benefit of the Sanitizers is that the instrumentation can be optimized along with the program, I think it makes sense to consider these instrumentation passes in that category. It’s true that, in the traditional use case, the fact that the ptrtoint obscures aliasing information isn’t a big deal when the int is just being passed into a function with side effects (as it sounds like is the case here). If we have a use case that motivates changing this, however, I think that we should consider doing so.

Yeah, I like this idea. I’ve briefly looked at replacing ptrtoint/inttoptr with GEP a few years back but could not get any perf or code size improvement out of it. In particular, I could not convince LLVM that shadow and regular memory never alias, though I did not try too hard. Maybe things have changed, or your target is sufficiently different from x86. In any case, this sounds like a change in the right direction. All runtime functions are extern “C”, so replacing uptr with void* in the arguments is not even an ABI break, really.

Note that MSan shadow mapping on linux/x86_64 is XOR with a constant, that would not work with a GEP.