Trace alias analysis arguments back to source level variables

Hi everyone,

We are currently working on a project in which we instrument the code of an input program (either source or IR; the source is C/C++) so that it produces the results of cached alias analysis queries at runtime. We intend to feed these results back into LLVM to provide perfectly precise alias query responses for this specific program.
To facilitate (and optimize) this, we thought about relating every argument pair of the input queries (specifically, queries to an implementer of AAResultBase::alias) back to source-level variables using the debug information. With this relation, we hope to instrument the source program directly so that it produces responses to all stored queries.
A simple but expensive alternative would be to run a separate instance of the compiler for each alias query, with the input being the IR of the program instrumented to produce a response for just that single query.

We are now wondering whether our first approach can work at all in the presence of optimizations. That is, does it make sense to trace alias arguments (MemoryLocations and their .Ptr Value *s) back to the source level, and to find an appropriate place in the source code to add instrumentation that checks for aliasing?

What we have found in some limited testing is that the arguments to ::alias often do not have debug information attached directly. Instead, they are the result of e.g. a GEP or bitcast of a debug-annotated SSA variable (one with @llvm.dbg.declare or @llvm.dbg.value). This in itself would not necessarily be a problem, since checking for aliasing on a GEP result can be done at the source level by checking for aliasing with the corresponding pointer arithmetic offsets. Using the approach from this Stack Overflow answer and walking backwards through GEPs, we are able to find source-level variable names for those arguments that end up being a GEP result of some debug-annotated variable. However, we haven’t figured out how to determine where in the source we should then add the instrumentation.
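
To make the source-level check concrete, here is a minimal sketch of the kind of runtime helper we have in mind. All names in it (rangesOverlap, checkAliasQuery, gSawOverlap, kMaxQueries) are hypothetical and not part of LLVM, and it assumes each stored query has been given a small integer id and that access sizes are known in bytes. It only records whether an overlap was ever observed; it says nothing about where the call has to be placed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical runtime support for instrumented programs (not an LLVM API).
// Assumption: every stored alias query has an id smaller than kMaxQueries.
constexpr std::size_t kMaxQueries = 1024;

// gSawOverlap[id] becomes true once query `id` has been observed to overlap.
// Globals with static storage duration start out zero-initialized (false).
static bool gSawOverlap[kMaxQueries];

// Returns true if the byte ranges [a, a + sizeA) and [b, b + sizeB) share at
// least one byte. Addresses are compared as integers; wrap-around at the end
// of the address space is ignored in this sketch.
static bool rangesOverlap(const void *a, std::size_t sizeA,
                          const void *b, std::size_t sizeB) {
  const auto loA = reinterpret_cast<std::uintptr_t>(a);
  const auto loB = reinterpret_cast<std::uintptr_t>(b);
  return loA < loB + sizeB && loB < loA + sizeA;
}

// Call inserted by the instrumentation for one stored query. A GEP on a
// debug-annotated variable is written as plain pointer arithmetic by the
// caller, e.g. `base + 4` for an element offset of 4.
static void checkAliasQuery(std::size_t id,
                            const void *a, std::size_t sizeA,
                            const void *b, std::size_t sizeB) {
  assert(id < kMaxQueries && "query id out of range");
  if (rangesOverlap(a, sizeA, b, sizeB))
    gSawOverlap[id] = true;
}
```

An instrumented source line would then look something like `checkAliasQuery(7, base + 4, sizeof(int), other, sizeof(int));`, where `base` and `other` are the variable names recovered from the debug information. A query whose flag is still false at program exit was never seen to overlap in that particular run, which is of course only evidence for that run.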

Alternatively, are there any examples we’ve missed that would strongly indicate that the first approach cannot work, e.g. one where the argument to an alias query has no clear origin at the source level because it was “pulled out of thin air” by LLVM?

We are on LLVM 14.0.0.