Lately I have been having a hard time getting my head around how array accesses
are translated by Clang into LLVM IR: the often misunderstood GEP instruction.
I am trying to reverse-engineer array accesses to recover the number of dimensions
and the actual indices of the original source expression, and I am beginning to wonder whether this is
possible at all. To illustrate (some of) my troubles, consider the following code and
the LLVM IR for both 32- and 64-bit memory addresses:
Hi Gabriel, I suggest you don't bother with testcases like this that are doing
undefined things. For example, neither i nor k are initialized, so the result
of accessing the array is undefined. Thus the frontend can (and apparently
does) produce any strange thing it likes. What is more, the result aux is
unused, so there is no obligation to compute it correctly. I think you will
get more understandable results with a more sensible testcase.
See “How does VLA addressing work with GEPs?” in the GEP FAQ. In short, you have to
reverse-engineer the indexing yourself, and the ScalarEvolution library can help you with that.
Also, one thing not mentioned in the FAQ is that if you want to assume that the dimensions
are independent (in other words, that the inner dimension is never over-indexed), you
have to prove that yourself. Even though overindexing may be prohibited at the source level,
it’s valid at the LLVM IR level, and some LLVM optimizations do use it. ScalarEvolution
can help with this as well, though it doesn’t do everything.
Thank you both for your answers.
Actually, as a result of Duncan’s comment I’ve been investigating how this
code came to be generated. It seems to happen only with a particular version of clang.
I can’t reproduce it on different machines cross-compiling for the original target
(x86_64-apple-darwin10.0.0), so I will assume I don’t need to consider this case.
I had read the part about ScalarEvolution helping with reverse engineering, but
I haven’t looked at it in any depth yet. I will eventually get to it, I guess.
Thanks for the tip.