Counting load and stores of variables

Hi all,
I’m working on my thesis in which I need to count the number of memory access (i.e. load and stores) of each variable of program. how can I do that? these are some approaches came to my mind :

  • I need to know during the execution of the program how many load and store instructions are executed on a specific variable, hence simply counting static load and store instructions doesn’t help.
  • I also cannot write an IR pass to inject some instructions to count memory accesses of variables (instrumenting) , because in IR optimization I’m not aware of what will happen in register allocation.
  • And if I write a machine pass to inject machine instructions, I guess maybe I loose some informations( e.g. variable names) during the optimizations.

Thanks,
MohammadKazem

Seems tricky.

The complexity of the solution depends on

int x; // our variable of interest

int func(int &a)
{
    a++; // a++ actually updates `x` because it's passed in below.
}

int main()
{
    func(x);
}

We can of course make MUCH more complex variants of the above, but
there is no way the compiler, whilst compiling `func` can know if the
variable is `x` or not - until it optimizes and inlines `func`, at
least.

The simple C++ solution would be to replace the simple variable with
an object that has a "counter" applied to all accesses of the content
- but it will affect code-gen and it will affect the "compatibility"
of the variable (it's no longer a regular `int` or `double`). Aliasing
gets challenging here.

Using the compiler's profiling options will tell you which blocks are
hit how many times, and if you analyse which blocks contain your
variable [and how many times the variable is updated] - this assumes
your variable isn't aliased.

Another thought that comes to my mind is to use a virtual machine, and
place those variables in a special section of memory, that can then be
attributed with "read_only" or "no access" and have the VM catch the
access, update the value and then continue. But that's not an easy
solution.

If the number of variables is small, you could use debug functionality
for memory acesss breakpoints (register a "break on read" and "break
on write" for a certain variable - in the handler for this, count up
the relevant counter and continue). This is probably doable even if
the number of available hardware breakpoints is smaller than the
number of variables you want to monitor, by running the program
multiple times (assuming your program is repeatable with the same
result every time - if it's some sort of random number generation with
new seed every time you run it, or if it is based on some real-time
events (stock market, traffic, weather, live video etc), this won't
work).

Hardware solutions (either virtual machine or memory access
breakpoints) are the only reasonable ways to solve the aliasing
situation.

If you have plenty of time, you could also use a processor emulator,
for example qemu - and let it catch writes to certain addresses and
count how many times. Not practical for most applications tho', since
the processor emulators run much slower than real hardware (100-1000x
slower in a good case)

It gets even more interesting if we consider the actual machine
instructions. The `a++` above could in say x86 be implemented as `inc
a` or `mov a, reg; inc reg; move reg, a` - both will indeed read,
modify the value and write it back - but one is explicitly
loading/storing the value, the other is not. How do you account for
that? Is it one or two accesses?

Further on that last comment:

If you are interested in counting accesses to `x`, and have the following code:

    a = x + x + x + x;

and the compiler translates that to:

    a= x * 4;

should that count as 1 or 4 accesses?

What if you have code like this:

     x = 1;
     if (a) x = 4;
     if (b) x = 5;
     if (c) x = 7;

if a, b and c are true, and the compiler decided to re-arrange the
code like this:

     reg_temp = 1;
     if (a) reg_temp = 4;
     if (b) reg_temp = 5;
     if (c) reg_temp = 7;
     x = reg_temp;

how many accesses to `x` is that? 1 or 4? Do you care about accesses
in the original source or accesses to the memory location where it
stores the value?

Are you trying to capture information in higher optimization level ?

If yes, then it will be tricky and you may not get correct information.

You should be careful with cases where load & stores are optimized by registers.

You can try address based instrument in your translation units.

Whenever you see memory accesses (i.e. load & store) instrument them to print to a file.

i.e.

address x100 accessed as store.

address x200 accessed as store.

address x100 accessed as store.

address x100 accessed as load.

Also print variable belongs to address.

i.e.

a address is x100

b address is x200.

later as a post processing step, process that file to generate consolidated report.

Sometime adding instrumentation will hurdle few optimizations and generated report

may not be the default execution flavor but it will be very closed to it.

Regards,

Ashutosh

Thanks, your solution seems doable, but how can I print variable addresses (in particular local variables) ?

Thanks for your response,

Further on that last comment:

If you are interested in counting accesses to `x`, and have the following code:

a = x + x + x + x;

and the compiler translates that to:

a= x * 4;

should that count as 1 or 4 accesses?

if the compiler translated it to a= x*4 and it has actually one load it should count as 1,

What if you have code like this:

  x = 1;
  if (a) x = 4;
  if (b) x = 5;
  if (c) x = 7;

if a, b and c are true, and the compiler decided to re-arrange the
code like this:

  reg_temp = 1;
  if (a) reg_temp = 4;
  if (b) reg_temp = 5;
  if (c) reg_temp = 7;
  x = reg_temp;

how many accesses to `x` is that? 1 or 4? Do you care about accesses
in the original source or accesses to the memory location where it
stores the value?

I care about memory location and not original source. thus it has actually 1 access to x.

Thanks for giving so many alternatives.

It gets even more interesting if we consider the actual machine
instructions. The `a++` above could in say x86 be implemented as `inc
a` or `mov a, reg; inc reg; move reg, a` - both will indeed read,
modify the value and write it back - but one is explicitly
loading/storing the value, the other is not. How do you account for
that? Is it one or two accesses?

As I target arm instruction set, I need only to consider load and store instructions.

Thanks, your solution seems doable, but how can I print variable addresses (in particular local variables) ?