Can I get the binary address of a for-loop statement?

Hi, all

  I want to locate the address range that a for-loop statement
occupies in a binary. For example, given the for-loop statement below,

for (stat1; stat2; stat3) {
  /* do something */
}

  Is it possible to get the range of binary addresses covered by
the above for-loop, say, 0x0100 - 0x0120?

  One idea that comes to mind is to add passes to LLVM that retrieve
this information, then compile the code with llvm-gcc.

  Any suggestions are appreciated.

Regards,
chenwj

In general, this is not possible. Optimization passes may reorder the code: they can hoist computations out of the loop and out of any enclosing loops, schedule those instructions ahead of others that did not originate from source lines within the loop, and so on. That's not to mention things like loop unrolling.

That said, especially at low optimization levels, you may be able to get at least an approximation of what you're after from debug information. See http://llvm.org/docs/SourceLevelDebugging.html for an overview.
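
As a rough illustration of the debug-information route, consider a small function like the one below (count_even is a made-up name, not something from this thread). If it is compiled with -g at -O0 and the line table is dumped, for instance with llvm-dwarfdump --debug-line, each row pairs an address with a source line, so the rows whose line numbers fall inside the for statement give an approximate address range for the loop. At higher optimization levels those rows may be interleaved with rows for other lines, or vanish entirely.

```c
#include <stddef.h>

/* Hypothetical example function; the for statement spans four
 * consecutive source lines, which is what the line table is keyed on. */
size_t count_even(const int *values, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {   /* loop header */
        if (values[i] % 2 == 0)        /* loop body   */
            count++;
    }
    return count;
}
```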

Regards,
-Jim

Another thing that might work would be to insert inline assembly statements that define symbols before and after the loop. The LLVM optimizations shouldn't move loop code across these inline assembly statements (if the statements are marked volatile and declared as modifying memory), but optimizations between the inline assembly statements should remain unfettered.

Granted, this might inhibit some optimizations, but whether that's an issue depends upon your application.
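
A minimal sketch of the marker idea, assuming GCC or Clang extended asm and an ELF target such as Linux (other object formats may need a different symbol spelling). The names loop_begin, loop_end, fill_squares and loop_code_size are made up for illustration. The markers are expected, not guaranteed, to bracket the loop: the compiler is still free to lay out blocks as it likes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker symbols; they are defined by the inline
 * assembly below, not by any C object. */
extern char loop_begin[];
extern char loop_end[];

/* noinline: an inlined or duplicated copy of this function would
 * define the two labels a second time and fail to assemble. */
__attribute__((noinline))
void fill_squares(int *out, size_t n)
{
    /* volatile + "memory" clobber: the stores in the loop should not
     * be moved across either marker. */
    __asm__ volatile(".globl loop_begin\nloop_begin:" ::: "memory");
    for (size_t i = 0; i < n; i++) {
        out[i] = (int)(i * i);
    }
    __asm__ volatile(".globl loop_end\nloop_end:" ::: "memory");
}

/* Size in bytes of the code between the two markers. */
size_t loop_code_size(void)
{
    uintptr_t begin = (uintptr_t)loop_begin;
    uintptr_t end = (uintptr_t)loop_end;
    assert(end >= begin);  /* expected layout, see the caveat above */
    return (size_t)(end - begin);
}
```

After linking, the two symbols should also show up in the symbol table (for example in nm output), which gives the addresses without running the program.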

-- John T.

Hi, all

  Actually, I want to see whether it is possible to generate
annotations useful to a binary translator. The idea comes
from the paper below,

  Techniques to improve dynamic binary optimization
  http://www-users.cs.umn.edu/~adas/adas-thesis-embed.pdf

The paper lists annotations that may benefit a binary translator
in chapter 5. For example, basic block register usage (live-ins
and live-outs) lets a binary translator know which registers
can be used for optimization, and control flow information can be
used by the binary translator to identify a loop.

  Since such information is meant to be consumed by a binary translator,
it must be associated with binary (virtual) addresses.

  Any comments? Thanks!

Regards,
chenwj

By the way, is it possible to insert additional labels using the
inline assembly statements?
I understand this may degrade the optimization quality, but it could
be done at the last optimization stage.

Alexander Potapenko

I think so but have never tried it.

-- John T.