I am looking for a tool (on Linux or Windows) that allows me to get performance measures such as execution cycles, cache accesses, etc. on an x86 architecture. I want to estimate the performance overhead caused by the modifications I make using LLVM.
OProfile for Linux is a pretty good alternative.
(About OProfile)
It uses hardware performance counters to collect profiling information
and therefore has very low overhead, whereas Valgrind performs dynamic
binary instrumentation and can be significantly slower (20-50x).
In addition, Cachegrind simulates cache behavior through its own
cache model, whereas OProfile (or any other counter-based profiler)
reports real cache events.
Depending on your needs (ease of use, runtime overhead, etc.),
you could pick either.
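If you want counter values for one specific region of your own code rather than a profile of the whole run, Linux also exposes hardware counters directly through the `perf_event_open(2)` system call. The following is a minimal sketch, assuming Linux on x86 with access to the CPU's performance monitoring unit. It counts CPU cycles, cache references and cache misses for user-space code only. `measure` and `sample_workload` are hypothetical names; `sample_workload` just stands in for the code you compiled with your LLVM modification.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NCOUNTERS 3

/* Generic hardware events; what the cache events count exactly is
   CPU-specific (on many x86 parts, the last-level cache). */
static const uint64_t events[NCOUNTERS] = {
    PERF_COUNT_HW_CPU_CYCLES,
    PERF_COUNT_HW_CACHE_REFERENCES,
    PERF_COUNT_HW_CACHE_MISSES,
};

/* glibc provides no wrapper for perf_event_open, so invoke the syscall. */
static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = 1;       /* created stopped; enabled just before fn() */
    attr.exclude_kernel = 1; /* user space only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: the calling thread, on whatever CPU it runs. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Runs fn() once; stores cycles, cache references and cache misses in
   out[0..2]. Returns 0 on success, -1 if the counters are unavailable. */
int measure(void (*fn)(void), uint64_t out[NCOUNTERS])
{
    int fds[NCOUNTERS];
    int ok = 1;

    assert(fn != NULL);
    for (int i = 0; i < NCOUNTERS; i++) {
        fds[i] = open_counter(events[i]);
        if (fds[i] < 0)
            ok = 0;
    }
    if (ok) {
        for (int i = 0; i < NCOUNTERS; i++) {
            ioctl(fds[i], PERF_EVENT_IOC_RESET, 0);
            ioctl(fds[i], PERF_EVENT_IOC_ENABLE, 0);
        }
        fn();
        for (int i = 0; i < NCOUNTERS; i++) {
            ioctl(fds[i], PERF_EVENT_IOC_DISABLE, 0);
            /* Default read_format: a single u64 counter value. */
            if (read(fds[i], &out[i], sizeof out[i]) != (ssize_t)sizeof out[i])
                ok = 0;
        }
    }
    for (int i = 0; i < NCOUNTERS; i++)
        if (fds[i] >= 0)
            close(fds[i]);
    return ok ? 0 : -1;
}

/* Hypothetical workload standing in for the code under test. */
static volatile uint64_t sink;
void sample_workload(void)
{
    for (uint64_t i = 0; i < 1000000; i++)
        sink += i;
}
```

You would call `measure(sample_workload, counts)` once per build (baseline and modified) and compare the counts. A return value of -1 usually means the counters are not accessible, which is common inside VMs and containers or when `/proc/sys/kernel/perf_event_paranoid` is set restrictively.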
I have never used CodeAnalyst first-hand, but the slowdown figures
that you quote lead me to believe that it must use hardware
performance counters. Instrumentation-based profilers rarely, if ever,
show such low overhead. Also, instrumentation-based profilers
cannot profile kernel routines unless there is explicit support from
within the kernel (as in Sun Solaris 10 with DTrace).
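Whichever tool you pick, estimating the overhead of an LLVM modification comes down to comparing the counts from the baseline build and the modified build. Below is a minimal sketch, assuming you have already collected per-run cycle counts with one of the tools above. Taking the median of several runs to damp run-to-run noise is my own assumption rather than something the tools require, and `median_u64` / `overhead_percent` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values (avoids overflow of a - b). */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (integer-rounded for even n); sorts v in place. */
uint64_t median_u64(uint64_t *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_u64);
    if (n % 2)
        return v[n / 2];
    /* Midpoint written to avoid overflow; v is sorted, so no underflow. */
    return v[n / 2 - 1] + (v[n / 2] - v[n / 2 - 1]) / 2;
}

/* Relative overhead in percent: (modified - baseline) / baseline * 100. */
double overhead_percent(uint64_t *baseline, size_t nb,
                        uint64_t *modified, size_t nm)
{
    double b = (double)median_u64(baseline, nb);
    double m = (double)median_u64(modified, nm);
    assert(b > 0.0);
    return (m - b) / b * 100.0;
}
```

For example, baseline runs of 100, 102 and 98 cycles against modified runs of 110, 111 and 109 cycles give medians of 100 and 110, i.e. a 10% overhead. The same ratio idea is behind slowdown figures such as "20-50x": it is the instrumented run time divided by the native run time.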