Code generation for llvm-mca instrumentation

Hi all,

I am trying to instrument functions by adding inline asm markers for llvm-mca, similar to the llvm-mca instrumentation. For some reason, the instrumented code is duplicated in the generated assembly, which makes llvm-mca complain about an invalid region.

The code I am trying this on is from the LLVM test suite itself (specifically, fftmisc.c from telecom-fft in MiBench):

unsigned ReverseBits ( unsigned index, unsigned NumBits )
{
    __asm volatile("# LLVM-MCA-BEGIN test");
    unsigned i, rev;

    for ( i=rev=0; i < NumBits; i++ )
    {
        rev = (rev << 1) | (index & 1);
        index >>= 1;
    }

    __asm volatile("# LLVM-MCA-END test");

    return rev;
}

That is how I am instrumenting the function.

Flags used: -O2 / -O3 -g

Surprisingly, this doesn't happen with -O1 or without optimization.

Any help with this would be much appreciated. Thanks!

This seems to be a short snippet. Could you provide the assembly code that shows the problem?

Hi, sure. I am pasting a link to the file on Google Drive.

It seems I can't upload files directly (the forum complains that I am a new user!).

Please see if you can access the link above.

If you search for the LLVM-MCA-END marker, you'll see two of them in the function ReverseBits, i.e., the marker is being duplicated for some reason.

I am unable to access your file. Can you please look into it?

Can you check now?

I haven't looked into it deeply, but it seems like some optimization is duplicating basic blocks. Have you checked the optimized LLVM IR for anything unusual (for example, whether the last inline asm instruction is already duplicated in the IR)?

Yes! In the IR it is fine and not duplicated, but during code generation it gets duplicated for some reason.

After some quick debugging I found that this is caused by the TailDuplicator.
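Tail duplication copies a small block that has several predecessors into each of those predecessors, so every path gets its own copy of the block's instructions instead of branching to a shared one. The function below is a hypothetical, hand-written source-level analogy of that effect on ReverseBits, not actual compiler output. It assumes (a guess, not taken from the posted assembly) that the optimized code guards the loop with a NumBits == 0 check, so the zero-trip path and the loop exit both reach the block holding LLVM-MCA-END. Once that block is duplicated, the assembly has one BEGIN followed by two END markers, which would be consistent with llvm-mca rejecting the second END as an invalid region.

```c
/* Hypothetical analogy of tail duplication written by hand in C (not
 * compiler output). The shared exit block, i.e. the END marker plus the
 * return, has been copied into both of its predecessors. The markers are
 * assembly comments on x86-64, so they do not affect the result. */
unsigned ReverseBits_taildup(unsigned index, unsigned NumBits)
{
    unsigned i = 0, rev = 0;

    __asm volatile("# LLVM-MCA-BEGIN test");

    if (NumBits == 0) {
        /* Copy 1 of the duplicated tail: the zero-trip path. */
        __asm volatile("# LLVM-MCA-END test");
        return rev;
    }

    /* Rotated loop: the body runs at least once once NumBits > 0. */
    do {
        rev = (rev << 1) | (index & 1);
        index >>= 1;
    } while (++i < NumBits);

    /* Copy 2 of the duplicated tail: the loop-exit path. */
    __asm volatile("# LLVM-MCA-END test");
    return rev;
}
```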

I also found that adding a "memory" clobber to the END marker itself (__asm volatile("# LLVM-MCA-END test":::"memory");) doesn't help. You need to add a separate memory barrier below LLVM-MCA-END to prevent this optimization:

__asm volatile("# LLVM-MCA-END test");
__asm volatile("":::"memory");
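Applied to the function from the original question, the workaround might look like the sketch below. It assumes an x86-64 target, where # starts an assembly comment, so the markers do not change what the function computes. Whether the extra barrier is enough may depend on the compiler version and optimization level; the thread only reports it for this case.

```c
/* ReverseBits with the workaround from this thread: an extra empty asm
 * statement with a "memory" clobber placed right after LLVM-MCA-END.
 * The markers are assembly comments (x86-64 assumed). */
unsigned ReverseBits(unsigned index, unsigned NumBits)
{
    unsigned i, rev;

    __asm volatile("# LLVM-MCA-BEGIN test");

    /* Reverse the low NumBits bits of index. */
    for (i = rev = 0; i < NumBits; i++) {
        rev = (rev << 1) | (index & 1);
        index >>= 1;
    }

    __asm volatile("# LLVM-MCA-END test");
    /* Separate compiler barrier, reported above to stop the END
     * marker from being duplicated. */
    __asm volatile("" ::: "memory");

    return rev;
}
```

To check the result, one could compile with clang -O2 -S (or -O3), search the generated .s file for LLVM-MCA-END, and then pass that file to llvm-mca.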

Oh! Thanks for this, I will try it out. Just out of curiosity, does the memory barrier here prevent reordering during codegen?

We should update the documentation, and clarify that “memory” should be added to the clobber list for all mca markers.
That way, markers would always be treated like compiler barriers, and our machine schedulers won’t try to move instructions across those boundaries.

Note, however, that there may still be cases where passes hoist computation out of an mca code region. Unfortunately, markers are not perfect, and they often introduce side effects (as documented).
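The suggestion to add "memory" to the clobber list of every marker could be packaged as a pair of helper macros. MCA_BEGIN and MCA_END are hypothetical names, not part of LLVM or llvm-mca. MCA_END also emits the separate empty barrier reported earlier in the thread, since the clobber on the END marker alone did not prevent the duplication in this case. As noted above, such barriers are still not guaranteed to keep all computation inside the region.

```c
/* Hypothetical helper macros (not an LLVM API). Each marker carries a
 * "memory" clobber so the compiler treats it as a barrier. The name
 * argument must be a string literal; it is joined to the marker text by
 * adjacent string-literal concatenation. x86-64 comment syntax assumed. */
#define MCA_BEGIN(name) __asm volatile("# LLVM-MCA-BEGIN " name ::: "memory")

/* END marker plus the extra empty barrier suggested in this thread. */
#define MCA_END(name)                                          \
    do {                                                       \
        __asm volatile("# LLVM-MCA-END " name ::: "memory");   \
        __asm volatile("" ::: "memory");                       \
    } while (0)

unsigned ReverseBits(unsigned index, unsigned NumBits)
{
    unsigned i, rev;

    MCA_BEGIN("test");
    for (i = rev = 0; i < NumBits; i++) {
        rev = (rev << 1) | (index & 1);
        index >>= 1;
    }
    MCA_END("test");

    return rev;
}
```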
