llvm-mca markers prevent loop vectorization

The llvm-mca docs suggest using inline assembly to mark the region that llvm-mca should analyze, for example:

__asm volatile("# LLVM-MCA-BEGIN");
// …
__asm volatile("# LLVM-MCA-END");

However, these markers seem to interfere with auto-vectorization. With -Rpass-analysis=loop-vectorize, the compiler reports:

:8:3: remark: loop not vectorized: call instruction cannot be vectorized [-Rpass-analysis=loop-vectorize]
                __asm volatile("# LLVM-MCA-BEGIN sum_marked");
                ^
:6:2: remark: loop not vectorized: read with atomic ordering or volatile read [-Rpass-analysis=loop-vectorize]
        for (size_t index = 0; index < count; index++)
        ^
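
For context, here is a minimal sketch of the kind of loop that produces the remarks above. Only the marker name `sum_marked` and the `for` header come from the diagnostics; the function signature, the `uint32_t` element type, and the loop body are assumptions made for illustration. It also assumes an x86 target, where `#` starts an assembler comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reproduction: both markers sit inside the loop body,
 * so each iteration contains two volatile inline-asm statements. */
uint32_t sum_marked(const uint32_t *values, size_t count)
{
    uint32_t sum = 0;
    for (size_t index = 0; index < count; index++)
    {
        __asm volatile("# LLVM-MCA-BEGIN sum_marked");
        sum += values[index];
        __asm volatile("# LLVM-MCA-END");
    }
    return sum;
}
```

The markers are emitted as assembler comments and do not change what the function computes; they only affect what the optimizer is willing to do with the loop.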

Compiler Explorer link.

Any ideas for a workaround, other than compiling unmarked source and then manually inserting markers into the emitted assembly?

This is a known (but poorly documented) issue. Could you please file a bug report for it?
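
In the meantime, one thing that may be worth trying is moving the markers outside the loop, so that the loop body itself contains no inline assembly. The sketch below is only an illustration under assumptions: the function name `sum_hoisted`, its signature, and the element type are hypothetical, and I have not confirmed what llvm-mca reports for the resulting region.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant: the markers bracket the whole loop instead of
 * sitting inside it, so the loop body has no inline-asm statement. */
uint32_t sum_hoisted(const uint32_t *values, size_t count)
{
    uint32_t sum = 0;
    __asm volatile("# LLVM-MCA-BEGIN sum_hoisted");
    for (size_t index = 0; index < count; index++)
    {
        sum += values[index];
    }
    __asm volatile("# LLVM-MCA-END");
    return sum;
}
```

The trade-off is that the marked region then covers everything the compiler emits between the two markers, which can include setup code and a scalar remainder loop as well as the vectorized body. The compiler may also move instructions across the markers, so it is worth checking the emitted assembly before trusting the llvm-mca output.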