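However, these directives prevent the loop from being auto-vectorized. Clang's loop-vectorizer remarks point at two things. The first is the inline asm statement, which LLVM IR represents as a call instruction. The second is the loop header: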
:8:3: remark: loop not vectorized: call instruction cannot be vectorized [-Rpass-analysis=loop-vectorize]
  __asm volatile("# LLVM-MCA-BEGIN sum_marked");
  ^
:6:2: remark: loop not vectorized: read with atomic ordering or volatile read [-Rpass-analysis=loop-vectorize]
 for (size_t index = 0; index < count; index++)
 ^
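The sketch below reproduces the effect by placing a marked loop next to an unmarked one. Several details are assumptions rather than the original code: the int element type, the function signatures, the hypothetical comparison function sum_plain, and the named LLVM-MCA-END directive. Only the region label sum_marked and the loop header are taken from the remarks above.

One way to compare the two loops is to compile with clang -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize. You would typically see a "vectorized loop" remark for sum_plain and "loop not vectorized" remarks for sum_marked. The exact wording, line numbers, and columns depend on the Clang version and target.

```c
#include <stddef.h>

/* Hypothetical baseline: a plain integer sum reduction with no llvm-mca
   markers, which the loop vectorizer is normally free to vectorize. */
int sum_plain(const int *values, size_t count) {
    int total = 0;
    for (size_t index = 0; index < count; index++) {
        total += values[index];
    }
    return total;
}

/* Same loop with llvm-mca region markers inside the body (assumed shape).
   Each volatile asm statement becomes a call in LLVM IR, so the vectorizer
   gives up on this loop. The "#" text is only an assembler comment, so the
   computed result is unchanged. */
int sum_marked(const int *values, size_t count) {
    int total = 0;
    for (size_t index = 0; index < count; index++) {
        __asm volatile("# LLVM-MCA-BEGIN sum_marked");
        total += values[index];
        __asm volatile("# LLVM-MCA-END sum_marked");
    }
    return total;
}
```

Both functions return the same sum. The markers change only the generated code and the region that llvm-mca later analyzes, not the program's behavior.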