Polly's auto-parallelizer flag

Polly has a flag that detects parallelism and generates OpenMP code. I’ve been using that flag together with Polly, but I see no performance improvement for a general matrix multiplication program (with any data type). I’m experimenting on a local ARM cluster with LLVM version 15.0.7.
Flags used: clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp matrix.c

Use the debug options or flags to verify that Polly recognizes your code at all and can translate it into the polyhedral model; see "How to manually use the Individual pieces of Polly" in the Polly 17.0.0git documentation.

(Tag @Meinersbur)

Thank you.
However, the end of the page linked above also mentions that adding vectorization and using OpenMP degrades performance.

Polly works on toy examples. If you make the input big enough, you get a speedup from parallelization.
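To try this, here is a minimal sketch of a matrix multiplication kernel with a tunable problem size. The names (N, init_matrices, matmul, time_matmul), the fill values and the timing approach are my own assumptions and are not taken from the original matrix.c. The loop bounds are compile-time constants and the arrays are static, which is the kind of simple loop nest Polly is meant to model, though whether it is actually detected and parallelized on your setup still needs to be verified.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <time.h>

/* Problem size (assumption): raise it, e.g. -DN=2048, so that the
   work per run is large enough to amortize thread start-up costs. */
#ifndef N
#define N 512
#endif

static double A[N][N], B[N][N], C[N][N];

/* Fill the inputs with constants so the result is easy to check. */
void init_matrices(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
            C[i][j] = 0.0;
        }
}

/* Plain triple loop with constant bounds and affine array accesses. */
void matmul(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Returns the wall-clock time of one matmul() call in seconds. */
double time_matmul(void) {
    struct timespec t0, t1;
    init_matrices();
    clock_gettime(CLOCK_MONOTONIC, &t0);
    matmul();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    /* Sanity check: every entry is a sum of N products 1.0 * 2.0. */
    assert(C[0][0] == 2.0 * N);
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Call time_matmul() from main, print the result, and build the file once with plain clang -O3 and once with the flags from the first post (clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp), increasing N between runs. Measure wall-clock time rather than CPU time, since CPU time adds up across all threads and can hide a parallel speedup.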