Using the clang driver to generate executables for different subtargets

Hi All,

While using the clang driver to create a binary for an AArch64 subtarget,
do we need to pass any flag other than '--target=aarch64-linux-gnu -mcpu=<> -mtune=<>' to use that subtarget's scheduling model?
I see the same executable (in size and contents) generated for AArch64 subtargets such as cortex-a57, exynos-m1 and kryo, even after changing the CPU name.
Should I include some other flag?

clang++ --target=aarch64-linux-gnu -mcpu=exynos-m1 -mtune=exynos-m1 way_mkbound.cpp -o my.out

With Regards,
Pankaj

> While using the clang driver to create a binary for an AArch64 subtarget,
> do we need to pass any flag other than '--target=aarch64-linux-gnu
> -mcpu=<> -mtune=<>' to use that subtarget's scheduling model?

Nope, that should be enough.

> I see the same executable (in size and contents) generated for AArch64
> subtargets such as cortex-a57, exynos-m1 and kryo, even after changing the
> CPU name. Should I include some other flag?

M1 and Kryo do have different scheduling parameters, but not
radically so. It's possible that your code is not hitting any of those
differences.

If you could provide an example of the code and what you expected to
see in the assembly output (perhaps compared with GCC's output), we'd
know what is not being done.
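
For that kind of comparison, a reduced kernel is usually easier to inspect than a whole program. A minimal sketch, assuming a hypothetical file unroll_probe.cpp with a hypothetical function weighted_sum (neither comes from the original program); whether it shows any per-CPU difference at all is not guaranteed:

```cpp
// unroll_probe.cpp (hypothetical name): a small kernel for comparing the
// assembly produced for different -mcpu values.
#include <cstddef>
#include <cstdint>

// The trip count n is only known at run time, so any unrolling the compiler
// applies here has to be partial/runtime unrolling rather than full unrolling.
std::uint64_t weighted_sum(const std::uint32_t *data, std::size_t n,
                           std::uint32_t weight) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // A few dependent integer operations per iteration.
        acc += static_cast<std::uint64_t>(data[i]) * weight + (data[i] >> 3);
    }
    return acc;
}
```

Something along these lines should then work, using the command from the original post with -O2 and -S added (the output file names are placeholders):

clang++ --target=aarch64-linux-gnu -mcpu=exynos-m1 -O2 -S unroll_probe.cpp -o m1.s
clang++ --target=aarch64-linux-gnu -mcpu=kryo -O2 -S unroll_probe.cpp -o kryo.s
diff m1.s kryo.s

Without an -O option clang defaults to -O0, where the loop unroller does not run, so it is worth making sure an optimization level is set before comparing.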

cheers,
-renato

Thanks for clarifying.
The example I was using is the SPECint 2006 astar benchmark with the rivers.cfg and rivers.bin input files.

Since LoopMicroOpBufferSize is 24 for M1 and 16 for Kryo, I was expecting an effect on the loop unroll pass, because this value is taken into account as the partial unrolling factor when unrolling.

Thanks,
Pankaj

This seems like too fine-grained a property to matter to the unroller.
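
To illustrate why, here is a deliberately simplified model of the effect. It is not LLVM's actual heuristic: it assumes the partial unroll count is simply the largest count whose unrolled body still fits in the loop micro-op buffer, and the function name modelUnrollCount is hypothetical.

```cpp
// Simplified model, NOT LLVM's real unrolling heuristic.
// Assumption: the unroll count is the largest factor for which
// (count * loopBodyOps) still fits in the loop micro-op buffer,
// and a result of 1 means "leave the loop as it is".
unsigned modelUnrollCount(unsigned loopBufferSize, unsigned loopBodyOps) {
    // An empty body, or a body that is already larger than the buffer,
    // is not unrolled in this model.
    if (loopBodyOps == 0 || loopBodyOps > loopBufferSize)
        return 1;
    return loopBufferSize / loopBodyOps;
}
```

Under this model, buffer sizes of 24 and 16 give different counts only for small bodies (for example, 3 versus 2 for an 8-op body). For any body of 13 ops or more, both sizes give a count of 1, so the generated code would be the same.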

Having said that, it seems possible to teach the unroller to be a bit smarter.

cheers,
--renato