[AArch64] Does the pipeline latency of cnth increase when the instruction has a mul operand?

As we know, a mul operation usually has a high cost because its pipeline is longer.
Here is some assembly for the AArch64 target in which cntd has a mul operand. Does the cost of cntd x8, all, mul #5 increase compared to cnth x8? (see Compiler Explorer)
cntd x8, all, mul #5
add w0, w0, w8
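
For reference, here is a minimal C sketch of what the two instructions above compute. It says nothing about latency; it only models the result. The function names model_cntd_all and model_snippet and the vl_bits parameter are hypothetical, and the sketch assumes the "all" pattern, a vector length that is a multiple of 128 bits between 128 and 2048, and a mul immediate in the range 1 to 16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `cntd xN, all, mul #imm`.
   vl_bits is the SVE vector length in bits (assumed 128..2048, multiple of 128).
   The result is the number of 64-bit elements per vector, times the immediate. */
static uint64_t model_cntd_all(unsigned vl_bits, unsigned mul) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    assert(mul >= 1 && mul <= 16);
    return (uint64_t)(vl_bits / 64) * mul;
}

/* Hypothetical model of the two-instruction snippet:
   cntd x8, all, mul #5
   add  w0, w0, w8      (32-bit add, so only the low 32 bits of x8 are used) */
static uint32_t model_snippet(uint32_t w0, unsigned vl_bits) {
    uint64_t x8 = model_cntd_all(vl_bits, 5);
    return w0 + (uint32_t)x8;
}
```

With a 128-bit vector length, model_cntd_all(128, 5) gives 2 * 5 = 10, so the add contributes 10 to w0.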

Yes. According to D132322, the cost of “cntd x8, all, mul #5” is expected to be higher than that of cntd x8.