KNL Assembly Code for Matrix Multiplication

Thank You,

So vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] means that zmm22 will contain 64-bit constants, which are the indices themselves (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. And zmm2 contains the constant 4000. So,

vpmuludq zmm14, zmm10, zmm2 ; multiplies the index values by 4000, since the stride for array b is 4000.

zmm14 = 32000, 36000, 40000, …, 60000 (taking zmm10 = zmm22 = 8, …, 15).
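To double-check the arithmetic, here is a minimal Python sketch that emulates vpmuludq one 64-bit lane at a time. The helper name `vpmuludq` and the list-of-lanes representation (lane 0 first) are only illustrative assumptions, not anything taken from the compiler output. The instruction multiplies the low 32 bits of each 64-bit lane as unsigned values and keeps the full 64-bit product.

```python
MASK32 = 0xFFFFFFFF  # selects the low 32 bits of a 64-bit lane


def vpmuludq(a, b):
    """Emulate vpmuludq: per 64-bit lane, multiply the low 32 bits
    of each operand as unsigned integers, giving a 64-bit product."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


# Assumed register contents, following the thread: zmm10 holds the
# indices 8..15 (same as zmm22) and zmm2 holds 4000 in every lane.
zmm10 = [8, 9, 10, 11, 12, 13, 14, 15]
zmm2 = [4000] * 8

zmm14 = vpmuludq(zmm10, zmm2)
print(zmm14)
# [32000, 36000, 40000, 44000, 48000, 52000, 56000, 60000]
```

Note that only the low 32 bits of each lane take part in the multiply, so any bits in the upper half of a lane would be ignored.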

Now, as you said,

vpsrlq zmm15, zmm10, 32 ; shifts each 64-bit element of zmm10 (= zmm22) right by 32 bits, so

zmm15 = ? (Can you compute the value of zmm15 here?)

I think zmm15 is all 0s. It's doing zmm15[31:0] = zmm10[63:32]; zmm15[63:32] = 0; zmm15[95:64] = zmm10[127:96]; zmm15[127:96] = 0; etc. But zmm10[63:32], zmm10[127:96], etc. are 0 because the indices are very small compared to the 64 bits they are stored in.
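The same reasoning can be checked with a small Python sketch that emulates vpsrlq lane by lane. The helper name `vpsrlq` and the list-of-lanes representation are hypothetical, chosen just for this illustration. Each 64-bit lane is shifted right logically, so zeros come in from the top and the old upper 32 bits land in the lower 32 bits.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # keeps a value within one 64-bit lane


def vpsrlq(a, count):
    """Emulate vpsrlq: logical right shift of each 64-bit lane.
    A shift count above 63 clears every lane."""
    if count > 63:
        return [0] * len(a)
    return [(x & MASK64) >> count for x in a]


# Assumed register contents, following the thread: indices 8..15.
zmm10 = [8, 9, 10, 11, 12, 13, 14, 15]

zmm15 = vpsrlq(zmm10, 32)
print(zmm15)
# [0, 0, 0, 0, 0, 0, 0, 0]

# A lane with a nonzero upper half, for contrast: the upper 32 bits
# (here 5) move down into the lower 32 bits.
print(vpsrlq([0x0000000500000008], 32))
# [5]
```

So with small indices the shifted register is all zeros, and it would only be nonzero if some lane held a value of 2^32 or more.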

Thanks a lot. Things are much clearer now.