KNL Assembly Code for Matrix Multiplication

I would also like to walk through it with actual values, since I find it very confusing…

vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] ; I assume this moves 64-bit values from the given indexes, though I still believe each value needs to be 32 bits. The indexes are [8, 9, 10, 11, 12, 13, 14, 15]. When these indexes are added to rip, they point to the values actually stored at those locations, so zmm22 will contain values, not indexes. Suppose [8]={1}, [9]={5}, [10]={4}… Then zmm22 becomes zmm22={1, 5, 4, 3, 8, 7, 6, 2}… These are the 64-bit values loaded from those memory indexes.

vpbroadcastq zmm2, qword ptr [rip + .LCPI0_2] ; here .LCPI0_2=4000 means: broadcast the value stored at this index. For example, if this location contains 2, then zmm2={2,2,2,2…2}.

vpmuludq zmm14, zmm10, zmm2 ; this step multiplies values, not indexes. There seems to be no point in multiplying these values here, since we haven't used A and B yet?

Please check my understanding of these initial steps; only once they are clear will I be able to move forward…

Thank You

The CP in LCP stands for Constant Pool. If you see a comment after an instruction that has LCP in its address, the comment shows the static value we are loading from the constant pool. So after the instruction below, bits 63:0 of zmm22 will contain the value 8, bits 127:64 will contain the value 9, bits 191:128 will contain 10, and so on.

vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]
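
To make the lane layout concrete, here is a small Python sketch that models what that load does. It is only a model, not anything the compiler emits. The names constant_pool, load_zmm_qwords and lane_bits are made up for illustration, and I am assuming the 64 bytes at .LCPI0_0 hold the eight little-endian 64-bit values shown in the comment.

```python
import struct

# Assumption: the 64 bytes stored at .LCPI0_0 are the eight
# little-endian 64-bit integers 8..15 (taken from the asm comment).
constant_pool = struct.pack("<8Q", 8, 9, 10, 11, 12, 13, 14, 15)


def load_zmm_qwords(memory):
    """Model a 64-byte vmovdqa64 load: split the bytes into eight
    64-bit lanes, lowest lane first."""
    if len(memory) != 64:
        raise ValueError("a zmm register holds exactly 64 bytes")
    return list(struct.unpack("<8Q", memory))


def lane_bits(lane):
    """Return the (high, low) bit positions covered by a 64-bit lane."""
    return (64 * lane + 63, 64 * lane)


zmm22 = load_zmm_qwords(constant_pool)

# The same register viewed as one 512-bit integer.
zmm22_as_int = int.from_bytes(constant_pool, "little")

for lane, value in enumerate(zmm22):
    high, low = lane_bits(lane)
    print(f"bits {high}:{low} = {value}")
```

In this model the numbers 8..15 are the data itself, stored at the address rip + .LCPI0_0. They are not indexes that get looked up somewhere else.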