Hi Tobias,

I think we could split the patch that implements tiling, interchange,
and unrolling of specific loops into three separate patches:

1. The first one adds a class that describes a processor model. It
also adds a new command-line option that carries all the necessary
parameters of the target architecture, which are used to construct
objects of the class.

2. The second one adds methods to the class that compute parameters for
instantiations of the matrix-matrix multiplication. It also implements
tiling, interchange, and unrolling of specific loops.

3. The third one replaces the manual passing of target-architecture
parameters with information obtained from LLVM.

What do you think about it?

P.S.: I’m not sure whether all the necessary parameters of a target
architecture are accessible from LLVM, or what the best way to get them
is in our case. Should we ask these questions on the mailing list now?

If I’m not mistaken, we’re interested in the following parameters:

1. Size of a double-precision floating-point number.

2. Number of double-precision floating-point numbers that can be held
in a vector register.

3. Throughput of vector instructions per clock cycle.

4. Latency of instructions (i.e., the minimum number of cycles between
the issuance of two dependent consecutive instructions).

5. Parameters of the cache levels (cache line size, associativity,
and total size of each level).
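
To make the list concrete, here is a rough sketch of how these parameters might be grouped in the processor-model class from the first patch. All names (ProcessorModel, CacheLevel, makeExampleModel) and the example values are placeholders made up for illustration; none of this is existing Polly or LLVM code, and the real class may well look different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one cache level (item 5).
struct CacheLevel {
  std::size_t SizeBytes;     // total capacity of the level
  std::size_t LineSizeBytes; // size of one cache line
  unsigned Associativity;    // number of ways
};

// Hypothetical processor model; field names are illustrative only.
struct ProcessorModel {
  unsigned DoubleSizeBytes = 0;    // item 1
  unsigned VectorRegisterBits = 0; // used to derive item 2
  unsigned VectorOpsPerCycle = 0;  // item 3
  unsigned InstructionLatency = 0; // item 4, in cycles
  std::vector<CacheLevel> Caches;  // item 5, ordered L1, L2, ...

  // Item 2: how many doubles fit into one vector register.
  unsigned doublesPerVectorRegister() const {
    const unsigned DoubleBits = 8 * DoubleSizeBytes;
    assert(DoubleBits != 0 && VectorRegisterBits % DoubleBits == 0);
    return VectorRegisterBits / DoubleBits;
  }
};

// Builds a model with assumed example values (not measured on real hardware).
inline ProcessorModel makeExampleModel() {
  ProcessorModel M;
  M.DoubleSizeBytes = 8;
  M.VectorRegisterBits = 256;
  M.VectorOpsPerCycle = 2;
  M.InstructionLatency = 8;
  M.Caches.push_back(CacheLevel{32 * 1024, 64, 8});  // assumed L1
  M.Caches.push_back(CacheLevel{256 * 1024, 64, 8}); // assumed L2
  return M;
}
```

Storing the register width instead of the element count keeps item 2 derivable for other element types, but that is only one possible design.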