# Linalg.generic iterate over the rows

Hi Everyone,
Is it possible to use `linalg.generic` to iterate over the rows of a tensor, and not only over its elements? Something like this:

```mlir
#indexing_accesses = [
  affine_map<(m) -> (m, ?)>,
  affine_map<(m) -> (m)>
]
#my_trait = {
  indexing_maps = #indexing_accesses,
  iterator_types = ["reduction"]
}
...
%res = linalg.generic #my_trait
    ins(%A : tensor<?x?xf64>)
    outs(%Y : tensor<?xf64>) {
  ^bb0(%a : tensor<?xf64>, %y : f64):
    linalg.dot ins(%a, %Y : tensor<1x?xf64>, tensor<1x?xf64>) outs(%y1 : f64)
    linalg.yield %y1 : f64
}
```

What I am trying to do here is different from a simple matrix-vector multiplication. In fact, we start by multiplying the first row with the vector %Y and we directly update %Y[0]; then we use the newly updated %Y vector for the multiplication with the second row, and so on. In lower-level code, that should generate something like this:

```
for i = 0 to N {
  yi = 0
  for j = 0 to N {
    yi += A[i,j] * Y[j]
  }
  Y[i] = yi
}
```
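For reference, the pseudocode above can be restated as an executable NumPy sketch (the function name is mine). The key point is that Y is updated in place, so later rows read the values written by earlier rows:

```python
import numpy as np

def seq_matvec_update(A, Y):
    """In-place sequential update: row i's dot product sees the
    already-updated entries Y[0..i-1] (a Gauss-Seidel-like ordering)."""
    N = A.shape[0]
    for i in range(N):
        # Reads Y as it exists *now*, including updates from rows < i;
        # Y[i] itself still holds its old value at this point.
        Y[i] = A[i, :] @ Y
    return Y
```

This makes the loop-carried dependence explicit: iteration i of the outer loop reads values written by iterations j < i, which is why the outer loop is sequential in the dense case.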

Another question related to that: we can see that the outer loop cannot be parallelized, especially in the case of a fully dense matrix. But if I consider the case of a very sparse matrix, and probably use a sparse tensor, we can do some analysis on the sparsity pattern to detect the groups of indices i that can be parallelized. Do you think that `linalg.generic` can allow that, or that it can be extended to?

Hi,

`linalg.generic` in its current form does not support accessing rows; the body always works on elements. Another limitation is that we only support parallel and reduction dimensions. I think your code example would need a sequential and a reduction dimension?

Leveraging sparsity would probably require some inspector-executor model? I think that is an interesting research direction but quite a change from what we currently have.
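To make the inspector-executor idea concrete, here is a hedged Python sketch; all names are mine and this is not existing MLIR functionality. The inspector walks the sparsity pattern and assigns each row a "level" such that a row only reads Y entries written by rows in strictly earlier levels; the executor then runs the levels sequentially, while the rows inside one level are independent and could be dispatched in parallel:

```python
import numpy as np

def inspect_levels(A):
    """Inspector: row i depends on row j < i only when A[i, j] != 0,
    because only then does row i read the value that row j wrote.
    Assign levels so dependent rows land in strictly later levels."""
    N = A.shape[0]
    level = [0] * N
    for i in range(N):
        deps = [level[j] + 1 for j in range(i) if A[i, j] != 0]
        level[i] = max(deps, default=0)
    nlevels = max(level, default=-1) + 1
    return [[i for i in range(N) if level[i] == lvl] for lvl in range(nlevels)]

def execute_by_levels(A, Y, levels):
    """Executor: levels run sequentially; rows within one level are
    independent of each other's writes."""
    for rows in levels:
        Y_in = Y.copy()          # every row in this level reads the same Y
        for i in rows:           # this loop is the parallelizable part
            Y[i] = A[i, :] @ Y_in
    return Y
```

For a sufficiently sparse matrix the number of levels is much smaller than the number of rows, which is where the parallelism comes from; for a fully dense matrix every row lands in its own level and the schedule degenerates to the sequential loop.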

`linalg.generic` can only do the dense part of the computation. A generic always iterates over all elements of its input/output tensors, and the loops are either parallel or reduction loops. That means in your case you probably need an `scf.for` to iterate over all groups, and then you may be able to use a `linalg.generic` for the inner reduction. If you want to abstract the algorithm that searches for the groups somehow, then you need custom abstractions / IR.