Exhaustive State Space Search for Efficient Code Generation

A long time ago, in a galaxy far away, I scribbled down some notes on how to use Prolog to exhaustively search for the best assembly instruction sequences that perform particular data manipulations, in particular for SIMD. And although I actually used such an approach to verify whether code examples shown in The Software Vectorization Handbook were truly optimal, I always thought the ideas were too thin for actual publication.

However, now that ML-to-optimize-ML is becoming popular, I was hoping that perhaps a few people would be interested in reading about ideas from a simpler time, when AI still meant Prolog and expert systems and such. Therefore, I made the notes available as arXiv white paper.

4 Likes

Very cool Aart! Prolog is perfect for this kind of system :slight_smile:

We have a similar, but much less sophisticated generator in LLVM: llvm-project/PerfectShuffle.cpp at main · llvm/llvm-project · GitHub

1 Like

Very nice! Thanks for the link, Sean, that is super interesting!

Aart you may be interested in slides 25+ in this slide deck. It talks about open problems in vector isel, perfect shuffle, etc. It is right up your alley :wink:

-Chris

1 Like

Thank you, Chris, for the slides on the perfect shuffle!

I love using “AI” for problems like this. There is something very satisfying about exhaustively (or smarter) searching for answers.

I remember how thrilled I was first time I typed (basically querying how to sum “horizontally”):

s( [ [a,b,c,d], xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7], I
   [ [_,_,_,(d+b)+(c+a)] | _ ] ).

in my “expert system”, and got this back:

I = [
i(movdqa,xmm1,xmm0),
i(psrldq,xmm0,8),
i(paddd,xmm1,xmm0),
i(movdqa,xmm0,xmm1),
i(psrldq,xmm1,4),
i(paddd,xmm0,xmm1)
]
1 Like