PSLP: Padded SLP Automatic Vectorization

Hey, I noticed this talk from the EuroLLVM 2015 (https://llvm.org/devmtg/2015-04/slides/pslp_slides_EUROLLVM2015.pdf) on the PSLP vectorization algorithm (CGO 2015 paper: http://vporpo.me/papers/pslp_cgo2015.pdf).

Is anyone working on implementing it?

If so, are there Phab reviews I can subscribe to?

Best,
Matt

The CGO paper was based on a very old LLVM and the last I heard, moving the transform to a newer LLVM and rerunning the benchmarks made the speedups go away. It's not clear what the cause of this was and the team responsible didn't have the time to do any root cause analysis.

David

Thank you for the reply; interesting!

Incidentally, would you happen to know whether VW-SLP fares any better?

I'm referring to "VW-SLP: Auto-Vectorization with Adaptive Vector Width" from PACT 2018 (http://vporpo.me/papers/vwslp_pact2018.pdf; also presented as "Extending the SLP vectorizer to support variable vector widths" at the 2018 LLVM Developers’ Meeting, http://llvm.org/devmtg/2018-10/).

I'm wondering whether it subsumes PSLP or whether there are areas where PSLP still works (or worked) better, given the (brief) comparison in the paper (from which it seems like it addresses the problem possibly in a more general manner):

"The widely used bottom-up SLP algorithm has been improved in several ways. Porpodas et al. [32] propose PSLP, a technique that pads the scalar code with redundant instructions, to convert non-isomorphic instruction sequences into isomorphic ones, thus extending the applicability of SLP. Just like VW-SLP-S, PSLP can vectorize code when some of the lanes differ, but it is most effective when the non-isomorphic parts are only a short section of the instruction chain. VW-SLP, on the other hand, works even if the chain never converges."

Best,
Matt

Hi,

Some of the PSLP functionality was added to the SLP vectorizer with the
support for vectorizing alternate sequences several years ago. So some
of the most common patterns, like bundles of alternating add-subs, are
already being taken care of by the current SLP pass. However, it won't
do the more sophisticated padding across graphs that PSLP does.

There is indeed some overlap between VW-SLP and PSLP since they are both
trying to form isomorphic graphs out of non-isomorphic code. But each
one works better in different cases, so they complement each other.

For example, let's say you are in the middle of vectorizing 4-wide and
you encounter opcodes like this: [A, A, B, B].
Then you could either vectorize this with:
1. Padding the lanes and emitting shuffles (PSLP-style), or
2. Falling back to 2-way vectorization like in VW-SLP.
Determining which one is better depends on the rest of the code.
If the rest of the code is 4-wide isomorphic, then the PSLP option is
better, if it is 2-wide isomorphic then the VW-SLP option is better.

As it is mentioned in the paper, the current SLP pass can change the
vector-width but it cannot do so in all cases. The main focus of the
VW-SLP paper is to highlight some of these cases and show a design for
the SLP pass where width switching is done in a more complete way.

Performance fluctuates a lot as the compiler changes, so it is hard to
tell what to expect from applying either of these techniques to the
current snapshot.

Regards,
Vasileios

"Matt P. Dziubinski" <matdzb@gmail.com> writes:

Hi,

Thank you for your reply, this is very informative.
Good to know about the VW-SLP & PSLP trade-offs, I appreciate your explanation!

Best,
Matt