Heads up! Planning to remove old vector shuffle lowering this week...

I’ll be skimming the PRs to see if there are any really critical regressions, but so far it looks pretty good.

If you are actively disabling the new vector shuffling and have some PR that blocks you, please reply here. Later this week, the flag will go away unless I hear strenuous objections. There is a really staggering amount of cleanup and tidying that needs to take place and can’t until we remove the old code paths.

-Chandler

It doesn’t look like there have been any changes for this yet. Is the plan for the old shuffle code to be removed before the branch for 3.6? If so, do you have this in hand, or do you want assistance to get it done in time?

Simon.

I just got distracted by other things. I should be able to take care of it
now that folks are back from the holidays.

One question -- do you see any regressions that need fixing first? I don't
see any, but I'm curious about others. The silence on this thread didn't
inspire confidence, but perhaps it's just that nothing is broken with the
new stuff?

No notable regressions, I’m seeing different code but mostly for the better - although there are a number of vec256 shuffles (mostly lower/upper crossings) that are rather poor (I think Quentin raised bugs on a couple of these) - but the old system could be a lot worse. I think the few cases that remain can easily be dealt with by individual bug reports.

The amount of domain crossing is much lower now - but there are a number of float shuffles that now use double shuffles instead - fine from a domain point of view but rather unexpected. IIRC this often appeared in matrix transpose code - movlhps / movhlps being replaced by unpcklpd / unpckhpd is the one I seem to remember.
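To make that substitution concrete, here is a minimal sketch that models a 128-bit register as four 32-bit lanes and checks that the float and double forms move the same bits. It is only an illustration of the instruction semantics, not LLVM code: the `Vec4` type and all function names are hypothetical helpers written for this example. Note that movhlps matches unpckhpd only with the operands swapped.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: one 128-bit XMM register as four 32-bit lanes,
// lane 0 being the lowest. A 64-bit "double" element is a pair of lanes.
using Vec4 = std::array<std::uint32_t, 4>;

// movlhps dst, src: low 64 bits of src are copied to the high 64 bits of dst.
Vec4 movlhps(Vec4 dst, Vec4 src) { return {dst[0], dst[1], src[0], src[1]}; }

// movhlps dst, src: high 64 bits of src are copied to the low 64 bits of dst.
Vec4 movhlps(Vec4 dst, Vec4 src) { return {src[2], src[3], dst[2], dst[3]}; }

// unpcklpd dst, src: result is [low 64 bits of dst, low 64 bits of src].
Vec4 unpcklpd(Vec4 dst, Vec4 src) { return {dst[0], dst[1], src[0], src[1]}; }

// unpckhpd dst, src: result is [high 64 bits of dst, high 64 bits of src].
Vec4 unpckhpd(Vec4 dst, Vec4 src) { return {dst[2], dst[3], src[2], src[3]}; }

// Returns true if the float-domain and double-domain forms agree bit for bit
// on a sample input.
bool floatAndDoubleFormsAgree() {
  Vec4 a{1, 2, 3, 4};
  Vec4 b{5, 6, 7, 8};
  // Same operand order for the low-half case.
  bool low = movlhps(a, b) == unpcklpd(a, b);
  // Operands swapped for the high-half case.
  bool high = movhlps(a, b) == unpckhpd(b, a);
  return low && high;
}
```

Since the lane movement is identical, the choice between the two forms only affects which execution domain the instruction lives in, which is why the result is fine from a domain point of view even if it looks unexpected in float code.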

Overall - a massive improvement - thank you!

No notable regressions, I’m seeing different code but mostly for the better - although there are a number of vec256 shuffles (mostly lower/upper crossings) that are rather poor (I think Quentin raised bugs on a couple of these) - but the old system could be a lot worse.

I think Simon is talking about PR21943, but this should not hold us back from moving forward.

I think the few cases that remain can easily be dealt with by individual bug reports.

I concur.

Just a quick question: did anyone check the differences on i386?
I want to be sure we identified all the problems before we remove the ability to easily track down regressions.

Thanks,
Q.

I haven't checked in any detail I'm afraid. I don't have any good way to do
32-bit x86 benchmarking these days.

However, the Chromium performance bots haven't shown any significant
problems.

Same here.

In that case, I say we are good to go.

Thanks again for all the work!

Q.

PR21138 was the one that I was thinking of but PR21943 is a regression too.

PR21137 covers examples of the domain crossing issues I mentioned. If people are open to putting in specific logic for shuffles in the get/set ExecutionDomain code (and not just the basic matching tables) then a fix would be relatively trivial.

FWIW, I don’t think we can remove this path just yet.

I had missed something I always intended to do: we don’t adjust the shuffle legality tests to reflect the new lowering. Until we do, a bunch of the code is shared, and we’re producing deeply wrong results about what vector shuffles are legal.

In r225491, I’ve added a flag to start experimenting with this and see how things look.