Nadav Rotem <email@example.com> writes:
> Richard Sandiford <firstname.lastname@example.org> writes:
>> Are you worried that adding it to PMB will increase compile time?
>> The pass exits very early for any target that doesn't opt-in to doing
>> scalarisation at the IR level, without even looking at the function.
>> As an alternative, adding Scalarizer and InstCombine passes to
>> SystemZPassConfig::addIRPasses() would probably give me most of the
>> benefit without affecting the PMB. Scalarizer itself would then not
>> test TargetTransformInfo at all, at least in the initial version,
>> and the scalarisation would still logically be done by codegen.
>> Would that be OK?
> I actually prefer that the Scalarizer would not touch TTI at all because
> I view scalarization a canonicalization phase for DSLs, much like SROA
That's what Pekka is thinking of using it for, but it wasn't the reason
I wrote it. The original motivation was llvmpipe, which is a rasteriser
rather than a DSL compiler. The motivation wasn't to canonicalise,
it was to do the same thing that codegen currently does, but in a better
place from an optimisation perspective.
You said in an earlier message:
  Other users of LLVM (such as OpenCL JITs) do scalarize early in the
  optimization pipeline because the problem-domain presents lots of
  vectors that needs to be legalized.
(a) Scalarising and revectorising only makes sense if the vectorisation
is done with the target in mind. If going from scalar code to vector
code can depend on the target, why shouldn't the same be true in the
other direction, for targets without vector support?
(b) The situation you describe isn't the one that applies to llvmpipe.
In llvmpipe the vectors are nice, known widths that are under the
driver's own control. We certainly don't want to scalarise and
revectorise llvmpipe IR on x86_64, or on powerpc with Altivec/VSX.
The original code is already well vectorised for those targets.
(And also for ARM NEON I expect.)
In the llvmpipe case, codegen's type legaliser already makes a good
decision about what to scalarise and what not to scalarise, without
any help from llvmpipe. The problem I'm trying to solve is that
codegen is too late to get the benefit of other IR optimisations.
So in my case I do not want to _change_ the decision about which
vectors get scalarised and how. I just want to do it earlier.
It would be a shame if that meant that llvmpipe had to duplicate
exactly the decisions that codegen makes wrt scalarisation,
since codegen can easily make those decisions available through
That's why I thought using TTI in the Scalarizer was a good thing
in principle, at least as an option.
SystemZ is a simple case because there is no vector support. But take MIPS
(which is often a good example when it comes to complicated possibilities :-)).
It has at least four separate vector extensions:
- <2 x float> support from the MIPS V floating-point extensions,
  carried over to MIPS 32/64.
- <8 x i8> and <4 x i16> support from the optional MDMX extension,
  now deprecated but used on older chips like the SB-1 and (in a
  modified form) the VR5400.
- Processor-specific vector extensions for the Loongson range.
- The new MSA ASE.
That's a lot of possibilities. Maybe the LLVM port will never support
Loongson and MDMX (almost certain for the latter), but the point is that
even if it did support them, the current codegen interface would make the
right decisions about which of the llvmpipe vectors should be scalarised
If Scalarizer is an all-or-nothing pass then it cannot make as good a
decision for llvmpipe IR, where we don't expect to revectorise the result.
Obviously the current pass is all-or-nothing anyway, but I tried to
structure it so that it would be easy to make per-type decisions in
the future, based on the TargetTransformInfo.
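To make the per-type idea concrete, here is a minimal, self-contained C++
sketch. It is only a model of the decision, not the Scalarizer's code and
not the real TargetTransformInfo interface: VectorType, TargetVectorInfo,
isLegalVectorType, shouldScalarize and the two example targets are all
hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for an IR vector type such as <4 x i16>.
struct VectorType {
  std::string element;   // element type name, e.g. "i8", "i16", "float"
  unsigned numElements;  // number of lanes, e.g. 4 for <4 x i16>

  // Ordering so that VectorType can be stored in a std::set.
  bool operator<(const VectorType &other) const {
    return std::tie(element, numElements) <
           std::tie(other.element, other.numElements);
  }
};

// Hypothetical target description (NOT the real TargetTransformInfo API):
// it simply records which vector types the target handles natively.
class TargetVectorInfo {
public:
  explicit TargetVectorInfo(std::set<VectorType> legal)
      : legal_(std::move(legal)) {}

  bool isLegalVectorType(const VectorType &ty) const {
    return legal_.count(ty) != 0;
  }

private:
  std::set<VectorType> legal_;
};

// Per-type decision: scalarise only the vector types that the target
// cannot handle, and leave well-vectorised types alone.
bool shouldScalarize(const TargetVectorInfo &target, const VectorType &ty) {
  assert(ty.numElements > 0 && "vector types need at least one element");
  return !target.isLegalVectorType(ty);
}

// Example target with no vector support at all (assumed, for illustration).
TargetVectorInfo makeNoVectorTarget() {
  return TargetVectorInfo(std::set<VectorType>{});
}

// Example target with MDMX-like <8 x i8> and <4 x i16> support only
// (assumed, for illustration).
TargetVectorInfo makeMdmxLikeTarget() {
  return TargetVectorInfo(std::set<VectorType>{{"i8", 8}, {"i16", 4}});
}
```

With these assumed targets, shouldScalarize() says yes to <4 x i16> on the
no-vector target, no to <4 x i16> on the MDMX-like target, and yes to
<4 x float> on the MDMX-like target. An all-or-nothing pass cannot express
that last distinction.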
I realise I'm not going to convince you, and I'm going to make the
change anyway. I still think it's the wrong direction though.