[GSoC] Supporting Efficiently the Shift-vector Instructions of the Connex Vector Processor

Hello,

I am applying for Google Summer of Code with a project related to LLVM and the Connex SIMD processor, and I would appreciate some feedback on the proposal.
The proposal can be found here:

Thank you,
Andrei Popa

Hello Andrei,

Your proposal seems to be centered entirely on improvements to an
out-of-tree backend, and therefore it is not clear what the benefits of
this project to the LLVM project are, which places the project slightly
outside the scope of LLVM GSoC.

Probably it would make sense to restructure the proposal in such a way
that it is clear how the work to be done over the summer will benefit,
e.g., other backends that we already have in the repository. After all,
vector instructions exist in many architectures.

Hello, Anton,
     I'd like to add a short reply regarding this GSoC project, which I would like to mentor and which I have also discussed with Andrei.
     A good part of our GSoC project is indeed related to this Connex back end, which is not yet part of the LLVM source repository. An important goal of the project is to perform efficient realignment for this Connex vector processor.

     I looked through LLVM a bit, and as far as I can see, realignment of misaligned vector memory accesses is implemented neither in the LoopVectorize pass (lib/Transforms/Vectorize/LoopVectorize.cpp) nor in the LoadStoreVectorizer (lib/Transforms/Vectorize/LoadStoreVectorizer.cpp), nor in any back end (folder lib/Target). Please correct me if I'm wrong.
     But realignment is an interesting technique, useful for many SIMD and (wide) vector processors: there are still SIMD processors today that either have performance issues with misaligned accesses or simply cannot perform them, and for wide vector processors with many lanes this problem is equally important and even more complex. Realignment of misaligned vector memory accesses has already been addressed in a platform-independent way; see, for example, for GCC, the paper by Nuzman and Henderson ("Multi-platform Auto-vectorization", CGO 2006, https://www.researchgate.net/publication/4231612_Multi-platform_auto-vectorization).
     It would be interesting to address realignment of misaligned vector memory accesses in LLVM as well, since it already seems to be well supported in GCC.
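To make the technique concrete, here is a minimal sketch in plain C++ that simulates vector lanes with std::array. It assumes a hypothetical machine with VL = 8 lanes that can only load from VL-aligned element indices: a misaligned vector is built from the two aligned vectors that straddle it plus a lane shift, and in the loop version each aligned load is reused by the next iteration, so only one new aligned load is issued per iteration. VL, alignedLoad, shiftCombine, realignedLoad and realignedCopy are invented names, not Connex or LLVM APIs, and the sketch is only in the spirit of the realignment schemes in the paper cited above, not a transcription of them.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical vector length (number of lanes); not the actual Connex lane count.
constexpr std::size_t VL = 8;
using Vec = std::array<int, VL>;

// Simulated aligned vector load: 'base' must be a multiple of VL.
Vec alignedLoad(const std::vector<int>& mem, std::size_t base) {
  Vec v{};
  for (std::size_t i = 0; i < VL; ++i) v[i] = mem[base + i];
  return v;
}

// Simulated shift/permute step: lane i of the result is lane (off + i)
// of the 2*VL-lane concatenation lo:hi, with 0 <= off < VL.
Vec shiftCombine(const Vec& lo, const Vec& hi, std::size_t off) {
  Vec r{};
  for (std::size_t i = 0; i < VL; ++i) {
    std::size_t k = off + i;
    r[i] = (k < VL) ? lo[k] : hi[k - VL];
  }
  return r;
}

// One misaligned load at element index 'addr', built from two aligned loads.
// Requires mem.size() >= addr - addr % VL + 2 * VL.
Vec realignedLoad(const std::vector<int>& mem, std::size_t addr) {
  std::size_t base = addr - addr % VL;
  return shiftCombine(alignedLoad(mem, base), alignedLoad(mem, base + VL), addr % VL);
}

// Loop version: copies n elements (n a multiple of VL) starting at the
// misaligned index 'src', issuing only one new aligned load per iteration
// because each iteration's "hi" vector becomes the next iteration's "lo".
// Requires mem.size() >= src - src % VL + n + VL.
std::vector<int> realignedCopy(const std::vector<int>& mem, std::size_t src, std::size_t n) {
  const std::size_t off = src % VL;
  const std::size_t base = src - off;
  std::vector<int> out;
  out.reserve(n);
  Vec lo = alignedLoad(mem, base);
  for (std::size_t i = 0; i < n; i += VL) {
    Vec hi = alignedLoad(mem, base + i + VL);
    Vec v = shiftCombine(lo, hi, off);
    out.insert(out.end(), v.begin(), v.end());
    lo = hi;  // reuse the aligned vector already loaded
  }
  return out;
}
```

On real hardware, shiftCombine would map to a permute or vector-shift instruction, if the target has a suitable one.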

     Please also note that our Connex vector processor back end has been reviewed and should be accepted as experimental (partly because it has a few "exotic" features); see, if interested, https://reviews.llvm.org/D60052 .

   Best regards,
     Alex

I don't believe that statement is accurate.
It was not reviewed in the sense of being "ready to land".
It is not anywhere near that state yet.

Roman.

Hello, Roman, llvm-dev,
     There is one aspect I would like to discuss with you, the people on the llvm-dev mailing list. One thing that is somewhat special about this back end is that it handles symbolic immediate operands (C/C++ expressions written as strings in INLINEASM MachineInstrs). This means the back end can output vector assembly code like:
       VLOAD RegVectorial0, N * 10 + 5 // where N is a variable in the original C program
     Therefore, to support mapping from LLVM IR back to the original C source code, which is possible with the current LLVM distribution and is implemented in our Connex LLVM compiler project, we need to add a simple data structure to the LLVM source file:
         include/llvm/CodeGen/SelectionDAG.h (and helper methods in the related SelectionDAG.cpp file)
       that maps each LLVM IR Value to the SDValue it was translated into, namely DenseMap<const Value*, SDValue> *crtNodeMapPtr, so that the IR Value behind an SDValue can be retrieved. Note, however, that if you find this undesirable, we have an alternative that does not require adding such a DenseMap<const Value*, SDValue> *crtNodeMapPtr member to SelectionDAG.h. Not using it has both advantages and disadvantages (I can give more details about this), but we prefer making the change described above.
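A minimal sketch of what such a member could look like, written against plain C++ stand-ins so that it compiles on its own. IRValue, Node, NodeRef, DAG, findSourceValue and symbolicImm are hypothetical names, not LLVM classes; only the key/value shape (IR value to lowered node) follows the DenseMap<const Value*, SDValue> type named above, which as far as we know is also the type of the NodeMap that SelectionDAGBuilder keeps internally. Because the map is keyed by IR value, going from a node back to its IR value needs a scan (or a second, reverse map).

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-ins, NOT the real llvm::Value / llvm::SDValue classes.
struct IRValue { std::string name; };                 // e.g. the IR value for source variable N
struct Node { int id; };                              // a lowered DAG node
struct NodeRef { const Node* node; unsigned resNo; }; // plays the role of SDValue

// Keyed by IR value, mirroring DenseMap<const Value*, SDValue>.
using NodeMap = std::unordered_map<const IRValue*, NodeRef>;

// Hypothetical stand-in for SelectionDAG with the proposed extra member.
struct DAG {
  // Points at the map filled in while the current block is lowered.
  const NodeMap* crtNodeMapPtr = nullptr;

  // Reverse lookup from a lowered node back to the IR value it came from.
  // The map is keyed by IR value, so this direction needs a linear scan.
  const IRValue* findSourceValue(const NodeRef& n) const {
    if (crtNodeMapPtr == nullptr) return nullptr;
    for (const auto& [value, ref] : *crtNodeMapPtr)
      if (ref.node == n.node && ref.resNo == n.resNo) return value;
    return nullptr;
  }
};

// Hypothetical printer for a symbolic immediate such as "N * 10 + 5".
std::string symbolicImm(const IRValue& v, int mul, int add) {
  return v.name + " * " + std::to_string(mul) + " + " + std::to_string(add);
}
```

For example, with the IR value for N mapped to the node used as the immediate, findSourceValue recovers N, and symbolicImm(N, 10, 5) produces the string "N * 10 + 5" from the VLOAD example.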

     My question to llvm-dev is: can I make changes (that is, add new data structures such as the crtNodeMapPtr above) to existing LLVM source files that are not part of the target back end (namely SelectionDAG.h and SelectionDAG.cpp)? As explained in the paragraph above, this is the solution we prefer for my Connex back end.

     I've uploaded a new version of the source code to https://reviews.llvm.org/D60052 . I hope it will be accepted soon; I'm discussing it with the reviewers.

   Best regards,
     Alex

I'm not sure how that is supposed to work. What if IR does not originate from C?
How do you verify that the "C string" is in the form that will be understood
by whatever will handle it later on? Is there a clang part of the patch?

Regardless, that was already answered in https://reviews.llvm.org/D60052 (Add Connex vector processor back end).

In general, IR instructions should be lowered to SelectionDAG nodes in a way that
doesn't require referring back to the original Instruction afterwards.
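For contrast, the approach described here can be illustrated with a small sketch: the lowering step copies whatever the emitter will need into the node itself, so emission never consults a map back to IR values. This is a hypothetical illustration in plain C++ (LoweredNode, lowerVLoad and emit are invented names); it does not show how such a string would actually be carried by real SelectionDAG operands.

```cpp
#include <string>

// Hypothetical lowered node: everything the emitter needs is stored on the node.
struct LoweredNode {
  std::string opcode;       // e.g. "VLOAD"
  std::string dest;         // destination vector register
  std::string symbolicImm;  // immediate expression captured at lowering time
};

// Hypothetical lowering step: the source-level name is copied into the node once,
// while the IR is still at hand.
LoweredNode lowerVLoad(const std::string& dest, const std::string& varName, int mul, int add) {
  return {"VLOAD", dest, varName + " * " + std::to_string(mul) + " + " + std::to_string(add)};
}

// Emission only reads the node; no lookup back into the IR is needed.
std::string emit(const LoweredNode& n) {
  return n.opcode + " " + n.dest + ", " + n.symbolicImm;
}
```

With these definitions, emit(lowerVLoad("RegVectorial0", "N", 10, 5)) yields "VLOAD RegVectorial0, N * 10 + 5".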

As has already been stated multiple times in the review, that is a
huge patch. It needs to be split into *many* *small*, standalone patches,
each one with proper test coverage and no dead code, before any review
can happen. You won't get a different opinion from any other reviewer.

Roman