Prologue and epilogue for vectorized code

Alex_Susu · April 27, 2016, 10:46pm

Hello.
     I'd like to generate a prologue and epilogue for a code block that is produced by the LLVM loop vectorizer and runs on a SIMD architecture. My SIMD processor receives data from the CPU via DMA transfer and sends it via DMA transfer or a FIFO.
    These transfers are exactly what the prologue and epilogue need to perform. They should be relatively simple, e.g. a call to a function like TransferViaDMA().
    Although this doesn't seem very difficult, I'm curious what the best way to do it is.

I haven't found anyone who has written a prologue and epilogue for vector code (obtained from the loop vectorizer). Although it shouldn't be very different from the prologue and epilogue of a function call, I'd still like to know the recommended approach.
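To make the intent concrete, here is a minimal sketch of the prologue/epilogue pattern in plain C++. Everything in it is an assumption made for illustration: the SIMD unit's local memory is modeled as a byte buffer, the DMA transfers are modeled with memcpy, and the names TransferToSimd, TransferFromSimd, SimdKernelAddOne and RunOnSimd are hypothetical, not part of LLVM or of any real driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of the SIMD unit's local memory (size chosen arbitrarily).
static std::vector<unsigned char> g_simdLocalMem(4096);

// Hypothetical prologue helper: models a CPU -> SIMD DMA transfer with memcpy.
void TransferToSimd(const void *src, std::size_t bytes, std::size_t offset) {
  assert(offset + bytes <= g_simdLocalMem.size());
  std::memcpy(g_simdLocalMem.data() + offset, src, bytes);
}

// Hypothetical epilogue helper: models a SIMD -> CPU DMA transfer with memcpy.
void TransferFromSimd(void *dst, std::size_t bytes, std::size_t offset) {
  assert(offset + bytes <= g_simdLocalMem.size());
  std::memcpy(dst, g_simdLocalMem.data() + offset, bytes);
}

// Stand-in for the vectorized loop body: it touches only SIMD local memory.
void SimdKernelAddOne(std::size_t offset, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    unsigned char *p = g_simdLocalMem.data() + offset + i * sizeof(float);
    float v;
    std::memcpy(&v, p, sizeof v);
    v += 1.0f;
    std::memcpy(p, &v, sizeof v);
  }
}

// Prologue (copy in), vector code, epilogue (copy out).
void RunOnSimd(float *data, std::size_t count) {
  const std::size_t bytes = count * sizeof(float);
  TransferToSimd(data, bytes, 0);   // prologue
  SimdKernelAddOne(0, count);       // code that would run on the SIMD unit
  TransferFromSimd(data, bytes, 0); // epilogue
}
```

Calling RunOnSimd(a, 32) on a float array a would copy it to the modeled local memory, increment every element there, and copy the result back into a.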

Please let me know what you recommend.

Thank you,
Alex

Alex_Susu · May 11, 2016, 8:44pm

Hello.
I'm coming back to this question, rephrased a bit. I think it is also relevant to the NVPTX LLVM back end, if it ever automatically generates code for both the CPU and the NVIDIA device together with the memory transfers between them (with cudaMemcpy()).

Given LLVM scalar and vector code, I want to generate code for both the scalar CPU and my research Connex SIMD unit. The CPU and the SIMD unit have separate memory spaces, so we need to transfer memory from the CPU to the Connex SIMD unit, via DMA, to "synchronize" the two memories.

     Therefore, in the LLVM code with vector instructions I need to add (on the way to code generation) a call to a function that performs the memory transfer from the CPU to my Connex SIMD unit. More precisely, for the LLVM code below (obtained from LLVM's opt tool):
       ...
       %8 = getelementptr inbounds [10000 x float], [10000 x float]* @A, i64 0, i64 %7
       %9 = bitcast float* %8 to <32 x float>*
       %wide.load = load <32 x float>, <32 x float>* %9, align 4
       [more...]
     On the CPU, I want to add a call to an external function writeDataToArray(), like this:
         ...
         %8 = getelementptr inbounds [10000 x float], [10000 x float]* @A, i64 0, i64 %7
         %9 = bitcast float* %8 to <32 x float>*
         call void @writeDataToArray(<32 x float>* %9, i32 128, i32 0) ; argument types are illustrative; 2nd parameter is the transfer size in bytes, 3rd parameter is the offset to write to in the local memory of the SIMD unit
       and then run only the following code on the SIMD unit:
         %newVar = getelementptr inbounds <32 x float>, <32 x float>* inttoptr (i64 0 to <32 x float>*), i64 0
         %dst = load <32 x float>, <32 x float>* %newVar, align 4
         [more...]
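The intended split can be modeled in plain C++ as a rough sketch. The assumptions here are mine: writeDataToArray is given the signature (source pointer, size in bytes, local offset), the SIMD local memory is a byte array, the transfer is a memcpy, and simdSumFromLocal and sumArray are hypothetical stand-ins for the device-side code and the host-side loop. The vector width of 32 floats (128 bytes) matches the IR above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kVF = 32;                          // vector width, as in <32 x float>
constexpr std::size_t kChunkBytes = kVF * sizeof(float); // 128 bytes per transfer

// Hypothetical model of the SIMD unit's local memory.
static unsigned char simdLocalMem[1024];

// Assumed signature: copy sizeBytes from CPU memory to local offset localOffset.
void writeDataToArray(const void *src, std::size_t sizeBytes,
                      std::size_t localOffset) {
  assert(localOffset + sizeBytes <= sizeof(simdLocalMem));
  std::memcpy(simdLocalMem + localOffset, src, sizeBytes);
}

// Stand-in for the SIMD-side code: it loads one vector from local memory
// (never from the CPU array) and reduces it.
float simdSumFromLocal(std::size_t localOffset) {
  float lanes[kVF];
  std::memcpy(lanes, simdLocalMem + localOffset, kChunkBytes);
  float sum = 0.0f;
  for (float v : lanes) sum += v;
  return sum;
}

// Host-side loop: one transfer per vector load, plus a scalar remainder.
float sumArray(const float *a, std::size_t n) {
  float total = 0.0f;
  std::size_t i = 0;
  for (; i + kVF <= n; i += kVF) {
    writeDataToArray(a + i, kChunkBytes, 0); // CPU side: the inserted call
    total += simdSumFromLocal(0);            // SIMD side: load from local offset 0
  }
  for (; i < n; ++i) total += a[i];          // elements left over after full vectors
  return total;
}
```

In this model the CPU array is only ever read by writeDataToArray, which mirrors the idea that the wide load on the SIMD side is rewritten to read from a local address.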

     Should I insert this function call in LLVM's llvm/lib/Transforms/Vectorize/LoopVectorize.cpp, in the method:
        /// Vectorize Load and Store instructions,
        virtual void vectorizeMemoryInstruction(Instruction *Instr) ?
     Or should I do it in a separate LLVM pass, or maybe in the back end?

Thank you,
Alex
