[RFC] Vector-DDG: For Better Visualization and Verification of Vectorized LLVM-IR

raghesh · November 18, 2024, 6:29am

We propose Vector-DDG (Vector Data Dependence Graph), a tool to visualize and verify the complicated data flow in vectorized LLVM-IR. The visualization helps to understand the vectorized IR better and to further improve the quality of the same. The automatic verification helps improve the developer productivity by catching the vectorization errors early.

We presented this as a “Quick talk” in LLVM Developer’s Meeting 2024.

There was a design change that we made recently and we are in the process of implementing that and we will share the code soon, once it is in a good shape. Before that, we would like to propose this to the community for upstreaming and we are seeking your valuable opinions and feedback here.

Detailed explanation follows.

Vectorization modules in LLVM, like SLP and Loop Vectorizer, bring exceptional performance uplifts for various applications. Hence, understanding and debugging the vectorized code and improving the quality of the same is of primary importance. In LLVM-IR, compared to the scalar code, it is particularly challenging to understand the vector code due to the presence of complicated data flow and memory dependencies. The complication is there because in the vector code the data is flowing through vector lanes. The presence of glue instructions such as shufflevector, insertelement, and extractelement makes it further complicated as they permute the vector inputs to them.

We propose a tool, called Vector-DDG to address the above challenge, which is an extension to the state-of-the-art data dependence graph (DDG). We extend the existing DDG framework (Scalar-DDG) available in LLVM, by expanding each node to a multi-node, corresponding to each lane in the vector instruction. The edges to and from the nodes in the multi-node captures precise lane-wise data flow to and from those nodes. Thus, the tool helps understanding the vector code better and to develop insights for improving the quality of the same.

We further extend the tool as an automatic vector code verifier by comparing the data flow in the Scalar-DDG and the corresponding Vector-DDG. The verifier can be added as a verifier pass after each vectorization pass, similar to the function and module verifiers in LLVM.

Visualization of Vector-DDG

Here, we show the visualization of Vector-DDG using an example given in Figure 1. The default DDG constructed from this example is shown in Figure 2. The Vector-DDG constructed by our tool for the same is shown in Figure 3. Within each node in the Vector-DDG, the rectangles labelled 0 and 1 represents the lane numbers. Thus, very precise data flow through vector lanes can be captured in the Vector-DDG. For example, the edges from the nodes representing loads (%1, %3) to the nodes representing shufflevectors (%4, %5), clearly shows the lane from which the data is originating and the lane to which it is flowing. Also, the path starting from the add node (%6) through the extractelement node (%7) ending at the store node shows that the value in lane 0 of the add node is getting stored.

Verification of vector code using Vector-DDG

Here, we explain our approach to verify the vector code generated by a vectorization
pass. The objective is to check if there exists the exact same data flow in the Vector-DDG as that of the Scalar-DDG. The precise lane wise edges present in Vector-DDG help us in doing so. We can say that a Vector-DDG is equivalent to a Scalar-DDG if and only if for each path in Scalar-DDG there exists a unique path in Vector-DDG.

To make the context clear consider the example for a Scalar-DDG shown in Figure 4. Two
different Vector-DDGs for the same example, one generated after correct vector code generation and one after an incorrect vector code generation are shown in Figure 5 and Figure 6 respectively. The rectangles show the packing of scalar instructions into different lanes of a vector instruction. The prefix ‘v.’ on the label of each rectangle specifies that it is executed as a vector instruction. The labels S and E denote dummy start and end nodes inserted. We have two data flow paths in the Scalar-DDG which are S→ l1 → a2 → s2 →E and S→ l2 → a1 → s1 →E. In the Vector-DDGs, note that the v.add instruction uses a shuffled (v.shuffle) result of v.load. We have to ensure that the exact same paths exist in the Vector-DDG. The paths in Vector-DDG in Figure 5 are S→ l1 → p1 → a2 → s2 →E and S→ l2 → p2 → a1 → s1 →E. Since the nodes p1 and p2 are of a shuffle instruction, the edges l1 → p1 and l2 → p2 can be collapsed into single nodes l1 and l2 respectively. Thus, the two paths are S→ l1 → a2 → s2 →E and S→ l2 → a1 → s1 →E, which are the same as that of Scalar-DDG and hence they are equivalent. However, in Figure 6 we have the paths S→ l2 → a2 → s2 →E and S→ l1 → a1 → s1 →E, which are not matching with the paths in Scalar-DDG and hence they are not equivalent.

We do not implement an expensive path comparison algorithm, but resort to an efficient algorithm, by traversing both Scalar and Vector-DDGs in topological order and by performing a level-by-level comparison. More details of the algorithm and arguments for soundness can be discussed further.

Thanks for reading and requesting to respond with your valuable views and suggestions.

alexey-bataev · November 18, 2024, 12:15pm

I think it is great tool and might very useful

amara · November 18, 2024, 7:00pm

Are there any concrete problems that this addresses? I.e. what motivated this and is it worth the cost of implementation & maintenance?

Leon · November 19, 2024, 8:05am

Any plan of opensource this tool? I think the visual graph for LLVM is still naive.

raghesh · November 19, 2024, 9:00am

Yes. the motivation for this tool came from the difficulty of understanding the semantics of a complex vectorized code, both for the purpose of fixing some wrong vector code and for improving it for performance. Whenever, we extend SLP or implement a new SLP module, the vector code verifier can easily catch issues and improve our confidence in the implementation.

We faced multiple issues in the past in debugging many benchmarks in SPEC CPU2017 like 525.x264_r. The analyses, we made there really motivated this.

raghesh · November 19, 2024, 9:03am

Yes. We would like to upstream it.

Topic		Replies	Views
LLVMdev Digest, Vol 16, Issue 14 LLVM Dev List Archives	1	100	May 10, 2006
Thoughts about the llvm architecture LLVM Dev List Archives	7	123	September 5, 2010
RE: Vector LLVM checked in! LLVM Dev List Archives	0	90	October 18, 2005
Vectorization: Next Steps LLVM Dev List Archives	22	229	February 15, 2012
loop vectorizer issue LLVM Dev List Archives	5	144	November 4, 2013

[RFC] Vector-DDG: For Better Visualization and Verification of Vectorized LLVM-IR

Visualization of Vector-DDG

Verification of vector code using Vector-DDG

Related topics