Machine Unroller patch

Hi all,

I have submitted a patch that implements a machine unroller as a utility class.

The intent of this patch is to provide loop unrolling functionality at the MI level. Currently, it handles only small innermost loops with a run-time trip count and a single basic block. We target such loops mostly because they often leave resources underutilized. Unrolling improves resource usage in many cases and, if the unrolled loop is scheduled properly, it can also reduce stalls by hiding the latencies of multi-cycle instructions.

To avoid excessive code size increase, we unroll selectively, only for loops where unrolling is determined to improve resource usage. Since the Software Pipeliner already models resource usage for the ResMII (Resource Minimum Initiation Interval) computation, we decided to use it for computing the unroll factor as well, and thus let the pipeliner drive the machine unroller. The unroller-pipeliner combination is found to generate much better code for small loops with high-latency instructions and without loop-carried dependences.

For now, this feature is enabled only for the Hexagon backend. To enable it for other targets, each target must extend the MachineUnroller class and provide its own implementation of the target-specific APIs.
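To illustrate how a ResMII-style resource model can yield an unroll factor, here is a minimal standalone sketch. It is not the code in the patch and does not use any LLVM API: the names ResourceUse, computeResMII and pickUnrollFactor are hypothetical, and the selection rule (pick the smallest factor that minimizes ResMII per original iteration) is an assumption made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical summary of one resource class for a single loop iteration.
struct ResourceUse {
  unsigned UsesPerIteration; // cycles of this resource one iteration needs
  unsigned UnitsAvailable;   // identical units the target provides
};

// Resource-constrained minimum initiation interval for a loop body that has
// been replicated UnrollFactor times: the busiest resource sets the bound.
unsigned computeResMII(const std::vector<ResourceUse> &Resources,
                       unsigned UnrollFactor) {
  assert(UnrollFactor >= 1 && "unroll factor must be positive");
  unsigned ResMII = 1;
  for (const ResourceUse &R : Resources) {
    assert(R.UnitsAvailable >= 1 && "resource must have at least one unit");
    unsigned Total = R.UsesPerIteration * UnrollFactor;
    // Ceiling division: cycles needed when work is spread over all units.
    unsigned Needed = (Total + R.UnitsAvailable - 1) / R.UnitsAvailable;
    ResMII = std::max(ResMII, Needed);
  }
  return ResMII;
}

// Assumed policy: choose the smallest factor in [1, MaxUnrollFactor] that
// minimizes ResMII per original iteration. A factor of 1 means "do not
// unroll", which keeps code size unchanged when unrolling does not help.
unsigned pickUnrollFactor(const std::vector<ResourceUse> &Resources,
                          unsigned MaxUnrollFactor) {
  unsigned Best = 1;
  unsigned BestMII = computeResMII(Resources, 1);
  for (unsigned U = 2; U <= MaxUnrollFactor; ++U) {
    unsigned MII = computeResMII(Resources, U);
    // Compare MII / U < BestMII / Best without floating point.
    if (MII * Best < BestMII * U) {
      Best = U;
      BestMII = MII;
    }
  }
  return Best;
}
```

For example, with one load per iteration on two load units and three ALU operations per iteration on four ALU units, the ResMII is 1 for the original loop and 3 for the loop unrolled four times, so this sketch picks a factor of 4 (0.75 cycles per original iteration). For a loop that already saturates a resource, it returns 1 and the loop is left alone.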

The patch currently doesn’t have any reviewers since I didn’t know whom to add. I will add some folks for the Hexagon-specific changes, but I would greatly appreciate it if others could review it as well and provide feedback. Please let me know if you have any questions.