feedback on late machine combiner pass


we noticed that combining instructions at the ISEL level can result in sub-optimal schedules, since an expensive instruction could be folded and lengthen the critical path. A simple illustrative example for this is folding mul-mul-add into mul-madd (fused multiply add), which serializes two multiplications instead of allowing them to execute in parallel. Unfortunately ISEL does not have the information necessary to avoid sub-optimial cases in all instances. The alternative is to post-pone instruction combining to a later pass and decide on whether the combined code sequence performs better. This combiner pass would run at the machine IR level, use the machine trace model information, and roughly implement the following architecture independent algorithm initially targeting mul-add/sub pattern:

* Initial Instruction Combiner Algorithm
0. Recognize instruction patterns in a single basic block
1. For each add/sub check if it can be combined to madd/msub
2. For each pattern found, generate the alternative instruction sequence outside the basic block that contains mul - add/sub
3. Evaluate if the new pattern is more efficient:
a) In Os substitute when the new pattern has fewer instructions
b) Otherwise substitute when neither critical path nor resource length increases based on the machine trace model

Do you have any concerns? Or is this the pass you always wanted? :slight_smile:


Could you please provide a specific ll test and we will be able to consider what happened in details using debugger