Per-processor tuning

Hi Christopher,

I’m doing some per-processor tuning and hit a problem with the “MOV” family of instructions on Intel/AMD.
(This problem will likely also become very real for ARMv8 too though)

In the Intel/AMD case there exist MOV8rr and VMOVDQArr.
Both use the WriteMove SchedRW,
but on Bulldozer one wants 1 cycle on the ALU and the other 4 cycles on the FPU.
In terms of itinerary classes, they differ: IIC_MOV and IIC_SSE_MOVA_P_RR.
These classes in the old ProcessorItinerary model let me describe latency and resources exactly.
So as I understand it, if I want to use the new model, I need to add a new SchedWrite, for example WriteMovSSE.

That is a possibility if you think that distinction is likely to benefit several architectures. You would have to do the targeting of that new model for each architecture.

Maybe they’re just in progress
They pointed out that it’s possible to create an InstRW in which you list, for example, your new WriteMovSSE
and the associated instructions, without modifying the instruction definitions, but I tried it and saw no impact (I need to double-check that this was tested with the MI scheduler)

Yes, you should be able to do that, see lib/Target/ARM/ARMScheduleSwift.td for an example.
If you are not seeing anything, it may just mean the scheduler does not think there is something to change even with your new model.
You can check the actual scheduling model for each instruction by poking around the generated *.inc files (I do not remember which one contains the actual information).
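
For illustration only, a per-processor override of the kind discussed above might look roughly like the TableGen fragment below. This is a sketch, not code from the tree: HypBulldozerModel, HypFPU and HypWriteMovSSE are hypothetical names, the fragment assumes a SchedMachineModel called HypBulldozerModel is already defined and that it sits inside the X86 target's .td files, and the 4-cycle latency is simply the figure quoted in the question.

```tablegen
// Sketch only: every name prefixed with "Hyp" is hypothetical.
// Assumes a SchedMachineModel named HypBulldozerModel is defined elsewhere.
let SchedModel = HypBulldozerModel in {
  // One FPU pipe, modelled as a processor resource with a single unit.
  def HypFPU : ProcResource<1>;

  // A write that occupies the FPU resource and has a 4-cycle latency.
  def HypWriteMovSSE : SchedWriteRes<[HypFPU]> { let Latency = 4; }

  // For this model only, map VMOVDQArr to the new write instead of the
  // WriteMove attached to its instruction definition. MOV8rr is not
  // listed here, so it keeps its WriteMove mapping.
  def : InstRW<[HypWriteMovSSE], (instrs VMOVDQArr)>;
}
```

Because the InstRW lives inside the per-processor model, the shared instruction definitions stay untouched and other subtargets keep their existing WriteMove behaviour.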

To work around this I’m currently doing it both the “old way” and the new way…

Bug? Is the code in some transition phase? (maybe I’m doing it “wrong”)

Any tips on how to get per-processor/per-vendor tuning would be most welcome

Again, lib/Target/ARM/ARMScheduleSwift.td should be a good example.

Thanks,
-Quentin