I’m looking to extend Clang’s existing loop pragmas to also support hardware loops, allowing a user to insert (or completely disable) hardware loop intrinsics on a per-loop basis.
One of the questions I have regarding this is how to go about incorporating the different hardware loop intrinsics in the pragma. A few options we came up with:
- The pragma incorporates which intrinsic to use for a loop:
#pragma loop hwloop(set_loop_i32)
#pragma loop hwloop(/*LivesInReg=*/ true, /*AddTestGuard=*/ true, /*NumBits=*/ 32)
- The pragma adds some target-specific info (string?) to use in the hwloop TTI hook/new hwloop TTI hook:
#pragma loop hwloop(target="bdnz") // PPC example
#pragma loop hwloop(target="bdz") // PPC example
#pragma loop hwloop(max-count=42, …)
Option 1 requires the user to know about LLVM’s hardware loop internals, so I’m leaning towards option 2: users are more likely to be aware of target-specific information (such as PPC’s bdnz/bdz).
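To make the per-loop usage concrete, here is a minimal sketch of where such a pragma would sit in user code. The hwloop spelling is only the proposal from this post and is not implemented in Clang, so it appears in a comment; the function name sum_array is made up for illustration. The unroll(disable) line is an existing Clang loop pragma, included only to show the same placement (other compilers should ignore it, possibly with an unknown-pragma warning).

```c
#include <stddef.h>

/* Hypothetical example function: sums n ints from a. */
static int sum_array(const int *a, size_t n) {
    int total = 0;
    /* Proposed option 2 spelling (assumption, not accepted by Clang today):
     *   #pragma loop hwloop(target="bdnz")
     * It would be placed directly before the loop, like the pragma below. */
#pragma clang loop unroll(disable)
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

With option 2 the user only names something they already know from the target (here the PPC bdnz instruction), and the TTI hook decides how that maps onto the intrinsics.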
These are just the options we came up with; we would love to hear about other (better) options, if any.
Janek van Oirschot