Loop pragma for hardware loops

Hey all,

I’m looking to extend the current clang loop pragmas to also support hardware loops and allow a user to insert (or completely disable) hardware loop intrinsics on a per-loop basis.

One of the questions I have regarding this is how to go about incorporating the different hardware loop intrinsics in the pragma. A few options we came up with:

  1. The pragma incorporates which intrinsic to use for a loop:

#pragma loop hwloop(set_loop_i32)


#pragma loop hwloop(/*LivesInReg=*/ true, /*AddTestGuard=*/ true, /*NumBits=*/ 32)

  2. The pragma adds some target-specific info (string?) for use in the existing hwloop TTI hook or a new hwloop TTI hook:

#pragma loop hwloop(target="bdnz") // PPC example

#pragma loop hwloop(target="bdz") // PPC example


#pragma loop hwloop(max-count=42, …)

Option 1 requires the user to know about LLVM’s hardware loop internals, so I’m leaning towards option 2: users are more likely to be aware of target-specific information (such as PPC’s bdnz/bdz).

These are just some options we came up with; we would love to hear about other (better) options, if any.

Kind regards,

Janek van Oirschot

Hello Janek,

It looks like you would like to use a pragma to steer which hardware loop form gets generated by providing very detailed target information. I think a more typical use case for pragmas is to override the cost model or a transformation threshold/argument. In this case, I would have guessed that the new pragma simply takes precedence over TTI’s isHardwareLoopProfitable hook, and so I would have expected something as simple as “hwloop(enable|disable)” initially.

If you would like to bring a hardware loop into a more efficient form, then I think that is mainly the responsibility of the hardware loop pass or a backend pass (see e.g. the ARM backend passes). I think option 1 is a non-starter, as it exposes all sorts of internals that we don’t want to expose for various reasons. Option 2 looks a lot better, but is still very specific.
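
To make the “pragma takes precedence over the cost model” idea concrete, here is a minimal sketch of the decision logic. It is not LLVM code: `HwLoopHint` and `shouldUseHardwareLoop` are hypothetical names invented for illustration, and the `costModelProfitable` flag merely stands in for whatever a hook such as isHardwareLoopProfitable would report for the loop.

```cpp
// Hypothetical per-loop hint, modelling a hwloop(enable|disable) pragma.
// "Unspecified" means the user wrote no hwloop pragma on this loop.
enum class HwLoopHint { Unspecified, Enable, Disable };

// Decide whether to attempt the hardware loop conversion for one loop.
// costModelProfitable is an assumed stand-in for the target's answer;
// it is not the real TTI interface.
bool shouldUseHardwareLoop(HwLoopHint hint, bool costModelProfitable) {
  switch (hint) {
  case HwLoopHint::Enable:
    return true;   // user override: ignore the cost model
  case HwLoopHint::Disable:
    return false;  // user override: never convert this loop
  case HwLoopHint::Unspecified:
    break;         // no pragma: defer to the cost model
  }
  return costModelProfitable;
}
```

In this sketch the hint only overrides profitability. A real implementation would presumably still have to run its legality checks before converting a loop, even when the user asks for “enable”.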