-fexpand-inline-ptx as a feature?

People sometimes write parts of GPU kernels using inline assembly containing PTX statements. One wishes they wouldn't, especially inside an OpenMP target region, but it happens.

Usually in clang we leave inline asm as a string for the backend to process (or choke on) and all is well. PTX is an unusual ISA in this context because it doesn't describe a real machine, so a lot of clever assembly programming trickery doesn't apply to it. There's a good chance we could parse it in clang and emit a collection of IR nodes representing the same program. Better still, statements that don't parse can be left alone, so any sufficiently nasty edge cases keep the current behaviour.

This is an obvious-in-hindsight idea which I totally failed to recognise, brought to light by the folks behind https://docs.scale-lang.com, who actually went to the effort of implementing it.

I think this would be a great feature to have. All the OpenMP target regions that contain inline PTX only because the code began life as CUDA would be able to eliminate that compiler boundary. People compiling CUDA to SPIR-V would be able to set a flag and have this collection of horrible edge cases disappear from their todo list.

I personally want this feature but am not totally confident about persuading my employer to sign off on me implementing it. I've started that discussion. I also don't know how Spectral Compute will feel about upstream LLVM implementing one of their product features; I hope they view it as a nuisance they'd like to move out of their codebase. Inline PTX is a nuisance in general, and desugaring it in clang (only on request, in case the asm is there to work around compiler bugs) is a technically superb answer.

What do other people think about this? Is anyone so enthusiastic that they want to implement it? Would anyone claim it's so philosophically abhorrent that we should keep asking people to switch to intrinsics instead?

Thanks!
