Generating AVX512 implicit broadcasts

For the test case at
https://godbolt.org/z/P5aYsKr8P
LLVM is able to generate an implicit broadcast for the single-use case.
Is there a way to force implicit broadcasts when there are multiple uses of the constant being broadcast?
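
For illustration, the two cases can be sketched as below. This is a hypothetical reduction, not the code behind the godbolt link: the function names and the constant 3.0f are made up. It assumes the loops are auto-vectorized with something like `clang++ -O3 -march=skylake-avx512 -mprefer-vector-width=512`. The actual output depends on the compiler version and flags, so it is worth inspecting the assembly rather than relying on this sketch.

```cpp
#include <cstddef>

// Single use: the constant 3.0f feeds exactly one multiply per iteration,
// so it is a candidate for a broadcast folded into the instruction's
// memory operand.
void scale_single_use(float *dst, const float *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * 3.0f;
}

// Two uses: the same constant feeds two multiplies per iteration. Without
// fast-math the compiler may not rewrite this as (a[i] + b[i]) * 3.0f, so
// both uses survive and the constant is typically broadcast into a register.
void scale_two_uses(float *dst, const float *a, const float *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] * 3.0f + b[i] * 3.0f;
}
```

Comparing the assembly of the two functions shows whether the constant appears as a folded broadcast operand or is materialized once in a zmm register.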

Not currently. Ideally we’d do this where we have high register pressure, but LLVM currently doesn’t do a good job here (it tends to spill the broadcasted constant and then fold the stack reload as a full register…).

What is your particular use case here?

I was looking at internal application code and comparing LLVM's output with GCC's. GCC folds full-width vector constant loads into instructions in a few places, although it does not generate folded broadcasts. LLVM, however, always ended up broadcasting to a register and then using that register in the FP arithmetic operations. This is because in many places the constant being broadcast has 2 or more uses. There are also register spills, and almost all zmm registers are in use.

https://reviews.llvm.org/D150143/new/#change-xPKKSPZcbh4D Can this pass do the folding? Or would you suggest another place where we could try to fold, such as X86ISelDAGToDAG.cpp?

X86FixupVectorConstantsPass will convert full vector width constant loads to broadcasts.

On AVX512 the pass will also attempt to convert full vector width constant loads folded in instructions to the broadcast folded variant, but we’re still missing a lot of instructions from the AVX512 tables; see llvm/llvm-project issue #66360, "[X86] X86FoldTablesEmitter - add support for AVX512 BroadcastFoldTables".

If we’re running out of registers, we will try to reload constants (which might allow the reload to fold into the instruction), but it’s inconsistent, so please raise bugs if you see examples where it goes wrong.

If we have spare registers, then we will just load/broadcast the constant into a single register and reuse it for multiple instructions.
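
When reducing a case for a bug report, one possible starting shape is a kernel in which every constant has two uses, as sketched below. Everything here is hypothetical: the coefficient values, the names `kCoeff`, `poly` and `poly_pair`, and the choice of a polynomial. With only four coefficients this is unlikely to exhaust the zmm registers; the assumption is that adding coefficients or input streams raises the pressure until the compiler has to choose between keeping a constant in a register, spilling it, or reloading it.

```cpp
#include <cstddef>

// Hypothetical coefficients; each one is used twice per iteration below.
static constexpr float kCoeff[4] = {1.0f, 0.5f, 0.25f, 0.125f};

// Horner evaluation of kCoeff[0] + kCoeff[1]*x + kCoeff[2]*x^2 + kCoeff[3]*x^3.
static inline float poly(float x) {
    float acc = kCoeff[3];
    for (int k = 2; k >= 0; --k)
        acc = acc * x + kCoeff[k];
    return acc;
}

// Evaluates the same polynomial on two inputs, so once the loop is
// vectorized every broadcast coefficient has at least two uses.
void poly_pair(float *dst, const float *a, const float *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = poly(a[i]) + poly(b[i]);
}
```

If the generated code spills a broadcast constant and later reloads it at full vector width, that is the kind of example worth attaching to a bug.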