Trying to optimize small snippet

I’ve got this snippet:

  %bool.fast.i.i = select i1 %cmp.i.i, { i64, ptr } { i64 2, ptr null }, { i64, ptr } { i64 1, ptr null }
  %0 = extractvalue { i64, ptr } %bool.fast.i.i, 0
  %null.i22.i = icmp eq i64 %0, 0
  br i1 %null.i22.i, label %call.next.i, label %call.exit.i

It seems like LLVM should be able to optimize away the icmp and conditional br but I can’t seem to get it to happen. I’ve thrown O3 at it, and sroa and a bunch of other passes. Is there something here I’m missing, or do I need a specific analysis / transform?

I wouldn’t expect a select with a struct result type to optimize well, in general; it’s legal, but nothing in clang/LLVM normally generates that construct, so optimizations don’t really know how to handle it. If this is coming out of your frontend, I’d suggest changing the way you generate code.

Maybe we could teach instcombine to split selects like this (so instead of one select with an { i64, ptr } result, we have two selects: one with an i64 result, and one with a ptr result).
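For illustration, here is a minimal sketch of what the split form of the snippet above could look like, written by hand rather than produced by any existing pass. The function name and value names are hypothetical. With one scalar select per field, the i64 select has two nonzero constant arms, so the existing scalar folds should be able to prove the compare against 0 is always false.

```llvm
; Hypothetical hand-split version of the { i64, ptr } select.
define i1 @split_select(i1 %cmp) {
  ; One scalar select per struct field instead of one aggregate select.
  %f0 = select i1 %cmp, i64 2, i64 1
  ; Both arms are identical here; this field is shown only for completeness.
  %f1 = select i1 %cmp, ptr null, ptr null
  ; %f0 is either 2 or 1, never 0, so this compare is foldable to false.
  %is.null = icmp eq i64 %f0, 0
  ret i1 %is.null
}
```

Running something like `opt -passes=instcombine -S` on this is a quick way to check whether the compare goes away in your LLVM version.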

Of course. I would expect a pass to transform that into 2 selects. I would think SROA would do this, but does SROA not work when an alloca isn’t present? Or on constants?

The source was a hand-written LLVM function containing a select on a struct, which had then been inlined. I’m now going through the code looking for that pattern (select i1 struct) to replace. Still, it was unexpected.

SROA, at its core, does just two things: splitting allocas into multiple pieces, and SSA construction on allocas. It doesn’t handle anything that isn’t an alloca.

I think instcombine has some transforms to split structs passed as values, but maybe just for load/store? I don’t recall exactly.

Thanks. That is useful. It would be nice to be able to have SROA on selects without allocas. The replacement I used turns 1 line of IR into (4 * number of struct fields), 8 in the case of 2. That’s a lot of extra IR, though in many cases it can be optimized away.
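For readers wondering where the 4-per-field count comes from, this is one plausible shape for such a replacement, not necessarily the exact IR used. It assumes arbitrary struct operands `%a` and `%b` (hypothetical names): each field costs two extractvalues, one select, and one insertvalue, so a two-field struct takes 8 instructions.

```llvm
; Hypothetical field-by-field replacement for:
;   %r = select i1 %cmp, { i64, ptr } %a, { i64, ptr } %b
define { i64, ptr } @select_by_field(i1 %cmp, { i64, ptr } %a, { i64, ptr } %b) {
  ; Field 0: extract from both operands, select, rebuild.
  %a0 = extractvalue { i64, ptr } %a, 0
  %b0 = extractvalue { i64, ptr } %b, 0
  %s0 = select i1 %cmp, i64 %a0, i64 %b0
  %r0 = insertvalue { i64, ptr } poison, i64 %s0, 0
  ; Field 1: same four steps.
  %a1 = extractvalue { i64, ptr } %a, 1
  %b1 = extractvalue { i64, ptr } %b, 1
  %s1 = select i1 %cmp, ptr %a1, ptr %b1
  %r1 = insertvalue { i64, ptr } %r0, ptr %s1, 1
  ret { i64, ptr } %r1
}
```

When the operands are constants, as in the original snippet, the extractvalues fold away and only the scalar selects are left for the optimizer to work on.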