Evolution of undef and poison over time

I’ve collected some statistics on the evolution of the usage of undef & poison since LLVM 10.
The usage of UndefValue has been roughly stable, while the usage of PoisonValue has been increasing steadily! That’s good, but we need to continue the efforts to replace UndefValue wherever posible.
Clang still doesn’t have a single use of Poison Value! :scream:



Now let’s look at the usage of undef & poison constants in tests:

The situation here is pretty serious! We keep adding tests that use undef at a scary pace. We’ve been adding about 15-20k uses of undef per release. We should make a serious effort to revert this. I hope most uses of undef can be replaced with poison as these are used as placeholders somewhere, so the exact value doesn’t matter.
With 140k references to undef, we will need a script to replace the easy cases with poison. But maybe that’s not for now, as we are still not removing undef this year.

Please use poison constants whenever possible in your test cases!

Data collected with:

for tool in llvm clang ; do
  echo "$tool UndefValue" `grep -RI UndefValue $tool | wc -l`
  echo "$tool PoisonValue" `grep -RI PoisonValue $tool | wc -l`
  echo "$tool undef test" `grep -RI '[^a-zA-Z0-9#-]undef[^a-zA-Z0-9_-]' $tool/test | grep -v 'define ' | grep -v '\-LABEL: ' | wc -l`
  echo "$tool poison test" `grep -RI '[^a-zA-Z0-9#-]poison[^a-zA-Z0-9_-]' $tool/test | wc -l`

The documentation seems pretty weak on discouraging undef. I knew things were changing in that area but was barely aware that it was effectively deprecated.

So it might be worth making the LangRef more explicit, or even writing an entirely new demystifying page (like the GEP one for example).

Does bugpoint still produce undef, or has it switched to poison? That could be another fairly quick win on test-cases. Same for llvm-reduce.

1 Like

I suppose the same applies to transforms in general. The legacy uses of UndefValue in passes will lead to steadily increasing use in tests even of unrelated features. That kind of undef will go away in one big step when the transform itself is updated.

Very good points, thanks! Those tools still produce a lot of undefs indeed.
I’ll submit a patch to LangRef with a discouraging note about undef. Maybe it makes sense to write an upgrade guide as well.

I think debug-info metadata uses undef a lot, and I don’t recall ever seeing it use poison. In fact I don’t think it would make sense for debug info to use poison. Maybe some of the growth you’re seeing is coming from that?

I will check, thanks!

BTW, why do you say that poison doesn’t make sense for debug info? As far as I can read from the manual, undef seems to be used like a sentinel. So it feels it can be replaced with any distinct value.

@jmorse @dblaikie @adrian.prantl
Would moving from undef to poison for debug-info metadata make sense?

Seems plausible, though not like it’d make any difference practically speaking?

Maybe figuring out a more nuanced way to gather the statistics would be useful? Perhaps creating a histogram by instruction/intrinsic type (eg: “there are 10 more br instructions with undef parameters in this release than the last release”) then we could ignore all the debug intrinsics easily enough.

Not that I’m especially concerned about making the change either.

(that said, I don’t get very involved in debug info locations - other folks have more context there than me & have opinions that should weigh more on whether this is worth doing, how much work, who should do it, etc)

true, poison might be too much. You would need to be able to introduce freeze operations as well.
Another trick I like is to replace operations with new arguments. It’s usually the best to generate clean testcases IMHO.

Bugpoint emits inordinate amounts of undefs, that’s probably where many of them come from. Also, it’s not completely clear (to me) that poison would be a better choice: undef can be used to prune unnecessary computations without automatically poisoning their original uses.