[RFC] Expanding the Experimental Histogram Intrinsic

Background

I’m building on the work of Paschalis Mpeis and Graham Hunter on auto-vectorizing histogram operations, which introduced vectorized histogram updates. The original implementation has already landed in LLVM.

@paschalis.mpeis @graham.hunter

Motivation

Currently, histogram vectorization is limited to cases where the bucket is updated by a loop-invariant amount using addition (+). While this is a great step forward, I’d like to extend support to more general use cases and give targets finer-grained control.
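For reference, the currently supported shape can be sketched as the scalar loop below. This is only an illustration of the pattern described above; the function name `histogram_add` and the use of `std::vector` are assumptions made for the example and are not taken from the LLVM implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Classic histogram loop: every bucket update adds the same amount.
// Different iterations may hit the same bucket, which is why a plain
// gather/add/scatter sequence is not a safe way to vectorize it.
void histogram_add(std::vector<uint32_t> &buckets,
                   const std::vector<uint32_t> &indices,
                   uint32_t inc) {
    for (std::size_t i = 0; i < indices.size(); ++i)
        buckets[indices[i]] += inc;  // inc does not change inside the loop
}
```

The key properties are that the update is an addition and that `inc` is the same in every iteration.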

Proposed Enhancements

  1. Support Additional Update Operations
  • Extend beyond simple addition to include operations such as uadd.sat, umax, and umin.
  • A PR implementing this is in progress and should be available soon.
  2. Fine-Grained Target Control Over Vectorization
  • Instead of an all-or-nothing approach, allow targets to specify which load/store operations are safe to vectorize as histogram patterns.
  • One possible mechanism is leveraging address space information to guide vectorization decisions.
  3. Support for Variant Updates
  • Allow updates that are not invariant but instead depend on dynamic values.
  • This expands the range of histogram workloads that can benefit from vectorization.
  4. Returning Intermediate Histogram States
  • Modify the intrinsic to return the intermediate histogram state instead of void.
  • This enables use cases where the previous histogram value is needed before updating.
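To make the additional update operations and the variant-update case concrete, here is a rough sketch of the scalar loops they would correspond to. All function names (`uadd_sat`, `histogram_uadd_sat`, `histogram_umax`, `histogram_add_variant`) are hypothetical and chosen for illustration; the sketch shows source-level patterns only and says nothing about how the intrinsic or the vectorizer would actually represent them.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Unsigned saturating add: clamp at the maximum value instead of wrapping.
inline uint8_t uadd_sat(uint8_t a, uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    unsigned max = std::numeric_limits<uint8_t>::max();
    return static_cast<uint8_t>(std::min(sum, max));
}

// Saturating update: buckets stop at 255 rather than overflowing.
void histogram_uadd_sat(std::vector<uint8_t> &buckets,
                        const std::vector<uint32_t> &indices,
                        uint8_t inc) {
    for (std::size_t i = 0; i < indices.size(); ++i)
        buckets[indices[i]] = uadd_sat(buckets[indices[i]], inc);
}

// Unsigned-max update: each touched bucket keeps the larger value.
void histogram_umax(std::vector<uint32_t> &buckets,
                    const std::vector<uint32_t> &indices,
                    uint32_t value) {
    for (std::size_t i = 0; i < indices.size(); ++i)
        buckets[indices[i]] = std::max(buckets[indices[i]], value);
}

// Variant update: the amount added differs per iteration.
void histogram_add_variant(std::vector<uint32_t> &buckets,
                           const std::vector<uint32_t> &indices,
                           const std::vector<uint32_t> &weights) {
    for (std::size_t i = 0; i < indices.size(); ++i)
        buckets[indices[i]] += weights[i];
}
```

A umin variant would look the same as `histogram_umax` with `std::min` in place of `std::max`.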

Example

Instead of just performing an update, the intrinsic would return the previous value stored in the bucket. That would allow a loop like the following to be vectorized:

for (int i = 0; i < N; ++i) {
    int val = buckets[indices[i]];  // bucket value before this iteration's update
    buckets[indices[i]] = val + 1;
    out[i] = val;                   // record the pre-update value
}

Here, out[i] captures the histogram value before the update in iteration i. For example, if the buckets start at zero, out[i] is the number of earlier elements that mapped to the same bucket.
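A self-contained version of this loop makes the expected results easy to check by hand. This is a scalar reference sketch only; the name `histogram_fetch_add` and the container types are assumptions for illustration, not part of the proposal.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: increment each indexed bucket and return, per iteration,
// the value the bucket held just before that increment.
std::vector<uint32_t> histogram_fetch_add(std::vector<uint32_t> &buckets,
                                          const std::vector<uint32_t> &indices) {
    std::vector<uint32_t> out(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        uint32_t val = buckets[indices[i]];  // previous bucket value
        buckets[indices[i]] = val + 1;
        out[i] = val;
    }
    return out;
}
```

With three zero-initialized buckets and indices {2, 0, 2, 2, 0}, the returned values are {0, 0, 1, 2, 1} and the buckets end as {2, 0, 3}. The repeated index 2 in neighbouring iterations is the interesting case: a vectorized version would need later lanes to observe the updates made by earlier lanes that hit the same bucket in order to reproduce these values.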


Feedback

I’d love to hear your thoughts on these proposed changes. Are there additional use cases or concerns I should consider?

Thanks,
Ron Dahan


Pull request for the first bullet: "Expanding the Histogram Intrinsic" by RonDahan101 (llvm/llvm-project, Pull Request #127399)

Pull request for the second bullet: "Add target hook for automatic histogram vectorization" by RonDahan101 (llvm/llvm-project, Pull Request #128414)