DenseElementsAttr + i1 element type

Hi all,

In an attempt to clean up the DenseElementsAttr API, I landed this patch which removes the “isSplat” argument from the DenseElementsAttr::getFromRawBuffer(ShapedType type, ArrayRef<char> rawBuffer, bool isSplat) API and the corresponding lower-level getRaw methods. This made sense to me because getRaw can infer whether the value is a splat from the number of bytes in the buffer being passed in.

This all seemed like a great simplification, and it removed a footgun that forced code to be pervasively written with patterns like this (otherwise it breaks for single-element tensor values):

  int64_t numElements = data.size() / dataEltSize;
  assert(numElements == 1 || numElements == type.getNumElements());
  return getRaw(type, data, /*isSplat=*/numElements == 1);
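
After the patch, the same call site collapses to something like this (the exact new signature is assumed here; the point is that splatness is inferred from the buffer size):

  // Sketch (assumed signature): splatness is inferred from the buffer size
  // instead of being passed in. A buffer of dataEltSize bytes is treated as a
  // splat of one value; a buffer of dataEltSize * type.getNumElements() bytes
  // holds every element.
  return DenseElementsAttr::getFromRawBuffer(type, data);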

This was all great, but it turns out that this change isn’t correct. The problem is that we encode ‘i1’ (but not i2 or other small integers!) tensor element values densely, using a single bit each. This is very clever, but it adds a bunch of complexity: for example, in this API we can’t tell whether a single-byte data value of “1” is a splat or not when the shaped type is tensor<2xi1>: it could be either a splat of 1/true or a {0, 1} initializer. This infects lots of other places as well [1] [2] [3] [4] [etc], which have to handle it as a special case. While most of this is internal implementation complexity, the “get” API exposes it to users of the MLIR APIs - this is how I came to care about it, when I ran into the footgun.
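
To make the ambiguity concrete, here is a small standalone sketch (plain C++, not the actual MLIR implementation; the packing details are simplified) of why inferring splatness from the byte count works for byte-or-wider element types but breaks down once i1 values are bit-packed:

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // For byte-or-wider element types, the buffer is either one element long
  // (a splat) or numElements long (full data), so splatness is unambiguous.
  bool isSplatByteWidth(size_t bufferBytes, size_t eltBytes, int64_t numElements) {
    assert(bufferBytes == eltBytes ||
           bufferBytes == eltBytes * static_cast<size_t>(numElements));
    return bufferBytes == eltBytes;
  }

  int main() {
    // tensor<2xi8>: 1 byte must be a splat, 2 bytes must be full data.
    assert(isSplatByteWidth(/*bufferBytes=*/1, /*eltBytes=*/1, /*numElements=*/2));
    assert(!isSplatByteWidth(/*bufferBytes=*/2, /*eltBytes=*/1, /*numElements=*/2));

    // tensor<2xi1> with bit-packing: a splat of `true` and a two-element
    // initializer with one true and one false value both occupy exactly one
    // byte, so neither the byte count nor the byte value can tell them apart.
    std::vector<char> splatOfTrue = {0x1}; // one meaningful bit
    std::vector<char> mixedInit = {0x1};   // two meaningful bits
    assert(splatOfTrue == mixedInit);      // identical raw buffers
    return 0;
  }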

I have a choice here: I can either revert that patch where I got rid of the argument and we can live with this footgun + complexity, or I can move us to a model where i1 values take up a byte each, just like we do for i2 and other values. The only downside I see to the latter approach is that i1 dense element attrs will take up more space, but I don’t think that is particularly important.

WDYT?

-Chris

This has fallen out of my cache: was this due to std::vector interfacing? Or was it for compression, since operations on i1 still work easily on this form?

And this is an internal-only API I believe - how pervasive is it? (In general I’m pro removing footguns, but raw APIs …)

This isn’t an internal-only API; it is the obvious generic API for loading up a DenseElementsAttr.

i1 compression was fairly significant for many use cases at the time, which is why it was added.

In this case, are you storing the raw “compressed” data or the full data? If it’s the “full” data, we could just have a version of getRaw for things that are known not to be splats, aside from the degenerate single-element case.

This is because i2 integers were never a common use case, so we didn’t care to optimize for that.

IIRC it did technically start out as an internal-only API, and then (I suppose like all things) it got repurposed as a public API along the way.

– River

Was it really? Or was it analogous to std::vector<bool> - a premature optimization that seemed like a good idea at the time, but broke abstractions and caused more problems than it solved? Recall that there have been repeated calls to deprecate/remove std::vector<bool> from the C++ standard. I see this as analogous.

The code in question dates from 2019 and earlier, which is before MLIR was being used for anything in production. Furthermore, it doesn’t make sense: nothing significant in ML uses large boolean weights. There are exotic low-bitwidth quantized models, but if this were supposed to help with those, then it would also apply to i2 and i4, which are much more important than the i1 case.

I just don’t believe this decision was made on an empirical “we need this” basis (but I would love to see some evidence). It adds a bunch of complexity to the codebase, and I think we should just rip it out to make everything simpler, more consistent, and more efficient for the normal case.

I care more about the functionality being right and not having a footgun in the codebase. If we must keep this behavior, then we could also take another approach: eliminate the “automatically splat if needed” behavior and go to a clearer model with “get()”, which never splats, and “getSplatted()”, which always splats. That makes the intention clearer in client code. Clients can write the “if” statement on their side if they want both conditionally.
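
As a rough illustration of that split (hypothetical signatures, mirroring the proposal above rather than an existing MLIR API), a client that wants both behaviors would spell the choice out:

  // Hypothetical API split, per the proposal above:
  //   get(type, data)         - data must cover every element; never splats.
  //   getSplatted(type, data) - data holds exactly one element; always splats.
  if (numElements == 1)
    return DenseElementsAttr::getSplatted(type, data);
  return DenseElementsAttr::get(type, data);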

-Chris

It was, but definitely long enough ago that I don’t remember everything.

If it really isn’t necessary, and removing it really does simplify things, I’m fine with dropping it.

Ok, well what is your feeling here? There are three reasonable paths I see:

  1. Just revert the patches.
  2. Remove dense<bool> compression
  3. Add get() vs getSplat() APIs

I tend to prefer #2 because it will remove a bunch of complexity, but not strongly. I think the more important question is “what do we want the MLIR API to look like?”

-Chris

If the complexity is not paying for itself, I’d go for 2.

Ok, awesome. Does anyone else have any objections to removing it?

Isn’t it true that splatness can’t be inferred only for sub-byte element types? Instead of killing compression, can’t we go with (3) or (1), with a doc comment/understanding that splatness can’t be inferred for sub-byte element types? I believe DenseElementsAttr API users don’t use getRaw (I never have) and only “internal” developers do, but they would read the comment justifying the flag/getSplat.

Also, won’t the footgun always lead to an assertion (when the splat flag is inconsistent with the type and data, for byte-or-wider element types)?
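
For illustration, here is a simplified sketch of such a consistency check (hypothetical, not the actual MLIR assertion) - it catches a wrong flag for byte-or-wider types, but not necessarily for bit-packed i1:

  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // Hypothetical check: with a byte-or-wider element type, the splat and
  // non-splat buffer sizes differ for any shape with more than one element,
  // so an inconsistent flag trips the assertion.
  void checkSplatFlag(size_t bufferBytes, size_t eltBytes, int64_t numElements,
                      bool isSplat) {
    size_t expected =
        isSplat ? eltBytes : eltBytes * static_cast<size_t>(numElements);
    assert(bufferBytes == expected && "splat flag inconsistent with buffer size");
    (void)bufferBytes;
    (void)expected;
  }

  int main() {
    checkSplatFlag(/*bufferBytes=*/4, /*eltBytes=*/4, /*numElements=*/8, true);   // ok
    checkSplatFlag(/*bufferBytes=*/32, /*eltBytes=*/4, /*numElements=*/8, false); // ok
    // checkSplatFlag(32, 4, 8, true) would assert in a debug build.
    // For bit-packed i1, both interpretations of a small tensor round up to the
    // same single byte, so a wrong flag is not necessarily detectable this way.
    return 0;
  }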

Yes, as I mentioned, 1 and 3 are definitely “valid” paths. Are you arguing to keep the complexity? Why? What is the use case?

-Chris

First off, thank you. I’ve run into this footgun but didn’t take the time to untangle it. My general opinion is that i1 is not special, and at this level, if we are caring about that, we are probably doing it wrong. I’m generally an advocate of unweirding i1 in the handful of places I’ve found it special-cased over the years (it becomes really awkward when, as you mention, we don’t apply the same rule to i2/i4, etc.).

I vote for either promoting all sub-byte aligned types to 8 bits or having a much more principled stance with respect to sub-byte alignment. I don’t see us taking the time to handle the latter at this level any time soon and would rather remove the special case.

I was just thinking that the 8x reduction in storage was appealing; the corresponding reduction would only be 4x for i2 and 2x for i4. But I haven’t myself run into real use cases.

We have real use cases for both, and right now I would say they have higher value for what we do than the i1 specialization (complete guess).

I think we may want to have a better packed form for such things down the road as an optimization, but you kind of need to be all in on that and do it right. I’d be open to a proposal for packing these attributes in memory, but if it isn’t comprehensive, I’d rather we not have it at all and bias towards consistency.