MLIRContext doesn't free DenseElementsAttributes and bloats memory

We need some better examples of good uses of OpaqueElementsAttr and dialect constant materialization hooks for this. That seems like in the direction of being able to construct folding hook like things where transients don’t live forever. It would be nice if the act of doing a folding conversion could work on interfaces of some kind that could be backed by transient allocations. I’m mainly thinking of how I would write a pass that did a folding like transform and what the interface on the op should be for that.

I still want the ops to implement something that helps me fold them in a generic way. I just want to be able to build my own thing with a cost model and non forever storage.

Yes, but my point is that these simple cases can be done by a specific pass dedicated to such things. It isn’t a good thing for everything to be in canonicalization/fold, only universally profitable things should be. It isn’t that hard to write a pass in MLIR!

Again, this can be handled by a pass specific to this. The design I’m suggesting decouples the folding logic for an op (which you can define universally, specifically to solve things like the above modularly) from the “do I want this specific constant to be globally foldable”.

This is the key thing you need to achieve here, allowing you to implement the constant folding for operations, and separate it from the cost model.

-Chris

1 Like

I’ve had a TODO on my list I think for ~2 years now to redesign OpaqueElementsAttr into something usable, but never had a need strong enough to push it. I’d love if we took the change to turn it into something useful.

– River

We encountered this issue in onnx-mlir where we have a single pass for constant folding.

Our solution is using temporary buffers for DenseElementsAttr, and doing folding over the buffers. DenseElementsAttr is loaded into a buffer lazily when the first time we reach it, and a buffer id is stored as another attribute in the constant op for tracing. At the end of the pass, we construct DenseElementsAttr from the buffers.

With ResNet152 that has many conv-batchnorm, the peak memory footprint was reduced from 7 GB to <1GB.

as another attribute in the constant op for tracing. At the end of the pass, we
construct DenseElementsAttr from the buffers.

I’m not sure how this solves the problem. If the final DenseElementAttr is again a very large constant, you are stuck with it forever. If a future transformation makes it “dead” or if you had to run the constant folder multiple times and the incoming IR to the folder already had such large constants, you are again not free from such dead dense elements in the context. Is the remaining 1 GB of the memory footprint completely free of such large dead attributes?

I see - in that case, uniqui’ing would be necessary to allow consistently inexpensive equality checks and the other approach of using a different arena for these and a ref count would be better. To avoid “size-based” behavior for the same attribute, please see below.

Both of the above issues could be addressed by simply having a different derivative - a DenseHeapElementsAttribute (i.e., instead of a bit, a different attribute or if needed later a trait)?, and then as you/others pointed out, have separate passes that look at and deal with constant ops with such attributes. I’m not sure how duplication of “folding hook code” could be avoided because the regular folding hooks won’t fold these and these additional passes would have to use canonicalization patterns while having similar logic but with a different dense elements attribute kind.

Reg. GC vs destruction with op or “no refs remaining”, my assumption was that getting rid of such dead attributes as soon as possible was what was desired since this is really for very large constants. So one would like to destroy these as soon as the op is gone (if not uniqui’ing) and if uniqu’ing when no refs remain.

One related concern is that there is currently no flow or test case upstream that exercises this and that could be used to test a common solution upstream. One may not be able to check in 100s of MB of constant data and a separate repo will have to be used.

Yes, the final DenseElementsAttrs are still there, and the remaining 1GB includes them. If we call the pass multiple times, it would introduce more immortal attributes. We can only avoid intermediate attributes during a single pass call. In our pipeline, we only call the pass once.

Another concern about using DenseElementsAttr for large constants is that converting it to LLVM IR constants takes much time (llvm-project/ModuleTranslation.cpp at main · llvm/llvm-project · GitHub) since elements are loaded one by one. I am not sure this is easy to overcome or not. Just my observation.

I’d hope that what we can eventually factor to land upstream is mostly the mechanisms/interfaces. When we get there, some kind of test to make sure things are being deallocated or that the context’s allocator has not been bumped. Hopefully we can come up with something more clever to test the mechanism than hundreds of MB of dummy data :slight_smile:

I think this needs to be something other than an Attribute. Attributes mean something to the system and by definition are uniqued, pointer comparable, backed by forever storage, etc. If there was a new interface for this new kind of folder, it should probably take a new type that can contain/be downcastable to either a plain Attribute or some new OutOfLineElements. If we got the types correct, we could get some sharing between normal cost-neutral folders and the special things we don’t have yet (i.e. whatever mechanism emerges to allow us to not have all of the switchiness in the ONNX example).

It would further be nice if there was a common interface (say ElementsAccessor) for reading from either an ElementsAttr or OutOfLineElements.

I’m not exactly sure what the types look like yet, but it would be nice to have some convergence with folding hooks so that it is a) possible for custom “folding” passes to interop with normal cost-neutral folding hooks (i.e. I don’t want to re-implement AddOp/et-al result computers just because I want to control the folding policy), and b) have some kind of op interface that allows an op to generically compute its results from operands but dose not automatically imply folding (maybe cost-neutral default folding could always use this).

Mainly thinking of how to write such passes as presented for ONNX in a better way that isn’t completely disjoint.

[quote=“bondhugula, post:26, topic:3691”]

I don’t see how this works - Attributes are nested and are not refcounted. Allowing any attibute to be “garbage collected” requires all aggregate attributes to have the same support. Such a model would add a ton of complexity and design overhead here.

Further, I don’t see why it would be better than the model I proposed upthread.

-Chris

Further, I don’t see why it would be better than the model I proposed upthread.

Sorry I didn’t understand where you intended to store the constant via an OpaqueElements attribute or other indirection. As pointed out earlier, having it on the disk will impact performance. And if it’s in memory, how is it managed and how does it disappear when not needed?

I’m not proposing a single imposed storage model. The key observation here is that weight data is very large, we want to enable a large number of tools and usecases, and therefore there is no one “right storage model”. It may be reasonable to keep weights in compiler memory for small inference models, they may be best left on disk when they are large and the compiler has no need to poke at them, they may not be present at all for an untrained model, and they may be so large that they need to be sharded across a cluster or stored in distributed virtual file system or database. In addition to the storage location, there is also the storage format / layout etc.

The solution to this is simple: MLIR has ops, use an op for this. This gives an extensibility answer that keeps the non-storage related ops orthogonal. If someone wants to build (e.g.) an quantization tool that needs to rewrite weights and wants to be generic over storage formats, then they should use operation interfaces to provide that abstraction.

-Chris

Sending an update here.

As of D109190, ElementsAttr is now an attribute interface. Given that, users can define attributes that store data in whichever way they want (in a file, out-of-context pointer, random number generator, whatever) and inject it into operations that support ElementsAttr (kind of like OpaqueElementsAttr, but not bad). It also supports opaque element access to a certain degree.

– River

2 Likes