Sparse tensors can offer qualitative performance differences to some algorithms, including many in machine learning. MLIR has growing support for them, and there are at least a couple of competing versions of sparse support for PyTorch. I’d like to pitch a path towards sparse tensor support in torch-mlir.
Sparse tensors at the PyTorch level
PyTorch itself provides a beta API for sparse tensors which supports a handful of linear algebra operations. Currently, that API is locked to two data layouts: COO (for arbitrary-dimensional sparse tensors) and CSR (for 2-dimensional ones). Internally, these are implemented partly via the layout attribute and partly via subclassing the tensor implementation (COO, CSR). I’ll reiterate that this is a beta API and subject to change.
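To make the two layouts concrete, here is a minimal pure-Python sketch (deliberately not using the PyTorch API) of how the same 3x4 matrix is stored in COO and CSR form. The helpers `to_coo` and `to_csr` are hypothetical illustrations; PyTorch’s own sparse tensors keep the analogous arrays in index and value tensors (for CSR, exposed through `crow_indices()`, `col_indices()` and `values()`).

```python
from typing import List, Tuple

Matrix = List[List[float]]

def to_coo(dense: Matrix) -> Tuple[List[List[int]], List[float]]:
    """COO: one index list per dimension, plus the nonzero values."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return [rows, cols], vals

def to_csr(dense: Matrix) -> Tuple[List[int], List[int], List[float]]:
    """CSR: row i's entries live at positions crow[i]:crow[i + 1]."""
    crow, col, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col.append(j)
                vals.append(v)
        crow.append(len(vals))  # running count of nonzeros closes each row
    return crow, col, vals

dense = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]
print(to_coo(dense))  # ([[0, 1, 1], [1, 0, 3]], [2, 1, 3])
print(to_csr(dense))  # ([0, 1, 3, 3], [1, 0, 3], [2, 1, 3])
```

COO stores a full coordinate for every nonzero, which generalizes to any number of dimensions; CSR compresses the row coordinates into a prefix-count array, which is why it is specific to the 2-D case.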
Alternatively, sparse tensors can be implemented at the user level, for instance by holding dense native tensors for indices and values in a user-defined class and then supplying custom operators for sparse math. torch_sparse is one such example. While not necessarily that relevant, I’ll point out that this was also more or less the approach that the COMET team took in their research, except at the MLIR level.
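A minimal sketch of the user-level approach, assuming a hypothetical `UserCOOMatrix` class (this is not torch_sparse’s actual API): the sparse object is only a container of dense index and value arrays (plain Python lists stand in for dense tensors here), and the sparse math lives in a user-supplied operator rather than in the framework.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UserCOOMatrix:
    # Hypothetical user-defined sparse type: dense arrays for indices and values.
    shape: Tuple[int, int]
    rows: List[int]
    cols: List[int]
    values: List[float]

    def matvec(self, x: List[float]) -> List[float]:
        """Custom sparse operator: y = A @ x, visiting only stored entries."""
        if len(x) != self.shape[1]:
            raise ValueError("dimension mismatch")
        y = [0.0] * self.shape[0]
        for r, c, v in zip(self.rows, self.cols, self.values):
            y[r] += v * x[c]
        return y

a = UserCOOMatrix(shape=(3, 4), rows=[0, 1, 1], cols=[1, 0, 3], values=[2.0, 1.0, 3.0])
print(a.matvec([1.0, 1.0, 1.0, 1.0]))  # [2.0, 4.0, 0.0]
```

The framework itself only ever sees the dense index and value arrays; the knowledge that they form a sparse matrix lives entirely in the user’s class and its operators.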
Finally, it’s worth noting in passing that there are sometimes direct Python interfaces to sparse vendor libraries, but these are a bit outside the scope here, mostly because they sidestep everything we’re building (and thus neither share the same concerns nor receive many of the benefits of the compiler work in the MLIR ecosystem).
Sparse tensors at the MLIR level
The sparse_tensor dialect is MLIR’s answer to first-class support for sparse operations. It’s important to highlight that the dialect itself doesn’t actually provide a distinct sparse tensor type: sparse tensors are defined as attributes on the existing MLIR tensor type. The dialect also provides conversion and utility operations, and a lot of work also exists in related dialects (e.g., linalg) to support sparse-attributed tensors properly.
Bridging the gap
I’d like to propose opening up sparse tensor support in torch-mlir by extending the existing torch.tensor type with an attribute analogous to that of the sparse_tensor dialect. Concretely, this would mean:
- AnyTorchTensorType would be extended with an optional encoding attribute (better names happily accepted). Normal tensors would be able to omit it, leaving all existing behavior unchanged.
- For conversion to the torch dialect, the importer would need some work to extend ivalue translation to sparse tensors, and I believe tensor literal construction will need to be extended as well.
- Lowering to linalg+tensors would involve attaching an appropriate sparse tensor attribute and then constructing tensors with it. Because MLIR’s existing sparse support is transparent, math operations shouldn’t need many changes, though explicit conversion calls may need to be inserted.
- Once sparse layouts are supported, we can expand the set of torch-mlir-supported ops to include sparse-specific ones using the normal path.
- Backends that do not support sparse tensors would probably simply reject them, either via a legalization pass or by throwing an exception during lowering.
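The core of the proposal can be modeled in a few lines. This is a hypothetical Python sketch, not torch-mlir code: `TorchTensorType`, `SparseEncoding`, `matmul_result_shape` and `legalize_for_dense_backend` are illustrative names only. It shows the three properties argued for above: a type without an encoding is exactly today’s dense type, shape logic never needs to inspect the encoding, and a dense-only backend can reject sparse types in one place.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

@dataclass(frozen=True)
class SparseEncoding:
    # Hypothetical stand-in for the proposed encoding attribute:
    # one storage format per dimension, e.g. ("dense", "compressed") for CSR.
    level_formats: Tuple[str, ...]

@dataclass(frozen=True)
class TorchTensorType:
    # Hypothetical model of a torch.tensor type. Omitting `encoding` (None)
    # describes an ordinary dense tensor, so existing types are unchanged.
    shape: Tuple[int, ...]
    dtype: str
    encoding: Optional[SparseEncoding] = None

    @property
    def is_sparse(self) -> bool:
        return self.encoding is not None

def matmul_result_shape(a: TorchTensorType, b: TorchTensorType) -> Tuple[int, int]:
    # Shape logic is layout-agnostic: it reads `shape` and never `encoding`.
    (m, k1), (k2, n) = a.shape, b.shape
    if k1 != k2:
        raise ValueError("inner dimensions differ")
    return (m, n)

def legalize_for_dense_backend(types: Iterable[TorchTensorType]) -> None:
    # A backend without sparse support rejects any encoded tensor up front.
    for t in types:
        if t.is_sparse:
            raise NotImplementedError(f"sparse tensor not supported: {t}")

dense = TorchTensorType((3, 4), "f32")
csr = TorchTensorType((3, 4), "f32", SparseEncoding(("dense", "compressed")))
legalize_for_dense_backend([dense])  # passes: nothing sparse present
print(matmul_result_shape(csr, TorchTensorType((4, 5), "f32")))  # (3, 5)
```

Making the attribute optional with a default of "absent" is what keeps the change transparent: every existing dense type compares equal to its pre-proposal form.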
I have some pieces of this approach under development in a personal branch. Because the proposed approach is (or should be) transparent to existing dense tensors, I’d prefer that some of this work happens in the main branch rather than a fork. This would allow (a) refactoring of some existing code that makes overly strict assumptions or relies on dense-only tensor pieces, and (b) clear, continuous evidence that the additional encoding attributes are indeed transparent to existing dense code.
Alternatives: implied layout via analysis
One other mechanism we could conceivably implement is to rely on conversion routines and dataflow analysis to reconstruct the implied layout of a tensor. This has the advantage of a smaller footprint in the PyTorch-to-torch-dialect conversion surface, since no changes to the tensor type or TorchScript translation would be needed. On the downside, we’d need a new analysis pass, and it would need to be integrated into each dialect lowering conversion.
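For comparison, here is a rough sketch of what such an analysis could look like over straight-line code. Everything in it is hypothetical: the op names, the assumption that function arguments arrive dense, and the split into "conversion" and "layout-preserving" rules are illustrative only.

```python
from typing import Dict, List, Tuple

# Each op: (result name, op kind, operand names). Straight-line code only.
Op = Tuple[str, str, List[str]]

# Hypothetical rules: conversion ops fix the layout outright...
CONVERSIONS = {"to_sparse_coo": "coo", "to_sparse_csr": "csr", "to_dense": "dense"}
# ...while zero-preserving elementwise ops inherit it from their first operand.
PRESERVING = {"neg", "relu", "mul_scalar"}

def infer_layouts(ops: List[Op]) -> Dict[str, str]:
    """Forward dataflow pass that reconstructs each value's implied layout."""
    layout: Dict[str, str] = {}
    for result, kind, operands in ops:
        if kind == "arg":
            layout[result] = "dense"  # assumption: arguments arrive dense
        elif kind in CONVERSIONS:
            layout[result] = CONVERSIONS[kind]
        elif kind in PRESERVING:
            layout[result] = layout[operands[0]]
        else:
            layout[result] = "unknown"  # every other op kind needs its own rule
    return layout

program = [
    ("%0", "arg", []),
    ("%1", "to_sparse_csr", ["%0"]),
    ("%2", "neg", ["%1"]),
    ("%3", "to_dense", ["%2"]),
]
print(infer_layouts(program))
```

Even this toy version needs a rule per op kind, and the layouts it recovers were already known to PyTorch when the program was captured.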
I think the biggest downside to this approach is simply that it needlessly throws away information only to recreate it later. PyTorch’s approach to sparse tensors is through attributed types. MLIR’s approach to sparse tensors is through attributed types. It seems unwieldy and labor-intensive for torch-mlir to chuck those attributes out and then attempt to recreate them from scratch.
Alternatives: parallel datatype
We could always create an entirely separate type for sparse tensors. This would give even more isolation from the existing code, but at the cost of an enormous amount of duplicated work. For instance, sparse tensors (and other “tensor-likes” discussed later) need all the same shape inference machinery that normal tensors do. The data layout doesn’t change the mathematical object. This would mean that whenever a change or fix goes in for dense tensors, a second parallel fix would need to be applied to the sparse type. Oh, and all operations would now need to be explicitly multiply-defined on both types. This approach seems infeasible to me, but I’m willing to be corrected.
Exact vs. analogous attribute use
The sparse_tensor dialect has its own semantics defined in the MLIR builtin attributes. PyTorch has a slightly different set of semantics (and may change further). This isn’t a show-stopper, but it does raise the question of whether we should directly use the sparse encoding attribute at the torch-mlir level. I’ve not seen any concrete reason not to, but having our own feels like the more defensive approach, even if it just directly wraps the underlying one. That way, if something changes later, we don’t have to back out a bunch of assumptions about identity. The downside is obviously some additional work and code to support.
Interaction with other layout options
PyTorch has support for regular sparse tensors via strided layout. MLIR’s sparse_tensor dialect doesn’t, but there’s been conversation on it. There’s no reason the proposed attribute couldn’t be extended to handle it, but it would require some thought as to what it would be lowered to.
A discussion on “tensor-likes”
This is a bit off-topic, but since it hits on some of the topics in this proposal, I thought I’d bring it up here anyways.
PyTorch tensors are not particularly extensible. There is no current or planned support for subclassing and inheritance in TorchScript, and tensors are given special treatment and handling in much of PyTorch. For some vendors and users, it’s desirable to create an object that behaves a lot like a tensor but, for the reasons above, isn’t a tensor. I think torch-mlir is actually in a unique position to enable that.
PyTorch provides a mechanism to extend its functionality via custom classes. These end up being represented as opaque managed pointers, which we have some support for in torch-mlir. There’s nothing that prevents us from generating real tensors from these objects. That would open up a whole new path for developers to achieve tensor pseudo-inheritance by crafting an opaque custom class and providing a torch translation.
For my personal agenda, the use case is fairly obvious: providing a tensor-like that supports arbitrary TACO-style sparse encoding would expose the full power of the MLIR sparse tensor infrastructure to PyTorch users (instead of just the PyTorch beta COO and CSR layouts).
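TACO-style encodings describe storage one dimension (level) at a time, each with a format such as dense or compressed, so COO and CSR are just two points in a larger space. The sketch below is a hypothetical illustration of the two-level case: `build_storage` is an illustrative name, the `pos`/`crd` array names follow TACO’s positions/coordinates terminology, and only a compressed inner level is handled. With a dense outer level it produces CSR; with a compressed outer level it produces DCSR, which neither PyTorch beta layout covers.

```python
from typing import Dict, List

Matrix = List[List[float]]

def build_storage(dense: Matrix, outer: str) -> Dict[str, List]:
    """Two-level storage whose inner (column) level is always compressed.

    outer="dense"      -> CSR:  every row gets a segment in pos1.
    outer="compressed" -> DCSR: only nonempty rows are stored, listed in crd0.
    """
    if outer not in ("dense", "compressed"):
        raise ValueError(f"unsupported outer level format: {outer}")
    crd0: List[int] = []
    pos1, crd1, values = [0], [], []
    for i, row in enumerate(dense):
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        if outer == "compressed":
            if not nz:
                continue  # a compressed level simply omits empty rows
            crd0.append(i)
        for j, v in nz:
            crd1.append(j)
            values.append(v)
        pos1.append(len(values))  # end of this row's segment in crd1/values
    storage: Dict[str, List] = {"pos1": pos1, "crd1": crd1, "values": values}
    if outer == "compressed":
        storage["pos0"] = [0, len(crd0)]
        storage["crd0"] = crd0
    return storage

m = [
    [0, 2, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]
print(build_storage(m, "dense"))       # CSR
print(build_storage(m, "compressed"))  # DCSR
```

MLIR’s sparse_tensor encoding expresses the same idea as a per-dimension list of level types; the exact attribute spelling has varied between MLIR versions, so the format strings here are illustrative.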
But this might also provide something like type extensibility in torch-mlir for platform-specific tensor representations. For instance, some hardware provides elaborate compression and storage layouts, which would now have a less terrifying path to support.