DenseElementsAttr is basically deprecated as a means of storing arbitrarily sized tensor constants. It is still used in various contexts and makes sense for certain “small” constants like splats, etc. But because these attributes are uniqued in the context, they perform very badly for bulk data (I’ve seen typical compiler pipelines that use them for bulk tensor storage take on the order of hours vs. seconds when using resources). There were some threads on this a while ago, but afaik all serious users of the infra stopped using DenseElementsAttr for these purposes a few years ago. Some compilers go further and don’t even keep inlined weights as resources at all, preferring to compile against their own external storage that can be mounted at runtime (IREE does this and directly mounts safetensors, gguf, its own format, or in-memory storage).
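For reference, here is a minimal sketch (not IREE code) of attaching bulk data as a resource instead of a uniqued DenseElementsAttr; the blob key "weight_0" and the f32 element type are just placeholders:

```cpp
// Minimal sketch: store bulk tensor data as a resource blob instead of a
// uniqued DenseElementsAttr. The blob is owned by the context's resource
// manager; the attribute only carries a handle, so creating it does not pay
// the uniquing cost of the raw data.
#include "mlir/IR/AsmState.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/DialectResourceBlobManager.h"

using namespace mlir;

static Attribute makeWeightAttr(MLIRContext *ctx, ArrayRef<float> data,
                                ArrayRef<int64_t> shape) {
  auto type = RankedTensorType::get(shape, Float32Type::get(ctx));
  // Copies the data into a heap-allocated blob; "weight_0" is just a
  // placeholder key and may be uniqued/renamed by the blob manager.
  AsmResourceBlob blob = HeapAsmResourceBlob::allocateAndCopyInferAlign(data);
  return DenseResourceElementsAttr::get(type, "weight_0", std::move(blob));
}
```

In the printed IR this shows up as a `dense_resource<...>` attribute, with the bytes carried in the module’s `dialect_resources` section rather than inline in the attribute itself.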
Sadly, I think the docs for resources are a bit less robust than some other parts of the infra. Perhaps @jpienaar or @River707, who did a lot of the work, know of a better stash of docs.
No idea. I expect that someone added some trivial folders based on DenseElementsAttr and they were never removed. In our projects, we’ve had to disable all such things because the compile time is atrocious if they are ever hit.
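If you hit this, the usual workaround is to gate any element-wise folder on size. A hypothetical guard, with a made-up op and threshold, might look like:

```cpp
// Hypothetical folder guard (the op name and threshold are made up): refuse
// to materialize element-wise fold results unless the constant is a splat or
// small, so bulk tensors never go through DenseElementsAttr folding.
#include "llvm/Support/Casting.h"
#include "mlir/IR/BuiltinAttributes.h"

using namespace mlir;

// Assumed cutoff; real compilers usually expose this as a pass or flag option.
constexpr int64_t kMaxFoldElements = 256;

static bool isProfitableToFold(Attribute attr) {
  auto dense = llvm::dyn_cast_if_present<DenseElementsAttr>(attr);
  if (!dense)
    return false;
  return dense.isSplat() || dense.getNumElements() <= kMaxFoldElements;
}

// Then, inside a hypothetical MyAddOp::fold(FoldAdaptor adaptor):
//   if (!isProfitableToFold(adaptor.getLhs()) ||
//       !isProfitableToFold(adaptor.getRhs()))
//     return {};
//   // ...only evaluate the small/splat cases element-wise...
```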
IREE basically takes this approach. Here is its pass that outlines such a module, JITs it, and inlines the results at compile time. It relies on quite a few things not present in upstream MLIR:
- An optimized CPU compilation pipeline (we’ve long since passed the point where even constant folding can be done in a reasonable time without a fairly optimized host compiler).
- Globals/load/store as the canonical form for bulk tensor data.
- Module level initializers for performing load time initialization of globals.
- A compiler driver which can receive a callback for invoking itself recursively.
- Some kind of const-expr outlining (IREE uses this pass and corresponding analysis in order to hoist eligible const-expr trees into module level initializers, which the JitGlobals pass can then choose to evaluate at compile time; a simplified sketch of that kind of analysis follows this list).
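To give a flavor of the const-expr analysis mentioned in the last bullet, here is a deliberately simplified sketch (IREE’s real analysis also applies profitability heuristics and caches results instead of recursing naively):

```cpp
// A value is const-expr if it is produced by a constant, or by a pure op
// whose operands are all const-expr. Such trees are what get outlined into
// initializers and handed to something like JitGlobals for compile-time
// evaluation.
#include "llvm/ADT/STLExtras.h"
#include "mlir/IR/Matchers.h"
#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"

using namespace mlir;

static bool isConstExpr(Value value) {
  Operation *defOp = value.getDefiningOp();
  // Block arguments (function inputs, loop-carried values) are unknown at
  // compile time in this simplified model.
  if (!defOp)
    return false;
  // Plain constants are trivially const-expr.
  if (matchPattern(value, m_Constant()))
    return true;
  // Otherwise require a side-effect-free op fed only by const-expr values.
  if (!isPure(defOp))
    return false;
  return llvm::all_of(defOp->getOperands(), isConstExpr);
}
```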
There is another mode in which we use this, where it can produce a new device-specific parameter pack for cases like the one you describe. It is a relatively fiddly bit of infrastructure and fairly tied to IREE, but it works well enough as a “big hammer” that is pretty general and handles everything. We’re basically always tweaking the analysis that identifies profitable expression trees – and that is more a matter of getting the heuristics right and the result of having worked on it for a long time. The runtime integration also tends to be fiddly when it comes to the large array of data types, etc., that are always cropping up and need special handling.
I’ve long thought that a proper infrastructure like linalg should integrate a library like xtensor to provide a set of passes for doing more eager folding at compile time, especially for the large set of small/medium tensors that tend to unlock additional optimizations (i.e., shape/indexing/etc.). I have a feeling that a set of passes like that could be made relatively modular/extensible and would serve a lot of people. However, it is just so hard and time-consuming to contribute anything out of the ordinary to MLIR that we’ve never had the budget or stamina to do it. If someone ever wanted to work on something like that, at least as an optional component, it seems like it could be done.
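As a strawman of what such a pass might do internally, here is a hedged sketch that evaluates an element-wise f32 add over two small DenseElementsAttr constants with xtensor; the helper name, the f32-only assumption, and the classic xtensor include paths are all my assumptions, not anything that exists upstream:

```cpp
// Strawman only: a real pass would dispatch on dtype/op kind and push large
// results into resources rather than back into DenseElementsAttr.
#include <cstddef>
#include <vector>

#include <xtensor/xadapt.hpp>
#include <xtensor/xarray.hpp>

#include "llvm/ADT/STLExtras.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Both attributes are assumed to be f32 tensors with the same static shape.
static DenseElementsAttr foldAddF32(DenseElementsAttr lhs,
                                    DenseElementsAttr rhs) {
  auto type = cast<RankedTensorType>(lhs.getType());
  std::vector<std::size_t> shape(type.getShape().begin(),
                                 type.getShape().end());
  // Copy the element values out and wrap them in xtensor views; xtensor then
  // does the indexing/broadcast math for us.
  auto lhsVals = llvm::to_vector(lhs.getValues<float>());
  auto rhsVals = llvm::to_vector(rhs.getValues<float>());
  xt::xarray<float> result =
      xt::adapt(lhsVals.data(), lhsVals.size(), xt::no_ownership(), shape) +
      xt::adapt(rhsVals.data(), rhsVals.size(), xt::no_ownership(), shape);
  return DenseElementsAttr::get(
      type, ArrayRef<float>(result.data(), result.size()));
}
```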
I couldn’t advise on how best to go about contributing a constant JIT engine, though. Upstream MLIR is just missing so much basic infrastructure and has no culture of the integration tests needed to make such a thing any good (I can’t stress enough that making this kind of thing good is like 95% CI and having a good, comprehensive test suite – so much so that I’m not sure I would even accept patches to the project that tried to build it without that).
I’ve made some comments that might elicit debate about project structure and charters, and I kindly request that, if you want to debate that further, it be done on the proper thread. The bottom line, in my opinion, is that the project structure itself lacks a component where such a constant-jitter would fit, and it lacks the production infra for testing needed to support such a thing. For the eager consteval I mentioned, I think that if the deps/optionality were handled, we could probably find a place in the current project to build such a thing if needed. A prototype of either out of tree could be educational.
(edit: and you are, of course, welcome to use/adapt any of the implementation that IREE has – it is liberally licensed and open source)