[Survey] MLIR Project Charter and Restructuring Survey

Following up on our long discussion about MLIR project governance and charter, we decided to create a survey to understand how MLIR developers and users connect to the upstream infrastructure.

Thanks @stellaraccident @mehdi_amini @jpienaar @ftynse for the help and comments. We’re trying to collect enough information to cross-reference usage patterns and find a reasonable set of most-used dialects per type of project, etc.

The survey will be live until the end of November 2024, when we’ll start working on the answers, collecting some statistics and drawing some conclusions. We’ll present the results and anonymized answers soon after.

It is a bit long, and we appreciate your time filling it out, but it’s important that we get enough data now, so we can drive the rest of the conversation around governance and charter. We hope to get a clear enough picture to start the process, not to find the perfect balance on the first try.

If you find anything wrong with the survey, the questions, etc. either reply in this thread or ping me directly. We’ll try to fix any issues, but once people start to respond, we can change less and less without compromising the quality of the results.

Thank you very much in advance!

CC: @clattner @banach-space @River707 @Mogball @javedabsar @MaheshRavishankar @makslevental @kiranchandramohan @AaronBallman @TobiasGrosser @asb

16 Likes

Apparently the survey is in “violation of Google’s Terms of Service”, and requesting a review brings up a 404 page. Luckily, they didn’t delete the spreadsheet results, so I’ll still be able to extract information from them, just not as quickly as I had hoped. I also made copies, in case the sheets also get deleted.

This means the survey is now closed.

We got 83 responses across a wide spectrum of uses and areas. I’m hoping to get some bird’s-eye-view stats next week.

The idea is to match usage type and projects with dialects and upstream experience, and see how we can organize governance into bundles that reduce the overall time cost for those involved. This is not about code movement (yet), which should be discussed after we agree on governance.

Thank you to everyone who took the time to answer this long questionnaire!

2 Likes

That is quite unfortunate! Maybe a lesson about relying on Google’s products in the future :frowning:

Also, considering it is Thanksgiving week, I’m concerned we haven’t received feedback from everyone who intended to give it. I personally know multiple people who told me they intended to submit feedback, and I’m sure they were still working on it and haven’t been able to submit it :frowning:

3 Likes

I’m one of those who was planning to answer it over the Thanksgiving break. It would be great if we could find a way to still answer it and give a bit more time.

2 Likes

Same here, Thanksgiving got in the way; it’d indeed be great if there were a second opportunity to go over the questionnaire!

2 Likes

Because I had to do this all by hand with Sheets’ functions and plots, the quality of the results will be lower, but bear with me while I compile more results.

Meanwhile, I have some interim stats that may be interesting.

Final tally: 88 results.
Initial takeaways:

  • The usual suspects (AMD, Intel, Google, Nvidia, SiFive) top the list
  • 100% technical crowd (no “business leaders”)
  • Three major areas dominate (percentages don’t add up to 100%):
    • Tensor Compiler (72%)
    • Kernel generation (75%) (usually together with tensor compiler)
    • Front-ends (49%)
  • More than half of the people didn’t need big changes to MLIR, and no one thought it was a disaster (perhaps bias here?)
  • About 60% of the respondents do work upstream, most on dialects (another bias?)

On upstream experience, whether core or dialect, the picture is the same:

  • About half had overall positive experiences.
  • About 20% were blocked on hard technical problems (expected, from a compiler!)
  • Dialects have a higher perception of “easy problems” getting in the way, but here, too, bias on “I know my local usage” could be at play.

On upstreaming a new dialect, half say they didn’t need to, a quarter (25%) are happy with the results, and the final quarter is split among the problems they faced.

These results have detailed comments, which we should use to see what was wrong and how to fix it. But that’s for a later day. I just wanted to give an overview.

Overall, a positive picture is emerging, which is good news. It’s also clear that two sub-groups are distinct: tensor compiler + kernel lowering, and front-ends, with the other groups scattered amongst themselves. This gives us a good signal for the governance proposal on which groups should begin to take form, with the rest covered by a core group.

Will update with more data as I go.

3 Likes

I think there is a bias, but it may be going the other way :slight_smile: (from the folks asking rather than the folks answering).

Note, and this is what I meant by weighting: this data represents who answered, not the people they represent. E.g., I answered for the groups I’m involved in rather than telling every engineer working on the project to go answer. So company percentages only make sense as counts of respondents, not of projects or engineers.

So these groups don’t share dialects? Or they do, but with different groups of people?

A primary goal was identifying unused dialects. What do those results look like?

Do we have to differentiate between the Linalg cluster and the rest?

1 Like

I made announcements on two internal (AMD) chat rooms when the survey opened, telling people that if they worked on this stuff, they should take the survey. A few people asked me for advice on how to answer, and I just told them to do their best based on their own experiences. Based on feedback, I gather that the folks who directly touch this tech answered and (due to the detailed nature of the questions) few of the more passive consumers did. If we do future rounds of questions, imo, it would be good to structure them to explicitly reinforce that bias.

Getting feedback is both an art and a science. I suspect that a first run survey like this has enough bias built in that it will only provide a rough analytical outline of the status quo. But it gives some starting points to ask more questions. Also, the comment feedback is likely to be pretty invaluable for both qualitative and quantitative follow-ups.

That! :point_up:

People responded “Tensor Compiler” AND “Kernel codegen” together in most combinations, and the union amounts to about 80% of the respondents. I don’t yet have the actual percentages because Google Sheets doesn’t give me that and I haven’t yet had time to use a proper tool, but visual inspection tells me it’s significant.
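When the data moves out of Sheets, a minimal pandas sketch of that union calculation could look like this (the CSV file and column names are hypothetical placeholders for the form export, not the real ones):

```python
import pandas as pd

# Assumption: one row per respondent, with a multi-select "Project type"
# column exported as comma-separated choices.
df = pd.read_csv("mlir_survey_responses.csv")

def selected(frame, column, choice):
    """Boolean mask: did the respondent tick `choice` in a multi-select column?"""
    return frame[column].fillna("").str.contains(choice, case=False, regex=False)

tensor = selected(df, "Project type", "Tensor Compiler")
kernel = selected(df, "Project type", "Kernel")

total = len(df)
union = (tensor | kernel).sum()
both = (tensor & kernel).sum()
print(f"Tensor Compiler or Kernel: {union}/{total} ({100 * union / total:.1f}%)")
print(f"Both together:             {both}/{total} ({100 * both / total:.1f}%)")
```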

“Front-end” shows up in half of the answers, which is also significant, but it could also mean StableHLO/PyTorch, which I’ll only know once I start looking at the dialects and text fields (which I haven’t yet).

Bottom 10:

Dialect Users
vcix 0
amx 1
irdl 1
acc 2
polynomial 2
x86vector 4
arm_neon 5
mpi 5
quant 6
sparse_tensor 6

Top 10:

Dialect Users
linalg 48
vector 53
math 57
tensor 57
memref 61
llvm 69
arith 72
scf 72
cf 73
func 73

Not yet. This will take some form of PCA to get right.
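For the curious, a rough sketch of what that could look like, reusing the same hypothetical CSV export as in the earlier snippet (one row per respondent, comma-separated “Dialects” column); this has not been run on the real data:

```python
import pandas as pd
from sklearn.decomposition import PCA

# Assumption: same hypothetical export as above; "Dialects" is a
# comma-separated multi-select column.
df = pd.read_csv("mlir_survey_responses.csv")

# Respondent x dialect indicator matrix (1 = respondent uses the dialect).
onehot = df["Dialects"].fillna("").str.get_dummies(sep=", ")

pca = PCA(n_components=2)
# coords could be scatter-plotted to see whether respondent clusters separate.
coords = pca.fit_transform(onehot.values)

# Dialects with large loadings on the leading components hint at which
# clusters (e.g. a linalg bundle) actually stand apart in the data.
loadings = pd.DataFrame(pca.components_.T, index=onehot.columns,
                        columns=["pc1", "pc2"])
print(loadings.sort_values("pc1", ascending=False).head(10))
print("explained variance:", pca.explained_variance_ratio_)
```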

1 Like

Sorry, I missed this point in my reply. I totally agree with you and did not intend to make that point at all.

I asked for the company on the form precisely to do what you ask: bundle and weight by what the answer represents (single person, team, company-wide) and not just count beans. That’s why I also asked for the project name.

For instance, most AMD people work on IREE (as expected). So I can group their answers into a common theme, but I cannot count all their answers as one, because there are a lot of people there who actually work with it. Same for Google, Arm, Intel, Nvidia.

We’re not trying to compete over who has the most people, just to understand where these people are coming from, what they’re trying to do, and the common trends.

2 Likes

Because people answered with all the dialects they use, just looking at the cloud of dialects isn’t super helpful. And because some people answered for their groups while others answered for themselves, we have duplicated data. So I wrote a small Python script that uses https://networkx.org/ and looked at how connected the dialects really are.

Here’s the list of dialects in order of connectivity. This counts how many people said they use two dialects together (the edge weight), then sums all edge weights for each dialect.
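For reference, the script boils down to something like this (a simplified sketch; the file and column names are placeholders, not the actual export):

```python
import itertools
import networkx as nx
import pandas as pd

# Assumption: same hypothetical export as above; "Dialects" is a
# comma-separated multi-select column.
df = pd.read_csv("mlir_survey_responses.csv")

G = nx.Graph()
for answer in df["Dialects"].fillna(""):
    dialects = sorted({d.strip() for d in answer.split(",") if d.strip()})
    # Every pair of dialects mentioned together adds 1 to that edge's weight.
    for a, b in itertools.combinations(dialects, 2):
        weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# Connectivity of a dialect = sum of the weights of all edges touching it.
connectivity = sorted(
    ((sum(d["weight"] for _, _, d in G.edges(n, data=True)), n) for n in G.nodes),
    reverse=True,
)
print(connectivity)
```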

Most connected dialects:
[
 (799, 'scf'), (706, 'arith'), (689, 'math'), (687, 'llvm'), (656, 'memref'),
 (621, 'linalg'), (604, 'tensor'), (598, 'vector'), (544, 'cf'), (505, 'affine'),

 (428, 'index'), (408, 'gpu'), (376, 'transform'), (371, 'buffer'), (248, 'omp'),
 (171, 'amdgpu'), (137, 'nvvm'), (134, 'TOSA'), (131, 'emitc'), (125, 'shape'),

 (121, 'pdl'), (101, 'nvgpu'), (97, 'ptr'), (95, 'complex'), (89, 'rocdl'),
 (71, 'dlti'), (66, 'pdl_interp'), (58, 'ml_program'), (45, 'async'), (36, 'ub'),

 (31, 'xegpu'), (28, 'mesh'), (21, 'sparse'), (15, 'mpi'), (10, 'x86vector'),
 (3, 'poly'), (1, 'irdl'), (0, 'vcix')]

Of the top 10, the only one that isn’t a generic dialect (used by multiple different use-cases) is linalg.

This is how each of those is connected to other dialects, sorted by weight:

Top 10 connectivity:
 * 'scf': func llvm memref math tensor cf vector linalg affine gpu index buffer transform omp TOSA SPIR-V shape pdl rocdl ptr dlti pdl_interp async xegpu quant ArmSME x86vector arm_neon acc poly 
 * 'arith': scf llvm memref math cf tensor vector linalg gpu index transform omp TOSA SPIR-V pdl shape complex ptr dlti pdl_interp async xegpu mesh ArmSME arm_neon acc poly 
 * 'math': scf arith func llvm memref vector cf linalg affine gpu buffer index transform omp TOSA complex nvvm ptr nvgpu ml_program pdl_interp dlti async quant ArmSME sparse arm_neon acc poly 
 * 'llvm': scf arith memref math vector cf tensor affine linalg gpu index transform buffer omp nvvm nvgpu pdl TOSA ptr dlti async ArmSME sparse arm_neon x86vector acc amx 
 * 'memref': scf arith func llvm tensor math linalg vector cf gpu index transform omp SPIR-V pdl nvgpu nvvm ptr dlti xegpu quant mpi ArmSME arm_neon poly 
 * 'linalg': memref scf arith tensor func math llvm vector affine gpu buffer cf transform index TOSA omp pdl nvgpu nvvm ml_program dlti ptr xegpu ArmSME quant arm_neon ub poly 
 * 'tensor': scf arith memref func math vector linalg cf gpu transform index omp SPIR-V TOSA shape pdl nvvm complex ptr async xegpu quant sparse ArmSME mpi arm_neon amx 
 * 'vector': scf arith llvm func memref tensor cf linalg affine index buffer transform omp SPIR-V complex pdl nvvm ptr pdl_interp async xegpu quant sparse ArmSME mpi arm_neon amx 
 * 'cf': func scf llvm math memref vector affine linalg index buffer gpu transform omp complex pdl nvvm ptr pdl_interp dlti async ub quant ArmSME arm_neon acc poly 
 * 'affine': scf func memref llvm math tensor vector linalg gpu buffer transform index omp amdgpu pdl nvvm pdl_interp complex ptr quant ArmSME ub arm_neon acc 

The linalg weights are:

Stats for dialect 'linalg':
[
 (46, ('linalg', 'memref')), (45, ('linalg', 'scf')),
 (44, ('arith', 'linalg')), (42, ('linalg', 'tensor')),
 (41, ('func', 'linalg')), (39, ('linalg', 'math')),
 (38, ('linalg', 'llvm')), (35, ('linalg', 'vector')),
 (33, ('affine', 'linalg')), (32, ('gpu', 'linalg')),
 (31, ('buffer', 'linalg')), (30, ('cf', 'linalg')),
 (29, ('linalg', 'transform')), (25, ('linalg', 'index')),

 (18, ('linalg', 'TOSA')), (15, ('linalg', 'omp')),
 (12, ('linalg', 'pdl')), (11, ('linalg', 'nvgpu')),
 (10, ('linalg', 'nvvm')), (9, ('linalg', 'ml_program')),
 (8, ('linalg', 'dlti')), (7, ('linalg', 'ptr')),
 (6, ('linalg', 'xegpu')), (5, ('linalg', 'ArmSME')),
 (4, ('linalg', 'quant')), (3, ('linalg', 'arm_neon')),
 (2, ('linalg', 'ub')), (1, ('linalg', 'poly'))]

Note: the order of the dialects in each tuple is an artifact of their alphabetical order in the list I ran on.

There’s a bulge there around the transform dialect onwards. I’d call this the “linalg bundle”.

Some of those dialects we can decorrelate, because they’re obviously related for other reasons. For example, linalg uses func, scf, math, arith, cf, gpu and llvm for lowering and payload. It also uses affine, mostly for maps. Other usages (e.g. front-ends) do too, for similar reasons.

This basically leaves memref, tensor, vector, bufferization and transform as strongly correlated. index, omp and pdl are weaker: still in the same area, but also used elsewhere. The rest is specific to individual uses, it seems.
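As an illustration of that decorrelation step, reusing the graph G from the sketch above (the set of “generic” dialects below is my manual pick based on the reasoning above, not something derived from the data):

```python
# Dialects nearly every pipeline touches for lowering/payload/maps, so their
# edges to linalg say little about a dedicated bundle.
GENERIC = {"func", "scf", "math", "arith", "cf", "gpu", "llvm", "affine"}

linalg_bundle = sorted(
    ((data["weight"], other)
     for other, data in G["linalg"].items()
     if other not in GENERIC),
    reverse=True,
)
# The strongest remaining neighbours (memref, tensor, vector, bufferization,
# transform, ...) are the candidate "linalg bundle"; index/omp/pdl trail off.
print(linalg_bundle)
```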

This is basically what we have been arguing should be the “Tensor Compiler” sub-group, with a strong charter to design tensor compilers.

I am anonymizing the data I used to produce those results and will attach it to this thread soon. But other correlations are needed for kernel codegen (Triton et al.) and front-ends (CIL, FIR), which are the other two largest sub-communities.

3 Likes

Here’s a snapshot of the script I used (now modified) and the anonymized data for dialects, project type and interaction type:

I don’t claim correctness, and will accept PRs for corrections and new features. I’ll also try to add more anonymous results in there, as time permits.

A bit more data on grouped stats. Again, this is flawed, since some people responded for a whole team, but on visual inspection this does not change the overall picture, as those responses are part of the signal, not the noise.

Tensor Compiler

Selecting all entries that selected “Tensor Compiler” in the survey (63/88), I have the following dialect list (dialects with at least 30% utilization; a sketch of this filtering follows the table):

Dialects

Dialect count %
arith 54 88.52%
cf 53 86.89%
scf 53 86.89%
tensor 52 85.25%
func 50 81.97%
memref 50 81.97%
llvm 47 77.05%
vector 45 73.77%
math 44 72.13%
linalg 42 68.85%
affine 41 67.21%
gpu 37 60.66%
transform 31 50.82%
bufferization 30 49.18%
index 28 45.90%
omp 19 31.15%

Among those, linalg and llvm stand out as the most common ingress and egress dialects, respectively.
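A minimal sketch of this kind of per-group filtering, reusing the hypothetical `df` and `selected` helper from the earlier snippet (the real numbers above were tallied by hand in Sheets):

```python
# Keep only respondents who selected "Tensor Compiler", then compute how many
# of them use each dialect and what fraction of the group that represents.
group = df[selected(df, "Project type", "Tensor Compiler")]
dialect_counts = group["Dialects"].fillna("").str.get_dummies(sep=", ").sum()

pct = (100 * dialect_counts / len(group)).sort_values(ascending=False)
print(pct[pct >= 30].round(2))  # the 30% utilization cut-off used above
```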

Ingress & Egress

To avoid bias, I looked at the data filtered by ingress and egress dialects and grouped by company/group, not individual responses. “Many” and “few” depend on how many responses were bundled, so take these with a pinch of salt.

Ingress

Visual count Dialect Groups
Most overall linalg AMD, Intel, Arm, Qualcomm, Microsoft, and many others
Varied across projects tosa AMD/IREE, Arm, Google, startups
Many in one project torch Mainly AMD/IREE, Arm
Few in one path triton MS, Intel, Meta (on Triton side)

Egress

Visual count Dialect Groups
Most LLVM AMD, Intel, Arm, Nvidia, Qualcomm, Microsoft, and many others
Few in one path SPIRV AMD/IREE, Intel accelerators
Very few in one path NVVM Nvidia, AMD, startups
Very few in one path ROCDL AMD, startup
One, to be inclusive :smile: XeGPU Intel
Very few in one path EMITC AMD, Intel accelerators, startup

Analysis

By far, the most common ingress is linalg and the most common egress is llvm. There were a lot of custom/proprietary dialects in and out; they are hard to measure, but they were not in the noise.

The notion that “LLVM and SPIR-V” are the only valid upstream egress is not helpful. Neither is one that forces LLVM or SPIR-V for particular targets (CPU and GPU respectively), since all GPU targets go through their own dialects + LLVM.

Kernel

Selecting all entries that selected “Kernel” in the survey (66/88), I have the following dialect list (dialects with at least 30% utilization):

Dialects

Dialect count %
cf 57 93.44%
scf 57 93.44%
func 57 93.44%
arith 56 91.80%
llvm 54 88.52%
tensor 49 80.33%
math 49 80.33%
memref 48 78.69%
vector 46 75.41%
linalg 42 68.85%
affine 40 65.57%
gpu 36 59.02%
transform 30 49.18%
index 30 49.18%
bufferization 29 47.54%
omp 21 34.43%

Note the much more concentrated focus on low-level dialects. Others may be leaking in here from answers that also relate to tensor compilers. Unfortunately, there are only 3 answers that were just kernel or just tensor-compiler, and they had no common patterns.

Because of that leak, the ingress/egress patterns are the same as for tensor compilers.

Front-Ends

Selecting all entries that selected “Language Front-Ends” in the survey (43/88), I have the following dialect list (dialects with at least 30% utilization):

Dialect count %
cf 35 57.38%
scf 34 55.74%
func 34 55.74%
arith 34 55.74%
llvm 34 55.74%
math 27 44.26%
memref 27 44.26%
tensor 25 40.98%
vector 25 40.98%
affine 23 37.70%
linalg 20 32.79%
gpu 19 31.15%
index 19 31.15%

Note the more distributed nature of front-ends (max 57%, while tensor/kernel were ~90%). linalg may also be leaking from tensor compiler, since there were many overlapping answers.

Again, only three answers were exclusive to front-ends, but I managed to find 9 entries that selected front-ends without tensor compiler or kernel (possibly with other categories). This is the picture:

Dialect Count Ingress Egress
llvm 8 0 8
cf 7 2 0
func 7 1 0
scf 6 2 0
arith 5 1 0
index 5 0 0
math 3 1 0
memref 3 0 0
omp 2 1 2
ptr 2 0 0
nvvm 2 0 0
emitc 2 0 0
dlti 2 1 0
vector 1 1 0
affine 1 1 0
linalg 1 0 0
gpu 1 0 1
complex 1 1 0
SPIR-V 1 0 0
nvgpu 1 0 1
ub 1 0 0
arm_sve 1 0 0
ArmSME 1 0 0
arm_neon 1 0 0
acc 1 1 1

Note that some entries are also related to language and hardware design, but this is still a clear picture of low-level dialects. DLTI gets used twice, and front-ends are exactly where you’d expect it to be used.

Because these tools come from external dialects or ASTs, this is expected. There may be some discussion here about a mid-level dialect between ASTs (Clang/Flang/Rust) and the low-level ones in MLIR, but that’s a discussion for later, once the front-ends form their own charter.

1 Like

Data

Of the 88 answers, there were 48 with linalg in the Dialect selection, and these are the dialects above 30% utilization (as above):

Dialect Count %
memref 46 95.83%
cf 45 93.75%
scf 45 93.75%
arith 44 91.67%
tensor 43 89.58%
func 41 85.42%
math 39 81.25%
llvm 38 79.17%
vector 36 75.00%
affine 33 68.75%
gpu 32 66.67%
bufferization 31 64.58%
transform 29 60.42%
index 25 52.08%
amdgpu 18 37.50%
TOSA 18 37.50%
omp 15 31.25%
SPIR-V 15 31.25%

Note, there are 20 entries for TOSA overall; 18 of those relate to Linalg, and half of those mention linalg as ingress. The others are Arm and AMD projects, and most still go through Linalg at some point anyway.

Analysis

This looks similar to the Tensor Compiler and Kernel trends. There’s a very strong (and highly expected) correlation between these points.

There’s also a stronger correlation of the vector dialect with Tensor Compiler, Kernel and the linalg dialect (~75%) than with Front-Ends (~40%), which may change once these front-ends start running the tensor compiler vectorization transforms.

In a sub-division of charters, I would recommend that the vector dialect remain in the “tensor compiler/kernel” sub-group rather than the “front-end” sub-group for the time being, since these operations were created with tensor compilers in mind.

Once the front-ends are firmly in MLIR upstream, we can open the discussion of what a generic vector dialect would look like, and whether we want to extend or change the existing one, or create a new one.

We should also discuss where the types live. The type dialects (memref, tensor and vector) are just operations, not types. The types are in the builtin dialect and would have a different sub-group looking at them. Whether that’s a good thing or not is not yet clear to me, but we can also leave that for after we decide on a sub-division.

1 Like