[Survey] MLIR Project Charter and Restructuring Survey

Following up on our long discussion about MLIR project governance and charter, we decided to create a survey to understand how MLIR developers and users connect to the upstream infrastructure.

Thanks @stellaraccident @mehdi_amini @jpienaar @ftynse for the help and comments. We’re trying to collect enough information to cross-reference usage patterns and find a reasonable set of most-used dialects per type of project, etc.

The survey will be live until the end of November 2024, when we’ll start working on the answers, collecting some statistics and drawing some conclusions. We’ll present the results and anonymized answers soon after.

It is a bit long, and we appreciate your time filling it out, but it’s important that we get enough data now, so we can drive the rest of the conversation around governance and charter. We hope to get a clear enough picture to start the process, not to find the perfect balance on the first try.

If you find anything wrong with the survey, the questions, etc. either reply in this thread or ping me directly. We’ll try to fix any issues, but once people start to respond, we can change less and less without compromising the quality of the results.

Thank you very much in advance!

CC: @clattner @banach-space @River707 @Mogball @javedabsar @MaheshRavishankar @makslevental @kiranchandramohan @AaronBallman @TobiasGrosser @asb

16 Likes

Apparently the survey is in “violation of Google’s Terms of Service”, and requesting a review brings up a 404 page. Luckily, they didn’t delete the spreadsheet results, so I’ll still be able to extract information from them, just not as quickly as I had hoped. I also made copies, in case the sheets also get deleted.

This means the survey is now closed.

We got 83 responses across a wide spectrum of uses and areas. I’m hoping to get some bird’s-eye-view stats next week.

The idea is to match usage type and projects with dialects and upstream experience, and see how we can organize governance into bundles that reduce the overall time cost for those involved. This is not about code movement (yet), which should be discussed after we agree on governance.

Thank you to everyone who took the time to answer this long questionnaire!

2 Likes

That is quite unfortunate! Maybe a lesson about relying on Google’s products in the future :frowning:

Also, considering it is Thanksgiving week, I’m concerned we haven’t received feedback from everyone who intended to give it. I personally know multiple people who told me they intended to submit feedback, and I’m sure they were still working on it and haven’t been able to submit it :frowning:

3 Likes

I’m one of those who was planning to answer it over the Thanksgiving break. It would be great if we could find a way to still answer it and give a bit more time.

2 Likes

Same here, Thanksgiving got in the way; it’d indeed be great if there were a second opportunity to go over the questionnaire!

2 Likes

Because I had to do this all by hand with Sheets’ functions and plots, the quality of the results will be lower, but bear with me while I compile more results.

Meanwhile, I have some interim stats that may be interesting.

Final tally: 88 results.
Initial takeaways:

  • The usual suspects (AMD, Intel, Google, Nvidia, SiFive) top the list
  • 100% technical crowd (no “business leaders”)
  • Three major areas dominate (percentages don’t add up to 100%):
    • Tensor Compiler (72%)
    • Kernel generation (75%) (usually together with tensor compiler)
    • Front-ends (49%)
  • More than half of the people didn’t need big changes to MLIR, and no one thought it was a disaster (perhaps bias here?)
  • About 60% of the respondents do work upstream, most on dialects (another bias?)

On upstream experience, whether core or dialect, the picture is the same:

  • About half had overall positive experiences.
  • About 20% were blocked on hard technical problems (expected, from a compiler!)
  • Dialects have a higher perception of “easy problems” getting in the way, but here, too, bias on “I know my local usage” could be at play.

On upstreaming a new dialect, half say they didn’t need to, a quarter (25%) are happy with the results, and the final quarter is split among the problems they faced.

These results have detailed comments, which we should use to see what was wrong and how to fix it. But that’s for a later day. I just wanted to give an overview.

Overall, a positive picture is emerging, which is good news. It’s also clear that two sub-groups are distinct: tensor compiler + kernel lowering, and front-ends, with the other groups scattered amongst themselves. This gives us a good signal for the governance proposal on which groups should begin to take form, with the rest covered by a core group.

Will update with more data as I go.

3 Likes

I think there is a bias, but it may be going the other way :slight_smile: (from the folks asking rather than the folks answering).

Note, and this is what I meant by weighting: this data represents who answered, not the people they represent. E.g., I answered for the groups I’m involved in rather than telling every engineer working on the project to go answer. So company percentages only make sense as counts of respondents, not of projects or engineers.

So these groups don’t share dialects? Or they do, but with different groups of people?

A primary goal was identifying unused dialects. What do those results look like?

Do we have to differentiate between the Linalg cluster and the rest?

1 Like

I made announcements on two internal (AMD) chat rooms when the survey opened, telling people that if they worked on this stuff, they should take the survey. A few people asked me for advice on how to answer, and I just told them to do their best based on their own experiences. Based on feedback, I gather that the folks who directly touch this tech answered and (due to the detailed nature of the questions) few of the more passive consumers did. If we do future rounds of questions, imo, it would be good to structure them to explicitly reinforce that bias.

Getting feedback is both an art and a science. I suspect that a first run survey like this has enough bias built in that it will only provide a rough analytical outline of the status quo. But it gives some starting points to ask more questions. Also, the comment feedback is likely to be pretty invaluable for both qualitative and quantitative follow-ups.

That! :point_up:

People responded “Tensor Compiler” AND “Kernel codegen” together in most combinations, and the union amounts to about 80% of the respondents. I don’t yet have the actual percentages because Google Sheets doesn’t give me that and I haven’t yet had time to use a proper tool, but visual inspection tells me it’s significant.
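When the data moves out of Sheets, a minimal pandas sketch of that union calculation could look like this (the CSV file and column names are hypothetical placeholders for the form export, not the real ones):

```python
import pandas as pd

# Assumption: one row per respondent, with a multi-select "Project type"
# column exported as comma-separated choices.
df = pd.read_csv("mlir_survey_responses.csv")

def selected(frame, column, choice):
    """Boolean mask: did the respondent tick `choice` in a multi-select column?"""
    return frame[column].fillna("").str.contains(choice, case=False, regex=False)

tensor = selected(df, "Project type", "Tensor Compiler")
kernel = selected(df, "Project type", "Kernel")

total = len(df)
union = (tensor | kernel).sum()
both = (tensor & kernel).sum()
print(f"Tensor Compiler or Kernel: {union}/{total} ({100 * union / total:.1f}%)")
print(f"Both together:             {both}/{total} ({100 * both / total:.1f}%)")
```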

“Front-end” shows up in half of the answers, which is also significant, but it could also mean StableHLO/PyTorch, which I’ll only know once I start looking at the dialects and text fields (which I haven’t yet).

Bottom 10:

Dialect Users
vcix 0
amx 1
irdl 1
acc 2
polynomial 2
x86vector 4
arm_neon 5
mpi 5
quant 6
sparse_tensor 6

Top 10:

Dialect Users
linalg 48
vector 53
math 57
tensor 57
memref 61
llvm 69
arith 72
scf 72
cf 73
func 73

Not yet. This will take some form of PCA to get right.
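For the curious, a rough sketch of what that could look like, reusing the same hypothetical CSV export as in the earlier snippet (one row per respondent, comma-separated “Dialects” column); this has not been run on the real data:

```python
import pandas as pd
from sklearn.decomposition import PCA

# Assumption: same hypothetical export as above; "Dialects" is a
# comma-separated multi-select column.
df = pd.read_csv("mlir_survey_responses.csv")

# Respondent x dialect indicator matrix (1 = respondent uses the dialect).
onehot = df["Dialects"].fillna("").str.get_dummies(sep=", ")

pca = PCA(n_components=2)
# coords could be scatter-plotted to see whether respondent clusters separate.
coords = pca.fit_transform(onehot.values)

# Dialects with large loadings on the leading components hint at which
# clusters (e.g. a linalg bundle) actually stand apart in the data.
loadings = pd.DataFrame(pca.components_.T, index=onehot.columns,
                        columns=["pc1", "pc2"])
print(loadings.sort_values("pc1", ascending=False).head(10))
print("explained variance:", pca.explained_variance_ratio_)
```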

1 Like

Sorry, I missed this point in my reply. I totally agree with you and did not intend to make that point at all.

I asked for the company on the form precisely to do what you ask: bundle and weight by what the answer represents (single person, team, company-wide) and not just count beans. That’s why I also asked for the project name.

For instance, most AMD people work on IREE (as expected). So I can group their answers into a common theme, but I cannot count all their answers as one, because there are a lot of people there who actually work with it. Same for Google, Arm, Intel, Nvidia.

We’re not trying to compete over who has the most people, just to understand where these people are coming from, what they’re trying to do, and the common trends.

2 Likes

Because people answered with all the dialects they use, just looking at the cloud of dialects isn’t super helpful. And because some people answered for their groups while others answered for themselves, we have duplicated data. So I wrote a small Python script that uses https://networkx.org/ and looked at how connected the dialects really are.

Here’s the list of dialects in order of connectivity. This counts how many people said they use two dialects together (the edge weight), then sums all edge weights for each dialect.
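For reference, the script boils down to something like this (a simplified sketch; the file and column names are placeholders, not the actual export):

```python
import itertools
import networkx as nx
import pandas as pd

# Assumption: same hypothetical export as above; "Dialects" is a
# comma-separated multi-select column.
df = pd.read_csv("mlir_survey_responses.csv")

G = nx.Graph()
for answer in df["Dialects"].fillna(""):
    dialects = sorted({d.strip() for d in answer.split(",") if d.strip()})
    # Every pair of dialects mentioned together adds 1 to that edge's weight.
    for a, b in itertools.combinations(dialects, 2):
        weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# Connectivity of a dialect = sum of the weights of all edges touching it.
connectivity = sorted(
    ((sum(d["weight"] for _, _, d in G.edges(n, data=True)), n) for n in G.nodes),
    reverse=True,
)
print(connectivity)
```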

Most connected dialects:
[
 (799, 'scf'), (706, 'arith'), (689, 'math'), (687, 'llvm'), (656, 'memref'),
 (621, 'linalg'), (604, 'tensor'), (598, 'vector'), (544, 'cf'), (505, 'affine'),

 (428, 'index'), (408, 'gpu'), (376, 'transform'), (371, 'buffer'), (248, 'omp'),
 (171, 'amdgpu'), (137, 'nvvm'), (134, 'TOSA'), (131, 'emitc'), (125, 'shape'),

 (121, 'pdl'), (101, 'nvgpu'), (97, 'ptr'), (95, 'complex'), (89, 'rocdl'),
 (71, 'dlti'), (66, 'pdl_interp'), (58, 'ml_program'), (45, 'async'), (36, 'ub'),

 (31, 'xegpu'), (28, 'mesh'), (21, 'sparse'), (15, 'mpi'), (10, 'x86vector'),
 (3, 'poly'), (1, 'irdl'), (0, 'vcix')]

Of the top 10, the only one that isn’t a generic dialect (used by multiple different use-cases) is linalg.

This is how each of those is connected to other dialects, sorted by weight:

Top 10 connectivity:
 * 'scf': func llvm memref math tensor cf vector linalg affine gpu index buffer transform omp TOSA SPIR-V shape pdl rocdl ptr dlti pdl_interp async xegpu quant ArmSME x86vector arm_neon acc poly 
 * 'arith': scf llvm memref math cf tensor vector linalg gpu index transform omp TOSA SPIR-V pdl shape complex ptr dlti pdl_interp async xegpu mesh ArmSME arm_neon acc poly 
 * 'math': scf arith func llvm memref vector cf linalg affine gpu buffer index transform omp TOSA complex nvvm ptr nvgpu ml_program pdl_interp dlti async quant ArmSME sparse arm_neon acc poly 
 * 'llvm': scf arith memref math vector cf tensor affine linalg gpu index transform buffer omp nvvm nvgpu pdl TOSA ptr dlti async ArmSME sparse arm_neon x86vector acc amx 
 * 'memref': scf arith func llvm tensor math linalg vector cf gpu index transform omp SPIR-V pdl nvgpu nvvm ptr dlti xegpu quant mpi ArmSME arm_neon poly 
 * 'linalg': memref scf arith tensor func math llvm vector affine gpu buffer cf transform index TOSA omp pdl nvgpu nvvm ml_program dlti ptr xegpu ArmSME quant arm_neon ub poly 
 * 'tensor': scf arith memref func math vector linalg cf gpu transform index omp SPIR-V TOSA shape pdl nvvm complex ptr async xegpu quant sparse ArmSME mpi arm_neon amx 
 * 'vector': scf arith llvm func memref tensor cf linalg affine index buffer transform omp SPIR-V complex pdl nvvm ptr pdl_interp async xegpu quant sparse ArmSME mpi arm_neon amx 
 * 'cf': func scf llvm math memref vector affine linalg index buffer gpu transform omp complex pdl nvvm ptr pdl_interp dlti async ub quant ArmSME arm_neon acc poly 
 * 'affine': scf func memref llvm math tensor vector linalg gpu buffer transform index omp amdgpu pdl nvvm pdl_interp complex ptr quant ArmSME ub arm_neon acc 

The linalg weights are:

Stats for dialect 'linalg':
[
 (46, ('linalg', 'memref')), (45, ('linalg', 'scf')),
 (44, ('arith', 'linalg')), (42, ('linalg', 'tensor')),
 (41, ('func', 'linalg')), (39, ('linalg', 'math')),
 (38, ('linalg', 'llvm')), (35, ('linalg', 'vector')),
 (33, ('affine', 'linalg')), (32, ('gpu', 'linalg')),
 (31, ('buffer', 'linalg')), (30, ('cf', 'linalg')),
 (29, ('linalg', 'transform')), (25, ('linalg', 'index')),

 (18, ('linalg', 'TOSA')), (15, ('linalg', 'omp')),
 (12, ('linalg', 'pdl')), (11, ('linalg', 'nvgpu')),
 (10, ('linalg', 'nvvm')), (9, ('linalg', 'ml_program')),
 (8, ('linalg', 'dlti')), (7, ('linalg', 'ptr')),
 (6, ('linalg', 'xegpu')), (5, ('linalg', 'ArmSME')),
 (4, ('linalg', 'quant')), (3, ('linalg', 'arm_neon')),
 (2, ('linalg', 'ub')), (1, ('linalg', 'poly'))]

Note: the order of the dialects in each tuple is an artifact of their alphabetical order in the list I ran on.

There’s a bulge there around the transform dialect onwards. I’d call this the “linalg bundle”.

Some of those dialects we can decorrelate, because they’re obviously related for other reasons. For example, linalg uses func, scf, math, arith, cf, gpu and llvm for lowering and payload. It also uses affine, mostly for maps. Other usages (e.g. front-ends) do too, for similar reasons.

This basically leaves memref, tensor, vector, bufferization and transform as strongly correlated. index, omp and pdl are weaker: still in the same area, but also used elsewhere. The rest is specific to individual uses, it seems.
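As an illustration of that decorrelation step, reusing the graph G from the sketch above (the set of “generic” dialects below is my manual pick based on the reasoning above, not something derived from the data):

```python
# Dialects nearly every pipeline touches for lowering/payload/maps, so their
# edges to linalg say little about a dedicated bundle.
GENERIC = {"func", "scf", "math", "arith", "cf", "gpu", "llvm", "affine"}

linalg_bundle = sorted(
    ((data["weight"], other)
     for other, data in G["linalg"].items()
     if other not in GENERIC),
    reverse=True,
)
# The strongest remaining neighbours (memref, tensor, vector, bufferization,
# transform, ...) are the candidate "linalg bundle"; index/omp/pdl trail off.
print(linalg_bundle)
```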

This is basically what we have been arguing should be the “Tensor Compiler” sub-group, with a strong charter to design tensor compilers.

I am anonymizing the data I used to produce those results and will attach it to this thread soon. But other correlations are needed for kernel codegen (Triton et al.) and front-ends (CIL, FIR), which are the other two largest sub-communities.

3 Likes

Here’s a snapshot of the script I used (now modified) and the anonymized data for dialects, project type and interaction type:

I don’t claim correctness, and will accept PRs for corrections and new features. I’ll also try to add more anonymous results in there, as time permits.

A bit more data on grouped stats. Again, this is flawed, since some people responded for a whole team, but on visual inspection this does not change the overall picture, as those responses are part of the signal, not the noise.

Tensor Compiler

Selecting all entries that selected “Tensor Compiler” in the survey (63/88), I have the following dialect list (dialects with at least 30% utilization; a sketch of this filtering follows the table):

Dialects

Dialect count %
arith 54 88.52%
cf 53 86.89%
scf 53 86.89%
tensor 52 85.25%
func 50 81.97%
memref 50 81.97%
llvm 47 77.05%
vector 45 73.77%
math 44 72.13%
linalg 42 68.85%
affine 41 67.21%
gpu 37 60.66%
transform 31 50.82%
bufferization 30 49.18%
index 28 45.90%
omp 19 31.15%

Among those, linalg and llvm stand out as the most common ingress and egress dialects, respectively.
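A minimal sketch of this kind of per-group filtering, reusing the hypothetical `df` and `selected` helper from the earlier snippet (the real numbers above were tallied by hand in Sheets):

```python
# Keep only respondents who selected "Tensor Compiler", then compute how many
# of them use each dialect and what fraction of the group that represents.
group = df[selected(df, "Project type", "Tensor Compiler")]
dialect_counts = group["Dialects"].fillna("").str.get_dummies(sep=", ").sum()

pct = (100 * dialect_counts / len(group)).sort_values(ascending=False)
print(pct[pct >= 30].round(2))  # the 30% utilization cut-off used above
```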

Ingress & Egress

To avoid bias, I looked at the data filtered by ingress and egress dialects and grouped by company/group, not individual responses. “Many” and “few” depend on how many responses were bundled, so take these with a pinch of salt.

Ingress

Visual count Dialect Groups
Most overall linalg AMD, Intel, Arm, Qualcomm, Microsoft, and many others
Varied across projects tosa AMD/IREE, Arm, Google, startups
Many in one project torch Mainly AMD/IREE, Arm
Few in one path triton MS, Intel, Meta (on Triton side)

Egress

Visual count Dialect Groups
Most LLVM AMD, Intel, Arm, Nvidia, Qualcomm, Microsoft, and many others
Few in one path SPIRV AMD/IREE, Intel accelerators
Very few in one path NVVM Nvidia, AMD, startups
Very few in one path ROCDL AMD, startup
One, to be inclusive :smile: XeGPU Intel
Very few in one path EMITC AMD, Intel accelerators, startup

Analysis

By far, the most common ingress is linalg and the most common egress is llvm. There were a lot of custom/proprietary dialects in and out; they are hard to measure, but they were not in the noise.

The notion that “LLVM and SPIR-V” are the only valid upstream egress is not helpful. Neither is one that forces LLVM or SPIR-V for particular targets (CPU and GPU respectively), since all GPU targets go through their own dialects + LLVM.

Kernel

Selecting all entries that selected “Kernel” in the survey (66/88), I have the following dialect list (dialects with at least 30% utilization):

Dialects

Dialect count %
cf 57 93.44%
scf 57 93.44%
func 57 93.44%
arith 56 91.80%
llvm 54 88.52%
tensor 49 80.33%
math 49 80.33%
memref 48 78.69%
vector 46 75.41%
linalg 42 68.85%
affine 40 65.57%
gpu 36 59.02%
transform 30 49.18%
index 30 49.18%
bufferization 29 47.54%
omp 21 34.43%

Note the much more concentrated focus on low-level dialects. Others may be leaking in here from answers that also relate to tensor compilers. Unfortunately, there are only 3 answers that were just kernel or just tensor-compiler, and they had no common patterns.

Because of that leak, the ingress/egress patterns are the same as for tensor compilers.

Front-Ends

Selecting all entries that selected “Language Front-Ends” in the survey (43/88), I have the following dialect list (dialects with at least 30% utilization):

Dialect count %
cf 35 57.38%
scf 34 55.74%
func 34 55.74%
arith 34 55.74%
llvm 34 55.74%
math 27 44.26%
memref 27 44.26%
tensor 25 40.98%
vector 25 40.98%
affine 23 37.70%
linalg 20 32.79%
gpu 19 31.15%
index 19 31.15%

Note the more distributed nature of front-ends (max 57%, while tensor/kernel were ~90%). linalg may also be leaking from tensor compiler, since there were many overlapping answers.

Again, only three answers were exclusive to front-ends, but I managed to find 9 entries that selected front-ends without tensor compiler or kernel (possibly with other categories). This is the picture:

Dialect Count Ingress Egress
llvm 8 0 8
cf 7 2 0
func 7 1 0
scf 6 2 0
arith 5 1 0
index 5 0 0
math 3 1 0
memref 3 0 0
omp 2 1 2
ptr 2 0 0
nvvm 2 0 0
emitc 2 0 0
dlti 2 1 0
vector 1 1 0
affine 1 1 0
linalg 1 0 0
gpu 1 0 1
complex 1 1 0
SPIR-V 1 0 0
nvgpu 1 0 1
ub 1 0 0
arm_sve 1 0 0
ArmSME 1 0 0
arm_neon 1 0 0
acc 1 1 1

Note that some entries are also related to language and hardware design, but this is still a clear picture of low-level dialects. DLTI gets used twice, and front-ends are exactly where you’d expect it to be used.

Because these tools come from external dialects or ASTs, this is expected. There may be some discussion here about a mid-level dialect between ASTs (Clang/Flang/Rust) and the low-level ones in MLIR, but that’s a discussion for later, once the front-ends form their own charter.

1 Like

Data

Of the 88 answers, there were 48 with linalg in the Dialect selection, and these are the dialects above 30% utilization (as above):

Dialect Count %
memref 46 95.83%
cf 45 93.75%
scf 45 93.75%
arith 44 91.67%
tensor 43 89.58%
func 41 85.42%
math 39 81.25%
llvm 38 79.17%
vector 36 75.00%
affine 33 68.75%
gpu 32 66.67%
bufferization 31 64.58%
transform 29 60.42%
index 25 52.08%
amdgpu 18 37.50%
TOSA 18 37.50%
omp 15 31.25%
SPIR-V 15 31.25%

Note, there are 20 entries for TOSA overall; 18 of those relate to Linalg, and half of those mention linalg as ingress. The others are Arm and AMD projects, and most still go through Linalg at some point anyway.

Analysis

This looks similar to the Tensor Compiler and Kernel trends. There’s a very strong (and highly expected) correlation between these points.

There’s also a stronger correlation of the vector dialect with Tensor Compiler, Kernel and the linalg dialect (~75%) than with Front-Ends (~40%), which may change once these front-ends start running the tensor compiler vectorization transforms.

In a sub-division of charters, I would recommend that the vector dialect remain in the “tensor compiler/kernel” sub-group rather than the “front-end” sub-group for the time being, since these operations were created with tensor compilers in mind.

Once the front-ends are firmly in MLIR upstream, we can open the discussion of what a generic vector dialect would look like, and whether we want to extend or change the existing one, or create a new one.

We should also discuss where the types live. The type dialects (memref, tensor and vector) are just operations, not types. The types are in the builtin dialect and would have a different sub-group looking at them. Whether that’s a good thing or not is not yet clear to me, but we can also leave that for after we decide on a sub-division.

1 Like