Git move survey

Hi Tanya,

Do you have an idea on how we'll put up this survey online?

The other thread on the foundation list had some good proposals, but I
don't know how we'll make sure we follow all of them when we actually
publish it.

I'd like to have some process going before early Sep, so we can just
"put it up" as soon as all the proposals are finished.

Some questions...

Q1. How do we choose the questions / format?

Starting from the other thread would be good, but we may need a review
process to make sure everyone thinks it's a fair questionnaire (or the
exercise won't have the desired effect).

Q2. Where do we host it?

This is mostly independent and a bit irrelevant, but it could make a
difference to answer the question above.

Q3. Do we fix a date, or make it depend on the proposals?

If the former, we may run with incomplete proposals. If the latter, we
may run it too late. A mix of both will be the answer, but I
personally think we need to set a limit date (early Sep).

Q4. How do we present the results?

I volunteer to do the analysis and conclusions, but I think that the
bulk of the results should be made public at some point. If we decide
to have private text boxes, we must keep them out, but everything else
should be public.

I also think we should have a session at the US LLVM to present the
results and maybe have a final decision there.

cheers,
--renato

Hi Tanya,

Do you have an idea on how we’ll put up this survey online?

I can put it in Google Forms/Docs (or you can if you prefer). SurveyMonkey is also an option, but it's about the same, I think, and more costly… so let's go with free and easy.

The other thread on the foundation list had some good proposals, but I
don’t know how we’ll make sure we follow all of them when we actually
publish it.

I’d like to have some process going before early Sep, so we can just
“put it up” as soon as all the proposals are finished.

Some questions…

Q1. How do we choose the questions / format?

Starting from the other thread would be good, but we may need a review
process to make sure everyone thinks it's a fair questionnaire (or the
exercise won’t have the desired effect).

I do think it's been reviewed quite a bit, but if you want to give anyone else a chance to review, then OK.

Q2. Where do we host it?

This is mostly independent and a bit irrelevant, but it could make a
difference to answer the question above.

Google Forms/Docs should work.

Q3. Do we fix a date, or make it depend on the proposals?

If the former, we may run with incomplete proposals. If the latter, we
may run it too late. A mix of both will be the answer, but I
personally think we need to set a limit date (early Sep).

I'm not sure I follow. Are you talking about the one-repo-versus-many discussion? Otherwise, why wouldn't there be one Git proposal document that describes GitHub versus SVN (pros and cons), and then people can give feedback?
http://llvm.org/docs/Proposals/GitHubSubMod.html

I also don't think this is a process that needs to be rushed. It will take a while to publicize the survey and give people a chance to comment. It's more reasonable to give until Dec (1 month after the dev meeting BoF) to give time for feedback. But that's just my opinion.

Q4. How do we present the results?

I volunteer to do the analysis and conclusions, but I think that the
bulk of the results should be made public at some point. If we decide
to have private text boxes, we must keep them out, but everything else
should be public.

I would just make a note that the feedback may be made public, and whoever is filling it out should be responsible for making sure it's not releasing sensitive information. Everything should be anonymized, of course.

I also think we should have a session at the US LLVM to present the
results and maybe have a final decision there.

I would use the BoF as a way to discuss in person but I wouldn’t put pressure to make a decision during it. It will probably become more clear when you start getting feedback.

-Tanya

Hi,

Hi Tanya,

Do you have an idea on how we’ll put up this survey online?

I can put it in Google Forms/Docs (or you can if you prefer). SurveyMonkey is also an option, but it's about the same, I think, and more costly… so let's go with free and easy.

The other thread on the foundation list had some good proposals, but I
don’t know how we’ll make sure we follow all of them when we actually
publish it.

I’d like to have some process going before early Sep, so we can just
“put it up” as soon as all the proposals are finished.

Some questions…

Q1. How do we choose the questions / format?

Starting from the other thread would be good, but we may need a review
process to make sure everyone thinks it's a fair questionnaire (or the
exercise won’t have the desired effect).

I do think it's been reviewed quite a bit, but if you want to give anyone else a chance to review, then OK.

Q2. Where do we host it?

This is mostly independent and a bit irrelevant, but it could make a
difference to answer the question above.

Google Forms/Docs should work.

Q3. Do we fix a date, or make it depend on the proposals?

If the former, we may run with incomplete proposals. If the latter, we
may run it too late. A mix of both will be the answer, but I
personally think we need to set a limit date (early Sep).

I'm not sure I follow. Are you talking about the one-repo-versus-many discussion? Otherwise, why wouldn't there be one Git proposal document that describes GitHub versus SVN (pros and cons), and then people can give feedback?

I think the survey questions should be based off a single document putting side by side the options that we came up with on the mailing list.
Indeed, I don't plan to write a document describing a "mono-repo" proposal to counter the submodules one; I plan instead to unify it with the existing one (submodules…) along with the possible variants/options in a single document.
I plan to include examples of workflow today and after for each scenario, side-by-side. I hope to have it up for public review by the end of the month.

http://llvm.org/docs/Proposals/GitHubSubMod.html

I also don't think this is a process that needs to be rushed. It will take a while to publicize the survey and give people a chance to comment. It's more reasonable to give until Dec (1 month after the dev meeting BoF) to give time for feedback. But that's just my opinion.

I’d regret not having the results of the survey for the BoF as these data seem critical to drive the discussion.

Q4. How do we present the results?

I volunteer to do the analysis and conclusions, but I think that the
bulk of the results should be made public at some point. If we decide
to have private text boxes, we must keep them out, but everything else
should be public.

I would just make a note that the feedback may be made public, and whoever is filling it out should be responsible for making sure it's not releasing sensitive information. Everything should be anonymized, of course.

I also think we should have a session at the US LLVM to present the
results and maybe have a final decision there.

I would use the BoF as a way to discuss in person but I wouldn’t put pressure to make a decision during it. It will probably become more clear when you start getting feedback.

Right, I don't think the BoF will be very productive without the feedback on the proposal.

I can put it in Google Forms/Docs (or you can if you prefer). SurveyMonkey
is also an option, but it's about the same, I think, and more costly… so let's
go with free and easy.

Let's go free/easy. :)

I'll try to come up with something and see if it's at all possible to
have a review system based on Google Docs' comments (we use it quite
successfully at Linaro). It should be ok for "previously mostly
agreed" docs.

Are you talking about the one-repo-versus-many
discussion? Otherwise, why wouldn't there be one Git proposal document that
describes GitHub versus SVN (pros and cons) and then people can give
feedback? http://llvm.org/docs/Proposals/GitHubSubMod.html

Yes. There has been a lot of discussions on sub-mod vs. mono-repo, and
most people said we need both points reasonably well discussed and
consolidated to make the survey return more accurate results.

I also don't think this is a process that needs to be rushed. It will take
a while to publicize the survey and give people a chance to comment. It's more
reasonable to give until Dec (1 month after the dev meeting BoF) to give time
for feedback.

Excellent! I agree with you.

I would just make a note that the feedback may be made public and whoever is
filling it out should be responsible for making sure it's not releasing
sensitive information. Everything should be anonymized of course.

KISS, I like it.

I would use the BoF as a way to discuss in person but I wouldn’t put
pressure to make a decision during it. It will probably become more clear
when you start getting feedback.

That's good. I think we can have two summaries, one from the survey
and one from the BoF, and then take some time to digest and take the
decision later.

Leaving the decision for the BoF wouldn't be fair to all the people who
didn't attend, even if we took the survey's results into consideration.

cheers,
--renato

I think the survey questions should be based off a single document
putting side by side the options that we came up with on the mailing list.
Indeed, I don't plan to write a document describing a "mono-repo" proposal to
counter the submodules one; I plan instead to unify it with the existing
one (submodules…) along with the possible variants/options in a single
document.

I agree this is probably the most sensible solution. Thanks for
merging the options.

I plan to include examples of workflow today and after for each scenario,
side-by-side. I hope to have it up for public review by the end of the
month.

Excellent! I'll get the form rolling in parallel, and hopefully we'll
reach maturity around the same time.

I’d regret not having the results of the survey for the BoF as these data
seem critical to drive the discussion.

Agreed. Let's aim for that.

cheers,
--renato

I went back and looked at the current survey, and I have a lot of thoughts I wanted to share. I apologize for this being a dense response to a days-dormant thread.

What information do we want to get out of the survey? The current survey is mostly just giving people a way to vote for their preferred solution. I think this is a huge missed opportunity. One of the things I’ve found most frustrating about the Git-related threads is that there have been several assertions following the form “most people <blah>”. This is a really great opportunity for us to actually get some real data to prove or disprove these assertions. For some data points I pulled a few assertions out of the mono-repo thread:

David Chisnall wrote:

“clang-tools-extra is explicitly a bunch of stuff that doesn’t belong in the main clang repo because it’s not of interest to most people doing clang work”

Paul Robinson wrote:

“I’m not clear why imposing this cost on everybody who wants less-than-all (which I’d think would be most people)”

Justin Lebar wrote:

“If you use the workflow that we currently have, then on the client side, there is no guarantee that your subprojects will be sync’ed. (This is the same as most people’s client-side git workflows today.)”

I wrote:

“I think we have some pretty strong evidence in the form of the github fork counts (https://github.com/llvm-mirror/) that most people aren’t using all of the LLVM projects.”

In a very general sense I think the survey as written is little more than a vote for which option people prefer, and an opportunity to rate how good or bad they think the alternative is. As a result I think the current survey has a selection bias that will exclude people who may not have clear or strong opinions on the proposals. As I said in an earlier response I also think the reliance on text fields will make the data harder to process and understand if we get a large number of responses (and I really hope we get a lot of responses).

I think we should consider approaching this problem differently. Instead of structuring a vote, we could focus on gathering data about users and workflows, and using that real-world data to guide a decision that is best for the most common use cases. Correlating information about people’s workflow answers against their relationship to the community will allow us to categorize and weigh the results.

I’ve compiled a list of a few pieces of data I think we should gather. If we took an approach like I’m proposing for the survey we would want more people in the community to suggest additional things to gather information around.

My list is:

(1) Which projects people contribute to, and which ones they use (separately)

By combining the projects you use or contribute to into a single question we’re actually losing a lot of relevant information. I believe a lot of people contribute to Clang, but only use libcxx. I believe this based on the number of contributors to clang and libcxx over the last year (284 and 41 respectively). Mashing these into the same question loses information that I think is relevant. In particular I believe it is common for clang contributors to use projects that they don’t contribute to, and we should try and quantify that. If we don’t want to have multiple questions for this, we could infer the projects a person contributes to if we match the email address in the survey against the email address on commits, which would also be an acceptable route to this information.
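As an illustration of the email-matching route, a small script could pull author emails from each project's history with `git log --format=%ae` and intersect them with the survey address. This is only a sketch under my own assumptions; the helper names are hypothetical, not part of any proposal in this thread.

```python
# Hypothetical sketch: infer which projects a survey respondent
# contributes to by matching their email against commit author emails.
import subprocess

def commit_emails(repo_path):
    # Collect every author email that appears in the repository's history.
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip().lower() for line in out.splitlines() if line.strip()}

def infer_projects(survey_email, emails_by_project):
    # Return the projects whose history contains the respondent's address.
    addr = survey_email.strip().lower()
    return sorted(name for name, emails in emails_by_project.items()
                  if addr in emails)
```

On real data one would also have to fold together the several addresses some contributors commit under, which is part of why this only works "for most cases".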

(2) How many people build clang against an installed LLVM?

I know it does get used this way, but I have no idea how common it is. We recently had a series of changes because cc1_main.cpp was including LLVM’s Config.h, which isn’t installed. I think this is a very uncommon use case; my evidence for this is that the change breaking the standalone build was months old before it was detected. Alternatively, it might be a common use case that is only used on the release branches (which would make some sense). Either way, it would be good to gather data around it. Knowing how end users and package maintainers are using our existing source distributions is useful information when thinking about infrastructure changes. This doesn’t necessarily mean we shouldn’t do something that impacts them, but it allows us to make informed decisions.
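For readers unfamiliar with this use case, a standalone Clang build configures just the clang sources against an already-installed LLVM. The paths and cache variables below are illustrative assumptions on my part, not commands from the thread; in particular, where LLVMConfig.cmake lands varies by distribution.

```shell
# Hypothetical sketch: build Clang alone against an installed LLVM.
# LLVM_DIR must point at the directory containing LLVMConfig.cmake.
mkdir -p build-clang && cd build-clang
cmake ../clang -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_DIR=/usr/lib/cmake/llvm
ninja clang
```

Breakage like the Config.h incident only shows up in this configuration, because the installed tree lacks LLVM's private, build-only headers.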

(3) How many people use runtime projects without LLVM or Clang?

There have been several discussions lately about supporting runtimes without LLVM sources; we might want to figure out how common that desire is. It also might be nice to be able to correlate people who want that support with people who contribute to the runtimes.

Data points:

Bug 18331 - [cmake] Please make compiler-rt’s build system stand-alone
Bug 29109 - [cmake / compiler-rt] Please make tests runnable against installed LLVM

(4) How are people getting LLVM sources today?

Over the course of the many discussions on moving to Git we still actually don’t know how many people are using Git already. Knowing how many people are using Git, or Git-SVN when interacting with LLVM sources is a really simple question that will tell us a lot about the impact of a move to Git on the wider community. We also don’t know whether people are getting sources from the LLVM SVN repository, or the git mirrors, or the GitHub mirrors, or Takumi’s mono-repo. It would be really great to gather information about where people are getting LLVM sources, and how they interact with them.

By structuring the survey to gather primarily information, either in addition to or instead of opinion, we can augment any decision with data providing a justification.

-Chris

I went back and looked at the current survey, and I have a lot of thoughts I
wanted to share. I apologize for this being a dense response to a
days-dormant thread.

That's ok, I was just about to ask again. After a few long threads, I
learnt that silence rarely means consensus.

What information do we want to get out of the survey? The current survey is
mostly just giving people a way to vote for their preferred solution.

So, this is specifically what people wanted. I apologise for not doing
a full sweep on the old threads, but a few points were clear as I
asked around:

1. We don't want simple votes, we want to understand the vote (the survey).
2. We need concrete data to choose from. Each proposition needs a
clear and complete description (the docs).
3. Free text fields are required to allow people to expand on their choices.

and a few things are clear (to me, personally) from previous attempts
to gather opinions and consensus:

4. Too many questions dilute the statistical quality of the answers.

We won't have many more than 100 answers, so asking more than 10
relevant questions could get us in a situation where there is no clear
consensus.
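A back-of-envelope calculation shows why ~100 responses limits the resolution of each answer; the 1.96 factor below is the usual 95% normal-approximation interval, my own assumption rather than anything stated in the thread.

```python
# Rough margin of error for a yes/no question answered by n people.
import math

def margin_of_error(p, n, z=1.96):
    # Normal-approximation 95% interval half-width for a proportion p.
    return z * math.sqrt(p * (1 - p) / n)

# At a 50/50 split over 100 answers the margin is about +/-10 points,
# so two options within 10 points of each other are effectively tied.
print(round(margin_of_error(0.5, 100), 3))  # 0.098
```

Splitting those 100 answers across many questions, or many answer options, only makes each individual signal noisier.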

5. Different people answer the same questions differently.

It is practically impossible to phrase a question in a way that
everyone will answer in the same way, and trying to capture all
possible ways will explode the number of multiple choices to
exhaustion, leading to the problem above (4).

6. Too many free text answers can be exhausting to read, interpret and classify.

100 answers may be too few for statistical relevance, but they are too
many to collate in a coherent and meaningful report. Free text is not
far from emails, where we'll end up quoting one line of a long thread
as more relevant than the rest of the thread itself. In a sense,
creating a survey of free text questions will be too much like a long
thread, but without digressions.

I think this is a huge missed opportunity. One of the things I've found most
frustrating about the Git-related threads is that there have been several
assertions following the form "most people <blah>". This is a really great
opportunity for us to actually get some real data to prove or disprove these
assertions.

While I agree with you that we could learn so much more if we did a
more elaborate survey, the point of this one in particular was to know
what people prefer with regards to their version control system.

I think it would be amazing to learn all different ways people use
LLVM, and that would give us a huge insight on how to organise the
repositories, websites, mailing lists and even the code itself and how
it's built.

But this is a different topic, and one that will take considerably
longer than two months to do. My first email was on the 31st of May, and
we're planning to have a survey in September and a meeting in November, to
maybe decide something in December. And this is *just* about version
control.

In a very general sense I think the survey as written is little more than a
vote for which option people prefer, and an opportunity to rate how good or
bad they think the alternative is. As a result I think the current survey
has a selection bias that will exclude people who may not have clear or
strong opinions on the proposals.

I disagree.

People who don't have strong opinions *also* need to evaluate how
this change will affect their work. I myself don't have a strong opinion,
and I'm fine either way, but I *will* have to change how I work, and we are
already planning for both moves.

If people don't prefer either, but will have a much more serious problem
migrating to one and not the other, they should mark the right option
on the survey and then describe what the problem will be.

In the end, we are already ignoring the preference of a lot of people
that use Git today. So I don't see a way to cater for everyone, nor do I
see a way to weigh someone's opinions more heavily than others' based on
free-text answers, in the same way that I can't tell whether a thread has
consensus by counting the number of people pro and against.

The survey needs an element of a vote, but it also needs an element of
description, and it has both.

As I said in an earlier response I also
think the reliance on text fields will make the data harder to process and
understand if we get a large number of responses (and I really hope we get a
lot of responses).

Precisely.

I think we should consider approaching this problem differently. Instead of
structuring a vote, we could focus on gathering data about users and
workflows, and using that real-world data to guide a decision that is best
for the most common use cases. Correlating information about people's
workflow answers against their relationship to the community will allow us
to categorize and weigh the results.

This is a *completely* different problem and, while I can see it's
related to the one at hand, I don't think we can reliably assess
the best way forward in the particular case of version control
in any reasonable time.

we could infer the projects a person contributes to if we match the email
address in the survey against the email address on commits, which would also
be an acceptable route to this information.

We could, for most cases.

Knowing how
end users and package maintainers are using our existing source
distributions is useful information when thinking about infrastructure
changes.

Again, a completely different problem. One I do want to solve, but I
really didn't want to start intermixing hard problems together.

There have been several discussions lately about supporting runtimes without
LLVM sources, we might want to figure out how common that desire is. It also
might be nice to be able to correlate people who want that support with
people who contribute to the runtimes.

There is enough evidence already in the lists and current downstream
users to assume it is a common use.

But this is being addressed already in other ways, i.e., discussion
between the interested parties (e.g. compiler-rt/libunwind split,
compiler-rt cross-build, test-suite cmake-ification, libc++ isolated
testing, etc.)

(4) How are people getting LLVM sources today?

This is one question that we could easily add, giving a few options
and "other" with a free text field.

I foresee no complications from this additional question.

By structuring the survey to gather primarily information, either in addition
to or instead of opinion, we can augment any decision with data providing a
justification.

It can also make the whole thing useless by not providing a clear enough
signal, so that each biased opinion can be "proven" by crunching
the data in a slightly different way.

In order to have a clear signal we need simple questions whose answers
aggregate well, with a free-text question to complement them.

I'm equally interested in the results of such a broader survey, but
not as a way to choose the version control system that we use.

Maybe we could do a parallel survey? One that wouldn't need to be
complete by the dev meeting to be of use? One that would be taken as even
more advisory than the current one?

cheers,
--renato