[RFC] GitHub Survey - Please review

Folks,

I've created the survey with the feedback I got on the "Voting" thread
in the llvm-foundation list, and put it here:

https://goo.gl/forms/k4J7M3N7oLNTOlDq2

Apparently, I can't allow people to comment on the form itself. It's
either full permission or nothing. So, I think the best way to do this
is to do a review on the list, with my most sincere apologies to the
anti-spam folks.

For that reason, I have only sent this to llvm-dev, and would encourage
people to share privately with colleagues that didn't get it, via
lists, IRC, etc. Let's leave social media out of this, or we risk
having to filter out a lot of spam / trolls and make the whole
exercise moot.

People who have an interest in this question already subscribe to
this list or the IRC channel.

  The Plan

Today is the 19th, so about the time I promised to put the survey up
for review. From today until Sep 1st, we'll be filling in the form,
trying out the questions, changing the wording, adding new questions,
etc.

Please fill it in with some data and see how it feels. At the end
I'll try to share the bogus results, to see if that's what people
expected.

Around Sep 1st, the GitHub proposal should be finished (we'll have a
common document with both sub-modules and mono-repo explained), and
the survey should also be finished.

Since the survey has some free-text fields, it's less important how
precise the writing is, but we need to get the multiple-choice
questions right, to have a general idea of a "voting" mechanism.

My hope is that by Sep 1st, we'll have the GitHub proposal done and
the survey online for real, when I'll wipe out all responses and we'll
start fresh again.

  Design Choices

TL;DR, feel free to ignore this section...

Just FYI, the design choices for the survey were:

1. Request name, email and affiliation to de-duplicate the data. There
is no way to prevent people from responding twice without forcing them
to sign up on Google, which I will most certainly not do.

The identification also helps us to group people by their affiliations
and to have an idea of representation. I'm not expecting everyone in
the same group to have the same opinion, but it will be interesting to
see how they change.

Name and email will not be shared, but affiliation will (should it?).
I'm expecting the free-text descriptions to be very telling in that
respect, so there's no point in hiding it.

2. Gathering people's involvement in LLVM is important. We want to
know how much stake people have in LLVM, so we can give more weight to
the choices of people with more stake, but give the same weight to the
*opinions* of everyone.

What I mean by this is that, if most of the core developers feel
strongly towards using Git and a few external developers feel strongly
against, the people who will be using it the most will have a higher
weight.

But the technical arguments of the minority are still weighted in the
same way as those of the vast majority; after all, they're *technical*
arguments and not *opinions*.

3. Separating "moving to Git/Github" from "using
mono-repo/sub-modules" is crucial. We may not get a consensus on the
latter, but we should get it for the former. It'll be much simpler for
a second iteration if we know we're going to use Git and GitHub and I
want to make sure we get this right.

If we have an overwhelmingly positive response to using GitHub, but
we're still divided on whether to use sub-modules or mono-repo, we can
close the "move to Git" question now, and just work on the details later.
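
As a rough illustration of design choices 1 and 2 above, here is a
minimal Python sketch of de-duplicating responses by email and then
tallying a multiple-choice question weighted by stake. The field names
(`email`, `stake`, `choice`), the stake categories and the weight values
are all assumptions made up for this example; neither the form nor this
thread defines them.

```python
from collections import defaultdict

# Hypothetical stake weights; the thread does not define actual values.
STAKE_WEIGHTS = {"core": 3.0, "regular": 2.0, "occasional": 1.0}


def deduplicate(responses):
    """Keep only the last response per (case-insensitive) email address."""
    latest = {}
    for response in responses:
        key = response["email"].strip().lower()
        latest[key] = response  # a later submission overwrites an earlier one
    return list(latest.values())


def weighted_tally(responses):
    """Sum the stake weights behind each multiple-choice answer."""
    totals = defaultdict(float)
    for response in responses:
        # Unknown stake categories fall back to a weight of 1.0.
        totals[response["choice"]] += STAKE_WEIGHTS.get(response["stake"], 1.0)
    return dict(totals)


# Made-up sample data, purely for illustration.
responses = [
    {"email": "a@example.org", "stake": "core", "choice": "git"},
    {"email": "A@example.org ", "stake": "core", "choice": "git"},  # duplicate
    {"email": "b@example.org", "stake": "occasional", "choice": "svn"},
]
unique = deduplicate(responses)
tally = weighted_tally(unique)
```

Only the multiple-choice answers are weighted here; the free-text
technical arguments would still be read on their merits, unweighted.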

cheers,
--renato

IMO this looks good overall, I think the questions are balanced and don't
encourage any direction themselves.
One minor point while reading (as a non-native speaker): "production product"
sounds weird :smiley:

One thing that doesn't affect me but may affect larger companies with e.g.
private build bots:
There may be a difference between "impact on developer workflow" (long-term)
and "one-time impact" (e.g. reconfiguring build bots).
Maybe we can duplicate the question and add an option "No (one-time) impact"
to the second one?

Thanks for working on the proposal and taking the thankless job of conducting
the survey!
Jonas

Thank you for working on this!

Some minor nits:

"If you chose 2~4 above, please explain your reasons here"

The above set of radio buttons are not numbered, so the use of numbers
here is a bit strange. Perhaps "If you answered above that moving to
Git/GitHub would have some (or greater) impact on your productivity,
please explain your reasons here" or something along those lines?

For the question about single repo vs submodules, you should have a
choice for "I have absolutely no idea what you're talking about." Not
everyone knows what submodules are or why a single repo vs submodules
would be impactful.

~Aaron

IMO this looks good overall, I think the questions are balanced and don't
encourage any direction themselves.
One minor point while reading (as a non-native speaker): "production product"
sounds weird :smiley:

It does... Changed...

There may be a difference between "impact on developer workflow" (long-term)
and "one-time impact" (e.g. reconfiguring build bots).
Maybe we can duplicate the question and add an option "No (one-time) impact"
to the second one?

That's a good point. Though, I assume everyone will have "some"
one-time impact, and the real problem is if it is going to be
impossible (or at least very hard) for people to work with Git in the
long term.

Wouldn't hurt asking about the short-term impact, but maybe we
shouldn't add another free text, just a small multiple-choice one.
I've done that.

cheers,
--renato

The above set of radio buttons are not numbered, so the use of numbers
here is a bit strange. Perhaps "If you answered above that moving to
Git/GitHub would have some (or greater) impact on your productivity,
please explain your reasons here" or something along those lines?

Good point, updated.

For the question about single repo vs submodules, you should have a
choice for "I have absolutely no idea what you're talking about." Not
everyone knows what submodules are or why a single repo vs submodules
would be impactful.

Hum, interesting. I guess we need to cater for that. Added.

cheers,
--renato

For people who are doing read-only access, to include llvm in the build of
some other project, the impact might be as small as a one-time changing of
the URL for the "svn checkout" in their script.

Or maybe even using "svn relocate" on a permanent checkout (and their
script doesn't need to change its "svn up" or whatever). That does depend
on github's fake svn server output being absolutely identical to the
current svn server. I'm not sure whether that can be made true or not.

For that kind of use, I believe the GitHub SVN interface will "just
work". I have tested both read-only and read-write access and my SVN
client was very happy with it.

Some people said it was a bit slow, but that should only make a
difference if you're checking out the whole thing. (svn relocate may
help, too).

But there's also the idea of changing the process to Git, which would
involve some changes to their scripts, but not a big one.

I'm more worried about huge build systems that heavily use SVN as a
core part of their infrastructure, making it a lot harder to "move to
git".

I'm expecting this number to be small nowadays...

cheers,
--renato

I think that if we’re going to do a survey we should gather more information about how people interact with the LLVM projects, not just opinions on the proposals.

A few questions I think would be useful.

(1) Which project(s) do you contribute to?
(2) Which project(s) do you regularly use?
(3) How often do you bisect LLVM with one or more subproject?
(4) Do you use any of the llvm.org projects without LLVM or with out-of-sync LLVM (e.g. trunk libunwind with an old LLVM)?
(5) In which ways do you get LLVM sources from LLVM.org?
(a) SVN
(b) llvm.org Git mirrors
(c) Git-SVN
(d) GitHub Git mirrors
(e) Other
(6) Do you, or an organization you are affiliated with, maintain tooling or infrastructure that interacts with llvm.org and is not public?

-Chris

I think it might be good to draw a clearer line between the
contributor and their organization. I suspect Apple's infrastructure
will be far more affected by the change than I will personally and
there's not really a way to fit that information into the current
survey.

Tim.

Renato Golin via llvm-dev <llvm-dev@lists.llvm.org> writes:

3. Separating "moving to Git/Github" from "using
mono-repo/sub-modules" is crucial. We may not get a consensus on the
latter, but we should get it for the former. It'll be much simpler for
a second iteration if we know we're going to use Git and GitHub and I
want to make sure we get this right.

If we have an overwhelmingly positive response to using GitHub, but
we're still divided to use sub-modules or mono-repo, we can close the
"move to Git" question now, and just work on the details later.

This seems problematic. In fact, it makes some of the survey questions
unanswerable, e.g.:

    "What will be your one-time cost to moving to Git/GitHub?"

The one time cost of the mono-repo proposal is drastically different
than that of the multi-repo.

    "How would moving to Git/GitHub impact your usage of LLVM in the
     long term?"

I already use git, but depending on how things are organized in the new
world this may completely change how I work with LLVM.

+1.

I think it might be good to draw a clearer line between the
contributor and their organization. I suspect Apple's infrastructure
will be far more affected by the change than I will personally and
there's not really a way to fit that information into the current
survey.

Tim.

Excellent point. Sony's infrastructure pain would be significant to
those of us having to implement the conversion, but that change would
be essentially invisible to the rest of the team as our internal branches
aren't going to look any different. Data about our corporate pain would
be appropriate from a couple of people, but not from the rest of the team.
--paulr

I'm open to suggestions on how we separate a company's worries from
personal ones.

I'm assuming multiple people for some companies will reply. Are they
all giving their personal views or the company's? Would one person in
the company be selected to tell the tale, or do we join all responses
to mean the whole?

I don't have answers to those questions... :frowning:

cheers,
--renato

    "How would moving to Git/GitHub impact your usage of LLVM in the
     long term?"

I already use git, but depending on how things are organized in the new
world this may completely change how I work with LLVM.

Conversely, I'm currently using SVN for my own upstream interactions
but replacing my upstream checkouts with git clones (whatever they look
like) would cause me essentially no pain (probably). The current phrasing
of the survey doesn't let me say that.
--paulr

The one time cost of the mono-repo proposal is drastically different
than that of the multi-repo.

True.

But maybe not as different as from one company / project to another.
I'm assuming some people will suffer a lot more than others on either
choice.

I already use git, but depending on how things are organized in the new
world this may completely change how I work with LLVM.

It will, but you already work around with Git-SVN, which is a pain.

I don't see "the other option" being more cumbersome than Git-SVN,
whichever option you pick. But this is just my opinion.

If that doesn't work for you, can you suggest a solution that will?

cheers,
--renato

Heh, that was an oversight on my part. :slight_smile:

I'll change the first answer to mean that: "Moving to Git will bear no
penalty", regardless of whether you already use Git.

cheers,
--renato

Hi Chris,

Bear in mind that the more questions we have, the harder it will be to
interpret the results. If we have 20+ questions, it'll be impossible
to understand anything.

Also, the multiple choice questions are meant as a guide to understand
"how many" people fall into one or another category, while the free
text ones are meant to complement and give technical reasons for their
answers.

So, we should focus our multiple choice questions on divisive topics
and leave everything else to the free-text fields.

(1) Which project(s) do you contribute to?
(2) Which project(s) do you regularly use?

I've added these two as one. I know they're slightly different, but so
will be the answer to the first question, which will work to
disambiguate this one.

(3) How often do you bisect LLVM with one or more subproject?

I understand that this is a contentious issue around the git move, but
we should focus on the bigger picture, which is day to day usage as
well as infrastructure.

(4) Do you use any of the llvm.org projects without LLVM or with out-of-sync
LLVM (e.g. trunk libunwind with an old LLVM)?

This looks very specific to me, I'm trying to avoid side questions
here and let people write up on the free text areas what their usage
is.

(5) In which ways do you get LLVM sources from LLVM.org?
   (a) SVN
   (b) llvm.org Git mirrors
   (c) Git-SVN
   (d) GitHub Git mirrors
   (e) Other

I don't think that previous usage is relevant. It may be relevant to
the people doing it and to their responses on how hard it will be, but
this should be encoded in the other questions. Some of that already
is.

(6) Do you, or an organization you are affiliated with, maintain tooling or
infrastructure that interacts with llvm.org and is not public?

This is a topic for the free-text fields.

cheers,
--renato

Hi Chris,

Bear in mind that the more questions we have, the harder it will be to
interpret the results. If we have 20+ questions, it'll be impossible
to understand anything.

Also, the multiple choice questions are meant as a guide to understand
"how many" people fall into one or another category, while the free
text ones are meant to complement and give technical reasons for their
answers.

So, we should focus our multiple choice questions on divisive topics
and leave everything else to the free-text fields.

(1) Which project(s) do you contribute to?
(2) Which project(s) do you regularly use?

I've added these two as one. I know they're slightly different, but so
will be the answer to the first question, which will work to
disambiguate this one.

(3) How often do you bisect LLVM with one or more subproject?

I understand that this is a contentious issue around the git move, but
we should focus on the bigger picture, which is day to day usage as
well as infrastructure.

I disagree. I think this is really important for understanding how people interact with the SCM system. I think one part of the benefit of doing a survey is that we can gather user data about how common specific workflows are.

(4) Do you use any of the llvm.org projects without LLVM or with out-of-sync
LLVM (e.g. trunk libunwind with an old LLVM)?

This looks very specific to me, I'm trying to avoid side questions
here and let people write up on the free text areas what their usage
is.

Fair enough. I do think that relying on the write-up fields is potentially difficult. If everyone who contributed to LLVM projects in the last year responds to the survey, you’re talking about somewhere on the order of 500 responses. More if non-contributors respond (and I hope they do). Weeding through lots of write-in data could become a monumental task. I think it is better to have more yes/no and multiple choice questions to help collate the data. Admittedly wording them neutrally will be a challenge, but I think it is something we should do.

(5) In which ways do you get LLVM sources from LLVM.org?
  (a) SVN
  (b) llvm.org Git mirrors
  (c) Git-SVN
  (d) GitHub Git mirrors
  (e) Other

I don't think that previous usage is relevant. It may be relevant to
the people doing it and to their responses on how hard it will be, but
this should be encoded in the other questions. Some of that already
is.

I also disagree on this. I think it is useful to know which workflows are common in the community. It will help make informed decisions about the direction of the infrastructure.

(6) Do you, or an organization you are affiliated with, maintain tooling or
infrastructure that interacts with llvm.org and is not public?

This is a topic for the free-text fields.

I feel less invested in this question than in some of the others, because I think we know that this is a fairly common situation (especially with the large corporate contributors), but I still think relying on free-text fields for questions that can be yes/no or multiple choice will degrade our ability to process the information.

In general I think there is a sweet-spot on the length of the survey. We don’t want to go overboard with too many questions, but I think that relying on text fields could turn out to make the data very difficult to process if we get a lot of respondents.

-Chris

From: Renato Golin [mailto:renato.golin@linaro.org]
Sent: Friday, August 19, 2016 10:42 AM
To: Robinson, Paul
Cc: Tim Northover; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [RFC] GitHub Survey - Please review

> Excellent point. Sony's infrastructure pain would be significant to
> those of us having to implement the conversion, but that change would
> be essentially invisible to the rest of the team as our internal branches
> aren't going to look any different. Data about our corporate pain would
> be appropriate from a couple of people, but not from the rest of the team.

I'm open to suggestions on how we separate a company's worries from
personal ones.

Yeah, it seems like there would be a set of concerns relevant to those
who maintain some kind of infrastructure that draws on the upstream
repo to build an internal repo. This is different from concerns about
interacting directly with upstream. Some of us do both. :slight_smile:

Maybe divide the survey into sections, and sequence through different
sections depending on the answers to preliminary questions. So you could
ask (some better-phrased version of) "do you maintain infrastructure that
draws on the upstream repository?"
If yes, you get directed to an extra section of the survey to answer the
cost/benefit questions as they pertain to your infrastructure, separately
from the section about your individual interactions with upstream. And
people who answer No don't have to bother with that section.
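
To make the idea concrete, the routing could look something like the
Python sketch below. It only illustrates the branching logic; the
section names and the `maintains_infrastructure` key are hypothetical
and do not correspond to anything in the actual form.

```python
def sections_for(answers):
    """Return the survey sections a respondent should see.

    `answers` maps preliminary question keys to their answers.
    The key and section names are hypothetical, for illustration only.
    """
    # Everyone answers the questions about their own upstream interactions.
    sections = ["individual"]
    # Only infrastructure maintainers get the extra cost/benefit section.
    if answers.get("maintains_infrastructure"):
        sections.append("infrastructure")
    return sections


maintainer = sections_for({"maintains_infrastructure": True})
contributor = sections_for({"maintains_infrastructure": False})
```

That way the infrastructure answers are collected separately from the
individual ones, and can be analysed separately too.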

I'm assuming multiple people for some companies will reply. Are they
all giving their personal views or the company's? Would one person in
the company be selected to tell the tale, or do we join all responses
to mean the whole?

I don't have answers to those questions... :frowning:

I had that exact same question, and now I've had time to think about it.

Early on in my interactions with the LLVM community (stay with me, I'll
get there) it became apparent that one's corporate affiliation didn't
really matter that much. It's a community of individuals, some of whom
happen to be working on the same projects or with consistent goals or
maybe just for the same employer. Ultimately it's good to have a
collectively positive reputation, but my perception is that (hopefully)
when I'm shooting my mouth off it doesn't really reflect badly on the
rest of the team.

Being individuals, not everybody in one organization will do things the
same way. I use SVN, somebody else might use git. We have folks focused
on the coverage or static analyzer or sanitizers, which I'm not, and so
their concerns will (reasonably) be different from mine.

Therefore, rather than try to come up with a collective "Sony response"
I'm going to encourage my teammates to take the survey, with the only
caveat being that we try to answer the affiliation question the same way.
And, the few of us directly involved with the infrastructure bits would
appreciate being able to answer differently for those bits than for what
we do individually.
HTH,
--paulr

Renato Golin <renato.golin@linaro.org> writes:

The one time cost of the mono-repo proposal is drastically different
than that of the multi-repo.

True.

But maybe not as different as from one company / project to another.
I'm assuming some people will suffer a lot more than others on either
choice.

I already use git, but depending on how things are organized in the new
world this may completely change how I work with LLVM.

It will, but you already work around with Git-SVN, which is a pain.

I think you misunderstood what I meant here. Whether "moving to git"
will affect my workflow depends very much on "how we're moving to
git". For example, if we do a monorepo, I may now need to lay code out
differently on my filesystem (since I currently check out multiple repos
rooted at llvm), or if we do a multirepo I probably need to learn some
new commands to associate llvm and clang repos (rather than using git
svn find-rev). If we do something where there's a monorepo of some of
the stuff but not all, I probably have to adapt to things from each.

What I'm saying is "How much will moving to git affect your workflow?"
isn't a meaningful question without concrete data on what the repos will
look like if we do move.