GitHub Survey?

Hi Duncan,

I don't understand your concerns.

First, the choice between sub-modules and mono-repo has been put
forward as the only two choices because people felt that, if we left it
open, we'd have too many different implementation details and we'd
never get anywhere.

Yup, that makes sense. I'm partly just trying to fill in the survey now that we've been informed by the proposal process (impossible to predict ahead of time).

BTW, this has now been committed, so you can read it here:
http://llvm.org/docs/Proposals/GitHubMove.html

So...

- how much pain the transition would cause, instead of what they think the right final state is.

The final state is defined by submod vs. monorepo, and that's
represented in a different question. Those questions are addressing
the additional work done to get there, as many have said would be the
crucial decision point.

It also outlines the cost over their preferred vs non-preferred
solutions, which leads to the aggregated cost over the whole project
for each decision.

I guess my problem is that it's really hard for people to judge this, so I'm breaking it down to specific questions that are easy to judge. I think if we can trust that people answered accurately, the data will be more useful for the BoF discussion.

- what's good for the individuals responding, instead of what they think is best for the LLVM project; and

That's implied. I think it is clear enough, but we can always change
the wording if others feel confused.

Secondly, I'm worried about this question: "How does the choice between a single repository with all projects and the use of sub-modules impact your usage of Git?" I'm not sure we'll get good signal from this; it's essentially a vote on the two variants, but it doesn't force the respondent to think about the specific issues. I'd rather find a way to ask about the specific concerns raised in the document.

It is a vote. The "thinking" is on the extended answer that follows.
Answers with good extended reasoning will have a greater weight than
those without.

I'd like to move it later.

Asking for a vote first will affect how respondents answer the rest of the questions; humans have a tendency to post-rationalize their decisions. Asking for the vote at the end forces them to think through the issues first, informing their eventual decision/vote.

If you're worried about data mining, then leaving those questions to
full text answers will require someone to read it all, interpret, and
put their bias on top. Given the nature of this problem, we should
avoid bias whenever possible, especially when interpreting the
answers.

I agree that full text answers aren't good for data mining, and lead to bias.

That's why I argue for spelling out the known concerns with specific radio and/or checkbox questions.

Thirdly, I'm worried that the follow-ups talk about "preferred" and "non-preferred" instead of "multirepo" and "monorepo". This makes data-mining non-trivial (because the meaning depends on previous answers) and increases the chance of respondent confusion.

I see your point. We can re-word to make that more clear.
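
To make the data-mining concern concrete, here is a minimal sketch of the extra step an analyst would need when answers are phrased as "preferred" / "non-preferred". The function name and the label strings are hypothetical, not taken from the survey; the point is only that each relative answer has to be resolved against that respondent's earlier vote before it can be aggregated.

```python
# Hypothetical labels; the real survey wording may differ.
VARIANTS = ("multirepo", "monorepo")


def resolve_variant(vote, label):
    """Turn a relative label into a concrete variant, given the respondent's vote."""
    if vote not in VARIANTS:
        raise ValueError(f"unknown vote: {vote!r}")
    if label == "preferred":
        return vote
    if label == "non-preferred":
        # The other variant is whichever one the respondent did not vote for.
        return VARIANTS[1] if vote == VARIANTS[0] else VARIANTS[0]
    raise ValueError(f"unknown label: {label!r}")
```

If the follow-ups named "multirepo" and "monorepo" directly, this lookup would not be needed and each answer could be counted on its own.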

4. How often do you work on a small LLVM sub-project without using a checkout of LLVM itself?
- Always.
- Most of the time.
- Sometimes.
- Never.

Interesting, it covers the main problem with both proposals.

5. Please categorize how you interact with upstream.
- I need read/write access, and I have limited disk space.
- I need read/write access, but a 1GB clone doesn't scare me.
- I only need read access.

I'm not sure that's critical. My current source repo has 35GB with
just a few worktrees.

Also, both solutions have low-disk-usage modes, and this would make no
difference on how we proceed.

This is targeting the number one contentious issue about monorepo. You can see more in the proposal:
http://llvm.org/docs/Proposals/GitHubMove.html#id12

Affected users that need read-write access can use the SVN bridge (or the git-svn layer on top of it).

There's another concern that the SVN bridge might somehow go away, killing the split sub-project option for write access. However, we'll *always* be able to maintain a split sub-project Git mirror.

This question groups everyone into three categories:
- People that are worried about disk space and need read/write access. They'll be relying somehow on the SVN bridge.
- People that are not worried about disk space. Whether they decide to use monorepo or the SVN bridge, their disk space is not preventing them from using monorepo. (It sounds like this is your category.)
- People that don't need write access, so split Git mirrors would be sufficient (they don't rely on the SVN bridge).
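
As a rough illustration of how the answers to this question could be tallied, here is a minimal sketch. The answer strings are abbreviated, hypothetical stand-ins for the three options above, and the mapping simply restates the three categories; it is not part of the survey itself.

```python
from collections import Counter

# Hypothetical, abbreviated stand-ins for the three answers to question 5.
LIMITED = "read/write, limited disk space"
UNLIMITED = "read/write, 1GB clone is fine"
READ_ONLY = "read-only"

# What each group would rely on, restating the three categories above.
RELIES_ON = {
    LIMITED: "SVN bridge (or git-svn on top of it)",
    UNLIMITED: "monorepo or SVN bridge; disk space is not a blocker",
    READ_ONLY: "split Git mirrors",
}


def tally(answers):
    """Count respondents per category; an unknown answer raises KeyError."""
    counts = Counter()
    for answer in answers:
        RELIES_ON[answer]  # lookup only to reject answers outside the three options
        counts[answer] += 1
    return counts
```

The count for the first category is the number of real, affected users, which is the figure the BoF discussion would need.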

6. How important is cross-project blame, grep, etc.?
- Vital. I already use SVN/monorepo/custom-tooling to accomplish this.
- Extremely. It should be easy enough that everyone does it by default.
- Somewhat. I would use it if it were easy, but it's just nice to have.
- Not at all. Anyone who cares can write their own tooling.

Based on other comments in the thread, we should leave this one out.

I don't understand your reasoning. Why?

This is targeting one of the benefits of monorepo. I think it's important to know if anyone cares.

7. Single-commit cross-project refactoring designs away a class of build failures and simplifies making API changes. How important is it?
- Vital. I already use SVN/monorepo/custom-tooling to accomplish this.
- Extremely. It should be easy enough that everyone does it by default.
- Somewhat. I would use it if it were easy, but it's just nice to have.
- Not at all. Anyone who cares can write their own tooling.

I don't like to assert my opinion and then ask how much people agree.

This doesn't strike me as opinion. It does design away a class of build failures (whether it's an important class is opinion), and it does simplify making API changes (whether it matters is opinion).

I prefer to ask the question directly, like:

How often do you need to commit across repositories (ex. llvm+clang)
and how often are your builds broken because they're in separate
repositories?

This affects more than just the people *making* the changes, so I'm not a big fan of your wording.

I also don't think the frequency of the problems necessarily reflects developers' opinions about how important they are.
- Some people may find that it happens "all the time", but think that it's not important.
- Others may find that it happens rarely, but think that it's devastating.

Also, I think your scale of importance is somewhat skewed upward. Vital and
Extremely are at the top, somewhat is right bang in the middle and not
at all is the very bottom.

How about "Quite" instead of "Extremely"?

You either have two positive and two negative (very, somewhat, not
much, not at all) or you add a fifth in the middle. I prefer 4 because
that makes people think harder.

I think they're all positive except "not at all" (which is 0). Since "vital" is an absolute adjective, it clearly sets an upper limit. But I'm happy to shift somewhat/extremely if you can think of better things in the middle.

8. The multirepo variant provides a read-only umbrella repository to coordinate commits between the split sub-project repositories using Git submodules. Assuming multirepo gets adopted, how do you expect to use the umbrella?
// checkboxes:
+ Actively contribute tooling improvements to improve it.
+ Integrate it into our downstream fork.
+ Use it for upstream contributions.
+ Use it as the primary interface development environment.
+ Use it for bisection.

Good. (+ N/A, too)

Sure, that works. Although "leaving all boxes blank" means the same thing I think.

12. The multi/mono hybrid variant merges some sub-projects, but leaves runtimes in separate repositories using the umbrella to tie them together. Is this the best or worst of both worlds?
- This is great. Native cross-project refactoring, without penalizing runtime-only developers.
- Whatever. I'll deal with it.
- This is terrible. All the transition pain of monorepo, without the advantages.

I didn't know we were proposing yet another variant. This seems like a
last-minute, rushed-in proposal and I don't want to endorse it in the
survey. We can discuss it in the BoF, though.

It was raised around a month ago in the proposal thread as a compromise solution. Here's the description.
http://llvm.org/docs/Proposals/GitHubMove.html#multi-mono-hybrid-variant

Since this hasn't been carefully thought through, it risks wasting a *lot* of time at the BoF. I'd like to raise it here so that we know if it's worth talking about there. If this gets a lot of support, then we should talk about it at the BoF. But if most people think it's the end of the world then we can skip the conversation.

13. If multirepo is adopted, how much pain will there be in your transition?
- Nothing consequential.
- A little; but it'll be fine.
- A lot; but it'll get done somehow.
- Too much; I/we may stop contributing to LLVM.

14. If monorepo is adopted, how much pain will there be in your transition?
- Nothing consequential.
- A little; but it'll be fine.
- A lot; but it'll get done somehow.
- Too much; I/we may stop contributing to LLVM.

Those are already covered by the current bad/good, but I'll change the
wording to be like this one.

Yes, these were basically rewording of those questions :).

15. If we could go back in time and restart the project with today's technologies, which repository scheme would be best for the LLVM project?
- CVS.
- Subversion repository with split sub-projects (<sub-project>/trunk), with git-svn.
- Subversion repository as a single project (trunk/<sub-project>), with git-svn.
- Git: multirepo variant.
- Git: monorepo variant.
- Git: multi/mono hybrid variant.
- Other.

Let's not put CVS in there, please. :)

I believe it was the choice Chris made, so it seemed worth mentioning ;). If you really don't want it there, I'm fine with you taking it out.

So, what's the purpose of this question?

This is my wording for "the vote".

I mean, we are "starting
fresh" in a way, and the responses of the rest of the survey would
make this question irrelevant, no?

This is wording to tease out: "If there were no transition pain, what do you think the best solution would be?" I worded differently so that I wasn't making people think about the pain while they thought about their answer.

(Might be good to have a text box for "other" in case the entire community wants Mercurial or something; up to you.)

I'll be changing the wording on the ones we all agree on and leave the
ones with questions until they're all solved.

Makes sense!

Oh I see. Your reword may be enough.
I think what confused me was the “contribute to upstream” part of the question. Maybe you have an idea for rephrasing the question?

This is a point of contention and a concern that Chris voiced about the monorepo. It should be in the survey.

A lot of concerns were voiced in the discussion, not all of them here.

Hasn’t this particular point been solved by shallow checkouts?

Chris, are you still worried about disk size on a mono-repo vs. sub-modules?

The point of the survey is to gather data. The fact that not many people are doing it does not mean that, after reading the proposal document, they wouldn’t answer “It should be easy enough that everyone does it by default.”

We can go on and on about many topics, but the more we put in, the
harder it will be to make sense of things. Unless the question is
critical to the problem at hand, which I don’t believe it is, we
should avoid bloating the survey.

We should ask the set of questions that are relevant to the proposal document; this is why Duncan’s feedback arrives right after the proposal is “finalized”. The set of questions he worded is right on point with respect to the document, neither too much nor too little.

If the survey does not help quantify how important a concern raised in the proposal is, that is an issue to me.

For this particular question, here are some relevant pieces of the proposal:

http://llvm.org/docs/Proposals/GitHubMove.html#concerns "Refactoring across projects is not friendly: taking some functions from clang to make it part of a utility in libSupport wouldn’t carry the history of the code in the llvm repo, ….”
http://llvm.org/docs/Proposals/GitHubMove.html#monorepo-variant "Tooling based on git grep works natively across sub-projects, allowing to easier …”

As I said to Chris L. before, we can have a complete survey that will
take a lot of time to answer and will give us wonderful data over the
course of months, and we can have a quick survey to feed the BoF
discussion, but we can’t have both.

Right, it seems we agree on the goal. All of my feedback is oriented toward it.

  7. Single-commit cross-project refactoring designs away a class of build failures and simplifies making API changes. How important is it?

I don’t see an “opinion” in the question.

Perhaps I should have said “a point of view”.

Asking it this way does not allow someone to answer “It should be easy enough that everyone does it by default.”

I made a scale: must fix / could fix / doesn’t matter.

We’re not “endorsing” anything in the survey. We’re collecting data to help drive the BoF discussion of the proposal document.

The deal was to collect what’s proposed only, and we’re not (or should
not be) proposing a third alternative which won’t have time to be
discussed.

Before starting the survey design I stated that we should first have the proposal document ready, and the survey should ask the relevant questions with respect to the proposal.

We had the first proposal agreed and documented one month before the
survey first appeared. The second proposal is still not ready and we
won’t have time to do a third.

I’m not sure what you’re referring to here. In case I wasn’t clear before, I’m not interested in any way “to do a third” proposal.

My view is that there is a single document proposal for moving to GitHub, and it mentions multiple variants.
It is published here: http://llvm.org/docs/Proposals/GitHubMove.html and it is also attached to the Dev Meeting Schedule: http://sched.co/8Yzj

We now need a survey that matches the proposal document, because that is what will be discussed at the meeting.

The document already contains this variant: http://llvm.org/docs/Proposals/GitHubMove.html#multi-mono-hybrid-variant; so we need a question about it.

6. How important is cross-project blame, grep, etc.?
- Vital. I already use SVN/monorepo/custom-tooling to accomplish this.
- Extremely. It should be easy enough that everyone does it by default.
- Somewhat. I would use it if it were easy, but it's just nice to have.
- Not at all. Anyone who cares can write their own tooling.

Btw, I now split into multiple pages, to make it less daunting, so
I've added this question at the "usage questions" page, but phrased in
a more generic way as "cross-repo usage" (example, bisect, blame,
etc).

To pick the nit, "cross-repo" is not correct for monorepo. It's only cross-project.

Reworded, thanks!

Btw, can you actually see my changes?

https://docs.google.com/forms/d/e/1FAIpQLSc2PBeHW-meULpCOpmbGK1yb2qX8yzcQBtT4nqNF05vSv69WA/viewform

cheers,
--renato

I’m not sure what you’re referring to here. In case I wasn’t clear before,
I’m not interested in any way “to do a third” proposal.

Ok, so we only mention two.

My view is that there is a single document proposal for moving to GitHub,
and it mentions multiple variants.

That mention was contentious and people did raise the issue that we
couldn't discuss that enough.

If people feel so inclined, they can mention that on the free text fields.

I'll add the sub-links to the relevant pages.

cheers,
--renato

I’m not sure what you’re referring to here. In case I wasn’t clear before,
I’m not interested in any way “to do a third” proposal.

Ok, so we only mention two.

No we mention what’s in the document, i.e. we mention this variant.

My view is that there is a single document proposal for moving to GitHub,
and it mentions multiple variants.

That mention was contentious and people did raise the issue that we
couldn't discuss that enough.

I don’t see how that is relevant: it is presented in the document, in its own section, just like the other variants. I don’t see any reason not to ask about it.

The deal, from the beginning, was that we'd restrict ourselves to one
proposal, and make that count. It didn't work, so a second one was
created by a different group.

Both reviews happened in similar ways. Both sides started discussing,
then one side took over and the other gave up. Each part of the
document then became the point of view of a segregated group in our
community.

The sub-modules group had initially many designs, but only one
survived. It would be unfair if the mono-repo group had multiple
designs of their own, because that dilutes the importance of the other
group's work.

If you want to propose a third model, we'll have to do it the same
way: start over, with a new document, and discuss that.

The discussion that happened in the second proposal doesn't count,
because the first group wasn't participating any more, for the same
reasons that the second group didn't participate in the final steps of
the first proposal.

I don't want to favour either group.

In my view, it shouldn't even be present in the document unless there
was a third round of discussions, which we don't have time for.

cheers,
--renato

Hi Duncan,

I think I have addressed all your points. The survey is now much more
complex, but I think more in line with the document and with more
practical answers (thanks for that).

Can you please confirm that the result looks good on your end?

https://goo.gl/forms/HBlyyDuEsH2tQ5Xi2

cheers,
--renato

To pick the nit, "cross-repo" is not correct for monorepo. It's only cross-project.

Reworded, thanks!

Btw, can you actually see my changes?

https://docs.google.com/forms/d/e/1FAIpQLSc2PBeHW-meULpCOpmbGK1yb2qX8yzcQBtT4nqNF05vSv69WA/viewform

I'm way behind on the thread. I'll catch up in a moment to see if there's anything to respond to (maybe about changes you didn't take).

Great to have the links to the sections of the proposal throughout the survey. That's really useful!

Looking at the current survey as of 16h46 on Oct 13th, two comments on the questions that are there:

How important is it to have multiple projects in one repository (for example, bisecting problems, code archaeology, etc)?

^ I think that should be required.

Do you believe the LLVM project would be better with a main Git repository, or keeping it in SVN would still be better? *

You're missing an opinion I've heard frequently cited:
- All our usage is already in Git, but the main repository being in SVN is a huge benefit (for whatever reasons).

Renato,

Let me be clear about my motivation on this particular question: I don’t like this variant, and I don’t want us to spend extra time discussing it at the BoF because we have enough things to go through.

But that is only my personal opinion, and I avoid driving solely on my personal opinion, which is why this variant is present in the document.
I believe in data and facts, and the only way to confirm whether or not it would actually be a waste of time to discuss it extensively at the BoF is to ask about it.

This is why, for the sake of “let’s optimize our limited time” at the BoF, I want to see this question in the survey.

If the survey shows that a significant number of people are interested in this variant, then we should allocate more time.

For the exact same reason, this question is critical:

  5. Please categorize how you interact with upstream.
  • I need read/write access, and I have limited disk space.
  • I need read/write access, but a 1GB clone doesn’t scare me.
  • I only need read access.

Yes, the survey is longer and folks will have a few more checkboxes to select. I’m perfectly fine with this if it may save 5-10 min of discussion at the BoF (between 10% and 20% of our allocated time).

Note: other than the missing questions, I’m happy with the survey as is.
When filling it in, I wondered whether it would be better to split the questions about multi/mono repo across different pages. But I don’t have any strong opinion about it.

Hi Renato,

Thanks very much for putting this together.

I think the proposal document is almost finished now. Since I ended up reviewing it pretty thoroughly, I've gained a bit of understanding about the concerns we need input on.

Hi Duncan,

I think I have addressed all your points. The survey is now much more
complex, but I think more in line with the document and with more
practical answers (thanks for that).

Can you please confirm that the result looks good on your end?

https://goo.gl/forms/HBlyyDuEsH2tQ5Xi2

(I've caught up now.)

This looks great, thanks so much for filling it in.

Remaining concerns:

1. The minor comments I had in my response ~20 minutes ago.

2. The "disk space and read-only/read-write" question (#5) directly gathers data on the main concern with monorepo, and it's still missing.
=> If we don't collect data on this, we'll have no idea whether anyone cares about the concern.
=> Sparse checkouts, which you mentioned, do not have consensus as addressing this.
=> The Git-svn mirrors on the SVN bridge would address it, but there's a concern they could disappear somehow someday.
=> But all of this is only worth hashing out if there are real users that will be affected.
=> And assuming it is worth hashing out, the kind of solution we come up with in the BoF might depend on the *number* of real, affected users.

3. The multi/mono hybrid question (#12) directly gathers data on the compromise proposal, and it's still missing.
=> If we can show with the survey that no one wants this, we'll save a lot of time at the BoF by knowing that ahead of time.
=> On the other hand, if many people want it, someone should think through it deeply and be prepared to answer questions about it at the BoF.

I've actually got new wording for this one that I think is better (and I hope will demonstrate better why it's important):

12. The multi/mono hybrid variant merges some sub-projects, but leaves runtimes in separate repositories using the umbrella to tie them together. Is this the best or worst of both worlds?
- This is great. Native cross-project refactoring, without penalizing runtime-only developers.
- This is a compromise, but if I can't have multirepo, I want this.
- This is a compromise, but if I can't have monorepo, I want this.
- Whatever. I'll deal with it.
- This is terrible. All the transition pain of monorepo, without the advantages.

^ The difference is the new "compromise" options.
- Realistically, I don't think this is anyone's first choice, but we should have a "great" answer just in case someone surprises us.
- If there are a lot of "we are fine with the compromise" votes, that's useful to know, and might make this worth talking about.
- If there are a lot of "this is terrible" votes, that's useful to know as well.

How important is it to have multiple projects in one repository (for example, bisecting problems, code archaeology, etc)?

^ I think that should be required.

D'oh, fixed.

Do you believe the LLVM project would be better with a main Git repository, or keeping it in SVN would still be better? *

You're missing an opinion I've heard frequently cited:
- All our usage is already in Git, but the main repository being in SVN is a huge benefit (for whatever reasons).

Done.

thanks!
--renato

I had the same hesitation, and I don't have a strong opinion either,
but I left them together because they're largely similar and part of the
same "group".

Splitting also gets more confusing for people like me, who forget
what they're filling in half-way through and then answer the wrong
question because it's repetitive. Kept together, that's easier to spot.

cheers,
--renato

2. The "disk space and read-only/read-write" question (#5) directly gathers data on the main concern with monorepo, and it's still missing.
=> If we don't collect data on this, we'll have no idea whether anyone cares about the concern.
=> Sparse checkouts, which you mentioned, do not have consensus as addressing this.
=> The Git-svn mirrors on the SVN bridge would address it, but there's a concern they could disappear somehow someday.
=> But all of this is only worth hashing out if there are real users that will be affected.
=> And assuming it is worth hashing out, the kind of solution we come up with in the BoF might depend on the *number* of real, affected users.

Ok, I've changed the wording to be a bit more generic (section 4,
penultimate question).

3. The multi/mono hybrid question (#12) directly gathers data on the compromise proposal, and it's still missing.
=> If we can show with the survey that no one wants this, we'll save a lot of time at the BoF by knowing that ahead of time.
=> On the other hand, if many people want it, someone should think through it deeply and be prepared to answer questions about it at the BoF.

I think the argument here is that too many people were concerned about
how the mono-repo would be laid out to leave the question out.

For better or worse, it was easier to reach a consensus on the
sub-modules approach because there's only one way to do it (except the
web-hooks vs. server hooks problem).

But my point still stands: the description of that variant is
*substantially* less detailed than the other two and the concerns it
points out are very serious indeed. There are no proposals there, just
a thought dump in a paragraph.

So, I think we should change this question into a slide over the
mono-repo proposal that was put forward: "If not all, how much goes
into the mono-repo?"

I tried to convey that in section 6, one of the last questions.

Let me know.

cheers,
--renato

This question is the last one that does not seem great to me. Right now it reads:

If multi-repo is adopted, how do you plan to contribute to upstream? *

  • Using Git submodules for everything (checkout, commit, push)
  • Using the Git repos directly, submodules only for bisecting, etc.
  • Using the SVN bridges.
  • I don’t contribute.

The first answer mentions “commit” and “push” using submodules; it isn’t clear to me what that means in practice, since the submodules are supposed to be read-only (no one should be able to push to the umbrella).

Can you clarify or reword?

Ouch, sorry. It was a bit late yesterday. :)

Changed to "use for everything but commit". I'm not sure how much
better that is, but at least it's more accurate.

cheers,
--renato

LGTM on my end at this point, aside from a minor edit:

The mono-repo variant provides read/write access to sub-projects via an SVN bridge and git-SVN. Contributors will have the option to continue using repositories split on project boundaries. Assuming mono-repo gets adopted, how do you plan to contribute? *

s/git-SVN/git-svn/
(based on man page: https://git-scm.com/docs/git-svn)

Done.

--renato