Git autocrlf for Windows - why does the Getting Started guide say to use false?

Hi all,

TL;DR - should we recommend core.autocrlf=input instead of core.autocrlf=false on Windows?

I recently switched from doing pre-commit builds and tests in my Ubuntu VM to doing them on my Windows machine, where I do all my development. I was aware that line endings are an issue, so I made sure to set core.autocrlf=false as directed in the Getting Started guide (see https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm and https://llvm.org/docs/GettingStarted.html#checkout-llvm-from-git, which explicitly say to use “false”). This seemed a little odd to me. By my understanding, it would mean that any edits I made could end up with Windows line endings. I assumed I had misunderstood something, so I just went along with it.

However, at some point in one of my commits (at least I assume it was one of mine, since I was working on the file), I somehow managed to change an existing file to use Windows line endings, which someone else fixed for me (see commit aca3e70d2bc0dd89b7d486c2a8eac70d8a89e790).

Having double-checked my settings on my downstream clone, I noticed that I had core.autocrlf set to input there. As far as I know, I’ve never had any line-ending problems working with LLVM in that repo. My understanding is that “input” should avoid problems such as mine by ensuring the checked-in files have LF line endings, whilst everybody, regardless of system, will get native line endings in their actual clones. I am aware that some files deliberately use Windows line endings. I’m guessing that these are not widespread and don’t need modifying regularly, so presumably the users who need to edit those files can change their setting as needed.

I’m happy to update the docs, but I want to make sure that any update I make is correct!

James

The previous discussion about it was here: https://reviews.llvm.org/D48494

Personally, I’d rather see whether we could solve it with some clever use of .gitattributes than make it a user setting, but sadly I don’t have time right now to commit to trying out a solution.
FWIW, I notice there is already one case of .gitattributes being used to make sure autocrlf doesn’t change line endings:
https://github.com/llvm/llvm-project/blob/master/llvm/.gitattributes#L19

In general, I think you’re asking for trouble if your source control system is doing anything other than storing the bits you send to it. It shouldn’t be editing files for you.

There are some places where we specifically want to test that tools behave correctly with certain line endings, and core.autocrlf=input will break those tests. That could probably be solved with proper use of .gitattributes files, but as Greg mentions, we aren’t there yet.

My solution for this in the past has been a git extension command I call dos2unix. When I run git dos2unix, it goes through my list of staged changes and runs dos2unix on each of them, and then I can git add -u the result. This ensures I’m always committing Unix line endings.
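The actual extension isn't shown in the thread. Below is a minimal Python sketch of the same idea. It assumes Git's convention that an executable named git-<name> on PATH can be run as git <name>. It does the CRLF-to-LF rewrite itself rather than calling the dos2unix tool, and it treats any file containing a NUL byte as binary and skips it. All function names are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a hypothetical `git dos2unix` extension command.

Saved as an executable named `git-dos2unix` on PATH, it can be run as
`git dos2unix`. It rewrites CRLF to LF in every staged (added, copied or
modified) file; afterwards you review and stage the result with `git add -u`.
"""
import subprocess
import sys
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with LF; lone CRs are left alone."""
    return data.replace(b"\r\n", b"\n")


def normalize_files(paths: list[Path]) -> list[Path]:
    """Convert each file in place and return the files that actually changed."""
    changed = []
    for path in paths:
        data = path.read_bytes()
        if b"\0" in data:  # crude binary heuristic: skip files containing NUL
            continue
        fixed = crlf_to_lf(data)
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(path)
    return changed


def staged_paths() -> list[Path]:
    """Staged files as absolute paths (git prints them relative to the top level)."""
    top = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout.split("\0")
    return [Path(top, name) for name in names if name]


def main() -> int:
    for path in normalize_files(staged_paths()):
        print(f"converted {path}")
    return 0


# Only act when launched under the git-dos2unix name, so the module can also
# be imported (for example, to test normalize_files) without touching a repo.
if __name__ == "__main__" and Path(sys.argv[0]).name == "git-dos2unix":
    sys.exit(main())
```

After it runs, git diff shows exactly which lines lost their CRs, so the conversion can be checked before git add -u stages it.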

Right, my understanding is that files that specifically need CRLF should say so via .gitattributes somewhere.
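To check which files actually rely on CRLF in the repository, one option is git ls-files --eol (Git 2.8 and later). For each tracked file, it reports the line-ending style of the index copy, the style of the working-tree copy, and the text/eol attributes in effect. The Python helper below parses that output. It assumes the NUL-terminated record format that -z produces, and its names are hypothetical.

```python
"""Hypothetical audit helper built on `git ls-files --eol -z`.

Lists tracked files whose *repository* copy has CRLF or mixed line endings,
together with their eol-related attributes, so you can see which ones are
(or are not) marked in .gitattributes.
"""
import subprocess
from typing import NamedTuple


class EolInfo(NamedTuple):
    index: str     # style of the index copy, e.g. "lf", "crlf", "mixed", "-text"
    worktree: str  # style of the working-tree copy
    attr: str      # e.g. "text=auto", "-text", "text eol=crlf"; "" if unset
    path: str


def parse_ls_files_eol(output: str) -> list[EolInfo]:
    """Parse records of the form 'i/<eol> w/<eol> attr/<attr>\\t<path>\\0'."""
    entries = []
    for record in output.split("\0"):
        if not record:
            continue
        fields, _, path = record.partition("\t")
        # The attribute part may contain spaces ("text eol=crlf"), so split
        # at most twice and keep the remainder together.
        i, w, attr = fields.split(maxsplit=2)
        entries.append(EolInfo(
            index=i.removeprefix("i/"),
            worktree=w.removeprefix("w/"),
            attr=attr.strip().removeprefix("attr/"),
            path=path,
        ))
    return entries


def crlf_in_index(entries: list[EolInfo]) -> list[EolInfo]:
    """Entries committed with CRLF or mixed endings."""
    return [e for e in entries if e.index in ("crlf", "mixed")]


def audit() -> None:
    """Run inside a checkout and print the CRLF/mixed files it contains."""
    out = subprocess.run(
        ["git", "ls-files", "--eol", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    for e in crlf_in_index(parse_ls_files_eol(out)):
        print(f"{e.index:6} attr={e.attr or '(none)'}  {e.path}")
```

Any file that audit() prints with attr=(none) is a candidate for an explicit .gitattributes entry.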

autocrlf=false ought to work if your Windows editor doesn’t (or can be persuaded not to) introduce CRLFs. I’ve observed that the Visual Studio editor detects the line-ending style of the file you’re editing and imitates it. This works great until you create a new file, so I’ve learned to “create” a file by copying an existing file and then editing it.

I’m inclined to think the llvm.org docs should say to use “input” on Windows; we’ve been happy with that in the Sony repo, as James mentioned.

But I’d also like to hear from a Windows user from outside Sony first.

–paulr

I thought the handful of test files that require specific line endings were marked as “binary” so that git won’t treat them as text.

I agree that, in theory, you don’t want your source control to change your source code, and that’s a complaint I’ve had about older source control systems that replace tokens like $Version$ upon check-in. That’s not the code you tested.

I’ve been getting by with false on Windows. You can tell Visual Studio (at least 2017 and later) to warn you if you load a file with inconsistent line endings. I believe the VS 2019 editor has options to select which type of line endings you want by default. It will also show you the file’s line-ending type in the status bar.
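The status-bar indicator and the inconsistent-line-endings warning are features of Visual Studio itself. As a rough, editor-independent stand-in, a short script can classify files before you commit them. The function name line_ending_style is hypothetical, and the script ignores lone CR (classic Mac) endings.

```python
import sys
from pathlib import Path


def line_ending_style(data: bytes) -> str:
    """Classify line endings as "LF", "CRLF", "mixed", or "none"."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # LFs not preceded by CR
    if crlf and bare_lf:
        return "mixed"
    if crlf:
        return "CRLF"
    if bare_lf:
        return "LF"
    return "none"


if __name__ == "__main__":
    # Usage: python line_endings.py FILE... (prints one style per file)
    for name in sys.argv[1:]:
        print(f"{line_ending_style(Path(name).read_bytes()):5} {name}")
```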

I also think we should recommend input, if it does what I think it does: convert from CRLF to LF, and never back.

This can break a few cases:

  • adding binary files for a test that you forgot to mark binary
  • adding CRLF text files for a test that you forgot to mark binary

These seem like rarer cases than adding a new source file, which is often mistakenly committed with CRLF. Any screwups, reverts, and bot breakages from those cases seem less costly than accidentally committing line-ending changes, which live forever as a tax on git annotate.
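To make “convert on the way in, never back” concrete, here is a simplified Python model of the three core.autocrlf settings. The function names are hypothetical. The model ignores .gitattributes, core.safecrlf, and Git's special handling of files already committed with CRLF. Git's real binary detection is also more involved than the NUL-byte check used here.

```python
"""Simplified model of what core.autocrlf does to a file's bytes.

Assumptions: no .gitattributes applies, core.safecrlf is ignored, files
already committed with CRLF are not special-cased, and "binary" is
approximated by the presence of a NUL byte.
"""


def looks_binary(data: bytes) -> bool:
    return b"\0" in data


def on_add(data: bytes, autocrlf: str) -> bytes:
    """Bytes stored in the repository when the file is staged."""
    if autocrlf == "false" or looks_binary(data):
        return data
    # Both "true" and "input" turn CRLF into LF on the way in.
    return data.replace(b"\r\n", b"\n")


def on_checkout(blob: bytes, autocrlf: str) -> bytes:
    """Bytes written to the working tree when the file is checked out."""
    if autocrlf != "true" or looks_binary(blob):
        return blob  # "input" and "false" never add CRs on the way out
    # Normalize first so existing CRLFs don't become CRCRLF.
    return blob.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Under this model, a CRLF test input committed with “input” comes back LF-only, which is the second failure case in the list above. A Windows clone using “input” also keeps LF in its working tree, because checkout never adds CRs.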

I use almost exclusively four text editors (for some meaning of “editor”) in my usual workflow. The main two are Sublime Text (for tests) and Visual Studio 2017 (for basically everything else). The third is Beyond Compare, which I use for diffs, and I often make minor edits in it before committing. Finally, I use the p4merge tool for resolving merge conflicts. On further investigation, the last of these looks like the culprit for messing up my line endings: when I save a merge resolution, it saves with Windows line endings. It has an explicit setting to use LF-only endings, though, so I have fixed this. Hopefully this will stop me from messing up existing files from now on. It doesn’t solve the new-file issue, however. Nor does it solve the issue for people new to the project, since it is an obvious counterexample to the “all reasonable text editors do the right thing” comment.

I do use dos2unix, but only when generating new patches to upload to Phabricator. I did experiment with it once when I thought something looked fishy about the line endings in the code I was looking at, but it seemed to touch the whole file. The file was probably in a bad state already, though, and I just hadn’t realised.

I do think fixing .gitattributes sounds like the best solution. I’d certainly prefer input to be the recommended setting, as it solves the new-file issue, and I might well change my own settings to it to fix things for my workflow. I also note that git warns if your line endings would be changed automatically when you add a file. As a possible compromise, I’d propose adding a note to the documentation, or possibly presenting a pair of options with the caveats of each. However, I can’t think of a concise, coherent way of saying it. Suggestions welcome!

What’s the state of this right now? I have seen several changes to llvm/.gitattributes, but there is no clang/.gitattributes file, and this thread seems to indicate that some of the tests under clang are the problem.

I’ve not made any attempts myself to resolve this, as I am unfamiliar with the tests that require the setting. I’m also not certain there’s been a proper consensus on the topic. That being said, I set my Windows git config to core.autocrlf=input quite some time ago. I haven’t seen any issues in day-to-day development, so I’m assuming that the issue, if it still exists, only affects people who actually modify the affected files.

I’ve made some fixes in this area recently. I had the LLVM test suite running clean, and I fixed a few issues with Clang. Some of the Clang issues were just tests checking file offsets. Others are because Clang’s tools don’t respect line endings, which is a little harder to fix.

I think it would be great to get this all fixed. Among other things, being able to turn on autocrlf would stop Windows users from accidentally changing every line in a file.