LLVM source file line endings

Hi all,

This seems like a basic question, but does LLVM source code require any specific line ending encoding? I recently committed 2 new files to the LLVM repo (ModRef.cpp and OptionStrCmp.cpp) and both seem to have DOS-style line endings (\r\n), whereas most other files seem to use Unix line endings. It's causing some of our internal builds to fail, and while the fix is simple (run dos2unix on the 2 files), I was wondering if there is a convention that LLVM uses and, if not, whether we should automatically convert to Unix line endings (see Configuring Git to handle line endings - GitHub Docs), maybe with “text” for .c/.cpp/.h/.td files?

Alternatively, it seems anyone adding new files needs to manually make sure that the line endings are Unix style in order to prevent inadvertent build failures downstream.

Another possibility is a CI warning. But handling this automatically, if possible, seems preferable.
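
A CI warning along those lines could be as small as the sketch below. It is only an illustration, not an existing LLVM script: the names `has_crlf`, `find_crlf_files`, and `main` are made up here, and a real check would need to skip the test files that intentionally use DOS endings.

```python
from pathlib import Path
from typing import Iterable, List


def has_crlf(data: bytes) -> bool:
    """Return True if the byte string contains a DOS (CRLF) line ending."""
    return b"\r\n" in data


def find_crlf_files(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths whose contents contain CRLF."""
    offenders = []
    for path in paths:
        # Read raw bytes so Python does not translate newlines for us.
        if has_crlf(Path(path).read_bytes()):
            offenders.append(path)
    return offenders


def main(argv: List[str]) -> int:
    """Print a warning per offending file; return 1 if any were found."""
    offenders = find_crlf_files(argv)
    for path in offenders:
        print(f"warning: {path} has DOS (CRLF) line endings")
    return 1 if offenders else 0
```

A CI job could pass the list of changed files to `main` and use the return value as the exit status, for example via `sys.exit(main(sys.argv[1:]))`.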

(I am still not sure how the DOS line endings sneaked in; I suspect it's VSCode used with WSL2, which defaults to DOS-style line endings.)

Thanks,
Rahul

Files in the LLVM project should almost always have Unix line endings. There are a small number of test files that purposely have DOS endings, but aside from that, no.

I have core.autocrlf=input in my global config, which means files get checked out with the line endings stored in the repository, and any CRLF is converted to LF on commit. Modern Windows editors (even Notepad) can all tolerate LF endings and won’t randomly insert CR if the file starts out with LF endings. I also tend to copy an existing file if I need to create a new file, just to make sure I don’t screw up.

Thanks. Yeah, that makes sense. My question then is really whether we want this to be enforced in git itself with the .gitattributes config option (assuming it works the way I think it does, so that any files with DOS endings get converted to Unix endings as part of committing the file).
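
For reference, the conversion being asked about amounts to roughly the following. This is a simplified sketch of CRLF-to-LF normalization, not git's actual implementation: `looks_binary` and `normalize_to_lf` are hypothetical helpers, and the NUL-byte test is only a crude stand-in for real binary detection.

```python
def looks_binary(data: bytes) -> bool:
    """Crude heuristic (an assumption of this sketch): NUL byte means binary."""
    return b"\x00" in data


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF to LF in text content; leave binary content untouched."""
    if looks_binary(data):
        return data
    # Only CRLF pairs are rewritten; a lone CR is left as-is.
    return data.replace(b"\r\n", b"\n")
```

Content that already uses LF passes through unchanged, so applying the function twice gives the same result as applying it once.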

I took a quick look at llvm/.gitattributes and it explicitly specifies line endings on certain files (and ‘binary’ for some others). It doesn’t specify a general default, though.

I know this topic has come up before, you could look to see if there’s an old RFC and see what conclusions it came to (if any). You can always put up a new one, too. There are definitely people with Opinions on this topic.

See Finally formalise our defacto line-ending policy by ldrumm · Pull Request #86318 · llvm/llvm-project · GitHub for a related PR. Not sure why that never landed.

Thanks. Maybe @ldrumm can comment on why it did not land.

On Tue Sep 24, 2024 at 4:20 PM BST, Rahul Joshi via LLVM Discussion Forums wrote:

Thanks. Maybe @ldrumm can comment on why it did not land.

It never landed because I got distracted. I should have some bandwidth to get this over the line very shortly. This thread is good incentive to get back to it, so thanks. Feel free to comment on that PR if there are issues with the implementation.

Great, thanks! Given that this seems to recur often, it would be great to land this (I myself committed 2 of these line ending fixes in the last 2 days).

I’ve just merged.

To github.com:llvm/llvm-project.git
   8c60efe94ba3..9d98acb196a4  main -> main

Thanks for the push to get this done and please let me know of any unforeseen consequences!

Thanks @ldrumm. I think this might deserve a PSA and a mention in LLVM Weekly so that it does not catch folks by surprise. @asb

We were hit by this issue again (see [NFC][LLVM] Fix line endings for DXILABI.cpp by jurahul · Pull Request #160791 · llvm/llvm-project). I know that the PR above, which made git do this automatically, was reverted due to issues, but can we say somewhere in the coding standard (or some other place) that all source code files are expected to use Unix-style line endings? Test files (like HLSL source code) can continue to use Windows-style line endings.

@nikic and others, I am seeking opinions on whether we can make that a part of the coding standard (in the microscopic-details section). It seems this is already a loosely followed convention, and not conforming to it results in downstream breakages for us (maybe others as well?) due to git and perforce interactions.

Making it part of the coding standard makes sense to me.

Thanks, I have started [CodingStandard] Require Unix line endings for C/C++ source and headers by jurahul · Pull Request #161228 · llvm/llvm-project

While the PR for the coding standard is being reviewed, I did this experiment based on @vbvictor's prodding. I added LineEnding: LF to llvm/.clang-format, ran unix2dos on a file, and then ran clang-format, which restored the Unix line endings. Is that a solution to consider? That way, git itself does not make any source changes to the file, and the conversion is explicit in the clang-format step.

And here’s a test PR to demonstrate that this catches any non-Unix line endings in the PR before committing: Test clang-format based unix line ending by jurahul · Pull Request #161257 · llvm/llvm-project. I also verified that test files (under llvm/test) are unaffected by the change, so it only applies to actual source files that are built as part of the LLVM build.