RFC: Document and standardize Python code style

the only justification being “that’s black’s default.”

For python I think it would be best to go with “whatever is common/default”. I think it makes sense for LLVM to have opinions on C++ style, but I don’t see why the project should take a stance on python style. It will just increase the work for people needed to configure their editors, IDEs and tools to project specific settings. We also didn’t aim for 2 space indentation in python either which would be LLVMs C++ default…

That said I am a bit confused in that PEP-8 specifies 79/80 line length while the black documentation heavily defends their standard of 88 :-/ But given blacks ubiquity in the python world (as far as I can tell) I think we best go with their default anyway…

1 Like

I don’t have a strong opinion in the question about the line lengths. But there is a practical problem, setting the line length in black requires a command line switch with no config file (you are not supposed to customise black, I guess). So that will probably lead to people invoking it with the wrong options.

So my vote would be on the default 88. But I won’t argue loudly about it.

Feh. Well, having an easy-to-use tool is worth some minor inconveniences. I won’t kick any more about 88.

Is there a way to detect code style issues during build time(similar to pylint) that we can enforce so that it is easy to flag issues ?

Ping on this, I would like to move forward and close this topic out. Any concerns with committing the documentation changes and start batch updating the style with black?

2 Likes

No objections from me. I’d be happy to volunteer to do the LLDB parts so I can synchronize it with doing the same downstream and avoid merge conflicts.

Surfacing a comment from ⚙ D150782 [NFC][Py Reformat] Reformat python files in mlir subdir (I haven’t seen this discussion previously): MLIR has been using YAPF with Google style for a while for Python code. Running that results in a relatively clean diff (⚙ D151112 WIP: reformat MLIR python with yapf, 350 LoC) compared to near-complete reformatting with Black + PEP-8 (⚙ D150782 [NFC][Py Reformat] Reformat python files in mlir subdir, 33000 LoC). I am interested in arguments for/against letting MLIR keep using the pre-existing style.

I am against having different formatters for different subdirs because of project cohesion reasons. I think it will complicate reviews, CI workflows and communication about the code style for python. The upside of avoiding a one time reformat doesn’t seem enough to me to create a long term diversion.

I really don’t have much more to say about that. But I am open to hear what others think.

1 Like

I agree that diverging isn’t ideal, there should be a high bar to justify divergence IMO.
Can you summarize the yapf vs black situation? You mentioned yapf in the thread and expressed a personal preference for black, but it wasn’t discussed further as far as I can tell.

I’m agnostic here, Python isn’t my forte, and a quick internet search makes it look like it’s very polarized (reddit for example).
Here is a nice internal doc from an open-source project to motivate switching from Yapf to Black: [external] Proposal to Format Ray Code with Black - Google Docs (just posting this for reference, this is basically the kind of doc I was looking for to justify switching MLIR from yapf to black).

I prefer black mostly because it doesn’t have a million knobs and settings. It’s easy - doesn’t require settings or args. You run it and get formatted code.

I argued for PEP-8 formatting since it’s close to a standard you can get for python code style and black gives us that without any settings.

Even if we switched to YAPF we need to select a style and we have previously landed on PEP8. So it’s not certain that it will be a almost empty diff with MLIR in that case.

I think the linked document makes a lot of good points as well that I all agree with.

If there is a strong desire from anyone to change what’s already accepted into the documentation and we already have posted several patches for, I hope that this person can pick up this work and finish it - I don’t have more time to put into this topic.

1 Like

Hey everyone,

I’m very excited about this effort. I saw that @tobiashieta already landed his reformat patch for llvm and I’d happy to help with lldb reformat.

Have you thought about adding python format checks in Phabricator the same way we do it with C++ ?

I think we should also document clearly what’s the easiest way to format a patch, may be add a mention for python code formatting in the “How to Submit a Patch” section of tutorial ?

I’ve personally been using darker, which will only reformat the committed python changes instead of the whole file, similarly to git-clang-format.

3 Likes

Hi! Thanks for your interest in helping out.

I would like it if someone added pre-commit testing in Phabricator; I don’t have the know-how or the time to learn. But if it’s something you want to help out with, I am more than happy to receive that help :slight_smile:

Documentation updates are also welcome, of course! Darker seems interesting, for sure. I think the best way forward would be to test how to set up the pre-commit testing and then open a new forum thread with your suggestions. Then we can discuss the details better.

Thanks again for helping out!

1 Like

Hi,

I missed this discussion and the patch that applied the formatting in various subdirectories like libc++. I just wanted to give some feedback regarding the results that Black generates. In various places where we had “repetitive” code that was meant to be aligned together, Black generated something quite difficult to read that lost useful formatting properties of the original code. Is it possible to turn off Black for a local scope like // clang-format off?

One example of this is here: ⚙ D150763 [NFC][Py Reformat] Reformat python files in libcxx/libcxxabi

The original code was:

Parameter(name='use_sanitizer', choices=['', 'Address', 'HWAddress', 'Undefined', 'Memory', 'MemoryWithOrigins', 'Thread', 'DataFlow', 'Leaks'], type=str, default='',
            help="An optional sanitizer to enable when building and running the test suite.",
            actions=lambda sanitizer: filter(None, [
            AddFlag('-g -fno-omit-frame-pointer') if sanitizer else None,

            AddFlag('-fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all') if sanitizer == 'Undefined' else None,
            AddFeature('ubsan')                                                                          if sanitizer == 'Undefined' else None,

            AddFlag('-fsanitize=address') if sanitizer == 'Address' else None,
            AddFeature('asan')            if sanitizer == 'Address' else None,

            AddFlag('-fsanitize=hwaddress') if sanitizer == 'HWAddress' else None,
            AddFeature('hwasan')            if sanitizer == 'HWAddress' else None,

            AddFlag('-fsanitize=memory')               if sanitizer in ['Memory', 'MemoryWithOrigins'] else None,
            AddFeature('msan')                         if sanitizer in ['Memory', 'MemoryWithOrigins'] else None,
            AddFlag('-fsanitize-memory-track-origins') if sanitizer == 'MemoryWithOrigins' else None,

            AddFlag('-fsanitize=thread') if sanitizer == 'Thread' else None,
            AddFeature('tsan')           if sanitizer == 'Thread' else None,

            AddFlag('-fsanitize=dataflow') if sanitizer == 'DataFlow' else None,
            AddFlag('-fsanitize=leaks') if sanitizer == 'Leaks' else None,

            AddFeature('sanitizer-new-delete') if sanitizer in ['Address', 'HWAddress', 'Memory', 'MemoryWithOrigins', 'Thread'] else None,
            ])),

The new code is:

Parameter(
    name="use_sanitizer",
    choices=[
        "",
        "Address",
        "HWAddress",
        "Undefined",
        "Memory",
        "MemoryWithOrigins",
        "Thread",
        "DataFlow",
        "Leaks",
    ],
    type=str,
    default="",
    help="An optional sanitizer to enable when building and running the test suite.",
    actions=lambda sanitizer: filter(
        None,
        [
            AddFlag("-g -fno-omit-frame-pointer") if sanitizer else None,
            AddFlag(
                "-fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all"
            )
            if sanitizer == "Undefined"
            else None,
            AddFeature("ubsan") if sanitizer == "Undefined" else None,
            AddFlag("-fsanitize=address") if sanitizer == "Address" else None,
            AddFeature("asan") if sanitizer == "Address" else None,
            AddFlag("-fsanitize=hwaddress") if sanitizer == "HWAddress" else None,
            AddFeature("hwasan") if sanitizer == "HWAddress" else None,
            AddFlag("-fsanitize=memory")
            if sanitizer in ["Memory", "MemoryWithOrigins"]
            else None,
            AddFeature("msan")
            if sanitizer in ["Memory", "MemoryWithOrigins"]
            else None,
            AddFlag("-fsanitize-memory-track-origins")
            if sanitizer == "MemoryWithOrigins"
            else None,
            AddFlag("-fsanitize=thread") if sanitizer == "Thread" else None,
            AddFeature("tsan") if sanitizer == "Thread" else None,
            AddFlag("-fsanitize=dataflow") if sanitizer == "DataFlow" else None,
            AddFlag("-fsanitize=leaks") if sanitizer == "Leaks" else None,
            AddFeature("sanitizer-new-delete")
            if sanitizer
            in ["Address", "HWAddress", "Memory", "MemoryWithOrigins", "Thread"]
            else None,
        ],
    ),
),
2 Likes

According to the docs:

Black reformats entire files in place. It doesn’t reformat lines that end with # fmt: skip or blocks that start with # fmt: off and end with # fmt: on. # fmt: on/off must be on the same level of indentation and in the same block, meaning no unindents beyond the initial indentation level between them.

Maybe try fmt:off?

Thanks for reading the docs for me :slight_smile:

I’ll apply it to the places where Black did a poor job, for the rest I think an automated format is indeed a good thing. Cheers!

Do we need to document in what cases we recommend this should be used?

Probably not, AFAIK we don’t for clang-format yet use the directives at various places in our tree?

Earlier in this discussion, there was some confusion about black’s support for config files. It seems worthwhile to point out that black does support config files.

For example, I appended the following to llvm/utils/lit/pyproject.toml, and it had the expected effect on black’s behavior within lit:

[tool.black]
required-version = 23
line-length = 80

required-version = 23 is one possible way to address the following issue noted in the LLVM coding standard:

Black allows changing the formatting rules based on major version. In order to avoid unnecessary churn in the formatting rules we currently use black version 23.x in LLVM.

If people feel it’s best to go with black’s defaults no matter what and avoid settings like line-length = 80 (to override the default of 88), should we adjust the LLVM coding standard to clearly state that requirement? Currently, other text leads me to believe that black command-line options or config files must be supplied in order to obey the standard:

2 Likes

I think this was less about whether config files are supported or not; but how we would motivate different defaults. I would think there is a good chance that the black developers have a lot of experience on good python style, while LLVM as a project has relatively little python code; so it seemed like a good idea to just go along with black defaults and also tolerate possible churn between versions.

LLVM currently is pretty loose in enforcing linters and code formatters. We don’t even reformat and or strictly enforce clang-format for our C++ code (let alone require a specific version of clang-format). So given that precedent it made sense to go with the current scheme… I’m not saying this is better or my personal preference, but I am saying that this follows the looser spirit we have with C++ formatters and I think it’s good to follow a similar spirit here…

(And if you want to discuss enforcement of clang-format in LLVM, then we better start a new thread from scratch to not derail this discussion…)

1 Like

That can be true, but if they are so experienced and have compelling reasons for this default, why PEP-8 is still saying 80 columns? On top of that, I’m pretty sure that black is not the last Python formatter we are going to use. I’d expect that in Python ecosystem asking for PEP-8 is easier that PEP-8 with black defaults.