running clang format on the Mips target

We are considering running clang format on the whole Mips target.

Is there any rule against this?

Is there any good argument against doing this even if there is no rule against it?

TIA.

Reed

We are considering running clang format on the whole Mips target.

Is there any rule against this?

I don't think there is a rule, but even simpler things like "remove
all trailing spaces" were not common in the past

Is there any good argument against doing this even if there is no rule
against it?

It think the argument was that it makes history harder to follow. A
git blame lands you on the reformatting patch and you have to manually
skip them.

I don't personally consider that a strong argument and I think polly
did just that (run clang-format on everything), so it might be OK
these days.

Cheers,
Rafael

Unfortunately for Polly it was not as clean as we started using clang-format early, so we had to format piece by piece the sections that clang-format could already handle.

However, we now have an automatic clang-format buildbot so formatting became (in some way) a 'solved problem'.

One benefit of fully formatted source code is that there is no need for any new patches to include 'formatting fixes', which previously sometimes obfuscated submitted patches and commits. In fact, patch reviews are now free of formatting discussions. If the patch passes
make polly-check-format, it is ready to be discussed. (If it does not,
make polly-update-format needs to be run). Also, having a single formatting commit to skip might arguably be easier, than to have a mixed history of formatting and patches.

One reason that may hold us back from formatting all LLVM code is the risk of clang-format changing its formatting behavior. The only thing worse than formatting all code is formatting all code twice. :wink:

largely stable the last months. Changes in its formatting behavior have normally been fixed quickly. So it may make sense to format more code.
I would still be afraid of formatting all LLVM, but maybe a backend is
a reasonable sized chunk.

Maybe Manual and Daniel have some experience in formatting a larger set of files.

Cheers,
Tobias

As someone with an lot of out-of-tree changes to the MIPS back end, I would consider this to be somewhat impolite. I hope to upstream the changes that are relevant to users of other MIPS-derived processors in the new year, and having large numbers of formatting changes would make rebasing a lot more painful.

David

Hi David,

What kind of "a lot of out-of-tree changes"?

You should push changes incrementally as you do work. Holding onto changes means that many things,
not just reformatting, can make them need to be redone. We frequently clean up and rewrite
code to make it cleaner and easier to maintain.

We are moving to a more strict internal review and pushing of changes and post commit reviewing.

It takes time to review and respond to comments on formatting issues; time that would be better spent doing new work.

So we would like to have robots, i.e. clang format, do this checking and such.

I would recommend that you start to submit your patches for review.

Reed

Hi Reed,

Hi David,

What kind of "a lot of out-of-tree changes"?

Support for a new (experimental) processor architecture, which is not likely to be upstreamed because it's not of interest to anyone who isn't using our architecture. While these changes are public, there is no point in upstreaming them (they also include some hacks in the middle parts of LLVM to support fat pointers, which might be cleaned up and upstreamed at some point if anyone apart from us is interested in fat pointers), because it's a rapidly changing experimental ISA. Our processor runs FreeBSD, and so I've also been fixing things that are required to let us move the FreeBSD/MIPS build from gcc 4.2.1 to clang.

You should push changes incrementally as you do work. Holding onto changes means that many things,
not just reformatting, can make them need to be redone. We frequently clean up and rewrite
code to make it cleaner and easier to maintain.

We are moving to a more strict internal review and pushing of changes and post commit reviewing.

I have had to fix a large number of bugs in the MIPS back end to allow us to build parts of the FreeBSD kernel and userland with LLVM. It would be nice if you had worked on getting the basic functionality working before chasing microMIPS and the various ASE things.

For us, the first priority is getting things to work with our branch. Once we've got things into a state where it's tested, we'll start upstreaming things. Our code is on github and Jack has the URL, so feel free to pull in anything you want in the meantime, or if there are specific revisions that you'd like me to clean up and rebase I'd be happy to do so. In particular, we have dsub* and daddi, dli, dla, la, and .cpsetup implemented (although not the most efficient implementation of any of them, hence these patches not being ready for upstreaming), .set noat / at doing the right thing, and have fixed (I think) MIPS IV support (although not yet MIPS III, which is an oversight given that Loongson 2F is MIPS III).

It takes time to review and respond to comments on formatting issues; time that would be better spent doing new work.

Has anyone done a code review of the existing MIPS back end?

So we would like to have robots, i.e. clang format, do this checking and such.

I would recommend that you start to submit your patches for review.

I intend to in the new year (and I've already mentioned this to Jack and Vladimir), but only the ones that are relevant for other MIPS consumers. Given the large number of non-standard extensions to MIPS that abound now (and this will only increase when we open source our softcore in the new year), I imagine that there will be no shortage of other MIPS-based back ends being maintained out of tree. The poor modularity in the existing code makes it difficult to upstream these, although I've been (very slowly) working on fixing that in our branch.

David

Daniel Sanders is the official maintainer for the Mips LLVM port.

If you are finding bugs, you should be filing them. We even have an open facing bugzilla for Mips. It would be
possible to arrange you to have an account there. Or you could file them in the official LLVM bugzilla.

If you want to help and fix them yourself or contribute other work; that's great and is very welcome.

When you do that; you should make a patch and push them upstream.

I seriously doubt that anyone at imagination/Mips is going to spend time digging though some external Git
repository to glean possible changes.

At this time you don't have pre-commit privilege so you need to submit them for review, like we all needed to in the beginning including Akira and myself, and get them approved. After some time it's possible to get promoted to post commit review. All of us are subject to post-commit review and per the rules of LLVM, Daniel, being the Mips maintainer, is the final word on any changes to the Mips target and all of us have to adhere to that.

There are procedures in LLVM for all these things and nobody is exempt from them.

Our compiler passes test-suite for Mips32, mips16 and micro mips. It can recurse itself and work on all the
standard benchmarks, plumhall c/c++/c++ library and many other things. Several years back, someone from outside
of Mips, who is not connected to Mips in any way, tested all the compilers with some networking test suites and code and the Mips compiler was the only one at that time that passed 100% out of the box.

Does it have bugs? Most likely unless physics somehow works differently for us. We run our own build bots on many flavors of the compiler
every night and it's a "stop all new work" whenever any of these variants goes red.

All compilers (and software) have bugs otherwise nobody would have invented bugzilla or any of the hundreds of other bug tracking programs.

If you have patches that would help clean up parts of the compiler; these are also welcome but you have to go through
the procedures that we all have to go through.

If it's something major, then you would be advised to submit an RFC and get some kind of buy in from the rest of the
Mips team before doing them.

Reed

Hi David,

I agree with you that it would be rude to simply clang-format the MIPS backend without coordination with any out-of-tree derivatives. To be honest, it hadn't occurred to me that there would be any such derivatives and the possibility wasn't raised in our internal discussion before we brought the subject up on this list. I'm keen to coordinate with such derivatives to minimize disruption as far as is reasonably possible. It's not going to be possible to completely eliminate all disruption (and as Reed notes, general development can also cause this disruption) but we can do our best.

I'm keen to get the bugfixes you mention upstreamed. Some of my colleagues have had a look at your git repo and tell me that some of the bugfixes lack testcases but should otherwise be ok to upstream. It also sounds like you have working MIPS-IV support. If this is the case then I'm keen to get that upstreamed as well since it will go some way towards properly supporting the older ISA's (MIPS-II and MIPS-III in particular) in use by some Linux distro's, OpenBSD, etc.

Has anyone done a code review of the existing MIPS back end?

Yes, all patches are reviewed either before or after commit. A wider review of the backend would be a sensible thing to do at some point in the near future. However, I have to balance the desirability of a perfect design and implementation with impracticality of achieving it and the business needs of our company. I will therefore need to work this around existing schedules and deadlines without disrupting them. At the moment, my aim is to free up developer-time from style issues in patches. The time saved in this area is likely to be significant since our team is split between three/four timezones and as a result discussions about patches can take multiple days.

I'm also keen to hear specific criticisms from outside our team. Could you elaborate on the issues you mention?

I'm keen to get the bugfixes you mention upstreamed. Some of my colleagues have had a look at your git repo and tell me that some of the bugfixes lack testcases but should otherwise be ok to upstream. It also sounds like you have working MIPS-IV support. If this is the case then I'm keen to get that upstreamed as well since it will go some way towards properly supporting the older ISA's (MIPS-II and MIPS-III in particular) in use by some Linux distro's, OpenBSD, etc.

Some have test cases, but the lack is the main reason why I haven't pushed them upstreamed yet. We have to demo our platform to DARPA in early January, so my current priority is getting it working enough for the demo code to build, even if that requires some quite embarrassing hacks. After that deadline is passed, I intend to tidy up the code and separate out all of the generic-MIPS parts from the parts specific to our CPU.

We do have MIPS IV working, as our base processor (BERI, which we've almost finished the paperwork required to open source and should be pushing our very soon) is a MIPS IV implementation. I've not poked at MIPS III, but I now have a Loongson 2F to play with, so I may have a poke at some point.

This would have been easier if a more systematic approach had been taken to the instruction definitions, starting with an ISA reference and defining them, along with their minimum version, and then working on patterns once the assembler was working.

Yes, all patches are reviewed either before or after commit. A wider review of the backend would be a sensible thing to do at some point in the near future. However, I have to balance the desirability of a perfect design and implementation with impracticality of achieving it and the business needs of our company.

Some design decisions, such as the way 64-bit registers are treated as 64-bit registers with 32-bit subregisters rather than as registers capable of containing 64-bit or 32-bit subvalues have complicated things a lot. Some of this is simplified, but I still find I have to duplicate patterns or add some hacky code because SelectionDAG decides that something should be an i32 and then it's regarded as a different register to something that takes an i64.

Lots of things look like they're work-arounds for TableGen or SelectionDAG limitations, but don't document in the code what these limitations are.

I will therefore need to work this around existing schedules and deadlines without disrupting them. At the moment, my aim is to free up developer-time from style issues in patches. The time saved in this area is likely to be significant since our team is split between three/four timezones and as a result discussions about patches can take multiple days.

I'm also keen to hear specific criticisms from outside our team. Could you elaborate on the issues you mention?

There are almost no comments in the code. Things like the expansion of JALR rely on some magic and some documentation would be very helpful.

There have been some very odd implementation choices, for example deciding to special-case hardware register 29, when it is actually less code to support all 31 hardware registers.

There are loads of test cases for [d]la with immediate arguments, but [d]la is almost always used with a symbol as the argument and this case wasn't working.

No effort has been made over the course of development to ensure that we get the same output via the assembler as we do via direct object code emission. This is a very simple and obvious sanity check, but the back end can't consume its own output, which means that bugs are very easy to slip in.

David

From: Dr D. Chisnall [dc552@hermes.cam.ac.uk] on behalf of David Chisnall [David.Chisnall@cl.cam.ac.uk]
Sent: 24 December 2013 14:33
To: Daniel Sanders
Cc: Reed Kotler; LLVMdev@cs.uiuc.edu
Subject: Re: [LLVMdev] running clang format on the Mips target

> I'm keen to get the bugfixes you mention upstreamed. Some of my colleagues have had a look at your git repo
> and tell me that some of the bugfixes lack testcases but should otherwise be ok to upstream. It also sounds like
> you have working MIPS-IV support. If this is the case then I'm keen to get that upstreamed as well since it will
> go some way towards properly supporting the older ISA's (MIPS-II and MIPS-III in particular) in use by some
> Linux distro's, OpenBSD, etc.

Some have test cases, but the lack is the main reason why I haven't pushed them upstreamed yet. We have to
demo our platform to DARPA in early January, so my current priority is getting it working enough for the demo
code to build, even if that requires some quite embarrassing hacks. After that deadline is passed, I
intend to tidy up the code and separate out all of the generic-MIPS parts from the parts specific to our CPU.

That sounds good to me. I hope the demo goes well.

It sounds like late January or early/mid February may be a good time to do the proposed clang-formatting since the
generic parts of your work will likely be upstream by then. Does that sound good to you?

We do have MIPS IV working, as our base processor (BERI, which we've almost finished the paperwork required
to open source and should be pushing our very soon) is a MIPS IV implementation. I've not poked at MIPS III, but
I now have a Loongson 2F to play with, so I may have a poke at some point.

This would have been easier if a more systematic approach had been taken to the instruction definitions, starting
with an ISA reference and defining them, along with their minimum version, and then working on patterns once the
assembler was working.

I agree that that it would have been better to have the complete ISA version information in the .td files even with the
decision in LLVM 3.0 (prior to which the MIPS target was experimental) to not support the older ISA's. The older
ISA's still wouldn't be officially supported but it would have been easier to add them and it's likely that a considerable
amount would have worked with only minor patches.

> Yes, all patches are reviewed either before or after commit. A wider review of the backend would be a sensible thing
> to do at some point in the near future. However, I have to balance the desirability of a perfect design and
> implementation with impracticality of achieving it and the business needs of our company.

Some design decisions, such as the way 64-bit registers are treated as 64-bit registers with 32-bit subregisters rather
than as registers capable of containing 64-bit or 32-bit subvalues have complicated things a lot. Some of this is
simplified, but I still find I have to duplicate patterns or add some hacky code because SelectionDAG decides that
something should be an i32 and then it's regarded as a different register to something that takes an i64.

Lots of things look like they're work-arounds for TableGen or SelectionDAG limitations, but don't document in the code
what these limitations are.

Unfortunately, allowing multiple types in the same register has similar issues. When there are multiple choices of types
(after type-inferencing) it requires you to explicitly specify types to resolve it to a single choice. This then forces you back
into duplicating the patterns to cover all the types you wanted to cover. I've hit this problem in a few places in the
implementation of MSA.

> I will therefore need to work this around existing schedules and deadlines without disrupting them. At the moment, my
> aim is to free up developer-time from style issues in patches. The time saved in this area is likely to be significant since
> our team is split between three/four timezones and as a result discussions about patches can take multiple days.
>
> I'm also keen to hear specific criticisms from outside our team. Could you elaborate on the issues you mention?

There are almost no comments in the code. Things like the expansion of JALR rely on some magic and some documentation
would be very helpful.

There have been some very odd implementation choices, for example deciding to special-case hardware register 29,
when it is actually less code to support all 31 hardware registers.

There are loads of test cases for [d]la with immediate arguments, but [d]la is almost always used with a symbol as the
argument and this case wasn't working.

No effort has been made over the course of development to ensure that we get the same output via the assembler as we
do via direct object code emission. This is a very simple and obvious sanity check, but the back end can't consume its
own output, which means that bugs are very easy to slip in.

David

Thanks, those are fair criticisms.

Many of them are a result of us using binutils or direct-object-emission in our LLVM-based toolchains. This has resulted in a
strong focus on code-generation and direct-object emission but the assembler has had far less attention. This has been fine
for our purposes so far but may be something we ought to re-visit. I'll discuss this with the managers when I'm back in the office.

The comments and documentation situation has started to improve but has a long way to go. Jack Carter in particular has
been raising lack of API and implementation level comments in non-trivial functions in several internal patch reviews.
We currently lack documentation for many user-level issues (e.g. what intrinsics are supported, the constraints available to
inline assembly, etc) and higher level design issues (e.g. the meaning of 'SE', magic holding things together, valid but unexpected
code-selection, etc.). I need to think about the best way to do some of this since inaccurate or out-of-date documentation is
arguably worse than no documentation and it can be difficult to identify the documentation affected by a given piece of code.

Thanks Rafael and Tobias for the feedback. It's particularly good to hear that clang-format has helped Polly eliminate formatting discussions from patch reviews since this is one of the key benefits we hope to gain from doing this. The distribution of our team across different time zones makes these discussions rather slow.

One reason that may hold us back from formatting all LLVM code is the risk of
clang-format changing its formatting behavior. The only thing worse than
formatting all code is formatting all code twice. :wink:

That's a reasonable concern both for all LLVM code and for a single backend. I think that the benefits of mass-formatting outweigh the costs as long as we rarely need to do it. However, doing it frequently becomes more of a hindrance than it's worth.

It seems unlikely that clang-formats behaviour will change significantly (excluding occasional bugs). For example, indentation style and depth should remain constant. As long as any behaviour changes are fairly minor (such as the precise point a line break is inserted for >80 col lines), then it should be ok to do one mass-format and then maintain the good formatting by formatting only the lines touched by each patch. Eventually the patches might get noisy enough to warrant another mass-format but I would hope that it would take upwards of 5 years to reach the point where it becomes a sufficient nuisance again.

Manuel, Daniel,

I forgot to CC you in this mail. The thread here may be interesting to you, also in the light of the last discussion we had.

Tobias

FYI, our usual recommendation is to use clang-format only on changed code, unless there’s strong evidence that the cost of the initial reformatting is basically zero (which usually means a team / community is pretty unanimous about the decision).

I would slightly alter this recommendation depending on whether the codebase is currently in a reasonably consistent state (both internally and wrt. what clang-format would do).

The basic assumption is that code is easier to read if it is more consistent.

If the code base is currently inconsistently formatted, then a full reformatting might provide benefit. If it is significantly different from clang-format, then an incremental approach has the cost of making the code more inconsistent (at least temporarily).

Thanks for the recommendations.

The MIPS team itself is unanimously agreed that an initial formatting is worth the cost. I’ve had an objection from David Chisnall which I’m trying to accommodate by waiting for him to upstream the relevant parts of his out-of-tree modifications before going ahead. I haven’t had any other objections but I intend to send an email giving three weeks’ notice calling for comments/objections in case there are others.

At the moment, the MIPS backend is generally consistent inside each file but not between them. When I ran clang-format on lib/Target/Mips/*.{cpp,h} in my working copy, the diff was 555KB, so I’d say it’s generally inconsistent with clang-format.

Thanks for the recommendations.

The MIPS team itself is unanimously agreed that an initial formatting is worth the cost. I've had an objection from David Chisnall

Hi Daniel,

Without weighing in on either side of that discussion, I'd like to point out a technical solution to merging out-of-tree changes following upstream reformatting.

It should be possible to run something like the following on any out-of-tree branch to reformat each commit individually:

# Reformat divergent commits in history
git filter-branch --tree-filter "git diff -U0 | clang-format-diff.py -p1 -i"
# Rebase to ToT
git rebase origin/master

With these commands, version history will be rewritten to match upstream formatting and the individual commits will apply without conflicts regardless of when the branch was forked. We've used variations of this to great effect.

So if you're absolutely decided that you want to reformat the whole tree, I don't see a strong reason to wait three weeks.

Alp.