Well, this is definitely ABI breaking, so effectively a new ABI is what I meant.
I had the same initial reaction when it was first explained to me, but it doesn't actually break ABI compatibility. It's really a seamless migration. Suppose you have a big source base such as 32-bit Debian. All the packages are compiled for O32, so any package that uses the FPU requires FR=0 mode. Once O32+fpxx is in the compiler and becomes the default, the packages will gradually be rebuilt with O32+fpxx as the ABI. O32 and O32+fpxx are ABI compatible (I'll explain this in the next two paragraphs), so we can mix an O32+fpxx glibc and an O32 libavcodec as part of ffmpeg (the result being an O32 ffmpeg binary requiring FR=0 mode). Over time, more and more packages are rebuilt with O32+fpxx (through natural churn) and the source base becomes more and more FR mode independent. Sooner or later our previous example of glibc, libavcodec, and ffmpeg is entirely O32+fpxx and no longer requires any particular FR mode.
Earlier, I mentioned that O32 and O32+fpxx are ABI compatible but didn't explain it. We'll start with the 'Some callee-saved registers are also treated as caller-saved' change. In O32, the 32-bit values in $f20 to $f31 are callee saved. In O32+fpxx, the 64-bit values in the even registers $f20 to $f30 are callee saved. In FR=0 mode, these definitions are equivalent, since 'swc1 $f20, 0($base)' followed by 'swc1 $f21, 4($base)' modifies memory in the same way as 'sdc1 $f20, 0($base)'. We don't have to worry about the extra alignment restriction of sdc1 because the stack is 8-byte aligned. However, in O32+fpxx the odd registers $f21 to $f31 are _also_ caller saved in addition to being callee saved. In FR=0 mode, this means that these values are preserved by both the caller and the callee, but in FR=1 mode they are only preserved by the caller, because in FR=1 mode 'sdc1 $f20, 0($base)' does not store any part of $f21. Hopefully that made sense to you. It's probably the hardest part of O32+fpxx to follow.
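To make the save/restore argument a bit more concrete, here is a rough Python sketch of the register file in both modes. It is only a toy model under my own assumptions: the ToyFPU class and the save_area helper are hypothetical names, memory is assumed to be little-endian (matching the 0/4 offsets used above), and nothing here is taken from a real toolchain or simulator.

```python
import struct

class ToyFPU:
    """Toy model of the FPU register file (hypothetical, little-endian memory)."""

    def __init__(self, fr):
        if fr not in (0, 1):
            raise ValueError("fr must be 0 or 1")
        self.fr = fr
        # FR=0: entries hold 32-bit values; FR=1: entries hold 64-bit values.
        self.regs = [0] * 32

    def swc1(self, reg, mem, addr):
        # Store the low 32 bits of a single register.
        mem[addr:addr + 4] = struct.pack("<I", self.regs[reg] & 0xFFFFFFFF)

    def sdc1(self, reg, mem, addr):
        if self.fr == 0:
            # FR=0: the even/odd pair together forms the 64-bit value.
            lo = self.regs[reg] & 0xFFFFFFFF
            hi = self.regs[reg + 1] & 0xFFFFFFFF
            mem[addr:addr + 8] = struct.pack("<II", lo, hi)
        else:
            # FR=1: the even register alone holds all 64 bits.
            mem[addr:addr + 8] = struct.pack("<Q", self.regs[reg])

def save_area(fr, f20, f21, use_sdc1):
    """Return the 8-byte save slot a callee would write for $f20/$f21."""
    fpu = ToyFPU(fr)
    fpu.regs[20], fpu.regs[21] = f20, f21
    mem = bytearray(8)
    if use_sdc1:
        fpu.sdc1(20, mem, 0)      # O32+fpxx style save
    else:
        fpu.swc1(20, mem, 0)      # plain O32 style save
        fpu.swc1(21, mem, 4)
    return bytes(mem)
```

In this model, the two save styles write identical bytes when fr is 0, while with fr set to 1 the sdc1 slot contains nothing from $f21, which is why the caller has to preserve the odd registers itself.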
The one remaining difference between FR=0 and FR=1 is that the upper 32 bits of a double-precision value can be read and written via the odd registers in FR=0 mode, but must be accessed via dedicated instructions in FR=1 mode. For example, in FR=0 mode 'mtc1 $zero, $f21' zeros the upper 32 bits of the double-precision value in $f20. In FR=1, we would use 'mthc1 $zero, $f20'. To resolve this, O32+fpxx simply bans the exploitation of the fact that the odd registers are the upper 32 bits of the even registers. Code must use 'mthc1 $zero, $f20' regardless of whether it ends up running in FR=0 or FR=1. The catch is that MIPS-II and MIPS32r1 don't have 'mthc1' and don't have the alternative available to the 64-bit ISAs (dmtc1) either. These two must use an sdc1/ldc1 sequence instead. It's unpleasant, but the benefit of fixing the bigger issue and migrating to a world where code can run on both FR=0 and FR=1 without modification is seen to outweigh the cost to performance. The bigger catch is that MIPS-I doesn't have sdc1/ldc1 either. In this case, we've decided to leave MIPS-I behind; it can continue to use O32 as it currently does.
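The difference between the two ways of touching the upper half can be sketched the same way. Again this is a toy Python model with hypothetical names (ToyFPU, zero_upper), not a description of any real simulator. One deliberate simplification: in FR=1 the model's mtc1 leaves the upper half of the target register untouched, which is enough to show that writing $f21 has no effect on the double in $f20.

```python
MASK32 = 0xFFFFFFFF

class ToyFPU:
    """Toy model of the FPU register file (hypothetical names throughout)."""

    def __init__(self, fr):
        self.fr = fr
        self.regs = [0] * 32

    def write_double(self, reg, bits):
        if self.fr == 0:
            # FR=0: low word in the even register, high word in the odd one.
            self.regs[reg] = bits & MASK32
            self.regs[reg + 1] = (bits >> 32) & MASK32
        else:
            self.regs[reg] = bits & 0xFFFFFFFFFFFFFFFF

    def read_double(self, reg):
        if self.fr == 0:
            return self.regs[reg] | (self.regs[reg + 1] << 32)
        return self.regs[reg]

    def mtc1(self, value, reg):
        if self.fr == 0:
            self.regs[reg] = value & MASK32
        else:
            # Simplification: only the low word of the named register changes.
            self.regs[reg] = (self.regs[reg] & ~MASK32) | (value & MASK32)

    def mthc1(self, value, reg):
        # Writes the upper 32 bits of the double held in (even) register reg.
        if self.fr == 0:
            self.regs[reg + 1] = value & MASK32
        else:
            self.regs[reg] = (self.regs[reg] & MASK32) | ((value & MASK32) << 32)

def zero_upper(fr, use_mthc1):
    """Try to zero the upper half of the double in $f20; return its new bits."""
    fpu = ToyFPU(fr)
    fpu.write_double(20, 0xAAAAAAAABBBBBBBB)
    if use_mthc1:
        fpu.mthc1(0, 20)   # the form O32+fpxx requires
    else:
        fpu.mtc1(0, 21)    # the odd-register trick that only works in FR=0
    return fpu.read_double(20)
```

In the model, the mthc1 form clears the upper half in both modes, whereas the mtc1-to-$f21 form only does so when fr is 0, which is the behaviour O32+fpxx has to rule out.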
Hopefully at this point I've convinced you that this first step is ABI compatible and that the 32-bit Debian example I started with will gradually become a distribution that can run in both FR=0 and FR=1 rather than one that requires FR=0. The next step is to allow binaries to require FR=1 mode. At this point, this can actually happen without any further ABI modification, since O32+fpxx is also compatible with the O32+fp64 I mentioned in my previous email. This is because O32+fp64 was careful to define the even registers $f20 to $f30 to be callee saved and the odd registers $f21 to $f31 to be caller saved. The main advantage of O32+fp64 is that it is permitted to use the odd FPU registers for storage and arithmetic, whereas O32+fpxx is not, because it may end up running in FR=0 mode where these registers are not usable.
So in summary, each step is ABI compatible with the previous step. The linker will ensure that the end-user doesn't try to do the second step before the first step is finished, since it will refuse to link a binary that contains both O32 and O32+fp64. It will produce an O32 binary given a combination of O32 and O32+fpxx, and similarly an O32+fp64 binary given a combination of O32+fpxx and O32+fp64.
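The linking rules just described can be written down as a small Python function. This is only a sketch of the rules as stated in this email, assuming that inputs which are all O32+fpxx produce an O32+fpxx output; the function name link_abi and the string labels are made up for illustration and are not how any real linker represents this.

```python
O32 = "O32"
FPXX = "O32+fpxx"
FP64 = "O32+fp64"

def link_abi(inputs):
    """Return the FP ABI of the output given the FP ABIs of the inputs."""
    kinds = set(inputs)
    if not kinds:
        raise ValueError("no inputs")
    unknown = kinds - {O32, FPXX, FP64}
    if unknown:
        raise ValueError("unknown ABI(s): " + ", ".join(sorted(unknown)))
    if O32 in kinds and FP64 in kinds:
        # One side needs FR=0 and the other needs FR=1: refuse to link.
        raise ValueError("cannot link O32 with O32+fp64")
    if O32 in kinds:
        return O32    # any plain O32 input pins the result to FR=0
    if FP64 in kinds:
        return FP64   # any fp64 input pins the result to FR=1
    return FPXX       # everything is FR mode independent
```

For the ffmpeg example from earlier, link_abi([FPXX, O32]) gives "O32", and once everything has been rebuilt, link_abi([FPXX, FPXX, FPXX]) gives "O32+fpxx".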
Curious why an extension to o32 for this and not, for example, just using n32?
N32 is an ABI that requires 64-bit general purpose registers, so it's not supportable on a 32-bit ISA. More importantly, it would be difficult (if not impossible) to arrange for downstream source bases to transition all their code to a new ABI at once. The seamless migration from O32 to O32+fpxx allows source bases to transition piece by piece and therefore allows us to drive the migration using the compiler and the natural tendency for living code to be recompiled at some point. Without this, it's doubtful that downstream sources would make the transition at all, as evidenced by previous unsuccessful attempts at updating the ABI. Another important reason is that some projects (e.g. Android) need to be able to execute previously compiled apps from earlier toolchains without modification.