While implementing a custom 16 bit target for academical and demonstration purposes, I unexpectedly found that LLVM was not really ready for 8 bit and 16 bit targets. Let me expose why.
Target backends can be divided into two major categories, with essentially nothing in between:
Type 1: The big 32 or 64 bit targets. Heavily pipelined with expensive branches, running at clock frequencies up to the GHZ range. Aimed at workstations, computers or smartphones. For example PowerPC, x86 and ARM.
Type 2: The 8 or 16 bit targets. Non-pipelined processors, running at frequencies on the MHz range, generally fast access to memory, aimed at the embedded marked or low consumption applications (they are virtually everywhere). LLVM currently implements an experimental AVR target and the MSP430.
LLVM does a great for Type 1 targets, but it can be improved for Type 2 targets.
The essential target feature that makes one way of code generation better for either type 1 or type 2 targets, is pipelining. For type 1 we want branching to be avoided for as much as possible. Turning branching code into sequential instructions with the execution of speculative code is advantageous. These targets have instruction sets that help with that goal, in particular cheap ‘shifts’ and ‘cmove' type instructions.
Type 2 targets, on the contrary, have cheap branching. Their instruction set is not particularly designed to assist branching avoidance because that’s not required. In fact, branching on these targets is often desirable, as opposed to transforms creating expensive speculative execution. ‘Shifts’ are only one-single-bit, and conditional execution instructions other than branches are not available.
The current situation is that some LLVM target-independent optimisations are not really that ‘independent' when we bring type 2 targets into the mix. Unfortunately, LLVM was apparently designed with type 1 targets in mind alone, which causes degraded code for type 2 targets. In relation to this, I posted a couple of bug reports that show some of these issues:
The first bug is already fixed by somebody who also suggested me to raise this subject on the llvm-dev mailing list, which I’m doing now.
Incidentally, most code degradations happen on the DAGCombine code. It’s a bug because LLVM may create transforms into instructions that are not Legal for some targets. Such transforms are detrimental on those targets. This bug won't show for most targets, but it is nonetheless particularly affecting targets with no native shifts support. The bug consists on the transformation of already relatively cheap code to expensive one. The fix prevents that.
Still, although the above DAGCombine code gets fixed, the poor code generation issue will REMAIN. In fact, the same kind of transformations are performed earlier as part of the IR optimisations, in the InstCombine pass. The result is that the IR /already/ incorporates the undesirable transformations for type 2 targets, which DAGCombine can't do anything about.
At this point, reverse pattern matching looks as the obvious solution, but I think it’s not the right one because that would need to be implemented on every single current or future (type 2) target. It is also difficult to get rid of undesired transforms when they carry complexity, or are the result or consecutive combinations. Delegating the whole solution to only reverse pattern matching code, will just perpetuate the overall problem, which will continue affecting future target developments. Some reverse pattern matching is acceptable and desirable to deal with very specific target features, but not as a global solution to this problem.
On a previous email, a statement was posted that in recent years attempts have been made to remove code from InstCombine and port it to DAGCombiner. I agree that this is a good thing to do, but it was reportedly difficult and associated with potential problems or unanticipated regressions. I understand those concerns and I acknowledge the involved work as challenging. However, in order to solve the presented problem, some work is still required in InstCombine.
Therefore, I wondered if something in between could still be done, so this is my proposal: There are already many command line compiler options that modify IR output in several ways. Some options are even target dependent, and some targets even explicitly set them (In RenderTargetOptions). The InstCombine code, has itself its own small set of options, for example "instcombine-maxarray-size” or "instcombine-code-sinking”. Command line compiler options produce functionally equivalent IR output, while respecting stablished canonicalizations. In all cases, the output is just valid IR code in a proper form that depends on the selected options. As an example -O0 produces a very different output than -O3, or -Os, all of them are valid as the input to any target backend. My suggestion would be to incorporate a compiler option acting on the InstCombine pass. The option would improve IR code aimed at Type 2 targets. Of course, this option would not be enabled by default so the IR output would remain exactly as it is today if not explicitly enabled.
What this option would need to do in practice is really easy and straightforward. Just bypassing (avoiding) certain transformations that might be considered harmful for targets benefiting from it. I performed some simple tests, specially directed at the InstCombineSelect transformations, and I found them to work great and generating greatly improved code for both the MSP430 and AVR targets.
Now, I am aware that this proposal might come a bit unexpected and even regarded as inelegant or undesirable, but maybe after some careful balancing of pros and cons, it is just what we need to do, if we really care about LLVM as a viable platform for 8 and 16 bit targets. As stated earlier, It’s easy to implement, it’s just an optional compiler setting not affecting major targets at all, and the future extend of it can be gradually defined or agreed upon as it is put into operation. Any views would be appreciated.