Complete newbie here. (Sorry if this is common knowledge or a FAQ. If so, please point me to the reference doc that covers this.)
Can LLVM be used (reasonably, efficiently) for target machine architectures that know nothing about BYTES? I.e. for (old) word-oriented machines, in which the concept of (hardware) 8-bit bytes does not exist, and the word bit-length is not a power of 2?
If specifics are needed to give a good answer, then here is the specific case I am looking at:
word bit-length: 60
instruction bit-lengths: 15, 30, 60
address-register bit-length: 18
I believe LLVM is, technically speaking, written in such a way as to be independent of the size of a byte (and possibly of whether bytes even exist), but as far as I know there is currently no backend where that is the case. So this is very much untested territory. Therefore there isn’t really a clear answer until someone makes the investment in writing a backend for an architecture that doesn’t have 8-bit bytes.
I believe there are a number of somewhat separate concerns here, with maybe different levels of leverage of the existing code.
Firstly: The LLVM architecture-independent IR is certainly capable of representing instructions with arbitrary data sizes. It is possible to describe arbitrary-bitwidth data in an architecture-independent way. The LLVM backend infrastructure is slightly less flexible: although it can represent arbitrary data types, portions of the infrastructure require the types used in machine code to be enumerated a priori. As long as you have a small number of machine types, things should still be OK and you can leverage most of the code generation mechanisms. Some parts of the code generation infrastructure do require power-of-two instruction sizes, but you might be able to hack around that relatively easily.
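To make the first point a bit more concrete, here is a minimal Python sketch of what arithmetic on a 60-bit integer type (what the IR would call i60) amounts to. This is only a model of the semantics, not LLVM code; the names WORD_BITS, add_i60 and to_signed_i60 are made up for illustration, and it assumes the two's complement interpretation that LLVM IR integers use.

```python
# Model of fixed-width integer arithmetic for a 60-bit word.
# WORD_BITS comes from the question; everything else is a hypothetical helper.
WORD_BITS = 60
MASK = (1 << WORD_BITS) - 1  # sixty 1-bits


def add_i60(a: int, b: int) -> int:
    """Wrap-around addition modulo 2**60."""
    return (a + b) & MASK


def to_signed_i60(x: int) -> int:
    """Interpret a 60-bit pattern as a two's complement signed value."""
    x &= MASK
    if x >> (WORD_BITS - 1):  # top bit set means negative
        return x - (1 << WORD_BITS)
    return x
```

Nothing here depends on the width being a power of two, which is the sense in which the IR itself is width-agnostic.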
Secondly: What are the semantics of memory? Load and store operations in LLVM middle-end IR are fundamentally associated with a byte-oriented model of memory, and many concepts (such as memory alignment) are described in terms of bytes. LLVM represents some aspects of this with a ‘Data Layout’ that only represents some concepts with byte-wise granularity, and many existing passes that deal with the layout of data in memory assume memory has bytes. Most of this code would probably have to be updated, or avoided for your target. If you can avoid re-interpreting memory as different data types, then you might be able to avoid the worst of this aspect.
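A small Python sketch of why byte-wise granularity is awkward for the machine in the question. It assumes the usual convention of rounding a bit width up to whole 8-bit bytes when a size has to be expressed in bytes; the helper names are hypothetical and this is not LLVM API.

```python
# Sizes expressed in 8-bit bytes cannot describe a 60-bit word exactly.
BITS_PER_BYTE = 8


def store_size_in_bytes(bit_width: int) -> int:
    """Round a bit width up to a whole number of 8-bit bytes."""
    return (bit_width + BITS_PER_BYTE - 1) // BITS_PER_BYTE


def padding_bits(bit_width: int) -> int:
    """Bits of padding implied by describing the width in whole bytes."""
    return store_size_in_bytes(bit_width) * BITS_PER_BYTE - bit_width


for name, bits in [("word", 60), ("address register", 18)]:
    print(name, bits, "bits ->", store_size_in_bytes(bits), "bytes,",
          padding_bits(bits), "padding bits")
```

A 60-bit word comes out as 8 bytes with 4 padding bits that do not exist in the hardware, so any pass that computes offsets or sizes in bytes would be describing a memory layout the machine does not have.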
Thirdly: Some parts of LLVM assume that data stored in memory (particularly pointers) not only has byte-level granularity but also has power-of-two alignment. So if you have pointers stored in data memory, you’ll probably run into this. If you have a pure Harvard architecture, you might be able to avoid it.
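As a quick illustration of the power-of-two concern, the following Python sketch checks the bit widths given in the question. The labels in the dictionary are just descriptive names chosen here, and is_power_of_two is a hypothetical helper, not something taken from LLVM.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


# Bit widths from the question; the labels are illustrative only.
TARGET_WIDTHS = {
    "word": 60,
    "address register": 18,
    "short instruction": 15,
    "medium instruction": 30,
    "long instruction": 60,
}

# Collect every width that would violate a power-of-two assumption.
non_power_of_two = {
    name: width
    for name, width in TARGET_WIDTHS.items()
    if not is_power_of_two(width)
}
print(non_power_of_two)
```

Every width on this machine ends up in non_power_of_two, so any code path that assumes power-of-two sizes or alignments measured in bits or bytes would need attention.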
My best guess is that, while not impossible, this probably gets relatively little leverage from a lot of the LLVM infrastructure and will likely require some form of invasive patches/hacks on LLVM to make it work well.