Non-standard byte sizes

For a hypothetical Evil Project(tm), I would like to write an LLVM backend
for a virtual machine that does not use 8-bit bytes. Does LLVM support
this sort of thing?

The details are: each addressing unit in the virtual machine can store a
single value of any type, except for data pointers, which are stored as
pairs (handle and offset). As such, sizeof(char) == sizeof(int) ==
sizeof(long long) == sizeof(float) == sizeof(double) ==
sizeof(void(*)()) == 1, and sizeof(void*) == 2.
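To make that memory model concrete, here is a small C++ sketch of such a VM's storage. Everything in it (Cell, DataPointer, advance, storePointer) is hypothetical and not part of LLVM or clang. It assumes each cell can hold either an integer or a double, and that a data pointer is stored as its handle followed by its offset in two consecutive cells.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical model of the VM described above; not an LLVM or clang API.
// One addressing unit ("cell") holds exactly one scalar of any type.
using Cell = std::variant<std::int64_t, double>;

// Data pointers are (handle, offset) pairs, so they occupy two cells.
struct DataPointer {
    std::int64_t handle;  // identifies the memory object
    std::int64_t offset;  // cell index within that object
};

// sizeof values from the message, measured in cells.
constexpr std::int64_t kScalarCells = 1;       // char, int, long long, float, double, function pointer
constexpr std::int64_t kDataPointerCells = 2;  // handle + offset

// p + n for an element type that occupies elemCells cells: only the offset
// moves; the handle (which object we point into) never changes.
inline DataPointer advance(DataPointer p, std::int64_t n, std::int64_t elemCells) {
    return DataPointer{p.handle, p.offset + n * elemCells};
}

// Store a data pointer into two consecutive cells of an object.
inline void storePointer(std::vector<Cell>& object, std::size_t at, DataPointer p) {
    assert(at + kDataPointerCells <= object.size());
    object[at] = p.handle;
    object[at + 1] = p.offset;
}
```

Because every scalar fills exactly one cell, p + n on any scalar pointer moves the offset by n, while stepping through an array of data pointers moves it by 2n.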

I did achieve success with another compiler framework a while back ---
but I did have to fix some bugs in it first. This isn't something people
tend to want to do much!

On a related note, I remember seeing compile-time flags to clang to tell
it the sizes of the various primitive types, but I can't find them any
more. Do they still exist or will I need to do a custom clang build to
change them?

Not without some modification.

I've developed an LLVM back end for a DSP with 24-bit word-addressable
memory, basically by defining the alignment of i8 to be the word size
of the machine. So it _is_ possible.

There are a bunch of places in clang/llvm that assume that the
alignment of i8 is 8 bits, and these all need to be generalized to respect
the specified alignment. Then in your instruction lowering code you
need to convert all the offsets in memory accesses from 8-bit units to
word-sized ones so that addressing is correct.
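A minimal sketch of that offset conversion for a 24-bit word machine, assuming every offset that reaches the lowering code is a whole number of words when counted in 8-bit units (presumably what giving i8 word alignment is meant to ensure). The constant and function names are made up for illustration; in a real back end this arithmetic would live inside the target's lowering of loads, stores and address computations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for a DSP with 24-bit word-addressable memory.
constexpr std::int64_t kBitsPerWord = 24;
constexpr std::int64_t kBytesPerWord = kBitsPerWord / 8;  // 3 eight-bit units per word

// Convert an offset expressed in 8-bit units into a word offset for the
// machine's word-addressed memory.
inline std::int64_t byteOffsetToWordOffset(std::int64_t byteOffset) {
    // Assumption: offsets are always a whole number of words.
    assert(byteOffset % kBytesPerWord == 0 && "offset is not word-aligned");
    return byteOffset / kBytesPerWord;
}
```

For example, the fifth char of an array (index 4) sits at 8-bit offset 12 in this scheme, which maps to word offset 4.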

Of course, this approach only works if you are willing to specify
chars as i8 in clang.

FWIW, I'm working towards properly generalizing the size of char in
clang in my spare time, but that work is not nearly complete. I plan
to someday extend that work into llvm.

I can send you a patch of the changes that I made to clang/llvm
release 2.8, if you'd like. Be warned that it also contains support
for non-power-of-2 machine types.

-Ken

[...]

> I've developed an LLVM back end for a DSP with 24-bit word-addressable
> memory, basically by defining the alignment of i8 to be the word size
> of the machine. So it _is_ possible.

I think I might have an easier job of it, as I don't want to use any of
the standard C sizes at all --- my char will be ~64 bits wide (in fact,
my hypothetical VM stores all values as doubles) and will occupy one
complete storage cell, so I don't need to do anything as weird as
storing values of one size in a storage cell of another size.

From what you've said it sounds like all I need to do is to do the right
thing when lowering getelementptr, and it will all Just Work.
Unfortunately I know from experience that clang doesn't always use
getelementptr when doing pointer arithmetic --- how can I stop it trying
to advance a pointer by one (64-bit) char by adding eight to it?

(Also, how do I change the size of the built-in types in clang? I could
swear I once saw some command-line options to do this, but can't find
them now. And I can't find any documentation for clang's -cc1 mode...)

> From what you've said it sounds like all I need to do is to do the right
> thing when lowering getelementptr, and it will all Just Work.
> Unfortunately I know from experience that clang doesn't always use
> getelementptr when doing pointer arithmetic --- how can I stop it trying
> to advance a pointer by one (64-bit) char by adding eight to it?

I added a StorageUnitSize property to the llvm TargetData class, which
I used to adjust pointer arithmetic in the parts of clang (and llvm,
for that matter) where there were problems.
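Ken's patch is not reproduced here, so the following is only an illustration of the arithmetic a StorageUnitSize-style property enables, not his code and not an upstream TargetData interface. StorageLayout, unitsFor and pointerIncrement are hypothetical names. The point is that p + n should scale n by the element size in storage units rather than in 8-bit bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a StorageUnitSize property: the number of bits
// in one addressable storage unit of the target.
struct StorageLayout {
    std::uint64_t storageUnitBits;  // 8 on ordinary targets, 64 for the VM above

    // Size of a type in addressable units, rounding up partial units.
    std::uint64_t unitsFor(std::uint64_t typeBits) const {
        assert(storageUnitBits != 0);
        return (typeBits + storageUnitBits - 1) / storageUnitBits;
    }
};

// Address increment for "p + n" where p points to a type of typeBits bits.
// With 8-bit units a 64-bit char advances by 8; with 64-bit units it
// advances by 1, which is what a one-cell-per-value VM needs.
inline std::int64_t pointerIncrement(const StorageLayout& layout,
                                     std::uint64_t typeBits, std::int64_t n) {
    return n * static_cast<std::int64_t>(layout.unitsFor(typeBits));
}
```

The "adding eight" behaviour in the question corresponds to evaluating this with 8-bit units; every place that hard-codes that assumption has to consult the target's storage unit size instead.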

It helps to have a C validation suite such as Plum Hall to uncover
these addressing issues.

> (Also, how do I change the size of the built-in types in clang? I could
> swear I once saw some command-line options to do this, but can't find
> them now. And I can't find any documentation for clang's -cc1 mode...)

The only way that I am aware of is to implement a TargetInfo class for
your target.
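As a rough illustration of why the type widths are what matter: sizeof(T) counts chars, so it is the width of T divided by the width of char. The TypeWidths struct below is a hypothetical stand-in for the kind of widths a target's TargetInfo subclass describes, not clang's actual class. It assumes a 64-bit char and a 128-bit (two-cell) data pointer as described earlier in the thread, with an ordinary 8-bit-char target alongside for contrast.

```cpp
// Hypothetical table of type widths in bits; field names are illustrative only.
struct TypeWidths {
    unsigned charWidth;
    unsigned intWidth;
    unsigned longLongWidth;
    unsigned doubleWidth;
    unsigned dataPointerWidth;
};

// sizeof(T) counts chars, so it is the type's width divided by the char width.
constexpr unsigned sizeofFromWidth(const TypeWidths& w, unsigned typeWidth) {
    return typeWidth / w.charWidth;
}

// Assumed widths for the VM in this thread: every scalar is one 64-bit cell,
// a data pointer is two cells (handle and offset).
constexpr TypeWidths kEvilVM{64, 64, 64, 64, 128};

// Assumed widths for a typical 64-bit target with 8-bit chars, for contrast.
constexpr TypeWidths kTypical64{8, 32, 64, 64, 64};

static_assert(sizeofFromWidth(kEvilVM, kEvilVM.intWidth) == 1, "sizeof(int) == 1");
static_assert(sizeofFromWidth(kEvilVM, kEvilVM.dataPointerWidth) == 2, "sizeof(void*) == 2");
static_assert(sizeofFromWidth(kTypical64, kTypical64.intWidth) == 4, "sizeof(int) == 4");
```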

-Ken

[p.s. My apologies for sending this to you twice, David. I forgot to
cc the mailing lists on my earlier reply]

I hope you've found it by now. If not:

clang -cc1 -help