Re-targeting clang to a new architecture

Hi all.

I'm contemplating re-targeting clang to a new architecture. Initially I'd like to just port the front end as a static analysis tool to use alongside our existing GCC-based toolchain, but ultimately I'd like to write a code generator too. Unfortunately my architecture has a couple of wrinkles that sometimes make life hard for compilers:

- CHAR_BIT is 16 (i.e. the minimum addressable unit of memory is 16 bits).
- It's a Harvard architecture with 16-bit data pointers and 24-bit function pointers.
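To make the two wrinkles concrete, here is a minimal sketch of source code that does not bake in either assumption. The names `bits_in`, `chars_for_bits`, `handler_fn` and `Callback` are made up for illustration; only `CHAR_BIT` and `sizeof` come from the language itself.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Width of T in bits, derived from the target's char width.
// CHAR_BIT is 8 on most hosts; on the target described above it would be 16.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// Number of chars needed to hold `bits` bits, rounding up.
constexpr std::size_t chars_for_bits(std::size_t bits) {
    return (bits + CHAR_BIT - 1) / CHAR_BIT;
}

// Data and function pointers may differ in size, so a function pointer
// is kept in its own type instead of being squeezed through void*.
using handler_fn = void (*)();

struct Callback {
    handler_fn fn;    // code address (24 bits on the target above)
    void *context;    // data address (16 bits on the target above)
};
```

Code written this way compiles unchanged on an ordinary 8-bit-char host, which is handy when the front end is only being used for static analysis.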

Does anyone have any thoughts on how difficult it would be to target clang to this sort of architecture - just as a front end (for now)?

Thanks,
Ned.

p.s. I'm reading the group through gmane.org, in case that makes a difference to anyone

This came up on the list about 6 months ago, and the consensus was that it would be fairly tricky to do, since the "8 bits per char/byte" assumption pervades Clang and LLVM:

  http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-September/006349.html

Since then, there has been some work to make Clang depend on the target's character width rather than assuming it is 8 bits, so the situation has improved. I still expect it to be fairly tricky, but you aren't the only one interested in working on this particular issue in Clang.

  - Doug

Thanks Doug. I guess my next step is to try it and see how far I get.

Any thoughts on the different sizes of pointers?

Ned.

Those won't be a problem; Clang already handles different pointer sizes.
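As a rough illustration of what "different pointer sizes" means for the architecture in question, the sketch below models a target description by hand. `TargetDesc`, `size_in_chars` and `ned_target` are hypothetical names and are not Clang's actual target interface; the numbers are the ones given in the first message, and rounding each pointer up to a whole number of chars is an assumption.

```cpp
#include <cstdint>

// Hypothetical target description; NOT Clang's real target class.
struct TargetDesc {
    std::uint32_t char_bits;          // width of char in bits
    std::uint32_t data_pointer_bits;  // width of a data pointer in bits
    std::uint32_t code_pointer_bits;  // width of a function pointer in bits
};

// Size of a `bits`-wide object in units of the target's char,
// rounded up to a whole number of chars.
constexpr std::uint32_t size_in_chars(const TargetDesc &t, std::uint32_t bits) {
    return (bits + t.char_bits - 1) / t.char_bits;
}

// Values from the first message: 16-bit char, 16-bit data pointers,
// 24-bit function pointers.
constexpr TargetDesc ned_target = {16, 16, 24};
```

Under these assumptions `size_in_chars(ned_target, ned_target.data_pointer_bits)` is 1 and `size_in_chars(ned_target, ned_target.code_pointer_bits)` is 2, so the two pointer kinds would report different `sizeof` values on such a target.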

  - Doug

Thanks Doug. Time for me to get hacking :)

Ned.

Douglas Gregor <dgregor@apple.com> writes:

> Since then, there has been some work to make Clang depend on the target's character width rather than assuming it is 8 bits, so the situation has improved.

>> Any thoughts on the different sizes of pointers?

> Those won't be a problem; Clang already handles different pointer sizes.

All in all, this seems to be positive news for Ned. Any news on how
far Ray Fix is with the changes to LLVM regarding the 16-bit char?

Another, more general question for the LLVM maintainers: would they be
interested in these changes, and would they integrate them upstream
(assuming the changes actually make LLVM independent of the char size,
instead of just changing the dependency from 8 bits to 16 bits)? If so,
are there any licensing gotchas regarding integrating patches upstream?

Cheers,

> Since then, there has been some work to make Clang depend on the
> target's character width rather than assuming it is 8 bits, so the
> situation has improved.

There is still a significant amount of work left to do here. I plan to
get back to work on this in the next couple of months.

> All in all, this seems to be positive news for Ned. Any news on how
> far Ray Fix is with the changes to LLVM regarding the 16bit char?

When I last talked to Ray he told me that the project on which he was
working switched from LLVM to another technology, so I wouldn't expect
anything to come from him anytime soon.

I have been working on a back end for a machine with 24-bit
word-addressable memory and have made numerous changes to a private
branch of LLVM to support word-addressable memory (and
non-power-of-2-sized native integer types, fwiw). I intend to contribute
these changes back to the mainline eventually. In the meantime, I could
make a patch available here or the llvm-dev list if anybody is
interested in seeing this work in progress (but probably not until next
week when I update to the 2.7 release).

-Ken

"Ken Dyck" <Ken.Dyck@onsemi.com> writes:

> In the meantime, I could
> make a patch available here or the llvm-dev list if anybody is
> interested in seeing this work in progress (but probably not until next
> week when I update to the 2.7 release).

Yes, it would be great if you could do that.

Thanks,

Okay. Attached is a patch to LLVM and Clang (based on rev 102726) that
allows them to target processors with word-addressable memory and
non-power-of-2-sized integer types.

I am NOT requesting that this patch be code-reviewed for inclusion in
LLVM/Clang. I am posting it here on the off chance that somebody working
on similar machines will find it helpful. Comments are of course
welcome, but not expected.

The support for word-addressable memory is quite limited. It expects
that the Clang char type is 8 bits wide and that i8 is aligned on the
word boundaries of the machine. Word addressing, then, only affects a
few parts of LLVM where it generates offsets for getelementptr. These
parts are located in SelectionDAGBuilder.cpp and ConstantFolding.cpp.
They make use of a new target data attribute in TargetData called
storage unit size (specified with a -u field in the descriptor string)
to convert sizes in byte units to word units.
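For readers who want a feel for the unit conversion described above, here is a minimal sketch. It is not taken from the patch: `storage_units_for_bits` and `element_offset_in_units` are hypothetical helpers, and the rule that every element starts on a storage-unit boundary is an assumption modelled on the "i8 is aligned on the word boundaries" restriction.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): number of storage units
// needed to hold `bits` bits, rounding up to a whole unit.
constexpr std::uint64_t storage_units_for_bits(std::uint64_t bits,
                                               std::uint64_t unit_bits) {
    return (bits + unit_bits - 1) / unit_bits;
}

// Offset, in storage units, of element `index` in an array whose
// elements are `elem_bits` wide. Assumes every element starts on a
// storage-unit boundary.
constexpr std::uint64_t element_offset_in_units(std::uint64_t index,
                                                std::uint64_t elem_bits,
                                                std::uint64_t unit_bits) {
    return index * storage_units_for_bits(elem_bits, unit_bits);
}
```

With a 24-bit storage unit, `element_offset_in_units(5, 24, 24)` gives 5, whereas the same array measured in 8-bit bytes gives `element_offset_in_units(5, 24, 8)`, which is 15. That difference is the kind of thing an address computation for getelementptr has to account for on a word-addressed machine.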

The rest of the changes are for supporting non-power-of-2 integer types
and alignments. As these topics haven't been part of this discussion so
far, I won't bore you with details here. If you have questions, though,
I'd be happy to answer them.

-Ken

llvm-clang.r102726.diff (137 KB)