gcc switch -fexec-charset=IBM-1047 to generate EBCDIC character constants

This append is related to integers being expressed as character
constants, e.g., 'a'. Strings, e.g., "a", are not an issue in the
assembler code produced.

Consider a trivial program:

   int main(void) {char a='a';return a;}

Compiling with

   clang -target s390x-linux-gnu -S -D__x86_64__ -O2 test.c

Gets me this assembler:

main:
        lghi %r2, 97
        br %r14

Which is all correct as z/Linux is an ASCII operating system.

However, there are other operating systems for IBM's z/Architecture that
use the EBCDIC encoding, and there one wants 0x81 for 'a'.
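
To make the wanted result concrete, here is a minimal sketch of the mapping for a subset of characters (letters, digits, space and '%'), using the IBM-1047 code points. The function name ascii_to_ebcdic is hypothetical and this is not Clang or gcc code; a real implementation would use a full 256-entry table for the chosen code page. The input is written as numeric ASCII codes so that the sketch does not depend on the host compiler's own execution character set.

```c
#include <assert.h>

/* Hypothetical helper: map a 7-bit ASCII code to its IBM-1047 code point.
   Only letters, digits, space and '%' are covered; anything else is
   mapped to the EBCDIC question mark (0x6F) as a placeholder. */
unsigned char ascii_to_ebcdic(unsigned char c)
{
    assert(c < 128); /* input is expected to be 7-bit ASCII */

    /* EBCDIC letters are split into three non-contiguous runs. */
    if (c >= 0x61 && c <= 0x69) return (unsigned char)(0x81 + (c - 0x61)); /* a-i */
    if (c >= 0x6A && c <= 0x72) return (unsigned char)(0x91 + (c - 0x6A)); /* j-r */
    if (c >= 0x73 && c <= 0x7A) return (unsigned char)(0xA2 + (c - 0x73)); /* s-z */
    if (c >= 0x41 && c <= 0x49) return (unsigned char)(0xC1 + (c - 0x41)); /* A-I */
    if (c >= 0x4A && c <= 0x52) return (unsigned char)(0xD1 + (c - 0x4A)); /* J-R */
    if (c >= 0x53 && c <= 0x5A) return (unsigned char)(0xE2 + (c - 0x53)); /* S-Z */
    if (c >= 0x30 && c <= 0x39) return (unsigned char)(0xF0 + (c - 0x30)); /* 0-9 */
    if (c == 0x20) return 0x40; /* space */
    if (c == 0x25) return 0x6C; /* percent sign */
    return 0x6F;                /* unmapped: EBCDIC '?' */
}
```

With this mapping, ascii_to_ebcdic(0x61) yields 0x81, which is the value wanted for 'a' on an EBCDIC target.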

If the constant 'a' could pass through the compilation system (even if
the assembler does not support such constants), I would have less than a
smop; as it is now, the code generated is indistinguishable from a =
0x61, which should not be converted to EBCDIC.

Any pointer to where I should start hacking would be greatly appreciated.

An alternative would be a target specification triple, e.g.,
s390-zvm-cms, but one would still wish to specify the target code page
as Germans are likely to want a different one from the French. And
presumably that also means a new back end (?)

Finally, the gcc implementation is not optimal because the conversion is
also applied to strings, in particular the ones in printf() and that
severely messes up the checking as the EBCDIC string is scanned for
ASCII %, which is not helpful.
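
The effect can be illustrated with a small sketch. It assumes a format checker that simply looks for the ASCII percent byte (0x25); the names count_percent and fmt_ebcdic are hypothetical and this is not how gcc's checker is actually written. The format string "%d" is given as it would be encoded in IBM-1047, where '%' is 0x6C and 'd' is 0x84.

```c
#include <assert.h>
#include <stddef.h>

/* Count the bytes in s[0..n) that equal the given percent-sign code. */
size_t count_percent(const unsigned char *s, size_t n, unsigned char percent)
{
    size_t hits = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == percent)
            hits++;
    }
    return hits;
}

/* "%d" after conversion to IBM-1047: '%' = 0x6C, 'd' = 0x84. */
const unsigned char fmt_ebcdic[2] = { 0x6C, 0x84 };
```

Scanning fmt_ebcdic for 0x25 finds no conversion specification at all, while scanning for 0x6C finds the one that is there. So a checker has to look at the string either before conversion or with the converted percent sign.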

If the constant 'a' could pass through the compilation system (even if
the assembler does not support such constants), I would have less than a
smop;

... what's a smop?

Finally, the gcc implementation is not optimal because the conversion is
also applied to strings, in particular the ones in printf() and that
severely messes up the checking as the EBCDIC string is scanned for
ASCII %, which is not helpful.

Wait, you want character literals and string literals to use different
encodings? That sounds like a phenomenally bad idea to me. Also,
if your printf assumes ASCII, it sounds like your implementation's
execution character set really is ASCII...

Supporting execution character sets that are not ASCII is probably not too
burdensome, but if we're going to do it, we should do it right (using the
same character set for all character and string literals with no
encoding-prefix). If you want to go ahead with that, start by looking at
lib/Lex/LiteralSupport.cpp.

...

However, there are other operating systems for IBM's z/Architecture that
use the EBCDIC encoding, and there one wants 0x81 for 'a'.

If the constant 'a' could pass through the compilation system (even if
the assembler does not support such constants), I would have less than a
smop;

... what's a smop?

http://en.wikipedia.org/wiki/Small_matter_of_programming ?

Csaba