RFC: Adding support for the z/OS platform to LLVM and clang

Hello.

> 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> input source files. This would be done at the file open time to allow the
> rest of Clang to operate as if the source was UTF-8 and so require no
> changes downstream. Feedback on this plan is welcome from the Clang
> community.
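The ISO-8859-1 half of the quoted plan can be sketched as follows: convert once, right after the file is read, so that everything downstream only ever sees UTF-8. This is a minimal sketch under that assumption; `latin1ToUtf8` is a hypothetical name, not an existing Clang API.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not a Clang API): converts an ISO-8859-1 buffer to
// UTF-8 once, right after the file is read, so later stages only see UTF-8.
std::string latin1ToUtf8(std::string_view in) {
  std::string out;
  out.reserve(in.size() * 2);  // worst case: every byte needs two UTF-8 bytes
  for (char c : in) {
    // Each Latin-1 byte value is also its Unicode code point (U+0000..U+00FF).
    unsigned char b = static_cast<unsigned char>(c);
    if (b < 0x80) {
      out.push_back(static_cast<char>(b));  // ASCII passes through unchanged
    } else {
      // U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
      out.push_back(static_cast<char>(0xC0 | (b >> 6)));
      out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
    }
  }
  assert(out.size() <= 2 * in.size());  // Latin-1 never more than doubles
  return out;
}
```

For example, the Latin-1 byte 0xE9 (é) becomes the UTF-8 sequence C3 A9, while pure ASCII input is returned unchanged. Doing the conversion at open time keeps the lexer and everything after it unaware of the original encoding.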

Would it be correct to assume that this EBCDIC → UTF-8 mapping would follow
UTF-EBCDIC / IBM CDRA, in particular for the control characters that do not map exactly?
For example, if the execution encoding is EBCDIC, is ‘0x06’ equivalent to U+0086, and so on?
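The control-character question can be made concrete with a small sketch. It assumes the EBCDIC code page is IBM-1047 and uses the CDRA mapping for a handful of bytes; the table is a hypothetical, deliberately partial excerpt, and `ebcdic1047ToUnicode` / `ebcdicToUtf8` are illustrative names, not Clang APIs.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Encodes a code point below U+0800 as UTF-8. That is enough here because
// IBM-1047 maps every byte into U+0000..U+00FF.
std::string encodeUtf8(char32_t cp) {
  assert(cp < 0x800 && "this sketch only handles one- and two-byte forms");
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical, deliberately partial excerpt of the IBM-1047 -> Unicode
// table following CDRA. A real converter needs all 256 entries.
std::optional<char32_t> ebcdic1047ToUnicode(unsigned char b) {
  switch (b) {
  case 0x04: return 0x009C;  // maps to a C1 control
  case 0x05: return 0x0009;  // HT
  case 0x06: return 0x0086;  // maps to a C1 control
  case 0x07: return 0x007F;  // DEL
  case 0x15: return 0x0085;  // NL -> NEL (some converters swap 0x15 and 0x25)
  case 0x25: return 0x000A;  // LF
  case 0x40: return 0x0020;  // space
  case 0xC1: return 0x0041;  // 'A'
  case 0xF0: return 0x0030;  // '0'
  default:   return std::nullopt;  // not covered by this excerpt
  }
}

// Converts an EBCDIC buffer to UTF-8; fails on bytes outside the excerpt.
std::optional<std::string> ebcdicToUtf8(const std::string& in) {
  std::string out;
  for (char c : in) {
    std::optional<char32_t> cp =
        ebcdic1047ToUnicode(static_cast<unsigned char>(c));
    if (!cp) return std::nullopt;
    out += encodeUtf8(*cp);
  }
  return out;
}
```

Under this assumed table, the byte 0x06 becomes U+0086 (UTF-8 C2 86). Whether the frontend commits to the plain CDRA table or to a variant with the LF/NL swap is exactly the detail the question asks about.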

The question “Is Unicode sufficient to represent all characters present in the input source without using the Private Use Area?” is
relevant to both Clang and the C/C++ standard. (I do hope that it is!)

Thanks,

Corentin

The current goal is to make only minimal changes to the frontend to enable
reading of EBCDIC-encoded files. For this, we use the auto-conversion
service of z/OS UNIX System Services
(https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm),
together with file tagging and setting the CCSID for the program and
for opened files. The auto-conversion service supports round-trip
conversion between EBCDIC and Enhanced ASCII. With it, bootstrapping with
EBCDIC source files is possible.
Of course, more complete UTF-8 support is a valid alternative
implementation.

Best regards,
Kai Nacke
IT Architect

IBM Deutschland GmbH