RFC: Enabling fexec-charset support to LLVM and clang (Reposting)

tahonermann · August 4, 2023, 1:26am

That seems reasonable. Is there an easily accessible list of EBCDIC code pages that do not meet this requirement? Are any of them known to be used to a sufficient extent to justify their use as the literal encoding?

reinterpretcast · August 4, 2023, 6:44am

IBM-939 (Japanese) has ^ and ¬ switched. An additional consideration for <format> would be to understand shift states (e.g., counting the right number of multibyte characters). On the Linux/Windows side, there will be variable-length encodings to handle.

tahonermann · August 4, 2023, 7:40pm

Thank you. That seems important enough to justify special handling.

Agreed, but I don’t think this is a new consideration nor more problematic for EBCDIC than it is for encodings such as Shift-JIS.
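
To make the shift-state concern concrete, here is a minimal sketch of counting characters (rather than bytes) in a stateful mixed single-byte/double-byte EBCDIC string. It assumes the usual EBCDIC convention that byte 0x0E (shift-out) enters double-byte mode and 0x0F (shift-in) leaves it. The function name `countStatefulEbcdicChars` is hypothetical and does not come from the PR or from libc++; real code would also validate the byte ranges of each double-byte pair.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: counts characters in a stateful EBCDIC byte string.
// Assumes SO (0x0E) switches to double-byte mode and SI (0x0F) switches back.
// Returns std::nullopt if a double-byte character is cut off at the end.
std::optional<std::size_t> countStatefulEbcdicChars(std::string_view Bytes) {
  constexpr unsigned char ShiftOut = 0x0E;
  constexpr unsigned char ShiftIn = 0x0F;
  bool DoubleByte = false;
  std::size_t Count = 0;
  for (std::size_t I = 0; I < Bytes.size();) {
    unsigned char B = static_cast<unsigned char>(Bytes[I]);
    // Shift bytes change state but are not characters themselves.
    // Simplification: they are only examined at character boundaries.
    if (B == ShiftOut) {
      DoubleByte = true;
      ++I;
      continue;
    }
    if (B == ShiftIn) {
      DoubleByte = false;
      ++I;
      continue;
    }
    if (DoubleByte) {
      if (I + 1 >= Bytes.size())
        return std::nullopt; // Truncated double-byte character.
      I += 2;
    } else {
      I += 1;
    }
    ++Count;
  }
  return Count;
}
```

A width computation in `<format>` that used the byte length instead would overcount both the shift bytes and the second byte of every double-byte character.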

abhina-sree · December 5, 2023, 8:25pm

Hi everyone,

Sorry for the late reply; I did not have time to look into ICU support until recently. I have updated my old PR, which had iconv support, and extended it with very basic ICU support. With my current solution, we check for ICU support first and, if that is not available, we then check for iconv support.
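
The ICU-first, iconv-second priority described above could be modelled roughly as follows. This is only a sketch under assumptions: `ConverterBackend` and `selectBackend` are hypothetical names, not the PR's API, and the real check may well happen at configure time rather than at run time.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one conversion backend (not the PR's API).
struct ConverterBackend {
  std::string Name;
  // Returns true if the backend is available and knows both charset names.
  std::function<bool(const std::string &From, const std::string &To)>
      CanConvert;
};

// Walks the backends in priority order and returns the name of the first
// one that reports it can handle the requested conversion.
std::optional<std::string>
selectBackend(const std::vector<ConverterBackend> &Prioritized,
              const std::string &From, const std::string &To) {
  for (const ConverterBackend &B : Prioritized)
    if (B.CanConvert && B.CanConvert(From, To))
      return B.Name;
  return std::nullopt;
}
```

Passing the backends as `{ICU, iconv}` gives ICU priority; an empty result means neither library can perform the conversion and the caller has to report an error.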

This PR can be found here: Create a CharSetConverter class with both iconv and icu support by abhina-sree · Pull Request #74516 · llvm/llvm-project · GitHub

I was not able to test on Windows, so I would greatly appreciate your feedback on whether this path forward meets the needs of all platforms.

Thanks for your patience,
Abhina

abhina-sree · October 2, 2024, 2:28pm

Hi everyone, would anyone be able to take another look at my PR https://github.com/llvm/llvm-project/pull/74516, which adds support for the ICU and iconv libraries?

abhina-sree · May 7, 2025, 3:41pm

I have replaced the old PR with two stacked PRs for ease of reviewing. I would appreciate any feedback on them. Thanks!

github.com/llvm/llvm-project

Create a CharSetConverter class with both iconv and icu support.

main ← users/abhina/charset_converter

opened 03:28PM - 07 May 25 UTC

abhina-sree

+774 -2

This patch adds a wrapper class called CharSetConverter for ConverterEBCDIC. This class is then extended to support the ICU library or iconv library. The ICU library currently takes priority over the iconv library. Relevant RFCs: https://discourse.llvm.org/t/rfc-adding-a-charset-converter-to-the-llvm-support-library/69795 https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512 Stacked PR to enable fexec-charset that depends on this: https://github.com/llvm/llvm-project/pull/138895 See old PR for review and commit history: https://github.com/llvm/llvm-project/pull/74516

github.com/llvm/llvm-project

Enable fexec-charset option

users/abhina/charset_converter ← users/abhina/fexec_charset

opened 03:30PM - 07 May 25 UTC

abhina-sree

+377 -53

This patch enables the fexec-charset option to control the execution charset of string literals. It sets the default internal charset, system charset, and execution charset for z/OS and UTF-8 for all other platforms. This patch depends on adding the CharSetConverter class https://github.com/llvm/llvm-project/pull/138893 Relevant RFCs: https://discourse.llvm.org/t/rfc-adding-a-charset-converter-to-the-llvm-support-library/69795 https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512
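
As a rough illustration of what changing the execution charset means for a string literal, the sketch below re-encodes a few characters from ASCII to the byte values they have in common EBCDIC code pages such as IBM-1047 (space, digits, and uppercase Latin letters only). The function names are hypothetical and this is not how the patch performs conversion; it also assumes the sketch itself is compiled with an ASCII-compatible charset. Characters outside the handled set, including `^`, are rejected because their position can differ between code pages.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: maps one ASCII character to its EBCDIC byte for a
// small set of characters; returns std::nullopt for everything else.
std::optional<unsigned char> asciiToEbcdicBasic(char C) {
  if (C == ' ')
    return static_cast<unsigned char>(0x40);
  if (C >= '0' && C <= '9')
    return static_cast<unsigned char>(0xF0 + (C - '0'));
  // EBCDIC uppercase letters sit in three separate runs.
  if (C >= 'A' && C <= 'I')
    return static_cast<unsigned char>(0xC1 + (C - 'A'));
  if (C >= 'J' && C <= 'R')
    return static_cast<unsigned char>(0xD1 + (C - 'J'));
  if (C >= 'S' && C <= 'Z')
    return static_cast<unsigned char>(0xE2 + (C - 'S'));
  return std::nullopt;
}

// Re-encodes a whole literal, failing if any character is not covered.
std::optional<std::string> encodeLiteralBasic(std::string_view Literal) {
  std::string Out;
  Out.reserve(Literal.size());
  for (char C : Literal) {
    std::optional<unsigned char> B = asciiToEbcdicBasic(C);
    if (!B)
      return std::nullopt;
    Out.push_back(static_cast<char>(*B));
  }
  return Out;
}
```

For example, the literal `"A1"` is stored as bytes 0x41 0x31 when the execution charset is UTF-8, whereas this mapping yields 0xC1 0xF1.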
