RFC: Enabling fexec-charset support to LLVM and clang (Reposting)

That seems reasonable. Is there an easily accessible list of EBCDIC code pages that do not meet this requirement? Are any of them known to be used to a sufficient extent to justify their use as the literal encoding?

IBM-939 (Japanese) has ^ and ¬ switched. An additional consideration for <format> would be to understand shift states (e.g., counting the right number of multibyte characters). On the Linux/Windows side, there will be variable-length encodings to handle.
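
To make the shift-state point concrete, here is a minimal sketch of counting characters rather than bytes in a stateful mixed single-/double-byte EBCDIC string, where shift-out (0x0E) enters double-byte mode and shift-in (0x0F) returns to single-byte mode. The function name `countStatefulEbcdicChars` is hypothetical and not part of LLVM or any library. Treating a string that ends in double-byte mode as malformed is an assumption of this sketch. It does not validate that byte values are legal for any particular code page.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only, not an LLVM API).
// Counts characters in a stateful EBCDIC byte string. Returns false if the
// input looks malformed under this sketch's assumptions.
bool countStatefulEbcdicChars(const std::string &Bytes, std::size_t &Count) {
  const unsigned char ShiftOut = 0x0E; // enter double-byte mode
  const unsigned char ShiftIn = 0x0F;  // return to single-byte mode

  bool InDoubleByte = false;
  std::size_t N = 0;
  std::size_t I = 0;
  while (I < Bytes.size()) {
    unsigned char B = static_cast<unsigned char>(Bytes[I]);
    if (B == ShiftOut || B == ShiftIn) {
      bool Entering = (B == ShiftOut);
      // A shift into the mode we are already in is treated as an error.
      if (Entering == InDoubleByte)
        return false;
      InDoubleByte = Entering;
      ++I; // shift bytes change state but are not characters
      continue;
    }
    std::size_t Width = InDoubleByte ? 2 : 1;
    if (I + Width > Bytes.size())
      return false; // truncated double-byte character
    I += Width;
    ++N;
  }
  // Assumption: a well-formed string ends back in single-byte mode.
  if (InDoubleByte)
    return false;
  Count = N;
  return true;
}
```

An 8-byte input consisting of one single-byte character, a shift-out, two double-byte characters, a shift-in, and one more single-byte character counts as 4 characters, which is the number a width or precision calculation in <format> would need.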

Thank you. That seems important enough to justify special handling.

Agreed, but I don’t think this is a new consideration, and I don’t think it is any more problematic for EBCDIC than it is for encodings such as Shift-JIS.

Hi everyone,

Sorry for the late reply; I did not have time to look into ICU support until recently. I have updated my old PR, which had iconv support, and extended it with very basic ICU support. The current solution checks for ICU support first and, if that is not available, falls back to checking for iconv support.

This PR can be found here: https://github.com/llvm/llvm-project/pull/74516 (Create a CharSetConverter class with both iconv and icu support)

I was not able to test on Windows, so I would greatly appreciate your feedback on whether this approach meets the needs of all platforms.

Thanks for your patience,
Abhina

Hi everyone, would anyone be able to take another look at my PR https://github.com/llvm/llvm-project/pull/74516, which adds support for the ICU and iconv libraries?

I have replaced the old PR with two stacked PRs for ease of review. I would appreciate any feedback on them. Thanks!