After some reflection, I think that if clang and llvm could provide multi-language support, it would certainly expand their influence and make the world of computer programs and code more colorful, richer, and full of more possibilities.

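Dear clang compiler developers,
Hello.
I am an amateur computer technology and programming enthusiast. My English is very poor; I can only understand some simple vocabulary. So I would like to ask you for help.
I want to program in Chinese. I have studied many programming languages, but they are all in English, which makes them hard for me to understand, and I worry about writing code that even I cannot read. So I have been looking for a programming language that allows programming in Chinese. After searching, I could not find one, so I decided to localize an existing programming language into Chinese, for example C/C++, which performs well and has the most users.
After some research, I found that merely changing a language's keywords and variable names cannot achieve Chinese programming, because the compiler cannot recognize them. So I thought: if I could localize the compiler for that language so that it understands Chinese, would I then be able to program in Chinese? Also, because the project is open source, this could attract more friends with the same interests and goals. But my English is too poor, and neither llvm nor clang has documentation in Chinese, so I have no way to learn or find out how to do this. I do not know where to start. So I decided to email the creators of the llvm and clang projects for help, hoping to receive the right guidance. Thank you. :)

An amateur computer technology and programming enthusiast
2016.9.30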

So, I’m not going to try to use google translate to make this into Chinese, because I don’t think it will help much, but I’d like to clarify what exactly it is you want to achieve:

  1. Implement C (or C++) in Chinese?

  2. Allow programs written for the Chinese market to be developed in LLVM & Clang?

  3. Something else?

Option 1 seems a little meaningless to me, as there are only a handful of keywords in C, and a few more in C++ to learn. Yes, there is also a large number of C and C++ libraries that go together with the compiler, but these are not (in general) the task of Clang directly. I would also argue that this is NOT a good way to go. Programming is very much an international activity, and although there are probably more people who natively speak and write Chinese than any other language in the world, the Chinese language is not well known outside of those with Chinese as a native language, whereas English is a reasonably workable language in many parts of the world, whether the native language is Chinese, German, Korean, French, Russian or Arabic. A program written in Chinese would be completely unreadable by me and, I would guess, by about 90% of the subscribers of this mailing list.

Option 2 should already work. If there is a specific issue that doesn’t work, that should be reported as a bug.

Option 3 - please clarify what it is you want to achieve…

I would also suggest that for someone wishing to work as a software developer for a large international company or a company trading on the international market, being able to communicate in English is almost certainly a great benefit to their career prospects, if not fully required. I say this as a person whose first language is NOT English. Note that there have been programming languages in "other than English", but they tend not to become very popular, for the reason I described above: not many outside of the native language region are able to use that language. Yes, Chinese is a much bigger language than French or German, for example, but it's still not used by most other countries in general communication, which is the key point: software is an international business, it is an international community, and using one human language for communication within the community lets the group understand each other, re-use each other's solutions and help each other.

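Hello, I am sorry for not replying sooner. The reason is that the government of mainland China uses the GFW to block Google and the world's mainstream social networking sites, so I cannot access Gmail. Right now I can only barely open it by temporarily using a VPN.

After reading your reply with Google Translate, I think the point is not, as you put it, using one particular natural language as the host for a programming language. Also, as I already said in my previous email, my idea is that llvm and clang could provide an interface, that is, multi-language support, and not only for [Chinese/Mandarin]: speakers of the world's other languages could also program in their mother tongue, that is, use their mother tongue as the computer programming language.
As for your statement that English is the official and standard language of computer programming, I take a neutral view. I neither reject nor endorse it. The reason no language from another culture has become popular as a programming language is that nobody has done the work. I do not think that what already exists is all there is, and I do not think the past equals the future; the future needs new people and new ideas to be brought in and to break new ground.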

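Regarding the three questions you raised, my answers are as follows:
1. I do indeed want to implement a programming language in [Chinese/Mandarin]. My plan is to translate and convert an existing programming language.
2. Yes, that is right. My hope is that more people will be able to write code, making the world of code richer and more colorful and bringing fresh ideas and vitality into it.
3. Nothing else. This is my only idea.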

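You do not need to persuade me to learn English; that is not the point of the issue I care about, and I already expressed this view in my previous email. My English is very poor, and there are a great many people in the world in the same situation.
I know that what a computer ultimately executes is not English characters but binary. So my idea is to add an interface on top of the existing compiler so that the compiler can recognize the characters of other natural languages, while the syntax tree remains that of the original programming language. That way, people who program in their mother tongue can still stay in step with the rest of the world, with no communication barrier and no problem of source files that cannot be shared and exchanged. This is the key point of the multi-language compiler support I have in mind.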

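I propose this also because of the problem mentioned in my first paragraph. Because the Chinese government blocks the network, China is cut off and isolated from the world, and this is not a problem that can be solved simply by learning English. There has never been a savior; we need to save ourselves. The Chinese government cannot represent all Chinese people. The Chinese government is only one part of the group that is the Chinese people. Ideas are not afraid of bullets. As a Chinese person looking for a new way out, I urgently need a way to turn ideas into practical, usable tools so that we can help ourselves. So I need an implementation tool that can quickly turn an idea into reality. Without doubt, one's mother tongue is the most suitable way to do this. At the same time, I hope this idea can help other people in the world who have the same need.

The point of all this is multi-language support. That sums it up.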

Hello!

I don't think anybody is against this for "political" reasons. We don't try to force people to use English. We are only worried about the _technical_ problems you will have rewriting Clang to allow any language to be used.

My suggestion is that you write a translator that is executed before Clang.

Source code (I used google translate, I don't know if this is correct):

    如果(价值>三)

Your translator could convert that into:

    if (value > 3)

I see that the parentheses and ">" were not translated by Google Translate, but of course feel free to translate any operators.

After that, Clang is executed.

So if you manually compile your program it will be like:
$ translate-cpp chinese.cpp > temp.cpp
$ clang temp.cpp
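
A minimal sketch of what such a `translate-cpp` step might look like, written in Python purely for illustration. The keyword spellings in the table are hypothetical choices, not an agreed standard. The sketch rewrites whole tokens only, copies string literals and comments through unchanged, and leaves identifiers such as 价值 alone; translating those as well would need a separate, project-specific dictionary.

```python
import re

# Hypothetical keyword spellings, chosen for illustration only.
KEYWORDS = {"如果": "if", "否则": "else", "当": "while", "返回": "return"}
# Full-width punctuation that Chinese input methods commonly produce.
PUNCTUATION = {"（": "(", "）": ")", "；": ";", "，": ","}

TOKEN = re.compile(r"""
    "(?:\\.|[^"\\])*"     # string literal, copied through unchanged
  | '(?:\\.|[^'\\])*'     # character literal
  | //[^\n]*              # line comment
  | /\*.*?\*/             # block comment
  | \w+                   # keyword, identifier or number (Unicode-aware)
  | .                     # any other single character, including newlines
""", re.VERBOSE | re.DOTALL)


def translate(source):
    """Replace localized keywords and punctuation; keep everything else."""
    out = []
    for match in TOKEN.finditer(source):
        token = match.group(0)
        if token in KEYWORDS:
            out.append(KEYWORDS[token])
        elif token in PUNCTUATION:
            out.append(PUNCTUATION[token])
        else:
            out.append(token)
    return "".join(out)
```

Because a token is a maximal run of word characters, a keyword must be separated from a following identifier by a space or punctuation: `如果 (价值 > 3)` works, while `如果价值` would be read as one identifier. That is a limitation of this sketch, not of the idea.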

The error messages and warning messages must be written in Chinese. I assume that is already possible, but I don't know how that is done.

Best regards,
Daniel Marjamäki


Option 1 seems a little meaningless to me, as there are only a handful of
keywords in C, and a few more in C++ to learn. Yes, there is also a large
amount of C and C++ libraries that go together with the compiler, but these
are not (in general) the task of Clang directly. I would also argue that
this is NOT a good way to go. Programming is very much an international
activity, and although there are probably more people that natively speak &
write Chinese than any other language in the world, outside of those with
Chinese as a native language, the Chinese language is not well known, where
English is a reasonably workable language in many parts of the world,
whether the native language is Chinese, German, Korean, French, Russian or
Arabic. A program written in Chinese would be completely unreadable by me
and I would guess about 90% of the subscribers of this mailing list.

Hi Matt,

I'm surprised by your comments. The concern is very real.

The reason why programming languages are written in English is not because
English is the most popular second language in the world, but because the
development of computers started in England and was popularised in the US.

This is not just about learning English to program, this is about your
native tongue, and how you write it, and how you think about it.

As a Brazilian, I find programming in English reasonably simple. I write
left-to-right, I express "if" as "se" and "while" as "enquanto", and I
speak "==" as "igual". Reading C++ is very natural to me, even if it's in a
different language.

But Arabic speakers read right-to-left and their culture is different
enough that people created Arabic programming languages [1]. It was
mostly an art project, but the concern, again, is real. People think
differently.

Most Asian countries have completely different writing systems, which are
not just top-down, but symbolic rather than syllabic. This is a huge shift
in understanding of the language, and it's a lot harder for native Chinese
speakers to "read" English *code*.

This puts a huge burden on non-English speakers to not only learn the
reality of computing (logic, hardware, algorithms), but another language
entirely and a special subset that only makes sense for computer
programmers. The burden is worse for the people that grew up with a
completely different communication mindset.

I think your remarks on the English language being "workable" and spoken by
90% of this mailing list were very poor, (unintentionally) bordering on
prejudicial. European languages are far apart from each other, but they
have the same thinking pattern. Asian languages are much further apart from
European ones, and changing the thinking pattern is *really* hard and not
at all the same thing.

Option 2 should already work. If there is a specific issue that doesn't
work, that should be reported as a bug.

The point here is to program in a way that is expressive to different
people. Using variable names in Chinese using UTF-8 is not enough.

I would also suggest that for someone wishing to work as a software
developer for a large international company or a company trading on the
international market, being able to communicate in English is almost
certainly a great benefit to the career prospects, if not fully required. I
say this as a person whose first language is NOT English.

Again, I believe this response misses the point. Technology and Science are
for everyone, not only those that can easily learn a few new languages plus
a whole new field.

The dominance of English in STEM subjects is real, but that doesn't mean we
have to accept it or even think it's the only way forward. I truly believe
it isn't.

Note that there have been programming languages in "other than English", but
they tend not to become very popular, for the reason I described above: not
many outside of the native language region are able to use that language.
Yes, Chinese is a much bigger language than French or German, for example,
but it's still not used by most other countries in general communication,
which is the key point: software is an international business, it is an
international community, and using one human language for communication
within the community lets the group understand each other, re-use each
other's solutions and help each other.

This is not about language, but about thought process.

Software is a learning tool, and as such, should be available to
*everyone*. Not only businesses.

Software re-use has nothing to do with the (programming) language you write
in, but in the well defined and documented ABIs in between.

As to the practicalities of making C++ "look" Chinese, I don't think it
will work in that way. But it should be possible to achieve a few improving
goals...

1. As Matt said, try to use Chinese characters as variable/function names
using UTF-8 and see if it helps. I'm not sure how name mangling will work,
though.

2. Try to change the Clang parser to recognise Chinese words for C++
keywords. This would be a departure from the C++ standard, no doubt, but an
interesting concept regardless.

3. Start a new front-end for LLVM, one that implements a language using the
Chinese thinking pattern and transforms into LLVM IR. If the transformation
matches well, it may make interoperability with the rest of LLVM supported
languages much easier.

cheers,
--renato

[1] Qalb (programming language) - Wikipedia

I don't think anybody is against this for "political" reasons. We don't try to force people to use English. We are only worried about the _technical_ problems you will have rewriting Clang to allow any language to be used.

Indeed, agreed.

My suggestion is that you write a translator that is executed before Clang.

Source code (I used google translate, I don't know if this is correct):

    如果(价值>三)

Your translator could convert that into:

    if (value > 3)

I don't think that'll work in all cases.

I don't know Chinese well enough to make a guess, but I know that
translating "natural" Portuguese or Italian to English is really hard
to get right.

As I said earlier, "programming" Portuguese and "programming" English
are similar enough that this would probably work. But I won't guess
the same about Chinese. :)

So, the only way this could work is that the "translate-cpp" program
would make a 1-to-1 relationship between <Lang> and C++, and the
programmers were trained to write code in that way.

Essentially, this would be technically identical to adding a list of
languages to Clang and just changing the identifier names.

This solution wouldn't be "ideal", but it would allow more Chinese
programmers to "translate" English code to Chinese, modify it,
recompile it, and send the patch back in English.
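
To make the 1-to-1 requirement concrete, here is a small Python sketch under stated assumptions: the table is a hypothetical two-keyword mapping, and the source is assumed to be already split into tokens. It shows why the mapping must be invertible if code is to go from English to Chinese and back without changes.

```python
# Hypothetical <Lang>-to-C++ keyword table, for illustration only.
TABLE = {"如果": "if", "返回": "return"}


def invert(table):
    """Build the reverse table, refusing tables that are not 1-to-1."""
    inverse = {}
    for local, english in table.items():
        if english in inverse:
            raise ValueError(
                f"{inverse[english]!r} and {local!r} both map to {english!r}")
        inverse[english] = local
    return inverse


def map_tokens(tokens, table):
    """Translate the tokens found in the table; pass the rest through."""
    return [table.get(token, token) for token in tokens]


chinese = ["如果", "(", "x", ">", "3", ")", "返回", "x", ";"]
english = map_tokens(chinese, TABLE)
round_trip = map_tokens(english, invert(TABLE))
```

If two local words mapped to the same C++ keyword, `invert` would have to pick one of them on the way back, and a patch translated back and forth would show spurious differences. The same happens if a file mixes the English and the localized spelling of one keyword.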

cheers,
--renato

Software re-use has nothing to do with the (programming) language you
write in, but in the well defined and documented ABIs in between.

But if the ABI description [or its implementation] looks like the subject
of this email, I certainly wouldn't have a chance to figure out if it's
right for my needs, or fix any bugs that may be inside it. Sure, I probably
can't fix a significant bug in LLVM or Clang either, but at least it's not
completely impossible for me to TRY to understand it. The same applies, of
course, to most members of this list if I've written a wonderful API or ABI
in Swedish, just as much as in Chinese, and if you wrote one in Brazilian
Portuguese, I would have no chance either (even using Google or Bing
Translate, only SOME of the translations of my Brazilian and Chinese
friends' Facebook posts are understandable without some clever guessing or
knowledge of the context).

As to the practicalities of making C++ "look" Chinese, I don't think it
will work in that way. But it should be possible to achieve a few improving
goals...

1. As Matt said, try to use Chinese characters as variable/function names
using UTF-8 and see if it helps. I'm not sure how name mangling will work,
though.

2. Try to change the Clang parser to recognise Chinese words for C++
keywords. This would be a departure from the C++ standard, no doubt, but an
interesting concept regardless.

3. Start a new front-end for LLVM, one that implements a language using
the Chinese thinking pattern and transforms into LLVM IR. If the
transformation matches well, it may make interoperability with the rest of
LLVM supported languages much easier.

This, #3, is in my opinion probably the best approach for a "teaching
language".

If the purpose is to teach programming, C++ is along the lines of answering
"How do I get to <some place>" with "I wouldn't start from here". Yes,
it's a very powerful and all-encompassing language, but there are many
languages that are much easier and better for a beginner or intermediate
programmer: Python, Pascal, Java, or something completely new, more
aligned with the way the Chinese language works.

If the purpose is to create commercial and international software, I still
would argue that English language is more or less essential. I have worked
for a couple of larger Swedish companies, and the programming AND documents
that go with that, are all written in English - and I know that French,
Spanish, Italian and German companies do the same, because it makes it
possible to communicate with people that do not speak/read/write Swedish,
French, Spanish, Italian or German.

But if the ABI description [or its implementation] looks like the subject
of this email, I certainly wouldn't have a chance to figure out if it's
right for my needs, or fix any bugs that may be inside it.

That helps put yourself in the OP's shoes. :)

Translating an ABI, or an API, or a User Guide document is far easier
than the whole software.

So, I'd be perfectly fine if there was an Open Source Chinese software
with an API that allows me to use their functions (some translated
mangling, probably) in my software.

It would be a lot more empowering to non-English speaking people if
they could express their thoughts in their natural ways, and only
translate the interfaces with other people.

The alternative, in the long run, is to have everyone thinking in the
exact same way... That'd be a sad future.

If the purpose is to teach programming, C++ is along the lines of answering
"How do I get to <some place>" with "I wouldn't start from here". Yes,
it's a very powerful and all-encompassing language, but there are many
languages that are much easier and better for a beginner or intermediate
programmer: Python, Pascal, Java, or something completely new, more aligned
with the way the Chinese language works.

This is why I separated "C++ translation" from "New language". I agree
they're two completely separate objectives.

I have to be honest, it wasn't clear from the original post which one
was the objective.

If the purpose is to create commercial and international software, I still
would argue that English language is more or less essential. I have worked
for a couple of larger Swedish companies, and the programming AND documents
that go with that, are all written in English - and I know that French,
Spanish, Italian and German companies do the same, because it makes it
possible to communicate with people that do not speak/read/write Swedish,
French, Spanish, Italian or German.

This is the enterprise status quo, but I have seen a lot of
international comments on non-English speakers' code. Most notably in
academia.

I don't think C++ should be an enterprise-only language. I believe in
universal access, and in the benefits (empowering people) and costs
(communication) it entails.

But that's my personal opinion.

cheers,
--renato

1, for each programming language can add a custom interface
characters, my idea is because I understand that llvm have their own
assembly, which is the middle code? Bytecode?

Right, so this is case 2) of my suggestions: translate C++.

Formally, if we were going to do this, we'd need the C++ standard's
blessing, and probably some formal description of the process and
ABIs.

But you can start doing this on your own like Daniel said:

Solution 1: Develop a script that does the translation "before" calling Clang.

$ translate-cpp chinese.cpp > temp.cpp
$ clang temp.cpp

This only works if you use "Chinese like English" and if you get used
to the format of the new code. The problems you'll face with this
approach are:

1. If you stray from the strictly accepted Chinese symbols, you may
have leaked Chinese into "temp.cpp", which will confuse Clang.
2. The C++ error messages will be all in English

Fixing 2 above can be done if "translate-cpp" is a wrapper for Clang,
which translates Chinese to English, calls Clang, then translates the
English error messages into Chinese and presents them to the user.

Fixing 1 will need syntax checking on your wrapper program, which can
be hard to do.
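
Both fixes can be sketched in Python, with loud assumptions: the Chinese message texts and the tables are made up for illustration, and only the usual `file:line:col: severity: message` shape of Clang's diagnostics is relied on. `find_leaks` is a crude stand-in for real syntax checking; it just reports non-ASCII characters that survived translation outside literals and comments. `localize_diagnostic` rewrites one line of compiler output and falls back to the English text when it has no translation.

```python
import re

# Regions where leftover Chinese text is harmless: literals and comments.
SKIP = re.compile(r"""
    "(?:\\.|[^"\\])*"  |  '(?:\\.|[^'\\])*'  |  //[^\n]*  |  /\*.*?\*/
""", re.VERBOSE | re.DOTALL)


def find_leaks(code):
    """Return (line, column, char) for non-ASCII characters left in code."""
    # Blank out literals and comments, keeping newlines so positions hold.
    visible = SKIP.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), code)
    return [(line_no, col, ch)
            for line_no, text in enumerate(visible.splitlines(), start=1)
            for col, ch in enumerate(text, start=1)
            if ord(ch) > 127]


# Illustrative translations only; a real table would cover far more messages.
SEVERITY = {"error": "错误", "warning": "警告", "note": "注意"}
MESSAGES = {"expected ';' after expression": "表达式后应有 ';'"}
DIAGNOSTIC = re.compile(
    r"^(?P<loc>.+?:\d+:\d+): (?P<sev>error|warning|note): (?P<msg>.*)$")


def localize_diagnostic(line):
    """Translate one diagnostic line, keeping English where nothing is known."""
    m = DIAGNOSTIC.match(line)
    if m is None:
        return line
    message = MESSAGES.get(m["msg"], m["msg"])
    return f"{m['loc']}: {SEVERITY[m['sev']]}: {message}"
```

Note that if the project deliberately keeps Chinese identifiers, `find_leaks` would also flag those, so it only fits the translate-everything variant. The line and column numbers in the diagnostics also refer to temp.cpp, so they match the Chinese source only if translation never moves text between lines.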

Solution 2: Change Clang to accept multiple languages for the identifiers.

This will involve changing Clang to read from different language
files, which can be chosen via a command line "--language=chinese".
Given that most identifiers don't make it out in the object output,
this could work well. The problems you're likely to face are:

1. Debug information (example, type names) will be in Chinese.
Non-Chinese users of your libraries will have problems debugging it.
2. Calling standard library functions (like malloc/free) will either
have to remain in English or be translated in Clang.

Fixing 1 and 2 above is a matter of always loading the English
language in addition to the international one, then falling back to
English for whatever is either non-existent or problematic.
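
The fallback can be sketched as a table merge. This is Python for illustration only, not Clang code: the table layout, the message id `expected_semi` and the language names are all hypothetical. Also note that, as far as I know, current Clang already accepts `--language=` as a spelling of `-x`, so a real patch would probably need a different option name.

```python
# English is always loaded; every other language only overrides entries.
ENGLISH = {
    "keywords": {"if": "if", "while": "while", "return": "return"},
    "messages": {"expected_semi": "expected ';' after expression"},
}

LANGUAGES = {
    "english": ENGLISH,
    "chinese": {
        "keywords": {"如果": "if", "返回": "return"},  # no "while" entry yet
        "messages": {},                                # nothing translated yet
    },
}


def load_language(name):
    """Merge the chosen language over English, section by section."""
    chosen = LANGUAGES.get(name, {})
    return {section: {**ENGLISH[section], **chosen.get(section, {})}
            for section in ENGLISH}


table = load_language("chinese")
```

With this layout `table["keywords"]` accepts both 如果 and the untranslated `while`, and a message id without a Chinese entry still yields the English text. An unknown language name simply yields the English tables.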

Most error messages are in string files, so it shouldn't be too hard
to create other language files for error messages and use them when
"--language=" is used.

2, to provide official documents of the multi-language version. That
is, llvm detailed instructions. As long as the developer llvm ontology
have enough understanding and awareness, it will not create redundant
errors that is the emergence of non-standard code.

This is the easiest step. You just need people that speak both
languages well and are interested in doing the translation. I'm sure
any patch in that respect will be well accepted.

Thanks,
--renato

OK, I now understand how difficult it would be to realize this idea. Thank you
for your answers.