Could I use LLVM as a base to build a different C++ compilation architecture?

Hello,

I love C++, but over the years I have identified more and more of its quirks, and I feel that someone should finally fix them.
This led me to start some serious research into the possibility of implementing an extension of C++ that would solve these. (Let’s call it k++ for now.)

I listed the basic ideas here: GitHub - wube/kov: C++ based programming language draft

The questions:

  1. Is there a reason why my goal (mostly with the compile architecture change) wouldn’t work?
  2. If not, is there a reason why it wouldn’t improve compile times drastically?
  3. How much of the existing LLVM could be used for this?
  4. What is your rough estimate of the person-years or money needed to create this?
  5. Any other comments on the ideas?

The C++ standard committee has been working on the #include problem for quite a while; the solution there is called “modules.” You might investigate that before trying to invent a different solution.

I’m not super familiar with the internals of the Clang frontend, so I’m unwilling to make a real cost estimate; most likely this is a hobby project for you that will keep you entertained for years.

Hello, thanks for the reply.

I did some basic investigation of modules, but they are not as great as I had hoped. Modules could improve things somewhat, but not to the point of drastically reducing compilation time. For example, from what I understand, modules won’t prevent duplicate template instantiation in different compilation units.
Based on the testing I did with some compilers, using modules didn’t bring a drastic compile-time reduction.
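
For reference, standard C++ already has a manual, per-instantiation way to avoid repeating the same instantiation in several compilation units: explicit instantiation declarations (`extern template`, available since C++11). The following is only a minimal sketch; `square` and `use_square` are hypothetical names, and the declaration and definition are placed in one file purely so that the example links on its own.

```cpp
#include <cassert>  // only needed for the check mentioned in the note below

// Hypothetical template standing in for something expensive to instantiate.
template <typename T>
T square(T value) {
    return value * value;
}

// Explicit instantiation declaration: asks the compiler not to instantiate
// square<int> implicitly here and to rely on an explicit instantiation
// definition instead. In a real project this line would sit in a shared header.
extern template int square<int>(int);

// Explicit instantiation definition: the one place where square<int> is
// generated. In a real project this line would sit in exactly one .cpp file.
template int square<int>(int);

int use_square() {
    return square(7);  // refers to the explicitly instantiated square<int>
}
```

Calling `use_square()` from `main` should give 49, e.g. `assert(use_square() == 49);`. Each instantiation has to be listed by hand this way, so it is not an automatic mechanism.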

This is different from my proposal, where every template instantiation is made at most once per compilation, the compiler stays running in the background, and so on. I’m aiming for project compilation times to go down by one or two orders of magnitude.

The point is not to make a hobby project. I would like to find and hire several people suited to this project and try to build some first prototypes in a reasonable time. I’m basically willing to throw some money at the problem (probably a few million dollars) without asking much in return (as I still want this to be open source and free); I would just like to make the world a better place …

But before investing in that, I need to find someone deeply knowledgeable in compiler architecture who could discuss this with me and tell me the potential dangers of the whole plan.

Edit:
Lastly, switching to imports still requires writing the imports manually. I would like not to replace include with something else but to remove it entirely: you would just write code, not includes.

I’m not sure I can give a cost estimate of how long it would take to create k++, but perhaps my experience will help. I used LLVM to create a language for one of my personal projects because I also felt that C++ was lacking in certain areas - includes being one of them.

It took a lot longer than expected. There is a lot I didn’t consider when I first started:

  • The importance of good quality error reporting, with line numbers, column numbers, helpful messages, and so on. Plus things like ‘fix-its’.
  • Most languages these days have a package manager (pip/gem/npm/cargo/composer/etc.) and a central publicly accessible repository of packages. Creating this isn’t trivial.
  • The codegen for compile-time programming (e.g. template programming) is hard. I got the impression that a compiler would need to JIT the compile-time constructs to get any kind of measurable speed-up when compiling templates or constexpr expressions compared with C++.
  • If your language is not ABI-compatible with C or C++, then you’ll need to either write your own standard library or write a tool to automatically convert the C/C++ standard library into k++. If you end up needing to write your own standard library, this will take a long time.
  • Good editor integration these days requires implementing an LSP server. Also not trivial.
  • What about debugging support? If your language is almost identical to C++ then you may be able to reuse some of what’s in Clang, but if your language evolves features that aren’t translatable to C++ then it may become trickier.
  • What about address, undefined behaviour and thread sanitizers? LLVM has built-in passes for these so this may be pretty straightforward. But if your language evolves to differ substantially from C++, this might become more difficult.
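
To make the compile-time programming point in the list above concrete, here is a minimal sketch of the kind of code a front end has to execute itself during compilation, however it chooses to do that. The function name `fibonacci` is hypothetical, and the loop inside a `constexpr` function assumes C++14 or later.

```cpp
#include <cassert>  // only needed for the check mentioned in the note below

// Hypothetical compile-time function: the front end has to run this loop
// itself in order to verify the static_assert that follows.
constexpr unsigned long long fibonacci(unsigned n) {
    unsigned long long previous = 0, current = 1;
    for (unsigned i = 0; i < n; ++i) {
        unsigned long long next = previous + current;
        previous = current;
        current = next;
    }
    return previous;  // after n iterations this holds the n-th Fibonacci number
}

// Checked entirely during compilation; no code for it runs at program start.
static_assert(fibonacci(50) == 12586269025ULL, "evaluated by the compiler");
```

The same function can still be called at run time, e.g. `assert(fibonacci(10) == 55);`.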

I started off trying to create a ‘better C++’ but I gradually realised it would take thousands of hours. I settled on a much simpler C-like language instead, which still took several hundred hours.

If you’re instead looking to fork clang rather than creating a new front-end on top of LLVM, perhaps it won’t take as much time.

EDIT: Also, I believe ‘Circle’ (https://www.circle-lang.org/) is a C++ compiler built on top of LLVM. I don’t have any experience with this compiler, but perhaps forking this might be a way to create k++?

The biggest issue with your compilation model is it’s impossible to do with C++'s syntax. You very quickly run into places where you cannot continue parsing without first seeing the declaration of a name. Even determining if something is a declaration at all requires having already resolved everything it references.

I’m also pretty sure that trying to represent this as a parse graph to handle the ambiguity ends up needing exponential space in some cases. Really the first step in any attempt to make a saner C++ is to change the syntax such that determining if something is a declaration is context-free.


Interesting, care to point to an example of such an ambiguity, or to general reading material on these kinds of problems?

I vaguely remember seeing some examples like this, where the ambiguity came from some obscure usage that “nobody uses”. It would be great if it were possible to define a pragmatic set of syntax changes that affected only those obscure usages, but I would love to know more details about the problem first.

The simplest case of an ambiguous declaration-or-expression issue is this: a * b; – is that a declaration of a variable b of type a*, or is that a multiplicative expression statement with a and b as operands? (FWIW, this is also an example of why typename is necessary: if we wrote it as a::c * b; inside a template where a is a dependent type, C++ assumes that a::c is an expression, whereas typename a::c is a type name. Note that “Down with typename!”, which made it into C++20 I’m told, removed mandatory typename in contexts where the only possible parse was a type name.)
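
As a minimal sketch of the above, the same tokens `a * b;` compile both ways depending on what `a` was declared as. All names here (`as_declaration`, `as_expression`, `Value`, `HasType`, `HasValue` and the function names) are hypothetical and exist only for illustration.

```cpp
#include <cassert>      // only needed for the checks mentioned in the note below
#include <type_traits>

namespace as_declaration {
struct a {};                         // here `a` names a type

bool b_is_pointer() {
    a * b;                           // declares `b` as a pointer to `a`
    b = nullptr;
    return std::is_same<decltype(b), a*>::value && b == nullptr;
}
}  // namespace as_declaration

namespace as_expression {
int multiplications = 0;
struct Value {};
// Counts calls so that the multiplication has an observable effect.
Value operator*(Value, Value) { ++multiplications; return Value{}; }

Value a, b;                          // here `a` and `b` name objects

int count_after_statement() {
    a * b;                           // calls operator*, result is discarded
    return multiplications;
}
}  // namespace as_expression

// Dependent-name variant: inside a template, T::c is assumed to be a value
// unless `typename` says otherwise.
struct HasType  { using c = int; };
struct HasValue { static constexpr int c = 6; };

template <typename T>
bool declares_pointer() {
    typename T::c * p = nullptr;     // a declaration of `p`, thanks to `typename`
    return p == nullptr;
}

template <typename T>
int multiplies(int b) {
    return T::c * b;                 // a multiplication: T::c is treated as a value
}
```

For example, `as_expression::count_after_statement()` returns 1 on its first call, and `multiplies<HasValue>(7)` returns 42. A parser that has not yet seen the declarations of `a` cannot tell the first two cases apart.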

In general, for as much as some of these features seem useful, there’s quite a few of them that look like “I’m not sure how to write the semantics for this stuff that’s at all sensible.” Const-deduplication definitely falls into this bucket, as does named parameter passing (which has been proposed to, and rejected by, the C++ committee) and type-based parameter resolution. If you believe that the semantics aren’t unworkable, then I might suggest that you take the time working on an actual proposal to the C++ standards committee; at the very least, that would also introduce you to some of the hidden trapdoors in complex language semantics interactions.


Interesting, but this one seems solvable to me. We have cases like this:

class X
{
  a * b; // clearly a declaration
};
void foo()
{
  a * b; // function body can be processed after (external) types are resolved, so I will know
}
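
A small precedent for processing function bodies later already exists in standard C++: the bodies of member functions defined inside a class are compiled as if they appeared after the end of the class, so they can use members and types declared further down. This is only a sketch with hypothetical names (`Rectangle`, `Length`), and it shows deferral within a single class, not across a whole project.

```cpp
#include <cassert>  // only needed for the check mentioned in the note below

struct Rectangle {
    // The body uses `Length`, `width` and `height`, all declared further
    // down. Standard C++ treats the bodies of member functions defined in
    // the class as if they came after the closing brace of the class.
    int area() const {
        Length result = width * height;
        return result;
    }

    using Length = int;
    Length width = 3;
    Length height = 4;
};
```

With a default-constructed object, e.g. `Rectangle r;`, the check `assert(r.area() == 12);` should pass.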

As for the semantics, it is mostly just a draft; what matters to me is the idea of solving the problem, which I acknowledge is there.

Interesting. The paper you mentioned, “A proposal for named arguments for C++”, proposes basically the same thing with the same syntax and was rejected, but I don’t agree with the arguments for rejecting it, especially in the context of k++, so I stand by it.