[RFC] Moving (parts of) the Cling REPL in Clang

Motivation

Hi Vassil,

Thank you for the very detailed email. I am not directly involved in clang-dev anymore, but I would love to see Cling get folded back into mainline LLVM development. The Cling project is really cool and I think that it doesn’t get the recognition it deserves.

-Chris

I think that it would be great to have infrastructure for incremental C++ compilation, supporting interactive use, just-in-time compilation, and so on. I think that the best way to deal with the patches, etc., as well as IncrementalAction, is to first send an RFC explaining the overall design.

-Hal

I like cling, and having it integrated with the rest of the project would be neat. I agree with Hal’s suggestion to explain the design of what remains. It sounds like a pretty small amount of code.

I do not know enough about cling, but I like what you describe very much, and I am particularly intrigued by how your approach could also be adapted to do ahead-of-time constexpr metaprogramming, which also involves incrementally adding declarations to the translation unit.

Dave

I like cling, and having it integrated with the rest of the project would be neat. I agree with Hal’s suggestion to explain the design of what remains. It sounds like a pretty small amount of code.

JF, Hal, did you mean you want a design document of how cling works in general, or a design RFC for the patches we have? A design document for cling would be quite large and will take us some time to write up. OTOH, we could relatively easily give a rationale for each patch.

I do not know enough about cling, but I like what you describe very much, and I am particularly intrigued by how your approach could also be adapted to do ahead-of-time constexpr metaprogramming, which also involves incrementally adding declarations to the translation unit.

Wow, I do not think we have thought of something like that. Cling keeps a single clang compiler instance in memory and each new input just "adds" to it -- nothing fancy on the frontend. The more interesting part happens in CodeGen where we produce multiple llvm::Modules. Maybe some parts from that could be reused to pursue the direction you are intrigued about.
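As a rough illustration of the model described above (one long-lived compiler instance that every input adds to, with a separate module produced per input), here is a minimal toy sketch. `ToySession` and `ToyModule` are hypothetical names invented for this example; they are not clang, cling, or LLVM types, and real declarations are of course AST nodes rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these types exist in clang or cling.
// One long-lived "compiler instance" accumulates every declaration,
// while each input is emitted as its own module.
struct ToyModule {
  std::size_t id;                  // position of the input in the session
  std::vector<std::string> decls;  // declarations introduced by that input
};

class ToySession {
public:
  // Feed one chunk of input. The declarations join the shared state and
  // a fresh module holding only the new declarations is returned.
  ToyModule process(const std::vector<std::string> &newDecls) {
    assert(!newDecls.empty() && "each input should declare something");
    allDecls_.insert(allDecls_.end(), newDecls.begin(), newDecls.end());
    ToyModule m{modulesEmitted_, newDecls};
    ++modulesEmitted_;
    return m;
  }

  // Everything declared so far stays visible to later inputs.
  bool knows(const std::string &decl) const {
    for (const std::string &d : allDecls_)
      if (d == decl) return true;
    return false;
  }

  std::size_t modulesEmitted() const { return modulesEmitted_; }

private:
  std::vector<std::string> allDecls_;  // the single, growing translation unit
  std::size_t modulesEmitted_ = 0;
};
```

Feeding two inputs to one `ToySession` yields two modules, while `knows()` still finds the declaration from the first input when the second one is processed.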

Hi Vassil,

This is a very exciting proposal that I can imagine bringing important benefits to the existing cling users and also to the clang user and developer community. Thank you for all the work you and your team have done on cling so far and for offering to bring that work under the LLVM umbrella!

Are you imagining cling being part of the clang repository, or a separate LLVM subproject (with only the changes necessary to support cling-style uses of the clang libraries added to the clang tree)?

I like cling, and having it integrated with the rest of the project would be neat. I agree with Hal’s suggestion to explain the design of what remains. It sounds like a pretty small amount of code.

JF, Hal, did you mean you want a design document of how cling in general or a design RFC for the patches we have? A design document for cling would be quite large and will take us some time to write up. OTOH, we could relatively easily give a rationale for each patch.

I had in mind something that's probably in between. Something that explains the patches and enough about how they fit into a larger system that we can reason about the context.

-Hal

Hi Richard,

Hi Vassil,

This is a very exciting proposal that I can imagine bringing important benefits to the existing cling users and also to the clang user and developer community. Thank you for all the work you and your team have done on cling so far and for offering to bring that work under the LLVM umbrella!

Are you imagining cling being part of the clang repository, or a separate LLVM subproject (with only the changes necessary to support cling-style uses of the clang libraries added to the clang tree)?

Good question. In principle cling was developed with the idea to become a separate LLVM subproject, although I could easily see it fitting in clang/tools/.

Nominally, cling has "high-energy physics"-specific features such as the so-called 'meta commands'. For example, `[cling] .L some_file` would try to load a library called some_file.so and, if it does not exist, try #include-ing a header with that name; `[cling] .x script.C` includes script.C and calls a function named `script`. I can imagine that the broader community may not like or use that. If we start trimming down features like that then it won't really be cling anymore. Here is what I would imagine as a way forward:

1. Land as many cling/"incremental compilation"-related patches as we can in clang.
2. Build a simple tool, let's use a strawman name -- clang-repl, which only does the basics. For example, one can feed it incremental C++ and execute it.
3. Rework cling to use that infrastructure -- ideally, implementing its specific meta commands and other domain-specific features such as dynamic scopes.

We could move any of the cling features which the broader community finds useful closer to clang. For the moment I am being conservative as this will also give us the opportunity to rethink some of the features.
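To make the meta commands mentioned earlier (`.L some_file` and `.x script.C`) a bit more concrete, here is a minimal sketch of how a prompt line could be classified before being handed to the compiler. All names (`MetaKind`, `MetaCommand`, `parseMetaCommand`, `entryPointFor`) are hypothetical and do not reflect cling's actual implementation; stripping a leading directory from the file name is likewise an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names throughout; this is not cling's implementation.
enum class MetaKind { None, Load, Execute };

struct MetaCommand {
  MetaKind kind = MetaKind::None;
  std::string argument;  // file name following the command
};

// Recognise ".L <file>" and ".x <file>"; anything else is ordinary C++.
MetaCommand parseMetaCommand(const std::string &line) {
  MetaCommand cmd;
  if (line.size() < 4 || line[0] != '.' || line[2] != ' ')
    return cmd;
  if (line[1] == 'L')
    cmd.kind = MetaKind::Load;
  else if (line[1] == 'x')
    cmd.kind = MetaKind::Execute;
  else
    return cmd;
  cmd.argument = line.substr(3);
  return cmd;
}

// For ".x script.C" the function to call is the file name without
// directory and extension, i.e. "script".
std::string entryPointFor(const std::string &file) {
  assert(!file.empty() && "a file name is required");
  std::size_t slash = file.find_last_of('/');
  std::size_t begin = (slash == std::string::npos) ? 0 : slash + 1;
  std::size_t dot = file.find_last_of('.');
  std::size_t end =
      (dot == std::string::npos || dot < begin) ? file.size() : dot;
  return file.substr(begin, end - begin);
}
```

A generic tool in the spirit of the strawman clang-repl would only need the `MetaKind::None` path; the other two branches are the kind of domain-specific behaviour that could stay in cling.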

The hard part is what lives where. First bullet point is clear. The second -- not so much. Clang has a clang-interpreter in its examples folder and it looks a little unmaintained. Maybe we can start repurposing that to match 2.

As for cling itself, there are some challenges we should try to solve. Our community lives downstream (currently llvm-5) and a straightforward llvm upgrade + bugfixing takes around 3 months due to the nature of our software stacks. It would be a non-trivial task to move the cling-based development to llvm upstream. My worry is that HEP-cling will soon depart from LLVM-cling if we don't get both communities on the same codebase (we have experienced such a problem with the getFullyQualified* interfaces). I am hoping that a middleman, such as clang-repl, can help. When we move parts of cling into clang we will develop and test the required functionality using clang-repl. This way users will enjoy a cling-like experience, and when cling upgrades its llvm its codebase will become smaller.

Am I making sense?

I had in mind something that’s probably in between. Something that explains the patches and enough about how they fit into a larger system that we can reason about the context.

Maybe a purpose would be more useful to understand your request? I assume you meant “I’d like us to understand what we’re signing up to maintain, and why it’s useful to do things this way”. In particular, if there’s undue burden in a particular component, and the code could be changed to work differently with less support overhead, then we’d want to identify this fact ahead of time.

I’m guessing at what Hal is asking, LMK if that’s not what you had in mind!

Thanks for the clarification. Sure, we can do that. I was hoping that would be part of the review process for each individual patch. Also, if that's the preference, we can write a short-ish doc with some patch classification and explanations. Btw, I've uploaded the cling-specific patches against the clang-9 codebase: https://github.com/vgvassilev/clang/commits/upgrade_llvm90 Our production LLVM-5 is patch free; I had to introduce some patches in llvm-9, but thanks to Lang I know how to get rid of them.

Maybe a purpose would be more useful to understand your request? I assume you meant “I’d like us to understand what we’re signing up to maintain, and why it’s useful to do things this way”. In particular, if there’s undue burden in a particular component, and the code could be changed to work differently with less support overhead, then we’d want to identify this fact ahead of time.

I’m guessing at what Hal is asking, LMK if that’s not what you had in mind!

Yes. To understand how all of the pieces fit together to enable support for incremental compilation of C++ code. Once everything is in place, if I wanted to use the infrastructure to do some kind of incremental compilation of C++, what would I do? And what does the set of patches aim to do to get us there?

-Hal

Yes, the above all makes sense to me. I agree that there should be only one thing named ‘cling’, and that it should broadly have the feature set that current ‘cling’ has. I think there are a couple of ways we can get there while still providing a minimalist interpreter to a broader audience: either we can build a simpler clang-interpreter and a more advanced cling binary from a common set of libraries, or we could produce a configurable binary that’s able to serve both roles depending on configuration or a plugin or scripting system.

One other thing I think we should consider: there will be substantial overlap between the incremental compilation, code generation, REPL, etc. of cling and that of lldb. For the initial integration of cling into LLVM, there’s probably not much we can do about that, but it would seem beneficial for both cling and lldb if common parts could be shared where possible. As an extreme example, if we could fully unify the projects to the point where a user could switch into an ‘lldb mode’ in the middle of a cling session to do step-by-step debugging of code entered into the REPL, that would seem like an incredibly useful feature. Perhaps there’s some common set of base functionality that can be factored out of lldb and cling and unified. It would likely be a good idea to start talking to the lldb folks about that early, in case it guides your work porting cling to trunk.

This is a really good point. I’m not sure how much awareness there is on this list, but the Swift REPL is worth looking at if you haven’t seen it. It is built on/in LLDB, and provides some really nice user experience features.

For example, if you evaluate an expression that crashes, you get a full backtrace and integrated debugger experience. There are a couple of examples on this page, and more detailed info online:
https://swift.org/lldb/

-Chris

Yes, the above all makes sense to me. I agree that there should be only one thing named 'cling', and that it should broadly have the feature set that current 'cling' has. I think there are a couple of ways we can get there while still providing a minimalist interpreter to a broader audience: either we can build a simpler clang-interpreter and a more advanced cling binary from a common set of libraries, or we could produce a configurable binary that's able to serve both roles depending on configuration or a plugin or scripting system.

Good point. We could make it extensible, and actually that should be a design goal. How exactly to do that is not very clear to me. Can you elaborate on what you had in mind as a configuration or scripting system? (For the plugin system, I think I know what you meant.) I will give an example with 3 distinct features in cling which we have implemented over the years and which had different requirements:

* AST-based automatic differentiation <https://llvm.org/devmtg/2013-11/slides/Vassilev-Poster.pdf> with the clad library <https://github.com/vgvassilev/clad> -- here we essentially extend cling's runtime by providing the `clad::differentiate`, `clad::gradient`, `clad::hessian` and `clad::jacobian` primitives. Each primitive is a specially annotated wrapper over a function, say `double pow2(double x) { return x*x; }; auto pow2dx = clad::differentiate(pow2,/*wrt*/0);`. Here we let clang build a valid AST and the plugin creates the first-order derivative and swaps the DeclRefExpr just before codegen so that we call the derivative instead. This is achievable with the current clang plugin system (a bit problematic on Windows, as clang plugins do not work there).

* Language extensions which require Sema support -- we have a legacy feature which should define a variable on the prompt if it is not defined (something like implicit auto): `cling[] i = 13;` should be translated into `cling[] auto i = 13;` if `i` is undefined. We solve that by adding a last-resort lookup callback which marks `i` as being of dependent type so that we can produce an AST which we can later 'fix'.

* Language extensions which require delayed lookup rules (aka dynamic scope) -- ROOT has an I/O system bound to cling, so people can write: `if (TFile::Open("file_that_has_hist_cpp_obj.root")) hist->Draw();`. Here we use the approach from the previous bullet and synthesize `if (TFile::Open("file_that_has_hist_cpp_obj.root")) eval<void>("hist->Draw()", /*escape some context*/...);`.

Implementing these three features is possible with current clang. The issue is that it feels more like hacking clang than extending it. If we can come up with a sound way of implementing these features, that would be awesome.
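The observable effect of the "implicit auto" feature from the second bullet can be mimicked at the text level. The sketch below is only that: cling does this on the AST through a lookup callback, not by rewriting strings, and `leadingIdentifier` and `addImplicitAuto` are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Toy, text-level illustration. Cling works on the AST via a lookup
// callback; nothing below is cling or clang API.

// Return the identifier at the start of `input`, or "" if there is none.
std::string leadingIdentifier(const std::string &input) {
  std::size_t n = 0;
  while (n < input.size() &&
         (std::isalnum(static_cast<unsigned char>(input[n])) ||
          input[n] == '_'))
    ++n;
  if (n == 0 || std::isdigit(static_cast<unsigned char>(input[0])))
    return "";
  return input.substr(0, n);
}

// "i = 13;" becomes "auto i = 13;" when `i` has not been declared yet.
// `declared` plays the role of the names already known to the session.
std::string addImplicitAuto(const std::string &input,
                            std::set<std::string> &declared) {
  std::string name = leadingIdentifier(input);
  if (name.empty()) return input;

  // Skip blanks after the identifier and look for a plain '=' (not "==").
  std::size_t pos = name.size();
  while (pos < input.size() && input[pos] == ' ') ++pos;
  bool isAssignment = pos < input.size() && input[pos] == '=' &&
                      (pos + 1 >= input.size() || input[pos + 1] != '=');

  if (!isAssignment || declared.count(name)) return input;

  declared.insert(name);
  assert(declared.count(name) == 1);  // the name is now known to the session
  return "auto " + input;
}
```

The first `i = 13;` is rewritten and records `i`; a later `i = 14;` is left alone because the name is already declared.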

One other thing I think we should consider: there will be substantial overlap between the incremental compilation, code generation, REPL, etc. of cling and that of lldb.

I would love to hear opinions from the lldb folks. We have chatted a number of times and I have looked at how they do it. I think lldb spawns (or used to spawn, last time I looked) a compiler instance per input line. That is not acceptable for cling due to its high-performance requirements. Most of the issues that need solving for lldb come from materializing debug information to AST. LLDB folks, correct me if I am wrong.

That being said, it doesn't mean that we should not aim to centralize the incremental compilation for both projects. We should, but it may be challenging because of the different focus, which defines project priorities.

For the initial integration of cling into LLVM, there's probably not much we can do about that, but it would seem beneficial for both cling and lldb if common parts could be shared where possible. As an extreme example, if we could fully unify the projects to the point where a user could switch into an 'lldb mode' in the middle of a cling session to do step-by-step debugging of code entered into the REPL, that would seem like an incredibly useful feature. Perhaps there's some common set of base functionality that can be factored out of lldb and cling and unified. It would likely be a good idea to start talking to the lldb folks about that early, in case it guides your work porting cling to trunk.

Indeed. There have been user requests to be able to run step-by-step in cling. That would be the ultimate long term goal!

This is a really good point. I’m not sure how much awareness there is on this list, but the Swift REPL is worth looking at if you haven’t seen it. It is built on/in LLDB, and provides some really nice user experience features.

For example, if you evaluate an expression that crashes, you get a full backtrace and integrated debugger experience. There are a couple of examples on this page, and more detailed info online:
https://swift.org/lldb/

Thanks Chris! The comments <https://reviews.llvm.org/D34444#793311> from John McCall allude to this: we need a broader discussion on how we do things for incremental compilation. I still have not forgotten that I need to get back to him on that ;)

That's one of the challenges I see. So far, the patches we tried to upstream did not come with enough context. Now I hope that we can start with something minimal (and likely wrong), but we will have a common tool in the context of which we can discuss how to make things better. Putting swift-repl folks, lldb folks and cling folks in one virtual room may be helpful.

Vassil is right, it’s just one Clang instance per expression. This is by design as it allows LLDB’s expression evaluator to be flexible and it also makes the code simpler and clearer.

Regarding the REPL part: LLDB’s expression parser for C++ isn’t meant to be a full REPL. There is only some limited data sharing between expressions (e.g., result types and a few specifically marked declarations from the user), as the goal is to make an AST that fits the context where the expression is evaluated. That means that we need to support that a user can type “MyStruct” and in one expression it might refer to a struct type, but in the next expression (which could be at some other point in the program) it might be a typedef, or a macro, or an Objective-C class, or a global/local variable name, or a member variable as the evaluation context changed into a class, or a keyword in the current C-language, or also a struct but with a different definition, or not even anything at all.

I really don’t see a sane way to support just this one simple feature with Cling’s single shared AST + incremental CodeGen approach.
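The contrast between the two models can be sketched with a toy lookup. Everything here (`SymbolKind`, `Scope`, `resolveFresh`, `SharedSession`) is made up for illustration and is not LLDB or Clang API. The shared model is deliberately simplified: in a real single translation unit a conflicting redeclaration would typically be diagnosed as an error rather than silently resolved to the first meaning.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model; these types are illustrative and not LLDB or Clang API.
enum class SymbolKind { Unknown, StructType, Typedef, Variable };

// Names visible at one stop location in the debugged program.
using Scope = std::map<std::string, SymbolKind>;

// Per-expression model: every evaluation starts from the scope of the
// current stop location, so nothing from earlier expressions can clash.
SymbolKind resolveFresh(const Scope &current, const std::string &name) {
  assert(!name.empty() && "lookup needs a name");
  auto it = current.find(name);
  return it == current.end() ? SymbolKind::Unknown : it->second;
}

// Shared model (simplified): declarations accumulate, and the first
// meaning of a name stays in effect for the rest of the session.
class SharedSession {
public:
  SymbolKind resolve(const Scope &current, const std::string &name) {
    auto seen = accumulated_.find(name);
    if (seen != accumulated_.end()) return seen->second;
    SymbolKind kind = resolveFresh(current, name);
    if (kind != SymbolKind::Unknown) accumulated_[name] = kind;
    return kind;
  }

private:
  Scope accumulated_;  // everything the session has resolved so far
};
```

With two scopes in which `MyStruct` is a struct and a typedef respectively, `resolveFresh` follows the current scope each time, whereas `SharedSession` keeps returning whatever it saw first.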

Also, LLDB’s expression parser doesn’t really have a lot of non-LLDB-specific code left that could be shared with other projects. We used to have a bunch of Clang in the expression parser, but most of it is gone by now. The rest is really LLDB-specific (e.g., configuring Clang to match the target we are trying to debug, and a lot of code for setting up the right evaluation context of the location where the program is stopped).

Having said that, I think Cling should be upstream, and any code we can find in common between Cling and LLDB should be shared. Feel free to add me to patches and I’ll see what I can find.

- Raphael

Yes. To understand how all of the pieces fit together to enable support for incremental compilation of C++ code. Once everything is in place, if I wanted to use the infrastructure to do some kind of incremental compilation of C++, what would I do? And what do the set of patches aim to do to get us there?

It took us a while... We have published a blog post on interactive C++ with cling at the LLVM blog. Direct link <https://blog.llvm.org/posts/2020-11-17-interactive-cpp-with-cling/>. I realize this touches only on some aspects of interactive C++ and cling. We have a longer document with more information but I am not yet comfortable in giving a public pointer to it. If you are interested in that document please ping me off-list and I will give you access.