Hi,
I am trying to build a tool that can insert new AST nodes into an AST obtained from source code and then generate the modified source code. For example, add an if condition at a given location.
I have seen examples of Rewriter, which can insert text, but I want to insert a proper AST node and generate the source code from the modified AST.
For this purpose, I think I should be using ASTWriter rather than Rewriter. Is there any documentation I can refer to on how to implement this?
Any help in this regard is highly appreciated.
Thanks!
Ridwan
That’s interesting, but is it not possible for you to work on the IR?
The IRBuilder has better support for arbitrary instruction insertion.
Hi Madhur,
Thank you for the reply.
I am currently working on AST mutation and need this feature to insert custom nodes. If I have to use IR, I will have to translate the AST mutations into IR-level code and redo them. Or is there a tool I can use to translate an AST diff into IR instructions?
Regards
I am not aware of any such tool.
It’s generally considered that the AST invariants are too subtle/complex to use AST modification and AST->source conversion reliably. Refactoring/source code modification is generally encouraged to be done via textual edits generated from source location information in the AST.
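A minimal, clang-free sketch of that idea, assuming the byte offsets have already been computed from AST source locations (in a real tool, for example through `SourceManager::getFileOffset`). `TextEdit` and `applyEdits` are hypothetical names chosen for illustration; clang's own counterpart is `clang::tooling::Replacement`. Edits are applied from the highest offset down so that the offsets of earlier edits stay valid, and the sketch assumes the edits do not overlap.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical edit record: replace `length` bytes starting at `offset`
// with `text`. A zero length means "pure insertion".
struct TextEdit {
  std::size_t offset;
  std::size_t length;
  std::string text;
};

// Applies non-overlapping edits to `source` (overlaps are not detected).
// Sorting by descending offset means each replacement only shifts text
// that later (lower-offset) edits never touch.
std::string applyEdits(std::string source, std::vector<TextEdit> edits) {
  std::sort(edits.begin(), edits.end(),
            [](const TextEdit &a, const TextEdit &b) {
              return a.offset > b.offset;
            });
  for (const TextEdit &edit : edits) {
    if (edit.offset > source.size() ||
        edit.length > source.size() - edit.offset)
      throw std::out_of_range("edit lies outside the source text");
    source.replace(edit.offset, edit.length, edit.text);
  }
  return source;
}
```

For example, adding an if condition in front of the statement `g(x);` in `"void f(int x) {\n  g(x);\n}\n"` is a zero-length insertion of `"if (x > 0) "` at offset 18, the statement's start. The resulting text can then be tidied up with clang-format.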
It’s odd, though, because generating code on the fly would be easier on the AST than on the IR, if the goal is to JIT the code and save it at the same time.
It’s probably also easier to generate properly formatted code?
Regards,
Matthieu
Hmm, not sure I follow.
Did the user write this source code? Are they going to want to change it later? Does it make sense for them to see the edits you’re suggesting, or are those edits really compiler optimizations/transformations? If they’re more the latter, then perhaps caching the LLVM IR (with these optimizations/transformations applied) rather than modifying the source would be more suitable.
Easier to generate correctly formatted code from the AST? Not really - the AST printing doesn’t have any particularly nuanced formatted printing. That’s what clang-format is for (it was specifically built for doing code rewrites based on ASTs - where the rewrite is expressed as a textual change to the original source (not an AST modification) & that change is applied, then clang-format is used to tidy it up).
That’s my use case; it’s probably different from the OP’s.
In my case, I want to generate a first pass with a JIT (the code is generated from another description), but the user could change the generated code in a subsequent pass.
Modifying the AST directly is not an option; try generating equations with thousands of parameters that are solved in real time. There is just no way someone can write them efficiently in IR (that’s why you have the AST-to-IR generator!).
I don’t understand your last paragraph. If clang-format can clean up rewrites, why can’t it reformat code from the AST? If the AST printer writes any kind of code, why couldn’t clang-format reformat it?
Curious. As much as possible, I’d encourage you to find ways to not have users work with generated code (by abstracting that generated code away from them - giving them a higher level representation to write, places where the generated code calls back into the user code, etc). But I don’t know your domain, etc, and wouldn’t suggest what is or isn’t right for you and your users.
But the main takeaway is that modifying the AST and generating code from that is discouraged in favor of generating source code edits.
clang-format could format AST-generated source too. I was commenting on that in answer to your question “Easier to generate correctly formatted code from the AST?”: it’s not easier to generate correctly formatted code from the AST than it is from a textual edit. In both cases you’d use something like clang-format to tidy up the result. The AST itself doesn’t have fancy formatting support, so it’s no better than a textual edit in terms of getting nicely formatted results.
My domain is electrical schematic modeling. Some people would like to have the generated code, but then change the model of one component to something else. Or replace the Newton-Raphson algorithm with another one. Or remove entries from the Jacobian matrix to check whether terms that contribute little to the result can be dropped to improve performance.
I could write the code in memory and then pass it to clang, but it feels… odd. But maybe that’s what I need to do in the end? Is there an example of compiling code from a string?
Cheers,
Matthieu
I guess a few layers:
If you’re going source-to-source and want users to see/modify the new source, then making text edits based on source locations found in the AST (but not modifying the AST itself) is generally the suggested idea. If you simultaneously want to produce that source and compile it - yeah, probably easier to write it out, then compile it from that source on the filesystem.
(there are probably some ways to compile from source in memory - but I’m not sure of the details, it might involve using the virtual filesystem layers - I think they were implemented for continuous compilation in IDEs (compiling from the edited source buffers open in the editor without having to write them to disk first))
Indeed, that’s what I’m now aiming at. Unfortunately, there seem to be no examples of how to use FrontendAction properly with clang 6.0.0. I can use libTooling with runToolOnCode to generate a module, but in that case the target triple is not set up properly when I want to use the JIT. And it seems to be a problem with clang itself, because if I do this:
```cpp
clang::DiagnosticOptions diagnosticOptions;
std::unique_ptr<clang::TextDiagnosticPrinter> textDiagnosticPrinter =
    std::make_unique<clang::TextDiagnosticPrinter>(llvm::outs(),
                                                   &diagnosticOptions);
llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> diagIDs;
std::unique_ptr<clang::DiagnosticsEngine> diagnosticsEngine =
    std::make_unique<clang::DiagnosticsEngine>(diagIDs, &diagnosticOptions,
                                               textDiagnosticPrinter.get());
clang::LangOptions languageOptions;
clang::FileSystemOptions fileSystemOptions;
clang::FileManager fileManager(fileSystemOptions);
clang::SourceManager sourceManager(*diagnosticsEngine,
                                   fileManager);
std::shared_ptr<clang::HeaderSearchOptions> headerSearchOptions(
    new clang::HeaderSearchOptions());
const std::shared_ptr<clang::TargetOptions> targetOptions =
    std::make_shared<clang::TargetOptions>();
targetOptions->Triple = llvm::sys::getDefaultTargetTriple();
std::unique_ptr<clang::TargetInfo> targetInfo(
    clang::TargetInfo::CreateTargetInfo(*diagnosticsEngine, targetOptions));
clang::HeaderSearch headerSearch(headerSearchOptions,
                                 sourceManager,
                                 *diagnosticsEngine,
                                 languageOptions,
                                 targetInfo.get());
clang::MemoryBufferCache PCMCache;
clang::CompilerInstance compInst;
std::shared_ptr<clang::PreprocessorOptions> opts(
    std::make_shared<clang::PreprocessorOptions>());
clang::Preprocessor preprocessor(opts,
                                 *diagnosticsEngine,
                                 languageOptions,
                                 sourceManager,
                                 PCMCache,
                                 headerSearch,
                                 compInst);
preprocessor.Initialize(*targetInfo);
auto filter = llvm::MemoryBuffer::getMemBufferCopy(fullfile);
sourceManager.setMainFileID(sourceManager.createFileID(std::move(filter)));
clang::IdentifierTable identifierTable(languageOptions);
clang::SelectorTable selectorTable;
clang::Builtin::Context builtinContext;
builtinContext.InitializeTarget(*targetInfo, nullptr);
clang::ASTContext astContext(languageOptions,
                             sourceManager,
                             identifierTable,
                             selectorTable,
                             builtinContext);
astContext.InitBuiltinTypes(*targetInfo);
compInst.setTarget(targetInfo.get());
llvm::LLVMContext context;
std::unique_ptr<clang::CodeGenAction> action =
    std::make_unique<clang::EmitLLVMAction>(&context);
textDiagnosticPrinter->BeginSourceFile(languageOptions, &preprocessor);
compInst.ExecuteAction(*action);
```
Then inside the action, even if I created the TargetInfo myself, clang tries something nasty:
ASAN:DEADLYSIGNAL
Matthieu, try https://github.com/firolino/clang-tool as a starting point and adapt the transformer to your needs to insert code/text at a given location. Hope it helps.
Best,
Firat
Hi,
I can easily add source code to a file. It’s a no-brainer; I’m not going to use clang for this, it’s overkill.
What doesn’t work, as shown in my previous example, is getting a module out of clang, a module that can be used inside LLVM. When executing that code, I get a write error. That’s a problem because there are no resources online on this issue. The API changes too quickly for this, and even libclang doesn’t help, because then the triple is not set (and then LLVM breaks).
Cheers
Matthieu
Try having a look at an OpenCL implementation - pocl is the one that comes to mind. OpenCL relies on taking a string and outputting code, all in memory [the spec doesn’t precisely say you can’t generate a file and compile that through a standalone executable, but that’s not exactly a “nice” solution].
I work on ARM’s OpenCL solution, so I’m not familiar with the details of pocl, but I’m 100% sure that they do something similar to what we do: build/take a string, call various parts of clang functions, and produce a binary executable in memory.
It may not be 100% like what you want to do, but it should give you something to start from.
Thanks. Hopefully, it won’t be too ugly to extract a basic working piece from it. None of the tutorials online tackle this aspect.
Cheers
Matthieu
Unfortunately, it seems that pocl is also based on a very old LLVM/clang version.
Isn’t there an up-to-date tutorial somewhere? Now the source manager crashes, accessing NULL pointers (I presume). It’s very frustrating to see all these instabilities in the code.
Cheers,
Matthieu