Incremental parsing/compilation question

Vassil_Vassilev1 · December 2, 2011, 3:02pm

Hi,

In cling we have line-by-line input that comes from a terminal-like prompt.
We compile the input incrementally. Each input line arrives as an llvm::MemoryBuffer,
and we compile each buffer by passing it to clang. However, when clang parses a
buffer it sees EOF at the end and destroys its current lexer and whatnot.

For example cling can have:
[cling$] extern "C" int printf(const char* fmt, ...);
[cling$] int i = 12;
[cling$] printf("%d\n", i);

Every line comes in a memory buffer terminated by \0, which clang treats as an
EOF. Ideally I want to tell the parser that parsing of the translation unit is not
yet done, and that it should be suspended until the user's next input.

Correct me if I am wrong, but the best way of doing that would be to implement a
'suspend' token. When the lexer and parser see that 'suspend' token they would stop
as if it were an EOF token, but without deleting or cleaning up anything, so that
parsing could be resumed later with the same state.
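
To make the idea concrete, here is a minimal sketch that uses a toy parser rather than clang's own Lexer and Parser. Every name in it (`ToyParser`, `feed`, `finish`) is hypothetical and invented for illustration. It only shows parser state surviving across several buffers, with the real end of the translation unit signalled separately from the end of a buffer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a parser whose state must outlive a single input buffer.
// None of these names come from clang; they are assumptions for illustration.
class ToyParser {
public:
  // Consume one input buffer. Reaching its end only suspends parsing:
  // the partial statement and the completed statements are both kept.
  void feed(const std::string &buffer) {
    assert(!finished && "cannot feed input after finish()");
    for (char c : buffer) {
      if (c == '\0')
        break; // end of this buffer: suspend, do not tear anything down
      pending.push_back(c);
      if (c == ';') { // in this toy grammar a ';' completes a statement
        statements.push_back(pending);
        pending.clear();
      }
    }
  }

  // The real end of the translation unit: only here is state finalized.
  void finish() { finished = true; }

  const std::vector<std::string> &parsed() const { return statements; }
  bool hasPendingInput() const { return !pending.empty(); }

private:
  std::vector<std::string> statements; // statements completed so far
  std::string pending;                 // partial statement carried across buffers
  bool finished = false;
};
```

Feeding "int i = " and then "12;" as two separate buffers yields the single statement "int i = 12;", because the first buffer's end is treated as a pause rather than as the end of input.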

If this is the right approach, what would be the best way to represent the 'suspend'
token, i.e. which ASCII character would trigger the suspension? (I'd really like to use
"$", but it is already part of an extension.)

Vassil

DougGregor · December 2, 2011, 3:43pm

> Hi,
>
> In cling we have line-by-line input that comes from a terminal-like prompt.
> We compile the input incrementally. Each input line arrives as an llvm::MemoryBuffer,
> and we compile each buffer by passing it to clang. However, when clang parses a
> buffer it sees EOF at the end and destroys its current lexer and whatnot.
>
> For example cling can have:
> [cling$] extern "C" int printf(const char* fmt, ...);
> [cling$] int i = 12;
> [cling$] printf("%d\n", i);
>
> Every line comes in a memory buffer terminated by \0, which clang treats as an
> EOF. Ideally I want to tell the parser that parsing of the translation unit is not
> yet done, and that it should be suspended until the user's next input.
>
> Correct me if I am wrong, but the best way of doing that would be to implement a
> 'suspend' token. When the lexer and parser see that 'suspend' token they would stop
> as if it were an EOF token, but without deleting or cleaning up anything, so that
> parsing could be resumed later with the same state.

'suspend' should probably just be a special handling of the 'eof' token, so that the parser/lexer doesn't tear everything down (but otherwise acts exactly the same). We don't want the parser or preprocessor to have to check 'is this eof or suspend?' every time it currently checks for eof.
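
One way to picture this, as a toy model and not clang's actual Token or Lexer classes: the lexer still returns an ordinary eof token, with an extra flag on it, and only the one place that tears things down looks at that flag. All names below (`ToyLexer`, `Token::suspend`, `parseUntilEof`) are assumptions made up for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model, not clang's Token/Lexer: every name here is an assumption.
enum class TokKind { word, eof };

struct Token {
  TokKind kind = TokKind::eof;
  bool suspend = false; // only meaningful on eof: "more input will follow"
  std::string text;
};

class ToyLexer {
public:
  explicit ToyLexer(bool isIncremental) : incremental(isIncremental) {}

  // Add more input; lexing continues from where it previously stopped.
  void append(const std::string &chunk) { input += chunk; }

  // Returns the next space-separated word, or eof once input is exhausted.
  Token lex() {
    while (pos < input.size() && input[pos] == ' ')
      ++pos;
    Token tok;
    if (pos == input.size()) {
      tok.suspend = incremental; // still an eof token, just flagged
      return tok;
    }
    std::size_t start = pos;
    while (pos < input.size() && input[pos] != ' ')
      ++pos;
    tok.kind = TokKind::word;
    tok.text = input.substr(start, pos - start);
    return tok;
  }

  void tearDown() {
    assert(!tornDown && "tearDown() called twice");
    input.clear();
    pos = 0;
    tornDown = true;
  }
  bool isTornDown() const { return tornDown; }

private:
  std::string input;
  std::size_t pos = 0;
  bool incremental;
  bool tornDown = false;
};

// An ordinary parsing loop: it tests for eof exactly as before and never
// asks "eof or suspend?". Only the single teardown site consults the flag.
inline int parseUntilEof(ToyLexer &lexer) {
  int words = 0;
  Token tok = lexer.lex();
  while (tok.kind != TokKind::eof) {
    ++words;
    tok = lexer.lex();
  }
  if (!tok.suspend)
    lexer.tearDown();
  return words;
}
```

With an incremental lexer, calling `parseUntilEof` after each `append` leaves the lexer intact between calls. A non-incremental one is torn down at the first eof, as it is today.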

> If this is the right approach, what would be the best way to represent the 'suspend'
> token, i.e. which ASCII character would trigger the suspension? (I'd really like to use
> "$", but it is already part of an extension.)

I suggest looking at how the code-completion token is created. It uses \0 + a file offset to distinguish between a \0 at the end of the buffer and an embedded \0 that is the code completion point.
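
A rough illustration of that disambiguation, assuming the special position is recorded as a byte offset into the buffer. `classifyNul`, `NulKind` and `markerOffset` are hypothetical names, not clang's API. The sketch only shows how a \0 can be told apart by where it sits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; this is not clang's lexer code.
enum class NulKind { EndOfBuffer, Marker, StrayNul };

// Decide what a '\0' found at `offset` means. `size` is the number of bytes
// of real content, so the terminating '\0' sits at offset == size.
inline NulKind classifyNul(const char *buffer, std::size_t size,
                           std::size_t offset, std::size_t markerOffset) {
  assert(offset <= size && buffer[offset] == '\0');
  if (offset == size)
    return NulKind::EndOfBuffer; // the terminator every buffer carries
  if (offset == markerOffset)
    return NulKind::Marker; // embedded '\0' at the recorded offset
  return NulKind::StrayNul; // any other embedded '\0'
}
```

The same byte value therefore produces three different outcomes, and the choice depends only on the offset at which it was found.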

- Doug
