This small patch change comments handler in a simple way to permit to
implement quite easily comment translators.
Once applied this patch, a CommentHandler is allowed to build a first
token to be returned to Lexer and to push a TokenStream for the
others,
then allowing generic comment -> tokens transformer.
This can be useful to transform comment shaped program annotation that
should be translated to source code and also other interesting
applications.
This is an interesting approach. The only major concern I have is
that this only allows you to translate comments into exactly one
token. In the case of openmp pragmas (for example) this doesn't seem
Do I've been sufficient clear explaining that the comments may be
translated to an arbitrary number of tokens calling EnterTokenStream
inside the CommentHandler?
Yes, but I find the protocol for introducing tokens via a comment
handler to be very confusing. Could we instead eliminate the Token
&token argument, and just make the protocol: to "parse" the contents of
the comment, use EnterTokenStream and then return true?
The only proposal I see that would not change a lot of things is to
remove Token argument from CommentHandler and to specify that they
always shall use EnterTokenStream also when they need to insert only one
token.
Then at end of Preprocessor::HandleComment if any of the comment
handlers has informed the caller to have inserted some tokens the first
one available is fetched and returned by Preprocessor::HandleComment.
However the presence of argument for returned Token is perfectly
congruent with each other Handler.
Or, at the very least, the "token" argument should be named
"firstToken", to indicate that it is possible to inject other tokens. Of
course, HandleComment also needs documentation to describe what the
parameters and return value actually mean, and how comment handlers can
introduce tokens into the stream.
Yes, I will do all that before to submit final version of the patch and
once we agreed on the choosen approach.
rich enough. A different approach would be to allow the handler to
push an arbitrary number of tokens into the parser's lookahead
buffer. Would this work for what you're trying to do?
Yes, but perhaps this is not needed: as I wrote the CommentHandler could
return a first token *and* produce the other tokens to be read and push
them to lexer stack using EnterTokenStream.
I've already tried this with success in a sample implementation that
simply lex the comment content without modify it:
Still I've not got any feedback: do you think that the patch in original
mail will be applied as is? Should I improve it in some way?
I think it's okay if the HandleComment protocol can be simplified a bit
and if it is documented, although I'd like to hear from Chris. I'd feel
much better if we actually had some kind of use of this code path within
Clang itself. For example, would it be possible for the keep-comments
mode to be implemented outside of the lexer using your changes to
HandleComment? That might actually simplify the lexer while making it
more general.
... hmmm, probably it's feasible, but I'm not so sure it's a good idea
because this means we should always have a CommentHandler loaded and we
have the problem of CommentHandler execution order, as only the first
CommentHandler that want to return a first Token is allowed to do that
(unless we implement the variant above).
With proposed patch, instead, the CommentHandler that want to transform
comments has always the priority over keep-comments mode.
That apart I'd like to proceed by steps, what do you think about a
resubmittal of proposed patch with Token argument renamed in FirstToken
and proper documentation of HandleComment protocol?
Or you prefer I remove Token argument as described in the variant above
although it's a bit less efficient for returning a single token?