changing the lexer or parser


I’m working on a tool that adds time constructs to C. I implemented this tool by adapting Clang.

In these time constructs I would like to allow arguments like “1000s”, “1000 s”, “100ms”, “100 ms”, etc.

The lexer creates a single numeric_constant token if the argument is “1000s”, even if ‘s’ is added as a keyword or token in TokenKinds.def. I had hoped the lexer would generate two tokens: one numeric_constant and an identifier (or a self-defined token).

What is the best way to allow this kind of argument? Do I have to create a new token that matches some digits followed by an ‘s’?

Thanks for any help,

Bas Burgers

Hi Bas,

Clang currently lexes a C numeric constant as a single token that includes any trailing suffix (see the rules for preprocessing numbers in C99 for more details). That is why “1000s” comes out as one token.

Sema::ActOnNumericConstant() is then responsible for determining the type of the constant (integer or floating) and its size.

I haven’t thought about adapting clang’s lexer to generate tokens that don’t conform to C.

That said, you could simply examine the “suffix” by hand (without fiddling with the lexer directly).
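
To make the "examine the suffix by hand" idea concrete, here is a minimal standalone sketch. It does not use any Clang API: it assumes you have already obtained the spelling of the numeric token as a string, and the names TimeLiteral and parseTimeLiteral are hypothetical, invented for this illustration. It also assumes only the units "s" and "ms" are wanted and ignores overflow for very long digit strings.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type (not part of Clang): a parsed time argument.
struct TimeLiteral {
    std::uint64_t value;   // numeric part as written, e.g. 1000
    std::string unit;      // suffix as written: "s" or "ms"
    std::uint64_t millis;  // value normalised to milliseconds
};

// Hypothetical helper: splits the spelling of a numeric token such as
// "1000s" or "100ms" into its digits and its unit suffix.
// Returns std::nullopt if there are no leading digits or the suffix
// is not one of the assumed units.
std::optional<TimeLiteral> parseTimeLiteral(std::string_view spelling) {
    std::size_t i = 0;
    std::uint64_t value = 0;

    // Consume the leading decimal digits (overflow is not checked here).
    while (i < spelling.size() &&
           std::isdigit(static_cast<unsigned char>(spelling[i]))) {
        value = value * 10 + static_cast<std::uint64_t>(spelling[i] - '0');
        ++i;
    }
    if (i == 0) {
        return std::nullopt;  // no digits at all
    }

    // Whatever follows the digits is the suffix the lexer kept in the token.
    std::string_view suffix = spelling.substr(i);
    if (suffix == "s") {
        return TimeLiteral{value, "s", value * 1000};
    }
    if (suffix == "ms") {
        return TimeLiteral{value, "ms", value};
    }
    return std::nullopt;  // plain number or unknown unit
}
```

With this sketch, parseTimeLiteral("1000s") yields a value of 1000 with unit "s", and parseTimeLiteral("100ms") yields 100 with unit "ms". The spaced forms such as “1000 s” should already arrive as two tokens (a number followed by an identifier), so they would have to be handled where the construct is parsed rather than in a helper like this one.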