-fspell-checking causing modifications to AST?

Hey,

I'm a bit puzzled by the following behavior of clang when inspecting the AST
for the following code snippet:

test.cpp:
char foo;
char bar = foo1;

$ clang++ -cc1 -ast-dump test.cpp:
main.cpp:2:12: error: use of undeclared identifier 'foo1'; did you mean 'foo'?
(...)
`-VarDecl 0xa19cb0 <line:2:1, col:12> col:6 bar 'char' cinit
  `-ImplicitCastExpr 0xa19d60 <col:12> 'char' <LValueToRValue>
    `-DeclRefExpr 0xa19d38 <col:12> 'char' lvalue Var 0xa19c40 'foo' 'char'

So, Clang seems to interpret 'foo1' as a typo 'foo'. This is still fine.
However, Clang also "fixes up the code" and pretends that 'foo1' *is* 'foo'
which -ast-dump shows (see last line of the dump). This is odd.

Obviously, this is only the case if -fno-spell-checking is *not* passed to
clang++. With -fno-spell-checking I get this instead:

$ clang++ -cc1 -ast-dump test.cpp -fno-spell-checking
(...)
`-VarDecl 0x1a79ca0 <line:2:1, col:6> col:6 bar 'char'

This looks fine to me.

So, my question: Is this intended behavior? Is this for error recovery in the
parser?
If true, is there a way to both enable spell-checking but to *disable* it
touching the AST?

Reminder: I'm working on integrating Clang (read: libclang) in KDevelop, hence
I'm looking at this from an development tool perspective. Magic behavior
inside Clang is somewhat undesirable there :slight_smile:

Greets

Hey,

I'm a bit puzzled by the following behavior of clang when inspecting the
AST
for the following code snippet:

test.cpp:
char foo;
char bar = foo1;

$ clang++ -cc1 -ast-dump test.cpp:
main.cpp:2:12: error: use of undeclared identifier 'foo1'; did you mean
'foo'?
(...)
`-VarDecl 0xa19cb0 <line:2:1, col:12> col:6 bar 'char' cinit
  `-ImplicitCastExpr 0xa19d60 <col:12> 'char' <LValueToRValue>
    `-DeclRefExpr 0xa19d38 <col:12> 'char' lvalue Var 0xa19c40 'foo' 'char'

So, Clang seems to interpret 'foo1' as a typo 'foo'. This is still fine.
However, Clang also "fixes up the code" and pretends that 'foo1' *is* 'foo'
which -ast-dump shows (see last line of the dump). This is odd.

Obviously, this is only the case if -fno-spell-checking is *not* passed to
clang++. With -fno-spell-checking I get this instead:

$ clang++ -cc1 -ast-dump test.cpp -fno-spell-checking
(...)
`-VarDecl 0x1a79ca0 <line:2:1, col:6> col:6 bar 'char'

This looks fine to me.

So, my question: Is this intended behavior? Is this for error recovery in
the
parser?

Yes, this is intended. It is our policy that whenever we emit an error
message with a fix-it hint, we recover as if that fix-it hint had been
applied to the AST. This is not limited to typo-correction; it happens
whenever we emit an error with a fix-it.

If true, is there a way to both enable spell-checking but to *disable* it

touching the AST?

Not currently, no. I think what you're really asking for is for us to turn
off all error recovery, so the AST contains only things that were in the
source code, and omits erroneous things rather than trying to recover from
errors? We could try to support such a mode, but I suspect that it would
degrade the diagnostic experience enough that you may want to run Clang
twice: once in this mode, and once to produce user-facing diagnostics.
Also, I don't expect that you'll find volunteers in the Clang community to
do the work, but if the design and rationale are reasonable and you can
provide a convincing argument that the mode will have sufficient ongoing
support work, we'd accept patches to implement this.

Reminder: I'm working on integrating Clang (read: libclang) in KDevelop,

hence
I'm looking at this from an development tool perspective. Magic behavior
inside Clang is somewhat undesirable there :slight_smile:

Can you explain a bit more about your use cases? I assume you want to
provide some kind of AST introspection on not-necessarily-valid code, and
you don't want to expose our typo-correction results in that? Depending on
what you're trying to achieve, there might be lighter-weight approaches
(for instance, checking if the token at the given location actually refers
to the right name).

> Hey,
>
> I'm a bit puzzled by the following behavior of clang when inspecting the
> AST
> for the following code snippet:
>
> test.cpp:
> char foo;
> char bar = foo1;
>
> $ clang++ -cc1 -ast-dump test.cpp:
> main.cpp:2:12: error: use of undeclared identifier 'foo1'; did you mean
> 'foo'?
> (...)
> `-VarDecl 0xa19cb0 <line:2:1, col:12> col:6 bar 'char' cinit
>
> `-ImplicitCastExpr 0xa19d60 <col:12> 'char' <LValueToRValue>
>
> `-DeclRefExpr 0xa19d38 <col:12> 'char' lvalue Var 0xa19c40 'foo'
> 'char'
>
> So, Clang seems to interpret 'foo1' as a typo 'foo'. This is still fine.
> However, Clang also "fixes up the code" and pretends that 'foo1' *is*
> 'foo'
> which -ast-dump shows (see last line of the dump). This is odd.
>
> Obviously, this is only the case if -fno-spell-checking is *not* passed to
> clang++. With -fno-spell-checking I get this instead:
>
> $ clang++ -cc1 -ast-dump test.cpp -fno-spell-checking
> (...)
> `-VarDecl 0x1a79ca0 <line:2:1, col:6> col:6 bar 'char'
>
> This looks fine to me.
>
> So, my question: Is this intended behavior? Is this for error recovery in
> the
> parser?

Yes, this is intended. It is our policy that whenever we emit an error
message with a fix-it hint, we recover as if that fix-it hint had been
applied to the AST. This is not limited to typo-correction; it happens
whenever we emit an error with a fix-it.

If true, is there a way to both enable spell-checking but to *disable* it

> touching the AST?

Not currently, no. I think what you're really asking for is for us to turn
off all error recovery, so the AST contains only things that were in the
source code, and omits erroneous things rather than trying to recover from
errors?

Yes. That sounds interesting.

We could try to support such a mode, but I suspect that it would
degrade the diagnostic experience enough that you may want to run Clang
twice: once in this mode, and once to produce user-facing diagnostics.

I fear the performance impacts here. Running it twice over a possibly large TU
won't be possible for us.

Also, I don't expect that you'll find volunteers in the Clang community to
do the work, but if the design and rationale are reasonable and you can
provide a convincing argument that the mode will have sufficient ongoing
support work, we'd accept patches to implement this.

Well appreciated.

Reminder: I'm working on integrating Clang (read: libclang) in KDevelop,

> hence
> I'm looking at this from an development tool perspective. Magic behavior
> inside Clang is somewhat undesirable there :slight_smile:

Can you explain a bit more about your use cases? I assume you want to
provide some kind of AST introspection on not-necessarily-valid code, and
you don't want to expose our typo-correction results in that? Depending on
what you're trying to achieve, there might be lighter-weight approaches
(for instance, checking if the token at the given location actually refers
to the right name).

Okay. You're right in a sense that we want to support introspecting half-
broken code in our IDE.

Suppose the following code snippet:

  class Foo {
    void bar();
  };

  void Foo::bar1() {}

Now when parsing this we'd like to have the following state in our code model:
- Be aware that 'bar1' might be a typo
- Be aware that this decl is named "Foo::bar1" right now, not "Foo::bar"

Regarding playing around with tokens to extract the identifier: This will
likely slow down the parser dramatically, because this would have to be done
for every declaration. After all, I cannot decide whether the state of an
individual declaration in the AST is actually "sane" when spell-checking is
enabled.

Does that make sense to you?

> > Hey,
> >
> > I'm a bit puzzled by the following behavior of clang when inspecting
the
> > AST
> > for the following code snippet:
> >
> > test.cpp:
> > char foo;
> > char bar = foo1;
> >
> > $ clang++ -cc1 -ast-dump test.cpp:
> > main.cpp:2:12: error: use of undeclared identifier 'foo1'; did you mean
> > 'foo'?
> > (...)
> > `-VarDecl 0xa19cb0 <line:2:1, col:12> col:6 bar 'char' cinit
> >
> > `-ImplicitCastExpr 0xa19d60 <col:12> 'char' <LValueToRValue>
> >
> > `-DeclRefExpr 0xa19d38 <col:12> 'char' lvalue Var 0xa19c40 'foo'
> > 'char'
> >
> > So, Clang seems to interpret 'foo1' as a typo 'foo'. This is still
fine.
> > However, Clang also "fixes up the code" and pretends that 'foo1' *is*
> > 'foo'
> > which -ast-dump shows (see last line of the dump). This is odd.
> >
> > Obviously, this is only the case if -fno-spell-checking is *not*
passed to
> > clang++. With -fno-spell-checking I get this instead:
> >
> > $ clang++ -cc1 -ast-dump test.cpp -fno-spell-checking
> > (...)
> > `-VarDecl 0x1a79ca0 <line:2:1, col:6> col:6 bar 'char'
> >
> > This looks fine to me.
> >
> > So, my question: Is this intended behavior? Is this for error recovery
in
> > the
> > parser?
>
> Yes, this is intended. It is our policy that whenever we emit an error
> message with a fix-it hint, we recover as if that fix-it hint had been
> applied to the AST. This is not limited to typo-correction; it happens
> whenever we emit an error with a fix-it.
>
> If true, is there a way to both enable spell-checking but to *disable* it
>
> > touching the AST?
>
> Not currently, no. I think what you're really asking for is for us to
turn
> off all error recovery, so the AST contains only things that were in the
> source code, and omits erroneous things rather than trying to recover
from
> errors?

Yes. That sounds interesting.

> We could try to support such a mode, but I suspect that it would
> degrade the diagnostic experience enough that you may want to run Clang
> twice: once in this mode, and once to produce user-facing diagnostics.

I fear the performance impacts here. Running it twice over a possibly
large TU
won't be possible for us.

> Also, I don't expect that you'll find volunteers in the Clang community
to
> do the work, but if the design and rationale are reasonable and you can
> provide a convincing argument that the mode will have sufficient ongoing
> support work, we'd accept patches to implement this.

Well appreciated.

> Reminder: I'm working on integrating Clang (read: libclang) in KDevelop,
>
> > hence
> > I'm looking at this from an development tool perspective. Magic
behavior
> > inside Clang is somewhat undesirable there :slight_smile:
>
> Can you explain a bit more about your use cases? I assume you want to
> provide some kind of AST introspection on not-necessarily-valid code, and
> you don't want to expose our typo-correction results in that? Depending
on
> what you're trying to achieve, there might be lighter-weight approaches
> (for instance, checking if the token at the given location actually
refers
> to the right name).

Okay. You're right in a sense that we want to support introspecting half-
broken code in our IDE.

Suppose the following code snippet:

  class Foo {
    void bar();
  };

  void Foo::bar1() {}

Now when parsing this we'd like to have the following state in our code
model:
- Be aware that 'bar1' might be a typo
- Be aware that this decl is named "Foo::bar1" right now, not "Foo::bar"

OK. If we had a mechanism you could ask "was this cursor's name
typo-corrected?", would that suffice?

Regarding playing around with tokens to extract the identifier: This will

likely slow down the parser dramatically, because this would have to be
done
for every declaration.

I don't know about "dramatically"; Clang can answer this sort of query
quite efficiently.

After all, I cannot decide whether the state of an

individual declaration in the AST is actually "sane" when spell-checking is
enabled.

Does that make sense to you?

I think so. One more question: in your example above, do you care whether
we believe that the two functions are redeclarations of the same entity?
(Do you want to provide 'add a declaration of this to the class'-style
functionality, for instance?)

Suppose we could typo-correct this:

struct X {
  void f() const;
};
void X::f() {}

... to insert the missing 'const'. Would that be a problem for you too?