Ranges for diagnostics

Hey there,

integrating clang into KDevelop we noticed that some diagnostics are lacking
ranges which are in our opinion useful to have. Take this example:

Is there a reason why some diagnostics only have a cursor pointing to the
beginning of a token, while others are underlined properly?

I think the reason is that we don't have a policy on this, so in each
separate case the decision was made by the developer implementing that
diagnostic. Do you have any suggestions for consistent guidelines on
this?

Would patches be accepted to add ranges to these diagnostics? Where would I
have to look in the codebase?

This should not be complex. Given the diagnostic text, find the
diagnostic name in
llvm/tools/clang/include/clang/Basic/Diagnostic*.td. Then, grep for
the diagnostic name in sources and add the source range to diag()
call.

Dmitri

Hey there,

integrating clang into KDevelop we noticed that some diagnostics are
lacking
ranges which are in our opinion useful to have. Take this example:

~~~~~~~
class foobar {};
class foobar {};

void foo(double);
void foo(float);

int main()
{
  foo(0);

  "123" + 1;
  return 0;
}
~~~~~~~

If you compile this with clang++ you'll get:

~~~~~~~
test.cpp:2:7: error: redefinition of 'foobar'
class foobar {};
      ^
test.cpp:1:7: note: previous definition is here
class foobar {};
      ^
test.cpp:9:3: error: call to 'foo' is ambiguous
  foo(0);
  ^~~
test.cpp:4:6: note: candidate function
void foo(double);
     ^
test.cpp:5:6: note: candidate function
void foo(float);
     ^
test.cpp:11:9: warning: expression result unused [-Wunused-value]
  "123" + 1;
  ~~~~~ ^ ~
1 warning and 2 errors generated.
~~~~~~~

Is there a reason why some diagnostics only have a cursor pointing to the
beginning of a token, while others are underlined properly?

This isn't a question of "properly" or not - you'll notice that the things
underlined could actually, in other contexts, be more than one token (eg:
(1 + 2) + "foo" the LHS is (1 + 2) is underlined, and in "foo(0)" it's
possible that 'foo' could be a more complex expression involving a fully
qualified function name, decltype, etc). I suppose arguably the 'foobar'
example could involve a fully qualified (nested) name too.

For the child diagnostics pointing to the candidate function it's fine to
only
get a cursor. But for the redefinition of 'foobar' I'd expect a range.
Similar
for diagnostics related to not-found #include statements.

Would patches be accepted to add ranges to these diagnostics? Where would I
have to look in the codebase?

If we decide that's the right direction (that all diagnostic locations
should highlight the entire token, rather than just use a caret to point to
the start) we should probably just fix the diagnostic printer, rather than
fixing every diagnostic call site.

> Hey there,
>
> integrating clang into KDevelop we noticed that some diagnostics are
> lacking
> ranges which are in our opinion useful to have. Take this example:
>
> ~~~~~~~
> class foobar {};
> class foobar {};
>
> void foo(double);
> void foo(float);
>
> int main()
> {
>
> foo(0);
>
> "123" + 1;
> return 0;
>
> }
> ~~~~~~~
>
> If you compile this with clang++ you'll get:
>
> ~~~~~~~
> test.cpp:2:7: error: redefinition of 'foobar'
> class foobar {};
>
> ^
>
> test.cpp:1:7: note: previous definition is here
> class foobar {};
>
> ^
>
> test.cpp:9:3: error: call to 'foo' is ambiguous
>
> foo(0);
> ^~~
>
> test.cpp:4:6: note: candidate function
> void foo(double);
>
> ^
>
> test.cpp:5:6: note: candidate function
> void foo(float);
>
> ^
>
> test.cpp:11:9: warning: expression result unused [-Wunused-value]
>
> "123" + 1;
> ~~~~~ ^ ~
>
> 1 warning and 2 errors generated.
> ~~~~~~~
>
> Is there a reason why some diagnostics only have a cursor pointing to the
> beginning of a token, while others are underlined properly?

This isn't a question of "properly" or not - you'll notice that the things
underlined could actually, in other contexts, be more than one token (eg:
(1 + 2) + "foo" the LHS is (1 + 2) is underlined, and in "foo(0)" it's
possible that 'foo' could be a more complex expression involving a fully
qualified function name, decltype, etc). I suppose arguably the 'foobar'
example could involve a fully qualified (nested) name too.

Right, properly is the wrong word.

To rephrase: I propose to add ranges to "top-level" diagnostics. In the
example above, that would mean adding a range to the "redifinition of
'foobar'.

> For the child diagnostics pointing to the candidate function it's fine to
> only
> get a cursor. But for the redefinition of 'foobar' I'd expect a range.
> Similar
> for diagnostics related to not-found #include statements.
>
> Would patches be accepted to add ranges to these diagnostics? Where would
> I
> have to look in the codebase?

If we decide that's the right direction (that all diagnostic locations
should highlight the entire token, rather than just use a caret to point to
the start) we should probably just fix the diagnostic printer, rather than
fixing every diagnostic call site.

Me as a consumer of the clang-c API also need to have that information. Thus
we'll need to store the ranges into the diagnostics anyways - no? The printers
should then just work™.

Or how would you implement this purely on the diagnostic printer side? How do
you get the token at a cursor position with the C-API? We tried
clang_Cursor_getSpellingNameRange but didn't get good results.

Thanks

> Is there a reason why some diagnostics only have a cursor pointing to the
> beginning of a token, while others are underlined properly?

I think the reason is that we don't have a policy on this, so in each
separate case the decision was made by the developer implementing that
diagnostic. Do you have any suggestions for consistent guidelines on
this?

My suggestion would be to always add ranges for relevant tokens to the "top-
level" diagnostics. I.e. to those tokens which the user wrote. Sub-level
diagnostics such as the candidate functions in the example above, don't need
the range I guess.

> Would patches be accepted to add ranges to these diagnostics? Where would
> I
> have to look in the codebase?

This should not be complex. Given the diagnostic text, find the
diagnostic name in
llvm/tools/clang/include/clang/Basic/Diagnostic*.td. Then, grep for
the diagnostic name in sources and add the source range to diag()
call.

Thank you thats helpful!

> > Hey there,
> >
> > integrating clang into KDevelop we noticed that some diagnostics are
> > lacking
> > ranges which are in our opinion useful to have. Take this example:
> >
> > ~~~~~~~
> > class foobar {};
> > class foobar {};
> >
> > void foo(double);
> > void foo(float);
> >
> > int main()
> > {
> >
> > foo(0);
> >
> > "123" + 1;
> > return 0;
> >
> > }
> > ~~~~~~~
> >
> > If you compile this with clang++ you'll get:
> >
> > ~~~~~~~
> > test.cpp:2:7: error: redefinition of 'foobar'
> > class foobar {};
> >
> > ^
> >
> > test.cpp:1:7: note: previous definition is here
> > class foobar {};
> >
> > ^
> >
> > test.cpp:9:3: error: call to 'foo' is ambiguous
> >
> > foo(0);
> > ^~~
> >
> > test.cpp:4:6: note: candidate function
> > void foo(double);
> >
> > ^
> >
> > test.cpp:5:6: note: candidate function
> > void foo(float);
> >
> > ^
> >
> > test.cpp:11:9: warning: expression result unused [-Wunused-value]
> >
> > "123" + 1;
> > ~~~~~ ^ ~
> >
> > 1 warning and 2 errors generated.
> > ~~~~~~~
> >
> > Is there a reason why some diagnostics only have a cursor pointing to
the
> > beginning of a token, while others are underlined properly?
>
> This isn't a question of "properly" or not - you'll notice that the
things
> underlined could actually, in other contexts, be more than one token (eg:
> (1 + 2) + "foo" the LHS is (1 + 2) is underlined, and in "foo(0)" it's
> possible that 'foo' could be a more complex expression involving a fully
> qualified function name, decltype, etc). I suppose arguably the 'foobar'
> example could involve a fully qualified (nested) name too.

Right, properly is the wrong word.

To rephrase: I propose to add ranges to "top-level" diagnostics. In the
example above, that would mean adding a range to the "redifinition of
'foobar'.

> > For the child diagnostics pointing to the candidate function it's fine
to
> > only
> > get a cursor. But for the redefinition of 'foobar' I'd expect a range.
> > Similar
> > for diagnostics related to not-found #include statements.
> >
> > Would patches be accepted to add ranges to these diagnostics? Where
would
> > I
> > have to look in the codebase?
>
> If we decide that's the right direction (that all diagnostic locations
> should highlight the entire token, rather than just use a caret to point
to
> the start) we should probably just fix the diagnostic printer, rather
than
> fixing every diagnostic call site.

Me as a consumer of the clang-c API also need to have that information.
Thus
we'll need to store the ranges into the diagnostics anyways - no? The
printers
should then just work™.

Or how would you implement this purely on the diagnostic printer side? How
do
you get the token at a cursor position with the C-API? We tried
clang_Cursor_getSpellingNameRange but didn't get good results.

I haven't used the C API, but if there's a limitation there that you can't
get the range of a token - we should just fix that.

That’s not the right thing to do; sometimes a single location is a character location that may point to the start of a token, and sometimes it’s a token location. That’s the trouble with using SourceLocation for both of those.

Jordan