[RFC][GSOC 2024] - Improve Clang Diagnostics

Clang Diagnostics play a critical role in the compiler by issuing warnings and errors to the programmer. Exceptional diagnostics can greatly enhance the user experience of the compiler and boost productivity. With 869 open issues tagged “clang-diagnostics,” there is ample room for improvement in this area.

Humble Ping to previous years mentors for clang-diagnostics, Can this be considered as a project for this year GSOC 2024. I have recently started contributing to LLVM/Clang { mainly clang only } and am quite interested about this.
Thank you
Yours sincerely

@AaronBallman , @Shivam , @erichkeane , @tbaeder

1 Like

I believe last time we did this we made the mistake of not having the diagnostics improvements proposed not well defined at the beginning of the process.

That diagnostics improvement is pretty open-ended, so I would prefer that we start this by identifying WHICH diagnostics are in greatest need of improvements, and which you believe you can handle in the short time that GSoC lasts.

Note that many of those clang-diagnostics tagged items are some pretty vague and not particularly actionable bug reports, so I’d like a well defined proposal before being willing to try this again.

Sure Sure, I will take an in-depth look and will get back to you soon.
In between I also request for your guidance to define the direction for this project.
Thank you

1 Like

Hello @erichkeane as you said diagnostics improvement is quite open ended,I realised it pretty soon as I was researching for diagnostics, Here are my some previous day’s observations and findings that I found interesting(some are new feature enhancements we can look or some are difference between GCC and clang)

  1. New feature to prevent Trojan Source attacks : https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12
    https://godbolt.org/z/x1srnYcjc

  2. New feature which will be in GCC14 looks quite interseting:


    https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=2023-Cauldron-diagnostics-talk.pdf

  3. https://godbolt.org/z/E9e5bz4d9 - Here fixit hints are not provided in C language howver when I tried them on C++ they are shown.

  4. New flag - dangling reference https://godbolt.org/z/xeofY1898

  5. https://godbolt.org/z/xEczMb8Md here - -Wredundant-move in GCC has been expanded to detect instances where a const object is being moved. For example, in the code provided, the use of std::move on a const object t is flagged as redundant. This is because the move operation wouldn’t be utilized due to the absence of a constructor T(const T&&) in the T struct. In such cases, the const qualifier prevents the use of T(T&&) constructor, making T(const T&) the preferred choice even with std::move

  6. GCC has introduced a new warning, -Wxor-used-as-pow , which aims to identify potentially erroneous usages of the exclusive OR operator ^ . This warning is triggered when the code contains expressions like 2^8 , where it’s more probable that the intention was to use a left shift operation 1 << 8 instead. To ensure the warning remains practical and minimizes false positives, it specifically targets cases where the first operand of the XOR operation is either the decimal constant 2 or 10.

So the point here I want to make is that I realised it very soon that its very vast open ended issue and as you said to identify which diagnostics are in greatest need of improvement ,I feel I may not be able to judge the priorities of the Diagnostics, and request your guidance in that matter, Further I wanted to ask whether this project will be in a manner where incremental improvements to clang diagnostics will be done in GSoc Time period if that would be case we can try for atleast 9-10 that would be good number considering 12 weeks time period, or this project would be in a manner to implement a single big feature which currently you may have in your mind or is needed currently?.

Also I request comments and suggestions from previous mentors as well.
Thank you

For the #1 there, we have trojan source detection, but we put it in tidy, since we don’t see it as something worth putting into clang itself.

All of those are perhaps interesting to start with. I do NOT think just ‘incremental’ improvements would be acceptable to me, we tried that last time and it wasn’t nearly as effective as we wanted it to be.

I think having ME pick a direction isn’t going to be very helpful as it is going to be very much down to where your capabilities and interests lie.

I would be ok with 1 of two approaches ( though not sure I’d be the best mentor for it):
1- Pick 1 big feature to implement
-or-
2- Identify a good sized collection of smaller things in advance (perhaps on a theme? But ‘misc’ interesting fixes is also sufficiently themed) and start with those.

Again, either of those things is going to be based on your interests and capabilities.

I am interested in direction to work on 1 big feature, with that being said the idea of adding ASCII art to clang diagnostics particularly excites me
image
, and am interested to work in that direction.

The difficult part of the ‘ascii art’ is that we don’t really have that diagnostic, so you’d likely have to identify ones where it makes sense. I don’t have any good ideas on one, but perhaps @AaronBallman or @cjdb has an idea for that?

I think we do have some existing diagnostics that could perhaps benefit from ASCII art. For example, diagnostics related to the order tokens are parsed in could have ASCII art showing how to rearrange things (e.g., Compiler Explorer could show ASCII art about how to rearrange everything after the parameter list), or when identifying cycles in things we could have ASCII art showing the cycle rather than notes (e.g., Compiler Explorer), etc. But I agree, the trick will be finding a compelling diagnostic that already exists and seeing if we can turn it into ASCII art that is more understandable than the current diagnostic behavior.

2 Likes

I haven’t thought about ASCII art in diagnostics before, but will be happy to review any explicit suggestions/contributions. The key thing to keep in mind is that the artwork needs to work with whatever the text is trying to communicate, not say something independent.

1 Like

Diagnostics about layout of a type might benefit from ASCII art, e.g. -Wpadded.

1 Like

Thank you everyone for great ideas about diagnostics which can be improved with ASCII art, I am currently making list of all diagnostics that I think may benefit from ASCII art (including above suggested ones) and will get back soon with the list for review.

Identified two groups that may mostly benefit from ASCII art leading to a better diagnostic overall.

Cycle related warning / errors :

struct S {
  explicit S(int i) : S((float)i) {}
  explicit S(float f) : S((int)f) {}
};

class Foo {
  Mutex mu1 ACQUIRED_AFTER(mu3);     // expected-warning {{Cycle in acquired_before/after dependencies, starting with 'mu1'}}
  Mutex mu2 ACQUIRED_AFTER(mu1);     // expected-warning {{Cycle in acquired_before/after dependencies, starting with 'mu2'}}
  Mutex mu3 ACQUIRED_AFTER(mu2);     // expected-warning {{Cycle in acquired_before/after dependencies, starting with 'mu3'}}
};

  • Warn_arc_retain_cycle {here cycle can be shown}
  • Err_default_member_initializer_cycle {here cycle can be shown}
  • Err_exception_spec_cycle {here cycle can be shown}
  • warn_infinite_recursive_function {here cycle can be shown}
  • warn_template_qualified_friend_unsupported here nesting can be shown
  • warn_template_qualified_friend_ignored here we can show on whom who is dependent

Bits related error / warning :

  • warn_impcast_bitfield_precision_constant
  struct { int i5 : 5; } a;

  a.i5 = 36; // expected-warning {{implicit truncation from 'int' to bit-field changes value from 36 to 4}}
}

Here it can be shown how last 5 bits resulting in getting converted to 4 will lead to more clarity.

  • Wsingle-bit-bitfield-constant-conversion here implicit truncation from C to a one-bit wide bit-field can be shown
  • warn_shift_result_sets_sign_bit here how the signed bit is getting set can be shown leading to more clarity
  • warn_impcast_high_order_zero_bits here can be shown which higher bits like in case of int to long long the bits from 32 to 64 will be set to zero.
  • def warn_strncat_large_size overflow of buffer related can be shown how many bits/bytes are overflowing?

Layout / ordering related warning :

Requesting comments & suggestions from @AaronBallman , @cjdb , @erichkeane , @Endill .
Thank you

I just tried some basic code to print ASCII art , given string and making a cycle for below code:

struct S {
  explicit S(int i) : S((float)i) {}
  explicit S(float f) : S((int)f) {}
};

my code :


class ASCIINode {
private:
    std::string content;
    int x;
    int y;
public:
    vector<int> link[8]; // link points for box
    // mat is the Cartesian plane
    ASCIINode(const std::string& str, int x, int y, vector<vector<char>> &mat) : content(str), x(x), y(y) {
        int length = str.length();
        int width = length; 
        int half = (width / 2) + 2;
        for (int i = x - half; i <= x + half; ++i) {
            mat[y - 1][i] = '-';
            mat[y + 1][i] = '-';
        }
        for (int i = y - 1; i <= y + 1; ++i) {
            mat[i][x - half] = '|';
            mat[i][x + half] = '|';
        }

        int text_start = x - (length / 2);
        for (int i = 0; i < length; ++i) {
            mat[y][text_start + i] = str[i];
        }
        for (int i = 1; i <= 8; ++i) { 
            int cur_x = x;
            int cur_y = y;
            switch (i) {
                case 1: cur_x -= half; cur_y -= 1; link[i-1] = {cur_x,cur_y}; break;
                case 2: cur_y -= 1; link[i-1] = {cur_x,cur_y};break;
                case 3: cur_x += half; cur_y -= 1; link[i-1] = {cur_x,cur_y};break;
                case 4: cur_x -= half; link[i-1] = {cur_x,cur_y};break;
                case 5: cur_x += half; link[i-1] = {cur_x,cur_y};break;
                case 6: cur_x -= half; cur_y += 1; link[i-1] = {cur_x,cur_y};break;
                case 7: cur_y += 1; link[i-1] = {cur_x,cur_y};break;
                case 8: cur_x += half; cur_y += 1; link[i-1] = {cur_x,cur_y};break;
            }
            mat[cur_y][cur_x] = '+'; 
        }
    }
};


void print(vector<vector<char>> mat){
    for(int i = 0;i<mat.size();i++){
        for(int j = 0;j<mat[i].size();j++){
            cout<<mat[i][j];
        }
        cout<<"\n";
    }
}
// just a simple line joing func can be more generalised
void joinline(vector<int> p1,vector<int> p2,vector<vector<char>> &mat){
    int level = p1[1];
    for(int i = min(p1[0],p2[0]);i<=max(p1[0],p2[0]);i++){
        mat[level][i] = '-';    
    } 
    int dest = (min(p1[0],p2[0]) == p1[0])?max(p1[0],p2[0]):min(p1[0],p2[0]);
    for(int i = min(p1[1],p2[1]);i<=max(p1[1],p2[1]);i++){
        mat[i][dest] = '|';
    }

}
void drawcycle2(string s1,string s2){
    vector<vector<char>>  mat(10,vector<char>(40,' ')); // defining cartesion plane
    int x1 = (s1.size()/2) 
    ASCIINode a1(s1,x1,3,mat); 
    int x2 = 40 -x1;
    ASCIINode a2(s2,x2,8,mat);
    joinline(a1.link[4],a2.link[1],mat);
    joinline(a2.link[3],a1.link[6],mat);
    print(mat);

}
drawcycle(S(int i), S(float f))

Output :
Screenshot from 2024-02-19 20-22-29

just a simple execution It can be heavily generalised.
Am I headed in correct direction. Requesting guidance.
Thank you

Those do seem like quite good candidates, I think that if we were able to get all/most of those, it would be a nice improvement. I’m not sure who the best folks to be mentor (last time I was a technical mentor and someone else actually managed the GSoC parts of it), but I can help out technically. There are perhaps others (@cjdb ) who are more familiar with diagnostics who might be able to help.

One thing: This ascii art should be disable-able via a flag, and probably disabled by default at least at first. That said, I think it might be interesting to see what comes out of it.

1 Like

I think that direction is looking pretty good! One thing that would perhaps help the ASCII art is to use “arrows” to help clarify the direction of the graph (e.g., ->, <-, etc) as in the earlier screenshots.

In terms of the diagnostics you’re looking at, those seem like reasonable candidates to me. Can you give an example of what you have in mind for ASCII art for the diagnostics related to bit widths? I’m having a bit of a hard time imagining how “implicit truncation from ‘int’ to bit-field changes value from 36 to 4” will look in terms of ASCII art.

1 Like

-Wanalyzer-out-of-bounds

Worth noting that this warning is produced by gcc -fanalyzer, a symbolic execution based analysis engine similar to clang --analyze (https://clang-analyzer.llvm.org). It is quite different from warnings produced in the process of normal compilation as it requires a completely different analysis technology. In Clang you cannot even enable --analyze together with normal compilation, it has to be a separate Clang invocation (though arguably historical reasons for that aren’t very good).

In any case, such symbolic analysis produces warnings in the form of full “execution paths” that need to be presented to the user step-by-step: “if you pass these input values then first this happens, then this happens, then this happens, then you have something bad happen”.

The Clang analyzer reports these issues relatively poorly on the command line (it simply presents each step as a note:) but it mostly focuses on either a good IDE integration or a web interface such as scan-build or CodeChecker. The GCC analyzer seems to be focusing on good command-line experience from the start and it’s much fancier than the clang’s --analyzer-output text but I’m not sure it’s actually good enough yet. In real-world applications (outside of “toy” examples) it’s probably still much easier to comprehend the warning if you see execution path steps overlayed on top of your actual code without skipping any intermediate steps or reordering it in the order of events.

It’s not unreasonable to try to implement an “actually good” text output mode for such tools and I definitely have some ideas for that. But I wouldn’t recommend making this your project because the niche for such use is relatively small. There are much better ways to consume the output of these tools no matter how minimalistic your setup is.


The array bounds diagram is great though. It doesn’t have anything to do with symbolic analysis and I’m sure some compiler warnings could really benefit from a facility of this kind.

So generally I really like where this is going!

1 Like

Thank you, @NoQ, for your insightful response on the differences between Clang and GCC’s analysis tools, and for sharing your thoughts on the potential of ASCII art in diagnostics.

Your feedback provides a lot to consider, particularly regarding the niche nature of improving text output modes for tools like symbolic analysis. It’s helpful to have this perspective as I refine my project proposal.

I noticed you mentioned both not recommending this project due to its relatively small niche and expressing that you really like where this is going. This leaves me a bit puzzled about whether you are suggesting to continue with this project or not?. Your guidance would be greatly appreciated as I feel highly interested to work in this direction.

Additionally, you mentioned having some ideas for improving text output modes. I’m intrigued by this and would love to hear more about those ideas if you’re willing to share them.

Once again, thank you for your valuable insights and encouragement.
Thank you

Yeah, no, I mean, good text diagnostics for the static analyzer are less valuable than good text diagnostics for traditional compiler warnings. They’re still nice to have but the static analyzer itself has much fewer users (being treated as largely a separate tool), and most of the users of such tools are already using a better visualizer than anything that can ever be built on the command line.

So my point is, I think you should focus on individual traditional compiler warnings and their individual needs. Which may be more varied in requirements, but also more immediately useful to the larger user base. This isn’t what I have immediate ideas about, you probably already have more specific ideas than me. But I’m definitely generally excited about developing better ways to communicate with our users and making warnings more visual, more friendly. Many warnings are too nuanced to explain in a single statement of text. Attaching a picture, even if it’s just ASCII art is just so powerful. If you build anything that could be useful for a broader class of static analysis tools I’d be happy to integrate your solution into those tools, but it probably shouldn’t be a requirement on your project, or even your primary source of inspiration.

Another group of folks who may be interested in your work is the FlowSensitive team (cc @ymand!) - they’re building a framework for Data Flow analysis, for use in compiler proper as well as in static analysis tools.

In the area of flow-sensitive analysis I actually have a specific suggestion that I’m running around with: as these flow-sensitive warnings become smarter, it may be a good idea to display them to the user all at once. I.e., instead of displaying 3 separate warnings:

warning: variable ‘x’ is uninitialized when used here
warning: this statement is never executed
warning: this if-condition is always false

…it may be a good idea to emit a single combined warning:

warning: variable ‘x’ is uninitialized when used here
note: …BECAUSE the statement that was supposed to initialize ‘x’ is never executed
note: …BECAUSE the if-condition that guards that statement is always false

(Assuming that these three warnings are actually in this kind of cause-and-effect relationship.)

This allows the developers of such warnings to find much deeper bugs without worrying that they’ll be too hard to for the user to understand. (This isn’t necessarily a visualization problem. A system of notes may do a sufficiently good job, just like in the static analyzer text output. But it definitely calls for something more than a simple text message. It’s likely that displaying a portion of code as-is and overlaying all these warnings on top of it, connected by arrows, would be as beneficial here as it already is in static analyzer world.)

Another somewhat interesting area is, some static analysis diagnostics require non-trivial “constraint solving” (read, SAT/SMT solving). I don’t think any existing compiler warnings do anything close to that level of complexity, but FlowSensitive based warnings may potentially get pretty close to that, where even if they don’t ship a full-featured SAT solver they’d certainly look like they should have. Visualizing the system of constraints that are being fed to the solver, and maybe to some extent the solution (if the solver can explain it at all), could be valuable. It’s already a large-ish problem in the clang static analyzer world where we struggle to improve our ad-hoc constraint solver because any such improvements will unfortunately make diagnostics harder to understand. So this could be an interesting area to look into.

But again, I think any work done on basic compiler warnings is probably going to have a bigger impact on humanity’s well-being. So I think you can keep it in mind that the problems I describe exist, but don’t focus on them too much unless you independently discover them in your work on traditional compiler warnings.

2 Likes

Thanks Aaron for setting the direction & your support, for warning:

  • warn_impcast_bitfield_precision_constant I am thinking something along the lines of this:
    examples are of : 36 → 4, 36 ->36, 156->28; (like current version don’t tell how & why 36 is getting converted to 4 , with ASCII art its clear how the last 5 bits are getting converted to 4)

Working code

#include <iostream>
#include <string>
#include <vector>
#include <cmath>
using namespace std;

class ASCIIbits {
private:
    std::string content;
    int num;
    int focus;
    bool last;
    string asciirep;
    string binaryToDecimal(const std::string& binary) {
    int decimal = 0;
    int size = binary.size();
    for (int i = size - 1; i >= 0; --i) {
        if (binary[i] == '1') {
            decimal += pow(2, size - 1 - i);
        }
    }
    return to_string(decimal);
}
public:

    void ASCIIREP(string s,int x_start,int y_start,vector<vector<char>> &mat,int focus,string binary){
        int x = x_start;
        int y = y_start;
        for(int i = 0;i<s.size();i++){
            mat[x][y] = '|';y++;
            mat[x][y] = ' ';y++;
            mat[x][y] = s[i];y++;
            mat[x][y] = ' ';y++;
            // mat[x][y] = '|';y++;
        }
        mat[x][y] = '|';
        x+=1;

        mat[x][y] = '|';
        
        mat[x][y-((focus*4))] = '|';
        for(int i = y-((focus*4))+1;i<y;i++){
            mat[x][i] = '_';
        }
        y+=2;
        mat[x][y] = '=';y++;
        mat[x][y] = '>';y+=2;
        string temp = "(last "+to_string(focus)+" bits)";
        binary+=" "+temp;
        for(int i = 0;i<binary.size();i++){
            mat[x][y] = binary[i];y++;
        }


    }
    ASCIIbits(int num, int focus, vector<vector<char>> &mat,int x_start,int y_start) : last(last), num(num), focus(focus) {
    asciirep = "";
    if(focus == 30 || focus == 31 || focus == 32){

    }
    else{
        // show first two;
        for(int i = 0;i<2;i++){
            int temp = num;
            int bit = temp>>(31-i)&1;
            asciirep += to_string(bit);
        }
        for(int i = 0; i<4;i++){
            asciirep+=".";
        }
        string binary = "";
        for(int i = focus-1;i>=0;i--){
            int temp = num;
            int bit = (temp>>i)&1;
            asciirep += to_string(bit);
            binary += to_string(bit);
        }
        binary = binaryToDecimal(binary);
        ASCIIREP(asciirep,x_start,y_start,mat,focus,binary);
    }
        
}
};
void print(vector<vector<char>> mat){
    for(int i = 0;i<mat.size();i++){
        for(int j = 0;j<mat[i].size();j++){
            cout<<mat[i][j];
        }
        cout<<"\n";
    }
}
void helper(int num,int focus){
    vector<vector<char>>  mat(6,vector<char>(100,' ')); // defining cartesion plane
    ASCIIbits(num,focus,mat,2,2);
    print(mat);
}
int main() {
    helper(36,5);
    helper(36,6);
    helper(156,7);
    return 0;
}

Thank you, @NoQ, for your insightful suggestions. Focusing on individual traditional compiler warnings and addressing their specific needs resonates with current direction, given their potential impact on a larger user base.

I’m particularly intrigued by your ideas regarding flow-sensitive analysis and the possibility of displaying related warnings together. This approach could significantly enhance developers’ understanding and debugging capabilities. Additionally, the concept of visualizing constraint-solving processes for more complex diagnostics is fascinating and could be valuable in improving diagnostic comprehension.

I’ll definitely keep these suggestions in mind as I continue to develop my project proposal. Your guidance has been incredibly helpful.
Thank you
Yours sincerely