A few weeks ago I started a small side project with the goal of automatically generating documentation for all Clang compiler diagnostics. When I learned programming over two decades ago, I loved the MSVC compiler (compared to GCC) because every diagnostic there has a unique number and an associated article that documents its behavior. My initial idea was to also add a unique ID, combined with a wiki where users can help build up documentation that goes beyond what is available in Diagnostic flags in Clang (Clang 19.0.0git documentation).
So I looked into the diagnostic subsystem and was surprised that an ID already exists for (most) diagnostic messages. It is used internally within the compiler but is not exposed to the end user. Furthermore, the diagnostics can easily be extracted from the source code by looking at the TableGen-generated .inc files. I started to extract all the information I could get for each diagnostic (ID, message, category, groups, etc.), and after playing around a bit with ChatGPT, I thought that larger parts of the documentation could perhaps be extracted automatically from the source and combined with chatbot knowledge.
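To illustrate the extraction step, here is a minimal Python sketch, not my actual tooling. It assumes each generated line has the shape `DIAG(NAME, CLASS, SEVERITY, "MESSAGE", ...)`; the two sample lines are hand-written for illustration, and the trailing fields differ between Clang versions, so the sketch does not interpret them.

```python
import re

# Assumed layout: DIAG(NAME, CLASS_x, SEVERITY, "MESSAGE", ...more fields...)
# Only the first four fields are read; the rest vary between Clang versions.
DIAG_RE = re.compile(
    r'DIAG\(\s*(?P<name>\w+)\s*,'
    r'\s*(?P<cls>CLASS_\w+)\s*,'
    r'\s*(?P<severity>[^,]+?)\s*,'
    r'\s*"(?P<message>(?:[^"\\]|\\.)*)"'
)

# Hand-written sample lines (hypothetical diagnostics, for illustration only).
SAMPLE = r'''
DIAG(err_example_thing, CLASS_ERROR, (unsigned)diag::Severity::Error, "example %0 is not allowed here", 0, SFINAE_SubstitutionFailure, false, true, true, false, 0)
DIAG(warn_example_thing, CLASS_WARNING, (unsigned)diag::Severity::Warning, "example %0 is \"deprecated\"", 0, SFINAE_Suppress, false, false, true, false, 0)
'''


def parse_diag_lines(text):
    """Return one dict per DIAG(...) line; escapes in the message are kept as written."""
    records = []
    for line in text.splitlines():
        match = DIAG_RE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


records = parse_diag_lines(SAMPLE)
for record in records:
    print(record["name"], "|", record["cls"], "|", record["message"])
```

A regular expression is enough here because only the leading fields are needed; reading groups and categories would require handling the remaining fields for the specific Clang version.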
The first thing I tried was to generate small code snippets that trigger a diagnostic and could serve as examples in the documentation. So I wrote my first AI application: for a specific diagnostic, it asks ChatGPT via the API to generate code that would trigger it. I sent the generated source code to Compiler Explorer to get the stderr output and checked it for the particular diagnostic message. If the code did not trigger the diagnostic, I sent the stderr output back to ChatGPT, up to 5 times. With that approach I had a success rate of 30-40% (for C++-related diagnostics).
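The feedback loop can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the real application: `ask_model` and `compile_source` are hypothetical callables standing in for the ChatGPT API call and the Compiler Explorer request, the prompt wording is invented, and `message_to_regex` only handles simple placeholders such as `%0` and non-nested `%select{...}0`. I also assume here that "up to 5 times" means one initial attempt plus up to five feedback rounds.

```python
import re


def message_to_regex(template):
    """Turn a diagnostic format string into a loose regex.

    Simple placeholders (%0..%9 and non-nested %name{...}N) become wildcards;
    everything else must match literally.
    """
    parts = re.split(r'%\w*\{[^{}]*\}\d|%\d', template)
    return re.compile('.*'.join(re.escape(part) for part in parts))


def find_triggering_example(template, ask_model, compile_source, max_feedback_rounds=5):
    """Ask for code until the compiler's stderr contains the diagnostic.

    ask_model(prompt) -> source code (hypothetical wrapper around a chat API)
    compile_source(source) -> stderr text (hypothetical wrapper around a compiler service)
    Returns the first source that triggers the diagnostic, or None.
    """
    pattern = message_to_regex(template)
    prompt = f"Write a short C++ program that makes clang emit: {template}"
    for _ in range(max_feedback_rounds + 1):
        source = ask_model(prompt)
        stderr = compile_source(source)
        if pattern.search(stderr):
            return source
        # Feed the compiler output back so the next attempt can correct itself.
        prompt = (
            "That code did not trigger the diagnostic. "
            f"The compiler's stderr was:\n{stderr}\nPlease try again."
        )
    return None
```

Passing the two services in as functions keeps the loop testable without network access; in practice each would wrap an HTTP request.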
I soon found out that ChatGPT needed more knowledge about a diagnostic to achieve better results. Some diagnostic messages are very generic, and it is hard to know which compiler feature is involved or even which input language to use. So I extracted the parts of Clang's source code where the diagnostic is triggered, searched for the git commit message that first introduced the diagnostic, and looked for existing test cases that trigger it. With that information I could increase the success rate to 70-80% (for C++).
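As an illustration of the source-extraction part, the sketch below collects the lines around each use of a diagnostic ID. It relies on the fact that Clang's sources refer to diagnostics as `diag::<name>`; the function names, the choice of file extensions, and the amount of context are assumptions made for this example, and the commit and test-case lookups are not covered.

```python
import re
from pathlib import Path


def find_uses(diag_name, text, context=2):
    """Return (line_number, snippet) for each line that mentions diag::<diag_name>."""
    # The trailing \b keeps err_foo from also matching err_foo_bar.
    needle = re.compile(r'\bdiag::' + re.escape(diag_name) + r'\b')
    lines = text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if needle.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, "\n".join(lines[start:end])))
    return hits


def scan_tree(diag_name, root, context=2):
    """Apply find_uses to every .cpp/.h file below root; returns {path: hits}."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.suffix in {".cpp", ".h"} and path.is_file():
            hits = find_uses(diag_name, path.read_text(errors="ignore"), context)
            if hits:
                results[str(path)] = hits
    return results
```

The snippets returned this way can be pasted into the prompt so the model sees the condition under which the diagnostic is emitted.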
Once I had a substantial amount of information per diagnostic, I uploaded everything into a wiki and started to generate articles using Lua modules. I again used the ChatGPT API to generate a general description and an explanation of the code examples for each wiki article.
The first 100 articles can be found at Category:Clang Errors - emmtrix Wiki and Category:Clang Warnings - emmtrix Wiki.
Now I am looking for feedback about the articles. What do you think about the result? What is good/bad? What other information could be included?
The current results are based on Clang 17.0.6. I also plan to include information on how each diagnostic evolved across past versions: when it was introduced or removed, when its message changed, etc.