[RFC] Improve Clang crash reproducer support for modules

Hi all,

I've recently been working on ways to improve Clang's crash reproducer support
for modules and would like to share/discuss some ideas and issues I've found.

*** Context
For those not yet aware, clang currently produces a script and a
preprocessed source file whenever it crashes, which makes it a lot
easier to reproduce and therefore fix issues in the compiler.

When using modules (-fmodules) we also generate a cache directory containing:
(A) the PCM files from serialized module ASTs, (B) headers in a
directory structure resembling that of the original filesystem where
the crash happened and (C) a YAML file containing the map for these
headers (overlay filesystem). However, we are currently unable to
actually reproduce module issues because of limitations in the current
implementation.

I hacked together several small fixes to be able to successfully
reproduce a simple "ObjC hello world importing Foundation.h" crash
without relying on the real filesystem.

The changes are going to be split up in several patches and sent for
review but I've attached a pretty rough patch containing all the
changes I needed to get this working. This should give an approximate
idea (in case anyone is interested) of what I intend to do. Note that
this still needs some work to be upstreamed.

If anyone has trouble debugging module issues with the current reproducers or
has ideas to share, please chime in :)

*** Problems
Some of the issues found include:

- YAML files contain hardcoded paths in 'external-contents' entries, which
prevents clang from using the files under the VFS overlay in .cache/vfs.
My plan to fix this is to add a new option in the VFS YAML file called
'overlay-relative'. When it's equal to 'true' it means that the
directory of the YAML file passed through the -ivfsoverlay option should
also be used to prefix the final path for every 'external-contents'.

For example, given the invocation snippet "... -ivfsoverlay
file-HASH.cache/vfs/vfs.yaml"
and the following entry in the yaml file:

"overlay-relative": "true",
"roots": [
...
  "type": "directory",
  "name": "/usr/include",
  "contents": [
    {
      "type": "file",
      "name": "stdio.h",
      "external-contents": "/usr/include/stdio.h"
    },
...

Whenever there's a file manager request for "/usr/include/stdio.h", it
will map to "/<absolute_path_to>/file-HASH.cache/vfs/usr/include/stdio.h".
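
To make the intended mapping concrete, here is a minimal sketch in Python
(not the actual Clang code). The function name and the "/tmp" prefix are
hypothetical; "/tmp" only stands in for <absolute_path_to> above, and the
sketch assumes the YAML path handed to -ivfsoverlay is already absolute.

```python
import posixpath


def resolve_external_contents(overlay_yaml_path, external_contents, overlay_relative):
    """Return the path that would be read for one 'external-contents' entry.

    Hypothetical helper for illustration only; it mirrors the proposal,
    not Clang's implementation.
    """
    if not overlay_relative:
        # Current behaviour: the hardcoded path is used as-is.
        return external_contents
    # Directory that holds the YAML file given to -ivfsoverlay.
    overlay_dir = posixpath.dirname(overlay_yaml_path)
    # Plain concatenation on purpose: posixpath.join would discard
    # overlay_dir because external_contents is an absolute path.
    return posixpath.normpath(overlay_dir + "/" + external_contents.lstrip("/"))


# "/tmp" is an assumed stand-in for <absolute_path_to>.
mapped = resolve_external_contents(
    "/tmp/file-HASH.cache/vfs/vfs.yaml", "/usr/include/stdio.h", True
)
print(mapped)
```

With 'overlay-relative' off, the same call returns "/usr/include/stdio.h"
unchanged, which is the behaviour that currently sends lookups back to the
real filesystem.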

- Missing header files: some headers needed during module parsing are not
currently collected by the AST listeners.
- Symbolic links from the real filesystem don't have a counterpart in the
VFS file (neither by copying nor by generating multiple entries for one file).
- VFS path traversal code has bugs that prevent some files and
directories from being found; buggy/redundant empty "" directory entries
(used to denote current dir) and "." at the end of dir paths.
- Case sensitivity issues between the YAML entries and the real filesystem.
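
As an illustration of the path traversal item above, this is roughly the
kind of cleanup I have in mind for directory names before they are written
to or looked up in the YAML file. It is a Python sketch with a hypothetical
helper name, not the VFS code, and it deliberately leaves ".." components
alone.

```python
def clean_components(path):
    """Drop empty "" and "." components from a POSIX-style path.

    Hypothetical helper for illustration; ".." is left untouched.
    """
    parts = [p for p in path.split("/") if p not in ("", ".")]
    # Keep the path absolute if it started out absolute.
    prefix = "/" if path.startswith("/") else ""
    return prefix + "/".join(parts)


print(clean_components("/usr/include/."))   # trailing "." removed
print(clean_components("/usr//include/"))   # empty components removed
```

Both calls yield "/usr/include", so a lookup for either spelling would hit
the same directory entry.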

*** Limitations
- This isn't yet enough to allow reusing PCM files from .cache dir,
but it's enough to use the overlay to build new ones.
- More changes should come to make it work for larger / more complex projects.

-Bruno

crash-reproducer.patch (38.2 KB)