default include paths in the frontend

Hi,

I'm using the clang libraries in one of my applications
(called pet) to parse a C file, after which I extract
some information from the generated AST. Since I only
need the parser, I'm using the frontend directly.

The sources I need to parse may include system headers
and so clang should also look in system include paths.
This used to work fine, but in revision 143822, the
Linux header searching was moved from the frontend to
the driver, leaving InitHeaderSearch::AddDefaultIncludePaths
in the frontend to do nothing and system header files
no longer getting found.

How am I supposed to make the frontend look for system
header files now?

Thanks,

skimo

Hi,

Hey, sorry for not getting back to you sooner…

I’m using the clang libraries in one of my applications
(called pet) to parse a C file, after which I extract
some information from the generated AST. Since I only
need the parser, I’m using the frontend directly.

Ok… is there any reason you don’t use the driver at all? The driver is structured such that you can call into it as a library as well, and get it to reason about command line options.

The sources I need to parse may include system headers
and so clang should also look in system include paths.
This used to work fine, but in revision 143822, the
Linux header searching was moved from the frontend to
the driver, leaving InitHeaderSearch::AddDefaultIncludePaths
in the frontend to do nothing and system header files
no longer getting found.

Correct. The frontend should be in the business of parsing C++ code, not of inspecting your system to find various versions of libstdc++, etc. This was always a poor location for the logic… I didn’t worry much about announcing that I was actually moving the logic because frankly, it was so broken that I doubted many people used it.

For reference, we use Clang in extremely similar ways, and have many tools in this space. However, the header search logic never really worked correctly in the frontend, so we have for a long time been rolling our own.

How am I supposed to make the frontend look for system
header files now?

You can’t directly make the frontend do this, but you can always use the driver to build up the arguments for your compiler invocation. Look at Driver::BuildCompilation, which accepts the command-line arguments for a compile. You can then get the JobList out of the Compilation. It should have a single job in it, which is the command for running the Frontend CC1 layer, with system header search paths pre-filled. You can pull the ArgStringList directly out of the command and hand it to the Frontend APIs yourself if you need to avoid a subprocess.

I can sketch out the code to do this if it helps (as I said, we’re doing it ourselves), but there’s nothing really fancy to it. The interface is kinda gross and unpolished, but that’s because this isn’t a use case that has been heavily polished in Clang.

Let’s assume I want to avoid a subprocess.
Maybe there is no need, but that’s how I’m doing it now.

Entirely reasonable…

I can sketch out the code to do this if it helps

If it’s not too much trouble for you, then I would appreciate
such a sketch. It may also help others.

It looks something like the following. I’ve edited this down heavily from some real code we’ve used, but I’ve not even compiled it (as you can tell) so it’s very pseudo-code-ish… lemme know if you need something more concrete, but hopefully this is enough to get you started:

/// \brief Retrieves the clang CC1 specific flags out of the compilation's jobs.
/// Returns NULL on error.
static const clang::driver::ArgStringList *GetCC1Arguments(
    clang::DiagnosticsEngine *Diagnostics,
    clang::driver::Compilation *Compilation) {
  // We expect to get back exactly one Command job; if we didn't, something
  // failed. Extract that job from the Compilation.
  const clang::driver::JobList &Jobs = Compilation->getJobs();
  if (Jobs.size() != 1 || !isa<clang::driver::Command>(*Jobs.begin())) {
    // diagnose this…
    return NULL;
  }

  // The one job we find should be to invoke clang again.
  const clang::driver::Command *Cmd =
      cast<clang::driver::Command>(*Jobs.begin());
  if (llvm::StringRef(Cmd->getCreator().getName()) != "clang") {
    // diagnose this…
    return NULL;
  }

  return &Cmd->getArguments();
}

const std::vector<char*> Argv = …; // get this from somewhere…
const char *const BinaryName = Argv[0];
DiagnosticOptions DefaultDiagnosticOptions;
TextDiagnosticPrinter DiagnosticPrinter(
    llvm::errs(), DefaultDiagnosticOptions);
DiagnosticsEngine Diagnostics(llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs>(
    new DiagnosticIDs()), &DiagnosticPrinter, false);

clang::driver::Driver Driver(…, Diagnostics);
const llvm::OwningPtr<clang::driver::Compilation> Compilation(
    Driver.BuildCompilation(llvm::ArrayRef<const char*>(
        &Argv[0], Argv.size() - 1)));
const clang::driver::ArgStringList *const CC1Args = GetCC1Arguments(
    &Diagnostics, Compilation.get());
if (CC1Args == NULL) {
  return false;
}
llvm::OwningPtr<clang::CompilerInvocation> Invocation(
    new clang::CompilerInvocation);
clang::CompilerInvocation::CreateFromArgs(
    *Invocation, CC1Args->data() + 1, CC1Args->data() + CC1Args->size(),
    Diagnostics);
Invocation->getFrontendOpts().DisableFree = false;
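
One detail in the snippet above that is easy to miss is the `CC1Args->data() + 1`: the job's argument list starts with the `-cc1` marker, which is skipped before the rest is handed to CreateFromArgs. The following is only a minimal sketch of that slicing, written against the standard library so it does not need any clang headers. The `ArgStringList` alias is a stand-in for the driver's type, and `FrontendArgs` is a hypothetical helper, not a clang API.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for clang::driver::ArgStringList, modeled as a plain vector of
// C strings so that this sketch compiles without clang.
using ArgStringList = std::vector<const char *>;

// Hypothetical helper (not part of clang): returns the arguments that follow
// the leading "-cc1" marker, i.e. the range [data() + 1, data() + size()).
std::vector<std::string> FrontendArgs(const ArgStringList &CC1Args) {
  // Assumption of this sketch: the driver placed "-cc1" first.
  assert(!CC1Args.empty() && std::strcmp(CC1Args[0], "-cc1") == 0);
  const char *const *Begin = CC1Args.data() + 1;
  const char *const *End = CC1Args.data() + CC1Args.size();
  // Copy into std::string so the result outlives the driver's storage.
  return std::vector<std::string>(Begin, End);
}
```

Copying the strings is a design choice for the sketch: the pointers in the real list belong to the Compilation, so they should not be used after it is destroyed.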

Isn’t this what “createInvocationFromCommandLine” in Frontend/Utils.h does?

-Argyrios

Isn’t this what “createInvocationFromCommandLine” in Frontend/Utils.h does?

Aha! I vaguely remember we cribbed this from somewhere, but had forgotten where. We should probably extend that to fix up the InstalledDir in the driver…

Isn't this what "createInvocationFromCommandLine" in Frontend/Utils.h does?

Thanks for the suggestion, but that doesn't seem to work very well
for me.

Aha! I vaguely remember we cribbed this from somewhere, but had forgotten
where. We should probably extend that to fix up the InstalledDir in the
driver...

Indeed, one of the problems is that there does not appear to be any
way of influencing ResourceDir when using createInvocationFromCommandLine.
Note though, that setting InstalledDir does not appear to have any effect
either as ResourceDir appears to be computed from "ClangExecutable" instead.
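
To make the ResourceDir point concrete, here is a rough sketch of how such a directory can be derived from the executable path alone. It assumes the default `<bindir>/../lib/clang/<version>` layout, which a particular build may override, and it uses only the standard library (C++17). `GuessResourceDir` and its version parameter are hypothetical, introduced for illustration; this is not the driver's actual code.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not a clang API): derives a resource directory from
// the path of the clang executable, assuming the default
// <bindir>/../lib/clang/<version> layout.
std::string GuessResourceDir(const std::string &ClangExecutable,
                             const std::string &Version) {
  namespace fs = std::filesystem;
  assert(!ClangExecutable.empty() && !Version.empty());
  // Directory that contains the executable, e.g. <prefix>/bin.
  fs::path Dir = fs::path(ClangExecutable).parent_path();
  fs::path Resource = Dir / ".." / "lib" / "clang" / Version;
  // Purely lexical clean-up of the ".." component; the file system is not
  // consulted, so the directory need not exist.
  return Resource.lexically_normal().generic_string();
}
```

Under this assumption, passing a binary path that does not match the real installation would point the frontend at a resource directory without the builtin headers, which would be consistent with InstalledDir having no effect.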

Another problem that I find when using createInvocationFromCommandLine
is that it appears to lead to an access to uninitialised values.
I'm probably calling this function in the wrong way, but it's not obvious
to me what exactly I'm doing wrong.

In particular, this code works for me:

static CompilerInvocation *construct_invocation(const char *filename,
        llvm::IntrusiveRefCntPtr<DiagnosticsEngine> Diags)
{
        const char *binary = CLANG_PREFIX"/bin/clang";
        Driver *driver = new Driver(binary, llvm::sys::getDefaultTargetTriple(),
                            "", false, *Diags);
        std::vector<const char *> Argv;
        Argv.push_back(binary);
        Argv.push_back(filename);
        Compilation *compilation = driver->BuildCompilation(
                ArrayRef<const char *>(Argv));
        JobList &Jobs = compilation->getJobs();

        Command *cmd = cast<Command>(*Jobs.begin());
        const ArgStringList *args = &cmd->getArguments();

        CompilerInvocation *invocation = new CompilerInvocation;
        CompilerInvocation::CreateFromArgs(*invocation, args->data() + 1,
                                                args->data() + args->size(),
                                                *Diags);
        delete compilation;
        delete driver;
        return invocation;
}

This doesn't:

static CompilerInvocation *construct_invocation(const char *filename,
        llvm::IntrusiveRefCntPtr<DiagnosticsEngine> Diags)
{
        return createInvocationFromCommandLine(ArrayRef<const char *>(filename),
                                                Diags);
}

Besides the ResourceDir problem, I get this warning from valgrind:

==14736== Conditional jump or move depends on uninitialised value(s)
==14736== at 0x5CE52C2: clang::TextDiagnostic::emitIncludeStack(clang::SourceLocation, clang::DiagnosticsEngine::Level) (TextDiagnostic.cpp:546)
==14736== by 0x5CE4E91: clang::TextDiagnostic::emitDiagnostic(clang::SourceLocation, clang::DiagnosticsEngine::Level, llvm::StringRef, llvm::ArrayRef<clang::CharSourceRange>, llvm::ArrayRef<clang::FixItHint>) (TextDiagnostic.cpp:437)
==14736== by 0x5CA7A11: clang::TextDiagnosticPrinter::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (TextDiagnosticPrinter.cpp:182)
==14736== by 0x638DB03: clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (DiagnosticIDs.cpp:783)
==14736== by 0x6386280: clang::DiagnosticsEngine::ProcessDiag() (Diagnostic.h:648)
==14736== by 0x638368E: clang::DiagnosticBuilder::Emit() (Diagnostic.cpp:351)
==14736== by 0x5C5EFF4: clang::DiagnosticBuilder::~DiagnosticBuilder() (Diagnostic.h:746)
==14736== by 0x5C50164: clang::DiagnosticBuilder::~DiagnosticBuilder() (ArgList.cpp:0)
==14736== by 0x634F33D: clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::DirectoryLookup const*, bool) (PPDirectives.cpp:1294)
==14736== by 0x6350A39: clang::Preprocessor::HandleIncludeNextDirective(clang::SourceLocation, clang::Token&) (PPDirectives.cpp:1342)
==14736== by 0x634DA81: clang::Preprocessor::HandleDirective(clang::Token&) (PPDirectives.cpp:651)
==14736== by 0x633F23F: clang::Lexer::LexTokenInternal(clang::Token&) (Lexer.cpp:2933)
==14736== by 0x633D30B: clang::Lexer::LexTokenInternal(clang::Token&) (Lexer.cpp:2327)

It also results in a memory leak, while the working code above does not
produce any warnings or memory leaks.

The calling context looks like this:

        CompilerInstance *Clang = new CompilerInstance();
        DiagnosticOptions DO;
        MyDiagnosticPrinter *printer = new MyDiagnosticPrinter(DO);
        llvm::IntrusiveRefCntPtr<DiagnosticsEngine> Diags =
                Clang->createDiagnostics(DO, 0, NULL, printer);
        Clang->setDiagnostics(&*Diags);
        Diags->setSuppressSystemWarnings(true);
        CompilerInvocation *invocation = construct_invocation(filename, Diags);
        Clang->setInvocation(invocation);
        Clang->createFileManager();
        Clang->createSourceManager(Clang->getFileManager());
        TargetOptions TO;
        TO.Triple = llvm::sys::getDefaultTargetTriple();
        TargetInfo *target = TargetInfo::CreateTargetInfo(*Diags, TO);
        Clang->setTarget(target);
        CompilerInvocation::setLangDefaults(Clang->getLangOpts(), IK_C,
                                            LangStandard::lang_unspecified);
        Clang->createPreprocessor();
        Preprocessor &PP = Clang->getPreprocessor();
  ...

skimo

Could you file a bug report with a compilable test case that reproduces the issue?

I'll try, but it will probably have to wait until next weekend.

skimo