libclang crash when parsing MS-style inline assembly

Hello,

The following minimal code that contain MS-style inline assembly will
compile fine with clang, but libclang fails to parse it
(clang_parseTranslationUnit will return NULL).

   void Break(){ __asm { int 3 } }

In yesterday's llvm and clang sources, the problem occurs in
ParseMicrosoftAsmStatement.
In the code below, because no target have been registered, the first line
will set TheTarget to NULL, and the second line will dereference TheTarget,
thus causing the problem.

   const llvm::Target *TheTarget = llvm::TargetRegistry::lookupTarget(TT,
Error);
   OwningPtr<llvm::MCRegisterInfo> MRI(TheTarget->createMCRegInfo(TT));

For what I understood, clang, in cc1_main, will initialize targets and
targets' functions with the following 4 lines, whereas libclang won't.

  llvm::InitializeAllTargets();
  llvm::InitializeAllTargetMCs();
  llvm::InitializeAllAsmPrinters();
  llvm::InitializeAllAsmParsers();

Just for testing purpose, adding those 4 lines somewhere in
clang_createIndex fixes the problem. I know this is probably wrong, but did
it just to see if more problems were hiding behind.

So my questions are:
  1] Is it wanted that libclang doesn't initialize any target ?
  2] if yes, how shloud it behave with MS inline assembly ?
  3] if no, what is the proper way to make it initialize targets ?

Many thanks for you work !
William

Hi William,

For what I understood, clang, in cc1_main, will initialize targets and
targets' functions with the following 4 lines, whereas libclang won't.

  llvm::InitializeAllTargets();
  llvm::InitializeAllTargetMCs();
  llvm::InitializeAllAsmPrinters();
  llvm::InitializeAllAsmParsers();

Just for testing purpose, adding those 4 lines somewhere in
clang_createIndex fixes the problem. I know this is probably wrong, but did
it just to see if more problems were hiding behind.

We added a similar workaround in a clang-based tool, first thing in main():

  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmParser();

This fixed a crash I saw when some Windows header with inline assembly
was parsed.

I can't vouch for its goodness, just wanted to add a data point.

- Kim

Hello William,

I’ve landed a fix in r193685 to diagnose this instead of crashing:

clang/test/Index/ms-asm-no-target.cpp:8:3: error: MS-style inline assembly is not available: Unable to find target for this triple (no targets are registered)

It’s up to the embedder to decide what targets are linked in, so how the API exposes that is still an open question.

Alp.

Hi Alp,

Thanks for helping !

1] Is there a way I could tell libclang which target to initialize without modifying it ?
2] How does it work for gcc style assembly, is there a similar error message or does it work without the need of any targets ?
3] Since you decided to diagnose this, you might also want to check the case were the target have been initialized, but not the used function pointers of this target

William.

Hi William,
I can answer your second question. GCC style inline assembly does not
require a target AsmParser. MS-style inline assembly requires a parser
because this is how we discover the constraints. Hope this helps.

Chad

Hi Chad,

Thanks, yes it makes perfect sense,
I guess it is one reason why clang is quite strict about inline asm ( http://clang.llvm.org/compatibility.html#inline-asm )

So now I understand that there is no way to parse MS-style inline assembly without a target asm parser.
But I still need to figure out if I can and how to tell libclang to initialize a specific target.

Hi Chad,

So now I understand that there is no way to parse MS-style inline assembly
without a target asm parser.
But I still need to figure out if I can and how to tell libclang to
initialize a specific target.

Unfortunately, I don't have an answer here. Perhaps Alp or another member
of the list has a solution.

Hi all

Apologies in advance that I don't fully understand the technicalities of
this area, but could that regression test be made more robust?

My clang build has LLVM_DEFAULT_TARGET=aarch64-none-eabi (Host is
x86_64-linux-gnueabi) and the error message that I get is slightly different
causing the test to fail:

.../ms-asm-no-target.cpp:8:3: error: Unsupported architecture 'aarch64' for
MS-style inline assembly

Again, I don't fully understand the area, but the test seems only to require
that an x86 backend is available. It doesn't account for cross-compiling or
when the default triple is not x86. If the test needs to be run on x86,
shouldn't the REQUIRES line be more strict? Alternatively, if my failure
mode is equally acceptable behaviour, could the two error messages be
combined?

Thanks,
Rich

From: cfe-dev-bounces@cs.uiuc.edu [mailto:cfe-dev-bounces@cs.uiuc.edu] On
Behalf Of mcrosier@codeaurora.org
Sent: 30 October 2013 15:58
To: William Ledoux
Cc: cfe-dev@cs.uiuc.edu
Subject: Re: [cfe-dev] libclang crash when parsing MS-style inline

assembly

> Hi Chad,
>
> So now I understand that there is no way to parse MS-style inline
> assembly without a target asm parser.
> But I still need to figure out if I can and how to tell libclang to
> initialize a specific target.

Unfortunately, I don't have an answer here. Perhaps Alp or another member

of

the list has a solution.

>
>
>> Hi William,
>> I can answer your second question. GCC style inline assembly does
>> not require a target AsmParser. MS-style inline assembly requires a
>> parser because this is how we discover the constraints. Hope this

helps.

William, Kim,

Now that we've dealt with the underlying crash, we can start to look at
how to handle this cleanly.

I'd like to understand your requirements first..

For your use cases, would it be acceptable to skip over the MS inline
assembly without parsing it, ie. having a special mode that turns this
into a soft error in parse-only tools?

Alp.

Hi Alp,

Now that we've dealt with the underlying crash, we can start to look at
how to handle this cleanly.

I'd like to understand your requirements first..

For your use cases, would it be acceptable to skip over the MS inline
assembly without parsing it, ie. having a special mode that turns this
into a soft error in parse-only tools?

For me, I think so.

We ran into this in IWYU [1], and I don't think inline assembly would
introduce a "use" of any significance...

I'm not sure what you mean by "soft error", but I think I'd just
ignore it completely rather than spewing diagnostics for things we
can't affect and don't really care about.

We don't use libclang, the problem is a little different there, as
they can't initialize the respective targets. Our current workaround
solves the problem for us.

Thanks!

- Kim

[1] https://code.google.com/p/include-what-you-use/

Hi Alp,

>
> Now that we've dealt with the underlying crash, we can start to look at
> how to handle this cleanly.
>
> I'd like to understand your requirements first..
>
> For your use cases, would it be acceptable to skip over the MS inline
> assembly without parsing it, ie. having a special mode that turns this
> into a soft error in parse-only tools?

For me, I think so.

We ran into this in IWYU [1], and I don't think inline assembly would
introduce a "use" of any significance...

Muahahaha:

// foo.h
struct Foo {
  static const int x = 20;
};
// foo.cc
int main() {
  int r;
  __asm {
    mov ecx, Foo::x
    mov r, ecx
  };
  return r;
}

Seems like a use that is relevant to IWYU. :slight_smile:

    Hi Alp,

    >
    > Now that we've dealt with the underlying crash, we can start to
    look at
    > how to handle this cleanly.
    >
    > I'd like to understand your requirements first..
    >
    > For your use cases, would it be acceptable to skip over the MS
    inline
    > assembly without parsing it, ie. having a special mode that
    turns this
    > into a soft error in parse-only tools?

    For me, I think so.

    We ran into this in IWYU [1], and I don't think inline assembly would
    introduce a "use" of any significance...

Muahahaha:

// foo.h
struct Foo {
  static const int x = 20;
};
// foo.cc
int main() {
  int r;
  __asm {
    mov ecx, Foo::x
    mov r, ecx
  };
  return r;
}

Seems like a use that is relevant to IWYU. :slight_smile:

Exactly what I was thinking.

Looks like we'll end up having to do something like this for both
Tooling and libclang :-/

  LLVMInitializeX86TargetInfo();
  LLVMInitializeX86TargetMC();
  LLVMInitializeX86AsmParser();

On the plus side, it only pulls in an extra 1M on the binary size.

Alp.

Count on compiler developers to prove me wrong :slight_smile:

Patches most welcome!

- Kim

Actually, it turns out this already works in IWYU.

So we'll just keep the InitializeNativeTarget* calls in main(), and
life is good!

- Kim

Muahahaha:

// foo.h
struct Foo {
  static const int x = 20;
};
// foo.cc
int main() {
  int r;
  __asm {
    mov ecx, Foo::x
    mov r, ecx
  };
  return r;
}

Seems like a use that is relevant to IWYU. :slight_smile:

Count on compiler developers to prove me wrong :slight_smile:

Patches most welcome!

Actually, it turns out this already works in IWYU.

So we'll just keep the InitializeNativeTarget* calls in main(), and
life is good!

Good on x86, not so good when cross-compiling (cross-IWYUing?) from a
non-Intel architecture. The snippet I gave handles that as long as an
x86 target has been built in FWIW.

Alp.

I was just about to ask -- thanks for confirming!

And what controls whether an x86 target is built in? What happens in
its absence? It's not really a critical error for syntax-only tools,
so it would be nice if LLVMInitializeX86* could warn instead of
failing. Or maybe there needs to be a way for tools to ask whether the
target is available before attempting to initialize, so they can shape
the warning themselves.

- Kim

Maybe the ASM syntax parser should be changed to gracefully failed when the target is not available ?

-- Jean-Daniel

That's what Alp fixed, see up-thread. I'm just not sure
LLVMInitialize* fail as gracefully, I haven't looked into it. If they
fail silently (and we consider that a good thing), then this issue is
moot.

- Kim