How to deal with mangled names matching with extern "C" ones?

Hi All,

The following test:

extern "C" void _ZN1TD1Ev();

struct T {
  ~T() {}
};

int main() {
  _ZN1TD1Ev();
  T t;
}

crashes clang.

The problem here is when CodeGen generates code for t's destructor
call, it tries to find destructor's declaration using its mangled name
("_ZN1TD1Ev") as a key... and finds extern "C" function's declaration
instead. Of course, number of arguments is different, so an assert
hits.

I see two ways to resolve this:

1) Check that C++ mangled names don't match with extern "C" ones in
Sema. This is logical and easy enough to do when a C++ declaration
follows extern "C" (as in the example above) -- one simply has to
create a mangled name for C++ method and look for it among existing
declarations. But what to do when extern "C" follows C++ method
declaration (just put T's structure declaration before extern "C" in
the example above)? In this case, *all* existing declarations should
be mangled and checked against matching with extern "C" function name
in hand. This would be an overkill.

2) Check that what is returned in CodeGen (through
"CodeGenModule::GetGlobalValue") is suitable for the type of call we
have to generate (at least, matches number of arguments -- maybe
something else?) If not, print an error message. Granted, this is not
an ideal solution, but at least not crashes clang and doesn't add too
much overhead to detect this [arguably extremely rare] case.

Any suggestions here? Any other way to deal with this? Personally, I'm
leaning towards 2).

Yours,
Andrey Bokhanko

The following test:
(...)
crashes clang.

That's a bug. :slight_smile:

In your case, your C definition is "void ()" while the destructor is
"void (this)", so they are different. Clang is just messing up.

The problem here is when CodeGen generates code for t's destructor
call, it tries to find destructor's declaration using its mangled name
("_ZN1TD1Ev") as a key... and finds extern "C" function's declaration
instead. Of course, number of arguments is different, so an assert
hits.

The standard says:

C++11 7.5 p1: "Two function types with different language linkages are
distinct types even if they are otherwise identical."

However...

C++11 7.5 p5: "If two declarations declare functions with the same
name and parameter-type-list (8.3.5) to be members of the same
namespace or declare objects with the same name to be members of the
same namespace and the declarations give the names different language
linkages, the program is ill-formed; no diagnostic is required if the
declarations appear in different translation units."

My standard-reading skills are a bit rusty, but following the examples
on that chapter, I believe to be an error to have something like:

struct T {
  ~T() {}
};

extern "C" void _ZN1TD1Ev(struct T*);

int main() {
  T t;
  _ZN1TD1Ev(&t);
}

This should be resolved simply by the same mechanism that is already
in place for C++.

Though, g++ seems to accept it...

cheers,
--renato

[lex.name]/3.1: "Each identifier that contains a double underscore __ or begins with an underscore followed by an
uppercase letter is reserved to the implementation for any use."

Right, I always forget the _[A-Z] case. I guess how we deal with this
is up to interpretation.

cheers,
--renato

Hmmm, maybe we should just check that identifiers don't start with
_[A-Z] and print an error if they are?

Richard [Smith], what do you think?

Yours,
Andrey

IIRC, we already have a warning for defining identifiers with reserved
names, but I think we only do it for macros. (See
-Wreserved-id-macro). We may be able to expand that to defining any
reserved identifier, but since those are reserved for the
implementation namespace, I am uncertain of how well that would play
with things that can legitimately use the implementation namespace and
use Clang to compile (like libc++).

~Aaron

I think we should just fix the assert by erroring out from CodeGen when it detects that two Decls have the same mangled name. This might be tricky because we usually assume that decls have a unique mangled name.

If people want to compile C++ transpiled to C, I don’t see why clang should forbid that. The Itanium ABI restricts itself to valid C identifiers to support this exact use case.

I think we should just fix the assert by erroring out from CodeGen when it
detects that two Decls have the same mangled name. This might be tricky
because we usually assume that decls have a unique mangled name.

That is both reasonable and what we already attempt to do, but we don't do
so particularly thoroughly or consistently. For instance:

  int _Z1fv;
  void f() {} // gives an error "definition with same mangled name as
another definition"

but:

  void f() {}
  int _Z1fv; // no error, variable silently replaces function

If people want to compile C++ transpiled to C, I don't see why clang
should forbid that. The Itanium ABI restricts itself to valid C identifiers
to support this exact use case.

I agree. While it may be reasonable to warn on reserved identifiers outside
system headers, we should still accept them when they don't conflict with
some other meaning we've assigned to them. CodeGen needs to be able to deal
with name conflicts, and should avoid crashing when they occur. And this
problem is not limited to C++ name mangling; we get a conflict here too:

  int n = 1;
  int m __asm__("n") = 2;

OK, I will add checks in CodeGen and will send a patch for review.

Yours,
Andrey

Fix prepared, submitted for review: http://reviews.llvm.org/D11297.

It fixes the problem described in my initial message and PR17829. It
doesn't cover cases from Richard Smith's message, which are a subject
of a separate fix.

Andrey