Reminder: 3.6 branch is coming

Reminder: The plan is to create the 3.6 branch next week, on 14 January.

Please help get the release notes and other documents updated. For
example, both the LLVM [1] and Clang [2] release notes look pretty
empty.

Also, if you'd like to volunteer to be a release tester, please let me know.

Cheers,
Hans

1. http://llvm.org/docs/ReleaseNotes.html
2. http://clang.llvm.org/docs/ReleaseNotes.html

Happy to test on Ubuntu 14.04 x86_64.

Ben

ARM and AArch64 here.

--renato

Happy to test, all Debian archs.

Sylvestre

I'll test on FreeBSD, as usual. Note that trunk has been failing "check-all" on FreeBSD for some time now, due to this:

FAIL: Clang :: CXX/drs/dr6xx.cpp (627 of 20130)
******************** TEST 'Clang :: CXX/drs/dr6xx.cpp' FAILED ********************
Script:

I've just read the relevant parts of the C11 spec, and it's not really clear to me what the 'basic character set' is. There are two possible interpretations:

- The set of characters that can be represented by a char in locale "C"
- The set of characters that can be represented by a char in *any* locale

On FreeBSD, it is correct to define __STDC_MB_MIGHT_NEQ_WC__ for the second definition, but not for the first. Can anyone point to something in the spec that clarifies this?

David

I'm again in for testing OS X.

Cheers,
Sebastian

I'll be doing Mips.

Just tried building "latest", works fine for a "production build", but
my debug build, which is built with:
CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/usr/local/llvm-debug -DLLVM_TARGETS_TO_BUILD=X86 ../llvm
followed by "make -j 8", gives me a load of warnings like this:

clang-3.6: warning: argument unused during compilation: '-I /home/MatsP/src/buildclang-debug/lib/Transforms/Utils'
clang-3.6: warning: argument unused during compilation: '-I /home/MatsP/src/llvm/lib/Transforms/Utils'
clang-3.6: warning: argument unused during compilation: '-I /home/MatsP/src/buildclang-debug/include'
clang-3.6: warning: argument unused during compilation: '-I /home/MatsP/src/llvm/include'

(/home/MatsP/src/llvm is where my LLVM source lives, buildclang-debug
is my build directory).

Not sure if this is a bug, or what the fix would be, but I guess it's
not really meant to happen?

So, digging a bit deeper, this appears to be related to ccache -
although I'm not quite sure what/how at this point. If I run
/usr/local/bin/clang++ or /usr/local/llvm-debug/bin/clang++, it
doesn't complain about surplus includes.

However, it seems a bit excessive to complain about unused includes
after using ccache?

The warning tells you that -I is not used when handling preprocessed
input, which is correct. You can disable it with -Qunused-arguments.

Joerg

Ok, so basically, I need to either turn off the warnings altogether,
or detect if I'm using ccache or not, and then turn off warnings for
unused args based on that? Given that I use
llvm-config --cxxflags
which gives me the -I /usr/local/llvm-debug/include/ that it complains
about, I don't really have the choice as to whether I use that include
or not.

Or is using ccache with clang not a recommended "thing"?

[As my own compiler project uses -Werror, it stops my project from
building, so the problem definitely needs a fix of some sort - I'm using
-Qunused-arguments for now, but that doesn't really feel like the RIGHT
thing permanently]

The recommended way to use ccache with clang is to not cut the
preprocessing step away.

Joerg

It looks like the interpretation of what "__STDC_MB_MIGHT_NEQ_WC__" means differs between the LLVM developers and the FreeBSD developers. I'm not sure what a good solution is.

I've just read the relevant parts of the C11 spec, and it's not really clear to me what the 'basic character set' is. There are two possible interpretations:

- The set of characters that can be represented by a char in locale "C"
- The set of characters that can be represented by a char in *any* locale

On FreeBSD, it is correct to define __STDC_MB_MIGHT_NEQ_WC__ for the second definition, but not for the first. Can anyone point to something in the spec that clarifies this?

I don't have any clarification, but I do want to quote some discussion from a previous email conversation with Ed Schouten and Richard Smith. This started with me mailing Ed about this particular test failure, to which he replied:

We talked about this a little on IRC, and the opinion seems to be that defining __STDC_MB_MIGHT_NEQ_WC__ for FreeBSD is not the right thing to do. The standard says:

__STDC_MB_MIGHT_NEQ_WC__ The integer constant 1, intended to indicate that, in the encoding for wchar_t, a member of the basic character set need not have a code value equal to its value when used as the lone character in an ordinary character literal.

But the "basic character set" is just the whitespace characters, plus a-zA-Z0-9_{}#()<>%:;.?*+-/^&|~!=,\"', and I don't think this set is dependent on the locale at all, even with wchar_t...?

As far as I know, they are. On FreeBSD, the encoding of wchar_t
depends on the locale entirely. This is annoying. I would have loved
to see us use UCS-4 instead.

Even though FreeBSD does not ship with this, it should be perfectly
feasible to come up with an EBCDIC locale that would directly map to
the low 8 bits of wchar_t.

The test case in the LLVM tree is invalid and should be discarded. It
erroneously assumes that the encoding of wchar_t is independent of the
locale.

Richard then argued this makes no sense:

...

The test case in the LLVM tree is invalid and should be discarded. It
erroneously assumes that the encoding of wchar_t is independent of the
locale.

That makes no sense. These values are compile-time constants and cannot possibly depend on the locale.

Next, Ed remarked that wide characters are indeed locale-dependent:

...
That makes no sense. These values are compile-time constants and cannot
possibly depend on the locale.

Exactly, but as far as I know, that is precisely why wide
characters are broken as implemented on FreeBSD. They are locale
dependent, meaning that there is no way a compiler could reliably emit
character/string literals.

And Richard then seemed to conclude that this was something to be solved on the FreeBSD side:

...

That is a much more fundamental problem than the value of this macro, and is a problem the FreeBSD folks will need to sort out for themselves.

Nonetheless, we need to have a fixed encoding for wide character literals, and the macro is specified as corresponding to *that* encoding. And in that encoding, narrow and wide basic source characters have the same value.
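
To illustrate the distinction being drawn here: character literals get their values during translation, while library conversions such as btowc consult the current locale at run time. The following is only a minimal sketch, assuming a hosted C++11 implementation. The names kLiteralsAgree, runtime_matches_literal and check_in_c_locale are hypothetical and do not come from the thread or from the Clang test suite.

```cpp
#include <cassert>
#include <clocale>
#include <cwchar>

// Translation time: the compiler picks both values, so this comparison is a
// constant expression. A later setlocale() call cannot change its result.
constexpr bool kLiteralsAgree = ('x' == L'x');

// Run time: std::btowc converts a single byte according to the LC_CTYPE
// category of the current locale, so its result may differ between locales.
// Hypothetical helper, used only for this illustration.
inline bool runtime_matches_literal(char c, wchar_t wide_literal) {
  const std::wint_t converted = std::btowc(static_cast<unsigned char>(c));
  return converted != WEOF &&
         converted == static_cast<std::wint_t>(wide_literal);
}

// Usage sketch. The check is expected to hold in the "C" locale on common
// implementations; the standard does not promise it for every locale.
inline void check_in_c_locale() {
  std::setlocale(LC_ALL, "C");
  assert(runtime_matches_literal('x', L'x'));
}
```

The macro under discussion is about the first kind of value (the literals), which is why one side of the argument treats it as independent of the locale, while the other side is concerned with what the second kind of conversion may return.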

This was the end of the thread. Now, to go back to the beginning again, when I read the C++11 standard, it mentions a "basic source character set" in 2.3 [lex.charset]:

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters: [footnote 14]

abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 _{}[]#()<>%:;.?*+-/^&|~!=,\"'

It looks like this is equivalent to "basic character set", since references in the document mentioning that name refer back to section 2.3. That section also has a footnote which seems relevant:

The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.

All in all, I'm not sure whether the test case should fail when __STDC_MB_MIGHT_NEQ_WC__ is 1 and all basic source characters just 'happen' to be equal to their representation.

Either that, or maybe just XFAIL the test case on FreeBSD, to fix it.

-Dimitry

I believe Richard is wrong here. There are a number of similar compile-time constant macros in the C spec. I believe the clue as to the correct reading of the spec is in the name of the macro: __STDC_MB_MIGHT_NEQ_WC__

Note the word *might*. It means that it is not safe for code to assume that a cast will give the corresponding char value if one exists. That is, the assumption is not true for all locales.

As dim says, the test is wrong. It would be a valid test for it to fail if __STDC_MB_MIGHT_NEQ_WC__ is *not* defined and not all characters in the basic set have the same encoding as wide chars, but it is not correct to fail if it is set unless they have the same encoding in *all* locales, *and* the vendor is willing to guarantee that they will have the same encoding for all locales in all future binary-compatible versions of the system (which an automated test can't check).

In summary: the test is nonsense and should be removed.

David

I couldn't find an existing PR for this, so I filed
http://llvm.org/PR22208 with folks on this thread cc'd. It would be
great if we could get it resolved soon.

Thanks,
Hans

You should read the definitions in the relevant standards rather than
trying to guess what the macro means from its name. Here is the definition:

"The integer constant 1, intended to indicate that, in the encoding for
wchar_t, a member of the basic character set need not have a code value
equal to its value when used as the lone character in an integer character
constant."

So the value 1 indicates that 'x' might not equal L'x' for some character x
in the basic character set. (Note that the 'might' means that there might
exist some *character* where this happens, not that there might exist some
*locale* where this happens.) Since the values of 'x' and L'x' are
determined at translation time, this property obviously cannot depend in
any way on the current locale in the execution environment.

Note that the above property is *exactly* what the test is testing for.

However... the FreeBSD folks don't seem interested in fixing their bug, and
it's technically conforming for an implementation to define this macro to 1
in any situation -- a member of the basic source character set "need not"
have the same value as a narrow or wide character, even though they all
actually do -- making this a quality-of-implementation issue, and I'm tired
of discussing this, so I've relaxed the test for FreeBSD in r225751.
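
For readers following the thread, the shape of the check being argued about can be sketched roughly as below. This is an illustrative reconstruction under stated assumptions, not the contents of dr6xx.cpp or of r225751. The names all_equal, kNarrow, kWide and kMbEqWc are hypothetical, and the __FreeBSD__ exemption is only a guess at what a relaxation might look like.

```cpp
// Hypothetical helper, not taken from the Clang test suite: true when each
// character of the narrow string has the same value as the character at the
// same position in the wide string, and both strings end together.
constexpr bool all_equal(const char *n, const wchar_t *w) {
  return *n == '\0'
             ? *w == L'\0'
             : (static_cast<wchar_t>(*n) == *w && all_equal(n + 1, w + 1));
}

// The 96 basic source characters from [lex.charset]: space, four control
// characters, and 91 graphical characters.
constexpr char kNarrow[] =
    " \t\v\f\n"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
constexpr wchar_t kWide[] =
    L" \t\v\f\n"
    L"abcdefghijklmnopqrstuvwxyz"
    L"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    L"0123456789"
    L"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

// Guard against the two lists drifting apart: 96 characters plus terminator.
static_assert(sizeof(kNarrow) == 97, "narrow list should hold 96 characters");
static_assert(sizeof(kWide) / sizeof(kWide[0]) == 97,
              "wide list should hold 96 characters");

// Evaluated entirely at translation time.
constexpr bool kMbEqWc = all_equal(kNarrow, kWide);

#if defined(__STDC_MB_MIGHT_NEQ_WC__)
#if !defined(__FreeBSD__)
// Strict reading: the macro is only expected when some character differs.
// Assumption: FreeBSD is exempted here purely for illustration.
static_assert(!kMbEqWc, "macro is defined but every basic character matches");
#endif
#else
// Both readings in the thread agree on this direction.
static_assert(kMbEqWc, "macro is not defined but some basic character differs");
#endif
```

Under the looser reading argued earlier in the thread, the first branch would simply be dropped and only the #else check would remain.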

Count me in for testing Fedora and OpenSUSE.