[libc++] xlocale usage portability issues

Hi, i'm trying to port libcxx to the musl libc [1] which does not offer xlocale.h

Here are some concrete issues:

- strtol_l and friends do not exist. can we add a #ifdef _HAVE_XLOCALE to use the standard functions where xlocale doesnt exist ?
- ctype_base uses the libc's macros for the character masks. I don't think it has to do that at all, since it never passes the mask back to the libc.
   this appears to works for me, but i couldn't test it, as there are other errors to fix first:

diff --git a/include/__locale b/include/__locale
index 8aa8fc1..05cc9ac 100644
--- a/include/__locale
+++ b/include/__locale
@@ -22,7 +22,7 @@
  #if _WIN32
  # include <support/win32/locale_win32.h>
  #else // _WIN32
-# include <xlocale.h>
+# include <locale.h>
  #endif // _WIN32

  #pragma GCC system_header
@@ -235,11 +235,11 @@ collate<_CharT>::do_hash(const char_type* lo, const char_type* hi) const
  {
      size_t h = 0;
      const size_t sr = __CHAR_BIT__ * sizeof(size_t) - 8;
- const size_t mask = size_t(0xF) << (sr + 4);
+ const size_t cmask = size_t(0xF) << (sr + 4);
      for(const char_type* p = lo; p != hi; ++p)
      {
          h = (h << 4) + *p;
- size_t g = h & mask;
+ size_t g = h & cmask;
          h ^= g | (g >> sr);
      }
      return static_cast<long>(h);
@@ -330,7 +330,7 @@ public:
      static const mask punct = _PUNCT;
      static const mask xdigit = _HEX;
      static const mask blank = _BLANK;
-#else // __GLIBC__ || _WIN32
+#elif (__APPLE__ || __FreeBSD__)
  #if __APPLE__
      typedef __uint32_t mask;
  #elif __FreeBSD__
@@ -346,7 +346,19 @@ public:
      static const mask punct = _CTYPE_P;
      static const mask xdigit = _CTYPE_X;
      static const mask blank = _CTYPE_B;
-#endif // __GLIBC__ || _WIN32
+#else // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
+ typedef unsigned long mask;
+ static const mask space = 1<<0;
+ static const mask print = 1<<1;
+ static const mask cntrl = 1<<2;
+ static const mask upper = 1<<3;
+ static const mask lower = 1<<4;
+ static const mask alpha = 1<<5;
+ static const mask digit = 1<<6;
+ static const mask punct = 1<<7;
+ static const mask xdigit = 1<<8;
+ static const mask blank = 1<<9;
+#endif // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
      static const mask alnum = alpha | digit;
      static const mask graph = alnum | punct;

Hi, i'm trying to port libcxx to the musl libc [1]

Great!

which does not offer
xlocale.h

Here are some concrete issues:

- strtol_l and friends do not exist. can we add a #ifdef _HAVE_XLOCALE
to use the standard functions where xlocale doesnt exist ?

Our experience to date is that the best way to deal with this issue is to emulate the *_l functions elsewhere. The _WIN32 port does just this.

- ctype_base uses the libc's macros for the character masks. I don't
think it has to do that at all, since it never passes the mask back to
the libc.

On OS X and FreeBSD libc++ uses the libc character mask table.

  this appears to works for me, but i couldn't test it, as there are
other errors to fix first:

Is there a flag that can be used to identify your port instead of just putting it all under "#else"? Your very first change:

diff --git a/include/__locale b/include/__locale
index 8aa8fc1..05cc9ac 100644
--- a/include/__locale
+++ b/include/__locale
@@ -22,7 +22,7 @@
#if _WIN32
# include <support/win32/locale_win32.h>
#else // _WIN32
-# include <xlocale.h>
+# include <locale.h>
#endif // _WIN32

breaks OS X and FreeBSD.

#pragma GCC system_header
@@ -235,11 +235,11 @@ collate<_CharT>::do_hash(const char_type* lo,
const char_type* hi) const
{
     size_t h = 0;
     const size_t sr = __CHAR_BIT__ * sizeof(size_t) - 8;
- const size_t mask = size_t(0xF) << (sr + 4);
+ const size_t cmask = size_t(0xF) << (sr + 4);
     for(const char_type* p = lo; p != hi; ++p)
     {
         h = (h << 4) + *p;
- size_t g = h & mask;
+ size_t g = h & cmask;
         h ^= g | (g >> sr);
     }
     return static_cast<long>(h);

Why does mask need to be renamed to cmask?

@@ -330,7 +330,7 @@ public:
     static const mask punct = _PUNCT;
     static const mask xdigit = _HEX;
     static const mask blank = _BLANK;
-#else // __GLIBC__ || _WIN32
+#elif (__APPLE__ || __FreeBSD__)
#if __APPLE__
     typedef __uint32_t mask;
#elif __FreeBSD__
@@ -346,7 +346,19 @@ public:
     static const mask punct = _CTYPE_P;
     static const mask xdigit = _CTYPE_X;
     static const mask blank = _CTYPE_B;
-#endif // __GLIBC__ || _WIN32
+#else // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
+ typedef unsigned long mask;
+ static const mask space = 1<<0;
+ static const mask print = 1<<1;
+ static const mask cntrl = 1<<2;
+ static const mask upper = 1<<3;
+ static const mask lower = 1<<4;
+ static const mask alpha = 1<<5;
+ static const mask digit = 1<<6;
+ static const mask punct = 1<<7;
+ static const mask xdigit = 1<<8;
+ static const mask blank = 1<<9;
+#endif // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
     static const mask alnum = alpha | digit;
     static const mask graph = alnum | punct;

This part looks ok, but we're getting enough platforms in here we should probably do away with the generic "else" branch (including for Apple and FreeBSD as you suggest above).

Thanks for working on this!

Howard

Is there a flag that can be used to identify your port instead of
just putting it all under "#else"? Your very first change:

unfortunately not. but it might not be needed for anything but the xlocale.h
I am right now patching musl to include the nessesary _l functions instead.
David Chisnall has convinced me, it is worth the effort.

-# include <xlocale.h>
+# include <locale.h>
#endif // _WIN32

breaks OS X and FreeBSD.

sorry that wasn't intended as a real patch. I don't quite know how to handle the missing xlocale.h case yet. maybe #ifdef __GLIBC__ it

#pragma GCC system_header
@@ -235,11 +235,11 @@ collate<_CharT>::do_hash(const char_type* lo,
const char_type* hi) const
{
     size_t h = 0;
     const size_t sr = __CHAR_BIT__ * sizeof(size_t) - 8;
- const size_t mask = size_t(0xF) << (sr + 4);
+ const size_t cmask = size_t(0xF) << (sr + 4);
     for(const char_type* p = lo; p != hi; ++p)
     {
         h = (h << 4) + *p;
- size_t g = h & mask;
+ size_t g = h & cmask;
         h ^= g | (g >> sr);
     }
     return static_cast<long>(h);

Why does mask need to be renamed to cmask?

good question, clang throws a weird error otherwise:

/home/aep/proj/evprojects/heresy/libcpp/evocation_build/build/s/
include/__locale:238:23: error: expected unqualified-id
      const size_t mask = size_t(0xF) << (sr + 4);
                        ^

(it points at the = just in case my mail client screws up the indend)

@@ -330,7 +330,7 @@ public:
     static const mask punct = _PUNCT;
     static const mask xdigit = _HEX;
     static const mask blank = _BLANK;
-#else // __GLIBC__ || _WIN32
+#elif (__APPLE__ || __FreeBSD__)
#if __APPLE__
     typedef __uint32_t mask;
#elif __FreeBSD__
@@ -346,7 +346,19 @@ public:
     static const mask punct = _CTYPE_P;
     static const mask xdigit = _CTYPE_X;
     static const mask blank = _CTYPE_B;
-#endif // __GLIBC__ || _WIN32
+#else // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
+ typedef unsigned long mask;
+ static const mask space = 1<<0;
+ static const mask print = 1<<1;
+ static const mask cntrl = 1<<2;
+ static const mask upper = 1<<3;
+ static const mask lower = 1<<4;
+ static const mask alpha = 1<<5;
+ static const mask digit = 1<<6;
+ static const mask punct = 1<<7;
+ static const mask xdigit = 1<<8;
+ static const mask blank = 1<<9;
+#endif // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
     static const mask alnum = alpha | digit;
     static const mask graph = alnum | punct;

This part looks ok, but we're getting enough platforms in here we
should probably do away with the generic "else" branch (including for
Apple and FreeBSD as you suggest above).

right, if you want i can remove the Apple and Freebsd branches, but i am unsure about the implications.
David said there was some good reason for using the libc's types here. But i don't it from my libc's POV.

Thanks for working on this!

thanks for libc++ :slight_smile:

I am right now patching musl to include the nessesary _l functions instead.
David Chisnall has convinced me, it is worth the effort.

Great! I'll hold off committing any changes right now until the dust settles on your new _l functions.

#pragma GCC system_header
@@ -235,11 +235,11 @@ collate<_CharT>::do_hash(const char_type* lo,
const char_type* hi) const
{
    size_t h = 0;
    const size_t sr = __CHAR_BIT__ * sizeof(size_t) - 8;
- const size_t mask = size_t(0xF) << (sr + 4);
+ const size_t cmask = size_t(0xF) << (sr + 4);
    for(const char_type* p = lo; p != hi; ++p)
    {
        h = (h << 4) + *p;
- size_t g = h & mask;
+ size_t g = h & cmask;
        h ^= g | (g >> sr);
    }
    return static_cast<long>(h);

Why does mask need to be renamed to cmask?

good question, clang throws a weird error otherwise:

/home/aep/proj/evprojects/heresy/libcpp/evocation_build/build/s/
include/__locale:238:23: error: expected unqualified-id
    const size_t mask = size_t(0xF) << (sr + 4);
                      ^

(it points at the = just in case my mail client screws up the indend)

It looks like this section of code could do with leading underscore treatment for all the local variables. We're probably being stepped on by macros.

Howard

allright this compiles for me with my recent patches to musl. yey :slight_smile:

There is still two unclear parts here

1) I commented out some usings. I don't quite know what they are supposed to do, but they break the build against musl.
2) The replacement of reinterpret to static cast is because in musl they are both long, and clang doesn't like reinterpret_cast from long to long.
    I'm unsure if that breaks something else.

diff --git a/include/__locale b/include/__locale
index 8aa8fc1..6029e6e 100644
--- a/include/__locale
+++ b/include/__locale
@@ -21,9 +21,9 @@
  #include <locale.h>
  #if _WIN32
  # include <support/win32/locale_win32.h>
-#else // _WIN32
+#elif (__GLIBC__ || __APPLE__ || __FreeBSD__)
  # include <xlocale.h>
-#endif // _WIN32
+#endif // _WIN32 || __GLIBC__ || __APPLE__ || __FreeBSD_

  #pragma GCC system_header

@@ -235,11 +235,11 @@ collate<_CharT>::do_hash(const char_type* lo, const char_type* hi) const
  {
      size_t h = 0;
      const size_t sr = __CHAR_BIT__ * sizeof(size_t) - 8;
- const size_t mask = size_t(0xF) << (sr + 4);
+ const size_t cmask = size_t(0xF) << (sr + 4);
      for(const char_type* p = lo; p != hi; ++p)
      {
          h = (h << 4) + *p;
- size_t g = h & mask;
+ size_t g = h & cmask;
          h ^= g | (g >> sr);
      }
      return static_cast<long>(h);
@@ -330,7 +330,7 @@ public:
      static const mask punct = _PUNCT;
      static const mask xdigit = _HEX;
      static const mask blank = _BLANK;
-#else // __GLIBC__ || _WIN32
+#elif (__APPLE__ || __FreeBSD__)
  #if __APPLE__
      typedef __uint32_t mask;
  #elif __FreeBSD__
@@ -346,7 +346,19 @@ public:
      static const mask punct = _CTYPE_P;
      static const mask xdigit = _CTYPE_X;
      static const mask blank = _CTYPE_B;
-#endif // __GLIBC__ || _WIN32
+#else // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
+ typedef unsigned long mask;
+ static const mask space = 1<<0;
+ static const mask print = 1<<1;
+ static const mask cntrl = 1<<2;
+ static const mask upper = 1<<3;
+ static const mask lower = 1<<4;
+ static const mask alpha = 1<<5;
+ static const mask digit = 1<<6;
+ static const mask punct = 1<<7;
+ static const mask xdigit = 1<<8;
+ static const mask blank = 1<<9;
+#endif // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
      static const mask alnum = alpha | digit;
      static const mask graph = alnum | punct;

diff --git a/include/cmath b/include/cmath
index f8bc0df..eec9f03 100644
--- a/include/cmath
+++ b/include/cmath
@@ -623,7 +623,7 @@ isunordered(_A1 __x, _A2 __y)

  _LIBCPP_BEGIN_NAMESPACE_STD

-using ::signbit;
+//using ::signbit;
  using ::fpclassify;
  using ::isfinite;
  using ::isinf;
@@ -637,8 +637,8 @@ using ::islessgreater;
  using ::isunordered;

-using ::float_t;
-using ::double_t;
+//using ::float_t;
+//using ::double_t;

  // abs

diff --git a/include/locale b/include/locale
index 9f9996a..55185d6 100644
--- a/include/locale
+++ b/include/locale
@@ -3696,7 +3696,7 @@ messages<_CharT>::do_open(const basic_string<char>& __nm, const locale&) const
  #if _WIN32
      return -1;
  #else // _WIN32
- catalog __cat = reinterpret_cast<catalog>(catopen(__nm.c_str(), NL_CAT_LOCALE));
+ catalog __cat = static_cast<catalog>(catopen(__nm.c_str(), NL_CAT_LOCALE));
      if (__cat != -1)
          __cat = static_cast<catalog>((static_cast<size_t>(__cat) >> 1));
      return __cat;
@@ -3717,7 +3717,7 @@ messages<_CharT>::do_get(catalog __c, int __set, int __msgid,
                                                         __dflt.c_str() + __dflt.size());
      if (__c != -1)
          __c <<= 1;
- nl_catd __cat = reinterpret_cast<nl_catd>(__c);
+ nl_catd __cat = static_cast<nl_catd>(__c);
      char* __n = catgets(__cat, __set, __msgid, __ndflt.c_str());
      string_type __w;
      __widen_from_utf8<sizeof(char_type)*__CHAR_BIT__>()(back_inserter(__w),
@@ -3733,7 +3733,7 @@ messages<_CharT>::do_close(catalog __c) const
  #if !_WIN32
      if (__c != -1)
          __c <<= 1;
- nl_catd __cat = reinterpret_cast<nl_catd>(__c);
+ nl_catd __cat = static_cast<nl_catd>(__c);
      catclose(__cat);
  #endif // !_WIN32
  }

allright this compiles for me with my recent patches to musl. yey :slight_smile:

I'm not sure why but I'm having trouble applying this patch:

$ patch -p1 < temp.patch
patching file include/__locale
patch: **** malformed patch at line 15: @@ -235,11 +235,11 @@ collate<_CharT>::do_hash(const char_type* lo, const char_type* hi) const

There is still two unclear parts here

1) I commented out some usings. I don't quite know what they are supposed to do, but they break the build against musl.

They are supposed to import symbols from the global namespace into namespace std (more below).

2) The replacement of reinterpret to static cast is because in musl they are both long, and clang doesn't like reinterpret_cast from long to long.
  I'm unsure if that breaks something else.

On OS X catopen() returns a pointer. Perhaps a C-style cast is the way to go here.

diff --git a/include/__locale b/include/__locale
index 8aa8fc1..6029e6e 100644
--- a/include/__locale
+++ b/include/__locale
@@ -21,9 +21,9 @@
#include <locale.h>
#if _WIN32
# include <support/win32/locale_win32.h>
-#else // _WIN32
+#elif (__GLIBC__ || __APPLE__ || __FreeBSD__)
# include <xlocale.h>
-#endif // _WIN32
+#endif // _WIN32 || __GLIBC__ || __APPLE__ || __FreeBSD_

#pragma GCC system_header

@@ -235,11 +235,11 @@ collate<_CharT>::do_hash(const char_type* lo, const char_type* hi) const
{
    size_t h = 0;
    const size_t sr = __CHAR_BIT__ * sizeof(size_t) - 8;
- const size_t mask = size_t(0xF) << (sr + 4);
+ const size_t cmask = size_t(0xF) << (sr + 4);

Does __mask work for you instead? And while you're at it can you prefix every local variable in here with "__"?

    for(const char_type* p = lo; p != hi; ++p)
    {
        h = (h << 4) + *p;
- size_t g = h & mask;
+ size_t g = h & cmask;
        h ^= g | (g >> sr);
    }
    return static_cast<long>(h);
@@ -330,7 +330,7 @@ public:
    static const mask punct = _PUNCT;
    static const mask xdigit = _HEX;
    static const mask blank = _BLANK;
-#else // __GLIBC__ || _WIN32
+#elif (__APPLE__ || __FreeBSD__)
#if __APPLE__
    typedef __uint32_t mask;
#elif __FreeBSD__
@@ -346,7 +346,19 @@ public:
    static const mask punct = _CTYPE_P;
    static const mask xdigit = _CTYPE_X;
    static const mask blank = _CTYPE_B;
-#endif // __GLIBC__ || _WIN32
+#else // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
+ typedef unsigned long mask;
+ static const mask space = 1<<0;
+ static const mask print = 1<<1;
+ static const mask cntrl = 1<<2;
+ static const mask upper = 1<<3;
+ static const mask lower = 1<<4;
+ static const mask alpha = 1<<5;
+ static const mask digit = 1<<6;
+ static const mask punct = 1<<7;
+ static const mask xdigit = 1<<8;
+ static const mask blank = 1<<9;
+#endif // __GLIBC__ || _WIN32 || __APPLE__ || __FreeBSD__
    static const mask alnum = alpha | digit;
    static const mask graph = alnum | punct;

diff --git a/include/cmath b/include/cmath
index f8bc0df..eec9f03 100644
--- a/include/cmath
+++ b/include/cmath
@@ -623,7 +623,7 @@ isunordered(_A1 __x, _A2 __y)

_LIBCPP_BEGIN_NAMESPACE_STD

-using ::signbit;
+//using ::signbit;

At first I was thinking this was a problem because signbit is a macro for you. However this code earlier in cmath should turn signbit from a macro into an inline function:

// signbit

#ifdef signbit

template <class _A1>
_LIBCPP_ALWAYS_INLINE
bool
__libcpp_signbit(_A1 __x)
{
    return signbit(__x);
}

#undef signbit

template <class _A1>
inline _LIBCPP_INLINE_VISIBILITY
typename std::enable_if<std::is_floating_point<_A1>::value, bool>::type
signbit(_A1 __x)
{
    return __libcpp_signbit(__x);
}

#endif // signbit

Can you preprocess this and find out why "using ::signbit" is causing a problem? I can't just eliminate this using without breaking code.

using ::fpclassify;
using ::isfinite;
using ::isinf;
@@ -637,8 +637,8 @@ using ::islessgreater;
using ::isunordered;
using ::isunordered;

-using ::float_t;
-using ::double_t;
+//using ::float_t;
+//using ::double_t;

Same here.

// abs

diff --git a/include/locale b/include/locale
index 9f9996a..55185d6 100644
--- a/include/locale
+++ b/include/locale
@@ -3696,7 +3696,7 @@ messages<_CharT>::do_open(const basic_string<char>& __nm, const locale&) const
#if _WIN32
    return -1;
#else // _WIN32
- catalog __cat = reinterpret_cast<catalog>(catopen(__nm.c_str(), NL_CAT_LOCALE));
+ catalog __cat = static_cast<catalog>(catopen(__nm.c_str(), NL_CAT_LOCALE));

This breaks OS X. Try:

catalog __cat = (catalog)catopen(__nm.c_str(), NL_CAT_LOCALE);

(it seems a shame we have to resort to this)

Howard

I'm not sure why but I'm having trouble applying this patch:

yes may mail client line wrapped it :frowning:
I attached it this time.

2) The replacement of reinterpret to static cast is because in musl they are both long, and clang doesn't like reinterpret_cast from long to long.
  I'm unsure if that breaks something else.

On OS X catopen() returns a pointer. Perhaps a C-style cast is the
way to go here.

I dunno, one line below those casts there is a static_cast anyway, so i'd assume it's ok?

There is still two unclear parts here

1) I commented out some usings. I don't quite know what they are supposed to do, but they break the build against musl.

They are supposed to import symbols from the global namespace into
namespace std (more below).

I'll analyze this later today as you suggested.
Although i still don't understand the point of importing from the global namespace, as this should be imported anyway, right?

Does __mask work for you instead? And while you're at it can you
prefix every local variable in here with "__"?

Yes. Of course.

0001-DONT-PUSH-temp-hacks-for-musl.patch (4.05 KB)

On OS X catopen() returns a pointer. Perhaps a C-style cast is the
way to go here.

i changed it. let's see if the attached patch works for you.
This works for me, if i additionally uncomment the usings.

I'll analyze this later today as you suggested.

No point in checking any further, musl is just missing some C99 feature :frowning:
So this has to be fixed in musl.

I still wonder why this compiles anyway. Probably it will throw undefined references later?
Oh well, we'll see after i fixed musl.

Does __mask work for you instead? And while you're at it can you
prefix every local variable in here with "__"?

done

0001-portability-fixes-for-generic-libc.patch (4.28 KB)

This patch looks good to me except for the casting in messages<_CharT> in <locale>:

../include/locale:3719:21: error: cannot cast from type 'catalog' (aka 'int') to pointer type 'nl_catd' (aka '__nl_cat_d *')
    nl_catd __cat = static_cast<nl_catd>(__c);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~
../include/locale:3735:21: error: cannot cast from type 'catalog' (aka 'int') to pointer type 'nl_catd' (aka '__nl_cat_d *')
    nl_catd __cat = static_cast<nl_catd>(__c);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.

For me, nl_catd is a pointer. The only way to make both of us happy is to replace the reinterpret_cast with a C (cast)expression.

Howard

For me, nl_catd is a pointer. The only way to make both of us happy
is to replace the reinterpret_cast with a C (cast)expression.

screwup from my side, i sent the wrong patch. sorry!

next attempt, see attachment

0001-portability-fixes-for-generic-libc.patch (4.28 KB)

cused. wrong again!
now this one it should be...

0001-portability-fixes-for-generic-libc.patch (4.26 KB)

About to commit, but one more thing: Could you send me a patch to CREDITS.TXT?

Thanks!
Howard

if you insist :slight_smile:
doesn't feel necessary though.

credits.patch (328 Bytes)

Thanks! Committed revision 141672.

Howard