Constant CF/NSString and Unicode

I just wonder if there is some kind of unicode support in __builtin___CFStringMakeConstantString.

In the current GCC version, when you compile an ObjC file, constant strings that contain non-ASCII chars are converted into UTF-16 strings and a flag is set in the generated CFString.
The fact that it works only for ObjC files looks more like a design decision than a technical limit, and this feature could easily be extended to C files. In fact, I managed to implement this feature in cc1 and it looks like it works (if I'm wrong, feel free to correct me).

And what about clang and Unicode CFStrings?

You're trying to use CFString in a C file? How can that possibly work?

-Eli

> I just wonder if there is some kind of unicode support in
> __builtin___CFStringMakeConstantString.

In GCC or clang? Clang doesn't have any unicode support yet.

> In the current GCC version, when you compile an ObjC file, constant
> strings that contain non-ASCII chars are converted into UTF-16
> strings and a flag is set in the generated CFString.

Ok. Fariborz implemented that fwiw.

> The fact that it works only for ObjC files looks more like a design
> decision than a technical limit, and this feature could easily be
> extended to C files. In fact, I managed to implement this feature in
> cc1 and it looks like it works (if I'm wrong, feel free to correct me).

> And what about clang and Unicode CFStrings?

I'm not sure what you mean, can you explain a bit more?

-Chris

It works just like in clang/test/CodeGen/cfstring.c

Isn't CoreFoundation a C API? What's the problem with CFString in a C file?

Yep, put this simple code snippet in cfstring.c:

#include <CoreFoundation/CoreFoundation.h>

int main(int argc, char **argv) {
   CFShowStr(CFSTR("hé hé hé"));
   CFShow(CFSTR("hé hé hé"));
   return 0;
}

If you compile this file using "gcc -o cfstring cfstring.c -framework CoreFoundation" and run it, you get:

Length 11
IsEightBit 1
HasLengthByte 0
HasNullByte 1
InlineContents 0
Allocator SystemDefault
Mutable 0
Contents 0x1ff2
h\u221a\u00a9 h\u221a\u00a9 h\u221a\u00a9

Now, if you compile this same file using

gcc -x objective-c -o cfstring cfstring.c -framework CoreFoundation

the output is:

Length 8
IsEightBit 0
HasLengthByte 0
HasNullByte 0
InlineContents 0
Allocator SystemDefault
Mutable 0
Contents 0x1fee
h\u00e9 h\u00e9 h\u00e9
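
The two Length values above are consistent with the same text being counted in different units: 11 bytes of UTF-8 in the 8-bit case versus 8 UTF-16 code units in the Unicode case. The "\u221a\u00a9" pairs in the first output look like the two UTF-8 bytes of "é" (0xC3 0xA9) each read as a separate MacRoman character. Here is a minimal sketch of the two counts; the names kSample, byte_length and utf16_length are made up for illustration and are not part of CoreFoundation or GCC, and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* "hé hé hé" spelled as explicit UTF-8 bytes, so the example does not
   depend on the encoding of the source file. */
static const char kSample[] = "h\xc3\xa9 h\xc3\xa9 h\xc3\xa9";

/* Length when the string is stored as an 8-bit string: one unit per byte. */
static size_t byte_length(const char *s)
{
    return strlen(s);
}

/* Length when the string is stored as UTF-16: one code unit per code
   point, two (a surrogate pair) for code points above U+FFFF.
   Assumes the input is valid UTF-8. */
static size_t utf16_length(const char *s)
{
    size_t units = 0;
    for (; *s != '\0'; ++s) {
        unsigned char b = (unsigned char)*s;
        if ((b & 0xC0) == 0x80)
            continue;                 /* continuation byte, already counted */
        units += (b >= 0xF0) ? 2 : 1; /* 4-byte sequences need a surrogate pair */
    }
    return units;
}
```

Calling byte_length(kSample) and utf16_length(kSample) gives the two counts to compare against the Length fields printed by CFShowStr.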

Maybe I'm missing something, but I really do not understand the current limitation.
As clang will probably implement this feature some day, I just wonder whether it should duplicate the GCC behavior (emitting a warning and generating an ASCII-based CFString) or whether it could be extended to also support UTF-16 CFString generation in plain C files.

Now I'm curious. Does this behavior change using -fconstant-cfstrings
instead of defining the language as ObjC? According to the documentation,
it looks like that is the flag to enable __builtin___CFStringMakeConstantString.

-Matthew

This flag is on by default in modern versions of Xcode (I think it depends on the macosx-min-version flag).
Turning it off (-fno-constant-cfstrings) removes the compilation warning and defers it to runtime ;-)

This is what the app logs when CFSTR is called with "false constant cfstrings" that contain something that's not ASCII.

WARNING: CFSTR("h\37777777703\37777777651 h\37777777703\37777777651 h\37777777703\37777777651") has non-7 bit chars, interpreting using MacOS Roman encoding for now, but this will change. Please eliminate usages of non-7 bit chars (including escaped characters above \177 octal) in CFSTR().
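
The odd escapes in that warning (\37777777703 and \37777777651) match what you get when the UTF-8 bytes 0xC3 and 0xA9 are treated as signed chars, sign-extended to a 32-bit int, and printed in octal. This is only a guess at how the message was produced; the log itself does not confirm it. A small sketch of that effect follows, with a made-up helper name (format_octal_escape), assuming a 32-bit int and the usual two's-complement conversion to signed char.

```c
#include <stdio.h>

/* Formats one byte as an octal escape after passing it through a signed
   char, which is what a "%o" applied to a plain (signed) char would do.
   Returns the number of characters written, as snprintf does. */
static int format_octal_escape(char *out, size_t cap, unsigned char byte)
{
    signed char as_signed = (signed char)byte;            /* 0xC3 becomes -61 */
    unsigned int widened = (unsigned int)(int)as_signed;  /* sign-extended to int width */
    return snprintf(out, cap, "\\%o", widened);
}
```

Casting the byte to unsigned char before printing would give \303 and \251 instead, which are the plain octal values of the two bytes.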

I'm not surprised by this result.
The GCC __builtin___CFStringMakeConstantString codegen function tries to determine whether the argument string contains non-ASCII chars. If it finds one, it tries to convert the string into a Unicode string and to save it as a constant string in the module.
But the function that converts the string and writes it is implemented only in the Obj-C modules (cc1obj and cc1objplus) and not in the C one (cc1). So in C the conversion always returns null and GCC falls back to ASCII string generation.
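
To make the fallback described above concrete, here is a rough sketch of the decision logic. It is not GCC's code: every name in it (utf16_converter, has_non_ascii, utf8_to_utf16, choose_kind, the cfstring_kind enum) is hypothetical. The converter is passed as a function pointer so that a NULL pointer can stand in for the C front end, where no conversion is available. The converter only handles well-formed UTF-8 and does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum cfstring_kind { CFSTRING_8BIT, CFSTRING_UTF16 };

/* Hypothetical converter hook: returns a malloc'ed UTF-16 buffer, or NULL. */
typedef uint16_t *(*utf16_converter)(const char *utf8, size_t len, size_t *out_units);

static int has_non_ascii(const char *s, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        if ((unsigned char)s[i] >= 0x80)
            return 1;
    return 0;
}

/* Simple UTF-8 to UTF-16 conversion; returns NULL on malformed input. */
static uint16_t *utf8_to_utf16(const char *utf8, size_t len, size_t *out_units)
{
    /* UTF-16 never needs more code units than the UTF-8 form has bytes. */
    uint16_t *buf = malloc((len + 1) * sizeof *buf);
    size_t n = 0, i = 0;
    if (buf == NULL)
        return NULL;
    while (i < len) {
        unsigned char b = (unsigned char)utf8[i];
        uint32_t cp;
        size_t extra;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { free(buf); return NULL; }
        if (extra > len - i - 1) { free(buf); return NULL; }  /* truncated sequence */
        for (size_t k = 1; k <= extra; ++k) {
            unsigned char c = (unsigned char)utf8[i + k];
            if ((c & 0xC0) != 0x80) { free(buf); return NULL; }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp >= 0x10000) {          /* encode as a surrogate pair */
            cp -= 0x10000;
            buf[n++] = (uint16_t)(0xD800 | (cp >> 10));
            buf[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            buf[n++] = (uint16_t)cp;
        }
    }
    *out_units = n;
    return buf;
}

/* Pure ASCII stays 8-bit; otherwise try the converter and fall back to
   8-bit when it is missing (NULL) or fails. Caller frees *out. */
static enum cfstring_kind choose_kind(const char *s, size_t len,
                                      utf16_converter convert,
                                      uint16_t **out, size_t *out_units)
{
    *out = NULL;
    *out_units = 0;
    if (!has_non_ascii(s, len))
        return CFSTRING_8BIT;
    if (convert != NULL)
        *out = convert(s, len, out_units);
    return (*out != NULL) ? CFSTRING_UTF16 : CFSTRING_8BIT;
}
```

Passing utf8_to_utf16 as the converter models the Obj-C case, and passing NULL models plain C, where the same input ends up as an 8-bit string.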