How to represent a zero-sized string?

Hi all,

void t(const char *s);   /* defined elsewhere; only the call site matters */

int main(void) {
   t("");
   return 0;
}

On Mac OS X, llvm-gcc compiles the zero-sized string to:
.lcomm LC,1,0

gcc:
         .cstring
LC0:
         .ascii "\0"

The difference seems innocent enough. However, in Objective-C, if the zero-sized string is part of a CFString, it causes a problem: the linker expects it in the read-only __cstring section, but LLVM puts it in the read/write bss section.

The problem is that LLVM represents this as

@"\01LC" = internal constant [1 x i8] zeroinitializer

CodeGen can tell that it should go into a read-only section, but it cannot know that it's a cstring. Any ideas how I can fix this? Even if I write the zero-sized string as c"\00", the bitcode reader still turns it back into zeroinitializer.
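
To make the collapse concrete, here is a minimal sketch in plain C (not LLVM code; `all_zero` and `check_empty_literal` are hypothetical names introduced only for this illustration). It shows that the literal "" occupies exactly one byte and that this byte is NUL, so its contents cannot be told apart from a zero-filled one-element array:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte in [p, p+n) is zero. */
int all_zero(const char *p, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (p[i] != 0) return 0;
    }
    return 1;
}

static const char empty_literal[] = "";   /* what the C source contains */
static const char zero_array[1] = {0};    /* what zeroinitializer describes */

/* Checks that the two objects are byte-for-byte identical. */
void check_empty_literal(void) {
    assert(sizeof(empty_literal) == 1);
    assert(sizeof(empty_literal) == sizeof(zero_array));
    assert(memcmp(empty_literal, zero_array, sizeof(zero_array)) == 0);
    assert(all_zero(empty_literal, sizeof(empty_literal)));
}
```

Since the bytes are the same either way, the only thing that differs between the two assembly outputs is the section the byte ends up in.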

Thanks,

Evan

> The difference seems innocent enough. However, in Objective-C, if the
> zero-sized string is part of a CFString, it causes a problem: the linker
> expects it in the read-only __cstring section, but LLVM puts it in the
> read/write bss section.

That seems extremely weird... what sort of magic is objc using that
could possibly care where a string is stored? Can you give a more
complete testcase? It sounds like LLVM isn't modelling something
which it really should be...

> The problem is that LLVM represents this as
>
> @"\01LC" = internal constant [1 x i8] zeroinitializer
>
> CodeGen can tell that it should go into a read-only section, but it
> cannot know that it's a cstring. Any ideas how I can fix this? Even if I
> write the zero-sized string as c"\00", the bitcode reader still turns it
> back into zeroinitializer.

LangRef claims that you can specify a section for globals, although I
can't actually manage to get it to work...
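
For reference, the source-level counterpart of a per-global section is the GCC/Clang `section` variable attribute. The sketch below is only an illustration under assumptions: the section strings are guesses at suitable names (the ELF one is made up for the demo), `get_empty_cstr` and `check_empty_cstr` are hypothetical helpers, and the Darwin branch has not been verified against the linker behaviour described above.

```c
#include <assert.h>
#include <string.h>

/* Section names are assumptions chosen for illustration only. */
#if defined(__APPLE__)
#define EMPTY_CSTR_SECTION "__TEXT,__cstring,cstring_literals"
#else
#define EMPTY_CSTR_SECTION ".rodata.empty_cstr_demo"  /* hypothetical ELF name */
#endif

/* Ask the compiler to place this 1-byte constant in the named section
   instead of letting it choose one from the initializer alone. */
static const char empty_cstr[1]
    __attribute__((section(EMPTY_CSTR_SECTION), used)) = "";

/* Accessor so other code can reach the pinned constant. */
const char *get_empty_cstr(void) {
    return empty_cstr;
}

/* The pinned object still behaves as an ordinary empty C string. */
void check_empty_cstr(void) {
    assert(strlen(get_empty_cstr()) == 0);
}
```

Whether an explicit section is enough to satisfy the linker in the CFString case is exactly the open question here; the sketch only shows how a section can be requested.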

-Eli

>> The difference seems innocent enough. However, in Objective-C, if the
>> zero-sized string is part of a CFString, it causes a problem: the linker
>> expects it in the read-only __cstring section, but LLVM puts it in the
>> read/write bss section.
>
> That seems extremely weird... what sort of magic is objc using that
> could possibly care where a string is stored? Can you give a more
> complete testcase? It sounds like LLVM isn't modelling something
> which it really should be...

The runtime probably has various requirements about this.

>> The problem is that LLVM represents this as
>>
>> @"\01LC" = internal constant [1 x i8] zeroinitializer
>>
>> CodeGen can tell that it should go into a read-only section, but it
>> cannot know that it's a cstring. Any ideas how I can fix this? Even if I
>> write the zero-sized string as c"\00", the bitcode reader still turns it
>> back into zeroinitializer.
>
> LangRef claims that you can specify a section for globals, although I
> can't actually manage to get it to work...

We normally specify the sections in the config/darwin.c file for various special kinds of OBJC names (see darwin_objc_llvm_special_name_section()). Perhaps we could do the same for this? Obviously, it's not a special name, but if we could tag the global with the correct section then perhaps all will be well.
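
As a rough sketch of what "tag the global with the correct section" could look like, here is a content-based classifier in plain C. This is NOT the logic of darwin_objc_llvm_special_name_section(); `looks_like_cstring` and `pick_readonly_section` are hypothetical names, and the rule itself (a constant byte array is a C string if its only NUL is the last byte) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a constant byte array counts as a C string if it is
   non-empty and its only NUL byte is the final one. */
int looks_like_cstring(const unsigned char *bytes, size_t len) {
    if (len == 0 || bytes[len - 1] != 0) return 0;
    /* The first NUL found must be the terminator itself. */
    return memchr(bytes, 0, len) == bytes + (len - 1);
}

/* Hypothetical section picker for read-only data on Mach-O. */
const char *pick_readonly_section(const unsigned char *bytes, size_t len) {
    return looks_like_cstring(bytes, len) ? "__TEXT,__cstring"
                                          : "__TEXT,__const";
}

/* The case from this thread: one zero byte is a valid empty C string. */
void check_empty_string_case(void) {
    const unsigned char empty[1] = {0};
    assert(strcmp(pick_readonly_section(empty, 1), "__TEXT,__cstring") == 0);
}
```

Under this rule a [1 x i8] zeroinitializer would be tagged as a cstring, while larger all-zero arrays would stay in ordinary constant data. Whether that is safe for every one-byte constant is a separate question.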

-bw