RFC: -fwritable-strings Change

There is a problem with Objective-C code where a null string is placed
in the wrong section. If we have this code:

#include <Foundation/Foundation.h>
void foo() {
  NSLog(@"");
}

The null string is defined like this:

  .const
  .lcomm LC,1,0 ## LC

Causing our linker to go nuts, because it expects anonymous strings to
be in the __cstring section. I came up with the attached patch, which
places the string in the .cstring section.

My GCC-fu isn't great. Could someone review this to see if I broke anything?

Thanks!
-bw

cfstring.diff (2.74 KB)

There is a problem with Objective-C code where a null string is placed
in the wrong section. If we have this code:

#include <Foundation/Foundation.h>
void foo() {
NSLog(@"");
}

The null string is defined like this:

  .const
  .lcomm LC,1,0 ## LC

Causing our linker to go nuts, because it expects anonymous strings to
be in the __cstring section. I came up with the attached patch, which
places the string in the .cstring section.

My GCC-fu isn't great. Could someone review this to see if I broke anything?

I don't see anything obvious wrong, but this is an easy area to break.
I'd recommend running the gcc testsuite and checking for regressions.

Are you sure the problem is limited to CFStrings? That seems like the wrong
thing to be checking somehow.

cfstring.diff (2.74 KB)

There is a problem with Objective-C code where a null string is placed
in the wrong section. If we have this code:

#include <Foundation/Foundation.h>
void foo() {
NSLog(@"");
}

The null string is defined like this:

       .const
       .lcomm LC,1,0 ## LC

Causing our linker to go nuts, because it expects anonymous strings to
be in the __cstring section. I came up with the attached patch, which
places the string in the .cstring section.

My GCC-fu isn't great. Could someone review this to see if I broke
anything?

I don't see anything obvious wrong, but this is an easy area to break.
I'd recommend running the gcc testsuite and checking for regressions.

Okay, that's a good plan.

Are you sure the problem is limited to CFStrings? That seems like the wrong
thing to be checking somehow.

Nick Kledzik had this to say about the initial radar (<rdar://problem/6479858>):

"1/7/09 11:15 PM Nick Kledzik:
The linker issues this warning when the .o file contains a literal
NSString (e.g. @"hello") and the bytes (usually utf8) that the object
pointers to are not an anonymous string in the __cstring section."

And later:

"1/8/09 10:37 PM Nick Kledzik:
I traced this down to an llvm bug with zero length constant CF
strings. Here is a simple example:

[/tmp]> cat foo.m
#include <Foundation/Foundation.h>
void foo() {
  NSLog(@"");
}

[/tmp]> /Developer/usr/bin/llvm-gcc-4.2 foo.m -arch x86_64 -Os -S
[/tmp]> cat foo.s
...
L_unnamed_cfstring_0: ## L_unnamed_cfstring_0
        .quad ___CFConstantStringClassReference
        .long 1992 ## 0x7C8
        .space 4
        .quad LC
        .space 8
        .const

        .lcomm LC,1,0 ## LC
...
[/tmp]>

The bug is that LC1 should be a read-only zero byte in the cstring
section. The compiler is making this a r/w zero byte in the bss
section."

I initially took this to mean that it's an Objective-C only problem.
Writing a simple test program, it looks like GCC puts the null string
in the ".cstring " section even for non-CFStrings. So the problem may
be more extensive than I first thought.

-bw

Hello, Bill

I don’t see anything obvious wrong, but this is an easy area to break.

I’d recommend running the gcc testsuite and checking for regressions.

Okay, that’s a good plan.

I’m strongly agains any target-specific and language-specific hacks in the generic tree-conversion code. What if we decide to support objc on non-darwin platforms someday?

This is theoretically possible (well, modulo all bunch of apple-local stuff arond ;)).

I believe, that TAI should place constant zeros in mergeable const section, not in BSS. Could you please send me the .bc file?

Hi Bill,

My GCC-fu isn't great. Could someone review this to see if I broke anything?

is there no better way?! Is this an objc problem, a darwin problem or...?
If it is an objc problem, can't you just have objc set the section on the
strings (DECL_SECTION_NAME) when it creates them? Anything but this patch,
please! :slight_smile:

Ciao,

Duncan.

I should have put this in darwin.h and called it from there. I think
that there is a far more fundamental problem that affects more than
Objective-C. Even with C code, we place a null string in a writable
section, which isn't correct. I've attached the .bc file. The culprit
is the @"\01LC" global. (This is generated with my hack, so it has an
explicit "section".) Any suggestions for a back-end fix would be
welcome, as I'm not a huge fan of hacking the front-end to get the
sections for symbols correct. :slight_smile:

-bw

f.bc (1.05 KB)

Hi Duncan,

There is resistance to this patch, so it's not going in. :slight_smile: As I
mentioned to Anton, I think that there is a more fundamental problem
here. It isn't just Objective-C. I'm going to investigate more. It
might be possible to handle this in the back-end (which I would prefer
to do).

-bw

-bw

You could say that, but it's not really wrong... say you had a 10
kilobyte struct that was all null. If you put it into a data section,
it takes up 10k in the executable (unless Darwin has some unusual data
section); it takes up no space at all in the BSS. So naturally we'd
prefer to put it in the BSS... for a 1 byte string that doesn't really
matter, though.

The thing that seems worrisome that you're depending on choices that
could be made arbitrarily; no matter where the string is, it normally
doesn't change the semantics of the program. In certain situations,
it could be a profitable transformation to, for example, put a string
on the stack. If the runtime environment is depending on it being in
a specific place, you have to mark it so LLVM knows it can't do
arbitrary manipulations on it.

-Eli

Even with C code, we place a null string in a writable
section, which isn't correct.

You could say that, but it's not really wrong... say you had a 10
kilobyte struct that was all null. If you put it into a data section,
it takes up 10k in the executable (unless Darwin has some unusual data
section); it takes up no space at all in the BSS. So naturally we'd
prefer to put it in the BSS... for a 1 byte string that doesn't really
matter, though.

Sure. I wasn't suggesting that all zero initialized variables be moved
out of the BSS section. :slight_smile: But if we place a null string in a
writable section and we don't have "writable strings" enabled, then,
well, it's in a writable section when it shouldn't be. It just happens
that (for Objective-C) our linker complains about this (it doesn't
complain about it for non-Objective-C languages). A non-Darwin
implementation of Objective-C would probably want to complain about
this (I'm guessing, as I don't know Obj-C well enough to know its
semantics).

The thing that seems worrisome that you're depending on choices that
could be made arbitrarily; no matter where the string is, it normally
doesn't change the semantics of the program. In certain situations,
it could be a profitable transformation to, for example, put a string
on the stack. If the runtime environment is depending on it being in
a specific place, you have to mark it so LLVM knows it can't do
arbitrary manipulations on it.

I think that one of the problems with Objective-C (and why this became
a problem in the first place) is that it does require the string to be
in a specific place. The __builtin_CFString structure has, among other
things, a field that points to the actual string. When this string
isn't in the correct section, the linker complains.

-bw

Hello, Bill

I should have put this in darwin.h and called it from there. I think
that there is a far more fundamental problem that affects more than
Objective-C. Even with C code, we place a null string in a writable
section, which isn't correct. I've attached the .bc file. The culprit
is the @"\01LC" global. (This is generated with my hack, so it has an
explicit "section".) Any suggestions for a back-end fix would be
welcome

Fix commited :slight_smile: Actually it was my fault. We have ConstantArray stuff for everything string-like but non-zero and ConstantAggregateZero for string-like but zero (rather odd!). I was not aware about such during writing of section selection code, thus missed the second opportunity. Now everything should be ok as of r63139. Let me know, if it still fails for you somehow.

I'm currently in paper-deadline-really-soon-but-no-paper-yet mode and thus have really no free time, please consider adding testcase by yourself :slight_smile:

Anton,

You rock!! Thanks for the fix. :slight_smile:

Good luck on the paper!

-bw

But if we place a null string in a
writable section and we don't have "writable strings" enabled, then,
well, it's in a writable section when it shouldn't be.

If writable strings are disabled, writing to a string is undefined;
it's not a bug if it doesn't crash.

I think that one of the problems with Objective-C (and why this became
a problem in the first place) is that it does require the string to be
in a specific place. The __builtin_CFString structure has, among other
things, a field that points to the actual string. When this string
isn't in the correct section, the linker complains.

Mmm... if the __builtin_CFString structure has special properties to
the linker, its strings really ought to be marked. There are all
sorts of transformations which could make it impossible to distinguish
otherwise; for example, a [4 x i8] constant could be turned into an
i32 constant, or the empty string could be merged into an integer
constant with the value 256.

-Eli

But if we place a null string in a
writable section and we don't have "writable strings" enabled, then,
well, it's in a writable section when it shouldn't be.

If writable strings are disabled, writing to a string is undefined;
it's not a bug if it doesn't crash.

That's not what I meant. Symbols and data should go in their correct
sections no matter what the program eventual does.

I think that one of the problems with Objective-C (and why this became
a problem in the first place) is that it does require the string to be
in a specific place. The __builtin_CFString structure has, among other
things, a field that points to the actual string. When this string
isn't in the correct section, the linker complains.

Mmm... if the __builtin_CFString structure has special properties to
the linker, its strings really ought to be marked.

Yup! That's what my hack was attempting to do.

There are all
sorts of transformations which could make it impossible to distinguish
otherwise; for example, a [4 x i8] constant could be turned into an
i32 constant, or the empty string could be merged into an integer
constant with the value 256.

Anton applied a much better patch in the back-end.

-bw