Strange behavior when converting arrays to strings

Hello,

I saw some behavior that seems strange (to me) when converting constant arrays to strings. Consider the following example:

std::string Text = "HelloWorld";

unsigned TextLengthBefore = Text.length();

ConstantArray *pArray = dyn_cast<ConstantArray>(llvm::ConstantArray::get(pModule->getContext(), Text, true));

unsigned NumElements = pArray->getNumOperands();

Text = pArray->getAsString();

unsigned TextLengthAfter = Text.length();

After running this example here are the values in each variable:

TextLengthBefore = 10

NumElements = 11

TextLengthAfter = 11

In the conversion from constant array to string, the terminating null character is included as part of the string and becomes the 11th character. This becomes a problem when the data is streamed out to a buffer, because a null ends up in the middle of the output. Below is the code for getAsString:

1: std::string ConstantArray::getAsString() const {

2: assert(isString() && "Not a string!");

3: std::string Result;

4: Result.reserve(getNumOperands());

5: for (unsigned i = 0, e = getNumOperands(); i != e; ++i)

6: Result.push_back((char)cast<ConstantInt>(getOperand(i))->getZExtValue());

7: return Result;

8: }

I think that the loop terminating condition in line 5 should be changed from != to <. Does this look right?
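The symptom itself can be reproduced with only the standard library. The sketch below is an illustration, not LLVM code: withEmbeddedNul and streamBoth are hypothetical helpers that mimic a string rebuilt from an 11-element array and two strings streamed into one buffer.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical helper: appends an explicit '\0' to Text, mimicking a string
// rebuilt from an array that stores the characters plus a terminating null.
std::string withEmbeddedNul(const std::string &Text) {
  std::string Result = Text;
  Result.push_back('\0');
  return Result;
}

// Hypothetical helper: writes Prefix followed by Suffix into one buffer,
// the way two strings might be streamed out back to back.
std::string streamBoth(const std::string &Prefix, const std::string &Suffix) {
  std::ostringstream Out;
  Out << Prefix << Suffix;
  return Out.str();
}

// Usage: the null is counted by length() and survives streaming.
void demoEmbeddedNul() {
  std::string Text = withEmbeddedNul("HelloWorld");
  assert(Text.length() == 11);             // std::string counts the '\0'
  assert(std::strlen(Text.c_str()) == 10); // C string functions stop at it
  std::string Buffer = streamBoth(Text, "!");
  assert(Buffer.size() == 12);
  assert(Buffer[10] == '\0' && Buffer[11] == '!'); // null sits mid-buffer
}
```

Calling demoEmbeddedNul() passes all of its assertions: std::string treats the null as an ordinary character, so it is written out along with the rest of the data.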

Thanks,

Javier

Hi,

I haven’t seen a response and I’m curious if I should submit a patch for this.

Thanks,

Javier

Hi Javier,

I saw some behavior that seems strange (to me) when converting constant
arrays to strings. Consider the following example:

std::string Text = "HelloWorld";

unsigned TextLengthBefore = Text.length();

ConstantArray *pArray =
dyn_cast<ConstantArray>(llvm::ConstantArray::get(pModule->getContext(),
Text, true));

from Constants.h:

   /// This method constructs a ConstantArray and initializes it with a text
   /// string. The default behavior (AddNull==true) causes a null terminator to
   /// be placed at the end of the array. This effectively increases the length
   /// of the array by one (you've been warned). However, in some situations
   /// this is not desired so if AddNull==false then the string is copied without
   /// null termination.
   static Constant *get(LLVMContext &Context, StringRef Initializer,
                        bool AddNull = true);
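The effect of AddNull described in that comment can be modelled outside LLVM. This is a minimal sketch under stated assumptions: makeCharArray and charArrayToString are hypothetical stand-ins for ConstantArray::get and getAsString, backed by a std::vector<char> rather than real constants.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of ConstantArray::get(Context, Initializer, AddNull):
// the elements are the characters of Initializer, plus '\0' if AddNull.
std::vector<char> makeCharArray(const std::string &Initializer,
                                bool AddNull = true) {
  std::vector<char> Elements(Initializer.begin(), Initializer.end());
  if (AddNull)
    Elements.push_back('\0');
  return Elements;
}

// Hypothetical model of getAsString(): every element becomes a character.
std::string charArrayToString(const std::vector<char> &Elements) {
  return std::string(Elements.begin(), Elements.end());
}

// Usage: in this model only AddNull == false round-trips to the original.
void demoAddNull() {
  const std::string Text = "HelloWorld";
  assert(makeCharArray(Text).size() == 11);        // default adds the null
  assert(makeCharArray(Text, false).size() == 10); // no terminator stored
  assert(charArrayToString(makeCharArray(Text, false)) == Text);
  assert(charArrayToString(makeCharArray(Text, true)) != Text);
}
```

In this model, passing false for AddNull gives an array whose string form matches the input exactly. The trade-off is that the stored data is no longer null-terminated.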

Ciao,

Duncan.

Hi Duncan,

Thanks for the reply. I had seen that notice before, and I'm sorry I didn't mention it in my original email. In the default case a null is added at the end of the array, which is how character arrays are usually represented in memory. In this default case, however, converting back to a string yields a string that is not the same as the original. In a scenario where only the array is passed to a function, without any knowledge of how it was generated, one would have to resort to the string class's c_str() member and manipulate the result with the old-school C str functions. This, I believe, goes against the spirit of the ConstantArray class implementation.
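To make the caller-side workaround concrete, here is a small sketch using only the standard library. stripTrailingNul and truncateAtFirstNul are hypothetical helper names, and neither can tell whether a trailing null was added by ConstantArray::get or was part of the original data.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: drops one trailing '\0' if present and leaves
// every other character untouched.
std::string stripTrailingNul(std::string Text) {
  if (!Text.empty() && Text.back() == '\0')
    Text.pop_back();
  return Text;
}

// Hypothetical helper using c_str(): truncates at the FIRST '\0', so a
// null earlier in the data also cuts the string short.
std::string truncateAtFirstNul(const std::string &Text) {
  return std::string(Text.c_str());
}

// Usage: both recover "HelloWorld", but they differ on embedded nulls.
void demoWorkarounds() {
  std::string WithNul("HelloWorld", 11); // includes the literal's terminator
  assert(stripTrailingNul(WithNul) == "HelloWorld");
  assert(truncateAtFirstNul(WithNul) == "HelloWorld");
  std::string Embedded("ab\0cd", 6); // 'a', 'b', null, 'c', 'd', null
  assert(stripTrailingNul(Embedded).size() == 5);
  assert(truncateAtFirstNul(Embedded).size() == 2);
}
```

Both helpers guess from the data alone, which is the ambiguity described above.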

I was probably too hasty in proposing a solution. Perhaps a better one is to add a bool member variable to the ConstantArray class that records whether a null was added to the end of the array. getAsString() would check the new member and, if it is true, leave the null character out of the returned string. With this change, the conversion from string to ConstantArray and back would always result in the same string.
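A rough sketch of the idea, written against the standard library only. CharArray, NullAdded, and getNumElements are hypothetical names chosen for illustration; this is not the actual ConstantArray class and says nothing about how such a member would fit into LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposal, not LLVM's ConstantArray.
class CharArray {
  std::vector<char> Elements;
  bool NullAdded; // the proposed member: was a '\0' appended on creation?

public:
  CharArray(const std::string &Initializer, bool AddNull = true)
      : Elements(Initializer.begin(), Initializer.end()), NullAdded(AddNull) {
    if (AddNull)
      Elements.push_back('\0');
  }

  std::size_t getNumElements() const { return Elements.size(); }

  // Skips the terminator only when this object appended it itself.
  std::string getAsString() const {
    std::size_t Count = Elements.size() - (NullAdded ? 1 : 0);
    return std::string(Elements.begin(), Elements.begin() + Count);
  }
};

// Usage: the round trip gives back the original text either way.
void demoRoundTrip() {
  const std::string Text = "HelloWorld";
  CharArray WithNull(Text);
  CharArray WithoutNull(Text, false);
  assert(WithNull.getNumElements() == 11); // array still stores the null
  assert(WithNull.getAsString() == Text);
  assert(WithoutNull.getNumElements() == 10);
  assert(WithoutNull.getAsString() == Text);
}
```

In this sketch the stored array keeps its terminator, so the in-memory layout is unchanged. Only the string conversion consults the flag.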

Thanks,
Javier