Selector::getAsString problem in CIndex

I’m looking at a test failure of c-index-api-test.m on Windows, and I’ve run into a problem I don’t know how to easily fix.

In the function Selector::getAsString function in IdentifierTable.cpp at line 330:

return II->getName().str() + “:”;

On Windows, at least, this isn’t working as expected. I think it’s because a tempory string created gets destructed before the return completes, causing the resulting string pointer to point to garbage, resulting in garbage for the identifier name in the index output.

If someone more knowlegeable about it could take a look at it, I’d appreciate it.

Note that if you need to run it on windows, you’ll need some CIndex changes I’m working on, but haven’t completed yet, so let me know if you need them.

-John

Now that I have electricity again…

Sorry, I was looking too deep, not noticing that the function returned a std::string. The problem was up higher in CIndex.cpp, in a couple of the functions that return const char *, with the same problem I described of the temporary going away before the string can be used.

I worked around it by using a static buffer, which is admittedly not so good. A cleaner way might be to have the user pass in a string buffer, since it seems this needs to be a C interface. Should I do that, or does the API need to remain unchanged? Let me know if there’s a better way.

The enclosed patch contains my hacked version, which also includes the DLL exporting glue (redone in the naming style I’m used to - apologies to Jeff Krall, who’d already done it).

Note that I moved the getLocationFromCursor function out of the extern “C” {} because MSVC was complaining about it because of the object return type.

Where does “basename” come from on the non-Windows platforms? It was unresolved, so I guessed at it, but which seems to work.

The c-index-api-test.m test passes on Windows with these changes.

-John

cindex.patch (14.4 KB)

Hi John,

Comments below…

Now that I have electricity again…

Sorry, I was looking too deep, not noticing that the function returned a std::string. The problem was up higher in CIndex.cpp, in a couple of the functions that return const char *, with the same problem I described of the temporary going away before the string can be used.

Nice catch. I guess other platforms are more forgiving…

I worked around it by using a static buffer, which is admittedly not so good. A cleaner way might be to have the user pass in a string buffer, since it seems this needs to be a C interface. Should I do that, or does the API need to remain unchanged? Let me know if there’s a better way.

I’d like to avoid cluttering the interface if possible. I’m not an expert on std::String, so I’ll have to look into this. I’ll get back to you…

The enclosed patch contains my hacked version, which also includes the DLL exporting glue (redone in the naming style I’m used to - apologies to Jeff Krall, who’d already done it).

Can you separate this from the other stuff in the patch? I’d like to keep this type of mechanical change separate from the trickier stuff…

Note that I moved the getLocationFromCursor function out of the extern “C” {} because MSVC was complaining about it because of the object return type.

Where does “basename” come from on the non-Windows platforms? It was unresolved, so I guessed at it, but which seems to work.

basename is a Unix/Linux function…not standard C or POSIX. Does Windows have something called _splitpath()?

The c-index-api-test.m test passes on Windows with these changes.

Great. Thanks for helping out on this!

snaroff

Hi John,

The patch didn’t apply for me (it’s based on yesterday’s source). Could you update it?

On the string lifetime issue…after consulting with Daniel, we’d prefer to avoid cluttering up the API to deal with lazily computed names (that aren’t part of the AST per se).

One spin on your change…

+// For temporary string returns.
+static char CIBuffer[256];

Just after sending this, I realized there may be another AST-based solution:

The ObjC AST’s could implement the following:

const char *getSelectorAsCString(); // for multi-keyword selectors, the result would be computed/cached.

This would make the semantics uniform with getNameAsCString(). Since it looks like this API may be deprecated, maybe not a great idea.

Since this is only an issue when integrating a C-based API with C++, it may be overkill to solve this issue at the AST level…

Nevertheless, I wanted to mention it for completeness…

snaroff

If the names, such as selectors, are globally unique, it makes sense to go with a). Unless, per-index naming is needed for functionality reasons.

- Fariborz

Hi John,

The patch didn't apply for me (it's based on yesterday's source). Could you update it?

On the string lifetime issue...after consulting with Daniel, we'd prefer to avoid cluttering up the API to deal with lazily computed names (that aren't part of the AST per se).

One spin on your change...

+// For temporary string returns.
+static char CIBuffer[256];
+

...is to have a static llvm::StringSet that contains copies of the strings. For example:

static llvm::StringSet<> LazilyComputedShortLivedNames;

While this is more general than your change (no limit, can hold an arbitrary # of strings), it is static data (which is less than great).

It would be nice if this data were cached per-Index. Unfortunately, this would require either (a) more bookkeeping or (b) more explicit 'CXIndex' arguments be passed around.

So...it seem like the choices are:

(a) static StringSet<> to keep the API simple (i.e. no requirement on the client to size/deal with strings).
(b) per-index StringSet<> to keep the string API simple. Modify the API to pass CXIndex in more places.
(c) per-index StringSet<> to keep the string API simple. Add bookkeeping to navigate from a CXDecl->CXTranslationUnit->CXIndex.
(d) Avoid passing back C strings entirely (what I previously said we'd like to avoid). Complicates the API, but makes the contract very explicit.
(e) Go with your change (documenting the limitations).

At the moment, this is primarily a problem with ObjC names (i.e. Selectors), however this will be an issues for C++ as well I believe (for names that contain type info).

If the names, such as selectors, are globally unique, it makes sense to go with a). Unless, per-index naming is needed for functionality reasons.

Makes sense, however it's not just functionality.

I'm also concerned about reclaiming the string pool. If it's static, I can't reclaim it. If it's per-index, clang_disposeIndex() could reclaim it.

snaroff

Steve,

Here’s another patch for the DLL imports part of it, redone on the current versions. I made a change to it, however, inputting the CMAKE_LIB definition for triggering the export verses import via a flag from the build instead of the CIndex.cpp file. This might avoid confusion in the future if additional files are added later to the DLL.

For the rest, I’ll wait to hear from you.

-John

cindex_imports_1.patch (15.2 KB)