Addressing const reference in ArrayRef

Analyzing why GCC failed to build LLVM recently, one root cause lies
in definition of ArrayRef:
// ArrayRef.h:
ArrayRef(const T &OneElt) : Data(&OneElt), Length(1) {}

Here address of const reference is taken and stored to an object. It
is believed that live range of const reference is only at the function
call site, escaping of its address to an object with a longer live
range is invalid. Referring to the case and discussion here:
https://gcc.gnu.org/ml/gcc/2014-08/msg00173.html

So I would suggest to fix ArrayRef. Adding a non-const version of
constructor should work, but it still leaves the vulnerability in
const version, which I'd expect on people in the community to work out
a solution.
ArrayRef(T &OneElt) : Data(&OneElt), Length(1) {}

Thanks,
Joey

I seem to recall discussing this before - is there an existing llvm bug filed, another email thread or something (or perhaps it was just an IRC conversation)? It would be good to keep all the discussion together or at least reference the prior (llvm community) discussion.

Have you tried applying your suggested patch? I assume you meant to suggest replacing/modifying the existing ctor (to take non-const ref) rather than adding another?

I’d assume that causes a lot of build failures as we probably rely on binding temporaries to ArrayRef’s in many places correctly (most ArrayRef’s are temporaries, not local variables).

I think in the previous discussion I suggested we might just want to make a practice of treating named/local (not parameter) ArrayRef’s with greater suspicion, the same way we do for twine, but I’m not sure.

We could move this ctor into a factory function (and/or just make the CTor private and friend the makeArrayRef helper for this case) to make it more obvious/easier to find bad call sites. But it would be helpful to have the context of the prior discussion to continue from there.

Just applied and it didn't work. I'm fresh to LLVM so not sure if it
is built correct. Anyway, if the issue is known I'll leave people from
this community to solve it.

Thanks,
Joey

Just applied and it didn't work. I'm fresh to LLVM so not sure if it
is built correct.

Did it build successfully without changes?

Anyway, if the issue is known I'll leave people from
this community to solve it.

Sorry I've seemed a bit blunt, I didn't mean to discourage you from
engaging in the community. It's a useful thing to consider, it's just
difficult rehashing previous conversations without the context there.

I assume the original bug(s) have been fixed? The question is just how
to avoid introducing more instances of this bug?

- David

David Blaikie <dblaikie@gmail.com> writes:

I seem to recall discussing this before - is there an existing llvm
bug filed, another email thread or something (or perhaps it was just
an IRC conversation)? It would be good to keep all the discussion
together or at least reference the prior (llvm community) discussion.

I'm not sure if it's been discussed before, but it's led to issues as
recently as a couple of weeks ago:

    http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140804/229557.html

I certainly think it's worthwhile to make this API safer, or at least to
document the caveats in how it can safely be used.

I think David is referring this thread:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074909.html

I also tried building with the suggested patch (replacing/modifying the existing ctor (to take non-const ref) ) and this indeed results in plenty of build errors. I tried fixing them for a few hours but haven’t seen light at the end of the tunnel.

Best regards,

Andy

Yeah - I suspect there are just too many cases where we use this ctor
correctly: where the ArrayRef is only a temporary to a function call.

Perhaps this is a problem for which the best solution is a clang-tidy
checker - though I'm not sure if we have good integration there yet.

(& yes, Andreas, that looks like the previous thread - thanks!)

Is there some way we can get lifetime extension of temporaries to kick in here?

Is there some way we can get lifetime extension of temporaries to kick in
here?

Nope - since the temporary is a subexpression - not the thing being declared.

A variation of these solutions makes sense to me. I strongly dislike
this constructor since it leads to other subtle bugs. E.g., suppose you
have this API:

    void foo(ArrayRef<const char *> Strings);

and some caller uses `foo("string")`.

If you change the API to add an optional `bool`:

    void foo(ArrayRef<const char *> Strings, bool B = false);
    void foo(bool B = false);

the caller silently switches to `foo(bool("string"))` instead of
`foo(ArrayRef<const char *>("string"))`.

If we remove the `ArrayRef` constructor AND the helper, and add a new
helper like:

    template <class T> ArrayRef<T> makeTempArrayRef(const T &OneElt) {
      return ArrayRef<T>(&OneElt, &OneElt + 1);
    }

then both types of bugs will be more rare and obvious.

For the OP's case, the name "Temp" makes it more clear that the ArrayRef
shouldn't be named. For my case, since the caller is forced to use
`foo(makeTempArrayRef("string"))`, it's easier to make safe API changes.

This would cause a lot of churn though -- ArrayRefs are *usually*
temporaries. Although this makes sense to me, disallowing named
ArrayRefs might be more practical.

Is there some way we can get lifetime extension of temporaries to kick in
here?

Nope - since the temporary is a subexpression - not the thing being declared.

We could move this ctor into a factory function (and/or just make the
CTor
private and friend the makeArrayRef helper for this case) to make it
more
obvious/easier to find bad call sites. But it would be helpful to
have
the
context of the prior discussion to continue from there.

A variation of these solutions makes sense to me. I strongly dislike
this constructor since it leads to other subtle bugs. E.g., suppose you
have this API:

    void foo(ArrayRef<const char *> Strings);

and some caller uses `foo("string")`.

If you change the API to add an optional `bool`:

    void foo(ArrayRef<const char *> Strings, bool B = false);
    void foo(bool B = false);

the caller silently switches to `foo(bool("string"))` instead of
`foo(ArrayRef<const char *>("string"))`.

My perspective on this /\ is that this kind of API refactoring problem
is just one instance of a broader problem (replace ArrayRef with int,
T*, etc in the above and you're back to square one). We could catch
some instances (such as the one you've shown) with a broader
-Wliteral-conversion warning (any literal other than "true" or "false"
should warn if converted to bool - modulo the usual false positive
suppressions... (RHS of an && for assertions, etc)). Of course that
would be thwarted by moving the string into a local variable. Nothing
perfect here.

If we remove the `ArrayRef` constructor AND the helper, and add a new
helper like:

    template <class T> ArrayRef<T> makeTempArrayRef(const T &OneElt) {
      return ArrayRef<T>(&OneElt, &OneElt + 1);
    }

then both types of bugs will be more rare and obvious.

For the OP's case, the name "Temp" makes it more clear that the ArrayRef
shouldn't be named. For my case, since the caller is forced to use
`foo(makeTempArrayRef("string"))`, it's easier to make safe API changes.

Yep - the convenience of one-element->ArrayRef is "cute" at best, I
think. Having to wrap it doesn't seem detrimental. Would have to look
at some numbers, though (if we could easily gather the number of
instances of this - I'm not sure of the easiest way to do that, given
the build system likes to stop after it sees an error).

I'm not sure if the "Temp" in the name would be a sufficient
deterrent, but maybe.

This would cause a lot of churn though -- ArrayRefs are *usually*
temporaries. Although this makes sense to me, disallowing named
ArrayRefs might be more practical.

Yeah, I'm not sure how practical that is either - we have (guessing)
quite a few named ArrayRefs... but it might be tractible to remove
them. Not sure how I feel about that.

Wow, I think I finally understand how the temporary is being created.

Can we get by without the single element constructor? Callers would have to
write:
const char *strs = { "string" };
foo(strs);

Instead of foo("string"), which is subtle anyway.

Yep - the convenience of one-element->ArrayRef is "cute" at best, I
think. Having to wrap it doesn't seem detrimental. Would have to look
at some numbers, though (if we could easily gather the number of
instances of this - I'm not sure of the easiest way to do that, given
the build system likes to stop after it sees an error).

I'm not sure if the "Temp" in the name would be a sufficient
deterrent, but maybe.

Wow, I think I finally understand how the temporary is being created.

Subtle, innit?

Can we get by without the single element constructor? Callers would have to
write:
const char *strs = { "string" };
foo(strs);

We can get by - I think that'd be a tad verbose though. But we don't
quite have a feel for just how many of these
single-element-conversions there are.

Instead of foo("string"), which is subtle anyway.

Yeah - I think it's "cute" at best, and wouldn't mind something more
explicit - I'm just not sure defining a whole local array is the right
tradeoff either.

diff --git a/include/llvm/ADT/ArrayRef.h b/include/llvm/ADT/ArrayRef.h
index 0351cf5..79f14da 100644
--- a/include/llvm/ADT/ArrayRef.h
+++ b/include/llvm/ADT/ArrayRef.h
@@ -68,7 +68,7 @@ namespace llvm {
     /*implicit*/ ArrayRef(NoneType) : Data(nullptr), Length(0) {}

     /// Construct an ArrayRef from a single element.
- /*implicit*/ ArrayRef(const T &OneElt)
+ __attribute__((deprecated)) /*implicit*/ ArrayRef(const T &OneElt)
       : Data(&OneElt), Length(1) {}

     /// Construct an ArrayRef from a pointer and length.

and rebuilding LLVM and Clang gives me:

$ ninja 2>&1 | grep -e Wdeprecated-declarations | wc -l
    6083

This includes forwards from the `makeArrayRef()` helper.

Yep - the convenience of one-element->ArrayRef is "cute" at best, I
think. Having to wrap it doesn't seem detrimental. Would have to look
at some numbers, though (if we could easily gather the number of
instances of this - I'm not sure of the easiest way to do that, given
the build system likes to stop after it sees an error).

diff --git a/include/llvm/ADT/ArrayRef.h b/include/llvm/ADT/ArrayRef.h
index 0351cf5..79f14da 100644
--- a/include/llvm/ADT/ArrayRef.h
+++ b/include/llvm/ADT/ArrayRef.h
@@ -68,7 +68,7 @@ namespace llvm {
     /*implicit*/ ArrayRef(NoneType) : Data(nullptr), Length(0) {}

     /// Construct an ArrayRef from a single element.
- /*implicit*/ ArrayRef(const T &OneElt)
+ __attribute__((deprecated)) /*implicit*/ ArrayRef(const T &OneElt)
       : Data(&OneElt), Length(1) {}

     /// Construct an ArrayRef from a pointer and length.

and rebuilding LLVM and Clang gives me:

$ ninja 2>&1 | grep -e Wdeprecated-declarations | wc -l
    6083

Uniqued? I imagine a few are in headers.

Good point.

$ grep Wdeprecated-declarations warnings.log | sort -u | wc -l
     644

Thanks!

I'll try to sum up my thoughts on this:

That seems like many enough that I'd rather not force all those
callers to use an explicit local array.
The utility function doesn't actually save us - we still have to be
diligent about not using it for named ArrayRefs.
  But it might be enough...
A clang-tidy check would be great (instead of any API change) - if we
had a decent developer integration, clang-tidy must-clean shortlist
that was on a buildbot, etc.

All that being said - I don't have any great alternative suggestions,
nor do I feel personally compelled to do any of the work listed above,
so if anyone feels like doing any of those things (even the ones I
don't much like) I don't think I'll stand in the way of progress :slight_smile:

- David