RFC: Source-level attribute for conservative __builtin_object_size(..., 1)

Context

Ping. Does anyone have feedback on this?

From: "Bob Wilson via cfe-dev" <cfe-dev@lists.llvm.org>
To: "Duncan P. N. Exon Smith" <dexonsmith@apple.com>
Cc: "Richard Smith" <richard@metafoo.co.uk>, "CFE Dev" <cfe-dev@lists.llvm.org>, "George Burgess IV via llvm-commits"
<llvm-commits@lists.llvm.org>
Sent: Monday, March 7, 2016 4:30:57 PM
Subject: Re: [cfe-dev] RFC: Source-level attribute for conservative __builtin_object_size(..., 1)

>
> Context
> =======
>
> r245403 and follow-ups improved __builtin_object_size(ptr, 1). The
> type=1
> version of __builtin_object_size returns the maximum possible
> distance
> between `ptr` and the end of the subobject it's part of. E.g., in:
> ```
> struct S {
> int i[2];
> char c[5];
> };
> ```
> then __builtin_object_size(s.i, 1) should return 8, and
> __builtin_object_size(s.c + 2, 1) should return 3.
>
> r250488 relaxed the rules for 0- or 1-sized arrays. Consider
> ```
> struct S {
> int i[2];
> char c[0];
> };
> ```
> The expectation is that S will be over-allocated by some number of
> bytes,
> so that `S::c` has a runtime-determined array size. r250488
> changed
> __builtin_object_size(ptr, 1) to be more conservative in this case,
> falling
> back to "type=0". (Type=0 returns the maximum possible distance
> between
> `ptr` and the end of the allocation.)
>
> Problem
> =======
>
> We use __builtin_object_size(ptr, 1) in our strcpy implementation
> when
> _FORTIFY_SOURCE=2 (the default). The code is something equivalent
> to:
> ```
> #define strcpy(a, b, sz) \
> __builtin_strcpy_chk(a, b, sz, __builtin_object_size(a, 1))
> ```
>
> This is causing a problem for some
> structs in our headers that end with a char array that has a size >
> 1 and
> that are expected to be over-allocated.
>
> One example is `sockaddr_un`. From the unix(4) manpage:
> unix(4) [freebsd man page]
> ```
> struct sockaddr_un {
> u_char sun_len;
> u_char sun_family;
> char sun_path[104];
> };
> ```
>
> This has been around a long time. Unfortunately,
> `sizeof(sockaddr_un)`
> cannot really change, and there's a ton of code out there that uses
> strcpy() on this struct. We need some way for clang to be
> conservative
> in this case.
>
> Solution
> ========
>
> We're thinking about adding an attribute, something like:
> ```
> struct sockaddr_un {
> u_char sun_len;
> u_char sun_family;
> char sun_path[104] __attribute__((variable_length_array));
> };
> ```
> (is there a better name?). This would allow us to decorate our
> headers,
> and explicitly opt-in on a struct-by-struct basis to the
> conservative
> behaviour that r250488 gives to 0- and 1-sized arrays.
>
> The only other option we can think of us to be conservative
> whenever the
> the subobject is part of an array at the end of a struct.
>
> Thoughts?

Ping. Does anyone have feedback on this?

If you can add an attribute to your headers, this seems like a good solution. We might also need the conservative option for dealing with older headers. Regarding the attribute, is is really a property of the array member, or a property of the structure (i.e. that it will be overallocated)?

FWIW, the proposed attribute seems independently useful for eliminating unnecessary warnings. For example, if we have:

void foo(struct sockaddr_un *s) {
  s->sun_path[106] = 0;
}

we generate this:

warning: array index 106 is past the end of the array (which contains 104 elements) [-Warray-bounds]
  s->sun_path[106] = 0;

which, as you describe the situation, is not useful.

-Hal

I was thinking only of the narrow fix. But applying it to the struct
seems more interesting. It formalizes other types of over-allocation.

Also (maybe this is where you were going), this would let someone add
the attribute after #include.

From: "Duncan P. N. Exon Smith" <dexonsmith@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Bob Wilson" <bob.wilson@apple.com>, "Richard Smith" <richard@metafoo.co.uk>, "CFE Dev" <cfe-dev@lists.llvm.org>,
"George Burgess IV via llvm-commits" <llvm-commits@lists.llvm.org>
Sent: Monday, March 7, 2016 5:57:44 PM
Subject: Re: [cfe-dev] RFC: Source-level attribute for conservative __builtin_object_size(..., 1)

>
>> From: "Bob Wilson via cfe-dev" <cfe-dev@lists.llvm.org>
>> To: "Duncan P. N. Exon Smith" <dexonsmith@apple.com>
>> Cc: "Richard Smith" <richard@metafoo.co.uk>, "CFE Dev"
>> <cfe-dev@lists.llvm.org>, "George Burgess IV via llvm-commits"
>> <llvm-commits@lists.llvm.org>
>> Sent: Monday, March 7, 2016 4:30:57 PM
>> Subject: Re: [cfe-dev] RFC: Source-level attribute for
>> conservative __builtin_object_size(..., 1)
>>
>>
>>>
>>> Context
>>> =======
>>>
>>> r245403 and follow-ups improved __builtin_object_size(ptr, 1).
>>> The
>>> type=1
>>> version of __builtin_object_size returns the maximum possible
>>> distance
>>> between `ptr` and the end of the subobject it's part of. E.g.,
>>> in:
>>> ```
>>> struct S {
>>> int i[2];
>>> char c[5];
>>> };
>>> ```
>>> then __builtin_object_size(s.i, 1) should return 8, and
>>> __builtin_object_size(s.c + 2, 1) should return 3.
>>>
>>> r250488 relaxed the rules for 0- or 1-sized arrays. Consider
>>> ```
>>> struct S {
>>> int i[2];
>>> char c[0];
>>> };
>>> ```
>>> The expectation is that S will be over-allocated by some number
>>> of
>>> bytes,
>>> so that `S::c` has a runtime-determined array size. r250488
>>> changed
>>> __builtin_object_size(ptr, 1) to be more conservative in this
>>> case,
>>> falling
>>> back to "type=0". (Type=0 returns the maximum possible distance
>>> between
>>> `ptr` and the end of the allocation.)
>>>
>>> Problem
>>> =======
>>>
>>> We use __builtin_object_size(ptr, 1) in our strcpy implementation
>>> when
>>> _FORTIFY_SOURCE=2 (the default). The code is something
>>> equivalent
>>> to:
>>> ```
>>> #define strcpy(a, b, sz) \
>>> __builtin_strcpy_chk(a, b, sz, __builtin_object_size(a, 1))
>>> ```
>>>
>>> This is causing a problem for some
>>> structs in our headers that end with a char array that has a size
>>> >
>>> 1 and
>>> that are expected to be over-allocated.
>>>
>>> One example is `sockaddr_un`. From the unix(4) manpage:
>>> unix(4) [freebsd man page]
>>> ```
>>> struct sockaddr_un {
>>> u_char sun_len;
>>> u_char sun_family;
>>> char sun_path[104];
>>> };
>>> ```
>>>
>>> This has been around a long time. Unfortunately,
>>> `sizeof(sockaddr_un)`
>>> cannot really change, and there's a ton of code out there that
>>> uses
>>> strcpy() on this struct. We need some way for clang to be
>>> conservative
>>> in this case.
>>>
>>> Solution
>>> ========
>>>
>>> We're thinking about adding an attribute, something like:
>>> ```
>>> struct sockaddr_un {
>>> u_char sun_len;
>>> u_char sun_family;
>>> char sun_path[104] __attribute__((variable_length_array));
>>> };
>>> ```
>>> (is there a better name?). This would allow us to decorate our
>>> headers,
>>> and explicitly opt-in on a struct-by-struct basis to the
>>> conservative
>>> behaviour that r250488 gives to 0- and 1-sized arrays.
>>>
>>> The only other option we can think of us to be conservative
>>> whenever the
>>> the subobject is part of an array at the end of a struct.
>>>
>>> Thoughts?
>>
>> Ping. Does anyone have feedback on this?
>
> If you can add an attribute to your headers, this seems like a good
> solution. We might also need the conservative option for dealing
> with older headers. Regarding the attribute, is is really a
> property of the array member, or a property of the structure (i.e.
> that it will be overallocated)?

I was thinking only of the narrow fix. But applying it to the struct
seems more interesting. It formalizes other types of
over-allocation.

Also (maybe this is where you were going), this would let someone add
the attribute after #include.
--
// Define 'struct something' in a header that can't be changed.
#include <something.h>

// Add an attribute.
struct something __attribute__((overallocated));
--

I like this better.

Great :slight_smile:

> FWIW, the proposed attribute seems independently useful for
> eliminating unnecessary warnings. For example, if we have:
>
> void foo(struct sockaddr_un *s) {
> s->sun_path[106] = 0;
> }
>
> we generate this:
>
> warning: array index 106 is past the end of the array (which
> contains 104 elements) [-Warray-bounds]
> s->sun_path[106] = 0;
>
> which, as you describe the situation, is not useful.

Good point. I think we could silence this with either attribute
location, right?

Yes, I think so.

-Hal

From: "Hal Finkel via cfe-dev" <cfe-dev@lists.llvm.org>
To: "Duncan P. N. Exon Smith" <dexonsmith@apple.com>
Cc: "Richard Smith" <richard@metafoo.co.uk>, "CFE Dev" <cfe-dev@lists.llvm.org>, "George Burgess IV via llvm-commits"
<llvm-commits@lists.llvm.org>
Sent: Monday, March 7, 2016 6:00:16 PM
Subject: Re: [cfe-dev] RFC: Source-level attribute for conservative __builtin_object_size(..., 1)

> From: "Duncan P. N. Exon Smith" <dexonsmith@apple.com>
> To: "Hal Finkel" <hfinkel@anl.gov>
> Cc: "Bob Wilson" <bob.wilson@apple.com>, "Richard Smith"
> <richard@metafoo.co.uk>, "CFE Dev" <cfe-dev@lists.llvm.org>,
> "George Burgess IV via llvm-commits" <llvm-commits@lists.llvm.org>
> Sent: Monday, March 7, 2016 5:57:44 PM
> Subject: Re: [cfe-dev] RFC: Source-level attribute for conservative
> __builtin_object_size(..., 1)
>
>
> >
> >> From: "Bob Wilson via cfe-dev" <cfe-dev@lists.llvm.org>
> >> To: "Duncan P. N. Exon Smith" <dexonsmith@apple.com>
> >> Cc: "Richard Smith" <richard@metafoo.co.uk>, "CFE Dev"
> >> <cfe-dev@lists.llvm.org>, "George Burgess IV via llvm-commits"
> >> <llvm-commits@lists.llvm.org>
> >> Sent: Monday, March 7, 2016 4:30:57 PM
> >> Subject: Re: [cfe-dev] RFC: Source-level attribute for
> >> conservative __builtin_object_size(..., 1)
> >>
> >>
> >>>
> >>> Context
> >>> =======
> >>>
> >>> r245403 and follow-ups improved __builtin_object_size(ptr, 1).
> >>> The
> >>> type=1
> >>> version of __builtin_object_size returns the maximum possible
> >>> distance
> >>> between `ptr` and the end of the subobject it's part of. E.g.,
> >>> in:
> >>> ```
> >>> struct S {
> >>> int i[2];
> >>> char c[5];
> >>> };
> >>> ```
> >>> then __builtin_object_size(s.i, 1) should return 8, and
> >>> __builtin_object_size(s.c + 2, 1) should return 3.
> >>>
> >>> r250488 relaxed the rules for 0- or 1-sized arrays. Consider
> >>> ```
> >>> struct S {
> >>> int i[2];
> >>> char c[0];
> >>> };
> >>> ```
> >>> The expectation is that S will be over-allocated by some number
> >>> of
> >>> bytes,
> >>> so that `S::c` has a runtime-determined array size. r250488
> >>> changed
> >>> __builtin_object_size(ptr, 1) to be more conservative in this
> >>> case,
> >>> falling
> >>> back to "type=0". (Type=0 returns the maximum possible
> >>> distance
> >>> between
> >>> `ptr` and the end of the allocation.)
> >>>
> >>> Problem
> >>> =======
> >>>
> >>> We use __builtin_object_size(ptr, 1) in our strcpy
> >>> implementation
> >>> when
> >>> _FORTIFY_SOURCE=2 (the default). The code is something
> >>> equivalent
> >>> to:
> >>> ```
> >>> #define strcpy(a, b, sz) \
> >>> __builtin_strcpy_chk(a, b, sz, __builtin_object_size(a, 1))
> >>> ```
> >>>
> >>> This is causing a problem for some
> >>> structs in our headers that end with a char array that has a
> >>> size
> >>> >
> >>> 1 and
> >>> that are expected to be over-allocated.
> >>>
> >>> One example is `sockaddr_un`. From the unix(4) manpage:
> >>> unix(4) [freebsd man page]
> >>> ```
> >>> struct sockaddr_un {
> >>> u_char sun_len;
> >>> u_char sun_family;
> >>> char sun_path[104];
> >>> };
> >>> ```
> >>>
> >>> This has been around a long time. Unfortunately,
> >>> `sizeof(sockaddr_un)`
> >>> cannot really change, and there's a ton of code out there that
> >>> uses
> >>> strcpy() on this struct. We need some way for clang to be
> >>> conservative
> >>> in this case.
> >>>
> >>> Solution
> >>> ========
> >>>
> >>> We're thinking about adding an attribute, something like:
> >>> ```
> >>> struct sockaddr_un {
> >>> u_char sun_len;
> >>> u_char sun_family;
> >>> char sun_path[104] __attribute__((variable_length_array));
> >>> };
> >>> ```
> >>> (is there a better name?). This would allow us to decorate our
> >>> headers,
> >>> and explicitly opt-in on a struct-by-struct basis to the
> >>> conservative
> >>> behaviour that r250488 gives to 0- and 1-sized arrays.
> >>>
> >>> The only other option we can think of us to be conservative
> >>> whenever the
> >>> the subobject is part of an array at the end of a struct.
> >>>
> >>> Thoughts?
> >>
> >> Ping. Does anyone have feedback on this?
> >
> > If you can add an attribute to your headers, this seems like a
> > good
> > solution. We might also need the conservative option for dealing
> > with older headers. Regarding the attribute, is is really a
> > property of the array member, or a property of the structure
> > (i.e.
> > that it will be overallocated)?
>
> I was thinking only of the narrow fix. But applying it to the
> struct
> seems more interesting. It formalizes other types of
> over-allocation.
>
> Also (maybe this is where you were going), this would let someone
> add
> the attribute after #include.
> --
> // Define 'struct something' in a header that can't be changed.
> #include <something.h>
>
> // Add an attribute.
> struct something __attribute__((overallocated));
> --
>
> I like this better.

Great :slight_smile:

>
> > FWIW, the proposed attribute seems independently useful for
> > eliminating unnecessary warnings. For example, if we have:
> >
> > void foo(struct sockaddr_un *s) {
> > s->sun_path[106] = 0;
> > }
> >
> > we generate this:
> >
> > warning: array index 106 is past the end of the array (which
> > contains 104 elements) [-Warray-bounds]
> > s->sun_path[106] = 0;
> >
> > which, as you describe the situation, is not useful.
>
> Good point. I think we could silence this with either attribute
> location, right?

Yes, I think so.

Also, if we put this on classes, we could have a warning to catch the bugs Richard just fixed in r262891.

-Hal