[RFC] Harmonize flexible array members handling

Hi Folks,

While working on flexible array members (FAM) and -fstrict-flex-arrays= flag, I realized that clang has a
(very) inconsistent behavior wrt. what is a flexible array member, depending on the context.

Currently, supported patterns are (for trailing arrays, obviously):

P1: type name[ ]; // C99 style, always legit
P2: type name[0]; //C89 style, GNU extension
P3: type name[1]; //C89 style, GNU extension
P4: type name[n]; // n> 1, supported for legacy compatibility, behavior varies depending on wether n is a literal, the result of a macro expansion or template parameter

The situation where being a flexible array member matters are:

S1: bound checking (under -Warray-bound)
S2: sanitizier checking (under -fsanitize=array-bounds)
S3: __builtin_object_size computation (to determine if the array type size is the real size)

But they handle very differently P4, basically depending on reported bug (let’s say it has grown organically).

Under -fstrict-flex-arrays=0, I think it’s safe to say that cases P1, P2 and P3
should be considered FAM in all situations. Considering P4 I’d like to make it
consistent by stating that if at least one of S1, S2, S3 consider it a FAM, all
of them should do so. Any other approach would be considered as a regression,
and clang users wouldn’t be happy with new warnings or new sanitizer failures
popping off for code that were considered legit before.

Doing so will probably miss a few warnings / sanitizer opportunity, but that
would only be for p4 & -fstrict-flex-arrays=0.

So, yeah, any thoughts?

1 Like

How plausible is it for us to stop supporting this as a FAM for legacy compatibility? It’s not a FAM and it seems significantly less defensible to support than the GNU extensions. If we’re stuck with it, I think we should diagnose (default on) any situation we treat that field as being a FAM.

Some more examples where it matters: https://godbolt.org/z/n4rvWoqbo

I think we need something more principled to determine what is or isn’t a FAM than relying on an ad hoc list of places we’ve historically thought array members were interesting. For example, if we consider it to be a FAM because of sanitizer checking, should we also consider it a FAM in terms of checking constraints like capturing in a lambda?

I definitely agree that users would be surprised by a change in diagnostic behavior or support, but it would help to know just what that change will cause problems for. To me, some of this is about fixing bugs in the way we handle FAMs, and fixing bugs can break code that previously did something useful. But if that only impacts a handful of projects, that breakage might be worth it for clawing back some consistency in this space.

It’s hard to measure exactly how much code is affected, but it does seem to be a somewhat frequent pattern. See REGRESSION: clang-3.8+ do not work well with _FORTIFY_SOURCE and struct sockaddr with addresses > 14 bytes (__builtin_object_size doesn't grok SOCK_MAXADDRLEN extension) · Issue #29694 · llvm/llvm-project · GitHub .

We can’t really forbid capturing length 1 arrays.

Ugh, yeah, that’s in system header code, we can’t do much about that.

Agreed, but saying “this is a FAM except for when it isn’t” is not particularly helpful behavior for users either. To me, I think int fam[] is a FAM and two or three other forms are not a FAM, but are something FAM-like and should be clearly distinguished as such in terms of diagnostic wording, documentation, etc.

P2 is a conforming extension to treat as FAM if we wanted to because a zero-sized array is UB, so we can define that space however we want (so long as we issue a pedantic warning, which we do). But treating P3 or P4 as a FAM does not seem like a conforming extension because it changes the meaning of conforming code to treat it as one (for example, sizeof rules around FAMs are special, see C2x 6.7.2.1p20, “In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.”) e.g., if we call int[1] a FAM explicitly, then this code has nonconforming behavior: Compiler Explorer

So it’s less about forbidding existing code (though I still really wish I could kill support for P4 because that’s just egregiously ignoring the language rules) and more about clarifying diagnostic wording, documentation, etc so we don’t call something a FAM when it isn’t one (like cases P3 and P4). Does that make sense?

1 Like

Since -Warray-bounds is on by default, it is likely the behavior people observe most easily. Losing that diagnostic for trailing arrays of size N>1 may not be a desirable change – even if it improves consistency.

Ideally, we’d simply switch the default to -fstrict-flex-arrays=1…but that’s likely not viable, at least in the short/medium term.

So, instead, I think we should strongly consider allowing -Warray-bounds to be explicitly inconsistent with – and purely stricter than – the sanitizer and _b_o_s behavior. The warning only triggers on constant indices, so the chance of false positive diagnostic on an intended use of a N>=1 flexible array is small (such intended accesses will generally use a variable index). Also, because it’s a compile warning, it is significantly less harmful if it does give such a false-positive than a runtime abort or change in behavior is.

Do note that -Warray-bounds’s behavior differences depending on whether it’s literal or came from a macro/template also applies to size==1 (“P3”), not just size>1 (“P4”).

Agreed, but saying “this is a FAM except for when it isn’t” is not particularly helpful behavior for users either. To me, I think int fam is a FAM and two or three other forms are not a FAM, but are something FAM-like and should be clearly distinguished as such in terms of diagnostic wording, documentation, etc.

Yes, you’re absolutely correct. We are not talking at all about P[234] being actually a C99 FAM. Almost none of the behavioral changes of an actual FAM apply to them (and I do not think we should change that.)

Indeed, all we’re talking about here is whether you’re allowed to index past the declared bounds. While the term “flexible array member” does reasonably describe such a thing, it’d be best to avoid such informal usage, since the term has a formal definition per C99 – and that formal meaning is not what’s intended.

Would that be an option to have an allow-list for the few system header structs that use this pattern, and otherwise drop that support?

I just commented on ⚙ D126864 [clang] Introduce -fstrict-flex-arrays=<n> for stricter handling of flexible arrays which had some issues with the documentation, but what caught my eye from that review was: “Control which arrays are considered as flexible arrays members.” – that sounds a lot like “treat int trailing[1]; as though it’s a FAM according to the language rules” to me and is possibly where a lot of my confusion comes from.

It would be nice if we could find better terminology to make this more clear.

Maaayyyyybbbbeee, but it would be risky because it’s hard to know how many non-system-header situations are also impacted.

In FreeBSD userspace-visible types that get used in the kernel I know of dirent’s d_name, fd_set, sockaddr’s sa_data, sockaddr_dl’s sdl_data, sockaddr_nb’s snb_name, sockaddr_un’s sun_path and direct’s d_name (UFS on-disk dirent). In our experimenting with subobject bounds enforcement for CheriBSD we introduced a couple of attributes to mark flexible array-like struct members, one taking no arguments and the other taking an integer argument giving an upper bound on the size. We use them for code generation purposes, but something equivalent could be useful upstream for static analysis and provide a way to allow ABI/API compatibility to be preserved for such historic types in a general manner (though you’d still have to match them if you want to handle older headers…).

Note also that, for T foo[N], there are two conflicting ways it’s used. Typically either N is the minimum length, with extra T’s shoved on the end of the allocation (the sockaddr case), or N is the maximum length, with less than a full aggregate allocated, chopping off some of the T’s that should be allocated (the dirent case), which has always been quite dodgy but needs to keep working unless you want to break Unix systems.

V2 of the GCC -fstrict-flex-arrays patch under review introduced a strict_flex_array(LEVEL) attribute to annotate trailing arrays with and override the global option setting.

1 Like

Trying to summarize the concerns here, although it maybe a bit difficult because the answers to the original raises more Issues than solutions :slight_smile:

Considering naming:

  • Flexible Array Members is a normalized term that identifies a well known patter, namely T var[]. Whatever the value of -fstrict-flex-arrays they (obviously) always are treated as FAM.
  • Other forms, although they behave like FAM, are not FAM. I’ve seen them being referred as struct hack in various places. -fstrict-flex-arrays impact which form are treated as FAM

Considering the normalization of treatments and the regressions, because system headers and legacy behavior are involved, We can’t change existing behavior for Struct Hack while not breaking significant code area. And even if we could, I don’t think this should be done at the same time as normalizing Clang behavior (i.e. once normalized, we can consider breaking things, but first normalize behavior).

What would be the impact of normalization? If we avoid regression and normalize, it would mean some array accesses will no longer raise a warning / imply a sanitizer call. I think that’s a decent price to pay.

Technically: I’d add a method Type::isFlexibleArrayMemberLike(unsigned StrictFlexArrayLevel) and use it wherever it’s needed in place of existing ad-hoc and redundant checks are done.

Does it makes sense?

+1, I think that’s sensible.

That sounds like an improvement to me!

What would be the impact of normalization? If we avoid regression and normalize, it would mean some array accesses will no longer raise a warning / imply a sanitizer call. I think that’s a decent price to pay.

I’m unconvinced, as I wrote upthread (which probably got lost in the naming discussion):

1 Like

Decoupling the warning from __builtin_object_size makes a gradual transition to the modern T[] form possible with less disruption than otherwise. (That’s the approach that GCC has historically taken, although I’m not sure the -fstrict-flex-arrays patches don’t do something different.)

We already have something where any call from -Warray-bounds uses a different -fstrict-flex-array level from the one passed in command line. That way we keep locality of the computation while preserving code locality. We could also have an extra argument in isFlexibleArrayMemberLike to take this oddities into account

That’s actually already the case in clang and it will stay as is, because the logic there is relatively different from the logic in the other code area impacted by FAM.

Patch posted here: ⚙ D134791 [clang] Unify Sema and CodeGen implementation of isFlexibleArrayMemberExpr
To lower friction, I’ve unified Sema and CodeGen functions while keeping a flag to have different paths wrt. macro and template parameter substitution.