put "str" in __attribute__((annotate("str"))) to dwarf

Hi,

This feature is for the BPF community. The detailed use case is
described in https://reviews.llvm.org/D103549. And I have crafted a
WIP patch https://reviews.llvm.org/D103667 which implements necessary
frontend and codegen (plus others) to show the scope of the work.

To elaborate on the use case a little more: we want to attach
annotations to variables (including parameters), functions,
structure/union types and structure/union members. The string
arguments of the annotations will not be interpreted by the compiler;
the compiler should just emit them into dwarf. Currently, in the Linux
build system, pahole converts dwarf to BTF and will encode these
annotation strings into BTF. The following C example shows what the
annotations look like at the source level:

$ cat t1.c
/* a pointer pointing to user memory */
#define __user __attribute__((annotate("user")))
/* a pointer protected by rcu */
#define __rcu __attribute__((annotate("rcu")))
/* the struct has some special property */
#define __special_struct __attribute__((annotate("special_struct")))
/* sock_lock is held for the function */
#define __sock_lock_held __attribute__((annotate("sock_lock_held")))
/* the hash table element type is socket */
#define __special_info __attribute__((annotate("elem_type:socket")))

struct hlist_node;
struct hlist_head {
  struct hlist_node *prev;
  struct hlist_node *next;
} __special_struct;
struct hlist {
   struct hlist_head head __special_info;
};

extern void bar(struct hlist *);
int foo(struct hlist *h, int *a __user, int *b __rcu) __sock_lock_held {
  bar(h);
  return *a + *b;
}
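
As a side note on the example above: because the strings are not interpreted, the annotations should not change what the program computes. Below is a minimal sketch of that point, which also guards the macro so the code still builds with a compiler that lacks the attribute. The names ANNOTATE, USER_PTR, RCU_PTR and sum_annotated are made up for illustration and are not from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Fall back gracefully on compilers without __has_attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical wrapper: expands to the annotate attribute when the
 * compiler knows it, and to nothing otherwise. */
#if __has_attribute(annotate)
#define ANNOTATE(s) __attribute__((annotate(s)))
#else
#define ANNOTATE(s)
#endif

/* Illustrative stand-ins for the __user and __rcu macros in t1.c. */
#define USER_PTR ANNOTATE("user")
#define RCU_PTR  ANNOTATE("rcu")

/* The annotations are carried along as metadata only; the function
 * body behaves exactly as it would without them. */
int sum_annotated(int *a USER_PTR, int *b RCU_PTR)
{
    assert(a != NULL && b != NULL);
    return *a + *b;
}
```

Calling sum_annotated on two ints gives their sum whether or not the attribute is available; only the emitted metadata differs.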

In https://reviews.llvm.org/D103667, I implemented an LLVM extended attribute
DWARF_AT_LLVM_annotations. But this might not be the right thing to do,
as it is not clear whether there are use cases beyond BPF.
David suggested that we discuss this on llvm-dev to reach consensus on
how this feature may be supported in LLVM. Hence this email.

Please share your comments, suggestions on how to support this feature
in LLVM. Thanks!

Yonghong

(Crossposting to cfe-dev because this includes a proposal for a new C/C++ level attribute)

These attributes are all effectively hand-written (with or without macros) in the input source? None of them are derived by the compiler frontend based on other characteristics?

And I’m guessing maybe we’d want the name to be a bit narrower, like bpf_annotate, perhaps - taking such a generic term as “annotate” in the global attribute namespace seems fairly bold for what’s currently a fairly narrow use case. +Aaron Ballman thoughts on this?

> (Crossposting to cfe-dev because this includes a proposal for a new C/C++ level attribute)
>
> These attributes are all effectively hand-written (with or without macros) in the input source? None of them are derived by the compiler frontend based on other characteristics?

Yes, they are hand-written in the input source and passed to the clang
compiler. They are not derived inside clang/LLVM.

> And I'm guessing maybe we'd want the name to be a bit narrower, like bpf_annotate, perhaps - taking such a generic term as "annotate" in the global attribute namespace seems fairly bold for what's currently a fairly narrow use case. +Aaron Ballman thoughts on this?

I am okay with something like bpf_annotate, since the existing annotate
attribute generates global variables or code for the annotations,
which is unnecessary for the BPF use case,
although the overhead should be quite small.

Good to know/understand.

Ah, there’s an existing annotate attribute you’re proposing leveraging/reusing that? Got a pointer to the documentation for that? I don’t see it documented here: https://clang.llvm.org/docs/AttributeReference.html

Looks like this attribute is not well documented.

I forgot how I found it. But below is a public blog on how it could be used:
   https://blog.quarkslab.com/implementing-a-custom-directive-handler-in-clang.html
I then went to
  clang/include/clang/Basic/Attr.td
and found

def Annotate : InheritableParamAttr {
  let Spellings = [Clang<"annotate">];
  let Args = [StringArgument<"Annotation">, VariadicExprArgument<"Args">];
  // Ensure that the annotate attribute can be used with
  // '#pragma clang attribute' even though it has no subject list.
  let AdditionalMembers = [{
  static AnnotateAttr *Create(ASTContext &Ctx, llvm::StringRef Annotation, \
              const AttributeCommonInfo &CommonInfo) {
    return AnnotateAttr::Create(Ctx, Annotation, nullptr, 0, CommonInfo);
  }
  static AnnotateAttr *CreateImplicit(ASTContext &Ctx, llvm::StringRef Annotation, \
              const AttributeCommonInfo &CommonInfo = {SourceRange{}}) {
    return AnnotateAttr::CreateImplicit(Ctx, Annotation, nullptr, 0, CommonInfo);
  }
  }];
  let PragmaAttributeSupport = 1;
  let Documentation = [Undocumented];
}

and tried it in the places BPF cares about; it covers all of them.

BTW, the above attr definition does say Undocumented.

> Looks like this attribute is not well documented.

Correct -- it's an ancient attribute that predates us documenting
attributes at all.

I don't think it's a good idea to use annotate for BPF needs. The
basic idea behind annotate is that it's a way to pass an arbitrary string
(and, starting very recently, other kinds of constant expressions) from
the frontend to the backend. So it's a general-purpose tool that's
used for one-off situations. As an example, attribute plugins will use
it because they cannot currently create their own semantic attribute
easily, and I think the static analyzer may make use of the feature as
well. Because the BPF needs are so specific, I think it'd be better to
use an attribute dedicated to those needs rather than using a
general-purpose attribute like annotate -- this will reduce the
likelihood of conflicts with the other creative uses people put
annotate to.

> BTW, the above attr definition does say Undocumented.

Yeah, the build requires there to be some documentation for every
attribute, and Undocumented is what we use for attributes that we
elect not to document because they're implementation details (rarely)
or have failed to document yet (much more common).

HTH!

~Aaron

Any suggestions/preferences for the spelling, Aaron?

I don't know this domain particularly well, so take these suggestions
with a giant grain of salt:

If the concept is specific to DWARF and you don't think it'll need to
extend into other debug formats, you could go with `dwarf_annotate`.
If it's not really a DWARF thing but is more about B[P|T]F, then
`btf_annotate` or `bpf_annotate` could work, but those may be a bit
mysterious to folks outside of the domain. If it's a generic debug
info concept, probably `debug_info_annotate` or something.

My primary concern with reusing `annotate` itself is because user
programs are likely already using that attribute for basically
arbitrary purposes, so I worry reusing it for this purpose may
accidentally expose annotations in debug info that the user never
really expected to be there (which may confuse whatever is reading the
annotations from the debug info).

~Aaron

Arguably it can/could be a generic debug info or dwarf thing, but for now we don’t have any use for it other than to squirrel info along to BTF/BPF, so I’m on the fence about which prefix to use exactly.

> My primary concern with reusing annotate itself is because user
> programs are likely already using that attribute for basically
> arbitrary purposes, so I worry reusing it for this purpose may
> accidentally expose annotations in debug info that the user never
> really expected to be there (which may confuse whatever is reading the
> annotations from the debug info).

Yeah, +1 there.

A bit more bike shedding colors...

The __rcu and __user annotations might eventually be used by clang itself.
Currently the "sparse" tool does this analysis and warns users
when an __rcu pointer is accessed incorrectly in the kernel C code.
If clang could do that directly, it could be a huge selling point
for folks to switch from gcc to clang for kernel builds.
The front-end would treat such annotations as arbitrary strings, but
a special "building-linux-kernel-pass" would interpret their semantics.

Considering the above, the dwarf_annotate, btf_annotate and debug_info_annotate
names don't fit that well. The accuracy of the annotations is important,
unlike debug info, which can be dropped on a whim of some optimization pass.

bpf_annotate wouldn't fit either, since the kernel might use that
without any bpf bits.

kernel_annotate might sound like it's not applicable to user space.

How about __attribute__((note("str"))) or __attribute__((tag("str"))) ?

Are __rcu and __user annotations notionally distinct things from bpf
(and perhaps each other as well)? Distinct enough that it would make
sense to use a different attribute name for user source for each need?
I suspect the answer is yes given that the existing annotations have
their own names which are distinct, but I don't know this domain
enough to be sure.

I don't think we'd want to use such generic terms for this
functionality. e.g., a note attribute could complement the existing
diagnose_if attribute for providing diagnostic notes, and a tag
attribute could be specific to tag types (struct, union, enum), etc.

I'm skeptical of using the same attribute for all these purposes
because that usually leads to needing some sort of uniqueness to the
string argument so it can be properly distinguished. e.g.,
__attribute__((note("bpf.instruction"))) vs
__attribute__((note("rcu.instruction"))), which is harder for tooling
to handle if it feels the need to inspect the string literal (such as
for diagnostic purposes).

~Aaron

__rcu and __user don't overlap. __rcu is not a single annotation though.
It's a combination of annotations on pointers, functions and macros.
Some functions have:
__acquires(rcu)
another function might have:
__acquires(rcu_bh)
There are several flavors of RCU in the kernel.
So a single __attribute__((rcu_annotate("foo"))) won't work even within RCU scope.
But if we do:
struct foo {
  void * __attribute__((tag("ptr.rcu_bh"))) ptr;
};
int bar(int) __attribute__((tag("acquires.rcu_bh"))) { ... }
int baz(int) __attribute__((tag("releases.rcu_bh"))) { ... }
int qux(int) __attribute__((tag("acquires.rcu_sched"))) { ... }
...
The clang pass can parse these strings and correlate one tag to another.
RCU flavors come and go, so clang cannot hard code the names.
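
To make the "parse and correlate" step concrete, here is a minimal sketch in plain C of how a consumer might split such tags and pair an acquire with a release. It assumes the "<action>.<flavor>" layout used in the examples above; split_tag and tags_pair_up are hypothetical helpers, not part of clang or any existing pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a tag of the assumed form "<action>.<flavor>" at the first '.'.
 * On success, copies the action into `action` and returns a pointer to
 * the flavor inside `tag`. Returns NULL if the tag has no '.' or the
 * action does not fit into the buffer. */
const char *split_tag(const char *tag, char *action, size_t action_size)
{
    const char *dot;
    size_t len;

    assert(tag != NULL && action != NULL);
    dot = strchr(tag, '.');
    if (dot == NULL)
        return NULL;
    len = (size_t)(dot - tag);
    if (len + 1 > action_size)
        return NULL;
    memcpy(action, tag, len);
    action[len] = '\0';
    return dot + 1;
}

/* Return 1 if the first tag is an "acquires" tag, the second is a
 * "releases" tag, and both name the same flavor; 0 otherwise.
 * The flavor is compared as an opaque string, so no flavor names
 * are hard-coded here. */
int tags_pair_up(const char *acquire_tag, const char *release_tag)
{
    char a[32], r[32];
    const char *af = split_tag(acquire_tag, a, sizeof a);
    const char *rf = split_tag(release_tag, r, sizeof r);

    if (af == NULL || rf == NULL)
        return 0;
    return strcmp(a, "acquires") == 0 &&
           strcmp(r, "releases") == 0 &&
           strcmp(af, rf) == 0;
}
```

With this, "acquires.rcu_bh" pairs with "releases.rcu_bh" but not with a release for a different flavor, and a new flavor name needs no change to the code.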

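Maybe we can name it "bpf_tag", as it is a "tag" for the "bpf" use case?

David, in one of your early emails, you mentioned:

===
Arguably it can/could be a generic debug info or dwarf thing, but for
now we don't have any use for it other than to squirrel info along to
BTF/BPF so I'm on the fence about which prefix to use exactly
===

and suggested that, since it might be used in the future for non-bpf things,
the name could be a little more generic than bpf-specific.

Do you have any suggestions on what name to pick?

Nah, not especially. bpf_tag sounds OK-ish to me if it suits you.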

Sounds good. I will use "bpf_tag" as the starting point now.
Also, "bpf_tag" may appear multiple times on the same
function, declaration, etc.

For example,
  #define __bpf_tag(s) __attribute__((bpf_tag(s)))
  int g __bpf_tag("str1") __bpf_tag("str2");
Let us say we introduce an LLVM vendor attribute DWARF_AT_LLVM_bpf_tag.

How do you want the above to be represented in dwarf?

My current scheme is to put all bpf_tag's in one string, separated by ",".
This makes things simpler. So the final output will be
     DWARF_AT_LLVM_bpf_tag "str1,str2"
I may need to discuss with the kernel folks whether to use a different
delimiter than ",", but we would still represent all tags with ONE string.

Alternatively, it could be represented as a list of strings like
   DWARF_AT_LLVM_bpf_tag
             "str1"
             "str2"
which is similar to DWARF_AT_location.

The first representation
   DWARF_AT_LLVM_bpf_tag "str1,str2"
should be easier for IR/bitcode read/write and dwarf parsing.

What do you think?
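
For what it is worth, the trade-off of the single-string form can be sketched in a few lines of C: joining is simple, but a tag that itself contains the delimiter can no longer be split back unambiguously, which is why the delimiter choice matters. The helpers join_tags and count_tags below are hypothetical and only illustrate the idea; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join `n` tag strings into `out`, separated by
 * `delim`, mimicking the single-string form "str1,str2".
 * Returns 0 on success, -1 if a tag contains the delimiter or the
 * output buffer is too small. */
int join_tags(const char *const *tags, size_t n, char delim,
              char *out, size_t out_size)
{
    size_t used = 0;

    assert(tags != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tags[i]);

        /* A tag containing the delimiter could not be split back. */
        if (strchr(tags[i], delim) != NULL)
            return -1;
        /* Room for an optional delimiter, the tag, and the NUL. */
        if (used + (i ? 1 : 0) + len + 1 > out_size)
            return -1;
        if (i)
            out[used++] = delim;
        memcpy(out + used, tags[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}

/* Hypothetical helper: count the tags encoded in a joined string. */
size_t count_tags(const char *joined, char delim)
{
    size_t count;

    assert(joined != NULL);
    if (joined[0] == '\0')
        return 0;
    count = 1;
    for (const char *p = joined; *p != '\0'; p++)
        if (*p == delim)
            count++;
    return count;
}
```

The list-of-strings form avoids the delimiter problem entirely, at the cost of a slightly more involved encoding.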

> David, in one of your early emails, you mentioned:
>
> ===
> Arguably it can/could be a generic debug info or dwarf thing, but for
> now we don’t have any use for it other than to squirrel info along to
> BTF/BPF so I’m on the fence about which prefix to use exactly
> ===
>
> and suggested that, since it might be used in the future for non-bpf things,
> the name could be a little more generic than bpf-specific.
>
> Do you have any suggestions on what name to pick?
>
> Nah, not especially. bpf_tag sounds OK-ish to me if it suits you.

The more generic the better IMO. And the less need there is to parse string literals, the better.

Why not simply __attribute__((debuginfo("arg1", "arg2", ...))), e.g.:

#define BPF_TAG(...) __attribute__((debuginfo("bpf", __VA_ARGS__)))
struct foo {
  void * BPF_TAG("ptr","rcu","bh") ptr;
};
#define BPF_RCU_TAG(PFX, ...) BPF_TAG(PFX, "rcu", __VA_ARGS__)
int bar(int) BPF_RCU_TAG("acquires","bh") { ... }
int baz(int) BPF_RCU_TAG("releases","bh") { ... }
int qux(int) BPF_RCU_TAG("acquires","sched") { ... }

maybe the name could be a little more generic then bpf-specific.

Do you have any suggestions on what name to pick?

Nah, not especially. bpf_tag sounds OK-ish to me if it suits you.

The more generic the better IMO. And, the less the need to parse string literals the better.

Why not simply __attribute__((debuginfo("arg1", "arg2", ...))), e.g.:

#define BPF_TAG(...) __attribute__((debuginfo("bpf", __VA_ARGS__)))
struct foo {
void * BPF_TAG("ptr","rcu","bh") ptr;
};
#define BPF_RCU_TAG(PFX, ...) BPF(PFX, "rcu", __VA_ARGS__)
int bar(int) BPF_RCU_TAG("acquires","bh") { ... }
int baz(int) BPF_RCU_TAG("releases","bh") { ... }
int qux(int) BPF_RCU_TAG("acquires","sched") { ... }

Unless Paul & Adrian, etc chime in in agreement of a more general name, like ‘debuginfo’, I’m inclined to avoid that/go with something bpf specific until there’s a broader use case/proposal, something we might be able to/want to encourage GCC to support too. Otherwise we’re taking a pretty broad attribute name & choosing its behavior when we don’t necessarily have a lot of leverage if GCC ends up using that name for something else.

& as for separate strings - maybe, but I’m not sure what that’ll look like in the resulting DWARF, what sort of form would you propose using to encode that? (same question below /)

Sounds good. I will use “bpf_tag” as the starting point now.
Also, it is possible “bpf_tag” may appear multiple times for the same
function, declaration etc.

For example,
#define __bpf_tag(s) attribute((bpf_tag(s)))
int g __bpf_tag(“str1”) __bpf_tag(“str2”);
Let us say we introduced a LLVM vendor tag DWARF_AT_LLVM_bpf_tag.

How do you want the above to be represented in dwarf?

My current scheme is to put all bpf_tag’s in a string, separated by “,”.
This will make things simpler. So the final output will be
DWARF_AT_LLVM_bpf_tag “str1,str2”
I may need to do a discussion with the kernel folks to use a different
delimiter than “,”, but we still represent all tags with ONE string.

But alternatively, it could be represented as a list of strings like
DWARF_AT_LLVM_bpf_tag
“str1”
“str2”
is similar to DWARF_AT_location.

What DWARF form were you thinking of using for this? There isn’t a built in form that provides encoding for multiple delimited/separated strings that I know of.

Any suggestions/preferences for the spelling, Aaron?

I don't know this domain particularly well, so take these suggestions
with a giant grain of salt:

If the concept is specific to DWARF and you don't think it'll need to
extend into other debug formats, you could go with `dwarf_annotate`.
If it's not really a DWARF thing but is more about B[P|T]F, then
`btf_annotate` or `bpf_annotate` could work, but those may be a bit
mysterious to folks outside of the domain. If it's a generic debug
info concept, probably `debug_info_annotate` or something.

Arguably it can/could be a generic debug info or dwarf thing, but for now we don't have any use for it other than to squirrel info along to BTF/BPF, so I'm on the fence about which prefix to use exactly.

A bit more bike shedding colors...

The __rcu and __user annotations might eventually be used by clang itself.
Currently the "sparse" tool does this analysis and warns users
when an __rcu pointer is incorrectly accessed in the kernel C code.
If clang could do that directly, it could be a huge selling point
for folks to switch from gcc to clang for kernel builds.
The front-end would treat such annotations as arbitrary strings, but a
special "building-linux-kernel-pass" would interpret their semantic context.

Are __rcu and __user annotations notionally distinct things from bpf
(and perhaps each other as well)? Distinct enough that it would make
sense to use a different attribute name for user source for each need?
I suspect the answer is yes given that the existing annotations have
their own names which are distinct, but I don't know this domain
enough to be sure.

__rcu and __user don't overlap. __rcu is not a single annotation though.
It's a combination of annotations in pointers, functions, macros.
Some functions have:
__acquires(rcu)
another function might have:
__acquires(rcu_bh)
There are several flavors of the RCU in the kernel.
So a single __attribute__((rcu_annotate("foo"))) won't work even within RCU scope.
But if we do:
struct foo {
void * __attribute__((tag("ptr.rcu_bh"))) ptr;
};
int bar(int) __attribute__((tag("acquires.rcu_bh"))) { ... }
int baz(int) __attribute__((tag("releases.rcu_bh"))) { ... }
int qux(int) __attribute__((tag("acquires.rcu_sched"))) { ... }
...
The clang pass can parse these strings and correlate one tag to another.
RCU flavors come and go, so clang cannot hard code the names.
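
As a rough illustration of what "parse these strings and correlate one tag to another" could mean, here is a minimal sketch in plain C. It assumes the "kind.flavor" spelling from the example above; `struct parsed_tag`, `parse_tag` and `tags_pair_up` are hypothetical names made up for this sketch and are not part of clang or of any proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of a tag such as "acquires.rcu_bh". */
struct parsed_tag {
    char kind[32];   /* text before the first '.', e.g. "acquires" */
    char flavor[32]; /* text after the first '.', e.g. "rcu_bh" */
};

/* Split "kind.flavor" at the first '.'.
 * Returns false if either part is missing or does not fit. */
bool parse_tag(const char *tag, struct parsed_tag *out)
{
    assert(tag != NULL && out != NULL);
    const char *dot = strchr(tag, '.');
    if (dot == NULL || dot == tag || dot[1] == '\0')
        return false;
    size_t kind_len = (size_t)(dot - tag);
    size_t flavor_len = strlen(dot + 1);
    if (kind_len >= sizeof(out->kind) || flavor_len >= sizeof(out->flavor))
        return false;
    memcpy(out->kind, tag, kind_len);
    out->kind[kind_len] = '\0';
    memcpy(out->flavor, dot + 1, flavor_len + 1); /* copies the NUL too */
    return true;
}

/* True when the first tag acquires and the second releases the same
 * flavor. The flavor is compared as an opaque string, so no RCU flavor
 * name is hard coded here. */
bool tags_pair_up(const char *acquire_tag, const char *release_tag)
{
    struct parsed_tag a, r;
    if (!parse_tag(acquire_tag, &a) || !parse_tag(release_tag, &r))
        return false;
    return strcmp(a.kind, "acquires") == 0 &&
           strcmp(r.kind, "releases") == 0 &&
           strcmp(a.flavor, r.flavor) == 0;
}
```

With this, tags_pair_up("acquires.rcu_bh", "releases.rcu_bh") holds, while "acquires.rcu_sched" does not pair with "releases.rcu_bh".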

Maybe we can name it "bpf_tag", as it is a "tag" for the "bpf" use case?

David, in one of your early emails, you mentioned:

===
Arguably it can/could be a generic debug info or dwarf thing, but for
now we don't have any use for it other than to squirrel info along to
BTF/BPF so I'm on the fence about which prefix to use exactly

and suggested that, since it might be used in the future for non-bpf things,
maybe the name could be a little more generic than bpf-specific.

Do you have any suggestions on what name to pick?

Nah, not especially. bpf_tag sounds OK-ish to me if it suits you.

The more generic the better IMO. And, the less the need to parse string literals the better.

Why not simply `__attribute__((debuginfo("arg1", "arg2", ...)))`, e.g.:

#define BPF_TAG(...) __attribute__((debuginfo("bpf", __VA_ARGS__)))
struct foo {
 void * BPF_TAG("ptr","rcu","bh") ptr;
};
#define BPF_RCU_TAG(PFX, ...) BPF_TAG(PFX, "rcu", __VA_ARGS__)
int bar(int) BPF_RCU_TAG("acquires","bh") { ... }
int baz(int) BPF_RCU_TAG("releases","bh") { ... }
int qux(int) BPF_RCU_TAG("acquires","sched") { ... }

Unless Paul & Adrian, etc. chime in in favor of a more general name, like 'debuginfo', I'm inclined to avoid that and go with something bpf-specific until there's a broader use case/proposal, something we might be able to/want to encourage GCC to support too. Otherwise we're taking a pretty broad attribute name & choosing its behavior when we don't necessarily have a lot of leverage if GCC ends up using that name for something else.

& as for separate strings - maybe, but I'm not sure what that'll look like in the resulting DWARF, what sort of form would you propose using to encode that? (same question below \/)

Sounds good. I will use "bpf_tag" as the starting point now.
Also, it is possible "bpf_tag" may appear multiple times for the same
function, declaration etc.

For example,
#define __bpf_tag(s) __attribute__((bpf_tag(s)))
int g __bpf_tag("str1") __bpf_tag("str2");
Let us say we introduce an LLVM vendor attribute DWARF_AT_LLVM_bpf_tag.

How do you want the above to be represented in dwarf?

My current scheme is to put all bpf_tag's in a string, separated by ",".
This will make things simpler. So the final output will be
    DWARF_AT_LLVM_bpf_tag "str1,str2"
I may need to discuss with the kernel folks whether to use a different
delimiter than ",", but we would still represent all tags with ONE string.

But alternatively, it could be represented as a list of strings like
  DWARF_AT_LLVM_bpf_tag
            "str1"
            "str2"
which is similar to DWARF_AT_location.

What DWARF form were you thinking of using for this? There isn't a built in form that provides encoding for multiple delimited/separated strings that I know of.

Actually I have not looked at the details of how to implement multiple
separated strings yet. Since you mention that no such built-in form
exists and the attribute is bpf-specific, I will just go with the
one-string-only approach (e.g. "str1;str2" where ";" is the
delimiter). I just checked linux:include/linux/compiler_*.h; it is
possible that "," may appear in some attributes, so I will use ";" as the
delimiter. Thanks for the clarification!
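
To make the one-string approach concrete, here is a minimal sketch in plain C of joining several tag strings with ";" on the producer side and splitting them again on the consumer side. It only illustrates the encoding idea; `join_tags`, `split_tags` and `TAG_DELIM` are hypothetical names for this sketch and do not come from clang, LLVM or pahole. Rejecting tags that already contain the delimiter is an assumption of this sketch, not something decided in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_DELIM ';' /* delimiter discussed above */

/* Join n tags into buf as "t0;t1;...".
 * Returns 0 on success, -1 if buf is too small or a tag contains the
 * delimiter (which would make the joined string ambiguous). */
int join_tags(const char *const *tags, size_t n, char *buf, size_t buf_len)
{
    assert(buf != NULL && buf_len > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tags[i]);
        if (strchr(tags[i], TAG_DELIM) != NULL)
            return -1;
        size_t need = len + (i > 0 ? 1 : 0);
        if (used + need + 1 > buf_len) /* +1 for the trailing NUL */
            return -1;
        if (i > 0)
            buf[used++] = TAG_DELIM;
        memcpy(buf + used, tags[i], len);
        used += len;
        buf[used] = '\0';
    }
    return 0;
}

/* Split a joined string without modifying it. For each of the first
 * max tags, store a pointer to its start and its length.
 * Returns the total number of tags found. */
size_t split_tags(const char *joined, const char **starts, size_t *lens,
                  size_t max)
{
    size_t count = 0;
    const char *p = joined;
    if (*p == '\0')
        return 0;
    for (;;) {
        const char *end = strchr(p, TAG_DELIM);
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (count < max) {
            starts[count] = p;
            lens[count] = len;
        }
        count++;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return count;
}
```

For the `int g __bpf_tag("str1") __bpf_tag("str2");` example, joining the two tags gives "str1;str2", and splitting that string yields the two original tags again.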

Do you need to support multiple distinct __attribute__((XXX("stuff"))) on one entity? If so, maybe it's worth considering how to encode them separately, rather than having the frontend have to concatenate them together?

One option would be to support multiple of the same attribute on the DIE in question - though that's probably still difficult to encode in the LLVM IR metadata (we don't have any repeating fields in the LLVM IR debug info metadata) - which, maybe comes back to the idea of having the frontend concatenate all the attributes together with some separator like ";".

I'd prefer not to have to parse strings and would rather have multiple
individual "tag" attributes, but it seems the DWARFv5 reference
explicitly prohibits multiple attributes with the same name under a
single DIE:

  2.2 Attribute Types
  Each attribute value is characterized by an attribute name. No more than one
  attribute with a given name may appear in any debugging information entry.
  There are no limitations on the ordering of attributes within a debugging
  information entry.
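
The consequence of that rule can be shown with a toy model in plain C. This is only a sketch under stated assumptions: `struct die_model` and `die_add_attr` are hypothetical names invented here, not LLVM or libdwarf APIs, and the attribute name DWARF_AT_LLVM_bpf_tag is the placeholder spelling used earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8

/* Hypothetical, simplified model of a debugging information entry. */
struct die_model {
    const char *names[MAX_ATTRS];
    const char *values[MAX_ATTRS];
    size_t count;
};

/* Add an attribute to the entry. A second attribute with a name that is
 * already present is refused, mirroring the "no more than one attribute
 * with a given name" rule quoted above. */
bool die_add_attr(struct die_model *die, const char *name, const char *value)
{
    assert(die != NULL && name != NULL);
    if (die->count == MAX_ATTRS)
        return false;
    for (size_t i = 0; i < die->count; i++)
        if (strcmp(die->names[i], name) == 0)
            return false;
    die->names[die->count] = name;
    die->values[die->count] = value;
    die->count++;
    return true;
}
```

In this model a second DWARF_AT_LLVM_bpf_tag on the same entry is rejected, so two source-level tags on one declaration cannot be carried as two same-named attributes on a single DIE.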