Tracking variable endianness at compile time (C language)

Hello

I am writing code that deals with low-level PCI machinery. I need to
read data from PCI config MMIO. The data from the device can be 8/16/32
bits long and, per the specification, it is little-endian. I have a set of
macros that convert from little-endian to CPU endianness, and I have to
use these macros every time I access PCI config data. CPU endianness
depends on the CPU architecture and can be either little- or big-endian.
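
For readers who have not seen such macros, here is a minimal sketch of what a conversion set like this might look like. It assumes GCC or Clang (for the predefined `__BYTE_ORDER__` macro and the `__builtin_bswap16`/`__builtin_bswap32` builtins). The names `le16_to_cpu`, `cpu_to_le16`, `read_le16` and friends are hypothetical, not the poster's actual macros, and a plain byte buffer stands in for real config space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC or Clang, which predefine __BYTE_ORDER__ and provide
 * __builtin_bswap16/32. All helper names here are hypothetical. */
#if !defined(__BYTE_ORDER__)
#error "this sketch relies on the GCC/Clang __BYTE_ORDER__ macro"
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
/* Big-endian CPU: little-endian device data must be byte-swapped. */
#define le16_to_cpu(x) __builtin_bswap16((uint16_t)(x))
#define le32_to_cpu(x) __builtin_bswap32((uint32_t)(x))
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
/* Little-endian CPU: device order already matches CPU order. */
#define le16_to_cpu(x) ((uint16_t)(x))
#define le32_to_cpu(x) ((uint32_t)(x))
#else
#error "unsupported byte order"
#endif

/* A byte swap is its own inverse, so both directions share one definition. */
#define cpu_to_le16(x) le16_to_cpu(x)
#define cpu_to_le32(x) le32_to_cpu(x)

/* Read a 16-bit little-endian field from a raw byte buffer. */
static uint16_t read_le16(const void *p)
{
    uint16_t raw;
    memcpy(&raw, p, sizeof raw);
    return le16_to_cpu(raw);
}

/* Read a 32-bit little-endian field from a raw byte buffer. */
static uint32_t read_le32(const void *p)
{
    uint32_t raw;
    memcpy(&raw, p, sizeof raw);
    return le32_to_cpu(raw);
}

/* Usage: the same bytes decode to the same values on any host. */
static void self_check(void)
{
    const unsigned char cfg[4] = { 0x86, 0x80, 0x34, 0x12 };
    assert(read_le16(cfg) == 0x8086);
    assert(read_le32(cfg) == 0x12348086u);
}
```

Nothing in this sketch stops a caller from reading the raw field and forgetting the macro, which is exactly the problem described next.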

I am looking for a way to detect errors when I access
little-endian fields without the proper conversion macros. I am thinking
about the following solution: marker macros called __le and __be that
tell the compiler that a field has a specific endianness:

struct virtio_net_hdr {
    __le uint8_t flags;
    __le uint8_t gso_type;
    __le uint16_t hdr_len;
    __le uint16_t gso_size;
    __le uint16_t csum_start;
    __le uint16_t csum_offset;
};

If we try to access the field as
hdr->hdr_len = 20;

then the compiler should either:

1) report an error and tell me that there is an endianness mismatch
between hdr->hdr_len and the constant '20'. The right way would be to use
a cpu_to_le() macro, like:
hdr->hdr_len = cpu_to_le(20);

2) or maybe completely automate the conversion and call the
__builtin_bswap16()/__builtin_bswap32()/.. functions for us
automatically?

Is there any existing solution that does something similar? A quick
search did not give me anything useful. Could it be part of core
compiler functionality, or a third-party compiler plugin?
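
One existing approach to option 1 that may be worth a look is the Sparse static checker used by the Linux kernel, which supports a `bitwise` type attribute for this kind of marker. The sketch below borrows the idea. The names `BITWISE`, `FORCE`, `le16_t`, `virtio_hdr_sketch` and `store_hdr_len` are hypothetical, and the byte-order test assumes GCC or Clang. A normal compiler sees plain `uint16_t`, so the checking only happens when the file is run through Sparse.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sparse defines __CHECKER__; ordinary compilers get empty macros. */
#ifdef __CHECKER__
#define BITWISE __attribute__((bitwise))
#define FORCE   __attribute__((force))
#else
#define BITWISE
#define FORCE
#endif

/* A 16-bit little-endian value that Sparse treats as a distinct type. */
typedef uint16_t BITWISE le16_t;

static inline le16_t cpu_to_le16(uint16_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap16(v);
#endif
    return (FORCE le16_t)v;      /* the only sanctioned way in */
}

static inline uint16_t le16_to_cpu(le16_t v)
{
    uint16_t raw = (FORCE uint16_t)v;   /* the only sanctioned way out */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    raw = __builtin_bswap16(raw);
#endif
    return raw;
}

/* Hypothetical cut-down header with one annotated field. */
struct virtio_hdr_sketch {
    le16_t hdr_len;
};

/* Store a length in wire (little-endian) order and expose the raw bytes. */
static void store_hdr_len(unsigned char out[2], uint16_t len)
{
    struct virtio_hdr_sketch hdr;

    hdr.hdr_len = cpu_to_le16(len);   /* types match, Sparse is satisfied */
    /* hdr.hdr_len = len;  <- Sparse is expected to flag this as a type
     *                        mismatch; a plain compiler accepts it. */
    memcpy(out, &hdr.hdr_len, sizeof hdr.hdr_len);
    assert(le16_to_cpu(hdr.hdr_len) == len);
}
```

This gives diagnostics only, with no automatic swapping, and only when Sparse is part of the build.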

Hello

Actually, instead of a marker macro, it would be better to have an attribute.
It can be applied to a variable or a struct. If applied to a struct, then
all fields of the struct inherit the given endianness. Here is a
proposed usage example:

__attribute__ ((endianness(little))) uint32_t pci_id;

__attribute__ ((endianness(big))) struct arp_header {
  uint16_t hw_type;
  uint16_t proto_type;
  uint8_t hw_len;
  uint8_t proto_len;
  ...
};

This attribute enables automatic byte swapping when accessing the fields.

UPDATE: After more searching I found that GCC implements the
scalar_storage_order attribute [1]. I tested the following example [2]
with GCC 7.1.0 and I see that gcc emits the 'bswap' instruction when
accessing the struct fields. That is exactly what I am looking
for, but it seems the feature is not present in Clang. I wonder if
the Clang developers have plans to implement it.

[1] Using the GNU Compiler Collection (GCC): Common Type Attributes
[2] c++ - Does g++ support scalar_storage_order? - Stack Overflow
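
To illustrate the attribute mentioned in the update, here is a small sketch of `scalar_storage_order` applied to a cut-down ARP header. It assumes GCC 6 or later compiling as C. On any other compiler the hypothetical `BIG_ENDIAN_FIELDS` macro expands to nothing and the fields keep host order, which is why `HAVE_SSO` (also hypothetical) records whether the attribute is in effect. The struct `arp_prefix` and the helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: only GCC >= 6 in C mode honours scalar_storage_order. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__cplusplus) && __GNUC__ >= 6
#define HAVE_SSO 1
#define BIG_ENDIAN_FIELDS __attribute__((scalar_storage_order("big-endian")))
#else
#define HAVE_SSO 0
#define BIG_ENDIAN_FIELDS
#endif

/* First fields of an ARP header; stored big-endian when HAVE_SSO is 1. */
struct BIG_ENDIAN_FIELDS arp_prefix {
    uint16_t hw_type;
    uint16_t proto_type;
    uint8_t  hw_len;
    uint8_t  proto_len;
};

/* Copy an object's in-memory representation byte by byte. */
static void raw_bytes(unsigned char *out, const void *obj, size_t n)
{
    const unsigned char *p = (const unsigned char *)obj;
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];
}

/* Fill the header with ordinary assignments and dump its memory image.
 * Returns hw_type as read back through the field. */
static uint16_t fill_arp_prefix(unsigned char out[6])
{
    struct arp_prefix h;

    h.hw_type    = 1;        /* no explicit swap: the compiler inserts it */
    h.proto_type = 0x0800;
    h.hw_len     = 6;
    h.proto_len  = 4;

    raw_bytes(out, &h, sizeof h);
    assert(h.proto_type == 0x0800);   /* reads are converted back as well */
    return h.hw_type;
}
```

Field reads and writes look like normal C, while the bytes in memory follow the declared order. The GCC documentation lists restrictions, for example that the address of a scalar field in such a struct cannot be taken.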

If nothing else, please file a bug on bugs.llvm.org requesting the feature.

  -Hal

A solution requiring no compiler support is to wrap the little-endian values in a struct.

template <typename T>
struct little_endian {
    little_endian(T value) : _value(value) {}
    T _value;
    T convert() {
        // conversion here
    }
};

That requires a lot more compiler support than C gives you ;) (The need
for C wasn't explicit in the message body, I think, but it's in the subject
line.)

-- James

Hello

I just filed a bug for this request: https://bugs.llvm.org/show_bug.cgi?id=35293

I'll be glad to test this feature in my project.

Oh, C, well, you could write a struct for each type and… yeah. No one wants to do that. :)
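
For completeness, the "struct for each type" idea can be sketched in plain C as below. This is only an illustration: `le16_t`, `le32_t`, the conversion helpers, `net_hdr_sketch` and `set_and_get_hdr_len` are hypothetical names. The conversions work byte by byte, so the sketch does not depend on compiler builtins or on knowing the host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* One wrapper struct per width; a bare integer cannot be assigned to them. */
typedef struct { uint16_t raw; } le16_t;
typedef struct { uint32_t raw; } le32_t;

_Static_assert(sizeof(le16_t) == sizeof(uint16_t), "wrapper must not add padding");
_Static_assert(sizeof(le32_t) == sizeof(uint32_t), "wrapper must not add padding");

static inline le16_t cpu_to_le16(uint16_t v)
{
    le16_t out;
    unsigned char *b = (unsigned char *)&out.raw;
    b[0] = (unsigned char)(v & 0xff);   /* least significant byte first */
    b[1] = (unsigned char)(v >> 8);
    return out;
}

static inline uint16_t le16_to_cpu(le16_t v)
{
    const unsigned char *b = (const unsigned char *)&v.raw;
    return (uint16_t)(b[0] | (b[1] << 8));
}

static inline le32_t cpu_to_le32(uint32_t v)
{
    le32_t out;
    unsigned char *b = (unsigned char *)&out.raw;
    for (int i = 0; i < 4; i++)
        b[i] = (unsigned char)(v >> (8 * i));
    return out;
}

static inline uint32_t le32_to_cpu(le32_t v)
{
    const unsigned char *b = (const unsigned char *)&v.raw;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical header using the wrapped types. */
struct net_hdr_sketch {
    le16_t hdr_len;
    le32_t features;
};

static uint16_t set_and_get_hdr_len(struct net_hdr_sketch *hdr, uint16_t len)
{
    hdr->hdr_len = cpu_to_le16(len);
    /* hdr->hdr_len = len;  <- does not compile: int assigned to a struct */
    return le16_to_cpu(hdr->hdr_len);
}
```

The cost is the boilerplate the reply jokes about: every width needs its own type and helper pair, and every access has to go through them.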