__attribute__((enable_if(...))) for de-macroifying the builtin headers

In the post-commit of r231792, I suggested the idea of using __attribute__((enable_if(...))) for avoiding the mess of the macros in the builtin headers. AFAIK, the macros are currently used to make sure that the "immediates" are constant expressions, piggybacking on the constant expression check in the __builtin_* call.

I've attached a file with a proof-of-concept for using __attribute__((enable_if(...))) for this purpose. I originally thought of using __builtin_constant_p in the enable_if, but that turns out to not be necessary (see the docs: http://clang.llvm.org/docs/AttributeReference.html#enable-if ; the enable_if condition fails for non-constant expressions anyway). The core is:

// Current builtin headers:
//
// #define _mm256_insertf128_si256(V1, V2, M) __extension__ ({ \
//   (__m256i)__builtin_shufflevector( \
//     (__v4di)(V1), \
//     (__v4di)_mm256_castsi128_si256((__m128i)(V2)), \
//     (((M) & 1) ? 0 : 4), \
//     (((M) & 1) ? 1 : 5), \
//     (((M) & 1) ? 4 : 2), \
//     (((M) & 1) ? 5 : 3) );})

// A bit cleaner.
static __inline __attribute__((__always_inline__, __nodebug__))
__m256i _mm256_insertf128_si256(__m256i __a, __m128i __b, int __imm8)
__attribute__((enable_if(__imm8, "'__imm8' must be a constant")))
{
  if (__imm8 & 1)
    return __builtin_shufflevector(__a, _mm256_castsi128_si256(__b), 0, 1, 4, 5);
  else
    return __builtin_shufflevector(__a, _mm256_castsi128_si256(__b), 4, 5, 2, 3);
}

Nick, are you okay using enable_if for this? It's sort of a hack but if we are going to be carrying this attribute around forever (has it reached that level of compatibility guarantee yet?), we might as well use it to solve this problem for us.

-- Sean Silva

testenableif.cpp (1.61 KB)


I think:

static __inline __attribute__((__always_inline__, __nodebug__))
__m256i _mm256_insertf128_si256(__m256i __a, __m128i __b, int __imm8)
__attribute__((enable_if((__imm8 & 1) == 1, "'__imm8' must be a constant")))
{
  return __builtin_shufflevector(__a, _mm256_castsi128_si256(__b), 0, 1, 4, 5);
}

static __inline __attribute__((__always_inline__, __nodebug__))
__m256i _mm256_insertf128_si256(__m256i __a, __m128i __b, int __imm8)
__attribute__((enable_if((__imm8 & 1) == 0, "'__imm8' must be a constant")))
{
  return __builtin_shufflevector(__a, _mm256_castsi128_si256(__b), 4, 5, 2, 3);
}

Nick, are you okay using enable_if for this? It's sort of a hack but if we
are going to be carrying this attribute around forever (has it reached that
level of compatibility guarantee yet?), we might as well use it to solve
this problem for us.

Attribute enable_if is here to stay, though it may change meaning in corner
cases (notably multiple enable_if's on a single function). Using it in the
compiler's own header files is fine. Does anyone ever try to take the
address of _mm256_insertf128_si256 (I assume not if it's a macro)?


Unfortunately this approach doesn't scale to some of the more nontrivial
ones like _mm256_alignr_epi8 in http://reviews.llvm.org/D8301 .


Yeah, I think taking the address of these is a "don't do that" sort of
thing. "static inline always_inline"

What changes do you foresee in the case of multiple enable_if's? The
current behavior (in my empirical testing; haven't used the source) seems
to be a good fit for this use case.

-- Sean Silva


Unfortunately this approach doesn't scale to some of the more nontrivial
ones like _mm256_alignr_epi8 in http://reviews.llvm.org/D8301 .

Fair enough. I would still try to hoist out a simple if-statement though,
even if we can't do it for all of the builtins.


Yeah, I think taking the address of these is a "don't do that" sort of
thing. "static inline always_inline"

You can still take the address of a static inline always_inline function
(at least with GCC 4.7), but I guess we're okay not allowing that.

What changes do you foresee in the case of multiple enable_if's? The
current behavior (in my empirical testing; haven't used the source) seems
to be a good fit for this use case.

I don't know what we'd change it to, but we currently have a problem where
you end up with a combinatorial explosion in number of overloads necessary
to represent the check you want. I don't recall the situation off the top
of my head.


Fair enough. I would still try to hoist out a simple if-statement though,
even if we can't do it for all of the builtins.

What is the advantage of hoisting it?


You can still take the address of a static inline always_inline function
(at least with GCC 4.7), but I guess we're okay not allowing that.

Yeah, I sent before finishing my sentence, which I think was supposed to be
something like: "static inline always_inline" reads to me like "we really
really really don't want this to end up as a symbol in the object file",
which implies "shouldn't have an address".


I don't know what we'd change it to, but we currently have a problem where
you end up with a combinatorial explosion in number of overloads necessary
to represent the check you want. I don't recall the situation off the top
of my head.

Ok, I can see that. I guess we can cross that bridge when we need to.

-- Sean Silva

Hi Sean -

Thanks for pushing on this.

Re: the complicated cases - in one of the recent patch reviews in this area, Andrea pointed out a major shortcoming of the header hacks: we’re optimizing vector intrinsics into generic IR even at -O0. This won’t make for happy debugging.

So I’ve been optimizing these in InstCombine rather than headers recently:
http://reviews.llvm.org/rL235124
http://reviews.llvm.org/rL232852

This is a much nicer solution IMO, and I haven’t heard any objections to this approach, so I’m hoping we’ll clean up all of the complicated macros eventually.


What is the advantage of hoisting it?

1) We won't emit a redundant branch instruction that the middle-end would
always remove.
2) More reasonable code at -O0.
3) No need for __builtin_constant_p usage (note that your original version
didn't work if __imm8 was 0).

And the disadvantages: we'll produce worse diagnostics (extra notes), and
we'll need __attribute__((overloadable)) for this to work in C.



So I've been optimizing these in InstCombine rather than headers recently:
http://reviews.llvm.org/rL235124
http://reviews.llvm.org/rL232852


How are you handling the frontend checks for them being a compile-time
constant? I think that at the very least the enable_if will allow removing
the macros which I thought were just for that purpose (or are you planning
on adding the intrinsics directly as clang intrinsics?).

-- Sean Silva


1) We won't emit a redundant branch instruction that the middle-end would
always remove.
2) More reasonable code at -O0.
3) No need for __builtin_constant_p usage (note that your original version
didn't work if __imm8 was 0).

Actually, __builtin_constant_p always returns true when used in the context
of the enable_if expression (I didn't look into why; perhaps the expression
is only evaluated when the variables it uses are constants, or something). I
was actually literally using enable_if(__imm8, ...) as a constant test,
since it fails when __imm8 isn't a constant. I stupidly overlooked that
this would fail for the 0 case. This can be handled by enable_if((__imm8 ==
__imm8), ...) or some such tautology, or perhaps the comma operator
(untested): enable_if((__imm8, 1), ...).

-- Sean Silva

Handling the intrinsics in InstCombine decouples the optimization part of
the problem from the constant checking. So if we can handle the simple
macros using enable_if, then we should be able to convert the complicated
macros back into their original simple inline function forms + enable_if
checking and do all of the optimization work in InstCombine for those cases
too.

Put another way: if enable_if works for one case, then we should be able to
make it work for all cases and eventually have a uniform header file; each
intrinsic can be a simple function call with all optimization deferred to
InstCombine.