RFC: new intrinsic llvm.memcmp?

I propose a new intrinsic "llvm.memcmp" that compares two blocks of memory
for equality (a subset of the libc memcmp behavior). Backends are free to use
the alignment to optimize using wider-than-byte operations. Since the result
is only equal/not-equal, byte order is not important.

For languages that support array compares, this would be very useful.

Syntax:

   declare i1 @llvm.memcmp(i8* <arg1>, i8* <arg2>, i32 <len>, i32 <align>)
   declare i1 @llvm.memcmp(i8* <arg1>, i8* <arg2>, i64 <len>, i32 <align>)

Overview:

The 'llvm.memcmp.*' intrinsic compares two blocks of memory for equality,
returning true if they are equal.

Arguments:

The first two arguments are pointers to the memory to be compared.
The third argument is an integer specifying the number of bytes to compare.
The fourth argument is the alignment of the two memory locations.

If the call to this intrinsic has an alignment value that is not 0 or 1,
then the caller guarantees that both source pointers are aligned to that boundary.
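
To make the intended result concrete, here is a minimal C sketch of the semantics described above. The function name memcmp_eq_model is hypothetical and is not part of the proposal. It assumes, as the description implies, that the alignment argument is purely an optimization hint and never changes the result.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference model of the proposed intrinsic's result:
   true when the first len bytes at a and b are identical.
   The align argument is treated as an optimization hint only,
   so this byte-wise model ignores it. */
static bool memcmp_eq_model(const unsigned char *a, const unsigned char *b,
                            size_t len, unsigned align)
{
    (void)align;
    for (size_t i = 0; i < len; ++i) {
        if (a[i] != b[i])
            return false;   /* first mismatch decides the result */
    }
    return true;            /* also covers len == 0 */
}
```

A backend would be free to produce the same answer with wider loads when the stated alignment allows it.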

>    declare i1 @llvm.memcmp(i8* <arg1>, i8* <arg2>, i32 <len>, i32 <align>)
>    declare i1 @llvm.memcmp(i8* <arg1>, i8* <arg2>, i64 <len>, i32 <align>)

The following would be preferred:
   declare i1 @llvm.memcmp.i32(i8* <arg1>, i8* <arg2>, i32 <len>, i32 <align>)
   declare i1 @llvm.memcmp.i64(i8* <arg1>, i8* <arg2>, i64 <len>, i32 <align>)

I assume <align> is required to be a constant integer?

Also, I assume this is supposed to guarantee that arg1 and arg2 point
to len bytes of valid memory?

Most importantly, is the overhead of calling memcmp actually
significant for your application? Are there enough other people in
the same situation to make this worth implementing? This is unlikely
to provide significant performance improvements for C/C++ code...

-Eli

> The following would be preferred:
>    declare i1 @llvm.memcmp.i32(i8* <arg1>, i8* <arg2>, i32 <len>, i32 <align>)
>    declare i1 @llvm.memcmp.i64(i8* <arg1>, i8* <arg2>, i64 <len>, i32 <align>)

OK. I had assumed that it would be overloaded, as memcpy is.

> I assume <align> is required to be a constant integer?

Yes.

> Also, I assume this is supposed to guarantee that arg1 and arg2 point
> to len bytes of valid memory?

Yes.

> Most importantly, is the overhead of calling memcmp actually
> significant for your application? Are there enough other people in
> the same situation to make this worth implementing? This is unlikely
> to provide significant performance improvements for C/C++ code...

I suppose it wouldn't help C/C++ much. But in languages that support array compares directly, e.g. "D", memcmp() is not actually called, and array compares can be quite common. For example, in IPv6 address compares, where 16 bytes are compared, using bigger chunks (assuming the addresses are suitably aligned) can be a big saving in an IPv6 stack.

Of course, each front end could expand an aligned memcmp into chunk compares itself, but that requires knowledge of what widths the target can handle.
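
As an illustration of the chunk-compare expansion mentioned above, here is a rough C sketch for the 16-byte IPv6 case. The function name ipv6_addr_equal is hypothetical, and the choice of two 64-bit chunks is an assumption about the target, which is exactly the knowledge a front end would need. memcpy is used for the loads so the sketch stays valid C regardless of the actual alignment; compilers typically turn these fixed-size copies into plain loads.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical expansion of a 16-byte equality compare into two
   64-bit chunks. Only equality is tested, so the byte order within
   each chunk does not matter. */
static bool ipv6_addr_equal(const void *a, const void *b)
{
    uint64_t a_lo, a_hi, b_lo, b_hi;

    /* Load each address as two 8-byte chunks. */
    memcpy(&a_lo, a, 8);
    memcpy(&a_hi, (const unsigned char *)a + 8, 8);
    memcpy(&b_lo, b, 8);
    memcpy(&b_hi, (const unsigned char *)b + 8, 8);

    /* XOR is zero only for identical chunks; OR merges both checks. */
    return ((a_lo ^ b_lo) | (a_hi ^ b_hi)) == 0;
}
```

On a target without efficient 64-bit operations the same idea would use four 32-bit chunks instead, which is the kind of decision the intrinsic would leave to the backend.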

bagel

This looks more like user code than a language feature to me...

cheers,
--renato