One little test, comparing clang and gcc

Hi all,
          I just wanted to try clang, so I compiled the current trunk.

clang version 1.1 (trunk 85507)
Target: x86_64-unknown-linux-gnu
Thread model: posix

Then I compiled with it and gcc-4.4.2 the appended code with:
1) $(CC) -Wall -O0
2) $(CC) -Wall -O2
3) $(CC) -Wall -O3

No errors, good.
ls -l:
27713 2009-10-29 17:48 time_tests_clang
19346 2009-10-29 17:49 time_tests_clang_02
19346 2009-10-29 17:49 time_tests_clang_03
23329 2009-10-29 17:48 time_tests_gcc
19039 2009-10-29 17:48 time_tests_gcc_02
19039 2009-10-29 17:49 time_tests_gcc_03

strip --strip-all
ls -l:
23512 2009-10-29 17:53 time_tests_clang
15280 2009-10-29 17:53 time_tests_clang_02
15280 2009-10-29 17:53 time_tests_clang_03
19128 2009-10-29 17:52 time_tests_gcc
15008 2009-10-29 17:52 time_tests_gcc_02
15008 2009-10-29 17:52 time_tests_gcc_03

A bit bloated the unoptimized version, but the optimized version is good.

These are the sha1sums of the stripped files:

c99fed6779e4b6078b8a9eca80c64a31e741ac61 time_tests_clang
e162a06e483d14d6baa64d2bfd519b8d2e1e0d0c time_tests_clang_02
e162a06e483d14d6baa64d2bfd519b8d2e1e0d0c time_tests_clang_03
62579489108af5fc4bb4d4d57d4d8e5a1537b4ee time_tests_gcc
ee582023b6acbbbb5235e674534d5949bb18fb0f time_tests_gcc_02
5f83c4e4474001632590f5689e9eea84c537145a time_tests_gcc_03

The -O3 switch isn't really implemented or the new optimizations
activated simply didn't find anything to modify on this code?

This is the code (taken from Programming Pearls 2nd ed., by Jon Bentley):
(with some irrelevant modifications by me)

/* Copyright (C) 1999 Lucent Technologies */
/* From 'Programming Pearls' by Jon Bentley */

/* timemod.c -- Produce table of C run time costs */

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>

#define MAXN 100000
int x[MAXN];
int startn = 5000;
int n;

/* FUNCTIONS TO BE TIMED */

int intcmp(int *i, int *j)
{
  return *i - *j;
}

#define swapmac(i, j) { t = x[i]; x[i] = x[j]; x[j] = t; }

void swapfunc(int i, int j)
{
  int t = x[i];
  x[i] = x[j];
  x[j] = t;
}

static inline void swap_func_inline(int arr[], int i, int j)
{
  int t = arr[i];
  arr[i] = arr[j];
  arr[j] = t;
}

#define maxmac(a, b) ((a) > (b) ? (a) : (b))

int maxfunc(int a, int b)
{
  return a > b ? a : b;
}

/* WORKHORSE */

#define T(s, n)\
do {\
  typeof(s) s__ = s;\
  typeof(n) n__ = n;\
  printf("%s (n=%d)\n", s__, n__);\
} while (0)

#define TRIALS 5

#define M(op) do { \
  printf(" %-26s", #op); \
  k = 0; \
  timesum = 0; \
  for (ex = 0; ex < TRIALS; ex++) { \
    start = clock(); \
    for (i = 1; i <= n; i++) { \
      fi = (float) i; \
      for (j = 1; j <= n; j++) { \
        op; \
      } \
    } \
    t = clock()-start; \
    printf("%8d", t); \
    timesum += t; \
  } \
  nans = 1e9 * timesum / ((double) \
    n*n * TRIALS * CLOCKS_PER_SEC); \
  printf("%8.0f\n", nans); } while (0)

int main()
{
  int i, j, k;
  float fi, fj, fk;
  int t, ex, timesum, start, globalstart;
  double nans;

  globalstart = clock();
  for (i = 0; i < n; i++)
    x[i] = rand();
  n = startn;
  printf("C Time Cost Model, n=%d\n", n);
  T("\nInteger Arithmetic", n);
  M({});
  M(k++);
  M(k = i);
  M(k = i + j);
  M(k = i - j);
  M(k = i * j);
  M(k = i / j);
  M(k = i % j);
  M(k = i & j);
  M(k = i | j);
  T("\nFloating Point Arithmetic", n);
  M(fj=j;);
  M(fj=j; fk = fi + fj;);
  M(fj=j; fk = fi - fj;);
  M(fj=j; fk = fi * fj;);
  M(fj=j; fk = fi / fj;);
  T("\nArray Operations", n);
  M(k = i + j);
  M(k = x[i] + j);
  M(k = i + x[j]);
  M(k = x[i] + x[j]);
  T("\nComparisons", n);
  M(if (i < j) k++);
  M(if (x[i] < x[j]) k++);
  T("\nArray Comparisons and Swaps", n);
  M(k = (x[i]<x[k]) ? -1 : 1);
  M(k = intcmp(x+i, x+j));
  M(swapmac(i, j));
  M(swapfunc(i, j));
  M(swap_func_inline(x, i, j));
  T("\nMax Function, Macro and Inline", n);
  M(k = (i > j) ? i : j);
  M(k = maxmac(i, j));
  M(k = maxfunc(i, j));
  n = startn / 5;
  T("\nMath Functions", n);
  M(fk = j+fi;);
  M(k = rand(););
  M(fk = sqrt(j+fi));
  M(fk = sin(j+fi));
  M(fk = sinh(j+fi));
  M(fk = asin(j+fi));
  M(fk = cos(j+fi));
  M(fk = tan(j+fi));
  n = startn / 10;
  T("\nMemory Allocation", n);
  M(free(malloc(16)););
  M(free(malloc(100)););
  M(free(malloc(2000)););

  /* Possible additions: input, output, malloc */
  printf("\n Secs: %4.2f\n",
    ((double) clock()-globalstart)
    / CLOCKS_PER_SEC);
  return 0;
}

These are the execution results:

->time_tests_gcc:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 60000 70000 60000 70000 60000 3
k++ 60000 60000 60000 60000 70000 2
k = i 60000 70000 60000 70000 60000 3
k = i + j 60000 50000 60000 60000 50000 2
k = i - j 60000 60000 70000 60000 70000 3
k = i * j 50000 60000 50000 60000 60000 2
k = i / j 100000 100000 100000 110000 100000 4
k = i % j 100000 100000 110000 100000 100000 4
k = i & j 60000 50000 60000 60000 50000 2
k = i | j 60000 50000 60000 60000 50000 2

Floating Point Arithmetic (n=5000)
fj=j; 70000 60000 70000 60000 70000 3
fj=j; fk = fi + fj; 120000 120000 120000 120000 120000 5
fj=j; fk = fi - fj; 130000 120000 120000 120000 120000 5
fj=j; fk = fi * fj; 130000 130000 130000 130000 130000 5
fj=j; fk = fi / fj; 230000 220000 230000 220000 220000 9

Array Operations (n=5000)
k = i + j 60000 60000 50000 60000 50000 2
k = x[i] + j 60000 60000 50000 60000 60000 2
k = i + x[j] 60000 60000 60000 60000 60000 2
k = x[i] + x[j] 70000 70000 70000 70000 70000 3

Comparisons (n=5000)
if (i < j) k++ 60000 70000 60000 60000 60000 2
if (x[i] < x[j]) k++ 70000 80000 70000 70000 80000 3

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 70000 70000 70000 80000 70000 3
k = intcmp(x+i, x+j) 160000 160000 170000 160000 160000 6
swapmac(i, j) 110000 120000 110000 120000 110000 5
swapfunc(i, j) 200000 200000 200000 200000 210000 8
swap_func_inline(x, i, j) 210000 220000 220000 210000 220000 9

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 70000 70000 60000 70000 70000 3
k = maxmac(i, j) 60000 70000 70000 70000 60000 3
k = maxfunc(i, j) 140000 140000 140000 140000 140000 6

Math Functions (n=1000)
fk = j+fi; 10000 0 0 0 0 2
k = rand(); 20000 10000 10000 10000 10000 12
fk = sqrt(j+fi) 20000 10000 20000 20000 20000 18
fk = sin(j+fi) 50000 60000 60000 60000 60000 58
fk = sinh(j+fi) 30000 20000 30000 30000 30000 28
fk = asin(j+fi) 30000 20000 30000 30000 20000 26
fk = cos(j+fi) 60000 60000 60000 60000 60000 60
fk = tan(j+fi) 80000 80000 90000 80000 80000 82

Memory Allocation (n=500)
free(malloc(16)); 10000 10000 10000 0 10000 32
free(malloc(100)); 10000 10000 0 10000 10000 32
free(malloc(2000)); 10000 20000 10000 10000 20000 56

  Secs: 15.40

->time_tests_clang:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 60000 70000 60000 70000 60000 3
k++ 60000 50000 60000 60000 50000 2
k = i 70000 60000 70000 60000 70000 3
k = i + j 50000 60000 60000 50000 60000 2
k = i - j 50000 60000 60000 50000 60000 2
k = i * j 60000 60000 60000 60000 60000 2
k = i / j 100000 110000 100000 100000 100000 4
k = i % j 110000 100000 100000 110000 100000 4
k = i & j 60000 50000 60000 60000 50000 2
k = i | j 60000 60000 50000 60000 60000 2

Floating Point Arithmetic (n=5000)
fj=j; 70000 70000 80000 70000 80000 3
fj=j; fk = fi + fj; 120000 120000 120000 120000 130000 5
fj=j; fk = fi - fj; 120000 120000 120000 120000 120000 5
fj=j; fk = fi * fj; 140000 130000 130000 130000 130000 5
fj=j; fk = fi / fj; 240000 250000 240000 240000 240000 10

Array Operations (n=5000)
k = i + j 60000 60000 50000 60000 60000 2
k = x[i] + j 60000 70000 60000 70000 70000 3
k = i + x[j] 60000 70000 60000 70000 70000 3
k = x[i] + x[j] 70000 80000 70000 80000 70000 3

Comparisons (n=5000)
if (i < j) k++ 60000 70000 60000 60000 60000 2
if (x[i] < x[j]) k++ 70000 80000 80000 70000 80000 3

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 80000 90000 80000 90000 90000 3
k = intcmp(x+i, x+j) 170000 180000 170000 180000 180000 7
swapmac(i, j) 120000 110000 120000 110000 120000 5
swapfunc(i, j) 190000 180000 190000 190000 190000 8
swap_func_inline(x, i, j) 220000 210000 220000 220000 220000 9

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 70000 80000 80000 80000 70000 3
k = maxmac(i, j) 80000 80000 80000 80000 80000 3
k = maxfunc(i, j) 180000 170000 170000 170000 170000 7

Math Functions (n=1000)
fk = j+fi; 10000 0 0 0 10000 4
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 20000 20000 20000 20000 20000 20
fk = sin(j+fi) 50000 60000 60000 50000 60000 56
fk = sinh(j+fi) 30000 30000 30000 30000 30000 30
fk = asin(j+fi) 30000 20000 30000 30000 30000 28
fk = cos(j+fi) 60000 50000 60000 60000 60000 58
fk = tan(j+fi) 80000 80000 80000 90000 80000 82

Memory Allocation (n=500)
free(malloc(16)); 10000 0 10000 10000 10000 32
free(malloc(100)); 0 10000 10000 10000 0 24
free(malloc(2000)); 20000 10000 10000 20000 10000 56

  Secs: 15.99

->time_tests_gcc_02:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 0 0 0 0 0 0
k++ 0 0 0 0 0 0
k = i 0 0 0 0 0 0
k = i + j 0 0 0 0 0 0
k = i - j 0 0 0 0 0 0
k = i * j 0 0 0 0 0 0
k = i / j 0 0 0 0 0 0
k = i % j 0 0 0 0 0 0
k = i & j 0 0 0 0 0 0
k = i | j 0 0 0 0 0 0

Floating Point Arithmetic (n=5000)
fj=j; 0 0 0 0 0 0
fj=j; fk = fi + fj; 0 0 0 0 0 0
fj=j; fk = fi - fj; 0 0 0 0 0 0
fj=j; fk = fi * fj; 0 0 0 0 0 0
fj=j; fk = fi / fj; 0 0 0 0 0 0

Array Operations (n=5000)
k = i + j 0 0 0 0 0 0
k = x[i] + j 0 0 0 0 0 0
k = i + x[j] 0 0 0 0 0 0
k = x[i] + x[j] 0 0 0 0 0 0

Comparisons (n=5000)
if (i < j) k++ 0 0 0 0 0 0
if (x[i] < x[j]) k++ 0 0 0 0 0 0

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 0 0 0 0 0 0
k = intcmp(x+i, x+j) 0 0 0 0 0 0
swapmac(i, j) 30000 30000 30000 20000 30000 1
swapfunc(i, j) 30000 30000 30000 30000 20000 1
swap_func_inline(x, i, j) 40000 30000 40000 30000 40000 1

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 0 0 0 0 0 0
k = maxmac(i, j) 0 0 0 0 0 0
k = maxfunc(i, j) 0 0 0 0 0 0

Math Functions (n=1000)
fk = j+fi; 0 0 0 0 0 0
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 0 10000 0 0 0 2
fk = sin(j+fi) 0 0 0 0 0 0
fk = sinh(j+fi) 20000 20000 20000 20000 20000 20
fk = asin(j+fi) 20000 20000 20000 20000 20000 20
fk = cos(j+fi) 0 0 0 0 0 0
fk = tan(j+fi) 0 0 0 0 0 0

Memory Allocation (n=500)
free(malloc(16)); 10000 10000 0 10000 10000 32
free(malloc(100)); 0 10000 10000 10000 10000 32
free(malloc(2000)); 10000 10000 20000 10000 10000 48

  Secs: 0.86

->time_tests_clang_02
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 10000 20000 20000 20000 20000 1
k++ 20000 20000 20000 20000 20000 1
k = i 10000 20000 20000 20000 20000 1
k = i + j 20000 20000 20000 20000 10000 1
k = i - j 20000 20000 20000 20000 20000 1
k = i * j 20000 20000 20000 10000 20000 1
k = i / j 20000 20000 20000 20000 20000 1
k = i % j 20000 20000 20000 10000 20000 1
k = i & j 20000 20000 20000 20000 20000 1
k = i | j 20000 20000 10000 20000 20000 1

Floating Point Arithmetic (n=5000)
fj=j; 20000 20000 20000 20000 20000 1
fj=j; fk = fi + fj; 20000 20000 20000 20000 20000 1
fj=j; fk = fi - fj; 10000 20000 20000 20000 20000 1
fj=j; fk = fi * fj; 20000 20000 20000 20000 20000 1
fj=j; fk = fi / fj; 20000 20000 20000 20000 20000 1

Array Operations (n=5000)
k = i + j 20000 20000 10000 20000 20000 1
k = x[i] + j 20000 20000 20000 20000 20000 1
k = i + x[j] 20000 20000 20000 20000 20000 1
k = x[i] + x[j] 20000 10000 20000 20000 20000 1

Comparisons (n=5000)
if (i < j) k++ 20000 20000 20000 20000 20000 1
if (x[i] < x[j]) k++ 20000 20000 20000 20000 10000 1

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 20000 20000 20000 20000 20000 1
k = intcmp(x+i, x+j) 20000 20000 20000 20000 10000 1
swapmac(i, j) 30000 30000 30000 30000 30000 1
swapfunc(i, j) 30000 20000 30000 30000 30000 1
swap_func_inline(x, i, j) 30000 30000 30000 30000 20000 1

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 20000 20000 20000 20000 20000 1
k = maxmac(i, j) 20000 20000 10000 20000 20000 1
k = maxfunc(i, j) 20000 20000 20000 20000 20000 1

Math Functions (n=1000)
fk = j+fi; 0 0 0 0 0 0
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 20000 20000 20000 10000 20000 18
fk = sin(j+fi) 60000 50000 60000 60000 50000 56
fk = sinh(j+fi) 30000 30000 30000 20000 30000 28
fk = asin(j+fi) 30000 30000 20000 30000 30000 28
fk = cos(j+fi) 60000 50000 60000 60000 50000 56
fk = tan(j+fi) 80000 80000 90000 80000 80000 82

Memory Allocation (n=500)
free(malloc(16)); 0 0 0 0 0 0
free(malloc(100)); 0 0 0 0 0 0
free(malloc(2000)); 0 0 0 0 0 0

  Secs: 4.30

->time_tests_gcc_03:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 0 0 0 0 0 0
k++ 0 0 0 0 0 0
k = i 0 0 0 0 0 0
k = i + j 0 0 0 0 0 0
k = i - j 0 0 0 0 0 0
k = i * j 0 0 0 0 0 0
k = i / j 0 0 0 0 0 0
k = i % j 0 0 0 0 0 0
k = i & j 0 0 0 0 0 0
k = i | j 0 0 0 0 0 0

Floating Point Arithmetic (n=5000)
fj=j; 0 0 0 0 0 0
fj=j; fk = fi + fj; 0 0 0 0 0 0
fj=j; fk = fi - fj; 0 0 0 0 0 0
fj=j; fk = fi * fj; 0 0 0 0 0 0
fj=j; fk = fi / fj; 0 0 0 0 0 0

Array Operations (n=5000)
k = i + j 0 0 0 0 0 0
k = x[i] + j 0 0 0 0 0 0
k = i + x[j] 0 0 0 0 0 0
k = x[i] + x[j] 0 0 0 0 0 0

Comparisons (n=5000)
if (i < j) k++ 0 0 0 0 0 0
if (x[i] < x[j]) k++ 0 0 0 0 0 0

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 0 0 0 0 0 0
k = intcmp(x+i, x+j) 0 0 0 0 0 0
swapmac(i, j) 30000 30000 30000 20000 30000 1
swapfunc(i, j) 30000 30000 30000 30000 30000 1
swap_func_inline(x, i, j) 30000 40000 30000 40000 30000 1

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 0 0 0 0 0 0
k = maxmac(i, j) 0 0 0 0 0 0
k = maxfunc(i, j) 0 0 0 0 0 0

Math Functions (n=1000)
fk = j+fi; 0 0 0 0 0 0
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 0 10000 0 0 0 2
fk = sin(j+fi) 0 0 0 0 0 0
fk = sinh(j+fi) 20000 20000 20000 20000 20000 20
fk = asin(j+fi) 20000 20000 20000 20000 20000 20
fk = cos(j+fi) 0 0 0 0 0 0
fk = tan(j+fi) 0 0 0 0 0 0

Memory Allocation (n=500)
free(malloc(16)); 10000 0 10000 10000 10000 32
free(malloc(100)); 0 10000 10000 10000 0 24
free(malloc(2000)); 20000 10000 20000 10000 10000 56

  Secs: 0.86

time_tests_clang_02 and time_tests_clang_03 are the same, so testing
it again is irrelevant.

Disregarding the difference between time_tests_gcc_02 and
time_tests_gcc_03, which doesn't seems relevant, is the difference
between time_tests_gcc_02 and time_tests_clang_02 something expected?

I'm talking about what gcc's optimizations seems to have removed while
clang seems to have maintained. Is that a design choice or something
that simply still needs work?

Giangiacomo Mariotti

Hi Giangiacomo,

Hi all,
I just wanted to try clang, so I compiled the current trunk.

clang version 1.1 (trunk 85507)
Target: x86_64-unknown-linux-gnu
Thread model: posix

Then I compiled with it and gcc-4.4.2 the appended code with:
1) $(CC) -Wall -O0
2) $(CC) -Wall -O2
3) $(CC) -Wall -O3

No errors, good.
ls -l:
27713 2009-10-29 17:48 time_tests_clang
19346 2009-10-29 17:49 time_tests_clang_02
19346 2009-10-29 17:49 time_tests_clang_03
23329 2009-10-29 17:48 time_tests_gcc
19039 2009-10-29 17:48 time_tests_gcc_02
19039 2009-10-29 17:49 time_tests_gcc_03

strip --strip-all
ls -l:
23512 2009-10-29 17:53 time_tests_clang
15280 2009-10-29 17:53 time_tests_clang_02
15280 2009-10-29 17:53 time_tests_clang_03
19128 2009-10-29 17:52 time_tests_gcc
15008 2009-10-29 17:52 time_tests_gcc_02
15008 2009-10-29 17:52 time_tests_gcc_03

A bit bloated the unoptimized version, but the optimized version is good.

Yes, clang's -O0 codegen is known to be more verbose than gcc. We'd
like to fix it, but it isn't a very high priority. Also as you
noticed, there isn't much difference between -O2 and -O3 for clang.

These are the sha1sums of the stripped files:

c99fed6779e4b6078b8a9eca80c64a31e741ac61 time_tests_clang
e162a06e483d14d6baa64d2bfd519b8d2e1e0d0c time_tests_clang_02
e162a06e483d14d6baa64d2bfd519b8d2e1e0d0c time_tests_clang_03
62579489108af5fc4bb4d4d57d4d8e5a1537b4ee time_tests_gcc
ee582023b6acbbbb5235e674534d5949bb18fb0f time_tests_gcc_02
5f83c4e4474001632590f5689e9eea84c537145a time_tests_gcc_03

The -O3 switch isn't really implemented or the new optimizations
activated simply didn't find anything to modify on this code?

It's implemented, it just didn't do anything different.

This is the code (taken from Programming Pearls 2nd ed., by Jon Bentley):
(with some irrelevant modifications by me)

/* Copyright (C) 1999 Lucent Technologies */
/* From 'Programming Pearls' by Jon Bentley */

/* timemod.c -- Produce table of C run time costs */

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>

#define MAXN 100000
int x[MAXN];
int startn = 5000;
int n;

/* FUNCTIONS TO BE TIMED */

int intcmp(int *i, int *j)
{
return *i - *j;
}

#define swapmac(i, j) { t = x[i]; x[i] = x[j]; x[j] = t; }

void swapfunc(int i, int j)
{
int t = x[i];
x[i] = x[j];
x[j] = t;
}

static inline void swap_func_inline(int arr[], int i, int j)
{
int t = arr[i];
arr[i] = arr[j];
arr[j] = t;
}

#define maxmac(a, b) ((a) > (b) ? (a) : (b))

int maxfunc(int a, int b)
{
return a > b ? a : b;
}

/* WORKHORSE */

#define T(s, n)\
do {\
typeof(s) s__ = s;\
typeof(n) n__ = n;\
printf("%s (n=%d)\n", s__, n__);\
} while (0)

#define TRIALS 5

#define M(op) do { \
printf(" %-26s", #op); \
k = 0; \
timesum = 0; \
for (ex = 0; ex < TRIALS; ex++) { \
start = clock(); \
for (i = 1; i <= n; i++) { \
fi = (float) i; \
for (j = 1; j <= n; j++) { \
op; \
} \
} \
t = clock()-start; \
printf("%8d", t); \
timesum += t; \
} \
nans = 1e9 * timesum / ((double) \
n*n * TRIALS * CLOCKS_PER_SEC); \
printf("%8.0f\n", nans); } while (0)

int main()
{
int i, j, k;
float fi, fj, fk;
int t, ex, timesum, start, globalstart;
double nans;

   globalstart = clock\(\);
   for \(i = 0; i &lt; n; i\+\+\)
           x\[i\] = rand\(\);
   n = startn;
   printf\(&quot;C Time Cost Model, n=%d\\n&quot;, n\);
   T\(&quot;\\nInteger Arithmetic&quot;, n\);
   M\(\{\}\);
   M\(k\+\+\);
   M\(k = i\);
   M\(k = i \+ j\);
   M\(k = i \- j\);
   M\(k = i \* j\);
   M\(k = i / j\);
   M\(k = i % j\);
   M\(k = i &amp; j\);
   M\(k = i | j\);
   T\(&quot;\\nFloating Point Arithmetic&quot;, n\);
   M\(fj=j;\);
   M\(fj=j; fk = fi \+ fj;\);
   M\(fj=j; fk = fi \- fj;\);
   M\(fj=j; fk = fi \* fj;\);
   M\(fj=j; fk = fi / fj;\);
   T\(&quot;\\nArray Operations&quot;, n\);
   M\(k = i \+ j\);
   M\(k = x\[i\] \+ j\);
   M\(k = i \+ x\[j\]\);
   M\(k = x\[i\] \+ x\[j\]\);
   T\(&quot;\\nComparisons&quot;, n\);
   M\(if \(i &lt; j\) k\+\+\);
   M\(if \(x\[i\] &lt; x\[j\]\) k\+\+\);
   T\(&quot;\\nArray Comparisons and Swaps&quot;, n\);
   M\(k = \(x\[i\]&lt;x\[k\]\) ? \-1 : 1\);
   M\(k = intcmp\(x\+i, x\+j\)\);
   M\(swapmac\(i, j\)\);
   M\(swapfunc\(i, j\)\);
   M\(swap\_func\_inline\(x, i, j\)\);
   T\(&quot;\\nMax Function, Macro and Inline&quot;, n\);
   M\(k = \(i &gt; j\) ? i : j\);
   M\(k = maxmac\(i, j\)\);
   M\(k = maxfunc\(i, j\)\);
   n = startn / 5;
   T\(&quot;\\nMath Functions&quot;, n\);
   M\(fk = j\+fi;\);
   M\(k = rand\(\);\);
   M\(fk = sqrt\(j\+fi\)\);
   M\(fk = sin\(j\+fi\)\);
   M\(fk = sinh\(j\+fi\)\);
   M\(fk = asin\(j\+fi\)\);
   M\(fk = cos\(j\+fi\)\);
   M\(fk = tan\(j\+fi\)\);
   n = startn / 10;
   T\(&quot;\\nMemory Allocation&quot;, n\);
   M\(free\(malloc\(16\)\);\);
   M\(free\(malloc\(100\)\);\);
   M\(free\(malloc\(2000\)\);\);

   /\* Possible additions: input, output, malloc \*/
   printf\(&quot;\\n  Secs: %4\.2f\\n&quot;,
           \(\(double\) clock\(\)\-globalstart\)
           / CLOCKS\_PER\_SEC\);
   return 0;

}

These are the execution results:

->time_tests_gcc:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 60000 70000 60000 70000 60000 3
k++ 60000 60000 60000 60000 70000 2
k = i 60000 70000 60000 70000 60000 3
k = i + j 60000 50000 60000 60000 50000 2
k = i - j 60000 60000 70000 60000 70000 3
k = i * j 50000 60000 50000 60000 60000 2
k = i / j 100000 100000 100000 110000 100000 4
k = i % j 100000 100000 110000 100000 100000 4
k = i & j 60000 50000 60000 60000 50000 2
k = i | j 60000 50000 60000 60000 50000 2

Floating Point Arithmetic (n=5000)
fj=j; 70000 60000 70000 60000 70000 3
fj=j; fk = fi + fj; 120000 120000 120000 120000 120000 5
fj=j; fk = fi - fj; 130000 120000 120000 120000 120000 5
fj=j; fk = fi * fj; 130000 130000 130000 130000 130000 5
fj=j; fk = fi / fj; 230000 220000 230000 220000 220000 9

Array Operations (n=5000)
k = i + j 60000 60000 50000 60000 50000 2
k = x[i] + j 60000 60000 50000 60000 60000 2
k = i + x[j] 60000 60000 60000 60000 60000 2
k = x[i] + x[j] 70000 70000 70000 70000 70000 3

Comparisons (n=5000)
if (i < j) k++ 60000 70000 60000 60000 60000 2
if (x[i] < x[j]) k++ 70000 80000 70000 70000 80000 3

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 70000 70000 70000 80000 70000 3
k = intcmp(x+i, x+j) 160000 160000 170000 160000 160000 6
swapmac(i, j) 110000 120000 110000 120000 110000 5
swapfunc(i, j) 200000 200000 200000 200000 210000 8
swap_func_inline(x, i, j) 210000 220000 220000 210000 220000 9

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 70000 70000 60000 70000 70000 3
k = maxmac(i, j) 60000 70000 70000 70000 60000 3
k = maxfunc(i, j) 140000 140000 140000 140000 140000 6

Math Functions (n=1000)
fk = j+fi; 10000 0 0 0 0 2
k = rand(); 20000 10000 10000 10000 10000 12
fk = sqrt(j+fi) 20000 10000 20000 20000 20000 18
fk = sin(j+fi) 50000 60000 60000 60000 60000 58
fk = sinh(j+fi) 30000 20000 30000 30000 30000 28
fk = asin(j+fi) 30000 20000 30000 30000 20000 26
fk = cos(j+fi) 60000 60000 60000 60000 60000 60
fk = tan(j+fi) 80000 80000 90000 80000 80000 82

Memory Allocation (n=500)
free(malloc(16)); 10000 10000 10000 0 10000 32
free(malloc(100)); 10000 10000 0 10000 10000 32
free(malloc(2000)); 10000 20000 10000 10000 20000 56

Secs: 15.40

->time_tests_clang:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 60000 70000 60000 70000 60000 3
k++ 60000 50000 60000 60000 50000 2
k = i 70000 60000 70000 60000 70000 3
k = i + j 50000 60000 60000 50000 60000 2
k = i - j 50000 60000 60000 50000 60000 2
k = i * j 60000 60000 60000 60000 60000 2
k = i / j 100000 110000 100000 100000 100000 4
k = i % j 110000 100000 100000 110000 100000 4
k = i & j 60000 50000 60000 60000 50000 2
k = i | j 60000 60000 50000 60000 60000 2

Floating Point Arithmetic (n=5000)
fj=j; 70000 70000 80000 70000 80000 3
fj=j; fk = fi + fj; 120000 120000 120000 120000 130000 5
fj=j; fk = fi - fj; 120000 120000 120000 120000 120000 5
fj=j; fk = fi * fj; 140000 130000 130000 130000 130000 5
fj=j; fk = fi / fj; 240000 250000 240000 240000 240000 10

Array Operations (n=5000)
k = i + j 60000 60000 50000 60000 60000 2
k = x[i] + j 60000 70000 60000 70000 70000 3
k = i + x[j] 60000 70000 60000 70000 70000 3
k = x[i] + x[j] 70000 80000 70000 80000 70000 3

Comparisons (n=5000)
if (i < j) k++ 60000 70000 60000 60000 60000 2
if (x[i] < x[j]) k++ 70000 80000 80000 70000 80000 3

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 80000 90000 80000 90000 90000 3
k = intcmp(x+i, x+j) 170000 180000 170000 180000 180000 7
swapmac(i, j) 120000 110000 120000 110000 120000 5
swapfunc(i, j) 190000 180000 190000 190000 190000 8
swap_func_inline(x, i, j) 220000 210000 220000 220000 220000 9

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 70000 80000 80000 80000 70000 3
k = maxmac(i, j) 80000 80000 80000 80000 80000 3
k = maxfunc(i, j) 180000 170000 170000 170000 170000 7

Math Functions (n=1000)
fk = j+fi; 10000 0 0 0 10000 4
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 20000 20000 20000 20000 20000 20
fk = sin(j+fi) 50000 60000 60000 50000 60000 56
fk = sinh(j+fi) 30000 30000 30000 30000 30000 30
fk = asin(j+fi) 30000 20000 30000 30000 30000 28
fk = cos(j+fi) 60000 50000 60000 60000 60000 58
fk = tan(j+fi) 80000 80000 80000 90000 80000 82

Memory Allocation (n=500)
free(malloc(16)); 10000 0 10000 10000 10000 32
free(malloc(100)); 0 10000 10000 10000 0 24
free(malloc(2000)); 20000 10000 10000 20000 10000 56

Secs: 15.99

->time_tests_gcc_02:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 0 0 0 0 0 0
k++ 0 0 0 0 0 0
k = i 0 0 0 0 0 0
k = i + j 0 0 0 0 0 0
k = i - j 0 0 0 0 0 0
k = i * j 0 0 0 0 0 0
k = i / j 0 0 0 0 0 0
k = i % j 0 0 0 0 0 0
k = i & j 0 0 0 0 0 0
k = i | j 0 0 0 0 0 0

Floating Point Arithmetic (n=5000)
fj=j; 0 0 0 0 0 0
fj=j; fk = fi + fj; 0 0 0 0 0 0
fj=j; fk = fi - fj; 0 0 0 0 0 0
fj=j; fk = fi * fj; 0 0 0 0 0 0
fj=j; fk = fi / fj; 0 0 0 0 0 0

Array Operations (n=5000)
k = i + j 0 0 0 0 0 0
k = x[i] + j 0 0 0 0 0 0
k = i + x[j] 0 0 0 0 0 0
k = x[i] + x[j] 0 0 0 0 0 0

Comparisons (n=5000)
if (i < j) k++ 0 0 0 0 0 0
if (x[i] < x[j]) k++ 0 0 0 0 0 0

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 0 0 0 0 0 0
k = intcmp(x+i, x+j) 0 0 0 0 0 0
swapmac(i, j) 30000 30000 30000 20000 30000 1
swapfunc(i, j) 30000 30000 30000 30000 20000 1
swap_func_inline(x, i, j) 40000 30000 40000 30000 40000 1

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 0 0 0 0 0 0
k = maxmac(i, j) 0 0 0 0 0 0
k = maxfunc(i, j) 0 0 0 0 0 0

Math Functions (n=1000)
fk = j+fi; 0 0 0 0 0 0
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 0 10000 0 0 0 2
fk = sin(j+fi) 0 0 0 0 0 0
fk = sinh(j+fi) 20000 20000 20000 20000 20000 20
fk = asin(j+fi) 20000 20000 20000 20000 20000 20
fk = cos(j+fi) 0 0 0 0 0 0
fk = tan(j+fi) 0 0 0 0 0 0

Memory Allocation (n=500)
free(malloc(16)); 10000 10000 0 10000 10000 32
free(malloc(100)); 0 10000 10000 10000 10000 32
free(malloc(2000)); 10000 10000 20000 10000 10000 48

Secs: 0.86

->time_tests_clang_02
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 10000 20000 20000 20000 20000 1
k++ 20000 20000 20000 20000 20000 1
k = i 10000 20000 20000 20000 20000 1
k = i + j 20000 20000 20000 20000 10000 1
k = i - j 20000 20000 20000 20000 20000 1
k = i * j 20000 20000 20000 10000 20000 1
k = i / j 20000 20000 20000 20000 20000 1
k = i % j 20000 20000 20000 10000 20000 1
k = i & j 20000 20000 20000 20000 20000 1
k = i | j 20000 20000 10000 20000 20000 1

Floating Point Arithmetic (n=5000)
fj=j; 20000 20000 20000 20000 20000 1
fj=j; fk = fi + fj; 20000 20000 20000 20000 20000 1
fj=j; fk = fi - fj; 10000 20000 20000 20000 20000 1
fj=j; fk = fi * fj; 20000 20000 20000 20000 20000 1
fj=j; fk = fi / fj; 20000 20000 20000 20000 20000 1

Array Operations (n=5000)
k = i + j 20000 20000 10000 20000 20000 1
k = x[i] + j 20000 20000 20000 20000 20000 1
k = i + x[j] 20000 20000 20000 20000 20000 1
k = x[i] + x[j] 20000 10000 20000 20000 20000 1

Comparisons (n=5000)
if (i < j) k++ 20000 20000 20000 20000 20000 1
if (x[i] < x[j]) k++ 20000 20000 20000 20000 10000 1

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 20000 20000 20000 20000 20000 1
k = intcmp(x+i, x+j) 20000 20000 20000 20000 10000 1
swapmac(i, j) 30000 30000 30000 30000 30000 1
swapfunc(i, j) 30000 20000 30000 30000 30000 1
swap_func_inline(x, i, j) 30000 30000 30000 30000 20000 1

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 20000 20000 20000 20000 20000 1
k = maxmac(i, j) 20000 20000 10000 20000 20000 1
k = maxfunc(i, j) 20000 20000 20000 20000 20000 1

Math Functions (n=1000)
fk = j+fi; 0 0 0 0 0 0
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 20000 20000 20000 10000 20000 18
fk = sin(j+fi) 60000 50000 60000 60000 50000 56
fk = sinh(j+fi) 30000 30000 30000 20000 30000 28
fk = asin(j+fi) 30000 30000 20000 30000 30000 28
fk = cos(j+fi) 60000 50000 60000 60000 50000 56
fk = tan(j+fi) 80000 80000 90000 80000 80000 82

Memory Allocation (n=500)
free(malloc(16)); 0 0 0 0 0 0
free(malloc(100)); 0 0 0 0 0 0
free(malloc(2000)); 0 0 0 0 0 0

Secs: 4.30

->time_tests_gcc_03:
C Time Cost Model, n=5000

Integer Arithmetic (n=5000)
{} 0 0 0 0 0 0
k++ 0 0 0 0 0 0
k = i 0 0 0 0 0 0
k = i + j 0 0 0 0 0 0
k = i - j 0 0 0 0 0 0
k = i * j 0 0 0 0 0 0
k = i / j 0 0 0 0 0 0
k = i % j 0 0 0 0 0 0
k = i & j 0 0 0 0 0 0
k = i | j 0 0 0 0 0 0

Floating Point Arithmetic (n=5000)
fj=j; 0 0 0 0 0 0
fj=j; fk = fi + fj; 0 0 0 0 0 0
fj=j; fk = fi - fj; 0 0 0 0 0 0
fj=j; fk = fi * fj; 0 0 0 0 0 0
fj=j; fk = fi / fj; 0 0 0 0 0 0

Array Operations (n=5000)
k = i + j 0 0 0 0 0 0
k = x[i] + j 0 0 0 0 0 0
k = i + x[j] 0 0 0 0 0 0
k = x[i] + x[j] 0 0 0 0 0 0

Comparisons (n=5000)
if (i < j) k++ 0 0 0 0 0 0
if (x[i] < x[j]) k++ 0 0 0 0 0 0

Array Comparisons and Swaps (n=5000)
k = (x[i]<x[k]) ? -1 : 1 0 0 0 0 0 0
k = intcmp(x+i, x+j) 0 0 0 0 0 0
swapmac(i, j) 30000 30000 30000 20000 30000 1
swapfunc(i, j) 30000 30000 30000 30000 30000 1
swap_func_inline(x, i, j) 30000 40000 30000 40000 30000 1

Max Function, Macro and Inline (n=5000)
k = (i > j) ? i : j 0 0 0 0 0 0
k = maxmac(i, j) 0 0 0 0 0 0
k = maxfunc(i, j) 0 0 0 0 0 0

Math Functions (n=1000)
fk = j+fi; 0 0 0 0 0 0
k = rand(); 10000 10000 10000 10000 10000 10
fk = sqrt(j+fi) 0 10000 0 0 0 2
fk = sin(j+fi) 0 0 0 0 0 0
fk = sinh(j+fi) 20000 20000 20000 20000 20000 20
fk = asin(j+fi) 20000 20000 20000 20000 20000 20
fk = cos(j+fi) 0 0 0 0 0 0
fk = tan(j+fi) 0 0 0 0 0 0

Memory Allocation (n=500)
free(malloc(16)); 10000 0 10000 10000 10000 32
free(malloc(100)); 0 10000 10000 10000 0 24
free(malloc(2000)); 20000 10000 20000 10000 10000 56

Secs: 0.86

time_tests_clang_02 and time_tests_clang_03 are the same, so testing
it again is irrelevant.

Disregarding the difference between time_tests_gcc_02 and
time_tests_gcc_03, which doesn't seems relevant, is the difference
between time_tests_gcc_02 and time_tests_clang_02 something expected?

I'm talking about what gcc's optimizations seems to have removed while
clang seems to have maintained. Is that a design choice or something
that simply still needs work?

I don't know the answers to these questions without digging in to your
example -- there is a lot of information here which I find hard to
read. I think you are asking about the time difference between
clang_02 and gcc_02, this is probably due to some missed optimization
which of course we always would like to fix. However, you'll have to
actually dig in to the example to find out what is going on. If you
can identify a small test case where clang clearly isn't optimizing
something it should, please file a bugzilla for it
(http://llvm.org/bugs).

- Daniel

Let's make the example code much shorter:

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define MAXN 100000
int x[MAXN];
int startn = 5000;
int n;

#define T(s, n)\
do {\
  typeof(s) s__ = s;\
  typeof(n) n__ = n;\
  printf("%s (n=%d)\n", s__, n__);\
} while (0)

#define TRIALS 5

#define M(op) do { \
  printf(" %-26s", #op); \
  k = 0; \
  timesum = 0; \
  for (ex = 0; ex < TRIALS; ex++) { \
    start = clock(); \
    for (i = 1; i <= n; i++) { \
      fi = (float) i; \
      for (j = 1; j <= n; j++) { \
        op; \
      } \
    } \
    t = clock()-start; \
    printf("%8d", t); \
    timesum += t; \
  } \
  nans = 1e9 * timesum / ((double) \
    n*n * TRIALS * CLOCKS_PER_SEC); \
  printf("%8.0f\n", nans); } while (0)

int main(void)
{
  int i, j, k;
  float fi;
  int t, ex, timesum, start;
  double nans;

  for (i = 0; i < n; i++)
    x[i] = rand();
  n = startn;
  printf("C Time Cost Model, n=%d\n", n);

  M({});

  return 0;
}

Here, if we look at the instruction that is timed and we ask ourself,
what is that instruction doing? The answer is simply: absolutely
nothing.

Now these are the results for the code compiled as explained on the
previous mail:

->time_tests_001_gcc_O2
C Time Cost Model, n=5000
{} 0 0 0 0 0 0

->time_tests_001_clang_O2
C Time Cost Model, n=5000
{} 10000 20000 20000 20000 20000 1

So the point is, at optimization level -O2, gcc compiles out code that
does nothing, while clang optimizes it, but it still leaves the code
there.
(Notice that I haven't inspected the object file, so I may be wrong. I
wanted to post this here, before doing that)

Giangiacomo Mariotti

And even shorter:

int n = 50000;

int main(void) {
   int i, j;
   for (i = 1; i <= n; i++) {
     for (j = 1; j <= n; j++) {
     }
   }
   return 0;
}

Please file a PR for this on the website. Should be trivial enough for them to fix. Thanks.

Mike Stump wrote:

That's why I wanted to ask about this here, before start talking about
bugs and whatnot. It may have been a design choice or a work in
progress.

So it is a design choice? Is it explicitly documented somewhere?(So
that I can see what are the other known differences in optimizations
with gcc) IMO this could be an interesting information for those who
start comparing the performance of code compiled with this and the
other compilers.

Giangiacomo Mariotti

That should probably be a target-specific decision. On a multitasking operating system, you want to optimise these away, because you don't get a deterministic delay anyway (so running it or not running it does not alter the language model) and if you want a short wait then you should call something like sleep(0) or sched_yield(). On something like a PIC16 target with no OS, this code was probably intended as a delay loop.

In the original example, it's actually slightly more subtle; the original code had some things in the loop, but they were optimised away during macro expansion (op) and unused variable elimination (fi = (float) i). In this case, removing the loop always makes sense.

David

-- Sent from my brain

We intentionally do optimize them out. Consider if this was a generic c++ iterator loop who was instantiated on a type with a trivial destructor. We'd want the loop to be nuked.

-Chris

Good, but that's not happening for the code I posted. So, is this a
work in progress?

Giangiacomo

I'd say this is a "missed optimization" bug. Please file a PR in http://llvm.org/bugs/

-Chris

Here it is: http://llvm.org/bugs/show_bug.cgi?id=5355