Runtime linker issue with X11R6 on i386 with -O3 optimization

Hi everybody. I have an odd issue that I'd like to get some advice on.
It is a bit of a long story so please bear with me.

X11R6 has a notion of modules: it compiles basically everything into
shared libraries and loads those libraries (modules) as needed at
startup. A side effect is that the modules require truly lazy binding,
because they do not (cannot?) enforce the load order.
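
To make the lazy-binding point concrete, here is a minimal sketch of
requesting lazy binding through the standard dlopen() interface. It
assumes a dlopen-based loader, which is an assumption and not something
confirmed above, and lookup_lazy() is a hypothetical helper, not an X
function. It opens the running program itself so that it is
self-contained; a real module loader would pass the module's path.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open the running program with lazy binding
 * requested and report whether `symbol` can be found.
 * Returns 1 if the symbol was found, 0 otherwise. */
static int lookup_lazy(const char *symbol)
{
    /* NULL asks for a handle to the main program and its dependencies.
     * RTLD_LAZY defers function symbol resolution until first use. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    int found;

    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 0;
    }

    found = dlsym(handle, symbol) != NULL;
    dlclose(handle);
    return found;
}
```

With RTLD_LAZY, function references that go through the PLT are only
resolved when first called, which is what lets modules be loaded in any
order. References that the compiler emits as direct address loads do
not get that deferral.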

The problem I am seeing is with any optimization higher than -O0 on the
following code:

void
uxa_check_poly_lines(DrawablePtr pDrawable, GCPtr pGC,
         int mode, int npt, DDXPointPtr ppt)
{
  ScreenPtr screen = pDrawable->pScreen;

  UXA_FALLBACK(("to %p (%c), width %d, mode %d, count %d\n",
          pDrawable, uxa_drawable_location(pDrawable),
          pGC->lineWidth, mode, npt));

  if (pGC->lineWidth == 0) {
    if (uxa_prepare_access(pDrawable, UXA_ACCESS_RW)) {
      if (uxa_prepare_access_gc(pGC)) {
        fbPolyLine(pDrawable, pGC, mode, npt, ppt);
        /* ... */
      }
      /* ... */
    }
    return;
  }
  /* fb calls mi functions in the lineWidth != 0 case. */
  fbPolyLine(pDrawable, pGC, mode, npt, ppt);
}

This code is optimized into a tail call (TAILCALL), and that makes X
unhappy. To make things worse, this exact same code works fine on
x86_64; I only see the issue on i386. Admittedly, I have not yet looked
at the x86_64 asm for differences. All the code was compiled using the
clang 3.0 release on OpenBSD.

Prototyping the offending functions with __attribute__((weak)) works
around the problem, but that is pretty ugly and unmaintainable in a
project as old and as large as xorg. Is there a magic flag I can use to
enforce this behavior, or can we consider this a bug of sorts? I get why
clang does what it does; unfortunately, it breaks stuff.
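
For readers who have not used it, a minimal sketch of what a weak
prototype looks like follows. It assumes an ELF target and a
GCC-compatible compiler. hypothetical_module_draw() is a made-up name
standing in for a function provided by another module, and the NULL
check is an extra guard added for illustration, not part of the
workaround described above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a function that lives in another module.
 * The weak attribute lets the reference remain unresolved (its address
 * is then NULL) instead of being a hard error when nothing defines it. */
extern void hypothetical_module_draw(int npt) __attribute__((weak));

/* Call the weak function only if some loaded object defines it.
 * Returns 1 if the call was made, 0 if the symbol is unavailable. */
static int draw_if_available(int npt)
{
    if (hypothetical_module_draw == NULL)
        return 0;

    hypothetical_module_draw(npt);
    return 1;
}
```

In a program where nothing defines hypothetical_module_draw(),
draw_if_available() returns 0 instead of failing at link or load time.
Doing this for every cross-module prototype is the maintenance burden
mentioned above.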

And I'll add the mandatory whine: yes, it works with gcc at all
optimization levels.

I can provide more information if needed.