[Patches] Some LazyValueInfo and related patches

Hi.

Attached is a set of patches that I wrote while trying to solve two problems.
I did not manage to fully solve what I wanted to improve, but I think the
patches are still a step in the right direction.

The patches are hopefully self-explanatory.
The biggest change is that LazyValueInfo no longer maintains a separate stack
of work to do; it does the work directly, recursively.

The test included in patch 4 also tests patch 2.

The first problem I was trying to solve is to let the code give hints about
the range of values.

Imagine, in a library:

class CopyOnWrite {
    char *stuff;
    int ref_count;
    void detach_internal();
    inline void detach() {
        if (ref_count > 1) {
            detach_internal();
            /* ref_count = 1; */
        }
    }
public:
    char &operator[](int i) { detach(); return stuff[i]; }
};

Then, in code like this:

int doStuffWithStuff(CopyOnWrite &stuff) {
    return stuff[0] + stuff[1] * stuff[2];
}

The generated code will contain three tests of ref_count and three calls to
detach_internal.

Is there a way to tell the compiler that ref_count is actually smaller than or
equal to 1 after a call to detach_internal?
Having the "ref_count = 1" explicit in the code helps (with my patches), but
then the assignment itself ends up in the generated code, and I don't want that.

Something like

if (ref_count > 1)
     __builtin_unreachable();

works fine in GCC, but does not work with LLVM.
Well, it almost works, but the problem is that the whole condition is removed
before inlining is done.
So what can be done to make that work? One option is to delay the removal of
__builtin_unreachable() until after inlining (but when?).
Another option would be to somehow keep the range information when removing
branches because they are unreachable.
I was thinking about !range metadata, but I don't know where to put it.
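
To make the intended hint concrete, here is a minimal, self-contained sketch of the class above with the __builtin_unreachable() check placed after the call. Several things are assumptions made only so the sketch compiles and runs: the constructor, the detach_calls counter, and the body of detach_internal (which in a real library would live in another translation unit and would actually copy the data). __builtin_unreachable() is a GCC/Clang builtin; whether the hint survives until after inlining depends on the compiler, as described above.

```cpp
#include <cassert>

// Instrumentation for this sketch only: counts calls to detach_internal.
static int detach_calls = 0;

class CopyOnWrite {
    char *stuff;
    int ref_count;
    // Assumption: defined here only so the sketch links. A real library
    // would define this out of line and copy the shared data.
    void detach_internal() {
        ++detach_calls;
        ref_count = 1;  // assumed postcondition: the object is unshared
    }
    inline void detach() {
        if (ref_count > 1) {
            detach_internal();
            // Hint only: promises the optimizer that ref_count <= 1 here.
            // If the promise were false, the behaviour would be undefined.
            if (ref_count > 1)
                __builtin_unreachable();
        }
    }
public:
    // Hypothetical constructor, added so the class can be instantiated.
    CopyOnWrite(char *data, int refs) : stuff(data), ref_count(refs) {
        assert(refs >= 1);
    }
    char &operator[](int i) { detach(); return stuff[i]; }
};

int doStuffWithStuff(CopyOnWrite &stuff) {
    return stuff[0] + stuff[1] * stuff[2];
}
```

At run time only the first operator[] call reaches detach_internal. The goal of the hint is for the optimizer to prove the same thing statically once the three detach() calls are inlined, so that the second and third tests of ref_count disappear.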

The other problem was that I was analyzing code like this:

void toLatin1(uchar *dst, const ushort *src, int length)
{
    if (length) {
#if defined(__SSE2__)
        if (length >= 16) {
            for (int i = 0; i < length >> 4; ++i) {
                /* skipped code using SSE2 intrinsics */
                src += 16; dst += 16;
            }
            length = length % 16;
        }
#endif
        while (length--) {
            *dst++ = (*src>0xff) ? '?' : (uchar) *src;
            ++src;
        }
    }
}

I was wondering whether, when compiling for AVX, clang/LLVM would be able to
widen the SSE2 intrinsics to wider vectors. Or would the non-intrinsics
branch be better?
It turns out the result is not great. LLVM leaves the intrinsics code
unchanged (that's OK), but also tries to vectorize the second loop. (And the
result of this vectorisation is quite horrible.)
Shouldn't the compiler see that length is always less than 16 in the second
loop and hence deduce that there is no point in vectorizing? This is why I
implemented srem and urem support in LVI.
But then maybe some other pass, such as a loop pass, should use LVI to see that
a loop is never entered, or the loop vectorizer could use LVI to avoid creating
the loop in the first place.
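
The range reasoning behind the srem/urem question can be sketched as follows. This is only an illustration of the arithmetic, not the code from the attached patch: Range, URange, sremRange and uremRange are hypothetical names, the intervals are plain closed intervals rather than LLVM's ConstantRange, and the divisor is assumed to be a known positive constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical closed interval [lo, hi]; this is not LLVM's ConstantRange.
struct Range {
    int64_t lo;
    int64_t hi;
};

// Conservative range of x % d (truncating remainder, like srem) for a
// dividend in `x` and a constant divisor d > 0.
Range sremRange(Range x, int64_t d) {
    assert(d > 0 && x.lo <= x.hi);
    const int64_t m = d - 1;  // |x % d| is at most d - 1
    // The result takes the sign of the dividend and is never larger in
    // magnitude than the dividend.
    const int64_t lo = x.lo < 0 ? std::max(x.lo, -m) : 0;
    const int64_t hi = x.hi > 0 ? std::min(x.hi, m) : 0;
    return {lo, hi};
}

// Hypothetical unsigned counterpart for urem.
struct URange {
    uint64_t lo;
    uint64_t hi;
};

// Conservative range of x % d for unsigned x and a constant divisor d != 0.
URange uremRange(URange x, uint64_t d) {
    assert(d != 0 && x.lo <= x.hi);
    return {0, std::min(x.hi, d - 1)};
}
```

Applied to toLatin1: inside the "length >= 16" branch the dividend is at least 16, so sremRange gives [0, 15] for "length % 16". On the other path length was already below 16, so the trailing loop runs fewer than 16 times for any non-negative length.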

0001-SCCP-Do-not-transform-load-of-a-null-pointer-into-0.patch (1003 Bytes)

0002-LVI-Be-able-to-optimize-the-condition-with-and-and-o.patch (7.67 KB)

0003-LVI-Re-order-the-check-that-the-second-operand-is-co.patch (2.88 KB)

0004-LVI-Look-recursively-the-dependencies-for-finding-ra.patch (3.08 KB)

0007-LVI-Support-range-detection-of-srem-and-urem.patch (8.16 KB)

0005-LVI-simplify-a-bit-by-not-having-a-separate-stack.patch (12.5 KB)

0008-CVP-Look-for-LVI-information-when-there-is-a-compari.patch (7.55 KB)

0006-LVI-simplify-remove-hasBlockValue-and-solve-from-get.patch (5.95 KB)

Ping?