[PATCH 1/2] libclc: vload/vstore disable assembly and fix offset calculation

This commit reverts vload/vstore to a pure CLC implementation and fixes the
offset calculations.
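
For reference, OpenCL defines vloadn(offset, p) as reading n elements
starting at p + (offset * n), so the offset counts whole vectors rather
than scalar elements. A minimal sketch of that rule in plain C follows;
int4_sketch and vload4_sketch are hypothetical names used only for
illustration and are not the libclc code:

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's int4; not the libclc type. */
typedef struct { int s[4]; } int4_sketch;

/* Models vload4(offset, p): read 4 ints starting at p + offset * 4. */
static int4_sketch vload4_sketch(size_t offset, const int *p)
{
    /* The offset counts whole 4-element vectors, not scalar elements. */
    const int *base = p + offset * 4;
    int4_sketch v;
    for (int i = 0; i < 4; ++i)
        v.s[i] = base[i];
    return v;
}
```

With a 12-element array holding 0..11, vload4_sketch(2, data) reads
elements 8 through 11.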

The next commit will re-enable the assembly implementation for R600,
fix bugs related to 64-bit address spaces, and also fix the
incorrect assumption that address space identifiers are the same in
all architectures.

Signed-off-by: Aaron Watry <awatry@gmail.com>

The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.

Also, fix 64-bit pointer calculation. This was broken previously for Radeon SI.
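
As an illustration only: one way address arithmetic can go wrong with
64-bit pointers is to compute the byte offset at 32-bit width before
adding it to the base. The sketch below is an assumption about that
general class of bug, not a description of the actual defect fixed here;
addr_narrow and addr_wide are hypothetical helpers:

```c
#include <stdint.h>

/* Byte offset computed in 32 bits: wraps once offset * vec_bytes >= 2^32. */
static uint64_t addr_narrow(uint64_t base, uint32_t offset, uint32_t vec_bytes)
{
    uint32_t byte_off = offset * vec_bytes;  /* unsigned wraparound */
    return base + byte_off;
}

/* Byte offset computed in 64 bits: no wraparound for these inputs. */
static uint64_t addr_wide(uint64_t base, uint64_t offset, uint64_t vec_bytes)
{
    return base + offset * vec_bytes;
}
```

With a 16-byte vector and an offset of 0x10000000 vectors, the 32-bit
product wraps to zero, so the narrow version returns the base address
unchanged while the wide version advances by 4 GiB.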

This version still provides assembly implementations only for int/uint
2/4/8/16 global loads and stores on R600, but it is structured so that it can
easily be extended to the private/local/constant address spaces and to other
architectures.

v2: 1) Leave v[load|store]_impl.ll in generic/lib
    2) Remove vload_if.ll and vstore_if.ll interfaces
    3) Fix address+offset calculations
    4) Remove offset from assembly arg list

Signed-off-by: Aaron Watry <awatry@gmail.com>

For the series:

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>