96-core AArch64 cloud server for $0.50/hr

If you follow the news sites you’re probably aware that packet.net launched an AArch64 cloud server this week: 96 cores (2x 48-core 2.0 GHz Cavium “ThunderX”), 128 GB RAM, and a 320 GB M.2 SSD, for $0.50/hour.

https://www.packet.net/blog/arming-the-world-with-an-arm64-bare-metal-server/

I made an account, built LLVM and Clang on it, and compared the build time against various Intel machines (sorted fastest first):

Time    Machine          CPUs      RAM      Price
05m31s  AWS c4.8xlarge   36 vCPU   60 GB    $1.68/hour
08m47s  packet.net A64   96 core   128 GB   $0.50/hour
15m08s  local i7 6700K   4 core    32 GB
21m30s  AWS c4.2xlarge   4 vCPU    15 GB    $0.42/hour
22m37s  local i7 3770    4 core    32 GB

So this ARM server is not as fast as the fastest Intel machine (AWS c4.8xlarge), but it has much better price/performance: about 60% of the performance at 30% of the price [1]. Going by the timings above, it is also roughly 2.4 times faster than a comparably priced Intel server (the c4.2xlarge, which has far less RAM).

On this particular highly parallel task, of course.
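To put the price/performance comparison in concrete terms, here is a rough sketch that turns the timings and hourly prices from the table into a cost per build. The function name cost_per_build is made up for illustration, and it assumes you are billed pro rata for the build time only, which is not how hourly billing really works.

```shell
#!/bin/sh
# Cost of one build in dollars: wall-clock seconds / 3600 * hourly price.
# Assumes pro-rata billing for the build time only (a simplification).
cost_per_build() {
    # $1 = build time in seconds, $2 = price in $/hour
    awk -v secs="$1" -v price="$2" 'BEGIN { printf "%.4f\n", secs / 3600 * price }'
}

cost_per_build 331 1.68    # AWS c4.8xlarge, 05m31s  -> 0.1545
cost_per_build 527 0.50    # packet.net A64, 08m47s  -> 0.0732
cost_per_build 1290 0.42   # AWS c4.2xlarge, 21m30s  -> 0.1505
```

On these numbers one LLVM build costs about 7 cents on the ThunderX box and about 15 cents on either AWS instance at on-demand prices.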

If you actually need to run AArch64 code, this is massively better than qemu. (So is a Raspberry Pi or Odroid, except that they have too little RAM to reasonably build LLVM.)

I ran the following commands on fresh Ubuntu 16.04 installs:

sudo apt-get update
sudo apt-get -y install g++ cmake make bzip2 gzip zip subversion
svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
pushd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
popd
mkdir build install
cd build
cmake -DCMAKE_INSTALL_PREFIX=$(readlink -f ../install) -DCMAKE_BUILD_TYPE=Release ../llvm
time make -j$(grep -c ^processor /proc/cpuinfo)

Interestingly, my attempts to run 32-bit ARM code on the packet.net server failed, even with something as simple as the following (assembled and tested on a Pi), which I think should need no runtime support other than the kernel. The server tried to run it, but it segfaulted. Could it be that this is an AArch64-only CPU? I haven’t been able to find anything about this on the net.

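.syntax unified
.arch armv4

@ Linux system call numbers (ARM EABI) and the stdout file descriptor
.equ SYSCALL_EXIT, 1
.equ SYSCALL_WRITE, 4
.equ STDOUT, 1

.globl _start
_start:
    movs r0,#STDOUT          @ r0 = file descriptor
    adr r1,hello             @ r1 = address of the string
    movs r2,#11              @ r2 = length of "Hello asm!\n"
    movs r7,#SYSCALL_WRITE   @ r7 = system call number
    swi 0x0                  @ write(r0, r1, r2)
    movs r7,#SYSCALL_EXIT
    swi 0x0                  @ exit(r0); r0 still holds write's return value

.align 2
hello: .asciz "Hello asm!\n"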
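For anyone who wants to repeat the experiment, something along these lines should do it. This is only a sketch: the file name hello.s and the helper report_run are my own inventions, and it assumes a 32-bit ARM binutils cross toolchain that provides arm-linux-gnueabihf-as and arm-linux-gnueabihf-ld (on a Pi the native as and ld would do instead). It also relies on bash and dash reporting death by signal as exit status 128 plus the signal number.

```shell
#!/bin/sh
# Run a command and say whether it exited normally or was killed by a signal.
# bash and dash report death by signal N as exit status 128 + N.
report_run() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Assumed names: hello.s holds the listing above; the cross tools are optional.
if command -v arm-linux-gnueabihf-as >/dev/null 2>&1 && [ -f hello.s ]; then
    arm-linux-gnueabihf-as hello.s -o hello.o &&
    arm-linux-gnueabihf-ld hello.o -o hello &&
    report_run ./hello
fi
```

Telling the two cases apart matters here because the program never resets r0 before calling exit, so a successful run should exit with status 11 (the byte count returned by write), while a segfault is signal 11.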

[1] However, the c4.8xlarge can often be obtained for $0.30 to $0.50/hour with spot pricing.

Yes, the ThunderX doesn’t implement AArch32 (we have a couple of them). They’re intended as a successor for Cavium’s multithreaded MIPS64 designs, so the intended customers don’t have any legacy ARM code or any legacy 32-bit code.

David