Hello World assembly without clib "puts"?

Can Hello World be written in LLVM assembly without using a C library function like “puts”?

Cheers,

Andrew Pennebaker
www.yellosoft.us

The low-level details of sending text to the terminal are very OS- and target-specific. It generally requires putting values into specific registers and then triggering an interrupt or executing a dedicated system-call instruction. The only way to represent this behavior in LLVM IR would be to add new target-specific intrinsics that could be converted to the right machine instructions during instruction selection.

LLVM IR models a general-purpose, unprivileged CPU instruction set, so it has no I/O operations of its own. If you want to interact with anything beyond the CPU and the stack, you must either call a library function, issue a system call, or extend the target directly. In practice, this means either calling a libc function (possibly indirectly) or writing some inline assembly.

For example, on UNIX-like platforms, puts() is typically implemented by calling strlen() to compute the length and then issuing a write system call on the standard output file descriptor (traditionally number 1). The details of a system call are implementation-dependent; however, you could write a small bit of inline assembly that issues a write system call and then use it from LLVM IR.

The more important question is: why would you want to do that? What problem are you trying to solve?

David

The more important question is: why would you want to do that? What problem are you trying to solve?

As weird as it sounds, I’m looking for multiplatform assembly languages. I want to learn assembly, but I want my knowledge and code to carry over no matter which operating system I’m using. I regularly use Windows, Mac, and Linux, and I don’t want to have to rewrite my codebase every time I boot into another operating system.

I can do this by writing assembly code that calls C functions, but I get the distinct feeling: Why am I doing it this way? Why not just write in C? And there’s only so much assembly you can learn by calling C functions, instead of writing lower level code.

I understand that OS’s have different conventions for I/O, but what I don’t understand is why multiplatform assembly languages like LLVM, NASM, YASM, FASM, and Gas don’t give coders a macro or instruction set that gets expanded to the actual, per-OS instructions during assembly. I guess it lowers development effort to reuse libc rather than add multiplatform I/O assembly macros. Smaller, non-libc-dependent binaries don’t matter in a world with hefty hard drives.

The more important question is: why would you want to do that? What problem are you trying to solve?

As weird as it sounds, I'm looking for multiplatform assembly languages. I want to learn assembly, but I want my knowledge and code to carry over no matter which operating system I'm using. I regularly use Windows, Mac, and Linux, and I don't want to have to rewrite my codebase every time I boot into another operating system.

In that case, LLVM IR is a really bad choice. It has to be in static single assignment form, which makes it totally unlike any real assembly (it is designed to be easy to generate machine code from).

LLVM IR also does not abstract differences in calling conventions, nor does it have a macro language, and so LLVM IR is intrinsically not portable between architectures and often not between operating systems on the same architecture.

I can do this by writing assembly code that calls C functions, but I get the distinct feeling: Why am I doing it this way? Why not just write in C? And there's only so much assembly you can learn by calling C functions, instead of writing lower level code.

Learning C will be a lot more use to you.

I understand that OS's have different conventions for I/O, but what I don't understand is why multiplatform assembly languages like LLVM, NASM, YASM, FASM, and Gas don't give coders a macro or instruction set that gets expanded to the actual, per-OS instructions during assembly. I guess it lowers development effort to reuse libc rather than add multiplatform I/O assembly macros. Smaller, non-libc-dependent binaries don't matter in a world with hefty hard drives.

Using libc is more sensible because libc is the official public interface for any POSIX system. On Windows, for example, the underlying kernel routines are undocumented and are not part of the public API; programs are expected to go through system DLLs such as kernel32.dll instead. On OS X, there are quite limited ABI stability guarantees at the kernel level, and the official way of interacting with the kernel is via libc. A non-libc-dependent (or, more accurately, non-libSystem-dependent) binary is one that may be broken by a minor kernel upgrade (this doesn't happen often, but it's not guaranteed not to).

Even on open source kernels such as Linux or FreeBSD, if you look at the system call table you will see that, once you stray past the trivial things inherited from ancient UNIX, there are significant differences between platforms and even between versions of the same platform. FreeBSD and Linux may implement the same C library functionality in terms of wildly different kernel APIs. It takes more than a little assembly macro to, for example, abstract the difference between epoll() and kqueue(). Once you stray to something like OS X, you may see even wider variety, such as things being implemented in terms of Mach ports.

If you want portable code, don't use assembly.

David

LLVM IR is not an assembly language. It is a public, well-documented compiler intermediate representation that abstracts away several (but not all) details of platform ABIs.

Different operating systems have extremely different conventions for system calls, and the system calls themselves differ between operating systems, even on the same architecture. On 32-bit x86, for example, Linux routes every system call through a single interrupt (int 0x80), while DOS spreads its services across many different interrupt vectors. If you want portable assembly, use C (which has often been called, quite literally, a portable assembly language).

Of the tools you list, only one is intended to be “multiplatform” in the sense that it can reliably target two different platforms (OS/architecture combinations), and that one (LLVM) isn’t an assembly language but a compiler IR. NASM, YASM, and FASM are all Intel-syntax x86 assemblers with varying degrees of macro and output-format support. Gas is essentially a suite of assemblers for different architectures that share a more-or-less uniform syntax.

Reliably abstracting over I/O in an actual assembler is impossible, since the registers, stack layout, and operands needed for real system calls differ wildly from platform to platform. It is pointless for LLVM IR, since LLVM IR is designed mostly to handle the output of compilers, which already use libc where possible. And libc isn’t that large: I measured /lib/libc.so.6 (i.e., glibc) at about 1 MB on my laptop, and that’s probably the heftiest C standard library implementation around for *nix platforms. When you link against it dynamically, it costs only a few shared-library entries in your binary, amounting to a few KB at most.

As weird as it sounds, I'm looking for multiplatform assembly
languages. I want to learn assembly, but I want my knowledge and code
to carry over no matter which operating system I'm using. I regularly
use Windows, Mac, and Linux, and I don't want to have to rewrite my
codebase every time I boot into another operating system.

Looks to me like you are describing the C language. From the Wikipedia
article:

http://en.wikipedia.org/wiki/C_(programming_language)

  [C] was designed to be compiled using a relatively
  straightforward compiler, to provide low-level access to
  memory, to provide language constructs that map efficiently to
  machine instructions, and to require minimal run-time support.
  C was therefore useful for many applications that had formerly
  been coded in assembly language, such as in system programming.

  Despite its low-level capabilities, the language was designed to
  encourage cross-platform programming. A standards-compliant and
  portably written C program can be compiled for a very wide
  variety of computer platforms and operating systems with few
  changes to its source code.

I can do this by writing assembly code that calls C functions, but I
get the distinct feeling: Why am I doing it this way? Why not just
write in C?

Exactly.

Sameer.