"Linux Gazette...making Linux just a little more fun!"

Introduction to UNIX Assembly Programming

By Konstantin Boldyshev

This document is a tutorial showing how to write a simple assembly program for several UNIX operating systems on the IA32 (i386) platform. The material may or may not apply to other hardware and/or software platforms. The document explains program layout, system call conventions, and the build process. It accompanies the Linux Assembly HOWTO, which may interest you as well, though it is more Linux-specific.

v0.3, April 09, 2000

1. Introduction

1.1 Legal blurb

Copyright © 1999-2000 Konstantin Boldyshev. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation.

1.2 Obtaining this document

The latest version of this document is available from http://linuxassembly.org/intro.html. If you are reading a copy that is a few months old, please check the URL above for a new version.

1.3 Tools you need

You will need several tools to play with the programs included in this tutorial.

First of all, you need an assembler (compiler). As a rule, a modern UNIX distribution includes gas (GNU Assembler), but all examples here use another assembler: nasm (Netwide Assembler). You can download it from the nasm page; it comes with full source code. Compile it, or try to find a precompiled binary for your OS. Note that several distributions (at least Linux ones) already include nasm, so check first.

Second, you need a linker, ld, since nasm produces only object code. Any distribution should include ld.

If you're going to dig in, you should also install the include files for your OS and, if possible, the kernel source.

Now you should be ready to start. Welcome!

2. Hello, world!

Now we will write our program, the classic "Hello, world" (hello.asm). You can download its sources and binaries here. But first, let me explain several basics.

2.1 System call

Unless a program just implements some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For these it needs to call some OS service. In fact, programming in assembly language is much the same across different OSes until OS services are touched.

There are two common ways of performing a system call in a UNIX OS: through the C library (libc) wrapper, or directly.

Whether to use libc in assembly programming is more a question of taste or belief than of practicality. Libc wrappers exist to protect programs from possible changes in the system call convention, and to provide a POSIX-compatible interface if the kernel lacks it for some call. However, a UNIX kernel is usually more or less POSIX-compliant, which means that the syntax of most libc "system calls" exactly matches the syntax of the real kernel system calls (and vice versa). The main drawback of throwing libc away is that you lose several functions that are not just syscall wrappers, such as printf(), malloc() and similar.

This tutorial shows how to use direct kernel calls, since this is the fastest way to call a kernel service; our code is not linked to any library and communicates with the kernel directly.
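For contrast, here is a minimal sketch of the other route, going through the libc wrapper instead of calling the kernel directly. It assumes Linux on i386, the C calling convention (arguments pushed right to left, caller cleans the stack), and linking through the C compiler driver so that libc and its startup code are pulled in. The file name hello-libc.asm is made up for illustration.

```nasm
; hello-libc.asm -- illustrative sketch, not part of the original sources
; assumes Linux/i386 and a libc that provides write()

section .data
msg     db      'Hello, world!',0xa     ;our dear string
len     equ     $ - msg                 ;length of our dear string

section .text
    global main                         ;libc startup code calls main
    extern write                        ;libc wrapper around the write system call

main:
        push    dword len       ;third argument: message length
        push    dword msg       ;second argument: message to write
        push    dword 1         ;first argument: file descriptor (stdout)
        call    write           ;libc performs the system call for us
        add     esp,12          ;caller cleans the stack (3 arguments * 4)

        xor     eax,eax         ;return 0 from main; libc then exits for us
        ret
```

Something like nasm -f elf hello-libc.asm followed by gcc -o hello-libc hello-libc.o should build it; on a 64-bit host you would likely also need gcc's -m32 (and possibly -no-pie) plus a 32-bit libc. The price of the wrapper is visible in the size of the resulting binary.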

What differs between UNIX kernels is the set of system calls and the system call convention (however, as they strive for POSIX compliance, they have a lot in common).

Note for (former) DOS programmers: so, what is a system call? Perhaps the best way to explain it is this: if you ever wrote a DOS assembly program (and most IA32 assembly programmers did), you remember DOS services such as int 0x21, int 0x25, int 0x26, etc. These are what can be called system calls. However, the actual implementation is completely different, and this doesn't mean that system calls are necessarily done via an interrupt. Also, DOS programmers quite often mix up OS services with BIOS services like int 0x10 or int 0x16, and are very surprised when they fail to perform them in UNIX, since these are not OS services.

2.2 Program layout

As a rule, modern IA32 UNIXes are 32-bit (*grin*), run in protected mode, have a flat memory model, and use the ELF format for binaries.

A program can be divided into sections (or segments): .text for your code (read-only), .data for your data (read-write), and .bss for uninitialized data (read-write). There can be a few others, as well as user-defined sections, but there is rarely a need for them and they are outside our interest here. A program must have at least a .text section.
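The three sections can be seen together in a small sketch. It is only an illustration: it assumes Linux on i386 and borrows the int 0x80 convention described in section 2.3, and the names layout.asm, greeting and buffer are made up. The program copies a string from .data into a buffer in .bss and prints the buffer.

```nasm
; layout.asm -- illustrative sketch of the section layout (Linux/i386 assumed)

section .data                           ;initialized data (read-write)
greeting db     'layout demo',0xa       ;bytes stored in the file
greetlen equ    $ - greeting            ;assemble-time constant, takes no space

section .bss                            ;uninitialized data (read-write)
buffer  resb    64                      ;64 bytes reserved, not stored in the file

section .text                           ;code (read-only)
    global _start                       ;must be declared for linker (ld)

_start:
        mov     esi,greeting    ;source, in .data
        mov     edi,buffer      ;destination, in .bss
        mov     ecx,greetlen    ;number of bytes to copy
        cld                     ;copy upwards
        rep     movsb           ;buffer now holds the greeting

        mov     edx,greetlen    ;message length
        mov     ecx,buffer      ;message to write
        mov     ebx,1           ;file descriptor (stdout)
        mov     eax,4           ;system call number (sys_write)
        int     0x80            ;call kernel

        xor     ebx,ebx         ;exit code 0
        mov     eax,1           ;system call number (sys_exit)
        int     0x80            ;call kernel
```

Writing to buffer works because .bss is read-write; the same rep movsb aimed at an address in .text would fault, since that section is mapped read-only.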

OK, now we'll dive into OS-specific details.

2.3 Linux

System calls in Linux are done through int 0x80. (Actually, there's a kernel patch allowing system calls to be done via the syscall (sysenter) instruction on newer CPUs, but it is still experimental.)

Linux differs from the usual UNIX calling convention and features a "fastcall" convention for system calls (it resembles DOS). The system function number is passed in eax, and arguments are passed in registers, not on the stack. There can be up to five arguments, in ebx, ecx, edx, esi and edi respectively. If there are more than five arguments, they are simply passed in a structure given as the first argument. The result is returned in eax; the stack is not touched at all.

System call function numbers are exposed through sys/syscall.h, but are actually defined in asm/unistd.h. Some documentation is in section 2 of the manual (for example, to find information on the write system call, run man 2 write).

There have been several attempts to write up-to-date documentation of Linux system calls; see the URLs in the references.

So, our Linux program will look like:



section .text
    global _start                       ;must be declared for linker (ld)

msg     db      'Hello, world!',0xa     ;our dear string
len     equ     $ - msg                 ;length of our dear string

_start:                 ;we tell linker where is entry point

        mov     edx,len ;message length
        mov     ecx,msg ;message to write
        mov     ebx,1   ;file descriptor (stdout)
        mov     eax,4   ;system call number (sys_write)
        int     0x80    ;call kernel

        xor     ebx,ebx ;exit code 0 (ebx still held 1 from sys_write)
        mov     eax,1   ;system call number (sys_exit)
        int     0x80    ;call kernel

As you will see further on, the Linux syscall convention is the most compact one.
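The hello program ignores what the kernel hands back in eax. The following sketch uses it: it reads one chunk from standard input and writes back exactly as many bytes as sys_read reported. It assumes Linux on i386, where sys_read is system call number 3 and a failed call leaves a negative error code in eax; the names echo.asm and buf are made up for illustration.

```nasm
; echo.asm -- illustrative sketch: using the result returned in eax (Linux/i386 assumed)

section .bss
buf     resb    128                     ;input buffer

section .text
    global _start                       ;must be declared for linker (ld)

_start:
        mov     edx,128         ;maximum number of bytes to read
        mov     ecx,buf         ;where to put them
        mov     ebx,0           ;file descriptor (stdin)
        mov     eax,3           ;system call number (sys_read)
        int     0x80            ;call kernel; result comes back in eax

        test    eax,eax         ;negative result means an error
        js      .fail

        mov     edx,eax         ;write exactly as many bytes as were read
        mov     ecx,buf         ;message to write
        mov     ebx,1           ;file descriptor (stdout)
        mov     eax,4           ;system call number (sys_write)
        int     0x80            ;call kernel

        xor     ebx,ebx         ;exit code 0
        jmp     .exit
.fail:
        mov     ebx,1           ;exit code 1
.exit:
        mov     eax,1           ;system call number (sys_exit)
        int     0x80            ;call kernel
```

Note that the byte count has to be moved out of eax before the next call, because eax is also where the next system call number goes.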

Kernel source references:

arch/i386/kernel/entry.S
include/asm-i386/unistd.h
include/linux/sys.h

2.4 FreeBSD

FreeBSD has the "usual" calling convention: the syscall number is in eax, and the parameters are on the stack (the first argument is pushed last). A system call is performed by calling a function that contains int 0x80 and ret, not by issuing int 0x80 on its own (the return address MUST be on the stack before int 0x80 is issued!). The caller must clean up the stack after the call. The result is returned in eax, as usual.

There is also an alternate way: using the call 7:0 gate instead of int 0x80. The end result is the same, apart from an increase in program size, since you also need to push eax beforehand, and these two instructions occupy more bytes.

System call function numbers are in sys/syscall.h; documentation is in section 2 of the manual.

Note: the included code may run on other *BSDs as well, I think.

OK, I think the source will explain this better:



section .text
    global _start                       ;must be declared for linker (ld)

msg     db      "Hello, world!",0xa     ;our dear string
len     equ     $ - msg                 ;length of our dear string

_syscall:               
        int     0x80            ;system call
        ret

_start:                         ;tell linker entry point

        push    dword len       ;message length
        push    dword msg       ;message to write
        push    dword 1         ;file descriptor (stdout)
        mov     eax,0x4         ;system call number (sys_write)
        call    _syscall        ;call kernel

                                ;actually there's an alternate
                                ;way to call kernel:
                                ;push   eax
                                ;call   7:0

        add     esp,12          ;clean stack (3 arguments * 4)

        push    dword 0         ;exit code
        mov     eax,0x1         ;system call number (sys_exit)
        call    _syscall        ;call kernel

                                ;we do not return from sys_exit,
                                ;there's no need to clean stack
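Since the kernel only needs some dword sitting where the return address would be, a common variation drops the helper function and pushes a placeholder before int 0x80. The sketch below shows that variation under the same assumptions as the program above (FreeBSD on i386); it is an illustration, not a replacement for the listing above.

```nasm
; illustrative sketch: FreeBSD/i386 system calls without the _syscall helper

section .text
    global _start                       ;must be declared for linker (ld)

msg     db      "Hello, world!",0xa     ;our dear string
len     equ     $ - msg                 ;length of our dear string

_start:                         ;tell linker entry point

        push    dword len       ;message length
        push    dword msg       ;message to write
        push    dword 1         ;file descriptor (stdout)
        mov     eax,0x4         ;system call number (sys_write)
        push    eax             ;placeholder in the return address slot
        int     0x80            ;call kernel
        add     esp,16          ;clean stack (3 arguments + placeholder) * 4

        push    dword 0         ;exit code
        mov     eax,0x1         ;system call number (sys_exit)
        push    eax             ;placeholder in the return address slot
        int     0x80            ;call kernel

                                ;we do not return from sys_exit,
                                ;there's no need to clean stack
```

The value of the placeholder does not matter; eax is pushed only because it is a one-byte instruction. The stack cleanup grows from 12 to 16 bytes to account for it.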

Kernel source references:

i386/i386/exception.s
i386/i386/trap.c
sys/syscall.h

2.5 BeOS

The BeOS kernel uses the "usual" UNIX calling convention too. The difference from the FreeBSD example is that you call int 0x25.

For information on where to find system call function numbers and other interesting details, examine asmutils, especially the os_beos.inc file.

Note: to make nasm compile correctly on BeOS, you need to insert #include "nasm.h" into float.h, and #include <stdio.h> into nasm.h.



section .text
    global _start                       ;must be declared for linker (ld)

msg     db      "Hello, world!",0xa     ;our dear string
len     equ     $ - msg                 ;length of our dear string

_syscall:                       ;system call
        int     0x25
        ret

_start:                         ;tell linker entry point

        push    dword len       ;message length
        push    dword msg       ;message to write
        push    dword 1         ;file descriptor (stdout)
        mov     eax,0x3         ;system call number (sys_write)
        call    _syscall        ;call kernel
        add     esp,12          ;clean stack (3 * 4)

        push    dword 0         ;exit code
        mov     eax,0x3f        ;system call number (sys_exit)
        call    _syscall        ;call kernel
                                ;no need to clean stack

2.6 Building binary

Building the binary is the usual two-step process of compiling and linking. To make a binary from our hello.asm, do the following:

$ nasm -f elf hello.asm         # this will produce hello.o object file
$ ld -s -o hello hello.o        # this will produce hello executable

That's it. Simple. Now you can launch the hello program by entering ./hello; it should work. Look at the binary size -- surprised?

3. References

I hope you enjoyed the journey. If you are interested in assembly programming for UNIX, I strongly encourage you to visit Linux Assembly for more information and to download the asmutils package, which contains a lot of sample code. For a comprehensive overview of Linux/UNIX assembly programming, refer to the Linux Assembly HOWTO.

Thank you for your interest!

Copyright © 2000, Konstantin Boldyshev Published in Issue 53 of Linux Gazette, May 2000
