This document is a tutorial showing how to write a simple assembly program for several UNIX operating systems on the IA32 (i386) platform. The material may or may not apply to other hardware and software platforms. The document explains program layout, system call conventions, and the build process. It accompanies the Linux Assembly HOWTO, which may also be of interest, though it is more Linux-specific.
v0.3, April 09, 2000
Copyright © 1999-2000 Konstantin Boldyshev. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation.
The latest version of this document is available from http://linuxassembly.org/intro.html. If you are reading a copy that is a few months old, please check the URL above for a newer version.
You will need several tools to play with programs included in this tutorial.
First of all, you need an assembler (compiler). As a rule, a modern UNIX distribution includes gas (GNU Assembler), but all examples here use another assembler, nasm (Netwide Assembler). You can download it from the nasm page; it comes with full source code. Compile it, or try to find a precompiled binary for your OS. Note that several distributions (at least Linux ones) already include nasm, so check first.
Second, you need a linker, ld, since nasm produces only object code. Any distribution should include ld.
If you're going to dig in, you should also install include files for your OS, and if possible, kernel source.
Now you should be ready to start. Welcome!
Now we will write our program, the classic "Hello, world" (hello.asm). You can download its sources and binaries here. But first, let me explain several basics.
Unless a program just implements some math algorithm in assembly, it will deal with such things as getting input, producing output, and exiting. For these it needs to call OS services. In fact, programming in assembly language is much the same across different OSes until OS services are touched.
There are two common ways of performing a system call in a UNIX OS: through the C library (libc) wrapper, or directly.
Whether or not to use libc in assembly programming is more a question of taste or belief than a practical one. Libc wrappers exist to protect a program from possible changes in the system call convention, and to provide a POSIX-compatible interface where the kernel lacks one for some call. However, a UNIX kernel is usually more or less POSIX compliant, which means that the syntax of most libc "system calls" exactly matches the syntax of the real kernel system calls (and vice versa). The main drawback of throwing libc away is that you lose several functions that are not just syscall wrappers, such as printf(), malloc() and similar.
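For contrast with the direct kernel calls used in the rest of this tutorial, here is a minimal sketch of the libc route. It assumes a 32-bit libc with the C (cdecl) calling convention, where arguments are pushed right to left and the caller cleans the stack. The file name hello_libc.asm and the build commands in the comments are illustrative assumptions, not part of the original examples.

```nasm
; hello_libc.asm (hypothetical file name, for illustration only)
; Assumed build (32-bit, non-PIE; flags may differ on your system):
;   nasm -f elf hello_libc.asm
;   gcc -m32 -no-pie -o hello_libc hello_libc.o

extern write                    ;libc wrapper around the write system call

section .data
msg db  'Hello, world!',0xa     ;our dear string
len equ $ - msg                 ;length of our dear string

section .text
    global main                 ;libc startup code calls main, not _start

main:
    push dword len              ;third argument: message length
    push dword msg              ;second argument: message to write
    push dword 1                ;first argument: file descriptor (stdout)
    call write                  ;libc performs the system call for us
    add esp,12                  ;caller cleans the stack (3 arguments * 4)

    xor eax,eax                 ;return value of main = exit code 0
    ret                         ;back to the libc startup code, which exits
```

Linking through gcc pulls in the C startup code, which is why the entry point here is main rather than _start, and why the resulting binary is larger than one that talks to the kernel directly.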
This tutorial will show how to use direct kernel calls, since this is the fastest way to call a kernel service; our code is not linked to any library and communicates with the kernel directly.
What differs between UNIX kernels is the set of system calls and the system call convention (however, as they strive for POSIX compliance, they have a lot in common).
Note for (former) DOS programmers: so, what is a system call? Perhaps the best way to explain it is this: if you ever wrote a DOS assembly program (and most IA32 assembly programmers did), you remember DOS services such as int 0x21, int 0x25, int 0x26, etc. These are what can be called system calls. However, the actual implementation is completely different, and system calls are not necessarily done via an interrupt. Also, DOS programmers quite often mix up OS services with BIOS services like int 0x10 or int 0x16, and are very surprised when these fail to work in UNIX, since they are not OS services.
As a rule, modern IA32 UNIXes are 32-bit (*grin*), run in protected mode, have a flat memory model, and use the ELF format for binaries.
A program can be divided into sections (or segments): .text for your code (read-only), .data for your data (read-write), and .bss for uninitialized data (read-write). There can be a few other sections, as well as user-defined ones, but they are rarely needed and are outside our interest here. A program must have at least a .text section.
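As a rough illustration of how the three sections fit together, the sketch below places a constant in .data, reserves space in .bss, and keeps the code in .text. The names value and result are made up for this example, and the final exit uses the Linux system call convention described in the Linux part of this tutorial, so treat it as a Linux-only sketch.

```nasm
section .data                   ;initialized data (read-write)
value   db 42                   ;one byte with a known initial value

section .bss                    ;uninitialized data (read-write)
result  resb 1                  ;one reserved byte, nothing stored in the file

section .text                   ;code (read-only)
    global _start               ;must be declared for linker (ld)

_start:
    mov al,[value]              ;read the byte from .data
    mov [result],al             ;store it into .bss
    movzx ebx,byte [result]     ;exit code = the byte we just stored
    mov eax,1                   ;system call number (sys_exit)
    int 0x80                    ;call kernel (Linux convention)
```

After building and running it on Linux, echo $? should report 42, showing that the byte travelled from .data through .bss into the exit code.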
Ok, now we'll dive into OS specific details.
System calls in Linux are done through int 0x80. (Actually, there's a kernel patch allowing system calls to be done via the syscall (sysenter) instruction on newer CPUs, but it is still experimental.)
Linux differs from the usual UNIX calling convention and features a "fastcall" convention for system calls (it resembles DOS). The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to five arguments, in ebx, ecx, edx, esi, edi respectively. If there are more than five arguments, they are simply passed through a structure given as the first argument. The result is returned in eax; the stack is not touched at all.
System call function numbers are in sys/syscall.h, though they are actually defined in asm/unistd.h. Some documentation is in section 2 of the manual (e.g., to find information on the write system call, issue man 2 write).
There have been several attempts to write up-to-date documentation of the Linux system calls; examine the URLs in the references.
So, our Linux program will look like:
section .text
    global _start               ;must be declared for linker (ld)

msg db  'Hello, world!',0xa     ;our dear string
len equ $ - msg                 ;length of our dear string

_start:                         ;we tell linker where is entry point
    mov edx,len                 ;message length
    mov ecx,msg                 ;message to write
    mov ebx,1                   ;file descriptor (stdout)
    mov eax,4                   ;system call number (sys_write)
    int 0x80                    ;call kernel

    mov ebx,0                   ;exit code
    mov eax,1                   ;system call number (sys_exit)
    int 0x80                    ;call kernel
As you will see further on, the Linux syscall convention is the most compact one.
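Since the result comes back in eax, a program can check whether a call succeeded. The sketch below is a variation of the program above. It relies on the Linux behaviour that a failed system call returns a negative error number in eax, and the labels .failed and .exit are names chosen for this example.

```nasm
section .text
    global _start               ;must be declared for linker (ld)

msg db  'Hello, world!',0xa     ;our dear string
len equ $ - msg                 ;length of our dear string

_start:
    mov edx,len                 ;message length
    mov ecx,msg                 ;message to write
    mov ebx,1                   ;file descriptor (stdout)
    mov eax,4                   ;system call number (sys_write)
    int 0x80                    ;call kernel

    cmp eax,len                 ;eax = bytes written, or a negative error number
    jne .failed                 ;a short write is treated as a failure too

    xor ebx,ebx                 ;exit code 0: everything was written
    jmp .exit
.failed:
    mov ebx,1                   ;exit code 1: the write did not complete
.exit:
    mov eax,1                   ;system call number (sys_exit)
    int 0x80                    ;call kernel
```

One way to try the failure path is to run the program with standard output closed, for example ./hello >&- from a Bourne-style shell; the write should then fail and the exit status should be 1.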
FreeBSD has the "usual" calling convention, where the syscall number is in eax and the parameters are on the stack (the first argument is pushed last). A system call is performed by calling a function that contains int 0x80 and ret, not by issuing int 0x80 on its own (the return address MUST be on the stack before int 0x80 is issued!). The caller must clean up the stack after the call. The result is returned, as usual, in eax.
There's also an alternate way: the call 7:0 gate can be used instead of int 0x80. The end result is the same, apart from an increase in program size, since you also need to push eax beforehand, and these two instructions occupy more bytes.
System call function numbers are in sys/syscall.h; documentation is in section 2 of the manual.
Note: the included code may run on other *BSD systems as well, I think.
Ok, I think the source will explain this better:
section .text
    global _start               ;must be declared for linker (ld)

msg db  "Hello, world!",0xa     ;our dear string
len equ $ - msg                 ;length of our dear string

_syscall:
    int 0x80                    ;system call
    ret

_start:                         ;tell linker entry point
    push dword len              ;message length
    push dword msg              ;message to write
    push dword 1                ;file descriptor (stdout)
    mov eax,0x4                 ;system call number (sys_write)
    call _syscall               ;call kernel

                                ;actually there's an alternate
                                ;way to call kernel:
                                ;push eax
                                ;call 7:0

    add esp,12                  ;clean stack (3 arguments * 4)

    push dword 0                ;exit code
    mov eax,0x1                 ;system call number (sys_exit)
    call _syscall               ;call kernel

                                ;we do not return from sys_exit,
                                ;there's no need to clean stack
The BeOS kernel uses the "usual" UNIX calling convention too. The difference from the FreeBSD example is that you call int 0x25.
For information on where to find system call function numbers and other interesting details, examine asmutils, especially the os_beos.inc file.
Note: to make nasm compile correctly on BeOS you need to insert #include "nasm.h" into float.h, and #include <stdio.h> into nasm.h.
section .text
    global _start               ;must be declared for linker (ld)

msg db  "Hello, world!",0xa     ;our dear string
len equ $ - msg                 ;length of our dear string

_syscall:                       ;system call
    int 0x25
    ret

_start:                         ;tell linker entry point
    push dword len              ;message length
    push dword msg              ;message to write
    push dword 1                ;file descriptor (stdout)
    mov eax,0x3                 ;system call number (sys_write)
    call _syscall               ;call kernel
    add esp,12                  ;clean stack (3 * 4)

    push dword 0                ;exit code
    mov eax,0x3f                ;system call number (sys_exit)
    call _syscall               ;call kernel
                                ;no need to clean stack
Building the binary is the usual two-step process of compiling and linking. To make a binary from our hello.asm we must do the following:
$ nasm -f elf hello.asm         # this will produce hello.o object file
$ ld -s -o hello hello.o        # this will produce hello executable
That's it. Simple.
Now you can launch the hello program by entering ./hello; it should work.
Look at the binary size -- surprised?
I hope you enjoyed the journey. If you have become interested in assembly programming for UNIX, I strongly encourage you to visit Linux Assembly for more information and to download the asmutils package, which contains a lot of sample code. For a comprehensive overview of Linux/UNIX assembly programming, refer to the Linux Assembly HOWTO.
Thank you for your interest!