#52WeeksOfCode Week 32 – Assembly Language

Week: 32

Language: Assembly

IDE(s): SciTe + NASM

History (official):

(There is no one version of Assembly language since it is aimed at specific hardware. However, I did find this in the Forward for The Art of Assembly Language by Randall Hyde which I thought was amusing):

Egads. What kind of book begins this way? What kind of author would begin the book with a forward like this one? Well, the truth is, I considered putting this stuff into the first chapter since most people never bother reading the forward. A discussion of what’s right and what’s wrong with assembly language is very important and sticking it into a chapter might encourage someone to read it. However, I quickly found that university students can skip Chapter One as easily as they can skip a forward, so this stuff wound up in a forward after all.

So why would anyone learn this stuff, anyway? Well, there are several reasons which come to mind:

 

  • Your major requires a course in assembly language; i.e., you’re here against your will.
  • A programmer where you work quit. Most of the source code left behind was written in assembly language and you were elected to maintain it.
  • Your boss has the audacity to insist that you write your code in assembly against your strongest wishes.
  • Your programs run just a little too slow, or are a little too large and you think assembly language might help you get your project under control.
  • You want to understand how computers actually work.
  • You’re interested in learning how to write efficient code.
  • You want to try something new.

 

 

Well, whatever the reason you’re here, welcome aboard. Let’s take a look at the subject you’re about to study.

History (real):

I feel the need to lend some perspective before I dig into Assembly language, otherwise known as Assembler.

This is a picture of the first personal computer ever sold, the MITS Altair (from OldComputers.Net):

MITS Altair image, courtesy of OldComputers.net

The First Personal Computer

 

The Altair was introduced in January of 1975. It had 256 bytes of RAM (up to a maximum of 64K), a 2.0 MHz Intel 8080 CPU and was available for $395 as a kit. You got a box of parts. You had to make your own circuit boards. If you wanted it pre-assembled, that was an extra $255.

I want to draw your attention to the front panel of this computer. Yes, I mean all of the blinky lights and flippy switches. If you wanted to program this personal computer, you had to flip the switches on the front panel, entering your commands one word at a time in machine language (a series of 1’s and 0’s).

Now because there was no floppy drive, no hard drive, no secondary storage of any kind, if someone happened to trip over the power cord while you were flipping switches, you had to start over from the very beginning.

I mention this to tell you that no matter how arcane and hard to write, maintain, debug, read and understand Assembler is, things could be (and have been) much, much worse.

Discussion:

While Assembler was a step up from machine language, it wasn’t what you would call a giant leap. You still have to shuffles 1’s and 0’s around and have a deep understanding of the underlying hardware of your computer, but at least it took slightly less time to write code.

But if we now have all of these high-level languages, why bother writing code in Assembler?

  • Speed – Assembler programs are extremely efficient and very fast.
  • Size – The efficiency of Assembler also means that programs take up very little space.
  • Power – Since you’re working very close to the hardware, you can do things with Assembler that you can’t do with higher level languages

It’s not as if you have to choose between one or the other. If you have a section of code that needs to run quickly and efficiently you can write it in Assembler and hook it into your higher-level code.

For this exercise I decided to use NASM (The Netwide Assembler), a free, cross-platform assembler for Intel-compatible processors. There doesn’t seem to be a good IDE for Assembler (rumors exist of an Assembler plugin for Eclipse) so I’m going to use a text editor. Though NASM is available for Mac OS X, documentation on how to use it is limited or out-of-date. Linux, on the other hand, is well-documented so I’ll be using Debian 7 for my development platform and SciTe as my text editor. It’s free, cross-platform and does syntax highlighting for a number of common programming languages.

I looked for a “Hello World!” program in Assembler and thanks to a very nice tutorial from Ray Toal at Loyola Marymount University, this is what I found:

Hello World in Assembler

Hello World in Assembler (Using the SciTe text editor)

Compiling and running just involved a bit of typing.

Compiling and running code

Compiling and running our code

Before we look through the source, let’s see what’s happening at the command line.

nasm -felf64 hello.asm

The source code file is called hello.asm. This command tells NASM what format (-f) the output should take. In this case it’s 64-bit ELF (Executable Linkable Format, the standard binary format for Linux executables, object code and software libraries). NASM takes the source code (hello.asm) and compiles it into the object code file hello.o. Object code is machine code, a set of binary instructions.

However, as far as the computer is concerned it’s just a pile of 1’s and 0’s.

ld hello.o && ./a.out

There are actually two commands here.

ld hello.o

The first runs the linker (ld) to link the object code with the ABI (Application Binary Interface) for the target platform. It’s like putting a ‘Start’ button on our code so our processor knows how to run it. The default output is an executable called a.out. The line

./a.out

runs the program to print “Hello World!”. The ‘&&’ between the two commands simply means ‘if the first command succeeds run the second command, otherwise just stop’.

Now let’s go through the source code line-by-line:

# ----------------------------------------------------------------------------------------
# Writes "Hello, World" to the console using only system calls. Runs on 64-bit Linux only.
# To assemble and run:
#
#     gcc -c hello.s && ld hello.o && ./a.out
#
# or
#
#     gcc -nostdlib hello.s && ./a.out
# ----------------------------------------------------------------------------------------

 

Anything that starts with a pound sign (hashtag for you youngsters) is a comment and is ignored by nasm. This is pretty standard Linux syntax and you may see the semicolon (;) used for a similar purpose on other platforms.

       .global _start

The global keyword marks symbols that need to show up in the output (hello.o). As you might suspect, NASM is showing the linker where the executable starts.

       .text

This is just a name for a section of code. Remember, the output is just a string of 1’s and 0’s which could mean anything, depending on where you start reading and how many bits you read at a time. This keyword .text tells us that that the following section is runnable code, not data.

_start:

As stated above, here’s where our actual code starts, in case it wasn’t obvious.

       mov     $1, %rax
       mov     $1, %rdi
       mov     $message, %rsi
       mov     $13, %rdx

A machine language instruction consists of two parts, the opcode (what the code should do) and the operand (the data on which the opcode should act). In these four lines, .mov is the opcode for loading data into a register (a small fast chunk of memory used for temporary storage). These four instructions copy data into the registers rax, rdi, rsi and rdx.

The first (rax) holds the location of the system call we’re going to use, which is write ($1).

The second (rdi) is the memory location where we’re going to write our message which is standard output ($1).

The third (rsi) is the string we’re going to write to output. ($message is just a label for the string data, which we’ll find elsewhere in our code. Hang on.)

The fourth (rdx) is how much of our message to print, 13 bytes (or characters).

We’re loading a sequence of commands. Write 13 bytes of our message to standard output. It’s like setting up a series of dominoes, just waiting for that lone finger to start the string of activity.

And here comes the finger:

            syscall

Since our system already knows how to print strings, why reinvent the wheel? This opcode tells our processor to read the four sections of memory we’ve set up and act accordingly. We still have to tell the processor we’re done so let’s set the dominoes up again.

       mov     $60, %rax
       xor     %rdi, %rdi

The first loads the exit command ($60) into our register rax. The second is the exit code for our program to spit out, in this case 0. Ready for the finger again?

       syscall

Finally we come to our data section.

message:

       .ascii  "Hello, world\n"

This is labelled with message: so we can refer to it elsewhere in our code, as we saw above. In this case, it contains the string that we want the system to print to standard output. It’s labelled as plain old alphanumeric text, because we have to specify how we want the processor to interpret those 1’s and 0’s.

On the plus side, Assembler is a very powerful way to write fast code that takes up very little memory space. On the minus side, you have to keep track of where your data is at all times and what takes a single line of code in a high-level language takes thirteen in Assembler.

Not that there’s anything wrong with that.

Hyde, Randall. The art of assembly language. No Starch Press, 2003.

 

Leave a comment