Assembly language and machine code – Gary explains
Today we are used to running a rich variety of operating systems and programs on our devices. From Office on a Windows laptop to a game on an Android smartphone, we are accustomed to running any program that we have installed (stored) on a device. But things didn’t use to be like that. OK, I am not talking about 5 years ago, but more like 50 or 60 years ago. You see, the first computers didn’t run programs stored on some kind of media; they only ran the program that the physical circuit board allowed them to run. The idea of loading and running a stored program didn’t exist.
That was until two very clever guys started to think about building a universal computer that could theoretically run any program we care to create. The first of these two guys was Alan Turing. He played a major role in cracking the German Enigma code during the Second World War; however, he is also known for lots of other things, including his work on AI (e.g. the Turing Test) and his idea of the Turing Machine (and the Universal Turing Machine). In essence Turing described a machine that could read or write symbols on a tape and then, under the direction of those symbols, move to another part of the tape and read or write more symbols, and so on. This idea was extended by John von Neumann in a design which is known as the von Neumann architecture. Instead of tape it had Random Access Memory (RAM) and a CPU that could execute instructions from RAM and alter data in the same RAM. The von Neumann architecture is the basic premise of almost all modern computers.
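To make the stored-program idea a little more concrete, here is a minimal sketch in C of a toy machine whose instructions and data live in the same array. The opcodes, the two-cell instruction format and the names toy_run and toy_demo are all made up for illustration; they do not correspond to any real processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy machine; not a real instruction set. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

/*
 * Runs the program held in mem, starting at address 0.
 * Each instruction is two cells: an opcode followed by a memory address.
 * Instructions and data share one array, as in a von Neumann design.
 * Returns the final value of the accumulator.
 */
uint8_t toy_run(uint8_t *mem, size_t mem_size)
{
    size_t pc = 0;    /* program counter: address of the next instruction */
    uint8_t acc = 0;  /* the machine's single register */

    while (pc + 1 < mem_size) {
        uint8_t opcode = mem[pc];
        uint8_t addr = mem[pc + 1];
        pc += 2;

        if (opcode == OP_HALT)
            break;

        assert(addr < mem_size);
        switch (opcode) {
        case OP_LOAD:  acc = mem[addr];                    break;
        case OP_ADD:   acc = (uint8_t)(acc + mem[addr]);   break;
        case OP_STORE: mem[addr] = acc;                    break;
        default:       return acc;  /* unknown opcode: stop */
        }
    }
    return acc;
}

/* Example memory image: a program followed by the data it works on. */
uint8_t toy_demo(void)
{
    uint8_t mem[10] = {
        OP_LOAD,  8,  /* acc = mem[8]        */
        OP_ADD,   9,  /* acc = acc + mem[9]  */
        OP_STORE, 8,  /* mem[8] = acc        */
        OP_HALT,  0,
        15,           /* data at address 8   */
        25            /* data at address 9   */
    };
    toy_run(mem, sizeof mem);
    return mem[8];
}
```

Changing the numbers in the array changes what the machine does, without touching toy_run at all. That is the whole point of a stored program.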
But what does this all have to do with assembly language and machine code? In a nutshell the computer at the heart of your smartphone is a von Neumann machine that runs programs (apps) stored in the phone (the flash memory) and those programs can be changed, updated, and removed, just by altering what is stored in the flash. Each app is made up of instructions, stored instructions which tell the processor what to do. Your smartphone probably has a processor based on the ARM architecture and a CPU core designed either by ARM (e.g. the Cortex-A72) or by one of ARM’s partners like Samsung or Qualcomm. These processors all understand the same instruction codes.
Instructions are basically numbers. The width of those numbers (e.g. 8-bit, 16-bit, etc.) depends on the architecture. ARM instructions are 16 bits or 32 bits wide, depending on which mode is being used (even in 64-bit mode each instruction is 32 bits wide). When the CPU sees a number, for example the 16-bit value 0x2001 (stored in memory as the two bytes 01 20, which is why you may see it written as 0120), it knows that this means “put 1 in register 0.” It is the same on the Cortex-A72, on the Qualcomm Kryo, on the Apple A9 processor, and so on.
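As a rough sketch of what "instructions are numbers" means, the C below pulls apart the 16-bit Thumb encoding of “put 1 in register 0.” It assumes the Thumb "MOVS Rd, #imm8" layout: the top five bits are 00100, the next three bits are the register, and the low eight bits are the value. Under that layout the instruction is the halfword 0x2001, which sits in little-endian memory as the bytes 01 20. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuilds a 16-bit instruction from two bytes stored in little-endian order. */
uint16_t halfword_from_bytes(uint8_t first, uint8_t second)
{
    return (uint16_t)(first | (second << 8));
}

/* Returns 1 if insn has the Thumb "MOVS Rd, #imm8" pattern (top five bits 00100). */
int is_movs_imm(uint16_t insn)
{
    return (insn >> 11) == 0x04;
}

/* Destination register number, held in bits 10..8. */
unsigned movs_rd(uint16_t insn)
{
    assert(is_movs_imm(insn));
    return (insn >> 8) & 0x7u;
}

/* Immediate value, held in bits 7..0. */
unsigned movs_imm(uint16_t insn)
{
    assert(is_movs_imm(insn));
    return insn & 0xFFu;
}
```

Feeding it the bytes 0x01 and 0x20 gives the halfword 0x2001, which decodes as register 0 and the value 1.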
It is this “raw” number format that is machine code. On a modern processor it is very hard (and inefficient) to write machine code by hand, typing in the raw numbers. So there is a slightly higher level language called assembly language which is a text representation of the machine code. A program called an assembler is then used to convert from the assembly language to the machine code.
Earlier I mentioned an instruction that means “put 1 in register 0.” A register is a little pot which can hold a number. There are only a few of them (at most 64), so they can’t replace main memory; however, when doing a particular job (say, looping around while working on a string) they are great as a fast temporary holder for data. In assembly language “put 1 in register 0” is written like this: “movs r0, #1”. So when the assembler sees a “movs” operation it can generate the right machine code, depending on the register used, the value and so on.
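To give a feel for what an assembler does, here is a very small sketch in C that turns a line of text such as “movs r0, #1” into a 16-bit number. It only understands that one operation, and it assumes the 16-bit Thumb "MOVS Rd, #imm8" encoding (registers r0 to r7, values 0 to 255). The name assemble_movs is invented for this example; a real assembler handles hundreds of instructions, labels and much more.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assembles one line of the form "movs rN, #imm" into a 16-bit Thumb
 * instruction. Returns 1 on success and writes the encoding to *out;
 * returns 0 if the line is not recognised or an operand is out of range.
 */
int assemble_movs(const char *line, uint16_t *out)
{
    unsigned rd, imm;

    assert(line != NULL && out != NULL);

    /* Pick the register number and the immediate value out of the text. */
    if (sscanf(line, " movs r%u , #%u", &rd, &imm) != 2)
        return 0;

    /* This encoding only has room for r0..r7 and an 8-bit value. */
    if (rd > 7 || imm > 255)
        return 0;

    /* 00100 in the top five bits, then the register, then the value. */
    *out = (uint16_t)(0x2000u | (rd << 8) | imm);
    return 1;
}
```

Given “movs r0, #1” it produces 0x2001, and given “movs r3, #15” it produces 0x230F. Anything else is rejected.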
So here is a snippet of assembly language:
// i = 15;
mov r3, #15
str r3, [r11, #-8]
// j = 25;
mov r3, #25
str r3, [r11, #-12]
// i = i + j;
ldr r2, [r11, #-8]
ldr r3, [r11, #-12]
add r3, r2, r3
str r3, [r11, #-8]
The lines starting with “//” are comments which contain the C language equivalent of what the assembly language is doing. As you can see, this code sets a variable called i to 15. The variable is stored on the stack, 8 bytes below the address held in r11 (used here as the frame pointer). It then sets j, which is stored 12 bytes below that address, to 25. Finally it adds i to j (by loading i into r2 and j into r3) and then stores the result back in i.
This means that to set the value of two variables and then add them together takes 8 lines of code. Imagine how much code you would need to write a game like Clash Royale! That is where higher level languages like C, C++ and Java come in. The equivalent program in C is just three lines long, quite a saving! Also, high level languages let you use nice variable names rather than having to keep track of where things are stored on the stack or in main memory.
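For comparison, here are those three lines of C, taken from the comments in the assembly snippet. The variable declarations and the wrapper function add_example are additions of mine so that the fragment compiles on its own.

```c
/* The C equivalent of the assembly snippet, wrapped in a hypothetical function. */
int add_example(void)
{
    int i;
    int j;

    i = 15;
    j = 25;
    i = i + j;

    return i;
}
```

The compiler decides where i and j live (on the stack or in registers) and generates the loads, stores and adds for you.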
Normally apps for Android are written in Java. The Java is compiled to byte-code which in turn is run by a virtual machine (on Android this is the Android Runtime rather than a standard Java Virtual Machine). This works well for the majority of apps, but if you need to squeeze that extra bit of performance out of your app then you might want to write the code in C or directly in assembly language. Using the Android Native Development Kit (NDK) it is possible to write parts of an app in C. The C is then compiled directly to machine code. Or if you want the ultimate level of control then you can even write assembly code using the NDK! Nerds only need apply.
Stored-program computers can be referred to as von Neumann architecture machines. They run programs stored somewhere on the system and are flexible (universal) in the sense that they can run any computable algorithm. The actual raw instructions that the CPU executes are called machine code. A slightly more human readable form of machine code is called assembly language and a program called an assembler is used to convert the assembly notations into machine code. Higher level languages like C or C++ are converted into machine code using a compiler. While normal apps are written in Java on Android, it is possible to write C, C++ and assembly language programs using the NDK.