Writing Assembly for AARCH64 & X86_64 vs Writing C

What is assembly like?

Writing in assembly is quite a challenge, and it takes many more lines of code to do even very simple things.

Consider the following output for the program we want to write:
Loop: 1
Loop: 2
Loop: 3
Loop: 4
...
Loop: 30

In any higher level language, a simple loop that produces the output above would take only about 8 lines.

Note: As you look at the code examples, please ignore the word "Javascript". Each code block has a heading naming the language used: C, AARCH64 assembly, or X86_64 assembly.

For example, in C:




But in AARCH64 Assembly:




And in X86_64 Assembly:




Difference in Coding in C and Assembly

C and C++ are higher level programming languages that:

  • Allow access to memory addresses through pointers (unlike languages like Java)
  • Have very explicit instructions. You can follow your code from one function to another without having to assume that much is going on in the background
  • Use variables that can be assigned a value. The variable name describes the value so that you don’t have to memorize a meaningless name like ‘x’

Assembly, in contrast:

  • Gives more granular control, because you work directly with CPU registers and memory addresses instead of named variables
  • Breaks every control structure and statement in C/C++ down into several smaller instructions that each do only one task

AARCH64 specifically:

  • Register names are mostly meaningless (e.g. x0, x1, x2, x3)
  • Has more single-purpose (simple) instructions than X86_64, meaning that sometimes you must use 2 separate instructions instead of 1.
    E.g. to get a quotient and a remainder:
    1. In AARCH64: use udiv for the quotient, then msub to compute the remainder
    2. In X86_64: a single div leaves the quotient in rax and the remainder in rdx
    Notice that the X86_64 file is still not shorter, because mov is still needed twice (for example, to load the dividend into rax and to zero rdx before the div)

X86_64 specifically:

  • Has some special-purpose registers whose names are acronyms, which give more of a clue as to what they hold. E.g.
    rbp = Register Base Pointer, which holds the base address of the current stack frame
  • Register names are prefixed with % in the AT&T syntax used by the GNU assembler (e.g. %r10 is register 10)
    This helps to quickly recognize which symbols are registers
  • Immediate values, such as constants, label addresses, and hexadecimal numbers, are prefixed with $ (e.g. $start and $0x30)
    This looks similar to bash, where $ is used to read any variable, although here the $ marks a literal value rather than a variable lookup.


Assumptions you must make

One of the most confusing things for me was:
How does the system call know what to use as its parameters?

This quote, found on the SPO600 Wiki, gave me the hint:

“r0-r7(x0-x7 for the smaller width registers) are used for arguments and return values; additional arguments are on the stack”

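The system call takes argument one from r0 and proceeds through the following registers in order. It does not look for an empty register: the system call number (placed in x8 on AARCH64 Linux) determines how many arguments are read. Note that in AARCH64 assembly these registers are written x0-x7 for the full 64-bit width and w0-w7 for the lower 32 bits.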

For example, when printing to the console:
  1. Register 0 (x0) is set to the file descriptor for standard output, which is 1 (line 24 of the AARCH64 code)
  2. Register 1 (x1) is set to the address of the string (line 18 of the AARCH64 code)
  3. Register 2 (x2) is set to the length of the string to print in bytes (line 27 of the AARCH64 code)

Debugging in Assembly vs C

Debugging was just a bit more difficult than in C. C/C++ compilers check a lot of other things in the code to make sure that everything will run smoothly before it is run. Assembly lets you get away with a lot more, and a mistake can simply produce no errors and no output (a lot like JavaScript).

But most of the time, errors were straightforward and provided enough information to fix the problem. For example:
  1. operand 3 should be an integer register -- `udiv x9,x4,10'
    • Tells you that an operand (parameter) of the instruction is the wrong kind. In this case it should have been a register holding an integer, but it was the immediate value 10 instead.
    • Line 11 of the AARCH64 code contains the corrected code
  2. invalid addressing mode at operand 2 -- `str x12, x1, 7'
    • Tells you that the addressing mode is incorrect. In other words, the syntax of the instruction was wrong: the memory operand is missing its square brackets and should be written as [x1, 7].
    • Line 20 of the AARCH64 code contains the corrected code
  3. segmentation fault
    • Tells you that the program tried to access a memory address that is null or otherwise inaccessible. This appears when the program runs rather than when it is assembled, and it means the same thing as in C.

Although assembly is different, longer, and less human-readable, it wasn't as terrible an experience as most make it out to be!
In just 6 hours, I was able to learn enough assembly to write the program above, and you can too :)
