Introduction to Reverse Engineering - What is Assembly Code?

A beginner-friendly tutorial that teaches you the basics of reverse engineering by learning about assembly or machine code.
Captain Salem 6 min read
Welcome to our beginner's series on reverse engineering, binary exploitation, web exploitation, and other security-related concepts.

This series will cover some basic concepts related to security, hacking, cybersecurity, or whatever you want to call it. The following are some upcoming topics you can look forward to:

  1. What is Binary Exploitation
  2. Introduction to Registers
  3. The Stack explained
  4. Introduction to Calling Conventions
  5. Brief Introduction to Global Offset Table
  6. Introduction to Buffers and Buffer Overflows
  7. Introduction to the heap and heap exploitation
  8. The Basics of Disassemblers, Debuggers, and Decompilers

And many more. If that sounds interesting, subscribe to our newsletter to get the post straight to your email.

What is Reverse Engineering?

Reverse engineering refers to taking compiled code, either machine code or bytecode, and converting it back into a human-readable format.

In most cases, reverse engineering allows us to understand the program's functionality better and determine how it runs. This can then help us find flaws, attempt to exploit them, and make the code behave differently from its intended use.

An example use of reverse engineering is software cracks. These are tools developed to circumvent the licensing of a given piece of software and bypass the locked interface.

NOTE: This site does not in any way condone or encourage the use of pirated software :D

Anatomy of Reverse Engineering

Reverse engineering is an extensive field built on other disciplines. Although it is difficult to list exactly what you need, three main components are fundamental to it:

  • Assembly or Machine Code
  • Disassemblers
  • Decompilers

For this tutorial, we will introduce you to the world of RE by learning the fundamentals of Assembly or Machine code. Stay tuned for upcoming topics on Disassemblers and Decompilers.

Introduction to Assembly Code

Machine code is the set of binary instructions that the CPU reads and executes directly. Assembly code is its human-readable text form, in which each line generally corresponds to one machine instruction. When we write a program in a higher-level language, such as C, C++, or Rust, it must be converted to machine code so that the CPU can decode and carry out the target operations. This conversion is known as compilation.

Once the code has been compiled, it is hard to reverse it back into the human-readable code you would write in your favorite language. There are tools that do a reasonable job, but they rarely recover the original source exactly.

Source Code to Assembly Code

Let us now illustrate what assembly code looks like. For our illustration, we will write a simple hello world program in C and convert it to assembly code using Compiler Explorer.

Take the following hello world program in C:

#include <stdio.h>
int main() {
  printf("Hello World!");
  return 0;
}

Head over to Compiler Explorer and paste in the hello world program above. The resulting assembly code should appear in real time in the right panel.

An example of the resulting code is shown below:

.LC0:
        .string "Hello World!"
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        pop     rbp
        ret

Ok, what is that?

Although it may look like gibberish at first glance, assembly code is easy to read and interpret with a little practice. This is because it is made up of repeatable, logical instructions.

X86-64

x86-64, also known as amd64 or x64, is a 64-bit Complex Instruction Set Computing (CISC) architecture. It extends Intel's 32-bit x86 architecture, so its general-purpose registers gain an extra 32 bits, for 64 bits in total. CISC means that a single instruction can do several things at once, such as accessing memory and reading registers.

It is also a variable-length instruction set, meaning different instructions can be of different sizes, ranging from 1 to 15 bytes long. And finally, x86-64 allows for multi-sized register access, which means you can access parts of a register at different sizes.
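To make the variable-length point concrete, here is a small sketch in C that stores the machine-code bytes for four instructions from the hello world listing above. The struct and function names are made up for this illustration. The byte values are the standard encodings an assembler produces for these instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* One assembled instruction: its text form and its machine-code bytes.
   (Illustrative type, not part of any real library.) */
struct encoded_insn {
    const char *text;
    uint8_t bytes[8];   /* large enough for the examples below */
    size_t length;      /* how many of those bytes are used */
};

/* Four instructions from the hello world listing, with their encodings. */
static const struct encoded_insn listing[] = {
    { "push rbp",     { 0x55 },                         1 },
    { "mov rbp, rsp", { 0x48, 0x89, 0xE5 },             3 },
    { "mov eax, 0",   { 0xB8, 0x00, 0x00, 0x00, 0x00 }, 5 },
    { "ret",          { 0xC3 },                         1 },
};

/* Total number of machine-code bytes these instructions occupy. */
size_t listing_size(void) {
    size_t total = 0;
    for (size_t i = 0; i < sizeof listing / sizeof listing[0]; i++) {
        total += listing[i].length;
    }
    return total;
}
```

Notice that push rbp and ret take a single byte each, while mov eax, 0 needs five because it carries a 4-byte immediate value.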

x86-64 Registers

x86-64 registers behave similarly to those of other architectures. A key feature of x86-64 registers is multi-sized access: the lower 32 bits of the register RAX can be accessed as EAX, the lower 16 bits as AX, and the lowest 8 bits as AL. This lets the compiler pick the operand size that suits the data it is working with.
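One way to picture multi-sized access is to treat RAX as a 64-bit integer and the smaller names as views of its low bits. The sketch below models that in C. The function names are invented for this example, and the two write helpers reflect standard x86-64 behavior: writing a 32-bit register clears the upper 32 bits, while writing an 8-bit register leaves the other bits alone.

```c
#include <stdint.h>

/* Read views: each cast keeps only the low bits of the 64-bit value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }  /* low 32 bits */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }  /* low 16 bits */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }   /* low 8 bits  */

/* Writing EAX replaces the low 32 bits and zeroes the upper 32 bits. */
uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax;                 /* the old contents do not survive */
    return (uint64_t)value;
}

/* Writing AL replaces only the low 8 bits and keeps everything else. */
uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~0xFFULL) | value;
}
```

For example, if RAX holds 0x1122334455667788, then EAX reads as 0x55667788, AX as 0x7788, and AL as 0x88.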

Multi-access Register

x86-64 has plenty of registers, including rax, rbx, rcx, rdx, rdi, rsi, rsp, rip, r8-r15, and more! But some registers serve particular purposes.

The special registers include:

  1. RIP: the instruction pointer
  2. RSP: the stack pointer
  3. RBP: the base pointer

Assembly Instructions

Assembly code consists of a series of instructions that determine the operations performed by the CPU. You will find various kinds of instructions, such as:

  • Data Movement Instructions - mov, push, pop, lea
  • Arithmetic and Logic Instructions - add, sub, inc, dec, imul, and, or, etc.
  • Control Flow Instructions - jmp, conditional jumps (je, jnz, and so on), cmp, call, ret

Execution

What should the CPU execute? This is determined by the RIP register, where IP means instruction pointer. Execution follows the pattern: fetch the instruction at the address in RIP, decode it, and run it.
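The fetch, decode, run pattern can be sketched as a loop in C. This is a toy model only: the three opcodes, the single accumulator register, and the demo program are all hypothetical and far simpler than a real x86-64 CPU. The variable ip plays the role of RIP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, invented for this illustration. */
enum opcode { OP_LOAD_IMM, OP_ADD_IMM, OP_HALT };

struct insn {
    enum opcode op;
    uint64_t operand;
};

/* A tiny demo program: load a value, add to it, stop. */
static const struct insn demo_program[] = {
    { OP_LOAD_IMM, 0xdeadbeef },
    { OP_ADD_IMM,  0x1234 },
    { OP_HALT,     0 },
};

/* Run a program until OP_HALT and return the accumulator. */
uint64_t run(const struct insn *program) {
    size_t ip = 0;      /* instruction pointer, the toy version of RIP */
    uint64_t acc = 0;   /* a single general-purpose register */

    for (;;) {
        struct insn current = program[ip];  /* fetch the instruction at ip */
        ip += 1;                            /* ip now points at the next one */

        switch (current.op) {               /* decode */
        case OP_LOAD_IMM:                   /* execute */
            acc = current.operand;
            break;
        case OP_ADD_IMM:
            acc += current.operand;
            break;
        case OP_HALT:
            return acc;
        }
    }
}
```

Calling run(demo_program) walks through the three instructions in order and returns 0xdeadd123. A jump instruction would simply be one that writes a new value into ip.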

Examples

mov rax, 0xdeadbeef

Here the operation mov moves the "immediate" value 0xdeadbeef into the register RAX.

mov rax, [0xdeadbeef + rbx * 4]

Here the operation mov moves the data at the address of [0xdeadbeef + RBX*4] into the register RAX. When brackets are used, you can think of the program as getting the content from that effective address.
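The base + index * scale form is the same arithmetic C does when you index an array. Below is a minimal sketch, assuming a small table of 4-byte entries in place of the fixed address 0xdeadbeef. The names table, effective_address, and load_entry are made up for this example, and because the entries are 4 bytes wide it corresponds to a 32-bit load such as mov eax, [table + rbx * 4].

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the data living at the base address: four 4-byte entries. */
static const uint32_t table[4] = { 10, 20, 30, 40 };

/* The expression inside the brackets: base + index * scale. */
uint64_t effective_address(uint64_t base, uint64_t index, uint64_t scale) {
    return base + index * scale;
}

/* Model of "mov eax, [table + rbx * 4]": compute the byte offset,
   then read the 4 bytes stored there. */
uint32_t load_entry(uint64_t rbx) {
    uint64_t offset = effective_address(0, rbx, sizeof table[0]);
    uint32_t value;
    memcpy(&value, (const unsigned char *)table + offset, sizeof value);
    return value;   /* same result as table[rbx] */
}
```

With RBX set to 2, the offset is 8 bytes and load_entry returns 30, exactly what table[2] gives in C. With the article's numbers, effective_address(0xdeadbeef, 2, 4) is 0xdeadbef7.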

Example Execution

-> 0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804000
   0x080400a: add rax, rbx                   RAX = 0x0
   0x080400d: inc rbx                        RBX = 0x0
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
-> 0x0804005: mov ebx, 0x1234                RIP = 0x0804005
   0x080400a: add rax, rbx                   RAX = 0xdeadbeef
   0x080400d: inc rbx                        RBX = 0x0
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x080400a
-> 0x080400a: add rax, rbx                   RAX = 0xdeadbeef
   0x080400d: inc rbx                        RBX = 0x1234
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x080400d
   0x080400a: add rax, rbx                   RAX = 0xdeadd123
-> 0x080400d: inc rbx                        RBX = 0x1234
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804010
   0x080400a: add rax, rbx                   RAX = 0xdeadd123
   0x080400d: inc rbx                        RBX = 0x1235
-> 0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804013
   0x080400a: add rax, rbx                   RAX = 0xdeadbeee
   0x080400d: inc rbx                        RBX = 0x1235
   0x0804010: sub rax, rbx                   RCX = 0x0
-> 0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804016
   0x080400a: add rax, rbx                   RAX = 0xdeadbeee
   0x080400d: inc rbx                        RBX = 0x1235
   0x0804010: sub rax, rbx                   RCX = 0xdeadbeee
   0x0804013: mov rcx, rax                   RDX = 0x0
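If you want to check the register values in the trace yourself, the same six instructions can be replayed with ordinary C arithmetic. This is only a sketch: the registers struct and run_trace function are names invented here, and each register is modeled as a plain 64-bit unsigned integer.

```c
#include <stdint.h>

/* The four registers shown in the trace, modeled as 64-bit integers. */
struct registers {
    uint64_t rax, rbx, rcx, rdx;
};

/* Replay the six instructions from the trace and return the final state. */
struct registers run_trace(void) {
    struct registers r = { 0, 0, 0, 0 };

    r.rax = (uint32_t)0xdeadbeef;  /* mov eax, 0xdeadbeef (upper 32 bits become 0) */
    r.rbx = (uint32_t)0x1234;      /* mov ebx, 0x1234 */
    r.rax += r.rbx;                /* add rax, rbx  -> 0xdeadd123 */
    r.rbx += 1;                    /* inc rbx       -> 0x1235 */
    r.rax -= r.rbx;                /* sub rax, rbx  -> 0xdeadbeee */
    r.rcx = r.rax;                 /* mov rcx, rax */

    return r;
}
```

The final state matches the last frame of the trace: RAX and RCX hold 0xdeadbeee, RBX holds 0x1235, and RDX is never touched.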

Control Flow

How can we express conditionals in x86-64? We use conditional jumps such as:

  • jnz <address>
  • je <address>
  • jge <address>
  • jle <address>
  • etc.

They jump if their condition is true and fall through to the next instruction otherwise. These conditions are read from the flags register (EFLAGS, or RFLAGS in 64-bit mode), a special register whose bits are set by certain instructions. For example, add rax, rbx sets the overflow flag (OF) if the signed result does not fit in 64 bits and wraps around. You can jump based on that with a jo instruction. The most important instruction to remember is cmp:

cmp rax, rbx
jle error

This assembly jumps to error if RAX <= RBX, with both values treated as signed integers.
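Under the hood, cmp subtracts its second operand from the first, throws the result away, and keeps only the flags. jle then looks at those flags. The C sketch below models that pair of steps. The struct and function names are made up for this example, and only the three flags that jle reads are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* The three flags that jle inspects. */
struct flags {
    bool zf;  /* zero flag: the result was 0 */
    bool sf;  /* sign flag: the top bit of the result is set */
    bool of;  /* overflow flag: the signed subtraction overflowed */
};

/* Model of "cmp a, b": compute a - b, keep the flags, discard the result. */
struct flags cmp_flags(uint64_t a, uint64_t b) {
    uint64_t diff = a - b;  /* wraps modulo 2^64, like the CPU */
    struct flags f;

    f.zf = (diff == 0);
    f.sf = (diff >> 63) != 0;
    /* Signed overflow happens when a and b have different signs
       and the result's sign differs from a's sign. */
    f.of = (((a ^ b) & (a ^ diff)) >> 63) != 0;
    return f;
}

/* jle is taken when ZF is set, or when SF and OF differ. */
bool jle_taken(struct flags f) {
    return f.zf || (f.sf != f.of);
}
```

So jle_taken(cmp_flags(3, 5)) and jle_taken(cmp_flags(5, 5)) are both true, while jle_taken(cmp_flags(7, 5)) is false. Passing (uint64_t)-1 as the first value also takes the jump, because jle treats it as the signed value -1.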

Addresses

Memory acts similarly to an immense array where the indices of this "array" are memory addresses. Remember from earlier:

mov rax, [0xdeadbeef]

The square brackets mean "get the data at this address." This is analogous to the C/C++ syntax: rax = *(uint64_t *)0xdeadbeef;
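Reading from a hard-coded address like 0xdeadbeef would crash a normal program, so the sketch below uses the address of a small array instead. The names memory, slot_address, and load are invented for this illustration. The idea is the same: an address is just a number, and dereferencing it fetches the bytes stored there.

```c
#include <stdint.h>

/* A small region of memory holding four 8-byte values. */
static const uint64_t memory[4] = { 0x11, 0x22, 0x33, 0x44 };

/* The numeric address of the i-th 8-byte slot. */
uintptr_t slot_address(unsigned i) {
    return (uintptr_t)&memory[i];
}

/* Model of "mov rax, [address]": read the 8 bytes stored at that address. */
uint64_t load(uintptr_t address) {
    return *(const uint64_t *)address;
}
```

Here load(slot_address(2)) returns 0x33, and neighboring slots sit exactly 8 bytes apart, which is the "immense array" picture from above in miniature.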

Conclusion

This was a simple introduction to reverse engineering by learning how to work with Assembly code. This article is produced in conjunction with OSIRIS Lab and CTF101.
