This 6502 Emulator Executes 1-3 Instructions Per Second (Written in Markdown, Running in an LLM)

Adam Dunkels, PhD · May 25, 2026 · 18 mins read

In previous posts we explored what happens if we treat LLMs as processors that run Markdown as their machine code: a user-space IP stack written in Markdown and a BASIC interpreter written in Markdown. Today we are going to look at an emulator for a 6502 microprocessor, written in Markdown and executed with OpenCode and the GLM-5.1 model running on Grunden.ai.

Yes, this is a ridiculous idea with no practical value. But it is a fun experiment.

A 6502 Microprocessor Written in Markdown

The 6502 was a popular microprocessor for home computers in the late 1970s and 1980s. It, or a close variant of it, was used in the Commodore 64, Apple II, and Nintendo Entertainment System, among many others.

The 6502 has three general-purpose 8-bit registers (A, X, and Y) and can address 64 KB of memory. In addition to the three registers, the first 256 bytes of memory, called the zero page, can be addressed with shorter instructions and work like a larger register pool. The stack is fixed to the page starting at address $0100 and limited to 256 bytes.

Opcodes are one byte wide and take zero, one, or two additional bytes as arguments. Because the opcode decoder circuitry was simplified, there are also several undocumented opcodes. For now, we do not worry about those.
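
To make the encoding concrete, here is a minimal Python sketch (not part of the emulator) that splits a short byte string into instructions. The `LENGTHS` table and the `split_instructions` helper are hypothetical names made up for this illustration, and the table covers only the five opcodes of a tiny program that adds two numbers. Two-byte operands are stored low byte first.

```python
# Hypothetical lookup table: total instruction length in bytes
# (opcode + operand bytes) for a handful of documented 6502 opcodes.
LENGTHS = {
    0xA9: 2,  # LDA #imm  (one operand byte)
    0x18: 1,  # CLC       (no operand)
    0x69: 2,  # ADC #imm  (one operand byte)
    0x8D: 3,  # STA abs   (two operand bytes, low byte first)
    0x00: 1,  # BRK       (treated as a one-byte halt in this sketch)
}

def split_instructions(code: bytes) -> list[bytes]:
    """Split raw machine code into per-instruction byte groups."""
    out = []
    pc = 0
    while pc < len(code):
        n = LENGTHS[code[pc]]  # raises KeyError for opcodes not in the table
        out.append(code[pc:pc + n])
        pc += n
    return out

program = bytes.fromhex("A9 03 18 69 05 8D 00 02 00")
parts = split_instructions(program)
for ins in parts:
    print(ins.hex(" ").upper())

# The STA operand is little-endian: the bytes 00 02 form the address $0200.
sta = parts[3]
address = sta[1] | (sta[2] << 8)
print(f"STA target: ${address:04X}")
```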

Let’s see what happens if we implement a 6502 emulator in Markdown, covering a subset of the documented instructions and addressing modes (leaving out, among other things, the indirect addressing modes and the weird Binary Coded Decimal mode). This is the resulting code:

run-6502.md
# Run 6502 — LLM as CPU

You are a MOS 6502 CPU emulator. The machine code is provided inline below as hex bytes. Execute it by fetching opcodes, decoding instructions, computing results, tracking registers/flags/memory, and following control flow — all in your own reasoning. No libraries, no Python, no calculator tools.

## Program

```
$ARGUMENTS
```

## Memory Model

- 64KB address space ($0000–$FFFF), sparsely tracked (only store bytes that are written).
- Program is loaded starting at **$0600** (the first byte in the hex dump is at $0600).
- Output region: **$0200–$02FF**. After execution, this region is displayed as the program's output.
- Stack: **$0100–$01FF**. Stack pointer (SP) indexes into this page.
- Zero page: **$0000–$00FF**. Fast access, used by zero-page addressing modes.

## CPU State

Initialize before execution:

```
Registers:
  A  = $00       (accumulator, 8-bit)
  X  = $00       (X index register, 8-bit)
  Y  = $00       (Y index register, 8-bit)
  SP = $FD       (stack pointer, 8-bit, points into $01xx)
  PC = $0600     (program counter, 16-bit)

Status flags (P register):
  N = 0  (Negative: bit 7 of result)
  V = 0  (Overflow: signed overflow on ADC/SBC)
  B = 0  (Break: set by BRK)
  I = 0  (Interrupt disable)
  Z = 0  (Zero: result is zero)
  C = 0  (Carry: unsigned overflow on ADC, unsigned borrow on SBC)

Memory: (empty — only the program bytes are loaded)
```

## Fetch-Decode-Execute Loop

Repeat until halted (BRK encountered or PC runs past loaded program bytes):

1. **Fetch**: Read the byte at PC. This is the opcode.
2. **Decode**: Look up the opcode in the instruction table below. Determine the mnemonic, addressing mode, and byte count.
3. **Read operands**: Fetch additional bytes as required by the addressing mode.
4. **Execute**: Perform the operation. Update registers, flags, and memory as specified.
5. **Advance PC**: PC += instruction byte count (already done during fetch/operand read).

**After every instruction**, track state in your reasoning:

```
[$xxxx] MNEMONIC operand → A=$xx X=$xx Y=$xx SP=$xx | NV-BDIZC=xxxxxxxx | PC=$xxxx
```

This is mandatory. It catches errors in flag computation and addressing.

## Addressing Modes

| Mode | Syntax | Bytes | How to resolve |
|------|--------|-------|----------------|
| Implied | `CLC` | 1 | No operand |
| Immediate | `LDA #$xx` | 2 | Value is the byte after opcode |
| Zero Page | `LDA $xx` | 2 | Address is $00xx; read/write that byte |
| Zero Page,X | `LDA $xx,X` | 2 | Address is ($xx + X) & $FF; read/write that byte |
| Zero Page,Y | `LDX $xx,Y` | 2 | Address is ($xx + Y) & $FF; read/write that byte |
| Absolute | `LDA $xxxx` | 3 | Address is the 16-bit value (low byte first); read/write that byte |
| Absolute,X | `LDA $xxxx,X` | 3 | Address is (16-bit value + X) & $FFFF |
| Absolute,Y | `LDA $xxxx,Y` | 3 | Address is (16-bit value + Y) & $FFFF |
| Relative | `BEQ $xx` | 2 | Signed offset (-128 to +127) added to PC (after PC has advanced past this instruction) |

## Instruction Set

### Load/Store

| Opcode | Mnemonic | Mode | Flags |
|--------|----------|------|-------|
| $A9 | LDA #imm | Immediate | N, Z |
| $A5 | LDA zp | Zero Page | N, Z |
| $B5 | LDA zp,X | Zero Page,X | N, Z |
| $AD | LDA abs | Absolute | N, Z |
| $BD | LDA abs,X | Absolute,X | N, Z |
| $B9 | LDA abs,Y | Absolute,Y | N, Z |
| $A2 | LDX #imm | Immediate | N, Z |
| $A6 | LDX zp | Zero Page | N, Z |
| $AE | LDX abs | Absolute | N, Z |
| $A0 | LDY #imm | Immediate | N, Z |
| $A4 | LDY zp | Zero Page | N, Z |
| $AC | LDY abs | Absolute | N, Z |
| $85 | STA zp | Zero Page | — |
| $95 | STA zp,X | Zero Page,X | — |
| $8D | STA abs | Absolute | — |
| $9D | STA abs,X | Absolute,X | — |
| $99 | STA abs,Y | Absolute,Y | — |
| $86 | STX zp | Zero Page | — |
| $8E | STX abs | Absolute | — |
| $84 | STY zp | Zero Page | — |
| $8C | STY abs | Absolute | — |

### Arithmetic

| Opcode | Mnemonic | Mode | Flags |
|--------|----------|------|-------|
| $69 | ADC #imm | Immediate | N, V, Z, C |
| $65 | ADC zp | Zero Page | N, V, Z, C |
| $6D | ADC abs | Absolute | N, V, Z, C |
| $E9 | SBC #imm | Immediate | N, V, Z, C |
| $E5 | SBC zp | Zero Page | N, V, Z, C |
| $ED | SBC abs | Absolute | N, V, Z, C |

**ADC**: `A + operand + C → A`. Set C if result > 255. Set V if signed overflow. N and Z from result.

**SBC**: `A - operand - (1-C) → A`. Equivalent to `A + ~operand + C`. Set C if result >= 0 (no borrow). Set V if signed overflow. N and Z from result.

### Comparison

| Opcode | Mnemonic | Mode | Flags |
|--------|----------|------|-------|
| $C9 | CMP #imm | Immediate | N, Z, C |
| $C5 | CMP zp | Zero Page | N, Z, C |
| $CD | CMP abs | Absolute | N, Z, C |
| $E0 | CPX #imm | Immediate | N, Z, C |
| $E4 | CPX zp | Zero Page | N, Z, C |
| $C0 | CPY #imm | Immediate | N, Z, C |
| $C4 | CPY zp | Zero Page | N, Z, C |

**CMP/CPX/CPY**: Compute `register - operand`. Set C if register >= operand. Set Z if equal. Set N from bit 7 of result. Do NOT store the result.

### Logic

| Opcode | Mnemonic | Mode | Flags |
|--------|----------|------|-------|
| $29 | AND #imm | Immediate | N, Z |
| $25 | AND zp | Zero Page | N, Z |
| $09 | ORA #imm | Immediate | N, Z |
| $05 | ORA zp | Zero Page | N, Z |
| $49 | EOR #imm | Immediate | N, Z |
| $45 | EOR zp | Zero Page | N, Z |

### Shifts and Rotates

| Opcode | Mnemonic | Mode | Flags |
|--------|----------|------|-------|
| $0A | ASL A | Implied (accumulator) | N, Z, C |
| $06 | ASL zp | Zero Page | N, Z, C |
| $4A | LSR A | Implied (accumulator) | N, Z, C |
| $46 | LSR zp | Zero Page | N, Z, C |
| $2A | ROL A | Implied (accumulator) | N, Z, C |
| $26 | ROL zp | Zero Page | N, Z, C |
| $6A | ROR A | Implied (accumulator) | N, Z, C |
| $66 | ROR zp | Zero Page | N, Z, C |

**ASL**: Shift left. Bit 7 goes to C, 0 goes into bit 0.
**LSR**: Shift right. Bit 0 goes to C, 0 goes into bit 7.
**ROL**: Rotate left through carry. Old C goes into bit 0, bit 7 goes to new C.
**ROR**: Rotate right through carry. Old C goes into bit 7, bit 0 goes to new C.

### Increment/Decrement

| Opcode | Mnemonic | Mode | Flags |
|--------|----------|------|-------|
| $E6 | INC zp | Zero Page | N, Z |
| $EE | INC abs | Absolute | N, Z |
| $C6 | DEC zp | Zero Page | N, Z |
| $CE | DEC abs | Absolute | N, Z |
| $E8 | INX | Implied | N, Z |
| $CA | DEX | Implied | N, Z |
| $C8 | INY | Implied | N, Z |
| $88 | DEY | Implied | N, Z |

All values wrap at 8 bits: `$FF + 1 = $00`, `$00 - 1 = $FF`.

### Branches (all Relative addressing, 2 bytes)

| Opcode | Mnemonic | Condition |
|--------|----------|-----------|
| $F0 | BEQ | Z = 1 |
| $D0 | BNE | Z = 0 |
| $B0 | BCS | C = 1 |
| $90 | BCC | C = 0 |
| $30 | BMI | N = 1 |
| $10 | BPL | N = 0 |
| $70 | BVS | V = 1 |
| $50 | BVC | V = 0 |

**Branch offset**: The byte after the opcode is a signed 8-bit offset. If the condition is true, PC = PC + offset (where PC already points to the next instruction). To convert: if byte > 127, offset = byte - 256.

### Jumps and Subroutines

| Opcode | Mnemonic | Mode | Notes |
|--------|----------|------|-------|
| $4C | JMP abs | Absolute | PC = address |
| $20 | JSR abs | Absolute | Push (PC-1) high then low byte onto stack, PC = address |
| $60 | RTS | Implied | Pull low then high byte from stack, PC = pulled address + 1 |

### Stack

| Opcode | Mnemonic | Notes |
|--------|----------|-------|
| $48 | PHA | Push A onto stack. SP decrements. |
| $68 | PLA | Pull from stack into A. SP increments. N, Z set. |
| $08 | PHP | Push P (status) onto stack. SP decrements. |
| $28 | PLP | Pull from stack into P. SP increments. All flags set from pulled value. |

Stack push: write to $0100+SP, then SP = SP - 1.
Stack pull: SP = SP + 1, then read from $0100+SP.

### Register Transfers

| Opcode | Mnemonic | Flags |
|--------|----------|-------|
| $AA | TAX | N, Z |
| $A8 | TAY | N, Z |
| $8A | TXA | N, Z |
| $98 | TYA | N, Z |
| $BA | TSX | N, Z |
| $9A | TXS | — |

### Flag Operations

| Opcode | Mnemonic | Effect |
|--------|----------|--------|
| $18 | CLC | C = 0 |
| $38 | SEC | C = 1 |
| $58 | CLI | I = 0 |
| $78 | SEI | I = 1 |
| $B8 | CLV | V = 0 |

### Miscellaneous

| Opcode | Mnemonic | Effect |
|--------|----------|--------|
| $EA | NOP | No operation |
| $00 | BRK | Halt execution (in this emulator, signals end of program) |

## Flag Computation Rules

**N (Negative)**: Set to bit 7 of the result. `N = (result >> 7) & 1`.

**Z (Zero)**: Set if result is zero. `Z = (result == 0) ? 1 : 0`.

**C (Carry)**:
- After ADC: `C = 1` if unsigned result > 255.
- After SBC: `C = 1` if unsigned result >= 0 (no borrow). Equivalently, the carry output of `A + ~operand + C_in`.
- After CMP/CPX/CPY: `C = 1` if register >= operand.
- After ASL/ROL: old bit 7.
- After LSR/ROR: old bit 0.

**V (Overflow)**: Only set by ADC and SBC. Set when the sign of the result is wrong given the signs of the inputs:
- `V = ((A ^ result) & (operand ^ result) & $80) != 0` for ADC.
- `V = ((A ^ result) & (~operand ^ result) & $80) != 0` for SBC.

All results are masked to 8 bits (`& $FF`) before being stored.

## Two's Complement Reference

For branch offsets and signed interpretation:
- If byte <= 127 ($7F): value is positive (0 to +127)
- If byte >= 128 ($80): value is negative (byte - 256, giving -128 to -1)

Example: offset byte $FC = 252 decimal = 252 - 256 = -4.

## Output Format

When execution halts, print:

1. **Output memory** ($0200–$02FF) — only non-zero bytes, shown as: `$02xx: $yy (decimal)`. If output memory contains what looks like ASCII, also show the character.

2. **Final CPU state**:
```
A=$xx X=$xx Y=$xx SP=$xx PC=$xxxx
NV-BDIZC = xxxxxxxx
```

3. **Summary**: instructions executed (count), any errors encountered.

## Critical Rules

1. **Do ALL arithmetic yourself.** No Python, no tools, no shortcuts. Work through each addition, subtraction, and comparison step by step. Show intermediate results for multi-byte or carry-dependent operations.
2. **Track state after every instruction.** This catches flag and addressing errors immediately.
3. **All values are unsigned 8-bit (0–255) unless interpreting as signed for branches or overflow detection.**
4. **Little-endian byte order.** In a 16-bit address stored as two bytes, the low byte comes first. `$4C 00 06` means JMP $0600 (low=$00, high=$06).
5. **If an unknown opcode is encountered, report the error and HALT.**
6. **No tool calls.** The entire emulation happens in your reasoning. Output goes directly in your reply.
7. **Maximum 500 instructions.** If execution exceeds 500 instructions without halting, stop and report "execution limit reached" with current state.
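
That is the end of run-6502.md. To cross-check the ADC and SBC flag rules from the prompt outside the LLM, a few lines of conventional code are enough. The following Python sketch is an assumption-laden reference, not part of the experiment: the function names `adc` and `sbc` are made up here, and only binary mode is modeled, matching the prompt's decision to skip decimal mode.

```python
def adc(a: int, operand: int, carry_in: int) -> tuple[int, dict[str, int]]:
    """Binary-mode ADC on 8-bit inputs; returns (new A, flags)."""
    total = a + operand + carry_in
    result = total & 0xFF  # results are masked to 8 bits
    flags = {
        "C": int(total > 0xFF),  # unsigned overflow
        "V": int(((a ^ result) & (operand ^ result) & 0x80) != 0),  # signed overflow
        "N": (result >> 7) & 1,  # bit 7 of the result
        "Z": int(result == 0),
    }
    return result, flags

def sbc(a: int, operand: int, carry_in: int) -> tuple[int, dict[str, int]]:
    """SBC expressed as ADC of the one's complement of the operand."""
    return adc(a, operand ^ 0xFF, carry_in)

# $7F + $01 overflows the signed range: result $80 with V and N set.
print(adc(0x7F, 0x01, 0))
# $03 - $05 with carry set (no pending borrow): result $FE, C cleared (borrow).
print(sbc(0x03, 0x05, 1))
```

Expressing SBC through ADC mirrors the prompt's own note that `A - operand - (1-C)` is equivalent to `A + ~operand + C`, so the carry and overflow rules only need to be written once.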

Running it with OpenCode

To run our 6502 emulator, we set up OpenCode and ask it to use the GLM-5.1 model running on Grunden.ai. GLM-5.1 is a state-of-the-art model with a 200k context that Grunden.ai runs on NVIDIA H200 servers.

We ask OpenCode to run a set of 6502 programs, provided as hex code. The programs increase in complexity: the first one adds two numbers, the second writes memory in a loop, and the third computes the first few values of the Fibonacci sequence. Each program ends with a BRK instruction. The three programs are below.

add-two.hex
; Add two numbers: 3 + 5 = 8, store result in $0200
; LDA #$03    → A9 03
; CLC         → 18
; ADC #$05    → 69 05
; STA $0200   → 8D 00 02
; BRK         → 00
;
; Expected: $0200 = $08 (8)
A9 03 18 69 05 8D 00 02 00

count-loop.hex
; Count 1 to 10, storing each value in $0200-$0209
; Uses X as counter (1-10) and Y as output index (0-9)
;
;       LDX #$01      → A2 01       ; X = 1 (counter)
;       LDY #$00      → A0 00       ; Y = 0 (output index)
; loop: TXA           → 8A          ; A = X
;       STA $0200,Y   → 99 00 02    ; output[Y] = A
;       INX           → E8          ; X++
;       INY           → C8          ; Y++
;       CPX #$0B      → E0 0B       ; compare X to 11
;       BNE loop      → D0 F6       ; if X != 11, branch back (-10)
;       BRK           → 00
;
; Expected: $0200-$0209 = 01 02 03 04 05 06 07 08 09 0A
A2 01 A0 00 8A 99 00 02 E8 C8 E0 0B D0 F6 00
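
The branch offset in the count-loop listing can be checked by hand. As a small sketch, assuming the program is loaded at $0600 as the prompt specifies, the BNE opcode sits at $060C. The `branch_target` helper below is a hypothetical name used only for this illustration.

```python
def branch_target(branch_addr: int, offset_byte: int) -> int:
    """Target address of a taken two-byte relative branch at branch_addr."""
    # Interpret the offset byte as a signed 8-bit value.
    offset = offset_byte - 256 if offset_byte > 127 else offset_byte
    next_pc = branch_addr + 2  # PC after fetching the opcode and the offset byte
    return (next_pc + offset) & 0xFFFF

# BNE at $060C with offset byte $F6 (-10) lands on the TXA at $0604.
print(f"${branch_target(0x060C, 0xF6):04X}")
```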

fibonacci.hex
; Compute first 10 Fibonacci numbers, store in $0200-$0209
; fib(0)=1, fib(1)=1, fib(2)=2, fib(3)=3, fib(4)=5, ...
;
; Zero page usage:
;   $00 = previous fib number
;   $01 = current fib number
;   $02 = temp (for swap)
;   $03 = counter (counts down from 8)
;
;       LDA #$01      → A9 01       ; A = 1
;       STA $00       → 85 00       ; prev = 1
;       STA $01       → 85 01       ; curr = 1
;       STA $0200     → 8D 00 02    ; output[0] = 1
;       STA $0201     → 8D 01 02    ; output[1] = 1
;       LDX #$02      → A2 02       ; X = 2 (output index)
;       LDA #$08      → A9 08       ; A = 8 (remaining count)
;       STA $03       → 85 03       ; counter = 8
; loop: LDA $01       → A5 01       ; A = curr
;       STA $02       → 85 02       ; temp = curr
;       CLC           → 18
;       ADC $00       → 65 00       ; A = curr + prev
;       STA $01       → 85 01       ; curr = curr + prev
;       STA $0200,X   → 9D 00 02    ; output[X] = new curr
;       LDA $02       → A5 02       ; A = temp (old curr)
;       STA $00       → 85 00       ; prev = old curr
;       INX           → E8          ; X++
;       DEC $03       → C6 03       ; counter--
;       BNE loop      → D0 EB       ; if counter != 0, loop (-21)
;       BRK           → 00
;
; Expected: $0200-$0209 = 01 01 02 03 05 08 0D 15 22 37
;           (1, 1, 2, 3, 5, 8, 13, 21, 34, 55)
A9 01 85 00 85 01 8D 00 02 8D 01 02 A2 02 A9 08 85 03 A5 01 85 02 18 65 00 85 01 9D 00 02 A5 02 85 00 E8 C6 03 D0 EB 00
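
For comparison with the LLM runs, the expected outputs and instruction counts of the three programs can be reproduced with a conventional emulator. The Python sketch below is a deliberately tiny reference, not the Markdown emulator: it implements only the opcodes these three programs use, does not track the V flag or the stack, and the `run` function and its parameters are names invented for this illustration. BRK is counted as an executed instruction.

```python
def run(hex_dump: str, origin: int = 0x0600, limit: int = 500):
    """Run a hex program; return (bytes at $0200-$0209, instructions executed)."""
    mem = {origin + i: b for i, b in enumerate(bytes.fromhex(hex_dump))}
    reg = {"A": 0, "X": 0, "Y": 0}
    flag = {"C": 0, "Z": 0, "N": 0}
    pc, count = origin, 0

    def read(addr):
        return mem.get(addr & 0xFFFF, 0)  # unwritten memory reads as zero

    def set_nz(value):
        value &= 0xFF
        flag["Z"] = int(value == 0)
        flag["N"] = value >> 7
        return value

    while count < limit:
        op, lo, hi = read(pc), read(pc + 1), read(pc + 2)
        addr16 = lo | (hi << 8)  # little-endian absolute address
        count += 1
        size = 2  # most instructions used here are two bytes long
        if op == 0x00:  # BRK: halt
            break
        if op == 0xA9:  # LDA #imm
            reg["A"] = set_nz(lo)
        elif op == 0xA2:  # LDX #imm
            reg["X"] = set_nz(lo)
        elif op == 0xA0:  # LDY #imm
            reg["Y"] = set_nz(lo)
        elif op == 0xA5:  # LDA zp
            reg["A"] = set_nz(read(lo))
        elif op == 0x85:  # STA zp
            mem[lo] = reg["A"]
        elif op in (0x8D, 0x9D, 0x99):  # STA abs / abs,X / abs,Y
            index = {0x8D: 0, 0x9D: reg["X"], 0x99: reg["Y"]}[op]
            mem[(addr16 + index) & 0xFFFF] = reg["A"]
            size = 3
        elif op == 0x8A:  # TXA
            reg["A"] = set_nz(reg["X"])
            size = 1
        elif op == 0xE8:  # INX
            reg["X"] = set_nz(reg["X"] + 1)
            size = 1
        elif op == 0xC8:  # INY
            reg["Y"] = set_nz(reg["Y"] + 1)
            size = 1
        elif op == 0xC6:  # DEC zp
            mem[lo] = set_nz(read(lo) - 1)
        elif op == 0x18:  # CLC
            flag["C"] = 0
            size = 1
        elif op in (0x69, 0x65):  # ADC #imm / ADC zp (V flag not tracked)
            total = reg["A"] + (lo if op == 0x69 else read(lo)) + flag["C"]
            flag["C"] = int(total > 0xFF)
            reg["A"] = set_nz(total)
        elif op == 0xE0:  # CPX #imm
            flag["C"] = int(reg["X"] >= lo)
            set_nz(reg["X"] - lo)
        elif op == 0xD0:  # BNE rel: add the signed offset if Z is clear
            if flag["Z"] == 0:
                size += lo - 256 if lo > 127 else lo
        else:
            raise ValueError(f"unsupported opcode ${op:02X} at ${pc:04X}")
        pc = (pc + size) & 0xFFFF
    return [read(a) for a in range(0x0200, 0x020A)], count

PROGRAMS = {
    "add-two": "A9 03 18 69 05 8D 00 02 00",
    "count-loop": "A2 01 A0 00 8A 99 00 02 E8 C8 E0 0B D0 F6 00",
    "fibonacci": "A9 01 85 00 85 01 8D 00 02 8D 01 02 A2 02 A9 08 85 03 A5 01 "
                 "85 02 18 65 00 85 01 9D 00 02 A5 02 85 00 E8 C6 03 D0 EB 00",
}
for name, dump in PROGRAMS.items():
    output, executed = run(dump)
    print(name, executed, " ".join(f"{b:02X}" for b in output))
```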

We let OpenCode run the three programs and ask it to measure the runtime of each. The results are in the table below.

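| Program | Instr. | Time (s) | Time/instr (s) | Output |
|---------|--------|----------|----------------|--------|
| add-two | 5 | 5.51 | 1.10 | $0200=08 |
| count-loop | 63 | 25.64 | 0.41 | $0200-$0209=01–0A |
| fibonacci | 97 | 33.65 | 0.35 | $0200-$0209=01,01,02,03,05,08,0D,15,22,37 |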

The first thing we see is that all three programs produce the expected output. The second thing we see is that this is very slow: a typical 6502 running at 1 MHz would execute some 500k instructions per second. Our 6502 emulator runs 1-3 instructions per second, an emulation speed of roughly 0.0002% to 0.0006% of the real chip.
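
The throughput figures follow directly from the measurements. As a quick sketch, assuming the measured times are in seconds and taking the 500k instructions per second estimate at face value, the arithmetic looks like this (the `throughput` helper and the constant name are made up for this illustration):

```python
REAL_6502_IPS = 500_000  # rough estimate for a 1 MHz 6502, as quoted in the text

# (instructions executed, measured time in seconds) from the OpenCode runs
RUNS = {
    "add-two": (5, 5.51),
    "count-loop": (63, 25.64),
    "fibonacci": (97, 33.65),
}

def throughput(instructions: int, seconds: float) -> tuple[float, float]:
    """Return (instructions per second, percent of the real chip's speed)."""
    ips = instructions / seconds
    return ips, 100 * ips / REAL_6502_IPS

for name, (instructions, seconds) in RUNS.items():
    ips, percent = throughput(instructions, seconds)
    print(f"{name}: {ips:.2f} instr/s, {percent:.5f}% of a 1 MHz 6502")
```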

Running in Claude Code

Running the three programs in Claude Code gives very similar results. The add-two run was slower, most likely because the system had to warm up before tackling the first program.

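| Program | Instr. | Time (s) | Time/instr (s) | Output |
|---------|--------|----------|----------------|--------|
| add-two | 5 | 22.45 | 4.49 | $0200=$08 |
| count-loop | 63 | 34.11 | 0.54 | $0200–$0209=01–0A |
| fibonacci | 97 | 33.5 | 0.35 | $0200–$0209=01,01,02,03,05,08,0D,15,22,37 |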

Again, performance obviously is not great, but performance was certainly not the intended target here. This was just a fun experiment to see how far we could take this idea.
