Encoding Programs in QR Codes
QR codes are usually used for storing small amounts of textual data such as web links. However, they are capable of storing other arbitrary data. Here a small computer architecture is designed around a compact encoding of computer instructions in the textual data of QR codes. To experiement, I explore a small instruction-set, and create a small development environment (assembler & virtual machine) to experiment with some simple programs.
Background
A while ago I stumbled upon two interesting videos: “Can you fit a whole game into a QR code?” & “Can you fit an Entire Windows App inside a QRCode?”. It was pretty cool to see some of the different tricks and techniques to shrink a binary executable file to meet the modest size constraints (~max 3Kb binary data) of a QR code.
It got me thinking a bit – these programs were for modern computer architectures. What could it look like to build instead a computer architecture around the QR code? Instead of storing arbitrary data, could we exploit properties of the format to reach a more optimal encoding of instructions in the QR code?
Exploring QR Code Encoding Modes
The capacity of QR code is determined by the matrix size (from 21x21 “Version 1” to 177x177 “Version 40”), the desired level of level of error correction (L, M, Q, H), and the mode (e.g. only numeric data, alphanumeric, binary etc). The maximum capacity for a Version 40-L code by mode:
Mode | Max chars | Bits/char | Charset Size |
---|---|---|---|
Numeric0123456789 |
7,980 | 3 1/3 | 10 |
Alphanumeric0123456789 ABCDEFGHIJKLMNOPQRSTUVWXYZ $ %*+-./: |
4,296 | 5 1/2 | 44 |
Binary/byte | 2,953 | 8 | 256 |
Kanji/kana | 1,817 | 13 | 8,192 |
For many URLs, the alphanumeric subset is sufficient. Limiting characters to this subset allows data to be encoded using fewer bits per character.
An instruction set designed around this gamut (limited to 44 instructions) could fit more compactly than bytecode, or other arbitrary binary data like programs for modern computing architectures.
For example the ARM A64 ISA has fixed width 32-bit instructions, whereas in a “QuaRrchitecture”, each could be encoded in 5.5 bits. In the space of 10 single A64 instructions, you could encode 58.
Of course, given such a limited instruction set, you might need in multiple instructions to match the functionality of a single A64 instruction.
Computer Architecture
We start with a stack machine with a Harvard architecture (separate program & working memory). Stack machines typically have a higher code density. This as many instructions require no arguments as working on the stack. The architecture in many ways resembles Java bytecode.
Operations and (non-program) memory accesses are applied on “words”. This word-size is not necessarily aligned with the instruction encoding size, and is implementation defined.
Instruction set Overview
A more full description of the operations and assembly directives can be found in this WIP document.
Type | Ops |
---|---|
Arithmetic (8) | inc , dec , neg , add , sub , mul , div , mod |
Bitwise (6) | not , shl , shr , xor , or , and |
Conditional (4) | z zero, nz not-zero, m minus, p plus |
Control Flow (6) | ret , jmp (2), call (2), loop |
Load/Store (8) | ldi (3), ld , st , ldz , stz , ldp |
Stack (9) | dup , drop , over , swap , nip , rot , rtop , tos , tor |
Misc (3) | nop , hlt , sys |
Total (44) |
A disassembly of a simple program O5+%/2N+.
(counts down from 5), showing how instructions are encoded:
ldi 5 # O5 tor # + l: rtop # % sys OUTNUM # /2 loop l # N+ hlt # . |
![]() |
The ISA takes inspiration also from different sources:
- Forth language.
rtop
tos
andtor
are synonymous with Forth’sR@
,R>
and>R
- conditionals similar to conditionally executed instructions in older ARM ISAs. Any op can be made conditional. This condition-prefix encoding saves opcode space by not requiring both unconditional and conditional variants.
loop
inspired by Z80djnz
/ x86loop
. If top of return stack is > 0, it is decremented andrel1
branch taken. Otherwise, top of return stack is dropped. This makes it convinient to write loopsldz
andstz
inspired by the 6502 zero-page addresing mode, this allows easy access to first page (44 words) of memory, to be used as registers/temporary storage.- The
sys
system call is a mechanism to allow calling external functions by a numeric identifier. These are a cross between operating system calls & library functions as found in other langauges. They allow IO to the text output, and primitive graphic display.
Numeric Encoding
While most ops operate directly on the stack, some take numeric arguments – int literals in the case of ldi
, or absolute / relative offsets as in jmp
, call
. Unsigned numbers are encoded in base-44. Using c
characters, a signed number, n
in range \(-44^c/2 < n < 44^c/2\) is encoded by:
This “wrap-around” behaviour is similar to that of twos-complement in base-2.
Programming
An assembler & virtual machine are available in this 90s-themed Web IDE (source code repository). A USB barcode scanner is used to scan and enter programs into the virtual machine (alternatively the textual program “binary” can be copied from assembler to VM, without the QR code).
Below are some relatively simple programs with assembly source linked. Size is number of charaters, % of total capacity, number of bits when encoded, and bits in equivalent 8-bit bytes:
QR | Program | Size |
---|---|---|
![]() |
Draw Random DotsP:0W/5Y/5/3K30 |
14 chars (0.3%) 77 bits ~10 bytes |
![]() |
Generate Prime NumbersO2W/2MP00WP5B4H.WRFK80K20WWP5B4HK/0WWSY3KQ0XI |
45 chars (1%) 248 bits ~31 bytes |
![]() |
Generate Maze PatternP02MU0.O25PM23V3IWP:07ZP:06/3IWWSWMH0O3+O4/5MP1FKI1ZMH0MU0NTKM1XXNMXXI0O47YYM70YYZM70WWP:07EKI2WP:0W56FKI2RIXO1I0:10/::0 |
120 chars (2.8%) 660 bits ~83 bytes |
Keeping track of the stack can be a little tricky. Sometimes easier to work through a program using pen and paper:
Thoughts & Future
The examples here are still pretty small & simple programs. At this scale, a big obstacle to encoding programs for existing architectures is more the overhead of the executable binary format (e.g. ELF, Mach-O) rather than the executable code itself. QuaRchitecture programs are reminiscent of an old-school COM file executable – no headers or metadata encoded with the program.
It would be interesting to write some more complex programs and compare the executable code (without binary metadata / executable file overhead) against other architectures.
The approach of encoding the programs takes each character to be a low-level operation, similar to bytecode. It could be interesting to assign higher-level operations to each encoded character. For example, the APL language allows dense programming using a limited set of operators working on higher-level arrays.
The primitive graphical output resembles a little the QR code matrix.. perhaps a fun challenge is to write a quine – a program that when scanned and executed, generates its own QR code as output.