Computer Architecture Deep Dive
From transistors to the instruction pipeline — how a modern CPU actually executes code, and how that shapes software performance.
- Reading time: ~69 min (+32 min optional)
- Level mix: 9 intermediate · 3 advanced
Modern CPUs are far from simple: they reorder instructions, predict branches, cache data at multiple levels, and execute several operations simultaneously. Understanding this machinery explains why an algorithm that looks fast on paper can be slow in practice, and why cache misses often matter more than raw operation counts in real workloads.
This path builds the mental model in layers: start with what an instruction set is, move through the pipeline stages, then work through the memory hierarchy that dominates real-world performance.
Roadmap
What the CPU understands
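- Instruction Set Architecture (ISA)
The contract between a CPU and the programs that run on it: the menu of operations a processor can perform, along with the registers, data types, and addressing modes those operations use.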
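- Registers
A handful of tiny, extremely fast storage cells inside the CPU. Most instructions read their operands from registers and write their results back to them; on load/store (RISC-style) designs, only dedicated load and store instructions touch main memory.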
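To make the idea of an instruction set and its registers concrete, here is a minimal Python sketch of a made-up machine. The opcodes (`LOAD`, `STORE`, `ADD`, `BNEZ`), the four registers, and the `run` function are all hypothetical, chosen to mimic a load/store design where arithmetic only reads and writes registers; it is not modeled on any real ISA.

```python
# A hypothetical four-register, load/store instruction set, invented for illustration.
def run(program, memory):
    """Execute a list of (opcode, *operands) tuples and return (registers, memory)."""
    regs = [0] * 4   # general-purpose registers r0..r3
    pc = 0           # program counter: index of the next instruction to execute
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOAD":        # LOAD rd, addr    : rd <- memory[addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE rs, addr   : memory[addr] <- rs
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, ra, rb   : rd <- ra + rb (registers only)
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "BNEZ":      # BNEZ rs, target  : jump to target if rs != 0
            rs, target = args
            if regs[rs] != 0:
                pc = target
                continue
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return regs, memory


# Sum 3 + 2 + 1 with a countdown loop; memory holds [n, -1, result].
program = [
    ("LOAD", 0, 0),      # r0 <- n
    ("LOAD", 1, 1),      # r1 <- -1
    ("ADD", 2, 2, 0),    # r2 <- r2 + r0   (loop body starts at index 2)
    ("ADD", 0, 0, 1),    # r0 <- r0 - 1
    ("BNEZ", 0, 2),      # repeat while r0 != 0
    ("STORE", 2, 2),     # result <- r2
]
regs, mem = run(program, [3, -1, 0])
print(mem[2])  # prints: 6
```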
- RISC vs CISC (optional)
Two philosophies of CPU instruction set design — Reduced Instruction Set Computing (small, simple, fast) versus Complex Instruction Set Computing (rich, do-more-per-instruction).
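A small sketch of the difference, assuming a hypothetical CISC-style `ADDM dst, src` instruction that adds one memory location into another. On a load/store (RISC-style) machine the same work takes four simpler instructions. The opcode names and helper functions are invented for illustration.

```python
# Hypothetical opcodes, invented for illustration; not a real ISA.
def expand_addm(dst, src):
    """Rewrite a CISC-style 'ADDM dst, src' (memory[dst] += memory[src])
    as an equivalent RISC-style load/store sequence using registers r0 and r1."""
    return [
        ("LOAD", 0, dst),    # r0 <- memory[dst]
        ("LOAD", 1, src),    # r1 <- memory[src]
        ("ADD", 0, 0, 1),    # r0 <- r0 + r1 (arithmetic only on registers)
        ("STORE", 0, dst),   # memory[dst] <- r0
    ]


def execute(ops, memory):
    """Run a straight-line list of load/store-style ops against a dict 'memory'."""
    regs = [0, 0]
    for op, *args in ops:
        if op == "LOAD":
            regs[args[0]] = memory[args[1]]
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


mem = {0x10: 40, 0x20: 2}
ops = expand_addm(0x10, 0x20)
print(len(ops), execute(ops, mem)[0x10])  # prints: 4 42
```

The CISC form is more compact in code size; the RISC form uses uniform, simple steps that are easier to pipeline. Modern x86 processors blend the two by decoding complex instructions into simpler micro-operations internally.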
How instructions execute
- Pipelining
An assembly-line technique for executing instructions: split each instruction into stages (such as fetch, decode, and execute) and overlap several instructions in flight at once, each in a different stage.
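Under the simplifying assumptions that every stage takes one cycle and no instruction ever waits on another, the overlap can be modeled directly. This sketch uses the classic five-stage textbook pipeline (fetch, decode, execute, memory access, write-back); the helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # classic 5-stage textbook pipeline


def pipeline_diagram(n_instructions, stages=STAGES):
    """Ideal pipeline, no stalls: instruction i occupies stage s during cycle i + s.
    Returns (total_cycles, rows) where rows[i][c] is a stage name or '.'."""
    k = len(stages)
    total_cycles = n_instructions + k - 1   # fill the pipe once, then one per cycle
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return total_cycles, rows


def unpipelined_cycles(n_instructions, k=len(STAGES)):
    """Without overlap, each instruction takes all k cycles before the next starts."""
    return n_instructions * k


total, rows = pipeline_diagram(4)
for i, row in enumerate(rows):
    print(f"i{i}: " + " ".join(f"{cell:>3}" for cell in row))
print(total, "cycles pipelined vs", unpipelined_cycles(4), "unpipelined")
```

With k stages and n instructions the ideal pipeline needs n + k - 1 cycles instead of n × k, so throughput approaches one instruction per cycle as n grows. Real pipelines lose some of that to stalls caused by data dependencies and branches.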
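- Branch Prediction
A CPU technique that guesses the outcome of a branch (such as an if statement or a loop condition) before it is actually known, so the pipeline can keep fetching and executing along the predicted path instead of stalling. If the guess was wrong, the speculative work is discarded and execution resumes on the correct path.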
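One of the simplest prediction schemes is a 2-bit saturating counter per branch. The sketch below compares it with a naive 1-bit "same as last time" predictor on a loop branch; it illustrates the idea only and is not how any particular CPU predicts (modern cores use far more elaborate predictors that track branch history).

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken.
    A single surprise in a 'strong' state only weakens the prediction."""

    def __init__(self, state=2):    # start in 'weakly taken' (an arbitrary choice)
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


class OneBitPredictor:
    """Predicts that the branch will do whatever it did last time."""

    def __init__(self, last=True):
        self.last = last

    def predict(self):
        return self.last

    def update(self, taken):
        self.last = taken


def accuracy(predictor, outcomes):
    """Fraction of branch outcomes the predictor guessed correctly."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken 9 times and then falls through, with the loop run 10 times.
outcomes = ([True] * 9 + [False]) * 10
print(accuracy(OneBitPredictor(), outcomes))   # prints: 0.81
print(accuracy(TwoBitPredictor(), outcomes))   # prints: 0.9
```

After the first pass, the 1-bit predictor mispredicts twice per loop (at the exit and again on re-entry), while the 2-bit counter only mispredicts at the exit, because one surprise is not enough to flip a strongly biased prediction.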
- Out-of-Order Execution
A CPU technique that executes instructions as soon as their inputs are ready rather than in strict program order, keeping execution units busy while slow operations such as memory loads complete.
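The benefit can be approximated with a toy scheduler. In this sketch each cycle issues one instruction whose inputs are ready; the in-order variant must stop at the first stalled instruction, while the out-of-order variant may issue younger, independent ones. The latencies are made up, destination registers are assumed unique (as if already renamed), and real cores' reorder buffers, execution-unit limits, and in-order retirement are ignored.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str          # register written (assumed unique, as if already renamed)
    srcs: tuple        # registers read
    latency: int       # cycles until the result is available


def schedule(program, in_order, width=1):
    """Return the cycle at which the last result becomes available.
    Each cycle, issue up to `width` instructions whose inputs are ready.
    in_order=True stops at the first stalled instruction; in_order=False
    lets younger, independent instructions issue past it."""
    ready_at = {}              # register -> cycle its value becomes available
    pending = list(program)    # not yet issued, in program order
    cycle = finish = 0
    while pending:
        issued = []
        for i, ins in enumerate(pending):
            if len(issued) == width:
                break
            # Inputs are not ready if an older, unissued instruction produces one,
            # or if its producer has issued but not finished yet.
            blocked = any(p.dest in ins.srcs for p in pending[:i])
            if not blocked and all(ready_at.get(s, 0) <= cycle for s in ins.srcs):
                issued.append(ins)
            elif in_order:
                break
        for ins in issued:
            ready_at[ins.dest] = cycle + ins.latency
            finish = max(finish, cycle + ins.latency)
            pending.remove(ins)
        cycle += 1
    return finish


program = [
    Instr("load", "r1", ("r0",), 10),        # slow load (e.g. a cache miss)
    Instr("add", "r2", ("r1",), 1),          # depends on the load
    Instr("mul", "r3", ("r4", "r5"), 3),     # independent
    Instr("sub", "r6", ("r4", "r5"), 1),     # independent
]
print("in-order:", schedule(program, in_order=True))       # prints: in-order: 14
print("out-of-order:", schedule(program, in_order=False))  # prints: out-of-order: 11
```

Here the independent `mul` and `sub` run in the shadow of the slow load, so the same four instructions finish 3 cycles sooner.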
- SIMD (optional)
Single Instruction, Multiple Data — a CPU feature that applies one operation to many data elements at once, accelerating the vector and array math common in graphics, media, and machine learning.
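Python does not expose SIMD instructions directly, so this is only a conceptual model that counts operations: each "vector instruction" handles `LANES` elements, and leftover elements fall back to a scalar tail loop, a pattern compilers commonly generate when they vectorize loops. The lane count of 4 (four 32-bit floats in a 128-bit register) is an assumption.

```python
LANES = 4  # e.g. four 32-bit floats in one 128-bit vector register (assumed width)


def simd_add(a, b, lanes=LANES):
    """Conceptual model of SIMD addition. Each 'vector instruction' applies the
    same add to `lanes` elements; leftovers use a scalar tail loop.
    Returns (result, instruction_count)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0] * len(a)
    count = 0
    full = len(a) - len(a) % lanes          # elements covered by whole vectors
    for i in range(0, full, lanes):
        out[i:i + lanes] = [x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
        count += 1                          # one instruction for `lanes` elements
    for i in range(full, len(a)):
        out[i] = a[i] + b[i]                # scalar tail: one element per instruction
        count += 1
    return out, count


result, count = simd_add(list(range(10)), [1] * 10)
print(result, count)  # 10 elements in 4 "instructions" instead of 10
```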
Memory
- CPU Cache
Small, fast memory close to the CPU that holds recently used or soon-to-be-used data, hiding much of the latency of main memory.
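A direct-mapped cache simulator shows why access order matters. The configuration here (64 slots of 64-byte lines, 4 KiB total) is hypothetical and far simpler than a real set-associative L1, but the effect is the same in kind: both loops below read the same 4096 matrix elements, just in different orders.

```python
def simulate(addresses, line_size=64, num_lines=64):
    """Count (hits, misses) for a direct-mapped cache of num_lines * line_size bytes.
    Each memory block can live in exactly one cache slot."""
    tags = [None] * num_lines
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size      # which line-sized block holds this byte
        index = block % num_lines      # the single slot this block may occupy
        tag = block // num_lines       # tells apart blocks that share a slot
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # evict the previous occupant
    return hits, misses


N, ELEM = 64, 8   # 64x64 matrix of 8-byte values stored row by row (row-major)
row_order = [(r * N + c) * ELEM for r in range(N) for c in range(N)]
col_order = [(r * N + c) * ELEM for c in range(N) for r in range(N)]
print("row by row:      ", simulate(row_order))   # (hits, misses) = (3584, 512)
print("column by column:", simulate(col_order))   # (hits, misses) = (0, 4096)
```

Row-by-row traversal uses all eight 8-byte values in each fetched line, so only one access in eight misses. Column-by-column traversal jumps 512 bytes at a time; in this configuration each column's 64 rows compete for just 8 slots, so every line is evicted before its neighbors are used.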
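- Memory Hierarchy
The layered set of storage in a computer, from registers through caches and RAM to disk, where each level trades capacity for speed: the faster a level is, the smaller and more expensive per byte it tends to be.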
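The payoff of the hierarchy is usually summarized as average memory access time (AMAT): hit time plus miss rate times miss penalty, where the miss penalty is itself the AMAT of the next level down. The latencies and local miss rates below are hypothetical round numbers, in CPU cycles.

```python
def amat(levels, memory_latency):
    """Average memory access time for a multi-level hierarchy.
    levels: list of (hit_time, local_miss_rate) from fastest (L1) outward,
    where local_miss_rate is the fraction of accesses reaching that level that miss.
    Works from main memory inward: a miss at one level pays the AMAT of the next."""
    time = memory_latency
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical numbers, in CPU cycles, for illustration only.
hierarchy = [(4, 0.10), (12, 0.50), (40, 0.50)]   # L1, L2, L3
print(f"{amat(hierarchy, 200):.1f} cycles")        # prints: 12.2 cycles
```

Even with main memory 50 times slower than L1 in this example, the average access costs about 12 cycles, because most requests are served near the top of the hierarchy.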
- DRAM vs SRAM (optional)
The two main kinds of volatile memory. DRAM is dense and cheap but slower, and its cells must be refreshed periodically, so it is used for main memory; SRAM is fast and needs no refresh while powered, but it is bulky and expensive, so it is used for CPU caches.
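For a feel of what refreshing costs, here is a back-of-the-envelope calculation assuming DDR4-style parameters: every row must be refreshed within a 64 ms window, spread over 8192 refresh commands, and each command keeps the device busy for a refresh cycle time of roughly 350 ns (about right for an 8 Gb part). Treat the exact figures as assumptions.

```latex
t_{\mathrm{REFI}} = \frac{64\ \mathrm{ms}}{8192} = 7.8125\ \mu\mathrm{s},
\qquad
\frac{t_{\mathrm{RFC}}}{t_{\mathrm{REFI}}} \approx \frac{350\ \mathrm{ns}}{7812.5\ \mathrm{ns}} \approx 4.5\%
```

Under these assumptions, a few percent of the time the device is busy refreshing and cannot serve reads or writes, a cost SRAM never pays.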
- Virtual Memory
A technique that gives each process the illusion of a private, contiguous memory space, implemented with page tables that map virtual addresses to physical locations in RAM.
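The translation step can be sketched with a single-level page table, assuming 4 KiB pages and a Python dict standing in for the table. Real hardware (x86-64, for example) walks multi-level tables and caches recent translations in a TLB; the names here are hypothetical.

```python
PAGE_SIZE = 4096                          # 4 KiB pages (a common size; assumed here)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits address a byte within a page


class PageFault(Exception):
    """Raised when a virtual page has no mapping."""


def translate(vaddr, page_table):
    """Translate a virtual address with a single-level page table that maps
    virtual page number -> physical frame number."""
    vpn = vaddr >> OFFSET_BITS            # which virtual page
    offset = vaddr & (PAGE_SIZE - 1)      # position inside the page (unchanged)
    if vpn not in page_table:
        raise PageFault(hex(vaddr))       # the OS would handle this
    return (page_table[vpn] << OFFSET_BITS) | offset


# Two processes use the same virtual address, but it lands in different frames.
proc_a = {0x4: 0x1A}
proc_b = {0x4: 0x7F}
print(hex(translate(0x4123, proc_a)), hex(translate(0x4123, proc_b)))  # prints: 0x1a123 0x7f123
```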
I/O
- DMA (optional)
Direct Memory Access — hardware that lets devices transfer data to and from main memory without routing every byte through the CPU, freeing the processor for other work.