Scaling a Web Service
How a single web app survives going from "100 users" to "100 million" — caching, replication, sharding, queues, edge networks, and the orchestration glue.
- Reading time: ~98 min (+24 min optional)
- Level mix: 4 beginner · 12 intermediate · 2 advanced
Most “scalable web app” advice is folk wisdom. This path teaches the underlying mental model instead: where the bottlenecks actually appear as load grows, what each scaling tool buys you, and what it costs.
Start with single-machine speed (indexes, query plans, caches). Then move to multiple machines (replication, sharding, CDN). Finally, work through the architecture and infrastructure decisions (microservices, containers, CAP, eventual consistency) that distinguish “works at scale” from “fails subtly in production”.
Roadmap
Start here
An organised collection of data with a system around it that lets you store, retrieve, update, and query it efficiently.
A declarative language for querying and manipulating relational databases — you say what you want, the database figures out how.
Auxiliary data structures a database maintains so it can answer queries without scanning every row — the single biggest knob for query performance.
Faster queries
The step-by-step program a database derives from your SQL query before executing it — the lever you use to diagnose and fix slow queries.
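As a rough illustration of how an index changes a query plan, here is a minimal sketch using Python's built-in sqlite3 module. The users table, the idx_users_email index name, and the plan helper are made up for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# In-memory database with a hypothetical "users" table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


query = "SELECT name FROM users WHERE email = ?"
args = ("user42@example.com",)

before = plan(query, args)  # no index on email yet: expect a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, args)   # with the index: expect an index search

print("before:", before)
print("after: ", after)
```

Reading the plan before and after a schema change is usually the quickest way to confirm that a new index is actually being used by the query you care about.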
Small, fast memory close to the CPU that keeps recently or about-to-be-used data, hiding the slowness of main memory.
A database that maps keys to values, typically with O(1) average-case lookup: the simplest possible data store, and one of the most widely deployed.
More machines
Keeping copies of the same data on multiple machines for availability, durability, and read scaling.
Splitting data across multiple machines by some key, so the working set per machine — and the request load — stays manageable as the system grows.
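One common way to pick a shard is to hash the key and take the result modulo the number of shards. The sketch below assumes four in-memory dictionaries standing in for four machines; shard_for, put, and get are hypothetical helper names, not part of any particular database.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used here to get a stable mapping.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Four dictionaries stand in for four separate machines (an assumption).
shards = [dict() for _ in range(4)]


def put(key: str, value: str) -> None:
    shards[shard_for(key, len(shards))][key] = value


def get(key: str):
    return shards[shard_for(key, len(shards))].get(key)


for i in range(1000):
    put(f"user:{i}", f"profile-{i}")

print(get("user:42"))
print([len(s) for s in shards])  # keys per shard, roughly even
```

A known cost of plain modulo hashing is that changing the number of shards remaps most keys. Schemes such as consistent hashing exist to reduce that movement.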
Architecture
An architectural style that builds an application as many small, independently-deployable services, each owning its own data and talking over a network.
Closer to users
A global network of edge servers that caches and serves content close to users — making the web feel fast from anywhere on Earth.
Operating it
A lightweight, isolated unit of computing — a process running with its own filesystem and namespaces, packaged with its dependencies.
- Kubernetes (Optional)
An open-source container orchestrator that manages where containers run, how they're scaled, how they're networked, and how they're restarted when things break.
A durable middleman that lets services send and receive messages asynchronously — decoupling producers from consumers, smoothing load, and surviving outages.
- Serverless (Optional)
A deployment model where you write functions or small services and the platform handles provisioning, scaling, and idling them — you pay per-request, not per-server.
- Service Mesh (Optional)
An infrastructure layer that handles service-to-service communication for microservices — routing, retries, encryption, and observability — usually via sidecar proxies, so application code doesn't have to.
- Cloud Provider (Optional)
A company that rents computing — servers, storage, networking, and managed services — on demand over the internet, so you don't have to build and run your own data center.
The trade-offs
A foundational result of distributed systems: under a network partition, a system can be either Consistent or Available, not both.
A consistency model where replicas of data are temporarily allowed to disagree, with the guarantee that they will converge once writes stop.
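To make "temporarily disagree, then converge" concrete, here is a toy sketch of two replicas that resolve conflicts with a last-write-wins rule. This is only one possible convergence strategy, chosen for brevity; the Replica class, the integer timestamps, and the merge step are all assumptions for illustration, and real systems have to deal with clock skew and lost updates that this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # key -> (timestamp, value); timestamps are plain integers in this sketch
    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        """Keep the write with the highest (timestamp, value) pair."""
        current = self.data.get(key)
        # Ties on timestamp are broken by comparing values, so every
        # replica picks the same winner regardless of arrival order.
        if current is None or (ts, value) > current:
            self.data[key] = (ts, value)

    def merge(self, other: "Replica") -> None:
        """Pull the other replica's state (a stand-in for replication)."""
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)

    def read(self, key: str):
        entry = self.data.get(key)
        return entry[1] if entry else None


a, b = Replica(), Replica()
a.write("cart", "apple", ts=1)   # a client writes to replica a
b.write("cart", "banana", ts=2)  # a later write lands on replica b

diverged = (a.read("cart"), b.read("cart"))  # replicas disagree for now

# Writes have stopped; the replicas exchange state in both directions.
a.merge(b)
b.merge(a)
converged = (a.read("cart"), b.read("cart"))

print(diverged)
print(converged)
```

Because the merge rule is deterministic and independent of the order in which updates arrive, both replicas end up with the same value once they have exchanged state.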