Mutex
Also known as: mutual exclusion lock, lock
A synchronization primitive that ensures only one thread at a time can hold it — the basic tool for protecting shared state from data races.
- Primary domain
- Concurrency & Parallelism
- Sub-category
- Concurrent & Parallel Computing
In simple terms
A mutex (“mutual exclusion”) is a lock. When a thread wants to touch shared data, it acquires the mutex; when it’s done, it releases it. While one thread holds the mutex, any other thread that tries to acquire it has to wait. Used correctly, this prevents two threads from corrupting shared state by writing to it at the same time.
The Visual Map
flowchart LR
T1["thread 1"] -->|"lock() — succeeds"| CS["critical section<br/>(shared data)"]
T2["thread 2"] -->|"lock() — blocks"| WQ["wait queue"]
T3["thread 3"] -->|"lock() — blocks"| WQ
CS -->|"unlock()"| WAKE["kernel wakes one waiter"]
WAKE --> WQ
WQ -.->|"next holder"| CS
One door, one key, a queue outside — that’s the entire mental model.
More detail
The mutex API is small: lock() (acquire, blocking if needed) and unlock() (release). Most languages also offer try_lock() (return immediately if held).
How a modern mutex works under the hood:
- Fast path — atomic compare-and-swap on a single memory word. If uncontended, the lock is acquired in nanoseconds without involving the kernel.
- Slow path — if the lock is held, the thread spins briefly (in case the holder is about to release) then makes a kernel syscall (
futexon Linux,WaitOnAddresson Windows) to sleep until woken. - Wake — when
unlock()is called and others are waiting, the kernel wakes one of them.
Variants and adjacent primitives:
- Recursive mutex — same thread can lock it multiple times; must unlock the same number of times.
- Read–write lock — many concurrent readers OR one exclusive writer.
- Spinlock — busy-loops instead of sleeping; only safe in very specific kernel contexts.
- Semaphore — counts permits; mutex is a semaphore with count 1.
- Condition variable — used with a mutex to wait for a condition to become true.
- Atomic operations — for the smallest cases (counters, flags), no mutex needed.
Mutex hazards:
- Deadlock — two threads each hold a lock the other wants.
- Lock contention — many threads queue on one popular lock; throughput collapses.
- Lock ordering — different code paths acquire locks in different orders, causing deadlocks intermittently.
- Forgetting to unlock — RAII (Rust’s
Drop, C++‘sstd::lock_guard, Python’swith) is the standard defence.
Almost every multithreaded program touches mutexes — even those that don’t use them directly, because the standard library does. Understanding the cost (contention is often the bottleneck in well-tuned services) and the failure modes (deadlock, priority inversion) is essential to writing correct concurrent code.
Under the Hood
The race, and the one-line cure:
import threading
balance = 0
lock = threading.Lock()
def deposit_unsafe(times):
global balance
for _ in range(times):
balance += 1 # read-modify-write: three steps, interruptible
def deposit_safe(times):
global balance
for _ in range(times):
with lock: # acquire ... release, even on exception
balance += 1
# run either version with several threads to compare
threads = [threading.Thread(target=deposit_safe, args=(100_000,))
for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(balance) # safe: exactly 400000, every run
with lock: is the load-bearing line: it makes the three-step balance += 1 (read, add, write) behave as one indivisible step. The with form also guarantees release if the body raises — the RAII defence against forgotten unlocks.
Engineering Trade-offs
- Coarse vs fine-grained locking. One lock around a whole subsystem is simple and deadlock-resistant but serialises everything — Python’s GIL is the cautionary tale. Per-object locks scale but multiply the ordering rules and deadlock surface. Start coarse; refine only where contention is measured.
- Sleep vs spin. A sleeping waiter frees the CPU but pays two context switches to nap and wake (~microseconds). A spinlock burns CPU but reacts in nanoseconds — right only when hold times are shorter than a context switch, which is why spinlocks live mostly inside kernels.
- Mutex vs read–write lock. RW locks let readers share — a win when reads vastly outnumber writes — but cost more per acquisition and can starve writers. Under mostly-write or short-critical-section workloads, a plain mutex is often faster.
- Locks vs atomics vs no sharing. For a lone counter or flag, an atomic beats a mutex. For complex invariants, a mutex beats clever atomics you’ll get wrong. Best of all is often restructuring so threads don’t share mutable state at all (channels, sharding per thread).
Real-world examples
- The Linux kernel uses thousands of mutexes; lockdep, a built-in lock validator, has prevented countless cross-locking bugs by reporting on inconsistent lock ordering.
- Python’s Global Interpreter Lock (GIL) is, mechanically, one giant mutex around the Python interpreter — the reason Python threads don’t parallelise CPU-bound work in CPython.
- Database engines use fine-grained mutexes (often per-page or per-row) to allow concurrent transactions without serialising the whole engine.
Common misconceptions
- “Locking is slow.” Uncontended locking is a few nanoseconds; contended locking can stall a thread for milliseconds. The trick is to design for low contention, not to avoid locks.
- “Lock everything to be safe.” Over-locking causes deadlocks and serialises code that could have run in parallel.
Try it yourself
Measure what contention actually costs — the same work, alone vs fighting over one lock:
python3 -c "
import threading, time
lock = threading.Lock()
N = 500_000
def hammer():
for _ in range(N):
with lock:
pass
t = time.perf_counter()
hammer() # 1 thread: uncontended fast path
solo = time.perf_counter() - t
threads = [threading.Thread(target=hammer) for _ in range(4)]
t = time.perf_counter()
[x.start() for x in threads]; [x.join() for x in threads]
contended = time.perf_counter() - t
print(f'uncontended: {solo:.2f}s 4 threads contending: {contended:.2f}s')
"
Four threads doing 4× the acquisitions take far more than 4× the time — the surplus is the slow path: futex sleeps, wakeups, and cache-line ping-pong.
Learn next
- Deadlock — the classic failure mode of multiple mutexes.
- Semaphore — the counting generalisation of a lock.
- Lock-free programming — what avoiding mutexes entirely looks like.
Read this in a learning path
All paths →This topic is part of 3 learning paths. Start in context to keep prev/next and progress tracking.
- Read this in Concurrency 101The smallest mental model of concurrent programming — processes, threads, the scheduler, locks, deadlock — and why your multithreaded code keeps breaking in weird ways. Start here View the whole path
- Read this in High-Performance SystemsHow sub-millisecond software is built — cache-aware data layout, pool allocation, and lock-free concurrency, grounded in how the memory hierarchy and cache coherence really behave. Start here View the whole path
- Read this in OS ConceptsThe minimum viable mental model of an operating system — processes, threads, scheduling, memory, files, and the kernel that ties it together. Start here View the whole path
Relationships
- Requires
- Required by
Neighborhood
A visual companion to the relationships above. Click any node to visit that topic.