Logging
Also known as: application logs, log management
Recording discrete events from a running system, so the engineers operating it can reconstruct what happened — and when, and why.
- Primary domain: Systems Software
- Sub-category: Dependability, Fault Tolerance & Reliability
In simple terms
A log is a stream of timestamped events produced by a running program: “request received”, “DB query took 312 ms”, “user 42 deleted file X”, “ERROR: payment gateway timeout”. Logs are the system’s running journal — invaluable when debugging, irreplaceable during an incident.
More detail
Modern good practice:
- Structured logging — emit JSON (or similar), not free-form text. Lets you filter and aggregate.
- Levels — TRACE, DEBUG, INFO, WARN, ERROR. Production usually defaults to INFO; you can temporarily switch to a more verbose level such as DEBUG during an investigation.
- Correlation IDs — every log line for a single request carries the same request_id or trace_id. Lets you reconstruct a journey across services.
- Don’t log secrets — passwords, tokens, full credit card numbers. Redact at source, not in the pipeline.
- Don’t log PII unnecessarily — and even where you must, treat it like the regulated data it is.
- Sample noisy lines — full-rate “health check OK” logs drown the signal.
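Several of the practices above (structured output, levels, correlation IDs, redaction at source) can be combined in a small sketch using only Python's standard library. This is one possible arrangement, not a reference implementation: the names request_id_var, SENSITIVE_KEYS, JsonFormatter, build_logger and the "fields" key passed through extra are all assumptions made for illustration.

```python
import contextvars
import json
import logging
import sys

# Hypothetical names for illustration; none of them come from a real framework.
request_id_var = contextvars.ContextVar("request_id", default=None)
SENSITIVE_KEYS = {"password", "token", "card_number"}


def redact(fields):
    """Mask sensitive top-level keys before the event leaves the process."""
    return {k: "[REDACTED]" if k in SENSITIVE_KEYS else v for k, v in fields.items()}


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""

    def format(self, record):
        event = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Same ID on every line emitted while handling one request.
            "request_id": request_id_var.get(),
        }
        # Extra structured fields arrive via logger.info(..., extra={"fields": {...}}).
        event.update(redact(getattr(record, "fields", {})))
        return json.dumps(event)


def build_logger(stream, level=logging.INFO):
    """Return a logger that writes JSON lines to `stream` at `level` and above."""
    logger = logging.getLogger("app")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger


logger = build_logger(sys.stdout)
request_id_var.set("req-123")  # normally set once per incoming request
logger.info("login attempt", extra={"fields": {"user_id": 42, "password": "hunter2"}})
logger.debug("cache miss")  # suppressed while the level is INFO
```

The redaction here only inspects top-level keys; nested payloads would need a recursive walk. Passing logging.DEBUG to build_logger is how the more verbose level would be switched on during an investigation.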
Pipeline shape (one of many):
app → stdout / file → agent (Fluent Bit, Vector) → message broker (Kafka)
→ indexer (OpenSearch, Loki, ClickHouse) → UI (Kibana, Grafana, Datadog)
Logs are the “what exactly happened” signal. Metrics tell you that something is wrong; logs tell you what; traces tell you where.
Centralised logging is essential the moment you have more than one server — chasing logs across many machines by hand is hopeless.
Why it matters
When a system breaks at 3 a.m., logs are usually how the on-call engineer reconstructs what happened. Good logs cut incident time dramatically; bad logs (or no logs) turn 15-minute fixes into all-night investigations.
Real-world examples
- A 500 error in a single request, traced back through 7 services by a shared trace_id.
- A retroactive analysis of “how many users hit this bug yesterday?” answered with a single log query.
- A compliance audit asking “what did this admin do last Tuesday?” answered from access logs.
- GDPR and HIPAA both bring log files into compliance scope when they contain regulated data (personal data under GDPR, protected health information under HIPAA), which is one reason many logging pipelines redact at source rather than relying on downstream filtering.
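The first two examples above amount to simple queries once logs are structured. The sketch below runs them over a handful of in-memory JSON lines. The sample data and the field names (ts, service, trace_id, user_id, level, message) are invented for illustration; a real deployment would run the equivalent query in its indexer rather than in application code.

```python
import json

# Hypothetical sample log lines; deliberately out of order, with one junk line.
LOG_LINES = [
    '{"ts": "2024-05-01T10:00:01Z", "service": "payments", "trace_id": "t-1", "user_id": 42, "level": "ERROR", "message": "payment gateway timeout"}',
    '{"ts": "2024-05-01T10:00:00Z", "service": "gateway", "trace_id": "t-1", "user_id": 42, "level": "INFO", "message": "request received"}',
    '{"ts": "2024-05-01T10:00:03Z", "service": "payments", "trace_id": "t-2", "user_id": 7, "level": "ERROR", "message": "payment gateway timeout"}',
    '{"ts": "2024-05-01T10:00:04Z", "service": "payments", "trace_id": "t-3", "user_id": 42, "level": "ERROR", "message": "payment gateway timeout"}',
    "plain text line that is not JSON",
]


def parse(lines):
    """Yield one dict per valid JSON line, skipping anything unparseable."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def events_for_trace(events, trace_id):
    """All events sharing a trace_id, ordered by timestamp.

    Assumes ISO 8601 timestamps in one timezone, which sort correctly as strings.
    """
    matching = [e for e in events if e.get("trace_id") == trace_id]
    return sorted(matching, key=lambda e: e.get("ts", ""))


def users_hitting_error(events, message):
    """Distinct user IDs that logged an ERROR with the given message."""
    return {
        e["user_id"]
        for e in events
        if e.get("level") == "ERROR" and e.get("message") == message and "user_id" in e
    }


events = list(parse(LOG_LINES))
journey = [e["service"] for e in events_for_trace(events, "t-1")]
affected = users_hitting_error(events, "payment gateway timeout")
print(journey)        # ['gateway', 'payments']
print(len(affected))  # 2
```

Neither query would be practical against free-form text, where the trace ID and user ID would have to be recovered with fragile pattern matching.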
Common misconceptions
- “Log everything.” Eventually unaffordable in storage and unfindable in search. Pick what matters; sample the rest.
- “Logs are just for debugging.” They are also evidence for postmortems, security forensics, and product analytics.
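As a rough illustration of "sample the rest", the sketch below uses a standard-library logging.Filter to keep one in every n low-severity records per message template while always passing warnings and errors. The class name EveryNthFilter and the choice of counter-based (rather than random) sampling are assumptions made for this example.

```python
import logging
from collections import Counter


class EveryNthFilter(logging.Filter):
    """Pass all records at WARNING or above; below that, pass 1 in `n` per message."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        self.seen = Counter()

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True  # never drop the lines that signal trouble
        # record.msg is the unformatted template, so "user %s" groups all users.
        key = (record.name, record.msg)
        self.seen[key] += 1
        # Keep the 1st, (n+1)th, (2n+1)th ... occurrence of each message.
        return (self.seen[key] - 1) % self.n == 0


class ListHandler(logging.Handler):
    """Collects records in memory so the effect of the filter is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("sampling_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
handler.addFilter(EveryNthFilter(10))
logger.addHandler(handler)

for _ in range(100):
    logger.info("health check OK")
logger.error("payment gateway timeout")

print(len(handler.records))  # 11: ten sampled health checks plus the error
```

The counter is not protected by a lock, so under heavy multithreading the ratio is approximate, which is usually acceptable for sampling. Recording the sampling rate alongside the kept lines makes it possible to estimate the true volume later.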
Learn next
The other major signal: monitoring. What you do with these signals during a bad night: incident response.