Monitoring tools ingest telemetry from your applications, infrastructure, and dependencies, then turn it into dashboards, alerts, and answers when production goes sideways. The category covers metrics (host CPU, request rate, queue depth), logs (structured events from services), traces (request flow across distributed systems), profiles (in-process CPU and memory usage), and, increasingly, real user monitoring (RUM) and synthetic monitoring on the user-facing side.
Observability platforms (Datadog, New Relic, Dynatrace, Honeycomb, Grafana Cloud, Splunk) bundle most or all of these signals into a single backend. Open-source stacks (Prometheus + Grafana + Loki + Tempo, or OpenTelemetry collectors feeding ClickHouse or SigNoz) match many of the same features at lower licensing cost but require more operational investment. The right pick depends almost entirely on whether you have the engineering capacity to operate a stack yourself or would rather pay to make the problem go away.
The dirty secret of the category is that cost scales with data volume and cardinality (log and trace ingest, the number of unique metric time series), not with sticker price. A team that adds a high-cardinality tag (a user ID, tenant ID, or request ID) to metrics flowing into Datadog can multiply its bill 10x in a quarter without realizing it. Cost predictability matters more than feature checklists, and the best platforms surface what is driving the bill before it explodes.
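The multiplication behind that kind of bill is easy to see in a few lines. Below is a minimal Python sketch, assuming a backend that stores one time series per unique combination of metric name and tag values (roughly how Datadog counts custom metrics). The function names, tag names, and cardinalities are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import prod


def series_upper_bound(tag_cardinality: dict[str, int]) -> int:
    """Worst case: every combination of tag values appears at least once,
    so one metric fans out into the product of its per-tag cardinalities."""
    return prod(tag_cardinality.values())


def observed_series(points: list[dict[str, str]]) -> int:
    """Count the distinct tag sets actually emitted; each becomes its own series."""
    return len({frozenset(p.items()) for p in points})


# Hypothetical tag layout for a request-count metric.
baseline = {"service": 5, "endpoint": 20, "status_code": 5}
with_user = {**baseline, "user_id": 10_000}  # add one high-cardinality tag

print(series_upper_bound(baseline))   # 500
print(series_upper_bound(with_user))  # 5000000

# The same 1,000 requests, tagged two ways.
base_tags = {"service": "api", "endpoint": "/checkout", "status_code": "200"}
points_without = [dict(base_tags) for _ in range(1_000)]
# Hypothetical: requests spread across 250 distinct users.
points_with = [{**base_tags, "user_id": f"u{i % 250}"} for i in range(1_000)]

print(observed_series(points_without))  # 1
print(observed_series(points_with))     # 250
```

Request volume is identical in both runs; only the tag set changes, and the series count goes from 1 to 250. That is why cardinality, rather than traffic or list price, tends to drive the bill.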