Building Real-Time Observability with ClickHouse: Schemas, Retention, and Low-Latency Queries
Hands‑on ClickHouse guide for DevOps: schema, partitioning, TTLs and query patterns to power sub‑second observability dashboards in 2026.
If your DevOps or SRE dashboards take tens of seconds to render, or your alerting pipelines bog down when traffic spikes, the root cause is often schema and ingestion design — not raw CPU. In 2026, teams expect sub‑second charts for operational decision‑making. This hands‑on guide shows how to design ClickHouse schemas, partitioning, retention and query patterns to power near‑real‑time dashboards with predictable, low‑latency behavior.
Why ClickHouse now (and why it matters to SREs)
ClickHouse's rapid adoption through 2024–2026 — driven by cloud alternatives and major funding rounds for the project and vendors — has pushed feature development toward production observability: lower ingestion latencies, better TTL and tiering, efficient indexes, and managed cloud options. For SREs this means you can get high‑cardinality telemetry, rapid ingestion, and cost‑effective retention in one system when you design for the right trade‑offs.
Key observability trends in 2026
- Managed ClickHouse and tiered storage: cheaper long‑term retention on object storage while keeping recent data hot.
- Faster streaming ingestion: tighter Kafka integrations, buffer patterns and improved merge scheduling for high throughput.
- Pre‑aggregation and rollups: adoption of materialized rollups to keep dashboards sub‑second at scale.
- Index and low‑cardinality patterns: token/Bloom indexes and LowCardinality types reduce query IO for tag filtering.
Core design goals for near‑real‑time dashboards
- Write throughput: ingest huge volumes without blocking reads.
- Query predictability: fast response for recent time ranges and common tag filters.
- Affordable retention: keep 7–90 days hot, move older data to cheaper storage.
- Operational simplicity: manageable compaction, small part counts, and clear TTL policies.
Example observability schema patterns
Below are practical table patterns tuned for dashboards: an append‑only events table for raw telemetry, a minute‑level rollup for dashboards, and a compact long‑term store with TTL migration to cold S3 storage.
1) Raw events table (high ingest, wide schema)
Use MergeTree, low cardinality for labels, and an order key that favors recent time windows. The example assumes events with timestamp, service, host, metric, value and a small set of tags.
CREATE TABLE metrics_events (
ts DateTime64(3) DEFAULT now64(3),
service LowCardinality(String),
host LowCardinality(String),
metric LowCardinality(String),
value Float32,
tags Nested(key String, value String),
ingestion_id UUID DEFAULT generateUUIDv4()
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (toStartOfMinute(ts), service, metric)
SETTINGS index_granularity = 8192;
Why this layout?
- LowCardinality reduces memory and index size for high‑card label columns while keeping fast equality filters.
- Partition by month keeps partition count manageable while enabling fast range pruning.
- ORDER BY with toStartOfMinute(ts) makes recent minutes contiguous on disk, optimizing scans for dashboard time windows.
- index_granularity tuned to 8192 for balanced index size vs lookup speed — adjust lower (e.g. 4096) for more selective point queries, at a cost of larger index memory.
2) Minute‑level rollup (fast dashboard queries)
Materialize aggregated metrics at minute resolution for common dashboard queries (per service or per metric). This reduces the work each query must do and keeps UI latency low.
CREATE TABLE metrics_rollup_minute (
minute DateTime,
service LowCardinality(String),
metric LowCardinality(String),
cnt SimpleAggregateFunction(sum, UInt64),
sum_value SimpleAggregateFunction(sum, Float64),
avg_value AggregateFunction(avg, Float64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (minute, service, metric);
CREATE MATERIALIZED VIEW mv_rollup_minute
TO metrics_rollup_minute
AS
SELECT
toStartOfMinute(ts) AS minute,
service,
metric,
count() AS cnt,
sum(value) AS sum_value,
avgState(value) AS avg_value
FROM metrics_events
GROUP BY minute, service, metric;
Querying the rollup uses the matching -Merge combinator — avgMerge(avg_value) — plus plain sums over the counter columns, grouped by the rollup key. This cuts compute and I/O dramatically for dashboards showing per‑minute trends.
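A dashboard query over the rollup then looks like this sketch (assuming the rollup table and materialized view above):

```sql
-- per-minute trend from the rollup, merging partial aggregate states
SELECT
    minute,
    service,
    metric,
    sum(cnt) AS cnt,
    avgMerge(avg_value) AS avg_v
FROM metrics_rollup_minute
WHERE minute >= now() - INTERVAL 1 HOUR
GROUP BY minute, service, metric
ORDER BY minute DESC;
```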
3) Long‑term cold store with TTL
Move or drop old data automatically to control storage costs. ClickHouse TTL clauses can MOVE TO DISK (if you configure multiple disks) or TO VOLUME (with object-storage-backed volumes in managed/cloud setups). Example: keep 30 days hot, park everything older on S3, delete after 2 years.
ALTER TABLE metrics_events
MODIFY TTL ts + INTERVAL 30 DAY TO VOLUME 'cold_s3',
ts + INTERVAL 730 DAY DELETE;
Parts start on the storage policy's default (hot) volume, so no explicit "move to hot" clause is needed; the first TTL clause moves parts to the cold tier once they age past 30 days.
If you're on self‑managed ClickHouse, configure disks/volumes in the server config. For managed cloud offerings, use the vendor UI to bind volumes to colder tiers.
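On self-managed servers, the disks and volumes live in the server config. A storage policy along these lines (paths, endpoint, and names are illustrative) would back the hot and cold tiers; the table must also reference it via SETTINGS storage_policy = 'tiered':

```xml
<!-- /etc/clickhouse-server/config.d/storage.xml (illustrative paths and names) -->
<clickhouse>
    <storage_configuration>
        <disks>
            <fast_nvme>
                <path>/mnt/nvme/clickhouse/</path>
            </fast_nvme>
            <s3_cold>
                <type>s3</type>
                <endpoint>https://s3.example.com/bucket/clickhouse/</endpoint>
                <access_key_id>...</access_key_id>
                <secret_access_key>...</secret_access_key>
            </s3_cold>
        </disks>
        <policies>
            <tiered>
                <volumes>
                    <!-- listed first: the default (hot) volume new parts land on -->
                    <hot>
                        <disk>fast_nvme</disk>
                    </hot>
                    <cold_s3>
                        <disk>s3_cold</disk>
                    </cold_s3>
                </volumes>
            </tiered>
        </policies>
    </storage_configuration>
</clickhouse>
```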
Ingestion patterns for low latency
Real‑time dashboards need continuous ingestion with bounded tail latency. Use these patterns to avoid write stalls and noisy neighbor effects.
Use Kafka engine + buffer pattern
- Create a Kafka engine table to capture raw messages.
- Create a target MergeTree table for persistent storage.
- Use a materialized view to move and transform messages from Kafka to MergeTree asynchronously.
CREATE TABLE kafka_events (
key String,
payload String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'metrics',
kafka_group_name = 'ch_ingest_group',
kafka_format = 'JSONEachRow';
CREATE MATERIALIZED VIEW kafka_to_metrics TO metrics_events AS
SELECT
parseDateTimeBestEffort(JSONExtractString(payload, 'ts')) AS ts,
JSONExtractString(payload, 'service') AS service,
JSONExtractString(payload, 'host') AS host,
JSONExtractString(payload, 'metric') AS metric,
toFloat32(JSONExtractFloat(payload, 'value')) AS value,
-- split the tags object into parallel key/value arrays for the Nested columns
arrayMap(t -> t.1, JSONExtractKeysAndValues(payload, 'tags', 'String')) AS `tags.key`,
arrayMap(t -> t.2, JSONExtractKeysAndValues(payload, 'tags', 'String')) AS `tags.value`
FROM kafka_events;
This decouples producers from ClickHouse ingestion — Kafka cushions bursts. Tune consumer parallelism (kafka_num_consumers) and batch sizes (e.g. kafka_max_block_size) to keep ingestion latency low.
Buffer engine for spikes
For short bursts without Kafka, the Buffer engine accepts fast small inserts and flushes to MergeTree in larger batches. Use it where you can't run Kafka but still need burst tolerance.
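A minimal Buffer table in front of the raw events table might look like this sketch — the layer count and flush thresholds are illustrative and should be tuned to your burst profile:

```sql
-- Buffer(db, target_table, num_layers,
--        min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
-- rows flush to metrics_events when any max threshold is hit,
-- or when all min thresholds are met
CREATE TABLE metrics_events_buffer AS metrics_events
ENGINE = Buffer(currentDatabase(), metrics_events, 16,
                10, 60, 10000, 1000000, 1000000, 10000000);
```

Producers then INSERT into metrics_events_buffer; reads against metrics_events see buffered rows only after a flush, so point dashboards at the buffer table if you need the freshest data.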
Tuning tips for ingestion
- Tune max_insert_block_size and min_insert_block_size_rows deliberately; larger batches improve compression but increase tail latency. On replicated tables, insert_quorum adds durability at a further latency cost.
- Monitor MergeTree merges via system.merges and tune max_bytes_to_merge_at_max_space_in_pool and the background merge pool size to avoid compaction stalls.
- Use asynchronous inserts (async_insert=1 over HTTP or native) and keep client retries exponential with jitter to avoid thundering herds.
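As a sketch, asynchronous inserts can be enabled per session (or in a settings profile); wait_for_async_insert = 0 trades acknowledged durability for lower tail latency:

```sql
-- buffer small inserts server-side and flush them in batches
SET async_insert = 1, wait_for_async_insert = 0;

INSERT INTO metrics_events (ts, service, host, metric, value)
VALUES (now64(3), 'api', 'host-01', 'request_latency_ms', 12.5);
```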
Query patterns for low latency dashboards
Dashboard queries are usually time‑bounded, filter heavy on service/host/metric, and often aggregate over short windows. Design both schema and queries to exploit indexes and pre‑aggregations.
1) Always bound by time
Bound every panel query with a predicate like WHERE ts >= now() - INTERVAL 5 MINUTE to keep scans small and cache-friendly.
SELECT
toStartOfMinute(ts) AS minute,
metric,
avg(value) AS avg_v,
count() AS cnt
FROM metrics_events
WHERE ts >= now() - INTERVAL 5 MINUTE
AND service = 'api'
GROUP BY minute, metric
ORDER BY minute DESC
LIMIT 1000;
2) Prefer rollups for UI panels
Panels that show rate per second or average latency across hosts should query the minute rollup table instead of raw events. If you need higher fidelity for anomaly detection, keep a short hot window of raw events (e.g., 5–30 minutes).
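For example, a per-second rate panel can read the minute rollup instead of raw events (assuming the rollup table defined earlier):

```sql
-- events per second for one service, derived from per-minute counts
SELECT
    minute,
    sum(cnt) / 60 AS events_per_sec
FROM metrics_rollup_minute
WHERE minute >= now() - INTERVAL 1 HOUR
  AND service = 'api'
GROUP BY minute
ORDER BY minute;
```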
3) Use indexes for high‑card filters
Add a data-skipping index on tags or JSON blobs to avoid full scans on high-cardinality searches. For the Nested tags column, a Bloom-filter index on the key array works (tokenbf_v1 is the analogous token index for plain String columns):
ALTER TABLE metrics_events
ADD INDEX idx_tag_keys tags.key TYPE bloom_filter(0.01) GRANULARITY 4;
ALTER TABLE metrics_events MATERIALIZE INDEX idx_tag_keys;
Use these selectively — they add write cost and disk usage but dramatically speed selective lookups when cardinality is high.
4) Use sampled downsampling for quick overviews
For ultra‑fast overview charts you can maintain a sampled table (e.g., 1% sample) and expose it for latency‑sensitive UI views where exact counts aren't required.
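One way to maintain such a sample (a sketch — table and view names are illustrative) is a deterministic hash-based 1% filter in a materialized view:

```sql
-- same structure and engine as the raw table
CREATE TABLE metrics_events_sampled AS metrics_events;

-- keep roughly 1 in 100 events, chosen deterministically by hash
CREATE MATERIALIZED VIEW mv_sampled_events TO metrics_events_sampled AS
SELECT *
FROM metrics_events
WHERE cityHash64(toString(ingestion_id)) % 100 = 0;
```

Overview panels then query metrics_events_sampled and multiply counts by 100 for approximate totals.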
Retention and cost control: practical recipes
Retention has two goals: keep recent data easily queryable and move old data cheaply. Use TTLs to automate movement and deletions, and plan compaction to avoid small part explosion.
Recipe: 30 days hot, 2 years total
- Keep 30 days hot on fast NVMe volumes (the storage policy's default volume) for live dashboards and SLO windows.
- Move days 31–730 to object storage (S3) cold volumes with higher compression.
- Delete data older than 2 years, or export to compressed Parquet archives first if legal retention requires it.
ALTER TABLE metrics_events MODIFY TTL
ts + INTERVAL 30 DAY TO VOLUME 's3_cold',
ts + INTERVAL 730 DAY DELETE;
When you move parts to S3: use stronger column codecs (e.g. CODEC(ZSTD(6)) or higher) and larger part sizes to amortize object-storage request overhead.
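A heavier codec can be applied per column; this sketch (codec level is illustrative) recompresses the bulky value column:

```sql
-- re-encode the value column with a stronger ZSTD level;
-- existing parts pick up the codec as they are rewritten by merges
ALTER TABLE metrics_events
    MODIFY COLUMN value Float32 CODEC(ZSTD(6));
```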
Operational observability for ClickHouse itself
Make ClickHouse observable: track insert latency, queue sizes, number of parts, merge rates, and read latency. Build small dashboards from system tables:
SELECT
toStartOfMinute(event_time) AS minute,
count() AS queries,
avg(query_duration_ms) AS avg_duration_ms,
quantile(0.99)(query_duration_ms) AS p99_duration_ms
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
AND type = 'QueryFinish'
GROUP BY minute
ORDER BY minute;
Monitor system.parts for exploding part counts and system.merges for stuck merges. Alert when merges backlog > threshold or when CPU waits on I/O rise.
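A quick part-count check along these lines (alert thresholds are up to you) catches small-part explosions early:

```sql
-- active parts per table; investigate any table with hundreds of parts
SELECT
    database,
    table,
    count() AS active_parts,
    sum(rows) AS total_rows
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC
LIMIT 10;
```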
Real‑world example: SRE team migration checklist
A production migration from Prometheus → ClickHouse for long‑term storage and dashboards used this approach and achieved 80–95% faster dashboard render times and 50% lower storage cost over 12 months.
Checklist
- Define retention tiers and volumes, capturing storage cost per GB for each tier.
- Design raw and rollup tables; test ORDER BY patterns on a staging dataset.
- Set up streaming ingestion: Kafka → ClickHouse MV → MergeTree.
- Add selective indices for high‑card queries (tokenbf_v1 / bloom filters).
- Create minute/hour rollups for dashboard queries and a 1% sampled table for overviews.
- Automate TTLs to move older parts to cold volumes and delete when appropriate.
- Instrument ClickHouse with system.* tables and set alerts for merge/backlog/insert latency.
Advanced strategies and future‑proofing (2026+)
As observability needs evolve, plan for these advanced capabilities that many teams are adopting in 2026:
- Multi‑cluster replication: use cluster replication for geo‑redundant reads close to engineers and on‑call teams.
- Column‑level TTLs and tiered compression: older numeric columns compressed with stronger algorithms to save cost.
- Serverless query gateways: abstract query routing so dashboards hit rollups or sampled tables automatically based on SLA.
- Nearline compute: use ephemeral aggregation nodes for heavy ad‑hoc analysis to avoid impacting dashboard latency.
Common mistakes and how to avoid them
- Bad ORDER BY: Choosing a primary key that fragments recent writes across many parts. Fix: make time first if dashboards are time‑focused.
- Too many partitions: Partitioning by day for multi‑year retention causes overhead. Use monthly or weekly depending on traffic volume.
- Unbounded tags: Storing free‑form tags as raw strings causes high cardinality. Fix: normalize tags and use LowCardinality or even a tag dictionary table.
- No rollups: Querying raw events for every dashboard panel. Fix: pre‑aggregate common slices to reduce runtime work.
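The tag-dictionary idea above could be sketched like this (table and column names are hypothetical):

```sql
-- map free-form tag pairs to stable integer IDs; events then store
-- Array(UInt32) of tag_ids instead of raw strings
CREATE TABLE tag_dictionary (
    tag_id UInt32,
    tag_key LowCardinality(String),
    tag_value String
) ENGINE = ReplacingMergeTree()
ORDER BY (tag_key, tag_value);
```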
Actionable takeaways
- Design tables with LowCardinality types and an ORDER BY that favors recent time ranges.
- Use Kafka + Materialized Views to decouple ingestion and smooth spikes.
- Pre‑aggregate with AggregatingMergeTree materialized views for sub‑second dashboards.
- Use TTLs to automate hot→cold movement and control cost; choose part sizes and compression by retention tier.
- Add selective Bloom/token indexes for costly tag searches, but measure write overhead first.
"Design for common queries first: predictability beats flexibility when SREs rely on dashboards under pressure."
Next steps — quick checklist to implement in 2 weeks
- Prototype raw MergeTree table and short‑window rollup on a representative dataset.
- Set up Kafka ingestion and a materialized view pipeline; tune batch sizes for 99th percentile insert latency.
- Measure dashboard queries: if any panel > 1s, create a rollup or index to bring it under 500ms.
- Configure TTLs and validate data movement to cold volume on staging before production cutover.
Final thoughts
ClickHouse can power real‑time observability for DevOps and SRE teams, but only when you design schemas, partitions and retention with query patterns in mind. In 2026, with improved managed offerings and tighter streaming integrations, the biggest wins come from predictable data layouts, pre‑aggregation, and automated tiering.
Call to action
Ready to migrate or optimize your observability stack? Download the sample ClickHouse repo with ready‑to‑use DDLs, ingestion configs and dashboard queries, or schedule a 30‑minute audit with our engineers to get a customized retention and schema plan for your environment.