# ADR-0004 — ClickHouse as both audit and analytical store
- Status: Accepted
- Date: 2026-04-28
## Context
The platform has two big-data shapes:
- Analytical — `tracks`, `frames`, `detections`. High-volume, append-mostly, queried by org/time/region/H3-cell. Used by Planner web (coverage, search) and Gen (federated dashboards).
- Audit — every administrative action recorded with org context for compliance. High-volume, append-only, queried by `(org_id, created_at)` + filters (a representative read is sketched below).
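For concreteness, a typical audit read might look like the following. Only `org_id` and `created_at` are named in this ADR, so the other columns and the literal values are illustrative.

```sql
-- Hypothetical audit read: one org, a 30-day window, one action type.
-- Column names other than org_id/created_at, and all values, are placeholders.
SELECT created_at, actor_id, action, target_id
FROM audit_log
WHERE org_id = toUUID('11111111-1111-1111-1111-111111111111')
  AND created_at >= now() - INTERVAL 30 DAY
  AND action = 'member.removed'
ORDER BY created_at DESC
LIMIT 100;
```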
We could:
- Postgres for both. Already there. But the analytical workload — millions of frames per org, geospatial filters, time-range scans — is not Postgres' shape. We've seen this fail at smaller scales.
- Postgres for audit, ClickHouse for analytics. Two stores, two schemas, two retention models, two backup stories.
- One OLAP engine for both. Add ClickHouse, use it for both analytics and audit.
## Decision
Use ClickHouse 26.3 (Altinity Operator-managed, 1 shard × 2 replicas) for both analytical (tracks, frames, detections) and audit (audit_log) workloads.
## Consequences
- Positive: One OLAP engine to operate, one cluster to back up, one set of skills to hire for.
- Positive: Audit queries (filter by org, by time range) are exactly ClickHouse's sweet spot. Postgres would have been workable but expensive at growth.
- Positive: Native partitioning by `toYYYYMM(...)` lets us drop old data wholesale, no `DELETE` storms (see the table sketch after this list).
- Positive: Analytical engines handle the "scan recent frames in this region" workload without bespoke indexes.
- Negative: ClickHouse is eventually consistent within a replica set (merges happen in the background). Idempotency on the analytical tables is enforced client-side via `DELETE` + `INSERT` (see below) rather than at the engine level.
- Negative: ClickHouse is operationally different from Postgres. We rely on the Altinity operator and on SigNoz dashboards for visibility.
- Negative: Two stores total in the platform (Postgres + ClickHouse) instead of one. Acceptable cost.
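The analytical side of that decision can be sketched as a single table. The engine, the `toYYYYMM(...)` partitioning, and the org-first sort key follow this ADR; the column list, sort-key details, replication path, and partition id are assumptions.

```sql
-- Minimal sketch of one analytical table under the decision above.
CREATE TABLE frames
(
    org_id     UUID,
    track_id   UUID,
    h3_cell    UInt64,
    created_at DateTime,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/frames', '{replica}')
PARTITION BY toYYYYMM(created_at)
ORDER BY (org_id, track_id, created_at);

-- Retention by whole partitions: dropping a month is a metadata operation,
-- not a row-by-row DELETE.
ALTER TABLE frames DROP PARTITION 202404;
```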
## Why one cluster, two purposes (not two clusters)
Audit and analytics share the same write path (Sense Worker, Kafka-fed batch insert) and similar query shapes. Splitting them into two physical clusters would double operational cost without solving a real problem. If audit retention requirements diverge wildly (e.g., audit must live 7 years, analytics 2), we can split per-table TTL within the same cluster — no cluster split needed.
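If that divergence does happen, the per-table split might look like this. The 7-year and 2-year figures are only the example from the paragraph above, not a committed policy, and the column types are assumed.

```sql
-- Hypothetical per-table TTLs within the same cluster; expired parts are
-- removed by background merges, no cluster split required.
ALTER TABLE audit_log MODIFY TTL created_at + INTERVAL 7 YEAR;
ALTER TABLE frames    MODIFY TTL created_at + INTERVAL 2 YEAR;
```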
## Why plain ReplicatedMergeTree + client-side DELETE + INSERT
All four tables use plain `ReplicatedMergeTree`. Idempotency for analytics is handled at the writer: the worker issues a lightweight `DELETE` by primary key (with `mutations_sync = 1`, `allow_nondeterministic_mutations = 1`) before every `INSERT`, so re-deliveries from Kafka leave a single row.
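A minimal sketch of that write path, reusing the hypothetical `frames` shape from the earlier DDL; the literal values stand in for per-message values the worker binds.

```sql
-- Step 1: remove any earlier copy of this logical row. mutations_sync = 1
-- waits for the mutation to complete on the server that received it, so the
-- INSERT below cannot race a still-pending delete.
ALTER TABLE frames DELETE
WHERE org_id     = toUUID('11111111-1111-1111-1111-111111111111')
  AND track_id   = toUUID('22222222-2222-2222-2222-222222222222')
  AND created_at = toDateTime('2026-04-01 12:00:00')
SETTINGS mutations_sync = 1, allow_nondeterministic_mutations = 1;

-- Step 2: insert the new version. A Kafka redelivery repeats both steps and
-- still leaves exactly one row.
INSERT INTO frames (org_id, track_id, h3_cell, created_at, payload)
VALUES (toUUID('11111111-1111-1111-1111-111111111111'),
        toUUID('22222222-2222-2222-2222-222222222222'),
        617733122422996991,
        toDateTime('2026-04-01 12:00:00'),
        '{"detections": []}');
```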
- We tried `ReplacingMergeTree(updated_at)` first — it was attractive because the engine collapses duplicates during background merges. The problem: querying it correctly requires `SELECT … FINAL`, and `FINAL` cost killed read latency once the working set grew (the rejected shape is sketched after this list).
- Plain `ReplicatedMergeTree` keeps reads fast. The trade-off is a heavier write path (one `DELETE` mutation per upsert) and the requirement that we own the dedup logic in code.
- Audit follows the same engine, but the contract is stronger: every audit row is unique and must never be coalesced. Duplicates are kept out at the producer (Kafka idempotent producer + at-most-once acceptance per `request_id`).
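For contrast, a sketch of the rejected shape, i.e. the replicated form of `ReplacingMergeTree(updated_at)`, together with the `FINAL` read it forces; columns, the replication path, and values are illustrative.

```sql
-- Rejected alternative: duplicates collapse during background merges (the row
-- with the largest updated_at wins), but correct reads must use FINAL.
CREATE TABLE frames_replacing
(
    org_id     UUID,
    track_id   UUID,
    created_at DateTime,
    updated_at DateTime,
    payload    String
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/frames_replacing', '{replica}', updated_at)
PARTITION BY toYYYYMM(created_at)
ORDER BY (org_id, track_id, created_at);

-- Without FINAL this may double-count rows that have not been merged yet;
-- with FINAL it pays the merge cost at read time, which is the latency
-- problem described above.
SELECT count()
FROM frames_replacing FINAL
WHERE org_id = toUUID('11111111-1111-1111-1111-111111111111')
  AND created_at >= now() - INTERVAL 7 DAY;
```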