# ClickHouse
The analytical and audit store. Sense Worker writes; Gen API reads via federation; Sense API reads for catalog-style queries.
## At a glance
| Property | Value |
|---|---|
| Version | ClickHouse 26.3 |
| Operator | Altinity ClickHouse Operator 0.3.11 (in-cluster) |
| Topology | 1 shard × 2 replicas + ClickHouse Keeper |
| Service DNS | clickhouse-sense.axion.svc:8123 (HTTP), :9000 (TCP) |
| Database | axion_sense |
| Helm chart | axion.infra/services/clickhouse |
| Backup | Per-replica backups via clickhouse-backup (out-of-band cron) |
| Storage | PVC per replica |
## Tables
All tables use plain ReplicatedMergeTree. Idempotency is handled client-side via lightweight DELETE (with `mutations_sync = 1`, `allow_nondeterministic_mutations = 1`) followed by INSERT.
| Table | Engine | Sort key | Partition | Purpose |
|---|---|---|---|---|
| `tracks` | ReplicatedMergeTree | `(id)` | `toYYYYMM(UUIDv7ToDateTime(id))` | Track metadata + map-matched route |
| `frames` | ReplicatedMergeTree | `(track_id, captured_at, id)` | `toYYYYMM(UUIDv7ToDateTime(id))` | Per-frame metadata, GPS, sensors, H3 cells |
| `detections` | ReplicatedMergeTree | `(track_id, frame_id, detector_id)` | `toYYYYMM(UUIDv7ToDateTime(track_id))` | ML detection results |
| `audit_log` | ReplicatedMergeTree (write-once) | `(organization_id, created_at)` | `toYYYYMM(created_at)` | Compliance audit trail |
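A minimal DDL sketch for `frames` under this layout (column list abbreviated and illustrative; the authoritative schema lives in `init.sql`):

```sql
-- Illustrative only: the real column list lives in deploy/clickhouse/init.sql.
CREATE TABLE IF NOT EXISTS axion_sense.frames
(
    id          UUID,           -- UUIDv7, embeds the capture timestamp
    track_id    UUID,
    captured_at DateTime64(3),
    h3_cell     UInt64          -- GPS/sensor columns omitted here
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/frames', '{replica}')
PARTITION BY toYYYYMM(UUIDv7ToDateTime(id))
ORDER BY (track_id, captured_at, id);
```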
## Why plain ReplicatedMergeTree + client-side DELETE + INSERT
We deliver from Kafka at-least-once. Re-deliveries on consumer restart, retries on transient ClickHouse errors, and idempotent re-runs of the Citylens migration all create duplicates. We handle them with a writer-side DELETE by primary key + INSERT (using `mutations_sync = 1`, `allow_nondeterministic_mutations = 1`) so each row is replaced cleanly.
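The write path, sketched as a parameterized `clickhouse-client` exchange (column list illustrative):

```sql
SET mutations_sync = 1, allow_nondeterministic_mutations = 1;

-- Lightweight DELETE by primary key. With mutations_sync = 1 the statement
-- blocks until applied, so the INSERT below cannot land behind a stale row.
DELETE FROM axion_sense.tracks WHERE id = {id:UUID};

INSERT INTO axion_sense.tracks (id, organization_id, created_at)
VALUES ({id:UUID}, {org:UInt64}, now());
```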
`ReplacingMergeTree(updated_at)` was the original choice: it collapses duplicates by sort key during background merges, which sounds ideal. We moved away from it because querying it correctly required `SELECT … FINAL`, and the cost of `FINAL` killed read latency once the working set grew. Plain ReplicatedMergeTree keeps reads fast at the cost of a more involved write path.
`audit_log` follows the same pattern, but the contract is stronger: every audit row is unique and must never be coalesced. Duplicates are kept out at the producer (Kafka idempotent producer + at-most-once acceptance criteria for audit batches).
## Why UUIDv7ToDateTime for partitioning
Track and frame IDs are UUIDv7, so they encode a millisecond timestamp. Partitioning by `toYYYYMM(UUIDv7ToDateTime(id))` gives time-ordered monthly partitions without a separate timestamp column in the partition key. The same applies to `detections` (partitioned by `track_id`).
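A quick illustration of the extraction (using `generateUUIDv7()` just to produce a sample ID):

```sql
WITH generateUUIDv7() AS id
SELECT
    id,
    UUIDv7ToDateTime(id)           AS embedded_ts,   -- millisecond timestamp from the UUID
    toYYYYMM(UUIDv7ToDateTime(id)) AS partition_key; -- e.g. 202501
```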
Operational benefits:
- `OPTIMIZE PARTITION` works on a known time slice.
- Old partitions can be dropped wholesale (`ALTER TABLE ... DROP PARTITION '202501'`) for retention enforcement without DELETE overhead; see the sketch below.
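Retention enforcement then reduces to partition-level DDL, e.g. for `frames`:

```sql
-- Optionally compact a closed month before archival checks.
OPTIMIZE TABLE axion_sense.frames PARTITION 202501 FINAL;

-- Drop the whole month at metadata cost; no row-level DELETE mutation.
ALTER TABLE axion_sense.frames DROP PARTITION 202501;
```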
## Inserts
All inserts come from Sense Worker via the batch path:

- `TrackMetadataBatchMiddleware` → `tracks` + `frames`
- `RecognitionResponseBatchMiddleware` → `detections`
- `AuditEventBatchMiddleware` → `ClickHouseAuditBatcher` → `audit_log`
- `Citylens*BatchMiddleware` → `tracks` / `frames` / `detections` (via mapping)
Insert pattern: bulk `INSERT INTO ... VALUES (...) (...) ...` with prepared columns. Batch size is 1000–5000 rows, flushed every few seconds or when the size threshold is hit.
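Shape of one flushed batch, with two illustrative rows (made-up IDs; `label` and `confidence` are assumed column names):

```sql
INSERT INTO axion_sense.detections
    (track_id, frame_id, detector_id, label, confidence)
VALUES
    ('018f3a6e-1111-7000-8000-000000000001', '018f3a6e-1111-7000-8000-000000000002', 3, 'pothole', 0.91),
    ('018f3a6e-1111-7000-8000-000000000001', '018f3a6e-1111-7000-8000-000000000003', 3, 'crosswalk', 0.87);
```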
## Migrations
Idempotent `CREATE TABLE IF NOT EXISTS` SQL in `axion.sense.backend/deploy/clickhouse/init.sql`. The migration runner applies the script at every startup; this is safe because everything is `IF NOT EXISTS`. Schema changes are additive only; column drops are deferred to a planned downtime window.
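Additive changes follow the same idempotent style; a hypothetical column addition:

```sql
-- No-op once the column exists, so the runner can re-apply it at every startup.
ALTER TABLE axion_sense.frames
    ADD COLUMN IF NOT EXISTS speed_kmh Nullable(Float32);
```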
## Reading
- Sense API: catalog-style reads (e.g., recent tracks for an org) via the data-context's ClickHouse readers.
- Sense Worker: reads frames by H3 cell during PMTiles generation.
- Gen API: federated read via the ClickHouse data-source connector. Gen has its own DB role with `SELECT` only.
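The H3-cell read behind PMTiles generation looks roughly like this (`lat`/`lon` column names are assumptions):

```sql
-- Frames inside one H3 cell; h3_cell is pre-computed at ingest.
SELECT id, track_id, captured_at, lat, lon
FROM axion_sense.frames
WHERE h3_cell = 617700169958293503  -- example H3 index
ORDER BY captured_at;
```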
## Operations
- Hot upgrades: the Altinity Operator handles rolling upgrades of replicas. Pin the operator and ClickHouse versions in `Chart.yaml` to control timing.
- Replica failure: the operator promotes the surviving replica; the new replica resyncs from the other.
- Keeper failure: a 3-node Keeper quorum is required for ZooKeeper-style coordination; verify the replica count.
- Disk pressure: monitor `system.parts` size; tune merge settings if the part count blows up (see the query below).
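A part-count and size check against `system.parts`:

```sql
-- Active part count and on-disk size per table; a part count that merges
-- can't keep up with is the early disk-pressure signal.
SELECT
    table,
    count()                                AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'axion_sense' AND active
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;
```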
## Observability
- ClickHouse exporter → OTel (query latencies, merge counts, replication lag).
- Per-query traces via `system.query_log`, accessible through the ClickHouse UI (`ch-ui`, exposed at https://axion-clickhouse-ui-staging.dev.axionx.ai).
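A typical slow-query lookup against `system.query_log`:

```sql
-- Ten slowest SELECTs finished in the last hour.
SELECT
    query_duration_ms,
    read_rows,
    substring(query, 1, 120) AS query_head
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Select'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```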
## Capacity

| Metric | Steady state | Peak (after coverage day) |
|---|---|---|
| `frames` rows / sec | ~3 / s (long-run avg); ~5–10 / s during active hours | ~45–50 / s |
| `detections` rows / sec | ~2–3 / s (long-run avg) | ~40 / s (≈1,000 / s during reprocessing/backfill bursts) |
| `tracks` rows / sec | ~0.1 / s (long-run avg) | ~2.5 / s |
| Disk per replica | ~0.9–1.1 GiB / month growth (frames + detections + tracks); +0.3–0.5 GiB during detection-reprocessing months | — |
| RAM per replica | 12–16 GiB recommended | — |