# Business Context

## The problem
Cities and road operators need a continuously refreshed picture of their road network: where is the asphalt cracking, which signs are missing, where are the potholes, and which segments still need to be surveyed this quarter. Today that picture is built by sending field inspectors out with cameras and manually reconciling video, GPS tracks, and detection results in spreadsheets.
The Axion Platform replaces that workflow.
## What the platform does
Axion Sense turns a phone-equipped vehicle into a calibrated data collector and gives operators a planning surface to direct that fleet:
- The mobile app records video, slices frames at fixed intervals, captures GPS, WiFi, and cell-tower scans, and uploads them while still in the field.
- The Planner web app lets supervisors plan territories, assign detour tasks, and visualize coverage and detection results on a map.
- The Vision subsystem (first-party) runs the recognition pipeline — quality gating, dispatch to external detector APIs (signs, surface defects, etc.), spatial clustering, and map-matching of GPS tracks to the road network.
Axion Gen is the analytics surface on top of the same data:
- The Gen web app lets analysts build dashboards over Sense data and other federated sources (BigQuery, Flight SQL, Parquet on S3) — with a built-in AI agent that can write the queries, charts, and dashboard layouts via tool-calling.
Both products share the same identity, authorization, storage, and observability plane.
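To make "an AI agent that can write the queries, charts, and dashboard layouts via tool-calling" concrete, here is a minimal sketch of what one such tool definition could look like. The tool name `run_sql`, its fields, and the source identifiers are illustrative assumptions, not the platform's actual agent contract.

```python
# Illustrative only: a tool definition of the kind a dashboard-building
# AI agent would expose to its LLM. All names here are assumptions.
RUN_SQL_TOOL = {
    "name": "run_sql",
    "description": "Execute a read-only SQL query against a federated data source.",
    "parameters": {
        "type": "object",
        "properties": {
            "source": {
                "type": "string",
                "description": "Data source id (e.g. a BigQuery, Flight SQL, "
                               "or Parquet-on-S3 registration)",
            },
            "query": {"type": "string", "description": "SQL text to execute"},
        },
        "required": ["source", "query"],
    },
}
```

The agent loop then passes results back to the model, which emits further tool calls to define charts and lay them out on the dashboard.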
## Actors
| Actor | Surface | Key activities |
|---|---|---|
| Inspector | Mobile app | Drive routes, record video, sync detour tasks. Often offline; uploads opportunistically. |
| Supervisor | Planner web | Plan territories, assign tasks, review coverage and detections, manage org users. |
| Analyst | Gen web | Build dashboards, ask the AI agent for queries, share with stakeholders. |
| External ML team | Kafka contract | Consume recognition requests, return detections. |
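The Kafka contract with the external ML team is essentially a pair of message shapes: a recognition request per frame going out, detections coming back. The sketch below illustrates that shape; every field name (`frame_id`, `s3_url`, `detector`, `bbox`, and so on) is an assumption for illustration, not the real schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical message shapes for the recognition contract.
# All field names here are assumptions, not the actual schema.
@dataclass
class RecognitionRequest:
    org_id: str
    frame_id: str
    s3_url: str    # location of the frame image in object storage
    detector: str  # e.g. "signs" or "surface-defects"

@dataclass
class Detection:
    frame_id: str
    label: str
    confidence: float
    bbox: list  # [x, y, width, height] in pixels

# What the Vision subsystem would produce to the request topic:
req = RecognitionRequest(org_id="org-1", frame_id="f-42",
                         s3_url="https://s3.example/frames/f-42.jpg",
                         detector="signs")
payload = json.dumps(asdict(req))
```

The ML team consumes `payload`-shaped messages and publishes `Detection`-shaped responses, keyed so they can be joined back to the originating frame.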
## Constraints that drive the architecture
These constraints are why the platform looks the way it does. Most architectural choices trace back to one of these:
- Multi-tenant by Organization. Every entity (track, frame, task, dashboard, data source) is org-scoped. Permissions are not just RBAC; they are relationship-based ("user X can view tracks of org Y as long as their membership is active"). → drives OpenFGA.
- Field-first, intermittent connectivity. The mobile flow has to tolerate flaky uploads, partial commits, and delayed confirms. → drives the frame-upload state machine, the `CommitFrameBatch`/`ConfirmTrack` separation, and presigned URLs (mobile uploads to S3 directly, never proxied through the API).
- Audit-grade traceability. Every administrative action is recorded with org context for compliance. → drives the dedicated `audit_log` ClickHouse table fed via Kafka batching.
- Heavy spatial + analytical queries. Frame search by H3 cell, coverage maps, detection counts by region: these are not OLTP shapes. → drives the ClickHouse split (transactional state in Postgres, analytics in ClickHouse) with `tracks`/`frames`/`detections` partitioned monthly via UUIDv7-derived timestamps.
- Both on-prem and cloud deployments must be supported. Customers deploy the platform in their own clusters. → drives the use of vendor-neutral building blocks (Postgres, Kafka, S3-compatible object storage) over managed-only services, and Helm charts in `axion.infra` over cloud-specific IaC.
- Legacy data must come along. Citylens (an older system) holds years of tracks, frames, and detections that must be importable both as a one-shot bulk migration and as an ongoing realtime feed. → drives the Citylens initial-migration and realtime-sync flows, plus the `CitylensTrackMapping` translation table.
- Two product teams, one platform. Sense and Gen ship on different cadences and have different vocabularies (data collection vs. dashboards). → drives the sibling-systems-on-shared-infra split rather than a monolith.
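Monthly partitioning from UUIDv7 keys works because RFC 9562 places a 48-bit Unix-epoch millisecond timestamp in the ID's most significant bits, so the partition month can be derived from the primary key alone, with no separate timestamp column. The helper below is an illustrative sketch under that assumption, not the platform's actual code.

```python
import uuid
from datetime import datetime, timezone

def uuid7_timestamp(u: uuid.UUID) -> datetime:
    # RFC 9562: a UUIDv7 stores a 48-bit Unix timestamp in milliseconds
    # in its most significant bits, so shift the 128-bit int right by 80.
    ms = u.int >> 80
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

def month_partition(u: uuid.UUID) -> str:
    # Partition key of the form "YYYYMM", as a ClickHouse-style
    # monthly partition expression would compute from the timestamp.
    ts = uuid7_timestamp(u)
    return f"{ts.year:04d}{ts.month:02d}"
```

Because IDs are time-ordered, new rows land in the current month's partition and older partitions stay immutable, which is what makes TTL-based expiry and cheap partition drops practical.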
## Quality goals (in priority order)
- Data integrity — never lose a frame, never double-count a detection.
- Operability — pre-failure visibility (SigNoz traces over the entire upload→ML→ClickHouse path); recoverable jobs (Hangfire restart-safe).
- Throughput — peak burst is "city-wide simultaneous upload after a coverage day"; baseline is steady single-vehicle ingest.
- Latency — UI interactions ≤ 300ms p95; realtime Citylens sync lag ≤ 30s; ML round-trip is best-effort (per-frame, not per-track).
- Cost — frames live in object storage, never in Postgres; analytics live in ClickHouse with TTL.
The next page — System Context — places this platform into its surrounding world.